
Type: Package
Title: Load Avro file into 'Apache Spark'
Version: 0.3.0
Author: Aki Ariga
Maintainer: Aki Ariga <chezou@gmail.com>
Description: Load Avro files into 'Apache Spark' using 'sparklyr'. This allows reading files in the 'Apache Avro' format <https://avro.apache.org/>.
License: Apache License 2.0 | file LICENSE
BugReports: https://github.com/chezou/sparkavro
Encoding: UTF-8
LazyData: true
Imports: sparklyr, dplyr, DBI
RoxygenNote: 7.0.2
Suggests: testthat
Language: en-us
NeedsCompilation: no
Packaged: 2020-01-08 23:45:31 UTC; aki
Repository: CRAN
Date/Publication: 2020-01-10 04:40:02 UTC

Read an Avro File into Apache Spark

Description

Reads an Avro file into Apache Spark using sparklyr.

Usage

spark_read_avro(
  sc,
  name,
  path,
  readOptions = list(),
  repartition = 0L,
  memory = TRUE,
  overwrite = TRUE
)

Arguments

sc

An active spark_connection.

name

The name to assign to the newly generated table.

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.

readOptions

A list of strings with additional options.

repartition

The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning.

memory

Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?)

overwrite

Boolean; overwrite the table with the given name if it already exists?

Examples

## Not run: 
## If you don't have a Spark cluster, you can install Spark locally like this
library(sparklyr)
spark_install(version = "2.0.1")

sc <- spark_connect(master = "local")
df <- spark_read_avro(
  sc,
  "twitter",
  system.file("extdata/twitter.avro", package = "sparkavro"),
  repartition = 0L,
  memory = FALSE,
  overwrite = FALSE
)

spark_disconnect(sc)

## End(Not run)

Write a Spark DataFrame to an Avro File

Description

Serialize a Spark DataFrame to the Avro format.

Usage

spark_write_avro(x, path, mode = NULL, options = list())

Arguments

x

A Spark DataFrame or dplyr operation

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.

mode

Specifies the behavior when data or table already exists.

options

A list of strings with additional options. See https://spark.apache.org/docs/latest/sql-programming-guide.html#configuration.
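
Examples

The help page gives no example for spark_write_avro(); the sketch below is a hedged illustration, assuming a local Spark connection as in the spark_read_avro() example. The output path "/tmp/twitter-avro" is illustrative, not part of the package.

```r
## Not run: 
library(sparklyr)

sc <- spark_connect(master = "local")

## Read the sample Avro data shipped with the package
df <- spark_read_avro(
  sc,
  "twitter",
  system.file("extdata/twitter.avro", package = "sparkavro")
)

## Write the Spark DataFrame back out as Avro;
## mode = "overwrite" replaces any existing data at the path
spark_write_avro(df, "/tmp/twitter-avro", mode = "overwrite")

spark_disconnect(sc)

## End(Not run)
```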
