The s3.resourcer package is for accessing a file stored in the AWS S3 system or in an HTTP S3-compatible object store such as Minio. It makes use of the aws.s3 R package, and of sparklyr when the S3 file store is accessed through Apache Spark.
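If the package is published on a CRAN-style repository (an assumption; it may instead need to be installed from its source repository), installation is the usual:
install.packages("s3.resourcer")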
The resource is a file whose location is described by a URL with scheme s3 (Amazon Web Services S3 file store), or s3+http or s3+https (Minio implementation of the S3 API over HTTP). To authenticate, the AWS/HTTP S3 key is the resource’s identity and the AWS/HTTP S3 secret is the resource’s secret.
For instance, this is a valid resource object that can be accessed by the S3FileResourceGetter:
library(s3.resourcer)
# declare a resource: an .Rdata file in an AWS S3 bucket, holding a data.frame
res <- resourcer::newResource(url="s3://my_bucket/mtcars.Rdata", format = "data.frame")
# get a client that downloads the file and coerces its content to a data.frame
client <- resourcer::newResourceClient(res)
client$asDataFrame()
or
library(s3.resourcer)
# same resource, this time served by a Minio server over HTTPS
res <- resourcer::newResource(url="s3+https://minio.example.org/test/mtcars.Rdata", format = "data.frame")
client <- resourcer::newResourceClient(res)
client$asDataFrame()
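To authenticate against a protected bucket, the key/secret pair is provided when declaring the resource. A minimal sketch, assuming the identity and secret arguments of resourcer::newResource and placeholder credentials:
library(s3.resourcer)
res <- resourcer::newResource(
  url = "s3://my_bucket/mtcars.Rdata",
  identity = "my-s3-key",    # AWS/HTTP S3 key (placeholder)
  secret = "my-s3-secret",   # AWS/HTTP S3 secret (placeholder)
  format = "data.frame")
client <- resourcer::newResourceClient(res)
client$asDataFrame()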
The resource is a Parquet file whose location is described by a URL with scheme s3+spark (Amazon Web Services S3 file store), or s3+spark+http or s3+spark+https (Minio implementation of the S3 API over HTTP). The dataset will not be downloaded as a file: instead, Apache Spark will be used to access the resource, with the help of sparklyr. To authenticate, the AWS/HTTP S3 key is the resource’s identity and the AWS/HTTP S3 secret is the resource’s secret.
For instance, this is a valid resource object that can be accessed by the S3SparkDBIConnector:
library(s3.resourcer)
# declare a Parquet dataset in an AWS S3 bucket, to be read through Spark
res <- resourcer::newResource(url="s3+spark://my_bucket/mtcars")
client <- resourcer::newResourceClient(res)
# get the dataset as a dplyr tbl backed by the Spark connection
client$asTbl()
or
library(s3.resourcer)
# same dataset, this time served by a Minio server over HTTPS
res <- resourcer::newResource(url="s3+spark+https://minio.example.org/test/mtcars")
client <- resourcer::newResourceClient(res)
client$asTbl()
or, for a Parquet file inside a Delta Lake, the query parameter read can be used:
library(s3.resourcer)
# read=delta instructs Spark to read the path as a Delta Lake table rather than a plain Parquet file
res <- resourcer::newResource(url="s3+spark+https://minio.example.org/test/mtcars?read=delta")
client <- resourcer::newResourceClient(res)
client$asTbl()
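The returned object is a dplyr tbl backed by the Spark connection, so standard dplyr verbs are translated to Spark operations and only the collected result is brought into R. A minimal sketch, assuming the mtcars columns and that the client exposes the usual close() method to release the Spark connection:
library(dplyr)
client$asTbl() %>%
  filter(cyl == 6) %>%  # executed on the Spark side
  collect()             # bring only the filtered rows into R
client$close()          # release the underlying Spark connection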
A local installation of Spark is expected. If not found, Spark will be installed using the following code:
library(sparklyr)
# install Spark 3.2.1 with Hadoop 3.2
spark_install(version = "3.2.1", hadoop_version = "3.2")
# additional jars required for S3 and Delta Lake support
jars <- c("https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1/hadoop-aws-3.3.1.jar",
          "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.901/aws-java-sdk-bundle-1.11.901.jar",
          "https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.1.0/delta-core_2.12-1.1.0.jar")
# download each jar into the Spark installation's jars folder
lapply(jars, function(jar) {
  httr::GET(jar, httr::write_disk(file.path(spark_home_dir(), "jars", basename(jar)), overwrite = TRUE))
})
You can adjust this to your needs.
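To verify the setup before declaring resources, sparklyr can list the installed Spark versions, and the downloaded jars can be checked on disk; a small sketch:
library(sparklyr)
# list locally installed Spark/Hadoop versions
spark_installed_versions()
# confirm the extra jars landed in the Spark installation
list.files(file.path(spark_home_dir(), "jars"),
           pattern = "hadoop-aws|aws-java-sdk|delta-core")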