The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The RAthena
package aims to make it easier to work with
data stored in AWS Athena
.
RAthena
package attempts to provide three levels of
interacting with AWS Athena:
AWS Athena
backend utilising the AWS SDK paws
. This
includes configuring AWS Athena Work Groups
to assuming different roles within AWS
when connecting to
AWS Athena
.RAthena
, by providing a DBI
interface to AWS Athena
. Users are able to interact with
AWS Athena
utilising familiar functions and methods they
have used for other Databases from R.dplyr
is coming more popular, RAthena
aims to
give dplyr
a seamless interface into
AWS Athena
.RAthena
:As RAthena
utilising the python AWS SDK
boto3
, Python 3+ is required. Please install Python 3+
either by Python or Python
Anaconda. To install RAthena
:
# cran version
install.packages("RAthena")
# Dev version
::install_github("dyfanjones/RAthena") remotes
Next is to install Python boto3
. This can be done either
by RAthena
’s installation method:
::install_boto() RAthena
Or pip method:
pip install boto3
If RAthena
doesn’t pick up boto3
after
using install_boto()
, please consider specifying the python
environment.install_boto()
creates RAthena
environment. This is either a Python virtual environment or a conda
environment depending on your system.
library(DBI)
# Specify python conda environment and force reticulate to use it
::use_condaenv("RAthena", required = TRUE)
reticulate
# Or specify python virtual environment and force reticulate to use it
::use_virtualenv("RAthena", required = TRUE)
reticulate
<- dbConnect(RAthena::athena()) con
Note: Python environments are not required if
boto3
is either in the root Python or if R and Python are
in their own environment (for example conda environment).
To help with users wishing to run RAthena
in a docker, a simple docker file has been
created here.
To set up the docker please refer to link.
For demo purposes we will use the example
docker and run it locally:
# build docker image
docker build . -t rathena
# start container with aws credentials passed from local
docker run \
-e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \
-e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \
-e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \
-e AWS_DEFAULT_REGION="$(aws configure get region)" \
-it rathena
When running RAthena
in the docker environment you might
be required to let reticulate
know what python you are
using.
::use_python("/usr/bin/python3")
reticulate
library(DBI)
<- dbConnect(RAthena::athena(), s3_staging_dir = "s3://mybucket/") con
library(DBI)
library(RAthena)
<- dbConnect(athena())
con
# list all current work groups in AWS Athena
list_work_groups(con)
# Create a new work group
create_work_group(con, "demo_work_group", description = "This is a demo work group",
tags = tag_options(key= "demo_work_group", value = "demo_01"))
library(DBI)
<- dbConnect(RAthena::athena())
con
# Get metadata
dbGetInfo(con)
# $profile_name
# [1] "default"
#
# $s3_staging
# [1] ######## NOTE: Please don't share your S3 bucket to the public
#
# $dbms.name
# [1] "default"
#
# $work_group
# [1] "primary"
#
# $poll_interval
# NULL
#
# $encryption_option
# NULL
#
# $kms_key
# NULL
#
# $expiration
# NULL
#
# $region_name
# [1] "eu-west-1"
#
# $boto3
# [1] "1.11.5"
#
# $RAthena
# [1] "1.7.1"
# create table to AWS Athena
dbWriteTable(con, "iris", iris)
dbGetQuery(con, "select * from iris limit 10")
# Info: (Data scanned: 860 Bytes)
# sepal_length sepal_width petal_length petal_width species
# 1: 5.1 3.5 1.4 0.2 setosa
# 2: 4.9 3.0 1.4 0.2 setosa
# 3: 4.7 3.2 1.3 0.2 setosa
# 4: 4.6 3.1 1.5 0.2 setosa
# 5: 5.0 3.6 1.4 0.2 setosa
# 6: 5.4 3.9 1.7 0.4 setosa
# 7: 4.6 3.4 1.4 0.3 setosa
# 8: 5.0 3.4 1.5 0.2 setosa
# 9: 4.4 2.9 1.4 0.2 setosa
# 10: 4.9 3.1 1.5 0.1 setosa
library(dplyr)
<- tbl(con, "iris")
athena_iris
%>%
athena_iris select(species, sepal_length, sepal_width) %>%
head(10) %>%
collect()
# Info: (Data scanned: 860 Bytes)
# # A tibble: 10 x 3
# species sepal_length sepal_width
# <chr> <dbl> <dbl>
# 1 setosa 5.1 3.5
# 2 setosa 4.9 3
# 3 setosa 4.7 3.2
# 4 setosa 4.6 3.1
# 5 setosa 5 3.6
# 6 setosa 5.4 3.9
# 7 setosa 4.6 3.4
# 8 setosa 5 3.4
# 9 setosa 4.4 2.9
# 10 setosa 4.9 3.1
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.