See all documentation on the googleComputeEngineR website
The following are some R scripts for common workflows. They assume you have previously signed up, set up a Google project, and configured authentication and SSH.
This gives you a private RStudio Server with your custom packages and users.
In summary, this workflow:
- launches an RStudio Server template
- adds users to the instance
- saves your customised container to the Container Registry
- relaunches it on a larger machine
Here we are setting up a 13GB RAM instance, as found via gce_list_machinetype()
library(googleComputeEngineR)
## setting up a 13GB RAM instance
## see gce_list_machinetype() for options of predefined_type
vm <- gce_vm(template = "rstudio-hadleyverse",
name = "rstudio-team",
username = "mark", password = "mark1234",
predefined_type = "n1-highmem-2")
## wait a bit, then log in at the IP it gives you
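If you want to browse the other machine types available before choosing, a quick sketch (the shape of the returned object is assumed here - check the function's help page):
## list the predefined machine types in your default zone
mt <- gce_list_machinetype()
## names such as "n1-highmem-2" are what you pass to predefined_type
head(mt$items$name)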
You can add users via:
gce_rstudio_adduser(vm, username = "bill", password = "flowerpot")
You can then log in at the IP address given via vm or gce_get_external_ip(vm), and install packages as you would on RStudio Desktop.
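For example, a quick way to fetch the IP from R:
## get the external IP of the instance
ip <- gce_get_external_ip(vm)
## then browse to http://<ip> and log in with the username/password set above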
Every Google project has its own private Docker repository called the Container Registry. The command below takes the running container that has your changes and saves it there. By default, the RStudio container runs under the name "rstudio", which you can see via containers(vm).
gce_push_registry(vm,
save_name = "my_rstudio",
container_name = "rstudio")
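To confirm the container name before (or after) saving, you can list what is running on the VM:
## list the Docker containers running on the VM
## the RStudio one should appear under the name "rstudio"
containers(vm)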
This can take a while the first time, so go make a cup of tea. If successful, you should be able to see your container saved at https://console.cloud.google.com/kubernetes/images/list
Now say you want a larger, more powerful instance, or to launch another VM with your settings. You can pull from the Container Registry and start up a VM with your settings enabled.
We use template = "rstudio" to make sure the right ports and so forth are configured for RStudio, and dynamic_image = "my_rstudio" to instruct the template to pull your own image instead of the default. You need to make sure the dynamic image is based on an RStudio one for this to work correctly.
The function gce_tag_container() constructs the name of the custom image on your Container Registry for you.
## construct the correct tag name for your custom image
tag <- gce_tag_container("my_rstudio")
# gcr.io/mark-edmondson-gde/my_rstudio
## start a larger 52GB RAM instance
vm2 <- gce_vm(name = "rstudio-big",
predefined_type = "n1-highmem-8",
template = "rstudio",
dynamic_image = tag)
## wait for it to launch
You are not charged for compute while instances are stopped, and the next time you start them they will be up within about 20 seconds.
gce_vm_stop(vm2)
gce_vm_stop(vm)
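When you need them again, a minimal sketch of restarting:
## restart the instances at a later date
gce_vm_start(vm)
gce_vm_start(vm2)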
This workflow takes advantage of the future integration to run your local R functions within a cluster of GCE machines. You can use this for expensive computations by spinning up a cluster and tearing it down again once you are done.
In summary, this workflow:
- launches several VMs running R
- sets up SSH access to each
- combines them into a future cluster
- sends computations to the cluster via %<-%
- shuts the VMs down when finished
The example below uses the default r-base template, but you can use the steps above to create a dynamic_image pulled from the Container Registry if required.
Instead of the more generic gce_vm(), which is intended for interactive use, we create the instances directly using gce_vm_container() so that R does not wait for each job to complete before starting the next (waiting is not practical if you have a lot of VMs). You can then use gce_get_zone_op() to check the job status.
library(future)
library(googleComputeEngineR)
## names for your cluster
vm_names <- c("vm1","vm2","vm3")
## create the cluster using default template for r-base
## creates jobs that are creating VMs in background
jobs <- lapply(vm_names, function(x) {
gce_vm_container(file = get_template_file("r-base"),
predefined_type = "n1-highmem-2",
name = x)
})
jobs
# [[1]]
# ==Operation insert : PENDING
# Started: 2016-11-16 06:52:58
# [[2]]
# ==Operation insert : PENDING
# Started: 2016-11-16 06:53:04
# [[3]]
# ==Operation insert : PENDING
# Started: 2016-11-16 06:53:09
## check status of jobs
lapply(jobs, function(x) gce_get_zone_op(x))
# [[1]]
# ==Operation insert : DONE
# Started: 2016-11-16 06:52:58
# Ended: 2016-11-16 06:53:14
# Operation complete in 16 secs
# [[2]]
# ==Operation insert : DONE
# Started: 2016-11-16 06:53:04
# Ended: 2016-11-16 06:53:20
# Operation complete in 16 secs
# [[3]]
# ==Operation insert : DONE
# Started: 2016-11-16 06:53:09
# Ended: 2016-11-16 06:53:30
# Operation complete in 21 secs
## get the VM objects
vms <- lapply(vm_names, gce_vm)
It is safest to set up the SSH keys separately when working with multiple instances, using gce_ssh_setup() - this is normally called for you when you first connect to a VM.
## set up SSH for the VMs
vms <- lapply(vms, gce_ssh_setup)
We now make the VM cluster as detailed in the future README:
## make a future cluster
plan(cluster, workers = as.cluster(vms))
The cluster is now ready to receive jobs. You can send them by simply using %<-% instead of <-.
## use %<-% to send functions to work on cluster
## See future README for details: https://github.com/HenrikBengtsson/future
a %<-% Sys.getpid()
## make a big function to run asynchronously
f <- function(my_data, args){
## ....expensive...computations
result
}
## send to cluster
result %<-% f(my_data, args)
For long-running jobs you can use future::resolved() to check on their progress.
## check if resolved
resolved(result)
# [1] TRUE
Remember to shut down your cluster. You are charged per minute, per instance of uptime.
## shutdown instances when finished
lapply(vms, gce_vm_stop)
This workflow demonstrates how you can take advantage of Dockerfiles to customise the VM templates. Using Dockerfiles is recommended if you are making a lot of changes to a template, as it is much easier to keep track of what is happening.
In summary, this workflow:
- creates a Dockerfile in a folder with any other files or dependencies, such as cron
- uses docker_build() to upload and build your custom Docker image on the VM
- pushes the result to the Container Registry for reuse

Build VMs should be more powerful than the default machine type (f1-micro), else there is a danger of them hanging during big, expensive builds.
library(googleComputeEngineR)
## installs rocker/hadleyverse docker image
vm <- gce_vm(name = "build-schedule-r",
template = "rstudio-hadleyverse",
predefined_type = "n1-standard-2")
Dockerfile
The Dockerfile here is available via get_dockerfile("hadleyverse-crontab") and is shown below; you could base your own upon it. This one installs cron for scheduling and nano, a simple text editor. It also installs some libraries needed for my scheduled scripts:
From CRAN:
- googleAuthR - Google authentication
- shinyFiles - for cron jobs
- googleCloudStorageR - for uploading to Google Cloud Storage
- bigQueryR - for uploading to BigQuery
- gmailr - an email R package
- googleAnalyticsR - for downloading Google Analytics data

From GitHub:
- bnosac/cronR - to help with creating cron jobs within RStudio

FROM rocker/hadleyverse
MAINTAINER Mark Edmondson (r@sunholo.com)
# install cron and R package dependencies
RUN apt-get update && apt-get install -y \
cron \
nano \
## clean up
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/ \
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
## Install packages from CRAN
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR shinyFiles googleCloudStorageR bigQueryR gmailr googleAnalyticsR \
## install Github packages
&& Rscript -e "devtools::install_github(c('bnosac/cronR'))" \
## clean up
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
docker_build
We now upload the Dockerfile to the instance and build a new Docker image called my-cron-verse.
This can take some time the first time, so it's time for another cup of tea.
## build a new image based on rocker/hadleyverse with this Dockerfile
docker_build(vm,
dockerfile = get_dockerfile("hadleyverse-crontab"),
new_image = "my-cron-verse")
## wait for it to build (5mins +)
# ...
# [0m ---> 059fe3ed926a
# Removing intermediate container 7695f3dc071f
# Successfully built 059fe3ed926a
# Using existing public key in /Users/mark/.ssh/google_compute_engine.pub
# REPOSITORY TAG IMAGE ID CREATED SIZE
# my-cron-verse latest 059fe3ed926a 2 seconds ago 1.933 GB
# rocker/hadleyverse latest 05e3b636e90b 24 hours ago 1.921 GB
This image is now saved to the Container Registry:
gce_push_registry(vm, save_name = "my-rstudio", image_name = "my-cron-verse")
Once that is done, we don’t need this instance anymore.
gce_vm_delete(vm)
You can now launch instances using your constructed image.
You can also use your custom image to create further Dockerfiles
that use it as a dependency, using gce_tag_container()
to get its correct name.
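For example, a minimal sketch of starting a new Dockerfile from your saved image, written from R here just to illustrate the tag; the package name mynewpackage is a placeholder:
## the full registry name, e.g. gcr.io/your-project/my-rstudio
base_image <- gce_tag_container("my-rstudio")
## write a stub Dockerfile that builds on top of your custom image
writeLines(c(paste("FROM", base_image),
             "RUN install2.r --error mynewpackage"),
           "Dockerfile")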
## now start an instance using our rstudio image in cloud-config
## this takes care of rstudio friendly settings, restart behaviour etc.
tag <- gce_tag_container("my-rstudio")
## rstudio template, but with your private rstudio build
vm2 <- gce_vm(name = "myrstudio2",
template = "rstudio",
dynamic_image = tag,
username = "mark",
password = "mark1234")
You can check when the images have downloaded by using gce_check_container():
## check on progress on the container pull
gce_check_container(vm2, "rstudio")
# -- Logs begin at Thu 2016-11-17 14:54:38 UTC, end at Thu 2016-11-17 14:57:38 UTC. --
# Nov 17 14:54:43 myrstudio2 docker[1045]: Unable to find image 'gcr.io/mark-edmondson-gde/my-rstudio:latest' locally
# Nov 17 14:54:47 myrstudio2 docker[1045]: latest: Pulling from mark-edmondson-gde/my-rstudio
# Nov 17 14:54:47 myrstudio2 docker[1045]: a84f66826a7f: Pulling fs layer
# ...
# ...
# Nov 17 14:58:36 myrstudio2 docker[1045]: [cont-init.d] conf: exited 0.
# Nov 17 14:58:36 myrstudio2 docker[1045]: [cont-init.d] done.
# Nov 17 14:58:36 myrstudio2 docker[1045]: [services.d] starting services
# Nov 17 14:58:36 myrstudio2 docker[1045]: [services.d] done.
## your custom rstudio instance is now ready
> vm2
# ==Google Compute Engine Instance==
#
# Name: myrstudio2
# Created: 2016-11-17 06:54:18
# Machine Type: f1-micro
# Status: RUNNING
# Zone: europe-west1-b
# External IP: 104.199.67.250
# Disks:
# deviceName type mode boot autoDelete
# 1 myrstudio2-boot-disk PERSISTENT READ_WRITE TRUE TRUE
You can delete your instances, knowing that the custom image is safe in the Container Registry, or just stop them using gce_vm_stop() and start again with gce_vm_start().
## delete the instance (the container is safe)
gce_vm_delete(vm2)
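If you would rather keep the boot disk around instead of deleting, the stop/start alternative mentioned above looks like:
## stop rather than delete...
gce_vm_stop(vm2)
## ...and restart later when needed
gce_vm_start(vm2)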
A demo script for scheduling is shown below.
It is not recommended to rely on data loaded into a Docker container, so we do not include data in the image. Instead, call out to dedicated data stores such as BigQuery or Cloud Storage, which you have access to under the same project when running on Google Compute Engine.
In summary, the script below:
- authenticates with a service account
- downloads Google Analytics data
- uploads the results to BigQuery and Cloud Storage
- emails the top referrer of the day
Log into your RStudio Server instance and create the following script:
library(googleCloudStorageR)
library(bigQueryR)
library(gmailr)
library(googleAnalyticsR)
## set options for authentication
options(googleAuthR.client_id = XXXXX)
options(googleAuthR.client_secret = XXXX)
options(googleAuthR.scopes.selected = c("https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/analytics.readonly"))
## authenticate
## using service account, ensure service account email added to GA account, BigQuery user permissions set, etc.
googleAuthR::gar_auth_service("auth.json")
## get Google Analytics data
gadata <- google_analytics_4(123456,
date_range = c(Sys.Date() - 2, Sys.Date() - 1),
metrics = "sessions",
dimensions = "medium",
anti_sample = TRUE)
## upload to Google BigQuery
bqr_upload_data(projectId = "myprojectId",
datasetId = "mydataset",
tableId = paste0("gadata_",format(Sys.Date(),"%Y%m%d")),
upload_data = gadata,
create = TRUE)
## upload to Google Cloud Storage
gcs_upload(gadata, name = paste0("gadata_",Sys.Date(),".csv"))
## get top medium referrer
top_ref <- paste(gadata[order(gadata$sessions, decreasing = TRUE),][1, ], collapse = ",")
# 3456, organic
## send email with todays figures
daily_email <- mime(
To = "bob@myclient.com",
From = "bill@cool-agency.com",
Subject = "Today's winner is....",
body = paste0("Top referrer was: ", top_ref))
send_message(daily_email)
Save the script within RStudio as daily-report.R.
You can then use cronR to schedule the script for a daily extract.
Use its RStudio addin, or in the console issue:
library(cronR)
cron_add(paste0("Rscript ", normalizePath("daily-report.R")), frequency = "daily")
# Adding cronjob:
# ---------------
#
# ## cronR job
# ## id: fe9168c7543cc83c1c2489de82216c0f
# ## tags:
# ## desc:
# 0 0 * * * Rscript /home/mark/daily-report.R
The script will then run every day.
Make sure to run the script locally and in a test cron job first. Once satisfied, you can run gce_push_registry() again to save the RStudio image with your scheduled script embedded within it.
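A sketch of that second push, assuming the instance object is vm2 and the RStudio container is still running under the name "rstudio":
## commit the running container - now including daily-report.R and its cron entry -
## back to the Container Registry under the same name as before
gce_push_registry(vm2, save_name = "my-rstudio", container_name = "rstudio")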
If you want to call the data from a Shiny app, you can use bqr_query from bigQueryR or gcs_get_object from googleCloudStorageR within your Shiny script to pull the data into your app.
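A minimal sketch, reusing the project, dataset and object names from the upload step above (check each package's documentation for the exact argument names and query syntax):
library(bigQueryR)
library(googleCloudStorageR)
## query the table the scheduled script wrote to BigQuery
gadata_bq <- bqr_query(projectId = "myprojectId",
                       datasetId = "mydataset",
                       query = "SELECT * FROM gadata_20161117")
## or fetch the CSV the script saved to Cloud Storage ("my-bucket" is a placeholder)
gadata_gcs <- gcs_get_object("gadata_2016-11-17.csv", bucket = "my-bucket")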
This is useful for having a dedicated Docker container with all the libraries, files and scripts necessary to run your app, on any machine that can run Docker, without worrying about versions.
This example uses a local Dockerfile to install the libraries you need for your Shiny app, and also copies your Shiny app into the image so it is all self-contained.
The Shiny app can then be deployed on new instances via gce_vm() with your custom image, as shown at the end of this section.
In summary, this workflow:
- launches a Shiny template VM to build on
- builds a Docker image that copies your Shiny app into it via the Dockerfile
- pushes the image to the Container Registry
- deploys the app on a new VM pulled from that image
We start with a Shiny template:
library(googleComputeEngineR)
## make sure the instance is big enough to install,
## the default "f1-micro" does not compile packages easily
vm <- gce_vm(name = "build-app", template = "shiny", predefined_type = "n1-standard-2")
This is as before, but the Dockerfile
also includes a COPY
command to copy necessary Shiny ui.R
and server.R
files into the Docker image.
The Shiny app used is the googleAuthR
demo app, and the build directory can be found via: get_dockerfolder("shiny-googleAuthRdemo")
FROM rocker/shiny
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev \
## clean up
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/ \
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
## Install packages from CRAN
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
## clean up
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
## assume shiny app is in build folder /shiny
COPY ./shiny/ /srv/shiny-server/myapp/
Note the COPY command at the end - this copies from a folder in the same location as the Dockerfile and places it in the /srv/shiny-server/ folder, the default location for Shiny apps. This location means the Shiny app will be available at your.ip.addr.ess/myapp/
We also install googleAuthR from CRAN, along with libssl-dev, a Debian dependency it needs, via apt-get.
The file structure for the build is then:
list.files(get_dockerfolder("shiny-googleAuthRdemo"), recursive = TRUE)
# "Dockerfile" "shiny/DESCRIPTION" "shiny/readme.md" "shiny/server.R" "shiny/ui.R"
We now build the custom image:
docker_build(vm,
dockerfolder = get_dockerfolder("shiny-googleAuthRdemo"),
new_image = "shiny_gar")
On a successful build, you can now upload to the Container Registry.
## push up to your private Google Container registry
gce_push_registry(vm,
save_name = "shiny_gar",
image_name = "shiny_gar")
# ...
# ...
# b363013633c9: Pushed
# 4809649dffb9: Pushed
# latest: digest: sha256:cb233f547d84dd94e7616fa7615522e15213d65cc2abb423dd4f3305d19309ce size: 19731
You can now deploy your Shiny app on any instance by calling it from your Container Registry:
## make new Shiny template VM with your self-contained Shiny app
vm2 <- gce_vm(name = "deployedapp",
template = "shiny",
dynamic_image = gce_tag_container("shiny_gar"),
predefined_type = "n1-standard-2")
## check for when image has finished downloading
gce_check_container(vm2, "shinyserver")
# ...
# ...
# Nov 17 22:11:15 myshinyapp2 docker[1039]: [cont-init.d] done.
# Nov 17 22:11:15 myshinyapp2 docker[1039]: [services.d] starting services
# Nov 17 22:11:15 myshinyapp2 docker[1039]: [services.d] done.
Your app should now be running on your IP plus the folder set in the Dockerfile, such as http://123.456.XXX.XXX/myapp/
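For example, to construct the URL from R:
## build the app URL from the instance's external IP
ip <- gce_get_external_ip(vm2)
paste0("http://", ip, "/myapp/")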
Clean up the VMs to avoid unnecessary costs:
# delete build VM
gce_vm_delete(vm)
# stop and start production shiny app as needed
gce_vm_stop(vm2)
The workflow below installs your GitHub package into the Docker container using a small Dockerfile.
In summary, this workflow:
- launches an OpenCPU template VM
- builds a custom image with your GitHub package installed
- pushes the image to the Container Registry
- swaps the default OpenCPU container for your own
library(googleComputeEngineR)
## start an opencpu template
vm <- gce_vm(name = "opencpu", template = "opencpu", predefined_type = "n1-standard-2")
## wait for opencpu image to load
gce_check_container(vm, "opencpu")
This Dockerfile is available via get_dockerfolder("opencpu-installgithub") and is shown below. It installs an OpenCPU-ready package from GitHub - in this case my application for predicting which URLs a user will visit, for prefetching.
FROM opencpu/base
MAINTAINER Mark Edmondson (r@sunholo.com)
# install any package dependencies
RUN apt-get update && apt-get install -y \
nano \
## clean up
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/ \
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
## Install your custom package from Github
RUN Rscript -e "devtools::install_github(c('MarkEdmondson1234/predictClickOpenCPU'))"
The Dockerfile is used to build the custom image below:
## build a docker image with your package installed
docker_build(vm,
dockerfolder = get_dockerfolder("opencpu-installgithub"),
new_image = "opencpu-predictclick")
## push up to your private Google Container registry
gce_push_registry(vm,
save_name = "opencpu-predictclick",
image_name = "opencpu-predictclick")
In this case we don't start a new instance; we just stop the running OpenCPU container and start our own. We need to stop the default container to free up ports 80 and 8004, which OpenCPU needs to work.
## stop default opencpu container
docker_cmd(vm, "stop opencpu-server")
## run custom opencpu server
docker_run(vm,
image = "opencpu-predictclick",
name = "predictclick",
detach = TRUE,
docker_opts = "-p 80:80 -p 8004:8004")
Clean up when you are done to avoid charges.
gce_vm_stop(vm)