This is a short introduction to using AzureStor.
AzureStor implements an interface to Azure Resource Manager, which you can use to manage storage accounts: creating them, retrieving them, deleting them, and so forth. This is done via the appropriate methods of the `az_resource_group` class. For example, the following code shows how you might create a new storage account from scratch.
```r
# create a new resource group for the storage account
rg <- AzureRMR::az_rm$
    new(tenant="{tenant_id}", app="{app_id}", password="{password}")$
    get_subscription("{subscription_id}")$
    create_resource_group("myresourcegroup", location="australiaeast")

# create the storage account
stor <- rg$create_storage_account("mynewstorage")
stor
# <Azure resource Microsoft.Storage/storageAccounts/mynewstorage>
#   Account type: StorageV2
#   SKU: name=Standard_LRS, tier=Standard
#   Endpoints:
#     dfs: https://mynewstorage.dfs.core.windows.net/
#     web: https://mynewstorage.z26.web.core.windows.net/
#     blob: https://mynewstorage.blob.core.windows.net/
#     queue: https://mynewstorage.queue.core.windows.net/
#     table: https://mynewstorage.table.core.windows.net/
#     file: https://mynewstorage.file.core.windows.net/
# ---
#   id: /subscriptions/35975484-5360-4e67-bf76-14fcb0ab5b9d/resourceGroups/myresourcegroup/providers/Micro ...
#   identity: NULL
#   location: australiaeast
#   managed_by: NULL
#   plan: NULL
#   properties: list(networkAcls, supportsHttpsTrafficOnly, encryption, provisioningState, creationTime,
#     primaryEndpoints, primaryLocation, statusOfPrimary)
#   tags: list()
# ---
#   Methods:
#     check, delete, do_operation, get_account_sas, get_blob_endpoint, get_file_endpoint, get_tags, list_keys,
#     set_api_version, set_tags, sync_fields, update
```
Without any options, this will create a storage account with the following parameters:

- General purpose account (all storage types supported)
- Locally redundant storage (LRS) replication
- Hot access tier (for blob storage)
- HTTPS connection required for access
You can change these by setting the arguments to `create_storage_account()`. For example, to create an account with geo-redundant storage replication and the default blob access tier set to "cool":
```r
stor2 <- rg$create_storage_account("myotherstorage",
    replication="Standard_GRS",
    access_tier="cool")
```
And to create a blob storage account and allow non-encrypted (HTTP) connections:
```r
blobstor <- rg$create_storage_account("mynewblobstorage",
    kind="blobStorage",
    https_only=FALSE)
```
You can verify that these accounts have been created by going to the Azure Portal (https://portal.azure.com/).
One thing to remember is that all storage accounts in Azure share a common namespace. For example, there can only be one storage account named "mynewstorage" at a time, across all Azure users.
To retrieve an existing storage account, use the `get_storage_account()` method. Only the storage account name is required.
```r
# retrieve one of the accounts created above
stor2 <- rg$get_storage_account("myotherstorage")
```
Finally, to delete a storage account, you simply call its `delete()` method. Alternatively, you can call the `delete_storage_account()` method of the `az_resource_group` class, which will do the same thing. In both cases, AzureStor will prompt you for confirmation that you really want to delete the storage account.
```r
# delete the storage accounts created above
stor$delete()
stor2$delete()
blobstor$delete()

# if you don't have a storage account object, use the resource group method:
rg$delete_storage_account("mynewstorage")
rg$delete_storage_account("myotherstorage")
rg$delete_storage_account("mynewblobstorage")
```
Perhaps the more relevant part of AzureStor for most users is its client interface to storage. With this, you can upload and download files and blobs, create containers and shares, list files, and so on. Unlike the ARM interface, the client interface uses S3 classes. This is for a couple of reasons: it is more familiar to most R users, and it is consistent with most other data manipulation packages in R, in particular the tidyverse.
The starting point for client access is the `storage_endpoint` object, which stores information about the endpoint of a storage account: the URL that you use to access storage, along with any authentication information needed. The easiest way to obtain an endpoint object is via the storage account resource object's `get_blob_endpoint()`, `get_file_endpoint()` and `get_adls_endpoint()` methods:
```r
# create the storage account
rg <- AzureRMR::az_rm$
    new(tenant="{tenant_id}", app="{app_id}", password="{password}")$
    get_subscription("{subscription_id}")$
    get_resource_group("myresourcegroup")
stor <- rg$create_storage_account("mynewstorage")

stor$get_blob_endpoint()
# Azure blob storage endpoint
# URL: https://mynewstorage.blob.core.windows.net/
# Access key: <hidden>
# Account shared access signature: <none supplied>
# Storage API version: 2018-03-28

stor$get_file_endpoint()
# Azure file storage endpoint
# URL: https://mynewstorage.file.core.windows.net/
# Access key: <hidden>
# Account shared access signature: <none supplied>
# Storage API version: 2018-03-28

stor$get_adls_endpoint()
# Azure Data Lake Storage Gen2 endpoint
# URL: https://mynewstorage.dfs.core.windows.net/
# Access key: <hidden>
# Account shared access signature: <none supplied>
# Storage API version: 2018-03-28
```
More practically, you will usually want to work with a storage endpoint without having to go through the process of authenticating with Azure Resource Manager. Often, you may not have any ARM credentials to start with (a tenant ID and/or service principal details). In this case, you can create the endpoint object directly with the `blob_endpoint()`, `file_endpoint()` and `adls_endpoint()` functions. When you create the endpoint this way, you have to provide the access key explicitly (assuming you know what it is).
```r
# same as using the get_xxxx_endpoint() methods above
blob_endpoint("https://mynewstorage.blob.core.windows.net/",
    key="mystorageaccesskey")
file_endpoint("https://mynewstorage.file.core.windows.net/",
    key="mystorageaccesskey")
adls_endpoint("https://mynewstorage.dfs.core.windows.net/",
    key="mystorageaccesskey")
```
Instead of an access key, you can provide a shared access signature (SAS) to gain authenticated access. The main difference between using a key and a SAS is that the former unlocks access to the entire storage account. A user who has a key can access all containers and files, and can transfer, modify and delete data without restriction. On the other hand, a user with a SAS can be limited to have access only to specific containers, or be limited to read access, or only for a given span of time, and so on. This is usually much better in terms of security.
Usually, the SAS will be provided to you by your system administrator. However, if you have the storage account resource object, you can generate and use a SAS as follows. Note that generating a SAS requires the storage account's access key.
```r
# shared access signature: read/write access, container+object access, valid for 8 hours
sas <- stor$get_account_sas(permissions="rw",
    resource_types="co",
    start=Sys.time(),
    end=Sys.time() + 8 * 60 * 60,
    key=stor$list_keys()[1])

# create an endpoint object with a SAS, but without an access key
blob_endp <- stor$get_blob_endpoint(key=NULL, sas=sas)
If you don’t have a key or a SAS, you will only have access to unauthenticated (public) containers.
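Once you have an endpoint object, you can use it to work with containers and their contents. As a brief sketch of where to go from here (the container and file names below are hypothetical, and assume the account and container already exist):

```r
library(AzureStor)

# endpoint authenticated with the account key
endp <- blob_endpoint("https://mynewstorage.blob.core.windows.net/",
    key="mystorageaccesskey")

# list the blob containers in this account
list_blob_containers(endp)

# get a container object and list the blobs it holds
cont <- blob_container(endp, "mycontainer")  # hypothetical container name
list_blobs(cont)

# transfer a file to and from the container
upload_blob(cont, src="myfile.csv", dest="myfile.csv")
download_blob(cont, src="myfile.csv", dest="myfile_local.csv")
```

The same pattern applies to the other storage types: file endpoints have `file_share()` and `list_file_shares()`, and ADLSgen2 endpoints have `adls_filesystem()` and `list_adls_filesystems()`.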