The Google COVID-19 data repository is a comprehensive open repository of COVID-19 data.
This vignette shows how to use some of this data through the diseasystore package.
First, it is a good idea to copy the relevant Google COVID-19 data files locally and store that location as an option for the package. DiseasystoreGoogleCovid19 (see ?DiseasystoreGoogleCovid19) uses only the age-stratified metrics for COVID-19, so only a subset of the repository needs to be downloaded.
library(diseasystore)

# First we set the path we want to use as an option
options(
  "diseasystore.DiseasystoreGoogleCovid19.source_conn" =
    file.path("local", "path")
)

# Ensure folder exists
source_conn <- diseasyoption("source_conn", "DiseasystoreGoogleCovid19")
if (!dir.exists(source_conn)) {
  dir.create(source_conn, recursive = TRUE, showWarnings = FALSE)
}

# Define the Google files to download
google_files <- c("by-age.csv", "demographics.csv", "index.csv", "weather.csv")

# Download each file, skipping files that already exist locally
purrr::walk(google_files, ~ {
  url <- paste0(diseasyoption("remote_conn", "DiseasystoreGoogleCovid19"), .x)
  destfile <- file.path(source_conn, .x)
  if (!file.exists(destfile)) {
    download.file(url, destfile)
  }
})
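After the download step, it can be useful to confirm that every expected file actually landed in the local folder. The following is a minimal sketch in base R; `missing_files()` is a hypothetical helper (not part of diseasystore), and a temporary folder stands in for the real `source_conn` location.

```r
# Hypothetical helper: return the expected files that are not present in `dir`
missing_files <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

# Demonstration on a temporary folder (a stand-in for the real source_conn)
demo_dir <- file.path(tempdir(), "google_covid_demo")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)

# Pretend that only one of the four files has been downloaded so far
file.create(file.path(demo_dir, "by-age.csv"))

google_files <- c("by-age.csv", "demographics.csv", "index.csv", "weather.csv")
still_missing <- missing_files(demo_dir, google_files)
print(still_missing)
```

In real use, `demo_dir` would be replaced by the folder stored in the `source_conn` option, and an empty result would indicate that all files are in place.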
Each diseasystore requires a database to store its features in. This database should be configured before use, and the configuration can be stored in the package's options.
# We define target_conn as a function that opens a DBI connection to the DB
# (here an in-memory DuckDB database)
target_conn <- \() DBI::dbConnect(duckdb::duckdb())
options(
  "diseasystore.DiseasystoreGoogleCovid19.target_conn" = target_conn
)
Once the files are downloaded and the target DB is configured, we can initialize the diseasystore that uses the Google COVID-19 data.
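The initialization call itself is not shown in this text. A minimal sketch, assuming that `DiseasystoreGoogleCovid19` is an R6 class generator exported by diseasystore, that its constructor reads `source_conn` and `target_conn` from the package options when called without arguments, and that those options have already been set as described earlier:

```r
library(diseasystore)

# Assumption: called without arguments, the constructor picks up
# source_conn and target_conn from the package options
ds <- DiseasystoreGoogleCovid19$new()
```

The name `ds` is chosen to match the object used in the rest of this vignette.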
Once the diseasystore is configured this way, we can use the feature store directly to get data.
# We can see all the available features in the feature store
ds$available_features
#>  [1] "n_population"    "age_group"       "country_id"      "country"
#>  [5] "region_id"       "region"          "subregion_id"    "subregion"
#>  [9] "n_hospital"      "n_deaths"        "n_positive"      "n_icu"
#> [13] "n_ventilator"    "min_temperature" "max_temperature"
# And then retrieve a feature from the feature store
ds$get_feature(
  feature = "n_hospital",
  start_date = as.Date("2020-01-01"),
  end_date = as.Date("2020-06-01")
)
#> # Source: table<dbplyr_etLUzA01xA> [?? x 5]
#> # Database: DuckDB v1.1.1 [B246705@Windows 10 x64:R 4.4.0/:memory:]
#>   key_location key_age_bin n_hospital valid_from valid_until
#>   <chr>        <chr>            <dbl> <date>     <date>
#> 1 AR           2                    0 2020-01-01 2020-01-02
#> 2 AR           3                    0 2020-01-02 2020-01-03
#> 3 AR           9                   NA 2020-01-02 2020-01-03
#> 4 AR           1                    0 2020-01-04 2020-01-05
#> 5 AR           2                    0 2020-01-04 2020-01-05
#> # ℹ more rows
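Each returned row carries a validity interval in `valid_from` and `valid_until`. The sketch below shows one way to select the rows valid on a given day, using the five rows printed above re-entered by hand as a local data frame. `valid_on()` is a hypothetical helper, and the half-open interpretation (valid_from inclusive, valid_until exclusive) is an assumption suggested by the one-day intervals in the output, not something stated in this text.

```r
# The five printed rows, re-entered by hand as a local data frame
hospital <- data.frame(
  key_location = rep("AR", 5),
  key_age_bin  = c("2", "3", "9", "1", "2"),
  n_hospital   = c(0, 0, NA, 0, 0),
  valid_from   = as.Date(c("2020-01-01", "2020-01-02", "2020-01-02",
                           "2020-01-04", "2020-01-04")),
  valid_until  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-03",
                           "2020-01-05", "2020-01-05"))
)

# Hypothetical helper. Assumes valid_from is inclusive and valid_until is
# exclusive; a missing valid_until is treated as "still valid".
valid_on <- function(data, day) {
  keep <- data$valid_from <= day &
    (is.na(data$valid_until) | day < data$valid_until)
  data[keep, , drop = FALSE]
}

on_day <- valid_on(hospital, as.Date("2020-01-02"))
print(on_day$key_age_bin)
```

The same condition could presumably be expressed with dplyr::filter() on the lazy table before collecting it, which would keep the filtering inside the database.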