Broad technical terms

| Object | Description |
|---|---|
| argset | A named list containing a set of arguments. |
| analysis | These are the fundamental units that are scheduled in plnr: one argset paired with one action function that takes two arguments (data and argset). |
| plan | This is the overarching "scheduler": the shared data sources and the list of analyses to run on them. |
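For concreteness, a minimal sketch of these objects (the names and data below are illustrative, not from plnr itself):

# An argset is simply a named list of arguments
argset <- list(year_max = 2002)

# An analysis pairs one argset with one action function that takes exactly
# two arguments: data (a named list of datasets) and argset
fn_action <- function(data, argset) {
  data$deaths[data$deaths$year <= argset$year_max, ]
}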
Different types of plans

| Plan Type | Description |
|---|---|
| Single-function plan | The same action function applied multiple times to the same datasets, each time with a different argset. |
| Multi-function plan | Different action functions applied to the same datasets. |
Plan Examples

| Plan Type | Example |
|---|---|
| Single-function plan | Multiple strata (e.g. locations, age groups) to which you need to apply the same function (e.g. outbreak detection, trend detection, graphing). |
| Single-function plan | Multiple variables (e.g. multiple outcomes, multiple exposures) to which you need to apply the same statistical methods (e.g. regression models, correlation plots). |
| Multi-function plan | Creating the output for a report (e.g. multiple different tables and graphs; see the sketch below). |
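As a rough illustration of the multi-function case (a sketch; the data and action functions below are made up for illustration, and the single-function case is worked through in detail further down):

library(data.table)

# A hypothetical report plan: one shared dataset, a different action function per output
p_report <- plnr::Plan$new()
p_report$add_data(name = "deaths", direct = data.table(deaths = 1:4, year = 2001:2004))

p_report$add_analysis(
  name = "table_1",
  fn = function(data, argset) data$deaths[year <= argset$year_max, .(total_deaths = sum(deaths))],
  year_max = 2004
)
p_report$add_analysis(
  name = "fig_1",
  fn = function(data, argset) plot(data$deaths$year, data$deaths$deaths, type = "b"),
  year_max = 2004
)

p_report$run_one("table_1")
p_report$run_one("fig_1")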
In brief, we work within the mental model where we have one (or more) datasets and we want to run multiple analyses on these datasets. These multiple analyses can take the form of:

- One function (e.g. table_1) called multiple times with different argsets (e.g. year=2019, year=2020).
- Multiple functions (e.g. table_1, table_2) called multiple times with different argsets (e.g. for table_1: year=2019, while for table_2: year=2019 and year=2020).
By demanding that all analyses use the same data sources, that all analysis functions only take two arguments (data and argset), and by including all of this in one Plan class, we can easily maintain a good overview of all the analyses (i.e. outputs) that need to be run.
We now provide a simple example of a single-function plan that shows how a person can develop code to provide graphs for multiple years. More examples are provided inside the vignette Adding Analyses to a Plan.
library(ggplot2)
library(data.table)
# We begin by defining a new plan
p <- plnr::Plan$new()
# We add sources of data
# We can add data directly
p$add_data(
  name = "deaths",
  direct = data.table(deaths=1:4, year=2001:2004)
)
# We can add data functions that return data
p$add_data(
  name = "ok",
  fn = function() {
    3
  }
)
# We can then add a simple analysis that returns a figure.
# Because this is a single-function plan, we begin by adding the argsets.
# We add the first argset to the plan
p$add_argset(
  name = "fig_1_2002",
  year_max = 2002
)
# And another argset
p$add_argset(
  name = "fig_1_2003",
  year_max = 2003
)
# And another argset
# (don't need to provide a name if you refer to it via index)
p$add_argset(
  year_max = 2004
)
# Create an analysis function
# (takes two arguments -- data and argset)
fn_fig_1 <- function(data, argset){
  plot_data <- data$deaths[year <= argset$year_max]

  q <- ggplot(plot_data, aes(x=year, y=deaths))
  q <- q + geom_line()
  q <- q + geom_point(size=3)
  q <- q + labs(title = glue::glue("Deaths from 2001 until {argset$year_max}"))
  q
}
# Apply the analysis function to all argsets
p$apply_action_fn_to_all_argsets(fn_name = "fn_fig_1")
# How many analyses have we created?
p$x_length()
## [1] 3
# Examine the argsets that are available
p$get_argsets_as_dt()
## name_analysis index_analysis year_max
## 1: fig_1_2002 1 2002
## 2: fig_1_2003 2 2003
## 3: ba73edd8-1509-4311-bd7d-0b5c035d40d5 3 2004
# When debugging and developing code, we have a number of
# convenience functions that let us directly access the
# data and argsets.
# We can directly access the data:
p$get_data()
## $deaths
## deaths year
## 1: 1 2001
## 2: 2 2002
## 3: 3 2003
## 4: 4 2004
##
## $ok
## [1] 3
##
## $hash
## $hash$current
## [1] "30beabc342f7f5cd1bcae9ce9b1ddfbe"
##
## $hash$current_elements
## $hash$current_elements$deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
##
## $hash$current_elements$ok
## [1] "96455a3f86beb595df04fb314776bd1f"
# We can access the argset by index (i.e. first argset):
p$get_argset(1)
## $year_max
## [1] 2002
# We can also access the argset by name:
$get_argset("fig_1_2002") p
## $year_max
## [1] 2002
# We can access the analysis (function + argset) by both index and name:
p$get_analysis(1)
## $argset
## $argset$year_max
## [1] 2002
##
## $argset$index_analysis
## [1] 1
##
##
## $fn_name
## [1] "fn_fig_1"
# We recommend using plnr::is_run_directly() to hide
# the first two lines of the analysis function, which directly
# extract the needed data and argset for one of your analyses.
# This allows for simple debugging and code development
# (the programmer manually runs the first two lines
# of code and then works line-by-line inside the function).
fn_analysis <- function(data, argset){
  if(plnr::is_run_directly()){
    data <- p$get_data()
    argset <- p$get_argset("fig_1_2002")
  }

  # function continues here
}
# We can run the analysis for each argset (by index and name):
$run_one("fig_1_2002") p
$run_one("fig_1_2003") p
$run_one(3) p
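If we want to run every analysis in the plan in one go, a simple loop over the indices works (a sketch that only uses the methods shown above):

# Run all analyses and collect the results in a named list
results <- list()
for (i in seq_len(p$x_length())) {
  results[[i]] <- p$run_one(i)
}
names(results) <- p$get_argsets_as_dt()$name_analysis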
In the functions add_analysis, add_analysis_from_df, apply_action_fn_to_all_argsets, and add_data there is the option to use either fn_name or fn to add the function.
We use them as follows:
library(ggplot2)
library(data.table)
# We begin by defining a new plan and adding data
p <- plnr::Plan$new()
p$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
# We can then add the analysis with `fn_name`
p$add_analysis(
  name = "fig_1_2002",
  fn_name = "fn_fig_1",
  year_max = 2002
)
# Or we can add the analysis with `fn`
p$add_analysis(
  name = "fig_1_2003",
  fn = fn_fig_1,
  year_max = 2003
)
$run_one("fig_1_2002") p
$run_one("fig_1_2003") p
The difference is that with fn_name we provide the name of the function (e.g. fn_name = "fn_fig_1"), while with fn we provide the actual function (e.g. fn = fn_fig_1).
It is recommended to use fn_name, because fn_name calls the function via do.call, which means that RStudio debugging will work properly. The only reason you would use fn is when you are using function factories.
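As an illustration of the function-factory case (a sketch; make_fn_fig is a made-up factory, not part of plnr):

# A function factory: builds an analysis function with a baked-in title prefix.
# The returned function is anonymous, so there is no name to pass via fn_name
# and it must instead be passed directly via fn.
make_fn_fig <- function(title_prefix) {
  function(data, argset) {
    plot_data <- data$deaths[year <= argset$year_max]
    q <- ggplot(plot_data, aes(x = year, y = deaths))
    q <- q + geom_line()
    q <- q + labs(title = paste0(title_prefix, argset$year_max))
    q
  }
}

p$add_analysis(
  name = "fig_1_2004",
  fn = make_fn_fig("Deaths from 2001 until "),
  year_max = 2004
)
p$run_one("fig_1_2004")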
A hash function is used to map data of arbitrary size to fixed-size values. We can use this to uniquely identify datasets.
The Plan method get_data will automatically compute the spookyhash (via digest::digest) of each individual dataset (returned under hash$current_elements) and of the entire named list of datasets (returned under hash$current).
library(data.table)
# We begin by defining a new plan and adding data
p1 <- plnr::Plan$new()
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")
p1$add_data(direct = data.table(deaths=1:5, year=2001:2005), name = "deaths3")
# The hash for 'deaths' and 'deaths2' is the same.
# The hash is different for 'deaths3' (different data).
p1$get_data()$hash$current_elements
## $deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
##
## $deaths2
## [1] "82519debaef80054a7b2ed512f8dfb94"
##
## $deaths3
## [1] "d740b5c163d702dde31061bcd9e00716"
# We begin by defining a new plan and adding data
p2 <- plnr::Plan$new()
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")
# The hashes for p1 'deaths', p1 'deaths2', p2 'deaths', and p2 'deaths2'
# are all identical, because the content within each of the datasets is the same.
p2$get_data()$hash$current_elements
## $deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
##
## $deaths2
## [1] "82519debaef80054a7b2ed512f8dfb94"
# The hash for the entire named list is different for p1 vs p2
# because p1 has 3 datasets while p2 only has 2.
p1$get_data()$hash$current
## [1] "a62de2f423eeb9e516442ffcce641dc3"
p2$get_data()$hash$current
## [1] "505ea771d16df0c71946a0276a4bd4d0"