Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis. Simply call run_multiverse(<your expanded grid object>):
library(tidyverse)
library(multitool)
# create some data
the_data <-
data.frame(
id = 1:500,
iv1 = rnorm(500),
iv2 = rnorm(500),
iv3 = rnorm(500),
mod = rnorm(500),
dv1 = rnorm(500),
dv2 = rnorm(500),
include1 = rbinom(500, size = 1, prob = .1),
include2 = sample(1:3, size = 500, replace = TRUE),
include3 = rnorm(500)
)
# create a pipeline blueprint
full_pipeline <-
the_data |>
add_filters(include1 == 0, include2 != 3, include3 > -2.5) |>
add_variables(var_group = "ivs", iv1, iv2, iv3) |>
add_variables(var_group = "dvs", dv1, dv2) |>
add_model("linear model", lm({dvs} ~ {ivs} * mod))
# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)
# Run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)
multiverse_results
#> # A tibble: 48 × 4
#> decision specifications model_fitted pipeline_code
#> <chr> <list> <list> <list>
#> 1 1 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 2 2 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 3 3 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 4 4 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 5 5 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 6 6 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 7 7 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 8 8 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 9 9 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 10 10 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> # ℹ 38 more rows
The result will be another tibble with various list columns. It will always contain a list column named specifications, containing all the information you generated in your blueprint. Next, there will be a list column for your fitted models, labelled model_fitted.
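For instance, to peek at the specifications behind a single universe, you can extract one element of the list column directly (a quick sketch using base subsetting):
# inspect the specification tibble for decision 1
multiverse_results$specifications[[1]]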
There are two main ways to unpack and examine multitool results. The first is by using tidyr::unnest(). Inside the model_fitted column, multitool gives us four columns: model_parameters, model_performance, model_warnings, and model_messages.
multiverse_results |> unnest(model_fitted)
#> # A tibble: 48 × 8
#> decision specifications model_function model_parameters model_performance
#> <chr> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
The model_parameters column gives you the result of calling parameters::parameters() on each model in your grid: a data.frame of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.
multiverse_results |>
unnest(model_fitted) |>
unnest(model_parameters)
#> # A tibble: 192 × 16
#> decision specifications model_function parameter coefficient se ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) 0.140 0.0613 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 -0.00984 0.0607 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0864 0.0612 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.0847 0.0655 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) -0.0763 0.0605 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0698 0.0599 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0474 0.0604 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0651 0.0646 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) 0.143 0.0611 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0368 0.0590 0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
The model_performance column gives fit statistics, such as r2, AIC, and BIC values, computed by running performance::performance() on each model in your grid.
multiverse_results |>
unnest(model_fitted) |>
unnest(model_performance)
#> # A tibble: 48 × 14
#> decision specifications model_function model_parameters aic aicc bic
#> <chr> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 834. 835. 853.
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
The model_messages and model_warnings columns contain information provided by the modeling function. If something went wrong or you need to know something about a particular model, these columns will have captured messages and warnings printed by the modeling function.
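For instance, you can pull the captured warnings out with two rounds of unnesting (a minimal sketch; the exact columns inside model_warnings depend on what your models emitted):
# unnest into the fitted-model tibble, then into the warnings
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_warnings)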
I wrote wrappers around the tidyr::unnest() workflow. The main function is reveal(). Pass a multiverse results object to reveal() and tell it which columns to grab by indicating the column name in the .what argument:
multiverse_results |>
reveal(.what = model_fitted)
#> # A tibble: 48 × 8
#> decision specifications model_function model_parameters model_performance
#> <chr> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
If you want to get straight to a specific result, you can specify a sub-list with the .which argument:
multiverse_results |>
reveal(.what = model_fitted, .which = model_parameters)
#> # A tibble: 192 × 16
#> decision specifications model_function parameter coefficient se ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) 0.140 0.0613 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 -0.00984 0.0607 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0864 0.0612 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.0847 0.0655 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) -0.0763 0.0605 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0698 0.0599 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0474 0.0604 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0651 0.0646 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) 0.143 0.0611 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0368 0.0590 0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
reveal_model_*
multitool will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of convenience functions for getting at the most common multiverse results: reveal_model_parameters, reveal_model_performance, reveal_model_messages, and reveal_model_warnings.
reveal_model_parameters unpacks the model parameters in your multiverse:
multiverse_results |>
reveal_model_parameters()
#> # A tibble: 192 × 16
#> decision specifications model_function parameter coefficient se ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) 0.140 0.0613 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 -0.00984 0.0607 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0864 0.0612 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.0847 0.0655 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) -0.0763 0.0605 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0698 0.0599 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0474 0.0604 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0651 0.0646 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) 0.143 0.0611 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0368 0.0590 0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
reveal_model_performance unpacks the model performance:
multiverse_results |>
reveal_model_performance()
#> # A tibble: 48 × 14
#> decision specifications model_function model_parameters aic aicc bic
#> <chr> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 834. 835. 853.
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
You can also choose to expand your decision grid with .unpack_specs to see which decisions produced which result. You have two options for unpacking your decisions: wide or long. If you set .unpack_specs = 'wide', you get one column per decision variable. This is exactly the same as how your decisions appeared in your grid.
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide")
#> # A tibble: 192 × 22
#> decision ivs dvs include1 include2 include3 model model_meta
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 2 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 3 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 4 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 5 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 6 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 7 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 8 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 9 3 iv2 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 10 3 iv2 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> # ℹ 182 more rows
#> # ℹ 14 more variables: model_function <chr>, parameter <chr>,
#> # coefficient <dbl>, se <dbl>, ci <dbl>, ci_low <dbl>, ci_high <dbl>,
#> # t <dbl>, df_error <int>, p <dbl>, model_performance <list>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
If you set .unpack_specs = 'long', your decisions get stacked into two columns: decision_set and alternatives. This format is nice for plotting a particular result from a multiverse analysis across different decision alternatives.
multiverse_results |>
reveal_model_performance(.unpack_specs = "long")
#> # A tibble: 288 × 15
#> decision decision_set alternatives model_function model_parameters aic
#> <chr> <chr> <chr> <chr> <list> <dbl>
#> 1 1 ivs iv1 lm <prmtrs_m [4 × 9]> 838.
#> 2 1 dvs dv1 lm <prmtrs_m [4 × 9]> 838.
#> 3 1 include1 include1 == 0 lm <prmtrs_m [4 × 9]> 838.
#> 4 1 include2 include2 != 3 lm <prmtrs_m [4 × 9]> 838.
#> 5 1 include3 include3 > -2.5 lm <prmtrs_m [4 × 9]> 838.
#> 6 1 model linear model lm <prmtrs_m [4 × 9]> 838.
#> 7 2 ivs iv1 lm <prmtrs_m [4 × 9]> 831.
#> 8 2 dvs dv2 lm <prmtrs_m [4 × 9]> 831.
#> 9 2 include1 include1 == 0 lm <prmtrs_m [4 × 9]> 831.
#> 10 2 include2 include2 != 3 lm <prmtrs_m [4 × 9]> 831.
#> # ℹ 278 more rows
#> # ℹ 9 more variables: aicc <dbl>, bic <dbl>, r2 <dbl>, r2_adjusted <dbl>,
#> # rmse <dbl>, sigma <dbl>, model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
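Since library(tidyverse) already attached ggplot2, a minimal plotting sketch (not from the package docs) might look like this, showing r2 for every universe against each decision alternative:
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long") |>
  ggplot(aes(x = alternatives, y = r2)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  facet_wrap(~ decision_set, scales = "free_x") +
  labs(x = "decision alternative", y = "r2")
Each panel is one decision set, so you can eyeball whether any single decision shifts model fit.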
Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions. A powerful way to organize these results is to summarize a specific results column, say the r2 values of our models over the entire multiverse. condense() takes a result column and summarizes it with the .how argument, which takes a list of the form list(<a name you pick> = <summary function>). .how will create a column named <column being condensed>_<summary function name provided>. In this case, we get r2_mean and r2_median.
# model performance r2 summaries
multiverse_results |>
reveal_model_performance() |>
condense(r2, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> r2_mean r2_median r2_list
#> <dbl> <dbl> <list>
#> 1 0.00776 0.00585 <dbl [48]>
# model parameters for our predictor of interest
multiverse_results |>
reveal_model_parameters() |>
filter(str_detect(parameter, "iv")) |>
condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> coefficient_mean coefficient_median coefficient_list
#> <dbl> <dbl> <list>
#> 1 -0.00606 -0.0114 <dbl [96]>
In the last example, we filtered our multiverse results to look at our predictors iv* and see what the mean and median effect was (over all combinations of decisions) on our outcomes. However, we had three versions of our predictor and two outcomes, so combining dplyr::group_by() with condense() might be more informative:
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide") |>
filter(str_detect(parameter, "iv")) |>
group_by(ivs, dvs) |>
condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 6 × 5
#> # Groups: ivs [3]
#> ivs dvs coefficient_mean coefficient_median coefficient_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 iv1 dv1 0.0377 0.0300 <dbl [16]>
#> 2 iv1 dv2 -0.0265 -0.0317 <dbl [16]>
#> 3 iv2 dv1 0.00177 -0.00132 <dbl [16]>
#> 4 iv2 dv2 -0.00699 -0.00879 <dbl [16]>
#> 5 iv3 dv1 -0.00322 0.0156 <dbl [16]>
#> 6 iv3 dv2 -0.0391 -0.0427 <dbl [16]>
If we are interested in all the terms of the model, we can leverage group_by() further:
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide") |>
group_by(parameter, dvs) |>
condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 16 × 5
#> # Groups: parameter [8]
#> parameter dvs coefficient_mean coefficient_median coefficient_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 (Intercept) dv1 0.102 0.0987 <dbl [24]>
#> 2 (Intercept) dv2 -0.0393 -0.0363 <dbl [24]>
#> 3 iv1 dv1 0.0120 0.0130 <dbl [8]>
#> 4 iv1 dv2 -0.0516 -0.0506 <dbl [8]>
#> 5 iv1:mod dv1 0.0633 0.0699 <dbl [8]>
#> 6 iv1:mod dv2 -0.00149 0.000479 <dbl [8]>
#> 7 iv2 dv1 0.0130 0.0151 <dbl [8]>
#> 8 iv2 dv2 -0.00547 -0.00879 <dbl [8]>
#> 9 iv2:mod dv1 -0.00946 -0.00811 <dbl [8]>
#> 10 iv2:mod dv2 -0.00852 -0.00955 <dbl [8]>
#> 11 iv3 dv1 -0.0667 -0.0677 <dbl [8]>
#> 12 iv3 dv2 -0.0395 -0.0427 <dbl [8]>
#> 13 iv3:mod dv1 0.0602 0.0609 <dbl [8]>
#> 14 iv3:mod dv2 -0.0386 -0.0395 <dbl [8]>
#> 15 mod dv1 0.0663 0.0653 <dbl [24]>
#> 16 mod dv2 -0.0455 -0.0474 <dbl [24]>
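Because .how accepts any named list of functions, you are not limited to the mean and median. For example (a quick sketch; min and max are arbitrary choices here):
# range of r2 across the multiverse
multiverse_results |>
  reveal_model_performance() |>
  condense(r2, list(min = min, max = max))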