
Run your Pipeline

Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis.

Simply use run_multiverse(<your expanded grid object>):

library(tidyverse)
library(multitool)

# create some data
the_data <-
  data.frame(
    id  = 1:500,
    iv1 = rnorm(500),
    iv2 = rnorm(500),
    iv3 = rnorm(500),
    mod = rnorm(500),
    dv1 = rnorm(500),
    dv2 = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )

# create a pipeline blueprint
full_pipeline <- 
  the_data |>
  add_filters(include1 == 0, include2 != 3, include3 > -2.5) |> 
  add_variables(var_group = "ivs", iv1, iv2, iv3) |> 
  add_variables(var_group = "dvs", dv1, dv2) |> 
  add_model("linear model", lm({dvs} ~ {ivs} * mod))

# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)

# Run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)

multiverse_results
#> # A tibble: 48 × 4
#>    decision specifications   model_fitted     pipeline_code   
#>    <chr>    <list>           <list>           <list>          
#>  1 1        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  2 2        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  3 3        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  4 4        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  5 5        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  6 6        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  7 7        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  8 8        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  9 9        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 10 10       <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> # ℹ 38 more rows

The result will be another tibble with various list columns.

It will always contain a list column named specifications containing all the information you generated in your blueprint. Next, there will be a list column for your fitted model, labelled model_fitted.
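The 48 rows in the output above can be reproduced with a little arithmetic. The sketch below is a stand-in for the decision grid, not a call into multitool: it assumes that each filter contributes two alternatives (applied or not applied), which is consistent with the row count shown but is not spelled out in this section. The labels "applied" and "not applied" are placeholders chosen for illustration.

```r
# Hypothetical stand-in for the expanded decision grid.
# Assumption: every filter has two alternatives (applied / not applied).
grid <- expand.grid(
  ivs      = c("iv1", "iv2", "iv3"),
  dvs      = c("dv1", "dv2"),
  include1 = c("applied", "not applied"),
  include2 = c("applied", "not applied"),
  include3 = c("applied", "not applied"),
  model    = "linear model",
  stringsAsFactors = FALSE
)

# 3 predictors x 2 outcomes x 2 x 2 x 2 filter alternatives x 1 model
n_decisions <- nrow(grid)
n_decisions
```

Under that assumption the grid has one row per unique combination of decisions, which is why run_multiverse() returns one row per decision.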

Unpacking a multiverse analysis

There are two main ways to unpack and examine multitool results. The first is by using tidyr::unnest().

Unnest

Inside the model_fitted column, multitool gives us 5 columns: model_function, model_parameters, model_performance, model_warnings, and model_messages.

multiverse_results |> unnest(model_fitted)
#> # A tibble: 48 × 8
#>    decision specifications   model_function model_parameters   model_performance
#>    <chr>    <list>           <chr>          <list>             <list>           
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> #   pipeline_code <list>

The model_parameters column gives you the result of calling parameters::parameters() on each model in your grid, which is a data.frame of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.

multiverse_results |> 
  unnest(model_fitted) |> 
  unnest(model_parameters)
#> # A tibble: 192 × 16
#>    decision specifications   model_function parameter   coefficient     se    ci
#>    <chr>    <list>           <chr>          <chr>             <dbl>  <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             (Intercept)     0.140   0.0613  0.95
#>  2 1        <tibble [1 × 3]> lm             iv1            -0.00984 0.0607  0.95
#>  3 1        <tibble [1 × 3]> lm             mod             0.0864  0.0612  0.95
#>  4 1        <tibble [1 × 3]> lm             iv1:mod         0.0847  0.0655  0.95
#>  5 2        <tibble [1 × 3]> lm             (Intercept)    -0.0763  0.0605  0.95
#>  6 2        <tibble [1 × 3]> lm             iv1            -0.0698  0.0599  0.95
#>  7 2        <tibble [1 × 3]> lm             mod            -0.0474  0.0604  0.95
#>  8 2        <tibble [1 × 3]> lm             iv1:mod        -0.0651  0.0646  0.95
#>  9 3        <tibble [1 × 3]> lm             (Intercept)     0.143   0.0611  0.95
#> 10 3        <tibble [1 × 3]> lm             iv2             0.0368  0.0590  0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> #   p <dbl>, model_performance <list>, model_warnings <list>,
#> #   model_messages <list>, pipeline_code <list>
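To make the two-step unnesting concrete, here is a minimal sketch on a toy object. The object toy_results and its values are made up for illustration; only the nesting pattern (a list column of one-row tibbles that itself holds a list column of parameters) mirrors the results shown above.

```r
library(tibble)
library(tidyr)

# Toy stand-in for a results object: two decisions, each holding a nested
# one-row tibble whose model_parameters column is itself a list column.
toy_results <- tibble(
  decision = c("1", "2"),
  model_fitted = list(
    tibble(
      model_function = "lm",
      model_parameters = list(
        tibble(parameter = c("(Intercept)", "iv1"), coefficient = c(0.14, -0.01))
      )
    ),
    tibble(
      model_function = "lm",
      model_parameters = list(
        tibble(parameter = c("(Intercept)", "iv1"), coefficient = c(-0.08, -0.07))
      )
    )
  )
)

# First level: one row per decision, nested columns become regular columns
one_level <- toy_results |> unnest(model_fitted)

# Second level: one row per decision and parameter
two_levels <- one_level |> unnest(model_parameters)
two_levels
```

The same logic explains the row counts in the real output: 48 decisions with 4 parameters each give 192 rows after the second unnest().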

The model_performance column gives fit statistics, such as r2, AIC, and BIC values, computed by running performance::performance() on each model in your grid.

multiverse_results |> 
  unnest(model_fitted) |>
  unnest(model_performance)
#> # A tibble: 48 × 14
#>    decision specifications   model_function model_parameters     aic  aicc   bic
#>    <chr>    <list>           <chr>          <list>             <dbl> <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  838.  839.  857.
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  831.  831.  849.
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  840.  840.  858.
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  832.  832.  851.
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  834.  835.  853.
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  832.  832.  851.
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  838.  839.  857.
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  831.  831.  849.
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  840.  840.  858.
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  832.  832.  851.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> #   model_warnings <list>, model_messages <list>, pipeline_code <list>

The model_messages and model_warnings columns contain information provided by the modeling function. If something went wrong or you need to know something about a particular model, these columns will have captured messages and warnings printed by the modeling function.
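For intuition about what "captured" means here, the following base R sketch records warnings and messages raised while an expression runs instead of letting them print. It is not multitool's implementation; capture_conditions() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: evaluate an expression and collect any warnings and
# messages it signals, rather than printing them to the console.
capture_conditions <- function(expr) {
  warns <- character()
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, warnings = warns, messages = msgs)
}

# A stand-in "model call" that emits one message and one warning
res <- capture_conditions({
  message("fitting")
  as.numeric("a")
})
res$warnings
res$messages
```

Storing the conditions next to the result, as in the returned list, is the same idea as keeping model_warnings and model_messages alongside each fitted model.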

Reveal

I wrote wrappers around the tidyr::unnest() workflow. The main function is reveal(). Pass a multiverse results object to reveal() and tell it which columns to grab by indicating the column name in the .what argument:

multiverse_results |> 
  reveal(.what = model_fitted)
#> # A tibble: 48 × 8
#>    decision specifications   model_function model_parameters   model_performance
#>    <chr>    <list>           <chr>          <list>             <list>           
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>       
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> #   pipeline_code <list>

If you want to get straight to a specific result you can specify a sub-list with the .which argument:

multiverse_results |> 
  reveal(.what = model_fitted, .which = model_parameters)
#> # A tibble: 192 × 16
#>    decision specifications   model_function parameter   coefficient     se    ci
#>    <chr>    <list>           <chr>          <chr>             <dbl>  <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             (Intercept)     0.140   0.0613  0.95
#>  2 1        <tibble [1 × 3]> lm             iv1            -0.00984 0.0607  0.95
#>  3 1        <tibble [1 × 3]> lm             mod             0.0864  0.0612  0.95
#>  4 1        <tibble [1 × 3]> lm             iv1:mod         0.0847  0.0655  0.95
#>  5 2        <tibble [1 × 3]> lm             (Intercept)    -0.0763  0.0605  0.95
#>  6 2        <tibble [1 × 3]> lm             iv1            -0.0698  0.0599  0.95
#>  7 2        <tibble [1 × 3]> lm             mod            -0.0474  0.0604  0.95
#>  8 2        <tibble [1 × 3]> lm             iv1:mod        -0.0651  0.0646  0.95
#>  9 3        <tibble [1 × 3]> lm             (Intercept)     0.143   0.0611  0.95
#> 10 3        <tibble [1 × 3]> lm             iv2             0.0368  0.0590  0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> #   p <dbl>, model_performance <list>, model_warnings <list>,
#> #   model_messages <list>, pipeline_code <list>

reveal_model_*

multitool will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of convenience functions for getting at the most common multiverse results: reveal_model_parameters, reveal_model_performance, reveal_model_messages, and reveal_model_warnings.

reveal_model_parameters unpacks the model parameters in your multiverse:

multiverse_results |> 
  reveal_model_parameters()
#> # A tibble: 192 × 16
#>    decision specifications   model_function parameter   coefficient     se    ci
#>    <chr>    <list>           <chr>          <chr>             <dbl>  <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             (Intercept)     0.140   0.0613  0.95
#>  2 1        <tibble [1 × 3]> lm             iv1            -0.00984 0.0607  0.95
#>  3 1        <tibble [1 × 3]> lm             mod             0.0864  0.0612  0.95
#>  4 1        <tibble [1 × 3]> lm             iv1:mod         0.0847  0.0655  0.95
#>  5 2        <tibble [1 × 3]> lm             (Intercept)    -0.0763  0.0605  0.95
#>  6 2        <tibble [1 × 3]> lm             iv1            -0.0698  0.0599  0.95
#>  7 2        <tibble [1 × 3]> lm             mod            -0.0474  0.0604  0.95
#>  8 2        <tibble [1 × 3]> lm             iv1:mod        -0.0651  0.0646  0.95
#>  9 3        <tibble [1 × 3]> lm             (Intercept)     0.143   0.0611  0.95
#> 10 3        <tibble [1 × 3]> lm             iv2             0.0368  0.0590  0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> #   p <dbl>, model_performance <list>, model_warnings <list>,
#> #   model_messages <list>, pipeline_code <list>

reveal_model_performance unpacks the model performance:

multiverse_results |> 
  reveal_model_performance()
#> # A tibble: 48 × 14
#>    decision specifications   model_function model_parameters     aic  aicc   bic
#>    <chr>    <list>           <chr>          <list>             <dbl> <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  838.  839.  857.
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  831.  831.  849.
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  840.  840.  858.
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  832.  832.  851.
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  834.  835.  853.
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  832.  832.  851.
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  838.  839.  857.
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  831.  831.  849.
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  840.  840.  858.
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  832.  832.  851.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> #   model_warnings <list>, model_messages <list>, pipeline_code <list>

Unpacking Specifications

You can also choose to expand your decision grid with .unpack_specs to see which decisions produced what result. You have two options for unpacking your decisions: wide or long. If you set .unpack_specs = "wide", you get one column per decision variable. This is exactly the same as how your decisions appeared in your grid.

multiverse_results |> 
  reveal_model_parameters(.unpack_specs = "wide")
#> # A tibble: 192 × 22
#>    decision ivs   dvs   include1      include2      include3    model model_meta
#>    <chr>    <chr> <chr> <chr>         <chr>         <chr>       <chr> <chr>     
#>  1 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  2 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  3 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  4 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  5 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  6 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  7 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  8 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  9 3        iv2   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 10 3        iv2   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> # ℹ 182 more rows
#> # ℹ 14 more variables: model_function <chr>, parameter <chr>,
#> #   coefficient <dbl>, se <dbl>, ci <dbl>, ci_low <dbl>, ci_high <dbl>,
#> #   t <dbl>, df_error <int>, p <dbl>, model_performance <list>,
#> #   model_warnings <list>, model_messages <list>, pipeline_code <list>

If you set .unpack_specs = "long", your decisions get stacked into two columns: decision_set and alternatives. This format is nice for plotting a particular result from a multiverse analysis across the different decision alternatives.

multiverse_results |> 
  reveal_model_performance(.unpack_specs = "long")
#> # A tibble: 288 × 15
#>    decision decision_set alternatives    model_function model_parameters     aic
#>    <chr>    <chr>        <chr>           <chr>          <list>             <dbl>
#>  1 1        ivs          iv1             lm             <prmtrs_m [4 × 9]>  838.
#>  2 1        dvs          dv1             lm             <prmtrs_m [4 × 9]>  838.
#>  3 1        include1     include1 == 0   lm             <prmtrs_m [4 × 9]>  838.
#>  4 1        include2     include2 != 3   lm             <prmtrs_m [4 × 9]>  838.
#>  5 1        include3     include3 > -2.5 lm             <prmtrs_m [4 × 9]>  838.
#>  6 1        model        linear model    lm             <prmtrs_m [4 × 9]>  838.
#>  7 2        ivs          iv1             lm             <prmtrs_m [4 × 9]>  831.
#>  8 2        dvs          dv2             lm             <prmtrs_m [4 × 9]>  831.
#>  9 2        include1     include1 == 0   lm             <prmtrs_m [4 × 9]>  831.
#> 10 2        include2     include2 != 3   lm             <prmtrs_m [4 × 9]>  831.
#> # ℹ 278 more rows
#> # ℹ 9 more variables: aicc <dbl>, bic <dbl>, r2 <dbl>, r2_adjusted <dbl>,
#> #   rmse <dbl>, sigma <dbl>, model_warnings <list>, model_messages <list>,
#> #   pipeline_code <list>
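The relationship between the wide and long layouts can be sketched with tidyr::pivot_longer() on a toy table. This is only an illustration of the reshaping idea, not how multitool builds the long format internally; wide_specs and its values are made up.

```r
library(tibble)
library(tidyr)

# Toy wide layout: one column per decision variable
wide_specs <- tibble(
  decision = c("1", "2"),
  ivs      = c("iv1", "iv1"),
  dvs      = c("dv1", "dv2"),
  include1 = c("include1 == 0", "include1 == 0")
)

# Stack the decision variables into decision_set / alternatives pairs
long_specs <- wide_specs |>
  pivot_longer(
    cols = -decision,
    names_to = "decision_set",
    values_to = "alternatives"
  )
long_specs
```

Each decision now occupies one row per decision variable, which is why the long output above has 288 rows: 48 decisions times 6 decision variables.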

Condense

Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions.

A powerful way to organize these results is to summarize a specific results column, say the r2 values of our model over the entire multiverse. condense() takes a result column and summarizes it with the .how argument, which takes a list in the form of list(<a name you pick> = <summary function>).

.how creates one column per summary, named <column being condensed>_<name you provided>. In this case, we get r2_mean and r2_median. As the output shows, the result also includes a list column (here r2_list) alongside the summaries.

# model performance r2 summaries
multiverse_results |>
  reveal_model_performance() |> 
  condense(r2, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#>   r2_mean r2_median r2_list   
#>     <dbl>     <dbl> <list>    
#> 1 0.00776   0.00585 <dbl [48]>

# model parameters for our predictor of interest
multiverse_results |>
  reveal_model_parameters() |> 
  filter(str_detect(parameter, "iv")) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#>   coefficient_mean coefficient_median coefficient_list
#>              <dbl>              <dbl> <list>          
#> 1         -0.00606            -0.0114 <dbl [96]>

In the last example, we have filtered our multiverse results to look at our predictors iv* to see what the mean and median effects were (over all combinations of decisions) on our outcomes.

However, we had three versions of our predictor and two outcomes, so combining dplyr::group_by() with condense() might be more informative:

multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |> 
  filter(str_detect(parameter, "iv")) |>
  group_by(ivs, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 6 × 5
#> # Groups:   ivs [3]
#>   ivs   dvs   coefficient_mean coefficient_median coefficient_list
#>   <chr> <chr>            <dbl>              <dbl> <list>          
#> 1 iv1   dv1            0.0377             0.0300  <dbl [16]>      
#> 2 iv1   dv2           -0.0265            -0.0317  <dbl [16]>      
#> 3 iv2   dv1            0.00177           -0.00132 <dbl [16]>      
#> 4 iv2   dv2           -0.00699           -0.00879 <dbl [16]>      
#> 5 iv3   dv1           -0.00322            0.0156  <dbl [16]>      
#> 6 iv3   dv2           -0.0391            -0.0427  <dbl [16]>
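For intuition, the grouped condense() call above behaves much like a grouped dplyr::summarise() that also keeps the raw values in a list column. The sketch below uses made-up numbers and is an approximation of the output shape, not condense()'s actual implementation.

```r
library(dplyr)
library(tibble)

# Toy coefficients for two predictor versions
toy <- tibble(
  ivs         = c("iv1", "iv1", "iv2", "iv2"),
  coefficient = c(0.1, 0.3, -0.2, 0.0)
)

# Roughly the shape condense(coefficient, list(mean = mean, median = median))
# would give per group: one summary column per function plus the raw values
summary_tbl <- toy |>
  group_by(ivs) |>
  summarise(
    coefficient_mean   = mean(coefficient),
    coefficient_median = median(coefficient),
    coefficient_list   = list(coefficient),
    .groups = "drop"
  )
summary_tbl
```

Keeping the raw values in a list column means the per-group distribution is still available later, for example for plotting, without rerunning the multiverse.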

If we are interested in all the terms of the model, we can leverage group_by() further:

multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |> 
  group_by(parameter, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 16 × 5
#> # Groups:   parameter [8]
#>    parameter   dvs   coefficient_mean coefficient_median coefficient_list
#>    <chr>       <chr>            <dbl>              <dbl> <list>          
#>  1 (Intercept) dv1            0.102             0.0987   <dbl [24]>      
#>  2 (Intercept) dv2           -0.0393           -0.0363   <dbl [24]>      
#>  3 iv1         dv1            0.0120            0.0130   <dbl [8]>       
#>  4 iv1         dv2           -0.0516           -0.0506   <dbl [8]>       
#>  5 iv1:mod     dv1            0.0633            0.0699   <dbl [8]>       
#>  6 iv1:mod     dv2           -0.00149           0.000479 <dbl [8]>       
#>  7 iv2         dv1            0.0130            0.0151   <dbl [8]>       
#>  8 iv2         dv2           -0.00547          -0.00879  <dbl [8]>       
#>  9 iv2:mod     dv1           -0.00946          -0.00811  <dbl [8]>       
#> 10 iv2:mod     dv2           -0.00852          -0.00955  <dbl [8]>       
#> 11 iv3         dv1           -0.0667           -0.0677   <dbl [8]>       
#> 12 iv3         dv2           -0.0395           -0.0427   <dbl [8]>       
#> 13 iv3:mod     dv1            0.0602            0.0609   <dbl [8]>       
#> 14 iv3:mod     dv2           -0.0386           -0.0395   <dbl [8]>       
#> 15 mod         dv1            0.0663            0.0653   <dbl [24]>      
#> 16 mod         dv2           -0.0455           -0.0474   <dbl [24]>
