The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
simpr
provides a general, simple, and tidyverse-friendly
framework for generating simulated data, fitting models on simulations,
and tidying model results. The full workflow can happen in a single tidy
pipeline without creating external functions, global values, or using
loops. It’s useful for power analysis, design analysis, simulation
studies, and for teaching statistics.
Easily
readable simulation specifications. You can specify
simulations in a few lines, including referring to other simulation
variables and to simulation parameters that you’re varying (such as
sample size). simpr
takes care of all the details of
generating your simulation across varying parameters.
Sensibly
handle errors. simpr
has various options to
keep going even when simulation or model-fitting hits errors, so that
you don’t need to start over if a single iteration hits fatal numerical
issues.
Reproducible workflows. Individual simulations can be reproduced exactly without needed to perform the whole simulation again.
Easy-to-use
parallel processing. Building on furrr
,
parallel processing for simpr
can usually be turned on with
a couple lines of code.
The hardest part of any simulation is designing the data-generating
process and deciding what values of parameters you want to explore.
simpr
takes care of the rest so you can focus on these
central issues.
## Install stable CRAN version
install.packages("simpr")
## Install latest development version
::install_github("statisfactions/simpr")
remotes
library(simpr)
The simpr
workflow, inspired by the infer
package,
distills a simulation study into five primary steps:
specify()
your data-generating process
define()
parameters that you want to systematically
vary across your simulation design (e.g. n, effect
size)
generate()
the simulation data
fit()
models to your data
(e.g. lm()
)
tidy_fits()
for further processing using
broom::tidy()
, such as computing power or Type I Error
rates
simpr
makes no assumptions about your data and is not
specialized to any particular type of data generating process or model.
If R can generate it and if R can fit models, you can use
simpr
to run your simulation. (The tidying step is limited
by the models supported broom::tidy()
, although you can
also supply your own tidying function or expression.)
Suppose we are calculating the power for a two-sample t-test
where the data is log-normally distributed, which can be generated by
stats::rlnorm()
.
set.seed(100)
## Data-generating mechanism
specify(a = ~ rlnorm(n, mean = 0),
b = ~ rlnorm(n, mean = 0.5)) %>%
## Vary n from 30 to 100
define(n = seq(30, 100, by = 10)) %>%
## 100 repetitions
generate(100) %>%
## fit t-tests
fit(t_test = ~ t.test(a, b)) %>%
## bring model results into a tidy tibble
tidy_fits()
#> # A tibble: 800 × 14
#> .sim_id n rep Source estimate estimate1 estimate2 statistic p.value
#> <int> <dbl> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 30 1 t_test -0.953 1.73 2.68 -1.60 0.117
#> 2 2 40 1 t_test -0.249 1.64 1.89 -0.581 0.563
#> 3 3 50 1 t_test -0.616 1.67 2.29 -1.19 0.237
#> 4 4 60 1 t_test -1.75 1.28 3.03 -3.30 0.00146
#> 5 5 70 1 t_test -0.876 1.61 2.48 -1.96 0.0525
#> 6 6 80 1 t_test -0.780 1.71 2.49 -2.13 0.0352
#> 7 7 90 1 t_test -0.818 1.60 2.42 -2.51 0.0129
#> 8 8 100 1 t_test -0.878 1.51 2.38 -2.61 0.00988
#> 9 9 30 2 t_test -0.487 1.96 2.44 -0.713 0.479
#> 10 10 40 2 t_test -2.29 1.37 3.66 -1.76 0.0851
#> # … with 790 more rows, and 5 more variables: parameter <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, method <chr>, alternative <chr>
specify()
creates two variables a
and
b
that are distributed lognormally (any R expression that
generates data can be used here). The specify
expressions
refer to the sample size, n
. define()
clarifies that n
varies between 30 and 100 by 10s.
generate()
actually does the data generation, with 100
simulated datasets for each possible value of define()
.
fit()
applies an arbitrary R expression to each simulated
dataset, and tidy_fits()
brings things together in a tidy
tibble that can be easily aggregated and plotted to calculate bias,
power, etc.
See vignette("simpr")
to get started on using the
package, or view the simpr
showcase
for several applied examples.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.