The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to the simITS package

Luke Miratrix

2020-05-20

Introduction

This vignette quickly outlines the primary method calls for conducting an analysis of an Interrupted Time Series using the simulation approach proposed in the companion paper.

We first cover a simple regression model, then show how to do smoothing, then seasonality. We also make a brief note about generating fake data for the purposes of conducting simulation studies.

Basic ITS analysis

We use the raw Mecklenberg data to illustrate the simITS package.

data(mecklenberg)
head( mecklenberg )
#> # A tibble: 6 x 10
#>   month  karr pbail pptrel  pror  pb4c avg_days_initial avg_t2d pstint7 pstint30
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>            <dbl>   <dbl>   <dbl>    <dbl>
#> 1   -29  2450 0.661 0.0347 0.304 0.634            11.4     193.   0.204   0.0820
#> 2   -28  2518 0.668 0.0365 0.295 0.636             9.96    188.   0.156   0.0564
#> 3   -27  2501 0.641 0.0424 0.317 0.628             9.68    191.   0.158   0.0636
#> 4   -26  2472 0.646 0.0376 0.316 0.649            13.9     183.   0.174   0.0781
#> 5   -25  2628 0.664 0.0438 0.293 0.655            12.8     184.   0.211   0.0799
#> 6   -24  2629 0.647 0.0312 0.322 0.697            10.1     193.   0.169   0.0677
meck = mutate( mecklenberg, pbail = 100 * pbail )
ggplot( meck, aes( x=month, y=pbail)) +
  geom_rect(aes( ymin=-Inf, ymax=Inf, xmin=0.5, xmax=25, fill="lightgray"), col = "lightgray", alpha=0.25) +
  scale_fill_identity(name = "", guide = "none", labels = c('Post Policy era')) +
  geom_hline( yintercept = 0, col="black") +
  geom_line( col="black", lty=1, lwd=0.5) +
  geom_point() +
  scale_x_continuous( breaks = c(-29,-24,-18,-12,-6,1,6,12,18,24)) +    
  coord_cartesian(xlim=c(-29.5,24.5), ylim=c(0,100), expand=FALSE) +
  labs( title = " ", y = "Percent cases assigned bail", x = " " )

To have autoregressive errors we use lagged outcomes. We can add lagged outcomes (and covariates) as so:

meck = add_lagged_covariates( meck, outcomename = "pbail", covariates=NULL )
sample_n( meck, 5 ) %>% arrange( month )
#> # A tibble: 5 x 11
#>   month  karr pbail pptrel  pror  pb4c avg_days_initial avg_t2d pstint7 pstint30
#>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>            <dbl>   <dbl>   <dbl>    <dbl>
#> 1   -19  2262  64.2 0.0385 0.320 0.757            11.5     189.   0.175   0.0637
#> 2    -4  1938  62.2 0.0289 0.349 0.686             8.94    216.   0.155   0.0599
#> 3     7  2061  50.1 0.0359 0.463 0.585             7.83    208.   0.154   0.0456
#> 4    18  1850  52.1 0.0319 0.448 0.588             7.75    161.   0.165   0.0551
#> 5    22  1993  49.4 0.0311 0.475 0.549             6.10    150.   0.127   0.0426
#> # … with 1 more variable: lag.outcome <dbl>

This package passes functions for fitting the model, and then uses these functions for doing the extrapolation. For the default, we use the package’s fit_model_default() which is a simple line (with lagged outcome as a covariate):

meck.pre = filter( meck, month <= 0 )    
mod = fit_model_default( meck.pre, "pbail" )
summary( mod )
#> 
#> Call:
#> stats::lm(formula = stats::as.formula(form), data = dat[-c(1), 
#>     ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.7374 -1.6059 -0.2528  1.3190  4.6280 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 44.99557   11.73346   3.835 0.000718 ***
#> month       -0.12320    0.05526  -2.229 0.034642 *  
#> lag.outcome  0.26357    0.19033   1.385 0.177875    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.094 on 26 degrees of freedom
#> Multiple R-squared:  0.3575, Adjusted R-squared:  0.3081 
#> F-statistic: 7.234 on 2 and 26 DF,  p-value: 0.003178

To run the entire simulation and extrapolation as a call, we can directly do:

t0 = 0
envelope = process_outcome_model( "pbail", meck, 
                                  t0=t0, R = 100, 
                                  summarize = TRUE, smooth=FALSE )
sample_n( envelope, 5 ) %>% arrange( month )
#> # A tibble: 5 x 10
#>   month  Ymin  Ymax range    SE Ystar     Y Ysmooth Ysmooth1  Ybar
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>   <lgl>    <dbl>
#> 1   -14  59.9  59.9   0    0     59.9  59.9 NA      NA        63.5
#> 2    -3  62.5  62.5   0    0     62.5  62.5 NA      NA        61.7
#> 3     6  53.3  64.9  11.5  2.98  60.3  47.1 NA      NA        60.2
#> 4    16  53.0  63.9  10.8  3.39  58.4  49.1 NA      NA        58.6
#> 5    18  51.3  63.8  12.5  3.20  58.0  52.1 NA      NA        58.2

And plotting our results:

ggplot( envelope, aes( month ) ) +
  geom_line( aes( y=Y ), alpha = 0.6 ) +  # original data
  geom_point( aes( y=Y ) ) +              # original data
  geom_ribbon( aes( ymin=Ymin, ymax=Ymax ), alpha=0.2 ) +
  geom_line( aes( y = Ystar ), col="darkgrey" ) +
  geom_vline( xintercept = t0+0.5)

We provide a nice utility function to generate these graphs:

make_envelope_graph(envelope, t0=t0)

Testing and Impact Intervals

We can aggregate impacts for several time points as follows. First call process_outcome_model() without summarizing:

predictions = process_outcome_model( "pbail", meck,
                                     t0=t0, R = 100,
                                     summarize = FALSE, smooth=FALSE )

Then use aggregate_simulation_results():

sstat = aggregate_simulation_results( orig.data = meck, outcomename = "pbail",
                                      predictions = predictions, months = 1:18 )

quantile( sstat$t, c( 0.025, 0.975 ))
#>     2.5%    97.5% 
#> 57.06750 62.40886
sstat$t.obs
#> [1] 52.0368
sstat$t.obs - quantile( sstat$t, c( 0.025, 0.975 ))
#>       2.5%      97.5% 
#>  -5.030692 -10.372056

Generating fake data

For simulation we also offer a fake data generator. It works like this:

dat = generate_fake_data( t_min=-60, t_max=18, t0 = 0 )
qplot( month, Y, data=dat, geom = c( "point","line") )

Smoothing

Here we demonstrate summarizing and smoothing, using the fake data we just generated.

envelope = process_outcome_model( "Y", dat, t0=t0, R = 100, 
                                  summarize = TRUE, smooth=TRUE )
make_envelope_graph(envelope, t0 )

We can smooth to different degrees using the smooth_k parameter:

alphas = c( 6, 11, 20, 100 )
preds = purrr::map( alphas, function( alpha ) {
  pds = process_outcome_model( "Y", dat,
                               t0=t0, R = 20,
                               summarize = FALSE, smooth=TRUE,
                               smooth_k = alpha )
  pds
} )
names( preds ) = alphas
preds = bind_rows( preds, .id="alpha_k" )
ggplot( filter( preds, month >= t0 ), aes( month, Ysmooth ) ) +
  facet_wrap( ~ alpha_k ) +
  geom_line( aes( group=Run, col=alpha_k ), alpha=0.5, na.rm=TRUE) +
  geom_line( data=dat, aes( month, Y ), col="black", alpha=0.5 ) +
  geom_vline( xintercept=t0, col="red" ) +
  labs( x="month", y="proportion given bail")

Seasonality and covariates

A seasonality model on some fake data with a strong seasonality component is easy to fit. You just construct some code to fit the seasonality model via the make_fit_season_model() factory (you need to have the covariates pre-constructed in your data):

data( newjersey )
fit_season_model_qtemp =  make_fit_season_model( ~ temperature + Q2 + Q3 + Q4 )

envelope = process_outcome_model( "n.warrant", newjersey, t0=-7, R = 100, 
                                  summarize = TRUE, smooth=TRUE, 
                                  fit_model = fit_season_model_qtemp )
make_envelope_graph( envelope, t0=-7 )

Note how it will construct the lagged covariates automatically. The make_fit_season_model() method records what covariates are needed from the passed formula.

Smoothing and seasonality

We can smooth around a seasonality model either with a default smoother made from the specified seasonality model (as was done above) or, like the following, with a specified one of your choice:

smoother = make_model_smoother( fit_model = fit_season_model_sin, covariates = newjersey )
envelope_sin = process_outcome_model( "n.warrant", newjersey, t0=-7, R = 100,
                                  summarize = TRUE, smooth=TRUE, smoother = smoother, smooth_k = 11,
                                  fit_model = fit_season_model_qtemp )
envelope_sin$Ysmooth.base = envelope$Ysmooth
envelope_sin$Ysmooth1.base = envelope$Ysmooth1
make_envelope_graph( filter( envelope_sin, month > -30 ), t0=-7 ) +
  geom_line( aes( y=Ysmooth.base ), col="blue", na.rm=TRUE ) +
  geom_line( aes( y=Ysmooth1.base ), col="blue", lty=2, na.rm=TRUE )

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.