The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This document demonstrates how to perform permutation test inference for heterogeneous treatment effects. We use a simulated dataset to illustrate. This data is provided with the package as . Overall, this vignette illustrates the different tests for ideosyncratic variation one might make using our package, and shows how different forms of covariate adjustment can increase power or target inferential questions differently.
We first load our package along with some other useful packages.
We begin with exploring a toy dataset with 500 observations that we included with the package for illustration. ToyData
has an outcome, four covariates, and a treatment indicator. They were generated with the following model: \[
Y_i(0) = 1 + x_{1i} + 2 x_{2i} + 4x_{3i} + \epsilon_i
\] with \(\epsilon_i \sim N( 0, 1^2 )\). The treatment model is all systematic: \[
\tau_i = 2 + 2x_{1i} + x_{2i}
\] So \(x_1\) and \(x_2\) are both predictive of treatment impact. \(x_3\) also predicts control side variation, and should therefore be useful for increasing precision. \(x_4\) is useless. \(Y_i(1) = Y_i(0) + \tau_i\), so there is no ideosyncratic variation if we control for both \(x_1\) and \(x_2\).
As we generated these data, we know the true individual treatment effects.
## Y Z x1 x2 x3 x4 tau
## 1 2.2430188 0 0.7635779 0.9415463 -0.50291814 -1.4374786 4.468702
## 2 0.3997241 0 -0.5884278 0.7976774 -0.32835669 -0.7140096 1.620822
## 3 3.5865779 0 0.4881701 0.1335321 0.54933171 -0.7098844 3.109872
## 4 0.9929092 0 1.3796253 0.1786193 -0.23264879 -0.3467998 4.937870
## 5 3.0077115 1 0.1722719 0.2836023 -0.03172434 -2.2100810 2.628146
## 6 2.5547212 0 -0.2978002 0.2678954 0.46776121 -1.6495943 1.672295
td = gather( ToyData, x1, x2, x3, x4, key="X", value="value" )
td = gather( td, Y, tau, key="outcome", value="value2" )
ggplot( td, aes( x=value, y=value2, col=as.factor(Z) ) ) +
facet_grid( outcome ~ X, scales="free" ) +
geom_point( alpha=0.5, size=0.5) +
geom_smooth( method="loess", se=FALSE ) +
labs( x="Covariates", y="" )
## `geom_smooth()` using formula = 'y ~ x'
As the data is simulated, we have the true finite-sample ATE:
## [1] 1.858165
Before testing, we quickly look at the marginal and residual CDFs of treatment and control. We see heterogeniety at left, but after controlling for observed covariates we have no visibile ideosyncratic heterogeniety left over.
par( mfrow=c(1,2) )
ll0 = lm( Y ~ Z, data=ToyData )
plot( ecdf( resid(ll0)[ToyData$Z==1] ), pch=".", main="Marginal CDFs of \n treatment and control")
plot( ecdf( resid(ll0)[ToyData$Z==0] ), pch=".", col="red", add=TRUE )
ll1 = lm( Y ~ Z + x1 + x2 + x3 + x4, data=ToyData )
plot( ecdf( resid(ll1)[ToyData$Z==1] ), pch=".", main="Residual CDFs of \n treatment and control" )
plot( ecdf( resid(ll1)[ToyData$Z==0] ), pch=".", col="red", add=TRUE )
A simple linear model should give us our parameters from above, since it is a correct specification in this case.
## (Intercept) Z x1 x2 x3 Z:x1
## 1.0 2.1 1.1 2.0 4.0 2.0
## Z:x2 Z:x3
## 0.9 0.1
To do our tests, we first must specify some parameters determining the resolution of the grid search and number of permutations at each grid point. Note that these values should be increased for a real analysis. We chose these particular values to reduce computation time for illustration.
The basic test for ideosyncratic treatment effect variation is as follows (with no adjustment for covariates):
tst1 = detect_idiosyncratic( Y ~ Z, data=ToyData, B=B, grid.size = grid.size, verbose=FALSE )
summary( tst1 )
##
## FRT CI Test for Treatment Effect Heterogeneity
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, B = B,
## grid.size = grid.size, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat
## Grid: 11 points tested over range of 0.347343 to 3.936448 with 20 repititions/point
## Estimated ATE (grid center): 2.142
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.08 0.1501 0.1001
## CI for p-value (monte carlo error) = 0.000100 - 0.379027
We can increase the power by adjusting for covariates. Please note that we do not include any interaction terms at this point.
We specify a control.formula
which will be used to generate a matrix to hand to the linear regression function. This will convert factors to dummy variables as needed.
tst2 = detect_idiosyncratic( Y ~ Z, data=ToyData, control.formula = ~ x1 + x2 + x3 + x4, B=B,
test.stat="SKS.stat.cov", verbose=FALSE )
summary( tst2 )
##
## FRT CI Test for Treatment Effect Heterogeneity
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, control.formula = ~x1 +
## x2 + x3 + x4, test.stat = "SKS.stat.cov", B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.cov
## Grid: 151 points tested over range of 1.441206 to 2.454906 with 20 repititions/point
## Estimated ATE (grid center): 1.948
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.232 1e-04 1e-04
## CI for p-value (monte carlo error) = 0.000100 - 0.168533
Let’s explore how the results might differ when non-treatment covariates are used.
tst2b = detect_idiosyncratic( Y ~ Z, data=ToyData, control.formula = ~ x3 + x4, B=B,
test.stat="SKS.stat.cov", verbose=FALSE )
summary( tst2b )
##
## FRT CI Test for Treatment Effect Heterogeneity
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, control.formula = ~x3 +
## x4, test.stat = "SKS.stat.cov", B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.cov
## Grid: 151 points tested over range of 1.047639 to 3.444855 with 20 repititions/point
## Estimated ATE (grid center): 2.246
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.148 1e-04 1e-04
## CI for p-value (monte carlo error) = 0.000100 - 0.168533
We can compare to when we correct for some useless covariates not related to outcome.
N = nrow(ToyData)
ToyData$x4 = rnorm( N )
tst1b = detect_idiosyncratic( Y ~ Z, data=ToyData, control.formula = ~ x4, B=B,
test.stat="SKS.stat.cov", verbose=FALSE )
summary( tst1b )
##
## FRT CI Test for Treatment Effect Heterogeneity
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, control.formula = ~x4,
## test.stat = "SKS.stat.cov", B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.cov
## Grid: 151 points tested over range of 0.347829 to 3.948574 with 20 repititions/point
## Estimated ATE (grid center): 2.148
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.068 0.3501 0.1501
## CI for p-value (monte carlo error) = 0.012449 - 0.592289
To test for ideosyncratic variation beyond systematic, we pass an interaction.formula
to the method.
We first test for ideosyncratic variation beyond x1 (and we should get high \(p\)-value).
B = 20
tst3a1 = detect_idiosyncratic( Y ~ Z, data=ToyData, interaction.formula = ~ x1, B=B,
test.stat="SKS.stat.int.cov", verbose=FALSE )
summary( tst3a1 )
##
## FRT CI Test for Tx Effect Heterogeneity Beyond a Systematic Model
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, interaction.formula = ~x1,
## test.stat = "SKS.stat.int.cov", B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.int.cov
## Grid: 151 points tested with 20 repititions/point
## Grid range:
## Z Z:W
## 0.8996999 0.3335783
## 3.2388098 3.0164854
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.072 0.3501 0.2001
Include additional terms to increase power. We are correcting for x3 and x4.
tst3a2 <- detect_idiosyncratic( Y ~ Z, data=ToyData,
interaction.formula = ~ x1,
control.formula = ~ x3 + x4,
B=B, test.stat="SKS.stat.int.cov",
verbose=FALSE )
summary( tst3a2 )
##
## FRT CI Test for Tx Effect Heterogeneity Beyond a Systematic Model
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, interaction.formula = ~x1,
## control.formula = ~x3 + x4, test.stat = "SKS.stat.int.cov",
## B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.int.cov
## Grid: 151 points tested with 20 repititions/point
## Grid range:
## Z Z:W
## 1.862449 1.582119
## 2.927466 2.724050
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.128 1e-04 1e-04
For comparison, we next include all terms.
tst3a2b <- detect_idiosyncratic( Y ~ Z, data=ToyData, control.formula = ~ x2 + x3 + x4,
interaction.formula = ~ x1, B=B, test.stat="SKS.stat.int.cov",
verbose=FALSE )
summary( tst3a2b )
##
## FRT CI Test for Tx Effect Heterogeneity Beyond a Systematic Model
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, interaction.formula = ~x1,
## control.formula = ~x2 + x3 + x4, test.stat = "SKS.stat.int.cov",
## B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.int.cov
## Grid: 151 points tested with 20 repititions/point
## Grid range:
## Z Z:W
## 1.793043 1.853797
## 2.360109 2.297513
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.108 1e-04 1e-04
Now correct for the other covariates as well, but still have correct heterogeneous treatment model. Note that you should still get high \(p\)-value.
Start testing for variation beyond x1 and x2.
tst3b <- detect_idiosyncratic( Y ~ Z, data=ToyData,
interaction.formula = ~ x1 + x2, B=B, test.stat="SKS.stat.int.cov",
verbose=FALSE )
summary( tst3b )
##
## FRT CI Test for Tx Effect Heterogeneity Beyond a Systematic Model
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, interaction.formula = ~x1 +
## x2, test.stat = "SKS.stat.int.cov", B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.int.cov
## Grid: 151 points tested with 20 repititions/point
## Grid range:
## Z Z:Wx1 Z:Wx2
## 1.071071 0.4440212 -0.1189199
## 3.134826 2.7356742 2.0946137
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.044 0.9501 0.7501
Continue to test for ideosyncratic variation beyond x1 and x2, adjusting for x3 and x4. Note that you should still get high \(p\)-value.
tst3c <- detect_idiosyncratic( Y ~ Z, data=ToyData,
interaction.formula = ~ x1 + x2,
control.formula = ~ x3 + x4,
B=B, test.stat="SKS.stat.int.cov",
verbose=FALSE )
summary( tst3c )
##
## FRT CI Test for Tx Effect Heterogeneity Beyond a Systematic Model
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, interaction.formula = ~x1 +
## x2, control.formula = ~x3 + x4, test.stat = "SKS.stat.int.cov",
## B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.int.cov
## Grid: 151 points tested with 20 repititions/point
## Grid range:
## Z Z:Wx1 Z:Wx2
## 1.838445 1.839584 0.6848196
## 2.360001 2.251090 1.2407400
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.068 0.5501 0.4001
Finally, test for ideosyncratic variation beyond all covariates, even irrelevant ones. Again, you should expect to get high \(p\)-value.
tst3d <- detect_idiosyncratic( Y ~ Z, data=ToyData,
interaction.formula = ~ x1 + x2 + x3 + x4,
B=B, test.stat="SKS.stat.int.cov",
verbose=FALSE )
summary( tst3d )
##
## FRT CI Test for Tx Effect Heterogeneity Beyond a Systematic Model
## Call:
## detect_idiosyncratic(formula = Y ~ Z, data = ToyData, interaction.formula = ~x1 +
## x2 + x3 + x4, test.stat = "SKS.stat.int.cov", B = B, verbose = FALSE)
##
## # observations: 500
## Test Statistic: SKS.stat.int.cov
## Grid: 151 points tested with 20 repititions/point
## Grid range:
## Z Z:Wx1 Z:Wx2 Z:Wx3 Z:Wx4
## 1.897276 1.715701 0.6765483 -0.2020442 -0.1411004
## 2.307015 2.316495 1.1294212 0.3874870 0.2749357
## gamma (for CI for grid): 0.000100
##
## Statistic P-Value (Sweep) P-Value (Plug-In)
## 0.06 0.5501 0.3001
We can easily compare the results by simultaneously displaying the outputs from all tested models by using the get.p.value()
method. This method extracts some core summary statistics:
## p.value min.p max.p plug
## 0.35010000 0.01234853 0.59218853 0.15010000
We can thus collect all our models from above, and make a dataframe of the overall results:
tests = list( no_cov=tst1, useless_cov=tst1b, all_covariates=tst2,
non_tx_covariates_only=tst2b, het_beyond_x1 = tst3a1,
het_beyond_x1_with_x3_x4_cov=tst3a2, het_beyond_x1_with_all_cov=tst3a2,
het_beyond_x1_x2=tst3b,
het_beyond_x1_x2_with_cov=tst3c, het_beyond_all=tst3d )
agg.res = purrr::map( tests, get.p.value ) %>%
purrr::map( as.list )
agg.res = bind_rows( agg.res, .id = "test" )
agg.res
## # A tibble: 10 × 5
## test p.value min.p max.p plug
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 no_cov 0.150 0 0.379 0.100
## 2 useless_cov 0.350 0.0123 0.592 0.150
## 3 all_covariates 0.0001 0 0.168 0.0001
## 4 non_tx_covariates_only 0.0001 0 0.168 0.0001
## 5 het_beyond_x1 0.350 0.0321 0.592 0.200
## 6 het_beyond_x1_with_x3_x4_cov 0.0001 0 0.168 0.0001
## 7 het_beyond_x1_with_all_cov 0.0001 0 0.168 0.0001
## 8 het_beyond_x1_x2 0.950 0.408 0.999 0.750
## 9 het_beyond_x1_x2_with_cov 0.550 0.0573 0.769 0.400
## 10 het_beyond_all 0.550 0.0573 0.769 0.300
Let’s fit a model that allows for no systematic treatment impact heterogeniety. This means that all variation would have to be considered ideosyncratic. The key, however, is we control for covariates to increase precision.
##
## Call:
## lm(formula = Y ~ Z + x1 + x2 + x3 + x4, data = ToyData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8197 -0.9376 -0.1076 1.0141 4.1728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.06653 0.09234 11.55 <2e-16 ***
## Z 1.95010 0.13065 14.93 <2e-16 ***
## x1 2.11336 0.06757 31.27 <2e-16 ***
## x2 2.43754 0.06926 35.19 <2e-16 ***
## x3 3.97540 0.06682 59.49 <2e-16 ***
## x4 -0.01637 0.06829 -0.24 0.811
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.455 on 494 degrees of freedom
## Multiple R-squared: 0.9243, Adjusted R-squared: 0.9235
## F-statistic: 1206 on 5 and 494 DF, p-value: < 2.2e-16
The estimated ATE is close to the truth, as expected considering the random assignment.
Next plot residual CDFS of treatment and control groups.
plot( ecdf( resid(ll1)[ToyData$Z==1] ), pch=".", main="Residual CDFs of treatment and control" )
plot( ecdf( resid(ll1)[ToyData$Z==0] ), pch=".", col="red", add=TRUE )
Note the residual ECDFs from above are quite aligned. The Tx effect variation has been picked up by main effects which means we would not detect ideosyncratic variation even though there is such variation. In other words, the treatment variation has been distributed across the residuals of the control group as well as treatment, and thus when we compare the distributions they are the same. This is why the choice of test statistic is delicate, and why for the covariate adjusted SKS statistic, we need to fit the model on the control units only, and then extract the residuals.
We also offer an adjusted variance ratio test:
## pvalue var1 var0 ratio log.ratio asy.se z
## 1 0.0004993163 32.53071 20.6582 1.574711 0.454072 0.1379776 3.290912
This does not use permutation inference.
We offer several test statistics one might use. We also have a method to print out some info on what is available:
##
## ## -----------------------------------------------------
## ## Available test statistics for detect.idiosyncratic():
## ## -----------------------------------------------------
## Test statistics when not adjusting for covariates or specifying interactions:
## 1) KS.stat - Calculate classic (not shifted) KS statistic. If tau passed, Y1 will be shifted by tau.
## 2) SKS.stat - Shifted KS statistic. Calculate KS distance between Y0 and Y1 shfited by tau.
## 3) rq.stat - Classic KS statistic via quantile regression with covariates
##
## Test statistics when adjusting for covariates but not specifying interactions:
## 1) SKS.stat.cov.pool - Shifted KS statistic with covariates to increase precision.
## 2) SKS.stat.cov - Shifted KS statistic with covariates with model for outcomes calculated on control group only.
## This avoids 'splitting' the treatment variation between treatment and control groups. We recommend this method
## over the 'pool' method in SKS.stat.cov.pool.
## 3) SKS.stat.cov.rq - Shifted KS statistic via quantile regression with covariates.
## 4) rq.stat.cond.cov - KS statistic via quantile regression with covariates. Conditional approach; see Koenker
## and Xiao (2002) for more information.
## 5) rq.stat.uncond.cov - KS statistic via quantile regression with covariates. Unconditional approach; see
## Firpo (2007) for more information.
##
## Test statistics when specifying interactions, with or without covariate adjustment:
## 1) SKS.stat.int.cov.pool - Shifted KS statistic with a linear treatment effect model and optional covariate
## adjustment. This will attempt to remove any systematic variation corresponding to the specified interaction
## model and then return an SKS statistic on the residuals to measure any variation 'left over'. This is the
## test statistic used in Ding, Feller, and Miratrix (2016), JRSS-B.
## 2) SKS.stat.int.cov - Similar to SKS.stat.int.cov.pool, but this method first adjusts for baseline and then
## models treatment effects on the residuals to not split treatment effects. We recommend this method over
## the 'pool' method in SKS.stat.int.cov.pool.
##
## Test statistics when specifying interactions, without covariate adjustment:
## 1) WSKS.t - Calculates the shifted KS statistic within each group specified in the interaction model, and
## then aggregates together as a weighted average. Should be used when the interaction model is a single
## categorical or factor covariate.
## 2) SKS.pool.t - Subtract off group level treatment effect estimates and then look at KS statistic on residuals.
## Should be used when the interaction model is a single categorical or factor covariate.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.