The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Sensitivity analysis for external validity

In the following vignette, we will walk through a basic example of how to conduct a sensitivity analysis for external validity using senseweight. We will use a subset of the JTPA data.

library(senseweight)
library(ggplot2)
# Load in JTPA data:
data(jtpa_women)
# Summarize sites
jtpa_women |>
  dplyr::group_by(site) |>
  dplyr::summarize(
    length(prevearn),
    dplyr::across(
      c(prevearn, age, married, hrwage, black, hispanic, hsorged, yrs_educ), 
      mean
    )
  )
#> # A tibble: 16 × 10
#>    site  `length(prevearn)` prevearn   age married hrwage   black hispanic
#>    <chr>              <int>    <dbl> <dbl>   <dbl>  <dbl>   <dbl>    <dbl>
#>  1 CC                   524    1855.  32.1  0.219    479. 0.101    0.693  
#>  2 CI                   190    2250.  33.5  0.253    458. 0.0684   0.0105 
#>  3 CV                   788    2192.  33.6  0.278    455. 0.173    0.00635
#>  4 HF                   234    1997.  31.6  0.184    455. 0.432    0.0342 
#>  5 IN                  1392    3172.  34.9  0.193    466. 0.243    0.0194 
#>  6 JC                    81    2564.  30.6  0.136    531. 0.642    0.247  
#>  7 JK                   353    1928.  30.0  0.113    453. 0.912    0      
#>  8 LC                   485    3039.  33.9  0.258    464. 0.0165   0.165  
#>  9 MD                   177    2915.  34.6  0.181    480. 0.367    0      
#> 10 MN                   179    2215.  37.6  0.352    454. 0.00559  0.0782 
#> 11 MT                    38    1680.  33.8  0.395    474. 0        0.0526 
#> 12 NE                   636    2161.  31.7  0.0975   477. 0.511    0.0377 
#> 13 OH                    74    2568.  34.6  0.324    486. 0.0135   0      
#> 14 OK                    87    2320.  37.3  0.126    586. 0.759    0.0805 
#> 15 PR                   463    1783.  32.8  0.0842   506. 0.268    0.378  
#> 16 SM                   401    2997.  32.2  0.284    429. 0.0200   0.00249
#> # ℹ 2 more variables: hsorged <dbl>, yrs_educ <dbl>

Assume researchers are interested in generalizing the results from the site of Omaha, Nebraska to the other 15 experimental sites:

site_name <- "NE"
df_site <- jtpa_women[which(jtpa_women$site == site_name), ]
df_else <- jtpa_women[which(jtpa_women$site != site_name), ]

# Estimate unweighted estimator:
model_dim <- estimatr::lm_robust(Y ~ T, data = df_site)
PATE <- coef(lm(Y ~ T, data = df_else))[2]
DiM <- coef(model_dim)[2]

# Generate weights using observed covariates:
df_all <- jtpa_women
df_all$S <- ifelse(jtpa_women$site == "NE", 1, 0)
model_ps <- WeightIt::weightit(
  (1 - S) ~ . - site - T - Y, 
  data = df_all, method = "ebal", estimand = "ATT"
)
weights <- model_ps$weights[df_all$S == 1]

# Estimate IPW model:
model_ipw <- estimatr::lm_robust(Y ~ T, data = df_site, weights = weights)
ipw <- coef(model_ipw)[2]

# Estimate bound for var(tau):
m <- sqrt(stats::var(df_site$Y[df_site$T == 1]) / stats::var(df_site$Y[df_site$T == 0]))
# Since m > 1:
vartau <- stats::var(df_site$Y[df_site$T == 1]) - stats::var(df_site$Y[df_site$T == 0])

Sensitivity Summary Measures

Robustness value

We can generate the sensitivity summary measures using the summarize_sensitivity function:

summarize_sensitivity(
  weights = weights, 
  Y = df_site$Y, 
  Z = df_site$T, 
  sigma2 = vartau, 
  estimand = "PATE"
)
#>   Unweighted Unweighted_SE Estimate     SE   RV sigma_tau_bound cor_w
#> Z    1107.35        982.65  1356.66 1417.3 0.36          2897.9  0.07

The summarize_sensitivity function defaults to evaluating the robustness value at q=1, indicating a robustness value, relative to a bias equal to the point estimate. Researchers can specify different values for q in the function. In the generalization setting, researchers can modify the sigma2 bound and posit their own values for a plausible bound (given substantive justification). With no specification, sigma2 will be automatically calculated to be bound by var(Y(1)) + var(Y(0)).

RV = robustness_value(estimate = ipw, b_star = 0, sigma2 = vartau, weights = weights)
print(RV)
#> [1] 0.4113622

Benchmarking

# Select weighting variables:
weighting_vars = names(df_all)[which(!names(df_all) %in% c("site", "S", "Y", "T"))]

# Run bechmarking:
df_benchmark <- run_benchmarking(
  weighting_vars = weighting_vars,
  data = df_all[, -1],
  treatment = "T", outcome = "Y", selection = "S",
  estimate = ipw,
  RV = RV, sigma2 = vartau,
  estimand = "PATE"
)


print(df_benchmark)
#>   variable R2_benchmark rho_benchmark    bias    MRCS k_sigma_min k_rho_min
#> 1 prevearn         0.04         -0.22 -115.00  -11.80        9.99     -2.92
#> 2      age         0.06         -0.09  -55.45  -24.47        6.91     -7.37
#> 3  married         0.11          0.00   -2.52 -539.33        3.82   -224.19
#> 4   hrwage         0.05          0.02   13.51  100.40        8.32     27.40
#> 5    black         0.20         -0.03  -33.11  -40.97        2.03    -24.72
#> 6 hispanic         0.14         -0.06  -62.63  -21.66        3.01    -10.30
#> 7  hsorged         0.12          0.10   95.06   14.27        3.51      6.22
#> 8 yrs_educ         0.00         -0.07   -5.23 -259.49      408.90     -9.85

Bias Contour Plot

contour_plot(
  var(weights), vartau, ipw, df_benchmark,
  benchmark = TRUE, shade = TRUE,
  shade_var = c("age", "prevearn"),
  label_size = 4
) 

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.