The vbm package implements variance-based
sensitivity analysis for weighting estimators, providing more
informative bounds on treatment effect estimates in observational
studies.
If the package vbm is not installed in their current R versions, users can install it by following the standard instruction:
Each time an R session is opened, the vbm library must be loaded with:
Moreover, the development version of vbm can be installed from GitHub with:
The National Child Development Survey (NCDS) follows a group of individuals born in the United Kingdom back in 1958 1: This longitudinal study collects respondents’ information on education, family, socioeconomic status and health. Among over 17,000 individuals, we took a subset of 3,642 men who were working in 1991 and had complete data on their education and wages, following the approach of Battistin and Sianesi (2011). Missing values were then imputed using MICE (van Buuren & Groothuis-Oudshoorn, 2010). We excluded variables related to education level and wage to avoid data leakage in this study.
We are interested in whether attainment of academic qualification
leads to higher hourly wage. The outcome variable wage is
log of hourly pay in British pounds. The treatment variable
Dany is a binary indicator for education attainment,
i.e. whether the respondents has attained any academic qualification.
2399 people have qualification no lower than O-level, while the left
1243 have no formal education credential. A variety of covariates to
control are considered as follows:
white (whether is identified as white people),
scht (type of school at age 16), qmab &
qmab2 (math scores at ages 7 & 11), qvab
& qvab2 (reading scores at ages 7 & 11),
sib_u (number of siblings), agepa &
agema (age of father and mother in 1974),
maemp (mother’s job status in 1974), paed_u
& maed_u (years of schooling for father and mother). In
terms of potential unobserved confoundedness, differences in
neighborhood culture, economic background and mental health can be
considered.
Using str() helps to give a glimpse on the dataset.
library(WeightIt)
library(dplyr)
library(ggplot2)
library(jointVIP)
library(cobalt)
library(vbm)
data(NCDS)
str(ncds)
#> 'data.frame': 3642 obs. of 14 variables:
#> $ white : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ wage : num 2.57 2.04 1.72 2.2 2.48 ...
#> $ Dany : int 1 1 0 1 1 0 0 1 1 1 ...
#> $ maemp : int 0 0 0 0 1 1 0 1 0 1 ...
#> $ scht : int 2 1 1 3 1 2 1 1 1 3 ...
#> $ qmab : int 2 5 4 5 3 1 4 5 5 2 ...
#> $ qmab2 : int 2 5 4 4 3 1 1 4 4 5 ...
#> $ qvab : int 1 5 4 4 2 2 2 4 3 4 ...
#> $ qvab2 : int 2 5 5 5 3 2 1 3 1 5 ...
#> $ paed_u: int 9 0 0 10 9 10 0 11 10 10 ...
#> $ maed_u: int 9 0 0 10 9 9 0 11 9 10 ...
#> $ agepa : int 60 56 57 40 57 43 43 46 43 47 ...
#> $ agema : int 59 56 53 41 45 42 38 45 43 40 ...
#> $ sib_u : int 3 0 0 1 1 1 1 1 0 3 ...The outcome variable can be influenced by treatment variable, covariates, and other unobserved confounders. Given the lack of randomization in treatment assignment, we examined the covariate imbalance and adjusted it by weighting to have credible causal inference conclusion.
WeightIt package offers a handy approach to estimate propensity score in weighting:
weightlist_att <- weightit(
formula = Dany ~ white + maemp + scht +
qmab + qmab2 +
qvab + qvab2 +
paed_u + maed_u +
agepa + agema +
sib_u,
data = ncds,
method = "ebal",
estimand = "ATT",
stabilize = FALSE,
include.obj = TRUE
)
summary(weightlist_att)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 1. || 1.
#> control 0.019 |---------------------------| 27.489
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 1 2 3 4 5
#> treated 1 1 1 1 1
#> 1135 503 185 129 55
#> control 15.566 15.887 16.575 21.105 27.489
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.000 0.000 0.000 0
#> control 1.974 1.036 0.906 0
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 1243. 2399
#> Weighted 253.97 2399
love.plot(weightlist_att, stars="std")
The following love plot shows the standardized mean differences between
treated and control groups in each observed covariate. Red and green
dots respectively indicate pre-weighting and post-weighting standardized
mean differences. Before adjustment, early academic performances—such as
reading and math scores at age 7 and 11—shows substantial imbalance.
Entropy balancing effectively eliminates the mean differences as
adjusted dots cluster around 0. This conclusion is also supported by the
joint variable importance plot generated by
jointVIP.
jointVIP focuses on identification of variables that should be prioritized for adjustment based on both treatment and outcome, while traditional love plot only considers treatment imbalance. The two dimensions of variable’s confoundedness possibility by jointVIP gives a wider picture in variable importance measurement. Treatment imbalance and outcome association can be estimated by standardized mean difference by treatment and correlation between outcome and covariates. Bias is mathematically proportional to their product. In the context of variable selection, estimation on whole dataset will cause data leakage, thus an individual pilot dataset should be included for covariate balance evaluation. In this case, a separate subsample from the survey population is used in jointVIP. We will go back to the full dataset in the following analysis.
set.seed(123)
pilot_prop = 0.1
controls <- which(ncds$Dany == 0)
pilot_indices <- sample(controls, length(controls) * pilot_prop)
pilot_df <- ncds[pilot_indices, ]
analysis_data <- ncds[-pilot_indices, ]
analysis_data1 <- ncds[-pilot_indices, ]
post_df <- weightit(
formula = Dany ~ white + maemp + scht +
qmab + qmab2 +
qvab + qvab2 +
paed_u + maed_u +
agepa + agema +
sib_u,
data = analysis_data,
method = "ebal",
estimand = "ATT",
stabilize = FALSE,
include.obj = TRUE
)
new_jointVIP <- create_jointVIP(
treatment = "Dany",
outcome = "wage",
covariates = c("white", "maemp", "scht", "qmab", "qmab2", "qvab",
"qvab2", "paed_u", "maed_u", "agepa", "agema", "sib_u"),
pilot_df = pilot_df,
analysis_df = analysis_data
)
plot(create_post_jointVIP(new_jointVIP,
post_analysis_df = analysis_data),
plot_title = "Pre-weighting jointVIP: Educational Attainment Analysis")plot(create_post_jointVIP(new_jointVIP,
post_analysis_df = analysis_data,
wts = post_df$weights),
plot_title = "Post-weighting jointVIP: Educational Attainment Analysis")
Similarly, jointVIP shows that reading score at age 11 has the strongest
association with both treatment and outcome, followed by reading score
at 7 and math score at 11. Number of siblings along with these three is
most likely to be an influential confounder in this study. After entropy
balancing, treatment imbalance is largely reduced, while outcome
correlation remains unchanged.
Given this successful weighting, we move on to estimate average treatment effect of the treated units and sensitivity analysis with respect to unobserved confounders.
run_sensitivity_analysis() is capable of both ATT
estimation and sensitivity analysis.
# Run the sensitivity analysis
vbm_results <- run_sensitivity_analysis(
weightlist = weightlist_att,
Y = "wage",
treatment = "Dany",
data = ncds,
approach = "vbm",
n_bootstrap = 1000,
seed = 331,
n_cores = 1,
benchmarking = TRUE
)
vbm_results$ipw_estimate
#> treatment
#> 0.2260783
vbm_results$ipw_se
#> [1] 0.02732327
vbm_results$difference_in_means
#> treatment
#> 0.3254757
vbm_results$difference_in_means_se
#> [1] 0.01352582The unadjusted ATT is 0.33 with standard error of 0.01, while the adjusted is 0.23 with standard error of 0.03. After covariate adjustment and accounting of the log scale, respondents with academic qualification have approximately 1.26 times higher wage than those without. The causal effect is 95% significant regardless of covariate balancing.
The variance-based sensitivity model (Huang & Pimentel, 2024) quantifies sensitivity to unobserved confounding by constraining the residual variance in the inverse propensity weights not explained by observed covariates. Instead of bounding the worst-case individual error as in traditional marginal sensitivity models, this approach bounds the distributional differences between the ideal weights (which adjust for both observed and unobserved confounders) and the observable weights (which adjust only for observed covariates). The key parameter is an \(R^2\) value, ranging from 0 to 1, that represents the proportion of variation in the true weights unexplained by the estimated weights. For a given \(R^2\), the maximum possible bias from omitting a confounder has a closed-form expression involving the correlation between weights and outcomes, the outcome variance, and the weight variance. This formulation moves away from worst-case bounds, yielding more informative and stable sensitivity intervals while maintaining a standardized, interpretable sensitivity parameter that can be benchmarked against observed covariates.
head(vbm_results$point_bounds)
#> R2 ATT_low ATT_high
#> 1 0.00 0.22607828 0.2260783
#> 2 0.01 0.15375973 0.2983968
#> 3 0.02 0.12328392 0.3288726
#> 4 0.03 0.09953412 0.3526224
#> 5 0.04 0.07919860 0.3729580
#> 6 0.05 0.06099977 0.3911568
head(vbm_results$benchmark_parameters)
#> variable bias R2_benchmark rho_benchmark
#> 1 white 9.683647e-05 0.0025764368 0.002622954
#> 2 maemp 5.006839e-04 0.0007849679 0.024591711
#> 3 scht -1.112915e-02 0.1003143359 -0.045882530
#> 4 qmab -4.970010e-03 0.0128957701 -0.059860009
#> 5 qmab2 -1.364044e-02 0.1024507301 -0.055580385
#> 6 qvab -4.235385e-03 0.0658596166 -0.021958920
plot_sensitivity_analysis(
vbm_results,
parameter_level = seq(0, 0.15, by = 0.01),
benchmarking = TRUE,
benchmark_variable = c("scht", "qmab2", "agema", "qvab2", "white", "qmab", "qvab"),
variable_name = c("Selective School", "Math Score at 11", "Mother's Age",
"Reading Score at 11", "White",
"Math Score at 7", "Reading Score at 7"),
header = "Effect of Education Attainment on Hourly Wages by VBM"
)
In NCDS study, the above figure displays the variation of point estimate
bounds and confidence interval of ATT with respect to different
unobserved confounders under variance-based sensitivity model. The solid
bars denote the point estimate bounds for a specified value. The lighter
intervals represent the 95% confidence intervals. Benchmarked parameters
for selected observed covariates (selective school, math score at 11,
mother’s age, reading score at 11, white, math score at 7, reading score
at 7) are provided in light brown. The critical \(R^2_*\) is plotted in dashed line.
We estimated \(R^2_*=0.05\), which implies that if an omitted confounder explains 5% or more of the variation in the true weights, our estimated effect of education attainment on log of hourly wage is no longer significantly different from the expected distribution under the null. To assess the plausibility of an omitted confounder resulting in an \(R^2\) value of 0.05, we extended formal benchmarking approaches to calibrate possible \(R^2\) values against the strength of observed covariates. The benchmarking results indicate that math and score at 11 might have strong impact on causal effect if neglected under variance-based model. The results echo the finding from jointVIP in Section 2. Comparing the critical and the benchmarking values, we can conclude that if the unobserved confounder accounts for the variance of weights at the lower level of math score at age of 7, the average treatment effect for the treated may be significant; otherwise, the causal conclusion will be altered. Notice that this dataset has a sensitive causal effect due to misclassification of treatment. See more details at Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kingdom (Battistin.E & Sianesi.B, 2011).
The marginal sensitivity model (Zhao at el., 2019) bounds the individual-level multiplicative error in propensity weights using a parameter \(\Lambda \geq 1\), such that \(\Lambda^{-1} \leq w^*(X,U)/w(X) \leq \Lambda\) for all observed covariates \(X\) and unobserved confounders \(U\). For a fixed \(\Lambda\), researchers compute worst-case bias bounds and confidence intervals, then increase \(\Lambda\) until the interval contains zero. The resulting \(\Lambda^*\) value indicates the confounding strength needed to overturn the estimated treated effect. While widely used due to its simplicity, the MSM’s worst-case constraint can yield overly wide or misleadingly narrow intervals depending on the underlying distribution of unobserved confounders and the degree of outcome overlap between treatment groups.
msm_results <- run_sensitivity_analysis(
weightlist = weightlist_att,
Y = "wage",
treatment = "Dany",
data = ncds,
approach = "msm",
n_bootstrap = 1000,
seed = 331,
n_cores = 1,
benchmarking = TRUE
)
msm_results$lambda_star
#> [1] 2
head(msm_results$point_bounds)
#> lambda ATT_low ATT_high
#> 1 1.00 0.22607828 0.2260783
#> 2 1.25 0.16005009 0.2913671
#> 3 1.50 0.10663907 0.3443859
#> 4 1.75 0.06279558 0.3893368
#> 5 2.00 0.02590018 0.4279904
#> 6 2.25 -0.00742418 0.4616414
msm_results$benchmark_parameters
#> variable lambda
#> 1 white 1.8
#> 2 maemp 1.1
#> 3 scht 3.5
#> 4 qmab 1.4
#> 5 qmab2 3.4
#> 6 qvab 2.4
#> 7 qvab2 2.9
#> 8 paed_u 1.4
#> 9 maed_u 1.3
#> 10 agepa 2.0
#> 11 agema 2.6
#> 12 sib_u 4.8
plot_sensitivity_analysis(
msm_results,
parameter_level = seq(1, 5.5, by = 0.25),
benchmarking = TRUE,
benchmark_variable = c("maemp", "qvab", "scht", "sib_u", "qvab2", "qmab2", "qmab"),
variable_name = c("Mother's Employment", "Reading Score at 7", "Selective School",
"No. of Siblings", "Reading Score at 11", "Math Score at 11",
"Math Score at 7"),
header = "Effect of Education Attainment on Hourly Wages by MSM"
)
This figure displays the variation of point estimate bounds and
confidence interval of unconfounded ATT with respect to different \(\lambda\). The solid bars denote the point
estimate bounds for a specified \(\lambda\) value. The lighter intervals
represent the 95% confidence intervals. Benchmarked parameters selected
for similar observed covariates are provided in light brown. The
critical \(\Lambda*\) is plotted in
dashed line.
Estimated critical \(\Lambda*\) is 2. If the largest possible error that can arise from omitting a confounder is larger than 2, the estimated average treatment effect of the treated units will lose its significance. According to the benchmarking \(\Lambda\) values, math and reading score at 7 are the most important covariates, which is partially revealed by jointVIP. Comparing the critical \(\Lambda*\) and benchmarked values, the average treatment effect for the treated is not robust if there exists some unconfoundedness as strong as twice that of mother’s employment or equally important as reading score at 11 in influencing the variance of weights.
Variance-based model gives more informative bounds in terms of rare unobserved covariates, while marginal sensitivity is historically widely used for its good interpretability. In typical variance-based model, there is one single parameter accounting for the proportion of residual variation in true weights due to the omission of unobserved confounding where the assumption takes the correlation between the outcome and the imbalance in unobserved confounders as 1. However, if researchers have some intuition on the omitted confounding and outcome, they can have a tighter bounds by specifying this correlation in variance-based model, and obtain a tighter bounds on point estimates and confidence intervals.
# Run the sensitivity analysis
vbm_w_cor_results <- run_sensitivity_analysis(
weightlist = weightlist_att,
Y = "wage",
treatment = "Dany",
data = ncds,
approach = "vbm_w_cor",
R2_seq = seq(0, 0.8, by = 0.01),
cor_eps_seq = rep(0.8, 81),
n_bootstrap = 1000,
n_cores = 1,
seed = 331
)
# Critical lambda*
vbm_w_cor_results$Rstar
#> [1] 0.07
# Point bounds by vbm with less conservative bounds
head(vbm_w_cor_results$point_bounds)
#> R2 rho ATT_low ATT_high
#> 1 0.00 0.8 0.2260783 0.2260783
#> 2 0.01 0.8 0.1859928 0.2661637
#> 3 0.02 0.8 0.1691004 0.2830562
#> 4 0.03 0.8 0.1559361 0.2962205
#> 5 0.04 0.8 0.1446643 0.3074922
#> 6 0.05 0.8 0.1345769 0.3175797
# Point bounds by vbm with worst case bounds
head(vbm_results$point_bounds)
#> R2 ATT_low ATT_high
#> 1 0.00 0.22607828 0.2260783
#> 2 0.01 0.15375973 0.2983968
#> 3 0.02 0.12328392 0.3288726
#> 4 0.03 0.09953412 0.3526224
#> 5 0.04 0.07919860 0.3729580
#> 6 0.05 0.06099977 0.3911568
# Point bounds by msm with worst case bounds
head(msm_results$point_bounds)
#> lambda ATT_low ATT_high
#> 1 1.00 0.22607828 0.2260783
#> 2 1.25 0.16005009 0.2913671
#> 3 1.50 0.10663907 0.3443859
#> 4 1.75 0.06279558 0.3893368
#> 5 2.00 0.02590018 0.4279904
#> 6 2.25 -0.00742418 0.4616414
# Confidence intervals by vbm with less conservative bounds
head(vbm_w_cor_results$confidence_intervals)
#> R2 rho ci_low ci_high
#> 1 0.00 0.8 0.16640460 0.2875252
#> 2 0.01 0.8 0.10658844 0.3469118
#> 3 0.02 0.8 0.08170304 0.3737700
#> 4 0.03 0.8 0.06081503 0.3936648
#> 5 0.04 0.8 0.04331116 0.4107221
#> 6 0.05 0.8 0.02640352 0.4262472
# Confidence intervals by vbm with worst case bounds
head(vbm_results$confidence_intervals)
#> R2 ci_low ci_high
#> 1 0.00 0.166404603 0.2875252
#> 2 0.01 0.092935562 0.3623717
#> 3 0.02 0.060688592 0.3948949
#> 4 0.03 0.034658128 0.4196833
#> 5 0.04 0.013557837 0.4426278
#> 6 0.05 -0.005090294 0.4631943
# Confidence intervals by msm with worst case bounds
head(msm_results$confidence_intervals)
#> lambda ci_low ci_high
#> 1 1.00 0.166404603 0.2875252
#> 2 1.25 0.100726540 0.3534666
#> 3 1.50 0.047401891 0.4080163
#> 4 1.75 0.003236563 0.4546177
#> 5 2.00 -0.035025822 0.4962254
#> 6 2.25 -0.069441545 0.5348832When assuming the correlation between the outcome and the imbalance in unobserved confounders is bounded by 0.8, the critical \(R^2_*\) is 0.07 and tighter point bounds and confidence intervals can be obtained. This method gives a more realistic sensitivity analysis if researchers have further information on potentially unobserved confounding.
Huang, M., & Pimentel, S. D. (2025). Variance-based sensitivity analysis for weighting estimators results in more informative bounds. Biometrika, 112(1).
Zhao, Q., Small, D. S. & Bhattacharya, B. B. (2019). Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81.
Battistin, E., & Sianesi, B. (2011). Misclassified Treatment Status and Treatment Effects: An Application to Returns to Education in the United Kingdom. The Review of Economics and Statistics, 93(2), 495–509.
Buuren Sv, Groothuis-Oudshoorn K (2010). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, pp. 1–68. doi:10.18637/jss.v045.i03.