The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
WMAP (Weighted Multi-site Analysis Package) integrates evidence from multiple observational studies with different group compositions using balancing weights.
# Load demonstration dataset
data(demo)
# The demo data contains:
# S - Study/site indicators (7 hospitals)
# Z - Group indicators (2 groups: IDC vs ILC)
# X - Patient covariates (clinical variables)
# Y - Outcomes (8 gene expression measurements)
# naturalGroupProp - Known population group proportions# Estimate causal effects
result <- causal.estimate(
S = S, # Study indicators
Z = Z, # Group indicators
X = X, # Covariates
Y = Y, # Outcomes
B = 20, # Bootstrap samples (use 100+ in practice)
method = "IC", # Weighting method
naturalGroupProp = naturalGroupProp
)
#> *************************
#> Bootstrap: 1
#> *************************
#> *************************
#> Bootstrap: 2
#> *************************
#> *************************
#> Bootstrap: 3
#> *************************
#> *************************
#> Bootstrap: 4
#> *************************
#> *************************
#> Bootstrap: 5
#> *************************
#> *************************
#> Bootstrap: 6
#> *************************
#> *************************
#> Bootstrap: 7
#> *************************
#> *************************
#> Bootstrap: 8
#> *************************
#> *************************
#> Bootstrap: 9
#> *************************
#> *************************
#> Bootstrap: 10
#> *************************
#> *************************
#> Bootstrap: 11
#> *************************
#> *************************
#> Bootstrap: 12
#> *************************
#> *************************
#> Bootstrap: 13
#> *************************
#> *************************
#> Bootstrap: 14
#> *************************
#> *************************
#> Bootstrap: 15
#> *************************
#> *************************
#> Bootstrap: 16
#> *************************
#> *************************
#> Bootstrap: 17
#> *************************
#> *************************
#> Bootstrap: 18
#> *************************
#> *************************
#> Bootstrap: 19
#> *************************
#> *************************
#> Bootstrap: 20
#> *************************# Print summary
summary(result)
#> ===========================================
#> Summary of WMAP Causal Estimates
#> ===========================================
#> Method: IC
#> Bootstrap samples: 20
#> Effective sample size (ESS): 39.44%
#>
#> Mean differences with 95% CI:
#> -------------------------------------------
#> Outcome Estimate (95% CI)
#> -------------------------------------------
#> Y 1 0.12 (-0.32, 0.08)
#> Y 2 -0.76 (-0.82, -0.34)
#> Y 3 -0.88 (-1.07, -0.57)
#> Y 4 -0.20 (-0.21, 0.26)
#> Y 5 0.33 ( 0.19, 0.45)
#> Y 6 -0.53 (-0.53, -0.14)
#> Y 7 0.01 (-0.22, 0.18)
#> Y 8 -0.33 (-0.89, -0.36)
#>
#> Standard Deviation Ratios with 95% CI:
#> -------------------------------------------
#> Outcome Ratio (95% CI)
#> -------------------------------------------
#> Y 1 1.61 ( 1.39, 1.85)
#> Y 2 1.43 ( 0.88, 1.37)
#> Y 3 1.40 ( 0.97, 1.55)
#> Y 4 1.68 ( 1.02, 1.50)
#> Y 5 1.76 ( 1.21, 2.07)
#> Y 6 1.08 ( 0.95, 1.32)
#> Y 7 0.73 ( 0.29, 3.35)
#> Y 8 1.43 ( 0.57, 0.99)
# Visualize bootstrap ESS distribution
plot(result)Effective Sample Size (ESS): The Percent ESS indicates how much information remains after weighting:
Mean Differences: Shows estimated differences between Group 1 and Group 2 with 95% confidence intervals.
SD Ratios: Ratios of standard deviations between groups.
WMAP provides three methods:
# Compare methods
result_ic <- causal.estimate(S, Z, X, Y, B = 100, method = "IC", naturalGroupProp)
result_igo <- causal.estimate(S, Z, X, Y, B = 100, method = "IGO", naturalGroupProp)
result_flexor <- causal.estimate(S, Z, X, Y, B = 100, method = "FLEXOR", naturalGroupProp)
# Compare ESS
result_ic$percentESS
result_igo$percentESS
result_flexor$percentESSBy default, causal.estimate uses observed outcomes
directly (outcome_model = FALSE). Set
outcome_model = TRUE to fit a random forest on X, S, Z and
substitute predicted outcomes for observed Y in all moment calculations
(mean, SD, median).
# Default: use observed outcomes directly
result <- causal.estimate(S, Z, X, Y, B = 100, method = "IC",
naturalGroupProp = naturalGroupProp)
# RF predictions substituted for observed Y
result <- causal.estimate(S, Z, X, Y, B = 100, method = "IC",
naturalGroupProp = naturalGroupProp,
outcome_model = TRUE)When outcome_model = TRUE, a random forest is fit per
outcome per bootstrap iteration, which increases runtime.
By default (crossfit_folds = NULL), the random forest is
fit on the full sample. Set crossfit_folds to an integer
>= 2 for K-fold cross-fitting:
# Minimum for stable CIs
result <- causal.estimate(S, Z, X, Y, B = 50, method = "IC", naturalGroupProp)
# Recommended for publication
result <- causal.estimate(S, Z, X, Y, B = 200, method = "IC", naturalGroupProp)
# High precision
result <- causal.estimate(S, Z, X, Y, B = 500, method = "IC", naturalGroupProp)Trade-off: More bootstrap samples = better precision but longer computation time
Skip bootstrap for quick exploration:
Speed up computation using multiple cores:
# Install required packages first
install.packages(c("future", "future.apply"))
# Define n_cores based on your system (e.g., 4)
result <- causal.estimate(
S, Z, X, Y,
B = 100,
method = "IC",
naturalGroupProp = naturalGroupProp,
parallel = TRUE,
n_cores = 4
)The parallelization strategy depends on the environment. On
Mac/Linux, multicore is used: workers are created by
duplicating the current R process, so data is already in memory and no
copying is required. On Windows, multisession is used: each
worker is a fresh R process that receives a copy of the data, which has
higher startup overhead. RStudio disables multicore
regardless of platform, so multisession will be used when
running inside RStudio. For best performance on Mac/Linux, run your
script from the terminal using Rscript or an interactive R
session.
WMAP automatically checks for common problems and warns you:
# 1. Load and check data
data(demo)
table(S, Z) # Check all cells have observations
sum(is.na(cbind(S, Z, X, Y))) # Check for missing values
# 2. Run analysis
result <- causal.estimate(
S, Z, X, Y,
B = 100,
method = "FLEXOR",
naturalGroupProp = naturalGroupProp
)
# 3. Check diagnostics
summary(result)
plot(result)# Compare all three methods
set.seed(123)
results_ic <- causal.estimate(S, Z, X, Y, B = 100, method = "IC", naturalGroupProp)
results_igo <- causal.estimate(S, Z, X, Y, B = 100, method = "IGO", naturalGroupProp)
results_flexor <- causal.estimate(S, Z, X, Y, B = 100, method = "FLEXOR", naturalGroupProp)
# Compare ESS
cat("IC ESS:", results_ic$percentESS, "%\n")
cat("IGO ESS:", results_igo$percentESS, "%\n")
cat("FLEXOR ESS:", results_flexor$percentESS, "%\n")
# Compare estimates
summary(results_ic)
summary(results_igo)
summary(results_flexor)# Test sensitivity to bootstrap sample size
result_b50 <- causal.estimate(S, Z, X, Y, B = 50, method = "IGO", naturalGroupProp)
result_b100 <- causal.estimate(S, Z, X, Y, B = 100, method = "IGO", naturalGroupProp)
result_b200 <- causal.estimate(S, Z, X, Y, B = 200, method = "IGO", naturalGroupProp)
# Compare CI widths
summary(result_b50)
summary(result_b100)
summary(result_b200)Extract balancing weights without running full causal analysis:
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.