The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
MLCausal provides a tidy, end-to-end pipeline for causal inference in clustered (multilevel) observational data — students in schools, patients in hospitals, employees in firms.
The standard API uses five consistent argument names across every function:
| Argument | Meaning |
|---|---|
treatment |
Name of the 0/1 treatment variable |
outcome |
Name of the outcome variable |
covariates |
Character vector of covariate names |
cluster |
Name of the cluster identifier |
weights |
Name of the weight variable |
The main workflow:
simulate_ml_data() → ml_ps() → ml_weight() or ml_match()
→ balance_ml() → estimate_att_ml() → sens_ml()
library(MLCausal)
dat <- simulate_ml_data(n_clusters = 20, cluster_size = 25,
n_min = 10, seed = 42)
head(dat)
#> school_id x1 x2 x3 z y
#> 1 1 0.1602218 1 1.19316091 1 3.601848
#> 2 1 -1.3408750 0 1.87422963 1 2.140405
#> 3 1 -0.5094501 0 1.19309386 1 2.490992
#> 4 1 -3.4929064 0 1.98230859 0 1.345384
#> 5 1 1.4987760 1 -0.80447377 1 2.539185
#> 6 1 0.6252433 0 0.07808106 0 1.972232
table(dat$z)
#>
#> 0 1
#> 300 225The true ATT is approximately 0.5 (slightly above because treated units are over-represented in high-effect clusters).
ps <- ml_ps(
data = dat,
treatment = "z",
covariates = c("x1", "x2", "x3"),
cluster = "school_id",
method = "mundlak",
estimand = "ATT"
)
print(ps)
#> MLCausal propensity score model
#> Method: mundlak
#> Estimand: ATT
#> N: 525
#> Clusters: 20dat_w <- ml_weight(ps, estimand = "ATT", stabilize = TRUE, trim = 10)
#> 1 weight(s) Winsorised to upper bound of 10.
summary(dat_w$weights)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.006576 0.245367 1.000000 0.767135 1.000000 10.000000bal <- balance_ml(
data = dat_w,
treatment = "z",
covariates = c("x1", "x2", "x3"),
cluster = "school_id",
weights = "weights"
)
print(bal)
#> MLCausal balance diagnostics
#>
#> Individual-level SMDs:
#> variable level smd
#> x1 individual -0.0789
#> x2 individual -0.0004
#> x3 individual -0.0927
#>
#> Cluster-mean SMDs:
#> variable level smd
#> x1 cluster_mean 1.1713
#> x2 cluster_mean 0.0193
#> x3 cluster_mean 1.6173Individual-level SMDs are numeric. Cluster-mean SMDs are character strings: either a formatted number or a descriptive message if not estimable.
est <- estimate_att_ml(
data = dat_w,
outcome = "y",
treatment = "z",
cluster = "school_id",
covariates = c("x1", "x2", "x3"),
weights = "weights"
)
print(est)
#> MLCausal treatment effect estimate
#> Estimate: 0.8196
#> SE: 0.137
#> p-value: 4.069e-09sens <- sens_ml(estimate = est$estimate, se = est$se)
sens[sens$crosses_null, ][1, ]
#> confounder_strength adjusted_estimate original_z adjusted_z crosses_null
#> 42 4.1 0.2579827 5.983486 1.883486 TRUEThe lambda parameter in ml_match() is the
core innovation: it adds a cluster-mean balance penalty to the matching
distance, so matches are chosen to improve balance at both the
individual and cluster-mean levels simultaneously.
# lambda = 0 → standard PS matching
# lambda = 1 → equal weight on PS distance and cluster-mean balance (default)
# lambda > 1 → prioritise cluster-mean balance over PS proximity
matched <- ml_match(ps, ratio = 1, caliper = 0.5, lambda = 1)
#> 92 treated unit(s) could not be matched and were excluded from the matched sample.
print(matched)
#> MLCausal matched sample (dual-balance)
#> Matched rows: 266
#> Matched sets: 133
#> Clusters used: 19
#> Unmatched (trt): 92
#> Lambda (balance weight): 1
#> Caliper (logit-PS): 0.5
bal_m <- balance_ml(
data = matched$data_matched,
treatment = "z",
covariates = c("x1", "x2", "x3"),
cluster = "school_id",
weights = "match_weight"
)
print(bal_m)
#> MLCausal balance diagnostics
#>
#> Individual-level SMDs:
#> variable level smd
#> x1 individual 0.0059
#> x2 individual 0.0301
#> x3 individual 0.0417
#>
#> Cluster-mean SMDs:
#> variable level
#> x1 cluster_mean
#> x2 cluster_mean
#> x3 cluster_mean
#> smd
#> Cluster-level SMD not estimable due to insufficient within-cluster variation after matching.
#> Cluster-level SMD not estimable due to insufficient within-cluster variation after matching.
#> Cluster-level SMD not estimable due to insufficient within-cluster variation after matching.
est_m <- estimate_att_ml(
data = matched$data_matched,
outcome = "y",
treatment = "z",
cluster = "school_id",
covariates = c("x1", "x2", "x3"),
weights = "match_weight"
)
print(est_m)
#> MLCausal treatment effect estimate
#> Estimate: 0.5177
#> SE: 0.1203
#> p-value: 2.372e-05Compare bal (weighting) and bal_m
(dual-balance matching): the matching approach should show lower
cluster-mean SMDs, demonstrating the benefit of the composite
distance.
| Step | Function | Key argument change |
|---|---|---|
| Simulate | simulate_ml_data() |
n_min prevents tiny clusters |
| PS model | ml_ps() |
treatment = (not treat) |
| Weight | ml_weight() |
output column is weights |
| Match | ml_match() |
lambda = for dual-balance |
| Balance | balance_ml() |
treatment =; cluster SMD is string not NA |
| Estimate | estimate_att_ml() |
treatment =, weights = |
| Sensitivity | sens_ml() |
default q extended to 5 |
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.