Introduction to CausalSpline: Nonlinear Causal Dose-Response Estimation

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Your Name

2026-03-21

Why nonlinear causal effects?

Most causal inference software assumes the treatment effect enters linearly:

\[Y = \beta_0 + \beta_1 T + \gamma X + \varepsilon\]

But many real-world relationships are genuinely nonlinear:

Dosage effects in pharmacology (toxicity thresholds)
Pollution-health curves (non-monotone at extreme exposures)
Education/income (diminishing returns)
Policy intensity (minimum effective dose, saturation)

CausalSpline replaces the linear \(\beta_1 T\) with a spline \(f(T)\):

\[Y = \beta_0 + f(T) + \gamma X + \varepsilon, \quad E[Y(t)] = \beta_0 + f(t)\]

under standard unconfoundedness and positivity assumptions.

Installation

# From CRAN (once published)
install.packages("CausalSpline")

# Development version from GitHub
remotes::install_github("yourgithub/CausalSpline")

Quick start

library(CausalSpline)

1. Simulate data with a threshold effect

set.seed(42)
dat <- simulate_dose_response(n = 600, dgp = "threshold", confounding = 0.6)
head(dat)
#>          Y        T         X1         X2 X3 true_effect
#> 1 4.676083 5.251499  1.3709584 -0.2484829  0    4.502997
#> 2 3.505122 4.493596 -0.5646982  0.4223204  0    2.987192
#> 3 2.676237 4.328187  0.3631284  0.9876533  1    2.656375
#> 4 5.245936 5.958644  0.6328626  0.8355682  1    5.917287
#> 5 3.958747 4.996158  0.4042683 -0.6605219  1    3.992315
#> 6 3.086915 4.992211 -0.1061245  1.5640695  1    3.984422

The true dose-response is flat below \(T = 3\), then rises linearly:

plot(dat$T, dat$Y, pch = 16, col = rgb(0, 0, 0, 0.2),
     xlab = "Treatment T", ylab = "Outcome Y",
     main = "Observed data (confounded)")
lines(sort(dat$T), dat$true_effect[order(dat$T)],
      col = "red", lwd = 2)
legend("topleft", legend = "True f(T)", col = "red", lwd = 2)

True vs observed relationship

2. Fit with IPW

fit_ipw <- causal_spline(
  Y ~ T | X1 + X2 + X3,
  data       = dat,
  method     = "ipw",
  df_exposure = 5,
  eval_grid  = 100
)
summary(fit_ipw)
#> === CausalSpline Summary ===
#> 
#> Call:
#>   causal_spline(formula = Y ~ T | X1 + X2 + X3, data = dat, method = "ipw", 
#>     df_exposure = 5, eval_grid = 100)
#> 
#> Estimation method  : IPW 
#> Spline type        : ns 
#> df (exposure)      : 5 
#> Interior knots     : 4.278 4.887 5.349 5.965 
#> 
#> Treatment variable:
#>   n = 600 
#>   Mean = 5.133   SD = 0.994 
#>   Range = [ 1.928 , 8.586 ]
#> 
#> IPW diagnostics:
#>   ESS = 406 / n = 600 ( 67.7 %)
#>   Weight range = [ 0.119 , 4.978 ]
#> 
#> Dose-response curve (selected percentiles):
#>      t estimate     se   lower   upper
#>  1.928  -1.0895 0.6925 -2.4468  0.2678
#>  3.072   0.5492 0.1771  0.2021  0.8963
#>  4.147   2.3213 0.1231  2.0800  2.5626
#>  5.223   4.7107 0.0928  4.5289  4.8926
#>  6.366   6.7306 0.1498  6.4370  7.0243
#>  7.510   8.9276 0.5554  7.8390 10.0162
#>  8.586  11.2355 1.5384  8.2204 14.2507

# Build true curve data frame for overlay
truth_df <- data.frame(
  t           = dat$T,
  true_effect = dat$true_effect
)
plot(fit_ipw, truth = truth_df)

IPW estimated dose-response with 95% CI

3. Fit with G-computation

fit_gc <- causal_spline(
  Y ~ T | X1 + X2 + X3,
  data        = dat,
  method      = "gcomp",
  df_exposure = 5
)
plot(fit_gc, truth = truth_df,
     title = "G-computation — Threshold DGP")

4. Check overlap (positivity)

ov <- check_overlap(dat$T, fit_ipw$weights, plot = TRUE)
cat("ESS:", round(ov$ess), "/ n =", nrow(dat), "\n")
#> ESS: 406 / n = 600
ov$plot

Comparing DGPs

dgps <- c("threshold", "diminishing", "nonmonotone", "sinusoidal")
plots <- lapply(dgps, function(d) {
  dat_d <- simulate_dose_response(500, dgp = d, seed = 1)
  fit_d <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat_d,
                          method = "ipw", df_exposure = 5,
                          verbose = FALSE)
  truth_d <- data.frame(t = dat_d$T, true_effect = dat_d$true_effect)
  plot(fit_d, truth = truth_d,
       title = paste("DGP:", d),
       rug = FALSE)
})

# Combine with patchwork (if available) or print individually
for (p in plots) print(p)

Choosing degrees of freedom

The df_exposure argument controls spline flexibility. Too few df = underfitting; too many = high variance. As a guide:

Shape	Recommended df
Linear / simple trend	3
One bend / threshold	4–5
Inverted-U / hump	5–6
Oscillatory	7–10

You can use AIC/BIC on the outcome model or cross-validation for selection.

Methods summary

Argument	Consistent if …
`method = "ipw"`	GPS model correctly specified
`method = "gcomp"`	Outcome model correctly specified
`method = "dr"`	At least one of the two models is correct

References

Hirano, K., & Imbens, G. W. (2004). The propensity score with continuous treatments. doi:10.1002/0470090456.ch7
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika.
Flores et al. (2012). Estimating the effects of length of exposure to instruction. Review of Economics and Statistics.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.