Estimating ITS parameters from pre-intervention data

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

PITS package

2026-06-23

Overview

Before running a power simulation, you need three nuisance parameters estimated from your pre-intervention time-series data:

Parameter	Symbol	Meaning
Baseline	\(\beta_0\)	Mean outcome at the start of the pre-intervention period
Residual SD	\(\sigma\)	Outcome variability from one time point to the next
Autocorrelation	\(\rho\)	AR(1) correlation between consecutive observations

A fourth quantity, trend_pre, checks whether the outcome was stable before the intervention. It should be close to zero.

These parameters are not the intervention effect — that is your clinical hypothesis (level_change), which must be set separately based on clinical judgement or published evidence.

Using individual estimation functions

PITS provides individual functions for each parameter, useful when you want fine-grained control or wish to estimate only one quantity.

data("example_cfr_data")
outcome <- example_cfr_data$outcome

# Baseline: intercept of a linear trend fit at t = 1
b <- estimate_baseline(outcome)
cat("Baseline:", round(b, 3), "\n")
#> Baseline: 14.696

# Sigma: residual SD after detrending
s <- estimate_sigma(outcome)
cat("Sigma:   ", round(s, 3), "\n")
#> Sigma:    1.514

# Rho: lag-1 autocorrelation of residuals
r <- estimate_rho(outcome)
cat("Rho:     ", round(r, 3), "\n")
#> Rho:      0.371

# Pre-trend: slope per time unit (should be near zero)
t_pre <- estimate_trend(outcome)
cat("Trend:   ", round(t_pre, 4), "per month\n")
#> Trend:    0.0019 per month

When to use individual functions

You have estimates from the literature for some parameters and want to estimate only the remainder
You want to compare estimates across different subsets of the pre-period
You are building a custom simulation pipeline

Using the all-in-one wrapper

For most use cases, estimate_its_params() is the most convenient entry point. It accepts either a data frame or a plain numeric vector.

# From a data frame with 'time' and 'outcome' columns
params <- estimate_its_params(example_cfr_data)
#> 
#> ============================================================
#>   PITS - ITS Parameter Estimation
#> ============================================================ 
#> 
#>   Estimation method:      GLS-ML
#>   Observations (n_pre):   24
#>   Baseline:               14.5853
#>   Sigma (residual SD):    1.4876  (10.2% of baseline)
#>   Rho (AR1):              0.3832
#>   Pre-trend (per unit):   0.0039
#> 
#>  ------------------------------------------------------------ 
#>   Copy these into calculate_power():
#> 
#>     n_pre        = 24
#>     baseline     = 14.5853
#>     sigma        = 1.4876
#>     rho          = 0.3832
#>     level_change = ???  # YOUR clinical hypothesis
#>     slope_change = 0    # set if testing slope
#> 
#> ============================================================

# Or from a plain numeric vector (time index auto-generated)
params2 <- estimate_its_params(example_cfr_data$outcome, verbose = FALSE)
identical(params$sigma, params2$sigma)
#> [1] TRUE

The returned list plugs directly into calculate_power():

result <- calculate_power(
  n_pre        = params$n_pre,
  n_post       = 30,
  baseline     = params$baseline,
  level_change = -3,          # <-- your clinical hypothesis
  sigma        = params$sigma,
  rho          = params$rho
)

Custom column names

If your data uses different column names, pass them explicitly:

# Rename for illustration
alt_data <- example_cfr_data
names(alt_data) <- c("month", "cfr_pct")

params_alt <- estimate_its_params(
  alt_data,
  outcome_col = "cfr_pct",
  time_col    = "month",
  verbose     = FALSE
)
cat("Sigma:", round(params_alt$sigma, 3), "\n")
#> Sigma: 1.488

Diagnostic plots

Always inspect the pre-intervention data before trusting parameter estimates. diagnose_params() produces a 2×2 panel covering the four key checks:

{r diag-full, fig.width = 8, fig.height = 6, fig.cap = "Diagnostic panel: (1) observed series with fitted trend — checks for outliers and non-linearity; (2) residuals over time — checks for heteroscedasticity; (3) Q-Q plot — checks for normality of residuals; (4) residual ACF — quantifies autocorrelation structure."} out <- diagnose_params(example_cfr_data)

What to look for:

Observed + trend: a roughly flat or very gently sloped trend line. A steep pre-trend (> 5% of baseline per period) may indicate confounding.
Residuals over time: random scatter around zero, no funnelling or systematic patterns.
Q-Q plot: points close to the diagonal. Heavy tails are acceptable for monthly rates; severe departures suggest the Gaussian model may not hold and counts-based modelling (Poisson/Negative Binomial) should be considered.
Residual ACF: a single large bar at lag 1 followed by rapid decay is consistent with AR(1). Multiple large bars suggest higher-order autocorrelation; consider a longer pre-period or seasonal adjustment.

What if I have no pre-intervention data?

If historical data are not available, use simulate_predata() to generate a synthetic pre-intervention series and explore how different parameter assumptions affect power. Then use the literature or expert elicitation to justify your choices.

# Simulate pre-intervention data with assumed parameters
pre_synthetic <- simulate_predata(
  n        = 24,
  baseline = 15,
  sigma    = 2.0,
  rho      = 0.40,
  seed     = 42
)
head(pre_synthetic)
#>   time  outcome
#> 1    1 17.51301
#> 2    2 14.97009
#> 3    3 15.65366
#> 4    4 16.42152
#> 5    5 16.30964
#> 6    6 15.32933

plot(pre_synthetic$time, pre_synthetic$outcome,
     type = "o", pch = 16, col = "steelblue",
     xlab = "Time", ylab = "Outcome",
     main = "Synthetic pre-intervention data",
     las = 1, bty = "l")
abline(h = 15, lty = 2, col = "grey50")
grid(NULL, NULL, lwd = 0.6, col = rgb(0, 0, 0, 0.1))

Synthetic pre-intervention series (baseline = 15, sigma = 2, rho = 0.40).

You can then recover the estimated parameters from this synthetic series to verify your simulation is self-consistent:

recovered <- estimate_its_params(pre_synthetic, verbose = FALSE)
cat(sprintf("Target  → baseline: 15.00  sigma: 2.000  rho: 0.400\n"))
#> Target  → baseline: 15.00  sigma: 2.000  rho: 0.400
cat(sprintf("Recovered → baseline: %.2f  sigma: %.3f  rho: %.3f\n",
            recovered$baseline, recovered$sigma, recovered$rho))
#> Recovered → baseline: 17.52  sigma: 2.246  rho: 0.279

Small discrepancies are expected from a single 24-point realisation. Longer pre-periods (n = 48–60) yield more reliable estimates.

Handling non-standard situations

Date-indexed time column

estimate_its_params() accepts date columns and converts them to a sequential integer index automatically:

# If your data has a 'date' column of class Date:
params <- estimate_its_params(
  my_data,
  outcome_col = "cfr",
  time_col    = "date"     # Date objects are handled automatically
)

Missing values

Missing values in the outcome are removed with a warning before fitting:

data_with_na <- example_cfr_data
data_with_na$outcome[c(5, 12)] <- NA

params_na <- suppressWarnings(
  estimate_its_params(data_with_na, verbose = FALSE)
)
cat("n_pre after removing NAs:", params_na$n_pre, "\n")
#> n_pre after removing NAs: 22

Very short pre-periods

Fewer than 12 observations produces a warning. Estimates become increasingly unreliable as the pre-period shrinks, and are essentially meaningless below n = 8.

params_short <- estimate_its_params(
  example_cfr_data$outcome[1:9],
  verbose = FALSE
)
#> Warning in estimate_its_params(example_cfr_data$outcome[1:9], verbose = FALSE):
#> Only 9 observations available. Estimates will be unreliable; aim for >= 24.

If you genuinely have fewer than 12 pre-intervention observations, consider whether the ITS design is appropriate, or whether a controlled ITS (with a comparison series) would be more robust.

Typical parameter ranges for monthly hospital data

As a reference when pre-intervention data are unavailable:

Parameter	Typical range	Notes
`sigma`	10–20% of baseline	Higher for small hospitals or rare outcomes
`rho` (monthly)	0.30–0.50	Use 0.40 as a conservative default
`rho` (weekly)	0.50–0.80	Consider aggregating to monthly
`rho` (daily)	0.70–0.90	Strong case for monthly aggregation

References

Lopez Bernal J, et al. (2017). Int J Epidemiol 46:348–355.
Wagner AK, et al. (2002). J Clin Pharm Ther 27:299–309.
Zhang F, et al. (2011). J Clin Epidemiol 64:1252–1261.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.