07. Parametric Survival Models


Introduction

While the Cox model is semi-parametric (it leaves the baseline hazard unspecified), fully parametric models assume that survival times follow a specific mathematical distribution, such as the Weibull, Exponential, or Log-Normal distribution.

Parametric models can be statistically efficient: when the assumed distribution is correct, they estimate only a few parameters and produce smooth survival curves that can be evaluated at any time. However, they are brittle: if you assume the data follow a Weibull distribution but the true hazard has a completely different shape, the model will be biased.
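To make the bias concrete, the sketch below simulates event times from a log-normal distribution (an assumption chosen purely for illustration), fits intercept-only exponential and log-normal AFT models with `survival::survreg`, and compares each model's estimate of S(5) with the true value. The sample size, censoring mechanism, and evaluation time are arbitrary.

```r
library(survival)

set.seed(1)
n <- 2000
# Simulated "true" event times: log-normal with meanlog = 3, sdlog = 1
event_time <- rlnorm(n, meanlog = 3, sdlog = 1)
# Independent uniform censoring
cens_time <- runif(n, min = 0, max = 100)
sim <- data.frame(
  time   = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time)
)

t0 <- 5
s_true <- plnorm(t0, meanlog = 3, sdlog = 1, lower.tail = FALSE)

# Correctly specified model: log(T) = mu + sigma * (standard normal error)
fit_lnorm <- survreg(Surv(time, status) ~ 1, data = sim, dist = "lognormal")
s_lnorm <- pnorm((log(t0) - coef(fit_lnorm)[[1]]) / fit_lnorm$scale,
                 lower.tail = FALSE)

# Misspecified model: constant hazard equal to 1 / exp(mu)
fit_exp <- survreg(Surv(time, status) ~ 1, data = sim, dist = "exponential")
s_exp <- exp(-t0 / exp(coef(fit_exp)[[1]]))

round(c(truth = s_true, lognormal = s_lnorm, exponential = s_exp), 3)
```

Both fits see the same data; only the distributional assumption differs. Because the exponential model cannot represent the low early hazard of the log-normal, it tends to overstate early risk and therefore underestimate S(5) in this setup.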

SuperSurv acts as a safety net. You can include several parametric assumptions in the same library. A learner whose assumption fits your data well achieves a low cross-validated risk and tends to receive a large weight; a learner whose assumption is badly wrong has a higher cross-validated risk, and the meta-learner can shrink its weight toward, or all the way to, zero.
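A minimal sketch of the cross-validation idea, assuming held-out negative log-likelihood as the loss (SuperSurv's own risk metric may differ): simulate log-normal data with one covariate, fit three `survreg` distributions on two of three folds, and score each fit on the held-out fold. The simulated data and the helper `heldout_nll()` are hypothetical and exist only for this illustration.

```r
library(survival)

set.seed(2)
n <- 2000
x <- rnorm(n)
event_time <- rlnorm(n, meanlog = 3 + 0.5 * x, sdlog = 1)
cens_time  <- runif(n, 0, 100)
sim <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x      = x)

# Hypothetical helper: held-out negative log-likelihood of a survreg fit.
# Events contribute log f(t); censored observations contribute log S(t).
heldout_nll <- function(fit, dist, test) {
  lp    <- predict(fit, newdata = test, type = "lp")
  sigma <- if (dist == "exponential") 1 else fit$scale  # exponential fixes scale at 1
  z     <- (log(test$time) - lp) / sigma
  if (dist == "lognormal") {
    log_f <- dnorm(z, log = TRUE) - log(sigma) - log(test$time)
    log_s <- pnorm(z, lower.tail = FALSE, log.p = TRUE)
  } else {  # Weibull and exponential share the extreme-value form
    log_f <- z - exp(z) - log(sigma) - log(test$time)
    log_s <- -exp(z)
  }
  -sum(ifelse(test$status == 1, log_f, log_s))
}

dists   <- c("weibull", "exponential", "lognormal")
n_folds <- 3
fold    <- sample(rep(seq_len(n_folds), length.out = n))
cv_risk <- setNames(numeric(length(dists)), dists)

for (d in dists) {
  for (k in seq_len(n_folds)) {
    fit <- survreg(Surv(time, status) ~ x, data = sim[fold != k, ], dist = d)
    cv_risk[d] <- cv_risk[d] + heldout_nll(fit, d, sim[fold == k, ])
  }
}

cv_risk / n  # average held-out NLL per observation; lower is better
```

Every observation is scored exactly once, by a model that never saw it, so a misspecified distribution cannot hide its poor fit behind in-sample overfitting.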

1. Setup and Library Definition

library(SuperSurv)
library(survival)

data("metabric", package = "SuperSurv")
set.seed(42)

train_idx <- sample(seq_len(nrow(metabric)), floor(0.7 * nrow(metabric)))
train <- metabric[train_idx, ]
test  <- metabric[-train_idx, ]

X_tr <- train[, grep("^x", names(metabric))]
X_te <- test[, grep("^x", names(metabric))]
new.times <- seq(50, 200, by = 25)

# Define a library covering different parametric assumptions
parametric_library <- c("surv.coxph",       # Semi-parametric baseline
                        "surv.weibull",     # Assumes hazard increases/decreases monotonically
                        "surv.exponential", # Assumes constant hazard over time
                        "surv.lognormal")   # Assumes hazard rises then falls
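The comments in the library definition describe the hazard shape each learner assumes. The base-R sketch below checks those shapes numerically using h(t) = f(t) / S(t). The parameter values (exponential rate 0.5, Weibull shape 1.5 and scale 2, log-normal meanlog 0 and sdlog 1) are arbitrary illustrations, not values SuperSurv uses.

```r
# Hazard h(t) = f(t) / S(t) evaluated on a grid of times
t_grid <- seq(0.1, 10, by = 0.1)

h_exp <- dexp(t_grid, rate = 0.5) /
  pexp(t_grid, rate = 0.5, lower.tail = FALSE)
h_wb  <- dweibull(t_grid, shape = 1.5, scale = 2) /
  pweibull(t_grid, shape = 1.5, scale = 2, lower.tail = FALSE)
h_ln  <- dlnorm(t_grid, meanlog = 0, sdlog = 1) /
  plnorm(t_grid, meanlog = 0, sdlog = 1, lower.tail = FALSE)

range(h_exp)             # exponential: constant, equal to the rate
all(diff(h_wb) > 0)      # Weibull with shape > 1: monotonically increasing
t_grid[which.max(h_ln)]  # log-normal: rises to a peak, then declines
```

A Weibull with shape below 1 would give a monotonically decreasing hazard instead, and shape exactly 1 reduces it to the exponential.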

2. Fitting the Parametric Ensemble

We run the ensemble exactly as before. Internally, SuperSurv fits the Weibull, exponential, and log-normal learners as Accelerated Failure Time (AFT) models and maps their continuous-time survival predictions onto the discrete new.times evaluation grid.
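As a rough illustration of what mapping continuous survival predictions onto a discrete grid involves (a sketch, not SuperSurv's internal code), the example below fits a Weibull AFT model to the `lung` data shipped with the survival package and evaluates S(t | x) = exp(-exp((log t - lp) / sigma)) at each time in a grid mirroring `new.times`. The covariates `age` and `sex` are chosen only for illustration; `status` in `lung` is coded 1 = censored, 2 = dead, which `Surv()` accepts.

```r
library(survival)

# Weibull AFT model: log(T) = lp + sigma * W, W ~ standard (minimum) extreme value
fit_wb <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

new.times <- seq(50, 200, by = 25)
lp    <- predict(fit_wb, newdata = lung, type = "lp")
sigma <- fit_wb$scale

# Rows = patients, columns = evaluation times; entry = S(t_j | x_i)
surv_mat <- exp(-exp(outer(lp, log(new.times),
                           function(l, lt) (lt - l) / sigma)))
colnames(surv_mat) <- new.times

dim(surv_mat)
round(head(surv_mat, 3), 3)
```

Because the fitted survival function is available in closed form, any grid of evaluation times can be used; the result has the same patients-by-times layout as the predictions summarized later in this vignette.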

fit_parametric <- SuperSurv(
  time = train$duration,
  event = train$event,
  X = X_tr,
  newdata = X_te,
  new.times = new.times,
  event.library = parametric_library,
  cens.library = c("surv.coxph"),
  control = list(saveFitLibrary = TRUE),
  verbose = FALSE,
  selection = "ensemble",
  nFolds = 3
)

3. Evaluating the “Safety Net”

Let’s look at the cross-validated risks and the final meta-learner weights.

summary(fit_parametric)
#> Summary of SuperSurv fit
#>   Selection: ensemble 
#> 
#> Call:
#> SuperSurv(time = train$duration, event = train$event, X = X_tr, 
#>     newdata = X_te, new.times = new.times, event.library = parametric_library, 
#>     cens.library = c("surv.coxph"), verbose = FALSE, control = list(saveFitLibrary = TRUE), 
#>     selection = "ensemble", nFolds = 3)
#> 
#> Event ensemble:
#>                      learner weight   risk status
#>        surv.coxph_screen.all 0.9979 8.8384     ok
#>      surv.weibull_screen.all 0.0000 8.8420     ok
#>    surv.lognormal_screen.all 0.0021 8.8444     ok
#>  surv.exponential_screen.all 0.0000 8.8508     ok
#> 
#> Censoring ensemble:
#>                learner weight   risk status
#>  surv.coxph_screen.all      1 1.0488     ok
#> 
#> Predictions: 572 observations x 7 times
#> Evaluation times: 7 values from 50 to 200 
#> Elapsed time (seconds):
#> everything      train    predict 
#>      0.699      0.654      0.044
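The event-ensemble weights above sum to 1, which is consistent with a convex combination of the learners' cross-validated predictions. A minimal sketch of that idea, assuming a squared-error loss on a binary "still event-free at time t" outcome and ignoring censoring (SuperSurv's actual loss and censoring handling may differ): the weights are kept on the simplex through a softmax parameterization and fitted with `optim`. The three learners' predictions are simulated and hypothetical.

```r
set.seed(3)
n <- 5000
p_true <- runif(n, 0.1, 0.9)            # true probability of being event-free at t
y <- rbinom(n, size = 1, prob = p_true) # 1 = still event-free at t

# Hypothetical cross-validated predictions from three learners
Z <- cbind(
  good     = p_true,                                      # well specified
  noisy    = pmin(pmax(p_true + rnorm(n, 0, 0.2), 0), 1), # right idea, extra noise
  constant = rep(mean(y), n)                              # ignores covariates entirely
)

# Softmax keeps the weights non-negative and summing to 1
softmax <- function(theta) exp(theta) / sum(exp(theta))
ensemble_loss <- function(theta) mean((y - Z %*% softmax(theta))^2)

opt <- optim(rep(0, ncol(Z)), ensemble_loss, method = "BFGS")
weights <- setNames(softmax(opt$par), colnames(Z))
round(weights, 4)
```

The constant learner, analogous to a badly misspecified model, ends up with a weight near zero. Softmax weights approach zero but never reach it exactly; a constrained solver such as non-negative least squares followed by normalization can return exact zeros.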

Interpretation

Look closely at the weight assigned to surv.exponential. The exponential distribution assumes that the hazard (the instantaneous risk of the event) is constant over time. In cancer cohorts such as metabric this assumption is rarely plausible: the hazard often changes over follow-up, for example rising or peaking in the years after diagnosis or surgery.

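Because the exponential assumption fits these data worst, it has the highest cross-validated risk in the table (8.8508), and SuperSurv assigns it a weight of 0.0000. Note, however, that the risk differences between learners are small here, and the Weibull learner also receives zero weight: a zero weight means a learner adds nothing to the combination once the others are included, not necessarily that its assumption is badly wrong.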
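One way to check the constant-hazard assumption before trusting an exponential model is to plot log H(t) against log t using the Nelson–Aalen estimate of the cumulative hazard: a Weibull model implies a straight line whose slope equals the shape parameter, and an exponential model implies a slope of 1. The sketch below simulates Weibull data with shape 2 (an illustrative assumption) and estimates that slope from the `cumhaz` component of a `survfit()` object.

```r
library(survival)

set.seed(4)
n <- 2000
event_time <- rweibull(n, shape = 2, scale = 10)
cens_time  <- runif(n, 0, 30)
sim <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time))

km <- survfit(Surv(time, status) ~ 1, data = sim)
# Keep event times only, and drop the noisy very start of the curve
keep <- km$n.event > 0 & km$cumhaz > 0.01

# Weibull: log H(t) = shape * log(t) - shape * log(scale), so the slope estimates the shape
slope <- unname(coef(lm(log(km$cumhaz[keep]) ~ log(km$time[keep])))[2])
slope  # a slope near 1 would instead support a constant hazard
```

In practice you would plot `log(km$time)` against `log(km$cumhaz)` and look for curvature as well as the slope; clear curvature argues against both the exponential and the Weibull.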

By including parametric models in your SuperSurv library, you let the data, rather than the researcher, decide which distributional assumptions are appropriate for your patient cohort.
