coxstream

Memory-efficient Cox proportional hazards regression via streaming Newton-Raphson. Peak RAM is O(p^2) in the number of covariates p and flat in the number of rows n, so models can be fitted to datasets that do not fit in memory. Coefficients match survival::coxph() with Efron tie correction to machine precision.

coxstream (Python and R) holds peak RAM flat as the cohort grows, while in-memory solvers (lifelines, survival::coxph) scale with n; coefficients agree to machine precision.

Installation

Development version from GitHub:

# install.packages("remotes")
remotes::install_github("tommycarstensen/coxstream-r")

Usage

In-memory fit, with the same formula interface as coxph():

library(survival)
library(coxstream)

fit <- coxstream(Surv(time, status) ~ age + sex, data = lung)
coef(fit)
fit

Out-of-core fit, streaming a Parquet file that is sorted by time in DESCENDING order, one row group at a time (requires the optional arrow package):

fit <- coxstream_arrow(
    "events_sorted.parquet",
    x_cols    = c("age", "sex"),
    time_col  = "duration",
    event_col = "event"
)
coef(fit)

The reader loads one row-group chunk at a time and frees it before loading the next, so the data held in RAM stays at O(batch_size * p), on top of the O(p^2) running sums, and is flat in n. Efron tie groups that span chunk boundaries are carried in running state, giving coefficients bit-identical to the in-memory fit.
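An input file of the shape used above could be prepared roughly as follows. This is a minimal sketch on made-up data: the `events` data frame and the `chunk_size` value are assumptions chosen for illustration, and whether rows with equal times need any particular secondary order is not stated here. The file is only written if the arrow package is installed.

```r
# Hypothetical example data; column names match the coxstream_arrow() call above.
set.seed(1)
events <- data.frame(
    duration = round(rexp(1000, rate = 0.1), 1),
    event    = rbinom(1000, 1, 0.7),
    age      = rnorm(1000, mean = 60, sd = 10),
    sex      = rbinom(1000, 1, 0.5)
)

# Sort by follow-up time, largest first, as the streaming reader expects.
events_sorted <- events[order(events$duration, decreasing = TRUE), ]

# chunk_size sets the number of rows per Parquet row group (250 is arbitrary).
if (requireNamespace("arrow", quietly = TRUE)) {
    arrow::write_parquet(events_sorted, "events_sorted.parquet", chunk_size = 250)
}
```

Smaller row groups lower the per-chunk memory footprint at the cost of more read calls.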

How it works

Each Newton-Raphson iteration makes a single descending-time pass to accumulate the Cox partial-likelihood score and Hessian. Only running sums of size O(p) and O(p^2) are held, never the full risk set, so memory does not grow with n. The accumulation kernel is implemented in C++ via Rcpp.
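The pass described above can be sketched in plain R. This is an illustration of the idea, not the package's C++ kernel: `cox_pass` and the simulated data are hypothetical, and for brevity the covariate matrix is held in memory, although the loop only reads rows in order and keeps O(p) and O(p^2) state, so the rows could equally arrive in chunks.

```r
# One descending-time pass returning the score and the information matrix
# (negative Hessian) of the Cox partial likelihood with Efron tie handling.
# Rows of X, time and status must already be sorted by time, largest first.
cox_pass <- function(beta, X, time, status) {
    p <- ncol(X); n <- nrow(X)
    S0 <- 0; S1 <- numeric(p); S2 <- matrix(0, p, p)   # running risk-set sums
    score <- numeric(p); info <- matrix(0, p, p)
    i <- 1L
    while (i <= n) {
        j <- i                                          # rows i..j share one time
        while (j < n && time[j + 1L] == time[i]) j <- j + 1L
        D0 <- 0; D1 <- numeric(p); D2 <- matrix(0, p, p); d <- 0L
        for (k in i:j) {
            x <- X[k, ]; w <- exp(sum(x * beta))
            S0 <- S0 + w; S1 <- S1 + w * x; S2 <- S2 + w * outer(x, x)
            if (status[k] == 1) {                       # sums over tied events only
                d <- d + 1L
                D0 <- D0 + w; D1 <- D1 + w * x; D2 <- D2 + w * outer(x, x)
                score <- score + x
            }
        }
        for (m in seq_len(d)) {                         # Efron: remove (m-1)/d of the tied events
            f <- (m - 1) / d
            phi0 <- S0 - f * D0
            xbar <- (S1 - f * D1) / phi0
            score <- score - xbar
            info <- info + (S2 - f * D2) / phi0 - outer(xbar, xbar)
        }
        i <- j + 1L
    }
    list(score = score, info = info)
}

# Simulated data with integer times, so that ties occur.
set.seed(42)
n <- 500
X <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))
time <- ceiling(5 * rexp(n, rate = exp(0.5 * X[, 1] - 0.3 * X[, 2])))
status <- rbinom(n, 1, 0.8)
ord <- order(time, decreasing = TRUE)
X <- X[ord, ]; time <- time[ord]; status <- status[ord]

# Newton-Raphson: one pass per iteration, starting from beta = 0.
beta <- numeric(ncol(X))
for (iter in 1:25) {
    acc <- cox_pass(beta, X, time, status)
    step <- solve(acc$info, acc$score)
    beta <- beta + step
    if (max(abs(step)) < 1e-10) break
}
beta
```

Because the risk set at a given time consists of every row with an equal or later time, sorting in descending order lets the sums S0, S1 and S2 grow monotonically, and no row needs to be revisited within an iteration.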

Dependencies

License

MIT, except src/arrow_c_abi.h, which is vendored from Apache Arrow under Apache-2.0; see inst/COPYRIGHTS.
