Memory-efficient Cox proportional hazards regression via streaming
Newton-Raphson. Peak RAM is O(p^2) in the number of covariates p and flat
in the number of rows n, so models can be fit on datasets that do not fit
in memory. Coefficients are identical to those from survival::coxph()
with Efron tie correction.
Development version from GitHub:
# install.packages("remotes")
remotes::install_github("tommycarstensen/coxstream-r")
In-memory fit, with the same formula interface as
coxph():
library(survival)
library(coxstream)
fit <- coxstream(Surv(time, status) ~ age + sex, data = lung)
coef(fit)
fit
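One way to check the claimed agreement with coxph() is to fit the same model with both and compare coefficients. This is only a sketch: it assumes coef(fit) returns a named numeric vector in the same order as coef() on a coxph object, and it skips the comparison when coxstream is not installed.

```r
library(survival)

# Reference fit. "efron" is already coxph()'s default tie method;
# it is spelled out here to make the comparison explicit.
ref <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "efron")

if (requireNamespace("coxstream", quietly = TRUE)) {
  fit <- coxstream::coxstream(Surv(time, status) ~ age + sex, data = lung)
  # Assumption: coef(fit) has the same names and order as coef(ref).
  print(all.equal(coef(fit), coef(ref)))
}
```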
Out-of-core fit, streaming a Parquet file sorted by time in DESCENDING
order, one row group at a time (requires the optional arrow
package):
fit <- coxstream_arrow(
  "events_sorted.parquet",
  x_cols = c("age", "sex"),
  time_col = "duration",
  event_col = "event"
)
coef(fit)
The reader loads one row-group chunk at a time and frees it before the next, so peak RAM stays at O(batch_size * p), flat in n. Efron tie groups that span chunk boundaries are carried in running state, giving coefficients bit-identical to the in-memory fit.
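The call above expects a file that is already sorted by descending time. A minimal sketch of producing one with the arrow package follows. The data frame is made up for illustration, the 1 = event / 0 = censored coding is an assumption, and the sort happens in memory, so data too large for that would need to be sorted upstream by some other tool.

```r
# Hypothetical example data; column names match the coxstream_arrow() call.
events <- data.frame(
  duration = c(5, 12, 12, 3, 8),
  event    = c(1L, 0L, 1L, 1L, 0L),  # assumed coding: 1 = event, 0 = censored
  age      = c(61, 54, 70, 48, 66),
  sex      = c(1L, 2L, 1L, 2L, 2L)
)

# Sort by time, largest first, so rows are read in descending time order.
events_sorted <- events[order(events$duration, decreasing = TRUE), ]

path <- "events_sorted.parquet"
if (requireNamespace("arrow", quietly = TRUE)) {
  # chunk_size is the number of rows per Parquet row group; a tiny value
  # is used here only so that the toy file has more than one row group.
  arrow::write_parquet(events_sorted, path, chunk_size = 2)
}
```

Since the reader loads one row group at a time, the row-group size chosen at write time presumably determines how many rows are held in memory per chunk.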
Each Newton-Raphson iteration makes a single descending-time pass to accumulate the Cox partial-likelihood score and Hessian. Only running sums of size O(p) and O(p^2) are held, never the full risk set, so memory does not grow with n. The accumulation kernel is implemented in C++ via Rcpp.
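The pass described above can be sketched in plain R. This is an illustration of the technique, not the package's C++ kernel: the function names cox_pass and cox_newton are hypothetical, the data are simulated, and the Newton loop has no step-halving safeguard. The state carried across rows is S0 (scalar), S1 (length p) and S2 (p x p), plus the same three sums restricted to the events of the current tie group, which is what the Efron correction needs.

```r
# One descending-time pass: score vector and information matrix
# (negative Hessian) of the Efron partial log-likelihood at `beta`.
cox_pass <- function(time, status, X, beta) {
  stopifnot(!is.unsorted(rev(time)))  # rows must be sorted by time, descending
  p <- ncol(X)
  S0 <- 0; S1 <- numeric(p); S2 <- matrix(0, p, p)  # risk-set running sums
  score <- numeric(p); info <- matrix(0, p, p)
  n <- length(time)
  i <- 1
  while (i <= n) {
    j <- i  # rows i..j share the same time (one tie group)
    while (j < n && time[j + 1] == time[i]) j <- j + 1
    D0 <- 0; D1 <- numeric(p); D2 <- matrix(0, p, p); d <- 0
    for (r in i:j) {
      x <- X[r, ]
      w <- exp(sum(x * beta))
      # Every row at this time enters the risk set, censored or not.
      S0 <- S0 + w; S1 <- S1 + w * x; S2 <- S2 + w * outer(x, x)
      if (status[r] == 1) {
        d <- d + 1
        D0 <- D0 + w; D1 <- D1 + w * x; D2 <- D2 + w * outer(x, x)
        score <- score + x
      }
    }
    # Efron: the k-th of d tied events sees the risk set minus k/d of the ties.
    for (k in seq_len(d) - 1) {
      f <- k / d
      a0 <- S0 - f * D0
      m <- (S1 - f * D1) / a0
      score <- score - m
      info <- info + (S2 - f * D2) / a0 - outer(m, m)
    }
    i <- j + 1
  }
  list(score = score, info = info)
}

# Plain Newton-Raphson: one call to cox_pass() per iteration.
cox_newton <- function(time, status, X, iters = 25, tol = 1e-9) {
  beta <- numeric(ncol(X))
  for (it in seq_len(iters)) {
    g <- cox_pass(time, status, X, beta)
    step <- solve(g$info, g$score)
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}

# Simulated data; coarse integer times create plenty of ties.
set.seed(1)
n <- 200
X <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))
time <- ceiling(rexp(n, rate = exp(0.5 * X[, "age"])) * 5)
status <- rbinom(n, 1, 0.8)
ord <- order(time, decreasing = TRUE)
beta_hat <- cox_newton(time[ord], status[ord], X[ord, , drop = FALSE])
```

The sketch holds the whole dataset in memory for simplicity, but cox_pass() only ever reads rows in order, so the same accumulation works when rows arrive in chunks. The result can be compared against coef(survival::coxph(survival::Surv(time, status) ~ X, ties = "efron")).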
coxstream_arrow()), testthat (tests)
MIT, except src/arrow_c_abi.h, which is vendored from Apache Arrow under Apache-2.0; see inst/COPYRIGHTS.