Package {coxstream}


Title: Memory-Efficient Cox Proportional Hazards via Streaming Newton-Raphson
Version: 0.1.0
Description: Fits the Cox proportional hazards model using a single descending-order pass per Newton-Raphson iteration. Peak RAM is O(p^2) regardless of the number of rows, making it suitable for datasets that do not fit in memory. Produces identical coefficients to survival::coxph() with Efron tie correction.
URL: https://github.com/tommycarstensen/coxstream-r
BugReports: https://github.com/tommycarstensen/coxstream-r/issues
License: MIT + file LICENSE
Encoding: UTF-8
Imports: Rcpp, survival
LinkingTo: Rcpp
Suggests: arrow, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Config/roxygen2/version: 8.0.0
NeedsCompilation: yes
Packaged: 2026-06-14 21:24:20 UTC; tommy
Author: Tommy Carstensen [aut, cre] (ORCID iD), Apache Software Foundation [cph, ctb] (vendored Arrow C Data/Stream interface header (src/arrow_c_abi.h), Apache-2.0)
Maintainer: Tommy Carstensen <cran@tommycarstensen.com>
Repository: CRAN
Date/Publication: 2026-06-20 13:40:11 UTC

coxstream: Memory-Efficient Cox Proportional Hazards via Streaming Newton-Raphson

Description

Fits the Cox proportional hazards model using a single descending-order pass per Newton-Raphson iteration. Peak RAM is O(p^2) regardless of the number of rows, making it suitable for datasets that do not fit in memory. Produces identical coefficients to survival::coxph() with Efron tie correction.

Author(s)

Maintainer: Tommy Carstensen cran@tommycarstensen.com (ORCID)

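Other contributors:

Apache Software Foundation (vendored Arrow C Data/Stream interface header (src/arrow_c_abi.h), Apache-2.0) [copyright holder, contributor]

See Also

Useful links:

https://github.com/tommycarstensen/coxstream-r

Report bugs at https://github.com/tommycarstensen/coxstream-r/issues
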

Fit a Cox proportional hazards model via streaming Newton-Raphson

Description

Fits the Cox proportional hazards (PH) model using a single pass over the data in descending time order per Newton-Raphson iteration. Peak RAM is O(p^2), where p is the number of coefficients, regardless of the number of rows n, making it suitable for large datasets. Produces the same coefficients as survival::coxph() with Efron tie correction.
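The descending-order pass can be sketched in plain R. This is an illustration only, not the package's C++ code: efron_pass() and its argument names are hypothetical, and the sketch sorts the data in memory for simplicity. Only the running sums S0, S1 and S2 (O(p^2) in total) correspond to the state a streaming implementation would need to keep between rows.

```r
# One descending-time pass over the data: returns the Efron-corrected
# partial log-likelihood, score vector and information matrix at `beta`.
efron_pass <- function(time, status, X, beta) {
  X <- as.matrix(X)
  p <- ncol(X)
  ord <- order(time, decreasing = TRUE)
  time <- time[ord]; status <- status[ord]; X <- X[ord, , drop = FALSE]

  # Running risk-set sums: the only state that grows with p, not with n.
  S0 <- 0; S1 <- numeric(p); S2 <- matrix(0, p, p)
  loglik <- 0; score <- numeric(p); info <- matrix(0, p, p)

  n <- length(time)
  i <- 1L
  while (i <= n) {
    # Find the block i..j of rows sharing the same time (a tie group).
    j <- i
    while (j < n && time[j + 1L] == time[i]) j <- j + 1L
    idx <- i:j
    Xg <- X[idx, , drop = FALSE]
    w <- as.vector(exp(Xg %*% beta))

    # Every row at this time joins the risk set before events are scored.
    S0 <- S0 + sum(w)
    S1 <- S1 + colSums(Xg * w)
    S2 <- S2 + crossprod(Xg * sqrt(w))

    ev <- status[idx] == 1
    d <- sum(ev)
    if (d > 0) {
      Xe <- Xg[ev, , drop = FALSE]; we <- w[ev]
      D0 <- sum(we); D1 <- colSums(Xe * we); D2 <- crossprod(Xe * sqrt(we))
      loglik <- loglik + sum(Xe %*% beta)
      score <- score + colSums(Xe)
      # Efron: the k-th tied event sees the risk set minus k/d of the tied mass.
      for (k in 0:(d - 1)) {
        f <- k / d
        phi0 <- S0 - f * D0
        phi1 <- S1 - f * D1
        phi2 <- S2 - f * D2
        loglik <- loglik - log(phi0)
        score <- score - phi1 / phi0
        info <- info + phi2 / phi0 - tcrossprod(phi1) / phi0^2
      }
    }
    i <- j + 1L
  }
  list(loglik = loglik, score = score, info = info)
}

# Three untied events, one binary covariate, evaluated at beta = 0.
res <- efron_pass(time = c(1, 2, 3), status = c(1, 1, 1),
                  X = matrix(c(0, 1, 0), ncol = 1), beta = 0)
res$loglik
```

A Newton-Raphson iteration would then update beta by solve(res$info, res$score) and repeat the pass.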

Usage

coxstream(
  formula,
  data,
  init = NULL,
  max_iter = 25L,
  tol = 1e-09,
  verbose = FALSE
)

Arguments

formula

A formula with a survival::Surv() response, e.g. Surv(time, event) ~ x1 + x2.

data

A data frame containing the variables in formula.

init

Optional numeric vector of starting values for beta (length p). Defaults to zero.

max_iter

Maximum Newton-Raphson iterations. Default 25.

tol

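Convergence tolerance on the largest absolute element of the score vector. Default 1e-9. Note that coxstream_arrow() uses a different criterion (the L2 norm of the Newton-Raphson step) and a different default (1e-8).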

verbose

Currently unused; reserved for future per-iteration output. Default FALSE.
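To make the role of max_iter and tol concrete, here is a minimal Newton-Raphson loop on a toy objective. Everything in it is hypothetical (newton_raphson(), the criterion argument, the toy function); it is not the package's code. It shows two stopping rules side by side: the largest absolute score element, as documented for coxstream(), and the L2 norm of the step, as documented for coxstream_arrow().

```r
# Generic Newton-Raphson maximiser; `score_info(beta)` must return a list
# with the score vector and the (observed) information matrix.
newton_raphson <- function(score_info, init, tol, max_iter = 25L,
                           criterion = c("score", "step")) {
  criterion <- match.arg(criterion)
  beta <- init
  for (iter in seq_len(max_iter)) {
    si <- score_info(beta)
    # Rule 1: stop when the largest absolute score element is below tol.
    if (criterion == "score" && max(abs(si$score)) < tol) {
      return(list(beta = beta, n_iter = iter - 1L, converged = TRUE))
    }
    step <- solve(si$info, si$score)
    beta <- beta + step
    # Rule 2: stop when the L2 norm of the update is below tol.
    if (criterion == "step" && sqrt(sum(step^2)) < tol) {
      return(list(beta = beta, n_iter = iter, converged = TRUE))
    }
  }
  list(beta = beta, n_iter = max_iter, converged = FALSE)
}

# Toy concave objective l(b) = sum(a * b - exp(b)), maximised at b = log(a).
a <- c(2, 5)
toy <- function(b) list(score = a - exp(b), info = diag(exp(b), length(b)))

fit_score <- newton_raphson(toy, init = c(0, 0), tol = 1e-9, criterion = "score")
fit_step  <- newton_raphson(toy, init = c(0, 0), tol = 1e-8, criterion = "step")
rbind(score_rule = fit_score$beta, step_rule = fit_step$beta, truth = log(a))
```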

Value

An object of class "coxstream" with components:

coefficients

Named numeric vector of fitted coefficients.

var

Variance-covariance matrix (inverse of observed information).

loglik

Log-likelihood at convergence.

n_iter

Number of Newton-Raphson (NR) iterations taken.

n

Number of rows.

formula

The formula used.

call

The matched call.

Examples

library(survival)
fit <- coxstream(Surv(time, status) ~ age + sex, data = lung)
coef(fit)
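The stated agreement with survival::coxph() can be checked directly. The sketch below assumes both coxstream and survival are installed and skips the comparison otherwise; the tolerance of 1e-6 is an arbitrary choice for illustration, not a documented guarantee.

```r
same <- NA  # stays NA when the packages are not installed
if (requireNamespace("coxstream", quietly = TRUE) &&
    requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  fit_stream <- coxstream::coxstream(Surv(time, status) ~ age + sex, data = lung)
  # Efron is coxph()'s default tie method; it is spelled out here for clarity.
  fit_ref <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "efron")
  same <- all.equal(unname(coef(fit_stream)), unname(coef(fit_ref)),
                    tolerance = 1e-6)
}
same
```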


Fit a Cox PH model by streaming a Parquet file sorted by descending time

Description

Like coxstream(), but reads the data from a Parquet file one row group at a time. Peak RAM is O(batch_size * p) for the active chunk plus O(p^2) for the carry state, independent of the total number of rows n. Uses the exact Efron tie correction: tie groups that span row-group boundaries are handled via local carry state, so the coefficients are bit-identical to those from coxstream() on the same data.

Usage

coxstream_arrow(
  parquet_path,
  x_cols,
  time_col = "duration",
  event_col = "event",
  init = NULL,
  max_iter = 25L,
  tol = 1e-08,
  batch_size = 250000L,
  verbose = TRUE
)

Arguments

parquet_path

Path to a Parquet file sorted by time in DESCENDING order.

x_cols

Character vector of covariate column names.

time_col

Column name for event/censoring time. Default "duration".

event_col

Column name for event indicator (1 = event). Default "event".

init

Optional starting values for beta (length p). Default zero.

max_iter

Maximum NR iterations. Default 25.

tol

Convergence tolerance on ||NR step|| (L2 norm of beta update). Default 1e-8. Same criterion as the Python coxstream implementations.

batch_size

Target number of rows per read call. Consecutive row groups are merged until their total reaches this size; each chunk is freed (followed by a gc()) before the next one is read, so peak RAM is O(batch_size * p) and flat in n. The default of 250 000 keeps RAM flat; larger chunks are slightly faster but let the allocator's high-water mark ratchet up, so RAM shows a mild upward drift again.

verbose

Print per-iteration progress. Default TRUE.
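The row-group merging described for batch_size can be pictured with a small planning function. This is one plausible reading of "merged until the total reaches this size", written for illustration: plan_batches() is a hypothetical helper, not part of the package, and the actual merging logic may differ.

```r
# Greedily group consecutive row groups until the running row count
# reaches batch_size; returns a list of row-group index vectors.
plan_batches <- function(row_group_rows, batch_size = 250000L) {
  batches <- list()
  current <- integer(0)
  total <- 0
  for (g in seq_along(row_group_rows)) {
    current <- c(current, g)
    total <- total + row_group_rows[g]
    if (total >= batch_size) {
      batches[[length(batches) + 1L]] <- current
      current <- integer(0)
      total <- 0
    }
  }
  # Whatever is left over forms a final, smaller batch.
  if (length(current) > 0) batches[[length(batches) + 1L]] <- current
  batches
}

# Five row groups of 100k, 100k, 100k, 100k and 50k rows:
# groups 1-3 form the first batch, groups 4-5 the second.
batches <- plan_batches(c(100000, 100000, 100000, 100000, 50000),
                        batch_size = 250000)
batches
```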

Details

Each NR iteration reads one row-group chunk at a time with mmap = FALSE, so the data is read with pread into heap buffers that are freed after each chunk. A memory-mapped reader would instead leave every touched file page resident for the lifetime of the mapping, making RSS grow as O(n). Each chunk is exported to a C ArrowArrayStream and consumed zero-copy in C++ by efron_stream_chunk_inplace(), with the Efron tie state carried across chunks in R. No columns are materialised at the R level (no as.vector, cbind, or concat_tables); that materialisation is what previously left a ~1.5x gap behind the Python streaming path.
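Why a tie group that straddles a chunk boundary needs carried state can be shown with a simplified sketch. It computes only the Efron log-likelihood for a fixed linear predictor eta, and it handles the boundary by holding back the trailing tie group of each chunk until the next chunk arrives. That is an assumption made to keep the example short: the package is described as carrying a compact tie state instead, and consume() and stream_loglik() are hypothetical names.

```r
# Score complete tie groups (rows sorted by descending time) and
# return the updated running state.
consume <- function(state, time, status, eta) {
  for (t in unique(time)) {            # unique() keeps first-appearance order
    g <- time == t
    w <- exp(eta[g])
    state$S0 <- state$S0 + sum(w)      # all rows at time t join the risk set
    ev <- status[g] == 1
    d <- sum(ev)
    if (d > 0) {
      D0 <- sum(w[ev])
      k <- seq_len(d) - 1
      state$loglik <- state$loglik + sum(eta[g][ev]) -
        sum(log(state$S0 - k / d * D0))
    }
  }
  state
}

# Process chunks in order; the last tie group of each chunk is held back
# because the next chunk may contain more rows with the same time.
stream_loglik <- function(chunks) {
  state <- list(S0 = 0, loglik = 0)
  pending <- NULL
  for (chunk in chunks) {
    chunk <- rbind(pending, chunk)
    last <- chunk$time == chunk$time[nrow(chunk)]
    pending <- chunk[last, , drop = FALSE]
    done <- chunk[!last, , drop = FALSE]
    state <- consume(state, done$time, done$status, done$eta)
  }
  consume(state, pending$time, pending$status, pending$eta)$loglik
}

d <- data.frame(time   = c(5, 4, 4, 4, 2, 1),
                status = c(1, 1, 0, 1, 1, 0),
                eta    = c(0.1, -0.2, 0.3, 0, 0.5, -0.1))

whole <- stream_loglik(list(d))                    # one chunk
split <- stream_loglik(list(d[1:3, ], d[4:6, ]))   # boundary cuts the tie at time 4
all.equal(whole, split)
```

Scoring the rows at time 4 separately in each chunk would apply the Efron correction to two partial groups and change the result, which is why the boundary has to be bridged.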

Value

A "coxstream" object (same class as coxstream()).
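A minimal usage sketch, assuming the arrow and coxstream packages are installed (the fit is skipped otherwise). The simulated data, the column names x1 and x2, and the row-group size of 200 are made up for illustration; the key step is sorting by time in descending order before writing the file.

```r
set.seed(1)
n <- 1000
df <- data.frame(
  duration = rexp(n),
  event    = rbinom(n, 1, 0.7),
  x1       = rnorm(n),
  x2       = rnorm(n)
)

# The reader expects rows sorted by time, largest first.
df <- df[order(df$duration, decreasing = TRUE), ]

if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("coxstream", quietly = TRUE)) {
  path <- tempfile(fileext = ".parquet")
  # chunk_size sets the rows per row group, giving several groups to stream.
  arrow::write_parquet(df, path, chunk_size = 200)
  fit <- coxstream::coxstream_arrow(path, x_cols = c("x1", "x2"),
                                    verbose = FALSE)
  print(coef(fit))
  unlink(path)
}
```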
