Repository Mirror for your Cloud Server and Webhosting

Title:

Regression with Interval-Censored Covariates

Version:

0.2.0

Description:

Provides functions to simulate and analyze data for a regression model with an interval censored covariate, as described in Morrison et al. (2021) <doi:10.1111/biom.13472>.

License:

MIT + file LICENSE

Encoding:

UTF-8

VignetteBuilder:

knitr

Config/testthat/edition:

Imports:

biglm, dplyr (≥ 1.1.0), lubridate, stats, lobstr, arm, ggplot2, scales, plotly, rlang, ggrepel

Suggests:

spelling, rmarkdown, knitr, testthat (≥ 3.0.0), markdown, pander, vdiffr, withr, badger

Language:

en-US

URL:

https://d-morrison.github.io/rwicc/, https://github.com/d-morrison/rwicc

BugReports:

https://github.com/d-morrison/rwicc/issues

Depends:

R (≥ 4.1.0)

Config/roxygen2/version:

8.0.0

NeedsCompilation:

Packaged:

2026-05-27 06:40:28 UTC; ezramorrison

Author:

Douglas Ezra Morrison

[aut, cre, cph], Ron Brookmeyer [aut]

Maintainer:

Douglas Ezra Morrison <demorrison@ucdavis.edu>

Repository:

CRAN

Date/Publication:

2026-05-27 07:10:02 UTC

rwicc: Regression with Interval-Censored Covariates

Description

Provides functions to simulate and analyze data for a regression model with an interval censored covariate, as described in Morrison et al. (2021) doi:10.1111/biom.13472.

Author(s)

Maintainer: Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]

Authors:

Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]
Ron Brookmeyer

Build table of event date possibilities

Description

Build table of event date possibilities

Usage

build_event_date_possibilities_table(
  participant_level_data,
  bin_width = 1,
  omega_hat = build_omega_table(participant_level_data, bin_width = bin_width)
)

Arguments

participant_level_data

a data.frame with columns:

ID: participant identifier
Stratum: indicator for which population stratum the participant belongs to
L: left censoring interval endpoint
R: right censoring interval endpoint

bin_width

number of days per bin

omega_hat

a data.frame from build_omega_table() representing the seroconversion hazard model.

Value

a data.frame with columns:

ID: participant identifier
Stratum: indicator for which population stratum the participant belongs to
S: possible seroconversion dates

Examples

library(dplyr)
simulate_interval_censoring()$pt_data |>
  mutate(Stratum = 1) |>
  build_event_date_possibilities_table()

Build table of dates with possible seroconversions in dataset

Description

Build table of dates with possible seroconversions in dataset

Usage

build_omega_table(participant_level_data, bin_width = 1)

Arguments

participant_level_data

a data.frame with columns:

Stratum: indicator for which population stratum the participant belongs to
L: left censoring interval endpoint
R: right censoring interval endpoint

bin_width

number of days per bin

Value

a data.frame

convert a pair of simple logistic regression coefficients into P(Y|T) curve:

Description

convert a pair of simple logistic regression coefficients into P(Y|T) curve:

Usage

build_phi_function_from_coefs(coefs)

Arguments

coefs

numeric vector of coefficients

Value

function(t) P(Y=1|T=t)

Validate participant-level data

Description

Internal helper function that performs basic validation checks on participant-level data to ensure data integrity before model fitting.

Usage

check_pt_data(participant_level_data)

Arguments

participant_level_data

A data.frame or tibble containing participant-level data with at least the following columns:

ID: participant ID
L: date of last negative test for seroconversion
R: date of first positive test for seroconversion

Value

NULL (invisibly). The function throws an error if validation fails.

compute mean window period duration from simple logistic regression coefficients

Description

compute mean window period duration from simple logistic regression coefficients

Usage

compute_mu(theta)

Arguments

theta

numeric vector of coefficients

Value

numeric scalar: mean window period duration

Filter dataset by participant IDs

Description

Internal helper function that filters all data frames within a dataset structure to include only the specified participant IDs.

Usage

filter_data_by_ID(dataset, included_IDs)

Arguments

dataset

A list containing data frames output from simulate_interval_censoring(), including pt_data, obs_data0, and obs_data components.

included_IDs

A character vector of participant IDs to retain in the filtered dataset.

Value

A list with the same structure as dataset, but with all data frames filtered to include only rows where the ID is in included_IDs.

Fit a logistic regression model with an interval-censored covariate

Description

This function fits a logistic regression model for a binary outcome Y with an interval-censored covariate T, using an EM algorithm, as described in Morrison et al (2021); doi:10.1111/biom.13472.

Usage

fit_joint_model(
  participant_level_data,
  obs_level_data,
  model_formula = stats::formula(Y ~ T),
  mu_function = compute_mu,
  bin_width = 1,
  denom_offset = 0.1,
  EM_toler_loglik = 0.1,
  EM_toler_est = 1e-04,
  EM_max_iterations = Inf,
  glm_tolerance = 1e-07,
  glm_maxit = 20,
  initial_S_estimate_location = 0.25,
  coef_change_metric = "max abs rel diff coefs",
  verbose = FALSE
)

Arguments

participant_level_data

a data.frame or tibble with the following variables:

ID: participant ID
E: study enrollment date
L: date of last negative test for seroconversion
R: date of first positive test for seroconversion
Cohort' (optional): this variable can be used to stratify the modeling of the seroconversion distribution.

obs_level_data

a data.frame or tibble with the following variables:

ID: participant ID
O: biomarker sample collection dates
Y: MAA classifications (binary outcomes)

model_formula

the functional form for the regression model for p(y|t) (as a formula() object)

mu_function

a function taking a vector of regression coefficient estimates as input and outputting an estimate of mu (mean duration of MAA-positive infection).

bin_width

the number of days between possible seroconversion dates (should be an integer)

denom_offset

an offset value added to the denominator of the hazard estimates to improve numerical stability

EM_toler_loglik

the convergence cutoff for the log-likelihood criterion ("Delta_L" in the paper)

EM_toler_est

the convergence cutoff for the parameter estimate criterion ("Delta_theta" in the paper)

EM_max_iterations

the number of EM iterations to perform before giving up if still not converged.

glm_tolerance

the convergence cutoff for the glm fit in the M step

glm_maxit

the iterations cutoff for the glm fit in the M step

initial_S_estimate_location

determines how seroconversion date is guessed to initialize the algorithm; can be any decimal between 0 and 1; 0.5 = midpoint imputation, 0.25 = 1st quartile, 0 = last negative, etc.

coef_change_metric

a string indicating the type of parameter estimate criterion to use:

"max abs rel diff coefs" is the "Delta_theta" criterion described in the paper.
"max abs diff coefs" is the maximum absolute change in the coefficients (not divided by the old values); this criterion can be useful when some parameters are close to 0.
"diff mu" is the absolute change in mu, which may be helpful in the incidence estimate calibration setting but not elsewhere.

verbose

whether to print algorithm progress details to the console

Value

a list with the following elements:

Theta: the estimated regression coefficients for the model of p(Y|T)
Mu: the estimated mean window period (a transformation of Theta)
Omega: a table with the estimated parameters for the model of p(S|E).
converged: indicator of whether the algorithm reached its cutoff criteria before reaching the specified maximum iterations. 1 = reached cutoffs, 0 = not.
iterations: the number of EM iterations completed before the algorithm stopped.
convergence_metrics: the four convergence metrics
convergence_stats: a table of the log-likelihood at each iteration

References

Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi:10.1111/biom.13472.

Examples

## Not run: 

# simulate data:
study_data <- simulate_interval_censoring()

# fit model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = study_data$obs_data,
  participant_level_data = study_data$pt_data
)

## End(Not run)

Fit model using midpoint imputation

Description

Fit model using midpoint imputation

Usage

fit_midpoint_model(
  participant_level_data,
  obs_level_data,
  maxit = 1000,
  tolerance = 1e-08
)

Arguments

participant_level_data

a data.frame or tibble with the following variables:

ID: participant ID
E: study enrollment date
L: date of last negative test for seroconversion
R: date of first positive test for seroconversion
Cohort' (optional): this variable can be used to stratify the modeling of the seroconversion distribution.

obs_level_data

a data.frame or tibble with the following variables:

ID: participant ID
O: biomarker sample collection dates
Y: MAA classifications (binary outcomes)

maxit

maximum iterations, passed to bigglm

tolerance

convergence criterion, passed to bigglm

Value

a vector of logistic regression coefficient estimates

Examples

sim_data <- simulate_interval_censoring(
  "theta" = c(0.986, -3.88),
  "study_cohort_size" = 4500,
  "preconversion_interval_length" = 365,
  "hazard_alpha" = 1,
  "hazard_beta" = 0.5
)

theta_est_midpoint <- fit_midpoint_model(
  obs_level_data = sim_data$obs_data,
  participant_level_data = sim_data$pt_data
)

Fit model using uniform imputation

Description

Fit model using uniform imputation

Usage

fit_uniform_model(
  participant_level_data,
  obs_level_data,
  maxit = 1000,
  tolerance = 1e-08,
  n_imputations = 10
)

Arguments

participant_level_data

a data.frame or tibble with the following variables:

ID: participant ID
E: study enrollment date
L: date of last negative test for seroconversion
R: date of first positive test for seroconversion
Cohort' (optional): this variable can be used to stratify the modeling of the seroconversion distribution.

obs_level_data

a data.frame or tibble with the following variables:

ID: participant ID
O: biomarker sample collection dates
Y: MAA classifications (binary outcomes)

maxit

maximum iterations, passed to bigglm

tolerance

convergence criterion, passed to bigglm

n_imputations

number of imputed data sets to create

Value

a vector of logistic regression coefficient estimates

Examples

sim_data <- simulate_interval_censoring(
  "theta" = c(0.986, -3.88),
  "study_cohort_size" = 4500,
  "preconversion_interval_length" = 365,
  "hazard_alpha" = 1,
  "hazard_beta" = 0.5
)

theta_est_midpoint <- fit_uniform_model(
  obs_level_data = sim_data$obs_data,
  participant_level_data = sim_data$pt_data
)

Plot the estimated seroconversion-date distribution for one participant

Description

Plot the estimated seroconversion-date distribution for one participant

Usage

graph_S(subject_level_data_possibilities, id = 1)

Arguments

subject_level_data_possibilities

a data.frame of per-subject seroconversion-date possibilities with their estimated probabilities

id

the participant ID to plot

Value

a ggplot2::ggplot

Graph seroconversion hazard model

Description

Graph seroconversion hazard model

Usage

graph_omega(omega)

Arguments

omega

a data.frame containing parameter values for the seroconversion hazard model

Value

a ggplot2::ggplot

Examples

example_model <-
  system.file("extdata", "example_model.rds", package = "rwicc") |>
  readRDS()
omega_est_EM <- example_model$Omega
omega_est_EM |> graph_omega()

Plot simulated seroconversion density curves

Description

Plots example seroconversion-time densities for several intercept/slope hazard combinations.

Usage

graph_simulated_densities()

Value

a ggplot2::ggplot

Plot simulated seroconversion hazard curves

Description

Plots example linear seroconversion hazards for several intercept/slope combinations.

Usage

graph_simulated_hazards()

Value

a ggplot2::ggplot

Plot simulated seroconversion survival curves

Description

Plots example seroconversion-time survival curves for several intercept/slope hazard combinations.

Usage

graph_simulated_survival_curves()

Value

a ggplot2::ggplot

plot estimated and true CDFs for seroconversion date distribution

Description

plot estimated and true CDFs for seroconversion date distribution

Usage

plot_CDF(true_hazard_alpha, true_hazard_beta, omega_hat)

Arguments

true_hazard_alpha

The data-generating hazard at the start of the study

true_hazard_beta

The change in data-generating hazard per calendar year

omega_hat

tibble of estimated discrete hazards

Value

a ggplot

Examples

## Not run: 

hazard_alpha <- 1
hazard_beta <- 0.5
study_data <- simulate_interval_censoring(
  "hazard_alpha" = hazard_alpha,
  "hazard_beta" = hazard_beta
)

# fit model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = study_data$obs_data,
  participant_level_data = study_data$pt_data
)
plot1 <- plot_CDF(
  true_hazard_alpha = hazard_alpha,
  true_hazard_beta = hazard_beta,
  omega_hat = EM_algorithm_outputs$Omega
)

print(plot1)

## End(Not run)

Plot censoring data

Description

Plot censoring data

Usage

plot_censoring_data(
  dataset,
  included_IDs = unique(dataset$pt_data$ID),
  label_size = 5,
  point_size = 5,
  s_vjust = 2,
  labelled_IDs = included_IDs,
  xmin = min(dataset$pt_data$E) - 28,
  xmax = max(dataset$obs_data$O)
)

Arguments

dataset

output from simulate_interval_censoring()

included_IDs

character vector of IDs from dataset to include

label_size

numeric: passed to ggrepel::geom_text_repel()'s size argument

point_size

numeric: passed to ggplot2::geom_point's size argument

s_vjust

passed to ggrepel::geom_text_repel's vjust argument

labelled_IDs

character vector indicating which IDs to label events for

xmin

minimum displayed value for x-axis

xmax

maximum displayed value for x-axis

Value

a ggplot

Plot true and estimated curves for P(Y=1|T=t)

Description

Plot true and estimated curves for P(Y=1|T=t)

Usage

plot_phi_curves(
  theta_true,
  theta.hat_joint,
  theta.hat_midpoint,
  theta.hat_uniform
)

Arguments

theta_true

the coefficients of the data-generating model P(Y=1|T=t)

theta.hat_joint

the estimated coefficients from the joint model

theta.hat_midpoint

the estimated coefficients from midpoint imputation

theta.hat_uniform

the estimated coefficients from uniform imputation

Value

a ggplot

Examples

## Not run: 

theta_true <- c(0.986, -3.88)
hazard_alpha <- 1
hazard_beta <- 0.5
sim_data <- simulate_interval_censoring(
  "theta" = theta_true,
  "study_cohort_size" = 4500,
  "preconversion_interval_length" = 365,
  "hazard_alpha" = hazard_alpha,
  "hazard_beta" = hazard_beta
)

# extract the participant-level and observation-level simulated data:
sim_participant_data <- sim_data$pt_data
sim_obs_data <- sim_data$obs_data
rm(sim_data)

# joint model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data,
  bin_width = 7,
  verbose = FALSE
)

# midpoint imputation:
theta_est_midpoint <- fit_midpoint_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data
)

# uniform imputation:
theta_est_uniform <- fit_uniform_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data
)

plot2 <- plot_phi_curves(
  theta_true = theta_true,
  theta.hat_uniform = theta_est_uniform,
  theta.hat_midpoint = theta_est_midpoint,
  theta.hat_joint = EM_algorithm_outputs$Theta
)

print(plot2)

## End(Not run)

rwicc: Regression with Interval-Censored Covariates

Description

The rwicc package implements a regression model with an interval-censored covariate using an EM algorithm, as described in Morrison et al (2021); doi:10.1111/biom.13472.

rwicc functions

The main rwicc functions are:

simulate_interval_censoring
fit_joint_model

Author(s)

Maintainer: Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]

Authors:

Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]
Ron Brookmeyer

References

Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi:10.1111/biom.13472.

Seroconversion cumulative hazard function

Description

Cumulative hazard obtained by integrating seroconversion_hazard_function().

Usage

seroconversion_cumhaz_function(t, intercept, slope, entry_time = 0)

Arguments

t

numeric vector of times since study start (years)

intercept

hazard at study start

slope

change in hazard per year

entry_time

time of study entry; the hazard is zero before this

Value

a numeric vector of cumulative hazard values

Seroconversion density function

Description

Probability density, survival * hazard.

Usage

seroconversion_density_function(...)

Arguments

...

arguments passed to seroconversion_survival_function() and seroconversion_hazard_function()

Value

a numeric vector of density values

Seroconversion hazard function

Description

Linear instantaneous hazard of seroconversion, intercept + slope * t, equal to zero before entry_time.

Usage

seroconversion_hazard_function(t, intercept, slope, entry_time = 0)

Arguments

t

numeric vector of times since study start (years)

intercept

hazard at study start

slope

change in hazard per year

entry_time

time of study entry; the hazard is zero before this

Value

a numeric vector of hazard values

Inverse survival function for time-to-event variable with linear hazard function

Description

This function determines the seroconversion date corresponding to a provided probability of survival. See doi:10.1111/biom.13472, Supporting Information, Section A.4.

Usage

seroconversion_inverse_survival_function(u, e, hazard_alpha, hazard_beta)

Arguments

u

a vector of seroconversion survival probabilities

e

a vector of time differences between study start and enrollment (in years)

hazard_alpha

the instantaneous hazard of seroconversion on the study start date

hazard_beta

the change in hazard per year after study start date

Value

numeric vector of time differences between study start and seroconversion (in years)

References

Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics, doi:10.1111/biom.13472.

Seroconversion survival function

Description

Survival probability, ⁠exp(-cumulative hazard)⁠.

Usage

seroconversion_survival_function(...)

Arguments

...

arguments passed to seroconversion_cumhaz_function()

Value

a numeric vector of survival probabilities

Simulate a dataset with interval-censored seroconversion dates

Description

simulate_interval_censoring generates a simulated data set from a data-generating model based on the typical structure of a cohort study of HIV biomarker progression, as described in Morrison et al (2021); doi:10.1111/biom.13472.

Usage

simulate_interval_censoring(
  study_cohort_size = 4500,
  probability_of_ever_seroconverting = 0.05,
  n_at_risk = stats::rbinom(n = 1, size = study_cohort_size, prob =
    probability_of_ever_seroconverting),
  hazard_alpha = 1,
  hazard_beta = 0.5,
  preconversion_interval_length = 84,
  theta = c(0.986, -3.88),
  years_in_study = 10,
  max_scheduling_offset = 7,
  days_from_study_start_to_recruitment_end = 365,
  study_start_date = lubridate::ymd("2001-01-01")
)

Arguments

study_cohort_size

the number of participants to simulate (N_0 in the paper)

probability_of_ever_seroconverting

the probability that each participant is at risk of HIV seroconversion

n_at_risk

number of participants who are at risk of infection; by default, this number is determined stochastically from study_cohort_size and probability_of_ever_seroconverting.

hazard_alpha

the hazard (instantaneous risk) of seroconversion at the start date of the cohort study for those participants at risk of seroconversion

hazard_beta

the change in hazard per calendar year

preconversion_interval_length

the number of days between tests for seroconversion

theta

the parameters of a logistic model (with linear functional from) specifying the probability of MAA-positive biomarkers as a function of time since seroconversion

years_in_study

the duration of follow-up for each participant

max_scheduling_offset

the maximum divergence of pre-seroconversion followup visits from the prescribed schedule

days_from_study_start_to_recruitment_end

the length of the recruitment period

study_start_date

the date when the study starts recruitment ("d_0" in the main text). The value of this parameter does not affect the simulation results; it is only necessary as a reference point for generating E, L, R, O, and S.

Value

A list containing the following two tibbles:

pt_data: a tibble of participant-level information, with the following columns:
- ID: participant ID
- E: enrollment date
- L: date of last HIV test prior to seroconversion
- R: date of first HIV test after seroconversion
obs_data: a tibble of longitudinal observations with the following columns:
- ID: participant ID
- O: dates of biomarker sample collection
- Y: MAA classifications of biomarker samples

References

Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi:10.1111/biom.13472.

Examples

study_data <- simulate_interval_censoring()
participant_characteristics <- study_data$pt_data
longitudinal_observations <- study_data$obs_data

Create a ggplot with standard theme settings

Description

Internal helper function that initializes a ggplot object with consistent theme and styling settings used across multiple plotting functions in the package.

Usage

standard_ggplot(data)

Arguments

data

A data frame to be passed to ggplot2::ggplot().

Value

A ggplot2::ggplot() object with standard theme settings applied, including a black and white theme, custom axis styling, and legend positioning at the bottom.

Update subject-level data with posterior probabilities

Description

Internal helper function that updates subject-level data by computing various conditional probabilities used in the EM algorithm for fitting the joint model. This function combines observation-level predictions with participant-level data and omega estimates to compute posterior probabilities needed for parameter estimation.

Usage

update_possible_subj_data(
  obs_data_possibilities,
  MAA_model,
  participant_level_data,
  omega_hat
)

Arguments

obs_data_possibilities

A data.frame containing all possible observation-level data combinations, including ID, S (possible seroconversion dates), and Y (MAA classifications).

MAA_model

A fitted stats::glm() model object used to predict P(Y=1|T=t), the probability of MAA-positive status given time since seroconversion.

participant_level_data

A data.frame containing participant-level information, including ID, Stratum, and P(S>=l|E=e) (the probability that seroconversion occurs on or after the last negative test date).

omega_hat

A data.frame containing estimated parameters for the seroconversion date distribution model, including S, Stratum, P(S=s|S>=s,E=e), and P(S>s|S>=s,E=e).

Value

A data.frame containing updated subject-level data with the following columns:

ID: Participant identifier
Stratum: Stratification variable
S: Possible seroconversion date
⁠P(Y=y|T=t)⁠: Probability of observed MAA outcomes given time since seroconversion
⁠P(S=s|E=e)⁠: Marginal probability of seroconversion at time s given enrollment date
P(S=s|e,l,r,o,y): Posterior probability of seroconversion at time s given all observed data (used to estimate omega and theta)
P(S>=s|e,l,r,o,y): Posterior probability that seroconversion occurs on or after time s (used to estimate omega)

Package {rwicc}

rwicc: Regression with Interval-Censored Covariates

Description

Author(s)

See Also

Build table of event date possibilities

Description

Usage

Arguments

Value

Examples

Build table of dates with possible seroconversions in dataset

Description

Usage

Arguments

Value

convert a pair of simple logistic regression coefficients into P(Y|T) curve:

Description

Usage

Arguments

Value

Validate participant-level data

Description

Usage

Arguments

Value

compute mean window period duration from simple logistic regression coefficients

Description

Usage

Arguments

Value

Filter dataset by participant IDs

Description

Usage

Arguments

Value

Fit a logistic regression model with an interval-censored covariate

Description

Usage

Arguments

Value

References

Examples

Fit model using midpoint imputation

Description

Usage

Arguments

Value

Examples

Fit model using uniform imputation

Description

Usage

Arguments

Value

Examples

Plot the estimated seroconversion-date distribution for one participant

Description

Usage

Arguments

Value

Graph seroconversion hazard model

Description

Usage

Arguments

Value

Examples

Plot simulated seroconversion density curves

Description

Usage

Value

Plot simulated seroconversion hazard curves

Description

Usage

Value

Plot simulated seroconversion survival curves

Description

Usage

Value

plot estimated and true CDFs for seroconversion date distribution

Description