The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

HCPclust

HCPclust is an R package for Hierarchical Conformal Prediction (HCP), designed for clustered data where each subject has repeated measurements with within-subject dependence, and outcomes can be missing under a covariate-dependent Missing At Random (MAR) mechanism.

Installation

You can install the development version of HCPclust directly from GitHub using:

install.packages("remotes")
remotes::install_github("judywangstat/HCP")
library(HCPclust)

Method overview

The proposed method consists of four steps.

Step (1) Model fitting

Both models are estimated on a subject-level training split to capture the distribution of \(Y \mid X\) and correct for covariate-dependent missingness.

Step (2) Subsampled calibration

This preserves the clustered structure while reducing the impact of within-subject dependence on conformal calibration.

Step (3) Weighted conformal scoring

Step (4) Aggregation

The final conformal prediction region contains all values \(y\) such that \(p_{\text{final}}(y) > \alpha.\)

What’s included

1) Data generator

2) Conditional density estimator

3) Missingness propensity model

4) HCP conformal prediction

5) Plotting

Quick examples

This section shows (i) how to call hcp_predict_targets() under the four common test-data scenarios, and (ii) how to generate and display two plots from plot_hcp_intervals().

A) Four scenarios for testing data (via hcp_predict_targets())

Generate training data (fixed across all cases)

set.seed(1)
dat_train <- generate_clustered_mar(
  n = 200, m = 4, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  target_missing = 0.30,
  seed = 1
)
y_grid <- seq(-6, 6, length.out = 201)

Case 1: one patient, one target

test_11 <- data.frame(pid = 1, X1 = 2.5)
out_11 <- hcp_predict_targets(
  dat = dat_train, test = test_11,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_11$pred

Case 2: one patient, multiple targets

test_1M <- data.frame(pid = 1, X1 = c(1, 3, 7, 9))
out_1M <- hcp_predict_targets(
  dat = dat_train, test = test_1M,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_1M$pred

Case 3: multiple patients, one target each

test_P1 <- data.frame(pid = 1:4, X1 = c(2, 4, 6, 8))
out_P1 <- hcp_predict_targets(
  dat = dat_train, test = test_P1,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_P1$pred

Case 4: multiple patients, multiple targets per patient

test_PM <- data.frame(
  pid = c(1,1, 2,2,2, 3,3),
  X1  = c(1,6,  2,5,9,  3,8)
)
out_PM <- hcp_predict_targets(
  dat = dat_train, test = test_PM,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_PM$pred

B) Two plots

Generate training data and test data

dat_train <- generate_clustered_mar(
  n = 200, m = 20, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  hetero_gamma = 2.5,
  target_missing = 0.30,
  seed = 1
)
y_grid <- seq(-6, 10, length.out = 201)

# test data with latent truth
dat_test <- generate_clustered_mar(
  n = 100, m = 20, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  hetero_gamma = 2.5,
  seed = 999
)

Plot A: one patient, multiple targets (band vs time)

pid <- dat_test$id[1]
idx <- which(dat_test$id == pid)
idx <- idx[order(dat_test$X1[idx])][1:10]
test_1M <- data.frame(pid = pid, X1 = dat_test$X1[idx], y_true = dat_test$Y_full[idx])

out_1M <- hcp_predict_targets(
  dat = dat_train, test = test_1M,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  S = 3, B = 3,
  seed = 1
)
plot_hcp_intervals(
  out_1M$pred, mode = "band", x_col = "X1",
  y_true_col = "y_true", show_true = TRUE,
  main = "Case A: one patient, multiple time points (band vs time)"
)

Plot B: multiple patients, one target each (intervals by patient)

pids <- unique(dat_test$id)[1:20]
test_P1 <- subset(dat_test, id %in% pids & j == 1, select = c(id, X1, Y_full))
names(test_P1) <- c("pid", "X1", "y_true")

out_P1 <- hcp_predict_targets(
  dat = dat_train, test = test_P1,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  S = 3, B = 3,
  seed = 1
)
plot_hcp_intervals(
  out_P1$pred, mode = "pid", pid_col = "pid", x_sort_col = "X1",
  y_true_col = "y_true", show_true = TRUE,
  main = "Case B: multiple patients, one time point (by patient)"
)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.