The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
HCPclust is an R package for Hierarchical Conformal Prediction (HCP), designed for clustered data where each subject has repeated measurements with within-subject dependence, and outcomes can be missing under a covariate-dependent Missing At Random (MAR) mechanism.
You can install the development version of HCPclust directly from GitHub using:
install.packages("remotes")
remotes::install_github("judywangstat/HCP")
library(HCPclust)The proposed method consists of four steps.
Step (1) Model fitting
fit_cond_density_quantile().fit_missingness_propensity().Both models are estimated on a subject-level training split to capture the distribution of \(Y \mid X\) and correct for covariate-dependent missingness.
Step (2) Subsampled calibration
This preserves the clustered structure while reducing the impact of within-subject dependence on conformal calibration.
Step (3) Weighted conformal scoring
Step (4) Aggregation
The final conformal prediction region contains all values \(y\) such that \(p_{\text{final}}(y) > \alpha.\)
generate_clustered_mar()
hetero_gamma,rho,target_missing.fit_cond_density_quantile()
method = "rq": linear quantile regression
(quantreg)method = "qrf": quantile random forest
(quantregForest)fit_missingness_propensity()
logistic: logistic regression (glm)grf: generalized random forests
(grf::probability_forest)boosting: gradient boosting (xgboost)[eps, 1 - eps] to
stabilize inverse-propensity weighting.hcp_conformal_region()
y_grid,
together with the interval endpoints [lo, hi].hcp_predict_targets()
hcp_conformal_region() that
automatically adapts to different test-data structures, including:
alpha_eff = alpha / M_p when a
patient has M_p > 1 target points.plot_hcp_intervals()
mode = "band": for one subject with repeated
measurements (e.g., over time), plots an interval band as a function of
a covariate.mode = "pid": for multiple subjects with one target
point each, displays one prediction interval per subject, optionally
sorted by a covariate.This section shows (i) how to call hcp_predict_targets()
under the four common test-data scenarios, and (ii) how to generate and
display two plots from plot_hcp_intervals().
hcp_predict_targets())set.seed(1)
dat_train <- generate_clustered_mar(
n = 200, m = 4, d = 1,
x_dist = "uniform", x_params = list(min = 0, max = 10),
target_missing = 0.30,
seed = 1
)
y_grid <- seq(-6, 6, length.out = 201)test_11 <- data.frame(pid = 1, X1 = 2.5)
out_11 <- hcp_predict_targets(
dat = dat_train, test = test_11,
x_cols = "X1", y_grid = y_grid,
alpha = 0.1,
seed = 1
)
out_11$predtest_1M <- data.frame(pid = 1, X1 = c(1, 3, 7, 9))
out_1M <- hcp_predict_targets(
dat = dat_train, test = test_1M,
x_cols = "X1", y_grid = y_grid,
alpha = 0.1,
seed = 1
)
out_1M$predtest_P1 <- data.frame(pid = 1:4, X1 = c(2, 4, 6, 8))
out_P1 <- hcp_predict_targets(
dat = dat_train, test = test_P1,
x_cols = "X1", y_grid = y_grid,
alpha = 0.1,
seed = 1
)
out_P1$predtest_PM <- data.frame(
pid = c(1,1, 2,2,2, 3,3),
X1 = c(1,6, 2,5,9, 3,8)
)
out_PM <- hcp_predict_targets(
dat = dat_train, test = test_PM,
x_cols = "X1", y_grid = y_grid,
alpha = 0.1,
seed = 1
)
out_PM$preddat_train <- generate_clustered_mar(
n = 200, m = 20, d = 1,
x_dist = "uniform", x_params = list(min = 0, max = 10),
hetero_gamma = 2.5,
target_missing = 0.30,
seed = 1
)
y_grid <- seq(-6, 10, length.out = 201)
# test data with latent truth
dat_test <- generate_clustered_mar(
n = 100, m = 20, d = 1,
x_dist = "uniform", x_params = list(min = 0, max = 10),
hetero_gamma = 2.5,
seed = 999
)pid <- dat_test$id[1]
idx <- which(dat_test$id == pid)
idx <- idx[order(dat_test$X1[idx])][1:10]
test_1M <- data.frame(pid = pid, X1 = dat_test$X1[idx], y_true = dat_test$Y_full[idx])
out_1M <- hcp_predict_targets(
dat = dat_train, test = test_1M,
x_cols = "X1", y_grid = y_grid,
alpha = 0.1,
S = 3, B = 3,
seed = 1
)
plot_hcp_intervals(
out_1M$pred, mode = "band", x_col = "X1",
y_true_col = "y_true", show_true = TRUE,
main = "Case A: one patient, multiple time points (band vs time)"
)
pids <- unique(dat_test$id)[1:20]
test_P1 <- subset(dat_test, id %in% pids & j == 1, select = c(id, X1, Y_full))
names(test_P1) <- c("pid", "X1", "y_true")
out_P1 <- hcp_predict_targets(
dat = dat_train, test = test_P1,
x_cols = "X1", y_grid = y_grid,
alpha = 0.1,
S = 3, B = 3,
seed = 1
)
plot_hcp_intervals(
out_P1$pred, mode = "pid", pid_col = "pid", x_sort_col = "X1",
y_true_col = "y_true", show_true = TRUE,
main = "Case B: multiple patients, one time point (by patient)"
)
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.