The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
clusteredMSM provides nonparametric analysis of
clustered multistate process data, based on Bakoyannis (2021) and
Bakoyannis & Bandyopadhyay (2022). It exposes one user-facing
function, patp(), that:
The package depends only on survival.
clusteredMSM expects data in interval
format: one row per mutually-exclusive time interval per
subject, with columns for interval start time (Tstart), end
time (Tstop), starting state (Sstart), and
ending state (Sstop). The column names are flexible — they
are passed by name through the formula.
Censoring is encoded as Sstart == Sstop on the
final row of a subject’s record. Within each subject,
intervals must be temporally contiguous
(Tstop[k] == Tstart[k+1]) and state contiguous
(Sstop[k] == Sstart[k+1]).
library(clusteredMSM)
# 1=Healthy, 2=Ill, 3=Dead. Allowed: 1->2, 1->3, 2->3.
tmat <- trans_mat(
list(c(2, 3), 3, integer(0)),
names = c("Healthy", "Ill", "Dead")
)
tmat
#> to
#> from Healthy Ill Dead
#> Healthy NA 1 2
#> Ill NA NA 3
#> Dead NA NA NAA small example dataset in interval format:
mydata <- data.frame(
pid = c(1, 1, 2, 3, 4, 4),
site = c(1, 1, 1, 2, 2, 2),
treatment = c("A", "A", "A", "B", "B", "B"),
t0 = c(0.0, 1.5, 0.0, 0.0, 0.0, 1.0),
t1 = c(1.5, 3.0, 2.0, 1.0, 1.0, 2.5),
s0 = c(1, 2, 1, 1, 1, 2),
s1 = c(2, 3, 1, 3, 2, 3)
)
mydata
#> pid site treatment t0 t1 s0 s1
#> 1 1 1 A 0.0 1.5 1 2
#> 2 1 1 A 1.5 3.0 2 3
#> 3 2 1 A 0.0 2.0 1 1
#> 4 3 2 B 0.0 1.0 1 3
#> 5 4 2 B 0.0 1.0 1 2
#> 6 4 2 B 1.0 2.5 2 3Subject 1: H -> I -> D. Subject 2: censored healthy at t = 2.0
(s0 == s1). Subject 3: died at t = 1.0. Subject 4: H ->
I -> D.
fit <- patp(
msm(t0, t1, s0, s1) ~ 1,
data = mydata,
tmat = tmat,
id = "pid",
cluster = "site",
h = 1, j = 2, s = 0,
B = 1000, cband = TRUE,
seed = 1
)
fitfit$curves contains the time-indexed point estimate of
\[\begin{eqnarray*}
P(X(t) = j \,|\, X(s) = h) &=& P(X(t) = 2 \,|\, X(0) = 1) \\
&=& P(X(t) = 2),
\end{eqnarray*}\] (second equality due to the fact that all start
at state = 1 in the classical illness-death model), standard error,
pointwise confidence interval, and (with cband = TRUE)
simultaneous band limits.
If the formula’s right-hand side names a binary grouping variable,
patp() automatically estimates both curves and conducts a
Kolmogorov-Smirnov-type test:
tt <- patp(
msm(t0, t1, s0, s1) ~ treatment,
data = mydata,
tmat = tmat,
id = "pid",
cluster = "site",
h = 1, j = 2, s = 0,
B = 1000,
seed = 1
)
ttThe $curves slot now has one row block per group. The
$test slot contains the observed K-S statistic and the
bootstrap p-value.
For processes with cyclic transitions (e.g., illness-death with
recovery), use the long event format directly. The same
patp() call works:
tmat_rec <- trans_mat(
list(c(2, 3), c(1, 3), integer(0)),
names = c("Healthy", "Ill", "Dead")
)
# Subject who went Healthy -> Ill -> Healthy -> censored:
recovery_data <- data.frame(
pid = c(1, 1, 1),
t0 = c(0.0, 1.0, 2.0),
t1 = c(1.0, 2.0, 3.5),
s0 = c(1, 2, 1),
s1 = c(2, 1, 1) # last row: censored healthy
)
patp(msm(t0, t1, s0, s1) ~ 1,
data = recovery_data, tmat = tmat_rec,
id = "pid",
h = 1, j = 2, s = 0,
B = 500, seed = 1)When s > 0 and the Markov assumption is questionable,
use LMAJ = TRUE:
When cluster size is potentially informative (e.g., sicker centers contribute more patients), use the weighted estimator:
patp(msm(t0, t1, s0, s1) ~ 1,
data = mydata, tmat = tmat,
id = "pid", cluster = "site",
h = 1, j = 2, s = 0,
weighted = TRUE, B = 1000, seed = 1)weighted = TRUE requires the cluster
argument.
B = 0 doesSetting B = 0 returns the point estimate without any
bootstrap inference. This is permitted only for one-sample formulas;
two-sample formulas always require B > 0 (since the
entire point of a two-sample analysis is the test). Use
B = 0 for fast exploratory work, then switch to
B = 1000 for inference.
Bakoyannis G (2021). Nonparametric analysis of nonhomogeneous multistate processes with clustered observations. Biometrics 77(2):533-546. doi:10.1111/biom.13327
Bakoyannis G, Bandyopadhyay D (2022). Nonparametric tests for multistate processes with clustered data. Annals of the Institute of Statistical Mathematics 74(5):837-867. doi:10.1007/s10463-021-00819-x
Putter H, Spitoni C (2018). Non-parametric estimation of transition probabilities in non-Markov multi-state models: the landmark Aalen-Johansen estimator. Statistical Methods in Medical Research 27(7):2081-2092. doi:10.1177/0962280216674497
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.