Hierarchical composite endpoints

Introduction

Setup

Load the package hce and check the version:

library(hce)
packageVersion("hce")
#> [1] '0.8.5'

To cite the package, run citation("hce") (Samvel B. Gasparyan 2025).

Definitions

Hierarchical composite endpoints (HCE) are a general class of endpoints combining different clinical outcomes of patients into a composite so as to preserve their different natures. A particular case of these endpoints is defined in a fixed follow-up period and accounts for the patient’s clinically most important outcome for the analysis. HCEs are analyzed using win odds and other win statistics (Samvel B. Gasparyan et al. 2023).

Examples

Here we provide examples of HCEs used in clinical trials from different therapeutic areas. General considerations for creating HCEs can be found in Samvel B. Gasparyan et al. (2022).

COVID-19

The DARE-19 trial (M. Kosiborod et al. 2021; M. N. Kosiborod et al. 2021) used an HCE to assess outcomes in patients hospitalized for COVID-19 and treated for 30 days. The COVID-19 HCE is presented below. It combines death and in-hospital organ dysfunction events with clinical status at Day 30 for patients who are alive and still hospitalized without previous organ dysfunction events. Hospital discharge is the most favorable outcome, for patients discharged without organ dysfunction events and alive at Day 30.

In the table below, a higher category signifies a better outcome. Each patient is ranked into one and only one category based on their clinically most severe event. For example, a patient who experiences a new or worsening in-hospital organ dysfunction event and then dies is included in category I.

#>   Order                                               Category
#> 1     I                                                  Death
#> 2    II More than one new or worsened organ dysfunction events
#> 3   III            One new or worsened organ dysfunction event
#> 4    IV          Hospitalized at the end of follow-up (Day 30)
#> 5     V                 Discharged from hospital before Day 30

Patients in category I are compared using the timing of the event, with an earlier event being a worse outcome (and hence assigned a lower rank). Similarly, in category III the timing of the event is used for ranking patients within the category. In category II patients are compared using the number of events, with a higher number signifying a worse outcome. Patients in category IV - hospitalized at the end of follow-up without previous worsening events - are further ranked according to oxygen support requirements at the hospital (IV.1 on high-flow oxygen devices, IV.2 requiring supplemental oxygen, IV.3 not requiring supplemental oxygen, with a higher rank being a better outcome). Patients in category V are also compared using the timing of the event but, hospital discharge being a favorable outcome, here an earlier event signifies a better outcome than a later one (the reverse of the ranking in categories I and III).
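The ranking rules above can be sketched in base R as a single numeric score, where a larger score always means a better outcome. This is only an illustration under stated assumptions: the patients, the column names CAT and VALUE, the offset width of 100, and the caps of 50 events and 30 days are all hypothetical and are not taken from the DARE-19 analysis or from the hce package.

```r
# Hypothetical patients, each already assigned to exactly one category.
# VALUE holds the within-category quantity: day of death (I), number of
# events (II), day of the event (III), oxygen sub-level 1-3 (IV), or day of
# discharge (V).
pat <- data.frame(
  ID    = 1:8,
  CAT   = c("I", "I", "II", "II", "III", "IV", "V", "V"),
  VALUE = c(5, 20, 3, 2, 12, 2, 20, 7),
  stringsAsFactors = FALSE
)

# Category offsets keep every patient in a higher category above every
# patient in a lower one (the width 100 is an arbitrary illustrative choice).
offset <- c(I = 0, II = 100, III = 200, IV = 300, V = 400)

# Within-category score, oriented so that larger always means better.
within_score <- with(pat, ifelse(
  CAT %in% c("I", "III", "IV"), VALUE,  # later event / higher sub-level is better
  ifelse(CAT == "II", 50 - VALUE,       # fewer events is better (assumes < 50 events)
         30 - VALUE)                    # category V: earlier discharge is better
))

pat$SCORE <- unname(offset[pat$CAT]) + within_score
pat[order(pat$SCORE), ]                 # worst outcome first, best outcome last
```

Within category V the patient discharged on day 7 scores higher than the one discharged on day 20, while within category I the patient dying on day 20 scores higher than the one dying on day 5, which is the reversal described above.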

The simplest case of a COVID-19 HCE is an endpoint with ordinal scale outcomes assessed at a given timepoint. The endpoint uses categories 1-8 for assessing the physical limitations of hospitalized patients with COVID-19 after 15 or 30 days of treatment. Unlike the DARE-19 HCE, it does not use the timing of events within each category to reduce the ties in a pairwise comparison of patients in the active group with patients in the control group. See, for example, the ordinal scale outcomes in the COVID19 and COVID19b datasets (Beigel et al. 2020).

table(COVID19)
#>          GROUP
#> TRTP        1   2   3   4   5   6   7   8
#>   Active   34  95  28  58  38  14 117 157
#>   Placebo  58 121  24  60  33   8 102 115

The function hce::summaryWO() provides the number of wins, losses, and ties by categories. We can calculate the probability of ties from the provided numbers.

COVID19HCE <- hce(GROUP = COVID19$GROUP, TRTP = COVID19$TRTP)
SUM <- summaryWO(COVID19HCE, ref = "Placebo")$summary
SUM$Ptie <- round(SUM$TIE/SUM$TOTAL, 2)
SUM
#>   TRTP    WIN   LOSS   TIE  TOTAL        WR        WO Ptie
#> 1    A 135744  97143 48974 281861 1.3973627 1.3173641 0.17
#> 2    P  97143 135744 48974 281861 0.7156338 0.7590916 0.17
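As a cross-check, the first row of this summary can be reproduced in base R from the category counts in the table above, without the hce package. The sketch assumes the usual definitions: a win is a pair in which the active patient is in a higher category than the placebo patient, the win ratio is WIN/LOSS, and the win odds split ties evenly, (WIN + 0.5 TIE)/(LOSS + 0.5 TIE). The variable names are illustrative.

```r
# Counts per ordinal category (1-8), copied from the table above.
active  <- c(34, 95, 28, 58, 38, 14, 117, 157)
placebo <- c(58, 121, 24, 60, 33, 8, 102, 115)

# Placebo patients in strictly lower / strictly higher categories.
placebo_below <- c(0, head(cumsum(placebo), -1))
placebo_above <- sum(placebo) - cumsum(placebo)

win   <- sum(active * placebo_below)  # active patient in the higher category
loss  <- sum(active * placebo_above)  # active patient in the lower category
tie   <- sum(active * placebo)        # both patients in the same category
total <- sum(active) * sum(placebo)   # all active-placebo pairs

c(WIN = win, LOSS = loss, TIE = tie, TOTAL = total)

WR   <- win / loss
WO   <- (win + 0.5 * tie) / (loss + 0.5 * tie)
Ptie <- tie / total
round(c(WR = WR, WO = WO, Ptie = Ptie), 4)
```

Because no within-category information is used here, about 17% of all pairs are tied, which is why the win ratio (which ignores ties) and the win odds differ noticeably.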

Kidney HCE

The kidney HCE defined in Heerspink et al. (2023) has the following ordinal outcomes (for a review of the topic, see Little et al. (2023)).

#>   Order                                      Category
#> 1     I                                         Death
#> 2    II            Dialysis or kidney transplantation
#> 3   III         Sustained GFR < 15 ml/min per 1.73 m2
#> 4    IV Sustained GFR decline from baseline of >= 57%
#> 5     V Sustained GFR decline from baseline of >= 50%
#> 6    VI Sustained GFR decline from baseline of >= 40%
#> 7   VII                          Individual GFR slope

The dataset KHCE contains data on kidney HCE outcomes:

dat <- KHCE
Order <- c("Death (adj)", "Chronic dialysis (adj) >=90 days", 
           "Sustained eGFR<15 (mL/min/1.73 m2)", "Sustained >=57% decline in eGFR", 
           "Sustained >=50% decline in eGFR", "Sustained >=40% decline in eGFR", "eGFR slope")   
dat$GROUP <- factor(dat$GROUP, levels = Order)
table(dat$GROUP, dat$TRTP)
#>                                     
#>                                        A   P
#>   Death (adj)                         40  50
#>   Chronic dialysis (adj) >=90 days    17  29
#>   Sustained eGFR<15 (mL/min/1.73 m2)  16  28
#>   Sustained >=57% decline in eGFR      2   9
#>   Sustained >=50% decline in eGFR      7  22
#>   Sustained >=40% decline in eGFR     36  34
#>   eGFR slope                         632 578

This dataset is derived from three source datasets: ADSL, which contains baseline characteristics; ADLB, which contains laboratory measurements of kidney function; and ADET, which contains the time-to-event outcomes with their timing. For the detailed derivation see the Technical Appendix in Heerspink et al. (2023).

Heart Failure

In the heart failure population (see Kondo et al. (2023)), the following HCE was considered:

#>   Order                                        Category
#> 1     I                            Cardiovascular death
#> 2    II Total (first and recurrent) HF hospitalizations
#> 3   III                          Total urgent HF visits
#> 4    IV           Improvement/deterioration in KCCQ-TSS

Dependent outcomes

To model dependent outcomes, several methods are available:

Joint Distribution Modeling Using Copulas: This method employs copulas to model the joint distribution of outcomes, capturing their dependence.
Random Frailty Modeling: This approach captures patient-level dependence between outcomes using a random frailty model.
Conditional Distribution Specification Through Multi-State Modeling: This technique uses multi-state models to describe the conditional distribution of outcomes.

Joint Distribution Modeling Using Copulas

Sklar’s theorem (Sklar 1959) shows that multivariate distribution functions can be expressed using a copula and univariate distributions. For a random vector \(X^d=(X_1,\cdots,X_d)\) with a multivariate distribution function \(H(x_1,\cdots,x_d),\) Sklar’s theorem states that there is a copula \(C(\cdot)\) such that: \[H(x_1,\cdots,x_d)=C(F_1(x_1),\cdots,F_d(x_d)),\] where each component \(X_j\) has the univariate distribution \(F_j.\) A copula is essentially a multivariate distribution function where each univariate marginal distribution is uniform, describing the dependency structure of the multivariate distribution function \(H(\cdot)\). To construct the multivariate distribution function, one combines each variable’s univariate distributions \(F_j\) with the copula.

If \(X_j\) has a continuous distribution function \(F_j,\) then \(U_j=F_j(X_j)\) is uniformly distributed on \([0,1].\) Conversely, random variables \(X_j\sim F_j\) can be simulated by generating uniform random variables \(U_j\) and applying the inverse transformation:

\[F_j^{-1}(y)=\inf\{x\in {\mathbf R}: \ \ F_j(x)\geq y\}, \ \ \inf\varnothing=\infty.\] Thus, if one has simulated a uniform random vector \(U^d=(U_1,\cdots, U_d)\) from the copula \(C(\cdot)\), the random vector \(X^d=(X_1,\cdots,X_d)\) can be simulated as: \[(X_1,\cdots,X_d)=(F_1^{-1}(U_1),\cdots,F_d^{-1} (U_d)).\] The main challenge remains in simulating from the given copula.
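The inverse transformation can be sketched in base R for one continuous and one discrete margin. The exponential rate and the three-point discrete distribution are arbitrary choices made for illustration only.

```r
set.seed(1)
u <- runif(10000)

# Continuous margin: Exp(rate) has F(x) = 1 - exp(-rate * x), so
# F^{-1}(u) = -log(1 - u) / rate.
rate  <- 2
x_exp <- -log(1 - u) / rate

# Discrete margin: the generalized inverse inf{x : F(x) >= u} picks the
# smallest support point whose cumulative probability reaches u.
support  <- c(0, 1, 2)
cum_prob <- c(0.5, 0.8, 1)   # F evaluated at the support points
x_disc   <- support[findInterval(u, cum_prob, left.open = TRUE) + 1]

mean(x_exp)                      # should be near 1 / rate
prop.table(table(x_disc))        # should be near 0.5, 0.3, 0.2
```

The closed-form inverse for the exponential margin agrees with the built-in quantile function qexp(u, rate).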

An Archimedean copula (Nelsen 2006) is one where:

\[C(u^d;\varphi)=\varphi(\varphi^{-1}(u_1)+\cdots+\varphi^{-1}(u_d)).\] The function \(\varphi:[0,+\infty]\rightarrow [0,1]\) is a generator - continuous, decreasing, with \(\varphi(0)=1\) and \(\lim_{t\rightarrow+\infty}\varphi(t)=0.\) When \(\varphi(t)=e^{-t^{1/\theta}},\ \ \theta\geq 1,\) the copula is called a Gumbel copula, with \(\theta=1\) giving the independence copula.
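As a small worked example, the Gumbel copula can be evaluated directly from the Archimedean formula. The sketch below takes the generator as \(\varphi(t)=e^{-t^{1/\theta}}\) with inverse \(\varphi^{-1}(u)=(-\log u)^\theta,\) which is the form consistent with the stable distribution \(S(1/\theta,\ldots)\) used for sampling. The function names gumbel_copula and H, and the exponential margins, are illustrative assumptions.

```r
# Gumbel copula C(u; theta) = phi(phi^{-1}(u_1) + ... + phi^{-1}(u_d))
gumbel_copula <- function(u, theta) {
  stopifnot(theta >= 1, all(u > 0 & u <= 1))
  phi     <- function(t) exp(-t^(1 / theta))  # generator
  phi_inv <- function(p) (-log(p))^theta      # its inverse
  phi(sum(phi_inv(u)))
}

gumbel_copula(c(0.3, 0.7), theta = 1)    # independence: equals 0.3 * 0.7
gumbel_copula(c(0.3, 0.7), theta = 50)   # strong dependence: close to min(u) = 0.3

# Sklar's construction: joint distribution of two exponential event times,
# H(x_1, x_2) = C(F_1(x_1), F_2(x_2)).
H <- function(x, rates, theta) gumbel_copula(pexp(x, rates), theta)
H(c(0.05, 0.10), rates = c(10, 20), theta = 2)
```

For any \(\theta\) the value lies between the product of the margins (independence) and their minimum (perfect positive dependence).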

The Marshall-Olkin algorithm

By Bernstein’s theorem, completely monotone Archimedean generators coincide with Laplace-Stieltjes transforms of distribution functions \(F,\) determined by \(\varphi=LS[F].\) The Marshall-Olkin algorithm (Marshall and Olkin 1988) for sampling from an Archimedean copula involves:

Sampling \(V\sim F=LS^{-1}[\varphi].\)
Sampling \(R_j\sim Exp(1),\,j\in\{1,\cdots,d\}.\)
Setting \(U_j=\varphi\left(\frac{R_j}{V}\right),\,j\in\{1,\cdots,d\}.\)

The vector \(U=(U_1,\cdots, U_d)\) is then a random vector from the Archimedean copula with generator \(\varphi\). For the Gumbel copula, the distribution \(F=LS^{-1}[\varphi]\) is a stable distribution (Nolan 2020; Hofert and Mächler 2011): \[V\sim S(1/\theta, 1, \cos^\theta(\pi/(2\theta)), {\mathbf I}_{\{\theta=1\}},1),\] so the first step becomes sampling from a stable distribution.
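The three steps of the Marshall-Olkin algorithm can be sketched generically in base R. To keep the example self-contained, it is illustrated with the Clayton copula rather than the Gumbel copula: Clayton is not used elsewhere in this document and is swapped in only because its \(V\) is a gamma variable that rgamma() can sample directly (\(\varphi(t)=(1+t)^{-1/\theta},\) \(V\sim Gamma(1/\theta,1),\) Kendall's tau \(\theta/(\theta+2)\)). The function name r_archimedean and its arguments are hypothetical.

```r
# Generic Marshall-Olkin sampler: rV draws V, phi is the generator.
r_archimedean <- function(n, d, rV, phi) {
  V <- rV(n)                           # step 1: V ~ F = LS^{-1}[phi]
  R <- matrix(rexp(n * d), nrow = n)   # step 2: R_j ~ Exp(1), independent
  phi(R / V)                           # step 3: U_j = phi(R_j / V), row i uses V[i]
}

set.seed(1)
theta <- 2

# Clayton copula (illustration only): gamma-distributed V.
U_clayton <- r_archimedean(
  n = 2000, d = 2,
  rV  = function(n) rgamma(n, shape = 1 / theta, rate = 1),
  phi = function(t) (1 + t)^(-1 / theta)
)
cor(U_clayton[, 1], U_clayton[, 2], method = "kendall")  # near theta / (theta + 2) = 0.5

# Independence copula: V is identically 1 and phi(t) = exp(-t).
U_indep <- r_archimedean(2000, 2, rV = function(n) rep(1, n),
                         phi = function(t) exp(-t))
cor(U_indep[, 1], U_indep[, 2], method = "kendall")      # near 0
```

Sharing one \(V\) across the \(d\) components of a row is what induces the dependence; when \(V\) is constant the components are independent.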

Chambers-Mallows-Stuck method for simulating stable random variables

The Chambers-Mallows-Stuck method (Chambers, Mallows, and Stuck 1976) simulates stable random variables in three steps:

Generating independent uniform and exponential random variables \[\Theta\sim U\left[-\frac{\pi}{2},\frac{\pi}{2}\right] \text{ and } W\sim Exp(1).\]
Defining \(\alpha=1/\theta,\) setting \(b_{\tan}=\beta\tan\left(\frac{\alpha\pi}{2}\right),\) and \(\theta_0=\arctan(b_{\tan})/\alpha,\) with \[C_{\tan}=(1+b_{\tan}^2)^\frac{1}{2\alpha}.\]
Applying the transformation \[Z(\theta) = \frac{\sin(a_0) C_{\tan}}{\cos(\Theta)^\frac{1}{\alpha}}\left(\frac{\cos(a_0-\Theta)}{W}\right)^\frac{1-\alpha}{\alpha}, \ \ a_0=\alpha(\Theta+\theta_0),\] for \(\theta>1,\) and \[Z(1)=\frac{2}{\pi}\left(\pi_\beta\tan(\Theta)-\beta\log\left(\frac{\pi}{2}W\frac{\cos(\Theta)}{\pi_\beta}\right)\right), \ \ \pi_\beta=\frac{\pi}{2}+\beta\Theta,\] for \(\theta=1.\) Finally, with \(\beta=1,\) the variable \(\gamma Z + \delta\) has the desired distribution, where \(\gamma =[\cos(\pi/(2\theta))]^\theta\) and \(\delta={\mathbf I}_{\{\theta=1\}}.\) Note that for \(\theta=1\) the scale is \(\gamma=\cos(\pi/2)=0,\) so \(V=\delta=1\) and the outcomes are independent.
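The steps above can be sketched in base R and combined with the Marshall-Olkin algorithm to sample from a Gumbel copula. This is a minimal illustration and not the code used by simHCE() or by the copula package; the function names r_gumbel_frailty and r_gumbel_copula are hypothetical, and the generator is taken as \(\varphi(t)=e^{-t^{1/\theta}}.\) Two facts are used as checks: the Laplace transform of \(V\) at \(t=1\) equals \(\varphi(1)=e^{-1},\) and Kendall's tau of the Gumbel copula is \(1-1/\theta.\)

```r
# Chambers-Mallows-Stuck draw of V for the Gumbel copula (beta = 1).
r_gumbel_frailty <- function(n, theta) {
  stopifnot(theta >= 1)
  if (theta == 1) return(rep(1, n))    # gamma = 0 and delta = 1, so V = 1
  alpha  <- 1 / theta
  beta   <- 1
  Theta  <- runif(n, -pi / 2, pi / 2)  # step 1: uniform angle
  W      <- rexp(n)                    #         and unit exponential
  b_tan  <- beta * tan(alpha * pi / 2) # step 2: constants
  theta0 <- atan(b_tan) / alpha
  C_tan  <- (1 + b_tan^2)^(1 / (2 * alpha))
  a0     <- alpha * (Theta + theta0)   # step 3: transformation Z(theta)
  Z <- sin(a0) * C_tan / cos(Theta)^(1 / alpha) *
    (cos(a0 - Theta) / W)^((1 - alpha) / alpha)
  gamma <- cos(pi / (2 * theta))^theta
  gamma * Z                            # delta = 0 when theta > 1
}

# Marshall-Olkin step with the Gumbel generator phi(t) = exp(-t^(1/theta)).
r_gumbel_copula <- function(n, d, theta) {
  V <- r_gumbel_frailty(n, theta)
  R <- matrix(rexp(n * d), nrow = n)
  exp(-(R / V)^(1 / theta))
}

set.seed(1)
V <- r_gumbel_frailty(1e5, theta = 2)
mean(exp(-V))                                   # near exp(-1)

U <- r_gumbel_copula(2000, d = 2, theta = 2)
cor(U[, 1], U[, 2], method = "kendall")         # near 1 - 1/theta = 0.5
```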

Note

Relative to the original Chambers-Mallows-Stuck formula, the term \[\frac{1}{[\cos(\alpha\theta_0)\cos(\Theta)]^\frac{1}{\alpha}}\] is replaced here by \(\frac{C_{\tan}}{[\cos(\Theta)]^\frac{1}{\alpha}},\) as suggested by the copula package (Hofert et al. 2025). The replacement is valid because \(C_{\tan} = 1/(\cos(\alpha\theta_0))^{1/\alpha}.\) Indeed, one needs to show that

\[(1+b_{\tan}^2)^\frac{1}{2\alpha}=1/(\cos(\alpha\theta_0))^{1/\alpha},\] which is equivalent to showing that \[1+\left[\beta\tan\left(\frac{\alpha\pi}{2}\right)\right]^2=\frac{1}{[\cos(\alpha\theta_0)]^2}.\] And this is true because of the trigonometric identity \[1+[\tan(y)]^2=\frac{1}{[\cos(y)]^2}.\] Here we have set \(y=\alpha\theta_0=\arctan(b_{\tan})=\arctan\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\) and hence \(\tan(y)=\beta\tan\left(\frac{\alpha\pi}{2}\right).\)
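The identity can also be checked numerically. The helper check_ctan below is a hypothetical name introduced only for this check; it evaluates both sides for given \(\alpha\) and \(\beta.\)

```r
# Compare (1 + b_tan^2)^(1 / (2 * alpha)) with 1 / cos(alpha * theta0)^(1 / alpha).
check_ctan <- function(alpha, beta) {
  b_tan  <- beta * tan(alpha * pi / 2)
  theta0 <- atan(b_tan) / alpha
  lhs <- (1 + b_tan^2)^(1 / (2 * alpha))
  rhs <- 1 / cos(alpha * theta0)^(1 / alpha)
  c(lhs = lhs, rhs = rhs)
}

check_ctan(alpha = 0.5, beta = 1)   # both sides equal 2, since cos(pi/4)^2 = 1/2
check_ctan(alpha = 0.1, beta = 1)   # theta = 10
check_ctan(alpha = 0.7, beta = -0.4)
```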

Implementation

The function simHCE() implements the algorithm above. Its argument theta specifies the dependence between outcomes, with theta = 1 corresponding to the independence copula and larger values to stronger dependence.

Rates_A <- c(10, 20)
Rates_P <- c(20, 20)
dat1 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P, 
               CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 1, seed = 1)
dat2 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P, 
               CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 1.0001, seed = 1)
dat3 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P, 
               CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 10, seed = 1)
calcWO(dat1)
#>         WO     LCL      UCL         SE WOnull alpha Pvalue        WP    LCL_WP
#> 1 1.483251 1.38986 1.582918 0.03318096      1  0.05      0 0.5973021 0.5816594
#>      UCL_WP       SE_WP     SD_WP    N
#> 1 0.6129447 0.007981092 0.5643484 5000
calcWO(dat2)
#>        WO      LCL      UCL         SE WOnull alpha Pvalue        WP    LCL_WP
#> 1 1.52306 1.426936 1.625659 0.03326189      1  0.05      0 0.6036558 0.5880583
#>      UCL_WP      SE_WP     SD_WP    N
#> 1 0.6192534 0.00795809 0.5627219 5000
calcWO(dat3)
#>         WO      LCL      UCL         SE WOnull alpha Pvalue        WP    LCL_WP
#> 1 1.387231 1.299852 1.480482 0.03319378      1  0.05      0 0.5811046 0.5652679
#>      UCL_WP       SE_WP     SD_WP    N
#> 1 0.5969413 0.008080097 0.5713491 5000

References

Beigel, John H, Kay M Tomashek, Lori E Dodd, Aneesh K Mehta, Barry S Zingman, Andre C Kalil, Elizabeth Hohmann, et al. 2020. “Remdesivir for the Treatment of Covid-19.” New England Journal of Medicine 383 (19): 1813–26. https://doi.org/10.1056/NEJMoa2007764.

Chambers, John M, Colin L Mallows, and BW Stuck. 1976. “A Method for Simulating Stable Random Variables.” Journal of the American Statistical Association 71 (354): 340–44.

Gasparyan, Samvel B. 2025. hce: Design and Analysis of Hierarchical Composite Endpoints. CRAN: The Comprehensive R Archive Network, R Package, Version 0.8.5. https://doi.org/10.32614/CRAN.package.hce.

Gasparyan, Samvel B, Joan Buenconsejo, Elaine K Kowalewski, Jan Oscarsson, Olof F Bengtsson, Russell Esterline, Gary G Koch, Otavio Berwanger, and Mikhail N Kosiborod. 2022. “Design and Analysis of Studies Based on Hierarchical Composite Endpoints: Insights from the DARE-19 Trial.” Therapeutic Innovation & Regulatory Science 56 (5): 785–94. https://doi.org/10.1007/s43441-022-00420-1.

Gasparyan, Samvel B, Elaine K Kowalewski, Joan Buenconsejo, and Gary G Koch. 2023. “Hierarchical Composite Endpoints in COVID-19: The DARE-19 Trial.” In Case Studies in Innovative Clinical Trials, 95–148. Chapman and Hall/CRC. https://doi.org/10.1201/9781003288640-7.

Heerspink, Hiddo L, Niels Jongs, Patrick Schloemer, Dustin J Little, Meike Brinker, Christoph Taso, Martin Karpefors, et al. 2023. “Development and Validation of a New HCE for Clinical Trials of Kidney Disease Progression.” Journal of the American Society of Nephrology 34 (12): 2025–38. https://doi.org/10.1681/ASN.0000000000000243.

Hofert, Marius, Ivan Kojadinovic, Martin Maechler, and Jun Yan. 2025. “Copula: Multivariate Dependence with Copulas.” R Package Version 1.1-5. https://CRAN.R-project.org/package=copula.

Hofert, Marius, and Martin Mächler. 2011. “Nested Archimedean Copulas Meet R: The nacopula Package.” Journal of Statistical Software 39: 1–20.

Kondo, Toru, Samvel B Gasparyan, Pardeep S Jhund, Olof Bengtsson, Brian L Claggett, Rudolf A de Boer, Adrian F Hernandez, et al. 2023. “Use of Win Statistics to Analyze Outcomes in the DAPA-HF and DELIVER Trials.” NEJM Evidence 2 (11): EVIDoa2300042. https://doi.org/10.1056/EVIDoa2300042.

Kosiborod, Mikhail N, Russell Esterline, Remo HM Furtado, Jan Oscarsson, Samvel B Gasparyan, Gary G Koch, Felipe Martinez, et al. 2021. “Dapagliflozin in Patients with Cardiometabolic Risk Factors Hospitalised with COVID-19 (DARE-19): A Randomised, Double-Blind, Placebo-Controlled, Phase 3 Trial.” The Lancet Diabetes & Endocrinology 9 (9): 586–94. https://doi.org/10.1016/S2213-8587(21)00180-7.

Kosiborod, Mikhail, Otavio Berwanger, Gary G Koch, Felipe Martinez, Omar Mukhtar, Subodh Verma, Vijay Chopra, et al. 2021. “Effects of Dapagliflozin on Prevention of Major Clinical Events and Recovery in Patients with Respiratory Failure Because of COVID-19: Design and Rationale for the DARE-19 Study.” Diabetes, Obesity and Metabolism 23 (4): 886–96. https://doi.org/10.1111/dom.14296.

Little, Dustin J, Samvel B Gasparyan, Patrick Schloemer, Niels Jongs, Meike Brinker, Martin Karpefors, Christoph Taso, et al. 2023. “Validity and Utility of a Hierarchical Composite Endpoint for Clinical Trials of Kidney Disease Progression: A Review.” Journal of the American Society of Nephrology 34 (12): 1928–35. https://doi.org/10.1681/ASN.0000000000000244.

Marshall, Albert W, and Ingram Olkin. 1988. “Families of Multivariate Distributions.” Journal of the American Statistical Association 83 (403): 834–41.

Nelsen, Roger B. 2006. An Introduction to Copulas. Springer.

Nolan, John P. 2020. “Univariate Stable Distributions.” Springer Series in Operations Research and Financial Engineering 10: 978–73.

Sklar, M. 1959. “Fonctions de répartition à n Dimensions Et Leurs Marges.” In Annales de l’ISUP, 8:229–31. 3.