The package reda mainly provides functions to fit gamma frailty models with either a piecewise constant or a spline baseline rate function for recurrent event data. In addition, it offers some handy functions, such as those for computing and plotting the sample nonparametric mean cumulative function (MCF), also known as the Nelson-Aalen estimator. Most functions in this package are S4 methods that produce S4 class objects.

In this vignette, we introduce the basic usage of the functions provided in the package through examples. The details of function syntax and the slots of the objects produced are available in the package manual and are therefore not covered here.

The remainder of the vignette is organized as follows. We first introduce the simulated sample recurrent event data and the data checking rules. We then demonstrate the main function rateReg for model fitting, which includes fitting models with a (one-piece) constant, piecewise constant, and spline baseline rate function. Next come examples of functions that summarize the fitted model and functions for model selection based on AIC or BIC. Last but not least, we demonstrate estimation of the mean cumulative function (MCF), which includes the sample MCF and the estimated MCF from the fitted model.

Simulated Sample Recurrent Event Data

library(reda) # attach package
data(simuDat) # load sample dataset

The sample recurrent event data used in the following examples is called simuDat, which contains 500 observations of 5 variables in total.

head(simuDat, 10)
##    ID time event group    x1
## 1   1    1     1 Contr -1.93
## 2   1   22     1 Contr -1.93
## 3   1   23     1 Contr -1.93
## 4   1   57     1 Contr -1.93
## 5   1  112     0 Contr -1.93
## 6   2  140     0 Treat -0.11
## 7   3   40     1 Contr  0.20
## 8   3  168     0 Contr  0.20
## 9   4   14     1 Contr -0.43
## 10  4  112     0 Contr -0.43
str(simuDat)
## 'data.frame':    500 obs. of  5 variables:
##  $ ID   : num  1 1 1 1 1 2 3 3 4 4 ...
##  $ time : num  1 22 23 57 112 140 40 168 14 112 ...
##  $ event: int  1 1 1 1 0 0 1 0 1 0 ...
##  $ group: Factor w/ 2 levels "Contr","Treat": 1 1 1 1 1 2 1 1 1 1 ...
##  $ x1   : num  -1.93 -1.93 -1.93 -1.93 -1.93 -0.11 0.2 0.2 -0.43 -0.43 ...

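where ID is the subject identifier, time is the event or censoring time, event is the event indicator (1 for an event, 0 for censoring), group is the treatment group (Contr or Treat), and x1 is a continuous covariate.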

The dataset was originally simulated by the thinning method (Lewis and Shedler 1979) and further processed for demonstration purposes.

Data Checking

In the main function rateReg for model fitting, the formula response is specified by the function Survr, which has an embedded data checking procedure for recurrent event data modeled by methods based on counts and rate functions. Therefore, before model fitting, the observations of the covariates specified in the formula will be checked. The checking rules include

The subject’s ID will be pinpointed if its observation violates any checking rule shown above.

Model Fitting

Model with Constant Rate Function

By default, when the arguments df, knots, and degree are not specified, the model is a gamma frailty model with a (one-piece) constant rate function, which is equivalent to negative binomial regression with the same shape and rate parameter in the gamma prior.

In the following examples, we fit the models on the first 50 subjects by specifying argument subset.

constFit <- rateReg(Survr(ID, time, event) ~ group + x1, data = simuDat,
                    subset = ID %in% 1:50)
# brief summary
constFit # or explicitly call show(constFit)
## Call: 
## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat, 
##     subset = ID %in% 1:50)
## 
## Coefficients of covariates: 
## groupTreat         x1 
## -0.7602119  0.3568776 
## 
## Frailty parameter:  0.6968785 
## 
## Boundary knots: 
## 0, 168
## 
## Coefficients of pieces:
##   (0, 168] 
## 0.03818742

The function rateReg returns a rateReg-class object, which can be printed by calling the object. (Internally, the show method for the rateReg object is called.)

Model with Piecewise Constant Rate Function

When the argument df or knots (with at least one internal knot) is specified and the argument degree is left at its default of zero, the model becomes a gamma frailty model with a piecewise constant rate function, the so-called HEART model (Fu, Luo, and Qu 2014).

We may specify df and leave knots and degree at their defaults. A piecewise constant rate function is then applied, and the number of pieces equals df. The internal knots are automatically placed at suitable quantiles of the variable representing event and censoring times.

For example, a two-piece constant rate function can be specified simply by setting df = 2. The internal knot is then the median of all event and censoring times.
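The automatic knot placement can be mimicked in base R. The following is only a minimal sketch: the helper quantileKnots is hypothetical (it is not part of reda), the equally spaced quantile rule is an assumption about what "suitable quantiles" means, and the toy times are the ten rows printed by head(simuDat, 10) rather than the data used in the fits.

```r
## hypothetical helper (not part of reda): df pieces need df - 1 internal
## knots, assumed here to sit at equally spaced quantiles of the times
quantileKnots <- function(time, df) {
    probs <- seq_len(df - 1) / df
    unname(stats::quantile(time, probs = probs))
}

## toy event and censoring times taken from the first ten rows shown earlier
toyTime <- c(1, 22, 23, 57, 112, 140, 40, 168, 14, 112)

quantileKnots(toyTime, df = 2)  # a single internal knot at the median
quantileKnots(toyTime, df = 4)  # three internal knots at the quartiles
```

Because the toy times are only a small excerpt, the knots obtained here differ from those reported by rateReg for the first 50 subjects.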

# two-piece constant rate function, i.e., one internal knot
twoPiecesFit <- rateReg(Survr(ID, time, event) ~ group + x1, df = 2, 
                        data = simuDat, subset = ID %in% 1:50)
twoPiecesFit
## Call: 
## rateReg(formula = Survr(ID, time, event) ~ group + x1, df = 2, 
##     data = simuDat, subset = ID %in% 1:50)
## 
## Coefficients of covariates: 
## groupTreat         x1 
## -0.7811931  0.3434791 
## 
## Frailty parameter:  0.6932254 
## 
## Internal knots: 
## 102
## 
## Boundary knots: 
## 0, 168
## 
## Coefficients of pieces:
##   (0, 102] (102, 168] 
## 0.03500828 0.04619518

In the example shown above, the internal knot is automatically set to 102 and the baseline rate function is a two-piece constant function.
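To make the fitted baseline concrete, the sketch below integrates a piecewise constant rate over time in base R, using the knots and piece coefficients printed for twoPiecesFit. The helper cumRate is hypothetical (not a reda function), and it assumes that the cumulative baseline is simply the integral of the printed piece coefficients.

```r
## knots and piece coefficients copied from the printed twoPiecesFit output
knots <- c(0, 102, 168)
alpha <- c(0.03500828, 0.04619518)

## hypothetical helper: integral of the piecewise constant rate over (0, t]
cumRate <- function(t) {
    ## length of the overlap between (0, t] and each piece
    overlap <- pmax(0, pmin(t, knots[-1]) - knots[-length(knots)])
    sum(alpha * overlap)
}

sapply(c(50, 102, 168), cumRate)
```

The cumulative rate grows linearly within each piece and changes slope at the internal knot.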

If internal knots are specified explicitly, df is ignored even if it is specified. An example of a model with a six-piece constant rate function is given as follows:

piecesFit <- rateReg(Survr(ID, time, event) ~ group + x1, df = 2,
                     knots = seq(from = 28, to = 140, by = 28),
                     data = simuDat, subset = ID %in% 1:50)
piecesFit # note that df = 2 is neglected since knots are specified
## Call: 
## rateReg(formula = Survr(ID, time, event) ~ group + x1, df = 2, 
##     knots = seq(from = 28, to = 140, by = 28), data = simuDat, 
##     subset = ID %in% 1:50)
## 
## Coefficients of covariates: 
## groupTreat         x1 
## -0.7989444  0.3385190 
## 
## Frailty parameter:  0.6880174 
## 
## Internal knots: 
## 28, 56, 84, 112, 140
## 
## Boundary knots: 
## 0, 168
## 
## Coefficients of pieces:
##    (0, 28]   (28, 56]   (56, 84]  (84, 112] (112, 140] (140, 168] 
## 0.03691952 0.03691952 0.02685056 0.04148603 0.04141472 0.06078316

Model with Spline Rate Function

When the argument degree is specified to be a positive integer, the baseline rate function is fitted by splines. Currently, B-splines are used.

For example, one may want to fit the baseline rate function by a cubic spline with two internal knots. We may then explicitly specify degree = 3 and set knots to a length-two numeric vector. Alternatively, we may simply specify degree = 3 and df = 6 (if an intercept is included, which is the default). As before, the internal knots are automatically placed at suitable quantiles of the variable representing event and censoring times.

Generally speaking, the degrees of freedom of the spline (that is, the number of spline bases) equal the number of internal knots plus the degree of each spline basis, plus one if an intercept is included in the spline bases.
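The degrees-of-freedom rule can be checked with the function bs from the splines package that ships with R. This is only an illustration of the counting rule: whether rateReg builds its bases with bs internally is an assumption, and the time grid x is hypothetical.

```r
library(splines)  # ships with R; used here only to illustrate the df rule

x <- seq(0, 168, by = 1)  # hypothetical time grid spanning the boundary knots

## cubic B-spline bases with two explicit internal knots and an intercept
basisKnots <- bs(x, knots = c(56, 112), degree = 3, intercept = TRUE,
                 Boundary.knots = c(0, 168))
ncol(basisKnots)  # 2 internal knots + degree 3 + 1 intercept = 6

## equivalently, ask for df = 6 and let bs() choose the two internal knots
basisDf <- bs(x, df = 6, degree = 3, intercept = TRUE)
attr(basisDf, "knots")  # two knots, placed at the 1/3 and 2/3 quantiles of x
```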

## df can be simply specified
splineFit <- rateReg(Survr(ID, time, event) ~ group + x1, df = 6,
                     degree = 3L, data = simuDat, subset = ID %in% 1:50)
## internal knots are set as 33% and 67% quantiles of time variable
splineFit 
## Call: 
## rateReg(formula = Survr(ID, time, event) ~ group + x1, df = 6, 
##     degree = 3L, data = simuDat, subset = ID %in% 1:50)
## 
## Coefficients of covariates: 
## groupTreat         x1 
## -0.8006927  0.3311404 
## 
## Frailty parameter:  0.6892237 
## 
## Internal knots: 
## 64.66667, 134.3333
## 
## Boundary knots: 
## 0, 168
## 
## Coefficients of spline bases:
## B-spline.1 B-spline.2 B-spline.3 B-spline.4 B-spline.5 B-spline.6 
## 0.03797068 0.04208896 0.02773524 0.02764460 0.08942374 0.02632372
## or internal knots are explicitly specified
splineFit <- rateReg(Survr(ID, time, event) ~ group + x1, df = 2,
                     degree = 3L, knots = c(56, 112),
                     data = simuDat, subset = ID %in% 1:50)
splineFit # note that df = 2 is neglected similarly
## Call: 
## rateReg(formula = Survr(ID, time, event) ~ group + x1, df = 2, 
##     knots = c(56, 112), degree = 3L, data = simuDat, subset = ID %in% 
##         1:50)
## 
## Coefficients of covariates: 
## groupTreat         x1 
## -0.7988806  0.3314544 
## 
## Frailty parameter:  0.689836 
## 
## Internal knots: 
## 56, 112
## 
## Boundary knots: 
## 0, 168
## 
## Coefficients of spline bases:
## B-spline.1 B-spline.2 B-spline.3 B-spline.4 B-spline.5 B-spline.6 
## 0.03695694 0.04332529 0.03047682 0.02390817 0.07810392 0.04657188

Summary of Model Fits

A brief summary of the fitted model is given by the show method, as shown in the previous examples. Further, the summary method for rateReg-class objects provides a more detailed summary of the fitted model. For instance, the summaries of the models fitted in the model fitting section can be obtained as follows:

summary(constFit)
## Call: 
## rateReg(formula = Survr(ID, time, event) ~ group + x1, data = simuDat, 
##     subset = ID %in% 1:50)
## 
## Coefficients of covariates: 
##                coef exp(coef) se(coef)       z Pr(>|z|)  
## groupTreat -0.76021   0.46757  0.38268 -1.9866  0.04697 *
## x1          0.35688   1.42886  0.23063  1.5474  0.12177  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Parameter of frailty: 
##         parameter        se
## Frailty 0.6968785 0.1772091
## 
## Boundary knots:
## 0, 168
## 
## Coefficients of pieces:
##              coef se(coef)
## (0, 168] 0.038187   0.0089
## 
## Loglikelihood:  -996.6353
summary(piecesFit, showCall = FALSE)
## Coefficients of covariates: 
##                coef exp(coef) se(coef)       z Pr(>|z|)  
## groupTreat -0.79894   0.44980  0.38899 -2.0539  0.03999 *
## x1          0.33852   1.40287  0.23243  1.4564  0.14528  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Parameter of frailty: 
##         parameter        se
## Frailty 0.6880174 0.1740403
## 
## Internal knots: 
## 28, 56, 84, 112, 140
## 
## Boundary knots:
## 0, 168
## 
## Coefficients of pieces:
##                coef se(coef)
## (0, 28]    0.036920   0.0102
## (28, 56]   0.036920   0.0102
## (56, 84]   0.026851   0.0078
## (84, 112]  0.041486   0.0114
## (112, 140] 0.041415   0.0116
## (140, 168] 0.060783   0.0175
## 
## Loglikelihood:  -990.6756
summary(splineFit, showCall = FALSE, showKnots = FALSE)
## Coefficients of covariates: 
##                coef exp(coef) se(coef)       z Pr(>|z|)  
## groupTreat -0.79888   0.44983  0.38562 -2.0717  0.03829 *
## x1          0.33145   1.39299  0.23135  1.4327  0.15194  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Parameter of frailty: 
##         parameter        se
## Frailty  0.689836 0.1746868
## 
## Degree of spline bases: 3 
## 
## Coefficients of spline bases:
##                coef se(coef)
## B-spline.1 0.036957   0.0161
## B-spline.2 0.043325   0.0179
## B-spline.3 0.030477   0.0170
## B-spline.4 0.023908   0.0168
## B-spline.5 0.078104   0.0265
## B-spline.6 0.046572   0.0243
## 
## Loglikelihood:  -988.5322

The summary includes the function call, the estimated covariate coefficients, the estimated parameter of the frailty variable, the internal knots (if any), the boundary knots, the degree of the spline bases if splines are applied, the coefficients of the rate function bases (pieces), and the log-likelihood of the fitted model. Output of the function call or the knots may be suppressed by setting the argument showCall or showKnots to FALSE, respectively, in the summary method, which is especially useful for a relatively concise summary in a reproducible report using R Markdown, etc.
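The columns of the coefficient table are related in the usual Wald fashion, which can be verified in base R. The sketch below copies the rounded estimates and standard errors printed by summary(constFit), so the results agree with the table only up to rounding; that the table is built exactly this way is an assumption based on its column labels.

```r
## values copied from the printed summary(constFit) output
est <- c(groupTreat = -0.76021, x1 = 0.35688)
se  <- c(groupTreat = 0.38268, x1 = 0.23063)

expCoef <- exp(est)           # multiplicative effect on the rate
z <- est / se                 # Wald z statistic
pValue <- 2 * pnorm(-abs(z))  # two-sided p-value under normality

round(cbind(coef = est, "exp(coef)" = expCoef, "se(coef)" = se,
            z = z, "Pr(>|z|)" = pValue), 5)
```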

In addition, the corresponding coef and confint methods provide point estimates and confidence intervals for the covariate coefficients. The estimated coefficients of the baseline rate function are given by the function baseRate. Let's take the fitted model with a spline rate function as an example.

## point estimates of covariate coefficients
coef(splineFit)
## groupTreat         x1 
## -0.7988806  0.3314544
## confidence interval for covariate coefficients
confint(splineFit, level = 0.95) 
##                 2.5%      97.5%
## groupTreat -1.680536 0.08277428
## x1         -2.398761 3.06166981
## estimated coefficients of baseline rate function
baseRate(splineFit)
## B-spline.1 B-spline.2 B-spline.3 B-spline.4 B-spline.5 B-spline.6 
## 0.03695694 0.04332529 0.03047682 0.02390817 0.07810392 0.04657188
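As a cross-check, a Wald-type interval can be computed by hand from the point estimates and standard errors printed by summary(splineFit). This is a minimal sketch in base R; whether confint uses exactly this construction is an assumption, and the confint output printed above reflects the package version used to build this vignette, so the two need not coincide.

```r
## values copied from the printed summary(splineFit) output
est <- c(groupTreat = -0.79888, x1 = 0.33145)
se  <- c(groupTreat = 0.38562, x1 = 0.23135)

level <- 0.95
zCrit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval

## Wald interval: estimate plus or minus critical value times standard error
waldCI <- cbind(lower = est - zCrit * se, upper = est + zCrit * se)
waldCI
```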

Model Selection

Two handy functions are provided for model selection. We may compare and select among models with different baseline rate functions based on the Akaike Information Criterion (AIC) with the function AIC, or the Bayesian Information Criterion (BIC) with the function BIC.

AIC(constFit, piecesFit, splineFit)
##           df      AIC
## constFit   4 2001.271
## piecesFit  9 1999.351
## splineFit  9 1995.064
BIC(constFit, piecesFit, splineFit)
##           df      BIC
## constFit   4 2016.086
## piecesFit  9 2032.685
## splineFit  9 2028.398
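The AIC values can be reproduced by hand from the log-likelihoods reported in the summaries, using the standard definition AIC = -2 logLik + 2 df. In the sketch below the variable names ll and nPar are illustrative, and the values are copied from the printed output. BIC replaces the penalty factor 2 with log(n); since the n used by the package is not shown here, only AIC is reproduced.

```r
## log-likelihoods and degrees of freedom copied from the printed output
ll <- c(constFit = -996.6353, piecesFit = -990.6756, splineFit = -988.5322)
nPar <- c(constFit = 4, piecesFit = 9, splineFit = 9)

handAIC <- -2 * ll + 2 * nPar
round(handAIC, 3)

## the model with the smallest AIC is preferred
names(which.min(handAIC))
```

Note that the degrees of freedom are consistent with counting two covariate coefficients, one frailty parameter, and the number of baseline pieces or spline bases (one for constFit, six for piecesFit and splineFit).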

Mean Cumulative Function (MCF)

The generic function for computing the sample MCF and the estimated MCF from the fitted model is called mcf. A related generic function, plotMcf, plots the estimated MCF using the ggplot2 plotting system.

Sample MCF (Nelson-Aalen Estimator)

The nonparametric sample MCF is also called the Nelson-Aalen estimator (Nelson 2003). The point estimate of the MCF at each time point does not assume any particular underlying model. The variance of the estimated MCF (ReliaWiki 2012) at each time point is estimated, and approximate confidence intervals are provided as well, which are constructed based on the asymptotic normality of the log MCF.

If a formula with Survr as the response is specified in the function mcf, the method for the sample MCF is called. The covariate specified on the right-hand side of the formula should be either 1 or a single factor variable in the data. The former computes the overall sample MCF. The latter computes the sample MCF for each level of the specified factor variable.

## overall sample MCF
sampleMcf1 <- mcf(Survr(ID, time, event) ~ 1,
                  data = simuDat, subset = ID %in% 1:10)
## sample MCF for different groups
sampleMcf2 <- mcf(Survr(ID, time, event) ~ group,
                  data = simuDat, subset = ID %in% 1:10)
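For intuition, the point estimate of the sample MCF can be computed by hand in base R: at each distinct event time, the MCF increases by the number of events divided by the number of subjects still under observation. The sketch below uses only the four subjects printed by head(simuDat, 10), omits the variance and confidence intervals, and is not the implementation used by mcf.

```r
## toy data: the first four subjects printed by head(simuDat, 10)
toy <- data.frame(
    ID    = c(1, 1, 1, 1, 1, 2, 3, 3, 4, 4),
    time  = c(1, 22, 23, 57, 112, 140, 40, 168, 14, 112),
    event = c(1, 1, 1, 1, 0, 0, 1, 0, 1, 0)
)

censorTime <- toy$time[toy$event == 0]  # one censoring time per subject
eventTime <- sort(unique(toy$time[toy$event == 1]))

## number of events and number of subjects at risk at each event time
nEvent <- sapply(eventTime, function(t) sum(toy$time == t & toy$event == 1))
nRisk <- sapply(eventTime, function(t) sum(censorTime >= t))

## the sample MCF is the cumulative sum of the increments nEvent / nRisk
handMcf <- data.frame(time = eventTime, MCF = cumsum(nEvent / nRisk))
handMcf
```

In this toy subset all four subjects remain under observation at every event time, so each event adds 1/4 to the MCF.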

After estimation, we may plot the sample MCF with the function plotMcf, which returns a ggplot object so that the plot can easily be customized further with functions from the package ggplot2.

For example, the overall sample MCF and the sample MCF for two groups (control vs. treatment) estimated above are plotted, respectively, as follows:

## plot overall sample MCF
plotMcf(sampleMcf1)

## plot MCF for different groups
plotMcf(sampleMcf2, mark.time = TRUE, 
        lty = c(1, 5), col = c("orange", "navy")) +
    ggplot2::xlab("Days") + ggplot2::theme_bw()

Note that all censoring times can be marked on the step curve by specifying mark.time = TRUE. The type and color of the lines can be specified through lty and col, respectively.

Estimated MCF from the Fitted Model

If a rateReg-class object is supplied to the function mcf, the method for the rateReg class is called, which returns the estimated baseline MCF from the fitted model if newdata is not specified.

An example of estimating and plotting the baseline MCF from the fitted model with a piecewise constant rate function is shown as follows:

piecesMcf <- mcf(piecesFit)
plotMcf(piecesMcf, conf.int = TRUE, col = "blueviolet") +
    ggplot2::xlab("Days") + ggplot2::theme_bw()

The argument newdata allows one to estimate the MCF for a given dataset instead of the baseline MCF. If newdata is specified, the data frame should have the same column names as the covariate names appearing in the formula of the original fit. The MCF is estimated for each unique row in the data frame, and its confidence intervals are constructed based on the delta method.

In addition, we may specify the name of the grouping for the unique rows and the level of each group through groupName and groupLevels, respectively. For example, we may specify groupName = "Gender" and groupLevels = c("Male", "Female") for estimation by gender group.

As the last example in this vignette, we estimate the MCF from the fitted model with a spline rate function for the different treatment groups and plot the estimated MCFs together with their confidence intervals.

newDat <- data.frame(x1 = rep(0, 2), group = c("Treat", "Contr"))
estmcf <- mcf(splineFit, newdata = newDat, groupName = "Group", 
              groupLevels = c("Treatment", "Control"))
plotMcf(estmcf, conf.int = TRUE, col = c("royalblue", "red"), lty = c(1, 5)) +
    ggplot2::ggtitle("Control vs. Treatment") + ggplot2::xlab("Days") +
    ggplot2::theme_bw() 

References

Fu, H., J. Luo, and Y. Qu. 2014. “Hypoglycemic Events Analysis via Recurrent Time-to-Event (HEART) Models.” Journal of Biopharmaceutical Statistics.

Lewis, P. A., and G. S. Shedler. 1979. “Simulation of Nonhomogeneous Poisson Processes by Thinning.” Naval Research Logistics Quarterly 26 (3). Wiley Online Library: 403–13.

Nelson, Wayne B. 2003. Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications. Vol. 10. SIAM.

ReliaWiki. 2012. “Recurrent Event Data Analysis.” http://reliawiki.org/index.php/Recurrent_Event_Data_Analysis.