Main Methodology
Denote the response vector for the \(i\)th subject by \(\boldsymbol{y}_i=(y_{i1},\cdots,y_{it},\cdots,y_{in_i})'\), where \(y_{it}\) is the response at time period \(t\) (\(i=1,\cdots, N\); \(t=1,\cdots,n_i\)). Note that the model and associated methodology are applicable to unequally spaced measurement times and a distinct number of observations from subject to subject. We assume that the responses on different subjects are independent, and that the \(y_{it}\)'s are conditionally independent given a random vector \(b_{i}\). We further assume that \(y_{it}\) has an exponential family distribution, so that a generalized linear model (GLM) can be specified by
\[\begin{eqnarray}
&&g\left\{E(y_{it})\right\}=x_{it}^T\beta+z_{it}^Tb_{i},
\tag{2.1}\\
&&b_i=(b_{i1},\ldots,b_{iq})^T\stackrel{\mbox{indep.}}{\sim} N(0,v_i^{-1}\Sigma), \notag\\
&&v_i \stackrel{\mbox{indep.}}{\sim} \Gamma\left(\nu/2,\nu/2\right), \notag
\end{eqnarray}\]
where \(\beta\) is a \(p\times 1\) vector of unknown mean parameters, \(x_{it}\) is the corresponding \(p\times 1\) vector of covariates, \(z_{it}\) is a \(q\times 1\) vector, \(0\) is a \(q\times 1\) zero vector, \(\Sigma\) is a \(q\times q\) covariance matrix reflecting the subject variations, and \(\Gamma(a,b)\) denotes the gamma distribution with shape parameter \(a\) and rate parameter \(b\). This scale mixture gives \(b_i\) a multivariate Student's \(t\)-distribution with \(\nu\) degrees of freedom. In this paper, we consider normal and binary responses, with the corresponding identity and probit links, respectively.
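To make the scale-mixture construction concrete, the following short sketch (assuming the rate parameterization of the gamma distribution, consistent with the \(t_\nu\) interpretation above) checks numerically that the hierarchy yields a Student's \(t\) random effect with \(\nu\) degrees of freedom:

set.seed(1)
nu = 6
v = rgamma(1e5, shape = nu/2, rate = nu/2)  # v_i ~ Gamma(nu/2, nu/2)
b = rnorm(1e5, mean = 0, sd = sqrt(1/v))  # b_i | v_i ~ N(0, 1/v_i)
c(sample.var = var(b), t.var = nu/(nu - 2))  # both close to 1.5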
To employ Markov chain Monte Carlo (MCMC) techniques for Bayesian estimation and to reduce the computational complexity, we introduce a latent variable \(y_{it}^*\) associated with the binary outcome \(y_{it}\). The unobservable latent variable \(y_{it}^*\) and the observed binary outcome \(y_{it}\) are connected by:
\[
y_{it}=\mathbf{I}_{(y_{it}^*>0)}, \quad t = 1, \ldots, n_i,
\]
where \(\mathbf{I}_{A}\) is the indicator of event \(A\); that is, \(y_{it}\) is 1 or 0 according to the sign of \(y_{it}^*\). We assume that the latent variable is associated with the explanatory variables \(x_{it}\) and the random-effect covariates \(z_{it}\), and in the next two sections we present two different approaches to explaining the correlation of the repeated measures within a subject.
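As a minimal numerical illustration (not part of the package), note that with standard normal errors the latent-variable formulation reproduces the probit model, \(P(y_{it}=1)=\Phi(x_{it}^T\beta)\):

set.seed(1)
beta = c(0.5, -1)
x = c(1, 0.3)
y.star = sum(x * beta) + rnorm(1e5)  # latent variable with N(0, 1) error
c(empirical = mean(y.star > 0), probit = pnorm(sum(x * beta)))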
Modified Cholesky Decomposition with Hypersphere Decomposition
We assume
\[
y_{it}^*=x_{it}^T\beta+z_{it}^Tb_i+\epsilon_{it},
\]
where the \(\epsilon_{it}\)'s are prediction errors, assumed to satisfy
\[
\boldsymbol{\epsilon}_i=(\epsilon_{i1},\ldots,\epsilon_{in_i})^T \stackrel{\mbox{indep.}}{\sim} N(0,R_i)
\]
with a correlation matrix \(R_i\). Then the model (2.1) is equivalent to, for \(i=1, \ldots, N\) and \(t=1, \ldots, n_i\),
\[\begin{equation}
\begin{aligned}
y_{it} &=
\begin{cases}
1, & y_{it}^*>0; \\
0, & \mbox{otherwise}.
\end{cases}
\end{aligned}\tag{2.2}
\end{equation}\]
Let \(\boldsymbol{y}_i^* = (y_{i1}^*, \ldots, y_{in_i}^*)'\) and rewrite (2.2) in matrix form as
\[\begin{eqnarray*}
\boldsymbol{y}_i^*=X_i\beta+Z_i b_i +\boldsymbol{\epsilon}_i,
\end{eqnarray*}\]
where \(X_i\) and \(Z_i\) are \(n_i\times p\) and \(n_i\times q\) matrices, respectively, defined as
\[\begin{eqnarray*}
X_i=\left(
\begin{array}{c}
x_{i1}^T \\
\vdots \\
x_{in_i}^T \\
\end{array}
\right),
Z_i=\left(
\begin{array}{c}
z_{i1}^T \\
\vdots \\
z_{in_i}^T \\
\end{array}
\right) .
\end{eqnarray*}\]
For identifiability, \(R_i\) is restricted to be a correlation matrix. In addition to having diagonal elements equal to 1 and off-diagonal elements between -1 and 1, \(R_i\) is required to be positive definite. Moreover, the number of parameters to be estimated increases quadratically with the dimension of the matrix. In order to keep \(R_{i}\) positive definite while alleviating the computational expense, we model the correlation matrix using the hypersphere decomposition (HD) approach (Zhang, Leng, and Tang 2015). The correlation matrix \(R_i\) is reparameterized via hyperspherical coordinates by the following decomposition:
\[\begin{eqnarray*}
R_i=F_iF_i^T,
\end{eqnarray*}\]
where \(F_i\) is a lower triangular matrix with the \((t, j)\)th element \(f_{itj}\) given by
\[\begin{eqnarray*}
f_{itj}=\left\{
\begin{array}{ll}
1, & \hbox{for $t=j=1$;}\\
\cos(\omega_{itj}), & \hbox{for $j=1$, $t=2,\cdots,n_i$;} \\
\cos(\omega_{itj})\prod_{r=1}^{j-1}\sin(\omega_{itr}), & \hbox{for $2\leq j<t\leq n_i$;} \\
\prod_{r=1}^{j-1}\sin(\omega_{itr}), & \hbox{for $t=j;~ j=2,\cdots,n_i$.}
\end{array}
\right.
\end{eqnarray*}\]
Here the \(\omega_{itj}\)'s \((\in (0,\pi))\) are angle parameters for the trigonometric functions, and are referred to as hypersphere (HS) parameters.
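A minimal sketch of the decomposition (illustrative only; the package computes this via `CorrMat.HSD`): given a lower-triangular matrix of angles in \((0,\pi)\), build \(F_i\) elementwise and return \(R_i = F_iF_i^T\), which is automatically a positive definite correlation matrix.

hs.corr = function(omega) {  # omega[t, j] plays the role of omega_itj; assumes nrow(omega) >= 2
    n = nrow(omega)
    F.mat = diag(n)
    for (t in 2:n) {
        F.mat[t, 1] = cos(omega[t, 1])
        if (t > 2) for (j in 2:(t - 1)) F.mat[t, j] = cos(omega[t, j]) * prod(sin(omega[t, 1:(j - 1)]))
        F.mat[t, t] = prod(sin(omega[t, 1:(t - 1)]))
    }
    F.mat %*% t(F.mat)  # unit diagonal by construction
}
round(hs.corr(matrix(pi/3, 4, 4)), 2)  # a valid 4 x 4 correlation matrix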
As in Zhang, Leng, and Tang (2015), we model the angles \(\omega_{itj}\) rather than the correlation matrix directly; this modeling admits a direct interpretation in terms of the correlations. In order to obtain unconstrained estimates of the \(\omega_{itj}\)'s and to reduce the number of parameters to facilitate computation, we model the correlation structure of the responses through a generalized linear model:
\[\begin{eqnarray}
&&\log\left(\frac{\omega_{itj}}{\pi-\omega_{itj}}\right)=u_{itj}^T\delta,\tag{2.3}
\end{eqnarray}\]
where \(\delta\) is an \(a \times 1\) vector of unknown parameters for modeling the HS parameters. Importantly, the proposed method reduces the model complexity and yields fast-to-execute models without loss of accuracy. Note that the design vector \(u_{itj}\) in (2.3) is used to model the HS parameters as functions of subject-specific covariates (Zhang, Leng, and Tang 2015; Pan and Mackenzie 2003); the design vector is therefore specified in a manner similar to that used in heteroscedastic regression models. For example, including the time lag \(|t - j|\) in the design vector \(u_{itj}\) specifies higher-lag models. We will introduce the priors of the parameters in the model in Section 3.
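For instance, inverting the logit link in (2.3) gives \(\omega_{itj} = \pi e^{u_{itj}^T\delta}/(1+e^{u_{itj}^T\delta})\); a one-line sketch with illustrative values of \(u_{itj}\) and \(\delta\):

omega.from.u = function(u, delta) pi * plogis(sum(u * delta))  # inverse of (2.3)
omega.from.u(u = c(1, 0), delta = c(-0.5, -0.3))  # lag-1 angle, about 1.19 radians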
Generalized Autoregressive and Moving-Average Model
In order to give a complete specification of the joint distribution, we assume the latent random vector \(\boldsymbol{y}_{i}^*=(y_{i1}^*,\ldots,y_{in_i}^*)^T\) is jointly normally distributed, specified by:
\[\begin{equation}
\begin{aligned}
y_{i1}^*&= x_{i1}^T\beta + \epsilon_{i1}, \\
y_{it}^*&=x_{it}^T\beta+z_{it}^Tb_i+\sum_{j=1}^{u-1}\phi_{ij}(y_{i,t-j}^* - x_{i,t-j}^T\beta)+ \sum_{s=1}^{v-1} \psi_{is}\epsilon_{i,t-s}+\epsilon_{it}, \quad t=2, \ldots, n_i,
\end{aligned}\tag{2.4}
\end{equation}\]
where the \(\phi_{ij}\)'s are generalized autoregressive parameters (GARPs) and the \(\psi_{is}\)'s are generalized moving-average parameters (GMAPs). In addition, the \(\epsilon_{it}\)'s are prediction errors, assumed as
\[
\boldsymbol{\epsilon}_i=(\epsilon_{i1},\ldots,\epsilon_{in_i})^T \stackrel{\mbox{indep.}}{\sim} N(0,I_i),
\]
where \(I_i\) is an \(n_i\times n_i\) identity matrix. We can rewrite (2.4) in matrix form as
\[\begin{equation*}
\Phi_i (\boldsymbol{y}_i^*-X_i\beta) = Z_i b_i + \Psi_i \boldsymbol{\epsilon}_i,
\end{equation*}\]
where \(X_i\) (\(n_i\times p\)), \(Z_i\) (\(n_i\times q\)), \(\Phi_i\) (\(n_i\times n_i\)), and \(\Psi_i\) (\(n_i\times n_i\)) are matrices defined as follows:
\[\begin{eqnarray*}
X_i=\left(
\begin{array}{c}
x_{i1}^T \\
\vdots \\
x_{in_i}^T \\
\end{array}
\right),\quad
Z_i=\left(
\begin{array}{c}
z_{i1}^T \\
\vdots \\
z_{in_i}^T \\
\end{array}
\right)
\end{eqnarray*}\]
\[\begin{eqnarray*}
\Phi_i =\left(
\begin{array}{cccccc}
1 & 0 & 0 & \ldots & 0&0\\
-\phi_{i1} & 1 & 0 & \ldots &0&0\\
-\phi_{i2} & -\phi_{i1} & 1 & \ldots & 0&0 \\
\vdots & \vdots & \vdots & \ddots & \vdots &\vdots \\
0& \ldots & -\phi_{i,u-2} & \ldots & 1 &0 \\
0 & \ldots &-\phi_{i,u-1} & \ldots & -\phi_{i1} & 1
\end{array}
\right), \quad
\Psi_i =\left(
\begin{array}{cccccc}
1 & 0 & 0 & \ldots & 0&0\\
\psi_{i1} & 1 & 0 & \ldots &0&0\\
\psi_{i2} & \psi_{i1} & 1 & \ldots & 0&0 \\
\vdots & \vdots & \vdots & \ddots & \vdots &\vdots \\
0& \ldots & \psi_{i,v-2} & \ldots & 1 &0 \\
0 & \ldots &\psi_{i,v-1} & \ldots & \psi_{i1} & 1
\end{array}
\right).
\end{eqnarray*}\]
Note that \(\Phi_i\) and \(\Psi_i\) uniquely exist and are respectively called the generalized autoregressive parameter matrix (GARPM) and the generalized moving-average parameter matrix (GMAPM).
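As a sketch (an illustrative helper, not a package function), both matrices are unit lower triangular and banded, so they can be filled directly from the parameter vectors:

make.band = function(n, coefs) {  # unit lower-triangular banded matrix
    M = diag(n)
    for (t in 2:n) for (j in 1:min(length(coefs), t - 1)) M[t, t - j] = coefs[j]
    M
}
Phi = make.band(5, coefs = c(-0.4, -0.2))  # GARPM with entries -phi_i1, -phi_i2
Psi = make.band(5, coefs = 0.2)  # GMAPM with entry psi_i1
Phi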
The density of the latent variables \(\boldsymbol{y}^*\) conditional on the random effects \(\boldsymbol{b}=(b_1, \ldots, b_N)\) is given by
\[
p(\boldsymbol{y}^*|\boldsymbol{b}, \theta) = \prod_{i=1}^N\prod_{t=1}^{n_i} f(y^*_{it}; \mu_{it}, I_i),
\]
where \(\theta = (\beta, \nu, \Sigma, \phi, \psi)\) denotes the collection of model parameters, \(\mu_{it} = x_{it}^T\beta+z_{it}^Tb_i\), and \(f(\cdot)\) is the normal density function.
Bayesian Methods
To complete the Bayesian specification of the model, we use proper prior distributions instead of noninformative priors in order to guarantee the propriety of the posterior distribution. The prior distributions for \(\beta\), \(\Sigma\), and \(\delta\) in the model are given by:
\[\begin{eqnarray*}
&&\beta \sim N_p(0,\sigma_{\beta}^2 \mathrm{I}), \\
&&\delta \sim N_a(0,\sigma_{\delta}^2 \mathrm{I}), \\
&&\Sigma \sim \mathcal{IW}(\nu_b, \Lambda^{-1}),
\end{eqnarray*}\]
where \(\sigma_{\beta}^2\) and \(\sigma_{\delta}^2\) are set large to be noninformative (Daniels and Zhao 2003), \(\mathrm{I}\) is the identity matrix of dimension matching the corresponding parameter, and \(\Lambda\) is a positive definite scale matrix. Here \(N_m(\mu,\Omega)\) denotes the \(m\)-variate multivariate normal distribution with mean vector \(\mu\) and covariance matrix \(\Omega\), and \(\mathcal{IW}(\nu_b,\Lambda^{-1})\) denotes the inverse Wishart distribution with \(\nu_b\) degrees of freedom and a symmetric positive definite \(q\times q\) scale matrix \(\Lambda^{-1}\).
The priors for the parameters in the correlation matrix depend on the correlation structure: under the HSD structure the correlations are governed by \(\delta\), with the normal prior given above, while under the ARMA structure priors are placed on the GARPs and GMAPs.
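In BayesRGMM, these hyperparameters are passed through the `hyper.params` argument of `BayesRobustProbit`; the names below are taken from the simulation code in the next section (the exact role of `v.gamma` is our assumption):

hyper.params = list(
    sigma2.beta = 1,  # sigma_beta^2 in the normal prior for beta
    sigma2.delta = 1,  # sigma_delta^2 in the normal prior for delta
    v.gamma = 5,  # assumed: hyperparameter for the gamma scale-mixing component
    InvWishart.df = 5,  # nu_b, the inverse-Wishart degrees of freedom
    InvWishart.Lambda = diag(1))  # Lambda, the inverse-Wishart scale matrix (q = 1)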
Implementation
The aim of this section is to provide a detailed, step-by-step walkthrough of simulation studies that highlight the most important features of the package `BayesRGMM`, and to show how to extract the most important results. This section can also be read as a user manual that allows readers to run their own similar analyses with only mild modification of the example code.
The `BayesRGMM` package contains four core functions. The main function `BayesRobustProbit` carries out the entire MCMC procedure, and outputs the posterior samples and estimates for the model parameters along with several useful estimated information criterion statistics. Internally, most of the calculation is performed by compiled code to reduce the computational time. The user-friendly summary function `BayesRobustProbitSummary` accompanies `BayesRobustProbit`: it provides basic posterior summary statistics, such as the posterior point and interval estimates of parameters, and the values of information criterion statistics for model comparison. The function `SimulatedDataGenerator` is used to generate simulated data with ARMA and MCD correlation structures for simulation studies, and `CorrMat.HSD` calculates the correlation matrix under the MCD model structure. In this section, we focus primarily on introducing those functions and demonstrate their usage with numerical experiments.
Simulation Studies
In the simulations, we demonstrate the use of the functions in the `BayesRGMM` package. We consider a simple random intercept model (\(q=1\)) with a regression coefficient vector of size \(p=4\) given by
\[
\beta = (-0.2,-0.3, 0.8, -0.4)',
\]
where the \(x_{it}\)'s are independently generated from \(N(0, 1)\). In addition, \(v_i\) is independently simulated from \(\Gamma(3, 3)\), \(b_i\) is drawn from \(N(0, 0.5/v_i)\), and \(z_{it}=1\) for \(i=1, \ldots, N\). That is, the \(b_i\)'s follow a Student's \(t\)-distribution with 6 degrees of freedom. We then generate the responses based on (2.1).
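A compact sketch of this data-generating mechanism for a single subject (with independent errors for brevity; the correlated structures below are handled by `SimulatedDataGenerator`):

set.seed(1)
beta = c(-0.2, -0.3, 0.8, -0.4)
n.i = 5
X = matrix(rnorm(n.i * length(beta)), n.i)  # x_it ~ N(0, 1)
v.i = rgamma(1, shape = 3, rate = 3)  # v_i ~ Gamma(3, 3)
b.i = rnorm(1, 0, sqrt(0.5/v.i))  # random intercept, z_it = 1
y = as.integer(X %*% beta + b.i + rnorm(n.i) > 0)  # binary probit responses
y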
In addition to the different correlation structures, we also consider data that are missing at random (MAR), with the missingness depending on previously observed responses. We set the missingness mechanism as follows:
\[\begin{align*}
\eta_{it} &= -1.5 \times y_{i,t-1} + 1.2\times y_{i,t-2}.
\end{align*}\]
Then the missing probability depends on \(\eta_{it}\)’s defined as
\[
p^{\mbox{miss}}_{it} = \frac{e^{\eta_{it}}}{1+e^{\eta_{it}}}.
\]
The response for subject \(i\) at time point \(t\) is then set to missing with probability \(p^{\mbox{miss}}_{it}\), according to the three preceding observed responses for the subject.
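A one-line sketch of the resulting missingness probability (illustrative, outside the package):

miss.prob = function(y.lag1, y.lag2) plogis(-1.5 * y.lag1 + 1.2 * y.lag2)
c(miss.prob(1, 0), miss.prob(0, 1))  # about 0.18 and 0.77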
Simulation 1: MCD Correlation Structure
The correlation matrix \(R_i\) is created based on the given values
\[\begin{align}\label{Eq:Order_in_HS}
\delta = (-0.5, -0.3)' \quad \mbox{and} \quad u_{itj} = \left(\mathbf{I}\left\{|t-j|=1\right\}, \mathbf{I}\left\{|t-j|=2\right\}\right)'
\end{align}\]
library(BayesRGMM)
rm(list = ls(all = TRUE))
Fixed.Effs = c(-0.2, -0.3, 0.8, -0.4)
P = length(Fixed.Effs)
q = 1
T = 5
N = 100
num.of.iter = 100
HSD.para = c(-0.5, -0.3)
a = length(HSD.para)
w = array(runif(T * T * a), c(T, T, a))
for (time.diff in 1:a) w[, , time.diff] = 1 * (as.matrix(dist(1:T, method = "manhattan")) == time.diff)
HSD.sim.data = SimulatedDataGenerator(Num.of.Obs = N, Num.of.TimePoints = T, Fixed.Effs = Fixed.Effs,
Random.Effs = list(Sigma = 0.5 * diag(1), df = 3), Cor.in.DesignMat = 0, Missing = list(Missing.Mechanism = 2,
RegCoefs = c(-1.5, 1.2)), Cor.Str = "HSD", HSD.DesignMat.para = list(HSD.para = HSD.para,
DesignMat = w))
hyper.params = list(sigma2.beta = 1, sigma2.delta = 1, v.gamma = 5, InvWishart.df = 5,
InvWishart.Lambda = diag(q))
HSD.output = BayesRobustProbit(fixed = as.formula(paste("y~-1+", paste0("x", 1:P,
collapse = "+"))), data = HSD.sim.data$sim.data, random = ~1, HS.model = ~IndTime1 +
IndTime2, subset = NULL, na.action = "na.exclude", hyper.params = hyper.params,
num.of.iter = num.of.iter)
#> Start running MCMC procedure:
#>
#> Finish MCMC Procedure.
#>
#> Data Descriptives:
#> Longitudinal Data Information:
#> Number of Observations: 354 Number of Covariates: 3
#> Number of subjects: 100
original = options(digits = 4)
Model.Estimation = BayesRobustProbitSummary(HSD.output)
cat("\nCoefficients:\n")
#>
#> Coefficients:
print(Model.Estimation$beta.est.CI)
#> PostMean StErr 2.5% 97.5%
#> x1 -0.4 0.04 -0.8 -0.07
#> x2 -0.3 0.04 -0.7 0.05
#> x3 0.9 0.03 0.6 1.14
#> x4 -0.3 0.03 -0.5 -0.07
cat("\nParameters in HSD model:\n")
#>
#> Parameters in HSD model:
print(Model.Estimation$delta.est.CI)
#> PostMean StErr 2.5% 97.5%
#> IndTime1 -0.57 0.09 -0.9 0.01
#> IndTime2 -0.07 0.05 -0.3 0.16
cat("\nRandom effect: \n")
#>
#> Random effect:
print(Model.Estimation$random.cov)
#> Variance
#> (Intercept) 0.4
cat("\nModel Information:\n")
#>
#> Model Information:
print(Model.Estimation$model.info)
#> logL AIC BIC CIC DIC MPL RJR ACC
#> -118 448 724 0.4 367 5 1 0.9
cat("\nEstimate of Ri: \n")
#>
#> Estimate of Ri:
print(Model.Estimation$Ri, quote = FALSE)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.0 0.4 0.1 0.0 0.0
#> [2,] 0.4 1.0 0.4 0.0 0.0
#> [3,] 0.1 0.4 1.0 0.4 0.0
#> [4,] 0.0 0.0 0.4 1.0 0.4
#> [5,] 0.0 0.0 0.0 0.4 1.0
options(original)
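For comparison with the estimate of \(R_i\) printed above, the true correlation matrix implied by the data-generating values can be computed with `CorrMat.HSD`; we assume here that it takes the HSD design array `w` and the HS parameter vector as its arguments:

# Assumed usage: CorrMat.HSD(w, delta) returns the correlation matrix implied
# by the design array w and HS parameters delta.
round(CorrMat.HSD(w, HSD.para), 2)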
Simulation 2: ARMA Correlation Structure
To model the serial dependence of the repeated measurements, we consider an ARMA(1, 1) correlation structure with
\[
\phi = 0.4, \qquad \mbox{and}\qquad \psi = 0.2.
\]
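Under this specification, the implied serial correlation of the latent errors (given \(b_i\)) is \(\mathrm{cov2cor}(\Phi_i^{-1}\Psi_i\Psi_i^T\Phi_i^{-T})\); a short sketch with \(n_i=5\) and the values above:

n = 5
Phi = diag(n)
Psi = diag(n)
for (t in 2:n) {
    Phi[t, t - 1] = -0.4  # GARP phi
    Psi[t, t - 1] = 0.2  # GMAP psi
}
V = solve(Phi) %*% Psi %*% t(Psi) %*% t(solve(Phi))
round(cov2cor(V), 2)  # correlation decays with time lag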
library(BayesRGMM)
rm(list = ls(all = TRUE))
Fixed.Effs = c(-0.2, -0.8, 1, -1.2)
P = length(Fixed.Effs)
q = 1
T = 10
N = 100
num.of.iter = 100
ARMA.sim.data = SimulatedDataGenerator(Num.of.Obs = N, Num.of.TimePoints = T, Fixed.Effs = Fixed.Effs,
Random.Effs = list(Sigma = 0.5 * diag(1), df = 3), Cor.in.DesignMat = 0, Missing = list(Missing.Mechanism = 2,
RegCoefs = c(-1.5, 1.2)), Cor.Str = "ARMA", ARMA.para = list(AR.para = 0.4,
MA.para = 0.2))
ARMA.output = BayesRobustProbit(fixed = as.formula(paste("y~-1+", paste0("x", 1:P,
collapse = "+"))), data = ARMA.sim.data$sim.data, random = ~1, subset = NULL,
na.action = "na.exclude", arma.order = c(1, 1), num.of.iter = num.of.iter)
#>
#> Data Descriptives:
#> Longitudinal Data Information:
#> Number of Observations: 817 Number of Covariates: 3
#> Number of subjects: 100
#>
#> Start running MCMC procedure:
#>
#> Finish MCMC Procedure.
original = options(digits = 4)
Model.Estimation = BayesRobustProbitSummary(ARMA.output)
cat("\nCoefficients:\n")
#>
#> Coefficients:
print(Model.Estimation$beta.est.CI)
#> PostMean StErr 2.5% 97.5%
#> x1 -0.06 0.03 -0.3 0.172
#> x2 -0.94 0.13 -1.4 -0.003
#> x3 0.93 0.07 0.4 1.191
#> x4 -0.96 0.05 -1.2 -0.653
cat("\nAMRA parameters:\n\n")
#>
#> AMRA parameters:
print(Model.Estimation$arma.est)
#> PostMean StErr 2.5% 97.5%
#> phi 1 0.004 0.01 -0.06 0.1
#> psi 1 0.071 0.02 -0.06 0.2
cat("\nRandom effect: \n")
#>
#> Random effect:
print(Model.Estimation$random.cov)
#> Variance
#> (Intercept) 0.3
cat("\nModel Information:\n")
#>
#> Model Information:
print(Model.Estimation$model.info)
#> logL AIC BIC CIC DIC MPL RJR ACC
#> -233 678 954 2e-06 580 13 1 0.9
options(original)
Daniels, Michael J., and Yan D. Zhao. 2003. “Modelling the Random Effects Covariance Matrix in Longitudinal Data.” Statistics in Medicine 22: 1631–47.
Pan, J., and G. Mackenzie. 2003. “On Modelling Mean-Covariance Structures in Longitudinal Studies.” Biometrika 90: 239–44.
Zhang, W., C. Leng, and C. Y. Tang. 2015. “A Joint Modelling Approach for Longitudinal Studies.” Journal of the Royal Statistical Society, Series B 77: 219–38.