In this vignette we illustrate the use of a number of basic generic functions for models fitted by the mixed_model()
function of package GLMMadaptive.
We start by simulating some data for a binary longitudinal outcome:
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
We continue by fitting a mixed-effects logistic regression for y, assuming random intercepts and random slopes in the random-effects part.
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
As in the majority of model-fitting functions in R, the print()
and summary()
methods display a short and a detailed output of the fitted model, respectively. For 'MixMod'
objects we obtain
fm
#>
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF,
#> family = binomial())
#>
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Random effects covariance matrix:
#> StdDev Corr
#> (Intercept) 0.7260
#> time 0.2489 0.6256
#>
#> Fixed effects:
#> (Intercept) sexfemale time sexfemale:time
#> -2.071584799 -1.172034240 0.262702412 0.002493367
#>
#> log-Lik: -358.8284
and
summary(fm)
#>
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF,
#> family = binomial())
#>
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -358.8284 731.6568 749.893
#>
#> Random effects covariance matrix:
#> StdDev Corr
#> (Intercept) 0.7260
#> time 0.2489 0.6256
#>
#> Fixed effects:
#> Value Std.Err z-value p-value
#> (Intercept) -2.0716 0.3199 -6.4755 < 1e-04
#> sexfemale -1.1720 0.4722 -2.4819 0.01307
#> time 0.2627 0.0564 4.6613 < 1e-04
#> sexfemale:time 0.0025 0.0781 0.0319 0.97453
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
The output is rather self-explanatory. Just note that the fixed-effects coefficients are on the linear predictor scale; hence, they are log-odds for the intercept and log-odds ratios for the remaining parameters. The summary()
only shows the estimated coefficients, standard errors and p-values, but no confidence intervals. These can be separately obtained using the confint()
method, i.e.,
exp(confint(fm))
#> 2.5 % 97.5 %
#> (Intercept) 0.06729962 0.2358477
#> sexfemale 0.12274967 0.7815624
#> time 1.16444341 1.4523190
#> sexfemale:time 0.86021510 1.1683115
By default, confidence intervals are produced for the fixed effects. Hence, taking the exp() of the limits we obtain confidence intervals for the corresponding odds ratios. In addition, by default, the level of the confidence intervals is 95%. The following piece of code produces 90% confidence intervals for the variances of the random intercepts and slopes, and for their covariance:
confint(fm, parm = "var-cov", level = 0.90)
#> 5 % 95 %
#> var.(Intercept) 0.06950694 3.9970863
#> cov.(Int)_time -0.04285652 0.9474592
#> var.time 0.02770139 1.3373452
The estimated variance-covariance matrix of the maximum likelihood estimates of all parameters is returned using the vcov()
method, e.g.,
vcov(fm)
#> (Intercept) sexfemale time sexfemale:time
#> (Intercept) 0.102343645 -0.076210649 -0.008492563 6.175501e-03
#> sexfemale -0.076210649 0.223010883 0.005931219 -1.767413e-02
#> time -0.008492563 0.005931219 0.003176198 -2.954446e-03
#> sexfemale:time 0.006175501 -0.017674132 -0.002954446 6.099051e-03
#> D_11 -0.085880010 -0.020019772 0.008266537 1.722885e-03
#> D_12 0.017555327 0.004438214 -0.001891643 -4.506264e-04
#> D_22 -0.082121459 -0.020038676 0.010306830 8.189895e-05
#> D_11 D_12 D_22
#> (Intercept) -0.085880010 0.0175553274 -8.212146e-02
#> sexfemale -0.020019772 0.0044382136 -2.003868e-02
#> time 0.008266537 -0.0018916428 1.030683e-02
#> sexfemale:time 0.001722885 -0.0004506264 8.189895e-05
#> D_11 0.379264088 -0.0950463827 4.268870e-01
#> D_12 -0.095046383 0.0374304672 -1.872794e-01
#> D_22 0.426886977 -0.1872794290 1.058810e+00
The elements of this covariance matrix that correspond to the elements of the covariance matrix of the random effects (i.e., the elements D_xx
) are on the log-Cholesky scale.
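As a rough illustration of what the log-Cholesky scale means, the following sketch maps such a vector of parameters back to a covariance matrix, assuming the vector holds a lower-triangular Cholesky factor filled column-wise with log-transformed diagonal elements; chol_to_cov() is a hypothetical helper, and the exact parameterization used internally may differ, so consult the package documentation before relying on it.
# illustrative sketch only: back-transform log-Cholesky parameters to a covariance matrix
chol_to_cov <- function (theta, p = 2) {
    L <- matrix(0, p, p)
    L[lower.tri(L, diag = TRUE)] <- theta # fill the lower triangle column-wise
    diag(L) <- exp(diag(L)) # the diagonal is stored on the log scale
    L %*% t(L) # reconstruct the covariance matrix
}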
To extract the estimated fixed effects coefficients from a fitted mixed model, we can use the fixef()
method. Similarly, the empirical Bayes estimates of the random effects are extracted using the ranef()
method, and finally the coef()
method returns the subject-specific coefficients, i.e., the sum of the fixed and random effects coefficients:
fixef(fm)
#> (Intercept) sexfemale time sexfemale:time
#> -2.071584799 -1.172034240 0.262702412 0.002493367
head(ranef(fm))
#> (Intercept) time
#> 1 -0.4028803 -0.119484844
#> 2 0.7032286 0.284875825
#> 3 0.5090121 0.211931456
#> 4 -0.2303141 -0.004750287
#> 5 0.2429221 0.161494572
#> 6 -0.3855094 -0.182391588
head(coef(fm))
#> (Intercept) sexfemale time sexfemale:time
#> 1 -2.474465 -1.172034 0.14321757 0.002493367
#> 2 -1.368356 -1.172034 0.54757824 0.002493367
#> 3 -1.562573 -1.172034 0.47463387 0.002493367
#> 4 -2.301899 -1.172034 0.25795212 0.002493367
#> 5 -1.828663 -1.172034 0.42419698 0.002493367
#> 6 -2.457094 -1.172034 0.08031082 0.002493367
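As a small sanity check (output not shown), the subject-specific intercepts above are simply the fixed intercept plus the empirical Bayes estimates of the random intercepts:
head(fixef(fm)["(Intercept)"] + ranef(fm)[, "(Intercept)"])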
The fixed-effects estimates in mixed models with nonlinear link functions have an interpretation conditional on the random effects. However, we often wish to obtain parameters with a marginal / population-averaged interpretation, which leads many researchers to use generalized estimating equations, with their associated issues under missing data. Recently, Hedeker et al. proposed a nice solution to this problem. Their approach is implemented in the function marginal_coefs()
. For example, for model fm
we obtain the marginalized coefficients using:
marginal_coefs(fm)
#> (Intercept) sexfemale time sexfemale:time
#> -1.6016 -1.0881 0.1766 0.0506
In our case (a binary outcome), the function calculates the marginal probabilities using a Monte Carlo procedure, with the number of samples determined by the M
argument.
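For instance, one could increase the number of Monte Carlo samples (output not shown):
marginal_coefs(fm, M = 5000)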
Standard errors for the marginalized coefficients are obtained by setting std_errors = TRUE
in the call to marginal_coefs()
, and require a double Monte Carlo procedure in which the K argument also comes into play. To speed up computations, the outer Monte Carlo procedure is performed in parallel using package parallel, with the number of cores specified in the cores argument (due to the required computing time, these standard errors are not displayed):
marginal_coefs(fm, std_errors = TRUE)
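The computation of the standard errors can be spread over multiple cores, e.g. (again not run here):
marginal_coefs(fm, std_errors = TRUE, cores = 4)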
The fitted()
method extracts fitted values from the fitted mixed model. These are always on the scale of the response variable. The type
argument of fitted()
specifies the type of fitted values computed. The default is type = "mean_subject"
which corresponds to fitted values calculated using only the fixed-effects part of the linear predictor; hence, they refer to a subject with random-effects values equal to zero, i.e., the “mean subject”:
head(fitted(fm))
#> 1 2 3 4 5 6
#> 0.1118895 0.1156621 0.1647183 0.5815852 0.5940199 0.5950458
Setting type = "subject_specific"
will calculate the fitted values using both the fixed and random effects parts, where for the latter the empirical Bayes estimates of the random effects are used:
head(fitted(fm, type = "subject_specific"))
#> 1 2 3 4 5 6
#> 0.07766777 0.07914174 0.09707109 0.23765440 0.24276090 0.24318768
Finally, setting type = "marginal"
will calculate the fitted values based on the multiplication of the fixed-effects design matrix with the marginalized coefficients described above (due to the required computing time, these fitted values are not displayed):
head(fitted(fm, type = "marginal"))
The residuals()
method simply calculates the residuals by subtracting the fitted values from the observed repeated measurements outcome. Hence, this method also has a type
argument with exactly the same options as the fitted()
method.
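For example, residuals based on the subject-specific fitted values are obtained with (output not shown):
head(residuals(fm, type = "subject_specific"))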
To display the estimated longitudinal evolutions of the binary outcome we can use an effect plot. This is simply a plot of the predictions from the model with the corresponding 95% pointwise confidence intervals.
As a first step we create a data frame that provides the setting based on which the plot is to be produced; function expand.grid()
is helpful in this regard:
nDF <- with(DF, expand.grid(time = seq(min(time), max(time), length.out = 15),
sex = levels(sex)))
Next we use the effectPlotData()
function, which does the heavy lifting, i.e., it calculates the predictions and confidence intervals from the fitted mixed model for the data frame provided above:
plot_data <- effectPlotData(fm, nDF)
Then we can produce the plot using for example the xyplot()
function from package lattice, e.g.,
library("lattice")
xyplot(pred + low + upp ~ time | sex, data = plot_data,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "log odds")
expit <- function (x) exp(x) / (1 + exp(x))
xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "Subject-Specific Probabilities")
The effectPlotData()
function can also compute marginal predictions based on the marginalized coefficients described above. This is achieved by setting marginal = TRUE
in the respective call (due to the required computing time, the plot is not shown):
plot_data_m <- effectPlotData(fm, nDF, marginal = TRUE)
xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data_m,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "Marginal Probabilities")
The anova()
method can be used to compare two fitted mixed models using a likelihood ratio test. For example, we can test the null hypothesis that the covariance between the random intercepts and slopes is equal to zero using
gm <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
family = binomial())
anova(gm, fm)
#>
#> AIC BIC log.Lik LRT df p.value
#> gm 730.94 746.57 -359.47
#> fm 731.66 749.89 -358.83 1.29 1 0.2566
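The same machinery can be used for nested fixed-effects structures; for example, a likelihood ratio test for the sex:time interaction could be carried out as follows, where fm_main is just an illustrative model name and the output is not shown:
fm_main <- mixed_model(fixed = y ~ sex + time, random = ~ time | id, data = DF,
                       family = binomial())
anova(fm_main, fm)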