Methods for MixMod Objects

Dimitris Rizopoulos

2018-06-11

MixMod Objects

In this vignette we illustrate the use of a number of basic generic functions for models fitted by the mixed_model() function of package GLMMadaptive.

We start by simulating some data for a binary longitudinal outcome:

set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)

betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes

# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
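
A quick look at the first few rows of the simulated data (output not shown):

head(DF)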

We continue by fitting the mixed effects logistic regression for y assuming random intercepts and random slopes for the random-effects part.

fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
                  family = binomial())

Detailed Output, Confidence Intervals and Covariance Matrix of the MLEs

As in the majority of model-fitting functions in R, the print() and summary() methods display a short and a detailed output of the fitted model, respectively. For 'MixMod' objects we obtain

fm
#> 
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF, 
#>     family = binomial())
#> 
#> 
#> Model:
#>  family: binomial
#>  link: logit 
#> 
#> Random effects covariance matrix:
#>              StdDev    Corr
#> (Intercept)  0.7260        
#> time         0.2489  0.6256
#> 
#> Fixed effects:
#>    (Intercept)      sexfemale           time sexfemale:time 
#>   -2.071584799   -1.172034240    0.262702412    0.002493367 
#> 
#> log-Lik: -358.8284

and

summary(fm)
#> 
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF, 
#>     family = binomial())
#> 
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100 
#> 
#> Model:
#>  family: binomial
#>  link: logit 
#> 
#> Fit statistics:
#>    log.Lik      AIC     BIC
#>  -358.8284 731.6568 749.893
#> 
#> Random effects covariance matrix:
#>              StdDev    Corr
#> (Intercept)  0.7260        
#> time         0.2489  0.6256
#> 
#> Fixed effects:
#>                  Value Std.Err z-value p-value
#> (Intercept)    -2.0716  0.3199 -6.4755 < 1e-04
#> sexfemale      -1.1720  0.4722 -2.4819 0.01307
#> time            0.2627  0.0564  4.6613 < 1e-04
#> sexfemale:time  0.0025  0.0781  0.0319 0.97453
#> 
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#> 
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE

The output is rather self-explanatory. Note, however, that the fixed-effects coefficients are on the linear predictor scale, and hence correspond to the log-odds for the intercept and to log-odds ratios for the remaining parameters. For the fixed effects, summary() shows only the estimated coefficients, standard errors and p-values, but no confidence intervals. These can be obtained separately using the confint() method, i.e.,

exp(confint(fm))
#>                     2.5 %    97.5 %
#> (Intercept)    0.06729962 0.2358477
#> sexfemale      0.12274967 0.7815624
#> time           1.16444341 1.4523190
#> sexfemale:time 0.86021510 1.1683115

By default the confidence intervals are produced for the fixed effects. Hence, by exponentiating the limits we obtain confidence intervals for the corresponding odds ratios. Also by default, the level of the confidence intervals is 95%. The following piece of code produces 90% confidence intervals for the variances of the random intercepts and slopes, and for their covariance:

confint(fm, parm = "var-cov", level = 0.90)
#>                         5 %      95 %
#> var.(Intercept)  0.06950694 3.9970863
#> cov.(Int)_time  -0.04285652 0.9474592
#> var.time         0.02770139 1.3373452
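
Since the fixed-effects estimates are log-odds (ratios), their point estimates can likewise be placed on the odds-ratio scale by exponentiating them, here using the fixef() method described further below (output not shown):

exp(fixef(fm))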

The estimated variance-covariance matrix of the maximum likelihood estimates of all parameters is returned using the vcov() method, e.g.,

vcov(fm)
#>                 (Intercept)    sexfemale         time sexfemale:time
#> (Intercept)     0.102343645 -0.076210649 -0.008492563   6.175501e-03
#> sexfemale      -0.076210649  0.223010883  0.005931219  -1.767413e-02
#> time           -0.008492563  0.005931219  0.003176198  -2.954446e-03
#> sexfemale:time  0.006175501 -0.017674132 -0.002954446   6.099051e-03
#> D_11           -0.085880010 -0.020019772  0.008266537   1.722885e-03
#> D_12            0.017555327  0.004438214 -0.001891643  -4.506264e-04
#> D_22           -0.082121459 -0.020038676  0.010306830   8.189895e-05
#>                        D_11          D_12          D_22
#> (Intercept)    -0.085880010  0.0175553274 -8.212146e-02
#> sexfemale      -0.020019772  0.0044382136 -2.003868e-02
#> time            0.008266537 -0.0018916428  1.030683e-02
#> sexfemale:time  0.001722885 -0.0004506264  8.189895e-05
#> D_11            0.379264088 -0.0950463827  4.268870e-01
#> D_12           -0.095046383  0.0374304672 -1.872794e-01
#> D_22            0.426886977 -0.1872794290  1.058810e+00

The elements of this covariance matrix that correspond to the elements of the covariance matrix of the random effects (i.e., the elements D_xx) are on the log-Cholesky scale.
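
To give an idea of what this parameterization entails, the following sketch maps a vector of log-Cholesky parameters to a 2 x 2 covariance matrix; it is meant for illustration only, as the exact ordering and conventions used internally by GLMMadaptive may differ:

log_chol_to_cov <- function (theta) {
    # illustration only: the internal conventions of GLMMadaptive may differ
    L <- matrix(0, 2, 2)
    L[lower.tri(L, diag = TRUE)] <- theta  # fill the lower triangle column-wise
    diag(L) <- exp(diag(L))                # diagonal entries are kept positive via exp()
    L %*% t(L)                             # the implied covariance matrix
}
log_chol_to_cov(c(log(0.7), 0.2, log(0.3)))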

Fixed and Random Effects Estimates

To extract the estimated fixed effects coefficients from a fitted mixed model, we can use the fixef() method. Similarly, the empirical Bayes estimates of the random effects are extracted using the ranef() method, and finally the coef() method returns the subject-specific coefficients, i.e., the sum of the fixed and random effects coefficients:

fixef(fm)
#>    (Intercept)      sexfemale           time sexfemale:time 
#>   -2.071584799   -1.172034240    0.262702412    0.002493367
head(ranef(fm))
#>   (Intercept)         time
#> 1  -0.4028803 -0.119484844
#> 2   0.7032286  0.284875825
#> 3   0.5090121  0.211931456
#> 4  -0.2303141 -0.004750287
#> 5   0.2429221  0.161494572
#> 6  -0.3855094 -0.182391588
head(coef(fm))
#>   (Intercept) sexfemale       time sexfemale:time
#> 1   -2.474465 -1.172034 0.14321757    0.002493367
#> 2   -1.368356 -1.172034 0.54757824    0.002493367
#> 3   -1.562573 -1.172034 0.47463387    0.002493367
#> 4   -2.301899 -1.172034 0.25795212    0.002493367
#> 5   -1.828663 -1.172034 0.42419698    0.002493367
#> 6   -2.457094 -1.172034 0.08031082    0.002493367
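
As a quick sanity check (a sketch; output not shown), the subject-specific coefficients for the terms that have random effects should equal the fixed effects plus the corresponding empirical Bayes estimates:

# only the intercept and time have random effects, so only these two
# columns of coef(fm) vary across subjects
all.equal(coef(fm)[, "time"],
          fixef(fm)["time"] + ranef(fm)[, "time"],
          check.attributes = FALSE)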

Marginalized Coefficients

The fixed-effects estimates in mixed models with nonlinear link functions have an interpretation conditional on the random effects. However, we often wish to obtain parameters with a marginal / population-averaged interpretation, which leads many researchers to use generalized estimating equations instead, and hence to deal with their potential issues with missing data. Recently, Hedeker et al. have proposed a nice solution to this problem; their approach is implemented in the function marginal_coefs(). For example, for model fm we obtain the marginalized coefficients using:

marginal_coefs(fm)
#>    (Intercept)      sexfemale           time sexfemale:time 
#>        -1.6016        -1.0881         0.1766         0.0506

In our case (because we have a binary outcome), the function calculates the marginal probabilities using a Monte Carlo procedure, with the number of samples determined by the M argument.
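
For example, a larger Monte Carlo sample can be requested via M (output not shown):

marginal_coefs(fm, M = 5000)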

Standard errors for the marginalized coefficients are obtained by setting std_errors = TRUE in the call to marginal_coefs(); this requires a double Monte Carlo procedure, in which the argument K also comes into play. To speed up computations, the outer Monte Carlo procedure is performed in parallel using package parallel, with the number of cores specified in the cores argument (due to the required computing time, these standard errors are not displayed):

marginal_coefs(fm, std_errors = TRUE)
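
For instance, to run the outer Monte Carlo procedure on two cores (again not displayed):

marginal_coefs(fm, std_errors = TRUE, cores = 2)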

Fitted Values and Residuals

The fitted() method extracts fitted values from the fitted mixed model. These are always on the scale of the response variable. The type argument of fitted() specifies the type of fitted values computed. The default is type = "mean_subject", which corresponds to fitted values calculated using only the fixed-effects part of the linear predictor; hence, they refer to a subject with random-effects values equal to zero, i.e., the “mean subject”:

head(fitted(fm))
#>         1         2         3         4         5         6 
#> 0.1118895 0.1156621 0.1647183 0.5815852 0.5940199 0.5950458

Setting type = "subject_specific" will calculate the fitted values using both the fixed and random effects parts, where for the latter the empirical Bayes estimates of the random effects are used:

head(fitted(fm, type = "subject_specific"))
#>          1          2          3          4          5          6 
#> 0.07766777 0.07914174 0.09707109 0.23765440 0.24276090 0.24318768

Finally, setting type = "marginal" will calculate the fitted values based on the multiplication of the fixed-effects design matrix with the marginalized coefficients described above (due to the required computing time, these fitted values are not displayed):

head(fitted(fm, type = "marginal"))

The residuals() method simply calculates the residuals by subtracting the fitted values from the observed repeated measurements. Hence, this method also has a type argument with exactly the same options as the fitted() method.
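
For example (output not shown), the following computes the residuals based on the "mean subject" and the subject-specific fitted values, respectively:

head(residuals(fm, type = "mean_subject"))
head(residuals(fm, type = "subject_specific"))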

Effect Plots

To display the estimated longitudinal evolutions of the binary outcome we can use an effect plot. This simply shows the predictions from the model with the corresponding 95% pointwise confidence intervals.

As a first step we create a data frame that provides the setting for which the plot is to be produced; function expand.grid() is helpful in this regard:

nDF <- with(DF, expand.grid(time = seq(min(time), max(time), length.out = 15),
                            sex = levels(sex)))

Next we use the effectPlotData() function, which does the heavy lifting, i.e., it calculates the predictions and confidence intervals from the fitted mixed model for the data frame provided above:

plot_data <- effectPlotData(fm, nDF)
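
The resulting data frame contains the covariates in nDF together with the predictions (pred) and the lower and upper confidence limits (low and upp) used in the plots below, e.g. (output not shown):

head(plot_data)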

Then we can produce the plot using, for example, the xyplot() function from package lattice:

library("lattice")
xyplot(pred + low + upp ~ time | sex, data = plot_data,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "log odds")

expit <- function (x) exp(x) / (1 + exp(x))
xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "Subject-Specific Probabilities")

The effectPlotData() function also allows computing marginal predictions based on the marginalized coefficients described above. This is achieved by setting marginal = TRUE in the respective call (due to the required computing time, the plot is not shown):

plot_data_m <- effectPlotData(fm, nDF, marginal = TRUE)

xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data_m,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "Marginal Probabilities")

Comparing Two Models

The anova() method can be used to compare two fitted mixed models using a likelihood ratio test. For example, we can test the null hypothesis that the covariance between the random intercepts and slopes is equal to zero using

gm <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
                  family = binomial())

anova(gm, fm)
#> 
#>       AIC    BIC log.Lik  LRT df p.value
#> gm 730.94 746.57 -359.47                
#> fm 731.66 749.89 -358.83 1.29  1  0.2566
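
The same test can be reproduced by hand (a sketch, assuming the logLik() method is available for 'MixMod' objects, consistent with the log-likelihood values reported above; output not shown):

# likelihood ratio statistic and its p-value; df = 1 because the two models
# differ only in the covariance between the random intercepts and slopes
LRT <- 2 * (as.numeric(logLik(fm)) - as.numeric(logLik(gm)))
pchisq(LRT, df = 1, lower.tail = FALSE)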