The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The package HLMdiag
was created in order to provide a unified framework for analysts to diagnose hierarchical linear models (HLMs). When HLMdiag
was created in 2014, it made diagnostic procedures available for HLMs that had not yet been implemented in statistical software. Over the past 6 years, other packages have gradually implemented some of these procedures; however, diagnosing a model still often requires multiple packages, each of which has its own syntax and quirks. HLMdiag
provides diagnostic tools targeting all aspects and levels of hierarchical linear models in a single package. In this update, HLMdiag
focuses on usability in its functions. Drawing inspiration from the augment()
function in the broom
package, the new functions hlm_resid()
, hlm_influence()
, and hlm_augment()
append diagnostic information to a model’s data frame, allowing the analyst to easily work with and see how residuals and influence diagnostics relate to the original variables.
This vignette introduces new functions concerning residual diagnostics in HLMdiag
, specifically hlm_resid()
, pull_resid()
, and hlm_augment()
. The main focus of this vignette, hlm_resid()
, is intended to replace HLMresid()
in functionality, and works more robustly with 3-level models as well as models fit with nlme
. Additionally, hlm_resid()
appends residuals and fitted values to a model’s data frame in the form of a tibble. Tibbles help the analyst diagnose models by simplifying the creation of diagnostic plots, especially within ggplot2
, and working seamlessly with sorting functions in dplyr
.
For information about influence diagnostics such as Cook’s distance and leverage in HLMs, the reader should see the vignette for hlm_influence()
.
The function hlm_resid()
takes a hierarchical linear model fit as a lmerMod
or lme
object and extracts residuals and predicted values from the model using both Least Squares (LS) and Empirical Bayes (EB) methods in estimating parameters. Whereas the old function, HLMresid()
, would return a vector with a single type of residual, hlm_resid()
appends specified residuals to the data frame of the model in the form of a tibble. The use of a tibble allows the analyst to easily plot residuals against explanatory variables and identify possible outliers or model deficiencies. The function draws heavy inspiration from the augment
function of the broom
package, however offers more options for the types of residuals that the analyst may want to use.
The presence of multiple sources of variability in hierarchical linear models results in numerous quantities defining residuals. All functions in HLMdiag
follow the classification by Hilden-Minton (1995) and define three types of residuals:
The definitions of level-1 and higher-level residuals lead to different residuals depending on how the fixed and random model coefficients are estimated. hlm_resid()
implements two estimation methods: Least Squares estimation and Empirical Bayes estimation. For a comprehensive discussion of these two methods and how they are implemented, we direct the reader to Loy and Hoffman (2014).
LS and EB residuals each have their own advantages and disadvantages. Level-1 LS residuals are calculated by fitting separate linear models to each group and using LS to estimate the fixed and random effects. Because they rely only on the first level of the hierarchy, they are unconfounded with higher-level residuals and a useful first step in diagnosing a model; however, for small group sample sizes, LS residuals are unreliable. Level-1 EB residuals are defined as the conditional modes of the random effects given the data and the estimated parameter values, which are calculated with (restricted) maximum likelihood. They are confounded with higher-level residuals, but more robust with small sample sizes. For higher-level residuals, coefficients can again be estimated using either LS or EB, but EB residuals are generally preferred due to small group sizes.
To illustrate the use of hlm_resid()
, we make use of data on exam scores of 4,059 students in 65 inner London schools. This data set is distributed as part of the R package mlmRev
(Bates, Maechler, Bolker 2013), which makes well known multilevel modeling data sets available in R.
data("Exam", package = "mlmRev")
head(Exam)
#> school normexam schgend schavg vr intake standLRT sex type
#> 1 1 0.2613242 mixed 0.1661752 mid 50% bottom 25% 0.6190592 F Mxd
#> 2 1 0.1340672 mixed 0.1661752 mid 50% mid 50% 0.2058022 F Mxd
#> 3 1 -1.7238820 mixed 0.1661752 mid 50% top 25% -1.3645760 M Mxd
#> 4 1 0.9675862 mixed 0.1661752 mid 50% mid 50% 0.2058022 F Mxd
#> 5 1 0.5443412 mixed 0.1661752 mid 50% mid 50% 0.3711052 F Mxd
#> 6 1 1.7348992 mixed 0.1661752 mid 50% bottom 25% 2.1894372 M Mxd
#> student
#> 1 143
#> 2 145
#> 3 142
#> 4 141
#> 5 138
#> 6 155
For each student, the data consist of their gender (sex
) and two standardized exam scores— an intake score on the London Reading Test (LRT) at age 11 (standLRT
) and a score on the General Certificate of Secondary Education (GCSE) examination at age 16 (normexam
). Additionally, the students’ LRT scores were used to segment students into three categories (bottom 25%, middle 50%, and top 25%) based on their verbal reasoning subscore (vr
) and overall score (intake)
. At the school level, the data contain the average intake score for the school (schavg
) and type based on school gender (schgend
, coded as mixed, boys, or girls).
We illustrate usage for hlm_resid()
below using the exam data and fitting an initial two-level HLM using a student’s standardized London Reading Test intake score (standLRT) to explain their GCSE exam score, allowing for a random intercept for each school:
<- lme4::lmer(normexam ~ standLRT + (1 | school), Exam, REML = FALSE)
fm1
fm1#> Linear mixed model fit by maximum likelihood ['lmerMod']
#> Formula: normexam ~ standLRT + (1 | school)
#> Data: Exam
#> AIC BIC logLik deviance df.resid
#> 9365.243 9390.478 -4678.622 9357.243 4055
#> Random effects:
#> Groups Name Std.Dev.
#> school (Intercept) 0.3035
#> Residual 0.7522
#> Number of obs: 4059, groups: school, 65
#> Fixed Effects:
#> (Intercept) standLRT
#> 0.002391 0.563371
hlm_resid()
takes 5 arguments:
object
- the model fit as a lmerMod
or lme
object.
level
- the level at which residuals should be extracted. By default, level = 1
, returning both EB and LS level-1 residuals as well as marginal residuals. level
can also be set to a grouping factor as defined in names(object@flist)
in lme4
or in names(object$groups)
in nlme
. When set to a grouping factor, hlm_resid()
returns both EB and LS higher-level residuals, otherwise known as the random effects for each group.
standardize
- a logical indicating if the returned residuals should be standardized. By default, standardize = FALSE
, returning unstandardized residuals. When standardize = TRUE
, residuals are divided by the estimated standard error. For marginal residuals, when standardize = TRUE
, the Cholesky residuals are returned.
include.ls
- a logical indicating if LS residuals should be returned. By default, include.ls = TRUE
. The LS method of estimating residuals is more computationally intensive than the EB method. Consequently, setting include.ls = FALSE
substantially decreases the runtime on models with many observations.
data
- only necessary if na.action = na.exclude
for a lmerMod
model object. data
should specify the data frame used to fit the model containing observations with NA
values.
To extract unstandardized level-1 residuals and marginal residuals, we use the following call.
library(HLMdiag)
hlm_resid(fm1)
#> # A tibble: 4,059 x 10
#> id normexam standLRT school .resid .fitted .ls.resid .ls.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 0.619 1 -0.464 0.725 -0.561 0.822
#> 2 2 0.134 0.206 1 -0.358 0.492 -0.395 0.529
#> 3 3 -1.72 -1.36 1 -1.33 -0.393 -1.14 -0.585
#> 4 4 0.968 0.206 1 0.475 0.492 0.438 0.529
#> 5 5 0.544 0.371 1 -0.0409 0.585 -0.102 0.647
#> 6 6 1.73 2.19 1 0.125 1.61 -0.201 1.94
#> 7 7 1.04 -1.12 1 1.29 -0.253 1.45 -0.409
#> 8 8 -0.129 -1.03 1 0.0773 -0.206 0.221 -0.350
#> 9 9 -0.939 -0.538 1 -1.01 0.0730 -0.941 0.00167
#> 10 10 -1.22 -1.45 1 -0.780 -0.439 -0.576 -0.643
#> # … with 4,049 more rows, and 2 more variables: .mar.resid <dbl>,
#> # .mar.fitted <dbl>
hlm_resid()
appends 6 columns to the model frame:
.resid
the unstandardized Empirical Bayes (EB) level-1 residuals for each observation.fitted
the Empirical Bayes (EB) fitted values.ls.resid
the unstandardized Least Squares (LS) level-1 residuals.ls.fitted
the Least Squares fitted values.mar.resid
the unstandardized marginal residuals.mar.fitted
the marginal fitted valuesNote that the columns containing the Empirical Bayes residuals are simply named .resid
and .fitted
. This is because both lme4::resid()
and nlme::resid()
use the EB method in estimating residuals, thus the analyst is likely more familiar with this type of residual.
We can alter the output of the level-1 residuals using the standardize
and include.ls
parameters introduced above. The following call returns both the level-1 and marginal residuals divided by the estimated standard error and excludes the least squares residuals.
hlm_resid(fm1, standardize = TRUE, include.ls = FALSE)
#> # A tibble: 4,059 x 8
#> id normexam standLRT school .std.resid .fitted .chol.mar.resid .mar.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 0.619 1 -0.616 0.725 -0.111 0.351
#> 2 2 0.134 0.206 1 -0.476 0.492 0.0353 0.118
#> 3 3 -1.72 -1.36 1 -1.77 -0.393 -1.19 -0.766
#> 4 4 0.968 0.206 1 0.632 0.492 1.21 0.118
#> 5 5 0.544 0.371 1 -0.0544 0.585 0.445 0.211
#> 6 6 1.73 2.19 1 0.167 1.61 0.618 1.24
#> 7 7 1.04 -1.12 1 1.72 -0.253 2.06 -0.627
#> 8 8 -0.129 -1.03 1 0.103 -0.206 0.352 -0.580
#> 9 9 -0.939 -0.538 1 -1.35 0.0730 -1.07 -0.301
#> 10 10 -1.22 -1.45 1 -1.04 -0.439 -0.705 -0.813
#> # … with 4,049 more rows
hlm_resid()
appends the EB and marginal residual columns to the mode frame. The names of the columns containing residuals now have the prefix .std
to reflect the fact that they are standardized. Note that standardized marginal residuals are equivalent to the Cholesky residuals of a HLM and thus are named .chol.mar.resid
.
To access higher-level residuals, we can set the level
parameter to a grouping factor as defined in names(object@flist)
in lme4
or in names(object$groups)
in nlme
.
names(fm1@flist)
#> [1] "school"
To extract the school-level residuals, otherwise known as the predicted random effects for school, we use the following call.
hlm_resid(fm1, level = "school")
#> # A tibble: 65 x 3
#> school .ranef.intercept .ls.intercept
#> <chr> <dbl> <dbl>
#> 1 1 0.374 0.451
#> 2 2 0.502 0.550
#> 3 3 0.504 0.626
#> 4 4 0.0181 0.0719
#> 5 5 0.240 0.329
#> 6 6 0.541 0.671
#> 7 7 0.379 0.467
#> 8 8 -0.0262 0.0429
#> 9 9 -0.135 -0.169
#> 10 10 -0.337 -0.257
#> # … with 55 more rows
In the tibble returned by hlm_resid()
, each row now represents one of the groups specified by the level
parameter. In this example, there were 65 schools from which data was collected, and the resulting tibble has 65 rows. If we had included school-level variables in the model, such as the average intake score for a school (schavg
), those variables would also be included in the output. The final two columns contain the random effects for intercept at the school.
.ranef.intercept
the random effects for intercept, using Empirical Bayes estimation.ls.intercept
the random effects for intercept, using Least Squares estimationHad we included other random effect terms in the model, the EB and LS predictions of their random effects would also be included in the output. Again, the EB random effects are simply named .ranef
because it is assumed that the analyst is most familiar with this type of random effect The parameters standardize
and include.ls
work the same way with higher-level residuals and with both lmerMod
and lme
objects. Setting include.ls = FALSE
is often recommended with higher-level residuals, as calculating LS residuals is computationally intensive and the resulting residuals are often unreliable due to small group sizes.
The final parameter, data
, is an argument that becomes required when na.action = na.exclude
for an lmerMod
model object. Model frames in lme4
always use na.omit
. Thus, a data
parameter is needed to retain NA
values in the model frame that hlm_resid()
returns. To demonstrate this requirement, we create a duplicate data set of Exam
with the standardized LRT scores set to NA
for observations 1, 2, and 5.
# make copy of Exam data set
<- Exam
Exam2 # set standLRT to NA for 3 observations
c(1,2,5),7] <- NA
Exam2[# refit model with na.exclude
<- lme4::lmer(normexam ~ standLRT + (1 | school), Exam2,
fm1.NA na.action = na.exclude)
In the first example, when we try to call hlm_resid()
on our model object, it throws an informative error asking for the data set used to fit the model. We provide this using the data
parameter in the second example.
# incorrect:
hlm_resid(fm1.NA)
#> Error in hlm_resid.lmerMod(fm1.NA): Please provide the data frame used to fit the model. This is necessary when the na.action is set to na.exclude
# correct:
hlm_resid(fm1.NA, data = Exam2)
#> # A tibble: 4,059 x 10
#> id school normexam standLRT .resid .fitted .ls.resid .ls.fitted
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.261 NA NA NA NA NA
#> 2 2 1 0.134 NA NA NA NA NA
#> 3 3 1 -1.72 -1.36 -1.34 -0.381 -1.15 -0.575
#> 4 4 1 0.968 0.206 0.464 0.504 0.423 0.545
#> 5 5 1 0.544 NA NA NA NA NA
#> 6 6 1 1.73 2.19 0.113 1.62 -0.224 1.96
#> 7 7 1 1.04 -1.12 1.28 -0.241 1.44 -0.398
#> 8 8 1 -0.129 -1.03 0.0653 -0.194 0.210 -0.339
#> 9 9 1 -0.939 -0.538 -1.02 0.0850 -0.954 0.0142
#> 10 10 1 -1.22 -1.45 -0.792 -0.427 -0.585 -0.634
#> # … with 4,049 more rows, and 2 more variables: .mar.resid <dbl>,
#> # .mar.fitted <dbl>
Note that for the same model fit in nlme
, we do not need to include the data
parameter.
<- nlme::lme(normexam ~ standLRT, random = ~1 | school, Exam2,
fm1.NA.lme na.action = na.exclude)
hlm_resid(fm1.NA.lme)
#> # A tibble: 4,059 x 10
#> id normexam standLRT school .resid .fitted .ls.resid .ls.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 NA 1 NA NA NA NA
#> 2 2 0.134 NA 1 NA NA NA NA
#> 3 3 -1.72 -1.36 1 -1.34 -0.381 -1.15 -0.575
#> 4 4 0.968 0.206 1 0.464 0.504 0.423 0.545
#> 5 5 0.544 NA 1 NA NA NA NA
#> 6 6 1.73 2.19 1 0.113 1.62 -0.224 1.96
#> 7 7 1.04 -1.12 1 1.28 -0.241 1.44 -0.398
#> 8 8 -0.129 -1.03 1 0.0653 -0.194 0.210 -0.339
#> 9 9 -0.939 -0.538 1 -1.02 0.0850 -0.954 0.0142
#> 10 10 -1.22 -1.45 1 -0.792 -0.427 -0.585 -0.634
#> # … with 4,049 more rows, and 2 more variables: .mar.resid <dbl>,
#> # .mar.fitted <dbl>
In three-level models, how we specify middle-level residuals changes depending on whether a model is a lmerMod
or lme
object. This is due to differences in how lme4
and nlme
store their grouping factors To demonstrate this, we use a classroom data set from West, Welch, Galecki (2006). The data contains math testing scores from 1190 children in 312 different classroom within 107 unique schools. All classrooms are nested within schools, and all students are nested within a single classroom.
data("classroom", package = "WWGbook")
head(classroom)
#> sex minority mathkind mathgain ses yearstea mathknow housepov mathprep
#> 1 1 1 448 32 0.46 1 NA 0.082 2.00
#> 2 0 1 460 109 -0.27 1 NA 0.082 2.00
#> 3 1 1 511 56 -0.03 1 NA 0.082 2.00
#> 4 0 1 449 83 -0.38 2 -0.11 0.082 3.25
#> 5 0 1 425 53 -0.03 2 -0.11 0.082 3.25
#> 6 1 1 450 65 0.76 2 -0.11 0.082 3.25
#> classid schoolid childid
#> 1 160 1 1
#> 2 160 1 2
#> 3 160 1 3
#> 4 217 1 4
#> 5 217 1 5
#> 6 217 1 6
The data set contains 12 columns; however, we are only interested in the response variable, mathgain
, and the grouping variables, classid
and schoolid
. We fit the simple unconditional means model in both lme4
and nlme
below.
# lme4
<- lme4::lmer(mathgain ~ 1 + (1|schoolid/classid), data = classroom)
fm2.lmer # nlme
<- nlme::lme(mathgain ~ 1, random = ~1|schoolid/classid, data = classroom) fm2.lme
For both models, classroom is nested within school. We can access the grouping factors for each model with the following:
# lme4
names(fm2.lmer@flist)
#> [1] "classid:schoolid" "schoolid"
# nlme
names(fm2.lme$groups)
#> [1] "schoolid" "classid"
Note how the orders of the grouping factors are opposite for lme4
and nlme
. In lme4
, convention is to start at the lowest level of hierarchy and move upwards (in this case classid:schoolid
, followed by schoolid
). Grouping factors in nlme
are given starting at the highest grouping and move down (schoolid
, followed by classid
). The order of classid
and schoolid
in the returned tibble is based on these conventions.
To extract the classroom-level random effects for fm2.lmer
, we set level = "classid:schoolid"
.
# lme4
hlm_resid(fm2.lmer, level = "classid:schoolid")
#> # A tibble: 312 x 5
#> group classid schoolid .ranef.intercept .ls.intercept
#> <chr> <int> <int> <dbl> <dbl>
#> 1 160:1 160 1 1.64 4.15
#> 2 217:1 217 1 -0.433 -4.15
#> 3 197:2 197 2 -1.69 -12.9
#> 4 211:2 211 2 2.51 6.58
#> 5 307:2 307 2 2.44 6.33
#> 6 11:3 11 3 3.87 -4.38
#> 7 137:3 137 3 5.56 16.1
#> 8 145:3 145 3 8.10 6.62
#> 9 228:3 228 3 -0.0237 -18.4
#> 10 48:4 48 4 3.77 44
#> # … with 302 more rows
For the fm2.lme
, we set level = "classid"
.
# nlme
hlm_resid(fm2.lme, level = "classid")
#> # A tibble: 312 x 5
#> group schoolid classid .ranef.intercept .ls.intercept
#> <chr> <int> <int> <dbl> <dbl>
#> 1 1/160 1 160 1.64 4.15
#> 2 1/217 1 217 -0.433 -4.15
#> 3 2/197 2 197 -1.69 -12.9
#> 4 2/211 2 211 2.51 6.58
#> 5 2/307 2 307 2.44 6.33
#> 6 3/11 3 11 3.87 -4.38
#> 7 3/137 3 137 5.56 16.1
#> 8 3/145 3 145 8.10 6.62
#> 9 3/228 3 228 -0.0235 -18.4
#> 10 4/48 4 48 3.77 44
#> # … with 302 more rows
Now that we have discussed how to extract all types of residuals for hierarchical models using hlm_resid()
, we illustrate how to use those residuals in context to evaluate a model. We will diagnose the earlier model, fm1
, fit on the Exam
data set above. We use hlm_resid()
to extract the standardized level-1 and marginal residuals.
<- hlm_resid(fm1, level = 1, standardize = TRUE)
resid1_fm1
resid1_fm1#> # A tibble: 4,059 x 10
#> id normexam standLRT school .std.resid .fitted .std.ls.resid .ls.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 0.619 1 -0.616 0.725 -0.682 0.822
#> 2 2 0.134 0.206 1 -0.476 0.492 -0.480 0.529
#> 3 3 -1.72 -1.36 1 -1.77 -0.393 -1.40 -0.585
#> 4 4 0.968 0.206 1 0.632 0.492 0.532 0.529
#> 5 5 0.544 0.371 1 -0.0544 0.585 -0.124 0.647
#> 6 6 1.73 2.19 1 0.167 1.61 -0.251 1.94
#> 7 7 1.04 -1.12 1 1.72 -0.253 1.78 -0.409
#> 8 8 -0.129 -1.03 1 0.103 -0.206 0.271 -0.350
#> 9 9 -0.939 -0.538 1 -1.35 0.0730 -1.15 0.00167
#> 10 10 -1.22 -1.45 1 -1.04 -0.439 -0.712 -0.643
#> # … with 4,049 more rows, and 2 more variables: .chol.mar.resid <dbl>,
#> # .mar.fitted <dbl>
Utilizing the tibble output of hlm_resid()
, we can easily plot the standardized LS residuals against explanatory variables such as the standardized LRT score. We use the LS residuals at level-1 because they are unconfounded of higher-level residuals.
library(ggplot2)
ggplot(data = resid1_fm1, aes(x = standLRT, y = .std.ls.resid)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "loess", se = FALSE) +
labs(y = "LS level-1 residuals",
title = "LS residuals against standardized LRT score")
The smoother shows that standardized LRT scores might not be linearly related to GCSE exam scores. Likelihood ratio tests (not shown) confirm that quadratic and cubic terms for standLRT
contribute significantly in describing GCSE exam scores, so we incorporate these terms in the updated model, fm1b
. We then reassess the level-1 LS residuals.
<- update(fm1, .~. + I(standLRT^2) + I(standLRT^3))
fm1b
<- hlm_resid(fm1b, level = 1, standardize = TRUE)
resid1_fm1b
ggplot(data = resid1_fm1b, aes(x = standLRT, y = .std.ls.resid)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "loess", se = FALSE) +
labs(y = "LS level-1 residuals", title = "LS residuals against standardized LRT score")
The addition of the quadratic and cubic terms seems to have improved the level-1 residuals. Additionally, there don’t appear to be any large outliers that might need to be removed. We double check the LS level-1 residuals for fm1
, plotting them against the LS fitted values appended to the model frame by hlm_resid()
.
ggplot(data = resid1_fm1b, aes(x = .ls.fitted, y = .std.ls.resid)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "loess", se = FALSE) +
labs(x = "LS fitted values",
y = "LS level-1 residuals",
title = "LS residuals against LS fitted values")
The LS level-1 residuals give no indication that the model violates any assumptions of an HLM. Because there are no glaring issues, we move on to the higher-level residuals, again plotting the output of hlm_resid()
.
<- hlm_resid(fm1b, level = "school", include.ls = FALSE)
resid2_fm1b
resid2_fm1b#> # A tibble: 65 x 2
#> school .ranef.intercept
#> <chr> <dbl>
#> 1 1 0.374
#> 2 2 0.498
#> 3 3 0.502
#> 4 4 0.0211
#> 5 5 0.247
#> 6 6 0.538
#> 7 7 0.390
#> 8 8 -0.0220
#> 9 9 -0.156
#> 10 10 -0.331
#> # … with 55 more rows
ggplot(data = resid2_fm1b, aes(x = school, y = .ranef.intercept)) +
geom_point() +
labs(y = "Random effects - intercept",
title = "Intercept random effects against school") +
coord_flip()
The school-level random effects again show no large outliers. We would now check influence diagnostics such as Cook’s distance and leverage to ensure our model is appropriate. For a discussion of influence diagnostics within HLMdiag
, we direct the reader to the vignette for hlm_influence()
.
The function pull_resid()
is an efficient way to pull a single type of residual as a vector. Whereas hlm_resid()
spends time calculating all types of residuals and fitted values, pull_resid()
only calculates the type of residual specified by the user, saving time over using hlm_resid()
and indexing for a specific column. This is especially useful when used in a loop, such as when using simulation-based diagnostics.
pull_resid()
takes three parameters, object
, type
, and standardize
. The parameters object
and standardize
work identically as they do in hlm_resid()
, and the new parameter, type
, can be set to:
"ls"
to return LS level-1 residuals"eb"
to return EB level-1 residuals"marginal"
to return marginal residualsThe example below illustrates the output of pull_resid()
being used to extract level-1 standardized LS residuals.
head(pull_resid(fm1, type = "ls", standardize = TRUE))
#> [1] -0.6824458 -0.4800891 -1.4044892 0.5323373 -0.1242091 -0.2512477
The function pull_resid()
is only implemented for level-1 residuals. If the analyst wants to extract random effects for a model, they should use the ranef()
functions from either lme4
or nlme
.
The hlm_augment()
function combines the hlm_resid()
and hlm_influence()
functions to return a tibble containing information about the residuals and the influence diagnostics appended to the data. hlm_augment()
has three parameters, object
, level
, and include.ls
, all of which have the same functionality as in hlm_resid()
. The syntax is the same for the two functions. For example, we can extract both level-1 residual and influence diagnostics from the model fm1
with the following call.
<- hlm_augment(fm1)
fm1.aug ::glimpse(fm1.aug)
tibble#> Rows: 4,059
#> Columns: 15
#> $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
#> $ normexam <dbl> 0.2613242, 0.1340672, -1.7238820, 0.9675862, 0.544341…
#> $ standLRT <dbl> 0.6190592, 0.2058022, -1.3645760, 0.2058022, 0.371105…
#> $ school <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ .resid <dbl> -0.46358741, -0.35802733, -1.33127074, 0.47549167, -0…
#> $ .fitted <dbl> 0.72491161, 0.49209453, -0.39261126, 0.49209453, 0.58…
#> $ .ls.resid <dbl> -0.56113518, -0.39525182, -1.13926654, 0.43826718, -0…
#> $ .ls.fitted <dbl> 0.822459376, 0.529319021, -0.584615462, 0.529319021, …
#> $ .mar.resid <dbl> -0.089826659, 0.015733418, -0.957509986, 0.849252418,…
#> $ .mar.fitted <dbl> 0.35115086, 0.11833378, -0.76637201, 0.11833378, 0.21…
#> $ cooksd <dbl> 1.503345e-05, 2.075760e-06, 1.042793e-03, 3.661260e-0…
#> $ mdffits <dbl> 1.503227e-05, 2.075722e-06, 1.042108e-03, 3.661194e-0…
#> $ covtrace <dbl> 7.814095e-05, 1.809065e-05, 6.568974e-04, 1.809065e-0…
#> $ covratio <dbl> 1.000078, 1.000018, 1.000657, 1.000018, 1.000031, 1.0…
#> $ leverage.overall <dbl> 0.01271288, 0.01265360, 0.01328391, 0.01265360, 0.012…
This is useful for inspecting residuals values and influence diagnostics values at the same time. However, hlm_augment()
lacks some of the functionality that hlm_influence()
and hlm_resid()
have. The delete
and approx
parameters available for hlm_influence()
are not available in hlm_augment()
, so the function will always use a one step approximation and delete all observations or groups instead of a selected few. The standardize
parameter from hlm_resid()
is also not included, so unstandardized residuals will always be returned. If additional functionality is required, hlm_influence()
or hlm_resid()
should be used instead. hlm_augment()
is especially useful for inspecting influence diagnostics of observations with relatively high residuals, or vice versa. For more information about available functionality in hlm_influence()
, see the hlm_influence()
vignette.
Adam Loy, Heike Hofmann (2014). HLMdiag: A Suite of Diagnostics for Hierarchical Linear Models in R. Journal of Statistical Software, 56(5), 1-28. DOI
Brady West, Kathleen Welch, Andrzej Galecki (2006) Linear Mixed Models: A Practical Guide Using Statistical Software. First Edition. Chapman Hall / CRC Press. ISBN 1584884800
David Robinson, Alex Hayes (2020). broom: Convert Statistical Analysis Objects into Tidy Tibbles. R package version 0.5.6. https://CRAN.R-project.org/package=broom
Douglas Bates, Martin Maechler, Ben Bolker (2020). mlmRev: Examples from Multilevel Modelling Software Review. R package version 1.0-8. https://CRAN.R-project.org/package=mlmRev
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. DOI
J.A. Hilden-Minton (1995). Multilevel Diagnostics for Mixed and Hierarchical Linear Models. Ph.D. thesis, University of California Los Angeles.
Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, R Core Team (2020). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-148, https://CRAN.R-project.org/package=nlme.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.