In this example, we will show how to use lslx to conduct semi-confirmatory factor analysis. The example uses the data set HolzingerSwineford1939 from the package lavaan. Hence, lavaan must be installed.
In the following specification, x1 - x9 are assumed to be measurements of 3 latent factors: visual, textual, and speed.
model <-
'
visual :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed :=> x7 + x8 + x9
visual :~> x4 + x5 + x6 + x7 + x8 + x9
textual :~> x1 + x2 + x3 + x7 + x8 + x9
speed :~> x1 + x2 + x3 + x4 + x5 + x6
visual <=> fix(1) * visual
textual <=> fix(1) * textual
speed <=> fix(1) * speed
'
The operator :=> means that the LHS latent factor is defined by the RHS observed variables, and the corresponding loadings are freely estimated. The operator :~> also means that the LHS latent factor is defined by the RHS observed variables, but these loadings are set as penalized coefficients. In this model, visual is mainly measured by x1 - x3, textual is mainly measured by x4 - x6, and speed is mainly measured by x7 - x9. However, the inclusion of the penalized loadings indicates that each measurement may be influenced by more than one latent factor. The operator <=> means that the LHS and RHS variables/factors are covaried. If the LHS and RHS variable/factor are the same, <=> specifies the variance of that variable/factor. For scale setting, visual <=> fix(1) * visual fixes the variance of visual to one. Details of model syntax can be found in the section Model Syntax via ?lslx.
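As a quick check of how the syntax translates into coefficients, the following base-R sketch (independent of lslx; the object names are illustrative) rebuilds the 9 x 3 loading pattern implied by :=> and :~>. Each indicator ends up with one free loading and two penalized cross-loadings, which matches the 18 penalized coefficients that lslx reports for this model.

```r
# Illustrative only: rebuild the loading pattern implied by the model syntax
obs <- paste0("x", 1:9)
fac <- c("visual", "textual", "speed")

# Factor that each indicator is mainly measured by (the :=> part)
main_factor <- rep(fac, each = 3)

# "free" where the indicator belongs to the factor, "pen" (the :~> part) elsewhere
pattern <- outer(main_factor, fac,
                 function(m, f) ifelse(m == f, "free", "pen"))
dimnames(pattern) <- list(obs, fac)

print(pattern)
table(pattern)  # counts of free and penalized loadings
```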
lslx is written as an R6 class. Every time we conduct an analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named r6_lslx.
library(lslx)
r6_lslx <- lslx$new(model = model,
data = lavaan::HolzingerSwineford1939)
An 'lslx' R6 class is initialized via 'data'.
Response Variable(s): x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factor(s): visual textual speed
Here, lslx is the object generator for lslx objects and new is the built-in method of lslx to generate a new lslx object. The initialization of lslx requires users to specify a model (argument model) and a data set to be fitted (argument data). The data set must contain all the observed variables specified in the given model. It is also possible to initialize an lslx object via sample moments (see vignette("structural-equation-modeling")).
To see the model specification, we may use the extract_specification method.
r6_lslx$extract_specification()
relation left right group reference matrice block type start
x1<-1|G x1<-1 x1 1 G FALSE alpha y<-1 free NA
x2<-1|G x2<-1 x2 1 G FALSE alpha y<-1 free NA
x3<-1|G x3<-1 x3 1 G FALSE alpha y<-1 free NA
x4<-1|G x4<-1 x4 1 G FALSE alpha y<-1 free NA
x5<-1|G x5<-1 x5 1 G FALSE alpha y<-1 free NA
x6<-1|G x6<-1 x6 1 G FALSE alpha y<-1 free NA
x7<-1|G x7<-1 x7 1 G FALSE alpha y<-1 free NA
x8<-1|G x8<-1 x8 1 G FALSE alpha y<-1 free NA
x9<-1|G x9<-1 x9 1 G FALSE alpha y<-1 free NA
x1<-visual|G x1<-visual x1 visual G FALSE beta y<-f free NA
x2<-visual|G x2<-visual x2 visual G FALSE beta y<-f free NA
x3<-visual|G x3<-visual x3 visual G FALSE beta y<-f free NA
x4<-visual|G x4<-visual x4 visual G FALSE beta y<-f pen NA
x5<-visual|G x5<-visual x5 visual G FALSE beta y<-f pen NA
x6<-visual|G x6<-visual x6 visual G FALSE beta y<-f pen NA
x7<-visual|G x7<-visual x7 visual G FALSE beta y<-f pen NA
x8<-visual|G x8<-visual x8 visual G FALSE beta y<-f pen NA
x9<-visual|G x9<-visual x9 visual G FALSE beta y<-f pen NA
x1<-textual|G x1<-textual x1 textual G FALSE beta y<-f pen NA
x2<-textual|G x2<-textual x2 textual G FALSE beta y<-f pen NA
x3<-textual|G x3<-textual x3 textual G FALSE beta y<-f pen NA
x4<-textual|G x4<-textual x4 textual G FALSE beta y<-f free NA
x5<-textual|G x5<-textual x5 textual G FALSE beta y<-f free NA
x6<-textual|G x6<-textual x6 textual G FALSE beta y<-f free NA
x7<-textual|G x7<-textual x7 textual G FALSE beta y<-f pen NA
x8<-textual|G x8<-textual x8 textual G FALSE beta y<-f pen NA
x9<-textual|G x9<-textual x9 textual G FALSE beta y<-f pen NA
x1<-speed|G x1<-speed x1 speed G FALSE beta y<-f pen NA
x2<-speed|G x2<-speed x2 speed G FALSE beta y<-f pen NA
x3<-speed|G x3<-speed x3 speed G FALSE beta y<-f pen NA
x4<-speed|G x4<-speed x4 speed G FALSE beta y<-f pen NA
x5<-speed|G x5<-speed x5 speed G FALSE beta y<-f pen NA
x6<-speed|G x6<-speed x6 speed G FALSE beta y<-f pen NA
x7<-speed|G x7<-speed x7 speed G FALSE beta y<-f free NA
x8<-speed|G x8<-speed x8 speed G FALSE beta y<-f free NA
x9<-speed|G x9<-speed x9 speed G FALSE beta y<-f free NA
visual<->visual|G visual<->visual visual visual G FALSE psi f<->f fixed 1
textual<->visual|G textual<->visual textual visual G FALSE psi f<->f free NA
speed<->visual|G speed<->visual speed visual G FALSE psi f<->f free NA
textual<->textual|G textual<->textual textual textual G FALSE psi f<->f fixed 1
speed<->textual|G speed<->textual speed textual G FALSE psi f<->f free NA
speed<->speed|G speed<->speed speed speed G FALSE psi f<->f fixed 1
x1<->x1|G x1<->x1 x1 x1 G FALSE psi y<->y free NA
x2<->x2|G x2<->x2 x2 x2 G FALSE psi y<->y free NA
x3<->x3|G x3<->x3 x3 x3 G FALSE psi y<->y free NA
x4<->x4|G x4<->x4 x4 x4 G FALSE psi y<->y free NA
x5<->x5|G x5<->x5 x5 x5 G FALSE psi y<->y free NA
x6<->x6|G x6<->x6 x6 x6 G FALSE psi y<->y free NA
x7<->x7|G x7<->x7 x7 x7 G FALSE psi y<->y free NA
x8<->x8|G x8<->x8 x8 x8 G FALSE psi y<->y free NA
x9<->x9|G x9<->x9 x9 x9 G FALSE psi y<->y free NA
The row names show the coefficient names. The two most relevant columns are type, which shows the type of each coefficient (free, pen, or fixed), and start, which gives the starting value. In lslx, many extract-related methods are defined. These methods can be used to extract quantities stored in an lslx object. For details, please see the section Extract-Related Methods via ?lslx.
After an lslx object is initialized, the method fit can be used to fit the specified model to the given data.
r6_lslx$fit(penalty_method = "mcp",
lambda_grid = seq(.02, .30, .02),
delta_grid = c(5, 10))
CONGRATS: The optimization algorithm converged under all specified penalty levels.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
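The fitting process requires users to specify the penalty method (argument penalty_method) and the considered penalty levels (arguments lambda_grid and delta_grid). In this example, the mcp penalty is implemented on the lambda grid seq(.02, .30, .02) and the delta grid c(5, 10), so the model is fitted under 15 x 2 = 30 combinations of penalty levels. Note that lambda = 0 is not considered in this example because it may result in an unidentified model: without any penalty, all 27 loadings and the factor covariances would be freely estimated, which leaves the factor solution rotationally indeterminate. All the fitting results are stored in the fitting field of r6_lslx.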
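For reference, the MCP penalizes a single coefficient theta as lambda * |theta| - theta^2 / (2 * delta) when |theta| <= lambda * delta, and as the constant lambda^2 * delta / 2 beyond that point, so large coefficients are not shrunk further. The sketch below codes this standard form in base R, assuming that the delta of lslx plays the role of the concavity parameter (often written gamma) in Zhang's MCP. The helper mcp_penalty is hypothetical, not an lslx function.

```r
# Minimax concave penalty (MCP) for coefficient(s) theta, with penalty level
# lambda and concavity parameter delta (assumed to correspond to lslx's delta)
mcp_penalty <- function(theta, lambda, delta) {
  a <- abs(theta)
  ifelse(a <= lambda * delta,
         lambda * a - a^2 / (2 * delta),  # concave part near zero
         lambda^2 * delta / 2)            # constant beyond lambda * delta
}

lambda_grid <- seq(.02, .30, .02)
delta_grid <- c(5, 10)

# Each (lambda, delta) pair corresponds to one fit of the model
grid <- expand.grid(lambda = lambda_grid, delta = delta_grid)
nrow(grid)

# Penalty on a small, a moderate, and a large coefficient at lambda = 0.26, delta = 5
mcp_penalty(c(0, 0.25, 2), lambda = 0.26, delta = 5)
```

Because the penalty is flat beyond lambda * delta, a larger delta keeps the penalty closer to the lasso over a wider range of coefficient values.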
Unlike traditional SEM analysis, lslx fits the model to the data under all the penalty levels considered. To summarize the fitting result, a selector that determines an optimal penalty level must be specified. Available selectors can be found in the section Penalty Level Selection via ?lslx. The following code summarizes the fitting result under the penalty level selected by the Bayesian information criterion (BIC).
r6_lslx$summarize(selector = "bic")
General Information
number of observation 301
number of complete observation 301
number of missing pattern none
number of group 1
number of response 9
number of factor 3
number of free coefficient 30
number of penalized coefficient 18
Fitting Information
penalty method mcp
lambda grid 0.02 - 0.3
delta grid 5 - 10
algorithm fisher
missing method none
tolerance for convergence 0.001
Saturated Model Information
loss value 0.000
number of non-zero coefficient 54.000
degree of freedom 0.000
Baseline Model Information
loss value 3.052
number of non-zero coefficient 18.000
degree of freedom 36.000
Numerical Condition
lambda 0.260
delta 5.000
objective value 0.248
objective gradient absolute maximum 0.001
objective hessian convexity 0.695
number of iteration 2.000
loss value 0.188
number of non-zero coefficient 31.000
degree of freedom 23.000
robust degree of freedom 24.208
scaling factor 1.053
Information Criteria
Akaike information criterion (aic) 0.035
Akaike information criterion with penalty being 3 (aic3) -0.042
consistent Akaike information criterion (caic) -0.325
Bayesian information criterion (bic) -0.248
adjusted Bayesian information criterion (abic) -0.006
Haughton Bayesian information criterion (hbic) -0.108
robust Akaike information criterion (raic) 0.027
robust Akaike information criterion with penalty being 3 (raic3) -0.054
robust consistent Akaike information criterion (rcaic) -0.352
robust Bayesian information criterion (rbic) -0.271
robust adjusted Bayesian information criterion (rabic) -0.016
robust Haughton Bayesian information criterion (rhbic) -0.124
Fit Indices
root mean square error of approximation (rmsea) 0.070
comparative fit indice (cfi) 0.962
non-normed fit indice (nnfi) 0.941
standardized root mean of residual (srmr) 0.049
Likelihood Ratio Test
statistic df p-value
unadjusted 56.467 23.000 0.000
mean-adjusted 53.649 23.000 0.000
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.070 0.042 0.097
mean-adjusted 0.068 0.039 0.097
Coefficient Test (Standard Error = "sandwich", Alpha Level = 0.05)
Factor Loading
type estimate std.error z-value p-value lower upper
x1<-visual free 0.892 0.097 9.231 0.000 0.703 1.082
x2<-visual free 0.499 0.086 5.830 0.000 0.331 0.666
x3<-visual free 0.651 0.076 8.511 0.000 0.501 0.801
x4<-visual pen 0.000 - - - - -
x5<-visual pen 0.000 - - - - -
x6<-visual pen 0.000 - - - - -
x7<-visual pen 0.000 - - - - -
x8<-visual pen 0.000 - - - - -
x9<-visual pen 0.258 0.070 3.687 0.000 0.121 0.395
x1<-textual pen 0.000 - - - - -
x2<-textual pen 0.000 - - - - -
x3<-textual pen 0.000 - - - - -
x4<-textual free 0.987 0.061 16.200 0.000 0.867 1.106
x5<-textual free 1.099 0.054 20.183 0.000 0.992 1.206
x6<-textual free 0.914 0.058 15.805 0.000 0.801 1.027
x7<-textual pen 0.000 - - - - -
x8<-textual pen 0.000 - - - - -
x9<-textual pen 0.000 - - - - -
x1<-speed pen 0.000 - - - - -
x2<-speed pen 0.000 - - - - -
x3<-speed pen 0.000 - - - - -
x4<-speed pen 0.000 - - - - -
x5<-speed pen 0.000 - - - - -
x6<-speed pen 0.000 - - - - -
x7<-speed free 0.660 0.069 9.640 0.000 0.526 0.795
x8<-speed free 0.792 0.073 10.878 0.000 0.650 0.935
x9<-speed free 0.503 0.066 7.569 0.000 0.373 0.633
Covariance
type estimate std.error z-value p-value lower upper
textual<->visual free 0.451 0.073 6.206 0.000 0.309 0.593
speed<->visual free 0.325 0.083 3.901 0.000 0.162 0.488
speed<->textual free 0.220 0.078 2.839 0.002 0.068 0.372
Variance
type estimate std.error z-value p-value lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.541 0.151 3.584 0.000 0.245 0.837
x2<->x2 free 1.127 0.111 10.138 0.000 0.909 1.345
x3<->x3 free 0.840 0.094 8.890 0.000 0.655 1.025
x4<->x4 free 0.372 0.050 7.391 0.000 0.273 0.470
x5<->x5 free 0.445 0.057 7.855 0.000 0.334 0.556
x6<->x6 free 0.357 0.046 7.672 0.000 0.266 0.448
x7<->x7 free 0.746 0.081 9.222 0.000 0.587 0.905
x8<->x8 free 0.393 0.091 4.321 0.000 0.215 0.571
x9<->x9 free 0.576 0.067 8.621 0.000 0.445 0.707
Intercept
type estimate std.error z-value p-value lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.341 0.074 58.452 0.000 4.195 4.486
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.374 0.058 92.546 0.000 5.260 5.488
In this example, we can see that most penalized coefficients are estimated as zero under the selected penalty level. The exception is x9<-visual (estimate 0.258), a cross-loading that a purely confirmatory model would have fixed at zero. Detecting it from the data shows the benefit of the semi-confirmatory approach. The summarize method also shows the results of significance tests for the coefficients. In lslx, the default standard errors are calculated based on the sandwich formula whenever raw data are available. Sandwich standard errors are generally valid even when the model is misspecified and the data are not normal. However, they may not be valid after selecting an optimal penalty level, because the uncertainty of the selection step is not taken into account.
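As an illustration of how the printed tests relate to the estimates, the sketch below recomputes the z-value and the 95% interval for x1<-visual from its printed estimate and sandwich standard error. It assumes the interval is the usual symmetric Wald interval at the stated alpha level. Small discrepancies from the printed values (9.231, 0.703, 1.082) come from rounding of the printed inputs, and, like the printed tests, this computation ignores the penalty-level selection step.

```r
# Estimate and sandwich standard error of x1<-visual, as printed by summarize()
est <- 0.892
se <- 0.097
alpha <- 0.05

z <- est / se                                     # Wald z statistic
ci <- est + c(-1, 1) * qnorm(1 - alpha / 2) * se  # symmetric Wald interval
round(c(z = z, lower = ci[1], upper = ci[2]), 3)
```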
lslx provides four methods for visualizing the fitting results. The method plot_numerical_condition shows the numerical condition under all the penalty levels. The following code plots the values of n_iter_out (number of iterations in the outer loop), objective_gradient_abs_max (maximum absolute value of the gradient of the objective function), and objective_hessian_convexity (minimum of the univariate approximate Hessian). The plot can be used to evaluate the quality of numerical optimization.
r6_lslx$plot_numerical_condition()
The method plot_information_criterion shows the values of information criteria under all the penalty levels.
r6_lslx$plot_information_criterion()
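The criteria printed by summarize(selector = "bic") are consistent with each criterion being expressed per observation as the loss value minus a complexity term proportional to the degrees of freedom. Because the degrees of freedom equal 54 (the number of sample means and covariances) minus the number of non-zero coefficients, this is equivalent, up to a constant, to the usual penalty on model size, and smaller values are preferred. The sketch below reproduces several printed values from the loss (0.188), degrees of freedom (23), robust degrees of freedom (24.208), and N = 301. The formulas are inferred from the printed numbers rather than taken from the lslx source.

```r
# Inputs taken from the summarize() output for the BIC-selected model
n <- 301       # number of observations
loss <- 0.188  # loss value (rounded as printed)
df <- 23       # degrees of freedom
rdf <- 24.208  # robust degrees of freedom

# Degrees of freedom = saturated moments minus non-zero coefficients
p <- 9
n_moment <- p + p * (p + 1) / 2  # 9 means + 45 (co)variances = 54
n_nonzero <- 30 + 1              # 30 free + 1 non-zero penalized (x9<-visual)
stopifnot(n_moment - n_nonzero == df)

# Per-observation criteria: loss minus a complexity term (smaller is better)
ic <- c(
  aic  = loss - 2 * df / n,
  aic3 = loss - 3 * df / n,
  bic  = loss - log(n) * df / n,
  caic = loss - (log(n) + 1) * df / n,
  abic = loss - log((n + 2) / 24) * df / n,
  hbic = loss - log(n / (2 * pi)) * df / n,
  rbic = loss - log(n) * rdf / n  # robust version uses the robust df
)
round(ic, 3)
```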
The method plot_fit_indice shows the values of fit indices under all the penalty levels.
r6_lslx$plot_fit_indice()
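The fit indices in the summary can likewise be recovered from the printed likelihood ratio statistic (56.467 on 23 df) and the baseline model information, using the standard RMSEA, CFI, and NNFI formulas. The sketch assumes that the baseline statistic equals N times the baseline loss value, which is how the model statistic relates to its loss here (56.467 / 301 is about 0.188). Whether RMSEA divides by N or N - 1 is not shown in the output; both give 0.070 for this model.

```r
# Quantities printed by summarize(selector = "bic")
n <- 301
stat <- 56.467           # unadjusted likelihood ratio statistic
df <- 23
stat_base <- n * 3.052   # baseline statistic, assumed to be N * baseline loss
df_base <- 36

p_value <- pchisq(stat, df, lower.tail = FALSE)
rmsea <- sqrt(max(stat - df, 0) / (df * n))
cfi <- 1 - max(stat - df, 0) / max(stat_base - df_base, stat - df, 0)
nnfi <- (stat_base / df_base - stat / df) / (stat_base / df_base - 1)
round(c(p_value = p_value, rmsea = rmsea, cfi = cfi, nnfi = nnfi), 3)
```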
The method plot_coefficient shows the solution paths of coefficients in the given block. The following code plots the solution paths of all coefficients in the block y<-f, which contains all the regression coefficients from latent factors to observed variables (i.e., factor loadings).
r6_lslx$plot_coefficient(block = "y<-f")
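To see why a solution path can drop to exactly zero as lambda grows, consider the textbook case of a single coefficient with a unit-curvature quadratic loss, where the MCP solution has a closed form (sometimes called firm thresholding). This is a simplified illustration, not the algorithm lslx uses for the full SEM likelihood, and the unpenalized value z = 0.3 is hypothetical.

```r
# Closed-form MCP solution of
#   minimize 0.5 * (theta - z)^2 + MCP(theta; lambda, delta)
# for a single coefficient; valid for delta > 1
mcp_threshold <- function(z, lambda, delta) {
  a <- abs(z)
  ifelse(a <= lambda, 0,                                  # set to zero
         ifelse(a <= lambda * delta,
                sign(z) * (a - lambda) / (1 - 1 / delta), # shrunken estimate
                z))                                       # left unpenalized
}

lambda_grid <- seq(.02, .30, .02)
z <- 0.3  # hypothetical unpenalized estimate of a cross-loading

# One column per delta value, one row per lambda value
path <- sapply(c(5, 10), function(delta) mcp_threshold(z, lambda_grid, delta))
colnames(path) <- c("delta = 5", "delta = 10")
rownames(path) <- lambda_grid
print(round(path, 3))
```

With the smaller delta, the estimate stays at its unpenalized value for longer and then falls more steeply to zero.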
In lslx, many quantities related to SEM can be extracted by extract-related methods. For example, the loading matrix can be obtained by
r6_lslx$extract_coefficient_matrice(selector = "bic", block = "y<-f")
$G
visual textual speed
x1 0.892 0.000 0.000
x2 0.499 0.000 0.000
x3 0.651 0.000 0.000
x4 0.000 0.987 0.000
x5 0.000 1.099 0.000
x6 0.000 0.914 0.000
x7 0.000 0.000 0.660
x8 0.000 0.000 0.792
x9 0.258 0.000 0.503
The model-implied covariance matrix and the residual covariance matrix can be obtained by
r6_lslx$extract_implied_cov(selector = "bic")
$G
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 1.337 0.445 0.581 0.397 0.442 0.368 0.191 0.230 0.376
x2 0.445 1.375 0.324 0.222 0.247 0.205 0.107 0.128 0.210
x3 0.581 0.324 1.264 0.290 0.323 0.268 0.140 0.168 0.274
x4 0.397 0.222 0.290 1.346 1.085 0.902 0.143 0.172 0.224
x5 0.442 0.247 0.323 1.085 1.653 1.005 0.160 0.192 0.249
x6 0.368 0.205 0.268 0.902 1.005 1.192 0.133 0.159 0.207
x7 0.191 0.107 0.140 0.143 0.160 0.133 1.182 0.523 0.387
x8 0.230 0.128 0.168 0.172 0.192 0.159 0.523 1.021 0.465
x9 0.376 0.210 0.274 0.224 0.249 0.207 0.387 0.465 0.979
r6_lslx$extract_residual_cov(selector = "bic")
$G
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 -0.021385 0.03741 0.000811 -0.10778 0.00167 -0.08705 0.10670 -0.03411 -0.08269
x2 0.037410 -0.00667 -0.126611 0.01292 0.03602 -0.04207 0.20373 0.01870 -0.03413
x3 0.000811 -0.12661 -0.011388 0.08147 0.21034 0.02416 0.05133 -0.04474 -0.09983
x4 -0.107782 0.01292 0.081468 -0.00521 -0.01311 0.00636 -0.07629 0.04653 -0.01948
x5 0.001675 0.03602 0.210340 -0.01311 -0.00650 -0.00990 0.01680 0.01113 -0.04585
x6 -0.087046 -0.04207 0.024161 0.00636 -0.00990 -0.00451 -0.01121 -0.00600 -0.02863
x7 0.106699 0.20373 0.051325 -0.07629 0.01680 -0.01121 -0.00124 -0.01207 0.01402
x8 -0.034106 0.01870 -0.044741 0.04653 0.01113 -0.00600 -0.01207 -0.00157 0.00742
x9 -0.082688 -0.03413 -0.099828 -0.01948 -0.04585 -0.02863 0.01402 0.00742 -0.03592
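The implied covariance matrix follows the usual factor-model formula Sigma = Lambda Phi Lambda' + Theta, where Lambda is the loading matrix, Phi is the factor covariance matrix, and Theta is the diagonal matrix of residual variances. The base-R sketch below plugs in the rounded BIC-selected estimates from the summary, so its result agrees with the printed matrix only up to about the third decimal.

```r
# Loadings for the BIC-selected model, copied from the summary output
Lambda <- matrix(0, nrow = 9, ncol = 3,
                 dimnames = list(paste0("x", 1:9),
                                 c("visual", "textual", "speed")))
Lambda[1:3, "visual"]  <- c(0.892, 0.499, 0.651)
Lambda[4:6, "textual"] <- c(0.987, 1.099, 0.914)
Lambda[7:9, "speed"]   <- c(0.660, 0.792, 0.503)
Lambda["x9", "visual"] <- 0.258  # the non-zero penalized cross-loading

# Factor covariance matrix (variances fixed to 1)
Phi <- diag(3)
Phi[2, 1] <- Phi[1, 2] <- 0.451  # textual<->visual
Phi[3, 1] <- Phi[1, 3] <- 0.325  # speed<->visual
Phi[3, 2] <- Phi[2, 3] <- 0.220  # speed<->textual

# Diagonal matrix of residual (error) variances
Theta <- diag(c(0.541, 1.127, 0.840, 0.372, 0.445, 0.357, 0.746, 0.393, 0.576))

Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta
round(Sigma, 3)
```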