Semi-Confirmatory Multi-Group Factor Analysis

Po-Hsien Huang

This example shows how to use lslx to conduct multi-group semi-confirmatory factor analysis. It uses the HolzingerSwineford1939 data set from the lavaan package, so lavaan must be installed.

Model Specification

In the following specification, x1 - x9 are assumed to be measurements of 3 latent factors: visual, textual, and speed.

model <-
'
visual  :=> fix(1) * x1 + x2 + x3 
textual :=> fix(1) * x4 + x5 + x6 
speed   :=> fix(1) * x7 + x8 + x9 
'

The operator :=> means that the latent factor on the LHS is defined by the observed variables on the RHS. In this model, visual is mainly measured by x1 - x3, textual is mainly measured by x4 - x6, and speed is mainly measured by x7 - x9. Loadings of x1, x4, and x7 are fixed at 1 for scale setting. The above specification applies to both groups. Details of the model syntax can be found in the section of Model Syntax via ?lslx.

Object Initialization

lslx is written as an R6 class. Every time we conduct an analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named r6_lslx.

library(lslx)
r6_lslx <- lslx$new(model = model,
                     data = lavaan::HolzingerSwineford1939,
                     group_variable = "school",
                     reference_group = "Pasteur")
An 'lslx' R6 class is initialized via 'data'.
  Response Variable(s): x1 x2 x3 x4 x5 x6 x7 x8 x9 
  Latent Factor(s): visual textual speed 
  Group(s): Grant-White Pasteur 
  Reference Group: Pasteur 
NOTE: Because Pasteur is set as reference, coefficients in other groups actually represent increments from the reference.

Here, lslx is the object generator for lslx objects and new is the built-in method of lslx for generating a new lslx object. The initialization of lslx requires users to specify a model (argument model) and a data set to be fitted (argument data). The data set must contain all the observed variables specified in the given model. Because a multi-group analysis is considered in this example, the variable for group labeling (argument group_variable) must also be specified. In lslx, two types of parameterization can be used in multi-group analysis. The first type is the same as in traditional multi-group SEM, which treats model parameters in each group separately. The second type sets one group as the reference and treats model parameters in the other groups as increments with respect to the reference. Under the second type of parameterization, group heterogeneity can be efficiently explored if we treat the increments as penalized parameters. In this example, Pasteur is set as the reference (argument reference_group). Hence, the parameters in Grant-White now reflect differences from the reference.

Model Respecification

After an lslx object is initialized, the heterogeneity of a multi-group model can be quickly respecified with the free_heterogeneity, fix_heterogeneity, and penalize_heterogeneity methods. The following code sets the loadings x2<-visual, x3<-visual, x5<-textual, x6<-textual, x8<-speed, x9<-speed and the intercepts x1<-1 through x9<-1 in Grant-White as penalized parameters. The loadings of x1, x4, and x7 are not penalized because they are fixed for scale setting. Note that parameters in Grant-White now reflect differences since Pasteur is set as the reference.

r6_lslx$penalize_heterogeneity(block = "y<-f", group = "Grant-White")
The relation x2<-visual under Grant-White is set as PENALIZED with starting value = 0. 
The relation x3<-visual under Grant-White is set as PENALIZED with starting value = 0. 
The relation x5<-textual under Grant-White is set as PENALIZED with starting value = 0. 
The relation x6<-textual under Grant-White is set as PENALIZED with starting value = 0. 
The relation x8<-speed under Grant-White is set as PENALIZED with starting value = 0. 
The relation x9<-speed under Grant-White is set as PENALIZED with starting value = 0. 
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment. 
NOTE: Please check whether the starting value for the increment represents a difference. 
r6_lslx$penalize_heterogeneity(block = "y<-1", group = "Grant-White")
The relation x1<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x2<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x3<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x4<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x5<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x6<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x7<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x8<-1 under Grant-White is set as PENALIZED with starting value = 0. 
The relation x9<-1 under Grant-White is set as PENALIZED with starting value = 0. 
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment. 
NOTE: Please check whether the starting value for the increment represents a difference. 

Since the homogeneity of latent factor means may not be a reasonable assumption when examining measurement invariance, the following code relaxes this assumption by freeing the latent factor means in Grant-White.

r6_lslx$free_directed(left = c("visual", "textual", "speed"),
                      right = "1",
                      group = "Grant-White")
The relation visual<-1 under Grant-White is set as FREE with starting value = NA. 
The relation textual<-1 under Grant-White is set as FREE with starting value = NA. 
The relation speed<-1 under Grant-White is set as FREE with starting value = NA. 
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment. 
NOTE: Please check whether the starting value for the increment represents a difference. 

To see more methods to modify a specified model, please check the section of Set-Related Method via ?lslx.
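
As a quick sanity check on the respecification, the coefficients can be tallied by hand. The sketch below uses only base R and assumes that the increments of the covariances and variances in Grant-White are left free by default, which is how they are labeled (type free) in the summary output shown later. The variable names are illustrative and are not part of lslx.

```r
# Freely estimated coefficients in the reference group (Pasteur)
reference_free <- c(loading = 6,            # x2, x3, x5, x6, x8, x9
                    factor_covariance = 3,
                    factor_variance = 3,
                    residual_variance = 9,
                    intercept = 9)

# Increments in Grant-White that are freely estimated
# (assumption: covariance and variance increments are free by default)
increment_free <- c(factor_covariance = 3,
                    factor_variance = 3,
                    residual_variance = 9,
                    factor_mean = 3)        # freed by free_directed

# Increments in Grant-White that are penalized
increment_penalized <- c(loading = 6,       # block "y<-f"
                         intercept = 9)     # block "y<-1"

n_free <- sum(reference_free) + sum(increment_free)
n_penalized <- sum(increment_penalized)
c(free = n_free, penalized = n_penalized)
```

These totals, 48 free and 15 penalized coefficients, match the counts reported under General Information by the summarize method.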

Model Fitting

After an lslx object is initialized, the fit_lasso method can be used to fit the specified model to the given data with the lasso penalty function.

r6_lslx$fit_lasso(lambda_grid = seq(.02, .30, .02))
CONGRATS: The optimization algorithm converged under all specified penalty levels. 
  Specified Tolerance for Convergence: 0.001 
  Specified Maximal Number of Iterations: 100 

The fit_lasso method requires users to specify the penalty levels to be considered (argument lambda_grid). In this example, the lambda grid is seq(.02, .30, .02). All fitting results are stored in the fitting field of r6_lslx.
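
To build intuition for how the penalty level affects a penalized increment, the following base R sketch applies the generic lasso soft-thresholding operator to a single coefficient over the same lambda grid. This is only an illustration of the lasso idea, not the optimization routine that lslx runs internally, and the increment value of -0.15 is hypothetical.

```r
# Generic lasso soft-thresholding: shrink toward zero by lambda,
# and set to exactly zero once |x| <= lambda
soft_threshold <- function(x, lambda) {
  sign(x) * pmax(abs(x) - lambda, 0)
}

lambda_grid <- seq(.02, .30, .02)   # the 15 penalty levels used above
increment <- -0.15                  # hypothetical unpenalized increment

shrunk <- soft_threshold(increment, lambda_grid)
data.frame(lambda = lambda_grid, shrunk_increment = shrunk)
```

For small lambda the increment is only shrunk, while for every lambda larger than its absolute value it becomes exactly zero. This is why fitting over a grid of penalty levels yields models that range from heterogeneous to fully invariant.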

Model Summarizing

Unlike traditional SEM analysis, lslx fits the model to the data under all the penalty levels considered. To summarize the fitting result, a selector for determining an optimal penalty level must be specified. Available selectors can be found in the section of Penalty Level Selection via ?lslx. The following code summarizes the fitting result under the penalty level selected by the Bayesian information criterion (BIC).

r6_lslx$summarize(selector = "bic")
General Information                                                                                   
   number of observation                                                        301
   number of complete observation                                               301
   number of missing pattern                                                   none
   number of group                                                                2
   number of response                                                             9
   number of factor                                                               3
   number of free coefficient                                                    48
   number of penalized coefficient                                               15

Fitting Information                                                                                   
   penalty method                                                             lasso
   lambda grid                                                           0.02 - 0.3
   delta grid                                                                  none
   algorithm                                                                 fisher
   missing method                                                              none
   tolerance for convergence                                                  0.001

Saturated Model Information                                                                                   
   loss value                                                                 0.000
   number of non-zero coefficient                                           108.000
   degree of freedom                                                          0.000

Baseline Model Information                                                                                   
   loss value                                                                 3.181
   number of non-zero coefficient                                            36.000
   degree of freedom                                                         72.000

Numerical Condition                                                                                   
   lambda                                                                     0.120
   delta                                                                       none
   objective value                                                            0.518
   objective gradient absolute maximum                                        0.001
   objective hessian convexity                                                0.459
   number of iteration                                                        3.000
   loss value                                                                 0.459
   number of non-zero coefficient                                            50.000
   degree of freedom                                                         58.000
   robust degree of freedom                                                  60.258
   scaling factor                                                             1.039

Information Criteria                                                                                   
   Akaike information criterion (aic)                                         0.074
   Akaike information criterion with penalty being 3 (aic3)                  -0.119
   consistent Akaike information criterion (caic)                            -0.833
   Bayesian information criterion (bic)                                      -0.640
   adjusted Bayesian information criterion (abic)                            -0.029
   Haughton Bayesian information criterion (hbic)                            -0.286
   robust Akaike information criterion (raic)                                 0.059
   robust Akaike information criterion with penalty being 3 (raic3)          -0.141
   robust consistent Akaike information criterion (rcaic)                    -0.883
   robust Bayesian information criterion (rbic)                              -0.683
   robust adjusted Bayesian information criterion (rabic)                    -0.048
   robust Haughton Bayesian information criterion (rhbic)                    -0.315

Fit Indices                                                                                   
   root mean square error of approximation (rmsea)                            0.096
   comparative fit indice (cfi)                                               0.909
   non-normed fit indice (nnfi)                                               0.888
   standardized root mean of residual (srmr)                                  0.096

Likelihood Ratio Test
                    statistic         df    p-value
   unadjusted         138.244     58.000      0.000
   mean-adjusted      133.065     58.000      0.000

Root Mean Square Error of Approximation Test
                     estimate      lower      upper
   unadjusted           0.096      0.071      0.120
   mean-adjusted        0.095      0.069      0.120

Coefficient Test (Group = "Pasteur", Standard Error = "sandwich", Alpha Level = 0.05)
  Factor Loading (reference component)
                      type  estimate  std.error  z-value  p-value  lower  upper
       x1<-visual    fixed     1.000        -        -        -      -      -  
       x2<-visual     free     0.582      0.132    4.401    0.000  0.323  0.841
       x3<-visual     free     0.773      0.148    5.211    0.000  0.482  1.064
      x4<-textual    fixed     1.000        -        -        -      -      -  
      x5<-textual     free     1.120      0.067   16.597    0.000  0.988  1.252
      x6<-textual     free     0.932      0.063   14.679    0.000  0.808  1.056
        x7<-speed    fixed     1.000        -        -        -      -      -  
        x8<-speed     free     1.171      0.132    8.868    0.000  0.912  1.430
        x9<-speed     free     1.029      0.204    5.046    0.000  0.630  1.429

  Covariance (reference component)
                      type  estimate  std.error  z-value  p-value  lower  upper
 textual<->visual     free     0.414      0.131    3.154    0.001  0.157  0.672
   speed<->visual     free     0.174      0.066    2.628    0.004  0.044  0.304
  speed<->textual     free     0.176      0.060    2.904    0.002  0.057  0.294

  Variance (reference component)
                      type  estimate  std.error  z-value  p-value  lower  upper
  visual<->visual     free     0.820      0.228    3.592    0.000  0.373  1.268
textual<->textual     free     0.880      0.135    6.539    0.000  0.616  1.143
    speed<->speed     free     0.312      0.085    3.661    0.000  0.145  0.479
          x1<->x1     free     0.537      0.173    3.104    0.001  0.198  0.875
          x2<->x2     free     1.285      0.171    7.527    0.000  0.950  1.619
          x3<->x3     free     0.904      0.129    7.000    0.000  0.651  1.158
          x4<->x4     free     0.445      0.070    6.325    0.000  0.307  0.583
          x5<->x5     free     0.503      0.083    6.025    0.000  0.339  0.666
          x6<->x6     free     0.263      0.058    4.525    0.000  0.149  0.377
          x7<->x7     free     0.859      0.115    7.462    0.000  0.634  1.085
          x8<->x8     free     0.526      0.092    5.744    0.000  0.347  0.706
          x9<->x9     free     0.654      0.116    5.622    0.000  0.426  0.883

  Intercept (reference component)
                      type  estimate  std.error  z-value  p-value  lower  upper
            x1<-1     free     4.957      0.092   53.721    0.000  4.776  5.138
            x2<-1     free     6.119      0.082   74.379    0.000  5.958  6.280
            x3<-1     free     2.382      0.094   25.245    0.000  2.197  2.567
            x4<-1     free     2.778      0.087   31.914    0.000  2.607  2.949
            x5<-1     free     4.035      0.103   39.172    0.000  3.833  4.237
            x6<-1     free     1.926      0.075   25.776    0.000  1.779  2.072
            x7<-1     free     4.333      0.088   49.373    0.000  4.161  4.505
            x8<-1     free     5.601      0.075   75.033    0.000  5.455  5.748
            x9<-1     free     5.438      0.072   75.733    0.000  5.297  5.579

Coefficient Test (Group = "Grant-White", Standard Error = "sandwich", Alpha Level = 0.05)
  Factor Loading (increment component)
                      type  estimate  std.error  z-value  p-value  lower  upper
       x1<-visual    fixed     0.000        -        -        -      -      -  
       x2<-visual      pen     0.000        -        -        -      -      -  
       x3<-visual      pen     0.000        -        -        -      -      -  
      x4<-textual    fixed     0.000        -        -        -      -      -  
      x5<-textual      pen     0.000        -        -        -      -      -  
      x6<-textual      pen     0.000        -        -        -      -      -  
        x7<-speed    fixed     0.000        -        -        -      -      -  
        x8<-speed      pen     0.000        -        -        -      -      -  
        x9<-speed      pen     0.000        -        -        -      -      -  

  Covariance (increment component)
                      type  estimate  std.error  z-value  p-value  lower  upper
 textual<->visual     free     0.016      0.145    0.113    0.455 -0.268  0.301
   speed<->visual     free     0.149      0.108    1.387    0.083 -0.062  0.361
  speed<->textual     free     0.053      0.110    0.485    0.314 -0.162  0.268

  Variance (increment component)
                      type  estimate  std.error  z-value  p-value  lower  upper
  visual<->visual     free    -0.090      0.204   -0.440    0.330 -0.489  0.310
textual<->textual     free    -0.009      0.167   -0.056    0.478 -0.336  0.317
    speed<->speed     free     0.175      0.096    1.813    0.035 -0.014  0.364
          x1<->x1     free     0.100      0.178    0.565    0.286 -0.248  0.448
          x2<->x2     free    -0.332      0.225   -1.472    0.071 -0.773  0.110
          x3<->x3     free    -0.284      0.144   -1.970    0.024 -0.567 -0.001
          x4<->x4     free    -0.102      0.093   -1.095    0.137 -0.286  0.081
          x5<->x5     free    -0.126      0.103   -1.227    0.110 -0.328  0.075
          x6<->x6     free     0.174      0.093    1.874    0.030 -0.008  0.356
          x7<->x7     free    -0.252      0.138   -1.826    0.034 -0.523  0.019
          x8<->x8     free    -0.107      0.141   -0.762    0.223 -0.384  0.169
          x9<->x9     free    -0.129      0.142   -0.905    0.183 -0.407  0.150

  Intercept (increment component)
                      type  estimate  std.error  z-value  p-value  lower  upper
        visual<-1     free    -0.047      0.130   -0.358    0.360 -0.302  0.209
       textual<-1     free     0.576      0.120    4.790    0.000  0.340  0.812
         speed<-1     free    -0.124      0.093   -1.331    0.092 -0.308  0.059
            x1<-1      pen     0.000        -        -        -      -      -  
            x2<-1      pen     0.000        -        -        -      -      -  
            x3<-1      pen    -0.273      0.121   -2.256    0.012 -0.511 -0.036
            x4<-1      pen     0.000        -        -        -      -      -  
            x5<-1      pen     0.000        -        -        -      -      -  
            x6<-1      pen     0.000        -        -        -      -      -  
            x7<-1      pen    -0.212      0.113   -1.870    0.031 -0.434  0.010
            x8<-1      pen     0.000        -        -        -      -      -  
            x9<-1      pen     0.000        -        -        -      -      -  
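
Several quantities in the output above can be checked by hand. The base R sketch below assumes that each information criterion has the form loss - penalty * df / N, which appears consistent with the reported values, and it uses the rounded loss value 0.459, so agreement is expected only up to rounding error. The variable names are illustrative.

```r
n_obs <- 301        # number of observation
n_group <- 2        # number of group
n_response <- 9     # number of response
loss <- 0.459       # loss value at the selected lambda (rounded)
n_nonzero <- 50     # number of non-zero coefficient

# Saturated model: means plus unique (co)variances in each group
n_saturated <- n_group * (n_response + n_response * (n_response + 1) / 2)

# Degrees of freedom at the selected penalty level
df <- n_saturated - n_nonzero

# Penalty weights (assumed form: criterion = loss - weight * df / N)
weight <- c(aic = 2, aic3 = 3, caic = log(n_obs) + 1, bic = log(n_obs))
criteria <- loss - weight * df / n_obs

n_saturated
df
round(criteria, 3)
```

The saturated count and the degrees of freedom reproduce the reported 108 and 58, and the criteria agree with the reported aic, aic3, caic, and bic to within about 0.001.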

In this example, all of the loading increments are estimated as zero, so the loadings appear to be invariant across the two groups. However, the intercepts of x3 and x7 do not appear to be invariant, because their increments in Grant-White are non-zero. The summarize method also shows the results of significance tests for the coefficients. In lslx, the default standard errors are calculated with the sandwich formula whenever raw data are available. The sandwich formula is generally valid even when the model is misspecified and the data are not normal. However, it may not be valid after selecting an optimal penalty level.
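
Because Pasteur is the reference group, the coefficients reported for Grant-White are increments rather than the group's own values. The following base R sketch recovers a few Grant-White coefficients by adding each increment to its reference component, using estimates copied from the summary output above. The vector names are illustrative.

```r
# Reference components (Pasteur), copied from the summary output
reference <- c("x2<-visual" = 0.582, "x3<-1" = 2.382, "x7<-1" = 4.333)

# Increment components (Grant-White), copied from the summary output
increment <- c("x2<-visual" = 0.000, "x3<-1" = -0.273, "x7<-1" = -0.212)

# Coefficients in Grant-White = reference + increment
grant_white <- reference + increment
grant_white
```

The loading of x2 is identical in both groups because its increment is zero, whereas the intercepts of x3 and x7 are lower in Grant-White than in Pasteur.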