Semi-Confirmatory Factor Analysis with Missing Data

Po-Hsien Huang

In this example, we show how to use lslx to conduct semi-confirmatory factor analysis with missing data. The example uses the HolzingerSwineford1939 data set from the lavaan package, so lavaan must be installed.

Missing Data Construction

Because HolzingerSwineford1939 does not contain missing values, we borrow code from semTools to create NAs (see the example for the twostage function in semTools).

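data <- lavaan::HolzingerSwineford1939
# x5 is set to NA whenever x1 is at or below its 30th percentile
data$x5 <- ifelse(data$x1 <= quantile(data$x1, .3), NA, data$x5)
# age in years, combining ageyr and agemo
data$age <- data$ageyr + data$agemo/12
# x9 is set to NA whenever age is at or below its 30th percentile
data$x9 <- ifelse(data$age <= quantile(data$age, .3), NA, data$x9)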

By construction, the missingness of x5 depends on the value of x1, and the missingness of x9 depends on age. Note that age is computed from ageyr and agemo. Since ageyr and agemo are not variables of substantive interest, they are treated as auxiliary variables in the subsequent analysis. With x1 always observed and ageyr and agemo included as auxiliary variables, the missingness is missing at random (MAR) given the variables available to the analysis.
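The distortion that this kind of mechanism can introduce is easy to see in a small simulation. The base R sketch below applies the same 30% rule to simulated data rather than to HolzingerSwineford1939; the sample size of 301 mirrors this data set, while the simulated variables and the slope of 0.5 linking x5 to x1 are hypothetical assumptions. Because the lowest values of x1 take their x5 values with them, the mean of the observed x5 values is pulled above the full-data mean, which is the kind of bias a complete-case analysis inherits.

```r
# Hypothetical simulation: the same missingness rule as above, applied to
# simulated data so that the full-data values remain available for comparison
set.seed(1)
n  <- 301
x1 <- rnorm(n)
x5 <- 0.5 * x1 + rnorm(n)  # assumed positive relation between x1 and x5

# x5 is set to NA whenever x1 is at or below its 30th percentile
x5_obs <- ifelse(x1 <= quantile(x1, .3), NA, x5)

n_missing     <- sum(is.na(x5_obs))
full_mean     <- mean(x5)
observed_mean <- mean(x5_obs, na.rm = TRUE)

cat("missing:", n_missing, "of", n, "\n")
cat("full-data mean of x5:    ", round(full_mean, 3), "\n")
cat("observed-data mean of x5:", round(observed_mean, 3), "\n")
```

About 30% of x5 is removed (91 of 301 values here, since the simulated x1 has no ties), and the observed-data mean exceeds the full-data mean.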

Model Specification and Object Initialization

The following model specification is the same as the one in our semi-confirmatory factor analysis example (see vignette("factor-analysis")).

model <-
'
visual  :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed   :=> x7 + x8 + x9
visual  :~> x4 + x5 + x6 + x7 + x8 + x9 
textual :~> x1 + x2 + x3 + x7 + x8 + x9 
speed   :~> x1 + x2 + x3 + x4 + x5 + x6 
visual  <=> fix(1) * visual
textual <=> fix(1) * textual
speed   <=> fix(1) * speed
'
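In this syntax, lines with :=> declare freely estimated loadings, lines with :~> declare penalized loadings, and the <=> lines with fix(1) fix each factor variance to one. As a rough check on the coefficient counts reported later by summarize(), the base R sketch below splits the model string on its operators and counts the right-hand-side terms. The helper count_terms() is hypothetical and only handles the one-operator-per-line form used here; it is not lslx's own parser. The non-loading terms (3 factor covariances, 9 error variances and 9 intercepts) are read off the coefficient table of this example's output rather than derived from the syntax.

```r
model <- '
visual  :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed   :=> x7 + x8 + x9
visual  :~> x4 + x5 + x6 + x7 + x8 + x9
textual :~> x1 + x2 + x3 + x7 + x8 + x9
speed   :~> x1 + x2 + x3 + x4 + x5 + x6
visual  <=> fix(1) * visual
textual <=> fix(1) * textual
speed   <=> fix(1) * speed
'

# Hypothetical helper: count right-hand-side terms on lines using operator `op`
count_terms <- function(model, op) {
  lines <- trimws(strsplit(model, "\n", fixed = TRUE)[[1]])
  hits  <- lines[grepl(op, lines, fixed = TRUE)]
  rhs   <- vapply(strsplit(hits, op, fixed = TRUE), `[`, character(1), 2)
  sum(lengths(strsplit(rhs, "+", fixed = TRUE)))
}

n_free_loading <- count_terms(model, ":=>")  # freely estimated loadings
n_pen_loading  <- count_terms(model, ":~>")  # penalized loadings

n_response <- 9
n_factor   <- 3
# Free non-loading coefficients in this example's output:
# factor covariances, error variances and intercepts
n_free_total <- n_free_loading + choose(n_factor, 2) + n_response + n_response

c(free = n_free_total, penalized = n_pen_loading)
```

The two counts agree with the numbers of free and penalized coefficients (30 and 18) that summarize() reports for this model.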

To initialize an lslx object with auxiliary variables, we need to specify the auxiliary_variable argument. The auxiliary_variable argument accepts only numeric variables. If a categorical variable is to be used as an auxiliary variable, it should first be transformed into a set of dummy variables, for example with the model.matrix function.
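The base R sketch below shows one way to build such dummy variables with model.matrix. The toy data frame and its factor values are hypothetical; only model.matrix, cbind and the default treatment coding are standard R. Dropping the first column removes the intercept, leaving one 0/1 column per non-reference level.

```r
# Hypothetical toy data with two categorical variables
aux <- data.frame(
  school = factor(c("Pasteur", "Grant-White", "Pasteur", "Grant-White")),
  grade  = factor(c(7, 8, 8, 7))
)

# Treatment (dummy) coding; drop the intercept column
dummies <- model.matrix(~ school + grade, data = aux)[, -1, drop = FALSE]

# Append the numeric dummy columns to the data
aux <- cbind(aux, dummies)
print(colnames(dummies))
```

The resulting column names could then be passed as auxiliary_variable. Note that model.matrix drops rows with a missing factor value under the default na.action, so this sketch assumes the categorical variable itself has no NAs; otherwise the rows would no longer line up with the original data.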

library(lslx)
r6_lslx <- lslx$new(model = model,
                    data = data,
                    auxiliary_variable = c("ageyr", "agemo"))
An 'lslx' R6 class is initialized via 'data'.
  Response Variable(s): x1 x2 x3 x4 x5 x6 x7 x8 x9 
  Latent Factor(s): visual textual speed 
  Auxiliary Variable(s): ageyr agemo 

Model Fitting

So far, the specified auxiliary variables are only stored in the lslx object. They are actually used once a fit-related method is called.

r6_lslx$fit(penalty_method = "mcp",
            lambda_grid = seq(.02, .30, .02),
            delta_grid = c(5, 10))
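With MCP, lambda sets the strength of the penalty and delta controls the shape (concavity) of the penalty function. The CONGRATS message refers to "all specified penalty levels", which suggests that each (lambda, delta) pair is one penalty level. The base R snippet below only enumerates those pairs; it does not call lslx.

```r
# The grids passed to fit() above
lambda_grid <- seq(.02, .30, .02)
delta_grid  <- c(5, 10)

# One row per (lambda, delta) combination
penalty_levels <- expand.grid(lambda = lambda_grid, delta = delta_grid)
nrow(penalty_levels)
range(lambda_grid)
```

With 15 lambda values and 2 delta values, there are 30 combinations, and the lambda range matches the "0.02 - 0.3" lambda grid shown by summarize().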
CONGRATS: The optimization algorithm converged under all specified penalty levels. 
  Specified Tolerance for Convergence: 0.001 
  Specified Maximal Number of Iterations: 100 

By default, fit-related methods use the two-stage method (optionally with auxiliary variables) to handle missing values. Users can specify the missing-data method explicitly via the missing_method argument. The other method available in the current version is listwise deletion. However, listwise deletion has no theoretical advantage over the two-stage method: it discards every incomplete observation and, in general, yields biased estimates when data are missing at random rather than missing completely at random.
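Listwise deletion keeps only the rows with no missing values. For this data set, the summary below reports 138 complete observations out of 301, so listwise deletion would discard 163 of them. The base R sketch below uses a small hypothetical data frame to show which rows survive listwise deletion and how distinct missing patterns (including the complete pattern) can be counted over the variables shown; it does not reproduce lslx internals.

```r
# Hypothetical toy data with NAs in x5 and x9, as in the construction above
toy <- data.frame(
  x1 = c(1.2, 0.4, 2.1, 3.3, 2.8, 1.9),
  x5 = c(NA, 2.0, NA, 4.1, 3.6, 2.7),
  x9 = c(5.1, NA, NA, 4.9, 5.5, 6.0)
)

# Listwise deletion keeps only rows without any NA
complete_rows <- complete.cases(toy)
listwise <- toy[complete_rows, ]

# Each distinct row of is.na(toy) is one missing pattern,
# including the all-observed (complete) pattern
patterns <- unique(is.na(toy))

cat("rows kept by listwise deletion:", nrow(listwise), "of", nrow(toy), "\n")
cat("number of missing patterns:", nrow(patterns), "\n")
```

The four patterns here (complete, x5 missing, x9 missing, both missing) are the same four combinations that the construction above can produce.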

Model Summarizing

The following code summarizes the fitting result under the penalty level selected by the Bayesian information criterion (BIC). The number of missing patterns shows how many distinct missing patterns are present in the data set (including the complete pattern). If the lslx object is initialized via raw data, a corrected sandwich standard error is used for coefficient tests by default. The correction is based on the asymptotic covariance of the saturated model estimates derived by full information maximum likelihood. The mean-adjusted likelihood ratio test is also based on this quantity. For references, see the Missing Data section in ?lslx.
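Several quantities in the summarize() output printed below are linked by simple arithmetic. The base R sketch copies a few printed values and recomputes others. The relationships used (degrees of freedom as saturated minus non-zero coefficients, scaling factor as robust over ordinary degrees of freedom, mean-adjusted statistic as unadjusted statistic divided by the scaling factor, and the RMSEA formula) are inferred because they reproduce the printed numbers to rounding; they are assumptions, not statements taken from the lslx source.

```r
# Values copied from the summarize() output of this example
n         <- 301     # number of observations
p         <- 9       # number of response variables
n_nonzero <- 33      # non-zero coefficients at the selected penalty level
t_unadj   <- 46.285  # unadjusted likelihood ratio statistic
df_rob    <- 26.729  # robust degree of freedom

# Saturated model: p(p + 1)/2 variances and covariances plus p means
n_saturated <- p * (p + 1) / 2 + p
df <- n_saturated - n_nonzero

# Mean adjustment: divide the statistic by the scaling factor
scaling <- df_rob / df
t_adj   <- t_unadj / scaling

# Upper-tail chi-square p-values
p_unadj <- pchisq(t_unadj, df, lower.tail = FALSE)
p_adj   <- pchisq(t_adj, df, lower.tail = FALSE)

# A common RMSEA point-estimate formula (n rather than n - 1 is an assumption;
# both give the same value to three decimals here)
rmsea <- sqrt(max((t_unadj - df) / (n * df), 0))

round(c(df = df, scaling = scaling, t_adj = t_adj,
        p_unadj = p_unadj, p_adj = p_adj, rmsea = rmsea), 3)
```

The results match the printed degree of freedom (21), scaling factor (1.273), mean-adjusted statistic (36.365, up to rounding of the inputs), p-values (0.001 and 0.020) and unadjusted RMSEA (0.063).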

r6_lslx$summarize(selector = "bic")
General Information                                                                                   
   number of observation                                                    301.000
   number of complete observation                                           138.000
   number of missing pattern                                                  4.000
   number of group                                                            1.000
   number of response                                                         9.000
   number of factor                                                           3.000
   number of free coefficient                                                30.000
   number of penalized coefficient                                           18.000

Fitting Information                                                                                   
   penalty method                                                               mcp
   lambda grid                                                           0.02 - 0.3
   delta grid                                                                5 - 10
   algorithm                                                                 fisher
   missing method                                                         two stage
   tolerance for convergence                                                  0.001

Saturated Model Information                                                                                   
   loss value                                                                 0.000
   number of non-zero coefficient                                            54.000
   degree of freedom                                                          0.000

Baseline Model Information                                                                                   
   loss value                                                                 2.937
   number of non-zero coefficient                                            18.000
   degree of freedom                                                         36.000

Numerical Condition                                                                                   
   lambda                                                                     0.160
   delta                                                                      5.000
   objective value                                                            0.218
   objective gradient absolute maximum                                        0.001
   objective hessian convexity                                                0.683
   number of iteration                                                        3.000
   loss value                                                                 0.154
   number of non-zero coefficient                                            33.000
   degree of freedom                                                         21.000
   robust degree of freedom                                                  26.729
   scaling factor                                                             1.273

Information Criteria                                                                                   
   Akaike information criterion (aic)                                         0.014
   Akaike information criterion with penalty being 3 (aic3)                  -0.056
   consistent Akaike information criterion (caic)                            -0.314
   Bayesian information criterion (bic)                                      -0.244
   adjusted Bayesian information criterion (abic)                            -0.023
   Haughton Bayesian information criterion (hbic)                            -0.116
   robust Akaike information criterion (raic)                                -0.024
   robust Akaike information criterion with penalty being 3 (raic3)          -0.113
   robust consistent Akaike information criterion (rcaic)                    -0.442
   robust Bayesian information criterion (rbic)                              -0.353
   robust adjusted Bayesian information criterion (rabic)                    -0.071
   robust Haughton Bayesian information criterion (rhbic)                    -0.190

Fit Indices                                                                                   
   root mean square error of approximation (rmsea)                            0.063
   comparative fit indice (cfi)                                               0.970
   non-normed fit indice (nnfi)                                               0.949
   standardized root mean of residual (srmr)                                  0.041

Likelihood Ratio Test
                    statistic         df    p-value
   unadjusted          46.285     21.000      0.001
   mean-adjusted       36.365     21.000      0.020

Root Mean Square Error of Approximation Test
                     estimate      lower      upper
   unadjusted           0.063      0.033      0.092
   mean-adjusted        0.056      0.010      0.090

Coefficient Test (Standard Error = "sandwich", Alpha Level = 0.05)
  Factor Loading
                      type  estimate  std.error  z-value  p-value  lower  upper
       x1<-visual     free     0.934      0.108    8.655    0.000  0.722  1.145
       x2<-visual     free     0.475      0.094    5.063    0.000  0.291  0.658
       x3<-visual     free     0.630      0.082    7.659    0.000  0.469  0.791
       x4<-visual      pen     0.000        -        -        -      -      -  
       x5<-visual      pen    -0.153      0.119   -1.288    0.099 -0.386  0.080
       x6<-visual      pen     0.000        -        -        -      -      -  
       x7<-visual      pen    -0.074      0.100   -0.740    0.230 -0.270  0.122
       x8<-visual      pen     0.000        -        -        -      -      -  
       x9<-visual      pen     0.226      0.080    2.838    0.002  0.070  0.382
      x1<-textual      pen     0.000        -        -        -      -      -  
      x2<-textual      pen     0.000        -        -        -      -      -  
      x3<-textual      pen     0.000        -        -        -      -      -  
      x4<-textual     free     0.978      0.062   15.840    0.000  0.857  1.099
      x5<-textual     free     1.126      0.079   14.235    0.000  0.971  1.281
      x6<-textual     free     0.917      0.059   15.659    0.000  0.802  1.032
      x7<-textual      pen     0.000        -        -        -      -      -  
      x8<-textual      pen     0.000        -        -        -      -      -  
      x9<-textual      pen     0.000        -        -        -      -      -  
        x1<-speed      pen     0.000        -        -        -      -      -  
        x2<-speed      pen     0.000        -        -        -      -      -  
        x3<-speed      pen     0.000        -        -        -      -      -  
        x4<-speed      pen     0.000        -        -        -      -      -  
        x5<-speed      pen     0.000        -        -        -      -      -  
        x6<-speed      pen     0.000        -        -        -      -      -  
        x7<-speed     free     0.714      0.100    7.130    0.000  0.518  0.911
        x8<-speed     free     0.769      0.084    9.141    0.000  0.604  0.934
        x9<-speed     free     0.462      0.073    6.352    0.000  0.320  0.605

  Covariance
                      type  estimate  std.error  z-value  p-value  lower  upper
 textual<->visual     free     0.462      0.070    6.559    0.000  0.324  0.600
   speed<->visual     free     0.329      0.085    3.873    0.000  0.163  0.496
  speed<->textual     free     0.217      0.081    2.680    0.004  0.058  0.376

  Variance
                      type  estimate  std.error  z-value  p-value  lower  upper
  visual<->visual    fixed     1.000        -        -        -      -      -  
textual<->textual    fixed     1.000        -        -        -      -      -  
    speed<->speed    fixed     1.000        -        -        -      -      -  
          x1<->x1     free     0.463      0.187    2.469    0.007  0.095  0.830
          x2<->x2     free     1.150      0.116    9.880    0.000  0.922  1.378
          x3<->x3     free     0.867      0.099    8.739    0.000  0.672  1.061
          x4<->x4     free     0.388      0.054    7.257    0.000  0.283  0.493
          x5<->x5     free     0.397      0.070    5.685    0.000  0.260  0.534
          x6<->x6     free     0.351      0.046    7.625    0.000  0.261  0.441
          x7<->x7     free     0.694      0.114    6.094    0.000  0.471  0.917
          x8<->x8     free     0.429      0.113    3.808    0.000  0.208  0.650
          x9<->x9     free     0.619      0.077    8.071    0.000  0.469  0.769

  Intercept
                      type  estimate  std.error  z-value  p-value  lower  upper
            x1<-1     free     4.936      0.067   73.473    0.000  4.804  5.067
            x2<-1     free     6.088      0.068   89.855    0.000  5.955  6.221
            x3<-1     free     2.250      0.065   34.579    0.000  2.123  2.378
            x4<-1     free     3.061      0.067   45.694    0.000  2.930  3.192
            x5<-1     free     4.420      0.093   47.369    0.000  4.237  4.603
            x6<-1     free     2.186      0.063   34.667    0.000  2.062  2.309
            x7<-1     free     4.186      0.063   66.766    0.000  4.063  4.309
            x8<-1     free     5.527      0.058   94.854    0.000  5.413  5.641
            x9<-1     free     5.366      0.074   72.693    0.000  5.221  5.510
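The z-values, p-values and confidence bounds in the coefficient table can be recomputed from the estimates and standard errors. The base R sketch below copies four rows from the output above. The printed p-values match one-sided standard normal tail probabilities (for example, x7<-visual gives 0.230 rather than the two-sided 0.459), while the lower and upper bounds match a two-sided 95% Wald interval at the stated alpha level of 0.05. This reading is inferred from the numbers, so consult ?lslx for the definitive description. Small discrepancies in the third decimal arise because the inputs are the rounded printed values.

```r
# Rows copied from the coefficient table: estimate and standard error
coef_tab <- data.frame(
  name     = c("x5<-visual", "x7<-visual", "x9<-visual", "speed<->textual"),
  estimate = c(-0.153, -0.074, 0.226, 0.217),
  se       = c(0.119, 0.100, 0.080, 0.081)
)

alpha  <- 0.05
z_crit <- qnorm(1 - alpha / 2)  # about 1.96

coef_tab$z     <- coef_tab$estimate / coef_tab$se
# One-sided tail probability of |z| under the standard normal
coef_tab$p_one <- pnorm(-abs(coef_tab$z))
# Two-sided Wald interval at level 1 - alpha
coef_tab$lower <- coef_tab$estimate - z_crit * coef_tab$se
coef_tab$upper <- coef_tab$estimate + z_crit * coef_tab$se

print(coef_tab, digits = 3)
```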