Regression Analysis with Lasso Penalty

Po-Hsien Huang

In this example, we will show how to use lslx to conduct regression analysis with the lasso penalty.

Data Generation

The following code is used to generate data for regression analysis.

set.seed(9487)
x <- matrix(rnorm(2000), 200, 10)
colnames(x) <- paste0("x", 1:10)
y <- matrix(rnorm(200), 200, 1)
data <- data.frame(y, x)

The data set contains 200 observations on 10 covariates (x1 - x10) and a response variable (y). By construction, the 10 covariates are not useful for predicting the response. The data set is stored in a data.frame named data.
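
Before specifying a model, it can be helpful to take a quick look at the generated data set (a quick check, not part of the analysis itself):

# inspect the dimensions and the first few rows of the generated data
dim(data)
head(data, 3)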

Model Specification

Model specification in lslx is quite similar to that in lavaan. However, different operators and prefixes are used to accommodate the presence of penalized parameters. In the following specification, y is predicted by x1 - x10.

model <-
'
y <= x1 + x2 + x3 + x4
y <~ x5 + x6 + x7 + x8 + x9 + x10
'

The operator <= means that the regression coefficients from the RHS variables to the LHS variables are freely estimated. On the other hand, the operator <~ means that the regression coefficients from the RHS variables to the LHS variables are estimated with penalty. Details of the model syntax can be found in the Model Syntax section of ?lslx.
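
For instance, if no coefficient were believed to be nonzero a priori, all ten regression coefficients could be penalized by using only the <~ operator (a hypothetical variant of the present model):

# hypothetical variant: penalize all ten regression coefficients
model_all_pen <-
'
y <~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10
'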

Object Initialization

lslx is written as an R6 class. Every time we conduct analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named r6_lslx.

library(lslx)
r6_lslx <- lslx$new(model = model, data = data)
An 'lslx' R6 class is initialized via 'data'.
  Response Variable(s): y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 

Here, lslx is the object generator for lslx objects and new is the built-in method of lslx for generating a new lslx object. The initialization of lslx requires users to specify a model for model specification (argument model) and a data set to be fitted (argument data). The data set must contain all the observed variables specified in the given model. It is also possible to initialize an lslx object by importing sample moments (see vignette("structural-equation-modeling")).
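
As a minimal sketch of the moment-based interface, assuming the moment arguments sample_cov, sample_mean, and sample_size described in ?lslx, an equivalent object could be initialized from the sample moments of data:

# sketch: initialize from sample moments instead of raw data
# (argument names follow ?lslx)
r6_lslx_moment <- lslx$new(model = model,
                           sample_cov = cov(data),
                           sample_mean = colMeans(data),
                           sample_size = nrow(data))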

Model Fitting

After an lslx object is initialized, the method fit can be used to fit the specified model to the given data.

r6_lslx$fit(penalty_method = "lasso",
            lambda_grid = seq(.00, .30, .01))
CONGRATS: The optimization algorithm converged under all specified penalty levels. 
  Specified Tolerance for Convergence: 0.001 
  Specified Maximal Number of Iterations: 100 

The fitting process requires users to specify a penalty method (argument penalty_method) and the considered penalty levels (argument lambda_grid). In this example, the lasso penalty is implemented on the lambda grid seq(.00, .30, .01). All the fitting results will be stored in the fitting field of r6_lslx.
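
Other penalty methods can be specified in the same way. As a sketch, the mcp penalty additionally takes a grid for its convexity parameter via the argument delta_grid (see ?lslx); the delta values below are purely illustrative:

# sketch: fit with the mcp penalty; the delta values are illustrative
# (calling fit again stores new results in the fitting field)
r6_lslx$fit(penalty_method = "mcp",
            lambda_grid = seq(.00, .30, .01),
            delta_grid = c(1.5, 3.0, Inf))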

Model Summarizing

Unlike traditional SEM analysis, lslx fits the model to the data under all the considered penalty levels. To summarize the fitting results, a selector for determining an optimal penalty level must be specified. Available selectors can be found in the Penalty Level Selection section of ?lslx. The following code summarizes the fitting results under the penalty level selected by the Akaike information criterion (aic).

r6_lslx$summarize(selector = "aic")
General Information                                                                                
   number of observation                                                     200
   number of complete observation                                            200
   number of missing pattern                                                none
   number of group                                                             1
   number of response                                                         11
   number of factor                                                            0
   number of free coefficient                                                 71
   number of penalized coefficient                                             6

Fitting Information                                                                                
   penalty method                                                          lasso
   lambda grid                                                           0 - 0.3
   delta grid                                                               none
   algorithm                                                              fisher
   missing method                                                           none
   tolerance for convergence                                               0.001

Saturated Model Information                                                                                
   loss value                                                              0.000
   number of non-zero coefficient                                         77.000
   degree of freedom                                                       0.000

Baseline Model Information                                                                                
   loss value                                                              0.388
   number of non-zero coefficient                                         22.000
   degree of freedom                                                      55.000

Numerical Condition                                                                                
   lambda                                                                  0.290
   delta                                                                    none
   objective value                                                         0.011
   objective gradient absolute maximum                                     0.000
   objective hessian convexity                                             0.778
   number of iteration                                                     1.000
   loss value                                                              0.011
   number of non-zero coefficient                                         71.000
   degree of freedom                                                       6.000
   robust degree of freedom                                                5.733
   scaling factor                                                          0.956

Information Criteria                                                                                
   Akaike information criterion (aic)                                     -0.049
   Akaike information criterion with penalty being 3 (aic3)               -0.079
   consistent Akaike information criterion (caic)                         -0.178
   Bayesian information criterion (bic)                                   -0.148
   adjusted Bayesian information criterion (abic)                         -0.053
   Haughton Bayesian information criterion (hbic)                         -0.092
   robust Akaike information criterion (raic)                             -0.046
   robust Akaike information criterion with penalty being 3 (raic3)       -0.075
   robust consistent Akaike information criterion (rcaic)                 -0.169
   robust Bayesian information criterion (rbic)                           -0.140
   robust adjusted Bayesian information criterion (rabic)                 -0.050
   robust Haughton Bayesian information criterion (rhbic)                 -0.088

Fit Indices                                                                                
   root mean square error of approximation (rmsea)                         0.000
   comparative fit indice (cfi)                                            1.000
   non-normed fit indice (nnfi)                                            1.000
   standardized root mean of residual (srmr)                               0.012

Likelihood Ratio Test
                    statistic         df    p-value
   unadjusted           2.279      6.000      0.892
   mean-adjusted        2.385      6.000      0.881

Root Mean Square Error of Approximation Test
                     estimate      lower      upper
   unadjusted           0.000      0.000      0.057
   mean-adjusted        0.000      0.000      0.058

Coefficient Test (Standard Error = "sandwich", Alpha Level = 0.05)
  Regression
              type  estimate  std.error  z-value  p-value  lower  upper
    y<-x1     free     0.103      0.071    1.452    0.073 -0.036  0.242
    y<-x2     free    -0.126      0.069   -1.832    0.034 -0.261  0.009
    y<-x3     free     0.043      0.073    0.581    0.281 -0.101  0.186
    y<-x4     free    -0.083      0.072   -1.150    0.125 -0.225  0.059
    y<-x5      pen     0.000        -        -        -      -      -  
    y<-x6      pen     0.000        -        -        -      -      -  
    y<-x7      pen     0.000        -        -        -      -      -  
    y<-x8      pen     0.000        -        -        -      -      -  
    y<-x9      pen     0.000        -        -        -      -      -  
   y<-x10      pen     0.000        -        -        -      -      -  

  Covariance
              type  estimate  std.error  z-value  p-value  lower  upper
  x2<->x1     free     0.101      0.085    1.192    0.117 -0.065  0.268
  x3<->x1     free     0.071      0.074    0.956    0.170 -0.074  0.216
  x4<->x1     free     0.168      0.082    2.046    0.020  0.007  0.328
  x5<->x1     free     0.069      0.078    0.889    0.187 -0.083  0.221
  x6<->x1     free    -0.281      0.078   -3.578    0.000 -0.434 -0.127
  x7<->x1     free    -0.055      0.068   -0.806    0.210 -0.187  0.078
  x8<->x1     free     0.083      0.076    1.083    0.139 -0.067  0.232
  x9<->x1     free    -0.082      0.075   -1.090    0.138 -0.230  0.066
 x10<->x1     free    -0.027      0.071   -0.379    0.353 -0.165  0.112
  x3<->x2     free     0.042      0.070    0.605    0.273 -0.094  0.179
  x4<->x2     free    -0.011      0.076   -0.152    0.440 -0.160  0.137
  x5<->x2     free     0.020      0.078    0.255    0.400 -0.132  0.172
  x6<->x2     free    -0.142      0.072   -1.972    0.024 -0.284 -0.001
  x7<->x2     free     0.031      0.066    0.473    0.318 -0.098  0.161
  x8<->x2     free     0.017      0.077    0.223    0.412 -0.134  0.169
  x9<->x2     free     0.012      0.076    0.156    0.438 -0.138  0.162
 x10<->x2     free    -0.047      0.068   -0.693    0.244 -0.180  0.086
  x4<->x3     free    -0.051      0.072   -0.705    0.240 -0.191  0.090
  x5<->x3     free    -0.095      0.072   -1.319    0.094 -0.236  0.046
  x6<->x3     free     0.082      0.080    1.029    0.152 -0.074  0.238
  x7<->x3     free     0.073      0.071    1.025    0.153 -0.067  0.213
  x8<->x3     free    -0.120      0.071   -1.693    0.045 -0.259  0.019
  x9<->x3     free     0.003      0.071    0.047    0.481 -0.136  0.142
 x10<->x3     free    -0.062      0.072   -0.856    0.196 -0.202  0.079
  x5<->x4     free     0.078      0.074    1.052    0.146 -0.068  0.224
  x6<->x4     free     0.015      0.076    0.201    0.420 -0.133  0.164
  x7<->x4     free    -0.054      0.065   -0.837    0.201 -0.180  0.072
  x8<->x4     free     0.173      0.070    2.456    0.007  0.035  0.311
  x9<->x4     free     0.101      0.070    1.455    0.073 -0.035  0.238
 x10<->x4     free     0.027      0.075    0.358    0.360 -0.119  0.173
  x6<->x5     free    -0.100      0.073   -1.372    0.085 -0.242  0.043
  x7<->x5     free    -0.136      0.071   -1.908    0.028 -0.275  0.004
  x8<->x5     free     0.126      0.074    1.690    0.046 -0.020  0.271
  x9<->x5     free    -0.061      0.079   -0.763    0.223 -0.216  0.095
 x10<->x5     free     0.071      0.077    0.914    0.180 -0.081  0.222
  x7<->x6     free     0.014      0.071    0.202    0.420 -0.125  0.153
  x8<->x6     free     0.035      0.075    0.468    0.320 -0.112  0.182
  x9<->x6     free     0.026      0.067    0.383    0.351 -0.106  0.157
 x10<->x6     free    -0.017      0.075   -0.224    0.411 -0.165  0.131
  x8<->x7     free    -0.082      0.064   -1.277    0.101 -0.208  0.044
  x9<->x7     free    -0.018      0.065   -0.274    0.392 -0.146  0.110
 x10<->x7     free    -0.096      0.060   -1.604    0.054 -0.213  0.021
  x9<->x8     free    -0.111      0.068   -1.638    0.051 -0.243  0.022
 x10<->x8     free     0.156      0.071    2.189    0.014  0.016  0.296
 x10<->x9     free    -0.145      0.074   -1.948    0.026 -0.290  0.001

  Variance
              type  estimate  std.error  z-value  p-value  lower  upper
    y<->y     free     1.104      0.112    9.903    0.000  0.886  1.323
  x1<->x1     free     1.206      0.116   10.430    0.000  0.980  1.433
  x2<->x2     free     1.164      0.109   10.688    0.000  0.951  1.378
  x3<->x3     free     1.035      0.094   10.959    0.000  0.850  1.220
  x4<->x4     free     1.010      0.087   11.670    0.000  0.840  1.179
  x5<->x5     free     1.078      0.113    9.579    0.000  0.858  1.299
  x6<->x6     free     1.057      0.114    9.237    0.000  0.832  1.281
  x7<->x7     free     0.839      0.078   10.795    0.000  0.687  0.992
  x8<->x8     free     0.986      0.089   11.069    0.000  0.811  1.161
  x9<->x9     free     1.022      0.112    9.109    0.000  0.802  1.242
x10<->x10     free     0.992      0.087   11.370    0.000  0.821  1.162

  Intercept
              type  estimate  std.error  z-value  p-value  lower  upper
     y<-1     free    -0.002      0.073   -0.034    0.487 -0.146  0.141
    x1<-1     free    -0.033      0.078   -0.424    0.336 -0.185  0.119
    x2<-1     free    -0.083      0.076   -1.083    0.139 -0.232  0.067
    x3<-1     free     0.072      0.072    1.001    0.158 -0.069  0.213
    x4<-1     free    -0.025      0.071   -0.353    0.362 -0.164  0.114
    x5<-1     free    -0.050      0.073   -0.683    0.247 -0.194  0.094
    x6<-1     free    -0.096      0.073   -1.321    0.093 -0.238  0.046
    x7<-1     free     0.027      0.065    0.414    0.339 -0.100  0.154
    x8<-1     free     0.048      0.070    0.680    0.248 -0.090  0.185
    x9<-1     free     0.001      0.071    0.016    0.494 -0.139  0.141
   x10<-1     free     0.024      0.070    0.346    0.365 -0.114  0.162

In this example, we can observe that all the penalized coefficients are estimated as zero, which is consistent with their population values. The summarize method also reports significance tests for the coefficients. In lslx, the default standard errors are calculated by a sandwich formula whenever raw data are available. The sandwich formula is generally valid even when the model is misspecified and the data are not normally distributed. However, it may not be valid after an optimal penalty level has been selected.
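
Because different selectors may pick different penalty levels, it can be informative to re-summarize the results under another criterion reported above, for example the Bayesian information criterion:

# summarize under the penalty level selected by bic instead of aic
r6_lslx$summarize(selector = "bic")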

Visualization

lslx provides four methods for visualizing the fitting results. The method plot_numerical_condition shows the numerical condition under all the penalty levels. The following code plots the values of n_iter_out (number of iterations in the outer loop), objective_gradient_abs_max (maximum absolute element of the objective-function gradient), and objective_hessian_convexity (minimum of the univariate approximate hessian). The plot can be used to evaluate the quality of the numerical optimization. n_iter_out shows that the algorithm converges quickly under all the penalty levels, while objective_gradient_abs_max and objective_hessian_convexity indicate that the obtained coefficients are valid minimizers under all the penalty levels.

r6_lslx$plot_numerical_condition()

The method plot_information_criterion shows the values of the information criteria under all the penalty levels. The plot shows that the information criteria are minimized by any value of lambda larger than about 0.15.

r6_lslx$plot_information_criterion()

The method plot_fit_indice shows the values of fit indices under all the penalty levels.

r6_lslx$plot_fit_indice()
Warning: Removed 10 rows containing missing values (geom_path).

The method plot_coefficient shows the solution paths of the coefficients in a given block. The following code plots the solution paths of all coefficients in the block y<-y, which contains all the regression coefficients from observed variables to observed variables. We can see that all the penalized regression coefficients become zero when the value of lambda is larger than 0.15.

r6_lslx$plot_coefficient(block = "y<-y")
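
Other blocks can be plotted analogously. For example, the solution paths of the intercepts (block y<-1, following the coefficient labels shown in the summary above) could be examined by

# plot the solution paths of the intercept terms
r6_lslx$plot_coefficient(block = "y<-1")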

Objects Extraction

In lslx, many quantities related to SEM can be extracted by extract-related methods. For example, the estimated coefficients under the penalty level selected by aic can be obtained by

r6_lslx$extract_coefficient(selector = "aic")
     y<-1|G     x1<-1|G     x2<-1|G     x3<-1|G     x4<-1|G     x5<-1|G     x6<-1|G     x7<-1|G 
   -0.00247    -0.03294    -0.08263     0.07199    -0.02508    -0.05018    -0.09601     0.02681 
    x8<-1|G     x9<-1|G    x10<-1|G     y<-x1|G     y<-x2|G     y<-x3|G     y<-x4|G     y<-x5|G 
    0.04775     0.00111     0.02436     0.10294    -0.12611     0.04260    -0.08332     0.00000 
    y<-x6|G     y<-x7|G     y<-x8|G     y<-x9|G    y<-x10|G     y<->y|G   x1<->x1|G   x2<->x1|G 
    0.00000     0.00000     0.00000     0.00000     0.00000     1.10422     1.20633     0.10137 
  x3<->x1|G   x4<->x1|G   x5<->x1|G   x6<->x1|G   x7<->x1|G   x8<->x1|G   x9<->x1|G  x10<->x1|G 
    0.07090     0.16763     0.06901    -0.28053    -0.05460     0.08263    -0.08223    -0.02676 
  x2<->x2|G   x3<->x2|G   x4<->x2|G   x5<->x2|G   x6<->x2|G   x7<->x2|G   x8<->x2|G   x9<->x2|G 
    1.16405     0.04209    -0.01149     0.01977    -0.14231     0.03125     0.01724     0.01193 
 x10<->x2|G   x3<->x3|G   x4<->x3|G   x5<->x3|G   x6<->x3|G   x7<->x3|G   x8<->x3|G   x9<->x3|G 
   -0.04697     1.03484    -0.05058    -0.09476     0.08204     0.07300    -0.11996     0.00334 
 x10<->x3|G   x4<->x4|G   x5<->x4|G   x6<->x4|G   x7<->x4|G   x8<->x4|G   x9<->x4|G  x10<->x4|G 
   -0.06150     1.00971     0.07830     0.01525    -0.05398     0.17313     0.10138     0.02669 
  x5<->x5|G   x6<->x5|G   x7<->x5|G   x8<->x5|G   x9<->x5|G  x10<->x5|G   x6<->x6|G   x7<->x6|G 
    1.07843    -0.09960    -0.13574     0.12568    -0.06061     0.07054     1.05668     0.01436 
  x8<->x6|G   x9<->x6|G  x10<->x6|G   x7<->x7|G   x8<->x7|G   x9<->x7|G  x10<->x7|G   x8<->x8|G 
    0.03514     0.02572    -0.01691     0.83925    -0.08207    -0.01792    -0.09597     0.98598 
  x9<->x8|G  x10<->x8|G   x9<->x9|G  x10<->x9|G x10<->x10|G 
   -0.11069     0.15616     1.02171    -0.14454     0.99154 
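
Other extract-related methods work analogously. For example, assuming the method extract_implied_cov listed under ?lslx, the model-implied covariance matrix under the same penalty level could be obtained by

# sketch: model-implied covariance matrix under the aic-selected level
r6_lslx$extract_implied_cov(selector = "aic")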

We may also check the quality of the optimization by viewing the subgradient of the objective function

r6_lslx$extract_objective_gradient(selector = "aic")
                 [,1]
y<-1|G      -2.03e-09
x1<-1|G      0.00e+00
x2<-1|G      0.00e+00
x3<-1|G     -1.29e-26
x4<-1|G     -2.58e-26
x5<-1|G      6.40e-27
x6<-1|G     -6.83e-27
x7<-1|G      3.41e-27
x8<-1|G      1.88e-26
x9<-1|G      3.41e-27
x10<-1|G     0.00e+00
y<-x1|G     -7.72e-11
y<-x2|G     -3.17e-10
y<-x3|G      1.34e-10
y<-x4|G      1.41e-13
y<-x5|G      0.00e+00
y<-x6|G      0.00e+00
y<-x7|G      0.00e+00
y<-x8|G      0.00e+00
y<-x9|G      0.00e+00
y<-x10|G     0.00e+00
y<->y|G     -1.49e-10
x1<->x1|G    7.81e-18
x2<->x1|G   -2.64e-17
x3<->x1|G   -4.66e-18
x4<->x1|G   -2.60e-18
x5<->x1|G    1.04e-17
x6<->x1|G   -9.54e-18
x7<->x1|G    6.07e-18
x8<->x1|G   -1.73e-18
x9<->x1|G    6.94e-18
x10<->x1|G  -8.67e-19
x2<->x2|G   -1.90e-18
x3<->x2|G    4.34e-19
x4<->x2|G    1.73e-18
x5<->x2|G   -9.54e-18
x6<->x2|G   -2.52e-17
x7<->x2|G   -1.39e-17
x8<->x2|G    1.56e-17
x9<->x2|G    6.94e-18
x10<->x2|G  -5.20e-18
x3<->x3|G    2.13e-18
x4<->x3|G    1.52e-17
x5<->x3|G   -9.11e-18
x6<->x3|G    2.08e-17
x7<->x3|G    4.34e-18
x8<->x3|G   -4.77e-18
x9<->x3|G    1.73e-18
x10<->x3|G   1.73e-18
x4<->x4|G   -1.39e-17
x5<->x4|G   -5.20e-18
x6<->x4|G   -4.42e-17
x7<->x4|G   -8.67e-18
x8<->x4|G    7.81e-18
x9<->x4|G   -8.67e-18
x10<->x4|G  -9.54e-18
x5<->x5|G    1.48e-17
x6<->x5|G   -3.32e-17
x7<->x5|G    3.05e-18
x8<->x5|G    4.52e-18
x9<->x5|G   -4.50e-18
x10<->x5|G  -7.24e-18
x6<->x6|G   -1.20e-17
x7<->x6|G   -1.88e-18
x8<->x6|G    3.74e-19
x9<->x6|G    3.17e-19
x10<->x6|G  -6.78e-18
x7<->x7|G    2.55e-17
x8<->x7|G   -3.83e-17
x9<->x7|G    1.44e-17
x10<->x7|G  -1.57e-17
x8<->x8|G   -2.78e-17
x9<->x8|G   -3.73e-17
x10<->x8|G   2.76e-18
x9<->x9|G   -1.41e-18
x10<->x9|G  -2.39e-18
x10<->x10|G -1.16e-18

The subgradient shows that the obtained solution is optimal since all of its elements are essentially zero.