Introduction to gradLasso

The gradLasso package implements an efficient gradient descent solver for LASSO-penalized regression models. It supports several families, including Gaussian, Binomial, Negative Binomial, and Zero-Inflated Negative Binomial (ZINB). It also features built-in stability selection and cross-validation.
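To make "gradient descent solver for LASSO" concrete, here is a minimal base-R sketch of proximal gradient descent (ISTA) for the Gaussian LASSO objective (1/(2n)) * ||y - b0 - X b||^2 + lambda * ||b||_1. It illustrates the general technique under our own assumptions (centered predictors, unpenalized intercept, fixed step size 1/L) and is not gradLasso's internal code; `soft_threshold()` and `lasso_ista()` are hypothetical names.

```r
# Soft-thresholding operator: the proximal map of t * ||b||_1
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Proximal gradient descent (ISTA) for the Gaussian LASSO:
#   minimize (1/(2n)) * ||y - b0 - X b||^2 + lambda * ||b||_1
lasso_ista <- function(X, y, lambda, max_iter = 5000, tol = 1e-8) {
  n <- nrow(X)
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_mean)  # center columns so the intercept is left unpenalized
  yc <- y - y_mean
  # Fixed step 1/L, where L is the Lipschitz constant of the smooth part's gradient
  L <- max(eigen(crossprod(Xc) / n, symmetric = TRUE, only.values = TRUE)$values)
  step <- 1 / L
  b <- numeric(ncol(X))
  for (iter in seq_len(max_iter)) {
    grad <- -as.vector(crossprod(Xc, yc - Xc %*% b)) / n   # gradient of the squared-error term
    b_new <- soft_threshold(b - step * grad, step * lambda)  # gradient step, then shrink
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break
  }
  list(intercept = y_mean - sum(x_mean * b), beta = b)
}

# Usage on toy data with two active predictors
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(200, sd = 0.5)
fit_small <- lasso_ista(X, y, lambda = 0.1)
round(fit_small$beta, 3)
```

The soft-thresholding step is what sets small coefficients exactly to zero; a sufficiently large lambda returns an all-zero coefficient vector.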

1. Gaussian Regression (Standard LASSO)

We start by simulating simple Gaussian data with correlated predictors.

library(gradLasso)
set.seed(42)
# Simulate 200 obs, 20 predictors, 5 active
sim <- simulate_data(n = 200, p = 20, family = "gaussian", k = 5, snr = 3.0)
df <- data.frame(y = sim$y, sim$X)

# Check the first few rows
head(df[, 1:6])
#>            y       Var1       Var2       Var3        Var4        Var5
#> 1  2.4088118  1.3709584 -1.6863106  0.9706798 -0.04932661  0.66502569
#> 2 -1.1866537 -0.5646982  0.2140939 -0.8088901  0.25200976  0.76083532
#> 3 -0.5444072  0.3631284  1.2202852  0.2984229  1.02738323  0.41846489
#> 4 -0.4884482  0.6328626  2.1445006  0.4769757  0.91408140 -0.01476616
#> 5 -1.2214254  0.4042683 -1.2681897 -0.8203085 -0.81123831 -1.50034498
#> 6 -1.2940614 -0.1061245 -1.1488285 -1.2083257  1.29080373 -0.04453506
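The vignette does not document how `simulate_data()` builds its data, so the following is only a guess at what a Gaussian simulator with correlated predictors and a target signal-to-noise ratio might look like. It assumes an AR(1) correlation rho^|i-j| between predictors and defines snr as var(X beta) / sigma^2; `simulate_gaussian()` and `rho` are hypothetical.

```r
# Hypothetical stand-in for simulate_data(family = "gaussian"); the AR(1)
# correlation and the SNR definition below are assumptions, not the package's.
simulate_gaussian <- function(n, p, k, snr, rho = 0.5) {
  Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))    # AR(1) correlation matrix
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)         # rows ~ N(0, Sigma)
  colnames(X) <- paste0("Var", seq_len(p))
  beta <- c(rep(c(1, -1), length.out = k), rep(0, p - k)) # first k predictors active
  signal <- as.vector(X %*% beta)
  sigma <- sqrt(var(signal) / snr)                        # noise sd giving the target SNR
  list(y = signal + rnorm(n, sd = sigma), X = X, beta = beta, sigma = sigma)
}

set.seed(42)
sim_g <- simulate_gaussian(n = 200, p = 20, k = 5, snr = 3)
df_g <- data.frame(y = sim_g$y, sim_g$X)
head(df_g[, 1:6])
```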

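We can fit the model using the standard formula interface. Here we select lambda by cross-validation (lambda_cv = TRUE) and run stability selection over 50 bootstrap resamples (boot = TRUE, n_boot = 50); 50 bootstraps is also the package default.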

fit <- gradLasso(y ~ ., data = df, lambda_cv = TRUE, boot = TRUE, n_boot = 50)

print(fit)
#> 
#> gradLasso Fitted Object
#> Family: gaussian 
#> Lambda: 0.009446 
#> Deviance: 61 
#> Use plot() to view diagnostics or summary() for coefficients.

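We can inspect the estimated coefficients using summary(). The “Selection_Prob” column shows the fraction of bootstrap iterations in which each coefficient was selected (nonzero), while “Boot_Mean” and the CI columns summarize the bootstrap distribution of each estimate.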

summary(fit)
#> 
#> ------------------------------------------------
#> gradLasso Model Summary
#> ------------------------------------------------
#> Family:   gaussian
#> Deviance: 61.20
#> AIC:      101.20
#> BIC:      167.17
#> DF:       20
#> ------------------------------------------------
#> Lambda:   0.009446 (Selected via CV)
#> Method:   Stability Selection (50 bootstraps)
#> Interval: 2.5% - 97.5%
#> 
#> --- Selected Coefficients ---
#>    Predictor Estimate Selection_Prob Boot_Mean  CI_2.5 CI_97.5
#>         Var1   0.4690           1.00    0.4763  0.4054  0.5365
#>         Var2  -0.4402           1.00   -0.4364 -0.5347 -0.3436
#>         Var3   0.6597           1.00    0.6626  0.5642  0.7326
#>         Var4  -0.3445           1.00   -0.3404 -0.4040 -0.2621
#>         Var5   0.4756           1.00    0.4894  0.4069  0.5586
#>         Var8   0.1467           1.00    0.1574  0.0963  0.2167
#>        Var13  -0.0868           1.00   -0.0832 -0.1502 -0.0190
#>        Var17   0.0540           0.98    0.0545  0.0011  0.1313
#>  (Intercept)  -0.0739           0.96   -0.0701 -0.1734  0.0000
#>        Var14   0.0470           0.96    0.0386 -0.0083  0.0976
#>        Var15   0.0367           0.96    0.0432 -0.0147  0.1094
#>        Var19   0.0403           0.96    0.0407 -0.0448  0.1102
#>        Var10  -0.0316           0.90   -0.0208 -0.1069  0.0160
#>        Var11  -0.0142           0.90   -0.0248 -0.0871  0.0617
#>         Var7  -0.0277           0.88   -0.0321 -0.1171  0.0277
#>        Var18  -0.0176           0.88   -0.0172 -0.0748  0.0562
#>        Var20   0.0082           0.86    0.0127 -0.0377  0.0845
#>         Var6   0.0033           0.84    0.0140 -0.0582  0.0662
#>        Var16   0.0236           0.84    0.0244 -0.0483  0.0945
#>         Var9  -0.0349           0.82   -0.0379 -0.1221  0.0150
#>        Var12   0.0000           0.78   -0.0011 -0.0491  0.0466
#> ------------------------------------------------
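The bootstrap columns of this table can be reproduced from a matrix of bootstrap coefficient estimates (one row per resample, one column per coefficient). The sketch below assumes Selection_Prob is the share of nonzero estimates, Boot_Mean the bootstrap mean, and the CI bounds empirical 2.5% and 97.5% quantiles, which matches the "Interval: 2.5% - 97.5%" line; `summarize_bootstrap()` is a hypothetical helper, and gradLasso may use a different quantile rule.

```r
# Aggregate a B x p matrix of bootstrap coefficients into a stability table.
summarize_bootstrap <- function(coef_mat, level = 0.95) {
  alpha <- (1 - level) / 2
  res <- data.frame(
    Predictor      = colnames(coef_mat),
    Selection_Prob = colMeans(coef_mat != 0),   # share of resamples with a nonzero estimate
    Boot_Mean      = colMeans(coef_mat),
    CI_lower       = apply(coef_mat, 2, quantile, probs = alpha, names = FALSE),
    CI_upper       = apply(coef_mat, 2, quantile, probs = 1 - alpha, names = FALSE),
    row.names      = NULL
  )
  res[order(-res$Selection_Prob), ]             # most stable predictors first
}

# Toy example: 4 bootstrap fits, 2 coefficients
boot_coefs <- cbind(Var2 = c(0, 0.10, 0, 0.05),
                    Var1 = c(0.50, 0.45, 0.52, 0.48))
tab <- summarize_bootstrap(boot_coefs)
tab
```

In the real summary, the Estimate column would come from the fit on the full data, and each row of the bootstrap matrix from refitting the LASSO on one resample.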

Diagnostics

We can visualize the stability selection results and the cross-validation deviance curve.

# Plot Stability Selection (Plot 1) and CV Deviance (Plot 2)
plot(fit, which = c(1, 2))
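Plot 2 shows cross-validated deviance as a function of lambda. As a rough illustration of how such a curve can be computed, here is a self-contained K-fold sketch for the Gaussian case that reuses a small ISTA solver and scores each held-out fold by mean squared prediction error (the Gaussian deviance per observation). The fold scheme, the lambda grid, and picking the minimizing lambda (rather than, say, a one-standard-error rule) are assumptions; gradLasso's own CV procedure may differ, and `cv_lasso()` and `lasso_ista()` are hypothetical names.

```r
# Soft-thresholding operator used by the proximal step
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Compact ISTA solver for the Gaussian LASSO (centered X, unpenalized intercept)
lasso_ista <- function(X, y, lambda, max_iter = 2000, tol = 1e-7) {
  n <- nrow(X)
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)
  yc <- y - ym
  step <- 1 / max(eigen(crossprod(Xc) / n, symmetric = TRUE, only.values = TRUE)$values)
  b <- numeric(ncol(X))
  for (i in seq_len(max_iter)) {
    grad <- -as.vector(crossprod(Xc, yc - Xc %*% b)) / n
    b_new <- soft_threshold(b - step * grad, step * lambda)
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break
  }
  list(intercept = ym - sum(xm * b), beta = b)
}

# K-fold cross-validated deviance (mean squared error) over a lambda grid
cv_lasso <- function(X, y, lambdas, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))  # random fold assignment
  dev <- matrix(NA_real_, nrow = K, ncol = length(lambdas))
  for (k in seq_len(K)) {
    train <- folds != k
    for (j in seq_along(lambdas)) {
      fit_k <- lasso_ista(X[train, , drop = FALSE], y[train], lambdas[j])
      pred <- fit_k$intercept + as.vector(X[!train, , drop = FALSE] %*% fit_k$beta)
      dev[k, j] <- mean((y[!train] - pred)^2)  # held-out Gaussian deviance per observation
    }
  }
  cv_dev <- colMeans(dev)
  list(lambdas = lambdas, cv_dev = cv_dev, lambda_min = lambdas[which.min(cv_dev)])
}

set.seed(42)
X <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)
y <- as.vector(X[, 1:5] %*% c(1, -1, 1, -1, 1)) + rnorm(200)
lambdas <- exp(seq(log(1), log(0.005), length.out = 20))  # decreasing grid
cv <- cv_lasso(X, y, lambdas)
cv$lambda_min
# plot(log(cv$lambdas), cv$cv_dev, type = "b", xlab = "log(lambda)", ylab = "CV deviance")
```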

2. Zero-Inflated Negative Binomial (ZINB)

gradLasso specializes in complex GLMs like ZINB. We support a pipe syntax (|) to specify different predictors for the Count model and the Zero-Inflation model.

Simulation

We simulate data where the count model depends on different variables than the zero-inflation model.

set.seed(456)
sim_zinb <- simulate_data(n = 500, p = 20, family = "zinb",
                          k_mu = 5, k_pi = 5, theta = 2.0)
df_zinb <- data.frame(y = sim_zinb$y, sim_zinb$X)
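To show what "different variables" means for the two parts, here is a hypothetical base-R generator in the spirit of `simulate_data(family = "zinb")`. It assumes a log link for the negative binomial mean, a logit link for the structural-zero probability, disjoint active sets (the first k_mu predictors drive the counts, the next k_pi drive the zeros), and theta as the NB size parameter; none of these details are taken from the package source, and `simulate_zinb()` is our own name.

```r
# Hypothetical ZINB generator; coefficients, intercepts and links are assumptions.
simulate_zinb <- function(n, p, k_mu, k_pi, theta) {
  stopifnot(k_mu + k_pi <= p)
  X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("Var", seq_len(p))))
  beta_mu <- c(rep(0.5, k_mu), rep(0, p - k_mu))                     # count-model effects
  beta_pi <- c(rep(0, k_mu), rep(0.5, k_pi), rep(0, p - k_mu - k_pi)) # zero-model effects
  mu <- exp(0.5 + as.vector(X %*% beta_mu))        # log link: NB mean
  p_zero <- plogis(-1 + as.vector(X %*% beta_pi))  # logit link: P(structural zero)
  structural <- rbinom(n, 1, p_zero) == 1
  counts <- rnbinom(n, size = theta, mu = mu)      # theta = NB size (dispersion)
  list(y = ifelse(structural, 0, counts), X = X, structural = structural, p_zero = p_zero)
}

set.seed(456)
sim_z <- simulate_zinb(n = 500, p = 20, k_mu = 5, k_pi = 5, theta = 2)
mean(sim_z$y == 0)  # share of zeros, coming from both the zero and the count process
```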

Model Fitting

We use the pipe syntax: y ~ predictors_for_count | predictors_for_zero. Here we use all variables (.) for both models.
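One way to read a two-part formula is to split its right-hand side at the top-level | into a count formula and a zero-inflation formula, then build one design matrix for each. The base-R sketch below does exactly that; `split_two_part()` is a hypothetical helper, and we do not know how gradLasso parses formulas internally (the `count | zero` convention itself is the same one used by `pscl::zeroinfl`).

```r
# Split y ~ count_terms | zero_terms into two ordinary formulas.
split_two_part <- function(f) {
  rhs <- f[[3]]
  has_pipe <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  count_f <- f
  zero_f <- f
  if (has_pipe) {
    count_f[[3]] <- rhs[[2]]  # left of |: count-model predictors
    zero_f[[3]] <- rhs[[3]]   # right of |: zero-inflation predictors
  }
  list(count = count_f, zero = zero_f)
}

toy <- data.frame(y = c(0, 3, 0, 5), Var1 = c(1, 2, 3, 4), Var2 = c(0.5, -1, 2, 0))
parts <- split_two_part(y ~ Var1 | Var2)
X_count <- model.matrix(parts$count, data = toy)  # columns: (Intercept), Var1
X_zero <- model.matrix(parts$zero, data = toy)    # columns: (Intercept), Var2
```

With `y ~ . | .`, each part becomes `y ~ .`, and `model.matrix()` expands `.` to every column except `y`.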

# We use a smaller number of bootstraps for speed in this vignette
fit_zinb <- gradLasso(y ~ . | ., data = df_zinb,
                      family = grad_zinb(),
                      n_boot = 10,
                      lambda = 0.05) # Supplied for demonstration; the output below reports a CV-selected lambda

print(fit_zinb)
#> 
#> gradLasso Fitted Object
#> Family: zinb 
#> Lambda: 0.0295 
#> Deviance: 1202 
#> Use plot() to view diagnostics or summary() for coefficients.
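For reference, the zero-inflated negative binomial likelihood mixes a point mass at zero with an NB count distribution: P(Y = 0) = pi + (1 - pi) * NB(0; mu, theta) and P(Y = y) = (1 - pi) * NB(y; mu, theta) for y > 0. The sketch below evaluates the negative log-likelihood under the usual log link for mu and logit link for pi, plus an L1 penalty on the non-intercept coefficients of both parts. The helper names, and the assumption that both parts share a single lambda with unpenalized intercepts, are ours; gradLasso's exact objective may differ.

```r
# Negative log-likelihood of a ZINB model (log link for mu, logit link for pi).
zinb_nll <- function(y, X_count, X_zero, beta_count, beta_zero, theta) {
  mu <- exp(as.vector(X_count %*% beta_count))   # NB mean
  p0 <- plogis(as.vector(X_zero %*% beta_zero))  # probability of a structural zero
  nb0 <- dnbinom(0, size = theta, mu = mu)       # NB probability of a zero count
  ll <- ifelse(y == 0,
               log(p0 + (1 - p0) * nb0),
               log1p(-p0) + dnbinom(y, size = theta, mu = mu, log = TRUE))
  -sum(ll)
}

# Hypothetical penalized objective: average NLL + lambda * L1 norm of the slopes
# (column 1 of each design matrix is assumed to be the unpenalized intercept).
zinb_objective <- function(y, X_count, X_zero, beta_count, beta_zero, theta, lambda) {
  penalty <- sum(abs(beta_count[-1])) + sum(abs(beta_zero[-1]))
  zinb_nll(y, X_count, X_zero, beta_count, beta_zero, theta) / length(y) + lambda * penalty
}

# Single-observation check: mu = 1, pi = 0.5, theta = 1, so NB(0) = 0.5
# and P(Y = 0) = 0.5 + 0.5 * 0.5 = 0.75.
one <- matrix(1)
zinb_nll(0, one, one, beta_count = 0, beta_zero = 0, theta = 1)  # equals -log(0.75)
```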

Inspecting ZINB Coefficients

The summary automatically splits the coefficients into count-model, zero-inflation-model, and dispersion (Theta) components.

summary(fit_zinb)
#> 
#> ------------------------------------------------
#> gradLasso Model Summary
#> ------------------------------------------------
#> Family:   zinb
#> Deviance: 1202.12
#> AIC:      1240.12
#> BIC:      1320.20
#> DF:       19
#> ------------------------------------------------
#> Lambda:   0.0295 (Selected via CV)
#> Method:   Stability Selection (10 bootstraps)
#> Interval: 2.5% - 97.5%
#> 
#> --- Count Model Coefficients ---
#>    Predictor Estimate Selection_Prob Boot_Mean  CI_2.5 CI_97.5
#>  (Intercept)   0.2891            1.0    0.3411  0.1989  0.4096
#>         Var1   0.3159            1.0    0.2416  0.1681  0.3159
#>         Var2  -0.2687            1.0   -0.3654 -0.4571 -0.2199
#>         Var3   0.4674            1.0    0.5103  0.4660  0.6454
#>         Var4  -0.0798            1.0   -0.0887 -0.2103  0.0115
#>         Var5   0.2063            1.0    0.2291  0.1913  0.3321
#>         Var6   0.0913            1.0    0.0742  0.0478  0.1382
#>         Var7  -0.1512            1.0   -0.2090 -0.2888 -0.0937
#>        Var16  -0.0156            0.9   -0.0798 -0.2015  0.0000
#>        Var19   0.0211            0.9    0.0247  0.0047  0.0313
#>        Var12   0.0000            0.8    0.0273  0.0000  0.0547
#>        Var17  -0.0548            0.8   -0.0973 -0.1618  0.0000
#>         Var9   0.0168            0.7    0.0367  0.0000  0.0667
#>        Var13   0.0000            0.6    0.0084  0.0000  0.0651
#>        Var14   0.0000            0.6   -0.0183 -0.0681  0.0000
#>        Var20   0.0000            0.5    0.0170  0.0000  0.0848
#>         Var8   0.0000            0.4   -0.0034 -0.0171  0.0000
#>        Var10   0.0000            0.3    0.0000  0.0000  0.0000
#>        Var11   0.0000            0.3    0.0229  0.0000  0.1014
#>        Var18   0.0000            0.3    0.0050  0.0000  0.0258
#>        Var15   0.0000            0.1    0.0000  0.0000  0.0000
#> 
#> --- Zero-Inflation Model Coefficients ---
#>    Predictor Estimate Selection_Prob Boot_Mean  CI_2.5 CI_97.5
#>         Var6  -0.0114            0.9   -0.1323 -0.2006 -0.0026
#>         Var5   0.0457            0.8    0.0409  0.0000  0.1585
#>  (Intercept)   0.0104            0.7   -0.0059 -0.0260  0.0000
#>         Var1  -0.0413            0.7   -0.1890 -0.3615  0.0000
#>        Var12  -0.0045            0.7   -0.0009 -0.0045  0.0000
#>         Var4  -0.0653            0.6   -0.0499 -0.1840  0.0000
#>        Var13   0.0000            0.6    0.0062  0.0000  0.0124
#>         Var7   0.0000            0.2    0.0245  0.0000  0.1897
#>        Var14   0.0000            0.2    0.0000  0.0000  0.0000
#>         Var3   0.0000            0.1    0.0035  0.0000  0.0268
#>         Var8   0.0000            0.1    0.0000  0.0000  0.0000
#>        Var10   0.0000            0.1   -0.0048 -0.0371  0.0000
#>        Var11   0.0000            0.1    0.0000  0.0000  0.0000
#>        Var20   0.0000            0.1    0.0016  0.0000  0.0122
#> 
#> --- Dispersion ---
#> Theta: 1.3594
#> ------------------------------------------------
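The information criteria in both summaries are consistent with AIC = Deviance + 2 * DF and BIC = Deviance + log(n) * DF, with n = 200 for the Gaussian example and n = 500 for the ZINB example. DF also appears to equal the number of nonzero estimates: 20 in the Gaussian table (only Var12 is zero), and 12 count + 6 zero-inflation coefficients + Theta = 19 for ZINB. This relationship is inferred from the printed numbers rather than from package documentation; the short check below reproduces them. `info_criteria()` is a hypothetical helper.

```r
# Information criteria as they appear to be computed by summary()
info_criteria <- function(deviance, df, n) {
  c(AIC = deviance + 2 * df, BIC = deviance + log(n) * df)
}

round(info_criteria(deviance = 61.20, df = 20, n = 200), 2)    # AIC 101.20, BIC 167.17
round(info_criteria(deviance = 1202.12, df = 19, n = 500), 2)  # AIC 1240.12, BIC 1320.20
```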

3. Parallel Processing

For large datasets, gradLasso supports parallel execution of both cross-validation and bootstrapping.

# Example (not run in vignette):
# fit <- gradLasso(y ~ ., data = df, parallel = TRUE, n_cores = 4)
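The vignette does not say which parallel backend gradLasso uses. As a generic illustration of why bootstrapping parallelizes well (each resample is independent), here is a base-R sketch that distributes bootstrap replicates with `parallel::mclapply()`. The statistic (an `lm()` slope) and the helper `boot_once()` are hypothetical, and because `mclapply()` cannot fork on Windows, the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical helper: one bootstrap replicate of a simple statistic
boot_once <- function(i, data) {
  idx <- sample.int(nrow(data), replace = TRUE)   # resample rows with replacement
  coef(lm(y ~ x, data = data[idx, ]))[["x"]]
}

RNGkind("L'Ecuyer-CMRG")  # parallel-safe random-number streams for forked workers
set.seed(1)
dat <- data.frame(x = rnorm(200))
dat$y <- 2 * dat$x + rnorm(200)

n_cores <- if (.Platform$OS.type == "windows") 1L else 2L  # forking is Unix-only
boot_est <- unlist(mclapply(seq_len(50), boot_once, data = dat, mc.cores = n_cores))
quantile(boot_est, c(0.025, 0.975))  # percentile bootstrap interval for the slope
```

Setting `RNGkind("L'Ecuyer-CMRG")` before `set.seed()` gives each forked worker its own random-number stream, so results are reproducible for a fixed number of cores.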

Conclusion

gradLasso provides a unified, tidy interface for sparse regression across multiple GLM families. Its integrated stability selection reports how often each coefficient is selected across bootstrap resamples, which helps separate stable predictors from noise in high-dimensional data.
