The gradLasso package implements an efficient gradient
descent solver for LASSO-penalized regression models. It supports
several families including Gaussian, Binomial, Negative Binomial, and
Zero-Inflated Negative Binomial (ZINB). It also features built-in
stability selection and cross-validation.
This vignette demonstrates the basic usage of the package.
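As background for the phrase "gradient descent solver", here is a minimal base-R sketch of proximal gradient descent (ISTA) for the Gaussian LASSO objective. It is an illustration under stated assumptions, not gradLasso's actual implementation: the function names `soft_threshold` and `lasso_ista`, the fixed step size, and the centred, intercept-free setup are all hypothetical choices made for this example.

```r
# Soft-thresholding operator: the proximal map of the L1 penalty.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Proximal gradient descent (ISTA) for the objective
#   (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
# Assumes centred columns of X and centred y, so no intercept is fitted.
lasso_ista <- function(X, y, lambda, n_iter = 500) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  # Step size 1/L, where L is the Lipschitz constant of the smooth part.
  L <- max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)
  for (i in seq_len(n_iter)) {
    grad <- -crossprod(X, y - X %*% beta) / n   # gradient of the squared-error term
    beta <- soft_threshold(beta - grad / L, lambda / L)
  }
  drop(beta)
}

# Toy data: two active predictors out of five.
set.seed(1)
X <- scale(matrix(rnorm(100 * 5), 100, 5), center = TRUE, scale = FALSE)
y <- drop(X %*% c(2, -1, 0, 0, 0) + rnorm(100, sd = 0.5))
y <- y - mean(y)

beta_hat <- lasso_ista(X, y, lambda = 0.1)
round(beta_hat, 3)
```

The penalty shrinks the two active coefficients toward zero and tends to set the inactive ones exactly to zero, which is the behaviour the selection tables in this vignette rely on.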
We start by simulating simple Gaussian data with correlated predictors.
set.seed(42)
# Simulate 200 obs, 20 predictors, 5 active
sim <- simulate_data(n = 200, p = 20, family = "gaussian", k = 5, snr = 3.0)
df <- data.frame(y = sim$y, sim$X)
# Check the first few rows
head(df[, 1:6])
#>            y       Var1       Var2       Var3        Var4        Var5
#> 1  2.4088118  1.3709584 -1.6863106  0.9706798 -0.04932661  0.66502569
#> 2 -1.1866537 -0.5646982  0.2140939 -0.8088901  0.25200976  0.76083532
#> 3 -0.5444072  0.3631284  1.2202852  0.2984229  1.02738323  0.41846489
#> 4 -0.4884482  0.6328626  2.1445006  0.4769757  0.91408140 -0.01476616
#> 5 -1.2214254  0.4042683 -1.2681897 -0.8203085 -0.81123831 -1.50034498
#> 6 -1.2940614 -0.1061245 -1.1488285 -1.2083257  1.29080373 -0.04453506
We can fit the model using the standard formula interface. By
default, gradLasso performs 50 bootstraps for stability
selection.
fit <- gradLasso(y ~ ., data = df, lambda_cv = TRUE, boot = TRUE, n_boot = 50)
print(fit)
#>
#> gradLasso Fitted Object
#> Family: gaussian
#> Lambda: 0.009446
#> Deviance: 61
#> Use plot() to view diagnostics or summary() for coefficients.
We can inspect the selected coefficients using
summary(). The “Selection_Prob” column shows how often each
variable was selected across bootstrap iterations.
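To make the meaning of a selection probability concrete before looking at the output, here is a small base-R sketch of the bootstrap bookkeeping. It is only an illustration: the selector `select_vars` is a deliberately simple, hypothetical stand-in (a threshold on absolute sample covariances), not the LASSO fit that gradLasso runs on each bootstrap sample, and the threshold `lambda = 0.5` is an arbitrary choice.

```r
set.seed(1)
n <- 200
p <- 6
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("Var", 1:p)))
y <- drop(X[, 1:2] %*% c(2, -2) + rnorm(n))   # only Var1 and Var2 are active

# Stand-in selector (an assumption, not gradLasso's solver): keep variable j
# when the absolute sample covariance between X_j and y exceeds lambda.
select_vars <- function(X, y, lambda) {
  abs(drop(cov(X, y))) > lambda
}

n_boot <- 50
selected <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)            # resample rows with replacement
  select_vars(X[idx, , drop = FALSE], y[idx], lambda = 0.5)
})

# Selection probability: share of bootstrap fits in which each variable was kept.
selection_prob <- rowMeans(selected)
round(selection_prob, 2)
```

A value near 1 means the variable survived in almost every resample; values well below 1 indicate that selection depends on which rows happened to be drawn.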
summary(fit)
#>
#> ------------------------------------------------
#> gradLasso Model Summary
#> ------------------------------------------------
#> Family: gaussian
#> Deviance: 61.20
#> AIC: 101.20
#> BIC: 167.17
#> DF: 20
#> ------------------------------------------------
#> Lambda: 0.009446 (Selected via CV)
#> Method: Stability Selection (50 bootstraps)
#> Interval: 2.5% - 97.5%
#>
#> --- Selected Coefficients ---
#> Predictor   Estimate Selection_Prob Boot_Mean  CI_2.5 CI_97.5
#> Var1          0.4690           1.00    0.4763  0.4054  0.5365
#> Var2         -0.4402           1.00   -0.4364 -0.5347 -0.3436
#> Var3          0.6597           1.00    0.6626  0.5642  0.7326
#> Var4         -0.3445           1.00   -0.3404 -0.4040 -0.2621
#> Var5          0.4756           1.00    0.4894  0.4069  0.5586
#> Var8          0.1467           1.00    0.1574  0.0963  0.2167
#> Var13        -0.0868           1.00   -0.0832 -0.1502 -0.0190
#> Var17         0.0540           0.98    0.0545  0.0011  0.1313
#> (Intercept)  -0.0739           0.96   -0.0701 -0.1734  0.0000
#> Var14         0.0470           0.96    0.0386 -0.0083  0.0976
#> Var15         0.0367           0.96    0.0432 -0.0147  0.1094
#> Var19         0.0403           0.96    0.0407 -0.0448  0.1102
#> Var10        -0.0316           0.90   -0.0208 -0.1069  0.0160
#> Var11        -0.0142           0.90   -0.0248 -0.0871  0.0617
#> Var7         -0.0277           0.88   -0.0321 -0.1171  0.0277
#> Var18        -0.0176           0.88   -0.0172 -0.0748  0.0562
#> Var20         0.0082           0.86    0.0127 -0.0377  0.0845
#> Var6          0.0033           0.84    0.0140 -0.0582  0.0662
#> Var16         0.0236           0.84    0.0244 -0.0483  0.0945
#> Var9         -0.0349           0.82   -0.0379 -0.1221  0.0150
#> Var12         0.0000           0.78   -0.0011 -0.0491  0.0466
#> ------------------------------------------------
gradLasso specializes in complex GLMs like ZINB. We
support a pipe syntax (|) to specify different predictors
for the Count model and the Zero-Inflation model.
Simulation
We simulate data where the count model depends on different variables than the zero-inflation model.
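For readers unfamiliar with the model, here is a base-R sketch of how zero-inflated negative binomial data can be generated by hand when the two components depend on different predictors. It is independent of `simulate_data()` and makes several assumptions not stated in this vignette: a log link for the count mean, a logit link for the zero-inflation probability, `theta` playing the role of the `size` argument of `rnbinom()`, and the hypothetical names `x1`, `x2`, `pi0`, and `dzinb`.

```r
set.seed(1)
n  <- 1000
x1 <- rnorm(n)   # hypothetical predictor used only in the count model
x2 <- rnorm(n)   # hypothetical predictor used only in the zero-inflation model

mu    <- exp(0.5 + 0.4 * x1)     # count mean, log link (assumed)
pi0   <- plogis(-1 + 0.8 * x2)   # structural-zero probability, logit link (assumed)
theta <- 2                       # negative binomial dispersion (assumed to be `size`)

# Draw a structural zero with probability pi0, otherwise a negative binomial count.
is_zero <- rbinom(n, size = 1, prob = pi0) == 1
y <- ifelse(is_zero, 0L, rnbinom(n, size = theta, mu = mu))

# ZINB probability mass function implied by that mechanism:
# extra mass pi0 at zero, and the NB mass scaled by (1 - pi0) everywhere.
dzinb <- function(y, mu, theta, pi0) {
  pi0 * (y == 0) + (1 - pi0) * dnbinom(y, size = theta, mu = mu)
}

# Observed share of zeros versus the model-implied average P(Y = 0).
c(observed = mean(y == 0), implied = mean(dzinb(0, mu, theta, pi0)))
```

The two numbers in the last line should be close, since zeros arise both from the inflation component and from the negative binomial itself.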
set.seed(456)
sim_zinb <- simulate_data(n = 500, p = 20, family = "zinb",
                          k_mu = 5, k_pi = 5, theta = 2.0)
df_zinb <- data.frame(y = sim_zinb$y, sim_zinb$X)
We use the pipe syntax:
y ~ predictors_for_count | predictors_for_zero. Here we use
all variables (.) for both models.
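A two-part formula of this shape is ordinary R syntax: `|` binds more tightly than `~`, so the right-hand side parses as a single call to `|`. The sketch below shows one way such a formula could be split into its two parts using base R only. The helper `split_two_part` is hypothetical and is not claimed to be how gradLasso parses its input.

```r
# Split `y ~ a | b` into a count formula (y ~ a) and a zero-inflation formula (~ b).
split_two_part <- function(f) {
  rhs <- f[[3]]                                   # right-hand side of the formula
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|"))) {
    stop("expected a formula of the form y ~ a | b")
  }
  to_text <- function(expr) paste(deparse(expr), collapse = " ")
  list(
    count = reformulate(to_text(rhs[[2]]), response = to_text(f[[2]])),
    zero  = reformulate(to_text(rhs[[3]]))
  )
}

parts <- split_two_part(y ~ x1 + x2 | x3)
parts$count
parts$zero

# The same helper also accepts the "all variables" shorthand on both sides.
parts_dot <- split_two_part(y ~ . | .)
```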
# We use a smaller number of bootstraps for speed in this vignette
fit_zinb <- gradLasso(y ~ . | ., data = df_zinb,
                      family = grad_zinb(),
                      n_boot = 10,
                      lambda = 0.05) # Fixed lambda for demonstration
print(fit_zinb)
#>
#> gradLasso Fitted Object
#> Family: zinb
#> Lambda: 0.0295
#> Deviance: 1202
#> Use plot() to view diagnostics or summary() for coefficients.
The summary automatically splits coefficients into “Count”, “Zero-Infl”, and “Dispersion” components.
summary(fit_zinb)
#>
#> ------------------------------------------------
#> gradLasso Model Summary
#> ------------------------------------------------
#> Family: zinb
#> Deviance: 1202.12
#> AIC: 1240.12
#> BIC: 1320.20
#> DF: 19
#> ------------------------------------------------
#> Lambda: 0.0295 (Selected via CV)
#> Method: Stability Selection (10 bootstraps)
#> Interval: 2.5% - 97.5%
#>
#> --- Count Model Coefficients ---
#> Predictor   Estimate Selection_Prob Boot_Mean  CI_2.5 CI_97.5
#> (Intercept)   0.2891            1.0    0.3411  0.1989  0.4096
#> Var1          0.3159            1.0    0.2416  0.1681  0.3159
#> Var2         -0.2687            1.0   -0.3654 -0.4571 -0.2199
#> Var3          0.4674            1.0    0.5103  0.4660  0.6454
#> Var4         -0.0798            1.0   -0.0887 -0.2103  0.0115
#> Var5          0.2063            1.0    0.2291  0.1913  0.3321
#> Var6          0.0913            1.0    0.0742  0.0478  0.1382
#> Var7         -0.1512            1.0   -0.2090 -0.2888 -0.0937
#> Var16        -0.0156            0.9   -0.0798 -0.2015  0.0000
#> Var19         0.0211            0.9    0.0247  0.0047  0.0313
#> Var12         0.0000            0.8    0.0273  0.0000  0.0547
#> Var17        -0.0548            0.8   -0.0973 -0.1618  0.0000
#> Var9          0.0168            0.7    0.0367  0.0000  0.0667
#> Var13         0.0000            0.6    0.0084  0.0000  0.0651
#> Var14         0.0000            0.6   -0.0183 -0.0681  0.0000
#> Var20         0.0000            0.5    0.0170  0.0000  0.0848
#> Var8          0.0000            0.4   -0.0034 -0.0171  0.0000
#> Var10         0.0000            0.3    0.0000  0.0000  0.0000
#> Var11         0.0000            0.3    0.0229  0.0000  0.1014
#> Var18         0.0000            0.3    0.0050  0.0000  0.0258
#> Var15         0.0000            0.1    0.0000  0.0000  0.0000
#>
#> --- Zero-Inflation Model Coefficients ---
#> Predictor   Estimate Selection_Prob Boot_Mean  CI_2.5 CI_97.5
#> Var6         -0.0114            0.9   -0.1323 -0.2006 -0.0026
#> Var5          0.0457            0.8    0.0409  0.0000  0.1585
#> (Intercept)   0.0104            0.7   -0.0059 -0.0260  0.0000
#> Var1         -0.0413            0.7   -0.1890 -0.3615  0.0000
#> Var12        -0.0045            0.7   -0.0009 -0.0045  0.0000
#> Var4         -0.0653            0.6   -0.0499 -0.1840  0.0000
#> Var13         0.0000            0.6    0.0062  0.0000  0.0124
#> Var7          0.0000            0.2    0.0245  0.0000  0.1897
#> Var14         0.0000            0.2    0.0000  0.0000  0.0000
#> Var3          0.0000            0.1    0.0035  0.0000  0.0268
#> Var8          0.0000            0.1    0.0000  0.0000  0.0000
#> Var10         0.0000            0.1   -0.0048 -0.0371  0.0000
#> Var11         0.0000            0.1    0.0000  0.0000  0.0000
#> Var20         0.0000            0.1    0.0016  0.0000  0.0122
#>
#> --- Dispersion ---
#> Theta: 1.3594
#> ------------------------------------------------
For large datasets, gradLasso supports parallel
execution for both Cross-Validation and Bootstrapping.
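The arguments gradLasso uses to enable parallel execution are not shown in this text, so the following sketch does not call the package. It only illustrates, with base R's `parallel` package, why bootstrapping parallelizes well: each replicate is independent of the others. The helper `boot_fit` is hypothetical, and `lm()` stands in for the penalized fit.

```r
library(parallel)

set.seed(1)
n  <- 500
df <- data.frame(x = rnorm(n))
df$y <- 1 + 2 * df$x + rnorm(n)

# One bootstrap replicate: resample rows and refit. lm() is a stand-in for the
# penalized fit; it is not what gradLasso runs internally.
boot_fit <- function(i, data) {
  idx <- sample(nrow(data), replace = TRUE)
  coef(lm(y ~ x, data = data[idx, ]))
}

cl <- makeCluster(2)                    # two local worker processes
clusterSetRNGStream(cl, iseed = 123)    # reproducible, independent RNG streams
boot_coefs <- parLapply(cl, seq_len(20), boot_fit, data = df)
stopCluster(cl)

# Stack the replicates and take percentile intervals per coefficient.
boot_mat <- do.call(rbind, boot_coefs)
apply(boot_mat, 2, quantile, probs = c(0.025, 0.975))
```

The same pattern applies to cross-validation, where each fold can be fitted on a separate worker.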
Conclusion
gradLasso provides a unified, tidy interface for sparse
regression across multiple GLM families. Its integrated stability
selection offers robust variable selection for high-dimensional
data.