Preliminary
Consider the following setting:
Gaussian graphical model (GGM) assumption:
The data $X_{n \times d}$ consists of independent and identically distributed samples $X_1, \ldots, X_n \sim \mathcal{N}_d(\mu, \Sigma)$.
Disjoint group structure:
The d variables can be partitioned into disjoint groups.
Goal:
Estimate the precision matrix $\Omega = \Sigma^{-1} = (\omega_{ij})_{d \times d}$.
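A minimal R sketch of this setting is given below; the group sizes, the identity covariance, and the use of MASS::mvrnorm() are illustrative assumptions only, not requirements of the package.
library(MASS)  ## for mvrnorm()

set.seed(1)
n <- 100; d <- 9
## hypothetical disjoint group structure: three groups of three variables each
group <- rep(1:3, each = 3)

## simulate X_1, ..., X_n ~ N_d(mu, Sigma); identity Sigma purely for illustration
Sigma <- diag(d)
X <- mvrnorm(n, mu = rep(0, d), Sigma = Sigma)  ## X is n x d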
Sparse-Group Estimator
The sparse-group estimator is the solution of the penalized negative Gaussian log-likelihood problem
$$
\hat{\Omega}
= \arg\min_{\Omega \succ 0}
\left\{ \mathrm{tr}(S\Omega) - \log\det(\Omega) + \lambda P_{\alpha, \gamma}(\Omega) \right\},
\qquad
P_{\alpha, \gamma}(\Omega)
= \alpha P_\gamma^{\mathrm{idv}}(\Omega) + (1 - \alpha) P_\gamma^{\mathrm{grp}}(\Omega),
$$
where:
$S = n^{-1} \sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^\top$ is the empirical covariance matrix.
λ ≥ 0 is the global regularization parameter controlling overall shrinkage.
α ∈ [0, 1] is the mixing parameter controlling the balance between element-wise and block-wise penalties.
γ is the additional parameter controlling the curvature and effective degree of nonconvexity of the penalty.
$P_{\alpha, \gamma}(\Omega)$ is a generic bi-level penalty template that can incorporate convex or non-convex regularizers while preserving the intrinsic group structure among variables.
$P_\gamma^{\mathrm{idv}}(\Omega)$ is the element-wise individual penalty component.
$P_\gamma^{\mathrm{grp}}(\Omega)$ is the block-wise group penalty component.
$p_\gamma(\cdot)$ is a penalty kernel parameterized by $\gamma$.
$\Omega_{gg'}$ is the submatrix of $\Omega$ with the rows from group $g$ and the columns from group $g'$.
The Frobenius norm $\Vert\Omega\Vert_F$ is defined as $\Vert\Omega\Vert_F = (\sum_{i,j} \vert\omega_{ij}\vert^2)^{1/2} = [\mathrm{tr}(\Omega^\top \Omega)]^{1/2}$.
Note:
The regularization parameter $\lambda$ acts as the scale factor for the entire penalty term $\lambda P_{\alpha, \gamma}(\Omega)$.
The penalty kernel $p_\gamma(\cdot)$ is the shape function that governs the fundamental characteristics of the regularization.
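For reference, these building blocks can be computed directly in base R. The sketch below reuses the illustrative X and group objects from the Preliminary section and applies the block notation to S only to make it concrete; it is not part of the package workflow.
## empirical covariance matrix S = n^{-1} * sum_i (X_i - Xbar)(X_i - Xbar)^T
Xc <- scale(X, center = TRUE, scale = FALSE)  ## column-centered data
S  <- crossprod(Xc) / nrow(X)

## Frobenius norm of the block with rows in group g and columns in group g'
frob_block <- function(M, group, g, gp) {
  sqrt(sum(M[group == g, group == gp, drop = FALSE]^2))
}
frob_block(S, group, g = 1, gp = 2)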
Penalties
- Lasso: Least absolute shrinkage and selection operator (Tibshirani 1996; Friedman, Hastie, and Tibshirani 2008)
$\lambda p(\omega_{ij}) = \lambda\vert\omega_{ij}\vert$.
- Adaptive lasso (Zou 2006; Fan, Feng, and Wu 2009)
$$
\lambda p_\gamma(\omega_{ij}) = \lambda\frac{\vert\omega_{ij}\vert}{v_{ij}},
$$ where $V = (v_{ij})_{d \times d} = (\vert\tilde{\omega}_{ij}\vert^{\gamma})_{d \times d}$ is a matrix of adaptive weights, and $\tilde{\omega}_{ij}$ is the initial estimate obtained using penalty = "lasso".
- Atan: Arctangent type penalty (Wang and Zhu 2016)
$$
\lambda p_\gamma(\omega_{ij})
= \lambda(\gamma+\frac{2}{\pi})
\arctan\left(\frac{\vert\omega_{ij}\vert}{\gamma}\right),
\quad \gamma > 0.
$$
- Exp: Exponential type penalty (Wang, Fan, and Zhu 2018)
$$
\lambda p_\gamma(\omega_{ij})
= \lambda\left[1-\exp\left(-\frac{\vert\omega_{ij}\vert}{\gamma}\right)\right],
\quad \gamma > 0.
$$
- Lq: Bridge-type penalty (Frank and Friedman 1993; Fu 1998; Fan and Li 2001)
$\lambda p_\gamma(\omega_{ij}) = \lambda\vert\omega_{ij}\vert^{\gamma}, \quad 0 < \gamma < 1$.
- LSP: Log-sum penalty (Candès, Wakin, and Boyd 2008)
$$
\lambda p_\gamma(\omega_{ij})
= \lambda\log\left(1+\frac{\vert\omega_{ij}\vert}{\gamma}\right),
\quad \gamma > 0.
$$
- MCP: Minimax concave penalty (Zhang 2010)
$$
\lambda p_\gamma(\omega_{ij})
= \begin{cases}
\lambda\vert\omega_{ij}\vert - \dfrac{\omega_{ij}^2}{2\gamma},
& \text{if } \vert\omega_{ij}\vert \leq \gamma\lambda, \\
\dfrac{1}{2}\gamma\lambda^2,
& \text{if } \vert\omega_{ij}\vert > \gamma\lambda.
\end{cases}
\quad \gamma > 1.
$$
- SCAD: Smoothly clipped absolute deviation (Fan and Li 2001; Fan, Feng, and Wu 2009)
$$
\lambda p_\gamma(\omega_{ij})
= \begin{cases}
\lambda\vert\omega_{ij}\vert
& \text{if } \vert\omega_{ij}\vert \leq \lambda, \\
\dfrac{2\gamma\lambda\vert\omega_{ij}\vert-\omega_{ij}^2-\lambda^2}{2(\gamma-1)}
& \text{if } \lambda < \vert\omega_{ij}\vert < \gamma\lambda, \\
\dfrac{\lambda^2(\gamma+1)}{2}
& \text{if } \vert\omega_{ij}\vert \geq \gamma\lambda.
\end{cases}
\quad \gamma > 2.
$$
Note:
For Lasso, which is convex, the additional parameter $\gamma$ is not required, and the penalty kernel $p_\gamma(\cdot)$ simplifies to $p(\cdot)$.
For MCP and SCAD, $\lambda$ plays a dual role: it is the global regularization parameter, but it is also implicitly contained within the kernel $p_\gamma(\cdot)$.
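To make the piecewise definitions above concrete, the following base-R sketch evaluates the MCP and SCAD kernels λpγ(ω) exactly as written; it is a stand-alone illustration with arbitrarily chosen default γ values, not the package's internal implementation.
## MCP: lambda*|w| - w^2/(2*gamma) for |w| <= gamma*lambda, constant afterwards
mcp_pen <- function(w, lambda = 1, gamma = 3) {
  a <- abs(w)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

## SCAD: linear up to lambda, quadratic bridge on (lambda, gamma*lambda),
## constant lambda^2*(gamma + 1)/2 beyond gamma*lambda
scad_pen <- function(w, lambda = 1, gamma = 3.7) {
  a <- abs(w)
  ifelse(a <= lambda,
         lambda * a,
         ifelse(a < gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

mcp_pen(c(0.5, 5))
scad_pen(c(0.5, 5))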
Illustrative Visualization
Figure 1 compares the penalty functions λp(ω) evaluated over a range of ω values. The main panel (right) gives a wider view of each penalty's behavior for larger |ω|, while the inset panel (left) magnifies the interval [−1, 1] around zero.
library(grasps) ## for penalty computation
library(ggplot2) ## for visualization
penalties <- c("atan", "exp", "lasso", "lq", "lsp", "mcp", "scad")
pen_df <- compute_penalty(seq(-4, 4, by = 0.01), penalties, lambda = 1)
plot(pen_df, xlim = c(-1, 1), ylim = c(0, 1), zoom.size = 1) +
  guides(color = guide_legend(nrow = 2, byrow = TRUE))
Figure 2 displays the derivative function p′(ω) associated with a range of penalty types. The Lasso exhibits a constant derivative, corresponding to uniform shrinkage. For MCP and SCAD, the derivatives are piecewise: initially equal to the Lasso derivative, then decreasing over an intermediate region, and eventually dropping to zero, indicating that large |ω| receive no shrinkage. Other non-convex penalties show smoothly diminishing derivatives as |ω| increases, reflecting their tendency to shrink small |ω| strongly while exerting little to no shrinkage on large ones.
deriv_df <- compute_derivative(seq(0, 4, by = 0.01), penalties, lambda = 1)
plot(deriv_df) +
  scale_y_continuous(limits = c(0, 1.5)) +
  guides(color = guide_legend(nrow = 2, byrow = TRUE))
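The piecewise behavior described above also follows directly from differentiating the MCP and SCAD definitions. The sketch below gives those closed-form derivatives for ω ≥ 0 in plain R, independently of compute_derivative(), purely as a cross-check of the shapes seen in Figure 2.
## derivative of the MCP penalty: lambda - w/gamma up to gamma*lambda, then 0
mcp_deriv <- function(w, lambda = 1, gamma = 3) {
  ifelse(w <= gamma * lambda, lambda - w / gamma, 0)
}

## derivative of the SCAD penalty: constant lambda up to lambda,
## then (gamma*lambda - w)/(gamma - 1), then 0 beyond gamma*lambda
scad_deriv <- function(w, lambda = 1, gamma = 3.7) {
  ifelse(w <= lambda, lambda,
         ifelse(w < gamma * lambda, (gamma * lambda - w) / (gamma - 1), 0))
}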
References
Candès, Emmanuel J., Michael B. Wakin, and Stephen P. Boyd. 2008.
“Enhancing Sparsity by Reweighted ℓ1 Minimization.” Journal of Fourier Analysis and Applications 14 (5): 877–905.
https://doi.org/10.1007/s00041-008-9045-x.
Fan, Jianqing, Yang Feng, and Yichao Wu. 2009.
“Network Exploration via the Adaptive LASSO and SCAD Penalties.” The Annals of Applied Statistics 3 (2): 521–41.
https://doi.org/10.1214/08-aoas215.
Fan, Jianqing, and Runze Li. 2001.
“Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60.
https://doi.org/10.1198/016214501753382273.
Frank, Ildiko E., and Jerome H. Friedman. 1993.
“A Statistical View of Some Chemometrics Regression Tools.” Technometrics 35 (2): 109–35.
https://doi.org/10.1080/00401706.1993.10485033.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008.
“Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.
https://doi.org/10.1093/biostatistics/kxm045.
Fu, Wenjiang J. 1998.
“Penalized Regressions: The Bridge Versus the Lasso.” Journal of Computational and Graphical Statistics 7 (3): 397–416.
https://doi.org/10.1080/10618600.1998.10474784.
Tibshirani, Robert. 1996.
“Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88.
https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Wang, Yanxin, Qibin Fan, and Li Zhu. 2018.
“Variable Selection and Estimation Using a Continuous Approximation to the L0 Penalty.” Annals of the Institute of Statistical Mathematics 70 (1): 191–214.
https://doi.org/10.1007/s10463-016-0588-3.
Wang, Yanxin, and Li Zhu. 2016.
“Variable Selection and Parameter Estimation with the Atan Regularization Method.” Journal of Probability and Statistics 2016: 6495417.
https://doi.org/10.1155/2016/6495417.
Zhang, Cun-Hui. 2010.
“Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics 38 (2): 894–942.
https://doi.org/10.1214/09-AOS729.
Zou, Hui. 2006.
“The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101 (476): 1418–29.
https://doi.org/10.1198/016214506000000735.