The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Sparse penalties such as the Lasso treat all predictors symmetrically: coefficients are shrunk independently without any notion of which predictors are related. In many applications, however, the signal has a known structure — predictors are grouped, ordered, or connected through a graph — and penalties that exploit this structure can substantially improve estimation.
This vignette builds a progressively richer family of penalties on a single simulation, showing how incorporating structural knowledge step by step leads to better coefficient recovery.
We simulate a linear model with \(p = 95\) predictors and \(n = 50\) observations. The true coefficient vector \(\beta\) is piecewise constant: it is zero over three large segments (sizes 25, 25, 25) and takes values \(+1\) and \(-1\) over two smaller active blocks (sizes 10 each).
set.seed(42)
p <- 95
n <- 50
beta <- rep(c(0, 1, 0, -1, 0), c(25, 10, 25, 10, 25))
## Block-correlated predictors: high within-segment, zero across segments
rho <- 0.75
Soo <- toeplitz(rho^(0:24)) # Toeplitz correlation within zero segments
Sww <- matrix(rho, 10, 10) # constant correlation within active segments
Sigma <- bdiag(Soo, Sww, Soo, Sww, Soo)
diag(Sigma) <- 1
X <- as.matrix(matrix(rnorm(p * n), n, p) %*% chol(Sigma))
y <- X %*% beta + rnorm(n, 0, 10)
## Segment labels (for plot legends)
segments <- rep(1:5, c(25, 10, 25, 10, 25))
segment_names <- paste0("seg", 1:5)
seg_labels <- segment_names[segments]The five segments are clearly visible in the true signal:
data.frame(index = seq_along(beta), beta = beta, segment = seg_labels) |>
ggplot(aes(x = index, y = beta, colour = segment)) +
geom_step() + geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.4) +
labs(x = "Predictor index", y = expression(beta), title = "True coefficient vector") +
theme_bw()The structural prior we wish to encode is: predictors within the same segment should have similar coefficients. This is captured by a community graph whose edges connect all pairs of predictors belonging to the same segment (a union of disjoint cliques).
## Adjacency matrix: clique within each segment, no edges across
Ioo <- matrix(1, 25, 25)
Iww <- matrix(1, 10, 10)
A <- as.matrix(bdiag(Ioo, Iww, Ioo, Iww, Ioo))
diag(A) <- 0
## Graph Laplacian: L = D - A (regularized for positive definiteness)
L2 <- -A
diag(L2) <- colSums(A) + 1e-2The Laplacian \(L_2\) defines the quadratic form
\[\beta^\top L_2\, \beta = \sum_{(i,j)\in E} (\beta_i - \beta_j)^2,\]
which penalizes differences between connected predictors.
Adding \(\lambda_2 \beta^\top L_2 \beta /
2\) to the loss therefore encourages within-segment coefficients
to be equal — a perfect match for the piecewise-constant signal. Note
that \(L_2\) is passed as the
struct argument to ridge() and
elastic_net().
For the Fused Lasso, the graph structure is encoded differently:
struct takes the adjacency matrix \(A\) directly, since the fusion penalty
is
\[\lambda_2 \sum_{(i,j)\in E} |\beta_i - \beta_j|,\]
an \(\ell_1\) (not \(\ell_2\)) penalty on pairwise differences. The graph topology is the same, but the penalty form differs.
if (requireNamespace("igraph", quietly = TRUE)) {
g <- igraph::graph_from_adjacency_matrix(A, mode = "undirected")
cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#CC79A7")
vertex_colors <- cols[segments]
igraph::plot.igraph(
g, vertex.color = vertex_colors, vertex.size = 3,
vertex.label = NA, edge.width = 0.1,
main = "Community graph (colour = segment)"
)
legend("topleft", legend = segment_names, fill = cols, bty = "n", cex = 0.8)
}The Lasso applies an element-wise \(\ell_1\) penalty and treats all predictors independently. With highly correlated predictors it tends to select one representative per block arbitrarily, producing an erratic path.
fit_lasso <- lasso(X, y, intercept = FALSE)
fit_lasso
#> Linear regression with L1 penalizer.
#> - number of coefficients: 95 + intercept
#> - lambda regularization: 100 points from 44.7 to 0.447
#> - gamma regularization: 0The Fused Lasso adds an \(\ell_1\) penalty on differences between adjacent coefficients on top of the standard \(\ell_1\) penalty:
\[\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i|.\]
By default, fused_lasso() uses a chain
graph (consecutive pairs), which is appropriate for ordered or
spatial signals.
fit_fused_chain <- fused_lasso(X, y, lambda2 = 5, intercept = FALSE)
fit_fused_chain
#> Linear regression with Fused-LASSO penalizer.
#> - number of coefficients: 95 + intercept
#> - l1 regularization: 50 points from 44.7 to 0.447
#> - l2 regularization: 5Replacing the chain by the community adjacency matrix \(A\) fuses predictors within the same segment rather than consecutive ones. This directly targets the true block structure of \(\beta\).
fit_fused_comm <- fused_lasso(X, y, lambda2 = 5, struct = A, intercept = FALSE)
fit_fused_comm
#> Linear regression with Fused-LASSO penalizer.
#> - number of coefficients: 95 + intercept
#> - l1 regularization: 50 points from 44.7 to 0.447
#> - l2 regularization: 5The standard ridge regression shrinks all coefficients toward zero. The structured ridge replaces the identity by a positive semidefinite matrix \(S\):
\[\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \frac{\lambda_2}{2}\, \beta^\top S\, \beta.\]
With \(S = L_2\) (the community Laplacian), the penalty becomes \(\frac{\lambda_2}{2}\beta^\top L_2 \beta\), which shrinks within-segment differences rather than the coefficients themselves.
The path is a blur: all predictors are shrunk toward zero with no differentiation between segments.
fit_ridge_struct <- ridge(X, y, struct = L2, lambda_max = 1000, intercept = FALSE)
fit_ridge_struct$plot_path(labels = seg_labels)The Laplacian prior forces coefficients within the same segment to track each other. The two active segments (2 and 4) now emerge as coherent bundles.
The Elastic-net combines an \(\ell_1\) penalty with a quadratic penalty:
\[\min_\beta \frac{1}{2}\|y - X\beta\|^2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\, \beta^\top S\, \beta.\]
Setting \(S = L_2\) yields the structured Elastic-net: the \(\ell_1\) term drives entire segments to zero (sparsity at the segment level), while the Laplacian term forces the non-zero segments to be internally smooth.
fit_enet_struct <- elastic_net(X, y, lambda2 = 5, struct = L2, intercept = FALSE)
fit_enet_struct
#> Linear regression with L1 plus L2-regularization penalizer.
#> - number of coefficients: 95 + intercept
#> - lambda regularization: 100 points from 44.7 to 0.447
#> - gamma regularization: 5The two active segments now enter the path as well-separated, near-constant bundles while the three zero segments stay at zero.
We extract the BIC-selected coefficient vector for each method and compare visually against the true \(\beta\).
methods <- list(
"Lasso" = fit_lasso,
"Fused Lasso (chain)" = fit_fused_chain,
"Fused Lasso (community)"= fit_fused_comm,
"Ridge (standard)" = fit_ridge_std,
"Ridge (structured)" = fit_ridge_struct,
"Elastic-net (structured)"= fit_enet_struct
)
extract_coef <- function(fit) {
b <- fit$get_model("BIC")
if ("Intercept" %in% names(b)) b <- b[-1] # drop intercept
b
}
df_coef <- do.call(rbind, lapply(names(methods), function(nm) {
b <- extract_coef(methods[[nm]])
data.frame(
method = factor(nm, levels = names(methods)),
index = seq_along(b),
value = as.numeric(b),
segment = seg_labels
)
}))
df_true <- data.frame(
index = seq_along(beta),
beta = beta,
segment = seg_labels
)## Background shading for each segment
seg_bounds <- data.frame(
xmin = c(1, 26, 36, 61, 71),
xmax = c(25, 35, 60, 70, 95),
fill = factor(1:5)
)
ggplot(df_coef, aes(x = index, y = value)) +
geom_rect(data = seg_bounds,
aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = fill),
alpha = 0.08, inherit.aes = FALSE) +
scale_fill_manual(values = c("#E69F00","#56B4E9","#009E73","#F0E442","#CC79A7"),
guide = "none") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
geom_step(data = df_true, aes(y = beta), colour = "black", linewidth = 0.8,
linetype = "dashed") +
geom_point(aes(colour = segment), size = 0.8, alpha = 0.7) +
scale_colour_manual(values = c("#E69F00","#56B4E9","#009E73","#F0E442","#CC79A7")) +
facet_wrap(~ method, ncol = 2) +
labs(x = "Predictor index", y = "Estimated coefficient",
title = "Coefficient recovery by method",
subtitle = "Dashed line: true β",
colour = "Segment") +
theme_bw() +
theme(legend.position = "bottom")The progression illustrates the benefit of structural priors:
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.