The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

LAVA: Recovering Sums of Sparse and Dense Signals

The lava() and group_lava() estimators in quadrupen

Motivation

Classical sparse estimators such as the Lasso assume that the true coefficient vector \(\theta\) is itself sparse: only a few entries are nonzero. This assumption fails when the signal has a mixed structure — a small number of large, identifiable effects superimposed on a broad background of many small effects. In this case neither the Lasso (too sparse) nor ridge regression (too dense) is well-suited.

The LAVA estimator (Chernozhukov, Hansen & Liao, Annals of Statistics, 2017) explicitly decomposes the coefficient vector as

\[\theta = \beta + \delta,\]

where \(\beta\) is the sparse component (penalized by \(\lambda_1 \|\beta\|_1\)) and \(\delta\) is the dense component (penalized by \(\frac{\lambda_2}{2} \delta^\top S\, \delta\)). The joint criterion is

\[\min_{\theta = \beta + \delta} \frac{1}{2n} \|y - X(\beta + \delta)\|^2 + \lambda_1 \|\beta\|_1 + \frac{\lambda_2}{2}\, \delta^\top S\, \delta,\]

where \(S\) is a positive semidefinite structuring matrix (identity by default). By varying \(\lambda_1\) along a regularization path (with \(\lambda_2\) fixed), quadrupen fits this model via lava() and returns an object of class LavaFit that gives access to both components separately.

Setup

library(quadrupen)
library(ggplot2)

Simulation: a mixed sparse + dense signal

We simulate a linear model where the true signal is the sum of a sparse part \(\beta\) (five isolated large effects) and a dense part \(\delta\) (small Gaussian perturbations on every predictor). Neither component alone characterizes the signal.

set.seed(42)
n <- 100
p <- 50

## Sparse component: 5 large effects, the rest zero
beta  <- numeric(p)
beta[c(5, 15, 25, 35, 45)] <- c(3, -3, 3, -3, 3)

## Dense component: small effects on all predictors
delta <- rnorm(p, mean = 0, sd = 0.2)

## Total signal
theta <- beta + delta

## Predictors with moderate AR(1) correlation
rho   <- 0.5
Sigma <- toeplitz(rho^(0:(p - 1)))
X     <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
y     <- X %*% theta + rnorm(n, sd = 2)
df_truth <- data.frame(
  index     = rep(1:p, 3),
  value     = c(beta, delta, theta),
  component = factor(
    rep(c("Sparse (β)", "Dense (δ)", "Total (θ = β + δ)"), each = p),
    levels = c("Sparse (β)", "Dense (δ)", "Total (θ = β + δ)")
  )
)

ggplot(df_truth, aes(x = index, y = value, fill = component)) +
  geom_col() +
  facet_wrap(~ component, ncol = 1, scales = "free_y") +
  scale_fill_manual(values = c("#56B4E9", "#E69F00", "#009E73"), guide = "none") +
  labs(x = "Predictor index", y = "Value",
       title = "True signal decomposition") +
  theme_bw()

Fitting with lava()

lava() takes the same interface as sparse_lm(). The lambda2 argument controls the ridge-like penalty on the dense component.

fit_lava <- lava(X, y, lambda2 = 1)
fit_lava
#> Linear regression with Lava penalizer.
#> - number of coefficients: 50 + intercept
#> - lambda regularization: 100 points from 16.1 to 0.161
#> - gamma regularization: 1

The returned object is an R6 instance of class LavaFit, inheriting from QuadrupenFit. In addition to all the standard fields and methods ($coefficients, $criteria(), $cross_validate(), etc.), it exposes two component-specific fields:

Regularization paths by component

$plot_path() accepts a component argument ("both", "sparse", "dense") to display each part of the decomposition independently.

fit_lava$plot_path(component = "both",   labels = NULL)
fit_lava$plot_path(component = "sparse", labels = NULL)
fit_lava$plot_path(component = "dense",  labels = NULL)

As \(\lambda_1\) increases (right to left on the path):

Model Selection

Let us plot the information criteria along the path. The BIC and eBIC select a model with a couple of large effects in the sparse part, while AIC selects a more complex model with more non-zero entries in the sparse component. mBIC is more conservative, as expected.

fit_lava$information_criteria$plot(c("AIC", "BIC", "eBIC", "mBIC"))

Model extraction and decomposition

$get_model() extracts the total estimated coefficient \(\hat\theta\) at a selected \(\lambda_1\). The corresponding sparse and dense components can be recovered from $sparse_coef and $dense_coef at the same index.

idx <- fit_lava$get_model("BIC", type = "index")

theta_hat <- fit_lava$get_model("BIC")[-1]          # drop intercept
beta_hat  <- as.numeric(fit_lava$sparse_coef[, idx])
delta_hat <- as.numeric(fit_lava$dense_coef[, idx])

cat("Non-zero entries in sparse component (BIC):",
    sum(beta_hat != 0), "\n")
#> Non-zero entries in sparse component (BIC): 12
cat("Non-zero entries in dense component (BIC): ",
    sum(delta_hat != 0), "\n")
#> Non-zero entries in dense component (BIC):  50
cat("Positions with |beta_hat| > 1:",
    which(abs(beta_hat) > 1), "\n")
#> Positions with |beta_hat| > 1: 5 15 25 35 45

The sparse component correctly locates the five large effects — those with |beta_hat| > 1 match the true positions exactly. The dense component is non-zero everywhere, absorbing the background of small signals.

Comparing estimators

We compare LAVA against the Lasso and Elastic-net at their BIC-selected models. In the LAVA panel, bars are split into two layers drawn with geom_col: the total estimate \(\hat\theta = \hat\beta + \hat\delta\) is drawn first (pink, dense component), and the sparse component \(\hat\beta\) is overlaid on top (green). The remaining pink portion at the tips of active bars represents the dense contribution \(\hat\delta\); the small pink bars at inactive positions are the dense signal alone.

fit_lasso <- lasso(X, y)
fit_enet  <- elastic_net(X, y, lambda2 = 1)
get_coef <- function(fit) {
  b <- fit$get_model("BIC")
  if ("Intercept" %in% names(b)) b <- b[-1]
  as.numeric(b)
}

method_levels <- c("True signal", "Lasso (BIC)", "Elastic-net (BIC)", "LAVA (BIC)")

## Non-LAVA panels: one bar per predictor
df_base <- rbind(
  data.frame(method = "True signal",       index = 1:p,
             value = theta,                fill_grp = "True signal"),
  data.frame(method = "Lasso (BIC)",       index = 1:p,
             value = get_coef(fit_lasso),  fill_grp = "Lasso"),
  data.frame(method = "Elastic-net (BIC)", index = 1:p,
             value = get_coef(fit_enet),   fill_grp = "Elastic-net")
)
df_base$method <- factor(df_base$method, levels = method_levels)

## LAVA panel: dense total drawn first, sparse component overlaid on top
df_lava_dense <- data.frame(
  method   = factor("LAVA (BIC)", levels = method_levels),
  index    = 1:p,
  value    = as.numeric(theta_hat),
  fill_grp = "LAVA dense (δ̂)"
)
df_lava_sparse <- data.frame(
  method   = factor("LAVA (BIC)", levels = method_levels),
  index    = 1:p,
  value    = beta_hat,
  fill_grp = "LAVA sparse (β̂)"
)

fill_palette <- c(
  "True signal"       = "grey20",
  "Lasso"             = "#56B4E9",
  "Elastic-net"       = "#E69F00",
  "LAVA dense (δ̂)"   = "#CC79A7",
  "LAVA sparse (β̂)"  = "#009E73"
)

ggplot(mapping = aes(x = index, y = value, fill = fill_grp)) +
  geom_col(data = df_base) +
  geom_col(data = df_lava_dense) +
  geom_col(data = df_lava_sparse) +
  facet_wrap(~ method, ncol = 2) +
  scale_fill_manual(
    values = fill_palette,
    breaks = c("LAVA sparse (β̂)", "LAVA dense (δ̂)"),
    name   = "LAVA decomposition"
  ) +
  labs(x = "Predictor index", y = "Estimated coefficient",
       title = "Coefficient recovery: Lasso vs. Elastic-net vs. LAVA") +
  theme_bw() +
  theme(legend.position = "bottom")
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf

Cross-validation

$cross_validate() works identically to the other QuadrupenFit subclasses.

set.seed(42)
fit_lava$cross_validate(K = 10, verbose = FALSE)
fit_lava$plot(type = "crossval")

set.seed(42)
fit_lasso$cross_validate(K = 10, verbose = FALSE)
fit_enet$cross_validate(K = 10, verbose = FALSE)

idx_cv   <- fit_lava$get_model("CV_min", type = "index")
beta_cv  <- as.numeric(fit_lava$sparse_coef[, idx_cv])
cat("Non-zero entries in sparse component (CV_min):",
    sum(beta_cv != 0), "\n")
#> Non-zero entries in sparse component (CV_min): 13
cat("Positions with |beta_cv| > 1:",
    which(abs(beta_cv) > 1), "\n")
#> Positions with |beta_cv| > 1: 5 15 25 35 45

Group LAVA: group-sparse + dense decomposition

group_lava() extends LAVA by replacing the element-wise \(\ell_1\) penalty on \(\beta\) with a group-wise mixed norm \(\Omega_g(\beta)\):

\[\min_{\theta = \beta + \delta} \frac{1}{2n} \|y - X(\beta + \delta)\|^2 + \lambda_1\, \Omega_g(\beta) + \frac{\lambda_2}{2}\, \delta^\top S\, \delta.\]

This is appropriate when the sparse component has group structure: entire groups of predictors are either active or silent. The type argument selects the group norm (\(\ell_1/\ell_2\) by default, or \(\ell_1/\ell_\infty\)).

Simulation with a group-sparse signal

## 5 groups of 10 predictors; groups 2 and 4 are active in the sparse part
group  <- rep(1:5, each = 10)
beta_g <- rep(c(0, 2, 0, -2, 0), each = 10)
delta_g <- rnorm(p, mean = 0, sd = 0.2)
theta_g <- beta_g + delta_g

y2 <- X %*% theta_g + rnorm(n, sd = 2)

group_names <- paste0("grp", 1:5)
var_labels  <- group_names[group]

Fitting

fit_gl <- group_lava(X, y2, group = group, lambda2 = 1)
fit_gl
#> Linear regression with group Lava l1/l2 penalizer.
#> - number of coefficients: 50 + intercept
#> - lambda regularization: 100 points from 14.9 to 0.149
#> - gamma regularization: 1

Paths of the sparse and dense components

fit_gl$plot_path(component = "sparse", labels = var_labels)
fit_gl$plot_path(component = "dense",  labels = var_labels)

The sparse path shows entire groups entering the model simultaneously (Group Lasso behaviour), while the dense path remains diffuse and non-zero everywhere.

Model selection

For group selection problems, eBIC (which carries a stronger penalty of \(\log(n) + 2\log(p)\)) is generally more conservative than BIC and better adapted to identifying which groups are truly active.

Group identification

idx_ebic   <- fit_gl$get_model("eBIC", type = "index")
beta_g_hat <- as.numeric(fit_gl$sparse_coef[, idx_ebic])

active_groups <- unique(group[beta_g_hat != 0])
cat("Active groups in sparse component (eBIC):", group_names[active_groups], "\n")
#> Active groups in sparse component (eBIC): grp2 grp4

Reference

Chernozhukov, V., Hansen, C., & Liao, Y. (2017). A lava attack on the recovery of sums of dense and sparse signals. The Annals of Statistics, 45(1), 39–76. https://doi.org/10.1214/16-AOS1434

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.