The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Package {combss}


Type: Package
Title: Continuous Optimisation Towards Best Subset Selection
Version: 0.1.0
Author: Benoit Liquet ORCID iD [aut, cre], Anant Mathur [aut], Sarat Moka [aut]
Maintainer: Benoit Liquet <benoit.liquet@univ-pau.fr>
Description: Best subset selection in generalised linear models via continuous optimisation. Reformulates the NP-hard discrete subset selection problem as a continuous optimisation over the hypercube [0,1]^p, solved via a Frank-Wolfe homotopy algorithm with closed-form ridge inner solves. Supports linear (Gaussian), binary logistic, and multinomial regression. For methodological details see Moka, Liquet, Zhu and Muller (2024) <doi:10.1007/s11222-024-10387-8> and Mathur, Liquet, Muller and Moka (2026) <doi:10.48550/arXiv.2603.21952>.
License: GPL-3
URL: https://github.com/benoit-liquet/combss
Encoding: UTF-8
Imports: glmnet (≥ 4.0), stats
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-05-06 06:20:54 UTC; benoit
Repository: CRAN
Date/Publication: 2026-05-11 19:10:26 UTC

combss: Best Subset Selection for Generalised Linear Models via Continuous Optimisation

Description

Best subset selection in linear, binary logistic, and multinomial logistic regression via a Frank-Wolfe homotopy algorithm. The algorithm reformulates the NP-hard discrete subset selection problem as a continuous optimisation over the hypercube ⁠[0, 1]^p⁠, scaling to high-dimensional settings with p >> n. Inner ridge solves use glmnet.

Author(s)

Maintainer: Benoit Liquet benoit.liquet@univ-pau.fr (ORCID)

Authors:

See Also

Useful links:


Extract the subset of selected feature indices.

Description

Extract the subset of selected feature indices.

Usage

## S3 method for class 'combss'
coef(object, k = NULL, ...)

Arguments

object

A combss object.

k

Subset size to return. Defaults to the best k if validation data was used, else the largest k evaluated.

...

Unused.

Value

Integer vector of selected column indices (1-indexed).

Examples

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- as.numeric(x[, 1] + x[, 2] + rnorm(100) * 0.1)
fit <- combss(x, y, family = "gaussian", q = 5)
coef(fit, k = 2)

Best subset selection for generalised linear models via continuous optimisation (COMBSS).

Description

Reformulates the NP-hard discrete subset selection problem as a continuous optimisation over the hypercube ⁠[0, 1]^p⁠, solved via a Frank-Wolfe homotopy algorithm. The inner ridge problem is solved with glmnet. For each subset size ⁠k = 1, ..., q⁠, COMBSS returns the selected feature set; if validation data is supplied, the best k is chosen by validation MSE (Gaussian) or accuracy (binomial / multinomial).

Usage

combss(
  x,
  y,
  family = c("gaussian", "binomial", "multinomial"),
  q = NULL,
  x_val = NULL,
  y_val = NULL,
  lam_ridge = 0,
  Niter = 25,
  alpha = 0.01,
  scale = TRUE,
  mandatory = NULL,
  verbose = FALSE
)

Arguments

x

Numeric design matrix ⁠(n, p)⁠ without an intercept column.

y

Response. Numeric for family = "gaussian"; ⁠{0, 1}⁠ (or two-level factor) for family = "binomial"; factor or integer in ⁠{1, ..., C}⁠ for family = "multinomial".

family

One of "gaussian" (alias: "linear"), "binomial", "multinomial".

q

Maximum subset size. Defaults to min(n, p).

x_val, y_val

Optional validation data. When provided, the best k is selected by validation metric and coef_ is returned for the best subset.

lam_ridge

Ridge penalty for the inner solver. Default 0.

Niter

Number of homotopy iterations. Default 25.

alpha

Frank-Wolfe step size. Default 0.01.

scale

Column-normalise x before the algorithm. Default TRUE.

mandatory

Optional integer vector of 1-indexed columns of x that must be included in every model.

verbose

Print progress. Default FALSE.

Value

An object of class "combss", a list with elements:

References

Moka, S., Liquet, B., Zhu, H. and Muller, S. (2024). COMBSS: best subset selection via continuous optimization. Statistics and Computing. doi:10.1007/s11222-024-10387-8

Mathur, A., Liquet, B., Muller, S. and Moka, S. (2026). Parsimonious Subset Selection for Generalized Linear Models with Biomedical Applications. arXiv:2603.21952.

Examples

set.seed(1)
n <- 200; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)
fit <- combss(x, y, family = "gaussian", q = 10)
fit$subset_list[1:5]


Select the ridge penalty for COMBSS by leave-one-out cross-validation.

Description

Mirrors combss.cv.select_lambda in the Python combss package. For each candidate lambda and each subset size ⁠k = 1, ..., q⁠, runs COMBSS to obtain a subset and evaluates LOOCV error of a refit. Returns the best lambda overall and the best lambda per k.

Usage

combss_cv(
  x,
  y,
  family = c("gaussian", "binomial", "multinomial"),
  q,
  lambda_grid = NULL,
  Niter = 25,
  verbose = FALSE
)

Arguments

x

Numeric design matrix ⁠(n, p)⁠.

y

Response (numeric / binary / factor depending on family).

family

One of "gaussian" (alias "linear"), "binomial", "multinomial".

q

Maximum subset size.

lambda_grid

Candidate ridge penalties. Defaults to c(0, 10^seq(-3, 1, length.out = 10)).

Niter

Number of homotopy iterations passed to combss().

verbose

Print progress.

Value

List with:


Predict from a combss fit.

Description

Refits the chosen subset on the original training data via OLS or ridge (Gaussian / binomial / multinomial as appropriate) and predicts on newx. Note: the training data is not stored on the fit object, so newx must be supplied along with x_train and y_train.

Usage

## S3 method for class 'combss'
predict(object, newx, x_train, y_train, k = NULL, ...)

Arguments

object

A combss object.

newx

Numeric matrix of new observations.

x_train, y_train

Original training data used to refit on the chosen subset.

k

Subset size to use. Defaults to the best k if validation data was used, else the largest k evaluated.

...

Unused.

Value

Numeric vector of predictions for newx.

Examples

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- as.numeric(x[, 1] + x[, 2] + rnorm(100) * 0.1)
fit <- combss(x, y, family = "gaussian", q = 5)
newx <- matrix(rnorm(5 * 20), 5, 20)
predict(fit, newx = newx, x_train = x, y_train = y, k = 2)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.