The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Sparse Sufficient Dimension Reduction via Penalized Principal Machines

Jungmin Shin (The Ohio State University)

Seung Jun Shin (Korea University)

2026-06-13

1 Introduction

Sufficient dimension reduction (SDR) reduces the dimensionality of predictors \(\mathbf{X}\) while preserving their relationship with a response \(Y\). SDR estimates a basis \(\mathbf{B}\) of the central subspace \(\mathcal{S}_{Y\mid\mathbf{X}}\) defined by \[ Y \perp \mathbf{X} \mid \mathbf{B}^\top \mathbf{X}. \] In high dimensions, sparse SDR improves interpretability and accuracy by driving many rows of \(\mathbf{B}\) to zero, which is achieved by adding a sparsity-inducing penalty to the SDR optimization problem.

The ppmSDR package implements a unified framework for sparse SDR based on the penalized principal machine (\(\mathrm{P}^2\mathrm{M}\)). A single front-end, ppm(), dispatches to ten loss-specific estimators, all fitted by one group coordinate descent (GCD) engine, and ppm_tune() selects the sparsity parameter by cross-validation.

2 Penalized Principal Machines

2.1 Principal machine

Given data \((y_i, \mathbf{x}_i) \in \mathbb{R} \times \mathbb{R}^p\) with centered predictors, and a sequence of cutoffs \(r_1 < \cdots < r_h\), the sample principal machine solves \[ (\beta_{0k}, \boldsymbol{\beta}_k) = \arg\min_{\beta_0, \boldsymbol{\beta}} \boldsymbol{\beta}^\top \hat{\Sigma} \boldsymbol{\beta} + \frac{c}{n} \sum_{i=1}^n L_k\!\left(\tilde{y}_{ik}, \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i\right), \quad k = 1, \ldots, h, \] where \(\hat{\Sigma} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top / n\). The basis \(\hat{\mathbf{B}}\) is estimated by the leading \(d\) eigenvectors of \(\sum_{k=1}^h \hat{\boldsymbol{\beta}}_k \hat{\boldsymbol{\beta}}_k^\top\).

2.2 Penalized estimation

Under the sparsity assumption the slopes \(\boldsymbol{\beta}_k\) share a common support across \(k\), leading to the row-group penalized objective \[ \sum_{k=1}^{h} \left[ \boldsymbol{\beta}_k^\top \hat{\Sigma} \boldsymbol{\beta}_k + \frac{c}{n} \sum_{i=1}^n L_k(\tilde{y}_{ik}, \beta_{0k} + \boldsymbol{\beta}_k^\top \mathbf{x}_i) \right] + \sum_{j=1}^{p} p_{\lambda}\!\left(\|\boldsymbol{\beta}_{(j)}\|_2\right), \] where \(\boldsymbol{\beta}_{(j)} = (\beta_{1j}, \ldots, \beta_{hj})^\top\) and \(p_\lambda(\cdot)\) is the group LASSO, SCAD or MCP penalty. The penalty acts group-wise on all coefficients of predictor \(j\), so variable selection corresponds to identifying the predictors that form a sparse basis.

2.3 Supported losses and algorithms

Machine Response Type Loss \(L_k(\tilde y_k, f)\) Algorithm
\(\mathrm{P}^2\mathrm{LSM}\) Continuous RPM \((1 - \tilde y_k f)^2\) GCD
\(\mathrm{P}^2\mathrm{WLSM}\) Binary LPM \(w_k(1 - y f)^2\) GCD
\(\mathrm{P}^2\mathrm{LR}\) Continuous RPM \(\log(1 + e^{-\tilde y_k f})\) Iterative GCD
\(\mathrm{P}^2\mathrm{WLR}\) Binary LPM \(w_k \log(1 + e^{-y f})\) Iterative GCD
\(\mathrm{P}^2\mathrm{AR}\) Both LPM \((y-f)^2(\rho_k\mathbb{I}\{y \ge f\} + (1-\rho_k)\mathbb{I}\{y<f\})\) Iterative GCD
\(\mathrm{P}^2\mathrm{L2M}\) Continuous RPM \([\max\{0, 1 - \tilde y_k f\}]^2\) Iterative GCD
\(\mathrm{P}^2\mathrm{WL2M}\) Binary LPM \(w_k[\max\{0, 1 - y f\}]^2\) Iterative GCD
\(\mathrm{P}^2\mathrm{SVM}\) Continuous RPM \(\max\{0, 1 - \tilde y_k f\}\) MM-GCD
\(\mathrm{P}^2\mathrm{WSVM}\) Binary LPM \(w_k \max\{0, 1 - y f\}\) MM-GCD
\(\mathrm{P}^2\mathrm{QR}\) Both LPM \((y-f)(\rho_k - \mathbb{I}\{y<f\})\) MM-GCD

The squared loss yields an exact group-penalized least-squares problem solved directly by GCD. Smooth losses (logistic, asymmetric least squares, L2-hinge) are reduced to a sequence of penalized least-squares problems via quadratic approximation (iterative GCD). Non-differentiable losses (hinge, check) are handled by quadratic majorization within a majorization-minimization scheme (MM-GCD).

3 Usage

The argument loss selects the estimator; penalty is one of "grSCAD", "grLasso" or "grMCP"; lambda controls sparsity and is best chosen by cross-validation (see below).

3.1 Regression

set.seed(1)
n <- 1000; p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1
x <- matrix(rnorm(n * p), n, p)
y <- (x %*% B[, 1]) / (0.5 + (x %*% B[, 2] + 1)^2) + 0.2 * rnorm(n)

## penalized principal least-squares SVM (P^2LSM)
fit <- ppm(x, y, H = 10, C = 1, loss = "lssvm", penalty = "grSCAD", lambda = 0.01)
round(fit$evectors[, 1:2], 3)
#>       [,1] [,2]
#>  [1,]    1    0
#>  [2,]    0    1
#>  [3,]    0    0
#>  [4,]    0    0
#>  [5,]    0    0
#>  [6,]    0    0
#>  [7,]    0    0
#>  [8,]    0    0
#>  [9,]    0    0
#> [10,]    0    0
summary(fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: lssvm   Penalty: grSCAD   lambda = 0.01
#> Working dimension d = 2
#> 
#> Estimated basis of the central subspace:
#>      Dir1   Dir2
#> x1  1e+00 -1e-04
#> x2  1e-04  1e+00
#> x3  0e+00  0e+00
#> x4  0e+00  0e+00
#> x5  0e+00  0e+00
#> x6  0e+00  0e+00
#> x7  0e+00  0e+00
#> x8  0e+00  0e+00
#> x9  0e+00  0e+00
#> x10 0e+00  0e+00
#> 
#> Selected variables (2 of 10): x1, x2

3.2 Binary classification

For a binary response the weighted losses code the classes internally as \(\{-1, +1\}\).

y.binary <- sign(y)
## penalized principal weighted least-squares SVM (P^2WLSM)
fit2 <- ppm(x, y.binary, H = 10, C = 1, loss = "asls",
            penalty = "grSCAD", lambda = 0.03)
round(fit2$evectors[, 1:2], 3)
#>        [,1]   [,2]
#>  [1,] 1.000 -0.005
#>  [2,] 0.005  1.000
#>  [3,] 0.000  0.000
#>  [4,] 0.000  0.000
#>  [5,] 0.005  0.004
#>  [6,] 0.000  0.000
#>  [7,] 0.006  0.001
#>  [8,] 0.000  0.000
#>  [9,] 0.000  0.000
#> [10,] 0.000  0.000

3.3 Choosing lambda by cross-validation

ppm_tune() performs \(K\)-fold cross-validation, selecting the lambda that maximizes the held-out distance correlation between the response and the estimated sufficient predictors. Call set.seed() beforehand for reproducible folds.

set.seed(1)
cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5,
               nlambda = 10, lambda.max = 0.02)
cv$opt.lambda
#> [1] 0.009283178
summary(cv$fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: lssvm   Penalty: grSCAD   lambda = 0.00928318
#> Working dimension d = 2
#> 
#> Estimated basis of the central subspace:
#>      Dir1   Dir2
#> x1  1e+00 -3e-04
#> x2  3e-04  1e+00
#> x3  0e+00  0e+00
#> x4  0e+00  0e+00
#> x5  0e+00  0e+00
#> x6  0e+00  0e+00
#> x7  0e+00  0e+00
#> x8  0e+00  0e+00
#> x9  0e+00  0e+00
#> x10 0e+00  0e+00
#> 
#> Selected variables (2 of 10): x1, x2

4 Real-data example: Wisconsin Diagnostic Breast Cancer

We illustrate a weighted estimator on the Wisconsin Diagnostic Breast Cancer (WDBC) data, coding the diagnosis as malignant = +1 and benign = -1. After fitting the penalized principal weighted L2-SVM (P2WL2M), the predictors are projected onto the estimated two-dimensional central subspace and plotted by class.

data(wdbc)
x <- scale(as.matrix(wdbc[, -1]))
y <- ifelse(wdbc$diagnosis == "M", 1, -1)

fit <- ppm(x, y, loss = "wl2svm", penalty = "grSCAD", lambda = 0.3)
summary(fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: wl2svm   Penalty: grSCAD   lambda = 0.3
#> Working dimension d = 2
#> 
#> Estimated basis of the central subspace:
#>                            Dir1    Dir2
#> radius_mean             -0.2733 -0.3393
#> texture_mean             0.0000  0.0000
#> perimeter_mean          -0.2929 -0.3362
#> area_mean               -0.2650  0.0599
#> smoothness_mean          0.0000  0.0000
#> compactness_mean        -0.0819 -0.0947
#> concavity_mean          -0.2452  0.0401
#> concave_points_mean     -0.3706  0.2268
#> symmetry_mean            0.0000  0.0000
#> fractal_dimension_mean   0.0000  0.0000
#> radius_se               -0.1022  0.3825
#> texture_se               0.0000  0.0000
#> perimeter_se            -0.0773  0.2433
#> area_se                 -0.0670  0.2180
#> smoothness_se            0.0000  0.0000
#> compactness_se           0.0000  0.0000
#> concavity_se             0.0000  0.0000
#> concave_points_se        0.0000  0.0000
#> symmetry_se              0.0000  0.0000
#> fractal_dimension_se     0.0000  0.0000
#> radius_worst            -0.3641  0.0832
#> texture_worst            0.0000  0.0000
#> perimeter_worst         -0.3715  0.0384
#> area_worst              -0.3223  0.4892
#> smoothness_worst         0.0000  0.0000
#> compactness_worst       -0.0743 -0.0829
#> concavity_worst         -0.1698 -0.2341
#> concave_points_worst    -0.3679 -0.3794
#> symmetry_worst           0.0000  0.0000
#> fractal_dimension_worst  0.0000  0.0000
#> 
#> Selected variables (15 of 30): radius_mean, perimeter_mean, area_mean, compactness_mean, concavity_mean, concave_points_mean, radius_se, perimeter_se, area_se, radius_worst, perimeter_worst, area_worst, compactness_worst, concavity_worst, concave_points_worst

B      <- fit$evectors[, 1:2]
scores <- x %*% B
plot(scores[, 1], scores[, 2],
     col  = ifelse(y == 1, "red", "blue"),
     pch  = ifelse(y == 1, 17, 1),
     xlab = "1st SDR direction", ylab = "2nd SDR direction")
legend("topright", legend = c("malignant (+1)", "benign (-1)"),
       col = c("red", "blue"), pch = c(17, 1))

The Boston housing data (data(boston)) provide a continuous-response counterpart for the response-based estimators such as "lssvm".

5 References

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.