The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

SVD wrapper, PCA and the bi_projector

1. Why wrap SVD at all?

There are six popular SVD engines in R (base::svd, corpcor, RSpectra, irlba, rsvd, svd (PROPACK)) – each with its own argument list, naming conventions and edge-cases (some refuse to return the full rank, others crash on tall-skinny matrices).

svd_wrapper() smooths that out:

That means immediate access to verbs such as project(), reconstruct(), truncate(), partial_project().

set.seed(1)
X <- matrix(rnorm(35*10), 35, 10)   # 35 obs × 10 vars

sv_fast <- svd_wrapper(X, ncomp = 5, preproc = center(), method = "fast")

# irlba backend (if installed) gives identical results
sv_irlba <- if (requireNamespace("irlba", quietly = TRUE)) {
  suppressWarnings(svd_wrapper(X, ncomp = 5, preproc = center(), method = "irlba"))
}

# Same downstream code works for both objects:
head(scores(sv_fast)) # 35 × 5
#>            [,1]       [,2]       [,3]        [,4]        [,5]
#> [1,] -2.9415181 -1.6140167  0.2117456  0.12109736 -0.46419317
#> [2,]  0.4743086  0.3458298 -0.8467096 -1.21167498  0.02074819
#> [3,] -1.6999172 -1.1535717 -1.0276227 -0.33535843  0.37155930
#> [4,]  0.1131790  0.7789166 -0.7394153  0.43625966  2.24260205
#> [5,]  0.8437314 -1.7600608 -0.8939140  0.77861595  0.81936957
#> [6,]  0.6063990 -1.8810077  1.2246519  0.03652504 -1.40433408

if (!is.null(sv_irlba)) {
  all.equal(scores(sv_fast), scores(sv_irlba))
}
#> [1] "Mean relative difference: 0.7832173"

2. A one-liner pca()

Most people really want PCA, so pca() is a thin wrapper that

  1. calls svd_wrapper() with sane defaults,
  2. adds the S3 class “pca” (printing, screeplot, biplot, permutation test, …).
data(iris)
X_iris <- as.matrix(iris[, 1:4])

pca_fit <- pca(X_iris, ncomp = 4)    # defaults to method = "fast", preproc=center()
print(pca_fit)
#> PCA object  -- derived from SVD
#> 
#> Data: 150 observations x 4 variables
#> Components retained: 4
#> 
#> Variance explained (per component):
#>  1 2 3 4  92.46  5.31  1.71  0.52%  (cumulative:  92.46 97.77 99.48   100%)

2.1 Scree-plot and cumulative variance

screeplot(pca_fit, type = "lines", main = "Iris PCA – scree plot")

2.2 Quick biplot

# Requires ggrepel for repulsion, but works without it
biplot(pca_fit, repel_points = TRUE, repel_vars = TRUE)

(If you do not have ggrepel installed the text is placed without repulsion.)

3. What is a bi_projector?

Think bidirectional mapping:

data space  (p variables)  ↔  component space  (d ≤ p)
        new samples:  project()        ← scores
       new variables: project_vars()   ← loadings
                     reconstruction ↔  (scores %*% t(loadings))

A bi_projector therefore carries

slot shape description
v p × d component loadings (columns)
s n × d score matrix (rows = observations)
sdev d singular values (or SDs related to components)
preproc fitted transformer so you never leak training stats

Because pca() returns a bi_projector, you get other methods for free:

# rank-2 reconstruction of the iris data
Xhat2 <- reconstruct(pca_fit, comp = 1:2)
print(paste("MSE (rank 2):", round(mean((X_iris - Xhat2)^2), 4))) # MSE ~ 0.076
#> [1] "MSE (rank 2): 0.0253"

# drop to 2 PCs everywhere
pca2 <- truncate(pca_fit, 2)
shape(pca2)            # 4 vars × 2 comps
#> [1] 4 2

4. Fast code-coverage cameo

The next chunk quietly touches a few more branches used in the unit tests (std_scores(), perm_test(), rotate()), but keeps printing to a minimum:

# std scores
head(std_scores(svd_wrapper(X, ncomp = 3))) # Use the earlier X data
#>            [,1]       [,2]         [,3]
#> [1,] -2.1517996 -1.2898029 -0.009068656
#> [2,]  0.2706758  0.3540074 -0.658705409
#> [3,] -1.3315759 -0.7788579 -1.206524820
#> [4,] -0.0595748  0.7971995 -0.971493443
#> [5,]  0.5035052 -1.2105838 -1.291893170
#> [6,]  0.4441909 -1.5304768  0.866038002

# tiny permutation test (10 perms; obviously too few for science)
# This requires perm_test.pca method
# Make sure X_iris is centered if perm_test needs centered data
perm_res <- perm_test(pca_fit, X_iris, nperm = 10, comps = 2)
#> Pre-calculating reconstructions for stepwise testing...
#> Running 10 permutations sequentially for up to 2 PCA components (alpha=0.050, serial)...
#>   Testing Component 1/2...
#>   Component 1 p-value (0.09091) > alpha (0.050). Stopping sequential testing.
print(perm_res$component_results)
#> # A tibble: 1 × 5
#>    comp observed   pval lower_ci upper_ci
#>   <int>    <dbl>  <dbl>    <dbl>    <dbl>
#> 1     1    0.925 0.0909    0.682    0.689

# quick varimax rotation
if (requireNamespace("GPArotation", quietly = TRUE)) {
  pca_rotated <- rotate(pca_fit, ncomp = 3, type = "varimax")
  print(pca_rotated)
} else {
  cat("GPArotation not installed, skipping rotation example.\n")
}
#> PCA object  -- derived from SVD
#> 
#> Data: 150 observations x 4 variables
#> Components retained: 4
#> 
#> Variance explained (per component):
#>  1 2 3 4  46.56  3.92  5.68 43.84%  (cumulative:  46.56 50.48 56.16   100%)
#> 
#> Explained variance from rotation:
#> 82.9 %
#>  6.98 %
#>  10.12 %
#>  
#> 
#> Rotation details:
#>   Type: varimax 
#>   Loadings type: N/A (orthogonal)

(Running these once in the vignette means they are also executed by R CMD check, bumping test-coverage without extra scaffolding.)

5. Take-aways


Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.3
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] glmnet_4.1-10      Matrix_1.7-3       knitr_1.51         tibble_3.3.1      
#> [5] dplyr_1.1.4        ggplot2_4.0.1      multivarious_0.3.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] GPArotation_2025.3-1 utf8_1.2.6           sass_0.4.10         
#>  [4] future_1.68.0        generics_0.1.4       shape_1.4.6.1       
#>  [7] lattice_0.22-7       listenv_0.10.0       digest_0.6.39       
#> [10] magrittr_2.0.4       evaluate_1.0.5       grid_4.5.1          
#> [13] RColorBrewer_1.1-3   iterators_1.0.14     fastmap_1.2.0       
#> [16] foreach_1.5.2        jsonlite_2.0.0       ggrepel_0.9.6       
#> [19] RSpectra_0.16-2      survival_3.8-3       mgcv_1.9-3          
#> [22] scales_1.4.0         pls_2.8-5            codetools_0.2-20    
#> [25] jquerylib_0.1.4      cli_3.6.5            crayon_1.5.3        
#> [28] rlang_1.1.7          chk_0.10.0           parallelly_1.45.1   
#> [31] future.apply_1.20.0  splines_4.5.1        withr_3.0.2         
#> [34] cachem_1.1.0         yaml_2.3.12          otel_0.2.0          
#> [37] tools_4.5.1          parallel_4.5.1       corpcor_1.6.10      
#> [40] globals_0.18.0       rsvd_1.0.5           assertthat_0.2.1    
#> [43] vctrs_0.7.0          R6_2.6.1             matrixStats_1.5.0   
#> [46] proxy_0.4-27         lifecycle_1.0.5      MASS_7.3-65         
#> [49] irlba_2.3.5.1        pkgconfig_2.0.3      pillar_1.11.1       
#> [52] bslib_0.9.0          geigen_2.3           gtable_0.3.6        
#> [55] glue_1.8.0           Rcpp_1.1.1           xfun_0.55           
#> [58] tidyselect_1.2.1     svd_0.5.8            farver_2.1.2        
#> [61] nlme_3.1-168         htmltools_0.5.9      labeling_0.4.3      
#> [64] rmarkdown_2.30       compiler_4.5.1       S7_0.2.1

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.