The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
When working with similarity or kernel matrices, the computational cost grows rapidly:
For N = 10,000, that’s 100 million entries and billions of operations. For N = 100,000, it becomes intractable.
The Nyström method provides a fast approximation by using only a subset of “landmark” points:
This reduces complexity from O(N³) to O(Nm²) — a massive speedup when m is small.
set.seed(42)
N <- 1000
p <- 50
X <- matrix(rnorm(N * p), N, p)
# Nyström approximation with linear kernel (default)
# Uses only 100 landmarks instead of all 1000 points
ny_fit <- nystrom_approx(
X,
ncomp = 10,
nlandmarks = 100,
preproc = center()
)
print(ny_fit)
#> A bi_projector object with the following properties:
#>
#> Dimensions of the weights (v) matrix:
#> Rows: 1000 Columns: 10
#>
#> Dimensions of the scores (s) matrix:
#> Rows: 1000 Columns: 10
#>
#> Length of the standard deviations (sdev) vector:
#> Length: 10
#>
#> Preprocessing information:
#> A finalized pre-processing pipeline:
#> Step 1 : center
# Standard bi_projector interface
head(scores(ny_fit))
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1.45270292 0.03749928 -0.2368871 0.1187156 -0.416524468 0.58381281
#> [2,] -0.08553264 0.21098943 2.0174421 0.2780763 0.016232223 1.46634110
#> [3,] 1.35999975 2.38423969 -0.7357389 0.4051195 0.289000747 -0.24190588
#> [4,] -0.14581306 -0.32474655 -1.6893599 1.0902513 -0.972903175 0.03148513
#> [5,] 1.03295739 -0.85262013 -0.8882174 -0.2777049 -1.048920775 0.12657430
#> [6,] -0.98488361 0.15283689 -1.1922802 -0.5915995 -0.008715646 -2.10974250
#> [,7] [,8] [,9] [,10]
#> [1,] 0.1024324 -0.06344357 -1.8224682 -0.4924368
#> [2,] 1.1993310 0.27285706 1.4943231 -0.6030422
#> [3,] 0.9969271 -0.70227209 -1.0836289 0.5889711
#> [4,] 1.1429161 -0.14968205 0.7300808 1.3913713
#> [5,] -0.6772174 -0.36304203 0.4231099 1.5737886
#> [6,] 0.0955964 1.30091786 1.7640812 0.7613047By default, nystrom_approx() uses a linear kernel. You
can provide any kernel function:
# RBF (Gaussian) kernel
rbf_kernel <- function(X, Y = NULL, sigma = 1) {
if (is.null(Y)) Y <- X
sumX2 <- rowSums(X^2)
sumY2 <- rowSums(Y^2)
sqdist <- outer(sumX2, sumY2, `+`) - 2 * tcrossprod(X, Y)
sqdist[sqdist < 0] <- 0
exp(-sqdist / (2 * sigma^2))
}
ny_rbf <- nystrom_approx(
X,
kernel_func = rbf_kernel,
ncomp = 10,
nlandmarks = 100
)The result is a bi_projector, so you can project new
observations:
For very large datasets, the “double” method applies the approximation twice, reducing complexity further:
# Standard method
system.time(
ny_standard <- nystrom_approx(X, ncomp = 5, nlandmarks = 200, method = "standard")
)
#> user system elapsed
#> 0.010 0.000 0.011
# Double Nyström (faster with intermediate rank l)
system.time(
ny_double <- nystrom_approx(X, ncomp = 5, nlandmarks = 200, method = "double", l = 50)
)
#> user system elapsed
#> 0.014 0.001 0.015| Dataset Size | Suggested Landmarks | Notes |
|---|---|---|
| < 1,000 | 50–200 | Larger fraction acceptable |
| 1,000–10,000 | 200–500 | Good accuracy/speed balance |
| > 10,000 | 500–2,000 | Consider Double Nyström |
lThe Nyström approximation uses the fact that a positive semi-definite matrix K can be approximated as:
\[K \approx K_{nm} K_{mm}^{-1} K_{mn}\]
where: - \(K_{mm}\) is the m × m matrix between landmarks - \(K_{nm}\) is the N × m matrix between all points and landmarks
The eigendecomposition of this approximation can be computed efficiently: 1. Compute eigendecomposition of \(K_{mm} = U_m \Lambda_m U_m^T\) 2. Approximate eigenvectors of K as: \(U \approx K_{nm} U_m \Lambda_m^{-1/2}\)
Double Nyström applies this approximation recursively, reducing complexity from O(Nm² + m³) to O(Nml + l³) where l << m.
pca() for standard PCA on smaller datasetssvd_wrapper() for various SVD backendsThese binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.