cvLM 2.0.0
Major Changes & Breaking Updates
- New Default Behavior: Data centering (center = TRUE) is now the default. This ensures that the intercept is not penalized in ridge regression, in line with standard statistical practice.
- API Cleanup: Removed the verbose argument from grid.search. The new C++ backend evaluates the \(\lambda\) grid analytically, which makes progress bars unnecessary.
- Refined Object Inheritance: For lm and glm methods, subset and na.action are now handled by the model object before cross-validation, ensuring consistency with the original model fit.
The computational engine has moved from RcppEigen to RcppArmadillo, which lets the package use optimized LAPACK and BLAS libraries for large-scale matrix operations.
- SVD-Powered Grid Search: grid.search has been entirely rewritten in C++. It now uses a single Singular Value Decomposition (SVD) to evaluate the entire \(\lambda\) grid analytically.
  - Efficiency: Reduces the computational cost from \(O(np^2)\) per grid point to \(O(\min(n, p))\) per grid point after the initial decomposition.
- Parallel Computation: Refined and extended the integration of RcppParallel to distribute workloads across threads.
  - For K-fold CV, threads are distributed across folds.
  - For GCV/LOOCV grid searches, threads are distributed across the \(\lambda\) grid.
- Optimized LOOCV/GCV: Implemented closed-form solutions for Leave-One-Out and Generalized Cross-Validation based on the hat matrix (its diagonal for LOOCV, its trace for GCV), avoiding \(n\) model refits.
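The single-SVD grid search can be sketched as follows. The sketch assumes \(X\) and \(y\) are already centered, so no intercept column appears, and it uses one common form of the GCV criterion; cvLM's exact normalization may differ. Take the thin SVD \(X = UDV^T\) with singular values \(d_j\) and let \(z = U^Ty\). The nonzero eigenvalues of the ridge hat matrix are then \(d_j^2/(d_j^2+\lambda)\), which gives \(\operatorname{tr}H(\lambda) = \sum_j d_j^2/(d_j^2+\lambda)\) and \(\mathrm{RSS}(\lambda) = \sum_j \big(\lambda z_j/(d_j^2+\lambda)\big)^2 + \lVert y\rVert^2 - \lVert z\rVert^2\). The Python below evaluates \(\mathrm{GCV}(\lambda) = n\,\mathrm{RSS}(\lambda)/(n - \operatorname{tr}H(\lambda))^2\) at \(O(\min(n,p))\) cost per grid point. The function name is hypothetical, and the toy design was chosen only because its SVD can be read off by hand.

```python
# Minimal sketch: score an entire ridge lambda grid from one SVD.
# Assumes d (singular values of centered X) and z = U^T y were computed once
# elsewhere, e.g. by a LAPACK SVD. gcv_grid is a hypothetical name.


def gcv_grid(d, z, y_sq_norm, n, lambdas):
    """Return GCV(lam) = n * RSS(lam) / (n - tr H(lam))**2 for each lam."""
    # Squared norm of y outside the column space of X; the same for every lam.
    outside = y_sq_norm - sum(zj * zj for zj in z)
    scores = []
    for lam in lambdas:
        rss = outside
        trace_h = 0.0
        # O(len(d)) = O(min(n, p)) work per grid point.
        for dj, zj in zip(d, z):
            shrink = dj * dj / (dj * dj + lam)  # eigenvalue of the hat matrix
            rss += ((1.0 - shrink) * zj) ** 2
            trace_h += shrink
        scores.append(n * rss / (n - trace_h) ** 2)
    return scores


# Toy design X = [[2, 0], [0, 1], [0, 0]], whose SVD is readable by hand:
# d = [2, 1] and U holds the first two unit vectors, so z = y[:2].
y = [1.0, 2.0, 3.0]
scores = gcv_grid([2.0, 1.0], y[:2], sum(v * v for v in y), len(y), [0.0, 1.0])
print(scores)
```

Each grid point reads only d and z and writes its own score, so the iterations are independent of one another. That independence is what makes it straightforward to distribute the \(\lambda\) grid across threads.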
Numerical Robustness
- OLS Evolution: Transitioned from column-pivoted QR decomposition to Complete Orthogonal Decomposition (COD). This enables computing the unique minimum \(L_2\)-norm least-squares solution for column rank-deficient or underdetermined (\(p > n\)) systems.
- Ridge Evolution: Transitioned from Cholesky-based methods to Singular Value Decomposition (SVD). Forming the cross-product matrix \(X^TX\) squares the condition number of \(X\). Working with the SVD of \(X\) directly avoids this and improves stability in ill-conditioned settings.
- Precision Control: Added a tol (tolerance) parameter that sets the threshold for numerical rank estimation in the COD and SVD computations.
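The classic Läuchli matrix illustrates the risk of forming \(X^TX\). The \(X\) below has full column rank, with singular values \(\sqrt{2+\varepsilon^2}\) and \(\varepsilon\). In double precision, however, the \(\varepsilon^2 = 10^{-16}\) terms vanish when added to 1, so the computed cross-product is exactly singular. The sketch also shows a tolerance-based rank count. The relative rule \(d_j > \mathrm{tol}\cdot d_{\max}\) and the helper numerical_rank are assumptions for illustration, not necessarily how cvLM interprets tol.

```python
# Minimal sketch: the Lauchli matrix, where forming X^T X loses information.
import math

eps = 1e-8
X = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

# Form X^T X explicitly, as a normal-equations / Cholesky approach would.
cols = list(zip(*X))
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
det_xtx = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
print(xtx, det_xtx)  # [[1.0, 1.0], [1.0, 1.0]] 0.0: the eps**2 terms rounded away

# Singular values of X itself, derived by hand: the eigenvalues of the exact
# X^T X are 2 + eps**2 and eps**2, so X has full column rank.
d = [math.sqrt(2.0 + eps * eps), eps]


def numerical_rank(d, tol):
    """Count singular values above tol * max(d).

    This relative rule is an assumption; cvLM's use of tol may differ.
    """
    d_max = max(d)
    return sum(1 for s in d if s > tol * d_max)


print(numerical_rank(d, 1e-10), numerical_rank(d, 1e-6))  # 2 1
```

With a tolerance of 1e-10, both singular values count and the rank is 2. With 1e-6, the small singular value is treated as numerical noise and the rank drops to 1.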
Internal Improvements
- Template Metaprogramming: Re-engineered the core logic as generic, templated C++ code, shifting work from run time to compile time and reducing runtime overhead.
- C++17 Migration: Upgraded the package's build standard to C++17, enabling more expressive syntax and modern compiler optimizations.
- Memory Optimization: Refactored the multi-threaded workers to use pre-allocated buffers. Eliminating heap allocations inside “hot loops” (model fitting and out-of-sample evaluation) raises throughput and lowers latency.
- Armadillo Expression Tuning: Optimized the use of Armadillo expression templates to maximize lazy evaluation. This minimizes the creation of temporary objects and lets the compiler generate more efficient, SIMD-vectorized loops.
- Comprehensive Testing Suite:
  - R Integration: Implemented extensive testthat suites that validate cvLM and grid.search against manual matrix algebra and established packages such as boot.
  - Numerical Validation: Tests specifically target edge cases, including ill-conditioned, rank-deficient, and high-dimensional (\(p > n\)) datasets.
- Zero-Copy Interoperability: Uses Armadillo’s advanced constructors to wrap R-allocated memory directly, so data are passed between R and C++ without copying.
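The following sketch shows the kind of check described for the testing suite. It compares the closed-form LOOCV shortcut \(\frac{1}{n}\sum_i \big(e_i/(1-h_{ii})\big)^2\) against \(n\) explicit refits. It assumes OLS with one predictor plus an intercept, where the leverage is \(h_{ii} = 1/n + (x_i-\bar{x})^2/\sum_k (x_k-\bar{x})^2\). The helpers are hypothetical, and this is not cvLM's actual testthat code.

```python
# Minimal sketch: cross-check the closed-form LOOCV shortcut against
# brute-force refits, for OLS with one predictor plus intercept (assumed).
# Helper names are hypothetical; this is not cvLM's test suite.


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def loocv_brute_force(x, y):
    """Mean squared prediction error from n separate leave-one-out refits."""
    n = len(x)
    total = 0.0
    for i in range(n):
        b0, b1 = ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total / n


def loocv_shortcut(x, y):
    """Same quantity from a single fit: mean of (e_i / (1 - h_ii))**2."""
    n = len(x)
    b0, b1 = ols_fit(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h_ii = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage of point i
        e_i = yi - (b0 + b1 * xi)                  # ordinary residual
        total += (e_i / (1.0 - h_ii)) ** 2
    return total / n


x = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
y = [1.2, 1.9, 3.2, 3.8, 5.3, 6.6]
print(loocv_brute_force(x, y), loocv_shortcut(x, y))
```

The two numbers agree up to rounding error. For least squares, the leave-one-out residual equals \(e_i/(1-h_{ii})\) exactly, not just approximately.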