BigDataStatMeth 2.0.1
- Fixed vignette build failure on Debian: replaced the external CSV file
with reproducible synthetic data in the hdf5_import example.
BigDataStatMeth 2.0.0
This is a major release of BigDataStatMeth. It introduces a new
HDF5Matrix user-facing interface, standard S3 methods for
HDF5-backed matrices, and an extended C++ infrastructure for block-wise
statistical computing with HDF5 files.
Major changes
- Added the HDF5Matrix interface for working with matrices stored in
HDF5 files.
- Added S3 methods that allow HDF5-backed matrices to be used with
familiar R calls such as
dim(), [,
[<-, %*%, crossprod(),
tcrossprod(), scale(), cor(),
svd(), prcomp(), qr(),
chol(), and solve().
- Reorganized the package around a standard R interface backed by a
C++ computational infrastructure for block-wise statistical
computing.
- Added global configuration through
hdf5matrix_options()
for common settings such as parallel execution, number of threads, block
size, and HDF5 compression.
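The block-wise idea behind these changes can be illustrated with a small, self-contained sketch. It is not the package's implementation: the function name blockwise_crossprod and the in-memory Matrix type are hypothetical, and a std::vector stands in for a dataset that would really be read from an HDF5 file one row block at a time. The sketch relies only on the identity that X^T X equals the sum of X_b^T X_b over row blocks X_b.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix; a stand-in for an on-disk dataset (assumption).
using Matrix = std::vector<std::vector<double>>;

// Accumulate X^T X one row block at a time, so that only `block_rows`
// rows need to be available at any moment.
Matrix blockwise_crossprod(const Matrix& x, std::size_t block_rows) {
    assert(!x.empty() && block_rows > 0);
    const std::size_t p = x.front().size();
    Matrix result(p, std::vector<double>(p, 0.0));
    for (std::size_t start = 0; start < x.size(); start += block_rows) {
        const std::size_t stop = std::min(start + block_rows, x.size());
        // In an HDF5-backed setting, only rows [start, stop) would be read here.
        for (std::size_t r = start; r < stop; ++r)
            for (std::size_t i = 0; i < p; ++i)
                for (std::size_t j = 0; j < p; ++j)
                    result[i][j] += x[r][i] * x[r][j];
    }
    return result;
}
```

For the 3 x 2 matrix with rows (1, 2), (3, 4), (5, 6), any block size yields the same 2 x 2 result with entries 35, 44, 44, 56; the block size changes only how many rows are processed per step.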
User-facing functionality
- Added or extended support for creating, opening, inspecting, and
closing HDF5-backed matrices.
- Added support for subsetting, assignment, dimension names, and
conversion to in-memory R objects.
- Added S3 methods for element-wise arithmetic operations on
HDF5Matrix objects.
- Added S3 support for matrix algebra operations, including
%*%, crossprod(), tcrossprod(),
cbind(), and rbind().
- Added aggregation and summary methods, including
colSums(), rowSums(), colMeans(),
rowMeans(), colVars(), rowVars(),
colSds(), rowSds(), colMins(),
rowMins(), colMaxs(), rowMaxs(),
mean(), var(), and sd().
- Added support for statistical transformations including
scale(), sweep(), and cor().
- Added or extended methods for matrix decompositions and
factorizations, including
svd(), prcomp(),
qr(), chol(), solve(),
eigen(), and pseudoinverse().
- Added diagonal operations and utilities, including
diag(), diag<-(), diag_op(),
and diag_scale().
- Added split, reduce, and apply utilities for HDF5-backed
workflows.
- Retained additional high-level bd* utilities for specialized workflows
that do not map directly to standard R generics.
HDF5 storage and resource management
- Added list_datasets() for inspecting datasets stored in HDF5 files.
- Added hdf5_close_all() for closing open HDF5 handles managed by the
package.
- Improved HDF5 handle and pointer management, including safer
behavior when overwriting datasets and when unloading or reloading the
package.
- Added HDF5 compression handling and propagation of compression
settings across output datasets.
- Added support for HDF5 file space management so that free space
released by deleted datasets can be reused by subsequent writes in files
created by the package.
- Added support for block-wise and parallel execution settings through
hdf5matrix_options().
- Added TSQR support for tall-skinny QR decompositions, with automatic
method selection when appropriate.
- Improved SVD and PCA handling for HDF5-backed matrices, including
support for truncated outputs and block-wise computation.
- Improved QR, Cholesky, inverse, eigen decomposition, and
pseudoinverse workflows for HDF5-backed matrices.
- Removed redundant OpenMP critical sections where HDF5 locking
already provides the required synchronization.
- Improved thread handling for CRAN-compatible execution.
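The TSQR entry above refers to a general technique for tall-skinny matrices: factor each row block independently, stack the small R factors, and factor the stack once more. The sketch below illustrates that technique only. It is not the package's code: the names r_factor, tsqr_r, and gram are hypothetical, modified Gram-Schmidt is used purely for brevity, each block is assumed to have full column rank, and the number of rows is assumed to be a multiple of the block size.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major

// R factor of a full-column-rank matrix (rows >= cols), computed with
// modified Gram-Schmidt. The argument is taken by value and overwritten with Q.
Matrix r_factor(Matrix a) {
    assert(!a.empty());
    const std::size_t m = a.size();
    const std::size_t n = a.front().size();
    assert(m >= n);
    Matrix r(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        // Remove the components along the columns orthonormalised so far.
        for (std::size_t i = 0; i < j; ++i) {
            double dot = 0.0;
            for (std::size_t k = 0; k < m; ++k) dot += a[k][i] * a[k][j];
            r[i][j] = dot;
            for (std::size_t k = 0; k < m; ++k) a[k][j] -= dot * a[k][i];
        }
        double norm = 0.0;
        for (std::size_t k = 0; k < m; ++k) norm += a[k][j] * a[k][j];
        norm = std::sqrt(norm);
        assert(norm > 0.0);  // full column rank is assumed
        r[j][j] = norm;
        for (std::size_t k = 0; k < m; ++k) a[k][j] /= norm;
    }
    return r;
}

// TSQR, R factor only: factor each row block, stack the small R factors,
// then factor the stack.
Matrix tsqr_r(const Matrix& a, std::size_t block_rows) {
    assert(!a.empty() && block_rows >= a.front().size());
    assert(a.size() % block_rows == 0);  // simplifying assumption
    Matrix stacked;
    for (std::size_t start = 0; start < a.size(); start += block_rows) {
        const auto first = a.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(block_rows);
        const Matrix r = r_factor(Matrix(first, last));
        stacked.insert(stacked.end(), r.begin(), r.end());
    }
    return r_factor(stacked);
}

// A^T A, used only to check a result: R^T R should match A^T A.
Matrix gram(const Matrix& a) {
    const std::size_t n = a.front().size();
    Matrix g(n, std::vector<double>(n, 0.0));
    for (const auto& row : a)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) g[i][j] += row[i] * row[j];
    return g;
}
```

Because every block is factored on its own, the first stage could run in parallel or stream blocks from disk; only the stacked R factors, which are small when the matrix is tall and skinny, need to be held together for the final step.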
C++ infrastructure
- Extended the C++ backend with classes and routines for managing HDF5
files, groups, and datasets.
- Exposed reusable block-wise computational infrastructure through the
package headers for developers implementing new Rcpp-based methods.
- Improved integration between the R interface and the C++ backend for
matrix operations, decompositions, and HDF5 resource management.
- Added or improved C++ implementations used by the R/S3 interface for
matrix algebra, decompositions, transformations, and HDF5-backed
statistical operations.
Documentation and examples
- Reworked the main package vignette around the HDF5Matrix interface and
standard R methods.
- Improved the package-level help page opened by help("BigDataStatMeth").
- Cleaned examples to use temporary files and avoid writing to the
user’s working directory.
- Removed obsolete vignette material and outdated installation
examples.
- Moved example data to the package extdata directory.
- Improved documentation consistency for HDF5 terminology and package
usage.
Bug fixes and reliability
- Fixed error propagation in matrix multiplication and crossproduct
paths.
- Fixed signed integer overflow in Cholesky block-size computation:
minimumBlockSize promoted from int to long in
Cholesky_decomposition_intermediate_hdf5 and
Inverse_of_Cholesky_decomposition_intermediate_hdf5 to prevent overflow
before promotion to double for sqrt(). Detected by gcc-ASAN on R-hub.
- Fixed an index bug in the block SVD nzeros threshold loop
(matrixSvdBlock.hpp): the wrong singular value was evaluated in the
rank-truncation check, causing incorrect nzeros counts.
- Fixed thread-safety crash in multi-level block SVD (H5SL_insert from
H5I_register) when the number of hierarchical levels q >= 2:
Next_level_SvdBlock_decomposition_hdf5 is now sequential; the outer
block loop already provides sufficient parallelism.
- Fixed nev parameter not being applied to per-block truncation in
First_level_SvdBlock_decomposition_hdf5: the parameter was declared in
the function signature but never used, causing unnecessarily large
intermediate matrices and a final SVD over more components than
requested.
- Fixed several HDF5 pointer, handle, and finalizer edge cases.
- Fixed portability issues related to platform-specific memory
queries.
- Fixed several edge cases in diagonal operations, matrix-vector
operations, sparse multiplication internals, and SVD/PCA parameter
handling.
- Improved cleanup of HDF5 handles during package unloading.
- Improved behavior when repeatedly creating, overwriting, closing,
and reopening HDF5-backed datasets.
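The signed integer overflow entry in this list describes a common C++ pitfall: an expression built from int operands is evaluated in int, so it can overflow before the result is converted to double for sqrt(). The sketch below shows the general pattern and one way to avoid it. It does not reproduce the package's code: the function names and the rows * cols expression are hypothetical, int is assumed to be 32 bits wide, and std::int64_t is used here as a fixed-width stand-in for the wider type.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Problematic pattern (kept as a comment on purpose): the product is computed
// in int and may overflow before the conversion to double takes place.
//   double side = std::sqrt(rows * cols);

// Widen one operand first, so that the multiplication itself is 64-bit.
double sqrt_of_product(int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    const std::int64_t product = static_cast<std::int64_t>(rows) * cols;
    return std::sqrt(static_cast<double>(product));
}

// Reports whether rows * cols would exceed the range of int.
bool int_product_would_overflow(int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    const std::int64_t product = static_cast<std::int64_t>(rows) * cols;
    return product > std::numeric_limits<int>::max();
}
```

With rows = cols = 50000 the product is 2.5e9, which does not fit in a 32-bit int, while sqrt_of_product(50000, 50000) still returns 50000.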
BigDataStatMeth 1.0.3
- Minor documentation and example updates.
- Version previously available on CRAN before the 2.0.0 redesign.
BigDataStatMeth 1.0.2
Major changes
- Reduced example execution time for CRAN compliance.
- Fixed the bdblockmult_hdf5 example, which now runs in under 5 seconds.
Bug fixes
- Fixed Makevars configuration issue causing compilation warnings
BigDataStatMeth 1.0.1
CRAN Resubmission Fixes
- Thread management: The package now respects the
OMP_THREAD_LIMIT environment variable
- Default threads: Limited to maximum 2 threads on
CRAN servers to prevent excessive CPU usage
- Documentation: Replaced all Unicode characters with
proper LaTeX macros
- HTML validation: Corrected invalid HTML tags in all
documentation files
- ATLAS compatibility: Tested with standard
BLAS/LAPACK configurations
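One way to honour OMP_THREAD_LIMIT when choosing a thread count is sketched below. This is an illustration under stated assumptions, not the package's logic: the function names are hypothetical, the hardware thread count is passed in by the caller, and any additional cap (such as the limit of 2 threads mentioned above) would be supplied through the requested argument, since how the package detects a CRAN server is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Parse a positive integer; returns 0 when the text is unset or invalid.
int parse_positive(const char* text) {
    if (text == nullptr || *text == '\0') return 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value <= 0 || value > 1048576L) return 0;
    return static_cast<int>(value);
}

// Pick a thread count that never exceeds the hardware count or the limit.
// requested < 1 means "no preference".
int resolve_threads(int requested, int hardware, const char* limit_text) {
    assert(hardware >= 1);
    int threads = (requested >= 1) ? std::min(requested, hardware) : hardware;
    const int limit = parse_positive(limit_text);
    if (limit > 0) threads = std::min(threads, limit);
    return threads;
}

// Convenience wrapper that reads the limit from the environment.
int threads_from_environment(int requested, int hardware) {
    return resolve_threads(requested, hardware, std::getenv("OMP_THREAD_LIMIT"));
}
```

Keeping the parsing and clamping in resolve_threads, separate from the std::getenv call, makes the rule easy to check: a request for 8 threads on a 16-thread machine with the limit set to "2" resolves to 2, and an unset or malformed limit leaves the request unchanged.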
Bug Fixes
- Fixed thread safety in parallel HDF5 operations
- Improved file locking mechanisms for concurrent access
- Corrected dimension handling in transposed operations
- Enhanced error messages for invalid inputs
Documentation Improvements
- Updated all function examples with proper mathematical notation
- Added comprehensive CRAN submission notes
- Improved vignette with clearer explanations
- Fixed formatting issues in Rd files
BigDataStatMeth 1.0.0
Major changes
- Complete rewrite of the package after the archived version
0.99.32.
- New block-wise computing framework for large matrices stored in
HDF5.
- New C++ backend integrated with R through Rcpp and Rhdf5lib.
- API redesigned; not backwards compatible with previous
versions.
- Package now focuses on providing scalable building blocks to develop
new statistical methods for large datasets.
New features
- Block-wise matrix multiplication for in-memory and HDF5-backed
matrices.
- Block-wise SVD and PCA implementations.
- Block-wise QR decomposition.
- Block-wise crossproduct and matrix operations using HDF5
storage.
- Canonical Correlation Analysis (CCA) for HDF5 matrices via
bdCCA_hdf5_rcpp().
- Improved HDF5 import utilities, including support for large text
files.
- Support for parallel computation (OpenMP) when available.
- Low memory footprint for large matrices exceeding system RAM.
Improvements
- Substantial performance improvements in matrix operations on HDF5
data.
- More robust dimension handling and HDF5 metadata management.
- Unified interface for in-memory and on-disk data.
- Better error handling and validation on input dimensions and block
sizes.
Removed
- Old implementations from the 0.99.x series have been removed.
- Deprecated functions and APIs have been replaced by the new
block-wise framework.