Modern machine learning imputation algorithms (like
missForest) excel at minimizing point-wise prediction error
(RMSE). However, an RMSE-optimal prediction approximates the conditional
mean of each missing value and omits its residual scatter, so this
point-wise optimization inherently shrinks the variance of the imputed
values, causing structural variance collapse. In longitudinal Growth
Curve Models (GCM), this collapse deflates the latent slope variance
(\(\sigma^2_S\)), eroding the statistical power needed to track patient
trajectories over time.
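The shrinkage can be reproduced on simulated data. The sketch below is an illustration under stated assumptions, not the package's code: it uses base R only, substitutes plain regression imputation via lm() for missForest, and uses the variance of per-subject OLS slopes as a rough proxy for \(\sigma^2_S\). All variable names and simulation settings are hypothetical.

```r
set.seed(42)
n  <- 2000
tt <- 0:3                                   # measurement occasions for T1..T4

# Linear growth: random intercepts and slopes plus occasion-level noise
a <- rnorm(n, mean = 10, sd = 2)
b <- rnorm(n, mean = 1,  sd = 1)
full <- outer(a, rep(1, 4)) + outer(b, tt) + matrix(rnorm(n * 4), n, 4)
colnames(full) <- c("T1", "T2", "T3", "T4")
full <- as.data.frame(full)

# Remove half of the T4 values completely at random
miss <- sample(n, n / 2)
obs  <- full
obs$T4[miss] <- NA

# Conditional-mean (regression) imputation, a stand-in for an
# RMSE-minimising imputer; lm() drops rows with missing T4 by default
fit <- lm(T4 ~ T1 + T2 + T3, data = obs)
imp <- obs
imp$T4[miss] <- predict(fit, newdata = obs[miss, ])

# A per-subject OLS slope is a fixed linear combination of T1..T4
w <- (tt - mean(tt)) / sum((tt - mean(tt))^2)
slope_full <- drop(as.matrix(full) %*% w)
slope_imp  <- drop(as.matrix(imp)  %*% w)

# Slope variance before missingness versus after imputation
c(full = var(slope_full), imputed = var(slope_imp))
```

In this setup the imputed slope variance comes out smaller than the complete-data value, because the imputed T4 entries carry no residual scatter. That is the variance collapse described above.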
The smriti package resolves this by decoupling
prediction from structural geometry. It uses a two-stage
architecture:
1. Initialization: Non-parametric imputation fills the missing entries to produce a dense matrix.
2. Lagrangian Projection: A C++ gradient descent layer moves the imputed data toward a target covariance structure while preserving fidelity to the initial imputed values.
The augmented loss
function is
\[L(X) = \frac{1}{2}\|X - X_{\text{imp}}\|_F^2 + \frac{\lambda}{2}\|\operatorname{cov}(X) - \Sigma_{\text{target}}\|_F^2\]
where the first term anchors the solution near the initial imputation and the second (governed by \(\lambda\)) enforces the covariance structure.
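To make the loss concrete, here is a minimal base-R sketch of plain gradient descent on \(L(X)\). It is not the package's C++ layer. The names sketch_loss and sketch_project are hypothetical, and the sketch updates every cell of the matrix, since the text does not say whether observed cells are held fixed. With \(X_c\) the column-centred matrix and \(n\) rows, the gradient used is \((X - X_{\text{imp}}) + \frac{2\lambda}{n-1} X_c\,(\operatorname{cov}(X) - \Sigma_{\text{target}})\).

```r
# Loss from the text:
# 0.5 * ||X - X_imp||_F^2 + 0.5 * lambda * ||cov(X) - Sigma||_F^2
sketch_loss <- function(X, X_imp, Sigma, lambda) {
  0.5 * sum((X - X_imp)^2) + 0.5 * lambda * sum((cov(X) - Sigma)^2)
}

# Hypothetical helper: fixed-step gradient descent started at X_imp
sketch_project <- function(X_imp, Sigma, lambda = 1.0,
                           learning_rate = 0.001, max_iter = 2000) {
  X <- X_imp
  n <- nrow(X)
  for (i in seq_len(max_iter)) {
    Xc   <- sweep(X, 2, colMeans(X))          # column-centred copy of X
    D    <- cov(X) - Sigma                    # covariance mismatch
    grad <- (X - X_imp) + (2 * lambda / (n - 1)) * Xc %*% D
    X    <- X - learning_rate * grad
  }
  X
}

# Toy example: the target is the starting covariance inflated by 50%
set.seed(1)
X_imp <- matrix(rnorm(50 * 4), 50, 4)
Sigma <- 1.5 * cov(X_imp)
X_new <- sketch_project(X_imp, Sigma, lambda = 10)

loss_before <- sketch_loss(X_imp, X_imp, Sigma, lambda = 10)
loss_after  <- sketch_loss(X_new, X_imp, Sigma, lambda = 10)
dist_before <- sqrt(sum((cov(X_imp) - Sigma)^2))
dist_after  <- sqrt(sum((cov(X_new) - Sigma)^2))
```

The small sample size and lambda = 10 are chosen only so that the movement toward the target is visible on a toy matrix. Comparing dist_before with dist_after shows how far the covariance moved, and the change in the first loss term shows how much fidelity was given up for it.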
Real-world clinical data often contain heavy-tailed skew or
corrupted sensor artifacts. The smriti_impute() function
handles this via the robust routing toggle:
robust = FALSE: Uses the pairwise-complete Pearson covariance, projected to the nearest positive-semidefinite (PSD) matrix to correct any non-PSD artifacts from pairwise deletion. Best for well-behaved, approximately Normal data.
robust = TRUE: Constructs the target from pairwise Spearman correlations (rank-based, outlier-resistant) and column-wise MAD scale estimates. The resulting matrix is projected to the nearest PSD matrix, producing a target that is robust to severe outliers (e.g., broken EHR sensors).
The penalty weight lambda controls the trade-off between
preserving the original imputation values and matching the target
covariance. At lambda = 1.0 (the default) both terms of the loss
carry equal weight. Increasing lambda enforces the
covariance constraint more strictly but allows greater deviation from
the initial imputation. The learning_rate (default
0.001) sets the gradient step size; max_iter
(default 2000) caps the number of optimization iterations.
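The two target-covariance routes behind the robust toggle can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: target_cov and nearest_psd are hypothetical helpers, the PSD step is done by clipping negative eigenvalues (the Frobenius-nearest PSD matrix for a symmetric input), and the MAD uses R's default 1.4826 scaling. The simulated data and the injected outlier are made up for the example.

```r
# Hypothetical helper: clip negative eigenvalues to obtain the nearest
# (Frobenius-norm) positive-semidefinite matrix
nearest_psd <- function(S) {
  S <- (S + t(S)) / 2                         # enforce exact symmetry
  e <- eigen(S, symmetric = TRUE)
  out <- e$vectors %*% (pmax(e$values, 0) * t(e$vectors))
  dimnames(out) <- dimnames(S)
  out
}

# Hypothetical helper mirroring the two routes of the robust toggle
target_cov <- function(X, robust = FALSE) {
  if (robust) {
    R <- cor(X, method = "spearman", use = "pairwise.complete.obs")
    s <- apply(X, 2, mad, na.rm = TRUE)       # column-wise MAD scale
    S <- outer(s, s) * R                      # rescale correlations to covariances
  } else {
    S <- cov(X, use = "pairwise.complete.obs")
  }
  nearest_psd(S)
}

# Simulated wide data with missing cells and one broken-sensor reading
set.seed(7)
z <- rnorm(200)
X <- sapply(1:4, function(k) z + rnorm(200))
colnames(X) <- c("T1", "T2", "T3", "T4")
X[sample(length(X), 80)] <- NA
X[which(!is.na(X[, "T3"]))[1], "T3"] <- 1000

tgt_plain  <- target_cov(X, robust = FALSE)
tgt_robust <- target_cov(X, robust = TRUE)
c(pearson_var_T3 = tgt_plain["T3", "T3"], robust_var_T3 = tgt_robust["T3", "T3"])
```

A single extreme reading dominates the Pearson variance of T3, while the MAD-based entry stays close to the scale of the remaining observations.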
library(smriti)
library(missForest)
# Load clinical data with structural missingness and sensor artifacts
data <- read.csv("clinical_proxy.csv")
# Execute robust refinement to isolate the structural manifold
clean_data <- smriti_impute(
  data = data,
  time_cols = c("T1", "T2", "T3", "T4"),
  robust = TRUE,
  lambda = 1.0
)