Introduction to smriti: Structural Variance Preservation

The Imputation Uncertainty Principle

Modern machine learning imputation algorithms such as missForest excel at minimizing point-wise prediction error (RMSE). However, point-wise optimization pulls each imputed value toward its conditional mean, which shrinks the variance of the imputed values and causes structural variance collapse. In longitudinal Growth Curve Models (GCM), this deflates the latent slope variance (\(\sigma^2_S\)) and erodes the statistical power needed to detect differences in patient trajectories over time.
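The shrinkage described above can be reproduced with base R alone. The following is a minimal sketch under stated assumptions: the data are simulated, the missingness is completely at random in T4 only, and a linear regression stands in for a point-wise imputer such as missForest. None of the object names come from smriti.

```r
set.seed(1)
n <- 2000
times <- 0:3

# Simulated growth curves: random intercept, random slope, measurement noise
intercept <- rnorm(n, mean = 10, sd = 1)
slope <- rnorm(n, mean = 1, sd = 0.5)
y <- sapply(times, function(t) intercept + slope * t + rnorm(n, sd = 1))
colnames(y) <- c("T1", "T2", "T3", "T4")
y <- as.data.frame(y)

# Remove half of the T4 values completely at random
miss <- runif(n) < 0.5

# Point-wise imputation: predict T4 from T1..T3 (conditional mean, no noise)
fit <- lm(T4 ~ T1 + T2 + T3, data = y[!miss, ])
y_imp <- y
y_imp$T4[miss] <- predict(fit, newdata = y[miss, ])

# Per-subject OLS slope across the four time points
ols_slope <- function(d) {
  tc <- times - mean(times)
  as.vector(as.matrix(d) %*% tc / sum(tc^2))
}

var_true <- var(ols_slope(y))      # slope variance with the true T4 values
var_imp <- var(ols_slope(y_imp))   # slope variance after imputation
c(true = var_true, imputed = var_imp)
```

Because the imputed T4 values carry no residual noise, the variance of the per-subject slopes computed from the imputed data falls below the variance computed from the complete data.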

The smriti package addresses this by decoupling prediction from structural geometry. It uses a two-stage architecture:

1. Initialization: Non-parametric imputation fills in the missing values to establish a dense matrix.
2. Lagrangian Projection: A C++ gradient descent layer moves the imputed values toward a target covariance manifold, with the strength of the constraint set by a Lagrangian multiplier (\(\lambda\)).
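The two stages can be illustrated in plain R. This is a sketch only, not the smriti implementation: the helper project_to_covariance() is hypothetical, the objective (squared distance to the initial imputation plus \(\lambda\) times the squared Frobenius distance between the sample covariance and the target) is an assumed form, mean imputation stands in for the non-parametric initializer, and a simple backtracking rule replaces whatever step-size control the C++ backend uses.

```r
# Hypothetical helper, not part of the smriti API.
# Gradient descent on the imputed cells only; observed cells stay fixed.
project_to_covariance <- function(x0, miss, target, lambda = 0.5,
                                  step = 0.1 * nrow(x0), iters = 200) {
  n <- nrow(x0)
  objective <- function(x) {
    0.5 * sum((x - x0)^2) / n + lambda * sum((cov(x) - target)^2)
  }
  x <- x0
  for (i in seq_len(iters)) {
    xc <- sweep(x, 2, colMeans(x))  # column-centered copy
    # Gradient of the fidelity term plus gradient of the covariance penalty
    grad <- (x - x0) / n + lambda * (4 / (n - 1)) * xc %*% (cov(x) - target)
    grad[!miss] <- 0                # never move observed values
    candidate <- x - step * grad
    # Backtracking: halve the step while the objective would increase
    while (objective(candidate) > objective(x) && step > 1e-8) {
      step <- step / 2
      candidate <- x - step * grad
    }
    if (objective(candidate) > objective(x)) break
    x <- candidate
  }
  x
}

set.seed(42)
n <- 200
p <- 4
x_full <- matrix(rnorm(n * p), n, p) %*% chol(0.5 + 0.5 * diag(p))
miss <- matrix(runif(n * p) < 0.3, n, p)
x_na <- x_full
x_na[miss] <- NA

# Target covariance estimated from the observed cells
target <- cov(x_na, use = "pairwise.complete.obs")

# Stage 1 (stand-in): column-mean imputation gives a dense matrix
x0 <- x_na
for (j in seq_len(p)) x0[miss[, j], j] <- mean(x_na[, j], na.rm = TRUE)

# Stage 2: pull the imputed cells toward the target covariance
x1 <- project_to_covariance(x0, miss, target, lambda = 0.5)

gap <- function(x) sqrt(sum((cov(x) - target)^2))
gap_before <- gap(x0)
gap_after <- gap(x1)
c(before = gap_before, after = gap_after)
```

In this formulation a larger lambda weights the covariance target more heavily relative to staying close to the initial imputation.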

The Robustness-Efficiency Tradeoff

Real-world clinical data often contain heavy-tailed skew or corrupted sensor artifacts. The smriti_impute() function handles both cases through its robust argument, which selects how the target covariance is estimated.

robust = FALSE: Uses the standard pairwise-complete covariance. Suited to approximately Normal data or to naturally heavy-tailed biological distributions (e.g., Lognormal structural neuroimaging measures), where extreme values are genuine signal rather than artifacts.
robust = TRUE: Uses the Minimum Covariance Determinant (MCD) estimator, which fits the covariance to the most concentrated subset of the observations. The resulting target manifold is highly resistant to severe clinical outliers (e.g., broken EHR sensors).
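The difference between the two estimators can be seen outside of smriti. The sketch below uses simulated data with an artificial sensor offset and relies on cov.rob() from the MASS package (shipped with R) as one available MCD implementation; which MCD routine smriti calls internally is not stated here, so treat this as an illustration of the estimators rather than of the package internals.

```r
library(MASS)  # cov.rob() provides an MCD estimator

set.seed(7)
n <- 300
clean <- matrix(rnorm(n * 2), n, 2)

# Simulate a broken sensor: 10% of the second variable is shifted by +50
contaminated <- clean
bad <- 1:30
contaminated[bad, 2] <- contaminated[bad, 2] + 50

# Classical estimate (what robust = FALSE is described as using)
classical <- cov(contaminated, use = "pairwise.complete.obs")

# MCD estimate (what robust = TRUE is described as using)
robust <- cov.rob(contaminated, method = "mcd")$cov

c(classical = classical[2, 2], mcd = robust[2, 2], clean = var(clean[, 2]))
```

The classical variance of the second variable is inflated by the shifted readings, while the MCD estimate stays close to the variance of the uncontaminated data. The cost is reduced efficiency when the data are clean, which is the tradeoff named in the section heading.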

Core Implementation: Handling Gradient Explosion

To prevent gradient explosion in the C++ backend when projecting high-magnitude clinical markers (e.g., Hippocampal volumes \(\approx 7000\)), smriti enforces internal Z-score standardization. The data are scaled to \(\mu=0, \sigma^2=1\) prior to Lagrangian optimization and transformed back to their original units upon convergence, which keeps gradient magnitudes on a comparable scale across markers.
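The standardize, optimize, back-transform round trip can be sketched with base R's scale(). The variable names and simulated values below are illustrative assumptions, and the optimization step is left as a placeholder comment; smriti performs the equivalent steps internally.

```r
set.seed(3)
x <- cbind(
  hippocampus = rnorm(100, mean = 7000, sd = 800),  # high-magnitude marker
  score = rnorm(100, mean = 25, sd = 4)             # low-magnitude marker
)

# Standardize each column to mean 0 and variance 1
z <- scale(x)
mu <- attr(z, "scaled:center")
sigma <- attr(z, "scaled:scale")

# ... the covariance projection would operate on z here ...

# Back-transform to the original units
restored <- sweep(sweep(z, 2, sigma, "*"), 2, mu, "+")

all.equal(restored, x, check.attributes = FALSE)
```

On the raw scale, the variance of the hippocampal marker is several orders of magnitude larger than that of the score, so a covariance-based gradient would be dominated by that one column. After scaling, both columns contribute on the same footing.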

Example: Shielding Against Corrupted EHR Data

library(smriti)
library(missForest)

# Load clinical data with structural missingness and sensor artifacts
data <- read.csv("clinical_proxy.csv")

# Execute robust refinement to isolate the structural manifold
clean_data <- smriti_impute(
  data = data, 
  time_cols = c("T1", "T2", "T3", "T4"), 
  robust = TRUE,
  lambda = 0.5
)
