Introduction to smriti: Structural Variance Preservation

The Imputation Uncertainty Principle

Modern machine learning imputation algorithms (like missForest) excel at minimizing point-wise prediction error (RMSE). However, this point-wise optimization inherently shrinks the variance of the imputed values, causing structural variance collapse. In longitudinal Growth Curve Models (GCM), this crushes the latent slope variance (\(\sigma^2_S\)), destroying the statistical power needed to track patient trajectories over time.

The smriti package resolves this by decoupling prediction from structural geometry. It utilizes a two-stage architecture: 1. Initialization: Non-parametric imputation bridges the missingness to establish a dense matrix. 2. Lagrangian Projection: A C++ gradient descent layer forces the hallucinated data onto a target covariance manifold, constrained by a Lagrangian multiplier (\(\lambda\)).

The Robustness-Efficiency Tradeoff

Real-world clinical data often contains heavy-tailed skew or corrupted sensor artifacts. The smriti_impute() function handles this via the robust routing toggle.

Core Implementation: Handling Gradient Explosion

To prevent gradient explosion in the C++ backend when projecting high-magnitude clinical markers (e.g., Hippocampal volumes \(\approx 7000\)), smriti enforces internal Z-score standardization. The data is scaled to \(\mu=0, \sigma^2=1\) prior to Lagrangian optimization, and un-scaled upon convergence, ensuring absolute numerical stability.

Example: Shielding Against Corrupted EHR Data

library(smriti)
library(missForest)

# Load clinical data with structural missingness and sensor artifacts
data <- read.csv("clinical_proxy.csv")

# Execute robust refinement to isolate the structural manifold
clean_data <- smriti_impute(
  data = data, 
  time_cols = c("T1", "T2", "T3", "T4"), 
  robust = TRUE,
  lambda = 0.5
)