llmimpute provides missing data imputation through two complementary engines: an LLM engine that calls the Anthropic API, and an offline engine with 19 built-in methods.
The package automatically selects the appropriate engine at runtime.
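The selection rule described in this document (offline fallback when no API key is set, or when `offline = TRUE` is passed) can be pictured with a small base-R sketch. `choose_engine()` is a hypothetical helper written for illustration only; it is not part of llmimpute, and the package's real logic may differ.

```r
# Hypothetical helper, NOT part of llmimpute: mimics the documented rule that
# the offline engine is used when offline = TRUE or when no API key is set.
choose_engine <- function(offline = FALSE,
                          key = Sys.getenv("ANTHROPIC_API_KEY")) {
  if (isTRUE(offline) || !nzchar(key)) "offline" else "llm"
}

choose_engine(key = "")                                  # no key available
choose_engine(key = "sk-ant-placeholder")                # key available
choose_engine(offline = TRUE, key = "sk-ant-placeholder") # offline forced
```

Passing the key as an argument keeps the sketch easy to try without touching the session environment.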
# Install from CRAN
install.packages("llmimpute")
library(llmimpute)
# Example dataset with missing values
df <- data.frame(
age = c(45L, NA, 38L, 62L, 29L),
bp = c(130, 140, 120, 155, NA),
smoker = c("No", "Yes", "No", NA, "No"),
stringsAsFactors = FALSE
)
# 1. Diagnose missingness (no API call)
lmi_diagnose(df)
# 2. Impute — offline fallback used automatically when no API key is set
result <- lmi_impute(df)
# 3. Access results
result$data # imputed data frame
result$imputations # audit trail with confidence scores and reasoning
summary(result) # per-column statistics
# 4. Export to disk
lmi_export(result, path = tempdir(), prefix = "my_study")

When operating offline, lmi_impute() delegates to
lmi_impute_offline(). You can also call it directly.
# List all 19 available offline methods
lmi_methods()
# Use a specific method
result_rf <- lmi_impute(df, offline = TRUE, offline_method = "random_forest")
result_si <- lmi_impute(df, offline = TRUE, offline_method = "softimpute")
result_br <- lmi_impute(df, offline = TRUE, offline_method = "bayesian_ridge")
# Let the package choose per column (default)
result_auto <- lmi_impute(df)

The "auto" selector chooses the best algorithm per
column based on:
LLM mode requires a valid Anthropic API key obtained from
https://console.anthropic.com. Store the key in
.Renviron:
ANTHROPIC_API_KEY=sk-ant-api03-...
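To confirm that a key stored this way is visible to R, a base-R check along the following lines may help. This is a sketch: it uses a temporary file and a placeholder key rather than the real `~/.Renviron`, and it restores whatever value the session had before.

```r
# Remember the current value so the session can be restored afterwards.
old_key <- Sys.getenv("ANTHROPIC_API_KEY", unset = NA)

# Write a throwaway env file; the key below is a placeholder, not a credential.
env_file <- tempfile(fileext = ".Renviron")
writeLines("ANTHROPIC_API_KEY=sk-ant-placeholder", env_file)

# readRenviron() loads NAME=value pairs into the running session.
readRenviron(env_file)

# TRUE when a non-empty key is visible to the session.
key_is_set <- nzchar(Sys.getenv("ANTHROPIC_API_KEY"))
key_is_set

# Restore the previous state of the session.
if (is.na(old_key)) {
  Sys.unsetenv("ANTHROPIC_API_KEY")
} else {
  Sys.setenv(ANTHROPIC_API_KEY = old_key)
}
unlink(env_file)
```

With a real setup, `readRenviron("~/.Renviron")` reloads the file without restarting R.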
library(llmimpute)
# Set key for this session (reads ANTHROPIC_API_KEY from environment)
lmi_set_api_key()
# Impute with domain context
result <- lmi_impute(df, domain = "healthcare")
# Flag anomalous existing values in addition to imputing
result2 <- lmi_impute(df, domain = "healthcare", flag_suspicious = TRUE)
result2$suspicious # data.frame of flagged cells

The domain argument guides the LLM’s reasoning:

| Value | Use when |
|---|---|
| "general" | Mixed or unknown data |
| "healthcare" | Medical records, clinical data |
| "financial" | Economic indicators, transactions |
| "hr" | Employee records, HR data |
| "survey" | Questionnaire or Likert-scale data |
| "scientific" | Lab measurements, research data |

# See available models
lmi_models()
# Higher capability (slower, more expensive)
lmi_set_model("claude-opus-4-20250514")
# Faster and cheaper
lmi_set_model("claude-haiku-4-5-20251001")

Every imputed cell is recorded in
result$imputations:
head(result$imputations)
# row col original imputed confidence reasoning
# 1 2 age NA 45 72 knn ...
# 2 5 bp NA 130 68 mean ...

The confidence column (0–100) reflects how reliably the
method can estimate the missing value given the available data. Filter
low-confidence imputations before downstream modelling:
high_conf <- result$imputations[result$imputations$confidence >= 70, ]

For data frames with more than 50 rows, lmi_impute()
automatically chunks the data for LLM calls. Adjust
max_rows to balance API rate limits against context
quality:
result <- lmi_impute(big_df, domain = "financial", max_rows = 30L,
                     verbose = TRUE)

The offline engine processes all rows in a single pass without chunking.
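The effect of `max_rows` can be illustrated with a base-R sketch of row chunking. `chunk_rows()` is a hypothetical helper, not a function from llmimpute, and the package may split data differently. The chunk size of 50 and the 120-row data frame are chosen purely for illustration.

```r
# Hypothetical helper, NOT part of llmimpute: split row indices into
# consecutive chunks of at most max_rows rows.
chunk_rows <- function(n, max_rows = 50L) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / max_rows))
}

big_df <- data.frame(x = seq_len(120))   # stand-in data with 120 rows
chunks <- chunk_rows(nrow(big_df), max_rows = 50L)
lengths(chunks)                          # number of rows in each chunk

# Materialise each chunk as its own data frame.
pieces <- lapply(chunks, function(i) big_df[i, , drop = FALSE])
```

Smaller chunks mean more, shorter requests; larger chunks give each request more rows of context.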
- Run lmi_diagnose() first: it is free, instant, and shows the
  full missing map.
- […] seed.
- Save the lmi_result object with
  lmi_export(..., format = "rds") to preserve the full audit
  trail alongside the imputed data.
- […] "softimpute" or "random_forest" offer
  excellent accuracy without API costs.
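For a quick look at missingness without any package, a base-R summary along these lines gives a rough picture. It is a sketch, not a reproduction of what lmi_diagnose() reports, and it reuses the example data frame from the quick start.

```r
df <- data.frame(
  age    = c(45L, NA, 38L, 62L, 29L),
  bp     = c(130, 140, 120, 155, NA),
  smoker = c("No", "Yes", "No", NA, "No"),
  stringsAsFactors = FALSE
)

# Count and percentage of missing cells per column.
n_missing   <- colSums(is.na(df))
pct_missing <- round(100 * colMeans(is.na(df)), 1)
data.frame(column = names(df), n_missing, pct_missing, row.names = NULL)

# Indices of rows that contain at least one missing value.
incomplete_rows <- which(!complete.cases(df))
incomplete_rows
```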