The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
llm-join is an R package designed to use Large Language Models (LLMs, such as GPT-5, Claude, DeepSeek, etc.) for fuzzy joining of dataframes. When the key columns of two dataframes have spelling differences, are in different languages, or cannot be matched exactly, llm-join can automatically generate prompts and utilize LLMs to assist in high-quality joining.
You can install the development version of llmjoin from GitHub with:
devtools::install_github("evanliu3594/llmjoin")Please note that all information is stored strictly locally in your system configuration (run
tools::R_user_dir("llmjoin", "config")to see the full path), and is never uploaded or shared.
library(llmjoin)
# OpenAI
set_llm(provider = "openai", key = "your-api-key")
# Claude (Anthropic)
set_llm(provider = "claude", key = "your-api-key")
# Gemini (via OpenAI-compatible endpoint)
set_llm(provider = "gemini", key = "your-api-key")
# Custom endpoint (Ollama, DeepSeek, Kimi, Grok, Qwen, etc.)
set_llm(provider = "openai",
url = "https://your-custom-endpoint/v1/chat/completions",
key = "your-api-key",
model = "your-model-name")Below examples used Deepseek-V4-Flash.
Match numeric month codes to month names — the LLM understands that “01” means January.
x <- data.frame(id = c("01", "02", "04"), value = c(10, 20, 40))
y <- data.frame(month = c("January", "Feb", "May"), amount = c(100, 200, 400))
llm_join(x, y, key1 = "id", key2 = "month")
# month id value amount
# 1 Feb 02 20 200
# 2 January 01 10 100
# 3 <NA> 04 40 NAMatch approximate or differently-formatted numeric identifiers — the LLM handles rounding, unit conversion, and format differences.
left <- data.frame(
weight_kg = c(1.0, 2.5, 5.0),
product = c("Widget", "Gadget", "Thing")
)
right <- data.frame(
weight_lb = c("2.2 lb", "5.5 lb", "11 lb"),
price = c(4.99, 9.99, 19.99)
)
llm_join(left, right, key1 = "weight_kg", key2 = "weight_lb")
# weight_lb weight_kg product price
# 1 11 lb 5.0 Thing 19.99
# 2 2.2 lb 1.0 Widget 4.99
# 3 5.5 lb 2.5 Gadget 9.99Match country names to ISO codes — the LLM bridges different naming conventions, languages, and abbreviations.
left <- data.frame(
country = c("China", "United States", "Germany", "日本"),
sales = c(1500, 3200, 2100, 800)
)
right <- data.frame(
code = c("CN", "US", "DE", "JP"),
region = c("Asia", "Americas", "Europe", "Asia")
)
llm_join(left, right, key1 = "country", key2 = "code")
# code country sales region
# 1 CN China 1500 Asia
# 2 DE Germany 2100 Europe
# 3 JP 日本 800 Asia
# 4 US United States 3200 Americasx <- data.frame(id = c("01", "02", "04"), value = c(10, 20, 40))
y <- data.frame(month = c("January", "Feb", "May"), amount = c(100, 200, 400))
joint_prompt(unique(x["id"]), unique(y["month"])) |> writeClipboard()Paste the prompts to ask your LLM model, and copy the answer, going back to R and continue run:
joint <- parse_joint(readr::clipboard(), key1 = "id", key2 = "month")
Reduce(\(x, y) merge(x, y, all.x = TRUE), list(x, joint, y))
# month id value amount
# 1 Feb 02 20 200
# 2 January 01 10 100
# 3 <NA> 04 40 NAMIT License
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.