The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Find out why your keys don’t match.
You ran a left join and lost 40% of your rows. dplyr says “many-to-many relationship.” joinspy says 12 keys have trailing spaces, 8 differ only by case, and 3 contain invisible Unicode characters. Then it fixes them.
library(joinspy)
join_spy(orders, customers, by = "customer_id")
repaired <- join_repair(orders, customers, by = "customer_id")
suggest_repairs(join_spy(orders, customers, by = "customer_id"))Most join failures come down to string-level problems in keys:
"Alice" vs "Alice " (trailing space,
invisible)"NYC" vs "nyc" (case)"Johansson" vs "Johannson" (one character
off)NAR won’t warn you about any of these. join_spy() catches
them before the join runs.
join_spy() examines keys before the join:
orders <- data.frame(
id = c("A", "B ", "c", "D"),
amount = c(100, 200, 300, 400),
stringsAsFactors = FALSE
)
customers <- data.frame(
id = c("A", "B", "C", "E"),
name = c("Alice", "Bob", "Carol", "Eve"),
stringsAsFactors = FALSE
)
join_spy(orders, customers, by = "id")
#> -- Join Diagnostic Report --
#> Match rate (left): 25%
#>
#> Issues Detected:
#> ! "B " has trailing whitespace (would match "B")
#> ! "c" vs "C" — case mismatch
#> x "D" has no match in right tablejoin_repair() fixes the issues, or previews what it
would change with dry_run = TRUE.
suggest_repairs() prints the R code instead of running
it.
join_repair(orders, customers, by = "id", dry_run = TRUE)
repaired <- join_repair(orders, customers, by = "id",
standardize_case = "upper")
suggest_repairs(join_spy(orders, customers, by = "id"))
#> x$id <- trimws(x$id)
#> x$id <- toupper(x$id)
#> y$id <- toupper(y$id)join_spy() also estimates result size for each join
type:
report <- join_spy(orders, customers, by = "id")
report$expected_rows
#> inner_join: 1
#> left_join: 4
#> right_join: 4
#> full_join: 7join_explain() works after the join, on the result:
result <- merge(orders, customers, by = "id", all.x = TRUE)
join_explain(result, orders, customers, by = "id", type = "left")
#> Result has same row count as left table
#> ! 3 left key(s) have no match in right tableThe package ships join wrappers (left_join_spy(),
inner_join_spy(), etc.) that run diagnostics before joining
and attach the report as an attribute. join_strict()
enforces cardinality (1:1, 1:m,
m:1, m:m) and errors on violation.
check_cartesian() flags many-to-many keys that would
multiply your row count. analyze_join_chain() handles
multi-step A-B-C sequences.
Joins work with tibbles, data.tables, and plain data frames.
# Install from CRAN
install.packages("joinspy")
# Or install development version from GitHub
# install.packages("pak")
pak::pak("gcol33/joinspy")| Package | Focus |
|---|---|
| dplyr 1.1+ | Cardinality checks via relationship argument |
| powerjoin | 12-level configurable checks, key preprocessing |
| joyn | Match-status reporting variable per row |
| tidylog | Logs row count changes after joins |
joinspy focuses on string-level key diagnostics: whitespace, case, encoding, typos, and type mismatches. It identifies which specific keys failed, why, and can fix them automatically.
“Software is like sex: it’s better when it’s free.” — Linus Torvalds
I’m a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.
If this package saved you some time, buying me a coffee is a nice way to say thanks.
MIT (see the LICENSE.md file)
@software{joinspy,
author = {Colling, Gilles},
title = {joinspy: Diagnostic Tools for Data Frame Joins},
year = {2025},
url = {https://github.com/gcol33/joinspy}
}These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.