The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

joinspy

CRAN status CRAN downloads Monthly downloads R-CMD-check Codecov test coverage License: MIT

Find out why your keys don’t match.

You ran a left join and lost 40% of your rows. dplyr says “many-to-many relationship.” joinspy says 12 keys have trailing spaces, 8 differ only by case, and 3 contain invisible Unicode characters. Then it fixes them.

Quick Start

library(joinspy)

join_spy(orders, customers, by = "customer_id")

repaired <- join_repair(orders, customers, by = "customer_id")

suggest_repairs(join_spy(orders, customers, by = "customer_id"))

The Problem

Most join failures come down to string-level problems in keys:

R won’t warn you about any of these. join_spy() catches them before the join runs.

What joinspy Does

Diagnose

join_spy() examines keys before the join:

orders <- data.frame(
  id = c("A", "B ", "c", "D"),
  amount = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

customers <- data.frame(
  id = c("A", "B", "C", "E"),
  name = c("Alice", "Bob", "Carol", "Eve"),
  stringsAsFactors = FALSE
)

join_spy(orders, customers, by = "id")
#> -- Join Diagnostic Report --
#> Match rate (left): 25%
#>
#> Issues Detected:
#>   ! "B " has trailing whitespace (would match "B")
#>   ! "c" vs "C" — case mismatch
#>   x "D" has no match in right table

Repair

join_repair() fixes the issues, or previews what it would change with dry_run = TRUE. suggest_repairs() prints the R code instead of running it.

join_repair(orders, customers, by = "id", dry_run = TRUE)

repaired <- join_repair(orders, customers, by = "id",
                        standardize_case = "upper")

suggest_repairs(join_spy(orders, customers, by = "id"))
#> x$id <- trimws(x$id)
#> x$id <- toupper(x$id)
#> y$id <- toupper(y$id)

Predict

join_spy() also estimates result size for each join type:

report <- join_spy(orders, customers, by = "id")
report$expected_rows
#> inner_join: 1
#> left_join:  4
#> right_join: 4
#> full_join:  7

Explain

join_explain() works after the join, on the result:

result <- merge(orders, customers, by = "id", all.x = TRUE)
join_explain(result, orders, customers, by = "id", type = "left")
#> Result has same row count as left table
#> ! 3 left key(s) have no match in right table

Also Includes

The package ships join wrappers (left_join_spy(), inner_join_spy(), etc.) that run diagnostics before joining and attach the report as an attribute. join_strict() enforces cardinality (1:1, 1:m, m:1, m:m) and errors on violation. check_cartesian() flags many-to-many keys that would multiply your row count. analyze_join_chain() handles multi-step A-B-C sequences.

Joins work with tibbles, data.tables, and plain data frames.

Installation

# Install from CRAN
install.packages("joinspy")

# Or install development version from GitHub
# install.packages("pak")
pak::pak("gcol33/joinspy")

Documentation

Package Focus
dplyr 1.1+ Cardinality checks via relationship argument
powerjoin 12-level configurable checks, key preprocessing
joyn Match-status reporting variable per row
tidylog Logs row count changes after joins

joinspy focuses on string-level key diagnostics: whitespace, case, encoding, typos, and type mismatches. It identifies which specific keys failed, why, and can fix them automatically.

Support

“Software is like sex: it’s better when it’s free.” — Linus Torvalds

I’m a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.

If this package saved you some time, buying me a coffee is a nice way to say thanks.

Buy Me A Coffee

License

MIT (see the LICENSE.md file)

Citation

@software{joinspy,
  author = {Colling, Gilles},
  title = {joinspy: Diagnostic Tools for Data Frame Joins},
  year = {2025},
  url = {https://github.com/gcol33/joinspy}
}

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.