The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
joinspy provides diagnostic tools for understanding and debugging data frame joins in R. Instead of discovering join problems after the fact, joinspy helps you identify issues upfront and understand what will happen when you join.
Key features:
join_spy() and
key_check()key_duplicates()join_explain() and
join_diff()left_join_spy(),
right_join_spy(), inner_join_spy(),
full_join_spy()join_strict()join_repair() and
suggest_repairs()plot() and summary()
methodsjoinspy analyzes join keys before you join, detecting:
join_spy()# Sample data with issues
orders <- data.frame(
customer_id = c("A", "B", "B", "C", "D "),
amount = c(100, 200, 150, 300, 50),
stringsAsFactors = FALSE
)
customers <- data.frame(
customer_id = c("A", "B", "C", "D", "E"),
name = c("Alice", "Bob", "Carol", "David", "Eve"),
stringsAsFactors = FALSE
)
# Get diagnostic report
report <- join_spy(orders, customers, by = "customer_id")Use summary() for a compact data frame of metrics:
summary(report)
#> metric value
#> 1 left_rows 5.00
#> 2 right_rows 5.00
#> 3 left_unique_keys 4.00
#> 4 right_unique_keys 5.00
#> 5 keys_matched 3.00
#> 6 keys_left_only 1.00
#> 7 keys_right_only 2.00
#> 8 match_rate 0.75
#> 9 issues 2.00
#> 10 inner_join_rows 4.00
#> 11 left_join_rows 5.00
#> 12 right_join_rows 6.00
#> 13 full_join_rows 7.00key_check()For a fast pass/fail assessment:
Use *_join_spy() functions for joins with automatic
diagnostics:
orders_clean <- data.frame(
customer_id = c("A", "B", "C"),
amount = c(100, 200, 300),
stringsAsFactors = FALSE
)
# Silent mode for pipelines
result <- left_join_spy(orders_clean, customers, by = "customer_id", .quiet = TRUE)
head(result)
#> customer_id amount name
#> 1 A 100 Alice
#> 2 B 200 Bob
#> 3 C 300 Carol
# Access diagnostics afterward
last_report()$match_analysis$match_rate
#> [1] 1Available wrappers:
| Function | Description |
|---|---|
left_join_spy() |
Left join with diagnostics |
right_join_spy() |
Right join with diagnostics |
inner_join_spy() |
Inner join with diagnostics |
full_join_spy() |
Full join with diagnostics |
Use join_strict() to enforce expected relationships:
products <- data.frame(id = 1:3, name = c("Widget", "Gadget", "Gizmo"))
prices <- data.frame(id = 1:3, price = c(10, 20, 30))
# Succeeds - 1:1 relationship
join_strict(products, prices, by = "id", expect = "1:1")
#> id name price
#> 1 1 Widget 10
#> 2 2 Gadget 20
#> 3 3 Gizmo 30# Fails - duplicates violate 1:1
prices_dup <- data.frame(id = c(1, 1, 2, 3), price = c(10, 15, 20, 30))
join_strict(products, prices_dup, by = "id", expect = "1:1")
#> Error:
#> ! Cardinality violation: expected 1:1 but found 1:m
#> Left duplicates: 0, Right duplicates: 1| Cardinality | Description |
|---|---|
"1:1" |
Each key unique in both tables |
"1:m" |
Left keys unique, right may duplicate |
"m:1" |
Right keys unique, left may duplicate |
"m:m" |
No constraints |
Fix common issues automatically:
messy <- data.frame(
id = c(" A", "B ", " C "),
value = 1:3,
stringsAsFactors = FALSE
)
# Preview repairs
join_repair(messy, by = "id", dry_run = TRUE)
#>
#> ── Repair Preview (Dry Run) ────────────────────────────────────────────────────
#>
#> ── Left table (x) ──
#>
#> ℹ id: trimmed whitespace (3)
# Apply repairs
fixed <- join_repair(messy, by = "id")
#> ✔ Repaired 3 value(s)
fixed$id
#> [1] "A" "B" "C"orders_dup <- data.frame(id = 1:3, product = c("A", "B", "C"))
inventory <- data.frame(id = c(1, 1, 2, 3), location = c("NY", "LA", "NY", "LA"))
result <- merge(orders_dup, inventory, by = "id")
join_explain(result, orders_dup, inventory, by = "id", type = "inner")
#>
#> ── Join Explanation ────────────────────────────────────────────────────────────
#>
#> ── Row Counts ──
#>
#> Left table (x): 3 rows
#> Right table (y): 4 rows
#> Result: 4 rows
#> ! Result has 1 more rows than left table
#>
#> ── Why the row count changed ──
#>
#> ℹ Right table has 1 duplicate key(s) - each match creates multiple rowsbefore <- data.frame(id = 1:3, val = c("a", "b", "c"))
after <- merge(before, data.frame(id = 2:4, name = c("B", "C", "D")), by = "id", all = TRUE)
join_diff(before, after, by = "id")
#>
#> ── Join Diff ───────────────────────────────────────────────────────────────────
#>
#> ── Dimensions ──
#>
#> Before: 3 rows x 2 columns
#> After: 4 rows x 3 columns
#> Change: "+1" rows, "+1" columns
#>
#> ── Column Changes ──
#>
#> ✔ Added: name
#>
#> ── Key Analysis ──
#>
#> Unique keys before: 3
#> Unique keys after: 4orders <- data.frame(id = 1:5, val = 1:5)
customers <- data.frame(id = 3:7, name = letters[3:7])
report <- join_spy(orders, customers, by = "id")
plot(report) # Venn diagramComprehensive pre-join diagnostic report.
| Parameter | Description | Default |
|---|---|---|
x |
Left data frame | required |
y |
Right data frame | required |
by |
Column name(s) to join by | required |
sample |
Sample size for large datasets | NULL |
Returns: JoinReport S3 object with
print(), summary(), and plot()
methods.
Quick pass/fail key quality assessment.
Returns: Logical TRUE if keys are
clean, FALSE otherwise (with messages).
Join with cardinality enforcement.
| Parameter | Description | Default |
|---|---|---|
type |
Join type: "left", "right",
"inner", "full" |
"left" |
expect |
Expected cardinality: "1:1", "1:m",
"m:1", "m:m" |
"1:1" |
Returns: Joined data frame, or error if cardinality violated.
Automatically fix common key issues.
join_repair(x, y = NULL, by, trim_whitespace = TRUE,
standardize_case = NULL, remove_invisible = TRUE,
empty_to_na = FALSE, dry_run = FALSE)| Parameter | Description | Default |
|---|---|---|
trim_whitespace |
Remove leading/trailing whitespace | TRUE |
standardize_case |
Convert to "lower" or "upper" |
NULL |
remove_invisible |
Remove invisible Unicode characters | TRUE |
empty_to_na |
Convert "" to NA |
FALSE |
dry_run |
Preview changes without modifying | FALSE |
Returns: Repaired data frame(s).
vignette("common-issues") - Catalogue of join problems
and solutions?join_spy, ?key_check,
?join_strict, ?join_repairsessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Europe/Luxembourg
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] joinspy_0.7.3
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.55
#> [5] glue_1.8.0 cachem_1.1.0 knitr_1.51 htmltools_0.5.9
#> [9] rmarkdown_2.30 lifecycle_1.0.5 cli_3.6.5 vctrs_0.7.0
#> [13] sass_0.4.10 jquerylib_0.1.4 compiler_4.5.2 tools_4.5.2
#> [17] pillar_1.11.1 evaluate_1.0.5 bslib_0.9.0 yaml_2.3.12
#> [21] otel_0.2.0 rlang_1.1.7 jsonlite_2.0.0These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.