This vignette catalogs common join problems and shows how joinspy detects and resolves them. Each issue includes detection methods and recommended solutions.
Problem: When one or both tables have duplicate keys, joins multiply rows unexpectedly.
orders <- data.frame(
customer_id = c(1, 2, 2, 3),
amount = c(100, 50, 75, 200)
)
addresses <- data.frame(
customer_id = c(1, 2, 2, 3),
address = c("NYC", "LA", "SF", "Chicago")
)
join_spy(orders, addresses, by = "customer_id")
#>
#> ── Join Diagnostic Report ──────────────────────────────────────────────────────
#> Join columns: customer_id
#>
#> ── Table Summary ──
#>
#> Left table: Rows: 4 Unique keys: 3 Duplicate keys: 1 NA keys: 0
#> Right table: Rows: 4 Unique keys: 3 Duplicate keys: 1 NA keys: 0
#>
#> ── Match Analysis ──
#>
#> Keys in both: 3
#> Keys only in left: 0
#> Keys only in right: 0
#> Match rate (left): "100%"
#>
#> ── Issues Detected ──
#>
#> ! Left table has 1 duplicate key(s) affecting 2 rows - may cause row multiplication
#> ! Right table has 1 duplicate key(s) affecting 2 rows - may cause row multiplication
#>
#> ── Expected Row Counts ──
#>
#> inner_join: 6
#> left_join: 6
#> right_join: 6
#> full_join: 6
Detection: join_spy() reports duplicate counts and the expected row multiplication.
Solution: Aggregate or filter duplicates before joining.
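For example, you can collapse each side to one row per key before joining. A minimal sketch with dplyr (the summary column names are illustrative, not part of joinspy):

library(dplyr)

# Collapse each table to one row per customer_id before joining
orders_agg <- orders |>
  group_by(customer_id) |>
  summarise(total_amount = sum(amount), .groups = "drop")

addresses_first <- addresses |>
  group_by(customer_id) |>
  slice(1) |>   # keep the first address per customer
  ungroup()

left_join(orders_agg, addresses_first, by = "customer_id")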
Problem: Invisible leading/trailing spaces prevent matches.
sales <- data.frame(
product = c("Widget", "Gadget ", " Gizmo"),
units = c(10, 20, 30),
stringsAsFactors = FALSE
)
inventory <- data.frame(
product = c("Widget", "Gadget", "Gizmo"),
stock = c(100, 200, 300),
stringsAsFactors = FALSE
)
join_spy(sales, inventory, by = "product")
#>
#> ── Join Diagnostic Report ──────────────────────────────────────────────────────
#> Join columns: product
#>
#> ── Table Summary ──
#>
#> Left table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#> Right table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#>
#> ── Match Analysis ──
#>
#> Keys in both: 1
#> Keys only in left: 2
#> Keys only in right: 2
#> Match rate (left): "33.3%"
#>
#> ── Issues Detected ──
#>
#> ! Left column 'product' has 2 value(s) with leading/trailing whitespace
#> ℹ 2 near-match(es) found (e.g., 'Gadget ' ~ 'Gadget', ' Gizmo' ~ 'Gizmo') - possible typos?
#>
#> ── Expected Row Counts ──
#>
#> inner_join: 1
#> left_join: 3
#> right_join: 3
#> full_join: 5
Detection: join_spy() flags whitespace issues in the Issues section.
Solution: Use join_repair() or trimws().
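A quick base-R fix is to trim the key column before joining (join_repair() is the package-level alternative; its arguments are not shown in this example):

# Strip leading/trailing whitespace from the key column
sales$product <- trimws(sales$product)

dplyr::left_join(sales, inventory, by = "product")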
Problem: Keys differ only by case (“ABC” vs “abc”).
left <- data.frame(
code = c("ABC", "def", "GHI"),
value = 1:3,
stringsAsFactors = FALSE
)
right <- data.frame(
code = c("abc", "DEF", "ghi"),
label = c("A", "D", "G"),
stringsAsFactors = FALSE
)
join_spy(left, right, by = "code")
#>
#> ── Join Diagnostic Report ──────────────────────────────────────────────────────
#> Join columns: code
#>
#> ── Table Summary ──
#>
#> Left table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#> Right table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#>
#> ── Match Analysis ──
#>
#> Keys in both: 0
#> Keys only in left: 3
#> Keys only in right: 3
#> Match rate (left): "0%"
#>
#> ── Issues Detected ──
#>
#> ! 3 key(s) would match if case-insensitive (e.g., 'ABC' vs 'abc')
#>
#> ── Expected Row Counts ──
#>
#> inner_join: 0
#> left_join: 3
#> right_join: 3
#> full_join: 6
Detection: join_spy() detects case mismatches when keys would match if compared case-insensitively.
Solution: Standardize case with join_repair().
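A base-R equivalent is to lower-case both key columns yourself before joining (the summary table below also lists join_repair(standardize_case=) as the package route):

# Standardize case on both key columns, then join
left$code  <- tolower(left$code)
right$code <- tolower(right$code)

dplyr::inner_join(left, right, by = "code")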
Problem: NA values in key columns never match (by design in R).
orders <- data.frame(
customer_id = c(1, NA, 3, NA),
amount = c(100, 200, 300, 400)
)
customers <- data.frame(
customer_id = c(1, 2, 3, NA),
name = c("Alice", "Bob", "Carol", "Unknown")
)
join_spy(orders, customers, by = "customer_id")
#>
#> ── Join Diagnostic Report ──────────────────────────────────────────────────────
#> Join columns: customer_id
#>
#> ── Table Summary ──
#>
#> Left table: Rows: 4 Unique keys: 2 Duplicate keys: 0 NA keys: 2
#> Right table: Rows: 4 Unique keys: 3 Duplicate keys: 0 NA keys: 1
#>
#> ── Match Analysis ──
#>
#> Keys in both: 2
#> Keys only in left: 0
#> Keys only in right: 1
#> Match rate (left): "100%"
#>
#> ── Issues Detected ──
#>
#> ! Left table has 2 NA key(s) - these will not match
#> ! Right table has 1 NA key(s) - these will not match
#>
#> ── Expected Row Counts ──
#>
#> inner_join: 2
#> left_join: 4
#> right_join: 4
#> full_join: 6
Detection: join_spy() reports NA counts in the Table Summary.
Solution: Handle NA values explicitly: remove them or replace them with a placeholder.
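For instance, you might drop rows with missing keys, or recode NA to a sentinel value that both tables agree on (the sentinel below is illustrative):

# Option 1: drop rows whose key is NA before joining
orders_complete <- orders[!is.na(orders$customer_id), ]

# Option 2: recode NA keys to an agreed placeholder so those rows survive the join
orders$customer_id[is.na(orders$customer_id)] <- -1
customers$customer_id[is.na(customers$customer_id)] <- -1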
Problem: Inner join returns zero rows when you expected matches.
system_a <- data.frame(
user_id = c("USR001", "USR002", "USR003"),
score = c(85, 90, 78),
stringsAsFactors = FALSE
)
system_b <- data.frame(
user_id = c("1", "2", "3"),
department = c("Sales", "Marketing", "Engineering"),
stringsAsFactors = FALSE
)
report <- join_spy(system_a, system_b, by = "user_id")
Detection: Match analysis shows a 0% match rate.
Solution: Create a mapping table or transform keys to a common format.
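Here system A's keys carry a "USR" prefix and zero padding, so one option is to derive a common key (the regular expression assumes that fixed format):

# Strip the "USR" prefix and leading zeros so both systems share one key format
system_a$user_id <- sub("^USR0*", "", system_a$user_id)

dplyr::inner_join(system_a, system_b, by = "user_id")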
Problem: Both tables have duplicate keys, causing exponential row growth.
order_items <- data.frame(
order_id = c(1, 1, 2, 2, 2),
item = c("A", "B", "C", "D", "E")
)
order_payments <- data.frame(
order_id = c(1, 1, 2, 2),
payment = c("CC1", "CC2", "Cash", "Check")
)
report <- join_spy(order_items, order_payments, by = "order_id")
Detection: Expected row counts show multiplication (inner join = 10 rows from 9 source rows).
Solution: Aggregate one table first, or use check_cartesian().
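A common fix is to collapse one side to a single row per key before joining, for example by combining the payments for each order (a sketch with dplyr; check_cartesian()'s interface is not shown here):

library(dplyr)

# Collapse payments to one row per order_id before joining
payments_per_order <- order_payments |>
  group_by(order_id) |>
  summarise(payments = paste(payment, collapse = ", "), .groups = "drop")

# Each item row now matches at most one payment row
left_join(order_items, payments_per_order, by = "order_id")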
Problem: Keys have different types (numeric vs character).
orders <- data.frame(
product_id = c(1, 2, 3),
quantity = c(10, 20, 30)
)
products <- data.frame(
product_id = c("1", "2", "3"),
name = c("Widget", "Gadget", "Gizmo"),
stringsAsFactors = FALSE
)
join_spy(orders, products, by = "product_id")
#>
#> ── Join Diagnostic Report ──────────────────────────────────────────────────────
#> Join columns: product_id
#>
#> ── Table Summary ──
#>
#> Left table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#> Right table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#>
#> ── Match Analysis ──
#>
#> Keys in both: 3
#> Keys only in left: 0
#> Keys only in right: 0
#> Match rate (left): "100%"
#>
#> ── Issues Detected ──
#>
#> ! Type mismatch: 'product_id' is numeric, 'product_id' is character - may cause unexpected results
#>
#> ── Expected Row Counts ──
#>
#> inner_join: 3
#> left_join: 3
#> right_join: 3
#> full_join: 3
Detection: join_spy() flags type coercion warnings.
Solution: Convert to matching types before joining.
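For example, convert the character keys to numeric (or both sides to character) so the key columns share a type:

# Convert the right table's key to numeric to match the left table
products$product_id <- as.numeric(products$product_id)

dplyr::left_join(orders, products, by = "product_id")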
Problem: Empty strings ("") and
NA behave differently in joins.
left <- data.frame(
id = c("A", "", "C"),
value = 1:3,
stringsAsFactors = FALSE
)
right <- data.frame(
id = c("A", "B", ""),
label = c("Alpha", "Beta", "Empty"),
stringsAsFactors = FALSE
)
join_spy(left, right, by = "id")
#>
#> ── Join Diagnostic Report ──────────────────────────────────────────────────────
#> Join columns: id
#>
#> ── Table Summary ──
#>
#> Left table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#> Right table: Rows: 3 Unique keys: 3 Duplicate keys: 0 NA keys: 0
#>
#> ── Match Analysis ──
#>
#> Keys in both: 2
#> Keys only in left: 1
#> Keys only in right: 1
#> Match rate (left): "66.7%"
#>
#> ── Issues Detected ──
#>
#> ℹ Left column 'id' has 1 empty string(s) - these match other empty strings but not NA
#> ℹ Right column 'id' has 1 empty string(s) - these match other empty strings but not NA
#>
#> ── Expected Row Counts ──
#>
#> inner_join: NA
#> left_join: NA
#> right_join: NA
#> full_join: NA
Detection: join_spy() warns about empty strings in keys.
Solution: Convert empty strings to NA with join_repair().
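A base-R equivalent of that repair is to recode empty strings as NA on both sides (the summary table lists join_repair(empty_to_na=TRUE) as the package route):

# Recode empty-string keys to NA so they are treated as missing rather than matching each other
left$id[left$id == ""]   <- NA
right$id[right$id == ""] <- NA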
When you know the expected cardinality, use
join_strict() to fail fast:
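The code chunk that originally followed is not preserved here; a rough sketch of the idea is below, where the expect argument name is an assumption rather than the documented interface (see ?join_strict):

# Sketch only: the argument for declaring the expected relationship is assumed,
# consult ?join_strict for the actual interface
join_strict(orders, customers, by = "customer_id", expect = "1:1")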
Let joinspy determine the actual relationship:
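The original chunk is likewise missing; assuming detect_cardinality() takes the same table and by arguments as join_spy(), a call might look like:

# Assumed interface: reports whether the key relationship is 1:1, 1:m, m:1, or m:m
detect_cardinality(order_items, order_payments, by = "order_id")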
| Issue | Detection | Solution |
|---|---|---|
| Duplicates | join_spy(), key_duplicates() | Aggregate or filter |
| Whitespace | join_spy(), key_check() | join_repair(), trimws() |
| Case mismatch | join_spy() | join_repair(standardize_case=) |
| NA keys | join_spy() Table Summary | Remove or replace |
| No matches | join_spy() Match Analysis | Check key format/mapping |
| M:M explosion | join_spy(), check_cartesian() | Aggregate first |
| Type mismatch | join_spy() | Convert types |
| Empty strings | join_spy() | join_repair(empty_to_na=TRUE) |
- join_spy(x, y, by) to get a comprehensive diagnostic
- key_duplicates() to locate specific duplicate rows
- join_repair() to fix whitespace/case/encoding issues
- join_strict() to enforce expected cardinality
- join_explain() to understand row count changes

- vignette("introduction") - Getting started guide
- ?join_spy, ?key_check, ?join_repair, ?join_strict
- ?check_cartesian, ?detect_cardinality

sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Europe/Luxembourg
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] joinspy_0.7.3
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.55
#> [5] glue_1.8.0 cachem_1.1.0 knitr_1.51 htmltools_0.5.9
#> [9] rmarkdown_2.30 lifecycle_1.0.5 cli_3.6.5 vctrs_0.7.0
#> [13] sass_0.4.10 jquerylib_0.1.4 compiler_4.5.2 tools_4.5.2
#> [17] pillar_1.11.1 evaluate_1.0.5 bslib_0.9.0 yaml_2.3.12
#> [21] otel_0.2.0 rlang_1.1.7 jsonlite_2.0.0