Primary keys for data frames.
In databases, you declare customer_id as a primary key
and the database enforces uniqueness. With CSV and Excel files, you get
no such guarantees - duplicates slip in silently.
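The failure mode is easy to reproduce in base R. The sketch below uses a small made-up data frame (the values are illustrative and not taken from keyed) to show that nothing complains about a repeated customer_id until you check for it explicitly:

```r
# Illustrative data: customer_id 2 appears twice, as it might after a bad export.
customers <- data.frame(
  customer_id = c(1, 2, 2, 3),
  name = c("Alice", "Bob", "Bob", "Carol")
)

# anyDuplicated() returns the index of the first repeated value, or 0 if none.
first_dup <- anyDuplicated(customers$customer_id)

# duplicated() flags every repeat after the first occurrence.
dup_rows <- customers[duplicated(customers$customer_id), ]
```

Here the data frame is created without any warning; the duplicate only surfaces because the last two calls look for it.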
keyed brings database-style protections to R data frames through four features:
| Feature | What it does |
|---|---|
| Keys | Declare unique columns, enforced through transformations |
| Locks | Assert conditions (no NAs, row counts, coverage) |
| UUIDs | Track row identity through your pipeline |
| Commits | Snapshot data to detect drift |
# install.packages("pak")
pak::pak("gcol33/keyed")

Declare which columns must be unique - like a primary key in a database.
library(keyed)
# Declare the key (errors if not unique)
customers <- read.csv("customers.csv") |>
  key(customer_id)
# Composite keys work too
sales <- key(sales, region, year)

Keys follow your data through transformations:
# Base R
active <- customers[customers$status == "active", ]
has_key(active)
#> [1] TRUE
# dplyr (requires library(dplyr))
active <- customers |> filter(status == "active")
has_key(active)
#> [1] TRUE

Keys block operations that would break uniqueness:
customers |> mutate(customer_id = 1)
#> Error: Key is no longer unique after transformation.
#> i Use `unkey()` first if you intend to break uniqueness.
# To proceed, explicitly remove the key first
customers |> unkey() |> mutate(customer_id = 1)

Preview joins before running them:
diagnose_join(customers, orders, by = "customer_id")
#> Cardinality: one-to-many
#> customers: 1000 rows (unique)
#> orders: 5432 rows (4432 duplicates)
#> Left join will produce ~5432 rows

Assert conditions at checkpoints in your pipeline.
customers |>
  lock_unique(customer_id) |>  # Must be unique
  lock_no_na(email) |>         # No missing emails
  lock_nrow(min = 100)         # At least 100 rows

Locks error immediately if the condition fails - no silent continuation.
Available locks:
| Function | Checks |
|---|---|
| lock_unique(df, col) | No duplicate values |
| lock_no_na(df, col) | No missing values |
| lock_complete(df) | No NAs in any column |
| lock_coverage(df, threshold, col) | % non-NA above threshold |
| lock_nrow(df, min, max) | Row count in range |
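For intuition, checks like these can be approximated in base R. This is a sketch of the idea only, not keyed's implementation: the helper names check_no_na and check_coverage are hypothetical, and the threshold is assumed to be a proportion between 0 and 1.

```r
# Hypothetical helper: stop if the column has any NA, otherwise return df
# unchanged so the check can sit in the middle of a pipe.
check_no_na <- function(df, col) {
  n_missing <- sum(is.na(df[[col]]))
  if (n_missing > 0) {
    stop(sprintf("Column '%s' has %d missing value(s).", col, n_missing))
  }
  df
}

# Hypothetical helper: stop if the share of non-NA values is below threshold
# (assumed here to be a proportion between 0 and 1).
check_coverage <- function(df, threshold, col) {
  coverage <- mean(!is.na(df[[col]]))
  if (coverage < threshold) {
    stop(sprintf("Column '%s' coverage %.2f is below %.2f.", col, coverage, threshold))
  }
  df
}

# Illustrative data: one of four emails is missing.
df <- data.frame(id = 1:4, email = c("a@x.org", NA, "c@x.org", "d@x.org"))

# 3 of 4 emails are present, so a 0.5 threshold passes and df is returned.
checked <- check_coverage(df, 0.5, "email")

# The stricter check fails; tryCatch() captures the error message.
msg <- tryCatch(check_no_na(df, "email"), error = function(e) conditionMessage(e))
```

Returning the data frame unchanged on success is what lets a check of this kind be chained with `|>`, while `stop()` halts the pipeline at the first failed condition.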
When your data has no natural key, generate stable row identifiers.
# Add a UUID to each row
customers <- add_id(customers)
#> .id name
#> 1 a3f2c8e1b9d04567 Alice
#> 2 7b1e4a9c2f8d3601 Bob
#> 3 e9c7b2a1d4f80235 Carol

UUIDs survive all transformations:
filtered <- customers |> filter(name != "Bob")
get_id(filtered)
#> [1] "a3f2c8e1b9d04567" "e9c7b2a1d4f80235"

Track which rows were added or removed:
compare_ids(customers, filtered)
#> Lost: 1 row (7b1e4a9c2f8d3601)
#> Kept: 2 rows

UUIDs let you trace rows through joins, filters, and reshaping - essential for debugging data pipelines.
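Conceptually, comparing two sets of row identifiers comes down to set operations. The base R sketch below reuses the IDs from the example output above; it illustrates the idea and makes no claim about how compare_ids() is implemented.

```r
# IDs copied from the example output above.
before <- c("a3f2c8e1b9d04567", "7b1e4a9c2f8d3601", "e9c7b2a1d4f80235")
after  <- c("a3f2c8e1b9d04567", "e9c7b2a1d4f80235")

lost  <- setdiff(before, after)    # in before, missing from after
added <- setdiff(after, before)    # new in after
kept  <- intersect(before, after)  # present in both
```

Because the identifiers stay attached to rows, the lost row can be named exactly rather than inferred from a change in row count.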
Snapshot your data to detect unexpected changes later.
# Save a snapshot (stored in memory for this session)
customers <- customers |> commit_keyed()
# Work with your data...
customers <- customers |>
  filter(status == "active") |>
  mutate(score = score + 10)
# Check what changed since the commit
check_drift(customers)
#> Drift detected!
#> - Row count: 1000 -> 847 (-153)
#> - Column 'score' modified

How it works:
- Each data frame can have one snapshot attached
- Snapshots persist for your R session (lost on restart)
- check_drift() compares current state to the snapshot
- clear_snapshot() removes it, list_snapshots() shows all
Useful for catching unexpected changes during interactive analysis.
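The general idea of drift detection can be sketched in base R by keeping a copy of the data frame and comparing it with the current state. The helper describe_drift below is hypothetical and much simpler than whatever keyed does internally; it reports only the change in row count and which columns were added, removed, or modified.

```r
# Hypothetical helper: compare a saved copy of a data frame with its current state.
describe_drift <- function(snapshot, current) {
  shared <- intersect(names(snapshot), names(current))
  # Columns are compared value-by-value only when the row counts match.
  same_rows <- nrow(snapshot) == nrow(current)
  modified <- if (same_rows) {
    unchanged <- vapply(
      shared,
      function(col) identical(snapshot[[col]], current[[col]]),
      logical(1)
    )
    shared[!unchanged]
  } else {
    character(0)
  }
  list(
    row_change = nrow(current) - nrow(snapshot),
    added_cols = setdiff(names(current), names(snapshot)),
    removed_cols = setdiff(names(snapshot), names(current)),
    modified_cols = modified
  )
}

# Illustrative data: the score column is changed after the snapshot is taken.
snapshot <- data.frame(id = 1:3, score = c(10, 20, 30))
current <- snapshot
current$score <- current$score + 10

drift <- describe_drift(snapshot, current)
```

In this example `drift$modified_cols` contains only `"score"` and `drift$row_change` is zero, since no rows were added or dropped.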
| Need | Better Tool |
|---|---|
| Enforced schema | SQLite, DuckDB |
| Full data validation | pointblank, validate |
| Production pipelines | targets |
keyed gives you database-style protections without database infrastructure. It is aimed at exploratory workflows where SQLite is overkill but silent corruption is unacceptable.
MIT