tidyaudit

License: LGPL-3 | Lifecycle: experimental

Pipeline audit trails and data diagnostics for tidyverse workflows.

tidyaudit captures metadata-only snapshots at each step of a dplyr pipeline, building a structured audit report without storing the data itself. Operation-aware taps enrich snapshots with join match rates, filter drop statistics, and more. The package combines diagnostic tools for interactive development and production-oriented tools for data quality.

Installation

# Install the released version from CRAN
install.packages("tidyaudit")

# Install the development version from GitHub using `pak`
pak::pak("fpcordeiro/tidyaudit")

Quick Example

library(tidyaudit)
library(dplyr)
set.seed(123)

orders  <- data.frame(id = 1:100, amount = runif(100, 10, 500), region_id = sample(1:5, 100, TRUE))
regions <- data.frame(region_id = 1:4, name = c("North", "South", "East", "West"))

trail <- audit_trail("order_pipeline")

result <- orders |>
  audit_tap(trail, "raw") |>
  left_join_tap(regions, by = "region_id", .trail = trail, .label = "with_region") |>
  filter_tap(amount > 100, .trail = trail, .label = "high_value", .stat = amount)
#> ℹ filter_tap: amount > 100
#> Dropped 18 of 100 rows (18.0%)
#> Stat amount: dropped 1,062.191 of 25,429.39

print(trail)
#> ── Audit Trail: "order_pipeline" ─────────────────────────────────────────────────────────────────────
#> Created: 2026-02-21 14:36:35
#> Snapshots: 3
#> 
#>   #  Label        Rows  Cols  NAs  Type                                
#>   ─  ───────────  ────  ────  ───  ────────────────────────────────────
#>   1  raw           100     3    0  tap                                 
#>   2  with_region   100     4   23  left_join (many-to-one, 77% matched)
#>   3  high_value     82     4   20  filter (dropped 18 rows, 18%)       
#> 
#> Changes:
#>   raw → with_region: = rows, +1 cols, +23 NAs
#>   with_region → high_value: -18 rows, = cols, -3 NAs

audit_diff(trail, "raw", "high_value")
#> ── Audit Diff: "raw" → "high_value" ──
#> 
#>   Metric  Before  After  Delta
#>   ──────  ──────  ─────  ─────
#>   Rows       100     82    -18
#>   Cols         3      4     +1
#>   NAs          0     20    +20
#> 
#> ✔ Columns added: name
#> 
#> Numeric shifts (common columns):
#>     Column     Mean before  Mean after   Shift
#>     ─────────  ───────────  ──────────  ──────
#>     id               50.50       49.66   -0.84
#>     amount          254.29      297.16  +42.87
#>     region_id         3.08        3.05   -0.03
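
The join and filter statistics in the Quick Example can be cross-checked by hand. The following is a minimal base R sketch, not part of tidyaudit and not a description of how the package computes its numbers. It rebuilds the same `orders` and `regions` data and derives the match rate, the unmatched-row count, and the filter drop totals directly. The variable names (`matched`, `n_unmatched`, `keep`, and so on) are illustrative only. Assuming the data are regenerated identically, the values should line up with the figures reported in the trail above.

```r
# Rebuild the example data exactly as in the Quick Example
set.seed(123)
orders  <- data.frame(id = 1:100, amount = runif(100, 10, 500), region_id = sample(1:5, 100, TRUE))
regions <- data.frame(region_id = 1:4, name = c("North", "South", "East", "West"))

# Join check: which order rows have a matching key in `regions`?
matched     <- orders$region_id %in% regions$region_id
match_rate  <- mean(matched)   # share of left-hand rows with a match
n_unmatched <- sum(!matched)   # rows that receive NA for `name` after a left join

# Base R equivalent of the left join; keys in `regions` are unique,
# so the join is many-to-one and the row count is unchanged
joined <- merge(orders, regions, by = "region_id", all.x = TRUE)
n_na_name <- sum(is.na(joined$name))

# Filter check: rows and total `amount` removed by `amount > 100`
keep           <- orders$amount > 100
n_dropped      <- sum(!keep)
amount_dropped <- sum(orders$amount[!keep])
amount_total   <- sum(orders$amount)

# Mean shift of `amount` between the raw and filtered data
mean_before <- mean(orders$amount)
mean_after  <- mean(orders$amount[keep])

print(c(match_rate = match_rate, n_unmatched = n_unmatched, n_dropped = n_dropped))
print(c(amount_dropped = amount_dropped, amount_total = amount_total))
print(c(mean_before = mean_before, mean_after = mean_after))
```

A check like this is useful mainly as a sanity test when first adopting the taps: the unmatched rows come from `region_id` values that do not appear in `regions`, which is why the join adds NAs without changing the row count.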

Features

Audit trail system — the core innovation:

Diagnostic functions — tidyverse ports from dtaudit:

See vignette("tidyaudit") for the audit trail walkthrough and vignette("diagnostics") for the diagnostic functions guide.

Relationship to dtaudit

tidyaudit is a tidyverse-native sibling to dtaudit (a data.table-based package on CRAN). The two packages share design vocabulary and S3 class naming conventions but no code or dependencies.

License

LGPL (>= 3)
