dqcheckr

Automated data quality checks for recurring dataset deliveries.

For each new file arrival, dqcheckr runs a battery of quality checks, compares the file to the previous delivery, writes a self-contained HTML report, and records summary statistics in a local SQLite database so that quality trends can be tracked over time. Supports CSV and fixed-width formats. Custom organisation-specific checks can be supplied as plain R files.
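A delivery-to-delivery comparison boils down to a few checks: schema drift (added or dropped columns) and the change in row count. The minimal base-R sketch below illustrates those checks. The function `compare_deliveries()` and the fields it returns are hypothetical and are not part of dqcheckr's API. The comparisons the package actually performs are described in `vignette("dqcheckr")`.

```r
# Hypothetical helper, NOT part of dqcheckr's API: compares two deliveries
# held as data frames and reports schema drift plus the row-count change.
compare_deliveries <- function(previous, current) {
  prev_cols <- names(previous)
  curr_cols <- names(current)
  list(
    # Columns that appear in the new delivery but not in the previous one
    added_columns = setdiff(curr_cols, prev_cols),
    # Columns that were present before but are missing now
    dropped_columns = setdiff(prev_cols, curr_cols),
    # Row-count change as a percentage of the previous count
    # (NA when the previous delivery was empty, to avoid dividing by zero)
    row_change_pct = if (nrow(previous) == 0) NA_real_ else
      100 * (nrow(current) - nrow(previous)) / nrow(previous)
  )
}

# Two small illustrative deliveries: the new one gains a row and a column
previous <- data.frame(id = 1:4, balance = c(10, 20, 30, 40))
current  <- data.frame(id = 1:5, balance = c(10, 20, 30, 40, 50),
                       branch = c("A", "A", "B", "B", "C"))

cmp <- compare_deliveries(previous, current)
str(cmp)
```

In this example, the new delivery reports `branch` as an added column, no dropped columns, and a 25% increase in rows. A tool that stores these figures per snapshot can then chart them across deliveries.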

This is a CLI/API package with no graphical interface. If you'd rather configure and run checks without writing R code, see dqcheckrGUI, a Shiny front-end built on top of this package.

What it does

Installation

install.packages("dqcheckr")

# or, the development version from GitHub
devtools::install_github("mickmioduszewski/dqcheckr")

Usage

A data officer runs a single command for each arriving dataset:

library(dqcheckr)

run_dq_check("customer_accounts", config_dir = "path/to/configs")

This prints a one-line summary to the console, writes an HTML report, and invisibly returns a list with the elements status, report_path and snapshot_id.

Two YAML files control every run: a global dqcheckr.yml (default thresholds shared across datasets) and a per-dataset <dataset_name>.yml (file location, expected columns, column-level rules and overrides).
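One way to picture the relationship between the global defaults and the per-dataset overrides is as a recursive merge, where per-dataset values win and any defaults that are not overridden are kept. The base-R sketch below shows that idea using `utils::modifyList()` on plain lists. The key names (`thresholds`, `max_missing_pct`, `max_row_change_pct`, `file`) and the file path are hypothetical and are not dqcheckr's real schema. This is also an assumption about the merge semantics, not a description of how dqcheckr resolves its configuration. In practice, the lists would come from parsing the YAML files, for example with `yaml::read_yaml()`.

```r
# Illustrative only: the key names below are hypothetical, not dqcheckr's
# real configuration schema (see vignette("dqcheckr") for that).

# Defaults as they might be parsed from a global dqcheckr.yml
global_defaults <- list(
  thresholds = list(max_missing_pct = 5, max_row_change_pct = 20)
)

# Overrides as they might be parsed from customer_accounts.yml
dataset_config <- list(
  file = "incoming/customer_accounts.csv",
  thresholds = list(max_missing_pct = 1)
)

# modifyList() merges recursively: keys present in dataset_config replace
# the matching keys in global_defaults, and unmatched defaults are kept.
effective <- utils::modifyList(global_defaults, dataset_config)
str(effective)
```

With these inputs, the dataset tightens `max_missing_pct` to 1 and inherits the global `max_row_change_pct` of 20 unchanged.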

Learn more

See vignette("dqcheckr") for a full walkthrough of the configuration and the available checks, or visit the package documentation site.

License

MIT
