
DataDNA

DataDNA is an R package that gives every data frame a compact fingerprint, lineage match, and report-ready identity figure.

Instead of only asking “what is in this table?”, DataDNA asks:

The package is designed for analysts who receive CSVs, extracts, dashboards, or modeling data sets and need a fast way to recognize and compare them.

Example

library(DataDNA)

demo <- dna_example_customers()

dna <- data_dna(demo$customers_new, name = "customers_new")
dna

card <- dna_card(dna, file = "customers_dna.html")

dna_compare(demo$customers_old, demo$customers_new)
dna_diff(demo$customers_old, demo$customers_new)

dna_compare() combines exact schema overlap with shape, species, role structure, distribution, missingness, category, and identity signals, so the score behaves like a data fingerprint rather than a strict column-name check.
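To make the idea of blending several signals concrete, here is a minimal base R sketch that mixes just two of them: column-name overlap and per-column missingness. It is not DataDNA's scoring code. The helper names (schema_overlap, missingness_similarity, toy_similarity) and the equal weighting are assumptions made for illustration.

```r
# Hypothetical illustration only; NOT DataDNA's internal scoring.

# Jaccard index on column names: shared columns / all columns seen.
schema_overlap <- function(old, new) {
  all_cols <- union(names(old), names(new))
  if (length(all_cols) == 0) return(0)
  length(intersect(names(old), names(new))) / length(all_cols)
}

# Compare NA rates on shared columns; 1 means identical missingness.
missingness_similarity <- function(old, new) {
  shared <- intersect(names(old), names(new))
  if (length(shared) == 0) return(0)
  old_rate <- vapply(old[shared], function(x) mean(is.na(x)), numeric(1))
  new_rate <- vapply(new[shared], function(x) mean(is.na(x)), numeric(1))
  1 - mean(abs(old_rate - new_rate))
}

# Weighted blend of the two signals (the weight is an arbitrary choice).
toy_similarity <- function(old, new, w_schema = 0.5) {
  w_schema * schema_overlap(old, new) +
    (1 - w_schema) * missingness_similarity(old, new)
}

old <- data.frame(id = 1:4, spend = c(10, NA, 30, 40))
new <- data.frame(id = 1:4, spend = c(10, 20, 30, 40),
                  region = c("a", "b", "a", "b"))

toy_similarity(old, new)
```

A pure column-name check would see only that one column was added. The blended score also registers that the missing value in spend was filled in.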

The package also includes lazy-loaded customers_old and customers_new example data sets.

Find the closest ancestor

# Named dna_library rather than library to avoid reusing the name of base R's library()
dna_library <- list(
  customers_2024 = data_dna(customers_old),
  customers_2025 = data_dna(customers_new)
)

match <- dna_match(customers_new, dna_library)
match

dna_match_plot(match, file = "lineage.png")

dna_match_plot() is the recommended reporting output. It renders a static PNG or PDF lineage figure with base R graphics: a white background, a compact ranking table, and restrained similarity lines. The result fits technical reports, papers, and slide decks better than a web page does.

Core API

data_dna(df)
dna_card(df)
dna_compare(old_df, new_df)
dna_diff(old_df, new_df)
dna_match(new_df, dna_library)
dna_match_card(match)
dna_match_plot(match)
dna_species(df)

Installation

From GitHub:

install.packages("devtools")
devtools::install_github("TonyIsFool/DataDNA")

Or with the lighter remotes package:

install.packages("remotes")
remotes::install_github("TonyIsFool/DataDNA")

From a local source tarball:

install.packages("DataDNA_0.1.0.tar.gz", repos = NULL, type = "source")

Design

The profiling and comparison algorithms use base R. The HTML card uses the lightweight htmltools package so the result is portable and CRAN-friendly.
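As a rough picture of what dependency-free profiling can look like, the sketch below summarizes each column using base R only. The function profile_columns() is a hypothetical helper written for this example. It is not part of DataDNA and does not reproduce its fingerprint.

```r
# Hypothetical helper for illustration; not a DataDNA function.
# Builds a one-row-per-column summary using only base R.
profile_columns <- function(df) {
  data.frame(
    column       = names(df),
    type         = vapply(df, function(x) class(x)[1], character(1)),
    missing_rate = vapply(df, function(x) mean(is.na(x)), numeric(1)),
    n_unique     = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names    = NULL,
    stringsAsFactors = FALSE
  )
}

toy <- data.frame(id = 1:3, city = c("Bonn", NA, "Bonn"),
                  stringsAsFactors = FALSE)
profile <- profile_columns(toy)
profile
```

Note that unique() counts NA as its own value, so n_unique for city is 2 here.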
