The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette walks through a real comparative biology workflow using avian trait data and a phylogenetic tree. The same approach applies to any taxon group — mammals, fish, amphibians, plants — the functions are fully taxon-agnostic.
The typical workflow has four steps: load your data and tree, reconcile the names, produce aligned objects, and run your analysis.
data(avonet_subset) # AVONET morphological traits (Tobias et al. 2022)
data(tree_jetz) # Jetz et al. (2012) phylogeny, Corvoidea + allies
cat(sprintf("Data: %d species\n", nrow(avonet_subset)))
#> Data: 919 species
cat(sprintf("Tree: %d tips\n", ape::Ntip(tree_jetz)))
#> Tree: 657 tips
# The data uses spaces; the tree uses underscores
head(avonet_subset$Species1, 3)
#> [1] "Acanthiza apicalis" "Acanthiza chrysorrhoa" "Acanthiza cinerea"
head(tree_jetz$tip.label, 3)
#> [1] "Amytornis_barbatus" "Amytornis_merrotsyi" "Amytornis_dorotheae"These formatting differences — spaces vs underscores, minor spelling variants, taxonomic synonyms — are exactly what prepR4pcm resolves.
result <- reconcile_tree(
x = avonet_subset,
tree = tree_jetz,
x_species = "Species1",
authority = NULL # skip synonym lookup for speed
)
#> ℹ Reconciling 919 data names vs 657 tree tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ Matched 657/919 data names to tree tips
print(result)
#>
#> ── Reconciliation: data vs tree ────────────────────────────────────────────────
#> Source x: avonet_subset
#> Source y: phylo (657 tips)
#> Authority: none
#> Timestamp: 2026-06-20 12:44:20
#> ℹ Match coverage: [█████████████████████░░░░░░░░░] 71% (657/919)
#>
#> ── Match summary ──
#>
#> • Exact: 0 ( 0.0%)
#> • Normalized: 657 (71.5%)
#> • Synonym: 0 ( 0.0%)
#> • Fuzzy: 0 ( 0.0%)
#> • Manual: 0 ( 0.0%)
#> ! Unresolved (x only):262 (28.5%)
#> ! Unresolved (y only):0
#> ! Flagged for review: 0
#> ℹ Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.The reconciliation object records every name-matching decision. Inspect the mapping to see what happened:
mapping <- reconcile_mapping(result)
# Match type breakdown
table(mapping$match_type)
#>
#> normalized unresolved
#> 657 262
# Show normalised matches (formatting differences resolved automatically)
norm <- mapping[mapping$match_type == "normalized",
c("name_x", "name_y", "notes")]
if (nrow(norm) > 0) head(norm, 5)
#> # A tibble: 5 × 3
#> name_x name_y notes
#> <chr> <chr> <chr>
#> 1 Acanthiza apicalis Acanthiza_apicalis 'Acanthiza apicalis' normalised t…
#> 2 Acanthiza chrysorrhoa Acanthiza_chrysorrhoa 'Acanthiza chrysorrhoa' normalise…
#> 3 Acanthiza ewingii Acanthiza_ewingii 'Acanthiza ewingii' normalised to…
#> 4 Acanthiza inornata Acanthiza_inornata 'Acanthiza inornata' normalised t…
#> 5 Acanthiza iredalei Acanthiza_iredalei 'Acanthiza iredalei' normalised t…
# Unresolved: in data but not in tree
unresolved <- mapping[mapping$match_type == "unresolved" & mapping$in_x, ]
cat(sprintf("\nSpecies in data but not in tree: %d\n", nrow(unresolved)))
#>
#> Species in data but not in tree: 262For a detailed report:
Drop unresolved species to get a matched data frame and tree, ready for comparative analysis:
aligned <- reconcile_apply(
result,
data = avonet_subset,
tree = tree_jetz,
species_col = "Species1",
drop_unresolved = TRUE
)
#> ! Dropped 262 rows with unresolved species from data
#> ℹ Tree has 657 tips after alignment
cat(sprintf("Aligned data: %d rows\nAligned tree: %d tips\n",
nrow(aligned$data), ape::Ntip(aligned$tree)))
#> Aligned data: 657 rows
#> Aligned tree: 657 tipsWith aligned data and tree, you are ready for any phylogenetic comparative method. Here are two common approaches.
PGLS accounts for shared evolutionary history when estimating regression parameters:
library(caper)
# reconcile_apply() aligns names so data$Species1 matches tree tip labels
cd <- comparative.data(aligned$tree, aligned$data,
names.col = "Species1", vcv = TRUE)
# PGLS: body mass ~ wing length
model_pgls <- pgls(log(Mass) ~ log(Wing.Length), data = cd)
summary(model_pgls)
#>
#> Call:
#> pgls(formula = log(Mass) ~ log(Wing.Length), data = cd)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.53264 -0.04823 -0.00130 0.04332 0.50941
#>
#> Branch length transformations:
#>
#> kappa [Fix] : 1.000
#> lambda [Fix] : 1.000
#> delta [Fix] : 1.000
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -5.781917 0.381775 -15.145 < 2.2e-16 ***
#> log(Wing.Length) 2.054361 0.065135 31.540 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.09582 on 651 degrees of freedom
#> Multiple R-squared: 0.6044, Adjusted R-squared: 0.6038
#> F-statistic: 994.8 on 1 and 651 DF, p-value: < 2.2e-16When you need random effects beyond phylogeny or want a Bayesian framework, use a PGLMM. The MCMCglmm package fits Bayesian phylogenetic mixed models:
library(MCMCglmm)
# Species column as the phylogenetic grouping factor
aligned$data$phylo <- aligned$data$Species1
# Inverse phylogenetic covariance matrix
# Replace any zero-length branches (can arise after pruning)
tree_mcmc <- aligned$tree
tree_mcmc$edge.length[tree_mcmc$edge.length < .Machine$double.eps] <- 1e-6
inv_phylo <- inverseA(tree_mcmc, nodes = "ALL", scale = FALSE)
# PGLMM: continuous response
prior <- list(R = list(V = 1, nu = 0.002),
G = list(G1 = list(V = 1, nu = 0.002)))
model_mcmc <- MCMCglmm(
log(Mass) ~ log(Wing.Length) + Trophic.Level,
random = ~phylo,
family = "gaussian",
ginverse = list(phylo = inv_phylo$Ainv),
data = aligned$data,
prior = prior,
nitt = 50000, burnin = 10000, thin = 20,
verbose = FALSE
)summary(model_mcmc)
#>
#> Iterations = 10001:49981
#> Thinning interval = 20
#> Sample size = 2000
#>
#> DIC: -178.2905
#>
#> G-structure: ~phylo
#>
#> post.mean l-95% CI u-95% CI eff.samp
#> phylo 0.001687 0.00125 0.002144 1485
#>
#> R-structure: ~units
#>
#> post.mean l-95% CI u-95% CI eff.samp
#> units 0.03147 0.02651 0.03729 2017
#>
#> Location effects: log(Mass) ~ log(Wing.Length) + Trophic.Level
#>
#> post.mean l-95% CI u-95% CI eff.samp pMCMC
#> (Intercept) -6.22726 -6.73301 -5.72926 2000 <5e-04 ***
#> log(Wing.Length) 2.14957 2.04337 2.24649 2000 <5e-04 ***
#> Trophic.LevelHerbivore 0.02731 -0.05546 0.10551 2000 0.539
#> Trophic.LevelOmnivore 0.03818 -0.02799 0.10404 2000 0.264
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1For categorical responses (e.g., migration status with multiple categories), see Mizuno et al. (2025, J. Evol. Biol. 38:1699–1715) for the multinomial PGLMM approach and the accompanying tutorial at https://ayumi-495.github.io/multinomial-GLMM-tutorial/.
See Hadfield (2010, J. Stat. Softw. 33:1–22) for MCMCglmm details, Hadfield & Nakagawa (2010, J. Evol. Biol. 23:494–508) for phylogenetic quantitative genetics, and Mizuno et al. (2025, J. Evol. Biol. 38:1699–1715) for phylogenetic multinomial mixed models.
That is the complete core workflow: load, reconcile, apply, analyse.
If you need to harmonise species names across two trait datasets
before matching to a tree, use
reconcile_data():
data(nesttrait_subset) # Nest traits (Chia et al. 2023)
rec_data <- reconcile_data(
x = nesttrait_subset,
y = avonet_subset,
x_species = "Scientific_name",
y_species = "Species1",
authority = NULL,
quiet = TRUE
)
print(rec_data)
#>
#> ── Reconciliation: data vs data ────────────────────────────────────────────────
#> Source x: nesttrait_subset
#> Source y: avonet_subset
#> Authority: none
#> Timestamp: 2026-06-20 12:44:39
#> ℹ Match coverage: [██████████████████████████████] 100% (916/916)
#>
#> ── Match summary ──
#>
#> • Exact: 916 (100.0%)
#> • Normalized: 0 ( 0.0%)
#> • Synonym: 0 ( 0.0%)
#> • Fuzzy: 0 ( 0.0%)
#> • Manual: 0 ( 0.0%)
#> ! Unresolved (x only):0 ( 0.0%)
#> ! Unresolved (y only):3
#> ! Flagged for review: 0
#> ℹ Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.Once reconciled, merge the two datasets into a single data frame:
merged <- reconcile_merge(
rec_data,
data_x = nesttrait_subset,
data_y = avonet_subset,
species_col_x = "Scientific_name",
species_col_y = "Species1"
)
#> ✔ Merged 916 species (inner join)
cat(sprintf("Merged: %d rows, %d columns\n", nrow(merged), ncol(merged)))
#> Merged: 916 rows, 31 columnsreconcile_merge() assumes one row per species in each
data frame. If a species appears in multiple rows (e.g. sex-specific
measurements, repeated populations, or individual-level records), the
merge produces all pairwise combinations for that species — the same
behaviour as base merge(). reconcile_merge()
warns when it detects duplicates so that you are not surprised by row
expansion.
There are two sensible ways to handle multi-row data:
Option A. Aggregate first, merge second. If your downstream PCM expects one row per species (most PGLS and PGLMM workflows do), collapse to a species-level summary before merging:
# Example: averaging individual measurements to species means
species_means <- aggregate(
cbind(Mass, Wing.Length) ~ Species1,
data = individual_measurements,
FUN = mean
)
merged <- reconcile_merge(rec_data, species_means, avonet_subset,
species_col_x = "Species1",
species_col_y = "Species1")Option B. Reconcile once, join the mapping back to the full data. If you want to keep every row (e.g. for an individual-level PGLMM), build the reconciliation on a species-level summary and then use the mapping as a lookup table for the original, multi-row data:
# Reconcile on unique species
species_level <- data.frame(
Species1 = unique(individual_measurements$Species1)
)
rec <- reconcile_data(species_level, avonet_subset,
x_species = "Species1", y_species = "Species1",
authority = NULL, quiet = TRUE)
# Join the mapping back to the full, multi-row dataset
mapping <- reconcile_mapping(rec)
individual_measurements$species_resolved <- mapping$name_resolved[
match(individual_measurements$Species1, mapping$name_x)
]A common situation in comparative biology is merging a small focal
dataset against a much larger reference (e.g. a field study of 50
species against AVONET’s ~10,000). reconcile_merge()
accepts datasets of any size, but the how argument
matters:
# Keep only species present in both: inner join
inner <- reconcile_merge(rec_data, small_data, large_data,
species_col_x = "species",
species_col_y = "Species1",
how = "inner")
# Keep all small_data rows; fill large_data columns with NA
# for species missing from the reference: left join
left <- reconcile_merge(rec_data, small_data, large_data,
species_col_x = "species",
species_col_y = "Species1",
how = "left")Use how = "inner" when the analysis cannot tolerate
NAs in the reference columns, and how = "left"
when you want to retain every focal-study species (and you will handle
missingness in the model). how = "full" is rarely what you
want here — it would return the entire reference dataset padded with
NAs for every focal trait.
When your data and tree use different taxonomies (e.g., BirdLife data against a BirdTree phylogeny), a curated crosswalk can resolve names that automated synonym resolution misses.
A crosswalk is simply a table mapping names from one system to another. prepR4pcm includes the BirdLife-BirdTree crosswalk as an example:
data(crosswalk_birdlife_birdtree)
table(crosswalk_birdlife_birdtree$Match.type)
#>
#> 1BL to 1BT 1BL to many BT Extinct
#> 8960 225 143
#> Many BL to 1BT Newly described species
#> 1933 24Convert it to an overrides table and pass it to
reconcile_tree():
overrides <- reconcile_crosswalk(
crosswalk_birdlife_birdtree,
from_col = "Species1",
to_col = "Species3",
match_type_col = "Match.type"
)
#> ℹ 1933 many-to-one entries (lumps) included
#> ℹ 225 one-to-many entries (splits) included
#> ✔ Crosswalk: 3039 overrides (8079 identical pairs skipped)
# Re-reconcile with overrides
result_xw <- reconcile_tree(
x = avonet_subset,
tree = tree_jetz,
x_species = "Species1",
authority = NULL,
overrides = overrides
)
#> ℹ Reconciling 919 data names vs 657 tree tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (68 matched so far)...
#> ! 2971 overrides could not be applied: 23 already_matched; 2767 name_x_not_in_data; 181 name_y_not_in_target.
#> ℹ See `result$unused_overrides` for details.
#> ✔ Matched 657/919 data names to tree tips
# Compare: how many more matches with the crosswalk?
cat(sprintf("Without crosswalk: %d matched\n",
sum(result$mapping$in_x & result$mapping$in_y, na.rm = TRUE)))
#> Without crosswalk: 657 matched
cat(sprintf("With crosswalk: %d matched\n",
sum(result_xw$mapping$in_x & result_xw$mapping$in_y, na.rm = TRUE)))
#> With crosswalk: 657 matchedWhen do you need a crosswalk? Only when your data and tree follow different naming authorities and a curated mapping exists. For most use cases, the automatic cascade (exact → normalised → synonym) is sufficient.
You can also build your own overrides manually — it is just a data
frame with name_x, name_y, and optionally
user_note columns:
For sensitivity analyses across phylogenies,
reconcile_to_trees() reconciles one dataset against several
trees in one call:
data(tree_clements25) # Clements 2025 tree
results <- reconcile_to_trees(
x = avonet_subset,
trees = list(
jetz = tree_jetz,
clements = tree_clements25
),
x_species = "Species1",
authority = NULL
)
#> ℹ Reconciling 919 data names against 2 trees
#> ℹ [jetz] 657 tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ [jetz] Matched 657/919 names
#> ℹ [clements] 854 tips
#> ℹ Matching 919 x 854 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ [clements] Matched 854/919 names
# Compare overlap across trees
sapply(results, function(r) {
c(matched = sum(r$mapping$in_x & r$mapping$in_y, na.rm = TRUE),
unresolved_x = r$counts$n_unresolved_x)
})
#> jetz clements
#> matched 657 854
#> unresolved_x 262 65Enable fuzzy matching to catch likely typos in species names:
result <- reconcile_tree(
x = my_data,
tree = my_tree,
fuzzy = TRUE, # enable fuzzy matching
fuzzy_threshold = 0.9, # minimum similarity (0-1)
resolve = "flag" # flag low-confidence matches for review
)
# Check flagged matches
flagged <- reconcile_mapping(result)
flagged[flagged$match_type == "flagged", c("name_x", "name_y", "match_score")]When the tree has fewer species than the data,
reconcile_apply() drops the unresolved species. This loses
statistical power and can bias the sample.
reconcile_augment() grafts the missing species onto the
tree using genus-level placement:
aug <- reconcile_augment(
result,
tree_jetz,
where = "genus", # sister to a random congener
branch_length = "congener_median", # median terminal branch of congeners
seed = 42, # for reproducibility
quiet = TRUE
)
cat(sprintf("Original tips: %d\nAugmented tips: %d\n",
ape::Ntip(aug$original), ape::Ntip(aug$tree)))
#> Original tips: 657
#> Augmented tips: 793
cat(sprintf("Added: %d | Skipped (no congener): %d\n",
nrow(aug$augmented), nrow(aug$skipped)))
#> Added: 136 | Skipped (no congener): 126
# Which species were added, and where?
if (nrow(aug$augmented) > 0) head(aug$augmented[, c("species", "placed_near", "branch_length")])
#> # A tibble: 6 × 3
#> species placed_near branch_length
#> <chr> <chr> <dbl>
#> 1 Acanthiza cinerea Acanthiza lineata 5.04
#> 2 Calamanthus cautus Calamanthus fuliginosus 3.80
#> 3 Calamanthus montanellus Calamanthus fuliginosus 3.80
#> 4 Calamanthus pyrrhopygius Calamanthus fuliginosus 3.80
#> 5 Gerygone citrina Gerygone magnirostris 2.91
#> 6 Pyrrholaemus sagittatus Pyrrholaemus brunneus 15.5Use the augmented tree in downstream analyses. Pass the augmented
tree to reconcile_apply() — the existing reconciliation
object is still valid as the name-mapping key, but the new tree contains
the extra tips, so drop_unresolved = FALSE retains the
grafted species:
aligned_aug <- reconcile_apply(
result,
data = avonet_subset,
tree = aug$tree, # augmented tree, not the original
species_col = "Species1",
drop_unresolved = FALSE # keep augmented tips (they are now in the tree)
)Important caveat. Genus-level placement assumes the
missing species diverged similarly to its congeners, which may not hold.
Always report which species were augmented (aug$augmented)
and run sensitivity analyses comparing results with and without
them.
Write aligned data, tree, and the full mapping table to disk:
Generate a self-contained HTML report documenting every name-matching decision. Useful for sharing with collaborators or archiving alongside your analysis:
report_file <- tempfile(fileext = ".html")
reconcile_report(result, file = report_file)
unlink(report_file)The report opens in any browser. It begins with the run header, match-coverage summary, and a small bar chart of match composition (Figure 1). Further down, per-match-type detail tables and the unresolved-species list make each decision auditable (Figure 2). The file is self-contained — styles, charts, and tables are all inline — so it can be archived or shared without external assets.
Taxon-agnostic. This workflow works for any group — mammals, fish, amphibians, plants — as long as you have a data frame and a phylogenetic tree.
Provenance. Every name-matching decision is
recorded in the reconciliation object. Use
reconcile_summary() for a human-readable report or
reconcile_mapping() for the full table.
Crosswalks are optional. Most users do not need them. The automatic cascade handles formatting differences and synonyms. Crosswalks help when two well-known naming authorities disagree.
Tree augmentation. When the tree is incomplete,
reconcile_augment() grafts missing species using congener
placement — but always run sensitivity analyses with and without
augmented tips.
Sensitivity. reconcile_to_trees()
makes it easy to run the same analysis across multiple
phylogenies.
Merging. reconcile_merge() joins
two reconciled datasets into a single analysis-ready data frame, using
the mapping as the join key.
Reports. reconcile_report()
generates a self-contained HTML report suitable for sharing or
archiving.
Visualisation. reconcile_plot()
produces a bar or pie chart of match composition.
reconcile_suggest() shows the closest fuzzy candidates for
unresolved species.
Comparison. reconcile_diff()
compares two reconciliation runs side by side — e.g., before and after
adding a crosswalk.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.