Phylogenetic comparative methods (PCMs) propagate uncertainty imperfectly. The standard pipeline takes one tree, fits one model, and reports one set of confidence intervals. But the tree itself is usually a point estimate drawn from a posterior, and the trait data often have missing values. Each of those is a separate source of uncertainty that the single-tree, single-fit pipeline ignores.
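This vignette walks through a pipeline that handles both: tree uncertainty (many trees instead of one, via prepR4pcm) and missing trait values (multiple imputation on every tree, via pigauto).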
The two packages compose: prepR4pcm produces multiPhylo,
pigauto consumes multiPhylo. Once the contract is set up,
swapping the backend (rtrees vs datelife vs fishtree) doesn’t change the
rest of the pipeline.
A model-ready dataset for fish, with:
m_per_tree upward for a real
analysis).

Start with whatever names appear in your dataset. They rarely match a tree's tip labels exactly: formatting drifts and synonyms creep in. pr_get_tree() copes. It normalises every name (underscores, whitespace, authority strings) and, for the fishtree backend, resolves synonyms through Open Tree of Life's Taxonomic Name Resolution Service before querying the backend.
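To make that normalisation step concrete, here is a minimal base-R sketch. normalise_name() is a hypothetical helper, not prepR4pcm's implementation. It assumes plain binomials (no subspecies) and treats anything after the second word as authority text. It does not attempt synonym resolution, which needs an external service such as the TNRS.

```r
# Hypothetical helper (not prepR4pcm's code): normalise species names,
# assuming plain binomials with no subspecies.
normalise_name <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)   # underscores -> spaces
  x <- gsub("\\s+", " ", trimws(x))      # trim and collapse whitespace
  words <- strsplit(x, " ", fixed = TRUE)
  # Keep genus + epithet; treat any trailing words as authority text
  vapply(words, function(w) paste(head(w, 2), collapse = " "), character(1))
}

normalise_name(c("Esox_lucius",
                 "  Salmo   salar ",
                 "Oncorhynchus mykiss (Walbaum, 1792)"))
# returns "Esox lucius", "Salmo salar", "Oncorhynchus mykiss"
```

Truncating to two words is the crude part: a real normaliser has to tell an authority string apart from a subspecific epithet.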
```r
# Your trait data, typically loaded from disk. Names need not be
# tidy: `Esox_lucius` carries an underscore, which pr_get_tree()
# normalises away.
my_data <- data.frame(
  species   = c("Salmo salar", "Esox_lucius", "Oncorhynchus mykiss"),
  body_size = c(98, NA, 50)  # NA is fine; pigauto will impute it
)
```

Every backend that can return multiple trees does so via
n_tree > 1. Here we ask fishtree for 50
stochastic resolutions (Chang, Rabosky, Alfaro 2019, Syst.
Biol.).
```r
trees <- pr_get_tree(my_data,
                     species_col = "species",
                     source      = "fishtree",
                     n_tree      = 50)

class(trees$tree)     # "multiPhylo"
length(trees$tree)    # 50
trees$matched         # the 3 species, in their original input form
trees$unmatched       # character(0): every species was placed

# Per-tree provenance: one list entry per returned tree, each
# recording source_index, citation, calibration_method, and n_tips.
length(trees$backend_meta$tree_provenance)   # 50
trees$backend_meta$tree_provenance[[1]]      # the provenance of tree 1
```

pr_get_tree() records what happened to every name in trees$mapping (one row per input species), so a name that quietly falls out of the tree cannot go unnoticed.
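A per-name audit like trees$mapping can be pictured as a small table with one row per input name. The sketch below builds such a table in base R against hypothetical tip labels. The column names and the exact / normalized / unmatched labels are illustrative assumptions, not the actual layout of trees$mapping.

```r
# Hypothetical tip labels standing in for a retrieved tree, plus one
# input name (Thymallus thymallus) that is absent from it.
tips  <- c("Salmo_salar", "Esox_lucius", "Oncorhynchus_mykiss")
input <- c("Salmo salar", "Esox_lucius", "Oncorhynchus mykiss",
           "Thymallus thymallus")

# Match on a normalised key: trimmed, whitespace runs -> underscores
to_key <- function(x) gsub("\\s+", "_", trimws(x))

mapping <- data.frame(input = input,
                      tip   = tips[match(to_key(input), tips)],
                      stringsAsFactors = FALSE)
mapping$status <- ifelse(is.na(mapping$tip), "unmatched",
                  ifelse(mapping$input == mapping$tip, "exact", "normalized"))

# Surface dropped names instead of losing them silently
dropped <- mapping$input[mapping$status == "unmatched"]
if (length(dropped) > 0) message("Not in tree: ", paste(dropped, collapse = ", "))
mapping
```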
For the universal but slower option, use source = "datelife". This is the chronogram-database backend, and it returns one calibrated tree per source study (Hedges, Bininda-Emonds, Upham, etc.).

```r
trees <- pr_get_tree(my_data,
                     species_col = "species",
                     source      = "datelife",
                     n_tree      = 50)
# trees$tree is multiPhylo; per-source citations are in
# trees$backend_meta$tree_provenance[[i]]$citation
```

If you already have a topology and want to date it (rather than fetch a new one), use pr_date_tree():
```r
# A .nex file is NEXUS, so read it with read.nexus(), not read.tree()
my_topology <- ape::read.nexus("my_topology.nex")
dated_trees <- pr_date_tree(my_topology, n_dated = 50)
class(dated_trees$tree)    # "multiPhylo"
length(dated_trees$tree)   # up to 50
```

What n_dated = 50 actually returns: up to 50 chronograms that all share my_topology but differ in branch lengths. Each variant is dated against a different source paper in DateLife's database (e.g. variant 1 against Hedges et al. 2015, variant 2 against Bininda-Emonds et al. 2007, and so on). The number of variants is therefore capped by how many sources cover your taxa. The topology is fixed; only the dating varies.
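To confirm that behaviour on your own output, compare the trees once while ignoring branch lengths and once while using them. The sketch below uses ape on toy data. rcoal() stands in for my_topology, and uniformly rescaled copies stand in for chronograms dated against different sources. This is an assumption for illustration only, because real re-datings do not simply rescale.

```r
library(ape)

set.seed(42)
topo <- rcoal(6)   # toy ultrametric tree standing in for my_topology

# Hypothetical "datings": same topology, branch lengths rescaled per source
variants <- lapply(c(1, 2.5, 0.4), function(s) {
  tr <- topo
  tr$edge.length <- tr$edge.length * s
  tr
})

# Topology comparison that ignores branch lengths (dispatches to
# ape's all.equal method for "phylo" objects)
same_topology <- vapply(variants, function(tr)
  isTRUE(all.equal(topo, tr, use.edge.length = FALSE)), logical(1))

# Root age (deepest root-to-node distance) differs: the dating varies
root_ages <- vapply(variants, function(tr) max(node.depth.edgelength(tr)),
                    numeric(1))
same_topology
root_ages
```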
To vary the topology as well, fetch a set of topologies first (e.g. pr_get_tree(species, source = "rtrees", taxon = "mammal") returns 100 mammal topologies sampled from VertLife / Upham et al. 2019). Then pass that multiPhylo to pr_date_tree() to date each one. That gives you the full topology × source cross-product.
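The size of that cross-product, and of everything downstream, grows quickly. The back-of-envelope check below uses n_sources = 3 as a hypothetical number of DateLife sources covering your taxa. m_per_tree = 5 matches the imputation step later in this vignette.

```r
n_topologies <- 100  # e.g. the rtrees mammal set
n_sources    <- 3    # hypothetical: DateLife sources covering your taxa
m_per_tree   <- 5    # imputations per tree, as in the pigauto step

# One row per (topology, source) pair, i.e. one dated tree each
grid <- expand.grid(topology = seq_len(n_topologies),
                    source   = seq_len(n_sources))
n_trees    <- nrow(grid)              # 300 dated trees
n_datasets <- n_trees * m_per_tree    # 1500 completed datasets to fit
```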
pr_get_tree() resolves names well enough to
retrieve a tree, but the modelling step needs a data frame
whose rows line up one-to-one with the tree’s tips.
reconcile_tree() matches your data against the retrieved
topology, and reconcile_apply() returns the aligned
data–tree pair. The 50 posterior trees share one tip set, so reconciling
against the first tree aligns the data to all of them.
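That shortcut only works if every tree has the same tip set. The check below is a cheap guard to run before reconciling, sketched with ape on toy trees. same_tip_set() is a hypothetical helper, not part of prepR4pcm.

```r
library(ape)

# TRUE if every tree (plain list or multiPhylo) carries the same tip labels
same_tip_set <- function(trees) {
  ref <- trees[[1]]$tip.label
  all(vapply(seq_along(trees),
             function(i) setequal(trees[[i]]$tip.label, ref),
             logical(1)))
}

set.seed(1)
toy <- replicate(5, rtree(4), simplify = FALSE)   # 5 random trees on t1..t4
same_tip_set(toy)                                 # TRUE

toy_bad <- c(toy, list(drop.tip(toy[[1]], "t1"))) # add a tree missing t1
same_tip_set(toy_bad)                             # FALSE
```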
```r
rec <- reconcile_tree(my_data, trees$tree[[1]],
                      x_species = "species",
                      authority = NULL)   # tree tips are plain binomials
rec   # match summary: 1 exact, 2 normalized, 0 unresolved

aligned <- reconcile_apply(rec,
                           data            = my_data,
                           tree            = trees$tree[[1]],
                           species_col     = "species",
                           drop_unresolved = TRUE)
aligned$data   # one row per tree tip: the model-ready frame
#>               species body_size
#> 1         Salmo salar        98
#> 2         Esox_lucius        NA
#> 3 Oncorhynchus mykiss        50
```

aligned$data carries one row per species in the tree, with Esox_lucius keeping the NA that pigauto will impute; aligned$tree is the matching pruned tree.
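Before modelling, you can assert the one-row-per-tip contract directly. check_alignment() below is a hypothetical helper, sketched with ape on a hand-written three-tip tree. It compares names after mapping whitespace to underscores, which assumes that is the only difference between data names and tip labels.

```r
library(ape)

# Hypothetical check: one data row per tip, no duplicate names, and the
# same name set once whitespace is mapped to underscores.
check_alignment <- function(df, tree, species_col) {
  key <- gsub("\\s+", "_", trimws(df[[species_col]]))
  nrow(df) == Ntip(tree) &&
    !anyDuplicated(key) &&
    setequal(key, tree$tip.label)
}

toy_tree <- read.tree(text = "((Salmo_salar:1,Oncorhynchus_mykiss:1):1,Esox_lucius:2);")
toy_data <- data.frame(
  species   = c("Salmo salar", "Esox_lucius", "Oncorhynchus mykiss"),
  body_size = c(98, NA, 50)
)
check_alignment(toy_data, toy_tree, "species")         # TRUE
check_alignment(toy_data[-2, ], toy_tree, "species")   # FALSE: a tip has no row
```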
Reviewers ask. Save yourself the round-trip.
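If the question is where each tree came from, one way to have the answer ready is to flatten the per-tree provenance records into a table for the methods section. provenance_table() below is a hypothetical helper. It assumes each record is a list with the four fields described earlier (source_index, citation, calibration_method, n_tips), each of length one. The toy records are placeholders, not real pr_get_tree() output.

```r
# Hypothetical helper: one row per tree, one column per provenance field
provenance_table <- function(prov) {
  rows <- lapply(prov, function(p) data.frame(
    source_index       = p$source_index,
    citation           = p$citation,
    calibration_method = p$calibration_method,
    n_tips             = p$n_tips,
    stringsAsFactors   = FALSE
  ))
  do.call(rbind, rows)
}

# Placeholder records standing in for trees$backend_meta$tree_provenance
toy_prov <- list(
  list(source_index = 1, citation = "Source A", calibration_method = "method X", n_tips = 3),
  list(source_index = 2, citation = "Source B", calibration_method = "method X", n_tips = 3),
  list(source_index = 3, citation = "Source A", calibration_method = "method Y", n_tips = 3)
)
tab <- provenance_table(toy_prov)
table(tab$citation)   # how many trees each citation contributed
```

On real output the call would be provenance_table(trees$backend_meta$tree_provenance), provided the records really have that shape.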
This is the contract.
pigauto::multi_impute_trees(df, trees, m_per_tree) takes
df (data with NAs) and trees
(multiPhylo) and returns m_per_tree
complete-data imputations per tree. The output knows which
imputation came from which tree, and pigauto’s pooling step later uses
that index.
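The nesting that the pooling step relies on (tree index, then imputation index within tree) can be illustrated with a toy stand-in. naive_impute() below is hypothetical and deliberately crude. It fills each NA with a normal draw from the observed mean and SD and ignores the tree entirely, so it reproduces none of pigauto's phylogenetic imputation. It exists only to show the bookkeeping.

```r
# Hypothetical, non-phylogenetic stand-in for an imputer: fill NAs with
# draws from a normal fitted to the observed values.
naive_impute <- function(x) {
  obs <- x[!is.na(x)]
  x[is.na(x)] <- rnorm(sum(is.na(x)), mean(obs), sd(obs))
  x
}

toy_df <- data.frame(
  species   = c("Salmo salar", "Esox_lucius", "Oncorhynchus mykiss"),
  body_size = c(98, NA, 50)
)
n_tree <- 3; m_per_tree <- 2   # small numbers for illustration

set.seed(1)
completed <- list()
for (tree_i in seq_len(n_tree)) {
  for (imp_j in seq_len(m_per_tree)) {
    d <- toy_df
    d$body_size  <- naive_impute(d$body_size)  # a real imputer would use tree tree_i
    d$tree       <- tree_i                     # which tree this dataset came from
    d$imputation <- imp_j                      # which draw within that tree
    completed[[length(completed) + 1]] <- d
  }
}
length(completed)   # n_tree * m_per_tree = 6 completed datasets
```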
```r
library(pigauto)

# 5 imputations per tree × 50 trees = 250 complete datasets
mi <- multi_impute_trees(aligned$data,
                         trees      = trees$tree,
                         m_per_tree = 5)

# Fit your model to each completed dataset. Replace the formula
# with whatever your hypothesis is.
fits <- with_imputations(mi, function(df) {
  glmmTMB::glmmTMB(body_size ~ 1 + (1 | species), data = df)
})

# Pool via Rubin's rules: the standard errors reflect BOTH the
# imputation uncertainty AND the tree-posterior uncertainty.
# fixef() is namespaced because glmmTMB is not attached.
pooled <- pool_mi(fits,
                  coef_fun = function(fit) glmmTMB::fixef(fit)$cond,
                  vcov_fun = function(fit) vcov(fit)$cond)
pooled
```

The reported intervals will generally be wider than those from the equivalent single-tree, single-imputation pipeline, and rightly so. The narrower intervals were always overconfident, because they ignored both sources of uncertainty.
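To see why the intervals widen, here is a minimal base-R sketch of the textbook Rubin's rules for one coefficient. It treats all completed-data fits as one flat set of m imputations. pool_mi() may handle the per-tree grouping differently, so read this as an illustration, not its implementation. The toy estimates and standard errors are made up.

```r
# Textbook Rubin's rules for a single coefficient.
# est: per-imputation point estimates; se: their standard errors.
rubin_pool <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)              # pooled estimate
  W    <- mean(se^2)             # within-imputation variance
  B    <- var(est)               # between-imputation variance
  Tvar <- W + (1 + 1 / m) * B    # total variance
  dof  <- (m - 1) * (1 + W / ((1 + 1 / m) * B))^2   # Rubin (1987) df
  c(estimate = qbar, se = sqrt(Tvar), df = dof,
    se_within_only = sqrt(W))    # what you would report ignoring B
}

est <- c(1, 2, 3)                # toy estimates from 3 imputations
se  <- sqrt(c(0.5, 0.5, 0.5))    # toy standard errors
rubin_pool(est, se)
```

Whenever the estimates disagree across trees or imputations (B > 0), the total variance exceeds the average within-fit variance. That extra term is where the wider intervals come from.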
Verified on a clean macOS R 4.4 install on 2026-05-01. See the Comparing tree backends vignette for the full status table.
| If your taxon is… | And you want… | Use | Status |
|---|---|---|---|
| Universal | A single best-guess tree | `pr_get_tree(source = "rotl")` | ✅ verified: returns the Open Tree of Life synthesis tree |
| Universal | A posterior of dated trees | `pr_get_tree(source = "datelife", n_tree = 50)` | likely ✅: datelife is in Enhances; install it separately with `pak::pak("phylotastic/datelife")` |
| Birds | A single tree, current Clements | `pr_get_tree(source = "clootl", n_tree = 1)` | ✅ verified: works out of the box (uses the v1.6/2025 taxonomy bundled with clootl) |
| Birds | A posterior of trees, current Clements | `pr_get_tree(source = "clootl", n_tree = 50)` | ⚠️ requires running `clootl::get_avesdata_repo(".")` once first; without it, `sampleTrees()` errors with "AvesData repo not found". Capped at 100 upstream. |
| Fish | Time-calibrated, posterior | `pr_get_tree(source = "fishtree", n_tree = 50)` | ✅ verified: returns `multiPhylo[50]` |
| Bird, mammal, fish, amphibian, reptile, plant, shark, bee, butterfly | Taxon-specific mega-tree, possibly grafted | `pr_get_tree(source = "rtrees", taxon = "<group>")` | ✅ works; ⚠️ `n_tree` is informational only: `rtrees::get_tree()` has no `n_tree` argument, so the count is fixed by the chosen mega-tree (`taxon = "bird"` → 100, `taxon = "mammal"` → 100, `taxon = "fish"` → 1, etc.) |
For the dating-only case (you already have a topology):

| If your topology is… | Use | Status |
|---|---|---|
| Universal taxa | `pr_date_tree(my_tree, n_dated = 50)` | likely ✅: untested in this run; same dependency story as datelife |
pr_tree_compare() gives quick pairwise Jaccard / Robinson-Foulds / branch-length comparisons (see the Comparing tree backends vignette). For richer visual comparison use ape::cophyloplot() or phangorn::densiTree().

pigauto also offers single-tree impute() / multi_impute(). This vignette skips straight to the trees-aware variant because that's the integration point.

For repeated runs of the same retrieval (e.g. while iterating on a manuscript), pass cache = TRUE to pr_get_tree(). The cache lives under tools::R_user_dir() by default; redirect it with pr_tree_cache_dir() if you want it inside the project.
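To show what the simplest of the pr_tree_compare() measures captures, here is tip-set Jaccard similarity in base R: tips shared by two trees divided by all tips across both. It is an assumption that pr_tree_compare() computes Jaccard over tip labels rather than over, say, splits. The toy label vectors are hypothetical.

```r
# Jaccard similarity of two tip-label sets:
# size of the intersection / size of the union
tip_jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

tips_a <- c("Salmo_salar", "Esox_lucius", "Oncorhynchus_mykiss")
tips_b <- c("Salmo_salar", "Esox_lucius", "Thymallus_thymallus")
tip_jaccard(tips_a, tips_b)   # 2 shared / 4 total = 0.5
```

For the Robinson-Foulds part, phangorn::RF.dist() computes the distance between two trees on the same tip set.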
?pr_get_tree, ?pr_date_tree, ?pr_cite_tree, ?reconcile_augment: the four entry points for prepR4pcm's tree-handling.