The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Title: Test for Differences in Diversification Rates over Time
Version: 1.0.0
Description: Employ time-calibrated phylogenies and trait/range data to test for differences in diversification rates over evolutionary time. Extend the STRAPP test from BAMMtools::traitDependentBAMM() to any time step along phylogenies. See inst/COPYRIGHTS for details on third-party code.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
URL: https://github.com/MaelDore/deepSTRAPP, https://maeldore.github.io/deepSTRAPP/
BugReports: https://github.com/MaelDore/deepSTRAPP/issues
Imports: ape (≥ 5.8.0), BAMMtools (≥ 2.1.12), cladoRcpp (≥ 0.15.1), coda (≥ 0.19-4.1), cowplot (≥ 1.1.3), dplyr (≥ 1.1.4), dunn.test (≥ 1.3.6), geiger (≥ 2.0.11), ggplot2 (≥ 3.5.0), methods (≥ 4.4.2), phytools (≥ 2.0.0), plyr (≥ 1.8.9), purrr (≥ 1.0.2), qpdf (≥ 1.4.1), RColorBrewer (≥ 1.1-3), Rcpp (≥ 1.0.14), scales (≥ 1.3.0), stringr (≥ 1.5.1), tidyr (≥ 1.3.1)
Depends: R (≥ 4.4)
LazyData: true
LazyDataCompression: xz
Suggests: knitr, maps (≥ 3.4.2), parallel (≥ 4.4.2), BioGeoBEARS (≥ 1.1.3), rmarkdown
Additional_repositories: https://maeldore.github.io/drat
LinkingTo: Rcpp
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2026-01-13 16:22:36 UTC; Mael
Author: Maël Doré ORCID iD [aut, cre, cph], Dan Rabosky [ctb], Mike Grundler [ctb], Huateng Huang [ctb], Pascal Title [ctb], Liam Revell [ctb], Nicholas J. Matzke [ctb]
Maintainer: Maël Doré <mael.dore@gmail.com>
Repository: CRAN
Date/Publication: 2026-01-19 17:30:02 UTC

Template file for BAMM diversification analyses

Description

Template file for BAMM diversification analyses provided as character strings.

Source: bamm-2.5.0 References: http://bamm-project.org/; https://github.com/macroevolution/bamm

Usage

data(BAMM_template_diversification)

Format

A vector of 260 character strings.

Details

A vector of 260 character strings that can be displayed with print(BAMM_template_diversification). This is the template used to generate the 'config_file.txt' controlling settings used for a BAMM run. It provides detailed explanations of the additional_BAMM_settings that can be used in prepare_diversification_data() to customize the BAMM run. It is called internally by prepare_diversification_data() to produce the custom 'config_file' used in the subsequent BAMM run.

References

Authors: Daniel Rabosky (BAMM) & Pascal Title ({BAMMtools}). Modified by Maël Doré for deepSTRAPP.

See Also

BAMM software website: http://bamm-project.org/


Convert Biogeographic Stochastic Map (BSM) to phytools SIMMAP stochastic map (SM) format

Description

These functions converts a Biogeographic Stochastic Map (BSM) output from BioGeoBEARS into a simmap object from R package {phytools} (See phytools::make.simmap()).

They require a model fit with BioGeoBEARS::bears_optim_run() and the output of a Biogeographic Stochastic Mapping performed with BioGeoBEARS::runBSM() to produce simmap objects as phylogenies with the associated mapping of range evolution along branches across simulations.

Initial functions in R package BioGeoBEARS by Nicholas J. Matzke:

Usage

BSM_to_phytools_simmap(model_fit, phylo, BSM_output, sim_index)

BSMs_to_phytools_simmaps(model_fit, phylo, BSM_output)

Arguments

model_fit

A BioGeoBEARS results object, produced by ML inference via BioGeoBEARS::bears_optim_run().

phylo

Time-calibrated phylogeny used in the BioGeoBEARS analyses to produce the historical biogeographic inference and run the Biogeographic Stochastic Mapping. Object of class "phylo" as defined in {ape}.

BSM_output

A list with two objects, a cladogenetic events table and an anagenetic events table, as the result of Biogeographic Stochastic Mapping conducted with BioGeoBEARS::runBSM().

sim_index

Integer. Index of the biogeographic simulation targeted to produce the simmap with BSM_to_phytools_simmap().

Details

These functions are slight adaptations of original functions from the R Package BioGeoBEARS by N. Matzke.

Initial functions: BioGeoBEARS::BSM_to_phytools_SM() BioGeoBEARS::BSMs_to_phytools_SMs()

Changes:

Value

The BSM_to_phytools_simmap() function returns a list with two elements:

The BSMs_to_phytools_simmaps() function loop around the BSM_to_phytools_simmap() function to aggregate all simmaps from all biogeographic simulations in a unique list of classes c("multiSimmap", "multiPhylo").

Notes on BioGeoBEARS

The R package BioGeoBEARS is needed for this function to work with biogeographic data. Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.

Notes on using the resulting simmap object in phytools (adapted from Nicholas J. Matzke)

The phytools functions, like phytools::countSimmap(), will only count the anagenetic events (range transitions occurring along branches) as it was written assuming purely anagenetic models.

It remains possible to extract cladogenetic events (range transitions occurring at speciation) by comparing the last-state-below-a-node with the descendant-pairs-above-a-node. However, it is recommended to use the built-in functions from BioGeoBEARS to summarize the biogeographic history based on the tables of cladogenetic and anagenetic events obtained from BioGeoBEARS::runBSM(). simmap objects should primarily be considered as a tool for visualization.

Associated functions in R package BioGeoBEARS:

Please note carefully that area-to-area dispersal events are not identical with the state transitions. For example, a state can be a geographic range with multipleareas, but under the logic of DEC-type models, a range-expansion event like ABC->ABCD actually means that a dispersal happened from some specific area (A, B, or C) to the new area. BSMs track this area-to-area sourcing in its cladogenetic and anagenetic event tables, at least if BioGeoBEARS::simulate_source_areas_ana_clado() has been run on the output of BioGeoBEARS::runBSM().

Author(s)

Nicholas J. Matzke. Contact: matzke@berkeley.edu

Changes by Maël Doré (see Details)

References

For BioGeoBEARS: Matzke, Nicholas J. (2018). BioGeoBEARS: BioGeography with Bayesian (and likelihood) Evolutionary Analysis with R Scripts. version 1.1.1, published on GitHub on November 6, 2018. doi:10.5281/zenodo.1478250. Website: http://phylo.wikidot.com/biogeobears.

See Also

phytools::countSimmap() phytools::make.simmap() BioGeoBEARS::simulate_source_areas_ana_clado() BioGeoBEARS::get_dmat_times_from_res() BioGeoBEARS::count_ana_clado_events() BioGeoBEARS::hist_event_counts()

Examples

if (deepSTRAPP::is_dev_version())
{
 ## Run only if you have R package 'BioGeoBEARS' installed.
 # Please install it manually from: https://github.com/nmatzke/BioGeoBEARS")

 ## Load phylogeny and tip data
 library(phytools)
 data(eel.tree)

  # (May take several minutes to run)
 ## Load directly output of prepare_trait_data() run on biogeographic data
 data(eel_biogeo_data, package = "deepSTRAPP")

 ## Convert BSM output into a unique simmap, including residence times
 simmap_1 <- BSM_to_phytools_simmap(model_fit = eel_biogeo_data$best_model_fit,
                                    phylo = eel.tree,
                                    BSM_output = eel_biogeo_data$BSM_output,
                                    sim_index = 1)
 # Explore output
 str(simmap_1, max.level = 1)
 # Print residence times in each range
 simmap_1$residence_times
 # Plot simmap
 plot(simmap_1$simmap)

 ## Convert BSM output into all simmaps in a multiSimmap/multiPhylo object
 all_simmaps <- BSMs_to_phytools_simmaps(model_fit = eel_biogeo_data$best_model_fit,
                                         phylo = eel.tree,
                                         BSM_output = eel_biogeo_data$BSM_output)
 # Explore output
 str(all_simmaps, max.level = 1)
 # Plot simmap n°1
 plot(all_simmaps[[1]]) 
}


Dataset providing biogeographic range data for extant ponerine ants

Description

A data.frame of range location for the 1534 extant ponerine ant taxa (Ponerinae subfamily).

Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3

Usage

data(Ponerinae_binary_range_table)

Format

A data.frame with 1534 rows and 10 columns.

Details

A data.frame of range locations covering the 1534 extant ponerine ant taxa (Ponerinae subfamily).

References

Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3


Data summarizing the evolution of geographic ranges in Ponerinae ants using an old ill-calibrated phylogeny for illustrative purposes

Description

A list containing geographic ranges data of Ponerinae mapped on the old ill-calibrated phylogeny, modeled with R package BioGeoBEARS. Ranges are labeled between "Old World" (O) and New World (N). This object was obtained with prepare_trait_data(). The phylogeny used is also NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.

Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3

Usage

data(Ponerinae_biogeo_data_old_calib)

Format

A list with 6 elements.

Details

A list of five objects containing information on the evolution of feeding habits in eels. This object was obtained with prepare_trait_data().

References

Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3

See Also

prepare_trait_data()


Data summarizing the evolution of fake size data in Ponerinae ants using a 2-level factor as categorical trait

Description

A list containing fake size data of Ponerinae ants mapped on the phylogeny, modeled with geiger::fitDiscrete. This object was obtained with prepare_trait_data(). This is NOT real biological/ecological data. They were designed for illustrative purposes only. The phylogeny used is also NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.

Usage

data(Ponerinae_cat_2lvl_data_old_calib)

Format

A list with 5 elements.

Details

A list of five objects containing information on the evolution of fake size data in Ponerinae ants. This object was obtained with prepare_trait_data().

See Also

prepare_trait_data()


Data summarizing the evolution of fake habitat data in Ponerinae ants using a 3-level factor as categorical trait

Description

A list containing fake habitat data of Ponerinae ants mapped on the phylogeny, modeled with geiger::fitDiscrete. This object was obtained with prepare_trait_data(). This is NOT real biological/ecological data. They were designed for illustrative purposes only. The phylogeny used is also NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.

Usage

data(Ponerinae_cat_3lvl_data_old_calib)

Format

A list with 5 elements.

Details

A list of five objects containing information on the evolution of fake habitat data in Ponerinae ants. This object was obtained with prepare_trait_data().

See Also

prepare_trait_data()


Data summarizing the evolution of a fake continuous trait in Ponerinae ants extracted for 10 Mya

Description

A list containing estimated values for a fake continuous trait mapped on the Ponerinae ant phylogeny, modeled with geiger::fitContinuous. This object was obtained with extract_most_likely_trait_values_for_focal_time() applied on trait evolution data obtained with prepare_trait_data(). The phylogeny used is NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.

Usage

data(Ponerinae_trait_cont_tip_data_10My)

Format

A list with 4 elements.

Details

A list of four elements containing information on the evolution of a continuous trait in Ponerinae ants extracted for 10 Mya.


Dataset providing fake trait data for extant ponerine ants for illustrative purposes

Description

A data.frame of fake trait data covering the 1534 extant ponerine ant taxa (Ponerinae subfamily). This is NOT real biological/ecological data. They were designed for illustrative purposes only.

Usage

data(Ponerinae_trait_tip_data)

Format

A data.frame with 1534 rows and 4 columns.

Details

A data.frame of fake trait data covering the 1534 extant ponerine ant taxa (Ponerinae subfamily).


Dataset providing the extensive time-calibrated phylogeny of extant ponerine ants

Description

A phylo object describing the time-calibrated phylogeny of the 1534 extant ponerine ants (Ponerinae subfamily).

Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3

Usage

data(Ponerinae_tree)

Format

A phylo object with 4 elements.

Details

A time-calibrated phylogeny as a phylo object with 4 elements.

References

Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3


Dataset providing the extensive time-calibrated phylogeny of extant ponerine ants using an old calibration for illustrative purposes

Description

A phylo object describing the time-calibrated phylogeny of the 1534 extant ponerine ants (Ponerinae subfamily). This is NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes. For a proper time-calibrated phylogeny of ponerine ants, see the Ponerinae_tree object in deepSTRAPP.

Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3

Usage

data(Ponerinae_tree_old_calib)

Format

A phylo object with 4 elements.

Details

A time-calibrated phylogeny as a phylo object with 4 elements.

References

Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3


Compute STRAPP to test for a relationship between diversification rates and trait data

Description

Carries out the appropriate statistical method to test for a relationship between diversification rates and trait data for a given point in the past (i.e. the focal_time). Tests are based on block-permutations: rates data are randomized across tips following blocks defined by the diversification regimes identified on each tip (typically from a BAMM).

Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.

The function is an extension of the original BAMMtools::traitDependentBAMM() function used to carry out STRAPP test on extant time-calibrated phylogenies.

Tests can be carried out on speciation, extinction and net diversification rates.

deepSTRAPP::compute_STRAPP_test_for_focal_time() can handle three types of statistical tests depending on the type of trait data provided:

Continuous trait data

Tests for correlations between trait and rates carried out with deepSTRAPP::compute_STRAPP_test_for_continuous_data(). The associated test is the Spearman's rank correlation test (See stats::cor.test).

Binary trait data

For categorical and biogeographic trait data that have only two states (ex: 'Nearctic' vs. 'Neotropics'). Tests for differences in rates between states are carried out with deepSTRAPP::compute_STRAPP_test_for_binary_data(). The associated test is the Mann-Whitney-Wilcoxon rank-sum test (See stats::wilcox.test).

Multinominal trait data

For categorical and biogeographic trait data with more than two states (ex: 'No leg' vs. 'Two legs' vs. 'Four legs'). Tests for differences in rates between states are carried out with deepSTRAPP::compute_STRAPP_test_for_multinominal_data(). The associated test for all states is the Kruskal-Wallis H test (See stats::kruskal.test). If posthoc_pairwise_tests = TRUE, post hoc pairwise tests between pairs of states will be carried out too. The associated test for post hoc pairwise tests is the Dunn's post hoc pairwise rank-sum test (See dunn.test::dunn.test).

Usage

compute_STRAPP_test_for_focal_time(
  BAMM_object,
  trait_data_list,
  rate_type = "net_diversification",
  seed = NULL,
  nb_permutations = NULL,
  replace_samples = FALSE,
  alpha = 0.05,
  two_tailed = TRUE,
  one_tailed_hypothesis = NULL,
  posthoc_pairwise_tests = FALSE,
  p.adjust_method = "none",
  return_perm_data = FALSE,
  nthreads = 1,
  print_hypothesis = TRUE
)

Arguments

BAMM_object

Object of class "bammdata", typically generated with update_rates_and_regimes_for_focal_time(), that contains a phylogenetic tree and associated diversification rates across selected posterior samples updated to a specific time in the past (i.e. the focal_time).

trait_data_list

List obtained from extract_most_likely_trait_values_for_focal_time() that contains at least a ⁠$trait_data⁠ element, a ⁠$focal_time⁠ element, and a ⁠$trait_data_type⁠. ⁠$trait_data⁠ is a named vector with the trait data found on the phylogeny at focal_time. ⁠$focal_time⁠ informs on the time in the past at which the trait and rates data will be tested. ⁠$trait_data_type⁠ informs on the type of trait data: continuous, categorical, or biogeographic.

rate_type

A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).

seed

Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).

nb_permutations

Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once.

replace_samples

Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample when drawing samples used to carry out the test. Default is FALSE.

alpha

Numerical. Significance level to use to compute the estimate corresponding to the values of the test statistic used to assess significance of the test. This does NOT affect p-values. Default is 0.05.

two_tailed

Logical. To define the type of tests. If TRUE (default), tests for correlations/differences in rates will be carried out with a null hypothesis that rates are not correlated with trait values (continuous data) or equals between trait states (categorical and biogeographic data). If FALSE, one-tailed tests are carried out.

  • For continuous data, it involves defining a one_tailed_hypothesis testing for either a "positive" or "negative" correlation under the alternative hypothesis.

  • For binary data (two states), it involves defining a one_tailed_hypothesis indicating which states have higher rates under the alternative hypothesis.

  • For multinominal data (more than two states), it defines the type of post hoc pairwise tests to carry out between pairs of states. If posthoc_pairwise_tests = TRUE, all two-tailed (if two_tailed = TRUE) or one-tailed (if two_tailed = FALSE) tests are automatically carried out.

one_tailed_hypothesis

A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than such as c('A > B').

posthoc_pairwise_tests

Logical. Only for multinominal data (with more than two states). If TRUE, all possible post hoc pairwise (Dunn) tests will be computed across all pairs of states. This is a way to detect which pairs of states have significant differences in rates if the overall test (Kruskal-Wallis) is significant. Default is FALSE.

p.adjust_method

A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values in the post hoc pairwise tests to account for multiple comparisons. See stats::p.adjust() for the available methods. Default is none.

return_perm_data

Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output. This is needed to plot the histogram of the null distribution used to assess significance of the test with plot_histogram_STRAPP_test_for_focal_time(). Default is FALSE.

nthreads

Integer. Number of threads to use for paralleled computing of the tests across the permutations. The R package parallel must be loaded for nthreads > 1. Default is 1.

print_hypothesis

Logical. Whether to print information on what test is carried out, detailing the null and alternative hypotheses, and what significant level is used to rejected or not the null hypothesis. Default is TRUE.

Details

These set of functions carries out the STructured RAte Permutations on Phylogenies (STRAPP) test as defined in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193.

It is an extension of the original BAMMtools::traitDependentBAMM() function used to carry out STRAPP test on extant time-calibrated phylogenies, but allowing here to test for differences/correlations at any point in the past (i.e. the focal_time).

It takes an object of class "bammdata" (BAMM_object) that was updated such as its diversification rates (⁠$tipLambda⁠ and ⁠$tipMu⁠) and regimes (⁠$tipStates⁠) are reflecting values observed at at a specific time in the past (i.e. the ⁠$focal_time⁠). Similarly, it takes a list (trait_data_list) that provides ⁠$trait_data⁠ as observed on branches at the same focal_time than the diversification rates and regimes.

A STRAPP test is carried out by drawing a random set of posterior samples from the BAMM_object, then randomly permuting rates across blocks of tips defined by the macroevolutionary regimes. Test statistics are then computed across the initial observed data and the permuted data for each sample. In a two-tailed test, the p-value is the proportion of posterior samples in which the test stats is as extreme in the permuted than in the observed data. In a one-tailed test, the p-value is the proportion of posterior samples in which the test stats is higher in the permuted than in the observed data.

———- Major changes compared to BAMMtools::traitDependentBAMM() ———-

Value

The function returns a list with at least eight elements.

Summary elements for the main test:

If using continuous or binary data:

If posthoc_pairwise_tests = TRUE (only for multinomial trait data):

If return_perm_data = TRUE, the stats data computed from the posterior samples for observed and permuted data are provided. This is needed to plot the histogram of the null distribution used to assess significance of the test with plot_histogram_STRAPP_test_for_focal_time().

If no STRAPP test was performed in the case of categorical/biogeographic data with a single state/range at focal_time, only the ⁠$trait_data_type⁠, ⁠$trait_data_type_for_stats⁠ = "none", and ⁠$focal_time⁠ are returned.

Author(s)

Maël Doré

References

For STRAPP: Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.

For STRAPP in deep times: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B., (2025), Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity, Nature Communications. doi:10.1038/s41467-025-63709-3

See Also

Associated functions in deepSTRAPP: extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time()

Original function in BAMMtools: BAMMtools::traitDependentBAMM()

Statistical tests: stats::cor.test() stats::wilcox.test() stats::kruskal.test() dunn.test::dunn.test()

For a guided tutorial, see this vignette: vignette("explore_STRAPP_test_types", package = "deepSTRAPP")

Examples

if (deepSTRAPP::is_dev_version())
{
 # ------ Prepare data ------ #

 ## Load the BAMM_object summarizing 1000 posterior samples of BAMM with diversification rates
 # for ponerine ants extracted for 10My ago.
 data(Ponerinae_BAMM_object_10My, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Plot the associated phylogeny with mapped rates
 plot_BAMM_rates(Ponerinae_BAMM_object_10My)

 ## Load the object containing head width trait data for ponerine ants extracted for 10My ago.
 data(Ponerinae_trait_cont_tip_data_10My, package = "deepSTRAPP")

 # Plot the associated contMap (continuous trait stochastic map)
 plot_contMap(Ponerinae_trait_cont_tip_data_10My$contMap)

 # Check that objects are ordered in the same fashion
 identical(names(Ponerinae_BAMM_object_10My$tipStates[[1]]),
           names(Ponerinae_trait_cont_tip_data_10My$trait_data))

 # Save continuous data
 trait_data_continuous <- Ponerinae_trait_cont_tip_data_10My

 ## Transform trait data into binary and multinominal data

 # Binarize data into two states
 trait_data_binary <- trait_data_continuous
 trait_data_binary$trait_data[trait_data_continuous$trait_data < 0] <- "state_A"
 trait_data_binary$trait_data[trait_data_continuous$trait_data >= 0] <- "state_B"
 trait_data_binary$trait_data_type <- "categorical"

 table(trait_data_binary$trait_data)

 # Categorize data into three states
 trait_data_multinominal <- trait_data_continuous
 trait_data_multinominal$trait_data[trait_data_continuous$trait_data < 0] <- "state_B"
 trait_data_multinominal$trait_data[trait_data_continuous$trait_data < -1] <- "state_A"
 trait_data_multinominal$trait_data[trait_data_continuous$trait_data >= 0] <- "state_C"
 trait_data_multinominal$trait_data_type <- "categorical"

 table(trait_data_multinominal$trait_data)

  # (May take several minutes to run)
 # ------ Compute STRAPP test for continuous data ------ #

 plot(x = trait_data_continuous$trait_data, y = Ponerinae_BAMM_object_10My$tipLambda[[1]])

 # Compute STRAPP test under the alternative hypothesis of a "negative" correlation
 # between "net_diversification" rates and trait data
 STRAPP_results <- compute_STRAPP_test_for_focal_time(
    BAMM_object = Ponerinae_BAMM_object_10My,
    trait_data_list = trait_data_continuous,
    two_tailed = FALSE,
    one_tailed_hypothesis = "negative",
    return_perm_data = TRUE)
 str(STRAPP_results, max.level = 2)
 # Data from the posterior samples is available in STRAPP_results$perm_data_df
 head(STRAPP_results$perm_data_df)

 # ------ Compute STRAPP test for binary data ------ #

 # Compute STRAPP test under the alternative hypothesis that "state_A" is associated
 # with higher "net_diversification" that "state_B"
 STRAPP_results <- compute_STRAPP_test_for_focal_time(
    BAMM_object = Ponerinae_BAMM_object_10My,
    trait_data_list = trait_data_binary,
    two_tailed = FALSE,
    one_tailed_hypothesis = c("state_A > state_B"))
 str(STRAPP_results, max.level = 1)

 # Compute STRAPP test under the alternative hypothesis that "state_B" is associated
 # with higher "net_diversification" that "state_A"
 STRAPP_results <- compute_STRAPP_test_for_focal_time(BAMM_object = Ponerinae_BAMM_object_10My,
    trait_data_list = trait_data_binary,
    two_tailed = FALSE,
    one_tailed_hypothesis = c("state_B > state_A"))
 str(STRAPP_results, max.level = 1)

 # ------ Compute STRAPP test for multinominal data ------ #

 # Compute STRAPP test between all three states, and compute post hoc tests
 # for differences in rates between all possible pairs of states
 # with a p-value adjusted for multiple comparison using Bonferroni's correction
 STRAPP_results <- compute_STRAPP_test_for_focal_time(
    BAMM_object = Ponerinae_BAMM_object_10My,
    trait_data_list = trait_data_multinominal,
    posthoc_pairwise_tests = TRUE,
    two_tailed = TRUE,
    p.adjust_method = "bonferroni")
 str(STRAPP_results, max.level = 3)
 # All post hoc pairwise test summaries are available in $summary_df
 STRAPP_results$posthoc_pairwise_tests$summary_df 
}


Cut the phylogeny and continuous trait mapping for a given focal time in the past

Description

Cuts off all the branches of the phylogeny which are younger than a specific time in the past (i.e. the focal_time). Branches overlapping the focal_time are shorten to the focal_time. Likewise, remove continuous trait mapping for the cut off branches by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

cut_contMap_for_focal_time(contMap, focal_time, keep_tip_labels = TRUE)

Arguments

contMap

Object of class "contMap", typically generated with phytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

focal_time

Numerical. The time, in terms of time distance from the present, for which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label. Default is TRUE.

Details

The phylogenetic tree is cut for a specific time in the past (i.e. the focal_time).

When a branch with a single descendant tip is cut and keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.

When a branch with a single descendant tip is cut and keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.

In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.

The continuous trait mapping is updated accordingly by removing mapping associated with the cut off branches.

Value

The function returns the cut contMap as an object of class "contMap". It contains a ⁠$tree⁠ element of classes "simmap" and "phylo". This function updates and adds multiple useful sub-elements to the ⁠$tree⁠ element.

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() extract_most_likely_trait_values_for_focal_time() extract_most_likely_trait_values_from_contMap_for_focal_time()

For a guided tutorial, see this vignette: vignette("cut_phylogenies", package = "deepSTRAPP")

Examples

# ----- Prepare data ----- #

# Load mammals phylogeny and data from the R package motmot (data included in deepSTRAPP)
# Initial data source: Slater, 2013; DOI: 10.1111/2041-210X.12084

data(mammals, package = "deepSTRAPP")

mammals_tree <- mammals$mammal.phy
mammals_data <- setNames(object = mammals$mammal.mass$mean,
                         nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]

# Run a stochastic mapping based on a Brownian Motion model
# for Ancestral Trait Estimates to obtain a "contMap" object
mammals_contMap <- phytools::contMap(mammals_tree, x = mammals_data,
                                     res = 100, # Number of time steps
                                     plot = FALSE)

# Set focal time
focal_time <- 80

# ----- Example 1: keep_tip_labels = TRUE ----- #

# Cut contMap to 80 Mya while keeping tip.label
# on terminal branches with a unique descending tip.
updated_contMap <- cut_contMap_for_focal_time(contMap = mammals_contMap,
                                              focal_time = focal_time,
                                              keep_tip_labels = TRUE)

# Plot node labels on initial stochastic map with cut-off
plot_contMap(mammals_contMap, lwd = 2, fsize = c(0.5, 1))
ape::nodelabels(cex = 0.5)
abline(v = max(phytools::nodeHeights(mammals_contMap$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot initial node labels on cut stochastic map
plot_contMap(updated_contMap, fsize = c(0.8, 1))
ape::nodelabels(cex = 0.8, text = updated_contMap$tree$initial_nodes_ID)

# ----- Example 2: keep_tip_labels = FALSE ----- #

# Cut contMap to 80 Mya while NOT keeping tip.label.
updated_contMap <- cut_contMap_for_focal_time(contMap = mammals_contMap,
                                              focal_time = focal_time,
                                              keep_tip_labels = FALSE)

# Plot node labels on initial stochastic map with cut-off
plot_contMap(mammals_contMap, fsize = c(0.5, 1))
ape::nodelabels(cex = 0.5)
abline(v = max(phytools::nodeHeights(mammals_contMap$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot initial node labels on cut stochastic map
plot_contMap(updated_contMap, fsize = c(0.8, 1))
ape::nodelabels(cex = 0.8, text = updated_contMap$tree$initial_nodes_ID)


Cut the phylogeny and posterior probability mapping of a categorical trait for a given focal time in the past

Description

Cuts off all the branches of the phylogeny which are younger than a specific time in the past (i.e. the focal_time). Branches overlapping the focal_time are shorten to the focal_time. Likewise, remove posterior probability mapping of the categorical trait for the cut off branches by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

cut_densityMap_for_focal_time(densityMap, focal_time, keep_tip_labels = TRUE)

Arguments

densityMap

Object of class "densityMap", typically generated with phytools::densityMap(), that contains a phylogenetic tree and associated posterior probability mapping of a categorical trait. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

focal_time

Numerical. The time, in terms of time distance from the present, for which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label. Default is TRUE.

Details

The phylogenetic tree is cut for a specific time in the past (i.e. the focal_time).

When a branch with a single descendant tip is cut and keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.

When a branch with a single descendant tip is cut and keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.

In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.

The posterior probability mapping of a categorical trait is updated accordingly by removing mapping associated with the cut off branches.

Value

The function returns the cut densityMap as an object of class "densityMap" with three elements.

It contains a ⁠$tree⁠ element of classes "simmap" and "phylo". This function updates and adds multiple useful sub-elements to the ⁠$tree⁠ element. * ⁠$maps⁠ An updated list of named numerical vectors. Provides the mapping of posterior probability of the state along each remaining edge. * ⁠$mapped.edge⁠ An updated matrix. Provides the evolutionary time spent across posterior probabilities (columns) along the remaining edges (rows). * ⁠$root_age⁠ Integer. Stores the age of the root of the tree. * ⁠$nodes_ID_df⁠ Data.frame with two columns. Provides the conversion from the new_node_ID to the initial_node_ID. Each row is a node. * ⁠$initial_nodes_ID⁠ Vector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels with ape::nodelabels(). * ⁠$edges_ID_df⁠ Data.frame with two columns. Provides the conversion from the new_edge_ID to the initial_edge_ID. Each row is an edge/branch. * ⁠$initial_edges_ID⁠ Vector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels with ape::edgelabels().

The ⁠$col⁠ element describes the colors used to map each possible posterior probability value from 0 to 1000.

The ⁠$states⁠ element provide the name of the states. Here, the first value is the absence of the state labeled as "Not X" with X being the state. The second value is the name of the state.

High posterior probability reflects high likelihood to harbor the state. Low probability reflects high likelihood to NOT harbor the state.

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() extract_most_likely_trait_values_for_focal_time() extract_most_likely_states_from_densityMaps_for_focal_time()

For a guided tutorial, see this vignette: vignette("cut_phylogenies", package = "deepSTRAPP")

Examples

# ----- Prepare data ----- #

# Load mammals phylogeny and data from the R package motmot included within deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")

# Obtain mammal tree
mammals_tree <- mammals$mammal.phy
# Convert mass data into categories
mammals_mass <- setNames(object = mammals$mammal.mass$mean,
                         nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
mammals_data <- mammals_mass
mammals_data[seq_along(mammals_data)] <- "small"
mammals_data[mammals_mass > 5] <- "medium"
mammals_data[mammals_mass > 10] <- "large"
table(mammals_data)

 # (May take several minutes to run)
# Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
mammals_cat_data <- prepare_trait_data(tip_data = mammals_data, phylo = mammals_tree,
                                       trait_data_type = "categorical",
                                       evolutionary_models = "ER",
                                       nb_simulations = 100,
                                       plot_map = FALSE)

# Set focal time
focal_time <- 80

# Extract the density map for small mammals (state 3 = "small")
mammals_densityMap_small <- mammals_cat_data$densityMaps[[3]]

# ----- Example 1: keep_tip_labels = TRUE ----- #

# Cut densityMap to 80 Mya while keeping tip.label
# on terminal branches with a unique descending tip.
updated_mammals_densityMap_small <- cut_densityMap_for_focal_time(
     densityMap = mammals_densityMap_small,
     focal_time = focal_time,
     keep_tip_labels = TRUE)

# Plot node labels on initial stochastic map with cut-off
phytools::plot.densityMap(mammals_densityMap_small, fsize = 0.7, lwd = 2)
ape::nodelabels(cex = 0.7)
abline(v = max(phytools::nodeHeights(mammals_densityMap_small$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot initial node labels on cut stochastic map
phytools::plot.densityMap(updated_mammals_densityMap_small, fsize = 0.8)
ape::nodelabels(cex = 0.8, text = updated_mammals_densityMap_small$tree$initial_nodes_ID)

# ----- Example 2: keep_tip_labels = FALSE ----- #

# Cut densityMap to 80 Mya while NOT keeping tip.label
updated_mammals_densityMap_small <- cut_densityMap_for_focal_time(
     densityMap = mammals_densityMap_small,
     focal_time = focal_time,
     keep_tip_labels = FALSE)

# Plot node labels on initial stochastic map with cut-off
phytools::plot.densityMap(mammals_densityMap_small, fsize = 0.7, lwd = 2)
ape::nodelabels(cex = 0.7)
abline(v = max(phytools::nodeHeights(mammals_densityMap_small$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot initial node labels on cut stochastic map
phytools::plot.densityMap(updated_mammals_densityMap_small, fsize = 0.8)
ape::nodelabels(cex = 0.8, text = updated_mammals_densityMap_small$tree$initial_nodes_ID) 


Cut phylogenies and posterior probability mapping of each state for a given focal time in the past

Description

Cuts off all the branches of the phylogeny which are younger than a specific time in the past (i.e. the focal_time). Branches overlapping the focal_time are shorten to the focal_time. Likewise, remove posterior probability mapping of the categorical trait for the cut off branches by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

cut_densityMaps_for_focal_time(densityMaps, focal_time, keep_tip_labels = TRUE)

Arguments

densityMaps

List of objects of class "densityMap", typically generated with prepare_trait_data(). Each densityMap (see phytools::densityMap()) contains a phylogenetic tree and associated posterior probability mapping of a categorical trait. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

focal_time

Numerical. The time, in terms of time distance from the present, for which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label. Default is TRUE.

Details

The phylogenetic tree is cut for a specific time in the past (i.e. the focal_time).

When a branch with a single descendant tip is cut and keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.

When a branch with a single descendant tip is cut and keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.

In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.

The continuous trait mapping is updated accordingly by removing mapping associated with the cut off branches.

Value

The function returns an updated list of objects as cut densityMaps of class "densityMap".

Each densityMap object contains three elements:

High posterior probability reflects high likelihood to harbor the state. Low probability reflects high likelihood to NOT harbor the state.

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() cut_densityMap_for_focal_time() extract_most_likely_trait_values_for_focal_time() extract_most_likely_states_from_densityMaps_for_focal_time()

Examples

# ----- Prepare data ----- #

# Load mammals phylogeny and data from the R package motmot, and implemented in deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")

# Obtain mammal tree
mammals_tree <- mammals$mammal.phy
# Convert mass data into categories
mammals_mass <- setNames(object = mammals$mammal.mass$mean,
                         nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
mammals_data <- mammals_mass
mammals_data[seq_along(mammals_data)] <- "small"
mammals_data[mammals_mass > 5] <- "medium"
mammals_data[mammals_mass > 10] <- "large"
table(mammals_data)

 # (May take several minutes to run)
# Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
mammals_cat_data <- prepare_trait_data(tip_data = mammals_data, phylo = mammals_tree,
                                       trait_data_type = "categorical",
                                       evolutionary_models = "ER",
                                       nb_simulations = 100,
                                       plot_map = FALSE)

# Set focal time
focal_time <- 80

# Extract the density maps
mammals_densityMaps <- mammals_cat_data$densityMaps

# ----- Example 1: keep_tip_labels = TRUE ----- #

# Cut densityMaps to 80 Mya while keeping tip.label
# on terminal branches with a unique descending tip.
updated_mammals_densityMaps <- cut_densityMaps_for_focal_time(
    densityMaps = mammals_densityMaps,
    focal_time = focal_time,
    keep_tip_labels = TRUE)

## Plot density maps as overlay of all state posterior probabilities
# ?plot_densityMaps_overlay

# Plot initial density maps
plot_densityMaps_overlay(densityMaps = mammals_densityMaps, fsize = 0.5)
abline(v = max(phytools::nodeHeights(mammals_densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated/cut density maps
plot_densityMaps_overlay(densityMaps = updated_mammals_densityMaps, fsize = 0.8)

# ----- Example 2: keep_tip_labels = FALSE ----- #

# Cut densityMap to 80 Mya while NOT keeping tip.label
updated_mammals_densityMaps <- cut_densityMaps_for_focal_time(
    densityMaps = mammals_densityMaps,
    focal_time = focal_time,
    keep_tip_labels = FALSE)

# Plot initial density maps
plot_densityMaps_overlay(densityMaps = mammals_densityMaps, fsize = 0.5)
abline(v = max(phytools::nodeHeights(mammals_densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated/cut density maps
plot_densityMaps_overlay(densityMaps = updated_mammals_densityMaps, fsize = 0.8) 


Cut the phylogeny for a given time in the past

Description

Cuts off all the branches of the phylogeny which are younger than a specific time in the past (i.e. the focal_time). Branches overlapping the focal_time are shorten to the focal_time.

Usage

cut_phylo_for_focal_time(tree, focal_time, keep_tip_labels = TRUE)

Arguments

tree

Object of class "phylo". The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

focal_time

Numerical. The time, in terms of time distance from the present, for which the tree must be cut. It must be smaller than the root age of the phylogeny.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label. Default is TRUE.

Details

When a branch with a single descendant tip is cut and keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.

When a branch with a single descendant tip is cut and keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.

In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.

Value

The function returns the cut phylogeny as an object of class "phylo". It adds multiple useful elements to the object.

Author(s)

Maël Doré

See Also

cut_contMap_for_focal_time() cut_densityMaps_for_focal_time()

For a guided tutorial, see this vignette: vignette("cut_phylogenies", package = "deepSTRAPP")

Examples

# Load eel phylogeny from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")

# ----- Example 1: keep_tip_labels = TRUE ----- #

# Cut tree to 30 Mya while keeping tip.label on terminal branches with a unique descending tip.
cut_eel.tree <- cut_phylo_for_focal_time(tree = eel.tree, focal_time = 30, keep_tip_labels = TRUE)

# Plot internal node labels on initial tree with cut-off
plot(eel.tree, cex = 0.5)
abline(v = max(phytools::nodeHeights(eel.tree)[,2]) - 30, col = "red", lty = 2, lwd = 2)
nb_tips <- length(eel.tree$tip.label)
nodelabels_in_cut_tree <- (nb_tips + 1):(nb_tips + eel.tree$Nnode)
nodelabels_in_cut_tree[!(nodelabels_in_cut_tree %in% cut_eel.tree$initial_nodes_ID)] <- NA
ape::nodelabels(text = nodelabels_in_cut_tree)

# Plot initial internal node labels on cut tree
plot(cut_eel.tree, cex = 0.8)
ape::nodelabels(text = cut_eel.tree$initial_nodes_ID)

# Plot edge labels on initial tree with cut-off
plot(eel.tree, cex = 0.5)
abline(v = max(phytools::nodeHeights(eel.tree)[,2]) - 30, col = "red", lty = 2, lwd = 2)
edgelabels_in_cut_tree <- 1:nrow(eel.tree$edge)
edgelabels_in_cut_tree[!(1:nrow(eel.tree$edge) %in% cut_eel.tree$initial_edges_ID)] <- NA
ape::edgelabels(text = edgelabels_in_cut_tree)

# Plot initial edge labels on cut tree
plot(cut_eel.tree, cex = 0.8)
ape::edgelabels(text = cut_eel.tree$initial_edges_ID)

# ----- Example 2: keep_tip_labels = FALSE ----- #

# Cut tree to 30 Mya without keeping tip.label on terminal branches with a unique descending tip.
# All tip.labels are converted to their descending/tipward node ID
cut_eel.tree <- cut_phylo_for_focal_time(tree = eel.tree, focal_time = 30, keep_tip_labels = FALSE)
plot(cut_eel.tree, cex = 0.8)


Data summarizing the evolution of geographic ranges in eels

Description

A list containing (fake) geographic ranges data of eels mapped on the phylogeny, modeled with R package BioGeoBEARS. This object was obtained with prepare_trait_data(). Initial data based on feeding habits was altered to be transformed into range "A" and "B", and then adding arbitrarily multi-area "AB" ranges. This is NOT real biogeographic data. Please refer to the initial article for real data.

Original data source: Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505

Usage

data(eel_biogeo_data)

Format

A list with 9 elements.

Details

A list of 9 elements containing information on the evolution of geographic ranges in eels. This object was obtained with prepare_trait_data().

References

Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505

See Also

prepare_trait_data()


Data summarizing the evolution of feeding habits in eels using a 3-level factor as categorical trait

Description

A list containing feeding habits data of eels mapped on the phylogeny, modeled with geiger::fitDiscrete. This object was obtained with prepare_trait_data(). Initial data was altered arbitrarily to create three categories, adding a "kiss" feeding habit to the initial "bite" and "suction" data. This is NOT real biological data. Please refer to the initial article for real data.

Original data source: Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505

Usage

data(eel_cat_3lvl_data)

Format

A list with 6 elements.

Details

A list of five objects containing information on the evolution of feeding habits in eels. This object was obtained with prepare_trait_data().

References

Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505

See Also

prepare_trait_data()


Extract diversification data from a BAMM_object

Description

Extracts regimes ID and tip rates from a BAMM_object that have been updated to provide diversification data for a specific time in the past (i.e. the focal_time). Use update_rates_and_regimes_for_focal_time() to obtain a BAMM_object updated for a given focal_time.

Usage

extract_diversification_data_melted_df_for_focal_time(
  BAMM_object,
  verbose = TRUE
)

Arguments

BAMM_object

Object of class "bammdata", typically generated with update_rates_and_regimes_for_focal_time(), that contains a phylogenetic tree and associated diversification rate mapping across selected posterior samples.

verbose

Logical. Should progression be displayed? A message will be printed for every batch of 100 BAMM posterior samples extracted. Default is TRUE.

Value

Returns a data.frame with six columns.

Author(s)

Maël Doré

Examples


if (deepSTRAPP::is_dev_version())
{
 # Load the BAMM_object summarizing 1000 posterior samples of BAMM updated for a focal_time of 10 My
 data(Ponerinae_BAMM_object_10My)
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

  # (May take several minutes to run)
 # Extract diversification data
 diversification_data_df <- extract_diversification_data_melted_df_for_focal_time(
     BAMM_object = Ponerinae_BAMM_object_10My,
     verbose = TRUE)
 # Print output
 head(diversification_data_df) 
}


Extract biogeographic range data mapped on a phylogeny at a given time in the past

Description

Extracts the most likely ranges found along branches at a specific time in the past (i.e. the focal_time). Optionally, the function can update the mapped phylogeny (densityMaps) such as branches overlapping the focal_time are shorten to the focal_time, and the range mapping for the cut off branches are removed by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

extract_most_likely_ranges_from_densityMaps_for_focal_time(
  densityMaps,
  ace = NULL,
  tip_data = NULL,
  focal_time,
  update_densityMaps = FALSE,
  keep_tip_labels = TRUE
)

Arguments

densityMaps

List of objects of class "densityMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given range along branches. Each object (i.e., densityMap) corresponds to a range. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

ace

(Optional) Numerical matrix that record the posterior probabilities of ancestral ranges at internal nodes, obtained with prepare_trait_data() as output in the ⁠$ace⁠ slot. Rows are internal nodes_ID. Columns are ranges. Values are posterior probabilities of each range per node. Needed to provide accurate estimates of ancestral ranges.

tip_data

(Optional) Named character string vector of tip ranges. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.

focal_time

Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

update_densityMaps

Logical. Specify whether the mapped phylogeny (densityMaps) provided as input should be updated for visualization and returned among the outputs. Default is FALSE. The update consists in cutting off branches and mapping that are younger than the focal_time.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label on the updated densityMaps. Default is TRUE. Used only if update_map = TRUE.

Details

The mapped phylogeny (densityMaps) is cut at a specific time in the past (i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.

—– Extract trait_data —–

Most likely ranges are extracted from the posterior probabilities displayed in the densityMaps. The range with the highest probability is assigned to each tip and cut branches at focal_time.

True ML estimates will be used if tip_data and/or ace are provided as optional inputs. In practice the discrepancy is negligible.

—– Update the densityMaps —–

To obtain updated densityMaps alongside the trait data, set update_densityMaps = TRUE. The update consists in cutting off branches and mapping that are younger than the focal_time.

The ancestral range mapping in densityMap (⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠) is updated accordingly by removing mapping associated with the cut off branches.

Value

By default, the function returns a list with three elements.

If update_densityMaps = TRUE, the output is a list with four elements: ⁠$trait_data⁠, ⁠$focal_time⁠, ⁠$trait_data_type⁠, and ⁠$densityMaps⁠.

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() cut_densityMaps_for_focal_time()

Associated main function: extract_most_likely_trait_values_for_focal_time()

Sub-functions for other types of trait data:

extract_most_likely_trait_values_from_contMap_for_focal_time() extract_most_likely_states_from_densityMaps_for_focal_time()

Examples


## Load biogeographic range data mapped on a phylogeny
data(eel_biogeo_data, package = "deepSTRAPP")

# Explore data
str(eel_biogeo_data, 1)
eel_biogeo_data$densityMaps # Two density maps: one per unique area: A, B.
eel_biogeo_data$densityMaps_all_ranges # Three density maps: one per range: A, B, and AB.

# Set focal time to 10 Mya
focal_time <- 10

# ----- Example 1: Using only unique areas ----- #

 # (May take several minutes to run)
## Extract trait data and update densityMaps for the given focal_time

# Extract from the densityMaps
eel_biogeo_data_10My <- extract_most_likely_ranges_from_densityMaps_for_focal_time(
   densityMaps = eel_biogeo_data$densityMaps,
   # ace = eel_biogeo_data$ace,
   focal_time = focal_time,
   update_densityMaps = TRUE)

## Print trait data
str(eel_biogeo_data_10My, 1)
eel_biogeo_data_10My$trait_data

## Plot density maps as overlay of all range posterior probabilities

# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_biogeo_data$densityMaps, fsize = 0.7)
abline(v = max(phytools::nodeHeights(eel_biogeo_data$densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_biogeo_data_10My$densityMaps, fsize = 0.7) 

# ----- Example 2: Using all ranges ----- #

 # (May take several minutes to run)
## Extract trait data and update densityMaps_all_ranges for the given focal_time

# Extract from the densityMaps
eel_biogeo_data_10My <- extract_most_likely_ranges_from_densityMaps_for_focal_time(
  densityMaps = eel_biogeo_data$densityMaps_all_ranges,
  # ace = eel_biogeo_data$ace_all_ranges,
  focal_time = focal_time,
  update_densityMaps = TRUE)

## Print trait data
str(eel_biogeo_data_10My, 1)
eel_biogeo_data_10My$trait_data

## Plot density maps as overlay of all range posterior probabilities

# Plot initial density maps with ACE pies
root_age <- max(phytools::nodeHeights(eel_biogeo_data$densityMaps_all_ranges[[1]]$tree)[,2])
plot_densityMaps_overlay(densityMaps = eel_biogeo_data$densityMaps_all_ranges, fsize = 0.7)
abline(v =  root_age - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_biogeo_data_10My$densityMaps, fsize = 0.7) 


Extract categorical trait data mapped on a phylogeny at a given time in the past

Description

Extracts the most likely states found along branches at a specific time in the past (i.e. the focal_time). Optionally, the function can update the mapped phylogeny (densityMaps) such as branches overlapping the focal_time are shorten to the focal_time, and the trait mapping for the cut off branches are removed by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

extract_most_likely_states_from_densityMaps_for_focal_time(
  densityMaps,
  ace = NULL,
  tip_data = NULL,
  focal_time,
  update_densityMaps = FALSE,
  keep_tip_labels = TRUE
)

Arguments

densityMaps

List of objects of class "densityMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given state along branches. Each object (i.e., densityMap) corresponds to a state. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

ace

(Optional) Numerical matrix that record the posterior probabilities of ancestral states at internal nodes, obtained with prepare_trait_data() as output in the ⁠$ace⁠ slot. Rows are internal nodes_ID. Columns are states. Values are posterior probabilities of each state per node. Needed to provide accurate estimates of ancestral states.

tip_data

(Optional) Named character string vector of tip states. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.

focal_time

Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

update_densityMaps

Logical. Specify whether the mapped phylogeny (densityMaps) provided as input should be updated for visualization and returned among the outputs. Default is FALSE. The update consists in cutting off branches and mapping that are younger than the focal_time.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label on the updated densityMaps. Default is TRUE. Used only if update_map = TRUE.

Details

The mapped phylogeny (densityMaps) is cut at a specific time in the past (i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.

—– Extract trait_data —–

Most likely states are extracted from the posterior probabilities displayed in the densityMaps. The state with the highest probability is assigned to each tip and cut branches at focal_time.

True ML estimates will be used if tip_data and/or ace are provided as optional inputs. In practice the discrepancy is negligible.

—– Update the densityMaps —–

To obtain updated densityMaps alongside the trait data, set update_densityMaps = TRUE. The update consists in cutting off branches and mapping that are younger than the focal_time.

The categorical trait mapping in densityMap (⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠) is updated accordingly by removing mapping associated with the cut off branches.

Value

By default, the function returns a list with three elements.

If update_densityMaps = TRUE, the output is a list with four elements: ⁠$trait_data⁠, ⁠$focal_time⁠, ⁠$trait_data_type⁠, and ⁠$densityMaps⁠.

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() cut_densityMaps_for_focal_time()

Associated main function: extract_most_likely_trait_values_for_focal_time()

Sub-functions for other types of trait data:

extract_most_likely_trait_values_from_contMap_for_focal_time() extract_most_likely_ranges_from_densityMaps_for_focal_time()

Examples

# ----- Example 1: Only extent taxa (Ultrametric tree) ----- #

## Load categorical trait data mapped on a phylogeny
data(eel_cat_3lvl_data, package = "deepSTRAPP")

# Explore data
str(eel_cat_3lvl_data, 1)
eel_cat_3lvl_data$densityMaps # Three density maps: one per state

# Set focal time to 10 Mya
focal_time <- 10

 # (May take several minutes to run)
## Extract trait data and update densityMaps for the given focal_time

# Extract from the densityMaps
eel_cat_3lvl_data_10My <- extract_most_likely_states_from_densityMaps_for_focal_time(
   densityMaps = eel_cat_3lvl_data$densityMaps,
   # ace = eel_cat_3lvl_data$ace,
   focal_time = focal_time,
   update_densityMaps = TRUE)

## Print trait data
str(eel_cat_3lvl_data_10My, 1)
eel_cat_3lvl_data_10My$trait_data

## Plot density maps as overlay of all state posterior probabilities

# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_cat_3lvl_data$densityMaps, fsize = 0.5)
abline(v = max(phytools::nodeHeights(eel_cat_3lvl_data$densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_cat_3lvl_data_10My$densityMaps, fsize = 0.7) 


# ----- Example 2: Include fossils (Non-ultrametric tree) ----- #
## Test with non-ultrametric trees like mammals in motmot

## Prepare data

# Load mammals phylogeny and data from the R package motmot included within deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")

# Obtain mammal tree
mammals_tree <- mammals$mammal.phy
# Convert mass data into categories
mammals_mass <- setNames(object = mammals$mammal.mass$mean,
                         nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
mammals_data <- mammals_mass
mammals_data[seq_along(mammals_data)] <- "small"
mammals_data[mammals_mass > 5] <- "medium"
mammals_data[mammals_mass > 10] <- "large"
table(mammals_data)

 # (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
mammals_cat_data <- prepare_trait_data(tip_data = mammals_data, phylo = mammals_tree,
                                       trait_data_type = "categorical",
                                       evolutionary_models = "ER",
                                       nb_simulations = 100,
                                       plot_map = FALSE)

# Set focal time
focal_time <- 80

## Extract trait data and update densityMaps for the given focal_time

# Extract from the densityMaps
mammals_cat_data_80My <- extract_most_likely_states_from_densityMaps_for_focal_time(
    densityMaps = mammals_cat_data$densityMaps,
    focal_time = focal_time,
    update_densityMaps = TRUE)

## Print trait data
str(mammals_cat_data_80My, 1)
mammals_cat_data_80My$trait_data

## Plot density maps as overlay of all state posterior probabilities

# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = mammals_cat_data$densityMaps, fsize = 0.7)
abline(v = max(phytools::nodeHeights(mammals_cat_data$densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(mammals_cat_data_80My$densityMaps, fsize = 0.8) 


Extract trait data mapped on a phylogeny at a given time in the past

Description

Extracts the most likely trait values found along branches at a specific time in the past (i.e. the focal_time). Optionally, the function can update the mapped phylogeny (contMap or densityMaps) such as branches overlapping the focal_time are shorten to the focal_time, and the trait mapping for the cut off branches are removed by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

extract_most_likely_trait_values_for_focal_time(
  contMap = NULL,
  densityMaps = NULL,
  ace = NULL,
  tip_data = NULL,
  trait_data_type,
  focal_time,
  update_map = FALSE,
  keep_tip_labels = TRUE
)

Arguments

contMap

For continuous trait data. Object of class "contMap", typically generated with prepare_trait_data() or phytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

densityMaps

For categorical trait or biogeographic data. List of objects of class "densityMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given state/range along branches. Each object (i.e., densityMap) corresponds to a state/range. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

ace

(Optional) Ancestral Character Estimates (ACE) at the internal nodes. Obtained with prepare_trait_data() as output in the ⁠$ace⁠ slot.

  • For continuous trait data: Named numerical vector typically generated with phytools::fastAnc(), phytools::anc.ML(), or ape::ace(). Names are nodes_ID of the internal nodes. Values are ACE of the trait.

  • For categorical trait or biogeographic data: Matrix that record the posterior probabilities of ancestral states/ranges. Rows are internal nodes_ID. Columns are states/ranges. Values are posterior probabilities of each state per node. Needed in all cases to provide accurate estimates of trait values.

tip_data

(Optional) Named vector of tip values of the trait.

  • For continuous trait data: Named numerical vector of trait values.

  • For categorical trait or biogeographic data: Character string vector of states/ranges Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.

trait_data_type

Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic".

focal_time

Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

update_map

Logical. Specify whether the mapped phylogeny (contMap or densityMaps) provided as input should be updated for visualization and returned among the outputs. Default is FALSE. The update consists in cutting off branches and mapping that are younger than the focal_time.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label on the updated contMap. Default is TRUE. Used only if update_map = TRUE.

Details

The mapped phylogeny (contMap or densityMaps) is cut at a specific time in the past (i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.

—– Extract trait_data —–

For continuous trait data:

If providing only the contMap trait values at tips and internal nodes will be extracted from the mapping of the contMap leading to a slight discrepancy with the actual tip data and estimated ancestral character values.

True ML trait estimates will be used if tip_data and/or ace are provided as optional inputs. In practice the discrepancy is negligible.

For categorical trait and biogeographic data:

Most likely states/ranges are extracted from the posterior probabilities displayed in the densityMaps. The states/ranges with the highest probability is assigned to each tip and cut branches at focal_time.

True ML states/ranges will be used if tip_data and/or ace are provided as optional inputs. In practice the discrepancy is negligible.

—– Update the contMap/densityMaps —–

To obtain an updated contMap/densityMaps alongside the trait data, set update_map = TRUE. The update consists in cutting off branches and mapping that are younger than the focal_time.

The mapping in contMap/densityMaps (⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠) is updated accordingly by removing mapping associated with the cut off branches.

A specific sub-function (that can be used independently) is called according to the type of trait data:

Value

By default, the function returns a list with three elements.

If update_map = TRUE, the output is a list with four elements: ⁠$trait_data⁠, ⁠$focal_time⁠, ⁠$trait_data_type⁠, and ⁠$contMap⁠ or ⁠$densityMaps⁠.

For continuous trait data:

For categorical trait and biogeographic data:

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() cut_contMap_for_focal_time() cut_densityMaps_for_focal_time()

Associated sub-functions per type of trait data:

extract_most_likely_trait_values_from_contMap_for_focal_time() extract_most_likely_states_from_densityMaps_for_focal_time() extract_most_likely_ranges_from_densityMaps_for_focal_time()

Examples

# ----- Example 1: Continuous trait ----- #

## Prepare data

# Load eel data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505

library(phytools)
data(eel.tree)
data(eel.data)

# Extract body size
eel_data <- setNames(eel.data$Max_TL_cm,
                     rownames(eel.data))

 # (May take several minutes to run)
## Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
eel_ACE <- phytools::fastAnc(tree = eel.tree, x = eel_data)

## Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
eel_contMap <- phytools::contMap(eel.tree, x = eel_data,
                                 res = 100, # Number of time steps
                                 plot = FALSE)

# Set focal time to 50 Mya
focal_time <- 50

## Extract trait data and update contMap for the given focal_time

# Extract from the contMap (values are not exact ML estimates)
eel_cont_50 <- extract_most_likely_trait_values_for_focal_time(
   contMap = eel_contMap,
   trait_data_type = "continuous",
   focal_time = focal_time,
   update_map = TRUE)
# Extract from tip data and ML estimates of ancestral characters (values are true ML estimates)
eel_cont_50 <- extract_most_likely_trait_values_for_focal_time(
   contMap = eel_contMap,
   ace = eel_ACE, tip_data = eel_data,
   trait_data_type = "continuous",
   focal_time = focal_time,
   update_map = TRUE)

## Visualize outputs

# Print trait data
eel_cont_50$trait_data

# Plot node labels on initial stochastic map with cut-off
plot(eel_contMap, fsize = c(0.5, 1))
ape::nodelabels()
abline(v = max(phytools::nodeHeights(eel_contMap$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated contMap with initial node labels
plot(eel_cont_50$contMap)
ape::nodelabels(text = eel_cont_50$contMap$tree$initial_nodes_ID) 


# ----- Example 2: Categorical trait ----- #

 # (May take several minutes to run)
## Load categorical trait data mapped on a phylogeny
data(eel_cat_3lvl_data, package = "deepSTRAPP")

# Explore data
str(eel_cat_3lvl_data, 1)
eel_cat_3lvl_data$densityMaps # Three density maps: one per state

# Set focal time to 10 Mya
focal_time <- 10

## Extract trait data and update densityMaps for the given focal_time

# Extract from the densityMaps
eel_cat_3lvl_data_10My <- extract_most_likely_trait_values_for_focal_time(
   densityMaps = eel_cat_3lvl_data$densityMaps,
   trait_data_type = "categorical",
   focal_time = focal_time,
   update_map = TRUE)

## Print trait data
str(eel_cat_3lvl_data_10My, 1)
eel_cat_3lvl_data_10My$trait_data

## Plot density maps as overlay of all state posterior probabilities

# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_cat_3lvl_data$densityMaps)
abline(v = max(phytools::nodeHeights(eel_cat_3lvl_data$densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_cat_3lvl_data_10My$densityMaps) 


# ----- Example 3: Biogeographic ranges ----- #

 # (May take several minutes to run)
## Load biogeographic range data mapped on a phylogeny
data(eel_biogeo_data, package = "deepSTRAPP")

# Explore data
str(eel_biogeo_data, 1)
eel_biogeo_data$densityMaps # Two density maps: one per unique area: A, B.
eel_biogeo_data$densityMaps_all_ranges # Three density maps: one per range: A, B, and AB.

# Set focal time to 10 Mya
focal_time <- 10

## Extract trait data and update densityMaps for the given focal_time

# Extract from the densityMaps
eel_biogeo_data_10My <- extract_most_likely_trait_values_for_focal_time(
   densityMaps = eel_biogeo_data$densityMaps,
   # ace = eel_biogeo_data$ace,
   trait_data_type = "biogeographic",
   focal_time = focal_time,
   update_map = TRUE)

## Print trait data
str(eel_biogeo_data_10My, 1)
eel_biogeo_data_10My$trait_data

## Plot density maps as overlay of all range posterior probabilities

# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_biogeo_data$densityMaps)
abline(v = max(phytools::nodeHeights(eel_biogeo_data$densityMaps[[1]]$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_biogeo_data_10My$densityMaps) 


Extract continuous trait data mapped on a phylogeny at a given time in the past

Description

Extracts the most likely trait values found along branches at a specific time in the past (i.e. the focal_time). Optionally, the function can update the mapped phylogeny (contMap) such as branches overlapping the focal_time are shorten to the focal_time, and the continuous trait mapping for the cut off branches are removed by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Usage

extract_most_likely_trait_values_from_contMap_for_focal_time(
  contMap,
  ace = NULL,
  tip_data = NULL,
  focal_time,
  update_contMap = FALSE,
  keep_tip_labels = TRUE
)

Arguments

contMap

Object of class "contMap", typically generated with prepare_trait_data() or phytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

ace

Named numeric vector (Optional). Ancestral Character Estimates (ACE) of the internal nodes, typically generated with phytools::fastAnc(), phytools::anc.ML(), or ape::ace(). Names are nodes_ID of the internal nodes. Values are ACE of the trait. Needed to provide accurate estimates of trait values.

tip_data

Named numeric vector (Optional). Tip values of the trait. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.

focal_time

Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny.

update_contMap

Logical. Specify whether the mapped phylogeny (contMap) provided as input should be updated for visualization and returned among the outputs. Default is FALSE. The update consists in cutting off branches and mapping that are younger than the focal_time.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label on the updated contMap. Default is TRUE. Used only if update_contMap = TRUE.

Details

The mapped phylogeny (contMap) is cut at a specific time in the past (i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.

—– Extract trait_data —–

If providing only the contMap trait values at tips and internal nodes will be extracted from the mapping of the contMap leading to a slight dependency with the actual tip data and estimated ancestral character values.

True ML estimates will be used if tip_data and/or ace are provided as optional inputs. In practice the discrepancy is negligible.

—– Update the contMap —–

To obtain an updated contMap alongside the trait data, set update_contMap = TRUE. The update consists in cutting off branches and mapping that are younger than the focal_time.

The continuous trait mapping contMap (⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠) is updated accordingly by removing mapping associated with the cut off branches.

Value

By default, the function returns a list with three elements.

If update_contMap = TRUE, the output is a list with four elements: ⁠$trait_data⁠, ⁠$focal_time⁠, ⁠$trait_data_type⁠, and ⁠$contMap⁠.

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() cut_contMap_for_focal_time()

Associated main function: extract_most_likely_trait_values_for_focal_time()

Sub-functions for other types of trait data:

extract_most_likely_states_from_densityMaps_for_focal_time() extract_most_likely_ranges_from_densityMaps_for_focal_time()

Examples

# ----- Example 1: Only extent taxa (Ultrametric tree) ----- #

## Prepare data

# Load eel data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505

library(phytools)
data(eel.tree)
data(eel.data)

# Extract body size
eel_data <- setNames(eel.data$Max_TL_cm,
                     rownames(eel.data))

# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
eel_ACE <- phytools::fastAnc(tree = eel.tree, x = eel_data)

# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
eel_contMap <- phytools::contMap(eel.tree, x = eel_data,
                                 res = 100, # Number of time steps
                                 plot = FALSE)

# Set focal time to 50 Mya
focal_time <- 50

 # (May take several minutes to run)
## Extract trait data and update contMap for the given focal_time

# Extract from the contMap (values are not exact ML estimates)
eel_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
   contMap = eel_contMap,
   focal_time = focal_time,
   update_contMap = TRUE)
# Extract from tip data and ML estimates of ancestral characters (values are true ML estimates)
eel_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
   contMap = eel_contMap,
   ace = eel_ACE, tip_data = eel_data,
   focal_time = focal_time,
   update_contMap = TRUE)

## Visualize outputs

# Print trait data
eel_test$trait_data

# Plot node labels on initial stochastic map with cut-off
plot(eel_contMap, fsize = c(0.5, 1))
ape::nodelabels()
abline(v = max(phytools::nodeHeights(eel_contMap$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated contMap with initial node labels
plot(eel_test$contMap)
ape::nodelabels(text = eel_test$contMap$tree$initial_nodes_ID) 

# ----- Example 2: Include fossils (Non-ultrametric tree) ----- #

## Test with non-ultrametric trees like mammals in motmot

## Prepare data

# Load mammals phylogeny and data from the R package motmot included within deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")

mammals_tree <- mammals$mammal.phy
mammals_data <- setNames(object = mammals$mammal.mass$mean,
                         nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]

# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
mammals_ACE <- phytools::fastAnc(tree = mammals_tree, x = mammals_data)

# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
mammals_contMap <- phytools::contMap(mammals_tree, x = mammals_data,
                                     res = 100, # Number of time steps
                                     plot = FALSE)

# Set focal time to 80 Mya
focal_time <- 80

 # (May take several minutes to run)
## Extract trait data and update contMap for the given focal_time

# Extract from the contMap (values are not exact ML estimates)
mammals_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
   contMap = mammals_contMap,
   focal_time = focal_time,
   update_contMap = TRUE)
# Extract from tip data and ML estimates of ancestral characters (values are true ML)
mammals_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
   contMap = mammals_contMap,
   ace = mammals_ACE, tip_data = mammals_data,
   focal_time = focal_time,
   update_contMap = TRUE)

## Visualize outputs

# Print trait data
mammals_test$trait_data

# Plot node labels on initial stochastic map with cut-off
phytools::plot.contMap(mammals_contMap, fsize = c(0.5, 1))
ape::nodelabels()
abline(v = max(phytools::nodeHeights(mammals_contMap$tree)[,2]) - focal_time,
       col = "red", lty = 2, lwd = 2)

# Plot updated contMap with initial node labels
phytools::plot.contMap(mammals_test$contMap, fsize = c(0.8, 1))
ape::nodelabels(text = mammals_test$contMap$tree$initial_nodes_ID) 


Check whether deepSTRAPP is a development version

Description

Detect whether the current deepSTRAPP installation corresponds to a development version (e.g., version sourced from GitHub), as opposed to a CRAN release.

The development versions include deepSTRAPP outputs as additional internal datasets to help produces examples and vignettes outputs. These additional datasets are removed from the CRAN releases because their size is not compatible with CRAN policies.

This function is only used to check if an example must be ran or a vignette chunk evaluated to produce output from data.

Usage

is_dev_version(pkg = "deepSTRAPP")

Arguments

pkg

Character string. Name of the R package for which the version type is inspected. Default is "deepSTRAPP".

Value

Logical. TRUE if running a development version.

Author(s)

Maël Doré

Examples

# Check the current deepSTRAPP installation is a development version
is_dev_version(pkg = "deepSTRAPP")


Phylogeny and body mass data for extant and extinct mammals families/genera from Slater, 2013

Description

A list containing two elements:

Source: Slater, G. J. (2013). Phylogenetic evidence for a shift in the mode of mammalian body size evolution at the Cretaceous-Palaeogene boundary. Methods in Ecology and Evolution, 4(8), 734-744. doi:10.1111/2041-210X.12084

Usage

data(mammals)

Format

A list with 2 elements.

Details

Initial dataset from Slater, G. J. (2013). The R object was imported from R package motmot to include in deepSTRAPP.


Plot diversification rates and regime shifts from BAMM on phylogeny

Description

Plot on a time-calibrated phylogeny the evolution of diversification rates and the location of regime shifts estimated from a BAMM (Bayesian Analysis of Macroevolutionary Mixtures). Each branch is colored accroding to the estimated rates of speciation, extinction, or net diversification stored in an object of class bammdata. Rates can vary along time, thus colors evolved along individual branches.

This function is a wrapper of original functions from the R package {BAMMtools}:

Usage

plot_BAMM_rates(
  BAMM_object,
  rate_type = "net_diversification",
  method = "phylogram",
  add_regime_shifts = TRUE,
  configuration_type = "MAP",
  sample_index = 1,
  adjust_size_to_prob = TRUE,
  regimes_fill = "grey",
  regimes_size = 1,
  regimes_pch = 21,
  regimes_border_col = "black",
  regimes_border_width = 1,
  ...,
  display_plot = TRUE,
  PDF_file_path = NULL
)

Arguments

BAMM_object

Object of class "bammdata", typically generated with prepare_diversification_data(), that contains a phylogenetic tree and associated diversification rate mapping across selected posterior samples. It works also for BAMM_object updated for a specific focal_time using update_rates_and_regimes_for_focal_time(), or the deepSTRAPP workflow with run_deepSTRAPP_for_focal_time and run_deepSTRAPP_over_time().

rate_type

A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).

method

A character string indicating the method for plotting the phylogenetic tree.

  • method = "phylogram" (default) plots the phylogenetic tree using rectangular coordinates.

  • method = "polar" plots the phylogenetic tree using polar coordinates.

add_regime_shifts

Logical. Whether to add the location of regime shifts on the phylogeny (Step 2). Default is TRUE.

configuration_type

A character string specifying how to select the location of regime shifts across posterior samples.

  • configuration_type = "MAP": Use the average locations recorded in posterior samples with the Maximum A Posteriori probability (MAP) configuration. This regime shift configuration is the most frequent configuration among the posterior samples (See BAMMtools::getBestShiftConfiguration()). This is the default option.

  • configuration_type = "MSC": Use the average locations recorded in posterior samples with the Maximum Shift Credibility (MSC) configuration. This regime shift configuration has the highest product of marginal probabilities across branches (See BAMMtools::maximumShiftCredibility()).

  • configuration_type = "index": Use the configuration of a unique posterior sample those index is provided in sample_index.

sample_index

Integer. Index of the posterior samples to use to plot the location of regime shifts. Used only if configuration_type = index. Default = 1.

adjust_size_to_prob

Logical. Whether to scale the size of the symbols showing the location of regime shifts according to the marginal shift probability of the shift happening on each location/branch. This will only works if there is an ⁠$MSP_tree⁠ element summarizing the marginal shift probabilities across branches in the BAMM_object. Default is TRUE.

regimes_fill

Character string. Set the color of the background of the symbols showing the location of regime shifts. Equivalent to the bg argument in BAMMtools::addBAMMshifts(). Default is "grey".

regimes_size

Numerical. Set the size of the symbols showing the location of regime shifts. Equivalent to the cex argument in BAMMtools::addBAMMshifts(). Default is 1.

regimes_pch

Integer. Set the shape of the symbols showing the location of regime shifts. Equivalent to the pch argument in BAMMtools::addBAMMshifts(). Default is 21.

regimes_border_col

Character string. Set the color of the border of the symbols showing the location of regime shifts. Equivalent to the col argument in BAMMtools::addBAMMshifts(). Default is "black".

regimes_border_width

Numerical. Set the width of the border of the symbols showing the location of regime shifts. Equivalent to the lwd argument in BAMMtools::addBAMMshifts(). Default is 1.

...

Additional graphical arguments to pass down to BAMMtools::plot.bammdata(), BAMMtools::addBAMMshifts(), and par().

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Details

The main input BAMM_object is the typical output of prepare_diversification_data(). It provides information on rates and regimes shifts across the posterior samples of a BAMM.

⁠$MAP_BAMM_object⁠ and ⁠$MSC_BAMM_object⁠ elements are required to plot regime shift locations following the "MAP" or "MSC" configuration_type respectively. A ⁠$MSP_tree⁠ element is required to scale the size of the symbols showing the location of regime shifts according marginal shift probabilities. (If adjust_size_to_prob = TRUE).

The default option to display regime shift is to use the average locations from the posterior samples with the Maximum A Posteriori probability (MAP) configuration. However, sometimes, multiple configurations have similarly high frequency in the posterior samples (See BAMMtools::credibleShiftSet(). An alternative is to use the average locations from posterior samples with the Maximum Shift Credibility (MSC) configuration instead. This regime shift configuration has the highest product of marginal probabilities across branches where a shift is estimated. It may differ from the MAP configuration. (See BAMMtools::maximumShiftCredibility()).

Value

The function returns (invisibly) a list with three three elements similarly to BAMMtools::plot.bammdata().

Author(s)

Maël Doré

Original functions by Mike Grundler & Pascal Title in R package {BAMMtools}.

See Also

Initial functions in BAMMtools: BAMMtools::plot.bammdata() BAMMtools::addBAMMshifts()

Associated functions in deepSTRAPP: prepare_diversification_data() update_rates_and_regimes_for_focal_time() run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time()

Examples

# Load BAMM output
data(whale_BAMM_object, package = "deepSTRAPP")

## Plot overall mean rates with MAP configuration for regime shifts
# (rates are averaged only all posterior samples)
plot_BAMM_rates(whale_BAMM_object, add_regime_shifts = TRUE,
                configuration_type = "MAP", bg = "black",
                regimes_size = 3)
## Plot overall mean rates with MSC configuration for regime shifts
# (rates are averaged only all posterior samples)
plot_BAMM_rates(whale_BAMM_object, add_regime_shifts = TRUE,
                configuration_type = "MSC", bg = "black",
                regimes_size = 3)

## Plot mean MAP rates with regime shifts
# (rates are averaged only across MAP samples)
plot_BAMM_rates(whale_BAMM_object$MAP_BAMM_object, add_regime_shifts = TRUE,
                configuration_type = "index",
                # Set to index to use the regime shift data from the '$MAP_BAMM_object'
                regimes_size = 3, bg = "black")
## Plot mean MSC rates (rates averaged only across MSC samples) with regime shifts
# (rates averaged only across MSC samples)
plot_BAMM_rates(whale_BAMM_object$MSC_BAMM_object, add_regime_shifts = TRUE,
                configuration_type = "index",
                # Set to index to use the regime shift data from the '$MSC_BAMM_object'
                regimes_size = 3, bg = "black")


Plot evolution of p-values of STRAPP tests over time

Description

Plot the evolution of the p-values of STRAPP tests carried out for across multiple time_steps, obtained from run_deepSTRAPP_over_time().

By default, return a plot with a single line for p-values of overall tests. If plot_posthoc_tests = TRUE, it will return a plot with multiple lines, one per pair in post hoc tests (only for multinominal data, with more than two states).

If a PDF file path is provided in PDF_file_path, the plot will be saved directly in a PDF file.

Usage

plot_STRAPP_pvalues_over_time(
  deepSTRAPP_outputs,
  time_range = NULL,
  pvalues_max = NULL,
  alpha = 0.05,
  display_plot = TRUE,
  plot_significant_time_frame = TRUE,
  plot_posthoc_tests = FALSE,
  select_posthoc_pairs = "all",
  plot_adjusted_pvalues = FALSE,
  PDF_file_path = NULL
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_over_time(), that summarize the results of multiple deepSTRAPP across ⁠$time_steps⁠.

time_range

Vector of two numerical values. Time boundaries used for X-axis the plot. If NULL (the default), the range of data provided in deepSTRAPP_outputs will be used.

pvalues_max

Numerical. Set the max boundary used for the Y-axis of the plot. If NULL (the default), the maximum p-value provided in deepSTRAPP_outputs will be used.

alpha

Numerical. Significance level to display as a red dashed line on the plot. If set to NULL, no line will be added. Default is 0.05.

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

plot_significant_time_frame

Logical. Whether to display a green band over the time frame that yields significant results according to the chosen alpha level. Default is TRUE.

plot_posthoc_tests

Logical. For multinominal data only. Whether to plot the p-values for the overall Kruskal-Wallis test across all states (plot_posthoc_tests = FALSE), or plot the p-values for the pairwise post hoc Dunn's test across pairs of states (plot_posthoc_tests = TRUE). Default is FALSE. This is only possible if deepSTRAPP_outputs contains the ⁠$pvalues_summary_df_for_posthoc_pairwise_tests⁠ element returned by run_deepSTRAPP_over_time() when posthoc_pairwise_tests = TRUE.

select_posthoc_pairs

Vector of character strings used to specify the pairs to include in the plot. Names of pairs must match the pairs found in deepSTRAPP_outputs$pvalues_summary_df_for_posthoc_pairwise_tests$pair. Default is "all" to include all pairs.

plot_adjusted_pvalues

Logical. Whether to display the p-values adjusted for multiple testing rather than the raw p-values. See argument 'p.adjust_method' in run_deepSTRAPP_for_focal_time() or run_deepSTRAPP_over_time(). Default is FALSE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Details

Plots are build based on the p-values recorded in summary_df provided by run_deepSTRAPP_over_time().

For overall tests, those p-values are found in ⁠$pvalues_summary_df⁠.

For multinominal data (categorical or biogeographic data with more than 2 states), it is possible to plot p-values of post hoc pairwise tests. Set plot_posthoc_tests = TRUE to generate plots for the pairwise post hoc Dunn's test across pairs of states. To achieve this, the deepSTRAPP_outputs input object must contain a ⁠$pvalues_summary_df_for_posthoc_pairwise_tests⁠ element that summarizes p-values computed across pairs of states for all post hoc tests. This is obtained from run_deepSTRAPP_over_time() when setting posthoc_pairwise_tests = TRUE to carry out post hoc tests.

Value

The function returns a list of classes gg and ggplot. This object is a ggplot that can be displayed on the console with print(output). It corresponds to the plot being displayed on the console when the function is run, if display_plot = TRUE, and can be further modify for aesthetics using the ggplot2 grammar.

If a PDF_file_path is provided, the function will also generate a PDF file of the plot.

Author(s)

Maël Doré

See Also

run_deepSTRAPP_over_time()

Examples

if (deepSTRAPP::is_dev_version())
{
  ## Load results of run_deepSTRAPP_over_time() for categorical data with 3-levels
  data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
  ## This dataset is only available in development versions installed from GitHub.
  # It is not available in CRAN versions.
  # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

  ## Plot results of overall Kruskal-Wallis / Mann-Whitney-Wilcoxon tests across all time-steps
  plot_overall <- plot_STRAPP_pvalues_over_time(
     deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
     alpha = 0.10,
     time_range = c(0, 30), # Adjust time range if needed
     display_plot = FALSE)

  # Adjust aesthetics a posteriori
  plot_overall <- plot_overall +
     ggplot2::theme(
        plot.title = ggplot2::element_text(size = 16))

  print(plot_overall)

  ## Plot results of post hoc pairwise Dunn's tests between selected pairs of states
  plot_posthoc <- plot_STRAPP_pvalues_over_time(
     deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
     alpha = 0.10,
     plot_posthoc_tests = TRUE,
     # PDF_file_path = "./pvalues_over_time.pdf",
     select_posthoc_pairs = c("arboreal != subterranean",
                              "arboreal != terricolous"),
     display_plot = FALSE)

  # Adjust aesthetics a posteriori
  plot_posthoc <- plot_posthoc +
     ggplot2::theme(
        plot.title = ggplot2::element_text(size = 16),
        legend.title = ggplot2::element_text(size  = 14),
        legend.position.inside = c(0.25, 0.25))

  print(plot_posthoc)
}


Plot continuous trait evolution on the tree

Description

Plot on a time-calibrated phylogeny the evolution of a continuous trait as summarized in a contMap object typically generated with prepare_trait_data().

This function is a wrapper of original functions from the R package {phytools}:

Usage

plot_contMap(
  contMap,
  color_scale = NULL,
  ...,
  display_plot = TRUE,
  PDF_file_path = NULL
)

Arguments

contMap

List of class "contMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given state/range along branches. Each object (i.e., densityMap) corresponds to a state/range. If no color is provided for multi-area ranges, they will be interpolated.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() showing the evolution of a continuous trait. From lowest values to highest values.

...

Additional arguments to pass down to phytools::plot.contMap() to control plotting.

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Details

This function is a wrapper of original functions phytools::setMap() and phytools::plot.contMap(). Additions are listed below:

Value

If display_plot = TRUE, the function plots the time-calibrated phylogeny displaying the evolution of a continuous trait. If PDF_file_path is provided, the function exports the plot into a PDF file.

An object of class "contMap" with an (optionally) updated color scale (⁠$cols⁠) is returned invisibly.

Author(s)

Maël Doré

Original functions by Liam Revell in R package {phytools}. Contact: liam.revell@umb.edu

See Also

phytools::plot.contMap() plot_densityMaps_overlay()

Examples

# Load phylogeny
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_tree, package = "deepSTRAPP")

## Prepare trait data

 # (May take several minutes to run)
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                    nm = Ponerinae_trait_tip_data$Taxa)

# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree, x = Ponerinae_cont_tip_data)

# Infer Ancestral Character Estimates based on a Brownian Motion model
# and run a Stochastic Mapping to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
                                       res = 100, # Number of time steps
                                       plot = FALSE)

# Plot contMap with the phytools method
plot(x = Ponerinae_contMap, fsize = c(0.5, 1))

# Plot contMap with an updated color scale
plot_contMap(contMap = Ponerinae_contMap, fsize = c(0.5, 1),
             color_scale = c("darkgreen", "limegreen", "orange", "red"))
             # PDF_file_path = "Ponerinae_contMap.pdf") 


Plot posterior probabilities of states/ranges on phylogeny from densityMaps

Description

Plot on a time-calibrated phylogeny the evolution of a categorical trait/biogeographic ranges summarized from densityMaps typically generated with prepare_trait_data(). Each branch is colored according to the posterior probability of being in a given state/range. Color for each state/range are overlaid using transparency to produce a single plot for all states/ranges.

Usage

plot_densityMaps_overlay(
  densityMaps,
  colors_per_levels = NULL,
  add_ACE_pies = TRUE,
  cex_pies = 0.5,
  ace = NULL,
  ...,
  display_plot = TRUE,
  PDF_file_path = NULL
)

Arguments

densityMaps

List of objects of class "densityMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given state/range along branches. Each object (i.e., densityMap) corresponds to a state/range. If no color is provided for multi-area ranges, they will be interpolated.

colors_per_levels

Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors. If NULL (default), the color scale provided densityMaps will be used.

add_ACE_pies

Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = TRUE.

cex_pies

Numerical. To adjust the size of the ACE pies. Default = 0.5.

ace

Numerical matrix. To provide the posterior probabilities of ancestral states/ranges (characters) estimates (ACE) at internal nodes used to plot the ACE pies. Rows are internal nodes. Columns are states/ranges. Values are posterior probabilities of each state per node. Typically generated with prepare_trait_data() in the ⁠$ace⁠ slot. If NULL (default), the ACE are extracted from the densityMaps with a possible slight discrepancy with the actual tip states and estimated posterior probabilities of ancestral states.

...

Additional arguments to pass down to phytools::plotSimmap() to control plotting.

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Value

If display_plot = TRUE, the function plots a time-calibrated phylogeny displaying the evolution of a categorical trait/biogeographic ranges. If PDF_file_path is provided, the function exports the plot into a PDF file.

Author(s)

Maël Doré

Original functions by Liam Revell in R package {phytools}. Contact: liam.revell@umb.edu

See Also

phytools::plot.densityMap() phytools::plotSimmap()

Examples


# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)

# Transform feeding mode data into a 3-level factor
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
eel_data <- as.character(eel_data)
eel_data[c(1, 5, 6, 7, 10, 11, 15, 16, 17, 24, 25, 28, 30, 51, 52, 53, 55, 58, 60)] <- "kiss"
eel_data <- stats::setNames(eel_data, rownames(eel.data))
table(eel_data)

# Manually define a Q_matrix for rate classes of state transition to use in the 'matrix' model
# Does not allow transitions from state 1 ("bite") to state 2 ("kiss") or state 3 ("suction")
# Does not allow transitions from state 3 ("suction") to state 1 ("bite")
# Set symmetrical rates between state 2 ("kiss") and state 3 ("suction")
Q_matrix = rbind(c(NA, 0, 0), c(1, NA, 2), c(0, 2, NA))

# Set colors per states
colors_per_levels <- c("limegreen", "orange", "dodgerblue")
names(colors_per_levels) <- c("bite", "kiss", "suction")

 # (May take several minutes to run)
# Run evolutionary models to prepare trait data
eel_cat_3lvl_data <- prepare_trait_data(tip_data = eel_data, phylo = eel.tree,
    trait_data_type = "categorical",
    colors_per_levels = colors_per_levels,
    evolutionary_models = c("ER", "SYM", "ARD", "meristic", "matrix"),
    Q_matrix = Q_matrix,
    nb_simulations = 1000,
    plot_map = TRUE,
    plot_overlay = TRUE,
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE) 

# Load directly output
data(eel_cat_3lvl_data, package = "deepSTRAPP")

# Plot densityMaps one by one
plot(eel_cat_3lvl_data$densityMaps[[1]]) # densityMap for state n°1 ("bite")
plot(eel_cat_3lvl_data$densityMaps[[2]]) # densityMap for state n°1 ("kiss")
plot(eel_cat_3lvl_data$densityMaps[[3]]) # densityMap for state n°1 ("suction")

# Plot overlay of all densityMaps
plot_densityMaps_overlay(densityMaps = eel_cat_3lvl_data$densityMaps)


Plot histogram of STRAPP test statistics to assess results

Description

Plot an histogram of the distribution of the test statistics obtained from a STRAPP test carried out for a unique focal_time. (See compute_STRAPP_test_for_focal_time() and run_deepSTRAPP_for_focal_time()).

Returns a single histogram for overall tests. If plot_posthoc_tests = TRUE, it will return a faceted plot with an histogram per post hoc tests.

If a PDF file path is provided in PDF_file_path, the plot will be saved directly in a PDF file.

Usage

plot_histogram_STRAPP_test_for_focal_time(
  deepSTRAPP_outputs,
  focal_time = NULL,
  display_plot = TRUE,
  plot_posthoc_tests = FALSE,
  PDF_file_path = NULL
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_for_focal_time(), that summarize the results of a STRAPP test for a specific time in the past (i.e. the focal_time). deepSTRAPP_outputs can also be extracted from the output of run_deepSTRAPP_over_time() that run the whole deepSTRAPP workflow and store results inside ⁠$STRAPP_results_over_time⁠.

focal_time

Numerical. (Optional) If deepSTRAPP_outputs comprises results over multiple time-steps (i.e., output of run_deepSTRAPP_over_time(), this is the time of the STRAPP test targeted for plotting.

display_plot

Logical. Whether to display the histogram(s) generated in the R console. Default is TRUE.

plot_posthoc_tests

Logical. For multinominal data only. Whether to plot the histogram for the overall Kruskal-Wallis test across all states (plot_posthoc_tests = FALSE), or plot the histograms for all the pairwise post hoc Dunn's test across pairs of states (plot_posthoc_tests = TRUE). Default is FALSE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Details

The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_for_focal_time(). It provides information on results of a STRAPP test performed at a given focal_time.

Histograms are built based on the distribution of the test statistics. Such distributions are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_for_focal_time() when return_perm_data = TRUE so that the distributions of test stats computed across posterior samples are returned among the outputs under ⁠$STRAPP_results$perm_data_df⁠.

For multinominal data (categorical or biogeographic data with more than 2 states), it is possible to plot the histograms of post hoc pairwise tests. Set plot_posthoc_tests = TRUE to generate histograms for all the pairwise post hoc Dunn's test across pairs of states. To achieve this, the deepSTRAPP_outputs input object must contain a ⁠$STRAPP_results$posthoc_pairwise_tests$perm_data_array⁠ element that summarizes test statistics computed across posterior samples for all pairwise post hoc tests. This is obtained from run_deepSTRAPP_for_focal_time() when setting both posthoc_pairwise_tests = TRUE to carry out post hoc tests, and return_perm_data = TRUE to record distributions of test statistics.

Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(), providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the unique time-step used for plotting.

For plotting all time-steps at once, see plot_histograms_STRAPP_tests_over_time().

Value

By default, the function returns a list of classes gg and ggplot. This object is a ggplot that can be displayed on the console with print(output). It corresponds to the histogram being displayed on the console when the function is run, if display_plot = TRUE, and can be further modify for aesthetics using the ggplot2 grammar.

If using multinominal data and set plot_posthoc_tests = TRUE, the function will return a list of objects. Each object is the ggplot associated with a pairwise post hoc test. To plot each histogram i individually, use print(output_list[[i]]). To plot all histograms at once in a multifaceted plot, as displayed on the console if display_plot = TRUE, use cowplot::plot_grid(plotlist = output_list).

Each plot also displays summary statistics for the STRAPP test associated with the data displayed.

If a PDF_file_path is provided, the function will also generate a PDF file of the plot. For post hoc tests, this will save the multifaceted plot.

Author(s)

Maël Doré

See Also

Associated functions in deepSTRAPP: run_deepSTRAPP_for_focal_time() plot_histograms_STRAPP_tests_over_time()

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #

 # Load fake trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set focal time to 10 Mya
 focal_time <- 10

 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 Ponerinae_deepSTRAPP_cont_old_calib_10My <- run_deepSTRAPP_for_focal_time(
   contMap = Ponerinae_contMap,
   ace = Ponerinae_ACE,
   tip_data = Ponerinae_cont_tip_data,
   trait_data_type = "continuous",
   BAMM_object = Ponerinae_BAMM_object_old_calib,
   focal_time = focal_time,
   rate_type = "net_diversification",
   return_perm_data = TRUE,
   extract_diversification_data_melted_df = TRUE,
   return_updated_trait_data_with_Map = TRUE,
   return_updated_BAMM_object = TRUE)

 ## Explore output
 str(Ponerinae_deepSTRAPP_cont_old_calib_10My, max.level = 1)

 # ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_for_focal_time() ----- #

 histogram_ggplot <- plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_10My,
    display_plot = TRUE,
    # PDF_file_path = "./plot_STRAPP_histogram_10My.pdf",
    plot_posthoc_tests = FALSE)
 # Adjust aesthetics a posteriori
 histogram_ggplot_adj <- histogram_ggplot +
    ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(histogram_ggplot_adj) 

 # ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_over_time() ----- #

 ## Load directly outputs from run_deepSTRAPP_over_time()
 data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Select focal_time = 10My
 focal_time <- 10

 ## Plot histogram for overall test
 histogram_ggplot <- plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    focal_time = focal_time,
    display_plot = TRUE,
    # PDF_file_path = "./plot_STRAPP_histogram_10My.pdf",
    plot_posthoc_tests = FALSE)
 # Adjust aesthetics a posteriori
 histogram_ggplot_adj <- histogram_ggplot +
   ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(histogram_ggplot_adj)

 # ----- Example 2: Categorical trait ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract categorical data with 3-levels
 Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
                                         nm = Ponerinae_trait_tip_data$Taxa)
 table(Ponerinae_cat_3lvl_tip_data)

 # Select color scheme for states
 colors_per_states <- c("forestgreen", "sienna", "goldenrod")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # (May take several minutes to run)
 ## Produce densityMaps using stochastic character mapping based on an ARD Mk model
 Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
    tip_data = Ponerinae_cat_3lvl_tip_data,
    phylo = Ponerinae_tree_old_calib,
    trait_data_type = "categorical",
    colors_per_states = colors_per_states,
    evolutionary_models = "ARD", # Use default ARD model
    nb_simulations = 100, # Reduce number of simulations to save time
    seed = 1234, # Seet seed for reproducibility
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    plot_map = FALSE) 

 # Load directly output
 data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")

 ## Set focal time to 10 Mya
 focal_time <- 10

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My <- run_deepSTRAPP_for_focal_time(
    densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
    ace = Ponerinae_cat_3lvl_data_old_calib$ace,
    tip_data = Ponerinae_cat_3lvl_tip_data,
    trait_data_type = "categorical",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    focal_time = focal_time,
    rate_type = "net_diversification",
    posthoc_pairwise_tests = TRUE,
    return_perm_data = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE)

 ## Explore output
 str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My, max.level = 1)

 # ------ Plot histogram of STRAPP overall test results ------ #

 histogram_ggplot <- plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My,
    display_plot = TRUE,
    # PDF_file_path = "./plot_STRAPP_histogram_overall_test.pdf",
    plot_posthoc_tests = FALSE)
 # Adjust aesthetics a posteriori
 histogram_ggplot_adj <- histogram_ggplot +
    ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(histogram_ggplot_adj)

 # ------ Plot histograms of STRAPP post hoc test results ------ #

 histograms_ggplot_list <- plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My,
    display_plot = TRUE,
    # PDF_file_path = "./plot_STRAPP_histograms_posthoc_tests.pdf",
    plot_posthoc_tests = TRUE)
 # Plot all histograms one by one
 print(histograms_ggplot_list)
 # Plot all histograms on one faceted plot
 cowplot::plot_grid(plotlist = histograms_ggplot_list) 
}


Plot multiple histograms of STRAPP test statistics over time-steps

Description

Plot an histogram of the distribution of the test statistics obtained from a deepSTRAPP workflow carried out for each focal time in ⁠$time_steps⁠. Main input = output of a deepSTRAPP run over time using run_deepSTRAPP_over_time()).

Returns one histogram for overall tests for each focal time in ⁠$time_steps⁠. If plot_posthoc_tests = TRUE, it will return one faceted plot with an histogram per post hoc tests for each focal time in ⁠$time_steps⁠.

If a PDF file path is provided in PDF_file_path, the plots will be saved directly in a PDF file, with one page per focal time in ⁠$time_steps⁠.

Usage

plot_histograms_STRAPP_tests_over_time(
  deepSTRAPP_outputs,
  display_plots = TRUE,
  plot_posthoc_tests = FALSE,
  PDF_file_path = NULL
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_over_time(), that summarize the results of multiple deepSTRAPP across ⁠$time_steps⁠. It needs to include the ⁠$STRAPP_results_over_time⁠ element with ⁠$perm_data_df⁠ obtained when setting both return_STRAPP_results = TRUE and return_perm_data = TRUE.

display_plots

Logical. Whether to display the histograms generated in the R console. Default is TRUE.

plot_posthoc_tests

Logical. For multinominal data only. Whether to plot the histograms for the overall Kruskal-Wallis test across all states (plot_posthoc_tests = FALSE), or plot the histograms for all the pairwise post hoc Dunn's tests across pairs of states (plot_posthoc_tests = TRUE). Time-steps at which the data does not yield more than two states/ranges will show a warning and generate no plot. Default is FALSE.

PDF_file_path

Character string. If provided, the plots will be saved in a unique PDF file following the path provided here. The path must end with ".pdf". Each page of the PDF corresponds to a focal time in ⁠$time_steps⁠.

Details

The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_over_time(). It provides information on results of a STRAPP tests performed over multiple time-steps.

Histograms are built based on the distribution of the test statistics. Such distributions are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_over_time() when return_STRAPP_results = TRUE AND return_perm_data = TRUE. The ⁠$STRAPP_results_over_time⁠ objects provided within the input are lists that must contain a ⁠$perm_data_df⁠ element that summarizes test statistics computed across posterior samples.

For multinominal data (categorical or biogeographic data with more than 2 states), it is possible to plot the histograms of post hoc pairwise tests. Set plot_posthoc_tests = TRUE to generate histograms for all the pairwise post hoc Dunn's test across pairs of states. To achieve this, the ⁠$STRAPP_results_over_time⁠ objects must contain a ⁠$posthoc_pairwise_tests$perm_data_array⁠ element that summarizes test statistics computed across posterior samples for all pairwise post hoc tests. This is obtained from run_deepSTRAPP_over_time() when setting return_STRAPP_results = TRUE to return the STRAPP results, posthoc_pairwise_tests = TRUE to carry out post hoc tests, and return_perm_data = TRUE to record distributions of test statistics. Time-steps for which the data do not yield more than two states/ranges will show a warning and generate no plot.

Value

By default, the function returns a list of sub-lists of classes gg and ggplot ordered as in ⁠$time_steps⁠. Each sub-list corresponds to a ggplot for a given focal_time i that can be displayed on the console with print(output[[i]]). If display_plots = TRUE, the histograms are being displayed on the console one by one while generated.

If using multinominal data and set plot_posthoc_tests = TRUE, the function will return a list of sub-lists of objects ordered as in ⁠$time_steps⁠. Each sub-list is a list of the ggplots associated with pairwise post hoc tests carried out for this a given focal_time. For a given focal_time i, to plot each histogram j individually, use print(output_list[[i]][[j]]). To plot all histograms of a given focal_time i at once in a multifaceted plot, as displayed sequentially on the console if display_plots = TRUE, use cowplot::plot_grid(plotlist = output_list[[i]]).

Each plot also displays summary statistics for the STRAPP test associated with the data displayed.

If a PDF_file_path is provided, the function will also generate a PDF file of the plots with one page per ⁠$time_steps⁠. For post hoc tests, this will save the multifaceted plots.

Author(s)

Maël Doré

See Also

Associated functions in deepSTRAPP: run_deepSTRAPP_over_time() plot_histogram_STRAPP_test_for_focal_time()

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #

 # Load fake trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

 ## Run deepSTRAPP on net diversification rates
 Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
    contMap = Ponerinae_contMap,
    ace = Ponerinae_ACE,
    tip_data = Ponerinae_cont_tip_data,
    trait_data_type = "continuous",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly trait data output
 data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Plot histograms of STRAPP overall test results
 # Tests are Spearman's rank correlation tests

 # Plot all histograms
 histogram_ggplots <- plot_histograms_STRAPP_tests_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    display_plot = TRUE,
    # PDF_file_path = "./plot_STRAPP_histogram_overall_test.pdf",
    plot_posthoc_tests = FALSE)

 # Print histogram for time step 1 = 0 My
 print(histogram_ggplots[[1]])
 # Adjust aesthetics of plot for time step 1 a posteriori
 histogram_ggplot_adj <- histogram_ggplots[[1]] +
    ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(histogram_ggplot_adj)

 # ----- Example 2: Categorical data ----- #

 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract categorical data with 3-levels
 Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
                                         nm = Ponerinae_trait_tip_data$Taxa)
 table(Ponerinae_cat_3lvl_tip_data)

 # Select color scheme for states
 colors_per_states <- c("forestgreen", "sienna", "goldenrod")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # (May take several minutes to run)
 ## Produce densityMaps using stochastic character mapping based on an ARD Mk model
 Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
    tip_data = Ponerinae_cat_3lvl_tip_data,
    phylo = Ponerinae_tree_old_calib,
    trait_data_type = "categorical",
    colors_per_levels = colors_per_states,
    evolutionary_models = "ARD",
    nb_simulations = 100,
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    plot_map = FALSE) 

 # Load directly trait data output
 data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates across time-steps.
 Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40 <- run_deepSTRAPP_over_time(
    densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
    ace = Ponerinae_cat_3lvl_data_old_calib$ace,
    tip_data = Ponerinae_cat_3lvl_tip_data,
    trait_data_type = "categorical",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    rate_type = "net_diversification",
    seed = 1234, # Set for reproducibility
    alpha = 0.10, # Select a generous level of significance for the sake of the example
    posthoc_pairwise_tests = TRUE,
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly deepSTRAPP output
 data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Explore output
 str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, max.level = 1)

 ## Plot histograms of STRAPP overall test results #
 # Tests are Kruskall-Wallis H tests when more than two states/ranges are present.
 # Tests are Mann–Whitney–Wilcoxon rank-sum tests when only two states/ranges are present.

 histogram_ggplots <- plot_histograms_STRAPP_tests_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
    display_plot = TRUE,
    # PDF_file_path = "./plot_STRAPP_histograms_overall_tests.pdf",
    plot_posthoc_tests = FALSE)

 # Print histogram for time step 1 = 0 My
 print(histogram_ggplots[[1]])
 # Adjust aesthetics of plot for time step 1 a posteriori
 histogram_ggplot_adj <- histogram_ggplots[[1]] +
     ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(histogram_ggplot_adj)

 ## Plot histograms of STRAPP post hoc test results ------ #
 # Tests are Dunn's multiple comparison pairwise post hoc tests possible
 # only when more than two states/ranges are present.

 histograms_ggplots_list <- plot_histograms_STRAPP_tests_over_time(
     deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
     display_plot = TRUE,
     # PDF_file_path = "./plot_STRAPP_histograms_posthoc_tests.pdf",
     plot_posthoc_tests = TRUE)

 # Print all histograms for time step 1 (= 0 My) one by one
 print(histograms_ggplots_list[[1]])
 # Plot all histograms for time step 1 (= 0 My) on one faceted plot
 cowplot::plot_grid(plotlist = histograms_ggplots_list[[1]])
}


Plot evolution of diversification rates in relation to trait values over time

Description

Plot the evolution of diversification rates in relation to trait values extracted for multiple time_steps with run_deepSTRAPP_over_time().

Rates are averaged across branches at each time step (i.e., focal_time).

Usage

plot_rates_through_time(
  deepSTRAPP_outputs,
  rate_type = "net_diversification",
  quantile_ranges = c(0, 0.25, 0.5, 0.75, 1),
  select_trait_levels = "all",
  time_range = NULL,
  color_scale = NULL,
  colors_per_levels = NULL,
  plot_CI = FALSE,
  CI_type = "fuzzy",
  CI_quantiles = 0.95,
  display_plot = TRUE,
  PDF_file_path = NULL,
  return_mean_data_per_samples_df = FALSE,
  return_median_data_across_samples_df = FALSE
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_over_time(), that summarize the results of multiple STRAPP tests across ⁠$time_steps⁠. The list needs to include two data.frame: ⁠$trait_data_df_over_time⁠ and ⁠$diversification_data_df_over_time⁠ by setting extract_trait_data_melted_df = TRUE and extract_diversification_data_melted_df = TRUE.

rate_type

A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). Even if the deepSTRAPP_outputs object was generated with run_deepSTRAPP_over_time() for testing another type of rates, the ⁠$trait_data_df_over_time⁠ and ⁠$diversification_data_df_over_time⁠ data frames will contain data for all types of rates.

quantile_ranges

Vector of numerical. Only for continuous trait data. Quantiles used as thresholds to group branches by trait values. It must start with 0 and finish with 1. Default is c(0, 0.25, 0.5, 0.75, 1.0) which produces four balanced quantile groups.

select_trait_levels

(Vector of) character string. Only for categorical and biogeographic trait data. To provide a list of a subset of states/ranges to plot. Names must match the ones found in deepSTRAPP_outputs$trait_data_df_over_time$trait_value. Default is all which means all states/ranges will be plotted.

time_range

Vector of two numerical values. Time boundaries used for the plot. If NULL (the default), the range of data provided in deepSTRAPP_outputs will be used.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() to display the quantile groups used to discretize the continuous trait data. From lowest values to highest values. Only for continuous data. Default = NULL will use the 'Spectral' color palette in RColorBrewer::brewer.pal().

colors_per_levels

Named character string. To set the colors to use to plot rates of each state/range. Names = states/ranges; values = colors. If NULL (default), the default ggplot2 color palette (scales::hue_pal()) will be used. Only for categorical and biogeographic data.

plot_CI

Logical. Whether to plot a confidence interval (CI) based on the distribution of rates found in posterior samples. Default is FALSE.

CI_type

Character string. To select the type of confidence interval (CI) to plot.

  • fuzzy (default): to overlay the evolution of rates found in all posterior samples with high transparency levels.

  • quantiles_rect: to add a polygon encompassing a proportion of the rate values found in posterior samples. This proportion is defined with CI_quantiles.

CI_quantiles

Numerical. Proportion of rate values across posterior samples encompassed by the confidence interval. Only if CI_type = "quantiles_rect". Default is 0.95.

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

return_mean_data_per_samples_df

Logical. Whether to include in the output the data.frame of mean rates per trait values computed for each posterior sample at each time-step (aggregated across groups of branches based on trait data). This is used to draw the confidence interval. Default is FALSE.

return_median_data_across_samples_df

Logical. Whether to include in the output the data.frame of median rates per trait values across posterior samples computed for at each time-step (aggregated across groups of branches based on trait data AND posterior samples). This is used to draw the lines on the plot. Default is FALSE.

Value

The function returns a list with at least one element.

Optional summary data frames:

If a PDF_file_path is provided, the function will also generate a PDF file of the plot.

Author(s)

Maël Doré

See Also

run_deepSTRAPP_over_time()

For a guided tutorial, see this vignette: vignette("plot_rates_through_time", package = "deepSTRAPP")

Examples


# ------ Example 1: Plot rates through time for continuous data ------ #

if (deepSTRAPP::is_dev_version())
{
  ## Load results of run_deepSTRAPP_over_time()
  data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
  ## This dataset is only available in development versions installed from GitHub.
  # It is not available in CRAN versions.
  # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

  # Visualize trait data
  hist(Ponerinae_deepSTRAPP_cont_old_calib_0_40$trait_data_df_over_time$trait_value,
     xlab = "Trait values", main = NULL)

  # Generate plot
  plotTT_continuous <- plot_rates_through_time(
     deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
     quantile_ranges = c(0, 0.25, 0.5, 0.75, 1.0),
     time_range = c(0, 50), # Control range of the X-axis
     # color_scale = c("limegreen", "red"),
     plot_CI = TRUE,
     CI_type = "quantiles_rect",
     CI_quantiles = 0.9,
     display_plot = FALSE,
     # PDF_file_path = "./plotTT_continuous.pdf",
     return_mean_data_per_samples_df = TRUE,
     return_median_data_across_samples_df = TRUE)

  # Explore output
  # str(plotTT_continuous, max.level = 1)

  # Plot
  print(plotTT_continuous$rates_TT_ggplot)
  # Adjust aesthetics of plot a posteriori
  plotTT_continuous_adj <- plotTT_continuous$rates_TT_ggplot +
      ggplot2::theme(
         plot.title = ggplot2::element_text(color = "red", size = 15),
         axis.title = ggplot2::element_text(size = 14),
         axis.text = ggplot2::element_text(size = 12))
  # Plot again
  print(plotTT_continuous_adj)
}


# ------ Example 2: Plot rates through time for categorical data ------ #

if (deepSTRAPP::is_dev_version())
{
  ## Load results of run_deepSTRAPP_over_time()
  data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
  ## This dataset is only available in development versions installed from GitHub.
  # It is not available in CRAN versions.
  # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

  # Explore trait data
  table(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40$trait_data_df_over_time$trait_value)

  # Set colors to use
  colors_per_states <- c("forestgreen", "sienna", "goldenrod")
  names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # Generate plot only for "arboreal" and "terricolous"
  plotTT_categorical <- plot_rates_through_time(
      deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
      select_trait_levels = c("arboreal", "terricolous"),
      time_range = c(0, 50),
      colors_per_levels = colors_per_states,
      plot_CI = TRUE,
      CI_type = "quantiles_rect",
      CI_quantiles = 0.9,
      display_plot = FALSE,
      # PDF_file_path = "./plotTT_categorical.pdf",
      return_mean_data_per_samples_df = TRUE,
      return_median_data_across_samples_df = TRUE)

  # Explore output
  # str(plotTT_categorical, max.level = 1)

  # Adjust aesthetics of plot a posteriori
  plotTT_categorical_adj <- plotTT_categorical$rates_TT_ggplot +
      ggplot2::theme(
         plot.title = ggplot2::element_text(size = 15),
         axis.title = ggplot2::element_text(size = 14),
         axis.text = ggplot2::element_text(size = 12))
  print(plotTT_categorical_adj)
}

# ------ Example 3: Plot rates through time for biogeographic data ------ #

if (deepSTRAPP::is_dev_version())
{
  ## Load results of run_deepSTRAPP_over_time()
  data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
  ## This dataset is only available in development versions installed from GitHub.
  # It is not available in CRAN versions.
  # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

  # Explore range data
  table(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40$trait_data_df_over_time$trait_value)

  # Set colors to use
  colors_per_ranges <- c("mediumpurple2", "peachpuff2")
  names(colors_per_ranges) <- c("N", "O")

  plotTT_biogeographic <- plot_rates_through_time(
      deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_0_40,
      select_trait_levels = "all",
      time_range = c(0, 50),
      colors_per_levels = colors_per_ranges,
      plot_CI = TRUE,
      CI_type = "quantiles_rect",
      CI_quantiles = 0.9,
      display_plot = FALSE,
      # PDF_file_path = "./plotTT_biogeographic.pdf",
      return_mean_data_per_samples_df = TRUE,
      return_median_data_across_samples_df = TRUE)

  # Explore output
  # str(plotTT_biogeographic, max.level = 1)

  # Adjust aesthetics of plot a posteriori
  plotTT_biogeographic_adj <- plotTT_biogeographic$rates_TT_ggplot +
      ggplot2::theme(
         plot.title = ggplot2::element_text(size = 15),
         axis.title = ggplot2::element_text(size = 14),
         axis.text = ggplot2::element_text(size = 12))
  print(plotTT_biogeographic_adj)
}


Plot rates vs. trait data for a given focal time

Description

Plot rates vs. trait data as extracted for a given focal time. Data are extracted from the output of a deepSTRAPP run carried out with run_deepSTRAPP_for_focal_time() or run_deepSTRAPP_over_time()).

Returns a single plot showing rates vs. trait data for a given focal time. If the trait data are 'continuous', the plot is a scatter plot. If the trait data are 'categorical' or 'biogeographic', the plot is a boxplot.

If a PDF file path is provided in PDF_file_path, the plot will be saved directly in a PDF file.

Usage

plot_rates_vs_trait_data_for_focal_time(
  deepSTRAPP_outputs,
  focal_time = NULL,
  rate_type = "net_diversification",
  select_trait_levels = "all",
  color_scale = NULL,
  colors_per_levels = NULL,
  display_plot = TRUE,
  PDF_file_path = NULL,
  return_mean_rates_vs_trait_data_df = FALSE
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_for_focal_time(), that summarize the results of a STRAPP test for a specific time in the past (i.e. the focal_time). deepSTRAPP_outputs can also be extracted from the output of run_deepSTRAPP_over_time() that runs the whole deepSTRAPP workflow over multiple time-steps.

focal_time

Numerical. (Optional) If deepSTRAPP_outputs comprises results over multiple time-steps (i.e., output of run_deepSTRAPP_over_time(), this is the time of the STRAPP test targeted for plotting.

rate_type

A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). Even if the deepSTRAPP_outputs object was generated with run_deepSTRAPP_over_time() for testing another type of rates, the object will contain data for all types of rates.

select_trait_levels

(Vector of) character string. Only for categorical and biogeographic trait data. To provide a list of a subset of states/ranges to plot. Names must match the ones found in the deepSTRAPP_outputs. Default is all which means all states/ranges will be plotted.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() to display the points. Color scale from lowest values to highest rate values. Only for continuous data. Default = NULL will use the 'Spectral' color palette in RColorBrewer::brewer.pal().

colors_per_levels

Named character string. To set the colors to use to plot data points and box for each state/range. Names = states/ranges; values = colors. If NULL (default), the default ggplot2 color palette (scales::hue_pal()) will be used. Only for categorical and biogeographic data.

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

return_mean_rates_vs_trait_data_df

Logical. Whether to include in the output the data.frame of mean rates per trait values/states/ranges computed for each posterior sample at the focal time. Default is FALSE.

Details

The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_for_focal_time(). It provides information on results of a STRAPP test performed at a given focal_time.

Plots are built based on both trait data and diversification data as extracted for the given focal_time. Such data are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_for_focal_time() when return_updated_trait_data_with_Map = TRUE for trait data, and extract_diversification_data_melted_df = TRUE for diversification data. Please ensure to select those arguments when running deepSTRAPP.

Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(), providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the unique time-step used for plotting.

For plotting all time-steps at once, see plot_rates_vs_trait_data_over_time().

Value

The function returns a list with at least one element.

If the trait data are 'continuous', the plot is a scatter plot showing how diversification rates varies with trait values. If the trait data are 'categorical' or 'biogeographic', the plot is a boxplot showing diversification rates per states/ranges.

Each plot also displays summary statistics for the STRAPP test associated with the data displayed:

Optional summary data.frame:

If a PDF_file_path is provided, the function will also generate a PDF file of the plot.

Author(s)

Maël Doré

See Also

Associated functions in deepSTRAPP: run_deepSTRAPP_for_focal_time() plot_rates_vs_trait_data_over_time()

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #

 # Load fake trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set focal time to 10 Mya
 focal_time <- 10

 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 Ponerinae_deepSTRAPP_cont_old_calib_10My <- run_deepSTRAPP_for_focal_time(
    contMap = Ponerinae_contMap,
    ace = Ponerinae_ACE,
    tip_data = Ponerinae_cont_tip_data,
    trait_data_type = "continuous",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    focal_time = focal_time,
    rate_type = "net_diversification",
    return_perm_data = TRUE,
    # Need to be set to TRUE to save diversification data
    extract_diversification_data_melted_df = TRUE,
    # Need to be set to TRUE to save trait data
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE)

 ## Explore output
 str(Ponerinae_deepSTRAPP_cont_old_calib_10My, max.level = 1)

 # ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_for_focal_time() ----- #

 # Get plot
 rates_vs_trait_output <- plot_rates_vs_trait_data_for_focal_time(
    deepSTRAPP_outputs = deepPonerinae_deepSTRAPP_cont_old_calib_10My,
    color_scale = c("grey80", "orange"),
    display_plot = TRUE,
    # PDF_file_path = "./plot_rates_vs_trait_10My.pdf"
    return_mean_rates_vs_trait_data_df = TRUE)
 # Adjust aesthetics a posteriori
 rates_vs_trait_ggplot_adj <- rates_vs_trait_output$rates_vs_trait_ggplot +
    ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(rates_vs_trait_ggplot_adj)

 # Explore melted data.frame of mean rates and trait values extracted for the given focal time.
 head(rates_vs_trait_output$mean_rates_vs_trait_data_df) 

 # ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_over_time() ----- #

 ## Load directly outputs from run_deepSTRAPP_over_time()
 data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Select focal_time = 10My
 focal_time <- 10

 # Get plot
 rates_vs_trait_output <- plot_rates_vs_trait_data_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    focal_time = focal_time,
    color_scale = c("grey80", "purple"),
    display_plot = TRUE)
    # PDF_file_path = "./plot_rates_vs_trait_10My.pdf"

 # Adjust aesthetics a posteriori
 rates_vs_trait_ggplot_adj <- rates_vs_trait_output$rates_vs_trait_ggplot +
     ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(rates_vs_trait_ggplot_adj)


 # ----- Example 2: Categorical trait ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract categorical data with 3-levels
 Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
                                         nm = Ponerinae_trait_tip_data$Taxa)
 table(Ponerinae_cat_3lvl_tip_data)

 # Select color scheme for states
 colors_per_states <- c("forestgreen", "sienna", "goldenrod")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # (May take several minutes to run)
 ## Produce densityMaps using stochastic character mapping based on an ARD Mk model
 Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
    tip_data = Ponerinae_cat_3lvl_tip_data,
    phylo = Ponerinae_tree_old_calib,
    trait_data_type = "categorical",
    colors_per_states = colors_per_states,
    evolutionary_models = "ARD", # Use default ARD model
    nb_simulations = 100, # Reduce number of simulations to save time
    seed = 1234, # Seet seed for reproducibility
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    plot_map = FALSE)

 # Load directly output
 data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")

 ## Set focal time to 10 Mya
 focal_time <- 10

 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My <- run_deepSTRAPP_for_focal_time(
    densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
    ace = Ponerinae_cat_3lvl_data_old_calib$ace,
    tip_data = Ponerinae_cat_3lvl_tip_data,
    trait_data_type = "categorical",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    focal_time = focal_time,
    rate_type = "net_diversification",
    posthoc_pairwise_tests = TRUE,
    return_perm_data = TRUE,
    # Need to be set to TRUE to save diversification data
    extract_diversification_data_melted_df = TRUE,
    # Need to be set to TRUE to save trait data
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE)

 ## Explore output
 str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My, max.level = 1)

 ## Plot rates vs. states
 rates_vs_trait_output <- plot_rates_vs_trait_data_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My,
    focal_time = 10,
    select_trait_levels = c("arboreal", "terricolous"), # Select only two levels
    colors_per_levels = colors_per_states[c("arboreal", "terricolous")], # Adjust colors
    display_plot = TRUE,
    # PDF_file_path = "./plot_rates_vs_trait_10My.pdf",
    return_mean_rates_vs_trait_data_df = TRUE)

 # Adjust aesthetics a posteriori
 rates_vs_trait_ggplot_adj <- rates_vs_trait_output$rates_vs_trait_ggplot +
    ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
 print(rates_vs_trait_ggplot_adj)

 # Explore melted data.frame of mean rates and states extracted for the given focal time.
 head(rates_vs_trait_output$mean_rates_vs_trait_data_df) 
 }


Plot rates vs. trait data over time-steps

Description

Plot rates vs. trait data as extracted for all trait_data_dftime-steps. Data are extracted from the output of a deepSTRAPP run carried out with run_deepSTRAPP_over_time()) over multiple time-steps.

For each time-step, returns a plot showing rates vs. trait data. If the trait data are 'continuous', the plot is a scatter plot. If the trait data are 'categorical' or 'biogeographic', the plot is a boxplot.

If a PDF file path is provided in PDF_file_path, all plots will be saved directly in a unique PDF file, with one page per plot/time-step.

Usage

plot_rates_vs_trait_data_over_time(
  deepSTRAPP_outputs,
  rate_type = "net_diversification",
  select_trait_levels = "all",
  color_scale = NULL,
  colors_per_levels = NULL,
  display_plot = TRUE,
  PDF_file_path = NULL,
  return_mean_rates_vs_trait_data_df = FALSE
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_over_time() that runs the whole deepSTRAPP workflow over multiple time-steps.

rate_type

A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). Even if the deepSTRAPP_outputs object was generated with run_deepSTRAPP_over_time() for testing another type of rates, the object will contain data for all types of rates.

select_trait_levels

(Vector of) character string. Only for categorical and biogeographic trait data. To provide a list of a subset of states/ranges to plot. Names must match the ones found in the deepSTRAPP_outputs. Default is all which means all states/ranges will be plotted.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() to display the points. Color scale from lowest values to highest rate values. Only for continuous data. Default = NULL will use the 'Spectral' color palette in RColorBrewer::brewer.pal().

colors_per_levels

Named character string. To set the colors to use to plot data points and box for each state/range. Names = states/ranges; values = colors. If NULL (default), the default ggplot2 color palette (scales::hue_pal()) will be used. Only for categorical and biogeographic data.

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

return_mean_rates_vs_trait_data_df

Logical. Whether to include in the output the data.frame of mean rates per trait values/states/ranges computed for each posterior sample over all time-steps. Default is FALSE.

Details

The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_over_time(). It provides information on results of a STRAPP test performed over multiple time-steps.

Plots are built based on both trait data and diversification data as extracted for the time-steps. Such data are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_over_time().

For plotting a single focal_time, see plot_rates_vs_trait_data_for_focal_time().

Value

The function returns a list with at least one element.

If the trait data are 'continuous', the plots are scatter plots showing how diversification rates varies with trait values. If the trait data are 'categorical' or 'biogeographic', the plots are boxplots showing diversification rates per states/ranges.

Each plot also displays summary statistics for the STRAPP test associated with the data displayed:

Optional summary data.frame:

If a PDF_file_path is provided, the function will also generate a unique PDF file with one plot/page per ⁠$time_steps⁠.

Author(s)

Maël Doré

See Also

Associated functions in deepSTRAPP: run_deepSTRAPP_over_time() plot_rates_vs_trait_data_for_focal_time()

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #

 # Load fake trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

  ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

 ## Run deepSTRAPP on net diversification rates
 Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
    contMap = Ponerinae_contMap,
    ace = Ponerinae_ACE,
    tip_data = Ponerinae_cont_tip_data,
    trait_data_type = "continuous",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    return_perm_data = TRUE,
    # Need to be set to TRUE to save trait data
    extract_trait_data_melted_df = TRUE,
    # Need to be set to TRUE to save diversification data
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
     return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly trait data output
 data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Explore output
 str(Ponerinae_deepSTRAPP_cont_old_calib_0_40, max.level = 1)

 ## Plot for all time-steps
 rates_vs_trait_outputs <- plot_rates_vs_trait_data_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    color_scale = c("grey80", "purple"), # Adjust color scale
    display_plot = TRUE,
    # PDF_file_path = "./plot_rates_vs_trait_0_40My.pdf",
    return_mean_rates_vs_trait_data_df = TRUE)

 ## Print plot for time step 3 = 10 My
 print(rates_vs_trait_outputs$rates_vs_trait_ggplots[[3]])

 ## Explore melted data.frame of rates and trait data
 head(rates_vs_trait_outputs$mean_rates_vs_trait_data_df)

 # ----- Example 2: Categorical data ----- #

 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract categorical data with 3-levels
 Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
                                         nm = Ponerinae_trait_tip_data$Taxa)
 table(Ponerinae_cat_3lvl_tip_data)

 # Select color scheme for states
 colors_per_states <- c("forestgreen", "sienna", "goldenrod")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # (May take several minutes to run)
 ## Produce densityMaps using stochastic character mapping based on an ARD Mk model
 Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
    tip_data = Ponerinae_cat_3lvl_tip_data,
    phylo = Ponerinae_tree_old_calib,
    trait_data_type = "categorical",
    colors_per_levels = colors_per_states,
    evolutionary_models = "ARD",
    nb_simulations = 100,
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    plot_map = FALSE) 

 # Load directly trait data output
 data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates across time-steps.
 Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40 <- run_deepSTRAPP_over_time(
    densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
    ace = Ponerinae_cat_3lvl_data_old_calib$ace,
    tip_data = Ponerinae_cat_3lvl_tip_data,
    trait_data_type = "categorical",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    rate_type = "net_diversification",
    seed = 1234, # Set for reproducibility
    alpha = 0.10, # Select a generous level of significance for the sake of the example
    posthoc_pairwise_tests = TRUE,
    return_perm_data = TRUE,
    # Need to be set to TRUE to save trait data
    extract_trait_data_melted_df = TRUE,
    # Need to be set to TRUE to save diversification data
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly deepSTRAPP output
 data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Explore output
 str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, max.level = 1)

 # Adjust color scheme
 colors_per_states <- c("orange", "dodgerblue", "red")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

 ## Plot for all time-steps
 rates_vs_trait_outputs <- plot_rates_vs_trait_data_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
    colors_per_levels = colors_per_states, # Adjust color scheme
    display_plot = TRUE,
    # PDF_file_path = "./plot_rates_vs_trait_0_40My.pdf",
    return_mean_rates_vs_trait_data_df = TRUE)

 ## Print plot for time step 3 = 10 My
 print(rates_vs_trait_outputs$rates_vs_trait_ggplots[[3]])

 ## Explore melted data.frame of rates and states
 head(rates_vs_trait_outputs$mean_rates_vs_trait_data_df)
}


Plot trait/range evolution vs. diversification rates and regime shifts on phylogeny

Description

Plot two mapped phylogenies with evolutionary data with branches cut off at focal_time.

This function is a wrapper multiple plotting functions:

Usage

plot_traits_vs_rates_on_phylogeny_for_focal_time(
  deepSTRAPP_outputs,
  focal_time = NULL,
  color_scale = NULL,
  colors_per_levels = NULL,
  add_ACE_pies = TRUE,
  cex_pies = 0.5,
  rate_type = "net_diversification",
  keep_initial_colorbreaks = FALSE,
  add_regime_shifts = TRUE,
  configuration_type = "MAP",
  sample_index = 1,
  adjust_size_to_prob = TRUE,
  regimes_fill = "grey",
  regimes_size = 1,
  regimes_pch = 21,
  regimes_border_col = "black",
  regimes_border_width = 1,
  ...,
  display_plot = TRUE,
  PDF_file_path = NULL
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_for_focal_time(), that summarize the results of a STRAPP test for a specific time in the past (i.e. the focal_time). deepSTRAPP_outputs can also be extracted from the output of run_deepSTRAPP_over_time() that run the whole deepSTRAPP workflow over multiple time-steps.

focal_time

Numerical. (Optional) If deepSTRAPP_outputs comprises results over multiple time-steps (i.e., output of run_deepSTRAPP_over_time(), this is the time of the STRAPP test targeted for plotting.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() showing the evolution of a continuous trait. From lowest values to highest values. (For continuous trait data only)

colors_per_levels

Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors. If NULL (default), the color scale provided densityMaps will be used. (For categorical and biogeographic data only)

add_ACE_pies

Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = TRUE.

cex_pies

Numerical. To adjust the size of the ACE pies. Default = 0.5.

rate_type

A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).

keep_initial_colorbreaks

Logical. Whether to keep the same color breaks as used for the most recent focal time. Typically, the current time (t = 0). This will only works if you provide the output of run_deepSTRAPP_over_time() as deepSTRAPP_outputs. Default = FALSE.

add_regime_shifts

Logical. Whether to add the location of regime shifts on the phylogeny (Step 2). Default is TRUE.

configuration_type

A character string specifying how to select the location of regime shifts across posterior samples.

  • configuration_type = "MAP": Use the average locations recorded in posterior samples with the Maximum A Posteriori probability (MAP) configuration. This regime shift configuration is the most frequent configuration among the posterior samples (See BAMMtools::getBestShiftConfiguration()). This is the default option.

  • configuration_type = "MSC": Use the average locations recorded in posterior samples with the Maximum Shift Credibility (MSC) configuration. This regime shift configuration has the highest product of marginal probabilities across branches (See BAMMtools::maximumShiftCredibility()).

  • configuration_type = "index": Use the configuration of a unique posterior sample those index is provided in sample_index.

sample_index

Integer. Index of the posterior samples to use to plot the location of regime shifts. Used only if configuration_type = index. Default = 1.

adjust_size_to_prob

Logical. Whether to scale the size of the symbols showing the location of regime shifts according to the marginal shift probability of the shift happening on each location/branch. This will only works if there is an ⁠$MSP_tree⁠ element summarizing the marginal shift probabilities across branches in the BAMM_object. Default is TRUE.

regimes_fill

Character string. Set the color of the background of the symbols showing the location of regime shifts. Equivalent to the bg argument in BAMMtools::addBAMMshifts(). Default is "grey".

regimes_size

Numerical. Set the size of the symbols showing the location of regime shifts. Equivalent to the cex argument in BAMMtools::addBAMMshifts(). Default is 1.

regimes_pch

Integer. Set the shape of the symbols showing the location of regime shifts. Equivalent to the pch argument in BAMMtools::addBAMMshifts(). Default is 21.

regimes_border_col

Character string. Set the color of the border of the symbols showing the location of regime shifts. Equivalent to the col argument in BAMMtools::addBAMMshifts(). Default is "black".

regimes_border_width

Numerical. Set the width of the border of the symbols showing the location of regime shifts. Equivalent to the lwd argument in BAMMtools::addBAMMshifts(). Default is 1.

...

Additional graphical arguments to pass down to phytools::plot.contMap(), phytools::plotSimmap(), plot_densityMaps_overlay(), BAMMtools::plot.bammdata(), BAMMtools::addBAMMshifts(), and par().

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Details

The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_for_focal_time(). It provides information on results of a STRAPP test performed at a given focal_time, and can also encompass updated phylogenies with mapped trait evolution and diversification rates and regimes shifts if appropriate arguments are set.

⁠$MAP_BAMM_object⁠ and ⁠$MSC_BAMM_object⁠ elements are required in ⁠$updated_BAMM_object⁠ to plot regime shift locations following the "MAP" or "MSC" configuration_type respectively. A ⁠$MSP_tree⁠ element is required to scale the size of the symbols showing the location of regime shifts according marginal shift probabilities. (If adjust_size_to_prob = TRUE).

Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(), providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the unique time-step used for plotting.

For plotting all time-steps at once, see plot_traits_vs_rates_on_phylogeny_over_time().

Value

If display_plot = TRUE, the function displays a plot with two facets in the R console:

If PDF_file_path is provided, the plot will be exported in a PDF file.

Author(s)

Maël Doré

See Also

phytools::plot.densityMap() plot_densityMaps_overlay() plot_BAMM_rates()

Functions in deepSTRAPP needed to produce the deepSTRAPP_outputs as input: run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time() Function in deepSTRAPP to plot all time-steps at once: plot_traits_vs_rates_on_phylogeny_over_time()

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #
 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set focal time to 10 Mya
 focal_time <- 10

 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 Ponerinae_deepSTRAPP_cont_old_calib_10My <- run_deepSTRAPP_for_focal_time(
   contMap = Ponerinae_contMap,
   ace = Ponerinae_ACE,
   tip_data = Ponerinae_cont_tip_data,
   trait_data_type = "continuous",
   BAMM_object = Ponerinae_BAMM_object_old_calib,
   focal_time = focal_time,
   rate_type = "net_diversification",
   return_perm_data = TRUE,
   extract_diversification_data_melted_df = TRUE,
   return_updated_trait_data_with_Map = TRUE,
   return_updated_BAMM_object = TRUE)

 ## Explore output
 str(Ponerinae_deepSTRAPP_cont_old_calib_10My, max.level = 1)

 ## Plot updated contMap vs. updated diversification rates
 plot_traits_vs_rates_on_phylogeny_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_10My,
    keep_initial_colorbreaks = TRUE, # To use the same color breaks as for t = 0 My
    color_scale = c("limegreen", "orange", "red"), # Adjust color scale on contMap
    legend = TRUE, labels = TRUE, # Show legend and label on BAMM plot
    cex = 0.7) # Adjust label size on contMap
    # PDF_file_path = "Updated_maps_cont_10My.pdf") 

 # ----- Example 2: Biogeographic data ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_binary_range_table, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare range data for Old World vs. New World

 # No overlap in ranges
 table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)

 Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
                                      nm = Ponerinae_binary_range_table$Taxa)
 Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
 Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
 Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
 names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
 table(Ponerinae_NO_data)

 colors_per_levels <- c("mediumpurple2", "peachpuff2")
 names(colors_per_levels) <- c("N", "O")

  # (May take several minutes to run)
 ## Run evolutionary models
 Ponerinae_biogeo_data <- prepare_trait_data(
     tip_data = Ponerinae_NO_data,
     trait_data_type = "biogeographic",
     phylo = Ponerinae_tree_old_calib,
     evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
     BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
     keep_BioGeoBEARS_files = FALSE,
     prefix_for_files = "Ponerinae_old_calib",
     max_range_size = 2,
     split_multi_area_ranges = TRUE, # Set to TRUE to display the two outputs
     nb_simulations = 100, # Reduce to save time (Default = '1000')
     colors_per_levels = colors_per_levels,
     return_model_selection_df = TRUE,
     verbose = TRUE) 

 # Load directly output
 data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")

 ## Explore output
 str(Ponerinae_biogeo_data_old_calib, 1)

 ## Set focal time to 10 Mya
 focal_time <- 10

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 Ponerinae_deepSTRAPP_biogeo_old_calib_10My <- run_deepSTRAPP_for_focal_time(
    densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
    ace = Ponerinae_biogeo_data_old_calib$ace,
    tip_data = Ponerinae_NO_data,
    trait_data_type = "biogeographic",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    focal_time = focal_time,
    rate_type = "net_diversification",
    return_perm_data = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE)

 ## Explore output
 str(Ponerinae_deepSTRAPP_biogeo_old_calib_10My, max.level = 1)

 ## Plot updated contMap vs. updated diversification rates
 plot_traits_vs_rates_on_phylogeny_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_10My,
    # Adjust colors on densityMaps
    colors_per_levels = c("N" = "dodgerblue2", "O" = "orange"),
    # Show legend and label on BAMM plot
    legend = TRUE, labels = TRUE,
    cex_pies = 0.2, # Adjust size of ACE pies on densityMaps
    cex = 0.7) # Adjust size of tip labels on BAMM plot
    # PDF_file_path = "Updated_maps_biogeo_old_calib_10My.pdf") 

 # ----- Example 3: With outputs over_time ----- #

 ## Load directly outputs from run_deepSTRAPP_over_time()
 data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Explore output
 str(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, max.level =  1)

 ## Plot updated contMap vs. updated diversification rates for focal_time = 40My
 plot_traits_vs_rates_on_phylogeny_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_0_40,
    focal_time = 40, # Select focal_time = 40My
    # Adjust colors on densityMaps
    colors_per_levels = c("N" = "dodgerblue2", "O" = "orange"),
    # Show legend and label on BAMM plot
    legend = TRUE, labels = TRUE,
    cex_pies = 0.2, # Adjust size of ACE pies on densityMaps
    cex = 0.7) # Adjust size of tip labels on BAMM plot
    # PDF_file_path = "Updated_maps_biogeo_old_calib_40My.pdf")
}


Plot multiple mapped phylogenies of trait/range evolution vs. diversification rates and regime shifts over time-steps

Description

Plot two mapped phylogenies with evolutionary data with branches cut off for each time-step. Main input = output of a deepSTRAPP run over time using run_deepSTRAPP_over_time()).

This function is a wrapper of run_deepSTRAPP_for_focal_time() used to plot mapped phylogenies for a unique given focal_time.

Returns one plot per focal time in ⁠$time_steps⁠. If a PDF file path is provided in PDF_file_path, the plots will be saved directly in a PDF file, with one page per focal time in ⁠$time_steps⁠.

Usage

plot_traits_vs_rates_on_phylogeny_over_time(
  deepSTRAPP_outputs,
  color_scale = NULL,
  colors_per_levels = NULL,
  add_ACE_pies = TRUE,
  cex_pies = 0.5,
  rate_type = "net_diversification",
  keep_initial_colorbreaks = FALSE,
  add_regime_shifts = TRUE,
  configuration_type = "MAP",
  sample_index = 1,
  adjust_size_to_prob = TRUE,
  regimes_fill = "grey",
  regimes_size = 1,
  regimes_pch = 21,
  regimes_border_col = "black",
  regimes_border_width = 1,
  ...,
  display_plot = TRUE,
  PDF_file_path = NULL
)

Arguments

deepSTRAPP_outputs

List of elements generated with run_deepSTRAPP_over_time(), that summarize the results of a STRAPP run over multiple time-steps in the past.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() showing the evolution of a continuous trait. From lowest values to highest values. (For continuous trait data only)

colors_per_levels

Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors. If NULL (default), the color scale provided densityMaps will be used. (For categorical and biogeographic data only)

add_ACE_pies

Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = TRUE.

cex_pies

Numerical. To adjust the size of the ACE pies. Default = 0.5.

rate_type

A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).

keep_initial_colorbreaks

Logical. Whether to keep the same color breaks as used for the most recent focal time. Typically, the current time (t = 0). Default = FALSE.

add_regime_shifts

Logical. Whether to add the location of regime shifts on the phylogeny (Step 2). Default is TRUE.

configuration_type

A character string specifying how to select the location of regime shifts across posterior samples.

  • configuration_type = "MAP": Use the average locations recorded in posterior samples with the Maximum A Posteriori probability (MAP) configuration. This regime shift configuration is the most frequent configuration among the posterior samples (See BAMMtools::getBestShiftConfiguration()). This is the default option.

  • configuration_type = "MSC": Use the average locations recorded in posterior samples with the Maximum Shift Credibility (MSC) configuration. This regime shift configuration has the highest product of marginal probabilities across branches (See BAMMtools::maximumShiftCredibility()).

  • configuration_type = "index": Use the configuration of a unique posterior sample those index is provided in sample_index.

sample_index

Integer. Index of the posterior samples to use to plot the location of regime shifts. Used only if configuration_type = index. Default = 1.

adjust_size_to_prob

Logical. Whether to scale the size of the symbols showing the location of regime shifts according to the marginal shift probability of the shift happening on each location/branch. This will only works if there is an ⁠$MSP_tree⁠ element summarizing the marginal shift probabilities across branches in the BAMM_object. Default is TRUE.

regimes_fill

Character string. Set the color of the background of the symbols showing the location of regime shifts. Equivalent to the bg argument in BAMMtools::addBAMMshifts(). Default is "grey".

regimes_size

Numerical. Set the size of the symbols showing the location of regime shifts. Equivalent to the cex argument in BAMMtools::addBAMMshifts(). Default is 1.

regimes_pch

Integer. Set the shape of the symbols showing the location of regime shifts. Equivalent to the pch argument in BAMMtools::addBAMMshifts(). Default is 21.

regimes_border_col

Character string. Set the color of the border of the symbols showing the location of regime shifts. Equivalent to the col argument in BAMMtools::addBAMMshifts(). Default is "black".

regimes_border_width

Numerical. Set the width of the border of the symbols showing the location of regime shifts. Equivalent to the lwd argument in BAMMtools::addBAMMshifts(). Default is 1.

...

Additional graphical arguments to pass down to phytools::plot.contMap(), phytools::plot.simmap(), plot_densityMaps_overlay(), BAMMtools::plot.bammdata(), BAMMtools::addBAMMshifts(), and par().

display_plot

Logical. Whether to display the plot generated in the R console. Default is TRUE.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

Details

The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_over_time(). It provides information on results of a STRAPP run performed over multiple time-steps, and can also encompass updated phylogenies with mapped trait evolution and diversification rates and regimes shifts if appropriate arguments are set.

⁠$MAP_BAMM_object⁠ and ⁠$MSC_BAMM_object⁠ elements are required in the elements of ⁠$updated_BAMM_objects_over_time⁠ to plot regime shift locations following the "MAP" or "MSC" configuration_type respectively. A ⁠$MSP_tree⁠ element is required to scale the size of the symbols showing the location of regime shifts according marginal shift probabilities. (If adjust_size_to_prob = TRUE).

Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(), providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the unique time-step used for plotting.

For plotting a single time-step (i.e., ⁠$focal_time⁠) at once, see plot_traits_vs_rates_on_phylogeny_for_focal_time().

Value

If display_plot = TRUE, the function displays the successive plots with two facets in the R console:

If PDF_file_path is provided, the plots will be exported in a unique PDF file with one page per time-step.

Author(s)

Maël Doré

See Also

plot_contMap() phytools::plot.densityMap() plot_densityMaps_overlay() plot_BAMM_rates()

Function in deepSTRAPP needed to produce the deepSTRAPP_outputs as input: run_deepSTRAPP_over_time() Function in deepSTRAPP to a single time-step at once: plot_traits_vs_rates_on_phylogeny_for_focal_time()

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #

 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

  # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

 ## Run deepSTRAPP on net diversification rates
 Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
    contMap = Ponerinae_contMap,
    ace = Ponerinae_ACE,
    tip_data = Ponerinae_cont_tip_data,
    trait_data_type = "continuous",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly trait data output
 data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Plot updated contMap vs. updated diversification rates
 plot_traits_vs_rates_on_phylogeny_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    # To use the same color breaks as for t = 0 My in all BAMM plots
    keep_initial_colorbreaks = TRUE,
    color_scale = c("limegreen", "orange", "red"), # Adjust color scale on contMap
    legend = TRUE, labels = TRUE, # Show legend and label on BAMM plot
    cex = 0.7) # Adjust label size on contMap
    # PDF_file_path = "Updated_maps_cont_old_calib_0_40My.pdf")

 # ----- Example 2: Biogeographic data ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_binary_range_table, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare range data for Old World vs. New World

 # No overlap in ranges
 table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)

 Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
                                      nm = Ponerinae_binary_range_table$Taxa)
 Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
 Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
 Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
 names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
 table(Ponerinae_NO_data)

 colors_per_levels <- c("mediumpurple2", "peachpuff2")
 names(colors_per_levels) <- c("N", "O")

  # (May take several minutes to run)
 ## Run evolutionary models
 Ponerinae_biogeo_data <- prepare_trait_data(
    tip_data = Ponerinae_NO_data,
    trait_data_type = "biogeographic",
    phylo = Ponerinae_tree_old_calib,
    evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
    BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
    keep_BioGeoBEARS_files = FALSE,
    prefix_for_files = "Ponerinae_old_calib",
    max_range_size = 2,
    split_multi_area_ranges = TRUE, # Set to TRUE to use only unique areas "A" and "B"
    nb_simulations = 100, # Reduce to save time (Default = '1000')
    colors_per_levels = colors_per_levels,
    return_model_selection_df = TRUE,
    verbose = TRUE) 

 # Load directly output
 data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows from 0 to 40 Mya.
 time_range <- c(0, 40)
 time_step_duration <- 5

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates for time-steps = 0 to 40 Mya.
 Ponerinae_deepSTRAPP_biogeo_old_calib_0_40 <- run_deepSTRAPP_over_time(
    densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
    ace = Ponerinae_biogeo_data_old_calib$ace,
    tip_data = Ponerinae_ON_tip_data,
    trait_data_type = "biogeographic",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    time_range = time_range,
    time_step_duration = time_step_duration,
    seed = 1234, # Set seed for reproducibility
    alpha = 0.10, # Select a generous level of significance for the sake of the example
    rate_type = "net_diversification",
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly output
 data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Explore output
 str(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, max.level =  1)

 ## Plot updated contMap vs. updated diversification rates
 plot_traits_vs_rates_on_phylogeny_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_0_40,
    # Adjust colors on densityMaps
    colors_per_levels = c("N" = "dodgerblue2", "O" = "orange"),
    legend = TRUE, labels = TRUE, # Show legend and label on BAMM plot
    cex_pies = 0.2, # Adjust size of ACE pies on densityMaps
    cex = 0.7) # Adjust label size on contMap
    # PDF_file_path = "Updated_maps_biogeo_old_calib_0_40My.pdf")
}


Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow

Description

Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow to produce a BAMM_object that contains a phylogenetic tree and associated diversification rates mapped along branches, across selected posterior samples:

The BAMM_object output is typically used as input to run deepSTRAPP with run_deepSTRAPP_for_focal_time() or run_deepSTRAPP_over_time(). Diversification rates and regimes shift can be visualized with plot_BAMM_rates().

BAMM is a model of diversification for time-calibrated phylogenies that explores complex diversification dynamics by allowing multiple regime shifts across clades without a priori hypotheses on the location of such shifts. It uses reversible jump Markov chain Monte Carlo (rjMCMC) to automatically explore a vast range of models with different speciation and extinction rates, and different number and location of regime shits.

This function will work only if you have the BAMM C++ program installed in your machine. See the BAMM website: http://bamm-project.org/ and the companion R package {BAMMtools}.

Usage

prepare_diversification_data(
  BAMM_install_directory_path,
  phylo,
  prefix_for_files = NULL,
  seed = NULL,
  numberOfGenerations = 10^7,
  globalSamplingFraction = 1,
  sampleProbsFilename = NULL,
  expectedNumberOfShifts = NULL,
  eventDataWriteFreq = NULL,
  burn_in = 0.25,
  nb_posterior_samples = 1000,
  additional_BAMM_settings = list(),
  BAMM_output_directory_path = NULL,
  keep_BAMM_outputs = TRUE,
  MAP_odd_ratio_threshold = 5,
  skip_evaluations = FALSE,
  plot_evaluations = TRUE,
  save_evaluations = TRUE
)

Arguments

BAMM_install_directory_path

Character string. The path to the directory where BAMM is. Use '/' to separate directory and sub-directories. The path must end with '/'.

phylo

Time-calibrated phylogeny. Object of class "phylo" as defined in R package {ape}. The phylogeny must be rooted and fully resolved. BAMM does not currently work with fossils, so the tree must also be ultrametric.

prefix_for_files

Character string. Prefix to add to all BAMM files stored in the BAMM_output_directory_path if keep_BAMM_outputs = TRUE. Files will be exported such as 'prefix_*' with an underscore separating the prefix and the file name. Default is NULL (no prefix is added).

seed

Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).

numberOfGenerations

Integer. Number of steps in the MCMC run. It should be set high enough to reach the equilibrium distribution and allows posterior samples to be uncorrelated. Check the Effective Sample Size of parameters with coda::effectiveSize() in the Evaluation step. Default value is 10^7.

globalSamplingFraction

Numerical. Global sampling fraction representing the overall proportion of terminals in the phylogeny compared to the estimated overall richness in the clade. It acts as a multiplier on the rates needed to achieve such extant diversity. Default is 1.0 (assuming all taxa are in the phylogeny).

sampleProbsFilename

Character string. The path to the .txt file used to provide clade-specific sampling fractions. See BAMMtools::samplingProbs() to generate such file. If provided, globalSamplingFraction is ignored.

expectedNumberOfShifts

Integer. Set the expected number of regime shifts. It acts as an hyperparameter controlling the exponential prior distribution used to modulate reversible jumps across model configurations in the rjMCMC run. If set to NULL (default), an empirical rule will be used to define this value: 1 regime shift expected for every 100 tips in the phylogeny, with a minimum of 1. The best practice consists in trying several values and inspect the similarity of the prior and posterior distribution of the regime shift parameter. See BAMMtools::plotPrior() and the Evaluation step to produce such evaluation plot.

eventDataWriteFreq

Integer. Set the frequency in which to write the event data to the output file = the sampling frequency of posterior samples. If set to NULL (default), will set frequency such as 2000 posterior samples are recorded such as eventDataWriteFreq = numberOfGenerations / 2000.

burn_in

Numerical. Proportion of posterior samples removed from the BAMM output to ensure that the remaining samples where drawn once the equilibrium distribution was reached. This can be evaluated looking at the MCMC trace (see Evaluation step). Default is 0.25.

nb_posterior_samples

Numerical. Number of posterior samples to extract, after removing the burn-in, in the final BAMM_object to use for downstream analyses. Default = 1000.

additional_BAMM_settings

List of named elements. Additional settings options for BAMM provided as a list of named arguments. Ex: list(lambdaInit0 = 0.5, muInit0 = 0). See available settings in the template file provided within the deepSTRAPP package files as 'BAMM_template_diversification.txt'. The template can also be loaded directly in R with utils::data(BAMM_template_diversification) and displayed with print(BAMM_template_diversification).

BAMM_output_directory_path

Character string. The path to the directory used to store input/output files generated. Use '/' to separate directory and subdirectories. It must end with '/'.

keep_BAMM_outputs

Logical. Whether the BAMM_output_directory should be kept after the run. Default = TRUE.

MAP_odd_ratio_threshold

Numerical. Controls the definition of 'core-shifts' used to distinguish across configurations when fetching the MAP samples. Shifts that have an odd-ratio of marginal posterior probability / prior lower than MAP_odd_ratio_threshold are ignored. See BAMMtools::getBestShiftConfiguration(). Default = 5.

skip_evaluations

Logical. Whether to skip the Evaluation step including MCMC trace, ESS, and prior/posterior comparisons for expected number of shifts. Default = FALSE.

plot_evaluations

Logical. Whether to display the plots generated during the Evaluation step: MCMC trace, and prior/posterior comparisons for expected number of shifts. Default = TRUE.

save_evaluations

Logical. Whether to save the outputs of evaluations in a table (ESS), and PDFs (MCMC trace, and prior/posterior comparisons for expected number of shifts) in the BAMM_output_directory. Default = TRUE.

Details

This function runs a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow to produce a BAMM_object that contains a phylogenetic tree and associated diversification rates mapped along branches, across selected posterior samples.

Step 1: Set BAMM

Step 2: Run BAMM

Step 3: Evaluate BAMM

Step 4: Import BAMM outputs

Step 5: Clean BAMM files

The BAMM_object output:

Value

The function returns a BAMM_object of class "bammdata" which is a list with at least 22 elements.

Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():

BAMM internal elements used for tree exploration:

BAMM elements summarizing diversification data:

Additional elements providing key information for downstream analyses:

The function also produces files listed in the Details section and stored in the the BAMM_output_directory.

Note on diversification models for time-calibrated phylogenies

This function relies on BAMM to provide a reliable solution to map diversification rates and regime shifts on a time-calibrated phylogeny and obtain the BAMM_object object needed to run the deepSTRAPP workflow (run_deepSTRAPP_for_focal_time, run_deepSTRAPP_over_time). However, it is one option among others for modeling diversification on phylogenies. You may wish to explore alternatives models such as LSBDS model in RevBayes (Höhna et al., 2016), the MTBD model (Barido-Sottani et al., 2020), or the ClaDS2 model (Maliet et al., 2019) for your own data. However, you will need Bayesian models that infer regime shifts to be able to perform STRAPP tests (Rabosky & Huang, 2016). Additionally, you need to format the model output such as in BAMM_object, so it can be used in a deepSTRAPP workflow.

This function perform a single BAMM run to infer diversification rates and regime shifts. Due to the stochastic nature of the exploration of the parameter space with MCMC process, best practice recommend to ran multiple runs and check for convergence of the MCMC traces, ensuring that the region of high probability has been reached by your MCMC runs.

Author(s)

Maël Doré

References

For BAMM: Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PloS one, 9(2), e89543. doi:10.1371/journal.pone.0089543. Website: http://bamm-project.org/.

For {BAMMtools}: Rabosky, D. L., Grundler, M., Anderson, C., Title, P., Shi, J. J., Brown, J. W., ... & Larson, J. G. (2014). BAMM tools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5(7), 701-707. doi:10.1111/2041-210X.12199

See Also

run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time() update_rates_and_regimes_for_focal_time() prepare_trait_data() plot_BAMM_rates()

For a guided tutorial, see this vignette: vignette("model_diversification_dynamics", package = "deepSTRAPP")

Examples

# ----- Example 1: Whale phylogeny ----- #

library(phytools)
data(whale.tree)

## Not run: 
## You need to install the BAMM C++ software locally prior to run this function
# Visit the official BAMM website (\url{http://bamm-project.org/}) for information.

# Run BAMM workflow with deepSTRAPP
whale_BAMM_object <- prepare_diversification_data(
   BAMM_install_directory_path = "./software/bamm-2.5.0/",
   phylo = whale.tree,
   prefix_for_files = "whale",
   numberOfGenerations = 100000, # Set low for the example
   BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
   keep_BAMM_outputs = FALSE, # Adjust if needed
)
## End(Not run)

# Load directly the result
data(whale_BAMM_object)

# Explore output
str(whale_BAMM_object, 1)

# Plot mean net diversification rates and regime shifts on the phylogeny
plot_BAMM_rates(whale_BAMM_object, cex = 0.5,
                labels = TRUE, legend = TRUE)

# ----- Example 2: Ponerinae phylogeny ----- #

#   Load phylogeny
data("Ponerinae_tree", package = "deepSTRAPP")
plot(Ponerinae_tree, show.tip.label = FALSE)

## Not run: 
## You need to install the BAMM C++ software locally prior to run this function
# Visit the official BAMM website (http://bamm-project.org/) for information.

# Run BAMM workflow with deepSTRAPP
Ponerinae_BAMM_object <- prepare_diversification_data(
   BAMM_install_directory_path = "./software/bamm-2.5.0/",
   phylo = Ponerinae_tree,
   prefix_for_files = "Ponerinae",
   numberOfGenerations = 10^7, # Set high for optimal run, but will take ages
   BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
   keep_BAMM_outputs = FALSE, # Adjust if needed
)
## End(Not run)

if (deepSTRAPP::is_dev_version())
{
 # Load directly the result
 data(Ponerinae_BAMM_object)
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Explore output
 str(Ponerinae_BAMM_object, 1)

 # Plot mean net diversification rates and regime shifts on the phylogeny
 plot_BAMM_rates(Ponerinae_BAMM_object,
                 labels = FALSE, legend = TRUE)
}


Map trait evolution on a time-calibrated phylogeny

Description

Map trait evolution on a time-calibrated phylogeny in several steps:

Usage

prepare_trait_data(
  tip_data,
  trait_data_type,
  phylo,
  seed = NULL,
  evolutionary_models = NULL,
  Q_matrix = NULL,
  BioGeoBEARS_directory_path = NULL,
  keep_BioGeoBEARS_files = TRUE,
  prefix_for_files = NULL,
  nb_cores = 1,
  max_range_size = 2,
  split_multi_area_ranges = FALSE,
  ...,
  res = 100,
  nb_simulations = 1000,
  color_scale = NULL,
  colors_per_levels = NULL,
  plot_map = TRUE,
  plot_overlay = TRUE,
  add_ACE_pies = TRUE,
  PDF_file_path = NULL,
  return_ace = TRUE,
  return_BSM = FALSE,
  return_simmaps = FALSE,
  return_best_model_fit = FALSE,
  return_model_selection_df = FALSE,
  verbose = TRUE
)

Arguments

tip_data

Named numerical or character string vector of trait values/states/ranges at tips. Names should be ordered as the tip labels in the phylogeny found in phylo$tip.label. For biogeographic data, ranges should follow the coding scheme of BioGeoBEARS with a unique CAPITAL letter per unique areas (ex: A, B), combined to form multi-area ranges (Ex: AB). Alternatively, you can provide tip_data as a matrix or data.frame of binary presence/absence in each area (coded as unique CAPITAL letter). In this case, columns are unique areas, rows are taxa, and values are integer (0/1) signaling absence or presence of the taxa in the area.

trait_data_type

Character string. Type of trait data. Either: "continuous", "categorical" or "biogeographic".

phylo

Time-calibrated phylogeny. Object of class "phylo" as defined in {ape}. Tip labels (phylo$tip.label) should match names in tip_data.

seed

Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).

evolutionary_models

(Vector of) character string(s). To provide the set of evolutionary models to fit on the data.

  • Models available for continuous data are detailed in geiger::fitContinuous(). Default is "BM".

  • Models available for categorical data are detailed in geiger::fitDiscrete(). Default is "ARD".

  • Models for biogeographic data are fit with R package BioGeoBEARS using BioGeoBEARS::bears_optim_run(). Default is "DEC".

  • See list in "Details" section.

Q_matrix

Custom Q-matrix for categorical data representing transition classes between states. Transitions with similar integers are estimated with a shared rate parameter. Transitions with 0 represent rates that are fixed to zero (i.e., impossible transitions). Diagonal must be populated with NA. row.names(Q_matrix) and col.names(Q_matrix) are the states. Provide "matrix" among the model listed in 'evolutionary_models' to use the custom Q-matrix for modeling. Only for categorical data.

BioGeoBEARS_directory_path

Character string. The path to the directory used to store input/output files generated for/by BioGeoBEARS during biogeographic historical inferences. Use '/' to separate directory and sub-directories. It must end with '/'. Only for biogeographic data.

keep_BioGeoBEARS_files

Logical. Whether the BioGeoBEARS_directory and its content should be kept after the run. Default = TRUE. Only for biogeographic data.

prefix_for_files

Character string. Prefix to add to all BioGeoBEARS files stored in the BioGeoBEARS_directory_path if keep_BioGeoBEARS_files = TRUE. Files will be exported such as 'prefix_*' with an underscore separating the prefix and the file name. Default is NULL (no prefix is added). Only for biogeographic data.

nb_cores

Interger. Number of cores to use for parallel computation during BioGeoBEARS runs. Default = 1. Only for biogeographic data.

max_range_size

Integer. Maximum number of unique areas encompassed by multi-range areas. Default = 2. Only for biogeographic data.

split_multi_area_ranges

Logical. Whether to split multi-area ranges across unique areas when mapping ranges. Ex: For range EW, posterior probabilities will be split equally between Eastern Palearctic (E) and Western Palearctic (W). Default = FALSE. Only for biogeographic data.

...

Additional arguments to be passed down to the functions used to fit models (See evolutionary_models) and produce simmaps with phytools::make.simmap() or BioGeoBEARS::runBSM().

res

Integer. Define the number of time steps used to interpolate/estimate trait value/state/range in contMap/densityMaps. Default = 100.

nb_simulations

Integer. Define the number of simulations generated for stochastic mapping. Default = 1000. Only for "categorical" and "biogeographic" data.

color_scale

Vector of character string. List of colors to use to build the color scale with grDevices::colorRampPalette() showing the evolution of a continuous trait on the contMap. From lowest values to highest values. Only for continuous data. Default = NULL will use the rainbow() color palette.

colors_per_levels

Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors. If NULL (default), the rainbow() color scale will be used for categorical trait; the BioGeoBEARS::get_colors_for_states_list_0based() will be used for biogeographic ranges. If no color is provided for multi-area ranges, they will be interpolated based on the colors provided for unique areas. Only for categorical and biogeographic data.

plot_map

Logical. Whether to plot or not the phylogeny with mapped trait evolution. Default = TRUE.

plot_overlay

Logical. If TRUE (default), plot a unique densityMap with overlapping states/ranges using transparency. If FALSE, plot a densityMap per state/range. Only for "categorical" and "biogeographic" data.

add_ACE_pies

Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = TRUE. Only for categorical and biogeographic data.

PDF_file_path

Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf".

return_ace

Logical. Whether the named vector of ancestral characters estimates (ACE) at internal nodes should be returned in the output. Default = TRUE.

return_BSM

Logical. (Only for Biogeographic data) Whether the summary tables of anagenetic and cladogenetic events generated during the Biogeographic Stochastic Mapping (BSM) process should be returned in the output. Default = FALSE.

return_simmaps

Logical. Whether the evolutionary histories simulated during stochastic mapping (i.e., simmaps) should be returned in the output. Default = TRUE. Only for "categorical" and "biogeographic" data.

return_best_model_fit

Logical. Whether to include the output of the best fitting model in the function output. Default = FALSE.

return_model_selection_df

Logical. Whether to include the data.frame summarizing model comparisons used to select the best fitting model should be returned in the output. Default = FALSE.

verbose

Logical. Should progression be displayed? A message will be printed for every steps in the process. Default is TRUE.

Details

Map trait evolution on a time-calibrated phylogeny in several steps:

Step 1: Models are fit using Maximum Likelihood approach:

Step 2: Best model is identified among the list of evolutionary_models by comparing the corrected AIC (AICc) and selecting the model with lowest AICc.

Step 3: For continuous traits: Ancestral characters estimates (ACE) are inferred with phytools::fastAnc on a tree with modified branch lengths scaled to reflect the evolutionary rates estimated from the best model using phytools::rescale().

Step 4: Stochastic Mapping.

For categorical and biogeographic data, stochastic mapping simulations are performed to generate evolutionary histories compatible with the best model and inferred ACE. Node states/ranges are drawn from the scaled marginal likelihoods of ACE, and states/ranges shifts along branches are simulated according to the transition matrix Q estimated from the best fitting model.

Step 5: Infer ancestral states along branches.

Value

The function returns a list with at least two elements.

If return_ace = TRUE,

If return_BSM = TRUE, (Only for biogeographic data)

If return_simmaps = TRUE, (Only for categorical and biogeographic data)

If return_best_model_fit = TRUE,

If model_selection_df = TRUE,

For biogeographic data, the function also produce input and output files associated with BioGeoBEARS and stored in the directory specified in BioGeoBEARS_directory_path. The directory and its content are kept if keep_BioGeoBEARS_files = TRUE

Note on macroevolutionary models of trait evolution

This function provides an easy solution to map trait evolution on a time-calibrated phylogeny and obtain the contMap/densityMaps objects needed to run the deepSTRAPP workflow (run_deepSTRAPP_for_focal_time, run_deepSTRAPP_over_time). However, it does not explore the most complex options for trait evolution. You may need to explore more complex models to capture the dynamics of trait evolution. such as trait-dependent multi-rate models (phytools::brownie.lite(), OUwie::OUwie), Bayesian MCMC implementations allowing a thorough exploration of location and number of regime shifts (Ex: BayesTraits, RevBayes), or RRphylo for a penalized phylogenetic ridge regression approach that allows regime shifts across all branches.

Note on macroevolutionary models of biogeographic history

This function provides an easy solution to infer ancestral geographic ranges using the BioGeoBEARS framework. It allows to directly compare model fits across 6 models: DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J. It uses a step-wise approach by using MLE estimates of previous runs as staring parameter values when increasing complexity (adding +J parameters). However, it does not explore the most complex options for historical biogeography. You may need to explore more complex models to capture the dynamics of range evolution such as time-stratification with adjacency matrices, dispersal multipliers (+W), distance-based dispersal probabilities (+X), or other features. See for instance, http://phylo.wikidot.com/biogeobears).

The R package BioGeoBEARS is needed for this function to work with biogeographic data. Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.

Author(s)

Maël Doré

References

For macroevolutionary models in geiger: Pennell, M. W., Eastman, J. M., Slater, G. J., Brown, J. W., Uyeda, J. C., FitzJohn, R. G., ... & Harmon, L. J. (2014). geiger v2. 0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics, 30(15), 2216-2218. doi:10.1093/bioinformatics/btu181.

For BioGeoBEARS: Matzke, Nicholas J. (2018). BioGeoBEARS: BioGeography with Bayesian (and likelihood) Evolutionary Analysis with R Scripts. version 1.1.1, published on GitHub on November 6, 2018. doi:10.5281/zenodo.1478250. Website: http://phylo.wikidot.com/biogeobears.

See Also

geiger::fitContinuous() geiger::fitDiscrete() phytools::contMap() phytools::densityMap()

For a guided tutorial, see the associated vignettes:

Examples

# ----- Example 1: Continuous data ----- #

## Load phylogeny and tip data
# Load eel phylogeny and tip data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")
data("eel.data", package = "phytools")

# Extract body size
eel_data <- stats::setNames(eel.data$Max_TL_cm,
                            rownames(eel.data))

 # (May take several minutes to run)
## Map trait evolution on the phylogeny
mapped_cont_traits <- prepare_trait_data(
   tip_data = eel_data,
   trait_data_type = "continuous",
   phylo = eel.tree,
   evolutionary_models = c("BM", "OU", "lambda", "kappa"),
   # Example of an additional argument ('control') that can be provided to geiger::fitContinuous()
   control = list(niter = 200),
   color_scale = c("darkgreen", "limegreen", "orange", "red"),
   plot_map = FALSE,
   return_best_model_fit = TRUE,
   return_model_selection_df = TRUE,
   verbose = TRUE)

## Explore output
plot_contMap(mapped_cont_traits$contMap) # contMap with interpolated trait values
mapped_cont_traits$model_selection_df # Summary of model selection
# Parameter estimates and optimization summary of the best model
# (Here, the best model is Pagel's lambda)
mapped_cont_traits$best_model_fit$opt
mapped_cont_traits$ace # Ancestral character estimates at internal nodes 

# ----- Example 2: Categorical data ----- #

## Load phylogeny and tip data
# Load eel phylogeny and tip data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")
data("eel.data", package = "phytools")

# Transform feeding mode data into a 3-level factor
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
eel_data <- as.character(eel_data)
eel_data[c(1, 5, 6, 7, 10, 11, 15, 16, 17, 24, 25, 28, 30, 51, 52, 53, 55, 58, 60)] <- "kiss"
eel_data <- stats::setNames(eel_data, rownames(eel.data))
table(eel_data)

# Manually define a Q_matrix for rate classes of state transition to use in the 'matrix' model
# Does not allow transitions from state 1 ("bite") to state 2 ("kiss") or state 3 ("suction")
# Does not allow transitions from state 3 ("suction") to state 1 ("bite")
# Set symmetrical rates between state 2 ("kiss") and state 3 ("suction")
Q_matrix = rbind(c(NA, 0, 0), c(1, NA, 2), c(0, 2, NA))

# Set colors per states
colors_per_states <- c("limegreen", "orange", "dodgerblue")
names(colors_per_states) <- c("bite", "kiss", "suction")

 # (May take several minutes to run)
## Run evolutionary models
eel_cat_3lvl_data <- prepare_trait_data(tip_data = eel_data, phylo = eel.tree,
    trait_data_type = "categorical",
    colors_per_levels = colors_per_states,
    evolutionary_models = c("ER", "SYM", "ARD", "meristic", "matrix"),
    Q_matrix = Q_matrix,
    nb_simulations = 1000,
    plot_map = TRUE,
    plot_overlay = TRUE,
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE) 

# Load directly output
data(eel_cat_3lvl_data, package = "deepSTRAPP")

## Explore output
plot(eel_cat_3lvl_data$densityMaps[[1]]) # densityMap for state n°1 ("bite")
eel_cat_3lvl_data$model_selection_df # Summary of model selection
# Parameter estimates and optimization summary of the best model
# (Here, the best model is ER)
print(eel_cat_3lvl_data$best_model_fit)$ # Summary of the best evolutionary model
eel_cat_3lvl_data$ace # Posterior probabilities of each state (= ACE) at internal nodes


# ----- Example 3: Biogeographic data ----- #

if (deepSTRAPP::is_dev_version())
{
 ## The R package 'BioGeoBEARS' is needed for this function to work with biogeographic data.
 # Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.

 ## Load phylogeny and tip data
 # Load eel phylogeny and tip data from the R package phytools
 # Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
 data("eel.tree", package = "phytools")
 data("eel.data", package = "phytools")

 # Transform feeding mode data into biogeographic data with ranges A, B, and AB.
 eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
 eel_data <- as.character(eel_data)
 eel_data[eel_data == "bite"] <- "A"
 eel_data[eel_data == "suction"] <- "B"
 eel_data[c(5, 6, 7, 15, 25, 32, 33, 34, 50, 52, 57, 58, 59)] <- "AB"
 eel_data <- stats::setNames(eel_data, rownames(eel.data))
 table(eel_data)

 colors_per_ranges <- c("dodgerblue3", "gold")
 names(colors_per_ranges) <- c("A", "B")

  # (May take several minutes to run)
 ## Run evolutionary models
 eel_biogeo_data <- prepare_trait_data(
    tip_data = eel_data,
    trait_data_type = "biogeographic",
    phylo = eel.tree,
    # Default = "DEC" for biogeographic
    evolutionary_models = c("BAYAREALIKE", "DIVALIKE", "DEC",
                            "BAYAREALIKE+J", "DIVALIKE+J", "DEC+J"),
    BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
    keep_BioGeoBEARS_files = FALSE,
    prefix_for_files = "eel",
    max_range_size = 2,
    split_multi_area_ranges = TRUE, # Set to TRUE to display the two outputs
    # Reduce the number of Stochastic Mapping simulations to save time (Default = '1000')
    nb_simulations = 100,
    colors_per_levels = colors_per_ranges,
    return_simmaps = TRUE,
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    verbose = TRUE) 

 # Load directly output
 data(eel_biogeo_data, package = "deepSTRAPP")

 ## Explore output
 str(eel_biogeo_data, 1)
 eel_biogeo_data$model_selection_df # Summary of model selection
 # Parameter estimates and optimization summary of the best model
 # (Here, the best model is DEC+J)
 eel_biogeo_data$best_model_fit$optim_result

 # Posterior probabilities of each state (= ACE) at internal nodes
 eel_biogeo_data$ace # Only with unique areas
 eel_biogeo_data$ace_all_ranges # Including multi-area ranges (Here, AB)

 ## Plot densityMaps
 # densityMap for range n°1 ("A")
 plot(eel_biogeo_data$densityMaps[[1]])
 # densityMaps with all unique areas overlaid
 plot_densityMaps_overlay(eel_biogeo_data$densityMaps)
 # densityMaps with all ranges (including multi-area ranges) overlaid
 plot_densityMaps_overlay(eel_biogeo_data$densityMaps_all_ranges)
}


Run deepSTRAPP to test for a relationship between diversification rates and trait data at a given focal time

Description

Wrapper function to run deepSTRAPP workflow for a given point in the past (i.e. the focal_time). It starts from traits mapped on a phylogeny (trait data) and BAMM output (diversification data) and carries out the appropriate statistical method to test for a relationship between diversification rates and trait data. Tests are based on block-permutations: rates data are randomized across tips following blocks defined by the diversification regimes identified on each tip (typically from a BAMM).

Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.

See the original BAMMtools::traitDependentBAMM() function used to carry out STRAPP test on extant time-calibrated phylogenies.

Tests can be carried out on speciation, extinction and net diversification rates.

Usage

run_deepSTRAPP_for_focal_time(
  contMap = NULL,
  densityMaps = NULL,
  ace = NULL,
  tip_data = NULL,
  trait_data_type,
  BAMM_object,
  focal_time,
  keep_tip_labels = TRUE,
  rate_type = "net_diversification",
  seed = NULL,
  nb_permutations = NULL,
  replace_samples = FALSE,
  alpha = 0.05,
  two_tailed = TRUE,
  one_tailed_hypothesis = NULL,
  posthoc_pairwise_tests = FALSE,
  p.adjust_method = "none",
  return_perm_data = FALSE,
  nthreads = 1,
  print_hypothesis = TRUE,
  extract_diversification_data_melted_df = FALSE,
  return_updated_trait_data_with_Map = FALSE,
  return_updated_BAMM_object = FALSE,
  verbose = TRUE
)

Arguments

contMap

For continuous trait data. Object of class "contMap", typically generated with prepare_trait_data() or phytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

densityMaps

For categorical trait or biogeographic data. List of objects of class "densityMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given state/range along branches. Each object (i.e., densityMap) corresponds to a state/range. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

ace

(Optional) Ancestral Character Estimates (ACE) at the internal nodes. Obtained with prepare_trait_data() as output in the ⁠$ace⁠ slot.

  • For continuous trait data: Named numerical vector typically generated with phytools::fastAnc(), phytools::anc.ML(), or ape::ace(). Names are nodes_ID of the internal nodes. Values are ACE of the trait.

  • For categorical trait or biogeographic data: Matrix that record the posterior probabilities of ancestral states/ranges. Rows are internal nodes_ID. Columns are states/ranges. Values are posterior probabilities of each state per node. Needed in all cases to provide accurate estimates of trait values.

tip_data

(Optional) Named vector of tip values of the trait.

  • For continuous trait data: Named numerical vector of trait values.

  • For categorical trait or biogeographic data: Character string vector of states/ranges Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.

  • For biogeographic data, ranges should follow the coding scheme of BioGeoBEARS with a unique CAPITAL letter per unique areas (ex: A, B), combined to form multi-area ranges (Ex: AB). Alternatively, you can provide tip_data as a matrix or data.frame of binary presence/absence in each area (coded as unique CAPITAL letter). In this case, columns are unique areas, rows are taxa, and values are integer (0/1) signaling absence or presence of the taxa in the area.

trait_data_type

Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic".

BAMM_object

Object of class "bammdata", typically generated with prepare_diversification_data(), that contains a phylogenetic tree and associated diversification rate mapping across selected posterior samples. The phylogenetic tree must the same as the one associated with the contMap/densityMaps, ace and tip_data.

focal_time

Numerical. The time, in terms of time distance from the present, at which data must be extracted and the phylogeny and mappings must be cut. It must be smaller than the root age of the phylogeny.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label on the updated phylogeny. Default is TRUE.

rate_type

A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).

seed

Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).

nb_permutations

Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once.

replace_samples

Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample when drawing samples used to carry out the STRAPP test. Default is FALSE.

alpha

Numerical. Significance level to use to compute the estimate corresponding to the values of the test statistic used to assess significance of the test. This does NOT affect p-values. Default is 0.05.

two_tailed

Logical. To define the type of tests. If TRUE (default), tests for correlations/differences in rates will be carried out with a null hypothesis that rates are not correlated with trait values (continuous data) or equals between trait states (categorical and biogeographic data). If FALSE, one-tailed tests are carried out.

  • For continuous data, it involves defining a one_tailed_hypothesis testing for either a "positive" or "negative" correlation under the alternative hypothesis.

  • For binary data (two states), it involves defining a one_tailed_hypothesis indicating which states have higher rates under the alternative hypothesis.

  • For multinominal data (more than two states), it defines the type of post hoc pairwise tests to carry out between pairs of states. If posthoc_pairwise_tests = TRUE, all two-tailed (if two_tailed = TRUE) or one-tailed (if two_tailed = FALSE) tests are automatically carried out.

one_tailed_hypothesis

A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than such as c('A > B').

posthoc_pairwise_tests

Logical. Only for multinominal data (with more than two states). If TRUE, all possible post hoc pairwise (Dunn) tests will be computed across all pairs of states. This is a way to detect which pairs of states have significant differences in rates if the overall test (Kruskal-Wallis) is significant. Default is FALSE.

p.adjust_method

A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values in the post hoc pairwise tests to account for multiple comparisons. See stats::p.adjust() for the available methods. Default is none.

return_perm_data

Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output. This is needed to plot the histogram of the null distribution used to assess significance of the test with plot_histogram_STRAPP_test_for_focal_time(). Default is FALSE.

nthreads

Integer. Number of threads to use for paralleled computing of the STRAPP tests across the permutations. The R package parallel must be loaded for nthreads > 1. Default is 1.

print_hypothesis

Logical. Whether to print information on what test is carried out, detailing the null and alternative hypotheses, and what significant level is used to rejected or not the null hypothesis. Default is TRUE.

extract_diversification_data_melted_df

Logical. Specify whether diversification data (regimes ID and tip rates) must be extracted from the updated_BAMM_object and returned in a melted data.frame. Default is FALSE.

return_updated_trait_data_with_Map

Logical. Specify whether the trait_data extracted for the given focal_time and the updated version of mapped phylogeny (contMap/densityMaps) provided as input should be returned among the outputs. The updated contMap/densityMaps consists in cutting off branches and mapping that are younger than the focal_time. Default is FALSE.

return_updated_BAMM_object

Logical. Specify whether the updated_BAMM_object with phylogeny and mapped diversification rates cut-off at the focal_time should be returned among the outputs.

verbose

Logical. Should progression be displayed? A message will be printed at each stepof the deepSTRAPP workflow, and for every batch of 100 BAMM posterior samples whose rates are regimes are updated, and optionally extracted in a melted data.frame (if extract_diversification_data_melted_df = TRUE). Default is TRUE.

Details

The function encapsulates several functions carrying out each step of the deepSTRAPP workflow:

Extract trait data

extract_most_likely_trait_values_for_focal_time() extracts the most likely trait values found along branches at the focal_time. Optionally, the function can update the mapped phylogeny (contMap/densityMaps) such as branches overlapping the focal_time are shorten to the focal_time, and the trait mapping for the cut off branches are removed by updating the ⁠$tree$maps⁠ and ⁠$tree$mapped.edge⁠ elements.

Extract diversification data

update_rates_and_regimes_for_focal_time() updates the BAMM_object to obtain the diversification rates/regimes found along branches the focal_time. Optionally, the function can update the BAMM_object to display a mapped phylogeny such as branches overlapping the focal_time are shorten to the focal_time

Extract diversification data in a melted df

If requested (extract_diversification_data_melted_df = TRUE), extract_diversification_data_melted_df_for_focal_time() will be used to extract regimes ID and tip rates from the updated_BAMM_object and provide a melted data.frame summarizing the diversification data as found on the phylogeny for the focal_time.

Compute STRAPP test

compute_STRAPP_test_for_focal_time() carries out the appropriate statistical method to test for a relationship between diversification rates and trait data for a given point in the past (i.e. the focal_time). It can handle three types of statistical tests depending on the type of trait data provided:

Value

The function returns a list with at least two elements.

Optional formatted output:

Optional data updated for the focal_time:

Author(s)

Maël Doré

See Also

extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time() extract_diversification_data_melted_df_for_focal_time() compute_STRAPP_test_for_focal_time()

For a guided tutorial on complete deepSTRAPP workflow, see the associated vignettes:

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #
 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

  # (May take several minutes to run)
 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
             color_scale = color_scale)

 ## Set focal time to 10 Mya
 focal_time <- 10

 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 deepSTRAPP_output <- run_deepSTRAPP_for_focal_time(
   contMap = Ponerinae_contMap,
   ace = Ponerinae_ACE,
   tip_data = Ponerinae_cont_tip_data,
   trait_data_type = "continuous",
   BAMM_object = Ponerinae_BAMM_object_old_calib,
   focal_time = focal_time,
   rate_type = "net_diversification",
   return_perm_data = TRUE,
   extract_diversification_data_melted_df = TRUE,
   return_updated_trait_data_with_Map = TRUE,
   return_updated_BAMM_object = TRUE)

 ## Explore output
 str(deepSTRAPP_output, max.level = 1)

 # Access deepSTRAPP results
 str(deepSTRAPP_output$STRAPP_results)

 # Access trait data
 head(deepSTRAPP_output$updated_trait_data_with_Map$trait_data)

 # Access the diversification data in a melted data.frame
 head(deepSTRAPP_output$diversification_data_df)

 # Plot rates vs. trait values across branches
 plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_output)

 # Plot updated contMap
 plot_contMap(deepSTRAPP_output$updated_trait_data_with_Map$contMap)
 ape::nodelabels(text =
   deepSTRAPP_output$updated_trait_data_with_Map$contMap$tree$initial_nodes_ID)

 # Plot diversification rates on updated phylogeny
 plot_BAMM_rates(deepSTRAPP_output$updated_BAMM_object, labels = TRUE)

 # Plot histogram of test stats
 plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_output) 

 # ----- Example 2: Categorical trait ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract categorical data with 3-levels
 Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
                                         nm = Ponerinae_trait_tip_data$Taxa)
 table(Ponerinae_cat_3lvl_tip_data)

 # Select color scheme for states
 colors_per_states <- c("forestgreen", "sienna", "goldenrod")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # (May take several minutes to run)
 ## Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
 Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
    tip_data = Ponerinae_cat_3lvl_tip_data,
    phylo = Ponerinae_tree_old_calib,
    trait_data_type = "categorical",
    colors_per_levels = colors_per_states,
    evolutionary_models = "ARD", # Use default ARD model
    nb_simulations = 100, # Reduce number of simulations to save time
    seed = 1234, # Seet seed for reproducibility
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    plot_map = FALSE) 

 # Load directly output
 data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")

 ## Set focal time to 10 Mya
 focal_time <- 10

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 deepSTRAPP_output <- run_deepSTRAPP_for_focal_time(
     densityMaps = Ponerinae_cat_data_old_calib$densityMaps,
     ace = Ponerinae_cat_data_old_calib$ace,
     tip_data = Ponerinae_cat_3lvl_tip_data,
     trait_data_type = "categorical",
     BAMM_object = Ponerinae_BAMM_object_old_calib,
     focal_time = focal_time,
     rate_type = "net_diversification",
     posthoc_pairwise_tests = TRUE,
     return_perm_data = TRUE,
     extract_diversification_data_melted_df = TRUE,
     return_updated_trait_data_with_Map = TRUE,
     return_updated_BAMM_object = TRUE)

 ## Explore output
 str(deepSTRAPP_output, max.level = 1)

 # Access deepSTRAPP results
 str(deepSTRAPP_output$STRAPP_results, max.level = 2)
 # Result for overall Kruskal-Wallis test
 deepSTRAPP_output$STRAPP_results[1:3]
 # Results for posthoc pairwise Dunn's tests
 deepSTRAPP_output$STRAPP_results$posthoc_pairwise_tests$summary_df

 # Access trait data
 head(deepSTRAPP_output$updated_trait_data_with_Map$trait_data)

 # Access the diversification data in a melted data.frame
 head(deepSTRAPP_output$diversification_data_df)

 # Plot rates vs. states across branches
 plot_rates_vs_trait_data_for_focal_time(
     deepSTRAPP_outputs = deepSTRAPP_output,
     colors_per_levels = colors_per_states)

 # Plot updated densityMaps cut at focal time
 plot_densityMaps_overlay(deepSTRAPP_output$updated_trait_data_with_Map$densityMaps)

 # Plot diversification rates on updated phylogeny
 plot_BAMM_rates(BAMM_object = deepSTRAPP_output$updated_BAMM_object, legend = TRUE, labels = FALSE,
    colorbreaks = deepSTRAPP_output$updated_BAMM_object$initial_colorbreaks$net_diversification)

 # Plot histogram of Kruskal-Wallis overall test stats
 plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_output)

 # Plot histograms of posthoc pairwise Dunn's test stats
 plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_output,
    plot_posthoc_tests = TRUE) 

 # ----- Example 3: Biogeographic ranges ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_binary_range_table, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare range data for Old World vs. New World

 # No overlap in ranges
 table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)

 Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
                                      nm = Ponerinae_binary_range_table$Taxa)
 Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
 Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
 Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
 names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
 table(Ponerinae_NO_data)

 colors_per_ranges <- c("mediumpurple2", "peachpuff2")
 names(colors_per_ranges) <- c("N", "O")

  # (May take several minutes to run)
 ## Run evolutionary models
 Ponerinae_biogeo_data <- prepare_trait_data(
     tip_data = Ponerinae_NO_data,
     trait_data_type = "biogeographic",
     phylo = Ponerinae_tree_old_calib,
     evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
     BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
     keep_BioGeoBEARS_files = FALSE,
     prefix_for_files = "Ponerinae_old_calib",
     max_range_size = 2,
     split_multi_area_ranges = TRUE, # Set to TRUE to display the two outputs
     nb_simulations = 100, # Reduce to save time (Default = '1000')
     colors_per_levels = colors_per_ranges,
     return_model_selection_df = TRUE,
     verbose = TRUE) 

 # Load directly output
 data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")

 ## Explore output
 str(Ponerinae_biogeo_data_old_calib, 1)

 ## Set focal time to 10 Mya
 focal_time <- 10

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.

 deepSTRAPP_output <- run_deepSTRAPP_for_focal_time(
    densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
    ace = Ponerinae_biogeo_data_old_calib$ace,
    tip_data = Ponerinae_NO_data,
    trait_data_type = "biogeographic",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    focal_time = focal_time,
    rate_type = "net_diversification",
    return_perm_data = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE)

 ## Explore output
 str(deepSTRAPP_output, max.level = 1)

 # Access deepSTRAPP results
 str(deepSTRAPP_output$STRAPP_results, max.level = 2)
 # Result for Mann-Whitney-Wilcoxon test
 deepSTRAPP_output$STRAPP_results[1:3]

 # Access trait data
 head(deepSTRAPP_output$updated_trait_data_with_Map$trait_data)

 # Access the diversification data in a melted data.frame
 head(deepSTRAPP_output$diversification_data_df)

 # Plot rates vs. ranges across branches
 plot_rates_vs_trait_data_for_focal_time(
     deepSTRAPP_outputs = deepSTRAPP_output,
     colors_per_levels = colors_per_ranges)

 # Plot updated densityMaps cut at focal time
 plot_densityMaps_overlay(deepSTRAPP_output$updated_trait_data_with_Map$densityMaps)

 # Plot diversification rates on updated phylogeny
 plot_BAMM_rates(BAMM_object = deepSTRAPP_output$updated_BAMM_object, legend = TRUE, labels = FALSE,
    colorbreaks = deepSTRAPP_output$updated_BAMM_object$initial_colorbreaks$net_diversification)

 # Plot histogram of Mann-Whitney-Wilcoxon test
 plot_histogram_STRAPP_test_for_focal_time(
    STRAPP_results = deepSTRAPP_output$STRAPP_results) 
}


Run deepSTRAPP to test for a relationship between diversification rates and trait data over multiple time steps

Description

Wrapper function to run deepSTRAPP workflows over multiple time steps in the past. It starts from traits mapped on a phylogeny (trait data) and BAMM output (diversification data) and carries out the appropriate statistical method to test for a relationship between diversification rates and trait data. The workflow is repeated over multiple points in time (i.e. the time_steps) and results are summarized in a data.frame. The function can also provide summaries of trait values and diversification rates extracted along branches over the different time_steps.

Statistical tests are based on block-permutations: rates data are randomized across tips following blocks defined by the diversification regimes identified on each tip (typically from a BAMM). Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.

See the original BAMMtools::traitDependentBAMM() function used to carry out STRAPP test on extant time-calibrated phylogenies.

Tests can be carried out on speciation, extinction and net diversification rates.

Usage

run_deepSTRAPP_over_time(
  contMap = NULL,
  densityMaps = NULL,
  ace = NULL,
  tip_data = NULL,
  trait_data_type,
  BAMM_object,
  time_steps = NULL,
  time_range = NULL,
  nb_time_steps = NULL,
  time_step_duration = NULL,
  keep_tip_labels = TRUE,
  rate_type = "net_diversification",
  seed = NULL,
  nb_permutations = NULL,
  replace_samples = FALSE,
  alpha = 0.05,
  two_tailed = TRUE,
  one_tailed_hypothesis = NULL,
  posthoc_pairwise_tests = FALSE,
  p.adjust_method = "none",
  return_perm_data = FALSE,
  nthreads = 1,
  print_hypothesis = TRUE,
  extract_trait_data_melted_df = FALSE,
  extract_diversification_data_melted_df = FALSE,
  return_STRAPP_results = FALSE,
  return_updated_trait_data_with_Map = FALSE,
  return_updated_BAMM_object = FALSE,
  verbose = TRUE,
  verbose_extended = FALSE
)

Arguments

contMap

For continuous trait data. Object of class "contMap", typically generated with prepare_trait_data() or phytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

densityMaps

For categorical trait or biogeographic data. List of objects of class "densityMap", typically generated with prepare_trait_data(), that contains a phylogenetic tree and associated posterior probability of being in a given state/range along branches. Each object (i.e., densityMap) corresponds to a state/range. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

ace

(Optional) Ancestral Character Estimates (ACE) at the internal nodes. Obtained with prepare_trait_data() as output in the ⁠$ace⁠ slot.

  • For continuous trait data: Named numerical vector typically generated with phytools::fastAnc(), phytools::anc.ML(), or ape::ace(). Names are nodes_ID of the internal nodes. Values are ACE of the trait.

  • For categorical trait or biogeographic data: Matrix that record the posterior probabilities of ancestral states/ranges. Rows are internal nodes_ID. Columns are states/ranges. Values are posterior probabilities of each state per node. Needed in all cases to provide accurate estimates of trait values.

tip_data

(Optional) Named vector of tip values of the trait.

  • For continuous trait data: Named numerical vector of trait values.

  • For categorical trait or biogeographic data: Character string vector of states/ranges Names are nodes_ID of the internal nodes. Needed to provide accurate tip values.

  • For biogeographic data, ranges should follow the coding scheme of BioGeoBEARS with a unique CAPITAL letter per unique areas (ex: A, B), combined to form multi-area ranges (Ex: AB). Alternatively, you can provide tip_data as a matrix or data.frame of binary presence/absence in each area (coded as unique CAPITAL letter). In this case, columns are unique areas, rows are taxa, and values are integer (0/1) signaling absence or presence of the taxa in the area.

trait_data_type

Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic".

BAMM_object

Object of class "bammdata", typically generated with prepare_diversification_data(), that contains a phylogenetic tree and associated diversification rate mapping across selected posterior samples. The phylogenetic tree must the same as the one associated with the contMap, ace and tip_data.

time_steps

Numerical vector. Time steps at which the STRAPP tests should be carried out. If NULL (the default), time_steps will be generated from a combination of two arguments among time_range, nb_time_steps, and/or time_step_duration.

time_range

Vector of two numerical values. Time boundaries within with the time_steps must be defined if not provided. If NULL (the default), and time_range is needed to generate the time_steps, the depth of the tree is used by default: c(0, root_age). However, no time step will be generated for the 'root_age'.

nb_time_steps, time_step_duration

Numerical. Number of time steps and duration of each time step used to generate time_steps if not provided. You must provide at least one of those two arguments to be able to generate time_steps.

keep_tip_labels

Logical. Specify whether terminal branches with a single descendant tip must retained their initial tip.label on the updated phylogeny. Default is TRUE.

rate_type

A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default).

seed

Integer. Set the seed to ensure reproducibility. Default is NULL (a random seed is used).

nb_permutations

Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once.

replace_samples

Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample when drawing samples used to carry out the STRAPP test. Default is FALSE.

alpha

Numerical. Significance level to use to compute the estimate corresponding to the values of the test statistic used to assess significance of the test. This does NOT affect p-values. Default is 0.05.

two_tailed

Logical. To define the type of tests. If TRUE (default), tests for correlations/differences in rates will be carried out with a null hypothesis that rates are not correlated with trait values (continuous data) or equals between trait states (categorical and biogeographic data). If FALSE, one-tailed tests are carried out.

  • For continuous data, it involves defining a one_tailed_hypothesis testing for either a "positive" or "negative" correlation under the alternative hypothesis.

  • For binary data (two states), it involves defining a one_tailed_hypothesis indicating which states have higher rates under the alternative hypothesis.

  • For multinominal data (more than two states), it defines the type of post hoc pairwise tests to carry out between pairs of states. If posthoc_pairwise_tests = TRUE, all two-tailed (if two_tailed = TRUE) or one-tailed (if two_tailed = FALSE) tests are automatically carried out.

one_tailed_hypothesis

A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than such as c('A > B').

posthoc_pairwise_tests

Logical. Only for multinominal data (with more than two states). If TRUE, all possible post hoc pairwise (Dunn) tests will be computed across all pairs of states. This is a way to detect which pairs of states have significant differences in rates if the overall test (Kruskal-Wallis) is significant. Default is FALSE.

p.adjust_method

A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values in the post hoc pairwise tests to account for multiple comparisons. See stats::p.adjust() for the available methods. Default is none.

return_perm_data

Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output. This is needed to plot the histograms of the null distribution used to assess significance of the tests with plot_histogram_STRAPP_test_for_focal_time(). (for a single focal_time) and plot_histograms_STRAPP_tests_over_time() (for multiple time_steps). Default is FALSE.

nthreads

Integer. Number of threads to use for paralleled computing of the STRAPP tests across the permutations. The R package parallel must be loaded for nthreads > 1. Default is 1.

print_hypothesis

Logical. Whether to print information on what test is carried out, detailing the null and alternative hypotheses, and what significant level is used to rejected or not the null hypothesis. Default is TRUE.

extract_trait_data_melted_df

Logical. Specify whether trait data must be extracted from the updated_contMap/updated_densityMaps objects at each time step and returned in a melted data.frame. Default is FALSE.

extract_diversification_data_melted_df

Logical. Specify whether diversification data (regimes ID and tip rates) must be extracted from the updated_BAMM_object at each time step and returned in a melted data.frame. Default is FALSE.

return_STRAPP_results

Logical. Specify whether the STRAPP_results objects summarizing the results of the STRAPP tests carried out at each time step should be returned among the outputs in addition to the ⁠$pvalues_summary_df⁠ already providing test stat estimates and p-values obtained across all time_steps.

return_updated_trait_data_with_Map

Logical. Specify whether the trait_data extracted for the given focal_time and the updated version of mapped phylogeny (contMap/densityMaps) provided as input should be returned among the outputs. The updated contMap/densityMaps consists in cutting off branches and mapping that are younger than the focal_time. Default is FALSE.

return_updated_BAMM_object

Logical. Specify whether the updated_BAMM_object with phylogeny and mapped diversification rates cut-off at the focal_time should be returned among the outputs.

verbose

Logical. Should progression per time_steps be displayed? Default is TRUE.

verbose_extended

Should progression per time_steps AND within each deepSTRAPP workflow de displayed? In addition to printing progress along time_steps, a message will be printed at each step of the deepSTRAPP workflow, and for every batch of 100 BAMM posterior samples whose rates are regimes are updated. If extract_diversification_data_melted_df = TRUE, a message for will also be printed when rates are extracted. Default is FALSE.

Details

The function is a wrapper of run_deepSTRAPP_for_focal_time() that runs the deepSTRAPP workflow over multiple time_steps.

The deepSTRAPP workflow is described step by step in the run_deepSTRAPP_for_focal_time() documentation.

Its main output is the ⁠$pvalues_summary_df⁠: a data.frame providing test stat estimates and p-values obtained across all time_steps, that can be passed down to plot_STRAPP_pvalues_over_time() to generate a plot showing the evolution of the test results across time. If using multinominal data (with more than two states) and posthoc_pairwise_tests = TRUE, the output will also contain a data.frame providing test stat estimates and p-values for post hoc pairwise tests in ⁠$pvalues_summary_df_for_posthoc_pairwise_tests⁠.

The function offers options to generate summary data.frames of the data extracted across time_steps:

The function also allows to keep records of the intermediate objects generated during the STRAPP workflow:

Value

The function returns a list with at least five elements.

Optional summary df for multinominal data, if posthoc_pairwise_tests = TRUE:

Optional melted data.frames:

Optional objects generated for each time step (i.e., focal_time) and ordered as in ⁠$time_steps⁠:

Author(s)

Maël Doré

See Also

To run the deepSTRAPP workflow for a single focal_time: run_deepSTRAPP_for_focal_time() extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time() extract_diversification_data_melted_df_for_focal_time() compute_STRAPP_test_for_focal_time()

For a guided tutorial on complete deepSTRAPP workflow, see the associated vignettes:

Examples

if (deepSTRAPP::is_dev_version())
{
 # ----- Example 1: Continuous trait ----- #
 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny with old calibration
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract continuous trait data as a named vector
 Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
                                     nm = Ponerinae_trait_tip_data$Taxa)

 # Select a color scheme from lowest to highest values
 color_scale = c("darkgreen", "limegreen", "orange", "red")

 # Get Ancestral Character Estimates based on a Brownian Motion model
 # To obtain values at internal nodes
 Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)

  # (May take several minutes to run)
 # Run a Stochastic Mapping based on a Brownian Motion model
 # to interpolate values along branches and obtain a "contMap" object
 Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
                                        res = 100, # Number of time steps
                                        plot = FALSE)
 # Plot contMap = stochastic mapping of continuous trait
 plot_contMap(contMap = Ponerinae_contMap,
              color_scale = color_scale)

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

 ## Run deepSTRAPP on net diversification rates
 Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
    contMap = Ponerinae_contMap,
    ace = Ponerinae_ACE,
    tip_data = Ponerinae_cont_tip_data,
    trait_data_type = "continuous",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly trait data output
 data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Explore output
 str(Ponerinae_deepSTRAPP_cont_old_calib_0_40, max.level = 1)

 # Display test summary
 # Can be passed down to [deepSTRAPP::plot_STRAPP_pvalues_over_time()] to generate a plot
 # showing the evolution of the test results across time.
 Ponerinae_deepSTRAPP_cont_old_calib_0_40$pvalues_summary_df

 # Access trait data in a melted data.frame
 head(Ponerinae_deepSTRAPP_cont_old_calib_0_40$trait_data_df_over_time)

 # Access the diversification data in a melted data.frame
 head(Ponerinae_deepSTRAPP_cont_old_calib_0_40$diversification_data_df_over_time)

 # Access deepSTRAPP results
 str(Ponerinae_deepSTRAPP_cont_old_calib_0_40$STRAPP_results, max.level = 2)

 ## Visualize updated phylogenies

  # (May take time to plot)
 # Plot updated contMap for time step n°2
 deepSTRAPP_outputs <- Ponerinae_deepSTRAPP_cont_old_calib_0_40
 contMap_step2 <- deepSTRAPP_outputs$updated_trait_data_with_Map_over_time[[2]]
 plot_contMap(contMap_step2$contMap, ftype = "off")

 # Plot diversification rates on updated phylogeny for time step n°2
 BAMM_object_step2 <- deepSTRAPP_outputs$updated_BAMM_objects_over_time[[2]]
 plot_BAMM_rates(BAMM_object = BAMM_object_step2,
    legend = TRUE, labels = FALSE,
    colorbreaks = BAMM_object_step2$initial_colorbreaks$net_diversification) 

 ## Visualize test results

  # (May take time to plot)
 # Plot p-values of Spearman tests across all time-steps
 plot_STRAPP_pvalues_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    alpha = 0.10)

 # Plot evolution of mean rates through time
 plot_rates_through_time(deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40)

 # Plot rates vs. trait values across branches for time step = 10 My
 plot_rates_vs_trait_data_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    focal_time = 10)

 # Plot histogram of Spearman test stats for time step = 10 My
 plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    focal_time = 10)

 # Plot histograms of Spearman test results (One plot per time-step)
 plot_histograms_STRAPP_tests_over_time(
    deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
    display_plots = TRUE)  

 # ----- Example 2: Categorical trait ----- #

 ## Load data

 # Load trait df
 data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare trait data

 # Extract categorical data with 3-levels
 Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
                                         nm = Ponerinae_trait_tip_data$Taxa)
 table(Ponerinae_cat_3lvl_tip_data)

 # Select color scheme for states
 colors_per_states <- c("forestgreen", "sienna", "goldenrod")
 names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")

  # (May take several minutes to run)
 ## Produce densityMaps using stochastic character mapping based on an ARD Mk model
 Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
    tip_data = Ponerinae_cat_3lvl_tip_data,
    phylo = Ponerinae_tree_old_calib,
    trait_data_type = "categorical",
    colors_per_levels = colors_per_states,
    evolutionary_models = "ARD",
    nb_simulations = 100,
    return_best_model_fit = TRUE,
    return_model_selection_df = TRUE,
    plot_map = FALSE) 

  # Load directly trait data output
 data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
 # nb_time_steps <- 5
 time_step_duration <- 5
 time_range <- c(0, 40)

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates across time-steps.
 Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40 <- run_deepSTRAPP_over_time(
    densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
    ace = Ponerinae_cat_3lvl_data_old_calib$ace,
    tip_data = Ponerinae_cat_3lvl_tip_data,
    trait_data_type = "categorical",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    # nb_time_steps = nb_time_steps,
    time_range = time_range,
    time_step_duration = time_step_duration,
    rate_type = "net_diversification",
    seed = 1234,
    alpha = 0.10, # Select a generous level of significance for the sake of the example
    posthoc_pairwise_tests = TRUE,
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly deepSTRAPP output
 data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
 deepSTRAPP_outputs <- Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Explore output
 str(deepSTRAPP_outputs, max.level = 1)

 # Display test summaries
 # Can be passed down to [deepSTRAPP::plot_STRAPP_pvalues_over_time()] to generate a plot
 # showing the evolution of the test results across time.
 deepSTRAPP_outputs$pvalues_summary_df
 # Results for posthoc pairwise Dunn's tests over time-steps
 deepSTRAPP_outputs$pvalues_summary_df_for_posthoc_pairwise_tests

 # Access trait data in a melted data.frame
 head(deepSTRAPP_outputs$trait_data_df_over_time)

 # Access the diversification data in a melted data.frame
 head(deepSTRAPP_outputs$diversification_data_df_over_time)

 # Access details of deepSTRAPP results
 str(deepSTRAPP_outputs$STRAPP_results, max.level = 2)

 ## Visualize updated phylogenies

  # (May take time to plot)
 # Plot updated densityMaps for time step n°2 = 10My
 densityMaps_10My <- deepSTRAPP_outputs$updated_trait_data_with_Map_over_time[[2]]
 plot_densityMaps_overlay(densityMaps_10My$densityMaps)

 # Plot diversification rates on updated phylogeny for time step n°2
 BAMM_object_10My <- deepSTRAPP_outputs$updated_BAMM_objects_over_time[[2]]
 plot_BAMM_rates(BAMM_object = BAMM_object_10My,
    legend = TRUE, labels = FALSE,
    colorbreaks = BAMM_object_10My$initial_colorbreaks$net_diversification) 

 ## Visualize test results

  # (May take time to plot)
 # Plot p-values of overall Kruskal-Wallis test across all time-steps
 plot_STRAPP_pvalues_over_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    alpha = 0.10)

 # Plot p-values of post hoc pairwise Dunn's tests between pairs of tests across all time-steps
 plot_STRAPP_pvalues_over_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    alpha = 0.10,
    plot_posthoc_tests = TRUE)

 # Plot evolution of mean rates through time
 plot_rates_through_time(deepSTRAPP_outputs = deepSTRAPP_outputs,
    colors_per_levels = colors_per_states)

 # Plot rates vs. trait values across branches for time step n°2 = 10 My
 plot_rates_vs_trait_data_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    focal_time = 10,
    colors_per_levels = colors_per_states)

 # Plot histogram of overall Kruskal-Wallis test for time step n°2 = 10 My
 plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    focal_time = 10)

 # Plot histograms of overall Kruskal-Wallis test results across all time-steps
 # (One plot per time-step)
 plot_histograms_STRAPP_tests_over_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    plot_posthoc_tests = FALSE)

 # Plot histograms of post hoc pairwise Dunn's test results across all time-steps
 # (One multifaceted plot per time-step)
 plot_histograms_STRAPP_tests_over_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    plot_posthoc_tests = TRUE) 

 # ----- Example 3: Biogeographic ranges ----- #

 ## Load data

 # Load phylogeny
 data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
 # Load trait df
 data(Ponerinae_binary_range_table, package = "deepSTRAPP")

 # Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Prepare range data for Old World vs. New World

 # No overlap in ranges
 table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)

 Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
                                      nm = Ponerinae_binary_range_table$Taxa)
 Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
 Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
 Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
 names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
 table(Ponerinae_NO_data)

 colors_per_ranges <- c("mediumpurple2", "peachpuff2")
 names(colors_per_ranges) <- c("N", "O")

  # (May take several minutes to run)
 ## Run evolutionary models
 Ponerinae_biogeo_data <- prepare_trait_data(
    tip_data = Ponerinae_NO_data,
    trait_data_type = "biogeographic",
    phylo = Ponerinae_tree_old_calib,
    evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
    BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
    keep_BioGeoBEARS_files = FALSE,
    prefix_for_files = "Ponerinae_old_calib",
    max_range_size = 2,
    split_multi_area_ranges = TRUE, # Set to TRUE to use only unique areas "A" and "B"
    nb_simulations = 100, # Reduce to save time (Default = '1000')
    colors_per_levels = colors_per_ranges,
    return_model_selection_df = TRUE,
    verbose = TRUE) 

 # Load directly output
 data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")

 ## Set for time steps of 5 My. Will generate deepSTRAPP workflows from 0 to 40 Mya.
 time_range <- c(0, 40)
 time_step_duration <- 10

  # (May take several minutes to run)
 ## Run deepSTRAPP on net diversification rates for time-steps = 0 to 40 Mya.
 Ponerinae_deepSTRAPP_biogeo_old_calib_0_40 <- run_deepSTRAPP_over_time(
    densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
    ace = Ponerinae_biogeo_data_old_calib$ace,
    tip_data = Ponerinae_ON_tip_data,
    trait_data_type = "biogeographic",
    BAMM_object = Ponerinae_BAMM_object_old_calib,
    time_range = time_range,
    time_step_duration = time_step_duration,
    seed = 1234, # Set seed for reproducibility
    alpha = 0.10, # Select a generous level of significance for the sake of the example
    rate_type = "net_diversification",
    return_perm_data = TRUE,
    extract_trait_data_melted_df = TRUE,
    extract_diversification_data_melted_df = TRUE,
    return_STRAPP_results = TRUE,
    return_updated_trait_data_with_Map = TRUE,
    return_updated_BAMM_object = TRUE,
    verbose = TRUE,
    verbose_extended = TRUE) 

 ## Load directly output
 data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
 deepSTRAPP_outputs <- Ponerinae_deepSTRAPP_biogeo_old_calib_0_40
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Explore output
 str(deepSTRAPP_outputs, max.level = 1)

 # Display test summaries
 # Can be passed down to [deepSTRAPP::plot_STRAPP_pvalues_over_time()] to generate a plot
 # showing the evolution of the test results across time.
 deepSTRAPP_outputs$pvalues_summary_df

 # Access bioregeographic range data in a melted data.frame
 head(deepSTRAPP_outputs$trait_data_df_over_time)

 # Access the diversification data in a melted data.frame
 head(deepSTRAPP_outputs$diversification_data_df_over_time)

 # Access details of deepSTRAPP results
 str(deepSTRAPP_outputs$STRAPP_results, max.level = 2)

 ## Visualize updated phylogenies

  # (May take time to plot)
 # Plot updated densityMaps for time step n°2 = 10My
 densityMaps_10My <- deepSTRAPP_outputs$updated_trait_data_with_Map_over_time[[2]]
 plot_densityMaps_overlay(densityMaps_10My$densityMaps)

 # Plot diversification rates on updated phylogeny for time step n°2
 BAMM_object_10My <- deepSTRAPP_outputs$updated_BAMM_objects_over_time[[2]]
 plot_BAMM_rates(BAMM_object = BAMM_object_10My,
   legend = TRUE, labels = FALSE,
   colorbreaks = BAMM_object_10My$initial_colorbreaks$net_diversification) 

 ## Visualize test results

  # (May take time to plot)
 # Plot p-values of Mann-Whitney-Wilcoxon tests across all time-steps
 plot_STRAPP_pvalues_over_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    alpha = 0.05)

 # Plot evolution of mean rates through time
 plot_rates_through_time(deepSTRAPP_outputs = deepSTRAPP_outputs,
    colors_per_levels = colors_per_ranges)

 # Plot rates vs. trait values across branches for time step n°2 = 10 My
 plot_rates_vs_trait_data_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    focal_time = 10,
    colors_per_levels = colors_per_ranges)

 # Plot histogram of Mann-Whitney-Wilcoxon test stats for time step n°2 = 10My
 plot_histogram_STRAPP_test_for_focal_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    focal_time = 10)

 # Plot histograms of Mann-Whitney-Wilcoxon test stats for all time-steps (One plot per time-step)
 plot_histograms_STRAPP_tests_over_time(
    deepSTRAPP_outputs = deepSTRAPP_outputs,
    display_plots = TRUE,
    plot_posthoc_tests = FALSE) 
}


Compare model fits with AICc and Akaike's weights

Description

Compare models fit with BioGeoBEARS::bear_optim_run() using AICc and Akaike's weights. Generate a data.frame summarizing information. Identify the best model and extract its results.

Usage

select_best_model_from_BioGeoBEARS(list_model_fits)

Arguments

list_model_fits

Named list with the results of a model fit with BioGeoBEARS::bear_optim_run() in each element.

Value

The function returns a list with three elements.

Author(s)

Maël Doré

See Also

BioGeoBEARS::bear_optim_run() BioGeoBEARS::get_LnL_from_BioGeoBEARS_results_object() BioGeoBEARS::AICstats_2models()

Examples

if (deepSTRAPP::is_dev_version())
{
 ## The R package 'BioGeoBEARS' is needed for this function to work with biogeographic data.
 # Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.

 # Load phylogeny and tip data
 library(phytools)
 data(eel.tree)
 data(eel.data)

 # Transform feeding mode data into biogeographic data with ranges A, B, and AB.
 eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
 eel_data <- as.character(eel_data)
 eel_data[eel_data == "bite"] <- "A"
 eel_data[eel_data == "suction"] <- "B"
 eel_data[c(5, 6, 7, 15, 25, 32, 33, 34, 50, 52, 57, 58, 59)] <- "AB"
 eel_data <- stats::setNames(eel_data, rownames(eel.data))
 table(eel_data)

 colors_per_levels <- c("dodgerblue3", "gold")
 names(colors_per_levels) <- c("A", "B")

  # (May take several minutes to run)
 ## Prepare phylo

 # Set path to BioGeoBEARS directory
 BioGeoBEARS_directory_path = "./BioGeoBEARS_directory/"

 # Export phylo
 path_to_phylo <- BioGeoBEARS::np(paste0(BioGeoBEARS_directory_path, "phylo.tree"))
 write.tree(phy = eel.tree, file = path_to_phylo)

 ## Prepare range data

 tip_data <- eel_data
 # Convert tip_data to df
 ranges_df <- as.data.frame(tip_data)
 # Get list of all unique areas
 unique_areas_in_ranges_list <- strsplit(x = tip_data, split = "")
 unique_areas <- unique(unlist(unique_areas_in_ranges_list))
 unique_areas <- unique_areas[order(unique_areas)]

 # Loop per unique areas
 for (i in seq_along(unique_areas))
 {
   # i <- 1

   # Extract unique area
   unique_area_i <- unique_areas[i]
   # Detect presence in ranges
   binary_match_i <- unlist(lapply(X = unique_areas_in_ranges_list,
      FUN = function (x) { unique_area_i %in% x } ))

   # Add to ranges_df
   ranges_df <- cbind(ranges_df, binary_match_i)
 }
 # Extract binary df of presence/absence
 binary_df <- ranges_df[, -1]
 # Convert character strings into numerical factors
 binary_df_num <- as.data.frame(apply(X = binary_df, MARGIN = 2, FUN = as.numeric))
 row.names(binary_df_num) <- names(tip_data)
 names(binary_df_num) <- unique_areas

 # Produce tipranges object from numeric df
 Taxa_bioregions_tipranges_obj <- BioGeoBEARS::define_tipranges_object(tmpdf = binary_df_num)

 # Set path to tip ranges object
 path_to_tip_ranges <- BioGeoBEARS::np(paste0(BioGeoBEARS_directory_path,"tip_ranges.data"))

 # Export tip ranges in Lagrange/PHYLIP format
 BioGeoBEARS::save_tipranges_to_LagrangePHYLIP(
   tipranges_object = Taxa_bioregions_tipranges_obj,
   lgdata_fn = path_to_tip_ranges,
   areanames = colnames(Taxa_bioregions_tipranges_obj@df))

 ## Prepare models

 # Prepare DEC model run
 DEC_run <- BioGeoBEARS::define_BioGeoBEARS_run(
    num_cores_to_use = 1,
    max_range_size = 2, # To set the maximum number of bioregion encompassed
      # by a lineage range at any time
    trfn = path_to_phylo, # To provide path to the input tree file
    geogfn = path_to_tip_ranges, # To provide path to the LagrangePHYLIP file with binary ranges
    return_condlikes_table = TRUE, # To ask to obtain all marginal likelihoods
      # computed by the model and used to display ancestral states
    print_optim = TRUE)

 # Check that starting parameter values are inside the min/max
 DEC_run <- BioGeoBEARS::fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object = DEC_run)
 # Check validity of set-up before run
 BioGeoBEARS::check_BioGeoBEARS_run(DEC_run)

 # Use the DEC model as template for DEC+J model
 DEC_J_run <- DEC_run

 # Update status of jump speciation parameter to be estimated
 DEC_J_run$BioGeoBEARS_model_object@params_table["j","type"] <- "free"
 # Set initial value of J for optimization to an arbitrary low non-null value
 j_start <- 0.0001
 DEC_J_run$BioGeoBEARS_model_object@params_table["j","init"] <- j_start
 DEC_J_run$BioGeoBEARS_model_object@params_table["j","est"] <- j_start

 # Check validity of set-up before run
 DEC_J_run <- BioGeoBEARS::fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object = DEC_J_run)
 invisible(BioGeoBEARS::check_BioGeoBEARS_run(DEC_J_run))

 ## Run models

 # Run DEC model
 DEC_fit <- BioGeoBEARS::bears_optim_run(DEC_run)

 # Set starting values for optimization of DEC+J model based on MLE in the DEC model
 d_start <- DEC_fit$outputs@params_table["d","est"]
 e_start <- DEC_fit$outputs@params_table["e","est"]

 DEC_J_run$BioGeoBEARS_model_object@params_table["d","init"] <- d_start
 DEC_J_run$BioGeoBEARS_model_object@params_table["d","est"] <- d_start
 DEC_J_run$BioGeoBEARS_model_object@params_table["e","init"] <- e_start
 DEC_J_run$BioGeoBEARS_model_object@params_table["e","est"] <- e_start

 # Run DEC+J model
 DEC_J_fit <- BioGeoBEARS::bears_optim_run(DEC_J_run)

 ## Store model outputs
 list_model_fits <- list(DEC = DEC_fit, DEC_J = DEC_J_fit)

 ## Compare models
 model_comparison_output <- select_best_model_from_BioGeoBEARS(list_model_fits = list_model_fits)

 # Explore output
 str(model_comparison_output, max.level = 1)
 # Print comparison
 print(model_comparison_output$models_comparison_df)
 # Print best model fit
 print(model_comparison_output$best_model_fit$outputs)

 ## Clean local files
 unlink(BioGeoBEARS_directory_path, recursive = TRUE, force = TRUE) 
}


Compare trait evolutionary model fits with AICc and Akaike's weights

Description

Compare evolutionary models fit with geiger::fitContinuous() or geiger::fitDiscrete() using AICc and Akaike's weights. Generate a data.frame summarizing information. Identify the best model and extract its results.

Usage

select_best_trait_model_from_geiger(list_model_fits)

Arguments

list_model_fits

Named list with the results of a model fit with geiger::fitContinuous() or geiger::fitDiscrete() in each element.

Value

The function returns a list with three elements.

Author(s)

Maël Doré

See Also

geiger::fitContinuous() geiger::fitDiscrete()

Examples


# ----- Example 1: Continuous data ----- #

# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)

# Extract body size
eel_data <- stats::setNames(eel.data$Max_TL_cm,
                            rownames(eel.data))

 # (May take several minutes to run)
# Fit BM model
BM_fit <- geiger::fitContinuous(phy = eel.tree, dat = eel_data, model = "BM")
# Fit EB model
EB_fit <- geiger::fitContinuous(phy = eel.tree, dat = eel_data, model = "EB")
# Fit OU model
OU_fit <- geiger::fitContinuous(phy = eel.tree, dat = eel_data, model = "OU")

# Store models
list_model_fits <- list(BM = BM_fit, EB = EB_fit, OU = OU_fit)

# Compare models
model_comparison_output <- select_best_trait_model_from_geiger(list_model_fits = list_model_fits)

# Explore output
str(model_comparison_output, max.level = 2)

# Print comparison
print(model_comparison_output$models_comparison_df)

# Print best model fit
print(model_comparison_output$best_model_fit) 

# ----- Example 2: Categorical data ----- #

# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)

# Extract feeding mode data
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
table(eel_data)

 # (May take several minutes to run)
# Fit ER model
ER_fit <- geiger::fitDiscrete(phy = eel.tree, dat = eel_data, model = "ER")
# Fit ARD model
ARD_fit <- geiger::fitDiscrete(phy = eel.tree, dat = eel_data, model = "ARD")

# Store models
list_model_fits <- list(ER = ER_fit, ARD = ARD_fit)

# Compare models
model_comparison_output <- select_best_trait_model_from_geiger(list_model_fits = list_model_fits)

# Explore output
str(model_comparison_output, max.level = 2)

# Print comparison
print(model_comparison_output$models_comparison_df)

# Print best model fit
print(model_comparison_output$best_model_fit) 


Update diversification rates/regimes mapped on a phylogeny up to a given time in the past

Description

Updates an object of class "bammdata" to obtain the diversification rates/regimes found along branches at a specific time in the past (i.e. the focal_time). Optionally, the function can update the object to display a mapped phylogeny such as branches overlapping the focal_time are shorten to the focal_time.

Usage

update_rates_and_regimes_for_focal_time(
  BAMM_object,
  focal_time,
  update_rates = TRUE,
  update_regimes = TRUE,
  update_tree = FALSE,
  update_plot = FALSE,
  update_all_elements = FALSE,
  keep_tip_labels = TRUE,
  verbose = TRUE
)

Arguments

BAMM_object

Object of class "bammdata", typically generated with prepare_diversification_data(), that contains a phylogenetic tree and associated diversification rate mapping across selected posterior samples. The phylogenetic tree must be rooted and fully resolved/dichotomous, but it does not need to be ultrametric (it can includes fossils).

focal_time

Numerical. The time, in terms of time distance from the present, at which the tree and rate mapping must be cut. It must be smaller than the root age of the phylogeny.

update_rates

Logical. Specify whether diversification rates stored in ⁠$tipLambda⁠ (speciation) and ⁠$tipMu⁠ (extinction) must be updated to summarize rates found along branches at focal_time. Default is TRUE.

update_regimes

Logical. Specify whether diversification regimes stored in ⁠$tipStates⁠ must be updated to summarize regimes found along branches at focal_time. Default is TRUE.

update_tree

Logical. Specify whether to update the phylogeny such as all branches that are younger than the focal_time are cut-off. Default is FALSE.

update_plot

Logical. Specify whether to update the phylogeny AND the elements used by plot_BAMM_rates() to plot diversification rates on the phylogeny such as all branches that are younger than the focal_time are cut-off. Default is FALSE. If set to TRUE, it will override the update_tree parameter and update the phylogeny.

update_all_elements

Logical. Specify whether to update all the elements in the object, including rates/regimes/phylogeny/elements for plot_BAMM_rates()/all other elements. Default is FALSE. If set to TRUE, it will override other ⁠update_*⁠ parameters and update all elements.

keep_tip_labels

Logical. Should terminal branches with a single descendant tip retain their initial tip.label on the updated phylogeny and diversification rate mapping? Default is TRUE. If set to FALSE, the tipward node ID will be used as label for all tips.

verbose

Logical. Should progression be displayed? A message will be printed for every batch of 100 BAMM posterior samples updated. Default is TRUE.

Details

The object of class "bammdata" (BAMM_object) is cut-off at a specific time in the past (i.e. the focal_time) and the current diversification rate values of the overlapping edges/branches are extracted.

—– Update diversification rate data —–

If update_rates = TRUE, diversification rates of branches overlapping with focal_time will be updated. Each cut-off branches form a new tip present at focal_time with updated associated diversification rate data. Fossils older than focal_time do not yield any data.

If update_regimes = TRUE, diversification regimes of branches overlapping with focal_time will be updated. Each cut-off branches form a new tip present at focal_time with updated associated diversification regime ID found in ⁠$tipStates⁠. Fossils older than focal_time do not yield any data.

—– Update the phylogeny —–

If update_tree = TRUE, elements defining the phylogeny, as in an "phylo" object will be updated such as all branches that are younger than the focal_time are cut-off:

—– Update the plot from plot_BAMM_rates() —–

If update_plot = TRUE, elements used to plot diversification rates on the phylogeny using plot_BAMM_rates() will be updated such as all branches that are younger than the focal_time are cut-off:

Value

The function returns a list as an updated BAMM_object of class "bammdata".

Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():

BAMM internal elements used for tree exploration:

BAMM elements summarizing diversification data:

Additional elements providing key information for downstream analyses:

New elements added to provide update information:

Author(s)

Maël Doré

See Also

cut_phylo_for_focal_time() plot_BAMM_rates()

Examples

# ----- Example 1: Extant whales (87 taxa) ----- #

## Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(whale_BAMM_object, package = "deepSTRAPP")

## Set focal-time to 5 My
focal_time = 5

 # (May take several minutes to run)
## Update the BAMM object
whale_BAMM_object_5My <- update_rates_and_regimes_for_focal_time(
   BAMM_object = whale_BAMM_object,
   focal_time = 5,
   update_rates = TRUE, update_regimes = TRUE,
   update_tree = TRUE, update_plot = TRUE,
   update_all_elements = TRUE,
   keep_tip_labels = TRUE,
   verbose = TRUE)

# Add "phylo" class to be compatible with phytools::nodeHeights()
class(whale_BAMM_object) <- unique(c(class(whale_BAMM_object), "phylo"))
root_age <- max(phytools::nodeHeights(whale_BAMM_object)[,2])
# Remove temporary "phylo" class
class(whale_BAMM_object) <- setdiff(class(whale_BAMM_object), "phylo")

## Plot initial BAMM_object for t = 0 My
plot_BAMM_rates(whale_BAMM_object, add_regime_shifts = TRUE,
                labels = TRUE, legend = TRUE, cex = 0.5,
                par.reset = FALSE) # Keep plotting parameters in memory to use abline().
abline(v = root_age - focal_time,
      col = "red", lty = 2, lwd = 2)

## Plot updated BAMM_object for t = 5 My
plot_BAMM_rates(whale_BAMM_object_5My, add_regime_shifts = TRUE,
                labels = TRUE, legend = TRUE, cex = 0.8) 

# ----- Example 2: Extant Ponerinae (1,534 taxa) ----- #

if (deepSTRAPP::is_dev_version())
{
 ## Load the BAMM_object summarizing 1000 posterior samples of BAMM
 data(Ponerinae_BAMM_object, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Set focal-time to 10 My
 focal_time = 10

  # (May take several minutes to run)
 ## Update the BAMM object
 Ponerinae_BAMM_object_10My <- update_rates_and_regimes_for_focal_time(
    BAMM_object = Ponerinae_BAMM_object,
    focal_time = focal_time,
    update_rates = TRUE, update_regimes = TRUE,
    update_tree = TRUE, update_plot = TRUE,
    update_all_elements = TRUE,
    keep_tip_labels = TRUE,
    verbose = TRUE) 

 ## Load results to save time
 data(Ponerinae_BAMM_object_10My, package = "deepSTRAPP")
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 ## Extract root age
 # Add temporarily the "phylo" class to be compatible with phytools::nodeHeights()
 class(Ponerinae_BAMM_object) <- unique(c(class(Ponerinae_BAMM_object), "phylo"))
 root_age <- max(phytools::nodeHeights(Ponerinae_BAMM_object)[,2])
 # Remove temporary "phylo" class
 class(Ponerinae_BAMM_object) <- setdiff(class(Ponerinae_BAMM_object), "phylo")

 ## Plot diversification rates on the initial tree
 plot_BAMM_rates(Ponerinae_BAMM_object,
                 legend = TRUE, labels = FALSE)
 abline(v = root_age - focal_time,
        col = "red", lty = 2, lwd = 2)

 ## Plot diversification rates and regime shifts on the updated tree (cut-off for 10 My)
 # Keep the initial color scheme
 plot_BAMM_rates(Ponerinae_BAMM_object_10My, legend = TRUE, labels = FALSE,
                 colorbreaks = Ponerinae_BAMM_object_10My$initial_colorbreaks$net_diversification)

 # Use a new color scheme mapped on the new distribution of rates
 plot_BAMM_rates(Ponerinae_BAMM_object_10My, legend = TRUE, labels = FALSE)
}


Dataset summarizing 1000 posterior samples of BAMM for extant whales

Description

An object of class "bammdata" containing information of diversification dynamics of extant whales (Cetacea order) modeled with BAMM.

Source: Steeman, M. E., M. B. Hebsgaard, R. E. Fordyce, S. Y. W. Ho, D. L. Rabosky, R. Nielsen, C. Rahbek, H. Glenner, M. V. Sorensen, and E. Willerslev (2009) Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58, 573-585.

Usage

data(whale_BAMM_object)

Format

A list with 24 elements.

Details

An object of class "bammdata" containing information of diversification dynamics of extant ponerine ants (Ponerinae subfamily) modeled with BAMM.

Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():

BAMM internal elements used for tree exploration:

BAMM elements summarizing diversification data:

Additional elements providing key information for downstream analyses:

References

Steeman, M. E., M. B. Hebsgaard, R. E. Fordyce, S. Y. W. Ho, D. L. Rabosky, R. Nielsen, C. Rahbek, H. Glenner, M. V. Sorensen, and E. Willerslev (2009) Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58, 573-585.

See Also

BAMM software website: http://bamm-project.org/

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.