
Partial Matching of Trait Data and Tree(s) in treedata.table

Josef Uyeda, Cristian Roman-Palacios, April Wright

08/08/2020

Partially matching trait data and tree(s)

The as.treedata.table function enables users to match a tree (or multiple trees) against a single trait database. We first load the sample dataset.

library(ape)
library(treedata.table)

# Load example data
data(anolis)
# Create a treedata.table object with as.treedata.table
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)
## Tip labels detected in column: X
## Phylo object detected
## No tips were dropped from the original tree/dataset
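
A quick way to confirm that the tree and the data match is to compare the tip labels of the tree with the tip.label column of the data. The following is a minimal sketch; it assumes that the phy and dat components can be accessed with $, as the printed objects in this vignette suggest. The names n_tips, n_rows and same_labels are introduced here for illustration only.

```r
library(ape)
library(treedata.table)

data(anolis)
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

# Number of tips in the tree and number of rows in the matched data
n_tips <- Ntip(td$phy)
n_rows <- nrow(td$dat)

# TRUE if both components contain exactly the same set of labels
same_labels <- setequal(as.character(td$phy$tip.label),
                        as.character(td$dat$tip.label))

n_tips
n_rows
same_labels
```

If the two counts agree and same_labels is TRUE, every tip in the tree has a row in the data and vice versa.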

Tips that are not shared by the tree (or trees) and the dataset are dropped from the resulting treedata.table object. For instance, below we modify the original anole phylogeny so that the tip label for A. ahli (ahli) is replaced with a label that is not present in the dataset (NAA).

anolis_newtip <- anolis$phy
anolis_newtip$tip.label[1] <- "NAA"
anolis_newtip
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##   NAA, allogus, rubribarbus, imias, sagrei, bremeri, ...
## 
## Rooted; includes branch lengths.

We then use this modified tree to build a treedata.table object with the as.treedata.table function:

td <- as.treedata.table(tree=anolis_newtip, data=anolis$dat)
## Tip labels detected in column: X
## Phylo object detected
## 1 tip(s) dropped from the original tree
## 1 tip(s) dropped from the original dataset

Note that as.treedata.table drops all non-overlapping tips and returns a treedata.table object with fully matching phy and dat components. In this case, two labels are dropped: NAA, which is present in the tree but not in the trait data, and ahli, which is present in the trait data but not in the tree.

td
## $phy 
## 
## Phylogenetic tree with 99 tips and 98 internal nodes.
## 
## Tip labels:
##   allogus, rubribarbus, imias, sagrei, bremeri, quadriocellifer, ...
## 
## Rooted; includes branch lengths.
## 
## $dat 
##          tip.label      SVL PCI_limbs PCII_head PCIII_padwidth_vs_tail
## 1:         allogus 4.040138 -2.845570 0.6001134             -1.0253056
## 2:     rubribarbus 4.078469 -2.238349 1.1199779             -1.1929572
## 3:           imias 4.099687 -3.048917 2.3320349              0.1616442
## 4:          sagrei 4.067162 -1.741055 2.0228243              0.1693635
## 5:         bremeri 4.113371 -1.813611 2.6067501              0.6399320
## 6: quadriocellifer 3.901619 -2.267894 0.9909208              0.3553405
##    PCIV_lamella_num awesomeness  hostility   attitude ecomorph island
## 1:        -2.463311   0.6244689 -0.5000962  0.7128910       TG   Cuba
## 2:        -2.087433  -0.4277574  0.4800445 -0.9674263       TG   Cuba
## 3:        -2.112606   0.1694260 -0.4108123  0.1963580       TG   Cuba
## 4:        -1.375769  -0.6304338  0.7193130 -1.2228276       TG   Cuba
## 5:        -1.626299  -1.7543006  1.4127184  0.1832345       TG   Cuba
## 6:        -2.105059  -0.2576389  0.4627081 -0.2712794       TG   Cuba
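
To see which labels account for the dropped tip and the dropped row, the two label sets can be compared directly before matching. This is a minimal sketch using base R's setdiff; it assumes that the labels of the trait data are stored in column X, as reported by the "Tip labels detected in column: X" message. The names only_in_tree and only_in_data are introduced here for illustration.

```r
library(ape)
library(treedata.table)

data(anolis)

# Recreate the modified tree: the first tip label is replaced with "NAA"
anolis_newtip <- anolis$phy
anolis_newtip$tip.label[1] <- "NAA"

tree_labels <- anolis_newtip$tip.label
data_labels <- as.character(anolis$dat$X)

# Labels present in the tree but missing from the trait data
only_in_tree <- setdiff(tree_labels, data_labels)

# Labels present in the trait data but missing from the tree
only_in_data <- setdiff(data_labels, tree_labels)

only_in_tree
only_in_data
```

Each vector should list the labels that as.treedata.table removes from the corresponding component.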

The same matching applies when a multiPhylo object is supplied: the returned treedata.table object contains fully matching trees in its phy component and trait data in its dat component. See the example below.

We first construct a multiPhylo object that only partially overlaps the original trait database by using NAA instead of ahli in each tree.

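# Two copies of the tree, each with the first tip relabelled as "NAA"
anolis1 <- anolis$phy
anolis1$tip.label[1] <- "NAA"
anolis2 <- anolis$phy
anolis2$tip.label[1] <- "NAA"
# Combine both trees into a multiPhylo object
trees <- list(anolis1, anolis2)
class(trees) <- "multiPhylo"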
trees
## 2 phylogenetic trees

Next, we build the treedata.table object from this multiPhylo object and the original trait database.

td <- as.treedata.table(tree=trees, data=anolis$dat)
## Tip labels detected in column: X
## Multiphylo object detected
## 1 tip(s) dropped from 2 trees
## 1 tip(s)  dropped from the original dataset

Note that one tip was dropped from each tree in the multiPhylo object and a single row was removed from the data.table stored in the treedata.table object.

td
## $phy 
## 2 phylogenetic trees
## 
## $dat 
##          tip.label      SVL PCI_limbs PCII_head PCIII_padwidth_vs_tail
## 1:         allogus 4.040138 -2.845570 0.6001134             -1.0253056
## 2:     rubribarbus 4.078469 -2.238349 1.1199779             -1.1929572
## 3:           imias 4.099687 -3.048917 2.3320349              0.1616442
## 4:          sagrei 4.067162 -1.741055 2.0228243              0.1693635
## 5:         bremeri 4.113371 -1.813611 2.6067501              0.6399320
## 6: quadriocellifer 3.901619 -2.267894 0.9909208              0.3553405
##    PCIV_lamella_num awesomeness  hostility   attitude ecomorph island
## 1:        -2.463311   0.6244689 -0.5000962  0.7128910       TG   Cuba
## 2:        -2.087433  -0.4277574  0.4800445 -0.9674263       TG   Cuba
## 3:        -2.112606   0.1694260 -0.4108123  0.1963580       TG   Cuba
## 4:        -1.375769  -0.6304338  0.7193130 -1.2228276       TG   Cuba
## 5:        -1.626299  -1.7543006  1.4127184  0.1832345       TG   Cuba
## 6:        -2.105059  -0.2576389  0.4627081 -0.2712794       TG   Cuba
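
The matching can be checked for every tree in the multiPhylo object. The sketch below assumes that the phy component keeps the multiPhylo class, as the printed output above suggests, and that its trees can be iterated over with sapply. The names n_tips and labels_match are illustrative.

```r
library(ape)
library(treedata.table)

data(anolis)

# Two copies of the tree, each with the first tip relabelled as "NAA"
anolis1 <- anolis$phy
anolis1$tip.label[1] <- "NAA"
anolis2 <- anolis$phy
anolis2$tip.label[1] <- "NAA"
trees <- list(anolis1, anolis2)
class(trees) <- "multiPhylo"

td <- as.treedata.table(tree = trees, data = anolis$dat)

# Number of tips retained in each tree
n_tips <- sapply(td$phy, Ntip)

# For each tree, TRUE if its tip labels match the labels in the data
labels_match <- sapply(td$phy, function(tr) {
  setequal(as.character(tr$tip.label), as.character(td$dat$tip.label))
})

n_tips
labels_match
```

Every tree should retain the same number of tips as there are rows in the data, and labels_match should be TRUE for each tree.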
