It can be a bit fiddly to get a phylogenetic dataset into R, particularly if you are not used to working with files in the NEXUS format.
The first thing that you’ll need to do is load the phangorn package, which should have been installed when you installed TreeSearch.
library('phangorn')
## Loading required package: ape
If your data is in an Excel spreadsheet, one way to load it into R is using the xlsx package. First you’ll have to install it:
install.packages('xlsx') # You only need to do this once
Next, prepare your Excel spreadsheet so that each row corresponds to a taxon and each column to a character.
You can then read the data from the Excel file by telling R which sheet, rows and columns contain your data:
library('xlsx')
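raw_data <- as.matrix(read.xlsx(filename, # filename holds the path to your Excel file
  sheetIndex=1, # Loads sheet number 1 from the Excel file
  rowIndex=2:21, # Extracts rows 2 to 21
  colIndex=2:26, # Extracts columns B to Z
  header=FALSE # The first extracted row is data, not column titles
))
# In this example, the names of taxa are in column 1 (column A).
# header=FALSE stops the first taxon name from being read as a column title;
# [, 1] extracts the single column so that there is one name per row of raw_data.
taxon_names <- as.character(read.xlsx(filename, sheetIndex=1, rowIndex=2:21, colIndex=1, header=FALSE)[, 1])
rownames(raw_data) <- taxon_names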
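Before moving on, it can be worth checking that the matrix has the layout described above. The sketch below builds a small matrix by hand in that layout and runs two sanity checks on it. The taxon names and character states are invented for illustration, and the checks assume that blank spreadsheet cells would come through as NA.

```r
# Hypothetical toy data: three taxa scored for four characters.
# '?' marks missing data and '-' marks an inapplicable character.
toy_data <- matrix(
  c('0', '1', '0', '-',
    '1', '1', '?', '0',
    '0', '0', '1', '1'),
  nrow = 3, byrow = TRUE
)
rownames(toy_data) <- c('Taxon_a', 'Taxon_b', 'Taxon_c')

# One row per taxon, one column per character
dim(toy_data)

# No cell should be empty, and no taxon name should occur twice
stopifnot(!anyNA(toy_data))
stopifnot(!anyDuplicated(rownames(toy_data)))
```

The same two stopifnot() checks could be run on raw_data once it has been read from your own spreadsheet.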
If your data is in a NEXUS file, you can read it using the ape package, which was installed as a dependency of phangorn:
raw_data <- ape::read.nexus.data(filename)
Non-standard elements of a NEXUS file might be beyond the capabilities of ape’s parser. In particular, you will need to replace spaces in taxon names with underscores, and to arrange all data into a single block starting BEGIN DATA. You’ll also need to strip out comments, character definitions and separate taxon blocks.
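Two of these clean-up steps, removing comments and replacing spaces in taxon names, can be roughed out in base R. This is only a sketch under stated assumptions: the NEXUS content is invented, the variable names (nexus_lines, cleaned, cleaned_file) are hypothetical, comments are assumed to open and close on a single line, and taxon names containing spaces are assumed to be wrapped in single quotes. It does not merge blocks or remove character definitions.

```r
# Hypothetical, deliberately messy NEXUS content, held as one string per line
nexus_lines <- c(
  "#NEXUS",
  "[Matrix exported from a spreadsheet]",
  "BEGIN DATA;",
  "  DIMENSIONS NTAX=2 NCHAR=4;",
  "  FORMAT DATATYPE=STANDARD MISSING=? GAP=-;",
  "  MATRIX",
  "  'Taxon one'  01-1 [a comment]",
  "  'Taxon two'  1?00",
  "  ;",
  "END;"
)

# Step 1: strip comments, which NEXUS encloses in square brackets.
# This only catches comments that open and close on the same line.
cleaned <- gsub("\\[[^\\]]*\\]", "", nexus_lines, perl = TRUE)
cleaned <- cleaned[nzchar(trimws(cleaned))]  # drop lines left empty

# Step 2: within single-quoted taxon names, swap spaces for underscores
# and remove the quotes themselves.
quoted <- gregexpr("'[^']*'", cleaned)
regmatches(cleaned, quoted) <- lapply(
  regmatches(cleaned, quoted),
  function(name) gsub(" ", "_", gsub("'", "", name, fixed = TRUE), fixed = TRUE)
)

# Write the tidied text to a temporary file
cleaned_file <- tempfile(fileext = ".nex")
writeLines(cleaned, cleaned_file)
```

The path in cleaned_file could then be passed to ape::read.nexus.data() in place of filename. With a real file, readLines(filename) would supply nexus_lines.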
The function readNexus in package phylobase promises to be more powerful, yet I’ve not been able to get it to work.
The next stage is to get the raw data into a format that TreeSearch can understand. Here we’ll enlist the help of the phangorn package, which was installed when you installed TreeSearch:
library('phangorn')
my_data <- phyDat(raw_data, type='USER', levels=c(0:9, '-'))
type='USER' tells the parser to expect morphological data.
The levels parameter simply lists all the states that any character might take. 0:9 includes all the integer digits from 0 to 9. If you have inapplicable data in your matrix, you should list - as a separate level, as it represents an additional state (as handled by the Morphy implementation of Brazeau, Guillerme, & Smith, 2017).
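If you are unsure which states occur in your matrix, you can tabulate them before choosing the levels. The sketch below does this for an invented dataset held in the list form that ape::read.nexus.data() returns. The names observed and my_levels are hypothetical, and the sketch assumes that the missing-data token ? should be left out of the levels so that it is treated as an ambiguity rather than as a state.

```r
# Hypothetical raw data: one character vector of states per taxon
raw_data <- list(
  Taxon_a = c('0', '1', '0', '-'),
  Taxon_b = c('1', '1', '?', '0'),
  Taxon_c = c('0', '2', '1', '1')
)

# Every distinct token that occurs anywhere in the matrix
observed <- unique(unlist(raw_data))
observed

# Keep everything except the missing-data token '?' as a level
my_levels <- setdiff(observed, '?')
my_levels
```

The resulting vector could be supplied as the levels argument in place of c(0:9, '-'). For data held as a matrix, unique(as.vector(raw_data)) would give the same tabulation.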
You might want to:
Brazeau, M. D., Guillerme, T., & Smith, M. R. (2017). Morphological phylogenetic analysis with inapplicable data. bioRxiv. doi:10.1101/209775