Overview

In the Introduction vignette, users learned how to perform the most fundamental computations and visualizations for phylotranscriptomics analyses using the myTAI package. In particular, the Introduction vignette demonstrated how to perform Phylostratigraphy and Divergence Stratigraphy and how to construct PhyloExpressionSets and DivergenceExpressionSets.

In the Intermediate vignette, users learned about more detailed analyses and more specialized techniques to investigate the observed phylotranscriptomics patterns (TAI, TDI, RE, etc.).

This vignette aims to provide users with in-depth tutorials on how to retrieve published Phylostratigraphic Maps and Divergence Maps and how they can integrate these data sets into their own phylotranscriptomics pipeline.

Biological Sequence Retrieval with biomartr

Both Phylostratigraphy and Divergence Stratigraphy require genome or proteome information as input. This section shows users how to retrieve biological sequences in *.fasta format so that they can perform Phylostratigraphy and Divergence Stratigraphy.

The classical way to download biological sequences is via the Terminal:


# download CDS file of A. thaliana
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz \
  -o Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz

# download CDS file of A. lyrata
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_lyrata/cds/Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz \
  -o Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz
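The same download can also be scripted from within R. The following is only a minimal sketch: the helper `ensembl_cds_url()` is a hypothetical name introduced here for illustration and is not part of myTAI or biomartr. It merely reassembles the URL used in the curl commands above. The actual `download.file()` call is left commented out because it requires network access and the release-23 path may no longer be served.

```r
# Hypothetical helper (not part of myTAI or biomartr): rebuild the
# Ensembl Genomes FTP URL used in the curl commands above.
ensembl_cds_url <- function(species, file, release = 23) {
  paste0("ftp://ftp.ensemblgenomes.org/pub/plants/release-", release,
         "/fasta/", species, "/cds/", file)
}

ath_file <- "Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz"
ath_url  <- ensembl_cds_url("arabidopsis_thaliana", ath_file)

# Uncomment to download (requires network access):
# download.file(ath_url, destfile = ath_file, mode = "wb")
```

Keeping the species directory and the file name as separate arguments makes it straightforward to reuse the helper for the A. lyrata file.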

Alternatively, users can use the Biological Data Retrieval package biomartr to download proteomes from the RefSeq database (see the Sequence Retrieval vignette for details).

# install.packages("devtools")

# install the current version of biomartr on your system
library(devtools)
install_github("HajkD/biomartr", build_vignettes = TRUE, dependencies = TRUE)

# load biomartr so that getProteome() is available
library(biomartr)

# download the proteome of Arabidopsis thaliana from refseq
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
Ath_Proteome <- getProteome( db           = "refseq", 
                             kingdom      = "plant",
                             organism     = "Arabidopsis thaliana",
                             clean_folder = FALSE )

Internally, the getProteome() function creates a directory named _ncbi_downloads/proteomes in which the corresponding proteomes are stored; they are then loaded as a data.table object into the current R session. When clean_folder = FALSE is specified, the _ncbi_downloads/proteomes folder is not removed, so the corresponding proteome does not need to be downloaded again when it is loaded into the current R session via getProteome().
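The download-once behaviour described above can be illustrated with a small, self-contained sketch. This is not biomartr's implementation: `fetch_cached()` and `fake_downloader()` are hypothetical names, and the stand-in downloader writes a toy FASTA record instead of contacting any server. The sketch only shows the general idea of keeping a download folder and skipping the download when the file is already present.

```r
# Hypothetical illustration of download-once caching; this is NOT
# biomartr's implementation, only the general idea behind keeping
# the download folder (clean_folder = FALSE).
fetch_cached <- function(file_name, cache_dir, downloader) {
  if (!dir.exists(cache_dir)) dir.create(cache_dir, recursive = TRUE)
  dest <- file.path(cache_dir, file_name)
  # Download only when the file is not already in the cache folder.
  if (!file.exists(dest)) downloader(dest)
  dest
}

# Stand-in downloader: writes a tiny toy FASTA record and counts its calls.
n_downloads <- 0
fake_downloader <- function(dest) {
  n_downloads <<- n_downloads + 1
  writeLines(c(">protein_1", "MKV"), dest)
}

# Use a fresh temporary location so the example leaves no files behind
# in the working directory.
cache_dir <- file.path(tempfile("cache_"), "_ncbi_downloads", "proteomes")
first  <- fetch_cached("toy_proteome.faa", cache_dir, fake_downloader)
second <- fetch_cached("toy_proteome.faa", cache_dir, fake_downloader)
```

The second call returns the same path without invoking the downloader again, which is the effect of not cleaning the folder between calls.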

Download Published Phylostratigraphic Maps and Divergence Maps

Here you can find a collection of scripts and documentation for already published Phylostratigraphic Maps and Divergence Maps, together with step-by-step instructions on how to retrieve them for use with myTAI. Skipping the step of performing Phylostratigraphy and Divergence Stratigraphy yourself allows you to quickly combine already published Phylostratigraphic Maps and Divergence Maps with a gene expression data set of your interest to capture evolutionary signals with myTAI.
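Combining a published map with an expression data set amounts to joining two tables on the gene identifier. The sketch below uses only base R `merge()` rather than any myTAI function, and all gene identifiers and expression values are toy data invented for illustration. It assumes the map has the columns `Phylostratum` and `GeneID`, and arranges the result with the phylostratum first, the gene identifier second, and the expression columns after that.

```r
# Toy data, invented for illustration only.
# A published map: one phylostratum per gene.
phylo_map <- data.frame(
  Phylostratum = c(1, 1, 3, 12),
  GeneID       = c("gene_1", "gene_2", "gene_3", "gene_4"),
  stringsAsFactors = FALSE
)

# An expression data set: one row per gene, one column per stage.
# Note the inconsistent identifier case and the unmapped gene_5.
expr <- data.frame(
  GeneID = c("GENE_2", "gene_3", "gene_5"),
  Stage1 = c(10.5, 3.2, 7.7),
  Stage2 = c(12.1, 2.9, 8.0),
  stringsAsFactors = FALSE
)

# Harmonise identifier case so that equal genes actually match.
phylo_map$GeneID <- tolower(phylo_map$GeneID)
expr$GeneID      <- tolower(expr$GeneID)

# Inner join: keep only genes present in both the map and the expression set.
phylo_expr <- merge(phylo_map, expr, by = "GeneID")

# Reorder columns: phylostratum, gene identifier, then expression levels.
phylo_expr <- phylo_expr[, c("Phylostratum", "GeneID", "Stage1", "Stage2")]
```

Genes that occur in only one of the two tables (here `gene_1`, `gene_4`, and `gene_5`) are dropped by the inner join, so it is worth checking how many genes survive the match before interpreting any downstream pattern.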