Overview

In the Introduction vignette you learned how to perform the most fundamental computations and visualizations for phylotranscriptomics analyses using the myTAI package. In particular, the Introduction vignette showed you how to perform Phylostratigraphy and Divergence Stratigraphy and how to construct PhyloExpressionSets and DivergenceExpressionSets.

In the Intermediate vignette you learned about more detailed analyses and more specialized techniques to investigate the observed phylotranscriptomics patterns (TAI, TDI, RE, etc.).

This vignette shows you how to retrieve already published Phylostratigraphic Maps and Divergence Maps and how you can integrate these data sets into your phylotranscriptomics pipeline.

Biological Sequence Retrieval with biomartr

Both methods, Phylostratigraphy and Divergence Stratigraphy, require genome or proteome information as input. This section shows you how to retrieve biological sequences in *.fasta format so that you can perform Phylostratigraphy and Divergence Stratigraphy.

The classical way to download biological sequences is via the Terminal:


# download CDS file of A. thaliana
curl "ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz" \
  -o Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz

# download CDS file of A. lyrata
curl "ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_lyrata/cds/Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz" \
  -o Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz

Alternatively, you can use the Biological Data Retrieval package biomartr to download proteomes from the refseq database (see Sequence Retrieval Vignette for details).

# install.packages("devtools")

# install the current version of biomartr on your system
library(devtools)
install_github("HajkD/biomartr", build_vignettes = TRUE, dependencies = TRUE)

# load biomartr so that getProteome() is available
library(biomartr)

# download the proteome of Arabidopsis thaliana from refseq
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
Ath_Proteome <- getProteome( db           = "refseq", 
                             kingdom      = "plant",
                             organism     = "Arabidopsis thaliana",
                             clean_folder = FALSE )

Internally, the getProteome() function creates a directory named _ncbi_downloads/proteomes in which the corresponding proteomes are stored; each proteome is then loaded as a data.table object into the current R session. When you specify clean_folder = FALSE, the _ncbi_downloads/proteomes folder is not removed, so the proteome does not need to be downloaded again the next time you load it into an R session via getProteome().
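To check whether a proteome is already stored locally before calling getProteome() again, the download folder can be inspected with base R. The following is only a minimal sketch: the helper cached_proteomes() is hypothetical and not part of biomartr, and the folder name is assumed from the description above.

```r
# Hypothetical helper (not part of biomartr): list the files that are
# already present in the local proteome download folder.
# The default folder name is an assumption taken from the text above.
cached_proteomes <- function(folder = file.path("_ncbi_downloads", "proteomes")) {
  if (!dir.exists(folder)) {
    # nothing has been downloaded (or the folder was cleaned up)
    return(character(0))
  }
  list.files(folder, full.names = TRUE)
}

cached <- cached_proteomes()

if (length(cached) == 0) {
  message("No locally stored proteomes found.")
} else {
  message("Locally stored proteome files:\n", paste(cached, collapse = "\n"))
}
```

If the folder is missing or empty, a subsequent call to getProteome() presumably has to download the proteome again.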

Download Published Phylostratigraphic Maps and Divergence Maps

Here you can find a collection of scripts and documentation for already published Phylostratigraphic Maps and Divergence Maps, together with step-by-step instructions on how to retrieve them for use with myTAI. Skipping the process of performing Phylostratigraphy and Divergence Stratigraphy yourself allows you to quickly combine already published Phylostratigraphic Maps and Divergence Maps with a gene expression data set of your interest to capture evolutionary signals with myTAI.
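The idea of combining a published map with your own expression data can be sketched as a join on gene identifiers. The example below is an illustration only: it uses base R merge() rather than any myTAI function, all gene IDs, phylostratum assignments, and expression values are made up, and the stage column names Stage1 and Stage2 are hypothetical. The resulting table follows the PhyloExpressionSet layout (phylostratum, gene ID, then one column per developmental stage).

```r
# Toy Phylostratigraphic Map: one phylostratum per gene (made-up values).
phylo_map <- data.frame(
  Phylostratum = c(1, 1, 3, 2),
  GeneID       = c("gene_a", "gene_b", "gene_c", "gene_d"),
  stringsAsFactors = FALSE
)

# Toy expression data: gene IDs followed by one column per stage
# (made-up values; Stage1 and Stage2 are hypothetical column names).
expression <- data.frame(
  GeneID = c("gene_b", "gene_c", "gene_e"),
  Stage1 = c(10.5, 3.2, 7.0),
  Stage2 = c(12.1, 2.8, 6.4),
  stringsAsFactors = FALSE
)

# Inner join: keep only genes that occur in both the map and the
# expression data.
phylo_expression_set <- merge(phylo_map, expression, by = "GeneID")

# Reorder columns into the PhyloExpressionSet layout:
# phylostratum first, gene ID second, expression columns afterwards.
phylo_expression_set <- phylo_expression_set[
  , c("Phylostratum", "GeneID", "Stage1", "Stage2")
]

print(phylo_expression_set)
```

Genes that appear in only one of the two tables (gene_a, gene_d, and gene_e here) are dropped by the join, so it is worth checking how many genes remain before computing any evolutionary signal.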