In the Introduction vignette you learned how to perform the most fundamental computations and visualizations for phylotranscriptomics analyses using the myTAI package. In particular, it showed you how to perform Phylostratigraphy and Divergence Stratigraphy and how to construct PhyloExpressionSets and DivergenceExpressionSets.
In the Intermediate vignette you learned about more detailed analyses and more specialized techniques to investigate the observed phylotranscriptomics patterns (TAI, TDI, RE, etc.).
This vignette shows you how to retrieve already published Phylostratigraphic Maps and Divergence Maps and how to integrate these data sets into your phylotranscriptomics pipeline.
biomartr
Both methods, Phylostratigraphy and Divergence Stratigraphy, require genome or proteome information. This section shows you how to retrieve biological sequences in *.fasta format, which serve as input for Phylostratigraphy and Divergence Stratigraphy.
The classical way to download biological sequences is via the Terminal:
# download CDS file of A. thaliana
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz \
  -o Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz
# download CDS file of A. lyrata
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_lyrata/cds/Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz \
  -o Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz
Alternatively, you can use the biological data retrieval package biomartr to download proteomes from the RefSeq database (see the Sequence Retrieval vignette for details).
# install.packages("devtools")
# install the current version of biomartr on your system
library(devtools)
install_github("HajkD/biomartr", build_vignettes = TRUE, dependencies = TRUE)
# load biomartr so that getProteome() is available
library(biomartr)
# download the proteome of Arabidopsis thaliana from refseq
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
Ath_Proteome <- getProteome( db           = "refseq",
                             kingdom      = "plant",
                             organism     = "Arabidopsis thaliana",
                             clean_folder = FALSE )
Internally, the getProteome() function creates a directory named _ncbi_downloads/proteomes into which the corresponding proteomes are downloaded. The proteome is then read into the current R session as a data.table object. When you specify clean_folder = FALSE, the _ncbi_downloads/proteomes folder is not removed, so the proteome does not need to be downloaded again the next time you load it into an R session via getProteome().
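The following base R sketch makes the effect of clean_folder = FALSE more concrete. It illustrates the general download-once caching pattern described above, not biomartr's actual implementation. The helper cached_fetch(), the fake downloader, and the file name toy_proteome.faa are hypothetical. The download is simulated by writing a tiny FASTA file into a temporary _ncbi_downloads/proteomes folder, so the example runs without network access.

```r
# Hypothetical helper illustrating a download-once cache (NOT biomartr's code):
# the file is only fetched if it is not already present in `folder`.
cached_fetch <- function(file_name, folder, download_fun) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(folder, file_name)
  if (!file.exists(path)) {
    download_fun(path)  # only reached when the file is not cached yet
  }
  path
}

# Hypothetical fake downloader: writes a tiny FASTA file and counts its calls
n_calls <- 0
fake_download <- function(path) {
  n_calls <<- n_calls + 1
  writeLines(c(">geneA", "MASTK"), path)
}

# Folder layout mirrors '_ncbi_downloads/proteomes', but inside tempdir()
folder <- file.path(tempdir(), "_ncbi_downloads", "proteomes")
p1 <- cached_fetch("toy_proteome.faa", folder, fake_download)
p2 <- cached_fetch("toy_proteome.faa", folder, fake_download)
print(n_calls)
```

Because the second call finds the file already present in the folder, the (simulated) download runs only once. Removing the folder, as happens when the cache is cleaned, would force a fresh download on the next call.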
Phylostratigraphic Maps and Divergence Maps
Here you can find a collection of scripts and documentation for already published Phylostratigraphic Maps and Divergence Maps, together with step-by-step instructions on how to retrieve them for use with myTAI. If you skip performing Phylostratigraphy and Divergence Stratigraphy yourself, you can quickly combine already published Phylostratigraphic Maps and Divergence Maps with a gene expression data set of your interest and capture evolutionary signals with myTAI.
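Once you have retrieved a published Phylostratigraphic Map, combining it with an expression data set essentially means joining both tables on their gene IDs. The base R sketch below illustrates this step and then computes the Transcriptome Age Index, TAI_s = sum_i(ps_i * e_is) / sum_i(e_is), for each stage. Everything in it is assumed for illustration: the toy tables, gene IDs (geneA to geneE), and stage names are hypothetical, not published data. The only convention taken from myTAI is the PhyloExpressionSet layout (phylostratum first, gene ID second, then one expression column per stage). In a real analysis you would rely on myTAI's own functions, such as TAI(), rather than this hand-rolled version.

```r
# Hypothetical toy Phylostratigraphic Map: phylostratum + gene ID
ps_map <- data.frame(
  Phylostratum = c(1, 3, 12, 2),
  GeneID       = c("geneA", "geneB", "geneC", "geneD"),
  stringsAsFactors = FALSE
)

# Hypothetical toy expression data: gene ID + one column per stage
expr <- data.frame(
  GeneID = c("geneB", "geneA", "geneD", "geneE"),
  Stage1 = c(10, 5, 0, 7),
  Stage2 = c(2, 8, 4, 1),
  stringsAsFactors = FALSE
)

# Inner join on GeneID: genes missing from either table are dropped
phylo_expr_set <- merge(ps_map, expr, by = "GeneID")

# Reorder columns into the PhyloExpressionSet layout:
# phylostratum first, gene ID second, expression columns afterwards
phylo_expr_set <- phylo_expr_set[, c("Phylostratum", "GeneID", "Stage1", "Stage2")]

# TAI per stage: expression-weighted mean phylostratum
ps  <- phylo_expr_set$Phylostratum
e   <- as.matrix(phylo_expr_set[, -(1:2)])
tai <- colSums(ps * e) / colSums(e)  # ps is recycled row-wise over e
print(tai)
```

Only geneA, geneB, and geneD appear in both tables, so three genes enter the TAI computation. Lower TAI values indicate that, in that stage, expression is dominated by genes from older phylostrata.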