Ecology is the scientific study of the interactions between organisms and their environment. The structure of complex biological systems reflects not only their function but also the habitats in which they evolved and to which they are adapted. Reverse Ecology, an emerging frontier in Evolutionary Systems Biology at the interface of computational biology, genomics and environmental science, takes advantage of rapid and inexpensive next-generation sequencing technologies and uses population genomics to study ecology with no a priori assumptions about the organism(s) under consideration. The term was suggested by Li et al. during a conference on ecological genomics in Christchurch, New Zealand. Reverse Ecology facilitates the translation of high-throughput genomic data into large-scale ecological data, and uses system-based methods to obtain novel insights into poorly characterized microorganisms and into the relationships between microorganisms, or between microorganisms and their environments, in a superorganism. Traditional approaches, by contrast, can only be applied on a small scale and to relatively well-studied systems.
This manual is a brief introduction to the structure, functions and usage of the RevEcoR package. RevEcoR implements a large-scale reconstruction of metabolic environments through cross-species analysis. Using the graph-based algorithm described by Borenstein et al., it identifies the set of compounds that each species extracts from its environment; the metabolic network data are obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database through the KEGG API. The algorithm consists of several steps:
Downloading metabolic network data from the KEGG database: the metabolic network data of all species can be downloaded from the KEGG PATHWAY database with the KEGG REST API.
Reconstructing the metabolic network of each species: a directed graph whose nodes represent compounds and whose edges represent reactions linking substrates to products.
Identifying the seed set of a specific organism: each metabolic network is decomposed into its strongly connected components (SCCs) using Kosaraju's algorithm. The SCCs form a directed acyclic graph whose nodes are the components and whose edges are the original edges that connect nodes in two different components. The seed set is then detected from these SCCs as the source components, i.e. components with no incoming edges.
Analyzing host-microbe and microbe-microbe cooperation.
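The seed-detection step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not RevEcoR's KasarajuSCC or getSeedSets: it takes a made-up compound graph as a data frame of from/to edges, labels each compound with its SCC using Kosaraju's two-pass depth-first search, and reports the compounds of source components (SCCs with no incoming edge from another component) as seed groups. The helper names kosaraju_scc and seed_components are hypothetical, and the threshold argument of getSeedSets is not modeled.

```r
## Hypothetical helpers illustrating Kosaraju's algorithm and seed detection;
## not the RevEcoR implementation. `edges` is a data frame with from/to columns.
kosaraju_scc <- function(edges, nodes = unique(c(edges[[1]], edges[[2]]))) {
  ## adjacency lists of the graph (successors) and of its transpose (predecessors)
  succ <- split(edges[[2]], factor(edges[[1]], levels = nodes))
  pred <- split(edges[[1]], factor(edges[[2]], levels = nodes))

  ## pass 1: depth-first search on the graph, recording finishing order
  visited <- setNames(logical(length(nodes)), nodes)
  finished <- character(0)
  dfs_forward <- function(v) {
    visited[v] <<- TRUE
    for (w in succ[[v]]) if (!visited[w]) dfs_forward(w)
    finished <<- c(finished, v)  # v finishes after all its descendants
  }
  for (v in nodes) if (!visited[v]) dfs_forward(v)

  ## pass 2: depth-first search on the transpose in reverse finishing order;
  ## each search tree is one strongly connected component
  comp <- setNames(rep(NA_integer_, length(nodes)), nodes)
  dfs_backward <- function(v, id) {
    comp[v] <<- id
    for (w in pred[[v]]) if (is.na(comp[w])) dfs_backward(w, id)
  }
  n_comp <- 0L
  for (v in rev(finished)) {
    if (is.na(comp[v])) {
      n_comp <- n_comp + 1L
      dfs_backward(v, n_comp)
    }
  }
  comp  # named integer vector: compound -> component id
}

## seeds = compounds in components that no other component points into
seed_components <- function(edges, comp) {
  from_comp <- comp[edges[[1]]]
  to_comp <- comp[edges[[2]]]
  entered <- unique(to_comp[from_comp != to_comp])
  sources <- setdiff(unique(comp), entered)
  lapply(sources, function(id) names(comp)[comp == id])
}

## toy compound graph: B and C form a cycle; A and E have no incoming edges
edges <- data.frame(from = c("A", "B", "C", "C", "E"),
                    to   = c("B", "C", "B", "D", "D"),
                    stringsAsFactors = FALSE)
comp <- kosaraju_scc(edges)
seeds <- seed_components(edges, comp)
seeds  # two seed groups: "A" and "E"
```

The recursive depth-first search keeps the sketch short; for very large networks an iterative version would avoid R's recursion limits.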
RevEcoR is available on CRAN; the latest development version can also be installed directly from GitHub using devtools.
However, install_github can only install dependencies that have been released on CRAN. Three dependencies of RevEcoR, Biobase, KEGGREST and mmnet, are released on Bioconductor, so users should install these three packages manually first.
if (!requireNamespace("BiocManager", quietly = TRUE))  ## replaces biocLite() on R >= 3.5
  install.packages("BiocManager")
BiocManager::install(c("mmnet", "KEGGREST", "Biobase"))
Then install RevEcoR from CRAN, or install the development version from GitHub using devtools:
install.packages("RevEcoR")
## or install the latest development version from GitHub
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("yiluheihei/RevEcoR")
You'll also need to make sure your machine is able to build packages from source. See Package Development Prerequisites for the tools needed for your operating system.
After installation, you can load RevEcoR into the current workspace by typing or pasting the following code:
library(RevEcoR)
Metabolic data contain the reaction and metabolite information of each organism. In total, metabolic data for 3139 species, covering all taxonomic groups, were downloaded as of July 2014.
Furthermore, this package provides getOrgMetabolicData to download the metabolic data of a specific organism. It returns a list in which each element consists of three components: reaction, substrate and product. Keeping downloaded metabolic data locally not only helps keep the data accurate and speeds up network reconstruction, but also allows you to reuse the data for other purposes.
getOrgMetabolicData first checks whether the metabolic data of the queried organism exist in the local database. If they do not, it retrieves them from KEGG and writes them into the local database; otherwise, it returns the data immediately from the local database. Type ?getOrgMetabolicData to see more details.
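The check-then-fetch behaviour described above is a common caching pattern. The following minimal sketch illustrates it in base R with a hypothetical helper, cached_fetch, and a stand-in fetch function in place of a real KEGG query; it is not the code of getOrgMetabolicData, and storing one .rds file per organism is an assumption.

```r
## Hypothetical helper (not part of RevEcoR): look for a local copy first,
## fetch remotely only on a cache miss, and save what was fetched.
cached_fetch <- function(org, fetch, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(org, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: return the local data immediately
  }
  data <- fetch(org)       # cache miss: retrieve the data ...
  saveRDS(data, path)      # ... and write it to the local store
  data
}

## stand-in for a remote query; counts how often it is actually called
n_calls <- 0L
fake_fetch <- function(org) {
  n_calls <<- n_calls + 1L
  list(reaction = "R1", substrate = "S1", product = "P1")  # placeholder data
}

cache_dir <- file.path(tempdir(), "metabolic_cache_demo")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache
first <- cached_fetch("buc", fake_fetch, cache_dir)
second <- cached_fetch("buc", fake_fetch, cache_dir)
n_calls  # 1: the second call was served from the local copy
```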
## download sample metabolic data from remote KEGG database
buc <- getOrgMetabolicData("buc")
## or load the buc metabolic data shipped with RevEcoR
data(kegg_buc)
head(buc)
The crucial premise of Reverse Ecology is that genomic information can be converted into ecological information. Metabolism links a microorganism with its biochemical environment through compound exchanges: the organism imports exogenous compounds and affects the composition of its environment by secreting other compounds. Thus, the metabolic network reflects the interactions between an organism and its environment.
A graph-based representation of metabolic reactions, in which nodes represent compounds and edges represent reactions, is a common tool for analyzing and studying metabolic networks. A directed edge from node a to node b indicates that compound a is a substrate in some reaction that produces compound b.
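To make this representation concrete, the minimal sketch below turns a made-up list of reactions into a directed edge list in base R: each substrate of a reaction is linked to each of its products. The reaction and compound identifiers are placeholders, and reaction_edges is a hypothetical helper rather than part of RevEcoR.

```r
## Hypothetical reactions: each has a set of substrates and a set of products
reactions <- list(
  R1 = list(substrate = c("C1", "C2"), product = "C3"),
  R2 = list(substrate = "C3", product = c("C4", "C5"))
)

## every substrate -> product pair of a reaction becomes a directed edge
reaction_edges <- function(reactions) {
  pairs <- lapply(reactions, function(r)
    expand.grid(from = r$substrate, to = r$product, stringsAsFactors = FALSE))
  edges <- unique(do.call(rbind, pairs))  # drop edges shared by several reactions
  rownames(edges) <- NULL
  edges
}

edges <- reaction_edges(reactions)
edges
```

Here compound C3 links the two reactions: it is a product of R1 and a substrate of R2, so a path C1 -> C3 -> C4 exists in the graph.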
reconstructGsMN can be used to reconstruct the metabolic network of a specific organism.
The KEGG organism database is a large-scale resource that is widely used to reconstruct genome-scale metabolic networks (GsMNs). However, with the development of next-generation sequencing, large amounts of human microbiome data are being generated. The number of species detected in the human microbiome is booming, and more and more uncultured microbes have been detected. These microbes are not necessarily included in KEGG.
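The Integrated Microbial Genomes (IMG) system is a widely used resource and analysis tool for the human microbiome. For metagenomic microbiome data, users can perform analyses in IMG and obtain the genomic information of the species in the microbiome, as well as their KEGG Orthology (KO) functional annotation profiles. reconstructGsMN can then build a metabolic network from such a KO profile together with reference metabolic data supplied through its RefData argument.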
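As a rough illustration of how a KO annotation profile can be turned into a metabolic network, the sketch below keeps the rows of a reference table whose KO is present in a species' profile and converts them into substrate-to-product edges. The reference table, its column layout and all identifiers are made up; this is an assumption-laden sketch, not how reconstructGsMN or the mmnet reference data are actually organized.

```r
## Hypothetical reference table: one row per KO / reaction / substrate / product link
ref <- data.frame(
  ko        = c("K1", "K1", "K2", "K3"),
  reaction  = c("R1", "R1", "R2", "R3"),
  substrate = c("C1", "C2", "C3", "C6"),
  product   = c("C3", "C3", "C4", "C7"),
  stringsAsFactors = FALSE
)

## hypothetical KO annotation profile of one species (e.g. exported from IMG)
species_kos <- c("K1", "K2")

## keep reactions whose KO is annotated in the species, then turn each
## substrate -> product link into a directed edge
present <- ref[ref$ko %in% species_kos, ]
edges <- unique(present[, c("substrate", "product")])
edges
```

Reaction R3 is dropped because its KO (K3) is absent from the profile, so compounds C6 and C7 do not enter this species' network.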
## species in KEGG
buc.net <- reconstructGsMN(kegg_buc, RefData = NULL)
igraph::print.igraph(buc.net)
## IGRAPH DN-- 259 681 --
## + attr: name (v/c)
igraph::plot.igraph(buc.net, vertex.label=NA, vertex.size=5, edge.arrow.size=0.1)
## KO annotation profile of a species detected in a human microbiome by IMG (not in KEGG)
annodir <- system.file("extdata/koanno.tab",package = "RevEcoR")
metabolic.data <- read.delim(annodir,stringsAsFactors=FALSE)
##load the reference metabolic data
data(RefDbcache,package="mmnet")
g2 <- reconstructGsMN(metabolic.data, RefData = RefDbcache)
Figure 1: Reconstructed metabolic network of *Buchnera aphidicola* APS
Because the interactions with the environment are reflected in the metabolic networks, these networks can be used not only to infer metabolic function but also to obtain insights into the environments in which the species evolved.
Organisms can survive in a wide range of environments and may activate only a subset of the pathways in their network in each environment, using a different set of exogenously acquired compounds (termed the seed set). The seed set of a network is defined as the minimal set of compounds in the network that allows the synthesis of all other compounds; it can serve as a good proxy for the organism's environment and can be conceived of as its essential and effective biochemical environment.
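Under the graph-based definition used here, a set of compounds allows the synthesis of the rest of the network when every compound is reachable from at least one of them. The base-R sketch below checks this with a breadth-first search on a made-up edge list; reachable_from and covers are hypothetical helpers, and the check ignores stoichiometry, so it captures only the topological approximation.

```r
## breadth-first search: all compounds reachable from a starting set
reachable_from <- function(edges, seeds) {
  reached <- unique(seeds)
  frontier <- reached
  while (length(frontier) > 0) {
    nxt <- edges$to[edges$from %in% frontier]  # one step downstream
    frontier <- setdiff(nxt, reached)          # only newly reached compounds
    reached <- c(reached, frontier)
  }
  reached
}

## toy compound graph (hypothetical)
edges <- data.frame(from = c("A", "B", "C", "C", "E"),
                    to   = c("B", "C", "B", "D", "D"),
                    stringsAsFactors = FALSE)
all_nodes <- unique(c(edges$from, edges$to))

## does a candidate seed set reach every compound in the network?
covers <- function(seeds) setequal(reachable_from(edges, seeds), all_nodes)

covers(c("A", "E"))  # TRUE
covers("A")          # FALSE: E has no incoming edge
```

Both A and E are needed, since neither alone reaches the whole graph; this is the sense in which the seed set is minimal.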
getSeedSets detects the seed set of a metabolic network and returns a seedset-class object. Several methods are supported for seedset-class, e.g. length and nonseeds; for more details on seedset-class, see ?"seedset-class". The seed set gives the compounds that the organism acquires exogenously from its environment, representing the organism's nutritional profile. The algorithm is based on Kosaraju's fast algorithm for SCC decomposition, which is implemented in KasarajuSCC. For more details, see ?getSeedSets and ?KasarajuSCC.
## seed set prediction
seed.set <- getSeedSets(buc.net, 0.2)
show(seed.set)
## Object of class seedset
## IGRAPH: -- 259 nodes 681 edges --
## seedset length 9
head(seed.set@seeds)
## [[1]]
## [1] "cpd:C15532"
##
## [[2]]
## [1] "cpd:C15811"
##
## [[3]]
## [1] "cpd:C05729"
##
## [[4]]
## [1] "cpd:C00993"
##
## [[5]]
## [1] "cpd:C00031"
##
## [[6]]
## [1] "cpd:C00392"
## color the nodes of the species' seed set red
nodes <- igraph::V(buc.net)$name
seeds <- unlist(seed.set@seeds)
seed.index <- match(seeds,nodes)
node.color <- rep("SkyBlue2",length(nodes))
node.color[seed.index] <- "red"
igraph::plot.igraph(buc.net,
vertex.label=NA, vertex.size=5, edge.arrow.size=0.1,
vertex.color = node.color)
Figure 2: Metabolic network of *Buchnera aphidicola* APS; nodes colored red represent the species' seed set.
The topology of metabolic networks can provide insight not only into the metabolic processes that occur within each species, but also into interactions between different species. RevEcoR provides caculateCooperationIndex, which computes three indices, the competition index, the complementarity index and a biosynthetic support index (bsi.score), to quantify potential microbe-microbe interactions. For more details, see ?caculateCooperationIndex.
# ptr metabolic network
data(kegg_ptr)
##ptr.net <- reconstructGsMN(getOrgMetabolicData("ptr"))
ptr.net <- reconstructGsMN(kegg_ptr)
# cooperation analysis between buc and ptr
cooperation.index <- caculateCooperationIndex(buc.net,ptr.net)
cooperation.index
## $competition.index
## [,1] [,2]
## [1,] 1.00000000 0.1818182
## [2,] 0.03571429 1.0000000
##
## $complementarity.index
## [,1] [,2]
## [1,] 0.00000000 0.4545455
## [2,] 0.07142857 0.0000000
##
## $bsi.score
## [,1] [,2]
## [1,] 1.0000000 0.6363636
## [2,] 0.1071429 1.0000000
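The sketch below shows one way to compute such pairwise indices from seed and non-seed compound sets, following the definitions of Levy and Borenstein as we understand them: the competition index of species A with species B is the fraction of A's seeds that are also B's seeds, and the complementarity index is the fraction of A's seeds that appear among B's non-seed compounds. Treating bsi as their sum (the fraction of A's seeds found anywhere in B's network) is an assumption, chosen because it matches the example output above, where each bsi.score entry equals the sum of the corresponding competition and complementarity entries. The helpers, the row/column orientation and the toy sets are hypothetical, and seed groups are flattened to plain character vectors; this is not the caculateCooperationIndex source.

```r
## pairwise indices of species A relative to species B
pair_indices <- function(seeds_a, seeds_b, nonseeds_b) {
  n <- length(seeds_a)
  competition <- length(intersect(seeds_a, seeds_b)) / n
  complementarity <- length(intersect(seeds_a, nonseeds_b)) / n
  c(competition = competition,
    complementarity = complementarity,
    bsi = competition + complementarity)  # assumed: A's seeds found anywhere in B
}

## matrix of one index for all pairs (row = A, column = B, assumed orientation)
index_matrix <- function(seeds, nonseeds, index = "competition") {
  k <- length(seeds)
  m <- matrix(0, k, k, dimnames = list(names(seeds), names(seeds)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      m[i, j] <- pair_indices(seeds[[i]], seeds[[j]], nonseeds[[j]])[[index]]
    }
  }
  m
}

## hypothetical seed and non-seed sets of two species
seeds <- list(a = c("C1", "C2", "C3", "C4"), b = c("C1", "C9"))
nonseeds <- list(a = c("C5", "C6"), b = c("C2", "C3", "C8"))
competition <- index_matrix(seeds, nonseeds, "competition")
complementarity <- index_matrix(seeds, nonseeds, "complementarity")
```

Because a species' seeds and non-seeds are disjoint, the diagonal is always 1 for competition and 0 for complementarity, as in the buc/ptr output.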
To further evaluate the interactions predicted by RevEcoR on a large scale, we combined the functions described above to investigate species interactions in the gut microbiome. We focused on a list of 116 prevalent gut species whose genome sequences are available in the IMG database and whose sequence coverage exceeds 1% in at least one metagenomic sample from 124 individuals. The genome annotation profiles of these 116 species were collected from the IMG database and used to calculate the interactions (competition and complementarity indices) for all pairs of species.
For each species, the list of genes mapped to KEGG orthologous groups (KOs) was downloaded with an in-house R script. These data, which were used to reconstruct the metabolic network of each species, have been saved as gut_microbiome.rda in the data subdirectory of RevEcoR and can be loaded with the following code:
data(gut_microbiome)
## summary(gut_microbiome)
The data are then used to reconstruct the metabolic network of each species, predict the seed sets, and finally predict the interactions between all pairs of species:
gut.nets <- lapply(gut_microbiome,reconstructGsMN)
seed.sets <- lapply(gut.nets,getSeedSets)
gut.interactions <- caculateCooperationIndex(gut.nets)
competition.index <- gut.interactions$competition.index
complementarity.index <- gut.interactions$complementarity.index
In particular, comparing species interactions with co-occurrence allows us to test whether species that compete with one another tend to co-occur or to exclude each other. We obtained co-occurrence scores directly from R. Levy's study and saved them as occurrence.tab in the inst/extdata subdirectory of RevEcoR. The co-occurrence score was calculated from species abundances across all samples and measured by the Jaccard similarity index.
Load the co-occurrence data:
occurrence.score <- read.delim(system.file("extdata/occurrence.tab",
package = "RevEcoR"),stringsAsFactors = FALSE, quote = "")
The Jaccard index measures the similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sets. Thus, co-occurrence scores are symmetric, whereas the competition and complementarity indices are not.
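For illustration, the sketch below computes Jaccard co-occurrence scores from a made-up abundance matrix (species in rows, samples in columns), treating a species as present in a sample when its abundance exceeds a threshold. The threshold, the data and the helper jaccard_cooccurrence are assumptions, and this is not necessarily how the scores in occurrence.tab were derived; it only shows why the resulting matrix is symmetric.

```r
## Jaccard co-occurrence: |samples with both| / |samples with either|
jaccard_cooccurrence <- function(abund, min_abund = 0) {
  present <- abund > min_abund               # presence/absence per sample
  inter <- present %*% t(present)            # samples where both species occur
  n_present <- rowSums(present)
  uni <- outer(n_present, n_present, "+") - inter
  jac <- inter / uni
  jac[uni == 0] <- 0                         # neither species present anywhere
  jac
}

## hypothetical abundances of three species in four samples
abund <- matrix(c(5, 0, 3, 1,
                  2, 0, 0, 4,
                  0, 7, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("sp1", "sp2", "sp3"), paste0("s", 1:4)))
jac <- jaccard_cooccurrence(abund)
round(jac, 3)
```

sp1 and sp2 share two of the three samples in which either occurs, giving 2/3; swapping the two species gives the same value, hence the symmetry.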
Symmetric versions of the two interaction indices were generated by replacing each element with the mean of that element and its transposed counterpart:
competition.index <- (competition.index + t(competition.index))/2
complementarity.index <- (complementarity.index + t(complementarity.index))/2
Subsequently, the Spearman correlation between the co-occurrence scores and each of the two interaction indices was calculated. A permutation-based Mantel test, which is commonly used in ecology, was used to determine the significance of these correlations.
## upper triangles, which are used to calculate the correlations
competition.upper <- competition.index[upper.tri(competition.index)]
occurrence.upper <- occurrence.score[upper.tri(occurrence.score)]
complementarity.upper <- complementarity.index[upper.tri(complementarity.index)]
## Spearman correlation between co-occurrence scores and the two
## interaction indices
competition.cor <- cor(competition.upper, occurrence.upper, method = "spearman")
complementarity.cor <- cor(complementarity.upper, occurrence.upper, method = "spearman")
## permutation-based Mantel test: randomly permute the rows and columns of the
## co-occurrence matrix 10000 times; the one-sided P value is the fraction of
## permuted correlations at least as extreme as the observed one
n.species <- nrow(occurrence.score)
null.stat <- replicate(10000, {
  perm <- sample(n.species)
  permuted <- occurrence.score[perm, perm]
  permuted[upper.tri(permuted)]
})
competition.null <- cor(competition.upper, null.stat, method = "spearman")
complementarity.null <- cor(complementarity.upper, null.stat, method = "spearman")
## positive correlation: count permutations as high as or higher
sum(competition.null >= competition.cor)          ## 0, p.competition < 1e-4
## negative correlation: count permutations as low as or lower
sum(complementarity.null <= complementarity.cor)  ## 0, p.complementarity < 1e-4
We found that the competition index is significantly positively correlated with co-occurrence (cor = 0.165, P < 10^-4, Mantel correlation test), whereas the complementarity index is significantly negatively correlated with co-occurrence (cor = -0.259, P < 10^-4, Mantel correlation test). This suggests that competition is likely to be a key factor in the assembly of gut microbial communities.
The versions of R and of the packages loaded when generating this vignette were:
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RevEcoR_0.99.1 knitr_1.10.5
##
## loaded via a namespace (and not attached):
## [1] flexmix_2.3-12 igraph_0.7.1 Rcpp_0.11.5
## [4] XVector_0.6.0 magrittr_1.5 MASS_7.3-35
## [7] BiocGenerics_0.12.0 zlibbioc_1.12.0 IRanges_2.0.0
## [10] munsell_0.4.2 colorspace_1.2-6 lattice_0.20-31
## [13] stringr_0.6.2 httr_0.5 plyr_1.8.1
## [16] tools_3.2.0 nnet_7.3-9 parallel_3.2.0
## [19] grid_3.2.0 gtable_0.1.2 Biobase_2.26.0
## [22] biom_0.3.13 png_0.1-7 modeltools_0.2-21
## [25] gtools_3.4.1 digest_0.6.4 mmnet_1.4.0
## [28] RJSONIO_1.3-0 Matrix_1.2-0 reshape2_1.4
## [31] ggplot2_1.0.0 formatR_1.0 S4Vectors_0.4.0
## [34] RCurl_1.95-4.3 KEGGREST_1.6.0 evaluate_0.5.5
## [37] scales_0.2.4 Biostrings_2.34.0 stats4_3.2.0
## [40] XML_3.98-1.1 proto_0.3-10