gsEasy has a function gset for calculating p-values of enrichment for sets (of genes) in ranked/scored lists (of genes) by permutation (see ‘Gene Set Enrichment Analysis’ described by Subramanian et al., 2005). The arguments of gset are named as in the paper:
N: the total number of genes,
S: integer vector giving the ranks of the genes in the test set amongst the N, or giving the indices within the scores vector r (see below), or a character vector of the names of genes in the test set,
r: (optional) vector of length N of correlation scores, e.g. gene expression correlation. If unspecified, it defaults to 1-(i-1)/N for the i-th gene. If S is given as the names of genes, r must either be a character vector of genes in rank order or be named by genes (necessarily containing all the genes in S).
p: a numeric value used to exponentiate the enrichment scores given by r, with higher values having the effect of increasing the weight on the highest scores/ranks (for more details, see Subramanian et al., 2005). The default value is 1 [i.e. not transformed].
Say we had a set of 5 genes which appeared at the top five ranks out of 1000 (i.e. highly enriched at the high ranks!). We could then calculate an enrichment p-value using the command:
gset(S=1:5, N=1000)
## [1] 9.9999e-06
So the p-value is close to zero. For random sets, however, the p-values are distributed uniformly:
replicate(n=10, expr=gset(S=sample.int(n=1000, size=5), N=1000))
## [1] 0.7213930 0.1111111 0.1058394 0.5870647 0.1142857 0.9850746 0.9701493
## [8] 0.1840796 0.5074627 0.8905473
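The permutation idea behind these p-values can be sketched in base R. This is a minimal illustration of a running-sum enrichment score in the style of Subramanian et al. (2005), not gsEasy's actual implementation: the helper names enrichment_score and permutation_p_value are hypothetical, the test is one-sided (enrichment towards the top of the list), and only integer ranks are handled.

```r
# Illustrative sketch only; these helpers are NOT part of gsEasy.
# Running-sum enrichment score for a set of ranks S among N genes.
enrichment_score <- function(S, N, r = 1 - (seq_len(N) - 1) / N, p = 1) {
  in_set <- seq_len(N) %in% S
  weights <- abs(r)^p
  # step up by the (weighted) score at each gene in the set ...
  hit <- cumsum(ifelse(in_set, weights, 0)) / sum(weights[in_set])
  # ... and step down by a constant at each gene outside the set
  miss <- cumsum(!in_set) / (N - sum(in_set))
  deviation <- hit - miss
  # the score is the largest deviation from zero along the list
  deviation[which.max(abs(deviation))]
}

# Compare the observed score with scores of random sets of the same size.
permutation_p_value <- function(S, N, n_perm = 1000, ...) {
  observed <- enrichment_score(S, N, ...)
  null <- replicate(n_perm, enrichment_score(sample.int(N, length(S)), N, ...))
  # add-one correction so the estimate is never exactly zero
  (sum(null >= observed) + 1) / (n_perm + 1)
}

set.seed(1)
es_top <- enrichment_score(S = 1:5, N = 1000)
p_top <- permutation_p_value(S = 1:5, N = 1000, n_perm = 200)
```

With a fixed number of permutations the smallest attainable p-value is 1/(n_perm + 1), so resolving very small p-values requires many permutations.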
Alternatively, you can pass the names of genes as S with a sorted list of gene names as r (in which case the scores default to the ranks in the list), or a numeric vector of scores named by genes as r.
gset(S=c("gene 1", "gene 5", "gene 40"), r=paste("gene", 1:100))
## [1] 0.07792208
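The other form described above, a numeric vector of scores named by genes, might look like the following sketch. The scores are made up for illustration (evenly spaced and already in decreasing order), and the call to gset is guarded so that the snippet also runs where gsEasy is not installed.

```r
# Made-up scores for 100 genes, named by gene and sorted from highest to lowest
gene_names <- paste("gene", 1:100)
scores <- setNames(seq(from = 1, to = 0.01, length.out = 100), gene_names)

# Only call gset if gsEasy is available
if (requireNamespace("gsEasy", quietly = TRUE)) {
  p_value <- gsEasy::gset(S = c("gene 1", "gene 5", "gene 40"), r = scores)
}
```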
Multiple gene sets can thus be tested for enrichment with a single call to a high-level function such as sapply (or, if you have many sets to test and multiple cores available, mclapply), for instance:
gene_sets <- c(list(1:5), replicate(n=10, simplify=FALSE, expr=sample.int(n=1000, size=5)))
names(gene_sets) <- c("enriched set", paste("unenriched set", 1:10))
gene_sets
## $`enriched set`
## [1] 1 2 3 4 5
##
## $`unenriched set 1`
## [1] 6 53 974 143 936
##
## $`unenriched set 2`
## [1] 438 237 252 458 293
##
## $`unenriched set 3`
## [1] 624 680 778 230 727
##
## $`unenriched set 4`
## [1] 997 254 714 307 513
##
## $`unenriched set 5`
## [1] 104 786 263 857 632
##
## $`unenriched set 6`
## [1] 149 970 961 73 265
##
## $`unenriched set 7`
## [1] 435 138 982 387 722
##
## $`unenriched set 8`
## [1] 667 441 506 186 913
##
## $`unenriched set 9`
## [1] 791 911 638 468 274
##
## $`unenriched set 10`
## [1] 758 626 817 698 45
sapply(gene_sets, function(set) gset(S=set, N=1000))
##      enriched set  unenriched set 1  unenriched set 2  unenriched set 3
##      0.0000099999      0.0017199828      0.2388059701      0.9353233831
##  unenriched set 4  unenriched set 5  unenriched set 6  unenriched set 7
##      0.7562189055      0.5024875622      0.0247097529      0.4228855721
##  unenriched set 8  unenriched set 9 unenriched set 10
##      0.7462686567      0.9054726368      0.5124378109
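For the multi-core variant mentioned above, a small wrapper around mclapply from the parallel package could look like this. The wrapper name test_gene_sets is hypothetical; the function applied to each set would be something like function(set) gset(S=set, N=1000), but the demonstration uses length as a stand-in so that it runs without gsEasy.

```r
library(parallel)

# Hypothetical wrapper: apply a per-set function to every gene set,
# using several cores where forking is available.
test_gene_sets <- function(gene_sets, p_value_fun, cores = 1L) {
  # mclapply relies on forking, which Windows lacks; use one core there
  if (.Platform$OS.type == "windows") cores <- 1L
  unlist(mclapply(gene_sets, p_value_fun, mc.cores = cores))
}

# Stand-in demonstration: `length` takes the place of a gset call
demo_sets <- list(a = 1:5, b = c(10, 20, 30))
sizes <- test_gene_sets(demo_sets, length, cores = 1L)
```

The result is a named vector with one element per gene set, the same shape as the sapply result above.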
gsEasy has a function get_ontological_gene_sets for creating lists of gene sets corresponding to annotation with ontological terms such that ontological is-a relations are propagated. get_ontological_gene_sets accepts an ontological_index (see the R package ontologyIndex for more details) argument and two character vectors, corresponding to genes and terms respectively, whereby the n-th element in each vector corresponds to one annotation pair. The result, a list of character vectors of gene names, can then be used as an argument of gset.
library(ontologyIndex)
data(hpo)
df <- data.frame(
  gene=c("gene 1", "gene 2"),
  term=c("HP:0000598", "HP:0000118"),
  name=hpo$name[c("HP:0000598", "HP:0000118")],
  stringsAsFactors=FALSE,
  row.names=NULL)
df
##     gene       term                   name
## 1 gene 1 HP:0000598 Abnormality of the ear
## 2 gene 2 HP:0000118 Phenotypic abnormality
get_ontological_gene_sets(hpo, gene=df$gene, term=df$term)
## $`HP:0000001`
## [1] "gene 1" "gene 2"
##
## $`HP:0000118`
## [1] "gene 1" "gene 2"
##
## $`HP:0000598`
## [1] "gene 1"
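To make the propagation of is-a relations concrete, here is a base-R sketch on a toy ontology. It is not how gsEasy or ontologyIndex implement it: the term identifiers are made up, the ancestors list (each term mapped to itself plus all of its ancestors) is written out by hand, and propagate_gene_sets is a hypothetical helper.

```r
# Toy ontology (made-up identifiers): each term lists itself and its ancestors
ancestors <- list(
  "T:root"  = "T:root",
  "T:organ" = c("T:root", "T:organ"),
  "T:ear"   = c("T:root", "T:organ", "T:ear")
)

# Hypothetical helper: annotate each gene with its term and all ancestor
# terms, then collect the genes per term.
propagate_gene_sets <- function(ancestors, gene, term) {
  stopifnot(length(gene) == length(term))
  expanded_terms <- ancestors[term]
  # one row per (gene, term-or-ancestor) pair
  pairs <- data.frame(
    gene = rep(gene, times = lengths(expanded_terms)),
    term = unlist(expanded_terms, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  lapply(split(pairs$gene, pairs$term), unique)
}

sets <- propagate_gene_sets(
  ancestors,
  gene = c("gene 1", "gene 2"),
  term = c("T:ear", "T:organ")
)
```

As in the HPO example, the gene annotated with the most specific term appears in the sets of all its ancestor terms, while the gene annotated with the broader term does not appear in the set of the more specific one.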
gsEasy comes with a list of GO annotations, GO_gene_sets [based on annotations downloaded from geneontology.org on 07/08/2016], which can be loaded with data. This comprises a list of all gene sets (i.e. character vectors of gene names) associated with each GO term, restricted to GO terms annotated with at most 500 genes.
data(GO_gene_sets)
GO_gene_sets[1:6]
## $`GO:0000002`
## [1] "AKT3" "LONP1" "MEF2A" "MGME1" "MPV17" "MRPL17"
## [7] "MRPL39" "OPA1" "PIF1" "SLC25A33" "SLC25A36" "SLC25A4"
## [13] "TYMP"
##
## $`GO:0000003`
## [1] "EIF4H" "IL12B" "LEP" "LEPR" "MMP23A" "RHOXF1" "SEPP1" "STAT3"
## [9] "TNP1" "VGF" "WDR43"
##
## $`GO:0000009`
## [1] "ALG12"
##
## $`GO:0000010`
## [1] "PDSS1" "PDSS2"
##
## $`GO:0000011`
## [1] "RBSN"
##
## $`GO:0000012`
## [1] "APLF" "APTX" "E9PQ18" "LIG4" "M0R2N6" "Q6ZNB5" "SIRT1" "TDP1"
## [9] "TNP1" "XRCC1"
It also has a function get_GO_gene_sets, a specialisation of get_ontological_gene_sets for the Gene Ontology (GO), which can be called passing just a file path to the annotation file (official up-to-date version available at https://geneontology.org/).
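Because r must contain every gene named in S, it can help to restrict gene sets such as GO_gene_sets to the genes present in your ranked list before testing them. The following is one possible sketch of that step: restrict_gene_sets is a hypothetical helper, the set identifiers and the ranked list are toy data, and dropping genes that are absent from the ranked list is an analysis choice rather than something gsEasy prescribes.

```r
# Hypothetical helper: keep only genes present in the ranked list, then
# drop sets that end up empty or larger than max_size.
restrict_gene_sets <- function(gene_sets, ranked_genes, max_size = 500L) {
  restricted <- lapply(gene_sets, intersect, ranked_genes)
  sizes <- lengths(restricted)
  restricted[sizes > 0 & sizes <= max_size]
}

# Toy data: made-up set identifiers and a short ranked gene list
toy_sets <- list(
  "SET:A" = c("AKT3", "OPA1", "TYMP"),
  "SET:B" = c("LIG4", "XRCC1"),
  "SET:C" = "RBSN"
)
ranked_genes <- c("OPA1", "LIG4", "AKT3", "SIRT1", "XRCC1")

usable_sets <- restrict_gene_sets(toy_sets, ranked_genes)
```

Each element of usable_sets could then be passed as S to gset, with ranked_genes as r.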