An introduction to the R package neat

Mirko Signorelli

Introduction

What’s neat?

neat is the R package that implements NEAT, the Network Enrichment Analysis Test which is presented in Signorelli, M., Vinciotti, V., Wit, E. C. (2016). NEAT: an efficient network enrichment analysis test. ArXiv preprint, arXiv:1604.01210.

The article can be found at https://arxiv.org/abs/1604.01210

Get started

In order to be able to use the package, you need to install it with

install.packages('neat')

and, then, to load it with the command

library('neat')

A first example

Let’s first have a quick look at an example of how a network enrichment analysis can be carried out with NEAT.

The analysis will tipically consist of three steps: preparation of the data, computation of the test and inspection of the results.

Preparation of the data

Let’s start by loading yeast, a list which contains the data that we will need for the analysis:

data(yeast) # load the data
ls(yeast) # display the content of the list
## [1] "esr1"       "esr2"       "goslimproc" "kegg"       "yeastnet"  
## [6] "ynetgenes"

Let’s say that we are interested to know whether a set of differentially expressed genes, yeast$esr2, can be related to some functional gene sets contained in yeast$goslimproc. Let’s focus the attention on two of these processes, namely ‘response to heat’ and ‘response to starvation’.

Before we can proceed with the analysis, we have to create two lists of gene sets, one (which I will call induced_genes) containing the set of differentially expressed genes and the other (called functional_sets) with the functional sets of interest:

induced_genes = list('ESR 2' = yeast$esr2) # set of differentially expressed genes 
#(ESR 2 is the set of induced ESR genes)
functional_sets = yeast$goslimproc[c(72,75)] # two functional gene sets of interest: 
#response to heat and to starvation

Besides these two lists, we will need two further objects:

Computation of the test

The computation of the test can be done with the function neat as follows:

test = neat(alist = induced_genes, blist = functional_sets, network = yeast$yeastnet,
            nettype = 'undirected', nodes = yeast$ynetgenes, alpha = 0.01)

Analysis of the results

The results are now saved in the object test, which we can display with the command print:

print(test)
##       A                      B nab expected_nab pvalue     conclusion
## 1 ESR 2       response_to_heat  86         96.9 0.2518  No enrichment
## 2 ESR 2 response_to_starvation 459        331.4 0.0000 Overenrichment

From the table we can see that the set of differentially expressed genes (ESR 2) is not enriched with respect to the set of genes involved in response to heating, whereas is overenriched with respect to the set of genes that are responsible for response to starvation. Thus, we can conclude that genes in ESR 2 are activated when the yeast cell is exposed to starvation, but not when exposed to heating.

A closer look to the package

The core of the package is the function neat:

neat(alist, blist, network, nettype, nodes, alpha = NULL, 
     anames = NULL, bnames = NULL)

The fundamental arguments of the function are:

Moreover, three optional arguments are alpha, which allows to specify the significance level of the test, and anames and bnames (they can be used to name the elements of alist and blist, if not already named).

As a (toy) example, let’s consider a partially directed network with 7 nodes defined by the following adjacency matrix

A = matrix(0, nrow=7, ncol=7)
labels = letters[1:7]
rownames(A) = labels; colnames(A) = labels
A[1,c(2,3)]=1; A[2,c(5,7)]=1;A[3,c(1,4)]=1;A[4,c(2,5,7)]=1;A[6,c(2,5)]=1;A[7,4]=1
print(A)
##   a b c d e f g
## a 0 1 1 0 0 0 0
## b 0 0 0 0 1 0 1
## c 1 0 0 1 0 0 0
## d 0 1 0 0 1 0 1
## e 0 0 0 0 0 0 0
## f 0 1 0 0 1 0 0
## g 0 0 0 1 0 0 0

How to specify the lists of gene sets

Let’s consider three sets of genes {a,e}, {c,g} and {d,f} and suppose we want to test whether there is enrichment from the first two sets to the third one.

First of all, let’s create a vector for each of the three sets:

set1 = c('a','e')
set2 = c('c','g')
set3 = c('d','f')

As we want to know whether there is enrichment from set1 and set2 to set3, we can create two gene lists, one (alist) containing set1 and set2 and the other (blist) containing set3:

alist = list('set 1' = set1, 'set 2' = set2)
blist = list('set 3' = set3)

Alternative network formats

Above we have defined the network with its adjacency matrix A. However, the network can be passed to neat in three alternative formats:

library(Matrix)
as(A, 'sparseMatrix')
## 7 x 7 sparse Matrix of class "dgCMatrix"
##   a b c d e f g
## a . 1 1 . . . .
## b . . . . 1 . 1
## c 1 . . 1 . . .
## d . 1 . . 1 . 1
## e . . . . . . .
## f . 1 . . 1 . .
## g . . . 1 . . .
##       [,1] [,2]
##  [1,] "a"  "b" 
##  [2,] "a"  "c" 
##  [3,] "b"  "e" 
##  [4,] "b"  "g" 
##  [5,] "c"  "a" 
##  [6,] "c"  "d" 
##  [7,] "d"  "b" 
##  [8,] "d"  "e" 
##  [9,] "d"  "g" 
## [10,] "f"  "b" 
## [11,] "f"  "e" 
## [12,] "g"  "d"

Network type

Set the argument nettype equal to 'undirected' if an undirected network is at hand, and equal to 'directed' if you are considering a directed or partially directed network.

Compute the test

Once you have prepared the lists of gene sets and the network, what you need is to run neat, without forgetting to specify the correct nettype (here nettype = 'directed) and the labels of nodes (here nodes = labels):

test1 = neat(alist = alist, blist = blist, network = A, 
             nettype = 'directed', nodes = labels)
print(test1)
##       A     B nab expected_nab     pvalue
## 1 set 1 set 3   0       0.3333 0.68181818
## 2 set 2 set 3   2       0.5000 0.04545455

If you want to add to the results a column specifying the conclusion of the test (overenrichment, no enrichment or underenrichment) for a given significance level, you use the option alpha:

test2 = neat(alist = alist, blist = blist, network = A, 
             nettype = 'directed', nodes = labels, alpha = 0.05)
print(test2)
##       A     B nab expected_nab     pvalue     conclusion
## 1 set 1 set 3   0       0.3333 0.68181818  No enrichment
## 2 set 2 set 3   2       0.5000 0.04545455 Overenrichment

Further details and material

The aim of this vignette is to provide a quick introduction to the computation of NEAT using R. Here I focused my attention on the fundamental aspects that one needs to use the package.

Further details, functions and examples can be found in the manual of the package, which is available at https://cran.r-project.org/package=neat

The description of the method is available in the article by Signorelli et al. (2016) which you can find here: https://arxiv.org/abs/1604.01210. A shorter version of the paper was presented at the 31st IWSM and published in the Conference proceedings: https://www.researchgate.net/publication/305082810_NEAT_an_efficient_Network_Enrichment_Analysis_Test