Data preparation

We consider the acute lymphocytic leukemia dataset included with the simPATHy package and first published by Chiaretti et al. (2005). This dataset contains the expression of 3405 genes in two conditions with sample sizes \(n_1=37\) and \(n_2=41\).

library(simPATHy)
library(graph)
data("chimera")
dim(chimera)
## [1] 3405   78
table(colnames(chimera))
## 
##  1  2 
## 37 41

The column names indicate the condition (1 or 2) of each sample.

We restrict the data to the genes participating in the KEGG "Acute Myeloid Leukemia" pathway.

Next, a directed acyclic graph is derived manually from this pathway.

graph<-gRbase::dag(~867:25+867:613+5295:867+5294:867+
207:5295+207:5294+4193:207+3551:207+
4792:3551+7157:4193+3265:6654+
3845:6654+6654:2885+2885:25+2885:613)

We take the first condition of this dataset as a reference condition for our example and begin by estimating the covariance matrix.

genes<-graph::nodes(graph)
data<-t(chimera[genes,colnames(chimera)==1])
S<-cov(data) 

The matrix S does not reflect the conditional independence constraints imposed by the graph. To impose these structural constraints, simPATHy provides the function fitSgraph for maximum likelihood estimation of covariance matrices in graphical models, covering both Gaussian Bayesian networks and Gaussian graphical models.

S<-fitSgraph(graph,S)
round(S[1:5,1:5],3)
##         867     25   613  5295   5294
## 867   0.115 -0.006 0.071 0.008  0.045
## 25   -0.006  0.240 0.000 0.000 -0.003
## 613   0.071  0.000 0.121 0.005  0.028
## 5295  0.008  0.000 0.005 0.317  0.003
## 5294  0.045 -0.003 0.028 0.003  0.246

The package also provides two plotting functions for graphical models. The first, plotGraphNELD3, focuses on the graphical structure, while the second, plotCorGraph, focuses on the correlation matrix. In both cases the colors represent the strength of the relation between nodes, and the parameter type selects whether the pairwise correlation coefficient (type = "cor") or the partial correlation coefficient (type = "pcor") is shown.

plotGraphNELD3(graph,type = "cor",S1 = S)
plotCorGraph(S1 = S,type = "cor")
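The difference between the two values of type can be illustrated in base R. The sketch below uses a made-up 3x3 covariance matrix (not the leukemia data) and the standard identities: correlations come from rescaling the covariance matrix, partial correlations from rescaling its inverse and flipping the sign. How simPATHy computes these quantities internally is not shown in this document, so this is only an illustration of the concepts.

```r
# Toy covariance matrix (hypothetical values): cov(a, c) = cov(a, b) * cov(b, c),
# so a and c are correlated only through b.
S_toy <- matrix(c(1.00, 0.50, 0.25,
                  0.50, 1.00, 0.50,
                  0.25, 0.50, 1.00), nrow = 3,
                dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Pairwise correlations: rescale the covariance matrix to unit diagonal.
cor_toy <- cov2cor(S_toy)

# Partial correlations: rescale the inverse covariance (precision) matrix
# and flip the sign of the off-diagonal entries.
K <- solve(S_toy)
pcor_toy <- -cov2cor(K)
diag(pcor_toy) <- 1

round(cor_toy, 3)
round(pcor_toy, 3)
```

In this toy example a and c have a nonzero correlation but a partial correlation of (numerically) zero, which is the kind of conditional independence that a missing edge in a graphical model encodes.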

When the number of nodes is high and the relations between them are weak, a user can improve visibility by adjusting the color range colLim (uncomment to launch).

lim<-round(max(abs(simPATHy:::riscala(S))[upper.tri(S)]),2)
#plotGraphNELD3(graph,type = "cor",S1 = S,colLim = c(-lim,lim))
plotCorGraph(S1 = S,type ="cor",colLim = c(-lim,lim))

Note that when an element is outside of the colLim interval, it is colored gray in plotCorGraph and represented as a dashed link in plotGraphNELD3.

When plotting a correlation matrix, a user can also pass the associated graph to the plotCorGraph function to plot the adjacency matrix over the correlation (or partial correlation) matrix.

plotCorGraph(S1 = S,type = "cor", graph = graph)

The zero elements of the adjacency matrix are represented as shaded squares, whereas non-zero elements are represented as squares with a grey border.

Selecting a path in a graph

Now that we have defined a graph and obtained a covariance matrix for the reference condition, we select the path that is to be altered in the dysregulated condition. For simPATHy, a path is defined as a list of edges of the graph. It can be set manually:

path <- list(c("613","867"),c("867","5295"),c("5295","207"),
 c("207","4193"),c("4193","7157"))

Alternatively, simPATHy provides a generatePath function that finds the shortest path connecting two given nodes.

path <- generatePath(graph,from="613",to="7157")
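To make the idea of a shortest path concrete, here is a minimal base R sketch of a breadth-first search over the edge list of the graph defined earlier. It is not the generatePath implementation: the helper bfs_path is hypothetical, and it ignores edge direction, which is an assumption made for simplicity.

```r
# Edges copied from the dag() formula above, one formula term per row.
# Direction is ignored in this sketch (assumption).
edges <- rbind(c("867","25"), c("867","613"), c("5295","867"), c("5294","867"),
               c("207","5295"), c("207","5294"), c("4193","207"), c("3551","207"),
               c("4792","3551"), c("7157","4193"), c("3265","6654"),
               c("3845","6654"), c("6654","2885"), c("2885","25"), c("2885","613"))

# Hypothetical helper: breadth-first search returning a list of two-node edges.
bfs_path <- function(edges, from, to) {
  neighbours <- function(v) {
    unique(c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]))
  }
  pred <- character(0)   # pred[node] = node from which it was first reached
  visited <- from
  queue <- from
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    if (v == to) break
    for (w in setdiff(neighbours(v), visited)) {
      visited <- c(visited, w)
      pred[w] <- v
      queue <- c(queue, w)
    }
  }
  if (!(to %in% visited)) return(NULL)   # the two nodes are not connected
  # Walk back from the target to the source using the predecessors.
  nodes <- to
  while (nodes[1] != from) nodes <- c(pred[[nodes[1]]], nodes)
  lapply(seq_len(length(nodes) - 1), function(i) nodes[i:(i + 1)])
}

p <- bfs_path(edges, from = "613", to = "7157")
sapply(p, paste, collapse = "~")
```

The result has the same list-of-edges format as the manually specified path, so it could be passed on in the same way.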

Finally, a user can select a path in an interactive plot by calling the function getPathShiny. The path is chosen edge by edge; once it is complete, the user confirms the selection by pressing a button.

path <- getPathShiny(graph)

Selecting parameters of dysregulation

By dysregulation we mean a multiplicative change of a subset of the entries of a correlation matrix. To specify the strength of dysregulation, a user provides two positive parameters, min and max. The strength of dysregulation is then sampled uniformly from the interval [min, max]: a value smaller than 1 represents deactivation (the relation between two variables is weakened), and a value greater than 1 represents activation (the relation between two variables is strengthened). These parameters are specified for each path edge separately.

min<-c(2,8,2,0.1,0.5)
max<-c(2,10,2,4,0.5)
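As a rough illustration of what these two vectors mean, the following base R sketch draws one strength per edge uniformly from [min, max]. It only mirrors the description above; the sampling code inside simPATHy is not shown in this document, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
min <- c(2, 8, 2, 0.1, 0.5)
max <- c(2, 10, 2, 4, 0.5)

# One draw per path edge; runif() is vectorised over min and max.
# Where min equals max (edges 1, 3 and 5) the strength is fixed.
strength <- runif(length(min), min = min, max = max)
strength
```

Setting min equal to max is therefore a way to request an exact multiplier, while a wide interval such as [0.1, 4] can produce either a deactivation or an activation.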

In some applications it might be of interest to change the direction of the relation between two variables (the correlation coefficient changes sign). To allow for this possibility, simPATHy provides the prob parameter: a number between 0 and 1, where 0 means that the sign of the correlation coefficient is always changed and 1 means that the sign is left unaltered (the default). Intermediate values give a random sign switch: the sign is changed with probability 1-prob.

prob<-c(1,0,0,0.5,1)
dys<-cbind(min,max,prob)
rownames(dys)<-sapply(path,paste,collapse = "~")
dys
##           min  max prob
## 613~867   2.0  2.0  1.0
## 867~5295  8.0 10.0  0.0
## 5295~207  2.0  2.0  0.0
## 207~4193  0.1  4.0  0.5
## 4193~7157 0.5  0.5  1.0

For example, the correlation coefficient between variables 613 and 867 is to be activated in the dysregulated condition (multiplied by two), while the relation between 4193 and 7157 is to be deactivated (its correlation coefficient multiplied by 0.5). The relation between 5295 and 207, on the other hand, is to change direction in the dysregulated condition (a switch), since prob=0 implies a sign change.
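The role of prob can be sketched in base R as follows. The strength values are illustrative magnitudes chosen inside the [min, max] intervals, and label_dys is a hypothetical helper whose labels follow the terms used in this document (a negative multiplier is a switch); none of this is taken from the simPATHy source.

```r
set.seed(2)  # arbitrary seed for this sketch
prob <- c(1, 0, 0, 0.5, 1)
strength <- c(2, 9, 2, 1.5, 0.5)   # illustrative magnitudes within [min, max]

# A uniform draw is below prob with probability prob, so the sign is kept
# with probability prob and flipped with probability 1 - prob.
keep_sign <- runif(length(prob)) < prob
signed_strength <- ifelse(keep_sign, 1, -1) * strength

# Hypothetical labelling helper: negative multiplier = switch,
# otherwise activation above 1 and deactivation below 1
# (a multiplier of exactly 1 is not treated separately here).
label_dys <- function(s) {
  ifelse(s < 0, "switch", ifelse(s > 1, "activation", "deactivation"))
}

data.frame(prob, signed_strength, type = label_dys(signed_strength))
```

With prob equal to 1 or 0 the outcome is deterministic; only the fourth edge (prob = 0.5) can come out either way.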

Results

After choosing the sample sizes n1 and n2 for the two conditions (default is 500), we have set all the required parameters and can proceed by calling the main function simPATHy.

set.seed(123)
Result<-simPATHy(graph,path,S,min,max,prob)

The output is a simPATHy class object represented by a list of nine elements.

class(Result)
## [1] "simPATHy"
names(Result)
## [1] "dataset"    "S1"         "S2"         "path"       "strength"  
## [6] "param"      "correction" "mu1"        "mu2"

The key element is the simulated dataset containing n1+n2 observations from two conditions, the reference condition cl1 and the dysregulated condition cl2, sampled from multivariate normal distributions with covariance matrices Result$S1 and Result$S2, respectively.

round(Result$dataset[c(1:3,501:503),1:5],3)
##        867     25    613   5295   5294
## cl1  0.620 -0.045  0.132  0.178  0.696
## cl1 -0.125 -0.495 -0.460 -0.121 -0.258
## cl1 -0.228 -0.099  0.283 -0.023 -0.002
## cl2 -0.169  1.201 -0.201  0.374  0.545
## cl2 -0.557  0.680 -0.303  0.631  0.580
## cl2  0.555  1.337  0.304 -0.391 -0.694

By default observations are sampled from zero mean normal distributions; however, a user can specify different values for mu1 and mu2.
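For readers unfamiliar with sampling from a multivariate normal distribution with a given mean and covariance, one standard approach uses the Cholesky factor of the covariance matrix. The base R sketch below shows it on a made-up 2x2 example; rmvn_sketch is a hypothetical helper, and the document does not state which sampler simPATHy uses internally.

```r
set.seed(3)  # arbitrary seed for this sketch

# Hypothetical helper: draw n observations from N(mu, Sigma).
rmvn_sketch <- function(n, mu, Sigma) {
  p <- length(mu)
  U <- chol(Sigma)                      # upper triangular, t(U) %*% U == Sigma
  Z <- matrix(rnorm(n * p), nrow = n)   # n x p independent standard normals
  sweep(Z %*% U, 2, mu, "+")            # rows now have covariance Sigma, mean mu
}

Sigma <- matrix(c(1, 0.5,
                  0.5, 2), nrow = 2)
X <- rmvn_sketch(5000, mu = c(0, 3), Sigma = Sigma)

round(colMeans(X), 2)   # close to the requested mean
round(cov(X), 2)        # close to Sigma
```

Passing a zero vector for mu corresponds to the default described above, and a nonzero vector shifts the simulated observations without changing their covariance.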

We can also recover the dysregulation parameters.

Result$param
## $min
## [1] 2.0 8.0 2.0 0.1 0.5
## 
## $max
## [1]  2.0 10.0  2.0  4.0  0.5
## 
## $prob
## [1] 1.0 0.0 0.0 0.5 1.0

Sometimes the dysregulation specified by the above parameters is not admissible since the modified correlation coefficient lies outside the (-1,1) range. Furthermore, to avoid excessively strong dysregulations, the upper limit for the absolute value of the dysregulated correlation coefficient is set to \[\min(0.9; 1.25\max \left\{|\rho_{u,v}|, u\neq v\right\}),\] where \(R=(\rho_{u,v})\) is the correlation matrix of the reference condition. For this reason, simPATHy also returns the actual multiplicative constant applied to each path edge correlation coefficient.

Result$strength
## [1]  1.4933490 -8.8179538 -2.0000000 -0.2776703  0.5000000
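The upper limit above can be sketched in base R. cor_cap evaluates the formula directly; effective_strength is a hypothetical helper that assumes an over-limit target correlation is simply clipped to the limit, which is one plausible reading of the description and not a statement about the simPATHy internals. The matrix values are made up.

```r
# Upper limit for |rho| in the dysregulated condition:
# min(0.9, 1.25 * max |rho_uv| over u != v).
cor_cap <- function(R) {
  min(0.9, 1.25 * max(abs(R[upper.tri(R)])))
}

# Hypothetical helper: multiplier actually applied when the requested one
# would push the correlation beyond the cap (assumes rho is not zero).
effective_strength <- function(rho, requested, cap) {
  target <- rho * requested
  if (abs(target) > cap) target <- sign(target) * cap
  target / rho
}

R_toy <- matrix(c(1.0, 0.3, 0.1,
                  0.3, 1.0, 0.2,
                  0.1, 0.2, 1.0), nrow = 3)
cap <- cor_cap(R_toy)             # 1.25 * 0.3 = 0.375
effective_strength(0.3, 2, cap)   # 0.6 exceeds the cap, so the multiplier shrinks
effective_strength(0.1, 2, cap)   # 0.2 is within the cap, so the multiplier stays 2
```

This is consistent with the output above, where the first edge was requested with a multiplier of 2 but received a smaller one.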

When dysregulating the initial (reference condition) covariance matrix yields a matrix that is no longer positive definite, the result is corrected via the internal function makePositiveDefinite. The output reports whether the correction was performed and, if so, the constant added to the diagonal.

Result$correction
## $isCorrected
## [1] TRUE
## 
## $correction
## [1] 0.1
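The idea of repairing a matrix by adding a constant to its diagonal can be illustrated in base R. The helper add_to_diagonal below is hypothetical: it increases the constant in steps of 0.1 until the smallest eigenvalue is positive. Whether makePositiveDefinite proceeds in this way is an assumption; only the fact that a constant is added to the diagonal comes from the text above.

```r
# TRUE if the symmetric matrix M has only positive eigenvalues.
is_pd <- function(M) {
  min(eigen(M, symmetric = TRUE, only.values = TRUE)$values) > 0
}

# Hypothetical helper: add a multiple of `step` to the diagonal until the
# matrix is positive definite (gives up after max_iter steps).
add_to_diagonal <- function(M, step = 0.1, max_iter = 100) {
  added <- 0
  for (i in seq_len(max_iter)) {
    if (is_pd(M + diag(added, nrow(M)))) break
    added <- added + step
  }
  list(matrix = M + diag(added, nrow(M)),
       isCorrected = added > 0,
       correction = added)
}

# Made-up symmetric matrix with unit diagonal that is not positive definite:
# its smallest eigenvalue is 1 - 2 * 0.72 = -0.44.
M <- matrix(c( 1.00, 0.72, -0.72,
               0.72, 1.00,  0.72,
              -0.72, 0.72,  1.00), nrow = 3)
res <- add_to_diagonal(M)
res$isCorrected
res$correction
```

Adding a constant to the diagonal shifts every eigenvalue by that constant, which is why a sufficiently large value always works; it also inflates the variances, so the correlations implied by the corrected matrix are somewhat smaller in absolute value.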

The summary of the output is provided by the function easyLookDys.

easyLookDys(Result)
edge        type          strength   cov.S1   cov.S2  cor..S1  cor..S2
613~867     activation      1.4933   0.0711   0.1062   0.3260   0.4869
867~5295    switch         -8.8180   0.0084  -0.0737   0.0279  -0.2460
5295~207    switch         -2.0000  -0.0366   0.0682  -0.0725   0.1353
207~4193    switch         -0.2777   0.0240  -0.0066   0.0700  -0.0194
4193~7157   deactivation    0.5000   0.0216   0.0108   0.1157   0.0578

To visualize the differences between the two conditions we can use the plotting functions mentioned previously, plotGraphNELD3 and plotCorGraph. In this case both functions take, in addition to the graph, two covariance matrices corresponding to the two conditions and plot the difference between them (uncomment to launch).

#plotGraphNELD3(graph,type = "cor",S1 = Result$S1, S2 = Result$S2, colLim = c(-0.4,0.4))
plotCorGraph(S1 = Result$S1, S2 = Result$S2, type = "cor",
graph = graph, path = Result$path,colLim = c(-0.4,0.4))

A user can examine these plots in more detail by calling an interactive easyLookShiny function.

easyLookShiny(Result, graph)

References

Chiaretti, Sabina, Xiaochun Li, Robert Gentleman, Antonella Vitale, Kathy S Wang, Franco Mandelli, Robin Foa, and Jerome Ritz. 2005. “Gene Expression Profiles of B-Lineage Adult Acute Lymphocytic Leukemia Reveal Genetic Patterns That Identify Lineage Derivation and Distinct Mechanisms of Transformation.” Clinical Cancer Research 11 (20). AACR: 7209–19.