We consider the acute lymphocytic leukemia dataset, included with the simPATHy package and first published in Chiaretti et al. (2005). This dataset contains the expression of 3405 genes in two conditions with sample sizes \(n_1=37\) and \(n_2=41\).
library(simPATHy)
library(graph)
data("chimera")
dim(chimera)
## [1] 3405 78
table(colnames(chimera))
##
## 1 2
## 37 41
The column names indicate the condition (1 or 2) of each sample.
We take the subset of genes that participate in the KEGG “Acute Myeloid Leukemia” pathway.
Next, a directed acyclic graph is derived manually from this pathway.
graph<-gRbase::dag(~867:25+867:613+5295:867+5294:867+
207:5295+207:5294+4193:207+3551:207+
4792:3551+7157:4193+3265:6654+
3845:6654+6654:2885+2885:25+2885:613)
We take the first condition of this dataset as a reference condition for our example and begin by estimating the covariance matrix.
genes<-graph::nodes(graph)
data<-t(chimera[genes,colnames(chimera)==1])
S<-cov(data)
The matrix S does not reflect the conditional independence constraints imposed by the graph. To impose these structural constraints, simPATHy provides the function fitSgraph for maximum likelihood estimation of covariance matrices in graphical models, covering both Gaussian Bayesian networks and Gaussian graphical models.
S<-fitSgraph(graph,S)
round(S[1:5,1:5],3)
## 867 25 613 5295 5294
## 867 0.115 -0.006 0.071 0.008 0.045
## 25 -0.006 0.240 0.000 0.000 -0.003
## 613 0.071 0.000 0.121 0.005 0.028
## 5295 0.008 0.000 0.005 0.317 0.003
## 5294 0.045 -0.003 0.028 0.003 0.246
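The effect of imposing a graph can be seen on a toy chain x1 -> x2 -> x3. The following is a minimal base R sketch, not the fitSgraph implementation: the variable names and the simulated data are hypothetical, and it relies on the standard result that, for this chain, the fit keeps the (x1, x2) and (x2, x3) entries of the sample covariance and recomputes the (x1, x3) entry so that the matching entry of the inverse covariance is zero.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)
x3 <- 0.5 * x2 + rnorm(n)
S_toy <- cov(cbind(x1 = x1, x2 = x2, x3 = x3))

# Fit under the chain x1 -> x2 -> x3 (hypothetical toy graph):
# keep the (x1, x2) and (x2, x3) entries, recompute cov(x1, x3)
S_fit <- S_toy
S_fit["x1", "x3"] <- S_toy["x1", "x2"] * S_toy["x2", "x3"] / S_toy["x2", "x2"]
S_fit["x3", "x1"] <- S_fit["x1", "x3"]

K_toy <- solve(S_toy)  # inverse of the sample covariance
K_fit <- solve(S_fit)  # inverse of the fitted covariance
round(K_fit, 3)
```

After the fit, the (x1, x3) entry of the inverse covariance is zero up to rounding, which is exactly the conditional independence of x1 and x3 given x2 that the chain encodes.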
The package also provides two plotting functions for graphical models. The first, plotGraphNELD3, focuses on the graphical structure, while the second, plotCorGraph, focuses on the correlation matrix. In both cases, the colors represent the strength of the relation between nodes; by setting the type parameter, the user chooses whether to show the pairwise correlation coefficient (type = "cor") or the partial correlation coefficient (type = "pcor").
plotGraphNELD3(graph,type = "cor",S1 = S)
plotCorGraph(S1 = S,type = "cor")
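The difference between the two quantities can be made concrete with base R alone. The sketch below uses a hypothetical 3 x 3 covariance matrix and the standard definition of partial correlation (rescaled, sign-flipped inverse covariance); how simPATHy computes these values internally is not shown here.

```r
# Hypothetical covariance matrix; note that 0.3 = 0.6 * 0.5
S_toy <- matrix(c(1.0, 0.6, 0.3,
                  0.6, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3,
                dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Pairwise correlations: rescale the covariance matrix
R_cor <- cov2cor(S_toy)

# Partial correlations: rescale the inverse covariance and flip the sign
R_pcor <- -cov2cor(solve(S_toy))
diag(R_pcor) <- 1

round(R_cor, 3)
round(R_pcor, 3)
```

In this toy matrix a and c have a pairwise correlation of 0.3, but their partial correlation given b is zero, so the two plot types would color that pair differently.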
When the number of nodes is high and the relations between them are weak, a user can improve visibility by adjusting the color range colLim (uncomment to launch).
lim<-round(max(abs(simPATHy:::riscala(S))[upper.tri(S)]),2)
#plotGraphNELD3(graph,type = "cor",S1 = S,colLim = c(-lim,lim))
plotCorGraph(S1 = S,type ="cor",colLim = c(-lim,lim))
Note that when an element is outside the colLim interval, it is colored gray in plotCorGraph and represented as a dashed link in plotGraphNELD3.
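A symmetric color range can also be derived with base R only. This is a sketch on a hypothetical covariance matrix; it uses cov2cor in place of the internal helper simPATHy:::riscala, on the assumption (not stated in the text) that the helper rescales a covariance matrix to a correlation matrix.

```r
# Hypothetical covariance matrix
S_toy <- matrix(c(4.0, 1.2, 0.2,
                  1.2, 1.0, 0.1,
                  0.2, 0.1, 1.0), nrow = 3)
R_toy <- cov2cor(S_toy)

# Largest absolute off-diagonal correlation, rounded to two decimals
lim_toy <- round(max(abs(R_toy[upper.tri(R_toy)])), 2)
lim_toy

# Entries that would fall outside a narrower, hand-picked range
narrow  <- c(-0.4, 0.4)
outside <- R_toy < narrow[1] | R_toy > narrow[2]
outside[upper.tri(outside)]
```

With the data-driven limit every off-diagonal entry is inside the range; with the narrower range the strongest pair falls outside and would be drawn in gray or as a dashed link.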
When plotting a correlation matrix, a user can also pass the associated graph to the plotCorGraph function to overlay the adjacency matrix on the correlation (or partial correlation) matrix.
plotCorGraph(S1 = S,type = "cor", graph = graph)
The zero elements of the adjacency matrix are represented as shaded squares, whereas non-zero elements are represented as squares with a grey border.
Now that we have defined a graph and obtained a covariance matrix for the reference condition, we select a path that is to be dysregulated in the second condition. In simPATHy, a path is defined as a list of edges of the graph. It can be set manually:
path <- list(c("613","867"),c("867","5295"),c("5295","207"),
c("207","4193"),c("4193","7157"))
Alternatively, simPATHy provides a generatePath function that finds the shortest path connecting two given nodes.
path <- generatePath(graph,from="613",to="7157")
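To make the idea of a shortest path as a list of edges concrete, here is a breadth-first search in base R. It is only an illustration, not the generatePath implementation: the helper shortest_path_edges is hypothetical, and the edge list is a hand-copied part of the tutorial graph, reading each term child:parent of the formula as an edge from parent to child (an assumption about the formula convention).

```r
# Directed edges (from, to) for part of the tutorial graph
edges <- rbind(c("613", "867"),  c("25", "867"),   c("867", "5295"),
               c("867", "5294"), c("5295", "207"), c("5294", "207"),
               c("207", "4193"), c("207", "3551"), c("4193", "7157"))

# Hypothetical helper: breadth-first search returning a list of edges
shortest_path_edges <- function(edges, from, to) {
  parent <- setNames(NA_character_, from)  # visited node -> predecessor
  queue  <- from
  while (length(queue) > 0) {
    node  <- queue[1]
    queue <- queue[-1]
    if (node == to) break
    children <- edges[edges[, 1] == node, 2]
    for (ch in setdiff(children, names(parent))) {
      parent[ch] <- node
      queue <- c(queue, ch)
    }
  }
  if (!(to %in% names(parent))) return(NULL)  # target not reachable
  # Walk back from the target to the source
  nodes <- to
  while (!is.na(parent[nodes[1]])) nodes <- c(parent[nodes[1]], nodes)
  nodes <- unname(nodes)
  lapply(seq_len(length(nodes) - 1), function(i) nodes[i:(i + 1)])
}

path_toy <- shortest_path_edges(edges, from = "613", to = "7157")
sapply(path_toy, paste, collapse = "~")
```

Two routes of equal length exist here (through 5295 or through 5294); this sketch returns the one whose edges appear first in the edge list.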
Finally, a user can select a path in an interactive plot by calling the function getPathShiny. The desired path is chosen edge by edge, and upon completion the user presses a button.
path <- getPathShiny(graph)
By dysregulation we mean a multiplicative change of a subset of the correlation matrix. To specify the strength of dysregulation, a user provides two positive parameters, min and max. The strength of dysregulation is then sampled uniformly from the interval [min, max]: a value smaller than 1 represents deactivation (the relation between two variables is weakened), and a value greater than 1 represents activation (the relation between two variables is strengthened). These parameters are specified for each path edge separately.
min<-c(2,8,2,0.1,0.5)
max<-c(2,10,2,4,0.5)
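The sampling step described above can be mimicked with runif. This is a sketch of the idea rather than the package's internal code; the names min_toy, max_toy and strength_toy are hypothetical and are used to avoid overwriting the tutorial's objects.

```r
set.seed(1)
min_toy <- c(2, 8, 2, 0.1, 0.5)
max_toy <- c(2, 10, 2, 4, 0.5)

# One uniform draw per path edge; when min equals max the value is fixed
strength_toy <- runif(length(min_toy), min = min_toy, max = max_toy)

# Values above 1 strengthen the relation, values below 1 weaken it
type_toy <- ifelse(strength_toy > 1, "activation",
                   ifelse(strength_toy < 1, "deactivation", "unchanged"))
data.frame(min = min_toy, max = max_toy,
           strength = strength_toy, type = type_toy)
```

Edges with min equal to max receive a deterministic strength, while the fourth edge, with the interval [0.1, 4], may end up either activated or deactivated.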
In some applications it may be of interest to change the direction of the relation between two variables (the correlation coefficient changes sign). To allow for this possibility, simPATHy provides the prob parameter. prob is a number between 0 and 1, with 0 implying that the sign of the correlation coefficient is changed and 1 implying that the sign is left unaltered (default setting). Values between the two extremes allow for a random sign switch: the sign is changed with probability 1-prob.
prob<-c(1,0,0,0.5,1)
dys<-cbind(min,max,prob)
rownames(dys)<-sapply(path,paste,collapse = "~")
dys
## min max prob
## 613~867 2.0 2.0 1.0
## 867~5295 8.0 10.0 0.0
## 5295~207 2.0 2.0 0.0
## 207~4193 0.1 4.0 0.5
## 4193~7157 0.5 0.5 1.0
For example, the correlation coefficient between variables 613 and 867 is to be activated in the dysregulated condition, more precisely multiplied by two, while the relation between 4193 and 7157 is to be deactivated in the second condition (the correlation coefficient is multiplied by 0.5). On the other hand, the nature of the relation between 5295 and 207 is to be changed in the dysregulated condition (switch), since prob=0 implies a sign change.
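The role of prob can be sketched in a few lines of base R. The code below is an illustration under stated assumptions, not the package's internal code: the magnitudes in magnitude_toy are hypothetical stand-ins for values already drawn from [min, max], and the sign is kept with probability prob.

```r
set.seed(1)
prob_toy      <- c(1, 0, 0, 0.5, 1)
magnitude_toy <- c(2, 9, 2, 1.5, 0.5)  # hypothetical strengths, one per edge

# Keep the sign with probability prob, flip it with probability 1 - prob
keep_sign <- runif(length(prob_toy)) < prob_toy
sign_toy  <- ifelse(keep_sign, 1, -1)

# Signed multiplier applied to each edge's correlation coefficient
multiplier_toy <- sign_toy * magnitude_toy
multiplier_toy
```

Edges with prob equal to 1 always keep their sign, edges with prob equal to 0 always switch, and only the fourth edge depends on the random draw.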
After choosing the sample sizes n1 and n2 for the two conditions (the default is 500), we have set all the required parameters and can proceed by calling the main function simPATHy.
set.seed(123)
Result<-simPATHy(graph,path,S,min,max,prob)
The output is an object of class simPATHy, represented as a list of nine elements.
class(Result)
## [1] "simPATHy"
names(Result)
## [1] "dataset" "S1" "S2" "path" "strength"
## [6] "param" "correction" "mu1" "mu2"
The key element is the simulated dataset, containing n1+n2 observations from two conditions (the reference condition cl1 and the dysregulated condition cl2), sampled from multivariate normal distributions with covariance matrices Result$S1 and Result$S2, respectively.
round(Result$dataset[c(1:3,501:503),1:5],3)
## 867 25 613 5295 5294
## cl1 0.620 -0.045 0.132 0.178 0.696
## cl1 -0.125 -0.495 -0.460 -0.121 -0.258
## cl1 -0.228 -0.099 0.283 -0.023 -0.002
## cl2 -0.169 1.201 -0.201 0.374 0.545
## cl2 -0.557 0.680 -0.303 0.631 0.580
## cl2 0.555 1.337 0.304 -0.391 -0.694
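The structure of such a dataset can be reproduced with a small base R sketch. It samples from two zero-mean multivariate normal distributions through a Cholesky factor; the helper rmvn_sketch and the 2 x 2 covariance matrices are hypothetical, and the sampler that simPATHy uses internally may differ.

```r
set.seed(1)

# Hypothetical helper: n draws from N(mu, Sigma) via the Cholesky factor
rmvn_sketch <- function(n, mu, Sigma) {
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  X <- Z %*% chol(Sigma)  # chol() returns U with t(U) %*% U = Sigma
  sweep(X, 2, mu, "+")
}

genes  <- c("g1", "g2")
S1_toy <- matrix(c(1,  0.5,  0.5, 1), 2, dimnames = list(genes, genes))
S2_toy <- matrix(c(1, -0.5, -0.5, 1), 2, dimnames = list(genes, genes))

n1 <- 500
n2 <- 500
dat_toy <- rbind(rmvn_sketch(n1, c(0, 0), S1_toy),
                 rmvn_sketch(n2, c(0, 0), S2_toy))
rownames(dat_toy) <- rep(c("cl1", "cl2"), times = c(n1, n2))
colnames(dat_toy) <- genes

round(dat_toy[c(1:3, 501:503), ], 3)
```

The first n1 rows belong to the reference condition and the remaining n2 rows to the dysregulated one, which is why rows 501 onward are labeled cl2 when n1 is 500.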
By default, observations are sampled from zero-mean normal distributions; however, a user can specify different values for mu1 and mu2.
We can also recover the dysregulation parameters.
Result$param
## $min
## [1] 2.0 8.0 2.0 0.1 0.5
##
## $max
## [1] 2.0 10.0 2.0 4.0 0.5
##
## $prob
## [1] 1.0 0.0 0.0 0.5 1.0
Sometimes the dysregulation specified by the above parameters is not admissible since the modified correlation coefficient lies outside the (-1,1) range. Furthermore, to avoid excessively strong dysregulations, the upper limit for the absolute value of the dysregulated correlation coefficient is set to \[\min(0.9; 1.25\max \left\{|\rho_{u,v}|, u\neq v\right\}),\] where \(R=(\rho_{u,v})\) is the correlation matrix of the reference condition. For this reason, simPATHy also returns the actual multiplicative constant applied to each path edge correlation coefficient.
Result$strength
## [1] 1.4933490 -8.8179538 -2.0000000 -0.2776703 0.5000000
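The upper limit above can be worked through numerically. The sketch uses a hypothetical 3 x 3 correlation matrix and assumes that a requested change exceeding the limit is clipped to it; the exact rule applied inside simPATHy is not spelled out in the text, so treat the clipping step as an assumption.

```r
# Hypothetical reference correlation matrix
R_toy <- matrix(c(1.00, 0.30, 0.05,
                  0.30, 1.00, 0.10,
                  0.05, 0.10, 1.00), nrow = 3)

# Upper limit: min(0.9, 1.25 * largest absolute off-diagonal correlation)
rho_max <- max(abs(R_toy[upper.tri(R_toy)]))
cap     <- min(0.9, 1.25 * rho_max)

# Requested dysregulation of the (1, 2) entry: multiply by 2
rho       <- R_toy[1, 2]
requested <- 2
target    <- requested * rho

# Assumed clipping: keep the sign, limit the absolute value to the cap
clipped <- sign(target) * min(abs(target), cap)
actual  <- clipped / rho
c(cap = cap, target = target, clipped = clipped, actual = actual)
```

Here the requested multiplier of 2 would give a correlation of 0.6, above the limit of 0.375, so the multiplier actually applied is 1.25. The same mechanism would explain a reported strength smaller than the requested one.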
When the dysregulation of the initial (reference condition) covariance matrix leads to a matrix that is no longer positive definite, the resulting matrix is corrected via the internal function makePositiveDefinite. The output also reports whether the correction has been performed and, if so, the constant added to the diagonal.
Result$correction
## $isCorrected
## [1] TRUE
##
## $correction
## [1] 0.1
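One simple way to repair a matrix by adding a constant to its diagonal is sketched below. This is not the internal makePositiveDefinite function, whose rule is not described here: the helper make_pd_sketch, the step size and the test matrix are all hypothetical.

```r
# Hypothetical helper: add `step` to the diagonal until all eigenvalues are positive
make_pd_sketch <- function(S, step = 0.1, max_iter = 100) {
  min_eig <- function(M) {
    min(eigen(M, symmetric = TRUE, only.values = TRUE)$values)
  }
  iter <- 0
  while (min_eig(S) <= 0 && iter < max_iter) {
    diag(S) <- diag(S) + step
    iter <- iter + 1
  }
  list(S = S, isCorrected = iter > 0, correction = iter * step)
}

# Symmetric matrix with unit diagonal that is not positive definite
bad <- matrix(c( 1.00, 0.88, -0.88,
                 0.88, 1.00,  0.88,
                -0.88, 0.88,  1.00), nrow = 3)

fixed <- make_pd_sketch(bad)
fixed$isCorrected
fixed$correction
```

Adding a constant to the diagonal shifts every eigenvalue by that constant, so the off-diagonal covariances are unchanged while the correlations shrink toward zero.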
A summary of the output is provided by the function easyLookDys.
easyLookDys(Result)
edge | type | strength | cov.S1 | cov.S2 | cor..S1 | cor..S2 |
---|---|---|---|---|---|---|
613~867 | activation | 1.4933 | 0.0711 | 0.1062 | 0.3260 | 0.4869 |
867~5295 | switch | -8.8180 | 0.0084 | -0.0737 | 0.0279 | -0.2460 |
5295~207 | switch | -2.0000 | -0.0366 | 0.0682 | -0.0725 | 0.1353 |
207~4193 | switch | -0.2777 | 0.0240 | -0.0066 | 0.0700 | -0.0194 |
4193~7157 | deactivation | 0.5000 | 0.0216 | 0.0108 | 0.1157 | 0.0578 |
To visualize the differences between the two conditions, we can use the plotting functions mentioned previously, plotGraphNELD3 and plotCorGraph. In this case, both functions take, in addition to the graph, two covariance matrices corresponding to the two conditions and plot the difference between them (uncomment to launch).
#plotGraphNELD3(graph,type = "cor",S1 = Result$S1, S2 = Result$S2, colLim = c(-0.4,0.4))
plotCorGraph(S1 = Result$S1, S2 = Result$S2, type = "cor",
graph = graph, path = Result$path,colLim = c(-0.4,0.4))
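The quantity shown in such a difference plot can be approximated by hand. The sketch below uses two hypothetical covariance matrices and assumes that the difference is taken as the second condition minus the first on the correlation scale; the direction used by the plotting functions is not stated in the text.

```r
genes  <- c("g1", "g2")
S1_toy <- matrix(c(1,  0.5,  0.5, 1), 2, dimnames = list(genes, genes))
S2_toy <- matrix(c(2, -1.0, -1.0, 2), 2, dimnames = list(genes, genes))

# Difference of correlation matrices (assumed direction: condition 2 minus 1)
D_toy <- cov2cor(S2_toy) - cov2cor(S1_toy)
round(D_toy, 3)

# Entries falling outside the chosen color range
col_lim <- c(-0.4, 0.4)
outside <- D_toy < col_lim[1] | D_toy > col_lim[2]
outside
```

The diagonal of the difference is always zero on the correlation scale, so only changes in the relations between variables are highlighted.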
A user can examine these plots in more detail by calling the interactive easyLookShiny function.
easyLookShiny(Result, graph)
Chiaretti, Sabina, Xiaochun Li, Robert Gentleman, Antonella Vitale, Kathy S Wang, Franco Mandelli, Robin Foa, and Jerome Ritz. 2005. “Gene Expression Profiles of B-Lineage Adult Acute Lymphocytic Leukemia Reveal Genetic Patterns That Identify Lineage Derivation and Distinct Mechanisms of Transformation.” Clinical Cancer Research 11 (20). AACR: 7209–19.