The wTO
package, can be used on any kind of count data, but we highly recommend to use normalized and quality controlled data according to the data type such as RMA, MD5 for microarray, RPKM, TPM or PKM for RNA-seq, sample normalized data for metagenomics.
As an example, the package contains three data sets, two from microarray chips (Microarray_Expression1
and Microarray_Expression2
), and one from abundance in metagenomics (metagenomics_abundance
).
The wTO method is a method for building networks based on pairwise correlations normalized and corrected by all shared correlations. For this reason, the user can choose a set of factors of interest, called here Overlaps, those are the nodes that will be corrected and normalized by all other factors in the dataset. Those factors can be Transcription Factor, long non coding RNAs, a set of species of interest etc.
The wTO
package contains 2 data sets that were obtained using expression arrays (Microarray_Expression1
and Microarray_Expression2
), they were previously normalized and the quality control was done. We will use it to build the wTO network using the different methods implemented in the package.
First we are going to inspect those data sets.
require(wTO)
#> Loading required package: wTO
require(magrittr)
#> Loading required package: magrittr
data("ExampleGRF")
data("Microarray_Expression1")
data("Microarray_Expression2")
dim(Microarray_Expression1)
#> [1] 268 18
dim(Microarray_Expression2)
#> [1] 268 18
Microarray_Expression1[1:5,1:5]
#> ID1 ID2 ID3 ID4 ID5
#> FAM122B 5.325653 5.039814 5.099828 5.053185 5.213816
#> DEFB108B 2.038747 1.965599 1.925807 1.977435 2.079381
#> CCSER2 4.973347 4.865783 4.818910 5.024392 4.697314
#> GPD2 5.453287 5.595471 5.223886 5.130226 5.370672
#> HECW1 4.350837 4.279759 4.218375 4.472152 4.408025
Microarray_Expression2[1:5,1:5]
#> ID1 ID2 ID3 ID4 ID5
#> FAM122B 5.532142 6.395654 5.159082 5.806040 5.339848
#> DEFB108B 2.456210 1.993044 2.251673 2.440018 2.493610
#> CCSER2 5.164945 4.923511 4.979691 5.080116 5.014569
#> GPD2 5.742455 5.649180 5.430411 6.007418 5.662126
#> HECW1 4.595407 5.243644 4.802716 4.957706 4.738554
head(ExampleGRF)
#> x
#> 1 ACAD8
#> 2 ANAPC2
#> 3 ANKRD22
#> 4 ANKRD2
#> 5 ARHGAP35
#> 6 ASH1L
Please, note that the individuals are in the columns and the gene expressions are in the rows. Moreover, the row.names()
are the names of the genes. The list of genes that will be used for measuring the interactions are in ExampleGRF. There should always be more than 2 of them contained in the expression set. If there are no common nodes to be measured, the method will retun an error.
sum(ExampleGRF$x %in% row.names(Microarray_Expression1))
#> [1] 168
We can run the wTO
package with 3 modes. The first one is running the wTO without resampling. For that we can use the wTO()
. The second one, wTO.Complete()
, gives you the whole diagnosis plot, hard-threshold on the ωi, j, the ωi, j, |ωi, j| values and p-values. The last mode, wTO.fast()
, just returns the ωi, j values and p-value.
wTO()
function:To use the wTO()
function, the first step is to compute the correlation among the nodes of interest using CorrelationOverlap()
and then use it as input for the wTO()
. In the first function the user is allowed to choose the method for correlation between Pearson (‘p’) or Spearman (‘s’). The second function allows the choice between absolute values (‘abs’) or signed values (‘sign’). Please, keep in mind that the result of the wTO()
function is a matrix, and it can be easily converted to an edge list using the function wTO.in.line()
.
wTO_p_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'abs')
wTO_p_abs[1:5,1:5]
#> ZNF333 ZNF28 ANKRD22 ZFR TRIM33
#> ZNF333 0.352 0.237 0.269 0.242 0.241
#> ZNF28 0.237 0.287 0.209 0.206 0.239
#> ANKRD22 0.269 0.209 0.299 0.199 0.252
#> ZFR 0.242 0.206 0.199 0.328 0.258
#> TRIM33 0.241 0.239 0.252 0.258 0.361
wTO_p_abs %<>% wTO.in.line()
head(wTO_p_abs)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 0.237
#> 2: ZNF333 ANKRD22 0.269
#> 3: ZNF28 ANKRD22 0.209
#> 4: ZNF333 ZFR 0.242
#> 5: ZNF28 ZFR 0.206
#> 6: ANKRD22 ZFR 0.199
wTO_s_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'abs') %>% wTO.in.line()
head(wTO_s_abs)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 0.236
#> 2: ZNF333 ANKRD22 0.258
#> 3: ZNF28 ANKRD22 0.215
#> 4: ZNF333 ZFR 0.264
#> 5: ZNF28 ZFR 0.187
#> 6: ANKRD22 ZFR 0.193
wTO_p_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_p_sign)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 -0.099
#> 2: ZNF333 ANKRD22 -0.185
#> 3: ZNF28 ANKRD22 0.076
#> 4: ZNF333 ZFR -0.117
#> 5: ZNF28 ZFR -0.077
#> 6: ANKRD22 ZFR -0.036
wTO_s_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_s_sign)
#> Node.1 Node.2 wTO
#> 1: ZNF333 ZNF28 -0.064
#> 2: ZNF333 ANKRD22 -0.143
#> 3: ZNF28 ANKRD22 0.029
#> 4: ZNF333 ZFR -0.164
#> 5: ZNF28 ZFR -0.011
#> 6: ANKRD22 ZFR 0.024
wTO.Complete()
function:The usage of the function wTO.Complete()
is straight-forward. No plug-in-functions() are necessary. The arguments parsed to the wTO.Complete()
functions are the number k of threads that should be used for computing the ωi, j, the amount of replications, n, the expression matrix, Data, the Overlapping nodes, the correlation method (Pearson or Spearman) for the method_resampling that should be Bootstrap, BlockBootstrap or Reshuffle, the p-value correction method, pvalmethod (any from the p.adjust.methods), if the correlation should be saved, the δ is the expected difference, expected.diff, between the resampled values and the ωi, j and also if the diagnosis plot should be plotted.
wTO_s_sign_complete = wTO.Complete(k = 5, n = 250, Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', method_resampling = 'Bootstrap', pvalmethod = 'BH', savecor = TRUE, expected.diff = 0.2, plot = TRUE)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
The diagnosis plot shows the quality of the resampling (first two plots). The closer the purple line to the black line, the better. The ωi, j vs |ωi, j| shows the amount of ωi, j being affected by cancellations on the heuristics of the method, the more similar to a smile plot, the better. The last two plots show the relashionship between p-values and the ωi, j. It is expected that higher ω’s presents lower p-values.
The resulting object from the wTO.Complete()
function is a list containing: * wTO an edge list of informations such as the signed and unsigned ωi, j values its raw and adjusted p-values. * Correlation values, also as an edge list * Quantiles, the quantiles from the empirical distribution and the calculated ω’s from the original data, for both signed and unsigned networks.
wTO_s_sign_complete
#> $wTO
#> Node.1 Node.2 wTO_sign wTO_abs pval_sig pval_abs Padj_sig
#> 1: ZNF333 ZNF28 -0.099 0.237 0.168 0.004 0.3607366
#> 2: ZNF333 ANKRD22 -0.185 0.269 0.188 0.016 0.3607366
#> 3: ZNF333 ZFR -0.117 0.242 0.180 0.012 0.3607366
#> 4: ZNF333 TRIM33 0.007 0.241 0.136 0.008 0.3607366
#> 5: ZNF333 RIMS3 -0.325 0.409 0.144 0.000 0.3607366
#> ---
#> 14024: ANAPC2 SBNO2 -0.147 0.298 0.156 0.000 0.3607366
#> 14025: ANAPC2 ZNF528 -0.142 0.222 0.152 0.016 0.3607366
#> 14026: TIGD7 SBNO2 -0.297 0.354 0.128 0.004 0.3607366
#> 14027: TIGD7 ZNF528 -0.099 0.219 0.212 0.032 0.3607366
#> 14028: SBNO2 ZNF528 0.141 0.311 0.368 0.024 0.4030531
#> Padj_abs
#> 1: 0.01167541
#> 2: 0.02395390
#> 3: 0.02083624
#> 4: 0.01712559
#> 5: 0.00000000
#> ---
#> 14024: 0.00000000
#> 14025: 0.02395390
#> 14026: 0.01167541
#> 14027: 0.03670450
#> 14028: 0.03016774
#>
#> $Correlation
#> Node.1 Node.2 Cor
#> 1: FAM122B DEFB108B 0.366857931
#> 2: FAM122B CCSER2 0.278870911
#> 3: DEFB108B CCSER2 -0.252482453
#> 4: FAM122B GPD2 -0.005649124
#> 5: DEFB108B GPD2 -0.107064848
#> ---
#> 35774: TRIM23 ZNF528 0.054249174
#> 35775: ZNF559 ZNF528 -0.218309729
#> 35776: ANAPC2 ZNF528 -0.013821370
#> 35777: TIGD7 ZNF528 0.011807143
#> 35778: SBNO2 ZNF528 0.092317502
#>
#> $Quantiles
#> 0.1% 2.5% 10% 90% 97.5% 99.9%
#> Empirical.Quantile -0.56 -0.46 -0.34 0.37 0.48 0.56
#> Quantile -0.50 -0.40 -0.28 0.32 0.43 0.52
#> Empirical.Quantile.abs 0.21 0.24 0.27 0.47 0.53 0.57
#> Quantile.abs 0.17 0.19 0.21 0.41 0.47 0.53
#>
#> attr(,"class")
#> [1] "wTO" "list"
wTO.fast()
function:The wTO.fast()
function is a simplified verion of the wTO.Complete()
function, that doesn’t return diagnosis, correlation, nor the quantiles, but allows the user to choose the method for correlation, the sign of the ω to be calculated and the resampling method should be one of the two Bootrastap or BlockBootstrap. The p-values are the raw p-values and if the user desires to calculate its correction it can be easily done as shown above.
fast_example = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's', sign = 'sign', delta = 0.2, n = 250, method_resampling = 'Bootstrap')
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
head(fast_example)
#> Node.1 Node.2 wTO pval
#> 1: ZNF333 ZNF28 -0.064 0.264
#> 2: ZNF333 ANKRD22 -0.143 0.236
#> 3: ZNF28 ANKRD22 0.029 0.188
#> 4: ZNF333 ZFR -0.164 0.232
#> 5: ZNF28 ZFR -0.011 0.256
#> 6: ANKRD22 ZFR 0.024 0.264
fast_example$adj.pval = p.adjust(fast_example$pval)
Along with the expression data, the wTO
package also includes a metagenomics dataset that is the abundance of some OTU’s in bacterias collected since 1997. More about this data can be found at \[<https://www.ebi.ac.uk/metagenomics/projects/ERP013549>\].
The OTU (Operational Taxonomic Units) contains the taxonomy of the particular OTU and from Week1 to Week98, the abundance of that particular OTU in that week.
data("metagenomics_abundance")
metagenomics_abundance[2:10, 1:10]
#> OTU
#> 2 Root;k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__MarinegroupII;g__;s__
#> 3 Root;k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__OCS155;g__;s__
#> 4 Root;k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Microbacteriaceae;g__;s__
#> 5 Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__;s__
#> 6 Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__JTB248;s__
#> 7 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__;g__;s__
#> 8 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__;s__
#> 9 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__Fluviicola;s__
#> 10 Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__;s__
#> Week1 Week2 Week3 Week4 Week5 Week6 Week7 Week8 Week9
#> 2 1 6 0 0 0 1 0 1 0
#> 3 0 0 0 0 0 0 0 5 0
#> 4 0 0 0 0 0 0 0 0 0
#> 5 0 1 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0 0
#> 7 0 1 0 0 0 0 0 1 0
#> 8 0 0 0 0 0 0 0 1 0
#> 9 0 0 0 0 0 0 0 1 0
#> 10 0 1 0 0 0 0 0 7 0
Before we are able to define the network, we have first to understand the patterns of autocorrelation of each species, and then define the lag, that will be used for the BlockBootstrap resampling in the wTO.Complete()
or fast.wTO()
functions. To define the lag, we use autocorrelation function acf()
.
row.names(metagenomics_abundance) = metagenomics_abundance$OTU
metagenomics_abundance = metagenomics_abundance[,-1]
par(mfrow = c(3,3))
for ( i in 1:nrow(metagenomics_abundance)){
acf(t(metagenomics_abundance[i,]))
}
Because most of them have only a high autocorrelation with itself or maximum 2 weeks, we will use a lag of 2 for the blocks used in the bootstrap.
The functions wTO.fast()
and wTO.Complete()
are able to accomodate the lag parameter, therefore, they will be used here.
Meta_fast = wTO.fast(Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 'p', sign = 'sign', n = 250, method_resampling = 'BlockBootstrap', lag = 2)
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
Meta_Complete = wTO.Complete(k = 1, n = 250, Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 's' , method_resampling = 'BlockBootstrap', lag = 2 )
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
From the expression data-sets, we are able to draw a Consensus Network. For that, the function wTO.Consensus()
can be used. This function works in a special way, that the user should pass a list of data.frames containing the Nodes names and the wTO and p-values. We show an example above.
Let’s calculate the networks the same way we did in the Section Genomic data.
wTO_Data1 = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', n = 250)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
wTO_Data2 = wTO.fast(Data = Microarray_Expression2, Overlap = ExampleGRF$x, method = 'p', n = 250)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!
Now, let’s combine both networks in one Consensus Network.
CN_expression = wTO.Consensus(data = list (wTO_Data1 = data.frame
(Node.1 = wTO_Data1$Node.1,
Node.2 = wTO_Data1$Node.2,
wTO = wTO_Data1$wTO,
pval = wTO_Data1$pval)
, wTO_Data2C = data.frame
(Node.1 = wTO_Data2$Node.1,
Node.2 = wTO_Data2$Node.2,
wTO = wTO_Data2$wTO,
pval = wTO_Data2$pval)))
#> Joining by: Node.1, Node.2
#> Joining by: Node.1, Node.2
#> Joining by: ID
#> Total common nodes: 168
Or using the wTO.Complete()
:
wTO_Data1C = wTO.Complete(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', n = 250, k = 5)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
wTO_Data2C = wTO.Complete(Data = Microarray_Expression2, Overlap = ExampleGRF$x, method = 'p', n = 250, k = 5)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!
Now, let’s combine both networks in one Consensus Network.
CN_expression = wTO.Consensus(data = list (wTO_Data1C = data.frame
(Node.1 = wTO_Data1C$wTO$Node.1,
Node.2 = wTO_Data1C$wTO$Node.2,
wTO = wTO_Data1C$wTO$wTO_sign,
pval = wTO_Data1C$wTO$pval_sig), wTO_Data2C = data.frame
(Node.1 = wTO_Data2C$wTO$Node.1,
Node.2 = wTO_Data2C$wTO$Node.2,
wTO = wTO_Data2C$wTO$wTO_sign,
pval = wTO_Data2C$wTO$pval_sig)))
#> Joining by: Node.1, Node.2
#> Joining by: Node.1, Node.2
#> Joining by: ID
#> Total common nodes: 168
head(CN_expression)
#> Node.1 Node.2 CN pval.fisher
#> 1 ZNF333 ZNF28 -0.1191288 0.06006662
#> 2 ZNF333 ANKRD22 -0.1400000 0.13016905
#> 3 ZNF333 ZFR -0.1091659 0.10147819
#> 4 ZNF333 TRIM33 -0.0240000 0.11707440
#> 5 ZNF333 RIMS3 -0.2798101 0.10797662
#> 6 ZNF333 ZNF595 0.1542850 0.11529832
The wTO
package also includes an interactive visualization tool that can be used to inspect the results of the wTO netwoks or Consensus Network.
The arguments given to this function are the Nodes names, its wTO and p-values. Optionals are the cutoffs that can be applied to the p-value or to the wTO value. We highly reccomend using both by subseting the data previous to the visualization. The layout of the network can be also chosen from a variety that are implemented in igraph package, for the the Make_Cluster argument many clustering algorithms that are implemented in igraph can be used. The final graph can be exported as an html
or as png
.
Visualization = NetVis(Node.1 = CN_expression$Node.1,
Node.2 = CN_expression$Node.2,
wTO = CN_expression$CN,
pval = CN_expression$pval.fisher, cutoff = list(kind = 'pval', value = 0.001), MakeGroups = 'louvain', layout = 'layout_components')
#> Joining by: id
CN_expression_filtered = subset(CN_expression, abs(CN_expression$CN)> 0.4 & CN_expression$pval.fisher < 0.0001)
dim(CN_expression_filtered)
#> [1] 45 4
Visualization2 = NetVis(
Node.1 = CN_expression_filtered$Node.1,
Node.2 = CN_expression_filtered$Node.2,
wTO = CN_expression_filtered$CN,
pval = CN_expression_filtered$pval.fisher,
cutoff = list(kind = 'pval', value = 0.001),
MakeGroups = 'louvain',
layout = 'layout_components', path = 'Vis.html')
#> Joining by: id
#> Vis.html
sessionInfo()
#> R version 3.4.4 (2018-03-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] magrittr_1.5 wTO_1.6.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.17 visNetwork_2.0.3 digest_0.6.15
#> [4] rprojroot_1.3-2 plyr_1.8.4 jsonlite_1.5
#> [7] backports_1.1.2 evaluate_0.10.1 som_0.3-5.1
#> [10] stringi_1.2.2 reshape2_1.4.3 data.table_1.11.4
#> [13] rmarkdown_1.10 tools_3.4.4 stringr_1.3.1
#> [16] htmlwidgets_1.2 igraph_1.2.1 parallel_3.4.4
#> [19] yaml_2.1.19 compiler_3.4.4 pkgconfig_2.0.1
#> [22] htmltools_0.3.6 knitr_1.20