The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
R Package for Information-incorporated Gene Network Construction with FDR Control
Hao Wang, Yumou Qiu and Peng Liu.
[haydo.wang@outlook.com] (Hao Wang)
Wang, H., Qiu, Y.*, Guo, H., Yin, Y., Liu, P.*, 2024. Information-incorporated Gene Network Construction with FDR Control. Under review.
# R version is required >= 3.4.4
# When the first time to use the package, please make sure dependent packages are installed under your R environment, if not, please use commands below to install
> #install.packages("tidyverse")
> #install.packages("glmnet")
> #install.packages("mvtnorm")
> #install.packages("igraph")
> #install.packages("Matrix")
# install "devtools" package in your R environment
> # devtools::install_github("HaoWang47/PCGII")
> library(PCGII)
> library(corpcor)
> library(glmnet)
> library(igraph)
> library(Matrix)
> library(mvtnorm)
> library(tidyverse)
This is a tutorial script for researchers who are interested in
applying PCGII on omics data to learn the direct association structure
of omics features. The main function PGCII()
takes a
biologically pre-processed expression data matrix as input, and returns
a list of statistics (estimates and test statistics). The function
inference()
takes a list returned by PGCII()
as input and conduct simultaneous test to identify significant partial
correlations with False Discovery Rate (FDR) controlled at a
pre-determined nominal level (0.05 by default).
PCGII()
df
: the main expression data, an \(n\) by \(p\) matrix/dataframe, in which each row
corresponds to a sample and each column represents expression/abundance
of an omics feature;prior
: the prior set, a \(k\) by \(2\) dataframe, in which each row
corresponds to a pair of nodes (any omics features) that are connected
under prior belief. Note, prior input has to be dataframe with column
names “row” and “col”;lambda
: the regularization parameter, used in the
node-wise regression. If missing, default lambda will be used which is
at the order of sqrt(2log(p/sqrt(n))/n).Est
: estimated partial correlation matrix;EstThresh
: sparse partial correlation estimation matrix
with threshold;kappa
: estimated ratio of forth and squared second
moment of residuals, please refer to the manuscript for details;tscore
: estimated test statistics matrix of partial
correlations;n
: sample size;p
: number of genes under study.Inference()
list
: a list returned by either PCGII()
or
clevel()
.alpha
: pre-determined False Discovery Rate. Nominal FDR
is set at 0.05 by default.Simulate data \(X\) from a scale-free network \(g\).
> # Simulating data
> set.seed(1234567)
> n=50 # sample size
> p=30 # number of nodes
>
> omega=make_random_precision_mat(eta=.01, p=p)
>
> Sigma=solve(omega) # population covariance matrix, which is used to generate data
> X = rmvnorm(n = n, sigma = Sigma) # simulate expression data
Network analysis of data matrix X
.
> # determine tuning parameter: fixed lambda
> lam=2*sqrt(log(p)/n)
>
> # create prior set: directed prior network
> prior_set=matrix(data=c(6,5, 28,14), nrow=2, ncol=2, byrow = TRUE)
> colnames(prior_set)=c("row", "col")
> PCGII_out=PCGII(df=X, prior=as.data.frame(prior_set), lambda = lam)
> inference_out=inference(list=PCGII_out)
> diag(inference_out)=0
> # Visualization
> inference_out %>%
+ graph_from_adjacency_matrix(mode = "undirected") %>%
+ plot(vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)
>
> # create prior set: undirected prior network
> PCGII_out=PCGII(df=X, prior=undirected_prior(prior_set), lambda = lam)
> inference_out=inference(list=PCGII_out)
> diag(inference_out)=0
> # Visualization
> inference_out %>%
+ graph_from_adjacency_matrix(mode = "undirected") %>%
+ plot(vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.