The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
We introduce a Pseudo-value Regression Approach for Network Analysis (PRANA) [1]. To our knowledge, this is the first attempt of utilizing a regression modeling for the differential network (DN) analysis by collective gene expression levels under two experimental conditions (\(\textit{e.g.}\) ‘current’ vs. ‘non-current smokers’ or ‘high-risk’ vs. ‘low-risk’). We start from the mutual information (MI) criteria, followed by pseudo-value calculations, which are then entered into a robust regrssion model.
Please install these R packages prior to use PRANA. Note that minet package is available in Bioconductor. Please see below for further instruction. Of important note (as of July 2024), dnapath R package is archived in CRAN as of May 2024, which no longer becomes a requirement for the installation of PRANA. Instead, run_aracne() function is available in our package as of version 1.0.5 to obtain mutual information (MI) estimate via ARACNE for network estimation step.
# library(dplyr)
# library(parallel) # To use mclapply() when reestimating the association matrix.
# library(robustbase) # To fit a robust regression
# library(minet) # To estimate network via ARACNE
# Please run the following lines to install minet package
# from Bioconductor in your R console:
#if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("minet")
In the original article [1], a real-data analysis was performed to showcase the utility of PRANA. This contains clinical and expression data for 406 samples and 28 COPD-related genes that were highlighted in a recent genome-wide association study [2]. The full data is available from the Gene Expression Database with accession number GSE158699 [3].
The main variable of our interest in this analysis is the current smoking status. We obtain the indices of subjects who are ‘current’ vs. ‘non-current smokers.’ These indices are used to dichotomize expression dataset into ‘current (Group B)’ and ‘non-current smokers (Group A).’ This is important as the we estimate the group-specific \(p \times p\) association matrices.
# A gene expression data part of the downloaded data.
rnaseqdat <- combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat <- as.data.frame(apply(rnaseqdat, 2, as.numeric))
# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat <- combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]
# Indices of non-current smoker (namely Group A)
index_grpA <- which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB <- which(combinedCOPDdat_RGO$currentsmoking == 1)
Once the data processing is done, we are off to apply the PRANA. In PRANA
function, please make sure you specify the expression and clinical data separately. Additionally, you will need to provide the indices for each group of a binary indicator variable that is deemed as a main predictor variable.
In this package, there are some additional R functions that can help you locating the name of genes that are significantly differentially connected (DC) and names and adjusted p-values via the empirical Bayes adjustment [4] for all variables and for a specific variable.
# This is useful when we want to create a table with adjusted p-values only.
adjpval(PRANAres)
#> currentsmoking packyrs age gender race FEV1perc
#> 10370 0.004 1.000 0.996 1 1.000 1
#> 10420 0.000 1.000 0.998 1 1.000 1
#> 1306 0.686 0.996 1.000 1 1.000 1
#> 155185 0.000 1.000 1.000 1 1.000 1
#> 158158 0.846 1.000 1.000 1 1.000 1
#> 1653 0.002 1.000 0.996 1 1.000 1
#> 1762 0.000 1.000 1.000 1 1.000 1
#> 23389 0.020 1.000 0.996 1 1.000 1
#> 253461 0.000 1.000 0.996 1 1.000 1
#> 26112 0.800 0.870 1.000 1 1.000 1
#> 27436 0.000 1.000 1.000 1 1.000 1
#> 3308 0.000 1.000 1.000 1 1.000 1
#> 3696 0.000 1.000 0.996 1 1.000 1
#> 374739 0.000 1.000 0.996 1 0.626 1
#> 3842 0.000 1.000 0.996 1 1.000 1
#> 406 0.000 1.000 0.996 1 1.000 1
#> 56986 0.000 1.000 0.996 1 1.000 1
#> 57188 0.000 1.000 1.000 1 1.000 1
#> 6239 0.462 1.000 1.000 1 1.000 1
#> 7067 0.000 0.998 0.996 1 0.330 1
#> 7871 0.000 1.000 0.996 1 1.000 1
#> 79961 0.004 1.000 1.000 1 1.000 1
#> 79991 0.000 1.000 0.996 1 1.000 1
#> 8224 0.000 1.000 0.996 1 1.000 1
#> 8853 0.000 1.000 0.996 1 1.000 1
#> 8870 0.000 1.000 0.996 1 1.000 1
#> 9258 0.000 1.000 0.996 1 0.994 1
#> 9686 0.846 0.972 0.996 1 1.000 1
Back to the description of the original article by Ahn \(\textit{et al.}\), the main interest is in the current smoking status to declare whether a gene is DC between current and non-current smokers at the user-specified significance level. Thus, we subset the vector of p-values for current smoking status from the adjpvaltab
object, aka an object with adjusted p-values and gene names using adjpval
function above.
# Create an object to keep the table with adjusted p-values using adjpval() function.
adjptab <- adjpval(PRANAres)
adjpval_specific_var
is a handy function to retrieve adjusted p-values with gene names for a single variable of interest.
# NOTE: Please do NOT forget to provide a name of variable with the quotation marks!
adjpval_specific_var(adjptab = adjptab, varname = "currentsmoking")
#> currentsmoking
#> 10370 0.004
#> 10420 0.000
#> 1306 0.686
#> 155185 0.000
#> 158158 0.846
#> 1653 0.002
#> 1762 0.000
#> 23389 0.020
#> 253461 0.000
#> 26112 0.800
#> 27436 0.000
#> 3308 0.000
#> 3696 0.000
#> 374739 0.000
#> 3842 0.000
#> 406 0.000
#> 56986 0.000
#> 57188 0.000
#> 6239 0.462
#> 7067 0.000
#> 7871 0.000
#> 79961 0.004
#> 79991 0.000
#> 8224 0.000
#> 8853 0.000
#> 8870 0.000
#> 9258 0.000
#> 9686 0.846
If you would like to quickly know which genes are significantly DC for your main binary grouping variable, please use sigDCGtab
to proceed. This will return a table with adjusted p-values and gene names.
# NOTE: Please do NOT forget to provide a name of variable with the quotation marks!
sigDCGtab <- sigDCGtab(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05)
sigDCGtab
#> currentsmoking
#> 10370 0.004
#> 10420 0.000
#> 155185 0.000
#> 1653 0.002
#> 1762 0.000
#> 23389 0.020
#> 253461 0.000
#> 27436 0.000
#> 3308 0.000
#> 3696 0.000
#> 374739 0.000
#> 3842 0.000
#> 406 0.000
#> 56986 0.000
#> 57188 0.000
#> 7067 0.000
#> 7871 0.000
#> 79961 0.004
#> 79991 0.000
#> 8224 0.000
#> 8853 0.000
#> 8870 0.000
#> 9258 0.000
Lastly, a function below, called sigDCGnames
, will be useful when you just want to return the names of the significantly DC genes from PRANA at the 0.05 significance level.
# NOTE: Please do NOT forget to provide a name of variable with the quotation marks!
sigDCGnames <- sigDCGnames(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05)
sigDCGnames
#> [1] "10370" "10420" "155185" "1653" "1762" "23389" "253461" "27436"
#> [9] "3308" "3696" "374739" "3842" "406" "56986" "57188" "7067"
#> [17] "7871" "79961" "79991" "8224" "8853" "8870" "9258"
As an additional step, we can use rename_genes
function to rename Entrez gene IDs into gene symbols. However, it is currently not available since dnapath
package is currently archived. A user may manually install the archived version of dnapath
from the CRAN, and follow the R code below.
[1] Ahn S, Grimes T, Datta S. (2023). A pseudo-value regression approach for differential network analysis of co-expression data. \(\textit{BMC Bioinformatics}\). 24(1), 8.
[2] Sakornsakolpat P, Prokopenko D, Lamontagne M, and et al. (2019). Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. \(\textit{Nature Genetics}\), 51(3), 494–505.
[3] Wang Z, Masoomi A, Xu Z, and et al. (2021). Improved prediction of smoking status via isoform-aware RNA-seq deep learning models. \(\textit{PLoS Computational Biology}\), 17(10), e1009433.
[4] Datta S, Datta S. (2005). Empirical Bayes screening of many p-values with applications to microarray studies. \(\textit{Bioinformatics}\), 21(9), 1987–1994.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.