The SubtypeDrug package is a systematic biological tool to optimize cancer subtype-specific drugs. The main capabilities of this tool are as follows:
1. Extracting subpathways and drug subpathway association data. We use a topology-based analysis method to mine connected subpathways from the canonical biological pathways of the KEGG database. Next, we integrate drug-induced transcriptome data from human cell lines with these subpathways to construct the drug subpathway association data.
2. Inferring the subpathway activity profile. The package provides two methods, GSVA and ssGSEA, to convert the gene expression profile into the subpathway activity profile.
3. Estimating a normalized drug-disease reverse association score. In the subpathway activity profile, we quantify the individualized subpathway aberrance score by comparing it with accumulated normal samples. Then, we use pattern recognition to reversely associate the drug up-regulatory and down-regulatory subpathway with the ranked list of the subpathways of each sample. The inverse correlation strength score of each drug in each sample is normalized as the normalized drug-disease reverse association score.
4. Identifying cancer subtype-specific drugs. Taking the subtype-specific drug score (SDS) of each cancer subtype as the statistic, significant cancer subtype-specific drugs are identified through a sample-based permutation test.
5. Visualizing the results. We provide the plotDrugStructure(), plotDScoreHeatmap(), plotDSpwHeatmap(), plotGlobalGraph() and plotSpwNetGraph() functions to display the results as drug structure diagrams, heat maps, box plots, network diagrams, etc.
In addition, for data that contain only cancer and normal samples, SubtypeDrug can also identify cancer-related drugs. The effect of drugs at different concentrations is also considered.
We used the k-clique method from social network analysis to extract subpathways from the KEGG database, and eliminated the smaller subpathways whose gene overlap with another subpathway of the same pathway exceeded 80%. The subpathway data is stored in a list structure. This process is similar to the psSubpathway system we built earlier (Han et al. 2019).
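The overlap filter described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper name drop_redundant() is hypothetical, all list elements are assumed to come from the same pathway, and the overlap is assumed to be the fraction of the smaller subpathway's genes that also occur in a larger, already retained subpathway. The package may define the overlap differently.

```r
# Hypothetical helper: keep larger subpathways first, then drop any smaller
# subpathway sharing more than `max_overlap` of its genes with a kept one.
drop_redundant <- function(spw_list, max_overlap = 0.8) {
  ord <- order(lengths(spw_list), decreasing = TRUE)
  kept <- list()
  for (nm in names(spw_list)[ord]) {
    genes <- spw_list[[nm]]
    redundant <- any(vapply(
      kept,
      function(k) length(intersect(genes, k)) / length(genes) > max_overlap,
      logical(1)
    ))
    if (!redundant) kept[[nm]] <- genes
  }
  kept
}

# Toy subpathways of one pathway (gene names are made up).
spw <- list(
  A = paste0("g", 1:10),
  B = paste0("g", 1:9),   # fully contained in A, so it is dropped
  C = paste0("g", 8:12)   # shares 3 of 5 genes with A, so it is kept
)
filtered <- drop_redundant(spw)
names(filtered)
```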
CMap build 02 raw data were downloaded from the CMap website (Lamb et al. 2006). After normalizing the gene expression profiles, we calculate the differential expression of genes between drug treatment groups (treating different concentrations of the same drug separately) and control groups. For each drug at each concentration, the genes are ranked in an ordered list according to their differential expression. The drug subpathway association score is calculated by enriching the gene tags of a subpathway in the ordered gene list based on a Kolmogorov-Smirnov (KS)-like statistic. An empirical gene-based permutation test is used to estimate the significance of the drug subpathway association score. A larger positive or negative drug subpathway association score indicates that the drug activates or inhibits the subpathway more strongly, respectively. Following this process, each drug has a table with three columns: subpathway ID, drug subpathway association score (DSAS), and significance P-value. The tables of all drugs are stored as a list that we term the drug subpathway association data (DrugSpwData).
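To make the enrichment step more concrete, the following base R sketch computes an unweighted KS-like running-sum score for one subpathway against a ranked gene list, together with a gene-based permutation p-value. It is a simplified illustration, not the code used to build DrugSpwData: the function names ks_enrichment() and ks_perm_pvalue() are hypothetical, the statistic is unweighted, and the null distribution is obtained simply by shuffling the ranked list.

```r
# Unweighted KS-like running sum: step up at each subpathway gene, down otherwise.
ks_enrichment <- function(ranked_genes, gene_set) {
  hits <- ranked_genes %in% gene_set
  n_hit <- sum(hits)
  n_miss <- length(ranked_genes) - n_hit
  if (n_hit == 0 || n_miss == 0) return(0)
  running <- cumsum(ifelse(hits, 1 / n_hit, -1 / n_miss))
  # Signed maximum deviation of the running sum from zero.
  running[which.max(abs(running))]
}

# Gene-based permutation test: shuffle the gene order and recompute the score.
ks_perm_pvalue <- function(ranked_genes, gene_set, nperm = 1000) {
  observed <- ks_enrichment(ranked_genes, gene_set)
  null_scores <- replicate(nperm, ks_enrichment(sample(ranked_genes), gene_set))
  mean(abs(null_scores) >= abs(observed))
}

set.seed(1)
ranked_genes <- paste0("g", 1:100)  # made-up genes, most up-regulated first
top_set <- paste0("g", 1:10)        # subpathway concentrated at the top
bottom_set <- paste0("g", 91:100)   # subpathway concentrated at the bottom

score_top <- ks_enrichment(ranked_genes, top_set)        # strongly positive
score_bottom <- ks_enrichment(ranked_genes, bottom_set)  # strongly negative
p_top <- ks_perm_pvalue(ranked_genes, top_set, nperm = 200)
```

In this toy setting a positive score corresponds to a subpathway whose genes are mostly up-regulated by the drug, and a negative score to one whose genes are mostly down-regulated.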
Because the data are large, we built a separate package, SubtypeDrugData, to store the subpathway list data and the drug subpathway association data. The SubtypeDrugData package has been uploaded to a GitHub repository (https://github.com/hanjunwei-lab/SubtypeDrugData) and can be downloaded and used with the following code:
## Download SubtypeDrugData package from GitHub.
require(devtools)
install_github("hanjunwei-lab/SubtypeDrugData",force = TRUE)
require(SubtypeDrugData)
## Get subpathway list data.
## If the gene expression profile contains gene Symbol.
data(SpwSymbolList)
## If the gene expression profile contains gene Entrezid.
data(SpwEntrezidList)
## Get drug subpathway association data.
data(DrugSpwData)
This section introduces the estimation of the normalized drug-disease reverse association score and the identification of cancer subtype-specific drugs. Both are implemented mainly by the function OCSSD(). This function requires four main inputs: the gene expression profile, the sample phenotype data, a list of subpathways and the drug subpathway association data.
First, the function OCSSD() infers the subpathway activity profile from the gene expression profile through the GSVA (Hänzelmann, Castelo, and Guinney 2013) or ssGSEA method (Barbie et al. 2009).
Next, the individualized subpathway aberrance score is estimated using the mean and standard deviation of the normal samples (Ahn et al. 2014). The score is calculated as follows: \[Z_{ij}=\frac{Sub_{ij}-mean\left(Sub_{i,normal}\right)}{stdev\left(Sub_{i,normal}\right)}\]
where \(Sub_{ij}\) is the activity value of the i th subpathway in the j th cancer sample and \(Sub_{i,normal}\) is the vector of activity values of the i th subpathway in the normal samples. The individualized subpathway aberrance score \(Z\) denotes the expression status of the subpathway in each cancer sample relative to the normal samples.
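As a small worked example of this formula, the base R sketch below standardizes each cancer sample against the normal samples, row by row. The function name aberrance_score() and the toy activity values are made up for illustration. OCSSD() performs this step internally.

```r
# Z_ij = (Sub_ij - mean(Sub_i,normal)) / stdev(Sub_i,normal), one row per subpathway.
aberrance_score <- function(activity, is_normal) {
  normal <- activity[, is_normal, drop = FALSE]
  mu <- rowMeans(normal)
  sigma <- apply(normal, 1, sd)
  # mu and sigma have one entry per row, so they recycle correctly down each column.
  (activity[, !is_normal, drop = FALSE] - mu) / sigma
}

# Toy activity profile: 2 subpathways, 3 normal samples (N1-N3), 2 cancer samples (T1, T2).
activity <- matrix(
  c(0, 1, 2, 5, 1,
    2, 4, 6, 0, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("spw_A", "spw_B"), c("N1", "N2", "N3", "T1", "T2"))
)
is_normal <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

Z <- aberrance_score(activity, is_normal)
Z
```

Here spw_A has normal mean 1 and standard deviation 1, so the cancer value 5 in T1 gives a score of 4, while a cancer value equal to the normal mean gives 0.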
For cancer sample j, subpathways are ranked in descending order of the individualized subpathway aberrance score to form the list \(L_j\), and we let \(q\) be the total number of subpathways. Function OCSSD() provides two methods to estimate the drug-disease inverse association score:
\[D_{1}=\max_{g=1}^{p}\left[\frac{g}{p}-\frac{V(g)}{q}\right]\]
\[D_{2}=\max_{g=1}^{p}\left[\frac{V(g)}{q}-\frac{g-1}{p}\right]\]
We set \(KS_d^{up}=D_{1}\) if \(D_{1}>D_{2}\), or \(KS_d^{up}=-D_{2}\) if \(D_{1}<D_{2}\). \(KS_d^{down}\) is calculated in the same way, and the drug-disease reverse association score of the \(d\) th drug in the \(j\) th cancer sample is \(S_{dj}=KS_d^{up}-KS_d^{down}\).
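The calculation of \(S_{dj}\) can be sketched in base R as follows. The text does not define \(p\) and \(V(g)\), so this sketch assumes the usual CMap convention: \(p\) is the number of subpathways in the drug's up-regulated (or down-regulated) set that appear in \(L_j\), and \(V(g)\) is the position in \(L_j\) of the g th such subpathway, taken in increasing order of position. The function name ks_score() is hypothetical, and a tie \(D_1=D_2\), which the text leaves undefined, is resolved here as \(-D_2\).

```r
# KS score of one drug subpathway set against one ranked subpathway list.
ks_score <- function(drug_set, ranked_spw) {
  q <- length(ranked_spw)
  v <- sort(match(drug_set, ranked_spw))  # positions V(g); sort() drops unmatched NAs
  p <- length(v)
  if (p == 0) return(0)
  g <- seq_len(p)
  d1 <- max(g / p - v / q)
  d2 <- max(v / q - (g - 1) / p)
  if (d1 > d2) d1 else -d2  # assumption: ties fall through to -d2
}

# Toy ranked list L_j of q = 10 subpathways, most aberrantly activated first.
ranked_spw <- paste0("s", 1:10)
up_set <- c("s1", "s2")     # drug up-regulates subpathways at the top of L_j
down_set <- c("s9", "s10")  # drug down-regulates subpathways at the bottom of L_j

ks_up <- ks_score(up_set, ranked_spw)      # 0.8
ks_down <- ks_score(down_set, ranked_spw)  # -0.9
s_dj <- ks_up - ks_down                    # 1.7: the drug mimics the disease state

# Swapping the two sets gives the reverse association.
s_rev <- ks_score(down_set, ranked_spw) - ks_score(up_set, ranked_spw)
```

A strongly negative score, as in the swapped case, is the pattern of interest: the drug inhibits subpathways that are aberrantly activated in the sample and activates those that are suppressed.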
Finally, the normalized drug-disease reverse association score (\(NS\)) is defined as \(S_{dj}/|max(S_{d})|\) where \(S_{dj}>0\), or \(S_{dj}/|min(S_{d})|\) where \(S_{dj}<0\). Through the above method, we further convert the gene expression profile into a normalized drug-disease reverse association score matrix \(M=\{NS_{dj}\}\) (the rows are drugs and the columns are cancer samples).
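The normalization can be illustrated with a short base R sketch. It assumes that \(max(S_d)\) and \(min(S_d)\) are taken over all cancer samples for the same drug, that is, over one row of the score matrix. The function name normalize_row() and the toy scores are made up.

```r
# Scale positive scores by |max(S_d)| and negative scores by |min(S_d)|, per drug.
normalize_row <- function(s_d) {
  ns <- numeric(length(s_d))  # scores equal to zero stay zero
  pos <- s_d > 0
  neg <- s_d < 0
  ns[pos] <- s_d[pos] / abs(max(s_d))
  ns[neg] <- s_d[neg] / abs(min(s_d))
  ns
}

# Toy score matrix S: rows are drugs, columns are cancer samples.
S <- matrix(
  c( 2, -1, 4.0, -0.5,
    -3, -6, 1.5,  3.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("drugA", "drugB"), paste0("sample", 1:4))
)

NS <- t(apply(S, 1, normalize_row))
dimnames(NS) <- dimnames(S)
NS
```

After this step every row lies in the interval from -1 to 1, which makes scores comparable across drugs.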
For a given drug and the t th cancer subtype, the subtype-specific drug score (\(SDS\)) is estimated as follows:
\[SDS_{t}=\frac{1}{\beta_{t}}\sum_{j\in t}NS_{j}\]
where \(\beta_{t}\) is the number of samples in the t th cancer subtype and \(NS_j\) is the normalized drug-disease reverse association score of the j th cancer sample in the t th cancer subtype. A more negative \(SDS\) indicates that the drug has a stronger potential therapeutic effect on this cancer subtype.
To identify significant cancer subtype-specific drugs, we assess the significance of the \(SDS\) with an empirical sample-based permutation test. The permuted \(SDS\) values of all cancer subtypes are pooled into one null distribution \(SDS^*\), where \(N\) is the number of elements in \(SDS^*\). The two-sided p value is estimated as: \[ Pvalue_t=\frac{\#\{SDS^*\mid|SDS^*|\ge|SDS_t|\}}{N} \]
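The \(SDS\) and its permutation p value can be sketched for a single drug in base R. This is an illustration under assumptions, not the implementation in OCSSD(): the function names are hypothetical, and each permutation is assumed to shuffle the subtype labels among the cancer samples.

```r
# SDS_t: mean normalized score of the samples in each subtype.
sds_by_subtype <- function(ns, subtype) {
  vapply(split(ns, subtype), mean, numeric(1))
}

# Sample-based permutation test with a pooled null distribution SDS*.
sds_perm_test <- function(ns, subtype, nperm = 1000) {
  observed <- sds_by_subtype(ns, subtype)
  # Assumption: permute subtype labels, then pool the permuted SDS of all subtypes.
  null_sds <- as.vector(replicate(nperm, sds_by_subtype(ns, sample(subtype))))
  pvalue <- vapply(observed, function(s) mean(abs(null_sds) >= abs(s)), numeric(1))
  data.frame(Subtype = names(observed), SDS = unname(observed),
             Pvalue = unname(pvalue), stringsAsFactors = FALSE)
}

set.seed(42)
# Toy normalized scores of one drug: strongly negative in subtype A only.
ns <- c(seq(-0.9, -0.7, length.out = 10), seq(0.1, 0.3, length.out = 20))
subtype <- rep(c("A", "B", "C"), each = 10)

result <- sds_perm_test(ns, subtype, nperm = 500)
result
```

In this toy example only subtype A has an \(SDS\) that is extreme relative to the pooled null distribution, so only A receives a small p value.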
For gene expression data that contain only two sample types, cancer and healthy control, the subpathways are arranged in descending order according to the difference in subpathway activity between the cancer and normal groups in the subpathway activity profile. Subsequently, the extent of the drug's effect on cancer is evaluated by enriching the subpathways up- and down-regulated by the drug in the ordered subpathway list. Cancer-related drugs are then identified through an empirical subpathway-based permutation test.
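The ranking step for the two-class case can be sketched as follows. The text does not specify how the difference value is computed, so this sketch assumes it is the difference between the mean activity of the cancer group and that of the normal group. The function name rank_subpathways() and the toy values are made up.

```r
# Order subpathways by mean(cancer) - mean(normal), largest difference first.
rank_subpathways <- function(activity, is_cancer) {
  diff_value <- rowMeans(activity[, is_cancer, drop = FALSE]) -
    rowMeans(activity[, !is_cancer, drop = FALSE])
  names(sort(diff_value, decreasing = TRUE))
}

# Toy activity profile: 3 subpathways, 2 normal and 2 cancer samples.
activity <- matrix(
  c(0.0, 0.0,  1.0,  1.0,
    0.5, 0.5, -0.5, -0.5,
    0.2, 0.2,  0.3,  0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("spw_A", "spw_B", "spw_C"), c("N1", "N2", "C1", "C2"))
)
is_cancer <- c(FALSE, FALSE, TRUE, TRUE)

ranked <- rank_subpathways(activity, is_cancer)
ranked
```

The resulting ordered list plays the same role as \(L_j\) in the subtype analysis: the drug's up- and down-regulated subpathways are enriched against it.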
Taking the simulated breast cancer data as an example, the identification and visualization of breast cancer-related and subtype-specific drugs are as follows:
require(GSVA)
#> Loading required package: GSVA
require(parallel)
#> Loading required package: parallel
## Get simulated breast cancer gene expression profile data.
Geneexp<-get("Geneexp")
## Obtain sample subtype data and calculate breast cancer subtype-specific drugs.
Subtype_labels<-system.file("extdata", "Subtype_labels.cls", package = "SubtypeDrug")
# Identify breast subtype-specific drugs.
Subtype_drugs<-OCSSD(Geneexp,Subtype_labels,"Control",SpwSymbolList,
input.drug.data=DrugSpwData,parallel.sz=1)
## Results display.
str(Subtype_drugs)
#> List of 8
#> $ Basal :'data.frame': 8 obs. of 6 variables:
#> ..$ Drug : chr [1:8] "deferoxamine(6e-06M)" "pirenperone(1.02e-05M)" "atovaquone(1.1e-05M)" "piroxicam(1.2e-05M)" ...
#> ..$ Target_upregulation : chr [1:8] "00010_1 00010_5 00010_6 00030_2 00051_1 00052_2 00230_5 00500_3 00520_4 00520_7 00561_1 00563_1 00860_6 04015_6"| __truncated__ "00020_4 00100_6 00100_7 00510_5 00510_8 04010_11 04010_15 04010_21 04010_22 04012_2 04022_1 04022_6 04137_2 041"| __truncated__ "00380_5 00980_10 00980_11 04014_3 04060_42 04060_45 04514_65 04610_7 04621_28 04922_2 04928_7 04928_17 04979_4 "| __truncated__ "00100_2 00100_6 00100_7 00350_6 00350_8 00450_2 00603_1 01524_2 04064_5 04144_6 04210_5 04210_7 04217_2 04350_5"| __truncated__ ...
#> ..$ Target_downregulation: chr [1:8] "04060_6 04621_13 04917_14 04928_18 05134_7 05142_3 05144_3 05152_11 05152_14 05160_12 05166_12 05202_7 05206_32"| __truncated__ "00140_10 00280_10 03015_3 04014_9 04060_27 04390_4 04925_5 05169_5 05210_5 05215_6 05226_21" "00010_5 00010_6 00100_2 00100_6 00270_3 00620_2 00790_8 03460_5 04010_3 04060_2 04060_44 04066_1 04068_6 04110_"| __truncated__ "00020_4 03015_1 04015_6 04022_7 04022_10 04261_5 04666_10 04720_4 04722_10 04742_8 04744_1 04910_7 04922_4 0492"| __truncated__ ...
#> ..$ SDS : num [1:8] -0.651 -0.667 -0.686 0.0577 -0.178 -0.152 0.268 0.223
#> ..$ Pvalue : num [1:8] 0 0 0.171 0.955 0.374 0.405 0.533 0.048
#> ..$ FDR : num [1:8] 0 0 0.649 0.996 0.77 0.788 0.859 0.407
#> $ Her2 :'data.frame': 8 obs. of 6 variables:
#> ..$ Drug : chr [1:8] "deferoxamine(6e-06M)" "pirenperone(1.02e-05M)" "atovaquone(1.1e-05M)" "piroxicam(1.2e-05M)" ...
#> ..$ Target_upregulation : chr [1:8] "00010_1 00010_5 00010_6 00030_2 00051_1 00052_2 00230_5 00500_3 00520_4 00520_7 00561_1 00563_1 00860_6 04015_6"| __truncated__ "00020_4 00100_6 00100_7 00510_5 00510_8 04010_11 04010_15 04010_21 04010_22 04012_2 04022_1 04022_6 04137_2 041"| __truncated__ "00380_5 00980_10 00980_11 04014_3 04060_42 04060_45 04514_65 04610_7 04621_28 04922_2 04928_7 04928_17 04979_4 "| __truncated__ "00100_2 00100_6 00100_7 00350_6 00350_8 00450_2 00603_1 01524_2 04064_5 04144_6 04210_5 04210_7 04217_2 04350_5"| __truncated__ ...
#> ..$ Target_downregulation: chr [1:8] "04060_6 04621_13 04917_14 04928_18 05134_7 05142_3 05144_3 05152_11 05152_14 05160_12 05166_12 05202_7 05206_32"| __truncated__ "00140_10 00280_10 03015_3 04014_9 04060_27 04390_4 04925_5 05169_5 05210_5 05215_6 05226_21" "00010_5 00010_6 00100_2 00100_6 00270_3 00620_2 00790_8 03460_5 04010_3 04060_2 04060_44 04066_1 04068_6 04110_"| __truncated__ "00020_4 03015_1 04015_6 04022_7 04022_10 04261_5 04666_10 04720_4 04722_10 04742_8 04744_1 04910_7 04922_4 0492"| __truncated__ ...
#> ..$ SDS : num [1:8] 0.196 0.474 -0.864 -0.699 -0.219 -0.183 0.174 0.0489
#> ..$ Pvalue : num [1:8] 0.289 0.0107 0 0 0.276 0.318 0.774 0.664
#> ..$ FDR : num [1:8] 0.655 0.386 0 0 0.648 0.682 0.893 0.858
#> $ LumA :'data.frame': 8 obs. of 6 variables:
#> ..$ Drug : chr [1:8] "deferoxamine(6e-06M)" "pirenperone(1.02e-05M)" "atovaquone(1.1e-05M)" "piroxicam(1.2e-05M)" ...
#> ..$ Target_upregulation : chr [1:8] "00010_1 00010_5 00010_6 00030_2 00051_1 00052_2 00230_5 00500_3 00520_4 00520_7 00561_1 00563_1 00860_6 04015_6"| __truncated__ "00020_4 00100_6 00100_7 00510_5 00510_8 04010_11 04010_15 04010_21 04010_22 04012_2 04022_1 04022_6 04137_2 041"| __truncated__ "00380_5 00980_10 00980_11 04014_3 04060_42 04060_45 04514_65 04610_7 04621_28 04922_2 04928_7 04928_17 04979_4 "| __truncated__ "00100_2 00100_6 00100_7 00350_6 00350_8 00450_2 00603_1 01524_2 04064_5 04144_6 04210_5 04210_7 04217_2 04350_5"| __truncated__ ...
#> ..$ Target_downregulation: chr [1:8] "04060_6 04621_13 04917_14 04928_18 05134_7 05142_3 05144_3 05152_11 05152_14 05160_12 05166_12 05202_7 05206_32"| __truncated__ "00140_10 00280_10 03015_3 04014_9 04060_27 04390_4 04925_5 05169_5 05210_5 05215_6 05226_21" "00010_5 00010_6 00100_2 00100_6 00270_3 00620_2 00790_8 03460_5 04010_3 04060_2 04060_44 04066_1 04068_6 04110_"| __truncated__ "00020_4 03015_1 04015_6 04022_7 04022_10 04261_5 04666_10 04720_4 04722_10 04742_8 04744_1 04910_7 04922_4 0492"| __truncated__ ...
#> ..$ SDS : num [1:8] 0.00174 0.102 -0.0792 -0.259 0.723 0.574 0.679 0.111
#> ..$ Pvalue : num [1:8] 0.993 0.586 0.997 0.561 0 0 0 0.332
#> ..$ FDR : num [1:8] 1 1 1 1 0 0 0 1
#> $ LumB :'data.frame': 8 obs. of 6 variables:
#> ..$ Drug : chr [1:8] "deferoxamine(6e-06M)" "pirenperone(1.02e-05M)" "atovaquone(1.1e-05M)" "piroxicam(1.2e-05M)" ...
#> ..$ Target_upregulation : chr [1:8] "00010_1 00010_5 00010_6 00030_2 00051_1 00052_2 00230_5 00500_3 00520_4 00520_7 00561_1 00563_1 00860_6 04015_6"| __truncated__ "00020_4 00100_6 00100_7 00510_5 00510_8 04010_11 04010_15 04010_21 04010_22 04012_2 04022_1 04022_6 04137_2 041"| __truncated__ "00380_5 00980_10 00980_11 04014_3 04060_42 04060_45 04514_65 04610_7 04621_28 04922_2 04928_7 04928_17 04979_4 "| __truncated__ "00100_2 00100_6 00100_7 00350_6 00350_8 00450_2 00603_1 01524_2 04064_5 04144_6 04210_5 04210_7 04217_2 04350_5"| __truncated__ ...
#> ..$ Target_downregulation: chr [1:8] "04060_6 04621_13 04917_14 04928_18 05134_7 05142_3 05144_3 05152_11 05152_14 05160_12 05166_12 05202_7 05206_32"| __truncated__ "00140_10 00280_10 03015_3 04014_9 04060_27 04390_4 04925_5 05169_5 05210_5 05215_6 05226_21" "00010_5 00010_6 00100_2 00100_6 00270_3 00620_2 00790_8 03460_5 04010_3 04060_2 04060_44 04066_1 04068_6 04110_"| __truncated__ "00020_4 03015_1 04015_6 04022_7 04022_10 04261_5 04666_10 04720_4 04722_10 04742_8 04744_1 04910_7 04922_4 0492"| __truncated__ ...
#> ..$ SDS : num [1:8] 0.111 0.0054 -0.496 -0.19 -0.0509 0.13 -0.0137 -0.448
#> ..$ Pvalue : num [1:8] 0.553 0.973 0.614 0.744 0.804 0.476 0.988 0
#> ..$ FDR : num [1:8] 0.794 0.988 0.826 0.868 0.894 0.75 0.995 0
#> $ DrugMatrix : num [1:8, 1:32] 0.181 0.265 -0.838 -0.202 -0.564 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:8] "deferoxamine(6e-06M)" "pirenperone(1.02e-05M)" "atovaquone(1.1e-05M)" "piroxicam(1.2e-05M)" ...
#> .. ..$ : chr [1:32] "TCGA-C8-A12M-01A" "TCGA-AO-A03L-01A" "TCGA-AO-A0J8-01A" "TCGA-A2-A0T4-01A" ...
#> $ SubpathwayMatrix : num [1:78, 1:40] 0.5495 0.0767 0.1824 -0.1241 0.5582 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:78] "00020_4" "00100_6" "00100_7" "00510_5" ...
#> .. ..$ : chr [1:40] "TCGA-BH-A0BZ-11A" "TCGA-BH-A1FR-11B" "TCGA-A7-A0DB-11A" "TCGA-A7-A0CH-11A" ...
#> $ SampleInformation:'data.frame': 40 obs. of 2 variables:
#> ..$ sampleId : chr [1:40] "TCGA-BH-A0BZ-11A" "TCGA-BH-A1FR-11B" "TCGA-A7-A0DB-11A" "TCGA-A7-A0CH-11A" ...
#> ..$ sampleSubtype: chr [1:40] "Control" "Control" "Control" "Control" ...
#> $ Parameter :List of 11
#> ..$ control.label : chr "Control"
#> ..$ spw.min.sz : num 1
#> ..$ spw.max.sz : num Inf
#> ..$ spw.score.method : chr "gsva"
#> ..$ kcdf : chr "Gaussian"
#> ..$ drug.p.val.threshold: num 0.05
#> ..$ drug.spw.min.sz : num 1
#> ..$ drug.spw.max.sz : num Inf
#> ..$ weighted.drug.score : logi TRUE
#> ..$ nperm : num 1000
#> ..$ parallel.sz : num 0
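Each subtype element of the result is a data frame with the columns Drug, SDS, Pvalue and FDR, so candidate drugs can be extracted with ordinary data frame operations. The sketch below works on a small mock list that copies the first four Basal rows printed above, so that it runs without the package. The helper select_candidates() and the cutoff FDR < 0.05 combined with a negative SDS are illustrative assumptions, not package defaults.

```r
# Mock of the Basal element, copied from the first four rows of the output above.
mock_result <- list(
  Basal = data.frame(
    Drug = c("deferoxamine(6e-06M)", "pirenperone(1.02e-05M)",
             "atovaquone(1.1e-05M)", "piroxicam(1.2e-05M)"),
    SDS = c(-0.651, -0.667, -0.686, 0.0577),
    Pvalue = c(0, 0, 0.171, 0.955),
    FDR = c(0, 0, 0.649, 0.996),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: keep significant drugs with a negative SDS, strongest first.
select_candidates <- function(tab, fdr_cutoff = 0.05) {
  hits <- tab[tab$FDR < fdr_cutoff & tab$SDS < 0, c("Drug", "SDS", "Pvalue", "FDR")]
  hits[order(hits$SDS), ]
}

candidates <- lapply(mock_result, select_candidates)
candidates$Basal
```

With a real result, the same helper could be applied to the subtype elements only, for example Subtype_drugs[c("Basal", "Her2", "LumA", "LumB")], since the remaining elements hold matrices, sample information and parameters.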
The OCSSD() function can also be used to identify breast cancer-related drugs in only two types of samples: breast cancer and normal.
Cancer_normal_labels<-system.file("extdata", "Cancer_normal_labels.cls", package = "SubtypeDrug")
Disease_drugs<-OCSSD(Geneexp,Cancer_normal_labels,"Control",SpwSymbolList,input.drug.data=DrugSpwData,
parallel.sz=1)
The function OCSSD() can also support user-defined data.
## User-defined drug regulation data should resemble the structure below.
UserDS<-get("UserDS")
str(UserDS[1:5])
#> List of 5
#> $ 1,5-isoquinolinediol(1e-04M):List of 2
#> ..$ Target_upregulation : chr [1:64] "00140_10" "00450_2" "00590_2" "00830_2" ...
#> ..$ Target_downregulation: chr [1:76] "00030_2" "00071_1" "00100_2" "00100_5" ...
#> $ 2-deoxy-D-glucose(0.01M) :List of 2
#> ..$ Target_upregulation : chr [1:68] "00250_1" "01524_2" "03013_1" "03015_4" ...
#> ..$ Target_downregulation: chr [1:43] "00051_1" "00100_6" "00230_1" "00240_8" ...
#> $ NA : NULL
#> $ NA : NULL
#> $ NA : NULL
## Need to load gene set data consistent with drug regulation data.
UserGS<-get("UserGS")
str(UserGS[1:5])
#> List of 5
#> $ 00140_10: chr [1:49] "CYP11A1" "HSD3B1" "HSD3B2" "CYP17A1" ...
#> $ 00250_1 : chr [1:17] "NIT2" "ASNS" "NAT8L" "IL4I1" ...
#> $ 00030_2 : chr [1:28] "GPI" "DERA" "PRPS1L1" "PRPS1" ...
#> $ 00071_1 : chr [1:17] "ACADSB" "ACADS" "EHHADH" "HADH" ...
#> $ 00100_2 : chr [1:8] "EBP" "DHCR7" "SC5D" "DHCR24" ...
Drugs<-OCSSD(Geneexp,Subtype_labels,"Control",UserGS,input.drug.data=UserDS,parallel.sz=1)
require(pheatmap)
## Heat map of normalized disease-drug reverse association scores for all subtype-specific drugs.
plotDScoreHeatmap(data=Subtype_drugs,show.rownames = TRUE,show.colnames = FALSE)
## Plot only Basal subtype-specific drugs.
plotDScoreHeatmap(data=Subtype_drugs,subtype.label="Basal",show.rownames = TRUE,show.colnames = FALSE)
## Plot a heat map of the individualized activity aberrance scores of subpathway regulated by drug pirenperone(1.02e-05M).
## Basal-specific drugs pirenperone(1.02e-05M) regulated subpathways that show opposite activity from normal samples.
plotDSpwHeatmap(data=Subtype_drugs,drug.label="pirenperone(1.02e-05M)",subtype.label="Basal",show.colnames=FALSE)
## Plot a global graph of the Basal-specific drug pirenperone(1.02e-05M).
plotGlobalGraph(data=Subtype_drugs,drug.label="pirenperone(1.02e-05M)")
require(ChemmineR)
require(rvest)
## Plot the chemical structure of drug pirenperone(1.02e-05M).
plotDrugStructure(drug.label="pirenperone(1.02e-05M)")
Ahn, TaeJin, Eunjin Lee, Nam Huh, and Taesung Park. 2014. “Personalized Identification of Altered Pathways in Cancer Using Accumulated Normal Tissue Data.” Bioinformatics 30 (17): i422–i429.
Barbie, David A, Pablo Tamayo, Jesse S Boehm, So Young Kim, Susan E Moody, Ian F Dunn, Anna C Schinzel, et al. 2009. “Systematic RNA Interference Reveals That Oncogenic KRAS-Driven Cancers Require TBK1.” Nature 462 (7269): 108.
Han, Junwei, Xudong Han, Qingfei Kong, and Liang Cheng. 2019. “PsSubpathway: A Software Package for Flexible Identification of Phenotype-Specific Subpathways in Cancer Progression.” Bioinformatics.
Hänzelmann, Sonja, Robert Castelo, and Justin Guinney. 2013. “GSVA: Gene Set Variation Analysis for Microarray and RNA-Seq Data.” BMC Bioinformatics 14 (1): 7.
Lamb, Justin, Emily D Crawford, David Peck, Joshua W Modell, Irene C Blat, Matthew J Wrobel, Jim Lerner, et al. 2006. “The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease.” Science 313 (5795): 1929–35.