Here, we developed a software package (SMDIC) to automated identification of Somatic Mutation-Driven Immune Cell. The operation modes including: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cell into two binary matrices (one for up-regulated and one for down-regulated cells), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualization of immune cell abundance of samples in different mutation status. Its capabilities enable SMDIC to identify somatic mutation-specific immune cell response. SMDIC may contribute to find neoantigens to facilitate the development of tailored immunotherapy for patients with cancer.
This vignette illustrates how to easily use the SMDIC package. With the use of functions in this packages, users could identify the immune cells driven by somatic mutations in tumor microenvironment.
This package provides the exp2cell
function use gene expression profile to quantify cell abundance matrix.
This package provides the maf2matrix
function use mutation annotation file (MAF) format data to build a binary mutations matrix.
This package provides the mutcorcell
function to identify the immune cells driven by somatic mutations in tumor microenvironment .
This package provides the plotwaterfall
function to plot the waterfall for mutation genes which drive immune cells.
This package provides the plotinteractions
function to plot the co-occurrence and mutual exclusivity plots for mutation genes which drive immune cells.
This package provides the heatmapcell
function to draw clustered heatmaps for the cells driven by a somatic mutation.
This package provides the survcell
function to draw Kaplan–Meier curves for survival in the above-median and below-median groups for cell risk score.
We can use function GetExampleData
to return example data and environment variables.
library(SMDIC)
library(GSVA)
exp.example<-GetExampleData("exp.example") # obtain example expression data
cellmatrix.example<-exp2cell(exp.example,method="ssGSEA") #Cell abundance matrix,method must be one of "xCell","ssGSEA" and "CIBERSORT".
#> Warning in .local(expr, gset.idx.list, ...): 659 genes with constant
#> expression values throuhgout the samples.
#view first six rows and six colmns of cell infiltration score matrix.
cellmatrix.example[1:6,1:6]
#> TCGA.A7.A13G.01 TCGA.A7.A26E.01 TCGA.A1.A0SQ.01
#> aDC -0.1721896 -0.2365829 -0.09769543
#> B cells -0.1785598 -0.2060056 -0.21060641
#> CD8 T cells 0.2603276 0.2939651 0.32180621
#> Cytotoxic cells -0.1710623 -0.1744474 -0.13125948
#> DC -0.2110966 -0.2248506 -0.16642146
#> Eosinophils 0.1456574 0.1297275 0.16402532
#> TCGA.GI.A2C9.11 TCGA.E9.A1NE.01 TCGA.A2.A0SX.01
#> aDC -0.11102476 0.30829474 0.255985640
#> B cells -0.15950503 -0.06773615 -0.007057109
#> CD8 T cells 0.34343045 0.42561320 0.337348145
#> Cytotoxic cells 0.04400304 0.29537554 0.125879324
#> DC 0.16974204 -0.01435208 0.109405174
#> Eosinophils 0.12760198 0.13450722 0.068610081
We extract the non-silent somatic mutations (nonsense mutation, missense mutation, frame-shif indels, splice site, nonstop mutation, translation start site, inframe indels) in protein-coding regions, and built a binary mutations matrix, in which 1 represents any mutation occurs in a particular gene in a particular sample, otherwise the element is 0. The genes with a given mutation frequency greater than a threshold value one precent as the default value) are retained for the following analysis.
maf <- system.file("extdata","example.maf.gz",package = "SMDIC") #get path of the mutation annotation file.
mutmatrix.example<-maf2matrix(maf)
head(mutmatrix.example)[,1:6]
#> TCGA.E9.A22G.01 TCGA.OL.A6VO.01 TCGA.AR.A1AP.01 TCGA.BH.A0AW.01
#> TP53 1 1 1 1
#> TP53BP2 0 0 0 0
#> TP53BP1 0 0 0 0
#> PCDH10 0 0 0 0
#> CDH1 0 0 0 0
#> CDH10 0 0 0 0
#> TCGA.E2.A1LG.01 TCGA.A2.A04U.01
#> TP53 1 1
#> TP53BP2 0 0
#> TP53BP1 0 0
#> PCDH10 1 0
#> CDH1 0 0
#> CDH10 0 0
Function mutcorcell
detects the differential immune cells with respect to a particular gene mutation status and construction of two binary matrices based on the abundance matrix of significant differential immune cell (one for up-regulated and one for down-regulated), identification of somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices.
#prepare data for following analysis.
cellmatrix<-GetExampleData("cellmatrix") # obtain example result from real rasult: cell abundance matrix from real data.
mutmatrix<-GetExampleData("mutmatrix")# select mutmatrix example result from real result: a binary mutations matrix
mutcell<-mutcorcell(cell = cellmatrix,mutmatrix = mutmatrix,fisher.adjust = TRUE)
result mutcell include three matrices:mut_cell is a binary numerical matrix which shows the immune cells driven by somatic mutation;mut_cell_p is a numerical matrix which shows pvalue of the immune cells driven by somatic mutation;mut_cell_fdr is a numerical matrix which shows fdr of the immune cells driven by somatic mutation.
#view first ten rows and six colmns of mutcell matrix.
mutcell$mut_cell[1:6,1:6]
#> Adipocytes Astrocytes CD4+ Tcm CD8+ naive T-cells CD8+ T-cells
#> TP53 0 1 0 1 0
#> CDH1 1 0 1 0 0
#> WDR87 0 0 0 0 0
#> NBPF14 0 0 0 0 0
#> KCNA4 0 0 0 0 1
#> FBXW7 0 0 0 0 0
#> CD8+ Tcm
#> TP53 0
#> CDH1 0
#> WDR87 1
#> NBPF14 0
#> KCNA4 1
#> FBXW7 0
#mutcell$mut_cell_p
#mutcell$mut_cell_fdr
Function mutcellsummary
produces result summaries of the results of function mutcorcell
function.
summary<-mutcellsummary(mutcell =mutcell,mutmatrix = mutmatrix,cellmatrix = cellmatrix)# The summary have four columns.The first column are gene names,the second column are the cells driven by the gene,the third column are the number of cells driven by the gene,the fourth column are mutation rates of gene.
head(summary)
#> gene
#> 1 TP53
#> 2 CDH1
#> 3 NBPF14
#> 4 KCNA4
#> 5 FBXW7
#> 6 WDR87
#> cells
#> 1 Astrocytes,CD8+ naive T-cells,Keratinocytes,Macrophages,Macrophages M1,Melanocytes,NK cells,pDC,Pericytes,pro B-cells,Sebocytes,Tgd cells,Th1 cells,Th2 cells,Tregs
#> 2 Adipocytes,CD4+ Tcm,Chondrocytes,CMP,Endothelial cells,Fibroblasts,HSC,ly Endothelial cells,Megakaryocytes,Mesangial cells,mv Endothelial cells,NKT
#> 3 CD8+ Tem,DC,iDC,Keratinocytes,pro B-cells,Sebocytes
#> 4 CD8+ T-cells,CD8+ Tcm,NK cells,pDC
#> 5 MEP,Th1 cells,Th2 cells
#> 6 CD8+ Tcm,NK cells
#> count mut rate
#> 1 15 0.34394251
#> 2 12 0.12936345
#> 3 6 0.01334702
#> 4 4 0.01129363
#> 5 3 0.01642710
#> 6 2 0.01334702
Function gene2cellsummary
produces result summaries of the immune cells driven by a somatic mutation.
gene2cellsummary(gene="TP53",method="xCell",mutcell = mutcell) #a matrix shows the short name, full name, pvalue, fdr of the cells driven by a somatic mutation
#> gene cell name
#> Astrocytes "TP53" "Astrocytes"
#> CD8+ naive T-cells "TP53" "CD8+ naive T-cells"
#> Keratinocytes "TP53" "Keratinocytes"
#> Macrophages "TP53" "Macrophages"
#> Macrophages M1 "TP53" "Macrophages M1"
#> Melanocytes "TP53" "Melanocytes"
#> NK cells "TP53" "NK cells"
#> pDC "TP53" "pDC"
#> Pericytes "TP53" "Pericytes"
#> pro B-cells "TP53" "pro B-cells"
#> Sebocytes "TP53" "Sebocytes"
#> Tgd cells "TP53" "Tgd cells"
#> Th1 cells "TP53" "Th1 cells"
#> Th2 cells "TP53" "Th2 cells"
#> Tregs "TP53" "Tregs"
#> Full name pvalue
#> Astrocytes "Astrocytes" "0.000576847321847722"
#> CD8+ naive T-cells "CD8+ naive T-cells" "0.0207077108310764"
#> Keratinocytes "Keratinocytes" "2.81193482643837e-07"
#> Macrophages "Macrophages" "1.31914206783442e-07"
#> Macrophages M1 "Macrophages M1" "7.10462234101335e-09"
#> Melanocytes "Melanocytes" "0.000676773784821099"
#> NK cells "NK cells" "0.0280379829917283"
#> pDC "Plasmacytoid dendritic cells" "0.000210994256282543"
#> Pericytes "Pericytes" "0.282545533851344"
#> pro B-cells "pro B-cells" "0.00460846572401249"
#> Sebocytes "Sebocytes" "1.51070589090759e-09"
#> Tgd cells "Gamma delta T-cells" "0.101167582186044"
#> Th1 cells "Type 1 T-helper cells" "2.78201500461677e-07"
#> Th2 cells "Type 2 T-helper cells" "8.8599596664488e-13"
#> Tregs "Regulatory T-cells" "0.00266066364119057"
#> fdr
#> Astrocytes "1.51701337800958e-06"
#> CD8+ naive T-cells "0.0196735352426572"
#> Keratinocytes "2.15581670026942e-06"
#> Macrophages "0.00138653368414243"
#> Macrophages M1 "1.08937542562205e-07"
#> Melanocytes "0.0449933285137109"
#> NK cells "0.00301325632695189"
#> pDC "0.0101992106245639"
#> Pericytes "0.00301325632695189"
#> pro B-cells "0.0163068787157365"
#> Sebocytes "3.47462354908745e-08"
#> Tgd cells "0.0101992106245639"
#> Th1 cells "2.15581670026942e-06"
#> Th2 cells "4.07558144656645e-11"
#> Tregs "0.00311315941017706"
Function heatmapcell
provides visualization of the relative abundance of identified immune cells in the gene mutation status using heat maps
library(pheatmap)
heatmapcell(gene = "TP53",mutcell = mutcell,cellmatrix = cellmatrix,mutmatrix = mutmatrix)
Function plotwaterfal
and plotinteractions
provides visualization of the mutation genes correlated with immune cells using waterfall plot and mutually exclusive or co-occurring plot.
#file<-"dir" #dir must be an absolute path or the name relatived to the current working directory.
#mutoncoplot(maffile = file,mutcell.summary = summary,cellnumcuoff =0)
#mutinteractions(maffile = file,mutcell.summary = summary,cellnumcuoff =0)
Function survcell
draws Kaplan–Meier curves for survival in the above-median and below-median groups for cell risk score. The cell risk score is calaulated by the weighted mean of cells driven by a gene mutation, where the weight of cells is estimated by the “Univariate” or “Multivariate” cox.
mutcell<-GetExampleData("mutcell") # The result of `mutcorcell` function.
cellmatrix<-GetExampleData("cellmatrix") # Cell abundance matrix
surv<-GetExampleData("surv") # The survival data
survcell(gene ="TP53",mutcell=mutcell,cellmatrix=cellmatrix,surv=surv)