
Cell type classification with SignacX: PBMCs from 10X Genomics

Mathew Chamberlain

2021-11-18

This vignette shows how to use SignacX with Seurat. There are three parts: Seurat pre-processing, SignacX classification, and visualization. We use an example PBMC scRNA-seq data set from 10X Genomics; the downloaded data and all results are kept in a local folder, fls.

Seurat

Start with the standard pre-processing steps for a Seurat object.

library(Seurat)

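Create the fls folder and download the data from 10X Genomics.

dir.create("fls", showWarnings = FALSE)
# mode = "wb" keeps the binary .h5 file intact on Windows
download.file("https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5", 
    destfile = "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5", mode = "wb")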
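Re-running the vignette downloads the file again each time. A minimal sketch of a download-once helper using only base R; `fetch_once` is a hypothetical name, not a function from Seurat or SignacX:

```r
# Hypothetical helper (not part of Seurat or SignacX): download `url` to
# `destfile` unless a copy is already on disk.
fetch_once <- function(url, destfile) {
  # Create the target folder (e.g. "fls") if it does not exist yet
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(destfile)) {
    # mode = "wb" writes in binary mode, which matters on Windows
    download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  }
  invisible(destfile)
}

# Usage with the 10X file from this vignette:
# fetch_once(
#   "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5",
#   "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5"
# )
```

Checking file.exists() only prevents repeated downloads; it does not detect a truncated file left behind by an interrupted download.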

Create a Seurat object, and then perform SCTransform normalization.

# load data (Read10X_h5 needs the hdf5r package)
E <- Read10X_h5(filename = "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5")
pbmc <- CreateSeuratObject(counts = E, project = "pbmc")

# run sctransform
pbmc <- SCTransform(pbmc, verbose = FALSE)
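SCTransform replaces log-normalization with Pearson residuals from a regularized negative binomial model of UMI counts. The base-R sketch below illustrates only the residual idea, under loud simplifying assumptions: one fixed overdispersion (theta = 100) for every gene and a simple analytic estimate of the expected counts. `pearson_residuals` is a hypothetical helper and will not reproduce SCTransform's values.

```r
# Simplified sketch of Pearson residuals for a genes x cells count matrix.
# Assumptions: a single fixed overdispersion `theta` for all genes and the
# analytic mean estimate below; SCTransform instead fits regularized
# per-gene negative binomial regressions, so its values will differ.
pearson_residuals <- function(counts, theta = 100) {
  counts <- as.matrix(counts)
  # Expected count per gene and cell: gene share of all UMIs x cell depth
  mu <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  resid <- (counts - mu) / sqrt(mu + mu^2 / theta)
  # An all-zero gene or cell has mu = 0 and carries no information
  resid[mu == 0] <- 0
  # Clip extreme values to +/- sqrt(number of cells)
  clip <- sqrt(ncol(counts))
  pmin(pmax(resid, -clip), clip)
}

# Toy data: 20 genes x 10 cells of Poisson counts
set.seed(1)
toy <- matrix(rpois(200, lambda = 2), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
res <- pearson_residuals(toy)
```

Dividing by the model's standard deviation, rather than taking a log, is what lets highly and lowly expressed genes contribute on a comparable scale.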

Perform dimensionality reduction by PCA, compute a UMAP embedding, and build the nearest-neighbor graph.

# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)
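FindNeighbors builds a nearest-neighbor graph of cells in PCA space (here from the first 30 dimensions). A brute-force base-R sketch of the k-nearest-neighbor step on random data; `knn_from_pca` is a hypothetical helper, and Seurat's implementation differs (for example, it also computes a shared-nearest-neighbor graph):

```r
# Brute-force k-nearest-neighbour sketch (hypothetical helper): PCA on cells,
# then Euclidean distances in the top PCs. Not Seurat's implementation.
knn_from_pca <- function(expr, n_pcs = 10, k = 5) {
  # prcomp expects observations in rows, so transpose genes x cells
  pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
  n_pcs <- min(n_pcs, ncol(pca$x))
  emb <- pca$x[, seq_len(n_pcs), drop = FALSE]
  d <- as.matrix(dist(emb))
  diag(d) <- Inf  # a cell is not its own neighbour
  nn <- lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
  # One row per cell, holding the indices of its k nearest cells
  matrix(unlist(nn), ncol = k, byrow = TRUE,
         dimnames = list(rownames(emb), NULL))
}

# Random stand-in for normalized expression: 50 genes x 30 cells
set.seed(2)
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:30)))
nn <- knn_from_pca(expr, n_pcs = 10, k = 5)
```

Computing all pairwise distances costs memory quadratic in the number of cells, which is why real tools use smarter neighbor searches for large data sets.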

SignacX

First, make sure you have the SignacX package installed.

install.packages("SignacX")

Load the library

# load library
library(SignacX)

Generate cell type labels for the Seurat object with Signac, then convert them into annotations with GenerateLabels.

# Run Signac
labels <- Signac(pbmc, num.cores = 4)
celltypes <- GenerateLabels(labels, E = pbmc)

Training the neural networks can take a long time. To make Signac faster, we implemented SignacFast, which uses an ensemble of pre-trained neural network models instead of training new ones.

# Run SignacFast
labels_fast <- SignacFast(pbmc)
celltypes_fast <- GenerateLabels(labels_fast, E = pbmc)

SignacFast took only ~30 seconds. Relative to Signac, the main difference is that SignacFast tends to leave a few more cells “Unclassified.”

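How does SignacFast compare to Signac?

The table cross-tabulates the labels assigned by the two methods (rows: Signac; columns: SignacFast).

               B     MPh   TNK   Unclassified
B              186   0     0     0
MPh            0     362   0     54
TNK            0     0     573   3
Unclassified   0     0     0     44

All disagreements are cells that Signac labels MPh (54) or TNK (3) but SignacFast leaves Unclassified.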
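The agreement in the comparison table can be summarized in a few lines of base R. This sketch copies the counts from the table and assumes rows are Signac labels and columns are SignacFast labels. In a live session the table itself could presumably be built with table(), e.g. from the L2 labels, but which label level was used is an assumption.

```r
# Counts copied from the comparison table
# (assumption: rows = Signac labels, columns = SignacFast labels).
# With live results, something like the following (assuming L2 labels):
# cmp <- table(Signac = celltypes$L2, SignacFast = celltypes_fast$L2)
lv <- c("B", "MPh", "TNK", "Unclassified")
cmp <- matrix(c(186,   0,   0,  0,
                  0, 362,   0, 54,
                  0,   0, 573,  3,
                  0,   0,   0, 44),
              nrow = 4, byrow = TRUE,
              dimnames = list(Signac = lv, SignacFast = lv))

total <- sum(cmp)                     # cells compared
agreement <- sum(diag(cmp)) / total   # fraction with identical labels
unclassified <- c(Signac = sum(cmp["Unclassified", ]),
                  SignacFast = sum(cmp[, "Unclassified"]))
round(agreement, 3)   # 0.953
unclassified          # Signac 44, SignacFast 101
```

With these counts, the two methods give identical labels for 1165 of 1222 cells (about 95%).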

Visualizations

Now we can visualize the cell type classifications at several levels, starting with immune versus nonimmune cells.

pbmc <- AddMetaData(pbmc, metadata = celltypes$Immune, col.name = "immune")
pbmc <- SetIdent(pbmc, value = "immune")
png(filename = "fls/plot1.png")
print(DimPlot(pbmc))  # print() is needed when the script is not run interactively
dev.off()

Figure (fls/plot1.png): Immune, nonimmune (if any) and unclassified cells

pbmc <- AddMetaData(pbmc, metadata = celltypes$L2, col.name = "L2")
pbmc <- SetIdent(pbmc, value = "L2")
png(filename = "fls/plot2.png")
print(DimPlot(pbmc))
dev.off()

Figure (fls/plot2.png): Myeloid and lymphocytes

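# Order the levels alphabetically; factor(levels = ...) reorders without
# renaming, whereas assigning to levels() would relabel cells by position
ct <- as.character(celltypes$CellTypes)
lbls <- factor(ct, levels = sort(unique(ct)))
pbmc <- AddMetaData(pbmc, metadata = lbls, col.name = "celltypes")
pbmc <- SetIdent(pbmc, value = "celltypes")
png(filename = "fls/plot3.png")
print(DimPlot(pbmc))
dev.off()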

Figure (fls/plot3.png): Cell types
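Setting factor levels needs care in R: assigning to levels() renames levels by position, whereas factor(x, levels = ...) only changes their order. A small base-R illustration on a made-up label vector:

```r
# Relabelling vs. reordering factor levels, on a made-up label vector
x <- c("TNK", "B", "MPh", "TNK")

# factor() already sorts the levels: "B", "MPh", "TNK"
f <- factor(x)

# Assigning to levels() renames levels by position, which silently
# changes the labels (here B becomes TNK and TNK becomes B)
relabelled <- f
levels(relabelled) <- c("TNK", "MPh", "B")

# Passing `levels =` to factor() changes only the order
reordered <- factor(x, levels = c("TNK", "MPh", "B"))

as.character(relabelled)  # "B" "TNK" "MPh" "B"
as.character(reordered)   # "TNK" "B" "MPh" "TNK"
```

The level order matters for plots because it sets the legend order and color assignment in DimPlot.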

pbmc <- AddMetaData(pbmc, metadata = celltypes$CellTypes_novel, col.name = "celltypes_novel")
pbmc <- SetIdent(pbmc, value = "celltypes_novel")
png(filename = "fls/plot4.png")
print(DimPlot(pbmc))
dev.off()

Figure (fls/plot4.png): Cell types with novel populations

pbmc <- AddMetaData(pbmc, metadata = celltypes$CellStates, col.name = "cellstates")
pbmc <- SetIdent(pbmc, value = "cellstates")
png(filename = "fls/plot5.png")
print(DimPlot(pbmc))
dev.off()

Figure (fls/plot5.png): Cell states

Identify differentially expressed genes between cell types. Here, we see that Signac identified two novel cell populations that are positive for platelet and plasma cell markers, respectively.

pbmc <- SetIdent(pbmc, value = "celltypes_novel")

# Find gene markers for all clusters, and draw a heatmap
library(dplyr)
markers <- FindAllMarkers(pbmc, only.pos = TRUE, verbose = FALSE, logfc.threshold = 1)
# Seurat >= 4.0 reports avg_log2FC (older versions used avg_logFC)
top5 <- markers %>% group_by(cluster) %>% slice_max(order_by = avg_log2FC, n = 5)
png(filename = "fls/plot6.png")
print(DoHeatmap(pbmc, features = unique(top5$gene), angle = 90))
dev.off()

Figure (fls/plot6.png): Immune marker genes
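The top-five selection with dplyr can also be written in base R, which makes the logic explicit. A sketch on a made-up marker table whose column names follow FindAllMarkers() output in Seurat 4 (cluster, gene, avg_log2FC); the fold-change values are invented for illustration, and `top_n_per_cluster` is a hypothetical helper:

```r
# Hypothetical helper: keep the n rows with the largest avg_log2FC per cluster
top_n_per_cluster <- function(markers, n = 5) {
  per_cluster <- split(markers, markers$cluster)
  top <- lapply(per_cluster, function(df) {
    df <- df[order(df$avg_log2FC, decreasing = TRUE), , drop = FALSE]
    head(df, n)
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

# Made-up marker table (fold changes are invented for illustration)
markers <- data.frame(
  cluster = c("B", "B", "B", "TNK", "TNK"),
  gene = c("MS4A1", "CD79A", "CD74", "CD3E", "IL7R"),
  avg_log2FC = c(3.1, 2.8, 1.2, 2.5, 1.9),
  stringsAsFactors = FALSE
)
top2 <- top_n_per_cluster(markers, n = 2)
top2$gene  # "MS4A1" "CD79A" "CD3E" "IL7R"
```

Unlike dplyr's slice_max() and top_n(), which keep ties by default, this sketch cuts at exactly n rows per cluster.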

Save results

saveRDS(pbmc, file = "fls/pbmcs_signac.rds")
saveRDS(celltypes, file = "fls/celltypes.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast.rds")
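To resume the analysis later without rerunning Signac, the saved files can be read back with readRDS(); load Seurat before working with the restored Seurat object. A minimal round-trip check on a stand-in object, written to a temporary file rather than to fls:

```r
# Restoring the real results (paths from the saveRDS() calls above):
# pbmc <- readRDS("fls/pbmcs_signac.rds")
# celltypes <- readRDS("fls/celltypes.rds")
# celltypes_fast <- readRDS("fls/celltypes_fast.rds")

# Round trip with a stand-in object in a temporary file
demo <- list(CellTypes = c("B", "TNK", "MPh"))
path <- tempfile(fileext = ".rds")
saveRDS(demo, file = path)
restored <- readRDS(path)
identical(restored, demo)  # TRUE
```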

Session Info
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.3    magrittr_2.0.1    formatR_1.7       htmltools_0.5.1.1
##  [5] tools_4.0.3       yaml_2.2.1        stringi_1.5.3     rmarkdown_2.6    
##  [9] highr_0.8         knitr_1.30        stringr_1.4.0     digest_0.6.27    
## [13] xfun_0.20         rlang_0.4.10      evaluate_0.14
