The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Benchmarking SignacFast with single cell flow-sorted data

Mathew Chamberlain

2021-11-18

This vignette shows how to use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat. We start with raw counts from this publication.

Load data

Read the CEL-seq2 data.

ReadCelseq <- function(counts.file, meta.file) {
    E = suppressWarnings(readr::read_tsv(counts.file))
    gns <- E$gene
    E = E[, -1]
    E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
    rownames(E) <- gns
    E
}

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))

# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[, kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik, ])/Matrix::colSums(E) * 100
E = E[, MitoFrac < 20]

Seurat

Start with the standard pre-processing steps for a Seurat object.

library(Seurat)

Create a Seurat object, and then perform SCTransform normalization. Note:

# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium)

Perform dimensionality reduction by PCA and UMAP embedding. Note:

# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)

SignacX

library(SignacX)

Generate Signac labels for the Seurat object. Note:

labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)

Sometimes, training the neural networks takes a lot of time. The above classification took 27 minutes. To make a faster method, we implemented SignacFast which uses pre-trained models. Note:

# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)

Compare results:

Celltypes:
B MPh NonImmune Plasma.cells TNK Unclassified
B 681 0 0 0 0 0
MPh 0 835 0 0 0 68
NonImmune 0 0 2487 0 0 0
Plasma.cells 0 0 0 263 0 6
TNK 0 0 0 0 1768 0
Unclassified 0 13 7 0 0 174
Cellstates:
B.memory B.naive Fibroblasts Macrophages Mon.Classical NonImmune Plasma.cells T.CD4.memory T.CD4.naive T.CD8.em T.CD8.naive T.regs Unclassified
B.memory 489 4 0 0 0 0 0 0 0 0 0 0 1
B.naive 4 184 0 0 0 0 0 0 0 0 0 0 0
DC 0 0 0 4 3 0 0 0 0 0 0 0 5
Fibroblasts 0 0 2110 0 0 136 0 0 0 0 0 0 0
Macrophages 0 0 0 662 33 1 0 0 0 0 0 0 73
Mon.Classical 0 0 0 23 93 0 0 0 0 0 0 0 1
NonImmune 0 0 74 0 0 166 0 0 0 0 0 0 2
Plasma.cells 0 0 0 0 0 1 259 0 0 0 0 0 8
T.CD4.memory 0 0 0 0 0 0 0 504 112 17 29 15 0
T.CD4.naive 0 0 0 0 0 0 0 0 309 4 18 0 1
T.CD8.em 0 0 0 0 1 0 0 7 4 574 1 0 2
T.CD8.naive 0 0 0 0 0 0 0 2 1 0 26 2 0
T.regs 0 0 0 0 0 0 0 0 27 1 2 106 0
Unclassified 0 0 1 12 3 5 0 0 0 1 0 0 179

Save results

saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")
Session Info
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.3    magrittr_2.0.1    formatR_1.7       htmltools_0.5.1.1
##  [5] tools_4.0.3       yaml_2.2.1        stringi_1.5.3     rmarkdown_2.6    
##  [9] highr_0.8         knitr_1.30        stringr_1.4.0     digest_0.6.27    
## [13] xfun_0.20         rlang_0.4.10      evaluate_0.14

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.