Signac and SPRING: Learning CD56 NK cells from multi-modal analysis of CITE-seq PBMCs from 10X Genomics

Mathew Chamberlain

2021-11-18

This vignette shows how to use SignacX with Seurat and SPRING to learn a new cell type category from single cell data.

Load data

We start with CITE-seq data that were already classified with SignacX using the SPRING pipeline.

library(Seurat)
library(SignacX)

Load the 10X Genomics CITE-seq data from the SPRING directory, together with the cell type labels that SignacX assigned.

# load CITE-seq data
data.dir = './CITESEQ_EXPLORATORY_CITESEQ_5K_PBMCS/FullDataset_v1_protein'
E = CID.LoadData(data.dir = data.dir)

# Load labels
json_data = rjson::fromJSON(file=paste0(data.dir,'/categorical_coloring_data.json'))

Create a Seurat object for the protein expression data; we will use this as a reference.

# separate protein and gene expression data
logik = grepl("Total", rownames(E))
P = E[logik,]
E = E[!logik,]

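# use cell indices as column names so both matrices share the same cell identifiers
colnames(P) <- 1:ncol(P)
colnames(E) <- 1:ncol(E)
# build the Seurat object from gene expression and attach protein counts as an "ADT" assay
reference <- CreateSeuratObject(E)
reference[["ADT"]] <- CreateAssayObject(counts = P)
# CLR normalization in Seurat
reference <- NormalizeData(reference, assay = "ADT", normalization.method = "CLR")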
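The two steps above (splitting antibody rows from gene rows, then centered log-ratio normalization) can be illustrated on a toy matrix in base R. This is only a sketch: the matrix `toy`, its feature names and the helper `clr` are hypothetical, and the formula is the textbook centered log-ratio with a pseudocount of 1, which may differ in detail from what Seurat's `NormalizeData` computes.

```r
# Toy count matrix: rows are features, columns are cells (hypothetical values).
toy <- matrix(
  c(  5,   0,   3,
    120,  40, 900,
     10, 300,  25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE1", "CD56_TotalSeqB", "CD16_TotalSeqB"),
                  c("c1", "c2", "c3"))
)

# Split antibody rows from gene rows by a name pattern, as done above with "Total".
is_protein <- grepl("Total", rownames(toy))
toy_protein <- toy[is_protein, , drop = FALSE]
toy_genes <- toy[!is_protein, , drop = FALSE]

# Textbook centered log-ratio with a pseudocount of 1, applied to each feature
# across cells: log1p(x) minus the mean of log1p(x).
clr <- function(x) log1p(x) - mean(log1p(x))
toy_clr <- t(apply(toy_protein, 1, clr))
```

After this transformation each row of `toy_clr` is centered at zero, so values are comparable across antibodies with very different raw count ranges.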

Identify CD56 bright NK cells based on protein expression data: cells already labeled NK with high CD56 (log2 count above 10) and low CD16 (log2 count below 7.5).

# generate labels 
lbls = json_data$CellStates$label_list
lbls[lbls != "NK"] = "Unclassified"
CD16 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD16-TotalSeqB-CD16",]
CD56 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD56-TotalSeqB-CD56",]
logik = log2(CD56) > 10 & log2(CD16) < 7.5 & lbls == "NK"; sum(logik)
lbls[logik] = "NK.CD56bright"
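The gating logic can be checked on a handful of cells. The sketch below uses hypothetical counts and labels (`cd56`, `cd16`, `labels` are illustration-only names) with the same thresholds as the code above.

```r
# Hypothetical raw antibody counts and existing labels for six cells.
cd56 <- c(2048, 1500, 4096, 300, 5000, 2000)
cd16 <- c(  64,   50,  512,  20,  100,   30)
labels <- c("NK", "NK", "NK", "NK", "T.CD8", "NK")

# Same thresholds as above: log2(CD56) > 10 and log2(CD16) < 7.5, NK cells only.
is_bright <- log2(cd56) > 10 & log2(cd16) < 7.5 & labels == "NK"
labels[is_bright] <- "NK.CD56bright"
```

Here cells 1, 2 and 6 are relabeled. Cell 3 fails the CD16 cutoff, cell 4 fails the CD56 cutoff, and cell 5 passes both cutoffs but is not an NK cell. On the raw scale the thresholds correspond to more than 1024 CD56 counts and fewer than about 181 CD16 counts.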

SignacX

Generate a training data set from the reference data and save it for later use. Note:

SignacBoot performs feature selection, bootstrapping, imputation and normalization to derive a training data set from single cell data.

# generate bootstrapped single cell data
R_learned = SignacBoot(E = E, spring.dir = data.dir, L = c("NK", "NK.CD56bright"), labels = lbls, logfc.threshold = 1)

# save the training data
save(R_learned, file = "training_NKBright_v207.rda")
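To give an intuition for the bootstrapping step only, here is a generic sketch of resampling cells with replacement within each label. It is not SignacBoot's implementation: the per-class balancing, the variable names and `n_per_class` are assumptions made for illustration, and feature selection, imputation and normalization are not shown.

```r
set.seed(1)  # make the resampling reproducible

# Hypothetical labels for ten cells, with an unbalanced minority class.
labels <- c(rep("NK", 7), rep("NK.CD56bright", 3))
n_per_class <- 5  # hypothetical number of bootstrap draws per class

# For each class, draw cell indices with replacement so that both classes
# contribute the same number of training examples.
cells_by_class <- split(seq_along(labels), labels)
boot_idx <- unlist(lapply(cells_by_class, function(idx) {
  idx[sample.int(length(idx), n_per_class, replace = TRUE)]
}), use.names = FALSE)

boot_labels <- labels[boot_idx]
table(boot_labels)
```

Because sampling is done with replacement, a rare class such as the three CD56 bright cells here can still supply as many training examples as the abundant class.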

Classify a new data set with the model

Load expression data for a different data set, which was also previously processed through SPRING and SignacX.

# Classify another data set with new model
# load new data
new.data.dir = "./PBMCs_5k_10X/FullDataset_v1"
E = CID.LoadData(data.dir = new.data.dir)
# load cell types identified with Signac
json_data = rjson::fromJSON(file=paste0(new.data.dir,'/categorical_coloring_data.json'))

Generate new labels. Note:

Signac trains an ensemble of 100 neural network classifiers using the new training data set built above (R_learned), and then classifies unseen data (E).

# generate new labels
cr_learned = Signac(E = E, R = R_learned, spring.dir = new.data.dir)
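One common way to combine an ensemble of classifiers into a single label per cell is a majority vote. The sketch below shows that idea with five hypothetical classifiers and three cells. Whether Signac aggregates its 100 networks in exactly this way is not stated here, so treat this as an assumption; `votes` and `majority_vote` are illustration-only names.

```r
# Hypothetical predictions: one row per cell, one column per classifier.
votes <- rbind(
  c("NK", "NK", "NK.CD56bright", "NK", "NK"),
  c("NK.CD56bright", "NK.CD56bright", "NK.CD56bright", "NK", "NK.CD56bright"),
  c("NK", "NK.CD56bright", "NK.CD56bright", "NK.CD56bright", "NK")
)

# Majority vote: pick the most frequent label in each row.
majority_vote <- function(x) names(which.max(table(x)))
consensus <- apply(votes, 1, majority_vote)
```

With these votes the first cell is called NK (4 of 5 votes) and the other two are called NK.CD56bright (4 of 5 and 3 of 5 votes).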

Now we amend the existing labels (classified previously with SignacX); we add the new labels and generate a new SPRING layout. Note:

We usually copy the existing SPRING files from “FullDataset_v1” to “FullDataset_v1_Learned” to generate a new layout while preserving the existing layout.
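The copy described in this note could be scripted from R. The following is a minimal sketch using base R file functions; `copy_spring_dir` is a hypothetical helper, and the demonstration runs on a temporary directory with a placeholder file rather than on real SPRING output.

```r
# Hypothetical helper: copy every file and subfolder in `from` into a new directory `to`.
copy_spring_dir <- function(from, to) {
  dir.create(to, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(from, full.names = TRUE)
  file.copy(files, to, recursive = TRUE, overwrite = TRUE)
}

# Demonstration on a temporary directory with a placeholder JSON file.
old_dir <- file.path(tempdir(), "FullDataset_v1")
dir.create(old_dir, showWarnings = FALSE)
writeLines("{}", file.path(old_dir, "categorical_coloring_data.json"))
copied <- copy_spring_dir(old_dir, paste0(old_dir, "_Learned"))
```

The original directory is left untouched, so the existing layout and labels remain available alongside the new "_Learned" copy.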

# modify the existing labels
cr = lapply(json_data, function(x) x$label_list)
logik = cr$CellStates == 'NK'
cr$CellStates[logik] = cr_learned[logik]
logik = cr$CellStates_novel == 'NK'
cr$CellStates_novel[logik] = cr_learned[logik]
new.data.dir = paste0(new.data.dir, "_Learned")
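The effect of the label update is easiest to see on a few cells. In the sketch below, `cr_toy` and `learned_toy` are hypothetical stand-ins for `cr` and `cr_learned`; only cells whose existing label is "NK" receive the new prediction, and all other cells keep their original label.

```r
# Hypothetical existing labels for five cells, stored per category as in `cr`.
cr_toy <- list(
  CellStates = c("NK", "T.CD4", "NK", "B", "NK"),
  CellStates_novel = c("NK", "T.CD4", "NK", "B", "Unclassified")
)
# Hypothetical output of the new classifier for the same five cells.
learned_toy <- c("NK.CD56bright", "NK", "NK", "NK.CD56bright", "NK.CD56bright")

# Overwrite a label only where the existing label is "NK".
for (track in names(cr_toy)) {
  is_nk <- cr_toy[[track]] == "NK"
  cr_toy[[track]][is_nk] <- learned_toy[is_nk]
}
```

Note that cells 2 and 4 keep their T.CD4 and B labels even though the new model, which only knows the two NK classes, assigned them an NK label. Restricting the update to existing NK cells is what prevents such spurious calls.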

Save results

# save
dat = CID.writeJSON(cr, spring.dir = new.data.dir, new_colors = c('red'), new_populations = c( 'NK.CD56bright'))

Session Info

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.3    magrittr_2.0.1    formatR_1.7       htmltools_0.5.1.1
##  [5] tools_4.0.3       yaml_2.2.1        stringi_1.5.3     rmarkdown_2.6    
##  [9] highr_0.8         knitr_1.30        stringr_1.4.0     digest_0.6.27    
## [13] xfun_0.20         rlang_0.4.10      evaluate_0.14
