CellDEEP Quick Start


What CellDEEP does

CellDEEP reduces the sparsity of scRNA-seq data by pooling cells into pseudocells before differential expression (DE) testing.
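
As a rough illustration of why pooling helps, the base R sketch below sums simulated counts over groups of three cells and compares the fraction of zeros before and after. The data, the fixed grouping, and all variable names are assumptions made for this example; it does not call CellDEEP.

```r
# Toy illustration only: simulated counts, not CellDEEP's code path.
set.seed(1)
n_genes <- 500
n_cells <- 30

# Sparse count matrix (genes x cells): a low-mean Poisson gives many zeros.
counts <- matrix(rpois(n_genes * n_cells, lambda = 0.2),
                 nrow = n_genes, ncol = n_cells)

# Assign every 3 consecutive cells to one hypothetical pseudocell.
pool_id <- rep(seq_len(n_cells / 3), each = 3)

# Sum counts within each pool; rowsum() groups rows, so transpose twice.
pooled <- t(rowsum(t(counts), group = pool_id))

# Fraction of zero entries before and after pooling.
zero_fraction_cells <- mean(counts == 0)
zero_fraction_pooled <- mean(pooled == 0)
```

Summing keeps the total count unchanged while spreading it over fewer columns, so the pooled matrix has a lower fraction of zeros.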

Load package and example data

library(CellDEEP)
data("sim")

Step 1: Run DE directly with FindMarker.CellDEEP

FindMarker.CellDEEP includes metadata preparation internally. Key parameters to set:

- group_id, sample_id, cluster_id: metadata column names in your Seurat object
- ident.1, ident.2: the two groups to compare
- cell_selection: how to select cells for pooling ("kmean" or "random")
- readcounts: how to aggregate counts in pooled cells ("sum" or "mean")
- min_cells_per_subgroup: minimum cells required in each sample-cluster subgroup for pooling

de.test <- FindMarker.CellDEEP(
  sim,
  group_id = "Status",
  sample_id = "DonorID",
  cluster_id = "cluster_id",
  Pool = TRUE,
  test.use = "wilcox",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  cell_selection = "random",
  readcounts = "sum",
  logfc.threshold = 0.25,
  ident.1 = "Case",
  ident.2 = "Control"
)
#> Start Pooling.....
#> Pooling...
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
#> Pooling summary (random):
#> Input cells: 200
#> Cells kept in pooled pseudocells: 180
#> Cells not kept (approx): 20
#> Skipped empty groups: 0
#> Skipped empty clusters: 0
#> Skipped empty samples: 20
#> Skipped subgroups (<= min_cells_per_subgroup): 0
#> Dropped remainder cells (< n_cells) after random pooling: 20
#> FindMarker running.....
#> 1st ident is:
#> Case
#> 2nd ident is:
#> Control
#> group by:
#> group_id
#> Normalizing layer: counts
#> Finding variable features for layer counts
#> Centering and scaling data matrix
#> For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
#> (default method for FindMarkers) please install the presto package
#> --------------------------------------------
#> install.packages('devtools')
#> devtools::install_github('immunogenomics/presto')
#> --------------------------------------------
#> After installation of presto, Seurat will automatically use the more 
#> efficient implementation (no further action necessary).
#> This message will be shown once per session
#> 20
#> Gene1728Gene1992Gene1626Gene1864Gene1715Gene1807

Step 2: Pool cells only (optional)

Use these functions if you want pooled objects without running DE immediately.

min_cells_per_subgroup is the minimum number of cells that each sample_id × cluster_id subgroup must contain before it is pooled. The pooling summary reports skipped subgroups under "Skipped subgroups (<= min_cells_per_subgroup)".
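
To see which subgroups a given threshold would affect, you can tabulate cells per sample and cluster before pooling. The sketch below uses a small hypothetical metadata table in base R. The "at or below the threshold" comparison is an assumption taken from the wording of the pooling summary, not from CellDEEP's source.

```r
# Hypothetical metadata; column names mirror the standardized fields.
meta <- data.frame(
  sample_id  = c("D1", "D1", "D1", "D1", "D2", "D2", "D2"),
  cluster_id = c("c1", "c1", "c1", "c2", "c1", "c1", "c2")
)
min_cells_per_subgroup <- 1

# Count cells in every sample_id x cluster_id combination.
subgroup_sizes <- as.data.frame(table(meta$sample_id, meta$cluster_id),
                                stringsAsFactors = FALSE)
names(subgroup_sizes) <- c("sample_id", "cluster_id", "n_cells")

# Assumption: subgroups at or below the threshold are skipped, as the
# "<= min_cells_per_subgroup" wording in the pooling summary suggests.
subgroup_sizes$skipped <- subgroup_sizes$n_cells <= min_cells_per_subgroup
```

In this toy table the two single-cell subgroups (D1/c2 and D2/c2) are flagged, while D1/c1 and D2/c1 would proceed to pooling.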

The pooling functions expect standardized metadata fields (sample_id, group_id, cluster_id), so map your own columns to them once with prepare_data before pooling:

pool_input <- prepare_data(
  sim,
  sample_id = "DonorID",
  group_id = "Status",
  cluster_id = "cluster_id"
)

K-means pooling

pooled_kmean <- CellDEEP.Kmean(
  pool_input,
  readcounts = "sum",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
#> Pooling...
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
#> Drop out cell number during kmean pooling is:
#> 24
#> Pooling summary (kmean):
#> Input cells: 200
#> Cells kept in pooled pseudocells: 176
#> Cells not kept (approx): 24
#> Skipped empty groups: 0
#> Skipped empty clusters: 0
#> Skipped empty samples: 20
#> Skipped subgroups (<= min_cells_per_subgroup): 0
#> Dropped singleton cells after kmeans split: 24
pooled_kmean
#> An object of class Seurat 
#> 2000 features across 56 samples within 1 assay 
#> Active assay: RNA (2000 features, 0 variable features)
#>  1 layer present: counts
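
The general idea behind k-means pooling can be sketched in base R for a single subgroup: cluster the cells, drop clusters that contain only one cell, and sum counts within the remaining clusters. This is a minimal sketch on simulated data. The log transform, the choice of k, and the variable names are assumptions, and CellDEEP.Kmean may differ in every one of these details.

```r
# Toy sketch of k-means pooling within one subgroup; NOT CellDEEP's code.
set.seed(42)
n_genes <- 50
n_cells_total <- 12
n_cells <- 3  # target pool size, mirroring the n_cells argument

counts <- matrix(rpois(n_genes * n_cells_total, lambda = 1),
                 nrow = n_genes,
                 dimnames = list(paste0("Gene", seq_len(n_genes)),
                                 paste0("Cell", seq_len(n_cells_total))))

# Cluster cells (rows of the transposed, log-transformed matrix).
k <- n_cells_total %/% n_cells
km <- kmeans(t(log1p(counts)), centers = k, nstart = 5)

# Size of the cluster each cell belongs to; singletons are dropped.
cell_cluster_size <- tabulate(km$cluster, nbins = k)[km$cluster]
keep <- cell_cluster_size > 1

# Sum counts per remaining cluster to form pseudocells (genes x pools).
pooled <- t(rowsum(t(counts[, keep, drop = FALSE]),
                   group = km$cluster[keep]))
dropped_cells <- sum(!keep)
```

Because k-means does not enforce equal cluster sizes, the pseudocells produced this way can contain different numbers of cells.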

Random pooling

pooled_random <- CellDEEP.Random(
  pool_input,
  readcounts = "sum",
  n_cells = 5,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
#> Pooling...
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
#> Pooling summary (random):
#> Input cells: 200
#> Cells kept in pooled pseudocells: 160
#> Cells not kept (approx): 40
#> Skipped empty groups: 0
#> Skipped empty clusters: 0
#> Skipped empty samples: 20
#> Skipped subgroups (<= min_cells_per_subgroup): 0
#> Dropped remainder cells (< n_cells) after random pooling: 40
pooled_random
#> An object of class Seurat 
#> 2000 features across 32 samples within 1 assay 
#> Active assay: RNA (2000 features, 0 variable features)
#>  1 layer present: counts
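
In the run above, the 160 kept cells at n_cells = 5 correspond to 160 / 5 = 32 pseudocells, which matches the 32 samples in the pooled object. The base R sketch below shows one way such random pooling could work within a single subgroup: shuffle the cells, fill complete pools of n_cells, and drop the remainder. It is an illustration on simulated data, not CellDEEP.Random's implementation. Treating readcounts = "mean" as the per-pool sum divided by n_cells is an assumption.

```r
# Toy sketch of random pooling within one subgroup; NOT CellDEEP's code.
set.seed(7)
n_genes <- 20
subgroup_size <- 13
n_cells <- 5

counts <- matrix(rpois(n_genes * subgroup_size, lambda = 1), nrow = n_genes)

# Shuffle cells and keep only as many as fill complete pools.
n_pools <- subgroup_size %/% n_cells
shuffled <- sample(subgroup_size)
kept <- shuffled[seq_len(n_pools * n_cells)]
pool_id <- rep(seq_len(n_pools), each = n_cells)

# "sum"-style aggregation: add counts within each pool (genes x pools).
pooled_sum <- t(rowsum(t(counts[, kept, drop = FALSE]), group = pool_id))

# "mean"-style aggregation (assumed to be the sum divided by pool size).
pooled_mean <- pooled_sum / n_cells

# Cells that did not fit into a complete pool.
n_dropped <- subgroup_size - length(kept)
```

With 13 cells and n_cells = 5, this forms 2 pools and drops 3 remainder cells, the same kind of loss reported on the "Dropped remainder cells" line of the pooling summary.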

If no genes pass the adjusted p-value filter in this small example dataset, try a larger dataset or set full_list = TRUE.
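
A small dataset can leave the filtered list empty because multiple-testing correction scales p-values by the number of genes tested. The sketch below uses made-up raw p-values, a Bonferroni correction over 2000 genes, and a 0.05 cutoff. The correction method and the cutoff are assumptions for illustration and may not match what CellDEEP applies.

```r
# Illustration with made-up p-values; method and cutoff are assumptions.
n_genes <- 2000
p_raw <- c(1e-4, 5e-4, 0.003, 0.02)  # hypothetical top genes

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_adj <- p.adjust(p_raw, method = "bonferroni", n = n_genes)

# None of these pass a 0.05 cutoff after adjustment.
passes <- p_adj < 0.05
```

Here even the smallest raw p-value (1e-4) adjusts to 0.2, so no gene survives the filter. Inspecting the unfiltered results shows how close the top genes came.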
