
Run Canek on a toy example

library(Canek)

# Functions
## Function to plot the PCA coordinates, coloured by a factor of labels
plotPCA <- function(pcaData = NULL, label = NULL, legPosition = "topleft"){
  col <- as.integer(label)
  plot(x = pcaData[,"PC1"], y = pcaData[,"PC2"],
       col = col, cex = 0.75, pch = 19,
       xlab = "PC1", ylab = "PC2")
  # Legend colours follow the factor levels so they always match the legend text
  legend(legPosition,  pch = 19,
         legend = levels(label),
         col = seq_along(levels(label)))
}

Load the data

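In this toy example we use the two simulated batches included in the SimBatches data from the Canek package. SimBatches is a list that contains, among its elements, the simulated batches (batches) and the cell-type labels of their cells (cell_types), both of which are used in the code below.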

lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(c(rep("Batch-1", ncol(lsData[[1]])),
                  rep("Batch-2", ncol(lsData[[2]]))))
celltype <- SimBatches$cell_types
table(batch)
#> batch
#> Batch-1 Batch-2 
#>     631     948
table(celltype)
#> celltype
#> Cell Type 1 Cell Type 2 Cell Type 3 Cell Type 4 
#>        1451          53          38          37
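
Before joining the batches it can help to confirm that they share the same number of genes and that the label vectors cover every cell. The following is a minimal sketch in base R; checkBatches is a hypothetical helper that is not part of Canek, and the toy matrices stand in for the real batches.

```r
# Hypothetical helper (not part of Canek): check that a list of batches can be
# joined column-wise and that a label vector covers every cell.
checkBatches <- function(batches, labels) {
  nGenes <- vapply(batches, nrow, integer(1))
  nCells <- sum(vapply(batches, ncol, integer(1)))
  stopifnot(
    "batches differ in their number of genes (rows)" = length(unique(nGenes)) == 1,
    "labels do not cover every cell (column)" = length(labels) == nCells
  )
  invisible(TRUE)
}

# Toy input: two batches with 5 genes and 3 + 4 cells
toyBatches <- list(B1 = matrix(0, nrow = 5, ncol = 3),
                   B2 = matrix(0, nrow = 5, ncol = 4))
toyLabels <- factor(c(rep("Batch-1", 3), rep("Batch-2", 4)))
checkOk <- checkBatches(toyBatches, toyLabels)
```

With the objects created above, the same check would read checkBatches(lsData, batch) or checkBatches(lsData, celltype).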

PCA before correction

We perform Principal Component Analysis (PCA) on the joined datasets and draw a scatter plot of the first two PCs. The batch effect causes the cells to group by batch rather than by cell type.

data <- Reduce(cbind, lsData)
pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "bottomleft")

plotPCA(pcaData = pcaData, label = celltype, legPosition = "bottomleft")
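
The scatter plots only show the first two PCs, so it can be useful to check how much of the total variance they capture. The sketch below computes this from the sdev element returned by prcomp. It runs on a small simulated matrix (an assumption made only to keep the example self-contained), where a constant shift on half of the cells plays the role of a batch effect.

```r
set.seed(42)
# Toy matrix: 20 genes (rows) x 60 cells (columns);
# the last 30 cells carry a constant shift, mimicking a batch effect
toy <- matrix(rnorm(20 * 60), nrow = 20)
toy[, 31:60] <- toy[, 31:60] + 3

# Same call as in the vignette, but keeping the whole prcomp object
pca <- prcomp(t(toy), center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each PC
varExplained <- pca$sdev^2 / sum(pca$sdev^2)
round(varExplained[1:2], 3)
```

For the real data, the same two lines can be applied to prcomp(t(data), center = TRUE, scale. = TRUE) before extracting $x.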

Run Canek

We correct the toy batches using the function RunCanek. One of the input types it accepts is a list of matrices, so in this example we pass the list created before.

data <- RunCanek(lsData)

PCA after correction

We perform PCA of the corrected datasets and plot the first two PCs. After correction, the cells group by their corresponding cell type.

pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "topleft")

plotPCA(pcaData = pcaData, label = celltype, legPosition = "topleft")
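
The plots give a visual impression of how well the batches mix. As a rough numeric complement, one could measure, for each cell, the fraction of its nearest neighbours in PC space that belong to a different batch. The sketch below is one simple way to do this in base R. batchMixing is a hypothetical helper, not a Canek function, and the toy coordinates are simulated only for illustration.

```r
# Hypothetical helper (not part of Canek): mean fraction of each cell's k
# nearest neighbours (Euclidean distance) that come from a different batch.
batchMixing <- function(pcs, batch, k = 10) {
  d <- as.matrix(dist(pcs))
  diag(d) <- Inf  # a cell is not its own neighbour
  perCell <- vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(batch[nn] != batch[i])
  }, numeric(1))
  mean(perCell)
}

set.seed(1)
toyBatch <- factor(rep(c("Batch-1", "Batch-2"), each = 50))
# Toy coordinates: in "separated" the second batch is shifted along PC1,
# in "mixed" both batches are drawn from the same distribution
separated <- cbind(PC1 = rnorm(100) + rep(c(0, 10), each = 50), PC2 = rnorm(100))
mixed <- cbind(PC1 = rnorm(100), PC2 = rnorm(100))

sepScore <- batchMixing(separated, toyBatch)
mixScore <- batchMixing(mixed, toyBatch)
c(separated = sepScore, mixed = mixScore)
```

A score near zero indicates that neighbours almost always share the same batch, while higher values indicate more mixing. With the objects from this example, batchMixing(pcaData[, c("PC1", "PC2")], batch) could be computed before and after correction and the two values compared. Because the batches here differ in size, the value expected under perfect mixing is not exactly 0.5.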

Session info

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#> 
#> Matrix products: default
#> BLAS/LAPACK: /Users/martin/miniconda3/envs/R_4.1.3/lib/libopenblasp-r0.3.18.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Canek_0.2.5
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.10          highr_0.10           DEoptimR_1.0-11     
#>  [4] bslib_0.4.2          compiler_4.1.3       bluster_1.4.0       
#>  [7] jquerylib_0.1.4      class_7.3-21         prabclus_2.3-2      
#> [10] BiocNeighbors_1.12.0 numbers_0.8-5        tools_4.1.3         
#> [13] digest_0.6.31        mclust_6.0.0         jsonlite_1.8.4      
#> [16] evaluate_0.20        lattice_0.20-45      pkgconfig_2.0.3     
#> [19] rlang_1.0.6          Matrix_1.5-3         igraph_1.3.5        
#> [22] cli_3.6.0            rstudioapi_0.14      yaml_2.3.7          
#> [25] parallel_4.1.3       xfun_0.37            fastmap_1.1.0       
#> [28] knitr_1.42           cluster_2.1.4        sass_0.4.5          
#> [31] S4Vectors_0.32.4     fpc_2.2-10           diptest_0.76-0      
#> [34] nnet_7.3-18          stats4_4.1.3         grid_4.1.3          
#> [37] robustbase_0.95-0    R6_2.5.1             flexmix_2.3-18      
#> [40] BiocParallel_1.28.3  rmarkdown_2.20       irlba_2.3.5.1       
#> [43] kernlab_0.9-32       magrittr_2.0.3       matrixStats_0.63.0  
#> [46] modeltools_0.2-23    htmltools_0.5.4      BiocGenerics_0.40.0 
#> [49] MASS_7.3-58.3        cachem_1.0.6         FNN_1.1.3.1
