Quick start to Harmony

Korsunsky et al.: Fast, sensitive, and accurate integration of single cell data with Harmony

Introduction

Harmony is an algorithm for integrating single-cell genomics datasets. For details, see our latest manuscript in Nature Methods.

Installation

Install Harmony from CRAN with standard commands.

Code
install.packages('harmony')

Once Harmony is installed, load it up!

Code
library(harmony)

Integrating cell line datasets from 10X

The example below follows Figure 2 in the manuscript.

We downloaded three cell line datasets from the 10X website. The first two (jurkat and t293) come from pure cell lines, while the half dataset is a 50:50 mixture of Jurkat and HEK293T cells. We inferred cell type with the canonical marker XIST, which is expressed in female but not male cells, since the two cell lines come from one male and one female donor.

We library-normalized the cells, log-transformed the counts, and scaled the genes. Then we performed PCA and kept the top 20 PCs. The PCA embeddings and metadata are available as part of this package.

Code
data(cell_lines)
V <- cell_lines$scaled_pcs
meta_data <- cell_lines$meta_data
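
The preprocessing described above (library normalization, log transformation, gene scaling, PCA) can be sketched in base R. This is only an illustration on a simulated count matrix, not the exact pipeline that produced cell_lines$scaled_pcs: the toy data, the scale factor of 10,000, the pseudocount, and all object names are assumptions.

```r
set.seed(1)

# Hypothetical toy count matrix: 200 genes x 60 cells (not the 10X data)
counts <- matrix(
    rpois(200 * 60, lambda = 5), nrow = 200,
    dimnames = list(paste0("gene", 1:200), paste0("cell", 1:60))
)

# Library normalization: rescale every cell to the same total (assumed 10,000)
lib_size <- colSums(counts)
norm <- sweep(counts, 2, lib_size, "/") * 1e4

# Log transform with a pseudocount of 1
log_norm <- log1p(norm)

# Scale each gene to mean 0 and unit variance across cells
scaled <- t(scale(t(log_norm)))

# PCA with cells as observations; keep the top 20 PCs
pca <- prcomp(t(scaled), center = FALSE, scale. = FALSE, rank. = 20)
toy_pcs <- pca$x
dim(toy_pcs)
```

The result has one row per cell and one column per PC, which is the same orientation as the V matrix loaded above.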

Initially, the cells cluster by both dataset (left) and cell type (right).

Code
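library(ggplot2)
library(dplyr)  # provides the %>% pipe used in do_scatter()

## Scatter plot of the first two embedding dimensions, colored by a metadata column
do_scatter <- function(xy, meta_data, label_name, base_size = 12) {
    palette_use <- c(`jurkat` = '#810F7C', `t293` = '#D09E2D',`half` = '#006D2C')
    xy <- xy[, 1:2]
    colnames(xy) <- c('X1', 'X2')
    plt_df <- xy %>% data.frame() %>% cbind(meta_data)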
    plt <- ggplot(plt_df, aes(X1, X2, col = !!rlang::sym(label_name), fill = !!rlang::sym(label_name))) + 
        theme_test(base_size = base_size) +
        guides(color = guide_legend(override.aes = list(stroke = 1, alpha = 1,
                                                        shape = 16, size = 4))) +
        scale_color_manual(values = palette_use) +
        scale_fill_manual(values = palette_use) +
        theme(plot.title = element_text(hjust = .5)) +
        labs(x = "PC 1", y = "PC 2") +
        theme(legend.position = "none") +
        geom_point(shape = '.')
    
    ## Add labels
    data_labels <- plt_df %>%
        dplyr::group_by(!!rlang::sym(label_name)) %>%
        dplyr::summarise(X1 = mean(X1), X2 = mean(X2)) %>%
        dplyr::ungroup()
    plt + geom_label(data = data_labels, aes(label = !!rlang::sym(label_name)), 
                            color = "white", size = 4)
}
p1 <- do_scatter(V, meta_data, 'dataset') + 
    labs(title = 'Colored by dataset')
p2 <- do_scatter(V, meta_data, 'cell_type') + 
    labs(title = 'Colored by cell type')

cowplot::plot_grid(p1, p2)

Let’s run Harmony to remove the influence of dataset-of-origin from the cell embeddings.

Code
harmony_embeddings <- harmony::RunHarmony(
    V, meta_data, 'dataset', verbose=FALSE
)
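
Besides plotting, the effect of the correction can be checked numerically. The sketch below is one possible check, not part of the Harmony API: centroid_gap() is a hypothetical helper that measures, for one cell type, the largest distance between the per-dataset centroids in the embedding. It assumes the meta_data columns dataset and cell_type used elsewhere in this vignette, and it repeats the loading and RunHarmony steps so that it can be run on its own.

```r
library(harmony)

data(cell_lines)
V <- cell_lines$scaled_pcs
meta_data <- cell_lines$meta_data
harmony_embeddings <- RunHarmony(V, meta_data, 'dataset', verbose = FALSE)

# The corrected embedding should keep the input shape (cells x PCs)
stopifnot(identical(dim(harmony_embeddings), dim(V)))

# Hypothetical helper: largest distance between the per-dataset centroids
# of a single cell type
centroid_gap <- function(emb, meta, type) {
    keep <- meta$cell_type == type
    groups <- split(which(keep), as.character(meta$dataset[keep]))
    centroids <- t(sapply(groups, function(idx) colMeans(emb[idx, , drop = FALSE])))
    max(dist(centroids))
}

# Compare the gap before and after integration for every cell type
cell_types <- unique(as.character(meta_data$cell_type))
gaps <- sapply(cell_types, function(type) {
    c(before = centroid_gap(V, meta_data, type),
      after  = centroid_gap(harmony_embeddings, meta_data, type))
})
gaps
```

If the integration worked as intended, the "after" row should be smaller than the "before" row, since cells of the same type from different datasets move closer together.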

After Harmony, the datasets are mixed (left) while the cell types remain separate (right).

Code
p1 <- do_scatter(harmony_embeddings, meta_data, 'dataset') + 
    labs(title = 'Colored by dataset')
p2 <- do_scatter(harmony_embeddings, meta_data, 'cell_type') + 
    labs(title = 'Colored by cell type')
cowplot::plot_grid(p1, p2, nrow = 1)

Next Steps

Interfacing to software packages

You can also run Harmony as part of an established pipeline in several packages, such as Seurat. For these vignettes, please visit our GitHub page.

Detailed breakdown of the Harmony algorithm

For more details on how each part of Harmony works, consult our more detailed vignette “Detailed Walkthrough of Harmony Algorithm”.

Session Info

Code
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS/LAPACK: /home/main/miniconda3/envs/Renv/lib/libopenblasp-r0.3.21.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.2.0    ggrepel_0.9.3      ggthemes_4.2.4     lubridate_1.9.2   
##  [5] forcats_1.0.0      stringr_1.5.0      purrr_1.0.2        readr_2.1.4       
##  [9] tidyr_1.3.0        tibble_3.2.1       ggplot2_3.5.1      tidyverse_2.0.0   
## [13] data.table_1.14.8  cowplot_1.1.3      dplyr_1.1.4        Seurat_5.0.1      
## [17] SeuratObject_5.0.1 sp_1.6-1           harmony_1.2.3      Rcpp_1.0.12       
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.16             colorspace_2.1-0       deldir_1.0-9          
##   [4] ellipsis_0.3.2         ggridges_0.5.4         RcppHNSW_0.6.0        
##   [7] spatstat.data_3.0-1    farver_2.1.2           leiden_0.4.3          
##  [10] listenv_0.9.0          RSpectra_0.16-1        fansi_1.0.6           
##  [13] codetools_0.2-19       splines_4.2.0          cachem_1.0.7          
##  [16] knitr_1.42             polyclip_1.10-4        spam_2.10-0           
##  [19] jsonlite_1.8.7         RhpcBLASctl_0.23-42    ica_1.0-3             
##  [22] cluster_2.1.4          png_0.1-8              uwot_0.1.16           
##  [25] spatstat.sparse_3.0-1  shiny_1.7.4            sctransform_0.4.1     
##  [28] compiler_4.2.0         httr_1.4.5             Matrix_1.6-3          
##  [31] fastmap_1.1.1          lazyeval_0.2.2         cli_3.6.2             
##  [34] later_1.3.0            htmltools_0.5.6.1      tools_4.2.0           
##  [37] igraph_1.6.0           dotCall64_1.1-1        gtable_0.3.5          
##  [40] glue_1.7.0             RANN_2.6.1             reshape2_1.4.4        
##  [43] scattermore_1.2        jquerylib_0.1.4        vctrs_0.6.5           
##  [46] nlme_3.1-162           spatstat.explore_3.2-1 progressr_0.13.0      
##  [49] lmtest_0.9-40          spatstat.random_3.1-5  xfun_0.40             
##  [52] globals_0.16.2         timechange_0.2.0       mime_0.12             
##  [55] miniUI_0.1.1.1         lifecycle_1.0.4        irlba_2.3.5.1         
##  [58] goftest_1.2-3          future_1.32.0          MASS_7.3-58.3         
##  [61] zoo_1.8-12             scales_1.3.0           hms_1.1.3             
##  [64] promises_1.2.0.1       spatstat.utils_3.0-5   parallel_4.2.0        
##  [67] RColorBrewer_1.1-3     yaml_2.3.7             reticulate_1.29       
##  [70] pbapply_1.7-0          gridExtra_2.3          sass_0.4.5            
##  [73] stringi_1.7.12         highr_0.10             fastDummies_1.7.3     
##  [76] rlang_1.1.3            pkgconfig_2.0.3        matrixStats_1.0.0     
##  [79] evaluate_0.22          lattice_0.20-45        tensor_1.5            
##  [82] ROCR_1.0-11            labeling_0.4.3         htmlwidgets_1.6.2     
##  [85] tidyselect_1.2.1       parallelly_1.36.0      RcppAnnoy_0.0.22      
##  [88] plyr_1.8.8             magrittr_2.0.3         R6_2.5.1              
##  [91] generics_0.1.3         withr_3.0.0            pillar_1.9.0          
##  [94] fitdistrplus_1.1-11    abind_1.4-5            survival_3.5-5        
##  [97] future.apply_1.11.0    KernSmooth_2.23-21     utf8_1.2.4            
## [100] spatstat.geom_3.2-1    plotly_4.10.2          tzdb_0.4.0            
## [103] rmarkdown_2.21         grid_4.2.0             digest_0.6.33         
## [106] xtable_1.8-4           httpuv_1.6.9           munsell_0.5.1         
## [109] viridisLite_0.4.2      bslib_0.4.2
