ComBatQC

Introduction

The need for larger samples in human subjects research has led to a trend of aggregating data across multiple locations (sites). This trend is especially prevalent in neuroimaging research. However, while larger samples provide greater power to detect significant associations and better generalizability of results, multiple-site study designs also introduce heterogeneity in acquisition and processing, which can in turn affect study findings.

ComBat is the major harmonization technique used in neuroimaging, and the ComBat Family extends the original ComBat methodology to enable flexible covariate modeling, leveraging efficient R implementations of regression models. However, it remains a challenge to evaluate potential batch effects and the performance of harmonization. ComBatFamQC provides an R Shiny visualization tool for interactive batch effect diagnostics before and after harmonization. To streamline the harmonization process and improve efficiency, transparency, and reproducibility, ComBatFamQC also provides default harmonization methods by integrating the ComBatFamily package.

The ComBatFamQC visualization includes three key functions:

visual_prep: provides all relevant statistical test results for batch effect visualization and evaluation.
combat_harm: provides default harmonization methods from the ComBatFamily package.
comfam_shiny: generates interactive visualizations through R Shiny.

ComBatFamQC includes the following harmonization methods:

ComBat (Johnson et al., 2007)
ComBat-GAM (Pomponio et al., 2020)
Longitudinal ComBat (Beer et al., 2020)
CovBat (Chen et al., 2021)

Set up

Install the ComBatFamQC package and read in the data set to be harmonized. ADNI data is used in this vignette for illustration. Note that the data set should include at least the following columns:

batch column
feature columns (make sure univariate column is excluded)
covariate columns (essential for gam model)
random effect column (essential for lmer model)
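
Before running diagnostics, it can help to confirm that the required columns are present. The following is a minimal sketch under stated assumptions: check_columns() is a hypothetical helper (it is not part of ComBatFamQC), and the toy data frame and its column names are invented for illustration.

```r
# Hypothetical toy data standing in for a real data set such as adni.
set.seed(1)
toy <- data.frame(
  subid   = rep(1:10, each = 2),          # random effect column
  manufac = rep(c("A", "B"), times = 10), # batch column
  AGE     = round(runif(20, 60, 90)),     # covariate column
  roi_1   = rnorm(20),                    # feature columns
  roi_2   = rnorm(20)
)

# check_columns() is a hypothetical helper, not a ComBatFamQC function.
# It reports requested columns that are missing and feature columns that
# are not numeric.
check_columns <- function(df, batch, features, covariates = NULL, random = NULL) {
  needed      <- c(batch, features, covariates, random)
  missing     <- setdiff(needed, colnames(df))
  present     <- intersect(features, colnames(df))
  non_numeric <- present[!vapply(df[present], is.numeric, logical(1))]
  list(missing = missing, non_numeric = non_numeric)
}

res <- check_columns(toy, batch = "manufac", features = c("roi_1", "roi_2"),
                     covariates = "AGE", random = "subid")
str(res)
```

Both elements of the returned list are empty when the data frame is ready to use.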

library(ComBatFamQC)
library(dplyr)
data(adni)

Interactive Batch Effect Diagnostics

Three types of regression models can be considered for batch effect evaluation:

lm: Linear regression model, which assumes that the relationship between the variables is linear.
gam: Generalized additive model, which models the relationship between the dependent variable and certain independent variables as a smooth, non-linear function, typically using splines.
lmer: Linear mixed-effects model, which extends the linear regression model to account for both fixed effects and random effects. It is commonly used in longitudinal datasets.
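
To make the lm case concrete, the sketch below tests for an additive batch effect on one simulated feature by comparing nested linear models. It is an illustration only: the data are simulated, the names sim, fit_null, and fit_batch are invented here, and this nested-model F test is not necessarily the test that visual_prep runs internally.

```r
set.seed(42)
n <- 200
sim <- data.frame(
  batch = factor(rep(c("A", "B"), each = n / 2)),
  AGE   = runif(n, 60, 90)
)
# Simulated feature: linear age effect plus an additive shift of 1.5 in batch B.
sim$roi <- 0.05 * sim$AGE + 1.5 * (sim$batch == "B") + rnorm(n)

# Nested models: covariates only versus covariates plus batch.
fit_null  <- lm(roi ~ AGE, data = sim)
fit_batch <- lm(roi ~ AGE + batch, data = sim)

# F test for whether adding batch improves the fit.
batch_test <- anova(fit_null, fit_batch)
p_value    <- batch_test[["Pr(>F)"]][2]
p_value
coef(fit_batch)["batchB"]  # estimated shift for batch B
```

The same comparison could be set up with mgcv::gam (smooth terms) or lme4::lmer (random effects) in place of lm, which is the role the type argument appears to play in visual_prep.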

(Note: Windows users should set cores = 1 in the visual_prep function. The MDMR test can be time-consuming, especially on large datasets; it can be disabled by setting mdmr = FALSE.)
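
The core count from the note can also be chosen programmatically. This is a minimal sketch: the variable names n_detected and cores are illustrative, and leaving one core free on non-Windows systems is an assumption of this sketch rather than a package recommendation.

```r
# Use a single core on Windows, as the note advises. Elsewhere, leave one
# core free but never go below 1. detectCores() can return NA, which is
# also mapped to 1.
n_detected <- parallel::detectCores()
cores <- if (.Platform$OS.type == "windows" || is.na(n_detected)) {
  1L
} else {
  max(1L, n_detected - 1L)
}
cores

# The value would then be passed on, for example (not run here):
# visual_prep(type = "lm", features = features, batch = batch,
#             covariates = covariates, df = adni,
#             cores = cores, mdmr = FALSE)
```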

Linear Regression Model

features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
result_orig <- visual_prep(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni)
comfam_shiny(result_orig)

Generalized Additive Model

result_gam <- visual_prep(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni)
comfam_shiny(result_gam)

Linear Mixed-Effects Model

result_lmer <- visual_prep(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni)
comfam_shiny(result_lmer)

Export Batch Effect Diagnosis Result

There are two export options: 1) generate a Quarto report (requires Quarto to be installed), and 2) generate a combined Excel file.

Generate a Quarto report

#library(quarto)
temp_dir <- tempfile()
dir.create(temp_dir)
diag_save(path = temp_dir, result = result_lmer, use_quarto = TRUE)

Generate a combined Excel file

diag_save(path = temp_dir, result = result_lmer, use_quarto = FALSE)

Harmonization Using Default ComBatFamily Methods

There are two types of harmonization scenarios users can choose from:

First-time Harmonization (interactive harmonization through R Shiny is also available)
Out of Sample Harmonization
- predict from existing ComBat model (works only for original ComBat and ComBat-GAM)
- harmonize new data toward existing reference data (works for all built-in ComBat harmonization methods)

First-time Harmonization

Specify parameters carefully based on the harmonization method to be applied.

Original ComBat

features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
## Harmonize using evaluation results as the inputs
combat_model <- combat_harm(result = result_orig, type = "lm", interaction = interaction, smooth = NULL, random = NULL, df = adni)
## Harmonize through specifying features, batch, covariates and df arguments
combat_model_copy <- combat_harm(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni)
## Expect to get the same harmonization results
identical(combat_model$harmonized_df, combat_model_copy$harmonized_df)

# save harmonized data
write.csv(combat_model$harmonized_df, file.path(temp_dir, "harmonized.csv"))

# save combat model
saveRDS(combat_model$combat.object, file.path(temp_dir, "combat_model.rds"))
# Clean up the temporary directory
unlink(temp_dir, recursive = TRUE)

Longitudinal ComBat

## Harmonize using evaluation results as the inputs
combat_model_lmer <- combat_harm(result = result_lmer, type = "lmer", interaction = interaction, smooth = NULL, random = "subid", df = adni)
## Harmonize through specifying features, batch, covariates and df arguments
combat_model_lmer_copy <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni)
## Expect to get the same harmonization results
identical(combat_model_lmer$harmonized_df, combat_model_lmer_copy$harmonized_df)

ComBat-GAM

## Harmonize using evaluation results as the inputs
combat_model_gam <- combat_harm(result = result_gam, type = "gam", interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni)
## Harmonize through specifying features, batch, covariates and df arguments
combat_model_gam_copy <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni)
## Expect to get the same harmonization results
identical(combat_model_gam$harmonized_df, combat_model_gam_copy$harmonized_df)

CovBat

## Harmonize using evaluation results as the inputs
covbat_model <- combat_harm(result = result_gam, type = "gam", interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni, family = "covfam")
## Harmonize through specifying features, batch, covariates and df arguments
covbat_model_copy <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni, family = "covfam")
## Expect to get the same harmonization results
identical(covbat_model$harmonized_df, covbat_model_copy$harmonized_df)

Out of Sample Harmonization

From ComBat Model

Set the predict parameter to TRUE and the object parameter to the saved ComBat model.

saved_model <- combat_model_gam$combat.object
harm_predict <- combat_harm(df = adni %>% head(1000), predict = TRUE, object = saved_model)
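
In a fresh R session the model would typically be read back from disk rather than taken from memory. The sketch below shows the saveRDS/readRDS round trip with a stand-in object. The list stand_in and the path model_path are invented for illustration and do not reflect the structure of a real combat.object.

```r
# Stand-in for combat_model_gam$combat.object (structure is hypothetical).
stand_in <- list(
  note      = "stand-in for a saved ComBat model",
  estimates = c(a = 1, b = 2)
)

# Save to a temporary file and read it back.
model_path <- tempfile(fileext = ".rds")
saveRDS(stand_in, model_path)
restored <- readRDS(model_path)

# The restored object should match the original exactly.
identical(stand_in, restored)

# Remove the temporary file.
unlink(model_path)

# The restored model would then be supplied as the object argument (not run here):
# harm_predict <- combat_harm(df = new_data, predict = TRUE, object = restored)
```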

From Reference Data

Set the reference parameter to the saved reference data. Note that the reference data should have the same columns as the new data, and the new data should contain the reference data as a subsample.
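
The two requirements on the reference data can be checked before harmonizing. This is a minimal sketch on toy data: toy_new, toy_ref, key_cols, and row_key() are all hypothetical names, and matching rows on identifier columns only is an assumption here, since harmonized feature values in the reference will differ from the raw values in the new data.

```r
set.seed(7)
# Toy "new" data: three sites, one feature.
toy_new <- data.frame(
  subid = 1:12,
  site  = rep(c("s1", "s2", "s3"), each = 4),
  roi_1 = rnorm(12)
)
# Toy reference: rows from two sites, with feature values altered to mimic
# data that has already been harmonized.
toy_ref <- toy_new[toy_new$site %in% c("s1", "s2"), ]
toy_ref$roi_1 <- toy_ref$roi_1 - 0.1

# Requirement 1: identical columns.
same_columns <- identical(colnames(toy_ref), colnames(toy_new))

# Requirement 2: every reference observation also appears in the new data.
# Rows are matched on identifier columns only (an assumption of this sketch).
key_cols <- c("subid", "site")
row_key  <- function(df, cols) do.call(paste, c(df[cols], sep = "|"))
is_subsample <- all(row_key(toy_ref, key_cols) %in% row_key(toy_new, key_cols))

c(same_columns = same_columns, is_subsample = is_subsample)
```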

# harmonize reference data
reference_site <- adni %>% group_by(site) %>% summarize(count = n()) %>% arrange(desc(count)) %>% pull(site) %>% head(30)
reference_df <- adni %>% filter(site %in% reference_site)
features <- colnames(reference_df)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "site"
ref_model <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = reference_df)

# harmonize new data to the reference data
harm_new <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni, reference = ref_model$harmonized_df)
