Olink® NPX datasets are normalized using either plate control normalization or intensity normalization. The intensity normalization method assumes that all samples within a project are fully randomized.
The joint analysis of two or more Olink® NPX datasets often requires an additional batch correction step to remove technical variation. This step is referred to as bridging.
Bridging is needed if the Olink® NPX datasets are:
plate control normalized only, and the run conditions (e.g., lab or reagent lots) have changed.
intensity normalized, but come from two different sample populations.
To bridge two or more Olink® NPX datasets, bridging samples are needed to calculate the assay-specific adjustment factors between datasets. Bridging samples are samples shared among datasets, that is, samples that were analysed in each of the datasets to be combined. The recommended number of bridging samples is shown in the table below. Olink® NPX datasets without shared samples should not be combined using the bridging approach described below.
Platform | Bridging samples |
---|---|
Target 96 | 8-16 |
Explore 1536 | 8-16 |
Explore Expansion | 16-24 |
The following tutorial gives an overview of how datasets can be combined using the Olink® bridging procedure. Before starting, it is important to check that the bridging samples were assigned the same sample IDs in every dataset.
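The sample ID check described above can be sketched in base R as follows. This is a minimal illustration on toy data, not part of the Olink® workflow itself: the data frames, their values, and the `min_bridges` threshold are hypothetical, and the threshold should be set to the lower bound recommended for your platform.

```r
# Toy stand-ins for two NPX tables (hypothetical values, long format).
data1 <- data.frame(SampleID = c("A1", "A2", "A3", "CONTROL_SAMPLE_AS 1"),
                    NPX = c(5.1, 6.3, 4.8, 7.0))
data2 <- data.frame(SampleID = c("A2", "A3", "B1", "CONTROL_SAMPLE_AS 1"),
                    NPX = c(6.0, 4.5, 5.5, 7.2))

# Sample IDs present in both datasets, excluding external control samples.
shared_ids <- intersect(data1$SampleID, data2$SampleID)
shared_ids <- shared_ids[!grepl("CONTROL_SAMPLE", shared_ids)]

# Hypothetical threshold: lower bound of the recommended range (e.g. 8).
min_bridges <- 8
enough_bridges <- length(shared_ids) >= min_bridges

print(shared_ids)
print(enough_bridges)
```

If fewer shared IDs are found than expected, it is worth checking whether the bridging samples were named differently in the two runs before concluding that they are missing.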
Bridging datasets are standard Olink® NPX tables. They can be loaded using the read_NPX() function with an NPX Manager output file as input.
library(OlinkAnalyze)

data1 <- read_NPX("~/NPX_file1_location.xlsx")
data2 <- read_NPX("~/NPX_file2_location.xlsx")
To demonstrate how bridging works, we will use the example datasets (npx_data1 and npx_data2) from the Olink Analyze package. This workflow also uses functions from the dplyr, stringr, and ggplot2 packages.
Start by exploring the datasets that are going to be bridged. For example, plot and compare the NPX distributions of the datasets.
# Packages used throughout this workflow
library(OlinkAnalyze)
library(dplyr)
library(stringr)
library(ggplot2)

# Load datasets
npx_1 <- npx_data1 %>%
  filter(!str_detect(SampleID, "CONTROL_SAMPLE")) %>% # Remove control samples
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  filter(!str_detect(SampleID, "CONTROL_SAMPLE")) %>% # Remove control samples
  mutate(dataset = "data2")
npx_df <- bind_rows(npx_1, npx_2)

# Plot NPX density before bridging normalization
npx_df %>%
  mutate(Panel = gsub("Olink ", "", Panel)) %>%
  ggplot(aes(x = NPX, fill = dataset)) +
  geom_density(alpha = 0.4) +
  facet_grid(~Panel) +
  olink_fill_discrete(coloroption = c("red", "darkblue")) +
  set_plot_theme() +
  ggtitle("Before bridging normalization: NPX distribution") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.position = "top")
Use a PCA plot to visualize sample-to-sample distances before bridging. At this point, the dataset (project) of origin typically accounts for most of the variation observed in the combined datasets.
## Before bridging

#### Extract bridging samples
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")

overlap_samples <- intersect(npx_1$SampleID, npx_2$SampleID) %>%
  data.frame() %>%
  filter(!str_detect(., "CONTROL_SAMPLE")) %>% # Remove control samples
  pull(.)

### Generate unique SampleIDs
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1",
         SampleID = paste0(dataset, PlateID, SampleID))
br_1 <- npx_data1 %>%
  mutate(dataset = "data1") %>%
  filter(SampleID %in% overlap_samples) %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))

npx_2 <- npx_data2 %>%
  mutate(dataset = "data2",
         SampleID = paste0(dataset, PlateID, SampleID))
br_2 <- npx_data2 %>%
  mutate(dataset = "data2") %>%
  filter(SampleID %in% overlap_samples) %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))

# Label bridging samples separately from the rest of each dataset
npx_before_br <- rbind(npx_1, npx_2) %>%
  mutate(datatype = ifelse(SampleID %in% c(br_1$SampleID, br_2$SampleID),
                           "bridges", dataset))

### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_before_br,
                             color_g = "datatype",
                             byPanel = TRUE)
PCA plot of combined datasets before bridging
We can use the olink_normalization() function to bridge two datasets. The bridging procedure first calculates, for each assay, the median of the paired NPX differences between the bridging samples and uses it as the adjustment factor. These adjustment factors are then used to adjust the NPX values between the two datasets. In this process, one dataset is considered the reference dataset (df1) and its NPX values remain unaltered. The other dataset is considered the new dataset (df2) and is adjusted to the reference dataset based on the adjustment factors.
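To make the calculation concrete, the following base R sketch reproduces the idea on toy data. It is an illustration of the median-of-paired-differences approach described above, not the internal implementation of olink_normalization(); the data frames `ref` and `new`, their values, and the sign convention (reference minus new, added to the new dataset) are assumptions made for the example.

```r
# Toy reference and new datasets in long format (hypothetical values).
ref <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID  = rep(c("OID1", "OID2"), each = 3),
  NPX      = c(5.0, 6.0, 7.0, 2.0, 3.0, 4.0)
)
new <- data.frame(
  SampleID = rep(c("S1", "S2", "S3", "S4"), times = 2),
  OlinkID  = rep(c("OID1", "OID2"), each = 4),
  NPX      = c(4.0, 5.5, 6.0, 8.0, 2.5, 3.5, 4.5, 1.0)
)
bridge_ids <- c("S1", "S2", "S3")

# Pair the bridging samples by sample and assay.
paired <- merge(ref[ref$SampleID %in% bridge_ids, ],
                new[new$SampleID %in% bridge_ids, ],
                by = c("SampleID", "OlinkID"),
                suffixes = c("_ref", "_new"))
paired$npx_diff <- paired$NPX_ref - paired$NPX_new

# One adjustment factor per assay: the median paired difference.
adj <- aggregate(npx_diff ~ OlinkID, data = paired, FUN = median)
names(adj)[names(adj) == "npx_diff"] <- "Adj_factor"

# Shift every sample in the new dataset; the reference stays unaltered.
new_adj <- merge(new, adj, by = "OlinkID")
new_adj$NPX_adjusted <- new_adj$NPX + new_adj$Adj_factor

print(adj)
```

Note that the adjustment is estimated from the bridging samples only but applied to all samples of the new dataset, including those (such as `S4` here) that were never run in the reference dataset.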
The output from the olink_normalization() function is an NPX table with the adjusted NPX values in the column NPX.
olink_normalization() uses the information in the Project column to distinguish the reference dataset from the other dataset. It is up to the user to define which dataset is the reference dataset and to specify the names of the bridging samples.
# Find shared samples
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")

overlap_samples <- intersect(npx_1$SampleID, npx_2$SampleID) %>%
  data.frame() %>%
  filter(!str_detect(., "CONTROL_SAMPLE")) %>% # Remove control samples
  pull(.)

# Perform bridging normalization
npx_br_data <- olink_normalization(df1 = npx_1,
                                   df2 = npx_2,
                                   overlapping_samples_df1 = overlap_samples,
                                   df1_project_nr = "20200001",
                                   df2_project_nr = "20200002",
                                   reference_project = "20200001")
glimpse(npx_br_data)
#> Rows: 61,824
#> Columns: 19
#> $ SampleID <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "CONTROL…
#> $ Index <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ OlinkID <chr> "OID01216", "OID01216", "OID01216", "OID01216", "OID0121…
#> $ UniProt <chr> "O00533", "O00533", "O00533", "O00533", "O00533", "O0053…
#> $ Assay <chr> "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", …
#> $ MissingFreq <dbl> 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.…
#> $ Panel_Version <chr> "v.1201", "v.1201", "v.1201", "v.1201", "v.1201", "v.120…
#> $ PlateID <chr> "Example_Data_1_CAM.csv", "Example_Data_1_CAM.csv", "Exa…
#> $ QC_Warning <chr> "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", …
#> $ LOD <dbl> 2.368467, 2.368467, 2.368467, 2.368467, 2.368467, 2.3684…
#> $ NPX <dbl> 12.956143, 11.269477, 25.451070, 14.453038, 7.628712, 6.…
#> $ Subject <chr> "ID1", "ID1", "ID1", "ID2", "ID2", "ID2", "ID3", "ID3", …
#> $ Treatment <chr> "Untreated", "Untreated", "Untreated", "Untreated", "Unt…
#> $ Site <chr> "Site_D", "Site_D", "Site_D", "Site_C", "Site_C", "Site_…
#> $ Time <chr> "Baseline", "Week.6", "Week.12", "Baseline", "Week.6", "…
#> $ Project <chr> "20200001", "20200001", "20200001", "20200001", "2020000…
#> $ Panel <chr> "Olink Cardiometabolic", "Olink Cardiometabolic", "Olink…
#> $ dataset <chr> "data1", "data1", "data1", "data1", "data1", "data1", "d…
#> $ Adj_factor <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
First, check the NPX distribution in the datasets after bridging normalization.
# Plot NPX density after bridging normalization
npx_br_data %>%
  mutate(Panel = gsub("Olink ", "", Panel)) %>%
  ggplot(aes(x = NPX, fill = dataset)) +
  geom_density(alpha = 0.4) +
  facet_grid(~Panel) +
  olink_fill_discrete(coloroption = c("red", "darkblue")) +
  set_plot_theme() +
  ggtitle("After bridging normalization: NPX distribution") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.position = "top")
Then summarize the number of assays whose adjustment factors fall within given ranges. Large adjustment factors can result from differences between projects, such as panel versions or technical modifications. Such assays can be visualized individually with violin plots and may warrant further investigation to confirm that they are still comparable between projects.
# kbl() and kable_styling() come from the kableExtra package
library(kableExtra)

breaks <- c(-Inf, -1, -0.25, 0.25, 1, Inf)
labels <- c("[< -1]", "(-1,-0.25]", "(-0.25,0.25]", "(0.25,1]", "(>1]")

# reference project defined in bridging normalization
reference_project <- "20200001"

npx_br_data %>%
  mutate(Panel = gsub("Olink ", "", Panel)) %>%
  # remove adjustment factors from reference project.
  filter(Project != reference_project) %>%
  mutate(group = cut(Adj_factor, breaks = breaks,
                     labels = labels)) %>%
  select(OlinkID, Adj_factor, Panel, group) %>%
  distinct() %>%
  group_by(Panel, group) %>%
  tally() %>%
  tidyr::pivot_wider(names_from = group, values_from = n) %>%
  tibble::column_to_rownames(var = "Panel") %>%
  relocate(any_of(labels)) %>%
  kbl(booktabs = TRUE,
      digits = 2,
      caption = paste("Distribution of assays in different ranges of",
                      "adjustment factors")) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE,
                position = "center", latex_options = "HOLD_position")
Panel | [< -1] | (-1,-0.25] | (-0.25,0.25] | (0.25,1] | (>1] |
---|---|---|---|---|---|
Cardiometabolic | 6 | 15 | 36 | 18 | 17 |
Inflammation | 13 | 18 | 39 | 17 | 5 |
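To follow up on individual assays, one option is to list those whose adjustment factor falls in the outermost bins. The sketch below uses a toy table of adjustment factors; the values and the cutoff of 1 NPX are hypothetical and chosen only to match the outermost bins used above, not an Olink® recommendation.

```r
# Hypothetical per-assay adjustment factors for the non-reference project.
adj <- data.frame(
  OlinkID    = c("OID1", "OID2", "OID3", "OID4"),
  Adj_factor = c(0.10, -1.40, 0.80, 1.25)
)

# Same bins as in the summary code above.
breaks <- c(-Inf, -1, -0.25, 0.25, 1, Inf)
labels <- c("[< -1]", "(-1,-0.25]", "(-0.25,0.25]", "(0.25,1]", "(>1]")
adj$group <- cut(adj$Adj_factor, breaks = breaks, labels = labels)

# Assumed cutoff: flag assays shifted by more than 1 NPX in either direction.
flagged <- adj$OlinkID[abs(adj$Adj_factor) > 1]

print(adj)
print(flagged)
```

The flagged OlinkIDs can then be used to filter the bridged NPX table before plotting the per-project distributions of those assays.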
Finally, use a PCA plot to check whether bridging normalization has corrected the batch effects. In the example below, the samples from data1 and data2 form separate clusters before bridging because of batch effects, whereas after bridging they appear as a single cluster. This indicates that bridging normalization has removed most of the batch effect between the two datasets.
## After bridging

### Generate unique SampleIDs
br_ids <- npx_before_br %>%
  select(SampleID, OlinkID, datatype)
npx_after_br <- npx_br_data %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID)) %>%
  left_join(br_ids, by = c("SampleID", "OlinkID"))

### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_after_br,
                             color_g = "datatype",
                             byPanel = TRUE)
PCA plot of combined datasets after bridging
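The visual comparison of the two PCA plots can be complemented with a simple number. The sketch below is one possible way to do this on toy data using base R's prcomp(): it measures the distance between the two datasets' mean scores on the first principal component, before and after a median-based adjustment. The matrices, the batch offset, and the helper `pc1_gap()` are hypothetical and are not part of the Olink Analyze package.

```r
# Toy wide matrix: 4 samples x 3 assays in the reference dataset.
m_ref <- matrix(c(5.0, 5.2, 4.9, 5.1,
                  3.0, 3.1, 2.8, 3.2,
                  7.0, 6.9, 7.2, 7.1),
                nrow = 4,
                dimnames = list(paste0("S", 1:4), c("OID1", "OID2", "OID3")))

# Second dataset: the same samples with a per-assay batch offset added.
offset <- c(OID1 = 3, OID2 = -2, OID3 = 2.5)
m_new <- sweep(m_ref, 2, offset, "+")

# Hypothetical helper: distance between dataset means along PC1.
pc1_gap <- function(a, b) {
  pca <- prcomp(rbind(a, b))  # centered, unscaled PCA
  pc1 <- pca$x[, 1]
  idx_a <- seq_len(nrow(a))
  idx_b <- nrow(a) + seq_len(nrow(b))
  abs(mean(pc1[idx_a]) - mean(pc1[idx_b]))
}

gap_before <- pc1_gap(m_ref, m_new)

# Bridge: median paired difference per assay, added to the new dataset.
adj <- apply(m_ref - m_new, 2, median)
m_bridged <- sweep(m_new, 2, adj, "+")
gap_after <- pc1_gap(m_ref, m_bridged)

print(c(before = gap_before, after = gap_after))
```

In this idealized example every sample is a bridging sample and the batch effect is a pure per-assay shift, so the gap collapses to essentially zero. Real datasets will retain some separation driven by biological differences between the sample populations.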