Olink® NPX datasets are normalized using either plate control normalization or intensity normalization. The intensity normalization method assumes that all samples within a project are fully randomized.
The joint analysis of two or more Olink® NPX datasets often requires an additional batch correction step to remove technical variation. This step is referred to as bridging.
Bridging is needed if Olink® NPX datasets are:
- plate control normalized only, and run conditions (e.g. lab and reagent lots) have changed.
- intensity normalized, but from two different sample populations.
To bridge two or more Olink® NPX datasets, bridging samples are needed to calculate adjustment factors between the datasets. Bridging samples are samples that are present in both datasets (overlapping samples). The recommended number of bridging samples for Explore 1536 datasets is between 8 and 16. Olink® NPX datasets without overlapping samples cannot be combined for joint analysis using the bridging approach described below.
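As a quick sanity check before bridging, the number of overlapping samples can be counted and compared with the recommended range. The following is a minimal base R sketch on made-up SampleID vectors; `ids_1`, `ids_2`, and the control sample names are hypothetical stand-ins for the `SampleID` columns of two real NPX tables.

```r
# Toy SampleID vectors (hypothetical); real data would use df$SampleID.
ids_1 <- c(paste0("A", 1:20), "CONTROL_SAMPLE_AS 1", "CONTROL_SAMPLE_AS 2")
ids_2 <- c(paste0("A", 11:30), "CONTROL_SAMPLE_AS 1", "CONTROL_SAMPLE_AS 2")

# Samples present in both datasets, excluding external control samples.
shared <- intersect(ids_1, ids_2)
bridge_ids <- shared[!grepl("CONTROL_SAMPLE", shared, fixed = TRUE)]

n_bridge <- length(bridge_ids)
# Compare against the 8 to 16 range recommended for Explore 1536 datasets.
enough_bridges <- n_bridge >= 8 && n_bridge <= 16
```

In this toy case ten samples (A11 to A20) overlap, so the count falls inside the recommended range.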
The following tutorial is designed to give you an overview of the kinds of data combining methods that are possible using the Olink® bridging procedure.
library(OlinkAnalyze)
library(dplyr)
library(stringr)
The bridging objects are standard Olink® NPX tables. They can be loaded with the read_NPX() function, using an NPX Manager output file as input.
data1 <- read_NPX("~/NPX_file1_location.xlsx")
data2 <- read_NPX("~/NPX_file2_location.xlsx")
To demonstrate how bridging works, we will use the example datasets (npx_data1 and npx_data2) from the OlinkAnalyze package.
We can use the olink_normalization() function to bridge two datasets. The bridging procedure first calculates, for each assay, the median of the paired NPX differences between the bridging samples. This median is the adjustment factor for that assay. The adjustment factors are then used to adjust the NPX values between the two datasets. The output of olink_normalization() is an NPX table with the adjusted NPX values in the column NPX.
# Find overlapping samples
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")
overlap_samples <- intersect(npx_1$SampleID, npx_2$SampleID) %>%
  data.frame() %>%
  filter(!str_detect(., 'CONTROL_SAMPLE')) %>% # Remove control samples
  pull(.)
# Perform Bridging normalization
npx_br_data <- olink_normalization(df1 = npx_1,
                                   df2 = npx_2,
                                   overlapping_samples_df1 = overlap_samples,
                                   df1_project_nr = '20200001',
                                   df2_project_nr = '20200002',
                                   reference_project = '20200001')
glimpse(npx_br_data)
#> Rows: 61,824
#> Columns: 19
#> $ SampleID <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "CONTROL…
#> $ Index <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ OlinkID <chr> "OID01216", "OID01216", "OID01216", "OID01216", "OID0121…
#> $ UniProt <chr> "O00533", "O00533", "O00533", "O00533", "O00533", "O0053…
#> $ Assay <chr> "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", …
#> $ MissingFreq <dbl> 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.…
#> $ Panel_Version <chr> "v.1201", "v.1201", "v.1201", "v.1201", "v.1201", "v.120…
#> $ PlateID <chr> "Example_Data_1_CAM.csv", "Example_Data_1_CAM.csv", "Exa…
#> $ QC_Warning <chr> "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", …
#> $ LOD <dbl> 2.368467, 2.368467, 2.368467, 2.368467, 2.368467, 2.3684…
#> $ NPX <dbl> 12.956143, 11.269477, 25.451070, 14.453038, 7.628712, 6.…
#> $ Subject <chr> "ID1", "ID1", "ID1", "ID2", "ID2", "ID2", "ID3", "ID3", …
#> $ Treatment <chr> "Untreated", "Untreated", "Untreated", "Untreated", "Unt…
#> $ Site <chr> "Site_D", "Site_D", "Site_D", "Site_C", "Site_C", "Site_…
#> $ Time <chr> "Baseline", "Week.6", "Week.12", "Baseline", "Week.6", "…
#> $ Project <chr> "20200001", "20200001", "20200001", "20200001", "2020000…
#> $ Panel <chr> "Olink Cardiometabolic", "Olink Cardiometabolic", "Olink…
#> $ dataset <chr> "data1", "data1", "data1", "data1", "data1", "data1", "d…
#> $ Adj_factor <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
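The Adj_factor column in the output above holds the adjustment applied to each row, which is 0 for the reference project. To make the calculation concrete, here is a minimal base R sketch of a per-assay median-of-paired-differences adjustment on made-up data. It is an illustration, not the package's internal code. The table names (`ref`, `other`), the NPX values, and the sign convention (reference minus non-reference, added to the non-reference dataset) are assumptions.

```r
# Toy long-format NPX tables (hypothetical values), one row per sample/assay.
# Both tables contain the same three bridging samples.
ref <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(5.0, 6.0, 7.0, 2.0, 3.0, 4.0)
)
other <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(4.5, 5.4, 6.5, 2.5, 3.5, 4.8)
)

# Pair each bridging sample's measurements across the two datasets.
paired <- merge(ref, other, by = c("SampleID", "OlinkID"),
                suffixes = c("_ref", "_other"))
paired$diff <- paired$NPX_ref - paired$NPX_other

# Per-assay adjustment factor: median of the paired differences.
adj <- aggregate(diff ~ OlinkID, data = paired, FUN = median)
names(adj)[2] <- "Adj_factor"

# Shift the non-reference dataset; the reference is left unchanged.
other_adj <- merge(other, adj, by = "OlinkID")
other_adj$NPX <- other_adj$NPX + other_adj$Adj_factor
```

Using the median rather than the mean keeps a single outlying bridging sample from dominating the adjustment factor for an assay.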
PCA plots are used to visualize sample-to-sample distances before and after bridging.
## before bridging
### Generate unique SampleIDs
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1") %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2") %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))
npx_before_br <- rbind(npx_1, npx_2)
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_before_br, color_g = "dataset", byPanel = TRUE,
                             coloroption = c("orange", "darkblue"))
PCA plot of combined datasets without bridging
## After bridging
### Generate unique SampleIDs
npx_after_br <- npx_br_data %>%
  mutate(SampleID = paste0(dataset, PlateID, SampleID))
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_after_br, color_g = "dataset", byPanel = TRUE,
                             coloroption = c("orange", "darkblue"))
PCA plot of combined datasets after bridging
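The visual comparison can be complemented with a rough numeric check. The sketch below is a toy illustration in base R on simulated data, not part of OlinkAnalyze. It measures how far apart two batches sit along the first principal component before and after a simple per-assay median shift. The simulated offset, the helper `pc1_gap()`, and the correction (which uses all samples rather than dedicated bridging samples) are assumptions made for the example.

```r
set.seed(1)  # reproducible toy data
n <- 30      # samples per batch
p <- 20      # assays

# Two simulated batches drawn from the same population; batch 2 carries
# a constant technical offset of 3 NPX on every assay.
batch1 <- matrix(rnorm(n * p), nrow = n)
batch2 <- matrix(rnorm(n * p), nrow = n) + 3
batch  <- rep(c("data1", "data2"), each = n)

# Gap between the batch means along the first principal component.
pc1_gap <- function(x, batch) {
  pc1 <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1]
  abs(mean(pc1[batch == "data1"]) - mean(pc1[batch == "data2"]))
}

gap_before <- pc1_gap(rbind(batch1, batch2), batch)

# Toy correction: remove the per-assay median difference between batches.
offset <- apply(batch2, 2, median) - apply(batch1, 2, median)
batch2_adj <- sweep(batch2, 2, offset, "-")
gap_after <- pc1_gap(rbind(batch1, batch2_adj), batch)
```

A large gap before correction and a much smaller one afterwards corresponds to the two colored clusters merging in the PCA plots.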