Introduction to bridging datasets

Compiled: August 31, 2022

Introduction to bridging Olink® NPX datasets

Olink® NPX datasets are normalized using either plate control normalization or intensity normalization. The intensity normalization method assumes that all samples within a project are fully randomized.

The joint analysis of two or more Olink® NPX datasets often requires an additional batch correction step to remove technical variation. This step is referred to as bridging.

Bridging is needed if Olink® NPX datasets are:

To bridge two or more Olink® NPX datasets, bridging samples are needed to calculate adjustment factors between the datasets. Bridging samples are samples that were run in each of the datasets being combined, so they overlap between datasets. The recommended number of bridging samples for Explore 1536 datasets is between 8 and 16. Olink® NPX datasets without overlapping samples cannot be combined for joint analysis using the bridging approach described below.
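As a quick illustration of what "overlapping samples" means in practice, the base R sketch below counts the SampleIDs shared by two datasets and compares the count with the recommended range. The vectors `ids_a` and `ids_b` and the helper `count_bridging_samples()` are hypothetical and only stand in for the SampleID columns of two real NPX tables. Control samples are excluded here on the assumption that they are not bridging samples, which matches what the tutorial code does later.

```r
# Hypothetical SampleID vectors standing in for two NPX datasets.
ids_a <- c(paste0("S", 1:20), "CONTROL_SAMPLE_1")
ids_b <- c(paste0("S", 11:40), "CONTROL_SAMPLE_1")

# Hypothetical helper: number of shared SampleIDs, ignoring control samples.
count_bridging_samples <- function(x, y, control_pattern = "CONTROL_SAMPLE") {
  shared <- intersect(x, y)
  shared <- shared[!grepl(control_pattern, shared, fixed = TRUE)]
  length(shared)
}

n_bridge <- count_bridging_samples(ids_a, ids_b)

# Recommended range for Explore 1536 datasets, as stated in the text: 8 to 16.
in_recommended_range <- n_bridge >= 8 && n_bridge <= 16
```

A count of zero would mean the two datasets cannot be bridged with this approach.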

The following tutorial is designed to give you an overview of the kinds of data combining methods that are possible using the Olink® bridging procedure.

Set up the bridging objects

library(OlinkAnalyze)
library(dplyr)
library(stringr)

The bridging objects are standard Olink® NPX tables. They can be loaded using the read_NPX() function with an NPX Manager output file as input.

data1 <- read_NPX("~/NPX_file1_location.xlsx")
data2 <- read_NPX("~/NPX_file2_location.xlsx")

To demonstrate how bridging works, we will use the example datasets (npx_data1 and npx_data2) from the OlinkAnalyze package.

Perform bridging

We can use the olink_normalization() function to bridge two datasets. The bridging procedure first calculates, for each assay, the median of the paired NPX differences between the bridging samples and uses it as the adjustment factor. These adjustment factors are then used to adjust the NPX values between the two datasets. The output of olink_normalization() is an NPX table with the adjusted NPX values in the column NPX.

           
# Label each dataset so its rows can be traced after bridging
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")

# Find overlapping samples, excluding control samples
overlap_samples <- intersect(npx_1$SampleID, npx_2$SampleID) %>%
  str_subset("CONTROL_SAMPLE", negate = TRUE)

# Perform bridging normalization
npx_br_data <- olink_normalization(df1 = npx_1, 
                                   df2 = npx_2, 
                                   overlapping_samples_df1 = overlap_samples,
                                   df1_project_nr = '20200001',
                                   df2_project_nr = '20200002',
                                   reference_project = '20200001')
glimpse(npx_br_data)
#> Rows: 61,824
#> Columns: 19
#> $ SampleID      <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "CONTROL…
#> $ Index         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ OlinkID       <chr> "OID01216", "OID01216", "OID01216", "OID01216", "OID0121…
#> $ UniProt       <chr> "O00533", "O00533", "O00533", "O00533", "O00533", "O0053…
#> $ Assay         <chr> "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", …
#> $ MissingFreq   <dbl> 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.…
#> $ Panel_Version <chr> "v.1201", "v.1201", "v.1201", "v.1201", "v.1201", "v.120…
#> $ PlateID       <chr> "Example_Data_1_CAM.csv", "Example_Data_1_CAM.csv", "Exa…
#> $ QC_Warning    <chr> "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", …
#> $ LOD           <dbl> 2.368467, 2.368467, 2.368467, 2.368467, 2.368467, 2.3684…
#> $ NPX           <dbl> 12.956143, 11.269477, 25.451070, 14.453038, 7.628712, 6.…
#> $ Subject       <chr> "ID1", "ID1", "ID1", "ID2", "ID2", "ID2", "ID3", "ID3", …
#> $ Treatment     <chr> "Untreated", "Untreated", "Untreated", "Untreated", "Unt…
#> $ Site          <chr> "Site_D", "Site_D", "Site_D", "Site_C", "Site_C", "Site_…
#> $ Time          <chr> "Baseline", "Week.6", "Week.12", "Baseline", "Week.6", "…
#> $ Project       <chr> "20200001", "20200001", "20200001", "20200001", "2020000…
#> $ Panel         <chr> "Olink Cardiometabolic", "Olink Cardiometabolic", "Olink…
#> $ dataset       <chr> "data1", "data1", "data1", "data1", "data1", "data1", "d…
#> $ Adj_factor    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
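To make the median-difference adjustment described above concrete, here is a minimal base R sketch on made-up data. It is not the implementation of olink_normalization(). The data frames `ref` and `other`, the assay IDs, and all NPX values are hypothetical. The sketch also assumes that the reference dataset is left unchanged (consistent with the Adj_factor of 0 shown for the data1 rows above) and that the difference is taken as reference minus other and added to the non-reference dataset.

```r
# Toy long-format NPX tables; all values are made up for illustration.
ref <- data.frame(
  SampleID = rep(c("B1", "B2", "B3", "R1"), times = 2),
  OlinkID  = rep(c("OID_A", "OID_B"), each = 4),
  NPX      = c(5.0, 6.0, 7.0, 5.5, 2.0, 3.0, 4.0, 2.5)
)
other <- data.frame(
  SampleID = rep(c("B1", "B2", "B3", "N1"), times = 2),
  OlinkID  = rep(c("OID_A", "OID_B"), each = 4),
  NPX      = c(4.0, 5.2, 6.0, 4.4, 2.5, 3.5, 4.4, 3.0)
)

# Bridging samples are the SampleIDs present in both tables (B1, B2, B3).
bridge_ids <- intersect(ref$SampleID, other$SampleID)

# Pair the bridging samples per assay and take reference minus other.
paired <- merge(
  ref[ref$SampleID %in% bridge_ids, ],
  other[other$SampleID %in% bridge_ids, ],
  by = c("SampleID", "OlinkID"),
  suffixes = c("_ref", "_other")
)
paired$delta <- paired$NPX_ref - paired$NPX_other

# One adjustment factor per assay: the median paired difference.
adj <- aggregate(delta ~ OlinkID, data = paired, FUN = median)
names(adj)[names(adj) == "delta"] <- "Adj_factor"

# Shift the non-reference dataset; the reference keeps its NPX values.
other_adj <- merge(other, adj, by = "OlinkID")
other_adj$NPX <- other_adj$NPX + other_adj$Adj_factor
```

Using the median rather than the mean keeps a single discordant bridging sample from dominating the adjustment factor of an assay.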

Perform evaluation of bridging normalization

A PCA plot is used to visualize sample-to-sample distances before and after bridging.

## Before bridging

### Generate unique SampleIDs
npx_1 <- npx_data1 %>%
    mutate(dataset = "data1") %>%
    mutate(SampleID = paste0(dataset, PlateID, SampleID))

npx_2 <- npx_data2 %>%
    mutate(dataset = "data2") %>%
    mutate(SampleID = paste0(dataset, PlateID, SampleID))

npx_before_br <- rbind(npx_1, npx_2)

### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_before_br, color_g = "dataset", byPanel = TRUE, coloroption = c("orange",
    "darkblue"))

PCA plot of combined datasets without bridging

## After bridging

### Generate unique SampleIDs
npx_after_br <- npx_br_data %>%
    mutate(SampleID = paste0(dataset, PlateID, SampleID))
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_after_br, color_g = "dataset", byPanel = TRUE, coloroption = c("orange",
    "darkblue"))

PCA plot of combined datasets after bridging
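For readers who want to see the idea behind the before/after comparison without the package, the following base R sketch uses simulated data. It does not reproduce how olink_pca_plot() works. The simulated shifts, the matrix names, the helper `pc1_gap()`, and the use of prcomp() are all assumptions made for illustration. Two batches are simulated, the second with a per-assay technical shift, and the distance between the batch means on the first principal component is compared before and after a median-difference adjustment based on 8 simulated bridging samples.

```r
set.seed(1)
n_per_batch <- 30
n_assays <- 10
n_bridge <- 8

# Simulated NPX matrices (samples x assays); all values are made up.
batch1 <- matrix(rnorm(n_per_batch * n_assays, mean = 5, sd = 0.5),
                 nrow = n_per_batch)
batch2 <- matrix(rnorm(n_per_batch * n_assays, mean = 5, sd = 0.5),
                 nrow = n_per_batch)

# The first 8 rows of batch 2 are re-measurements of the first 8 rows of batch 1.
batch2[1:n_bridge, ] <- batch1[1:n_bridge, ] +
  rnorm(n_bridge * n_assays, sd = 0.1)

# Technical shift per assay, added to batch 2 only.
shift <- seq(1, 2, length.out = n_assays)
batch2_shifted <- sweep(batch2, 2, shift, "+")

batch <- rep(c("data1", "data2"), each = n_per_batch)

# Hypothetical helper: distance between the batch means on PC1.
pc1_gap <- function(mat, batch) {
  pc1 <- prcomp(mat, center = TRUE, scale. = FALSE)$x[, 1]
  abs(mean(pc1[batch == "data1"]) - mean(pc1[batch == "data2"]))
}

gap_before <- pc1_gap(rbind(batch1, batch2_shifted), batch)

# Median paired difference per assay across the bridging samples.
adj <- apply(batch1[1:n_bridge, ] - batch2_shifted[1:n_bridge, ], 2, median)
batch2_bridged <- sweep(batch2_shifted, 2, adj, "+")

gap_after <- pc1_gap(rbind(batch1, batch2_bridged), batch)
```

In this simulation the batches separate clearly on the first component before the adjustment and overlap afterwards, which is the same pattern the PCA plots are meant to reveal for real datasets.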