Olink® Analyze Vignette

Olink DS team

2024-02-21

Olink® Analyze is an R package that provides a versatile toolbox for fast and easy handling of Olink® NPX data in proteomics research. It includes functions for importing Olink® NPX datasets exported from NPX Manager, quality control (QC) plot functions, and functions for various statistical tests. The package is meant to provide a convenient pipeline for Olink® NPX data analysis.

Installation

You can install Olink® Analyze from CRAN.

install.packages("OlinkAnalyze")

List of functions

Preprocessing

Statistical analysis

Visualization

Sample datasets

Usage

Load the library

# Load OlinkAnalyze
library(OlinkAnalyze)

# Load other libraries used in Vignette
library(dplyr)
library(ggplot2)
library(stringr)

Preprocessing

Read NPX data (read_NPX)

The read_NPX function imports an NPX file in wide format that has been exported from Olink® NPX Manager and converts the data into long format. The wide format is the most common way Olink® delivers data for Olink® Target 96; for data analysis in R, however, the long format is preferred. The output of NPX Manager must not be altered before import, otherwise the function may not work as expected.
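
To make the wide-to-long conversion concrete, here is a minimal base R sketch on a made-up two-assay table. It does not reproduce how read_NPX parses NPX Manager files, which carry additional header and metadata rows; the data frame wide_toy, the assay names and all values are hypothetical.

```r
# Hypothetical wide table: one row per sample, one column per assay.
wide_toy <- data.frame(
  SampleID = c("S1", "S2"),
  IL6      = c(3.1, 2.7),
  TNF      = c(5.4, 5.9)
)

# Reshape to long format: one row per sample and assay.
long_toy <- reshape(
  wide_toy,
  direction = "long",
  varying   = c("IL6", "TNF"),  # columns holding the NPX values
  v.names   = "NPX",            # name of the new value column
  timevar   = "Assay",          # name of the new column holding the assay name
  times     = c("IL6", "TNF"),
  idvar     = "SampleID"
)
rownames(long_toy) <- NULL

print(long_toy)
```

Each sample now appears once per assay, which is the layout the statistical and plotting functions in this vignette operate on.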

Function arguments

  • filename: Path to the NPX Manager output file.

data <- read_NPX("~/NPX_file_location.xlsx")

Function output

A tibble in long format containing:

  • SampleID: Sample names or IDs.
  • Index: Unique number for each SampleID. It is used to distinguish samples whose IDs are not unique.
  • OlinkID: Unique ID for each assay assigned by Olink. If an assay is included in more than one panel, it has a different OlinkID in each one.
  • UniProt: UniProt ID.
  • Assay: Common gene name for the assay.
  • MissingFreq: Missing frequency for the OlinkID, i.e. frequency of samples with NPX value below limit of detection (LOD).
  • Panel: Olink Panel that samples ran on. Read more about Olink Panels here: https://olink.com/products-services/.
  • Panel_Version: Version of the panel. A new panel version might include some different or improved assays.
  • PlateID: Name of the plate.
  • QC_Warning: Indication whether the sample passed Olink QC. Read more here: https://olink.com/faq/how-is-quality-control-of-the-data-performed/.
  • LOD: Limit of detection (LOD) is the minimum level of an individual protein that can be measured. LOD is defined as 3 times the standard deviation over background.
  • NPX: Normalized Protein eXpression is Olink’s unit of protein expression level on a log2 scale. The majority of the functions in this package use NPX values for calculations. Read more about NPX here: https://olink.com/faq/what-is-npx/.
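
As an illustration of how the MissingFreq, LOD and NPX columns relate, the base R sketch below recomputes a missing frequency from toy values. It assumes MissingFreq is simply the fraction of samples with NPX below LOD, as described above; the object toy_npx, the OlinkIDs and all numbers are made up.

```r
# Hypothetical long-format values for two assays measured in four samples.
toy_npx <- data.frame(
  SampleID = rep(c("S1", "S2", "S3", "S4"), times = 2),
  OlinkID  = rep(c("OID_A", "OID_B"), each = 4),
  NPX      = c(1.2, 0.4, 2.5, 0.3, 5.1, 4.8, 0.9, 6.0),
  LOD      = rep(c(0.5, 1.0), each = 4)
)

# Fraction of samples per assay with NPX below LOD.
missing_freq <- tapply(toy_npx$NPX < toy_npx$LOD, toy_npx$OlinkID, mean)
print(missing_freq)

# NPX is on a log2 scale, so a difference of 1 NPX is a two-fold change.
npx_difference <- 1
fold_change <- 2^npx_difference
print(fold_change)
```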

Statistical analysis

Post-hoc ANOVA analysis (olink_anova_posthoc)

olink_anova_posthoc performs a post-hoc ANOVA test using the function emmeans from the R library emmeans with Tukey p-value adjustment per assay (by OlinkID) at confidence level 0.95.

The function handles both factor and numerical variables and/or covariates. The post-hoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation (SD) difference in the numerical variable, e.g. mean NPX at mean (numerical variable) versus mean NPX at mean (numerical variable) + 1*SD (numerical variable).
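
The comparison for a numerical variable can be sketched with a plain linear model in base R. This is only an illustration of the "mean versus mean + 1 SD" contrast under the assumption of a single numerical predictor; it does not go through emmeans, and the variable Age and the simulated values are hypothetical.

```r
set.seed(1)

# Hypothetical numerical variable and an outcome that depends on it.
age <- rnorm(50, mean = 60, sd = 10)
npx <- 2 + 0.05 * age + rnorm(50, sd = 0.3)

fit <- lm(npx ~ age)

# Predicted mean NPX at mean(age) and at mean(age) + 1 * SD(age).
at_mean         <- predict(fit, newdata = data.frame(age = mean(age)))
at_mean_plus_sd <- predict(fit, newdata = data.frame(age = mean(age) + sd(age)))

# The contrast is the difference between the two predictions.
difference <- unname(at_mean_plus_sd - at_mean)
print(difference)
```

In this simple model the difference equals the fitted slope multiplied by sd(age), which is why the estimate can be read as the change in NPX per standard deviation of the numerical variable.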

Function arguments

  • df: NPX data frame in long format. It should minimally contain protein name (Assay), OlinkID, UniProt, Panel and an outcome factor with at least 3 levels.
  • olinkid_list: Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used.
  • variable: Single character value or character array. If a single character value, it should name a column in df. If length > 1, the included variable names are used in crossed analyses. It also accepts the notations ‘:’ or ‘*’.
  • covariates: Single character value or character array. Default: NULL. Confounding factors to include in the analysis. If a single character value, it should name a column in df. It also accepts the notations ‘:’ or ‘*’; crossed analyses are not inferred from main effects.
  • outcome: Name of the column from df that contains the dependent variable. Default: NPX.
  • effect: Term on which to perform the post-hoc analysis. Character vector. Must be a subset of or identical to variable.
  • mean_return: Logical. If TRUE, returns the mean of each factor level rather than the difference in means. Default: FALSE. Note that for mean_return = TRUE no p-value is returned and no adjustment is performed.
  • verbose: Logical. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula is printed to the console.

# calculate the p-value for the ANOVA
anova_results_oneway <- olink_anova(df = npx_data1, 
                                    variable = 'Site')
# extracting the significant proteins
anova_results_oneway_significant <- anova_results_oneway %>%
  filter(Threshold == 'Significant') %>%
  pull(OlinkID)
anova_posthoc_oneway_results <- olink_anova_posthoc(df = npx_data1,
                                                    olinkid_list = anova_results_oneway_significant,
                                                    variable = 'Site',
                                                    effect = 'Site')

Function output

A tibble with the following columns:

  • Assay <chr>: Assay name.
  • OlinkID <chr>: Unique Olink ID.
  • UniProt <chr>: UniProt ID.
  • Panel <chr>: Olink Panel.
  • term <chr>: Name of the variable that was used for the p-value calculation. The “:” between variables indicates interaction between variables.
  • contrast <chr>: Variables (in term) that are compared.
  • estimate <dbl>: Difference in mean NPX between variables (from contrast).
  • conf.low <dbl>: Lower bound of the confidence interval for the mean.
  • conf.high <dbl>: Upper bound of the confidence interval for the mean.
  • Adjusted_pval <dbl>: Adjusted p-value for the test (Tukey).
  • Threshold <chr>: Text indication if assay is significant (adjusted p-value < 0.05).
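
For readers who want to see where columns such as contrast, estimate, conf.low and conf.high come from, the sketch below runs a Tukey post-hoc test for a single simulated assay using only base R (aov and TukeyHSD). This is a stand-in for illustration: olink_anova_posthoc itself works through emmeans, so values from the package need not match. The data frame toy, the site names and the "Non-significant" label are assumptions.

```r
set.seed(42)

# Hypothetical NPX values for one assay at three sites.
toy <- data.frame(
  Site = factor(rep(c("Site_A", "Site_B", "Site_C"), each = 10)),
  NPX  = c(rnorm(10, 5.0, 0.5), rnorm(10, 5.2, 0.5), rnorm(10, 6.5, 0.5))
)

# One-way ANOVA followed by Tukey's pairwise comparisons at level 0.95.
fit   <- aov(NPX ~ Site, data = toy)
tukey <- TukeyHSD(fit, which = "Site", conf.level = 0.95)

# Rearrange into columns resembling the post-hoc output described above.
posthoc <- data.frame(
  contrast      = rownames(tukey$Site),
  estimate      = tukey$Site[, "diff"],
  conf.low      = tukey$Site[, "lwr"],
  conf.high     = tukey$Site[, "upr"],
  Adjusted_pval = tukey$Site[, "p adj"],
  row.names     = NULL
)
posthoc$Threshold <- ifelse(posthoc$Adjusted_pval < 0.05,
                            "Significant", "Non-significant")

print(posthoc)
```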

Post-hoc one way non-parametric analysis (olink_one_non_parametric_posthoc)

olink_one_non_parametric_posthoc performs a post-hoc Wilcoxon test using the function wilcox_test from the R library rstatix with Benjamini & Hochberg p-value adjustment per assay (by OlinkID) at confidence level 0.95. The function handles both factor and numerical variables and/or covariates.
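
The idea of pairwise Wilcoxon tests with Benjamini & Hochberg adjustment can be tried on a single simulated assay with the base R function pairwise.wilcox.test. This is a rough analogue for illustration only, not the rstatix-based implementation used by the package; the data frame toy and the time point labels are hypothetical.

```r
set.seed(7)

# Hypothetical NPX values for one assay at three time points.
toy <- data.frame(
  Time = factor(rep(c("T1", "T2", "T3"), each = 8)),
  NPX  = c(rnorm(8, mean = 4.0), rnorm(8, mean = 4.5), rnorm(8, mean = 6.0))
)

# All pairwise Wilcoxon rank-sum tests, adjusted with Benjamini & Hochberg.
pairwise_result <- pairwise.wilcox.test(toy$NPX, toy$Time,
                                        p.adjust.method = "BH")

# Lower-triangular matrix of adjusted p-values, one entry per pair of levels.
print(pairwise_result$p.value)
```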

Function arguments

  • df: NPX data frame in long format. It should minimally contain protein name (Assay), OlinkID, UniProt, Panel and an outcome factor with at least 3 levels.
  • olinkid_list: Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used.
  • variable: Single character value or character array. If a single character value, it should name a column in df.
  • verbose: Logical. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula is printed to the console.

# Friedman test
Friedman_results <- olink_one_non_parametric(df = npx_data1, 
                                             variable = "Time", 
                                             subject = "Subject",
                                             dependence = TRUE)

# Extract the OlinkIDs of the significant assays
significant_assays <- Friedman_results %>%
  filter(Threshold == 'Significant') %>%
  dplyr::select(OlinkID) %>%
  distinct() %>%
  pull()

# Post-hoc test for the assays that were significant in the Friedman test
friedman_posthoc_results <- olink_one_non_parametric_posthoc(npx_data1, 
                                                             variable = "Time", 
                                                             test = "friedman",
                                                             olinkid_list = significant_assays)

Function output

A tibble with the following columns:

  • Assay <chr>: Assay name.
  • OlinkID <chr>: Unique Olink ID.
  • UniProt <chr>: UniProt ID.
  • Panel <chr>: Olink Panel.
  • term <chr>: Name of the variable that was used for the p-value calculation.
  • contrast <chr>: Variables (in term) that are compared.
  • estimate <dbl>: Difference in mean NPX between variables (from contrast).
  • conf.low <dbl>: Lower bound of the confidence interval for the location parameter.
  • conf.high <dbl>: Upper bound of the confidence interval for the location parameter.
  • Adjusted_pval <dbl>: Adjusted p-value for the test (Benjamini & Hochberg).
  • Threshold <chr>: Text indication if assay is significant (adjusted p-value < 0.05).
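
The relationship between Adjusted_pval and Threshold can be reproduced with the base R function p.adjust. The sketch below only shows the mechanics of the Benjamini & Hochberg adjustment and the 0.05 cutoff; which p-values the package adjusts together is determined by the function itself. The raw p-values and the "Non-significant" label are assumptions.

```r
# Hypothetical raw p-values from four pairwise comparisons.
raw_pval <- c(0.001, 0.012, 0.030, 0.200)

# Benjamini & Hochberg adjustment.
adjusted_pval <- p.adjust(raw_pval, method = "BH")

# Flag comparisons with adjusted p-value below 0.05.
threshold <- ifelse(adjusted_pval < 0.05, "Significant", "Non-significant")

print(data.frame(raw_pval, adjusted_pval, threshold))
```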

Post-hoc of regression models for ordinal data analysis (olink_ordinalRegression_posthoc)

olink_ordinalRegression_posthoc performs a post-hoc ANOVA test using the function emmeans from the R library emmeans with Tukey p-value adjustment per assay (by OlinkID) at confidence level 0.95. The function handles both factor and numerical variables and/or covariates.

Function arguments

  • df: NPX data frame in long format. It should minimally contain protein name (Assay), OlinkID, UniProt, Panel and an outcome factor with at least 3 levels.
  • olinkid_list: Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used.
  • variable: Single character value or character array. If a single character value, it should name a column in df. If length > 1, the included variable names are used in crossed analyses. It also accepts the notations ‘:’ or ‘*’.
  • covariates: Single character value or character array. Default: NULL. Confounding factors to include in the analysis. If a single character value, it should name a column in df. It also accepts the notations ‘:’ or ‘*’; crossed analyses are not inferred from main effects.
  • outcome: Name of the column from df that contains the dependent variable. Default: NPX.
  • effect: Term on which to perform the post-hoc analysis. Character vector. Must be a subset of or identical to variable.
  • mean_return: Logical. If TRUE, returns the mean of each factor level rather than the difference in means. Default: FALSE. Note that for mean_return = TRUE no p-value is returned and no adjustment is performed.
  • verbose: Logical. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula is printed to the console.

# Two-way Ordinal Regression
ordinalRegression_results <- olink_ordinalRegression(df = npx_data1,
                             variable="Treatment:Time")
# extracting the significant proteins
significant_assays <- ordinalRegression_results %>% 
  filter(Threshold == 'Significant' & term == 'Treatment:Time') %>%
  select(OlinkID) %>%
  distinct() %>%
  pull()
# Post-hoc test for the Treatment:Time interaction, with Site as a covariate
ordinalRegression_posthoc_results <- olink_ordinalRegression_posthoc(npx_data1, 
                                                                     variable=c("Treatment:Time"),
                                                                     covariates="Site",
                                                                     olinkid_list = significant_assays,
                                                                     effect = "Treatment:Time")

Function output

A tibble with the following columns:

  • Assay <chr>: Assay name.
  • OlinkID <chr>: Unique Olink ID.
  • UniProt <chr>: UniProt ID.
  • Panel <chr>: Olink Panel.
  • term <chr>: Name of the variable that was used for the p-value calculation. The “:” between variables indicates interaction between variables.
  • contrast <chr>: Variables (in term) that are compared.
  • estimate <dbl>: Difference in mean NPX between variables (from contrast).
  • Adjusted_pval <dbl>: Adjusted p-value for the test (Tukey).
  • Threshold <chr>: Text indication if assay is significant (adjusted p-value < 0.05).

Post-hoc linear mixed effects model analysis (olink_lmer_posthoc)

The olink_lmer_posthoc function is similar to olink_lmer but performs a post-hoc analysis based on a linear mixed effects model using the function lmer from the R library lmerTest and the function emmeans from the R library emmeans. The function handles both factor and numerical variables and/or covariates. Differences in estimated marginal means are calculated for all pairwise levels of a given output variable. Degrees of freedom are estimated using Satterthwaite’s approximation. The post-hoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation difference in the numerical variable, e.g. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable). The output tibble is arranged by ascending adjusted p-values.
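
To illustrate what a pairwise difference under a random-intercept model looks like, the sketch below fits a mixed model to one simulated assay. It deliberately swaps in lme from the nlme package (shipped with standard R installations) instead of lmerTest and emmeans, so it is a stand-in rather than the package's procedure, and it does not use Satterthwaite's approximation. The data frame toy, the subject and treatment labels and all values are hypothetical.

```r
library(nlme)

set.seed(123)

# Hypothetical design: 12 subjects, 6 per treatment arm, 3 visits each.
n_subjects <- 12
subjects <- data.frame(
  Subject        = factor(paste0("Subj_", seq_len(n_subjects))),
  Treatment      = factor(rep(c("Untreated", "Treated"), each = n_subjects / 2),
                          levels = c("Untreated", "Treated")),
  subject_effect = rnorm(n_subjects, sd = 0.5)
)

# Cross join subjects with visits (no shared columns, so merge() returns
# the Cartesian product).
toy <- merge(subjects, data.frame(Visit = 1:3))

# Simulated NPX: treatment effect of 1, subject-level shift and noise.
toy$NPX <- 5 + 1 * (toy$Treatment == "Treated") +
  toy$subject_effect + rnorm(nrow(toy), sd = 0.3)

# Random intercept per subject, fixed effect of treatment.
fit <- lme(NPX ~ Treatment, random = ~ 1 | Subject, data = toy)

# Estimated difference in mean NPX, Treated minus Untreated.
estimate <- unname(fixef(fit)["TreatmentTreated"])
print(estimate)
```

With only two treatment levels there is a single contrast; a factor with more levels gives one such difference per pair of levels.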

Function arguments

  • df: NPX data frame in long format. It should minimally contain protein name (Assay), OlinkID, UniProt, Panel, 1-2 variables with at least 2 levels, and subject ID.
  • variable: Single character value or character array. If a single character value, it should name a column in df. If length > 1, the included variable names are used in crossed analyses. It also accepts the notations ‘:’ or ‘*’.
  • olinkid_list: Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used.
  • effect: Term on which to perform the post-hoc analysis. Character vector. Must be a subset of or identical to variable.
  • outcome: Name of the column from df that contains the dependent variable. Default: NPX.
  • random: Single character value or character array with random effects.
  • covariates: Single character value or character array. Default: NULL. Confounding factors to include in the analysis. If a single character value, it should name a column in df. It also accepts the notations ‘:’ or ‘*’; crossed analyses are not inferred from main effects.
  • mean_return: Logical. If TRUE, returns the mean of each factor level rather than the difference in means. Default: FALSE. Note that for mean_return = TRUE no p-value is returned and no adjustment is performed.
  • verbose: Logical. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula is printed to the console.

# Linear mixed model with two variables.
lmer_results_twoway <- olink_lmer(df = npx_data1, 
                                  variable = c('Site', 'Treatment'),
                                  random = 'Subject')
# extracting the significant proteins
lmer_results_twoway_significant <- lmer_results_twoway %>%
  filter(Threshold == 'Significant', term == 'Treatment') %>%
  pull(OlinkID)
# performing post-hoc analysis
lmer_posthoc_twoway_results <- olink_lmer_posthoc(df = npx_data1,
                                                  olinkid_list = lmer_results_twoway_significant,
                                                  variable = c('Site', 'Treatment'),
                                                  random = 'Subject',
                                                  effect = 'Treatment') 

Function output

A tibble with the following columns:

  • Assay <chr>: Assay name.
  • OlinkID <chr>: Unique Olink ID.
  • UniProt <chr>: UniProt ID.
  • Panel <chr>: Olink Panel.
  • term <chr>: Name of the variable that was used for the p-value calculation. The “:” between variables indicates interaction between variables.
  • contrast <chr>: Variables (in term) that are compared.
  • estimate <dbl>: Difference in mean NPX between variables (from contrast).
  • conf.low <dbl>: Lower bound of the confidence interval for the mean.
  • conf.high <dbl>: Upper bound of the confidence interval for the mean.
  • Adjusted_pval <dbl>: Adjusted p-value for the test (Benjamini & Hochberg).
  • Threshold <chr>: Text indication if assay is significant (adjusted p-value < 0.05).

Exploratory analysis

Visualization

Theming function (set_plot_theme)

This function sets a coherent plot theme for plots by adding it to a ggplot object. It is mainly used for aesthetic reasons.

npx_data1 %>% 
  filter(OlinkID == 'OID01216') %>% 
  ggplot(aes(x = Treatment, y = NPX, fill = Treatment)) +
  geom_boxplot() +
  set_plot_theme()
