Method Comparison Workflow


Marcello Grassi

Introduction

Method comparison studies are fundamental in clinical laboratories and biotech research. When introducing a new analytical method, we must demonstrate that it produces results comparable to an established reference method. This vignette walks through a complete method comparison workflow using the valytics package.

The statistical approaches implemented in valytics follow well-established methodology from the clinical chemistry literature. We focus on two complementary techniques: Bland-Altman analysis for assessing agreement and Passing-Bablok regression for evaluating systematic differences.

Getting Started

library(valytics)
library(ggplot2)

We will use the glucose_methods dataset included in the package. This dataset contains paired measurements from a point-of-care (POC) glucose meter and a laboratory reference analyzer on 60 patient samples.

data("glucose_methods")
head(glucose_methods)
#>   sample_id reference poc_meter
#> 1    GLU026       118       131
#> 2    GLU022       113       110
#> 3    GLU043        83        77
#> 4    GLU005        51        57
#> 5    GLU016       112       121
#> 6    GLU010        77        77

Before diving into statistical analysis, it is always good practice to visualize the raw data:

ggplot(glucose_methods, aes(x = reference, y = poc_meter)) +
  geom_point(alpha = 0.7) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray50") +
  labs(
    x = "Reference Method (mg/dL)",
    y = "POC Meter (mg/dL)",
    title = "Glucose Method Comparison"
  ) +
  coord_fixed() +
  theme_minimal()

Scatter plot of POC vs laboratory glucose measurements with identity line.

The points cluster around the identity line, suggesting reasonable agreement. Now let us quantify this agreement using appropriate statistical methods.

Bland-Altman Analysis

Bland-Altman analysis, introduced by Bland and Altman (1986), assesses agreement between two measurement methods by examining the differences between paired measurements. Rather than relying on correlation, which can be misleading for method comparison, this approach focuses on clinically meaningful questions: How large are the differences? Is there systematic bias?
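To illustrate why correlation alone can mislead, here is a minimal sketch using made-up numbers (not the package data). A hypothetical method that reads exactly 20% high is perfectly correlated with the reference, yet no pair of readings agrees.

```r
# Hypothetical reference values (illustrative only, not package data)
reference <- c(50, 80, 110, 150, 200, 300)

# A hypothetical method that reads exactly 20% high
biased_method <- 1.2 * reference

# Correlation is perfect because the relationship is exactly linear
r <- cor(reference, biased_method)

# The paired differences expose the disagreement that correlation hides
differences <- biased_method - reference
mean_difference <- mean(differences)

r
mean_difference
range(differences)
```

The differences grow with concentration, which is exactly the kind of pattern a Bland-Altman plot is designed to reveal.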

Running the Analysis

The ba_analysis() function accepts paired measurements as vectors or via a formula interface:

# Vector interface
ba <- ba_analysis(
  x = glucose_methods$reference,
  y = glucose_methods$poc_meter
)

# Alternative: formula interface
# ba <- ba_analysis(reference ~ poc_meter, data = glucose_methods)

ba
#> 
#> Bland-Altman Analysis
#> ---------------------------------------- 
#> n = 60 paired observations
#> 
#> Difference type: Absolute (y - x)
#> Confidence level: 95%
#> 
#> Results:
#>   Bias (mean difference): 5.700
#>     95% CI: [3.539, 7.861]
#>   SD of differences: 8.365
#> 
#> Limits of Agreement:
#>   Lower LoA: -10.695
#>     95% CI: [-14.409, -6.982]
#>   Upper LoA: 22.095
#>     95% CI: [18.382, 25.809]

The print output shows the mean difference (bias) and the 95% limits of agreement (LoA). These limits represent the range within which 95% of differences between the two methods are expected to fall.
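The printed quantities can be approximated by hand from the bias, the SD of differences, and the sample size. The sketch below uses only base R and the rounded values from the output above. The standard-error formula for the limits is one common choice; that valytics uses exactly this formula is an assumption, and all variable names are illustrative.

```r
# Values copied (rounded) from the printed Bland-Altman output
n    <- 60
bias <- 5.7
sd_d <- 8.365
conf <- 0.95

z      <- qnorm(1 - (1 - conf) / 2)          # normal quantile for the LoA
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1) # t quantile for the CIs

# Limits of agreement: bias +/- z * SD of differences
loa <- bias + c(-1, 1) * z * sd_d

# CI for the bias: standard error of a mean
bias_ci <- bias + c(-1, 1) * t_crit * sd_d / sqrt(n)

# CI for each limit (assumed standard-error formula)
se_loa       <- sd_d * sqrt(1 / n + z^2 / (2 * (n - 1)))
lower_loa_ci <- loa[1] + c(-1, 1) * t_crit * se_loa
upper_loa_ci <- loa[2] + c(-1, 1) * t_crit * se_loa

round(loa, 3)
round(bias_ci, 3)
round(lower_loa_ci, 3)
round(upper_loa_ci, 3)
```

Because the inputs are rounded, the results should match the printed output closely but not necessarily to the last digit.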

Interpreting Results

The summary() method provides additional statistical details:

summary(ba)
#> 
#> Bland-Altman Analysis - Detailed Summary
#> ================================================== 
#> 
#> Call:
#> ba_analysis(x = glucose_methods$reference, y = glucose_methods$poc_meter)
#> 
#> Sample size: n = 60
#> Variables: x = 'x', y = 'y'
#> Difference type: Absolute (y - x)
#> Confidence level: 95%
#> 
#> -------------------------------------------------- 
#> Descriptive Statistics:
#> -------------------------------------------------- 
#>  Variable  N  Mean    SD Median Min Max
#>         x 60 131.0 73.54    111  48 323
#>         y 60 136.7 74.19    118  54 342
#> 
#> -------------------------------------------------- 
#> Agreement Statistics:
#> -------------------------------------------------- 
#>  Statistic Estimate CI_Lower_95% CI_Upper_95%
#>       Bias      5.7        3.539        7.861
#>  Lower LoA    -10.7      -14.409       -6.982
#>  Upper LoA     22.1       18.382       25.809
#> 
#> SD of differences: 8.3652
#> 
#> -------------------------------------------------- 
#> Normality of Differences (Shapiro-Wilk test):
#> -------------------------------------------------- 
#> W = 0.9567, p-value = 3.25e-02
#> Note: p < 0.05 suggests differences may not be normally distributed.
#>       Consider inspecting the Bland-Altman plot for patterns.

Key outputs to examine:

Bias: The mean difference between methods. A bias significantly different from zero indicates systematic over- or under-estimation by one method.
Limits of Agreement: The 95% LoA define the expected range of differences. Whether these limits are clinically acceptable depends on the intended use of the measurement.
Normality test: The Shapiro-Wilk test checks whether differences follow a normal distribution, an assumption underlying the LoA calculation.
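The normality check can also be run directly with base R's shapiro.test() on the paired differences. This sketch uses only the six rows printed by head() earlier, so its result will not match the full-data output shown above.

```r
# The six pairs shown by head(glucose_methods)
reference <- c(118, 113, 83, 51, 112, 77)
poc_meter <- c(131, 110, 77, 57, 121, 77)

# Differences in the same direction as the analysis (y - x)
differences <- poc_meter - reference

# Shapiro-Wilk test of normality on the differences
sw <- shapiro.test(differences)
sw$statistic
sw$p.value
```

With so few observations the test has little power; in practice a histogram or Q-Q plot of the differences is a useful companion.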

Visualization

The Bland-Altman plot displays differences against the average of paired measurements:

plot(ba)

Bland-Altman plot showing bias and 95% limits of agreement.

This plot reveals several important features:

Points scattered randomly around the bias line suggest no relationship between measurement magnitude and agreement.
The shaded bands represent 95% confidence intervals for the bias and LoA.
Points outside the LoA may warrant individual investigation.
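To follow up on individual points, the pairs falling outside the limits can be listed explicitly. The sketch below uses simulated data and base R only; it does not rely on the internal structure of the object returned by ba_analysis(), and every name in it is illustrative.

```r
set.seed(1)

# Simulated paired measurements (illustrative, not package data)
x <- runif(50, min = 50, max = 300)
y <- x + rnorm(50, mean = 5, sd = 8)

difference <- y - x
average    <- (x + y) / 2

# Bias and 95% limits of agreement
bias <- mean(difference)
loa  <- bias + c(-1, 1) * qnorm(0.975) * sd(difference)

# Flag pairs whose difference lies outside the limits
outside <- difference < loa[1] | difference > loa[2]
flagged <- data.frame(average, difference)[outside, ]
flagged
```

Each flagged row is a candidate for review, for example to check for transcription errors or sample-specific interference.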

For publication-quality figures, you can use autoplot() with additional ggplot2 customization:

autoplot(ba) +
  labs(title = "POC Meter vs Reference Analyzer Agreement") +
  theme_bw()

Customized Bland-Altman plot.

Percentage Differences

When the magnitude of measurements varies widely, percentage differences can be more informative than absolute differences:

ba_pct <- ba_analysis(
  x = glucose_methods$reference,
  y = glucose_methods$poc_meter,
  type = "percent"
)
ba_pct
#> 
#> Bland-Altman Analysis
#> ---------------------------------------- 
#> n = 60 paired observations
#> 
#> Difference type: Percent (y - x)
#> Confidence level: 95%
#> 
#> Results:
#>   Bias (mean difference): 5.035
#>     95% CI: [3.516, 6.555]
#>   SD of differences: 5.882
#> 
#> Limits of Agreement:
#>   Lower LoA: -6.494
#>     95% CI: [-9.105, -3.882]
#>   Upper LoA: 16.564
#>     95% CI: [13.952, 19.175]

plot(ba_pct)

Bland-Altman plot with percentage differences.

Percentage-based LoA are particularly useful when acceptable differences scale with measurement magnitude.
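As a rough illustration of the scaling, the sketch below converts absolute differences to percentages for the six rows printed by head() earlier. It divides by the mean of each pair, which is one common convention; whether type = "percent" uses the same denominator is an assumption.

```r
# The six pairs shown by head(glucose_methods)
reference <- c(118, 113, 83, 51, 112, 77)
poc_meter <- c(131, 110, 77, 57, 121, 77)

pair_mean <- (reference + poc_meter) / 2
abs_diff  <- poc_meter - reference

# Percentage difference relative to the pair mean (assumed convention)
pct_diff <- 100 * abs_diff / pair_mean

data.frame(reference, poc_meter, abs_diff, pct_diff = round(pct_diff, 1))
```

The first and fourth pairs differ by 13 and 6 mg/dL respectively, yet their percentage differences are of similar size because the fourth pair sits at a much lower concentration.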

Passing-Bablok Regression

While Bland-Altman analysis assesses overall agreement, Passing-Bablok regression (Passing and Bablok, 1983) specifically addresses two questions: Is there a constant bias (intercept different from 0)? Is there a proportional bias (slope different from 1)?

This non-parametric regression method is robust to outliers and allows for measurement error in both methods rather than in only one, making it well-suited for method comparison studies.
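The point estimates can be sketched in a few lines of base R, following the procedure described by Passing and Bablok (1983): the slope is a shifted median of all pairwise slopes and the intercept is the median of y - slope * x. This is a simplified illustration without confidence intervals, not the valytics implementation, and pb_point_estimate is a hypothetical helper name.

```r
pb_point_estimate <- function(x, y) {
  # Differences for every pair of observations (i < j)
  pairs <- combn(length(x), 2)
  dx <- x[pairs[2, ]] - x[pairs[1, ]]
  dy <- y[pairs[2, ]] - y[pairs[1, ]]

  # Drop undefined slopes (identical points) and slopes of exactly -1
  defined <- !(dx == 0 & dy == 0)
  slopes  <- dy[defined] / dx[defined]
  slopes  <- sort(slopes[slopes != -1])

  # Shift the median by the number of slopes below -1
  n_slopes <- length(slopes)
  offset   <- sum(slopes < -1)
  slope <- if (n_slopes %% 2 == 1) {
    slopes[(n_slopes + 1) / 2 + offset]
  } else {
    mean(slopes[n_slopes / 2 + offset + 0:1])
  }

  intercept <- median(y - slope * x)
  c(intercept = intercept, slope = slope)
}

# Noise-free illustrative data: the estimates recover the true line
x <- seq(50, 300, by = 25)
y <- 3 + 1.05 * x
est_exact <- pb_point_estimate(x, y)
est_exact
```

On noise-free data the estimator returns the generating intercept and slope; with real measurements the median-based construction is what limits the influence of outliers.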

Running the Regression

pb <- pb_regression(
  x = glucose_methods$reference,
  y = glucose_methods$poc_meter
)
pb
#> 
#> Passing-Bablok Regression
#> ---------------------------------------- 
#> n = 60 paired observations
#> 
#> CI method: Analytical (Passing-Bablok 1983)
#> Confidence level: 95%
#> 
#> Regression equation:
#>   glucose_methods$poc_meter = 2.861 + 1.028 * glucose_methods$reference
#> 
#> Results:
#>   Intercept: 2.861
#>     95% CI: [2.074, 4.605]
#>     (excludes 0: significant constant bias)
#> 
#>   Slope: 1.028
#>     95% CI: [1.013, 1.037]
#>     (excludes 1: significant proportional bias)

Interpreting Results

summary(pb)
#> 
#> Passing-Bablok Regression - Detailed Summary
#> ================================================== 
#> 
#> Data:
#>   X variable: glucose_methods$reference
#>   Y variable: glucose_methods$poc_meter
#>   Sample size: 60
#> 
#> Settings:
#>   Confidence level: 95%
#>   CI method: Analytical (Passing-Bablok 1983)
#> 
#> Regression Coefficients:
#> -------------------------------------------------- 
#>           Estimate 95% Lower 95% Upper
#> Intercept   2.8611    2.0741    4.6051
#> Slope       1.0278    1.0127    1.0370
#> 
#> Regression equation:
#>   glucose_methods$poc_meter = 2.8611 + 1.0278 * glucose_methods$reference
#> 
#> Linearity Test (CUSUM):
#> -------------------------------------------------- 
#>   Test statistic: 0.9834
#>   Critical value (alpha = 0.05): 1.36
#>   p-value: 0.2882
#>   Result: Linearity assumption is satisfied (p >= 0.05)
#> 
#> Interpretation:
#> -------------------------------------------------- 
#>   Intercept: CI excludes 0 (2.074 to 4.605)
#>     -> Significant positive constant bias of 2.861
#>   Slope: CI excludes 1 (1.013 to 1.037)
#>     -> Significant proportional bias of 2.8%
#> 
#> Conclusion:
#> -------------------------------------------------- 
#>   The two methods show SYSTEMATIC DIFFERENCES:
#>     - Constant bias: 2.861 glucose_methods$poc_meter
#>     - Proportional bias: 2.8%
#> 
#> Residuals (perpendicular):
#> -------------------------------------------------- 
#>       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
#> -20.804423  -2.702250  -0.009685  -0.558530   2.416529  14.315148

The summary provides hypothesis tests and confidence intervals for the regression parameters:

Intercept: If the 95% CI includes 0, there is no evidence of constant systematic difference.
Slope: If the 95% CI includes 1, there is no evidence of proportional systematic difference.
CUSUM test: Assesses linearity of the relationship. A significant result (p < 0.05) suggests deviation from linearity.

When the intercept CI includes 0 and the slope CI includes 1, there is no evidence of a systematic difference, and the methods are treated as equivalent within the measured range.
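The decision rules above can be written out explicitly. The sketch below hard-codes the analytical estimates and intervals from the summary output; ci_contains and expected_difference are hypothetical helpers, not part of valytics.

```r
# Estimates and analytical 95% CIs copied from summary(pb)
intercept    <- 2.8611
slope        <- 1.0278
intercept_ci <- c(2.0741, 4.6051)
slope_ci     <- c(1.0127, 1.0370)

# TRUE when the interval covers the reference value
ci_contains <- function(ci, value) ci[1] <= value && value <= ci[2]

constant_bias     <- !ci_contains(intercept_ci, 0)
proportional_bias <- !ci_contains(slope_ci, 1)

# Expected systematic difference (new method minus reference) at a given level
expected_difference <- function(reference_value) {
  intercept + (slope - 1) * reference_value
}

constant_bias
proportional_bias
expected_difference(c(70, 100, 200))
```

Evaluating the expected difference at clinically relevant concentrations helps translate the two coefficients into a magnitude that can be compared against acceptance criteria.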

Visualization

The scatter plot shows the fitted regression line with confidence band:

plot(pb, type = "scatter")

Passing-Bablok regression with 95% confidence band.

The dashed identity line (y = x) serves as a reference. If the methods were in perfect agreement, the regression line would coincide with the identity line.

Residual plots help assess model assumptions:

plot(pb, type = "residuals")

Perpendicular residuals from Passing-Bablok regression.

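Residuals should scatter randomly around zero without obvious patterns. A trend suggests a violation of the linearity assumption, while heteroscedasticity (spread that changes with concentration) suggests that measurement error is not constant across the range.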

The CUSUM plot provides a visual assessment of linearity:

plot(pb, type = "cusum")

CUSUM plot for linearity assessment.

Points should remain within the boundary lines if the linear model is appropriate. Deviations suggest non-linear relationships that may require transformation or alternative modeling approaches.

Bootstrap Confidence Intervals

For smaller sample sizes, or when the approximations behind the analytical confidence intervals are questionable, bootstrap confidence intervals provide an alternative:

pb_boot <- pb_regression(
  x = glucose_methods$reference,
  y = glucose_methods$poc_meter,
  ci_method = "bootstrap",
  boot_n = 1999
)
summary(pb_boot)
#> 
#> Passing-Bablok Regression - Detailed Summary
#> ================================================== 
#> 
#> Data:
#>   X variable: glucose_methods$reference
#>   Y variable: glucose_methods$poc_meter
#>   Sample size: 60
#> 
#> Settings:
#>   Confidence level: 95%
#>   CI method: Bootstrap BCa (n = 1999)
#> 
#> Regression Coefficients:
#> -------------------------------------------------- 
#>           Estimate 95% Lower 95% Upper
#> Intercept   2.8611   -2.9388    6.0000
#> Slope       1.0278    0.9867    1.0577
#> 
#> Regression equation:
#>   glucose_methods$poc_meter = 2.8611 + 1.0278 * glucose_methods$reference
#> 
#> Linearity Test (CUSUM):
#> -------------------------------------------------- 
#>   Test statistic: 0.9834
#>   Critical value (alpha = 0.05): 1.36
#>   p-value: 0.2882
#>   Result: Linearity assumption is satisfied (p >= 0.05)
#> 
#> Interpretation:
#> -------------------------------------------------- 
#>   Intercept: CI includes 0
#>     -> No significant constant (additive) bias
#>   Slope: CI includes 1
#>     -> No significant proportional (multiplicative) bias
#> 
#> Conclusion:
#> -------------------------------------------------- 
#>   The two methods are EQUIVALENT within the measured range.
#>   No systematic differences detected.
#> 
#> Residuals (perpendicular):
#> -------------------------------------------------- 
#>       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
#> -20.804423  -2.702250  -0.009685  -0.558530   2.416529  14.315148

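The BCa (bias-corrected and accelerated) bootstrap method adjusts for potential bias and skewness in the bootstrap distribution. For this dataset the bootstrap intervals are wider than the analytical ones: they include 0 for the intercept and 1 for the slope, so the two CI methods lead to different conclusions about systematic bias. When they disagree like this, the conclusion is sensitive to the choice of CI method and should be interpreted with caution.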
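The resampling idea can be sketched with base R. Note that this example uses the simpler percentile bootstrap, not BCa, and a compact Passing-Bablok slope function written for illustration; the data are simulated and none of the names below come from valytics.

```r
# Compact Passing-Bablok slope (shifted median of pairwise slopes)
pb_slope <- function(x, y) {
  dx <- outer(x, x, "-")
  dy <- outer(y, y, "-")
  upper <- upper.tri(dx)
  dx <- dx[upper]
  dy <- dy[upper]
  defined <- !(dx == 0 & dy == 0)   # resampling creates duplicate points
  s <- dy[defined] / dx[defined]
  s <- sort(s[s != -1])
  k <- sum(s < -1)
  m <- length(s)
  if (m %% 2 == 1) s[(m + 1) / 2 + k] else mean(s[m / 2 + k + 0:1])
}

set.seed(42)

# Simulated pairs with measurement error in both methods
n     <- 40
truth <- runif(n, min = 50, max = 300)
x     <- truth + rnorm(n, sd = 4)
y     <- 3 + 1.05 * truth + rnorm(n, sd = 4)

# Resample pairs with replacement and re-estimate the slope each time
n_boot <- 500
boot_slopes <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)
  pb_slope(x[idx], y[idx])
})

# Percentile interval: empirical 2.5% and 97.5% quantiles
ci <- quantile(boot_slopes, c(0.025, 0.975), names = FALSE)
pb_slope(x, y)
ci
```

Pairs are resampled as units so that the link between the two measurements of each sample is preserved. BCa refines the percentile interval by adjusting which quantiles are used.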

Putting It All Together

A complete method comparison report typically includes both analyses. Here is a summary workflow:

# 1. Load and inspect data
data("glucose_methods")

# 2. Bland-Altman analysis for agreement assessment
ba <- ba_analysis(reference ~ poc_meter, data = glucose_methods)
summary(ba)
plot(ba)

# 3. Passing-Bablok regression for systematic differences
pb <- pb_regression(reference ~ poc_meter, data = glucose_methods)
summary(pb)
plot(pb, type = "scatter")
plot(pb, type = "cusum")

# 4. Document conclusions
# - Bias and LoA from Bland-Altman
# - Slope and intercept CIs from Passing-Bablok
# - Clinical interpretation based on acceptable performance criteria

Handling Missing Data

Both ba_analysis() and pb_regression() handle missing values through the na_action parameter:

# Create data with missing values for demonstration
glucose_missing <- glucose_methods
glucose_missing$poc_meter[c(5, 15, 25)] <- NA

# Default behavior: remove pairs with missing values
ba_complete <- ba_analysis(
  reference ~ poc_meter,
  data = glucose_missing,
  na_action = "omit"
)

# Require complete cases (will error if any NA present)
# ba_strict <- ba_analysis(
#   reference ~ poc_meter,
#   data = glucose_missing,
#   na_action = "fail"
# )
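Conceptually, omitting missing values in a paired analysis means dropping the whole pair whenever either measurement is missing. A minimal base R sketch of that pairwise deletion, using a small made-up data frame, is shown below; that na_action = "omit" behaves exactly like this is an assumption.

```r
# Small illustrative data frame with two incomplete pairs
paired <- data.frame(
  reference = c(118, 113, 83, 51, 112, 77),
  poc_meter = c(131, 110, NA, 57, NA, 77)
)

# Keep only rows where both measurements are present
complete  <- paired[complete.cases(paired), ]
n_removed <- nrow(paired) - nrow(complete)

complete
n_removed
```

Reporting the number of removed pairs alongside the results makes the effective sample size explicit.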

Additional Datasets

The package includes two additional datasets for exploring different scenarios:

# Creatinine: enzymatic vs Jaffe methods
data("creatinine_serum")
head(creatinine_serum)
#>   sample_id enzymatic jaffe
#> 1  CREAT056      2.10  2.46
#> 2  CREAT022      1.20  1.47
#> 3  CREAT050      2.50  2.34
#> 4  CREAT024      0.83  1.07
#> 5  CREAT063      1.56  1.65
#> 6  CREAT039      0.58  0.77

# High-sensitivity troponin: two immunoassay platforms
data("troponin_cardiac")
head(troponin_cardiac)
#>   sample_id platform_a platform_b
#> 1   TROP020      111.0       92.0
#> 2   TROP023       42.9       33.4
#> 3   TROP008        3.8        3.7
#> 4   TROP002       33.7       31.9
#> 5   TROP011      910.0      750.0
#> 6   TROP031       52.0       45.8

These datasets represent different analytical challenges: the creatinine data includes known interferences affecting the Jaffe method at low concentrations, while the troponin data covers a wide dynamic range typical of cardiac biomarkers.

References

Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307-310.

Bland JM, Altman DG. Measuring agreement in method comparison studies. Statistical Methods in Medical Research. 1999;8(2):135-160.

Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Journal of Clinical Chemistry and Clinical Biochemistry. 1983;21(11):709-720.

Passing H, Bablok W. Comparison of several regression procedures for method comparison studies and determination of sample sizes. Journal of Clinical Chemistry and Clinical Biochemistry. 1984;22(6):431-445.
