This vignette provides a conceptual overview of the statistical
methods implemented in valytics. The goal is to help you
understand what the numbers mean and how to think
about them, not to prescribe specific acceptance criteria or make
decisions for you.
Whether your analysis “passes” or “fails” depends entirely on your specific application, regulatory requirements, and clinical context. This package provides the tools; you and your organization define what constitutes acceptable agreement.
The bias (mean difference) quantifies the average systematic offset between two methods. It answers: “On average, how much higher or lower does method Y read compared to method X?”
```r
data("creatinine_serum")

ba <- ba_analysis(
  x = creatinine_serum$enzymatic,
  y = creatinine_serum$jaffe
)

cat("Bias:", round(ba$results$bias, 3), "mg/dL\n")
#> Bias: 0.174 mg/dL
cat("95% CI:", round(ba$results$bias_ci["lower"], 3), "to",
    round(ba$results$bias_ci["upper"], 3), "\n")
#> 95% CI: 0.127 to 0.22
```

What this tells you:

- On average, the Jaffe method reads about 0.17 mg/dL higher than the enzymatic method.
- The 95% CI excludes zero, so this systematic offset is unlikely to be a chance finding.

What this does NOT tell you:

- How much individual measurements can disagree; that is described by the limits of agreement below.
- Whether the offset is acceptable for your application.
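Because the bias is, by definition, just the mean of the paired differences, you can verify it by hand. A minimal check, assuming the same `creatinine_serum` columns used above:

```r
# The bias is simply the mean of the paired differences (y - x)
d <- creatinine_serum$jaffe - creatinine_serum$enzymatic
mean(d)  # should match ba$results$bias (about 0.174)
```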
The limits of agreement (LoA) define an interval expected to contain 95% of the differences between methods. They answer: “For a randomly selected sample, how much could the two methods disagree?”
```r
cat("Lower LoA:", round(ba$results$loa_lower, 3), "\n")
#> Lower LoA: -0.236
cat("Upper LoA:", round(ba$results$loa_upper, 3), "\n")
#> Upper LoA: 0.584
cat("Width:", round(ba$results$loa_upper - ba$results$loa_lower, 3), "\n")
#> Width: 0.82
```

The LoA represent the range of disagreement you can expect in practice. A narrow LoA indicates consistent agreement; a wide LoA indicates variable differences.
Key insight: The LoA are often more informative than the bias alone. Two methods might have negligible average bias but wide limits of agreement, meaning individual measurements could differ substantially.
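A minimal simulated sketch illustrates this (illustrative data only, not from the package); here the LoA are computed directly as mean ± 1.96 × SD of the differences:

```r
# Simulated example: negligible average bias, yet wide limits of agreement
set.seed(42)
x <- rnorm(100, mean = 5, sd = 1)        # method X
y <- x + rnorm(100, mean = 0, sd = 0.5)  # method Y: no systematic offset, noisy
d <- y - x
cat("Bias:", round(mean(d), 3), "\n")    # close to zero
cat("LoA:", round(mean(d) - 1.96 * sd(d), 3), "to",
    round(mean(d) + 1.96 * sd(d), 3), "\n")  # roughly -1 to 1: wide
```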
The Bland-Altman plot provides a visual assessment:
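It is presumably produced with the plot method, the same call used in the workflow example later in this vignette:

```r
plot(ba)  # Bland-Altman plot of differences vs. averages
```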
Bland-Altman plot showing differences vs. averages.
What to look for:

- Points scattered randomly around the bias line, with no trend as the average increases (a trend suggests proportional bias).
- Roughly constant spread across the measurement range (a funnel shape suggests concentration-dependent variability).
- Points outside the limits of agreement (a few are expected by construction; clusters or extreme values warrant investigation).
Bland-Altman analysis assumes normally distributed differences. The summary provides a Shapiro-Wilk test:
```r
summ <- summary(ba)
if (!is.null(summ$normality_test)) {
  cat("Shapiro-Wilk p-value:", round(summ$normality_test$p.value, 4), "\n")
}
#> Shapiro-Wilk p-value: 0
```

A low p-value suggests non-normality. Consider:

- inspecting the distribution of differences visually (see the histogram below),
- transforming the data (for example, log-transforming before computing differences), or
- interpreting the limits of agreement cautiously, since their 95% coverage assumes normally distributed differences.
```r
library(ggplot2)

ggplot(data.frame(diff = ba$results$differences), aes(x = diff)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 fill = "steelblue", alpha = 0.7) +
  geom_density(linewidth = 1) +
  labs(x = "Difference (Jaffe - Enzymatic)", y = "Density") +
  theme_minimal()
```

Distribution of differences.
Passing-Bablok regression fits a line:
Y = intercept + slope * X
The parameters have direct interpretations:

- Slope: proportional bias. A slope of 1 means the methods scale identically; a slope below 1 means Y increases more slowly than X.
- Intercept: constant bias. An intercept of 0 means no fixed offset between the methods.
```r
cat("Slope:", round(pb$results$slope, 4), "\n")
#> Slope: 0.9711
cat("  95% CI:", round(pb$results$slope_ci["lower"], 4), "to",
    round(pb$results$slope_ci["upper"], 4), "\n")
#>   95% CI: 0.9661 to 0.9741
cat("Intercept:", round(pb$results$intercept, 4), "\n")
#> Intercept: 0.2339
cat("  95% CI:", round(pb$results$intercept_ci["lower"], 4), "to",
    round(pb$results$intercept_ci["upper"], 4), "\n")
#>   95% CI: 0.2288 to 0.2387
```

How to read the confidence intervals:

- If the slope CI includes 1, there is no statistically detectable proportional bias.
- If the intercept CI includes 0, there is no statistically detectable constant bias.
- Here, both intervals exclude their null values, indicating small but statistically detectable proportional and constant bias.
You can use the regression equation to estimate expected differences at specific concentrations:
```r
# At various concentrations, what's the expected difference?
concentrations <- c(0.8, 1.3, 3.0, 6.0)
for (conc in concentrations) {
  expected_y <- pb$results$intercept + pb$results$slope * conc
  difference <- expected_y - conc
  cat(sprintf("At X = %.1f: expected Y = %.3f, difference = %.3f\n",
              conc, expected_y, difference))
}
#> At X = 0.8: expected Y = 1.011, difference = 0.211
#> At X = 1.3: expected Y = 1.496, difference = 0.196
#> At X = 3.0: expected Y = 3.147, difference = 0.147
#> At X = 6.0: expected Y = 6.060, difference = 0.060
```

This helps translate abstract regression parameters into concrete, application-specific terms.
The CUSUM test evaluates whether a linear model is appropriate:
```r
cat("CUSUM statistic:", round(pb$cusum$statistic, 4), "\n")
#> CUSUM statistic: 0.97
cat("p-value:", round(pb$cusum$p_value, 4), "\n")
#> p-value: 0.3036
```

A significant result (conventionally p < 0.05) suggests the relationship may not be linear across the measurement range. If non-linearity is detected:

- restrict the comparison to the sub-range where the relationship appears linear (a brief sketch follows the figure below),
- consider transforming the data, or
- interpret the slope and intercept cautiously, since a single straight line does not describe the whole relationship.
CUSUM plot for linearity assessment.
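As a sketch of the first remedy listed above: refit on a restricted range. The 3.0 mg/dL cutoff is purely illustrative, and the formula interface (with the reference method on the left-hand side) is assumed to behave as in the workflow example later in this vignette:

```r
# Hypothetical range restriction: keep only lower concentrations and refit.
# Orientation mirrors the workflow example (reference ~ test).
low_range <- subset(creatinine_serum, enzymatic <= 3.0)
pb_low <- pb_regression(enzymatic ~ jaffe, data = low_range)
cat("Restricted-range slope:", pb_low$results$slope, "\n")
```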
High correlation between methods is often reported but can be misleading:
```r
r <- cor(creatinine_serum$enzymatic, creatinine_serum$jaffe)
cat("Correlation coefficient:", round(r, 4), "\n")
#> Correlation coefficient: 0.9952
```

Correlation measures whether methods rank samples similarly, not whether they give the same values. Two methods with r = 1 but different calibrations would show systematic bias that correlation fails to detect.
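A quick sketch makes this concrete, using a hypothetical miscalibrated version of the enzymatic values:

```r
# Perfect correlation despite large systematic disagreement
x <- creatinine_serum$enzymatic
y_miscal <- 1.5 * x + 0.5           # exact linear rescaling of x (hypothetical)
cat("r =", cor(x, y_miscal), "\n")  # exactly 1
cat("Mean difference =", round(mean(y_miscal - x), 3), "\n")  # large bias
```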
Your results depend on:

- the concentration range covered by your samples,
- the population your samples were drawn from, and
- the measurement conditions (instruments, reagent lots, operators) during the study.
Be cautious about extrapolating beyond the conditions of your study.
A statistically significant bias (CI excludes zero) may or may not be practically important. Consider:
```r
# Example: Is a bias of X clinically meaningful?
# This depends entirely on YOUR application
bias_value <- ba$results$bias
cat("Observed bias:", round(bias_value, 3), "mg/dL\n")
#> Observed bias: 0.174 mg/dL

cat("\nWhether this is 'acceptable' depends on:\n")
#>
#> Whether this is 'acceptable' depends on:
cat("- Your specific clinical decision thresholds\n")
#> - Your specific clinical decision thresholds
cat("- Regulatory requirements for your application\n")
#> - Regulatory requirements for your application
cat("- Intended use of the measurement\n")
#> - Intended use of the measurement
cat("- Established performance goals (CLIA, biological variation, etc.)\n")
#> - Established performance goals (CLIA, biological variation, etc.)
```

Here’s how to extract key statistics for reporting:
```r
# Bland-Altman summary
cat("=== Bland-Altman Analysis ===\n")
#> === Bland-Altman Analysis ===
cat(sprintf("n = %d\n", ba$input$n))
#> n = 80
cat(sprintf("Bias: %.3f (95%% CI: %.3f to %.3f)\n",
            ba$results$bias,
            ba$results$bias_ci["lower"],
            ba$results$bias_ci["upper"]))
#> Bias: 0.174 (95% CI: 0.127 to 0.220)
cat(sprintf("SD of differences: %.3f\n", ba$results$sd_diff))
#> SD of differences: 0.209
cat(sprintf("LoA: %.3f to %.3f\n\n",
            ba$results$loa_lower,
            ba$results$loa_upper))
#> LoA: -0.236 to 0.584

# Passing-Bablok summary
cat("=== Passing-Bablok Regression ===\n")
#> === Passing-Bablok Regression ===
cat(sprintf("Slope: %.4f (95%% CI: %.4f to %.4f)\n",
            pb$results$slope,
            pb$results$slope_ci["lower"],
            pb$results$slope_ci["upper"]))
#> Slope: 0.9711 (95% CI: 0.9661 to 0.9741)
cat(sprintf("Intercept: %.4f (95%% CI: %.4f to %.4f)\n",
            pb$results$intercept,
            pb$results$intercept_ci["lower"],
            pb$results$intercept_ci["upper"]))
#> Intercept: 0.2339 (95% CI: 0.2288 to 0.2387)
cat(sprintf("CUSUM p-value: %.4f\n", pb$cusum$p_value))
#> CUSUM p-value: 0.3036
```

The valytics package provides three complementary approaches for method comparison. Each has strengths suited to different scenarios.
| Aspect | Bland-Altman | Passing-Bablok | Deming |
|---|---|---|---|
| Primary question | How well do methods agree? | Is there systematic bias? | Is there systematic bias? |
| Statistical approach | Descriptive statistics | Non-parametric regression | Parametric regression |
| Error assumption | Differences ~ Normal | Distribution-free | Errors ~ Normal |
| Outlier handling | Sensitive | Robust | Sensitive |
| Output focus | Bias, limits of agreement | Slope, intercept CIs | Slope, intercept, SEs |
| Sample size | n >= 30 recommended | n >= 30 for stable CIs | n >= 10 feasible |
| Best when | Defining acceptable agreement | Outliers present, unknown error | Known error ratio, small n |
In practice, using multiple methods provides a more complete picture:
```r
# Complete method comparison workflow
ba <- ba_analysis(reference ~ test, data = mydata)
pb <- pb_regression(reference ~ test, data = mydata)
dm <- deming_regression(reference ~ test, data = mydata)

# Bland-Altman for agreement assessment
summary(ba)
plot(ba)

# Compare regression methods
cat("Passing-Bablok slope:", pb$results$slope, "\n")
cat("Deming slope:", dm$results$slope, "\n")
```

If Passing-Bablok and Deming give similar results, you can be more confident in the conclusions. If they differ substantially, investigate why (outliers? non-normality? heteroscedasticity?).
The valytics package provides statistical tools for method comparison. It calculates:

- bias and limits of agreement (Bland-Altman),
- slope and intercept estimates with confidence intervals (Passing-Bablok and Deming regression), and
- supporting diagnostics such as the Shapiro-Wilk normality test and the CUSUM linearity test.

These statistics describe the relationship between methods. Whether that relationship is “acceptable” for your purpose is a separate question that depends on:

- your clinical decision thresholds,
- regulatory requirements for your application,
- the intended use of the measurement, and
- established performance goals (CLIA, biological variation, etc.).
The package reports what the data show. You decide what it means for your application.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307-310.
Bland JM, Altman DG. Measuring agreement in method comparison studies. Statistical Methods in Medical Research. 1999;8(2):135-160.
Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Journal of Clinical Chemistry and Clinical Biochemistry. 1983;21(11):709-720.
Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clinical Chemistry. 1973;19(1):49-57.