---
title: "Bootstrapping and hazard ratio thresholds"
author: "Dominic Pearce"
date: "2017-10-09"
header-includes:
  -
output: github_document
vignette: >
  %
  %
  %
---
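```r
library(survivALL)  # allHR() and the nki_subset example data
library(Biobase)    # pData() for the ExpressionSet phenotype data
library(knitr)      # kable() for printing tables
```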
When performing a large number of statistical tests, as is integral to the survivALL rationale, it is important to protect against false-positive results using some form of multiple testing correction. For survivALL this is implemented as a bootstrapping exercise that determines robust thresholds of hazard ratio significance. In short, for each point-of-separation we calculate an upper and a lower limit within which we expect hazard ratios to occur by chance, and beyond which hazard ratios are unlikely (1 in 20) to have occurred by chance.
To achieve this, we randomly sample our survival data with replacement and calculate survival statistics for all points-of-separation, just as we would for a biomarker under investigation. By repeating this procedure thousands or tens of thousands of times, we produce a distribution of expected hazard ratios, whose mean and standard deviation we use to calculate the upper and lower thresholds for each point-of-separation.
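Written out, one plausible form of these per-point-of-separation thresholds is given below. It assumes the limits are symmetric, the mean plus or minus $z$ standard deviations, with $z \approx 1.96$ as the two-sided normal quantile matching the 1-in-20 level; the multiplier survivALL actually applies may differ. Here $h_{jb}$ is the hazard ratio at point-of-separation $j$ in bootstrap repeat $b$, $S_j$ is the set of repeats where that value is defined (not `NA`), and $B_j = |S_j|$.

$$
\bar{h}_j = \frac{1}{B_j}\sum_{b \in S_j} h_{jb}, \qquad
s_j = \sqrt{\frac{1}{B_j - 1}\sum_{b \in S_j}\left(h_{jb} - \bar{h}_j\right)^2}
$$

$$
L_j = \bar{h}_j - z\, s_j, \qquad U_j = \bar{h}_j + z\, s_j, \qquad z \approx 1.96
$$

An observed hazard ratio at point-of-separation $j$ would then be flagged when it falls below $L_j$ or above $U_j$.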
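```r
data(nki_subset)

# bootstrapping data should be in the format of 1 repeat per column:
# one row per point-of-separation (one per sample), one column per repeat
bs_mtx <- matrix(nrow = ncol(nki_subset), ncol = 20)

system.time(
    for (i in seq_len(ncol(bs_mtx))) {
        # a random measure, resampled with replacement, is unrelated to
        # survival, so its hazard ratios reflect chance alone
        bs_mtx[, i] <- allHR(measure = sample(1:ncol(nki_subset),
                                              replace = TRUE),
                             srv = pData(nki_subset),
                             time = "t.dmfs",
                             event = "e.dmfs")
    }
)
```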
```
   user  system elapsed
 24.474   0.317  24.791
```
```r
kable(bs_mtx[1:20, 1:5])
```

| Repeat 1 | Repeat 2 | Repeat 3 | Repeat 4 | Repeat 5 |
|---------:|---------:|---------:|---------:|---------:|
| NA | NA | NA | NA | NA |
| -0.6636024 | NA | -0.6498172 | NA | NA |
| 0.2948655 | NA | -0.0363576 | 0.2002942 | NA |
| 0.7936404 | NA | -0.7139390 | 0.2002942 | 0.4663900 |
| 1.1562355 | NA | -0.3604850 | NA | -0.4318112 |
| 1.3953123 | 0.8903437 | 0.0338967 | -0.3773828 | -0.0921724 |
| 1.6656537 | 1.1573373 | -0.5522164 | NA | 0.2828075 |
| 0.7217249 | 0.2363196 | -0.1805967 | NA | -0.3141590 |
| 0.8928270 | 0.4541722 | -0.0206375 | NA | -0.5281121 |
| 1.0783287 | 0.0203318 | -0.3614099 | NA | -0.2767970 |
| 1.2076726 | -0.2578487 | -0.1591703 | NA | -0.1311125 |
| 1.3155364 | -0.1037536 | -0.0295546 | NA | -0.4169492 |
| 1.4695730 | -0.3253106 | 0.1494979 | NA | -0.5841185 |
| 1.5601172 | -0.2050379 | 0.2654024 | -0.4403237 | -0.7834439 |
| 1.5601172 | -0.0132453 | 0.4034886 | -0.2577967 | -0.6889736 |
| 1.6628944 | -0.2292343 | 0.1620302 | NA | -0.5143187 |
| 1.7183333 | -0.3537452 | -0.0783955 | -0.3369952 | -0.3959981 |
| 1.7975937 | -0.4249202 | -0.2959228 | NA | -0.5227285 |
| 1.8815282 | -0.5329942 | -0.1943872 | NA | -0.4260811 |
| 1.9614968 | -0.4339722 | -0.0708462 | NA | -0.3311565 |
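Having calculated our bootstrapped data, we then simply hand the matrix to either the `survivALL()` or the `plotALL()` function, which handles the subsequent thresholding calculations. Note that running up to 10,000 bootstrap repeats can take a long time: on the machine above, the 20 repeats took about 25 seconds (elapsed), roughly 1.2 seconds per repeat, so 10,000 repeats at the same rate would take around 3.4 hours.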
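`survivALL()` and `plotALL()` perform the thresholding internally. As a rough picture of what that step involves, here is a minimal base-R sketch, assuming the limits are the per-point-of-separation mean plus or minus 1.96 standard deviations of the bootstrapped values (a two-sided 1-in-20 level under a normal approximation). The helpers `bs_thresholds()` and `outside_thresholds()` are hypothetical and not part of survivALL, the toy matrix only stands in for `bs_mtx`, and the package's exact rule may differ.

```r
# Hypothetical helpers for illustration only; these are NOT survivALL functions.
# Assumption: limits are mean +/- z * SD per point-of-separation, z = 1.96.

# bs: matrix with one row per point-of-separation and one column per repeat
bs_thresholds <- function(bs, z = 1.96) {
    mu <- rowMeans(bs, na.rm = TRUE)      # NaN for rows that are entirely NA
    s  <- apply(bs, 1, sd, na.rm = TRUE)  # sample SD (n - 1 denominator)
    data.frame(mean = mu, sd = s, lower = mu - z * s, upper = mu + z * s)
}

# TRUE where an observed value falls outside its limits, FALSE inside,
# NA where the value or its limits are undefined
outside_thresholds <- function(hr, thr) {
    hr < thr$lower | hr > thr$upper
}

# Toy stand-in for bs_mtx: 50 points-of-separation x 200 repeats of noise
set.seed(1)
toy_bs <- matrix(rnorm(50 * 200), nrow = 50)
toy_bs[1, ] <- NA  # first point-of-separation undefined, as in the table above

thr <- bs_thresholds(toy_bs)
observed <- c(NA, rep(0, 48), 5)  # toy "observed" values, not real data
flags <- outside_thresholds(observed, thr)
which(flags)  # points-of-separation beyond the limits
```

With real data, the limits would come from `bs_thresholds(bs_mtx)` and the observed values from `allHR()` run on the biomarker of interest. Using rows rather than a single global cut-off reflects the point made above: each point-of-separation gets its own expected range.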