title: "Bootstrapping and hazard ratio thresholds"
author: "Dominic Pearce"
date: "2017-10-09"
header-includes:
-
output: github_document
vignette: >
%
%
%

 

Bootstrapping and hazard ratio thresholds

 

Libraries

library(survivALL)
library(Biobase)
library(knitr)

 

When performing a large number of statistical tests, as is integral to the survivALL rationale, it is important to protect against false positive results using some form of multiple testing correction. For survivALL this is implemented as a bootstrapping exercise to determine robust thresholds of hazard ratio significance. In short, we calculate for each point-of-separation an upper and a lower limit within which we expect hazard ratios to occur by chance, and beyond which hazard ratios are unlikely (1 in 20) to have occurred by chance.

To achieve this, we randomly sample our survival data with replacement and calculate survival statistics for all points-of-separation, as we would for a biomarker under investigation. By repeating this procedure thousands or tens of thousands of times, we produce a distribution of expected hazard ratios, whose mean and standard deviation we use to calculate the upper and lower thresholds for each point-of-separation.
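
The thresholding step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal calculation: `toy_bs` is a simulated stand-in for a real bootstrap matrix, and the multiplier of roughly 1.96 standard deviations is an assumed normal approximation of the 1-in-20 band.

```r
# Toy stand-in for a bootstrap matrix: rows are points-of-separation,
# columns are bootstrap repeats (simulated values, not survivALL output)
set.seed(1)
n_points <- 50
n_repeats <- 200
toy_bs <- matrix(rnorm(n_points * n_repeats, mean = 0, sd = 0.5),
                 nrow = n_points, ncol = n_repeats)

# Mimic points where no hazard ratio could be computed
toy_bs[1, ] <- NA
toy_bs[sample(length(toy_bs), 100)] <- NA

# Per-point mean and standard deviation across repeats, ignoring NAs
bs_mean <- rowMeans(toy_bs, na.rm = TRUE)
bs_sd <- apply(toy_bs, 1, sd, na.rm = TRUE)

# Assumption: mean +/- ~1.96 SD approximates a two-sided 1-in-20 band
z <- qnorm(0.975)
thresholds <- data.frame(lower = bs_mean - z * bs_sd,
                         upper = bs_mean + z * bs_sd)
head(thresholds)
```

A point-of-separation whose observed statistic falls outside its `lower` to `upper` band would then be treated as unlikely to have arisen by chance. Rows with no usable bootstrap values, such as the first row here, yield missing thresholds.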

 

data(nki_subset)

# Bootstrapped data must be laid out as one repeat per column:
# rows are points-of-separation (one per sample), columns are repeats
bs_mtx <- matrix(nrow = ncol(nki_subset), ncol = 20)

system.time(
            for(i in seq_len(ncol(bs_mtx))){
                # a randomly resampled measure stands in for a real biomarker
                bs_mtx[, i] <- allHR(measure = sample(1:ncol(nki_subset), 
                                                      replace = TRUE),
                                     srv = pData(nki_subset),
                                     time = "t.dmfs",
                                     event = "e.dmfs")
            }
)

   user  system elapsed
 24.474   0.317  24.791
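
Because every repeat relies on `sample()`, the bootstrap matrix will differ from run to run unless a seed is set first. The following base R sketch isolates just the resampling step to show this; the sample size `n` and the seed value are arbitrary choices for illustration.

```r
# Hypothetical number of samples, standing in for ncol(nki_subset)
n <- 10

# Draw indices with replacement, as done for the random measure above
set.seed(42)
idx_a <- sample(seq_len(n), replace = TRUE)

# Resetting the same seed reproduces exactly the same draw
set.seed(42)
idx_b <- sample(seq_len(n), replace = TRUE)

identical(idx_a, idx_b)
```

Calling `set.seed()` once before the bootstrap loop would make the whole matrix reproducible in the same way.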


kable(bs_mtx[1:20, 1:5])
NA NA NA NA NA
-0.6636024 NA -0.6498172 NA NA
0.2948655 NA -0.0363576 0.2002942 NA
0.7936404 NA -0.7139390 0.2002942 0.4663900
1.1562355 NA -0.3604850 NA -0.4318112
1.3953123 0.8903437 0.0338967 -0.3773828 -0.0921724
1.6656537 1.1573373 -0.5522164 NA 0.2828075
0.7217249 0.2363196 -0.1805967 NA -0.3141590
0.8928270 0.4541722 -0.0206375 NA -0.5281121
1.0783287 0.0203318 -0.3614099 NA -0.2767970
1.2076726 -0.2578487 -0.1591703 NA -0.1311125
1.3155364 -0.1037536 -0.0295546 NA -0.4169492
1.4695730 -0.3253106 0.1494979 NA -0.5841185
1.5601172 -0.2050379 0.2654024 -0.4403237 -0.7834439
1.5601172 -0.0132453 0.4034886 -0.2577967 -0.6889736
1.6628944 -0.2292343 0.1620302 NA -0.5143187
1.7183333 -0.3537452 -0.0783955 -0.3369952 -0.3959981
1.7975937 -0.4249202 -0.2959228 NA -0.5227285
1.8815282 -0.5329942 -0.1943872 NA -0.4260811
1.9614968 -0.4339722 -0.0708462 NA -0.3311565

 

Having calculated our bootstrapped data, we then simply hand the matrix to either the survivALL() or plotALL() function, which handles the subsequent thresholding calculations. Note that bootstrapping with up to 10,000 repeats is slow: the 20 repeats above alone took about 25 seconds.
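
To get a feel for the time investment, the timing reported for the 20 repeats can be extrapolated. This is a rough estimate only, assuming that runtime scales linearly with the number of repeats and that the same machine and dataset are used.

```r
# Elapsed seconds reported above for 20 bootstrap repeats
elapsed_20 <- 24.791
per_repeat <- elapsed_20 / 20

# Assumption: runtime grows linearly with the number of repeats
n_repeats <- c(1000, 10000)
est_hours <- per_repeat * n_repeats / 3600
names(est_hours) <- n_repeats
round(est_hours, 2)
```

Under these assumptions 10,000 repeats would need roughly 3.4 hours, so it may be worth computing the bootstrap matrix once and saving it for reuse.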