Bootstrapping and hazard ratio thresholds

Dominic Pearce, The Institute of Genetics and Molecular Medicine, The University of Edinburgh
2018-03-01

 


 

Libraries

library(survivALL)
library(Biobase)
library(knitr)

 

To check that an observed prognostic association is reliable, survivALL can perform a non-parametric bootstrapping procedure. In short, for each point-of-separation we calculate a distribution of expected hazard ratios (HRs), against which we can compare the observed HRs in our analysis.

To achieve this, we randomly sample our survival data with replacement and then calculate survival statistics for all points-of-separation, exactly as we would for a biomarker under investigation. Repeating this procedure thousands or tens of thousands of times produces our distribution of expected hazard ratios.
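The idea can be sketched in base R without survivALL. This is an illustration only, not survivALL's own calculation: the toy data, the helper names `log_hr` and `all_log_hr`, and the constant-hazard estimate of the hazard ratio are all assumptions made for this example.

```r
set.seed(1)

# Toy survival data (hypothetical): follow-up time and event indicator
n <- 50
toy <- data.frame(time  = rexp(n, rate = 0.1),
                  event = rbinom(n, size = 1, prob = 0.7))

# Crude log hazard ratio for one split, assuming constant hazards:
# hazard = number of events / total follow-up time within each group
log_hr <- function(time, event, high) {
    h_high <- sum(event[high]) / sum(time[high])
    h_low  <- sum(event[!high]) / sum(time[!high])
    log(h_high / h_low)
}

# Log HR at every point-of-separation for a given ordering measure
all_log_hr <- function(measure, time, event) {
    ord <- order(measure)
    n <- length(measure)
    vapply(seq_len(n - 1), function(k) {
        high <- rep(FALSE, n)
        high[ord[(k + 1):n]] <- TRUE   # samples ranked above the split
        log_hr(time, event, high)
    }, numeric(1))
}

# One bootstrap repeat per column, each using a fresh random measure
n_rep <- 200
boot <- replicate(n_rep,
                  all_log_hr(sample(seq_len(n), replace = TRUE),
                             toy$time, toy$event))
dim(boot)
```

Each row of `boot` then holds the expected log HRs for one point-of-separation. Splits that leave a group with no events give non-finite values in this sketch, which is why extreme points-of-separation are less informative.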

 

data(nki_subset)

# Bootstrapped data should be formatted as one repeat per column;
# 20 repeats keeps this example quick
bs_mtx <- matrix(nrow = ncol(nki_subset), ncol = 20)

system.time(
    for (i in seq_len(ncol(bs_mtx))) {
        # A random measure, sampled with replacement, stands in for a biomarker
        bs_mtx[, i] <- allHR(measure = sample(seq_len(ncol(nki_subset)),
                                              replace = TRUE),
                             srv = pData(nki_subset),
                             time = "t.dmfs",
                             event = "e.dmfs")
    }
)

   user  system elapsed
 24.175   0.319  24.494
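Because each repeat is independent of the others, the loop can in principle be spread across cores. The following is a minimal sketch using the base `parallel` package; `one_repeat` is a hypothetical stand-in that returns random numbers, and in practice it would wrap the `allHR()` call shown above. The core count of 2 is likewise an assumption.

```r
library(parallel)

# Hypothetical stand-in for one bootstrap repeat; in practice this would
# wrap the allHR() call and return one column of the bootstrap matrix
one_repeat <- function(i, n_points = 100) {
    rnorm(n_points)
}

n_rep <- 20

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

RNGkind("L'Ecuyer-CMRG")  # random number streams suited to parallel use
set.seed(2018)
cols <- mclapply(seq_len(n_rep), one_repeat, mc.cores = n_cores)

# Bind the repeats back into the one-repeat-per-column format
bs_par <- do.call(cbind, cols)
dim(bs_par)
```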


kable(bs_mtx[1:20, 1:5])
NA NA NA NA NA
NA -0.3599171 -1.4197454 NA -2.0119266
0.4293359 NA -0.3909950 0.0787256 -2.4384185
0.8579847 -0.7296944 0.3616208 -0.7586645 -1.5438291
0.2333717 -0.1874635 0.6971450 -1.1601894 -1.0832225
0.4599159 0.2391921 0.8537166 -1.1601894 -0.7796579
0.7513465 -0.1850448 -0.0300880 -0.8377632 -1.1452012
0.9436336 0.0803095 -0.4747348 -0.5768162 -1.3095617
0.3539460 -0.1838493 -0.1934629 -0.8438445 -1.0661074
0.0105186 -0.0278590 0.0189760 -0.7870098 -1.3212246
-0.3091995 -0.1991649 -0.3605486 -0.5732683 -1.3766364
-0.1773941 -0.0821620 -0.1964635 -0.6844629 -1.5170263
-0.0620164 -0.3388317 0.0018052 -0.5345262 -1.4735722
-0.0058549 -0.3108250 0.2082813 -0.3803533 -1.5456743
0.1339785 -0.1846160 -0.0421507 -0.1672836 -1.3871177
0.2384631 -0.3569942 -0.2797757 -0.0598791 -1.4304986
0.3223261 -0.2300988 -0.4758438 -0.3050656 -1.2893195
0.0689436 -0.4189298 -0.3463617 -0.2033154 -1.1232568
0.1859467 -0.3055634 -0.1930128 -0.1517074 -1.1641397
0.2701165 -0.2220612 -0.3896788 -0.3678349 -0.9975935

 

Having calculated our bootstrapped data, we simply hand the matrix to either the survivALL() or plotALL() function (using the bs_dfr argument), which handles the subsequent significance calculations. Note that bootstrapping is slow: the 20 repeats above took roughly 24 seconds, so 10,000 repeats would take on the order of hours on comparable hardware.
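To give a feel for what a threshold derived from such a matrix might look like, here is a sketch in base R. It assumes a simple percentile rule, which is not necessarily how survivALL computes its thresholds, and it uses simulated numbers (`bs_demo`, `observed`) in place of real bootstrap output.

```r
set.seed(42)

# Simulated stand-in for a bootstrap matrix:
# rows are points-of-separation, columns are repeats
bs_demo <- matrix(rnorm(30 * 1000, sd = 0.5), nrow = 30)
bs_demo[sample(length(bs_demo), 50)] <- NA  # sporadic failed fits

# Per-row 2.5% and 97.5% quantiles of the expected log HRs
thresholds <- t(apply(bs_demo, 1, quantile,
                      probs = c(0.025, 0.975), na.rm = TRUE))

# Simulated observed log HRs, one per point-of-separation
observed <- rnorm(30, mean = 0.3)

# Flag points-of-separation whose observed value falls outside the band
outside <- observed < thresholds[, 1] | observed > thresholds[, 2]
head(cbind(thresholds, observed, outside))
```

An observed value outside the band at a given point-of-separation is more extreme than most of what random orderings produced there.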