The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Haplotype Discrete Survival Models

Klaus Holst & Thomas Scheike

2025-04-23

Haplotype Analysis for discrete TTP

Cycle-specific logistic regression of haplo-type effects with known haplo-type probabilities. Given observed genotype G and unobserved haplotypes H we here mix out over the possible haplotypes using that \(P(H|G)\) is given as input.

\[\begin{align*} S(t|x,G) & = E( S(t|x,H) | G) = \sum_{h \in G} P(h|G) S(t|z,h) \end{align*}\] so survival can be computed by mixing out over possible h given g.

Survival is based on logistic regression for the discrete hazard function of the form \[\begin{align*} \mbox{logit}(P(T=t| T >= t, x,h)) & = \alpha_t + x(h) beta \end{align*}\] where x(h) is a regression design of x and haplotypes \(h=(h_1,h_2)\).

Simple binomial data can be fitted using this function.

For standard errors we assume that haplotype probabilities are known.

We are particularly interested in the types haplotypes:

types <- c("DCGCGCTCACG","DTCCGCTGACG","ITCAGTTGACG","ITCCGCTGAGG")

## some haplotypes frequencies for simulations 
data(hapfreqs)
print(hapfreqs)
#>             index   haplotype     freq
#> DCGAGCTCACG     1 DCGAGCTCACG 0.010681
#> DCGCGCTCACG     2 DCGCGCTCACG 0.138387
#> DTGAGCTCACG     3 DTGAGCTCACG 0.000310
#> DTGAGCTCACA     4 DTGAGCTCACA 0.006800
#> DTGAGCTCGCG     5 DTGAGCTCGCG 0.034517
#> DTGACCTCACG     6 DTGACCTCACG 0.001336
#> DTGCGCTCACG     7 DTGCGCTCACG 0.009969
#> DTGCGCTCACA     8 DTGCGCTCACA 0.011833
#> DTGCGCTCGCG     9 DTGCGCTCGCG 0.302389
#> DTGCGCCCGCG    10 DTGCGCCCGCG 0.001604
#> DTGCCCTCACG    11 DTGCCCTCACG 0.003912
#> DTCAGCTGACG    12 DTCAGCTGACG 0.001855
#> DTCCGCTGACG    13 DTCCGCTGACG 0.103394
#> DTCCCCTGACG    14 DTCCCCTGACG 0.000310
#> ITCAGTTGACG    15 ITCAGTTGACG 0.048124
#> ITCCGCTGAGG    16 ITCCGCTGAGG 0.291273
#> ITCCGTTGACG    17 ITCCGTTGACG 0.031089
#> ITCCGTCGACG    18 ITCCGTCGACG 0.001502
#> ITCCCCTGAGG    19 ITCCCCTGAGG 0.000653

Among the types of interest we look up the frequencies and choose a baseline

www <-which(hapfreqs$haplotype %in% types)
hapfreqs$freq[www]
#> [1] 0.138387 0.103394 0.048124 0.291273

baseline=hapfreqs$haplotype[9]
baseline
#> [1] "DTGCGCTCGCG"

We have cycle specific data with \(id\) and outcome \(y\)

data(haploX)
dlist(haploX,.~id|id %in% c(1,4,7))
#> id: 1
#>   y X1 X2 X3 X4 times lbnr__id Count1
#> 1 0 0  0  0  0  1     1        0     
#> 2 0 0  0  0  0  2     2        0     
#> 3 0 0  0  0  0  3     3        0     
#> 4 0 0  0  0  0  4     4        0     
#> 5 0 0  0  0  0  5     5        0     
#> 6 0 0  0  0  0  6     6        0     
#> ------------------------------------------------------------ 
#> id: 4
#>    y X1 X2 X3 X4 times lbnr__id Count1
#> 19 1 0  0  0  0  1     1        0     
#> ------------------------------------------------------------ 
#> id: 7
#>    y X1 X2 X3 X4 times lbnr__id Count1
#> 37 0 1  0  0  0  1     1        0     
#> 38 0 1  0  0  0  2     2        0     
#> 39 1 1  0  0  0  3     3        0

and a list of possible haplo-types for each id and how likely they are \(p\) (the sum of within each id is 1):

data(hHaplos) ## loads ghaplos 
head(ghaplos)
#>    id      haplo1      haplo2          p
#> 1   1 DTGCGCTCGCG DTGAGCTCGCG 1.00000000
#> 19  2 ITCCGTTGACG DTGAGCTCGCG 0.06867716
#> 21  2 ITCAGTTGACG DTGCGCTCGCG 0.93132284
#> 51  3 ITCCGTTGACG DTGAGCTCGCG 0.06867716
#> 53  3 ITCAGTTGACG DTGCGCTCGCG 0.93132284
#> 66  4 DTGCGCTCGCG DTGCGCTCGCG 1.00000000

The first id=1 has the haplotype fully observed, but id=2 has two possible haplotypes consistent with the observed genotype for this id, the probabiblities are 7% and 93%, respectively.

With the baseline given above we can specify a regression design that gives an effect if a “type” is present (sm=0), or an additive effect of haplotypes (sm=1):

designftypes <- function(x,sm=0) {
hap1=x[1]
hap2=x[2]
if (sm==0) y <- 1*( (hap1==types) | (hap2==types))
if (sm==1) y <- 1*(hap1==types) + 1*(hap2==types)
return(y)
}

To fit the model we start by constructing a time-design (named X) and takes the haplotype distributions for each id

haploX$time <- haploX$times
Xdes <- model.matrix(~factor(time),haploX)
colnames(Xdes) <- paste("X",1:ncol(Xdes),sep="")
X <- dkeep(haploX,~id+y+time)
X <- cbind(X,Xdes)
Haplos <- dkeep(ghaplos,~id+"haplo*"+p)
desnames=paste("X",1:6,sep="")   # six X's related to 6 cycles 
head(X)
#>   id y time X1 X2 X3 X4 X5 X6
#> 1  1 0    1  1  0  0  0  0  0
#> 2  1 0    2  1  1  0  0  0  0
#> 3  1 0    3  1  0  1  0  0  0
#> 4  1 0    4  1  0  0  1  0  0
#> 5  1 0    5  1  0  0  0  1  0
#> 6  1 0    6  1  0  0  0  0  1

Now we can fit the model with the design given by the designfunction

out <- haplo.surv.discrete(X=X,y="y",time.name="time",
      Haplos=Haplos,desnames=desnames,designfunc=designftypes) 
names(out$coef) <- c(desnames,types)
out$coef
#>          X1          X2          X3          X4          X5          X6 
#> -1.82153345 -0.61608261 -0.17143057 -1.27152045 -0.28635976 -0.19349091 
#> DCGCGCTCACG DTCCGCTGACG ITCAGTTGACG ITCCGCTGAGG 
#>  0.79753613  0.65747412  0.06119231  0.31666905
summary(out)
#>             Estimate Std.Err     2.5%   97.5%   P-value
#> X1          -1.82153  0.1619 -2.13892 -1.5041 2.355e-29
#> X2          -0.61608  0.1895 -0.98748 -0.2447 1.149e-03
#> X3          -0.17143  0.1799 -0.52398  0.1811 3.406e-01
#> X4          -1.27152  0.2631 -1.78719 -0.7559 1.346e-06
#> X5          -0.28636  0.2030 -0.68425  0.1115 1.584e-01
#> X6          -0.19349  0.2134 -0.61184  0.2249 3.647e-01
#> DCGCGCTCACG  0.79754  0.1494  0.50465  1.0904 9.445e-08
#> DTCCGCTGACG  0.65747  0.1621  0.33971  0.9752 5.007e-05
#> ITCAGTTGACG  0.06119  0.2145 -0.35931  0.4817 7.755e-01
#> ITCCGCTGAGG  0.31667  0.1361  0.04989  0.5834 1.999e-02

Haplotypes “DCGCGCTCACG” “DTCCGCTGACG” gives increased hazard of pregnancy

The data was generated with these true coefficients

tcoef=c(-1.93110204,-0.47531630,-0.04118204,-1.57872602,-0.22176426,-0.13836416,
0.88830288,0.60756224,0.39802821,0.32706859)

cbind(out$coef,tcoef)
#>                               tcoef
#> X1          -1.82153345 -1.93110204
#> X2          -0.61608261 -0.47531630
#> X3          -0.17143057 -0.04118204
#> X4          -1.27152045 -1.57872602
#> X5          -0.28635976 -0.22176426
#> X6          -0.19349091 -0.13836416
#> DCGCGCTCACG  0.79753613  0.88830288
#> DTCCGCTGACG  0.65747412  0.60756224
#> ITCAGTTGACG  0.06119231  0.39802821
#> ITCCGCTGAGG  0.31666905  0.32706859

The design fitted can be found in the output

head(out$X,10)
#>    X1 X2 X3 X4 X5 X6 haplo1 haplo2 haplo3 haplo4
#> 1   1  0  0  0  0  0      0      0      0      0
#> 2   1  1  0  0  0  0      0      0      0      0
#> 3   1  0  1  0  0  0      0      0      0      0
#> 4   1  0  0  1  0  0      0      0      0      0
#> 5   1  0  0  0  1  0      0      0      0      0
#> 6   1  0  0  0  0  1      0      0      0      0
#> 8   1  0  0  0  0  0      0      0      1      0
#> 10  1  1  0  0  0  0      0      0      1      0
#> 12  1  0  1  0  0  0      0      0      1      0
#> 14  1  0  0  1  0  0      0      0      1      0

SessionInfo

sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: aarch64-apple-darwin24.3.0
#> Running under: macOS Sequoia 15.4.1
#> 
#> Matrix products: default
#> BLAS:   /Users/kkzh/.asdf/installs/R/4.4.3/lib/R/lib/libRblas.dylib 
#> LAPACK: /Users/kkzh/.asdf/installs/R/4.4.3/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] timereg_2.0.6  survival_3.8-3 mets_1.3.6    
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.4           knitr_1.49          rlang_1.1.5        
#>  [4] xfun_0.51           jsonlite_1.9.1      future.apply_1.11.3
#>  [7] listenv_0.9.1       lava_1.8.1          htmltools_0.5.8.1  
#> [10] sass_0.4.9          rmarkdown_2.29      grid_4.4.3         
#> [13] evaluate_1.0.3      jquerylib_0.1.4     fastmap_1.2.0      
#> [16] numDeriv_2016.8-1.1 yaml_2.3.10         mvtnorm_1.3-3      
#> [19] lifecycle_1.0.4     compiler_4.4.3      codetools_0.2-20   
#> [22] ucminf_1.2.2        Rcpp_1.0.14         future_1.40.0      
#> [25] lattice_0.22-6      digest_0.6.37       R6_2.6.1           
#> [28] parallelly_1.43.0   parallel_4.4.3      splines_4.4.3      
#> [31] Matrix_1.7-2        bslib_0.9.0         tools_4.4.3        
#> [34] globals_0.17.0      cachem_1.1.0

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.