The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Cycle-specific logistic regression of haplotype effects with known haplotype probabilities. Given observed genotype \(G\) and unobserved haplotypes \(H\), we marginalise over the possible haplotypes using \(P(H|G)\) supplied as input.
\[\begin{align*} S(t|x,G) & = E( S(t|x,H) | G) = \sum_{h \in G} P(h|G) S(t|z,h) \end{align*}\] so survival can be computed by marginalising over possible \(h\) given \(g\).
Survival is based on logistic regression for the discrete hazard function of the form \[\begin{align*} \mbox{logit}(P(T=t \mid T \geq t, x, h)) & = \alpha_t + x(h)^T \beta \end{align*}\] where \(x(h)\) is a regression design of \(x\) and haplotypes \(h=(h_1,h_2)\).
Simple binomial data can also be fitted using this function.
Standard errors are computed assuming that the haplotype probabilities are known.
We are particularly interested in the types haplotypes:
types <- c("DCGCGCTCACG","DTCCGCTGACG","ITCAGTTGACG","ITCCGCTGAGG")
## some haplotypes frequencies for simulations
data(haplo)
hapfreqs <- haplo$hapfreqs
print(hapfreqs)
#> index haplotype freq
#> DCGAGCTCACG 1 DCGAGCTCACG 0.010681
#> DCGCGCTCACG 2 DCGCGCTCACG 0.138387
#> DTGAGCTCACG 3 DTGAGCTCACG 0.000310
#> DTGAGCTCACA 4 DTGAGCTCACA 0.006800
#> DTGAGCTCGCG 5 DTGAGCTCGCG 0.034517
#> DTGACCTCACG 6 DTGACCTCACG 0.001336
#> DTGCGCTCACG 7 DTGCGCTCACG 0.009969
#> DTGCGCTCACA 8 DTGCGCTCACA 0.011833
#> DTGCGCTCGCG 9 DTGCGCTCGCG 0.302389
#> DTGCGCCCGCG 10 DTGCGCCCGCG 0.001604
#> DTGCCCTCACG 11 DTGCCCTCACG 0.003912
#> DTCAGCTGACG 12 DTCAGCTGACG 0.001855
#> DTCCGCTGACG 13 DTCCGCTGACG 0.103394
#> DTCCCCTGACG 14 DTCCCCTGACG 0.000310
#> ITCAGTTGACG 15 ITCAGTTGACG 0.048124
#> ITCCGCTGAGG 16 ITCCGCTGAGG 0.291273
#> ITCCGTTGACG 17 ITCCGTTGACG 0.031089
#> ITCCGTCGACG 18 ITCCGTCGACG 0.001502
#> ITCCCCTGAGG 19 ITCCCCTGAGG 0.000653Among the haplotypes of interest we look up the frequencies and choose a baseline:
www <-which(hapfreqs$haplotype %in% types)
hapfreqs$freq[www]
#> [1] 0.138387 0.103394 0.048124 0.291273
baseline=hapfreqs$haplotype[9]
baseline
#> [1] "DTGCGCTCGCG"We have cycle-specific data with \(id\) and outcome \(y\):
haploX <- haplo$haploX
dlist(haploX,.~id|id %in% c(1,4,7))
#> id: 1
#> y X1 X2 X3 X4 times lbnr__id Count1
#> 1 0 0 0 0 0 1 1 0
#> 2 0 0 0 0 0 2 2 0
#> 3 0 0 0 0 0 3 3 0
#> 4 0 0 0 0 0 4 4 0
#> 5 0 0 0 0 0 5 5 0
#> 6 0 0 0 0 0 6 6 0
#> ------------------------------------------------------------
#> id: 4
#> y X1 X2 X3 X4 times lbnr__id Count1
#> 19 1 0 0 0 0 1 1 0
#> ------------------------------------------------------------
#> id: 7
#> y X1 X2 X3 X4 times lbnr__id Count1
#> 37 0 1 0 0 0 1 1 0
#> 38 0 1 0 0 0 2 2 0
#> 39 1 1 0 0 0 3 3 0and a list of possible haplotypes for each id and their probabilities \(p\) (the probabilities within each id sum to 1):
ghaplos <- haplo$ghaplos
head(ghaplos)
#> id haplo1 haplo2 p
#> 1 1 DTGCGCTCGCG DTGAGCTCGCG 1.00000000
#> 19 2 ITCCGTTGACG DTGAGCTCGCG 0.06867716
#> 21 2 ITCAGTTGACG DTGCGCTCGCG 0.93132284
#> 51 3 ITCCGTTGACG DTGAGCTCGCG 0.06867716
#> 53 3 ITCAGTTGACG DTGCGCTCGCG 0.93132284
#> 66 4 DTGCGCTCGCG DTGCGCTCGCG 1.00000000The first id=1 has the haplotype fully observed, but id=2 has two possible haplotypes consistent with the observed genotype for this id, the probabilities are 7% and 93%, respectively.
With the baseline given above we can specify a regression design that
gives an effect if a haplotype of the specified type is present
(sm=0), or an additive effect of haplotypes
(sm=1):
designftypes <- function(x,sm=0) {
hap1=x[1]
hap2=x[2]
if (sm==0) y <- 1*( (hap1==types) | (hap2==types))
if (sm==1) y <- 1*(hap1==types) + 1*(hap2==types)
return(y)
}To fit the model we first construct a time design matrix (named
X) and supply the haplotype distributions for each id:
haploX$time <- haploX$times
Xdes <- model.matrix(~factor(time),haploX)
colnames(Xdes) <- paste("X",1:ncol(Xdes),sep="")
X <- dkeep(haploX,~id+y+time)
X <- cbind(X,Xdes)
Haplos <- dkeep(ghaplos,~id+"haplo*"+p)
desnames=paste("X",1:6,sep="") # six X's related to 6 cycles
head(X)
#> id y time X1 X2 X3 X4 X5 X6
#> 1 1 0 1 1 0 0 0 0 0
#> 2 1 0 2 1 1 0 0 0 0
#> 3 1 0 3 1 0 1 0 0 0
#> 4 1 0 4 1 0 0 1 0 0
#> 5 1 0 5 1 0 0 0 1 0
#> 6 1 0 6 1 0 0 0 0 1We can now fit the model with the design given by the design function:
out <- haplo_surv_discrete(X=X,y="y",time.name="time",
Haplos=Haplos,desnames=desnames,designfunc=designftypes)
names(out$coef) <- c(desnames,types)
out$coef
#> X1 X2 X3 X4 X5 X6
#> -1.82153345 -0.61608261 -0.17143057 -1.27152045 -0.28635976 -0.19349091
#> DCGCGCTCACG DTCCGCTGACG ITCAGTTGACG ITCCGCTGAGG
#> 0.79753613 0.65747412 0.06119231 0.31666905
summary(out)
#> Estimate Std.Err 2.5% 97.5% P-value
#> X1 -1.82153 0.1619 -2.13892 -1.5041 2.355e-29
#> X2 -0.61608 0.1895 -0.98748 -0.2447 1.149e-03
#> X3 -0.17143 0.1799 -0.52398 0.1811 3.406e-01
#> X4 -1.27152 0.2631 -1.78719 -0.7559 1.346e-06
#> X5 -0.28636 0.2030 -0.68425 0.1115 1.584e-01
#> X6 -0.19349 0.2134 -0.61184 0.2249 3.647e-01
#> DCGCGCTCACG 0.79754 0.1494 0.50465 1.0904 9.445e-08
#> DTCCGCTGACG 0.65747 0.1621 0.33971 0.9752 5.007e-05
#> ITCAGTTGACG 0.06119 0.2145 -0.35931 0.4817 7.755e-01
#> ITCCGCTGAGG 0.31667 0.1361 0.04989 0.5834 1.999e-02Haplotypes "DCGCGCTCACG" and "DTCCGCTGACG"
give an increased hazard of pregnancy.
The data were generated with these true coefficients:
tcoef=c(-1.93110204,-0.47531630,-0.04118204,-1.57872602,-0.22176426,-0.13836416,
0.88830288,0.60756224,0.39802821,0.32706859)
cbind(out$coef,tcoef)
#> tcoef
#> X1 -1.82153345 -1.93110204
#> X2 -0.61608261 -0.47531630
#> X3 -0.17143057 -0.04118204
#> X4 -1.27152045 -1.57872602
#> X5 -0.28635976 -0.22176426
#> X6 -0.19349091 -0.13836416
#> DCGCGCTCACG 0.79753613 0.88830288
#> DTCCGCTGACG 0.65747412 0.60756224
#> ITCAGTTGACG 0.06119231 0.39802821
#> ITCCGCTGAGG 0.31666905 0.32706859The fitted design matrix can be found in the output:
head(out$X,10)
#> X1 X2 X3 X4 X5 X6 haplo1 haplo2 haplo3 haplo4
#> 1 1 0 0 0 0 0 0 0 0 0
#> 2 1 1 0 0 0 0 0 0 0 0
#> 3 1 0 1 0 0 0 0 0 0 0
#> 4 1 0 0 1 0 0 0 0 0 0
#> 5 1 0 0 0 1 0 0 0 0 0
#> 6 1 0 0 0 0 1 0 0 0 0
#> 8 1 0 0 0 0 0 0 0 1 0
#> 10 1 1 0 0 0 0 0 0 1 0
#> 12 1 0 1 0 0 0 0 0 1 0
#> 14 1 0 0 1 0 0 0 0 1 0sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/kkzh/.asdf/installs/r/4.6.0/lib/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Copenhagen
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] timereg_2.0.7 survival_3.8-6 mets_1.3.10
#>
#> loaded via a namespace (and not attached):
#> [1] cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [4] xfun_0.57 otel_0.2.0 jsonlite_2.0.0
#> [7] listenv_0.10.1 future.apply_1.20.2 lava_1.9.1
#> [10] htmltools_0.5.9 stats4_4.6.0 sass_0.4.10
#> [13] rmarkdown_2.31 grid_4.6.0 evaluate_1.0.5
#> [16] jquerylib_0.1.4 fastmap_1.2.0 numDeriv_2016.8-1.1
#> [19] yaml_2.3.12 mvtnorm_1.3-7 lifecycle_1.0.5
#> [22] compiler_4.6.0 codetools_0.2-20 ucminf_1.2.3
#> [25] Rcpp_1.1.1-1.1 future_1.70.0 lattice_0.22-9
#> [28] digest_0.6.39 R6_2.6.1 parallelly_1.47.0
#> [31] parallel_4.6.0 splines_4.6.0 Matrix_1.7-5
#> [34] bslib_0.11.0 tools_4.6.0 RcppArmadillo_15.2.6-1
#> [37] globals_0.19.1 cachem_1.1.0These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.