The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Cycle-specific logistic regression of haplo-type effects with known haplo-type probabilities. Given observed genotype G and unobserved haplotypes H we here mix out over the possible haplotypes using that \(P(H|G)\) is given as input.
\[\begin{align*} S(t|x,G) & = E( S(t|x,H) | G) = \sum_{h \in G} P(h|G) S(t|z,h) \end{align*}\] so survival can be computed by mixing out over possible h given g.
Survival is based on logistic regression for the discrete hazard function of the form \[\begin{align*} \mbox{logit}(P(T=t| T >= t, x,h)) & = \alpha_t + x(h) beta \end{align*}\] where x(h) is a regression design of x and haplotypes \(h=(h_1,h_2)\).
Simple binomial data can be fitted using this function.
For standard errors we assume that haplotype probabilities are known.
We are particularly interested in the types haplotypes:
types <- c("DCGCGCTCACG","DTCCGCTGACG","ITCAGTTGACG","ITCCGCTGAGG")
## some haplotypes frequencies for simulations
data(hapfreqs)
print(hapfreqs)
#> index haplotype freq
#> DCGAGCTCACG 1 DCGAGCTCACG 0.010681
#> DCGCGCTCACG 2 DCGCGCTCACG 0.138387
#> DTGAGCTCACG 3 DTGAGCTCACG 0.000310
#> DTGAGCTCACA 4 DTGAGCTCACA 0.006800
#> DTGAGCTCGCG 5 DTGAGCTCGCG 0.034517
#> DTGACCTCACG 6 DTGACCTCACG 0.001336
#> DTGCGCTCACG 7 DTGCGCTCACG 0.009969
#> DTGCGCTCACA 8 DTGCGCTCACA 0.011833
#> DTGCGCTCGCG 9 DTGCGCTCGCG 0.302389
#> DTGCGCCCGCG 10 DTGCGCCCGCG 0.001604
#> DTGCCCTCACG 11 DTGCCCTCACG 0.003912
#> DTCAGCTGACG 12 DTCAGCTGACG 0.001855
#> DTCCGCTGACG 13 DTCCGCTGACG 0.103394
#> DTCCCCTGACG 14 DTCCCCTGACG 0.000310
#> ITCAGTTGACG 15 ITCAGTTGACG 0.048124
#> ITCCGCTGAGG 16 ITCCGCTGAGG 0.291273
#> ITCCGTTGACG 17 ITCCGTTGACG 0.031089
#> ITCCGTCGACG 18 ITCCGTCGACG 0.001502
#> ITCCCCTGAGG 19 ITCCCCTGAGG 0.000653
Among the types of interest we look up the frequencies and choose a baseline
www <-which(hapfreqs$haplotype %in% types)
hapfreqs$freq[www]
#> [1] 0.138387 0.103394 0.048124 0.291273
baseline=hapfreqs$haplotype[9]
baseline
#> [1] "DTGCGCTCGCG"
We have cycle specific data with \(id\) and outcome \(y\)
data(haploX)
dlist(haploX,.~id|id %in% c(1,4,7))
#> id: 1
#> y X1 X2 X3 X4 times lbnr__id Count1
#> 1 0 0 0 0 0 1 1 0
#> 2 0 0 0 0 0 2 2 0
#> 3 0 0 0 0 0 3 3 0
#> 4 0 0 0 0 0 4 4 0
#> 5 0 0 0 0 0 5 5 0
#> 6 0 0 0 0 0 6 6 0
#> ------------------------------------------------------------
#> id: 4
#> y X1 X2 X3 X4 times lbnr__id Count1
#> 19 1 0 0 0 0 1 1 0
#> ------------------------------------------------------------
#> id: 7
#> y X1 X2 X3 X4 times lbnr__id Count1
#> 37 0 1 0 0 0 1 1 0
#> 38 0 1 0 0 0 2 2 0
#> 39 1 1 0 0 0 3 3 0
and a list of possible haplo-types for each id and how likely they are \(p\) (the sum of within each id is 1):
data(hHaplos) ## loads ghaplos
head(ghaplos)
#> id haplo1 haplo2 p
#> 1 1 DTGCGCTCGCG DTGAGCTCGCG 1.00000000
#> 19 2 ITCCGTTGACG DTGAGCTCGCG 0.06867716
#> 21 2 ITCAGTTGACG DTGCGCTCGCG 0.93132284
#> 51 3 ITCCGTTGACG DTGAGCTCGCG 0.06867716
#> 53 3 ITCAGTTGACG DTGCGCTCGCG 0.93132284
#> 66 4 DTGCGCTCGCG DTGCGCTCGCG 1.00000000
The first id=1 has the haplotype fully observed, but id=2 has two possible haplotypes consistent with the observed genotype for this id, the probabiblities are 7% and 93%, respectively.
With the baseline given above we can specify a regression design that gives an effect if a “type” is present (sm=0), or an additive effect of haplotypes (sm=1):
designftypes <- function(x,sm=0) {
hap1=x[1]
hap2=x[2]
if (sm==0) y <- 1*( (hap1==types) | (hap2==types))
if (sm==1) y <- 1*(hap1==types) + 1*(hap2==types)
return(y)
}
To fit the model we start by constructing a time-design (named X) and takes the haplotype distributions for each id
haploX$time <- haploX$times
Xdes <- model.matrix(~factor(time),haploX)
colnames(Xdes) <- paste("X",1:ncol(Xdes),sep="")
X <- dkeep(haploX,~id+y+time)
X <- cbind(X,Xdes)
Haplos <- dkeep(ghaplos,~id+"haplo*"+p)
desnames=paste("X",1:6,sep="") # six X's related to 6 cycles
head(X)
#> id y time X1 X2 X3 X4 X5 X6
#> 1 1 0 1 1 0 0 0 0 0
#> 2 1 0 2 1 1 0 0 0 0
#> 3 1 0 3 1 0 1 0 0 0
#> 4 1 0 4 1 0 0 1 0 0
#> 5 1 0 5 1 0 0 0 1 0
#> 6 1 0 6 1 0 0 0 0 1
Now we can fit the model with the design given by the designfunction
out <- haplo.surv.discrete(X=X,y="y",time.name="time",
Haplos=Haplos,desnames=desnames,designfunc=designftypes)
names(out$coef) <- c(desnames,types)
out$coef
#> X1 X2 X3 X4 X5 X6
#> -1.82153345 -0.61608261 -0.17143057 -1.27152045 -0.28635976 -0.19349091
#> DCGCGCTCACG DTCCGCTGACG ITCAGTTGACG ITCCGCTGAGG
#> 0.79753613 0.65747412 0.06119231 0.31666905
summary(out)
#> Estimate Std.Err 2.5% 97.5% P-value
#> X1 -1.82153 0.1619 -2.13892 -1.5041 2.355e-29
#> X2 -0.61608 0.1895 -0.98748 -0.2447 1.149e-03
#> X3 -0.17143 0.1799 -0.52398 0.1811 3.406e-01
#> X4 -1.27152 0.2631 -1.78719 -0.7559 1.346e-06
#> X5 -0.28636 0.2030 -0.68425 0.1115 1.584e-01
#> X6 -0.19349 0.2134 -0.61184 0.2249 3.647e-01
#> DCGCGCTCACG 0.79754 0.1494 0.50465 1.0904 9.445e-08
#> DTCCGCTGACG 0.65747 0.1621 0.33971 0.9752 5.007e-05
#> ITCAGTTGACG 0.06119 0.2145 -0.35931 0.4817 7.755e-01
#> ITCCGCTGAGG 0.31667 0.1361 0.04989 0.5834 1.999e-02
Haplotypes “DCGCGCTCACG” “DTCCGCTGACG” gives increased hazard of pregnancy
The data was generated with these true coefficients
tcoef=c(-1.93110204,-0.47531630,-0.04118204,-1.57872602,-0.22176426,-0.13836416,
0.88830288,0.60756224,0.39802821,0.32706859)
cbind(out$coef,tcoef)
#> tcoef
#> X1 -1.82153345 -1.93110204
#> X2 -0.61608261 -0.47531630
#> X3 -0.17143057 -0.04118204
#> X4 -1.27152045 -1.57872602
#> X5 -0.28635976 -0.22176426
#> X6 -0.19349091 -0.13836416
#> DCGCGCTCACG 0.79753613 0.88830288
#> DTCCGCTGACG 0.65747412 0.60756224
#> ITCAGTTGACG 0.06119231 0.39802821
#> ITCCGCTGAGG 0.31666905 0.32706859
The design fitted can be found in the output
head(out$X,10)
#> X1 X2 X3 X4 X5 X6 haplo1 haplo2 haplo3 haplo4
#> 1 1 0 0 0 0 0 0 0 0 0
#> 2 1 1 0 0 0 0 0 0 0 0
#> 3 1 0 1 0 0 0 0 0 0 0
#> 4 1 0 0 1 0 0 0 0 0 0
#> 5 1 0 0 0 1 0 0 0 0 0
#> 6 1 0 0 0 0 1 0 0 0 0
#> 8 1 0 0 0 0 0 0 0 1 0
#> 10 1 1 0 0 0 0 0 0 1 0
#> 12 1 0 1 0 0 0 0 0 1 0
#> 14 1 0 0 1 0 0 0 0 1 0
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: aarch64-apple-darwin24.2.0
#> Running under: macOS Sequoia 15.2
#>
#> Matrix products: default
#> BLAS: /Users/klaus/.asdf/installs/R/4.4.2/lib/R/lib/libRblas.dylib
#> LAPACK: /Users/klaus/.asdf/installs/R/4.4.2/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] mets_1.3.5 timereg_2.0.6 survival_3.8-3
#>
#> loaded via a namespace (and not attached):
#> [1] cli_3.6.3 knitr_1.49 rlang_1.1.4
#> [4] xfun_0.50 jsonlite_1.8.9 listenv_0.9.1
#> [7] future.apply_1.11.3 lava_1.8.1 htmltools_0.5.8.1
#> [10] sass_0.4.9 rmarkdown_2.29 grid_4.4.2
#> [13] evaluate_1.0.1 jquerylib_0.1.4 fastmap_1.2.0
#> [16] mvtnorm_1.3-2 yaml_2.3.10 lifecycle_1.0.4
#> [19] numDeriv_2016.8-1.1 compiler_4.4.2 codetools_0.2-20
#> [22] ucminf_1.2.2 Rcpp_1.0.13-1 future_1.34.0
#> [25] lattice_0.22-6 digest_0.6.37 R6_2.5.1
#> [28] parallelly_1.41.0 parallel_4.4.2 splines_4.4.2
#> [31] bslib_0.8.0 Matrix_1.7-1 tools_4.4.2
#> [34] globals_0.16.3 cachem_1.1.0
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.