The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Parallel Testing

library(segtest)

We provide a walk-through of how to run tests for segregation distortion at many loci in parallel.

Data sets ufit, ufit2, and ufit3 contain the genotyping output of updog::multidog() using three different models.

You can convert this genotyping output to what multi_lrt() expects using multidog_to_g().

If you did not use the f1pp or f1 models, use ether the all_gl (to run tests using genotype log-likelihoods) or all_g (to run tests assuming genotypes are known) options, and you must specify the ID’s of the parents.

o1 <- multidog_to_g(ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp")
o2 <- multidog_to_g(ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp")

If you did use the f1pp or f1 models, use either the off_gl (to run tests using genotype log-likelihoods) or off_g (to run tests assuming genotypes are known) options.

o3 <- multidog_to_g(ufit2, type = "off_g")
o4 <- multidog_to_g(ufit2, type = "off_gl")
o5 <- multidog_to_g(ufit3, type = "off_g")
o6 <- multidog_to_g(ufit3, type = "off_gl")

We would recommend always using genotype log-likelihoods. But the option is there for known genotypes.

Parallelization support is provided through the future package. You specify how to implement parallelization using future::plan(). Then you run multi_lrt(). Then you shut down the parallelization with future::plan(). The most common plan is future::multisession(), where you specify the number of parallel processes with the workers argument. You can get the maximum number of workers via future::availableCores()

future::availableCores()
#> system 
#>     16

So a typically workload looks something like this:

future::plan(future::multisession(workers = 2))
mout <- multi_lrt(g = o2$g, p1 = o2$p1, p2 = o2$p2)
future::plan(future::sequential())

The output is a data frame. The most important parts of this are the snp and p_value columns. As a reminder, please ignore the alpha, xi1, and xi2 estimates. Those are noisy. Please don’t use them for real work.

mout[c("snp", "p_value")]
#>            snp     p_value
#> 1   1_44673475 0.205250847
#> 2  11_28836119 0.200142085
#> 3  12_31029646 0.018008460
#> 4   12_8487773 1.000000000
#> 5    2_1426329 0.453995964
#> 6   2_20070837 0.001131421
#> 7   2_40108108 0.410718596
#> 8   4_37820371 0.212515165
#> 9   6_30619745 0.617023681
#> 10   6_4350249 0.649505305

It looks like SNPs 12_31029646 and 2_20070837 are in possible segregation distortion. Let’s look at the posterior mode genotypes of SNP 2_20070837:

o1$g["2_20070837", ]
#>   0   1   2   3   4 
#>  40 166  32   2   0
o1$p1["2_20070837"]
#> 2_20070837 
#>          0
o1$p2["2_20070837"]
#> 2_20070837 
#>          2

So SNP 2_20070837 likely got flagged because of two individuals that are very likely a genotype of 3, which is impossible if the parents have genotypes 0 and 2.

For SNP 12_31029646, let’s compare the expected frequencies against the observed modes.

offspring_gf_2(alpha = 0.1249, xi1 = 1/3, p1 = 2, p2 = 0)
#> [1] 0.2083 0.5834 0.2083 0.0000 0.0000
o1$g["12_31029646", ] / sum(o1$g["12_31029646", ])
#>         0         1         2         3         4 
#> 0.1541667 0.5833333 0.2625000 0.0000000 0.0000000
o1$p1["12_31029646"]
#> 12_31029646 
#>           2
o1$p2["12_31029646"]
#> 12_31029646 
#>           0

So SNP 12_31029646 likely got flagged because there are too many individuals with a genotype of 2, and not enough with a genotype of 0.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.