The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
We provide a walk-through of how to run tests for segregation distortion at many loci in parallel.
Data sets ufit
, ufit2
, and
ufit3
contain the genotyping output of
updog::multidog()
using three different models.
ufit
: Uses the norm
model without
specifying who the parents are.ufit2
: Uses the f1pp
model, specifying the
parents.ufit3
: Uses the f1
model, specifying the
parents.You can convert this genotyping output to what
multi_lrt()
expects using multidog_to_g()
.
If you did not use the f1pp
or f1
models, use ether the all_gl
(to run tests using genotype
log-likelihoods) or all_g
(to run tests assuming genotypes
are known) options, and you must specify the ID’s of the parents.
o1 <- multidog_to_g(ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp")
o2 <- multidog_to_g(ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp")
If you did use the f1pp
or f1
models, use either the off_gl
(to run tests using genotype
log-likelihoods) or off_g
(to run tests assuming genotypes
are known) options.
o3 <- multidog_to_g(ufit2, type = "off_g")
o4 <- multidog_to_g(ufit2, type = "off_gl")
o5 <- multidog_to_g(ufit3, type = "off_g")
o6 <- multidog_to_g(ufit3, type = "off_gl")
We would recommend always using genotype log-likelihoods. But the option is there for known genotypes.
Parallelization support is provided through the future package. You
specify how to implement parallelization using
future::plan()
. Then you run multi_lrt()
. Then
you shut down the parallelization with future::plan()
. The
most common plan is future::multisession()
, where you
specify the number of parallel processes with the workers
argument. You can get the maximum number of workers via
future::availableCores()
So a typically workload looks something like this:
future::plan(future::multisession(workers = 2))
mout <- multi_lrt(g = o2$g, p1 = o2$p1, p2 = o2$p2)
future::plan(future::sequential())
The output is a data frame. The most important parts of this are the
snp
and p_value
columns. As a reminder, please
ignore the alpha
, xi1
, and xi2
estimates. Those are noisy. Please don’t use them for real work.
mout[c("snp", "p_value")]
#> snp p_value
#> 1 1_44673475 0.205250847
#> 2 11_28836119 0.200142085
#> 3 12_31029646 0.018008460
#> 4 12_8487773 1.000000000
#> 5 2_1426329 0.453995964
#> 6 2_20070837 0.001131421
#> 7 2_40108108 0.410718596
#> 8 4_37820371 0.212515165
#> 9 6_30619745 0.617023681
#> 10 6_4350249 0.649505305
It looks like SNPs 12_31029646 and 2_20070837 are in possible segregation distortion. Let’s look at the posterior mode genotypes of SNP 2_20070837:
o1$g["2_20070837", ]
#> 0 1 2 3 4
#> 40 166 32 2 0
o1$p1["2_20070837"]
#> 2_20070837
#> 0
o1$p2["2_20070837"]
#> 2_20070837
#> 2
So SNP 2_20070837 likely got flagged because of two individuals that are very likely a genotype of 3, which is impossible if the parents have genotypes 0 and 2.
For SNP 12_31029646, let’s compare the expected frequencies against the observed modes.
offspring_gf_2(alpha = 0.1249, xi1 = 1/3, p1 = 2, p2 = 0)
#> [1] 0.2083 0.5834 0.2083 0.0000 0.0000
o1$g["12_31029646", ] / sum(o1$g["12_31029646", ])
#> 0 1 2 3 4
#> 0.1541667 0.5833333 0.2625000 0.0000000 0.0000000
o1$p1["12_31029646"]
#> 12_31029646
#> 2
o1$p2["12_31029646"]
#> 12_31029646
#> 0
So SNP 12_31029646 likely got flagged because there are too many individuals with a genotype of 2, and not enough with a genotype of 0.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.