This vignette illustrates the usage of the SNPknock package in combination with the imputation software fastPHASE to create knockoff copies of unphased genotypes or phased haplotypes (Sesia, Sabatti, and Candès 2017). Since fastPHASE is not available as an R package, this particular functionality of SNPknock requires the user to first obtain a copy of fastPHASE.
fastPHASE
fastPHASE is a program for estimating missing genotypes and unobserved haplotypes. Its underlying algorithm is based on the hidden Markov model described by Scheet and Stephens (2006).
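For orientation, the hidden Markov model is described at each marker by a recombination-like rate, a vector of cluster weights, and per-cluster allele frequencies; these correspond to the `r`, `alpha`, and `theta` arguments passed to SNPknock later in this vignette. The sketch below builds the transition matrix between latent clusters at a single marker, assuming the parameterization used by Sesia, Sabatti, and Candès (2017): the chain stays in its current cluster with probability `exp(-r_j)` and otherwise jumps to a cluster drawn from `alpha_j`. The helper name `fp_transition` and the numeric values are illustrative and not part of SNPknock or fastPHASE.

```r
# Hypothetical helper (not part of SNPknock): transition matrix between the
# K latent clusters at a single marker j, assuming
#   Q[k, k'] = exp(-r_j) * 1{k == k'} + (1 - exp(-r_j)) * alpha_j[k'].
fp_transition <- function(r_j, alpha_j) {
  K <- length(alpha_j)
  stay <- exp(-r_j)
  # Every row of the second term equals alpha_j (filled row by row).
  stay * diag(K) + (1 - stay) * matrix(alpha_j, nrow = K, ncol = K, byrow = TRUE)
}

# Illustrative values only: K = 3 clusters with weights summing to one.
Q <- fp_transition(r_j = 0.1, alpha_j = c(0.5, 0.3, 0.2))
rowSums(Q)  # each row is a probability distribution over the next cluster
```

A small `r_j` keeps neighbouring markers in the same cluster with high probability, which is how the model captures linkage between nearby sites.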
Binary executables for Linux and Mac OS are available from http://scheet.org/software.html.
Before continuing with this tutorial, download and extract the fastPHASE tarball from the above link and move the fastPHASE executable file into a convenient directory (e.g. “~/bin/”).
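Before going further, it can help to confirm from within R that the executable is where you expect it and can be run. The following is a minimal sketch using base R only; the variable `fp_path` and the helper `fp_is_usable` are illustrative names, and the path assumes the example directory mentioned above.

```r
# Assumed location from the example above; adjust to wherever you placed the file.
fp_path <- path.expand("~/bin/fastPHASE")

# Illustrative helper: TRUE only if the file exists and the current user
# has execute permission on it (file.access returns 0 on success).
fp_is_usable <- function(path) {
  file.exists(path) && file.access(path, mode = 1) == 0
}

if (!fp_is_usable(fp_path)) {
  message("fastPHASE not found or not executable at: ", fp_path)
}
```

If the check fails on Linux or Mac OS, the usual causes are a wrong path or a missing execute permission on the downloaded file.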
Finally, we can use the hidden Markov model created above to generate knockoff copies of the genotypes.
Xk = SNPknock.knockoffGenotypes(X, hmm$r, hmm$alpha, hmm$theta)
table(Xk)
## Xk
## 0 1 2
## 76558 54761 14081
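Beyond tabulating the values, a quick sanity check is to compare allele frequencies marker by marker: if the fitted model describes the data well, the knockoffs should have roughly the same marginal distribution as the original genotypes. The sketch below is one way to do this with base R. The helper `max_freq_gap` is a hypothetical name, and the toy matrices are random stand-ins so that the snippet runs on its own; in practice you would pass the `X` and `Xk` from this vignette.

```r
# Hypothetical helper: largest absolute gap in per-marker allele frequency
# between genotypes X and knockoffs Xk (both n-by-p matrices coded 0/1/2).
max_freq_gap <- function(X, Xk) {
  stopifnot(identical(dim(X), dim(Xk)))
  # colMeans / 2 converts mean genotype to allele frequency.
  max(abs(colMeans(X) - colMeans(Xk))) / 2
}

# Toy stand-ins (not real data); replace with the real X and Xk.
set.seed(1)
X_toy  <- matrix(rbinom(200 * 5, size = 2, prob = 0.3), nrow = 200)
Xk_toy <- matrix(rbinom(200 * 5, size = 2, prob = 0.3), nrow = 200)
max_freq_gap(X_toy, Xk_toy)
```

A large gap at some marker would suggest that the fitted model, or the way it was loaded, deserves a closer look before the knockoffs are used downstream.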
Finally, we can use the hidden Markov model created above to generate knockoff copies of the haplotypes.
Hk = SNPknock.knockoffHaplotypes(H, hmm$r, hmm$alpha, hmm$theta)
table(Hk)
## Hk
## 0 1
## 207908 82892
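As a light diagnostic on the haplotype knockoffs, one can check that the output has the same shape as the input, contains only 0/1 values, and is not simply a copy of the original. The sketch below uses base R only; `is_valid_haplotype_knockoff` is a hypothetical helper, and the toy matrices are random stand-ins for the `H` and `Hk` of this vignette.

```r
# Hypothetical helper: TRUE if Hk has the same dimensions as H and is binary.
is_valid_haplotype_knockoff <- function(H, Hk) {
  identical(dim(H), dim(Hk)) && all(Hk %in% c(0, 1))
}

# Toy stand-ins (not real data); replace with the real H and Hk.
set.seed(2)
H_toy  <- matrix(rbinom(8 * 5, size = 1, prob = 0.3), nrow = 8)
Hk_toy <- matrix(rbinom(8 * 5, size = 1, prob = 0.3), nrow = 8)

is_valid_haplotype_knockoff(H_toy, Hk_toy)

# Per-marker fraction of entries where the knockoff differs from the original.
differ_rate <- colMeans(H_toy != Hk_toy)
```

Markers where the knockoff almost never differs from the original are typically those in strong linkage with their neighbours, so a low rate there is not necessarily a sign of a problem.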
For basic usage of SNPknock, see the introductory vignette.
Scheet, Paul, and Matthew Stephens. 2006. “A Fast and Flexible Statistical Model for Large-Scale Population Genotype Data: Applications to Inferring Missing Genotypes and Haplotypic Phase.” Am J Hum Genet 78 (4). The American Society of Human Genetics: 629–44. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1424677/.
Sesia, Matteo, Chiara Sabatti, and Emmanuel J. Candès. 2017. “Gene Hunting with Knockoffs for Hidden Markov Models.” ArXiv E-Prints, June. https://arxiv.org/abs/1706.04677.