The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This tutorial demonstrates how to use the SEAGLE
package when the user inputs a matrix \({\bf G}\). We’ll begin by loading the SEAGLE
package.
library(SEAGLE)
As an example, we’ll generate some synthetic data for usage in this tutorial. Let’s consider a dataset with \(n=5000\) individuals and \(L=100\) loci, where the first \(40\) are causal.
The makeSimData
function generates a covariate matrix \(\widetilde{\bf X} \in \mathbb{R}^{n \times 3}\), where the first column is the all ones vector for the intercept and the second and third columns are \({\bf X} \sim \text{N}(0,1)\) and \({\bf E} \sim \text{N}(0,1)\), respectively. The last two columns are scaled to have \(0\) mean and unit variance.
The makeSimData
function additionally generates the genetic marker matrix \({\bf G}\) with synthetic haplotype data from the COSI software. Detailed procedures for generating \({\bf G}\) can be found in the accompanying journal manuscript. Finally, the makeSimData
function also generates a continuous phenotype \({\bf y}\) according to the following fixed effects model \[
{\bf y} = \tilde{\bf X} \boldsymbol{\gamma}_{\widetilde{\bf X}} + {\bf G}\boldsymbol{\gamma}_{G} + \text{diag}(E){\bf G}\boldsymbol{\gamma}_{GE} + {\bf e}.
\] Here, \(\boldsymbol{\gamma}_{\tilde{\bf X}}\) is the all ones vector of length \(P=3\), \(\boldsymbol{\gamma}_{G} \in \mathbb{R}^{L}\), \(\boldsymbol{\gamma}_{GE}\in \mathbb{R}^{L}\), and \({\bf e} \sim \text{N}({\bf 0}, \sigma\, {\bf I}_{n})\). The entries of \(\boldsymbol{\gamma}_{G}\) and \(\boldsymbol{\gamma}_{GE}\) pertaining to causal loci are set to be \(\gamma_{G}\) = gammaG
and \(\gamma_{GE}\) = gammaGE
, respectively. The remaining entries of \(\boldsymbol{\gamma}_{G}\) and \(\boldsymbol{\gamma}_{GE}\) pertaining to non-causal loci are set to \(0\).
<- makeSimData(H=cosihap, n=5000, L=100, gammaG=1, gammaGE=0, causal=40, seed=1) dat
Now that we have our data, we can prepare it for use in the SEAGLE algorithm. We will input our \({\bf y}\), \({\bf X}\), \({\bf E}\), and \({\bf G}\) into the prep.SEAGLE
function. The intercept = 1
parameter indicates that the first column of \({\bf X}\) is the all ones vector for the intercept.
This preparation procedure formats the input data for the SEAGLE
function by checking the dimensions of the input data. It also pre-computes a QR decomposition for \(\widetilde{\bf X} = \begin{pmatrix} {\bf 1}_{n} & {\bf X} & {\bf E} \end{pmatrix}\), where \({\bf 1}_{n}\) denotes the all ones vector of length \(n\).
<- prep.SEAGLE(y=dat$y, X=dat$X, intercept=1, E=dat$E, G=dat$G) objSEAGLE
Finally, we’ll input the prepared data into the SEAGLE
function to compute the score-like test statistic \(T\) and its corresponding p-value. The init.tau
and init.sigma
parameters are the initial values for \(\tau\) and \(\sigma\) employed in the REML EM algorithm.
<- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5)
res $T
res#> [1] 246.1886
$pv
res#> [1] 0.8441451
The score-like test statistic \(T\) for the G\(\times\)E effect and its corresponding p-value can be found in res$T
and res$pv
, respectively.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.