Example 2: Using SEAGLE with Simulated Data

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This tutorial demonstrates how to use the SEAGLE package when the user inputs a matrix ${\bf G}$. We’ll begin by loading the SEAGLE package.

library(SEAGLE)

As an example, we’ll generate some synthetic data for usage in this tutorial. Let’s consider a dataset with $n=5000$ individuals and $L=100$ loci, where the first $40$ are causal.

The makeSimData function generates a covariate matrix $\widetilde{\bf X} \in \mathbb{R}^{n \times 3}$, where the first column is the all ones vector for the intercept and the second and third columns are ${\bf X} \sim \text{N}(0,1)$ and ${\bf E} \sim \text{N}(0,1)$, respectively. The last two columns are scaled to have $0$ mean and unit variance.

The makeSimData function additionally generates the genetic marker matrix ${\bf G}$ with synthetic haplotype data from the COSI software. Detailed procedures for generating ${\bf G}$ can be found in the accompanying journal manuscript. Finally, the makeSimData function also generates a continuous phenotype ${\bf y}$ according to the following fixed effects model \[ {\bf y} = \tilde{\bf X} \boldsymbol{\gamma}_{\widetilde{\bf X}} + {\bf G}\boldsymbol{\gamma}_{G} + \text{diag}(E){\bf G}\boldsymbol{\gamma}_{GE} + {\bf e}. \] Here, $\boldsymbol{\gamma}_{\tilde{\bf X}}$ is the all ones vector of length $P=3$, $\boldsymbol{\gamma}_{G} \in \mathbb{R}^{L}$, $\boldsymbol{\gamma}_{GE}\in \mathbb{R}^{L}$, and ${\bf e} \sim \text{N}({\bf 0}, \sigma\, {\bf I}_{n})$. The entries of $\boldsymbol{\gamma}_{G}$ and $\boldsymbol{\gamma}_{GE}$ pertaining to causal loci are set to be $\gamma_{G}$ = gammaG and $\gamma_{GE}$ = gammaGE, respectively. The remaining entries of $\boldsymbol{\gamma}_{G}$ and $\boldsymbol{\gamma}_{GE}$ pertaining to non-causal loci are set to $0$.

dat <- makeSimData(H=cosihap, n=5000, L=100, gammaG=1, gammaGE=0, causal=40, seed=1)

Now that we have our data, we can prepare it for use in the SEAGLE algorithm. We will input our ${\bf y}$, ${\bf X}$, ${\bf E}$, and ${\bf G}$ into the prep.SEAGLE function. The intercept = 1 parameter indicates that the first column of ${\bf X}$ is the all ones vector for the intercept.

This preparation procedure formats the input data for the SEAGLE function by checking the dimensions of the input data. It also pre-computes a QR decomposition for $\widetilde{\bf X} = \begin{pmatrix} {\bf 1}_{n} & {\bf X} & {\bf E} \end{pmatrix}$, where ${\bf 1}_{n}$ denotes the all ones vector of length $n$.

objSEAGLE <- prep.SEAGLE(y=dat$y, X=dat$X, intercept=1, E=dat$E, G=dat$G)

Finally, we’ll input the prepared data into the SEAGLE function to compute the score-like test statistic $T$ and its corresponding p-value. The init.tau and init.sigma parameters are the initial values for $\tau$ and $\sigma$ employed in the REML EM algorithm.

res <- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5)
res$T
#> [1] 246.1886
res$pv
#> [1] 0.8441451

The score-like test statistic $T$ for the G$\times$E effect and its corresponding p-value can be found in res$T and res$pv, respectively.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.