
SKAT, weights, and projections

Thomas Lumley

The original SKAT statistic for linear and generalised linear models is \[Q=(y-\hat\mu)'GW^2G'(y-\hat\mu)=(y-\hat\mu)'K(y-\hat\mu)\] where \(G\) is the \(N\times M\) genotype matrix and \(W\) is a weight matrix that in practice is diagonal. I've written \(W^2\) where the original notation has \(W\), because that's what everyone effectively uses. The Harvard group has a factor of 1/2 somewhere in here; the BU/CHARGE group doesn't.
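
For concreteness, here is the whole computation in a toy linear-model example (a sketch, not any package's code; the Beta(1,25) weights on minor allele frequency are the usual SKAT default):

set.seed(1)
N <- 500; M <- 20
G <- matrix(rbinom(N * M, 2, 0.05), N, M)   # N x M genotype matrix
X <- cbind(1, rnorm(N))                     # adjustment covariates
y <- rnorm(N)
w <- dbeta(colMeans(G) / 2, 1, 25)          # diagonal of W, from the MAFs
r <- resid(lm(y ~ X - 1))                   # y - mu-hat under the null model
Q <- sum((w * crossprod(G, r))^2)           # (y - mu)' G W^2 G' (y - mu)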

When the adjustment model isn’t ordinary linear regression, there is a second weight matrix, which I’ll write \(\Sigma\), giving the metric that makes \(y\mapsto y-\hat\mu\) the projection orthogonal to the range of \(X\). That is, on the whitened scale, \[\Sigma^{-1/2}\hat\mu = \left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\Sigma^{-1/2}Y\] Note that both \[\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\] and \[I-\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\] are projections.
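
A quick numerical check that the bracketed matrix really is a projection (again a toy sketch; the AR(1) \(\Sigma\) and the triangular rather than symmetric square root are just convenient stand-ins):

N <- 100
X <- cbind(1, rnorm(N))
Sigma <- 0.5^abs(outer(1:N, 1:N, "-"))       # stand-in working covariance
A <- solve(t(chol(Sigma)), X)                # Sigma^{-1/2} X via a triangular root
P <- A %*% solve(crossprod(A), t(A))         # the bracketed matrix above
max(abs(P %*% P - P))                        # ~1e-15: P is idempotent, hence I - P is too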

The matrix whose eigenvalues are needed for SKAT is \(H=P_0^{1/2}KP_0^{1/2}\) (or \(K^{1/2}P_0K^{1/2}\)) where \[P_0=V^{1/2}\left[I-\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\right]V^{1/2}\] is the covariance matrix of the residuals, with \(V=\mathop{var}[Y]\). Usually \(V=\Sigma\), but that’s not necessary.
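
Continuing the toy example with \(V=\Sigma\), we can form \(H\) explicitly (only sensible at small \(N\)) and read off its eigenvalues; the p-value is then the upper tail of the corresponding weighted sum of \(\chi^2_1\) variables, which CompQuadForm::davies() will compute:

Pi0 <- diag(N) - P                           # continuing the check above
G <- matrix(rbinom(N * 10, 2, 0.1), N, 10)
W <- diag(dbeta(colMeans(G) / 2, 1, 25))
K <- G %*% W %*% W %*% t(G)
B <- t(chol(Sigma)) %*% Pi0                  # V^{1/2} Pi_0, a root of P_0
H <- t(B) %*% K %*% B                        # same nonzero eigenvalues as P_0^{1/2} K P_0^{1/2}
lambda <- eigen(H, symmetric = TRUE, only.values = TRUE)$values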

famSKAT has the test statistic \[Q=(y-\hat\mu)'V^{-1}GW^2G'V^{-1}(y-\hat\mu)=(y-\hat\mu)'V^{-1}KV^{-1}(y-\hat\mu)\] so the matrix \(H\) is \(H=P_0^{1/2}V^{-1}KV^{-1}P_0^{1/2}\).
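
In the same toy setup, the famSKAT \(H\) needs only the two extra \(V^{-1}\) factors (a sketch, not famSKAT's own code):

VinvB <- solve(Sigma, B)                     # V^{-1} (V^{1/2} Pi_0)
Hfam <- t(VinvB) %*% K %*% VinvB             # matches P_0^{1/2} V^{-1} K V^{-1} P_0^{1/2}
lambda_fam <- eigen(Hfam, symmetric = TRUE, only.values = TRUE)$values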

When we want to take a square root of \(P_0\) it helps a lot that the central piece is a projection, and so is idempotent: we can define \[\Pi_0=\left[I-\left(\Sigma^{-1/2}X(X^T\Sigma^{-1} X)^{-1}X^T\Sigma^{-1/2}\right)\right]\] and write \(P_0=V^{1/2}\Pi_0V^{1/2}=V^{1/2}\Pi_0\Pi_0V^{1/2}\).
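
Spelling the step out: \[P_0 = V^{1/2}\Pi_0\Pi_0V^{1/2}=\left(V^{1/2}\Pi_0\right)\left(V^{1/2}\Pi_0\right)^T\] so \(B=V^{1/2}\Pi_0\) is a (non-symmetric) square root of \(P_0\) that needs no eigendecomposition, and it loses nothing: the nonzero eigenvalues of \(B^TKB\) are those of \(KBB^T=KP_0\), the same as with the symmetric root.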

Now consider \(\tilde G\), with \(H=\tilde G^T\tilde G\). We can take \(\tilde G = WG'V^{-1}V^{1/2}\Pi_0=WG'V^{-1/2}\Pi_0\), where \(G\) is sparse and \(W\) is diagonal. The projection \(\Pi_0\) was already needed to fit the adjustment model, so multiplying by it will be fast. In family data where \(V=\Sigma\) is based on expected relatedness from a pedigree, the Cholesky square root \(R=V^{1/2}=\Sigma^{1/2}\) will be sparse.
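
To see why that matters, here is a toy pedigree-structured \(\Sigma\) (an illustration using the Matrix package): the Cholesky factor of a block-diagonal kinship-type matrix is just as sparse as the matrix itself.

library(Matrix)
fam <- matrix(c(1, 0.5, 0.5, 1), 2, 2)       # one two-person family block
SigmaPed <- forceSymmetric(bdiag(replicate(50, fam, simplify = FALSE)))
R <- chol(SigmaPed)                          # sparse triangular root of Sigma
c(nnzero(R), prod(dim(R)))                   # 150 nonzeros out of 10000 entries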

Let \(f\) be the size of the largest pedigree. We should still be able to multiply a vector by \(\tilde G\) in \(O(MN\alpha+Nf^2)\) time, where \(\alpha\ll 1\) depends on the sparseness of \(G\). If so, we can compute the leading-\(k\)-eigenvalue approximation in \(O(MNk\alpha+Nkf^2)\) time. (In fact, we can replace \(f^2\) by the average squared number of relatives over everyone in the sample.)
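
Here is a bare-bones version of the leading-eigenvalue idea, written against the mult/tmult interface shown below (a sketch only; a serious implementation would use a Lanczos-type or stochastic SVD routine and return the top \(k\) eigenvalues, not just the first):

## power iteration for the top eigenvalue of H = t(Gtilde) %*% Gtilde,
## touching Gtilde only through the two multiplication closures
topEigenvalue <- function(mult, tmult, n, iters = 100) {
    v <- rnorm(n)
    for (i in seq_len(iters)) {
        v <- mult(tmult(v))                  # one multiplication by H
        v <- v / sqrt(sum(v^2))
    }
    sum(v * mult(tmult(v)))                  # Rayleigh quotient at the fixed point
}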

The relevant bits of the code, the functions that multiply by \(\tilde G\) and \(\tilde G^T\), look like this:

CholSigma <- t(chol(SIGMA))                 # lower-triangular L with L L' = Sigma
Z <- nullmodel$x                            # design matrix of the adjustment model
qr <- qr(as.matrix(solve(CholSigma, Z)))    # QR of L^{-1}Z; qr.resid() then applies Pi_0
## the weights W are taken to be folded into spG here
rval <- list(
    mult = function(X) {                    # X |-> Pi_0 L^{-1} G X, i.e. t(Gtilde) %*% X
        base::qr.resid(qr, as.matrix(solve(CholSigma, spG %*% X)))
    },
    tmult = function(X) {                   # X |-> G' L^{-T} Pi_0 X, i.e. Gtilde %*% X
        crossprod(spG, solve(t(CholSigma), base::qr.resid(qr, X)))
    })
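
And some hypothetical glue to run it end to end on the toy objects from above (bigQF's real null-model object has more pieces than this):

SIGMA <- SigmaPed                           # toy pedigree covariance, kept sparse
nullmodel <- list(x = X)                    # toy adjustment-model design matrix
spG <- Matrix(G %*% W, sparse = TRUE)       # weights folded into the genotypes
## rebuild CholSigma, qr, and rval as above, then:
topEigenvalue(rval$mult, rval$tmult, n = nrow(spG))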
