The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
In this vignette, we briefly illustrate the different ways susie_rss
can be called, and draw connections between running
susie_rss
on summary data, and running susie
on individual-level data.
library(susieR)
Simulate a data set with 200 samples and 1,000 variables, in which the only first 4 variables affect the outcome.
set.seed(1)
<- 200
n <- 1000
p <- rep(0,p)
beta 1:4] <- 1
beta[<- matrix(rnorm(n*p),nrow = n,ncol = p)
X <- scale(X,center = TRUE,scale = FALSE)
X <- drop(X %*% beta + rnorm(n)) y
Compute summary statistics \(\hat{b}_j, \hat{s}_j\) and the correlation matrix, \({\bf R}\). These quantities will be provided as input to susie_rss.
<- univariate_regression(X,y)
ss <- compute_suff_stat(X,y,standardize = FALSE)
dat <- cov2cor(dat$XtX) R
The susie and susie_rss analyses produce the exact same results when
the summary statistics bhat
, shat
,
var_y
and n
are provided to susie_rss (and
when R
is an “in sample” correlation estimate—that is, when
it was computed from the same matrix X
that was used to
obtain the other statistics). If the covariate effects are removed from
the genotypes in univariate regression, the in-sample LD matrix should
compute from the genotype residuals where the covariate effects have
been removed.
<- susie(X,y,L = 10)
res1 <- susie_rss(bhat = ss$betahat,shat = ss$sebetahat,R = R,n = n,
res2 var_y = var(y),L = 10,estimate_residual_variance = TRUE)
# HINT: For estimate_residual_variance = TRUE, please check that R is the "in-sample" LD matrix; that is, the correlation matrix obtained using the exact same data matrix X that was used for the other summary statistics. Also note, when covariates are included in the univariate regressions that produced the summary statistics, also consider removing these effects from X before computing R.
plot(coef(res1),coef(res2),pch = 20,xlab = "susie",ylab = "susie_rss")
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
When some but not all of these statistics are provided, the results should be similar, but not exactly the same.
Next let’s compare the susie and susie_rss outputs when \({\bf X}, y\) are standardized before computing the summary statistics (by “standardize”, we mean that \(y\) and the columns of \({\bf X}\) are each divided by the sample standard deviation so that they each have the same standard deviation).
<- univariate_regression(scale(X),scale(y))
ss <- compute_suff_stat(X,y,standardize = TRUE)
dat <- cov2cor(dat$XtX) R
Then we compute the z-scores:
<- ss$betahat/ss$sebetahat zhat
When standardizing, providing susie_rss with summary data
z
(or bhat
, shat
), R
and n
is sufficient for susie_rss to recover the same
results as susie:
<- susie(scale(X),scale(y),L = 10)
res1 <- susie_rss(bhat = ss$betahat,shat = ss$sebetahat,R = R,n = n,
res2 L = 10,estimate_residual_variance = TRUE)
<- susie_rss(zhat,R,n = n,L = 10,estimate_residual_variance = TRUE)
res3 layout(matrix(1:2,1,2))
plot(coef(res1),coef(res2),pch = 20,xlab = "susie",
ylab = "susie_rss(bhat,shat)")
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
plot(coef(res1),coef(res3),pch = 20,xlab = "susie",ylab = "susie_rss(z)")
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
When the residual variance is not estimated in susie_rss, the susie_rss results may be close to susie, but may no longer be exactly the same:
<- susie_rss(zhat,R,n = n,L = 10)
res4 plot(coef(res1),coef(res4),pch = 20,xlab = "susie",ylab = "susie_rss")
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
Whenever R
is an “in sample” correlation matrix, we
recommend estimating the residual variance.
Without providing the sample size, n
, the coefficients
are interpreted as the “noncentrality parameters” (NCPs), and are
(roughly) related to the susie parameters by a factor of \(\sqrt{n}\):
<- susie_rss(zhat,R,L = 10)
res5 plot(coef(res1),coef(res5)/sqrt(n),pch = 20,xlab = "susie",
ylab = "susie_rss/sqrt(n)")
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
Whenever possible, the sample size, or a reasonable estimate of the sample size, should be provided.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.