1 Introduction

We present ACMEeqtl, a package for estimating cis-eQTL effect sizes using a new model, ACME, that reflects the biological understanding of how cis-eQTLs act. The model combines an additive effect of allele count with a multiplicative random-noise component (hence “ACME”: Additive-Contribution, Multiplicative-Error) and is defined as

\[y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i\]

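where \(y_i\) is the log-normalized expression of the gene in sample \(i\), \(s_i\) is the allele count of the SNP, \(Z_i\) is the vector of covariates, and \(\epsilon_i\) is the random noise term. On the untransformed scale the expression equals \((\beta_0 + \beta_1 s_i)\exp(Z_i^T \gamma + \epsilon_i)\), so the allele effect is additive while the covariates and noise act multiplicatively.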

We estimate the model using a fast iterative algorithm. The algorithm works with an equivalent form of the model that is nonlinear only in \(\eta = \beta_1 / \beta_0\):

\[y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i\]
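
To illustrate why isolating \(\eta\) helps, here is a minimal sketch in base R. It is not the package's algorithm: it swaps in a simple profile least-squares search, using `lm()` for the linear part and `optimize()` for the one-dimensional search over \(\eta\). All names (`profileSSE`, `etaHat`, the simulated data and its parameter values) are assumptions made for this illustration.

```r
# Illustrative profile least squares for the ACME model form
# (a sketch under stated assumptions, not the ACMEeqtl implementation)
set.seed(1)
n = 500
s = rbinom(n, size = 2, prob = 0.3)   # allele counts
z = rnorm(n)                          # one covariate
y = log(100 + 200 * s) + 0.5 * z + rnorm(n, sd = 0.3)   # true eta = 2

# For a fixed eta, y - log(1 + s * eta) is linear in log(beta0) and gamma,
# so ordinary least squares gives the remaining parameters
profileSSE = function(eta) {
    fit = lm(I(y - log(1 + s * eta)) ~ z)
    sum(residuals(fit)^2)
}

# One-dimensional search; the lower bound keeps 1 + 2 * eta positive
opt = optimize(profileSSE, interval = c(-0.49, 50))
etaHat = opt$minimum

# Recover beta0 and beta1 from the linear fit at the selected eta
fit = lm(I(y - log(1 + s * etaHat)) ~ z)
beta0Hat = exp(coef(fit)[["(Intercept)"]])
beta1Hat = etaHat * beta0Hat
```

The point of the sketch is only that the search is one-dimensional once \(\eta\) is separated out; the package's own iterative scheme may differ.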

2 Installation

ACMEeqtl can be installed with the following command.

install.packages('ACMEeqtl')

3 Using the Package

The ACMEeqtl package provides functions for analyzing a single gene-SNP pair and for fast parallel testing of all local gene-SNP pairs.

library(ACMEeqtl)
library(pander)  # provides pander(), used below to print results

3.1 Testing a Single Gene-SNP Pair

First we generate sample gene expression, SNP allele counts, and a set of covariates.

# Model parameters
beta0 = 10000
beta1 = 50000

# Data dimensions
nsample = 1000
ncvrt = 19

# Data generation
# Covariates centered to zero mean
cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt)
cvrt = t(t(cvrt) - colMeans(cvrt))

# Generate SNPs
s = rbinom(n = nsample, size = 2, prob = 0.2)

# Generate log-normalized expression
y = log(beta0 + beta1 * s) + 
    cvrt %*% rnorm(ncvrt) + 
    rnorm(nsample)
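
As a quick check on the simulated values, the expected shift in mean log-expression for samples with one or two alleles, relative to samples with none, is \(\log(1 + \eta s)\) with \(\eta = \beta_1 / \beta_0\). The snippet below only restates the parameters above; the names `eta` and `shift` are introduced here for illustration.

```r
# Parameters copied from the simulation above
beta0 = 10000
beta1 = 50000

eta = beta1 / beta0          # 5
shift = log(1 + eta * 0:2)   # 0, log(6), log(11)
round(shift, 3)              # 0.000 1.792 2.398
```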

We provide two equivalent functions for model estimation.

z1 = effectSizeEstimationR(s, y, cvrt)
## Warning in beta_cur * x: Recycling array of length 1 in array-vector arithmetic is deprecated.
##   Use c() or as.vector() instead.

z2 = effectSizeEstimationC(s, y, cvrt)

pander(rbind(z1,z2))
    beta0  beta1  nits  SSE   SST   F    eta   SE_eta
z1  9196   54704  6     1049  1925  818  5.95  0.483
z2  9196   54704  6     1049  1925  818  5.95  0.483

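Both z1 and z2 show that the estimation finished in 6 iterations, with estimated parameters \(\hat\beta_0 = 9196\), \(\hat\beta_1 = 54704\), and \(\hat\eta = \hat\beta_1 / \hat\beta_0 = 5.95\) (standard error 0.483). The values used in the simulation were \(\beta_0 = 10000\), \(\beta_1 = 50000\), and hence \(\eta = 5\).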
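
The reported columns can be cross-checked by hand. The sketch below recomputes \(\eta\) from the two betas and compares the reported F with a standard one-degree-of-freedom F-statistic. The residual degrees of freedom used here (`nsample - ncvrt - 2`) are an assumption, not something the package documentation in this section confirms; the inputs are the rounded values printed above.

```r
# Rounded values copied from the printed table
beta0 = 9196; beta1 = 54704
SSE = 1049;   SST = 1925
nsample = 1000; ncvrt = 19

# eta is the ratio of the two betas
etaCheck = beta1 / beta0
round(etaCheck, 2)               # 5.95

# Assumed residual degrees of freedom: samples minus covariates,
# intercept, and the SNP effect
dfResid = nsample - ncvrt - 2
Fcheck = (SST - SSE) / (SSE / dfResid)
round(Fcheck, 1)                 # 817.5, close to the reported 818
```

The small gap to the reported F is what one would expect from using rounded SSE and SST.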

3.2 Testing All Local Gene-SNP Pairs

First we generate an eQTL dataset in filematrix format (see the filematrix package).

tempdirectory = tempdir();
#tempdirectory = "~/Desktop/package_tests"
z = create_artificial_data(
    nsample = 100,
    ngene = 500,
    nsnp = 5000,
    ncvrt = 1,
    minMAF = 0.2,
    saveDir = tempdirectory,
    returnData = FALSE,
    savefmat = TRUE,
    savetxt = FALSE,
    verbose = FALSE)

In this example, we use 2 CPU cores (threads) to test all gene-SNP pairs located within 100,000 bp of each other.

multithreadACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    cisdist = 100e+03, 
    threads = 2,
    workdir = paste0(tempdirectory,"/filematrices"),
    verbose = FALSE)

The filematrix ACME now holds the estimates for all local gene-SNP pairs.

fm = fm.open(paste0(tempdirectory,"/filematrices/ACME"))
TenResults = fm[,1:10];
rownames(TenResults) = rownames(fm);
close(fm);

pander(t(TenResults))
geneid  snp_id  beta0  beta1  nits  SSE   SST  F       eta      SE
1       1       98.4   -36.1  7     102   116  13.7    -0.367   0.0507
1       2       83.8   -11.2  7     115   116  1.04    -0.133   0.112
2       10      142    9.41   4     126   127  0.12    0.0662   0.2
2       11      101    66.3   5     117   127  8.24    0.656    0.322
2       12      144    7.02   6     127   127  0.0711  0.0488   0.19
2       13      160    -17.6  5     126   127  0.346   -0.111   0.171
3       20      141    -25.1  5     99.2  102  2.52    -0.178   0.0897
3       21      100    31.5   5     99.4  102  2.33    0.315    0.244
3       22      113    4.64   5     102   102  0.089   0.0412   0.144
3       23      137    -33.5  6     97.3  102  4.46    -0.245   0.085
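
A short sketch of how such a result table might be inspected once it is in memory. It rebuilds the ten printed rows as a data frame (values copied from the output above, only the columns needed), confirms that `eta` matches `beta1 / beta0` up to rounding, and picks the SNP with the largest F per gene. The names `res`, `Fstat`, and `best` are assumptions for this illustration.

```r
# Ten rows copied from the printed output (subset of columns)
res = data.frame(
    geneid = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
    snp_id = c(1, 2, 10, 11, 12, 13, 20, 21, 22, 23),
    beta0  = c(98.4, 83.8, 142, 101, 144, 160, 141, 100, 113, 137),
    beta1  = c(-36.1, -11.2, 9.41, 66.3, 7.02, -17.6, -25.1, 31.5, 4.64, -33.5),
    Fstat  = c(13.7, 1.04, 0.12, 8.24, 0.0711, 0.346, 2.52, 2.33, 0.089, 4.46),
    eta    = c(-0.367, -0.133, 0.0662, 0.656, 0.0488, -0.111, -0.178, 0.315,
               0.0412, -0.245))

# eta should equal beta1 / beta0 up to the rounding of the printed values
maxDiff = max(abs(res$beta1 / res$beta0 - res$eta))

# SNP with the largest F-statistic for each gene
best = do.call(rbind, lapply(split(res, res$geneid),
                             function(d) d[which.max(d$Fstat), ]))
best$snp_id   # 1 11 23
```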

3.3 Testing Multi-SNP Model for All Local Gene-SNP Pairs

Now we can estimate multi-SNP ACME models for each gene:

multisnpACME(
    genefm = 'gene',
    snpsfm = 'snps',
    glocfm = 'gene_loc',
    slocfm = 'snps_loc',
    cvrtfm = 'cvrt',
    acmefm = 'ACME',
    workdir = paste0(tempdirectory, "/filematrices"),
    genecap = Inf,
    verbose = FALSE)

The filematrix ACME_multiSNP now holds the estimates for all multi-SNP models.

fm = fm.open(paste0(tempdirectory,"/filematrices/ACME_multiSNP"))
TenResults = fm[,1:10];
rownames(TenResults) = rownames(fm);
close(fm);

pander(t(TenResults))
geneid  snp_id  beta0  betas  forward_adjR2
1       1       95.2   -34.9  0.114
2       11      98.7   62.7   0.0688
3       23      129    -28.4  0.0341
3       21      129    36.4   0.057
3       20      129    -21    0.0716
4       30      100    30.7   0.0324
4       33      100    -24.1  0.0616
5       41      123    41.7   0.0379
5       40      123    -18.5  0.0407
6       51      104    -26.6  0.0492

Note that each multi-SNP model contains at least one SNP, even if that initial SNP was not significant under the single-SNP models. The initial SNP is the one with the highest adjusted \(R^2\) among the single-SNP models. After the initial SNP, further SNPs are added only if the adjusted \(R^2\) of the combined model is greater than that of the previous combined model.
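
The selection rule just described can be sketched in base R. To keep the example self-contained it uses ordinary linear models via `lm()` in place of ACME fits, so it illustrates the forward-selection logic only and is not the code behind `multisnpACME`. The function name `forwardSelectAdjR2` and the simulated data are assumptions for this illustration.

```r
# Forward selection by adjusted R^2 (linear-model stand-in for ACME fits)
forwardSelectAdjR2 = function(y, X) {
    selected = integer(0)
    bestAdjR2 = -Inf
    repeat {
        candidates = setdiff(seq_len(ncol(X)), selected)
        if (length(candidates) == 0) break
        # Adjusted R^2 of the current model extended by each candidate SNP
        adjR2 = sapply(candidates, function(j) {
            summary(lm(y ~ X[, c(selected, j), drop = FALSE]))$adj.r.squared
        })
        k = which.max(adjR2)
        # The first SNP is always kept; later SNPs must improve adjusted R^2
        if (length(selected) > 0 && adjR2[k] <= bestAdjR2) break
        selected = c(selected, candidates[k])
        bestAdjR2 = adjR2[k]
    }
    list(selected = selected, adjR2 = bestAdjR2)
}

# Simulated example: SNPs 2 and 4 affect the response
set.seed(2)
n = 200
X = matrix(rbinom(n * 5, size = 2, prob = 0.3), n, 5)
y = 1.0 * X[, 2] - 0.8 * X[, 4] + rnorm(n)
fit = forwardSelectAdjR2(y, X)
fit$selected
```

Because the first SNP is accepted unconditionally, the returned set is never empty, which mirrors the behaviour described above.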