The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Coding-variant Allelic Series Test

Updated: 2024-11-20

Data

To run an allelic series test, there are 4 key inputs:

The example data used below were generated using the DGP function provided with the package. The data set includes 100 subjects, 300 variants, and a continuous phenotype. The true effect sizes follow an allelic series, with magnitudes proportional to c(1, 2, 3) for BMVs, DMVs, and PTVs respectively.

set.seed(101)
n <- 100
data <- AllelicSeries::DGP(
  n = n,
  snps = 300,
  beta = c(1, 2, 3) / sqrt(n),
)

# Annotations.
anno <- data$anno
head(anno)
## [1] 1 1 2 2 1 2
# Covariates.
covar <- data$covar
head(covar)
##      int         age sex          pc1        pc2         pc3
## [1,]   1 -0.77465210   1  0.539531581  1.2134116  0.63478280
## [2,]   1 -0.03492052   0  0.009270313  1.5256400 -0.62027221
## [3,]   1 -0.20782643   1 -0.777242512  1.2319750 -0.73311857
## [4,]   1  0.69939529   0 -0.357743663  0.1747664 -0.29754242
## [5,]   1  0.47420055   1 -1.083115262 -1.3005348 -0.68338808
## [6,]   1 -1.15038152   0 -0.539141451 -0.5159589  0.01971706
# Genotypes.
geno <- data$geno
head(geno[,1:5])
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0
## [6,]    0    1    0    0    0
# Phenotype.
pheno <- data$pheno
head(pheno)
## [1] 0.2958143 1.7294107 0.4886318 1.0201730 1.0304183 1.9704552

The example data generated by the preceding are available under vignettes/vignette_data.

Running the alleic series test

The COding-variant Allelic Series Test (COAST) is run using the COAST function. By default, the output of COAST includes a data.frame of counts showing the number of alleles, variants, and carriers in each class that contributed to the test, and a data.frame of p-values, with the omni test denoting the final, overall p-value. Inspection of the component p-values may be useful for determining which model(s) detected evidence of an association. In the present case, the baseline count model provided the greatest power.

results <- AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar
)
show(results)
## Effect Sizes:
##         test beta    se
## 1       base 0.05 0.068
## 2       base 0.06 0.079
## 3       base 0.59 0.175
## 4        ind 0.31 0.256
## 5        ind 0.22 0.216
## 6        ind 0.66 0.218
## 7  max_count 0.06 0.048
## 8    max_ind 0.44 0.130
## 9  sum_count 0.07 0.031
## 10   sum_ind 0.19 0.058
## 
## 
## Counts:
##   anno alleles variants carriers
## 1    1     186      164       84
## 2    2     127      110       74
## 3    3      27       26       23
## 
## 
## P-values:
##           test   type     pval
## 1     baseline burden 6.62e-03
## 2          ind burden 9.11e-03
## 3    max_count burden 2.32e-01
## 4      max_ind burden 8.15e-04
## 5    sum_count burden 1.48e-02
## 6      sum_ind burden 1.09e-03
## 7 allelic_skat   skat 6.83e-02
## 8         omni   omni 4.68e-03

The effect sizes data.frame is accessed via:

results@Betas
##         test       beta         se
## 1       base 0.05309214 0.06836091
## 2       base 0.06427266 0.07917711
## 3       base 0.58612733 0.17472435
## 4        ind 0.30597919 0.25595872
## 5        ind 0.21943023 0.21607933
## 6        ind 0.66475646 0.21829831
## 7  max_count 0.05719340 0.04783470
## 8    max_ind 0.43655519 0.13041078
## 9  sum_count 0.07437087 0.03052869
## 10   sum_ind 0.19038906 0.05828998

the counts data.frame via:

results@Counts
##   anno alleles variants carriers
## 1    1     186      164       84
## 2    2     127      110       74
## 3    3      27       26       23

and the p-values data.frame via:

results@Pvals
##           test   type        pval
## 1     baseline burden 0.006618953
## 2          ind burden 0.009107121
## 3    max_count burden 0.231834498
## 4      max_ind burden 0.000815325
## 5    sum_count burden 0.014846685
## 6      sum_ind burden 0.001089858
## 7 allelic_skat   skat 0.068349669
## 8         omni   omni 0.004683243

Different numbers of annotation categories

COAST was originally intended to operate on the benign missense variants, damaging missense variants, and protein truncating variants within a gene, but it has been generalized to allow for an arbitrary number of discrete annotation categories. The following example simulates and analyzes data with 4 annotation categories. The main difference when analyzing a different number of annotation categories is that the weight vector should be specified, and should have length equal to the number of possible annotation categories. COAST will run, albeit with a warning, if there are possible annotation categories to which no variants are assigned (e.g. a gene contains no PTVs).

withr::local_seed(102)

# Generate data.
n <- 1e2
data <- AllelicSeries::DGP(
  n = n,
  snps = 400,
  beta = c(1, 2, 3, 4) / sqrt(n),
  prop_anno = c(0.4, 0.3, 0.2, 0.1),
  weights = c(1, 1, 1, 1)
)

# Run COAST-SS.
results <- AllelicSeries::COAST(
  anno = data$anno,
  covar = data$covar,
  geno = data$geno,
  pheno = data$pheno,
  weights = c(1, 2, 3, 4)
)
show(results)
## Effect Sizes:
##         test  beta    se
## 1       base  0.01 0.065
## 2       base  0.23 0.062
## 3       base  0.19 0.096
## 4       base  0.61 0.116
## 5        ind -0.10 0.253
## 6        ind  0.53 0.205
## 7        ind  0.31 0.169
## 8        ind  0.96 0.186
## 9  max_count  0.18 0.035
## 10   max_ind  0.47 0.090
## 11 sum_count  0.11 0.018
## 12   sum_ind  0.19 0.035
## 
## 
## Counts:
##   anno alleles variants carriers
## 1    1     194      169       86
## 2    2     159      131       78
## 3    3      71       61       51
## 4    4      41       39       31
## 
## 
## P-values:
##           test   type     pval
## 1     baseline burden 9.67e-10
## 2          ind burden 5.86e-07
## 3    max_count burden 4.26e-07
## 4      max_ind burden 1.65e-07
## 5    sum_count burden 5.37e-10
## 6      sum_ind burden 8.24e-08
## 7 allelic_skat   skat 3.27e-06
## 8         omni   omni 4.11e-09

Test options

AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  apply_int = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  include_orig_skato_all = TRUE,
  include_orig_skato_ptv = TRUE,
  ptv_anno = 3
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = 1 * (pheno > 0),
  covar = covar,
  is_pheno_binary = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  min_mac = 2
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  return_omni_only = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  score_test = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  weights = c(1, 2, 3)
)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.