Getting started with gwasrapidd

The GWAS Catalog

The GWAS Catalog is a service provided by the EMBL-EBI and NHGRI that offers a manually curated and freely available database of published genome-wide association studies (GWAS). The Catalog website and infrastructure are hosted by the EMBL-EBI.

There are three ways to access the Catalog database: through the web interface, by downloading a database dump, or via the REST API.

gwasrapidd facilitates access to the Catalog via the REST API, allowing you to programmatically retrieve data directly into R.

GWAS Catalog Entities

The Catalog REST API is organized around four core entities: studies, associations, variants, and traits. gwasrapidd provides four corresponding functions to get each of the entities: get_studies(), get_associations(), get_variants(), and get_traits().

Each function returns an S4 object of the correspondingly named class: studies, associations, variants, or traits (see Figure 1).

Figure 1 | gwasrapidd retrieval functions.

You can combine several search criteria in each retrieval function, as shown in Figure 2. For example, to get studies matching either of two criteria, a study accession identifier or a variant identifier, you could run the following code:

library(gwasrapidd)
my_studies <- get_studies(study_id = 'GCST000858', variant_id = 'rs12752552')

This command returns all studies that match either 'GCST000858' or 'rs12752552'. This is equivalent to running get_studies() separately on each criterion and combining the results afterwards:

s1 <- get_studies(study_id = 'GCST000858')
s2 <- get_studies(variant_id = 'rs12752552')
my_studies <- union(s1, s2)

All four retrieval functions accept the set_operation parameter, which defines how the results obtained with each criterion are combined. The two options for this parameter are 'union' (default) and 'intersection', resulting in an OR or an AND operation, respectively.
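As a toy illustration of how the two set_operation options differ, the sketch below applies base R's union() and intersect() to two made-up vectors standing in for the results of each criterion. The identifiers (study_A, study_B, study_C) and the variable names are hypothetical and are not real query results; with gwasrapidd the combination is done for you when you pass set_operation to a retrieval function.

```r
# Hypothetical study identifiers matched by each criterion (made up for illustration).
by_study_id   <- c("study_A", "study_B")
by_variant_id <- c("study_B", "study_C")

# 'union' keeps studies matching either criterion (OR).
either <- base::union(by_study_id, by_variant_id)

# 'intersection' keeps only studies matching both criteria (AND).
both <- base::intersect(by_study_id, by_variant_id)

print(either)
print(both)

# The corresponding gwasrapidd call (needs network access) would pass the
# set_operation parameter described above:
# get_studies(study_id = 'GCST000858', variant_id = 'rs12752552',
#             set_operation = 'intersection')
```

Here either holds all three identifiers, whereas both holds only study_B, the one identifier shared by the two criteria.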

Figure 2 | gwasrapidd arguments for retrieval functions. Colors indicate the criteria that can be used for retrieving GWAS Catalog entities: studies (green), associations (red), variants (purple), and traits (orange).

Example 1 | Finding Risk Alleles Associated with Autoimmune Disease

As a first example, take the work by Light et al. (2014). In this work the authors focused on variants that had been previously reported in genome-wide association studies (GWAS) for autoimmune disease.

With gwasrapidd we can query the GWAS Catalog for studies on autoimmune disease (an EFO trait). To do that, let’s first load gwasrapidd:

library(gwasrapidd)

Then query the GWAS Catalog by EFO trait:

my_studies <- get_studies(efo_trait = 'autoimmune disease')

We can now check how many studies we got back:

n(my_studies)
#> [1] 8
my_studies@studies$study_id
#> [1] "GCST003097"   "GCST011008"   "GCST007071"   "GCST009873"   "GCST011005"  
#> [6] "GCST011009"   "GCST90029015" "GCST90029016"

We got only 8 studies: GCST003097, GCST011008, GCST007071, GCST009873, GCST011005, GCST011009, GCST90029015, GCST90029016. Let’s see the associated publication titles:

my_studies@publications$title
#> [1] "Meta-analysis of shared genetic architecture across ten pediatric autoimmune diseases."                                                            
#> [2] "Genetic factors underlying the bidirectional relationship between autoimmune and mental disorders - findings from a Danish population-based study."
#> [3] "Leveraging Polygenic Functional Enrichment to Improve GWAS Power."                                                                                 
#> [4] "Meta-analysis of Immunochip data of four autoimmune diseases reveals novel single-disease and cross-phenotype associations."                       
#> [5] "Genetic factors underlying the bidirectional relationship between autoimmune and mental disorders - findings from a Danish population-based study."
#> [6] "Genetic factors underlying the bidirectional relationship between autoimmune and mental disorders - findings from a Danish population-based study."
#> [7] "Mixed-model association for biobank-scale datasets."                                                                                               
#> [8] "Mixed-model association for biobank-scale datasets."
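Several titles in this output are repeated, because one publication can contribute more than one study. As a small sketch, the base R code below counts the distinct titles; the titles vector is copied by hand from the output above, and the names titles, studies_per_title, and n_distinct are introduced here for illustration only.

```r
# Publication titles copied from the output above (one entry per study).
danish <- "Genetic factors underlying the bidirectional relationship between autoimmune and mental disorders - findings from a Danish population-based study."
biobank <- "Mixed-model association for biobank-scale datasets."
titles <- c(
  "Meta-analysis of shared genetic architecture across ten pediatric autoimmune diseases.",
  danish,
  "Leveraging Polygenic Functional Enrichment to Improve GWAS Power.",
  "Meta-analysis of Immunochip data of four autoimmune diseases reveals novel single-disease and cross-phenotype associations.",
  danish,
  danish,
  biobank,
  biobank
)

# Number of studies per distinct publication title.
studies_per_title <- table(titles)

# Number of distinct publication titles.
n_distinct <- length(unique(titles))
print(n_distinct)

# With the real object you could instead run:
# length(unique(my_studies@publications$title))
```

In this output the 8 studies correspond to 5 distinct publication titles.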

If you want to further inspect these publications, you can quickly browse the respective PubMed entries:

# This opens your web browser at the PubMed entries of these publications,
# e.g. https://www.ncbi.nlm.nih.gov/pubmed/26301688
open_in_pubmed(my_studies@publications$pubmed_id)

To find the variants previously associated with autoimmune disease, as used by Light et al. (2014), we need to retrieve the statistical association information for these variants and then filter them at the same significance level, \(P < 1\times 10^{-6}\) (Light et al. 2014).

So let’s start by getting the associations by study_id:

# You could have also used get_associations(efo_trait = 'autoimmune disease')
my_associations <- get_associations(study_id = my_studies@studies$study_id)

Seemingly, there are 182 associations.

n(my_associations)
#> [1] 182

However, not all associations meet the significance level required by Light et al. (2014):

# Get association ids for which the p-value is less than 1e-6.
association_ids <-
  my_associations@associations %>%
  tidyr::drop_na(pvalue) %>%        # Drop associations with a missing p-value
  dplyr::filter(pvalue < 1e-6) %>%  # Filter by p-value
  dplyr::pull(association_id)       # Extract column association_id
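To make the handling of missing p-values explicit, here is a minimal base R sketch of the same kind of filter on a toy table. The data frame toy, its identifiers, and its p-values are made up for illustration and do not come from the GWAS Catalog. Note that dplyr::filter() already drops rows for which the condition evaluates to NA, whereas plain logical indexing in base R does not, hence the explicit is.na() check.

```r
# Toy associations table; identifiers and p-values are made up for illustration.
toy <- data.frame(
  association_id = c("a1", "a2", "a3", "a4"),
  pvalue = c(3e-9, 5e-4, NA, 9e-7),
  stringsAsFactors = FALSE
)

# Keep rows with a non-missing p-value below the threshold.
keep <- !is.na(toy$pvalue) & toy$pvalue < 1e-6

# Extract the identifiers of the rows that pass the filter.
toy_ids <- toy$association_id[keep]
print(toy_ids)
```

In this toy table only a1 and a4 pass: a2 is above the threshold and a3 has no p-value.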

Here we subset the my_associations object by a vector of association identifiers (association_ids) into a smaller object, my_associations2:

# Extract associations by association id
my_associations2 <- my_associations[association_ids]
n(my_associations2)
#> [1] 180

Of the 182 associations found in the GWAS Catalog, 180 meet the p-value threshold of \(1\times 10^{-6}\). Here are the variants with their respective risk alleles and risk allele frequencies:

my_associations2@risk_alleles[c('variant_id', 'risk_allele', 'risk_frequency')] %>%
  print(n = Inf)
#> # A tibble: 180 × 3
#>     variant_id  risk_allele risk_frequency
#>     <chr>       <chr>                <dbl>
#>   1 rs11580078  G                    0.43 
#>   2 rs6679677   A                    0.09 
#>   3 rs34884278  C                    0.3  
#>   4 rs6689858   C                    0.29 
#>   5 rs2075184   T                    0.23 
#>   6 rs36001488  C                    0.48 
#>   7 rs4676410   A                    0.19 
#>   8 rs4625      G                    0.31 
#>   9 rs62324212  A                    0.42 
#>  10 rs7725052   C                    0.43 
#>  11 rs7731626   A                    0.39 
#>  12 rs4869313   T                    0.42 
#>  13 rs11741255  A                    0.42 
#>  14 rs755374    T                    0.32 
#>  15 rs36051895  T                    0.29 
#>  16 rs4246905   T                    0.28 
#>  17 rs11145763  C                    0.4  
#>  18 rs706778    T                    0.41 
#>  19 rs10822050  C                    0.39 
#>  20 rs1250563   C                    0.29 
#>  21 rs1332099   T                    0.46 
#>  22 rs17885785  T                    0.2  
#>  23 rs17466626  G                    0.02 
#>  24 rs1689510   C                    0.31 
#>  25 rs72743477  G                    0.21 
#>  26 rs12598357  G                    0.39 
#>  27 rs12928404  C                    0.38 
#>  28 rs117372389 T                    0.02 
#>  29 rs12232497  C                    0.45 
#>  30 rs62131887  T                    0.28 
#>  31 rs602662    G                    0.49 
#>  32 rs2738774   A                    0.32 
#>  33 rs2836882   A                    0.27 
#>  34 rs2066363   C                    0.34 
#>  35 rs114846446 A                    0.01 
#>  36 rs7672495   C                    0.18 
#>  37 rs7660520   A                    0.26 
#>  38 rs7831697   G                    0.25 
#>  39 rs7042370   T                    0.43 
#>  40 rs10988542  C                    0.08 
#>  41 rs7100025   G                    0.34 
#>  42 rs77150043  T                    0.23 
#>  43 rs2807264   C                    0.21 
#>  44 rs12863738  T                    0.17 
#>  45 rs556990455 T                    0.016
#>  46 rs4713462   A                    0.29 
#>  47 rs1046080   C                    0.26 
#>  48 rs115884658 A                    0.047
#>  49 rs3132940   T                    0.15 
#>  50 rs114355928 T                    0.018
#>  51 rs1064173   A                    0.25 
#>  52 rs3104394   G                    0.16 
#>  53 rs3957147   T                    0.21 
#>  54 rs2071538   A                    0.24 
#>  55 rs194679    T                    0.047
#>  56 rs381218    T                    0.29 
#>  57 rs10797431  <NA>                NA    
#>  58 rs72920202  <NA>                NA    
#>  59 rs10494079  <NA>                NA    
#>  60 rs2476601   <NA>                NA    
#>  61 rs1800601   <NA>                NA    
#>  62 rs11675342  <NA>                NA    
#>  63 rs1534430   <NA>                NA    
#>  64 rs67927699  <NA>                NA    
#>  65 rs5865      <NA>                NA    
#>  66 rs2075302   <NA>                NA    
#>  67 rs142647938 <NA>                NA    
#>  68 rs10202630  <NA>                NA    
#>  69 rs7568275   <NA>                NA    
#>  70 rs3087243   <NA>                NA    
#>  71 rs145268310 <NA>                NA    
#>  72 rs1921445   <NA>                NA    
#>  73 rs28583049  <NA>                NA    
#>  74 rs1530687   <NA>                NA    
#>  75 rs57791671  <NA>                NA    
#>  76 rs114558062 <NA>                NA    
#>  77 rs2030519   <NA>                NA    
#>  78 rs10937560  <NA>                NA    
#>  79 rs56817615  <NA>                NA    
#>  80 rs7441808   <NA>                NA    
#>  81 rs9683415   <NA>                NA    
#>  82 rs6840978   <NA>                NA    
#>  83 rs7655915   <NA>                NA    
#>  84 rs391851    <NA>                NA    
#>  85 rs114378220 <NA>                NA    
#>  86 rs11746555  <NA>                NA    
#>  87 rs1549922   <NA>                NA    
#>  88 rs9392504   <NA>                NA    
#>  89 rs72928038  <NA>                NA    
#>  90 rs761357    <NA>                NA    
#>  91 rs11757201  <NA>                NA    
#>  92 rs6914622   <NA>                NA    
#>  93 rs9356551   <NA>                NA    
#>  94 rs60600003  <NA>                NA    
#>  95 rs221781    <NA>                NA    
#>  96 rs3807307   <NA>                NA    
#>  97 rs1032129   <NA>                NA    
#>  98 rs11785816  <NA>                NA    
#>  99 rs865488    <NA>                NA    
#> 100 rs7005834   <NA>                NA    
#> 101 rs970987    <NA>                NA    
#> 102 rs1443438   <NA>                NA    
#> 103 rs13299616  <NA>                NA    
#> 104 rs10986284  <NA>                NA    
#> 105 rs706778    <NA>                NA    
#> 106 rs2181622   <NA>                NA    
#> 107 rs71508903  <NA>                NA    
#> 108 rs10748781  <NA>                NA    
#> 109 rs7088058   <NA>                NA    
#> 110 rs1199047   <NA>                NA    
#> 111 rs4409785   <NA>                NA    
#> 112 rs773107    <NA>                NA    
#> 113 rs4761587   <NA>                NA    
#> 114 rs1320344   <NA>                NA    
#> 115 rs7310615   <NA>                NA    
#> 116 rs191252491 <NA>                NA    
#> 117 rs9507287   <NA>                NA    
#> 118 rs76428106  <NA>                NA    
#> 119 rs2093816   <NA>                NA    
#> 120 rs9591325   <NA>                NA    
#> 121 rs55984493  <NA>                NA    
#> 122 rs11622435  <NA>                NA    
#> 123 rs10444776  <NA>                NA    
#> 124 rs11073337  <NA>                NA    
#> 125 rs8061370   <NA>                NA    
#> 126 rs78534766  <NA>                NA    
#> 127 rs11117433  <NA>                NA    
#> 128 rs35776863  <NA>                NA    
#> 129 rs13380830  <NA>                NA    
#> 130 rs73316435  <NA>                NA    
#> 131 rs1893217   <NA>                NA    
#> 132 rs1790588   <NA>                NA    
#> 133 rs10425559  <NA>                NA    
#> 134 rs34536443  <NA>                NA    
#> 135 rs11086102  <NA>                NA    
#> 136 rs12980063  <NA>                NA    
#> 137 rs240753    <NA>                NA    
#> 138 rs3765209   <NA>                NA    
#> 139 rs12482947  <NA>                NA    
#> 140 rs5754100   <NA>                NA    
#> 141 rs229541    <NA>                NA    
#> 142 rs6664969   A                   NA    
#> 143 rs1748041   C                   NA    
#> 144 rs2476601   A                   NA    
#> 145 rs1217403   C                   NA    
#> 146 rs10912267  A                   NA    
#> 147 rs13415465  G                   NA    
#> 148 rs12619531  G                   NA    
#> 149 rs10931468  A                   NA    
#> 150 rs6749371   T                   NA    
#> 151 rs7574865   T                   NA    
#> 152 rs7426056   A                   NA    
#> 153 rs3087243   A                   NA    
#> 154 rs35677470  A                   NA    
#> 155 rs17753641  G                   NA    
#> 156 rs16878091  A                   NA    
#> 157 rs1422673   T                   NA    
#> 158 rs72928038  A                   NA    
#> 159 rs11757201  C                   NA    
#> 160 rs58721818  T                   NA    
#> 161 rs212407    G                   NA    
#> 162 rs60600003  G                   NA    
#> 163 rs7780389   T                   NA    
#> 164 rs4731532   A                   NA    
#> 165 rs2812378   G                   NA    
#> 166 rs3118470   C                   NA    
#> 167 rs72776098  A                   NA    
#> 168 rs947474    G                   NA    
#> 169 rs3802604   G                   NA    
#> 170 rs1250568   C                   NA    
#> 171 rs10892299  T                   NA    
#> 172 rs11171739  C                   NA    
#> 173 rs8043085   T                   NA    
#> 174 rs34593439  A                   NA    
#> 175 rs1054609   C                   NA    
#> 176 rs2542148   C                   NA    
#> 177 rs74956615  A                   NA    
#> 178 rs1893592   C                   NA    
#> 179 rs66534072  G                   NA    
#> 180 rs189071385 <NA>                NA
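Many rows in this output lack a risk allele, a risk frequency, or both. As a hedged sketch of how such rows could be set aside, the base R code below keeps only complete rows of a small table; the four rows are copied from the output above, while the names risk, complete, and n_missing_allele are introduced here for illustration.

```r
# Four rows copied from the output above: two complete, one with no risk
# allele or frequency, and one with a risk allele but no frequency.
risk <- data.frame(
  variant_id = c("rs11580078", "rs6679677", "rs10797431", "rs6664969"),
  risk_allele = c("G", "A", NA, "A"),
  risk_frequency = c(0.43, 0.09, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only rows where every column is reported.
complete <- risk[complete.cases(risk), ]
print(complete)

# Count rows with a missing risk allele.
n_missing_allele <- sum(is.na(risk$risk_allele))
print(n_missing_allele)
```

Whether to drop incomplete rows depends on the analysis; a variant with an unreported risk allele may still be of interest for its position alone.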

References

Light, Nicholas, Véronique Adoue, Bing Ge, Shu-Huang Chen, Tony Kwan, and Tomi Pastinen. 2014. “Interrogation of Allelic Chromatin States in Human Cells by High-Density ChIP-Genotyping.” Epigenetics 9 (9): 1238–51.