xGScoreAdv    R Documentation

Function to calculate per base scores given a list of genomic regions in terms of overlaps with genomic annotations

Description

xGScoreAdv calculates per base scores for an input list of genomic regions (genome build hg19), using genomic annotations (e.g. genomic segments, active chromatin, transcription factor binding sites/motifs, conserved sites). The per base scores are calculated for overlaps with each genomic annotation. Scores for genomic regions/variants can be measures of constraint/conservation or of impact/pathogenicity.

Usage

xGScoreAdv(data, format = c("data.frame", "bed", "chr:start-end",
"GRanges"),
build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
GS.annotation = c("fitCons", "phastCons", "phyloP", "mcap", "cadd"),
GR.annotation = c(NA, "Uniform_TFBS", "ENCODE_TFBS_ClusteredV3",
"ENCODE_TFBS_ClusteredV3_CellTypes", "Uniform_DNaseI_HS",
"ENCODE_DNaseI_ClusteredV3", "ENCODE_DNaseI_ClusteredV3_CellTypes",
"Broad_Histone", "SYDH_Histone", "UW_Histone", "FANTOM5_Enhancer_Cell",
"FANTOM5_Enhancer_Tissue", "FANTOM5_Enhancer_Extensive",
"FANTOM5_Enhancer",
"Segment_Combined_Gm12878", "Segment_Combined_H1hesc",
"Segment_Combined_Helas3", "Segment_Combined_Hepg2",
"Segment_Combined_Huvec",
"Segment_Combined_K562", "TFBS_Conserved", "TS_miRNA", "TCGA",
"ReMap_Public_TFBS", "ReMap_Public_mergedTFBS",
"ReMap_PublicAndEncode_mergedTFBS", "ReMap_Encode_TFBS",
"Blueprint_BoneMarrow_Histone", "Blueprint_CellLine_Histone",
"Blueprint_CordBlood_Histone", "Blueprint_Thymus_Histone",
"Blueprint_VenousBlood_Histone", "Blueprint_DNaseI",
"Blueprint_Methylation_hyper", "Blueprint_Methylation_hypo",
"EpigenomeAtlas_15Segments_E029", "EpigenomeAtlas_15Segments_E030",
"EpigenomeAtlas_15Segments_E031", "EpigenomeAtlas_15Segments_E032",
"EpigenomeAtlas_15Segments_E033", "EpigenomeAtlas_15Segments_E034",
"EpigenomeAtlas_15Segments_E035", "EpigenomeAtlas_15Segments_E036",
"EpigenomeAtlas_15Segments_E037", "EpigenomeAtlas_15Segments_E038",
"EpigenomeAtlas_15Segments_E039", "EpigenomeAtlas_15Segments_E040",
"EpigenomeAtlas_15Segments_E041", "EpigenomeAtlas_15Segments_E042",
"EpigenomeAtlas_15Segments_E043", "EpigenomeAtlas_15Segments_E044",
"EpigenomeAtlas_15Segments_E045", "EpigenomeAtlas_15Segments_E046",
"EpigenomeAtlas_15Segments_E047", "EpigenomeAtlas_15Segments_E048",
"EpigenomeAtlas_15Segments_E050", "EpigenomeAtlas_15Segments_E051",
"EpigenomeAtlas_15Segments_E062", "CpG_anno", "Genic_anno"),
details = F, verbose = T,
RData.location = "http://galahad.well.ox.ac.uk/bigdata_dev")

Arguments

data

input genomic regions (GR). If formatted as "chr:start-end" (see the next parameter 'format' below), GR should be provided as a vector in the format of 'chrN:start-end', where N is either 1-22 or X, and start (or end) is a genomic position; for example, 'chr1:13-20'. If formatted as a 'data.frame', the first three columns correspond to the chromosome (1st column), the starting chromosome position (2nd column), and the ending chromosome position (3rd column). If the format is indicated as 'bed' (browser extensible data), the layout is the same as for the 'data.frame' format, but the position is a 0-based offset from the chromosome position. If the genomic regions provided are single positions rather than ranges, the ending chromosome position (3rd column) may be omitted. The data can also be a 'GRanges' object (in this case, use the format 'GRanges')
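
As a rough illustration of the input layouts described above, the sketch below builds the same regions as a 'chr:start-end' vector, as a 'data.frame', and as a BED-style data frame. It uses base R only. The helper name parse_regions is hypothetical and not part of XGR, and the BED conversion assumes the standard BED convention (0-based start, end unchanged).

```r
# Hypothetical helper (not part of XGR): split 'chrN:start-end' strings
# into a three-column data frame (chromosome, start, end).
parse_regions <- function(x) {
  chr <- sub(":.*$", "", x)                      # text before the colon
  pos <- sub("^[^:]*:", "", x)                   # text after the colon
  parts <- strsplit(pos, "-", fixed = TRUE)      # list of c(start, end)
  start <- as.integer(vapply(parts, function(p) p[1], character(1)))
  end <- as.integer(vapply(parts, function(p) p[2], character(1)))
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# format = "chr:start-end": a character vector
regions_chr <- c("chr1:13-20", "chr2:500-650", "chrX:1000-1200")

# format = "data.frame": chromosome, start and end in the first three columns
regions_df <- parse_regions(regions_chr)

# format = "bed": same layout, but with a 0-based start
# (assumption: standard BED convention, end left unchanged)
regions_bed <- regions_df
regions_bed$start <- regions_bed$start - 1L
```

Each object would then be passed as 'data' together with the matching 'format' value.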

format

the format of the input data. It can be one of "data.frame", "chr:start-end", "bed" or "GRanges"

build.conversion

the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default, it is NA (no conversion is performed)

GS.annotation

which genomic score (GS) annotations to use. It can be 'fitCons' (the probability of fitness consequences for point mutations; http://www.ncbi.nlm.nih.gov/pubmed/25599402), 'phastCons' (the probability that each nucleotide belongs to a conserved element, i.e. is under negative selection; range [0,1]), 'phyloP' (conservation at individual sites, expressed as -log p-values under a null hypothesis of neutral evolution; positive scores indicate conservation and negative scores indicate acceleration), 'mcap' (for eliminating a majority of variants of uncertain significance in clinical exomes at high sensitivity; http://www.ncbi.nlm.nih.gov/pubmed/27776117), or 'cadd' (combined annotation dependent depletion, for estimating relative levels of pathogenicity of potential human variants; http://www.ncbi.nlm.nih.gov/pubmed/24487276)
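
To make the 'phyloP' description more concrete, the sketch below converts a few scores back into p-values and a direction label. It assumes the scores are -log10 p-values (the usual phyloP convention). The function name phylop_to_p and the example scores are made up for illustration and are not part of XGR.

```r
# Hypothetical helper (not part of XGR): interpret phyloP scores,
# assuming each score is a signed -log10 p-value.
phylop_to_p <- function(score) {
  data.frame(
    score = score,
    # the magnitude encodes the p-value under neutral evolution
    p_value = 10^(-abs(score)),
    # the sign encodes the direction of departure from neutrality
    direction = ifelse(score > 0, "conserved",
                ifelse(score < 0, "accelerated", "neutral")),
    stringsAsFactors = FALSE
  )
}

# Illustrative scores only, not real data
res <- phylop_to_p(c(2, -3, 0))
print(res)
```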

GR.annotation

the genomic regions of annotation data. By default, it is 'NA' to disable this option. Pre-built genomic annotation data are detailed in the section 'Note'. Beyond pre-built annotation data, the user can supply customised input. To do so, first save your RData file (a list of GR objects, each being a GR object corresponding to an annotation) on your local computer. Then set "GR.annotation" to your RData file name (with or without extension), and specify the path to your RData file in "RData.location". Note: you can also load your customised GR object directly
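
The customised-input steps above might be sketched as follows. This is only an illustration: the annotation names, coordinates and the file name my_anno are invented, the sketch assumes the GenomicRanges and IRanges packages are installed, and it assumes that the saved object should carry the same name as the RData file. The final call is left commented out because it needs XGR and access to the score data.

```r
# Only run if the Bioconductor packages are available
if (requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {

  # A list of GR objects, one per customised annotation (made-up regions)
  my_anno <- list(
    RegionSetA = GenomicRanges::GRanges(
      seqnames = c("chr1", "chr1"),
      ranges = IRanges::IRanges(start = c(100, 500), end = c(200, 650))
    ),
    RegionSetB = GenomicRanges::GRanges(
      seqnames = "chr2",
      ranges = IRanges::IRanges(start = 1000, end = 1500)
    )
  )

  # Save the list to a local RData file (here in a temporary directory)
  out_dir <- tempdir()
  save(my_anno, file = file.path(out_dir, "my_anno.RData"))

  # The call would then look roughly like this (not run here):
  # res_df <- xGScoreAdv(data = data, format = "GRanges",
  #     GS.annotation = "fitCons", GR.annotation = "my_anno",
  #     RData.location = out_dir)
}
```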

details

logical to indicate whether the detailed information (i.e. ratio) is returned. By default, it is set to FALSE (not included)

verbose

logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE (messages are displayed), as shown in 'Usage'

RData.location

a character string specifying the location of built-in RData files. See xRDataLoader for details

Value

a data frame with 6 columns:

Note

The genomic annotation data are described below according to the data sources and data types.
1. ENCODE Transcription Factor ChIP-seq data

2. ENCODE DNaseI Hypersensitivity site data

3. ENCODE Histone Modification ChIP-seq data from different sources

4. FANTOM5 expressed enhancer atlas

5. ENCODE combined (ChromHMM and Segway) Genome Segmentation data

6. Conserved TFBS

7. TargetScan miRNA regulatory sites

8. TCGA exome mutation data

9. ReMap integration of transcription factor ChIP-seq data (publicly available and ENCODE)

10. BLUEPRINT Histone Modification ChIP-seq data

11. BLUEPRINT DNaseI Hypersensitivity site data

12. BLUEPRINT DNA Methylation data

13. Roadmap Epigenomics Core 15-state Genome Segmentation data for primary cells (blood and T cells)

14. Roadmap Epigenomics Core 15-state Genome Segmentation data for primary cells (HSC and B cells)

15. CpG annotation

16. Genic annotation

See Also

xGScore

Examples

## Not run: 
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata_dev"

# a) provide the genomic regions
## load ImmunoBase
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase',
RData.location=RData.location)
## get lead SNPs reported in AS GWAS
data <- ImmunoBase$AS$variant

# b) in terms of overlaps with genomic segments (Primary monocytes from peripheral blood)
## fitness consequence score 
res_df <- xGScoreAdv(data=data, format="GRanges",
GS.annotation="fitCons",
GR.annotation="EpigenomeAtlas_15Segments_E029",
RData.location=RData.location)
## phastCons conservation score 
res_df <- xGScoreAdv(data=data, format="GRanges",
GS.annotation="phastCons",
GR.annotation="EpigenomeAtlas_15Segments_E029",
RData.location=RData.location)

# c) in terms of overlaps with genic annotations
## phyloP conservation score 
res_df <- xGScoreAdv(data=data, format="GRanges",
GS.annotation="phyloP", GR.annotation="Genic_anno",
RData.location=RData.location)

## End(Not run)