The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

package-ApplyPolygenicScore

Description
Installation
Getting Started
Resources
Contributors
License

Description

This R package provides a set of utilities to simply and transparently parse genotype/dosage data from an input VCF, match genotype coordinates to the component SNPs of an existing polygenic score model, and apply SNP weights to dosages to calculate a polygenic score for each individual in accordance with the additive weighted sum of dosages method.

Installation

To install the last release on CRAN:

# In an R session
install.packages("ApplyPolygenicScore")

To install the latest development version from GitHub:

# In an R session
# install.packages("devtools")

devtools::install_github("uclahs-cds/package-ApplyPolygenicScore")

# To access vignettes, make sure to add the `build_vignettes` argument:

devtools::install_github("uclahs-cds/package-ApplyPolygenicScore", build_vignettes = TRUE)

Getting Started

This package is a fantastic resource for people in the following scenario: you have just received some genetic data from a DNA sequencing or genotyping experiment, or perhaps you have just downloaded one such dataset from a public repository. You noticed that a research group or genetic consortium has recently published a new polygenic score for the trait you are interested in studying. You would now like to apply that PGS to your data. ApplyPolygenicScore provides all the functions you need to import all required input data into R, perform the necessary calculations, and transition smoothly into analysis.

Below is an overview of the data you will require to get started. For even more details, check out our discussion on What is a PGS?.

Input Data

You will need only two pieces of data to get started: 1. A VCF file: Genotype data of the individuals upon which you wish to apply a polygenic score. 2. A PGS weight file: Coordinates of each SNP that compose the polygenic score you with to apply, a SNP ID, and their associated weights.

Genotype data

Genotype data should be provided in the form of a VCF (Variant Call Format) file.
Others have done a great job of describing Variant Call Format. For those with a basic understanding of genetic nomenclature, we recommend the GATK resource.
For those who need a refresher on genomics and genomic data, we recommend starting with the fact sheets curated by the National Human Genome Research Institute (NHGRI).

If you wish to apply a PGS to a cohort, we recommend that genotypes for the whole cohort be aggregated in one VCF file, either through a regenotyping process, or through VCF merging with an external tool designed for manipulating VCF files. VCF files can be very large, causing memory-related complications in the R environment. To reduce memory usage and improve speed of PGS application, we recommend pre-filtering the input VCF for only the coordinates that compose the PGS you wish to apply. This action can be performed using a genomic coordinate file in BED format and tools such as bcftools or bedtools. To facilitate this process, ApplyPolygenicScore provides a function that outputs a BED-formatted file containing genomic coordinates for any number of PGS weight files provided as input.

PGS weight file

The PGS weight file describes a PGS by providing a list of component SNPs, their genomic coordinates, and their respective weights.
The PGS Catalog is a public database of PGSs and their weight files, and a great first stop for acquiring a PGS weight file.
The functions of ApplyPolygenicScore have been designed to operate on weight files that have been formatted according to the standards established by The PGS Catalog. These are very well documented here.
You could easily create your own compatible PGS weight file, simply by formatting all required columns by Catalog standards.
When in doubt, use our check.pgs.weight.columns() function to make sure any data table you import into R contains the required columns for downstream functions.
Required columns in text files prior to importation are: chr_name, chr_position, effect_allele, effect_weight
Optional columns that can be used in certain functions are rsID, other_allele, and allelefrequency_effect.
The following PGS Catalog column names are converted upon importation to column names used by VCF files to facilitate PGS/VCF matching

input to `import.pgs.weight.file`	output of `import.pgs.weight.file`
`rsID`	`ID`
`chr_name`	`CHROM`
`chr_position`	`POS`

Recommended Workflow

Convert PGS weight files to BED-formatted coordinate files.

We recommend starting by filtering your input VCF for just the variants in your PGS weight files. Several software tools are available to do this, and most all require a coordinate file in BED format. A description of BED format can be found here.

The function import.pgs.weight.file can be used to import your PGS weight files into R. The functions convert.pgs.to.bed and combine.pgs.bed can be used to make the conversion, and merge several BED dataframes into one, respectively.
Import your VCF file.

Once you have filtered down your VCF, simply import it into R using import.vcf. This function is a wrapper of vcfR::vcfR2tidy that ensures all required fields are imported.
Apply your PGS.

Provide your imported VCF and PGS weight files to apply.polygenic.score. It’s as simple as that. Under the hood, this function begins by calling combine.vcf.with.pgs. The merge function also outputs a list of variants in your PGS that could not be found in your VCF data, which you can obtain by calling the function independently. apply.polygenic.score outputs lots of useful information along with the score and provides various customizeable options, such as methods for handling missing sites (see this discussion for more) and basic analyses with phenotype data.
Create summary plots.

ApplyPolygenicScore comes with several plotting functions designed to operate on the results of apply.polygenic.score. Display PGS density curves with create.pgs.density.plot, distributions with create.pgs.boxplot and PGS percentile ranks with create.pgs.rank.plot. If you provided phenotype data in step 3, you can incorporate categorical data into the density plots and categorical and continuous phenotype data into the rank plots, and use create.pgs.with.continuous.phenotype.plot to make scatterplots of your PGS against any continuous phenotype data. For more sophisticated downstream analysis, check how well the PGS predicts binary outcomes using analyze.pgs.binary.predictiveness.

For more step-by-step instructions, check out our vignettes.

Resources

This package is hosted on CRAN. The manual of all functions and the User Guide vignette can be accessed on the ApplyPolygenicScore CRAN page.

If you have installed the package from GitHub with build_vignettes = TRUE, you may view the vignette by running the following:

vignette('UserGuide', package = 'ApplyPolygenicScore')

Or by simply opening the rendered file that will be automatically written to the doc folder in your local package directory.

View function-specific documentation using ?:

?apply.polygenic.score

Getting Help

Looking for guidance or support with ApplyPolygenicScore? Check out our Discussions page.

Submit bugs, suggest new features or see current works at our Issues page.

Contributors

For lists of contributors please visit here at GitHub.

License

Author: Nicole Zeltser(nzeltser@mednet.ucla.edu)

package-ApplyPolygenicScore is licensed under the GNU General Public License version 2. See the file LICENSE.md for the terms of the GNU GPL license.

A package that provides utilities for the application of an existing polygenic score to a VCF.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.