This vignette introduces the GWASinspector package, its general form and how to run the algorithm on multiple GWAS files:
Check our website for further information and reference data sets.
The easiest way to get GWASinspector is to install it from CRAN:
Alternatively, you can use the installation function and zipped package from our website:
This package includes functions for checking prerequisite libraries on the local machine, setting up QC parameters and running the QC on GWAS result files.
system.check()
Some R packages or system functions are not necessary for running the algorithm; but, it is suggested to install them to benefit full functionality. Using this function to get a brief information about whether these utilities are present and accessible. After running the command you will see a table indicating package names and their version.
inspect()
Main function for running the algorithm on GWAS data files.
check.database()
Displays the summary of a reference database, including how many tables are in the database file, number of data rows for each data table and the first row of each table
find.variants()
Search for a list of variants in the reference panel.
get.config()
Save a sample configuration file on your computer.
get.headerTranslation()
Save a sample column header translation table file.
compare.GWASs()
Compares result files from different analyses. So, there is no need to re-run the analysis on a result file again.
man.plot()
Generates the Manhattan plot from a result file. This function has many features that are described in the paclage tutorial.
inspect.example()
This function runs the algorithm on a fabricated GWAS result file. User should only set the output folder for saving the generated files. The input file and reference dataset are embedded in the package.
Comparing result files with an standard reference panel is the most important part of the QC process. This reference is used to check the alleles in the datasets and to ensure they are all in the same configuration (same strand, same coded alleles) in the post-QC data.
We have created databases from the most popular refernece panel (e.g. HapMap, 1000G, HRC) which are available from our website. Database files are in SQLite format and can be downloaded as a compressed file.
Some reference panles include more than one population which should be set as a parameter in the configuration file before running the algorithm.
Column names in the input files might be different among cohorts. For example, CODED_ALLELE and EFFECT_ALLELE have the same defintion and both should be converted to the same name when compared together. The simplest way is to use a table for translate popular column names to a standard name for consistency of the output files.
A sample file including common terms is provided as part of the package and could be used as a template. The file contains a two-column table, with the left column containing the standard column-names and the right the alternatives.
INI file format is used for configuring the desired parameters and settings for running the algorithm and has three componenets:
Key-names and section-names should not be edited or renamed. Otherwise the algorithm will not work properly.
A sample file is included in the package which should be used as a template. File paths and QC parameters are are set according to comments and examples in the file.
This walkthrough explains how to run QC on a sample result file.
After installation, try loading the package with the following command. You might see some warnings but the package should load successfully without any errors.
require(GWASinspector)
local machine and R environment can be explored by running the following function.
system.check()
This provides information about existing and missing libraries. There are some optional libraries that can be used for better user experience. Refer to the manual for further information.
Pandoc library is required for generating HTML reports. This is an optional feature and full report will always be available in text format.
Java library is required for installing kableExtra and xlsx packages.
kableExtra package provides customized tables in HTML report file. The HTML output file will have a simple look if this package is not available.
xlsx package is required for Excel report files. This is an optional feature and full report will always be available in text format.
Standard allele-frequency reference datasets are available from our website. This file should be decompressed and copied in the references folder (dir_references
parameter of the config file).
A copy of this file can be copied to a local folder by running this command. This is a text file which includes most common variable/header names and can be edited according to user specifications. This file should be copied in the references folder (dir_references
parameter of the config file).
The default name of this file is alt_headers.txt. header_translations
field should be edited in the configuration file accrodingly if this name is changed by user.
Notice: Duplicated entries will stop the algorihtm.
This is a text file and is used for configuring the desired parameters and settings for running the algorithm. A sample file can be copied to local folder by running the following command (config.ini).
The default name of this file is config.ini. This can be changed by user.
Please refer to the configuration file or package manual for full detail of parameters.
QC functions starts with the following command. Please refer to the package manual for full detail of parameters.
inspect('/path/to/configuration/file')
All result files and reports are saved in the output folder. An exhaustive log file indicates all steps and can be used for localization of any problems during this run.