The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Table of Contents
Seqminer is a highly efficient R-package for retrieving sequence variants from biobank scale datasets of millions of individuals and billions of genetic variants. It supports all variant types, including multi-allelic variants and imputation dosages. It takes VCF/BCF/BGEN/PLINK format as input file, indexes, queries them based upon variant-based index and loads them as R data types such as list or matrix.
Install the development version (devtools package is required):
devtools::install_github("zhanxw/seqminer")
Here are some examples of how to use seqminer to index and query files in real-life scenarios.
library(seqminer)
bcf.ref.file <- "input.bcf"
bcf.idx.file <- "input.bcf.scIdx"
out <- seqminer::createSingleChromosomeBCFIndex(bcf.ref.file, bcf.idx.file)
or
vcf.ref.file <- "input.vcf.gz"
vcf.idx.file <- "input.vcf.gz.scIdx"
out <- seqminer::createSingleChromosomeVCFIndex(vcf.ref.file, vcf.idx.file)
This would generate variant-based index that works with commonly used sequence variant file format, such as VCF/BCF files.
Query VCF file:
vcf.ref.file <- "input.vcf.gz"
vcf.idx.file <- "input.vcf.gz.scIdx"
tabix.range <- "1:123-1234"
geno <- seqminer::readSingleChromosomeVCFToMatrixByRange(vcf.ref.file, tabix.range, vcf.idx.file)
Query BCF file:
bcf.ref.file <- "input.bcf"
bcf.idx.file <- "input.bcf.scIdx"
tabix.range <- "1:123-1234"
geno <- seqminer::readSingleChromosomeBCFToMatrixByRange(bcf.ref.file, tabix.range, bcf.idx.file)
Querying multiple regions is also doable, simply specify multiple
regions and separte them by a comma,
e.g. "1:123-124,1:1234-1235"
.
Output example (column represents variants, row represents individuals):
Query BGEN file:
bg.ref.file <- "input.bgen"
bg.range <- "1:123-1234"
geno.mat <- seqminer::readBGENToMatrixByRange(bg.ref.file, bg.range)
geno.list <- seqminer::readBGENToListByRange(bg.ref.file, bg.range)
Make sure that bgen file has an index file *.bgi
in the
same folder.
Query PLINK file:
plink.ref.file <- "input"
geno <- seqminer::readPlinkToMatrixByIndex(plink.ref.file, sampleIndex=1:20000, markerIndex=1:100)
We also developed a seqminer command line interface:
./queryVCFIndex.intel input.vcf.gz input.vcf.gz.scIdx 1:123-1234
Citation:
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.