
PONG2 Imputation Workflow

Norman Lab

Overview

This vignette provides a complete, step-by-step guide to performing KIR allele imputation using the impute command in PONG2.

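The workflow covers preparing chromosome 19 input data, running basic imputation, checking SNP overlap with the reference panel, pre-imputation for low-overlap data, and interpreting the output.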

Prerequisites

Requirement      Version    Notes
PLINK2           ≥ 2.0      Must be in PATH
R                ≥ 4.0      With PONG2 installed
minimac4         ≥ 4.1.6    Only for --fill-missing
Eagle2           ≥ 2.4      Only for pre-phasing
bgzip & tabix    HTSlib     Only for --fill-missing
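
Before starting, it can help to confirm that the tools listed above are reachable on PATH. The following is a minimal sketch, not part of PONG2: the function name missing_tools is hypothetical, and the executable names in the usage comments (plink2, Rscript, eagle, minimac4, bgzip, tabix) are assumptions about how the tools are installed on your system.

```shell
# Print the name of every requested tool that is not found on PATH.
# Prints nothing when all requested tools are available.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Usage (executable names are assumptions, adjust to your installation):
#   missing_tools plink2 Rscript                  # always needed
#   missing_tools eagle minimac4 bgzip tabix      # pre-phasing and --fill-missing
```

An empty result means every tool was found. This checks presence only, not the minimum versions in the table.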

Step 1: Prepare Input Data

PONG2 works best when input files are restricted to chromosome 19 (covering the KIR locus). Extract chr19 from your full-genome PLINK files:

plink2 \
  --bfile your_full_genome_prefix \
  --chr 19 \
  --make-bed \
  --out chr19_only

This creates chr19_only.bed, chr19_only.bim, and chr19_only.fam.
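
A quick sanity check on the extracted fileset can catch problems early. The sketch below assumes the standard PLINK .bim layout (chromosome code in column 1); the function names fileset_complete and bim_chromosomes are hypothetical helpers, not PONG2 or PLINK commands.

```shell
# Return 0 when <prefix>.bed, <prefix>.bim and <prefix>.fam all exist,
# otherwise report the first missing file on stderr and return 1.
fileset_complete() (
  prefix=$1
  for ext in bed bim fam; do
    [ -f "$prefix.$ext" ] || { printf 'missing: %s.%s\n' "$prefix" "$ext" >&2; exit 1; }
  done
)

# Count variants per chromosome code in a .bim file (column 1).
# After a chr19-only extraction a single output line is expected.
bim_chromosomes() (
  awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort
)

# Usage:
#   fileset_complete chr19_only && bim_chromosomes chr19_only.bim
```

Depending on how the original data were encoded, the chromosome code may appear as 19 or chr19.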


Step 2: Run Basic PONG2 Imputation

# --filter can be 0.005 or 0.01
# 0.005 allows more rare KIR alleles in the output
pong2 impute \
  -i chr19_only \
  -o results/basic \
  -l KIR3DL1 \
  -a hg38 \
  -t 16 \
  --filter 0.005

PONG2 will automatically check the SNP overlap between your data and the 1KGP reference panel in the KIR region and report the match rate.


Step 3: Check SNP Overlap

NOTE: KIR Region SNP Overlap between input data and 1KGP

Overlap rate is computed between your input data and the 1000 Genomes Project (1KGP) reference panel in the KIR region:

Assembly    KIR Region Coordinates
hg19        chr19:55,000,000–55,400,000
hg38        chr19:54,000,000–55,000,000

Overlap Rate    Status    Action
≥ 50%           Pass      Proceed with PONG2 directly
< 50%           Fail      Run Eagle2 + pre-imputation first

If your match rate is sufficient (≥ 50%), PONG2 will proceed automatically. If not, use one of the pre-imputation strategies below.
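
PONG2 computes the overlap internally, and its exact matching rule is not described in this vignette. As a rough illustration only, the sketch below assumes that the overlap rate is the share of reference-panel positions inside the KIR region that are also present in the input data, and that variants are matched by base-pair position alone. The function names and the reference .bim file are hypothetical.

```shell
# Percentage of reference positions within [start, end] that also occur in
# the input data. Both files are PLINK .bim files (bp coordinate in column 4)
# and are assumed to be non-empty.
# ASSUMPTION: matching by position only; PONG2's own rule may differ.
overlap_rate() (
  input_bim=$1 ref_bim=$2 start=$3 end=$4
  awk -v s="$start" -v e="$end" '
    FNR == 1 { file++ }                      # track which file is being read
    $4 < s || $4 > e { next }                # ignore positions outside the region
    file == 1 { seen[$4] = 1; next }         # first file: input positions
    { total++; if ($4 in seen) hit++ }       # second file: reference positions
    END { if (total > 0) printf "%.1f\n", 100 * hit / total; else print "0.0" }
  ' "$input_bim" "$ref_bim"
)

# Map a percentage to the Pass/Fail status used in the table (threshold 50%).
overlap_status() (
  awk -v r="$1" 'BEGIN { s = (r >= 50) ? "Pass" : "Fail"; print s }'
)

# Usage for hg19 (reference_panel.bim is a hypothetical file name):
#   rate=$(overlap_rate chr19_only.bim reference_panel.bim 55000000 55400000)
#   overlap_status "$rate"
```

The rate that PONG2 itself reports remains the authoritative value; this sketch is only meant to make the threshold logic concrete.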


Step 4: Pre-imputation (when SNP overlap < 50%)

Pre-phasing the KIR region is required before any pre-imputation strategy.

Pre-phase with Eagle2

hg19

eagle \
  --bfile=chr19_only \
  --geneticMapFile=genetic_map_hg19.txt.gz \
  --outPrefix=chr19.phased \
  --chrom=19 \
  --numThreads=20 \
  --bpStart=55000000 \
  --bpEnd=55400000

hg38

eagle \
  --bfile=chr19_only \
  --geneticMapFile=genetic_map_hg38.txt.gz \
  --outPrefix=chr19.phased \
  --chrom=19 \
  --numThreads=20 \
  --bpStart=54000000 \
  --bpEnd=55000000

Eagle2 outputs a phased VCF: chr19.phased.vcf.gz
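
The two Eagle2 commands above differ only in the genetic map file and the region bounds. A small wrapper can select both from the assembly name. This is a sketch under the assumption that the genetic map files are named as in the examples above and sit in the working directory; eagle_cmd is a hypothetical helper that prints the command rather than running it.

```shell
# Print (do not run) the Eagle2 command from this section for one assembly.
# Only flags that appear in the examples above are used.
eagle_cmd() (
  assembly=$1 prefix=${2:-chr19_only}
  case $assembly in
    hg19) start=55000000 end=55400000 ;;
    hg38) start=54000000 end=55000000 ;;
    *)    echo "unknown assembly: $assembly" >&2; exit 1 ;;
  esac
  printf 'eagle --bfile=%s --geneticMapFile=genetic_map_%s.txt.gz --outPrefix=chr19.phased --chrom=19 --numThreads=20 --bpStart=%s --bpEnd=%s\n' \
    "$prefix" "$assembly" "$start" "$end"
)

# Usage: inspect the command first, then run it.
#   eagle_cmd hg38
```

Printing the command instead of executing it makes it easy to review the coordinates before a long phasing run.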


Option A: Local Pre-imputation with minimac4 (built-in)

Pass the pre-phased VCF directly to PONG2 using --vcf and --fill-missing.

Important: with --fill-missing, --vcf is the only input required. PLINK files cannot hold phased haplotype data, so the pipeline derives everything it needs from the VCF internally. Do not supply -i together with --fill-missing.

pong2 impute \
  --vcf chr19.phased.vcf.gz \
  -o results/local_impute \
  -l KIR3DL1 \
  -a hg19 \
  -t 20 \
  --filter 0.005 \
  --fill-missing
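
Because this option expects a pre-phased VCF, it may be worth confirming that the genotypes are actually phased before starting. The sketch below is a hypothetical helper, not a PONG2 feature. It inspects the first data lines of a gzip- or bgzip-compressed VCF and assumes GT is the first FORMAT subfield, which the VCF specification requires whenever GT is present.

```shell
# Return 0 if every genotype in the first N data lines of a compressed VCF
# is phased (no "/" allele separator), 1 otherwise. N defaults to 1000.
vcf_is_phased() (
  vcf=$1 n=${2:-1000}
  gzip -cd "$vcf" | awk -v n="$n" -F '\t' '
    /^#/ { next }                               # skip header lines
    {
      for (i = 10; i <= NF; i++) {              # sample columns start at 10
        split($i, f, ":")                       # f[1] is the GT subfield
        if (index(f[1], "/") > 0) { bad = 1; exit }
      }
      if (++seen >= n) exit
    }
    END { exit (bad ? 1 : 0) }
  '
)

# Usage:
#   vcf_is_phased chr19.phased.vcf.gz && echo "phased" || echo "unphased genotypes found"
```

Missing genotypes written as ./. are reported as unphased by this check.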

Step 5: Interpreting Output

After pong2 impute completes, results are saved in <output>/KIR/:

File                 Description
KIR/<locus>.csv      Predicted KIR alleles per sample (main results)
KIR/<locus>.RData    Full prediction object including allele probabilities

Output CSV format

sample.id, KIR3DL1.1, KIR3DL1.2, prob.KIR3DL1.1, prob.KIR3DL1.2
HG00096,   KIR3DL1*001, KIR3DL1*002, 0.98, 0.95
HG00097,   KIR3DL1*005, KIR3DL1*015, 0.87, 0.91
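
The probability columns can be used to flag calls that deserve a second look. The sketch below assumes the five-column layout shown above with unquoted fields. The helper name low_confidence_samples and the default threshold of 0.9 are illustrative choices, not PONG2 recommendations.

```shell
# Print sample IDs for which either allele probability is below a threshold.
# Columns: sample.id, allele 1, allele 2, probability 1, probability 2.
# ASSUMPTION: the 0.9 default is arbitrary and fields are not quoted.
low_confidence_samples() (
  csv=$1 min_prob=${2:-0.9}
  awk -F ',' -v t="$min_prob" '
    NR == 1 { next }                                   # skip the header row
    ($4 + 0 < t + 0) || ($5 + 0 < t + 0) { print $1 }  # +0 forces numeric comparison
  ' "$csv"
)

# Usage:
#   low_confidence_samples results/basic/KIR/KIR3DL1.csv 0.9
```

For the two example rows above and a threshold of 0.9, only HG00097 is printed, because its first allele probability is 0.87.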

Large sample datasets

For datasets with more than 2,000 samples, PONG2 automatically splits prediction into chunks of 2,000 samples to prevent memory issues. Results are combined and saved as a single output file, so no action is required from the user.
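
To estimate how many chunks a run will use, the sample count can be divided by the chunk size and rounded up. This is only the arithmetic implied by the description above, written as a hypothetical helper; it does not reproduce how PONG2 performs the split internally.

```shell
# Number of chunks needed for n samples at a given chunk size
# (ceiling division; chunk size defaults to 2000).
n_chunks() (
  n=$1 size=${2:-2000}
  echo $(( (n + size - 1) / size ))
)

# Usage: a .fam file has one line per sample.
#   n_chunks "$(wc -l < chr19_only.fam)"
```

For example, 4,500 samples would be processed in three chunks: two of 2,000 samples and one of 500.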


Summary: Which Workflow to Choose?

Scenario                                Recommended approach
SNP overlap ≥ 50%                       Run pong2 impute -i directly
SNP overlap < 50%, quick run needed     Eagle2 → pong2 impute --vcf --fill-missing
SNP overlap < 50%, highest accuracy     Eagle2 → Michigan Server → pong2 impute -i
Low overlap, understand risks           pong2 impute -i --force

Next Steps

Happy KIR imputation! 🧬
