The quality of genotypes and sequences can be assessed with several functions. For genotypic data, by-locus summaries can be produced with the summarizeLoci function, which will also produce summaries for each stratum:

data(msats.g)
msats <- msats.g
smry <- summarizeLoci(msats)
head(smry)
##       num.genotyped prop.genotyped num.alleles allelic.richness
## D11t            125           0.99          12            0.096
## EV37            119           0.94          22            0.185
## EV94            125           0.99          15            0.120
## Ttr11           125           0.99           9            0.072
## Ttr34           126           1.00          10            0.079
##       prop.unique.alleles exptd.heterozygosity obsvd.heterozygosity
## D11t                0.250                 0.75                 0.70
## EV37                0.136                 0.83                 0.70
## EV94                0.067                 0.83                 0.78
## Ttr11               0.222                 0.80                 0.70
## Ttr34               0.200                 0.81                 0.70
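
To make the columns above more concrete, the following base R sketch computes a few of them for a single locus. The genotypes are hypothetical, and the uncorrected estimator 1 - sum(p^2) for expected heterozygosity is an assumption. The estimator summarizeLoci uses is not stated here, so values may differ slightly.

```r
# Hypothetical genotypes at one diploid locus: one row per sample,
# one column per allele; the last sample is ungenotyped.
geno <- data.frame(
  a1 = c("120", "120", "124", "124", "128", NA),
  a2 = c("120", "124", "124", "128", "128", NA),
  stringsAsFactors = FALSE
)

# A sample counts as genotyped only if both alleles are present.
genotyped <- !is.na(geno$a1) & !is.na(geno$a2)
num.genotyped <- sum(genotyped)
prop.genotyped <- num.genotyped / nrow(geno)

# Allele frequencies pooled over both allele columns.
alleles <- c(geno$a1[genotyped], geno$a2[genotyped])
freq <- table(alleles) / length(alleles)
num.alleles <- length(freq)

# Expected heterozygosity without a sample-size correction (assumption).
exptd.het <- 1 - sum(freq^2)

# Observed heterozygosity: proportion of genotyped samples
# carrying two different alleles.
obsvd.het <- mean(geno$a1[genotyped] != geno$a2[genotyped])
```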

The dupGenotypes function identifies samples that have the same or nearly the same genotypes. The number (or proportion) of loci that two samples must share in order to be considered duplicates can be set with the num.shared argument. The returned data.frame lists the loci at which the two samples mismatch so they can be reviewed.

# Find samples that share alleles at 2/3rds of the loci
dupGenotypes(msats, num.shared = 0.66)
##    ids.1 ids.2 strata.1 strata.2 num.loci.genotyped num.loci.shared
## 1  41579 45237  Coastal  Coastal                  5               5
## 2  23945 78065  Coastal  Coastal                  5               4
## 3  25503 78053  Coastal  Coastal                  5               4
## 4  25509 41822  Coastal  Coastal                  5               4
## 5  41540 78040  Coastal  Coastal                  5               4
## 6  41578 45233  Coastal  Coastal                  5               4
## 7  44720 78058  Coastal  Coastal                  5               4
## 8  45230 78040  Coastal  Coastal                  5               4
## 9  78034 78040  Coastal  Coastal                  5               4
## 10 78034 78043  Coastal  Coastal                  5               4
## 11 78035 78053  Coastal  Coastal                  5               4
## 12 78045 78058  Coastal  Coastal                  5               4
##    prop.loci.shared mismatch.loci
## 1               1.0              
## 2               0.8          EV37
## 3               0.8         Ttr11
## 4               0.8         Ttr34
## 5               0.8          D11t
## 6               0.8          EV94
## 7               0.8          EV94
## 8               0.8         Ttr11
## 9               0.8          EV94
## 10              0.8          EV37
## 11              0.8          D11t
## 12              0.8         Ttr34
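
The comparison behind this table can be sketched in base R as follows. This is an illustration, not the dupGenotypes implementation. The genotype matrix and locus names are hypothetical, missing data are not handled, and a locus is treated as shared only when both alleles match regardless of order.

```r
# Hypothetical genotypes: rows are samples, two columns per locus.
geno <- matrix(
  c(120, 124, 200, 200, 310, 314,
    124, 120, 200, 200, 310, 318,
    118, 118, 204, 206, 300, 302),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"),
                  c("L1.1", "L1.2", "L2.1", "L2.2", "L3.1", "L3.2"))
)
loci <- c("L1", "L2", "L3")

# Compare every pair of samples locus by locus.
pairs <- t(combn(rownames(geno), 2))
result <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  id1 <- pairs[i, 1]
  id2 <- pairs[i, 2]
  shared <- vapply(loci, function(loc) {
    cols <- paste0(loc, c(".1", ".2"))
    # Sorting makes the comparison independent of allele order.
    all(sort(geno[id1, cols]) == sort(geno[id2, cols]))
  }, logical(1))
  data.frame(
    ids.1 = id1, ids.2 = id2,
    num.loci.shared = sum(shared),
    prop.loci.shared = mean(shared),
    mismatch.loci = paste(loci[!shared], collapse = ", "),
    stringsAsFactors = FALSE
  )
}))

# Keep pairs sharing at least two-thirds of their loci.
dups <- result[result$prop.loci.shared >= 0.66, ]
```

Here only s1 and s2 are retained: they match at L1 (with alleles in reversed order) and L2, and differ at L3.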

The start and end positions of each sequence, along with its number of N’s and indels, can be generated with the summarizeSeqs function:

data(dolph.seqs)
seq.smry <- summarizeSeqs(as.DNAbin(dolph.seqs))
head(seq.smry)
##      start end length num.ns num.indels
## 4495     1 402    402      0          2
## 4496     1 402    402      0          2
## 4498     1 402    402      0          1
## 5814     1 402    402      0          2
## 5815     1 402    402      0          2
## 5816     1 402    402      0          2
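
As a rough illustration of what these columns describe, the base R sketch below summarizes a few hypothetical aligned sequences stored as plain character strings. It assumes that start and end are the first and last non-gap positions and that each internal run of gaps counts as one indel. summarizeSeqs may define these differently.

```r
# Hypothetical aligned sequences; "-" is a gap and "n" an unknown base.
seqs <- c(
  s1 = "acgt-acgtn",
  s2 = "--acgtnnac",
  s3 = "acg--tacg-"
)

summarize.one <- function(x) {
  bases <- strsplit(tolower(x), "")[[1]]
  # Positions holding a nucleotide code rather than a gap.
  non.gap <- which(bases != "-")
  start <- min(non.gap)
  end <- max(non.gap)
  inner <- bases[start:end]
  # Each run of gaps between start and end counts as one indel (assumption).
  runs <- rle(inner == "-")
  c(start = start, end = end, length = end - start + 1,
    num.ns = sum(inner == "n"), num.indels = sum(runs$values))
}

# One row per sequence, one column per summary statistic.
seq.smry <- t(vapply(seqs, summarize.one, numeric(5)))
```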

Base frequencies can be generated with baseFreqs:

bf <- baseFreqs(as.DNAbin(dolph.seqs))

# nucleotide frequencies by site
bf$site.freq[, 1:15]
##     1   2   3   4   5   6 7   8   9  10  11  12  13  14  15
## a   0 126 126 126 126 126 5   0   0   0   0 126   0   0   0
## c   0   0   0   0   0   0 0   0 126   0   0   0   0   0   0
## g 126   0   0   0   0   0 0 126   0   0   0   0   0   0 126
## t   0   0   0   0   0   0 0   0   0 126 126   0 126 126   0
## u   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## r   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## y   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## m   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## k   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## w   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## s   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## b   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## d   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## h   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## v   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## n   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## x   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## -   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
## .   0   0   0   0   0   0 0   0   0   0   0   0   0   0   0
# overall nucleotide frequencies
bf$base.freqs
## 
##      a      c      g      t      u      r      y      m      k      w 
## 0.2997 0.2282 0.1283 0.3389 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 
##      s      b      d      h      v      n      x      -      . 
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0048 0.0000
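
The two components can be reproduced in miniature with base R. The sketch below uses hypothetical sequences and a reduced set of base codes. It is meant only to show how by-site counts and overall proportions relate, not how baseFreqs is implemented.

```r
# Hypothetical aligned sequences and a reduced set of base codes.
seqs <- c(s1 = "acgta", s2 = "acgtc", s3 = "ac-ta")
bases <- c("a", "c", "g", "t", "n", "-")

# One row per sequence, one column per site.
seq.mat <- do.call(rbind, strsplit(tolower(seqs), ""))

# Count every base code, including those that are absent.
count.bases <- function(x) as.vector(table(factor(x, levels = bases)))

# Counts of each base at each site (bases in rows, sites in columns).
site.freq <- apply(seq.mat, 2, count.bases)
dimnames(site.freq) <- list(bases, seq_len(ncol(seq.mat)))

# Overall proportions across all sites and sequences.
counts <- count.bases(as.vector(seq.mat))
base.freqs <- setNames(counts / sum(counts), bases)
```

Each column of site.freq sums to the number of sequences, and base.freqs sums to 1.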

Sequences can be scanned for low-frequency substitutions with lowFreqSubs:

lowFreqSubs(as.DNAbin(dolph.seqs), min.freq = 2)
##      id site base       motif
## 1 74962   57    a taaaaataatt
## 2 74962  104    g catacgcatgt
## 3 74962  392    t catgctccgtg
## 4 74962  393    c atgctccgtgg
## 5 23792  274    t cctattgatcc
## 6 26304  274    a cctataaatcc
## 7 26304  394    t taccttgtggg
## 8 23794  287    g cctccgttata
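
The idea of the scan can be sketched in base R. The sequences below are hypothetical, and the rule used (flag a base that occurs fewer than min.freq times at a site) is an assumption about how the min.freq argument is interpreted. The motif column is omitted for brevity.

```r
# Hypothetical aligned sequences; s4 and s5 each carry one singleton base.
seqs <- c(s1 = "acgtacgt", s2 = "acgtacgt", s3 = "acgtacgt",
          s4 = "acctacgt", s5 = "acgtaagt")
min.freq <- 2  # assumed rule: flag bases seen fewer than min.freq times

seq.mat <- do.call(rbind, strsplit(seqs, ""))
rownames(seq.mat) <- names(seqs)

# For each site, list the samples carrying a rare base.
per.site <- lapply(seq_len(ncol(seq.mat)), function(site) {
  counts <- table(seq.mat[, site])
  rare <- names(counts)[counts < min.freq]
  ids <- rownames(seq.mat)[seq.mat[, site] %in% rare]
  if (length(ids) == 0) return(NULL)
  data.frame(id = ids, site = site, base = seq.mat[ids, site],
             stringsAsFactors = FALSE, row.names = NULL)
})

# Drop sites with nothing flagged and stack the rest.
flagged <- do.call(rbind, Filter(Negate(is.null), per.site))
```

Flagged positions like these are worth checking against the original chromatograms or reads, since they may reflect sequencing or editing errors rather than true variation.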

Unusual sequences can be identified by plotting likelihoods based on pairwise distances:

data(dolph.haps)
haplotypeLikelihoods(as.DNAbin(dolph.haps))

## Hap.32 Hap.22 Hap.06 Hap.02 Hap.15 Hap.29 Hap.10 Hap.30 Hap.23 Hap.03 
##  -26.3  -20.7  -19.2  -16.0  -14.8  -13.4  -11.1   -9.4   -9.2   -8.9 
## Hap.04 Hap.33 Hap.31 Hap.14 Hap.09 Hap.12 Hap.18 Hap.19 Hap.07 Hap.21 
##   -8.8   -8.8   -8.1   -7.6   -7.5   -7.4   -7.2   -7.0   -6.8   -4.3 
## Hap.13 Hap.20 Hap.26 Hap.27 Hap.16 Hap.05 Hap.24 Hap.17 Hap.25 Hap.01 
##   -4.3   -3.3   -3.3   -3.2   -2.9   -2.9   -2.7   -2.2   -1.8   -1.7 
## Hap.08 Hap.28 Hap.11 
##   -1.4   -1.2    0.0

All of the above functions can be run at once with the qaqc function. Only those functions appropriate to the data type contained (haploid or diploid) will be run. A file is written for each output, labelled by either the @description slot of the gtypes object or the optional label argument of the function.