For most functions in strataG, you will need to load your data into a gtypes object. A gtypes object is an R S4 class with the following slots:
A gtypes object can be created with the standard S4 new constructor function, like this:
#--- create a diploid (microsatellite) gtypes object
data(dolph.msats)
data(dolph.strata)
strata.schemes <- dolph.strata[, c("broad", "fine")]
rownames(strata.schemes) <- dolph.strata$id
msats <- new("gtypes", gen.data = dolph.msats[, -1], ploidy = 2,
ind.names = dolph.msats[, 1], schemes = strata.schemes)
#--- create a haploid sequence (mtDNA) gtypes object
data(dolph.seqs)
dloop.haps <- cbind(haps = dolph.strata$id)
rownames(dloop.haps) <- dolph.strata$id
dloop <- new("gtypes", gen.data = dloop.haps, ploidy = 1,
sequences = dolph.seqs)
Both of these examples create a gtypes where every individual is in a single default stratum. To stratify samples when created, supply either a vector, factor, or stratification scheme name to the ‘strata’ argument in new, like so:
msats.fine <- new("gtypes", gen.data = dolph.msats[, -1], ploidy = 2,
ind.names = dolph.msats[, 1], schemes = strata.schemes, strata = "fine")
The df2gtypes function assumes that you have a matrix or data.frame with columns for individual ids, stratification, and locus data. You then specify the columns in the data.frame where this information can be found. df2gtypes can be used for data with multiple alleles per locus, like this:
# create a single data.frame with the msat data and stratification
msats.merge <- merge(dolph.strata, dolph.msats, all.y = TRUE, description = date())
str(msats.merge)
## 'data.frame': 126 obs. of 14 variables:
## $ id : chr "18650" "18651" "18652" "18653" ...
## $ dLoop : chr "Hap.06" "Hap.14" "Hap.23" "Hap.24" ...
## $ broad : chr "Offshore" "Offshore" "Offshore" "Offshore" ...
## $ fine : chr "Offshore.South" "Offshore.South" "Offshore.South" "Offshore.South" ...
## $ D11t.1 : chr "131" "137" "131" "133" ...
## $ D11t.2 : chr "133" "143" "133" "139" ...
## $ EV37.1 : chr "212" "200" "212" "202" ...
## $ EV37.2 : chr "222" "228" "218" "220" ...
## $ EV94.1 : chr "263" "249" "249" "229" ...
## $ EV94.2 : chr "265" "251" "251" "245" ...
## $ Ttr11.1: chr "197" "197" "211" "197" ...
## $ Ttr11.2: chr "209" "197" "213" "215" ...
## $ Ttr34.1: chr "185" "185" "185" "189" ...
## $ Ttr34.2: chr "187" "187" "191" "195" ...
# create the gtypes object
msats.fine <- df2gtypes(msats.merge, ploidy = 2, id.col = 1, strata.col = 3, loc.col = 5)
…or for haploid data, like this:
data(dolph.seqs)
seq.df <- dolph.strata[ c("id", "broad", "id")]
colnames(seq.df)[3] <- "D-loop"
dl.g <- df2gtypes(seq.df, ploidy = 1, sequences = dolph.seqs)
dl.g
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 126 samples, 1 locus, 2 strata
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 0 68 1
## Offshore 58 0 58 1
## heterozygosity
## Coastal 1
## Offshore 1
##
## Sequence summary:
## num.seqs min.length mean.length max.length a c
## D.loop 126 402 402 402 0.3011288 0.2293531
## g t
## D.loop 0.1289702 0.3405479
Note that since each sequence in dolph.seqs is for a given individual, the pct.unique.alleles and heterozygosity values are for every stratum 1 and num.samples equals num.alleles. In order to convert the sequences to unique haplotypes, use the labelHaplotypes function:
dl.haps <- labelHaplotypes(dl.g)
dl.haps$gtype
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 126 samples, 1 locus, 2 strata
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 0 5 0.0000000
## Offshore 58 0 29 0.4827586
## heterozygosity
## Coastal 0.7427568
## Offshore 0.9594676
##
## Sequence summary:
## num.seqs min.length mean.length max.length a c
## D.loop 33 402 402 402 0.3011738 0.2293071
## g t
## D.loop 0.1291935 0.3403256
The sequence2gtypes function creates an unstratified gtype object with just a set of DNA sequences:
data(dolph.haps)
haps.g <- sequence2gtypes(dolph.haps)
haps.g
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 33 samples, 1 locus, 1 stratum
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Default 33 0 33 1
## heterozygosity
## Default 1
##
## Sequence summary:
## num.seqs min.length mean.length max.length a c
## gene1 33 402 402 402 0.3011738 0.2293071
## g t
## gene1 0.1291935 0.3403256
If you have a vector that identifies strata designations for the sequences, that can be supplied as well:
# extract and name the stratification scheme
strata <- dolph.strata$fine
names(strata) <- dolph.strata$ids
# create the gtypes object
dloop.fine <- sequence2gtypes(dolph.seqs, strata, seq.names = "dLoop",
description = "dLoop: fine-scale stratification")
dloop.fine
##
## <<< dLoop: fine-scale stratification >>>
##
## Contents: 126 samples, 1 locus, 3 strata
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 0 68 1
## Offshore.North 40 0 40 1
## Offshore.South 18 0 18 1
## heterozygosity
## Coastal 1
## Offshore.North 1
## Offshore.South 1
##
## Sequence summary:
## num.seqs min.length mean.length max.length a c
## dLoop 126 402 402 402 0.3011288 0.2293531
## g t
## dLoop 0.1289702 0.3405479
Note that stratification is generally provided for individuals, thus if you want to stratify the resulting gtypes object from sequence2gtypes, one sequence for each individual should be provided, rather than just a set of unique haplotypes.
This converts from a genind object from the adegenet package.
library(adegenet)
# from example(df2genind)
df <- data.frame(locusA=c("11","11","12","32"),
locusB=c(NA,"34","55","15"),
locusC=c("22","22","21","22"))
row.names(df) <- .genlab("genotype",4)
obj <- df2genind(df, ploidy=2, ncode=1)
obj
## /// GENIND OBJECT /////////
##
## // 4 individuals; 3 loci; 9 alleles; size: 6.1 Kb
##
## // Basic content
## @tab: 4 x 9 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 2-4)
## @loc.fac: locus factor for the 9 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: df2genind(X = df, ncode = 1, ploidy = 2)
##
## // Optional content
## - empty -
# convert to gtypes
gi.g <- genind2gtypes(obj)
gi.g
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 4 samples, 3 loci, 1 stratum
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Default 4 0.3333333 3 0.5277778
## heterozygosity
## Default 0.4722222
##
## Locus summary:
## num.genotyped num.alleles prop.unique.alleles obsvd.heterozygosity
## locusA 4 3 0.3333333 0.5000000
## locusB 3 4 0.7500000 0.6666667
## locusC 4 2 0.5000000 0.2500000
There are several functions for getting basic information from a gtypes object:
A gtypes object can be subset using the standard R ‘[’ indexing operation, with three slots: [i, j, k]. The first (i) specifies the desired individuals, the second (j) is the loci to return, and the third (k) is the strata. All standard R indexing operations involving numerical, character, or logical vectors work for each argument. For example, to return 10 random individuals:
sub.msats <- msats.fine[sample(nInd(msats.fine), 10), , ]
sub.msats
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 10 samples, 5 loci, 2 strata
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 4 0 3.0 0.2833333
## Offshore 6 0 6.6 0.5085714
## heterozygosity
## Coastal 0.6
## Offshore 0.8
##
## Locus summary:
## num.genotyped num.alleles prop.unique.alleles obsvd.heterozygosity
## D11t 10 7 0.4285714 0.7
## EV37 10 7 0.4285714 0.7
## EV94 10 8 0.5000000 0.8
## Ttr11 10 7 0.0000000 0.7
## Ttr34 10 7 0.4285714 0.7
…or to return specific loci:
sub.msats <- sub.msats[, c("D11t", "EV37", "TV7"), ]
sub.msats
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 10 samples, 2 loci, 2 strata
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 4 0 2.5 0.1666667
## Offshore 6 0 7.0 0.5714286
## heterozygosity
## Coastal 0.5000000
## Offshore 0.8333333
##
## Locus summary:
## num.genotyped num.alleles prop.unique.alleles obsvd.heterozygosity
## D11t 10 7 0.4285714 0.7
## EV37 10 7 0.4285714 0.7
…or some loci in a specific stratum:
sub.msats <- msats.fine[, c("Ttr11", "D11t"), "Coastal"]
sub.msats
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 68 samples, 2 loci, 1 stratum
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 0.5 3.5 0
## heterozygosity
## Coastal 0.5773705
##
## Locus summary:
## num.genotyped num.alleles prop.unique.alleles obsvd.heterozygosity
## Ttr11 68 4 0 0.6323529
## D11t 67 3 0 0.5223881
A summary function has been defined for gtypes, and the return value provides the number of individuals, loci, strata, allele frequencies, a by-strata summary, and a by-locus summary:
msats.smry <- summary(msats.fine)
str(msats.smry)
## List of 7
## $ num.ind : num 126
## $ num.loc : int 5
## $ num.strata : int 2
## $ allele.freqs:List of 5
## ..$ D11t : num [1:12, 1:2, 1:2] 0 0 0 0 0 0 48 83 3 0 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:12] "117" "119" "121" "127" ...
## .. .. ..$ : chr [1:2] "freq" "prop"
## .. .. ..$ : chr [1:2] "Coastal" "Offshore"
## ..$ EV37 : num [1:22, 1:2, 1:2] 0 0 0 0 0 0 0 1 71 33 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:22] "190" "200" "202" "204" ...
## .. .. ..$ : chr [1:2] "freq" "prop"
## .. .. ..$ : chr [1:2] "Coastal" "Offshore"
## ..$ EV94 : num [1:15, 1:2, 1:2] 0 0 0 12 0 47 30 0 0 25 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:15] "229" "239" "243" "245" ...
## .. .. ..$ : chr [1:2] "freq" "prop"
## .. .. ..$ : chr [1:2] "Coastal" "Offshore"
## ..$ Ttr11: num [1:9, 1:2, 1:2] 0 0 42 0 0 59 33 2 0 0 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:9] "193" "197" "207" "209" ...
## .. .. ..$ : chr [1:2] "freq" "prop"
## .. .. ..$ : chr [1:2] "Coastal" "Offshore"
## ..$ Ttr34: num [1:10, 1:2, 1:2] 0 0 33 19 0 65 8 11 0 0 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:10] "179" "183" "185" "187" ...
## .. .. ..$ : chr [1:2] "freq" "prop"
## .. .. ..$ : chr [1:2] "Coastal" "Offshore"
## $ strata.smry : num [1:2, 1:5] 68 58 1.2 0.8 4.8 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "Coastal" "Offshore"
## .. ..$ : chr [1:5] "num.samples" "num.missing" "num.alleles" "prop.unique.alleles" ...
## $ unstratified: int 0
## $ locus.smry : num [1:5, 1:7] 125 119 125 125 126 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:5] "D11t" "EV37" "EV94" "Ttr11" ...
## .. ..$ : chr [1:7] "num.genotyped" "prop.genotyped" "num.alleles" "allelic.richness" ...
## - attr(*, "description")= chr "gtypes created on 2016-04-04 10:42:47"
## - attr(*, "class")= chr [1:2] "gtypeSummary" "list"
For just the by-locus summary statistics:
summarizeLoci(msats.fine)
## num.genotyped prop.genotyped num.alleles allelic.richness
## D11t 125 0.9920635 12 0.09600000
## EV37 119 0.9444444 22 0.18487395
## EV94 125 0.9920635 15 0.12000000
## Ttr11 125 0.9920635 9 0.07200000
## Ttr34 126 1.0000000 10 0.07936508
## prop.unique.alleles exptd.heterozygosity obsvd.heterozygosity
## D11t 0.25000000 0.7468273 0.7040000
## EV37 0.13636364 0.8270751 0.6974790
## EV94 0.06666667 0.8304900 0.7760000
## Ttr11 0.22222222 0.7953414 0.7040000
## Ttr34 0.20000000 0.8144248 0.6984127
You can specify the stratification scheme when creating a gtypes object as in the examples above. Once a gtypes object has been created, you can also change the stratification scheme by either supplying a new vector for the @strata slot:
# randomly stratify individuals to two populations
strata(msats) <- sample(c("Pop1", "Pop2"), nInd(msats), rep = TRUE)
summary(msats)$strata.smry
## num.samples num.missing num.alleles prop.unique.alleles
## Pop1 63 0.8 12.2 0.1896581
## Pop2 63 1.2 12.2 0.1751804
## heterozygosity
## Pop1 0.7295545
## Pop2 0.7023215
or, if there is a stratification scheme data.frame in the @schemes slot, you can use the stratify function to choose a stratification scheme:
# choose "broad" stratification scheme
msats <- stratify(msats, "broad")
summary(msats)$strata.smry
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 1.2 4.8 0.08571429
## Offshore 58 0.8 13.6 0.17505051
## heterozygosity
## Coastal 0.6312283
## Offshore 0.8144326
If the @schemes slot is empty (NULL), you can fill it with a proper data.frame like this:
schemes(dloop) <- strata.schemes
NOTE: Filling or changing the @schemes slot does not affect the current stratification of the samples. You must then select a new stratification scheme or fill the @strata slot as above.
stratify(dloop, "fine")
##
## <<< gtypes created on 2016-04-04 10:42:47 >>>
##
## Contents: 126 samples, 1 locus, 3 strata
##
## Strata summary:
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 0 68 1
## Offshore.North 40 0 40 1
## Offshore.South 18 0 18 1
## heterozygosity
## Coastal 1
## Offshore.North 1
## Offshore.South 1
##
## Sequence summary:
## num.seqs min.length mean.length max.length a c
## haps 126 402 402 402 0.3011288 0.2293531
## g t
## haps 0.1289702 0.3405479
If some samples should be unstratified (excluded from any stratified analyses), they should have NAs in the appropriate position in the @strata slot. For example:
# unstratify a random 10 samples
x <- strata(msats)
x[sample(indNames(msats), 10)] <- NA
strata(msats) <- x
summary(msats)$strata.smry
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 62 1.0 4.6 0.06666667
## Offshore 54 0.4 13.4 0.16682540
## heterozygosity
## Coastal 0.6330504
## Offshore 0.8171209
You can also randomly permute the current stratification scheme using the permuteStrata function like this:
msats <- stratify(msats, "fine")
# original
summary(msats)$strata.smry
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 1.2 4.8 0.08571429
## Offshore.North 40 0.8 12.6 0.22398990
## Offshore.South 18 0.0 11.0 0.25098291
## heterozygosity
## Coastal 0.6312283
## Offshore.North 0.7901822
## Offshore.South 0.8666667
# permuted
ran.msats <- permuteStrata(msats)
summary(ran.msats)$strata.smry
## num.samples num.missing num.alleles prop.unique.alleles
## Coastal 68 1.6 11.4 0.1500675
## Offshore.North 40 0.2 11.2 0.2035839
## Offshore.South 18 0.2 8.2 0.3112121
## heterozygosity
## Coastal 0.6895669
## Offshore.North 0.7638462
## Offshore.South 0.7071895
NOTE: Only samples assigned to strata are permuted with permuteStrata. Those not assigned (NAs) remain unassigned.
The allelic data in a gtypes object can be converted back to a matrix with as.matrix:
gen.mat <- as.matrix(msats)
head(gen.mat)
## D11t.1 D11t.2 EV37.1 EV37.2 EV94.1 EV94.2 Ttr11.1 Ttr11.2 Ttr34.1
## 4495 "133" "133" NA NA NA NA "211" "211" "185"
## 4496 "137" "141" NA NA "243" "251" NA NA "189"
## 4498 "133" "133" "220" "220" "245" "265" "209" "211" "189"
## 5814 "135" "139" "210" "226" "251" "265" "209" "209" "185"
## 5815 "139" "139" "212" "212" "249" "261" "207" "213" "185"
## 5816 "135" "137" "214" "220" "249" "253" "207" "209" "187"
## Ttr34.2
## 4495 "191"
## 4496 "189"
## 4498 "189"
## 5814 "187"
## 5815 "185"
## 5816 "195"
By default, this function splits each allele into its own column. One can make a matrix with one locus per column and alleles separated by a specified character by setting the one.col argument to TRUE:
gen.mat <- as.matrix(msats, one.col = TRUE)
head(gen.mat)
## D11t EV37 EV94 Ttr11 Ttr34
## 4495 "133/133" NA NA "211/211" "185/191"
## 4496 "137/141" NA "243/251" NA "189/189"
## 4498 "133/133" "220/220" "245/265" "209/211" "189/189"
## 5814 "135/139" "210/226" "251/265" "209/209" "185/187"
## 5815 "139/139" "212/212" "249/261" "207/213" "185/185"
## 5816 "135/137" "214/220" "249/253" "207/209" "187/195"
The contents of a gtypes object can be written to a file with the write.gtypes function. This will write a .csv file with the allelic information and a .fasta file for any sequence data if it exists.