The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
First, the library is loaded:
For reproducibility, the seed for the (pseudo) random number generator is set:
First, the population sizes are determined:
A population can be simulated (hiding progress information) as follows:
set.seed(1) # For reproducibility
sim_res_growth <- sample_geneology_varying_size(
population_sizes = population_sizes,
# VRS = 0.2:
enable_gamma_variance_extension = TRUE,
gamma_parameter_shape = 5,
gamma_parameter_scale = 1/5,
# Live population:
# 3 generations
generations_full = 3,
generations_return = 3,
progress = FALSE)
Live population:
Until pedigrees are build/infered, there is not much information available (e.g. about children). So let us infer the pedigrees:
## List of 11 pedigrees (of size 217, 96, 95, 56, 51, 47, ...)
## [1] 11
## 33 37 39 21 47 38 51 56 95 96 217
## 1 1 1 1 1 1 1 1 1 1 1
## [1] 217
We can look at the population as a (tidy)graph:
## # A tbl_graph: 730 nodes and 719 edges
## #
## # A rooted forest with 11 trees
## #
## # A tibble: 730 × 4
## name gens_from_final ped_id haplotype
## <chr> <int> <int> <list>
## 1 727 19 4 <int [0]>
## 2 713 18 4 <int [0]>
## 3 704 17 4 <int [0]>
## 4 688 16 4 <int [0]>
## 5 685 15 4 <int [0]>
## 6 673 14 4 <int [0]>
## # ℹ 724 more rows
## #
## # A tibble: 719 × 2
## from to
## <int> <int>
## 1 1 2
## 2 2 3
## 3 3 4
## # ℹ 716 more rows
This can be plotted:
if (requireNamespace("ggraph", quietly = TRUE)) {
library(ggraph)
p <- ggraph(g, layout = 'tree') +
geom_edge_link() +
geom_node_point(size = 8) +
geom_node_text(aes(label = name), color = "white") +
facet_nodes(~ ped_id) +
theme_graph()
print(p)
}
## Indlæser krævet pakke: ggplot2
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This is rather difficult to make any sense of. Let’s instead plot only pedigree 1:
PED_ID <- 1
g_ped2 <- g %>%
activate(nodes) %>%
filter(ped_id == PED_ID)
if (requireNamespace("ggraph", quietly = TRUE)) {
library(ggraph)
p <- ggraph(g_ped2, layout = 'tree') +
geom_edge_link() +
geom_node_point(size = 8) +
geom_node_text(aes(label = name), color = "white") +
theme_graph()
print(p)
}
Up until now, only the genealogy has been simulated. Now, we run a mutational process, i.e. assign haplotypes to founders and let haplotypes flow down the individuals.
We use realistic data. In the package, there is information about the individual markers:
## # A tibble: 29 × 5
## Marker Mutations Meioses MutProb Alleles
## <fct> <dbl> <dbl> <dbl> <list>
## 1 DYS438 4 10673 0.000375 <dbl [22]>
## 2 DYS392 8 15418 0.000519 <dbl [22]>
## 3 DYS393 15 14264 0.00105 <dbl [17]>
## 4 DYS437 13 10652 0.00122 <dbl [15]>
## 5 DYS385a 32 26171 0.00122 <dbl [52]>
## 6 DYS385b 32 26171 0.00122 <dbl [52]>
## 7 DYS643 3 2220 0.00135 <dbl [17]>
## 8 DYS448 11 7229 0.00152 <dbl [36]>
## 9 DYS390 33 15612 0.00211 <dbl [19]>
## 10 DYS19 36 16090 0.00224 <dbl [19]>
## # ℹ 19 more rows
Note, that MutProb
is the point estimate given by
MutProb = Mutations / Meioses
. Information about which
markers that are in which kit is also provided:
## # A tibble: 88 × 2
## Marker Kit
## <fct> <fct>
## 1 DYS392 Minimal
## 2 DYS393 Minimal
## 3 DYS385a Minimal
## 4 DYS385b Minimal
## 5 DYS390 Minimal
## 6 DYS19 Minimal
## 7 DYS391 Minimal
## 8 DYS389I Minimal
## 9 DYS389II Minimal
## 10 DYS438 PowerPlex Y
## # ℹ 78 more rows
## # A tibble: 5 × 2
## Kit n
## <fct> <int>
## 1 Minimal 9
## 2 PowerPlex Y 12
## 3 Yfiler 17
## 4 PowerPlex Y23 23
## 5 Yfiler Plus 27
Let us take all PowerPlex Y23 markers and assume that we only have a
partial profile where DYS437
and DYS448
dropped out. At the same time, we also filter out the integer alleles
(for generating random founder haplotypes in a minute):
partial_kit <- ystr_kits %>%
filter(Kit == "PowerPlex Y23") %>%
inner_join(ystr_markers, by = "Marker") %>%
filter(!(Marker %in% c("DYS437", "DYS448"))) %>%
rowwise() %>% # To work on each row
mutate(IntegerAlleles = list(Alleles[Alleles == round(Alleles)]),
MinIntAllele = min(IntegerAlleles),
MaxIntAllele = max(IntegerAlleles)) %>%
ungroup() %>%
select(-Kit, -Alleles)
partial_kit
## # A tibble: 21 × 7
## Marker Mutations Meioses MutProb IntegerAlleles MinIntAllele MaxIntAllele
## <fct> <dbl> <dbl> <dbl> <list> <dbl> <dbl>
## 1 DYS438 4 10673 0.000375 <dbl [14]> 5 19
## 2 DYS392 8 15418 0.000519 <dbl [14]> 6 20
## 3 DYS393 15 14264 0.00105 <dbl [12]> 7 18
## 4 DYS385a 32 26171 0.00122 <dbl [23]> 6 28
## 5 DYS385b 32 26171 0.00122 <dbl [23]> 6 28
## 6 DYS643 3 2220 0.00135 <dbl [13]> 4 17
## 7 DYS390 33 15612 0.00211 <dbl [14]> 17 30
## 8 DYS19 36 16090 0.00224 <dbl [12]> 9 20
## 9 DYS391 38 15486 0.00245 <dbl [12]> 5 16
## 10 DYS389I 42 14339 0.00293 <dbl [9]> 9 17
## # ℹ 11 more rows
This “partial kit” has the following mutation probabilities:
## [1] 0.0003747775 0.0005188740 0.0010515984 0.0012227274 0.0012227274
## [6] 0.0013513514 0.0021137586 0.0022374145 0.0024538293 0.0029290746
## [11] 0.0030266344 0.0036747818 0.0037541061 0.0041229909 0.0042882833
## [16] 0.0043338286 0.0050205386 0.0054475439 0.0063641395 0.0133475707
## [21] 0.0147194112
We can make a founder haplotype generator as follows (sampling alleles randomly is not how Y-STR works, but it may work fine for founder haplotypes):
generate_random_haplotype <- function() {
partial_kit %>%
rowwise() %>%
mutate(Allele = IntegerAlleles[sample.int(length(IntegerAlleles), 1)]) %>%
pull(Allele)
}
Now, a new haplotype is created everytime the function is called (with no arguments):
## [1] 11 14 16 14 6 17 26 16 10 13 20 13 9 28 14 27 31 13 18 24 22
## [1] 18 17 18 14 28 13 30 17 12 11 18 8 11 33 12 17 16 11 17 19 13
Of course such generator can also be created for a reference database with Y-STR profiles.
Now, we are ready to assign haplotypes to the genealogy:
set.seed(1)
pedigrees_all_populate_haplotypes_custom_founders(
pedigrees = pedigrees,
get_founder_haplotype = generate_random_haplotype,
mutation_rates = mu,
progress = FALSE)
We can now plot pedigrees with haplotype information (note that
as_tbl_graph
needs to be called again):
g_ped2 <- as_tbl_graph(pedigrees) %>%
activate(nodes) %>%
filter(ped_id == PED_ID) %>%
group_by(name) %>%
mutate(haplotype_str = paste0(haplotype[[1]], collapse = ";"))
#mutate(haplotype_str = map(haplotype, paste0, collapse = ";")[[1]])
if (requireNamespace("ggraph", quietly = TRUE)) {
library(ggraph)
p <- ggraph(g_ped2, layout = 'tree') +
geom_edge_link() +
geom_node_point(aes(color = haplotype_str), size = 8) +
geom_node_text(aes(label = name), color = "white") +
theme_graph()
print(p)
}
We have live_pop
from the population.
set.seed(5)
Q_index <- sample.int(n = length(live_pop), size = 1)
Q <- live_pop[[Q_index]]
print_individual(Q)
## pid = 322 with father pid = 376 and no children
## [1] 13 9 13 6 7 17 23 19 6 11 6 11 9 32 14 28 25 11 18 24 14
Now, count matches in live part of pedigree and in live part of population:
Q_ped <- get_pedigree_from_individual(Q)
ped_live_matches <- count_haplotype_occurrences_pedigree(
pedigree = Q_ped,
haplotype = Q_hap,
generation_upper_bound_in_result = 2) # gen 0, 1, 2
pop_live_matches <- count_haplotype_occurrences_individuals(
individuals = live_pop,
haplotype = Q_hap)
ped_live_matches
## [1] 36
## [1] 36
We can also inspect pedigree matches information about number of meioses and \(L_1\) distances:
path_details <- pedigree_haplotype_matches_in_pedigree_meiosis_L1_dists(
suspect = Q,
generation_upper_bound_in_result = 2)
## [1] 36
## meioses max_L1 pid
## [1,] 16 0 319
## [2,] 16 0 339
## [3,] 16 0 289
## [4,] 17 0 157
## [5,] 18 0 45
## [6,] 17 0 177
This can of course be repeated to many populations (genealogies, haplotype processes, suspects etc.). Also note that variability can be put on mutation rates, e.g. by a Bayesian approach.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.