The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette demonstrates a complete workflow for missing person identification combining genetic DNA evidence with non-genetic preliminary investigation evidence (sex, age, hair color).
A family is searching for their missing relative with the following known characteristics:
A person of interest (POI) has been found with:
We will evaluate this match using both:
Error rates (epsilon) account for uncertainty in observations:
Sex Evidence:
# MP is female (H=1), POI observed as female
lr_sex_result <- lr_sex(
LR = TRUE,
H = 1, # True hypothesis (MP is female)
nsims = 1,
eps = eps_sex,
erRs = 0.01 # Database recording error
)
#> Warning: Parameter 'nsims' is deprecated. Use 'numsims' instead.
cat("LR for sex evidence:", lr_sex_result$LRs, "\n")
#> LR for sex evidence: 1.96Age Evidence:
# MP age = 25, tolerance = 5 years (so 20-30 is acceptable)
# POI age = 27 (falls within range)
lr_age_result <- lr_age(
LR = TRUE,
H = 1, # True hypothesis
nsims = 1,
epa = eps_age,
erRa = 0.01,
MPa = 25, # MP age
MPr = 5 # Range tolerance
)
#> Warning: Parameter 'nsims' is deprecated. Use 'numsims' instead.
cat("LR for age evidence:", lr_age_result$LRa, "\n")
#> LR for age evidence: 7.6Hair Color Evidence:
# MP has color 2 (dark brown), POI observed as color 2
lr_color_result <- lr_hair_color(
LR = TRUE,
H = 1, # True hypothesis
nsims = 1,
MPc = 2, # MP hair color
epc = eps_color,
erRc = eps_color
)
#> Warning: Parameter 'nsims' is deprecated. Use 'numsims' instead.
cat("LR for hair color:", lr_color_result$LRc, "\n")
#> LR for hair color: 4.840271# Population CPT (H2)
cpt_h2 <- cpt_population(
propS = c(0.5, 0.5),
MPa = 25,
MPr = 5,
propC = c(0.15, 0.35, 0.25, 0.15, 0.10) # Realistic color distribution
)
# MP CPT (H1)
cpt_h1 <- cpt_missing_person(
MPs = 1, # Female
MPc = 2, # Dark brown
eps = eps_sex,
epa = eps_age,
epc = eps_color
)
# Visualize both CPTs and LR heatmap
plot_cpt(cpt_h2, cpt_h1)For illustration, we’ll show how to simulate genetic LRs using a parent-child relationship pedigree. Note: This code is provided for reference but not executed in this vignette to avoid dependency on specific pedigree structures.
# Create a simple pedigree: parent-child relationship
# The missing person (ID 5) is child of individual 2
# Using linearPed to create grandparent-parent-child
ped <- linearPed(2) # 5 individuals
# Add genetic markers from Norwegian population
ped <- setMarkers(ped, locusAttributes = NorwegianFrequencies[1:10])
# Simulate a profile for the reference person
set.seed(123)
ped <- profileSim(ped, N = 1, ids = 2)[[1]]# Simulate genetic LRs
genetic_sims <- sim_lr_genetic(
reference = ped,
missing = 5,
numsims = 100,
seed = 456
)
# Convert to dataframe
genetic_df <- lr_to_dataframe(genetic_sims)
# Visualize
plot_lr_distribution(genetic_df)For this demonstration, we’ll use pre-computed example values:
# Example genetic LR values (pre-computed)
# These represent typical values from parent-child testing
set.seed(42)
genetic_df <- data.frame(
Related = 10^rnorm(100, mean = 3, sd = 1.5),
Unrelated = 10^rnorm(100, mean = -0.5, sd = 1)
)
cat("Summary of log10(LR) under H1 (Related):\n")
#> Summary of log10(LR) under H1 (Related):
summary(log10(genetic_df$Related))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.490 2.075 3.135 3.049 3.992 6.430
cat("\nSummary of log10(LR) under H2 (Unrelated):\n")
#>
#> Summary of log10(LR) under H2 (Unrelated):
summary(log10(genetic_df$Unrelated))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -2.52468 -1.09150 -0.56929 -0.58748 -0.03821 2.20189We need to specify a prior probability that a random POI is the MP. This depends on the size of the candidate population:
The posterior odds combine all evidence:
\[\text{Posterior Odds} = \text{Prior Odds} \times LR_{genetic} \times LR_{nongenetic}\]
# For the simulations under H1 (true match scenario)
posterior_h1 <- prior_odds * genetic_df$Related * lr_nongenetic
# For simulations under H2 (no match scenario)
posterior_h2 <- prior_odds * genetic_df$Unrelated * lr_nongenetic
# Summary
cat("Posterior odds under H1 (median):", round(median(posterior_h1), 4), "\n")
#> Posterior odds under H1 (median): 9.8328
cat("Posterior odds under H2 (median):", round(median(posterior_h2), 6), "\n")
#> Posterior odds under H2 (median): 0.001945# Find optimal threshold with weight 10 (FP 10x worse than FN)
threshold_result <- decision_threshold(
datasim = genetic_df,
weight = 10
)
#> Decision threshold is: 2694.0341
# Calculate error rates at the optimal threshold
rates <- threshold_rates(
datasim = genetic_df,
threshold = threshold_result
)
#> FNR = 0.56 ; FPR = 0 ; MCC = 0.5311
# Check rates at different thresholds
cat("\nError rates at different thresholds:\n")
#>
#> Error rates at different thresholds:
for (t in c(1, 10, 100, 1000)) {
r <- threshold_rates(genetic_df, threshold = t)
cat(sprintf("LR > %5d: FPR=%.3f, FNR=%.3f, MCC=%.3f\n",
t, r$FPR, r$FNR, r$MCC))
}
#> FNR = 0.04 ; FPR = 0.24 ; MCC = 0.7348
#> LR > 1: FPR=0.240, FNR=0.040, MCC=0.735
#> FNR = 0.1 ; FPR = 0.04 ; MCC = 0.8616
#> LR > 10: FPR=0.040, FNR=0.100, MCC=0.862
#> FNR = 0.24 ; FPR = 0.01 ; MCC = 0.7707
#> LR > 100: FPR=0.010, FNR=0.240, MCC=0.771
#> FNR = 0.46 ; FPR = 0 ; MCC = 0.6082
#> LR > 1000: FPR=0.000, FNR=0.460, MCC=0.608
# Plot decision curve
plot_decision_curve(
datasim = genetic_df,
LRmax = 10000
)The LR quantifies how many times more likely the evidence is under H1 vs H2:
| Log10(LR) | LR Range | Interpretation |
|---|---|---|
| < 0 | < 1 | Supports H2 (not the MP) |
| 0-1 | 1-10 | Weak support for H1 |
| 1-2 | 10-100 | Support for H1 |
| 2-4 | 100-10,000 | Strong support for H1 |
| > 4 | > 10,000 | Very strong support for H1 |
# Median genetic LR under H1
median_genetic_lr <- median(genetic_df$Related)
# Total LR
total_lr <- median_genetic_lr * lr_nongenetic
log10_total <- log10(total_lr)
cat("Genetic LR (median under H1):", round(median_genetic_lr, 0), "\n")
#> Genetic LR (median under H1): 1364
cat("Non-genetic LR:", round(lr_nongenetic, 2), "\n")
#> Non-genetic LR: 72.1
cat("Total combined LR:", round(total_lr, 0), "\n")
#> Total combined LR: 98318
cat("Log10(Total LR):", round(log10_total, 2), "\n")
#> Log10(Total LR): 4.99For interactive exploration of parameters and their effects, use the Shiny applications:
This workflow demonstrated how to:
The key advantage of combining evidence types is increased discrimination power, particularly useful in cases where genetic evidence alone may be inconclusive due to:
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
#> [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/Argentina/Buenos_Aires
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forrel_1.8.1 pedtools_2.9.0 mispitools_1.4.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyr_1.3.2 sandwich_3.1-1 sass_0.4.10 generics_0.1.4
#> [5] lpSolve_5.6.23 stringi_1.8.7 verbalisr_0.7.2 lattice_0.22-7
#> [9] pROC_1.19.0.1 digest_0.6.39 magrittr_2.0.4 RColorBrewer_1.1-3
#> [13] evaluate_1.0.5 grid_4.5.2 fastmap_1.2.0 plyr_1.8.9
#> [17] jsonlite_2.0.0 Matrix_1.7-4 Formula_1.2-5 purrr_1.2.0
#> [21] scales_1.4.0 pbapply_1.7-4 jquerylib_0.1.4 cli_3.6.5
#> [25] rlang_1.1.6 miscTools_0.6-28 withr_3.0.2 cachem_1.1.0
#> [29] yaml_2.3.12 otel_0.2.0 tools_4.5.2 parallel_4.5.2
#> [33] reshape2_1.4.5 kinship2_1.9.6.2 dplyr_1.1.4 ggplot2_4.0.1
#> [37] maxLik_1.5-2.2 vctrs_0.6.5 R6_2.6.1 zoo_1.8-15
#> [41] lifecycle_1.0.4 stringr_1.6.0 pedprobr_1.0.1 pkgconfig_2.0.3
#> [45] pillar_1.11.1 bslib_0.9.0 gtable_0.3.6 Rcpp_1.1.0
#> [49] glue_1.8.0 pedmut_0.9.0 xfun_0.55 tibble_3.3.0
#> [53] tidyselect_1.2.1 knitr_1.51 farver_2.1.2 DirichletReg_0.7-2
#> [57] patchwork_1.3.2 htmltools_0.5.9 labeling_0.4.3 rmarkdown_2.30
#> [61] compiler_4.5.2 S7_0.2.1 ribd_1.7.1 quadprog_1.5-8Marsico FL, Vigeland MD, Egeland T, Herrera Pinero F (2021). “Making decisions in missing person identification cases with low statistical power.” Forensic Science International: Genetics, 52, 102519.
Marsico FL, et al. (2023). “Likelihood ratios for non-genetic evidence in missing person cases.” Forensic Science International: Genetics, 66, 102891.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.