This vignette illustrates how to detect ambiguity and inconsistency in a merged taxonomy. Start by loading the 2000-row sample dataset that comes with taxonbridge:
library(taxonbridge)
sample <- load_sample()
dim(sample)
#> [1] 2000 20
Next, retrieve all rows that have lineage information in both the GBIF backbone and NCBI:
lineages <- get_lineages(sample)
Then validate the lineages at the kingdom and family taxonomic ranks, and collect the resulting tibbles in a list. Note that phylum, class, and order may also be used. In this example, setting valid = FALSE returns the entries that failed validation.
kingdom <- get_validity(lineages, rank = "kingdom", valid = FALSE)
#> Term conversion carried out on kingdom taxonomic rank
family <- get_validity(lineages, rank = "family", valid = FALSE)
candidates <- list(kingdom, family)
Finally, detect candidate incongruencies (excluding those with uninomial scientific names):
get_inconsistencies(candidates, uninomials = FALSE)
#> [1] "Gordonia neofelifaecis" "Attheya septentrionalis"
Two binomial names exhibit incongruency. Consulting the literature and the individual entries shows that:
Attheya septentrionalis is assigned to different families of the problematica order Chaetocerotales
Gordonia neofelifaecis is a plant (family: Theaceae) in the GBIF but a bacterium in the NCBI (family: Gordoniaceae)
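The Gordonia neofelifaecis case can be pictured as a family-level disagreement between the two sources. The following is a minimal base R sketch on a toy data frame, not taxonbridge's implementation: the column names gbif_family and ncbi_family are hypothetical, and the Homo sapiens row is an illustrative concordant entry that is not taken from the sample dataset.

```r
# Toy data frame; gbif_family and ncbi_family are hypothetical column names.
toy <- data.frame(
  canonicalName = c("Gordonia neofelifaecis", "Homo sapiens"),
  gbif_family   = c("Theaceae", "Hominidae"),
  ncbi_family   = c("Gordoniaceae", "Hominidae"),
  stringsAsFactors = FALSE
)

# Keep the names whose family differs between the two sources.
mismatched <- toy$canonicalName[toy$gbif_family != toy$ncbi_family]
mismatched
```

Only the entry whose family assignments disagree is kept, which is the kind of candidate that warrants a manual check against the literature.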
Attheya septentrionalis has the status “synonym” in the GBIF data:
lineages[lineages$canonicalName == "Attheya septentrionalis", "taxonomicStatus"]
#> # A tibble: 1 × 1
#> taxonomicStatus
#> <chr>
#> 1 synonym
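The same lookup can be done for several flagged names at once. The sketch below uses a small toy data frame as a stand-in for lineages so that it runs without the package. Only the canonicalName and taxonomicStatus columns come from the vignette, and the "accepted" status for Gordonia neofelifaecis is an assumption made for illustration.

```r
# Toy stand-in for `lineages`; the "accepted" value is an assumption.
toy_lineages <- data.frame(
  canonicalName   = c("Attheya septentrionalis", "Gordonia neofelifaecis"),
  taxonomicStatus = c("synonym", "accepted"),
  stringsAsFactors = FALSE
)

# Names returned by the inconsistency check.
flagged <- c("Gordonia neofelifaecis", "Attheya septentrionalis")

# Look up the status of every flagged name in one subsetting call.
status <- toy_lineages[toy_lineages$canonicalName %in% flagged,
                       c("canonicalName", "taxonomicStatus")]
status
```

With the real data, replacing toy_lineages with lineages should give the equivalent table, assuming both names are present in it.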
Applying the get_status() function to keep only accepted names and rerunning the exercise leaves only Gordonia neofelifaecis as a binomial incongruency with biological provenance:
lineages <- get_status(get_lineages(sample), status = "accepted")
kingdom <- get_validity(lineages, rank = "kingdom", valid = FALSE)
#> Term conversion carried out on kingdom taxonomic rank
family <- get_validity(lineages, rank = "family", valid = FALSE)
candidates <- list(kingdom, family)
get_inconsistencies(candidates, uninomials = FALSE)
#> [1] "Gordonia neofelifaecis"
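To make explicit which name the status filter resolved, the two result vectors can be compared with base R's setdiff(). This is a small sketch in which the vectors are typed in by hand from the outputs shown above, and the variable names are illustrative.

```r
# Results copied by hand from the two runs of get_inconsistencies().
flagged_before <- c("Gordonia neofelifaecis", "Attheya septentrionalis")
flagged_after  <- c("Gordonia neofelifaecis")

# Names flagged before the status filter but not after it.
resolved <- setdiff(flagged_before, flagged_after)
resolved
```

In practice the two vectors would be captured by assigning the return value of each get_inconsistencies() call to a variable instead of retyping the names.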