Map new concepts

library(ontologics)
library(dplyr, warn.conflicts = FALSE)
library(knitr)  # provides kable(), used below to print the results

An existing ontology can be used in several ways: by looking up which concepts it contains and how they relate to each other, by adding new concepts and linking them to existing ones, or by extending the harmonized concepts based on external ontologies.

Relate an external ontology to harmonized concepts

When adding (or mapping) new concepts, we first have to define the source of the new concepts.

# already existing ontology for some project about crops
crops <- load_ontology(path = system.file("extdata", "crops.rds", package = "ontologics"))

# where we have to set the external dataset as new source
crops <- new_source(name = "externalDataset",
                    version = "0.0.1",
                    description = "a vocabulary",
                    homepage = "https://www.something.net",
                    license = "CC-BY-4.0",
                    ontology = crops)

# new concepts that occur in the external dataset, which should be harmonised with the ontology
newConcepts <- tibble(label = c("Wheat", "NUTS", "Avocado"))

The new concepts come from different conceptual levels: ‘Wheat’ and ‘Avocado’ are crops themselves, while ‘NUTS’ is an aggregate of various crops (such as walnut, hazelnut, etc.). Let’s first find out whether these concepts are in fact new, that is, whether they are missing from the ontology.

missingConcepts <- get_concept(x = newConcepts, ontology = crops)
kable(missingConcepts)
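| label   | class | id     | has_broader | description |
|:--------|:------|:-------|:------------|:------------|
| Wheat   | class | .02.09 | .02         | NA          |
| NUTS    | group | .10    | NA          | NA          |
| Avocado | NA    | NA     | NA          | NA          |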

This tells us that both ‘NUTS’ and ‘Wheat’ already exist in the ontology, but not at the level we need: ‘Wheat’ is a class rather than a crop, and ‘NUTS’ is a group without any broader concept. ‘Avocado’ is missing from the ontology entirely. Since neither the crop wheat nor the crop avocado exists yet, both have to be defined as new concepts.
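The same reading of the lookup result can be done programmatically. The following is a minimal sketch, assuming the result has the columns shown in the printed output; the tibble is retyped by hand from that output, and the names lookup, notFound and otherLevel are purely illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# Copy of the lookup result printed above, typed in by hand for illustration.
lookup <- tibble(
  label       = c("Wheat", "NUTS", "Avocado"),
  class       = c("class", "group", NA),
  id          = c(".02.09", ".10", NA),
  has_broader = c(".02", NA, NA),
  description = NA_character_
)

# Labels without an id were not found in the ontology at all.
notFound <- lookup %>%
  filter(is.na(id)) %>%
  pull(label)

# Labels that were found, but at a level other than "crop".
otherLevel <- lookup %>%
  filter(!is.na(id), class != "crop") %>%
  pull(label)

notFound
otherLevel
```

Concepts in notFound need a new harmonized concept in any case; those in otherLevel need one only if the external concept really is a crop rather than a class or group.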

By studying the ontology (not shown here), we can identify the semantic relation between the new concepts and the already harmonized concepts. To map a concept from an external ontology, a corresponding harmonized concept must already exist in the ontology, so we create those first. For the new harmonized concepts we use lowercase labels (‘wheat’, ‘avocado’) to distinguish them from the external concepts, and we nest them as narrower concepts under the respective existing (broader) harmonized concepts.

broaderConcepts <- get_concept(x = tibble(label = c("Wheat", "Tropical and subtropical Fruit")), 
                               ontology = crops)

crops <- new_concept(new = c("wheat", "avocado"),
                     broader = broaderConcepts,
                     class = "crop",
                     ontology = crops)

Eventually, all new (external) concepts can be mapped to harmonized concepts. This also applies to ‘NUTS’, even though a concept with that label already exists, because the existing harmonized concept ‘NUTS’ is not necessarily the same as the external concept ‘NUTS’; that depends on the respective descriptions of the two. When setting a new mapping, the type and the certainty of the match have to be defined. ‘Wheat’ is a close match to the harmonized ‘wheat’, because the two concepts are closely related. ‘NUTS’ is, after checking the definitions, also a close match. ‘Avocado’ is mapped to the new harmonized concept ‘avocado’ and is therefore also recorded as a close match in the call below; mapping it directly to ‘Tropical and subtropical Fruit’ would instead call for the match type broad, because that class is broader than avocado.

toMap <- get_concept(x = tibble(label = c("wheat", "NUTS", "avocado")),
                     ontology = crops)

crops <- new_mapping(new = newConcepts %>% pull(label),
                     target = toMap,
                     match = c("close", "close", "close"),
                     source = "externalDataset",
                     certainty = 3,
                     ontology = crops)
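Because the external labels, the target concepts and the match types are passed as separate arguments, it can help to lay them out side by side before creating the mapping. The following is a minimal sketch of such a check. It assumes that the three arguments are paired by position and that the match types follow the SKOS mapping vocabulary; the helper check_mapping() and the vector allowed are hypothetical and not part of ontologics.

```r
library(dplyr, warn.conflicts = FALSE)

# External labels and the harmonized labels they should be mapped to,
# in the same order as in the new_mapping() call above.
mapping <- tibble(
  external   = c("Wheat", "NUTS", "Avocado"),
  harmonized = c("wheat", "NUTS", "avocado"),
  match      = c("close", "close", "close")
)

# Assumption: match types follow the SKOS mapping vocabulary.
allowed <- c("exact", "close", "broad", "narrow")

# Hypothetical helper: stop early when a row is incomplete
# or a match type is not in the assumed vocabulary.
check_mapping <- function(x, allowed) {
  stopifnot(!anyNA(x$external), !anyNA(x$harmonized))
  unknown <- setdiff(x$match, allowed)
  if (length(unknown) > 0) {
    stop("unknown match type(s): ", paste(unknown, collapse = ", "))
  }
  invisible(x)
}

check_mapping(mapping, allowed)
```

Building the table with tibble() already guarantees that the three columns have the same length, so a mismatch in the number of labels and match types surfaces before the ontology is modified.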

Extend the classes

It may also be that we want to (or have to) add new concepts for which no class is defined yet. This can happen when new concepts have to be nested into concepts that already sit at the lowest hierarchical level for which a class has been defined. When such concepts are specified with class = NA, with a class vector that contains NA (for example class = c("bla", "blubb", NA)), or with class = NULL, they are assigned the class ‘undefined’, and a warning explains how to proceed.

broaderConcepts <- get_concept(x = tibble(label = c("wheat", "wheat")),
                               ontology = crops)

# for (some of) these concepts we do not know the class ...
crops <- new_concept(new = c("wheat1", "wheat2"),
                     broader = broaderConcepts,
                     class = NA_character_,
                     ontology = crops)
#> Warning: some new concepts (wheat1, wheat2) don't have a class; please define
#> this with 'new_class()' and re-run 'new_concept()' with these concepts and the
#> new class.

get_concept(x = tibble(label = "Wheat"), tree = TRUE, ontology = crops)[,1:6] %>% 
  kable()
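| label  | class     | id           | has_broader | description | has_close_match     |
|:-------|:----------|:-------------|:------------|:------------|:--------------------|
| Wheat  | class     | .02.09       | .02         | NA          | NA                  |
| wheat  | crop      | .02.09.01    | .02.09      | NA          | externalDataset_1.3 |
| wheat1 | undefined | .02.09.01.01 | .02.09.01   | NA          | NA                  |
| wheat2 | undefined | .02.09.01.02 | .02.09.01   | NA          | NA                  |
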
# ... ok, then let's specify that class and re-run new_concept
crops <- new_class(new = "cultivar", target = "crop", 
                   description = "type of plant that people have bred for desired traits", 
                   ontology = crops)

crops <- new_concept(new = c("wheat1", "wheat2"),
                     broader = broaderConcepts,
                     class = "cultivar",
                     ontology = crops)

Now we can check whether the updated ontology looks as expected, for example by looking at the trees of the respective concepts again. The new harmonized concepts should now appear in the ontology, and those that were mapped should show a link to the external concept in the has_close_match column.

get_concept(x = tibble(label = "Wheat"), tree = TRUE, ontology = crops)[,1:6] %>% 
  kable()
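| label  | class    | id           | has_broader | description | has_close_match     |
|:-------|:---------|:-------------|:------------|:------------|:--------------------|
| Wheat  | class    | .02.09       | .02         | NA          | NA                  |
| wheat  | crop     | .02.09.01    | .02.09      | NA          | externalDataset_1.3 |
| wheat1 | cultivar | .02.09.01.03 | .02.09.01   | NA          | NA                  |
| wheat2 | cultivar | .02.09.01.04 | .02.09.01   | NA          | NA                  |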

get_concept(x = tibble(label = "NUTS"), tree = TRUE, ontology = crops)[,1:6] %>% 
  kable()
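| label      | class | id     | has_broader | description | has_close_match     |
|:-----------|:------|:-------|:------------|:------------|:--------------------|
| NUTS       | group | .10    | NA          | NA          | externalDataset_3.3 |
| Treenuts   | class | .10.01 | .10         | NA          | NA                  |
| Other nuts | class | .10.02 | .10         | NA          | NA                  |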

get_concept(x = tibble(label = "FRUIT"), tree = TRUE, ontology = crops)[,1:6] %>% 
  kable()
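| label                          | class | id        | has_broader | description | has_close_match     |
|:-------------------------------|:------|:----------|:------------|:------------|:--------------------|
| FRUIT                          | group | .06       | NA          | NA          | NA                  |
| Berries                        | class | .06.01    | .06         | NA          | NA                  |
| Citrus Fruit                   | class | .06.02    | .06         | NA          | NA                  |
| Grapes                         | class | .06.03    | .06         | NA          | NA                  |
| Pome Fruit                     | class | .06.04    | .06         | NA          | NA                  |
| Stone Fruit                    | class | .06.05    | .06         | NA          | NA                  |
| Tropical and subtropical Fruit | class | .06.06    | .06         | NA          | NA                  |
| avocado                        | crop  | .06.06.01 | .06.06      | NA          | externalDataset_2.3 |
| Other fruit                    | class | .06.07    | .06         | NA          | NA                  |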
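
The id and has_broader columns in the printed trees suggest that the id of a concept encodes its position in the hierarchy: dropping the last segment of an id gives the id of its broader concept. The following is a minimal base R sketch of that reading. It is inferred from the output shown here and is not a documented guarantee of the package; the ids are retyped by hand from the ‘Wheat’ tree, and parent_id() is a hypothetical helper.

```r
# ids and broader ids copied from the 'Wheat' tree printed above
id         <- c(".02.09", ".02.09.01", ".02.09.01.03", ".02.09.01.04")
hasBroader <- c(".02", ".02.09", ".02.09.01", ".02.09.01")

# Hypothetical helper: drop the last ".xx" segment of an id
# to get the id of the broader concept.
parent_id <- function(id) sub("\\.[^.]+$", "", id)

# Nesting depth is the number of segments in the id
# (the leading dot produces one empty element, hence the - 1).
depth <- lengths(strsplit(id, ".", fixed = TRUE)) - 1L

# Does the derived parent agree with the has_broader column?
all(parent_id(id) == hasBroader)
depth
```

A check like this can serve as a quick consistency test after adding concepts, for instance to confirm that new cultivars ended up one level below the crop they belong to.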