NAME

rdf2tarql - Convert RDF examples to TARQL scripts

SYNOPSIS

  perl rdf2tarql.pl model.ttl > model.tarql

DESCRIPTION

rdf2tarql converts an RDF example with embedded CSV column names into a TARQL script (query). TARQL is a fast streaming converter from CSV to RDF. We've used it to convert huge files (over 10M rows, 145 columns) using complex TARQL queries (480 lines: 110 prefixes, 33 nodes, 250 triples, 110 binds).

RDF Model

Typically the example is an rdfpuml model that embeds column names, written in parentheses, in URLs and attribute values.

Consider the following example about persons (customers):

    <person/(customer_id)> a :NaturalPerson;
      :id "(customer_id)";
      :firstName "(first_name)";
      :lastName "(last_name)";
      :gender "(gender)";
      :religion "(religion)";
      :hasAddress <person/(customer_id)/address>;
      :hasEvent  <person/(customer_id)/birth>;
      :hasEvent  <person/(customer_id)/education>.

    <person/(customer_id)/address> a :Address;
      :houseNumber "(house_number)";
      :street "(street)";
      :postalCode "(postal_code)";
      :city <country/(country)/city/urlify(city)>;
      :country <country/(country)>.

    <country/(country)/city/urlify(city)> a :City; :country <country/(country)>; :name "(city)".

    <country/(country)> a :Country; :code "(country)".

    <person/(customer_id)/birth> a :BirthEvent; :hasDate "(date_of_birth)"^^xsd:date.

    <person/(customer_id)/education> a :EducationEvent;
      :hasDate "(enrollment_date)"^^xsd:date;
      :university <university/urlify(university)>;
      :degree <degree/urlify(education_degree)>.
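
To see which CSV columns a model like this expects, the parenthesised placeholders can be collected with a regular expression. The following is a minimal Python sketch for illustration, not the tool's Perl code. The names PLACEHOLDER and model_columns are hypothetical, and it assumes column names consist only of letters, digits and underscores.

```python
import re

# Hypothetical helper pattern: matches "(column)" and "urlify(column)".
# Assumes column names contain only letters, digits and underscores.
PLACEHOLDER = re.compile(r'(?:urlify)?\(([A-Za-z0-9_]+)\)')

def model_columns(model_text):
    """Return the distinct column names used in the model, in order of first use."""
    seen = []
    for name in PLACEHOLDER.findall(model_text):
        if name not in seen:
            seen.append(name)
    return seen

# A fragment of the example model above.
model = '''
<person/(customer_id)/address> a :Address;
  :street "(street)";
  :city <country/(country)/city/urlify(city)>;
  :country <country/(country)>.
'''

print(model_columns(model))
# ['customer_id', 'street', 'country', 'city']
```

Comparing this list with the CSV header is a quick way to spot a misspelled column name before running the conversion.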

Generated Construct

The generated TARQL consists of two parts. The first is a CONSTRUCT template that closely mirrors the example (model graph), with each embedded column or URL pattern replaced by a variable:

  construct {
    ?person_URL a :NaturalPerson;
      :id ?customer_id;
      :firstName ?first_name;
      :lastName ?last_name;
      :gender ?gender;
      :religion ?religion;
      :hasAddress ?person_address_URL;
      :hasEvent  ?person_birth_URL;
      :hasEvent  ?person_education_URL.

    ?person_address_URL a :Address;
      :houseNumber ?house_number;
      :street ?street;
      :postalCode ?postal_code;
      :city ?country_city_URL;
      :country ?country_URL.

    ?country_city_URL a :City; :country ?country_URL; :name ?city.

    ?country_URL a :Country; :code ?country.

    ?person_birth_URL a :BirthEvent; :hasDate ?DATE_OF_BIRTH.

    ?person_education_URL a :EducationEvent;
      :hasDate ?ENROLLMENT_DATE;
      :university ?university_URL;
      :degree ?degree_URL.

    ?university_URL a :University; :name ?university.
    ?degree_URL a :AcademicDegree; :name ?education_degree.

Generated Binds

The second part is a set of bindings, generated by:

Using the CSV fields directly (eg ?customer_id),
Computing URLs from patterns (eg ?person_address_URL),
Implementing a urlify() function that replaces each run of characters other than letters and digits with a single _ and removes a leading or trailing _ (eg ?CITY, which is then used in ?country_city_URL),
Implementing datatype casting using strdt() (eg ?DATE_OF_BIRTH).
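
The effect of urlify() can be previewed outside TARQL. The sketch below is a Python approximation under stated assumptions, not the generated code itself. Python's re module has no \p{L} or \p{N} classes, so [\W_]+ is used here as a close stand-in for "anything that is not a letter or digit".

```python
import re

def urlify(value):
    """Approximate the three nested replace() calls that urlify() expands to."""
    # Assumption: [\W_]+ stands in for the TARQL class [^\p{L}\p{N}]+,
    # ie one or more characters that are neither letters nor digits.
    s = re.sub(r'[\W_]+', '_', value)
    s = re.sub(r'^_', '', s)   # drop a leading underscore
    s = re.sub(r'_$', '', s)   # drop a trailing underscore
    return s

print(urlify("New York"))       # New_York
print(urlify(" St. John's "))   # St_John_s
```

Because each run of separators collapses to a single _, there is at most one leading and one trailing underscore left to remove.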

All these binds are generated automatically using some simple conventions:

URL variables are named by joining the constant parts of the URL pattern with _ and appending _URL (eg ?person_birth_URL)
Transformed variables are named with the column name in uppercase (eg ?CITY, ?DATE_OF_BIRTH)
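
These two conventions can be illustrated with a short Python sketch. It is inferred from the examples in this document, not taken from the Perl source. The function names are hypothetical, and it assumes every path segment of a pattern is either entirely constant or entirely a column reference.

```python
def url_variable(pattern):
    """Name a URL variable from the constant path segments of a URL pattern."""
    # Assumption: a segment containing "(" is a column reference such as
    # "(customer_id)" or "urlify(city)"; all other segments are constants.
    constants = [seg for seg in pattern.split("/") if seg and "(" not in seg]
    return "?" + "_".join(constants) + "_URL"

def transformed_variable(column):
    """Name the variable that holds a transformed (urlify or cast) column."""
    return "?" + column.upper()

print(url_variable("person/(customer_id)/birth"))           # ?person_birth_URL
print(url_variable("country/(country)/city/urlify(city)"))  # ?country_city_URL
print(transformed_variable("date_of_birth"))                # ?DATE_OF_BIRTH
```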

Here is the resulting where clause for the example model:

  } where {
    bind(iri(concat("person/",?customer_id)) as ?person_URL)
    bind(iri(concat("person/",?customer_id,"/address")) as ?person_address_URL)
    bind(iri(concat("person/",?customer_id,"/birth")) as ?person_birth_URL)
    bind(iri(concat("person/",?customer_id,"/education")) as ?person_education_URL)
    bind(replace(replace(replace(?city,'[^\\p{L}\\p{N}]+','_'),'^_',''),'_$','') as ?CITY)
    bind(iri(concat("country/",?country,"/city/",?CITY)) as ?country_city_URL)
    bind(iri(concat("country/",?country)) as ?country_URL)
    bind(strdt(?date_of_birth,xsd:date) as ?DATE_OF_BIRTH)
    bind(strdt(?enrollment_date,xsd:date) as ?ENROLLMENT_DATE)
    bind(replace(replace(replace(?university,'[^\\p{L}\\p{N}]+','_'),'^_',''),'_$','') as ?UNIVERSITY)
    bind(iri(concat("university/",?UNIVERSITY)) as ?university_URL)
    bind(replace(replace(replace(?education_degree,'[^\\p{L}\\p{N}]+','_'),'^_',''),'_$','') as ?EDUCATION_DEGREE)
    bind(iri(concat("degree/",?EDUCATION_DEGREE)) as ?degree_URL)
  }
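
Bind lines like the URL and strdt() ones above could be produced from the model's patterns roughly as follows. This is a hedged Python sketch of the idea, not the tool's actual Perl implementation. The names concat_args, url_bind and cast_bind are hypothetical, and the variable name for a URL is passed in by the caller.

```python
import re

# Hypothetical pattern: group 1 is "urlify" when present, group 2 the column.
PLACEHOLDER = re.compile(r'(urlify)?\(([A-Za-z0-9_]+)\)')

def concat_args(pattern):
    """Split a URL pattern into quoted constants and variable references."""
    args, pos = [], 0
    for m in PLACEHOLDER.finditer(pattern):
        if m.start() > pos:
            # Constant text before this placeholder becomes a string literal.
            args.append('"%s"' % pattern[pos:m.start()])
        column = m.group(2)
        # urlify(column) refers to the uppercase (transformed) variable.
        args.append("?" + (column.upper() if m.group(1) else column))
        pos = m.end()
    if pos < len(pattern):
        args.append('"%s"' % pattern[pos:])
    return ",".join(args)

def url_bind(pattern, variable):
    """Build the bind that computes a URL variable from its pattern."""
    return "bind(iri(concat(%s)) as %s)" % (concat_args(pattern), variable)

def cast_bind(column, datatype):
    """Build the bind that casts a column to a datatype with strdt()."""
    return "bind(strdt(?%s,%s) as ?%s)" % (column, datatype, column.upper())

print(url_bind("person/(customer_id)/address", "?person_address_URL"))
print(url_bind("country/(country)/city/urlify(city)", "?country_city_URL"))
print(cast_bind("date_of_birth", "xsd:date"))
```

For the three inputs shown, the printed lines match the corresponding binds in the listing above.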

Prerequisites

TARQL: tested with version 1.2-SNAPSHOT, BUILD_DATE: 2017-12-07T13:33:10Z

See test/customer for a complete example (it includes a Makefile).

Limitations

Don't use uppercase letters in CSV field names, as they may conflict with generated variable names (eg a field named CITY would clash with the generated ?CITY).
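
One way to catch this early is to scan the CSV header before running the conversion. The snippet below is a small Python sketch. The function name and the sample data are made up for illustration.

```python
import csv
import io

def uppercase_fields(header):
    """Return the header names that contain at least one uppercase letter."""
    return [name for name in header if name != name.lower()]

# Made-up sample data; in practice, open the real CSV file instead.
sample = io.StringIO("customer_id,City,gender\n1,Sofia,F\n")
header = next(csv.reader(sample))

print(uppercase_fields(header))
# ['City']
```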

Supports only one simple function, urlify(); more should be added.

Non-ASCII characters in IRIs get converted to ugly escapes.

SEE ALSO

rdfpuml: a tool that generates PlantUML diagrams from RDF examples.

rdf2rml: a tool that generates R2RML transformations from RDF examples.

rdf2ontorefine: a tool that generates OntoRefine SPARQL updates from RDF examples.

AUTHOR

Vladimir Alexiev, Ontotext Corp

Last update: 9-Jun-2020