RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Table of Contents

Motivation

RDF is a graph data model:

  • The best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram.
  • Many RDF visualization tools exist, but they either
    • Focus on large graphs (where the details are not easily visible), or
    • Visualization results are not satisfactory, or
    • Manual tweaking of the diagrams is required

Developed a tool rdfpuml that makes true diagrams directly from Turtle examples

Benefits:

  • Guarantees consistency between said (triple statements) and shown (diagram)
  • Lets you focus on domain modeling rather than diagram layouting/tweaking
  • Enables efficient source control
  • Saves you lots of effort

Simple Example: Getty CONA: Object Discovery

Source Turtle (puml: are diagram control triples, explained further)

cona:456-thing crm:P12i_was_present_at cona:456-discovery-123.
cona:456-discovery-123 a crmx:Discovery;
  crm:P2_has_type <cona/production/discovery>;
  crm:P4_has_time-span cona:456-discovery-123-date.
cona:456-discovery-123-date a crm:E52_Time-Span;
  crm:P82a_begin_of_the_begin "1800"^^xsd:gYear;
  crm:P82b_end_of_the_end "1850"^^xsd:gYear.
########################################
<cona/production/discovery> a puml:Inline.

CONA-production-discovery-Simple.png

Simple Example: Generated PlantUML

Takes care of many hairy details, eg node name sanitization, collecting property names & values, inlining, Reification/Association, replacing () with [] (puml quirk)…

@startuml
hide empty methods
hide empty attributes
hide circle
skinparam classAttributeIconSize 0
class cona_456_discovery_123 as "cona:456-discovery-123"
cona_456_discovery_123 : a crmx:Discovery
class cona_456_discovery_123_date as "cona:456-discovery-123-date"
cona_456_discovery_123 --> cona_456_discovery_123_date : crm:P4_has_time-span
cona_456_discovery_123 : crm:P2_has_type cona:production/discovery
cona_456_discovery_123_date : a crm:E52_Time-Span
cona_456_discovery_123_date : crm:P82a_begin_of_the_begin "1800"^^xsd:gYear
cona_456_discovery_123_date : crm:P82b_end_of_the_end "1850"^^xsd:gYear
class cona_456_thing as "cona:456-thing"
cona_456_thing --> cona_456_discovery_123 : crm:P12i_was_present_at
@enduml

Diagram Features

Diagram readability is a prime concern. Some measures:

  • Shorten URLs aggressively. Some prefixed names you see in the diagram are not valid Turtle
  • Sort properties by name (rdf:type comes first)
  • Show literal types, use Turtle shortcuts (eg 1 instead of "1"^^xsd:integer)
  • Collect property values together; collect "parallel" properties together
  • Inlining: show object in the subject node, not as separate node
    • Types and literals are inlined automatically
    • Inline resource (optional label): used often for lookup values
      <cona/event/competition> a puml:Inline.
      cona_contrib:10000000 a puml:Inline; rdfs:label "VP".
      
    • Inline property: show all objects of a property inlined
      fn:annotationSetFrame a puml:InlineProperty. 
      
  • Namespace puml: used for diagram control (usually written after ####)

Collect "Parallel" Property Names

  • Multiple property instances between nodes are collected in one arrow and shown as several labels
  • Inverse arrows work fine

CONA-arel-replaced-MultipleArrowLabels.png

Arrow Customization

Customize arrow for one relation or all prop instances

<subj> puml:$dir-$head-$line <obj> .
<prop> puml:arrow puml:$dir-$head-$line .
  • $dir: left, right, up, down (default)
  • $head: none="", tri="|>", star="*", o="o" (default=normal arrow)
  • $line: dashed (default is solid)
  • Each part is optional. Example:
cona:456-production puml:left cona:456-motivation-123.

CONA-production-motivation-ArrowDir.png

Stereotypes and Colored Circles

Stereotype: UML lingo for «guillemetted name» and colored circle.

  • Set on individual node or whole class
iso:ThesaurusArray puml:stereotype "(A,red)".
gvp:GuideTerm      puml:stereotype "(G,green)".
gvp:Concept        puml:stereotype "(C,lightblue)".

GVP-ordered-collection-Stereotype.png

Reification

  • <s> <p> <o> represented as node with 3 addressing props plus extra data
<node> a <ReificationClass>;
  <subjectProp>  <s>;
  <shortcutProp> <p>;
  <objectProp>   <o>;
  # extra data, eg dct:created, dct:creator
  • RDF Reification is used most often (rdf:Statement)
  • CIDOC CRM (with extension crmx:property or bmo:PX_property)
  • Property Reification Vocabulary allows us to express this data formally:
ReificationClass subjectProp shortcutProp objectProp prop (shortcut)
rdf:Statement rdf:subject rdf:predicate rdf:object any prop
crm:E13_Attribute _Assignment crm:P140_assigned _attribute_to crmx:property crm:P141_assigned any CRM prop
crm:E14_Condition _Assessment crm:P34_concerned crmx:property crm:P35_has_identified crm:P44_has_condition
crm:E15_Identifier _Assignment crm:P140_assigned _attribute_to crmx:property crm:P37_assigned, crm:P38_deassigned crm:P1_is_identified_by, crm:P102_has_title
crm:E16_Measurement crm:P39_measured crmx:property crm:P40_observed _dimension crm:P43_has_dimension
crm:E17_Type _Assignment crm:P41_classified crmx:property crm:P42_assigned crm:P2_has_type or subprop

Reification Nodes

  • Automatically discover reifications as per the previous table and make PlantUML Association Class
  • Show addressing properties with arrows (← : →) pointing to the subject, shortcut, object
  • Limit: no more than 2 reifications per relation (and the second one is ugly). So use puml:noReify to avoid making reification

CONA-arel-Reification.png

Examples: Getty CONA: Contribs and Sources

Aggressive URL shortening, collecting multiple values, value comments

CONA-contribs-sources.png

Getty CONA: Iconographic Authority

Arrow direction, reification

CONA-ia_hercules-Complex.png

American Art Collaborative: Mapping Museum Data to CIDOC CRM

One option how to map "cast after" (from the same mold)

AAC-NPG-castAfter.png

FRBRoo: Don Quixote Example (Europeana EFAP Task Force)

FRBRoo-DonQuixote.png

EHRI: Person Network

Jewish social networks: CIDOC CRM and AgRelOn

EHRI-person-network.png

Multisensor: Video Annotation & Text Annotation (NIF)

MS-video-and-text-NIF.png

Multisensor: Social Network Analysis

Global influence and reachability

MS-social-network.png

Multisensor: Image Annotation with OpenAnnotation+FISE

FISE (Stanbol) is used to express Confidence

MS-OpenAnnotation-and-Stanbol-FISE.png

Multisensor: News Item and its Translation

Stereotypes (circle, «italic»), arrow direction

MS-translation.png

Multisensor: FrameNet Embedded in NIF

Made with PlantUML "by hand", not rdfpuml. Uses PlantUML Packages

MS-FrameNet-Embedding-in-NIF.png

Multisensor: FrameNet+NIF (Part 1)

MS-Frame-complex-part1.png

Multisensor: FrameNet+NIF (Part 2)

MS-Frame-complex-part2.png

OpenAnnotation: Example 44

Blank nodes, RDF lists

OA-eg44.png

Duraspace PCDM Metadata

PCDM_Multi_Page_Text-circles.png

Video Annotation: LinkedTV NISV Use Case

LinkedTV-Fig22-NISV-scenario.png

Video Annotation: Generated Diagram

linkedtv-nisv.png

ICIJ Offshore Leaks (RDF by Ontotext)

leaks-model.png

Offshore Leaks Evolution

Model is evolving as we unravel more of the data (github "image diff")

leaks-model-evolution.png

R2RML Generation

R2RML is the W3C standard for RDBMS->RDF conversion

  • It is quite verbose and requires semantic experience to write
  • What if we could describe RDBMS mapping in our examples?
  • We can: include SQL queries and embedded field names

rdf2rdb generates R2RML transformations from examples:

  • Saves about 15x in complexity
  • Ensures consistency of model and actual conversion

Getty Museum: Exhibitions Model

exhibitions.png

Getty Exhibitions: Generated R2RML

exhibitions.r2rml.png

Getty Exhibitions: Relational Data

constituents

constituentid constituent
1 'Getty Museum'
2 'MoMA'
3 'LACMA'

exhibitions

exhibitionid exhdepartment exhtitle displaydate beginisodate endisodate
123 53 'Getty through the ages' 'October 2016' '2016-10-01' '2016-10-30'

exhvenuesxrefs

exhvenxref exhid conid conaddrid approved dispord displaydate beginisodate endisodate
202 123 2 102 1 1 'Early Oct 2016' '2016-10-01' '2016-10-15'
203 123 3 103 1 2 'Late Oct 2016' '2016-10-16' '2016-10-30'

exhvenobjxrefs

exhvenuexrefid objectid catalognumber begindispldateiso enddispldateiso displayed
202 1001 'cat 1001' '2016-10-01' '2016-10-15' 1
203 1001 'cat 1001' '2016-10-16' '2016-10-30' 1
202 1002 'cat 1002' '2016-10-01' '2016-10-15' 1

Getty Exhibitions: Converted RDF Data

exhibitions-out.png

Getty Museum: Museum Object

JPGM-objects.png

Future Work

R2RML works great for RDBMS, but how about other sources?

Working to extend rdf2rml to generate:

  • RML: extends R2RML to handle RDB, XML, JSON, CSV
  • XSPARQL: extends XQuery with SPARQL construct and JSON input
  • tarql: handles TSV/CSV with SPARQL construct

ClinicalTrials.gov: Clinical Study Results (XML)

CT.png

Dun & Bradstreet (Fix-Width) to Fin. Ind. Business Ontology

DnB-bussf.png

Global Legal Entity Indentifier (XML) to FIBO

GLEI-FIBO.png

Farther Future

  • Modularize & package better
  • Integrate in Emacs org-mode: write Turtle, see diagram
  • Extend RDF by Example to describe & generate RDF Shapes
  • Another tool to visualize RDF Shapes (SHACL and Shex)

Date: 2016-11-29

Author: Vladimir Alexiev

Created: 2016-11-28 Mon 22:29

Emacs 25.0.50.1 (Org mode 8.2.10)

Validate