SHACL: Shaping the Big Ball of Data Mud

Shaping the Big Ball of Data Mud W3C's Shapes Constraint Language (SHACL) Richard Cyganiak Lotico Berlin Semantic Web Meetup, 17 November 2016

Upload: richard-cyganiak

Post on 15-Apr-2017


TRANSCRIPT

Page 1: SHACL: Shaping the Big Ball of Data Mud

Shaping the Big Ball of Data Mud

W3C's Shapes Constraint Language (SHACL)

Richard Cyganiak

Lotico Berlin Semantic Web Meetup, 17 November 2016

Page 2: SHACL: Shaping the Big Ball of Data Mud

Semantic Web: RDF, SPARQL, OWL, RDFS

Page 3: SHACL: Shaping the Big Ball of Data Mud

RDF, SPARQL, OWL, RDFS

Page 4: SHACL: Shaping the Big Ball of Data Mud

Strengths
• Flexible can-say-anything data model
• Merging data is trivial
• Shared, explicit meaning thanks to URIs
• Mixing and matching of schemas; partial understanding
• Painstakingly developed vocabularies
• “Neutral ground” for modelling
• SPARQL

Weaknesses
• Overgeneralisation: works for anything, but great at nothing
• “RDF tax”
• Logic foundations and web foundations can be baggage
• Maps poorly to common programming language data structures
• Schemaless nature makes optimisation difficult
• Not good at semi-structured

Page 5: SHACL: Shaping the Big Ball of Data Mud

Application Areas
• Knowledge graphs
• Publishing
• Life sciences
• Fraud detection & identity management
• Data integration & analysis

The V’s of Big Data: Volume, Velocity, Variety

Page 6: SHACL: Shaping the Big Ball of Data Mud

https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/

Page 7: SHACL: Shaping the Big Ball of Data Mud
Page 8: SHACL: Shaping the Big Ball of Data Mud

RDF, SPARQL, OWL, RDFS

Validation? Constraint checking?

Page 9: SHACL: Shaping the Big Ball of Data Mud

RDF is supposedly self-describing.

RDF

Page 10: SHACL: Shaping the Big Ball of Data Mud

Schema.org

Page 11: SHACL: Shaping the Big Ball of Data Mud

Simple Knowledge Organization System (SKOS)

Page 12: SHACL: Shaping the Big Ball of Data Mud

Dublin Core

Page 13: SHACL: Shaping the Big Ball of Data Mud

Data Cube Vocabulary

Page 14: SHACL: Shaping the Big Ball of Data Mud

R2RML

Page 15: SHACL: Shaping the Big Ball of Data Mud

Linked Data Platform (LDP)

Page 16: SHACL: Shaping the Big Ball of Data Mud

Why is RDFS not enough?

RDF, SPARQL, OWL, RDFS

Page 17: SHACL: Shaping the Big Ball of Data Mud

Why is RDFS not enough?
• RDF “Schema”, and schemas are for validation, right?
• It’s a misnomer; should be “RDF Vocabulary Definition Language”
• Very limited expressivity
• Not the right semantics for validation
• ex:capital rdfs:range ex:City. ex:Berlin ex:capital ex:Germany. => …?
  => ex:Germany a ex:City
• Invalid data -> infer more invalid data

RDFS
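The range example on this slide can be replayed in a few lines. This is a minimal sketch in Python, assuming triples are modelled as plain string tuples; `infer_range_types` is a hypothetical helper written for illustration, not part of any RDF library, and it applies only the rdfs:range entailment rule.

```python
# Toy RDFS range entailment over triples modelled as (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

def infer_range_types(triples):
    """Return the rdf:type triples entailed by the rdfs:range declarations."""
    ranges = [(s, o) for s, p, o in triples if p == RDFS_RANGE]
    return {
        (o, RDF_TYPE, cls)
        for prop, cls in ranges
        for s, p, o in triples
        if p == prop
    }

data = {
    ("ex:capital", RDFS_RANGE, "ex:City"),
    # Subject and object are the wrong way round, as on the slide.
    ("ex:Berlin", "ex:capital", "ex:Germany"),
}

print(infer_range_types(data))
```

Nothing is rejected here: the mistaken statement simply produces a new, equally mistaken type statement about ex:Germany.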

Page 18: SHACL: Shaping the Big Ball of Data Mud

Why is OWL not enough?

RDF, SPARQL, OWL, RDFS

Page 19: SHACL: Shaping the Big Ball of Data Mud

Why is OWL not enough?
• De facto a constraint language: logical contradiction => invalid
• Very expressive
• But targeted at logic modelling, not validity constraints
• Not the right semantics for validation
• ex:Dublin ex:inCountry ex:Ireland, ex:USA => …?
  => ex:Ireland owl:sameAs ex:USA
• Open world assumption
• No unique name assumption

OWL
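The slide does not show which axiom triggers the inference. The sketch below assumes ex:inCountry has been declared functional (at most one value); that declaration and all function names are illustrative assumptions. It contrasts the OWL-style reading, which infers that the two values are the same thing, with the closed-world reading a validator would want.

```python
# Assumption (not on the slide): ex:inCountry is declared functional.
FUNCTIONAL = {"ex:inCountry"}

data = [
    ("ex:Dublin", "ex:inCountry", "ex:Ireland"),
    ("ex:Dublin", "ex:inCountry", "ex:USA"),
]

def values_by_subject(triples, functional_props):
    """Group the distinct values of functional properties by (subject, property)."""
    grouped = {}
    for s, p, o in triples:
        if p in functional_props:
            grouped.setdefault((s, p), set()).add(o)
    return grouped

def owl_style_inference(triples, functional_props):
    """Open world, no unique names: two values must denote the same thing."""
    inferred = set()
    for values in values_by_subject(triples, functional_props).values():
        ordered = sorted(values)
        for a, b in zip(ordered, ordered[1:]):
            inferred.add((a, "owl:sameAs", b))
    return inferred

def validation_style_check(triples, functional_props):
    """Closed-world reading: more than one value is reported as a problem."""
    return [
        (s, p, sorted(values))
        for (s, p), values in values_by_subject(triples, functional_props).items()
        if len(values) > 1
    ]

print(owl_style_inference(data, FUNCTIONAL))
print(validation_style_check(data, FUNCTIONAL))
```

The same two triples give an inference in one reading and an error report in the other, which is the mismatch the slide points at.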

Page 20: SHACL: Shaping the Big Ball of Data Mud

ICV: OWL closed-world semantics in Stardog

Page 21: SHACL: Shaping the Big Ball of Data Mud

Why is SPARQL not enough?

RDF, SPARQL, OWL, RDFS

Page 22: SHACL: Shaping the Big Ball of Data Mud

Why is SPARQL not enough?

SPARQL

Page 23: SHACL: Shaping the Big Ball of Data Mud

http://spinrdf.org/

Page 24: SHACL: Shaping the Big Ball of Data Mud

Why is SPARQL not enough?
• SPARQL ASK seems ideal for constraint validation
• Very expressive
• Efficient implementations
• But writing even simple constraints can be tedious

SPARQL

Page 25: SHACL: Shaping the Big Ball of Data Mud

Other proposals

Page 26: SHACL: Shaping the Big Ball of Data Mud

ShEx — Shape Expressions

http://shex.io/

Page 27: SHACL: Shaping the Big Ball of Data Mud

So, something new?

RDF, SPARQL, OWL, RDFS

Validation? Constraint checking?

Page 28: SHACL: Shaping the Big Ball of Data Mud

SHACL: Shapes Constraint Language

Page 29: SHACL: Shaping the Big Ball of Data Mud

SHACL Overview
• A language for “checking RDF graphs against conditions”
• Produced by W3C Data Shapes Working Group
• Work in progress, some features at risk
• 4th Working Draft: August 2016
• Should be done by June 2017
• Like RDFS and OWL, SHACL constraints are themselves written in RDF
• SPARQL underneath (for evaluation semantics and extensibility)

Page 30: SHACL: Shaping the Big Ball of Data Mud

ex:PersonShape
    a sh:Shape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:predicate ex:ssn ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
    ] ;
    sh:property [
        sh:predicate ex:child ;
        sh:class ex:Person ;
        sh:nodeKind sh:IRI ;
    ] ;
    sh:property [
        sh:path [ sh:inversePath ex:child ] ;
        sh:name "parent" ;
        sh:maxCount 2 ;
    ] .
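To make the meaning of this shape concrete, here is a rough Python sketch of two of its three property constraints: the ex:ssn checks and the "at most two parents" check on the inverse of ex:child. It is a hand-written approximation, not a SHACL engine. The tuple encoding of literals, the sample data, and the helper names are all assumptions, and the ex:child class and node-kind checks are left out.

```python
import re

XSD_STRING = "xsd:string"
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Literals are (lexical form, datatype) tuples; IRIs are plain strings.
data = [
    ("ex:Alice", "ex:ssn", ("987-65-4321", XSD_STRING)),
    ("ex:Bob", "ex:ssn", ("123-45-678", XSD_STRING)),
    ("ex:Bob", "ex:ssn", ("123-45-6789", XSD_STRING)),
    ("ex:Carol", "ex:child", "ex:Alice"),
    ("ex:Dave", "ex:child", "ex:Alice"),
    ("ex:Erin", "ex:child", "ex:Alice"),
]

def check_ssn(triples, focus):
    """Check the ex:ssn values of focus: maxCount 1, datatype xsd:string, pattern."""
    values = [o for s, p, o in triples if s == focus and p == "ex:ssn"]
    problems = []
    if len(values) > 1:
        problems.append("more than one ex:ssn")
    for value in values:
        if not (isinstance(value, tuple) and value[1] == XSD_STRING):
            problems.append(f"{value!r} is not an xsd:string literal")
        elif not SSN_PATTERN.search(value[0]):
            problems.append(f"{value[0]!r} does not match the pattern")
    return problems

def check_parents(triples, focus):
    """Inverse path: count incoming ex:child links and allow at most two."""
    parents = {s for s, p, o in triples if p == "ex:child" and o == focus}
    return ["more than two parents"] if len(parents) > 2 else []

def validate_person(triples, focus):
    return check_ssn(triples, focus) + check_parents(triples, focus)

for person in ("ex:Alice", "ex:Bob"):
    print(person, validate_person(data, person))
```

In this sample, ex:Alice fails only the parent count, while ex:Bob fails both the cardinality and the pattern check on ex:ssn.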

Page 31: SHACL: Shaping the Big Ball of Data Mud

How a Shape works

Diagram: Dimitris Kontokostas

Page 32: SHACL: Shaping the Big Ball of Data Mud

Targets: Initial selection of focus nodes
• Node target
• Class instance target
• Subjects-of target
• Objects-of target
• SPARQL-based selection (advanced)
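The first four target types amount to simple selections over the data graph. A minimal sketch, assuming tuple-based triples and a hypothetical `focus_nodes` helper; the class target here ignores subclass relationships, and SPARQL-based selection is not covered.

```python
data = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:child", "ex:Carol"),
]

def focus_nodes(triples, kind, value):
    """Select the focus nodes for one target declaration."""
    if kind == "node":
        # A node target names the focus node directly.
        return {value}
    if kind == "class":
        # Simplification: direct instances only, no rdfs:subClassOf reasoning.
        return {s for s, p, o in triples if p == "rdf:type" and o == value}
    if kind == "subjectsOf":
        return {s for s, p, o in triples if p == value}
    if kind == "objectsOf":
        return {o for s, p, o in triples if p == value}
    raise ValueError(f"unknown target kind: {kind}")

print(focus_nodes(data, "class", "ex:Person"))
print(focus_nodes(data, "objectsOf", "ex:child"))
```

Whatever set comes back is then checked node by node against the shape's constraints.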

Page 33: SHACL: Shaping the Big Ball of Data Mud

Node constraints

Constraints about the focus node itself:
• Node kind (IRI, blank node, literal)
• IRI stem (namespace)
• IRI regex
• SPARQL query constraint (advanced)

Page 34: SHACL: Shaping the Big Ball of Data Mud

Property constraints

Constraints about a certain outgoing or incoming property of the focus node(s):
• Cardinality
• Class
• Datatype
• Node kind (IRI, blank node, literal)
• String min/max length, string regex
• Numeric min/max
• Value must match another shape
• Value must not match another shape

Page 35: SHACL: Shaping the Big Ball of Data Mud

Other features
• Combine constraints with logical OR/any (default: AND/all)
• Property-pair comparison (=, <, >)
• Severities (Violation, Warning, Info)
• Annotations (name, description, grouping, order)
• Define additional types of constraints based on SPARQL (advanced)
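Three of these features (logical OR, property-pair comparison, and severities) can be imitated with small combinators. This is only a sketch of the idea in Python; the property names ex:email, ex:phone, ex:startYear and ex:endYear, the rule list, and every helper are invented for illustration and do not come from the SHACL vocabulary.

```python
def values(triples, focus, prop):
    return [o for s, p, o in triples if s == focus and p == prop]

def has_value(prop):
    """Build a check that passes when the focus node has at least one value for prop."""
    return lambda triples, focus: bool(values(triples, focus, prop))

def any_of(*checks):
    """Logical OR: the combined check passes when at least one member passes."""
    return lambda triples, focus: any(check(triples, focus) for check in checks)

def less_than(left, right):
    """Pair comparison: every value of left must be smaller than every value of right."""
    return lambda triples, focus: all(
        a < b
        for a in values(triples, focus, left)
        for b in values(triples, focus, right)
    )

data = [
    ("ex:Alice", "ex:email", "alice@example.org"),
    ("ex:Alice", "ex:startYear", 2010),
    ("ex:Alice", "ex:endYear", 2005),
    ("ex:Bob", "ex:phone", "+49 30 1234567"),
]

# Each rule carries a severity, mirroring Violation / Warning / Info.
rules = [
    ("Warning", "needs ex:email or ex:phone",
     any_of(has_value("ex:email"), has_value("ex:phone"))),
    ("Violation", "ex:startYear must be before ex:endYear",
     less_than("ex:startYear", "ex:endYear")),
]

def report(triples, focus):
    """List (severity, message) for every rule the focus node fails."""
    return [(sev, msg) for sev, msg, check in rules if not check(triples, focus)]

print(report(data, "ex:Alice"))
print(report(data, "ex:Bob"))
```

A list of rules is implicitly AND-ed, matching the default on the slide; `any_of` is the explicit opt-in for OR.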

Page 36: SHACL: Shaping the Big Ball of Data Mud

Violation reports can be produced in RDF

ex:ExampleConstraintViolation
    a sh:ValidationResult ;
    sh:severity sh:Violation ;
    sh:focusNode ex:Bob ;
    sh:path ex:age ;
    sh:value "twenty two" ;
    sh:message "ex:age must be literal of datatype xsd:integer." ;
    sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
    sh:sourceShape ex:PersonShape .
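A result like this one is essentially a record with one entry per failed check. The sketch below shows how a datatype check might assemble such records; it uses plain Python dictionaries keyed by the property names from the slide rather than real RDF. The `validate_datatype` helper, the sample data, and the tuple encoding of literals are assumptions.

```python
def validate_datatype(triples, shape, path, datatype):
    """Return one result record per value of path whose datatype differs."""
    results = []
    for s, p, o in triples:
        if p != path:
            continue
        lexical, actual = o  # literals are (lexical form, datatype) tuples
        if actual != datatype:
            results.append({
                "rdf:type": "sh:ValidationResult",
                "sh:severity": "sh:Violation",
                "sh:focusNode": s,
                "sh:path": path,
                "sh:value": lexical,
                "sh:message": f"{path} must be literal of datatype {datatype}.",
                "sh:sourceConstraintComponent": "sh:DatatypeConstraintComponent",
                "sh:sourceShape": shape,
            })
    return results

data = [
    ("ex:Alice", "ex:age", ("31", "xsd:integer")),
    ("ex:Bob", "ex:age", ("twenty two", "xsd:string")),
]

results = validate_datatype(data, "ex:PersonShape", "ex:age", "xsd:integer")
for result in results:
    print(result["sh:focusNode"], result["sh:message"])
```

Because each record names the focus node, path, value and source shape, a consumer can process the report mechanically instead of parsing error text.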

Page 37: SHACL: Shaping the Big Ball of Data Mud

Relationship to Rules
• Rules: “If someone says this, then I say that.”
• SHACL can’t do this.
• Does not replace SWRL, Jena Rules, RIF, SPIN Rules

Page 38: SHACL: Shaping the Big Ball of Data Mud

Uses and implementations

Page 39: SHACL: Shaping the Big Ball of Data Mud

SHACL in TopBraid Composer: Shapes + Constraints

SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/

Page 40: SHACL: Shaping the Big Ball of Data Mud

SHACL in TopBraid Composer: SPARQL-based constraints

Page 41: SHACL: Shaping the Big Ball of Data Mud

SHACL in TopQuadrant’s web products (EVN, EDG)

Page 42: SHACL: Shaping the Big Ball of Data Mud
Page 43: SHACL: Shaping the Big Ball of Data Mud

SHACL Protégé Plugin

http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html

Page 44: SHACL: Shaping the Big Ball of Data Mud

Repairing SKOS taxonomies with SHACL

Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies.

Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf

Page 45: SHACL: Shaping the Big Ball of Data Mud
Page 46: SHACL: Shaping the Big Ball of Data Mud

Validating the “bag of crisps”…
• Validation is often not about correct/incorrect or valid/invalid
• Constraints-first (e.g., SQL)
• Well-formed vs valid (e.g., XML Schema)
• Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand”
• Assumption is that there may be other statements
• Different consumers may apply different constraints
• SHACL should work well in this flexible, multi-source, multi-consumer world.

Page 47: SHACL: Shaping the Big Ball of Data Mud

“Anyone can say anything about anything”

RDF, SPARQL, OWL, RDFS

Statements: What is being said?

What words do we have?

What makes logical sense to say?

What did you say about XYZ?

OWL SHACL

Is that word used correctly?

What do you need to know from me?

You can't say that here!

I’d never say that!

2017

Page 49: SHACL: Shaping the Big Ball of Data Mud

Backup slides

Page 50: SHACL: Shaping the Big Ball of Data Mud