2016.02 - validating rdf data quality using constraints to direct the development of constraint...


Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages

Thomas Hartmann

Benjamin Zapilko, Joachim Wackerow, Kai Eckert

IEEE International Conference on Semantic Computing (ICSC 2016)

XML Validation

<!ELEMENT library (book+, author*)>
<!ELEMENT book (isbn, title, author-ref+)>
<!ATTLIST book
  id ID #REQUIRED
>
<!ELEMENT author-ref EMPTY>
<!ATTLIST author-ref
  id IDREF #REQUIRED
>
<!ELEMENT author (name)>
<!ATTLIST author
  id ID #REQUIRED
>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT name (#PCDATA)>
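The ID/IDREF rule in the DTD above (every author-ref id must point at an id declared somewhere in the document) can be illustrated by hand. The following is a minimal sketch only: Python's standard library parser does not validate against a DTD, so the check is written out explicitly, and the sample document and the function name `dangling_author_refs` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "a9" deliberately points at no declared id.
SAMPLE = """<library>
  <book id="b1">
    <isbn>0000000000</isbn>
    <title>Example Title</title>
    <author-ref id="a1"/>
    <author-ref id="a9"/>
  </book>
  <author id="a1"><name>Example Author</name></author>
</library>"""


def dangling_author_refs(xml_text):
    """Return author-ref ids that match no declared ID attribute."""
    root = ET.fromstring(xml_text)
    # In the DTD, book and author carry ID attributes; author-ref carries an IDREF.
    declared = {e.get("id") for e in root.iter() if e.tag in ("book", "author")}
    return [r.get("id") for r in root.iter("author-ref")
            if r.get("id") not in declared]


print(dangling_author_refs(SAMPLE))
```

A validating XML parser would report the same dangling reference as a validity error; the sketch only makes the rule visible.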

RDF Validation Workshop

Working Groups on RDF Validation

W3C Data Shapes Working Group

DCMI RDF Application Profiles Task Group

http://purl.org/net/rdf-validation

81 Types of Constraints on RDF Data

Constraint Languages

SPARQL Query Language for RDF

SELECT ?concept
WHERE {
  ?concept a [ rdfs:subClassOf* skos:Concept ] .
  FILTER NOT EXISTS {
    ?concept ?p ?o .
    FILTER ( ?p IN (
      skos:related,
      skos:relatedMatch,
      skos:broader, ... ) ) . } }
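What the query above selects can be traced on a handful of triples: concepts (instances of skos:Concept or of one of its subclasses) that have no outgoing relation from the listed properties. This is a rough sketch under stated assumptions: triples are plain Python tuples of prefixed-name strings, the `ex:` data is invented, and `RELATIONS` holds only the three properties visible on the slide, since the rest of the list is elided there.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
# Only the properties shown on the slide; the original list continues.
RELATIONS = {"skos:related", "skos:relatedMatch", "skos:broader"}


def subclasses_of(triples, root):
    """All classes reachable from root via rdfs:subClassOf* (root included)."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == SUBCLASS and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def isolated_concepts(triples):
    """Concepts with no outgoing triple whose predicate is in RELATIONS."""
    classes = subclasses_of(triples, "skos:Concept")
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o in classes}
    related = {s for s, p, o in triples if p in RELATIONS}
    return sorted(concepts - related)


# Hypothetical data for illustration.
TRIPLES = [
    ("ex:Topic", SUBCLASS, "skos:Concept"),
    ("ex:a", RDF_TYPE, "skos:Concept"),
    ("ex:b", RDF_TYPE, "ex:Topic"),
    ("ex:a", "skos:broader", "ex:b"),
]

print(isolated_concepts(TRIPLES))
```

Each returned concept corresponds to one row of the SELECT result, that is, one constraint violation.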

SPARQL Inferencing Notation (SPIN)

# FILTER NOT EXISTS { ?book :author ?person }
[ a sp:Filter ;
  sp:expression [
    a sp:notExists ;
    sp:elements (
      [ sp:subject [ sp:varName "book" ] ;
        sp:predicate :author ;
        sp:object [ sp:varName "person" ] ] ) ] ]

Web Ontology Language (OWL)

:Publication rdfs:subClassOf
  [ a owl:Restriction ;
    owl:onProperty :author ;
    owl:allValuesFrom :Person ] .
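Read as a validation constraint rather than as an inference rule, the axiom above says that every :author value of a :Publication must be typed as :Person. A minimal sketch of that closed-world reading follows; the tuple-based triple representation, the `ex:` data, and the function name are assumptions made for illustration.

```python
def all_values_from_violations(triples, cls, prop, value_cls):
    """Return (subject, value) pairs where an instance of cls has a
    prop value that is not explicitly typed as value_cls."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    typed = {s for s, p, o in triples if p == "rdf:type" and o == value_cls}
    return sorted((s, o) for s, p, o in triples
                  if s in instances and p == prop and o not in typed)


# Hypothetical data: ex:bob is an author but is never typed as :Person.
TRIPLES = [
    ("ex:paper1", "rdf:type", ":Publication"),
    ("ex:paper1", ":author", "ex:alice"),
    ("ex:paper1", ":author", "ex:bob"),
    ("ex:alice", "rdf:type", ":Person"),
]

print(all_values_from_violations(TRIPLES, ":Publication", ":author", ":Person"))
```

Under OWL's standard open-world semantics the same axiom would instead lead a reasoner to infer that ex:bob is a :Person; treating it as a check is the validation view.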

Shape Expressions (ShEx)

:Publication {
  ( :isbn xsd:string, :title xsd:string )
  |
  ( :issn xsd:string, :title xsd:string ) }
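The disjunction in the shape above can be approximated as "either the ISBN group or the ISSN group is satisfied". The sketch below is a deliberately simplified reading, not the ShEx semantics: a node is a hypothetical dict from property name to a list of values, each listed property must have exactly one string value, and a node carrying both an ISBN and an ISSN is accepted here even though a ShEx processor may treat it differently.

```python
def matches_group(node, props):
    """True if node has exactly one string value for every property in props."""
    return all(len(node.get(p, [])) == 1 and isinstance(node[p][0], str)
               for p in props)


def conforms_to_publication(node):
    """Simplified disjunction: ISBN group or ISSN group must hold."""
    return (matches_group(node, (":isbn", ":title"))
            or matches_group(node, (":issn", ":title")))


# Hypothetical nodes for illustration.
book = {":isbn": ["000-0"], ":title": ["A Book"]}
journal = {":issn": ["0000-0000"], ":title": ["A Journal"]}
untitled = {":isbn": ["000-0"]}

print(conforms_to_publication(book),
      conforms_to_publication(journal),
      conforms_to_publication(untitled))
```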

Resource Shapes (ReSh)

:Computer-Science-Book
  a oslc:ResourceShape ;
  oslc:property [
    oslc:propertyDefinition :subject ;
    oslc:allowedValues [
      oslc:allowedValue
        "Computer Science" ,
        "Informatics" ,
        "Information Technology" ] ] .

Description Set Profiles (DSP)

[ a dsp:DescriptionTemplate ;
  dsp:resourceClass :Science-Fiction-Book ;
  dsp:statementTemplate [
    dsp:property :subject ;
    dsp:nonLiteralConstraint [
      dsp:valueClass skos:Concept ;
      dsp:valueURI
        :Science-Fiction, :Sci-Fi, :SF ;
      dsp:vocabularyEncodingScheme
        :Science-Fiction-Book-Subjects ; ] ] ] .

Shapes Constraint Language (SHACL)

:BookShape
  a sh:Shape ;
  sh:scopeClass :Book ;
  sh:property [
    sh:predicate :author ;
    sh:valueShape :PersonShape ;
    sh:minCount 1 ; ] .
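The cardinality part of the shape above (every :Book needs at least one :author) can be sketched as a plain count per focus node. This is an illustration only: it covers sh:minCount and ignores sh:valueShape, and the tuple-based triples, the `ex:` data, and the function name are assumptions rather than any SHACL engine's API.

```python
def min_count_violations(triples, scope_class, predicate, min_count):
    """Return (node, count) for instances of scope_class that have
    fewer than min_count values for predicate."""
    focus_nodes = sorted({s for s, p, o in triples
                          if p == "rdf:type" and o == scope_class})
    violations = []
    for node in focus_nodes:
        count = sum(1 for s, p, o in triples if s == node and p == predicate)
        if count < min_count:
            violations.append((node, count))
    return violations


# Hypothetical data: ex:book2 has no author.
TRIPLES = [
    ("ex:book1", "rdf:type", ":Book"),
    ("ex:book1", ":author", "ex:alice"),
    ("ex:book2", "rdf:type", ":Book"),
]

print(min_count_violations(TRIPLES, ":Book", ":author", 1))
```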

http://purl.org/net/rdfval-demo

RDF Validation Environment

Constraint Types Classification

1. RDFS/OWL Based

2. Constraint Language Based

3. SPARQL Based

RDFS/OWL Based

:Publication rdfs:subClassOf
  [ a owl:Restriction ;
    owl:onProperty :author ;
    owl:allValuesFrom :Person ] .

Constraint Language Based

:Publication {
  ( :isbn xsd:string, :title xsd:string )
  |
  ( :issn xsd:string, :title xsd:string ) }

SPARQL Based

SELECT ?concept
WHERE {
  ?concept a [ rdfs:subClassOf* skos:Concept ] .
  FILTER NOT EXISTS {
    ?concept ?p ?o .
    FILTER ( ?p IN (
      skos:related,
      skos:relatedMatch,
      skos:broader, ... ) ) . } }

Constraints Classification

1. Informational

2. Warning

3. Error

Evaluation Setup

• 115 constraints from vocabularies and experts

• constraints classified and implemented

• on 3 vocabularies in the SBE (social, behavioral, and economic) sciences

– well-established vocabularies (QB, SKOS)

– vocabulary under development (DDI-RDF)

Validated Data Sets

Vocabulary   Data Sets   Triples
QB           9,990       3,775,983,610
SKOS         4,178       477,737,281
DDI-RDF      1,526       9,673,055
Total        15,694      4.26 billion
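The totals row can be re-derived from the per-vocabulary figures in the table. A small arithmetic check, using only the numbers shown above (the dictionary name `DATA` is chosen for illustration):

```python
# (data sets, triples) per vocabulary, copied from the table.
DATA = {
    "QB": (9990, 3775983610),
    "SKOS": (4178, 477737281),
    "DDI-RDF": (1526, 9673055),
}

total_sets = sum(sets for sets, _ in DATA.values())
total_triples = sum(triples for _, triples in DATA.values())

print(total_sets)                              # 15694
print(f"{total_triples / 1e9:.2f} billion")    # 4.26 billion
```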

33 SPARQL Endpoints

Finding 1

           C [%]   CV [%]
SPARQL     63.2    78.2
CL         34.7    21.8
RDFS/OWL   35.6    21.8

C (constraints), CV (constraint violations)

Finding 2

           C [%]   CV [%]
SPARQL     63.2    78.2
CL         34.7    21.8
RDFS/OWL   35.6    21.8

C (constraints), CV (constraint violations)

Finding 3

           C [%]   CV [%]
Info       42.3    31.3
Warning    18.7    62.7
Error      39.0    6.1

C (constraints), CV (constraint violations)

Limitations

• 3 vocabularies

• 1 domain
