doctoral examination at the karlsruhe institute of technology (08.07.2016)

Post on 14-Apr-2017


KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft www.kit.edu

Validation Framework for RDF-based Constraint Languages

M.Sc. (TUM) Thomas Hartmann

Professor Dr. York Sure-Vetter
Professor Dr. Kai Eckert (Stuttgart Media University)
Professor Dr. Rudi Studer
Professor Dr. Andreas Geyer-Schulz

Disputation, 08.07.2016

2

enthusiasm for SW technologies

problem statement

3

common need for RDF Validation

problem statement

4

common needs of data practitioners
2013: W3C RDF Validation Workshop
2014: 2 international working groups on RDF validation

constraint languages
  SPARQL Query Language for RDF
  SPARQL Inferencing Notation (SPIN)
  Web Ontology Language (OWL)
  Shape Expressions (ShEx)
  Resource Shapes (ReSh)
  Description Set Profiles (DSP)
  Shapes Constraint Language (SHACL)

none of these languages meets all requirements

RDF validation as research field

problem statement

W3C RDF Data Shapes Working Group

DCMI RDF Application Profiles Task Group

5

Resource Description Framework (RDF)

problem statement

6

constraints of running example

problem statement


11

provide a basis for continued research
  RDF validation
  development of constraint languages

further development of constraint languages based on commonly approved requirements
incorporate the findings into the working groups

thesis objectives

12

5 research questions

13

Which types of research data and related metadata are not yet representable in RDF and

how to adequately model them to be able to validate RDF data

against constraints extractable from these vocabularies?

research question 1

RQ1

IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46

LDOW (WWW 2013)
SemStats (ISWC 2013)

DC 2012
ESWC 2011 (Poster)

DDI Moving Forward Project

RDF Vocabularies Working Group

14

How to directly validate XML data on semantically rich OWL axioms

using common RDF validation tools when XML Schemas, adequately representing particular domains,

have already been designed?

research question 2

RQ2

IJMSO, 8(3)
ISWC 2012

ICITST 2011
OCAS (ISWC 2011)

15

research question 3

16

http://purl.org/net/rdf-validation

DC 2014

RQ3

21

Which types of constraints must be expressible by constraint languages to meet

all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?

research question 3

RQ3

22

a constraint is instantiated from a constraint type
each constraint type corresponds to a requirement

81 constraint types

types of constraints on RDF data

RQ3

23

research question 4

24

minimum qualified cardinality restrictions (R-75)

ShEx:

    :Book { :author @:Person{1, } }

ReSh:

    :Book a rs:ResourceShape ;
        rs:property [
            rs:propertyDefinition :author ;
            rs:valueShape :Person ;
            rs:occurs rs:One-or-many ;
        ] .

SHACL:

    :BookShape a sh:Shape ;
        sh:scopeClass :Book ;
        sh:property [
            sh:predicate :author ;
            sh:valueShape :PersonShape ;
            sh:minCount 1 ;
        ] .

    :PersonShape a sh:Shape ;
        sh:scopeClass :Person .

RQ4

25

SPARQL and SPIN:

    CONSTRUCT {
        [ a spin:ConstraintViolation ... . ]
    }
    WHERE {
        ?subject a ?C1 ;
            ?predicate ?object .
        BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
        BIND ( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) .
        FILTER ( ?cardinality < ?minimumCardinality ) .
        FILTER ( ?minimumCardinality = 1 ) .
        FILTER ( ?C1 = :Book ) .
        FILTER ( ?C2 = :Person ) .
        FILTER ( ?predicate = :author ) .
    }

    SELECT ( COUNT ( ?arg1 ) AS ?c )
    WHERE {
        ?arg1 ?arg2 ?object .
        ?object a ?arg3 .
    }

RQ4

minimum qualified cardinality restrictions (R-75)

26

minimum qualified cardinality restrictions (R-75)

OWL:

    :Book rdfs:subClassOf [
        a owl:Restriction ;
        owl:minQualifiedCardinality 1 ;
        owl:onProperty :author ;
        owl:onClass :Person
    ] .

DSP:

    [ dsp:resourceClass :Book ;
      dsp:statementTemplate [
          dsp:minOccur 1 ;
          dsp:property :author ;
          dsp:nonLiteralConstraint [ dsp:valueClass :Person ]
      ]
    ] .

RQ4
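The slides above express one constraint, "every :Book has at least one :author who is a :Person", in six languages. As a language-neutral illustration, here is a minimal Python sketch of the same check over an in-memory list of triples. The function name, the tuple encoding of triples, and the `:Untitled` resource are assumptions made for this example; they are not part of the thesis or of any RDF library, and no reasoning is applied.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def min_qualified_cardinality_violations(
    triples: Iterable[Triple],
    subject_class: str,
    predicate: str,
    value_class: str,
    minimum: int,
) -> List[str]:
    """Return instances of subject_class that have fewer than `minimum`
    values for `predicate` that are explicitly typed as value_class."""
    triples = list(triples)
    # Only explicit rdf:type statements are considered (closed-world check).
    types = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    instances = sorted({s for s, c in types if c == subject_class})
    violations = []
    for subject in instances:
        qualified = {
            o for s, p, o in triples
            if s == subject and p == predicate and (o, value_class) in types
        }
        if len(qualified) < minimum:
            violations.append(subject)
    return violations


data = [
    (":Huckleberry-Finn", RDF_TYPE, ":Book"),
    (":Huckleberry-Finn", ":author", ":Mark-Twain"),
    (":Mark-Twain", RDF_TYPE, ":Person"),
    (":Untitled", RDF_TYPE, ":Book"),  # hypothetical book without an author
]

violations = min_qualified_cardinality_violations(
    data, ":Book", ":author", ":Person", 1
)
print(violations)
```

The sketch counts distinct qualified values per subject, which mirrors the role of the counting sub-query in the SPIN version.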

27

high-level constraint languages either
  lack an implementation or
  are based on different implementations

How to consistently validate RDF data against constraints of any constraint type

expressed in any RDF-based constraint language?

research question 4-1

RQ4

28

validation environment

constraint language implementation (SPIN mapping):

    :MinimumQualifiedCardinalityRestrictions a spin:ConstructTemplate ;
        spin:body [ ... CONSTRUCT { ... } WHERE { ... } ... ] .

RQ4

29

validation process

RQ4

30

validation results

RQ4

37

full implementations for
  all OWL 2 and DSP language constructs
  all constraint types expressible in OWL 2 and DSP
  major constraint types representable by ShEx and ReSh

RDF serialization for DSP

validation environment

http://purl.org/net/rdfval-demo

RQ4

38

http://purl.org/net/rdfval-demo

RQ4

39

constraints and constraint language constructs must be representable in RDF

constraint languages and supported constraint types must be expressible in SPARQL

limitations

RQ4

40

How to represent constraints of any constraint type and how to reduce the representation of constraints of any constraint type

to the absolute minimum?

research question 4-2

RQ4

constraint language    constraint types expressible, % (number of 81)
DSP                    17.3 (14)
ReSh                   25.9 (21)
ShEx                   29.6 (24)
SHACL                  51.9 (42)
OWL 2                  67.9 (55)
SPARQL                 100.0 (81)
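The percentages in this table follow from the counts in parentheses divided by the 81 constraint types. A short sketch of that arithmetic, with the dictionary and function names chosen only for this example:

```python
TOTAL_CONSTRAINT_TYPES = 81

# Counts taken from the values in parentheses on the slide.
expressible = {
    "DSP": 14,
    "ReSh": 21,
    "ShEx": 24,
    "SHACL": 42,
    "OWL 2": 55,
    "SPARQL": 81,
}


def coverage_percent(count: int, total: int = TOTAL_CONSTRAINT_TYPES) -> float:
    """Share of constraint types expressible, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


coverage = {language: coverage_percent(n) for language, n in expressible.items()}
for language, percent in coverage.items():
    print(f"{language:7s} {percent:5.1f}")
```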

41

intermediate abstraction layer
  based on formal logics
  makes any constraint type expressible
  enables straightforward mappings from high-level constraint languages
  reduces the representation of constraints to the absolute minimum

validation framework for RDF-based constraint languages

RQ4

42

conceptual model

DC 2015

RQ4

74%

26%

43

RQ4

simple constraints

44

different validation results

RQ4


50

How to ensure for any constraint type that RDF data is consistently validated against

semantically equivalent constraints of the same constraint type across RDF-based constraint languages?

framework is solely based on the abstract definitions of constraint types
just 1 SPIN mapping for each constraint type

research question 4-3

RQ4

51

semantically equivalent constraints

RQ4

52

How to ensure for any constraint type that semantically equivalent constraints of the same constraint type

can be transformed from one RDF-based constraint language to another?

$gc = m_\alpha(c_\alpha)$

$c_\beta = m'_\beta(gc)$
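One way to read the two mappings on this slide: a constraint in language α is first mapped to a generic representation gc, and gc is then mapped into language β. The sketch below illustrates that idea for the minimum qualified cardinality example, going from an OWL-like encoding to a DSP-like encoding. The dataclass, the nested-dictionary encodings, and the function names `m_owl` and `m_prime_dsp` are assumptions made for illustration; the thesis defines the mappings on RDF, not on Python dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericConstraint:
    """Hypothetical language-independent representation (gc)."""
    constraint_type: str
    context_class: str
    prop: str
    value_class: str
    minimum: int


def m_owl(c_owl: dict) -> GenericConstraint:
    """m_alpha: map an OWL-like constraint to the generic representation."""
    restriction = c_owl["rdfs:subClassOf"]
    return GenericConstraint(
        constraint_type="minimum qualified cardinality restrictions",
        context_class=c_owl["class"],
        prop=restriction["owl:onProperty"],
        value_class=restriction["owl:onClass"],
        minimum=restriction["owl:minQualifiedCardinality"],
    )


def m_prime_dsp(gc: GenericConstraint) -> dict:
    """m'_beta: map the generic representation to a DSP-like constraint."""
    return {
        "dsp:resourceClass": gc.context_class,
        "dsp:statementTemplate": {
            "dsp:minOccur": gc.minimum,
            "dsp:property": gc.prop,
            "dsp:nonLiteralConstraint": {"dsp:valueClass": gc.value_class},
        },
    }


c_owl = {
    "class": ":Book",
    "rdfs:subClassOf": {
        "owl:minQualifiedCardinality": 1,
        "owl:onProperty": ":author",
        "owl:onClass": ":Person",
    },
}

gc = m_owl(c_owl)
c_dsp = m_prime_dsp(gc)
print(c_dsp)
```

With n languages, this design needs one mapping into and one mapping out of the generic layer per language instead of a direct translator for every language pair.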

RQ4

research question 4-4

53

What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed

prior to validation to enhance data quality?

research question 5

RQ5

SEMANTiCS 2015

54

collected, classified, and implemented 115 constraints
  from vocabularies or domain experts

on 3 common vocabularies
  well-established (QB, SKOS)
  under development (DDI-RDF)

evaluation

IJSC, 10(2)
ICSC 2016

33 SPARQL endpoints

55

future work: validation database and framework

maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
keep framework in sync
evaluate solutions

future work

http://purl.org/net/rdf-validation

56

future work: combine framework with SHACL

derive SHACL extensions
define mappings from SHACL to the abstraction layer and back
maintain consistency of implementations of constraint types

future work

W3C RDF Data Shapes Working Group

DCMI RDF Application Profiles Task Group

57

summary of main contributions

development of 3 RDF vocabularies
direct validation of XML using common RDF validation tools
publication of 81 constraint types
validation framework for RDF-based constraint languages
role of reasoning for RDF validation

THANK YOU!

58

acknowledgements, publications, research data

30 publications
  6 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports
  first author of all but one of the journal, conference, and workshop articles

research data and results
  KIT research data repository: http://dx.doi.org/10.5445/BWDD/11
  GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis

4 international working groups
  DCMI RDF Application Profiles Task Group
    part of the editorial board
  RDF Vocabularies Working Group
    editor for DDI-RDF and PHDD
  W3C RDF Data Shapes Working Group
  DDI Moving Forward Project

THANK YOU!

59

appendix

60

publications: journal articles

1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing, 10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc

2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4

3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4

4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4

5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4

6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254–266. http://www.inderscience.com/info/inarticle.php?artid=57760

Please note that in 2015, my last name changed from Bosch to Hartmann.

61

publications: articles in conference proceedings

1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/

2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368

3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867

4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257

5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270


62

publications: articles in conference proceedings

6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654

7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34

8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html

9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html


63

publications: articles in workshop proceedings


1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/

2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings

3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/

64

publications: specifications


1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery

2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html

65

publications: technical reports


1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062

2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02

3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements

4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable

5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933

66

publications: technical reports


6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479

7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478

8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470

9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/

10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series

67

research questions

1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?

2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?

3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?

4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another?

5. What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality?

appendix

68

summary of contributions

1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies

2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains

3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints

4.1 Consistent validation across RDF-based constraint languages

4.2 Minimal representation of constraints of any type

4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages

4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another

5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics

6. Evaluation of the Usability of Constraint Types for Assessing RDF Data Quality

appendix

69

summary of limitations

1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way

2. Constraints of supported constraint types and constraint language constructs must be representable in RDF

3. Constraint languages and supported constraint types must be expressible in SPARQL

4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies

appendix

70

research question 1

71

Which types of research data and related metadata are not yet representable in RDF and

how to adequately model them to be able to validate RDF data

against constraints extractable from these vocabularies?

research question 1

RQ1

IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46

LDOW (WWW 2013)
SemStats (ISWC 2013)

DC 2012
ESWC 2011 (Poster)

DDI Moving Forward Project

RDF Vocabularies Working Group

72

development of 3 RDF vocabularies:

1. DDI-RDF Discovery Vocabulary (DDI-RDF)
   to describe unit-record data

2. Physical Data Description (PHDD)
   to describe data in tabular format and its physical properties

3. The SKOS Extension for Statistics (XKOS)
   to describe the structure and textual properties of formal statistical classifications
   to describe relations between classifications and concepts and among concepts

contribution

RQ1

73

research question 2

74

XML, XML Schema (XSD)
RDF, Web Ontology Language (OWL)
XML Schemas > OWL ontologies
time-consuming work designing domain ontologies from scratch by hand
reuse information contained in XML Schemas

designing OWL domain ontologies

RQ2

75

How to directly validate XML data on semantically rich OWL axioms

using common RDF validation tools when XML Schemas, adequately representing particular domains,

have already been designed?

research question 2

RQ2

IJMSO, 8(3)
ISWC 2012

ICITST 2011
OCAS (ISWC 2011)

76

sub-class relationships
OWL hasValue restrictions on data properties
OWL universal restrictions on object properties

semantically rich OWL axioms

    <library>
      <book year="February 1890">
        <author>
          <name>Arthur Conan Doyle</name>
        </author>
        <title>The Sign of the Four</title>
      </book>
    </library>

Title ⊑ value.string
Year ⊑ value.integer
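To make the example concrete, the sketch below checks the slide's XML snippet against the integer restriction on the year. It skips the XML-to-RDF/OWL transformation that the thesis describes and tests the value directly with Python's standard library, so it only illustrates the kind of violation the axiom would surface; the function names are invented for this example, and reading "February 1890" as the offending value is an interpretation of the slide.

```python
import xml.etree.ElementTree as ET

XML = """<library>
  <book year="February 1890">
    <author><name>Arthur Conan Doyle</name></author>
    <title>The Sign of the Four</title>
  </book>
</library>"""


def is_integer(text) -> bool:
    """True if text can be parsed as an integer."""
    try:
        int(text)
        return True
    except (TypeError, ValueError):
        return False


def year_violations(xml_text: str) -> list:
    """Titles of books whose year attribute is missing or not an integer."""
    root = ET.fromstring(xml_text)
    return [
        book.findtext("title")
        for book in root.iter("book")
        if not is_integer(book.get("year"))
    ]


violations = year_violations(XML)
print(violations)
```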

RQ2

77

transformations based on formal logics
OWL axioms extracted out of XML Schemas
  explicitly
  implicitly

formally underpin transformations
  to formally define and model semantics in a semantically correct way

complete extraction of XML Schemas' structural information
XML can directly be validated against semantically rich OWL axioms
any XML Schema is convertible to OWL
minimized effort designing OWL domain ontologies

contributions

IJMSO, 8(3)

RQ2

78

ISWC 2012
ICITST 2011

OCAS (ISWC 2011)

RQ2

79

step 1 of approach
  executed generic test cases created out of the XML Schema meta-model
  transformed XML Schemas of 6 XML standards

step 2 of approach
  specified SWRL rules for 3 OWL domain ontologies

verified hypothesis
  determined effort for traditional manual approach
  estimated effort for semi-automatic approach
  DDI-RDF serves as OWL domain ontology

Delivering high-quality domain ontologies by reusing information from already existing XML Schemas takes much less effort and time than creating domain ontologies completely manually and from the ground up.

evaluation

IJMSO, 8(3)

RQ2

80

research question 5

81

What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed

prior to validation to enhance data quality?

research question 5

RQ5

SEMANTiCS 2015

82

What is the role reasoning plays in practical data validation?

research question 5-1

RQ5

83

reasoning may resolve violations

Book ⊑ ∀ author.Person
Book(Huckleberry-Finn)
author(Huckleberry-Finn, Mark-Twain)

→ Person(Mark-Twain)
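The example can be replayed in a few lines of Python, reading the axiom as the universal restriction Book ⊑ ∀ author.Person (an assumption, since the quantifier is not legible in the extracted slide). Without reasoning, a closed-world check finds that Mark-Twain is not asserted to be a Person; after one inference step the violation disappears. The tuple encoding and function names are illustrative only.

```python
def infer_value_types(class_assertions, property_assertions,
                      subject_class, prop, value_class):
    """Apply subject_class ⊑ ∀ prop.value_class once: every prop value of a
    subject_class instance is inferred to be a value_class instance."""
    inferred = set(class_assertions)
    for subject, predicate, value in property_assertions:
        if predicate == prop and (subject_class, subject) in class_assertions:
            inferred.add((value_class, value))
    return inferred


def untyped_values(class_assertions, property_assertions, prop, value_class):
    """Closed-world check: prop values not asserted to be value_class instances."""
    return sorted(
        value for _, predicate, value in property_assertions
        if predicate == prop and (value_class, value) not in class_assertions
    )


classes = {("Book", "Huckleberry-Finn")}
properties = [("Huckleberry-Finn", "author", "Mark-Twain")]

before = untyped_values(classes, properties, "author", "Person")
after = untyped_values(
    infer_value_types(classes, properties, "Book", "author", "Person"),
    properties, "author", "Person",
)
print(before, after)
```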

RQ5

84

reasoning may cause violations

Publication ⊑ ∃ publisher.Publisher
Book(Huckleberry-Finn)

Book ⊑ Publication
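The opposite effect can be sketched the same way, reading the axioms on this slide as Publication ⊑ ∃ publisher.Publisher and Book ⊑ Publication. Before reasoning, Huckleberry-Finn is only known to be a Book, so the publisher constraint on Publication does not apply to it; once the subclass axiom is applied, it becomes a Publication without a publisher. As a simplification, the check below only tests whether any publisher value exists, not whether that value is a Publisher. All names are hypothetical.

```python
def subclass_closure(class_assertions, subclass_axioms):
    """Add (superclass, individual) for every applicable subclass axiom
    until nothing new can be inferred."""
    inferred = set(class_assertions)
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_axioms:
            for cls, individual in list(inferred):
                if cls == sub and (sup, individual) not in inferred:
                    inferred.add((sup, individual))
                    changed = True
    return inferred


def publications_without_publisher(class_assertions, property_assertions):
    """Closed-world check: Publication instances with no publisher value."""
    return sorted(
        individual for cls, individual in class_assertions
        if cls == "Publication" and not any(
            s == individual and p == "publisher"
            for s, p, _ in property_assertions
        )
    )


classes = {("Book", "Huckleberry-Finn")}
axioms = [("Book", "Publication")]
properties = []  # no publisher statement in the data

without_reasoning = publications_without_publisher(classes, properties)
with_reasoning = publications_without_publisher(
    subclass_closure(classes, axioms), properties
)
print(without_reasoning, with_reasoning)
```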

RQ5

85

reasoning solves redundancy

Publication ⊑ ∃ publicationDate.xsd:date

Book ⊑ Publication
Conference-Proceeding ⊑ Publication
Journal-Article ⊑ Publication

RQ5

86

For which constraint types reasoning may be performed prior to validation to enhance data quality?

research question 5-2

RQ5

87

> 2/5 of constraint types
property domains (R-25):

constraint types with reasoning

∃ author.⊤ ⊑ Publication
author(Alices-Adventures-In-Wonderland, Lewis-Carroll)

→ rdf:type(Alices-Adventures-In-Wonderland, Publication)

RQ5

88

< 3/5 of constraint types
literal pattern matching (R-44):

constraint types without reasoning

RQ5

    ISBN a rdfs:Datatype ;
        owl:equivalentClass [
            a rdfs:Datatype ;
            owl:onDatatype xsd:string ;
            owl:withRestrictions ( [ xsd:pattern "^\d{9}[\d|X]$" ] )
        ] .

Book ⊑ identifier.ISBN
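Literal pattern matching needs no inference; it is a direct test of each literal against the pattern. The sketch below applies the pattern from the slide with Python's `re` module, which is an assumption about regex semantics made for this example (the slide states the pattern as an `xsd:pattern` facet). The function name and sample values are invented.

```python
import re

# Pattern copied from the slide; interpreted here with Python regex semantics.
ISBN_PATTERN = re.compile(r"^\d{9}[\d|X]$")


def matches_isbn_pattern(value: str) -> bool:
    """True if the literal matches the slide's ISBN pattern."""
    return ISBN_PATTERN.match(value) is not None


samples = ["030640615X", "0306406152", "12345", "030640615|"]
results = {value: matches_isbn_pattern(value) for value in samples}
print(results)
```

Under this interpretation the character class `[\d|X]` also accepts a literal `|` as the last character, because `|` has no special meaning inside square brackets.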

89

For which constraint types validation results differ
(1) if the CWA or the OWA and
(2) if the UNA or the nUNA is assumed?

CWA dependent: 56.8%
UNA dependent: 66.6%

research question 5-3

RQ5

90

56.8% of constraint types
minimum qualified cardinality restrictions (R-75):

CWA dependent constraint types

RQ5

Book ⊑ ∃ title.⊤

91

disjoint classes (R-7):

CWA independent constraint types

RQ5

Book ⊓ JournalArticle ⊑ ⊥

92

66.6% of constraint types
functional properties (R-57/65):

UNA dependent constraint types

RQ5

funct(title)

title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn")

title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")

93

literal value comparison (R-43):

UNA independent constraint types

RQ5

birthDate(Albert-Einstein, "1955-04-18")
deathDate(Albert-Einstein, "1879-03-14")

birthDate(Albert_Einstein, "1879-03-14")
deathDate(Albert_Einstein, "1955-04-18")

owl:sameAs(Albert-Einstein, Albert_Einstein)
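A literal value comparison of this kind can be sketched as a per-resource check that the birth date precedes the death date. The code below evaluates each named resource on its own and deliberately ignores the owl:sameAs statement; that choice, the dictionary encoding, and the function name are assumptions made for this illustration and are not taken from the thesis.

```python
from datetime import date


def birth_not_before_death(facts: dict) -> list:
    """Return resources whose birthDate is not strictly before their deathDate."""
    violations = []
    for resource, dates in sorted(facts.items()):
        birth = date.fromisoformat(dates["birthDate"])
        death = date.fromisoformat(dates["deathDate"])
        if birth >= death:
            violations.append(resource)
    return violations


facts = {
    "Albert-Einstein": {"birthDate": "1955-04-18", "deathDate": "1879-03-14"},
    "Albert_Einstein": {"birthDate": "1879-03-14", "deathDate": "1955-04-18"},
}

violations = birth_not_before_death(facts)
print(violations)
```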

94

evaluation

95

collected, classified, and implemented 115 constraints
  from vocabularies or domain experts

on 3 common vocabularies
  well-established (QB, SKOS)
  under development (DDI-RDF)

evaluation

IJSC, 10(2)
ICSC 2016

33 SPARQL endpoints

96

classification of constraint types
  RDFS/OWL based
  constraint language based
  SPARQL based

classification of constraints
  informational
  warning
  error

evaluation

classification

97

RDFS/OWL based

evaluation

classification of constraint types

    :Publication rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty :author ;
        owl:allValuesFrom :Person
    ] .

98

constraint language based

evaluation

classification of constraint types

    :Publication {
        ( :isbn xsd:string, :title xsd:string ) |
        ( :issn xsd:string, :title xsd:string )
    }

99

SPARQL based

evaluation

classification of constraint types

    SELECT ?concept
    WHERE {
        ?concept a [ rdfs:subClassOf* skos:Concept ] .
        FILTER NOT EXISTS {
            ?concept ?p ?o .
            FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) .
        }
    }

100

C (constraints), CV (constraint violations)
values in %

evaluation

finding 1

              C       CV
SPARQL        63.2    78.2
CL            34.7    21.8
RDFS/OWL      35.6    21.8

101

C (constraints), CV (constraint violations)
values in %

evaluation

finding 2

              C       CV
SPARQL        63.2    78.2
CL            34.7    21.8
RDFS/OWL      35.6    21.8

102

C (constraints), CV (constraint violations)
values in %

evaluation

finding 3

              C       CV
Info          42.3    31.3
Warning       18.7    62.7
Error         39.0     6.1

103

future work

104

future work: RQ1

publication of RDF vocabularies
  DDI Alliance specifications
  W3C recommendation for DDI-RDF

DDI-Lifecycle MD (Model-Driven)
  new requirements based on experiences with DDI-RDF
  international working group: DDI Moving Forward Project
  individual contributions

formalize conceptual model (using UML 2)
conceptualize and implement diverse model serializations (e.g., RDFS/OWL)

future work

105

aligning PHDD and CSV on the Web
  overlap in the description of tabular data in CSV format
  broader scope of PHDD

description of tabular data with fixed record length
description of tabular data with multiple records per case

evaluation for use in DDI-Lifecycle MD

future work: RQ1

future work

106

future work: RQ2

bidirectional transformations from models of any meta-model to OWL
generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models
enable validation of any data against constraints extractable from models of any meta-model using common RDF validation tools

future work

107

future work: validation database and framework

maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
keep framework in sync
evaluate solutions

future work

http://purl.org/net/rdf-validation

108

future work: combine framework with SHACL

derive SHACL extensions
define mappings from SHACL to the abstraction layer and back
maintain consistency of implementations of constraint types

future work

W3C RDF Data Shapes Working Group

DCMI RDF Application Profiles Task Group
