Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

Page 1: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

KIT – The Research University in the Helmholtz Association www.kit.edu

Validation Framework for RDF-based Constraint Languages

M.Sc. (TUM) Thomas Hartmann

Professor Dr. York Sure-Vetter
Professor Dr. Kai Eckert (Stuttgart Media University)
Professor Dr. Rudi Studer
Professor Dr. Andreas Geyer-Schulz

Disputation, 08.07.2016

Page 2: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

enthusiasm for SW technologies

problem statement

Page 3: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

common need for RDF Validation

problem statement

Page 4: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

common needs of data practitioners
2013: W3C RDF Validation Workshop
2014: 2 international working groups on RDF validation

constraint languages
SPARQL Query Language for RDF
SPARQL Inferencing Notation (SPIN)
Web Ontology Language (OWL)
Shape Expressions (ShEx)
Resource Shapes (ReSh)
Description Set Profiles (DSP)
Shapes Constraint Language (SHACL)

none of these languages meets all requirements

RDF validation as research field

problem statement

W3C RDF Data Shapes Working Group

DCMI RDF Application Profiles Task Group

Page 5: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

Resource Description Framework (RDF)

problem statement

Page 6: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

constraints of running example

problem statement

Page 7: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

constraints of running example

problem statement

Page 8: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

constraints of running example

problem statement

Page 9: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

constraints of running example

problem statement

Page 10: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

constraints of running example

problem statement

Page 11: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

provide a basis for continued research
  RDF validation
  development of constraint languages

further development of constraint languages based on commonly approved requirements
incorporate the findings into the working groups

thesis objectives

Page 12: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

5 research questions

Page 13: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?

research question 1

RQ1

IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46

LDOW (WWW 2013)
SemStats (ISWC 2013)

DC 2012
ESWC 2011 (Poster)

DDI Moving Forward Project

RDF Vocabularies Working Group

Page 14: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

How to directly validate XML data against semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?

research question 2

RQ2

IJMSO, 8(3)
ISWC 2012

ICITST 2011
OCAS (ISWC 2011)

Page 15: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

research question 3

Page 16: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

http://purl.org/net/rdf-validation

DC 2014

RQ3

Page 17: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

RQ3

Page 18: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

RQ3

Page 19: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

RQ3

Page 20: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

RQ3

Page 21: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?

research question 3

RQ3

Page 22: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

a constraint is instantiated from a constraint type
each constraint type corresponds to a requirement

81 constraint types

types of constraints on RDF data

RQ3

Page 23: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

research question 4

Page 24: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

minimum qualified cardinality restrictions (R-75)

ShEx:

:Book {
  :author @:Person{1, }
}

ReSh:

:Book a rs:ResourceShape ;
  rs:property [
    rs:propertyDefinition :author ;
    rs:valueShape :Person ;
    rs:occurs rs:One-or-many ;
  ] .

SHACL:

:BookShape a sh:Shape ;
  sh:scopeClass :Book ;
  sh:property [
    sh:predicate :author ;
    sh:valueShape :PersonShape ;
    sh:minCount 1 ;
  ] .
:PersonShape a sh:Shape ;
  sh:scopeClass :Person .

RQ4

Page 25: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

minimum qualified cardinality restrictions (R-75)

SPARQL and SPIN:

CONSTRUCT {
  [ a spin:ConstraintViolation ... . ]
}
WHERE {
  ?subject a ?C1 ;
    ?predicate ?object .
  BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
  BIND ( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) .
  FILTER ( ?cardinality < ?minimumCardinality ) .
  FILTER ( ?minimumCardinality = 1 ) .
  FILTER ( ?C1 = :Book ) .
  FILTER ( ?C2 = :Person ) .
  FILTER ( ?predicate = :author ) .
}

SELECT ( COUNT ( ?arg1 ) AS ?c )
WHERE {
  ?arg1 ?arg2 ?object .
  ?object a ?arg3 .
}

RQ4
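The check that all of these notations express (every :Book needs at least one :author that is a :Person) can be sketched in plain Python. This is only an illustration over an in-memory list of triples, not the SPIN implementation from the thesis; the function name and the sample data are hypothetical. One assumption to note: subjects with no :author triple at all are also reported here.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def min_qualified_cardinality_violations(triples, subject_class, predicate,
                                         object_class, minimum):
    """Return instances of subject_class that have fewer than `minimum`
    values of `predicate` typed as object_class."""
    # Collect the asserted classes of every resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    violations = []
    subjects = sorted(s for s, classes in types.items() if subject_class in classes)
    for subject in subjects:
        # Count only values that are instances of the qualifying class.
        qualified = {
            o for s, p, o in triples
            if s == subject and p == predicate and object_class in types.get(o, set())
        }
        if len(qualified) < minimum:
            violations.append(subject)
    return violations


# Hypothetical sample data.
triples = [
    (":TheHobbit", RDF_TYPE, ":Book"),
    (":TheHobbit", ":author", ":Tolkien"),
    (":Tolkien", RDF_TYPE, ":Person"),
    (":Anonymous", RDF_TYPE, ":Book"),
    (":Anonymous", ":author", ":SomePublisher"),  # value is not typed as :Person
]

print(min_qualified_cardinality_violations(triples, ":Book", ":author", ":Person", 1))
```

Only :Anonymous is reported, because its single author value is not an instance of :Person.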

Page 26: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

minimum qualified cardinality restrictions (R-75)

OWL:

:Book rdfs:subClassOf [
  a owl:Restriction ;
  owl:minQualifiedCardinality 1 ;
  owl:onProperty :author ;
  owl:onClass :Person
] .

DSP:

[ dsp:resourceClass :Book ;
  dsp:statementTemplate [
    dsp:minOccur 1 ;
    dsp:property :author ;
    dsp:nonLiteralConstraint [
      dsp:valueClass :Person
    ]
  ]
] .

RQ4

Page 27: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

high-level constraint languages either
  lack an implementation or
  are based on different implementations

How to consistently validate RDF data against constraints of any constraint type expressed in any RDF-based constraint language?

research question 4-1

RQ4

Page 28: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation environment

constraint language implementation (SPIN mapping):

:MinimumQualifiedCardinalityRestrictions a spin:ConstructTemplate ;
  spin:body [
    ...
    CONSTRUCT { ... }
    WHERE { ... }
    ...
  ] .

RQ4

Page 29: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation process

RQ4

Page 30: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 31: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 32: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 33: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 34: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 35: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 36: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

validation results

RQ4

Page 37: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

full implementations for
  all OWL 2 and DSP language constructs
  all constraint types expressible in OWL 2 and DSP
  major constraint types representable by ShEx and ReSh

RDF serialization for DSP

validation environment

http://purl.org/net/rdfval-demo

RQ4

Page 38: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

http://purl.org/net/rdfval-demo

RQ4

Page 39: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

constraints and constraint language constructs must be representable in RDF

constraint languages and supported constraint types must be expressible in SPARQL

limitations

RQ4

Page 40: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

How to represent constraints of any constraint type and how to reduce the representation of constraints of any constraint type to the absolute minimum?

research question 4-2

RQ4

constraint types expressible per constraint language, in % (number of the 81 constraint types)

DSP      17.3 (14)
ReSh     25.9 (21)
ShEx     29.6 (24)
SHACL    51.9 (42)
OWL 2    67.9 (55)
SPARQL  100.0 (81)
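The percentages follow from the counts in parentheses divided by the 81 constraint types. A short sketch that recomputes them, assuming the figures are listed in the same order as the language names on the slide:

```python
TOTAL_CONSTRAINT_TYPES = 81

# Counts in parentheses on the slide, paired with the languages in slide order.
expressible = {
    "DSP": 14,
    "ReSh": 21,
    "ShEx": 24,
    "SHACL": 42,
    "OWL 2": 55,
    "SPARQL": 81,
}


def coverage(count, total=TOTAL_CONSTRAINT_TYPES):
    """Share of constraint types expressible, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


for language, count in expressible.items():
    print(f"{language:<7}{coverage(count):6.1f} ({count})")
```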

Page 41: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

intermediate abstraction layer
  based on formal logics
  makes any constraint type expressible
  enables straightforward mappings from high-level constraint languages
  reduces the representation of constraints to the absolute minimum

validation framework for RDF-based constraint languages

RQ4

Page 42: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

conceptual model

DC 2015

RQ4

74%

26%

Page 43: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

RQ4

simple constraints

Page 44: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

different validation results

RQ4

Page 45: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

different validation results

RQ4

Page 46: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

different validation results

RQ4

Page 47: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

different validation results

RQ4

Page 48: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

different validation results

RQ4

Page 49: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

different validation results

RQ4

Page 50: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

How to ensure for any constraint type that RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages?

framework is solely based on the abstract definitions of constraint types
just 1 SPIN mapping for each constraint type

research question 4-3

RQ4

Page 51: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

semantically equivalent constraints

RQ4

Page 52: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

How to ensure for any constraint type that semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another?

gc = mα (cα)

cβ = m'β (gc)
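Reading the two formulas as "a constraint cα in language α is mapped by mα to a generic constraint gc, which m'β maps to a constraint cβ in language β", the round trip can be sketched as follows. The dictionary layouts are simplified stand-ins for the OWL and DSP snippets shown earlier, and all function, class, and key names (including "subClass") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericConstraint:
    """Language-independent form (g_c) of a minimum qualified cardinality restriction."""
    context_class: str
    prop: str
    value_class: str
    minimum: int


def m_owl(c_owl: dict) -> GenericConstraint:
    """m_alpha: map a simplified OWL restriction to the generic form."""
    return GenericConstraint(
        context_class=c_owl["subClass"],
        prop=c_owl["owl:onProperty"],
        value_class=c_owl["owl:onClass"],
        minimum=c_owl["owl:minQualifiedCardinality"],
    )


def m_dsp_inverse(g_c: GenericConstraint) -> dict:
    """m'_beta: map the generic form to a simplified DSP description template."""
    return {
        "dsp:resourceClass": g_c.context_class,
        "dsp:statementTemplate": {
            "dsp:minOccur": g_c.minimum,
            "dsp:property": g_c.prop,
            "dsp:nonLiteralConstraint": {"dsp:valueClass": g_c.value_class},
        },
    }


c_owl = {
    "subClass": ":Book",  # hypothetical key for the restricted class
    "owl:minQualifiedCardinality": 1,
    "owl:onProperty": ":author",
    "owl:onClass": ":Person",
}

g_c = m_owl(c_owl)          # g_c = m_alpha(c_alpha)
c_dsp = m_dsp_inverse(g_c)  # c_beta = m'_beta(g_c)
print(c_dsp)
```

Routing every transformation through one generic form means each language needs only one mapping to it and one mapping from it, rather than a separate mapping per pair of languages.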

RQ4

research question 4-4

Page 53: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

What role does reasoning play in practical data validation, and for which constraint types may reasoning be performed prior to validation to enhance data quality?

research question 5

RQ5

SEMANTiCS 2015

Page 54: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

collected, classified, and implemented 115 constraints
  from vocabularies or domain experts
  on 3 common vocabularies
    well-established (QB, SKOS)
    under development (DDI-RDF)

evaluation

IJSC, 10(2)
ICSC 2016

33 SPARQL endpoints

Page 55: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

future work: validation database and framework

maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
keep framework in sync
evaluate solutions

future work

http://purl.org/net/rdf-validation

Page 56: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

future work: combine framework with SHACL

derive SHACL extensions
define mappings from SHACL to the abstraction layer and back
maintain consistency of implementations of constraint types

future work

W3C RDF Data Shapes Working Group

DCMI RDF Application Profiles Task Group

Page 57: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

summary of main contributions

development of 3 RDF vocabularies
direct validation of XML using common RDF validation tools
publication of 81 constraint types
validation framework for RDF-based constraint languages
role of reasoning for RDF validation

THANK YOU!

Page 58: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

acknowledgements, publications, research data

30 publications
  6 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports
  first author of all (except 1) journal articles, conference articles, workshop articles

research data and results
  KIT research data repository: http://dx.doi.org/10.5445/BWDD/11
  GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis

4 international working groups
  DCMI RDF Application Profiles Task Group
    part of the editorial board
  RDF Vocabularies Working Group
    editor for DDI-RDF and PHDD
  W3C RDF Data Shapes Working Group
  DDI Moving Forward Project

Page 59: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

appendix

Page 60: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: journal articles

1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing, 10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc

2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4

3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4

4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4

5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4

6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760

Please note that in 2015, my last name changed from Bosch to Hartmann.

Page 61: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: articles in conference proceedings

1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/

2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368

3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867

4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257

5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270

Page 62: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: articles in conference proceedings

6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654

7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34

8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html

9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html

Page 63: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: articles in workshop proceedings

1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/

2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings

3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/

Page 64: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: specifications

1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery

2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html

Page 65: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: technical reports

1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062

2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02

3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements

4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable

5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933

Page 66: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

publications: technical reports

6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479

7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478

8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470

9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/

10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series

Page 67: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

research questions

1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?

2. How to directly validate XML data against semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?

3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?

4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another?

5. What role does reasoning play in practical data validation, and for which constraint types may reasoning be performed prior to validation to enhance data quality?

appendix

Page 68: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

summary of contributions

1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies

2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains

3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints

4.1 Consistent validation across RDF-based constraint languages

4.2 Minimal representation of constraints of any type

4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages

4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another

5. We delineated the role reasoning plays in practical data validation and investigated for each constraint type (1) if reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) if validation results depend on different underlying semantics

6. Evaluation of the Usability of Constraint Types for Assessing RDF Data Quality

appendix

Page 69: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

summary of limitations

1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way
2. Constraints of supported constraint types and constraint language constructs must be representable in RDF
3. Constraint languages and supported constraint types must be expressible in SPARQL
4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies

appendix

Page 70: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

research question 1

Page 71: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?

research question 1

RQ1

IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46

LDOW (WWW 2013)
SemStats (ISWC 2013)

DC 2012
ESWC 2011 (Poster)

DDI Moving Forward Project

RDF Vocabularies Working Group

Page 72: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

development of 3 RDF vocabularies:

1. DDI-RDF Discovery Vocabulary (DDI-RDF)
   to describe unit-record data

2. Physical Data Description (PHDD)
   to describe data in tabular format and its physical properties

3. The SKOS Extension for Statistics (XKOS)
   to describe the structure and textual properties of formal statistical classifications
   to describe relations between classifications and concepts and among concepts

contribution

RQ1

Page 73: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

research question 2

Page 74: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

XML, XML Schema (XSD)
RDF, Web Ontology Language (OWL)
XML Schemas > OWL ontologies
time-consuming work designing domain ontologies from scratch by hand
reuse information contained in XML Schemas

designing OWL domain ontologies

RQ2

Page 75: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

How to directly validate XML data against semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?

research question 2

RQ2

IJMSO, 8(3)
ISWC 2012

ICITST 2011
OCAS (ISWC 2011)

Page 76: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

sub-class relationships
OWL hasValue restrictions on data properties
OWL universal restrictions on object properties

semantically rich OWL axioms

<library>
  <book year="February 1890">
    <author>
      <name>Arthur Conan Doyle</name>
    </author>
    <title>The Sign of the Four</title>
  </book>
</library>

Title ⊑ ∀ value.string
Year ⊑ ∀ value.integer
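In the XML sample, the year attribute "February 1890" conflicts with the axiom that restricts Year values to integers. The sketch below reproduces that single check directly in Python with the standard library. It is a stand-in for illustration only: the approach on these slides validates the data against OWL axioms with RDF validation tools, which is not what this code does, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """
<library>
  <book year="February 1890">
    <author>
      <name>Arthur Conan Doyle</name>
    </author>
    <title>The Sign of the Four</title>
  </book>
</library>
"""


def is_integer(text):
    """True if text parses as an integer literal."""
    try:
        int(text)
        return True
    except (TypeError, ValueError):
        return False


def check_books(xml_text):
    """Report every book whose year attribute is not an integer."""
    violations = []
    root = ET.fromstring(xml_text)
    for book in root.iter("book"):
        year = book.get("year")
        if not is_integer(year):
            violations.append(f"year {year!r} is not an integer")
    return violations


print(check_books(XML))
```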

RQ2

Page 77: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

transformations based on formal logics
OWL axioms extracted out of XML Schemas
  explicitly
  implicitly

formally underpin transformations
to formally define and model semantics in a semantically correct way

complete extraction of XML Schemas' structural information
XML can directly be validated against semantically rich OWL axioms
any XML Schema is convertible to OWL
minimized effort designing OWL domain ontologies

contributions

IJMSO, 8(3)

RQ2

Page 78: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

ISWC 2012
ICITST 2011

OCAS (ISWC 2011)

RQ2

Page 79: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

step 1 of approach
  executed generic test cases created out of the XML Schema meta-model
  transformed XML Schemas of 6 XML standards

step 2 of approach
  specified SWRL rules for 3 OWL domain ontologies

verified hypothesis
  determined effort for traditional manual approach
  estimated effort for semi-automatic approach
  DDI-RDF serves as OWL domain ontology

The effort and the time needed to deliver high quality domain ontologies from scratch by reusing information of already existing XML Schemas is much less than creating domain ontologies completely manually and from the ground up.

evaluation

IJMSO, 8(3)

RQ2

Page 80: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

research question 5

Page 81: Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)

What role does reasoning play in practical data validation, and for which constraint types may reasoning be performed prior to validation to enhance data quality?

research question 5

RQ5

SEMANTiCS 2015

What role does reasoning play in practical data validation?

research question 5-1

RQ5

reasoning may resolve violations

Book ⊑ ∀ author.Person
Book(Huckleberry-Finn)
author(Huckleberry-Finn, Mark-Twain)

→ Person(Mark-Twain)
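A minimal sketch of how reasoning may resolve a violation, assuming the axiom is read both as a constraint (every author of a Book must be typed Person) and as an inference rule. The set-based data model and the function names `violations` and `reason` are illustrative assumptions, not the validation framework of the thesis.

```python
# Facts from the slide: class assertions and author assertions.
types = {("Huckleberry-Finn", "Book")}
authors = {("Huckleberry-Finn", "Mark-Twain")}


def violations(types, authors):
    """Constraint view: each author of a Book must be asserted as Person."""
    return [
        (book, person)
        for (book, person) in sorted(authors)
        if (book, "Book") in types and (person, "Person") not in types
    ]


def reason(types, authors):
    """Inference view: derive Person(x) for every author x of a Book."""
    inferred = {
        (person, "Person")
        for (book, person) in authors
        if (book, "Book") in types
    }
    return types | inferred


before = violations(types, authors)
after = violations(reason(types, authors), authors)
print(before, after)
```

Without reasoning, Mark-Twain is reported because Person(Mark-Twain) is not stated. After the inference step the type is present and the report is empty.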

RQ5

reasoning may cause violations

Publication ⊑ ∃ publisher.Publisher
Book(Huckleberry-Finn)

Book ⊑ Publication
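The opposite effect can be sketched the same way: once the subclass axiom is applied, Huckleberry-Finn becomes a Publication and the publisher constraint starts to apply to it. The data model, the single-level subclass lookup, and the simplification of the existential restriction to "has at least one publisher" are assumptions for illustration.

```python
# Facts from the slide; there is no publisher assertion at all.
types = {("Huckleberry-Finn", "Book")}
publishers = set()
SUBCLASS_OF = {"Book": "Publication"}


def reason(types):
    """Materialise subclass entailments (one level only, for brevity)."""
    inferred = {(x, SUBCLASS_OF[c]) for (x, c) in types if c in SUBCLASS_OF}
    return types | inferred


def violations(types, publishers):
    """Constraint view: every Publication needs at least one publisher."""
    with_publisher = {subject for (subject, _) in publishers}
    return sorted(
        x for (x, c) in types if c == "Publication" and x not in with_publisher
    )


before = violations(types, publishers)
after = violations(reason(types), publishers)
print(before, after)
```

Before reasoning nothing is typed Publication, so nothing is reported. After reasoning the missing publisher of Huckleberry-Finn shows up as a violation.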

RQ5

reasoning resolves redundancy

Publication ⊑ ∃ publicationDate.xsd:date

Book ⊑ Publication
Conference-Proceeding ⊑ Publication
Journal-Article ⊑ Publication

RQ5

For which constraint types may reasoning be performed prior to validation to enhance data quality?

research question 5-2

RQ5

constraint types with reasoning

> 2/5 of constraint types
property domains (R-25):

∃ author.⊤ ⊑ Publication
author(Alices-Adventures-In-Wonderland, Lewis-Carroll)

→ rdf:type(Alices-Adventures-In-Wonderland, Publication)

RQ5

constraint types without reasoning

< 3/5 of constraint types
literal pattern matching (R-44):

RQ5

:ISBN a rdfs:Datatype ;
    owl:equivalentClass [
        a rdfs:Datatype ;
        owl:onDatatype xsd:string ;
        owl:withRestrictions ( [ xsd:pattern "^\\d{9}[\\dX]$" ] )
    ] .

Book ⊑ ∀ identifier.ISBN
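Literal pattern matching needs no reasoning: each literal is checked on its own. The sketch below assumes the intended pattern is nine digits followed by a digit or "X", so the character class is written `[\dX]`. The function name `is_valid_isbn` and the sample literals are hypothetical.

```python
import re

# Assumed intent of the slide's pattern: nine digits, then a digit or "X".
ISBN_PATTERN = re.compile(r"^\d{9}[\dX]$")


def is_valid_isbn(literal):
    """Return True if the whole literal matches the ISBN pattern."""
    return ISBN_PATTERN.fullmatch(literal) is not None


# Hypothetical identifier literals of some Book individuals.
samples = ["0306406152", "123456789X", "123456789|", "12345"]
results = {literal: is_valid_isbn(literal) for literal in samples}
print(results)
```

Note that a character class written as `[\d|X]` would also accept a literal `|` as the last character, which is why the sketch drops the bar.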

For which constraint types do validation results differ
(1) if the CWA or the OWA and
(2) if the UNA or the nUNA is assumed?

CWA dependent: 56.8%
UNA dependent: 66.6%

research question 5-3

RQ5

CWA dependent constraint types

56.8% of constraint types
minimum qualified cardinality restrictions (R-75):

RQ5

Book ⊑ ∃ title.⊤
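A minimal sketch of why this constraint type depends on the closed-world assumption: under the CWA a missing title is reported, while under the OWA a title may exist but be unknown, so nothing can be reported. The function `validate_min_one_title`, the `closed_world` flag, and the individual Tom-Sawyer are hypothetical.

```python
def validate_min_one_title(books, titles, closed_world):
    """Return the books reported as lacking a title."""
    if not closed_world:
        # OWA: absence of a title statement is not evidence of absence.
        return []
    titled = {book for (book, _) in titles}
    # CWA: what is not stated is taken to be false.
    return sorted(book for book in books if book not in titled)


books = {"Huckleberry-Finn", "Tom-Sawyer"}
titles = {("Huckleberry-Finn", "The Adventures of Huckleberry Finn")}

cwa = validate_min_one_title(books, titles, closed_world=True)
owa = validate_min_one_title(books, titles, closed_world=False)
print(cwa, owa)
```

The same data yields one violation under the CWA and none under the OWA, which is the sense in which the validation result is CWA dependent.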

CWA independent constraint types

disjoint classes (R-7):

RQ5

Book ⊓ JournalArticle ⊑ ⊥

UNA dependent constraint types

66.6% of constraint types
functional properties (R-57/65):

RQ5

funct(title)

title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn")

title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")

UNA independent constraint types

literal value comparison (R-43):

RQ5

birthDate(Albert-Einstein, "1955-04-18")
deathDate(Albert-Einstein, "1879-03-14")

birthDate(Albert_Einstein, "1879-03-14")
deathDate(Albert_Einstein, "1955-04-18")

owl:sameAs(Albert-Einstein, Albert_Einstein)
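One simplified reading of this example is sketched below: the constraint "birth date must be before death date" compares literals, and the inconsistent pair is found whether or not the two names are merged via owl:sameAs. The canonical-name merging (direct sameAs pairs only) and the function name `violations` are assumptions for illustration, not the evaluation method of the thesis.

```python
from datetime import date

# Facts from the slide.
birth = {"Albert-Einstein": date(1955, 4, 18), "Albert_Einstein": date(1879, 3, 14)}
death = {"Albert-Einstein": date(1879, 3, 14), "Albert_Einstein": date(1955, 4, 18)}
same_as = [("Albert-Einstein", "Albert_Einstein")]


def violations(birth, death, same_as=()):
    """Report individuals having a birth date that is not before a death date."""
    # Map each name to a canonical representative (identity if nothing is merged).
    canon = {name: name for name in set(birth) | set(death)}
    for a, b in same_as:
        rep_a, rep_b = canon.get(a, a), canon.get(b, b)
        rep = min(rep_a, rep_b)
        for name in canon:
            if canon[name] in (rep_a, rep_b):
                canon[name] = rep
    found = set()
    for name, born in birth.items():
        for other, died in death.items():
            if canon[name] == canon[other] and born >= died:
                found.add(canon[name])
    return sorted(found)


una = violations(birth, death)            # names denote distinct individuals
nuna = violations(birth, death, same_as)  # names merged via owl:sameAs
print(una, nuna)
```

In both runs the same individual is reported, which matches the slide's classification of literal value comparison as UNA independent.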

evaluation

collected, classified, and implemented 115 constraints
from vocabularies or domain experts

on 3 common vocabularies
well-established (QB, SKOS)
under development (DDI-RDF)

evaluation

IJSC, 10(2)
ICSC 2016

33 SPARQL endpoints

classification of constraint types
RDFS/OWL based
constraint language based
SPARQL based

classification of constraints
informational
warning
error

evaluation

classification

RDFS/OWL based

evaluation

classification of constraint types

:Publication rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty :author ;
    owl:allValuesFrom :Person
] .

constraint language based

evaluation

classification of constraint types

:Publication {
    ( :isbn xsd:string, :title xsd:string ) |
    ( :issn xsd:string, :title xsd:string )
}

SPARQL based

evaluation

classification of constraint types

SELECT ?concept
WHERE {
    ?concept a [ rdfs:subClassOf* skos:Concept ] .
    FILTER NOT EXISTS {
        ?concept ?p ?o .
        FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) .
    }
}

evaluation

finding 1

C (constraints), CV (constraint violations); values in %

            C      CV
SPARQL      63.2   78.2
CL          34.7   21.8
RDFS/OWL    35.6   21.8

evaluation

finding 2

C (constraints), CV (constraint violations); values in %

            C      CV
SPARQL      63.2   78.2
CL          34.7   21.8
RDFS/OWL    35.6   21.8

evaluation

finding 3

C (constraints), CV (constraint violations); values in %

            C      CV
Info        42.3   31.3
Warning     18.7   62.7
Error       39.0    6.1

future work

future work: RQ1

publication of RDF vocabularies
DDI Alliance specifications
W3C recommendation for DDI-RDF

DDI-Lifecycle MD (Model-Driven)
new requirements based on experiences with DDI-RDF
international working group: DDI Moving Forward Project
individual contributions

formalize conceptual model (using UML 2)
conceptualize and implement diverse model serializations (e.g., RDFS/OWL)

future work

aligning PHDD and CSV on the Web
overlap in the description of tabular data in CSV format
broader scope of PHDD

description of tabular data with fixed record length
description of tabular data with multiple records per case

evaluation for use in DDI-Lifecycle MD

future work: RQ1

future work

future work: RQ2

bidirectional transformations from models of any meta-model to OWL
generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models
enable validation of any data against constraints extractable from models of any meta-model using common RDF validation tools

future work

future work: validation database and framework

maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
keep framework in sync
evaluate solutions

future work

http://purl.org/net/rdf-validation

future work: combine framework with SHACL

derive SHACL extensions
define mappings from SHACL to the abstraction layer and back
maintain consistency of implementations of constraint types

future work

W3C RDF Data Shapes Working Group

DCMI RDF Application Profiles Task Group