doctoral examination at the karlsruhe institute of technology (08.07.2016)
TRANSCRIPT
KIT – The Research University in the Helmholtz Association www.kit.edu
Validation Framework for RDF-based Constraint Languages
M.Sc. (TUM) Thomas Hartmann
Professor Dr. York Sure-Vetter
Professor Dr. Kai Eckert (Stuttgart Media University)
Professor Dr. Rudi Studer
Professor Dr. Andreas Geyer-Schulz
Disputation, 08.07.2016
enthusiasm for Semantic Web (SW) technologies
problem statement
common need for RDF Validation
problem statement
common needs of data practitioners
2013: W3C RDF Validation Workshop
2014: 2 international working groups on RDF validation
constraint languages
SPARQL Query Language for RDF
SPARQL Inferencing Notation (SPIN)
Web Ontology Language (OWL)
Shape Expressions (ShEx)
Resource Shapes (ReSh)
Description Set Profiles (DSP)
Shapes Constraint Language (SHACL)
none of these languages meets all requirements
RDF validation as research field
problem statement
W3C RDF Data Shapes Working Group
DCMI RDF Application Profiles Task Group
Resource Description Framework (RDF)
constraints of running example
provide a basis for continued research on
RDF validation
development of constraint languages
further development of constraint languages based on commonly approved requirements
incorporate the findings into the working groups
thesis objectives
5 research questions
Which types of research data and related metadata are not yet representable in RDF and
how to adequately model them to be able to validate RDF data
against constraints extractable from these vocabularies?
research question 1
RQ1
IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46
LDOW (WWW 2013)
SemStats (ISWC 2013)
DC 2012
ESWC 2011 (Poster)
DDI Moving Forward Project
RDF Vocabularies Working Group
How to directly validate XML data against semantically rich OWL axioms
using common RDF validation tools when XML Schemas, adequately representing particular domains,
have already been designed?
research question 2
RQ2
IJMSO, 8(3)
ISWC 2012
ICITST 2011
OCAS (ISWC 2011)
research question 3
http://purl.org/net/rdf-validation
DC 2014
RQ3
Which types of constraints must be expressible by constraint languages to meet
all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?
research question 3
RQ3
a constraint is instantiated from a constraint type
each constraint type corresponds to a requirement
81 constraint types
types of constraints on RDF data
RQ3
research question 4
minimum qualified cardinality restrictions (R-75)
ShEx:
:Book {
  :author @:Person{1,}
}
ReSh:
:Book a rs:ResourceShape ;
  rs:property [
    rs:propertyDefinition :author ;
    rs:valueShape :Person ;
    rs:occurs rs:One-or-many ;
  ] .
SHACL:
:BookShape a sh:Shape ;
  sh:scopeClass :Book ;
  sh:property [
    sh:predicate :author ;
    sh:valueShape :PersonShape ;
    sh:minCount 1 ;
  ] .
:PersonShape a sh:Shape ;
  sh:scopeClass :Person .
RQ4
minimum qualified cardinality restrictions (R-75)
SPARQL and SPIN:
CONSTRUCT {
  [ a spin:ConstraintViolation ... . ]
}
WHERE {
  ?subject a ?C1 ;
    ?predicate ?object .
  BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
  BIND ( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) .
  FILTER ( ?cardinality < ?minimumCardinality ) .
  FILTER ( ?minimumCardinality = 1 ) .
  FILTER ( ?C1 = :Book ) .
  FILTER ( ?C2 = :Person ) .
  FILTER ( ?predicate = :author ) .
}
SELECT ( COUNT ( ?arg1 ) AS ?c )
WHERE {
  ?arg1 ?arg2 ?object .
  ?object a ?arg3 .
}
RQ4
minimum qualified cardinality restrictions (R-75)
OWL:
:Book rdfs:subClassOf [
  a owl:Restriction ;
  owl:minQualifiedCardinality 1 ;
  owl:onProperty :author ;
  owl:onClass :Person
] .
DSP:
[ dsp:resourceClass :Book ;
  dsp:statementTemplate [
    dsp:minOccur 1 ;
    dsp:property :author ;
    dsp:nonLiteralConstraint [ dsp:valueClass :Person ]
  ]
] .
RQ4
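The R-75 constraint shown in the different languages above (every :Book needs at least one :author that is a :Person) can also be checked procedurally. The following is a minimal sketch in plain Python, not the SPIN implementation of the thesis; the tuple-based triple store, the function name, and the resource :Untitled-Draft are assumptions made for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def min_qualified_cardinality_violations(triples, source_class, prop, target_class, minimum):
    """Return instances of source_class with fewer than `minimum`
    values of `prop` that are explicitly typed as target_class."""
    # Collect the asserted classes of every resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    violations = []
    for subject in sorted(s for s, classes in types.items() if source_class in classes):
        # Count only values that carry the required (qualifying) class.
        qualified = {
            o for s, p, o in triples
            if s == subject and p == prop and target_class in types.get(o, set())
        }
        if len(qualified) < minimum:
            violations.append(subject)
    return violations


# Hypothetical data: the second book has no author at all.
triples = [
    (":Huckleberry-Finn", RDF_TYPE, ":Book"),
    (":Huckleberry-Finn", ":author", ":Mark-Twain"),
    (":Mark-Twain", RDF_TYPE, ":Person"),
    (":Untitled-Draft", RDF_TYPE, ":Book"),
]

print(min_qualified_cardinality_violations(triples, ":Book", ":author", ":Person", 1))
```

The sketch only considers asserted types; it performs no reasoning before validation.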
high-level constraint languages either
lack an implementation or
are based on different implementations
How to consistently validate RDF data against constraints of any constraint type
expressed in any RDF-based constraint language?
research question 4-1
RQ4
validation environment
constraint language implementation (SPIN mapping):
:MinimumQualifiedCardinalityRestrictions a spin:ConstructTemplate ; spin:body [ ... CONSTRUCT { ... } WHERE { ... } ... ] .
RQ4
validation process
RQ4
validation results
RQ4
full implementations for
all OWL 2 and DSP language constructs
all constraint types expressible in OWL 2 and DSP
major constraint types representable by ShEx and ReSh
RDF serialization for DSP
validation environment
http://purl.org/net/rdfval-demo
RQ4
constraints and constraint language constructs must be representable in RDF
constraint languages and supported constraint types must be expressible in SPARQL
limitations
RQ4
How to represent constraints of any constraint type and how to reduce the representation of constraints of any constraint type
to the absolute minimum?
research question 4-2
RQ4
language   expressible constraint types in % (count of 81)
DSP         17.3 (14)
ReSh        25.9 (21)
ShEx        29.6 (24)
SHACL       51.9 (42)
OWL 2       67.9 (55)
SPARQL     100.0 (81)
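The percentages above follow from the counts in parentheses divided by the 81 constraint types. A small sketch of that arithmetic; the assignment of each value to a language assumes the values are listed in the same order as the language names.

```python
TOTAL_CONSTRAINT_TYPES = 81

# Assumption: counts are paired with the languages in the order given on the slide.
expressible = {"DSP": 14, "ReSh": 21, "ShEx": 24, "SHACL": 42, "OWL 2": 55, "SPARQL": 81}


def coverage(count, total=TOTAL_CONSTRAINT_TYPES):
    """Share of constraint types a language can express, in percent (one decimal)."""
    return round(100 * count / total, 1)


for language, count in expressible.items():
    print(f"{language:7s} {coverage(count):5.1f} ({count})")
```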
intermediate abstraction layer
based on formal logics
makes it possible to express any constraint type
enables straightforward mappings from high-level constraint languages
reduces the representation of constraints to the absolute minimum
validation framework for RDF-based constraint languages
RQ4
conceptual model
DC 2015
RQ4
74%
26%
simple constraints
different validation results
RQ4
How to ensure for any constraint type that RDF data is consistently validated against
semantically equivalent constraints of the same constraint type across RDF-based constraint languages?
framework is solely based on the abstract definitions of constraint types
just 1 SPIN mapping for each constraint type
research question 4-3
RQ4
semantically equivalent constraints
RQ4
How to ensure for any constraint type that semantically equivalent constraints of the same constraint type
can be transformed from one RDF-based constraint language to another?
gc = m_α(c_α)
c_β = m'_β(gc)
RQ4
research question 4-4
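The two mappings can be read as a round trip through a generic constraint gc: a mapping m_α lifts a constraint c_α of language α into the generic representation, and a mapping m'_β lowers gc into language β. A minimal sketch of that idea, assuming a dictionary as a stand-in for a parsed ShEx constraint and OWL 2 (Turtle) as the target; the class GenericConstraint and both function names are hypothetical and do not reproduce the thesis vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericConstraint:
    """Hypothetical language-independent representation (gc)."""
    constraint_type: str
    context_class: str
    prop: str
    value_class: str
    minimum: int


def m_shex(shex_like: dict) -> GenericConstraint:
    """m_alpha: map a ShEx-like constraint (here a plain dict) to gc."""
    return GenericConstraint(
        constraint_type="minimum qualified cardinality restrictions",
        context_class=shex_like["shape"],
        prop=shex_like["predicate"],
        value_class=shex_like["value_shape"],
        minimum=shex_like["min"],
    )


def m_prime_owl(gc: GenericConstraint) -> str:
    """m'_beta: map gc to an OWL 2 axiom serialized as Turtle."""
    return (
        f"{gc.context_class} rdfs:subClassOf [ a owl:Restriction ; "
        f"owl:minQualifiedCardinality {gc.minimum} ; "
        f"owl:onProperty {gc.prop} ; owl:onClass {gc.value_class} ] ."
    )


# c_alpha: the R-75 running example, :Book { :author @:Person{1,} }
c_alpha = {"shape": ":Book", "predicate": ":author", "value_shape": ":Person", "min": 1}
gc = m_shex(c_alpha)
c_beta = m_prime_owl(gc)
print(c_beta)
```

With one mapping into and one mapping out of the generic form per language, n languages need 2n mappings instead of a separate mapping for every ordered pair of languages.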
What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed
prior to validation to enhance data quality?
research question 5
RQ5
SEMANTiCS 2015
collected, classified, and implemented 115 constraints
from vocabularies or domain experts
on 3 common vocabularies
well-established (QB, SKOS)
under development (DDI-RDF)
evaluation
IJSC, 10(2)
ICSC 2016
33 SPARQL endpoints
future work: validation database and framework
maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
keep framework in sync
evaluate solutions
future work
http://purl.org/net/rdf-validation
future work: combine framework with SHACL
derive SHACL extensions
define mappings from SHACL to the abstraction layer and back
maintain consistency of implementations of constraint types
future work
W3C RDF Data Shapes Working Group
DCMI RDF Application Profiles Task Group
summary of main contributions
development of 3 RDF vocabularies
direct validation of XML using common RDF validation tools
publication of 81 constraint types
validation framework for RDF-based constraint languages
role of reasoning for RDF validation
THANK YOU!
acknowledgements, publications, research data
30 publications
6 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports
first author of all (except 1) journal articles, conference articles, workshop articles
research data and results
KIT research data repository: http://dx.doi.org/10.5445/BWDD/11
GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis
4 international working groups
DCMI RDF Application Profiles Task Group
part of the editorial board
RDF Vocabularies Working Group
editor for DDI-RDF and PHDD
W3C RDF Data Shapes Working Group
DDI Moving Forward Project
appendix
publications: journal articles
1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Directing the Development of Constraint Languages by Checking Constraints on RDF Data. International Journal of Semantic Computing, 10(02), 1–25. http://www.worldscientific.com/worldscinet/ijsc
2. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4
3. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4
4. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4
5. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4
6. Bosch, Thomas & Mathiak, B. (2013). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760
Please note that in 2015, my last name changed from Bosch to Hartmann.
publications: articles in conference proceedings
1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/
2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368
3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867
4. Bosch, Thomas & Eckert, K. (2014). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257
5. Bosch, Thomas & Eckert, K. (2014). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270
6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654
7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34
8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html
9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html
publications: articles in workshop proceedings
1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/
2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings
3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/
publications: specifications
1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery
2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html
publications: technical reports
1. Hartmann, Thomas (2016). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062
2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02
3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements
4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable
5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933
6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479
7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478
8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470
9. Bosch, Thomas & Mathiak, B. (2013). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/
10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series
research questions
1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?
2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?
3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?
4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another?
5. What is the role reasoning plays in practical data validation and for which constraint types reasoning may be performed prior to validation to enhance data quality?
summary of contributions
1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies
2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains
3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints
4.1 Consistent validation across RDF-based constraint languages
4.2 Minimal representation of constraints of any type
4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages
4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another
5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics
6. Evaluation of the usability of constraint types for assessing RDF data quality
summary of limitations
1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way
2. Constraints of supported constraint types and constraint language constructs must be representable in RDF
3. Constraint languages and supported constraint types must be expressible in SPARQL
4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies
research question 1
development of 3 RDF vocabularies:
1. DDI-RDF Discovery Vocabulary (DDI-RDF)
to describe unit-record data
2. Physical Data Description (PHDD)
to describe data in tabular format and its physical properties
3. The SKOS Extension for Statistics (XKOS)
to describe the structure and textual properties of formal statistical classifications
to describe relations between classifications and concepts and among concepts
contribution
RQ1
research question 2
XML, XML Schema (XSD)
RDF, Web Ontology Language (OWL)
XML Schemas > OWL ontologies
time-consuming work designing domain ontologies from scratch by hand
reuse information contained in XML Schemas
designing OWL domain ontologies
RQ2
semantically rich OWL axioms
sub-class relationships
OWL hasValue restrictions on data properties
OWL universal restrictions on object properties
<library>
  <book year="February 1890">
    <author>
      <name>Arthur Conan Doyle</name>
    </author>
    <title>The Sign of the Four</title>
  </book>
</library>
Title ⊑ ∀ value.string
Year ⊑ ∀ value.integer
RQ2
transformations based on formal logics
OWL axioms extracted from XML Schemas
explicitly
implicitly
formally underpin transformations
to formally define and model semantics in a semantically correct way
complete extraction of XML Schemas' structural information
XML can be validated directly against semantically rich OWL axioms
any XML Schema is convertible to OWL
minimized effort designing OWL domain ontologies
contributions
IJMSO, 8(3)
RQ2
ISWC 2012
ICITST 2011
OCAS (ISWC 2011)
RQ2
step 1 of the approach
executed generic test cases created from the XML Schema meta-model
transformed XML Schemas of 6 XML standards
step 2 of the approach
specified SWRL rules for 3 OWL domain ontologies
verified hypothesis
determined effort for traditional manual approach
estimated effort for semi-automatic approach
DDI-RDF serves as OWL domain ontology
The effort and time needed to deliver high-quality domain ontologies by reusing information from already existing XML Schemas are much lower than when creating domain ontologies completely manually and from the ground up.
evaluation
IJMSO, 8(3)
RQ2
research question 5
What is the role reasoning plays in practical data validation?
research question 5-1
RQ5
reasoning may resolve violations
Book ⊑ ∀ author.Person
Book(Huckleberry-Finn)
author(Huckleberry-Finn, Mark-Twain)
→ Person(Mark-Twain)
RQ5
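The effect can be reproduced with a few lines of Python. This is a minimal sketch, assuming the axiom is read as a universal restriction (every author of a Book is a Person) and that validation is a closed-world check requiring the type to be stated explicitly; the function names and the dictionary-based data model are hypothetical.

```python
def violations(types, assertions, restrictions):
    """Closed-world check: report p(s, o) if s is a D, D restricts p to C,
    and o is not explicitly typed as C."""
    return [
        (s, p, o)
        for s, p, o in assertions
        for d, prop, c in restrictions
        if p == prop and d in types.get(s, set()) and c not in types.get(o, set())
    ]


def infer_types(types, assertions, restrictions):
    """Single reasoning pass: derive C(o) from D(s), p(s, o) and the restriction (D, p, C)."""
    inferred = {k: set(v) for k, v in types.items()}
    for s, p, o in assertions:
        for d, prop, c in restrictions:
            if p == prop and d in inferred.get(s, set()):
                inferred.setdefault(o, set()).add(c)
    return inferred


types = {"Huckleberry-Finn": {"Book"}}
assertions = [("Huckleberry-Finn", "author", "Mark-Twain")]
restrictions = [("Book", "author", "Person")]

print(violations(types, assertions, restrictions))  # without reasoning
print(violations(infer_types(types, assertions, restrictions), assertions, restrictions))  # with reasoning
```

Without reasoning the missing type of Mark-Twain is reported; after the inference step the same check reports nothing.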
reasoning may cause violations
Book ⊑ Publication
Publication ⊑ ∃ publisher.Publisher
Book(Huckleberry-Finn)
RQ5
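The opposite effect can be sketched the same way: once sub-class reasoning derives that the book is a Publication, the publisher constraint applies and is violated. A minimal sketch under the assumption that the existential axiom is checked in a closed-world manner; function names and data structures are hypothetical.

```python
def with_subclass_reasoning(types, subclass_of):
    """Add all super-classes implied by (sub, super) pairs until nothing changes."""
    closed = {k: set(v) for k, v in types.items()}
    changed = True
    while changed:
        changed = False
        for classes in closed.values():
            for sub, sup in subclass_of:
                if sub in classes and sup not in classes:
                    classes.add(sup)
                    changed = True
    return closed


def missing_publisher(types, triples):
    """Publications without any publisher value that is typed as Publisher."""
    return sorted(
        s for s, classes in types.items()
        if "Publication" in classes
        and not any(
            subj == s and p == "publisher" and "Publisher" in types.get(o, set())
            for subj, p, o in triples
        )
    )


types = {"Huckleberry-Finn": {"Book"}}
subclass_of = [("Book", "Publication")]
triples = []  # no publisher statement in the data

print(missing_publisher(types, triples))  # without reasoning
print(missing_publisher(with_subclass_reasoning(types, subclass_of), triples))  # with reasoning
```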
reasoning resolves redundancy
Publication ⊑ ∃ publicationDate.xsd:date
Book ⊑ Publication
Conference-Proceeding ⊑ Publication
Journal-Article ⊑ Publication
RQ5
For which constraint types reasoning may be performed prior to validation to enhance data quality?
research question 5-2
RQ5
constraint types with reasoning
> 2/5 of constraint types
property domains (R-25):
∃ author.⊤ ⊑ Publication
author(Alices-Adventures-In-Wonderland, Lewis-Carroll)
→ rdf:type(Alices-Adventures-In-Wonderland, Publication)
RQ5
constraint types without reasoning
< 3/5 of constraint types
literal pattern matching (R-44):
ISBN a rdfs:Datatype ;
  owl:equivalentClass [
    a rdfs:Datatype ;
    owl:onDatatype xsd:string ;
    owl:withRestrictions ( [ xsd:pattern "^\\d{9}[\\dX]$" ] )
  ] .
Book ⊑ ∀ identifier.ISBN
RQ5
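Literal pattern matching needs no reasoning: each literal is simply tested against the pattern. A minimal sketch of the intended check (nine digits followed by a digit or X), using Python's re module as a stand-in for the regular-expression engine of a validation tool; the function name is hypothetical, and the check covers only the pattern, not the ISBN checksum.

```python
import re

# Nine digits followed by a digit or the letter X.
ISBN_PATTERN = re.compile(r"\d{9}[\dX]")


def matches_isbn_pattern(value: str) -> bool:
    """True if the whole literal matches the pattern."""
    return ISBN_PATTERN.fullmatch(value) is not None


for candidate in ["030640615X", "0306406152", "12345", "030640615|"]:
    print(candidate, matches_isbn_pattern(candidate))
```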
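For which constraint types do validation results differ
(1) if the closed-world assumption (CWA) or the open-world assumption (OWA) and
(2) if the unique name assumption (UNA) or the non-unique name assumption (nUNA) is assumed?
CWA dependent: 56.8%
UNA dependent: 66.6%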
research question 5-3
RQ5
CWA dependent constraint types
56.8% of constraint types
minimum qualified cardinality restrictions (R-75):
Book ⊑ ∃ title.⊤
RQ5
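Why the result depends on the world assumption can be shown with the axiom above, read as "every Book has at least one title". A minimal sketch: under the CWA a missing title counts as a violation, while under the OWA the title may simply be unknown, so nothing is reported. The function name, the flag closed_world, and the resource Tom-Sawyer are assumptions for illustration.

```python
def validate_min_one_title(books, titles, closed_world=True):
    """Return the books reported as violating 'every book has at least one title'."""
    if not closed_world:
        # OWA: absence of a statement is not evidence of absence, nothing is reported.
        return []
    # CWA: what is not stated is assumed to be false, so a missing title is a violation.
    return sorted(b for b in books if not titles.get(b))


books = ["Huckleberry-Finn", "Tom-Sawyer"]  # Tom-Sawyer is hypothetical test data
titles = {"Huckleberry-Finn": ["The Adventures of Huckleberry Finn"]}

print(validate_min_one_title(books, titles, closed_world=True))
print(validate_min_one_title(books, titles, closed_world=False))
```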
CWA independent constraint types
disjoint classes (R-7):
Book ⊓ JournalArticle ⊑ ⊥
RQ5
UNA dependent constraint types
66.6% of constraint types
functional properties (R-57/65):
RQ5
funct(title)
title(The-Adventures-of-Huckleberry-Finn, "The Adventures of Huckleberry Finn")
title(The-Adventures-of-Huckleberry-Finn, "Die Abenteuer des Huckleberry Finn")
UNA independent constraint types
literal value comparison (R-43):
birthDate(Albert-Einstein, "1955-04-18")
deathDate(Albert-Einstein, "1879-03-14")
birthDate(Albert_Einstein, "1879-03-14")
deathDate(Albert_Einstein, "1955-04-18")
owl:sameAs(Albert-Einstein, Albert_Einstein)
RQ5
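A literal value comparison such as "birth date before death date" compares the literals attached to one identifier. A minimal sketch with the dates of the example; it checks each identifier separately and does not merge identifiers linked by owl:sameAs, which is a simplification made here. The function name and the dictionary layout are hypothetical.

```python
from datetime import date


def birth_after_death(birth_dates, death_dates):
    """Identifiers whose birth date lies after their death date."""
    return sorted(
        person for person in birth_dates
        if person in death_dates
        and date.fromisoformat(birth_dates[person]) > date.fromisoformat(death_dates[person])
    )


birth_dates = {"Albert-Einstein": "1955-04-18", "Albert_Einstein": "1879-03-14"}
death_dates = {"Albert-Einstein": "1879-03-14", "Albert_Einstein": "1955-04-18"}

print(birth_after_death(birth_dates, death_dates))
```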
evaluation
classification of constraint types
RDFS/OWL based
constraint language based
SPARQL based
classification of constraints
informational
warning
error
evaluation
classification
evaluation
classification of constraint types
RDFS/OWL based
:Publication rdfs:subClassOf [
  a owl:Restriction ;
  owl:onProperty :author ;
  owl:allValuesFrom :Person
] .
evaluation
classification of constraint types
constraint language based
:Publication {
  ( :isbn xsd:string, :title xsd:string )
  | ( :issn xsd:string, :title xsd:string )
}
evaluation
classification of constraint types
SPARQL based
SELECT ?concept
WHERE {
  ?concept a [ rdfs:subClassOf* skos:Concept ] .
  FILTER NOT EXISTS {
    ?concept ?p ?o .
    FILTER ( ?p IN ( skos:related, skos:relatedMatch, skos:broader, ... ) ) .
  }
}
evaluation
findings 1 and 2
C (constraints), CV (constraint violations), values in %
            C      CV
SPARQL      63.2   78.2
CL          34.7   21.8
RDFS/OWL    35.6   21.8
evaluation
finding 3
C (constraints), CV (constraint violations), values in %
            C      CV
Info        42.3   31.3
Warning     18.7   62.7
Error       39.0    6.1
future work
future work: RQ1
publication of RDF vocabularies
DDI Alliance specifications
W3C recommendation for DDI-RDF
DDI-Lifecycle MD (Model-Driven)
new requirements based on experiences with DDI-RDF
international working group: DDI Moving Forward Project
individual contributions
formalize conceptual model (using UML 2)
conceptualize and implement diverse model serializations (e.g., RDFS/OWL)
future work
aligning PHDD and CSV on the Web
overlap in the description of tabular data in CSV format
broader scope of PHDD
description of tabular data with fixed record length
description of tabular data with multiple records per case
evaluation for use in DDI-Lifecycle MD
future work: RQ1
future work
future work: RQ2
bidirectional transformations from models of any meta-model to OWL
generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models
enable validation of any data against constraints extractable from models of any meta-model using common RDF validation tools
future work