Data Quality Assessment for Linked Data: A Survey
Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Sören Auer
Data Quality Tutorial, September 12, 2016
Outline
Survey Methodology
LDQ Dimensions and Metrics
LDQ Assessment Tools
LDQ In Practice
Survey Methodology - Steps I
Related Surveys
Research Questions
Eligibility Criteria
Search Strategy
Title & Abstract Reviewing
Survey Methodology - Research Questions
• How can one assess the quality of Linked Data employing a conceptual framework integrating prior approaches?
• What are the data quality problems that each approach assesses?
• Which are the data quality dimensions and metrics supported by the proposed approaches?
• What kinds of tools are available for data quality assessment?
Survey Methodology - Eligibility Criteria
Inclusion criteria:
Must satisfy:
• published between 2002 and 2014.
Should satisfy:
• data quality assessment
• trust assessment
• proposed and/or implemented an approach
• assessed the quality of LD or information systems based on LD
Exclusion criteria:
• not peer-reviewed
• published as a poster abstract
• data quality management
• other forms of structured data
• did not propose any methodology or framework
Survey Methodology - Steps
Remove duplicates
Further potential articles
Compare shortlisted articles
Quantitative analysis
Qualitative analysis
Survey Methodology — Results
30 core articles
Conference - 21
Journal - 8
Master's Thesis - 1
18 Dimensions
69 Metrics
LDQ Dimensions & Metrics
• Data Quality: commonly conceived as a multi-dimensional construct, popularly defined as ‘fitness for use’*.
• Dimension: a characteristic of a dataset.
• Metric (or indicator): a procedure for measuring an information quality dimension.
*Juran et al., The Quality Control Handbook, 1974
18 LDQ Dimensions
LDQ Dimensions - Accessibility dimensions & metrics
• Availability - extent to which data (or some portion of it) is present, obtainable and ready for use
  • accessibility of the SPARQL endpoint and the server
  • dereferenceability of the URI
• Interlinking - degree to which entities that represent the same concept are linked to each other, be it within or between two or more data sources
  • detection of the existence and usage of external URIs
  • detection of all local in-links or back-links: all triples from a dataset that have the resource’s URI as the object
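The two interlinking metrics can be sketched over an in-memory list of triples. This is a minimal illustration, not a tool from the survey: the toy triples, the `LOCAL_HOST` value and the helper names are all hypothetical, and only object-position URIs are inspected.

```python
from urllib.parse import urlparse

# Hypothetical toy dataset: (subject, predicate, object) triples as plain strings.
TRIPLES = [
    ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/A"),
    ("http://example.org/b", "http://example.org/knows", "http://example.org/a"),
    ("http://example.org/c", "http://example.org/knows", "http://example.org/a"),
    ("http://example.org/c", "http://www.w3.org/2000/01/rdf-schema#label", "C"),
]

LOCAL_HOST = "example.org"  # assumed host of the dataset under assessment


def is_uri(term):
    """Treat any term starting with http:// or https:// as a URI (a simplification)."""
    return term.startswith(("http://", "https://"))


def external_links(triples, local_host):
    """Triples whose object is a URI hosted outside the local namespace."""
    return [t for t in triples if is_uri(t[2]) and urlparse(t[2]).netloc != local_host]


def back_links(triples, resource):
    """All triples in the dataset that have the resource's URI as the object."""
    return [t for t in triples if t[2] == resource]


external = external_links(TRIPLES, LOCAL_HOST)
inlinks = back_links(TRIPLES, "http://example.org/a")
print(len(external), len(inlinks))  # prints: 1 2
```

Availability checks (endpoint reachability, URI dereferencing) need live HTTP requests and are left out of this sketch.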
LDQ Dimensions - Representational dimensions & metrics
• Interoperability - degree to which the format and structure of the information conform to previously returned information as well as data from other sources
  • detection of whether existing terms from all relevant vocabularies for that particular domain have been reused
  • usage of existing vocabularies for a particular domain
• Interpretability - refers to technical aspects of the data, that is, whether information is represented using an appropriate notation and whether the machine is able to process the data
  • detection of invalid usage of undefined classes and properties
  • detecting the use of appropriate language, symbols, units, datatypes and clear definitions
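Detecting undefined classes and properties amounts to comparing the terms a dataset uses against the terms its vocabulary defines. A minimal sketch, assuming the vocabulary is already available as two plain sets; the `example.org` terms and the function name are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical vocabulary: the terms the schema actually defines.
DEFINED_CLASSES = {"http://example.org/vocab#Person"}
DEFINED_PROPERTIES = {RDF_TYPE, "http://example.org/vocab#name"}

TRIPLES = [
    ("http://example.org/a", RDF_TYPE, "http://example.org/vocab#Person"),
    ("http://example.org/a", "http://example.org/vocab#name", "Alice"),
    ("http://example.org/b", RDF_TYPE, "http://example.org/vocab#Persn"),  # misspelled class
    ("http://example.org/b", "http://example.org/vocab#nmae", "Bob"),      # misspelled property
]


def undefined_terms(triples, classes, properties):
    """Return (classes, properties) that are used in the data but not defined."""
    bad_classes = {o for _, p, o in triples if p == RDF_TYPE and o not in classes}
    bad_properties = {p for _, p, _ in triples if p not in properties}
    return bad_classes, bad_properties


bad_classes, bad_properties = undefined_terms(TRIPLES, DEFINED_CLASSES, DEFINED_PROPERTIES)
print(sorted(bad_classes))
print(sorted(bad_properties))
```

In practice the defined sets would be extracted from the ontology itself; here they are hard-coded to keep the example self-contained.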
LDQ Dimensions - Intrinsic dimensions & metrics
• Syntactic Validity - degree to which an RDF document conforms to the specification of the serialization format
  • detecting syntax errors using (i) validators, (ii) crowdsourcing
  • detecting syntactically inaccurate values by (i) explicit definition of the allowed values for a datatype, (ii) syntactic rules (type of characters allowed and/or the pattern of literal values)
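The syntactic-rule metric (pattern of literal values) can be illustrated with regular expressions keyed by datatype. The patterns below are deliberately simplified assumptions, not the full XSD lexical rules, and the function name is hypothetical.

```python
import re

# Simplified lexical patterns, assumed for illustration only.
# The date pattern checks shape, not calendar validity (e.g. month 13 would pass).
PATTERNS = {
    "xsd:integer": re.compile(r"[+-]?\d+"),
    "xsd:date": re.compile(r"-?\d{4,}-\d{2}-\d{2}(Z|[+-]\d{2}:\d{2})?"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
}


def is_valid_literal(lexical, datatype):
    """True/False if the datatype is covered by PATTERNS, None if it is not."""
    pattern = PATTERNS.get(datatype)
    if pattern is None:
        return None
    return pattern.fullmatch(lexical) is not None


LITERALS = [
    ("42", "xsd:integer"),
    ("4 2", "xsd:integer"),
    ("2016-09-12", "xsd:date"),
    ("12/09/2016", "xsd:date"),
]

results = [is_valid_literal(lexical, datatype) for lexical, datatype in LITERALS]
print(results)  # prints: [True, False, True, False]
```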
LDQ Dimensions - Intrinsic dimensions & metrics
• Completeness
  • Schema - ontology completeness
    • no. of classes and properties represented / total no. of classes and properties
  • Property - missing values for a specific property
    • no. of values represented for a specific property / total no. of values for a specific property
  • Population - percentage of all real-world objects of a particular type that are represented in the dataset
  • Interlinking - degree to which instances in the dataset are interlinked
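The schema and property completeness ratios above reduce to simple fractions. A worked sketch with made-up numbers; the term names, the `birth_dates` sample and the `ratio` helper are hypothetical.

```python
def ratio(represented, total):
    """Completeness as represented / total; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return represented / total


# Schema completeness: classes and properties used vs. those the ontology defines.
expected_terms = {"Person", "Organisation", "name", "birthDate"}
used_terms = {"Person", "name", "birthDate"}
schema_completeness = ratio(len(used_terms & expected_terms), len(expected_terms))

# Property completeness: entities that have a value for one specific property.
birth_dates = {"alice": "1980-01-01", "bob": None, "carol": "1975-05-20", "dave": None}
present = sum(value is not None for value in birth_dates.values())
property_completeness = ratio(present, len(birth_dates))

print(schema_completeness, property_completeness)  # prints: 0.75 0.5
```

Population completeness follows the same pattern but needs an external reference count of real-world objects, which is usually the hard part.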
LDQ Dimensions - Contextual dimensions & metrics
• Understandability - refers to the ease with which data can be comprehended without ambiguity and be used by a human information consumer
  • human-readable labelling of classes, properties and entities as well as presence of metadata
  • indication of the vocabularies used in the dataset
• Timeliness - measures how up-to-date data is relative to a specific task
  • freshness of datasets based on currency and volatility
  • freshness of datasets based on their data source
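One way to approximate the human-readable labelling metric is the share of described resources that carry a label. This is a sketch under the assumption that `rdfs:label` is the labelling property; the toy triples and the function name are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical toy dataset: resources a and c are labelled, b is not.
TRIPLES = [
    ("http://example.org/a", RDFS_LABEL, "Alice"),
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/knows", "http://example.org/c"),
    ("http://example.org/c", RDFS_LABEL, "Carol"),
]


def label_coverage(triples, label_property=RDFS_LABEL):
    """Fraction of distinct subjects that have at least one label."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p == label_property}
    return len(labelled) / len(subjects) if subjects else 0.0


coverage = label_coverage(TRIPLES)
print(round(coverage, 2))  # prints: 0.67
```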
LDQ Assessment Tools
LDQ Assessment Tools - RDFUnit
http://aksw.org/Projects/RDFUnit.html
Syntactic Validity
Semantic Accuracy
Consistency
LDQ Assessment Tools - Dacura
http://dacura.cs.tcd.ie/about-dacura/
Interpretability
Semantic Accuracy
Consistency
Linked Data Quality — In Practice
Linked Data Quality
Methodologies
Tools
Use Cases
Beyond Data
Vocabulary
Crowdsourcing Linked Data Quality Assessment
LDQ Assessment Tools — Luzzu
http://eis-bonn.github.io/Luzzu/index.html
1 Metric
2 Assess
3 Clean
4 Store
5 Rank
LDQ Assessment Tools — LODLaundromat
http://lodlaundromat.org/
LDQ Use Cases — Open Data Portals
Neumaier et al. Automated Quality Assessment of Metadata across Open Data Portals. JDIQ 2016.
Completeness
Interoperability
Relevancy
Accuracy
Openness
LDQ Beyond Data — Mapping Quality
Dimou et al. Assessing and Refining Mappings to RDF to Improve Dataset Quality. ISWC 2015.
https://github.com/RMLio/RML-Validator
W3C Data Quality Vocabulary
https://www.w3.org/TR/vocab-dqv/
Classes: dqv:Category, dqv:Dimension, dqv:Metric, dqv:QualityMeasurement (qb:Observation), dqv:QualityMeasurementDataset (qb:DataSet)
Properties: dqv:inDimension, dqv:inCategory, dqv:isMeasurementOf, dqv:hasQualityMeasurement
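To show how these terms fit together, the sketch below builds the triples for a single quality measurement as plain Python tuples. The `example.org` resource names, the value `0.75` and the helper function are hypothetical; `dqv:computedOn` and `dqv:value` are not on the slide and are taken from the W3C DQV specification.

```python
DQV = "http://www.w3.org/ns/dqv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the described resources


def describe_measurement(dataset, measurement, metric, dimension, category, value):
    """Triples linking dataset -> measurement -> metric -> dimension -> category."""
    return [
        (category, RDF_TYPE, DQV + "Category"),
        (dimension, RDF_TYPE, DQV + "Dimension"),
        (dimension, DQV + "inCategory", category),
        (metric, RDF_TYPE, DQV + "Metric"),
        (metric, DQV + "inDimension", dimension),
        (measurement, RDF_TYPE, DQV + "QualityMeasurement"),
        (measurement, DQV + "isMeasurementOf", metric),
        (measurement, DQV + "computedOn", dataset),  # from the DQV spec, not the slide
        (measurement, DQV + "value", value),         # from the DQV spec, not the slide
        (dataset, DQV + "hasQualityMeasurement", measurement),
    ]


triples = describe_measurement(
    dataset=EX + "dataset",
    measurement=EX + "measurement1",
    metric=EX + "schemaCompletenessMetric",
    dimension=EX + "completeness",
    category=EX + "intrinsic",
    value="0.75",
)

for s, p, o in triples:
    print(s, p, o)
```

An RDF library would normally handle serialization and literal typing; tuples are used here only to keep the example dependency-free.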
Challenges
• Propagation of errors
• Management/Improvement
• Usage of the standard vocabulary
• Quality-based search engines
Thank you!
Questions?
@AmrapaliZ
A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer. Quality assessment for linked data: A survey. Semantic Web 7 (1), 63-93.