Data Quality Assessment for Linked Data: A Survey

Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Sören Auer

Data Quality Tutorial, September 12, 2016

Page 1: Linked Data Quality Assessment: A Survey

Data Quality Assessment for Linked Data: A Survey

Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, Sören Auer

Data Quality Tutorial, September 12, 2016

Page 2: Linked Data Quality Assessment: A Survey

Outline

Survey Methodology

LDQ Dimensions and Metrics

LDQ Assessment Tools

LDQ In Practice

Page 4: Linked Data Quality Assessment: A Survey

Survey Methodology - Steps I

Related Surveys

Research Questions

Eligibility Criteria

Search Strategy

Title & Abstract Reviewing

Page 5: Linked Data Quality Assessment: A Survey

Survey Methodology - Research Questions

• How can one assess the quality of Linked Data employing a conceptual framework integrating prior approaches?

• What are the data quality problems that each approach assesses?

• Which are the data quality dimensions and metrics supported by the proposed approaches?

• What kinds of tools are available for data quality assessment?

Page 6: Linked Data Quality Assessment: A Survey

Survey Methodology - Eligibility Criteria

Inclusion criteria:

Must satisfy:

• published between 2002 and 2014.

Should satisfy:

• data quality assessment

• trust assessment

• proposed and/or implemented an approach

• assessed the quality of LD or information systems based on LD

Exclusion criteria:

• not peer-reviewed

• published as a poster abstract

• data quality management

• other forms of structured data

• did not propose any methodology or framework

Page 7: Linked Data Quality Assessment: A Survey

Survey Methodology - Steps II

Remove duplicates

Further potential articles

Compare short-listed articles

Quantitative analysis

Qualitative analysis

Page 8: Linked Data Quality Assessment: A Survey

Survey Methodology - Results

30 core articles

Conference - 21

Journal - 8

Masters Thesis - 1

18 Dimensions

69 Metrics

Page 9: Linked Data Quality Assessment: A Survey

Outline

Survey Methodology

LDQ Dimensions and Metrics

LDQ Assessment Tools

LDQ In Practice

Page 10: Linked Data Quality Assessment: A Survey

LDQ Dimensions & Metrics

• Data Quality: commonly conceived as a multi-dimensional construct, popularly defined as ‘fitness for use’*.

• Dimension: a characteristic of a dataset.

• Metric (or indicator): a procedure for measuring an information quality dimension.

*Juran et al., The Quality Control Handbook, 1974

Page 11: Linked Data Quality Assessment: A Survey

18 LDQ Dimensions

Page 12: Linked Data Quality Assessment: A Survey

LDQ Dimensions - Accessibility dimensions & metrics

• Availability - extent to which data (or some portion of it) is present, obtainable and ready for use

    • accessibility of the SPARQL endpoint and the server

    • dereferenceability of the URI

• Interlinking - degree to which entities that represent the same concept are linked to each other, be it within or between two or more data sources

    • detection of the existence and usage of external URIs

    • detection of all local in-links or back-links: all triples from a dataset that have the resource’s URI as the object

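The two interlinking metrics on this slide can be sketched over an in-memory list of triples. This is a minimal illustration, not the survey's implementation: the tuple representation, the function names `external_links` and `in_links`, the example namespace, and the prefix test used to decide what counts as "external" are all assumptions.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), URIs as plain strings


def external_links(triples: Iterable[Triple], local_ns: str) -> List[Triple]:
    """Triples whose subject is local and whose object is a URI outside local_ns."""
    return [
        (s, p, o)
        for s, p, o in triples
        if s.startswith(local_ns)
        and o.startswith("http")  # assumption: anything else is treated as a literal
        and not o.startswith(local_ns)
    ]


def in_links(triples: Iterable[Triple], resource: str) -> List[Triple]:
    """Back-links: all triples that have the resource's URI as the object."""
    return [(s, p, o) for s, p, o in triples if o == resource]


EX = "http://example.org/"  # hypothetical local namespace
data = [
    (EX + "Berlin", "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/Berlin"),
    (EX + "Germany", EX + "capital", EX + "Berlin"),
    (EX + "Berlin", EX + "population", "3500000"),
]

print(len(external_links(data, EX)), "external link(s)")
print(len(in_links(data, EX + "Berlin")), "in-link(s) to Berlin")
```

A real assessment would run the same checks against a triple store or a SPARQL endpoint rather than a Python list.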

Page 13: Linked Data Quality Assessment: A Survey

LDQ Dimensions - Representational dimensions & metrics

• Interoperability - degree to which the format and structure of the information conform to previously returned information as well as data from other sources

    • detection of whether existing terms from all relevant vocabularies for that particular domain have been reused

    • usage of existing vocabularies for a particular domain

• Interpretability - refers to technical aspects of the data, that is, whether information is represented using an appropriate notation and whether the machine is able to process the data

    • detection of invalid usage of undefined classes and properties

    • detection of the use of appropriate language, symbols, units, datatypes and clear definitions

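The interpretability metric "invalid usage of undefined classes and properties" amounts to a set difference between the terms a dataset uses and the terms its vocabularies define. The sketch below assumes the defined terms are already available as Python sets; the function name `undefined_terms`, the example data, and the misspelled terms are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def undefined_terms(
    triples: Iterable[Triple],
    defined_classes: Set[str],
    defined_properties: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (undefined classes, undefined properties) used in the data."""
    used_classes: Set[str] = set()
    used_properties: Set[str] = set()
    for _s, p, o in triples:
        if p == RDF_TYPE:
            used_classes.add(o)  # the object of rdf:type is used as a class
        else:
            used_properties.add(p)
    return used_classes - defined_classes, used_properties - defined_properties


# Tiny excerpt of a vocabulary, given as plain sets for the example.
classes = {FOAF + "Person"}
properties = {FOAF + "name"}

data = [
    ("http://example.org/alice", RDF_TYPE, FOAF + "Persn"),  # misspelled class
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "nmae", "A."),  # misspelled property
]

bad_classes, bad_properties = undefined_terms(data, classes, properties)
print(bad_classes, bad_properties)
```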

Page 14: Linked Data Quality Assessment: A Survey

LDQ Dimensions - Intrinsic dimensions & metrics

• Syntactic Validity - degree to which an RDF document conforms to the specification of the serialization format

    • detecting syntax errors using (i) validators, (ii) crowdsourcing

    • detecting syntactically accurate values by (i) use of an explicit definition of the allowed values for a datatype, (ii) syntactic rules (type of characters allowed and/or the pattern of literal values)

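The "syntactic rules" variant of this metric can be illustrated with regular expressions over literal values. The patterns below are simplified assumptions for illustration: they are looser than the full XSD lexical rules (the date pattern, for instance, does not check month or day ranges), and the names `LEXICAL_RULES` and `is_well_formed` are hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical patterns per datatype (assumption: not the complete XSD rules).
LEXICAL_RULES = {
    XSD + "integer": re.compile(r"[+-]?\d+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "date": re.compile(r"-?\d{4,}-\d{2}-\d{2}(Z|[+-]\d{2}:\d{2})?"),
}


def is_well_formed(lexical_form: str, datatype: str) -> bool:
    """Check a literal's lexical form against the pattern for its datatype."""
    rule = LEXICAL_RULES.get(datatype)
    if rule is None:
        return True  # no rule known for this datatype, so the literal is not flagged
    return rule.fullmatch(lexical_form) is not None


literals = [
    ("42", XSD + "integer"),
    ("4.2", XSD + "integer"),
    ("2016-09-12", XSD + "date"),
    ("12/09/2016", XSD + "date"),
]
for value, datatype in literals:
    print(value, datatype, is_well_formed(value, datatype))
```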

Page 15: Linked Data Quality Assessment: A Survey

LDQ Dimensions - Intrinsic dimensions & metrics

• Completeness

    • Schema - ontology completeness

        • no. of classes and properties represented / total no. of classes and properties

    • Property - missing values for a specific property

        • no. of values represented for a specific property / total no. of values for a specific property

    • Population - % of all real-world objects of a particular type that are represented in the dataset

    • Interlinking - degree to which instances in the dataset are interlinked

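The schema and property completeness ratios can be written down directly. In this sketch the function names are hypothetical, and property completeness assumes that every subject in a given reference set is expected to carry at least one value for the property; a real assessment needs a gold standard to fix the denominator.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def schema_completeness(represented_terms: Set[str], all_terms: Set[str]) -> float:
    """No. of classes and properties represented / total no. of classes and properties."""
    if not all_terms:
        return 0.0
    return len(represented_terms & all_terms) / len(all_terms)


def property_completeness(triples: Iterable[Triple], subjects: Set[str], prop: str) -> float:
    """Share of the expected subjects that have at least one value for prop."""
    if not subjects:
        return 0.0
    with_value = {s for s, p, _o in triples if p == prop and s in subjects}
    return len(with_value) / len(subjects)


EX = "http://example.org/"  # hypothetical namespace
data = [
    (EX + "alice", EX + "birthDate", "1980-01-01"),
    (EX + "bob", EX + "birthDate", "1975-06-30"),
    (EX + "carol", EX + "name", "Carol"),
]
people = {EX + "alice", EX + "bob", EX + "carol"}

print(schema_completeness({"A", "B"}, {"A", "B", "C", "D"}))
print(property_completeness(data, people, EX + "birthDate"))
```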

Page 16: Linked Data Quality Assessment: A Survey

LDQ Dimensions - Contextual dimensions & metrics

• Understandability - refers to the ease with which data can be comprehended without ambiguity and be used by a human information consumer

    • human-readable labelling of classes, properties and entities as well as presence of metadata

    • indication of the vocabularies used in the dataset

• Timeliness - measures how up-to-date data is relative to a specific task

    • freshness of datasets based on currency and volatility

    • freshness of datasets based on their data source

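One way to combine currency and volatility into a freshness score is the formulation max(0, 1 - currency / volatility), which is common in the data quality literature. The slide does not give a formula, so treating timeliness this way is an assumption, as are the function names and the example dates.

```python
from datetime import datetime, timedelta


def currency(last_modified: datetime, observed_at: datetime) -> timedelta:
    """Age of the data: time elapsed since it was last modified."""
    return observed_at - last_modified


def timeliness(last_modified: datetime, observed_at: datetime, volatility: timedelta) -> float:
    """max(0, 1 - currency / volatility); 1.0 means fresh, 0.0 means outdated."""
    if volatility <= timedelta(0):
        raise ValueError("volatility must be a positive duration")
    age = currency(last_modified, observed_at)
    return max(0.0, 1.0 - age / volatility)


# Data modified 10 days before observation, assumed to stay valid for 40 days.
score = timeliness(
    last_modified=datetime(2016, 9, 2),
    observed_at=datetime(2016, 9, 12),
    volatility=timedelta(days=40),
)
print(score)
```

The score drops linearly with age and is clamped at zero once the data is older than its assumed validity period.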

Page 17: Linked Data Quality Assessment: A Survey

Outline

Survey Methodology

LDQ Dimensions and Metrics

LDQ Assessment Tools

LDQ In Practice

Page 18: Linked Data Quality Assessment: A Survey

LDQ Assessment Tools

Page 19: Linked Data Quality Assessment: A Survey

LDQ Assessment Tools - RDFUnit

http://aksw.org/Projects/RDFUnit.html

Syntactic Validity

Semantic Accuracy

Consistency

Page 20: Linked Data Quality Assessment: A Survey

LDQ Assessment Tools - Dacura

http://dacura.cs.tcd.ie/about-dacura/

Interpretability

Semantic Accuracy

Consistency

Page 21: Linked Data Quality Assessment: A Survey

Outline

Survey Methodology

LDQ Dimensions and Metrics

LDQ Assessment Tools

LDQ In Practice

Page 22: Linked Data Quality Assessment: A Survey

Linked Data Quality - In Practice

Linked Data Quality

Methodologies

Tools

Use Cases

Beyond Data

Vocabulary

Page 23: Linked Data Quality Assessment: A Survey

Crowdsourcing Linked Data Quality Assessment

Page 24: Linked Data Quality Assessment: A Survey

LDQ Assessment Tools - Luzzu

http://eis-bonn.github.io/Luzzu/index.html

1 Metric

2 Assess

3 Clean

4 Store

5 Rank

Page 25: Linked Data Quality Assessment: A Survey

LDQ Assessment Tools - LODLaundromat

http://lodlaundromat.org/

Page 26: Linked Data Quality Assessment: A Survey

LDQ Use Cases - Open Data Portals

Automated Quality Assessment of Metadata across Open Data Portals. Neumaier et al., JDIQ 2016.

Completeness

Interoperability

Relevancy

Accuracy

Openness

Page 27: Linked Data Quality Assessment: A Survey

LDQ Beyond Data - Mapping Quality

Dimou et al. Assessing and Refining Mappings to RDF to Improve Dataset Quality. ISWC 2015.

https://github.com/RMLio/RML-Validator

Page 28: Linked Data Quality Assessment: A Survey

W3C Data Quality Vocabulary

https://www.w3.org/TR/vocab-dqv/

Page 29: Linked Data Quality Assessment: A Survey

W3C Data Quality Vocabulary

https://www.w3.org/TR/vocab-dqv/

Diagram: DQV classes and properties

Classes: dqv:Category, dqv:Dimension, dqv:Metric, dqv:QualityMeasurement (qb:Observation), dqv:QualityMeasurementDataset (qb:DataSet)

Properties: dqv:inDimension, dqv:inCategory, dqv:isMeasurementOf, dqv:hasQualityMeasurement
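
The DQV terms on this slide can be put together into one quality measurement for a dataset. The sketch below builds the triples in plain Python and prints them as N-Triples. The `dqv:` terms come from the W3C vocabulary; the `http://example.org/` resources, the function names, and the value 0.75 are hypothetical.

```python
DQV = "http://www.w3.org/ns/dqv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for the example resources


def measurement_triples(dataset, measurement, metric, dimension, category, value):
    """Triples linking a dataset to one quality measurement of a metric."""
    return [
        (category, RDF_TYPE, DQV + "Category"),
        (dimension, RDF_TYPE, DQV + "Dimension"),
        (dimension, DQV + "inCategory", category),
        (metric, RDF_TYPE, DQV + "Metric"),
        (metric, DQV + "inDimension", dimension),
        (measurement, RDF_TYPE, DQV + "QualityMeasurement"),
        (measurement, DQV + "isMeasurementOf", metric),
        (measurement, DQV + "value", value),
        (dataset, DQV + "hasQualityMeasurement", measurement),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples; float objects become xsd:double literals."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, float):
            obj = f'"{o}"^^<{XSD}double>'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


triples = measurement_triples(
    dataset=EX + "dataset",
    measurement=EX + "measurement1",
    metric=EX + "propertyCompleteness",
    dimension=EX + "completeness",
    category=EX + "intrinsic",
    value=0.75,
)
print(to_ntriples(triples))
```

An RDF library would normally handle the serialization; plain strings are used here only to keep the example free of dependencies.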

Page 30: Linked Data Quality Assessment: A Survey

Challenges

• Propagation of errors

• Management/Improvement

• Usage of the standard vocabulary

• Quality-based search engines

Page 31: Linked Data Quality Assessment: A Survey

Thank you!

Questions?

[email protected] @AmrapaliZ

A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, S. Auer. Quality assessment for linked data: A survey. Semantic Web 7 (1), 63-93.