Crowdsourcing Linked Data Quality Assessment



TRANSCRIPT

Page 1: Crowdsourcing Linked Data Quality Assessment

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu

@ISWC2013

Crowdsourcing Linked Data Quality Assessment
Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer and Jens Lehmann

Page 2: Crowdsourcing Linked Data Quality Assessment


Motivation


Varying quality of Linked Data sources

Some quality issues require interpretation that humans can perform easily

Solution: Include human verification in the process of LD quality assessment

Direct application: detecting patterns in the errors may help identify (and correct) flaws in the extraction mechanisms

dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3”.
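A quality problem of this kind can already be surfaced automatically before any crowdsourcing. A minimal sketch, assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper library (the endpoint, prefixes and limit are illustrative), that lists dbprop:dateOfBirth values which are not valid xsd:date literals:

```python
# Minimal sketch: find suspicious dbprop:dateOfBirth values in DBpedia.
# Endpoint URL, query and limit are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?birth WHERE {
  ?s dbprop:dateOfBirth ?birth .
  FILTER (isLiteral(?birth) && datatype(?birth) != xsd:date)
}
LIMIT 100
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    # Values such as "3" end up here and are candidates for human verification.
    print(row["s"]["value"], row["birth"]["value"])
```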

Page 3: Crowdsourcing Linked Data Quality Assessment


Research questions

RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?

RQ2: What type of crowd is most suitable for each type of quality issue?

RQ3: Which types of errors are made by lay users and experts when assessing RDF triples?


Page 4: Crowdsourcing Linked Data Quality Assessment


Related work


Crowdsourcing & Linked Data: ZenCrowd (entity resolution), CrowdMAP (ontology alignment), games with a purpose (GWAP) for LD, assessing LD mappings (automatic).

Web of Data quality assessment: quality characteristics of LD data sources (semi-automatic); DBpedia; WIQA, Sieve (manual).

Our work: at the intersection of crowdsourcing & Linked Data and Web of Data quality assessment.

Page 5: Crowdsourcing Linked Data Quality Assessment


OUR APPROACH


Page 6: Crowdsourcing Linked Data Quality Assessment


Methodology

Steps to implement the methodology:

1. Selecting LD quality issues to crowdsource
2. Selecting the appropriate crowdsourcing approaches
3. Designing and generating the interfaces to present the data to the crowd

Workflow: triples {s p o .} from the dataset pass through steps 1-3; each triple is classified as correct, or as incorrect together with the quality issue it exhibits.
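As a rough illustration of this workflow's outcome, here is a minimal sketch of how an assessed triple could be represented; the class and field names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the assessment outcome per triple (illustrative names only).
from dataclasses import dataclass
from typing import Optional

@dataclass
class TripleAssessment:
    subject: str
    predicate: str
    obj: str
    correct: bool                        # verdict after the Find and Verify stages
    quality_issue: Optional[str] = None  # set when correct is False, e.g. "incorrect object"

example = TripleAssessment(
    subject="dbpedia:Dave_Dobbyn",
    predicate="dbprop:dateOfBirth",
    obj='"3"',
    correct=False,
    quality_issue="incorrect object",
)
print(example)
```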

Page 7: Crowdsourcing Linked Data Quality Assessment


Selecting LD quality issues to crowdsource (methodology step 1)

Three categories of quality problems occur in DBpedia [Zaveri2013] and can be crowdsourced:

Incorrect object. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3” .

Incorrect data type or language tag. Example: dbpedia:Torishima_Izu_Islands foaf:name “鳥島”@en .

Incorrect link to external Web pages. Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/> .
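Some instances of the second category can be pre-filtered automatically before they are sent to the crowd. Below is a minimal sketch, assuming a simple script-based heuristic (the regular expression and function name are illustrative), that flags @en literals containing CJK characters, as in the foaf:name example above.

```python
import re

# Rough heuristic: an @en-tagged literal consisting of CJK characters
# (as in foaf:name "鳥島"@en) is a candidate "incorrect language tag" issue.
CJK = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff]")

def suspicious_english_tag(literal: str, lang: str) -> bool:
    """Return True if a literal tagged as English looks like it is not English."""
    return lang == "en" and bool(CJK.search(literal))

print(suspicious_english_tag("鳥島", "en"))        # True  -> flag for human verification
print(suspicious_english_tag("Tori-shima", "en"))  # False
```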

Page 8: Crowdsourcing Linked Data Quality Assessment


Selecting appropriate crowdsourcing approaches (methodology step 2)

Find stage: contest among LD experts (difficult task, rewarded with a final prize), run with TripleCheckMate [Kontokostas2013].

Verify stage: microtasks for crowd workers (easy task, rewarded with micropayments), run on MTurk (http://mturk.com).

Find-Verify pattern adapted from [Bernstein2010].
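A minimal sketch of this Find-Verify split follows; the function names, stub data and the fixed worker answer are illustrative assumptions, since the actual stages ran through TripleCheckMate and MTurk.

```python
# Minimal sketch of the Find-Verify pattern (illustrative data and names).

def find_stage(triples, expert_flags):
    """Find: LD experts (via the contest) flag triples they consider problematic."""
    return [t for t in triples if t in expert_flags]

def verify_stage(flagged, ask_worker, assignments=5):
    """Verify: each flagged triple is submitted as a microtask to several workers."""
    return {t: [ask_worker(t) for _ in range(assignments)] for t in flagged}

# Illustrative run with a stub worker that always answers "incorrect".
triples = [("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", '"3"')]
answers = verify_stage(find_stage(triples, set(triples)), lambda t: "incorrect")
print(answers)
```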

Page 9: Crowdsourcing Linked Data Quality Assessment


Presenting the data to the crowd (methodology step 3)

• Selection of foaf:name or rdfs:label to extract human-readable descriptions

• Values extracted automatically from Wikipedia infoboxes

• Link to the Wikipedia article via foaf:isPrimaryTopicOf

• Preview of external pages rendered in an HTML iframe

Microtask interfaces (MTurk tasks), one per quality issue: incorrect object, incorrect data type or language tag, incorrect outlink.
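As a rough sketch of how such values could be pulled for a task interface (assuming the rdflib library and DBpedia's /data/ RDF export URLs; this is not the generator used for the actual MTurk tasks):

```python
# Minimal sketch: fetch the values shown in a microtask for one DBpedia resource.
# The /data/<name>.rdf URL pattern and the chosen resource are assumptions.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.parse("http://dbpedia.org/data/Dave_Dobbyn.rdf")

resource = DBR["Dave_Dobbyn"]
# Prefer foaf:name and fall back to rdfs:label for the human-readable description.
label = g.value(resource, FOAF.name) or g.value(resource, RDFS.label)
# Link back to the Wikipedia article the resource was extracted from.
wikipedia = g.value(resource, FOAF.isPrimaryTopicOf)
print(label, wikipedia)
```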

Page 10: Crowdsourcing Linked Data Quality Assessment


EXPERIMENTAL STUDY


Page 11: Crowdsourcing Linked Data Quality Assessment


Experimental design

• Crowdsourcing approaches:
  • Find stage: contest with LD experts
  • Verify stage: microtasks (5 assignments per triple)

• Creation of a gold standard:
  • Two of the authors of this paper (MA, AZ) generated the gold standard for all the triples obtained from the contest
  • Each author independently evaluated the triples
  • Conflicts were resolved via mutual agreement

• Metric: precision
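Precision here is the fraction of verdicts that agree with the gold standard. A minimal sketch of the computation, with majority voting over the 5 assignments; the triple ids and votes are made-up illustrative data.

```python
from collections import Counter

def majority_vote(votes, n=5):
    """Aggregate the n microtask assignments collected for one triple."""
    return Counter(votes[:n]).most_common(1)[0][0]

def precision(verdicts, gold):
    """Fraction of evaluated triples whose verdict matches the gold standard."""
    return sum(1 for t, v in verdicts.items() if gold[t] == v) / len(verdicts)

# Illustrative data: worker votes per triple id and the manually built gold standard.
votes = {"t1": ["incorrect"] * 4 + ["correct"], "t2": ["correct"] * 3 + ["incorrect"] * 2}
gold = {"t1": "incorrect", "t2": "incorrect"}

verdicts = {t: majority_vote(v) for t, v in votes.items()}
print(precision(verdicts, gold))  # 0.5
```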


Page 12: Crowdsourcing Linked Data Quality Assessment


Overall results

                                 LD experts              Microtask workers
Number of distinct participants  50                      80
Total time                       3 weeks (predefined)    4 days
Total triples evaluated          1,512                   1,073
Total cost                       ~US$ 400 (predefined)   ~US$ 43


Page 13: Crowdsourcing Linked Data Quality Assessment


Precision results: Incorrect object task

MTurk workers can be used to reduce the error rates of LD experts for the Find stage:

• 117 DBpedia triples had predicates related to dates with incorrect/incomplete values, e.g., ”2005 Six Nations Championship” Date 12 .

• 52 DBpedia triples had erroneous values taken over from the source, e.g., ”English (programming language)” Influenced by ? .

• Experts classified all these triples as incorrect; workers compared the values against Wikipedia and successfully classified these triples as “correct”.


Triples compared   LD experts   MTurk (majority voting, n=5)
509                0.7151       0.8977

Page 14: Crowdsourcing Linked Data Quality Assessment


Precision results: Incorrect data type task


[Figure: number of triples per data type (Date, English, Millimetre, Nanometre, Number, Number with decimals, Second, Volt, Year, Not specified/URI), broken down into expert true/false positives and crowd true/false positives.]

Triples compared   LD experts   MTurk (majority voting, n=5)
341                0.8270       0.4752
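Many data-type problems can also be pre-checked mechanically before crowdsourcing. A minimal sketch, assuming a plain-Python heuristic (not the study's tooling), that tests whether a lexical value is a valid xsd:date:

```python
from datetime import date

def valid_xsd_date(lexical: str) -> bool:
    """Rough check: xsd:date uses the YYYY-MM-DD form (timezone offsets ignored here)."""
    try:
        date.fromisoformat(lexical)
        return True
    except ValueError:
        return False

print(valid_xsd_date("1957-01-03"))  # True
print(valid_xsd_date("3"))           # False -> candidate for the crowd, as in dateOfBirth "3"
```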

Page 15: Crowdsourcing Linked Data Quality Assessment


Precision results: Incorrect link task

• We analyzed the 189 misclassifications by the experts: 50% were Freebase links, 39% Wikipedia images, and 11% other external links.

• The 6% of misclassifications by the workers correspond to pages in a language other than English.

Triples compared   Baseline   LD experts   MTurk (majority voting, n=5)
223                0.2598     0.1525       0.9412
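A trivial pre-check can weed out dead links before crowdsourcing, while judging whether a reachable page is actually related to the resource is left to the workers (who see it in the iframe preview). A minimal sketch, with an assumed helper name and the example URL from the deck:

```python
import urllib.request

def link_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the external page answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False

# Output depends on the network and on the target site being up.
print(link_reachable("http://cedarlakedvd.com/"))
```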

Page 16: Crowdsourcing Linked Data Quality Assessment


Final discussion

RQ1: Is it possible to detect quality issues in LD data sets via crowdsourcing mechanisms?

Both forms of crowdsourcing can be applied to detect certain LD quality issues

RQ2: What type of crowd is most suitable for each type of quality issue?

The effort of LD experts is best applied to tasks demanding domain-specific skills. The MTurk crowd was exceptionally good at performing data comparisons

RQ3: Which types of errors are made by lay users and experts?

Lay users do not have the skills to solve domain-specific tasks, while experts' performance is very low on tasks that demand extra effort (e.g., checking an external page)


Page 17: Crowdsourcing Linked Data Quality Assessment


CONCLUSIONS & FUTURE WORK


Page 18: Crowdsourcing Linked Data Quality Assessment


Conclusions & Future Work

A crowdsourcing methodology for LD quality assessment:

Find stage: LD experts

Verify stage: MTurk workers

Crowdsourcing approaches are feasible for detecting the studied quality issues

Application: detecting patterns in errors to fix the extraction mechanisms

Future Work

Conducting new experiments (other quality issues and domains)

Integration of the crowd into curation processes and tools


Page 19: Crowdsourcing Linked Data Quality Assessment


References & Acknowledgements

[Bernstein2010] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST '10), pages 313–322, New York, NY, USA, 2010. ACM.

[Kontokostas2013] D. Kontokostas, A. Zaveri, S. Auer, and J. Lehmann. TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data. In Knowledge Engineering and the Semantic Web, 2013.

[Zaveri2013] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies for Linked Open Data. Under review, http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data.

Page 20: Crowdsourcing Linked Data Quality Assessment


QUESTIONS?


Summary of the approach: Find stage as a contest for LD experts (difficult task, final prize) using TripleCheckMate; Verify stage as microtasks for MTurk workers (easy task, micropayments), with one MTurk task type per quality issue (incorrect object, incorrect data type, incorrect outlink).

Results (precision):

                          Object values   Data types   Interlinks
Linked Data experts       0.7151          0.8270       0.1525
MTurk (majority voting)   0.8977          0.4752       0.9412