Crowdsourcing tasks in Linked Data management

DESCRIPTION

Talk delivered at the Consuming Linked Data Workshop, ISWC 2011

TRANSCRIPT

Page 1: Crowdsourcing tasks in Linked Data management

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu

Institute of Applied Informatics and Formal Description Methods (AIFB)

Crowdsourcing tasks in Linked Data management

Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)

(1) Institute AIFB, Karlsruhe Institute of Technology, Germany
(2) Ontotext AD, Bulgaria

Page 2: Crowdsourcing tasks in Linked Data management

2 11.04.2023

Motivation

Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results

But reaching a critical mass of useful contributions from all relevant stakeholders is still more an art than an engineering exercise

Seminar - The Role of Ontologies in Linked Data – Kickoff

Page 3: Crowdsourcing tasks in Linked Data management

Microtask platforms

Define task

Break task into smaller units

Evaluate the results

Optimization
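The define / break down / evaluate cycle above can be sketched in a few lines. This is a minimal illustration, not the authors' system: it assumes a task is a flat list of items, that each HIT simply bundles a fixed number of items, and that results are evaluated by majority vote with an agreement threshold (one common choice among several). The names `make_hits`, `aggregate` and `needs_more_work` are hypothetical.

```python
from collections import Counter

def make_hits(items, batch_size):
    """Break a task over many items into smaller units (HITs) of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def aggregate(answers_per_item):
    """Evaluate results by majority vote; on a tie, the answer seen first wins."""
    return {item: Counter(answers).most_common(1)[0][0]
            for item, answers in answers_per_item.items() if answers}

def needs_more_work(answers_per_item, min_agreement=0.66):
    """Items with no answers or too little agreement go back into the loop (the optimization step)."""
    low = []
    for item, answers in answers_per_item.items():
        if not answers:
            low.append(item)
            continue
        top_count = Counter(answers).most_common(1)[0][1]
        if top_count / len(answers) < min_agreement:
            low.append(item)
    return low

if __name__ == "__main__":
    print(make_hits(["s1", "s2", "s3", "s4", "s5"], 2))
    votes = {"s1": ["yes", "yes", "no"], "s2": ["yes", "no"]}
    print(aggregate(votes))        # majority answer per item
    print(needs_more_work(votes))  # items to re-issue as new HITs
```

Re-issuing only the low-agreement items keeps the cost of the loop proportional to how hard each item turned out to be, rather than paying for a fixed number of extra judgments everywhere.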

Page 4: Crowdsourcing tasks in Linked Data management

Approach

Formal, declarative description of the data and tasks using SPARQL patterns as a basis for the automatic design of HITs (Human Intelligence Tasks)

Integral part of Linked Data tools and applications

At design time: the application developer specifies which data portions workers can process, and via which types of HITs

At run time: the system materializes the data

Workers process it

The data and the application are updated to reflect the crowdsourcing results
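A rough sketch of the design-time / run-time split described above, under stated assumptions: triples are plain Python tuples rather than a real RDF store, the input pattern is a basic graph pattern whose variables start with `?`, and each binding becomes one HIT question via a string template. `HitSpec`, `match`, `materialize` and `apply_results` are hypothetical names, not part of any published tool.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]

@dataclass
class HitSpec:
    """Design time (hypothetical structure): which data workers may see, and how to ask about it."""
    hit_type: str
    input_pattern: list[Triple]  # basic graph pattern; terms starting with '?' are variables
    question: str                # str.format template over the variable names (without '?')

def match(pattern, graph, binding=None):
    """Evaluate a basic graph pattern over a list of triples by backtracking; yields bindings."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant term does not match
        else:
            yield from match(rest, graph, new)

def materialize(spec, graph):
    """Run time: one HIT question per binding of the input pattern."""
    return [spec.question.format(**{var[1:]: val for var, val in b.items()})
            for b in match(spec.input_pattern, graph)]

def apply_results(graph, new_triples):
    """After workers answer: add their (new) triples so data and application reflect them."""
    return graph + [t for t in new_triples if t not in graph]

if __name__ == "__main__":
    graph = [("ex:LBSF", "rdf:type", "metar:Station"),
             ("ex:LBSF", "rdfs:label", "Sofia"),
             ("ex:LBWN", "rdf:type", "metar:Station")]
    spec = HitSpec("translation",
                   [("?station", "rdf:type", "metar:Station"), ("?station", "rdfs:label", "?label")],
                   "Translate the label '{label}' of {station} into Bulgarian.")
    for question in materialize(spec, graph):
        print(question)
```

Keeping the pattern declarative means the same matcher serves every HIT type; only the template and the output triples differ.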

Page 5: Crowdsourcing tasks in Linked Data management

Examples of Linked Data tasks amenable to crowdsourcing

Identity resolution

Metadata completion and checking/correction

Classification

Ordering (quantitative and qualitative)

Translation

Page 6: Crowdsourcing tasks in Linked Data management

Running Example

Page 7: Crowdsourcing tasks in Linked Data management

Identity resolution

Identity Resolution “involves the creation of sameAs links, either by comparison of metadata or by investigation of links on the human Web.”

Input:

{?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong . ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along}

Output:

{OPTIONAL {?airport owl:sameAs ?station}}
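One way to keep identity-resolution HITs affordable is to pre-filter candidate pairs automatically before asking workers. The sketch below assumes (this is not stated on the slide) that a station and an airport are only worth showing to a worker if their WGS84 coordinates lie within a chosen radius, computed with the haversine great-circle formula; all URIs and coordinates are made-up placeholders, and the function names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def candidate_pairs(stations, airports, max_km=10.0):
    """Only station/airport pairs closer than max_km become 'same place?' HITs, nearest first."""
    pairs = []
    for s in stations:
        for a in airports:
            d = haversine_km(s["lat"], s["long"], a["lat"], a["long"])
            if d <= max_km:
                pairs.append((a["uri"], s["uri"], d))
    return sorted(pairs, key=lambda p: p[2])

def same_as_triple(airport_uri, station_uri):
    """Output pattern {?airport owl:sameAs ?station}, emitted only once workers confirm a pair."""
    return f"<{airport_uri}> <http://www.w3.org/2002/07/owl#sameAs> <{station_uri}> ."

if __name__ == "__main__":
    stations = [{"uri": "http://example.org/station/LBSF", "lat": 42.69, "long": 23.41}]
    airports = [{"uri": "http://example.org/airport/SOF", "lat": 42.70, "long": 23.41},
                {"uri": "http://example.org/airport/VAR", "lat": 43.23, "long": 27.83}]
    for airport, station, km in candidate_pairs(stations, airports):
        print(f"HIT: is {airport} the same place as {station}? ({km:.1f} km apart)")
```

The radius trades recall for cost: a larger `max_km` catches more true matches but multiplies the number of pairs workers must judge.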

Page 8: Crowdsourcing tasks in Linked Data management

Metadata completion & correction

“Certain properties, necessary for a given query, may not be uniformly populated. Manually conducted research might be necessary to transfer this information from the human-readable Web”

Input:

{?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long; dbp:icao ?badicao}

Output:

{?station dbp:icao ?goodicao}
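Before paying workers to correct `?badicao` values, a cheap automated check can decide which stations need a HIT at all. This sketch assumes, purely for illustration, that a well-formed code is exactly four upper-case letters or digits; a production check would follow the actual identifier rules. Station URIs and codes are placeholders, and the helper names are hypothetical.

```python
import re

# Assumption for illustration only: a well-formed code is four upper-case letters or digits.
ICAO_PATTERN = re.compile(r"[A-Z0-9]{4}")

def suspicious_codes(stations):
    """stations: {uri: (label, code_or_None)}; returns URIs whose code is missing or malformed."""
    return [uri for uri, (_label, code) in stations.items()
            if code is None or not ICAO_PATTERN.fullmatch(code)]

def correction_hit(uri, stations):
    """Build the worker-facing question; the answer becomes {?station dbp:icao ?goodicao}."""
    label, code = stations[uri]
    return {"station": uri,
            "question": f"The ICAO code {code!r} recorded for {label} looks wrong or is missing. "
                        "Please look up the correct code."}

if __name__ == "__main__":
    stations = {"ex:LBSF": ("Sofia", "LBSF"),
                "ex:LBWN": ("Varna", "lbwn"),
                "ex:LBBG": ("Burgas", None)}
    for uri in suspicious_codes(stations):
        print(correction_hit(uri, stations)["question"])
```

The check only flags candidates; it cannot supply the right value, which is exactly the part left to the crowd.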

Page 9: Crowdsourcing tasks in Linked Data management

Classification

“Linked Data emphasis[es…] relationships between resources [over classification]. [D]ue to the promoted use of generic vocabularies, it is not always possible to infer classification from […] properties”

Input:

{?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}

Output:

{?station a ?type. ?type rdfs:subClassOf metar:Station}
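The output pattern above suggests that the answer options of a classification HIT can be read straight from the ontology: every `?type` with `?type rdfs:subClassOf metar:Station`. A minimal sketch, assuming triples as tuples and only direct subclasses (no transitive closure); `ex:AirportStation` and `ex:MountainStation` are invented example classes and the function names are hypothetical.

```python
def direct_subclasses(graph, parent):
    """Answer options: classes C with a triple (C, rdfs:subClassOf, parent); no transitive closure."""
    return sorted({s for s, p, o in graph if p == "rdfs:subClassOf" and o == parent})

def classification_hit(station, label, graph, parent="metar:Station"):
    """A multiple-choice HIT whose options are generated from the ontology."""
    return {"station": station,
            "question": f"Which kind of station is {label}?",
            "options": direct_subclasses(graph, parent)}

def answer_to_triple(hit, chosen):
    """Output pattern {?station a ?type}; reject answers outside the offered options."""
    if chosen not in hit["options"]:
        raise ValueError(f"{chosen} is not one of the offered classes")
    return (hit["station"], "rdf:type", chosen)

if __name__ == "__main__":
    ontology = [("ex:AirportStation", "rdfs:subClassOf", "metar:Station"),
                ("ex:MountainStation", "rdfs:subClassOf", "metar:Station")]
    hit = classification_hit("ex:LBSF", "Sofia", ontology)
    print(hit["question"], hit["options"])
    print(answer_to_triple(hit, "ex:AirportStation"))
```

Restricting answers to generated options keeps the results machine-checkable, which matters given the verifiability limitation listed later in the talk.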

Page 10: Crowdsourcing tasks in Linked Data management

Ordering

“Having means to rank Linked Data content along specific dimensions is typically deemed useful for querying and browsing […both] “specific” ordering [(e.g. timestamps) … and] orderings […] via “less straightforward” built-ins [(e.g. pref/alt labels)]”

Input:

{?station foaf:depiction ?x, ?y}

Output:

{{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}

Ordering types: quantitative, qualitative
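The input/output patterns above describe a single pairwise judgment: given two depictions, the worker returns them in order. To rank more than two items, one option (an assumption here, not necessarily the authors' design) is to issue one HIT per pair and rank by number of wins, which tolerates the occasional intransitive crowd judgment. The image identifiers and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def comparison_hits(items):
    """One 'which of these two is better?' HIT per unordered pair: n*(n-1)/2 HITs."""
    return list(combinations(items, 2))

def rank_by_wins(items, judgments):
    """judgments: (winner, loser) pairs. Sorted by wins; sorted() is stable, so ties keep input order."""
    wins = Counter(winner for winner, _loser in judgments)
    return sorted(items, key=lambda item: -wins[item])

def as_rdf_list(ranked):
    """Serialise the ranking in the slide's output shape: a Turtle collection typed rdf:List."""
    return "(" + " ".join(ranked) + ") a rdf:List ."

if __name__ == "__main__":
    images = ["img:a", "img:b", "img:c"]
    print(comparison_hits(images))
    judgments = [("img:b", "img:a"), ("img:b", "img:c"), ("img:a", "img:c")]
    print(as_rdf_list(rank_by_wins(images, judgments)))
```

The quadratic number of HITs is the main cost; for long lists a sorting algorithm that asks only for the comparisons it needs would be cheaper, at the price of trusting each answer more.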

Page 11: Crowdsourcing tasks in Linked Data management

Translation

“[An important] aspect of the labeling of resources for humans is multi-linguality […] actual provision of labels in non-English languages is currently rather low”

Input:

{?station rdfs:label ?enlabel. FILTER (langMatches(LANG(?enlabel), "en"))}

Output:

{?station rdfs:label ?bglabel. FILTER (langMatches(LANG(?bglabel), "bg"))}
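The translation task amounts to: find resources that have an English label but no Bulgarian one, ask a worker for the translation, and write it back as a language-tagged literal. The sketch below mirrors `langMatches` by comparing only the primary language subtag case-insensitively (so `EN` and `en-GB` both count as English); the data layout and function names are assumptions for illustration.

```python
def primary_tag(lang):
    """Primary language subtag, lower-cased: 'EN' -> 'en', 'en-GB' -> 'en'."""
    return lang.lower().split("-")[0]

def translation_hits(labels, source="en", target="bg"):
    """labels: {station_uri: [(text, lang_tag), ...]}; stations with a source but no target label."""
    hits = []
    for station, values in labels.items():
        by_lang = {primary_tag(lang): text for text, lang in values}
        if source in by_lang and target not in by_lang:
            hits.append((station, by_lang[source]))
    return hits

def label_triple(station, translation, target="bg"):
    """N-Triples line for a worker's translation; backslashes, quotes and newlines are escaped."""
    escaped = (translation.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n"))
    return f'<{station}> <http://www.w3.org/2000/01/rdf-schema#label> "{escaped}"@{target} .'

if __name__ == "__main__":
    labels = {"http://example.org/LBSF": [("Sofia", "EN")],
              "http://example.org/LBWN": [("Varna", "en"), ("Варна", "bg")]}
    for station, text in translation_hits(labels):
        print(f"HIT: translate '{text}' into Bulgarian")
        print(label_triple(station, "София"))  # a worker's answer
```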

Page 12: Crowdsourcing tasks in Linked Data management

Open query answering

Query a FOAF file using the vCard vocabulary

Data:

hp:Harry foaf:mbox <mailto:[email protected]> ; foaf:nick "Harry" ; foaf:familyName "Potter" .

Query:

SELECT ?name ?email WHERE { ?p vcard:email ?email ; vcard:fn ?name }

To answer the query as intended, we need:

Vocabulary mapping and entity resolution (FOAF to vCard)

Metadata completion (the full name is Harry Potter)
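The two steps above can be separated: an automatic vocabulary mapping rewrites whatever predicates have a known vCard counterpart, and anything the query still lacks becomes a completion HIT. The one-entry mapping, the placeholder mailbox `mailto:harry@example.org` and the function names are all hypothetical, chosen only to reproduce the slide's situation, where `vcard:fn` cannot be derived mechanically.

```python
# Hypothetical mapping for this example only; a real FOAF/vCard alignment would be larger.
FOAF_TO_VCARD = {"foaf:mbox": "vcard:email"}

def map_vocabulary(triples, mapping):
    """Rewrite predicates into the query's vocabulary; unmapped predicates are kept unchanged."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]

def missing_properties(triples, subject, required):
    """Properties the query needs for this subject that the (mapped) data still lacks."""
    present = {p for s, p, _o in triples if s == subject}
    return [p for p in required if p not in present]

def completion_hit(triples, subject, prop):
    """Ask a worker for the missing value, showing the facts already known as context."""
    context = "; ".join(f"{p} {o}" for s, p, o in triples if s == subject)
    return f"Given: {context}. What is the value of {prop} for {subject}?"

if __name__ == "__main__":
    data = [("hp:Harry", "foaf:mbox", "<mailto:harry@example.org>"),  # placeholder address
            ("hp:Harry", "foaf:nick", '"Harry"'),
            ("hp:Harry", "foaf:familyName", '"Potter"')]
    mapped = map_vocabulary(data, FOAF_TO_VCARD)
    for prop in missing_properties(mapped, "hp:Harry", ["vcard:email", "vcard:fn"]):
        print(completion_hit(mapped, "hp:Harry", prop))
```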

Page 13: Crowdsourcing tasks in Linked Data management

Limitations of microtask crowdsourcing

Decomposability

Verifiability

Expertise

Compositions to deal with tasks with underspecified workflow and/or multiple correct answers

Page 14: Crowdsourcing tasks in Linked Data management

Challenges

Decomposition of user-visible queries: SPARQL

Easy: low-quality (meta)data can be subject to automated checking (even if not fixing)

Medium: missing data (and translations) can be automatically identified (but it is not necessarily clear to which dataset the missing data should belong)

Difficult:

Interlinking (at least sameAs) is somewhat implicit (using entailment) and knowing where the user expects

Query optimisation obscures which data is actually used, and should take the costs of human tasks into account

Pig might be somewhat easier in the latter regard

Caching: naively, we can materialise HIT results into datasets

How to deal with partial coverage and dynamic datasets?
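The naive caching idea, together with the two open questions (partial coverage, dynamic datasets), can be made concrete with a small cache of HIT results. The design is an assumption for illustration: answers are keyed by (subject, property), each answer records the version of the source data it was given against, and an answer counts as stale once that version changes. `HitCache` is a hypothetical name.

```python
from dataclasses import dataclass, field

@dataclass
class HitCache:
    """Naive materialisation of HIT results, keyed by (subject, property) (hypothetical design)."""
    entries: dict = field(default_factory=dict)

    def put(self, subject, prop, value, source_version):
        self.entries[(subject, prop)] = (value, source_version)

    def get(self, subject, prop, current_version):
        """Cached answer, or None if missing or if the source changed since workers answered."""
        entry = self.entries.get((subject, prop))
        if entry is None or entry[1] != current_version:
            return None  # caller should (re-)issue a HIT
        return entry[0]

    def coverage(self, keys, versions):
        """Fraction of requested (subject, property) keys that have a still-valid cached answer."""
        if not keys:
            return 1.0
        valid = sum(1 for s, p in keys if self.get(s, p, versions.get(s)) is not None)
        return valid / len(keys)

if __name__ == "__main__":
    cache = HitCache()
    cache.put("ex:LBSF", "dbp:icao", "LBSF", source_version=1)
    keys = [("ex:LBSF", "dbp:icao"), ("ex:LBWN", "dbp:icao")]
    print(cache.coverage(keys, {"ex:LBSF": 1, "ex:LBWN": 1}))  # half the keys are covered
    print(cache.get("ex:LBSF", "dbp:icao", current_version=2))  # stale after a data update
```

A coverage figure like this could feed back into query answering, for example to decide whether a partial answer is good enough or whether more HITs are worth their cost.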

Page 15: Crowdsourcing tasks in Linked Data management

Further Challenges

Appropriate level of granularity for HIT design, both for specific SPARQL constructs and for the typical functionality of Linked Data management components

Optimal user interfaces for graph-like content: (contextual) rendering of LOD entities and tasks

Pricing and workers’ assignment: can we connect the end users of an application, and their wish for specific data to be consumed, with the payment of workers and the prioritization of HITs?

Dealing with spam / gaming
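One widely used defence against spam and gaming (offered here as an example technique, not as the authors' proposal) is to mix "gold" questions with known answers into the HITs, estimate each worker's accuracy on them, and keep only answers from workers above a threshold. The data layout, the 0.75 threshold and the function names are assumptions.

```python
from collections import defaultdict

def worker_accuracy(answers, gold):
    """answers: (worker, item, answer) tuples; gold: {item: known_answer}. Accuracy on gold items only."""
    correct, total = defaultdict(int), defaultdict(int)
    for worker, item, answer in answers:
        if item in gold:
            total[worker] += 1
            correct[worker] += int(answer == gold[item])
    return {w: correct[w] / total[w] for w in total}

def trusted_answers(answers, gold, min_accuracy=0.75):
    """Answers to real (non-gold) items from workers meeting the threshold.
    Workers who answered no gold items are treated as untrusted (accuracy 0.0)."""
    accuracy = worker_accuracy(answers, gold)
    return [(w, i, a) for w, i, a in answers
            if i not in gold and accuracy.get(w, 0.0) >= min_accuracy]

if __name__ == "__main__":
    gold = {"g1": "yes", "g2": "no"}
    answers = [("w1", "g1", "yes"), ("w1", "g2", "no"), ("w1", "q1", "yes"),
               ("w2", "g1", "no"), ("w2", "g2", "no"), ("w2", "q1", "no")]
    print(worker_accuracy(answers, gold))
    print(trusted_answers(answers, gold))
```

Gold questions cost money too, so their share of each HIT is itself a pricing decision.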

Page 16: Crowdsourcing tasks in Linked Data management

QUESTIONS