Crowdsourcing tasks in Linked Data management
Post on 08-May-2015
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu
Institute of Applied Informatics and Formal Description Methods (AIFB)
Crowdsourcing tasks in Linked Data management
Elena Simperl,1 Barry Norton,2 Denny Vrandecic1
1 Institute AIFB, Karlsruhe Institute of Technology, Germany
2 Ontotext AD, Bulgaria
Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
2 11.04.2023
Motivation
Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results
But reaching a critical mass of useful contributions from all relevant stakeholders is still more an art than an engineering exercise
Seminar - The Role of Ontologies in Linked Data - Kickoff
Microtask platforms
Define task
Break task into smaller units
Evaluate the results
optimization
Approach
Formal, declarative description of the data and tasks using SPARQL patterns as a basis for the automatic design of HITs
Integral part of Linked Data tools and applications
At design time: the application developer specifies which portions of the data workers can process and via which types of HITs
At run time:
The system materializes the data
Workers process it
Data and application are updated to reflect crowdsourcing results
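The design-time and run-time split described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: `TaskSpec`, `materialize_hits`, and `apply_answers` are hypothetical names, the bindings are hard-coded made-up values rather than results from a SPARQL engine, and the output is simplified to a single triple per HIT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Design-time description of a crowdsourcing task (hypothetical structure)."""
    name: str
    input_pattern: str   # SPARQL pattern selecting the data shown to workers
    output_pattern: str  # SPARQL pattern describing what workers produce
    hit_type: str        # e.g. "free_text" or "yes_no"


def materialize_hits(spec, bindings):
    """Run time, step 1: turn each solution of the input pattern into one HIT."""
    return [{"task": spec.name, "type": spec.hit_type, "data": dict(b)} for b in bindings]


def apply_answers(hits, answers, predicate):
    """Run time, step 3: convert worker answers into triples for the dataset."""
    triples = []
    for hit, answer in zip(hits, answers):
        if answer is not None:  # a worker may leave a HIT unanswered
            triples.append((hit["data"]["station"], predicate, answer))
    return triples


spec = TaskSpec(
    name="icao-correction",
    input_pattern="{?station a metar:Station; rdfs:label ?label; dbp:icao ?badicao}",
    output_pattern="{?station dbp:icao ?goodicao}",
    hit_type="free_text",
)
# Stand-in for the result of evaluating spec.input_pattern (made-up values).
bindings = [
    {"station": "ex:station1", "label": "Station One", "badicao": "XX"},
    {"station": "ex:station2", "label": "Station Two", "badicao": ""},
]
hits = materialize_hits(spec, bindings)
# Step 2 (workers process the HITs) is simulated by a fixed answer list.
new_triples = apply_answers(hits, ["ABCD", None], "dbp:icao")
```

Keeping the specification as plain data is one way to let an application decide at design time which data is exposed, while HIT generation stays generic.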
Examples of Linked Data tasks amenable to crowdsourcing
Identity resolution
Metadata completion and checking/correction
Classification
Ordering (quantitative and qualitative)
Translation
Running Example
Identity resolution
Identity Resolution “involves the creation of sameAs links, either by comparison of metadata or by investigation of links on the human Web.”
Input:
{?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong .
 ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along}
Output:
{OPTIONAL {?airport owl:sameAs ?station}}
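Because the output pattern is OPTIONAL, a candidate pair may legitimately end up without a sameAs link. One way to decide this from several worker answers is a majority vote; the slides do not prescribe an aggregation method, so the function below, its `min_votes` threshold, and the `ex:` identifiers are assumptions for illustration only.

```python
from collections import Counter


def same_as_links(votes, min_votes=3):
    """Emit owl:sameAs triples for pairs a strict majority of workers confirmed.

    votes maps (airport, station) to a list of booleans, one per worker.
    Pairs with fewer than min_votes answers are left undecided.
    """
    triples = []
    for (airport, station), answers in sorted(votes.items()):
        if len(answers) < min_votes:
            continue
        counts = Counter(answers)
        if counts[True] > counts[False]:
            triples.append((airport, "owl:sameAs", station))
    return triples


votes = {
    ("ex:airportA", "ex:station1"): [True, True, False],
    ("ex:airportA", "ex:station2"): [False, False, True],
    ("ex:airportB", "ex:station3"): [True],  # too few answers to decide
}
links = same_as_links(votes)
```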
Metadata completion & correction
“Certain properties, necessary for a given query, may not be uniformly populated. Manually conducted research might be necessary to transfer this information from the human-readable Web”
Input:
{?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long; dbp:icao ?badicao}
Output:
{?station dbp:icao ?goodicao}
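The slides do not say how a value ends up bound to ?badicao. A simple automated pre-check could select candidates for workers, sketched here under the assumption that a well-formed ICAO code is exactly four uppercase letters. The station identifiers and sample values are illustrative, and the check only catches malformed values: a well-formed but wrong code would still need human review.

```python
import re

# Assumption: a well-formed ICAO location indicator is exactly four uppercase letters.
ICAO_FORMAT = re.compile(r"[A-Z]{4}")


def needs_correction(icao):
    """Return True when the value should be sent to workers for correction."""
    return icao is None or ICAO_FORMAT.fullmatch(icao.strip()) is None


candidates = {
    "ex:station1": "EDDF",
    "ex:station2": "eddf",   # wrong case
    "ex:station3": "ED",     # too short
    "ex:station4": None,     # missing
}
to_crowd = sorted(s for s, code in candidates.items() if needs_correction(code))
```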
Classification
“Linked Data emphasis[es…] relationships between resources [over classification]. [D]ue to the promoted use of generic vocabularies, it is not always possible to infer classification from […] properties”
Input:
{?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}
Output:
{?station a ?type. ?type rdfs:subClassOf metar:Station}
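The output pattern constrains answers to subclasses of metar:Station, which suggests a multiple-choice HIT whose answers can be checked against that constraint. The sketch below assumes a plurality vote and uses two hypothetical subclass names; neither the vote nor the class names come from the slides.

```python
def classification_triples(station, answers, subclasses):
    """Keep only answers that satisfy the output pattern, i.e. name a known subclass."""
    valid = [a for a in answers if a in subclasses]
    if not valid:
        return []
    # Plurality vote; ties resolve to the alphabetically first class for determinism.
    winner = min(set(valid), key=lambda c: (-valid.count(c), c))
    return [(station, "rdf:type", winner)]


# Hypothetical subclasses of metar:Station, for illustration only.
subclasses = {"ex:AirportStation", "ex:BuoyStation"}
answers = ["ex:AirportStation", "ex:BuoyStation", "ex:AirportStation", "ex:Spam"]
result = classification_triples("ex:station1", answers, subclasses)
```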
Ordering
“Having means to rank Linked Data content along specific dimensions is typically deemed useful for querying and browsing […both] “specific” ordering [(e.g. timestamps) … and] orderings […] via “less straightforward” built-ins [(e.g. pref/alt labels)]”
Input:
{?station foaf:depiction ?x, ?y}
Output:
{{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
quantitative
qualitative
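Each ordering HIT yields one of the two alternatives in the output pattern, (?x ?y) or (?y ?x). When more than two depictions are compared, the pairwise answers have to be combined into one ranking. Counting wins is one simple way to do this; it is an assumption here, not a method stated in the slides, and the `ex:img` identifiers are made up.

```python
from collections import defaultdict


def rank_by_wins(comparisons):
    """Order items by how often workers preferred them in pairwise HITs.

    comparisons is a list of (preferred, other) pairs, one per worker answer,
    mirroring the (?x ?y) versus (?y ?x) alternatives of the output pattern.
    """
    wins = defaultdict(int)
    for preferred, other in comparisons:
        wins[preferred] += 1
        wins[other] += 0  # make sure items that never win still appear
    # Most wins first; ties are broken alphabetically.
    return sorted(wins, key=lambda item: (-wins[item], item))


comparisons = [
    ("ex:img1", "ex:img2"),
    ("ex:img1", "ex:img3"),
    ("ex:img3", "ex:img2"),
    ("ex:img1", "ex:img2"),
]
ranking = rank_by_wins(comparisons)
```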
Translation
“[An important] aspect of the labeling of resources for humans is multi-linguality […] actual provision of labels in non-English languages is currently rather low”
Input:
{?station rdfs:label ?enlabel. FILTER (LANGMATCHES(LANG(?enlabel), "en"))}
Output:
{?station rdfs:label ?bglabel. FILTER (LANGMATCHES(LANG(?bglabel), "bg"))}
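Translation HITs are only needed for resources that have an English label but no Bulgarian one. A minimal sketch of that selection follows, with label triples represented as plain tuples; the function name, the tuple layout, and the sample labels are illustrative assumptions.

```python
def missing_translations(labels, source_lang="en", target_lang="bg"):
    """Return (resource, source label) pairs that have no label in target_lang.

    labels is a list of (resource, text, language tag) tuples standing in for
    rdfs:label triples with language-tagged literals.
    """
    have_target = {res for res, _, lang in labels if lang.lower() == target_lang}
    return sorted(
        (res, text)
        for res, text, lang in labels
        if lang.lower() == source_lang and res not in have_target
    )


labels = [
    ("ex:station1", "Sofia Airport", "en"),
    ("ex:station1", "Летище София", "bg"),
    ("ex:station2", "Varna Airport", "EN"),  # tags are compared case-insensitively
]
todo = missing_translations(labels)
```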
Open query answering
Query a FOAF-file using the vCard vocabulary
hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
foaf:nick "Harry" ; foaf:familyName "Potter" .
SELECT ?name ?email WHERE
{ ?p vcard:email ?email ; vcard:fn ?name }
In order to answer the query as intended:
Vocabulary mapping and entity resolution (foaf to vcard)
Metadata completion (full name is Harry Potter)
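The two steps can be traced on the slide's data with plain tuples. This is a sketch only: the mapping table is an assumption (for instance the outcome of a mapping HIT), the helper names are hypothetical, and the query is evaluated by hand rather than by a SPARQL engine.

```python
# Data from the slide, as (subject, predicate, object) triples.
data = [
    ("hp:Harry", "foaf:mbox", "mailto:scarface@hogwarts.ac.uk"),
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
]

# Assumed vocabulary mapping, e.g. obtained from a mapping HIT.
FOAF_TO_VCARD = {"foaf:mbox": "vcard:email", "foaf:name": "vcard:fn"}


def translate(triples, mapping):
    """Rewrite predicates that have a counterpart in the target vocabulary."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


def missing_properties(triples, subject, required):
    """List query properties the subject lacks; these need metadata completion."""
    present = {p for s, p, _ in triples if s == subject}
    return [p for p in required if p not in present]


mapped = translate(data, FOAF_TO_VCARD)
gaps = missing_properties(mapped, "hp:Harry", ["vcard:email", "vcard:fn"])

# A completion HIT would supply the full name stated on the slide.
completed = mapped + [("hp:Harry", "vcard:fn", "Harry Potter")]
answer = [
    (name, email)
    for s1, p1, name in completed if p1 == "vcard:fn"
    for s2, p2, email in completed if p2 == "vcard:email" and s1 == s2
]
```

After mapping alone the query still has no solution, because the file contains no full name; only the completed data yields a row.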
Limitations of microtask crowdsourcing
Decomposability
Verifiability
Expertise
Compositions to deal with tasks with underspecified workflow and/or multiple correct answers
Challenges
Decomposition of user-visible queries: SPARQL
Easy: Low quality (meta)data can be subject to automated checking (even if not fixing)
Medium: Missing data (and translation) can be automatically identified (but knowing to which dataset it should belong is not necessarily clear)
Difficult:
Interlinking (at least sameAs) is somewhat implicit (using entailment), and it is hard to know where the user expects it
Query optimisation obfuscates what is used, and it should take the costs of human tasks into account
Pig might be somewhat easier in the latter regard
Caching
Naively we can materialise HIT results into datasets
How to deal with partial coverage and dynamic datasets?
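A naive cache of HIT results could be keyed by the task and the exact input bindings, so that a change in the underlying data produces a different key and the old answer is no longer returned. The class below is a hypothetical sketch of that idea; it touches only the dynamic-dataset side of the question and leaves partial coverage open.

```python
import hashlib
import json


class HitCache:
    """Cache crowd answers keyed by task name and the exact input bindings.

    If the underlying data changes, the bindings (and thus the key) change,
    so a stale answer is simply not found and the HIT can be issued again.
    """

    def __init__(self):
        self._answers = {}

    @staticmethod
    def _key(task, bindings):
        # sort_keys makes the key independent of dictionary ordering.
        payload = json.dumps([task, bindings], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def put(self, task, bindings, answer):
        self._answers[self._key(task, bindings)] = answer

    def get(self, task, bindings):
        return self._answers.get(self._key(task, bindings))


cache = HitCache()
cache.put("icao-correction", {"station": "ex:station1", "badicao": "XX"}, "ABCD")
hit = cache.get("icao-correction", {"badicao": "XX", "station": "ex:station1"})
miss = cache.get("icao-correction", {"station": "ex:station1", "badicao": "XY"})
```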
Further Challenges
Appropriate level of granularity for HITs design for specific SPARQL constructs and typical functionality of Linked Data management components
Optimal user interfaces of graph-like content
(Contextual) Rendering of LOD entities and tasks
Pricing and workers’ assignment
Can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs?
Dealing with spam / gaming
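The slides name spam and gaming as open challenges without proposing a remedy. One commonly used technique, given here purely as an illustration, is to mix questions with known answers ("gold" questions) into the HITs and keep only workers who answer them accurately. The function name, the 0.8 threshold, and the data are assumptions.

```python
def trusted_workers(gold_answers, responses, min_accuracy=0.8):
    """Return workers whose accuracy on gold questions reaches min_accuracy.

    gold_answers maps question id to the known correct answer;
    responses maps worker id to {question id: answer}.
    Workers who answered no gold question are not trusted.
    """
    trusted = []
    for worker, answers in sorted(responses.items()):
        graded = [q for q in answers if q in gold_answers]
        if not graded:
            continue
        correct = sum(answers[q] == gold_answers[q] for q in graded)
        if correct / len(graded) >= min_accuracy:
            trusted.append(worker)
    return trusted


gold = {"q1": "yes", "q2": "no", "q3": "yes"}
responses = {
    "w1": {"q1": "yes", "q2": "no", "q3": "yes", "q9": "no"},
    "w2": {"q1": "yes", "q2": "yes", "q3": "no"},
    "w3": {"q9": "yes"},  # answered no gold question
}
kept = trusted_workers(gold, responses)
```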
QUESTIONS