dbpedia leipzig2014 csarasua_open
Cristina Sarasua · Data Interlinking together with Crowd Workers · Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Data Interlinking together with Crowd Workers
Cristina Sarasua
2nd DBpedia Community Meeting, Leipzig
Image: http://www.w3.org/DesignIssues/diagrams/lod/597992118v2_350x350_Back.jpg
Scenario for data interlinking
Music data integration
What for?
• A: Extending the description of resources, enabling richer queries
[Figure: d1:song1 owl:sameAs dbpedia:song1; d1:song1 o:wasPlayedIn dbpedia:Leipzig]
d1:song1 a ma:AudioTrack; ma:title "UFO"; ma:locator "musicexample:s1896.mp3"^^xsd:anyURI; ma:hasKeyword d1:colplay .
dbpedia:U.F.O._(song) a dbpedia-owl:Work; a dbpedia-owl:Song; dc:title "U.F.O."; prop:artist dbpedia:Coldplay .
dbpedia:UFO_(band) a dbpedia-owl:Band; a dbpedia-owl:Song; dc:title "U.F.O." .
D1 vs. DBpedia: owl:sameAs?
Automatic:
• Goal: typed link to create (e.g. owl:sameAs)
• Information to analyse (i.e. attribute values)
• Decision criterion (e.g. Levenshtein distance < 2)
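The decision criterion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: `levenshtein` and `propose_same_as` are hypothetical names, and comparing raw titles is an assumption. Applied to the titles from the previous example, it shows why a fixed criterion can fail: "UFO" and "U.F.O." are three edits apart, so the rule "Levenshtein distance < 2" would not propose the owl:sameAs link.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def propose_same_as(value_a: str, value_b: str, threshold: int = 2) -> bool:
    """Hypothetical decision rule mirroring the slide: link if distance < threshold."""
    return levenshtein(value_a, value_b) < threshold


print(levenshtein("UFO", "U.F.O."))      # 3
print(propose_same_as("UFO", "U.F.O."))  # False
```

A small misspelling such as "Coldplay" vs. "Colplay" (distance 1) would pass the same rule, so the threshold decides which kinds of variation the algorithm tolerates.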
d1:song1 a ma:AudioTrack; ma:title "UFO"; ma:locator "musicexample:s1896.mp3"^^xsd:anyURI; ma:hasKeyword d1:colplay .
dbpedia:U.F.O._(song) a dbpedia-owl:Work; a dbpedia-owl:Song; dc:title "U.F.O."; prop:artist dbpedia:Coldplay .
dbpedia:UFO_(band) a dbpedia-owl:Band; prop:name "U.F.O." .
D1 vs. DBpedia: owl:sameAs?
Human to guide the process
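One way a human could guide the process in this example is by saying which properties to compare and which type a target must have. The sketch below assumes that guidance takes the form of a property mapping (ma:title to dc:title) plus a required type (dbpedia-owl:Song). `PROPERTY_MAP`, `REQUIRED_TYPE`, `normalize` and the dictionary view of the resources are all hypothetical simplifications of the triples above, not an RDF library API.

```python
import re

# Hypothetical, simplified view of the resources on the slide:
# property -> value, with rdf:type as a set of class names.
d1_song1 = {"rdf:type": {"ma:AudioTrack"}, "ma:title": "UFO"}
dbpedia_song = {"rdf:type": {"dbpedia-owl:Work", "dbpedia-owl:Song"},
                "dc:title": "U.F.O.", "prop:artist": "dbpedia:Coldplay"}
dbpedia_band = {"rdf:type": {"dbpedia-owl:Band"}, "prop:name": "U.F.O."}

# Guidance a human might supply (assumption): which properties correspond,
# and which type a link target must have.
PROPERTY_MAP = {"ma:title": "dc:title"}
REQUIRED_TYPE = "dbpedia-owl:Song"


def normalize(value: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^0-9a-z]", "", value.lower())


def matches(source: dict, target: dict) -> bool:
    """True if the target has the required type and any mapped
    property pair has equal normalized values."""
    if REQUIRED_TYPE not in target.get("rdf:type", set()):
        return False
    return any(
        src in source and tgt in target
        and normalize(source[src]) == normalize(target[tgt])
        for src, tgt in PROPERTY_MAP.items()
    )


print(matches(d1_song1, dbpedia_song))  # True
print(matches(d1_song1, dbpedia_band))  # False
```

With this guidance the song matches despite the punctuation, while the band is excluded by its type before any titles are compared.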
d1:song1 a ma:AudioTrack; ma:title "Soon"; ma:locator "musicexample:s1896.mp3"^^xsd:anyURI .
dbpedia:Transatlantic_KK a dbpedia-owl:Work; a dbpedia-owl:Album; dc:title "Soon"; dbprop:artist dbpedia:Delorean_(band) .
dbpedia:Soon_(Tanya_Tucker_song) a dbpedia-owl:Work; a dbpedia-owl:MusicalWork; dc:title "Soon"; dbprop:artist dbpedia:Tanya_Tucker .
D1 vs. DBpedia: owl:sameAs?
Human to correct
d1:song1 a ma:AudioTrack; ma:title "UFO"; ma:locator "musicexample:s1896.mp3"^^xsd:anyURI; ma:hasKeyword d1:colplay .
dbpedia:Leipzig a dbpedia-owl:Place; rdfs:label "Leipzig" .
D1 vs. DBpedia: o:wasPlayedIn?
Human to create new links
Human:
• Creative and proactive
• Listen / watch / search
• Process / associate / draw more complicated conclusions
Crowd-powered data interlinking
• Building a system that
  – Combines algorithmic and human computation
  – Systematically involves humans via microtasks
  – Considers the aforementioned types of links
  – Covers schema- and instance-level links
[Figure label: Automatic interlinking]
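Involving humans via microtasks usually means asking several workers about the same candidate link and combining their answers. The talk does not say how answers are combined; the sketch below assumes simple majority voting, and `aggregate_votes` and `min_votes` are hypothetical names. Links without a clear majority come back as `None`, so they could be re-posted or escalated.

```python
from collections import Counter
from typing import Dict, List, Optional


def aggregate_votes(answers: Dict[str, List[bool]],
                    min_votes: int = 3) -> Dict[str, Optional[bool]]:
    """Majority vote per candidate link.

    Returns True (accept), False (reject) or None (undecided: too few
    votes or a tie), so undecided links can be sent out again."""
    decisions: Dict[str, Optional[bool]] = {}
    for link, votes in answers.items():
        counts = Counter(votes)
        if len(votes) < min_votes or counts[True] == counts[False]:
            decisions[link] = None
        else:
            decisions[link] = counts[True] > counts[False]
    return decisions


answers = {
    "d1:song1 owl:sameAs dbpedia:U.F.O._(song)": [True, True, False],
    "d1:song1 owl:sameAs dbpedia:UFO_(band)": [False, False, False],
    "d1:song1 o:wasPlayedIn dbpedia:Leipzig": [True, False],
}
print(aggregate_votes(answers))
```

Majority voting is only one option; weighting votes by each worker's estimated reliability is a common refinement.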
Overview
It worked! Quick and inexpensive. See CrowdMAP [Sarasua et al., 2012]
A microtask
Challenge #1: It has to work with ANYONE
Challenge #2: We still want a data-independent solution
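Challenge #2 asks for a data-independent solution. A minimal sketch of one way to approach it, assuming each resource can be reduced to a URI plus attribute-value pairs: the question template below never mentions a specific vocabulary, so the same code could serve any pair of data sets. `describe` and `render_microtask` are hypothetical helpers, not part of any crowdsourcing platform's API.

```python
from typing import Dict


def describe(uri: str, attributes: Dict[str, str]) -> str:
    """List a resource's attribute-value pairs, one per line."""
    lines = [uri] + [f"  {prop}: {value}" for prop, value in sorted(attributes.items())]
    return "\n".join(lines)


def render_microtask(source_uri: str, source_attrs: Dict[str, str],
                     relation: str,
                     target_uri: str, target_attrs: Dict[str, str]) -> str:
    """Build the same kind of question for any pair of resources:
    nothing here depends on a particular data set's vocabulary."""
    return "\n\n".join([
        f"Is the following statement true?\n{source_uri} {relation} {target_uri}",
        "Resource A:\n" + describe(source_uri, source_attrs),
        "Resource B:\n" + describe(target_uri, target_attrs),
        "Answer: [yes] [no] [not sure]",
    ])


print(render_microtask(
    "d1:song1", {"ma:title": "UFO", "rdf:type": "ma:AudioTrack"},
    "owl:sameAs",
    "dbpedia:U.F.O._(song)", {"dc:title": "U.F.O.", "prop:artist": "dbpedia:Coldplay"},
))
```

To work with ANYONE (Challenge #1), a real task would probably show human-readable labels rather than prefixed names; that step is left out of this sketch.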
Picture: Icon made by Freepik from http://www.flaticon.com
Ongoing work
How to improve?
How to optimize the process?
Crowdsourcing approaches
• Additional incentives to make workers process more links, faster (e.g. display the number of links left)
• Let workers explain to others: write the argument for the decision
• Show a similar link: decide by comparison
Challenge #3: How to decide what counts as an analogous link here? (Danger of bias?)
[Figure labels: predicate · rdf:type · false positive / negative]
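Showing a similar link only helps if "similar" is defined. As a hypothetical sketch of one possible definition, loosely based on the criteria listed on the slide: the function below keeps only decided links with the same predicate and picks the one whose rdf:type sets overlap most (Jaccard similarity) with the link under review. `Link`, `jaccard` and `most_analogous` are illustrative names. It does not settle the bias question: if the analogous link is always a true positive, workers may simply copy its answer.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Link(NamedTuple):
    predicate: str
    source_types: FrozenSet[str]
    target_types: FrozenSet[str]
    correct: Optional[bool] = None  # known answer for already-decided links


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two type sets: |a & b| / |a | b| (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_analogous(link: Link, decided: List[Link]) -> Optional[Link]:
    """Pick the decided link with the same predicate whose rdf:type
    sets overlap most with those of the link under review."""
    same_predicate = [d for d in decided if d.predicate == link.predicate]
    if not same_predicate:
        return None
    return max(same_predicate,
               key=lambda d: jaccard(link.source_types, d.source_types)
               + jaccard(link.target_types, d.target_types))


decided = [
    Link("owl:sameAs", frozenset({"ma:AudioTrack"}),
         frozenset({"dbpedia-owl:Work", "dbpedia-owl:Song"}), True),
    Link("owl:sameAs", frozenset({"ma:AudioTrack"}),
         frozenset({"dbpedia-owl:Band"}), False),
    Link("o:wasPlayedIn", frozenset({"ma:AudioTrack"}),
         frozenset({"dbpedia-owl:Place"}), True),
]
new = Link("owl:sameAs", frozenset({"ma:AudioTrack"}),
           frozenset({"dbpedia-owl:Work", "dbpedia-owl:MusicalWork"}))
print(most_analogous(new, decided))
```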
How to optimize the process?
Data-oriented approaches
• Test and instructing links: targeted selection
• Scheduled sequences of links to process: to make more sense
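Test links are links whose correct answer is already known. A common way to use them, assumed here rather than taken from the talk, is to mix them into the tasks and estimate each worker's accuracy from them. The sketch shows only that scoring step, not the targeted selection of which links to use as tests; `worker_accuracy`, `trusted_workers` and the 0.7 threshold are hypothetical.

```python
from typing import Dict, Set


def worker_accuracy(answers: Dict[str, Dict[str, bool]],
                    gold: Dict[str, bool]) -> Dict[str, float]:
    """Share of test links each worker answered like the known answer.
    Workers who saw no test link are left out."""
    accuracy: Dict[str, float] = {}
    for worker, given in answers.items():
        tested = [link for link in given if link in gold]
        if tested:
            hits = sum(given[link] == gold[link] for link in tested)
            accuracy[worker] = hits / len(tested)
    return accuracy


def trusted_workers(answers: Dict[str, Dict[str, bool]],
                    gold: Dict[str, bool],
                    min_accuracy: float = 0.7) -> Set[str]:
    """Workers whose accuracy on test links reaches the threshold."""
    return {w for w, acc in worker_accuracy(answers, gold).items()
            if acc >= min_accuracy}


gold = {"test1": True, "test2": False}
answers = {
    "w1": {"test1": True, "test2": False, "link7": True},
    "w2": {"test1": False, "test2": False, "link7": False},
    "w3": {"link7": True},
}
print(worker_accuracy(answers, gold))
print(trusted_workers(answers, gold))
```

Instructing links could reuse the same known answers, but shown to workers with an explanation instead of being used silently for scoring.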
Data-oriented approaches
• Validate vs. identify microtasks
Challenge #4: How to build that programmatically?
[Figure: spectrum from data analysis alone (easy, common cases) through data + crowd to data + expert (difficult, rare cases)]
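The spectrum above, from easy common cases handled by data analysis to difficult rare cases needing an expert, could be expressed as a routing rule. This is a hypothetical sketch: it assumes the interlinking algorithm reports a confidence for its own decision, and the thresholds 0.9 and 0.5 are placeholders. It also shows one reading of validate vs. identify microtasks: validate when there is a single candidate to confirm, identify when workers must pick among several.

```python
def route(confidence: float, n_candidates: int,
          auto_min: float = 0.9, crowd_min: float = 0.5) -> str:
    """Decide who processes a source resource's candidate links.

    confidence: how sure the algorithm is about its own decision (0..1).
    n_candidates: how many target resources the algorithm proposes."""
    if confidence >= auto_min:
        return "data analysis"  # easy, common case
    if confidence >= crowd_min:
        # one candidate: the crowd confirms or rejects it (validate);
        # otherwise: the crowd picks the right target, if any (identify)
        return "crowd: validate" if n_candidates == 1 else "crowd: identify"
    return "expert"  # difficult, rare case


print(route(0.95, 1))
print(route(0.7, 1))
print(route(0.7, 2))
print(route(0.3, 2))
```

For the "Soon" example, where two DBpedia resources share the title, a hypothetical confidence of 0.7 gives `route(0.7, 2)`, which returns `"crowd: identify"`.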
How to optimize the process? (II)
Challenge #5: How to predict how suitable a worker will be for processing a particular link?
Which features of links influence the prediction?
Previous cross-platform experience (CrowdWorkCV)
See also [Sarasua et al., 2013]
Ranking a list of suitable links based on training links
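Ranking links for a worker based on training links could look like the sketch below. It assumes a single hypothetical link feature (a topical domain such as "music") and estimates, per feature, how often the worker answered training links correctly, smoothed towards a prior so that one answer does not decide everything. Which features actually matter is exactly the open question on the slide; `feature_accuracy`, `rank_links`, `prior` and `strength` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def feature_accuracy(training: List[Tuple[str, bool]],
                     prior: float = 0.5,
                     strength: float = 2.0) -> Dict[str, float]:
    """Smoothed accuracy of one worker per link feature.

    training: (feature, answered_correctly) pairs from training links.
    The pseudo-counts (prior * strength) keep a single lucky answer
    from dominating the estimate."""
    correct: Dict[str, float] = defaultdict(float)
    total: Dict[str, int] = defaultdict(int)
    for feature, ok in training:
        total[feature] += 1
        correct[feature] += ok
    return {f: (correct[f] + prior * strength) / (total[f] + strength)
            for f in total}


def rank_links(links: List[Tuple[str, str]], accuracy: Dict[str, float],
               prior: float = 0.5) -> List[Tuple[str, str]]:
    """Order (link_id, feature) pairs by expected accuracy, best first.
    Features not seen in training fall back to the prior."""
    return sorted(links, key=lambda link: accuracy.get(link[1], prior),
                  reverse=True)


acc = feature_accuracy([("music", True), ("music", True),
                        ("geo", False), ("geo", True)])
print(acc)
print(rank_links([("l1", "geo"), ("l2", "music"), ("l3", "film")], acc))
```

Cross-platform experience, as in CrowdWorkCV, could enter the same scheme as an informed prior instead of the flat 0.5 used here.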
Challenge #6: How can we assess a priori whether (and approximately to what extent) we need crowdsourcing for a particular pair of data sets?
Take-away messages
• Yes, microtask crowdsourcing lets you involve humans in processing large amounts of data; it is cost-effective and fast
• Research shows it is a feasible complement to data interlinking algorithms
• BUT do not underestimate the management of microtasks
Coming soon … http://github.com/criscod
[Table from Schmachtenberg et al., 2014]
Open question: wouldn't crowd-powered data interlinking enrich this table?
Thank you for your attention!
Contact: Cristina Sarasua, Institute for Web Science and Technologies, Universität Koblenz-Landau
References
• Sarasua, C.: Crowdsourced Interlinking on the Web of Data. In: 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW), Doctoral Symposium. (2012)
• Sarasua, C., Simperl, E., Noy, N.F.: CrowdMAP: Crowdsourcing ontology alignment with microtasks. In: Proceedings of the 11th International Semantic Web Conference (ISWC). (2012)
• Sarasua, C., Thimm, M.: Microtask available, send us your CV! In: Proceedings of the International Workshop on Crowd Work and Human Computation (CrowdWork 2013). (2013)
• Schmachtenberg, M., Bizer, C., Paulheim, H.: Adoption of the Linked Data Best Practices in Different Topical Domains. In: 13th International Semantic Web Conference (ISWC 2014), RDB Track, Riva del Garda, Italy. (October 2014)