crowdtruth: machine-human computation for harnessing disagreement in semantic interpretation
DESCRIPTION
Presentation at ISWC2014, RDBS track
TRANSCRIPT
CrowdTruth.org
Machine-Human Computation for Harnessing Disagreement in Semantic Interpretation
Oana Inel, Khalid Khamkham, Tatiana Cristea, Anca Dumitrache, Arne Rutjes, Jelle v.d Ploeg, Lukasz Romaszko, Lora Aroyo, Robert-Jan Sips
Importance of Human Annotation
• Semantic interpretation of data is needed in all sciences
• Humans analyze examples and annotate them for the “correct” interpretation
• Machines learn & are evaluated from those examples
Lora Aroyo @laroyo
HUMAN DISAGREEMENT IS ESSENTIAL IN HELPING MACHINES WITH SEMANTIC INTERPRETATION!
ANTIBIOTICS are the first line treatment for indications of TYPHUS. → 95%
Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects. → 80%
With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS. → 50%
Does each sentence express the TREAT relation?
disagreement can reflect the degree of clarity in a sentence
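A minimal sketch of how a per-sentence score like the percentages above could be computed, assuming the score is simply the share of workers who answered "yes" to the TREAT question. That reading is an assumption, and the votes and the `yes_fraction` helper are hypothetical, not data or code from the talk.

```python
from collections import Counter


def yes_fraction(answers):
    """Return the share of workers who answered 'yes' for one sentence."""
    counts = Counter(answers)
    return counts["yes"] / len(answers)


# Hypothetical votes from 20 workers per sentence (not data from the talk).
votes = {
    "clear": ["yes"] * 19 + ["no"] * 1,
    "borderline": ["yes"] * 10 + ["no"] * 10,
}

scores = {name: yes_fraction(v) for name, v in votes.items()}
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

Under this reading, a score close to 100% or 0% means the workers agree, while a score near 50% flags a sentence whose meaning is unclear with respect to the relation.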
GADOLINIUM agents are useful for patients with renal impairment, but in patients with severe renal failure requiring dialysis it presents a risk of nephrogenic systemic FIBROSIS.
CAUSE? or SIDE EFFECT?
What is the RELATION between the highlighted terms?
disagreement can indicate ambiguity of the relation
70% 45%
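Percentages that add up to more than 100% are consistent with workers being allowed to pick several relations for the same sentence. The sketch below assumes that setup and shows one way to spot an ambiguous relation: compute the share of workers who selected each relation and check whether more than one relation is well supported. The selections, the `RELATIONS` list, and the 0.3 threshold are all hypothetical.

```python
RELATIONS = ["cause", "side_effect", "treat"]


def relation_shares(selections):
    """Share of workers who selected each relation (multiple picks allowed)."""
    n = len(selections)
    return {r: sum(r in picked for picked in selections) / n for r in RELATIONS}


def is_ambiguous(shares, threshold=0.3):
    """True when at least two relations reach the (hypothetical) threshold."""
    return sum(share >= threshold for share in shares.values()) >= 2


# Hypothetical selections from 10 workers for one sentence.
selections = (
    [{"cause"}] * 6
    + [{"cause", "side_effect"}] * 1
    + [{"side_effect"}] * 3
)

shares = relation_shares(selections)
print(shares, "ambiguous:", is_ambiguous(shares))
```

Here two relations both collect substantial support, which points at overlap between the relations themselves rather than at careless workers.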
disagreement can indicate low quality workers
• S1: ANTIBIOTICS are the first line treatment for indications of TYPHUS.
• S2: QUININE is not a reliable cure for MALARIA.
Does each sentence express the TREAT relation?
Worker S1 S2
Worker 1 yes no
Worker 2 yes no
Worker 3 yes
Worker 4 no
Worker 5 no yes
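One simple way to surface a worker who consistently disagrees with everyone else is to measure how often each worker's answers match the answers of the other workers on the sentences they share. The sketch below is an illustration under that assumption, not the metric used by the CrowdTruth software; the answer table and the `agreement` helper are hypothetical.

```python
def agreement(worker, answers):
    """Share of other workers' answers that match this worker's answers,
    counted over the sentences both workers annotated."""
    matches = 0
    total = 0
    for other, other_answers in answers.items():
        if other == worker:
            continue
        for sentence, label in answers[worker].items():
            if sentence in other_answers:
                total += 1
                matches += int(label == other_answers[sentence])
    return matches / total if total else None


# Hypothetical answers to "Does the sentence express the TREAT relation?"
answers = {
    "w1": {"S1": "yes", "S2": "no"},
    "w2": {"S1": "yes", "S2": "no"},
    "w3": {"S1": "yes", "S2": "no"},
    "w4": {"S1": "no", "S2": "yes"},
}

worker_scores = {w: agreement(w, answers) for w in answers}
print(worker_scores)
```

A low score alone does not prove low quality, since disagreement can also come from unclear sentences or ambiguous relations, so a score like this is best read together with the sentence-level signals.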
• Machine Pre-processing: optimizing crowdsourcing
• Micro-task Template Library: reuse & optimization
• CrowdTruth Analytics: disagreement-based metrics
• Novel approach to ground truth data collection & evaluation
• PROV for tracking versions of data and processing steps
• Reusability in variety of annotation tasks & domains with text, image, video (thinking about sound)
CrowdTruth Software Components: Machines & Crowds Workflow
• Open source: https://github.com/CrowdTruth
• Web service: http://stable.crowdtruth.org
CrowdTruth Software
CrowdTruth Software: Crowdsourcing Job Analytics
CrowdTruth Software: Worker Analytics
crowdtruth.org github.com/CrowdTruth