translation by collaboration among monolingual users
Translation by Collaboration among Monolingual Users
Benjamin B. Bederson
www.cs.umd.edu/~bederson
@bederson
Computer Science Department
Human-Computer Interaction Lab
Institute for Advanced Computer Studies
iSchool
University of Maryland
Programmer User Social Participant
Computational Participant
Human Computation
Things HUMANS can do
Things COMPUTERS can do
Translation
Photo tagging
Face recognition
Human detection
Speech recognition
Text analysis
Planning
Human Computation Taxonomy
Social Computing
Data Mining
Collective Intelligence
Crowdsourcing
Human Computation
The problem of translation
Languages on Internet by Population
Source: Global Reach, Internet World Stats
Year   English   Chinese   Spanish   Japanese   The rest
2009   28%       23%       8%        5%         37%
2005   32%       21%       8%        8%         31%
2000   52%       5%        5%        9%         29%
A real-world problem
International Children’s Digital Library
www.childrenslibrary.org
A real-world problem: ICDL
Now:
– ~5,000 books
– 55 languages
– Some translations in a few languages
– 3,000 volunteer translators
– 100K unique visitors/month
Goal:
– 10,000 books
– 100 languages
– Every book in every language!
www.childrenslibrary.org
The space of solutions
Machine Translation (MT)
Large volume, cheap, fast, but unreliable quality
Professional Translators
High quality, but slow and expensive (even for common language pairs)
Amateur Translators
Online Labor Markets
The key idea
Translation with the Crowd
Wikipedia: 900 translators vs. 1,200,000 contributors
Translate with the Monolingual Crowd
[Chart: Quality vs. Speed / Affordability]
Machine Translation
Professional Bilingual Human Participation
Amateur Bilingual Human Participation
Monolingual Human Participation
Monolingual collaboration
Source Language: Original Sentence
Target Language: Translation Candidate
Target-side Crowd Tasks:
1 Vote
2 Identify translation errors
3 Create new translation candidates
Source-side Crowd Tasks:
1 Vote
2 Explain errors
3 Paraphrase source sentence
The two sides are linked by MT and word alignment. New candidates and explanations pass between them, then repeat …
Pierre
Says: En général, on s'entend bien, tous les deux. (lit. In general, we get along together, the two of us.)
Sees: En général, Il est à la fois de nous.
Edits into: En général, nous nous entendons bien. (lit. In general, we get along well.)
Sees: En général, nous sommes de bons amis. (lit. In general, we are good friends.)
Proposes to stop with current translation
Mary
Sees: In general, it means well, both.
Edits into: In general, it is about both of us.
Sees: In general, we get along fine.
Edits into: In general, we are good friends.
Agrees to stop with current translation
[Diagram labels: MT on each of the four hand-offs between Pierre and Mary; enrichment]
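The Pierre and Mary exchange above can be sketched as a simple loop. This is only an illustrative sketch, not the MonoTrans2 implementation: `toy_mt`, `mary`, `pierre`, and `collaborate` are hypothetical names, the MT step is a lookup table holding just the machine translations shown on the slide, and voting, error marking, and Mary's agreement to stop are left out.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for an MT engine: only the translations shown
# in the Pierre/Mary example are available.
TOY_MT = {
    ("fr", "en", "En général, on s'entend bien, tous les deux."):
        "In general, it means well, both.",
    ("en", "fr", "In general, it is about both of us."):
        "En général, Il est à la fois de nous.",
    ("fr", "en", "En général, nous nous entendons bien."):
        "In general, we get along fine.",
    ("en", "fr", "In general, we are good friends."):
        "En général, nous sommes de bons amis.",
}

def toy_mt(src: str, tgt: str, text: str) -> str:
    return TOY_MT[(src, tgt, text)]

# Scripted participants (assumption: each one reads only their own language).
MARY_EDITS = {
    "In general, it means well, both.": "In general, it is about both of us.",
    "In general, we get along fine.": "In general, we are good friends.",
}
PIERRE_REPLIES = {
    # seen back-translation -> (rephrased source sentence, propose to stop?)
    "En général, Il est à la fois de nous.":
        ("En général, nous nous entendons bien.", False),
    "En général, nous sommes de bons amis.": (None, True),
}

def mary(seen: str) -> str:
    return MARY_EDITS[seen]

def pierre(seen: str) -> Tuple[Optional[str], bool]:
    return PIERRE_REPLIES[seen]

def collaborate(
    sentence: str,
    source_user: Callable[[str], Tuple[Optional[str], bool]],
    target_user: Callable[[str], str],
    mt: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate target-side edits and source-side checks through MT."""
    seen_by_target = mt("fr", "en", sentence)
    candidate = seen_by_target
    for round_number in range(1, max_rounds + 1):
        candidate = target_user(seen_by_target)        # target side edits
        back_translation = mt("en", "fr", candidate)   # MT back to source
        rephrased, stop = source_user(back_translation)
        if stop:                                       # source side is satisfied
            return candidate, round_number
        seen_by_target = mt("fr", "en", rephrased)     # MT of the paraphrase
    return candidate, max_rounds

final, rounds = collaborate(
    "En général, on s'entend bien, tous les deux.", pierre, mary, toy_mt
)
print(final, rounds)
```

Neither scripted participant ever reads the other language; every hand-off goes through the MT lookup, which is the point of the design.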
Target Side - Vote
Target Side - Identify Errors
Target Side - Edit Translations
Source Side - Explain Errors
Source Side - Vote & Confirm
What we’ve accomplished so far
Experiment 1
• 60 Spanish / 22 German speakers
• ICDL volunteers
• Worked on
  – 4 Spanish books => German
  – 1 German book => Spanish
TranslateTheWorld.org
Evaluation
• 2 German-Spanish bilingual evaluators
• Fluency and adequacy: 5-point score
• Compared Google Translate and MonoTrans2
Results - Fluency
[Bar chart: number of sentences (axis 0 to 150) at each fluency score from 1 to 5, Google vs. MonoTrans2]
Results - Accuracy
[Bar chart: number of sentences (axis 0 to 150) at each accuracy score from 1 to 5, Google vs. MonoTrans2]
Punchline
                               Google   MonoTrans2
Sentences with fluency = 5     21       112
Sentences with accuracy = 5    17       118
Sentences where BOTH = 5       17       110
Sentences for which both bilingual evaluators agree score = 5
(N=162 sentences worked on in the experiment)
Straight MT: 10% of sentences ready for prime time
MonoTrans2: 68% of sentences ready for prime time
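The two headline percentages follow from the "BOTH = 5" counts and N=162. A minimal sketch of that arithmetic, assuming the percentages are the counts divided by N and rounded to the nearest whole percent (the names below are illustrative, not from the slides):

```python
# Counts of sentences where both evaluators gave 5 for fluency and accuracy.
N_SENTENCES = 162
both_five = {"Google": 17, "MonoTrans2": 110}

def percent(count: int, total: int) -> int:
    """Share of sentences as a whole-number percentage."""
    return round(100 * count / total)

shares = {system: percent(count, N_SENTENCES) for system, count in both_five.items()}
print(shares)
```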
Experiment 2
• An alternative use case for crowdsourced translation…
Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo
  My family in Carrefour, 24 Cote Plage, 41A needs food and water
Moun kwense nan Sakre Kè nan Pòtoprens
  People trapped in Sacred Heart Church, PauP
Ti ekipman Lopital General genyen yo paka minm fè 24 è
  General Hospital has less than 24 hrs. supplies
Fanm gen tranche pou fè yon pitit nan Delmas 31
  Undergoing children delivery Delmas 31
Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.
TranslateTheWorld.org
Fluency Distribution
Adequacy Distribution
Punchline
                               Google     MonoTrans2
Sentences with fluency = 5     1 (1%)     22 (30%)
Sentences with adequacy = 5    11 (14%)   29 (38%)
Sentences where BOTH = 5       0 (0%)     14 (18%)
Sentences for which both bilingual evaluators agree score = 5
(N=76 sentences completed)
Straight MT: 0% of sentences preserve all the meaning
MonoTrans2: 38% of sentences preserve all the meaning
Scaling Up
Live for one week:
• 137,000 page views
• 1,900 task submissions
• 19 secs per task
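To put the one-week figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes "19 secs per task" is an average over all submissions, which the slide does not state; the derived quantities are illustrative and not reported in the talk.

```python
# Figures taken from the slide.
PAGE_VIEWS = 137_000
TASK_SUBMISSIONS = 1_900
SECONDS_PER_TASK = 19  # assumed to be the mean time per submitted task

# Total crowd effort contributed during the week.
total_seconds = TASK_SUBMISSIONS * SECONDS_PER_TASK
total_hours = total_seconds / 3600

# How many page views it took, on average, to get one task submission.
views_per_submission = PAGE_VIEWS / TASK_SUBMISSIONS

print(total_seconds, round(total_hours, 1), round(views_per_submission))
```

Under that assumption the week amounts to roughly ten hours of contributed human effort, with about one submission for every 72 page views.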
Example
Copying is the sincerest form of flattery…
Toward a more general architecture
Joining forces with Chris Callison-Burch, Johns Hopkins University
Take-aways
• By combining
  – machine translation technology
  – human-computer interfaces
  – crowdsourcing
  it is possible to achieve accurate translation without bilingual human expertise.
Participating Students:
Chang Hu, CS Ph.D. student
Alex Quinn, CS Ph.D. student
Vlad Eidelman, CS Ph.D. student
Yakov Kronrod, Linguistics Ph.D. student
Olivia Buzek, CS/Linguistics undergrad
New Paradigms…
Human Comp.
Comp. Ling.
HCI
TranslateTheWorld.org
Philip Resnik, Professor
Linguistics
Institute for Advanced Computer Studies