Joke Daems [email protected]
www.lt3.ugent.be/en/projects/robot
Supervised by: Lieve Macken, Sonia Vandepitte, Robert Hartsuiker
Two sides of the same coin: assessing translation quality through
adequacy and acceptability error analysis
What makes error analysis so complicated?
“There are some errors for all types of distinctions, but the most problematic
distinctions were for adequacy/fluency and seriousness.”
– Stymne & Ahrenberg, 2012
Does a problem concern adequacy, fluency, both, neither?
How do we determine the seriousness of an error?
Two types of quality
“Whereas adherence to source norms determines a translation's adequacy as
compared to the source text, subscription to norms originating in the target culture
determines its acceptability.”
– Toury, 1995
Why mix?
Subcategories
Acceptability
Grammar & Syntax
Lexicon
Spelling & typos
Style & register
Coherence
Adequacy
Contradiction
Deletion
Addition
Word sense
Meaning shift
Acceptability: fine-grained
Grammar & Syntax: article, comparative/superlative, singular/plural, verb form, article-noun agreement, noun-adj agreement, subject-verb agreement, reference, missing, superfluous, word order, structure, grammar – other
Lexicon: wrong preposition, wrong collocation, word nonexistent
Spelling & Typos: capitalization, spelling mistake, compound, punctuation, typo
Style & Register: register, untranslated, repetition, disfluent, short sentences, long sentence, text type, style – other
Coherence: conjunction, missing info, logical problem, paragraph, inconsistency, coherence – other
Adequacy: fine-grained
Meaning shift: contradiction, word sense disambiguation, hyponymy, hyperonymy, terminology, quantity, time, meaning shift caused by punctuation, meaning shift caused by misplaced word, deletion, addition, explicitation, coherence, inconsistent terminology, other
How serious is an error?
“Different thresholds exist for major, minor and critical errors. These should be flexible,
depending on the content type, end-user profile and perishability of the content.”
- TAUS, error typology guidelines, 2013
Give different weights to error categories depending on text type & translation brief
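A minimal sketch of brief-dependent weighting, assuming each annotated error carries a category label and that a weight table exists per text type. The weight values, the `WEIGHTS` table, and the function name are hypothetical and only illustrate the idea; they are not taken from the slides or from TAUS.

```python
from collections import Counter

# Hypothetical weights per text type (higher = more serious).
# Illustrative numbers only; in practice they would follow the translation brief.
WEIGHTS = {
    "technical": {"terminology": 3.0, "style - other": 0.5, "typo": 1.0},
    "newspaper": {"terminology": 1.0, "style - other": 2.0, "typo": 1.0},
}
DEFAULT_WEIGHT = 1.0  # used for categories without an explicit weight


def weighted_error_score(errors, text_type):
    """Sum the weights of all annotated error categories for one translation."""
    weights = WEIGHTS[text_type]
    counts = Counter(errors)
    return sum(weights.get(cat, DEFAULT_WEIGHT) * n for cat, n in counts.items())


# The same four annotated errors, scored under two different briefs.
errors = ["terminology", "terminology", "typo", "style - other"]
technical_score = weighted_error_score(errors, "technical")
newspaper_score = weighted_error_score(errors, "newspaper")
```

With these made-up weights, the same set of errors is penalised more heavily in a technical text than in a newspaper article, because terminology errors count for more there.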
Application example: comparative analysis
[Bar chart] Top HT problems, newspaper articles (axis 0% to 12%): wrong collocation, word sense, deletion, punctuation, other meaning shift
[Bar chart] Top PE problems, newspaper articles (axis 0% to 12%): punctuation, other meaning shift, compound, typo, word sense, wrong collocation
[Bar chart] Top HT problems, technical texts (axis 0% to 20%): wrong collocation, untranslated, other meaning shift, compound, logical problem, terminology
[Bar chart] Top PE problems, technical texts (axis 0% to 18%): other meaning shift, untranslated, article, logical problem, terminology, compound
Next step: diagnostic & comparative evaluation
• What makes a ST-passage problematic?
• How problematic is this passage really? (i.e., how many translators make errors)
• Which PE errors are caused by MT?
• Which MT errors are hardest to solve?
Link all errors to corresponding ST-passage
Source text-related error sets
• ST: Changes in the environment that are sweeping the planet...
• MT: Veranderingen in de omgeving die het vegen van de planeet tot stand brengen... (wrong word sense) "Changes in the environment that bring about the brushing of the planet..."
• PE1: Veranderingen in de omgeving die het evenwicht op de planeet verstoren... (other type of meaning shift) "Changes in the environment that disturb the balance on the planet..."
• PE2: Veranderingen in de omgeving die over de planeet rasen... (wrong collocation + spelling mistake) "Changes in the environment that raige over the planet..."
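A source text-related error set can be sketched as a simple grouping of annotated errors by the ST passage they are linked to. The `Error` record, its field names, and the choice of "sweeping the planet" as the passage span are assumptions made for illustration; the slides do not specify a data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Error:
    st_passage: str  # source-text passage the error is linked to (assumed span)
    condition: str   # "MT", "PE" or "HT"
    translator: str  # hypothetical identifier, e.g. "PE1"
    category: str    # error label from the taxonomy


# The example from the slide, encoded as linked errors.
errors = [
    Error("sweeping the planet", "MT", "MT", "wrong word sense"),
    Error("sweeping the planet", "PE", "PE1", "other meaning shift"),
    Error("sweeping the planet", "PE", "PE2", "wrong collocation"),
    Error("sweeping the planet", "PE", "PE2", "spelling mistake"),
]


def error_sets(errors):
    """Group errors by the source-text passage they are linked to."""
    sets = defaultdict(list)
    for e in errors:
        sets[e.st_passage].append(e)
    return dict(sets)


def translators_with_errors(error_set, condition):
    """Distinct translators in one condition with at least one error in the set."""
    return {e.translator for e in error_set if e.condition == condition}


sets = error_sets(errors)
pe_translators = translators_with_errors(sets["sweeping the planet"], "PE")
```

Counting distinct translators per passage gives a rough answer to "how problematic is this passage really?": here both post-editors made an error in the same passage that the MT output got wrong.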
Application example: impact of MT errors on PE
[Bar charts] Top 10 MT errors, newspaper articles; Top 10 MT errors, technical texts (axis 0 to 30). Category labels recoverable for one panel only: compound, terminology, article, logical problem, other meaning shift, deletion, structure, verb form, missing constituent, word order
Summary
• Improve error analysis by:
– judging acceptability and adequacy separately
– making error weights depend on translation brief
– having more than one annotator
– introducing consolidation phase
• Improve diagnostic and comparative evaluation by:
– linking errors to ST-passages
– taking number of translators into account
Open questions
• How can we reduce annotation time?
– Ways of automating (part of) the process?
– Limit annotation to subset of errors?
• How to better implement ST-related error sets?
– Ways of automatically aligning ST, MT, and various TTs at word level?
Quantification of ST-related error sets
ST
    MT (1)
        MT1 (0.5)
            wrong word sense (0.5)
        MT2 (0.5)
    PE (1)
        PE1 (0.5)
            other meaning shift (0.5)
        PE2 (0.5)
            wrong collocation (0.25)
            spelling mistake (0.25)
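One reading of the weights on this slide is that each condition (MT, PE) gets a total weight of 1 per ST passage, split evenly over its translations, and that each translation's share is split evenly over the errors it contains. The sketch below implements that reading only; the function name and the input format are hypothetical.

```python
def quantify(condition_translations):
    """Distribute a weight of 1 over translations, then over their errors.

    condition_translations maps a translation id to its list of error labels
    for one ST passage (an empty list means no error in that passage).
    Returns a dict mapping (translation id, error label) to a weight.
    """
    share = 1.0 / len(condition_translations)  # weight per translation
    weights = {}
    for translation, labels in condition_translations.items():
        for label in labels:
            # A translation's share is divided equally among its errors.
            weights[(translation, label)] = share / len(labels)
    return weights


# The example from the slide: two MT outputs and two post-edited versions.
mt = quantify({"MT1": ["wrong word sense"], "MT2": []})
pe = quantify({
    "PE1": ["other meaning shift"],
    "PE2": ["wrong collocation", "spelling mistake"],
})
mt_total = sum(mt.values())
pe_total = sum(pe.values())
```

Under this reading, a condition's total stays at or below 1 regardless of how many translators took part, which keeps passages comparable across experiments with different numbers of participants.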
Inter-annotator agreement
                            Initial agreement  Agreement after consolidation  Correlation between annotators  Agreement on categories
HT&PE acceptability, Exp1   39% (κ=0.32)       67% (κ=0.65)                   r=0.67, n=38, p<0.001           90% (κ=0.89)
HT&PE acceptability, Exp2   50% (κ=0.44)       81% (κ=0.80)                   r=0.95, n=34, p<0.001           89% (κ=0.88)
HT&PE adequacy, Exp1        42% (κ=0.31)       82% (κ=0.79)                   r=0.87, n=38, p<0.001           89% (κ=0.87)
HT&PE adequacy, Exp2        46% (κ=0.30)       94% (κ=0.92)                   r=0.86, n=34, p<0.001           88% (κ=0.83)
MT acceptability, Exp1      53% (κ=0.49)       84% (κ=0.83)                   n/a                             83% (κ=0.81)
MT acceptability, Exp2      79% (κ=0.77)       95% (κ=0.94)                   n/a                             93% (κ=0.93)
MT adequacy, Exp1           57% (κ=0.46)       94% (κ=0.92)                   n/a                             86% (κ=0.79)
MT adequacy, Exp2           51% (κ=0.41)       86% (κ=0.83)                   n/a                             86% (κ=0.82)
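For readers unfamiliar with the two figures reported per cell, the sketch below computes percentage agreement and a chance-corrected κ for two annotators who each assigned one label per item. It assumes Cohen's kappa; the slides do not say which kappa variant was used or how items were paired, so the function and the toy labels are illustrative only.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Toy example: four items, the annotators disagree on one.
annotator_1 = ["typo", "typo", "article", "article"]
annotator_2 = ["typo", "article", "article", "article"]
observed, kappa = agreement_and_kappa(annotator_1, annotator_2)
```

In the toy example the annotators agree on three of four items, but half of that agreement is expected by chance, so κ is noticeably lower than the raw percentage.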