![Page 1: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/1.jpg)
May 26th 2014
Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy
Ekaterina Stambolieva
[email protected]
![Page 2: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/2.jpg)
Outline
Why?
MT Adequacy?
What?
Evaluation
Findings
Conclusion & Future Work
![Page 7: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/7.jpg)
WHY?
Impending industry problem:
How do we compare MT systems over time?
We measure MT quality continuously
BLEU?
We want adequate translations
MTE, May 26th 2014
![Page 10: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/10.jpg)
ADEQUACY
How do we define MT adequacy in business?
– accelerate time-to-delivery
– reduce translation costs
– achieve near-native fluency
![Page 12: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/12.jpg)
ADEQUACY
Adequacy: improving MT output’s acceptance for the task of post-editing
![Page 16: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/16.jpg)
WHAT
We aim to evaluate our MT systems continuously and compare results over time
We design our system’s improvements based on human end-user feedback
We do not directly evaluate translation quality; instead, we assess MT output improvement over time
No annotation effort required
![Page 17: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/17.jpg)
Outline
Why?
MT Adequacy?
What?
Evaluation
• Edit Distance
Findings
Conclusion & Future Work
![Page 20: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/20.jpg)
THE EXAMPLE
We compare the results of two English<->Danish MT systems

BLEU

| Direction | System 1 | System 2 |
|---|---|---|
| EN->DA | 59.22 | 58.84 |
| DA->EN | 64.26 | 63.98 |
![Page 21: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/21.jpg)
CATEGORIES
Three objective categories to evaluate MT output:
– Does the MT output look better than before?
– Does the MT output look worse than before?
– Is it difficult for you to judge whether the MT output is better or not?
![Page 23: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/23.jpg)
EVALUATION
We present an MT output evaluation based on the Edit Distance (ED) score
We compute how many edits are needed to transform the MT output into the human translation of the same source segment
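The edit count described on this slide can be sketched with the standard Levenshtein distance. This is a minimal illustration, not the presenter's implementation: the slides do not say whether edits are counted over words or characters, so the token-level comparison, the function name `edit_distance`, and the example sentences are all assumptions.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] is the distance between the already processed prefix of hyp
    # and the first j items of ref.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # turning i items of hyp into an empty ref takes i deletions
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,           # delete h
                            curr[j - 1] + 1,       # insert r
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical segments, tokenized on whitespace (an assumption).
mt_output = "the cat sat on mat".split()
human_translation = "the cat sat on the mat".split()
print(edit_distance(mt_output, human_translation))  # 1 (one inserted word)
```

Because the function accepts any sequence, passing plain strings instead of token lists gives a character-level count.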
![Page 25: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/25.jpg)
FINDINGS

| New MT ED | Old MT ED |
|---|---|
| 87.08 | 71.31 |
| 94.77 | 87.44 |
| 82.62 | 66.04 |
| 74.19 | 73.84 |
| 84.36 | 79.79 |
| 91.26 | 88.06 |
| 75.12 | 74.48 |

| | Y | X | N |
|---|---|---|---|
| Annotator 1 | 60% | 36% | 4% |
| Annotator 2 | 76% | 16% | 8% |
| Annotator 3 | 68% | 24% | 8% |

Improved MT acceptance for the task of post-editing
Length variance between the MT output of the new and old systems does not affect MT acceptance
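To make the size of the change easier to read, the seven score pairs from the ED table above can be compared row by row. This is only a sketch over the reported numbers: it assumes the first column is the new system and the second the old one, as the header order suggests, and the variable names are illustrative.

```python
# Score pairs copied from the ED table: (new MT ED, old MT ED).
# Assumption: the first value in each row belongs to the new system.
scores = [
    (87.08, 71.31),
    (94.77, 87.44),
    (82.62, 66.04),
    (74.19, 73.84),
    (84.36, 79.79),
    (91.26, 88.06),
    (75.12, 74.48),
]

# Per-row difference between the new and the old system.
deltas = [round(new - old, 2) for new, old in scores]
mean_delta = round(sum(deltas) / len(deltas), 2)

print(deltas)
print(mean_delta)
```

Under that column assumption every row shows a higher score for the new system, with the gain ranging from under one point to more than sixteen.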
![Page 32: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/32.jpg)
FUTURE WORK
Modify ED to take into account the number of UNK (unknown) words
Modify the metric so that it detects small improvements in the system, such as:
– number isolation
– tag protection
Take segment character length into consideration
– so as not to penalize shorter segments too heavily
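One possible reading of the character-length item is to divide the character-level edit count by the length of the longer segment, so that scores for short and long segments are on the same 0 to 1 scale. The slides do not specify a formula, so the sketch below is an assumption throughout; `char_edit_distance`, `normalized_ed`, and the sample strings are hypothetical.

```python
def char_edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def normalized_ed(mt: str, ref: str) -> float:
    """Edit distance divided by the longer segment's character length.
    Returns 0.0 for identical segments and 1.0 when nothing is shared."""
    longest = max(len(mt), len(ref))
    if longest == 0:
        return 0.0  # two empty segments are identical
    return char_edit_distance(mt, ref) / longest


# Hypothetical short segment: 2 edits over 9 characters.
print(normalized_ed("Gem fil", "Gem filen"))
```

Whether this normalization matches the intended change depends on how the ED score itself is defined, which the slides leave open.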
![Page 33: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy](https://reader036.vdocuments.us/reader036/viewer/2022081517/56815b1b550346895dc8cb39/html5/thumbnails/33.jpg)
Thank you