DeepMiner - Advanced Leveraging: Integrating Translation Memories and Machine Translation
Presentation at TEKOM, October 25th, 2012
![Page 1: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/1.jpg)
DeepMiner Integrating Translation Memories and Machine Translation
TEKOM
October 25th, 2012
Presenter: Daniel Benito
![Page 2: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/2.jpg)
Introduction
• History
• Limitations of Translation Memory
• Beyond Segment-Level Reuse
– Machine Translation
– Fuzzy Match Repair
– Advanced Leveraging
– Combining TM and MT
• Current Limitations
• Perspectives
• Conclusion
![Page 3: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/3.jpg)
History
• Past:
– 1950s – Early Machine Translation (MT) experiments
– 1960s – General awareness that Machine Translation (MT) was not going to replace human translators
– 1970s – First proposals for Translator Workstations
– 1990s – Translation Memory (TM) became viable
• Present:
– TM technology has barely advanced in the last ten years
– MT has advanced to the point where its applications in the translation industry are incontrovertible
![Page 4: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/4.jpg)
Limitations of Translation Memory
• Segment-level translation reuse is only useful in limited cases
• Even in highly repetitive texts, most of the repetitions happen at the sub-segment level:
– Terms and phrases
– Sentence structure
• Most Translation Memory systems are limited to providing fuzzy matches but are unable to exploit sub-segment repetition
![Page 5: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/5.jpg)
Beyond Segment-level Reuse
• We need to translate:
EN: The black cat usually sleeps in the hallway.
• Our TM contains:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• What can we do to reduce the time spent editing fuzzy matches?
– Ignore the fuzzy matches and use MT
– Automatically repair the fuzzy matches
![Page 6: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/6.jpg)
Machine Translation
• We need to translate:
EN: The black cat usually sleeps in the hallway.
• Results returned by various MT systems:
DE: Die schwarze Katze in der Regel schläft im Flur.
DE: Die schwarze Katze schläft normalerweise im Flur.
• Achieving consistency and using specific terminology (e.g. Gang instead of Flur) will require some degree of training or post-editing
![Page 7: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/7.jpg)
Machine Translation
• General-purpose MT engines such as Google Translate or Microsoft Translator usually require extensive post-editing, but can be used for inspiration
• Rule-based and statistical MT engines customized for specific domains offer much higher quality but require expensive tuning or retraining
• It is usually more expensive to use MT than to manually edit a fuzzy match
![Page 8: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/8.jpg)
Fuzzy Match Repair
• Inspired by the translation by analogy concept from Example-Based Machine Translation (EBMT)
• Attempts to maintain the quality and consistency of existing translations in the TM while increasing productivity
![Page 9: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/9.jpg)
Fuzzy Match Repair
• We need to translate:
EN: The black cat usually sleeps in the hallway.
• Our TM contains:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• We can replace graue with schwarze and Wohnzimmer with Gang to produce an exact match.
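Before any replacement can happen, the differing parts of the two source segments have to be identified. The following is a minimal sketch of that step using a token-level diff from Python's standard `difflib` module. The function names `tokenize` and `differing_spans` are illustrative assumptions, not part of DeepMiner, and the talk does not say how the product actually finds the differences.

```python
import re
from difflib import SequenceMatcher


def tokenize(segment: str) -> list[str]:
    """Split a segment into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", segment)


def differing_spans(tm_source: str, new_source: str) -> list[tuple[str, str]]:
    """Return (old, new) sub-segment pairs where new_source differs from tm_source."""
    old_tokens = tokenize(tm_source)
    new_tokens = tokenize(new_source)
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join the tokens of each differing span back into a phrase.
            pairs.append((" ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2])))
    return pairs


pairs = differing_spans(
    "The grey cat usually sleeps in the living room.",
    "The black cat usually sleeps in the hallway.",
)
print(pairs)  # [('grey', 'black'), ('living room', 'hallway')]
```

Each pair names a source phrase in the TM segment and the phrase that should take its place, so the repair needs a translation for both sides of every pair.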
![Page 10: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/10.jpg)
Fuzzy Match Repair
• Requires knowing the following translations:
grey → graue
black → schwarze
living room → Wohnzimmer
hallway → Gang
• What do we do if those translations are not explicitly in our TMs or termbases?
![Page 11: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/11.jpg)
Advanced Leveraging
• Bilingual concordance search:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
EN: Mary has bought a new pair of grey running shoes.
DE: Maria hat ein neues Paar graue Laufschuhe gekauft.
EN: This article is also available in grey.
DE: Dieser Artikel ist auch in grau erhältlich.
![Page 12: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/12.jpg)
Advanced Leveraging
• Statistically infer translations from the TM
• Compare all of the German translations and suggest one or more probable translations (e.g. graue, grau)
• Requires:
– Large TMs with many examples
– Consistent translations in the TM
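One simple way to infer a translation statistically is to score each target word by how strongly it co-occurs with the source term across the TM. The sketch below uses the Dice coefficient over the three concordance examples from the previous slide. It is an assumption chosen for illustration (the talk does not name the statistic DeepMiner uses), it handles single-word terms only, and `infer_translations` is a hypothetical helper name.

```python
import re
from collections import Counter


def words(segment: str) -> set[str]:
    """Lower-cased set of word tokens in a segment."""
    return set(re.findall(r"\w+", segment.lower()))


def infer_translations(term: str, tm: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Rank target words by Dice co-occurrence with a single-word source term."""
    term = term.lower()
    source_hits = 0              # segments whose source contains the term
    target_counts = Counter()    # segments containing each target word
    cooccurrence = Counter()     # segments containing both term and target word
    for source, target in tm:
        target_words = words(target)
        target_counts.update(target_words)
        if term in words(source):
            source_hits += 1
            cooccurrence.update(target_words)
    scores = {
        word: 2 * cooccurrence[word] / (source_hits + target_counts[word])
        for word in cooccurrence
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


tm = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
]
candidates = infer_translations("grey", tm)
print(candidates[0])  # ('graue', 0.8)
```

With only three examples, graue stands out, but grau ties at 0.5 with every other word that appears once. This is the reason the approach needs large, consistent TMs: more examples are what separate real translations from incidental neighbours.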
![Page 13: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/13.jpg)
Combining TM and MT
• We can use MT as an additional resource for finding the translations needed to repair fuzzy matches
• MT systems often give better results for terms and short phrases than for long sentences
• We approach this combination based on the following premises:
– A client’s own data is considered to be of higher quality and will always have priority over the Machine Translation results
– A fuzzy match repaired with Machine Translation will usually be better than a normal fuzzy match, and better than an MT result for an entire segment
![Page 14: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/14.jpg)
Combining TM and MT
• We need to translate:
EN: The black cat usually sleeps in the hallway.
• Our TM contains:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• Our termbase contains:
EN: grey
DE: graue
EN: black
DE: schwarze
EN: hallway
DE: Gang
![Page 15: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/15.jpg)
Combining TM and MT
• We do not have the translation for living room in our TM or our termbase, so we can request it from the MT system:
EN: living room
DE: Wohnzimmer
• The combination of material in our TM, termbase and MT system allows us to perform the appropriate replacements and obtain:
EN: The black cat usually sleeps in the hallway.
DE: Die schwarze Katze schläft gewöhnlich im Gang.
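The worked example above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not DeepMiner's implementation: `lookup`, `repair` and `fake_mt` are hypothetical names, the MT engine is replaced by a stub that only knows the one phrase from the slide, and the replacement is a plain string substitution. The lookup order reflects the premise that client data has priority over MT.

```python
from typing import Callable, Optional


def lookup(term: str, termbase: dict[str, str],
           mt_translate: Callable[[str], Optional[str]]) -> Optional[str]:
    """Client data (here: the termbase) has priority; MT is only a fallback."""
    if term in termbase:
        return termbase[term]
    return mt_translate(term)


def repair(tm_target: str, replacements: list[tuple[str, str]],
           termbase: dict[str, str],
           mt_translate: Callable[[str], Optional[str]]) -> Optional[str]:
    """Swap the translation of each old source phrase for that of the new one."""
    repaired = tm_target
    for old_source, new_source in replacements:
        old_target = lookup(old_source, termbase, mt_translate)
        new_target = lookup(new_source, termbase, mt_translate)
        if old_target is None or new_target is None or old_target not in repaired:
            return None  # cannot repair; keep the ordinary fuzzy match instead
        repaired = repaired.replace(old_target, new_target, 1)
    return repaired


termbase = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}


def fake_mt(term: str) -> Optional[str]:
    """Stand-in for a call to a real MT engine (hypothetical stub)."""
    return {"living room": "Wohnzimmer"}.get(term)


result = repair(
    "Die graue Katze schläft gewöhnlich im Wohnzimmer.",
    [("grey", "black"), ("living room", "hallway")],
    termbase,
    fake_mt,
)
print(result)  # Die schwarze Katze schläft gewöhnlich im Gang.
```

Returning `None` when a translation is missing or cannot be located in the TM target keeps the original fuzzy match available to the translator rather than presenting a half-repaired segment.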
![Page 16: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/16.jpg)
Current Limitations
• We need to translate:
EN: The white dog usually sleeps in the living room.
• Our TM contains:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• Our termbase contains:
EN: grey cat
DE: graue Katze
![Page 17: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/17.jpg)
Current Limitations
• Asking the MT system for the missing translation, we get:
EN: white dog
DE: weißer Hund
• The result of fixing the fuzzy match is:
EN: The white dog usually sleeps in the living room.
DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.
• Some post-editing is still required
![Page 18: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/18.jpg)
Current Limitations
• We need to translate:
EN: The grey cat often sleeps in the living room.
• Our TM contains:
EN: The grey cat usually sleeps in the living room.
DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
• The translations we get from the MT system are:
EN: usually
DE: normalerweise
EN: often
DE: oft
• We cannot repair the fuzzy match: the MT system translates usually as normalerweise, but the TM segment uses gewöhnlich, so we cannot tell which word in the German segment to replace
![Page 19: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/19.jpg)
Future Developments
• Greater integration with the MT engines
– Access to internal translation candidates:
• EN: usually
• DE: normalerweise, gewöhnlich, sonst, ...
– Access to internal language models:
• DE: Die weißer Hund – never
• DE: Der weiße Hund – often
– Automatic upload of new TM material to the MT engine so it can be used for retraining in the future
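To see why access to translation candidates would help, consider the usually/often example from the Current Limitations slides. The sketch below is hypothetical: it assumes the MT engine could return a ranked list of candidates, which the slide describes as a future development, and `repair_with_candidates` is an illustrative name. It tries each candidate until one is found in the TM target.

```python
from typing import Optional


def repair_with_candidates(tm_target: str, old_candidates: list[str],
                           new_translation: str) -> Optional[str]:
    """Replace the first candidate translation that occurs in the TM target."""
    for candidate in old_candidates:
        # Plain substring test; a real system would match on token boundaries.
        if candidate in tm_target:
            return tm_target.replace(candidate, new_translation, 1)
    return None  # none of the candidates was used in the TM translation


tm_target = "Die graue Katze schläft gewöhnlich im Wohnzimmer."

# With only the single best MT result for "usually", the repair fails.
single_best = repair_with_candidates(tm_target, ["normalerweise"], "oft")
print(single_best)  # None

# With the candidate list from the slide, gewöhnlich is found and replaced.
n_best = repair_with_candidates(
    tm_target, ["normalerweise", "gewöhnlich", "sonst"], "oft"
)
print(n_best)  # Die graue Katze schläft oft im Wohnzimmer.
```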
![Page 20: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/20.jpg)
Conclusion
• Traditional segment-level translation reuse has reached its full potential
• ATRIL’s Déjà Vu X2 already includes DeepMiner technology that improves productivity by cleverly combining all the approaches we described:
– (Statistical) Machine Translation
– Example-Based Machine Translation
– Advanced Leveraging (sub-segment matching)
![Page 21: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/21.jpg)
Questions?
![Page 22: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/22.jpg)
Additional Topics
![Page 23: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/23.jpg)
Predictive Typing
• Find all sub-segment matches and offer them to the translator as he or she types
• Suggestions are context-sensitive, so there are never too many results to choose from
• Translations are constructed piece by piece from previous texts, guided by the translator
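The core of predictive typing can be sketched as a prefix lookup over the target-side phrases found by sub-segment matching. This is a simplified illustration, not the Déjà Vu implementation: `suggest` is a hypothetical helper, the phrase list is taken from the German examples in this talk, and context sensitivity is assumed to come from building that list only from matches for the segment currently being translated.

```python
def suggest(typed_prefix: str, phrases: list[str], limit: int = 5) -> list[str]:
    """Return phrases that start with what the translator has typed so far."""
    prefix = typed_prefix.casefold()
    if not prefix:
        return []  # nothing typed yet, so offer nothing
    matches = [p for p in phrases if p.casefold().startswith(prefix)]
    # Shorter suggestions first, then alphabetical, capped at `limit`.
    return sorted(matches, key=lambda p: (len(p), p))[:limit]


# Target-side sub-segment matches for the current segment (illustrative data).
phrases = ["graue Katze", "graue Laufschuhe", "gewöhnlich", "Gang", "Wohnzimmer"]

print(suggest("g", phrases))   # ['Gang', 'gewöhnlich', 'graue Katze', 'graue Laufschuhe']
print(suggest("gr", phrases))  # ['graue Katze', 'graue Laufschuhe']
```

The list of suggestions shrinks with every keystroke, which is one way to keep the number of choices small.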
![Page 24: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/24.jpg)
Advanced Predictive Typing
• Advanced Leveraging techniques for statistically inferring sub-segment translations from the TM can be adapted to provide additional predictive typing suggestions
• Translations from MT can be added to the predictive typing mechanism, to offer additional suggestions for translations of terms and phrases
![Page 25: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/25.jpg)
MT integrations in Déjà Vu X2
• Systran Enterprise Server
• Google Translate
• Microsoft Translator
• PROMT Translation Server
• iTranslate4.eu
![Page 26: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/26.jpg)
Systran Enterprise Server
![Page 27: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/27.jpg)
Google Translate
![Page 28: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/28.jpg)
Microsoft Translator
![Page 29: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/29.jpg)
PROMT Translation Server
![Page 30: DeepMiner - Advanced Leveraging :Integrating Translation Memories and Machine Translation](https://reader033.vdocuments.us/reader033/viewer/2022051211/555cf474d8b42add648b4d1a/html5/thumbnails/30.jpg)
iTranslate4.eu