Improve Your KantanMT Machine Translation Engine

3 Steps to Improve Quality No Hardware. No Software. No Hassle MT

Upload: kantanmt

Post on 01-Dec-2014

DESCRIPTION

A guide to improving the quality of KantanMT machine translation engines using a methodology developed by KantanMT.

TRANSCRIPT

  • 1. No Hardware. No Software. No Hassle MT.

  • 2. With KantanMT: 3 Steps to Improve Quality
  • 3. What we aim to cover today: how to improve the quality of your MT engine; a Build-Measure-Learn process; how we measure and quantify quality in MT; practical illustrations throughout of KantanMT in action; Questions & Answers.
  • 4. 3 Steps to Higher Quality. Evolutionary process: not a once-off step; a continuous improvement loop; incremental improvements over time. GIGO: Garbage In => Garbage Out. Production engine: at least three iterations; experimentation with different inputs; measurements of different outputs; control your own destiny. (Diagram labels: Build, KantanMT Engine, Learn, Measure.)
  • 5. Build: Building quality training streams. Training data: how KantanMT learns to translate; it mimics your style, terminology and fluency. Bad training data: Garbage In => Garbage Out. Three main factors: quality, relevance to domain, quantity.
  • 6. Build: Building quality training streams. Training data, three main factors. Quality: the linguistic quality of the training material is crucially important. Relevance to domain: a high-quality MT system has good domain knowledge, similar to the way you've always worked with Translation Memories and CAT tools. Quantity: the more training data you use to build your engine, the better its capacity to generate translations that mimic your translation style and terminology.
  • 7. Build: Building quality training streams. Balancing the equation. (Figure label: Quality.)
  • 8. Build: Is quantity important? Not if quality is good; it's a balancing act! (Figure label: Quality.)
  • 9. Build: Building quality training streams. Quality training data, suitable sources: KantanMT Stock Engines; Translation Memories; other Translation Memories; monolingual (target only) data. (Diagram labels: Language Base Data, Translation Sources, Domain Base Data, Monolingual Data, Training Data.)
  • 10. Build: Building quality training streams. Advantages of clean, high-quality training data: less correction of errors; finding the cause of errors is easier; easy to fill gaps; faster processing time. Large volumes of dirty data make correction difficult: finding the root cause of a problem is challenging; training and processing are slower.
  • 11. Build in KantanMT
  • 12. Build-Measure-Learn: Measure. (Diagram labels: Build, KantanMT Engine, Learn, Measure.)
  • 13. Measure: KantanMT engine calibration. What to measure? BLEU, F-Measure, word counts, TER.
  • 14. Measure: KantanMT engine calibration, BLEU score. A scoring system developed to automate the process of evaluation; internationally recognised and the most widely used measure of the quality of your MT engine. The BLEU metric scores a translation on a scale of 0 to 100%; the closer to 100%, the more the translation correlates to a human translation. AIM: HIGH.
  • 15. Measure: KantanMT engine calibration, F-Measure score. F-Measure is an automated measurement of the engine's precision and recall capabilities; a general guide to the overall quality performance of an engine; derived from the recall and precision measurements; displayed as a percentage value on a scale of 0 to 100%. AIM: HIGH.
  • 16. Measure: KantanMT engine calibration, TER score. A method to help predict the post-editing effort; TER is quick to use and correlates highly with actual post-editing effort; a TER score is a value in the range of 0 to 100%. AIM: LOW.
  • 17. Measure: KantanMT engine calibration, word counts. At least 1.5-2.0 million words are needed to build a predictable, quality KantanMT engine; with fewer than 2m words, the engine should be used only in a narrow field or domain; a wide-domain engine needs in the order of 10-15m words of training data.
  • 18. Measure: KantanMT engine calibration. Track your scores using KantanWatch.
  • 19. Measure: KantanMT engine calibration. Compare engine scores & performance.
  • 20. Build-Measure-Learn: Learn. (Diagram labels: Build, KantanMT Engine, Learn, Measure.)
  • 21. Learn: KantanMT experimentation.
  • 22. Learn: KantanMT experimentation.
  • 23. Learn: KantanMT experimentation.
  • 24. Learn: KantanMT experimentation. Running and learning from your first translation job: BLEU 24%, F-Measure 50%, TER 66%, Wordcount 172K.
  • 25. Learn: KantanMT experimentation. Learn from examining the output. Score ratings: Low, OK, High, Low. Catalog errors: untranslated text; incorrect numeric formatting; invalid characters; high level of post-editing required. Conclusions: engine coverage is bad due to low wordcount; post-editing is high due to low engine coverage; training data doesn't contain correct numeric formatting; bad formatting in training data.
  • 26. Learn: KantanMT experimentation. Learn from examining the output. Score ratings: Low, OK, High, Low. Action plan. Coverage: more training data required, relevant and of high quality; also use a Glossary File to improve terminology consistency and accuracy. Numeric formatting: use a PEX rule to post-edit the translation and fix numeric formats. Invalid character: use a PEX rule to fix this invalid character issue. Post-editing: by increasing the quantity of training data, the KantanMT engine will perform better overall.
  • 27. Build: Action plan. Coverage: more training data required, relevant and of high quality. Post-editing: by increasing the quantity of training data, the KantanMT engine will perform better overall.
  • 28. Measure: Action plan. Your latest scores are...
  • 29. Measure: Action plan. Results using more relevant, high-quality training data. Metrics shown: BLEU, F-Measure, TER, Wordcount. Values and ratings as transcribed: 64% Excellent; 63%; 33%; Very Good; Very Good; Wordcount 479K, Good. Previously: Low, OK, High, Low.
  • 30. Learn/Build: Action plan. Customise your engine: runtime customisation improves quality too! (Diagram labels: PEX Rules, TBX Files, KantanMT Engine, Higher Quality Machine Translation, Reduced Post-Editing.)
  • 31. Learn/Build: Action plan. Coverage: use a Glossary File to improve terminology consistency and accuracy. Numeric formatting: use a PEX rule to post-edit the translation and fix numeric formats. Invalid character: use a PEX rule to fix this invalid character issue.
  • 32. Learn/Build: Action plan. Coverage: use a Glossary File to improve terminology consistency and accuracy. Numeric formatting: use a PEX rule to post-edit the translation and fix numeric formats. Invalid character: use a PEX rule to fix this invalid character issue. (Figure labels: PEX file, Original output.)
  • 33. Build-Measure-Learn: The results. Analyse output: untranslated text; numeric formatting; invalid character. IMPROVED QUALITY.
  • 34. Build-Measure-Learn: Learn. Human post-editing as part of the Learn step: take the KantanMT output; have a linguist post-edit it; re-build the KantanMT engine. This rapidly improves the quality of your KantanMT engine. (Diagram labels: Build, KantanMT Engine, Learn, Measure.)
  • 35. Learn: Action plan. Post-editing feedback rapidly improves your KantanMT engine. (Diagram labels: KantanMT Engine; XLIFF, TMX; Machine Translation; Post-Editing; Higher Quality; Rebuild KantanMT Engine; Finalised Publishable XLIFF, TMX.)
  • 36. Summary: Build-Measure-Learn.
  • 37. Summary: Build-Measure-Learn.
  • 38. Summary: Build-Measure-Learn. Who can do this? YOU CAN!
  • 39. Summary: Build-Measure-Learn. You as an LSP or language professional provide: extensive language expertise; the skills to ensure accuracy and precision of your translation; management and maintenance of TMs for your clients for use in your CAT tools. KantanMT provides: the software and the hardware to Build your engines; quality metrics to Measure the quality of your engine; tools and process to Learn and then teach your engine; support and help.
  • 40. Summary: Build-Measure-Learn. Follow this Build-Measure-Learn process and KantanMT will increase productivity: process more words per hour, per day. Net result? Higher earnings, more income, better margins.
  • 41. Questions & Answers. Thank you!
  • 42. Additional information. For additional information please visit: http://www.kantanmt.com. Contact me at: Kevin McCoy, E-mail: [email protected], Mobile: +353 86 823 1527.
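The Measure slides (15 to 17) describe F-Measure, TER and word counts only in outline, so the following is a minimal Python sketch of how such numbers could be computed for a single sentence pair. It is not KantanMT's implementation: the function names are hypothetical, F-Measure is assumed to be the conventional harmonic mean of word-level precision and recall, the edit rate is a simplified stand-in for TER (real TER also counts block shifts as single edits), and the word-count thresholds are one reading of the guidance on slide 17.

```python
from collections import Counter


def f_measure(candidate: str, reference: str) -> float:
    """Harmonic mean of word-level precision and recall, as a fraction 0..1."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # shared words, respecting repeat counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def edit_rate(candidate: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Simplified approximation of TER: insertions, deletions and
    substitutions only, with no block shifts.
    """
    cand_words, ref_words = candidate.split(), reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    prev = list(range(len(ref_words) + 1))
    for i, cw in enumerate(cand_words, start=1):
        curr = [i]
        for j, rw in enumerate(ref_words, start=1):
            cost = 0 if cw == rw else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref_words)


def wordcount_guidance(words: int) -> str:
    """Map a training word count onto the slide 17 guidance (assumed thresholds)."""
    if words < 1_500_000:
        return "below the suggested minimum"
    if words < 2_000_000:
        return "narrow domain only"
    if words < 10_000_000:
        return "between the narrow- and wide-domain guidance"
    return "wide-domain range"


if __name__ == "__main__":
    mt_output = "the dog sat"
    human = "the cat sat down"
    print(f"F-Measure: {f_measure(mt_output, human):.0%}")
    print(f"Edit rate: {edit_rate(mt_output, human):.0%}")
    print(wordcount_guidance(172_000))  # word count of the first engine on slide 24
```

Scoring a whole engine would mean aggregating over a held-out test set rather than one sentence; the sketch only shows why a higher F-Measure and a lower edit rate both point to less post-editing.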