Copyright © 2016, Asia Online Pte Ltd
Planning and Executing a Successful
Machine Translation Project
Tim Cox, Sales Manager
Dion Wiggins, CTO
Contact: [email protected]
• Introduction to Language Studio™
• Qualifying when to build an MT engine
• Preparing to build your MT engine
• Error identification and quality improvement methodologies
• Post editing process guidelines
• Measuring Quality and Return on Investment
LOWEST Total Cost of Translation
HIGHEST Return On Investment
FASTEST Time to Achieve ROI
LEAST Time & Effort to Publish

DIY & EXPERT Guided Professional Engines
NO DATA Needed to Start
EMPOWERED Complete Control
FLEXIBLE Billing Models

INTEGRATED Into a Broad Range of Platforms
SECURE Your Data is Your Data
SCALABLE Billions of Words Per Day
DEPLOY Cloud and On-Site
• Automotive
• Banking and Finance
• eDiscovery
• Engineering & Manufacturing
• Information Technology
• Life Sciences
• Military and Defense
• News and Media
• Patents and Legal
• Politics and Government
• Retail and eCommerce
• Travel and Hospitality
Jump start your custom engines by building on top of our Industry Data
• Historical Sweet Spot
– HIGH WORD VOLUMES
– You have high volumes in a single domain
– You have high volumes for a single customer
– You have recurring smaller jobs in a domain
– You have recurring smaller jobs for a customer
– You expect multiple projects from a customer
– You need data, lots and lots of data
• BASIC Customization
– 50,000 words or more
– Your data only
• PROFESSIONAL Customization
– Data manufacturing
– Your data is desirable, but not a requirement
– No data needed
– Highly specialized text
[Diagram: building a Custom Engine. Labels: Language Pair Foundation, Industry Foundation, Client Data, Language Studio Foundation Data, Sub-Domain Specific Data, Manufactured Data, Training Data, Language Studio MT Platform, Language Studio PowerTrain, Training, Ready to Translate, Custom Engine]
BASIC Customization
– Commodity Level Rapid Customization delivers a working engine in just a few hours.
– Do-It-Yourself (DIY) customization via an easy-to-use web-based user interface, with fully automated training.
– Delivers typical productivity increases of 10-30% when the data provided is high quality, clean and suitable for the domain.
– Easy upgrade path to PROFESSIONAL
PROFESSIONAL Customization
– Expert Driven Comprehensive Customization with deep data cleaning, data manufacturing, term management, gap analysis and adaptation for writing style and target audience.
– Language Studio Experts engage directly with you to understand your specific requirements and goals and tailor a Quality Improvement Plan to meet your specific needs.
– High quality specialized output that delivers typical productivity increases of 300%+ with rapid quality improvement for the highest ROI.
Some of the very technical segments
were better quality than what our
human translators were producing.
-- Simon Bratina
Executive Technical Director
300%+ Productivity Gain
50%+ Segments Require No Edits
Language Studio™ custom translation engines are customized to near-human quality
through a human-guided process that leverages a comprehensive range of
automated tools. Only through human cognition and understanding can the highest
quality granular custom engines be created for each client and purpose.
When developing an MT system for a particular purpose, the first iteration is called
a Diagnostic Engine. As the name indicates, it is used to diagnose issues and
fix errors. It is usual for such immature engines to contain some error patterns.
The diagram below shows the 3 basic stages of MT engine development.
• BASIC
– Translation Memories
– Glossaries
– Non-Translatable Terms
– Target Language Documents
• PROFESSIONAL
– Source Language Content
– URLs
– Key Terms
– Style Guides

• BASIC
– Extraction of data into clean text format
  • Bilingual and monolingual
– Basic checks on content for poor quality
– Basic checks on content for suitability as training data

• PROFESSIONAL
– Guided by Language Studio Expert
– Unique Customized Quality Improvement Plan
– Clean Data SMT Approach
– Deep Data Cleaning
– Data Manufacturing
– Stylistic Control
– Gap Analysis
– Term Management
  • Bilingual Term Creation
  • Normalization
Optimal quality is achieved through a collaborative and interactive process between the client and Language Studio Experts.
No Data, A Little Data, Sufficient Data, Large Amounts of Data
• Language Studio provides tools and processes via PROFESSIONAL Customization with Expert Guidance for all of the above models.
• Language Studio is the only MT provider that offers an option to customize without any data at all.
• High quality requires a cooperative effort involving language professionals (provided by the client) and the processes and automated tools provided by Language Studio.
Spanish Original Source
Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos.
English Children's Books
A lot of care was taken to not upset others when arranging the meeting between the two long time enemies.
English Business
Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents.
Fewer Edits
Higher Productivity
Greater Margins
Happy Editors
• Language Studio™ provides tools and processes for
normalization of terminology
• Benefits include cost reductions, faster deliverables,
higher customer satisfaction and happier post editors
– Extract high quality data meeting engine QA requirements
– Training of Engine
– Incremental Improvements (Re-Training)
• BASIC
– You Are In Control: Your Analysis / Your Corrections
– Log files
– Runtime Rules
  • Glossary
  • Non-Translatable Terms
  • Pre-Translation Corrections
  • Post Translation Adjustments
• The Diagnostic Engine: error types to look for:
– Formatting errors.
– Times, dates: what is the desired format?
– Currencies: format, is conversion required?
– Measurements: specialised subject domains may have particular scientific measurements and metrics.
– Capitalization: let's get it right.
– Key Terminology.
* Note: It is important to be aware of the differences in the types of errors made by Machine Translation and Human Translation.
** Grammatical errors are fixed over time by Post Edit feedback.
Before Machine Translation
Source text is processed and modified.
Pre-Translation JavaScript (JS)
- Complex pre-processing can be customized via JavaScript.
Pre-Translation Corrections (PTC)
- A list of terms that adjust the source text, fixing common issues and making it more suitable for translation.
Non-Translatable Terms (NTT)
- A list of monolingual terms that are used to ensure key terms are not translated.
Runtime Glossary (GLO)
- A list of bilingual terms that are used to ensure terminology is translated a specific way.

After Machine Translation
Target text is processed and modified.
Post Translation Adjustment (PTA)
- A list of terms in the target language that modify the translated output. This is very useful for normalization of target terms.
Post Translation JavaScript (JS)
- Complex post-processing can be customized via JavaScript.
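The order in which these runtime rules act on a segment can be sketched as a small pipeline. This is a minimal Python illustration, not Language Studio's implementation: the function names, the `__NTT0__` placeholder scheme for protecting non-translatable terms, and the stand-in engine `fake_translate` are all assumptions made for the example. The custom JavaScript hooks and the runtime glossary are left out.

```python
import re

def apply_term_list(text, pairs):
    """Replace each literal term with its mapped term (used for PTC and PTA)."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

def protect_terms(text, terms):
    """Swap non-translatable terms for placeholders the engine leaves alone."""
    mapping = {}
    for i, term in enumerate(terms):
        token = f"__NTT{i}__"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping

def restore_terms(text, mapping):
    """Put the protected terms back after translation."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

def translate_with_rules(source, engine, ptc, ntt, pta):
    text = apply_term_list(source, ptc)        # before MT: fix the source
    text, mapping = protect_terms(text, ntt)   # before MT: shield key terms
    text = engine(text)                        # machine translation
    text = restore_terms(text, mapping)
    return apply_term_list(text, pta)          # after MT: normalize the target

def fake_translate(text):
    """Hypothetical stand-in for an MT engine: upper-cases everything
    except placeholders, so the effect of each stage is visible."""
    parts = re.split(r"(__NTT\d+__)", text)
    return "".join(p if p.startswith("__NTT") else p.upper() for p in parts)

result = translate_with_rules(
    "Install Hyper-V w/ care",
    fake_translate,
    ptc=[("w/", "with")],
    ntt=["Hyper-V"],
    pta=[("CARE", "CAUTION")],
)
print(result)  # INSTALL Hyper-V WITH CAUTION
```

The point of the sketch is the ordering: source-side rules run before the engine sees the text, and target-side rules run on its output.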
Original Source                        Corrected Source
PrecisionTMWorkstations                Precision™ Workstations
ChinaSingaporeSydney                   China, Singapore, Sydney
Hyper-VTM                              Hyper-V™
6TBExternal                            6TB External
w/                                     with
TO Q1                                  TO QUESTION 1
—                                      <wall/>:<wall/>
(\d)'|"(?=[ ](HD|disp|SAS|SATA))       ${1}-inch
• Support for case sensitive and case insensitive matches.
• Support for regular expressions.
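A correction list with literal rules, regular-expression rules and a per-rule case-sensitivity switch could look like the sketch below. It is an illustration under assumptions, not the Language Studio rule format: the `Correction` class and `apply_corrections` function are hypothetical, and the inch rule is a rewrite in Python `re` syntax (which uses `\1` rather than `${1}`) of one plausible reading of the last table row.

```python
import re
from dataclasses import dataclass

@dataclass
class Correction:
    pattern: str
    replacement: str
    is_regex: bool = False
    case_sensitive: bool = True

def apply_corrections(text, rules):
    """Apply each correction in list order to the source text."""
    for rule in rules:
        flags = 0 if rule.case_sensitive else re.IGNORECASE
        if rule.is_regex:
            text = re.sub(rule.pattern, rule.replacement, text, flags=flags)
        else:
            # Literal rule: escape the pattern and insert the replacement
            # verbatim, so backslashes in it are not treated as group references.
            text = re.sub(re.escape(rule.pattern),
                          lambda m, r=rule.replacement: r,
                          text, flags=flags)
    return text

RULES = [
    Correction("6TBExternal", "6TB External"),
    Correction("w/", "with"),
    Correction("to q1", "TO QUESTION 1", case_sensitive=False),
    # Assumed reading: a digit followed by ' or " before a hardware keyword
    Correction(r"(\d)['\"](?= (?:HD|disp|SAS|SATA))", r"\1-inch", is_regex=True),
]

fixed = apply_corrections('6TBExternal drive w/ 3.5" SATA', RULES)
print(fixed)  # 6TB External drive with 3.5-inch SATA
```

Rules run in list order, so a later rule sees the output of an earlier one; that ordering is a design choice of this sketch.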
Original Source                        Specified Translation
Portugal-Portuguese                    Portugais (Portugal)
Independent Software Vendor (ISV)      éditeurs de logiciels indépendants (ISV)
South Holland Province                 La Province Hollande-Méridionale
Proof of Concept (POC) engagement      mission de validation technique
HBA                                    adaptateur de bus hôte
Fine print                             Clauses complémentaires
Standup HBA adapter                    pour adaptateur de bus hôte
HBA standup adapter                    pour adaptateur de bus hôte
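Because glossary entries can overlap (for example "HBA" and "HBA standup adapter"), lookup order matters. The sketch below only finds glossary terms in a source segment, preferring the longest entry; how the engine then enforces the specified translation is not described in the slides and is not modelled here. The function name is hypothetical, and the source/target split of the "HBA standup adapter" row is an assumption about the flattened table.

```python
import re

GLOSSARY = {
    "HBA": "adaptateur de bus hôte",
    "HBA standup adapter": "pour adaptateur de bus hôte",  # assumed row split
    "Fine print": "Clauses complémentaires",
    "Independent Software Vendor (ISV)": "éditeurs de logiciels indépendants (ISV)",
}

def find_glossary_terms(segment, glossary):
    """Return (source term, specified translation) pairs found in the segment.

    Terms are tried longest first, so a multi-word entry wins over a shorter
    entry that starts at the same position. Word boundaries are not checked,
    which a production matcher would need to handle.
    """
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return [(m.group(0), glossary[m.group(0)]) for m in pattern.finditer(segment)]

matches = find_glossary_terms(
    "Install the HBA standup adapter, then the HBA driver", GLOSSARY
)
for source_term, target_term in matches:
    print(source_term, "->", target_term)
```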
• PROFESSIONAL
– Expert Analysis and Guidance by Language Studio Experts
– Next Steps Defined
– Quality Improvement Plan Refined
– Iterative Improvement
– Deep analysis of issues and improvement options
– Training and Premium Support
– Priority Training
– Rapid Retraining
– Faster Quality Improvement
Common Mistake: Measuring the quality of the initial engine and determining costs from it.
• Many issues in data and formatting can only be seen once an engine has been customized.
• One of the most important metrics is how quickly a custom engine can improve.
• Many of the most significant issues can be addressed very quickly.
• Metrics should begin once an initial correction / diagnostic phase has been completed and corrective action taken.
• Directly linked to time to improve quality
• Faster improvement delivers faster ROI
• Many customers recover costs on their first project
• Projects progressively become lower cost
• Post Editors need to be Ready, Willing and Able to
work with MT.
• Training on Post Editing of MT.
• Error patterns, what to work on, feedback.
• Remuneration – very important to pay them
properly.
• Use a defined method: time a trusted translator to establish the productivity of post editing.
• Recognize translator bias and ‘red pen syndrome’.
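The timing method can be reduced to simple arithmetic: time the same trusted translator translating from scratch and post editing comparable text, then compare throughput. The sketch below is illustrative only; the function names and all word counts and timings are hypothetical.

```python
def words_per_hour(words, minutes):
    """Throughput for a timed session."""
    return words / (minutes / 60)

def productivity_gain(baseline_wph, post_edit_wph):
    """Percentage increase of post-editing throughput over the baseline."""
    return (post_edit_wph - baseline_wph) * 100 / baseline_wph

# Hypothetical timings for one trusted translator on comparable content
baseline = words_per_hour(words=500, minutes=60)    # translating from scratch
post_edit = words_per_hour(words=600, minutes=30)   # post editing MT output

gain = productivity_gain(baseline, post_edit)
print(f"{baseline:.0f} -> {post_edit:.0f} words/hour, gain {gain:.0f}%")
```

Using the same person and comparable content for both sessions keeps the comparison from being skewed by differences between translators or texts.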
[Workflow diagram]
Human-only workflow: Translate → Edit → Proof
MT workflow: Machine Translation → Quality Check → Is MT Segment Editable?
– Yes: Post Edit → Proof
– No: Human Translation → Edit → Proof
Note: Human Translator and Post Editor can be the same person, but Human Translator and Editor must not be the same person.
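One reading of the workflow above is that an editable MT segment goes to post editing and then proofing, while an uneditable one falls back to human translation, editing and proofing. The sketch below encodes that reading together with the staffing rule from the note. The function names are hypothetical, and how a segment is judged "editable" is outside the sketch.

```python
def plan_steps(mt_segment_editable):
    """Return the remaining steps for a segment after the MT quality check."""
    if mt_segment_editable:
        return ["Post Edit", "Proof"]
    return ["Human Translation", "Edit", "Proof"]

def check_staffing(human_translator, editor):
    """Enforce the note: the Human Translator and the Editor must differ.

    The Post Editor is not checked because the note allows the Post Editor
    and the Human Translator to be the same person.
    """
    if human_translator == editor:
        raise ValueError("Human Translator and Editor must not be the same person")

print(plan_steps(True))    # ['Post Edit', 'Proof']
print(plan_steps(False))   # ['Human Translation', 'Edit', 'Proof']
check_staffing(human_translator="translator_a", editor="editor_b")
```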
Levels 0 to -4:
PASSIVE: Have decided to try it “soon”, but not yet ready/willing to respond to signals for MT adoption, established globalization trends and changes. Riding the tail of technology adoptions.
“We are waiting for a customer to request MT”
OBSTRUCTIVE: Find reasons to not proceed: budget, time, people, process, training, technology, the weather.
“We don’t have time for this, we are too busy!”
SCORNFUL: Take a position of being threatened by MT and blame many negative industry trends on MT. Actively seek examples of failure from the market instead of success. Use Google as quality benchmark.
“Look how bad this MT is!! LOL”
REJECTION: Just block the opportunity entirely. No real reason needed.
“I don’t like MT. I am a human. I am creative. A machine can never do what I do!”
DENIAL: Are aware of MT, see market proof points from competitors, but refuse to pay attention.
“It does not impact my business. Why should I care?”
MANAGED
Level 05
Level 04
Level 03
Level 02
Level 01
• Custom MT Engines by Domain
• Post Editor Training & Feedback
• New Editor Payment Models
• TMS Integrated with MT Platform
• Opportunistic Cost Focused
• Minimal Business Benefit
• MT Project by Project
• Generic MT Engines
• High Failure Rate
• Productivity Metrics
• Clear Productivity Gains and Benefits
• Recruiting Post Editing Specialists
• Granular Domain Specific Engines
• Dedicated MT Project Managers
• Broad Management Support
• MT Specific Workflows
• Business KPIs
• MT Specific Products
• MT as a Corporate Strategy
• Full Management & Staff Support
• Customization Planning
• Terminology Management
• Revenue Metrics
• Iterative Business Process Review and Optimization
• Iterative Sales Strategy Review and Optimization
• Customer Driven MT Improvement
• Customer Driven MT Solutions
• Customer Driven Workflows
• MT Customization / Quality Specialists
• MT Sales Training
• MT / TM Integration
MT based on the Language Studio platform
and Asia Online’s MT Maturity Model
is now an integral part of our
business growth strategy.
In addition to enabling access to new market
segments, Language Studio has enabled
EQHO to increase profit margins, reduce
delivery times and raise translation quality.
-- Yvan Hennecart, COO
• Companies must do more than just replace first pass
human translators with machines.
• Companies need to adapt their processes from front to back to take full advantage of MT.
• Staff training in new tools, processes and offerings.
• A solid understanding of your data.
• Realistic expectations (timeline and results).
• The Language Studio™ Diagnostic Engine stage is when initial fixes are made.
• Effective testing and improvement at this stage allows a system to
go into production quickly.
• Language Studio MT systems improve over time via the post edit
feedback loop.
• Moving into production faster with MT reduces costs and speeds
up return on investment.
• LSPs need to have realistic expectations set and shared.
• Preparing your freelance translators to be MT post editors is
critical.
• Dedicated MT project managers in the LSP manage the transition
to new systems and workflows.
• When is an MT engine good enough? This depends upon language
pair, subject domain, project requirements for end product quality:
- Typical answer is: lower total cost and faster than all-human model.
• Testing and Benchmarking methodologies:
- Productivity / cost comparisons -> all human versus MT + human
• It must be realised that human Post Editors get used to MT content
over time and therefore become more productive than they were
during initial tests.
• MT engines improve over time.
• The quicker you go into production the sooner the gains can be
realized.
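The "lower total cost and faster than the all-human model" test can be written down as a small comparison. The sketch below is a simplified model under stated assumptions: the `Workflow` class, the `good_enough` function, and every rate, throughput and fixed cost are hypothetical, and real benchmarks would also account for quality requirements and for throughput improving over time.

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    name: str
    words_per_hour: float
    hourly_rate: float
    fixed_cost: float = 0.0   # e.g. engine customization, assumed one-off

    def hours(self, words):
        return words / self.words_per_hour

    def cost(self, words):
        return self.fixed_cost + self.hours(words) * self.hourly_rate

def good_enough(mt_plus_human, all_human, words):
    """True when MT + human is both cheaper and faster for this volume."""
    return (mt_plus_human.cost(words) < all_human.cost(words)
            and mt_plus_human.hours(words) < all_human.hours(words))

# Hypothetical figures for illustration only
all_human = Workflow("all human", words_per_hour=500, hourly_rate=40)
mt_plus_human = Workflow("MT + human", words_per_hour=1000, hourly_rate=40,
                         fixed_cost=1000)

for volume in (10_000, 100_000):
    print(volume, good_enough(mt_plus_human, all_human, volume))
```

With these made-up numbers the small project does not recover the fixed cost while the large one does, which is why volume and time in production matter to the comparison.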