Beyond Data: Delivering Machine Translation with Subject Matter Expertise
John Tinsley, Director / Co-Founder
TAUS MT Showcase. 4th June 2014, Dublin
We provide Machine Translation solutions with Subject Matter Expertise
We do this using Linguistic Engineering
An “ensemble” MT architecture
The world’s first and only patent-specific MT system that’s ready to go
What is Linguistic Engineering?
Data Engineering
Pre-processing Post-processing
Input Output
Training Data
Patents: an MT nightmare
L is an organic group selected from -CH2-(OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3 …
maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C.
Long Sentences
Technical constructions
Largest single document: 249,322 words
Longest Sentence: 1,417 words
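As an illustration of the kind of pre-processing such text calls for, here is a minimal sketch that normalises the database markup visible in the second example above ("N/mm<2>", "0[deg.] C.") before translation. The function name and the two rules are assumptions made for illustration; they are not taken from Iconic's system.

```python
import re

# Hypothetical rules for markup that patent databases use in place of
# superscripts and degree signs. Illustrative only.
SUPERSCRIPTS = {"2": "²", "3": "³"}


def normalise_patent_markup(text: str) -> str:
    """Rewrite '<2>' after a unit as a superscript and '[deg.] C' as '°C'."""
    # "N/mm<2>" -> "N/mm²": a bracketed 2 or 3 directly after a word character
    text = re.sub(r"(?<=\w)<([23])>", lambda m: SUPERSCRIPTS[m.group(1)], text)
    # "0[deg.] C." -> "0 °C.": keep the trailing full stop untouched
    text = re.sub(r"\[deg\.\]\s*([CF])\b", r" °\1", text)
    return text


sample = ("maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation "
          "of 700 to 1,300% at 0[deg.] C.")
print(normalise_patent_markup(sample))
```

A real system would need many more rules (chemical formulae, claim numbering, very long sentences); the point is only that such fixes happen around the engine rather than in the training data.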
An “ensemble” architecture
Data Engineering + Linguistic Engineering
Input
Output
Training Data
Client TM/terminology (optional)
Patent input classifier
Chinese pre-ordering rules
Japanese script normalisation
Korean pharma tokenizer
German compounding rules
Spanish med-device entity recognizer
MT engines: Moses (×3), RBMT
Multi-output combination
Statistical post-editing
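One way to picture how these components could fit together is sketched below: an input classifier routes text to a domain pipeline, which runs pre-processing steps, several engines, an output combination step, and post-processing. Every name here (DomainPipeline, classify, pick_longest, DEMO_PIPELINES) is hypothetical, the engines are stand-in functions rather than real Moses or RBMT calls, and the wiring is an assumption, not a description of Iconic's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Step = Callable[[str], str]    # a pre- or post-processing step
Engine = Callable[[str], str]  # stand-in for an MT engine (e.g. Moses, RBMT)


@dataclass
class DomainPipeline:
    pre: List[Step]
    engines: List[Engine]
    combine: Callable[[List[str]], str]
    post: List[Step]

    def translate(self, text: str) -> str:
        for step in self.pre:                     # e.g. tokenizer, pre-ordering
            text = step(text)
        candidates = [engine(text) for engine in self.engines]
        output = self.combine(candidates)         # multi-output combination
        for step in self.post:                    # e.g. statistical post-editing
            output = step(output)
        return output


def classify(text: str) -> str:
    """Toy input classifier: a keyword check instead of a trained model."""
    return "chemistry" if "alkyl" in text.lower() else "general"


def pick_longest(candidates: List[str]) -> str:
    """Toy combination rule; a real system would score the candidates."""
    return max(candidates, key=len)


def translate_document(text: str, pipelines: Dict[str, DomainPipeline]) -> str:
    return pipelines[classify(text)].translate(text)


DEMO_PIPELINES = {
    "chemistry": DomainPipeline(
        pre=[str.strip],
        engines=[lambda s: f"[smt-chem] {s}", lambda s: f"[rbmt] {s}"],
        combine=pick_longest,
        post=[lambda s: s + "."],
    ),
    "general": DomainPipeline(
        pre=[str.strip],
        engines=[lambda s: f"[smt] {s}"],
        combine=pick_longest,
        post=[],
    ),
}

print(translate_document("  C1-C4 alkyl group ", DEMO_PIPELINES))
```

Keeping each component behind a plain function signature is what would let language- or domain-specific pieces be swapped per pipeline without touching the rest.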
If you don’t understand it, you can’t translate it
MT with Subject Matter Expertise
“Allopurinol-induced serious cutaneous adverse reactions (SCAR), including Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN), are associated with a genetic marker, the HLA-B*5801 allele.”
“IPTranslator is perfect for someone who needs to search [patents] across multiple languages and is useful in the case of both patentability and infringement searches.”
– Aalt van de Kuilen, Global Head of Patent Information, Abbott
Machine Translation for Patents
What is the value for users?
Specialist solutions deliver more useable outcomes for the user
Post-editing = Increased productivity
For information purposes = Extract more meaning
Multilingual search = Retrieve more relevant results
De-risking the machine translation proposition
What is the value for users?
Typical Prerequisites: + Data + Time + €€€ = ???
Our Prerequisites: + No data needed + Systems are ready to go + No upfront cost = Evaluate immediately
Customisation. Refinement.
» Incorporation of user feedback
» Incremental training with post-edits
» Tuning for specific input types
Iconic in practice
client case study
Business Need
Machine Translation technology for the legal industry
Iconic had a domain-specific MT solution for that industry
Process (1)
Translation samples required for initial evaluation
Delivered immediately and initial results were positive
Process (2)
Integrate Iconic with GlobalSight for productivity pilot
“The complexities and unforeseen but inevitable surprises of MT integration in large scale production processes were handled both competently and efficiently.”
Performance
>20% productivity increase for translator post-editing Iconic output
“Iconic delivered measurable productivity gains from the outset”
Looking forward
• Ongoing improvement through feedback from translators
• Ongoing improvement through the incorporation of post-edits
• More than 4 million words translated to date for Asian languages
• Periodic roll-out of new languages over time
Iconic in practice
Need: short-term solution to provide on-demand translation through a web search interface
Process: integrate directly through Iconic API and evaluate quality and throughput concurrently
Outcomes: in 5 months of production for English-Portuguese alone, we processed:
• 15,526 translation requests
• 14,606,374 words
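To put these totals in perspective, the short calculation below derives average request size and monthly volume from the two figures on the slide. It assumes the 5 months are treated as equal periods; the variable names are illustrative.

```python
# Figures quoted on the slide (English-Portuguese, 5 months of production)
requests = 15_526
words = 14_606_374
months = 5

words_per_request = words / requests    # average size of one request
words_per_month = words / months        # average monthly volume
requests_per_month = requests / months  # average monthly request count

print(f"{words_per_request:.0f} words per request")
print(f"{words_per_month:,.0f} words per month")
print(f"{requests_per_month:,.0f} requests per month")
```

That works out to roughly 940 words per request and about 2.9 million words per month, which suggests document-sized requests rather than short search queries.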
Take home messages…
All content is not created equal
We cannot afford to be dogmatic when it comes to MT
Domain specific MT is about more than just data
Know your subject matter!
Thank You!
@IconicTrans