TAUS MACHINE TRANSLATION SHOWCASE Strategies for Building Competitive Advantage and Revenue from Machine Translation 14:40 – 15:00 Wednesday, 10 April 2013 Dion Wiggins Asia Online

Post on 03-Jul-2015

DESCRIPTION

This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter - #MosesCore

TRANSCRIPT

Page 1: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013

TAUS MACHINE TRANSLATION SHOWCASE

Strategies for Building Competitive Advantage and Revenue from Machine Translation 14:40 – 15:00 Wednesday, 10 April 2013 Dion Wiggins Asia Online

Page 2:

Copyright © 2013, Asia Online Pte Ltd

Business Strategies for Building Strategic Advantage and Revenue from Machine Translation

Dion Wiggins, Chief Executive Officer
[email protected]

Page 3:

Page 4:

• Human Resources
  – Linguistic
    • Language / Translation
    • Natural Language Programming (NLP)
  – Technical
    • Operating System
    • Software installation and support
  – Programming
    • Tailoring to needs of the business
    • Integration with other tools and platforms
• Infrastructure
  – Hardware
    • Hosted, purchased
  – Software
    • Licensed, Hosted, Open Source
• Data Requirements
  – Third party
    • Free, Commercial
  – Internal data
  – Data manufacturing
  – Clean vs. Dirty Data SMT
  – Rules vs. SMT vs. Hybrid
• Skill Development
  – Hosted - basic skills
  – Onsite Moses - comprehensive
• TMS / Workflow Integration
  – Pre-built, custom development
• Document Format Support
  – Wide, limited

Page 5:

• Translation Costs
  – Monthly fee, per word, human resources
• Customization Costs
  – Up front, embedded in translation costs, human resources
• Management Costs
  – Oversight, improvement
• Control
  – Extensive, limited
• Data Security
  – Contract, internal
• Project Type
  – Language Pair
  – Domain
• Risk
  – Managed by expert
  – Managed by your team
  – Likelihood of failure
• Time to Quality
  – Trained by professionals, learned skills
• Cost of Post Editing
  – Higher quality MT should result in lower cost of editing

Page 6:

Machine Translation (MT): 50 Years of eMpTy Promises

Q: Why does an industry that has spent 50 years failing to deliver on its promises still exist?

A: An infinite demand – a well defined and growing problem that has always been looking for a solution – what was missing was …

Page 7:

Quality

Control

Focus

Page 8:

1. Customize: Create a new custom engine using foundation data and your own language assets.
2. Measure: Measure the quality of the engine for rating and future improvement comparisons.
3. Improve: Provide corrective feedback, removing potential for translation errors.
4. Manage: Manage translation projects while generating corrective data for quality improvement.

Page 9:

Quality requires an understanding of the data.

There is no exception to this rule.

Page 10:

1. Click Training Data tab.
2. Click on Upload and select TMX files.
3. Click Training Data tab.
4. Click Build.

Some even brag that it is this simple.

“Seriously, that’s it!”

Perhaps it should have been

“Seriously, that’s it????”

Page 11:

Flaws in the One Button Instant MT Approach

• Simply upload your data and magic happens to create a custom MT engine in hours/minutes.
• Seriously, that’s it!

• MT cannot read your mind.
• It cannot determine which writing style, target audience, formats, vocabulary or capitalization you want.
• It cannot determine what is missing and whether your data is suitable for your goal.
• You don’t know which is the right data.

Page 12:

Just Add Water

Upload Data

If it was really this easy, don’t you think custom MT success stories would be everywhere?

Page 13:

• Moses case study that describes the effort in detail: http://slidesha.re/KwkdUH
• Summary:
  – Needs expert programmer, expert project manager
  – Requires very powerful hardware
  – Large amounts of software development
  – TAUS Data Association membership EUR 15,000 for data
  – 360 man hours to set up first pilot
  – Multi-year effort with considerable funding required
  – Translation quality close to that of Bing

“The ready availability of the Moses MT engine under an open source license enables everybody to create statistical MT engines from parallel data with a moderate amount of effort.”

“With self-serve MT, clients without the necessary MT and computing expertise to install Moses themselves, have for the first time the ability to build an MT system based on their own user requirements pretty much instantly.”

Page 14:

• Do-it-yourself Moses and self-service Moses primarily target and solve the engineering complexity of deploying a basic Moses system.
• There are many other technical and data requirements necessary.
• Many additional technology components are needed. Some have not yet been developed, such as TMS integration, XML tag handling etc.

For a good blog entry and discussion on this topic see http://bit.ly/rWAxG7

Page 15:

1. What is the right data to upload for my MT system?
2. How should I prepare my data?
3. What cleaning can I do that the magic 1 click button does not do?
4. What impact will my data have on the MT system?
5. Will the data I upload improve or decrease quality?
6. What will mixing data from multiple domains do to my MT system?
7. Should I add some or all of the TAUS data to my system?
8. Once I have a system, how can I make it better?
9. When I see an error in my MT output, how can I know the cause of the error?
10. When I see an error in my MT output, how can I fix the error?
11. …

Page 16:

• Definition
  – Domain
  – Target Audience
  – Preferred Writing Style
  – Glossaries, Non-Translatable Terms, Preferred Capitalization
  – Special Formatting Requirements
  – Quality Requirements
• Data Gathering
  – Source data in domain
  – Bilingual data to support domain
  – Monolingual data to support domain
• Data Analysis
  – Gap analysis
  – High frequency terms
  – Term extraction
• Data Generation
  – Supporting grammar structures
  – Source Data Analysis
• Cleaning of Data
• Tuning and Test Set Preparation
• Diagnostic Engine
  – Fine tuning

Provided by client and gathered from third parties.

Page 17:

• Near human quality automated translation designed for the professional translation industry
  – Many customers have achieved quality levels where more than 50% of raw machine translation requires no editing at all
  – Case studies of customers that have achieved 3 x margin with 1/3 the human resources
  – Regularly replacing competitors’ pre-existing installations
• Machine + Human approach delivers higher quality than a human only approach
  – More consistent writing style and more accurate terminology
• Rapid ongoing translation quality improvement
  – Post edited machine translation is fed back to the engine, which learns from its previous errors by analyzing the corrections
  – Live feedback as new content is published
• Enable clients to control preferred terminology, vocabulary and writing style

“We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”
– Kevin Nelson, Managing Director, Omnilingua Worldwide

Complete Stylistic Control

Two different output styles for the same input sentence

Spanish Original Before Translation:
Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos.

Business News After Translation:
Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents.

Children’s Books After Translation:
A lot of care was taken to not upset others when organizing the meeting between the two long time enemies.

Page 18:

Diagram: engine hierarchy with the columns LP, Top-Level Domain and Engines/Sub-Domains. Entries shown: EN-ES; Automotive; Honda, Toyota; Cars, Motorbikes, Marketing, Service Reports; User Manuals, Engineering, Service Manuals.

Page 19:

Page 20:

Dirty Data SMT Model
• Data
  – Gathered from as many sources as possible.
  – Domain of knowledge does not matter.
  – Data quality is not important.
  – Data quantity is important.
• Theory
  – Good data will be more statistically relevant.

Clean Data SMT Model
• Data
  – Gathered from a small number of trusted quality sources.
  – Domain of knowledge must match target.
  – Data quality is very important.
  – Data quantity is less important.
• Theory
  – Bad or undesirable patterns cannot be learned if they don’t exist in the data.

Page 21:

Page 22:

• There is no magic in MT; human effort is required.
• The quality of the output and its suitability for purpose are directly in proportion to the amount of human effort.
• Without human direction, MT will cost more in the long term and is more likely to fail.

Page 23:

• Bad translations
• Out of domain text
• Unbalanced / Biased
  – Too much text from other domains
• Mixed / Wrong language
• Junk and noise
• Broken HTML
• Mixed Encoding
• Missing diacritics
  – café vs. cafe
• OCR Text
• Machine translated text
• Anything that is not high quality and in domain

Put Simply: If a bad pattern does not exist in your training data, you cannot generate such a bad pattern as translation output.
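Some of the problems listed on this slide can be screened for mechanically before training. The following is a minimal Python sketch under stated assumptions, not Asia Online's cleaning process: the function names, the checks chosen and the 3.0 length-ratio threshold are all hypothetical and would need tuning per language pair.

```python
import re
import unicodedata

HTML_TAG = re.compile(r"<[^>]+>")


def strip_diacritics(text):
    """Return text with combining marks removed (café -> cafe)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rejection_reason(source, target, max_ratio=3.0):
    """Return a reason string if the segment pair looks dirty, else None."""
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return "empty segment"
    if "\ufffd" in src or "\ufffd" in tgt:
        # U+FFFD appears when bytes were decoded with the wrong encoding.
        return "mixed or broken encoding"
    if HTML_TAG.search(src) or HTML_TAG.search(tgt):
        return "markup left in text"
    if src.lower() == tgt.lower():
        return "untranslated (source equals target)"
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
        return "length mismatch"
    return None


def clean(pairs):
    """Split (source, target) pairs into kept and rejected lists."""
    kept, rejected = [], []
    for source, target in pairs:
        reason = rejection_reason(source, target)
        if reason is None:
            kept.append((source, target))
        else:
            rejected.append((source, target, reason))
    return kept, rejected


def diacritic_conflicts(segments):
    """Group words that differ only by diacritics, e.g. café / cafe."""
    variants = {}
    for segment in segments:
        for word in segment.lower().split():
            variants.setdefault(strip_diacritics(word), set()).add(word)
    return {base: forms for base, forms in variants.items() if len(forms) > 1}
```

Checks like these only catch surface noise. Bad translations, out-of-domain text and domain imbalance still need the human review that the slides argue for.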

Page 24:

English Source | Human Translation | Google Translation | Google Context
I went to the bank | Fui al banco | Fui al banco | Bank as in finance
I went to the bank to deposit money | Fui al banco para depositar dinero | Fui al banco a depositar el dinero | Bank as in finance
I went to the bank of the turn in my car | Fui en coche a la inclinación de la vuelta | Fui a la orilla de la vuelta en mi coche | Bank as in river bank
I put my car into the bank of the turn | Puse mi coche en la inclinación de la vuelta. | Pongo mi coche en el banco de la vuelta | Bank as in finance
I swam to the bank of the river | Nadé en la orilla del río | Nadé hasta la orilla del río | Bank as in river bank
I banked my money | Deposité mi dinero | Yo depositado mi dinero | Banked as in finance
I banked my car into the turn | Incliné mi coche en la vuelta | Yo depositado mi coche en la vuelta | Banked as in finance
I banked my plane into a steep dive | Incliné mi avión en para una zambullida. | Yo depositado en mi avión en picada | Banked as in finance

Issue: The above examples show that Google is biased towards the banking and finance domain.

Cause: There is much more multilingual banking and finance data available to learn from than there is aeronautical or water sports data available.

Page 25:

• Competitors require 20% or more additional data than the initial training data to show notable improvements.
  – This could take years for most LSPs
  – This is the dirty little secret of the Dirty Data SMT approach that is frequently acknowledged.
• Asia Online has reference customers that have had notable improvements with just 1 day’s work of post editing.
  – Only possible with Clean Data SMT

< 0.1% improvements daily based on edits

Typical Dirty Data SMT engines will have between 2 million and 20 million sentences in the initial training data.

Page 26:

Language Studio Allows:
• Automated identification of areas of weakness
• Post Editing Feedback focusing directly on areas of weakness
• Automated error pattern analysis and correction
• Analysis and Resolution of Unknown Words
• Determination and resolution of high frequency phrases
• Terminology Extraction
• Balancing Bilingual Phrases against Monolingual Data
• Runtime glossary
• Runtime spelling dictionary
• Pattern handling and adjustments
• Incremental Improvement Training
• Automated Quality Measurement
• Human Quality Measurement
• Quality Confidence Scores for each segment

Competitors’ Statistical MT
• Get more dirty data
• Human translate more data

Competitors’ Rule-Based MT
• Add dictionary entries (limit 20K words)
• Train a language model to fix broken rules output (limit 40K phrases)

Page 27:

• Typically about 10-20 examples for each clean word or phrase.
• Each correction has statistical relevance and impact can be clearly seen.
• Corrections usually involve adding data to fill gaps.
• Far less correction of actual errors.
• Clean data means cause of errors can be understood and corrected.
• Concordance used to create unbiased examples/phrases and ensure scope covered.

• Large volumes of dirty data prohibit manual correction.
• Individual corrections are not statistically relevant.
• Manual corrections must compete against 1,000’s of bad examples. Impractical to create enough examples manually.
• Understanding the cause of errors is difficult.
• Slows training and overall processing time. Requires more resources to process excess data.
• Only solution is to acquire more dirty data and hope problem is fixed. But may get worse or cause new errors.

Page 28:

1960’s, 1980’s, 1990’s, 2012

Page 29:

Before Machine Translation
Source text is processed and modified.
• Pre-Translation JavaScript (JS): Complex pre-processing can be customized via JavaScript.
• Pre-Translation Corrections (PTC): A list of terms that adjust the source text, fixing common issues and making it more suitable for translation.
• Non-Translatable Terms (NTT): A list of monolingual terms that are used to ensure key terms are not translated.
• Runtime Glossary (GLO): A list of bilingual terms that are used to ensure terminology is translated a specific way.

After Machine Translation
Target text is processed and modified.
• Post Translation Adjustment (PTA): A list of terms in the target language that modify the translated output. This is very useful for normalization of target terms.
• Post Translation JavaScript (JS): Complex post-processing can be customized via JavaScript.

Runtime customizations can be applied in 2 forms:
• Default: Applied to all jobs.
• Job Specific: A different set of customizations can be applied for different clients.
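The order of the runtime steps on this slide can be sketched in Python as a wrapper around any translation function. This is an illustration only and not the Language Studio API: `RuntimeCustomization`, `translate`, `toy_engine` and the placeholder-token scheme are hypothetical names and design choices, and the JavaScript hooks are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RuntimeCustomization:
    """Hypothetical container for the term lists named on the slide."""
    pre_corrections: Dict[str, str] = field(default_factory=dict)   # PTC
    non_translatable: List[str] = field(default_factory=list)       # NTT
    glossary: Dict[str, str] = field(default_factory=dict)          # GLO
    post_adjustments: Dict[str, str] = field(default_factory=dict)  # PTA


def _term_pattern(term: str) -> str:
    # Match the term as a whole word or phrase only.
    return r"\b" + re.escape(term) + r"\b"


def _replace_terms(text: str, mapping: Dict[str, str]) -> str:
    for old, new in mapping.items():
        # A function replacement avoids backslash escapes being interpreted.
        text = re.sub(_term_pattern(old), lambda _m, new=new: new, text)
    return text


def translate(source: str, engine: Callable[[str], str],
              custom: RuntimeCustomization) -> str:
    # 1. Pre-translation corrections adjust the source text.
    text = _replace_terms(source, custom.pre_corrections)
    # 2. Non-translatable terms keep their source form; glossary terms get a
    #    fixed target form. Both are hidden from the engine behind tokens.
    forced = {term: term for term in custom.non_translatable}
    forced.update(custom.glossary)
    slots = {}
    for index, (term, output) in enumerate(forced.items()):
        token = f"__TERM{index}__"
        text, count = re.subn(_term_pattern(term), token, text)
        if count:
            slots[token] = output
    # 3. The engine translates the protected text.
    translated = engine(text)
    # 4. Tokens are swapped for the required output terms.
    for token, output in slots.items():
        translated = translated.replace(token, output)
    # 5. Post-translation adjustments normalize the target text.
    return _replace_terms(translated, custom.post_adjustments)


def toy_engine(text: str) -> str:
    """Stand-in for a real MT engine: word-by-word lookup, unknowns pass through."""
    lexicon = {"open": "abra", "the": "el", "file": "archivo", "in": "en"}
    return " ".join(lexicon.get(word, word) for word in text.split())
```

In this sketch the slide's default and job-specific forms would simply be different `RuntimeCustomization` instances, one shared across all jobs and one chosen per client.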

Page 30:

Words Per Day Per Translator

Chart: speed gauge with a scale from 0 to 28,000 words per day per translator, marking Human Translation, Typical MT + Post Editing and (*) the fastest MT + Post Editing speed reported by clients.

Average person reads 200-250 words per minute. 96,000-120,000 in 8 hours. ~35 times faster than human translation.
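The reading-speed figures on this slide can be checked with a few lines of Python. The last step is an inference rather than a number from the slide: dividing the midpoint of the reading range by the stated factor of about 35 gives the human translation rate that the comparison implies.

```python
MINUTES_PER_WORKING_DAY = 8 * 60  # the slide assumes an 8 hour day


def words_read_per_day(words_per_minute):
    """Words a person could read in one working day at a given reading speed."""
    return words_per_minute * MINUTES_PER_WORKING_DAY


low = words_read_per_day(200)
high = words_read_per_day(250)
midpoint = (low + high) / 2

# Inference, not a slide figure: the human translation rate implied by
# "~35 times faster", taken against the midpoint of the reading range.
implied_translation_rate = midpoint / 35

print(low, high, round(implied_translation_rate))
```

The implied rate is only as good as the rounded factor of 35 it is derived from.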

Page 31:

Productivity is the Best Quality Metric

Time, Margin

Metrics That Really Count
• Productivity – Words per day per human resource
• Margin – 2-3 times the profit margin is commonplace
• Consistency – Writing style and terminology
  ✓ MT + Human delivers higher quality than a human only approach
• Deals
  ✓ New deals not accessible with a human only approach
  ✓ Deals where you could offer a more competitive bid due to MT than your competitors
  ✓ Deals that would have been lost to a competitor without the advantages that MT offers

Raw MT often has a greater number of errors than first pass human translation.

However: Language Studio™ MT is stylised to a specific domain, customer and target audience, so quality is considerably higher than other MT systems.

This means that:
1. MT errors are easy to see and easy to fix (i.e. simple grammar).
2. MT provides more accurate and consistent terminology than human translators, especially when more than 1 human works on a project.
3. Human errors may be fewer, but harder to see and harder to fix.

Counting the number of errors only offers no value as a metric, as the complexity of the error is not taken into account.

MT with more errors is often faster to edit and fix than first pass human translations with fewer errors.

Examples of other “Useful” Quality Indicators

Automated Metrics (Good indicators, but not absolute)
• BLEU (Bilingual Evaluation Understudy)
• NIST
• F-Measure (F1 Score or F-Score)
• METEOR (Metric for Evaluation of Translation with Explicit ORdering)

Manual Quality Metrics (Most not designed for MT, more for HT)
• Edit Distance (Does not take into account complexity of edit)
• SAE-J2450 (Industry specific)
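To make the edit distance caveat concrete, here is a minimal Python sketch of a word-level Levenshtein distance between raw MT output and its post-edited version. The function name and the choice of whitespace tokenization are assumptions for illustration; production tools usually tokenize more carefully and often normalize by segment length.

```python
def edit_distance(raw_mt, post_edited):
    """Word-level Levenshtein distance: inserts, deletes and substitutions."""
    a, b = raw_mt.split(), post_edited.split()
    # previous[j] holds the distance between the words of a seen so far
    # and the first j words of b.
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # delete word_a
                current[j - 1] + 1,      # insert word_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]
```

Every operation counts as 1 here, which is exactly the limitation the slide notes: a trivial spelling fix and a substitution that required rethinking the sentence contribute the same amount to the score.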

Page 32:

Wait for a Project That Requires MT
• Opportunistic approach
• Many LSPs are interested in MT, but not willing to take the plunge without a paying client.
• Limited to one client
• Harder to sell – longer sales cycle
• Often build one language pair to try, before committing to others

Revenue
• Recurring revenues from words translated
• Preparing source data
• Post editing
• Runtime glossary preparation
• Non-translatable terms definition

Create a Product For Resale
• Proactive approach
• Leverages existing translation assets
• Can be sold to many clients
• Easier to sell - test and show
• Can sell multiple language pairs at the same time
• Generally a higher Return On Investment (ROI)

Revenue
• Recurring revenues from words translated
• One time revenues from resale of customization
• Post editing
• Preparing source data
• Terminology definition
• Non-Translatable terms
• Unknown and high-frequency phrase resolution

Page 33:

Reduce Project Costs
• Helps to manage margin squeeze:
  – compete with competitors using cheaper (perhaps lower quality) resources or competitors using MT
• Helps to cost justify business cases that may not be viable using a human only approach
• Can be used behind the scenes (like a translation memory) or disclosed to client
• More client work in other areas as a result of left over translation budget.

Faster Translation Delivery
• New projects that could not have been delivered on due to time and resource constraints
• Helps clients that want to simultaneously ship product in multiple languages
• New clients in research, analysis, data mining and discovery markets
• New clients that need real-time or near real-time translation

Revenue
• Depending on project or product model, revenues will vary. See previous slide.
• Additional revenues from client getting more ROI and willing to invest in new languages.

Revenue
• Preparing source data
• Runtime glossary preparation
• Non-translatable terms definition
• Post editing
• Recurring revenues from words translated

Page 34:

Expand Existing Relationships
• Opportunities to translate additional material for markets that may not have been cost viable with a human only approach
• Reuse custom MT for multiple purposes
• Enable clients to better compete in markets that were only partially addressed due to cost and time

Revenue
• Preparing source data
• Terminology definition
• Non-Translatable terms
• Unknown and high-frequency phrase resolution
• Post editing
• Recurring revenues from words translated
• One time revenues from resale of customization

Added Functionality
• Expand service offerings with new features such as multilingual customer support
• Integrate machine translation into existing client technologies, products and services

Revenue
• Same as for Expanding Existing Relations
• Additionally able to charge various service fees relating to the new services offered. For example, translating common Q&A for customer support and a commission on integrated multilingual support products.

Page 35:

Post Editing Effort Reduces Over Time
Chart: Quality against Engine Learning Iteration (1 to 6), showing Raw MT Quality, Post Editing Effort and the Publication Quality Target.
The post editing and cleanup effort gets easier as the MT engine improves. Initial efforts should focus on error analysis and correction of a representative sample data set. Each successive project should get easier and more efficient.

Post Editing Cost
Chart: Cost Per Word against Engine Learning Iteration (1 to 6), comparing Post Editing (Human Translation) with MT Post Editing.
MT learns from post editing feedback and quality of translation constantly improves. Cost of post editing progressively reduces as MT quality increases after each engine learning iteration.

Page 36:

How Omnilingua Measures Quality
– Triangulate to find the data
– Raw MT J2450 v. Historical Human Quality J2450
– Time Study Measurements
– OmniMT EffortScore™

Everything must be measured by effort first
– All other metrics support effort metrics
– Productivity is key

∆ Effort > MT System Cost + Value Chain Sharing
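The inequality can be read as a break-even test. The Python sketch below assumes that every term is expressed in the same currency and that the effort saving is measured as hours saved times an hourly rate; both assumptions, the function names and the example figures are hypothetical and do not come from Omnilingua.

```python
def effort_saving(baseline_hours, mt_hours, hourly_rate):
    """Value of the effort saved by post-editing MT instead of translating."""
    return (baseline_hours - mt_hours) * hourly_rate


def mt_is_worthwhile(saving, mt_system_cost, value_chain_sharing):
    """True when the saving exceeds MT cost plus what is shared along the chain."""
    return saving > mt_system_cost + value_chain_sharing


# Hypothetical project: 400 hours without MT, 250 hours with MT, rate of 40.
saving = effort_saving(400, 250, 40)
print(saving, mt_is_worthwhile(saving, 3000, 2000))
```

Measuring the two hour counts is the hard part, which is why the slide lists time studies among the inputs.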

Page 37:

• Built as a Human Assessment System:
  – Provides 7 defined and actionable error classifications.
  – 2 severity levels to identify severe and minor errors.
• Provides a Measurement Score Between 1 and 0:
  – A lower score indicates fewer errors.
  – Objective is to achieve a score as close to 0 (no errors/issues) as possible.
• Provides Scores at Multiple Levels:
  – Composite scores across an entire set of data.
  – Scores for logical units such as sentences and paragraphs.
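The slide does not name the metric, but the neighbouring slides refer to J2450, so the sketch below assumes an SAE J2450 style calculation: each error carries a weight set by its category and severity, and the score is the weighted total divided by the word count. The category names and weights here are illustrative assumptions and should be checked against the standard before use.

```python
# (serious weight, minor weight) per category: illustrative values only.
WEIGHTS = {
    "wrong_term": (5, 2),
    "syntactic_error": (4, 2),
    "omission": (4, 2),
    "word_structure_or_agreement": (4, 2),
    "misspelling": (3, 1),
    "punctuation": (2, 1),
    "miscellaneous": (3, 1),
}


def quality_score(errors, word_count):
    """Weighted errors per word; lower is better and 0 means no errors found.

    errors is a list of (category, severity) tuples, where severity is
    "serious" or "minor". The same function can score one sentence, one
    paragraph or a whole data set, depending on what is passed in.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = 0
    for category, severity in errors:
        if severity not in ("serious", "minor"):
            raise ValueError(f"unknown severity: {severity}")
        serious, minor = WEIGHTS[category]
        total += serious if severity == "serious" else minor
    return total / word_count
```

With these weights a very error-dense segment could score above 1, so the slide's range of 1 to 0 presumably reflects typical text rather than a hard bound.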

Page 38:

Asia Online v. Competing MT System | Factor
Total Raw J2450 Errors | 2x Fewer
Raw J2450 Score | 2x Better
Total PE J2450 Errors | 5.3x Fewer
PE J2450 Score | 4.8x Better
PE Rate | 32% Faster

“We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”

“There were far fewer errors produced by the Language Studio™ custom MT engine than the competitor's legacy MT engine. Notably there were fewer wrong meanings, structural errors and wrong terms in the Language Studio™ custom MT engine, that were "typical SMT problems" in the competitor's legacy MT engine.”

“The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than the competitor's legacy MT engine and also better than a human only translation approach. Terminology was more consistent with a combined Language Studio™ custom MT engine plus human post editing approach.”

– Kevin Nelson, Managing Director, Omnilingua Worldwide

Page 39: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


• LSP: Sajan

• End Client Profile:
– Large global multinational corporation in the IT domain.
– Has its own proprietary MT system, developed over many years.

• Project Goals
– Eliminate the need for full translation and limit it to MT + post-editing.

• Language Pairs:
– English -> Simplified Chinese.
– English -> European Spanish.
– English -> European French.

• Domain: IT

• 2nd Iteration of Customized Engine
– Customized initial engine, followed by an incremental improvement based on client feedback.

• Data
– Client provided ~3,000,000 phrase pairs.
– 26% were rejected in the cleaning process as unsuitable for SMT training.

• Measurements:
– Cost
– Timeframe
– Quality
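The slide reports that 26% of the client's phrase pairs were rejected during cleaning but does not say which rules were applied. The sketch below shows what a simple cleaning pass might look like. The three filters (empty side, untranslated copy, extreme length ratio), the threshold, and all names are hypothetical and are not Asia Online's actual process.

```python
def is_usable(source: str, target: str, max_ratio: float = 3.0) -> bool:
    """Hypothetical filter for a single phrase pair."""
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return False                      # one side is empty
    if src == tgt:
        return False                      # target is an untranslated copy
    longer = max(len(src), len(tgt))
    shorter = min(len(src), len(tgt))
    return longer / shorter <= max_ratio  # reject likely misalignments


def clean(pairs):
    """Return the kept pairs and the share of pairs that were rejected."""
    kept = [pair for pair in pairs if is_usable(*pair)]
    rejected_share = 1 - len(kept) / len(pairs) if pairs else 0.0
    return kept, rejected_share


# Hypothetical sample data.
pairs = [
    ("Open the file", "Abrir el archivo"),
    ("", "x"),
    ("OK", "OK"),
    ("Hi", "x" * 50),
]
kept, rejected_share = clean(pairs)
print(len(kept), rejected_share)
```

A character-length ratio is a crude signal, and a sensible threshold depends on the language pair, so it would need tuning for pairs such as English and Simplified Chinese.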

Page 40: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


• Quality
– Client performed their own metrics.
– Asia Online Language Studio™ was considerably better than the client's own MT solution.
– Significant quality improvement after providing feedback: 65 BLEU score.
– Chinese scored better than first pass human translation as per client's feedback and was faster and easier to edit.

• Result
– Client extremely impressed with result, especially when compared to the output of their own MT engine.
– Client has commissioned Sajan to work with more languages.

70% Time Saving

60% Cost Saving

LRC have uploaded Sajan's slides and video presentation from the recent LRC conference:
Slides: http://bit.ly/r6BPkT
Video: http://bit.ly/trsyhg

Page 41: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


Travel & Leisure Vertical

English to Spanish Language Pair

Custom MT engines built and programmatically consumed

A human post edit step was included in workflow and measurement

Scientific measures of productivity for all phases of process

Page 42: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


Base training materials provided and catalogued

Asia Online trained the engine and released it to a diagnostic stage

First pass of new content through the diagnostic engine yielded positive results

Asia Online applied advanced data generation technologies to the diagnostic engine through monolingual data crawling, application of runtime rules, and pre-translation adjustments

Even further progress was achieved by extracting an industry-specific high-frequency term list from the source and applying it
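The last step above, extracting a high-frequency term list from the source content, can be sketched as a simple frequency count. This is a minimal single-word illustration only: the tokenizer, the stopword list, and the thresholds are assumptions, and the slides do not describe how the actual term extraction was done.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); a real list would be larger.
STOPWORDS = frozenset({"the", "and", "for", "with", "from", "that", "this"})


def high_frequency_terms(text, min_count=2, top_n=10, stopwords=STOPWORDS):
    """Return up to top_n (term, count) pairs occurring at least min_count times."""
    # Lowercase words, keeping hyphenated compounds such as "check-in" together.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords and len(t) > 2)
    return [(term, n) for term, n in counts.most_common(top_n) if n >= min_count]


# Hypothetical travel-domain source text.
sample = "Hotel booking and hotel check-in. The hotel offers late check-in."
print(high_frequency_terms(sample))
```

The resulting list would then be reviewed and translated so that the engine renders each term consistently; multi-word terms would need an n-gram extension of the same counting idea.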

Page 43: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


58%  of  segments  required  no  edits  

Page 44: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


     

Post Edit Productivity Analysis

Productivity Percentage    328% Increase
Productivity Rate          8,208 words a day
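The slide gives the post-edit rate and the percentage increase but not the baseline rate they were measured against. The sketch below shows how a baseline could be back-calculated under two possible readings of "328% increase". Which reading applies is an assumption, and the resulting figures are inferences, not numbers reported in the presentation.

```python
def implied_baseline(rate_with_mt: float,
                     increase_pct: float,
                     as_multiple: bool = False) -> float:
    """Back-calculate a baseline daily rate from the reported figures.

    as_multiple=False: new rate = baseline * (1 + increase_pct / 100)
    as_multiple=True:  new rate = baseline * (increase_pct / 100)
    """
    factor = increase_pct / 100 if as_multiple else 1 + increase_pct / 100
    return rate_with_mt / factor


# Reported on the slide: 8,208 words a day and a 328% increase.
print(round(implied_baseline(8208, 328)))                    # about 1,918 words a day
print(round(implied_baseline(8208, 328, as_multiple=True)))  # about 2,502 words a day
```

Either reading puts the implied baseline in the region of two thousand words a day.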

Page 45: TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013


Dion Wiggins, Chief Executive Officer
[email protected]

Business Strategies for Building Strategic Advantage and Revenue from Machine Translation