Machine Translation: 50 Shades of Grey
By Charlotte Brasler and Jost Zetzsche


It took us a long time, but translators have finally largely accepted that technology plays an important role in our professional lives. Most of us use advanced translation technology such as translation environment tools (TEnTs) that include terminology management, automated quality assurance, and translation memory. And those who do not have typically made informed decisions on why they do not.

When it comes to machine translation (MT), however, the situation is very different. It is probably fair to say that most of us wish we had never been forced to have an opinion on this subject, but forced we are. No matter whom we talk to in the general public about translation, MT inevitably comes up. Answering with platitudes or scornful dismissal does not help anyone. Instead, we have to know and understand what we are talking about.

In addition to the important issue of how we represent MT to the outside world, though, there is another more fundamental question we need to address: What role could or should MT play in our professional lives? This is the slippery and many-hued topic that translators Charlotte Brasler and Jost Zetzsche try to tackle in this conversation.

Charlotte: Attending the Association for Machine Translation in the Americas (AMTA) conference last October in San Diego was a real eye opener for me. I came away with so much to think about regarding the future of being a translator/post-editor and about the challenges our industry faces. I also came away with an instant, very humbling experience of how little I knew about MT, and I would venture to say that many translators out there are no different from me. There is an acute need for translators to educate themselves much more thoroughly about what MT is all about and where and how it can be applied.

Jost: I agree, most translators do not know much about MT (aside from the same-old-same-old jokes about the silly mistakes). By the same token, many within the MT community do not know much about translators either. That is why I thought it was great that both ATA’s Annual Conference and AMTA’s conference were co-located in San Diego last year.

So, what kind of things do you think translators need to learn about MT?

Charlotte: Translators need to have a basic knowledge of MT. For example, what distinguishes rule-based, statistical, and hybrid MT from each other? What is good MT and bad MT? For which languages and domains does MT work the best? Translators also need to understand buzzwords like BLEU score,¹ SAE J2450,² and Edit Distance.³ When translators can speak intelligently on these subjects, we can engage in a much more professional dialogue with “the other side,” the MT community. I was positively surprised to hear many AMTA attendees express their appreciation at seeing professional translators attending their conference, so if we reach out, I am certain we will receive a warm welcome. Now, what do you think is the greatest impediment to translator involvement in MT?

Jost: Your list sounds like a lot for translators to learn about MT! Honestly, I am not sure that every translator needs to know each of those details. I think it does not hurt to know that one kind of MT engine relies on data (statistical), one on rules (rule-based), and yet another on a mix of both (hybrid), and that every MT engine will produce different results. Most importantly, though, every professional translator needs to understand how MT relates to us and our profession. This is not only important for our own understanding and confidence, but also so that we can communicate it to the outside world, kind of along the lines of my “Talking Points” GeekSpeak column in the October 2012 issue of The ATA Chronicle.⁴

Once we have managed to form an opinion about MT based on its actual merits and shortcomings, rather than on some pre-formed ideas (which are the greatest impediment to translator involvement right now), then we can determine whether becoming more involved in MT has any benefit to us individually. We might even want to take part in some higher level activities such as working for MT developers or engaging in strategic consulting with companies about its use.

Charlotte: In my opinion, the greatest impediment to translator involvement is a mixture of ignorance (“Machine translation is just Google Translate”) and fear (“The machine will eventually replace me, so I will stick my head in the sand now”). If we can change our attitude to “This is interesting, so how can I make this work for me?” (in short, embrace change), then we are in a much better mindset to take advantage of MT.

Jost: At this point, I suspect that many translators still see the question as “Can I make this work for me?” rather than “How can I make this work for me?”

Charlotte: Maybe, but I would like to see all types of translators get involved by educating themselves better about MT. Some translators may indeed look into MT and post-editing and say, “That’s not for me.” That is fine. At least they have made the effort of looking into the possibilities offered by MT and can now speak from experience rather than hearsay. Others will look at MT and see it not only as an opportunity to take their interest in linguistics in a new direction, but also as an opportunity to generate more income without working harder or longer in a market that is continually being squeezed on price.

The important thing to remember is that we are such a diverse bunch. One end of the spectrum could be experienced translators who are able to charge “premium rates” from established direct clients. They may doubt the relevance and usefulness of MT if they already earn a comfortable living and the language combination in which they work is not suitable for MT. At the other end is the new graduate who may be contemplating a traditional freelance career versus a post-editor/MT consulting career. I think that the level of involvement will depend on who you are and where you want to go with your career in languages. Most of us probably fit somewhere in between. We work for agencies, we are feeling squeezed on price, and we will embrace technology to the extent that it makes us able to translate faster and better so we can maintain or improve our income without working more and more.

Jost: Yes, you are probably right. When I think of MT and the professional translation community, I can think of four levels of engagement: first, development and consulting; second, fine-tuning and training customized MT programs for ourselves or our clients; third, as a post-editor of MT texts (typically these involve customized MT programs); and fourth, using MT as a translation aid alongside other aids such as translation memories.

Charlotte: Yes, those four levels of engagement could be seen as a pyramid representing the degree of translator involvement today. At the top, Level 1, is involvement on the developer side, working with MT development companies such as Systran, Google, and Microsoft, and working with translation buyers to help them find ways to integrate MT. The next level down, Level 2, is testing and fine-tuning MT engines based on the client’s or your own data. This could be done for your own needs or in an in-house or consultant role. This level represents an area with real opportunity for translators who want to combine an interest in technology and linguistics. Level 3 involves working as a post-editor for agencies already using a customized MT engine and whose clients belong to Level 2. This is a fast-growing area with an immediate and pressing need in many languages, and this will only rise.

The widest and lowest level, Level 4, which involves using MT as a translation aid to speed your translation work, is the level in our pyramid to which most translators belong. They use a TEnT with a connection to an online statistical engine like Google or Microsoft. They do not use a customized engine with their own translation memories. Since most translators belong to Level 4, their point of reference when asked about MT is naturally to think of Google Translate. The eye-opening experience for me recently was that MT is so much more than large online statistical engines like Google, and the argument for and against it is infinitely more complex and interesting.

Jost: So you are talking about the customization of MT engines for a specific purpose?

Charlotte: Yes, an MT engine can be something you train for whatever you need it to translate, such as your specific language combination and subject area. For example, if you are a freelancer and have a sufficient amount of data in your translation memories, you can train your own engine. This requires a certain amount of technical skill, but it is not rocket science. There are already several open-source MT engines online, and though they may not yet be “mature,” they are out there and will only continue to evolve. Having your own engine with your own data also gets around the sticky issues of confidentiality, because you feed only your own closed system and do not send data into cyberspace. This issue of confidentiality is relevant both to freelancers entrusted with their clients’ data and to big corporate clients. Training your own engine means that you are also in control of the data that serves as the engine’s “diet.” You can make sure it receives only quality data, which increases the quality of the output. I see MT as just an extension of today’s TEnTs, in that it provides suggestions when the fuzzy match drops below 75%, similar to the sub-segment leveraging that virtually all TEnTs already provide. MT just makes that function much more powerful. And remember: the MT engine in a “closed,” customized system is still pulling from data in translation memories that were created by a human in the first place.
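The fallback Charlotte describes (use the translation memory when the fuzzy match is good enough, otherwise ask the MT engine) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fuzzy_score` uses the standard library's `difflib` as a stand-in for a TEnT's fuzzy-match percentage, which real tools compute in their own ways, and `machine_translate` is a hypothetical callable wrapping whatever engine is in use. Only the 75% threshold comes from the conversation.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # the 75% cut-off mentioned in the conversation


def fuzzy_score(a, b):
    """Similarity in [0, 1]; an assumed stand-in for a TEnT's fuzzy-match score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(source, memory, machine_translate):
    """Return (origin, suggestion) for one source segment.

    memory: dict mapping earlier source segments to their human translations.
    machine_translate: hypothetical callable wrapping an MT engine.
    """
    best_source, best_score = None, 0.0
    for earlier in memory:
        score = fuzzy_score(source, earlier)
        if score > best_score:
            best_source, best_score = earlier, score
    # A good enough fuzzy match wins; otherwise fall back to MT.
    if best_source is not None and best_score >= FUZZY_THRESHOLD:
        return "TM", memory[best_source]
    return "MT", machine_translate(source)
```

With a one-entry memory, a segment that differs from the stored one by a single word would come back labeled "TM", while an unrelated segment would be passed to the MT callable.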

Jost: Those are good points, but it is important to keep in mind that well-trained MT engines typically work well only for the very specific subject areas/clients for which they were trained by the translator. Any deviation from that messes with the accuracy. So this might work wonderfully for some translators with one or two really big clients, but others might find it hard to justify the effort of training an MT engine for each of the many clients and/or types of projects on which they work.

Charlotte: The translation industry is changing quickly, and no matter how fast and efficient we become with our TEnTs, we cannot stop the deluge of content that is being generated and the scarcity of translators to translate it all. The result is that many companies are looking into MT to save money and time and still get their material translated. Humans are slow and expensive but creative; machines are fast and cheap but stupid. In between stands the quality factor. That is where we as translators come in: we can affect this quality that the “stupid” machines cannot figure out themselves. If we can use the machines to help us increase our speed (which we are already doing with TEnTs) and post-edit the MT output to a human translation quality level, we will all reap the financial benefits. The clients will get their content translated faster and cheaper, and we will make more money by producing more high-quality words faster. That is a win-win situation.

Jost: I think we need to be careful about generalizing when saying that companies will increase their use of MT and have content that needs to be post-edited. There will doubtless be more and more companies going that route, but I would say it is safe to assume that for a number of years to come the amount of text that professional translators will have to deal with that needs to be “translated from scratch,” without having been translated by a machine first, will outweigh the stuff that needs to be post-edited.

You are right, though. There is already an unmet need for MT post-editors, and that need will continue to increase. This means a lot of good opportunities for some of us, but I also think that many of us will not encounter post-editing MT for a long time to come. Of course, this very much depends on variables such as language combination, area of specialization, target group, and the type of customer. Would you agree?

Charlotte: Yes, generic MT programs certainly have shortcomings when it comes to certain languages. While the Scandinavian languages (except Finnish) in combination with English are a sort of poster child for output quality, other languages such as German, which is syntactically very different from English, are not yielding the kinds of quality MT output that make MT attractive to use. So it really depends a great deal on the language combination.

Aside from improving the productivity of translators, other usage scenarios of MT include raw MT or lightly post-edited MT that is considered sufficient by the users of the MT output, that is, the readers of the translation. A case in point would be knowledge base articles for software, where a mediocre translation is still better than nothing, and many users actually answer “yes” when asked if the article solved their problems.⁵ Considering the short life span of a knowledge base article, MT offers a great solution that is “fast, cheap, and useful.” Again, all MT use cases are different, and it is hard, if not impossible, to generalize.

Regarding the quality produced by MT, I think it is important to ask the following: If MT can be honed to human quality level through customization of the engine and post-editing of the output, does it matter if a translation was done from scratch by a human, or by a trained machine and post-edited? The answer to this question would certainly depend on the output’s “fitness for its purpose,” as mentioned above. In my mind, the answer is a resounding “NO.” If MT makes knowledge available to people who would otherwise not have access to this knowledge because it is locked in another language, it does not matter if the translation was produced by a trained machine.

Jost: Yes, MT can be used in very positive ways to open up content for language groups that do not have access to much content otherwise. Examples include Google’s and Microsoft’s release of Haitian Creole MT programs after the Haiti earthquake, or the recent release of Hmong and Mayan MT capabilities by Microsoft.

I think that it is really important, however, to understand that there is a fundamental difference between the work of post-editing MT translations and post-editing fuzzy matches, and often this is not represented fairly in the MT community. Provided that your translation memory is in good shape, editing a fuzzy match means altering an inherently correct segment (correct as a translation for the earlier source segment) to match your current source segment. Typically this involves changing a couple of terms, which can be done easily. This is not necessarily so with MT, though, which is not inherently correct. It can be, but it does not have to be. If you work in my language combination (English>German), you will quickly find that more often than not there are fundamental changes you will need to make to bring the translation to the required quality level.

Charlotte: How widely do you think MT is being used by translators today? This would include desktop tools like Systran and PROMT, or Google Translate and other statistical engines available through application programming interfaces in many TEnTs today.

Jost: Many people would like to know that number! I am as clueless as most, but I think it is safe to say that the number of professional translators who are using MT has risen in the past couple of years. I suspect that you would find large differences of usage among different language groups, type of projects, experience levels, and even in the way it is used. Significantly, MT seems to have transitioned into one of many productivity tools that are useful to some but not all, and I think the stigma it used to have among translators is gradually going away.

Charlotte: I would sure like to know that number as well. My gut feeling is that few translators have used a customized engine, but many translators are currently experimenting with tools like Google Translate. My other feeling is that few translators have “come out of the closet” so to speak about using online MT. This could be due to the perceived stigma attached or to the confidentiality they are potentially breaking. However, I think that on both fronts we can expect to see exciting developments in the coming years as attitudes toward MT and confidentiality change.

Notes

1. BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. The output from MT is considered to be of good quality if it closely matches a professional human translation.

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. “BLEU: A Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (July 2002), 311–318, http://acl.ldc.upenn.edu/P/P02/P02-1040.pdf.

2. SAE J2450 Translation Quality Metric Task Force, Quality Metric for Language Translation of Service Information, www.sae.org/standardsdev/j2450p1.htm.

3. Edit Distance (also called translation error rate) is an algorithm for calculating the minimum number of edits that have to be made to a string (the MT output) to make it match another string (the reference translation). This was discussed in AMTA President Mike Dillinger’s tutorial on machine translation during the 2012 AMTA conference in San Diego.

4. See www.internationalwriters.com/toolkit/12_October_Talking.pdf.

5. Knowledge bases are data repositories that have been designed to enable people to retrieve and use the information they contain. They are commonly used to complement a help desk or for sharing information among employees within an organization. They might store troubleshooting information, articles, white papers, user manuals, or answers to frequently asked questions. Typically, a search engine is used to locate information in the system, or users may browse through a classification scheme.
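For readers who want to see what the metrics in notes 1 and 3 measure, here is a minimal Python sketch. It rests on several assumptions: `simple_bleu` is a simplified, single-sentence, single-reference variant that only goes up to bigrams by default, whereas BLEU as published is computed over a whole test set with n-grams up to length four; `word_edit_distance` is plain word-level Levenshtein distance, without the extras found in full translation error rate tools. Both function names are illustrative and not taken from any library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision, geometric mean, brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Each candidate n-gram is credited at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_sum += math.log(overlap / total) / max_n
    # Penalize MT output that is shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_sum)


def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions, and substitutions."""
    hyp, ref = hypothesis.split(), reference.split()
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            current.append(min(previous[j] + 1,          # delete a hypothesis word
                               current[j - 1] + 1,       # insert a reference word
                               previous[j - 1] + cost))  # keep or substitute
        previous = current
    return previous[-1]
```

An MT output identical to the reference scores 1.0 on the first function and 0 on the second. The further the output drifts from the reference, the lower the BLEU-style score and the higher the edit count, which is a rough proxy for post-editing effort.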

The ATA Chronicle, February 2013, Volume XLII, Number 2. A publication of the American Translators Association.
