TAUS Quality Summit Dublin – Welocalize Presentation by Olga Beregovaya and Lena Marg
DESCRIPTION
Getting the Right UGC Translation Quality: Human or Machine, Professional or Crowd. The Language Tools Team at Welocalize shares insights and expertise regarding the quality requirements for user-generated content, covering human translation, machine translation, post-editing and crowdsourcing. Localization World TAUS Quality Summit Pre-Conference in Dublin, Ireland, June 2014.
TRANSCRIPT
Getting the Right UGC Translation Quality:
Human or Machine, Professional or Crowd
Making Translation Strategy Decisions
Adding UGC to the Mix
a major influence on people’s buying decisions
the second most trusted form of advertising (after word-of-mouth), with 70% of global consumers indicating they trust this platform
UGC in the form of support forums reduces the cost of supporting customers
UGC is always blended in with the “branded” web UI content
Traditional translation processes, quality standards and pricing models are not viable in the UGC world – something’s got to give!
*Nielsen’s 2011 Global Trust in Advertising report, which surveyed more than 28,000 people in 56 countries
The UGC Quality Challenge
UGC & Translation Quality Requirements
There are different kinds of UGC, and traditional quality requirements are less relevant than the impact and purpose requirements.
If your UGC is a part of your brand identity, needs to produce emotional impact and borders the "transcreated copy", you'll be best off engaging professional translators and providing brand book-level guidelines.
If your UGC is there to convey information but translations need to preserve the desired tone (e.g. a casual, upbeat review), the crowd approach will be best.
If your UGC is there to provide technical instructions, you will need subject matter experts who will be able to accurately preserve the meaning of the source input in the translation.
Characteristics of UGC
authored by non-professionals and / or non-native speakers
often shows patterns similar to oral speech
sometimes authored by power users / “techies”
often highly perishable
multitude of authors = diversity of styles
Examples:
• Short forms: nite (night), sayin (saying), gr8 (great)
• Acronyms: lol (laugh out loud), iirc (if I remember correctly)
• Typing errors / misspellings: wouls (would), rediculous (ridiculous)
• Punctuation omissions / errors: im (I’m), dont (don’t)
• Non-dictionary slang: that was well mint (that was very good)
• Wordplay: that was soooooo great (that was so great)
• Censor avoidance: sh1t, f***
• Emoticons: :) (smiley), <3 (heart)
• Foreign words used intentionally: al dente, bon voyage
(Roturier, 2011), (Jiang et al, 2012), (Clark & Araki, 2011)
Travel Portal – Company + Customer Content
Yellow = UGC, Green = Web UI
Technical User Forums
Online Marketplace
Input - Examples
→ emoticons, typing errors, missing punctuation, grammar errors, ‘techie’ speak, slang, …
UGC & MT
UGC has become part of the brand strategy, so raw MT will not be enough in all cases
Utility scoring should be used to measure the quality of raw MT for UGC; it rates the comprehensibility & utility of the output rather than the linguistic quality
MT evaluation results for UGC indicate that around 50% or less of comments / reviews are considered comprehensible
Researchers are focusing efforts on normalization and preprocessing steps of UGC in order to improve MT output and reduce the PE effort
…. How much can and should post-editing fix?
UGC & MT - Normalization
Normalization is the manual or automated process of taking non-standard input and transforming it with scripts, regular expressions and other processes, in order to make the source text more ‘normal’ before machine translation.
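As a minimal sketch of what such a normalization pass might look like (the replacement table and rules below are illustrative only, not Welocalize’s actual pipeline):

```python
import re

# Illustrative replacement table for common UGC short forms;
# a production system would use much larger, locale-specific resources.
SHORT_FORMS = {
    "nite": "night",
    "sayin": "saying",
    "gr8": "great",
    "dont": "don't",
    "iirc": "if I remember correctly",
}

def normalize(text: str) -> str:
    # Collapse exaggerated character repetition: "soooooo" -> "soo"
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)
    # Strip a few common emoticons so they do not confuse the MT engine
    text = re.sub(r"(:\)|:\(|<3)", "", text)
    # Expand known short forms, case-insensitively, on word boundaries
    pattern = r"\b(" + "|".join(SHORT_FORMS) + r")\b"
    text = re.sub(pattern, lambda m: SHORT_FORMS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)
    # Tidy whitespace left over from removals
    return re.sub(r"\s{2,}", " ", text).strip()

print(normalize("that was soooooo gr8 :) dont miss it"))
# -> that was soo great don't miss it
```

Each rule trades a little information (the exaggeration and emoticons carry tone) for more predictable MT input, which is exactly the tension the research cited above is trying to resolve.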
UGC – Example Post-Editing Scenarios
1. Online Marketplace – post-editing for MT engine re-training
Closer to full PE, due to very specific PE requirements and the post-editors’ expected knowledge of MT engine logic.
E.g.: Change words instead of changing sentence structure, do not add or leave out any information
2. Knowledge base – crowd-sourced post-editing
Light PE quality requirements, with a focus on severe mistranslations and fixing corrupted content
3. Travel portal user reviews - sanity check of raw MT output
Extra-light PE with a focus on severe mistranslations and offensive content, for high-volume, highly perishable content
Levels of Post-Editing for UGC

Example 1
Source: We have stayed here and their has been a few stag and hens but nothing to worry about. They have been very respectful that its a family hotel
Raw MT: We hebben hier verbleven en hun is geweest een paar hert en kippen maar niets te vrezen. Ze zijn heel respectvol dat het is een familiehotel
Dutch light PE: We hebben hier verbleven en er zijn een paar vrijgezellenfeesten geweest maar niets te vrezen. Ze zijn heel respectvol dat het een familiehotel is
Dutch full PE: We hebben hier verbleven en er zijn een paar vrijgezellenfeesten geweest maar er is niets te vrezen. Ze zijn heel respectvol dat het een familiehotel is.
Comments on edits: Light PE: “stag and hens” was mistranslated literally, although “stag and hen parties” was meant; a source typo was also carried over (their > hun) and had to be corrected in the PE. Full PE: additional rewrites, adding the word “er” to improve readability.

Example 2
Source: Yes it is in the bedroom, we did not have a bed settee
Raw MT: Ja het is in de slaapkamer, we hebben niet een slaapbank
Dutch light PE: no edits required
Dutch full PE: Ja het is in de slaapkamer, we hebben geen slaapbank
Comments on edits: Full PE: “niet een” changed to “geen” for readability.

Example 3
Source: my dad is in a wheelchair is this hotel suitable
Raw MT: mijn vader is in een rolstoel is dit hotel geschikt
Dutch light PE: no edits required
Dutch full PE: Mijn vader zit in een rolstoel, is dit hotel geschikt?
Comments on edits: Full PE: added punctuation and capitalization, and made the sentence more readable.
UGC & Quality Guidelines
Style Guide for Professional Translation – typically between 20 and 50 pages, outlines client-specific requirements
Not appropriate for UGC (in terms of content & resources)
Rating UGC quality - Community
Translating UGC – Talent Search
- Knowledge of target and source locale
- Knowledge of content / subject matter
- Product users (power users, …)
- People who like to write / use their language skills
- People who belong to a user / interest group
- May need to learn to use CAT technology
- May need to follow very specific instructions
Crowdsourcing
The term crowdsourcing was coined by Jeff Howe in 2006, as the act of taking a task
traditionally performed by an employee or contractor, and outsourcing it to an undefined and generally large group of people in the form of an open call.
Crowd-sourced translation is currently the most active area of investment in the industry. (Kelly, 2013)
Requires a shared and easy-to-use translation / post-editing platform
In order for crowdsourcing to be most effective (turnaround times & throughput), a good-sized crowd needs to be built, which requires some form of incentive / motivation
Some quality guidance is required to prevent chaos
Crowd & Quality Guidelines
While crowd guidelines need to be adapted to the specific crowd (professional / non-professional, linguists / power users, paid / unpaid, …) and to the content & purpose (brand identity, technical specialization, special user community – e.g. luxury versus student travel, …), they should generally:
Be simple & easy to follow by all crowd members
Be brief and ideally without reference to additional checks & documents
Clarify the main purpose of the translation / post-editing assignment
Provide clear rules (rather than lists of exceptions)
They may also point out how to handle specific technical items.
Possible Translation Scenarios
MT + normalization; paid crowd with basic instructions on “do’s and don’ts”; crowd can be mix of translators / customers / …
possibly MT with full PE, or professional HT / transcreation
MT + “accuracy check” PE; crowd of technical users, savvy on the product; linguistic errors are acceptable