TAUS Quality Summit Dublin – Welocalize Presentation by Olga Beregovaya and Lena Marg
DESCRIPTION
Getting the Right UGC Translation Quality: Human or Machine, Professional or Crowd. The Language Tools Team at Welocalize shares insights and expertise regarding the quality requirements for user-generated content, covering human translation, machine translation, post-editing and crowdsourcing. Localization World TAUS Quality Summit Pre-Conference in Dublin, Ireland, June 2014.
TRANSCRIPT
Getting the Right UGC Translation Quality:
Human or Machine, Professional or Crowd
Making Translation Strategy Decisions
Adding UGC to the Mix
a major influence on people’s buying decisions
the second most trusted form of advertising (after word-of-mouth), with 70% of global consumers indicating they trust this platform
UGC in the form of support forums reduces the cost of supporting customers
UGC is always blended in with the “branded” web UI content
Traditional translation processes, quality standards and pricing models are not viable in the UGC world – something’s got to give!
*Nielsen’s 2011 Global Trust in Advertising report, which surveyed more than 28,000 people in 56 countries
The UGC Quality Challenge
UGC & Translation Quality Requirements
There are different kinds of UGC, and traditional quality requirements are less relevant than the impact and purpose requirements.
If your UGC is a part of your brand identity, needs to produce emotional impact and borders the "transcreated copy", you'll be best off engaging professional translators and providing brand book-level guidelines.
If your UGC is there to convey information but translations need to preserve the desired tone (e.g. a casual, upbeat review), the crowd approach will be best.
If your UGC is there to provide technical instructions, you will need subject matter experts who will be able to accurately preserve the meaning of the source input in the translation.
Characteristics of UGC
authored by non-professionals and / or non-native speakers
often shows patterns similar to oral speech
sometimes authored by power users / “techies”
often highly perishable
multitude of authors = diversity of styles
Examples:
• Short forms: nite (night), sayin (saying), gr8 (great)
• Acronyms: lol (laugh out loud), iirc (if I remember correctly)
• Typing errors / misspellings: wouls (would), rediculous (ridiculous)
• Punctuation omissions / errors: im (I’m), dont (don’t)
• Non-dictionary slang: that was well mint (that was very good)
• Wordplay: that was soooooo great (that was so great)
• Censor avoidance: sh1t, f***
• Emoticons: :) (smiley), <3 (heart)
• Foreign words used intentionally: al dente, bon voyage
(Roturier, 2011), (Jiang et al, 2012), (Clark & Araki, 2011)
Travel Portal – Company + Customer Content
Yellow = UGC, Green = Web UI
Technical User Forums
Online Marketplace
Input - Examples
→ emoticons, typing errors, missing punctuation, grammar errors, ‘techie’ speak, slang, …
UGC & MT
UGC has become part of the brand strategy, so raw MT will not be enough in all cases
Utility scoring should be used to measure the quality of raw MT for UGC; it rates the comprehensibility & utility of the output rather than the linguistic quality
MT evaluation results for UGC indicate that around 50% or less of comments / reviews are considered comprehensible
Researchers are focusing efforts on normalization and preprocessing steps of UGC in order to improve MT output and reduce the PE effort
…. How much can and should post-editing fix?
UGC & MT - Normalization
Normalization is the manual or automated process of taking non-standard input and transforming it with scripts, regular expressions and other processes, in order to make the source text more ‘normal’ before machine translation.
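As a minimal sketch of what such a normalization pass might look like (the replacement table and rules below are illustrative only, not Welocalize’s actual pipeline):

```python
import re

# Illustrative replacement table for common UGC short forms;
# a production system would use much larger, locale-specific resources.
SHORT_FORMS = {
    "nite": "night",
    "sayin": "saying",
    "gr8": "great",
    "dont": "don't",
    "iirc": "if I remember correctly",
}

def normalize(text: str) -> str:
    # Collapse exaggerated character repetition: "soooooo" -> "soo"
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)
    # Strip a few common emoticons so they do not confuse the MT engine
    text = re.sub(r"(:\)|:\(|<3)", "", text)
    # Expand known short forms, case-insensitively, on word boundaries
    pattern = r"\b(" + "|".join(SHORT_FORMS) + r")\b"
    text = re.sub(pattern, lambda m: SHORT_FORMS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)
    # Tidy whitespace left over from removals
    return re.sub(r"\s{2,}", " ", text).strip()

print(normalize("that was soooooo gr8 :) dont miss it"))
# -> that was soo great don't miss it
```

Each rule trades a little information (the exaggeration and emoticons carry tone) for more predictable MT input, which is exactly the tension the research cited above is trying to resolve.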
UGC – Example Post-Editing Scenarios
1. Online Marketplace – post-editing for MT engine re-training
Closer to full PE, due to very specific PE requirements and the post-editors’ expected knowledge of MT engine logic.
E.g.: Change words instead of changing sentence structure, do not add or leave out any information
2. Knowledge base – crowd-sourced post-editing
Light PE quality requirements, with a focus on severe mistranslations and fixing corrupted content
3. Travel portal user reviews - sanity check of raw MT output
Extra-light PE with a focus on severe mistranslations and offensive content, for high-volume, highly perishable content
Levels of Post-Editing for UGC

Example 1
Source: We have stayed here and their has been a few stag and hens but nothing to worry about. They have been very respectful that its a family hotel
Raw MT: We hebben hier verbleven en hun is geweest een paar hert en kippen maar niets te vrezen. Ze zijn heel respectvol dat het is een familiehotel
Dutch light PE: We hebben hier verbleven en er zijn een paar vrijgezellenfeesten geweest maar niets te vrezen. Ze zijn heel respectvol dat het een familiehotel is
Dutch full PE: We hebben hier verbleven en er zijn een paar vrijgezellenfeesten geweest maar er is niets te vrezen. Ze zijn heel respectvol dat het een familiehotel is.
Comments on edits: Light PE: “stag and hens” was mistranslated literally, although “stag and hen parties” was meant; a source typo was also carried over (their > hun) and had to be corrected in the PE. Full PE: additional rewrites, adding the word “er” to improve readability.

Example 2
Source: Yes it is in the bedroom, we did not have a bed settee
Raw MT: Ja het is in de slaapkamer, we hebben niet een slaapbank
Dutch light PE: no edits required
Dutch full PE: Ja het is in de slaapkamer, we hebben geen slaapbank
Comments on edits: Full PE: “niet een” changed to “geen” for readability.

Example 3
Source: my dad is in a wheelchair is this hotel suitable
Raw MT: mijn vader is in een rolstoel is dit hotel geschikt
Dutch light PE: no edits required
Dutch full PE: Mijn vader zit in een rolstoel, is dit hotel geschikt?
Comments on edits: Full PE: added punctuation and capitalization, and made the sentence more readable.
UGC & Quality Guidelines
Style Guide for Professional Translation – typically between 20 and 50 pages, outlines client-specific requirements
Not appropriate for UGC (in terms of content & resources)
Rating UGC quality - Community
Translating UGC – Talent Search
- Knowledge of target and source locale
- Knowledge of content / subject matter
- Product users (power users, …)
- People who like to write / use their language skills
- People who belong to a user / interest group
- May need to learn to use CAT technology
- May need to follow very specific instructions
Crowdsourcing
The term crowdsourcing was coined by Jeff Howe in 2006, as the act of taking a task
traditionally performed by an employee or contractor, and outsourcing it to an undefined and generally large group of people in the form of an open call.
Crowd-sourced translation is currently the most active area of investment in the industry. (Kelly, 2013)
Requires a shared and easy-to-use translation / post-editing platform
In order for crowdsourcing to be most effective (turnaround times & throughput), a good-sized crowd needs to be built, which requires some form of incentive / motivation
Some quality guidance is required to prevent chaos
Crowd & Quality Guidelines
While crowd guidelines need to be adapted to the specific crowd (professional / non-professional, linguists / power users, paid / unpaid, …) and to the content & purpose (brand identity, technical specialization, special user community – e.g. luxury versus student travel, …), they should generally:
Be simple & easy to follow by all crowd members
Be brief and ideally without reference to additional checks & documents
Clarify the main purpose of the translation / post-editing assignment
Provide clear rules (rather than lists of exceptions)
They may also point out how to handle specific technical items.
Possible Translation Scenarios
MT + normalization; paid crowd with basic instructions on “do’s and don’ts”; crowd can be mix of translators / customers / …
possibly MT with full PE, or professional HT / transcreation
MT + “accuracy check” PE; crowd of technical users, savvy on the product; linguistic errors are acceptable