2013 GALA Miami: Breaking into Latin American Markets on a Small Budget
DESCRIPTION
The Latin American market is composed of a mix of various Spanish dialects. If a company really wants to reach a specific audience in Latin America, it must use the right dialect. But how is it possible to translate marketing materials into four or five Spanish dialects without dramatically increasing costs? This session will discuss how a joint effort to create an MT engine for translating international Spanish into specific Latin American dialects (Spanish for Argentina, Chile, Colombia, Mexico, and Puerto Rico) made this challenge feasible, economical, and replicable.
TRANSCRIPT
An MT Case Study:
Breaking into Latin American Markets
on a Small Budget
María Azqueta (SeproTec) & Diego Bartolomé (tauyou)
Spanish Worldwide
Spanish Language:
• Also known as Castellano.
• Latin-derived Romance language.
• Spanish is one of the six official languages of
the United Nations and an official language of
the European Union.
Spanish Worldwide
[Bar chart of native speakers, axis from 0 to 1,200 million. Languages shown: Mandarin Chinese, Spanish, English, Hindi/Urdu. Values shown: 407 million, 311 million, 955 million, 360 million.]
Second most spoken language by number of native speakers
Spanish Worldwide
• For demographic reasons, the percentage of the
world’s population that speaks Spanish as a native language is increasing, while the percentage of
Chinese and English speakers is decreasing.
• Within three or four generations, …% of the world’s population will communicate in Spanish.
• In 2050, the United States will be the world’s foremost Spanish-speaking country.
Spanish on the Internet
• Spanish is the third most widely used language on
the Net.
• The use of Spanish on the Net has experienced a
growth rate of 807.4% between 2000 and 2011.
• Spain and Mexico are among the 20 countries with
the highest number of internet users.
• The demand for documents in Spanish is the fourth
largest from among the world’s languages.
Spanish Worldwide and its Differences
High demand for translations into Spanish.
But… is the same Spanish spoken everywhere?
Spanish Worldwide and its Differences
RAE (Royal Spanish Academy):
– Created in the 18th century, it is widely seen as
the arbiter of what is considered standard
Spanish.
– It produces authoritative dictionaries and
grammar guides.
– Although its decisions are not formally binding,
they are widely followed in both Spain and Latin
America.
Spanish Worldwide and its Differences
Different dialects and many differences:
Lexical variations
Grammatical differences
Idioms
Spanish Worldwide and its Differences
‘Neutral’ or ‘International’
Spanish
Latin American Spanish & European Spanish
Market Trend:
Why Adapt to the
Local Spanish of Each Country?
To reach different markets
People are most likely to buy when a product is advertised in their dialect
Why Adapt to the
Local Spanish of Each Country?
EN: Take a card from the deck
ES: Coge una carta de la baraja
Client A (Gaming Industry)
Why Adapt to the
Local Spanish of Each Country?
ES: Coge una carta de la baraja
AR: Agarrá una carta del mazo
CL: Toma una carta del naipe
CO: Coge una carta de la baraja
MX: Saca una carta de la baraja
PR: Coge una carta de la baraja
Coger (32 entries) http://rae.es/rae.html
1. tr. Asir, agarrar o tomar. U. t. c. prnl. (To seize, grab, or take.)
31. intr. vulg. Am. Realizar el acto sexual. (Vulgar, Americas: to have sexual intercourse.)
Why Adapt to the
Local Spanish of Each Country?
Advise Clients
If you really want to break into a specific
market, you must decide which country
you want to target and localize your
material for the different Spanish dialects
spoken in each individual country.
The Main Problems Clients Face
Is there a cost-efficient solution
on the market?
tauyou MT Solution at SeproTec
Hybrid machine translation since January 2011
Languages: EN, ES, PT, GA, FR, IT…
Domains: Legal, Technical…
Glossaries and forbidden words lists
Average translated words per month: 700,000
Initial Brainstorming
MT from
EN > different ES dialects
Extensive post-editing would be required
Final Scope of the Project
Human translation + revision
English > Spanish (Spain)
MT of Spanish (Spain) into the Spanish of:
• Argentina
• Chile
• Colombia
• Mexico
• Puerto Rico
Initial Approach for Latin American MT
Traditional Workflow
1. Gather translation memories (EN → ES-XX)
2. Add generic material
3. Develop engine
4. Add linguistic pre- and post-processing
5. Improve quality over time
Drawbacks
Varying MT Quality
Depending on the domain and dialect
Initial Inconsistencies among Dialects
Handled with glossaries
Medium Post-Editing Effort
Could be improved over time
New Approach
Translate EN to Standard ES
Via standard high-quality human translation
Convert Standard ES to Latin American Variants
From Spanish to Spanish
Better final quality is achieved
Specifications
Countries
Argentina, Chile, Colombia, Mexico, Puerto Rico
Internal Glossaries to Handle Lexical Variations
It corrects agreement (concordance) errors
Idioms
Grammatical Differences
It adapts verb tenses
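The glossary step in these specifications can be illustrated with a minimal sketch. It assumes a simple phrase-substitution model and uses only the card-game sentences shown earlier in this deck; the names `GLOSSARIES`, `match_case`, and `localize` are hypothetical and do not describe the actual tauyou engine. Agreement correction and verb-tense adaptation are out of scope here.

```python
import re

# Per-dialect glossaries: standard (Spain) phrase -> local phrase.
# Entries are taken from the card-game example in this deck; a real
# glossary would be far larger. Names and structure are assumptions.
GLOSSARIES = {
    "AR": {"coge": "agarrá", "de la baraja": "del mazo"},
    "CL": {"coge": "toma", "de la baraja": "del naipe"},
    "CO": {},
    "MX": {"coge": "saca"},
    "PR": {},
}


def match_case(source: str, target: str) -> str:
    """Copy the initial capitalization of the matched text onto the replacement."""
    return target[:1].upper() + target[1:] if source[:1].isupper() else target


def localize(text: str, dialect: str) -> str:
    """Replace each glossary phrase found in `text` with its local equivalent."""
    glossary = GLOSSARIES[dialect]
    if not glossary:
        return text
    # Longest phrases first, so multi-word entries win over shorter ones.
    phrases = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: match_case(m.group(0), glossary[m.group(0).lower()]), text
    )


source = "Coge una carta de la baraja"
for code in GLOSSARIES:
    print(code, localize(source, code))
```

Run on the Spain sentence, this reproduces the five localized variants listed on the earlier slide. A single regular-expression pass is used so that a replacement is never re-matched by another glossary entry.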
Testing the Prototype Engine
Extraction of several texts (fashion, real-estate, human resources, automobile)
Sent to linguists and/or translators in each target country for localization
Performance of the same localizations by the engine
Comparison and contrasting of human and machine localization results
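One way to make the human-versus-machine comparison concrete is a word-level error rate. The deck does not say how its error rates were computed, so the metric below (word edit distance divided by the number of words in the human reference) is an assumption, and the sample pairs are illustrative rather than project data.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(pairs: list[tuple[str, str]]) -> float:
    """Word errors in the machine output as a percentage of human reference words."""
    errors = sum(word_edit_distance(h.split(), m.split()) for h, m in pairs)
    total = sum(len(h.split()) for h, _ in pairs)
    return 100.0 * errors / total if total else 0.0


# Illustrative (human, machine) segment pairs for Argentina; not project data.
pairs = [
    ("Agarrá una carta del mazo", "Agarrá una carta del mazo"),
    ("Agarrá una carta del mazo", "Coge una carta del mazo"),
]
print(f"{error_rate(pairs):.2f}% error rate")
```

A metric like this treats the human localization as the reference, which is itself an assumption: it counts every difference as a machine error even when the human version is the one at fault.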
First Bug Report
Not all terms were localized
Concordance issues
(masc./fem.; sing./pl.)
Verb tenses for Argentina
Human vs. machine: MT error rate of 7.78%
First Bug Report
Some terms were changed/localized by the engine, but not by the humans.
(example)
Human error or MT error?
Testing the Prototype Engine
A glossary was created by extracting the terms localized by the linguists/translators.
This glossary was then sent to the same people who localized the texts to verify that all the terms were correctly localized and nothing was missing.
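The term extraction described here can be sketched by aligning the Spain text with a localized version and collecting the spans that differ. This is a minimal illustration using Python's standard `difflib`, not the procedure the project actually followed; `extract_term_pairs` is a hypothetical name.

```python
from difflib import SequenceMatcher


def extract_term_pairs(standard: str, localized: str) -> list[tuple[str, str]]:
    """Return (standard phrase, local phrase) pairs where the two texts differ."""
    src, tgt = standard.split(), localized.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only substitutions become glossary candidates; pure insertions
        # and deletions are skipped in this sketch.
        if tag == "replace":
            pairs.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return pairs


candidates = extract_term_pairs(
    "Coge una carta de la baraja",  # Spain
    "Toma una carta del naipe",     # Chile
)
print(candidates)
```

For the Chilean example this yields the candidate pairs "Coge" → "Toma" and "de la baraja" → "del naipe". Candidates produced this way still need the human verification step described above, since a difference between two texts is not always a genuine dialect term.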
Testing the Prototype Engine
People can miss things.
Although many different variants of Spanish
exist, Spanish speakers understand many
terms that are foreign to their own dialect
when they read them in context,
sometimes to the point of accepting them
as their own. I believe that this may be
due to the phenomenon of globalization
and the internet.
Latest Bug Report
MT: 1.21% error rate
Achievements
Very little post-editing needed
Reduced error rate
Shortened deadlines
Significant cost reduction
Conclusions
Human localization is not perfect.
MT is not perfect either.
Combining human and machine translation
helps achieve high quality and reduce cost.
Further Work
Improving Glossaries
Through a simple web interface for PE
Extending Spanish Language Coverage
More dialects
Traductor.cervantes.es
Incorporating more languages
English, French and Portuguese
Bibliography
Yule, G. (2006). The Study of Language (3rd ed.). New York: Cambridge University Press.
RAE
Instituto Cervantes
http://www.linguapress.com