Natural Language Processing for Translation
Alessandro Cattelan, Translated srl
Extremely fragmented market, both in terms of language service providers and customers.
Language industry size
Language service industry: $33.5 billion in 2012
http://www.commonsenseadvisory.com/Portals/0/downloads/120531_QT_Top_100_LSPs.pdf
Large customers spend millions of dollars a year on translation.
However, it is the smaller customers with limited budgets that make up most of the market.
Language industry customers: specific characteristics
Larger customers
  Large budgets
  Use technology (MT, TM, termbases, etc.)
  Efficient processes (translation is part of the development cycle)
Smaller customers
  Tight budgets
  No technology and no processes
Smaller Customers
Even though they are on a tight budget and use no technology for translation, we can still give them something better than this…
Common requirements
Both smaller and larger customers are interested in:
Getting high quality translations
Receiving the translation as soon as possible
Saving as much as possible
Challenge → Opportunity
Challenge: no technology and no processes to improve efficiency in translation
Opportunity: develop technology and processes to win customers
Content reuse
Large public translation memories make it possible to leverage previously translated content and to reduce the weighted word count.
Collecting data
Aligning bilingual content
Making data available in CAT tools
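To make "weighted word count" concrete, here is a minimal sketch. The match bands and weights in `WEIGHT_GRID` are illustrative assumptions, not figures from the talk: each segment's word count is multiplied by a weight chosen from its best translation memory match score, so better matches cost fewer weighted words.

```python
# Hypothetical discount grid: (minimum match score in %, weight applied to words).
# These numbers are illustrative assumptions only.
WEIGHT_GRID = [
    (100, 0.10),  # exact match
    (85, 0.40),   # high fuzzy match
    (75, 0.60),   # low fuzzy match
    (0, 1.00),    # no usable match: counted in full
]


def weight_for(match_score):
    """Return the weight for a TM match score between 0 and 100."""
    for threshold, weight in WEIGHT_GRID:
        if match_score >= threshold:
            return weight
    return 1.0


def weighted_word_count(segments):
    """segments: iterable of (word_count, best_match_score) pairs."""
    return sum(words * weight_for(score) for words, score in segments)


# Three segments: an exact match, a high fuzzy match, and a segment with no useful match.
segments = [(10, 100), (20, 90), (30, 50)]
total = weighted_word_count(segments)
```

With these assumed weights, 60 raw words shrink to 39 weighted words; the more content a translation memory covers, the lower the total.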
Translation Memory
Never translate the same sentence twice… nor part of it!
Improving matching algorithm for translation memories
EN: To open a file, select File from the menu and click on Open
IT: Per aprire un file, selezionare File dal menu e fare clic su Apri
EN: Select File from the menu […]
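The example above can be sketched in code. This is only an illustration built on Python's standard `difflib`, not the matching algorithm described in the talk; the helper names `tokenize`, `fuzzy_score` and `longest_shared_run` are hypothetical. It shows why whole-segment scoring undersells the match: the new segment is fully contained in the stored one, yet its overall similarity is low.

```python
from difflib import SequenceMatcher


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, treat commas as spaces, split on whitespace.
    return text.lower().replace(",", " ").split()


def fuzzy_score(new_segment, tm_source):
    """Token-level similarity between 0 and 1 (difflib's ratio)."""
    matcher = SequenceMatcher(None, tokenize(new_segment), tokenize(tm_source), autojunk=False)
    return matcher.ratio()


def longest_shared_run(new_segment, tm_source):
    """Longest run of consecutive tokens that appears in both segments."""
    a, b = tokenize(new_segment), tokenize(tm_source)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


tm_source = "To open a file, select File from the menu and click on Open"
new_segment = "Select File from the menu"

score = fuzzy_score(new_segment, tm_source)        # about 0.56
shared = longest_shared_run(new_segment, tm_source)  # all five tokens of the new segment
```

A threshold on `score` alone would likely reject this match, while the shared run shows that every word of the new segment has already been translated once.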
Using MT to complete fuzzy matches
EN: Select File from the menu
IT: Selezionare File dal menu
EN: Select File from the menu and click on New document
IT: Selezionare File dal menu […]
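A rough sketch of the idea, under strong simplifying assumptions: the stored source is an exact prefix of the new segment, and the remainder can be translated on its own. `complete_with_mt` and `placeholder_mt` are hypothetical names; `placeholder_mt` stands in for a real MT engine and only marks the span that would be sent to it.

```python
def complete_with_mt(new_source, tm_source, tm_target, translate):
    """Reuse the TM target for a matching prefix and machine-translate the rest.

    `translate` is a hypothetical callable standing in for any MT engine.
    Returns None when the TM source is not a prefix of the new segment.
    """
    new_tokens = new_source.split()
    tm_tokens = tm_source.split()
    prefix = [t.lower() for t in new_tokens[:len(tm_tokens)]]
    if prefix != [t.lower() for t in tm_tokens]:
        return None
    remainder = " ".join(new_tokens[len(tm_tokens):])
    if not remainder:
        return tm_target  # exact match: nothing left for MT
    return tm_target + " " + translate(remainder)


def placeholder_mt(text):
    # Stand-in for a real engine: marks the span that would be sent to MT.
    return "[MT: " + text + "]"


draft = complete_with_mt(
    "Select File from the menu and click on New document",
    "Select File from the menu",
    "Selezionare File dal menu",
    placeholder_mt,
)
# draft == "Selezionare File dal menu [MT: and click on New document]"
```

Plain concatenation ignores word order and agreement across the join, so a production system would need alignment information and a translator's review of the result.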
Machine Translation
Most of the time, customers have neither a custom MT engine nor the data to build one.
Use existing domain-specific engines, even though they are not adapted to the customer
Adapt generic engines to specific domains (needs to be fast!)
Adapt the engine in real time with the user's translations
Using generic engines
Post-processing of MT output from generic engines:
Correcting terminology issues
Adapting output to previous translations
Managing mark-up…
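As one possible illustration of the terminology step, the sketch below replaces terms a generic engine tends to produce with the ones the customer has approved. The function name `enforce_terminology`, the correction table and the sample MT output are all assumptions made for the example; the approved wording "fare clic" is taken from the translation memory example earlier in the talk.

```python
import re


def enforce_terminology(mt_output, corrections):
    """Replace unwanted target terms with the approved ones.

    corrections: dict mapping a term the generic engine tends to produce
    to the term approved for this customer (both in the target language).
    """
    if not corrections:
        return mt_output
    # Longest terms first so multi-word terms win over their substrings.
    terms = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {term.lower(): approved for term, approved in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], mt_output)


# Hypothetical generic MT output, corrected to match previous translations.
fixed = enforce_terminology("cliccare su Apri", {"cliccare": "fare clic"})
# fixed == "fare clic su Apri"
```

The replacement does not preserve capitalization or handle inflected forms, both of which a real post-processing step would have to address.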
“If I have seen further it is by standing on the shoulders of giants.” [I. Newton]
MT quality evaluation
Establishing the right weight for words translated by MT systems.
What is a fair rate for editing machine translation output?
Confidence scores for MT
Matching metrics for TM segments
MT quality perceived by the user
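The talk does not name a specific metric. As one hedged illustration of how editing effort could feed into a rate, the sketch below counts word-level edits (Levenshtein distance) between the raw MT output and the post-edited translation, normalized by the length of the final text. The function names and the sample strings are assumptions.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def post_edit_effort(mt_output, post_edited):
    """Edits per word of the final translation; 0.0 means the MT was left untouched."""
    mt_tokens, pe_tokens = mt_output.split(), post_edited.split()
    if not pe_tokens:
        return 0.0 if not mt_tokens else 1.0
    return word_edit_distance(mt_tokens, pe_tokens) / len(pe_tokens)


# Hypothetical MT output and its post-edited version.
effort = post_edit_effort("cliccare su Apri", "fare clic su Apri")
# effort == 0.5 (one substitution and one insertion over four words)
```

Because the value can exceed 1.0 when the MT output is much longer than the final text, any mapping from effort to a per-word rate would need to cap or bucket it.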
Terminology Management
Terminology management can have a great impact on quality and productivity.
Automatic extraction of terminology
Finding target language equivalents for source terms
Adding context to the terms
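Automatic extraction can be approximated very simply, which gives a feel for the task even though real extractors are far more sophisticated. The sketch below is an assumption-laden baseline, not the method from the talk: it counts repeated word n-grams in English text and drops those that start or end with a stopword. `STOPWORDS` and `candidate_terms` are hypothetical names, and the stopword list is deliberately tiny.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "from", "in", "of", "on", "the", "to"}


def candidate_terms(text, max_len=3, min_count=2):
    """Return (term, count) pairs for repeated n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


sample = "The translation memory stores segments. A translation memory speeds up work."
terms = candidate_terms(sample)
```

On the sample text the candidates are "translation", "memory" and "translation memory", each seen twice. Finding target language equivalents and attaching context would then build on such a candidate list.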
Any questions?