How translators work in real life: SCATE observations Frieda Steurs Iulianna van der Lek-Ciudin Tom Vanallemeersch

Post on 30-Dec-2016

Page 1: How translators work in real life: SCATE observations

How translators work in real life: SCATE observations

Frieda Steurs

Iulianna van der Lek-Ciudin

Tom Vanallemeersch

Page 2: How translators work in real life: SCATE observations

What & Why

Improve translation efficiency and consistency

Underexploited translation resources

Poor integration of speech recognition

Overloaded interfaces

Page 3: How translators work in real life: SCATE observations

March 2014 - February 2018

Consortium

Centre for Computational Linguistics, University of Leuven

Industrial Advisory Committee

Page 4: How translators work in real life: SCATE observations

Today’s focus

Page 5: How translators work in real life: SCATE observations

Methods

Survey

Contextual inquiries

Page 6: How translators work in real life: SCATE observations

Methods

Survey: Dec 2014 – Feb 2015

46 questions

187 complete responses (75% from EU)

73% freelance translators

25% in-house translators

Few terminologists, interpreters, project managers, post-editors

Contextual Inquiries: Nov 2014 - June 2015

16 professionals at their workplaces (BE, NL, LU)

Semi-structured interviews, observations, think-aloud, post-interviews

Page 7: How translators work in real life: SCATE observations

Whom did we observe?

Organization type: small translation agency, medium-size translation/interpreting agency, public institution, freelance

Language pairs: EN-NL/NL-EN, FR-NL, EN-FR, EN-RO

Translation experience: 2-5 years vs. 5+ years

Domains of expertise: Legal, ICT, Medical, Marketing

Main TEnT: Trados Studio 2014, Trados Studio 2011, Trados Workbench, Déjà Vu X3, memoQ 2014, Wordbee

Experience with TEnT: <1 year (2) vs. 5+ years

Page 8: How translators work in real life: SCATE observations

Main findings and implications

Needs and shortcomings of tools

Observation of terminological strategies

Page 9: How translators work in real life: SCATE observations

Translators’ Linguistic Resources

Resource: Translation Memories

State-of-the-art:
• Heavily used
• Concordance, term look-up features, term extraction
• Term extraction rarely used
• Alignment
• No support for comparable corpora (possible to upload monolingual documents for reference)

Opportunities:
• Syntactic concordance
• Bilingual/multilingual term extraction
• More focus on monolingual corpora
• Features to compile and query comparable corpora

Resource: Online Translation Memories

State-of-the-art:
• Perform look-up during translation
• Automatic insertion
• Concordance searches
• Moderate-low quality control

Opportunities:
• More advanced filtering techniques
• QA tools
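The concordance search listed above returns every stored segment that contains a query, together with its translation and some surrounding context. A minimal sketch of the idea, assuming a translation memory held as a plain list of (source, target) pairs; the `concordance` function and the EN-NL segments are hypothetical and not taken from any CAT tool:

```python
def concordance(tm, query, width=30):
    """Return (left context, match, right context, target) for every hit
    of `query` in the source side of the translation memory."""
    hits = []
    needle = query.lower()
    for source, target in tm:
        haystack = source.lower()  # case-insensitive search (assumes lowercasing keeps string length)
        start = haystack.find(needle)
        while start != -1:
            end = start + len(needle)
            left = source[max(0, start - width):start]
            right = source[end:end + width]
            hits.append((left, source[start:end], right, target))
            start = haystack.find(needle, end)
    return hits


# Invented example segments, for illustration only.
tm = [
    ("The annual report was published in March.",
     "Het jaarverslag werd in maart gepubliceerd."),
    ("Please sign the contract.",
     "Gelieve het contract te ondertekenen."),
    ("The report on fraud is confidential.",
     "Het verslag over fraude is vertrouwelijk."),
]

for left, match, right, target in concordance(tm, "report"):
    print(f"{left:>30} [{match}] {right:<30} | {target}")
```

Real tools typically add tokenization, fuzzy matching and target-side highlighting; this sketch only does plain substring matching.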

Page 10: How translators work in real life: SCATE observations

Translators’ Linguistic Resources

Resource: Local term bases

State-of-the-art:
• Usage is still low (SCATE survey -> 52%)
• Automatic term recognition
• Basic categories
• TBX not adopted by all tool developers
• Users prefer to exchange data in CSV, Excel

Opportunities:
• Improve usability
• More flexibility and customization to suit users with different needs
• A unified interface for online/local term bases
• Support for ontologies

Resource: Online term banks

State-of-the-art: Perform look-up (exact/fuzzy) during translation

Opportunities: Advanced pre-filtering techniques, better look-up interfaces

Resource: Online dictionaries, search engines

State-of-the-art: Consulted either online or via a WebSearch feature in CAT

Opportunities: Concordance-like searches directly from the translation editor
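To illustrate the difference between exact and fuzzy look-up mentioned for term banks, here is a minimal sketch that uses Python's standard `difflib` module. The `look_up` function, the `cutoff` value and the three-entry term base are assumptions made for illustration; actual term banks use their own matching algorithms.

```python
import difflib


def look_up(term_base, query, cutoff=0.8):
    """Return the exact match if there is one, otherwise the closest fuzzy matches.

    Each result is (source term, target term, similarity score).
    """
    key = query.lower()
    if key in term_base:
        return [(key, term_base[key], 1.0)]
    results = []
    # get_close_matches returns at most n candidates scoring >= cutoff, best first.
    for candidate in difflib.get_close_matches(key, list(term_base), n=3, cutoff=cutoff):
        score = difflib.SequenceMatcher(None, key, candidate).ratio()
        results.append((candidate, term_base[candidate], round(score, 2)))
    return results


# Invented EN-NL entries, for illustration only.
term_base = {
    "heart failure": "hartfalen",
    "wind energy": "windenergie",
    "wind turbine": "windturbine",
}

print(look_up(term_base, "Heart failure"))   # exact match (case-insensitive)
print(look_up(term_base, "heart failures"))  # fuzzy match on the plural form
```

A lower `cutoff` returns more candidates at the cost of more noise, which is one reason the slides ask for better pre-filtering.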

Page 11: How translators work in real life: SCATE observations

Translators’ Linguistic Resources

Resource: Machine Translation

State-of-the-art:
• Usage is still low (SCATE survey -> 27%)
• Consulted online
• Via API in CAT
• Segment assembly (Déjà Vu, memoQ)
• Autocompletion suggestions (SDL Trados)
• Adaptive MT (MateCat, Lilt)

Opportunities:
• Improve confidence estimation
• Interfaces for post-editing
• Train own MT engine with own TMs, TBs

Page 12: How translators work in real life: SCATE observations

Term collection

• Manually (88%)

• Semi-automatically via term extraction programs (22%)

Term storage

• CAT TB (52%)

• MS Excel (43%)

• MS Word (27%)

Most frequent form/canonical form

The language equivalents (56%)

Term research

• Online resources (94%)

• Personal resources (85%)

• Client’s resources (64%)

SCATE Users’ survey 2014-2015

187 survey participants

139 perform terminology activities

Page 13: How translators work in real life: SCATE observations

Search engines: Google, Bing

Online dictionaries: Oxford, ProZ.com, Van Dale, TermWiki Search, TermCoord glossary links

Term banks: IATE, Termium Plus, EuroTermBank, FAOTERM, WTOTERM

Monolingual corpora: EUR-Lex, Global Web-based English, British National Corpus, Corpus of Contemporary American English

Parallel corpora: Linguee, Europarl, Glosbe, TAUS Search

SCATE Users’ survey 2014-2015

Most used online terminology resources

Page 14: How translators work in real life: SCATE observations

Reasons for NOT managing terminology

No knowledge about terminology management theory and principles

It is the responsibility of somebody else

It has no added value

It is a time-consuming task

Term bases are complex

Reliance on the translation memories

SCATE Users’ survey 2014-2015

Page 15: How translators work in real life: SCATE observations

Systematic terminology management
• Collect terms and concepts from global field
• Construct a concept system
• Create well-structured definitions
• Create term entries

Ad-hoc terminology management
• Identify terms in isolated contexts
• Create initial term entries
• Add definition, context, …

Adapted from Handbook of Terminology Management, Vol. 1.

Medium & small LSPs, freelancers

In-house translation departments of large organizations

Page 16: How translators work in real life: SCATE observations

Terminology strategies

Institution

In-house translation departments

Translators / terminologists

In-house terminology coordination

Systematic and ad-hoc terminology management

Term extraction – not a standard practice!

Terminology tools:
• IATE database
• Eur-Lex
• Quest Metasearch (Bilingual)
• Euramis Concordance
• DGT Vista
• Electronic dictionaries, glossaries
• Term extraction tools: SynchroTerm, SDL MultiTerm Extract, TermTreffer
• External corpus query tools, e.g. TextStat

Translation tools:
• SDL Trados Studio
• In-house MT
• Voice recognition

Page 17: How translators work in real life: SCATE observations

Terminology strategies

Adapted after TermCoord documentation

Page 18: How translators work in real life: SCATE observations

Terminology strategies

Proactive terminology management

Preparation of “TermFolders” for important legislative procedures:

Desktop research

Manual collection of web links and relevant documents

Manual identification and extraction of term candidates

….

Page 19: How translators work in real life: SCATE observations

Terminology strategies

Time-consuming

No GlobalSearch

DIY corpora tools?

SCATE?

Page 20: How translators work in real life: SCATE observations

Terminology strategies

Small and medium-size LSPs, freelancers

Mainly ad-hoc, basic terminology management due to:

o Time pressure

o Lack of financial compensation

o Over-reliance on translation memories

o A general lack of knowledge and awareness of the benefits of terminology management

o Not familiar with corpus compilation and query tools

Page 21: How translators work in real life: SCATE observations

Ad-hoc terminology strategies during translation

Identify problem
• LGP, terminology, phraseology, names of entities, typography/punctuation…
• Highlight or copy/paste SL term

Search for a solution
• Local resources: Concordance, Term Look-up, Find & Replace, Global search
• Online resources via WebSearch or other integrated widgets
• MT via plugins, if available & allowed
• Online resources: Google -> Top hits (Bookmark link?)
• Contact client via e-mail or an online query spreadsheet
• Contact subject matter experts

Insert translation
• One click
• Copy/paste

Save terms
• Term base / Excel

Page 22: How translators work in real life: SCATE observations

Implications

For translators, project managers, terminologists, interpreters, translators’ educators:

Basic knowledge of terminology theory and practice

Terminology management tools

Preparation of glossaries before the start of the project with the help of:

Corpus compilation and query tools (BootCaT, AntConc, Sketch Engine)

Term extraction tools (SynchroTerm, Similis)

More focus on comparable corpora

Page 23: How translators work in real life: SCATE observations

Implications

For software developers:

Focus more on usability and personalization

Unified interfaces between local and online resources

More sophisticated search functionalities

Integrate online resources that are actually used by the users

More focus on comparable corpora

Page 24: How translators work in real life: SCATE observations

SCATE approach

Page 25: How translators work in real life: SCATE observations

SCATE research

Improvement of bilingual and multilingual term extraction techniques from comparable corpora

Integration of a syntactic concordancer in parallel corpora: e.g. Poly-Gretel

Page 26: How translators work in real life: SCATE observations

Multilingual term extraction from comparable corpora

A gold standard for Automatic Terminology Extraction

Compilation – Annotation – Evaluation

# words    Heart failure    Wind energy    Corruption    Corruption (parallel)
English    48,843           324,842        454,904       179,229
French     55,383           358,853        547,072       230,874
Dutch      50,850           315,605        476,179       223,495

Page 27: How translators work in real life: SCATE observations

A Gold Standard for Automatic Terminology Extraction

Annotation: 4 labels (Term, Common Term, Out of Domain Term and Named Entity) with elaborate and practical guidelines

Evaluation: inter-annotator agreement between 3 annotators after 2 iterations (av. F-score = 0.895; av. Cohen's kappa = 0.927)

Future work: linking the annotations in the comparable medical corpus across all 3 languages
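The slides report agreement as an average F-score and an average Cohen's kappa but do not say how these were computed. The sketch below shows the textbook definitions for one pair of annotators; with three annotators, one common option (an assumption here, not confirmed by the source) is to average the scores over all annotator pairs. The function names and the toy annotations are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Note: undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


def f_score(items_a, items_b):
    """F1 of annotator B's items measured against annotator A's items."""
    a, b = set(items_a), set(items_b)
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations using the four labels from the slide.
ann_a = ["Term", "Term", "Common Term", "Named Entity", "Out of Domain Term", "Term"]
ann_b = ["Term", "Common Term", "Common Term", "Named Entity", "Out of Domain Term", "Term"]
print(cohen_kappa(ann_a, ann_b))

terms_a = {"heart failure", "ejection fraction", "diuretic"}
terms_b = {"heart failure", "diuretic", "patient"}
print(f_score(terms_a, terms_b))
```

Kappa corrects raw agreement for chance, which matters when one label (here presumably Term) dominates the annotations.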

Page 28: How translators work in real life: SCATE observations

Bilingual lexicon induction from comparable corpora

Techniques for extracting word representations:

o multilingual topic models
o multilingual word embedding models
o character-level representations

Comparable corpora -> Cross-lingual semantic word representations -> Bilingual lexicon

Best results
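The last step of this pipeline, going from cross-lingual word representations to a bilingual lexicon, can be sketched as a nearest-neighbour search. The example assumes that source and target words already live in one shared vector space; the three-dimensional vectors are invented, and how the SCATE researchers actually trained or combined their representations is not shown in the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def induce_lexicon(src_vecs, tgt_vecs):
    """Map each source word to the target word with the most similar vector."""
    lexicon = {}
    for src_word, src_vec in src_vecs.items():
        lexicon[src_word] = max(tgt_vecs, key=lambda t: cosine(src_vec, tgt_vecs[t]))
    return lexicon


# Invented toy vectors in a shared EN-NL space, for illustration only.
en = {
    "heart": [0.9, 0.1, 0.0],
    "wind": [0.1, 0.9, 0.1],
    "corruption": [0.0, 0.2, 0.9],
}
nl = {
    "hart": [0.8, 0.2, 0.1],
    "wind": [0.2, 0.8, 0.0],
    "corruptie": [0.1, 0.1, 0.9],
}

print(induce_lexicon(en, nl))
```

In practice the vocabulary is large, so the search is usually done with matrix operations and often returns a ranked list of candidates instead of a single translation.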

Page 29: How translators work in real life: SCATE observations

Poly-Gretel

Bilingual syntactic concordancer

Query parallel corpora

Available online at: http://gretel.ccl.kuleuven.be/poly-gretel/ebs/input.php?1477144000

Target audience:

Computer-assisted language learning (CALL)

Translators

Translation studies and comparative linguistics

Page 30: How translators work in real life: SCATE observations

Poly-Gretel

Example query: EN noun + report ↔ NL verslag + prep + noun

Page 31: How translators work in real life: SCATE observations

Poly-Gretel

EN noun + report ↔ NL verslag + prep + noun

EN-NL constituents are automatically aligned

Page 32: How translators work in real life: SCATE observations

Poly-Gretel

Example query: EN noun + report ↔ NL noun

Many compounds are possible
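The example queries can be approximated with a toy pattern matcher. This is a deliberate simplification and not how Poly-Gretel works: it replaces syntactic analysis with flat part-of-speech tags and only checks that both patterns occur in the same sentence pair, without aligning constituents. The tag set, the pattern format and the two sentence pairs are assumptions made for illustration.

```python
def matches(token, constraint):
    """A constraint is ('pos', TAG) or ('word', lowercase form); a token is (word, pos)."""
    kind, value = constraint
    word, pos = token
    return pos == value if kind == "pos" else word.lower() == value


def find_spans(sentence, pattern):
    """Return the text of every contiguous token span that satisfies the pattern."""
    spans = []
    for i in range(len(sentence) - len(pattern) + 1):
        window = sentence[i:i + len(pattern)]
        if all(matches(tok, c) for tok, c in zip(window, pattern)):
            spans.append(" ".join(word for word, _ in window))
    return spans


def query(corpus, en_pattern, nl_pattern):
    """Return (EN span, NL span) for sentence pairs where both patterns match."""
    results = []
    for en_sent, nl_sent in corpus:
        en_hits = find_spans(en_sent, en_pattern)
        nl_hits = find_spans(nl_sent, nl_pattern)
        if en_hits and nl_hits:
            results.append((en_hits[0], nl_hits[0]))
    return results


# EN noun + report <-> NL verslag + prep + noun
EN_PATTERN = [("pos", "NOUN"), ("word", "report")]
NL_PATTERN = [("word", "verslag"), ("pos", "ADP"), ("pos", "NOUN")]

# Invented, hand-tagged sentence pairs.
corpus = [
    ([("The", "DET"), ("progress", "NOUN"), ("report", "NOUN"), ("is", "AUX"), ("late", "ADJ")],
     [("Het", "DET"), ("verslag", "NOUN"), ("over", "ADP"), ("vooruitgang", "NOUN"),
      ("is", "AUX"), ("laat", "ADJ")]),
    ([("The", "DET"), ("annual", "ADJ"), ("report", "NOUN"), ("is", "AUX"), ("ready", "ADJ")],
     [("Het", "DET"), ("jaarverslag", "NOUN"), ("is", "AUX"), ("klaar", "ADJ")]),
]

print(query(corpus, EN_PATTERN, NL_PATTERN))
```

The second pair is skipped because "annual" is tagged as an adjective and the Dutch side uses the compound "jaarverslag", which is the kind of case the EN noun + report ↔ NL noun query targets.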