Post on 28-Mar-2015

FROM TRANSLATION MACHINE THEORY TO MACHINE TRANSLATION THEORY

– SOME INITIAL THOUGHTS

Oliver Čulo

Universität Mainz

culo@uni-mainz.de

MT AS TRANSLATION MACHINE THEORY

TOPICS OF (EARLY) SMT

• Calculating translation models (Brown et al. 1993)

• sentence alignment (Gale & Church 1991)

• word alignment (Och & Ney 2003)

…and a plethora of papers on how to improve these

RECENT RESURGENCE OF LINGUISTICS

• MT and the phrase (Fox 2002, Koehn et al. 2003, Eisele 2006)

• MT and dependency (Ding & Palmer 2005, Quirk et al. 2005, Žabokrtský et al. 2008)

• hybrid architectures (Eisele et al. 2008)

• domain adaptation (Koehn & Schroeder 2007, Bertoldi & Federico 2009)

• factored models (Koehn & Hoang 2007)

• …

TRANSLATION-THEORETIC MODELLING OF MT

MT AND FUNCTIONAL TRANSLATION THEORY (1)

• Skopos theory (Reiss & Vermeer 1984)

• pragmalinguistic model (House 1997), function and loyalty (Nord 1997, 2006)

functional equivalence vs. change in function

documentary vs. instrumental

overt vs. covert

MT AND FUNCTIONAL TRANSLATION THEORY (2)

• aimed at functional equivalence (but does a machine or a GT user know?)

• aimed at instrumental (but in fact rather documentary; ethical dimensions?)

MT AND FUNCTIONAL TRANSLATION THEORY (3)

• MT and its lack of translation–functional considerations in system design (Schmidt in print)

• “human, purposeful action”-theoretic conception of translation as hindrance to acceptance of MT (Rozmyslowicz in print)

KNOWLEDGE TRANSFER TS -> MT

CROCO

English texts and German texts in three corpus types:

• Reference Corpus (ER, GR): 17 registers, 2,000-word samples each, 68,000 words

• Register-controlled Corpus (EO, GO) and Translation Corpus (GTrans, ETrans): 8 registers, at least 10 texts each, 3,125 words (av.), 1 million words
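
The corpus sizes quoted for CroCo can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes that the 68,000 words cover both reference corpora (ER and GR) together and that the 1 million words cover four sub-corpora (EO, GO, ETrans, GTrans). The constant names are made up for the illustration.

```python
# Sanity check of the corpus sizes quoted on the slide.
REFERENCE_REGISTERS = 17
REFERENCE_SAMPLE_WORDS = 2_000
REFERENCE_CORPORA = 2        # assumption: ER and GR counted together

CORE_REGISTERS = 8
TEXTS_PER_REGISTER = 10      # "at least 10", so this is a lower bound
AVG_WORDS_PER_TEXT = 3_125
CORE_SUBCORPORA = 4          # assumption: EO, GO, ETrans, GTrans

reference_total = REFERENCE_REGISTERS * REFERENCE_SAMPLE_WORDS * REFERENCE_CORPORA
core_total = (CORE_REGISTERS * TEXTS_PER_REGISTER
              * AVG_WORDS_PER_TEXT * CORE_SUBCORPORA)

print(reference_total)  # 68000
print(core_total)       # 1000000
```

Under these assumptions both totals match the figures on the slide.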

CROCO STRUCTURE: MULTILINGUAL

• Register-controlled Corpus and Translation Corpus, each with a word layer, chunk layer, clause layer and sentence layer

• + Metainformation + PoS tagging + Morphology + Sense relations + Phrase structure + Grammatical functions

• Alignment layers

FUNCTION SHIFTS (TYPOLOGICAL DIFFERENCES)

EN: Tray 1 [SUBJ] holds [FIN] up to 125 sheets [DOBJ]

DE: In Fach 1 [PROBJ] können [FIN] bis zu 125 Blatt Papier [SUBJ] eingelegt werden [PRED]

FUNCTION SHIFTS PER REGISTER AND TRANSLATION DIRECTION

[Figure: bar chart of function shifts (subj-*adv, subj-*obj, *adv-subj, *obj-subj) for E2G_ESSAY, G2E_ESSAY, E2G_FICTION, G2E_FICTION, E2G_INSTR and G2E_INSTR; axis from 0 to 50]
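
Counts of function shifts per register and translation direction could be derived from aligned, function-annotated units roughly as follows. This is a minimal sketch, not the CroCo tooling: the input format (tuples of corpus label, source function, target function), the function name `count_function_shifts` and the toy data are all hypothetical, and the shift labels are simply `source-target`.

```python
from collections import Counter


def count_function_shifts(aligned_pairs):
    """Count pairs whose grammatical function differs between source and target.

    aligned_pairs: iterable of (corpus_label, source_function, target_function).
    Returns a Counter keyed by (corpus_label, "source-target").
    """
    counts = Counter()
    for corpus_label, src_function, tgt_function in aligned_pairs:
        if src_function != tgt_function:  # identical functions are not shifts
            counts[(corpus_label, f"{src_function}-{tgt_function}")] += 1
    return counts


# Hypothetical toy data, not taken from the corpus.
pairs = [
    ("E2G_INSTR", "subj", "probj"),
    ("E2G_INSTR", "subj", "subj"),
    ("E2G_INSTR", "dobj", "subj"),
    ("G2E_FICTION", "adv", "subj"),
]

shifts = count_function_shifts(pairs)
for (label, shift), n in sorted(shifts.items()):
    print(label, shift, n)
```

Keying the counter by both corpus label and shift type keeps register and translation direction separable, which is what a per-register, per-direction comparison needs.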

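GRAMMATICAL FUNCTIONS IN THEME POSITION

[Figure: stacked bar chart of the shares of grammatical functions (subj, obj, compl, adv, verb, other) in theme position for EO_SHARE, ETRANS_SHARE, GO_SHARE and GTRANS_SHARE; axis from 0 to 120]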
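
Shares of grammatical functions in theme position, as plotted per sub-corpus, amount to a normalised frequency count. The following is a sketch under the assumption that each clause contributes one function label for its theme; the function name `theme_function_shares` and the toy list are hypothetical.

```python
from collections import Counter


def theme_function_shares(theme_functions):
    """Return the percentage share of each grammatical function in theme position."""
    counts = Counter(theme_functions)
    total = sum(counts.values())
    if total == 0:  # no clauses, no shares
        return {}
    return {function: 100 * n / total for function, n in counts.items()}


# Hypothetical theme labels for five clauses of one sub-corpus.
eo_themes = ["subj", "subj", "adv", "subj", "obj"]
shares = theme_function_shares(eo_themes)
print(shares)
```

Computing one such dictionary per sub-corpus (EO, ETrans, GO, GTrans) gives directly comparable percentages even when the sub-corpora differ in size.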

MT AND TRANSLATION FACTORS: REGISTER AND TRANSLATION DIRECTION

• one often speaks of “domains”, but that term is too vague

• Kurokawa et al. (2009)

– trained translation models according to translation direction (A), and without (B)

– for a performance of (A) equivalent to (B), they needed only ca. 1/5 of the data size

• feature selection problem: which features per register and translation direction (e.g. Diwersy et al. 2013, also an overview in Oakes & Ji 2012)
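
Training "according to translation direction" presupposes that each sentence pair is marked for which side is the original. A minimal sketch of the data selection step, assuming such metadata exists: the `SentencePair` class, its `original_language` field and the function `select_training_pairs` are hypothetical names for the illustration, not part of any MT toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    english: str
    german: str
    original_language: str  # "en" or "de"; hypothetical metadata field


def select_training_pairs(pairs, source_language, direction_aware):
    """Pick training data for a system translating from source_language.

    direction_aware=True  -> setting (A): keep only pairs whose original
                             side is the source language.
    direction_aware=False -> setting (B): keep all pairs.
    """
    if not direction_aware:
        return list(pairs)
    return [p for p in pairs if p.original_language == source_language]


# Hypothetical toy corpus.
corpus = [
    SentencePair("Tray 1 holds up to 125 sheets.",
                 "In Fach 1 können bis zu 125 Blatt Papier eingelegt werden.", "en"),
    SentencePair("Thank you.", "Danke.", "de"),
    SentencePair("Good morning.", "Guten Morgen.", "en"),
]

setting_a = select_training_pairs(corpus, "en", direction_aware=True)
setting_b = select_training_pairs(corpus, "en", direction_aware=False)
print(len(setting_a), len(setting_b))
```

Setting (A) always trains on a subset of (B), which is why a comparable performance with less data is the interesting outcome.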

POST-EDITING

INCREASING ROLE OF MT IN TRANSLATION

• MT is integrated into Translation Memories and many translation workflows (SDL 2011, Bajon et al. 2012, O'Brien 2012)

• since MT output needs to be post-edited, post-editing is becoming an increasingly important component of the translator’s job profile

CRITT TPR DATABASE

project coordinator: Copenhagen Business School

English-German data collection at FTSK in Germersheim

translation vs. post-editing vs. (blind) editing

6 source texts (ST) with different complexity levels (Hvelplund 2011)

12 professional translators, 12 semi-professional translators

MT system: Google Translate

eye-tracking (Tobii TX 300), key-logging (Translog II), retrospective questionnaires

EYE-TRACKING AND KEY-LOGGING POST-EDITING

PROCESSING TIMES

cf. Carl, Gutermuth & Hansen-Schirra in print

PROCESSING STYLES

[Figure: two graphs with the axes Time and Word number]

PROCESSING PATTERNS

[Figure: two graphs with the axes Time and Word number]

INTERFERENCE

ST: In a gesture sure to rattle the Chinese Government, Steven Spielberg pulled out of the Beijing Olympics to protest against China's backing for Sudan's policy in Darfur.

HT: Als Zeichen des Widerstands gegen die Chinesische Regierung...

‘As sign the-GEN resistance against the Chinese government…’

LACK OF CONSISTENCY

ST: Killer nurse receives four life sentences. Hospital nurse C.N. was imprisoned for life today for the killing of four of his patients.

PE: Killer-Krankenschwester zu viermal lebenslanger Haft verurteilt. Der Krankenpfleger C.N. wurde heute auf Lebenszeit eingesperrt für die Tötung von vier seiner Patienten.

‘Killer nurse.FEM to four times lifetime imprisonment sentenced. The nurse.MASC C.N. was today on lifetime imprisoned for the killing of four his.GEN patients.’

OVERVIEW

CONCLUSIONS AND SUGGESTIONS

FUTURE WORK

• Entrenchment of MT in TS (theory):

– common ground

– more acceptance

– improved description of MT workflow for the translator

– improved task descriptions for PE

SOME TENTATIVE SUGGESTIONS TO OURSELVES FOR BETTER TASK DESCRIPTION BASED ON TRANSLATOR CONCEPTS

Task description: As little as possible (rapid PE)

– Function of the text (e.g. Nord 2006, House 1997): documentary

– Terminology: conceptually equivalent; non-terms but also dispreferred or deprecated terms may be used

– Idiomaticity: unidiomatic, but understandable wording may remain (disambiguated at word level!)

Task description: Intermediate levels

– Function of the text: overt instrumental (usable, but identifiable as translation)

– Terminology: only terms, but also dispreferred and maybe deprecated

– Idiomaticity: idiomatic, but also non-standard phraseology

Task description: As much as possible (full PE)

– Function of the text: covert instrumental

– Terminology: only allowed terms can be used

– Idiomaticity: phraseology according to the domain

THANK YOU FOR YOUR ATTENTION!

... AND YOUR QUESTIONS, COMMENTS, ...

REFERENCES (1)

Bertoldi, Nicola, and Marcello Federico. 2009. “Domain Adaptation for Statistical Machine Translation with Monolingual Resources.” In Proceedings of the Fourth Workshop on Statistical Machine Translation, 182–189. Athens, Greece: Association for Computational Linguistics.

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. “The Mathematics of Statistical Machine Translation: Parameter Estimation.” Computational Linguistics 19 (2): 263–311.

Eisele, Andreas. 2006. “Parallel Corpora and Phrase-based Statistical Machine Translation for New Language Pairs via Multiple Intermediaries.” In 5th International Conference on Language Resources and Evaluation (LREC) 2006.

Eisele, Andreas, Christian Federmann, Hans Uszkoreit, Hervé Saint-Amand, Martin Kay, Michael Jellinghaus, Sabine Hunsicker, Teresa Herrmann, and Yu Chen. 2008. “Hybrid Architectures for Multi-Engine Machine Translation.” In Translating and the Computer 30. London, UK.

Fox, Heidi J. 2002. “Phrasal Cohesion and Statistical Machine Translation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 304–311. Philadelphia: ACL.

Gale, William A., and Kenneth W. Church. 1993. “A Program for Aligning Sentences in Bilingual Corpora.” Computational Linguistics 19 (1): 75–102.

House, Juliane. 1997. Translation Quality Assessment. A Model Revisited. Tübingen: Gunter Narr Verlag.

Koehn, Philipp, Franz Josef Och, and Daniel Marcu. 2003. “Statistical Phrase-Based Translation.” In Proceedings of HLT-NAACL 2003, 127–133.

Koehn, Philipp, and Josh Schroeder. 2007. “Experiments in Domain Adaptation for Statistical Machine Translation.” In ACL Workshop on Machine Translation 2007.

REFERENCES (2)

Kurokawa, David, Cyril Goutte, and Pierre Isabelle. 2009. “Automatic Detection of Translated Text and Its Impact on Machine Translation.” In Proceedings of MT Summit XII, the Twelfth Machine Translation Summit, International Association for Machine Translation, hosted by the Association for Machine Translation in the Americas.

Lapshinova-Koltunski, Ekaterina. 2013. “VARTRA: A Comparable Corpus for the Analysis of Translation Variation.” In Proceedings of the 6th Workshop on Building and Using Comparable Corpora, 77–86. Sofia, Bulgaria.

Lembersky, Gennadi, Noam Ordan, and Shuly Wintner. 2012. “Language Models for Machine Translation: Original Vs. Translated Texts.” Computational Linguistics 38 (4): 799–825.

Nord, Christiane. 1997. Translating as a Purposeful Activity. Functionalist Approaches Explained. Translation Theories Explained 1. Manchester: St. Jerome.

———. 2006. “Translating for Communicative Purposes Across Culture Boundaries.” Journal of Translation Studies 9 (1): 43–60.

Och, Franz-Josef, and Hermann Ney. 2003. “A Systematic Comparison of Various Statistical Alignment Models.” Computational Linguistics 29 (1): 19–51.

Reiss, Katharina, and Hans J. Vermeer. 1984. Grundlegung Einer Allgemeinen Translationstheorie. Linguistische Arbeiten 147. Tübingen: M. Niemeyer.
