Register-specific interference in translation Gert De Sutter Ghent University Stefan Evert FAU Erlangen-Nürnberg Stella Neumann RWTH Aachen University


Page 1: Register-specific interference in translation · Biber’s bottom-up approach: ... Typology, and Register Analysis ... Characteristics of Translated Texts. A Multiv ariate Analysis

Register-specific interference in translation

Gert De Sutter, Ghent University

Stefan Evert, FAU Erlangen-Nürnberg

Stella Neumann, RWTH Aachen University


Register defined

Functional language variation according to use, determined by the context of situation (Halliday 1978)

Central role in language
  Language system: virtual collection of all possible linguistic features, no existence outside of linguistic theorising
  Register: actually available linguistic features in a given situational context
  Register determines the distribution of linguistic features and their concrete specification
  Probabilistic perspective on features
  No language use outside of register: Halliday & Hasan (1989, 40): we are never selecting with complete freedom from all resources of our linguistic system

(Matthiessen 1993, Halliday 1991)

Translators actively collect register-specific corpora to adapt their translation to the informally perceived features of the register


The multi-dimensional character of registers

Texts characterised by features on all levels of linguistic description
  all of them to be included in register analysis

Biber’s bottom-up approach: n dimensions as the result of multivariate analysis of corpora

SFL-inspired top-down approach: set of latent parameters that can be operationalised in terms of linguistic indicators

Both require a holistic approach

Biber (1995), Matthiessen (1993)


Translationese

Translation properties
  Effects such as normalisation/sanitisation, simplification, explicitation etc. that make translations distinct from non-translated texts (e.g. Baker 1996)

Disputed in translation studies, but
  Effect not uniform across language pairs (i.e. not universal), but probabilistic (Toury 2004), playing out specifically per language pair
  Multifactorial: contradictory effects of different features
  Register-specific (Neumann 2013, Delaere 2015)

Machine learning discriminates with high accuracy
  Text classification: orig. vs. trans. (e.g. Volansky et al. 2015)
  Indirect evidence for translationese, i.e. language use specific to translations


Interference / shining through

In SLA research: transfer of features from L1 into L2
Also observable in translations
  but: translations by default from the L2 into the L1, i.e. opposite direction from using a foreign language
  Why should the foreign language interfere with the native language?

Possible explanations
  Writing under the influence: individual features of the ST so salient that translator uses them in the TT
  Parallel activation of both language systems: since translation takes place between two languages, both languages must be activated and more than just the ST triggers might be at work (“genuine shining through”)


Lines of research our approach draws on

Traditional approach (e.g. Neumann 2013)
  comparison of translated and original texts (or texts and their translations) with respect to individual features
  theoretical interpretation in terms of linguistic functions

Machine learning: identification of translationese
  e.g. Baroni & Bernardini (2006), Koppel & Ordan (2011)
  usually based on low-level features (words, POS, n-grams)

Multivariate statistical analysis of variation (Biber 1988, …)
  latent (register) dimensions = groups of correlated features
  we use principal component analysis (PCA) instead of Biber's factor analysis (for mathematical reasons)
  But cannot show fine distinctions between originals and translations


Our approach (Diwersy, Evert & Neumann 2014, Evert & Neumann 2017)

Theory-driven choice of features
  Distance between feature vectors = (dis)similarity of texts wrt. theoretical framework / research question

Exploratory multivariate analysis
  identifies latent dimensions with PCA
  orthogonal projection interpreted as “perspective” on data set

Visualisation
  view shape of data set from different perspectives

Minimally supervised intervention
  introduce theory-neutral information (DE/EN and transl./orig.)
  linear discriminant analysis (LDA) identifies best ‘perspective’ for discrimination of these categories

Interpretation
  characteristic features of latent dimensions can be interpreted in terms of theoretical background (here: underlying functions)
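
The "minimally supervised" step can be illustrated with a two-group Fisher discriminant. This is only a sketch under stated assumptions: it uses plain Python rather than a statistics package, the function names are hypothetical, and the toy vectors are invented (two indicators instead of the full indicator set). It is not the authors' implementation.

```python
from statistics import fmean

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]

def within_class_scatter(groups):
    """Pooled within-class scatter matrix S_w summed over all groups."""
    dim = len(groups[0][0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for rows in groups:
        mu = mean_vector(rows)
        for row in rows:
            diff = [x - m for x, m in zip(row, mu)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += diff[i] * diff[j]
    return scatter

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def lda_direction(group_a, group_b):
    """Fisher discriminant w = S_w^{-1} (mean_b - mean_a), scaled to unit length."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    delta = [b - a for a, b in zip(mu_a, mu_b)]
    w = solve(within_class_scatter([group_a, group_b]), delta)
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]

def project(rows, w):
    """Position of each text along the discriminant axis."""
    return [sum(x * v for x, v in zip(row, w)) for row in rows]

# Invented toy vectors (NOT corpus data): two standardised indicators per text.
en_orig = [[1.0, 0.2], [1.2, -0.1], [0.8, 0.1], [1.1, 0.3]]
de_orig = [[-1.0, 0.1], [-0.9, -0.2], [-1.2, 0.2], [-1.1, 0.0]]
de_trans = [[-0.3, 0.1], [-0.5, 0.0], [-0.2, 0.2]]  # hypothetical translations into German

w = lda_direction(de_orig, en_orig)  # axis is fitted on the originals only
mean_de = fmean(project(de_orig, w))
mean_trans = fmean(project(de_trans, w))
mean_en = fmean(project(en_orig, w))
print(mean_de, mean_trans, mean_en)
```

In this toy setting the translations are projected onto an axis that was fitted without them, so a mean position between the two groups of originals would be read as shining through.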


Previous work

150 EN-DE and DE-EN translation pairs from the CroCo Corpus (Hansen-Schirra et al. 2012): focus on 5 relatively similar registers
  ESSAY, POPSCI, SHARE, SPEECH, WEB

27 lexico-grammatical indicators of underlying functions proposed by Neumann (2013) in the context of register theory
  Only comparable indicators
  E.g. nouns/tokens, finites/sentences, passives/verbs, imperatives/sentences, adverbial themes/themes, contracted forms/tokens, lexical density, tokens/sentences

Each text represented as a feature vector in multi-dimensional space characterized by the 27 indicators (as z-scores)

“Shining through” perspective: LDA for EN vs. DE originals
  Complemented by PCA dimensions capturing register variation
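
As an illustration of how a text becomes a feature vector of z-scores, the sketch below computes three of the ratio indicators named above from raw counts and standardises them across texts. The counts, the dictionary keys and the function names are invented for the example, and the use of the population standard deviation is an assumption; the slides do not specify these details.

```python
from statistics import fmean, pstdev

def indicator_ratios(counts):
    """Relative indicators for one text (illustrative subset of the indicator set)."""
    return {
        "nouns/tokens": counts["nouns"] / counts["tokens"],
        "passives/verbs": counts["passives"] / counts["verbs"],
        "tokens/sentences": counts["tokens"] / counts["sentences"],
    }

def z_score_columns(rows):
    """Standardise every indicator to mean 0 and standard deviation 1 across texts."""
    names = list(rows[0])
    columns = {name: [row[name] for row in rows] for name in names}
    # Assumption: population standard deviation; a sample SD would work equally well.
    stats = {name: (fmean(col), pstdev(col)) for name, col in columns.items()}
    return [[(row[n] - stats[n][0]) / stats[n][1] for n in names] for row in rows]

# Invented counts for three hypothetical texts (NOT corpus data).
texts = [
    {"tokens": 1000, "nouns": 250, "verbs": 150, "passives": 15, "sentences": 50},
    {"tokens": 2000, "nouns": 400, "verbs": 320, "passives": 64, "sentences": 80},
    {"tokens": 1500, "nouns": 450, "verbs": 200, "passives": 10, "sentences": 100},
]

vectors = z_score_columns([indicator_ratios(t) for t in texts])
for vector in vectors:
    print([round(value, 2) for value in vector])
```

Standardising puts indicators with very different scales (a proportion such as nouns/tokens versus a length such as tokens/sentences) on a common footing before distances between texts are computed.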


Multivariate analysis: EN / DE discriminant

[Figure; labels: EN, DE]


Multivariate analysis: Translations vs. originals

[Figure; labels: EN, DE]


Distribution along the language discriminant

[Figure; annotations: d = 1.5, d = -1.1, significant (***), shining through]
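
The slide does not define d. If it is read as Cohen's d, i.e. a standardised mean difference between two groups of texts along the discriminant (an assumption), it could be computed as in the following sketch. The scores are invented and `cohens_d` is a hypothetical helper name.

```python
from statistics import fmean, variance

def cohens_d(sample_a, sample_b):
    """Mean difference divided by the pooled sample standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (fmean(sample_a) - fmean(sample_b)) / pooled_var ** 0.5

# Invented discriminant scores (NOT the values behind the slide).
translations = [0.2, 0.5, 0.1, 0.4, 0.3]
originals = [-0.1, 0.1, 0.0, -0.2, 0.2]

effect = cohens_d(translations, originals)
print(round(effect, 2))
```

The sign of d depends only on the order of the two groups, which would be consistent with one positive and one negative value for the two translation directions.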


Evidence of direction-specific shining through

Separation of EN vs. DE original texts, with translations filling in space between the originals
  Clear evidence of shining through based on comparable register features

Directionality effect contradicts simple parallel activation assumption
  If shining through was due to parallel activation of both language systems, the effect should be the same in both directions and independent of register

Potential explanation: diverging prestige of the languages involved (Toury 2012)
  Prestige could be modulated by register


Hypothesis

Translators are aware of and react to register-specific translation requirements
  This includes the specific amount of shining through required/permissible in the given register

CroCo data suggests register effect, but too small for more detailed analysis
  Moreover, the robustness of the shining through effect would be corroborated if found for other language pairs with similar constellations in terms of prestige


The Dutch Parallel Corpus

Bi-directional corpus in the language pairs English-Dutch and French-Dutch (Macken et al. 2011)
  5017 texts, 10+ million word tokens
  Six domains
  Existing alignment not used for this study

Language-specific PoS tagging
  English sub-corpus re-tagged with CLAWS tagger (Garside 1987) for more specific coverage, higher accuracy

Currently focus on EN-NL and NL-EN translations


Preprocessing

Re-classification of text types drawing on Biber and Conrad’s (2009) situational characteristics (Delaere 2015, 61)
  New text types: Broad Commercial Texts, Instructive Texts, Journalistic Texts, Legal Texts, Political Speeches, Specialized Communication, Tourist Information, Fiction

Data: 504 pairs of original and matching translated texts
  Text types with ≤ 10 texts removed (Legal, Tourist, Fiction)
  At most 75 pairs per text type & translation direction
  Short texts with < 500 tokens removed (CroCo: texts of 500 – 5,000 tokens)

37 lexico-grammatical indicators as log-transformed z-scores
  Logarithmic transformation reduces skew & outliers, especially if short texts are not excluded
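
The effect of the logarithmic transformation on skew can be illustrated with a small sketch. The ratios are invented, the function names are hypothetical, and the small offset added before taking the logarithm is an assumption made here so that zero counts stay finite; the slides do not say how zeros are handled.

```python
import math
from statistics import fmean, pstdev

def z_scores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def log_z_scores(values, offset=1e-3):
    """z-scores of log(value + offset); the offset is an assumption, see lead-in."""
    return z_scores([math.log(v + offset) for v in values])

def skewness(values):
    """Third standardised moment: 0 for symmetric data, > 0 for a long right tail."""
    return fmean([z ** 3 for z in z_scores(values)])

# Invented passives/verbs ratios; the last text stands in for a short outlier text.
passives_per_verb = [0.02, 0.03, 0.04, 0.05, 0.06, 0.40]

raw_skew = skewness(passives_per_verb)
log_skew = skewness([math.log(v + 1e-3) for v in passives_per_verb])
print(round(raw_skew, 2), round(log_skew, 2))
```

With these invented values the outlier still stands out after the transformation, but the distribution is less skewed, so a single extreme text has less leverage on the subsequent PCA and LDA.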


Multivariate analysis: EN / NL discriminant

[Figure; labels: EN, NL]


Multivariate analysis: Translations vs. originals

[Figure; labels: EN, NL]


Distribution along the language discriminant

[Figure; annotations: d = 0.7 (***), n.s., shining through]


Discriminant distribution across text types

[Figure sequence over five slides; significance annotations on successive slides: *** / n.s., *** / *, *** / ***, n.s. / n.s.]


EN / NL discriminant: Feature weights

[Figure; labels: EN, NL]


EN / NL discriminant: Contributions

[Figure sequence over two slides; no text recoverable]

Discussion

Register distribution clearly visible

Smaller shining through effect than for EN-DE and only for Dutch
  Question of the language pair or of the corpus design/compilation?
  Plausible: (over) normalisation in English

Register-specific shining through
  Broad commercial and Specialised texts blur the distinction of languages AND display strongest shining through effect
  Unusual effect in Journalistic texts possibly due to incomparability of registers (methodological issue)
  Target language orientation of Political speeches


Summary & outlook

Analysis rests on the comparability of features
  Flaws in the choice of features and their counting will create artefacts

Corpus design matters

Include French data from the DPC
  More complex relationship between languages: interaction between French and Dutch

Methodological considerations
  How stable are the LDA and PCA dimensions?
  Idea: systematic bootstrapping of texts and features to determine which patterns are “real” in the plots
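
One simple form the bootstrapping idea could take is sketched below: texts are resampled with replacement and a statistic is recomputed each time, giving a percentile interval that indicates how stable the statistic is. This is an assumption-laden illustration, not the procedure the authors have in mind: it resamples texts only (not features), the statistic is a plain mean difference, and all names and scores are invented.

```python
import random
from statistics import fmean

def bootstrap_interval(group_a, group_b, statistic,
                       n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(group_a, group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        # Resample texts with replacement within each group.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        estimates.append(statistic(resample_a, resample_b))
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_resamples)]
    upper = estimates[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

def mean_difference(sample_a, sample_b):
    """Difference of group means along a (hypothetical) discriminant axis."""
    return fmean(sample_a) - fmean(sample_b)

# Invented discriminant scores (NOT corpus data).
translations = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.2, 0.4]
originals = [-0.1, 0.1, 0.0, -0.2, 0.2, 0.1, -0.1, 0.0]

observed = mean_difference(translations, originals)
low, high = bootstrap_interval(translations, originals, mean_difference)
print(round(observed, 3), round(low, 3), round(high, 3))
```

An interval that stays clear of zero would suggest that the separation is not an artefact of a few individual texts. Checking the stability of the LDA and PCA dimensions themselves would additionally require refitting them on every resample.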


Thank you for your attention!

Stella Neumann, Stefan Evert, Gert De Sutter


References

Baroni, Marco, and Silvia Bernardini. 2006. ‘A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text’. Literary and Linguistic Computing 21 (3): 259–274.

Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.

Biber, Douglas. 1995. Dimensions of Register Variation. Cambridge: Cambridge University Press.

Delaere, Isabelle. 2015. ‘Do Translators Walk the Line? Visually Exploring Translated and Non-Translated Texts in Search of Norm Conformity’. PhD dissertation, Ghent: University of Ghent.

Delaere, Isabelle, Gert De Sutter, and Koen Plevoets. 2012. ‘Is Translated Language More Standardized than Non-Translated Language?: Using Profile-Based Correspondence Analysis for Measuring Linguistic Distances between Language Varieties’. Target 24 (2): 203–24.

Diwersy, Sascha, Stefan Evert, and Stella Neumann. 2014. ‘A Weakly Supervised Multivariate Approach to the Study of Language Variation’. In Aggregating Dialectology, Typology, and Register Analysis. Linguistic Variation in Text and Speech, edited by Benedikt Szmrecsanyi and Bernhard Wälchli, 174–204. Berlin/New York: de Gruyter.

Evert, Stefan, and Stella Neumann. in press 2017. ‘The Impact of Translation Direction on Characteristics of Translated Texts. A Multivariate Analysis for English and German’. In Empirical Translation Studies. New Theoretical and Methodological Traditions, edited by Gert De Sutter, Marie-Aude Lefer, and Isabelle Delaere. Berlin: de Gruyter.

Garside, Roger, and Nicholas Smith. 1997. ‘A Hybrid Grammatical Tagger: CLAWS4’. In Corpus Annotation: Linguistic Information from Computer Text Corpora, edited by Roger Garside, Geoffrey Leech, and Anthony McEnery, 102–121. London: Longman.

Halliday, M. A. K. 1978. Language as Social Semiotic. The Social Interpretation of Language and Meaning. London: Arnold.

Halliday, M. A. K. 1991. ‘Towards Probabilistic Interpretations’. In Functional and Systemic Linguistics. Approaches and Uses, edited by Eija Ventola, 39–61. Berlin, New York: Mouton de Gruyter.

Halliday, M. A. K., and Ruqaiya Hasan. 1989. Language, Context, and Text: Aspects of Language in a Social-Semiotic Perspective. Oxford: Oxford University Press.

Hansen-Schirra, Silvia, Stella Neumann, and Erich Steiner. 2012. Cross-Linguistic Corpora for the Study of Translations - Insights from the Language Pair English-German. de Gruyter Mouton.

Koppel, Moshe, and Noam Ordan. 2011. ‘Translationese and Its Dialects’. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 1318–26. Portland, Oregon: Association for Computational Linguistics.

Macken, Lieve, Orphée De Clercq, and Hans Paulussen. 2011. ‘Dutch Parallel Corpus: A Balanced Copyright-Cleared Parallel Corpus’. Meta: Journal des traducteurs 56 (2): 374–90.

Matthiessen, Christian M. I. M. 1993. ‘Register in the Round: Diversity in a Unified Theory of Register Analysis’. In Register Analysis. Theory and Practice, edited by Mohsen Ghadessy, 221–292. London: Pinter.

Neumann, Stella. 2013. Contrastive Register Variation. A Quantitative Approach to the Comparison of English and German. Berlin, Boston: de Gruyter Mouton.

Toury, Gideon. 2004. ‘Probabilistic Explanations in Translation Studies. Welcome as They Are, Would They Qualify as Universals?’ In Translation Universals. Do They Exist?, edited by Anna Mauranen and Pekka Kujamäki, 15–32. Amsterdam/Philadelphia: Benjamins.

Toury, Gideon. 2012. Descriptive Translation Studies – and beyond: Revised Edition. 2nd ed. Vol. 100. Benjamins Translation Library. Amsterdam: Benjamins.

Volansky, Vered, Noam Ordan, and Shuly Wintner. 2015. ‘On the Features of Translationese’. Digital Scholarship in the Humanities 30 (1): 98–118.