Using an enhanced MDA model in study of World Englishes
Richard Xiao
University of Central Lancashire
2
Overview of the talk
• Biber’s (1988) MF/MD analytical framework
• The enhanced multidimensional analysis (MDA) model
• An MDA analysis of five varieties of English in the ICE
3
Factor analysis
• The key to the multidimensional analysis approach
• A common data reduction method available in many standard statistics packages such as SPSS
• Reducing a large number of variables to a manageable set of underlying factors or dimensions
• Extensively used in social sciences to identify clusters of variables
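As a rough illustration of what "reducing variables to underlying factors" means, the sketch below takes three invented feature frequencies across six invented texts, builds their correlation matrix, and extracts the single strongest component by power iteration. This is a simplified stand-in, not the extraction or rotation method used in the talk; all data and function names are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Correlation of every feature with every other feature."""
    return [[pearson(a, b) for b in columns] for a in columns]

def first_component(matrix, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(v[i] * mv[i] for i in range(k))
    return v, eigenvalue

# Invented frequencies per 1,000 words for six hypothetical texts.
pronouns = [60, 55, 50, 20, 15, 10]
contractions = [30, 28, 24, 8, 5, 3]
nominalisations = [5, 8, 10, 30, 35, 40]

corr = correlation_matrix([pronouns, contractions, nominalisations])
vector, eigenvalue = first_component(corr)
# Loadings: how strongly each feature is associated with the component.
loadings = [x * math.sqrt(eigenvalue) for x in vector]

for name, loading in zip(["pronouns", "contractions", "nominalisations"], loadings):
    print(f"{name:16s} {loading:+.2f}")
```

Features that co-occur (here pronouns and contractions) load with the same sign, and the feature in complementary distribution (nominalisations) loads with the opposite sign, which is the kind of pattern a dimension label then has to interpret.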
4
Biber’s MF/MD approach
• Established in Biber (1988): Variation across Speech and Writing (CUP)
– Factor analysis of 67 functionally related linguistic features
– 481 text samples, amounting to 960,000 running words
  • LOB
  • London-Lund
  • Brown corpus
  • A collection of professional and personal letters
5
Biber’s MF/MD approach
• Biber’s seven factors / dimensions
– Informational vs. involved production
– Narrative vs. non-narrative concerns
– Explicit vs. situation-dependent reference
– Overt expression of persuasion
– Abstract vs. non-abstract information
– Online informational elaboration
– Academic hedging
6
Biber’s MF/MD approach
• Influential and widely used
– Synchronic analysis of specific registers / genres and author styles
– Diachronic studies describing the evolution of registers
– Register studies of non-Western languages and contrastive analyses
– Research of University English and materials development
– Move analysis and study of discourse structure
• …largely confined to grammatical categories
7
The enhanced MDA model
• Enhancing Biber’s MDA by incorporating semantic components with grammatical categories
– Wmatrix = CLAWS + USAS
– A total of 141 linguistic features investigated
  • 109 features retained in the final model
– Five million words in 2,500 text samples, with one million for each of the 5 varieties of English
  • ICE – GB, HK, India, Singapore, the Philippines
  • 300 spoken + 200 written samples
  • 12 registers ranging from private conversation to academic writing
8
ICE registers and proportions
S1A (20%) Spoken – Private
S1B (16%) Spoken – Public
S2A (14%) Spoken – Monologue – Unscripted
S2B (10%) Spoken – Monologue – Scripted
W1A (4%) Written – Non-printed – Non-professional writing
W1B (6%) Written – Non-printed – Correspondence
W2A (8%) Written – Printed – Academic writing
W2B (8%) Written – Printed – Non-academic writing
W2C (4%) Written – Printed – Reportage
W2D (4%) Written – Printed – Instructional writing
W2E (2%) Written – Printed – Persuasive writing
W2F (4%) Written – Printed – Creative writing
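The proportions above can be checked against the sampling frame given earlier (2,500 samples over five varieties, i.e. 500 per variety, split into 300 spoken and 200 written). A minimal sketch, with the dictionary and function names chosen only for this example:

```python
# Register shares (percent of texts) as listed on the slide.
ICE_SHARES = {
    "S1A": 20, "S1B": 16, "S2A": 14, "S2B": 10,
    "W1A": 4, "W1B": 6, "W2A": 8, "W2B": 8,
    "W2C": 4, "W2D": 4, "W2E": 2, "W2F": 4,
}
TEXTS_PER_VARIETY = 500  # 2,500 samples / 5 varieties

def texts_per_register(shares, total):
    """Convert percentage shares into text counts for one variety."""
    return {code: total * pct // 100 for code, pct in shares.items()}

counts = texts_per_register(ICE_SHARES, TEXTS_PER_VARIETY)
spoken = sum(n for code, n in counts.items() if code.startswith("S"))
written = sum(n for code, n in counts.items() if code.startswith("W"))

print(counts)
print("spoken:", spoken, "written:", written)
```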
9
141 linguistic features covered
• A) Nouns: 21 categories, e.g.
– Nominalisation, other nouns; 19 semantic classes of nouns (e.g. evaluations, speech acts)
• B) Verbs: 28 categories, e.g.
– Do as pro-verb, be as main verb, tense and aspect markers, modals, passives, 16 semantic categories of verbs
• C) Pronouns: 10 categories, e.g.
– Person, case, demonstrative
• D) Adjectives: 11 categories, e.g.
– Attributive vs. predicative use, 9 semantic categories
10
141 linguistic features covered
• E) Adverbs (7 categories)
• F) Prepositions (2 categories)
• G) Subordination (3 categories)
• H) Coordination (2 categories)
• I) WH-questions / clauses (2 categories)
• J) Nominal post-modifying clauses (5 categories)
• K) THAT-complement clauses (3 categories)
• L) Infinitive clauses (3 categories)
• M) Participle clauses (2 categories)
• N) Reduced forms and dispreferred structures (4 categories)
• O) Lexical and structural complexity (3 categories)
11
141 linguistic features covered
• P) Quantifiers (4 categories)
• Q) Time expressions (11 categories)
• R) Degree expressions (8 categories)
• S) Negation (2 categories)
• T) Power relationship (4 categories)
• U) Definiteness (2 categories)
• V) Helping/hindrance (2 categories)
• X) Linear order (1 category)
• Y) Seem / Appear (1 category)
• Z) Discourse bin (1 category)
12
Procedure of data analysis
• 1) Data clean-up
• 2) Grammatical and semantic tagging with Wmatrix
• 3) Extracting the frequencies of 141 linguistic features from 2,500 corpus files
• 4) Building a profile of normalised frequencies (per 1,000 words) for each linguistic feature
• 5) Factor analysis
– Factor extraction (Principal Factor Analysis)
– Factor rotation (Promax)
– Optimum structure: 9 factors
• 6) Interpreting extracted factors
• 7) Computing factor scores
• 8) Using the enhanced MDA model in exploration of variation across registers and language varieties
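Steps 4 and 7 can be sketched as follows. The slides do not say how factor scores were computed; the sketch assumes the approach of Biber (1988), where a text's score on a factor is the sum of its standardised frequencies for features with salient positive loadings minus those with salient negative loadings. The texts, counts, feature names, and loadings are invented for illustration.

```python
import statistics

def per_thousand(count, n_words):
    """Normalise a raw count to a frequency per 1,000 words."""
    return count * 1000 / n_words

def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1 across texts."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def factor_scores(z_by_feature, positive, negative):
    """Per text: sum of z-scores of positive features minus negative ones."""
    n_texts = len(next(iter(z_by_feature.values())))
    return [
        sum(z_by_feature[f][i] for f in positive)
        - sum(z_by_feature[f][i] for f in negative)
        for i in range(n_texts)
    ]

# Hypothetical texts: (label, word count, raw feature counts).
texts = [
    ("conversation", 2000, {"contractions": 90, "nominalisations": 12}),
    ("academic", 2500, {"contractions": 5, "nominalisations": 100}),
    ("news", 2000, {"contractions": 20, "nominalisations": 50}),
]

features = ["contractions", "nominalisations"]
normalised = {
    f: [per_thousand(counts[f], n_words) for _, n_words, counts in texts]
    for f in features
}
standardised = {f: z_scores(values) for f, values in normalised.items()}

# Assumed loadings: contractions positive, nominalisations negative.
scores = factor_scores(standardised, ["contractions"], ["nominalisations"])
for (label, _, _), score in zip(texts, scores):
    print(f"{label:14s} {score:+.2f}")
```

Normalising first keeps long and short texts comparable; standardising keeps very frequent features from dominating the score.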
13
The enhanced MDA model
• Nine factors established in the new model
– 1) Interactive casual discourse vs. informative elaborate discourse
– 2) Elaborative online evaluation
– 3) Narrative concern
– 4) Human vs. object description
– 5) Future projection
– 6) Personal impression and judgement
– 7) Lack of temporal / locative focus
– 8) Concern with degree and quantity
– 9) Concern with reported speech
• Robustness of the model in register analysis
14
5 English varieties across 9 factors
[Figure: factor scores (y-axis) of the five varieties (GB, HK, IN, PH, SG) on Factors 1–9 (x-axis)]
• Both differences and similarities• This general picture may blur many register-based subtleties
– Language can vary across registers even more substantially than across language varieties (cf. Biber 1995)
15
1) Interactive casual discourse vs. informative elaborate discourse
• Indian English displays the lowest score in nearly all registers: it is less interactive but more elaborate
– Sanyal (2007): “clumsy Victorian English [that] hangs like a dead Albatross around each educated Indian’s neck”
• Modern BrE appears to be most interactive and least elaborate (e.g. S1A, S1B, W2D)
• 3 varieties of English used in East and Southeast Asia are very similar
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=9.04, 4 d.f., p<0.001
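The slides report an F value with 4 d.f. for each factor but do not name the test. Four degrees of freedom is what a one-way ANOVA across the five varieties would give (5 groups minus 1), so the sketch below assumes that test. The function name and the toy scores are invented for illustration and are not the talk's data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Invented factor scores for three texts per variety.
toy_scores = {
    "GB": [4.0, 6.0, 5.0],
    "HK": [1.0, 2.0, 0.0],
    "IN": [-6.0, -4.0, -5.0],
    "PH": [0.0, 1.0, 2.0],
    "SG": [1.0, 3.0, 2.0],
}

f_value, df_b, df_w = one_way_anova(list(toy_scores.values()))
print(f"F={f_value:.2f}, {df_b} d.f. between groups, {df_w} d.f. within groups")
```

A large F means the varieties' mean scores differ by more than the spread of texts within each variety would suggest; the p-value then comes from the F distribution with these two degrees of freedom.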
16
2) Elaborative online evaluation
• BrE generally shows a higher score than non-native varieties of English (e.g. W2A, W1B, S2B)
• Non-native English varieties tend to be very similar in most registers
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=14.13, 4 d.f., p<0.001
17
3) Narrative concern
• BrE demonstrates a greater propensity for narrative concern
– Most noticeably in news reportage (W2C) and instructional writing (W2D)
• Indian English is least concerned with narrative
– Esp. in registers like correspondence (W1B), instructional writing (W2D), and unscripted monologue (S2A)
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=7.97, 4 d.f., p<0.001
18
4) Human vs. object description
• Very close in a number of registers
• Indian English and BrE show similarity in a greater range of registers
• HK and Singapore Englishes display great similarity
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=5.92, 4 d.f., p<0.001
19
5) Future projection
• BrE has the highest score in all printed written registers (W2A–W2F)
• Indian English shows the lowest score in nearly all registers
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=47.63, 4 d.f., p<0.001
20
6) Personal impression / judgement
• Very similar in many registers…with most noticeable differences in non-printed written registers (W1A, W1B), non-academic writing (W2B), and news reportage (W2C)
• HK English displays a distribution pattern similar to Singapore English in spoken registers (S1A–S2B) and unpublished written registers (W1A, W1B), but it is very close to Philippine English in printed writing (W2A–W2F)
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=12.25, 4 d.f., p<0.001
21
7) Lack of temporal / locative focus
• Overall difference is not statistically significant
– …but there are noticeable differences in some registers (e.g. W1B, W2D)
• Indian English demonstrates a consistently higher score in spoken registers (S1A–S2B)
– …but a lower score in unpublished writing (e.g. W1B)
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=2.28, 4 d.f., p=0.058
22
8) Concern with degree / quantity
• BrE generally displays a higher score in nearly all registers
• HK English does not appear to be concerned with degree and quantity (e.g. W2D)
• Similarly, Indian English also lacks a focus on degree and quantity (e.g. W1B)
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=24.32, 4 d.f., p<0.001
23
9) Concern with reported speech
• Overall difference is not significant
• Noticeable difference in news reportage (W2C)
– East and Southeast Asian English varieties show a greater propensity for concern with reported speech than BrE and Indian English
[Figure: factor scores (y-axis) by register (S1A–W2F, x-axis) for GB, HK, IN, PH, SG]
F=1.51, 4 d.f., p=0.196
24
Summary and future research
• Summary
– Seeking to enhance Biber’s MDA model with semantic components
– Introducing the new model in research of World Englishes
• Directions for future research
– More native English varieties from the Inner Circle
– A wider and more balanced coverage of geographical regions
– Including socio-culturally relevant semantic categories
– Combining corpora and more traditional resources in socio-cultural studies and historical research
• …adequately descriptive + sufficiently explanatory…
25
Thank you!