
Using an enhanced MDA model in study of World Englishes

Richard Xiao

University of Central Lancashire

[email protected]


2

Overview of the talk

• Biber’s (1988) MF/MD analytical framework

• The enhanced multidimensional analysis (MDA) model

• An MDA analysis of five varieties of English in the ICE

3

Factor analysis

• The key to the multidimensional analysis approach

• A common data reduction method available in many standard statistics packages such as SPSS

• Reducing a large number of variables to a manageable set of underlying factors or dimensions

• Extensively used in social sciences to identify clusters of variables
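To make the idea of data reduction concrete, here is a minimal sketch in plain Python. It is not the procedure a package such as SPSS runs: it extracts only the first factor from a correlation matrix by power iteration (a principal-component-style shortcut) and applies no rotation. The four features and their per-1,000-word frequencies are invented for illustration.

```python
import math

# Hypothetical normalised frequencies (per 1,000 words) in six texts.
# Texts 1-3 are conversation-like, texts 4-6 academic-like. Invented data.
FEATURES = {
    "first_person_pronouns": [62, 58, 55, 12, 9, 15],
    "contractions": [30, 27, 25, 3, 2, 5],
    "nominalisations": [8, 10, 12, 40, 44, 38],
    "prepositions": [80, 85, 88, 130, 135, 125],
}


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Square matrix of pairwise correlations between feature columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def first_factor(corr, iterations=200):
    """Return the dominant eigenvalue of corr and the loadings on that factor."""
    n = len(corr)
    vec = [1.0] + [0.0] * (n - 1)  # start the iteration on the first variable
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vec = [p / eigenvalue for p in product]
    # Loadings are the eigenvector scaled by the square root of the eigenvalue.
    return eigenvalue, [v * math.sqrt(eigenvalue) for v in vec]


names = list(FEATURES)
corr = correlation_matrix([FEATURES[name] for name in names])
eigenvalue, loadings = first_factor(corr)
for name, loading in zip(names, loadings):
    print(f"{name:24s}{loading:+.2f}")
```

In this toy data the two "involved" features load positively and the two "informational" features load negatively on the same factor, which is the kind of cluster of co-occurring variables that factor analysis is used to identify.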

4

Biber’s MF/MD approach

• Established in Biber (1988): Variation across Speech and Writing (CUP)
  – Factor analysis of 67 functionally related linguistic features
  – 481 text samples, amounting to 960,000 running words
    • LOB
    • London-Lund
    • Brown corpus
    • A collection of professional and personal letters

5


• Biber’s seven factors / dimensions
  – Informational vs. involved production
  – Narrative vs. non-narrative concerns
  – Explicit vs. situation-dependent reference
  – Overt expression of persuasion
  – Abstract vs. non-abstract information
  – Online informational elaboration
  – Academic hedging

6


• Influential and widely used
  – Synchronic analysis of specific registers / genres and author styles
  – Diachronic studies describing the evolution of registers
  – Register studies of non-Western languages and contrastive analyses
  – Research of University English and materials development
  – Move analysis and study of discourse structure
• …largely confined to grammatical categories

7

The enhanced MDA model

• Enhancing Biber’s MDA by incorporating semantic components with grammatical categories
  – Wmatrix = CLAWS + USAS
  – A total of 141 linguistic features investigated
    • 109 features retained in the final model
  – Five million words in 2,500 text samples, with one million for each of the 5 varieties of English
    • ICE – GB, HK, India, Singapore, the Philippines
    • 300 spoken + 200 written samples per variety
    • 12 registers ranging from private conversation to academic writing

8

ICE registers and proportions

S1A (20%) Spoken – Private

S1B (16%) Spoken – Public

S2A (14%) Spoken – Monologue – Unscripted

S2B (10%) Spoken – Monologue – Scripted

W1A (4%) Written – Non-printed – Non-professional writing

W1B (6%) Written – Non-printed – Correspondence

W2A (8%) Written – Printed – Academic writing

W2B (8%) Written – Printed – Non-academic writing

W2C (4%) Written – Printed – Reportage

W2D (4%) Written – Printed – Instructional writing

W2E (2%) Written – Printed – Persuasive writing

W2F (4%) Written – Printed – Creative writing
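The percentages above can be turned into text counts with simple arithmetic. The sketch below assumes 500 samples per variety (2,500 samples across five varieties, as stated on the earlier slide); the function and variable names are illustrative only.

```python
# Register shares (percent of samples) as listed on the slide.
ICE_REGISTER_SHARES = {
    "S1A": 20, "S1B": 16, "S2A": 14, "S2B": 10,
    "W1A": 4, "W1B": 6, "W2A": 8, "W2B": 8,
    "W2C": 4, "W2D": 4, "W2E": 2, "W2F": 4,
}

# Assumption: 2,500 samples / 5 varieties = 500 samples per variety.
TEXTS_PER_VARIETY = 500


def texts_per_register(shares, total_texts):
    """Convert percentage shares into whole numbers of text samples."""
    return {code: total_texts * pct // 100 for code, pct in shares.items()}


counts = texts_per_register(ICE_REGISTER_SHARES, TEXTS_PER_VARIETY)
spoken = sum(n for code, n in counts.items() if code.startswith("S"))
written = sum(n for code, n in counts.items() if code.startswith("W"))
print(counts)
print("spoken:", spoken, "written:", written)
```

The spoken and written totals come out at 300 and 200, matching the split given for the corpus design.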

9

141 linguistic features covered

• A) Nouns: 21 categories, e.g.
  – Nominalisation, other nouns; 19 semantic classes of nouns (e.g. evaluations, speech acts)
• B) Verbs: 28 categories, e.g.
  – Do as pro-verb, be as main verb, tense and aspect markers, modals, passives, 16 semantic categories of verbs
• C) Pronouns: 10 categories, e.g.
  – Person, case, demonstrative
• D) Adjectives: 11 categories, e.g.
  – Attributive vs. predicative use, 9 semantic categories

10

141 linguistic features covered

• E) Adverbs (7 categories)
• F) Prepositions (2 categories)
• G) Subordination (3 categories)
• H) Coordination (2 categories)
• I) WH-questions / clauses (2 categories)
• J) Nominal post-modifying clauses (5 categories)
• K) THAT-complement clauses (3 categories)
• L) Infinitive clauses (3 categories)
• M) Participle clauses (2 categories)
• N) Reduced forms and dispreferred structures (4 categories)
• O) Lexical and structural complexity (3 categories)

11

141 linguistic features covered

• P) Quantifiers (4 categories)
• Q) Time expressions (11 categories)
• R) Degree expressions (8 categories)
• S) Negation (2 categories)
• T) Power relationship (4 categories)
• U) Definiteness (2 categories)
• V) Helping/hindrance (2 categories)
• X) Linear order (1 category)
• Y) Seem / Appear (1 category)
• Z) Discourse bin (1 category)

12

Procedure of data analysis

• 1) Data clean-up
• 2) Grammatical and semantic tagging with Wmatrix
• 3) Extracting the frequencies of 141 linguistic features from 2,500 corpus files
• 4) Building a profile of normalised frequencies (per 1,000 words) for each linguistic feature
• 5) Factor analysis
  – Factor extraction (Principal Factor Analysis)
  – Factor rotation (Promax)
  – Optimum structure: 9 factors
• 6) Interpreting extracted factors
• 7) Computing factor scores
• 8) Using the enhanced MDA model in exploration of variation across registers and language varieties
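Step 4 is plain arithmetic: a raw count is divided by the length of the text and multiplied by 1,000. The sketch below illustrates it for one text sample; the feature names and counts are invented, and the 2,000-word length is an assumption derived from five million words spread over 2,500 samples.

```python
def normalise_per_thousand(raw_counts, total_words):
    """Convert raw feature counts into frequencies per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return {feature: count * 1000 / total_words
            for feature, count in raw_counts.items()}


# Hypothetical raw counts for a single text sample (invented numbers).
sample_counts = {
    "first_person_pronouns": 124,
    "passives": 18,
    "nominalisations": 46,
}
sample_length = 2000  # assumed sample size in running words

profile = normalise_per_thousand(sample_counts, sample_length)
print(profile)
```

Normalising in this way makes samples of different lengths comparable before the profiles are passed on to factor analysis.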

13

The enhanced MDA model

• Nine factors established in the new model
  – 1) Interactive casual discourse vs. informative elaborate discourse
  – 2) Elaborative online evaluation
  – 3) Narrative concern
  – 4) Human vs. object description
  – 5) Future projection
  – 6) Personal impression and judgement
  – 7) Lack of temporal / locative focus
  – 8) Concern with degree and quantity
  – 9) Concern with reported speech

• Robustness of the model in register analysis
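The slides do not spell out how a text's score on a factor is computed. One common convention in multidimensional analysis, following Biber (1988), is to standardise each feature across texts and then add the z-scores of features with salient positive loadings and subtract those with salient negative loadings. The sketch below assumes that convention; the feature names, loadings, profiles and the 0.35 cut-off are all hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise a list of values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def factor_scores(profiles, loadings, cutoff=0.35):
    """Score each text on one factor from its normalised feature profile.

    profiles: list of dicts mapping feature name to frequency per 1,000 words
    loadings: dict mapping feature name to its loading on the factor
    cutoff:   hypothetical threshold below which a feature is ignored
    """
    salient = [f for f, load in loadings.items() if abs(load) >= cutoff]
    standardised = {f: z_scores([p[f] for p in profiles]) for f in salient}
    scores = []
    for i in range(len(profiles)):
        score = 0.0
        for f in salient:
            sign = 1 if loadings[f] > 0 else -1
            score += sign * standardised[f][i]
        scores.append(score)
    return scores


# Invented profiles: two conversation-like and two academic-like texts.
profiles = [
    {"contractions": 30, "first_person_pronouns": 62, "nominalisations": 8, "time_adverbs": 5},
    {"contractions": 25, "first_person_pronouns": 55, "nominalisations": 12, "time_adverbs": 6},
    {"contractions": 3, "first_person_pronouns": 12, "nominalisations": 40, "time_adverbs": 5},
    {"contractions": 2, "first_person_pronouns": 9, "nominalisations": 44, "time_adverbs": 4},
]
# Invented loadings; time_adverbs falls below the cut-off and is ignored.
loadings = {"contractions": 0.80, "first_person_pronouns": 0.70,
            "nominalisations": -0.75, "time_adverbs": 0.10}

scores = factor_scores(profiles, loadings)
print([round(s, 2) for s in scores])
```

With these toy numbers the conversation-like texts receive positive scores and the academic-like texts negative ones, which is how the positive and negative poles of a factor such as "interactive casual discourse vs. informative elaborate discourse" would be read.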

14

5 English varieties across 9 factors

[Chart: factor scores (y-axis "Factor score", -20 to 5) of GB, HK, IN, PH and SG on Factor1 to Factor9 (x-axis "Factors")]

• Both differences and similarities
• This general picture may blur many register-based subtleties
  – Language can vary across registers even more substantially than across language varieties (cf. Biber 1995)

15

1) Interactive casual discourse vs. informative elaborate discourse

• Indian English displays the lowest score in nearly all registers: it is less interactive but more elaborate
  – Sanyal (2007): “clumsy Victorian English [that] hangs like a dead Albatross around each educated Indian’s neck”
• Modern BrE appears to be the most interactive and least elaborate (e.g. S1A, S1B, W2D)
• The three varieties of English used in East and Southeast Asia (HK, PH, SG) are very similar

[Chart: factor score (y-axis, -50 to 60) by register (x-axis: S1A, S1B, S2A, S2B, W1A, W1B, W2A, W2B, W2C, W2D, W2E, W2F) for GB, HK, IN, PH and SG]

F = 9.04, 4 d.f., p < 0.001
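The slides report an F value with 4 degrees of freedom for each factor but do not name the test. A one-way ANOVA comparing the five varieties would give 5 - 1 = 4 between-group degrees of freedom, so the sketch below assumes that test. The factor scores are invented and far fewer than the 500 samples per variety in the real data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical factor scores for three texts per variety (invented numbers).
scores_by_variety = {
    "GB": [4.0, 5.0, 6.0],
    "HK": [1.0, 2.0, 3.0],
    "IN": [-3.0, -2.0, -1.0],
    "PH": [1.0, 2.0, 3.0],
    "SG": [2.0, 3.0, 4.0],
}

f_value, df_between, df_within = one_way_anova_f(list(scores_by_variety.values()))
print(f"F = {f_value:.2f}, {df_between} d.f. between groups, {df_within} d.f. within groups")
```

A p-value additionally requires the F distribution, for example `scipy.stats.f.sf(f_value, df_between, df_within)` if SciPy is available.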

16

2) Elaborative online evaluation

• BrE generally shows a higher score than non-native varieties of English (e.g. W2A, W1B, S2B)

• Non-native English varieties tend to be very similar in most registers

[Chart: factor score (y-axis, -6 to 8) by register (x-axis) for GB, HK, IN, PH and SG]

F = 14.13, 4 d.f., p < 0.001

17

3) Narrative concern

• BrE demonstrates a greater propensity for narrative concern
  – Most noticeably in news reportage (W2C) and instructional writing (W2D)
• Indian English is least concerned with narrative
  – Esp. in registers like correspondence (W1B), instructional writing (W2D), and unscripted monologue (S2A)

[Chart: factor score (y-axis, -8 to 8) by register (x-axis) for GB, HK, IN, PH and SG]

F = 7.97, 4 d.f., p < 0.001

18

4) Human vs. object description

• Very close in a number of registers
• Indian English and BrE show similarity in a greater range of registers
• HK and Singapore Englishes display great similarity

[Chart: factor score (y-axis, -6 to 3) by register (x-axis) for GB, HK, IN, PH and SG]

F = 5.92, 4 d.f., p < 0.001

19

5) Future projection

• BrE has the highest score in all printed written registers (W2A–W2F)
• Indian English shows the lowest score in nearly all registers

[Chart: factor score (y-axis, -8 to 10) by register (x-axis) for GB, HK, IN, PH and SG]

F = 47.63, 4 d.f., p < 0.001

20

6) Personal impression / judgement

• Very similar in many registers…with most noticeable differences in non-printed written registers (W1A, W1B), non-academic writing (W2B), and news reportage (W2C)

• HK English displays a distribution pattern similar to Singapore English in spoken registers (S1A–S2B) and unpublished written registers (W1A, W1B), but it is very close to Philippine English in printed writing (W2A–W2F)

[Chart: factor score (y-axis, -4 to 10) by register (x-axis) for GB, HK, IN, PH and SG]

F = 12.25, 4 d.f., p < 0.001

21

7) Lack of temporal / locative focus

• Overall difference is not significant statistically
  – …but there are noticeable differences in some registers (e.g. W1B, W2D)
• Indian English demonstrates a consistently higher score in spoken registers (S1A–S2B)
  – …but a lower score in unpublished writing (e.g. W1B)

[Chart: factor score (y-axis, -12 to 4) by register (x-axis) for GB, HK, IN, PH and SG]

F = 2.28, 4 d.f., p = 0.058

22

8) Concern with degree / quantity

• BrE generally displays a higher score in nearly all registers
• HK English does not appear to be concerned with degree and quantity (e.g. W2D)
• Similarly Indian English also lacks a focus on degree and quantity (e.g. W1B)

[Chart: factor score (y-axis, -6 to 5) by register (x-axis) for GB, HK, IN, PH and SG]

F = 24.32, 4 d.f., p < 0.001

23

9) Concern with reported speech

• Overall difference is not significant
• Noticeable difference in news reportage (W2C)
  – East and Southeast Asian English varieties show a greater propensity for concern with reported speech than BrE and Indian English

[Chart: factor score (y-axis, -6 to 10) by register (x-axis) for GB, HK, IN, PH and SG]

F = 1.51, 4 d.f., p = 0.196

24

Summary and future research

• Summary
  – Seeking to enhance Biber’s MDA model with semantic components
  – Introducing the new model in research of World Englishes
• Directions for future research
  – More native English varieties from the Inner Circle
  – A wider and more balanced coverage of geographical regions
  – Including socio-culturally relevant semantic categories
  – Combining corpora and more traditional resources in socio-cultural studies and historical research
• …adequately descriptive + sufficiently explanatory…

25

Thank you!
