Combining intuition with corpus linguistic analysis: A study of lexical chunks in four Chinese undergraduate students’ writing Maria Leedham FLaRN 2010

Post on 16-Dec-2015


Page 1: Combining intuition with corpus linguistic analysis: A study of lexical chunks in four Chinese undergraduate students’ writing Maria Leedham FLaRN 2010

Combining intuition with corpus linguistic analysis: A study of lexical chunks in four Chinese undergraduate students’ writing

Maria Leedham FLaRN 2010


BACKGROUND TO STUDY 2


Chunking through intuition: Study 1

RQ:
• To what extent can NSs and NNSs chunk NNS speech?

Data:
• Transcripts of 2 intermediate-level Japanese students’ speech
• Students were recorded 3 times, with a 2-month gap between recordings
• Total of approx. 1,500 words across the 6 transcripts

Method:
• Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
• Step 2: Japanese students asked to identify chunks in their own transcripts
• Step 3: author chunks transcripts with assistance from WordSmith Tools

(Leedham, 2006)


Example of chunked transcript from Study 1

Key:

italics - words classified by the NNS as a chunk

underline - words classified as a chunk by 2 or 3 of the 3 NSs

1 ahh…first err I, I learned, learnt? (mmhmm I learnt) err (2.0) I should err.. I

2 should be more positive? (right) positive… in UK because ahh…when, when I

3 went to London err… last Sunday (mhmm) ahh (2.0) some, some of the

4 underground line (mm) line was no service (oh dear) ((speaker laughs)) I was

5 really surprised and, because it can, cannot be (mm) in Japan (mm) you know,

6 sun- in, in Sunday, on? (mm) on Sunday many, many people (mm) come to

7 London (mm) and go around some place (mm)... so everyone need to, need a

8 train (mm) so, but maybe four or five lines… was not, no service (mm) so…

9 I… I have to think err what I should do ((speaker laughs)) and no, I’ve never, I

10 have never been to London that, so, this was the first time I’ve been to London

11 (mm) so…
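The key's "2 or 3 out of the 3 NSs" rule is a simple majority vote over per-word judgements. A minimal sketch follows, assuming each rater's underlining has been coded as a 0/1 list aligned with the transcript tokens. The function name and the three sets of judgements are hypothetical and are not data from the study.

```python
def majority_chunk_mask(rater_masks, min_votes=2):
    """Return True for each token that at least `min_votes` raters marked as inside a chunk."""
    n_tokens = len(rater_masks[0])
    if any(len(mask) != n_tokens for mask in rater_masks):
        raise ValueError("all raters must label the same token sequence")
    # zip(*rater_masks) yields, for each token, the tuple of votes from every rater.
    return [sum(votes) >= min_votes for votes in zip(*rater_masks)]


tokens = ["this", "was", "the", "first", "time", "I've", "been", "to", "London"]

# Hypothetical judgements from three raters (1 = token is inside a chunk).
rater_a = [0, 0, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_c = [0, 0, 1, 1, 1, 1, 1, 0, 0]

mask = majority_chunk_mask([rater_a, rater_b, rater_c])
underlined = [tok for tok, keep in zip(tokens, mask) if keep]
print(underlined)
```

Raising `min_votes` to 3 would keep only the words on which all three raters agreed.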


Findings from Study 1

Findings:
• Little inter- or intra-rater reliability
• Many ‘missing’ chunks (e.g. ‘of course’, ‘you know’), both across and within raters
• Frustrating and time-consuming task for NSs
• BUT… the Japanese students could do this task AND also offered insights into when/why… (e.g. student M: “I used to say that but now I know it’s not usual”.)
• The more time spent looking for chunks, the more will be found

Coda:
• A further recording, transcribing & awareness-raising cycle suggests that this resulted in uptake
• Both students found it highly motivating to record and analyse transcripts of their talk
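The slides do not say how rater reliability was assessed. One common way to quantify agreement between two sets of chunk judgements is Cohen's kappa, sketched here on the assumption that each token has been coded 1 (inside a chunk) or 0 (outside). Comparing two raters gives an inter-rater figure, and comparing one rater's two passes over the same transcript gives an intra-rater figure. The labels in the example are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Proportion of tokens on which the two codings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coding's own label proportions.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both codings use a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical token-level judgements (1 = inside a chunk).
first_pass = [1, 1, 0, 0]
second_pass = [1, 0, 0, 0]
kappa = cohens_kappa(first_pass, second_pass)
print(round(kappa, 2))
```

Kappa corrects raw percentage agreement for the agreement two raters would reach by chance, which matters here because most tokens fall outside any chunk.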


Chunking through intuition: Study 1

Method
• Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
• Step 2: Japanese students asked to identify chunks in their own transcripts
• Step 3: author chunks transcripts with assistance from WordSmith Tools v.5


STUDY 2:


Outline

1. Research questions

2. The students and the texts

3. The two methods

4. Findings

4.1 Method 1

4.2 Method 2

5. Conclusions and Implications


Research Questions

1. What can a study of lexical chunks reveal about these Chinese students’ writing?

2. What does each method contribute?


The Students

Wei
• Male
• BSc Engineering

Feng
• Female
• BSc Food Science with Business

Ping
• Female
• BA Hospitality, Leisure & Tourism Management (HLTM)

Hong
• Male
• BA HLTM

Criteria
- L1 Chinese (Mandarin or Cantonese)
- All secondary education in home country
- Contributions from years 1 & 2 and year 3 of undergraduate study


The texts

Student   Discipline     No. words   No. texts
Wei       Engineering    12,779      10
Feng      Food Science   13,683      10
Ping      HLTM           13,368      5
Hong      HLTM           8,537       4
Totals                   48,367      29

Reference corpora

Corpus                No. texts   No. words
English-Engineering   97          203,379
English-Food          28          73,402
English-HLTM          55          64,563
All-L1Chinese         146         279,695
All-L1English         611         1,335,676


Combining intuition and corpus searches

Method 1: Manual analysis

• Read all 4 Chinese students’ texts
• Read twice, with 6 months between readings
• Read equivalent, randomly selected English students’ texts
• Noted ‘salient’ features, then searched the corpora of the individual’s texts, the discipline, all Chinese students’ writing and all English students’ writing

Method 2: Key n-gram searches

• Used WordSmith Tools, v.5 (Scott, 2008)
• Searched for key n-grams in the corpus of texts from each student, using the relevant L1 English discipline corpus as reference
• Set p=0.00001 and deleted short n-grams contained within longer n-grams
• Compiled key n-gram lists
• Looked at concordance lines and texts for more context
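The n-gram counting and the "short n-grams within longer n-grams" clean-up in Method 2 can be sketched as follows. This is an illustration only, not WordSmith's procedure: the n-gram length range, the frequency cut-off and the rule "drop any n-gram that occurs contiguously inside a longer listed n-gram" are all assumptions, and the function names are hypothetical.

```python
from collections import Counter


def count_ngrams(tokens, n_min=3, n_max=5):
    """Count every contiguous n-gram with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def occurs_within(short, longer):
    """True if `short` appears as a contiguous run inside `longer`."""
    k = len(short)
    return any(longer[i:i + k] == short for i in range(len(longer) - k + 1))


def drop_nested(ngrams):
    """Keep only n-grams that are not contained in a longer n-gram on the list."""
    kept = []
    # Longest first, so every possible container is already in `kept`.
    for gram in sorted(ngrams, key=len, reverse=True):
        nested = any(
            len(gram) < len(longer) and occurs_within(gram, longer)
            for longer in kept
        )
        if not nested:
            kept.append(gram)
    return kept


text = "in the hospitality industry and in the hospitality industry".split()
counts = count_ngrams(text, n_min=3, n_max=4)
frequent = [gram for gram, freq in counts.items() if freq >= 2]
print(drop_nested(frequent))
```

In this toy input the two recurring 3-grams are absorbed by the recurring 4-gram, which is the only sequence left on the list.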


Formulaic sequences in sample of Wei’s writing (Engineering)

Introduction

A design methodology for a gearbox is presented in this report. The input horse power, the input speed and net reductions in the gearbox are the parameters to be specified. A gearbox takes an input shaft rotating and converts it via a gear train into up to three outputs, the process of designing a gearbox is to figure out which ratios are needed and to implement those ratios in the form of positioning various sizes of connected gears. The specification of the gearbox depends on its area of application.

• In this report, a gearbox is designed for a commercial meat slicer which has its final shaft rotating at between 80 and 100 rev/min. The input of the meat slicer is a constant speed AC motor running at 1800 rev/min and delivering 1.2 kW. A few points have to be considered on this system, the size of the gearbox is severe restricted, since it has to go onto a work surface where there is severe competition for space. And the motor may be in-line or at right angles to the grinder. Furthermore, the duty is expected to be up to 6 hours per day.


Idiosyncratic language

In one word computer based tools contribute an…

In one word the overall system can be described… (Wei, years 2 & 3)

In light of this, it is suggested that buying IHG…

In light of this, it can be suggested that…

In light of this, it is recommended that buying IHG… (Ping, year 3, in 1 text)

… but simply writing a responsible tourism policy is no longer enough. It is a must to show practical action,… (Hong, Year 1)

a winning city, the authorities of Liverpool have to rebuild its image to get rid of the negative picture. (Hong, Year 2)

…and boost its marketing campaigns in order to catch the world’s eyes on Scotland. (Hong, year 3)


Vague language

• In catering services, restaurants in Oxford and Bath are more or less the same. (Hong, Year 1)

• From those tables, the same thing as section 3.1 could be found … (Wei, Year 1).

• …a measurement system for measuring low-lever force, a kind of cantilever rig which is called…

• A kind of variable inductance sensor has been chosen…

• …Furthermore, with processing data, a kind of filter is always needed to separate certain… (Wei, year 2, same assignment)

• At that time, I found that this hotel is a little bit out of my expectation. (Hong, Year 2)


Vague language


N Concordance

1 of albumin solution and perchlorate acid. Therefore a bit of RNA was digested, and that gives a relatively high

2 acid and the reaction. The absorbance of tube 1 is a bit higher than the control, there might be a bit of DNase

3 greater, so it seems that the process has a little bit more risk to produce products over the LSL than to

4 the IBT and the conferences; however, there is a little bit different in the rate structure of the ILT. Since there

5 for introduction of contributory negligence may be a bit tight. Although contributory negligence may be

6 and hence that person does not mind paying a little bit extra for this. There is also the public perception that

7 lead them to a food source. Trail pheromones pose a bit of a problem for ants though because they need to

8 is something I'm not used to doing, so it comes as a bit of a shock. I did encounter difficulties using Xemacs

9 everyday use, this type to identity recognition seems a bit extreme, and the use of passwords and usernames

10 Seeing so many sliders and buttons may seem a bit overwhelming for some people. After reading the

• L1 English students use ‘a bit of a’ + N, e.g. ‘a bit of a problem’, ‘a bit of a shock’, ‘a bit of a dog’s breakfast’

• Often this is from reflective writing: ‘The conclusion was also a bit of a victim in my editings, bringing it down to one small sentence for each of the areas of discussion’. (6101c Cybernetics Year 3 essay)
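Concordance lines like those above show each hit of a search phrase with a window of co-text on either side (key word in context, KWIC). The study used WordSmith Tools for this. The sketch below is only a minimal stand-in, assuming whitespace-tokenised text and a hypothetical `kwic` helper, to show what such a search does.

```python
def kwic(tokens, phrase, width=5):
    """Return (left context, node, right context) for every hit of `phrase`."""
    target = [word.lower() for word in phrase]
    lowered = [token.lower() for token in tokens]
    k = len(target)
    lines = []
    for i in range(len(tokens) - k + 1):
        if lowered[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + k])
            right = " ".join(tokens[i + k:i + k + width])
            lines.append((left, node, right))
    return lines


# Fragment of concordance line 10 above, split on whitespace.
sample = "Seeing so many sliders and buttons may seem a bit overwhelming for some people".split()
hits = kwic(sample, ["a", "bit"], width=3)
for left, node, right in hits:
    print(f"{left:>25}  {node}  {right}")
```

Matching is case-insensitive but the original casing is kept in the output, so sentence-initial hits are not missed.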


Chunks with – and without – ‘I’ & ‘we’

• From the experiment, it was known that the mechanical properties of carbon steel AN and carbon steel N….

• It was found out the mechanical properties of carbon steel AN was incorrect in this experiment,…

(Wei, Year 1)

• Meanwhile, if we clipped the current probe round one of the motor supply leads, and connected it to Ch1 of the oscilloscope, we could get two copies of the transient starting current of the motor from the oscilloscope. From these two copies, we could calculated…


Chunks with – and without – ‘I’ & ‘we’

L1English students

[Chart: frequency per million words (pmw) of ‘I’ and ‘we’, by student(s); y-axis from 0 to 7,000 pmw]

N   Cluster            Freq.
1   I FEEL THAT        8
2   I WAS DOING        7
3   WHAT I WAS         7
4   I HAVE LEARNT      7
5   HAVE LEARNT THAT   6
6   I SHOULD HAVE      6
7   FEEL THAT I        5
8   I NEED TO          5
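Frequencies given in pmw (per million words) are raw counts scaled by corpus size, which is what allows corpora of very different sizes to be compared. A minimal sketch follows, using the word counts of the two L1 reference corpora from the texts slide. The raw count of 100 is hypothetical and is there only to show the effect of normalisation.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words (pmw)."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Word counts taken from the reference corpora table.
CORPUS_SIZES = {"All-L1Chinese": 279_695, "All-L1English": 1_335_676}

# Hypothetical raw count, identical in both corpora for illustration.
HYPOTHETICAL_RAW_COUNT = 100

rates = {
    name: per_million(HYPOTHETICAL_RAW_COUNT, size)
    for name, size in CORPUS_SIZES.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} pmw")
```

The same raw count yields a much higher pmw figure in the smaller corpus, which is why raw frequencies from the Chinese and English corpora cannot be compared directly.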


Linkers
• This can create a positive image for Scotland, on the other hand, (Ping, Year 3)
• …In other words, people are buying expectations... (Hong, Year 3)
• As a consequence, it can attract many travelers… (Hong, Year 2)
• On the contrary, the predominance of SMEs... (Ping, Year 2)
• First of all, the dimension of the brake disc is decided. (Wei, Year 3)
• What is more, Bath is served by a large number of local bus services… (Hong, Year 1)

References to data
• ‘as shown in table’ (Wei x 2, Ping x 2)
• ‘according to’ (Wei x 4)
• ‘as illustrated in table + NUMBER’ (Ping x 2)


Summary of method 1 findings

Salient chunks in the Chinese students’ writing were:

• Idiosyncratic chunks (‘in light of this’)

• Vague language (‘a bit of’) – though note English students’ use of ‘a little bit of’

• High use of chunks with ‘we’ and low use of chunks with ‘I’ – partly due to English students’ reflective writing

• Use of favoured linkers (‘on the other hand’)

• Reference to data in tables and figures (‘according to the equation’)

• BUT… very difficult to intuit chunks in unfamiliar disciplines


Method 2: Key n-gram searches

• Used WordSmith Tools, version 5 (Scott, 2008)
• Searched for key n-grams (= ‘key clusters’) in the corpus of texts from each of the 4 students
• Relevant discipline corpus from L1 English used as reference corpus
• p=0.00001; deleted short n-grams within longer n-grams
• Compiled a key n-gram list for each student
• Grouped these key n-grams into themes
• Looked at concordance lines for more context
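Keyness compares an n-gram's frequency in a study corpus with its frequency in a reference corpus. The slides do not state which statistic was selected in WordSmith, so the sketch below assumes a log-likelihood test in its common two-term form, with the p-value read from a chi-square distribution with 1 degree of freedom. The function names are hypothetical, and the output is not expected to reproduce the keyness values shown on the slides.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood for one item in a study vs a reference corpus."""
    total = freq_study + freq_ref
    combined = size_study + size_ref
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return max(0.0, 2 * ll)


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


def is_positively_key(freq_study, size_study, freq_ref, size_ref, alpha=0.00001):
    """True if the item is significantly MORE frequent in the study corpus."""
    overused = freq_study / size_study > freq_ref / size_ref
    ll = log_likelihood(freq_study, size_study, freq_ref, size_ref)
    return overused and p_value_1df(ll) < alpha


# Counts from the slides: 15 hits in Ping's 13,368 words,
# 0 hits in the 64,563-word English-HLTM reference corpus.
ll = log_likelihood(15, 13_368, 0, 64_563)
print(round(ll, 1), is_positively_key(15, 13_368, 0, 64_563))
```

A threshold as strict as p=0.00001 keeps the list short, but the cut-off is a choice made by the analyst rather than something the data dictate.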


Ping: HLTM

Rank  Cluster                              Ping Freq.  Ping Texts  L1EngHLTM Freq.  L1EngHLTM Texts  Keyness
1     the hospitality industry             16          3           42               12               60
2     recruitment and selection            15          1           0                0                56
3     in the hospitality industry          10          2           20               9                37
4     please see appendix                  10          1           0                0                37
5     with reference to appendix           8           1           0                0                30
6     higher than the original figure of   8           1           0                0                30
7     the new level of net profit          8           1           0                0                30
8     quality of service                   8           3           0                0                30
9     the cost of                          7           5           0                0                26
10    to the guests                        7           2           5                3                26
11    it is believed that                  6           2           2                2                22
12    of the employees                     6           1           0                0                22
13    there will be                        8           2           3                3                21
14    of the group                         8           1           1                1                21
15    to reach the break even point        5           1           0                0                19
16    on the other hand                    5           3           2                1                19
17    will be a                            5           2           3                3                19
18    high quality of service              5           2           0                0                19
19    cost of sales                        5           2           0                0                19
20    the nature of                        5           2           2                2                19
21    Watson and Head                      5           1           0                0                19
22    IHG annual report                    5           1           0                0                19
23    a higher contribution                5           1           0                0                19
24    Atrill and McLaney                   5           1           0                0                19
25    P E ratio                            5           1           0                0                19
25    served to the                        5           1           0                0                19

N-grams


Idiosyncratic language


N Concordance

1 the new level of net profit,£559.5, is 62.17% higher than the original figure of £345, which is a significant growth. g)

2 The new level of net profit,£609, is 76.52% higher than the original figure of £345. Business decision 8 Promotion

3 The new level of net profit,£545, is 57.97% higher than the original figure of £345. Business decision 7The other

4 new level of net profit is£477, which is 38.33% higher than the original figure of £345. Business decision 6There is a la

5 The new level of net profit,£513, is 48.70% higher than the original figure of £345. Business decision 5It is clearly

6 The new level of net profit,£541, is 56.81% higher than the original figure of £345. Business decision 4By

7 The new level of net profit,£527, is 52.75% higher than the original figure of £345. Business decision 3Since the

8 The new level of net profit,£625, is 81.16% higher than the original figure of £345. Business decision 2Since the

Ping's year 2 proposal

‘aim of the’, ‘of the assignment is to design’, ‘to develop an understanding of’ (Wei)

(the) aim / object + of the assignment is to design / is to develop an understanding of


Discipline-specific n-grams
• ‘Marriott Liverpool city centre’, ‘the Liverpool tourism industry’, ‘the tourism industry’ (Hong)
• ‘the hospitality industry’, ‘recruitment and selection’, ‘in the hospitality industry’ (Ping)

Passive voice
• ‘be worked out’, ‘can be calculated’ (Wei)
• ‘there will be’, ‘it is believed that’ (Ping)

References to data
• ‘with reference to appendix’, ‘please see appendix’ (Ping)
• ‘in the appendix’, ‘briefing sheet in appendix’, ‘is shown as’, ‘tables of data’, ‘were recorded as below’
• ‘was calculated with eq.’ (Wei)


Favoured linkers decrease over time

[Chart: frequency in pmw (y-axis from 0 to 350) of the linkers ‘on the other hand’, ‘in the long run’, ‘at the same time’, ‘in other words’ and ‘last but not least’ in four subcorpora: Chi12, Chi3, Eng12, Eng3]


Summary of method 2 findings

• Many of the same findings from method 1
  – idiosyncratic chunks
  – some linkers, esp. ‘on the other hand’
  – low use of chunks with ‘I’
  – references to data
• Also… discipline-specific chunks
• Easy to compare one student’s texts with the discipline reference corpus & each L1 reference corpus
• Similar findings occur within the Chinese students overall
• NB Keyness measures difference


Intuitive reading
Finds semantically whole units (formulaic sequences)

Plus
• A person can recognise single instances that a computer would miss
• The text is read as a complete document - as intended by the writer

Minus
• Time-consuming and tiring
• Problem of inter-rater reliability
• Problem of intra-rater consistency
• Hard to replicate

Key n-grams analysis
Finds frequent chunks (n-grams)

Plus
• Large quantities of data can be analysed quickly
• Accurate
• Easily replicable

Minus
• Single chunks are missed
• Arbitrary parameters
• Conflation of writing from lots of individuals
• Sense of text as complete document is lost


Combining methods…

• Combine the two methods through a recursive process of reading texts and checking the sequences in a corpus, also searching for key n-grams for less intuitive sequences.

“ultimately, the most revealing insights… will be gained from a closer look at the texts, the speakers, and the situational variables; quantitative analysis alone can never provide a satisfactory picture” (Simpson, 2004:41).


References

• Foster, P. (2001). “Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers”. In M. Bygate, P. Skehan & M. Swain (eds.), Task-Based Learning: Language Teaching, Learning and Assessment. London: Longman.

• Heuboeck, A., Holmes, J. & Nesi, H. (2007). The BAWE Corpus Manual. Retrieved from http://www.coventry.ac.uk/researchnet/d/505/a/5160.

• Leedham, M. (2006). “Do I speak better? – A longitudinal study of lexical chunking in the spoken language of two Japanese students”. In The East Asian Learner.

• Scott, M. (2008). WordSmith Tools v.5. Oxford University Press.

• Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.

• BAWE corpus: ESRC project number RES-000-23-0800