Post on 25-Feb-2016

IA901 2012 Session Four

• Lab Session: Corpora

• What is a corpus?

• What do corpora tell us about the English language?

• Corpus-driven language description

• Practical application of corpora in the classroom

A link to last week…

HONIED or HONEYED?

ENJOY → ENJOYED

PLAY → PLAYED

WORRY → WORRIED

HURRY → HURRIED

MONEY → MONIED / MONEYED?

“Honeyed” is almost 40 times as common (online) as “honied”

Option 1              Option 2              Preferred plural form?
cowboys               cowsboy
cowgirls              cowsgirl
breakfasts            breaksfast
christmases           christsmas
businesses            busiesness
girl from Ipanemas    girls from Ipanema
mother-in-laws        mothers-in-law
gin and tonics        gins and tonic
tablespoonfuls        tablespoonsful
work of arts          works of art
hole in ones          holes in one
passerbys             passersby
governor-generals     governors-general
POWs                  POW

Also in relation to last week’s session, I found that:

“mother-in-laws” is almost 50% more common than “mothers-in-law”

“tablespoonfuls” is 12 times more common than “tablespoonsful”

“passersby” is almost 17 times more common than “passerbys”

“gin and tonics” is over 60 times more common than “gins and tonic”

“works of art” is over 250 times more common than “work of arts”

What is a corpus?

What can it tell us?

Where do you think this word list comes from?

And this?

created using wordle.net

So…

is my IA902 corpus a “principled collection of texts available for qualitative and quantitative analysis”? (Biber, Conrad, Reppen, 1998)

A history of corpora

1700s: Dr Johnson wrote the first comprehensive dictionary of English, compiled by manually collating samples of language from 1560-1660.

1960s: Brown Corpus of Standard American English: first of the modern, computer-readable, general corpora

1980s: John Sinclair & colleagues: Collins Birmingham University International Language Database (COBUILD)

1987: Collins COBUILD English Dictionary

1990: Willis: the Lexical Syllabus

2007: Cambridge International Corpus => 1 billion words

ANC        American National Corpus
BASE       British Academic Spoken English
BNC        British National Corpus
BoE        Bank of English
BROWN      Brown University
CIC        Cambridge International Corpus
CANCODE    Cambridge & Nottingham Corpus of Discourse in English
COBUILD    Collins Birmingham University International Language Database
MICASE     Michigan Corpus of Academic Spoken English

Corpora not limited to general or native-speaker data:

Business and Academic corpora

The International Corpus of Learner English

VOICE (The Vienna-Oxford International Corpus of English) is a collection of English as a Lingua Franca

Corpus development with the idea of SUEs (Successful Users of English) as a model

How big does a corpus need to be?

What do corpora tell us?

• Frequency of individual words

• Frequency of “chunks”

Frequency of individual words

      Word         Freq    %
 1    I            13      9.77
 2    YESTERDAY     9      6.77
 3    TO            8      6.02
 4    MM            6      4.51
 5    NOW           5      3.76
 6    OH            4      3.01
 7    SHE           4      3.01
 8    A             3      2.26
 9    AWAY          3      2.26
10    BELIEVE       3      2.26
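A word list like the one above can be built in a few lines. The following is a minimal sketch in Python, not the tool used for the slide: the function name `frequency_list`, the tokenising rule and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter
import re

def frequency_list(text, top_n=10):
    """Return (rank, WORD, count, percent) rows for the most frequent words."""
    # Assumption: a "word" is any run of letters or apostrophes, case ignored.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    rows = []
    for rank, (word, freq) in enumerate(Counter(tokens).most_common(top_n), start=1):
        # Percent = share of all running words in the text
        rows.append((rank, word.upper(), freq, round(100 * freq / total, 2)))
    return rows

# Invented sample text, for illustration only
sample = "the cat sat on the mat and the dog sat too"
for row in frequency_list(sample, top_n=3):
    print(row)
```

The % column is simply the count divided by the total number of running words. The figures in the table above are consistent with a text of about 133 words (13 / 133 = 9.77%).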

Within a bigger corpus (say, 5 million words), which words would you expect to occur most frequently? Write down 10 words that you’d expect to be in the top 50.

What differences would you expect to find between lists of the most frequent words in corpora of WRITTEN and SPOKEN English?

From O’Keeffe et al (2007)

O’Keeffe et al (2007) divide the 2000 most frequently occurring words in the CIC and CANCODE corpora into 4 sub-lists: A = 1-500, B = 501-1000, C = 1001-1500, D = 1501-2000.

Can you identify the most frequently-occurring word in each set below?

possibly, must, seem

just, clearly, honestly, pretty

house, TV, cheese, kids

sad, brilliant, lovely, terrible

eventually, always, usually, generally

explain, accept, help, listen

“The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)

Each row is listed in sub-list order, from A (most frequent) to D:

MODAL ITEMS: must, seem, possibly

STANCE WORDS: just, pretty, clearly, honestly

BASIC NOUNS: house, kids, TV, cheese

BASIC ADJECTIVES: lovely, terrible, brilliant, sad

BASIC ADVERBS: always, usually, eventually, generally

BASIC VERBS FOR ACTIONS AND EVENTS: help, listen, explain, accept

DELEXICAL VERBS: do

DISCOURSE MARKERS: so

GENERAL DEICTICS: here
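The four sub-lists are bands of 500 on a word's frequency rank, so the band can be worked out by integer division. This is a small sketch under that reading. The function name `sublist` and the ranks in the example are hypothetical and are not taken from O’Keeffe et al.

```python
def sublist(rank, band_size=500, labels="ABCD"):
    """Map a 1-based frequency rank to a sub-list label (None beyond the top 2000)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    index = (rank - 1) // band_size   # ranks 1-500 -> 0, 501-1000 -> 1, and so on
    return labels[index] if index < len(labels) else None

# Hypothetical ranks, for illustration only
for rank in (1, 500, 501, 1500, 2000, 2001):
    print(rank, sublist(rank))
```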

Three Relevant Word lists?

The General Service List (Michael West, 1953)

The Academic Word List (Averil Coxhead, 2000)

The Academic Keyword List (Magali Paquot, 2010)

Good news for the beginner?

Bad news for the advanced-level student?

From O’Keeffe et al (2007)

Frequency of “chunks”

• Collocation

• Strings of words

• Colligation

Definitions

Biber et al (2002):

Collocation : “a combination of lexical words which frequently co-occur in texts”

Lexical Bundle : “a sequence of words which is used repeatedly in texts”

Alternatives:

Collocation:

- “just the way we say it”?

- “the occurrence of two or more words within a short space of each other in a text” (Sinclair, 1991)

- “the relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey, 1991)

- “a psychological association between words (rather than lemmas) up to four words apart =…evidenced by their occurrence together in corpora more often than is explicable in terms of random distribution” (Hoey, 2005)

- “the lexical company that words keep” (Hoey, 2011)
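Several of these definitions rest on words co-occurring more often than chance would predict. The sketch below shows one crude way to check that: count the words within a span of the node word, then compare the observed count with the count expected from overall frequency. The names `collocates` and `association_ratio` and the toy text are assumptions for illustration. This is not the statistic used by any of the authors cited, and the default span of four only echoes Hoey's "up to four words apart".

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - span):i])   # left context
            counts.update(tokens[i + 1:i + 1 + span])   # right context
    return counts

def association_ratio(tokens, node, collocate, span=4):
    """Observed co-occurrences divided by the number expected by chance."""
    found = collocates(tokens, node, span)
    slots = sum(found.values())          # context positions actually examined
    if slots == 0:
        return 0.0
    expected = slots * tokens.count(collocate) / len(tokens)
    return found[collocate] / expected if expected else 0.0

# Invented toy text, for illustration only
tokens = "strong tea and strong coffee but powerful engine and strong tea".split()
print(collocates(tokens, "tea", span=1).most_common())
print(association_ratio(tokens, "tea", "strong", span=1))
```

A ratio above 1 suggests the pair occurs together more often than a random distribution would give. On a text this small the figure means very little, and a real corpus tool would use a more careful measure.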

Collocations Dictionaries

What words collocate with both STUDY and RESEARCH?

“Chunks” : how long? How significant?

Put the following items in order of the frequency with which they are used in spoken English:

a) a bit of
b) and things like that
c) regularly
d) since
e) this that and the other
f) twice

From O’Keeffe et al (2007)

a couple of, possible, at the moment, alone, all the time, fun, in terms of, something like that, expensive, you know what i mean, stairs, at the same time, nowhere

From O’Keeffe et al (2007)

Commonly-occurring six-word chunks:

1. Do you know _______ _______ _______?
2. At the end _______ _______ _______
3. And all the rest _______ _______
4. And all that sort _______ _______
5. I don’t know _______ _______ _______

Answers:

1. Do you know what I mean?
2. At the end of the day
3. And all the rest of it
4. And all that sort of thing
5. I don’t know what it is

From O’Keeffe et al (2007)
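Chunks of a given length can be found by counting every contiguous run of n words. The following is a minimal sketch: the function name `chunks` and the sample text are invented for illustration, and real corpus tools also handle punctuation and speaker turns.

```python
from collections import Counter

def chunks(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample text, for illustration only
text = "at the end of the day i mean at the end of the day it is fine"
tokens = text.split()

print(chunks(tokens, 6).most_common(1))   # most frequent six-word chunk
print(chunks(tokens, 2).most_common(3))   # most frequent two-word chunks
```

Running the same function with n = 2, 3, 4, 5 and 6 gives the kind of ranked chunk lists discussed above.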

“a bit” is the 24th most common two-word chunk in CANCODE

But… what does “a bit” mean? Does it have any meaning by itself?

How meaningful is “a bit” as a quantifier?

What about its “hedging” function?

It also belongs to several “frames”:

e.g. it was a bit of a mess / problem / performance / hassle / nuisance / bargain

COLLIGATION: Where lexis meets grammar?

Data on language usage tells us that:

• “a bit” is more likely than “the bit”

• “a bit” is likely to be followed by “of” + NP

• “a bit” is more likely to be used in an object position than a subject position

1. Which preposition is most likely to follow DIFFERENT – TO or FROM?

               DIFFERENT TO    DIFFERENT FROM
Brown                0               35
BNC Written          4               22
BNC Spoken          21               12
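The raw counts in the table are easier to compare as proportions. This short sketch uses the counts exactly as given above. The function name `share_of_to` is an assumption for illustration.

```python
# Counts copied from the table above: (DIFFERENT TO, DIFFERENT FROM)
counts = {
    "Brown": (0, 35),
    "BNC Written": (4, 22),
    "BNC Spoken": (21, 12),
}

def share_of_to(to_count, from_count):
    """Percentage of DIFFERENT + preposition cases that use TO."""
    total = to_count + from_count
    return 100 * to_count / total if total else 0.0

for corpus, (to_count, from_count) in counts.items():
    print(f"{corpus}: {share_of_to(to_count, from_count):.1f}% TO")
```

On these figures TO accounts for none of the Brown examples, about 15% of the BNC Written examples and about 64% of the BNC Spoken examples.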

From the Compleat Lexical Tutor:

Entitle – Active or Passive?

ILLUSTRATE and DRAW

Among the many differences you may have found between these two words, did you discover anything about COLLIGATION?

DRAW is a more frequent item than ILLUSTRATE.

Both verbs are frequently preceded by “to”.

Relatively speaking, ILLUSTRATE occurs significantly more frequently with “to” than DRAW does.

ILLUSTRATE is frequently used in INFINITIVE CLAUSES.

To illustrate this, we can compare concordance lists of each word using any of the websites linked to on the IA902 blog.
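For anyone who wants to see what a concordancer does, here is a minimal keyword-in-context sketch, together with a count of how often a word directly follows “to”. It is only an approximation of what the online tools do. The function names `concordance` and `preceded_by` and the sample sentence are assumptions for illustration.

```python
import re

def concordance(text, node, width=30):
    """Return keyword-in-context lines for every whole-word match of `node`."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up in a column
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

def preceded_by(text, node, previous):
    """Count matches of `node` that directly follow the word `previous`."""
    pattern = rf"\b{re.escape(previous)}\s+{re.escape(node)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Invented sample text, for illustration only
sample = "To illustrate the point, draw a line. We draw on data to illustrate trends."
for line in concordance(sample, "illustrate"):
    print(line)
print(preceded_by(sample, "illustrate", "to"), preceded_by(sample, "draw", "to"))
```

Dividing the `preceded_by` count by the total number of concordance lines for each verb gives the relative comparison described above.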

Widening context / Narrowing meaning

• Written and spoken contexts

• Semantic association

• Semantic prosody

Differences in spoken and written English:

- data on spoken English reflects an orientation to the “speaker-listener world in conversation” (I, you)

- spoken discourse markers (well, right)

- high frequency items that are arguably not words at all (yeah, oh, er)

What functions do ABSOLUTELY and DEFINITELY have in spoken English?

What would you expect to be the most common uses of the words LIKE and MEAN?

Collocates for LIKE & MEAN (BNC Written & Spoken + Brown)

LIKE: would=35 look=27 was=25 I=20 looked=18 looks=17 and=15 more=15 just=14 not=14 something=13 is=12 much=11 the=11 you=11 feel=10

MEAN: I=611 you=86 not=38 the=29 would=27 to=13 we=11 Didn’t=10 may=10 will=9 a=8 could=8 that=8 Don’t=7 it=7 can=6 necessarily=6 (mm=2)

Semantic association

Semantic prosody

Collocations: inner ear, glue ear; a clip round the ear; she whispered in his ear; ear, nose, and throat doctor; hear a voice in your ear

Semantic association: parts of the body

Semantic prosody???

What’s the difference between SKINNY and SLIM?

Slim : elegant, graceful

Skinny: sick, shy?

Differences between HANDSOME and PRETTY

How would you explain the difference between CAUSE and PROVIDE?

Materials

Corpus-informed publications for students

For teachers: Corpus-informed or “impulse-based”?

Activities

From Cobb (1997)

Discussion

Disadvantages?

- overly-reliant on technology?

- does navigation of corpora also require an element of “instinct”?

- the dangers of becoming “corpus-bound”

From O’Keeffe et al (2007)

For further exploration

- what do corpora tell us about existing theories of language? (see Hoey, 2005)

- how can YOU use corpora in your teaching?

- what use can you make of corpora in your research?
