Brian Walker University of Huddersfield

Post on 28-Apr-2018


Training Schedule – day#1

09:00 Lectures: Introduction to corpus linguistics: history, terminology and methodology

11:00 Break

11:15 Lecture and practical sessions: What can you do with a corpus? Introducing AntConc.

12:45 LUNCH

14:00 Practical sessions: AntConc

15:30 Break

15:45 Lecture: Building a corpus – theory and practice

16:45 Round-up / Q&A.

17:00 FINISH

Training Schedule – day#2

09:00 Lecture: Mark-up and annotation.

10:00 Practical session: adding mark-up and annotation

11:00 Break

11:15 Practical session: exploiting mark-up and annotation

12:30 LUNCH

13:30 Practical session: further exploitation of mark-up and annotation.

15:30 Break

15:45 Lecture: Using corpus tools to explore a comic strip

16:45 Round-up / Q&A.

17:00 FINISH

Introduction to Corpus Linguistics

Brian Walker, 2016

Introduction

We will look at ...

• Basic concepts and terminology.

• A little bit of history.

• What you can do with a corpus.

What is Corpus Linguistics?

Corpus linguistics uses naturally occurring language data for linguistic analysis.

• Large amounts (large samples) of language data

• In machine-readable form

• Analysed with computer software

What is corpus linguistics?

• Large samples of language data = corpus.

• Latin corpus: ‘body’ (plural corpora)

• Put simply: a corpus is a ‘body’ of text

What is Corpus Linguistics?

• Uses a corpus.

• Uses computers for analysis (not always the case).

• Empirical – analysing actual patterns of language use.

• Depends on quantitative and qualitative analytical techniques.

Biber, Conrad & Reppen (1998: 4)

Early Corpus Linguistics

• Field linguistics

• Boas’s studies of Native American languages

• Bloomfield’s description of Tagalog

• Hockett’s work on Potawatomi

• Harris’s emphasis on the importance of results being derived from data

[Photographs: Franz Boas, Leonard Bloomfield, Charles Hockett, Zellig Harris]

While until about 1880 investigators confined themselves to the collection of vocabularies and brief grammatical notes, it has become more and more evident that large masses of texts are needed in order to elucidate the structure of languages. (Boas 1917: 1)

Principles of Chomskyan linguistics

• Homogeneous underlying system of language

• Describe the language of the ideal speaker/hearer

• Focus on linguistic competence as opposed to linguistic performance

Corpus linguistics doesn’t mean anything. It’s like saying suppose a physicist decides, suppose physics and chemistry decide that instead of relying on experiments, what they’re going to do is take videotapes of things happening in the world and they’ll collect huge videotapes of everything that’s happening and from that maybe they’ll come up with some generalizations or insights. (Chomsky, quoted in Andor 2004: 97)

Problems with intuition

Issue of acceptability

• I was 19 when I started university

• I were 19 when I started university

Impossibility of studying certain aspects of language without recourse to corpus data

• Historical linguistics

• Language change/variation

• Language acquisition

…this [intuition] is a very strange notion of data. Normally one expects a scientist to develop theories to describe and explain some phenomena which already exist, independently of the scientist. One does not expect a scientist to make up the data at the same time as the theory, or even to make up the data afterwards, in order to illustrate the theory. (Stubbs 1996: 29)

Methodology vs. theory

Two views:

Methodologist

CL is a methodology for studying large amounts of language data using computer software

Neo-Firthian

CL is a sub-discipline of linguistics, concerned with explaining relationships between meaning and structure in language

What is a corpus?

• Machine-readable form

• Very large

• Representative sample

• (Standard reference)

• Often annotated

McEnery and Wilson (2001)

Machine readable form

• Nowadays, corpus = machine readable

• Corpora tend to sit on a computer

• Not always the case

Very large

• Corpora are usually very large: tens of thousands, hundreds of thousands, or millions of words.

• Usually a finite size

• Size decided at design stage – when size reached, data collection stops.

• Exception – monitor corpus

– E.g. COBUILD Corpus (Birmingham, UK)

– Dictionary compiling

A representative sample

• Corpora are so big that they can be a ‘representative sample’ of a language or a language variety

• Also depends a lot on design of corpus

– (more later)

(Standard reference)

• A corpus might be a standard reference or a ‘benchmark’ for a particular variety of language against which other texts or corpora can be compared

Annotation

• Just the words on their own = ‘raw text’.

• Annotation = extra information about what is in the corpus.

• Can help with the analysis of the data.

Annotation

Information about the text:

• Where it came from

• Who produced it

• Genre

• Etc.

Annotation

Adding information to the body of the text:

• e.g. gender of speaker;

• e.g. different parts of a text (headline, main story);

• e.g. the grammatical function of each word.

Annotation

• Annotation can be a manual process (takes ages)

• But some linguistic annotation can be done automatically

– e.g. word meaning (semantic)

– e.g. grammatical class of each word in the corpus (noun, verb, etc.)

What is a corpus?

• Machine-readable form

• Very large

• Representative sample

• (Standard reference)

• Often annotated

McEnery and Wilson (2001)

A corpus

• …a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration.

(McEnery & Wilson 2001: 32)

A brief history of corpora

The Survey of English Usage

• Instigated 1959 by Randolph Quirk at University College London

• One million words of written and spoken British English, made up of 200 text samples of 5000 words each

• Electronic version of the spoken data produced in collaboration with Lund University: the London-Lund Corpus

• Manually annotated for prosodic and paralinguistic features

• Grammatical structures for each text sample recorded on file cards

• Searching the corpus meant a trip to the Survey offices to search through filing cabinets of data!

Building the Brown corpus

• The Brown Corpus

• Built by Nelson Francis and Henry Kučera at Brown University, USA

• One million words of written American English (1961), made up of 500 text samples of 2000 words each

• Enabled frequency measures of words

• Confirmed Zipf’s law

• The most frequent word in a corpus is approximately twice as frequent as the second most frequent, and three times as frequent as the third most frequent, etc.

• Frequency is inversely proportional to rank
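The rank/frequency relationship can be checked with a few lines of code. The sketch below is illustrative only: the function name `rank_frequency`, the simple regular-expression tokeniser and the toy text are assumptions, not part of the original slides or of any corpus tool. If frequency is roughly inversely proportional to rank, then rank × frequency should stay roughly constant down the list.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    # Very rough tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Toy text invented for illustration; real corpora only approximate this pattern.
toy_text = "the " * 6 + "of " * 3 + "and " * 2
for rank, word, freq, product in rank_frequency(toy_text):
    print(rank, word, freq, product)
```

In the toy text the product is identical for every rank because the counts were chosen that way; with the word list of a real corpus, the product would be expected to vary around a rough constant rather than match it exactly.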

Extending the Brown family

• 1970-78: LOB

• Built by Geoffrey Leech and colleagues at Lancaster University

• One million words of written British English (1961), made up of 500 text samples of 2000 words each

• FROWN: Written American English from 1991

• FLOB: Written British English from 1991

• BE06: Written British English from early years of 21st century

• LOBalike: Written British English from 2011

Further developments: the British National Corpus (BNC)

• Currently, one of the best contemporary UK English corpora

• 100 million words from the early 1990s

• Represents a wide range of both spoken and written modern British English:

– Written data

• 90 million words

• Includes extracts from newspapers, academic books, popular fiction, letters and university essays

– Spoken data

• 10 million words

• Includes demographic data and context-governed data

• The demographic part

– Transcripts of about 900 everyday unscripted spoken conversations

• The context-governed part

– Spoken language collected in public contexts

– e.g. radio phone-ins, government meetings, classroom interactions

Making sense of meaning

• COBUILD project initiated at Birmingham in 1980 - resulted in the Bank of English

• English Lexical Studies 1963: Sinclair, Susan Jones and Robert Daley analysed a small corpus of spoken and written English to investigate the relationship between words and meaning

• Meaning is best seen as a property of words in combination

• Builds on J. R. Firth’s concept of collocation

What can you do with a corpus?

Frequency analysis

• A simple statistical measure can:

– offer an insight into how often particular words are used in a data set;

– be indicative of the overriding concerns expressed in a text;

– be used to investigate lexical change across time or differences between texts.
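Because texts differ in length, raw counts are usually normalised before texts are compared. The following is a minimal sketch, assuming a crude regular-expression tokeniser; the function names and the two sample strings are hypothetical and are not taken from any real manifesto.

```python
from collections import Counter
import re


def tokenize(text):
    """Split text into lower-case word tokens (a deliberately rough approximation)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(word, text, per=100):
    """Occurrences of `word` per `per` tokens of `text` (per=100 gives a percentage)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens) * per


# Invented snippets standing in for two texts from different years.
texts = {
    "text A": "we will build homes and we will build schools",
    "text B": "we offer choice in schools and choice in health",
}
for label, text in texts.items():
    print(label, round(relative_frequency("choice", text), 2))
```

Comparing percentages (or counts per thousand or per million words) rather than raw counts is what makes it meaningful to put texts of different sizes side by side.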

Frequency analysis

Frequency analysis – example:

[Figure: The changing frequency of choice in party political manifestos, 1945-2010. Horizontal axis: manifesto years (1945, 1950, 1951, 1955, 1959, 1964, 1966, 1970, Feb 1974, Oct 1974, 1979, 1983, 1987, 1992, 1997, 2001, 2005, 2010); vertical axis: 0 to 0.3.]

Concordances

• A concordance is a list of all the sentences in which our target word occurs.

• Concordances are helpful because they can allow us to see patterns of language use.

• Corpus linguistic software is used to generate concordance lines.
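To give a feel for what concordancing software does, here is a rough keyword-in-context (KWIC) sketch. It is not how AntConc or any other tool is implemented; the function name `concordance`, the character-based `width` parameter and the sample sentence are assumptions made for illustration.

```python
import re


def concordance(text, target, width=30):
    """Return one line per occurrence of `target`, with `width` characters of context each side."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the target word lines up in a column.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


sample = "She is slim with blonde hair. Their chance of happiness was slim."
for line in concordance(sample, "slim", width=20):
    print(line)
```

Lining the target word up in a single column is what makes recurring patterns to its left and right easy to spot by eye.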

Concordances

to shake, honey blonde hair cascading over slim shoulders. The girl just has to laugh. She's talk

hen I have a bath,' she laughed. Anthea - tall, slim and breathtakingly pretty - is nearly ten years

wigs, as he calls her, would want to drag this slim six-footer off the street. He had the same

s about six foot tall and very attractive. She is slim with blonde hair and looks like a catwalk mod

Amanda, of Chadwell Heath, Essex, had hoped to slim before the first wedding. Then she discovered

t again, Joy walked over to the table with two slim girls seated there. They were arguing about

the left read' These seats are for those of very slim build only'. Joy was now standing, reading the

d even though their chance of happiness was slim and Wickham was disliked, the marriage still

with nine centuries. Six feet (1.8 metres) tall, slim and athletic, his right-handed batting was less

to his two companions.' Aye,' replied another, slim and small as a child but with a face centuries

hree young children. She was dark-haired and slim , 32 years of age and pleasant. Her husband,

I could see no special contact between them. Slim in her dungarees, with her long, curly, chest

ecute impressively unimpressive water-tricks: slim , brown sprites. We return to the village by a

and fat. The other was a Sikh, very small and slim . They looked like a comic turn. There was a

give up totally, but settled for three or four slim cigars a day instead of ten to fifteen cigarette

built-in disappointments, ounce by ounce the slim frame turned to flab, and in the end Baxter

Collocations

• Collocation = relationship between words that tend to occur together

– Words that tend to occur near word X are the collocates of word X

– Based on frequencies

– Statistical measures
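The idea of counting collocates within a window around a node word, and then scoring them, can be sketched as follows. The function names, the `span` parameter and the toy token list are assumptions for illustration. The score shown is one common simplified form of mutual information that ignores window size; corpus tools differ in the exact formula they use.

```python
from collections import Counter
import math


def collocates(tokens, node, span=4):
    """Count the words occurring within `span` tokens either side of each `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def mutual_information(tokens, node, collocate, span=4):
    """Simplified MI: log2(observed co-occurrences * N / (f(node) * f(collocate)))."""
    freq = Counter(tokens)
    observed = collocates(tokens, node, span)[collocate]
    if observed == 0:
        return float("-inf")  # the pair never co-occurs within the window
    return math.log2(observed * len(tokens) / (freq[node] * freq[collocate]))


# Toy token list invented for illustration.
toy = "organised crime is crime and organised crime pays".split()
print(collocates(toy, "organised", span=1).most_common())
print(round(mutual_information(toy, "organised", "crime", span=1), 3))
```

Raw co-occurrence counts favour very frequent words, which is why a statistical score of some kind is normally applied before a collocate list is interpreted.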

Collocations

• Important in corpus linguistics.

• The company a word keeps can give that word implicit associations or assumptions.

• ‘You shall know a word by the company it keeps.’ (Firth 1957: 11).

Collocations

• Juvenile = young, youthful, a young person

– Collocates: delinquency, delinquent, delinquents, offenders, diabetes, crime, court

• Juvenile has negative associations

• Semantic prosody

Collocations

– Near-synonyms often differ in terms of their collocations

Collocations

• Young

– Collocates: mums-to-be, bloods, nubile, hopefuls, impressionable, up-and-coming

• Negative associations?

Collocations

Bart You're up to something, aren't ya?

Homer No! I'm just going out to commit certain deeds.

s. <p/> A39 57 A39 58 <h_><p_>The write way to commit murder<p/> A39 59 <p_><quote_>"Advice and inform

of God is manifested. <tf_>Kill, D03 78 rob and commit adultery<tf/> are all deeds forbidden in the D03

0 of a religious sect who orders his followers to commit suicide.<p/> D11 131 <p_>"God, permitting the mir

bility of episcopal ordination"<quote/> would not commit the D17 47 Methodist Church to the view that th

7 article. Take care though: don't let your words commit an editor to E10 98 using a specific picture, w

theory and deconstruction is such as to G67 189 commit the reasoner to defending certain values.<p/> G67

ithin the Service about offenders who continue to commit H09 191 crime while on bail<p/> H09 192 <p_>Whil

4m ($45m).<p/> H27 148 <p_>However, it would only commit itself to a forecast of H27 149 maintained sales

democracy from collapse, but this was to J41 142 commit <quote_>"a common fallacy in social thought which

45 163 the effort levels that they are willing to commit. Let contracts J45 164 with regard to effort be

2 <p_>Her cheeks flushed crimson and he strove to commit to memory P08 53 the lovely colour as the blood

222 never took the slot, although he did briefly commit to an ROTC A10 223 program before putting his na

1894.<p/> A26 13 <p_>Commissioners hesitated to commit themselves after one of the A26 14 monument's c

ote/> <quote_>"Cold Feet A32 243 - Why Men Won't Commit"<quote/> and <quote_>"Letting Go and Moving A32

B13 92 addressed men who use drugs or those who commit adultery, and who B13 93 get AIDS and other ven

ceeds rational basis. Since urban blacks B17 61 commit more crime proportionately (although not numerica

us consequences. Mr. C12 185 Deng was hounded to commit suicide in 1966 and his criticism is now C12 186

nd Jodie squabble C13 199 because he's afraid to commit to marriage.<p/> C13 200 <p_>Social issues, too,

ue <quote_>"is to do something about it, i.e., to commit oneself D03 187 to a way of life ..."<quote/><p/

n objective theistic D03 192 statement; it is to commit oneself to living life and to D03 193 understand

to be silly and trivial, because I don't want to commit D06 180 an overt, nonrational act and I don't wa

WN:E28\><h_><p_>SANITATION<p/> E28 2 <p_>HOW TO COMMIT BIOCIDE<p/> E28 3 <p_>In the strictest sense, s

an <tf|>offensive F04 52 position. That is, to commit to an aggressive daily-action plan F04 53 desig

form drives 15 F11 31 percent of its victims to commit suicide. (For a list of symptoms, F11 32 see 'A

, artificial persons make decisions that F37 23 commit other people. At the same time, the power to spea

act G22 13 open to us now would be unjust is to commit ourselves to avoiding G22 14 it. But what of pa

H08 57 exploiting the Gulf war as a pretext to commit terrorism.<p/> H08 58 <p_>While we can be proud

p/> H09 52 <p_>First, we must get the people who commit crimes out of the H09 53 community, and we must

<p_>And, it increases penalties for criminals who commit gun H09 69 offenses.<p/> H09 70 <p_>We have no

rease the penalties on those who use such guns to commit H09 120 crimes.<p/> H09 121 <p_>Mr. President, I

y requiring grantees H26 155 in most programs to commit their own funds for a portion of the H26 156 cos

to do with its value; to think so is to J30 27 commit a genetic fallacy. After I wrote this, I came acr

ereas J43 34 disengaged delinquents are free to commit a variety of illegal J43 35 activities, such fr

on a particular illegal J43 38 possibility. Why commit anti-gay violence versus rape or armed J43 39 r

_>Hitler understandably regarded people who could commit such J56 150 acts against Britain as his natural

rt with this J58 131 their so natural Right, but commit onely<&|>sic! the Administration J58 132 of such

lives K23 172 were before us. Rarely did anyone commit suicide. Here, hundreds of K23 173 people sit, w

asked Michael. <quote_>"Did you want P17 102 to commit suicide?"<quote/><p/> P17 103 <p_><quote_>"Oh, no

Collocations

• We might be able to guess some collocates intuitively, but corpus tools can help us identify others.

• If we suspect a word combination to be unusual or deviant, we can check our intuition by looking at the collocates for that word in a general corpus.

Collocations

[Context: This extract occurs towards the end of the sketch. Alan enters and very quickly summarises how the war finally came to an end.]

Alan (entering) But the tide was turning, the wicket was drying out. It was deuce – advantage Great Britain. Then America and Russia asked if they could join in, and the whole thing turned into a free-for-all. And so, unavoidably, came peace, putting an end to organised war as we knew it.

(Alan Bennett, Beyond the Fringe)

Collocations

• The most common post-modifying collocate of organised is crime

• In the BNC, crime appears as a collocate of organised 61 times in 41 texts

• War appears as a collocate of organised only once, and in that one instance organised comes after war: ‘When this war broke out organised Labour in this country lost the initiative’ (CE7565)

• Organised war is an unusual co-occurrence

• Why? Does it take on some meaning from the contexts in which it is habitually used, which convey an attitude or stance of the author?

Keywords

• A keyword is a word which occurs in a text or corpus more frequently than you would expect by chance alone

– … based on comparison with another (benchmark) corpus (e.g. the BNC)

– … and the difference has to be statistically significant

Keywords

Text #1 wordlist + Text #2 wordlist → Comparison process → Key words list

• Comparison process: apply statistical test (e.g. Log Likelihood).

• Key words list: the over-represented (and under-represented) words in text #1 when compared with text #2.

• Difference must be statistically significant.
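The comparison process can be sketched in code. The log-likelihood calculation below follows the formulation widely used for corpus comparison (observed frequencies set against expected frequencies derived from the two corpus sizes); the function names, the tuple layout of the result and the default threshold are assumptions for illustration. The threshold of 3.84 is the usual critical value for p < 0.05 with one degree of freedom.

```python
from collections import Counter
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood for one word: freq1 in a text of size1 tokens vs freq2 in size2 tokens."""
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    return 2 * ll


def keywords(tokens1, tokens2, threshold=3.84):
    """Return (word, score, 'over' or 'under') for words whose score reaches the threshold."""
    counts1, counts2 = Counter(tokens1), Counter(tokens2)
    size1, size2 = len(tokens1), len(tokens2)
    results = []
    for word in set(counts1) | set(counts2):
        f1, f2 = counts1[word], counts2[word]
        score = log_likelihood(f1, size1, f2, size2)
        if score >= threshold:
            # 'over' means relatively more frequent in text #1 than in text #2.
            direction = "over" if f1 / size1 > f2 / size2 else "under"
            results.append((word, score, direction))
    return sorted(results, key=lambda row: row[1], reverse=True)
```

A call such as `keywords(text1_tokens, text2_tokens)`, where both arguments are lists of word tokens, would return the candidate key words with the highest-scoring first. The list still needs interpreting by the analyst.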

Keywords

• A text’s keywords often point towards its content or its biases and/or can act as style markers (Enkvist 1973)

• Keywords are often a good guide to what would be interesting to look at in more detail

Why use a corpus?

• Allows linguists to access quantitative information about language, which can often be used to support qualitative analysis.

• Insights into language gained from corpus analysis are often generalisable in a way that insights gained from the qualitative analysis of small samples of data are not.

• We can look at patterns in large bodies of texts to identify language trends and tendencies, and test our intuitions.

Why use a corpus?

• Using corpus data forces us to acknowledge how language is really used (which is often different from how we think it is used)

• Computer analysis can also reveal atypical or unusual uses (relative to some norm), which may not be possible to observe through manual analysis

Some potential problems

• To give reliable results which will lead to useful findings, a corpus needs to be representative of the language data we are interested in (more on this later).

• Availability/difficulty of collecting data (e.g. historical texts; transcribing spoken texts).

• Accuracy of automated processes can vary with different tools and text-types.

Looking ahead

• Development of tools and technologies

• Corpus techniques increasingly used in other disciplines

• Inter-disciplinarity

• Multimodal corpora (e.g. Headtalk, Knight et al. 2008)

• Corpus Linguistics and Geographical Information Systems. This involves extracting place-names from a corpus, searching for their semantic collocates and creating maps to allow users to visualise how concepts such as war and money are distributed geographically (Gregory and Hardie 2011)

Summary

The basic idea:

• By analysing VERY large amounts of textual data, we can ...

• establish norms about the variety of language being studied

• test theories about language

• spot common and rare language phenomena

• reduce bias

Summary

The computer can’t do it all for us – we still have to analyse the results and ask ...

‘What does it all mean?’

Any questions so far?

References

• Andor, J. (2004) ‘The master and his performance: an interview with Noam Chomsky’, Intercultural Pragmatics 1(1): 93-111.

• Boas, F. (1917) ‘Introduction’, International Journal of American Linguistics 1(1): 1-8. [Reprinted in Boas, F. (1940) Race, Language and Culture, pp. 199-210. New York: The Free Press.]

• Gregory, I. and Hardie, A. (2011) ‘Visual GISting: bringing together corpus linguistics and Geographical Information Systems’, Literary and Linguistic Computing 26(3): 297-314.

• Knight, D., Adolphs, S., Tennent, P. and Carter, R. (2008) ‘The Nottingham Multi-Modal Corpus: a demonstration’, Proceedings of the 6th Language Resources and Evaluation Conference, Palais des Congrès Mansour Eddahbi, Marrakech, Morocco, 28-30th May.

• Stubbs, M. (1996) Text and Corpus Analysis. Oxford: Blackwell.