language corpora and the language classroom


Pérez-Paredes, P. & Díez Bedmar, B. 2010. Language corpora and the language classroom. Murcia: Consejería de Educación de la CARM. ISBN 978-84-692-4229-2


Pascual Pérez-Paredes & Belén Díez Bedmar (2009). Language corpora and the language classroom.

Materiales de formación del profesorado de lengua extranjera (Inglés) Murcia: Consejería de Educación y Cultura.


Language corpora and the language classroom

1. Introduction

These days, language corpora are being used by language teachers, researchers and students

more and more often. Computers have become widely available in homes and schools,

corpora can be searched on the Internet for free and corpus resources have improved the

quality and the access to the methods of corpus linguistics in applied fields such as foreign

language teaching. Compiling your own ad-hoc corpus or a corpus of your own students is

easier today than ever before and free resources abound.

The most important application of corpora in language classrooms is called Data-driven

learning. Corpus Linguistics (CL) and Data-driven learning (DDL) are two terms that have

caught the attention of teachers in foreign language teaching (FLT) and researchers alike for a

decade now. This is so because the assumptions behind CL and DDL are of enormous

importance to language researchers and FL teachers. In a very recent publication, O'Keeffe,

McCarthy and Carter (2007:21) state the following about the application of language corpora

in FLT:

As well as providing an empirical basis for checking our intuitions about language,

corpora have also brought to light features about language which had eluded our

intuition […] In terms of what we actually teach, numerous studies have shown us that

the language presented in textbooks is frequently still based on intuitions about how

we use language, rather than actual evidence of use.

It seems that language corpora can help us re-examine what appears to be undisputed

in prescriptive or intuition-led textbooks and other reference materials.

In the following paragraphs, we will offer a brief account of the implications of CL and DDL

for mainstream FLT. In particular, we aim to present useful insights into how using language

corpora can help our teaching.

Most of the resources presented in this chapter are freely available on the Internet.


2. Corpus linguistics and Data-driven learning in a nutshell

2.1. Data in FLT: preliminary issues

Data-driven learning is a language learning approach that is “basically developed through

self-conscious activities instead of being imparted through conceptual knowledge” (Pérez

Basanta, C and Rodríguez Martín: 146-7). In DDL, learners become active researchers: they

see language from a different perspective and discover language and communication facts that

otherwise may remain unseen.

In DDL, reading concordance lines is a usual practice. Take the word important, a basic

adjective that learners use on an everyday basis in schools. The following screenshot from

the Collins WordbanksOnline English corpus [1] shows fifty random uses of the word in a

10-million-word corpus of spoken British English:

[1] http://www.collins.co.uk/Corpus/CorpusSearch.aspx


Figure 1. Sample concordances of important in the Collins WordbanksOnline English corpus.

In a way, DDL promotes vertical reading rather than horizontal reading as learners are invited

to look at the accumulated frequency and co-occurrence of lexical items. In Figure 1, learners

could note the following:

The words to the left of important: more, most, quite, awfully, very, etc.

The words to the right of important: to + infinitive, factor, thing, point, etc.
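
The kind of vertical reading described above can also be sketched in a few lines of Python. This is only a minimal illustration, not the way Collins WordbanksOnline or any other concordancer works internally: the sample sentences are invented for the example, and the function names (tokenize, concordance, neighbours) are our own assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (straight apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def concordance(tokens, node, width=4):
    """Return (left context, node, right context) for every hit of the node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def neighbours(lines):
    """Count the words immediately to the left and to the right of the node."""
    left = Counter(l[-1] for l, _, _ in lines if l)
    right = Counter(r[0] for _, _, r in lines if r)
    return left, right

# Invented sample text, for illustration only (not corpus data).
SAMPLE = (
    "It is very important to listen. The most important thing is time. "
    "This is a very important point. Money is more important to him."
)

lines = concordance(tokenize(SAMPLE), "important")

# Print the hits with the node aligned in a column, as in a concordance display.
for left, node, right in lines:
    print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")

left_counts, right_counts = neighbours(lines)
print(left_counts.most_common())
print(right_counts.most_common())
```

Reading down the two printed columns is the programmatic counterpart of what learners do by eye in Figure 1: the counts simply make the accumulated frequency of each neighbouring word explicit.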

However, concordance lines are also useful for noting language behaviour that goes beyond the

boundaries of two adjacent words. Take the word sure, for instance. The

Cambridge Advanced Learner’s Dictionary [2] offers 8 entries for the word. You can find the

entries and examples below:

1: certain; without any doubt:

"What's wrong with him?" "I'm not really sure."

[2] http://dictionary.cambridge.org/


I'm sure (that) I left my keys on the table.

I feel absolutely sure (that) you've made the right decision.

It now seems sure (that) the election will result in another victory for the government.

Simon isn't sure whether/if he'll be able to come to the party or not.

Is there anything you're not sure of/about?

There is only one sure way (= one way that can be trusted) of finding out the truth.

See also cocksure.

2 be sure of/about sb to have confidence in and trust someone:

Henry has only been working for us for a short while, and we're not really sure about him yet.

You can always be sure of Kay.

3 be sure of yourself to be very or too confident:

She's become much more sure of herself since she got a job.

4 be sure of sth be confident that something is true:

He said that he wasn't completely sure of his facts.

5 be sure of getting/winning sth to be certain to get or win something:

We arrived early, to be sure of getting a good seat.

A majority of Congress members wanted to put off an election until they could be sure of

winning it.

6 be sure to to be certain to:

She's sure to win.

I want to go somewhere where we're sure to have good weather.

7 make sure (that) to look and/or take action to be certain that something happens, is true, etc:

Make sure you lock the door behind you when you go out.

8 If you have a sure knowledge or understanding of something, you know or understand it

very well:

I don't think he has a very sure understanding of the situation.

Isolated from any context, sure is usually taught as being highly assertive, that is, it is taught

to express certainty like I’m sure I was there. Of course, there is nothing wrong with this. As

you have read above, this is the usual mainstream use of the word. However, if we search for

sure in a corpus, in this case the SACODEYL English corpus of European young people, we

will find that a new pattern emerges clearly: I’m not sure + what / if / whether.

See Figure 2:


Figure 2. sure in the SACODEYL English corpus.

It appears that I’m not sure is a powerful pattern to express hedging or tentative opinion, as in

I’m not sure if I’d like to live there. It can also be followed by a canonical Subject + Verb + Complement

clause to indicate contrast or opinion, as in I’m not sure. I’ve always wanted to be... or in I’m

not sure. I find art relaxing because…

As you can see, when we examine the different contexts in which a node (that is, the

word you are looking up) is found, we can clearly see patterns of use that are not always found

in textbooks or dictionaries.

Corpus linguists often discuss this phenomenon and try to account for it by looking at

language as a lexico-grammatical field of interplay rather than one where meaning is created

by the use of a word in isolation (e.g. sure).

Bernardini (2004:16) highlights the fact that in DDL there is a “shift of emphasis from

deductive to inductive learning routines” which has a great impact on the agents of FLT. This

is summarised in Table 1:

FLT agents | Shift
Teachers | Become coordinators of research and facilitators
Learners | Learn how to learn through exercises that involve the observation and interpretation of patterns of use
Pedagogic grammars | Are now informed by enough evidence and stimuli for the learner to arrive at developmentally-appropriate generalisations

Table 1. Shift of emphasis in DDL-FLT (Bernardini 2004: 16-7).

DDL then is about using data to promote richer language learning experiences. The

definition needs clarification, though. The D in DDL stands for data, in other words, for language

data.

However, we should say that in the CL literature these data take on a markedly computational

reading. In the following paragraphs, we will explore the implications for language teachers and

dispel the obscurity that may surround the term.

2.1.1. Our English teaching is mediated by language data

We may not have reflected on the issue before, but when we decide on a textbook we are

opting for a particular set of language data to be used in our classroom.

In all probability, you face a situation where the Education Authorities have set an official

curriculum that you are bound to abide by. In a similar way, as a member of a large

institution, you are required to follow certain general methodological guidelines. Leaving

organizational aspects aside, however, teachers have the chance to reflect on their teaching

and choose the materials that best suit their learners. What choices can you make in terms of

the contents of your teaching? What are the main ingredients of your teaching? Do you stick

to a textbook? If so, to what extent do you or your Department consider the language in

there? Have you examined the language used in your textbook?

This is a fundamental issue that deserves our attention. EFL teachers, like most professionals in

other teaching areas, rely on reputable, reliable publishing houses that make an effort to mediate

between learners and their teachers. In this process, the teacher, or group of teachers of a

school, has the opportunity first to review and then to select the textbooks that will later be used.

If we use language corpora as a complement to our teaching, we will be broadening

the scope of the language that we present to our students and, certainly, we will be

enriching their learning environment (Aston 1997).

But, before we move on to dealing with the ways in which we can use language corpora, let

us consider briefly the very basics of corpus linguistics.


2.2. Introducing Corpus Linguistics

Corpus linguistics (CL) makes use of data to gain insight into how language works. A

well-known definition of corpus is the following:

Any collection of more than one text can be called a corpus, (corpus being Latin for

"body", hence a corpus is any body of text). But the term "corpus" when used in the

context of modern linguistics tends most frequently to have more specific connotations

than this simple definition [3].

This definition is well rooted in the linguistic tradition, and thus the connotations that

McEnery and Wilson bring up are concerned with the role of a corpus in a research-oriented

paradigm. These connotations are

representativeness,

size,

machine-readable form and

standard reference.

If linguists claim that using a corpus is a convenient way to research language use and

behaviour, they have to make sure that their tool, that is their language corpus, and their

methodology are geared towards maximizing the representative quality of the language

samples that have been included in the corpus. McEnery and Wilson have put it this way:

We are therefore interested in creating a corpus which is maximally representative of

the variety under examination, that is, which provides us with an as accurate a picture

as possible of the tendencies of that variety, as well as their proportions. What we are

looking for is a broad range of authors and genres which, when taken together, may be

considered to "average out" and provide a reasonably accurate picture of the entire

language population in which we are interested [4].

An example of all this is the British National Corpus (BNC). The BNC claims to be

representative of the English language used in the UK in the late 80s; its size (100 million

words) is big enough to include most communicative genres and text types; it is of course

electronic and, as a consequence of it all, it has become a standard reference for British

English. The BNC is introduced on its website as follows:

The British National Corpus (BNC) is a 100 million word collection of samples of

written and spoken language from a wide range of sources, designed to represent a

wide cross-section of British English from the later part of the 20th century, both

spoken and written. The latest edition is the BNC XML Edition, released in 2007.

[3] McEnery and Wilson. Corpus Linguistics. Available at http://bowlandfiles.lancs.ac.uk/monkey/ihe/linguistics/corpus2/2fra1.htm

[4] Idem.


The written part of the BNC (90%) includes, for example, extracts from regional and

national newspapers, specialist periodicals and journals for all ages and interests,

academic books and popular fiction, published and unpublished letters and

memoranda, school and university essays, among many other kinds of text. The

spoken part (10%) consists of orthographic transcriptions of unscripted informal

conversations (recorded by volunteers selected from different age, region and social

classes in a demographically balanced way) and spoken language collected in different

contexts, ranging from formal business or government meetings to radio shows and

phone-ins [5].

The BNC can be searched free of charge at http://www.natcorp.ox.ac.uk/. The results are

limited to 50 hits, but this is enough to give a clear idea of what we are looking into:

Figure 3. The BNC website.

However, using corpora is not the one and only solution to linguistic inquiry and

research. This is not the place to revisit the old controversy between Noam Chomsky and

Charles Fillmore, two influential linguists of the second half of the 20th century. The former

has overtly criticized the use of language corpora, which he does not see as a reliable way to

render the complexity and vastness of language. Chomsky believed that the rules governing a

language could actually be scrutinized through introspection; actual performance was

considered, by contrast, something that could not be apprehended. Fillmore criticised

armchair linguists who do not use real, that is, attested language data and instead

rely on their own intuition and idiolect to develop complex theories of language.

[5] From http://www.natcorp.ox.ac.uk/corpus/index.xml


Fillmore similarly criticises corpus linguists who waste their time on design

issues, but that’s a different story. The point here is that there has traditionally been a

controversy between introspection and data examination as valid tools for linguistic analysis.

Corpus Linguistics has now gained the interest of many researchers who believe that data need

to be collected before we can jump to conclusions about language use. In this sense, CL

methodology is empirical and data-driven.

Corpus-based research can then be characterised by two main features (Conrad 1999:3-4):

1. The use of a principled collection of naturally-occurring texts, that is, a corpus, such as the

BNC discussed above.

2. The use of computers for language analyses. Depending on the items being analysed,

these can be automatic or may need human interaction.

Corpus-based studies include both quantitative analyses and functional interpretations of

language use. The following table offers the basics of CL:


Term | Explanation
Chunks | Groups of words that cluster together in n-number of words, i.e., 2, 3, 4, 5, etc. These are not necessarily phrases (i.e. Noun Phrases) or clauses, but rather words that combine together in a statistically significant way. I don’t know, what I really mean or a couple of are good examples of chunks.
Collocates | Words that occur frequently in contiguity or almost in contiguity. To determine whether a collocate is significant, the software package performs statistical analyses.
Concordance lines | Lines of text which show a node in the middle. The node is the word or string of words that is being searched in a corpus.
Concordancer | The software that generates concordance lines.
Corpus | A principled collection of texts. This collection should follow strict design guidelines if the corpus is to represent a language or a register.
Wordlist | The list of words that are found in a corpus or in a particular text. This list usually shows the frequency of occurrence and, possibly, other statistical indexes.

Table 2. The basics of CL.
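
Two of the terms in Table 2, wordlist and collocates, can be illustrated with a short Python sketch. It rests on several assumptions of our own: the sample sentences are invented, the function names are hypothetical, and the collocate counts are raw co-occurrence counts within a small window. Real software packages go further and apply a statistical association measure (for instance mutual information or log-likelihood) before calling a collocate significant; that step is not attempted here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (straight apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def wordlist(tokens):
    """Frequency list: every word form with its number of occurrences, most frequent first."""
    return Counter(tokens).most_common()

def collocates(tokens, node, span=2):
    """Raw counts of the words found within `span` tokens on either side of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample text, for illustration only (not corpus data).
SAMPLE = (
    "I'm not sure if I like it. I'm not sure what to do. "
    "Make sure you lock the door."
)

tokens = tokenize(SAMPLE)
frequency_list = wordlist(tokens)
sure_collocates = collocates(tokens, "sure")

print(frequency_list[:5])
print(sure_collocates.most_common(5))
```

Even on such a tiny sample, not surfaces as a frequent left-hand neighbour of sure, which is the kind of observation a much larger corpus would allow us to make with confidence.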

All these terms are usually found in descriptive accounts of English and have a very

interesting potential in language learning. For example, chunks are strings of n-words that

cluster together in a systematic way. Linguists such as Lewis (1993) or Nattinger and

DeCarrico (1992) have stressed that lexis takes priority over grammar in discourse:

Lexis is central in creating meaning, grammar plays a subservient managerial role. If

you accept this principle then the logical implication is that we should spend more

time helping learners develop their stock of phrases, and less time on grammatical

structures [6].

Corpora are useful in revealing that the language speakers use relies heavily on chunking, that

is, the repetition of strings of words. O'Keeffe, McCarthy and Carter (2007:60) highlight that

“language is available for use in ready-made chunks to a far greater extent than could ever be

accommodated by a theory of language which rested upon the primacy of syntax”. Let us give

you real instances of chunking in English. These authors have used the CANCODE corpus [7], a

5-million-word corpus of spoken British English, to generate the most frequent chunks of

n words. These are the results for the top 1 and 2:

Chunk size | Top 1 chunk | Top 2 chunk
3-word chunks | I don’t know | a lot of
4-word chunks | you know what I | know what I mean
5-word chunks | you know what I mean | at the end of the
6-word chunks | do you know what I mean | at the end of the day

and these for the top 15 and 19 (chosen at random):

Chunk size | Top 15 chunk | Top 19 chunk
3-word chunks | I think it’s | you know the
4-word chunks | or something like that | that sort of thing
5-word chunks | I don’t know what it | an hour and a half
6-word chunks | and at the end of the | if you see what I mean (top 16)

[6] Islam and Timmis: http://www.teachingenglish.org.uk/think/methodology/lexical_approach1.shtml

[7] http://www.cambridge.org/elt/corpus/cancode.htm

O'Keeffe, McCarthy and Carter (2007:71) state that despite being syntactic fragments, these

chunks perform a very important pragmatic function beyond the word level and, significantly,

many have a discourse marking function (I mean, you know, you know what I mean, at the

end of the day, if you see what I mean,...).
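
Chunks of n words can be extracted from any text with a few lines of Python. The sketch below is a simplified illustration under our own assumptions: the sample text is invented (it is not CANCODE data), the function names are hypothetical, and chunks are counted by raw frequency only. Note also that this simple version lets chunks run across sentence boundaries.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (straight apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def chunks(tokens, n):
    """Count every run of n consecutive tokens in the token list."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Invented sample text, for illustration only (not corpus data).
SAMPLE = (
    "You know what I mean, at the end of the day it is fine. "
    "I don't know, do you know what I mean? At the end of the day we left."
)

tokens = tokenize(SAMPLE)

# List the most frequent chunks for each chunk size from 3 to 6 words.
for n in range(3, 7):
    print(n, chunks(tokens, n).most_common(2))
```

A text of N tokens yields N - n + 1 chunks of size n, so the lists grow quickly; on a real corpus a minimum frequency threshold is normally set before the results are read.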

In the same way, a corpus can be used to generate collocates, frequency lists and, as seen,

concordance lines. There are software packages that can handle this. WordSmith

5.0 [8] is probably one of the most complete suites available. Interesting non-commercial applications

include:

Generate concordance lines for every word in a text:

Text-based concordances: http://www.lextutor.ca/concordancers/text_concord/

Generate chunks for a text:

N-Gram phrase extractor: http://www.lextutor.ca/tuples/eng/

Search principled corpora:

Online concordancer: http://www.lextutor.ca/concordancers/concord_e.html

Generate concordance lines, frequency lists, etc.:

Turbo Lingo: http://www.staff.amu.edu.pl/~sipkadan/lingo.htm

[8] http://www.lexically.net/wordsmith/


Figure 4. Online concordancer.


2.3. How can we make use of Corpus Linguistics? Indirect approaches

Following Geoffrey Leech, Römer (2008) distinguishes between indirect and direct

applications of CL in the field of language teaching. Indirect approaches to corpora provide

access to corpus-informed insights into the nature of language. Those who consume this

information are typically, although not exclusively, researchers and language material writers

and designers. The typical users of direct approaches, by contrast, are teachers and learners themselves. The

following figure summarises this dichotomy:

Figure 5. Indirect and direct applications of CL in the language classroom (Römer 2008).

Direct approaches are focused on straight, hands-on learning activities and the generation of

classroom material. These direct hands-on experiences can be either guided or unguided by

the teachers, and thus it is likely that most teachers find tasks that are suitable to their

students’ needs and contexts.

Indirect approaches to using corpora in the language classroom have occupied the agenda of

applied linguists for over a quarter of a century now. These approaches have benefited from

linguistic research into the nature of language and offer a fresh non-normative view of

naturally occurring language. One of the main contributions of these studies is that corpus

data very often question our perceptions of how language works. A good example of this is

Biber (1988) and, particularly useful in the context of FLT, Biber et al. (1999):


Figure 6. Longman Grammar of Spoken and Written English (LGSWE).

The authors of the LGSWE claim that this work “describes the actual use of grammatical

features in different varieties of English: mainly conversation, fiction, newspaper language,

and academic prose […] The LGSWE adopts a corpus-based approach, which means that the

grammatical descriptions are based on the patterns of structure and use found in a large

collection of spoken and written texts, stored electronically, and searchable by computer”

(Biber et al. 1999: 4). So the idea here is that a well-designed corpus can be useful in learning

more about how language works. This is useful for both native and non-native speakers, as

even the former cannot rely on pure intuition to determine how language works across every

single register and communicative domain.

Let us have a look at one syntactic construction to illustrate the usefulness of corpora in the

language classroom. Existential clauses contain, in most cases, be as a verb and there as a

subject: There is no coffee is a nice example of locative here. There, however, can also introduce

other verbs: seem, appear, suppose and use to are nice examples. When should we use one or another,

given that their meanings are so close? In the LGSWE we find corpus-driven information that tells us

that the frequency of appearance of these verbs after existential there depends on the textual

and domain features of the communicative event.

Thus there exist/exists is very frequent in academic texts while it is rare or infrequent in

conversation, fiction and news language. There come/comes, on the contrary, is infrequent in

academic language, conversation and news, but very often found in fiction texts and creative

language use. Figure 7 illustrates this point:


Figure 7. Verbs other than be in existential constructions. Biber et al. (1999).

When these and similar verbs are followed by to be we discover interesting facts. There

seem/seems to be is found to occur across all 4 domains and textual types while there used to

be is untypical and not frequent at all in fiction, news or academic language:

Figure 8. To be after some verbs in some existential constructions. Biber et al. (1999).

In these examples we can note the interplay between grammatical categories and register.
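
Comparisons across registers are usually made on normalised rather than raw frequencies, because the sub-corpora differ in size. A common convention is to report occurrences per million words: the raw count multiplied by 1,000,000 and divided by the number of words in the sub-corpus. The Python sketch below illustrates the arithmetic only. The two "registers" are a handful of invented sentences, the function names are our own, and none of the LGSWE figures are reproduced. The search patterns match surface forms only, so an inflected form such as there came is not counted.

```python
import re

def word_count(text):
    """Number of word tokens in a text (straight apostrophes kept inside words)."""
    return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text))

def per_million(raw_count, corpus_size):
    """Normalised frequency: occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

def register_profile(registers, patterns):
    """For each register, the raw and per-million frequency of each pattern."""
    profile = {}
    for name, text in registers.items():
        size = word_count(text)
        profile[name] = {}
        for label, pattern in patterns.items():
            raw = len(re.findall(pattern, text, flags=re.IGNORECASE))
            profile[name][label] = (raw, per_million(raw, size))
    return profile

# Invented sample sentences standing in for two registers (not corpus data).
REGISTERS = {
    "academic": "There exists a clear link between the two. "
                "There exist several models of this.",
    "fiction": "There comes a time when you must leave. "
               "There came a knock at the door.",
}

# Surface-form patterns for two existential constructions.
PATTERNS = {
    "there exist(s)": r"\bthere exists?\b",
    "there come(s)": r"\bthere comes?\b",
}

profile = register_profile(REGISTERS, PATTERNS)
for name, rows in profile.items():
    for label, (raw, norm) in rows.items():
        print(f"{name:<10} {label:<15} raw={raw}  per million={norm:,.0f}")
```

With samples this small the per-million figures are of course meaningless; the point is simply that the same calculation, applied to sub-corpora of millions of words, is what makes statements such as "very frequent in academic texts but rare in conversation" comparable across registers.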


3. Direct approaches

As stated, direct approaches lend themselves more readily to immediate, straightforward classroom

applications. In some schools, it might be convenient to make use of a computer room while

in others teachers will prefer to develop materials that can be printed and later distributed. The

nature of the lesson will determine what kind of interaction we expect from our students.

3.1. Some tips

If you want your learners to plunge into using a corpus, our suggestion is to follow a

carefully-planned route:

1. Select a small group of learners. Using technology is cumbersome at times and computers

tend to crash in multimedia LANs which are often used by many. If your LAN restricts IPs or

domains, make sure beforehand that the sites you plan to use are available.

2. Avoid meta-language, such as linguistics, node or principled corpus. It is language, real

language that your learners will be more interested in.

3. Before getting your students to use a concordancer or a similar tool, distribute activities

where they can get used to reading vertically rather than horizontally. Make sure they get used

to interpreting the context and making hypotheses about contexts of use and prosodies, that is,

whether the line is used in a derogatory way or positively.

4. Select what you want your students to look up well beforehand. Examples or

activities that are over the top easily discourage students.

5. Try to put interesting questions to your students. Motivate them and make them become

interested in turning themselves into researchers or, better, detectives.

6. Select carefully the corpus you want to use. You may consider building your own corpus.

3.2. Activities: using SACODEYL

A corpus is an excellent tool to discover language behaviour and to learn more about

collocations and patterning. In teaching contexts, principled corpora may not adapt well to

your students’ level, especially if these are very young. We recommend that you build your

own collection of texts if they are suitable to your students’ needs. However, using

SACODEYL is a more straightforward option if you want to use teen talk, multimedia

corpora:


By using a corpus as a tool to find out about language, learners are given the chance to strengthen

their inductive skills to learn about language, which is highly instrumental for further

learning. Sinclair (2004:288) is definitely optimistic about the unmediated use of reference

corpora in the language classroom:

...both teacher and student can make use of a corpus right away, with only a modest

few hours orientation; there is no need to wait for the new textbooks and reference

books. Only fairly simple queries can be handled at this stage, but the results can be

illuminating and very helpful. For this, you will need a computer of normal

performance, a corpus and some query software. Will the corpus be 100% reliable,

comprehensive and representative? Of course not, but do your present books match

these targets? Or your reference grammars and dictionaries? Or any native speaker

models? Or any combination of these? Of course not.

Despite Sinclair’s statement, the teaching context in secondary education is still far from meeting many of the requirements above. Good reference corpora are commercial and search tools are difficult to handle.⁹ Mauranen (2004:1999) has voiced her concern about the actual use of innovation in classrooms:

No teaching method can become an important innovation, whatever its potential, if it does not make its way to the normal classroom where teachers and students can use it as part of their everyday routines, with not too much extra hassle.

Fortunately, there are now a few instances of pedagogical corpora whose focus is more on

learning than on linguistic research and which happen to be free to use. SACODEYL is one of

these pedagogically-motivated corpora. ELISA, its predecessor and inspiration, is another

interesting effort:

ELISA is a collection of video-based interviews with native speakers of different

varieties of English (e.g. US, England, Scotland, Ireland, Australia) and from different

walks of life. They talk about their professional career. All interviews follow a general

pattern, covering a similar range of topics, e.g. what the speakers do, their

educational background, how they started their career or business, the type of projects

they are involved in, their daily routines and future plans. While some of the speakers

engage in unusual professions (e.g. a tour guide at Ayers Rock, a guitar teacher, a

travel journalist and an arts therapist) and thus make for the attraction of the materials,

they all describe issues of general interest in professional contexts. The corpus

currently contains 25 interviews of 5 to 15 minutes. The transcripts amount to about 60,000 words in total.¹⁰

9 In 1998 Guy Aston and Lou Burnard published The BNC Handbook: Exploring the British National Corpus with SARA (Edinburgh Textbooks in Empirical Linguistics), an excellent reference book for fully exploiting SARA.

10 http://www.uni-tuebingen.de/elisa/html/elisa_index.html


SACODEYL offers young learners the language and the voices of their peers. As in ELISA,

SACODEYL kids talk about their daily routines, about themselves, their schools, their

hometowns, their leisure time activities and hobbies, films, books, sports and many other

topics.

The SACODEYL corpus has been annotated with a view to pedagogical applications. This makes SACODEYL a very interesting complementary material in mainstream teaching, where teachers and students can find a familiar range of language and communication contexts. The following figure illustrates this:

Figure 9. SACODEYL search categories.

These categories resemble the language and the communication-oriented methodology of mainstream language teaching. Learners and teachers using SACODEYL may want to navigate the English corpus in exactly the same way as they navigate the contents of their textbook. In SACODEYL, every interview has been split into sections, that is, convenient teaching and learning stretches of language which have a pedagogical value. Each section has been annotated by experienced teachers, who have assigned it a full array of categories and subcategories. Once annotated, the corpus can be searched accordingly:


Figure 10. SACODEYL search categories in detail.

Users can also browse interviews:


Figure 11. Browse area for SACODEYL English corpus.

Users can also browse sections within interviews, or search for sections that meet the criteria they set:

Figure 12. Browse area for SACODEYL section search.

Let us consider some activities for the language classroom. We assume that your learners are Secondary School students of English, so we will use the SACODEYL English corpus, a small


corpus of teenage talk contributed by some 25 interviewees from the Reading area in the UK.

Here is a selection of activities that illustrate the type of

3.2.1. Activities focused on communication and attention to form

Tell your students to search for [Reading]. You may want to introduce them to the area and

neighbouring cities, all of them widely known. Ask them to read the concordance lines and

get them to classify (A) words on the left, (B) words on the right and (C) contexts of use:

Figure 13. Simple SACODEYL word search.

The following screen shows the number of hits by displaying the concordance lines:


Figure 14. SACODEYL Search tool.

You may want to guide your students in their search. Providing tables to fill in is usually very

productive as this keeps students focused on the task, which becomes more convergent:

A. Write here the most frequent words or punctuation to the left of Reading:
   (like, feel, tell) about
   (live, be) (here) in
   the (centre, outskirts) of

B. Write here the most frequent words or punctuation to the right of Reading:
   as a place
   . / ?
   festival

C. Guess: What is being talked about?
   Context 1: Reactions to / opinions on your hometown / where you live
   Context 2: Staying in Reading or leaving Reading / Travelling
   Context 3: Reading festival

Table 3. Fill-in table.

In A and B, students are invited to observe the surrounding context of a word and note which items accumulate to the left or to the right of the node. In C, students are invited to make hypotheses about what is being talked about. If desired, you can explore uses of like about / feel about / tell about or [Murcia / Cartagena as a place] or, from a more communicative perspective, ways of expressing opinions about your city or the place where you live. If you tell your students to search for [like about], they will be given instances where kids use it in a real context, embedded in the flow of speech. More importantly, your students will be presented with an opportunity to disambiguate other uses of [like about]:

Figure 15. SACODEYL Search tool.

In the case highlighted above, [like about] is used as a hedge, a very common feature of spoken English. This is a convenient way to combine communication-oriented teaching and form-focused instruction. This range of activities focuses on analysing the context of use of a given word, [Reading], both linguistically and communicatively.
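For teachers who compile their own collection of texts, the left and right columns of the fill-in table can be approximated with a few lines of Python. The sketch below is only an illustration: the sample sentences are invented (they are not SACODEYL data), the function names are hypothetical, and SACODEYL's own search tool does not necessarily work this way.

```python
import re
from collections import Counter

def tokenize(text):
    # Words (with internal apostrophes) and basic punctuation as separate tokens.
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,?!]", text)

def concordance(tokens, node, width=4):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def neighbours(lines):
    """Count the token immediately to the left (A) and right (B) of the node."""
    left = Counter(l[-1].lower() for l, _, _ in lines if l)
    right = Counter(r[0].lower() for _, _, r in lines if r)
    return left, right

# Invented sample sentences, NOT taken from SACODEYL.
sample = ("What do you like about Reading? I live in the centre of Reading. "
          "We went to the Reading festival. I like Reading as a place.")

lines = concordance(tokenize(sample), "Reading")
left, right = neighbours(lines)

# Print the hits vertically, with the node aligned in the middle.
for l, node, r in lines:
    print(f"{' '.join(l):>25}  {node}  {' '.join(r)}")

print("Left of node: ", left.most_common())
print("Right of node:", right.most_common())
```

Aligning the node in a central column is what lets learners read vertically; the two counters correspond to rows A and B of the table, while row C (guessing the context) remains a task for the students.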


In a unit where music and concerts are presented, you may want to ask your students to find out about [Reading Festival]. This is what they may find:¹¹

Figure 16. SACODEYL Search tool.

From here, students can go to the interview section where the speaker talks about it:

11 At the time of writing, the corpus search facility was under construction, so search results may vary.


Figure 17. SACODEYL Search tool: section level.

and read and listen to what this speaker says about it:

Figure 18. SACODEYL English corpus: section level.

It is interesting to see how the online nature of spoken discourse affects the way we put things while speaking. In this very short extract, your students can find the following, among others:
-Native correction: [gonna to]
-Unfinished sentences: [been so, but]
-Contractions not frequently used by Spanish speakers: [it’ll be]


As Bernardini (2004: 17) puts it, “concordancing in particular may prove unique in the acquisition and restructuring of competence [...] Language learning may be viewed as an inductive process in which meaning and form come to be associated”.

3.2.2. Activities focused on attention to form and communication

Römer (2008: 19) has pointed out that concordance lines can be used by teachers to “create DDL exercises tailored to their learners’ proficiency level and their particular learning needs”. A case in point is the use of articles. This will be dealt with later in chapter 4 from a different angle.

Let us search for sections in SACODEYL English corpus that have been annotated as being

representative of this particular linguistic feature:

Figure 19. SACODEYL English corpus: category search on section level.

From this you may want to select stretches of language that can be submitted to students for evaluation and analysis, or simply used as materials to improve their mastery of the form. The following extracts are interesting for different reasons. (A) is very convenient for observing the use of the indefinite article:

(A)


Interviewer: So, what kind of house do you live in? Can you describe what kind of house you live in?
Rachel: It’s a semi-detached and it’s got a garage and a big garden and it’s quite big. It’s got quite a lot of rooms but I have to share my room with my sister.

You could present this in a cloze format:

Interviewer: So, what kind of ... house do you live in? Can you describe what kind of ... house you live in?
Rachel: It’s ... semi-detached and it’s got ... garage and ... big garden and it’s quite big. It’s got quite ... lot of rooms but I have to share my room with my sister.
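A cloze version like the one above can be drafted automatically when you have many extracts to prepare. The following is a minimal sketch, assuming that only the overt articles a, an and the are to be gapped; the function name is hypothetical. Gaps placed before nouns that take no article (as in "what kind of ... house") test the zero article and would still have to be added by hand.

```python
import re

# Overt articles only; the zero article leaves no word form to match.
ARTICLES = re.compile(r"\b(a|an|the)\b", flags=re.IGNORECASE)

def make_cloze(text, gap="..."):
    """Replace each overt article with a gap; return (cloze text, answer key)."""
    answers = [m.group(0) for m in ARTICLES.finditer(text)]
    return ARTICLES.sub(gap, text), answers

# Rachel's turn from extract (A), typed with plain apostrophes.
extract = ("It's a semi-detached and it's got a garage and a big garden "
           "and it's quite big. It's got quite a lot of rooms but I have "
           "to share my room with my sister.")

cloze, answers = make_cloze(extract)
print(cloze)
print("Answer key:", answers)
```

Keeping the answer key alongside the gapped text makes it easy to produce a student handout and a teacher's copy from the same extract.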

In B, we can notice the presence of the zero article:

(B)

Interviewer V: You say you’ve got a lot of work this year why is that?
Sam: It’s our first year of GCSEs so you’ve got course work and it’s like writing essays for different subjects. And recently we’ve been doing English we did a we did a we did course work on a book Hard Times by Charles Dickens. Which was a bit boring but, but we’ve finished that now so it’s alright.

You could present this in a cloze format:

Interviewer V: You say you’ve got a lot of work this year why is that?
Sam: It’s our first year of GCSEs so you’ve got ... course work and it’s like writing ... essays for ... different subjects. And recently we’ve been doing ... English we did a we did a we did ... course work on ... book Hard Times by Charles Dickens. Which was a bit boring but, but we’ve finished that now so it’s alright.

In actual fact, (B) can easily be expanded into an interesting source of pragmatic information, including sentence restructuring [we did a we did a we did], sentential relative clauses used to express evaluation [Which was a bit boring] and conclusion [so it’s alright].

Barlow (1996) sees in activities like these a potential for teachers to enrich the learning environment and students’ knowledge of language.

For a thorough account of concordance-based DDL, we suggest reading a practical book on

the issue (Tribble and Jones 1990):


Figure 20. Concordances in the classroom, by Chris Tribble and Glyn Jones. Longman 1990.


4. Indirect approaches: Learner corpora in the EFL classroom

4.1. Definition

Among the many types of corpora which can be compiled, analysed and used (see McEnery, Xiao and Tono, 2006, for an overview), Computer Learner Corpora (CLC) stand out as one of the most powerful pedagogic tools for the EFL or ESL classroom. As recently defined, they are ‘[…] electronic collections of foreign or second language learner texts collected on the basis of strict design criteria’ (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007: 254).

In other words, a learner corpus is compiled when the oral or written texts produced by your students of English are collected according to strict design criteria, put in electronic format, and then stored on your hard drive, memory stick, etc., so that you can analyse them with programmes like WordSmith Tools, already mentioned:

Figure 21. From oral or written texts to a computer learner corpus.

Thanks to the availability of computers and freely available software to carry out analyses, Learner Corpora Research (LCR) has been a fruitful field since the second half of the 1990s. Since then, the growing number of publications, either in edited volumes (cf. Granger, 1998; Granger, Hung and Petch-Tyson, 2002; Gilquin, Papp and Díez-Bedmar, in press, etc.) or in international journals (cf. Corpora, Applied Linguistics, English Corpus Studies, Journal of English for Academic Purposes, ReCALL, etc.), has shown the potential of this type of research and raised awareness of the possibilities that CLC can offer for Second Language Acquisition and for the TEFL or TESOL classroom.


4.2. Types of CLC

Due to the importance of CLC-based results, the number of CLC has mushroomed since the second half of the 1990s. The research questions pursued by various researchers or research teams have fostered different types of CLC, which are frequently classified according to four related variables, namely the mode of the language in the learner corpus, its size, the type of intervention (i.e. when the CLC-based results will be applied in the design of materials, the sequencing of the curriculum, etc.), and the type of annotation in the corpus.

Variable                  Types
Mode                      Written; Spoken; Multimedia
Size                      Big (commercial or some research teams); Small (research)
Type of intervention¹²    Delayed Human Intervention; Early Human Intervention
Type of annotation¹³      Raw; POS-tagged; Semantically-tagged; Error-tagged

Table 4. Main variables considered for the classification of learner corpora.

4.3. Methodologies used with CLC

Compiling students’ production is not a new practice for teachers of English as a second or foreign language, as it has always been used to create remedial exercises, test students’ command of the foreign language, etc. However, the methodology used to analyse the students’ production has changed over time, as researchers and teachers have focused their attention on different aspects (the students’ L1, the target language, etc.) and technology has made it possible to compile CLC, i.e. learners’ real data in electronic format.

Table 5 shows the three main methodologies used before the arrival of CLC. The first one, Contrastive Analysis, in its strong form, did not consider the students’ production, but the

12 This distinction was made by Sinclair (2001, vii).

13 For the types of annotation, refer to McEnery and Wilson


similarities and differences between the students’ L1 and their target language (e.g. Spanish and English), in order to predict the difficulties that students would have. The weaknesses found in this methodology led researchers to shift their attention to Error Analysis, whose theoretical principles and methodological issues were set out in a series of articles in the 1960s and 1970s (reprinted in Corder, 1981). Especially influential was the paper ‘The significance of learner’s errors’ (included in Corder, 1981), which showed that errors were crucial to researchers, teachers and students, since they could all learn from them and apply that knowledge to their research, teaching practice or learning process. Thus, the steps for conducting an EA were followed by many teachers and researchers, and the results were published, on some occasions, as dictionaries and lists of common errors.

However, Error Analysis only considered errors and dismissed the learners’ correct use of the foreign or second language. This led Selinker to his Interlanguage Analysis (IA) (Selinker, 1972), which examined the students’ entire production, i.e. errors and non-errors alike. In this way, it was possible to obtain a better description of the students’ use of the foreign language when performing a task at a specific point in time in their language learning process: their interlanguage.

Pre-CLC
Methodology                    Focus of interest                                        Publications
Contrastive Analysis (CA)      Comparison of the students’ L1 and their TL              Lado (1957)
Error Analysis (EA)            Students’ real errors                                    Corder (1981)
Interlanguage Analysis (IA)    The students’ whole production, errors and non-errors    Selinker (1972)

Table 5. Methodologies used to describe the students’ language before CLC.

Although not in a systematic way, teachers of English as a foreign or second language frequently analyse their students’ production following any of these methodologies or a combination of some of them.

For instance, an Error Analysis is conducted when a teacher corrects a batch of essays and uses a code system, i.e. an error taxonomy,¹⁴ to make the students aware of the type of error made. Thus, ‘sp’ may stand for a spelling error, ‘wo’ for word order, ‘prep’ for a problem with a preposition, etc. After marking all the essays and skimming his or her annotations, the teacher realises that the most frequent error in the batch of essays has to do with a certain aspect of the foreign language (be it prepositions, articles, verb tenses, etc.). If the correct instances of those aspects are considered together with the incorrect ones, an Interlanguage Analysis is conducted. However, if the students’ L1 is compared to their TL

14 For an overview of various error taxonomies, refer to Dulay, Burt and Krashen (1982: 146-197) or James (1998: 102-117).


either before or after analysing their production in an attempt to explain the causes of the students’ errors, a CA in its strong or weak version, respectively, is completed.
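The marking-code scenario described above amounts to a simple tally, which can be sketched as follows. The codes ‘sp’, ‘wo’ and ‘prep’ come from the text; the essay identifiers, the counts and the function name are invented for illustration only.

```python
from collections import Counter

# Marking codes mentioned in the text, mapped to their meaning.
CODES = {"sp": "spelling", "wo": "word order", "prep": "preposition"}

# Invented data: the codes a teacher wrote in the margin of each essay.
marked_essays = {
    "essay_01": ["sp", "prep", "prep", "wo"],
    "essay_02": ["prep", "sp"],
    "essay_03": ["prep", "wo", "prep"],
}

def tally(essays):
    """Add up every error code across the whole batch of essays."""
    counts = Counter()
    for codes in essays.values():
        counts.update(codes)
    return counts

counts = tally(marked_essays)
total = sum(counts.values())

# List the error types from most to least frequent, with their share.
for code, n in counts.most_common():
    print(f"{CODES[code]:<12} {n:>2}  {n / total:.0%}")
```

A tally like this only covers errors, so it corresponds to an Error Analysis; an Interlanguage Analysis would also need the number of correct uses of each aspect.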

The manual analysis of the students’ errors, following a CA, EA or IA methodology, proves a time- and effort-consuming task which a teacher can only carry out with a limited number of essays, as it is necessary to go to the essays, look for the errors, highlight, classify and count them, make sure all the errors are being considered, look for the correct use of the aspect of the language being analysed, compare the use of the aspect under analysis in the L1 and the FL, etc. Fortunately, those processes have been sped up thanks to improvements in technology and, consequently, the advent of CLC, whose electronic format is among their main advantages (Nesselhauf, 2004: 139-40), because it makes their compilation and analysis easier.

To avoid the temptation to collect huge, disorganized amounts of data, as is the case with corpora in general (see section 2.2. above), strict design criteria are to be observed when compiling a learner corpus. Special attention needs to be given to the principles of authenticity and representativeness, and every attempt should be made to control variability, so that comparisons are not drawn from a heterogeneous learner corpus. Thus, if the teacher aims at representing students’ in-class argumentative writing at intermediate level, pieces of writing which belong to other genres, which are written by students at other proficiency levels, or which are written at home (and presumably with access to reference materials) should not be included in that corpus, since the results would be biased. Just consider, from your own experience, the difference in the type and number of errors between an argumentative essay written by a student in class (without dictionaries, online resources, etc.) and one written at home or, likewise, the type of errors that you expect from descriptive writing as compared to narrative writing.
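One practical way to enforce design criteria is to record a few metadata fields for every text and filter on them before anything enters the corpus. The sketch below assumes a hypothetical record layout (student, genre, level, setting, word count) and invented entries; real projects usually record many more variables.

```python
from dataclasses import dataclass

@dataclass
class Text:
    student: str
    genre: str      # e.g. "argumentative", "narrative"
    level: str      # e.g. "intermediate", "advanced"
    setting: str    # "class" or "home"
    words: int

def meets_criteria(t, genre="argumentative", level="intermediate", setting="class"):
    """True only if the text matches every design criterion."""
    return t.genre == genre and t.level == level and t.setting == setting

# Invented candidate texts.
candidates = [
    Text("s01", "argumentative", "intermediate", "class", 310),
    Text("s02", "argumentative", "intermediate", "home", 420),   # written at home
    Text("s03", "narrative", "intermediate", "class", 290),      # other genre
    Text("s04", "argumentative", "advanced", "class", 350),      # other level
    Text("s05", "argumentative", "intermediate", "class", 305),
]

corpus = [t for t in candidates if meets_criteria(t)]
print([t.student for t in corpus], sum(t.words for t in corpus), "words")
```

Texts that fail the filter need not be thrown away: kept with their metadata, they can form a separate sub-corpus for later comparison.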

Drawing on the methodologies of the pre-CLC era, the analysis of students’ use of language, as represented in a learner corpus, is nowadays carried out in a systematic and scientific way following Computer-aided Error Analysis (CEA), Contrastive Interlanguage Analysis (CIA) or the Integrated Contrastive Method (ICM):

CLC
Methodology                                  Focus of interest                                                   Publications
Computer-aided Error Analysis (CEA)          Students’ real errors, as attested in a CLC                         Dagneaux, Dennes and Granger (1998)
Contrastive Interlanguage Analysis (CIA)     Comparison of NS vs. NNS production and NNS vs. NNS production      Granger (1996)
Integrated Contrastive Method (ICM)          CA and CIA                                                          Granger (1996); Gilquin (2000/2001)

Table 6. Methodologies used in the description of the learners’ production of the foreign language.

The first one, CEA, is a ‘new type of EA’ (Dagneaux, Dennes and Granger, 1998: 165). In other words, it is a computerized version of EA, which allows a quicker error annotation and easy retrieval of the erroneous instances of students’ use of the foreign language. There are


two ways to conduct such an analysis, depending on whether the learner corpus is error-tagged or not, i.e. whether or not a code system has been used to highlight the errors.

If it is not, an intuitive search for an error-prone aspect is undertaken. This is the case when the teacher feels that the central articles the and a(n) pose problems to his or her students. By means of a learner corpus and retrieval tools, s/he can read the retrieved concordances for those articles and decide which uses are incorrect, thus conducting an EA.

However, a raw learner corpus, i.e. one without error annotation, will not allow the researcher to retrieve instances of the (mis-)use of the zero article automatically, since there is no word form to search for. To do so, the learner corpus needs to be error-tagged.

There are two types of error-tagged learner corpora:

Fully error-tagged and

Partially error-tagged

In the former, a comprehensive error taxonomy has been used to highlight all the possible errors in a learner corpus. Although few learner corpora are fully error-tagged, for practical reasons of time and money, the results which such EAs yield provide a bird’s-eye view of the students’ problems when using the foreign language at a specific moment in their language acquisition process. As an example, Figure 22 shows the percentage of errors in forty-three aspects of the foreign language (as represented in the error tags on the horizontal axis) that the written production by first-year university students contains at the beginning of the academic year (Díez-Bedmar, 2005):

Figure 22. EA of first-year University students when beginning the academic year (Díez-Bedmar, 2005).


A partially error-tagged learner corpus only highlights a specific type of error, which is of interest to the teacher or the researcher. Returning to the case of the central articles, a partially error-tagged learner corpus will make it possible to easily retrieve, quantify and analyse the errors made with the articles the and a(n) (as was the case with a raw learner corpus), but also those errors involving the zero article (Ø). Notice in the following concordance lines the incorrect uses of the central article a(n), followed by erroneous uses of the zero article and then erroneous uses of the, all error-tagged (GA).

Figure 23. Article errors as retrieved from a partially error-tagged learner corpus using WordSmith Tools.
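To see why error tags make zero-article errors retrievable, consider the following sketch. The (GA) label is the one mentioned in the text; everything else is an assumption made for illustration: the tag layout "(GA) learner form $correction$", the use of 0 to stand for the zero article, and the invented learner sentences. The actual annotation scheme behind Figure 23 may differ.

```python
import re
from collections import Counter

# Assumed tag layout: (GA) <learner's form> $<correct form>$
# where 0 stands for the zero article.
TAG = re.compile(r"\(GA\)\s+(\S+)\s+\$([^$]+)\$")

# Invented learner sentences, already error-tagged.
learner_text = ("I go to (GA) the $0$ school by bus . "
                "My mother is (GA) 0 $a$ teacher . "
                "(GA) A $The$ film we saw was boring . "
                "I love (GA) the $0$ music .")

def article_errors(text):
    """Return (learner form, correction) pairs for every (GA) tag."""
    return [(m.group(1).lower(), m.group(2).lower()) for m in TAG.finditer(text)]

errors = article_errors(learner_text)
by_type = Counter(errors)

for (used, target), n in by_type.most_common():
    print(f"used '{used}' instead of '{target}': {n}")
```

Because the tag itself is searchable, a missing article (here, 0 corrected to a) is found as easily as a wrong one, which is exactly what a raw corpus cannot offer.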

The second methodology used with CLC, the Contrastive Interlanguage Analysis, allows the researcher to compare the students’ production with:

1 the production by native speakers of English

2 the production by other groups of learners of English with a different L1

On the one hand, if your students’ production is compared to that of native-speaker students (at the same level and under the same external variables), it is possible to see how (dis-)similar both productions are when an aspect of the foreign language is studied. As a result, instances of misuse, but also of under- or over-use, are revealed, and conclusions can be drawn such as the overuse of the prepositions between, inside and according to by Spanish university students when compared to native-speaker students (Martínez Osés and Neff, 2001: 144). On the other hand, you may be interested in comparing how various


groups of students of English (at the same proficiency level and under the same external

variables) struggle with the same aspect of the foreign language, as Kaszubski (2001) did

when comparing the use of the lemma be by Spanish, Polish and Belgian-French students.
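Because a learner corpus and a control corpus rarely have the same size, over- and under-use are judged on normalised frequencies rather than raw counts. The sketch below uses invented counts and corpus sizes (they do not reproduce the figures of any study cited here), and the threshold of 1.5 is an arbitrary assumption; published studies normally rely on a significance test instead.

```python
def per_thousand(count, corpus_size):
    """Normalise a raw count to occurrences per 1,000 words."""
    return 1000 * count / corpus_size

def compare(learner, native, learner_size, native_size, threshold=1.5):
    """Label each item as overuse, underuse or similar in the learner corpus.

    Assumes every item occurs at least once in the control corpus.
    """
    report = {}
    for item in learner:
        l = per_thousand(learner[item], learner_size)
        n = per_thousand(native[item], native_size)
        ratio = l / n
        if ratio >= threshold:
            label = "overuse"
        elif ratio <= 1 / threshold:
            label = "underuse"
        else:
            label = "similar"
        report[item] = (l, n, label)
    return report

# Invented raw counts and corpus sizes (in words).
learner_counts = {"between": 100, "inside": 30, "however": 25, "the": 3000}
native_counts = {"between": 100, "inside": 20, "however": 100, "the": 6000}

report = compare(learner_counts, native_counts,
                 learner_size=50_000, native_size=100_000)

for item, (l, n, label) in report.items():
    print(f"{item:<8} learners {l:6.2f}  natives {n:6.2f}  {label}")
```

Note that "between" has the same raw count in both invented corpora, yet it is used twice as often by the learners once corpus size is taken into account.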

Finally, the Integrated Contrastive Model includes a CIA and a corpus-based CA. Therefore, three different corpora are used, namely the learner corpus, the control corpus and a corpus which contains the production by native speakers in the L1. As with CA in the pre-CLC era, there are two ways of conducting an ICM. In the first, the corpus-based CA is conducted in order to see the main differences between the two native languages considered and, then, the problems posed by such differences are attested in the learner corpus. In the second, conversely, the problems in a learner corpus, as revealed by a CIA, may lead to a corpus-based analysis of the two native languages in an attempt to find the causes of such errors.

4.4. The application of CLC in the TEFL classroom

The potential of CLC in the indirect and direct approaches will be explored in this section. The first subsection deals with the indirect approach, that is, using the results from the analysis of CLC (following the methodologies described in 4.3) to improve teaching materials, the curricula, etc., whereas the second focuses on the direct approach, which provides hands-on experience in working with CLC.

4.4.1. The indirect approach

Although CLC-based descriptions of the students’ interlanguage are still limited and only provide ‘[…] patchy knowledge of the different stages of interlanguage development’ (Gilquin et al., 2007: 322), the results obtained are progressively being introduced in teaching materials.

Among the materials which have benefited most from the results in CLC are the dictionaries of common errors, such as The Longman Dictionary of Common Errors (Turton and Heaton, 1987) and the Cambridge series Common Mistakes at… (Tayfoor, 2004; Driscoll, 2005; etc.), in which frequent errors in learner corpora are highlighted and explained.


Figure 24. CLC-informed materials focused on common errors.

Likewise, dictionaries have also been CLC-informed. The first one was the Longman

Essential Activator (LEA), which made use of the information in the Longman Learner’s

Corpus (LLC), and was followed by some others such as the Cambridge International

Dictionary of English, based on the error-tagged Cambridge Learners’ Corpus (Nicholls,

2003), or the second edition of the Macmillan English Dictionary for Advanced Learners,

based on a CIA analysis of the International Corpus of Learner English (ICLE) and a corpus

of native speakers’ academic writing.

Figure 25. CLC-informed monolingual dictionaries of English.

The CLC-based information in these dictionaries is typically provided in ‘help boxes’, which

are quite familiar to any learner of English as a foreign or second language. However, new

ways of offering information from CLC are being devised, as is the case with the graphs in the

Macmillan English Dictionary for Advanced Learners, which show the results of the CIAs

conducted on problems of frequency, register confusion, etc. Similarly, alternatives to

the students’ typical errors are suggested (as exemplified from the control

corpus), and extended writing sections on twelve rhetorical or organizational functions which

are particularly prominent in academic writing are included (cf. Gilquin, Granger and Paquot,

2007, pp. IW1-IW29).

Figure 26. CLC-based results as provided in the Macmillan English Dictionary for Advanced

Learners (MED2).

Recent grammars also include information from learner corpora, as is the case with Carter and

McCarthy’s (2006) Cambridge Grammar of English or the online Chemnitz Internet

Grammar of English.

Figure 27. CLC-informed grammars of English.

Finally, CLC may inform CALL programmes, such as WordPilot (Milton, 1998), or be

integrated into them, so that teachers and students have, if they so wish,

direct access to the real data, as in the EXample eXtractor Engine for LAnguage Teaching

(eXXelant) (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007).

Although syllabus design, textbooks and writing courses are now beginning to consider native

data in their recent editions (cf. the Touchstone Student’s Book series), there is no doubt that

the information provided by CLC can complement and improve such materials to meet the

students’ real needs.

4.4.2. Designing remedial exercises from a learner corpus

Analysing a learner corpus and designing CLC-based remedial exercises to meet your

students’ real needs is not a difficult task. To help you analyse the data in a learner corpus,

this section will explore two ways to approach a small raw learner corpus. The first one deals

with the students’ use of vocabulary, and the second one with the lexico-grammatical patterns

of the verbs ‘say’ and ‘tell’.

The learner corpus used is composed of the handwritten production of 16 first-year

university students (amounting to 17,765 words), who wrote descriptive texts in class

without access to reference materials and with a time limit of 60 minutes. The

software used for the analysis is WordSmith Tools version 4.0.

4.4.2.1. Exploring vocabulary usage: wordlists and concord

WordSmith Tools allows the teacher or researcher to create a word list, to run

concordances and to explore keywords, as can be seen in Figure 28. Here, we

will focus on the use of word lists and concordances for an exploratory analysis of the

adjectives used by a group of learners.

Figure 28. WordSmith Tools 4.0.

As this self-explanatory term indicates, a word list is a list of the words in your learner

corpus. This term was reviewed in Table 2 above. Such a list may be ordered by frequency,

from the word with the highest number of occurrences to those which

appear only once, or the other way round.
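
For readers who prefer to see the idea outside WordSmith Tools, the frequency-ordered word list described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `word_list`, the simple tokenisation rule and the sample sentence are invented for the example and do not reproduce how WordSmith Tools builds its lists.

```python
import re
from collections import Counter


def word_list(text, descending=True):
    """Return (word, frequency) pairs ordered by frequency."""
    # Assumption: a token is a run of letters, optionally with one
    # internal apostrophe (e.g. "don't"); everything is lowercased.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    # Order by frequency; ties are broken alphabetically.
    sign = -1 if descending else 1
    return sorted(counts.items(), key=lambda item: (sign * item[1], item[0]))


# Invented sample text standing in for a learner corpus.
sample = ("It is important to have good friends. "
          "Good friends are important and different.")

for word, freq in word_list(sample)[:5]:
    print(word, freq)
```

Calling `word_list(sample, descending=False)` gives the same list the other way round, starting with the words that appear only once.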

As can be seen in Figure 29 below, a word list of the adjectives that students used in the

learner corpus was obtained after removing from the list the words which did not belong to

this open word-class. As a result, it was possible to check that the adjectives most frequently

used by those students were ‘good’, ‘important’ and ‘different’. This finding may not

surprise an experienced teacher, but the co-text in which these adjectives are used may

reveal interesting and unexpected deficiencies in the learners’ vocabulary.

In order to explore such co-texts, the next step is to run concordances of any of these words.

For this example, ‘important’ was selected. As can be seen in Figure 30 below, running a

concordance returns lines with the search word in the middle, highlighted in blue. This display is

known as ‘Key Word In Context’ (KWIC), and the search word is also called the node. The lines obtained (i.e. concordance

lines) are not read in the traditional way (that is, in full from left to right, as already

seen above); instead, we focus on the first word to the left or to the right of the KWIC. Thus,

we are able to see the type of pre-modification the students use with the adjective under

consideration (first word to the left of the KWIC), and which elements are qualified as

‘important’. As already reported (cf. Granger and Tribble, 1998 or Osborne, 2004, among

others), students rely on this adjective, to the detriment of others like ‘crucial’,

‘outstanding’, ‘main’, ‘valuable’, etc., in the appropriate contexts. Therefore, a very easy

exercise to create with the students’ real words in their compositions is to remove the KWIC

and leave a blank, so that they have to think of a better alternative to fit in the linguistic

contexts they have created.
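
The two steps just described, running a concordance and blanking out the KWIC, can also be sketched in Python. The following is a minimal illustration, not a description of WordSmith Tools: the function names `kwic` and `gap_fill`, the context width and the sample sentences are all our own assumptions.

```python
import re


def kwic(text, node, width=4):
    """Return (left, keyword, right) tuples for every occurrence of node."""
    # Assumption: tokens are runs of letters, optionally with an apostrophe.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            # Keep up to `width` words of co-text on each side of the node.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def gap_fill(lines, blank="________"):
    """Replace the keyword of each concordance line with a blank."""
    return [f"{left} {blank} {right}".strip() for left, _, right in lines]


# Invented learner sentences, used only for illustration.
sample = ("Family is very important for me. "
          "The most important thing is health.")

for item in gap_fill(kwic(sample, "important", width=3)):
    print(item)
```

Because the co-text is cut by number of words, a line may run across a sentence boundary, just as concordance lines do; the teacher can tidy the items before handing out the worksheet.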

Figures 29 and 30. WordSmith Tools: Running a concordance and hiding the KWIC.

Figure 32 presents a screenshot of such a worksheet, which you can paste into a Word document

and use in class. The main strength of this exercise is that it is based on your students’ own

errors and therefore caters for their very specific needs. Furthermore, students are more likely

to feel motivated to do this exercise, since they may recognise their sentences and may be

willing to learn how to improve them.

Figure 31. Concord utility.

Figure 32. Worksheet in a .doc document.

4.4.2.2. Exploring lexico-grammatical patterns: ‘say’ and ‘tell’

The use of the verbs ‘say’ and ‘tell’ is reported to pose difficulties for students at various

levels due to their different lexico-grammatical patterns. However, it is worth exploring

whether your students do make those mistakes and, if so, which are the most problematic

uses.

In order to do so, the first step is to run a concordance of the verb ‘say’ and sort the lines by the first

word to the right of the node, as shown in Figures 33 to 35.
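
As a rough illustration of what this sorting does, the sketch below collects concordance lines for ‘say’ and orders them alphabetically by the first word to the right of the node. It is written under our own assumptions: the name `sorted_concordance`, the list of inflected forms in `SAY_FORMS` and the sample sentences are hypothetical, and the sort options offered by WordSmith Tools are considerably richer.

```python
import re

# Assumption: the inflected forms of the verb are searched together.
SAY_FORMS = ("say", "says", "said", "saying")


def sorted_concordance(text, forms=SAY_FORMS, width=4):
    """Concordance lines sorted by the first word to the right of the node."""
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    wanted = {form.lower() for form in forms}
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() in wanted:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, token, right))
    # Sort on the first right-hand word; lines with no right co-text come first.
    hits.sort(key=lambda hit: hit[2][0].lower() if hit[2] else "")
    return [(" ".join(left), node, " ".join(right)) for left, node, right in hits]


# Invented learner sentences, used only for illustration.
sample = ("They say that it is true. I say him the truth. "
          "People said to me that I am shy.")

for left, node, right in sorted_concordance(sample, width=3):
    print(f"{left:>20}  {node:<6}  {right}")
```

Once the lines are grouped in this way, complements such as a personal pronoun directly after ‘say’ appear next to each other and are easier to spot.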

Figures 33 to 35. Running a concordance and sorting the lines by the first element to the

right of the KWIC.

By doing so, it is now possible to see how the students complement the verb ‘say’ in different

contexts and co-texts that they have created themselves. In checking those uses, it is also

possible to notice uses of the verb ‘say’ where ‘tell’ would have been preferred, or where

another wording would have been more native-like.

In order to show students real native examples of the use of those problematic verbs, i.e. ‘say’

and ‘tell’, we can use the freely available version of the British National Corpus (BNC) or the

Collins Wordbanks Online English Corpus as control corpora, and show students some

examples in KWIC format to foster their analysis of the lexico-grammatical patterns used

(with the help of the teacher if necessary). To do so, we only have to query those corpora

(Figures 36 and 37), select the examples which show the various possibilities to complement

the verbs and, finally, create a Word document for them to work with.

Once real input has been provided to students and they have reflected on the various

lexico-grammatical patterns, an exercise based on their own written production, that is, on the

learner corpus compiled, can be created. As was the case with the example of the use of

‘important’ above, we can easily remove the KWIC (the verbs ‘say’ or ‘tell’ in this case) from

the concordance lines and create a remedial exercise.

Figures 36 and 37. Concordances of the verbs ‘say’ and ‘tell’ in two native corpora.

As can be seen, creating materials which meet our students’ real needs is not such a difficult

or time-consuming task. EFL teachers’ experience is highly valuable, since their

intuitions regarding their students’ problems are worth checking and exploring in the

learner corpus that they have compiled. Once the remedial exercises have been created, the

worksheets can be stored in paper format or distributed via a virtual platform, so that

students with the same problems, in our school or in another, may benefit from our work

and improve their use of the foreign language.

References

Barlow, M. (1996). Corpora for Theory and Practice. International Journal of Corpus

Linguistics 1(1): 1-37.

Bernardini, S. (2004). Corpora in the classroom: An overview and some

reflections on future developments. In John Sinclair (ed) How to Use Corpora in Language

Teaching, 15-36. Amsterdam: John Benjamins.

Carter, R. and McCarthy, M. (2006). Cambridge Grammar of English. Cambridge:

Cambridge University Press.

Corder, S. P. (1981). Error analysis and interlanguage. Oxford: Oxford University Press.

Dagneaux, E., Dennes, S., and Granger, S. (1998). Computer-aided error analysis. System 26:

163-174.

Díez-Bedmar, M.B. (2005). Struggling with English at university level: error-patterns and

problematic areas of first-year students’ interlanguage. In P. Danielsson and M. Wagenmakers

(eds), The corpus linguistics conference series. Retrieved 16 September 2007, from

<http://www.corpus.bham.ac.uk/PCLC/>

Driscoll, L. (2005). Common Mistakes at PET… and How to Avoid Them. Cambridge:

Cambridge University Press.

Dulay, H., Burt, M., and Krashen, S. (1982). Language Two. Oxford: Oxford University

Press.

Gilquin, G. (2000/2001). The integrated contrastive model. Spicing up your data. Languages

in Contrast 3(1): 95-123.

Gilquin, G., Papp, Sz. and Díez-Bedmar, M. B. (eds.) (in press). Linking up Contrastive and

Learner Corpus Research. Amsterdam and Atlanta: Rodopi.

Gilquin, G., Granger, S. and Paquot, M. (2007). Learner corpora: The missing link in EAP

pedagogy. Journal of English for Academic Purposes 6: 319-335.

Granger, S. (1996). From CA to CIA and back: an integrated approach to computerized

bilingual and learner corpora. In K. Aijmer, B. Altenberg and M. Johansson (eds.), Languages

in Contrast. Text-Based Cross-Linguistic Studies, 37-51. Lund: Lund University Press.

Granger, S. (ed.) (1998). Learner English on Computer. London and New York: Addison

Wesley Longman.

Granger, S. and Tribble, C. (1998). Learner corpus data in the foreign language classroom:

form-focused instruction and data-driven learning. In S. Granger (ed.) Learner English on

Computer, 199-209. London and New York: Addison Wesley Longman.

Granger, S., Hung, J. and Petch-Tyson, S. (eds.) (2002). Computer Learner Corpora, Second

Language Acquisition and Foreign Language Teaching. Amsterdam and Philadelphia: John

Benjamins.

Granger, S., Kraif, O., Ponton, C., Antoniadis, G. and Zampa, V. (2007). Integrating learner

corpora and natural language processing: A crucial step towards reconciling technological

sophistication and pedagogical effectiveness. ReCALL 19(3): 252-268.

James, C. (1998). Errors in Language Learning and Use. Exploring Error Analysis. London

and New York: Longman.

Kaszubski, P. (2001). Tracing idiomaticity in learner language – the case of BE. In P. Rayson,

A. Wilson, T. McEnery, A. Hardie and S. Khoja (eds.), Proceedings of the Corpus Linguistics

2001 Conference (29 March-2 April), 312-322. Lancaster: University Centre for Computer

Corpus Research on Language.

Lado, R. (1957). Linguistics Across Cultures. Ann Arbor, Michigan: University of Michigan

Press.

Lewis, M. (1993). The Lexical Approach. Language Teaching Publications.

McEnery, T., Xiao, R. and Tono, Y. (2006). Corpus-based language studies. An advanced

resource book. London: Routledge.

Milton, J. (1998). Exploiting L1 and Interlanguage Corpora in the Design of an Electronic

Language Learning and Production Environment. In S. Granger (ed.) Learner English on

Computer, 186-198. London & New York: Addison Wesley Longman.

Martínez Osés, F. and Neff Van Aertselaer, J. (2001). Corpus analysis of prepositional

patterns in native and non-native university writing. In C. Muñoz, M. L. Celaya, M.

Fernández-Villanueva, T. Navés, O. Strunk and E. Tragant (eds.), Trabajos en Lingüística

Aplicada, 139-147. Barcelona: Univerbook.

Mauranen, A. (2004). Spoken corpus for an ordinary learner. In John Sinclair (ed) How to Use

Corpora in Language Teaching, 89-105. Amsterdam: John Benjamins.

Nattinger, J. R. and DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:

Oxford University Press.

Nesselhauf, N. (2004). How learner corpus analysis can contribute to language teaching: A

study of support verb constructions. In G. Aston, S. Bernardini and D. Stewart (eds.),

Corpora and Language Learners, 109-124. Amsterdam and Philadelphia: John Benjamins.

O'Keeffe, A., McCarthy, M. and Carter, R. (2007). From corpus to classroom. Cambridge:

Cambridge University Press.

Osborne, J. (2004). Top-down and Bottom-up Approaches to Corpora in Language Teaching.

In U. Connor and T. A. Upton (eds.). Applied Corpus Linguistics. A Multidimensional

Perspective, 251-265. Amsterdam and New York: Rodopi.

Römer, U. (2008). Corpora and language teaching.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10: 209-231.

Sinclair, J. (2001). Preface. In M. Ghadessy, A. Henry and R. L. Roseberry (eds.), Small

Corpus Studies and ELT. Theory and Practice, vii-xv. Amsterdam and Philadelphia: John

Benjamins.

Sinclair, J. (2004). New evidence, new priorities, new attitudes. In John Sinclair (ed) How to

Use Corpora in Language Teaching, 271-299. Amsterdam: John Benjamins.

Tayfoor, S. (2004). Common Mistakes at First Certificate… and How to Avoid Them.

Cambridge: Cambridge University Press.

Tribble, C. and Jones, G. (1990). Concordances in the classroom. London: Longman.

Turton, N. D. and Heaton, J. B. (1987). Longman Dictionary of Common Errors. Harlow:

Longman.