Showcasing the potential of error-annotated learner corpora for profiling research



Jennifer Thewissen
Centre for English Corpus Linguistics (CECL)


Profiling research

Definition: Finding ‘criterial features’ that discriminate between different levels of proficiency (e.g. Hawkins & Buttery, 2010)

CEF levels: C2, C1, B2, B1, A2, A1


Feature we focussed on

Construct of accuracy, viz. errors

Focus on four proficiency levels, viz. B1, B2, C1, C2

Aim: to see whether errors constitute a ‘criterial feature’ that distinguishes these levels


Data & methodology


International Corpus of Learner English (Granger et al., 2009)

L1      Total scripts   Total tokens
FR      74              50060
GE      71              49540
SP      78              51385
Total   223             150985


Threefold analysis

Error annotation, i.e. error tagging phase

CEF rating phase

Error counting phase


Error annotation

Broad error category   Description
F                      Form, spelling errors
G                      Grammatical errors
L                      Lexical errors
X                      Lexico-grammatical errors
Q                      Punctuation errors
W                      Word missing, word redundant, word order
S                      Sentence unclear, incomplete


Error tagging examples

The fast spread of television can transform it into a double-edged (FS) wheapon $weapon$.

I will try to give several (XNUC) proofs $proof$ of the truth of the sentence.

46 error subcategories. Result: a detailed error profile per text.
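The two tagged examples above suggest how a per-text error profile could be extracted automatically. The sketch below is illustrative only: it assumes that every error is marked as "(TAG) erroneous form $correction$", as in the two examples, and the names `ERROR_TAG`, `extract_errors` and `error_profile` are hypothetical, not part of any tagging tool.

```python
import re
from collections import Counter

# Assumed tag layout, inferred from the two slide examples:
# "(TAG) erroneous form $correction$".
ERROR_TAG = re.compile(
    r"\((?P<tag>[A-Z]+)\)\s*"      # error tag in parentheses, e.g. (FS)
    r"(?P<error>[^$]*?)\s*"        # erroneous form as written by the learner
    r"\$(?P<correction>[^$]*)\$"   # suggested correction between dollar signs
)


def extract_errors(text):
    """Return (tag, erroneous form, correction) triples found in a tagged text."""
    return [
        (m.group("tag"), m.group("error"), m.group("correction"))
        for m in ERROR_TAG.finditer(text)
    ]


def error_profile(text):
    """Count how often each error tag occurs in one text."""
    return Counter(tag for tag, _, _ in extract_errors(text))


sample = (
    "The fast spread of television can transform it into a double-edged "
    "(FS) wheapon $weapon$. I will try to give several (XNUC) proofs $proof$ "
    "of the truth of the sentence."
)
print(extract_errors(sample))
print(error_profile(sample))
```

Counting tags per text in this way gives the raw frequencies; these still need to be related to the number of opportunities for each error before texts can be compared.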


The CEF rating procedure

Individual rating of the 223 learner scripts according to the linguistic descriptors in the Common European Framework of Reference for Languages (CEF) (Council of Europe, 2001)

B1, B2, C1 or C2 (with + and – increments)

Two professional raters, plus a third rater in cases of wide disagreement (inter-rater correlation r = 0.70)


Tracking development

CEF score and error profile

Development: Progress? Stabilisation? Regression?


Error counting: potential occasion analysis (GNN)

Learner corpus sample

Error-tagged data: total noun-number errors

POS-tagged data (CLAWS7): total nouns used
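Potential occasion analysis relates the number of errors of one type to the number of times that error could have occurred. For GNN, this means dividing the noun-number errors by the nouns used. The sketch below expresses the result per 100 occasions, which is an assumption, since the slides do not state the unit. The sample tokens, the error count and the function names are all hypothetical, and nouns are identified by POS tags starting with "N" (e.g. NN1, NN2).

```python
def count_nouns(tagged_tokens):
    """Count tokens whose POS tag starts with "N" (assumed to mark nouns)."""
    return sum(1 for _, pos in tagged_tokens if pos.startswith("N"))


def potential_occasion_rate(error_count, occasion_count):
    """Return the number of errors per 100 potential occasions."""
    if occasion_count <= 0:
        raise ValueError("no potential occasions: the rate is undefined")
    return 100.0 * error_count / occasion_count


# Hypothetical POS-tagged sample as (word, tag) pairs; not taken from the corpus.
tagged = [
    ("several", "DA2"), ("proofs", "NN2"), ("of", "IO"), ("the", "AT"),
    ("truth", "NN1"), ("of", "IO"), ("the", "AT"), ("sentence", "NN1"),
    ("on", "II"), ("television", "NN1"),
]

noun_number_errors = 1  # hypothetical count taken from the error-tagged version
nouns_used = count_nouns(tagged)
print(nouns_used)
print(potential_occasion_rate(noun_number_errors, nouns_used))
```

Normalising by occasions rather than by text length keeps a learner who uses few nouns from looking more accurate than one who uses many.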


Statistical analyses: ANOVA & Ryan (GNN)

CEF score   N    Ryan-derived groupings
C2          28   0.32
C1          67   0.70   0.70
B2          62          0.99   0.99
B1          66                 1.23

GNN = [B1/B2]>[B2/C1]>[C1/C2]
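The bracket notation can be read as a list of homogeneous subsets, ordered from the highest to the lowest error rate. The sketch below assumes the usual reading of such post hoc output, which the slides do not spell out: two levels differ significantly only if no subset contains both. The function names are hypothetical.

```python
def pattern_from_subsets(subsets):
    """Format homogeneous subsets as a pattern such as [B1/B2]>[B2/C1]."""
    return ">".join("[" + "/".join(levels) + "]" for levels in subsets)


def differ_significantly(level_a, level_b, subsets):
    """Two levels differ if no homogeneous subset contains both of them."""
    return not any(level_a in s and level_b in s for s in subsets)


# Subsets for GNN as given on the slide, highest error rate first.
gnn_subsets = [["B1", "B2"], ["B2", "C1"], ["C1", "C2"]]

print("GNN = " + pattern_from_subsets(gnn_subsets))
print(differ_significantly("B1", "B2", gnn_subsets))  # adjacent levels
print(differ_significantly("B1", "C1", gnn_subsets))  # non-adjacent levels
```

On this reading, noun-number errors do not separate any two adjacent levels, although they do separate levels that are further apart, such as B1 and C1.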


Results for profiling research


4 main error developmental patterns

Error developmental pattern           Illustration
Improvement-only pattern              B1>B2>C1>C2
Improvement & stabilisation pattern   e.g. B1>[B2/C1/C2]
Stabilisation-only pattern            [B1/B2/C1/C2]
Partly regressive pattern             B2>B1


Two dominating error patterns

Dominating error pattern   Number of error categories   Examples
B1>[B2/C1/C2]              17 (37%)                     Spelling, uncountable nouns, lexical phrases, adjective number errors, unclear sentences
[B1/B2/C1/C2]              16 (35%)                     Tenses, punctuation confusion, verb complementation, noun complementation


Where do progress and stabilisation mainly occur? Discriminating power of errors

Adjacent proficiency levels   Number of discriminating error types
B1>B2                         20
B2>C1                         3
C1>C2                         2
[B2/C1/C2]                    33


Preliminary observations for profiling research


Some concluding remarks

Errors (negative features): stronger discriminatory power between certain levels (viz. B1 vs. B2) than others (viz. B2 vs. C1 vs. C2)

Need to capture other features than errors (e.g. positive features)

Conclusion for profiling research: errors are useful but they are not enough in and of themselves
