![Page 1: Showcasing the potential of error-annotated learner corpora for profiling research](https://reader035.vdocuments.us/reader035/viewer/2022071807/56812e51550346895d93f2ee/html5/thumbnails/1.jpg)
Showcasing the potential of error-annotated learner corpora for profiling research
Jennifer Thewissen
Centre for English Corpus Linguistics (CECL)
Profiling research
Definition: finding ‘criterial features’ that discriminate between different levels of proficiency (e.g. Hawkins & Buttery, 2010)
CEF levels: C2, C1, B2, B1, A2, A1
Feature we focussed on
Construct of accuracy, viz. errors
Focus on four proficiency levels, viz. B1, B2, C1, C2
Aim: to see whether errors constitute a ‘criterial feature’ that distinguishes these levels
Data & methodology
International Corpus of Learner English (Granger et al., 2009)

| L1 | Total scripts | Total tokens |
|---|---|---|
| FR | 74 | 50060 |
| GE | 71 | 49540 |
| SP | 78 | 51385 |
| Total | 223 | 150985 |
Threefold analysis
Error annotation, i.e. error tagging phase
CEF rating phase
Error counting phase
Error annotation

| Broad error category | Description |
|---|---|
| F | Form, spelling errors |
| G | Grammatical errors |
| L | Lexical errors |
| X | Lexico-grammatical errors |
| Q | Punctuation errors |
| W | Word missing, word redundant, word order |
| S | Sentence unclear, incomplete |
Error tagging examples
The fast spread of television can transform it into a double-edged (FS) wheapon $weapon$.
I will try to give several (XNUC) proofs $proof$ of the truth of the sentence.
46 error subcategories
Result: a detailed error profile per text
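The layout of the two tagged examples (a tag in parentheses, the erroneous form, then the correction between dollar signs) can be read mechanically. The following is a minimal sketch, assuming every annotation follows exactly that layout. The regular expression, the function names and the rule that the first letter of a tag gives its broad category are illustrative assumptions, not the project's actual tooling.

```python
import re
from collections import Counter

# Assumed layout, based on the two examples: "(TAG) erroneous form $correction$".
TAG_PATTERN = re.compile(
    r"\((?P<tag>[A-Z]+)\)\s+(?P<error>[^$]+?)\s+\$(?P<correction>[^$]*)\$"
)

def parse_error_tags(text):
    """Return one (tag, erroneous form, correction) tuple per annotation."""
    return [(m["tag"], m["error"], m["correction"]) for m in TAG_PATTERN.finditer(text)]

def broad_category_profile(text):
    """Count annotations per broad category (assumed: first letter of the tag)."""
    return Counter(tag[0] for tag, _, _ in parse_error_tags(text))

sample = (
    "The fast spread of television can transform it into a double-edged "
    "(FS) wheapon $weapon$. I will try to give several (XNUC) proofs $proof$ "
    "of the truth of the sentence."
)

print(parse_error_tags(sample))
print(broad_category_profile(sample))
```

Counting tags per text in this way is one route to the per-text error profile mentioned above.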
The CEF rating procedure
Individual rating of the 223 learner scripts according to the linguistic descriptors in the Common European Framework of Reference for Languages (CEF) (Council of Europe, 2001)
B1, B2, C1 or C2 (with + and – increments)
Two professional raters, plus a third rater in cases of wide disagreement (r = 0.70)
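The slide does not say how r = 0.70 was computed. As an illustration only, the sketch below assumes a Pearson correlation between the two raters' scores after mapping each rating to a number. The 12-point scale, the function names and the ratings are all hypothetical.

```python
from math import sqrt

LEVELS = ["B1", "B2", "C1", "C2"]
OFFSETS = {"-": -1, "": 0, "+": 1}

def cef_to_score(rating):
    """Map a rating such as 'B2+' to an integer on a hypothetical 0-11 scale."""
    level, increment = rating[:2], rating[2:]
    return 3 * LEVELS.index(level) + OFFSETS[increment] + 1

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical ratings of the same six scripts by two raters.
rater_a = ["B1", "B2-", "B2+", "C1", "C2-", "B1+"]
rater_b = ["B1+", "B2", "B2", "C1-", "C2", "B2-"]

scores_a = [cef_to_score(r) for r in rater_a]
scores_b = [cef_to_score(r) for r in rater_b]
print(round(pearson_r(scores_a, scores_b), 2))
```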
Tracking development
CEF score + error profile
Development: progress? stabilisation? regression?
Error counting: potential occasion analysis (GNN)
Learner corpus sample
Error-tagged data: total noun-number errors
POS-tagged data (CLAWS7): total nouns used
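Potential occasion analysis relates the number of errors of a given type to the number of times that error could have occurred: here, noun-number (GNN) errors against the total number of nouns used. The sketch below is a minimal illustration. It assumes that every POS tag starting with `N` counts as a noun and that the result is expressed as a percentage. The function names, the tagged fragment and the error count are hypothetical.

```python
def count_nouns(tagged_tokens):
    """Count tokens whose POS tag starts with 'N' (assumed to identify nouns)."""
    return sum(1 for _, tag in tagged_tokens if tag.startswith("N"))

def potential_occasion_rate(error_count, occasions):
    """Errors as a percentage of the occasions on which they could occur."""
    if occasions == 0:
        return 0.0
    return 100 * error_count / occasions

# Hypothetical POS-tagged fragment ("two book on the shelves") as (token, tag) pairs.
tagged = [("two", "MC"), ("book", "NN1"), ("on", "II"), ("the", "AT"), ("shelves", "NN2")]
gnn_errors = 1  # hypothetical: one noun-number error tagged in this fragment

nouns = count_nouns(tagged)
print(nouns, potential_occasion_rate(gnn_errors, nouns))
```

Dividing by the number of nouns rather than by the total number of tokens keeps scripts that use few nouns comparable with scripts that use many.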
Statistical analyses: ANOVA & Ryan (GNN)
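
| CEF score | N | Ryan-derived grouping 1 | Ryan-derived grouping 2 | Ryan-derived grouping 3 |
|---|---|---|---|---|
| C2 | 28 | 0.32 | | |
| C1 | 67 | 0.70 | 0.70 | |
| B2 | 62 | | 0.99 | 0.99 |
| B1 | 66 | | | 1.23 |

GNN = [B1/B2]>[B2/C1]>[C1/C2]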
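The summary line for GNN can be derived from the groupings: levels that share a grouping do not differ significantly, and the groupings are ordered from the highest to the lowest values. The sketch below only renders that notation and does not run the ANOVA or the Ryan procedure. The group memberships are inferred from the summary line, and the function name is hypothetical.

```python
LEVELS = ["B1", "B2", "C1", "C2"]

# Values per CEF level as reported for GNN.
values = {"C2": 0.32, "C1": 0.70, "B2": 0.99, "B1": 1.23}

# Assumed group memberships, inferred from the GNN summary line.
groupings = [["C2", "C1"], ["C1", "B2"], ["B2", "B1"]]

def render_groupings(groupings, values):
    """Render groupings as e.g. '[B1/B2]>[B2/C1]', highest values first."""
    def group_mean(group):
        return sum(values[level] for level in group) / len(group)

    ordered = sorted(groupings, key=group_mean, reverse=True)
    return ">".join(
        "[" + "/".join(sorted(group, key=LEVELS.index)) + "]" for group in ordered
    )

print("GNN =", render_groupings(groupings, values))
```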
Results for profiling research
4 main error developmental patterns

| Error developmental pattern | Illustration |
|---|---|
| Improvement-only pattern | B1>B2>C1>C2 |
| Improvement & stabilisation pattern | e.g. B1>[B2/C1/C2] |
| Stabilisation-only pattern | [B1/B2/C1/C2] |
| Partly regressive pattern | B2>B1 |
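One way to read the four patterns is as outcomes of the three adjacent-level comparisons (B1 to B2, B2 to C1, C1 to C2). The sketch below is a simplified interpretation, not the study's procedure. It assumes each comparison has already been reduced to "fewer", "same" (no significant difference) or "more" errors at the higher level, and it does not handle overlapping groupings such as [B1/B2]>[B2/C1]>[C1/C2]. The function name and labels are hypothetical.

```python
def classify_pattern(steps):
    """Classify outcomes for the B1-B2, B2-C1 and C1-C2 comparisons.

    Each outcome is 'fewer', 'same' or 'more' errors at the higher level.
    """
    if "more" in steps:
        return "partly regressive"
    if all(step == "fewer" for step in steps):
        return "improvement-only"
    if all(step == "same" for step in steps):
        return "stabilisation-only"
    return "improvement & stabilisation"

print(classify_pattern(("fewer", "fewer", "fewer")))  # B1>B2>C1>C2
print(classify_pattern(("fewer", "same", "same")))    # B1>[B2/C1/C2]
print(classify_pattern(("same", "same", "same")))     # [B1/B2/C1/C2]
print(classify_pattern(("more", "same", "same")))     # B2>B1
```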
Two dominating error patterns

| Dominating error pattern | Number of error categories | Examples |
|---|---|---|
| B1>[B2/C1/C2] | 17 (37%) | Spelling, uncountable nouns, lexical phrases, adjective number errors, unclear sentences |
| [B1/B2/C1/C2] | 16 (35%) | Tenses, punctuation confusion, verb complementation, noun complementation |
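The percentages are consistent with shares of the 46 error subcategories. The short check below assumes that denominator and also shows how many subcategories are left for the other patterns. The variable and function names are illustrative.

```python
TOTAL_SUBCATEGORIES = 46  # error subcategories in the tagging scheme

pattern_counts = {"B1>[B2/C1/C2]": 17, "[B1/B2/C1/C2]": 16}

def share(count, total=TOTAL_SUBCATEGORIES):
    """Percentage of all subcategories, rounded to the nearest whole number."""
    return round(100 * count / total)

for pattern, count in pattern_counts.items():
    print(pattern, count, f"{share(count)}%")

# Subcategories that follow neither of the two dominating patterns.
remaining = TOTAL_SUBCATEGORIES - sum(pattern_counts.values())
print("other patterns", remaining, f"{share(remaining)}%")
```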
Where do progress and stabilisation mainly occur? Discriminating power of errors

| Adjacent proficiency levels | Number of discriminating error types |
|---|---|
| B1>B2 | 20 |
| B2>C1 | 3 |
| C1>C2 | 2 |
| [B2/C1/C2] | 33 |
Preliminary observations for profiling research
Some concluding remarks
Errors (negative features): stronger discriminatory power between certain levels (viz. B1 vs. B2) than between others (viz. B2 vs. C1 vs. C2)
Need to capture features other than errors (e.g. positive features)
Conclusion for profiling research: errors are useful but they are not enough in and of themselves