Learner corpus analysis and error annotation

Xiaofei Lu
CALPER 2010 Summer Workshop

July 13, 2010

Overview
  Analyzing raw corpora
  Error annotation
    Issues in corpus annotation
    Granger (2003)

Analyzing raw corpora
  Concordancing software
    GOLD
    AntConc
  Other software
    CLAN
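
To make the idea of concordancing concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not how GOLD, AntConc, or CLAN work internally; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Invented learner-like sample text (not from any real corpus).
sample = "I have went to school yesterday. She have a book. They have gone home."
for line in kwic(sample, "have", width=15):
    print(line)
```

Aligning the keyword in a fixed column is what lets recurring patterns to its left and right stand out when many hits are listed together.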

Issues in corpus annotation
  Annotation scheme and format
  Annotation procedure
  Annotation quality

Annotation scheme and format
  What are the categories you are using?
    Linguistically consensual
    Overspecification vs. underspecification
    Use short, meaningful codes for your categories
  Annotation format considerations
    Compatible with annotation scheme
    Facilitates corpus query

Annotation procedure and quality
  Annotator training
    Scheme and format
    Problematic cases and disagreements
  Computer-assisted manual annotation
    Stanford annotation tool
    UAM Corpus Tool and NoteTab
  Inter-annotator agreement
    Cohen’s Kappa
    Online Kappa calculator
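
As a rough illustration of what a Kappa calculator computes, the sketch below implements Cohen's Kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The function name `cohens_kappa` and the label data are invented for this example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label proportions,
    # summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical error-domain codes (G = grammar, L = lexis, F = form).
annotator_1 = ["G", "G", "L", "F", "G", "L", "F", "G", "L", "G"]
annotator_2 = ["G", "G", "L", "F", "L", "L", "F", "G", "G", "G"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

In this invented data the annotators agree on 8 of 10 items, but Kappa comes out lower than 0.8 because part of that agreement would be expected by chance.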

Granger (2003)
  Learner corpora
  Error annotation
  Error statistics and analysis
  Integration of results into CALL
  Conclusion

Learner corpora
  What is a learner corpus?
  Difference from traditional data in SLA
  Difference from native language data
    Frequencies
    Errors
  From error annotation to error detection

Computer-aided error annotation
  Dagneaux, Denness and Granger (1998)
    Manual correction of L2 French corpus
    Elaboration of an error tagging system
    Insertion of error tags and corrections
    Retrieval of lists of error types and statistics
    Concordance-based error analysis
  Tagging system
    Informative but manageable
    Reusable, flexible, consistent

Error tagging system
  Dulay, Burt & Krashen (1982)
    System based on linguistic categories (e.g., syntax)
    Surface structure alterations (e.g., omission)
  Granger’s (2003) three-dimensional taxonomy
    Error domain
    Error category
    Word category

Error tagging system (cont.)
  Error domain and category
    General level: grammatical, lexical, etc.
    Domains subdivided into error categories
    Table 1, page 468
  Word category
    A POS tagset with 11 major and 54 sub-categories
    Makes it possible to sort errors by POS categories
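
One way to picture a three-dimensional tag is as a record with one field per dimension. The Python sketch below assumes such a representation. The class `ErrorTag`, the helper `group_by_word_category`, and every code value in it are placeholders chosen for illustration, not the actual labels from Granger's (2003) Table 1 or POS tagset.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorTag:
    """One annotated error, described along three dimensions."""
    domain: str         # general level, e.g. grammar or lexis
    category: str       # subdivision of the domain
    word_category: str  # part of speech of the erroneous word

def group_by_word_category(tags):
    """Group error tags by part of speech so errors can be sorted by POS."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag.word_category].append(tag)
    return dict(groups)

# Placeholder values only; a real scheme defines its own closed sets of codes.
tags = [
    ErrorTag("grammar", "agreement", "verb"),
    ErrorTag("grammar", "tense", "verb"),
    ErrorTag("lexis", "meaning", "noun"),
]
for pos, group in sorted(group_by_word_category(tags).items()):
    print(pos, len(group))
```

Keeping the three dimensions as separate fields means the same annotations can be regrouped by domain, by category, or by part of speech without re-annotating anything.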

Error tagging system (cont.)
  Correct forms inserted next to erroneous forms
    Facilitates interpretation of error annotations
    Allows for automatic sorting on correct forms
  Tag insertion using a menu-driven editor
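
The sketch below shows why storing the correct form next to the erroneous form is convenient for automatic processing. The inline markup `<err code="..." corr="...">...</err>` is a hypothetical format invented for this example, not the format used by Granger (2003) or by any particular annotation tool; the codes and the sample sentence are likewise made up.

```python
import re

# Hypothetical inline format: <err code="CODE" corr="CORRECT FORM">ERRONEOUS FORM</err>
ERROR_PATTERN = re.compile(
    r'<err code="(?P<code>[^"]+)" corr="(?P<corr>[^"]*)">(?P<wrong>.*?)</err>'
)

def extract_errors(text):
    """Return (code, erroneous form, correct form) for each annotated error."""
    return [(m["code"], m["wrong"], m["corr"]) for m in ERROR_PATTERN.finditer(text)]

def apply_corrections(text):
    """Replace each annotated error with its stored correct form."""
    return ERROR_PATTERN.sub(lambda m: m["corr"], text)

sample = (
    'She <err code="G-AGR-V" corr="goes">go</err> to school '
    '<err code="L-PREP" corr="on">in</err> Monday.'
)
errors = extract_errors(sample)
# Sort the error list on the correct form rather than the erroneous one.
for code, wrong, corr in sorted(errors, key=lambda e: e[2]):
    print(f"{corr:<6} <- {wrong:<4} ({code})")
print(apply_corrections(sample))
```

Because each correction travels with its tag, the corrected text and a list sorted on correct forms both fall out of the same annotation pass.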

Error statistics and analysis
  Error frequency by domain or (word) category
    Highest ranked domains: grammar and form
  Error trigrams
  Concordancers for searching error codes
    AntConc
    WordSmith Tools
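
Counting errors by domain or by full code can be sketched in a few lines of Python once the tags are machine-readable. The example assumes a hypothetical inline format `<err code="DOMAIN-CATEGORY-POS" ...>` in which the first hyphen-separated part of the code is the domain; the format, the codes, and the sample text are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical format: the code attribute holds DOMAIN-CATEGORY-POS.
CODE_PATTERN = re.compile(r'<err code="([^"]+)"')

def error_frequencies(text):
    """Count annotated errors by domain and by full error code."""
    by_domain, by_code = Counter(), Counter()
    for code in CODE_PATTERN.findall(text):
        by_code[code] += 1
        by_domain[code.split("-")[0]] += 1  # domain = first part of the code
    return by_domain, by_code

sample = (
    'She <err code="G-AGR-V" corr="goes">go</err> to school and '
    '<err code="G-TNS-V" corr="met">meet</err> her '
    '<err code="F-SPL-N" corr="friend">freind</err> yesterday.'
)
by_domain, by_code = error_frequencies(sample)
for domain, count in by_domain.most_common():
    print(domain, count)
```

The same regular expression, typed into a concordancer's search box, would list every tagged error in context.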

Integrating results into CALL
  Goal: a hypermedia CALL program
    Using NLP and Communicative approaches to SLA
    Traditional and NLP-enabled exercises
    Automatic error diagnosis and feedback generation
  Error statistics and analysis used to
    Select linguistic areas to focus on
    Adapt exercises as a function of attested error types
    Adapt NLP tools for error diagnosis

Integrating results into CALL (cont.)
  Most error-prone linguistic areas
    Tense and mood, agreement
    Articles, complementation, prepositions
  Adapting exercises
    Exercises reflect type of error-prone context
    Formal errors through dictation and exercises targeting specific difficulties
    Attention to punctuation

Integrating results into CALL (cont.)
  Adapting NLP tools for error diagnosis
    Spell checker and parser
    Handles orthographic, grammatical, syntactic, and lexical errors
    Not punctuation, semantic, and tense errors

Granger (2003) summary
  Effective 3-tier error annotation system
    Limited number of categories per tier
    Versatile automated data manipulation
  Limitations of error-tagging
    Element of subjectivity in annotation
    Focuses on misuse
  Usefulness of error-tagged learner corpus
    Error statistics help understand learner interlanguage
    Helps adapt pedagogical materials and programs

Activity
  Using the Stanford annotation tool
    Annotate a short text using your own scheme, or
    Annotate a short learner text using Granger’s (2003) scheme
  Query the annotated text using AntConc