Learner corpus analysis and error annotation
Xiaofei Lu
CALPER 2010 Summer Workshop
July 13, 2010

Overview
- Analyzing raw corpora
- Error annotation
  - Issues in corpus annotation
  - Granger (2003)

Analyzing raw corpora
- Concordancing software
  - GOLD
  - AntConc
- Other software
  - CLAN
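
To make concrete what a concordancer does with a raw corpus, here is a minimal keyword-in-context (KWIC) sketch. It is not how GOLD, AntConc, or CLAN are implemented; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", flags=re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical learner sentence, invented for this sketch.
sample = "She have a book. They have two books. He has a pen."
for line in kwic(sample, "have", width=12):
    print(line)
```

Aligning the keyword in a fixed column is what lets recurring left and right contexts stand out when the lines are read vertically.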

Issues in corpus annotation
- Annotation scheme and format
- Annotation procedure
- Annotation quality

Annotation scheme and format
- What are the categories you are using?
  - Linguistically consensual
  - Overspecification vs. underspecification
  - Use short, meaningful codes for your categories
- Annotation format considerations
  - Compatible with annotation scheme
  - Facilitates corpus query
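
One way to keep codes short, meaningful, and checkable is to write the scheme down as data and validate every code against it. The sketch below assumes a hypothetical two-level scheme and a `DOMAIN-CATEGORY` code format; none of the codes come from the workshop or from Granger (2003).

```python
# Hypothetical two-level scheme: domain code -> {category code: description}.
SCHEME = {
    "G": {"AGR": "agreement", "TNS": "tense"},
    "L": {"CHO": "word choice"},
    "F": {"SPE": "spelling"},
}

def is_valid(code, scheme=SCHEME):
    """Check that a 'DOMAIN-CATEGORY' code is defined in the scheme."""
    domain, sep, category = code.partition("-")
    # `sep` is empty when the hyphen is missing, so malformed codes fail.
    return bool(sep) and category in scheme.get(domain, {})

print(is_valid("G-AGR"))  # defined in the scheme
print(is_valid("G-XYZ"))  # unknown category
```

A fixed separator also makes the codes easy to find with a concordancer, which is the "facilitates corpus query" consideration.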

Annotation procedure and quality
- Annotator training
  - Scheme and format
  - Problematic cases and disagreements
- Computer-assisted manual annotation
  - Stanford annotation tool
  - UAM Corpus Tool and NoteTab
- Inter-annotator agreement
  - Cohen’s Kappa
  - Online Kappa calculator
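
Cohen's Kappa corrects the observed agreement p_o between two annotators for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two annotators who labelled the same items; the function name and the example labels are assumptions for illustration, not data from the workshop.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal proportions,
    # summed over categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical error-domain codes assigned to six tokens by two annotators.
annotator_1 = ["G", "G", "L", "F", "G", "L"]
annotator_2 = ["G", "L", "L", "F", "G", "G"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

In this example the annotators agree on 4 of 6 items (p_o = 0.667) while chance agreement is 14/36 (p_e = 0.389), so kappa is 10/22, roughly 0.455.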

Granger (2003)
- Learner corpora
- Error annotation
- Error statistics and analysis
- Integration of results into CALL
- Conclusion

Learner corpora
- What is a learner corpus?
- Difference from traditional data in SLA
- Difference from native language data
  - Frequencies
  - Errors
- From error annotation to error detection

Computer-aided error annotation
- Dagneaux, Denness and Granger (1998)
  - Manual correction of L2 French corpus
  - Elaboration of an error tagging system
  - Insertion of error tags and corrections
  - Retrieval of lists of error types and statistics
  - Concordance-based error analysis
- Tagging system
  - Informative but manageable
  - Reusable, flexible, consistent

Error tagging system
- Dulay, Burt & Krashen (1982)
  - System based on linguistic categories (e.g., syntax)
  - Surface structure alternations (e.g., omission)
- Granger’s (2003) three-dimensional taxonomy
  - Error domain
  - Error category
  - Word category

Error tagging system (cont.)
- Error domain and category
  - General level: grammatical, lexical, etc.
  - Domains subdivided into error categories
  - Table 1, page 468
- Word category
  - A POS tagset with 11 major and 54 sub-categories
  - Makes it possible to sort errors by POS categories

Error tagging system (cont.)
- Correct forms inserted next to erroneous forms
  - Facilitates interpretation of error annotations
  - Allows for automatic sorting on correct forms
- Tag insertion using a menu-driven editor
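
The sketch below shows why storing the correct form next to the erroneous form is useful: the errors can be extracted, sorted on the correct form, and used to regenerate a corrected text. The XML-like `<err ...>` format and the codes in the sample are hypothetical, chosen for readability; they are not the tag format or tagset of Granger (2003).

```python
import re
from typing import NamedTuple

class ErrorTag(NamedTuple):
    domain: str     # error domain code
    category: str   # error category code
    pos: str        # word category (part of speech)
    erroneous: str  # form the learner wrote
    correct: str    # form supplied by the annotator

# Hypothetical inline format:
# <err domain="G" cat="AGR" pos="VERB" corr="has">have</err>
TAG_RE = re.compile(
    r'<err domain="(?P<domain>[^"]+)" cat="(?P<cat>[^"]+)" '
    r'pos="(?P<pos>[^"]+)" corr="(?P<corr>[^"]*)">(?P<form>.*?)</err>'
)

def extract_errors(text):
    """Return every annotated error in order of appearance."""
    return [
        ErrorTag(m["domain"], m["cat"], m["pos"], m["form"], m["corr"])
        for m in TAG_RE.finditer(text)
    ]

def corrected_text(text):
    """Replace each annotated error with its correct form."""
    return TAG_RE.sub(lambda m: m["corr"], text)

sample = (
    'She <err domain="G" cat="AGR" pos="VERB" corr="has">have</err> two '
    '<err domain="F" cat="SPE" pos="NOUN" corr="books">boks</err>.'
)
errors = extract_errors(sample)
for e in sorted(errors, key=lambda e: e.correct):  # sort on correct forms
    print(e.correct, "<-", e.erroneous, (e.domain, e.category, e.pos))
print(corrected_text(sample))
```

Because each tag carries all three dimensions plus the correction, the same records can also be sorted by domain, category, or word category without re-reading the text.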

Error statistics and analysis
- Error frequency by domain or (word) category
  - Highest ranked domains: grammar and form
- Error trigrams
- Concordancers for searching error codes
  - AntConc
  - WordSmith Tools
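
The statistics listed above can be sketched in a few lines once the annotated text is tokenised. The token list, the `DOMAIN-CATEGORY` codes, and the reading of "error trigram" as (previous word, error code, next word) are all assumptions made for this illustration; the slides do not define the term.

```python
from collections import Counter

# Hypothetical pre-tokenised learner text: (word, error_code or None).
tokens = [
    ("She", None), ("have", "G-AGR"), ("two", None), ("boks", "F-SPE"),
    ("and", None), ("she", None), ("have", "G-AGR"), ("a", None), ("pen", None),
]

def domain_frequencies(tokens):
    """Count errors per domain (the part of the code before the hyphen)."""
    return Counter(code.split("-")[0] for _, code in tokens if code)

def error_trigrams(tokens):
    """Count (previous word, error code, next word) triples.

    This is one assumed reading of 'error trigram'; sentence edges are
    padded with <s> and </s>.
    """
    trigrams = Counter()
    for i, (_, code) in enumerate(tokens):
        if code is None:
            continue
        prev_word = tokens[i - 1][0].lower() if i > 0 else "<s>"
        next_word = tokens[i + 1][0].lower() if i + 1 < len(tokens) else "</s>"
        trigrams[(prev_word, code, next_word)] += 1
    return trigrams

print(domain_frequencies(tokens).most_common())
for trigram, count in error_trigrams(tokens).items():
    print(trigram, count)
```

Ranking the domain counts gives the kind of "highest ranked domains" list mentioned above, and the trigram counts show which contexts an error code tends to occur in.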

Integrating results into CALL
- Goal: a hypermedia CALL program
  - Using NLP and Communicative approaches to SLA
  - Traditional and NLP-enabled exercises
  - Automatic error diagnosis and feedback generation
- Error statistics and analysis used to
  - Select linguistic areas to focus on
  - Adapt exercises as a function of attested error types
  - Adapt NLP tools for error diagnosis

Integrating results into CALL (cont.)
- Most error-prone linguistic areas
  - Tense and mood, agreement
  - Articles, complementation, prepositions
- Adapting exercises
  - Exercises reflect type of error-prone context
  - Formal errors through dictation and exercises targeting specific difficulties
  - Attention to punctuation

Integrating results into CALL (cont.)
- Adapting NLP tools for error diagnosis
  - Spell checker and parser
  - Handles orthographic, grammatical, syntactic, and lexical errors
  - Not punctuation, semantic, and tense errors

Granger (2003) summary
- Effective 3-tier error annotation system
  - Limited number of categories per tier
  - Versatile automated data manipulation
- Limitations of error-tagging
  - Element of subjectivity in annotation
  - Focuses on misuse
- Usefulness of error-tagged learner corpus
  - Error statistics help understand learner interlanguage
  - Helps adapt pedagogical materials and programs

Activity
- Using the Stanford annotation tool
  - Annotate a short text using your own scheme, or
  - Annotate a short learner text using Granger’s (2003) scheme
- Query the annotated text using AntConc
