Learner corpus analysis and error annotation
Xiaofei Lu
CALPER 2010 Summer Workshop
July 13, 2010
Overview
  Analyzing raw corpora
  Error annotation
    Issues in corpus annotation
    Granger (2003)
Analyzing raw corpora
  Concordancing software
    GOLD
    AntConc
  Other software
    CLAN
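To make concrete what concordancing software does with a raw corpus, here is a minimal keyword-in-context (KWIC) sketch. It is not how GOLD, AntConc, or CLAN are implemented; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`.

    Hypothetical helper for illustration; real concordancers offer
    far richer search, sorting, and display options.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the hit.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented learner-like sample text (an assumption, not corpus data).
sample = "I have went to school yesterday. She have a book. They have gone home."

for line in kwic(sample, "have", width=20):
    print(line)
```

Aligning the keyword in a fixed column is what lets an analyst scan the left and right contexts for recurring patterns.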
Issues in corpus annotation
  Annotation scheme and format
  Annotation procedure
  Annotation quality
Annotation scheme and format
  What are the categories you are using?
    Linguistically consensual
    Overspecification vs. underspecification
    Use short, meaningful codes for your categories
  Annotation format considerations
    Compatible with annotation scheme
    Facilitates corpus query
Annotation procedure and quality
  Annotator training
    Scheme and format
    Problematic cases and disagreements
  Computer-assisted manual annotation
    Stanford annotation tool
    UAM Corpus Tool and NoteTab
  Inter-annotator agreement
    Cohen’s Kappa
    Online Kappa calculator
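As a worked illustration of inter-annotator agreement, Cohen's kappa for two annotators is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it directly. The label codes `G`, `L`, and `F` and the two label sequences are invented for the example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal proportions,
    # summed over categories.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical error-domain codes assigned to six items (invented data).
annotator_a = ["G", "G", "L", "F", "G", "L"]
annotator_b = ["G", "L", "L", "F", "G", "G"]

print(round(cohens_kappa(annotator_a, annotator_b), 3))
```

Here the annotators agree on 4 of 6 items (p_o = 24/36) and chance agreement is p_e = 14/36, so kappa is 10/22, roughly 0.45, noticeably lower than the raw agreement of 0.67.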
Granger (2003)
  Learner corpora
  Error annotation
  Error statistics and analysis
  Integration of results into CALL
  Conclusion
Learner corpora
  What is a learner corpus?
  Difference from traditional data in SLA
  Difference from native language data
    Frequencies
    Errors
  From error annotation to error detection
Computer-aided error annotation
  Dagneaux, Denness and Granger (1998)
    Manual correction of L2 French corpus
    Elaboration of an error tagging system
    Insertion of error tags and corrections
    Retrieval of lists of error types and statistics
    Concordance-based error analysis
  Tagging system
    Informative but manageable
    Reusable, flexible, consistent
Error tagging system
  Dulay, Burt & Krashen (1982)
    System based on linguistic categories (e.g., syntax)
    Surface structure alterations (e.g., omission)
  Granger’s (2003) three-dimensional taxonomy
    Error domain
    Error category
    Word category
Error tagging system (cont.)
  Error domain and category
    General level: grammatical, lexical, etc.
    Domains subdivided into error categories
    Table 1, page 468
  Word category
    A POS tagset with 11 major and 54 sub-categories
    Makes it possible to sort errors by POS categories
Error tagging system (cont.)
  Correct forms inserted next to erroneous forms
    Facilitates interpretation of error annotations
    Allows for automatic sorting on correct forms
  Tag insertion using a menu-driven editor
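The value of storing the correct form next to the erroneous form can be sketched with a small parser. The inline format used here, `[DOMAIN-CATEGORY-POS erroneous|correct]`, and the codes `G`, `F`, `AGR`, `SPE`, `VRB`, and `NOM` are hypothetical stand-ins chosen for this example. They are not the actual notation or tagset of Granger (2003).

```python
import re
from typing import NamedTuple

class ErrorTag(NamedTuple):
    domain: str         # first tier, e.g. grammar or form (hypothetical codes)
    category: str       # second tier, the error category within the domain
    word_category: str  # third tier, the part of speech involved
    erroneous: str      # what the learner wrote
    correct: str        # the annotator's correction

# Hypothetical inline format: [DOMAIN-CATEGORY-POS erroneous|correct]
TAG_RE = re.compile(r"\[(\w+)-(\w+)-(\w+) ([^|\]]+)\|([^\]]*)\]")

def extract_tags(text):
    """Collect every annotated error in the text as an ErrorTag."""
    return [ErrorTag(*m.groups()) for m in TAG_RE.finditer(text)]

def corrected_text(text):
    """Replace each annotation with its correct form."""
    return TAG_RE.sub(lambda m: m.group(5), text)

# Invented sample sentence (an assumption, not corpus data).
sample = ("She [G-AGR-VRB have|has] two [F-SPE-NOM frends|friends] "
          "and [G-AGR-VRB go|goes] to school.")

tags = extract_tags(sample)
# Sorting on the correct form groups together all errors sharing a target.
for tag in sorted(tags, key=lambda t: t.correct):
    print(tag.correct, "<-", tag.erroneous, (tag.domain, tag.category, tag.word_category))
print(corrected_text(sample))
```

Because every annotation carries its correction, the same tagged text yields both a sortable error list and a corrected version of the learner's sentence.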
Error statistics and analysis
  Error frequency by domain or (word) category
    Highest ranked domains: grammar and form
  Error trigrams
  Concordancers for searching error codes
    AntConc
    WordSmith Tools
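Once error tags have been extracted, frequency by domain or by word category is a simple tally. The sketch below assumes each annotation is a (domain, category, word category) triple. The codes and counts are invented for illustration and do not reproduce the figures reported in Granger (2003).

```python
from collections import Counter

# Hypothetical annotations as (domain, category, word_category) triples.
annotations = [
    ("G", "AGR", "VRB"),
    ("F", "SPE", "NOM"),
    ("G", "AGR", "VRB"),
    ("G", "TNS", "VRB"),
    ("L", "CHO", "ADJ"),
]

FIELDS = {"domain": 0, "category": 1, "word_category": 2}

def frequency_by(annotations, field):
    """Rank the values of one tier of the annotation by frequency."""
    index = FIELDS[field]
    return Counter(a[index] for a in annotations).most_common()

print(frequency_by(annotations, "domain"))
print(frequency_by(annotations, "word_category"))
```

Ranking by tier in this way is what allows statements such as "the highest ranked domains are grammar and form" to be read straight off an error-tagged corpus.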
Integrating results into CALL
  Goal: a hypermedia CALL program
    Using NLP and communicative approaches to SLA
    Traditional and NLP-enabled exercises
    Automatic error diagnosis and feedback generation
  Error statistics and analysis used to
    Select linguistic areas to focus on
    Adapt exercises as a function of attested error types
    Adapt NLP tools for error diagnosis
Integrating results into CALL (cont.)
  Most error-prone linguistic areas
    Tense and mood, agreement
    Articles, complementation, prepositions
  Adapting exercises
    Exercises reflect type of error-prone context
    Formal errors through dictation and exercises targeting specific difficulties
    Attention to punctuation
Integrating results into CALL (cont.)
  Adapting NLP tools for error diagnosis
    Spell checker and parser
    Handles orthographic, grammatical, syntactic, and lexical errors
    Not punctuation, semantic, and tense errors
Granger (2003) summary
  Effective 3-tier error annotation system
    Limited number of categories per tier
    Versatile automated data manipulation
  Limitations of error-tagging
    Element of subjectivity in annotation
    Focuses on misuse
  Usefulness of error-tagged learner corpus
    Error statistics help understand learner interlanguage
    Helps adapt pedagogical materials and programs
Activity
  Using the Stanford annotation tool
    Annotate a short text using your own scheme, or
    Annotate a short learner text using Granger’s (2003) scheme
  Query the annotated text using AntConc