NAACL-HLT BEA-5 2010, Los Angeles, CA. Annotating ESL Errors: Challenges and Rewards. Alla Rozovskaya and Dan Roth, University of Illinois at Urbana-Champaign


Page 1

NAACL-HLT BEA-5 2010

Los Angeles, CA

Annotating ESL Errors: Challenges and Rewards

Alla Rozovskaya and Dan Roth

University of Illinois at Urbana-Champaign

Page 2

Annotating a corpus of English as a Second Language (ESL) writing: Motivation

Many non-native English speakers
ESL learners make a variety of mistakes in grammar and usage
Conventional proofing tools do not detect many ESL mistakes
– They target native English speakers and do not address many mistakes of ESL writers
We are not restricting ourselves to ESL mistakes

Page 3

Goals

Developing automated techniques for detecting and correcting context-sensitive mistakes

Paving the way for better proofing tools for ESL writers
– E.g., providing instructional feedback
Developing automated scoring techniques
– E.g., automated evaluation of student essays
Annotation is an important part of that process

Page 4

Annotating ESL errors: a hard problem

A sentence usually contains multiple errors
– In Western countries prisson conditions are more better than in Russia , and this fact helps to change criminals in better way of life .

Not always clear how to mark the type of a mistake
– Original: “…which reflect a traditional female role and a traditional attitude to a woman…”
– Corrected: “…which reflect a traditional female role and a traditional attitude towards women…”
– Two possible markings: a woman*/women (one replacement), or a*/<NONE> woman*/women (an article deletion plus a noun number change)

Page 5

Annotating ESL errors: a hard problem

Distinction between acceptable/unacceptable usage is fuzzy

Women were indignant at inequality from men.

Women were indignant at the inequality from men.

Page 6

Common ESL mistakes

English as a Second Language (ESL) mistakes

Mistakes involving prepositions
– We even do good to*/for other people <NONE>*/by spending money on this and asking <NONE>*/for nothing in return.
Mistakes involving articles
– The main idea of their speeches is that a*/the romantic period of music was too short.
– Laziness is the engine of the*/<NONE> progress.
– Do you think anyone will help you? There are not many people who are willing to give their*/a hands*/hand.
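The examples on these slides mark each correction inline as source*/correction, with <NONE> standing for an empty side. As a minimal sketch of how such a line could be split into the learner's text and the corrected text, assuming single-token pairs only (the function and pattern names are hypothetical and not part of the annotation tool):

```python
import re

# Matches "source*/correction"; trailing sentence punctuation stays outside the match.
PAIR = re.compile(r"(\S+?)\*/([^\s.,;:!?]+)")


def _pick(group):
    """Build a re.sub callback that keeps one side of each pair."""
    def repl(match):
        token = match.group(group)
        # <NONE> (in any letter case) marks an empty side of the pair.
        return "" if token.upper() == "<NONE>" else token
    return repl


def split_versions(annotated):
    """Return (learner_text, corrected_text) for a sentence in */ notation."""
    learner = PAIR.sub(_pick(1), annotated)
    corrected = PAIR.sub(_pick(2), annotated)
    # Collapse the double spaces left behind by dropped <NONE> tokens.
    return " ".join(learner.split()), " ".join(corrected.split())


learner, corrected = split_versions("Laziness is the engine of the*/<NONE> progress.")
print(learner)    # Laziness is the engine of the progress.
print(corrected)  # Laziness is the engine of progress.
```

Multi-word corrections and alternative corrections (such as to/towards) are outside the scope of this sketch.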

Page 7

Purpose of the annotation

To have a gold standard set for the development and evaluation of an automated system that corrects ESL mistakes

There is currently no gold standard data set available for researchers
– Systems are evaluated on different data sets – performance comparison across different systems is hard
– Results depend on the source language of the speakers and proficiency level

The annotation of this corpus is available and can be used by researchers who gain access to the ICLE and the CLEC corpora.

This corpus is used in the experiments described in [Rozovskaya and Roth, NAACL, ’10]

Page 8

Outline

Annotating ESL mistakes: Motivation
Annotation
– Data selection
– Annotation procedure
– Error classification
Annotation tool
Annotation statistics
Statistics on article corrections
Statistics on preposition corrections
Inter-annotator agreement

Page 9

Annotation: Overview

Annotated a corpus of ESL sentences (63K words)
Extracted from two corpora of ESL essays:
– International Corpus of Learner English (ICLE) [Granger et al.,’02]
– Chinese Learner English Corpus (CLEC) [Gui and Yang,’03]
Sentences written by ESL students of 9 first language backgrounds
Each sentence is fully corrected and error tagged
Annotated by native English speakers

Page 10

Annotation: focus of the annotation

Focus of the annotation: Mistakes in article and preposition usage
These mistakes have been shown to be very common mistakes for learners of different first language backgrounds [Dagneaux et al, ’98; Gamon et al., ’08; Tetreault et al., ’08; others]

Page 11

Annotation: data selection

Sentences for annotation extracted from two corpora of ESL essays
– International Corpus of Learner English (ICLE): essays by advanced learners of English; first language backgrounds: Bulgarian, Czech, French, German, Italian, Polish, Russian, Spanish
– Chinese Learner English Corpus (CLEC): essays by Chinese learners of different proficiency levels
Garbled sentences and sentences with near-native fluency excluded with a 4-gram language model
50% of sentences for annotation randomly sampled from the two corpora
50% of sentences selected manually to collect more preposition errors
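The filtering and random-sampling steps can be sketched as follows. The slides do not say how the 4-gram model's scores were thresholded, so everything here is an assumption: `score` stands for any function returning a fluency score per sentence (for instance an average per-word log-probability), and `low` and `high` are hypothetical cut-offs for "garbled" and "near-native". The manually selected half of the data is not modeled.

```python
import random


def select_for_annotation(sentences, score, low, high, n_random, seed=0):
    """Drop sentences scored outside [low, high], then sample n_random of the rest.

    score    -- hypothetical callable: sentence -> language-model fluency score
    low      -- assumed cut-off below which a sentence counts as garbled
    high     -- assumed cut-off above which a sentence counts as near-native
    """
    kept = [s for s in sentences if low <= score(s) <= high]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(kept, min(n_random, len(kept)))


# Toy scores standing in for a real 4-gram language model.
toy_scores = {"garbled": -9.0, "learner a": -5.0, "learner b": -4.5, "fluent": -2.0}
picked = select_for_annotation(list(toy_scores), toy_scores.get,
                               low=-6.0, high=-3.0, n_random=2)
print(sorted(picked))  # ['learner a', 'learner b']
```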

Page 12

Annotation: procedure

Annotation performed by three native English speakers
– Graduate and undergraduate students in Linguistics/foreign languages
– With previous experience in natural language annotation
Annotation performed at the sentence level – all errors in the sentence are corrected and tagged
The annotators were encouraged to propose multiple alternative corrections
– Useful for the evaluation of an automated error correction system
– “They contribute money to the building of hospitals”: “to” annotated as to/towards

Page 13

Annotation: error classification

Focus of the annotation: mistakes in article and preposition usage

Error classification (inspired by [Tetreault and Chodorow,’08])
– Developed with the focus on article and preposition errors
– Was intended to give a general idea about the types of mistakes ESL students make
Original: “…which reflect a traditional female role and a traditional attitude to a woman…”
Annotated: “…which reflect a traditional female role and a traditional attitude towards a*/<NONE> woman*/women…”

Page 14

Annotation: error classification

Error type                         Example
Article error                      Women were indignant at <NONE>*/the inequality from men.
Preposition error                  …to change their views to*/for the better.
Noun number                        Science is surviving by overcoming the mistakes not by uttering the truths*/truth.
Verb form                          He write*/writes poetry.
Word form                          It is not simply*/simple to make professional army.
Spelling                           …if a person commited*/committed a crime…
Word replacement (lexical error)   There is a probability*/possibility that today’s fantasies will not be fantasies tomorrow.

Page 15

Outline

Annotating ESL mistakes: Motivation
Annotation
– Data selection
– Annotation procedure
– Error classification
The annotation tool
Annotation statistics
Statistics on article corrections
Statistics on preposition corrections
Inter-annotator agreement

Page 16

The annotated ESL corpus

Annotating ESL sentences with an annotation tool
Sentence for annotation
Flexible infrastructure allows for an easy adaptation to a different domain

Page 17

Example of an annotated sentence

Before annotation: “This time asks for looking at things with our eyes opened.”

With annotation comments: “This time @period, age, time@ asks $us$ for <to> looking *look* at things with our eyes opened .”

After annotation: “This period asks us to look at things with our eyes opened.”

Annotation rate: 30-40 sentences per hour

Page 18

Outline

Annotating ESL mistakes: Motivation
Annotation
– Data selection
– Annotation procedure
– Error classification
Annotation tool
Annotation statistics
Statistics on article corrections
Statistics on preposition corrections
Inter-annotator agreement

Page 19

Annotation statistics

Spelling            6.5%
Word order          2.2%
Noun number         3.0%
Word form           2.9%
Verb form           5.2%
Prepositions       17.1%
Articles           12.5%
Word replacement   28.2%
Punctuation        22.5%

Page 20

Common article and preposition mistakes

Article mistakes
– Missing articles: But this , as such , is already <NONE>*/a new subject for discussion .
– Extraneous articles: Laziness is the engine of the*/<NONE> progress.
Preposition mistakes
– Confusing different prepositions: Education gives a person a better appreciation of*/for such fields as art , literature , history , human relations , and science

Page 21

Statistics on article corrections

Source language Errors total Errors per hundred words

Bulgarian 76 1.2

Chinese 179 1.9

Czech 138 2.1

French 22 0.4

German 23 0.5

Italian 43 0.6

Polish 71 1.5

Russian 271 2.5

Spanish 134 1.7

All 957 1.5
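The "errors per hundred words" column is a simple rate. As a quick check, assuming the corpus size is the rounded 63K words given on the overview slide (the exact word count is not stated, and the function name is hypothetical), the totals reproduce the reported overall rates:

```python
def errors_per_hundred_words(n_errors, n_words):
    """Error rate normalized to 100 words of text."""
    return 100.0 * n_errors / n_words


CORPUS_WORDS = 63_000  # assumption: the rounded "63K words" figure from the overview slide

print(round(errors_per_hundred_words(957, CORPUS_WORDS), 1))   # 1.5 (articles, all languages)
print(round(errors_per_hundred_words(1309, CORPUS_WORDS), 1))  # 2.1 (prepositions, all languages)
```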

Page 22

Distribution of article errors by error type

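[Figure: bar chart “Distribution of errors by type”, comparing Chinese, Czech, and Russian writers across five article error types: missing the, missing a, extraneous the, extraneous a, and confusion of a and the; the vertical axis runs from 0 to 60]

Not all confusions are equally likely

Errors are dependent on the first language of the writer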

Page 23

Statistics on preposition corrections

                                                        Mistakes by error type
Source language   Errors total   Errors per 100 words   Repl.   Ins.   Del.   With orig.
Bulgarian           89           1.4                    58%     22%    11%     8%
Chinese            384           4.1                    52%     24%    22%     2%
Czech               91           1.4                    51%     21%    24%     4%
French              57           1.0                    61%      9%    12%    18%
German              75           1.5                    61%      8%    16%    15%
Italian            120           1.8                    57%     22%    12%     8%
Polish              77           1.7                    49%     18%    16%    17%
Russian            251           2.3                    53%     21%    17%     9%
Spanish            165           2.1                    55%     20%    19%     6%
All               1309           2.1                    54%     21%    18%     7%

Unlike with articles, preposition confusions account for over 50% of all preposition errors

Many contexts license multiple prepositions [Tetreault and Chodorow, ’08]

Page 24

Inter-annotator agreement

Agreement set     Rater      Judged correct   Judged incorrect
Agreement set 1   Rater #2   37               63
Agreement set 1   Rater #3   59               41
Agreement set 2   Rater #1   79               21
Agreement set 2   Rater #3   73               27
Agreement set 3   Rater #1   83               17
Agreement set 3   Rater #2   47               53

Page 25

Inter-annotator agreement

Agreement set     Agreement   kappa
Agreement set 1   56%         0.16
Agreement set 2   78%         0.40
Agreement set 3   60%         0.23
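The slides do not name the kappa variant. As a sketch, assuming Cohen's kappa and reading the judged correct/incorrect figures from the previous slide as percentages, the reported values can be recomputed from the observed agreement and each rater's share of "correct" judgments (the function name is hypothetical):

```python
def cohen_kappa(p_observed, p_a_correct, p_b_correct):
    """Cohen's kappa for two raters making a binary correct/incorrect judgment.

    p_observed  -- fraction of items on which the two raters agree
    p_a_correct -- fraction of items rater A judged correct
    p_b_correct -- fraction of items rater B judged correct
    """
    # Agreement expected by chance: both say "correct" or both say "incorrect".
    p_chance = p_a_correct * p_b_correct + (1 - p_a_correct) * (1 - p_b_correct)
    return (p_observed - p_chance) / (1 - p_chance)


print(round(cohen_kappa(0.56, 0.37, 0.59), 2))  # 0.16 (agreement set 1)
print(round(cohen_kappa(0.78, 0.79, 0.73), 2))  # 0.4  (agreement set 2)
print(round(cohen_kappa(0.60, 0.83, 0.47), 2))  # 0.23 (agreement set 3)
```

Under these assumptions the output matches the kappa column, which suggests, but does not confirm, that this is the statistic reported.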

Page 26

Conclusions

We presented the annotation of a corpus of ESL sentences
Annotating ESL mistakes is an important but challenging task
– Interacting mistakes in a sentence
– Fuzzy distinction between acceptable/unacceptable usage
We have described an annotation tool that facilitates the error-tagging of a corpus of text
The inter-annotator agreement on the task is low and shows that this is a difficult problem
The annotated data can be used by other researchers for the evaluation of their systems

Page 27

Annotation tool ESL annotation

[email protected]
http://L2R.cs.uiuc.edu/~cogcomp/software.php

Thank you! Questions?