Does Automated Feedback in a Proofreading Tool Help an English Language Learner?
Claudia Leacock, Butler Hill Group
Michael Gamon, Microsoft Research
Chris Brockett, Microsoft Research
... and William B. Dolan, Microsoft Research
Jianfeng Gao, Microsoft Research
Dmitriy Belenko, Microsoft Research
Lucy Vanderwende, Microsoft Research
Alexandre Klementiev, University of Illinois at Urbana-Champaign
ESL Assistant
• March 2008: CALICO Workshop: Gamon et al.
  – System Description & Evaluation. No user action.
  – System performance is state-of-the-art
• June 24, 2008: ESL Assistant goes live!
• 2009 CALICO Workshop Presentation
  – System Usage
  – Evaluation
  – User Interactions: What they saw. What they did.
Most frequent errors made by East Asian non-native speakers
Noun Related: Articles (inclusion & choice), Noun Number, Noun of Noun
• I think it's *a/the best way to resolve issues like this.
• Conversion always takes a lot of *efforts/effort.
• Please send the *feedback of customer/customer feedback to me by mail.
Preposition Related: inclusion & choice
• It seems ok and I did not pay much attention *on/to it.
• I should *to ask/ask a rhetorical question.
Verb Related: Gerund/Infinitive Confusion, Auxiliary Verb Error, Verb Formation Errors (6), Cognate/Verb confusion, Irregular Verbs
• On Saturday, I with my classmate went *eating/to eat.
• Hope you will *happy/be happy in Taiwan.
• I *teached/taught him all the things I know.
Adjective Related: Adjective Confusion (4), Adjective Order
• She is very *interesting/interested in the problem.
• So *Korea/Korean Government is intensely fostering trade.
Users and Data Collection
ESL Assistant User Interface
Page Views per Day
Beijing Olympics Live Translator snafu
Traffic via website links: Windows Live Translator 35%, Chinese MSN 13%, Taiwan MSN 11%, Korean MSN 7%
User Location
Country              Visits   Percentage
China                51,285   26.80%
United States        28,916   15.10%
Taiwan               25,753   13.40%
Korea-South          12,934    6.80%
Hong Kong             8,826    4.60%
Brazil                4,648    2.40%
Canada                3,917    2.00%
Germany               3,077    1.60%
United Kingdom        2,928    1.50%
Japan                 2,581    1.30%
Italy                 2,579    1.30%
Spain                 2,557    1.30%
Russian Federation    2,448    1.30%
Saudi Arabia          2,021    1.10%
Growth of the Database: Users and Sessions
[Chart: users and sessions over time; y-axis: number of users, 0 to 30,000; x-axis: 9/24/08 to 2/10/2009]
Repeat users
[Chart: percentage of total visits (0 to 100) by return frequency: once only, 2 times or more, 3 times or more, 4 times or more, 5 times or more]
Collected Data
Writing Domains: By Number of Sentences
• Email 53%
• Non-technical 24%
• Technical 14%
• Unrelated 5%
• Other 4%
Frequent Users (2/10/09)
• noun 61%
• prep 27%
• verb 10%
• adj 2%
Frequent Users               578
Sessions                   5,305
Session-Unique Sentences  39,944
Grammatical Error Flags   17,832
User interactions
Users Examine 87% of Suggestions
• Accept 41%
• Trigger web search but don't accept 28%
• Look at suggestion but not trigger web search 31%
Conclusion: A significant number of users inspect the suggested rewrites and make a deliberate choice to accept or reject them.
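The scale behind this conclusion can be checked with a little arithmetic. The sketch below assumes that the 87% applies to the 17,832 grammatical error flags reported for frequent users and that the 41% acceptance figure is a share of the examined suggestions (the three interaction shares sum to 100%). Under those assumptions it reproduces the figures a later slide summarizes as inspecting more than 15.5K flags to accept 6.4K.

```python
# Figures from the slides; how they combine is an assumption of this sketch.
total_flags = 17_832                 # grammatical error flags (frequent users)
examined_share = 0.87                # suggestions that users examined
accepted_share_of_examined = 0.41    # examined suggestions that were accepted

examined = total_flags * examined_share
accepted = examined * accepted_share_of_examined

print(f"examined: {examined:,.0f}")  # roughly 15.5K
print(f"accepted: {accepted:,.0f}")  # roughly 6.4K
```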
Do users make the right choices?
To answer, need human evaluation:
• Time consuming, costly
• Inter-rater agreement (Tetreault & Chodorow)
BUT... necessary for system development
• Single Annotator
• Internally consistent to measure relative performance during system development
To answer: Do users make the right choices?
• Evaluated user data to date: 34% of frequent user sessions: 6K flags
• From Evaluated Flags:
1. Calculate performance for ALL suggestions.
2. Calculate system performance for ONLY suggestions that were accepted.
3. Compare ratios of good and bad flags.
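The three steps above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the record layout (`rating`, `accepted`), the function name `rating_shares`, and the eight sample flags are all hypothetical, since the slides do not include the annotated data.

```python
from collections import Counter

RATINGS = ("good", "neutral", "bad")

def rating_shares(flags):
    """Return the fraction of flags carrying each rating."""
    counts = Counter(flag["rating"] for flag in flags)
    total = sum(counts[r] for r in RATINGS)
    if total == 0:
        return {r: 0.0 for r in RATINGS}
    return {r: counts[r] / total for r in RATINGS}

# Hypothetical annotated flags (invented for illustration only).
evaluated_flags = [
    {"rating": "good", "accepted": True},
    {"rating": "good", "accepted": True},
    {"rating": "good", "accepted": True},
    {"rating": "good", "accepted": False},
    {"rating": "neutral", "accepted": True},
    {"rating": "neutral", "accepted": False},
    {"rating": "bad", "accepted": True},
    {"rating": "bad", "accepted": False},
]

# Step 1: performance over ALL suggestions.
all_shares = rating_shares(evaluated_flags)
# Step 2: performance over ONLY the accepted suggestions.
accepted_shares = rating_shares([f for f in evaluated_flags if f["accepted"]])

# Step 3: put the two distributions side by side.
for rating in RATINGS:
    print(f"{rating:8s} all={all_shares[rating]:.2f} "
          f"accepted={accepted_shares[rating]:.2f}")
```

If users choose well, the accepted subset should show a larger share of good flags and a smaller share of bad flags than the full set.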
Evaluation Categories
Evaluation / SubEval: Description
Good
  Correct Flag: The correction fixes a problem in the user input.
Neutral
  Both Good: The suggestion is a legitimate alternative of a well-formed original input. Ex: I like working/to work.
  Misdiagnosis: The original input contained an error but the suggested rewrite neither improves nor further degrades the user input. Ex: If you have fail machine on hand.
  Both Wrong: An error type is correctly diagnosed but the suggested rewrite does not correct the problem. Ex: “can you give me ^ suggestion” insert the instead of a
  Non-ascii: A non-ascii or text processing mark-up character is in the immediate context. (Only applies to user data)
Bad
  False Flag: The suggestion resulted in an error or would otherwise lead to a degradation over the original user input.
Error Type: Are users accepting the right suggestions?
                All Suggestions          Accepted
                good  neutral  bad       good  neutral  bad
Noun-related    56%   28%      16%       63%   26%      11%
Prep-related    37%   39%      24%       45%   42%      13%
Verb-related    62%   32%       6%       72%   25%       3%
Adj-related     45%   32%      23%       63%   28%       9%
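One way to compare good and bad flags across the two conditions is a simple good-to-bad ratio per error type. The sketch below uses the percentages from the charts and assumes the four chart pairs appear in the same order as the error-type labels (noun, prep, verb, adj); the ratio itself is an illustrative summary, not a measure the slides define.

```python
# (good, neutral, bad) percentages read from the charts.
# Assumption: chart pairs follow the order of the error-type labels.
charts = {
    "noun": {"all": (56, 28, 16), "accepted": (63, 26, 11)},
    "prep": {"all": (37, 39, 24), "accepted": (45, 42, 13)},
    "verb": {"all": (62, 32, 6), "accepted": (72, 25, 3)},
    "adj": {"all": (45, 32, 23), "accepted": (63, 28, 9)},
}

def good_to_bad(percentages):
    """Ratio of good flags to bad flags for one chart."""
    good, _neutral, bad = percentages
    return good / bad

for error_type, chart in charts.items():
    print(f"{error_type}: all {good_to_bad(chart['all']):.1f}, "
          f"accepted {good_to_bad(chart['accepted']):.1f}")
```

For every error type the ratio is higher among accepted suggestions than among all suggestions, which is the pattern the deck points to when it says users are selective.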
Domains: Are users accepting the right suggestions?
                Suggestions              Accepted
                good  neutral  bad       good  neutral  bad
Email           53%   32%      15%       63%   28%       9%
Non-technical   56%   32%      12%       56%   34%      10%
Technical       38%   28%      34%       52%   29%      19%
What do users do with neutral flags?
• Misdiagnosis 64%
• Both ok 15%
• Both wrong 14%
• Non ascii 7%
Neutral Categories: “both wrong” and “misdiagnosis” 78% of neutral flags
Inspect >15.5K Flags to Accept 6.4K
• I don't know that you knew or not, this early morning i got a from head office...
  – suggestion: delete “from”
  I don't know that you knew or not, this early morning I heard from the head office...
• Please play with the software and Friday I will be by to work with any questions you may regarding it.
  – suggestion: regarding → regard
  Please play with the software and Friday I will be by to work with any questions you may have regarding it.
Neutral Flags not accepted but sentence edited to produce no flag
From 1,349 sentences with neutral flags, found 215 subsequently submitted “similar” strings with no error flag. Users did not accept the suggestion but did something ELSE to make the flag go away.
Users improve 40% of the time
Not Accept Suggestion but Revise Sentence
• Typed in suggestion 44%
• Revise and improve 40%
• Revise and not improve 16%
Identifying the location of an error can help the user.
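The slides do not say how a “similar” resubmitted string was identified. As a rough sketch of the idea, the code below pairs each flagged sentence with the closest later unflagged submission using Python's `difflib.SequenceMatcher`. The function name `find_unflagged_revisions` and the 0.8 threshold are assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher

def find_unflagged_revisions(flagged, later_unflagged, threshold=0.8):
    """Pair each flagged sentence with its most similar later unflagged one.

    `threshold` is a hypothetical cut-off on SequenceMatcher.ratio().
    """
    pairs = []
    for original in flagged:
        best, best_score = None, 0.0
        for candidate in later_unflagged:
            score = SequenceMatcher(None, original, candidate).ratio()
            if score > best_score:
                best, best_score = candidate, score
        # Keep only close matches that are genuine revisions.
        if best is not None and best_score >= threshold and best != original:
            pairs.append((original, best, best_score))
    return pairs

# Example sentences taken from the slides.
flagged = [
    "Please play with the software and Friday I will be by to work "
    "with any questions you may regarding it.",
]
later_unflagged = [
    "I taught him all the things I know.",
    "Please play with the software and Friday I will be by to work "
    "with any questions you may have regarding it.",
]

pairs = find_unflagged_revisions(flagged, later_unflagged)
for original, revision, score in pairs:
    print(f"{score:.2f}: {revision}")
```

A character-level ratio is only one possible notion of similarity. Whether a matched revision actually improves the sentence still needs a human judgment, as in the evaluation described in the deck.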
Conclusions
• Traffic: There is an interest in ESL proofing tools
• Even current state-of-the-art error correction can be useful for ELLs:
  – Users do not accept proposed corrections blindly: they are selective in their behavior
  – Users make informed choices: they can distinguish correct suggestions from incorrect ones
  – Sometimes just identifying the location of an error enables the users to repair the problem themselves
www.eslassistant.com