
Correcting Language Errors:
Machine Translation Techniques

Shamil Chollampatt
shamil@u.nus.edu

Natural Language Processing Group
National University of Singapore

What are language errors?

• A "language error" is a deviation from the rules of a language:
  • due to a lack of knowledge
  • made by learners of the language
• Language errors in writing include spelling, grammatical, word choice, and stylistic errors.

How can NLP help?

• Building automatic grammar correction tools and spell checkers.
• Rule-based systems (e.g., Microsoft Word) and advanced software that corrects many kinds of errors (e.g., Grammarly, Ginger).
• A useful tool for non-native writers.
• There is evidence that corrective feedback helps language learning (Leacock et al., Automated Grammatical Error Detection for Language Learners, 2nd ed., 2014).

Grammatical Error Correction, or "GEC"

• Automatic correction of various kinds of errors in written text.

Example (input, with corrections in brackets):

The problems [bring some effect on → affect] engineering design [from → in] two [aspect → aspects], independent innovation and engineering application.

– from the NUS Corpus of Learner English (NUCLE)

• The most popular approach is the machine translation approach.

The Translation Approach

• Treats GEC as a translation task from "bad" English → "good" English.

Advantages:
✓ Able to learn text transformations from parallel data.
✓ Simple, and does not need language-dependent tools.
✓ Can correct interacting errors and complex error types.

• Typically uses statistical machine translation (SMT) or neural machine translation (NMT) frameworks.

History

• SMT for countability errors of mass nouns (Brockett et al., 2006)
• Japanese SMT-based GEC and the Lang-8 corpus (Mizumoto et al., 2011)
• CoNLL-2014 Shared Task: 2 of the 3 top systems use SMT
• System combination approach beats the CoNLL-2014 systems (Susanto et al., 2014)
• Neural models as features (Chollampatt et al., 2016)
• Neural machine translation approach to GEC (Yuan and Briscoe, 2016)
• GEC-specific features (Junczys-Dowmunt and Grundkiewicz, 2016)
• Combining word- and character-level SMT (Chollampatt and Ng, 2017)
• Convolutional neural encoder-decoder for GEC achieves the best results (Chollampatt and Ng, AAAI 2018, to appear)

Data

For training:
• Parallel corpora
  - Annotated learner dataset: NUCLE
  - Crawled from Lang-8
• English corpora: Wikipedia, Common Crawl

For testing: CoNLL-2014 shared task test set (1,312 sentences)
Metric: F0.5 using the MaxMatch scorer (sketched below)
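Since results throughout are reported in F0.5, here is a minimal sketch of the weighted F-measure the MaxMatch (M2) scorer computes once precision and recall over phrase-level edits are known; the edit extraction itself is omitted, and the example counts are invented:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted F-measure; beta = 0.5 weights precision twice as much as recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# E.g., a system proposing 80 edits of which 40 are correct, against 100 gold edits:
p, r = 40 / 80, 40 / 100
print(round(f_beta(p, r), 4))  # precision 0.5, recall 0.4 -> F0.5 = 0.4762
```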

Word and Character-Level SMT for GEC

Statistical Machine Translation Approach

[Diagram: a translation model is trained on parallel text (learner text and corrected text), and a language model is trained on well-formed English text; both feed the SMT decoder, which maps an input sentence to an output sentence.]

Statistical Machine Translation Approach

• Using a log-linear framework:

$T^* = \arg\max_T P(T \mid S) = \arg\max_T \sum_{i=1}^{N} \lambda_i f_i(S, T)$

where $T^*$ is the best output sentence, $S$ the source sentence, $T$ a candidate output sentence, $N$ the number of features, $\lambda_i$ the $i$-th feature weight, and $f_i$ the $i$-th feature function.

• Feature weights $\lambda_i$ are tuned using MERT, optimizing the F0.5 metric on a development set.

Phrase-based SMT

Input sentence (S):
Thus, advice from hospital plays the important role for this.

Output sentence (T*):
Thus, advice from the hospital plays an important role in this.
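To make the log-linear scoring concrete, here is a small Python sketch of how a decoder would rank candidate corrections for the example above; the feature names, values, and weights are invented for illustration:

```python
def loglinear_score(features, weights):
    """Score a candidate correction T of source S: sum_i lambda_i * f_i(S, T)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for two candidates (log-domain scores).
candidates = {
    "Thus , advice from the hospital plays an important role in this .":
        {"tm_logprob": -4.2, "lm_logprob": -18.1, "word_penalty": -12},
    "Thus , advice from hospital plays the important role for this .":
        {"tm_logprob": -1.0, "lm_logprob": -24.3, "word_penalty": -12},
}
weights = {"tm_logprob": 1.0, "lm_logprob": 0.8, "word_penalty": 0.1}  # from MERT

best = max(candidates, key=lambda t: loglinear_score(candidates[t], weights))
print(best)  # the corrected sentence wins despite a lower translation score
```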

Useful GEC-specific Features

• Introduced by Junczys-Dowmunt and Grundkiewicz (CoNLL-2014 Shared Task; EMNLP 2016):
‣ Word-class language model
‣ Operation sequence model
‣ Edit operations
‣ Sparse edit operation features
‣ A web-scale LM
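As one illustration, a minimal sketch of token-level edit-operation counts (insertions, deletions, substitutions) between a source sentence and a candidate output, using Python's standard difflib; the exact feature definitions in the cited work may differ:

```python
from difflib import SequenceMatcher

def edit_operation_features(source_tokens, target_tokens):
    """Count token-level insertions, deletions, and substitutions; such counts
    serve as dense features in the log-linear model."""
    ops = {"ins": 0, "del": 0, "sub": 0}
    sm = SequenceMatcher(a=source_tokens, b=target_tokens)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "insert":
            ops["ins"] += j2 - j1
        elif tag == "delete":
            ops["del"] += i2 - i1
        elif tag == "replace":
            ops["sub"] += max(i2 - i1, j2 - j1)
    return ops

src = "Thus , advice from hospital plays the important role for this .".split()
hyp = "Thus , advice from the hospital plays an important role in this .".split()
print(edit_operation_features(src, hyp))  # {'ins': 1, 'del': 0, 'sub': 2}
```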

Neural Network Joint Model

• Joint Model (JM) vs. Language Model (LM)

• Feature function:

$f(T, S) = P(T \mid S) \approx \prod_{i=1}^{|T|} P(t_i \mid s_{j-1}, s_j, s_{j+1}, t_{i-1})$

where $s_j$ is the source word aligned to the target word $t_i$.

SRC: The cat sit in a mat.
HYP: The cats sat on the mat.

3+2-gram JM: P(sat | cat, sit, in, cats)
Bigram LM: P(sat | cats)

Neural Network Joint Model

• Uses a feed-forward neural network (Devlin et al., 2014).
• A 5+5-gram NNJM for GEC in Chollampatt et al. (IJCAI 2016 and BEA Workshop 2017).

[Diagram: source context embeddings (E_s) for "cat sit in" and the target history embedding (E_t) for "cats" feed the network, which outputs P(target word | context) over the output vocabulary, e.g., P(sat | cat, sit, in, cats).]
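A minimal PyTorch sketch of such a feed-forward NNJM, sized for the 3+2-gram example above (the deck's system uses a 5+5-gram variant); vocabulary sizes and layer dimensions are placeholders:

```python
import torch
import torch.nn as nn

class NNJM(nn.Module):
    """Feed-forward neural network joint model (Devlin et al., 2014):
    P(t_i | source window around the aligned word, target history)."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=192, hidden=512,
                 src_window=3, tgt_history=1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # E_s
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)   # E_t
        in_dim = (src_window + tgt_history) * emb_dim
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.out = nn.Linear(hidden, tgt_vocab)           # output vocabulary

    def forward(self, src_ctx, tgt_ctx):
        # src_ctx: (batch, src_window) source ids, e.g., [cat, sit, in]
        # tgt_ctx: (batch, tgt_history) target history ids, e.g., [cats]
        x = torch.cat([self.src_emb(src_ctx).flatten(1),
                       self.tgt_emb(tgt_ctx).flatten(1)], dim=1)
        return torch.log_softmax(self.out(self.hidden(x)), dim=-1)

model = NNJM(src_vocab=30000, tgt_vocab=30000)
src = torch.randint(0, 30000, (1, 3))   # ids of [cat, sit, in]
tgt = torch.randint(0, 30000, (1, 1))   # id of [cats]
log_probs = model(src, tgt)             # log P(t_i | context), shape (1, 30000)
```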

NNJM Adaptation

Training: log-likelihood with self-normalization.

Adaptation: a KL-divergence regularization term is added to the loss function.

Adaptation data:
✓ Higher-quality error annotations
✓ Higher error/sentence ratio
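The regularized loss itself did not survive the transcript; the sketch below shows one plausible form, assuming the KL term pulls the adapted model's output distribution toward the base model's (the function and weight are hypothetical):

```python
import torch.nn.functional as F

def adaptation_loss(adapted_log_probs, base_log_probs, target_ids, reg_weight=0.1):
    """Negative log-likelihood on the adaptation data plus a KL term that
    keeps the adapted NNJM close to the base NNJM (one plausible form)."""
    nll = F.nll_loss(adapted_log_probs, target_ids)
    # KL(P_base || P_adapted), averaged over the batch; both inputs are log-probs.
    kl = F.kl_div(adapted_log_probs, base_log_probs,
                  log_target=True, reduction="batchmean")
    return nll + reg_weight * kl
```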

SMT for Spelling Correction

• Added as a post-processing step to the word-level SMT.
• The character-level SMT takes the unknown words left by the word-level SMT system and generates candidates (possibly non-words).
• Rescoring with a language model filters away non-word candidates and picks the best correction based on context.

[Figure: candidate spellings generated by the character-level SMT for a misspelled word.]
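A sketch of this post-processing step; `char_smt_candidates` (character-level SMT decoding) and `lm_score` (sentence-level language model scoring) are hypothetical helpers:

```python
def correct_unknown_word(unknown, left_ctx, right_ctx,
                         char_smt_candidates, vocab, lm_score):
    """Correct one unknown word left untouched by the word-level SMT."""
    # Keep only candidates that are real words, then pick the one the LM
    # prefers in the surrounding context.
    candidates = [c for c in char_smt_candidates(unknown) if c in vocab]
    if not candidates:
        return unknown  # nothing valid survived; keep the original word
    return max(candidates,
               key=lambda c: lm_score(left_ctx + [c] + right_ctx))
```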

Setup

Development data:
‣ 5,458 sentences from NUCLE with at least 1 error per sentence.

Parallel training data for word-level SMT:
‣ Lang-8, NUCLE (2.21M sentences, 26.77M source words)

Data for character-level SMT:
‣ Unique words on the corrected side of NUCLE and the corpora of misspellings (http://www.dcs.bbk.ac.uk/~ROGER/corpora.html)

LM training data:
‣ Wikipedia (1.78B tokens), Common Crawl (94B tokens)

Results

F0.5 on the CoNLL-2014 test set:

SMT-GEC: 43.16
+ GEC features: 45.90
+ web-scale LM: 49.25
+ adapted NNJM: 51.70
+ SMT for spelling: 53.14

R&R (2016): 47.40
J&G (2016): 49.52

R&R (2016): Rozovskaya and Roth (ACL 2016); J&G (2016): Junczys-Dowmunt and Grundkiewicz (EMNLP 2016)

Multilayer Convolutional Encoder and Decoder Neural Network for GEC

Encoder-Decoder Approach

[Diagram: an encoder reads the input sentence; the decoder, connected to the encoder through attention, produces the output sentence.]

• Prior work in GEC: recurrent neural network (RNN)-based approaches (Bahdanau et al., 2015).
• We use a fully convolutional neural network (CNN)-based approach (Gehring et al., 2017).

A Multilayer Convolutional Encoder-Decoder

Encoder: consists of seven layers.

Convolution operation:
$\mathbf{f}_i^l = \mathrm{Conv}(\mathbf{h}_{i-1}^{l-1}, \mathbf{h}_i^{l-1}, \mathbf{h}_{i+1}^{l-1})$

Gated linear units (GLUs):
$\mathrm{GLU}(\mathbf{f}_i^l) = \mathbf{f}_{i,1:d}^l \odot \sigma(\mathbf{f}_{i,d+1:2d}^l)$

Residual connections:
$\mathbf{h}_i^l = \mathrm{GLU}(\mathbf{f}_i^l) + \mathbf{h}_i^{l-1}$
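A minimal PyTorch sketch of one such encoder layer: a width-3 convolution producing 2d channels, a GLU gate halving them back to d, and a residual connection; the dimensions are placeholders:

```python
import torch
import torch.nn as nn

class ConvGLULayer(nn.Module):
    """One encoder layer: convolution over (h_{i-1}, h_i, h_{i+1}) from the
    previous layer, a gated linear unit, and a residual connection."""
    def __init__(self, d):
        super().__init__()
        self.conv = nn.Conv1d(d, 2 * d, kernel_size=3, padding=1)

    def forward(self, h):                    # h: (batch, d, src_len)
        f = self.conv(h)                     # f_i^l, 2d channels
        a, b = f.chunk(2, dim=1)             # split into f_{1:d} and f_{d+1:2d}
        return a * torch.sigmoid(b) + h      # GLU(f_i^l) + h_i^{l-1}

d = 512
encoder = nn.Sequential(*[ConvGLULayer(d) for _ in range(7)])  # seven layers
h = torch.randn(1, d, 10)   # projected embeddings of a 10-word source sentence
s = encoder(h)              # encoder output states s_i
```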

A Multilayer Convolutional Encoder-Decoder

Decoder: consists of seven layers of convolutions and non-linearities, plus attention:

$\alpha_{j,i}^l = \dfrac{\exp(\mathbf{e}_i^\top \mathbf{z}_j^l)}{\sum_{k=1}^{|S|} \exp(\mathbf{e}_k^\top \mathbf{z}_j^l)}$

$\mathbf{x}_j^l = \sum_{i=1}^{|S|} \alpha_{j,i}^l \, (\mathbf{e}_i + \mathbf{s}_i)$
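In code, this attention is a dot-product softmax over source positions; a sketch assuming $\mathbf{e}_i$ are source embeddings, $\mathbf{s}_i$ encoder outputs, and $\mathbf{z}_j^l$ the decoder layer's states:

```python
import torch

def conv_attention(z, e, s):
    """z: (tgt_len, d) decoder states z_j^l; e: (src_len, d) source embeddings
    e_i; s: (src_len, d) encoder outputs s_i. Returns context vectors x_j^l."""
    scores = z @ e.T                       # e_i^T z_j^l for every (j, i) pair
    alpha = torch.softmax(scores, dim=-1)  # alpha_{j,i}^l, rows sum to 1
    return alpha @ (e + s)                 # x_j^l = sum_i alpha_{j,i}^l (e_i + s_i)

z = torch.randn(8, 512)      # decoder states for 8 target positions
e = torch.randn(10, 512)     # source word embeddings
s = torch.randn(10, 512)     # encoder outputs
x = conv_attention(z, e, s)  # (8, 512) contexts added back to decoder states
```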

Pre-training Word Embeddings

• Word embeddings are pre-trained and used for initialization.
• Trained using fastText (Bojanowski et al., 2017) on Wikipedia.
• Uses the underlying character n-gram sequences of words.

Advantages:
✓ Reliable embeddings can be constructed for rarer words.
✓ The morphology of words is taken into account.
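A sketch using the fastText Python package; the corpus path is a placeholder, and dim=500 matches the embedding size given in the model details below:

```python
import fasttext  # pip install fasttext

# Train subword-aware skip-gram embeddings on a plain-text Wikipedia dump
# ("wiki.txt" is a placeholder path).
ft = fasttext.train_unsupervised("wiki.txt", model="skipgram", dim=500)

# Character n-grams let fastText compose a vector even for rare or unseen
# words, which is what makes the pre-trained initialization reliable.
vec = ft.get_word_vector("misspeling")  # still returns a 500-d vector
```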

Ensembling and Re-scoring

• Ensembling of multiple models: the log probabilities of the models are averaged during the prediction of each output word.

• The final beam candidates are re-scored using features:
  • Edit operations (EO): number of insertions, deletions, and substitutions
  • Language model (LM): web-scale LM score, number of words

• Feature weight tuning is done as in SMT: MERT optimizing F0.5 on the development data.
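A schematic sketch of the ensembling step; the model interface is assumed (each model returns log probabilities over the output vocabulary for the next word):

```python
import torch

def ensemble_log_probs(models, decoder_input, encoder_states):
    """Average the per-word log probabilities of independently trained models
    at each decoding step; beam search then proceeds on the averaged scores."""
    all_lp = [m(decoder_input, encoder_states) for m in models]  # each (batch, V)
    return torch.stack(all_lp, dim=0).mean(dim=0)                # (batch, V)
```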

Model and Training Details

• Data: as in Chollampatt and Ng (BEA 2017), except that only annotated sentence pairs are used during training.
• Vocabulary: 30K most frequent words on the source and target sides.
• Embedding dimensions: 500.
• Encoder/decoder output vector dimensions: 1024.

Results

F0.5 on the CoNLL-2014 test set:

Multilayer conv enc-dec: 45.36
+ pre-training embeddings: 46.38
+ ensemble of 4 models: 49.33
+ re-scoring (EO, LM): 54.13

Chollampatt and Ng (2017): 53.14
Ji et al. (2017) without LM: 41.53
Ji et al. (2017): 45.15
Schmaltz et al. (2017): 41.37

Challenges and Future Work

• Lack of good-quality parallel data.
• Going beyond the sentence level.
• Adaptation to diverse learners.

Thank You

Email: shamil@u.nus.edu
Website: shamilcm.github.io
