cs388: natural language processing lecture 15:...
TRANSCRIPT
![Page 1: CS388: Natural Language Processing Lecture 15: Attention gdurrett/courses/fa2018/lectures/lec15-1pp.pdf Regex Prediction ‣ Can use for other semantic parsing-like tasks ‣ Predict regex](https://reader033.vdocuments.us/reader033/viewer/2022042909/5f3c613ac69a4526c33895b0/html5/thumbnails/1.jpg)
CS388: Natural Language Processing Lecture 15: Attention
Greg Durrett
This Lecture
‣ Graham Neubig (CMU) talk this Friday at 11am in 6.302: “Towards Open-domain Generation of Programs from Natural Language”
‣ Mini 2 graded by this weekend
‣ Project 2 out by the end of today; due *Friday* November 2
Recall: Seq2seq Model
‣ Generate next word conditioned on previous word as well as hidden state
[Figure: encoder input “the movie was great”; decoder input “<s>”; decoder hidden state $\bar{h}$]
$P(y \mid x) = \prod_{i=1}^{n} P(y_i \mid x, y_1, \ldots, y_{i-1})$
$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W\bar{h})$
‣ W size is |vocab| x |hidden state|, softmax over entire vocabulary
Decoder has separate parameters from encoder, so this can learn to be a language model (produce a plausible next word given current one)
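Recall: Seq2seq Training
‣ Objective: maximize $\sum_{(x,y)} \sum_{i=1}^{n} \log P(y_i^* \mid x, y_1^*, \ldots, y_{i-1}^*)$
[Figure: encoder input “the movie was great”; decoder inputs “<s> le film était bon”; outputs shown include “le”, “était”, “[STOP]”]
‣ One loss term for each target-sentence word; feed the correct word regardless of the model’s prediction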
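The training objective above can be sketched in plain Python. This is a minimal illustration, assuming the decoder has already produced one vector of vocabulary scores per target position with the gold prefix fed in (teacher forcing). The names `softmax` and `sequence_log_likelihood` and the toy numbers are made up for the example.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_log_likelihood(step_scores, gold_ids):
    """Sum of log P(y*_i | x, y*_1..y*_{i-1}) over target positions.

    step_scores[i] holds the vocabulary scores at step i, computed with the
    gold prefix fed to the decoder; gold_ids[i] is the index of y*_i.
    """
    total = 0.0
    for scores, gold in zip(step_scores, gold_ids):
        probs = softmax(scores)
        total += math.log(probs[gold])  # one loss term per target word
    return total

# Toy example (hypothetical values): 3-word vocabulary, 2 target positions.
toy_scores = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
toy_gold = [0, 2]
ll = sequence_log_likelihood(toy_scores, toy_gold)
```

Summing this quantity over all (x, y) pairs in the training set gives the objective being maximized.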
Recall: Semantic Parsing as Translation
Jia and Liang (2016)
‣ Write down a linearized form of the semantic parse, train seq2seq models to directly translate into this representation
‣ Might not produce well-formed logical forms, might require lots of data
“what states border Texas”
lambda x ( state ( x ) and border ( x , e89 ) ) )
‣ No need to have an explicit grammar, simplifies algorithms
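Because a seq2seq decoder emits the logical form token by token with no grammar, nothing guarantees the output is well-formed. One cheap check, sketched here as an assumption rather than anything from the lecture, is to verify that parentheses balance in the predicted token sequence. The function name and both example strings are hypothetical, modeled on the slide's example.

```python
def is_well_formed(tokens):
    """Return True if the parentheses in a token list are balanced."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
    return depth == 0

# Hypothetical decoder outputs, split on whitespace like the linearized form.
good = "lambda x ( state ( x ) and border ( x , e89 ) )".split()
bad = "lambda x ( state ( x ) and border ( x , e89 )".split()
```

A check like this only catches bracket errors; it says nothing about whether the predicates or their arguments are valid.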
Regex Prediction
‣ Can use for other semantic parsing-like tasks
‣ Predict regex from text
‣ Problem: requires a lot of data: 10,000 examples needed to get ~60% accuracy on pretty simple regexes
Locascio et al. (2016)
SQL Generation
‣ Convert natural language description into a SQL query against some DB
‣ How to ensure that well-formed SQL is generated?
Zhong et al. (2017)
‣ Three seq2seq models
‣ How to capture column names + constants?
‣ Pointer mechanisms
This Lecture
‣ Transformers
‣ Attention
‣ Copying
Attention
Problems with Seq2seq Models
‣ Need some notion of input coverage or what input words we’ve translated
‣ Encoder-decoder models like to repeat themselves:
Un garçon joue dans la neige → A boy plays in the snow boy plays boy plays
‣ Often a byproduct of training these models poorly
Problems with Seq2seq Models
‣ Unknown words:
‣ No matter how much data you have, you’ll need some mechanism to copy a word like Pont-de-Buis from the source to target
Problems with Seq2seq Models
‣ Bad at long sentences: 1) a fixed-size representation doesn’t scale; 2) LSTMs still have a hard time remembering for really long periods of time
RNNsearch: introduces attention mechanism to give “variable-sized” representation
Bahdanau et al. (2014)
Aligned Inputs
‣ Suppose we knew the source and target would be word-by-word translated
‣ Can look at the corresponding input word when translating; this could scale!
‣ Much less burden on the hidden state
‣ How can we achieve this without hardcoding it?
[Figure: source “the movie was great” aligned word by word with target “le film était bon [STOP]”; decoder inputs “<s> le film était bon”]
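Attention
‣ At each decoder state, compute a distribution over source inputs based on current decoder state
‣ Use that in output layer
[Figure: encoder input “the movie was great”; decoder input “<s>”, output “le”; attention distributions over “the movie was great” at successive decoder steps]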
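Attention
‣ For each decoder state, compute weighted sum of input states
‣ Unnormalized scalar weight: $e_{ij} = f(\bar{h}_i, h_j)$
‣ Normalized scalar weight: $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$
‣ Weighted sum of input hidden states (vector): $c_i = \sum_j \alpha_{ij} h_j$
$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W[c_i; \bar{h}_i])$
‣ No attn: $P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W\bar{h}_i)$
[Figure: encoder states $h_1, h_2, h_3, h_4$ over “the movie was great”; decoder state $\bar{h}_1$ at “<s>”; context vector $c_1$; output “le”]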
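The three attention steps above (scores, normalization, weighted sum) can be sketched in a few lines of plain Python. This sketch assumes the dot product as the scoring function f, which is only one of several options. The function names and the toy vectors are illustrative.

```python
import math

def softmax(scores):
    """Normalize scores into a distribution (the alpha_ij)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(decoder_state, encoder_states):
    """Return (alpha_i, c_i) for one decoder state h-bar_i.

    Assumes f is a dot product, so encoder and decoder states must have
    the same dimension.
    """
    scores = [dot(decoder_state, h_j) for h_j in encoder_states]  # e_ij
    weights = softmax(scores)                                     # alpha_ij
    dim = len(encoder_states[0])
    context = [sum(a * h_j[d] for a, h_j in zip(weights, encoder_states))
               for d in range(dim)]                               # c_i
    return weights, context

# Toy example (hypothetical values): three encoder states of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([0.0, 3.0], encoder_states)
```

The context vector would then be concatenated with the decoder state to form $[c_i; \bar{h}_i]$ before the output softmax.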
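Attention
$e_{ij} = f(\bar{h}_i, h_j)$
$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$
$c_i = \sum_j \alpha_{ij} h_j$
‣ Note that this all uses outputs of hidden layers
‣ Bahdanau+ (2014): additive: $f(\bar{h}_i, h_j) = \tanh(W[\bar{h}_i, h_j])$
‣ Luong+ (2015): dot product: $f(\bar{h}_i, h_j) = \bar{h}_i \cdot h_j$
‣ Luong+ (2015): bilinear: $f(\bar{h}_i, h_j) = \bar{h}_i^\top W h_j$
Luong et al. (2015)
[Figure: decoder state $\bar{h}_1$ at “<s>”; context vector $c_1$; output “le”]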
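The three scoring functions can be compared side by side in plain Python. This is a sketch under stated assumptions: for the additive form, `w` is taken to be a single weight vector over the concatenation $[\bar{h}_i, h_j]$ so that the score is a scalar, matching the slide's formula. All function and parameter names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]

def score_additive(h_dec, h_enc, w):
    """tanh(W [h-bar_i, h_j]); w is a weight vector, so the result is a scalar."""
    return math.tanh(dot(w, h_dec + h_enc))  # list + list concatenates

def score_dot(h_dec, h_enc):
    """h-bar_i . h_j; requires equal dimensions, has no parameters."""
    return dot(h_dec, h_enc)

def score_bilinear(h_dec, h_enc, W):
    """h-bar_i^T W h_j; W lets the two dimensions differ."""
    return dot(h_dec, matvec(W, h_enc))
```

With `W` set to the identity matrix, the bilinear score reduces to the dot product, which is one way to see the dot product as the parameter-free special case.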
What can attention do?
‣ Learning to copy: how might this work?
Luong et al. (2015)
[Figure: input sequence “0 3 2 1” copied to output sequence “0 3 2 1”]
‣ LSTM can learn to count with the right weight matrix
‣ This is effectively position-based addressing
What can attention do?
‣ Learning to subsample tokens
Luong et al. (2015)
[Figure: input sequence “0 3 2 1” subsampled to output sequence “3 1”]
‣ Need to count (for ordering) and also determine which tokens are in/out
‣ Content-based addressing
Attention
‣ Decoder hidden states are now mostly responsible for selecting what to attend to
‣ Doesn’t take a complex hidden state to walk monotonically through a sentence and spit out word-by-word translations
‣ Encoder hidden states capture contextual source word identity
Batching Attention
Luong et al. (2015)
[Figure: encoder over “the movie was great”; decoder at “<s>”]
token outputs: batch size x sentence length x dimension
sentence outputs: batch size x hidden size
hidden state: batch size x dimension
$e_{ij} = f(\bar{h}_i, h_j)$
$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$
attention scores = batch size x sentence length
$c_i = \sum_j \alpha_{ij} h_j$
c = batch size x hidden size
‣ Make sure tensors are the right size!
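The shape bookkeeping above can be checked with a small sketch that uses nested Python lists in place of tensors. It assumes dot-product scoring and that the decoder hidden size equals the encoder output dimension. The helper `shape`, the function `batched_attention`, and the toy sizes are all illustrative.

```python
import math

def shape(t):
    """Shape of a rectangular nested list, e.g. (batch, length, dim)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

def batched_attention(hidden, token_outputs):
    """hidden: [batch][dim]; token_outputs: [batch][length][dim]."""
    all_weights, all_contexts = [], []
    for h_bar, states in zip(hidden, token_outputs):
        scores = [sum(a * b for a, b in zip(h_bar, h)) for h in states]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]  # one weight per source token
        context = [sum(w * h[d] for w, h in zip(weights, states))
                   for d in range(len(h_bar))]
        all_weights.append(weights)
        all_contexts.append(context)
    return all_weights, all_contexts

# Toy sizes (hypothetical): batch of 2, sentences of length 4, dimension 3.
batch, length, dim = 2, 4, 3
token_outputs = [[[0.1 * (b + t + d) for d in range(dim)]
                  for t in range(length)] for b in range(batch)]
hidden = [[1.0] * dim for _ in range(batch)]
weights, contexts = batched_attention(hidden, token_outputs)

# The shape checks the slide asks for.
assert shape(weights) == (batch, length)   # attention scores
assert shape(contexts) == (batch, dim)     # context vectors c
```

In a tensor library the per-example loop would be replaced by a batched matrix multiply, but the shapes to assert stay the same.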
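Alternatives
‣ When do we compute attention? Can compute before or after RNN cell
‣ After RNN cell
Luong et al. (2015)
‣ Before RNN cell; this one is a little more convoluted and less standard
Bahdanau et al. (2015)
[Figure: two decoder diagrams at “<s>” with output “le”: one with $\bar{h}_1$ and $c_1$, the other with $c_1$ only]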
Results
‣ Machine translation: BLEU score of 14.0 on English-German -> 16.8 with attention, 19.0 with smarter attention (we’ll come back to this later)
‣ Summarization/headline generation: bigram recall from 11% -> 15%
‣ Semantic parsing: ~30% accuracy -> 70+% accuracy on Geoquery
Luong et al. (2015), Chopra et al. (2016), Jia and Liang (2016)
Copying Input/Pointers
Unknown Words
Jean et al. (2015), Luong et al. (2015)
‣ Want to be able to copy named entities like Pont-de-Buis
$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W[c_i; \bar{h}_i])$, with $c_i$ from attention and $\bar{h}_i$ from RNN hidden state
‣ Still can only generate from the vocabulary
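Copying
[Figure: output vocabulary of “normal” words (“the”, “a”, “zebra”, …) extended with input words (“Pont-de-Buis”, “ecotax”)]
‣ Vocabulary contains “normal” vocab as well as words in input. Normalizes over both of these:
$P(y_i = w \mid x, y_1, \ldots, y_{i-1}) \propto \begin{cases} \exp W_w[c_i; \bar{h}_i] & \text{if } w \text{ in vocab} \\ \exp h_j^\top V \bar{h}_i & \text{if } w = x_j \end{cases}$
‣ Bilinear function of input representation + output hidden state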
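The joint normalization over vocabulary words and input words can be sketched as follows. One assumption goes beyond the slide: when a word is both in the vocabulary and in the input, its two exponentiated scores are summed. The function name, parameter names, and toy values are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def copy_distribution(vocab, W, c, h_bar, input_words, input_states, V):
    """Distribution over vocab words plus input words, normalized jointly."""
    feats = c + h_bar  # the concatenation [c_i; h-bar_i]
    mass = {}
    for w, row in zip(vocab, W):
        # Generation score: exp(W_w [c_i; h-bar_i])
        mass[w] = mass.get(w, 0.0) + math.exp(dot(row, feats))
    Vh = matvec(V, h_bar)
    for x_j, h_j in zip(input_words, input_states):
        # Copy score: exp(h_j^T V h-bar_i), bilinear in the two states
        # (assumption: mass is summed if x_j already has an entry)
        mass[x_j] = mass.get(x_j, 0.0) + math.exp(dot(h_j, Vh))
    z = sum(mass.values())
    return {w: m / z for w, m in mass.items()}

# Toy example (hypothetical values): c and h-bar of dimension 2.
vocab = ["the", "a", "zebra"]
W = [[0.1, 0.2, 0.0, 0.3], [0.0, 0.1, 0.2, 0.1], [0.3, 0.0, 0.1, 0.0]]
c, h_bar = [0.5, -0.5], [1.0, 0.0]
input_words = ["Pont-de-Buis", "ecotax"]
input_states = [[2.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
dist = copy_distribution(vocab, W, c, h_bar, input_words, input_states, V)
```

In this toy setup the out-of-vocabulary word “Pont-de-Buis” receives the most probability, which a softmax over the fixed vocabulary alone could never assign.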
Pointer Networks
Vinyals et al. (2015)
‣ Only point to the input, don’t have any notion of vocabulary
‣ Used for tasks including summarization and sentence ordering
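A pointer-style output layer can be sketched as a softmax over input positions only, with no vocabulary involved. This sketch assumes dot-product scoring for brevity, which is a simplification and not necessarily the scoring function used by Vinyals et al. The function names and toy values are illustrative.

```python
import math

def pointer_distribution(h_bar, input_states):
    """Distribution over input positions; its size follows the input length."""
    scores = [sum(a * b for a, b in zip(h_bar, h_j)) for h_j in input_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def point(h_bar, input_states):
    """Index of the input position the decoder points to (greedy choice)."""
    probs = pointer_distribution(h_bar, input_states)
    return max(range(len(probs)), key=lambda j: probs[j])

# Toy example (hypothetical values): three input states of dimension 2.
input_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = pointer_distribution([-1.0, 2.0], input_states)
chosen = point([-1.0, 2.0], input_states)
```

Because the output is an index into the input, the same model handles inputs of any length, which is what makes it a fit for tasks such as ordering sentences.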
Results
Jia and Liang (2016)
‣ In many settings, attention can roughly do the same things as copying
‣ For semantic parsing, copying tokens from the input (texas) can be very useful
Transformers
Self-Attention
[Figure: input sentence “the movie was great”]
‣ LSTM abstraction: maps each vector in a sentence to a new, context-aware vector
‣ CNNs did something similar with filters
‣ Attention can give us a third way to do this
Vaswani et al. (2017)
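Self-Attention
Vaswani et al. (2017)
[Figure: input sentence “the movie was great”; $x_4$ mapped to $x'_4$]
‣ Each word forms a “query” which then computes attention over each word
$\alpha_{i,j} = \mathrm{softmax}(x_i^\top x_j)$ (scalar)
$x'_i = \sum_{j=1}^{n} \alpha_{i,j} x_j$ (vector = sum of scalar * vector)
‣ Multiple “heads” analogous to different convolutional filters. Use parameters $W_k$ and $V_k$ to get different attention values + transform vectors
$\alpha_{k,i,j} = \mathrm{softmax}(x_i^\top W_k x_j)$
$x'_{k,i} = \sum_{j=1}^{n} \alpha_{k,i,j} V_k x_j$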
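Both forms of self-attention above fit in one small function. This is a sketch in plain Python, where passing no matrices gives the parameter-free version and passing a pair $(W_k, V_k)$ gives one head. The softmax is taken over $j$ for each fixed $i$. The slide does not say how heads are combined, so `multi_head` simply returns one output per head. All names and values are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def self_attention(xs, W=None, V=None):
    """alpha_ij = softmax_j(x_i^T W x_j); x'_i = sum_j alpha_ij V x_j.

    With W and V left as None, this is the plain version with no parameters.
    """
    keys = xs if W is None else [matvec(W, x) for x in xs]
    values = xs if V is None else [matvec(V, x) for x in xs]
    out = []
    for x_i in xs:  # each word acts as a query over all words
        alphas = softmax([dot(x_i, k) for k in keys])
        out.append([sum(a * v[d] for a, v in zip(alphas, values))
                    for d in range(len(values[0]))])
    return out

def multi_head(xs, heads):
    """Run one self-attention per (W_k, V_k) pair; returns a list per head."""
    return [self_attention(xs, W_k, V_k) for W_k, V_k in heads]

# Toy example (hypothetical values): three word vectors of dimension 2.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
plain = self_attention(xs)
per_head = multi_head(xs, [(identity, identity), (identity, double)])
```

Each head sees the same inputs but can attend differently through $W_k$ and transform the result differently through $V_k$, much as different convolutional filters extract different features.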
Deep Transformers
‣ Supervised: transformer can replace LSTM; will revisit this when we discuss MT
‣ Unsupervised: transformers work better than LSTM for unsupervised pre-training of embeddings: predict word given context words
‣ Devlin et al., October 11, 2018: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
‣ Stronger than similar methods, SOTA on ~11 tasks (including NER: 92.8 F1)
Takeaways
‣ Attention is very helpful for seq2seq models
‣ Used for tasks including summarization and sentence ordering
‣ Explicitly copying input can be beneficial as well
‣ Transformers are strong models we’ll come back to later
Where are we going
‣ We’ve now talked about most of the important core tools for NLP
‣ Rest of the class: more focused on applications
‣ Information extraction, then MT, then a grab bag of things