Neural Dialog
Shrimai Prabhumoye, Alan W Black
Speech Processing 11-[468]92
Review
• Task Oriented Systems
  • Intents, slots, actions and response
• Non-Task Oriented Systems
  • No agenda, for fun
• Building dialog systems
  • Rule-Based Systems
  • Eliza
• Retrieval Techniques
  • Representations: TF-IDF, n-grams, the words themselves
  • Similarity Measures: Jaccard, cosine, Euclidean distance
  • Limitations: fixed set of responses, no variation in response
Review
• Task Oriented Systems
• Non-Task Oriented Systems
• Building dialog systems
• Retrieval Techniques
  • Representation: word vectors
  • Similarity Measures
  • Limitations: fixed set of responses, no variation in response
• Generative Models
Overview
• Word Embeddings
• Language Modelling
• Recurrent Neural Networks
• Sequence to Sequence Models
• How to Build a Dialog System
• Issues and Examples
• Alexa Prize
Neural Dialog
• We want to model: $P(\mathit{response} \mid \mathit{input})$
• How do we represent sentences (response, input)?
• How do we build a language model?
• How do we represent words (word embeddings)?
Natural Language Processing
• Typical preprocessing steps:
  o Form a vocabulary of words that maps each word to a unique ID
  o Different criteria can be used to select which words are part of the vocabulary (e.g., a frequency threshold)
  o All words not in the vocabulary are mapped to a special 'out-of-vocabulary' token
• Typical vocabulary sizes vary between 10,000 and 250,000
(Salakhutdinov, 2017)
Preprocessing Techniques
• Tokenization
  • "I am a girl." is tokenized to "I", "am", "a", "girl", "."
• Lowercase all words
• Remove stop words
  • E.g., "the", "a", "and", etc.
• Frequency of words
  • Set a threshold and map all words below this frequency to UNK
• Add <START> and <EOS> tags at the beginning and end of each sentence
(Salakhutdinov, 2017)
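A minimal sketch (plain Python; the function name and toy corpus are illustrative) of the pipeline above: tokenize, lowercase, map rare words to UNK, and add the tags. Stop-word removal is omitted for brevity.

```python
# Preprocessing sketch: tokenize, lowercase, UNK-threshold, add tags.
from collections import Counter

def preprocess(sentences, min_freq=2):
    # Naive tokenization: lowercase and split off the final period.
    tokenized = [s.lower().replace(".", " .").split() for s in sentences]
    # Keep only words at or above the frequency threshold in the vocabulary.
    counts = Counter(w for sent in tokenized for w in sent)
    vocab = {w for w, c in counts.items() if c >= min_freq}
    # Replace out-of-vocabulary words with UNK and add <START>/<EOS> tags.
    return [
        ["<START>"] + [w if w in vocab else "UNK" for w in sent] + ["<EOS>"]
        for sent in tokenized
    ]

print(preprocess(["I am a girl.", "I am a robot."]))
# [['<START>', 'i', 'am', 'a', 'UNK', '.', '<EOS>'], ...]
```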
Vocabulary
One-Hot Encoding
• From its word ID, we get a basic representation of a word through the one-hot encoding of the ID
• The one-hot vector of an ID is a vector filled with 0s, except for a 1 at the position associated with the ID
• For vocabulary size D = 10, the one-hot vector of word ID w = 4 is:
  $e(w) = [0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0]$
(Salakhutdinov, 2017)
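A minimal sketch (numpy) of the encoding above, reproducing the slide's example; word IDs are assumed to be 1-indexed to match the position of the 1 shown on the slide.

```python
# One-hot encoding: all zeros except a 1 at the word ID's position.
import numpy as np

def one_hot(word_id, vocab_size):
    e = np.zeros(vocab_size, dtype=int)
    e[word_id - 1] = 1           # IDs assumed 1-indexed, as on the slide
    return e

print(one_hot(4, 10))            # [0 0 0 1 0 0 0 0 0 0]
```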
Limitations of One-Hot Encoding
• A one-hot encoding makes no assumption about word similarity.
  o ["working", "on", "Friday", "is", "tiring"] does not appear in our training set.
  o ["working", "on", "Monday", "is", "tiring"] is in the training set.
  o We want to model $P(\text{"tiring"} \mid \text{"working"}, \text{"on"}, \text{"Friday"}, \text{"is"})$
  o If the word representations of "Monday" and "Friday" are similar, the model can generalize from one sentence to the other.
(Salakhutdinov, 2017)
Limitations of One-Hot Encoding
• The major problem with the one-hot representation is that it is very high-dimensional
  o The dimensionality of e(w) is the size of the vocabulary
  o A typical vocabulary size is ≈ 100,000
  o A window of 10 words would correspond to an input vector of at least 1,000,000 units!
(Salakhutdinov, 2017)
Continuous Representation of Words
• Each word w is associated with a real-valued vector C(w)
• A typical word-embedding size is 300 or more.
(Salakhutdinov, 2017)
Continuous Representation of Words
• We would like the distance ||C(w) − C(w')|| to reflect meaningful similarities between words
(Salakhutdinov, 2017)
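A minimal sketch (numpy; the 3-dimensional vectors below are invented for illustration, real embeddings have 300+ dimensions) of what the distance ||C(w) − C(w')|| should look like for similar vs. dissimilar words.

```python
# Compare word embeddings C(w) by Euclidean distance.
import numpy as np

C = {
    "monday": np.array([0.9, 0.1, 0.3]),   # toy embeddings; values are
    "friday": np.array([0.8, 0.2, 0.3]),   # illustrative only
    "tiring": np.array([-0.5, 0.9, 0.1]),
}

def euclidean(w1, w2):
    return np.linalg.norm(C[w1] - C[w2])

print(euclidean("monday", "friday"))   # small: the two days behave alike
print(euclidean("monday", "tiring"))   # larger: different kinds of words
```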
Language Modeling
• A language model allows us to predict the probability of observing a sentence (in a given dataset) as:
  $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$
• Here the length of the sentence is n.
• We will build a language model using a Recurrent Neural Network.
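A minimal sketch (plain Python) of the chain rule above: the sentence probability is the product of per-word conditional probabilities, computed in log space for numerical stability. The `cond_prob` argument is a hypothetical stand-in for a trained model.

```python
# Sentence probability via the chain rule: sum of log conditionals.
import math

def sentence_log_prob(words, cond_prob):
    # cond_prob(word, context) -> P(x_i | x_1, ..., x_{i-1})
    total = 0.0
    for i, w in enumerate(words):
        total += math.log(cond_prob(w, words[:i]))
    return total  # log P(x_1, ..., x_n); exponentiate for the probability

uniform = lambda w, ctx: 1.0 / 10000   # toy model: uniform over 10k words
print(sentence_log_prob(["i", "hate", "this", "movie"], uniform))
```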
Word Embeddings from Language Models
(Neubig, 2017)
Continuous Bag of Words (CBOW)
• Predict a word based on the sum of the surrounding words' embeddings
(Neubig, 2017)
Skip-gram
• Use the current word to predict the surrounding window of context words
(Neubig, 2017)
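A minimal sketch of training both objectives with the gensim library (an assumption: gensim is not mentioned in the slides, and the toy corpus and hyperparameters are illustrative).

```python
# CBOW vs. skip-gram word embeddings with gensim's Word2Vec.
from gensim.models import Word2Vec

corpus = [
    ["working", "on", "monday", "is", "tiring"],
    ["working", "on", "friday", "is", "tiring"],
    ["i", "hate", "this", "movie"],
]

# sg=0 -> CBOW (predict word from summed context); sg=1 -> skip-gram
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["friday"][:5])                   # 50-d embedding for "friday"
print(cbow.wv.similarity("monday", "friday"))  # cosine similarity of the days
```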
BERT (Bidirectional Encoder Representations from Transformers)
• BERT is a method of pretraining language representations
• Data: Wikipedia (2.5B words) + BookCorpus (800M words)
• Mask out k% of the input words, and then predict the masked words
• Word embedding size: 768
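A minimal sketch of BERT's masked-word prediction using the Hugging Face `transformers` library (an assumption: these slides predate that library; the example sentence is the one from the one-hot slides).

```python
# Predict a masked word with a pretrained BERT masked language model.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("working on [MASK] is tiring", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # (1, seq_len, vocab_size)

# Locate the [MASK] position and take the 5 most likely fillers.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top5 = logits[0, mask_pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```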
Use of Word Embeddings
• To represent a sentence
• As input to a neural network
• To understand properties of words
  • Part of speech
  • Do two words mean the same thing?
  • Semantic relation (is-a, part-of, went-to-school-at)?
NLP and Sequential Data
• NLP is full of sequential data
  • Characters in words
  • Words in sentences
  • Sentences in discourse
  • …
(Neubig, 2017)
Long-distance Dependencies in Language
• Agreement in number, gender, etc.
  • He does not have very much confidence in himself.
  • She does not have very much confidence in herself.
• Selectional preference
  • The reign has lasted as long as the life of the queen.
  • The rain has lasted as long as the life of the clouds.
(Neubig, 2017)
Recurrent Neural Networks
• Tools to remember information
[Figure: a feed-forward NN compared with a recurrent NN, whose hidden state feeds back into itself]
(Neubig, 2017)
Unrolling in Time
• What does processing a sequence look like?
[Figure: the RNN unrolled over the input "I hate this movie", with an RNN cell, a predict step, and a label at each token]
(Neubig, 2017)
Training RNNs
[Figure: the RNN unrolled over "I hate this movie"; at each step the prediction (Prediction 1-4) is compared against its label (Label 1-4) to give a per-step loss (Loss 1-4), and the losses are summed into the total loss]
(Neubig, 2017)
What can RNNs do?
• Represent a sentence
  • Read the whole sentence, make a prediction
• Represent a context within a sentence
  • Read the context up until that point
(Neubig, 2017)
Representing a sentence
• $h_4$ is the representation of the sentence
• $h_4$ is the representation of the probability of observing "I hate this movie"
[Figure: the RNN unrolled over "I hate this movie", producing hidden states $h_0, h_1, h_2, h_3, h_4$]
(Neubig, 2017)
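A minimal sketch (PyTorch is an assumption of mine, as are the toy vocabulary and layer sizes) of using an RNN's final hidden state $h_4$ as the sentence representation, as in the figure above.

```python
# Use the final RNN hidden state as a fixed-size sentence representation.
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "I": 1, "hate": 2, "this": 3, "movie": 4}
emb = nn.Embedding(len(vocab), 16)   # C(w): word IDs -> 16-d vectors
rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)

ids = torch.tensor([[vocab["I"], vocab["hate"], vocab["this"], vocab["movie"]]])
outputs, h_last = rnn(emb(ids))      # h_last: (1, batch, 32), the final state
sentence_repr = h_last[0, 0]         # h_4: the whole-sentence representation
print(sentence_repr.shape)           # torch.Size([32])
```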
Language Modeling using RNN
[Figure: the RNN unrolled over "<start> I hate this movie"; at each step the network predicts the next word: I, hate, this, movie, <end>]
(Neubig, 2017)
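A minimal sketch (assumed PyTorch; toy sizes and word IDs) of the RNN language model in the figure: at each step the hidden state predicts the next word, and the per-step losses are summed as in the training slide.

```python
# RNN language model: predict the next word at every time step.
import torch
import torch.nn as nn

vocab_size, emb_size, hidden = 10, 16, 32
emb = nn.Embedding(vocab_size, emb_size)
rnn = nn.RNN(emb_size, hidden, batch_first=True)
out = nn.Linear(hidden, vocab_size)          # V s_t, then softmax

# "<start> I hate this" should predict "I hate this movie"
x = torch.tensor([[0, 1, 2, 3]])             # input word IDs (toy)
y = torch.tensor([[1, 2, 3, 4]])             # target = input shifted by one

states, _ = rnn(emb(x))                      # (1, 4, hidden)
logits = out(states)                         # (1, 4, vocab_size)
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
loss.backward()                              # total loss over all time steps
```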
Bidirectional RNNs
• A simple extension: run the RNN in both directions
[Figure: a forward RNN and a backward RNN over the sentence; at each position the two hidden states are combined to produce Prediction 1 through Prediction 4]
(Neubig, 2017)
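A minimal sketch (assumed PyTorch) of the extension: one pass runs left-to-right, one right-to-left, and each token gets both hidden states.

```python
# Bidirectional RNN: forward and backward states at every position.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True,
             bidirectional=True)
x = torch.randn(1, 4, 16)        # a 4-token sentence of 16-d embeddings
outputs, _ = rnn(x)
print(outputs.shape)             # (1, 4, 64): forward + backward concatenated
```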
Recurrent Neural Networks
• The idea behind RNNs is to make use of sequential information.

Recurrent Neural Networks
• $x_t$ is the input at time step t
• $x_t$ is the word embedding
• $s_t$ is the hidden representation at time step t
  $s_t = f(U x_t + W s_{t-1})$
  $o_t = \mathrm{softmax}(V s_t)$
• Note: U, V, W are shared across all time steps
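A minimal sketch (numpy; toy sizes, random weights, and tanh as the choice of f) of the recurrence above, with one shared set of weights U, W, V applied at every time step.

```python
# One RNN recurrence: s_t = f(U x_t + W s_{t-1}), o_t = softmax(V s_t).
import numpy as np

emb_size, hidden, vocab_size, n_steps = 16, 32, 10, 4
rng = np.random.default_rng(0)
U = rng.normal(size=(hidden, emb_size))     # input -> hidden
W = rng.normal(size=(hidden, hidden))       # hidden -> hidden
V = rng.normal(size=(vocab_size, hidden))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

s = np.zeros(hidden)                         # initial state s_{-1} = 0
for t in range(n_steps):
    x_t = rng.normal(size=emb_size)          # stand-in for a word embedding
    s = np.tanh(U @ x_t + W @ s)             # s_t = f(U x_t + W s_{t-1})
    o_t = softmax(V @ s)                     # o_t = softmax(V s_t)
print(o_t.shape)                             # (10,): distribution over words
```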
RNN Problems and Alternatives
• Vanishing gradients
  • Gradients decrease as they get pushed back
  • Solution: Long Short-term Memory (Hochreiter and Schmidhuber, 1997)
(Neubig, 2017)
RNN Strengths and Weaknesses
• RNNs, particularly deep RNNs/LSTMs, are quite powerful and flexible
• But they require a lot of data
• They also have trouble with weak error signals passed back from the end of the sentence
Build Chatbots
• We want to model $P(\mathit{response} \mid \mathit{input\_sentence})$
• We learned how to build word embeddings
• We learned how to build a language model
• We learned how to represent a sentence
• We want to get a representation of the input_sentence and then generate the response conditioned on the input.
Conditional Language Models
• Language Model
  $P(X) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$
  ($x_1, \ldots, x_{i-1}$ is the context; $x_i$ is the next word)
• Conditional Language Model
  $P(Y \mid X) = \prod_{j=1}^{m} P(y_j \mid X, y_1, \ldots, y_{j-1})$
  (X is the added context)
(Neubig, 2017)
Conditional Language Model (Sutskever et al., 2014)
[Figure: an encoder RNN reads the input sequence X; a decoder RNN generates the output Y conditioned on it]
(Neubig, 2017)
How to pass the hidden state?
• Initialize the decoder with the encoder's final state (Sutskever et al., 2014)
• Transform it (the encoder and decoder can have different dimensions)
• Input it at every time step (Kalchbrenner & Blunsom, 2013)
(Neubig, 2017)
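A minimal sketch (assumed PyTorch; toy sizes and word IDs) of a sequence-to-sequence model using the first option above: the decoder is initialized with the encoder's final hidden state.

```python
# Seq2seq: encode the input, decode the response from the final state.
import torch
import torch.nn as nn

emb_size, hidden, vocab_size = 16, 32, 10
src_emb = nn.Embedding(vocab_size, emb_size)
tgt_emb = nn.Embedding(vocab_size, emb_size)
encoder = nn.RNN(emb_size, hidden, batch_first=True)
decoder = nn.RNN(emb_size, hidden, batch_first=True)
out = nn.Linear(hidden, vocab_size)

src = torch.tensor([[1, 2, 3, 4]])            # input_sentence word IDs (toy)
tgt = torch.tensor([[0, 5, 6, 7]])            # <start> + response IDs (toy)

_, h_enc = encoder(src_emb(src))              # final state summarizes X
dec_states, _ = decoder(tgt_emb(tgt), h_enc)  # decoder starts from that state
logits = out(dec_states)                      # per-step response distribution
print(logits.shape)                           # (1, 4, vocab_size)
```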
Sequence to Sequence Models
Constraints of Neural Models
• Backchanneling
• Long-term conversation planning
• Context
• Engagement
• Gesture
• Gaze
• Laughter
Examples of Neural Chatbots
• Tay
• Zo
• Xiaoice
• https://www.youtube.com/watch?v=dg-x1WuGhuI
Alexa Prize Challenge
• Challenge: build a chatbot that engages users for 20 minutes.
• Sponsored 12 university teams with $100k.
• CMU Magnus and CMU Ruby.
• Systems are multi-component
  o Combinations of task/non-task
  o Hand-written and statistical/neural models
• It's about engaging researchers
  o Having more PhD students work on dialog
  o Giving developers access to users
  o Collecting data: what do users say
CMU Magnus
• High average number of turns
• Average rating
• Topics: movies, sports, travel, GoT
• Users had longer conversations but did not enjoy the conversation.
  o Identify when the user is frustrated or wants to change the topic.
  o Identify what the user would like to talk about (intent).
• Detecting "abusive" remarks and responding appropriately
Summary
• How to represent words in continuous space.
• What RNNs are and how to use them to represent a sentence.
• Sequence-to-sequence models for $P(\mathit{response} \mid \mathit{input\_sentence})$.
• Issues in neural models.
• Issues with a live system!
References
• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-03-wordemb.pdf
• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-06-rnn.pdf
• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-08-condlm.pdf
• https://www.cs.cmu.edu/~rsalakhu/10707/Lectures/Lecture_Language_2019.pdf
• http://www.phontron.com/class/mtandseq2seq2017/mt-spring2017.chapter6.pdf
References
• http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
• http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/
• https://nlp.stanford.edu/seminar/details/jdevlin.pdf
RNN to represent a sentence
[Figure: the RNN unrolled over "how are you ?"; each word passes through an embedding layer and an RNN cell, producing hidden states $s_0, s_1, s_2, s_3, s_4$]
• $s_4$ is the representation of the entire sentence
• $s_4$ is the representation of the probability of observing "how are you?"