Deep Learning and Lexical, Syntactic and Semantic Analysis
Wanxiang Che and Yue Zhang, 2016-10
Part 2: Introduction to Deep Learning
Part 2.1: Deep Learning Background
What is Machine Learning?
• From Data to Knowledge
– Traditional program: Input + Algorithm → Output
– ML program: Input + Output → "Algorithm"
A Standard Example of ML
• The MNIST (Modified NIST) database of hand-written digit recognition
– Publicly available
– A huge amount is known about how well various ML methods do on it
– 60,000 + 10,000 hand-written digits (28×28 pixels each)
Very hard to say what makes a 2
Traditional Model (before 2012)
• Fixed/engineered features + trainable classifier
– Designing a feature extractor requires considerable effort by experts
– Examples: SIFT, GIST, shape context
Deep Learning (after 2012)
• Learning hierarchical representations
• DEEP means more than one stage of non-linear feature transformation
Deep Learning Architecture
Deep Learning is Not New
• 1980s technology (neural networks)
About Neural Networks
• Pros
– Simple to learn p(y|x)
– Results OK for shallow nets
• Cons
– Does not learn p(x)
– Trouble with > 3 layers
– Overfits
– Slow to train
Deep Learning Beats NN
• The remedies for these cons:
– Unsupervised feature learning: RBMs, DAEs, …
– Dropout, Maxout, stochastic pooling
– GPUs
– New activation functions: ReLU, …
– Gated mechanisms
Results on MNIST
• Naïve neural network: 96.59%
• SVM (default settings for libsvm): 94.35%
• Optimal SVM [Andreas Mueller]: 98.56%
• The state of the art, convolutional NN (2013): 99.79%
Deep Learning Wins
9. MICCAI 2013 Grand Challenge on Mitosis Detection
8. ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
7. ISBI 2012 Brain Image Segmentation Challenge (with superhuman pixel error rate)
6. IJCNN 2011 Traffic Sign Recognition Competition (only our method achieved superhuman results)
5. ICDAR 2011 Offline Chinese Handwriting Competition
4. Online German Traffic Sign Recognition Contest
3. ICDAR 2009 Arabic Connected Handwriting Competition
2. ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
1. ICDAR 2009 French Connected Handwriting Competition
Compare the overview page on handwriting recognition.
• http://people.idsia.ch/~juergen/deeplearning.html
Deep Learning for Speech Recognition
Deep Learning for NLP
[Figure: monolingual, multi-lingual, and multi-modal big data feed deep learning models (recurrent NNs, convolutional NNs, recursive NNs), illustrated on the example sentence 我 喜欢 红 苹果 ("I like red apples"). The models map text into a semantic vector space that supports applications: POS tagging, parsing, QA, word segmentation, caption generation, MT, dialog, MCTest, …]
Part 2.2: Feedforward Neural Networks
The Traditional Paradigm for ML
1. Convert the raw input vector into a vector of feature activations
– Use hand-written programs based on common sense to define the features
2. Learn how to weight each of the feature activations to get a single scalar quantity
3. If this quantity is above some threshold, decide that the input vector is a positive example of the target class
The Standard Perceptron Architecture
[Figure: input units → hand-coded programs → feature units → learned weights → decision unit. Example: "iPhone is very good ." yields features such as good, very/good, very, … with weights 0.8, 0.9, 0.1, …, whose weighted sum is compared against a threshold (> 5?).]
The Limitations of Perceptrons
• The hand-coded features
– Have a great influence on the performance
– Finding suitable features costs a lot of effort
• A linear classifier with a hyperplane
– Cannot separate non-linear data; for example, the XOR function cannot be learned by a single-layer perceptron
[Figure: the four points (0,0), (0,1), (1,0), (1,1); the positive and negative cases cannot be separated by a plane.]
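The XOR case above becomes learnable once a non-linear hidden layer is added. A minimal sketch with hand-picked weights (all weight and threshold values are illustrative, not from the slides):

```python
import numpy as np

# XOR is not linearly separable, but one hidden layer fixes that:
# h1 fires for OR(x1, x2), h2 fires for AND(x1, x2),
# and the output unit computes "h1 AND NOT h2".
def step(z):
    return (z > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # columns: OR unit, AND unit
b_hidden = np.array([-0.5, -1.5])   # OR threshold 0.5, AND threshold 1.5

w_out = np.array([1.0, -2.0])       # pass the OR unit, veto the AND unit
b_out = -0.5

h = step(X @ W_hidden + b_hidden)   # hidden activations
y = step(h @ w_out + b_out)         # network output
print(y)  # [0. 1. 1. 0.] == XOR
```

The hidden layer effectively learns two linearly separable sub-problems whose combination is XOR.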
Learning with Non-linear Hidden Layers
Feedforward Neural Networks
• The information is propagated from the inputs to the outputs
• Time has no role (no cycles between outputs and inputs)
• Also known as the multi-layer perceptron (MLP)
• Learning the weights of hidden units is equivalent to learning features
• Networks without hidden layers are very limited in the input-output mappings they can model
– More layers of linear units do not help: the result is still linear
– Fixed output non-linearities are not enough
[Figure: inputs x1 … xn → 1st hidden layer → 2nd hidden layer → output layer]
Multiple-Layer Neural Networks
• What are those hidden neurons doing?
– Maybe they represent outlines
General Optimizing (Learning) Algorithms
• Gradient Descent
• Stochastic Gradient Descent (SGD)
– Minibatch SGD (m > 1), online GD (m = 1)
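A minimal minibatch-SGD sketch on least-squares linear regression (the data, learning rate, and batch size m = 16 are illustrative choices; setting m = 1 gives online GD, m = all samples gives full gradient descent):

```python
import numpy as np

# Minibatch SGD on mean-squared-error linear regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                      # noiseless targets for the demo

w = np.zeros(3)
lr, m = 0.1, 16                     # learning rate and minibatch size
for epoch in range(50):
    idx = rng.permutation(len(X))   # reshuffle each epoch
    for start in range(0, len(X), m):
        batch = idx[start:start + m]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # d/dw of the batch MSE
        w -= lr * grad

print(w)  # ≈ [2.0, -1.0, 0.5]
```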
Computational/Flow Graphs
• Describing mathematical expressions
• For example
– e = (a + b) * (b + 1)
• c = a + b, d = b + 1, e = c * d
– If a = 2, b = 1: c = 3, d = 2, e = 6
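The forward and backward passes over this graph can be written out directly (the chain rule is applied edge by edge, summing over the two paths from b to e):

```python
# Forward and backward pass on the graph e = (a + b) * (b + 1),
# matching the slide's example with a = 2, b = 1.
a, b = 2.0, 1.0

# forward pass
c = a + b        # c = 3
d = b + 1        # d = 2
e = c * d        # e = 6

# backward pass (chain rule along each graph edge)
de_dc = d                          # e = c * d  →  ∂e/∂c = d
de_dd = c                          # e = c * d  →  ∂e/∂d = c
de_da = de_dc * 1.0                # c = a + b  →  ∂c/∂a = 1
de_db = de_dc * 1.0 + de_dd * 1.0  # b feeds both c and d, so sum both paths

print(e, de_da, de_db)  # 6.0 2.0 5.0
```

This path-summing is exactly what backpropagation does on larger graphs.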
Derivatives on Computational Graphs
Computational Graph Backward Pass (Backpropagation)
An FNN POS Tagger
Part 2.3: Word Embeddings
Typical Approaches for Word Representation
• 1-hot representation (orthogonality)
– bag-of-words model
sun: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …]
star: [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, …]
sim(star, sun) = 0
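The orthogonality problem is easy to verify numerically (the 13-dimensional toy vocabulary and index positions are illustrative):

```python
import numpy as np

# One-hot vectors for "sun" and "star": orthogonal by construction,
# so their cosine similarity is 0 even though the words are related.
sun = np.zeros(13); sun[8] = 1.0
star = np.zeros(13); star[7] = 1.0

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(star, sun))  # 0.0
```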
Distributed Word Representation
• Each word is associated with a low-dimensional (compressed, 50-1000), dense (non-sparse), real-valued (continuous) vector: a word embedding
– Learning word vectors through supervised models
• Nature
– Semantic similarity as vector similarity
How to Obtain Word Embeddings
Neural Network Language Models
• Neural network language models (NNLM)
– Feedforward (Bengio et al. 2003)
• Maximum-likelihood estimation
• Back-propagation
• Input: $(n-1)$ embeddings
Predict Word Vectors Directly
• SENNA (Collobert and Weston, 2008)
• word2vec (Mikolov et al. 2013)
Word2vec: CBOW (Continuous Bag-of-Words)
• Add inputs from words within a short window to predict the current word
• The weights for different positions are shared
• Computationally much more efficient than a normal NNLM
• The hidden layer is just linear
• Each word is an embedding v(w)
• Each context is an embedding v′(c)
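A CBOW forward pass can be sketched in a few lines (vocabulary size, embedding size, random initialization, and the context word ids are all illustrative; real word2vec also needs the training loop and an output approximation such as negative sampling):

```python
import numpy as np

# CBOW sketch: sum the embeddings of the context words (weights are
# shared across positions), then a softmax over the vocabulary scores
# the center word. The hidden layer is just this linear sum.
rng = np.random.default_rng(0)
V, d = 10, 4                        # vocab size, embedding size
v_in = rng.normal(0, 0.1, (V, d))   # input embeddings  v(w)
v_out = rng.normal(0, 0.1, (V, d))  # output embeddings v'(c)

context = [1, 3, 5, 7]              # word ids inside the window
h = v_in[context].sum(axis=0)       # linear hidden layer

scores = v_out @ h
p = np.exp(scores - scores.max())
p /= p.sum()                        # softmax: distribution over all V words
print(p.shape)                      # (10,)
```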
Word2vec: Skip-Gram
• Predict surrounding words using the current word
• Similar performance to CBOW
• Each word is an embedding v(w)
• Each context is an embedding v′(c)
Word2vec Training
• SGD + backpropagation
• Most of the computational cost is a function of the size of the vocabulary (millions)
• Training acceleration
– Negative sampling
• Mikolov et al. 2013
– Hierarchical decomposition
• Morin and Bengio 2005; Mnih and Hinton 2008; Mikolov et al. 2013
– Graphics Processing Units (GPUs)
Word Analogy
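Analogies such as "king − man + woman ≈ queen" are solved by vector arithmetic plus a nearest-neighbor search. A toy sketch (the 3-dimensional vectors below are made up so that the gender and royalty directions are roughly additive; real word2vec embeddings behave similarly):

```python
import numpy as np

# Word-analogy sketch: answer "king is to man as ? is to woman"
# by nearest cosine neighbor of king - man + woman.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.5, 0.5]),
    "car":   np.array([0.2, 0.4, 0.3]),
}

q = emb["king"] - emb["man"] + emb["woman"]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# exclude the three query words, as is standard for this evaluation
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(q, emb[w]))
print(best)  # queen
```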
Part 2.4: Recurrent and Other Neural Networks
Language Models
• A language model computes a probability for a sequence of words, $P(w_1, \dots, w_n)$, or predicts a probability for the next word, $P(w_{n+1} \mid w_1, \dots, w_n)$
• Useful for machine translation, speech recognition, and so on
• Word ordering
• P(the cat is small) > P(small the is cat)
• Word choice
• P(there are four cats) > P(there are for cats)
Traditional Language Models
• An incorrect but necessary Markov assumption!
• Probability is usually conditioned on a window of n previous words
• $P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1}) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-(n-1)}, \dots, w_{i-1})$
• How to estimate probabilities
• $p(w_2 \mid w_1) = \frac{\mathrm{count}(w_1, w_2)}{\mathrm{count}(w_1)}$, $\quad p(w_3 \mid w_1, w_2) = \frac{\mathrm{count}(w_1, w_2, w_3)}{\mathrm{count}(w_1, w_2)}$
• Performance improves by keeping counts of higher-order n-grams and smoothing, such as backoff (e.g., if a 4-gram is not found, try the 3-gram, etc.)
• Disadvantages
• There are A LOT of n-grams!
• Cannot see too long a history
• P(坐/作了一整天的火车/作业): choosing between the homophones 坐 ("ride") and 作 ("do") depends on a distant word, 火车 ("train") vs. 作业 ("homework")
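The count-based estimate above is a one-liner over n-gram counts (the toy corpus reuses the slide's example sentence; smoothing and backoff are omitted):

```python
from collections import Counter

# Bigram MLE: p(w2 | w1) = count(w1, w2) / count(w1),
# estimated from a toy corpus.
corpus = "the cat is small . the cat is black .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(w2, w1):
    return bigrams[(w1, w2)] / unigrams[w1]

print(p("cat", "the"))   # 2/2 = 1.0
print(p("small", "is"))  # 1/2 = 0.5
```

Unseen bigrams get probability 0 here, which is exactly why real n-gram models need smoothing.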
Recurrent Neural Networks (RNNs)
• Condition the neural network on all previous inputs
• RAM requirement only scales with the number of inputs
[Figure: an unrolled RNN; inputs x_{t-1}, x_t, x_{t+1} feed hidden states h_{t-1}, h_t, h_{t+1} through W2, the hidden states are chained through shared W1, and outputs y_{t-1}, y_t, y_{t+1} are produced through W3.]
Recurrent Neural Networks (RNNs)
• At a single time step t
• $h_t = \tanh(W_1 h_{t-1} + W_2 x_t)$
• $\hat{y}_t = \mathrm{softmax}(W_3 h_t)$
[Figure: one step of the recurrence: x_t enters through W2, h_{t-1} through W1, and ŷ_t is read out through W3.]
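The single-step equations translate directly into numpy (layer sizes, the random weights, and the 6-step input sequence are illustrative):

```python
import numpy as np

# One RNN time step, matching the slide's equations:
# h_t = tanh(W1 h_{t-1} + W2 x_t),  ŷ_t = softmax(W3 h_t).
rng = np.random.default_rng(0)
d_h, d_x, d_y = 5, 3, 4
W1 = rng.normal(0, 0.1, (d_h, d_h))   # hidden-to-hidden
W2 = rng.normal(0, 0.1, (d_h, d_x))   # input-to-hidden
W3 = rng.normal(0, 0.1, (d_y, d_h))   # hidden-to-output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(h_prev, x_t):
    h_t = np.tanh(W1 @ h_prev + W2 @ x_t)
    y_t = softmax(W3 @ h_t)
    return h_t, y_t

h = np.zeros(d_h)                     # initial state
for x_t in rng.normal(size=(6, d_x)):  # unroll over 6 inputs, reusing weights
    h, y = rnn_step(h, x_t)
print(y.shape)  # (4,)
```

Note the same three matrices are reused at every step; that weight sharing is what makes the network "recurrent".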
Training RNNs Is Hard
• Ideally, inputs from many time steps ago can modify the output y
• For example, with 2 time steps
[Figure: an unrolled RNN over steps t−1, t, t+1 with shared weights W1, W2, W3.]
Back-Propagation Through Time (BPTT)
• The total error is the sum of the errors at each time step t: $\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E_t}{\partial W}$
• $\frac{\partial E_t}{\partial W_3} = \frac{\partial E_t}{\partial \hat{y}_t} \frac{\partial \hat{y}_t}{\partial W_3}$ is easy to calculate
• But calculating $\frac{\partial E_t}{\partial W_1} = \frac{\partial E_t}{\partial \hat{y}_t} \frac{\partial \hat{y}_t}{\partial h_t} \frac{\partial h_t}{\partial W_1}$ is hard (also for $W_2$)
• Because $h_t = \tanh(W_1 h_{t-1} + W_2 x_t)$ depends on $h_{t-1}$, which in turn depends on $W_1$ and $h_{t-2}$, and so on
• So $\frac{\partial E_t}{\partial W_1} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial \hat{y}_t} \frac{\partial \hat{y}_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W_1}$
The Vanishing Gradient Problem
• $\frac{\partial E_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial \hat{y}_t} \frac{\partial \hat{y}_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W}$, with $h_t = \tanh(W_1 h_{t-1} + W_2 x_t)$
• $\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} = \prod_{j=k+1}^{t} W_1 \, \mathrm{diag}[\tanh'(\cdots)]$
• $\left\| \frac{\partial h_j}{\partial h_{j-1}} \right\| \le \gamma \, \|W_1\| \le \gamma \lambda_1$, where $\gamma$ bounds $\|\mathrm{diag}[\tanh'(\cdots)]\|$ and $\lambda_1$ is the largest singular value of $W_1$
• $\left\| \frac{\partial h_t}{\partial h_k} \right\| \le (\gamma \lambda_1)^{t-k} \to 0$ if $\gamma \lambda_1 < 1$
• This can become very small or very large quickly → vanishing or exploding gradients
• Trick for exploding gradients: the clipping trick (set a threshold)
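The clipping trick rescales the gradient whenever its norm exceeds a threshold, keeping the update direction but capping its size (the threshold 5.0 is an illustrative value):

```python
import numpy as np

# Gradient-norm clipping for exploding gradients.
def clip_gradient(grad, threshold=5.0):
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)  # keep direction, cap the norm
    return grad

g = clip_gradient(np.array([30.0, 40.0]))  # norm 50 would blow up an update
print(g, np.linalg.norm(g))  # [3. 4.] 5.0
```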
A "Solution"
• Intuition
• Ensure $\gamma \lambda_1 \ge 1$ to prevent vanishing gradients
• So…
• Proper initialization of the W matrices
• Use ReLU instead of tanh or sigmoid activation functions
A Better "Solution"
• Recall the original transition equation
• $h_t = \tanh(W_1 h_{t-1} + W_2 x_t)$
• We can instead update the state additively
• $u_t = \tanh(W_1 h_{t-1} + W_2 x_t)$
• $h_t = h_{t-1} + u_t$
• Then $\frac{\partial h_t}{\partial h_{t-1}} = 1 + \frac{\partial u_t}{\partial h_{t-1}} \ge 1$
• On the other hand
• $h_t = h_{t-1} + u_t = h_{t-2} + u_{t-1} + u_t = \cdots$
A Better "Solution" (cont.)
• Interpolate between the old state and the new state ("choosing to forget")
• $f_t = \sigma(W_f x_t + U_f h_{t-1})$
• $h_t = f_t \odot h_{t-1} + (1 - f_t) \odot u_t$
• Introduce a separate input gate $i_t$
• $i_t = \sigma(W_i x_t + U_i h_{t-1})$
• $h_t = f_t \odot h_{t-1} + i_t \odot u_t$
• Selectively expose the memory cell $c_t$ with an output gate $o_t$
• $o_t = \sigma(W_o x_t + U_o h_{t-1})$
• $c_t = f_t \odot c_{t-1} + i_t \odot u_t$
• $h_t = o_t \odot \tanh(c_t)$
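The gate equations above assemble into one LSTM step (layer sizes, the random initialization, and the `lstm_step` helper are illustrative, not from the slides):

```python
import numpy as np

# One LSTM step: forget/input/output gates, candidate u_t,
# additive cell update c_t, gated hidden output h_t.
rng = np.random.default_rng(0)
d_h, d_x = 4, 3
# one (W, U) pair per gate plus the candidate
Wf, Uf = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))
Wi, Ui = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))
Wo, Uo = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))
Wu, Uu = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    f = sigmoid(Wf @ x_t + Uf @ h_prev)   # forget gate
    i = sigmoid(Wi @ x_t + Ui @ h_prev)   # input gate
    o = sigmoid(Wo @ x_t + Uo @ h_prev)   # output gate
    u = np.tanh(Wu @ x_t + Uu @ h_prev)   # candidate update
    c = f * c_prev + i * u                # additive cell update
    h = o * np.tanh(c)                    # selectively expose the cell
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_x)):
    h, c = lstm_step(h, c, x_t)
print(h.shape, c.shape)  # (4,) (4,)
```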
Long Short-Term Memory (LSTM)
• Hochreiter & Schmidhuber, 1997
• LSTM = additive updates + gating
[Figure: the LSTM cell: x_t and h_{t-1} feed three σ gates and a tanh candidate; the cell state flows from c_{t-1} to c_t through the additive update, and h_t is read out through tanh and the output gate.]
Gated Recurrent Units, GRU (Cho et al. 2014)
• Main ideas
• Keep around memories to capture long-distance dependencies
• Allow error messages to flow at different strengths depending on the inputs
• Update gate
• Based on the current input and hidden state
• $z_t = \sigma(W_z x_t + U_z h_{t-1})$
• Reset gate
• Computed similarly but with different weights
• $r_t = \sigma(W_r x_t + U_r h_{t-1})$
GRU
• New memory content
• $\tilde{h}_t = \tanh(W x_t + r_t \odot U h_{t-1})$
• The update gate z controls how much of the past state should matter now
• If z is close to 1, we can copy information in that unit through many time steps → less vanishing gradient!
• If the reset gate r of a unit is close to 0, the unit ignores its previous memory and stores only the new input information → this allows the model to drop information that is irrelevant in the future
• Units with long-term dependencies have active update gates z
• Units with short-term dependencies often have very active reset gates r
• The final memory at a time step combines the current and previous time steps
• $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
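The GRU equations above fit in one short step function (layer sizes, random initialization, and the `gru_step` helper are illustrative, not from the slides):

```python
import numpy as np

# One GRU step: update gate z, reset gate r, candidate h̃,
# and the interpolated new state.
rng = np.random.default_rng(0)
d_h, d_x = 4, 3
Wz, Uz = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))
Wr, Ur = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))
W,  U  = rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x_t):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate
    h_tilde = np.tanh(W @ x_t + r * (U @ h_prev))  # new memory content
    return z * h_prev + (1 - z) * h_tilde          # interpolate old and new

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_x)):
    h = gru_step(h, x_t)
print(h.shape)  # (4,)
```

Compared with the LSTM, the GRU merges the cell and hidden state and uses one gate fewer.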
LSTM vs. GRU
• No clear winner!
• Tuning hyperparameters like layer size is probably more important than picking the ideal architecture
• GRUs have fewer parameters and thus may train a bit faster or need less data to generalize
• If you have enough data, the greater expressive power of LSTMs may lead to better results
More RNNs
• Bidirectional RNN
• Stacked bidirectional RNN
Tree-LSTMs
• Traditional sequential composition
• Tree-structured composition
More Applications of RNNs
• Neural machine translation
• Handwriting generation
• Image caption generation
• …
Neural Machine Translation
• RNN trained end-to-end: an encoder-decoder
[Figure: the encoder reads "I love you" into hidden states h1, h2, h3 (weights W1, W2); its final state (which needs to capture the entire sentence!) initializes the decoder, which has generated 我 ("I") and must predict the next word.]
Attention Mechanism – Scoring
• Bahdanau et al. 2015
[Figure: at each decoder step, the decoder state $h_t$ is scored against every encoder state, $\alpha = \mathrm{score}(h_t, \bar{h}_s)$; the resulting weights (e.g., 0.3, 0.6, 0.1 over "I love you") combine the encoder states into a context vector c.]
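The scoring-and-combining step can be sketched as follows (dot-product scoring is an illustrative simplification; Bahdanau et al. score with a small feedforward network, and all sizes and values here are made up):

```python
import numpy as np

# Attention sketch: score each encoder state against the decoder state,
# softmax the scores into weights α, and take the weighted sum as the
# context vector c fed to the decoder.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))   # encoder states, one per source word
h_t = rng.normal(size=4)      # current decoder state

scores = H @ h_t              # one score per source position
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()          # attention weights summing to 1

c = alpha @ H                 # weighted sum of encoder states
print(alpha.shape, c.shape)   # (3,) (4,)
```

Because c is recomputed at every decoder step, the model no longer has to squeeze the entire sentence into one vector.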
Convolutional Neural Network
• CS231n: Convolutional Neural Networks for Visual Recognition
[Figure: convolution and pooling operations.]
CNN for NLP
• Zhang, Y., & Wallace, B. (2015). A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification.
Recursive Neural Network
• Socher, R., Manning, C., & Ng, A. (2011). Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks. NIPS.