Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
TRANSCRIPT
Day 2 Lecture 6
Recurrent Neural Networks
Xavier Giró-i-Nieto
[course site]
Acknowledgments
Santi Pascual
General idea
ConvNet (or CNN)
General idea
ConvNet (or CNN)
Multilayer Perceptron
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
The output depends ONLY on the current input.
Recurrent Neural Network (RNN)
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
The hidden layers and the output depend on the previous states of the hidden layers.
Recurrent Neural Network (RNN)
(Figure: the RNN diagram unrolled over time, shown in front and side views after a 90° rotation.)
Recurrent Neural Networks (RNN)
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
Each node represents a layer of neurons at a single time step.
t-1, t, t+1
Recurrent Neural Networks (RNN)
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
t-1, t, t+1
The input is a SEQUENCE x(t) of any length.
Recurrent Neural Networks (RNN)
Common visual sequences:
Still image: spatial scan (zigzag, snake)
The input is a SEQUENCE x(t) of any length.
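As an illustration (not from the slides), a still image can be flattened into such a sequence with a snake scan; the `snake_scan` helper below is a hypothetical name:

```python
import numpy as np

def snake_scan(image):
    """Flatten a 2D image into a 1D sequence with a snake (boustrophedon)
    scan: even rows left-to-right, odd rows right-to-left."""
    rows = [row if i % 2 == 0 else row[::-1] for i, row in enumerate(image)]
    return np.concatenate(rows)

img = np.arange(12).reshape(3, 4)   # toy 3x4 "image"
print(snake_scan(img))              # [ 0  1  2  3  7  6  5  4  8  9 10 11]
```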
Recurrent Neural Networks (RNN)
Common visual sequences:
Video: temporal sampling
The input is a SEQUENCE x(t) of any length.
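As a sketch (not from the slides), temporal sampling just strides over the frame axis of a video tensor; the stride value is illustrative:

```python
import numpy as np

# Temporal sampling: keep every k-th frame of a video tensor (T, H, W, C)
# so the RNN sees a shorter sequence of frames.
video = np.zeros((120, 64, 64, 3))  # toy 120-frame video
stride = 8
frames = video[::stride]            # every 8th frame
print(frames.shape)                 # (15, 64, 64, 3)
```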
Recurrent Neural Networks (RNN)
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
Must learn the temporally shared weights w2, in addition to w1 and w3.
t-1, t, t+1
Bidirectional RNN (BRNN)
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
Must learn weights w2, w3, w4 and w5, in addition to w1 and w6.
Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”
Bidirectional RNN (BRNN)
Slide: Santi Pascual
Formulation: One hidden layer
Delay unit (z⁻¹)
Slide: Santi Pascual
Formulation: Single recurrence
One-time recurrence
Slide: Santi Pascual
Formulation: Multiple recurrences
Recurrence
One time-step recurrence
T time-step recurrences
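The unrolled recurrence can be sketched in NumPy, assuming the common form h_t = tanh(W x_t + U h_{t-1} + b); dimensions and weights are illustrative, not from the slides:

```python
import numpy as np

# Illustrative dimensions: 4-dim inputs, 8 hidden units, 5 time steps
input_dim, hidden_dim, T = 4, 8, 5
rng = np.random.default_rng(0)

# Shared weights, reused at EVERY time step
W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
U = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden (recurrence)
b = np.zeros(hidden_dim)

def rnn_forward(x_seq):
    """Unroll h_t = tanh(W x_t + U h_{t-1} + b) over the sequence."""
    h = np.zeros(hidden_dim)
    states = []
    for x_t in x_seq:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)

x_seq = rng.normal(size=(T, input_dim))
H = rnn_forward(x_seq)
print(H.shape)   # (5, 8): one hidden state per time step
```

Note that the same W and U are applied at every step; only the unrolled graph grows with the sequence length.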
Slide: Santi Pascual
RNN problems
Long-term memory vanishes because of the T nested multiplications by U.
Slide: Santi Pascual
RNN problems
During training, gradients may explode or vanish because of temporal depth.
Example: back-propagation through time with 3 steps.
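The effect can be sketched numerically: back-propagation through time multiplies the gradient by the recurrent matrix U at every step, so its norm decays (or blows up) geometrically. This linearized sketch omits the tanh derivative, which only shrinks gradients further; dimensions, scale, and step count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, T = 8, 50

# Small entries give U a spectral radius below 1, so the product of T
# Jacobians shrinks towards zero (vanishing gradient). Scaling U up
# instead makes the same product blow up (exploding gradient).
U = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

grad = np.eye(hidden_dim)
norms = []
for _ in range(T):
    grad = U.T @ grad          # one step of back-propagation through time
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])     # the norm decays towards zero
```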
Long Short-Term Memory (LSTM)
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
Long Short-Term Memory (LSTM)
Figure: Christopher Olah, “Understanding LSTM Networks” (2015)
Long Short-Term Memory (LSTM)
Based on a standard RNN whose neurons activate with tanh...
Long Short-Term Memory (LSTM)
Ct is the cell state, which flows through the entire chain...
Figure: Christopher Olah, “Understanding LSTM Networks” (2015)
Long Short-Term Memory (LSTM)
...and is updated with a sum instead of a product. This avoids memory vanishing and exploding/vanishing backprop gradients.
Figure: Christopher Olah, “Understanding LSTM Networks” (2015)
Long Short-Term Memory (LSTM)
Three gates, each governed by a sigmoid unit (in the range [0, 1]), control the flow of information in and out of the cell.
Figure: Christopher Olah, “Understanding LSTM Networks” (2015)
Long Short-Term Memory (LSTM)
Forget Gate:
Concatenate
Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
Long Short-Term Memory (LSTM)
Input Gate Layer
New contribution to cell state
Classic neuron
Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
Long Short-Term Memory (LSTM)
Update Cell State (memory):
Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
Long Short-Term Memory (LSTM)
Output Gate Layer
Output to next layer
Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
Long Short-Term Memory (LSTM)
Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
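The four steps above (forget gate, input gate, cell-state update, output gate) can be collected into a single step function. A minimal NumPy sketch; the dimensions and the parameter layout are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the gate equations on the slides."""
    Wf, Wi, Wc, Wo, bf, bi, bc, bo = params
    z = np.concatenate([h_prev, x_t])   # concatenate [h_{t-1}, x_t]
    f = sigmoid(Wf @ z + bf)            # forget gate
    i = sigmoid(Wi @ z + bi)            # input gate
    c_tilde = np.tanh(Wc @ z + bc)      # new contribution to cell state
    c = f * c_prev + i * c_tilde        # additive cell-state update
    o = sigmoid(Wo @ z + bo)            # output gate
    h = o * np.tanh(c)                  # output to next layer / time step
    return h, c

rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 4
shape = (hidden_dim, hidden_dim + input_dim)
params = [rng.normal(scale=0.1, size=shape) for _ in range(4)] + \
         [np.zeros(hidden_dim) for _ in range(4)]

h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):   # run over a 6-step sequence
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)   # (4,) (4,)
```

The key point from the slides is visible in the code: c is updated by a sum (f * c_prev + i * c_tilde) rather than a nested product, which is what protects long-term memory.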
Gated Recurrent Unit (GRU)
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
Similar performance as LSTM with less computation.
Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
Gated Recurrent Unit (GRU)
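For comparison, one GRU step merges the cell and hidden state and uses only two gates, which is where the computational saving comes from. A minimal NumPy sketch with illustrative dimensions and parameter layout:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU step: an update gate z and a reset gate r; no separate
    cell state -- the hidden state itself is updated by interpolation."""
    Wz, Wr, Wh, bz, br, bh = params
    xh = np.concatenate([h_prev, x_t])
    zg = sigmoid(Wz @ xh + bz)    # update gate
    r = sigmoid(Wr @ xh + br)     # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    return (1 - zg) * h_prev + zg * h_tilde   # interpolated update

rng = np.random.default_rng(4)
input_dim, hidden_dim = 3, 4
shape = (hidden_dim, hidden_dim + input_dim)
params = [rng.normal(scale=0.1, size=shape) for _ in range(3)] + \
         [np.zeros(hidden_dim) for _ in range(3)]

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):   # run over a 6-step sequence
    h = gru_step(x_t, h, params)
print(h.shape)   # (4,)
```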
Applications: Machine Translation
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
Language IN
Language OUT
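The encoder-decoder idea can be sketched with two toy RNNs: the encoder compresses "Language IN" into its final hidden state, which seeds the decoder that emits "Language OUT". Weights, dimensions, and the decoder feedback scheme here are illustrative assumptions, not the trained model from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 8   # shared illustrative dimensionality for tokens and hidden states

# Random weights for a toy encoder RNN and decoder RNN
We, Ue = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(2))
Wd, Ud = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(2))

def encode(x_seq):
    """Run the encoder RNN; its final hidden state summarizes the input."""
    h = np.zeros(dim)
    for x_t in x_seq:
        h = np.tanh(We @ x_t + Ue @ h)
    return h

def decode(h, out_len):
    """Run the decoder RNN, conditioned on the encoder summary h."""
    y, outputs = np.zeros(dim), []
    for _ in range(out_len):
        y = np.tanh(Wd @ y + Ud @ h)
        outputs.append(y)
    return np.stack(outputs)

src = rng.normal(size=(7, dim))   # "Language IN": 7 source steps
out = decode(encode(src), 5)      # "Language OUT": 5 target steps
print(out.shape)                  # (5, 8)
```

Note the two sequence lengths need not match: the fixed-size encoder state decouples input length from output length.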
Applications: Image Classification
van den Oord, Aaron, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel Recurrent Neural Networks." arXiv preprint arXiv:1601.06759 (2016).
Row LSTM / Diagonal BiLSTM
Classification on MNIST
Applications: Segmentation
Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville, “ReSeg: A Recurrent Neural Network-Based Model for Semantic Segmentation”. DeepVision CVPRW 2016.
Learn more
Nando de Freitas (University of Oxford)
Thanks! Q&A?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi/ProfessorXavi