![Page 1: Introduction to Deep Learning - College of Computing · Introduction to Deep Learning Georgia Tech CS 4650/7650 Fall 2020. Outline Deep Learning CNN RNN Attention Transformer Pytorch](https://reader033.vdocuments.us/reader033/viewer/2022060519/604d5273f014214a5755dab8/html5/thumbnails/1.jpg)
Introduction to Deep Learning
Georgia Tech CS 4650/7650, Fall 2020
Outline
● Deep Learning
○ CNN
○ RNN
○ Attention
○ Transformer
● Pytorch
○ Introduction
○ Basics
○ Examples
CNNs
Some slides borrowed from Fei-Fei Li & Justin Johnson & Serena Yeung at Stanford.
Fully Connected Layer
Input: 32x32x3 image
Flattened image: 32*32*3 = 3072 values → Weight Matrix → Output
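A minimal NumPy sketch of this fully connected layer, assuming 10 output scores and randomly initialized weights (both hypothetical; the slide does not fix the output size). The bias vector is a common addition not shown on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 32x32x3 input image and 10 output scores.
image = rng.standard_normal((32, 32, 3))
num_outputs = 10

# Flatten the image into a 3072-dimensional vector (32 * 32 * 3).
x = image.reshape(-1)

# The weight matrix maps 3072 inputs to num_outputs; one bias per output.
W = rng.standard_normal((num_outputs, x.size)) * 0.01
b = np.zeros(num_outputs)

# Every output depends on every input pixel: one dot product per row of W.
scores = W @ x + b
print(x.shape, W.shape, scores.shape)  # (3072,) (10, 3072) (10,)
```

Because every row of `W` touches all 3072 pixels, the layer ignores the spatial layout of the image, which is what the convolutional layer below addresses.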
Convolutional Layer
Input: 32x32x3 image
Filter: 5x5x3
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”.
Filters always extend the full depth of the input volume.
Convolutional Layer
At each step during the convolution, the filter acts on a region of the input image and produces a single number as output.
This number is the dot product between the values in the filter and the values in the 5x5x3 chunk of the image that the filter covers.
Combining these numbers over all filter positions in the image produces the activation map.
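The sliding dot product can be sketched in NumPy as below, assuming stride 1 and no padding. Like deep learning libraries, it does not flip the filter (strictly speaking, this is cross-correlation). `conv2d_single` is a hypothetical helper name:

```python
import numpy as np

def conv2d_single(image, filt):
    """Slide one filter over an H x W x C image (stride 1, no padding).

    The filter has shape F x F x C: it always spans the full input depth,
    so each position yields a single number (a dot product).
    """
    H, W, C = image.shape
    F, F2, C2 = filt.shape
    assert F == F2 and C == C2, "filter must be square and match input depth"
    out_h, out_w = H - F + 1, W - F + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            chunk = image[i:i + F, j:j + F, :]          # the F x F x C region
            activation_map[i, j] = np.sum(chunk * filt)  # dot product
    return activation_map

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filt = rng.standard_normal((5, 5, 3))
amap = conv2d_single(image, filt)
print(amap.shape)  # (28, 28)
```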
Convolutional Layer
Filters can be stacked together.
Example: if we had 6 filters of shape 5x5x3, each would produce a 28x28x1 activation map (stride 1, no padding), and our output would be a “new image” of shape 28x28x6.
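Stacking filters can be sketched without explicit loops, assuming NumPy 1.20+ for `sliding_window_view`. The shapes (six 5x5x3 filters, stride 1, no padding) follow the example above; the random values are placeholders:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((6, 5, 5, 3))   # 6 filters, each 5x5x3

# All 5x5 spatial windows at stride 1: shape (28, 28, 3, 5, 5),
# where windows[i, j, c, a, b] == image[i + a, j + b, c].
windows = sliding_window_view(image, (5, 5), axis=(0, 1))

# One dot product per (position, filter); the 6 activation maps end up
# stacked along the last axis, giving the 28x28x6 "new image".
output = np.einsum("ijcab,kabc->ijk", windows, filters)
print(output.shape)  # (28, 28, 6)
```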
Convolutional Layer
Visualizations borrowed from Irhum Shafkat’s blog.
Convolutional Layer
Visualizations borrowed from vdumoulin’s github repo.
Standard Convolution
Convolution with Padding
Convolution with Strides
Convolutional Layer
Output size: (N - F)/stride + 1, where N is the input width and F is the filter width (no padding)
e.g. N = 7, F = 3, stride 1 => (7 - 3)/1 + 1 = 5
e.g. N = 7, F = 3, stride 2 => (7 - 3)/2 + 1 = 3
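The formula can be wrapped in a small helper. The padding term `pad` (P) extends the slide's formula to (N - F + 2P)/stride + 1, the usual generalization; `conv_output_size` is a hypothetical name. It rejects strides that do not fit, e.g. N = 7, F = 3, stride 3 would give (7 - 3)/3 + 1 = 2.33:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: (N - F + 2P) / stride + 1.

    With pad=0 this is the slide's formula (N - F)/stride + 1.
    Raises ValueError when the filter does not tile the padded input evenly.
    """
    span = n - f + 2 * pad
    if span < 0 or span % stride != 0:
        raise ValueError(f"stride {stride} does not fit N={n}, F={f}, P={pad}")
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))   # 5
print(conv_output_size(7, 3, stride=2))   # 3
print(conv_output_size(32, 5))            # 28, matching the 5x5x3 filter example
print(conv_output_size(7, 3, pad=1))      # 7: padding of 1 preserves the size
```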
Pooling Layer
● makes the representations smaller and more manageable
● operates over each activation map independently
Max Pooling
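A sketch of max pooling with a 2x2 window and stride 2 (a common setting, assumed here), applied to each activation map independently. The 4x4 input is an illustrative example:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling applied to each activation map (channel) independently.

    x has shape H x W x C; the output has shape H' x W' x C, where
    H' = (H - size) // stride + 1 (and likewise for W').
    """
    H, W, C = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w, C), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Max over the spatial window only, separately for every channel.
            out[i, j, :] = x[r:r + size, c:c + size, :].max(axis=(0, 1))
    return out

single = np.array([[1, 1, 2, 4],
                   [5, 6, 7, 8],
                   [3, 2, 1, 0],
                   [1, 2, 3, 4]])
pooled = max_pool(single[:, :, None])
print(pooled[:, :, 0])  # [[6 8]
                        #  [3 4]]
```

Applied to a 28x28x6 volume, the same call halves each spatial dimension to 14x14x6 while keeping all 6 maps.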
ConvNet Layer
Image credits- Saha’s blog.
Application in text
● NLP does not use convolutional nets much
● Some adjacent applications exist, such as graph convolutions or image-to-text
● For text sequences, it sometimes helps to use 1-dimensional convolutions, which slide along the sequence only, with each filter spanning the full embedding dimension (because the ordering of embedding dimensions has no intrinsic meaning)
● What does this basically amount to?
○ N-gram features
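A sketch of a 1-dimensional convolution over word embeddings, under hypothetical sizes (7 tokens, 4-dimensional embeddings, two trigram-sized filters). Each output is a learned score for one n-gram window; the final max over positions is a common pooling choice in CNN text classifiers, assumed here rather than taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim = 7, 4          # hypothetical: 7 tokens, 4-dim embeddings
n, num_filters = 3, 2            # trigram-sized filters, 2 of them

embeddings = rng.standard_normal((seq_len, emb_dim))   # one row per token
filters = rng.standard_normal((num_filters, n, emb_dim))

# Each filter covers n consecutive tokens and the FULL embedding dimension,
# so it slides along the sequence axis only (a 1-D convolution).
out_len = seq_len - n + 1
features = np.empty((out_len, num_filters))
for t in range(out_len):
    window = embeddings[t:t + n]                        # an n-gram of tokens
    features[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))

# Max over positions: "how strongly does this n-gram pattern appear anywhere?"
pooled = features.max(axis=0)
print(features.shape, pooled.shape)  # (5, 2) (2,)
```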
RNNs
Some slides borrowed from Fei-Fei Li & Justin Johnson & Serena Yeung at Stanford.
Vanilla Neural Networks
(Figure: Input → Hidden Layers → Output; example: House Price Prediction)
How to model sequences?
● Text Classification: Input Sequence → Output label
● Translation: Input Sequence → Output Sequence
● Image Captioning: Input image → Output Sequence
RNN - Recurrent Neural Networks
(Figure panels: Vanilla Neural Networks; e.g. Image captioning; e.g. Text classification; e.g. Translation; e.g. POS tagging)
RNN - Representation
(Figure: Input Vector → RNN cell → Output Vector; the hidden state is fed back into the RNN cell)
RNN - Recurrence Relation
(Figure: Input Vector → RNN cell → Output Vector; the hidden state is fed back into the RNN cell)
The RNN cell maintains a hidden state that is updated whenever a new input is received. At every time step, the updated hidden state is fed back into the RNN cell.
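Written as equations, a common form of the vanilla RNN update is shown below. The weight names $W_{hh}$, $W_{xh}$, $W_{hy}$ are notation introduced here, and bias terms are omitted:

```latex
\begin{aligned}
h_t &= \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t\right) \\
y_t &= W_{hy}\, h_t
\end{aligned}
```

The new hidden state $h_t$ depends on both the current input $x_t$ and the previous state $h_{t-1}$, which is the feedback loop in the figure.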
RNN - Rolled out representation
RNN - Rolled out representation
Same weight matrix W at every time step
Individual losses L_i, one per time step i
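The rolled-out view can be sketched as a loop that reuses one set of weights at every step and sums the per-step losses into a total loss L = sum of the L_i. The tanh recurrence, the squared-error loss, and all sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 3, 5, 2, 4   # hypothetical sizes

# One set of weights, shared by every time step of the unrolled network.
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.1

xs = rng.standard_normal((T, input_dim))        # input sequence x_1..x_T
targets = rng.standard_normal((T, output_dim))  # hypothetical per-step targets

h = np.zeros(hidden_dim)   # initial hidden state h_0
losses = []
for x_t, target in zip(xs, targets):
    h = np.tanh(W_hh @ h + W_xh @ x_t)    # same W_hh, W_xh at every step
    y_t = W_hy @ h                        # same W_hy at every step
    losses.append(0.5 * np.sum((y_t - target) ** 2))  # individual loss L_i

total_loss = sum(losses)   # L = sum of the per-step losses L_i
print(len(losses), total_loss)
```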
RNN - Backpropagation Through Time
Forward pass through the entire sequence to produce the intermediate hidden states, the output sequence, and finally the loss. Backward pass through the entire sequence to compute the gradient.
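For a total loss $L = \sum_t L_t$ and the recurrence $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$ (notation introduced here), the backward pass sums the contribution of every earlier hidden state to each loss. For a shared weight matrix $W$:

```latex
\frac{\partial L}{\partial W}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
    \frac{\partial L_t}{\partial h_t}
    \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
    \frac{\partial^{+} h_k}{\partial W},
\qquad
\frac{\partial h_j}{\partial h_{j-1}} = \operatorname{diag}\!\left(1 - h_j \odot h_j\right) W_{hh}
```

Here $\partial^{+} h_k / \partial W$ is the derivative of $h_k$ with respect to $W$ with $h_{k-1}$ held fixed. The chain of Jacobians reaching back to step 1 is what the full backward pass has to traverse.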
RNN - Backpropagation Through Time
Running backpropagation through time over the entire text would be very slow. Instead, switch to an approximation: Truncated Backpropagation Through Time.
RNN - Truncated Backpropagation Through Time
Run forward and backward through chunks of the sequence instead of the whole sequence
RNN - Truncated Backpropagation Through Time
Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
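A toy sketch of truncated BPTT, assuming a scalar linear RNN h_t = w * h_{t-1} + x_t with per-step loss 0.5 * (h_t - y_t)^2 (chosen so the gradients fit in a few lines; `tbptt_grads` and `total_loss` are hypothetical names). The hidden state is carried across chunk boundaries, but the backward pass stops at the start of each chunk. With one chunk covering the whole sequence it reduces to full BPTT:

```python
def total_loss(xs, ys, w, h0=0.0):
    """Full-sequence loss for the toy RNN h_t = w * h_{t-1} + x_t."""
    h, loss = h0, 0.0
    for x, y in zip(xs, ys):
        h = w * h + x
        loss += 0.5 * (h - y) ** 2
    return loss


def tbptt_grads(xs, ys, w, k, h0=0.0):
    """Per-chunk gradients dL_chunk/dw under truncated BPTT with chunk length k."""
    grads = []
    h = h0
    for start in range(0, len(xs), k):
        cx, cy = xs[start:start + k], ys[start:start + k]
        # Forward through the chunk; hs[0] is the hidden state carried in.
        hs = [h]
        for x in cx:
            hs.append(w * hs[-1] + x)
        # Backward through this chunk only: the gradient stops at hs[0].
        grad_w, g = 0.0, 0.0
        for t in range(len(cx), 0, -1):
            g += hs[t] - cy[t - 1]      # add dL_t/dh_t for L_t = 0.5 * (h_t - y_t)^2
            grad_w += g * hs[t - 1]     # local dh_t/dw = h_{t-1}
            g *= w                      # dh_t/dh_{t-1} = w
        grads.append(grad_w)
        h = hs[-1]  # carry the hidden state forward in time, not its gradient
    return grads


xs = [1.0, -0.5, 2.0, 0.3]
ys = [0.5, 0.0, 1.0, 0.2]
full = tbptt_grads(xs, ys, w=0.7, k=len(xs))[0]   # one chunk = full BPTT
chunked = tbptt_grads(xs, ys, w=0.7, k=2)         # two chunks of 2 steps
print(full, chunked)
```

For comparison the weight is held fixed here; in training, each chunk's gradient would drive an update before the next chunk is processed. In PyTorch the same cut is usually made by detaching the carried hidden state (`h = h.detach()`) between chunks.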
RNN Types
The 3 most common types of Recurrent Neural Networks are:
1. Vanilla RNN
2. LSTM (Long Short-Term Memory)
3. GRU (Gated Recurrent Units)
Some good resources:
Understanding LSTM Networks
An Empirical Exploration of Recurrent Network Architectures
Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano
Stanford CS231n: Lecture 10 | Recurrent Neural Networks
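As a sketch of how an LSTM differs from the vanilla cell, here is one LSTM step in NumPy using the standard gate equations. The stacked weight layout and the gate ordering inside `W` are choices made here, not those of any particular library, and the sizes are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, H + D), b has shape (4*H,).

    The four row blocks of W produce the input, forget and output gates
    and the candidate cell update (this ordering is a choice made here).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                                  # hypothetical input/hidden sizes
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):        # run over a 5-step sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The additive update of the cell state `c` (rather than repeated squashing through one weight matrix) is the main structural difference from the vanilla RNN.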
Attention
Some slides borrowed from Sarah Wiegreffe at Georgia Tech and Abigail See, Stanford CS224n.
RNN
RNN - Attention
Attention
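A minimal sketch of one attention step in an RNN encoder-decoder, assuming dot-product scores (one of several scoring functions in use): score each encoder hidden state against the current decoder state, normalize with softmax, and take the weighted sum as a context vector. Sizes and the helper names are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_states):
    """Attend from one decoder hidden state over all encoder hidden states.

    decoder_state: (H,); encoder_states: (N, H), one row per source position.
    Returns the attention distribution (N,) and the context vector (H,).
    """
    scores = encoder_states @ decoder_state      # one score per source position
    weights = softmax(scores)                    # attention distribution, sums to 1
    context = weights @ encoder_states           # weighted sum of encoder states
    return weights, context

rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((6, 8))   # hypothetical: 6 source tokens, H = 8
decoder_state = rng.standard_normal(8)
weights, context = dot_product_attention(decoder_state, encoder_states)
print(weights.round(3), context.shape)
```

The context vector is then typically combined with the decoder hidden state (for example by concatenation) to predict the next output word.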
Drawbacks of RNN
Transformer
Some slides borrowed from Sarah Wiegreffe at Georgia Tech and “The Illustrated Transformer” https://jalammar.github.io/illustrated-transformer/
Transformer
Self-Attention
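Self-attention in the Transformer uses scaled dot-product attention: each token's embedding is projected to a query, a key, and a value, and softmax(QK^T / sqrt(d_k)) V mixes the values. A NumPy sketch with hypothetical sizes and random projection matrices:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # queries, keys, values: (n, d_k)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n, n): every token vs every token
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ V, weights                   # outputs: (n, d_k)

rng = np.random.default_rng(0)
n, d_model, d_k = 5, 16, 8                         # hypothetical sizes
X = rng.standard_normal((n, d_model))              # one embedding per token
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
Z, A = self_attention(X, W_q, W_k, W_v)
print(Z.shape, A.shape)  # (5, 8) (5, 5)
```

Unlike the RNN, every token attends to every other token in one matrix product, so there is no sequential dependency between positions.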
Multi-Head Self-Attention
Retaining Hidden State Size
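A sketch of multi-head self-attention, assuming the design of the original Transformer: each of the h heads works in d_k = d_model / h dimensions, the head outputs are concatenated, and a final projection W_o maps the result back to d_model, so the hidden state size is retained. Sizes and weights are hypothetical:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, heads, W_o):
    """heads: list of (W_q, W_k, W_v), each (d_model, d_k); W_o: (h * d_k, d_model)."""
    outputs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # each head attends independently
        outputs.append(A @ V)                          # (n, d_k) per head
    concat = np.concatenate(outputs, axis=-1)          # (n, h * d_k)
    return concat @ W_o                                # project back to (n, d_model)

rng = np.random.default_rng(0)
n, d_model, num_heads = 5, 16, 4
d_k = d_model // num_heads                              # 4 dims per head
heads = [tuple(rng.standard_normal((d_model, d_k)) for _ in range(3))
         for _ in range(num_heads)]
W_o = rng.standard_normal((num_heads * d_k, d_model))
X = rng.standard_normal((n, d_model))
out = multi_head_self_attention(X, heads, W_o)
print(out.shape)  # (5, 16): same hidden size as the input
```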
Details of Each Attention Sub-Layer of Transformer Encoder
Each Layer of Transformer Encoder
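One encoder layer can be sketched as two sub-layers, each wrapped in a residual connection followed by layer normalization (the post-norm arrangement of the original Transformer). For brevity this assumes single-head attention and omits dropout and the layer norm's learned gain and bias; the parameter names and sizes are hypothetical:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token's vector; learned gain/bias omitted for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def encoder_layer(X, p):
    # Sub-layer 1: self-attention, then residual connection + layer norm.
    H = layer_norm(X + attention(X, p["W_q"], p["W_k"], p["W_v"]))
    # Sub-layer 2: position-wise feed-forward network (same for each token).
    ffn = np.maximum(0, H @ p["W_1"] + p["b_1"]) @ p["W_2"] + p["b_2"]
    return layer_norm(H + ffn)

rng = np.random.default_rng(0)
n, d_model, d_ff = 5, 16, 64                 # hypothetical sizes
p = {name: rng.standard_normal(shape) * 0.1 for name, shape in [
    ("W_q", (d_model, d_model)), ("W_k", (d_model, d_model)),
    ("W_v", (d_model, d_model)), ("W_1", (d_model, d_ff)),
    ("b_1", (d_ff,)), ("W_2", (d_ff, d_model)), ("b_2", (d_model,))]}
X = rng.standard_normal((n, d_model))
out = encoder_layer(X, p)
print(out.shape)  # (5, 16)
```

Because the output has the same shape as the input, such layers can be stacked directly.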
Positional Encoding
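A sketch of the sinusoidal positional encoding from the original Transformer paper, assuming that is the variant meant here (learned position embeddings are a common alternative). The encoding is added to the token embeddings so that self-attention, which is otherwise order-agnostic, can use token positions:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same)."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    pos = np.arange(max_len)[:, None]                 # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2): values 2i
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
# The encoding is added (not concatenated) to the token embeddings:
embeddings = np.zeros((10, 16))                       # hypothetical 10-token input
inputs = embeddings + pe[:10]
print(pe.shape, pe[0, :4])  # (50, 16) [0. 1. 0. 1.]
```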
Each Layer of Transformer Decoder
Transformer Decoder - Masked Multi-Head Attention
Problem with encoder-style self-attention in the decoder: when generating, we can’t see the future! Each position must be prevented (masked) from attending to later positions.
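The masking can be sketched by setting every score for a later position to negative infinity before the softmax, so each position attends only to itself and earlier positions. The helper name and sizes are hypothetical:

```python
import numpy as np

def masked_self_attention(Q, K, V):
    """Scaled dot-product attention where position t may only attend to positions <= t."""
    n, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                         # (n, n)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)      # True above the diagonal
    scores = np.where(future, -np.inf, scores)              # block attention to the future
    scores = scores - scores.max(axis=-1, keepdims=True)    # stable softmax
    weights = np.exp(scores)                                # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d_k = 4, 8                                               # hypothetical sizes
Q, K, V = (rng.standard_normal((n, d_k)) for _ in range(3))
out, weights = masked_self_attention(Q, K, V)
print(np.round(weights, 2))   # upper triangle is all zeros
```

The first position can only attend to itself, so its output is exactly its own value vector.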
Transformer
Thank you!