Machine Learning
Lecture 11
Deep Learning
RNN & GAN
Dr. Patrick Chan
South China University of Technology, China
Agenda
Recurrent Neural Network
Structure
Backpropagation Through Time
Types
Example: Image Captioning
Generative Adversarial Network
Generative and Discriminative models
Training Process and Loss Function
Problem of Traditional ML
Traditional ML assumes:
The format of every sample is identical
The decision on each sample is independent
Therefore it cannot deal with data that has:
Length variation
Information carried by the sample sequence
Recurrent Neural Network
Sequence learning: learn from and handle sequential data
Application Example:
Language
Video
A Recurrent Neural Network learns the information of the sequence
RNN
Structure
[Figure: a feed-forward network maps x to h to ŷ, with weights only between layers; a recurrent network adds a feedback connection from the hidden layer back to itself. tanh is used as the activation.]
RNN
Structure
[Figure: feed-forward vs. recurrent structure, annotated with weights]
Feed-Forward Neural Network:
$h = \phi(W_{xh}\,x)$
$\hat{y} = \phi(W_{hy}\,h)$
Recurrent Neural Network:
$h^{(t)} = \phi(W_{xh}\,x^{(t)} + W_{hh}\,h^{(t-1)})$
$\hat{y}^{(t)} = \phi(W_{hy}\,h^{(t)})$
RNN
Structure
An RNN is multiple copies of the same network, each passing a message to its successor
$W_{xh}$, $W_{hh}$, and $W_{hy}$ are shared (they do not change across time steps)
[Figure: a looped RNN cell with weights $W_{xh}$, $W_{hh}$, $W_{hy}$. Unfolding the loop gives a chain $h^{(0)} \to h^{(1)} \to h^{(2)} \to \dots \to h^{(t)}$, where step $t$ takes $x^{(t)}$ and $h^{(t-1)}$ and outputs $\hat{y}^{(t)}$, reusing the same weights at every step.]
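To make the weight sharing concrete, here is a minimal NumPy sketch of the unfolded forward pass (the layer sizes, tanh on both layers, and random weights are illustrative assumptions, not values from the lecture):

import numpy as np

rng = np.random.default_rng(0)
D_x, D_h = 4, 3                            # assumed input and hidden sizes
W_xh = 0.1 * rng.normal(size=(D_h, D_x))   # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(D_h, D_h))   # hidden-to-hidden weights (the loop)
W_hy = 0.1 * rng.normal(size=(1, D_h))     # hidden-to-output weights

def rnn_forward(xs, h):
    # Unfold the loop: the SAME W_xh, W_hh, W_hy are reused at every step
    y_hats = []
    for x in xs:                           # one iteration per time step
        h = np.tanh(W_xh @ x + W_hh @ h)   # h(t) = phi(W_xh x(t) + W_hh h(t-1))
        y_hats.append(np.tanh(W_hy @ h))   # yhat(t) = phi(W_hy h(t))
    return y_hats

xs = [rng.normal(size=D_x) for _ in range(5)]   # a length-5 input sequence
print(rnn_forward(xs, h=np.zeros(D_h)))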
RNN
Example
Character-level language model
Generates one new character at a time by outputting the probability distribution of the next character, given the sequence of previous characters
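A hedged sketch of this sampling loop: the code below draws characters one at a time from a softmax over the RNN output. The four-character vocabulary and the untrained random weights are assumptions, so the generated text is gibberish until the weights are learned:

import numpy as np

rng = np.random.default_rng(1)
vocab = ['h', 'e', 'l', 'o']                 # assumed toy vocabulary
V, D_h = len(vocab), 8
W_xh = 0.1 * rng.normal(size=(D_h, V))
W_hh = 0.1 * rng.normal(size=(D_h, D_h))
W_hy = 0.1 * rng.normal(size=(V, D_h))       # untrained weights, illustration only

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample(seed_idx, n):
    # Generate n characters, one at a time, from P(next char | previous chars)
    h, idx, out = np.zeros(D_h), seed_idx, []
    for _ in range(n):
        x = np.zeros(V)
        x[idx] = 1.0                         # one-hot of the previous character
        h = np.tanh(W_xh @ x + W_hh @ h)
        p = softmax(W_hy @ h)                # distribution over the next character
        idx = rng.choice(V, p=p)             # sample the next character
        out.append(vocab[idx])
    return ''.join(out)

print(sample(seed_idx=0, n=10))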
RNN
Different Types
One to One: Feed-Forward Network
One to Many: Image Captioning (image → sequence of words)
Many to One: Sentiment Classification (sequence of words → sentiment)
Many to Many: Translation (sequence of words → sequence of words)
Many to Many: Video Classification at the frame level (frame → class)
RNN
Backpropagation Through Time
Parameters of the RNN ($W_{xh}$, $W_{hh}$, and $W_{hy}$) are shared
Different time steps affect each other
Derivatives are aggregated across time steps
A special backpropagation is therefore needed: Backpropagation Through Time (BPTT)
RNN
Backpropagation Through Time
[Figure: RNN unrolled for three steps, with loss $J^{(t)}$ comparing $\hat{y}^{(t)}$ to the target $y^{(t)}$ at each step]
Using $h^{(t)} = \phi(W_{xh}\,x^{(t)} + W_{hh}\,h^{(t-1)})$ and $\hat{y}^{(t)} = \phi(W_{hy}\,h^{(t)})$:
$\frac{\partial J}{\partial W_{hy}} = \sum_t \frac{\partial J^{(t)}}{\partial W_{hy}}$
$\frac{\partial J^{(t)}}{\partial W_{hy}} = \frac{\partial}{\partial W_{hy}}\,\frac{1}{2}\big(y^{(t)} - \hat{y}^{(t)}\big)^2 = -\big(y^{(t)} - \hat{y}^{(t)}\big)\,\frac{\partial \hat{y}^{(t)}}{\partial W_{hy}}$
$\frac{\partial \hat{y}^{(t)}}{\partial W_{hy}} = \frac{\partial \phi(W_{hy}\,h^{(t)})}{\partial W_{hy}} = \phi'(W_{hy}\,h^{(t)})\,\frac{\partial (W_{hy}\,h^{(t)})}{\partial W_{hy}} = \phi'(W_{hy}\,h^{(t)})\,h^{(t)}$
RNN
Backpropagation Through Time
The same chain rule applies to $W_{hh}$, except for the last factor:
$\frac{\partial J^{(t)}}{\partial W_{hh}} = -\big(y^{(t)} - \hat{y}^{(t)}\big)\,\frac{\partial \hat{y}^{(t)}}{\partial W_{hh}}$
$\frac{\partial \hat{y}^{(t)}}{\partial W_{hh}} = \phi'(W_{hy}\,h^{(t)})\,\frac{\partial (W_{hy}\,h^{(t)})}{\partial W_{hh}} = \phi'(W_{hy}\,h^{(t)})\,W_{hy}\,\frac{\partial h^{(t)}}{\partial W_{hh}}$
RNN
Backpropagation Through Time
The remaining factor $\frac{\partial h^{(t)}}{\partial W_{hh}}$ is recursive, since $h^{(t)}$ depends on $h^{(t-1)}$:
$\frac{\partial h^{(t)}}{\partial W_{hh}} = \frac{\partial \phi(W_{xh}\,x^{(t)} + W_{hh}\,h^{(t-1)})}{\partial W_{hh}} = \phi'(\cdot)\,\Big(h^{(t-1)} + W_{hh}\,\frac{\partial h^{(t-1)}}{\partial W_{hh}}\Big)$
(a recursive function; the product-rule terms involving $\frac{\partial W_{xh}}{\partial W_{hh}}$ and $\frac{\partial x^{(t)}}{\partial W_{hh}}$ vanish, since neither depends on $W_{hh}$)
Base case: $\frac{\partial h^{(1)}}{\partial W_{hh}} = \phi'(\cdot)\,h^{(0)}$
RNN
Backpropagation Through Time
Similarly for $W_{xh}$:
$\frac{\partial J^{(t)}}{\partial W_{xh}} = -\big(y^{(t)} - \hat{y}^{(t)}\big)\,\frac{\partial \hat{y}^{(t)}}{\partial W_{xh}}$
$\frac{\partial \hat{y}^{(t)}}{\partial W_{xh}} = \phi'(W_{hy}\,h^{(t)})\,\frac{\partial (W_{hy}\,h^{(t)})}{\partial W_{xh}} = \phi'(W_{hy}\,h^{(t)})\,W_{hy}\,\frac{\partial h^{(t)}}{\partial W_{xh}}$
RNN
Backpropagation Through Time
$\frac{\partial h^{(t)}}{\partial W_{xh}} = \frac{\partial \phi(W_{xh}\,x^{(t)} + W_{hh}\,h^{(t-1)})}{\partial W_{xh}} = \phi'(\cdot)\,\Big(x^{(t)} + W_{hh}\,\frac{\partial h^{(t-1)}}{\partial W_{xh}}\Big)$ (a recursive function)
Base case: $\frac{\partial h^{(1)}}{\partial W_{xh}} = \phi'(\cdot)\,x^{(1)}$
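A hedged numerical check of the recursion above: the scalar sketch below (the toy sequence, tanh activation, and weight values are all assumptions) accumulates $\partial J/\partial W_{hh}$ with the recursive formula and compares it against a finite-difference estimate:

import numpy as np

phi = np.tanh
dphi = lambda a: 1.0 - np.tanh(a) ** 2
w_xh, w_hh, w_hy = 0.5, 0.8, 1.2               # scalar weights for clarity
xs, ys = [0.3, -0.1, 0.7], [0.2, 0.4, -0.3]    # toy inputs and targets
h0 = 0.1

def forward(w_xh, w_hh, w_hy):
    h, J, hs, pre = h0, 0.0, [h0], []
    for x, y in zip(xs, ys):
        a = w_xh * x + w_hh * h                # pre-activation of h(t)
        h = phi(a)
        J += 0.5 * (y - phi(w_hy * h)) ** 2    # J = sum_t 1/2 (y(t) - yhat(t))^2
        hs.append(h)
        pre.append(a)
    return J, hs, pre

J, hs, pre = forward(w_xh, w_hh, w_hy)
grad, dh = 0.0, 0.0                            # base recursion state: dh(0)/dw_hh = 0
for t, (x, y) in enumerate(zip(xs, ys)):
    # dh(t)/dw_hh = phi'(a(t)) * (h(t-1) + w_hh * dh(t-1)/dw_hh)
    dh = dphi(pre[t]) * (hs[t] + w_hh * dh)
    y_hat = phi(w_hy * hs[t + 1])
    grad += -(y - y_hat) * dphi(w_hy * hs[t + 1]) * w_hy * dh

eps = 1e-6                                     # finite-difference comparison
num = (forward(w_xh, w_hh + eps, w_hy)[0] -
       forward(w_xh, w_hh - eps, w_hy)[0]) / (2 * eps)
print(grad, num)                               # the two values should agree closely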
RNN
Other Structures
RNNs with multiple hidden layers
[Figure: a stacked RNN with several hidden layers, unrolled over time; each layer passes its hidden state both upward to the next layer and forward to the next time step.]
RNN
Other Structures
Bi-directional RNN
Processes the input sequence in both the forward and the reverse direction
Popular in speech recognition
[Figure: a bi-directional RNN, with one chain of hidden states running forward in time and a second chain running backward; both chains feed each output.]
RNN
Example: Image Captioning
[Figure: image captioning with an RNN. Image features initialize the network; given <START> it outputs "straw", then, given "straw", it outputs "hat", and finally <END>.]
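A minimal sketch of the greedy decoding loop this figure depicts. The four-word vocabulary, the image-to-hidden projection W_ih, and the untrained random weights are all hypothetical, so a real model would produce this caption only after training:

import numpy as np

rng = np.random.default_rng(2)
vocab = ['<START>', 'straw', 'hat', '<END>']    # hypothetical toy vocabulary
V, D_h, D_img = len(vocab), 8, 16
W_ih = 0.1 * rng.normal(size=(D_h, D_img))      # image features -> initial hidden state
W_xh = 0.1 * rng.normal(size=(D_h, V))
W_hh = 0.1 * rng.normal(size=(D_h, D_h))
W_hy = 0.1 * rng.normal(size=(V, D_h))

def caption(img_feat, max_len=10):
    # Greedy decoding: start from <START>, emit words until <END>
    h = np.tanh(W_ih @ img_feat)                # condition the RNN on the image
    idx, words = 0, []                          # index 0 is <START>
    for _ in range(max_len):
        x = np.zeros(V)
        x[idx] = 1.0                            # one-hot of the previous word
        h = np.tanh(W_xh @ x + W_hh @ h)
        idx = int(np.argmax(W_hy @ h))          # most likely next word
        if vocab[idx] == '<END>':
            break
        words.append(vocab[idx])
    return ' '.join(words)

print(caption(rng.normal(size=D_img)))          # e.g. 'straw hat' after training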
RNN
Long-Term Dependency
RNNs perform badly when a task requires a long-term dependency
Example:
A language model trying to predict the next word based on the previous ones
I grew up in France… I speak fluent French
"I speak fluent" suggests the next word is a language
To narrow down which language, the context "France" is required, which needs a long-range dependency
RNN
Long Short-Term Memory
[Figure: a standard RNN cell (a single tanh layer) compared with an LSTM cell, which carries a cell state $c^{(t)}$ alongside the hidden state $h^{(t)}$ and uses gates $f^{(t)}$, $i^{(t)}$, $o^{(t)}$ and a candidate state $\tilde{c}^{(t)}$.]
RNN
Long Short-Term Memory
Four key components
Cell state
Forget Gate
Input Gate
Output Gate
[Figure: LSTM cell diagram, repeated with each of the four components highlighted]
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN
Long Short-Term Memory
Cell state
Runs straight down the entire chain with only minor linear interactions
Easy for information to flow along it unchanged
[Figure: the cell state path from $c^{(t-1)}$ to $c^{(t)}$ running straight across the top of the LSTM cell]
RNN
Long Short-Term Memory
Cell state can be modified by gates
Gates: Optionally let information through
Sigmoid function
Range: 0–1
Describes how much of each component should be let through
An LSTM has three gates to manage the cell state
[Figure: a gate is a sigmoid layer followed by a pointwise multiplication]
RNN
Long Short-Term Memory
Forget Gate
Based on $h^{(t-1)}$ and $x^{(t)}$, outputs a number between 0 and 1 for each element of the cell state $c^{(t-1)}$
1: completely keep the value
0: completely remove the value
$f^{(t)} = \sigma\big(W_f\,[h^{(t-1)}, x^{(t)}] + b_f\big)$
Example: the cell state might include the gender of the present subject, so that the correct pronoun (he/she) can be used. When a new subject appears, the previous gender should be forgotten.
RNN
Long Short-Term Memory
Input Gate
Decides what new information is stored in the cell state
tanh: proposes the information for the update as a new candidate cell state $\tilde{c}^{(t)}$
sigmoid: determines how much of the candidate should get involved in the update
$i^{(t)} = \sigma\big(W_i\,[h^{(t-1)}, x^{(t)}] + b_i\big)$
$\tilde{c}^{(t)} = \tanh\big(W_c\,[h^{(t-1)}, x^{(t)}] + b_c\big)$
Example: add the gender of the new subject to the cell state.
RNN
Long Short-Term Memory
$c^{(t-1)}$ is updated to the new cell state $c^{(t)}$
Forgetting old things by multiplying by $f^{(t)}$
Learning new things by adding $i^{(t)} * \tilde{c}^{(t)}$
$c^{(t)} = f^{(t)} * c^{(t-1)} + i^{(t)} * \tilde{c}^{(t)}$
Example: drop the information about the old subject's gender and add the new information.
RNN
Long Short-Term Memory
Output Gate
Outputs extracted (tanh) and filtered (sigmoid) information from the cell state
tanh: extracts the information to be output
sigmoid: decides how much of the cell state should be output
$o^{(t)} = \sigma\big(W_o\,[h^{(t-1)}, x^{(t)}] + b_o\big)$
$h^{(t)} = o^{(t)} * \tanh\big(c^{(t)}\big)$
Example: output information relevant to the current state, e.g., whether the subject is singular or plural, so that the correct form of a verb can be determined.
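Putting the four components together, here is a hedged NumPy sketch of one LSTM step implementing the equations above (the sizes, zero biases, and random weights are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(3)
D_x, D_h = 4, 3                                  # assumed input and hidden sizes

def make_W():
    return 0.1 * rng.normal(size=(D_h, D_x + D_h))

W_f, W_i, W_c, W_o = make_W(), make_W(), make_W(), make_W()
b_f = b_i = b_c = b_o = np.zeros(D_h)            # zero biases for simplicity

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])              # [h(t-1), x(t)]
    f = sigmoid(W_f @ z + b_f)                   # forget gate: what to erase
    i = sigmoid(W_i @ z + b_i)                   # input gate: how much to write
    c_tilde = np.tanh(W_c @ z + b_c)             # candidate cell state
    c = f * c_prev + i * c_tilde                 # forget old things, add new things
    o = sigmoid(W_o @ z + b_o)                   # output gate: how much to reveal
    h = o * np.tanh(c)                           # new hidden state
    return h, c

h, c = np.zeros(D_h), np.zeros(D_h)
for x in [rng.normal(size=D_x) for _ in range(5)]:
    h, c = lstm_step(x, h, c)
print(h, c)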
Generative VS Discriminative
Is it a lion or a cat?
Method 1 (Generative Model):
Find out all possible images of lions and cats
Compare the image with the collected lion and cat images
Method 2 (Discriminative Model):
Find out the differences between lions and cats
Identify the image according to those differences
Generative VS Discriminative
Generative Model
Understands everything: $p(x, y)$
A more difficult task than the discriminative model
Classifies samples
Able to generate samples
Discriminative Model
Understands the difference between classes: $p(y \mid x)$
Can only classify samples
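The two views are connected by Bayes' rule, which is one way to see why the generative task is the harder one: the joint density a generative model learns already determines the conditional a discriminative model learns, but not vice versa:

\[
p(y \mid x) \;=\; \frac{p(x, y)}{p(x)} \;=\; \frac{p(x \mid y)\,p(y)}{\sum_{y'} p(x \mid y')\,p(y')}
\]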
Generative Model
Aims to increase the similarity between the real model and the simulated model
[Figure: a real model producing a real face vs. a simulated model producing a generated face]
Generative Model
How to quantify the similarity of real and generated distributions?
Explicit way (evaluation)
E.g., a Gaussian assumption
Needs prior knowledge
The evaluation may not be reasonable for some applications
Implicit way
Generative Adversarial Network
Evaluated by the accuracy of a discriminator
Adversarial Concept
An arms race between defender and attacker
Example: face recognition
Defender: face recognition. Attacker: put a photo in front of the camera.
Defender: depth map. Attacker: a 3D fake model.
Generative Adversarial Network
The Generative Adversarial Network (GAN) quantifies the similarity implicitly, by the accuracy of classifying the samples
Good classification indicates that the real and generated samples are different
A GAN contains:
A Generative Model (G)
A Discriminative Model (D)
[Figure: the real model and the generative model both feed samples to the discriminative model, which decides: real or generated?]
GAN
Training Process
Generative Model G(z)
Generates samples similar to the real ones
Takes noise (z) as input in order to generate a different sample each time
z typically has a much lower dimensionality than x
Discriminative Model D(x)
Classifies whether a sample is real or fake
If x is real, D(x) = 1; otherwise, D(x) = 0
[Figure: noise z feeds the generative model; its output and real samples feed the discriminative model, which decides: real or generated?]
GAN
Training Process
G aims to fool D
D aims not to be fooled
The models are trained simultaneously
As G gets better, D has a more challenging task
As D gets better, G has a more challenging task
Only G is used in the end
D exists to assist the training of G
GAN
Training Process
[Figure: GAN training, shown as a sequence of snapshots. Green solid line: probability density function (PDF) of G; black dotted line: PDF of the original x; blue dashed line: output of the discriminator D.]
Start: G is not similar to x, and D is unstable
After D is updated, D wins: it distinguishes real from generated well
After G is updated, G wins: D cannot distinguish well, and its output tends toward 0.5
GAN
Training Process
Finally (hopefully…): G has learned well (its PDF is identical to that of x), so D can no longer separate real from generated samples and outputs 0.5 everywhere
[Figure legend as above: green solid line: PDF of G; black dotted line: PDF of the original x; blue dashed line: output of D]
GAN
Training Process
Loss function for D:
If x is real, D(x) = 1; otherwise, D(x) = 0
Minimize the classification error, i.e. D maximizes $\log D(x) + \log\big(1 - D(G(z))\big)$
Loss function for G:
Maximize the error of D: $J^{(G)} = -J^{(D)}$, a minimax procedure
Real: $D(x) \to 1 \;\Rightarrow\; \log D(x) \to 0$
Fake: $D(G(z)) \to 0 \;\Rightarrow\; \log\big(1 - D(G(z))\big) \to 0$
GAN
Training Process
Train both G and D simultaneously
Stochastic gradient descent
Two ways of training:
(1) Compute the gradients of $J^{(D)}$ and $J^{(G)}$, and update both together
(2) Freeze one model, compute the gradient and update the other; then vice versa (see the sketch below)
One model can be trained without altering the other
With one model fixed, the other can be trained for multiple epochs
Increasing the ability of one side assigns a more difficult task to the other
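A minimal PyTorch sketch of training style (2), alternating between the two models. The toy Gaussian "real" data, network sizes, and learning rates are assumptions, and the generator uses the common non-saturating loss (maximize log D(G(z))) rather than literally maximizing D's error:

import torch
import torch.nn as nn

torch.manual_seed(0)
D_z, D_x = 8, 2                                   # assumed noise and data dimensions
G = nn.Sequential(nn.Linear(D_z, 32), nn.ReLU(), nn.Linear(32, D_x))
D = nn.Sequential(nn.Linear(D_x, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def real_batch(n=64):                             # stand-in for the real data model
    return 0.5 * torch.randn(n, D_x) + 2.0

for step in range(1000):
    # (a) Update D with G frozen: push D(x) -> 1 on real, D(G(z)) -> 0 on fake
    x = real_batch()
    fake = G(torch.randn(x.size(0), D_z)).detach()   # detach: do not update G here
    loss_D = (bce(D(x), torch.ones(x.size(0), 1)) +
              bce(D(fake), torch.zeros(x.size(0), 1)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # (b) Update G with D frozen: push D(G(z)) -> 1, i.e. fool the discriminator
    z = torch.randn(64, D_z)
    loss_G = bce(D(G(z)), torch.ones(64, 1))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()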
GAN
Problem
Training a GAN is very difficult
The networks are difficult to converge
Ideal goal: G and D reach the desired equilibrium, but this is rare
GANs have yet to converge on large problems
D often becomes too strong too quickly, and G ends up not learning anything
G has a more complicated task than D:
D focuses on the difference
G focuses on the distribution
References
https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf
https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
http://colah.github.io/posts/2015-08-Understanding-LSTMs/