

Machine Learning

Lecture 11

Deep Learning

RNN & GAN

Dr. Patrick Chan

[email protected]

South China University of Technology, China



Agenda

Recurrent Neural Network

Structure

Backpropagation Through Time

Types

Example: Image Captioning

Generative Adversarial Network

Generative and Discriminative models

Training Process and Loss Function


Problem of Traditional ML

Traditional ML assumes

The sample format is identical

The decision on each sample is independent

It therefore cannot deal with data that has

Length variation

Information carried by the sample sequence


Recurrent Neural Network

Sequence learning: learn from and handle sequential data

Application Example:

Language

Video

A Recurrent Neural Network learns the information in the sequence


RNN

Structure


[Figure: a feed-forward neural network maps $x$ directly to $\hat{y}$, while a recurrent neural network also feeds the hidden state $h^{(t-1)}$ back into the computation of $h^{(t)}$. tanh is used as the activation function.]


RNN

Structure

[Figure: the feed-forward network and the recurrent network annotated with their weight matrices.]

Feed Forward Neural Network:

$$\hat{y} = \phi\left(W x\right)$$

where $W$ denotes its weight matrix.

Recurrent Neural Network:

$$h^{(t)} = \phi\left(W_{xh}\, x^{(t)} + W_{hh}\, h^{(t-1)}\right)$$

$$\hat{y}^{(t)} = \phi\left(W_{hy}\, h^{(t)}\right)$$


RNN

Structure

RNN is multiple copies of the same network, each passing a message to a successor

$W_{xh}$, $W_{hh}$, and $W_{hy}$ are shared (they do not change across time steps)

[Figure: the loop unfolded through time. Starting from the initial state $h^{(0)}$, each step computes $h^{(t)}$ from $x^{(t)}$ and $h^{(t-1)}$ and outputs $\hat{y}^{(t)}$, reusing the same $W_{xh}$, $W_{hh}$, $W_{hy}$ at every step.]
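To make the weight sharing concrete, here is a minimal NumPy sketch (not from the slides) of the unfolded forward pass; the names W_xh, W_hh, W_hy and the tanh activation follow the notation above, while the toy dimensions are arbitrary assumptions:

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy):
    """Unfolded RNN forward pass: the same W_xh, W_hh, W_hy
    are reused (shared) at every time step."""
    h, hs, ys = h0, [], []
    for x in xs:                              # one iteration per time step
        h = np.tanh(W_xh @ x + W_hh @ h)      # h(t) from x(t) and h(t-1)
        ys.append(np.tanh(W_hy @ h))          # output y-hat(t)
        hs.append(h)
    return hs, ys

# Toy usage: 3-step sequence, 4-dim inputs, 5-dim hidden state, 2-dim outputs
rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(3)]
W_xh = 0.1 * rng.standard_normal((5, 4))
W_hh = 0.1 * rng.standard_normal((5, 5))
W_hy = 0.1 * rng.standard_normal((2, 5))
hs, ys = rnn_forward(xs, np.zeros(5), W_xh, W_hh, W_hy)
print(len(ys), ys[-1].shape)                  # 3 (2,)
```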


RNN

Example

Character-level language model

Generates one new character at a time by outputting the probability distribution of the next character given the sequence of previous characters, as in the sketch below
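A rough illustration of that loop: sample a character from the softmax over the RNN output and feed it back in as the next input. Everything here, including the tiny vocabulary, is an assumption for the sketch, and with untrained random weights the output is gibberish:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_chars(start_idx, n, h, W_xh, W_hh, W_hy, vocab, rng):
    """Generate n characters, feeding each sampled character back
    in as the next (one-hot) input."""
    idx, out = start_idx, []
    for _ in range(n):
        x = np.zeros(len(vocab))
        x[idx] = 1.0                          # one-hot current character
        h = np.tanh(W_xh @ x + W_hh @ h)      # update hidden state
        p = softmax(W_hy @ h)                 # distribution over next character
        idx = rng.choice(len(vocab), p=p)     # sample the next character
        out.append(vocab[idx])
    return "".join(out)

vocab = list("helo")
rng = np.random.default_rng(1)
W_xh = 0.1 * rng.standard_normal((8, 4))
W_hh = 0.1 * rng.standard_normal((8, 8))
W_hy = 0.1 * rng.standard_normal((4, 8))
print(sample_chars(0, 10, np.zeros(8), W_xh, W_hh, W_hy, vocab, rng))
```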


RNN

Different Types


One to One: Feed-Forward Network

One to Many: Image Captioning (Image → Seq. of Words)

Many to One: Sentiment Classification (Seq. of Words → Sentiment)

Many to Many: Translation (Seq. of Words → Seq. of Words)

Many to Many: Video Classification, frame level (Frame → Class)


RNN

Backpropagation Through Time

Parameters of the RNN ($W_{xh}$, $W_{hh}$, and $W_{hy}$) are shared

Different time steps affect each other

Derivatives are therefore aggregated across time steps

A special backpropagation is used: Backpropagation Through Time (BPTT)


RNN

Backpropagation Through Time

The total loss is summed over time steps. For the output weights $W_{hy}$:

$$\frac{\partial J}{\partial W_{hy}} = \sum_t \frac{\partial J^{(t)}}{\partial W_{hy}}, \qquad J^{(t)} = \frac{1}{2}\left(y^{(t)} - \hat{y}^{(t)}\right)^2$$

$$\frac{\partial J^{(t)}}{\partial W_{hy}} = -\left(y^{(t)} - \hat{y}^{(t)}\right)\frac{\partial \hat{y}^{(t)}}{\partial W_{hy}}$$

$$\frac{\partial \hat{y}^{(t)}}{\partial W_{hy}} = \frac{\partial\, \phi\left(W_{hy} h^{(t)}\right)}{\partial W_{hy}} = \phi'\left(W_{hy} h^{(t)}\right)\frac{\partial\left(W_{hy} h^{(t)}\right)}{\partial W_{hy}}, \qquad \frac{\partial\left(W_{hy} h^{(t)}\right)}{\partial W_{hy}} = h^{(t)}$$

where $h^{(t)} = \phi\left(W_{xh} x^{(t)} + W_{hh} h^{(t-1)}\right)$ and $\hat{y}^{(t)} = \phi\left(W_{hy} h^{(t)}\right)$.

[Figure: three-step unfolded RNN with losses $J^{(1)}, J^{(2)}, J^{(3)}$ attached to the outputs $\hat{y}^{(1)}, \hat{y}^{(2)}, \hat{y}^{(3)}$.]


RNN

Backpropagation Through Time

For the recurrent weights $W_{hh}$:

$$\frac{\partial J}{\partial W_{hh}} = \sum_t \frac{\partial J^{(t)}}{\partial W_{hh}}, \qquad \frac{\partial J^{(t)}}{\partial W_{hh}} = -\left(y^{(t)} - \hat{y}^{(t)}\right)\frac{\partial \hat{y}^{(t)}}{\partial W_{hh}}$$

$$\frac{\partial \hat{y}^{(t)}}{\partial W_{hh}} = \phi'\left(W_{hy} h^{(t)}\right)\frac{\partial\left(W_{hy} h^{(t)}\right)}{\partial W_{hh}} = \phi'\left(W_{hy} h^{(t)}\right) W_{hy}\,\frac{\partial h^{(t)}}{\partial W_{hh}}$$

where $h^{(t)} = \phi\left(W_{xh} x^{(t)} + W_{hh} h^{(t-1)}\right)$ and $\hat{y}^{(t)} = \phi\left(W_{hy} h^{(t)}\right)$.


RNN

Backpropagation Through Time

The remaining factor is a recursive function:

$$\frac{\partial h^{(t)}}{\partial W_{hh}} = \frac{\partial\, \phi\left(W_{xh} x^{(t)} + W_{hh} h^{(t-1)}\right)}{\partial W_{hh}} = \phi'(\cdot)\left[h^{(t-1)} + W_{hh}\,\frac{\partial h^{(t-1)}}{\partial W_{hh}}\right]$$

since $W_{xh} x^{(t)}$ does not depend on $W_{hh}$, and the product rule applied to $W_{hh} h^{(t-1)}$ gives the two bracketed terms.

Base case:

$$\frac{\partial h^{(1)}}{\partial W_{hh}} = \phi'(\cdot)\, h^{(0)}$$


RNN

Backpropagation Through Time

Similarly, for the input weights $W_{xh}$:

$$\frac{\partial J}{\partial W_{xh}} = \sum_t \frac{\partial J^{(t)}}{\partial W_{xh}}, \qquad \frac{\partial J^{(t)}}{\partial W_{xh}} = -\left(y^{(t)} - \hat{y}^{(t)}\right)\phi'\left(W_{hy} h^{(t)}\right) W_{hy}\,\frac{\partial h^{(t)}}{\partial W_{xh}}$$


RNN

Backpropagation Through Time

The factor $\partial h^{(t)}/\partial W_{xh}$ is again a recursive function:

$$\frac{\partial h^{(t)}}{\partial W_{xh}} = \frac{\partial\, \phi\left(W_{xh} x^{(t)} + W_{hh} h^{(t-1)}\right)}{\partial W_{xh}} = \phi'(\cdot)\left[x^{(t)} + W_{hh}\,\frac{\partial h^{(t-1)}}{\partial W_{xh}}\right]$$

Base case:

$$\frac{\partial h^{(1)}}{\partial W_{xh}} = \phi'(\cdot)\, x^{(1)}$$
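The derivations above can be collected into code. Here is a minimal NumPy sketch (assuming tanh for $\phi$ and the squared loss $J^{(t)} = \frac{1}{2}(y^{(t)} - \hat{y}^{(t)})^2$); rather than expanding the recursion separately for each $J^{(t)}$, it walks backwards once through time, which accumulates exactly the same terms:

```python
import numpy as np

def bptt(xs, ys, h0, W_xh, W_hh, W_hy):
    """Forward pass, then backpropagation through time.
    Gradients of the shared weights are summed over all time steps."""
    # Forward pass, caching hidden states and outputs
    hs, yhat = {-1: h0}, {}
    for t, x in enumerate(xs):
        hs[t] = np.tanh(W_xh @ x + W_hh @ hs[t - 1])
        yhat[t] = np.tanh(W_hy @ hs[t])
    # Backward pass
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in (W_xh, W_hh, W_hy))
    dh_next = np.zeros_like(h0)                     # gradient arriving from step t+1
    for t in reversed(range(len(xs))):
        dy = -(ys[t] - yhat[t]) * (1 - yhat[t] ** 2)  # dJ(t)/d(W_hy h), tanh'
        dW_hy += np.outer(dy, hs[t])
        dh = W_hy.T @ dy + dh_next                  # h(t) affects J(t) and h(t+1)
        da = dh * (1 - hs[t] ** 2)                  # through the tanh at step t
        dW_xh += np.outer(da, xs[t])                # the x(t) term of the recursion
        dW_hh += np.outer(da, hs[t - 1])            # the h(t-1) term of the recursion
        dh_next = W_hh.T @ da                       # pass gradient on to step t-1
    return dW_xh, dW_hh, dW_hy

# Toy usage with arbitrary dimensions
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(4)]
ys = [rng.standard_normal(2) for _ in range(4)]
grads = bptt(xs, ys, np.zeros(5),
             0.1 * rng.standard_normal((5, 3)),
             0.1 * rng.standard_normal((5, 5)),
             0.1 * rng.standard_normal((2, 5)))
print([g.shape for g in grads])   # [(5, 3), (5, 5), (2, 5)]
```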


RNN

Other Structures

RNNs with multiple hidden layers

[Figure: an RNN with multiple stacked hidden layers; each layer's hidden state feeds both the layer above it and its own next time step.]


RNN

Other Structures

Bi-directional RNN

Processes the input sequence in both the forward and the reverse direction (a minimal sketch follows the figure below)

Popular in speech recognition

[Figure: a bi-directional RNN; one chain of hidden states runs over the sequence forwards, another runs backwards, and both feed each output.]
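A minimal NumPy sketch of the idea (the names and dimensions are assumptions, not from the slides): run one RNN forwards and a second one backwards, then concatenate the two hidden states at each step:

```python
import numpy as np

def birnn_forward(xs, params_fwd, params_bwd):
    """Bi-directional RNN: one pass left-to-right, one right-to-left,
    concatenating the two hidden states at every time step."""
    def run(seq, W_xh, W_hh, h):
        out = []
        for x in seq:
            h = np.tanh(W_xh @ x + W_hh @ h)
            out.append(h)
        return out
    hf = run(xs, *params_fwd)                 # forward direction
    hb = run(xs[::-1], *params_bwd)[::-1]     # backward direction, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(hf, hb)]

rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(6)]
make = lambda: (0.1 * rng.standard_normal((5, 4)),   # W_xh
                0.1 * rng.standard_normal((5, 5)),   # W_hh
                np.zeros(5))                         # h0
states = birnn_forward(xs, make(), make())
print(len(states), states[0].shape)           # 6 (10,)
```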


RNN

Example: Image Captioning

[Figure: unrolled captioning RNN. The image initialises the network; given <START> it outputs "straw", given "straw" it outputs "hat", and given "hat" it outputs <END>.]
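A greedy-decoding sketch of this loop in NumPy. The initialisation of the hidden state from the image features, the weight names, and the tiny vocabulary are all assumptions; with random, untrained weights the output is arbitrary, whereas a trained model would produce "straw hat":

```python
import numpy as np

def greedy_caption(img_feat, W_ih, W_xh, W_hh, W_hy, embed, vocab, max_len=10):
    """Greedy decoding: the image feature initialises the hidden state,
    then words are emitted one at a time until <END>."""
    h = np.tanh(W_ih @ img_feat)                  # image -> initial hidden state
    word, caption = "<START>", []
    for _ in range(max_len):
        h = np.tanh(W_xh @ embed[word] + W_hh @ h)
        word = vocab[int(np.argmax(W_hy @ h))]    # most likely next word
        if word == "<END>":
            break
        caption.append(word)
    return " ".join(caption)

rng = np.random.default_rng(2)
vocab = ["<START>", "straw", "hat", "<END>"]
embed = {w: rng.standard_normal(6) for w in vocab}  # toy word embeddings
print(greedy_caption(rng.standard_normal(10),       # toy image feature
                     0.1 * rng.standard_normal((8, 10)),
                     0.1 * rng.standard_normal((8, 6)),
                     0.1 * rng.standard_normal((8, 8)),
                     0.1 * rng.standard_normal((4, 8)),
                     embed, vocab))
```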



RNN

Long-Term Dependency

RNNs perform badly when a task requires long-term dependencies

Example:

A language model trying to predict the next word based on the previous ones

I grew up in France… I speak fluent French

“I speak fluent” suggests the next word is a language

To narrow down which language, the context of France is required, which needs a long-term dependency


RNN

Long Short-Term Memory

[Figure: a standard RNN cell, containing a single tanh layer, compared with an LSTM cell, which adds a cell state and several interacting sigmoid and tanh layers.]


RNN

Long Short-Term Memory

Four key components

Cell state

Forget Gate

Input Gate

Output Gate

[Figure: the LSTM cell with its cell state running along the top and the forget, input, and output gates marked.]

http://colah.github.io/posts/2015-08-Understanding-LSTMs/


RNN

Long Short-Term Memory

Cell state

Runs straight down the entire chain with only minor linear interactions

Easy for information to flow along it unchanged

[Figure: the cell state path, running horizontally through the cell from $C^{(t-1)}$ to $C^{(t)}$.]


RNN

Long Short-Term Memory

Cell state can be modified by gates

Gates: Optionally let information through

Sigmoid Function

Range: 0 – 1

Describes how much of each component should be let through

An LSTM has three gates to manage the cell state

[Figure: a gate is a sigmoid layer followed by a pointwise multiplication.]


RNN

Long Short-Term Memory

Forget Gate

Based on $h^{(t-1)}$ and $x^{(t)}$, outputs a number between 0 and 1 for each number in the cell state $C^{(t-1)}$

1: completely keep the value

0: completely remove the value

$$f^{(t)} = \sigma\left(W_f\,[h^{(t-1)}, x^{(t)}] + b_f\right)$$

[Figure: the forget gate $f^{(t)}$ acting on the cell state.]

Example: C might include the gender of the present subject, so that the correct pronoun (he/she) is used. When a new subject appears, the previous gender should be forgotten.


RNN

Long Short-Term Memory

Input Gate

Decides which new information is stored in the cell state

tanh: creates a new candidate cell state $\tilde{C}^{(t)}$

sigmoid: determines how much of the candidate should get involved in the update

$$i^{(t)} = \sigma\left(W_i\,[h^{(t-1)}, x^{(t)}] + b_i\right)$$

$$\tilde{C}^{(t)} = \tanh\left(W_C\,[h^{(t-1)}, x^{(t)}] + b_C\right)$$

[Figure: the input gate $i^{(t)}$ and the candidate cell state $\tilde{C}^{(t)}$.]

Example: add the gender of the new subject to the cell state.


RNN

Long Short-Term Memory

$C^{(t-1)}$ is updated to the new cell state $C^{(t)}$

Forgetting old things by multiplying by $f^{(t)}$

Learning new things by adding $i^{(t)} \ast \tilde{C}^{(t)}$

$$C^{(t)} = f^{(t)} \ast C^{(t-1)} + i^{(t)} \ast \tilde{C}^{(t)}$$

[Figure: the cell state update combining the forget gate and the input gate.]

Example: drop the information about the old subject's gender and add the new information.


RNN

Long Short-Term Memory

Output Gate

Outputs extracted (tanh) and filtered (sigmoid) information from the cell state

tanh: decides which information of the cell state is exposed

sigmoid: decides how much of the cell state should be output

$$o^{(t)} = \sigma\left(W_o\,[h^{(t-1)}, x^{(t)}] + b_o\right)$$

$$h^{(t)} = o^{(t)} \ast \tanh\left(C^{(t)}\right)$$

[Figure: the output gate producing the new hidden state $h^{(t)}$.]

Example: output information relevant to the current subject stored in C, e.g. whether it is singular or plural, so the form of a verb can be determined.
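Putting the four components together, here is a minimal NumPy sketch of one LSTM step following the equations above; each weight matrix acts on the concatenation [h(t-1), x(t)], and the toy shapes are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM time step built from the four components above."""
    z = np.concatenate([h_prev, x])       # [h(t-1), x(t)]
    f = sigmoid(Wf @ z + bf)              # forget gate: what to erase from C
    i = sigmoid(Wi @ z + bi)              # input gate: how much candidate to add
    C_tilde = np.tanh(Wc @ z + bc)        # candidate cell state
    C = f * C_prev + i * C_tilde          # forget old things, learn new things
    o = sigmoid(Wo @ z + bo)              # output gate: how much of C to expose
    h = o * np.tanh(C)                    # new hidden state
    return h, C

H, D = 4, 3                               # hidden and input sizes (arbitrary)
rng = np.random.default_rng(0)
Ws = [0.1 * rng.standard_normal((H, H + D)) for _ in range(4)]
bs = [np.zeros(H) for _ in range(4)]
h, C = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), *Ws, *bs)
print(h.shape, C.shape)                   # (4,) (4,)
```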


Generative VS Discriminative

Is it a Lion or a Cat?

Method 1 (Generative Model)

Find out all possible images of Lions and Cats

Compare the image with the collected Lion and Cat images

Method 2 (Discriminative Model)

Find out the difference between Lion and Cat

Identify the image according to the difference


Generative VS Discriminative

Generative Model

Understands everything: models p(x, y)

A more difficult task than the Discriminative Model

Can classify samples

Able to generate samples

Discriminative Model

Understands the difference between classes: models p(y|x)

Can only classify samples (see the sketch below)

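A toy illustration of the contrast (all numbers invented for the sketch): a generative model that fits one Gaussian per class models p(x, y), so it can both classify via Bayes' rule and generate new samples; a discriminative model such as logistic regression would fit p(y|x) directly and could only classify:

```python
import numpy as np

rng = np.random.default_rng(3)
x0 = rng.normal(-2.0, 1.0, 200)                 # training samples of class 0
x1 = rng.normal(+2.0, 1.0, 200)                 # training samples of class 1

# Generative model: one Gaussian per class, i.e. p(x|y), with equal priors p(y)
mu, sd = [x0.mean(), x1.mean()], [x0.std(), x1.std()]

def p_x_given_y(x, y):
    return np.exp(-0.5 * ((x - mu[y]) / sd[y]) ** 2) / (sd[y] * np.sqrt(2 * np.pi))

def classify(x):                                # Bayes' rule with equal priors
    return int(p_x_given_y(x, 1) > p_x_given_y(x, 0))

def generate(y):                                # only possible generatively
    return rng.normal(mu[y], sd[y])

print(classify(1.5), generate(1))               # e.g. 1 and a sample near +2
```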


Generative Model

Aims to increase the similarity between the real model and the simulated model

[Figure: a real model producing real faces and a simulated model producing generated faces; the goal is to make the two distributions match.]


Generative Model

How to quantify the similarity of the real and generated distributions?

Explicit way (evaluation)

E.g. a Gaussian assumption

Needs prior knowledge

The evaluation may not be reasonable for some applications

Implicit way

Generative Adversarial Network

Evaluated by the accuracy of a discriminator


Adversarial Concept

Arms race between defender and attacker

[Figure: arms race in face recognition. Defender: face recognition. Attacker: put a printed image in front of the camera. Defender: add a depth map. Attacker: build a 3D fake model.]


Generative Adversarial Network

A Generative Adversarial Network (GAN) quantifies the similarity implicitly, by the accuracy of classifying the samples

Good classification accuracy indicates that the real and generated samples are different

GAN contains

Generative Model (G)

Discriminative Model (D)

[Figure: the real model and the Generative Model both feed samples to the Discriminative Model, which decides: real or generated?]


GAN

Training Process

Generative Model G(z)

Generates samples similar to the real ones

Noise (z) is input in order to generate a different sample each time

z typically has much lower dimensionality than x

Discriminative Model D(x)

Classify whether a sample is real or fake

If x is real, D(x)= 1; otherwise, D(x) = 0

[Figure: noise z feeds the Generative Model; its output and samples from the real model feed the Discriminative Model, which decides: real or generated?]


GAN

Training Process

G aims to fool D

D aims not to be fooled

Models are trained simultaneously

As G gets better, D has a more challenging task

As D gets better, G has a more challenging task

Finally, only G is used

D aims to assist the training of G



GAN

Training Process

[Figure: evolution of the distributions during training. Green solid line: probability density function (PDF) of G. Black dotted line: PDF of the original x. Blue dashed line: output of the discriminator D.

Start: G is not similar to x; D is unstable.

After D is updated: D wins (it distinguishes well).

After G is updated: G wins (D cannot distinguish well and outputs around 0.5).]


GAN

Training Process

Finally (hopefully…):

G has learned well (its PDF is identical to that of x)

D can no longer separate them (it outputs 0.5 everywhere)

[Figure legend: green solid line: PDF of G; black dotted line: PDF of the original x; blue dashed line: output of D.]


GAN

Training Process

Loss function for D

If x is real, D(x) = 1; otherwise, D(x) = 0

Minimize the error of D, i.e. maximize

$$J^{(D)} = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right]$$

Real: $D(x) \to 1 \Rightarrow \log D(x) \to 0$

Fake: $D(G(z)) \to 0 \Rightarrow \log\left(1 - D(G(z))\right) \to 0$

Loss function for G

Maximize the error of D:

$$J^{(G)} = -J^{(D)}$$

a minimax procedure.
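A small NumPy sketch of these objectives on single samples (the variable names are assumptions): D_real stands for D(x) and D_fake for D(G(z)):

```python
import numpy as np

def loss_D(D_real, D_fake):
    """Discriminator objective to minimize:
    push D(x) -> 1 on real samples and D(G(z)) -> 0 on fakes."""
    return -(np.log(D_real) + np.log(1.0 - D_fake))

def loss_G(D_fake):
    """Generator objective under the minimax rule J(G) = -J(D):
    only the term log(1 - D(G(z))) depends on G, and G minimizes it."""
    return np.log(1.0 - D_fake)

print(loss_D(0.99, 0.01))   # ~0.02: D is winning
print(loss_G(0.01))         # ~0: G is losing (fakes are easily detected)
print(loss_G(0.99))         # ~-4.6: G is winning (fools D)
```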


GAN

Training Process

Train both G and D simultaneously

Stochastic gradient descent

Two ways of training (see the sketch below):

(1) Compute the gradients of $J^{(D)}$ and $J^{(G)}$, and update both together

(2) Freeze one model, calculate the gradient and update the other, and then vice versa

One model can be trained without altering the other

With one model fixed, the other can be trained for multiple epochs

Increasing the ability of one side assigns a more difficult task to the other

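A minimal training-loop sketch of scheme (2) using PyTorch (assumed available; the architecture, data, and hyperparameters are all invented for the sketch). It uses the common non-saturating variant for G's update, training G to make D output 1 on fakes rather than literally maximizing D's error:

```python
import torch
import torch.nn as nn

# 1-D toy GAN: G maps 8-dim noise to a scalar; D outputs P(real)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = 0.5 * torch.randn(64, 1) + 3.0     # "real" data drawn from N(3, 0.5)
    z = torch.randn(64, 8)                    # noise input for G
    ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

    # Update D with G frozen: label real -> 1, fake -> 0
    fake = G(z).detach()                      # detach: no gradient flows into G
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_d.backward(); opt_D.step()

    # Update G with D frozen: make D label fresh fakes as real
    loss_g = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_g.backward(); opt_G.step()
```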


GAN

Problem

Training a GAN is very difficult

The networks are hard to get to converge

Ideal goal: G and D reach the desired equilibrium, but this is rare

GANs have yet to converge on large problems

D often becomes too strong too quickly, and G ends up not learning anything

G has a more complicated task than D

D focuses on differences

G focuses on the whole distribution


References

https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf

https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
