
Page 1

Practical Machine Learning

Dr. Suyong Eum

Lecture 12

Generative Models: Variational Auto Encoder (VAE)

and Generative Adversarial Networks (GAN)

Page 2

Where we are

- Supervised Learning: LCR (week 2), SVM (week 5), CNN (week 8), RNN (week 10)
- Unsupervised Learning: GMM (week 3), HMM (week 4), PCA (week 6), VAE (week 12), GAN (week 12)
- Reinforcement Learning: DQN (week 14), PG (week 14)

Page 3

You are going to learn:

- The basic concept of generative models
- Two generative models: 1) Variational Auto Encoder (VAE), 2) Generative Adversarial Networks (GAN)
- Some applications you may be interested in

Page 4

What are Generative models?

There are two types of models:
1) Discriminative models, e.g., SVM
2) Generative models, e.g., GMM

Page 5

Basic operation

Literally speaking, a sample can be generated from a generative model.
- Of course, the model needs to be trained in advance to generate the kind of sample you are interested in.

[Figure: each data point has 64 dimensions; the data points are projected into 2 dimensions, e.g., with a GMM.]

Page 6

Basic operation

Literally speaking, a sample can be generated from a generative model.
- Of course, the model needs to be trained in advance to generate the kind of sample you are interested in.

[Figure: a sample drawn in the 2-dimensional latent space is mapped back to the 64-dimensional (8x8) data space, e.g., with a GMM.]

We can generate a new image which corresponds to the sample. It is NOT one of the training data points!
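As a concrete illustration of the operation above, here is a minimal sketch, assuming scikit-learn and matplotlib (neither is named on the slide): PCA stands in for the 64-to-2 projection, a GMM is fitted in the 2-D space, and a sampled point is mapped back to an 8x8 image.

```python
# Minimal sketch of the "basic operation": 64-D digits -> 2-D -> sample -> image.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X = load_digits().data                         # (1797, 64): flattened 8x8 images
pca = PCA(n_components=2).fit(X)               # project the data into 2 dimensions
Z = pca.transform(X)

gmm = GaussianMixture(n_components=10, random_state=0).fit(Z)  # generative model
z_new, _ = gmm.sample(1)                       # a sample drawn from the 2-D space
x_new = pca.inverse_transform(z_new)           # back to the 64-D data space

plt.imshow(x_new.reshape(8, 8), cmap="gray")   # a new image: not a training point
plt.show()
```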

Page 7

Variational AutoEncoder (VAE)

Page 8

Idea of VAE

To build a method which performs the following procedure systematically:

[Figure: each 64-dimensional (8x8) data point is projected into a 2-dimensional latent space; a sample drawn from the latent space generates a new image which corresponds to the sample, and it is NOT one of the training data points.]

Page 9

Idea of VAE

Assume that there is a complex model parameterized with “θ”:
- The model generates data {x(1), x(2), …, x(N)} given a latent variable “z”: pθ(x|z)
- Also, the model maps the data set into the latent space: pθ(z|x)

[Figure: pθ(x(i)|z(i)) maps a sample from the latent space (z) to the data space (x); the reverse mapping pθ(z(i)|x(i)) is intractable.]
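Why exactly is pθ(z|x) intractable? A short sketch of the standard argument (not spelled out on the slide): by Bayes’ rule the posterior requires the evidence pθ(x), which integrates over the entire latent space.

```latex
p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)},
\qquad
p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz
```

When pθ(x|z) is a deep neural network, this integral has no closed form and is too expensive to evaluate directly, which is exactly what the next slide addresses.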

Page 10

Idea of VAE

This intractability is well known; it can be handled with 1) Markov Chain Monte Carlo (MCMC) or 2) Variational Inference (VI).

VAE uses the idea of Variational Inference, which is why the term “Variational” is in the name.


Page 11

Auto Encoder is a neural network which reproduces its input.

[Figure: the Encoder pθ(z(i)|x(i)) maps the data space (x) to the latent space (z); the Decoder pθ(x(i)|z(i)) maps the latent space back to the data space (x).]

Approximate the function “p” using a neural network.

Page 12

Auto Encoder is a neural network which reproduces its input.

[Figure: the same Encoder/Decoder diagram; the intractable pθ(z(i)|x(i)) is approximated by qφ(z(i)|x(i)), assuming qφ(·) is a Gaussian distribution.]

Approximate the function “p” using a neural network.

Page 13

Auto Encoder is a neural network which reproduces its input.

[Figure: the Encoder qφ(z(i)|x(i)) compresses the data space (x) into a 2-dimensional latent space (z); the Decoder pθ(x(i)|z(i)) maps it back to the data space (x).]

Approximate the function “p” using a neural network.
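To make the diagram concrete, here is a minimal sketch of such an autoencoder, assuming PyTorch (the slide does not prescribe a framework); the hidden-layer sizes are illustrative.

```python
# A plain autoencoder: 64-D input -> 2-D code -> 64-D reconstruction.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
        self.decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 64))

    def forward(self, x):
        z = self.encoder(x)          # data space (x) -> latent space (z)
        return self.decoder(z)       # latent space (z) -> data space (x)

model = AutoEncoder()
x = torch.rand(16, 64)               # a dummy batch of flattened 8x8 images
loss = nn.functional.mse_loss(model(x), x)   # "reproduces its input"
loss.backward()
```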

Page 14

VAE model

[Figure: the VAE network for 8x8 images. The encoder, a neural network with parameters φ = {w, b}, maps the 64 input pixels x1 … x64 to the parameters (μ1, μ2, σ1², σ2²) of the latent distribution; the latent code z = (z1, z2) is obtained by sampling; the decoder, a neural network with parameters θ = {w, b}, maps z back to the 64 output pixels.]

qφ(z|x) ~ N(μ, σ²)
pθ(x|z) ~ N(μ, σ²) or Bernoulli
pθ(z) ~ N(0, 1)
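A minimal sketch of this network, again assuming PyTorch. The sampling step uses the reparameterization trick from the Kingma & Welling paper cited on slide 18, z = μ + σ·ε with ε ~ N(0, I), so that backpropagation can pass through the sampling.

```python
# VAE forward pass: encoder -> (mu, log sigma^2) -> sample z -> decoder.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=64, z_dim=2, h_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q_phi(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)    # log variance of q_phi(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())  # Bernoulli

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)               # eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps   # sampling, kept differentiable
        return self.dec(z), mu, logvar
```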

Page 15

How can we train the network to obtain the parameters φ and θ?
- To train the neural network, a loss function is necessary. Then, the parameters φ and θ can be calculated through backpropagation.

A loss function for the AutoEncoder

Let’s derive the loss function from the likelihood function pθ(x):

log pθ(x(1), …, x(N)) = Σi=1..N log pθ(x(i))     (log-likelihood)

log pθ(x(i)) = L(θ, φ; x(i)) + DKL(qφ(z|x(i)) || pθ(z|x(i)))

- The likelihood function gives the probability that the given batch data set occurs under the parameter θ of the neural network.
- The Kullback-Leibler divergence DKL shows how different two posterior distributions are: the true posterior p(z|x) and its approximation q(z|x). This term is intractable because of p(z|x). However, we know DKL ≥ 0.
- Since DKL ≥ 0, “L” is a lower bound of the likelihood function, which is called the “ELBO” (Evidence Lower Bound).

Page 16

Proof: if you are interested…

Using p(x, z) = p(z, x), p(z, x) = p(z|x) p(x), and hence p(z|x) = p(x, z)/p(x):

log pθ(x) = Σz qφ(z|x) log pθ(x)      (since Σz qφ(z|x) = 1)
= Σz qφ(z|x) log [ pθ(x, z) / pθ(z|x) ]
= Σz qφ(z|x) log [ (pθ(x, z) / qφ(z|x)) · (qφ(z|x) / pθ(z|x)) ]
= Σz qφ(z|x) log [ pθ(x, z) / qφ(z|x) ] + Σz qφ(z|x) log [ qφ(z|x) / pθ(z|x) ]
= L(θ, φ; x(i)) + DKL(qφ(z|x(i)) || pθ(z|x(i)))

which, summed over the data set, gives log pθ(x(1), …, x(N)) = Σi=1..N log pθ(x(i)).

Page 17

A loss function for the AutoEncoder

By maximizing “L”, we can maximize the likelihood function as well:

log pθ(x(i)) ≥ L(θ, φ; x(i))      (lower bound of the likelihood function)

L(θ, φ; x(i)) = Σz qφ(z|x) log [ pθ(x, z) / qφ(z|x) ]
= Σz qφ(z|x) log [ pθ(x|z) pθ(z) / qφ(z|x) ]
= Σz qφ(z|x) log [ pθ(z) / qφ(z|x) ] + Σz qφ(z|x) log pθ(x|z)
= −DKL(qφ(z|x(i)) || pθ(z)) + E_qφ(z|x(i))[ log pθ(x(i)|z) ]

Page 18

A loss function for the AutoEncoder

By maximizing “L”, we can maximize the likelihood function as well. In other words, the most likely model (pθ and qφ) which generates the observed data can be obtained by maximizing the function below:

log pθ(x(i)) ≥ −DKL(qφ(z|x(i)) || pθ(z)) + E_qφ(z|x(i))[ log pθ(x(i)|z) ]

With qφ(z|x) ~ N(μ, σ²) and pθ(z) ~ N(0, I), the KL term can be computed in a closed form (J: dimension of z):

−DKL(qφ(z|x(i)) || pθ(z)) = (1/2) Σj=1..J ( 1 + log((σj(i))²) − (μj(i))² − (σj(i))² )

With pθ(x|z) ~ N(μ, σ²) or Bernoulli (D: dimension of x), the expectation term becomes a reconstruction error over the D output dimensions, i.e., min |x − x̂|.

Auto-Encoding Variational Bayes (appendix B: derivation): https://arxiv.org/pdf/1312.6114.pdf
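Combined with the VAE sketch from slide 14, the loss (the negative of “L”) could look as follows, a sketch assuming PyTorch and a Bernoulli decoder; the KL term is exactly the closed form above.

```python
# Negative ELBO = reconstruction error + closed-form KL term.
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar):
    # -E_q[log p_theta(x|z)] for a Bernoulli decoder: binary cross-entropy
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # D_KL(q_phi(z|x) || N(0, I)) = -1/2 * sum_j (1 + log s_j^2 - mu_j^2 - s_j^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl        # minimizing this maximizes L
```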

Page 19

Variational Auto Encoder (VAE): summary

[Figure: data points plotted in the learned 2-D latent space (z1, z2).]

- A generative model based on a neural network (AutoEncoder)
- Its loss function is derived based on the variational inference approach (Variational)
- The loss function calculates the error used to train the Auto Encoder through backpropagation
- That is the reason why it is called “Variational AutoEncoder” (VAE).

Page 20

Generative Adversarial Networks (GAN)

https://arxiv.org/pdf/1701.00160.pdf

Page 21

What is the motivation of GAN instead of VAE?

In VAE, we design a latent space which maps to the data space. Then, a latent variable in that space is used to generate a data sample. However, we are actually interested not in the latent space but in the sample itself. Then, why don’t we generate samples directly, without estimating the latent space?

https://arxiv.org/pdf/1701.00160.pdf

Page 22

How does GAN work?

- GAN: Generative Adversarial Networks
- Based on game theory, to train a system which directly generates a sample
- Adversarial: ‘Because the GAN framework can naturally be analyzed with the tools of game theory, we call GANs “adversarial”.’ - Ian Goodfellow

[Figure: the Generator (a counterfeiter) makes fake money; the Discriminator (the police) judges Real or Fake.]

https://arxiv.org/pdf/1701.00160.pdf

Page 23

Theory: formulation of an optimization problem

min_G max_D V(D, G) = E_x~pdata(x)[ log D(x) ] + E_z~pz(z)[ log(1 − D(G(z))) ]

- First term: expectation that the discriminator (D) tells real is real (D succeeds). The discriminator is trained to maximize it.
- Second term: expectation that D tells a fake is fake. The discriminator is trained to maximize it, while the generator is trained to minimize it, i.e., to make D mistake a fake for real (D fails).

Notation:
- x ~ pdata(x): real data sample
- z ~ pz(z): a random number drawn from N(0, 1)
- G(z): fake data sample
- D(x): probability of the discriminator (D) telling that the given real data “x” is real (D(x) = 1 means “real”)
- D(G(z)): probability of D telling that the given fake data “G(z)” is real (D(G(z)) = 0 means “fake”)
- 1 − D(G(z)): probability of D telling that the given fake data “G(z)” is fake
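One round of this game in code, a minimal sketch assuming PyTorch with illustrative toy networks and data: the discriminator ascends V(D, G), while the generator descends log(1 − D(G(z))), exactly as annotated above.

```python
# One discriminator step and one generator step of the minimax game.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 64), nn.Sigmoid())
D = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

x_real = torch.rand(16, 64)     # stand-in for x ~ p_data(x)
z = torch.randn(16, 2)          # z ~ p_z(z) = N(0, 1)

# Discriminator: maximize log D(x) + log(1 - D(G(z)))  (minimize the negative)
d_loss = -(torch.log(D(x_real)) + torch.log(1 - D(G(z).detach()))).mean()
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator: minimize log(1 - D(G(z)))
g_loss = torch.log(1 - D(G(z))).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice, the original GAN paper notes that training G to maximize log D(G(z)) instead gives stronger gradients early in training; the sketch keeps the minimax form of the slide.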

Page 24

Theory: illustration

min_G max_D V(D, G) = E_x~pdata(x)[ log D(x) ] + E_z~pz(z)[ log(1 − D(G(z))) ]

[Figure: the real data distribution, the fake data distribution, and the discriminator curve over the course of training; the discriminator output flattens at 0.5.]

The generator keeps being trained to generate fakes similar to the real data, so eventually the discriminator cannot tell a fake from a real one => its output probability becomes 0.5.

https://arxiv.org/abs/1406.2661

Page 25

Theory: its global optimal solution pg = pdata

min_G max_D V(D, G) = E_x~pdata(x)[ log D(x) ] + E_z~pz(z)[ log(1 − D(G(z))) ]

For the fixed generator (G), the optimal discriminator value (D*) is

D*_G(x) = pdata(x) / ( pdata(x) + pg(x) )

Derivation:

max_D V(D) = ∫x pdata(x) log(D(x)) dx + ∫z pz(z) log(1 − D(G(z))) dz
= ∫x [ pdata(x) log(D(x)) + pg(x) log(1 − D(x)) ] dx

dV(D)/dD = pdata(x)/D(x) − pg(x)/(1 − D(x)) = 0

⇒ D*(x) = pdata(x) / ( pdata(x) + pg(x) )   (= 0.5 when pg = pdata)

Page 26

Theory: its global optimal solution pg = pdata

min_G max_D V(D, G) = E_x~pdata(x)[ log D(x) ] + E_z~pz(z)[ log(1 − D(G(z))) ]

With the optimal discriminator D*, V(G) becomes

min_G V(G) = E_x~pdata(x)[ log D*(x) ] + E_x~pg(x)[ log(1 − D*(x)) ]
= −log(4) + 2 · JSD(pdata || pg)

JSD: Jensen-Shannon divergence
- A method of measuring the similarity between two probability distributions.
- 0 ≤ JSD(p||q) ≤ 1 (with a base-2 logarithm)
- −log(4) is the minimum value of V(G), attained when JSD = 0 (pg = pdata). (Derivation: see the backup slide.)
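A quick numeric sanity check of this result, a sketch assuming NumPy and two toy discrete distributions (both made up for illustration): V(G) sits above −log(4) whenever pg ≠ pdata and reaches it exactly when pg = pdata.

```python
# Check V(G) = -log(4) + 2 * JSD(p_data || p_g) on toy distributions.
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.5, 0.3, 0.2])
for p_g in (np.array([0.2, 0.3, 0.5]), p_data.copy()):
    v = -np.log(4) + 2 * jsd(p_data, p_g)
    print(v)   # > -log(4) when p_g != p_data; exactly -log(4) when equal
```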

Page 27

Generative Adversarial Networks (GAN): summary

Given the system below, we train it based on the objective function, which is derived from game theory.

- The generator tries to make real-looking fake data to deceive the discriminator.
- The discriminator tries not to be deceived by the generator.

In this manner, the generator learns how to make a sample close to the real data. The theory is about how to define the objective function and whether it converges to an optimum solution.

Page 28

Applications

Page 29

Explosive growth of the popularity of GAN

https://deephunt.in/the-gan-zoo-79597dc8c347

Page 30

High resolution image generation

https://arxiv.org/pdf/1703.10717.pdf (BEGAN)

Page 31

Text to image

https://arxiv.org/pdf/1710.10916.pdf (StackGAN++)

Page 32

Style transfer

https://github.com/junyanz/CycleGAN

Page 33

Image Completion with Deep Learning in TensorFlow

http://bamos.github.io/2016/08/09/deep-completion/

Page 34

Image2Vec

https://arxiv.org/pdf/1511.06434.pdf

[Figure: face images A, B, C, D illustrating vector arithmetic in the learned representation space.]

A→C = B→D
C − A = D − B
C − A + B = D

Page 35

Deep Feature Interpolation

https://arxiv.org/pdf/1611.05507.pdf

Page 36

Backup Slides

Page 37

V(G) when D is fixed

min_G V(G) = E_x~pdata(x)[ log( pdata(x) / (pdata(x) + pg(x)) ) ] + E_x~pg(x)[ log( pg(x) / (pdata(x) + pg(x)) ) ]

Add and subtract log(4):

V(G) = V(G) + log(4) − log(4)
= −log(4) + ∫x pdata(x) ( log D*(x) + log 2 ) dx + ∫x pg(x) ( log(1 − D*(x)) + log 2 ) dx
= −log(4) + ∫x pdata(x) log( pdata(x) / ((pdata(x) + pg(x))/2) ) dx + ∫x pg(x) log( pg(x) / ((pdata(x) + pg(x))/2) ) dx
= −log(4) + KL( pdata || (pdata + pg)/2 ) + KL( pg || (pdata + pg)/2 )
= −log(4) + 2 · JSD(pdata || pg)

using
KL(P||Q) = Σ P log(P/Q)
JSD(P||Q) = (1/2) KL(P||M) + (1/2) KL(Q||M), where M = (P + Q)/2