Lecture 20: Deep Generative Models
CS 109B, STAT 121B, AC 209B, CSE 109B
Mark Glickman and Pavlos Protopapas



Page 2:

This lecture is based on the following tutorials:
I. Goodfellow, Generative Adversarial Networks (GANs), NIPS 2016 Tutorial
Shenlong Wang, Deep Generative Models, http://www.cs.toronto.edu/~slwang/generative_model.pdf

Page 3:

Generative Model

•  Density Estimation
•  Sample Generation

Page 4:

Why generative modeling?

•  Modeling high-dimensional probability distributions
•  Denoising damaged samples
•  Simulating future events for planning and RL
•  Imputing missing data
   –  Semi-supervised learning
•  Multi-modal data
   –  Multiple correct answers

Page 5:

Next Video Frame Prediction

Lotter et al., 2016

Page 6:

Single Image Super-Resolution

•  Synthesize a high-resolution equivalent from a low-resolution image

Ledig et al., 2016

Page 7:

Interactive Image Generation

Zhu et al., 2016

Page 8:

Image-to-Image Translation

Isola et al., 2016

Page 9:

Taxonomy of Deep Generative Models

Deep Generative Models
•  Explicit Density
   –  Markov Chain: Boltzmann Machines, Deep Belief Networks
   –  Variational: Variational Auto-encoders (VAE)
•  Implicit Density
   –  Markov Chain: Generative Moment Matching Networks (GMMN)
   –  Direct: Generative Adversarial Networks (GAN)

Page 10:

Outline

•  Explicit Density Models
   –  Boltzmann Machines & Deep Belief Networks (DBN)
   –  Variational Auto-encoders (VAE)
•  Implicit Density Models
   –  Generative Adversarial Networks (GAN)

Page 11:

Boltzmann Machines

x: d-dimensional binary vector. Energy function given by:

E(x) = −x^T U x − b^T x

P(x) = exp(−E(x)) / Z,   where   Z = Σ_x exp(−E(x))
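To make the definitions concrete, here is a minimal NumPy sketch (not from the lecture) that evaluates this energy and, for a small d, computes the exact partition function by enumerating all 2^d states; U, b, and d are arbitrary illustrative choices.

import itertools
import numpy as np

d = 4                                    # small enough to enumerate all 2^d states
rng = np.random.default_rng(0)
U = rng.normal(size=(d, d))
U = (U + U.T) / 2                        # symmetric coupling matrix (illustrative)
b = rng.normal(size=d)                   # bias vector (illustrative)

def energy(x):
    # E(x) = -x^T U x - b^T x
    return -x @ U @ x - b @ x

# Z = sum over all binary vectors x of exp(-E(x))
states = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
Z = sum(np.exp(-energy(x)) for x in states)

x = states[5]
print("P(x) =", np.exp(-energy(x)) / Z)  # normalized probability of one state

For realistic d, this 2^d sum is exactly what makes Z intractable, which is why the later slides turn to MCMC.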

Page 12:

Visible and Hidden Units

v, h: binary vectors of visible and hidden units. Energy function given by:

E(v, h) = −v^T R v − v^T W h − h^T S h − b^T v − c^T h

P(v, h) = exp(−E(v, h)) / Z,   where   Z = Σ_{v,h} exp(−E(v, h))

Page 13:

Training Boltzmann Machines

•  Maximum-likelihood estimation
•  The partition function is intractable
•  It may be estimated with MCMC methods

Page 14:

Restricted Boltzmann Machines

•  No connections among hidden units or among visible units (bipartite graph)

Page 15:

Restricted Boltzmann Machines

v, h: binary vectors. Energy function given by:

E(v, h) = −b^T v − c^T h − v^T W h

P(v, h) = exp(−E(v, h)) / Z,   where   Z = Σ_{v,h} exp(−E(v, h))

Page 16:

Training RBMs

•  Conditional distributions have a simple form:

P(h_j = 1 | v) = σ(c_j + (W^T v)_j)
P(v_j = 1 | h) = σ(b_j + (W h)_j)

where σ is the sigmoid function.

•  Efficient MCMC: block Gibbs sampling
   –  Alternately sample from P(h|v) and P(v|h), as in the sketch below
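A minimal NumPy sketch of block Gibbs sampling in an RBM (my illustration, not course code); the sizes and parameter values are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # weights (illustrative)
b = np.zeros(n_visible)                                # visible biases
c = np.zeros(n_hidden)                                 # hidden biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h_given_v(v):
    # P(h_j = 1 | v) = sigmoid(c_j + (W^T v)_j)
    p = sigmoid(c + v @ W)
    return (rng.random(n_hidden) < p).astype(float)

def sample_v_given_h(h):
    # P(v_j = 1 | h) = sigmoid(b_j + (W h)_j)
    p = sigmoid(b + W @ h)
    return (rng.random(n_visible) < p).astype(float)

# Block Gibbs sampling: alternately resample all hidden units given the
# visible units, then all visible units given the hidden units
v = rng.integers(0, 2, n_visible).astype(float)
for _ in range(100):
    h = sample_h_given_v(v)
    v = sample_v_given_h(h)
print("sampled v:", v)

Because the graph is bipartite, each block is resampled in one vectorized step rather than unit by unit.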

Page 17:

Deep Belief Networks

•  Multiple hidden layers
•  Undirected connections between the last two layers

The introduction of DBNs in 2006 began the current deep learning renaissance.

Page 18:

Deep Belief Networks

[Diagram: layer stack v, h^(1), h^(2), …, h^(K−1), h^(K)]

P(v_i = 1 | h^(1)) = σ(b_i^(1) + w_i^(1) · h^(1))
P(h_i^(1) = 1 | h^(2)) = σ(b_i^(2) + w_i^(2) · h^(2))
…
P(h^(K−1), h^(K)) = exp(−E(h^(K−1), h^(K))) / Z

Page 19:

Layer-wise DBN Training


Train an RBM to maximize:

E_{v~p_data} log p_θ(v)

Page 20:

Layer-wise DBN Training


Train an RBM to maximize:

E_{v~p_data} E_{h^(1)~p(h^(1)|v)} log p_{θ1}(h^(1))

Page 21:

Layer-wise DBN Training


Train an RBM to maximize:

E_{v~p_data} E_{h^(1)~p(h^(1)|v)} ⋯ E_{h^(K)~p(h^(K)|h^(K−1))} log p_{θK}(h^(K))
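A rough, self-contained sketch of this greedy layer-wise procedure (an illustration, not the original algorithm's code): each stage fits an RBM with a simple CD-1 update, then feeds its hidden activations upward as the next stage's training data. All hyperparameters and the toy data are placeholders.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm(data, n_hidden, epochs=10, lr=0.05):
    # Fit one RBM with contrastive divergence (CD-1); returns the
    # parameters and the hidden activations for the next layer.
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        ph0 = sigmoid(c + v0 @ W)                      # P(h|v) on the data
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = sigmoid(b + h0 @ W.T)                     # one reconstruction step
        ph1 = sigmoid(c + v1 @ W)
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(data)
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c, sigmoid(c + data @ W)

data = (rng.random((100, 20)) < 0.5).astype(float)     # toy binary data
x = data
for n_hidden in [16, 12, 8]:                           # sizes of h^(1), h^(2), h^(3)
    W, b, c, x = train_rbm(x, n_hidden)                # next layer trains on x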

Page 22:

Outline

•  Explicit Density Models
   –  Boltzmann Machines & Deep Belief Networks (DBN)
   –  Variational Auto-encoders (VAE)
•  Implicit Density Models
   –  Generative Adversarial Networks (GAN)

Page 23:

Recap: Autoencoder

[Diagram: Encoder → z → Decoder]

Page 24:

Variational Autoencoder

•  Easier to train using gradient-based methods
•  VAE sampling: sample z from P(z), a standard Gaussian, then pass it through the decoder P(x|z), as sketched below
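A minimal PyTorch sketch of this sampling path (the decoder here is an untrained, illustrative stand-in; layer sizes are arbitrary):

import torch
import torch.nn as nn

latent_dim, data_dim = 8, 784

# Illustrative decoder mapping z to Bernoulli parameters for P(x|z)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, data_dim),
    nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # sample z ~ N(0, I), the standard Gaussian prior
x_probs = decoder(z)              # parameters of P(x|z)
x = torch.bernoulli(x_probs)      # sample x from the decoder distribution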

Page 25:

VAE Likelihood

p_θ(x) = ∫ p_θ(x|z) p_θ(z) dz,   z ~ N(0, I)

p_θ(x|z) is a neural network (the decoder).

Difficult to approximate in high dimensions through sampling: for most values of z, p(x|z) is close to 0.

Page 26:

VAE Likelihood

p_θ(x) = ∫ p_θ(x|z) q_φ(z|x) dz

q_φ(z|x): a proposal distribution, computed by another neural network with parameters φ. It is likely to produce values of z for which p(x|z) is non-zero.

Page 27:

VAE Architecture

[Diagram: x → Encoder q_φ(z|x) → mean µ and SD σ → sample z from N(µ, σ) → Decoder p_θ(x|z) → x]

Page 28:

VAE Loss

−E_{z~q_φ(z|x)} [log p_θ(x|z)] + KL(q_φ(z|x) ‖ p_θ(z))

The first term is the reconstruction loss; the second term says the proposal distribution should resemble a Gaussian.


Page 30:

VAE Loss

−E_{z~q_φ(z|x)} [log p_θ(x|z)] + KL(q_φ(z|x) ‖ p_θ(z)) ≥ −log p_θ(x)

Reconstruction loss plus KL term: a variational upper bound on the loss we care about!
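In code this loss is straightforward when q_φ(z|x) is Gaussian, since the KL term has a closed form; a sketch (assuming a Bernoulli decoder and a one-sample Monte Carlo estimate of the expectation):

import torch
import torch.nn.functional as F

def vae_loss(x, x_recon_probs, mu, log_var):
    # Reconstruction term: -E_{z~q}[log p(x|z)], one-sample estimate
    recon = F.binary_cross_entropy(x_recon_probs, x, reduction="sum")
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl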

Page 31:

Training VAE

•  Apply stochastic gradient descent
•  The sampling step is not differentiable
•  Use a re-parameterization trick (sketched below)
   –  Move sampling to an input layer, so that the sampling step is independent of the model
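Under the usual Gaussian assumption the trick is a one-liner: draw ε ~ N(0, I) as an input and write z = µ + σ·ε, so gradients flow through µ and σ. A minimal sketch (names are illustrative):

import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I) drawn outside the model
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

mu = torch.zeros(16, 8, requires_grad=True)
log_var = torch.zeros(16, 8, requires_grad=True)
z = reparameterize(mu, log_var)   # differentiable with respect to mu and log_var
z.sum().backward()                # gradients reach mu and log_var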

Page 32:

Boltzmann Machines vs. VAE

Pros:
•  VAE is easier to train
•  Does not require MCMC sampling
•  Has theoretical backing
•  Applicable to a wider range of models

Cons:
•  Samples from a VAE tend to be blurry

Page 33:

Kingma and Welling, 2014

Page 34:

Outline

•  Explicit Density Models
   –  Boltzmann Machines & Deep Belief Networks (DBN)
   –  Variational Auto-encoders (VAE)
•  Implicit Density Models
   –  Generative Adversarial Networks (GAN)

Page 35:

Generative Adversarial Networks

•  No Markov chains needed
•  No variational approximations needed
•  Often regarded as producing better examples

Page 36:

Two-Player Game

Generator G:
•  Seeks to create samples from p_data
•  Seeks to trick the discriminator into believing that its samples are genuine

Discriminator D:
•  Classifies samples as real or fake
•  Seeks to not get tricked by the generator

Page 37:

GAN Overview

[Diagram: z (a code drawn from a prior distribution) → Generator G → generated sample; generated sample or training sample → Discriminator D → Real or Fake]

Page 38:

Min-max Cost

V(θ_D, θ_G) = E_{x~p_data} log D(x) + E_z log(1 − D(G(z)))

The discriminator seeks to predict 1 on training samples and 0 on samples generated by G. The generator wants D to not distinguish between original and generated samples!

J^(G) = V(θ^(D), θ^(G))        J^(D) = −V(θ^(D), θ^(G))
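These payoffs correspond to binary cross-entropy terms; a PyTorch sketch (my illustration; d_real and d_fake stand for discriminator outputs in (0, 1)):

import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake):
    # J(D) = -V: push D(x) toward 1 on real samples, 0 on generated ones
    real_term = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return real_term + fake_term

def generator_loss(d_fake):
    # J(G) = V: the generator minimizes E_z log(1 - D(G(z)))
    return torch.log(1.0 - d_fake).mean()

In practice the generator often minimizes -log D(G(z)) instead, because log(1 - D(G(z))) saturates when the discriminator confidently rejects early samples.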

Page 39:

Min-max Cost

min_{θ_G} max_{θ_D} V(θ_G, θ_D)

•  The equilibrium is a saddle point
•  Training is difficult in practice

Page 40:

Training GANs

•  Sample a mini-batch of training images x and generator codes z
•  Update G using back-prop
•  Update D using back-prop

Optional: run k steps of one player for every step of the other player. A minimal loop is sketched below.
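An end-to-end training-loop sketch along these lines (illustrative only: the networks, data, and hyperparameters are toy stand-ins, and the generator uses the common non-saturating loss):

import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 64, 32

# Toy stand-in networks
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_data = torch.randn(512, data_dim)   # toy stand-in for training images

for step in range(100):
    # Update D: predict 1 on real samples, 0 on generated samples
    x = real_data[torch.randint(0, len(real_data), (batch,))]
    z = torch.randn(batch, latent_dim)              # codes from the prior
    fake = G(z).detach()                            # do not backprop into G here
    loss_D = bce(D(x), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Update G: non-saturating heuristic, minimize -log D(G(z))
    z = torch.randn(batch, latent_dim)
    loss_G = bce(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()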

Page 41:

Radford et al., 2015