Lecture 20: Deep Generative Models
CS 109B, STAT 121B, AC 209B, CSE 109B
Mark Glickman and Pavlos Protopapas



Page 2:

This lecture is based on the following tutorials:
I. Goodfellow, Generative Adversarial Networks (GANs), NIPS 2016 Tutorial
Shenlong Wang, Deep Generative Models, http://www.cs.toronto.edu/~slwang/generative_model.pdf

Page 3:

Generative Model

•  Density Estimation
•  Sample Generation

Page 4:

Why generative modeling?

•  Modeling high-dimensional probability distributions
•  Denoising damaged samples
•  Simulating future events for planning and RL
•  Imputing missing data
   –  Semi-supervised learning
•  Multi-modal data
   –  Multiple correct answers

Page 5:

Next Video Frame Prediction

Lotter et al., 2016

Page 6:

Single Image Super-Resolution

•  Synthesize a high-resolution equivalent from a low-resolution image

Ledig et al., 2016

Page 7:

Interactive Image Generation

Zhu et al., 2016

Page 8:

Image-to-Image Translation

Isola et al., 2016

Page 9:

Taxonomy of Deep Generative Models

Deep Generative Models
•  Explicit Density
   –  Markov Chain: Boltzmann Machines, Deep Belief Networks
   –  Variational: Variational Auto-encoders (VAE)
•  Implicit Density
   –  Markov Chain: Generative Moment Matching Networks (GMMN)
   –  Direct: Generative Adversarial Networks (GAN)

Page 10:

Outline

•  Explicit Density Models
   –  Boltzmann Machines & Deep Belief Networks (DBN)
   –  Variational Auto-encoders (VAE)
•  Implicit Density Models
   –  Generative Adversarial Networks (GAN)

Page 11:

Boltzmann Machines

x: d-dimensional binary vector. Energy function given by:

E(x) = −x^T U x − b^T x

P(x) = exp(−E(x)) / Z,   where   Z = Σ_x exp(−E(x))
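To make the definitions concrete, here is a minimal NumPy sketch (not from the lecture) that evaluates this energy and, for a small d, computes the exact partition function by enumerating all 2^d states; U, b, and d are arbitrary illustrative choices.

import itertools
import numpy as np

d = 4                                    # small enough to enumerate all 2^d states
rng = np.random.default_rng(0)
U = rng.normal(size=(d, d))
U = (U + U.T) / 2                        # symmetric coupling matrix (illustrative)
b = rng.normal(size=d)                   # bias vector (illustrative)

def energy(x):
    # E(x) = -x^T U x - b^T x
    return -x @ U @ x - b @ x

# Z = sum over all binary vectors x of exp(-E(x))
states = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
Z = sum(np.exp(-energy(x)) for x in states)

x = states[5]
print("P(x) =", np.exp(-energy(x)) / Z)  # normalized probability of one state

For realistic d, this 2^d sum is exactly what makes Z intractable, which is why the later slides turn to MCMC.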

Page 12:

Visible and Hidden Units

v, h: binary vectors of visible and hidden units. Energy function given by:

E(v, h) = −v^T R v − v^T W h − h^T S h − b^T v − c^T h

P(v, h) = exp(−E(v, h)) / Z,   where   Z = Σ_{v,h} exp(−E(v, h))

Page 13:

Training Boltzmann Machines

•  Maximum-likelihood estimation
•  The partition function is intractable
•  It may be estimated with MCMC methods

Page 14:

Restricted Boltzmann Machines

•  No connections among hidden units or among visible units (bipartite graph)

Page 15:

Restricted Boltzmann Machines

v, h: binary vectors. Energy function given by:

E(v, h) = −b^T v − c^T h − v^T W h

P(v, h) = exp(−E(v, h)) / Z,   where   Z = Σ_{v,h} exp(−E(v, h))

Page 16:

Training RBMs

•  Conditional distributions have a simple form:

P(h_j = 1 | v) = σ(c_j + (W^T v)_j)
P(v_j = 1 | h) = σ(b_j + (W h)_j)

where σ is the sigmoid function.

•  Efficient MCMC: block Gibbs sampling
   –  Alternately sample from P(h|v) and P(v|h), as in the sketch below
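A minimal NumPy sketch of block Gibbs sampling in an RBM (my illustration, not course code); the sizes and parameter values are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # weights (illustrative)
b = np.zeros(n_visible)                                # visible biases
c = np.zeros(n_hidden)                                 # hidden biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h_given_v(v):
    # P(h_j = 1 | v) = sigmoid(c_j + (W^T v)_j)
    p = sigmoid(c + v @ W)
    return (rng.random(n_hidden) < p).astype(float)

def sample_v_given_h(h):
    # P(v_j = 1 | h) = sigmoid(b_j + (W h)_j)
    p = sigmoid(b + W @ h)
    return (rng.random(n_visible) < p).astype(float)

# Block Gibbs sampling: alternately resample all hidden units given the
# visible units, then all visible units given the hidden units
v = rng.integers(0, 2, n_visible).astype(float)
for _ in range(100):
    h = sample_h_given_v(v)
    v = sample_v_given_h(h)
print("sampled v:", v)

Because the graph is bipartite, each block is resampled in one vectorized step rather than unit by unit.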

Page 17:

Deep Belief Networks

•  Multiple hidden layers
•  Undirected connections between the last two layers

The introduction of DBNs in 2006 began the current deep learning renaissance.

Page 18:

Deep Belief Networks

[Diagram: layer stack v, h^(1), h^(2), …, h^(K−1), h^(K)]

P(v_i = 1 | h^(1)) = σ(b_i^(1) + w_i^(1) · h^(1))
P(h_i^(1) = 1 | h^(2)) = σ(b_i^(2) + w_i^(2) · h^(2))
…
P(h^(K−1), h^(K)) = exp(−E(h^(K−1), h^(K))) / Z

Page 19:

Layer-wise DBN Training


Train an RBM to maximize:

E_{v~p_data} log p_θ(v)

Page 20:

Layer-wise DBN Training


Train an RBM to maximize:

E_{v~p_data} E_{h^(1)~p(h^(1)|v)} log p_{θ1}(h^(1))

Page 21:

Layer-wise DBN Training


Train an RBM to maximize:

E_{v~p_data} E_{h^(1)~p(h^(1)|v)} ⋯ E_{h^(K)~p(h^(K)|h^(K−1))} log p_{θK}(h^(K))
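A rough, self-contained sketch of this greedy layer-wise procedure (an illustration, not the original algorithm's code): each stage fits an RBM with a simple CD-1 update, then feeds its hidden activations upward as the next stage's training data. All hyperparameters and the toy data are placeholders.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm(data, n_hidden, epochs=10, lr=0.05):
    # Fit one RBM with contrastive divergence (CD-1); returns the
    # parameters and the hidden activations for the next layer.
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        ph0 = sigmoid(c + v0 @ W)                      # P(h|v) on the data
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = sigmoid(b + h0 @ W.T)                     # one reconstruction step
        ph1 = sigmoid(c + v1 @ W)
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(data)
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c, sigmoid(c + data @ W)

data = (rng.random((100, 20)) < 0.5).astype(float)     # toy binary data
x = data
for n_hidden in [16, 12, 8]:                           # sizes of h^(1), h^(2), h^(3)
    W, b, c, x = train_rbm(x, n_hidden)                # next layer trains on x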

Page 22:

Outline

•  Explicit Density Models
   –  Boltzmann Machines & Deep Belief Networks (DBN)
   –  Variational Auto-encoders (VAE)
•  Implicit Density Models
   –  Generative Adversarial Networks (GAN)

Page 23:

Recap: Autoencoder

[Diagram: Encoder → z → Decoder]

Page 24:

Variational Autoencoder

•  Easier to train using gradient-based methods
•  VAE sampling: sample z from P(z), a standard Gaussian, then pass it through the decoder P(x|z), as sketched below
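A minimal PyTorch sketch of this sampling path (the decoder here is an untrained, illustrative stand-in; layer sizes are arbitrary):

import torch
import torch.nn as nn

latent_dim, data_dim = 8, 784

# Illustrative decoder mapping z to Bernoulli parameters for P(x|z)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, data_dim),
    nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # sample z ~ N(0, I), the standard Gaussian prior
x_probs = decoder(z)              # parameters of P(x|z)
x = torch.bernoulli(x_probs)      # sample x from the decoder distribution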

Page 25:

VAE Likelihood

p_θ(x) = ∫ p_θ(x|z) p_θ(z) dz,   z ~ N(0, I)

p_θ(x|z) is a neural network (the decoder).

Difficult to approximate in high dimensions through sampling: for most values of z, p(x|z) is close to 0.

Page 26:

VAE Likelihood

p_θ(x) = ∫ p_θ(x|z) q_φ(z|x) dz

q_φ(z|x): a proposal distribution, computed by another neural network with parameters φ. It is likely to produce values of z for which p(x|z) is non-zero.

Page 27:

VAE Architecture

[Diagram: x → Encoder q_φ(z|x) → mean µ and SD σ → sample z from N(µ, σ) → Decoder p_θ(x|z) → x]

Page 28:

VAE Loss

−E_{z~q_φ(z|x)} [log p_θ(x|z)] + KL(q_φ(z|x) ‖ p_θ(z))

The first term is the reconstruction loss; the second term says the proposal distribution should resemble a Gaussian.


Page 30:

VAE Loss

−E_{z~q_φ(z|x)} [log p_θ(x|z)] + KL(q_φ(z|x) ‖ p_θ(z)) ≥ −log p_θ(x)

Reconstruction loss plus KL term: a variational upper bound on the loss we care about!
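In code this loss is straightforward when q_φ(z|x) is Gaussian, since the KL term has a closed form; a sketch (assuming a Bernoulli decoder and a one-sample Monte Carlo estimate of the expectation):

import torch
import torch.nn.functional as F

def vae_loss(x, x_recon_probs, mu, log_var):
    # Reconstruction term: -E_{z~q}[log p(x|z)], one-sample estimate
    recon = F.binary_cross_entropy(x_recon_probs, x, reduction="sum")
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl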

Page 31:

Training VAE

•  Apply stochastic gradient descent
•  The sampling step is not differentiable
•  Use a re-parameterization trick (sketched below)
   –  Move sampling to an input layer, so that the sampling step is independent of the model
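Under the usual Gaussian assumption the trick is a one-liner: draw ε ~ N(0, I) as an input and write z = µ + σ·ε, so gradients flow through µ and σ. A minimal sketch (names are illustrative):

import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I) drawn outside the model
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

mu = torch.zeros(16, 8, requires_grad=True)
log_var = torch.zeros(16, 8, requires_grad=True)
z = reparameterize(mu, log_var)   # differentiable with respect to mu and log_var
z.sum().backward()                # gradients reach mu and log_var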

Page 32:

Boltzmann Machines vs. VAE

Pros:
•  VAE is easier to train
•  Does not require MCMC sampling
•  Has theoretical backing
•  Applicable to a wider range of models

Cons:
•  Samples from a VAE tend to be blurry

Page 33:

Kingma and Welling, 2014

Page 34:

Outline

•  Explicit Density Models
   –  Boltzmann Machines & Deep Belief Networks (DBN)
   –  Variational Auto-encoders (VAE)
•  Implicit Density Models
   –  Generative Adversarial Networks (GAN)

Page 35:

Generative Adversarial Networks

•  No Markov chains needed
•  No variational approximations needed
•  Often regarded as producing better examples

Page 36:

Two-Player Game

Generator G:
•  Seeks to create samples from p_data
•  Seeks to trick the discriminator into believing that its samples are genuine

Discriminator D:
•  Classifies samples as real or fake
•  Seeks to not get tricked by the generator

Page 37:

GAN Overview

[Diagram: z (a code drawn from a prior distribution) → Generator G → generated sample; generated sample or training sample → Discriminator D → Real or Fake]

Page 38:

Min-max Cost

V(θ_D, θ_G) = E_{x~p_data} log D(x) + E_z log(1 − D(G(z)))

The discriminator seeks to predict 1 on training samples and 0 on samples generated by G. The generator wants D to not distinguish between original and generated samples!

J^(G) = V(θ^(D), θ^(G))        J^(D) = −V(θ^(D), θ^(G))
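These payoffs correspond to binary cross-entropy terms; a PyTorch sketch (my illustration; d_real and d_fake stand for discriminator outputs in (0, 1)):

import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake):
    # J(D) = -V: push D(x) toward 1 on real samples, 0 on generated ones
    real_term = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return real_term + fake_term

def generator_loss(d_fake):
    # J(G) = V: the generator minimizes E_z log(1 - D(G(z)))
    return torch.log(1.0 - d_fake).mean()

In practice the generator often minimizes -log D(G(z)) instead, because log(1 - D(G(z))) saturates when the discriminator confidently rejects early samples.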

Page 39:

Min-max Cost

min_{θ_G} max_{θ_D} V(θ_G, θ_D)

•  The equilibrium is a saddle point
•  Training is difficult in practice

Page 40:

Training GANs

•  Sample a mini-batch of training images x and generator codes z
•  Update G using back-prop
•  Update D using back-prop

Optional: run k steps of one player for every step of the other player. A minimal loop is sketched below.
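An end-to-end training-loop sketch along these lines (illustrative only: the networks, data, and hyperparameters are toy stand-ins, and the generator uses the common non-saturating loss):

import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 64, 32

# Toy stand-in networks
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_data = torch.randn(512, data_dim)   # toy stand-in for training images

for step in range(100):
    # Update D: predict 1 on real samples, 0 on generated samples
    x = real_data[torch.randint(0, len(real_data), (batch,))]
    z = torch.randn(batch, latent_dim)              # codes from the prior
    fake = G(z).detach()                            # do not backprop into G here
    loss_D = bce(D(x), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Update G: non-saturating heuristic, minimize -log D(G(z))
    z = torch.randn(batch, latent_dim)
    loss_G = bce(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()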

Page 41:

Radford et al., 2015