
Page 1: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel (UC Berkeley, OpenAI)

Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo)

Goal: Unsupervised learning of disentangled representations

Approach: GANs + maximizing the mutual information between generated images and input codes

Benefit: Interpretable representations obtained without supervision or substantial additional cost

Reference: https://arxiv.org/abs/1606.03657 (with Appendix sections)

Implementations:
https://github.com/openai/InfoGAN (by the authors, in TensorFlow)
https://github.com/yoshum/InfoGAN (by the presenter, in Chainer)

NIPS 2016 paper reading group

Page 2: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Motivation: How can we achieve unsupervised learning of disentangled representations?

In general, a learned representation is entangled, i.e., encoded in the data space in a complicated manner.

When a representation is disentangled, it is more interpretable and easier to apply to downstream tasks.

Page 3: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Related works
• Unsupervised learning of representations (no mechanism to force disentanglement): stacked (often denoising) autoencoders, RBMs, and many others, including semi-supervised approaches
• Supervised learning of disentangled representations: bilinear models, multi-view perceptron, VAEs, adversarial autoencoders
• Weakly supervised learning of disentangled representations: disBM, DC-IGN
• Unsupervised learning of disentangled representations: hossRBM, applicable only to discrete latent factors
(works which the presenter has almost no knowledge about)

This work: unsupervised learning of disentangled representations, applicable to both continuous and discrete latent factors

Page 4: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Generative Adversarial Nets (GANs): a generative model trained by competition between two neural nets.

Generator $G(z)$, with noise $z \sim p_z(Z)$, an arbitrary noise distribution.

Discriminator $D(x)$: the probability that $x$ is sampled from the data distribution rather than generated by the generator.

Optimization problem to solve:
$$\min_G \max_D V_{\mathrm{GAN}}(G, D),$$
where
$$V_{\mathrm{GAN}}(G, D) = E_{x \sim p_{\mathrm{data}}(X)}[\ln D(x)] + E_{z \sim p_z(Z)}[\ln(1 - D(G(z)))].$$
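To make the objective concrete, here is a minimal NumPy sketch of a Monte Carlo estimate of $V_{\mathrm{GAN}}$, with a toy one-dimensional generator and a fixed, hand-picked discriminator; all distributions and parameters are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def v_gan(D, x_data, x_gen):
    """Monte Carlo estimate of
    V_GAN(G, D) = E_{x ~ p_data}[ln D(x)] + E_{z ~ p_z}[ln(1 - D(G(z)))]."""
    return np.mean(np.log(D(x_data))) + np.mean(np.log(1.0 - D(x_gen)))

# Toy setup: real data ~ N(2, 1); the generator just shifts Gaussian noise.
x_data = rng.normal(2.0, 1.0, size=10_000)
z = rng.normal(0.0, 1.0, size=10_000)             # arbitrary noise distribution p_z
G = lambda z: z + 0.5                             # toy generator
D = lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0)))    # fixed sigmoid "discriminator"

# D tries to maximize this value; G tries to minimize it.
print(v_gan(D, x_data, G(z)))
```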

Page 5: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Problems with GANs, from the perspective of representation learning: there are no restrictions on how $G$ uses the noise $z$.
• $z$ can be used in a highly entangled way
• Each dimension of $z$ does not represent any salient feature of the training data

[Figure: latent coordinates $z_1$ and $z_2$ with corresponding generated samples $G(z)$]

Page 6: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Proposed Resolution: InfoGAN – Maximizing Mutual Information

Observation in conventional GANs: a generated datum carries little information about the noise $z$ from which it is generated, because of the heavily entangled use of $z$.

Proposed resolution (InfoGAN): train the generator so that it maximizes the mutual information between the latent code $C$ and the generated data $X = G(Z, C)$:

$$\min_G \max_D \{ V_{\mathrm{GAN}}(G, D) - \lambda I(C;\, X = G(Z, C)) \}$$

Page 7: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Mutual information:
$$I(X; Y) = H(X) - H(X \mid Y),$$
where
• $H(X) = -E_{x \sim p(X)}[\ln p(X = x)]$: entropy of the prior distribution
• $H(X \mid Y) = -E_{y \sim p(Y),\, x \sim p(X \mid Y = y)}[\ln p(X = x \mid Y = y)]$: entropy of the posterior distribution

[Figure: sampling from the prior $p(X = x)$ vs. the posterior $p(X = x \mid Y = y)$; when the two coincide, $I(X; Y) = 0$; when observing $Y$ sharpens the distribution of $X$, $I(X; Y) > 0$]
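To make the definition concrete, a small NumPy example that computes $I(X; Y) = H(X) - H(X \mid Y)$ exactly for a toy discrete joint distribution (the numbers are arbitrary):

```python
import numpy as np

# Toy joint distribution p(X = x, Y = y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
p_x = p_xy.sum(axis=1)                     # marginal (prior) p(X)
p_y = p_xy.sum(axis=0)                     # marginal p(Y)

H_x = -np.sum(p_x * np.log(p_x))           # entropy of the prior, H(X)

p_x_given_y = p_xy / p_y                   # column j holds p(X | Y = y_j)
H_x_given_y = -np.sum(p_xy * np.log(p_x_given_y))  # posterior entropy H(X|Y)

print(H_x - H_x_given_y)                   # I(X; Y) >= 0; zero iff X, Y independent
```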

Page 8: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Avoiding an increase in calculation costs. The major difficulty: evaluating $I(C;\, X = G(Z, C))$ requires evaluating, and sampling from, the posterior $p(C \mid X)$.

Two strategies:
• Variational maximization of mutual information: use an approximate function $Q(c, x) \approx p(C = c \mid X = x)$
• Share the neural net between $Q$ and the discriminator $D$

Page 9: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Variational Maximization of MI. For an arbitrary function $Q(c, x)$,
$$
\begin{aligned}
&E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln p(C = c \mid X = x)] \\
&= E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}\!\left[\ln \frac{p(C = c \mid X = x)}{Q(c, x)}\right] \\
&= E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + E_{x \sim p_G(X)}\!\left[D_{\mathrm{KL}}\big(p(C \mid X = x)\,\|\,Q(C, x)\big)\right] \\
&\geq E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] \qquad (\because \text{positivity of the KL divergence})
\end{aligned}
$$
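The inequality is easy to verify numerically: for discrete distributions, the gap between the two sides is exactly the non-negative KL divergence. A minimal check with arbitrary toy numbers:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])    # true posterior p(C | X = x) for one fixed x
q = np.array([0.5, 0.3, 0.2])    # an arbitrary approximation Q(c, x)

lhs = np.sum(p * np.log(p))      # E_{c ~ p}[ln p(C = c | X = x)]
rhs = np.sum(p * np.log(q))      # E_{c ~ p}[ln Q(c, x)]
kl = np.sum(p * np.log(p / q))   # D_KL(p || Q) >= 0

assert np.isclose(lhs, rhs + kl) # the exact decomposition above
assert lhs >= rhs                # the lower bound; tight iff Q = p
print(lhs, rhs, kl)
```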

Page 10: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Variational Maximization of MI

With $Q$ approximating the posterior $p(C \mid X)$, we obtain a variational estimate of the mutual information:
$$L_I(G, Q) \equiv E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + H(C) \leq I(C;\, X = G(Z, C)).$$

Maximizing $L_I(G, Q)$ w.r.t. $Q$ and $G$:
• w.r.t. $Q$ ⇔ achieving the equality by setting $Q(c, x) = p(C = c \mid X = x)$
• w.r.t. $G$ ⇔ maximizing the mutual information

Optimization problem to solve in InfoGAN:
$$\min_{G, Q} \max_D \{ V_{\mathrm{GAN}}(G, D) - \lambda L_I(G, Q) \}$$
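For a categorical code, $\ln Q(c, x)$ is just the log-probability that $Q$ assigns to the code actually fed into the generator, so $L_I$ reduces to a cross-entropy-style term plus a constant. A sketch of the estimator under that assumption (the helper name L_I_estimate and the array shapes are illustrative, not from the paper's code):

```python
import numpy as np

def L_I_estimate(q_probs, c_idx, H_c):
    """Variational MI estimate L_I = E[ln Q(c, x)] + H(C) for a categorical code.

    q_probs: (batch, K) softmax outputs of Q evaluated at x = G(z, c)
    c_idx:   (batch,)   indices of the codes c that were fed into G
    H_c:     entropy of the fixed prior p(C); ln K for a uniform prior
    """
    ln_q = np.log(q_probs[np.arange(len(c_idx)), c_idx])
    return ln_q.mean() + H_c

# Toy usage: K = 10 uniform codes, a Q that puts 0.6 on the correct code.
K = 10
q_probs = np.full((4, K), 0.4 / (K - 1))
c_idx = np.array([0, 3, 5, 9])
q_probs[np.arange(4), c_idx] = 0.6
print(L_I_estimate(q_probs, c_idx, np.log(K)))  # <= H(C) = ln 10
```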

Page 11: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Eliminate sampling from the posterior.

Lemma: $E_{x \sim p(X),\, y \sim p(Y \mid X = x)}[f(x, y)] = E_{x \sim p(X),\, y \sim p(Y \mid X = x),\, x' \sim p(X' \mid Y = y)}[f(x', y)]$.

By using this lemma, and noting that $x = G(z, c)$, we can eliminate the sampling from the posterior:
$$E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] = E_{c \sim p(C),\, z \sim p_z(Z),\, x = G(z, c)}[\ln Q(c, x)].$$

Easy to estimate!

Page 12: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Proof of lemma

Lemma: $E_{x \sim p(X),\, y \sim p(Y \mid X = x)}[f(x, y)] = E_{x \sim p(X),\, y \sim p(Y \mid X = x),\, x' \sim p(X' \mid Y = y)}[f(x', y)]$.

$$
\begin{aligned}
\mathrm{l.h.s.} &= \int_x \int_y p(X = x)\, p(Y = y \mid X = x)\, f(x, y) \\
&= \int_y p(Y = y) \int_x p(X = x \mid Y = y)\, f(x, y) \qquad (\text{Bayes' theorem}) \\
&= \int_x \int_y p(X = x)\, p(Y = y \mid X = x) \int_{x'} p(X' = x' \mid Y = y)\, f(x', y) = \mathrm{r.h.s.},
\end{aligned}
$$
where the last step renames the inner integration variable to $x'$ and reinserts $p(Y = y) = \int_x p(X = x)\, p(Y = y \mid X = x)$.
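The lemma can also be checked numerically by enumerating a small discrete joint distribution; a toy verification (all numbers arbitrary):

```python
import numpy as np

# Toy joint p(X, Y); rows index x, columns index y.
p_xy = np.array([[0.10, 0.30],
                 [0.40, 0.20]])
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y                 # column j holds p(X | Y = y_j)

f = np.array([[1.0, 2.0],
              [3.0, 4.0]])               # an arbitrary function f(x, y)

# l.h.s. = E_{x ~ p(X), y ~ p(Y|X=x)}[f(x, y)] = sum_{x,y} p(x, y) f(x, y)
lhs = np.sum(p_xy * f)

# r.h.s. = E_{x ~ p(X), y ~ p(Y|X=x), x' ~ p(X'|Y=y)}[f(x', y)]
rhs = sum(p_xy[x, y] * p_x_given_y[xp, y] * f[xp, y]
          for x in range(2) for y in range(2) for xp in range(2))

assert np.isclose(lhs, rhs)
print(lhs, rhs)
```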

Page 13: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Sharing layers between $D$ and $Q$. Model $Q(c, x)$ using a neural network, and reduce the calculation costs by sharing all the convolution layers with the discriminator $D$.

Image from Odena, et al., arXiv:1610.09585.

[Figure: the convolution layers of the discriminator are shared; separate output heads produce $D$ and $Q$]

Given DCGANs, InfoGAN comes at negligible additional cost!
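A sketch of this sharing idea in PyTorch (the paper's implementations are in TensorFlow and Chainer; the class name SharedDQ and the layer sizes, chosen for 28×28 single-channel inputs, are illustrative):

```python
import torch
import torch.nn as nn

class SharedDQ(nn.Module):
    """Discriminator D and recognition network Q sharing one convolutional trunk."""
    def __init__(self, n_codes=10):
        super().__init__()
        self.trunk = nn.Sequential(                      # shared convolution layers
            nn.Conv2d(1, 64, 4, stride=2, padding=1),    # 28x28 -> 14x14
            nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 14x14 -> 7x7
            nn.LeakyReLU(0.1),
            nn.Flatten(),
        )
        self.d_head = nn.Linear(128 * 7 * 7, 1)          # real/fake logit for D
        self.q_head = nn.Linear(128 * 7 * 7, n_codes)    # logits of Q(c, x)

    def forward(self, x):
        h = self.trunk(x)                                # computed once per image
        return self.d_head(h), self.q_head(h)

d_logit, q_logits = SharedDQ()(torch.randn(8, 1, 28, 28))
print(d_logit.shape, q_logits.shape)                     # (8, 1) and (8, 10)
```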

Page 14: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Experiment – MI Maximization

• InfoGAN on the MNIST dataset
• Latent code $c$: a 10-class categorical code

$L_I$ quickly saturates to $H(C) = \ln 10 \approx 2.30$ in InfoGAN.

Figure 1 in the original paper
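The saturation value is just the entropy of the code prior; assuming the uniform 10-class categorical prior used in the paper:

```python
import numpy as np

p_c = np.full(10, 0.1)               # uniform categorical prior p(C)
print(-np.sum(p_c * np.log(p_c)))    # H(C) = ln 10 ≈ 2.30
```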

Page 15: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Experiment – Disentangled Representation –

Figure 2 in the original paper

• InfoGAN on the MNIST dataset
• Latent codes: $c_1$, a 10-class categorical code; $c_2$ and $c_3$, continuous codes

$c_1$ can be used as a classifier with a 5% error rate.

$c_2$ and $c_3$ captured the rotation and width of digits, respectively.

Page 16: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Experiment – Disentangled Representation –

Dataset: P. Paysan, et al., AVSS, 2009, pp. 296–301.

Figure 3 in the original paper

Page 17: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Experiment – Disentangled Representation –

Dataset: M. Aubry, et al., CVPR, 2014, pp. 3762–3769.

InfoGAN learned salient features without supervision.

Figure 4 in the original paper

Page 18: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Experiment – Disentangled Representation –

Dataset: Street View House Numbers (SVHN)

Figure 5 in the original paper

Page 19: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Experiment – Disentangled Representation –

Dataset: CelebA

Figure 6 in the original paper

Page 20: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Future Prospects and Conclusion

Future directions:
• Mutual information maximization can be applied to other methods, e.g., VAEs
• Learning hierarchical latent representations
• Improving semi-supervised learning
• High-dimensional data discovery

Goal: Unsupervised learning of disentangled representations

Approach: GANs + maximizing the mutual information between generated images and input codes

Benefit: Interpretable representations obtained without supervision or substantial additional cost