InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets


InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel (UC Berkeley, OpenAI)

Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo)

Goal: Unsupervised learning of disentangled representations

Approach: GANs + maximizing mutual information between generated images and input codes

Benefit: Interpretable representations obtained without supervision or substantial additional cost

Reference: https://arxiv.org/abs/1606.03657 (with appendix sections)
Implementations:
https://github.com/openai/InfoGAN (by the authors, in TensorFlow)
https://github.com/yoshum/InfoGAN (by the presenter, in Chainer)

NIPS 2016 paper reading group

Motivation: How can we achieve unsupervised learning of disentangled representations?

In general, a learned representation is entangled, i.e., encoded in the data space in a complicated manner.

When a representation is disentangled, it is more interpretable and easier to apply to downstream tasks.

Related works
• Unsupervised learning of representation (no mechanism to force disentanglement): stacked (often denoising) autoencoders, RBMs; many others, including semi-supervised approaches
• Supervised learning of disentangled representation: bilinear models, multi-view perceptron, VAEs, adversarial autoencoders
• Weakly supervised learning of disentangled representation: disBM, DC-IGN
• Unsupervised learning of disentangled representation: hossRBM, applicable only to discrete latent factors
(areas which the presenter has almost no knowledge about)

This work: unsupervised learning of disentangled representations, applicable to both continuous and discrete latent factors

Generative Adversarial Nets (GANs)
A generative model trained by competition between two neural nets:
Generator $G(z)$, where $z \sim p_z(Z)$ is an arbitrary noise distribution
Discriminator $D(x)$: the probability that $x$ is sampled from the data distribution rather than generated by the generator
Optimization problem to solve: $\min_G \max_D V_{\mathrm{GAN}}(G, D)$, where
$V_{\mathrm{GAN}}(G, D) = \mathbb{E}_{x \sim p_{\mathrm{data}}(X)}[\ln D(x)] + \mathbb{E}_{z \sim p_z(Z)}[\ln(1 - D(G(z)))]$
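As a rough sketch of how this value function is typically estimated on minibatches (my own illustration in PyTorch, not the authors' TensorFlow code; D and G are assumed to be callables, with D outputting probabilities in (0, 1)):

import torch

def gan_value(D, G, real_x, z):
    """Minibatch estimate of V_GAN(G, D) =
    E_{x~p_data}[ln D(x)] + E_{z~p_z}[ln(1 - D(G(z)))]."""
    eps = 1e-8                        # avoid log(0)
    d_real = D(real_x)                # D(x): probability that x is real
    d_fake = D(G(z))                  # D(G(z)): probability that a generated sample is judged real
    return torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean()

The discriminator ascends this value while the generator descends it; in practice the non-saturating generator loss is often used instead, but the estimator above matches the formula as written.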

Problems with GANs
From the perspective of representation learning, there are no restrictions on how $G$ uses $z$:
• $z$ can be used in a highly entangled way
• each dimension of $z$ need not represent any salient feature of the training data

[Figure: noise samples $(z_1, z_2)$ and the corresponding generated images $G(z)$, illustrating the entangled use of the noise]

Proposed Resolution: InfoGAN – Maximizing Mutual Information –
Observation in conventional GANs: a generated sample $x = G(z)$ does not carry much information about the noise $z$ from which it is generated, because of the heavily entangled use of $z$.
Proposed resolution = InfoGAN: the generator is trained so that it maximizes the mutual information between the latent code $C$ and the generated data $X = G(Z, C)$:
$\min_G \max_D \{ V_{\mathrm{GAN}}(G, D) - \lambda I(C; X = G(Z, C)) \}$

Mutual Information
$I(X; Y) = H(X) - H(X \mid Y)$, where
• $H(X) = -\mathbb{E}_{x \sim p(X)}[\ln p(X = x)]$: entropy of the prior distribution
• $H(X \mid Y) = -\mathbb{E}_{y \sim p(Y),\, x \sim p(X \mid Y = y)}[\ln p(X = x \mid Y = y)]$: entropy of the posterior distribution
[Figure: prior $p(X = x)$ versus posterior $p(X = x \mid Y = y)$ after sampling $y$; when the posterior equals the prior, $I(X; Y) = 0$, and when observing $Y$ sharpens the distribution over $X$, $I(X; Y) > 0$]
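As a concrete toy illustration of these definitions (my own example, not from the slides), the mutual information of a small discrete joint distribution can be computed directly from $I(X; Y) = H(X) - H(X \mid Y)$:

import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = H(X) - H(X|Y) for a discrete joint distribution p_xy[x, y]."""
    p_x = p_xy.sum(axis=1)                       # marginal (prior) p(X)
    p_y = p_xy.sum(axis=0)                       # marginal p(Y)
    h_x = -np.sum(p_x * np.log(p_x))             # entropy of the prior
    p_x_given_y = p_xy / p_y                     # columns are posteriors p(X | Y=y)
    mask = p_xy > 0                              # treat 0 * log 0 as 0
    h_x_given_y = -np.sum(p_xy[mask] * np.log(p_x_given_y[mask]))
    return h_x - h_x_given_y

# Independent variables: I = 0.  Perfectly correlated variables: I = H(X) = ln 2.
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))  # ~ 0.0
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))      # ~ 0.693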

Avoiding an increase in computational cost
Major difficulty: evaluating $I(C; X)$ requires evaluating and sampling from the posterior $p(C \mid X)$.
Two strategies:
• Variational maximization of mutual information: use an approximate function $Q(c, x) \approx p(C = c \mid X = x)$
• Share the neural net between $Q$ and the discriminator $D$

Variational Maximization of MI
For an arbitrary function $Q(c, x)$,
$\mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln p(C = c \mid X = x)]$
$= \mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + \mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}\!\left[\ln \frac{p(C = c \mid X = x)}{Q(c, x)}\right]$
$= \mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + \mathbb{E}_{x \sim p_G(X)}\left[D_{\mathrm{KL}}\big(p(C \mid X = x) \,\|\, Q(C, x)\big)\right]$
$\geq \mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)]$ (by the positivity of the KL divergence)

Variational Maximization of MI
With $Q(c, x)$ approximating $p(C = c \mid X = x)$, we obtain a variational estimate of the mutual information:
$L_I(G, Q) \equiv \mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + H(C)$
Maximizing $L_I(G, Q)$ w.r.t. $G$ and $Q$ is equivalent to
• achieving the equality $L_I(G, Q) = I(C; X)$ by setting $Q(c, x) = p(C = c \mid X = x)$, and
• maximizing the mutual information.

Optimization problem to solve in InfoGAN:
$\min_{G, Q} \max_D \{ V_{\mathrm{GAN}}(G, D) - \lambda L_I(G, Q) \}$
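One common way to turn this minimax objective into per-network losses is sketched below (PyTorch, logit-valued discriminator outputs, and a log_q_c tensor holding $\ln Q(c, x)$ evaluated at the sampled codes are all assumptions of mine; this is an illustration, not necessarily the authors' exact implementation):

import torch
import torch.nn.functional as F

def infogan_losses(d_real_logit, d_fake_logit, log_q_c, lam=1.0):
    """Split min_{G,Q} max_D { V_GAN(G,D) - lambda * L_I(G,Q) } into two losses."""
    # Discriminator: push D(x) -> 1 on real data, D(G(z,c)) -> 0 on generated data
    d_loss = F.binary_cross_entropy_with_logits(d_real_logit, torch.ones_like(d_real_logit)) \
           + F.binary_cross_entropy_with_logits(d_fake_logit, torch.zeros_like(d_fake_logit))
    # Generator and Q: fool D (non-saturating GAN term) and maximize E[ln Q(c, x)];
    # H(C) is constant w.r.t. G and Q, so it is dropped from the loss.
    gq_loss = F.binary_cross_entropy_with_logits(d_fake_logit, torch.ones_like(d_fake_logit)) \
            - lam * log_q_c.mean()
    return d_loss, gq_loss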

Eliminating sampling from the posterior
Lemma: $\mathbb{E}_{x \sim p(X),\, y \sim p(Y \mid X = x)}[f(x, y)] = \mathbb{E}_{x \sim p(X),\, y \sim p(Y \mid X = x),\, x' \sim p(X' \mid Y = y)}[f(x', y)]$.
By using this lemma, and noting that sampling $c \sim p(C)$, $z \sim p_z(Z)$ and setting $x = G(z, c)$ yields the same joint distribution of $(c, x)$, we can eliminate the sampling from the posterior $p(C \mid X = x)$:
$\mathbb{E}_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] = \mathbb{E}_{c \sim p(C),\, z \sim p_z(Z),\, x = G(z, c)}[\ln Q(c, x)]$
Easy to estimate!
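A minimal sketch of this Monte Carlo estimate (assuming a hypothetical generator G(z, c), an auxiliary network Q that returns log-probabilities over a uniform 10-class categorical code, and PyTorch rather than the authors' TensorFlow/Chainer code):

import math
import torch

def estimate_L_I(G, Q, batch_size=64, n_classes=10, z_dim=62):
    """Monte Carlo estimate of L_I = E_{c~p(C), z~p_z(Z), x=G(z,c)}[ln Q(c, x)] + H(C)."""
    c = torch.randint(n_classes, (batch_size,))      # c ~ p(C), uniform categorical prior
    z = torch.randn(batch_size, z_dim)               # z ~ p_z(Z), Gaussian noise
    x = G(z, c)                                      # x = G(z, c)
    log_q = Q(x)                                     # assumed shape (batch, n_classes): log Q(. , x)
    log_q_c = log_q[torch.arange(batch_size), c]     # pick ln Q(c, x) at the sampled c
    H_C = math.log(n_classes)                        # entropy of the uniform categorical prior
    return log_q_c.mean() + H_C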

Proof of lemma
Lemma: $\mathbb{E}_{x \sim p(X),\, y \sim p(Y \mid X = x)}[f(x, y)] = \mathbb{E}_{x \sim p(X),\, y \sim p(Y \mid X = x),\, x' \sim p(X' \mid Y = y)}[f(x', y)]$.
Proof: $\mathrm{l.h.s.} = \int_x \int_y p(X = x)\, p(Y = y \mid X = x)\, f(x, y)$, then apply Bayes' theorem.
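The remaining steps of the proof are not in the transcript; one way the derivation presumably continues:

$$
\begin{aligned}
\mathrm{l.h.s.} &= \int_x \int_y p(X = x)\, p(Y = y \mid X = x)\, f(x, y) \\
&= \int_x \int_y p(Y = y)\, p(X = x \mid Y = y)\, f(x, y) && \text{(Bayes' theorem)} \\
&= \int_y \int_{x'} p(Y = y)\, p(X = x' \mid Y = y)\, f(x', y) && \text{(relabel } x \to x'\text{)} \\
&= \int_x \int_y \int_{x'} p(X = x)\, p(Y = y \mid X = x)\, p(X = x' \mid Y = y)\, f(x', y) && \Big(\textstyle\int_x p(X = x)\, p(Y = y \mid X = x) = p(Y = y)\Big) \\
&= \mathrm{r.h.s.}
\end{aligned}
$$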

Sharing layers between $Q$ and $D$
Model $Q(c, x)$ using a neural network. Reduce the computational cost by sharing all the convolution layers of the discriminator $D$ with $Q$.
[Figure: network with the discriminator's convolution layers shared between the $D$ head and the $Q$ head; image from Odena et al., arXiv:1610.09585]
Given DCGANs, InfoGAN comes at negligible additional cost!
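A minimal sketch of this weight sharing (hypothetical layer sizes for 28x28 grayscale inputs, written in PyTorch; the paper's DCGAN-based architecture differs in detail): the D head and the Q head both read the features produced by one shared convolutional trunk.

import torch
import torch.nn as nn

class SharedDQ(nn.Module):
    """Discriminator D and auxiliary distribution Q sharing all convolution layers."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(                       # shared convolution layers
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 1024), nn.LeakyReLU(0.1),
        )
        self.d_head = nn.Linear(1024, 1)                  # D: real/fake logit
        self.q_head = nn.Linear(1024, n_classes)          # Q: logits over the categorical code c

    def forward(self, x):
        h = self.trunk(x)
        return self.d_head(h), self.q_head(h)

Relative to a plain DCGAN discriminator, only q_head is new, which is why the additional cost is negligible.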

Experiment – MI Maximization

• InfoGAN on the MNIST dataset
• Latent code $c$: a 10-class categorical code
• $L_I$ quickly saturates to its maximum $H(c)$ in InfoGAN

Figure 1 in the original paper

Experiment – Disentangled Representation –

Figure 2 in the original paper

• InfoGAN on the MNIST dataset
• Latent codes: $c_1$, a 10-class categorical code; $c_2$ and $c_3$, continuous codes
• $c_1$ can be used as a classifier with a 5% error rate.
• $c_2$ and $c_3$ captured the rotation and width of the digits, respectively.
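A sketch of how such a disentangled code is typically inspected (hypothetical helper and latent layout: noise z plus a one-hot categorical code c1 and two continuous codes c2, c3, concatenated into one input vector for G): fix z, c1, and c3, sweep c2, and generate a row of images.

import torch

def sweep_continuous_code(G, z_dim=62, n_classes=10, digit=3, steps=7):
    """Generate a row of samples with the same z and class, varying one continuous code."""
    z = torch.randn(1, z_dim).repeat(steps, 1)               # same noise in every column
    c1 = torch.zeros(steps, n_classes); c1[:, digit] = 1.0   # fixed one-hot categorical code
    c2 = torch.linspace(-2.0, 2.0, steps).unsqueeze(1)       # swept continuous code (e.g. rotation)
    c3 = torch.zeros(steps, 1)                               # other continuous code held fixed
    latent = torch.cat([z, c1, c2, c3], dim=1)
    return G(latent)                                         # assumed: G takes the concatenated latent vector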

Experiment – Disentangled Representation –

Dataset: P. Paysan, et al., AVSS, 2009, pp. 296–301.

Figure 3 in the original paper

Experiment – Disentangled Representation –

Dataset: M. Aubry, et al., CVPR, 2014, pp. 3762–3769.

InfoGAN learned salient features without supervision.
Figure 4 in the original paper

Experiment – Disentangled Representation –

Dataset: Street View House Numbers (SVHN)

Figure 5 in the original paper

Experiment – Disentangled Representation –
Dataset: CelebA

Figure 6 in the original paper

Future Prospects and Conclusion
• Mutual information maximization can be applied to other methods, e.g., VAEs
• Learning hierarchical latent representations
• Improving semi-supervised learning
• High-dimensional data discovery

Goal: Unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information between generated images and input codes
Benefit: Interpretable representations obtained without supervision or substantial additional cost
