InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel (UC Berkeley, OpenAI)
Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo)
Goal: Unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information between generated images and input codes
Benefit: Interpretable representations obtained without supervision or substantial additional cost
Reference: https://arxiv.org/abs/1606.03657 (with Appendix sections)
Implementations:
• https://github.com/openai/InfoGAN (by the authors, with TensorFlow)
• https://github.com/yoshum/InfoGAN (by the presenter, with Chainer)
NIPS 2016 Reading Group
Motivation
How can we achieve unsupervised learning of disentangled representations?
In general, a learned representation is entangled, i.e., encoded in the data space in a complicated manner.
When a representation is disentangled, it is more interpretable and easier to apply to downstream tasks.
Related works
• Unsupervised learning of representation (no mechanism to force disentanglement): stacked (often denoising) autoencoders, RBMs; many others, including semi-supervised approaches
• Supervised learning of disentangled representations: bilinear models, multi-view perceptron; VAEs, adversarial autoencoders
• Weakly supervised learning of disentangled representations: disBM, DC-IGN
• Unsupervised learning of disentangled representations: hossRBM, applicable only to discrete latent factors (a line of work the presenter has almost no knowledge about)
This work: unsupervised learning of disentangled representations, applicable to both continuous and discrete latent factors
Generative Adversarial Nets (GANs)
Generative model trained by competition between two neural nets:
• Generator $G(z)$, $z \sim p_z(Z)$: an arbitrary noise distribution
• Discriminator $D(x)$: the probability that $x$ is sampled from the data distribution rather than generated by the generator
Optimization problem to solve:
$$\min_G \max_D V_\mathrm{GAN}(G, D),$$
where
$$V_\mathrm{GAN}(G, D) = E_{x \sim p_\mathrm{data}(X)}[\ln D(x)] + E_{z \sim p_z(Z)}[\ln(1 - D(G(z)))].$$
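To make the objective concrete, here is a minimal sketch (not from the paper or the authors' code) that Monte Carlo estimates $V_\mathrm{GAN}$ with toy, hypothetical choices of $G$, $D$, the data distribution, and the noise distribution:

```python
# A minimal sketch: Monte Carlo estimate of V_GAN(G, D) with toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    # Hypothetical discriminator: probability that x comes from the data.
    return 1.0 / (1.0 + np.exp(-x))

def G(z):
    # Hypothetical generator mapping noise to "data" space.
    return 2.0 * z - 1.0

x_data = rng.normal(loc=1.0, scale=0.5, size=10_000)   # x ~ p_data
z = rng.uniform(0.0, 1.0, size=10_000)                 # z ~ p_z (arbitrary noise)

# V_GAN = E_{x~p_data}[ln D(x)] + E_{z~p_z}[ln(1 - D(G(z)))]
v_gan = np.log(D(x_data)).mean() + np.log(1.0 - D(G(z))).mean()
print(f"V_GAN estimate: {v_gan:.3f}")
```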
Problems with GANs
From the perspective of representation learning: there are no restrictions on how $G$ uses $z$
• $z$ can be used in a highly entangled way
• Each dimension of $z$ does not represent any salient feature of the training data
[Schematic: latent dimensions $(z_1, z_2)$ are mapped to data space by $G(z)$ in an entangled way]
Proposed Resolution: InfoGAN - Maximizing Mutual Information
Observation in conventional GANs: a generated datum $x = G(z)$ does not carry much information about the noise $z$ from which it is generated, because of the heavily entangled use of $z$.
Proposed resolution = InfoGAN: train the generator so that it maximizes the mutual information between the latent code $C$ and the generated data $X = G(Z, C)$:
$$\min_G \max_D \{ V_\mathrm{GAN}(G, D) - \lambda I(C;\, X = G(Z, C)) \}$$
Mutual Information
$$I(X; Y) = H(X) - H(X \mid Y),$$
where
• $H(X) = -E_{x \sim p(X)}[\ln p(X = x)]$: entropy of the prior distribution
• $H(X \mid Y) = -E_{y \sim p(Y),\, x \sim p(X \mid Y = y)}[\ln p(X = x \mid Y = y)]$: entropy of the posterior distribution
[Schematic: when sampling $X$ is unaffected by $Y$, $I(X; Y) = 0$; when knowing $Y$ narrows down $X$, $I(X; Y) > 0$]
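As a concrete illustration of the definition, the short sketch below computes $I(X; Y) = H(X) - H(X \mid Y)$ for a made-up discrete joint distribution (all numbers are illustrative):

```python
# Computing I(X;Y) = H(X) - H(X|Y) for a toy discrete joint distribution.
import numpy as np

p_xy = np.array([[0.30, 0.10],    # joint distribution p(X=x, Y=y)
                 [0.05, 0.55]])
p_x = p_xy.sum(axis=1)            # marginal p(X)
p_y = p_xy.sum(axis=0)            # marginal p(Y)

H_x = -np.sum(p_x * np.log(p_x))                      # entropy of the prior
p_x_given_y = p_xy / p_y                              # p(X=x | Y=y), per column
H_x_given_y = -np.sum(p_y * np.sum(p_x_given_y * np.log(p_x_given_y), axis=0))

mi = H_x - H_x_given_y
print(f"I(X;Y) = {mi:.4f} nats")                      # > 0: Y informs X
```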
Avoiding an increase in computational cost
Major difficulty: evaluating $I(C;\, X = G(Z, C))$ requires evaluating, and sampling from, the posterior $p(C \mid X)$.
Two strategies:
• Variational maximization of mutual information, using an approximate function $Q(c, x) \approx p(C = c \mid X = x)$
• Sharing the neural net between $Q$ and the discriminator $D$
Variational Maximization of MI
For an arbitrary function $Q(c, x)$,
$$E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln p(C = c \mid X = x)]$$
$$= E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}\!\left[\ln \frac{p(C = c \mid X = x)}{Q(c, x)}\right]$$
$$= E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + E_{x \sim p_G(X)}[D_\mathrm{KL}(p(C \mid X = x) \,\|\, Q(C, x))]$$
$$\geq E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)]$$
(by the positivity of the KL divergence)
Variational Maximization of MI
With $Q(c, x)$ approximating $p(C = c \mid X = x)$, we obtain a variational estimate of the mutual information:
$$L_I(G, Q) \equiv E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] + H(C) \;\leq\; I(C;\, X = G(Z, C))$$
Maximizing $L_I(G, Q)$ w.r.t. $Q$ and $G$:
• w.r.t. $Q$: achieves the equality by setting $Q(c, x) = p(C = c \mid X = x)$
• w.r.t. $G$: maximizes the mutual information itself
Optimization problem to solve in InfoGAN:
$$\min_{G, Q} \max_D \{ V_\mathrm{GAN}(G, D) - \lambda L_I(G, Q) \}$$
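As a numeric sanity check of the bound (not part of the paper), the toy sketch below verifies, on a random discrete joint distribution, that $L_I \leq I(C; X)$ for an arbitrary $Q$, with equality when $Q$ is the true posterior:

```python
# Sanity check: L_I <= I(C;X), with equality when Q = p(C|X).
import numpy as np

rng = np.random.default_rng(1)
p_cx = rng.dirichlet(np.ones(6)).reshape(2, 3)   # toy joint p(C=c, X=x)
p_c = p_cx.sum(axis=1)
p_x = p_cx.sum(axis=0)
post = p_cx / p_x                                # true posterior p(C|X=x)

H_c = -np.sum(p_c * np.log(p_c))

def L_I(Q):
    # E_{x~p(X), c~p(C|x)}[ln Q(c, x)] + H(C)
    return np.sum(p_cx * np.log(Q)) + H_c

mi = np.sum(p_cx * np.log(p_cx / np.outer(p_c, p_x)))  # true I(C;X)
Q_bad = np.full((2, 3), 0.5)                           # arbitrary valid Q
print(L_I(Q_bad) <= mi, np.isclose(L_I(post), mi))     # True True
```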
Eliminating sampling from the posterior
Lemma:
$$E_{x \sim p(X),\, y \sim p(Y \mid X = x)}[f(x, y)] = E_{x \sim p(X),\, y \sim p(Y \mid X = x),\, x' \sim p(X' \mid Y = y)}[f(x', y)]$$
By using this lemma and noting that $x = G(z, c)$, we can eliminate the sampling from the posterior $p(C \mid X = x)$:
$$E_{x \sim p_G(X),\, c \sim p(C \mid X = x)}[\ln Q(c, x)] = E_{c \sim p(C),\, z \sim p_z(Z),\, x = G(z, c)}[\ln Q(c, x)]$$
Easy to estimate! (See the sketch after the proof below.)
Proof of lemma:
$$\mathrm{l.h.s.} = \int_x \int_y p(X = x)\, p(Y = y \mid X = x)\, f(x, y)$$
$$= \int_x \int_y p(Y = y)\, p(X = x \mid Y = y)\, f(x, y) \qquad \text{(Bayes' theorem)}$$
Relabeling $x$ as $x'$ and re-expanding $p(Y = y) = \int_x p(X = x)\, p(Y = y \mid X = x)$ gives
$$= \int_x \int_y \int_{x'} p(X = x)\, p(Y = y \mid X = x)\, p(X' = x' \mid Y = y)\, f(x', y) = \mathrm{r.h.s.}$$
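Putting the pieces together, here is a rough PyTorch sketch of the InfoGAN losses for a categorical code (an assumption on my part: the authors' implementation is in TensorFlow, and all parameter choices here are illustrative). It uses exactly the sampling order justified by the lemma: draw $c \sim p(C)$ and $z \sim p_z(Z)$, then set $x = G(z, c)$, so no posterior sampling is needed:

```python
# A hedged sketch of InfoGAN losses; G, D, Q are user-supplied networks.
import torch
import torch.nn.functional as F

def infogan_losses(G, D, Q, real_x, z_dim=62, n_cat=10, lam=1.0):
    """Assumes D(x) returns probabilities of shape (N, 1) and Q(x) returns
    categorical logits of shape (N, n_cat)."""
    n = real_x.size(0)

    # Sample c ~ p(C) and z ~ p_z(Z), then x = G(z, c) (the lemma's trick).
    z = torch.randn(n, z_dim)
    c = torch.randint(n_cat, (n,))
    fake_x = G(torch.cat([z, F.one_hot(c, n_cat).float()], dim=1))

    # Standard GAN terms (non-saturating heuristic for the generator).
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    d_loss = (F.binary_cross_entropy(D(real_x), ones)
              + F.binary_cross_entropy(D(fake_x.detach()), zeros))
    g_loss = F.binary_cross_entropy(D(fake_x), ones)

    # cross_entropy = -E[ln Q(c | x)]; minimizing it maximizes L_I(G, Q)
    # up to the constant H(C), so we add lam * mi_loss to the G/Q objective.
    mi_loss = F.cross_entropy(Q(fake_x), c)
    return d_loss, g_loss + lam * mi_loss
```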
Sharing layers between $D$ and $Q$
Model $Q(c, x)$ using a neural network, and reduce the computational cost by sharing all the convolution layers with the discriminator $D$.
[Schematic: the convolution layers of the discriminator feed both the $D$ head and the $Q$ head; image from Odena et al., arXiv:1610.09585]
Given a DCGAN, InfoGAN comes at negligible additional cost!
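A minimal sketch of this weight sharing (again a hypothetical PyTorch rendering, with made-up layer sizes for 28x28 inputs): all convolution layers form a shared trunk, and $D$ and $Q$ are just two small heads on top of it:

```python
# Hypothetical D/Q weight sharing: one conv trunk, two linear heads.
import torch
import torch.nn as nn

class SharedDQ(nn.Module):
    def __init__(self, n_cat=10):
        super().__init__()
        # All convolution layers are shared between D and Q.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Flatten(),
        )
        feat = 128 * 7 * 7                      # for 28x28 inputs (e.g. MNIST)
        self.d_head = nn.Linear(feat, 1)        # real/fake logit
        self.q_head = nn.Linear(feat, n_cat)    # logits of Q(c | x)

    def forward(self, x):
        h = self.trunk(x)
        return self.d_head(h), self.q_head(h)

# d_logit, q_logits = SharedDQ()(torch.randn(8, 1, 28, 28))
```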
Experiment – MI Maximization
• InfoGAN on the MNIST dataset
• Latent code $c$: a 10-class categorical code
• $L_I$ quickly saturates to $H(C) = \ln 10 \approx 2.30$ in InfoGAN
(Figure 1 in the original paper)
Experiment – Disentangled Representation
(Figure 2 in the original paper)
• InfoGAN on the MNIST dataset
• Latent codes: $c_1$, a 10-class categorical code; $c_2$ and $c_3$, continuous codes
• $c_1$ can be used as a classifier with a 5% error rate (see the sketch below)
• $c_2$ and $c_3$ captured the rotation and width of digits, respectively
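A hedged sketch of that classifier use (the helper and the mapping here are hypothetical, not the paper's code): $Q$'s categorical head assigns each image to one of 10 clusters, and a fixed cluster-to-digit mapping turns cluster indices into digit labels:

```python
# Hypothetical sketch: using Q's categorical code c1 as a digit classifier.
import numpy as np

def classify(q_logits, cluster_to_digit):
    """q_logits: (N, 10) logits of Q(c1 | x); returns predicted digit labels."""
    clusters = q_logits.argmax(axis=1)      # most likely categorical code
    return cluster_to_digit[clusters]

# Example with an assumed cluster->digit permutation and random logits;
# in practice the mapping would be fixed once by inspecting each cluster.
cluster_to_digit = np.array([3, 0, 7, 1, 9, 4, 6, 2, 8, 5])
q_logits = np.random.default_rng(0).normal(size=(5, 10))
print(classify(q_logits, cluster_to_digit))
```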
Experiment – Disentangled Representation
Dataset: 3D faces (P. Paysan et al., AVSS 2009, pp. 296–301)
(Figure 3 in the original paper)
Experiment – Disentangled Representation
Dataset: 3D chairs (M. Aubry et al., CVPR 2014, pp. 3762–3769)
InfoGAN learned salient features without supervision
(Figure 4 in the original paper)
Experiment – Disentangled Representation
Dataset: Street View House Numbers (SVHN)
(Figure 5 in the original paper)
Experiment – Disentangled Representation
Dataset: CelebA
(Figure 6 in the original paper)
Future Prospects and Conclusion
Mutual information maximization can be applied to other methods, e.g., VAEs:
• Learning hierarchical latent representations
• Improving semi-supervised learning
• High-dimensional data discovery
Goal: Unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information between generated images and input codes
Benefit: Interpretable representations obtained without supervision or substantial additional cost