disentangling disentanglement in [-0.5ex] variational ...12-11-00)-12-11-35-4811... · emilemathieu...
TRANSCRIPT
![Page 1: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/1.jpg)
Disentangling Disentanglement inVariational AutoencodersICML 2019
Emile Mathieu?, Tom Rainforth?, N. Siddharth?, Yee Whye TehJune 12, 2019
Departments of Statistics and Engineering Science, University of Oxford
![Page 2: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/2.jpg)
Variational Autoencoders
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Modelzl(gender)
zm (beard)zn
(makeup)
Factors
1
![Page 3: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/3.jpg)
Disentanglement
= Independence
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Model
xixj
zl(gender)
zm (beard)zn
(makeup)
MeaningfulFactors
1
![Page 4: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/4.jpg)
Disentanglement = Independence
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Modelzl(shape)
zm (angle)zn
(scale)
IndependentFactors
1
![Page 5: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/5.jpg)
Decomposition ∈ {Independence, Clustering, Sparsity, …}
x1
x2
x3
x4
x5
x
z1
z2
z3
z4
GenerativeModel Inference
Modelzl(gender)
zm (beard)zn
(makeup)
Co-RelatedFactors
1
![Page 6: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/6.jpg)
Decomposition: A Generalization of Disentanglement
Characterise decomposition as the fulfilment of two factors:
(a) level of overlap between encodings in the latent space,(b) matching between the marginal posterior qφ(z) and structured
prior p(z) to constrain with the required decomposition.
2
![Page 7: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/7.jpg)
Decomposition: An Analysis
Desired StructureTargetStructure
p(z)
3
![Page 8: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/8.jpg)
Decomposition: An Analysis
Insufficient Overlap
Insu�cient
Overlap
q�(z|x) p✓(x|z)pD(x) q�(z) p(z) p✓(x)
3
![Page 9: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/9.jpg)
Decomposition: An Analysis
Too Much Overlap
q�(z|x) p✓(x|z)
Too Much
Overlap
pD(x) q�(z) p(z) p✓(x)
3
![Page 10: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/10.jpg)
Decomposition: An Analysis
Appropriate Overlap
Appropriate
Overlap
q�(z|x) p✓(x|z)pD(x) q�(z) p(z) p✓(x)
3
![Page 11: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/11.jpg)
Overlap — Deconstructing the β-VAE
Lβ(x) = Eqφ(z|x)[logpθ(x|z)]− β · KL(qφ(z|x)||p(z))= L(x) (πθ,β ,qφ)︸ ︷︷ ︸
ELBO with β-annealed prior
+(β − 1) · Hqφ︸ ︷︷ ︸maxent
+ log Fβ︸ ︷︷ ︸constant
Implicationsβ-VAE disentangles largely by controlling the level of overlapIt places no direct pressure on the latents to be independent!
4
![Page 12: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/12.jpg)
Decomposition: Objective
Lα,β(x) = Eqφ(z|x)[logpθ(x | z)] Reconstruct observations
− β · KL(qφ(z | x) ‖ p(z)) Control level of overlap
− α · D(qφ(z),p(z)) Impose desired structure
5
![Page 13: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/13.jpg)
Decomposition: Generalising Disentanglement
Independence: p(z) = N (0,σ?)
Figure 1: β-VAE trained on 2D Shapes1 computing disentanglement2.
1Matthey et al., dSprites: Disentanglement testing Sprites dataset, p. 1.2Kim and Mnih, “Disentangling by Factorising”, p. 2.
6
![Page 14: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/14.jpg)
Decomposition: Generalising Disentanglement
Clustering: p(z) = ∑k ρk · N (µk,σk)
β = 0.01 β = 0.5 β = 1.0 β = 1.2
α=0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
β=0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
α = 1 α = 3 α = 5 α = 8
Figure 2: Density of aggregate posterior qφ(z) with different α, β for thepinwheel dataset.3
3http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab. 7
![Page 15: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/15.jpg)
Decomposition: Generalising Disentanglement
Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)
0 5 10 15 20 25 30 35 40 45Latent dimension
0.0
0.2
0.4
0.6
Avg.
late
nt m
agni
tude
TrouserDressShirt
Figure 3: Sparsity of learnt representations for the Fashion-MNIST4 dataset.
4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
8
![Page 16: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/16.jpg)
Decomposition: Generalising Disentanglement
Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)
(a) d = 49 (b) d = 30 (c) d = 19 (d) d = 40leg separation dress width shirt fit sleeve style
Figure 3: Latent space traversals for “active” dimensions4.
4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
8
![Page 17: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/17.jpg)
Decomposition: Generalising Disentanglement
Sparsity: p(z) = ∏d (1− γ) · N (zd; 0, 1) + γ · N (zd; 0, σ20)
0 200 400 600 800 1000alpha
0.2
0.3
0.4
0.5Av
g. N
orm
alise
d Sp
arsit
y
γ= 0, β= 0.1γ= 0.8, β= 0.1
γ= 0, β= 1γ= 0.8, β= 1
γ= 0, β= 5γ= 0.8, β= 5
Figure 3: Sparsity vs regularisation strength α (higher better)4.
4Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
8
![Page 18: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/18.jpg)
Recap
We propose and develop:
• Decomposition: a generalisation of disentanglement involving:(a) overlap of latent encodings(b) match between qφ(z) and p(z)
• A theoretical analysis of the β-VAE objective showing it primarilyonly contributes to overlap.
• An objective that incorporates both factors (a) and (b).• Experiments that showcase efficacy at different decompositions:• independence • clustering • sparsity
9
![Page 19: Disentangling Disentanglement in [-0.5ex] Variational ...12-11-00)-12-11-35-4811... · EmileMathieu TomRainforth N.Siddharth YeeWhyeTeh Code Paper iffsid/disentangling-disentanglement](https://reader034.vdocuments.us/reader034/viewer/2022043014/5fb2a54fe5d4ce1e5f7eb024/html5/thumbnails/19.jpg)
Emile Mathieu Tom Rainforth N. Siddharth Yee Whye Teh
Code Paper
iffsid/disentangling-disentanglementarXiv:1812.02833
Come talk to us at our poster: #59