Alternative Priors for Deep Generative Models
Eric Nalisnick, University of California, Irvine
In collaboration with Padhraic Smyth
Outline
1. Motivation: an overview of research on deep generative models, and why we should consider priors different from the current ones.
2. Non-Parametric Priors: variational autoencoders with infinite-dimensional latent spaces via Dirichlet Process priors.
3. Objective Priors: black-box learning of invariant priors, with an application to variational autoencoders.
The Variational Autoencoder (VAE)
(Kingma & Welling, 2014), (Rezende et al., 2014), (MacKay & Gibbs, 1997)
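For reference, the VAE is trained by maximizing the evidence lower bound (ELBO) on the marginal likelihood:

$$\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big),$$

where the prior p(z) is conventionally a standard Gaussian; the alternatives discussed in this talk replace that choice.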
Research on VAEs spans several directions:
- Empirical priors: learned autoregressive prior (Chen et al., 2016)
- Direct estimation of model evidence: importance sampling (Burda et al., 2015); random projections (Grover & Ermon, 2016)
- Other divergence measures: alpha (Hernandez-Lobato et al., 2016); Rényi (Li & Turner, 2016); Stein (Ranganath et al., 2016)
- Inference models: regression (Salimans & Knowles, 2014); neural networks (Kingma & Welling, 2014; Rezende et al., 2014); Gaussian processes (Tran et al., 2016)
- Approximations via transformation: normalizing flows (Rezende & Mohamed, 2015); Hamiltonian flow (Salimans et al., 2015); inverse autoregressive flow (Kingma et al., 2016)
- Implicit posterior approximations: Stein particle descent (Liu & Wang, 2016); operator VI (Ranganath et al., 2016); adversarial VB (Mescheder et al., 2017)
“…to improve our variational bounds we should improve our priors and not just the encoder and decoder….perhaps we should investigate multimodal priors…”
M. Hoffman & M. Johnson. “ELBO Surgery”. NIPS 2016 Workshop on Advances in Approx. Bayesian Inference.
Part 1: Non-Parametric Priors
Publications
1. E. Nalisnick and P. Smyth. “Stick-Breaking Variational Autoencoders”. ICLR 2017.
2. E. Nalisnick, L. Hertel, and P. Smyth. “Approximate Inference for Deep Latent Gaussian Mixtures”. NIPS 2016 Workshop on Bayesian Deep Learning.
The Dirichlet Process
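For reference, the stick-breaking construction underlying the Dirichlet Process draws stick fractions from a Beta distribution and converts them into mixture weights:

$$v_k \sim \mathrm{Beta}(1, \alpha_0), \qquad \pi_k = v_k \prod_{j<k} (1 - v_j),$$

so that the weights π_k sum to one over a countably infinite number of sticks.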
Two obstacles: (1) gradients through the stick-breaking weights π_k, and (2) gradients through samples from the mixture.
Differentiating Through π_k
Obstacle: the Beta distribution does not have a non-centered parametrization (except in special cases).
Kumaraswamy distribution: a Beta-like distribution with a closed-form inverse CDF; use it as the variational posterior (see the sketch below).
Poondi Kumaraswamy (1930-1988)
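A minimal sketch (assuming PyTorch; not the authors' code) of how the closed-form inverse CDF enables reparametrized sampling, plus the stick-breaking map the samples feed into:

```python
import torch

def sample_kumaraswamy(a, b, shape):
    # Kumaraswamy(a, b) has CDF F(x) = 1 - (1 - x^a)^b, so inverse-CDF
    # sampling is closed-form and differentiable in a and b:
    u = torch.rand(shape).clamp(1e-6, 1 - 1e-6)  # avoid 0/1 for stability
    return (1.0 - u.pow(1.0 / b)).pow(1.0 / a)   # x = (1 - u^(1/b))^(1/a)

def stick_breaking(v):
    # Map K-1 stick fractions v in (0,1) to K weights pi on the simplex:
    # pi_k = v_k * prod_{j<k} (1 - v_j), plus the leftover stick.
    cum = torch.cumprod(1.0 - v, dim=-1)
    head = v * torch.cat([torch.ones_like(v[..., :1]), cum[..., :-1]], dim=-1)
    return torch.cat([head, cum[..., -1:]], dim=-1)  # sums to 1
```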
Stochastic Backpropagation Through Mixtures
Obstacle: it is not obvious how to use the reparametrization trick for samples from a mixture distribution.
Brute-force solution: summing over all components yields a tractable ELBO but requires O(KS) decoder propagations (see the estimator sketched below).
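A sketch of the brute-force estimator this refers to, assuming K mixture components and S samples per component; each of the KS samples must be pushed through the decoder:

$$\mathbb{E}_{q(z)}\big[f(z)\big] \;=\; \sum_{k=1}^{K} \pi_k\, \mathbb{E}_{q_k(z)}\big[f(z)\big] \;\approx\; \sum_{k=1}^{K} \frac{\pi_k}{S} \sum_{s=1}^{S} f\big(z^{(k,s)}\big), \qquad z^{(k,s)} \sim q_k(z).$$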
MODEL #1: Stick-Breaking Variational Autoencoder (Nalisnick & Smyth, 2017)
The posterior stick fractions are Kumaraswamy samples. The posterior is truncated; truncation is not necessary, but the model learns faster with it.
Samples from the Generative Model (Nalisnick & Smyth, 2017)
Stick-Breaking VAE: truncation level of 50 dimensions, Beta(1,5) prior. Gaussian VAE: 50 dimensions, N(0,1) prior.
Quantitative Results for SB-VAE (Nalisnick & Smyth, 2017)
[Figures, reported for both unsupervised and semi-supervised settings: t-SNE of the MNIST Dirichlet latent space vs. the MNIST Gaussian latent space; a kNN classifier on the latent space; (estimated) marginal likelihoods.]
The semi-supervised SB-VAE is a nonparametric version of the M2 model of (Kingma et al., NIPS 2014).
MODEL #2: Dirichlet Process Variational Autoencoder (Nalisnick et al., 2016)
Samples from the Generative Model (Nalisnick et al., 2016)
[MNIST samples from mixture component #1 and from component #5.]
Quantitative Results for DP-VAE (Nalisnick et al., 2016)
[Figures: t-SNE of the MNIST Dirichlet Process latent space vs. the MNIST Gaussian latent space; a kNN classifier on the latent space; (estimated) marginal likelihoods.]
Part 2: Objective Priors
Publications
1. E. Nalisnick and P. Smyth. “Variational Reference Priors”. In submission to the Workshop Track, ICLR 2017.
Objective Priors
Jeffreys priors: uninformative priors that are invariant to reparametrization, representing a state of ‘ignorance’ in prior beliefs.
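For reference, the Jeffreys prior is defined through the Fisher information:

$$p(\theta) \;\propto\; \sqrt{\det \mathcal{I}(\theta)}, \qquad \mathcal{I}(\theta) \;=\; -\,\mathbb{E}_{p(x \mid \theta)}\big[\nabla_\theta^2 \log p(x \mid \theta)\big].$$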
Reference Priors (Bernardo, 1979): Generalize the notion of an objective prior to the following definition:
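That definition is the standard mutual-information characterization: the reference prior is the one that maximizes the mutual information between the parameter and the data,

$$p^*(\theta) \;=\; \arg\max_{p(\theta)}\; I(\theta; X).$$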
Reference priors are also invariant to reparametrization and reduce to the Jeffreys prior in one dimension, but they are hard to solve for analytically.
Variational Reference Priors
The reference prior objective can be rewritten as:
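Expanding the mutual information makes the role of the marginal likelihood explicit:

$$I(\theta; X) \;=\; \mathbb{E}_{p(\theta)\,p(x \mid \theta)}\big[\log p(x \mid \theta) - \log p(x)\big], \qquad p(x) \;=\; \int p(x \mid \theta)\, p(\theta)\, d\theta.$$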
Pick a functional family and optimize the parameters:
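With a variational family q_λ(θ) in place of p(θ), the optimization becomes (a sketch of the parametrized objective):

$$\hat{\lambda} \;=\; \arg\max_{\lambda}\; \mathbb{E}_{q_\lambda(\theta)\,p(x \mid \theta)}\big[\log p(x \mid \theta) - \log p(x)\big], \qquad p(x) \;=\; \mathbb{E}_{q_\lambda(\theta)}\big[p(x \mid \theta)\big].$$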
Variational Lower Bound
The marginal likelihood makes the objective intractable in most cases, so we derive a lower bound.
Using the Rényi bound (Li & Turner, 2016):
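For reference, the Rényi bound of (Li & Turner, 2016), applied with q_λ(θ) serving as both prior and proposal so that the importance ratio p(x, θ)/q_λ(θ) reduces to p(x | θ):

$$\mathcal{L}_\alpha \;=\; \frac{1}{1-\alpha}\,\log\, \mathbb{E}_{q_\lambda(\theta)}\Big[\,p(x \mid \theta)^{\,1-\alpha}\,\Big].$$

$\mathcal{L}_\alpha$ is non-increasing in α, equals log p(x) at α = 0, and recovers the ELBO as α → 1; a negative α therefore gives the upper bound on log p(x) needed to lower-bound the objective.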
Final variational objective:
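One plausible form, substituting the Rényi bound for log p(x) in the objective above (a reconstruction, not verbatim from the slide):

$$\hat{\lambda} \;=\; \arg\max_{\lambda}\; \mathbb{E}_{q_\lambda(\theta)\,p(x \mid \theta)}\big[\log p(x \mid \theta)\big] \;-\; \frac{1}{1-\alpha}\,\log\, \mathbb{E}_{q_\lambda(\theta)}\Big[\,p(x \mid \theta)^{\,1-\alpha}\,\Big].$$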
Types of Approximations
1. Parametric: pick a known distribution with the proper support, preferably one that can be sampled via a non-centered parametrization.
2. Implicit prior: the variational objective never requires evaluating the prior's density, only sampling from it, so we can push noise through an arbitrary transformation (see the sketch after this list).
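A minimal sketch (assuming PyTorch; hypothetical names, not the authors' code) of such an implicit prior: its density is never evaluated, only sampled, which is all the objective needs:

```python
import torch
import torch.nn as nn

class ImplicitPrior(nn.Module):
    # Pushes simple base noise through an arbitrary network; the resulting
    # density q_lambda(theta) has no closed form and is never evaluated.
    def __init__(self, noise_dim=10, theta_dim=2, hidden=50):
        super().__init__()
        self.noise_dim = noise_dim
        self.transform = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, theta_dim),
        )

    def sample(self, n):
        eps = torch.randn(n, self.noise_dim)   # base noise
        return self.transform(eps)             # theta ~ q_lambda(theta)
```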
Recovering Jeffreys Priors
[Plots: the learned priors recover the Jeffreys priors for the Bernoulli parameter, the Gaussian scale parameter, and the Poisson rate parameter.]
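For comparison, the known Jeffreys priors in these three cases are $p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$ for the Bernoulli parameter, $p(\sigma) \propto 1/\sigma$ for the Gaussian scale, and $p(\lambda) \propto \lambda^{-1/2}$ for the Poisson rate.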
Finding the VAE’s Reference Prior
[Diagram: an implicit prior network composed with the VAE decoder.]
Conclusions
1. Superior latent spaces with NP priors: they seem to preserve class structure well, resulting in better discriminative properties, and their dynamic capacity naturally encodes factors of variation.
2. Extra computation: the added computation and the imposed model structure may be undesirable.
3. Subjectivity in priors: are arbitrary priors affecting our posteriors? Maybe. More analysis is needed.
Future Work
1. Stick-breaking for adaptive computation: use stick-breaking to determine the number of computation steps, such as RNN recursions. Preliminary results suggest state-of-the-art performance on language modeling.
2. Learning robust priors: can we minimize the mutual information objective to learn robust priors? There are reasons to believe this would (approximately) encode Dropout as a Bayesian prior.
Thank you. Questions?
Papers and code at: ics.uci.edu/~enalisni/