Deep generative modeling: Implicit models

Jakub M. Tomczak, Deep Learning
TYPES OF GENERATIVE MODELS
Generative models:
- Autoregressive (e.g., PixelCNN)
- Latent variable models
  - Implicit models (e.g., GANs)
  - Prescribed models (e.g., VAE)
- Flow-based (e.g., RealNVP, GLOW)
GENERATIVE MODELS

| Model family | Training | Likelihood | Sampling | Compression |
|---|---|---|---|---|
| Autoregressive models (e.g., PixelCNN) | Stable | Exact | Slow | No |
| Flow-based models (e.g., RealNVP) | Stable | Exact | Fast/Slow | No |
| Implicit models (e.g., GANs) | Unstable | No | Fast | No |
| Prescribed models (e.g., VAEs) | Stable | Approximate | Fast | Yes |
DENSITY NETWORKS
GENERATIVE MODELING WITH LATENT VARIABLES

Generative process:
1. z ∼ pλ(z)
2. x ∼ pθ(x ∣ z)

The log-likelihood function:

log p(x) = log ∫ pθ(x ∣ z) pλ(z) dz ≈ log (1/S) ∑_{s=1}^{S} exp(log pθ(x ∣ z_s))

It could be estimated by MC samples. If we take a standard Gaussian prior, we need to model p(x|z) only!
DENSITY NETWORK

Let's consider a function f that transforms z to x. It must be a powerful (= flexible) transformation: a NEURAL NETWORK 💪 The neural network outputs the parameters of a distribution, e.g., a mixture of Gaussians.

MacKay, D. J., & Gibbs, M. N. (1999). Density networks. Statistics and neural networks: advances at the interface. Oxford
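To make this concrete, here is a minimal sketch (not from the slides) of such a network in PyTorch. For simplicity it outputs the mean and log-variance of a single Gaussian rather than a mixture; the class name, layer sizes, and dimensions are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class DensityNetwork(nn.Module):
    """Hypothetical sketch: f maps z to the parameters of a Gaussian p(x|z)."""
    def __init__(self, M, D, hidden=300):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(M, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, D)       # mean of p(x|z)
        self.log_var = nn.Linear(hidden, D)  # log-variance of p(x|z)

    def forward(self, z):
        h = self.body(z)
        return self.mu(h), self.log_var(h)

    def log_prob(self, x, z):
        # log N(x | mu(z), diag(exp(log_var(z)))), summed over dimensions
        mu, log_var = self.forward(z)
        return (-0.5 * (math.log(2 * math.pi) + log_var
                        + (x - mu) ** 2 / log_var.exp())).sum(dim=-1)
```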
DENSITY NETWORKS

The log-likelihood function:

log p(x) = log ∫ pθ(x ∣ z) pλ(z) dz ≈ log (1/S) ∑_{s=1}^{S} exp(log pθ(x ∣ z_s))

Training procedure:
1. Sample multiple z's from the prior (e.g., standard Gaussian).
2. Apply the log-sum-exp trick, and apply backpropagation (see the sketch below).

Drawback: it scales badly in high-dimensional cases…
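A hedged sketch of this training objective, assuming the hypothetical `DensityNetwork` above; `torch.logsumexp` implements the log-sum-exp trick:

```python
import math
import torch

def mc_log_likelihood(model, x, M, S=128):
    """MC estimate of log p(x): logsumexp_s log p(x | z_s) - log S."""
    # 1. Sample S z's from the standard Gaussian prior, for each data point.
    z = torch.randn(S, x.shape[0], M)
    # Evaluate log p(x | z_s) for every sample: shape (S, N).
    log_px_z = torch.stack([model.log_prob(x, z[s]) for s in range(S)])
    # 2. The log-sum-exp trick keeps averaging exp(log p) numerically stable.
    return torch.logsumexp(log_px_z, dim=0) - math.log(S)

# Usage sketch: maximize mc_log_likelihood(model, x_batch, M).mean() by backprop.
```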
DENSITY NETWORKS

Advantages:
✓ Non-linear transformations.
✓ Allows generating.

Disadvantages:
- No analytical solutions.
- No exact likelihood.
- It requires a lot of samples from the prior.
- It fails in high dimensions.
- It requires an explicit distribution (e.g., Gaussian).

Can we do better?
IMPLICIT DISTRIBUTIONS
THE IDEA BEHIND IMPLICIT DISTRIBUTIONS

Let us look again at the Density Network model. The idea is to inject noise into a neural network that serves as a generator.

[Figure: noise z drawn from a prior p(z) centered at 0, fed into the generator.]

But now, we don't specify the distribution (e.g., MoG); instead, we use a flexible transformation to directly return an image. This is now implicit.
We have a neural network (generator) that transforms noise into an image. It defines an implicit distribution (i.e., we do not assume any form of it), and it could be seen as Dirac's delta:

p(x ∣ z) = δ(x − f(z))

However, now we cannot use the likelihood-based approach, because ln δ(x − f(z)) is ill-defined. We need to use a different approach.
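Before moving to that different approach, a tiny sketch of what "implicit" means operationally (the generator architecture and dimensions below are made up): sampling is just pushing prior noise through the network; no density over x is ever written down.

```python
import torch
import torch.nn as nn

# Hypothetical generator f; 16 and 784 (a flattened 28x28 image) are made-up dims.
f = nn.Sequential(nn.Linear(16, 300), nn.ReLU(), nn.Linear(300, 784))

z = torch.randn(64, 16)  # z ~ p(z) = N(0, I)
x = f(z)                 # x = f(z): deterministic, i.e., p(x|z) = δ(x − f(z))
```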
GENERATIVE ADVERSARIAL NETWORKS
BEFORE WE GO INTO MATH…

Let's imagine two actors: a fraud and an art expert… and a real artist.

The fraud aims to copy the real artist and cheat the art expert. The expert assesses a painting and gives her opinion ("Hmmm… A FAKE!" or "Hmmm… PABLO!"). The fraud learns and tries to fool the expert.
HOW CAN WE FORMULATE IT?

The generator plays the fraud; the discriminator plays the expert.

1. Sample z.
2. Generate G(z).
3. Discriminate whether a given image is real or fake.

What about the learning objective?
ADVERSARIAL LOSS

We can consider the following learning objective:

min_G max_D 𝔼_{x∼p_real}[log D(x)] + 𝔼_{z∼p(z)}[log(1 − D(G(z)))]

We want to minimize wrt. the generator and maximize wrt. the discriminator.

It resembles the logarithm of the Bernoulli distribution:

y log p(y = 1) + (1 − y) log(1 − p(y = 1))

Therefore, the discriminator network should end with a sigmoid function to mimic a probability.

Goodfellow, I., et al. (2014). Generative adversarial nets. In NIPS.
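A quick numerical check (not on the slides) of this connection: the discriminator's part of the objective is exactly binary cross-entropy with label y = 1 for real and y = 0 for fake. The probabilities below are made up for illustration:

```python
import torch
import torch.nn.functional as F

d_real = torch.tensor([0.9, 0.8])  # made-up D(x) on real images
d_gen = torch.tensor([0.3, 0.1])   # made-up D(G(z)) on fakes

# Maximizing E[log D(x)] + E[log(1 - D(G(z)))] over D is exactly minimizing
# BCE with targets y = 1 (real) and y = 0 (fake):
bce = F.binary_cross_entropy(torch.cat([d_real, d_gen]),
                             torch.cat([torch.ones(2), torch.zeros(2)]))
manual = -(torch.log(d_real).mean() + torch.log(1. - d_gen).mean()) / 2
assert torch.allclose(bce, manual)
```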
GENERATIVE ADVERSARIAL NETWORKS

Generative process:
1. Sample z.
2. Generate G(z).

The learning objective (adversarial loss):

min_G max_D 𝔼_{x∼p_real}[log D(x)] + 𝔼_{z∼p(z)}[log(1 − D(G(z)))]

Learning:
1. Generate fake images, and minimize wrt. G.
2. Take real & fake images, and maximize wrt. D.
GANS

```python
import torch
import torch.nn as nn

class GAN(nn.Module):
    def __init__(self, D, M):
        super(GAN, self).__init__()
        self.D = D  # data dimensionality
        self.M = M  # latent (noise) dimensionality
        # generator: latent -> data
        self.gen1 = nn.Linear(in_features=self.M, out_features=300)
        self.gen2 = nn.Linear(in_features=300, out_features=self.D)
        # discriminator: data -> probability of being real
        self.dis1 = nn.Linear(in_features=self.D, out_features=300)
        self.dis2 = nn.Linear(in_features=300, out_features=1)
```
```python
    def generate(self, N):
        # Sample noise from the standard Gaussian prior; the prior lives in
        # the latent space, so its dimensionality is M.
        z = torch.randn(size=(N, self.M))
        x_gen = self.gen1(z)
        x_gen = nn.functional.relu(x_gen)
        x_gen = self.gen2(x_gen)
        return x_gen

    def discriminate(self, x):
        y = self.dis1(x)
        y = nn.functional.relu(y)
        y = self.dis2(y)
        y = torch.sigmoid(y)  # sigmoid so the output mimics a probability
        return y
```
```python
    def gen_loss(self, d_gen):
        # The generator minimizes log(1 - D(G(z))).
        return torch.log(1. - d_gen)

    def dis_loss(self, d_real, d_gen):
        # We maximize wrt. the discriminator, but optimizers minimize!
        # We need to include the negative sign!
        return -(torch.log(d_real) + torch.log(1. - d_gen))

    def forward(self, x_real):
        x_gen = self.generate(N=x_real.shape[0])
        d_real = self.discriminate(x_real)
        d_gen = self.discriminate(x_gen)
        return d_real, d_gen
```
We can use two optimizers: one for the discriminator loss (which uses d_real & d_gen), and one for the generator loss (which uses d_gen only).
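A minimal training-loop sketch for this two-optimizer scheme, assuming the `GAN` module above; the dummy data, dimensions, learning rate, and choice of Adam are illustrative assumptions:

```python
import torch

model = GAN(D=784, M=16)
data_loader = [torch.rand(64, 784) for _ in range(10)]  # dummy stand-in for real data

# Two optimizers: one over the discriminator weights, one over the generator's.
opt_dis = torch.optim.Adam((p for n, p in model.named_parameters() if n.startswith('dis')), lr=2e-4)
opt_gen = torch.optim.Adam((p for n, p in model.named_parameters() if n.startswith('gen')), lr=2e-4)

for x_real in data_loader:
    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    # (Gradients also flow into G here, but opt_dis only updates dis weights.)
    d_real, d_gen = model(x_real)
    loss_d = model.dis_loss(d_real, d_gen).mean()
    opt_dis.zero_grad(); loss_d.backward(); opt_dis.step()

    # Generator step: minimize log(1 - D(G(z))).
    d_gen = model.discriminate(model.generate(N=x_real.shape[0]))
    loss_g = model.gen_loss(d_gen).mean()
    opt_gen.zero_grad(); loss_g.backward(); opt_gen.step()
```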
GENERATIONS

[Figure: samples generated by a GAN.]

Salimans, T., et al. (2016). Improved techniques for training GANs. NeurIPS.
GANS

Advantages:
✓ Non-linear transformations.
✓ Allows generating.
✓ Learnable loss.
✓ Allows implicit models.
✓ Works in high dimensions.

Disadvantages:
- No exact likelihood.
- Unstable training.
- Missing mode problem (i.e., it doesn't cover the whole space).
- No clear way for quantitative assessment.
WASSERSTEIN GAN

Before, the motivation for the adversarial loss was the Bernoulli distribution. But we can use other ideas. For instance, we can use the earth-mover distance:

min_G max_{D∈𝒲} 𝔼_{x∼p_real}[D(x)] − 𝔼_{z∼p(z)}[D(G(z))]

where the discriminator is a 1-Lipschitz function. We need to clip the weights of the discriminator: clip(weights, -c, c). This stabilizes training, but other problems remain.

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875.
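A hedged sketch of the WGAN losses and weight clipping; the critic architecture, dimensions, and c = 0.01 are illustrative assumptions. Note the critic ends without a sigmoid, since it now outputs an unbounded score rather than a probability:

```python
import torch
import torch.nn as nn

# Illustrative critic (discriminator): no sigmoid at the end.
critic = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 1))
c = 0.01  # illustrative clipping threshold

def wgan_dis_loss(d_real, d_gen):
    # Maximize E[D(x)] - E[D(G(z))]  =>  minimize the negative.
    return -(d_real.mean() - d_gen.mean())

def wgan_gen_loss(d_gen):
    # The generator pushes the critic's score on fakes up.
    return -d_gen.mean()

# After each critic update, clip its weights to keep it (roughly) 1-Lipschitz:
with torch.no_grad():
    for p in critic.parameters():
        p.clamp_(-c, c)
```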
Thank you!