![Page 1: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/1.jpg)
Generative Adversarial Text to Image Synthesis
Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran
[GitHub] [Arxiv]
Slides by Víctor Garcia [GDoc]
Computer Vision Reading Group (30/09/2016)
![Page 2: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/2.jpg)
Index
● Introduction
● State of the Art
● Method
  ○ Network Architecture
  ○ Losses
● Experiments
  ○ Qualitative Results
  ○ Sentence interpolation
  ○ Style Transfer
● Conclusions
![Page 3: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/3.jpg)
Introduction
Text → Image
GANs
![Page 4: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/4.jpg)
![Page 5: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/5.jpg)
GANs
[Diagram: the discriminator receives true samples from the world and fake samples from the generator, and outputs 1/0.]
![Page 6: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/6.jpg)
GANs
[Diagram: true samples x ~ q(x) from the world and fake samples x’ = G(z) from the generator are fed to the discriminator D(·), which outputs 1/0.]
![Page 7: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/7.jpg)
GANs
[Diagram: true samples x ~ q(x) from the world and fake samples x’ = G(z) from the generator are fed to the discriminator D(·).]
Discriminator: MAX → E[log(D(X))]
![Page 8: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/8.jpg)
GANs
[Diagram: true samples x ~ q(x) from the world and fake samples x’ = G(z) from the generator are fed to the discriminator D(·).]
Discriminator: MAX → E[log(D(X))] + E[log(1 - D(G(Z)))]
![Page 9: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/9.jpg)
![Page 10: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/10.jpg)
GANs
[Diagram: true samples x ~ q(x) from the world and fake samples x’ = G(z) from the generator are fed to the discriminator D(·).]
Generator: MIN → E[log(1 - D(G(Z)))]
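The two objectives above can be estimated from a batch of discriminator outputs. A minimal sketch in plain Python, assuming `d_real` and `d_fake` are lists of probabilities D(x) and D(G(z)) strictly between 0 and 1; the function names are illustrative and do not come from the slides.

```python
import math

def discriminator_objective(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))], which D maximizes."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake):
    """Batch estimate of E[log(1 - D(G(z)))], which G minimizes."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# A discriminator that separates real from fake scores higher than one
# that outputs 0.5 for every sample.
confident = discriminator_objective([0.9, 0.8], [0.1, 0.2])
confused = discriminator_objective([0.5, 0.5], [0.5, 0.5])
print(confident > confused)
```

The generator objective decreases as D(G(z)) approaches 1, that is, as the fakes fool the discriminator.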
![Page 11: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/11.jpg)
GANs with Joint Distributions
How do we generate the image from text?
![Page 12: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/12.jpg)
GANs with Joint Distributions
How do we generate the image from text?
[Diagram: the discriminator receives the joint representations f(x,t) and f(x’,t) and outputs 1/0.]
![Page 13: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/13.jpg)
GANs with Joint Distributions
[Diagram: the generator is conditioned on the text; the discriminator receives (Real Image + Text) and (Gen. Image + Text) and outputs 1/0.]
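One common way to give the text to both networks is concatenation. The sketch below assumes the text has already been encoded as a vector; `generator_input` and `discriminator_input` are hypothetical helper names, and the slides do not say how the joining is actually done.

```python
import random

def generator_input(z, text_embedding):
    """Condition the generator: append the text embedding to the noise vector."""
    return list(z) + list(text_embedding)

def discriminator_input(image_features, text_embedding):
    """Condition the discriminator: pair image features with the same text."""
    return list(image_features) + list(text_embedding)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # noise vector
t = [0.2, -0.1, 0.7]                          # toy text embedding
g_in = generator_input(z, t)
print(len(g_in))  # 7
```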
![Page 14: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/14.jpg)
![Page 15: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/15.jpg)
Text Embedding
In order to represent the text in a vector...
MIN
WHERE
![Page 16: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/16.jpg)
Text Embedding
In order to represent the text in a vector...
MIN
WHERE
This is the recurrent text encoder
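A rough sketch of how a text vector can be scored against image features. It assumes the encoder is trained so that the inner product between an image feature and a text feature is high for matching pairs, which is my reading of the paper and is not spelled out on the slide. All names and toy vectors are illustrative.

```python
def compatibility(image_feat, text_feat):
    """Inner product between an image feature and a text feature."""
    return sum(v * t for v, t in zip(image_feat, text_feat))

def classify_text(text_feat, images_by_class):
    """Pick the class whose images are, on average, most compatible with the text."""
    def mean_score(feats):
        return sum(compatibility(v, text_feat) for v in feats) / len(feats)
    return max(images_by_class, key=lambda y: mean_score(images_by_class[y]))

# Toy 2-D image features for two classes (made-up numbers).
images_by_class = {
    "bird": [[1.0, 0.0], [0.9, 0.1]],
    "flower": [[0.0, 1.0], [0.2, 0.8]],
}
print(classify_text([0.8, 0.1], images_by_class))  # bird
```

A text encoder trained this way produces vectors that carry visual information, which is what the generator and discriminator need.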
![Page 17: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/17.jpg)
![Page 18: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/18.jpg)
Network Architecture
![Page 19: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/19.jpg)
Losses - CLS
● log(D(x,t)): True Image + True Text
● log(1-D(G(z,t))): Fake Image + True Text
Do the real images match the text content?
![Page 20: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/20.jpg)
Losses - CLS
● log(D(x,t)): True Image + True Text
● log(1-D(G(z,t))): Fake Image + True Text
● log(1-D(xi,tj)): True Image (i) + True Text (j), unmatched
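The three terms can be combined into a single discriminator objective. A minimal sketch, assuming `s_real`, `s_wrong` and `s_fake` are the discriminator outputs in (0, 1) for (true image, true text), (true image, unmatched text) and (fake image, true text). Averaging the two negative terms follows my reading of the paper's training algorithm and is an assumption relative to the slide.

```python
import math

def cls_discriminator_objective(s_real, s_wrong, s_fake):
    """Matching-aware objective that the discriminator maximizes.

    s_real:  D(true image, true text)
    s_wrong: D(true image, unmatched text)
    s_fake:  D(fake image, true text)
    """
    negatives = math.log(1.0 - s_wrong) + math.log(1.0 - s_fake)
    return math.log(s_real) + 0.5 * negatives

# A discriminator that also rejects unmatched text scores higher than one
# that accepts any real image regardless of the text.
text_aware = cls_discriminator_objective(0.9, 0.1, 0.1)
text_blind = cls_discriminator_objective(0.9, 0.9, 0.1)
print(text_aware > text_blind)
```

The extra term forces the discriminator to judge the image-text match, not only whether the image looks real.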
![Page 21: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/21.jpg)
Losses - INT
The generator is also trained on interpolations between pairs of text embedding vectors (t1, t2),
so it learns to fill the gaps on the data manifold.
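A minimal sketch of the interpolation step, assuming a linear mix of two embeddings with weight `beta`. The default of 0.5 and the function name are assumptions, not values taken from the slides.

```python
def interpolate_embeddings(t1, t2, beta=0.5):
    """Linear mix of two text embeddings: beta * t1 + (1 - beta) * t2."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(t1, t2)]

print(interpolate_embeddings([1.0, 0.0], [0.0, 1.0]))  # [0.5, 0.5]
```

The interpolated vector has no corresponding real caption, so it only adds training signal on the generator side.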
![Page 22: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/22.jpg)
![Page 23: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/23.jpg)
Qualitative Results - Birds
![Page 24: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/24.jpg)
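Sentence Interpolation
[Diagram: with the noise held fixed, the generator output is interpolated between two texts: Gen.(z0 + Text1) → Gen.(z0 + Text2), and Gen.(z1 + Text3) → Gen.(z1 + Text4).]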
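A sketch of this experiment: the noise stays fixed while the text embedding moves in equal steps from one sentence to the other. `toy_generator` is a hypothetical stand-in for a trained generator, and `steps` is assumed to be at least 2.

```python
def interpolation_path(t_start, t_end, steps):
    """Return `steps` embeddings moving linearly from t_start to t_end."""
    path = []
    for i in range(steps):
        alpha = i / (steps - 1)
        path.append([(1.0 - alpha) * a + alpha * b for a, b in zip(t_start, t_end)])
    return path

def toy_generator(z, t):
    """Stand-in for a trained generator: it only concatenates its inputs."""
    return list(z) + list(t)

z0 = [0.3, -1.2]  # fixed noise, shared by every frame
frames = [toy_generator(z0, t) for t in interpolation_path([0.0, 0.0], [1.0, 2.0], 5)]
print(len(frames))  # 5
```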
![Page 25: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/25.jpg)
Disentangling style and content
[Diagram: the generator takes z + Text.]
If ‘text’ describes the content, what does ‘z’ describe?
![Page 26: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/26.jpg)
Disentangling style and content
[Diagram: the generator takes z + Text.]
If ‘text’ describes the content, what does ‘z’ describe?
Style → Pose, Background…, let’s extract ‘z’
![Page 27: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/27.jpg)
Disentangling style and content
z0 z1 z2 z3 z4 z5
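One way to extract ‘z’ is to train a style encoder S that maps a generated image back to the noise that produced it, by minimizing the squared error between z and S(G(z, t)). The sketch assumes this reconstruction loss (my reading of the paper; it is not written on the slides) and uses toy stand-ins for G and S.

```python
def toy_generator(z, t):
    """Stand-in generator: the 'image' is the noise followed by the text embedding."""
    return list(z) + list(t)

def toy_style_encoder(image, z_dim):
    """Stand-in style encoder S: reads the style part back out of the toy image."""
    return image[:z_dim]

def style_loss(z, z_hat):
    """Squared L2 distance between the true noise and the recovered style."""
    return sum((a - b) ** 2 for a, b in zip(z, z_hat))

z = [0.5, -0.5]
t_query = [1.0, 0.0]  # toy embedding of the text that came with the query image
t_new = [0.0, 1.0]    # toy embedding of a different text

s = toy_style_encoder(toy_generator(z, t_query), len(z))
print(style_loss(z, s))  # 0.0

# Style transfer: keep the recovered style, change the content text.
transferred = toy_generator(s, t_new)
```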
![Page 28: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/28.jpg)
Qualitative Results - Flowers
![Page 29: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/29.jpg)
Qualitative Results - MSCOCO
![Page 30: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/30.jpg)
Conclusions
[Diagram: the discriminator receives f(x,t) and f(x’,t) and outputs 1/0; x~t.]
![Page 31: Generative adversarial text to image synthesis](https://reader033.vdocuments.us/reader033/viewer/2022042611/5879efb51a28ab70298b46f1/html5/thumbnails/31.jpg)