Generative Adversarial Text to Image Synthesis

Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran

[GitHub] [Arxiv]

Slides by Víctor Garcia [GDoc]

Computer Vision Reading Group (30/09/2016)

Index

● Introduction
● State of the Art
● Method
  ○ Network Architecture
  ○ Losses
● Experiments
  ○ Qualitative Results
  ○ Sentence interpolation
  ○ Style Transfer
● Conclusions

Introduction

Text → Image

Discriminator D(·)

Generator G(z)

Real image: x ~ q(x)
Generated image: x' = G(z), from noise z

Discriminator objective:
MAX → E[log(D(X))] + E[log(1 - D(G(Z)))]

Generator objective:
MIN → E[log(1 - D(G(Z)))]
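
The two objectives above can be sketched numerically. This is a minimal illustration in plain Python, not the authors' code: the function names and the sample probabilities are hypothetical, and the expectations are approximated by averages over a small batch of discriminator outputs.

```python
import math

def discriminator_objective(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    The discriminator tries to MAXIMIZE this value.
    d_real: discriminator outputs D(x) on real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake):
    """Batch estimate of E[log(1 - D(G(z)))].

    The generator tries to MINIMIZE this value, i.e. push D(G(z)) towards 1.
    """
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs, for illustration only.
d_real = [0.9, 0.8]  # D(x) on real images
d_fake = [0.2, 0.1]  # D(G(z)) on generated images

print(discriminator_objective(d_real, d_fake))
print(generator_objective(d_fake))
```

A discriminator that separates real from generated images well drives its objective towards 0 (its maximum), while a generator that fools the discriminator drives its own objective towards large negative values.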

GANs with Joint Distributions

How do we generate the image from text?

Discriminator

f(x,t) f(x’,t)

GANs with Joint Distributions

Discriminator

Real Image

Gen. Image

Generator + Text

Text Embedding

In order to represent the text as a vector...

This is the recurrent text encoder

Network Architecture

Losses - CLS

Does the real image match the text content? The discriminator is also shown real images paired with unmatched text.

log(D(x,t))             True Image + True Text
log(1-D(G(z,t)))        Fake Image + True Text
log(1-D(xi,tj)), i≠j    True Image (i) + True Text (j), unmatched
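
The three CLS terms can be combined into a single discriminator objective. The sketch below follows the slide's labels (true image with true text, fake image with true text, true image with unmatched text). The function names are hypothetical, and averaging the two "negative" terms with a factor of 0.5 is an assumption not shown on the slide.

```python
import math

def cls_discriminator_objective(s_real, s_wrong, s_fake):
    """Matching-aware objective for one example; the discriminator maximizes it.

    s_real:  score for a real image with its matching text
    s_wrong: score for a real image with unmatched text
    s_fake:  score for a generated image with the text it was generated from
    All scores are probabilities in (0, 1).
    """
    # Assumption: the two "negative" terms are averaged so that they carry
    # the same total weight as the single "positive" term.
    return math.log(s_real) + 0.5 * (
        math.log(1.0 - s_wrong) + math.log(1.0 - s_fake)
    )

def cls_generator_objective(s_fake):
    """The generator minimizes log(1 - score of its generated image)."""
    return math.log(1.0 - s_fake)

# Hypothetical scores, for illustration only.
print(cls_discriminator_objective(s_real=0.9, s_wrong=0.2, s_fake=0.1))
print(cls_generator_objective(s_fake=0.1))
```

The extra "unmatched" term gives the discriminator a signal about whether an image fits its description, separately from whether the image looks realistic.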

Losses - INT

The generator is also trained on interpolations between pairs of text embedding vectors (t1, t2).

This way the generator learns to fill the gaps on the data manifold.
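
Interpolating between two text embeddings is a simple convex combination. The sketch below is illustrative only: the function name, the toy three-dimensional vectors and the default weight beta = 0.5 are assumptions, not values taken from the slides.

```python
def interpolate_embeddings(t1, t2, beta=0.5):
    """Return beta * t1 + (1 - beta) * t2, element by element.

    t1, t2: text embedding vectors of equal length (lists of floats)
    beta:   interpolation weight in [0, 1]; 0.5 (an assumed default)
            gives the midpoint of the two embeddings
    """
    if len(t1) != len(t2):
        raise ValueError("embeddings must have the same length")
    return [beta * a + (1.0 - beta) * b for a, b in zip(t1, t2)]

# Toy embeddings, for illustration only.
t1 = [1.0, 0.0, 2.0]
t2 = [3.0, 4.0, 0.0]
t_mid = interpolate_embeddings(t1, t2)
print(t_mid)  # [2.0, 2.0, 1.0]
```

The interpolated vector corresponds to no real caption, so it needs no extra labelled data: the generator can be conditioned on it and trained against the discriminator in the same way as for real text embeddings.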

Qualitative Results - Birds

Sentence Interpolation

[Figure: sentence interpolation panels labelled +Text1, +Text2, +Text3, +Text4]

Disentangling style and content

Generator.

If ‘text’ describes the content, what does ‘z’ describe?

Style → pose, background, and so on. Let’s extract ‘z’.

z0 z1 z2 z3 z4 z5

Qualitative Results - Flowers

Qualitative Results - MSCOCO

Conclusions

Discriminator

f(x,t) f(x’,t)
