Generative Adversarial Text to Image Synthesis

Post on 14-Jan-2017


Index
● Introduction
● State of the Art
● Method
  ○ Network Architecture
  ○ Losses
● Experiments
  ○ Qualitative Results
  ○ Sentence interpolation
  ○ Style Transfer
● Conclusions

Introduction

Text → Image

GANs

State of the Art

GANs

The Discriminator D(·) receives true samples x from the World, distributed as q(x), and fake samples x’ = G(z) that the Generator produces from noise z. It outputs 1 for true and 0 for fake.

Discriminator: MAX → E[log(D(X))] + E[log(1 - D(G(Z)))]

Generator: MIN → E[log(1 - D(G(Z)))]
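The two GAN objectives can be checked numerically on a handful of scores. The following is a minimal sketch using only the Python standard library, not the presenters' code; the function names and the sample scores are assumptions made for the illustration.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def discriminator_objective(d_real, d_fake):
    """Value the discriminator maximizes: E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: D(x) scores in (0, 1) for true samples (made-up inputs here).
    d_fake: D(G(z)) scores in (0, 1) for generated samples.
    """
    return mean(math.log(d) for d in d_real) + mean(math.log(1.0 - d) for d in d_fake)

def generator_objective(d_fake):
    """Value the generator minimizes: E[log(1 - D(G(z)))]."""
    return mean(math.log(1.0 - d) for d in d_fake)

# A discriminator that separates true from fake well...
good_d = discriminator_objective([0.9, 0.8], [0.1, 0.2])
# ...and one that outputs 0.5 everywhere and cannot tell them apart.
blind_d = discriminator_objective([0.5, 0.5], [0.5, 0.5])
print(good_d > blind_d)

# The generator's objective drops when its samples fool the discriminator.
print(generator_objective([0.9]) < generator_objective([0.1]))
```

Both comparisons hold because the logarithm rewards confident correct scores for the discriminator, while the generator gains when D(G(z)) moves toward 1.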

GANs with Joint Distributions

How do we generate the image from text?

Both networks receive the text: the Generator produces an image from the text, and the Discriminator outputs 1/0 for image and text pairs.

● Real Image + Text: f(x,t)
● Gen. Image + Text: f(x’,t)
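To make the pairing concrete, here is a small sketch of how the two kinds of (image, text) pairs seen by the discriminator could be assembled. `toy_generator` and `make_discriminator_pairs` are hypothetical names, the vectors are made up, and a real generator would be a deep network rather than an element-wise sum.

```python
import random

def toy_generator(z, t):
    # Hypothetical stand-in for G(z, t): any function of noise and text.
    return [zi + ti for zi, ti in zip(z, t)]

def make_discriminator_pairs(real_images, text_embeddings, rng):
    """Return (image, text, label) triples: label 1 for real, 0 for generated."""
    pairs = []
    for x, t in zip(real_images, text_embeddings):
        z = [rng.gauss(0.0, 1.0) for _ in t]       # noise, same length as t
        pairs.append((x, t, 1))                     # real image + text
        pairs.append((toy_generator(z, t), t, 0))   # generated image + same text
    return pairs

rng = random.Random(0)
pairs = make_discriminator_pairs([[0.2, 0.7]], [[1.0, -1.0]], rng)
for image, text, label in pairs:
    print(label, text)
```

The point of the sketch is only that the text accompanies the image on both branches, so the discriminator scores the pair rather than the image alone.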

Text Embedding

In order to represent the text in a vector, the slide gives an objective (MIN ..., WHERE ...) and marks one of its terms as the recurrent text encoder. The equation itself is not preserved in this transcript.
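For reference, and assuming the slide reproduced the structured joint embedding objective from the paper this talk presents (Reed et al.), the MIN / WHERE pair would read roughly as below. Here v_n are images, t_n text descriptions, y_n class labels, Δ the 0-1 loss, φ the image encoder and ϕ the (recurrent) text encoder; this is a reconstruction from the paper, not from the slide.

```latex
\min \; \frac{1}{N}\sum_{n=1}^{N}
  \Delta\bigl(y_n, f_v(v_n)\bigr) + \Delta\bigl(y_n, f_t(t_n)\bigr)
\qquad \text{where} \qquad
f_v(v) = \arg\max_{y \in \mathcal{Y}}
  \mathbb{E}_{t \sim \mathcal{T}(y)}\bigl[\phi(v)^{\top}\varphi(t)\bigr],
\quad
f_t(t) = \arg\max_{y \in \mathcal{Y}}
  \mathbb{E}_{v \sim \mathcal{V}(y)}\bigl[\phi(v)^{\top}\varphi(t)\bigr]
```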

Method

Network Architecture

Losses - CLS

Do real images match the text content? The discriminator must also reject real images paired with the wrong text, so a third term is added.

● log(D(x,t)): True Image + True Text
● log(1-D(G(z,t))): Fake Image + True Text
● log(1-D(xi,tj)): True Image (i) + True Text (j), Unmatched (i ≠ j)
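The three CLS terms can be combined into a single discriminator objective. This is a minimal sketch with made-up scores; the function name is hypothetical, and the 1/2 weighting of the two negative terms is an assumption taken from the paper's training algorithm, since the slide only lists the terms.

```python
import math

def cls_discriminator_objective(s_real, s_wrong, s_fake):
    """Matching-aware discriminator objective (to be maximized).

    s_real:  D(x, t)        real image, matching text
    s_wrong: D(x_i, t_j)    real image, mismatched text (i != j)
    s_fake:  D(G(z, t), t)  generated image, matching text
    The 0.5 weight on the two negative terms is an assumption (see lead-in).
    """
    return math.log(s_real) + 0.5 * (math.log(1.0 - s_wrong) + math.log(1.0 - s_fake))

# A discriminator that rejects mismatched pairs...
aware = cls_discriminator_objective(s_real=0.9, s_wrong=0.1, s_fake=0.1)
# ...versus one that accepts any real image regardless of the text.
text_blind = cls_discriminator_objective(s_real=0.9, s_wrong=0.9, s_fake=0.1)
print(aware > text_blind)
```

A discriminator that ignores the text is penalized through the mismatched-pair term, which is what pushes it to check image and text agreement.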

Losses - INT

The authors also train on interpolations between different text embedding vectors (t1~t2), so the generator learns to fill the gaps on the data manifold.
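The interpolation itself is a convex combination of two embeddings. A minimal sketch, assuming the mix is beta * t1 + (1 - beta) * t2 with beta = 0.5 as the default (the value reported in the paper); the function name and the example vectors are made up.

```python
def interpolate(t1, t2, beta=0.5):
    """Return beta * t1 + (1 - beta) * t2, element by element."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(t1, t2)]

t1 = [1.0, 0.0]   # made-up embedding of one caption
t2 = [0.0, 1.0]   # made-up embedding of another caption
t_mix = interpolate(t1, t2)
print(t_mix)
```

The mixed embedding has no caption or image of its own; the generator is still asked to fool the discriminator when conditioned on it.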

Experiments

Qualitative Results - Birds

Sentence Interpolation

Each row keeps the noise fixed and interpolates between two sentences.

● Gen. (z0 + Text1) → Gen. (z0 + Text2)
● Gen. (z1 + Text3) → Gen. (z1 + Text4)
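One row of such an experiment can be sketched as follows: the noise z0 stays fixed while the text embedding moves in equal steps from one sentence to the other. `interpolation_path` and `toy_generator` are hypothetical names, the vectors are made up, and the toy generator only stands in for the trained network.

```python
def interpolation_path(t_start, t_end, steps):
    """Embeddings evenly spaced from t_start to t_end, both included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for k in range(steps):
        w = k / (steps - 1)  # 0.0 at t_start, 1.0 at t_end
        path.append([(1.0 - w) * a + w * b for a, b in zip(t_start, t_end)])
    return path

def toy_generator(z, t):
    # Hypothetical stand-in for the trained generator G(z, t).
    return [zi + ti for zi, ti in zip(z, t)]

z0 = [0.3, -0.2]                        # noise held fixed along the row
text1, text2 = [1.0, 0.0], [0.0, 1.0]   # made-up sentence embeddings
path = interpolation_path(text1, text2, steps=5)
row = [toy_generator(z0, t) for t in path]
print(len(row))
```

Because z0 never changes, any variation along the row comes from the text alone.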

Disentangling style and content

Generator: z + Text

If ‘text’ is describing the content, what is ‘z’ describing?

Style → Pose, Background…, let’s extract ‘z’

Styles shown: z0 z1 z2 z3 z4 z5
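The slides do not say how ‘z’ is extracted. Assuming they follow the paper, a style encoder S is trained to invert the generator with a squared loss, and style transfer then feeds the extracted style of a query image x back into the generator together with a new text t. Here ϕ(t) is the text embedding; the notation is taken from the paper, not from the slide.

```latex
\mathcal{L}_{\text{style}} =
  \mathbb{E}_{t,\; z \sim \mathcal{N}(0,1)}
  \bigl\lVert z - S\bigl(G(z, \varphi(t))\bigr) \bigr\rVert_2^2,
\qquad
s \leftarrow S(x), \quad \hat{x} \leftarrow G\bigl(s, \varphi(t)\bigr)
```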

Qualitative Results - Flowers

Qualitative Results - MSCOCO

Conclusions

Diagram: the Discriminator receives f(x,t) and f(x’,t) and outputs 1/0, with x~t.
