DRAW - Cleveland State University, eecs.csuohio.edu/~sschung/CIS601/BrandonMarlowe...
DRAW: A Recurrent Neural Network for Image Generation
Brandon Marlowe - 2693414
CIS 601 - Spring '18
Agenda
● What is a Neural Network? (VERY briefly)
● DRAW (Deep Recurrent Attentive Writer) Overview
● Why DRAW matters
● DRAW...ing in Detail
● Experimentation and Results
What is a Neural Network?
● Statistical learning model inspired by the structure of the human brain
● Composed of “Neurons” (AKA, nodes)
● Consists of three main parts:
○ Input Layer
○ Hidden Layer
○ Output Layer
Extremely Simple Example: Feedforward Neural Network (Computes the XOR Function)
Inputs of [1, 1] passed into the Neural Network
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
Random weights are assigned to each Synapse in all layers
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
The weighted inputs arriving at each Neuron are summed
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
Activation function (Sigmoid in this case) is applied to each of the weighted sums
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
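The forward pass on the last few slides (random weights, weighted sums, then an activation) can be sketched in a few lines of NumPy. The 2-3-1 layout and the random weights below are illustrative, not the exact values from the slides:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative random weights for a 2-3-1 XOR network
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(2, 3))   # input -> hidden synapses
W_output = rng.normal(size=(3, 1))   # hidden -> output synapses

x = np.array([1.0, 1.0])             # inputs [1, 1]
hidden = sigmoid(x @ W_hidden)       # weighted sums, then activation
output = sigmoid(hidden @ W_output)  # untrained guess for XOR(1, 1)
print(output)
```

Before training, this output is essentially random; training adjusts the weights until inputs like [1, 1] map close to the target 0.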
Example Activation Functions
Image: Özkan, C., & Erbek, F. S. (2003). The Comparison of Activation Functions for Multispectral Landsat TM Image Classification. Photogrammetric Engineering & Remote Sensing, 69(11), 1225-1234. doi:10.14358/pers.69.11.1225
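Three commonly used activation functions can be written directly; the exact set compared in the cited paper may differ from these:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # clips negatives to 0

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```

The choice matters because the activation's output range and derivative shape both affect how the weight updates behave during training.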
Hidden layer values are multiplied by the output-layer weights and summed
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
Error = target - calculated = -0.77
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
The derivative of the activation function is used to adjust weights and the process is repeated
Image: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
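Using the slide's error of -0.77, one weight update for a sigmoid output neuron might look like the following; the hidden activations and learning rate are made-up values for illustration:

```python
import numpy as np

def sigmoid_derivative(a):
    # Derivative of the sigmoid, written in terms of the activation a
    return a * (1.0 - a)

target, calculated = 0.0, 0.77
error = target - calculated                     # -0.77, as on the slide
delta = error * sigmoid_derivative(calculated)  # scale error by the slope

hidden_activations = np.array([0.6, 0.7, 0.5])  # hypothetical values
learning_rate = 0.5
weight_update = learning_rate * delta * hidden_activations
```

Repeating this update over many examples is what "the process is repeated" means: each pass nudges the weights in the direction that shrinks the error.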
Recurrent Neural Networks (RNNs) vs. Feedforward Neural Networks (FNNs)
● RNNs are similar to FNNs
○ Main difference: RNNs are aware of previous inputs; FNNs are not
● RNNs can be thought of as multiple FNNs chained together, one per time step
Image: https://image.slidesharecdn.com/mdrnn-yandexmoscowcv-160427182305/95/multidimensional-rnn-4-638.jpg?cb=1461781453
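The "aware of previous inputs" idea boils down to feeding the hidden state back in at every step. A minimal recurrent step, with arbitrary dimensions and weights chosen only for illustration:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state mixes the current input with the previous
    # state; this feedback loop is what a feedforward network lacks
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(1)
W_xh = 0.1 * rng.normal(size=(4, 8))  # input -> hidden
W_hh = 0.1 * rng.normal(size=(8, 8))  # hidden -> hidden (the recurrence)
b_h = np.zeros(8)

h = np.zeros(8)                       # initial hidden state
for x_t in rng.normal(size=(5, 4)):   # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```

Unrolling this loop over time gives exactly the "multiple FNNs" picture: one copy of the network per time step, all sharing the same weights.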
DRAW Overview
DRAW
● DRAW = Deep Recurrent Attentive Writer
○ Composed of two Long Short-Term Memory (LSTM) Recurrent Neural Networks
■ Encoder RNN: compresses images
■ Decoder RNN: reconstitutes images
○ Long Short-Term Memory architecture composed of:
■ Read Gate, Write Gate, Keep/Forget Gate
● Not the first image-generation Neural Network
● Belongs to the family of Variational Autoencoders
● Mimics the behavior of the human eye
● Creates portions of scenes independently and iteratively refines them
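The Read/Write/Keep-Forget gates named above correspond to the input, output, and forget gates of a standard LSTM cell. A minimal sketch; the fused weight layout and gate ordering here are one common convention, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b, n):
    # One fused matrix produces all three gates plus the candidate value
    z = np.concatenate([x, h_prev]) @ W + b
    read_gate  = sigmoid(z[0:n])      # input gate: what to write to memory
    keep_gate  = sigmoid(z[n:2*n])    # keep/forget gate: what memory survives
    write_gate = sigmoid(z[2*n:3*n])  # output gate: what to expose
    candidate  = np.tanh(z[3*n:4*n])
    c = keep_gate * c_prev + read_gate * candidate  # updated cell memory
    h = write_gate * np.tanh(c)                     # exposed hidden state
    return h, c

n = 4
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(2 * n, 4 * n))  # x and h_prev are both size n
b = np.zeros(4 * n)
h, c = lstm_step(np.ones(n), np.zeros(n), np.zeros(n), W, b, n)
```

The gated cell memory c is what lets the encoder and decoder RNNs carry information across the many drawing steps.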
Why DRAW Matters
● Previous Autoencoders created images in a single pass
○ Accuracy suffered
○ Details were missed
○ Complex images posed problems
○ Could not create natural-looking images
● DRAW creates images iteratively
○ Generates complex images that cannot be distinguished from real data with the naked eye
○ Gradually refines each portion of the image
○ Substantially improves on state-of-the-art image generation models
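The iterative idea can be sketched as a running canvas that the decoder adds to at every step. Here write_patch is a stand-in for the real decoder-plus-attention output, not DRAW's actual write operation:

```python
import numpy as np

def write_patch(canvas_shape, rng):
    # Stand-in for the decoder's "write": a small additive update
    return 0.1 * rng.normal(size=canvas_shape)

rng = np.random.default_rng(0)
canvas = np.zeros((28, 28))          # start from a blank canvas
for t in range(10):                  # T refinement steps
    canvas = canvas + write_patch(canvas.shape, rng)

# Final pixel intensities come from squashing the accumulated canvas
image = 1.0 / (1.0 + np.exp(-canvas))
```

This additive-canvas structure is why the single-pass problems above go away: no individual step has to get the whole image right.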
Structure of DRAW in Detail
Conventional Auto-Encoder vs. DRAW
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
DRAW...ing with Attention to Detail
● Read Gate places an N x N grid of Gaussian Filters on the image, centered at the grid center (gx, gy)
● δ = “stride” or “zoom” of the attention patch
○ A large stride means more of the image is visible to the attention model
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
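The grid of Gaussian filters can be built explicitly. This sketch follows the paper's filterbank idea (grid center g, stride δ, filter width σ); the specific values below are arbitrary:

```python
import numpy as np

def filterbank(g, delta, sigma, N, size):
    # N Gaussian filter centers spaced delta apart around the center g
    mu = g + (np.arange(N) - (N - 1) / 2.0) * delta
    a = np.arange(size)
    F = np.exp(-((a[None, :] - mu[:, None]) ** 2) / (2.0 * sigma ** 2))
    return F / (F.sum(axis=1, keepdims=True) + 1e-8)  # normalize each filter

# Reading an N x N attention patch from an H x W image: F_y @ image @ F_x.T
N, H, W = 5, 28, 28
F_x = filterbank(g=14.0, delta=2.0, sigma=1.0, N=N, size=W)
F_y = filterbank(g=14.0, delta=2.0, sigma=1.0, N=N, size=H)
image = np.ones((H, W))
patch = F_y @ image @ F_x.T  # the N x N "glimpse" the encoder sees
```

Increasing delta spreads the filter centers out, so the same N x N patch summarizes a larger area of the image, matching the "large stride" bullet above.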
DRAW...ing with Attention to Detail
● Write Gate extracts the previous attention parameters and inverts them
● The inversion alternates focus between highly detailed and broad views of the image
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
DRAW...ing with Attention to Detail
DRAW recreating images from MNIST dataset
Image: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
Experimentation
● Three sets of training data were used:
○ MNIST (Modified National Institute of Standards and Technology database)
■ Database of handwritten digits
○ SVHN (Street View House Numbers)
■ Database of images containing house numbers
○ CIFAR-10 (Canadian Institute For Advanced Research, 10 classes)
■ Database containing 10 classes of vehicles and animals
● The experiment consisted of:
○ Classifying MNIST images
○ Generating MNIST images
○ Generating SVHN images
○ Generating CIFAR-10 images
Classifying MNIST
● MNIST 100 x 100 Clutter Classification
○ 100 x 100 pixel images contained digit-like fragments
○ DRAW was tasked with identifying digits
○ The model was given a fixed number of “glimpses”
■ Each glimpse is 12 x 12 pixels in size
○ DRAW compared with RAM (Recurrent Attention Model)
■ DRAW uses ¼ of the attention patches RAM uses
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
Generating MNIST
● DRAW tasked with generating MNIST-like digits
○ MNIST is widely used, allowing DRAW to be easily compared with other models
● Trained on the MNIST dataset
● Performance with vs. without selective attention was also compared
All images generated by DRAW (except the rightmost column, which shows training-set images)
Negative log-likelihood (lower is better)
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
Generating SVHN
● DRAW trained on 64 x 64 pixel images of house numbers
● 231,053 images in dataset
● 4,701 validation images
Sequence of drawing SVHN digits
All images generated by DRAW (except the rightmost column, which shows training-set images)
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
Generating CIFAR-10
● DRAW trained on 50,000 images
○ A small training sample considering the diversity of the images
● Still able to capture a good portion of the detail
All images generated by DRAW (except the rightmost column, which shows training-set images)
Images: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
DRAW in Action
Image: Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
Sources
● Karol Gregor, Ivo Danihelka, Alex Graves, Daan Wierstra (2015). DRAW: A Recurrent Neural Network For Image Generation. CoRR, abs/1502.04623.
● Özkan, C., & Erbek, F. S. (2003). The Comparison of Activation Functions for Multispectral Landsat TM Image Classification. Photogrammetric Engineering & Remote Sensing, 69(11), 1225-1234. doi:10.14358/pers.69.11.1225
● https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/