fsan/eleg815: statistical learningarce/files/courses/stat...7/38 xii: convolutional neural networks...

39
FSAN/ELEG815: Statistical Learning Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware XII: Convolutional Neural Networks

Upload: others

Post on 18-Jan-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

FSAN/ELEG815: Statistical Learning

Gonzalo R. ArceDepartment of Electrical and Computer Engineering

University of Delaware

XII: Convolutional Neural Networks

Page 2: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

1/38

XII: Convolutional Neural Networks FSAN/ELEG815

Outline

I Convolutional Neural Networks Overview

I Applications: Style Transfer

Page 3: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

2/38

XII: Convolutional Neural Networks FSAN/ELEG815

Neural Networks ArchitecturesI Consider several outputs. Linear score function: h = Wx

I 2-Layer Neural Network: s = W2 θ(W1x)

Map the raw image pixels to class scores. Classification based on the score.

Page 4: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

3/38

XII: Convolutional Neural Networks FSAN/ELEG815

Neural Networks Architectures

4+2 = 6 neurons.[3×4]+ [4×2] = 20 weights4+2 = 6 biases.

4+4+1 = 9 neurons.[3×4]+ [4×4]+ [4×1] = 32 weights4+4+1 = 9 biases.

Page 5: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

4/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional Neural Networks ArchitecturesI Very similar to ordinary Neural Networks.

I Add convolutional layers. Neurons with 3 dimensions: width, height anddepth.

I Inputs are also volumes.

Page 6: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

5/38

XII: Convolutional Neural Networks FSAN/ELEG815

Neural Network - Fully Connected (FC) Layer

Consider a 32×32×3 image → stretch to 3072×1

Each output is the result of a dot product between a row of W and the inputx. 10 neurons outputs.

Page 7: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

6/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional LayerConsider a 32×32×3 image → preserve spatial structure.

Page 8: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

7/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional Layer

Result: dot product between the filterand a small 5×5×3 chunk of theimage.

Volume convolution at (x,y), for allmaps of the input volume:

convx,y =∑

i

wivi

where ws are kernel weights, vs chuckof the image.Adding scalar bias b:

zx,y =∑

i

wivi + b

Page 9: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

8/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional Layer

Page 10: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

9/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional LayerConsider a second, green filter:

Page 11: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

10/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional LayerConsider 6 filters (5×5), we get 6 separate activation maps:

We stack these up to get a “new image volume” of size 28×28×6

Page 12: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

11/38

XII: Convolutional Neural Networks FSAN/ELEG815

Activation FunctionsPass every element of each activation map through a nonlinearity:

Page 13: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

12/38

XII: Convolutional Neural Networks FSAN/ELEG815

ConvNet is a sequence of Convolutional Layers, interspersed with activationfunctions:

Notice how the activation maps get smaller, this can be solved by zeropadding.

Page 14: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

13/38

XII: Convolutional Neural Networks FSAN/ELEG815

InterpretationFilters Learned:

Page 15: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

14/38

XII: Convolutional Neural Networks FSAN/ELEG815

Interpretation

Page 16: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

15/38

XII: Convolutional Neural Networks FSAN/ELEG815

Pooling LayerI Makes the representations smaller and more manageable.I Operates over each activation map independently.I Neighborhood of 2×2 is replaced by the average.

Page 17: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

16/38

XII: Convolutional Neural Networks FSAN/ELEG815

Max PoolingI Neighborhood of 2×2 is replaced by the maximum value.I Effective in classifying large image databases.I Simple and fast.

I L2 pooling is also used. Neighborhood of 2×2 is replaced by the squaredroot of the sum of their squared values.

Page 18: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

17/38

XII: Convolutional Neural Networks FSAN/ELEG815

Example - Image classification

Page 19: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

18/38

XII: Convolutional Neural Networks FSAN/ELEG815

Convolutional Neural Networks Complete Scheme

I 277×277 pixelsRGB image.

I 96 feature maps.I 96 kernels volumes

of size 11×11×3I This weights came

from AlexNet:CNN trained usingmore than 1million imagesbelonging to 1,000object categories.

Page 20: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

19/38

XII: Convolutional Neural Networks FSAN/ELEG815

Result Feature Maps

(4) and (35) emphasize edge content. (23) is a blurred version of the input.(10) and (16) capture complementary shades of gray (hair). (39) emphasizeseyes and dress (blue). (45) blue and red tones (lips, hair, skin).

Page 21: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

20/38

XII: Convolutional Neural Networks FSAN/ELEG815

Example - Handwritten Numerals Classification

I Training: 60,000grayscale images.

I Testing: 10,000grayscale images.

I Network trainedfor 200 epochs.

I Performance:99.4% in trainingset.

I Performance:99.1% in testingset.

Page 22: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

21/38

XII: Convolutional Neural Networks FSAN/ELEG815

Example - Handwritten Numerals Classification

I First stage: 6features maps.

I Second stage: 12features maps.

I Kernels of size5×5.

I Fully ConnectedLayer withouthidden layers.

Page 23: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

22/38

XII: Convolutional Neural Networks FSAN/ELEG815

Remember: Networks with many layers - Exampleφi is feature function which computes the presence (+1) and absence (−1) of

the corresponding feature.

If we feed in ‘1’, φ1,φ2, φ3 compute +1 and φ4,φ5, φ6 compute −1.Combining with the signs of the weights, z1 will be positive and z5 will be

negative.

Page 24: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

23/38

XII: Convolutional Neural Networks FSAN/ELEG815

Features Map InterpretationI First feature map:

strong verticalcomponents on theleft.

I Second: strongcomponents in thenorthwest area ofthe top of thecharacter and theleft vertical lowerarea.

I Third: stronghorizontalcomponents.

Page 25: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

24/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style TransferI Goal: Rendering the semantic content of an image in different styles.I Challenge: separate image content from style.

A Neural Algorithm of Artistic Style can separate and recombine the imagecontent and style of natural images.

Page 26: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

25/38

XII: Convolutional Neural Networks FSAN/ELEG815

Deep Image RepresentationsVGG-19 is a convolutional neural network that is trained on more than amillion images from the ImageNet database to perform object recognition(1000 categories) and localization.

Page 27: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

26/38

XII: Convolutional Neural Networks FSAN/ELEG815

Content RepresentationResponses in a layer l are stored in a matrix F l ∈ RNl×Ml where Nl is thenumber of filters and Ml is the height times the width of the feature map.

F lij is the activation of the ith filter at position j in layer l.

Page 28: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

27/38

XII: Convolutional Neural Networks FSAN/ELEG815

Visualize Image Information at each Layer

Perform gradient descent on a white noise image to obtain a reconstructedimage ~x with the information encoded at different layers.Minimize the loss function:

Lcontent(~p,~x, l) = 12

∑i,j

(F lij−P l

ij)2

where F lij and P l

ij are the feature representations of the original image ~p andthe reconstructed image ~x in layer l.

~x(t+1) = ~x(t)−λ∂Lcontent∂~x

The gradient with respect to the image ~x can be computed using standarderror back-propagation.

Page 29: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

28/38

XII: Convolutional Neural Networks FSAN/ELEG815

Content Representation Results

Reconstruction of the input image from layers (a) conv1_2 (b) conv2_2(c) conv3_2 (d) conv4_2 (e) conv5_2 of the original VGG-Network.

Page 30: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

29/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style RepresentationUse a feature space designed to capture texture information: correlationbetween the different filter responses.

Page 31: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

30/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style Representation

Feature correlations are given by the Gram matrix Gl ∈ RNl×Nl . Expectationtaken over the spatial extent of the features maps.

Glij =

∑k

F likF

ljk

Glij is the inner product between the vectorized feature maps i and j in layer l.

Perform gradient descent on a white noise image to observe the informationcaptured by these style feature spaces.

Page 32: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

31/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style Representation

Page 33: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

32/38

XII: Convolutional Neural Networks FSAN/ELEG815

Visualize Image Style at each LayerI Minimize the distance between the Gram matrices. The contribution of

layer l to the total loss is:

El = 14N2

l M2l

∑i,j

(Glij−Al

ij)2

where Glij and Al

ij are the style representation of the original image ~a andthe generated image ~x in layer l.

I The total loss is:Lstyle(~a,~x) =

L∑l=0

wlEl

wl are weighting factors (parameters).

~x(t+1) = ~x(t)−λ∂Lstyle∂~x

The gradient with respect to the image ~x can be computed using standarderror back-propagation.

Page 34: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

33/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style Representation Results

Style Reconstructions from layers (a) conv1_1, (b) conv1_1 and conv2_1(c) conv1_1, conv2_1 and conv3_1 (d) conv1_1, conv2_1, conv3_1 andconv4_1 (e) conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1 of the

original VGG-Network.

Page 35: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

34/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style Transfer

The total loss is a linear combination between the content and the style loss:

Ltotal = αLcontent +βLstyle

Its derivative with respect to the pixel values can be computed using errorback-propagation.

~x(t+1) = ~x(t)−λ∂Ltotal∂~x

Page 36: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

35/38

XII: Convolutional Neural Networks FSAN/ELEG815

Style Transfer

Page 37: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

36/38

XII: Convolutional Neural Networks FSAN/ELEG815

Results

Page 38: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

37/38

XII: Convolutional Neural Networks FSAN/ELEG815

ResultsStyle Image Content Image

Page 39: FSAN/ELEG815: Statistical Learningarce/files/Courses/Stat...7/38 XII: Convolutional Neural Networks FSAN/ELEG815 ConvolutionalLayer Result: dotproductbetweenthefilter andasmall5×5×3

38/38

XII: Convolutional Neural Networks FSAN/ELEG815

ResultsStyle Image Content Image