deep learning for computer vision - executiveml

Deep Learning for Computer VisionExecutive-ML 2017/09/21

Neither Proprietary nor Confidential – Please Distribute ;)

Alex Conway

alex @ numberboost.com@alxcnwy

Hands up!

Check out theDeep Learning Indaba

videos & practicals!

http://www.deeplearningindaba.com/videos.htmlhttp://www.deeplearningindaba.com/practicals.html

Deep Learning is Sexy (for a reason!)

4

Image Classification

5

http://yann.lecun.com/exdb/mnist/

https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py(99.25% test accuracy in 192 seconds and 70 lines of code)


6


7

https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html

Object detection

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

Image Captioning & Visual Attention

XXX

10

https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning

Image Q&A

11https://arxiv.org/pdf/1612.00837.pdf

Video Q&A

XXX

12

https://www.youtube.com/watch?v=UeheTiBJ0Io

Pix2Pix

https://affinelayer.com/pix2pix/https://github.com/affinelayer/pix2pix-tensorflow

13

Pix2Pix

https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66

14

15

Original inputRear Window (1954)

Pix2pix outputFully Automated

RemasteredPainstakingly by Hand

https://hackernoon.com/remastering-classic-films-in-tensorflow-with-pix2pix-f4d551fa0503

Style Transfer

https://github.com/junyanz/CycleGAN16

Style Transfer MAGIC

https://github.com/junyanz/CycleGAN17

1. What is a neural network?

2. What is a convolutional neural network?

3. How to use a convolutional neural network

4. More advanced Methods

5. Case studies & applications

18

Big Shout Outs

Jeremy Howard & Rachel Thomashttp://course.fast.ai

Andrej Karpathyhttp://cs231n.github.io

François Chollet (Keras lead dev)https://keras.io/

19

1.What is a neural network?

What is a neuron?

21

• 3 inputs [x1,x2,x3] • 3 weights [w1,w2,w3]• Element-wise multiply and sum • Apply activation function f• Often add a bias too (weight of 1) – not shown

What is an Activation Function?

22

Sigmoid Tanh ReLU

Nonlinearities … “squashing functions” … transform neuron’s outputNB: sigmoid output in [0,1]

What is a (Deep) Neural Network?

23

Inputs outputs

hiddenlayer 1

hiddenlayer 2

hiddenlayer 3

Outputs of one layer are inputs into the next layer

How does a neural network learn?

24

• You need labelled examples “training data”

• Initially, the network makes random predictions (weights initialized randomly)

• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (aka “loss function”)

• Use ‘backpropagation’ (really just the chain rule), to update the network parameters (weights) in the opposite direction to the error

How does a neural network learn?

25

New weight = Old

weightLearning

rate-Gradient of weight with

respect to Error( )x“How much

error increaseswhen we increase

this weight”

Gradient Descent Interpretation

26

http://scs.ryerson.ca/~aharley/neural-networks/

http://playground.tensorflow.org

What is a Neural Network?

For much more detail, see:

1. Michael Nielson’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html

2. Anrej Karpathy’s CS231n Noteshttp://neuralnetworksanddeeplearning.com/chap1.html

28

2. What is a convolutional

neural network?

What is a Convolutional Neural Network?

30

“like a ordinary neural network but with special types of layers that work well on images”

(math works on numbers)

• Pixel = 3 colour channels (R, G, B)

• Pixel intensity ∈[0,255]• Image has width w and height h• Therefore image is w x h x 3 numbers

31This is VGGNet – don’t panic, we’ll break it down piece by piece

Example Architecture

32This is VGGNet – don’t panic, we’ll break it down piece by piece

Example Architecture

Convolutions

33

http://setosa.io/ev/image-kernels/

Convolutions

34http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

New Layer Type: Convolutional Layer

35

• 2-d weighted average when multiply kernel over pixel patches• We slide the kernel over all pixels of the image (handle borders)• Kernel starts off with “random” values and network updates (learns)

the kernel values (using backpropagation) to try minimize loss• Kernels shared across the whole image (parameter sharing)

Many Kernels = Many “Activation Maps” = Volume

36http://cs231n.github.io/convolutional-networks/

New Layer Type: Convolutional Layer

37

Convolutions

38

https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

Convolutions

39

Convolutions

40

Convolutions

41

Convolution Learn Heirarchial Features

42

Great vid

43

https://www.youtube.com/watch?v=AgkfIQ4IGaM

New Layer Type: Max Pooling

44


• Reduces dimensionality from one layer to next• …by replacing NxN sub-area with max value• Makes network “look” at larger areas of the image at a time• e.g. Instead of identifying fur, identify cat• Reduces overfitting since losing information helps the network generalize

45


46

Softmax

• Convert scores∈ℝ to probabilities∈ [0,1]

• Then predict the class with highest probability

47

Bringing it all together

48

Convolution + max pooling + fully connected + softmax


49



50



51



52



53


We need labelled training data!

ImageNet

55http://image-net.org/explore

1000 object categories

1.2 million training images

ImageNet

56

ImageNet

57

ImageNet Top 5 Error Rate

58

TraditionalImage Processing Methods

AlexNet8 Layers

ZFNet8 Layers

GoogLeNet22 Layers ResNet

152 Layers SENetEnsamble

TSNetEnsamble

3. How to use a convolutional neural

network

Using a Pre-Trained ImageNet-Winning CNN

60

• We’ve been looking at “VGGNet”

• Oxford Visual Geometry Group (VGG)

• ImageNet 2014 Runner-up

• Network is 16 layers (deep!)

• Easy to fine-tune

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Example: Classifying Product Images

61

https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

Classifying

products into

9 categories

62https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Start with Pre-Trained ImageNet Model

“Transfer Learning”is a game changer

Fine-tuning A CNN To Solve A New Problem• Cut off last layer of pre-trained Imagenet winning CNN• Keep learned network (convolutions) but replace final layer• Can learn to predict new (completely different) classes• Fine-tuning is re-training new final layer - learn for new task

64

Fine-tuning A CNN To Solve A New Problem

65

66

Before Fine-Tuning

67

After Fine-Tuning

Fine-tuning A CNN To Solve A New Problem

• Fix weights in convolutional layers (set trainable=False)

• Remove final dense layer that predicts 1000 ImageNet classes

• Replace with new dense layer to predict 9 categories

68

88% accuracy in under 2 minutes for classifying products into categories

Fine-tuning is awesome!Insert obligatory brain analogy

Visual Similarity

69

• Chop off last 2 VGG layers

• Use dense layer with 4096 activations

• Compute nearest neighbours in the space of these activations

https://memeburn.com/2017/06/spree-image-search/

70

https://github.com/alexcnwy/CTDL_CNN_TALK_20170620

71

Input Imagenot seen by model

ResultsTop 10 most “visually similar”

4. More Advanced Methods

Use a Better Architecture (or all of them!)

73

“Ensambles win”

learn a weighted average of many models’ predictions

cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

There are MANY Computer Vision Tasks

> Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015

> Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015

Semantic Segmentation

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Object detection

Object detection

https://www.youtube.com/watch?v=VOC3huqHrss

Object detection

http://blog.romanofoti.com/style_transfer/https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py

Style TransferLoss = content_loss + style_loss

Content loss = convolutions from pre-trained networkStyle loss = gram matrix from style image convolutions

Video Q&A

XXX

80


Video Q&A

XXX

81


5. Case Studies

Image & Video Moderation

TODO

83

Large international gay dating app with tens of millions of users uploading hundreds-of-thousands of photos per day

Estimating Accident Repair Cost from Photos

TODO

84

Prototype for large SA insurer

Detect car make & model from registration disk

Predict repair cost using learned model

Optical Sorting

TODO

85

https://www.youtube.com/watch?v=Xf7jaxwnyso

Segmenting Medical Images

TODO

86

m

Counting People

TODO

Count shoppers, segment on age & genderfacial recognition loyalty is next

Counting Cars

TODO

Wine A.I.

Detecting Potholes

GET IN TOUCH!

Alex Conway

alex @ numberboost.com@alxcnwy

http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/

Computer Vision Pipelines

https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html

https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/

Practical Tips• use a GPU – AWS p2 instances (use spot!)

• when overfitting (validation_accuracy <<< training_accuracy)– Increase dropout – Early stopping

• when underfitting (low training_accuracy)1. Add more data2. Use data augmentation

– Flipping / stretching / rotating– Slightly change hues / brightness

3. Use more complex model– Resnet / Inception / Densenet– Ensamble (average <<< weighted average with learned weights)

94

deep learning for computer vision - executiveml

Data & Analytics