Introduction to Deep Learning · Delta Course · 2017-06-19
TRANSCRIPT
Internet of Things Group
Anna Petrovicheva
IOTG Computer Vision
Introduction to Deep Learning
Agenda
1. Neural Networks overview
2. Math engine
3. Neural Network layers
4. Solving Computer Vision problems
5. How to train a network
Deep Learning systems in the real world
Image credit: DeepMind, Prisma, Yayvo, Google Translate, Redmond Pie, TechRepublic, Brit
Tesla autopilot
Image credit: Autopilot Full Self Driving Demonstration Nov 18 2016 Realtime Speed
Brief history
● 1965: first idea
● AI winter
● 1998: LeNet-5
● 2000s: “The biggest issue of this paper is that it relies on neural networks”
● 2012: groundbreaking results in ImageNet contest
○ Old algorithms
○ Big dataset
○ Compute power
● 2012-now: wide adoption
Artificial Neural Network
[Diagram: input-layer neurons v1, v2, v3 connected by parameters (weights) w1, w2, w3 to a new neuron v_new; the output layer predicts “dog”]
Training
Start: parameters are random
Goal: find good parameters W = (w1, w2, … , wm)
[Image: a training example labeled “cat”]
Finding parameters
● W = (w1, w2, … , wm): a point in multidimensional space
○ Modern nets: tens to hundreds of millions of parameters
● Use W in network → get corresponding prediction error
○ Wstart: high prediction error
○ Woptimal: low prediction error
● Goal: get from Wstart to Woptimal
[Plot: prediction-error surface over (w1, w2); Wstart lies high on the surface, Woptimal at the minimum]
Gradient descent
W1 = Wstart - α * F’(Wstart)
α - learning rate
Too small: long training
Too large: training diverges
[Plot: prediction-error surface over (w1, w2), showing one step from Wstart toward Woptimal producing W1]
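A minimal sketch of this update rule in numpy, assuming a toy error surface F(W) = w1^2 + w2^2 (the surface and starting point are illustrative, not from the slides):

```python
import numpy as np

# Toy error surface F(W) = w1^2 + w2^2; its gradient is 2 * W.
def grad_F(W):
    return 2 * W

W = np.array([4.0, -3.0])        # W_start: a random initial point
alpha = 0.1                      # learning rate

for step in range(50):
    W = W - alpha * grad_F(W)    # step against the gradient

print(W)                         # approaches W_optimal = (0, 0)
```

With alpha too small the loop needs many more iterations; on this particular surface any alpha above 1.0 makes the iterates diverge, matching the two failure modes above.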
Gradient descent
W1 = Wstart - α * F’(Wstart)
W2 = W1 - α * F’(W1)
W3 = W2 - α * F’(W2)
W4 = W3 - α * F’(W3)
W5 = W4 - α * F’(W4)
[Plot: successive steps W1, W2, … descending the prediction-error surface from Wstart toward Woptimal]
Non-convex task
● May get stuck in local minima
● Solution depends on initial point
State-of-the-art opinion:
● Local minima are not the biggest problem
● “Like a person driving a car in a really confusing city”
[Plot: prediction-error surface with a local minimum Wlocal next to the global minimum Woptimal]
Stochastic gradient descent
Gradient descent:
▪ Take all data points (= the whole dataset)
▪ Compute the parameter derivative at all points
▪ Make a step in this direction
Dataset is too big
▪ Too much time to compute
▪ Does not fit in main memory
Stochastic Gradient Descent:
▪ Use a random subset of data (a new one each iteration), as in the sketch below
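A minimal numpy sketch contrasting the two (the linear-regression dataset and hyperparameters are illustrative assumptions, not from the slides):

```python
import numpy as np

# Toy linear-regression problem standing in for a dataset that is
# "too big": full-batch gradients would touch all 10,000 rows each step.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.01 * rng.normal(size=10_000)

w = np.zeros(5)                  # W_start
alpha, batch = 0.1, 64

for it in range(500):
    # New random subset of the data on every iteration.
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch   # gradient on the subset only
    w -= alpha * grad

print(np.round(w, 2))            # close to true_w = [1, 2, 3, 4, 5]
```

Each step touches only `batch` rows, so it is cheap to compute and the whole dataset never has to sit in memory at once.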
Backpropagation algorithm
[Diagram: the forward pass sends an input image (“cat”) through layers with parameters w1, w2, w3 and produces error e; the backward pass propagates derivatives w’1, w’2, w’3 back through the layers to SGD, which emits the parameter update ΔW]
● The cost function estimates the prediction error
● Layers compute derivatives with respect to their parameters
● Parameter derivatives are sent to Stochastic Gradient Descent
● SGD outputs the parameter update for the next iteration
● Next iteration: new parameters, new data from the dataset (see the sketch below)
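A minimal sketch of one forward and backward pass for a tiny two-layer network in numpy. The layer sizes, sigmoid activations, and squared-error cost are illustrative assumptions; real frameworks derive these gradients automatically:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 inputs, 1 target probability ("cat" vs. "not cat").
x = rng.normal(size=(4, 1))
y = np.array([[1.0]])          # ground truth: "cat"

# Start: parameters are random.
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

# Forward pass: compute the prediction and the error e.
h = sigmoid(W1 @ x + b1)       # hidden layer
p = sigmoid(W2 @ h + b2)       # output layer
e = 0.5 * (p - y) ** 2         # cost function estimates prediction error

# Backward pass: each layer computes derivatives w.r.t. its parameters.
dp  = (p - y) * p * (1 - p)    # derivative at the output pre-activation
dW2 = dp @ h.T
db2 = dp
dh  = W2.T @ dp * h * (1 - h)  # gradient flows back through layer 1
dW1 = dh @ x.T
db1 = dh

# SGD outputs the parameter update for the next iteration.
lr = 0.1
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```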
Neural network layers
Convolutional layer
● Local connectivity
● Convolves channels too
● Each convolutional layer has many different filters
● Each filter detects specific feature
○ Borders, colors
● General data transform tool
● Can have bias b
Image credit: Visualizing Neural Networks In Virtual Space
Example 3×3 filter:
1 0 1
0 1 0
1 0 1
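A minimal numpy sketch of sliding this filter over a single-channel image (stride 1, no padding; the 5×5 input and the optional bias are illustrative assumptions):

```python
import numpy as np

def conv2d(image, kernel, b=0.0):
    # Slide the kernel over every valid location of the image.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Filter response at location (i, j), plus bias b.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + b
    return out

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])           # the filter from the slide
image = np.arange(25.0).reshape(5, 5)    # toy 5x5 single-channel image
print(conv2d(image, kernel))             # 3x3 feature map
```

A real convolutional layer applies many such filters at once and also sums over input channels, but each filter works exactly like this loop.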
Convolutional layer
Can represent any image filtering operation
Goal: find suitable parameters
Takes ~95% of the computation in a network
Image credit: OpenCV documentation
Convolutional layer filters
AlexNet 1st convolution filters
● Detect lines
● Detect color patterns
Further layers:
● Growing level of abstraction
○ “Face neuron”
Image credit: CS231n: Convolutional Neural Networks for Visual Recognition
Fully connected layer
● ~95% of the parameters in a network
● “Classic” layer
● Usually used before the final classifier
[Diagram: inputs v1 … vn fully connected to outputs fc1 … fcm through weights w11 … wnm and biases b1 … bm]
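The whole layer is a single matrix-vector product, fc = W·v + b. A minimal numpy sketch with assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                    # n inputs v_1..v_n, m outputs fc_1..fc_m
W = rng.normal(size=(m, n))    # weights w_11 .. w_nm
b = np.zeros(m)                # biases b_1 .. b_m
v = rng.normal(size=n)         # input vector

fc = W @ v + b                 # every output sees every input
print(fc.shape)                # (3,)
```

The n × m weight matrix is why this layer holds most of a network's parameters.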
Activation layer
● Applied after each convolution and fully connected layer
● Analogous to biological neuron mechanism
○ Neuron firing rate
Activation layers
● Original idea: Heaviside step function
○ Fire / not fire
○ Non-differentiable → cannot use backpropagation
● Approximation: sigmoid / tanh
○ Approximate the step function
○ Differentiable
○ Saturate and kill gradients
● Used almost everywhere: Rectified Linear Unit (ReLU)
○ Accelerates convergence in training
○ Does not saturate
[Plots: Heaviside step function, sigmoid, tanh, ReLU]
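The four functions as plain numpy sketches:

```python
import numpy as np

def heaviside(z):              # fire / not fire; non-differentiable at 0
    return (z >= 0).astype(float)

def sigmoid(z):                # smooth approximation of the step
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                   # like sigmoid, but ranges over (-1, 1)
    return np.tanh(z)

def relu(z):                   # max(0, z); does not saturate for z > 0
    return np.maximum(0.0, z)

z = np.linspace(-3, 3, 7)
for f in (heaviside, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(z), 2))
```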
Pooling layer
● Types:
○ Average pooling
○ Max pooling
● Reduces data dimensionality
○ Fewer parameters
○ Less computation
○ Controls overfitting
Example (2×2 max pooling, stride 2):

 0 -1  0  2
 1  1 -1  1        1 2
 1  0  3  0   →    2 3
-1  2  0  1
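A minimal numpy sketch that reproduces the worked example above (2×2 windows, stride 2):

```python
import numpy as np

def max_pool(x, k=2):
    # Reshape so each k x k window becomes its own axis pair, then take max.
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.array([[ 0, -1,  0,  2],
              [ 1,  1, -1,  1],
              [ 1,  0,  3,  0],
              [-1,  2,  0,  1]])
print(max_pool(x))
# [[1 2]
#  [2 3]]
```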
Typical feed-forward neural network
● No cycles
● Activation after each convolution / FC layer
● Pooling after several convolution blocks
Image credit: Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection
VGG16 topology
Solving Computer Vision with Deep Learning
Image classification
● Predicts the category of an image
● Backbone extracts features
● Classification head outputs a probability for each category
[Diagram: image → backbone → classification head → probabilities: dog 0.7, cat 0.2, bird 0.1]
Softmax layer
Softmax layer + cross-entropy loss
label          dog    cat    bird
ground truth   1      0      0
algorithm 1    0.2    0.6    0.2
algorithm 2    0.5    0.4    0.1
algorithm 3    0.8    0.1    0.1

Cross-entropy loss:
algorithm 1: - ((ln(0.2) * 1) + (ln(0.6) * 0) + (ln(0.2) * 0)) = 1.61
algorithm 2: - ((ln(0.5) * 1) + (ln(0.4) * 0) + (ln(0.1) * 0)) = 0.69
algorithm 3: - ((ln(0.8) * 1) + (ln(0.1) * 0) + (ln(0.1) * 0)) = 0.22
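A minimal numpy sketch: softmax turns raw scores into probabilities, and with a one-hot ground truth the cross-entropy loss reduces to -ln(probability of the true class). The raw scores below are an assumption; the table's probabilities are reproduced exactly:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, truth):
    return -np.sum(truth * np.log(probs))

# Softmax on assumed raw scores -> a probability distribution.
print(np.round(softmax(np.array([2.0, 1.0, 0.1])), 2))   # [0.66 0.24 0.1]

# The three rows of the table, scored against ground truth "dog".
truth = np.array([1, 0, 0])
for name, probs in [("algorithm 1", [0.2, 0.6, 0.2]),
                    ("algorithm 2", [0.5, 0.4, 0.1]),
                    ("algorithm 3", [0.8, 0.1, 0.1])]:
    print(name, round(cross_entropy(np.array(probs), truth), 2))
# algorithm 1 1.61 / algorithm 2 0.69 / algorithm 3 0.22
```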
ImageNet
Greatest driver of Deep Learning and image classification
1 million images
1000 classes
▪ 120 dog breeds
ImageNet 2017 is the last one
● Before 2012: non-Deep Learning methods
● 2012: AlexNet
● 2014: VGG, GoogLeNet
● 2015: ResNet
ResNet topology
Won the ImageNet 2015 image classification contest
Key advantage: residual connection
▪ Better convergence in parameter space
Outperforms human accuracy in image classification
▪ Andrej Karpathy blog
ResNet-like topologies are state-of-the-art
▪ Top accuracy in many Computer Vision tasks
Very deep
▪ 50 / 101 / 152-convolution modifications
Image credit: Deep Residual Learning for Image Recognition
Typical Deep Learning algorithm for Computer Vision
Requirement: big datasets for the task exist
Typical solution
Backbone: AlexNet, VGG, GoogLeNet, ResNet, and others
▪ Without softmax head
▪ Extracts representative features
▪ Pretrained on ImageNet
[Diagram: input → backbone → task-specific layers → output]
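A minimal sketch of this recipe in PyTorch (the framework choice and NUM_CLASSES are assumptions; the slide does not prescribe either): take an ImageNet-pretrained backbone, drop its softmax head, and attach task-specific layers.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 10   # assumed placeholder for the task at hand

# Backbone pretrained on ImageNet, with the softmax head removed:
# what remains extracts representative features.
backbone = models.resnet50(pretrained=True)
features = nn.Sequential(*list(backbone.children())[:-1])

# Attach task-specific layers on top of the feature extractor.
model = nn.Sequential(
    features,
    nn.Flatten(),
    nn.Linear(2048, NUM_CLASSES),   # trained from scratch for the new task
)

x = torch.randn(1, 3, 224, 224)     # dummy input batch
print(model(x).shape)               # torch.Size([1, 10])
```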
Object detection
[Diagram: a backbone (VGG / Inception / ResNet) feeds a detection head (Faster R-CNN / R-FCN / SSD); example detections: tree, tree, elephant]
Image credit: Savanna
Object detection
Image credit: YOLO v2
Semantic segmentation
● Generate a mask for objects of each class in the image
○ Road
○ Pedestrian
○ ...
● Classification of every pixel
● Datasets
○ General case
○ Road scenarios
Image credit: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Semantic segmentation
Image credit: Feature Space Optimization for Semantic Video Segmentation - CityScapes Demo 02
Instance segmentation
A mask for each object + the object's category
Combines:
▪ Semantic segmentation
▪ Object detection
State-of-the-art: Mask R-CNN
Generative Adversarial Networks
Image credit: Stability of Generative Adversarial Networks
● The generator network generates samples
● The discriminator network tries to distinguish real samples from generated ones
○ The bank vs. counterfeiter game
● A trained GAN gives:
○ A good generator of new objects
○ A good estimator of object quality
● Almost any task can be interpreted in the GAN framework (see the sketch below)
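A minimal sketch of this game in PyTorch on toy 1-D data (the architectures, data distribution, and hyperparameters are illustrative assumptions, not from the slides):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Generator maps noise to samples; discriminator outputs P(sample is real).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = optim.Adam(G.parameters(), lr=1e-3)
opt_d = optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0       # "real" samples from N(4, 1)
    fake = G(torch.randn(64, 8))          # generated samples

    # Discriminator: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 on fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # drifts toward 4.0
```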
GAN for image generation
Image credit: BEGAN: Boundary Equilibrium Generative Adversarial Networks
September 2016 vs. March 2017
GAN for image generation from caption
Image credit: StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
GAN for Super Resolution
Image credit: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
[Panels: Original image · Bicubic interpolation · SRGAN, at 4× upscaling]
GAN for image to image translation
Image credit: Image-to-Image Translation with Conditional Adversarial Nets
GAN for image to image translation
Image credit: CycleGAN
How to train a network
Understand state-of-the-art
● Google Scholar, arXiv papers
● Datasets, benchmarks
● Existing implementations, open repositories
Prepare dataset
● Neural Networks demand big datasets
○ ImageNet: 1.4 million images
○ MS COCO: 300 thousand images
● Data augmentation
○ Cropping
○ Flipping
○ Brightness / contrast
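Minimal numpy sketches of the three augmentations above, applied to an HxWx3 image with values in [0, 255] (the crop size and brightness/contrast ranges are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size=200):
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_flip(img):
    # Horizontal flip with probability 0.5.
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_brightness_contrast(img):
    a = rng.uniform(0.8, 1.2)     # contrast factor
    b = rng.uniform(-20, 20)      # brightness shift
    return np.clip(a * img + b, 0, 255)

img = rng.uniform(0, 255, size=(256, 256, 3))   # stand-in for a real image
aug = random_brightness_contrast(random_flip(random_crop(img)))
print(aug.shape)                                # (200, 200, 3)
```

Applied with fresh randomness every epoch, each training image yields many slightly different variants, effectively enlarging the dataset.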
Prepare dataset
Data splits: Train → Train-Val → Validation → Test (from diagnosing overfitting toward diagnosing generalization)

● High error on Train → bigger model, train longer, other architecture
● High error on Train-Val → more data, more regularization, other architecture
● High error on Validation → get more data similar to test, other architecture
● High error on Test → more validation data

With a small amount of real-life data: add a train-val split
Source: Andrew Ng, Nuts and Bolts of Applying Deep Learning
Iterative experiments
● Overfit 1 sample first
● Put all results in a table
● Variability:
▪ Backbone
▪ Task-specific layers and loss
▪ Data augmentation
▪ Optimization parameters
– Learning rate value and policy
– Regularization
Image credit: Speed/accuracy trade-offs for modern convolutional object detectors
Accuracy evaluation
● Compare with state-of-the-art
● Analyze accuracy dynamics while training
[Plots: accuracy (0.5 to 1.0) vs. iterations for train and val curves; left panel: typical good training, right panel: overfitting]
Choose accuracy metric
● Single accuracy metric
○ Comparable results
Example:
            Accuracy   Performance
Model 1     98 %       2 seconds
Model 2     93 %       0.5 seconds
● Accuracy: optimizing metric
● Time: satisficing metric
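A minimal sketch of this split in Python: treat the time budget as a hard constraint and optimize accuracy among the models that meet it (the 1-second budget is an assumed requirement):

```python
# Two candidate models, as in the table above.
models = [
    {"name": "Model 1", "accuracy": 0.98, "seconds": 2.0},
    {"name": "Model 2", "accuracy": 0.93, "seconds": 0.5},
]

MAX_SECONDS = 1.0   # satisficing metric: must simply be met

feasible = [m for m in models if m["seconds"] <= MAX_SECONDS]
best = max(feasible, key=lambda m: m["accuracy"])   # optimizing metric
print(best["name"])   # Model 2
```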
General tips
● Neural Networks can solve vision problems that a human can solve in ~1 second
● Open source repositories do not work out of the box
● Find your way to learn about new DL research
Papers submitted to Arxiv categories cs.AI, cs.LG, cs.CV, cs.CL, cs.NE, stat.ML over time
Image credit: Andrej Karpathy’s blog @ Medium