Introduction to Deep Learning · Delta Course · 2017-06-19
TRANSCRIPT
Internet of Things Group
Anna Petrovicheva
IOTG Computer Vision
Introduction to Deep Learning
Agenda
1. Neural Networks overview
2. Math engine
3. Neural Network layers
4. Solving Computer Vision problems
5. How to train a network
Deep Learning systems in the real world
Image credit: DeepMind, Prisma, Yayvo, Google Translate, Redmond Pie, TechRepublic, Brit
Tesla autopilot
Image credit: Autopilot Full Self Driving Demonstration Nov 18 2016 Realtime Speed
Brief history
● 1965: first idea
● AI winter
● 1998: LeNet-5
● 2000s: “The biggest issue of this paper is that it relies on neural networks”
● 2012: groundbreaking results in ImageNet contest
○ Old algorithms
○ Big dataset
○ Compute power
● 2012-now: wide adoption
Artificial Neural Network
[Diagram: input-layer neurons v1, v2, v3 connected by parameters (weights) w1, w2, w3 to a new neuron v_new; the output layer predicts “dog”]
Training
Start: parameters are random
Goal: find good parameters W = (w1, w2, … , wm)
[Image: a training example labeled “cat”]
Finding parameters
● W = (w1, w2, … , wm): a point in multidimensional space
○ Modern nets: tens to hundreds of millions of parameters
● Use W in network → get corresponding prediction error
○ Wstart: high prediction error
○ Woptimal: low prediction error
● Goal: get from Wstart to Woptimal
[Plot: prediction-error surface over (w1, w2); Wstart lies high on the surface, Woptimal at the minimum]
Gradient descent
W1 = Wstart - α * F’(Wstart)
α - learning rate
Too small: long training
Too large: training diverges
[Plot: prediction-error surface over (w1, w2), showing one step from Wstart toward Woptimal producing W1]
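A minimal sketch of this update rule in numpy, assuming a toy error surface F(W) = w1^2 + w2^2 (the surface and starting point are illustrative, not from the slides):

```python
import numpy as np

# Toy error surface F(W) = w1^2 + w2^2; its gradient is 2 * W.
def grad_F(W):
    return 2 * W

W = np.array([4.0, -3.0])        # W_start: a random initial point
alpha = 0.1                      # learning rate

for step in range(50):
    W = W - alpha * grad_F(W)    # step against the gradient

print(W)                         # approaches W_optimal = (0, 0)
```

With alpha too small the loop needs many more iterations; on this particular surface any alpha above 1.0 makes the iterates diverge, matching the two failure modes above.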
Gradient descent
W1 = Wstart - α * F’(Wstart)
W2 = W1 - α * F’(W1)
W3 = W2 - α * F’(W2)
W4 = W3 - α * F’(W3)
W5 = W4 - α * F’(W4)
[Plot: successive steps W1, W2, … descending the prediction-error surface from Wstart toward Woptimal]
Non-convex task
● May get stuck in local minima
● Solution depends on initial point
State-of-the-art opinion:
● Local minima are not the biggest problem
● “Like a person driving a car in a really confusing city”
[Plot: prediction-error surface with a local minimum Wlocal next to the global minimum Woptimal]
Stochastic gradient descent
Gradient descent:
▪ Take all data points (= the whole dataset)
▪ Compute the parameter derivative at all points
▪ Make a step in this direction
Dataset is too big
▪ Too much time to compute
▪ Does not fit in main memory
Stochastic Gradient Descent:
▪ Use a random subset of data (a new one each iteration), as in the sketch below
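A minimal numpy sketch contrasting the two (the linear-regression dataset and hyperparameters are illustrative assumptions, not from the slides):

```python
import numpy as np

# Toy linear-regression problem standing in for a dataset that is
# "too big": full-batch gradients would touch all 10,000 rows each step.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.01 * rng.normal(size=10_000)

w = np.zeros(5)                  # W_start
alpha, batch = 0.1, 64

for it in range(500):
    # New random subset of the data on every iteration.
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch   # gradient on the subset only
    w -= alpha * grad

print(np.round(w, 2))            # close to true_w = [1, 2, 3, 4, 5]
```

Each step touches only `batch` rows, so it is cheap to compute and the whole dataset never has to sit in memory at once.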
Backpropagation algorithm
[Diagram: the forward pass sends an input image (“cat”) through layers with parameters w1, w2, w3 and produces error e; the backward pass propagates derivatives w’1, w’2, w’3 back through the layers to SGD, which emits the parameter update ΔW]
● The cost function estimates the prediction error
● Layers compute derivatives with respect to their parameters
● Parameter derivatives are sent to Stochastic Gradient Descent
● SGD outputs the parameter update for the next iteration
● Next iteration: new parameters, new data from the dataset (see the sketch below)
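A minimal sketch of one forward and backward pass for a tiny two-layer network in numpy. The layer sizes, sigmoid activations, and squared-error cost are illustrative assumptions; real frameworks derive these gradients automatically:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 inputs, 1 target probability ("cat" vs. "not cat").
x = rng.normal(size=(4, 1))
y = np.array([[1.0]])          # ground truth: "cat"

# Start: parameters are random.
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

# Forward pass: compute the prediction and the error e.
h = sigmoid(W1 @ x + b1)       # hidden layer
p = sigmoid(W2 @ h + b2)       # output layer
e = 0.5 * (p - y) ** 2         # cost function estimates prediction error

# Backward pass: each layer computes derivatives w.r.t. its parameters.
dp  = (p - y) * p * (1 - p)    # derivative at the output pre-activation
dW2 = dp @ h.T
db2 = dp
dh  = W2.T @ dp * h * (1 - h)  # gradient flows back through layer 1
dW1 = dh @ x.T
db1 = dh

# SGD outputs the parameter update for the next iteration.
lr = 0.1
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```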
Neural network layers
Convolutional layer
● Local connectivity
● Convolves channels too
● Each convolutional layer has many different filters
● Each filter detects specific feature
○ Borders, colors
● General data transform tool
● Can have bias b
Image credit: Visualizing Neural Networks In Virtual Space
Example 3×3 filter:
1 0 1
0 1 0
1 0 1
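A minimal numpy sketch of sliding this filter over a single-channel image (stride 1, no padding; the 5×5 input and the optional bias are illustrative assumptions):

```python
import numpy as np

def conv2d(image, kernel, b=0.0):
    # Slide the kernel over every valid location of the image.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Filter response at location (i, j), plus bias b.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + b
    return out

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])           # the filter from the slide
image = np.arange(25.0).reshape(5, 5)    # toy 5x5 single-channel image
print(conv2d(image, kernel))             # 3x3 feature map
```

A real convolutional layer applies many such filters at once and also sums over input channels, but each filter works exactly like this loop.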
Convolutional layer
Can represent any image filtering operation
Goal: find suitable parameters
Takes ~95% of the computation in a network
Image credit: OpenCV documentation
Convolutional layer filters
AlexNet 1st convolution filters
● Detect lines
● Detect color patterns
Further layers:
● Growing level of abstraction
○ “Face neuron”
Image credit: CS231n: Convolutional Neural Networks for Visual Recognition
Fully connected layer
● ~95% of the parameters in a network
● “Classic” layer
● Usually used before the final classifier
[Diagram: inputs v1 … vn fully connected to outputs fc1 … fcm through weights w11 … wnm and biases b1 … bm]
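The whole layer is a single matrix-vector product, fc = W·v + b. A minimal numpy sketch with assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                    # n inputs v_1..v_n, m outputs fc_1..fc_m
W = rng.normal(size=(m, n))    # weights w_11 .. w_nm
b = np.zeros(m)                # biases b_1 .. b_m
v = rng.normal(size=n)         # input vector

fc = W @ v + b                 # every output sees every input
print(fc.shape)                # (3,)
```

The n × m weight matrix is why this layer holds most of a network's parameters.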
Activation layer
● Applied after each convolution and fully connected layer
● Analogous to biological neuron mechanism
○ Neuron firing rate
Activation layers
● Original idea: Heaviside step function
○ Fire / not fire
○ Non-differentiable → cannot use backpropagation
● Approximation: sigmoid / tanh
○ Approximate the step function
○ Differentiable
○ Saturate and kill gradients
● Used almost everywhere: Rectified Linear Unit (ReLU)
○ Accelerates convergence in training
○ Does not saturate
[Plots: Heaviside step function, sigmoid, tanh, ReLU]
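The four functions as plain numpy sketches:

```python
import numpy as np

def heaviside(z):              # fire / not fire; non-differentiable at 0
    return (z >= 0).astype(float)

def sigmoid(z):                # smooth approximation of the step
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                   # like sigmoid, but ranges over (-1, 1)
    return np.tanh(z)

def relu(z):                   # max(0, z); does not saturate for z > 0
    return np.maximum(0.0, z)

z = np.linspace(-3, 3, 7)
for f in (heaviside, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(z), 2))
```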
Pooling layer
● Types:
○ Average pooling
○ Max pooling
● Reduces data dimensionality
○ Fewer parameters
○ Less computation
○ Controls overfitting
Example (2×2 max pooling, stride 2):

 0 -1  0  2
 1  1 -1  1        1 2
 1  0  3  0   →    2 3
-1  2  0  1
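A minimal numpy sketch that reproduces the worked example above (2×2 windows, stride 2):

```python
import numpy as np

def max_pool(x, k=2):
    # Reshape so each k x k window becomes its own axis pair, then take max.
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.array([[ 0, -1,  0,  2],
              [ 1,  1, -1,  1],
              [ 1,  0,  3,  0],
              [-1,  2,  0,  1]])
print(max_pool(x))
# [[1 2]
#  [2 3]]
```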
Typical feed-forward neural network
● No cycles
● Activation after each convolution / FC layer
● Pooling after several convolution blocks
Image credit: Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection
VGG16 topology
Solving Computer Vision with Deep Learning
Image classification
● Predicts the category of an image
● Backbone extracts features
● Classification head outputs a probability for each category
[Diagram: image → backbone → classification head → probabilities: dog 0.7, cat 0.2, bird 0.1]
Softmax layer
Softmax layer + cross-entropy loss
label          dog    cat    bird
ground truth   1      0      0
algorithm 1    0.2    0.6    0.2
algorithm 2    0.5    0.4    0.1
algorithm 3    0.8    0.1    0.1

Cross-entropy loss:
algorithm 1: - ((ln(0.2) * 1) + (ln(0.6) * 0) + (ln(0.2) * 0)) = 1.61
algorithm 2: - ((ln(0.5) * 1) + (ln(0.4) * 0) + (ln(0.1) * 0)) = 0.69
algorithm 3: - ((ln(0.8) * 1) + (ln(0.1) * 0) + (ln(0.1) * 0)) = 0.22
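A minimal numpy sketch: softmax turns raw scores into probabilities, and with a one-hot ground truth the cross-entropy loss reduces to -ln(probability of the true class). The raw scores below are an assumption; the table's probabilities are reproduced exactly:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, truth):
    return -np.sum(truth * np.log(probs))

# Softmax on assumed raw scores -> a probability distribution.
print(np.round(softmax(np.array([2.0, 1.0, 0.1])), 2))   # [0.66 0.24 0.1]

# The three rows of the table, scored against ground truth "dog".
truth = np.array([1, 0, 0])
for name, probs in [("algorithm 1", [0.2, 0.6, 0.2]),
                    ("algorithm 2", [0.5, 0.4, 0.1]),
                    ("algorithm 3", [0.8, 0.1, 0.1])]:
    print(name, round(cross_entropy(np.array(probs), truth), 2))
# algorithm 1 1.61 / algorithm 2 0.69 / algorithm 3 0.22
```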
ImageNet
Greatest driver of Deep Learning and image classification
1 million images
1000 classes
▪ 120 dog breeds
ImageNet 2017 is the last one
● Before 2012: non-Deep Learning methods
● 2012: AlexNet
● 2014: VGG, GoogLeNet
● 2015: ResNet
ResNet topology
Won the ImageNet 2015 image classification contest
Key advantage: residual connection
▪ Better convergence in parameter space
Outperforms human accuracy in image classification
▪ Andrej Karpathy blog
ResNet-like topologies are state-of-the-art
▪ Top accuracy in many Computer Vision tasks
Very deep
▪ 50 / 101 / 152-convolution modifications
Image credit: Deep Residual Learning for Image Recognition
Typical Deep Learning algorithm for Computer Vision
Requirement: big datasets for the task exist
Typical solution
Backbone: AlexNet, VGG, GoogLeNet, ResNet, and others
▪ Without softmax head
▪ Extracts representative features
▪ Pretrained on ImageNet
[Diagram: input → backbone → task-specific layers → output]
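A minimal sketch of this recipe in PyTorch (the framework choice and NUM_CLASSES are assumptions; the slide does not prescribe either): take an ImageNet-pretrained backbone, drop its softmax head, and attach task-specific layers.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 10   # assumed placeholder for the task at hand

# Backbone pretrained on ImageNet, with the softmax head removed:
# what remains extracts representative features.
backbone = models.resnet50(pretrained=True)
features = nn.Sequential(*list(backbone.children())[:-1])

# Attach task-specific layers on top of the feature extractor.
model = nn.Sequential(
    features,
    nn.Flatten(),
    nn.Linear(2048, NUM_CLASSES),   # trained from scratch for the new task
)

x = torch.randn(1, 3, 224, 224)     # dummy input batch
print(model(x).shape)               # torch.Size([1, 10])
```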
Object detection
[Diagram: a backbone (VGG / Inception / ResNet) feeds a detection head (Faster R-CNN / R-FCN / SSD); example detections: tree, tree, elephant]
Image credit: Savanna
Object detection
Image credit: YOLO v2
Semantic segmentation
● Generate a mask for objects of each class in the image
○ Road
○ Pedestrian
○ ...
● Classification of every pixel
● Datasets
○ General case
○ Road scenarios
Image credit: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Semantic segmentation
Image credit: Feature Space Optimization for Semantic Video Segmentation - CityScapes Demo 02
Instance segmentation
A mask for each object + the object's category
Combines:
▪ Semantic segmentation
▪ Object detection
State-of-the-art: Mask R-CNN
Generative Adversarial Networks
Image credit: Stability of Generative Adversarial Networks
● The generator network generates samples
● The discriminator network tries to distinguish real samples from generated ones
○ The bank vs. counterfeiter game
● A trained GAN gives:
○ A good generator of new objects
○ A good estimator of object quality
● Almost any task can be interpreted in the GAN framework (see the sketch below)
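A minimal sketch of this game in PyTorch on toy 1-D data (the architectures, data distribution, and hyperparameters are illustrative assumptions, not from the slides):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Generator maps noise to samples; discriminator outputs P(sample is real).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = optim.Adam(G.parameters(), lr=1e-3)
opt_d = optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0       # "real" samples from N(4, 1)
    fake = G(torch.randn(64, 8))          # generated samples

    # Discriminator: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 on fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # drifts toward 4.0
```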
GAN for image generation
Image credit: BEGAN: Boundary Equilibrium Generative Adversarial Networks
September 2016 vs. March 2017
GAN for image generation from caption
Image credit: StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
GAN for Super Resolution
Image credit: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
[Panels: Original image · Bicubic interpolation · SRGAN, at 4× upscaling]
GAN for image to image translation
Image credit: Image-to-Image Translation with Conditional Adversarial Nets
GAN for image to image translation
Image credit: CycleGAN
How to train a network
Understand state-of-the-art
● Google Scholar, arXiv papers
● Datasets, benchmarks
● Existing implementations, open repositories
Prepare dataset
● Neural Networks demand big datasets
○ ImageNet: 1.4 million images
○ MS COCO: 300 thousand images
● Data augmentation
○ Cropping
○ Flipping
○ Brightness / contrast
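Minimal numpy sketches of the three augmentations above, applied to an HxWx3 image with values in [0, 255] (the crop size and brightness/contrast ranges are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size=200):
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_flip(img):
    # Horizontal flip with probability 0.5.
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_brightness_contrast(img):
    a = rng.uniform(0.8, 1.2)     # contrast factor
    b = rng.uniform(-20, 20)      # brightness shift
    return np.clip(a * img + b, 0, 255)

img = rng.uniform(0, 255, size=(256, 256, 3))   # stand-in for a real image
aug = random_brightness_contrast(random_flip(random_crop(img)))
print(aug.shape)                                # (200, 200, 3)
```

Applied with fresh randomness every epoch, each training image yields many slightly different variants, effectively enlarging the dataset.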
Prepare dataset
Data splits: Train → Train-Val → Validation → Test (from diagnosing overfitting toward diagnosing generalization)

● High error on Train → bigger model, train longer, other architecture
● High error on Train-Val → more data, more regularization, other architecture
● High error on Validation → get more data similar to test, other architecture
● High error on Test → more validation data

With a small amount of real-life data: add a train-val split
Source: Andrew Ng, Nuts and Bolts of Applying Deep Learning
Iterative experiments
● Overfit 1 sample first
● Put all results in a table
● Variability:
▪ Backbone
▪ Task-specific layers and loss
▪ Data augmentation
▪ Optimization parameters
– Learning rate value and policy
– Regularization
Image credit: Speed/accuracy trade-offs for modern convolutional object detectors
Accuracy evaluation
● Compare with state-of-the-art
● Analyze accuracy dynamics while training
[Plots: accuracy (0.5 to 1.0) vs. iterations for train and val curves; left panel: typical good training, right panel: overfitting]
Choose accuracy metric
● Single accuracy metric
○ Comparable results
Example:
            Accuracy   Performance
Model 1     98 %       2 seconds
Model 2     93 %       0.5 seconds
● Accuracy: optimizing metric
● Time: satisficing metric
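A minimal sketch of this split in Python: treat the time budget as a hard constraint and optimize accuracy among the models that meet it (the 1-second budget is an assumed requirement):

```python
# Two candidate models, as in the table above.
models = [
    {"name": "Model 1", "accuracy": 0.98, "seconds": 2.0},
    {"name": "Model 2", "accuracy": 0.93, "seconds": 0.5},
]

MAX_SECONDS = 1.0   # satisficing metric: must simply be met

feasible = [m for m in models if m["seconds"] <= MAX_SECONDS]
best = max(feasible, key=lambda m: m["accuracy"])   # optimizing metric
print(best["name"])   # Model 2
```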
General tips
● Neural Networks can solve vision problems that a human can solve in ~1 second
● Open source repositories do not work out of the box
● Find your way to learn about new DL research
Papers submitted to Arxiv categories cs.AI, cs.LG, cs.CV, cs.CL, cs.NE, stat.ML over time
Image credit: Andrej Karpathy’s blog @ Medium