![Page 1: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/1.jpg)
Object Detection
Lecture 10.3 - Introduction to deep learning (CNN)
Idar Dyrdal
![Page 2: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/2.jpg)
Deep Learning
• Computational models composed of multiple processing layers (non-linear transformations)
• Used to learn representations of data with multiple levels of abstraction:
• Learning a hierarchy of feature extractors
• Each level in the hierarchy extracts features from the output of the previous layer (from pixels to classes)
• Deep learning has dramatically improved the state of the art in:
• Speech and character recognition
• Visual object detection and recognition
• Convolutional neural nets for processing of images, video, speech and signals (time series) in general
• Recurrent neural nets for processing of sequential data (speech, text).
Figure: hierarchy from raw data (images, video, signals) through Level 1, Level 2 and Level 3 to labels.
![Page 3: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/3.jpg)
Deep Learning for Object Recognition
Figure: input image classified as «Ship» (AlexNet)
• Millions of images
• Millions of parameters
• Thousands of classes
![Page 4: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/4.jpg)
Traditional supervised learning
Figure: training images → feature extraction → classifier training → classifier, with class labels as a second input to classifier training.
![Page 5: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/5.jpg)
Deep learning
Figure: same blocks as in traditional supervised learning (training images, feature extraction, classifier training, classifier, class labels).
• Learning of weights in the processing layers
• Supervised, unsupervised (or semi-supervised) learning
![Page 6: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/6.jpg)
Semi-supervised learning
Figure panels:
• Labeled samples and (trained) linear decision boundary
• Labeled and unlabeled samples and non-linear decision boundary
![Page 7: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/7.jpg)
Artificial Neural Network (ANN)
Used in Machine Learning and Pattern Recognition:
• Regression
• Classification
• Clustering
• …
Applications:
• Speech recognition
• Recognition of handwritten text
• Image classification
• …
Network types:
• Feed-forward neural networks
• Recurrent neural networks (RNN)
• …
Figure: feed-forward ANN (non-linear classifier)
![Page 8: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/8.jpg)
Mark 1 Perceptron (Rosenblatt, 1957-59)
(Cornell Aeronautical Laboratory)
![Page 9: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/9.jpg)
Activation functions
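Sigmoid (logistic function): $\sigma(x) = \frac{1}{1 + e^{-x}}$
Hyperbolic tangent: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Rectified linear unit (ReLU): $f(x) = \max(0, x)$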
Figure: biological neuron (Quasar Jarosz, English Wikipedia)
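The three activation functions named on this slide can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions, with scalar inputs and only the standard library; the function names are chosen here for illustration and do not come from the lecture.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    """Hyperbolic tangent: maps any real input into the interval (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: keeps positive inputs, sets the rest to zero."""
    return max(0.0, x)


# Evaluate each function at a negative, a zero and a positive input.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):+.3f}  relu={relu(x):.1f}")
```

Sigmoid and tanh saturate for large positive or negative inputs, while ReLU stays linear for positive inputs, which is one commonly cited reason it is preferred in deep networks.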
![Page 10: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/10.jpg)
Feed-forward neural network
Figure: feed-forward network with indices i, j, k, l.
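A forward pass through such a network can be sketched as follows. This is a minimal example, assuming fully connected layers with sigmoid activations; the 2-3-1 layout and all weight values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights * inputs + biases)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate an input vector through a list of (weights, biases) pairs."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases, sigmoid)
    return activations


# Hypothetical 2-3-1 network with hand-picked weights (assumption, not from the slides).
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 hidden units
    ([[1.0, -1.0, 0.5]], [0.2]),                                   # 3 hidden units -> 1 output
]
output = forward([1.0, 2.0], layers)
print(output)
```

Each row of a weight matrix holds the incoming weights of one unit, so the output of one layer becomes the input of the next.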
![Page 11: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/11.jpg)
Back-propagation
Figure: the same network with indices i, j, k, l, used to illustrate back-propagation.
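The idea of back-propagation can be illustrated on a very small network. The sketch below assumes a single hidden layer, sigmoid activations and a squared-error loss; these choices, the learning rate and the starting weights are all assumptions made for the example, not details taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return the hidden activations h and the scalar output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def backprop_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One gradient-descent update for the loss E = 0.5 * (y - target)**2."""
    h, y = forward(x, W1, b1, W2, b2)
    # Output unit: dE/dz = (y - target) * sigmoid'(z), where sigmoid'(z) = y * (1 - y).
    delta_out = (y - target) * y * (1.0 - y)
    # Hidden units: the output error is propagated back through the output weights.
    delta_hidden = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Gradient-descent updates (new lists are returned, the inputs are not modified).
    new_W2 = [W2[j] - lr * delta_out * h[j] for j in range(len(h))]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[W1[j][i] - lr * delta_hidden[j] * x[i] for i in range(len(x))]
              for j in range(len(h))]
    new_b1 = [b1[j] - lr * delta_hidden[j] for j in range(len(h))]
    return new_W1, new_b1, new_W2, new_b2


def loss(x, target, params):
    return 0.5 * (forward(x, *params)[1] - target) ** 2


# Hypothetical training sample and starting weights for a 2-2-1 network.
x, target = [1.0, 0.0], 1.0
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.3], 0.0)
before = loss(x, target, params)
for _ in range(100):
    params = backprop_step(x, target, *params)
after = loss(x, target, params)
print(f"loss before: {before:.4f}, after 100 steps: {after:.4f}")
```

The error term of each hidden unit is computed from the error term of the layer after it, which is what allows the gradients to be obtained in a single backward sweep.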
![Page 12: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/12.jpg)
Convolutional Neural Network (CNN)
Used in Machine Vision and Image Analysis:
• Speech Recognition
• Image Recognition
• Video Recognition
• Image Segmentation
• …
Convolutional neural network:
• Multi-layer feed-forward ANN
• Combinations of convolutional and fully connected layers
• Convolutional layers with local connectivity
• Shared weights across spatial positions
• Local or global pooling layers
(Figure credit: A. Karpathy)
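The effect of local connectivity and shared weights on model size can be shown with a little arithmetic. The sketch below compares the number of parameters in a fully connected layer and a convolutional layer that map the same input to an output of the same size; the 32×32×3 input, the 16 output channels and the 3×3 kernel are hypothetical values chosen for the example.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input-output pair, plus one bias per output."""
    return n_in * n_out + n_out


def conv_params(kernel_h: int, kernel_w: int, channels_in: int, channels_out: int) -> int:
    """Convolutional layer: one small kernel per output channel, shared across all positions."""
    return kernel_h * kernel_w * channels_in * channels_out + channels_out


# Hypothetical layer sizes (assumptions for illustration only).
height, width, c_in, c_out = 32, 32, 3, 16

dense = dense_params(height * width * c_in, height * width * c_out)
conv = conv_params(3, 3, c_in, c_out)

print(f"fully connected: {dense:,} parameters")  # fully connected: 50,348,032 parameters
print(f"convolutional:   {conv:,} parameters")   # convolutional:   448 parameters
```

The convolutional layer needs far fewer parameters because each output value depends only on a 3×3 neighbourhood and the same kernel is reused at every spatial position.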
![Page 13: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/13.jpg)
Typical CNN
Figure: typical CNN architecture with learned features (Aphex34)
![Page 14: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/14.jpg)
Convolutional neural net
Processing stages: input image → convolution (learned) → non-linearity → spatial pooling → normalization → feature map
(credit: S. Lazebnik)
![Page 15: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/15.jpg)
Convolutional neural net
Convolution (learned): input → feature map
(credit: S. Lazebnik)
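The convolution stage can be sketched in plain Python. This minimal example assumes a single-channel image, a single kernel, no padding and a stride of one; as is usual in CNNs, the kernel is not flipped, so strictly speaking this is a cross-correlation. The vertical-edge kernel is hand-picked for illustration, whereas in a trained network the kernel values are learned.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (no padding, stride 1, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 image with a dark left half and a bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to dark-to-bright transitions from left to right.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge lies, and the same four weights are applied at every position of the image.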
![Page 16: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/16.jpg)
Convolutional neural net
Non-linearity: Rectified Linear Unit (ReLU)
(credit: S. Lazebnik)
![Page 17: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/17.jpg)
Convolutional neural net
Spatial pooling: max pooling
(credit: S. Lazebnik)
• Max pooling is a non-linear down-sampling
• Provides invariance to small translations
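Max pooling can be sketched as follows. This is a minimal example, assuming non-overlapping 2×2 windows on a single feature map; the window size, stride and input values are assumptions chosen for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Down-sample a 2-D feature map by keeping the maximum of each window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 6],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 6]]

# A response that moves by one pixel inside the same window gives the same pooled output.
a = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(max_pool2d(a) == max_pool2d(b))  # True
```

The second part of the example shows the limited form of translation invariance: the pooled output is unchanged only as long as the shifted response stays within the same pooling window.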
![Page 18: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/18.jpg)
Convolutional neural net
Normalization: feature maps before and after contrast normalization
(credit: S. Lazebnik)
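The slide does not say which normalization scheme is used. As a rough sketch only, the example below assumes the simplest variant: each feature map is shifted to zero mean and scaled to unit standard deviation over the whole map. Local contrast normalization, which uses a neighbourhood around each position instead of the whole map, follows the same idea.

```python
import math


def contrast_normalize(feature_map, eps=1e-8):
    """Shift a feature map to zero mean and scale it to unit standard deviation.

    Assumption: global statistics over the whole map. The eps term avoids
    division by zero for constant maps.
    """
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [[(v - mean) / (std + eps) for v in row] for row in feature_map]


# Two hypothetical feature maps with the same pattern but different contrast.
low_contrast = [[1.0, 2.0], [3.0, 4.0]]
high_contrast = [[10.0, 20.0], [30.0, 40.0]]

print(contrast_normalize(low_contrast))
print(contrast_normalize(high_contrast))
```

Both maps give practically the same normalized output, so later layers see the pattern rather than its overall brightness or contrast.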
![Page 19: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/19.jpg)
Convolutional neural net
Output: feature maps after contrast normalization
(credit: S. Lazebnik)
![Page 20: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/20.jpg)
Example - Caffe Demos
http://demo.caffe.berkeleyvision.org
![Page 21: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/21.jpg)
![Page 22: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/22.jpg)
![Page 23: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/23.jpg)
Example - Semantic Segmentation (SegNet)
http://mi.eng.cam.ac.uk/projects/segnet/
![Page 24: Object Detection Lecture 10.3 - Introduction to deep](https://reader030.vdocuments.us/reader030/viewer/2022012811/61c17ac67148751558230cde/html5/thumbnails/24.jpg)
Summary
Topics covered:
• Deep learning
• Artificial neural networks
• Convolutional neural networks
More information:
• Szeliski, chapter 14
• Yann LeCun, Yoshua Bengio & Geoffrey Hinton, “Deep learning”, Nature, Vol. 521, 28 May 2015.
• Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu, “Benchmarking State-of-the-Art Deep Learning Software Tools”, 2017 (https://arxiv.org/pdf/1608.07249.pdf)
Software:
• Caffe (http://caffe.berkeleyvision.org)
• TensorFlow (https://www.tensorflow.org/)
• MatConvNet (http://www.vlfeat.org/matconvnet)