
Page 1: Whatis Deep Learning? - khu.ac.krweb.khu.ac.kr/~tskim/PatternClass Lec Note 24-1 Deep... · 2016-12-05 · New Solutions for Deep Net •Vanishing gradient problem •Solved by a

What is Deep Learning?

Page 2

Activation of Action Potentials

Page 3

Artificial Neural Network (ANN)

Sigmoid Activation Function
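As a rough sketch of the artificial neuron on these slides, assuming the usual weighted-sum-plus-bias form (the function names and numbers are illustrative), each unit computes a weighted sum of its inputs and passes it through the sigmoid 1/(1+exp(-z)), which maps any real value into (0, 1):

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x: list[float], w: list[float], b: float) -> float:
    """One artificial neuron: weighted sum of the inputs plus a bias, then sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights: z = 2*1 + (-3)*0 - 2 = 0, so the output is sigmoid(0)
print(neuron([1.0, 0.0], [2.0, -3.0], -2.0))  # 0.5
```

Large positive sums push the output toward 1 and large negative sums toward 0, which is the smooth, differentiable analogue of a neuron firing or not firing.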

Page 4

Multi-Layer Neural Networks

• Nonlinear classifier
• Training: find the network weights w that minimize the error between the true training labels y_i and the estimated labels f_w(x_i):

$$E(w) = \sum_{i=1}^{N} \left( y_i - f_w(x_i) \right)^2$$

• Minimization can be done by gradient descent, provided f is differentiable
• The gradients are computed layer by layer, from the output back toward the input; this training method is called back-propagation
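A minimal sketch of minimizing E(w) by gradient descent. To keep the derivatives short it assumes a simple linear model f_w(x) = w0 + w1·x as a stand-in for the multi-layer network, three toy data points on the line y = 2x + 1, and a hypothetical learning rate of 0.05:

```python
def loss(w, data):
    """E(w) = sum_i (y_i - f_w(x_i))^2 with the assumed model f_w(x) = w[0] + w[1] * x."""
    return sum((y - (w[0] + w[1] * x)) ** 2 for x, y in data)

def grad(w, data):
    """Partial derivatives of E with respect to w[0] and w[1]."""
    g0 = g1 = 0.0
    for x, y in data:
        r = y - (w[0] + w[1] * x)  # residual y_i - f_w(x_i)
        g0 += -2.0 * r             # dE/dw0
        g1 += -2.0 * r * x         # dE/dw1
    return [g0, g1]

def gradient_descent(data, lr=0.05, steps=500):
    """Repeatedly step against the gradient, starting from w = (0, 0)."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w, data)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # toy points on y = 2x + 1
w = gradient_descent(data)
print(w, loss(w, data))  # w approaches [1.0, 2.0] and the loss approaches 0
```

In a multi-layer network the same update rule applies; only the computation of the gradient changes, and that is what back-propagation provides.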

Page 5

Shallow Network vs. Deep Network

• Shallow network: number of hidden layers ≤ 1
• Deep network: number of hidden layers ≥ 2

Page 6

Training: Forward-propagation

http://www.slideshare.net/keepurcalm/intro-to-deep-learning-autoencoders

Three-layer neural network with two inputs and one output (the figure labels the weights of the connections)
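A minimal forward-propagation sketch for a network with two inputs and one output. It assumes one hidden layer of two sigmoid units; the hidden-layer size and all weight values are hypothetical, not taken from the figure:

```python
import math

def sigmoid(z):
    """Logistic activation 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: 2 inputs -> hidden sigmoid layer -> 1 sigmoid output.

    W1[j] holds the weights from the inputs into hidden unit j, b1[j] its bias;
    W2[j] is the weight from hidden unit j to the output, b2 the output bias.
    """
    # Hidden layer: weighted sum of the inputs plus bias, then sigmoid
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # Output layer: weighted sum of the hidden activations plus bias, then sigmoid
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

# Hypothetical weights, for illustration only
h, y = forward([1.0, 0.0], W1=[[0.5, -0.5], [0.3, 0.8]], b1=[0.0, 0.0], W2=[1.0, -1.0], b2=0.0)
print(h, y)
```

Signals flow strictly from the inputs toward the output; the intermediate values h are kept because back-propagation needs them.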

Page 7

Training via Backpropagation (1)

error = target - output

weight updates

Page 8

Training via Backpropagation (2)

final weight updates
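The two backpropagation slides can be sketched for a hypothetical network with two inputs, two sigmoid hidden units and one sigmoid output, using the slide's convention error = target - output. Assumptions: learning rate 0.5, logical OR as toy training data, and seeded random initial weights. With the per-example loss E = ½(t - y)², adding lr · delta · input is exactly a gradient-descent step (the ½ only rescales the learning rate relative to the sum-of-squares E(w)):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output (hypothetical sizes)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2[0])
    return h, y

def train_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One back-propagation update on a single example (weights changed in place)."""
    h, y = forward(x, W1, b1, W2, b2)
    error = t - y                                   # error = target - output
    delta_out = error * y * (1.0 - y)               # scaled by sigmoid'(net) = y(1 - y)
    # Error signal for each hidden unit, computed with the *old* W2
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    for j in range(len(W2)):                        # output-layer weight updates
        W2[j] += lr * delta_out * h[j]
    b2[0] += lr * delta_out
    for j, row in enumerate(W1):                    # hidden-layer weight updates
        for i in range(len(row)):
            row[i] += lr * delta_hid[j] * x[i]
        b1[j] += lr * delta_hid[j]
    return error

def sse(data, W1, b1, W2, b2):
    """Sum of squared errors over a dataset."""
    return sum((t - forward(x, W1, b1, W2, b2)[1]) ** 2 for x, t in data)

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [rng.uniform(-1, 1) for _ in range(2)]
b2 = [0.0]  # one-element list so train_step can update it in place
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]  # logical OR

before = sse(data, W1, b1, W2, b2)
for epoch in range(5000):
    for x, t in data:
        train_step(x, t, W1, b1, W2, b2)
after = sse(data, W1, b1, W2, b2)
print(f"SSE before: {before:.4f}, after: {after:.4f}")
```

The hidden-layer deltas are computed before W2 is modified, so every update in a step uses the gradient at the same point in weight space.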

Page 9

Difficulty of Training Deep Neural Network (or Multi Layer NN)

• Vanishing gradient problem
  – Caused by saturating nonlinear activation functions (e.g., sigmoid)
  – The gradient (error signal) decreases exponentially with the number of layers, so the front layers (closest to the input) train very slowly
• Over-fitting
  – Given limited amounts of labeled data, training via backpropagation does not work well
• Local minima
  – Difficulty in optimization
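To make "decreases exponentially" concrete, this sketch assumes a deliberately simplified chain with one sigmoid unit per layer, every weight equal to 1 and every pre-activation at 0, where the sigmoid's slope is largest (0.25). Back-propagation multiplies one such factor per layer, so the error signal reaching the first layer shrinks as 0.25 to the power of the depth. The helper names are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)); its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_factor(num_layers, weight=1.0, z=0.0):
    """Scale applied to the error signal after flowing back through num_layers
    layers of a simplified chain (one sigmoid unit per layer, identical weight and
    pre-activation z): the product of weight * sigmoid'(z) over all layers."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= weight * sigmoid_grad(z)
    return factor

for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(depth))  # shrinks as 0.25 ** depth
```

At depth 10 the factor is already about 9.5e-7, even in this best case for the sigmoid.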

Page 10

Traditional Solution: Feature Extraction
• Supervised learning

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

• Features are not learned but extracted by methods designed by humans
• Only the classifier is trainable

[Figure: Facial Feature Extraction]

Page 11

New Solutions for Deep Net
• Vanishing gradient problem
  – Addressed by a new non-linear activation function, the rectified linear unit (ReLU), in 2010 and 2011
• Over-fitting
  – Addressed by new regularization methods such as dropout (Hinton et al., 2012)
• Local minima
  – Addressed by insights into high-dimensional non-convex optimization: local minima tend to be similar to one another
  – These local minima are good and close to the global minimum
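A minimal sketch of dropout as a remedy for over-fitting. It implements the common "inverted dropout" variant, in which surviving activations are rescaled during training; the original formulation by Hinton et al. instead scales the weights at test time, which is equivalent in expectation. The helper name and parameters are hypothetical:

```python
import random

def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout (hypothetical helper); assumes 0 <= p_drop < 1.

    Training: zero each activation with probability p_drop and scale the survivors
    by 1 / (1 - p_drop), so each unit's expected output is unchanged.
    Test time: return the activations unchanged.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, rng=rng))  # each entry is 0.0 or doubled
print(dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, training=False))  # unchanged
```

Because a different random subset of units is silenced on every training step, no unit can rely on specific partners being present, which is the intuition behind its regularizing effect.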

Page 12

Supervised vs. Unsupervised: Shallow vs. Deep

• Supervised learning for shallow net

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

• Unsupervised learning for deep net

Image/Video Pixels → Layer 1 → ... → Layer N → Simple classifier → Object Class

Deep learning: “Deep” architecture

Page 13
Page 14

Deep Learning: Multiple Levels of Feature Representation

Page 15

What is Deep Learning?
• Deep learning is training a deep network.
• Deep learning = hierarchical learning
• Replaces (supervised) handcrafted features with unsupervised feature learning and hierarchical feature extraction
• The key to a deep net is its learned weights
• Various deep learning architectures: deep belief network (DBN), convolutional neural network (CNN), recurrent neural network (RNN)
• Application areas: visual object recognition, object detection, speech recognition, bioinformatics, etc.

Page 16
Page 17
Page 18

Three Well-known Deep Learning Algorithms

1. Deep Belief Network (DBN)

2. Convolutional Neural Network (CNN)

3. Recurrent Neural Network (RNN)

Page 19

What is CNN?

Page 20

Visual Processing in the Brain

Page 21

Hierarchical Visual Representation

Page 22

Convolutional Neural Networks (CNN, Convnet)
• Neural network with specialized connectivity structure
• Stacks multiple stages of feature extractors
• Higher stages compute more global, more invariant features
• Classification layer at the end

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.

Page 23

Convolutional Neural Networks: Basic Structure

• Feed-forward feature extraction:
  1. Convolve input with learned filters
  2. Non-linearity
  3. Spatial pooling
  4. Normalization
• Supervised training of convolutional filters by back-propagating classification error

[Figure: Input Image → Convolution (Learning) → Non-linearity → Spatial pooling → Normalization → Feature maps]

Page 24

1. Convolution

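• Dependencies are local
• Translation equivariant: the same filter weights are applied at every position, so a shifted input gives a shifted feature map
• Few parameters (filter weights)
• Stride can be greater than 1 (faster, less memory)

[Figure: Input → Feature Map]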
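Assuming no padding (a "valid" convolution), the stride bullet can be quantified: an n × n input convolved with a k × k kernel at stride s gives an output of side

$$n_{\text{out}} = \left\lfloor \frac{n - k}{s} \right\rfloor + 1$$

For n = 5 and k = 3, stride s = 1 gives a 3 × 3 feature map, while s = 2 gives 2 × 2, i.e. fewer positions to compute and store.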

Page 25

Convolution

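Figure and text source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

Convolving a 5x5 image with a 3x3 kernel

Different kernel coefficients yield different features. In conventional image processing the coefficients are usually fixed for a specific purpose, but the kernels used in a CNN have their optimal coefficients determined through learning.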
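The convolution step can be sketched in plain Python. Assumptions: "valid" output (no padding), an adjustable stride, and, as in most CNN libraries, no kernel flip (strictly speaking this is cross-correlation). The 0/1 image and the kernel values are illustrative; in a CNN the kernel entries would be learned rather than fixed:

```python
def conv2d(image, kernel, stride=1):
    """'Valid' 2-D convolution as used in CNNs (no kernel flip, no padding)."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(0, n_rows - k_rows + 1, stride):      # slide the kernel down
        row = []
        for c in range(0, n_cols - k_cols + 1, stride):  # ... and across
            # Element-wise product of the kernel and the patch under it, summed
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        out.append(row)
    return out

# Illustrative 5x5 binary image and 3x3 kernel
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

print(conv2d(image, kernel))            # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
print(conv2d(image, kernel, stride=2))  # [[4, 4], [2, 4]]
```

Each output value is large where the image patch matches the kernel's pattern, which is why different coefficients pick out different features.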

Page 26

2. Non-Linearity

• Per-element (independent)
• Options:
  – Tanh
  – Sigmoid: 1/(1+exp(-x))
  – Rectified linear unit (ReLU): max(0, x)
    – Simplifies backpropagation
    – Makes learning faster
    – Avoids saturation issues
    → Preferred option
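The per-element options above can be compared through their derivatives, which back-propagation multiplies together layer by layer. This sketch (function names hypothetical) shows that the sigmoid and tanh derivatives shrink toward zero for large inputs (saturation), while the ReLU derivative stays at 1 for any positive input:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Derivatives, which back-propagation multiplies layer by layer
def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, tends to 0 for large |x|

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # at most 1, tends to 0 for large |x|

def d_relu(x):
    return 1.0 if x > 0 else 0.0    # exactly 1 for every positive input (0 chosen at x = 0)

def apply_elementwise(f, feature_map):
    """Apply an activation to every element of a 2-D feature map independently."""
    return [[f(v) for v in row] for row in feature_map]

for x in (0.0, 2.0, 5.0):
    print(f"x={x}: sigmoid'={d_sigmoid(x):.4f}  tanh'={d_tanh(x):.4f}  relu'={d_relu(x):.1f}")
print(apply_elementwise(relu, [[-1.0, 2.0], [0.5, -3.0]]))  # [[0.0, 2.0], [0.5, 0.0]]
```

The ReLU derivative is also trivial to compute, which is part of why it simplifies backpropagation.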

Page 27

3. Spatial Pooling

• Sum or max
• Non-overlapping / overlapping regions
• Role of pooling:
  – Invariance to small transformations
  – Larger receptive fields (see more of the input)

[Figure: Max pooling and Sum pooling]

Page 28

Subsampling = Pooling

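Conventional sub-sampling usually picks the pixel at a fixed position, or takes the average of the pixels inside the sub-sampling window.

Sub-sampling in a CNN typically uses max-pooling, which, much like a neuron, passes on only the strongest signal and ignores the rest.

Figure and text source: http://blog.naver.com/laonple/220608018546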
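A sketch of the pooling options on a small, made-up 4 × 4 feature map: max, sum and average pooling over size × size windows (a stride equal to the window size gives non-overlapping regions, a smaller stride gives overlapping ones), plus conventional fixed-position subsampling for comparison. The function names are hypothetical:

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Pool each size x size window. mode: 'max', 'sum' or 'avg'.
    stride == size -> non-overlapping regions; stride < size -> overlapping."""
    combiners = {"max": max, "sum": sum, "avg": lambda vals: sum(vals) / len(vals)}
    combine = combiners[mode]
    out = []
    for r in range(0, len(fmap) - size + 1, stride):
        out.append([combine([fmap[r + i][c + j] for i in range(size) for j in range(size)])
                    for c in range(0, len(fmap[0]) - size + 1, stride)])
    return out

def subsample_fixed(fmap, step=2):
    """Conventional subsampling: keep the top-left pixel of each step x step block."""
    return [row[::step] for row in fmap[::step]]

fmap = [[1, 3, 2, 0],
        [5, 2, 1, 4],
        [0, 1, 7, 2],
        [3, 2, 1, 1]]

print(pool2d(fmap))                     # [[5, 4], [3, 7]]
print(pool2d(fmap, mode="sum"))         # [[11, 7], [6, 11]]
print(pool2d(fmap, mode="avg"))         # [[2.75, 1.75], [1.5, 2.75]]
print(pool2d(fmap, size=2, stride=1))   # overlapping: [[5, 3, 4], [5, 7, 7], [3, 7, 7]]
print(subsample_fixed(fmap))            # [[1, 2], [0, 7]]
```

Max pooling keeps the strongest response in each window wherever it occurs, whereas fixed-position subsampling can discard it entirely (the 5 in the top-left block is lost).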

Page 29

4. Normalization

• Within or across feature maps
• Before or after spatial pooling

[Figure: Feature Maps; Feature Maps After Contrast Normalization]
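A simplified sketch of contrast normalization within a single feature map, assuming whole-map statistics: subtract the map's mean and divide by its standard deviation. Local contrast normalization uses a local neighbourhood rather than the whole map, and "across feature maps" variants pool statistics over channels; neither is shown here, and the helper name is hypothetical:

```python
import math

def contrast_normalize(fmap, eps=1e-5):
    """Simplified contrast normalization within one feature map (hypothetical
    helper): subtract the map's mean and divide by its standard deviation.
    eps avoids division by zero for a constant map."""
    vals = [v for row in fmap for v in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [[(v - mean) / (std + eps) for v in row] for row in fmap]

print(contrast_normalize([[1.0, 2.0], [3.0, 4.0]]))  # roughly zero mean, unit variance
```

After normalization, maps with very different overall brightness or contrast produce responses on a comparable scale.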

Page 30

CNN Applications

• Handwritten text/digits
  – MNIST (0.17% error [Ciresan et al. 2011])
  – Arabic & Chinese [Ciresan et al. 2012]
• Simpler recognition benchmarks
  – CIFAR-10 (9.3% error [Wan et al. 2013])
  – Traffic sign recognition: 0.56% error vs. 1.16% for humans [Ciresan et al. 2011]
• But until recently, CNNs performed less well on more complex datasets
  – Caltech-101/256 (few training examples)

Page 31

Brain vs. CNN

Page 32

Deep Learning Resources

• Google TensorFlow https://www.tensorflow.org/
• UC Berkeley Caffe http://caffe.berkeleyvision.org/
• MATLAB Toolbox
  – https://kr.mathworks.com/discovery/deep-learning.html
  – https://kr.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
• Microsoft Cognitive Toolkit (CNTK) https://github.com/Microsoft/CNTK/wiki/KDD-2016-Tutorial