whatis deep learning? - khu.ac.krweb.khu.ac.kr/~tskim/patternclass lec note 24-1 deep... ·...

What is Deep Learning?

Activation of Action Potentials

Artificial Neural Network (ANN)

Sigmoid Activation Function

Multi-Layer Neural Networks

• Nonlinear classifier
• Training: find network weights w to minimize the error between true training labels y_i and estimated labels f_w(x_i):

$$E(w) = \sum_{i=1}^{N} \left( y_i - f_w(x_i) \right)^2$$

• Minimization can be done by gradient descent, provided f is differentiable
• This training method is called back-propagation
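A minimal sketch of minimizing the sum-of-squares error E(w) by gradient descent, assuming a single sigmoid unit for f_w. The toy data (logical OR with a constant bias input), the learning rate `eta`, and the helper names `f`, `error`, and `grad` are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(w, X):
    """Single sigmoid unit: f_w(x) = sigmoid(w . x)."""
    return sigmoid(X @ w)

def error(w, X, y):
    """E(w) = sum_i (y_i - f_w(x_i))^2."""
    return np.sum((y - f(w, X)) ** 2)

def grad(w, X, y):
    """dE/dw by the chain rule, using sigmoid'(z) = s * (1 - s)."""
    s = f(w, X)
    return -2.0 * X.T @ ((y - s) * s * (1.0 - s))

# Assumed toy data: two inputs plus a constant 1 acting as a bias; labels = OR.
X = np.array([[0., 0., 1.],
              [0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 1.]])
y = np.array([0., 1., 1., 1.])

w = np.zeros(3)
eta = 0.5                      # assumed learning rate
initial_error = error(w, X, y)
for _ in range(2000):
    w -= eta * grad(w, X, y)   # step against the gradient
final_error = error(w, X, y)
```

The update only requires that f be differentiable in w, which is why a smooth activation such as the sigmoid is used here.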

Shallow Network vs. Deep Network

Number of hidden layers <= 1 (i.e., shallow network)
Number of hidden layers >= 2 (i.e., deep network)

Training: Forward-propagation

http://www.slideshare.net/keepurcalm/intro-to-deep-learning-autoencoders

Three-layer neural network: two inputs and one output, with weights on the connections
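Forward propagation through such a network might be sketched as follows. The layer sizes (2 inputs, then 3, 2, and 1 units), the sigmoid activation, the absence of bias terms, and the random weights are assumptions made for illustration; the slide only states two inputs and one output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Propagate x through every layer and return all layer activations."""
    activations = [x]
    for W in weights:
        x = sigmoid(W @ x)     # weighted sum of inputs, then nonlinearity
        activations.append(x)
    return activations

rng = np.random.default_rng(0)
sizes = [2, 3, 2, 1]           # assumed: 2 inputs -> 3 -> 2 -> 1 output
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

activations = forward(np.array([0.5, -1.0]), weights)
output = activations[-1]
```

Keeping every intermediate activation is deliberate: backpropagation reuses them when computing the weight updates.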

Training via Backpropagation (1)

error = target - output

weight updates

Training via Backpropagation (2)

final weight updates
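The backpropagation steps (output error, error signals passed backward, weight updates) can be sketched for a small network with one hidden layer. The network size, the learning rate `eta`, the single training example, and the function name `train_step` are hypothetical choices for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, W1, W2, eta=0.5):
    """One backpropagation step for a sigmoid network without biases."""
    # Forward pass
    h = sigmoid(W1 @ x)                  # hidden activations
    out = sigmoid(W2 @ h)                # network output
    # Output error signal: (target - output) times the sigmoid derivative
    delta_out = (target - out) * out * (1.0 - out)
    # Propagate the error signal back through W2 to the hidden layer
    delta_h = (W2.T @ delta_out) * h * (1.0 - h)
    # Weight update: learning rate * error signal * input to that layer
    W2_new = W2 + eta * np.outer(delta_out, h)
    W1_new = W1 + eta * np.outer(delta_h, x)
    return W1_new, W2_new, float(np.sum((target - out) ** 2))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))             # assumed: 2 inputs -> 3 hidden units
W2 = rng.normal(size=(1, 3))             # assumed: 3 hidden units -> 1 output
x, target = np.array([1.0, 0.5]), np.array([1.0])

errors = []
for _ in range(200):
    W1, W2, err = train_step(x, target, W1, W2)
    errors.append(err)                   # squared error before each update
```

Each layer's update uses only its own error signal and its own input, which is what lets the computation proceed layer by layer from the output back to the input.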

Difficulty of Training Deep Neural Network (or Multi Layer NN)

• Vanishing gradient problem
  • Problem with nonlinear activation functions
  • The gradient (error signal) decreases exponentially with the number of layers, and the front layers train very slowly
• Over-fitting
  • Given limited amounts of labeled data, training via backpropagation does not work well
• Local minima
  • Difficulty in optimization
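The exponential decay of the error signal can be illustrated with a simplified chain of scalar sigmoid units. This sketch assumes each layer multiplies the backpropagated signal by `weight * sigmoid'(z)`, with the same `z` and `weight` at every layer; real networks vary per unit, so treat the numbers as a best-case illustration rather than a measurement.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)       # at most 0.25, reached at z = 0

def backprop_gradient_scale(num_layers, z=0.0, weight=1.0):
    """Factor by which the error signal shrinks after passing back through
    num_layers sigmoid layers (simplified scalar chain, assumed weights)."""
    return (abs(weight) * sigmoid_grad(z)) ** num_layers

scales = {n: backprop_gradient_scale(n) for n in (1, 2, 5, 10)}
```

Even at the sigmoid's steepest point the factor per layer is 0.25, so ten layers shrink the signal by roughly a factor of a million in this setting.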

Traditional Solution: Feature Extraction

• Supervised learning
• Features are not learned, but extracted by humans
• Trainable classifier

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

Facial Feature Extraction

New Solutions for Deep Net

• Vanishing gradient problem
  • Largely addressed by a new non-linear activation function: rectified linear unit (ReLU), 2010-2011
• Over-fitting
  • Addressed by new regularization methods: dropout (Hinton et al., 2012), etc.
• Local minima
  • Less of an obstacle than feared: in high-dimensional non-convex optimization, local minima are all similar
  • Local minima are good and close to global minima
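As a rough illustration of dropout, the sketch below uses the "inverted dropout" formulation, a common variant that rescales the surviving units during training so that nothing changes at test time. It is not necessarily the exact formulation of Hinton et al. (2012); the function name `dropout`, the drop probability, and the all-ones activations are assumptions.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob during
    training and rescale survivors so the expected activation is unchanged."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob   # True = unit kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones(1000)                                      # assumed hidden activations
h_train = dropout(h, drop_prob=0.5, rng=rng)           # random units zeroed
h_test = dropout(h, drop_prob=0.5, rng=rng, training=False)  # unchanged
```

Because a different random subset of units is silenced at each training step, no unit can rely on specific partners, which is the regularizing effect.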

Supervised vs. Unsupervised :: Shallow vs. Deep

• Supervised learning for shallow net

Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

• Unsupervised learning for deep net

Image/Video Pixels → Layer 1 → ... → Layer N → Simple classifier → Object Class

Deep learning: “Deep” architecture

…

Deep Learning: Multiple Levels of Feature Representation

What is Deep Learning?

• Deep learning is training a deep network.
• Deep learning = hierarchical learning
• Replaces (supervised) handcrafted features with unsupervised feature learning and hierarchical feature extraction
• The key of a deep net = weights
• Various deep learning architectures: deep belief network (DBN), convolutional neural network (CNN), recurrent neural network (RNN)
• Application areas: visual object recognition, object detection, speech recognition, bioinformatics, etc.

Three Well-known Deep Learning Algorithms

1. Deep Belief Network (DBN)

2. Convolutional Neural Network (CNN)

3. Recurrent Neural Network (RNN)

What is CNN?

Visual Processing of The Brain

Hierarchical Visual Representation

Convolutional Neural Networks (CNN, ConvNet)

• Neural network with specialized connectivity structure
• Stack multiple stages of feature extractors
• Higher stages compute more global, more invariant features
• Classification layer at the end

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.

• Feed-forward feature extraction:
  1. Convolve input with learned filters
  2. Non-linearity
  3. Spatial pooling
  4. Normalization
• Supervised training of convolutional filters by back-propagating classification error

Input Image → Convolution (learned) → Non-linearity → Spatial pooling → Normalization

Convolutional Neural Networks: Basic Structure

Feature maps

1. Convolution

• Dependencies are local
• Translation invariant
• Few parameters (filter weights)
• Stride can be greater than 1 (faster, less memory)

Figure: Input → Convolution → Feature Map

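Figure and content source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

Convolving a 5x5 image with a 3x3 kernel

Different features are obtained depending on the kernel coefficients. Ordinarily the coefficients are fixed for a specific purpose, but the kernels used in a CNN have their optimal coefficients determined through learning.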
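A minimal sketch of sliding a 3x3 kernel over a 5x5 image, with an optional stride. The pixel and kernel values below are assumed for illustration, and the kernel is fixed by hand here, whereas a CNN would learn its coefficients. As is common in CNN code, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """'Valid' sliding-window filtering: no padding, kernel not flipped."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Element-wise product of the window and the kernel, then sum
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Assumed example values
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

feature_map = conv2d(image, kernel)              # 3x3: [[4,3,4],[2,4,3],[2,3,4]]
feature_map_s2 = conv2d(image, kernel, stride=2) # 2x2: [[4,4],[2,4]]
```

With stride 2 the output shrinks from 3x3 to 2x2, which is where the speed and memory savings come from.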

2. Non-Linearity

• Per-element (independent)
• Options:
  • Tanh
  • Sigmoid: 1/(1+exp(-x))
  • Rectified linear unit (ReLU)
    – Simplifies backpropagation
    – Makes learning faster
    – Avoids saturation issues
    → Preferred option
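The three options and their derivatives can be compared directly. This is a small illustrative sketch; the sample inputs in `x` are assumed, chosen to include large magnitudes where tanh and sigmoid saturate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    # Derivative is 1 for positive inputs and 0 otherwise (0 chosen at x = 0)
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])   # assumed sample inputs
relu_out = relu(x)
grads = {"sigmoid": sigmoid_grad(x), "tanh": tanh_grad(x), "relu": relu_grad(x)}
```

At x = 10 the sigmoid and tanh derivatives are close to zero, while the ReLU derivative stays at 1; this is the saturation issue ReLU avoids for positive inputs.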

3. Spatial Pooling

• Sum or max
• Non-overlapping / overlapping regions
• Role of pooling:
  • Invariance to small transformations
  • Larger receptive fields (see more of input)

Figure panels: Max, Sum

Subsampling = Pooling

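Conventional sub-sampling usually picks the pixel at a fixed position, or takes the average of the pixels within the sub-sampling window.

Sub-sampling in a CNN uses max-pooling, which, in a manner similar to neurons, passes on only the strong signal and ignores the rest.

Figure and content source: http://blog.naver.com/laonple/220608018546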
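Max and sum pooling over square windows can be sketched as below. The function name `pool2d`, the 4x4 feature map values, and the default 2x2 window with stride 2 are illustrative assumptions; setting `stride` smaller than `size` gives overlapping regions.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window to one value by max or sum."""
    reduce_fn = {"max": np.max, "sum": np.sum}[mode]
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce_fn(feature_map[r:r + size, c:c + size])
    return out

# Assumed example feature map
fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]], dtype=float)

max_pooled = pool2d(fm)                      # [[6, 5], [7, 9]]
sum_pooled = pool2d(fm, mode="sum")          # [[14, 8], [10, 24]]
overlapping = pool2d(fm, size=2, stride=1)   # 3x3 output, windows overlap
```

Moving a strong response by one pixel inside a window leaves the max-pooled output unchanged, which is the invariance to small transformations mentioned above.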

4. Normalization

• Within or across feature maps
• Before or after spatial pooling

Figure panels: Feature Maps; Feature Maps After Contrast Normalization
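One simple way to normalize contrast is to subtract the mean and divide by the standard deviation, either within each feature map or across feature maps at each pixel. The sketch below is a simplified global variant and not necessarily the scheme used for the lecture's figures; the function name `contrast_normalize`, the `(num_maps, H, W)` layout, and the random input are assumptions.

```python
import numpy as np

def contrast_normalize(feature_maps, across_maps=False, eps=1e-8):
    """Zero-mean, unit-variance normalization of an array shaped
    (num_maps, H, W), within each map or across maps at each pixel."""
    axes = 0 if across_maps else (1, 2)
    mean = feature_maps.mean(axis=axes, keepdims=True)
    std = feature_maps.std(axis=axes, keepdims=True)
    return (feature_maps - mean) / (std + eps)   # eps guards against std = 0

rng = np.random.default_rng(0)
maps = rng.normal(loc=5.0, scale=3.0, size=(4, 8, 8))   # assumed input

within = contrast_normalize(maps)                    # per-map statistics
across = contrast_normalize(maps, across_maps=True)  # per-pixel statistics
```

Either variant could be applied before or after the pooling stage, since it only needs an array of feature maps as input.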

CNN Applications

• Handwritten text/digits
  • MNIST (0.17% error [Ciresan et al. 2011])
  • Arabic & Chinese [Ciresan et al. 2012]
• Simpler recognition benchmarks
  • CIFAR-10 (9.3% error [Wan et al. 2013])
  • Traffic sign recognition
    – 0.56% error vs 1.16% for humans [Ciresan et al. 2011]
• But until recently, less good at more complex datasets
  • Caltech-101/256 (few training examples)

Brain vs. CNN

Deep Learning Resources

• Google TensorFlow: https://www.tensorflow.org/
• UC Berkeley Caffe: http://caffe.berkeleyvision.org/
• Matlab Toolbox
  • https://kr.mathworks.com/discovery/deep-learning.html
  • https://kr.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
• Microsoft Cognitive Toolkit (CNTK): https://github.com/Microsoft/CNTK/wiki/KDD-2016-Tutorial
