Posted on 14-Jan-2017
From Artificial Neural Networks to Deep Learning
Viet-Trung Tran
Perceptron
• Rosenblatt, 1957
• Input signals x1, x2, …
• Bias x0 = 1
• Net input = weighted sum = Net(w,x)
• Activation/transfer function = f(Net(w,x))
• Output
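The perceptron above can be sketched in a few lines of plain Python; the AND weights below are illustrative values, not from the slides:

```python
def perceptron(weights, x):
    # Net(w, x): bias weight w0 (for x0 = 1) plus the weighted sum of inputs
    net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    # Hard-limiter (step) activation: output 1 if net >= 0, else 0
    return 1 if net >= 0 else 0

# Illustrative weights making the perceptron compute logical AND
and_weights = [-1.5, 1.0, 1.0]  # [bias w0, w1, w2]
print(perceptron(and_weights, [1, 1]))  # 1
print(perceptron(and_weights, [0, 1]))  # 0
```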
Weighted Sum and Bias
• Weighted sum: Net(w,x) = w1x1 + w2x2 + … + wnxn
• Bias: the weight w0 on the constant input x0 = 1, added to the weighted sum
Hard-limiter function
• Hard-limiter
– Threshold function
– Discontinuous function
– Discontinuous derivative
Threshold logic function
• Saturating linear function
• Continuous function
• Discontinuous derivative
Sigmoid function
• Most popular
• Output in (0, 1)
• Continuous derivative
• Easy to differentiate
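A minimal sketch of the sigmoid and its derivative; the identity f′(z) = f(z)(1 − f(z)) is what makes it easy to differentiate:

```python
import math

def sigmoid(z):
    # Output squashed into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # Easy to differentiate: f'(z) = f(z) * (1 - f(z))
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```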
Artificial neural network – ANN structure
• Number of input/output signals • Number of hidden layers • Number of neurons per layer • Neuron weights • Topology • Biases
12
Feed-forward neural network
• Connections between the units do not form a directed cycle
Recurrent neural network
• A class of artificial neural network where connections between units form a directed cycle
Why hidden layers
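The classic answer is XOR: it is not linearly separable, so no single perceptron can compute it, but one hidden layer of threshold units makes it representable. A sketch with hand-picked (not learned) weights:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit acting like OR
    h2 = step(-x1 - x2 + 1.5)   # hidden unit acting like NAND
    return step(h1 + h2 - 1.5)  # output unit: AND of the hidden units

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```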
Neural network learning
• Two types of learning
– Parameter learning: learn the neuron connection weights
– Structure learning: learn the ANN structure from training data
Error function
• Consider an ANN with n neurons
• For each training example (x,d)
– Training error caused by the current weights w
• Training error caused by w over the entire set of training examples
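Assuming the usual squared-error measure (the slide's formula image is not reproduced in this transcript), the training error over all examples can be sketched as follows, with a plain linear unit standing in for the network output:

```python
def training_error(w, examples):
    # E(w) = 1/2 * sum over examples of (d - o)^2,
    # with a linear unit o = w . x standing in for the network output
    def output(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * sum((d - output(x)) ** 2 for x, d in examples)

examples = [([1.0, 2.0], 5.0), ([2.0, 0.0], 2.0)]
print(training_error([1.0, 2.0], examples))  # 0.0: these weights fit both examples
print(training_error([0.0, 0.0], examples))  # 14.5
```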
Learning principle
Neuron error gradients
Parameter learning: backpropagation of error
• Calculate the total error at the top
• Calculate contributions to the error at each step going backwards
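A minimal sketch of the idea with one hidden and one output sigmoid neuron (scalar weights and squared error; real networks use weight matrices): the error is computed at the top, then its contribution is propagated backwards.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(w_h, w_o, x, d, lr=0.5):
    # Forward pass
    h = sigmoid(w_h * x)          # hidden activation
    o = sigmoid(w_o * h)          # network output
    # Error signal at the top: derivative of 0.5*(d - o)^2 through the sigmoid
    delta_o = (o - d) * o * (1 - o)
    # Contribution propagated backwards to the hidden neuron
    delta_h = delta_o * w_o * h * (1 - h)
    # Gradient-descent weight updates
    return w_h - lr * delta_h * x, w_o - lr * delta_o * h

w_h, w_o = 0.5, 0.5
for _ in range(2000):
    w_h, w_o = backprop_step(w_h, w_o, x=1.0, d=1.0)
out = sigmoid(w_o * sigmoid(w_h * 1.0))
print(out)  # the output has moved close to the target d = 1.0
```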
Backpropagation discussion
• Initial weights
• Learning rate
• Number of neurons per hidden layer
• Number of hidden layers
Stochastic gradient descent (SGD)
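A sketch of SGD on a one-parameter linear model y = w·x with squared loss; unlike batch gradient descent, each weight update uses a single training example:

```python
import random

def sgd(examples, lr=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(examples)       # visit examples in random order
        for x, d in examples:
            grad = (w * x - d) * x  # d/dw of 0.5 * (w*x - d)^2
            w -= lr * grad          # update after every single example
    return w

# Data generated by w = 3, so SGD should recover approximately 3.0
print(sgd([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]))
```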
Deep learning
Google Brain
GPU
Learning from tagged data
• @Andrew Ng
2006 breakthrough
• More data
• Faster hardware: GPUs, multi-core CPUs
• Working ideas on how to train deep architectures
Deep Learning trends
• @Andrew Ng
AI will transform the internet
• @Andrew Ng
• Technology areas with potential for a paradigm shift:
– Computer vision
– Speech recognition & speech synthesis
– Language understanding: machine translation; web search; dialog systems; …
– Advertising
– Personalization/recommendation systems
– Robotics
• All this is hard: scalability, algorithms.
Deep learning
CONVOLUTIONAL NEURAL NETWORK
http://colah.github.io/
Convolution
• Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
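The definition can be sketched for discrete 1-D signals, where (f ∗ g)[n] = Σ_m f[m]·g[n−m]:

```python
def convolve(f, g):
    # Discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out

print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```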
Convolutional neural networks
• A conv net is a kind of neural network that uses many identical copies of the same neuron
– Large number of neurons
– Large computational models
– Number of actual weights (parameters) to be learned stays fairly small
A 2D Convolutional Neural Network
• A convolutional neural network can learn a neuron once and use it in many places, making the model easier to learn and reducing error.
Structure of Conv Nets
• Problem: predict whether a human is speaking or not
• Input: audio samples at different points in time
Simple approach
• Just connect them all to a fully-connected layer
• Then classify
A more sophisticated approach
• Local properties of the data
– frequency of sounds (increasing/decreasing)
• Look at a small window of the audio samples
– Create a group of neurons A to compute certain features
– The output of this convolutional layer is fed into a fully-connected layer, F
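A sketch of that convolutional layer: the same neuron (here a single illustrative unit with a ReLU activation, an assumption since the slides do not fix one) slides over every small window of the audio samples, so the weights are shared across positions.

```python
def conv_layer(xs, weights, bias):
    # Apply the SAME neuron to every window of the input (shared weights)
    k = len(weights)
    outputs = []
    for start in range(len(xs) - k + 1):
        window = xs[start:start + k]
        net = bias + sum(w * v for w, v in zip(weights, window))
        outputs.append(max(0.0, net))  # ReLU activation (illustrative choice)
    return outputs

samples = [0.0, 1.0, 2.0, 1.0, 0.0]
# Weights [1, -1] fire where the signal is decreasing
print(conv_layer(samples, [1.0, -1.0], 0.0))  # [0.0, 0.0, 1.0, 1.0]
```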
Max pooling layer
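Max pooling can be sketched in one line: keep only the strongest response in each non-overlapping window, which downsamples the feature map:

```python
def max_pool(xs, size=2):
    # Keep the maximum of each non-overlapping window of length `size`
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

print(max_pool([0.0, 1.0, 3.0, 2.0, 0.5, 0.4]))  # [1.0, 3.0, 0.5]
```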
2D convolutional neural networks
Three-dimensional convolutional networks
Group of neurons: A
• A bunch of neurons in parallel
• All get the same inputs and compute different features
Network in Network (Lin et al., 2013)
Conv Nets breakthroughs in computer vision
• Krizhevsky et al. (2012)
Different Levels of Abstraction
RECURRENT NEURAL NETWORKS
http://colah.github.io/
Recurrent Neural Networks (RNN) have loops
• A loop allows information to be passed from one step of the network to the next.
Unrolling an RNN
• Recurrent neural networks are intimately related to sequences and lists.
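Unrolling can be sketched with a scalar hidden state (illustrative scalar weights; real RNNs use matrices): the same weights are applied at every step, and the hidden state carries information from one step to the next.

```python
import math

def run_rnn(xs, w_h=0.5, w_x=1.0, b=0.0):
    h = 0.0
    states = []
    for x in xs:
        # Same weights at every unrolled step; h carries information forward
        h = math.tanh(w_h * h + w_x * x + b)
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0])
print(states)  # the first input's influence persists, decaying through the states
```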
Examples
• Predict the last word in “the clouds are in the sky”
• The gap between the relevant information and the place where it is needed is small
• RNNs can learn to use the past information
• “I grew up in France… I speak fluent French.”
• As the gap grows, RNNs become unable to learn to connect the information.
LONG SHORT-TERM MEMORY NETWORKS
LSTM Networks
LSTM networks
• A special kind of RNN
• Capable of learning long-term dependencies
• Structured as a chain of repeating modules of neural network layers
RNN
• The repeating module has a very simple structure, such as a single tanh layer
• The tanh(z) function is a rescaled version of the sigmoid, and its output range is [−1, 1] instead of [0, 1].
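The rescaling can be checked numerically via the identity tanh(z) = 2·sigmoid(2z) − 1:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh is the sigmoid rescaled from (0, 1) to (-1, 1)
for z in (-2.0, -0.3, 0.0, 1.5):
    assert abs(math.tanh(z) - (2 * sigmoid(2 * z) - 1)) < 1e-12
print("identity holds")
```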
LSTM networks
• The repeating module consists of four neural network layers, interacting in a very special way
Core idea behind LSTMs
• The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
• The cell state runs straight down the entire chain, with only some minor linear interactions.
• It is easy for information to just flow along it unchanged.
Gates
• The ability to remove or add information to the cell state, carefully regulated by structures called gates
• Sigmoid
– How much of each component should be let through
– Zero means let nothing through
– One means let everything through
• An LSTM has three of these gates
LSTM step 1
• Decide what information to throw away from the cell state
• Forget gate layer
LSTM step 2
• Decide what new information to store in the cell state
• Input gate layer
LSTM step 3
• Update the old cell state, Ct−1, into the new cell state Ct
LSTM step 4
• Decide what to output
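The four steps can be sketched as a scalar LSTM cell (hypothetical scalar weights W; real LSTMs apply weight matrices to the concatenation of the previous hidden state and the input):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(c_prev, h_prev, x, W):
    z = h_prev + x                              # scalar stand-in for [h_prev, x]
    f = sigmoid(W["f"] * z + W["bf"])           # step 1: forget gate
    i = sigmoid(W["i"] * z + W["bi"])           # step 2: input gate
    c_tilde = math.tanh(W["c"] * z + W["bc"])   # step 2: candidate values
    c = f * c_prev + i * c_tilde                # step 3: new cell state Ct
    o = sigmoid(W["o"] * z + W["bo"])           # step 4: output gate
    h = o * math.tanh(c)                        # step 4: new hidden state
    return c, h

# Hypothetical weights, all 1.0 with zero biases, for illustration only
W = {"f": 1.0, "bf": 0.0, "i": 1.0, "bi": 0.0,
     "c": 1.0, "bc": 0.0, "o": 1.0, "bo": 0.0}
c, h = lstm_step(0.0, 0.0, 1.0, W)
print(c, h)
```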
RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS
APPENDIX
Perceptron 1957
Perceptron 1986
Perceptron
Activation function
Backpropagation 1974/1986
• Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.
• No successful attempts were reported before 2006… Exception: convolutional neural networks (LeCun, 1998).
• SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (a shallow architecture).
• Breakthrough in 2006!
2006 breakthrough
• More data
• Faster hardware: GPUs, multi-core CPUs
• Working ideas on how to train deep architectures
• Beat the state of the art in many areas:
– Language modeling (Mikolov et al., 2012)
– Image recognition (Krizhevsky won the 2012 ImageNet competition)
– Sentiment classification (Socher et al., 2011)
– Speech recognition (Dahl et al., 2010)
– MNIST hand-written digit recognition (Ciresan et al., 2010)
Credits
• Roelof Pieters, www.graph-technologies.com
• Andrew Ng
• http://colah.github.io/