Advanced Introduction to Machine Learning, CMU-10715 · 2014-09-17
TRANSCRIPT
Advanced Introduction to Machine Learning, CMU-10715
Perceptron, Multilayer Perceptron
Barnabás Póczos, Sept 17
Contents
• History of artificial neural networks
• Definitions: perceptron, MLP
• Representation questions
• Perceptron algorithm
• Backpropagation algorithm
Short History

Progression (1943-1960)
• First mathematical model of neurons: Pitts & McCulloch (1943)
• Beginning of artificial neural networks
• Perceptron, Rosenblatt (1958)
  - a single-layer neuron for classification
  - perceptron learning rule
  - perceptron convergence theorem

Degression (1960-1980)
• Perceptrons cannot even learn the XOR function
• No known method for training MLPs
• 1969: backpropagation… but it received little attention
Short History

Progression (1980-)
• 1986: backpropagation reinvented:
  Rumelhart, Hinton, Williams: Learning representations by back-propagating errors. Nature, 323, 533-536, 1986
• Successful applications: character recognition, autonomous cars, …
• Open questions: Overfitting? Network structure? Number of neurons? Number of layers? Bad local minima? When to stop training?
• Hopfield nets (1982), Boltzmann machines, …
Short History

Degression (1993-)
• SVM: Vapnik and his co-workers developed the Support Vector Machine (1993), a shallow architecture.
• SVMs almost killed off ANN research.
• Training deeper networks consistently yielded poor results.
• Exception: deep convolutional neural networks, Yann LeCun, 1998 (a discriminative model).
Short History

Progression (2006-)

Deep Belief Networks (DBN)
• Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
• Generative graphical model
• Based on restricted Boltzmann machines
• Can be trained efficiently

Deep autoencoder-based networks
• Bengio, Y., Lamblin, P., Popovici, P., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks, Advances in Neural Information Processing Systems 19
The Neuron
• Each neuron has a body, an axon, and many dendrites.
• A neuron can fire or rest.
• If the sum of the weighted inputs is larger than a threshold, the neuron fires.
• Synapses: the gaps between an axon and other neurons' dendrites. They determine the weights in the sum.
The Mathematical Model of a Neuron
Typical activation functions
• Identity function
• Threshold function (perceptron)
• Ramp function
Typical activation functions
• Logistic function
• Hyperbolic tangent function
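The five activations above can be written out concretely. A minimal NumPy sketch (the function names and the ±1 saturation points of the ramp are illustrative choices, not from the slides):

```python
import numpy as np

def identity(x):
    return x

def threshold(x):
    # Perceptron activation: fires (+1) iff the input reaches 0, else -1.
    return np.where(x >= 0.0, 1.0, -1.0)

def ramp(x):
    # Linear between -1 and 1, saturating outside (saturation points assumed).
    return np.clip(x, -1.0, 1.0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_act(x):
    return np.tanh(x)
```

Note that the logistic and tanh functions are smooth, which is what later makes gradient-based training (backpropagation) possible, while the threshold function is not differentiable at 0.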
Structure of Neural Networks
Fully Connected Neural Network
Input neurons, hidden neurons, output neurons
Layers
Convention: The input layer is Layer 0.
Feedforward neural networks
• Connections only between Layer i and Layer i+1
• = Multilayer perceptron (MLP)
• The most popular architecture

Recurrent NN: there are also connections going backwards.
What functions can multi-layer perceptrons represent?
What functions can multilayer perceptrons represent?

Perceptrons cannot represent the XOR function:
f(0,0)=1, f(1,1)=1, f(0,1)=0, f(1,0)=0
(as written, this is the complement of XOR; neither it nor XOR is linearly separable)
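A single perceptron cannot compute this function, but a multilayer perceptron with one hidden layer can. A sketch (the AND/NOR decomposition and the 0/1 step convention are one possible construction, not taken from the slides):

```python
def step(z):
    # Threshold unit: fires (1) iff the weighted input reaches 0.
    return 1 if z >= 0 else 0

def two_layer_xnor(x1, x2):
    """Computes f(0,0)=f(1,1)=1, f(0,1)=f(1,0)=0 with one hidden layer."""
    # Hidden layer: an AND unit and a NOR unit.
    h_and = step(x1 + x2 - 1.5)    # fires only on (1,1)
    h_nor = step(-x1 - x2 + 0.5)   # fires only on (0,0)
    # Output layer: OR of the two hidden units.
    return step(h_and + h_nor - 0.5)
```

The hidden units each compute a linearly separable subproblem; only their combination is non-separable.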
Hilbert's 13th Problem

“Solve the general 7th-degree equation using continuous functions of two parameters.”

Related conjecture:
Let f be the function of 3 arguments defined by letting f(a,b,c) be a root x of x^7 + a·x^3 + b·x^2 + c·x + 1 = 0.
Prove that f cannot be rewritten as a composition of finitely many functions of two arguments.

Another form:
Prove that there is a nonlinear continuous function of three variables that cannot be decomposed into finitely many functions of two variables.
Function decompositions

f(x,y,z) = Φ1(ψ1(x), ψ2(y)) + Φ2(c1·ψ3(y) + c2·ψ4(z), x)

[Slide diagram: inputs x, y, z feed the inner functions ψ1, …, ψ4; a weighted sum with coefficients c1, c2 and the input x feed Φ2; a final Σ node adds Φ1 and Φ2 to produce f(x,y,z).]
Function decompositions

1957: Arnold disproves Hilbert's conjecture (the Kolmogorov-Arnold representation theorem: every continuous multivariate function can be written as a composition of continuous one-variable functions and addition).
Function decompositions

Corollary:

Issues: This statement is not constructive.
For a given N we don't know …, and for a given N and f we don't know …
Universal Approximators

Kurt Hornik, Maxwell Stinchcombe and Halbert White: “Multilayer feedforward networks are universal approximators”, Neural Networks, Vol. 2(3), 359-366, 1989

Definition: Σ_N(g), neural networks with one hidden layer:

Σ_N(g) := { f̂ : f̂(x) = Σ_{i=1}^N c_i g(a_i^T x + b_i) }

Theorem: If ε > 0, g is an arbitrary sigmoid function, and f is continuous on a compact set A, then there exists f̂ ∈ Σ_N(g) such that |f(x) − f̂(x)| < ε for all x ∈ A.
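The theorem is an existence statement, but the idea behind it can be made concrete: a difference of two steep sigmoids forms a "bump", and a weighted sum of bumps is exactly a one-hidden-layer network of the form Σ_i c_i·g(a_i·x + b_i). A sketch approximating sin on [0, π] (the bump construction, the grid size, and the steepness k are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hidden_layer(x, a, b, c):
    # f_hat(x) = sum_i c_i * sigmoid(a_i * x + b_i)
    return (c[:, None] * sigmoid(a[:, None] * x[None, :] + b[:, None])).sum(axis=0)

def bump_approximator(f, lo, hi, n_bumps=50, k=200.0):
    """Build weights so each grid cell gets a 'bump'
    h_i * (sigmoid(k*(x - l_i)) - sigmoid(k*(x - r_i))),
    i.e. two hidden sigmoid units with output weights +h_i and -h_i."""
    edges = np.linspace(lo, hi, n_bumps + 1)
    l, r = edges[:-1], edges[1:]
    heights = f((l + r) / 2.0)  # sample f at cell centers
    a = np.full(2 * n_bumps, k)
    b = np.concatenate([-k * l, -k * r])
    c = np.concatenate([heights, -heights])
    return a, b, c

a, b, c = bump_approximator(np.sin, 0.0, np.pi)
xs = np.linspace(0.2, np.pi - 0.2, 100)
err = np.max(np.abs(one_hidden_layer(xs, a, b, c) - np.sin(xs)))
```

The uniform error shrinks as the grid is refined, mirroring the ε in the theorem; the price is that N grows, and the theorem says nothing about how large N must be for a given ε.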
Universal Approximators

Definition: SgnNet2, networks with two hidden layers and sgn activation:

Net^(2)(x, w) = Σ_i w_i^(3) sgn( Σ_j w_ij^(2) sgn( Σ_l w_jl^(1) x_l ) )

Theorem (Blum & Li, 1991): SgnNet2(x, w) with two hidden layers and sgn activation function is uniformly dense in L2.

Formal statement: If f ∈ L2(X), i.e. ∫ f(x)^2 dx < ∞, and ε > 0, then there exist weights w such that

∫ | f(x) − Σ_i w_i^(3) sgn( Σ_j w_ij^(2) sgn( Σ_l w_jl^(1) x_l ) ) |^2 dx < ε
Proof

Integral approximation in 1-dim: partition the domain into small intervals X_i with sample points x_i, and approximate f by the step function

f(x) ≈ Σ_i f(x_i) 1_{X_i}(x)

Integral approximation in 2-dim: the same idea, with the domain partitioned into small polygons X_i.
Proof

The indicator function of the polygon X_i can be learned by this neural network:

sgn( Σ_j a_ij sgn(b_j^T x) ) = 1 if x is in X_i, −1 otherwise

The weighted linear combination of these indicator functions is then a good approximation of the original function f.
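In one dimension the indicator construction is easy to check: two inner sgn units detect the two half-lines bounding an interval, and an outer sgn unit fires only when both agree. A sketch (the interval [a, b] and the specific weights are an illustrative choice, not from the slides):

```python
def sgn(z):
    # Sign activation with the convention sgn(0) = 1.
    return 1 if z >= 0 else -1

def interval_indicator(x, a, b):
    """Two-layer sgn network computing +1 on [a, b] and -1 outside.
    Inner units: sgn(x - a) detects x >= a, sgn(b - x) detects x <= b.
    Outer unit: sums them with bias -1, so it fires only if both fire."""
    return sgn(sgn(x - a) + sgn(b - x) - 1)
```

In d dimensions the same outer unit combines one inner sgn unit per face of the polygon X_i, which is exactly the two-hidden-layer form in the theorem.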
LEARNING: The Perceptron Algorithm
The Perceptron
[Slide diagram: a single threshold unit with outputs +1 and −1.]
The training set
The perceptron algorithm
The perceptron algorithm
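The slides present the algorithm graphically; a minimal sketch of the standard perceptron update rule (the function names and the bias-as-extra-feature trick are implementation choices, not from the slides):

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Rosenblatt's perceptron rule: on each mistake, add y_i * x_i
    to the weight vector. X: (n, d) inputs, y: (n,) labels in {-1, +1}.
    The bias is handled by appending a constant-1 feature."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # perceptron update
                mistakes += 1
        if mistakes == 0:            # separating hyperplane found
            break
    return w

def perceptron_predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

For example, on the (linearly separable) AND data X = [(0,0),(0,1),(1,0),(1,1)] with labels (−1,−1,−1,+1), the loop terminates with zero mistakes, which is what the convergence theorem below guarantees.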
Perceptron convergence theorem

Theorem:
If the training samples are linearly separable, then the algorithm finds a separating hyperplane in finitely many steps.
The upper bound on the number of steps is "independent" of the number of training samples, but depends on the dimension.

Proof: [homework]
Multilayer Perceptron
The gradient of the error
Notation
Some observations
The backpropagated error
The backpropagated error
Lemma
The backpropagated error
Therefore,
The backpropagation algorithm
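The backpropagation equations in the slides can be written out for a small two-layer network; a minimal sketch assuming logistic activations and squared error, with delta2 and delta1 standing for the backpropagated errors of the output and hidden layer (these names, and the choice of loss, are assumptions about the slides' setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    # Hidden layer a1 and output layer a2, both with logistic activations.
    a1 = sigmoid(W1 @ x)
    a2 = sigmoid(W2 @ a1)
    return a1, a2

def backprop(x, t, W1, W2):
    """One backprop pass for the squared error E = 0.5 * ||a2 - t||^2.
    Returns the gradients dE/dW2 and dE/dW1."""
    a1, a2 = forward(x, W1, W2)
    delta2 = (a2 - t) * a2 * (1.0 - a2)          # output-layer error
    delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)   # error propagated backwards
    return np.outer(delta2, a1), np.outer(delta1, x)
```

Gradient descent then updates W1 and W2 in the direction opposite to these gradients; the gradients themselves can be verified against finite differences of E.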