Learning via Neural Networks
L. Manevitz. All rights reserved.
Natural versus Artificial Neuron
[Figure: a natural neuron shown beside its McCulloch-Pitts abstraction.]
Definitions and History
• McCulloch-Pitts Neuron
• Perceptron
• Adaline
• Linear Separability
• Multi-Level Neurons
• Neurons with Loops
Sample Feed-forward Network (No loops)
[Figure: an input layer feeds a hidden layer through weights W_ji, which feeds an output layer through weights V_ik; each unit computes F(Σ_j w_ji x_j).]
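To make the figure concrete, here is a minimal sketch of one forward pass through such a loop-free network (layer sizes and weights are made up; numpy stands in for the matrix algebra):

```python
import numpy as np

def sigmoid(s):
    # A smooth choice for the squashing function F.
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W, V):
    """One pass through a two-weight-layer feed-forward network:
    hidden unit i computes F(sum_j W[i, j] * x[j]), and likewise for outputs."""
    hidden = sigmoid(W @ x)
    return sigmoid(V @ hidden)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weights W_ji: 3 inputs -> 4 hidden units
V = rng.normal(size=(2, 4))   # weights V_ik: 4 hidden units -> 2 outputs
print(forward(np.array([1.0, 0.0, 1.0]), W, V))
```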
Perceptron
• Pattern identification (note: the neuron is trained)
[Figure: a receptive field of inputs x_i feeding one neuron through weights w_i.]
The unit fires, Σ w_i x_i ≥ θ (threshold), exactly when the letter A is in the receptive field.
Feed Forward Network
[Figure: several such threshold units (Σ w_i x_i ≥ θ) connected in layers by weights.]
Reason for Explosion of Interest
• Two coincident effects (around 1985–87)
– (Re-)discovery of mathematical tools and algorithms for handling large networks
– Availability (hurray for Intel and company!) of sufficient computing power to make experiments practical.
Some Properties of NNs
• Universal: Can represent and accomplish any task.
• Uniform: “Programming” is changing weights
• Automatic: Algorithms for Automatic Programming; Learning
Networks are Universal
• All logical functions can be represented by a three-level (loop-free) network (McCulloch-Pitts)
• All continuous (and more) functions can be represented by three-level feed-forward networks (Cybenko et al.)
• Networks can self organize (without teacher).
• Networks serve as associative memories
Universality
• McCulloch-Pitts: adaptive logic gates; can represent any logic function
• Cybenko: Any continuous function representable by three-level NN.
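A small numerical illustration of the Cybenko-style claim (this is not the proof): fix a hidden layer of random sigmoid units and solve for the output weights by least squares; the three-level network then tracks a continuous target closely. All sizes and constants below are arbitrary choices.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)            # sample points of a continuous target
target = np.sin(x)

# Middle level: 30 sigmoid units with random input weights and biases.
w = rng.normal(scale=2.0, size=30)
b = rng.normal(scale=2.0, size=30)
H = sigmoid(np.outer(x, w) + b)        # 200 x 30 hidden activations

# Top level: linear output weights, solved directly by least squares.
v, *_ = np.linalg.lstsq(H, target, rcond=None)
print("max error:", np.max(np.abs(H @ v - target)))  # typically well under 0.1
```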
Networks can “LEARN” and Generalize (Algorithms)
• One Neuron (Perceptron and Adaline): very popular in the 1960s – early 70s; limited by representability (only linearly separable concepts)
• Feed-forward networks (Back Propagation): currently the most popular network (1987 – now)
• Kohonen Self-Organizing Network (1980s – now) (loops)
• Attractor Networks (loops)
Learnability (Automatic Programming)
• One neuron: Perceptron and Adaline algorithms (Rosenblatt and Widrow-Hoff) (1960s – now)
• Feed-forward networks: Backpropagation (1987 – now)
• Associative Memories and Looped Networks (“Attractors”) (1990s – now)
Generalizability
• Typically, train a network on a sample set of examples
• Then use it on the general class
• Training can be slow, but execution is fast
Feed Forward Network
[Figure: several such threshold units (Σ w_i x_i ≥ θ) connected in layers by weights.]
Classical Applications (1986 – 1997)
• “Net Talk” : text to speech
• ZIPcodes: handwriting analysis
• Glovetalk: Sign Language to speech
• Data and Picture Compression: “Bottleneck”
• Steering of Automobile (up to 55 m.p.h.)
• Market Predictions
• Associative Memories
• Cognitive Modeling: (especially reading, …)
• Phonetic Typewriter (Finnish)
Neural Network
• Once the architecture is fixed, the only free parameters are the weights
• Thus: Uniform Programming
• Potentially: Automatic Programming
• Search for Learning Algorithms
Programming: Just find the weights!
• AUTOMATIC PROGRAMMING
• One Neuron: Perceptron or Adaline
• Multi-Level: Gradient Descent on Continuous Neuron (Sigmoid instead of step function).
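A minimal sketch of gradient descent on a single continuous (sigmoid) neuron, assuming a squared-error cost; the task, learning rate, and iteration count are illustrative:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Toy task: learn OR with one sigmoid neuron (bias coordinate 1 appended).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

w = np.zeros(3)
lr = 1.0
for _ in range(2000):
    out = sigmoid(X @ w)
    # d(squared error)/dw, using the sigmoid's derivative out * (1 - out).
    grad = X.T @ ((out - y) * out * (1 - out))
    w -= lr * grad

print(np.round(sigmoid(X @ w)))  # typically [0. 1. 1. 1.]
```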
Prediction
[Diagram: the input/output stream feeds the NN through a delay; the network’s prediction is compared with the value that actually arrives.]
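One way to realize this diagram (a sketch; the delay-line length and the signal are placeholders, and the simplest possible "network", a single linear neuron fit by least squares, stands in for the NN box):

```python
import numpy as np

series = np.sin(np.linspace(0, 20, 400))   # stand-in for an observed signal

k = 5                                      # delay: predict x[t] from x[t-k..t-1]
X = np.array([series[t - k:t] for t in range(k, len(series))])
y = series[k:]                             # the values that actually arrive

w, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit the predictor
print("mean |prediction - actual|:", np.mean(np.abs(X @ w - y)))
```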
Abstracting
• Note: size may not be crucial (an Aplysia or a crab does many things)
• Look at simple structures first
Real and Artificial Neurons
One Neuron: McCulloch-Pitts
• This is very complicated. But abstracting the details, we have:
[Figure: inputs x1 … xn feed in through weights w1 … wn; the unit integrates the weighted sum Σ w_i x_i and fires if it reaches a threshold (the integrate-and-fire neuron).]
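In code the abstraction is just a weighted sum and a threshold (a sketch; the weights and threshold below are placeholders):

```python
def mp_neuron(x, w, theta):
    """McCulloch-Pitts unit: fire (output 1) iff the integrated
    input sum(w_i * x_i) reaches the threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

print(mp_neuron([1, 0, 1], w=[0.5, 0.5, 0.5], theta=1.0))  # -> 1
```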
Representability
• What functions can be represented by a network of McCulloch-Pitts neurons?
• Theorem: every logic function of an arbitrary number of variables can be represented by a three-level network of neurons.
Proof
• Show simple functions: and, or, not, implies
• Recall representability of logic functions by DNF (disjunctive normal form); a concrete instance is sketched below.
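A concrete instance of the construction (XOR, whose DNF is (x AND NOT y) OR (NOT x AND y); weights and thresholds chosen by hand):

```python
def mp(x, w, theta):
    # McCulloch-Pitts threshold unit.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def xor_net(x, y):
    # Levels 1-2: one AND-like unit per DNF term (negated literals get weight -1).
    t1 = mp([x, y], [1, -1], 1)    # x AND (NOT y): fires iff x - y >= 1
    t2 = mp([x, y], [-1, 1], 1)    # (NOT x) AND y
    # Level 3: OR the terms together.
    return mp([t1, t2], [1, 1], 1)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor_net(x, y))   # 1 exactly when x != y
```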
AND, OR, NOT
[Figure: AND as a single threshold unit: weights 1.0 and 1.0, threshold 1.5, so it fires only when both inputs are on.]
AND, OR, NOT
[Figure: OR as a single threshold unit: weights 1.0 and 1.0, threshold 0.9, so it fires when at least one input is on.]
AND, OR, NOT
[Figure: NOT as a single threshold unit: weight -1.0, threshold -0.5, so it fires exactly when its input is off.]
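A quick check of the three units, using the weights and thresholds from these slides (the NOT threshold is taken as -0.5, which makes the unit fire exactly on input 0):

```python
def mp(x, w, theta):
    # Fire iff sum(w_i * x_i) >= theta.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

AND = lambda x, y: mp([x, y], [1.0, 1.0], 1.5)
OR  = lambda x, y: mp([x, y], [1.0, 1.0], 0.9)
NOT = lambda x:    mp([x],    [-1.0],    -0.5)

for x in (0, 1):
    print("NOT", x, "=", NOT(x))
    for y in (0, 1):
        print(x, y, "AND:", AND(x, y), "OR:", OR(x, y))
```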
DNF and All Functions
Learnability and Generalizability
• The previous theorem tells us that neural networks are potentially powerful, but doesn’t tell us how to use them.
• We desire simple networks with uniform training rules.
One Neuron (Perceptron)
• What can be represented by one neuron?
• Is there an automatic way to learn a function by examples?
Perceptron Training Rule
• Loop: take an example and apply it to the network. If the answer is correct, return to Loop; if incorrect, go to FIX.
• FIX: adjust the network weights by the input example (add it for a missed positive, subtract it for a missed negative). Go to Loop.
Example of Perceptron Learning
• x1 = 1 (+), x2 = -.5 (-)
• x3 = 3 (+), x4 = -2 (-)
• Expanded vectors (a bias coordinate 1 appended):
  – y1 = (1, 1) (+), y2 = (-.5, 1) (-)
  – y3 = (3, 1) (+), y4 = (-2, 1) (-)
• Random initial weight: w1 = (-2.5, 1.75)
Trace of Perceptron
• w1·y1 = (-2.5, 1.75)·(1, 1) < 0: wrong
• w2 = w1 + y1 = (-1.5, 2.75); w2·y2 = (-1.5, 2.75)·(-.5, 1) > 0: wrong
• w3 = w2 - y2 = (-1, 1.75); w3·y3 = (-1, 1.75)·(3, 1) < 0: wrong
• w4 = w3 + y3 = (2, 2.75)
Perceptron Convergence Theorem
• If the concept is representable in a perceptron, then the perceptron learning rule will converge in a finite amount of time.
• (Made precise and proved below.)
Percentage of Boolean Functions Representable by a Perceptron

| Inputs | Boolean functions | Representable by a perceptron |
|--------|-------------------|-------------------------------|
| 1 | 4 | 4 |
| 2 | 16 | 14 |
| 3 | 256 | 104 |
| 4 | 65,536 | 1,882 |
| 5 | ~10**9 | 94,572 |
| 6 | ~10**19 | 15,028,134 |
| 7 | ~10**38 | 8,378,070,864 |
| 8 | ~10**77 | 17,561,539,552,946 |
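The n = 2 row is small enough to verify by brute force; the weight/bias grid below is an ad-hoc choice that happens to suffice for two inputs:

```python
from itertools import product

inputs = list(product((0, 1), repeat=2))     # the four input pairs
weights = [-1, 0, 1]
biases = [-1.5, -0.5, 0.5, 1.5]

separable = set()
for w1, w2, b in product(weights, weights, biases):
    f = tuple(1 if w1 * x + w2 * y + b > 0 else 0 for x, y in inputs)
    separable.add(f)                         # truth table realized by this unit

print(len(separable), "of", 2 ** 4)          # -> 14 of 16 (XOR, XNOR missing)
```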
What is a Neural Network?
• What is an abstract Neuron?
• What is a Neural Network?
• How are they computed?
• What are the advantages?
• Where can they be used?
– Agenda
– What to expect
Perceptron Algorithm
• Start: choose an arbitrary value for the weights W
• Test: choose an arbitrary example X
  – If X pos and W·X > 0, or X neg and W·X <= 0, go to Test
• Fix:
  – If X pos: W := W + X
  – If X neg: W := W - X
  – Go to Test
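A direct transcription of Start/Test/Fix (a sketch: the random choice of example is replaced by cycling through them, and the number of Fix steps is capped, since the loop only halts when the data are separable):

```python
import numpy as np

def perceptron(pos, neg, w0, max_fixes=1000):
    """Start/Test/Fix loop. pos, neg: sequences of (augmented) example vectors."""
    w = np.asarray(w0, dtype=float).copy()
    examples = ([(np.asarray(x, dtype=float), +1) for x in pos] +
                [(np.asarray(x, dtype=float), -1) for x in neg])
    fixes = 0
    while fixes < max_fixes:
        clean_pass = True
        for x, sign in examples:                 # Test
            s = w @ x
            if (sign > 0 and s > 0) or (sign < 0 and s <= 0):
                continue                         # correct: back to Test
            w += sign * x                        # Fix: W := W + X or W := W - X
            fixes += 1
            clean_pass = False
        if clean_pass:
            return w                             # a full pass with nothing to fix
    raise RuntimeError("no convergence; data may not be linearly separable")
```

The next slides run exactly this loop by hand on AND.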
Perceptron Conv. Thm.
• Let F be a set of unit-length vectors. If there is a vector V* and a value e > 0 such that V*·X > e for all X in F, then the perceptron program goes to FIX only a finite number of times.
Applying Algorithm to “And”
• W0 = (0,0,1) (or random)
• X1 = (0,0,1), desired result 0
• X2 = (0,1,1), desired result 0
• X3 = (1,0,1), desired result 0
• X4 = (1,1,1), desired result 1
(Inputs are augmented with a final bias coordinate of 1.)
“And” continued
• W0·X1 = 1 > 0: wrong (X1 is negative)
• W1 = W0 - X1 = (0,0,0)
• W1·X2 = 0: OK (boundary)
• W1·X3 = 0: OK
• W1·X4 = 0: wrong (X4 is positive, needs > 0)
• W2 = W1 + X4 = (1,1,1)
• W2·X1 = 1: wrong
• W3 = W2 - X1 = (1,1,0)
• W3·X2 = 1: wrong
• W4 = W3 - X2 = (1,0,-1)
• W4·X3 = 0: OK
• W4·X4 = 0: wrong
• W5 = W4 + X4 = (2,1,0)
• W5·X1 = 0: OK
• W5·X2 = 1: wrong
• W6 = W5 - X2 = (2,0,-1)
“And” page 3
• W6·X3 = 1: wrong
• W7 = W6 - X3 = (1,0,-2)
• W7·X4 = -1: wrong
• W8 = W7 + X4 = (2,1,-1)
• W8·X1 = -1: OK
• W8·X2 = 0: OK
• W8·X3 = 1: wrong
• W9 = W8 - X3 = (1,1,-2)
• W9·X4 = 0: wrong
• W10 = W9 + X4 = (2,2,-1)
• W10·X1 = -1: OK
• W10·X2 = 1: wrong
• W11 = W10 - X2 = (2,1,-2)
• W11·X3 = 0: OK
• W11·X4 = 1: OK, and W11 now classifies all four examples correctly, so the run halts at W = (2,1,-2).
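Running the same Start/Test/Fix cycle in code from W0 = (0,0,1), sweeping X1..X4 in order, reproduces this trace and halts:

```python
import numpy as np

X = [np.array(v, dtype=float) for v in [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
signs = [-1, -1, -1, +1]          # AND: positive only on (1,1,1)

w = np.array([0.0, 0.0, 1.0])     # W0
changed = True
while changed:
    changed = False
    for x, sign in zip(X, signs):
        s = w @ x
        if not ((sign > 0 and s > 0) or (sign < 0 and s <= 0)):
            w += sign * x         # Fix step
            changed = True

print(w)                          # -> [ 2.  1. -2.]
```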
Proof of Conv Theorem
• Notes:
1. By hypothesis, there is an e > 0 such that V*·X > e for all X in F.
2. We can eliminate the threshold by adding an additional dimension to the input: W·(x,y,z) > threshold if and only if W*·(x,y,z,1) > 0, where W* extends W with -threshold as the new weight.
3. We can assume all examples are positive ones, replacing each negative example by its negated vector: W·(x,y,z) < 0 if and only if W·(-x,-y,-z) > 0.
Proof (cont).
• Consider the quotient V*·W/|W|. (Note: this is the multidimensional cosine between V* and W.) Recall V* is a unit vector, so the quotient is <= 1.
Proof (cont)
• Each time FIX is visited, W changes by adding an example X (all examples now positive):
V*·W(n+1) = V*·(W(n) + X) = V*·W(n) + V*·X >= V*·W(n) + e
Hence, after n visits to FIX,
V*·W(n) >= n·e   (*)
Proof (cont)
• Now consider the denominator. Starting from W(0) = 0:
|W(n+1)|**2 = W(n+1)·W(n+1) = (W(n) + X)·(W(n) + X) = |W(n)|**2 + 2·W(n)·X + 1   (recall |X| = 1)
<= |W(n)|**2 + 1, since FIX is entered only when W(n)·X <= 0.
So after n visits to FIX:
|W(n)|**2 <= n   (**)
Proof (cont)
• Putting (*) and (**) together:
Quotient = V*·W(n)/|W(n)| >= n·e/sqrt(n) = sqrt(n)·e
Since Quotient <= 1, this means
n <= 1/e**2
so we enter FIX only a bounded number of times.
Q.E.D.
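The whole argument compresses to two inequalities (written in LaTeX for readability, with ε playing the role of e):

```latex
\begin{align*}
V^* \cdot W(n) &\ge n\varepsilon & (*)\\
\lVert W(n)\rVert^2 &\le n & (**)\\
1 \ \ge\ \frac{V^* \cdot W(n)}{\lVert W(n)\rVert}
  \ &\ge\ \frac{n\varepsilon}{\sqrt{n}} \ =\ \sqrt{n}\,\varepsilon
  \quad\Longrightarrow\quad n \le \frac{1}{\varepsilon^{2}}.
\end{align*}
```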
Geometric Proof
• See hand slides.
Additional Facts
• Note: if the X’s are presented in a systematic way (e.g. cycling through them), a solution W is always found.
• Note: the solution is not necessarily the same as V*.
• Note: if F is not finite, we may not obtain a solution in finite time.
• The algorithm can be modified in minor ways and remains valid (e.g. bounded rather than unit-length examples); only the bound on the number of FIX steps changes.
What won’t work?
• Example: connectedness cannot be recognized by a diameter-limited perceptron.
• Compare with convexity, which can be (using sensors of order three).
What won’t work?
• Try XOR.
Limitations of Perceptron
• Representability
– Only concepts that are linearly separable.
– Compare: Convex versus connected
– Examples: XOR vs OR
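A quick experiment showing the failure mode (a sketch; the sweep cap is arbitrary, since on XOR the Fix loop never terminates):

```python
import numpy as np

# XOR with augmented inputs (x, y, 1): positive iff exactly one of x, y is 1.
X = [np.array(v, dtype=float) for v in [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
signs = [-1, +1, +1, -1]

w = np.zeros(3)
fixes = 0
for _ in range(10_000):                     # cap the sweeps: XOR never converges
    for x, sign in zip(X, signs):
        s = w @ x
        if not ((sign > 0 and s > 0) or (sign < 0 and s <= 0)):
            w += sign * x
            fixes += 1

print("fixes after 10,000 sweeps:", fixes)  # keeps growing: not linearly separable
```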