ADALINE, MADALINE and the Widrow-Hoff Rule
Adaptive Linear Combiner
ADALINE
MADALINE
Minimal Disturbance Principle
• Adjust the weights to reduce the error on the current pattern with minimal disturbance to the patterns already learnt.
• In other words, make the weight change in the same direction as the input vector.
Learning Rules
Error Correction – Single Element Network
Perceptron Convergence Rule (Non-Linear)

Weight update: w_{k+1} = w_k + α (ε̃_k / 2) x_k

Quantizer error: ε̃_k = d_k − y_k, where y_k = sgn(s_k) = sgn(w_kᵀ x_k)
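Below is a minimal Python sketch of this rule, assuming bipolar (±1) desired responses and a sgn quantizer in the form above; the `perceptron_step` helper and the toy patterns are illustrative, not from the slides.

```python
# Hedged sketch of the perceptron convergence rule: adapt only when the
# quantized output disagrees with the desired response.
import numpy as np

def perceptron_step(w, x, d, alpha=0.1):
    y = np.sign(w @ x)                  # quantizer (sgn) output
    eps = d - y                         # quantizer error: 0 when correct
    return w + alpha * (eps / 2) * x    # non-linear update: zero if eps == 0

# Toy usage on two linearly separable bipolar patterns (x[0] = 1 is a bias).
rng = np.random.default_rng(0)
w = rng.normal(size=3)
patterns = [(np.array([1.0, 1.0, 1.0]), 1.0),
            (np.array([1.0, -1.0, -1.0]), -1.0)]
for _ in range(20):
    for x, d in patterns:
        w = perceptron_step(w, x, d)
print([float(np.sign(w @ x)) for x, _ in patterns])   # expected: [1.0, -1.0]
```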
Geometric Visualization of the Perceptron Convergence Rule
α-LMS (Linear)
Weight update equation: w_{k+1} = w_k + α ε_k x_k / |x_k|²

Error for the k-th input pattern: ε_k = d_k − w_kᵀ x_k

Change in error for the k-th input pattern after the weights have been updated: Δε_k = −α ε_k, i.e. each update shrinks the current pattern's error by the factor (1 − α).

Condition for convergence and stability: 0 < α < 2 (a practical range is 0.1 to 1).
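A minimal Python sketch of α-LMS under these definitions; it verifies numerically that one update shrinks the current pattern's error by exactly (1 − α). The toy values are illustrative.

```python
# alpha-LMS: step along the input direction, normalized by |x|^2, so the
# error reduction per update is alpha regardless of the input's magnitude.
import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):
    eps = d - w @ x                        # linear error for this pattern
    return w + alpha * eps * x / (x @ x)   # minimal-disturbance update

rng = np.random.default_rng(1)
w = rng.normal(size=4)
x = np.array([1.0, -1.0, 1.0, 1.0])
d = 2.0
before = d - w @ x
w = alpha_lms_step(w, x, d, alpha=0.5)
after = d - w @ x
print(after / before)                      # 0.5 == (1 - alpha)
```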
Error Correction Rules for Multi-Layer Networks
Madaline Rule I (Non-Linear)
Steps:
• If the output matches the desired response: no adaptation.
• If the output is different:
- Find the adaline whose linear sum is closest to 0.
- Adapt its weights in the LMS direction far enough to reverse its output.
- Load sharing: repeat with the next-closest adaline until the desired response is obtained (see the sketch below).
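The following is a hedged Python sketch of MR I for a small majority-vote madaline; the three-adaline layout, the `madaline`/`mr1_step` names, and the choice α = 1 (which drives the selected adaline's sum exactly to the target, reversing its output) are illustrative assumptions, not details from the slides.

```python
import numpy as np

sgn = lambda s: np.where(s >= 0, 1.0, -1.0)

def madaline(W, x):
    """Majority-vote MADALINE: one row of W per adaline."""
    s = W @ x                         # linear sums of all adalines
    return sgn(sgn(s).sum()), s

def mr1_step(W, x, d, alpha=1.0):
    y, s = madaline(W, x)
    if y == d:
        return W                      # correct response: no adaptation
    # Load sharing: adapt adalines in order of |linear sum|, closest to
    # zero first, until the madaline output matches the desired response.
    for i in np.argsort(np.abs(s)):
        eps = d - s[i]                # drive this adaline's sum toward d
        W[i] = W[i] + alpha * eps * x / (x @ x)  # LMS direction; reverses output
        y, s = madaline(W, x)
        if y == d:
            break
    return W

# Toy usage: three adalines, bipolar input with x[0] = 1 acting as a bias.
rng = np.random.default_rng(2)
W = rng.normal(size=(3, 3))
x = np.array([1.0, 1.0, -1.0])
W = mr1_step(W, x, d=-1.0)
print(madaline(W, x)[0])              # -1.0
```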
Madaline Rule II (Non-Linear)
Steps (for one training pattern):
• Similar to MR I.
• Trial adaptation: add a small perturbation of suitable amplitude and polarity to flip an adaline's output.
• If the output Hamming error is reduced, change that adaline's weights in a direction collinear with the input so the reversal becomes permanent; otherwise, no adaptation.
• Keep doing this for all adalines with sufficiently small linear-output magnitude.
• Finally, the last layer is adapted using α-LMS.
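Below is a hedged Python sketch of MR II on a two-layer network of signum adalines; the layer shapes, the trial-flip implementation, and the final α-LMS step on the output layer are illustrative assumptions.

```python
# Madaline Rule II: trial adaptation. Tentatively flip the output of the
# hidden adaline whose linear sum is smallest in magnitude; keep the flip
# (by adapting weights collinear with the input) only if it reduces the
# output Hamming error.
import numpy as np

sgn = lambda s: np.where(s >= 0, 1.0, -1.0)

def forward(W1, W2, x):
    h = sgn(W1 @ x)                  # hidden adaline outputs
    return sgn(W2 @ h), h

def hamming(y, d):
    return np.sum(y != d)            # output Hamming error

def mr2_step(W1, W2, x, d, alpha=1.0):
    y, h = forward(W1, W2, x)
    s = W1 @ x
    for i in np.argsort(np.abs(s)):  # smallest |linear sum| first
        if hamming(y, d) == 0:
            break
        h_trial = h.copy()
        h_trial[i] = -h_trial[i]     # trial perturbation: flip one output
        if hamming(sgn(W2 @ h_trial), d) < hamming(y, d):
            eps = h_trial[i] - s[i]                    # drive sum across zero
            W1[i] = W1[i] + alpha * eps * x / (x @ x)  # collinear with input
            y, h = forward(W1, W2, x)
            s = W1 @ x
    # Finally, adapt the last layer with alpha-LMS toward the targets.
    y, h = forward(W1, W2, x)
    W2 = W2 + alpha * np.outer(d - W2 @ h, h) / (h @ h)
    return W1, W2

# Toy usage: 3 inputs (x[0] = 1 as bias), 4 hidden adalines, 2 outputs.
rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x, d = np.array([1.0, 1.0, -1.0]), np.array([1.0, -1.0])
W1, W2 = mr2_step(W1, W2, x, d)
```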
Steepest Descent – Single Element Network
Error Surface of a Linear Combiner
The Optimal Wiener-Hopf Weight

The squared error can be written as:

ε_k² = d_k² − 2 d_k x_kᵀ w + wᵀ x_k x_kᵀ w

Taking the expectation of the above expression yields:

E[ε_k²] = E[d_k²] − 2 pᵀ w + wᵀ R w

where p = E[d_k x_k] is the cross-correlation vector and R = E[x_k x_kᵀ] is the input correlation matrix.
So the MSE surface equation is:

ξ(w) = E[d_k²] − 2 pᵀ w + wᵀ R w

with the global optimal weight solution:

w* = R⁻¹ p
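As a sanity check, the Wiener-Hopf solution can be computed numerically by estimating R and p from samples; this minimal sketch assumes a known linear target for the desired response.

```python
# Estimate R = E[x x^T] and p = E[d x] from data, then solve R w* = p.
import numpy as np

rng = np.random.default_rng(3)
w_true = np.array([2.0, -1.0, 0.5])                 # illustrative target
X = rng.normal(size=(10000, 3))                     # input patterns x_k
d = X @ w_true + 0.01 * rng.normal(size=10000)      # desired responses d_k

R = X.T @ X / len(X)                                # input correlation matrix
p = X.T @ d / len(X)                                # cross-correlation vector
w_star = np.linalg.solve(R, p)                      # global MSE minimum
print(w_star)                                       # close to w_true
```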
Gradient Descent Algorithm
The aim of gradient descent is to update the weights in the direction of the negative gradient, scaled by a factor μ that controls the stability and convergence of the algorithm; ∇_k is the gradient at the point on the MSE surface corresponding to w = w_k. For the quadratic MSE surface above, ∇_k = 2(R w_k − p).

w_{k+1} = w_k + μ(−∇_k)
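A minimal sketch of this iteration on the quadratic surface, using the exact gradient ∇_k = 2(R w_k − p); the example R and p are illustrative.

```python
# Steepest descent on xi(w) = E[d^2] - 2 p.w + w.R.w; converges to R^{-1} p
# for 0 < mu < 1/lambda_max(R).
import numpy as np

def steepest_descent(R, p, w0, mu=0.05, steps=200):
    w = w0.copy()
    for _ in range(steps):
        grad = 2.0 * (R @ w - p)     # exact gradient of the MSE surface
        w = w + mu * (-grad)         # w_{k+1} = w_k + mu * (-grad_k)
    return w

R = np.array([[2.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, 0.5])
print(steepest_descent(R, p, np.zeros(2)))   # approaches...
print(np.linalg.solve(R, p))                 # ...the Wiener-Hopf solution
```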
μ-LMS (Linear)
• It uses the instantaneous gradient, i.e. the gradient of the squared error of the current training sample, as an approximation of the true gradient of the MSE surface: ∇̂_k = ∂(ε_k²)/∂w_k = −2 ε_k x_k.
• The instantaneous gradient can be computed from the current sample alone, with no need to average instantaneous gradients over all patterns in the training set. The resulting update is w_{k+1} = w_k + 2μ ε_k x_k.
• For stability and convergence we need 0 < μ < 1/λ_max, where λ_max is the largest eigenvalue of R; in practice the stricter bound 0 < μ < 1/tr(R) is used, since tr(R) ≥ λ_max.
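A minimal sketch of μ-LMS under these definitions; the toy identity-correlation inputs (so tr(R) = 2) are an illustrative assumption.

```python
# mu-LMS: the instantaneous gradient -2 eps_k x_k of the current sample's
# squared error stands in for the true gradient of the MSE surface.
import numpy as np

def mu_lms_step(w, x, d, mu=0.01):
    eps = d - w @ x
    return w + 2 * mu * eps * x      # w_{k+1} = w_k + mu * (-inst. gradient)

rng = np.random.default_rng(4)
w_true = np.array([1.0, -2.0])
w = np.zeros(2)
for _ in range(5000):
    x = rng.normal(size=2)           # R = I, so tr(R) = 2 and mu < 0.5 is safe
    d = w_true @ x
    w = mu_lms_step(w, x, d)
print(w)                             # converges near w_true
```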
Madaline Rule III (Non-Linear)
Steps:
• A small perturbation Δs is added to the adaline's linear sum (the input to the nonlinearity).
• The change in error and the change in output caused by this perturbation are measured.
• From this change in the output error with respect to the perturbation, the instantaneous gradient can be calculated.
• The method is mathematically equivalent to backpropagation when the perturbation is small.
Approximate gradient: ∇̂_k = ∂(ε_k²)/∂w_k

Since ε_k = d_k − f(s_k) and s_k = w_kᵀ x_k, we therefore have: ∇̂_k = 2 ε_k (∂ε_k/∂s_k) x_k

So for a small perturbation Δs: ∂ε_k/∂s_k ≈ Δε_k/Δs

So the weight update equation is thus:

w_{k+1} = w_k − 2μ ε_k (Δε_k/Δs) x_k
• No need to know a priori the nature of the activation function.
• Robust to drifts in analog hardware.

Alternatively, since Δ(ε_k²) ≈ 2 ε_k Δε_k for a small perturbation, the weight update equation can also be written as:

w_{k+1} = w_k − μ (Δ(ε_k²)/Δs) x_k
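A hedged Python sketch of MR III on a single adaline, where tanh stands in for an unknown or analog activation accessed only as a black box; the perturbation size, learning rate, and toy target are illustrative.

```python
# Madaline Rule III, single element: estimate the gradient by perturbing the
# linear sum and measuring the change in error; no derivative of f is used.
import numpy as np

f = np.tanh                            # stand-in for a black-box activation

def mr3_step(w, x, d, mu=0.1, ds=1e-4):
    s = w @ x
    eps = d - f(s)                     # error at the operating point
    deps = (d - f(s + ds)) - eps       # measured change in error
    grad = 2 * eps * (deps / ds) * x   # approx. gradient of eps^2 wrt w
    return w + mu * (-grad)            # w_{k+1} = w_k - 2 mu eps (deps/ds) x

w = np.zeros(2)
x = np.array([1.0, 0.5])
for _ in range(200):
    w = mr3_step(w, x, d=0.8)
print(f(w @ x))                        # approaches the desired response 0.8
```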
Steepest Descent – Multi-Layer Networks
Madaline Rule III
Steps:
• Same as for a single element, except that the change due to the perturbation is measured at the output of multiple layers.
• Add the perturbation to the linear sum of the chosen adaline.
• Measure the change in the sum of squared output errors caused by this perturbation.
• Obtain the instantaneous gradient of the MSE with respect to the weight vector of the perturbed adaline (see the sketch below).
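A hedged Python sketch of MR III for a two-layer network of tanh adalines, adapting one hidden layer; the shapes, the single training pattern, and the decision to leave the output layer fixed are illustrative simplifications.

```python
# Madaline Rule III, multi-layer: perturb one hidden adaline's linear sum,
# measure the change in the network's sum of squared errors, and use
# (delta E / delta s) * x as the instantaneous gradient for that adaline.
import numpy as np

f = np.tanh

def forward(W1, W2, x, perturb=(None, 0.0)):
    i, ds = perturb
    s1 = W1 @ x
    if i is not None:
        s1 = s1.copy()
        s1[i] += ds                    # perturbation on one linear sum
    return f(W2 @ f(s1))

def mr3_layer_step(W1, W2, x, d, mu=0.1, ds=1e-4):
    for i in range(W1.shape[0]):       # perturb each hidden adaline in turn
        e0 = np.sum((d - forward(W1, W2, x)) ** 2)
        e1 = np.sum((d - forward(W1, W2, x, (i, ds))) ** 2)
        grad_i = ((e1 - e0) / ds) * x  # instantaneous gradient wrt W1[i]
        W1[i] = W1[i] + mu * (-grad_i)
    return W1

rng = np.random.default_rng(6)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
x, d = np.array([1.0, -1.0]), np.array([0.5, -0.5])
for _ in range(300):
    W1 = mr3_layer_step(W1, W2, x, d)
print(forward(W1, W2, x))              # outputs move toward d
```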
Relevance to Present-Day Work
• μ-LMS and α-LMS are still used today.
• MR-III and MR-II can be applied to complicated architectures.
• Given an arbitrary activation function, MR-III can be used without requiring the activation function to be known analytically.