Hassoun Chap3 Perceptron
Post on 03-Jun-2018
8/11/2019 1 Hassoun Chap3 Perceptron
1/10
Fundamentals of Artificial Neural Networks
Mohamad H. Hassoun
3 Learning Rules

One of the most significant attributes of a neural network is its ability to learn by interacting with its environment or with an information source. Learning in a neural network is normally accomplished through an adaptive procedure, known as a learning rule or algorithm, whereby the weights of the network are incrementally adjusted so as to improve a predefined performance measure over time.

In the context of artificial neural networks, the process of learning is best viewed as an optimization process. More precisely, the learning process can be viewed as a search in a multidimensional parameter (weight) space for a solution which gradually optimizes a prespecified objective (criterion) function. This view is adopted in this chapter, and it allows us to unify a wide range of existing learning rules which otherwise would have looked more like a diverse variety of learning procedures.

This chapter presents a number of basic learning rules for supervised, reinforcement, and unsupervised learning tasks. In supervised learning (also known as learning with a teacher or associative learning), each input pattern/signal received from the environment is associated with a specific desired target pattern. Usually, the weights are synthesized gradually …
… it is established that all these learning rules can be systematically derived as minimizers of an appropriate criterion function.
3.1.1 Error-Correction Rules
Error-correction rules were proposed initially as ad hoc rules for single-unit training. These rules essentially drive the output error of a given unit to zero. This section starts with the classic perceptron learning rule and gives a proof for its convergence. Other error-correction rules, such as Mays' rule and the α-LMS rule, are also covered. Throughout this section, an attempt is made to point out the criterion function that is minimized by using each rule. These learning rules will also be cast as relaxation rules.
3.1 Supervised Learning in a Single-Unit Setting
… d^k, k = 1, 2, ..., m, is the desired target for the kth input vector (usually, the order of the training pairs is random). The entire collection of these pairs is called the training set.

The goal, then, is to design a perceptron such that for each input vector x^k of the training set, the perceptron output y^k matches the desired target d^k; that is, we require y^k = sgn(w^T x^k) = d^k for each k = 1, 2, ..., m. In this case we say that the perceptron correctly classifies the training set. Of course, designing an appropriate perceptron to correctly classify the training set amounts to determining a weight vector w* such that the following relations are satisfied:

    (w*)^T x^k > 0    if d^k = +1
    (w*)^T x^k < 0    if d^k = -1        (3.1.1)

Recall that the set of all x which satisfy x^T w = 0 defines a hyperplane in R^(n+1).
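As a quick numerical sketch of this classification condition, the check below verifies that a candidate weight vector gives (w^T x^k) the same sign as d^k for every training pair. The data and the candidate w are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical training set; each row is an augmented input vector x^k
# (a constant 1 appended as the bias component).
X = np.array([[1.0, 2.0, 1.0],    # d = +1
              [2.0, 1.5, 1.0],    # d = +1
              [-1.0, -1.0, 1.0],  # d = -1
              [-2.0, 0.0, 1.0]])  # d = -1
d = np.array([1, 1, -1, -1])

# A candidate weight vector. It correctly classifies the training set
# iff (w^T x^k) d^k > 0 for every k, i.e., the relations above hold.
w = np.array([1.0, 1.0, 0.0])

margins = (X @ w) * d
print(all(margins > 0))  # True: w satisfies the classification condition
```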
Notice that for ρ = 0.5, the perceptron learning rule can be written as

    w^1 arbitrary
    w^(k+1) = w^k + z^k    if (z^k)^T w^k <= 0
    w^(k+1) = w^k          otherwise              (3.1.3)

where

    z^k = x^k     if d^k = +1
    z^k = -x^k    if d^k = -1                     (3.1.4)

That is, a correction is made if and only if a misclassification, indicated by

    (z^k)^T w^k <= 0                              (3.1.5)

occurs. The addition of the vector z^k to w^k in Equation (3.1.3) moves the weight vector directly toward and perhaps across the hyperplane (z^k)^T w = 0. The new inner product (z^k)^T w^(k+1) is larger than (z^k)^T w^k by the amount ||z^k||^2, and the correction Δw^k = w^(k+1) - w^k is clearly moving w^k in a good direction, the direction of increasing (z^k)^T w^k.
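The correction loop above can be sketched in a few lines of Python. This is a minimal illustration, not code from the text; the toy training set is a hypothetical linearly separable example:

```python
import numpy as np

def perceptron(X, d, max_epochs=100):
    """Fixed-increment perceptron rule (the rho = 0.5 form): cycle through
    the patterns and, whenever (z^k)^T w <= 0 for z^k = d^k x^k, add z^k
    to the weight vector."""
    Z = X * d[:, None]           # z^k = x^k if d^k = +1, -x^k if d^k = -1
    w = np.zeros(X.shape[1])     # w^1 = 0 (any starting vector is allowed)
    for _ in range(max_epochs):
        corrected = False
        for z in Z:
            if z @ w <= 0:       # misclassification test
                w = w + z        # correction step
                corrected = True
        if not corrected:        # a full pass with no corrections means
            return w             # the training set is correctly classified
    return w

# Hypothetical separable toy set, augmented with a bias input of 1
X = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [-1.0, -1.0, 1.0], [-2.0, -0.5, 1.0]])
d = np.array([1, 1, -1, -1])
w = perceptron(X, d)
print(np.all(np.sign(X @ w) == d))  # True: every pattern is classified
```

For a linearly separable set such as this one, the loop terminates after finitely many corrections, which is exactly what the convergence proof guarantees.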
This sensitivity is responsible for the varying quality of the perceptron-generated separating surface observed in simulations.
The bound on the number of corrections k0 given by Equation (3.1.14) depends on the choice of the initial weight vector w^1. If w^1 = 0, we get

    k0 = max_i ||z^i||^2 ||w*||^2 / [min_i (z^i)^T w*]^2        (3.1.15)

Here, k0 is a function of the initially unknown solution weight vector w*. Therefore, Equation (3.1.15) is of no help for predicting the maximum number of corrections. However, the denominator of Equation (3.1.15) implies that the difficulty of the problem is essentially determined by the samples most nearly orthogonal to the solution vector.
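The bound can be evaluated after the fact, once some solution w* is known. The sketch below computes it for a hypothetical separable set (patterns already multiplied by their labels) and confirms that the actual number of corrections never exceeds it; the data and w* are illustrative choices, not from the text:

```python
import numpy as np

# Hypothetical separable set; rows are z^i = d^i x^i as in the proof.
Z = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [1.0, 1.0, -1.0], [2.0, 0.5, -1.0]])

w_star = np.array([2.0, 1.0, 1.0])  # any w* with (z^i)^T w* > 0 for all i

# The bound for w^1 = 0:
#   k0 = max_i ||z^i||^2 * ||w*||^2 / [min_i (z^i)^T w*]^2
k0 = (max(np.sum(Z**2, axis=1)) * np.sum(w_star**2)
      / min(Z @ w_star)**2)

# Run the fixed-increment rule from w^1 = 0 and count corrections.
w, corrections = np.zeros(3), 0
done = False
while not done:
    done = True
    for z in Z:
        if z @ w <= 0:
            w, corrections, done = w + z, corrections + 1, False

print(corrections <= k0)  # True: corrections never exceed the bound
```

As the text notes, the bound is typically loose and, more importantly, cannot be computed before a solution is found.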
Generalizations of the Perceptron Learning Rule

The perceptron learning rule may be generalized to include a variable increment ρ^k and a fixed positive margin b. The generalized learning rule updates the weight vector whenever (z^k)^T w^k fails to exceed the margin b. Here, the algorithm for the weight vector update is given by

    w^1 arbitrary
    w^(k+1) = w^k + ρ^k z^k    if (z^k)^T w^k <= b
    w^(k+1) = w^k              otherwise              (3.1.16)
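A minimal sketch of this margin variant, with the increment held at a positive constant (the text also allows schedules such as ρ^k = ρ/k); the patterns and margin value are hypothetical:

```python
import numpy as np

def margin_perceptron(Z, b, rho=1.0, max_epochs=200):
    """Fixed-margin variant: update w <- w + rho * z whenever z^T w
    fails to exceed the positive margin b. Here rho is a constant."""
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        updated = False
        for z in Z:
            if z @ w <= b:        # pattern does not clear the margin
                w = w + rho * z
                updated = True
        if not updated:
            return w
    return w

# Hypothetical separable patterns (already multiplied by their labels)
Z = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [1.0, 1.0, -1.0], [2.0, 0.5, -1.0]])
w = margin_perceptron(Z, b=1.0)
print(np.all(Z @ w > 1.0))  # True: every pattern clears the margin b
```

The margin forces the final hyperplane to leave positive slack on every pattern, rather than merely separating the classes.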
… e.g., ρ^k = ρ/k or even ρ^k = ρk; then w^k converges to a solution w* that satisfies (z^i)^T w* > b for i = 1, 2, ..., m. Furthermore, when ρ is fixed at a positive constant, this learning rule converges in finite time.
Another variant of the perceptron learning rule is given by the batch update procedure

    w^1 arbitrary
    w^(k+1) = w^k + ρ Σ_{z ∈ Z(w^k)} z        (3.1.17)

where Z(w^k) is the set of patterns z misclassified by w^k. Here, the weight vector change Δw^k = w^(k+1) - w^k is along the direction of the resultant vector of all misclassified patterns. In general, this update procedure converges faster than the perceptron rule, but it requires more storage.
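The batch procedure can be sketched as follows; the toy patterns are a hypothetical separable set, and the stopping test simply checks whether the misclassified set Z(w^k) is empty:

```python
import numpy as np

def batch_perceptron(Z, rho=1.0, max_iters=500):
    """Batch variant: at each step add rho times the resultant (sum)
    of all patterns currently misclassified by w^k."""
    w = np.zeros(Z.shape[1])
    for _ in range(max_iters):
        mis = Z[Z @ w <= 0]         # Z(w^k): patterns with z^T w <= 0
        if len(mis) == 0:
            return w                # no misclassifications: done
        w = w + rho * mis.sum(axis=0)
    return w

# Hypothetical separable patterns (already multiplied by their labels)
Z = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [1.0, 1.0, -1.0], [2.0, 0.5, -1.0]])
w = batch_perceptron(Z)
print(np.all(Z @ w > 0))  # True: the batch rule found a separating w
```

The extra storage mentioned in the text shows up here as the need to hold all patterns and recompute the full misclassified set at every step, rather than touching one pattern at a time.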
In the nonlinearly separable case, the preceding algorithms do not converge. Few theoretical results are available on the behavior of these algorithms for nonlinearly separable problems [see Minsky and Papert (1969) and Block and Levin (1970)].
Given this objective function J(w), the search point w^k can be incrementally improved at each iteration by sliding downhill on the surface defined by J(w) in w space. Specifically, we may use J to perform a discrete gradient-descent search that updates w^k so that a step is taken downhill in the steepest direction along the search surface J(w) at w^k. This can be achieved by making Δw^k proportional to the gradient of J at the present location w^k; formally, we may write

    w^(k+1) = w^k - ρ ∇J(w)|_{w = w^k} = w^k - ρ [∂J/∂w_1  ∂J/∂w_2  ...  ∂J/∂w_(n+1)]^T |_{w = w^k}        (3.1.21)

Here, the initial search point w^1 and the learning rate (step size) ρ are to be specified by the user. Equation (3.1.21) can be called the steepest gradient-descent search rule or, simply, gradient descent. Next, substituting the gradient

    ∇J(w^k) = - Σ_{z ∈ Z(w^k)} z        (3.1.22)

into Equation (3.1.21) leads to the weight update rule

    w^(k+1) = w^k + ρ Σ_{z ∈ Z(w^k)} z        (3.1.23)
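This substitution can be verified numerically: a single steepest-descent step with the gradient -Σz over the misclassified set coincides with the batch correction, and it does not increase the criterion. The patterns below are a hypothetical separable set, and J is taken to be the perceptron criterion (the sum of -z^T w over misclassified z):

```python
import numpy as np

def J(w, Z):
    """Perceptron criterion: sum of -z^T w over misclassified z."""
    mis = Z[Z @ w <= 0]
    return -(mis @ w).sum()

def grad_J(w, Z):
    """Gradient of J: minus the sum of the misclassified patterns."""
    return -Z[Z @ w <= 0].sum(axis=0)

Z = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [1.0, 1.0, -1.0], [2.0, 0.5, -1.0]])  # hypothetical data
rho, w = 0.5, np.zeros(3)

# One steepest-descent step: w <- w - rho * grad J(w)
w_next = w - rho * grad_J(w, Z)

# It coincides with the batch rule: w + rho * (sum of misclassified z)
assert np.allclose(w_next, w + rho * Z[Z @ w <= 0].sum(axis=0))
print(J(w_next, Z) <= J(w, Z))  # True: the step did not increase J
```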
… procedure as in Equations (3.1.21) through (3.1.23), it can be shown that

    J(w) = - Σ_{z: z^T w <= b} (z^T w - b)        (3.1.24)

is the appropriate criterion function for the modified perceptron rule in Equation (3.1.16).

Before moving on, it should be noted that the gradient of J in Equation (3.1.22) is not mathematically precise. Owing to the piecewise-linear nature of J, sudden changes in the gradient of J occur every time the perceptron output goes through a transition at (z^k)^T w = 0. Therefore, the gradient of J is not defined at the transition points satisfying (z^k)^T w = 0, k = 1, 2, ..., m. However, because of the discrete nature of Equation (3.1.21), the likelihood of w^k overlapping with one of these transition points is negligible, and thus we may still express ∇J as in Equation (3.1.22). The reader is referred to Problem 3.1.3 for further exploration into gradient descent on the perceptron criterion function.
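The nondifferentiability at a transition point is easy to exhibit for a single pattern, where the criterion reduces to J(w) = -z^T w when z^T w <= 0 and 0 otherwise: the one-sided slopes along the direction z disagree at z^T w = 0. The pattern z below is a hypothetical example:

```python
import numpy as np

# For one pattern z, J is piecewise linear with a kink at z^T w = 0.
z = np.array([2.0, 1.0])

def J(w):
    s = z @ w
    return -s if s <= 0 else 0.0

# Approach the transition point w = 0 along the direction z from each
# side: the slope is -||z||^2 on the misclassified side, 0 on the other.
eps = 1e-6
left_slope = (J(-0.0 * z) - J(-eps * z)) / eps   # from z^T w < 0
right_slope = (J(eps * z) - J(0.0 * z)) / eps    # from z^T w > 0
print(round(left_slope), round(right_slope))  # -5 0
```

Since the two one-sided slopes differ, no gradient exists at the transition point; the expression for ∇J is a valid gradient everywhere else, which is why the discrete search is unaffected in practice.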