ANN 3 - Perceptron


  • Slide 1/56

    Neural Networks, Lecture 3

    Chapter 01

    Rosenblatt's Perceptron

    Dr. Hala Mousher Ebied, Scientific Computing Department

    Faculty of Computers & Information Sciences, Ain Shams University

  • Slide 2/56

    Agenda

    Single-Layer Feedforward Networks and Classification Problems (McCulloch-Pitts neuron model)

    Decision Surface

    Rosenblatt's Perceptron Learning Rule (The Perceptron Convergence Theorem)

    Illustrative Example 1; Illustrative Example 2 (AND)

  • Slide 3/56

    Introduction

    1943: McCulloch and Pitts proposed the McCulloch-Pitts neuron model.

    1949: Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.

    1958: Rosenblatt introduced the simple single-layer networks now called perceptrons.

    1969: Minsky and Papert's book Perceptrons demonstrated the limitations of single-layer perceptrons, and almost the whole field went into hibernation.

    1982: Hopfield published a series of papers on Hopfield networks.

    1982: Kohonen developed the Self-Organising Maps that now bear his name.

    1986: The back-propagation learning algorithm for multi-layer perceptrons was rediscovered and the whole field took off again.

    1990s: The sub-field of Radial Basis Function Networks was developed.

    2000s: The power of ensembles of neural networks and Support Vector Machines became apparent.

  • Slide 4/56

    Going back to the McCulloch-Pitts neuron model (1943)

    Threshold Logic Units (TLU)

    v_k = w_{k1} x_1 + w_{k2} x_2

    y_k = φ(v_k) = 1 if v_k ≥ θ, and 0 if v_k < θ
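    The TLU above translates directly into a couple of lines of code. The sketch below is illustrative only (the function name and argument layout are not from the slides) and assumes the two-input form with threshold θ:

    def tlu(x1, x2, w1, w2, theta):
        # McCulloch-Pitts threshold logic unit: output 1 when the weighted
        # sum of the inputs reaches the threshold, otherwise 0.
        v = w1 * x1 + w2 * x2          # induced local field v_k
        return 1 if v >= theta else 0  # hard threshold

    For example, tlu(1, 1, 1, 1, 1.5) returns 1 while tlu(1, 0, 1, 1, 1.5) returns 0, which is the AND behaviour discussed on the next slides.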

  • Slide 5/56

    Going back to Classification

    The goal of pattern classification is to assign a physical object or event to one of a set of pre-specified classes (or categories).

    Examples:

    Boolean functions

    Pixel patterns

  • Slide 6/56

    The AND problem

    x1 x2 y
    1  1  1
    1  0  0
    0  1  0
    0  0  0

    Let w_1 = w_2 = 1 and θ = 1.5. Then:

    with x_1 = 0 and x_2 = 0: v = 0 < 1.5, so y = 0
    with x_1 = 1 and x_2 = 0: v = 1 < 1.5, so y = 0
    with x_1 = 0 and x_2 = 1: v = 1 < 1.5, so y = 0
    with x_1 = 1 and x_2 = 1: v = 2 ≥ 1.5, so y = 1
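    A quick way to confirm the table above is to evaluate the unit at all four input pairs with the stated weights and threshold. This snippet is a minimal sketch, not part of the original lecture:

    def tlu(x1, x2, w1=1.0, w2=1.0, theta=1.5):
        return 1 if w1 * x1 + w2 * x2 >= theta else 0

    for x1 in (1, 0):
        for x2 in (1, 0):
            print(x1, x2, "->", tlu(x1, x2))   # reproduces the AND column above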

  • Slide 7/56

    The OR problem

    x1 x2 y
    1  1  1
    1  0  1
    0  1  1
    0  0  0

    Let w_1 = w_2 = 1 and θ = 0.5. Then:

    with x_1 = 0 and x_2 = 0: v = 0 < 0.5, so y = 0
    with x_1 = 1 and x_2 = 0: v = 1 ≥ 0.5, so y = 1
    with x_1 = 0 and x_2 = 1: v = 1 ≥ 0.5, so y = 1
    with x_1 = 1 and x_2 = 1: v = 2 ≥ 0.5, so y = 1

  • Slide 8/56

    The XOR problem

    x1 x2 y
    1  1  0
    1  0  1
    0  1  1
    0  0  0

    Let w_1 = w_2 = 1 and θ = ??? (no single threshold reproduces all four rows).
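    To see why no threshold works, one can sweep a grid of candidate thresholds with w_1 = w_2 = 1 and test each against the XOR table. This small check is mine, added for illustration:

    # With w1 = w2 = 1, sweep candidate thresholds and test each against XOR.
    xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

    def tlu(x1, x2, theta):
        return 1 if x1 + x2 >= theta else 0

    hits = [t / 10 for t in range(-20, 31)
            if all(tlu(a, b, t / 10) == y for (a, b), y in xor_table.items())]
    print(hits)   # [] : no threshold reproduces XOR with these weights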

  • Slide 9/56

    The graphs for the AND, OR, and XOR problems

    [Figure: the three truth tables plotted in the input plane, with points labelled Y=0 and Y=1.]

  • Slide 10/56

    What's Next

    Single-Layer Feedforward Networks and Classification Problems (McCulloch-Pitts neuron model)

    Decision Surface

    Rosenblatt's Perceptron Learning Rule (The Perceptron Convergence Theorem)

    Illustrative Example 1; Illustrative Example 2 (AND)

  • Slide 11/56

    Decision Surface (Hyperplane)

    For m input variables x_1, x_2, ..., x_m, the decision surface is the surface at which the output of the node is precisely equal to the threshold (bias), defined by

    Σ_{i=1}^{m} w_i x_i - θ = 0

    On one side of this surface the output of the decision unit, y, will be 0, and on the other side it will be 1.
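    A minimal sketch of this decision rule, with illustrative weights and threshold (the values are not from this slide):

    import numpy as np

    # Output 1 on the side where sum_i(w_i * x_i) - theta >= 0, else 0.
    def decision_unit(x, w, theta):
        return 1 if np.dot(w, x) - theta >= 0 else 0

    w, theta = np.array([1.0, 2.0]), 2.0
    print(decision_unit(np.array([2.0, 1.0]), w, theta))  # 2 + 2 - 2 >= 0 -> 1
    print(decision_unit(np.array([0.0, 0.5]), w, theta))  # 0 + 1 - 2 <  0 -> 0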

  • Slide 12/56

    Linearly separable

    A function f: {0,1}^n → {0,1} is linearly separable if the space of input vectors yielding 1 can be separated from those yielding 0 by a linear surface (hyperplane) in n dimensions.

    Example: in the two-dimensional case, a hyperplane is a straight line.

    [Figure: a linearly separable point set next to a non-linearly separable one.]

  • Slide 13/56

    Decision Surfaces in 2-D

    The two classification regions are separated by the decision boundary line L:

    w_1 x_1 + w_2 x_2 - θ = 0

    This line is shifted away from the origin according to the bias θ. If the bias is 0, the decision surface passes through the origin.

    This line is perpendicular to the weight vector W.

    Adding a bias allows the neuron to solve problems where the two sets of input vectors are not located on different sides of the origin.

  • Slide 14/56

    Decision Surfaces in 2-D, cont.

    In 2-D, the surface is

    w_1 x_1 + w_2 x_2 - θ = 0

    which we can write as

    x_2 = -(w_1 / w_2) x_1 + θ / w_2

    which is the equation of a line with gradient -w_1 / w_2 and intercept θ / w_2.
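    The algebra above can be sanity-checked numerically. The weights and threshold below are illustrative, chosen only to exercise the gradient and intercept formulas:

    # For w1*x1 + w2*x2 - theta = 0, recover the slope and intercept of the
    # equivalent line x2 = -(w1/w2)*x1 + theta/w2 and check a point on it.
    w1, w2, theta = 1.0, 2.0, 2.0
    slope = -w1 / w2                       # gradient -w1/w2
    intercept = theta / w2                 # x2-axis intercept theta/w2

    x1 = 4.0
    x2 = slope * x1 + intercept            # a point on the boundary line
    print(slope, intercept)                # -0.5 1.0
    print(w1 * x1 + w2 * x2 - theta)       # 0.0: the point lies on the surface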

  • Slide 15/56

    Examples of Decision Surfaces in 2-D:

    Let w_1 = 1, w_2 = 2, and bias = -2. The decision boundary is x_1 + 2x_2 - 2 = 0.

    Let w_1 = -2, w_2 = 1, and bias = -2. The decision boundary is -2x_1 + x_2 - 2 = 0.

    [Figure: each boundary line plotted in the (x_1, x_2) plane, with Class 1 on one side and Class 0 on the other.]

    Depending on the values of the weights, the decision boundary line will separate the possible inputs into two categories.

  • Slide 16/56

    Example

    Slope = (6 - 0) / (0 - 6) = -1

    Intercept on the x_2-axis = 6

    so the hyperplane (straight line) is x_2 = -x_1 + 6,

    which is: x_1 + x_2 - 6 = 0

  • Slide 17/56

    Linear Separability for n Dimensions

    By varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1 and another region that yields output 0.

    As we have seen, a two-dimensional input space can be divided by any straight line.

    A three-dimensional input space can be divided by any two-dimensional plane.

    In general, an n-dimensional input space can be divided by an (n-1)-dimensional plane or hyperplane.

    Of course, for n > 3 this is hard to visualize.

  • Slide 18/56

    What's Next: Rosenblatt's Perceptron Learning Rule

  • Slide 19/56

    The Perceptron, by Frank Rosenblatt (1958, 1962)

    It is the simplest form of a neural network.

    It is a single-layer network whose synaptic weights are adjustable.

    It is limited to performing pattern classification with only two classes.

    The two classes must be linearly separable; the patterns must lie on opposite sides of a hyperplane.

  • Slide 20/56

    The Perceptron Architecture

    A perceptron is a nonlinear neuron which uses the hard-limit transfer function (signum activation function) and returns +1 or -1:

    v = Σ_{j=1}^{m} w_j x_j + b = Σ_{j=0}^{m} w_j x_j = w^T x   (with x_0 = +1 and w_0 = b)

    y_k = sgn(v) = +1 if v ≥ 0, and -1 if v < 0

    Weights are adapted using an error-correction rule; the training technique used is called the perceptron learning rule.
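    A minimal sketch of this forward pass, using the augmented-vector convention that appears later in the lecture (x_0 = +1 carries the bias); the weight values are illustrative:

    import numpy as np

    def perceptron_output(w, x):
        # w = [b, w1, ..., wm] and x = [+1, x1, ..., xm] (augmented vectors).
        v = np.dot(w, x)               # induced local field v = w^T x
        return 1 if v >= 0 else -1     # signum hard limiter

    w = np.array([0.5, 0.3, 0.7])      # [bias, w1, w2]
    print(perceptron_output(w, np.array([1.0, 1.0, 1.0])))     # +1
    print(perceptron_output(w, np.array([1.0, -1.0, -1.0])))   # -1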

  • Slide 21/56

    The Perceptron Convergence Algorithm

    The learning algorithm finds a weight vector w such that:

    w^T x > 0 for every input vector x belonging to class C1
    w^T x ≤ 0 for every input vector x belonging to class C2

    The training problem is then to find a weight vector w such that these two inequalities are satisfied. This is achieved by updating the weights as follows:

    w(n+1) = w(n) + η(n) x(n)   if w^T(n) x(n) ≤ 0 and x(n) belongs to C1
    w(n+1) = w(n) - η(n) x(n)   if w^T(n) x(n) > 0 and x(n) belongs to C2
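    One step of this rule can be sketched as follows, with +1 labels standing for class C1 and -1 labels for class C2 (the function name and default learning rate are assumptions for illustration):

    import numpy as np

    def update(w, x, label, eta=0.5):
        v = np.dot(w, x)
        if v <= 0 and label == 1:      # x belongs to C1 but was classified as C2
            return w + eta * x
        if v > 0 and label == -1:      # x belongs to C2 but was classified as C1
            return w - eta * x
        return w                       # correctly classified: leave w unchanged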

  • Slide 22/56

    The Perceptron Convergence Theorem (Rosenblatt, 1958, 1962)

    If a problem is linearly separable, then the perceptron will converge after some n iterations, i.e. the perceptron will learn in a finite number of steps.

    Rosenblatt designed a learning law for weight adaptation in the McCulloch-Pitts model.

  • Slide 23/56

    Perceptron parameters and learning rule

    The training set consists of pairs {X, t}, where X is an input vector and t is the desired target.

    Iterative process: present a training example X, compute the network output y, compare the output y with the target t, and adjust the weights (add Δw).

    Learning rule: specifies how to change the weights w of the network as a function of the inputs X, the output y, and the target t.

  • Slide 24/56

    Deriving the error-correction learning rule: how do we adjust the weight vector?

    The net input signal is the sum of all inputs after passing through the synapses:

    net = Σ_{j=0}^{m} w_j x_j

    This can be viewed as computing the inner product of the vectors w_i and X:

    net = |w_i| |X| cos(α)

    where α is the angle between the two vectors.
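    The inner-product identity can be checked numerically; the two vectors below are arbitrary illustrative choices:

    import numpy as np

    w = np.array([2.0, 1.0])
    x = np.array([1.0, 3.0])

    alpha = np.arctan2(x[1], x[0]) - np.arctan2(w[1], w[0])       # angle between w and x
    print(np.dot(w, x))                                           # 5.0
    print(np.linalg.norm(w) * np.linalg.norm(x) * np.cos(alpha))  # ~5.0 as well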

  • Slide 25/56

    Deriving the error-correction learning rule, cont.

    Case 1: if the error t - y = 0, then make no change, Δw = 0.

    Case 2: if the target t = 1, the output y = -1, and the error t - y = 2, then

    w_new = w + a x   (move w in the direction of x)

    The weight vector incorrectly classifies the vector X, so the input vector X is added to the weight vector w. This makes the weight vector point closer to the input vector, increasing the chance that the input vector will be classified as a +1 in the future.

  • Slide 26/56

    Deriving the error-correction learning rule, cont.

    Case 3: if the target t = -1, the output y = 1, and the error t - y = -2, then

    w_new = w - a x   (move w away from the direction of x)

    The weight vector incorrectly classifies the vector X, so the input vector X is subtracted from the weight vector w. This makes the weight vector point farther away from the input vector, increasing the chance that the input vector will be classified as a -1 in the future.

  • Slide 27/56

    Deriving the error-correction learning rule, cont.

    The perceptron learning rule can be written more succinctly in terms of the error e = t - y and the change to be made to the weight vector, Δw:

    CASE 1. If e = 0, then make the change Δw equal to 0.
    CASE 2. If e = 2, then make the change Δw proportional to x.
    CASE 3. If e = -2, then make the change Δw proportional to -x.

    The sign of the change is the same as the sign of the error e, so all three cases can be written as a single expression:

    Δw = η (t - y) x

    w(n+1) = w(n) + Δw(n) = w(n) + η (t - y) x

    with w(0) set to random values, where n denotes the time step in applying the algorithm.

    The parameter η is called the learning rate. It determines the magnitude of the weight updates Δw.
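    The single-expression rule translates directly into code. This sketch keeps the same sign convention (targets and outputs in {-1, +1}); the worked numbers happen to match iteration 1 of Example 1 later in the lecture:

    import numpy as np

    # delta_w = eta * (t - y) * x covers all three cases (e = 0, 2, -2) at once.
    def perceptron_update(w, x, t, eta):
        y = 1 if np.dot(w, x) >= 0 else -1    # sgn of the net input
        return w + eta * (t - y) * x          # no change when t == y

    w = np.array([1.0, -0.8])
    w = perceptron_update(w, np.array([1.0, 2.0]), t=1, eta=0.5)
    print(w)   # [2.  1.2]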

  • Slide 28/56

    Learning Rate

    η governs the rate at which the training rule converges toward the correct solution.

    Typically 0 < η < 1.

  • Slide 29/56

    Perceptron Convergence Algorithm

    Variables and parameters:

    X(n) = input vector of dim (m+1) by 1 = [+1, x_1, x_2, ..., x_m]^T

    W(n) = weight vector of dim (m+1) by 1 = [b, w_1, w_2, ..., w_m]^T

    b = bias

    y(n) = actual response

    t(n) = desired (target) response

    η = learning-rate parameter, a positive constant less than unity

  • Slide 30/56

    Perceptron Convergence Algorithm, cont.

    1- Start with a randomly chosen weight vector w(0).
    2- Repeat:
    3-   For each training vector pair (X_i, t_i), evaluate the output y_i when X_i is the input.
    4-   If y_i ≠ t_i, then form a new weight vector w_{i+1} according to
             w_{i+1} = w_i + Δw_i = w_i + η (t_i - y_i) X_i
         else do nothing.
    5- Until the stopping criterion (convergence) is reached.

    Convergence: an epoch does not produce an error, i.e. all training vectors are classified correctly (y = t) for the same w(n); equivalently, the error is zero for all training patterns and no weights change in step 4.
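    Putting the whole algorithm together, here is a compact sketch of the training loop. The function name, augmented-input layout, and epoch cap are assumptions made for illustration; the sample data are the bipolar AND patterns of Example 2 later in the lecture:

    import numpy as np

    # X: one augmented row [+1, x1, ..., xm] per pattern; t: targets in {-1, +1}.
    def train_perceptron(X, t, eta=0.1, w0=None, max_epochs=100):
        w = np.zeros(X.shape[1]) if w0 is None else w0.astype(float)
        for _ in range(max_epochs):
            changed = False
            for xi, ti in zip(X, t):
                yi = 1 if np.dot(w, xi) >= 0 else -1
                if yi != ti:                      # step 4: update only on errors
                    w = w + eta * (ti - yi) * xi
                    changed = True
            if not changed:                       # convergence: an error-free epoch
                break
        return w

    X = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]], dtype=float)
    t = np.array([1, -1, -1, -1])
    print(train_perceptron(X, t, eta=0.1, w0=np.array([0.5, 0.3, 0.7])))  # [-0.1  0.5  0.5]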

  • Slide 31/56

    Perceptron learning scheme

  • Slide 32/56

    Illustrative Example 1

  • Slide 33/56

    Introduction

    We will begin with a simple test problem to show how the perceptron learning rule should work. To simplify the problem, we begin with a network without a bias, with two inputs and one output. The network then has just two parameters, w_{1,1} and w_{1,2}, to adjust, as shown in the following figure.

    By removing the bias we are left with a network whose decision boundary must pass through the origin.

  • Slide 34/56

    Example 1

    The input/target pairs for our test problem are

    X_1 = [1, 2]^T, d_1 = 1;   X_2 = [-1, 2]^T, d_2 = -1;   X_3 = [0, -1]^T, d_3 = -1

    which are to be used in the training together with the learning rate 0.5 and the initial weights

    w^1 = [1.0, -0.8]^T

    Show how the learning proceeds.

  • Slide 35/56

    Iteration 1. The input is X_1 and the desired output is d_1:

    net^1 = w^1T X_1 = [1.0, -0.8] [1, 2]^T = -0.6

    y^1 = sgn(-0.6) = -1

    Since d_1 = 1, the error signal is e = 1 - (-1) = 2.

    Then a correction (weight update) in this step is necessary. The updated weight vector is

    w^2 = w^1 + 0.5 (1 - (-1)) X_1 = [1.0, -0.8]^T + [1, 2]^T = [2.0, 1.2]^T

    This operation is illustrated in the adjacent figure.

  • Slide 36/56

    Iteration 2. The input is X_2 and the desired output is d_2:

    net^2 = w^2T X_2 = [2.0, 1.2] [-1, 2]^T = 0.4

    y^2 = sgn(0.4) = 1

    Since d_2 = -1, the error signal is e = -1 - 1 = -2.

    Then a correction (weight update) in this step is necessary. The updated weight vector is

    w^3 = w^2 + 0.5 (-1 - 1) X_2 = [2.0, 1.2]^T - [-1, 2]^T = [3.0, -0.8]^T

    This operation is illustrated in the adjacent figure.

  • Slide 37/56

    Iteration 3. The input is X_3 and the desired output is d_3:

    net^3 = w^3T X_3 = [3.0, -0.8] [0, -1]^T = 0.8

    y^3 = sgn(0.8) = 1

    Since d_3 = -1, the error signal is e = -1 - 1 = -2.

    Then a correction (weight update) in this step is necessary. The updated weight vector is

    w^4 = w^3 + 0.5 (-1 - 1) X_3 = [3.0, -0.8]^T - [0, -1]^T = [3.0, 0.2]^T

    This operation is illustrated in the adjacent figure.

  • Slide 38/56

    Check: the weights changed in iterations 1, 2, and 3, so do not stop; continue with the next epoch.

  • Slide 39/56

    Iteration 4: net^4 = w^4T X_1 = [3.0, 0.2] [1, 2]^T = 3.4,  y = sgn(3.4) = 1 = d_1

    Iteration 5: net^5 = w^4T X_2 = [3.0, 0.2] [-1, 2]^T = -2.6,  y = sgn(-2.6) = -1 = d_2

    Iteration 6: net^6 = w^4T X_3 = [3.0, 0.2] [0, -1]^T = -0.2,  y = sgn(-0.2) = -1 = d_3

    No weights change in iterations 4, 5, and 6, so the stopping criterion is satisfied.

  • Slide 40/56

    The diagram to the left shows that the perceptron has finally learned to classify the three vectors properly. If we present any of the input vectors to the neuron, it will output the correct class for that input vector.
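    For reference, the six iterations of Example 1 can be replayed in a few lines of code; this sketch assumes the data and learning rate stated on Slide 34 (Example 1):

    import numpy as np

    # Replay Example 1: no bias, w(0) = [1.0, -0.8], learning rate 0.5.
    X = [np.array([1.0, 2.0]), np.array([-1.0, 2.0]), np.array([0.0, -1.0])]
    d = [1, -1, -1]

    w = np.array([1.0, -0.8])
    for step in range(6):                      # two passes over the three patterns
        x, t = X[step % 3], d[step % 3]
        net = np.dot(w, x)
        y = 1 if net >= 0 else -1
        w = w + 0.5 * (t - y) * x              # changes w only when y != t
        print("iteration", step + 1, "net =", round(net, 1), "w =", w)

    # nets: -0.6, 0.4, 0.8, 3.4, -2.6, -0.2; final w = [3.0, 0.2] as on the slides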

  • Slide 41/56

    Your turn (Homework 1)

    Consider the following set of training vectors X_1, X_2, and X_3:

    X_1 = [1, -2, 0, -1]^T,   X_2 = [0, 1.5, -0.5, -1]^T,   X_3 = [-1, 1, 0.5, -1]^T

    with learning rate equal to 0.1, and the desired responses and weights initialized as:

    d_1 = -1,  d_2 = -1,  d_3 = 1;   w^1 = [1, -1, 0, 0.5]^T

    Show how the learning proceeds.

  • Slide 42/56

    Example 2: the AND Problem

  • Slide 43/56

    Example 2

    Consider the AND problem with inputs

    x_1 = [1, 1]^T,  x_2 = [1, -1]^T,  x_3 = [-1, 1]^T,  x_4 = [-1, -1]^T

    with bias b = 0.5, learning rate 0.1, and the desired responses and weights initialized as:

    d_1 = 1,  d_2 = -1,  d_3 = -1,  d_4 = -1;   w = [0.3, 0.7]^T

    Show how the learning proceeds.

  • Slide 44/56

    Consider the augmented input vectors (with a leading +1 for the bias) and the augmented weight vector:

    X_1 = [1, 1, 1]^T,  X_2 = [1, 1, -1]^T,  X_3 = [1, -1, 1]^T,  X_4 = [1, -1, -1]^T

    w^0 = [0.5, 0.3, 0.7]^T

  • Slide 45/56

    Iteration 1. Input X_1 is presented with its desired output d_1:

    net^1 = w^0T X_1 = [0.5, 0.3, 0.7] [1, 1, 1]^T = 1.5

    y^1 = sgn(1.5) = 1

    Since d_1 = 1, the error signal is e_1 = 1 - 1 = 0.

    A correction (weight update) in this step is not necessary: w^1 = w^0.

  • Slide 46/56

    Iteration 2. Input X_2 is presented with its desired output d_2:

    net^2 = w^1T X_2 = [0.5, 0.3, 0.7] [1, 1, -1]^T = 0.1

    y^2 = sgn(0.1) = 1

    Since d_2 = -1, the error signal is e_2 = -1 - 1 = -2, so a correction in this step is necessary.

    The updated weight vector is

    w^2 = w^1 + 0.1 (-1 - 1) X_2 = [0.5, 0.3, 0.7]^T - 0.2 [1, 1, -1]^T = [0.3, 0.1, 0.9]^T

  • Slide 47/56

    Iteration 3. Input X_3 is presented with its desired output d_3:

    net^3 = w^2T X_3 = [0.3, 0.1, 0.9] [1, -1, 1]^T = 1.1

    y^3 = sgn(1.1) = 1

    Since d_3 = -1, the error signal is e_3 = -1 - 1 = -2, so a correction in this step is necessary.

    The updated weight vector is

    w^3 = w^2 + 0.1 (-1 - 1) X_3 = [0.3, 0.1, 0.9]^T - 0.2 [1, -1, 1]^T = [0.1, 0.3, 0.7]^T

  • Slide 48/56

    Iteration 4. Input X_4 is presented with its desired output d_4:

    net^4 = w^3T X_4 = [0.1, 0.3, 0.7] [1, -1, -1]^T = -0.9

    y^4 = sgn(-0.9) = -1

    Since d_4 = -1, the error signal is e_4 = -1 - (-1) = 0.

    A correction in this step is not necessary: w^4 = w^3.

  • Slide 49/56

    Check: the weights changed in iterations 2 and 3, so the stopping criterion is not satisfied. Continue with epoch 2.

  • Slide 50/56

    Iteration 5: net^5 = w^4T X_1 = [0.1, 0.3, 0.7] [1, 1, 1]^T = 1.1,  y = sgn(1.1) = 1 = d_1, so w^5 = w^4

    Iteration 6: net^6 = w^5T X_2 = [0.1, 0.3, 0.7] [1, 1, -1]^T = -0.3,  y = sgn(-0.3) = -1 = d_2, so w^6 = w^5

    Iteration 7: net^7 = w^6T X_3 = [0.1, 0.3, 0.7] [1, -1, 1]^T = 0.5,  y = sgn(0.5) = 1 ≠ d_3 = -1  (misclassified)

  • Slide 51/56

    Iteration 7. Input X_3 is presented with its desired output d_3:

    net^7 = w^6T X_3 = [0.1, 0.3, 0.7] [1, -1, 1]^T = 0.5

    y^7 = sgn(0.5) = 1

    Since d_3 = -1, the error signal is e_7 = -1 - 1 = -2, so a correction in this step is necessary.

    The updated weight vector is

    w^7 = w^6 + 0.1 (-1 - 1) X_3 = [0.1, 0.3, 0.7]^T - 0.2 [1, -1, 1]^T = [-0.1, 0.5, 0.5]^T

  • Slide 52/56

    Iteration 8. Now all the patterns give the correct response. Try it.

    We can draw the solution found using the line:

    0.5 x_1 + 0.5 x_2 - 0.1 = 0
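    As a final check, the sketch below (illustrative, not from the slides) confirms that the weight vector found in iteration 7 classifies all four AND patterns correctly, which is why iteration 8 and beyond make no further changes:

    import numpy as np

    # Final weights from iteration 7 (bias first): check all four AND patterns.
    w = np.array([-0.1, 0.5, 0.5])
    patterns = [(np.array([1.0,  1.0,  1.0]),  1),
                (np.array([1.0,  1.0, -1.0]), -1),
                (np.array([1.0, -1.0,  1.0]), -1),
                (np.array([1.0, -1.0, -1.0]), -1)]

    for x, d in patterns:
        y = 1 if np.dot(w, x) >= 0 else -1
        print(x[1:], "->", y, "target:", d)    # every output matches its target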

  • Slide 53/56

    Perceptron Limitations

  • Slide 54/56

    Perceptron Limitations: Linearly Inseparable Problems

    The perceptron learning rule is guaranteed to converge to a solution in a finite number of steps, as long as a solution exists.

    The perceptron can be used to classify input vectors that can be separated by a linear boundary. Unfortunately, many problems are not linearly separable. The classic example is the XOR gate. This problem is illustrated graphically as follows:

    [Figure: the XOR input space, which no single straight line can separate.]

  • Slide 55/56

    Rosenblatt had investigated more complex networks, which he felt would overcome the limitations of the basic perceptron, but he was never able to effectively extend the perceptron rule to such networks.

    In a later lecture we will introduce multilayer perceptrons, which can solve arbitrary classification problems, and we will describe the backpropagation algorithm, which can be used to train them.

    End of Perceptron

  • Slide 56/56

    Homework 1, cont.

    Problems: 1.1, 1.2, and 1.3

    Computer Experiment: 1.6