ANN 3 - Perceptron


  • Slide 1/56

    Neural Networks, Lecture 3

    Chapter 01

    Rosenblatt's Perceptron

    Dr. Hala Mousher Ebied, Scientific Computing Department

    Faculty of Computers & Information Sciences, Ain Shams University

  • Slide 2/56

    Agenda

    Single-Layer Feedforward Networks and Classification Problems (McCulloch-Pitts neuron model)

    Decision Surface

    Rosenblatt's Perceptron Learning Rule (The Perceptron Convergence Theorem)

    Illustrative Example 1; Illustrative Example 2 (AND)

  • Slide 3/56

    Introduction

    1943: McCulloch and Pitts proposed the McCulloch-Pitts neuron model.

    1949: Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.

    1958: Rosenblatt introduced the simple single-layer networks now called perceptrons.

    1969: Minsky and Papert's book Perceptrons demonstrated the limitations of single-layer perceptrons, and almost the whole field went into hibernation.

    1982: Hopfield published a series of papers on Hopfield networks.

    1982: Kohonen developed the Self-Organising Maps that now bear his name.

    1986: The back-propagation learning algorithm for multi-layer perceptrons was rediscovered and the whole field took off again.

    1990s: The sub-field of Radial Basis Function Networks was developed.

    2000s: The power of ensembles of neural networks and Support Vector Machines became apparent.

  • Slide 4/56

    Going back to the McCulloch-Pitts neuron model (1943)

    Threshold Logic Units (TLU)

    v_k = w_{k1} x_1 + w_{k2} x_2

    y_k = φ(v_k) = 1 if v_k ≥ θ, and 0 if v_k < θ
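    The TLU above translates directly into a couple of lines of code. The sketch below is illustrative only (the function name and argument layout are not from the slides) and assumes the two-input form with threshold θ:

    def tlu(x1, x2, w1, w2, theta):
        # McCulloch-Pitts threshold logic unit: output 1 when the weighted
        # sum of the inputs reaches the threshold, otherwise 0.
        v = w1 * x1 + w2 * x2          # induced local field v_k
        return 1 if v >= theta else 0  # hard threshold

    For example, tlu(1, 1, 1, 1, 1.5) returns 1 while tlu(1, 0, 1, 1, 1.5) returns 0, which is the AND behaviour discussed on the next slides.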

  • Slide 5/56

    Going back to Classification

    The goal of pattern classification is to assign a physical object or event to one of a set of pre-specified classes (or categories).

    Examples:

    Boolean functions

    Pixel patterns

  • Slide 6/56

    The AND problem

    x1 x2 y
    1  1  1
    1  0  0
    0  1  0
    0  0  0

    Let w_1 = w_2 = 1 and θ = 1.5. Then:

    with x_1 = 0 and x_2 = 0: v = 0 < 1.5, so y = 0
    with x_1 = 1 and x_2 = 0: v = 1 < 1.5, so y = 0
    with x_1 = 0 and x_2 = 1: v = 1 < 1.5, so y = 0
    with x_1 = 1 and x_2 = 1: v = 2 ≥ 1.5, so y = 1
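    A quick way to confirm the table above is to evaluate the unit at all four input pairs with the stated weights and threshold. This snippet is a minimal sketch, not part of the original lecture:

    def tlu(x1, x2, w1=1.0, w2=1.0, theta=1.5):
        return 1 if w1 * x1 + w2 * x2 >= theta else 0

    for x1 in (1, 0):
        for x2 in (1, 0):
            print(x1, x2, "->", tlu(x1, x2))   # reproduces the AND column above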

  • Slide 7/56

    The OR problem

    x1 x2 y
    1  1  1
    1  0  1
    0  1  1
    0  0  0

    Let w_1 = w_2 = 1 and θ = 0.5. Then:

    with x_1 = 0 and x_2 = 0: v = 0 < 0.5, so y = 0
    with x_1 = 1 and x_2 = 0: v = 1 ≥ 0.5, so y = 1
    with x_1 = 0 and x_2 = 1: v = 1 ≥ 0.5, so y = 1
    with x_1 = 1 and x_2 = 1: v = 2 ≥ 0.5, so y = 1

  • Slide 8/56

    The XOR problem

    x1 x2 y
    1  1  0
    1  0  1
    0  1  1
    0  0  0

    Let w_1 = w_2 = 1 and θ = ??? (no single threshold reproduces all four rows).
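    To see why no threshold works, one can sweep a grid of candidate thresholds with w_1 = w_2 = 1 and test each against the XOR table. This small check is mine, added for illustration:

    # With w1 = w2 = 1, sweep candidate thresholds and test each against XOR.
    xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

    def tlu(x1, x2, theta):
        return 1 if x1 + x2 >= theta else 0

    hits = [t / 10 for t in range(-20, 31)
            if all(tlu(a, b, t / 10) == y for (a, b), y in xor_table.items())]
    print(hits)   # [] : no threshold reproduces XOR with these weights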

  • Slide 9/56

    The graphs for the AND, OR, and XOR problems

    [Figure: the three truth tables plotted in the input plane, with points labelled Y=0 and Y=1.]

  • Slide 10/56

    What's Next

    Single-Layer Feedforward Networks and Classification Problems (McCulloch-Pitts neuron model)

    Decision Surface

    Rosenblatt's Perceptron Learning Rule (The Perceptron Convergence Theorem)

    Illustrative Example 1; Illustrative Example 2 (AND)

  • Slide 11/56

    Decision Surface (Hyperplane)

    For m input variables x_1, x_2, ..., x_m, the decision surface is the surface at which the output of the node is precisely equal to the threshold (bias), defined by

    Σ_{i=1}^{m} w_i x_i - θ = 0

    On one side of this surface the output of the decision unit, y, will be 0, and on the other side it will be 1.
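    A minimal sketch of this decision rule, with illustrative weights and threshold (the values are not from this slide):

    import numpy as np

    # Output 1 on the side where sum_i(w_i * x_i) - theta >= 0, else 0.
    def decision_unit(x, w, theta):
        return 1 if np.dot(w, x) - theta >= 0 else 0

    w, theta = np.array([1.0, 2.0]), 2.0
    print(decision_unit(np.array([2.0, 1.0]), w, theta))  # 2 + 2 - 2 >= 0 -> 1
    print(decision_unit(np.array([0.0, 0.5]), w, theta))  # 0 + 1 - 2 <  0 -> 0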

  • Slide 12/56

    Linearly separable

    A function f: {0,1}^n → {0,1} is linearly separable if the space of input vectors yielding 1 can be separated from those yielding 0 by a linear surface (hyperplane) in n dimensions.

    Example: in the two-dimensional case, a hyperplane is a straight line.

    [Figure: a linearly separable point set next to a non-linearly separable one.]

  • Slide 13/56

    Decision Surfaces in 2-D

    The two classification regions are separated by the decision boundary line L:

    w_1 x_1 + w_2 x_2 - θ = 0

    This line is shifted away from the origin according to the bias θ. If the bias is 0, the decision surface passes through the origin.

    This line is perpendicular to the weight vector W.

    Adding a bias allows the neuron to solve problems where the two sets of input vectors are not located on different sides of the origin.

  • Slide 14/56

    Decision Surfaces in 2-D, cont.

    In 2-D, the surface is

    w_1 x_1 + w_2 x_2 - θ = 0

    which we can write as

    x_2 = -(w_1 / w_2) x_1 + θ / w_2

    which is the equation of a line with gradient -w_1 / w_2 and intercept θ / w_2.
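    The algebra above can be sanity-checked numerically. The weights and threshold below are illustrative, chosen only to exercise the gradient and intercept formulas:

    # For w1*x1 + w2*x2 - theta = 0, recover the slope and intercept of the
    # equivalent line x2 = -(w1/w2)*x1 + theta/w2 and check a point on it.
    w1, w2, theta = 1.0, 2.0, 2.0
    slope = -w1 / w2                       # gradient -w1/w2
    intercept = theta / w2                 # x2-axis intercept theta/w2

    x1 = 4.0
    x2 = slope * x1 + intercept            # a point on the boundary line
    print(slope, intercept)                # -0.5 1.0
    print(w1 * x1 + w2 * x2 - theta)       # 0.0: the point lies on the surface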

  • Slide 15/56

    Examples of Decision Surfaces in 2-D:

    Let w_1 = 1, w_2 = 2, and bias = -2. The decision boundary is x_1 + 2x_2 - 2 = 0.

    Let w_1 = -2, w_2 = 1, and bias = -2. The decision boundary is -2x_1 + x_2 - 2 = 0.

    [Figure: each boundary line plotted in the (x_1, x_2) plane, with Class 1 on one side and Class 0 on the other.]

    Depending on the values of the weights, the decision boundary line will separate the possible inputs into two categories.

  • Slide 16/56

    Example

    Slope = (6 - 0) / (0 - 6) = -1

    Intercept on the x_2-axis = 6

    so the hyperplane (straight line) is x_2 = -x_1 + 6,

    which is: x_1 + x_2 - 6 = 0

  • Slide 17/56

    Linear Separability for n Dimensions

    By varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1 and another region that yields output 0.

    As we have seen, a two-dimensional input space can be divided by any straight line.

    A three-dimensional input space can be divided by any two-dimensional plane.

    In general, an n-dimensional input space can be divided by an (n-1)-dimensional plane or hyperplane.

    Of course, for n > 3 this is hard to visualize.

  • Slide 18/56

    What's Next: Rosenblatt's Perceptron Learning Rule

  • Slide 19/56

    The Perceptron, by Frank Rosenblatt (1958, 1962)

    It is the simplest form of a neural network.

    It is a single-layer network whose synaptic weights are adjustable.

    It is limited to performing pattern classification with only two classes.

    The two classes must be linearly separable; the patterns must lie on opposite sides of a hyperplane.

  • Slide 20/56

    The Perceptron Architecture

    A perceptron is a nonlinear neuron which uses the hard-limit transfer function (signum activation function) and returns +1 or -1:

    v = Σ_{j=1}^{m} w_j x_j + b = Σ_{j=0}^{m} w_j x_j = w^T x   (with x_0 = +1 and w_0 = b)

    y_k = sgn(v) = +1 if v ≥ 0, and -1 if v < 0

    Weights are adapted using an error-correction rule; the training technique used is called the perceptron learning rule.
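    A minimal sketch of this forward pass, using the augmented-vector convention that appears later in the lecture (x_0 = +1 carries the bias); the weight values are illustrative:

    import numpy as np

    def perceptron_output(w, x):
        # w = [b, w1, ..., wm] and x = [+1, x1, ..., xm] (augmented vectors).
        v = np.dot(w, x)               # induced local field v = w^T x
        return 1 if v >= 0 else -1     # signum hard limiter

    w = np.array([0.5, 0.3, 0.7])      # [bias, w1, w2]
    print(perceptron_output(w, np.array([1.0, 1.0, 1.0])))     # +1
    print(perceptron_output(w, np.array([1.0, -1.0, -1.0])))   # -1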

  • Slide 21/56

    The Perceptron Convergence Algorithm

    The learning algorithm finds a weight vector w such that:

    w^T x > 0 for every input vector x belonging to class C1
    w^T x ≤ 0 for every input vector x belonging to class C2

    The training problem is then to find a weight vector w such that these two inequalities are satisfied. This is achieved by updating the weights as follows:

    w(n+1) = w(n) + η(n) x(n)   if w^T(n) x(n) ≤ 0 and x(n) belongs to C1
    w(n+1) = w(n) - η(n) x(n)   if w^T(n) x(n) > 0 and x(n) belongs to C2
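    One step of this rule can be sketched as follows, with +1 labels standing for class C1 and -1 labels for class C2 (the function name and default learning rate are assumptions for illustration):

    import numpy as np

    def update(w, x, label, eta=0.5):
        v = np.dot(w, x)
        if v <= 0 and label == 1:      # x belongs to C1 but was classified as C2
            return w + eta * x
        if v > 0 and label == -1:      # x belongs to C2 but was classified as C1
            return w - eta * x
        return w                       # correctly classified: leave w unchanged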

  • Slide 22/56

    The Perceptron Convergence Theorem (Rosenblatt, 1958, 1962)

    If a problem is linearly separable, then the perceptron will converge after some n iterations, i.e. the perceptron will learn in a finite number of steps.

    Rosenblatt designed a learning law for weight adaptation in the McCulloch-Pitts model.

  • Slide 23/56

    Perceptron parameters and learning rule

    The training set consists of pairs {X, t}, where X is an input vector and t is the desired target.

    Iterative process: present a training example X, compute the network output y, compare the output y with the target t, and adjust the weights (add Δw).

    Learning rule: specifies how to change the weights w of the network as a function of the inputs X, the output y, and the target t.

  • Slide 24/56

    Deriving the error-correction learning rule: how do we adjust the weight vector?

    The net input signal is the sum of all inputs after passing through the synapses:

    net = Σ_{j=0}^{m} w_j x_j

    This can be viewed as computing the inner product of the vectors w_i and X:

    net = |w_i| |X| cos(α)

    where α is the angle between the two vectors.
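    The inner-product identity can be checked numerically; the two vectors below are arbitrary illustrative choices:

    import numpy as np

    w = np.array([2.0, 1.0])
    x = np.array([1.0, 3.0])

    alpha = np.arctan2(x[1], x[0]) - np.arctan2(w[1], w[0])       # angle between w and x
    print(np.dot(w, x))                                           # 5.0
    print(np.linalg.norm(w) * np.linalg.norm(x) * np.cos(alpha))  # ~5.0 as well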

  • Slide 25/56

    Deriving the error-correction learning rule, cont.

    Case 1: if the error t - y = 0, then make no change, Δw = 0.

    Case 2: if the target t = 1, the output y = -1, and the error t - y = 2, then

    w_new = w + a x   (move w in the direction of x)

    The weight vector incorrectly classifies the vector X, so the input vector X is added to the weight vector w. This makes the weight vector point closer to the input vector, increasing the chance that the input vector will be classified as a +1 in the future.

  • Slide 26/56

    Deriving the error-correction learning rule, cont.

    Case 3: if the target t = -1, the output y = 1, and the error t - y = -2, then

    w_new = w - a x   (move w away from the direction of x)

    The weight vector incorrectly classifies the vector X, so the input vector X is subtracted from the weight vector w. This makes the weight vector point farther away from the input vector, increasing the chance that the input vector will be classified as a -1 in the future.

  • Slide 27/56

    Deriving the error-correction learning rule, cont.

    The perceptron learning rule can be written more succinctly in terms of the error e = t - y and the change to be made to the weight vector, Δw:

    CASE 1. If e = 0, then make the change Δw equal to 0.
    CASE 2. If e = 2, then make the change Δw proportional to x.
    CASE 3. If e = -2, then make the change Δw proportional to -x.

    The sign of the change is the same as the sign of the error e, so all three cases can be written as a single expression:

    Δw = η (t - y) x

    w(n+1) = w(n) + Δw(n) = w(n) + η (t - y) x

    with w(0) set to random values, where n denotes the time step in applying the algorithm.

    The parameter η is called the learning rate. It determines the magnitude of the weight updates Δw.
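    The single-expression rule translates directly into code. This sketch keeps the same sign convention (targets and outputs in {-1, +1}); the worked numbers happen to match iteration 1 of Example 1 later in the lecture:

    import numpy as np

    # delta_w = eta * (t - y) * x covers all three cases (e = 0, 2, -2) at once.
    def perceptron_update(w, x, t, eta):
        y = 1 if np.dot(w, x) >= 0 else -1    # sgn of the net input
        return w + eta * (t - y) * x          # no change when t == y

    w = np.array([1.0, -0.8])
    w = perceptron_update(w, np.array([1.0, 2.0]), t=1, eta=0.5)
    print(w)   # [2.  1.2]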

  • Slide 28/56

    Learning Rate

    η governs the rate at which the training rule converges toward the correct solution.

    Typically 0 < η < 1.

  • Slide 29/56

    Perceptron Convergence Algorithm

    Variables and parameters:

    X(n) = input vector of dim (m+1) by 1 = [+1, x_1, x_2, ..., x_m]^T

    W(n) = weight vector of dim (m+1) by 1 = [b, w_1, w_2, ..., w_m]^T

    b = bias

    y(n) = actual response

    t(n) = desired (target) response

    η = learning-rate parameter, a positive constant less than unity

  • Slide 30/56

    Perceptron Convergence Algorithm, cont.

    1- Start with a randomly chosen weight vector w(0).
    2- Repeat:
    3-   For each training vector pair (X_i, t_i), evaluate the output y_i when X_i is the input.
    4-   If y_i ≠ t_i, then form a new weight vector w_{i+1} according to
             w_{i+1} = w_i + Δw_i = w_i + η (t_i - y_i) X_i
         else do nothing.
    5- Until the stopping criterion (convergence) is reached.

    Convergence: an epoch does not produce an error, i.e. all training vectors are classified correctly (y = t) for the same w(n); equivalently, the error is zero for all training patterns and no weights change in step 4.
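    Putting the whole algorithm together, here is a compact sketch of the training loop. The function name, augmented-input layout, and epoch cap are assumptions made for illustration; the sample data are the bipolar AND patterns of Example 2 later in the lecture:

    import numpy as np

    # X: one augmented row [+1, x1, ..., xm] per pattern; t: targets in {-1, +1}.
    def train_perceptron(X, t, eta=0.1, w0=None, max_epochs=100):
        w = np.zeros(X.shape[1]) if w0 is None else w0.astype(float)
        for _ in range(max_epochs):
            changed = False
            for xi, ti in zip(X, t):
                yi = 1 if np.dot(w, xi) >= 0 else -1
                if yi != ti:                      # step 4: update only on errors
                    w = w + eta * (ti - yi) * xi
                    changed = True
            if not changed:                       # convergence: an error-free epoch
                break
        return w

    X = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1]], dtype=float)
    t = np.array([1, -1, -1, -1])
    print(train_perceptron(X, t, eta=0.1, w0=np.array([0.5, 0.3, 0.7])))  # [-0.1  0.5  0.5]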

  • Slide 31/56

    Perceptron learning scheme

  • Slide 32/56

    Illustrative Example 1

  • Slide 33/56

    Introduction

    We will begin with a simple test problem to show how the perceptron learning rule should work. To simplify the problem, we begin with a network without a bias, with two inputs and one output. The network then has just two parameters, w_{1,1} and w_{1,2}, to adjust, as shown in the following figure.

    By removing the bias we are left with a network whose decision boundary must pass through the origin.

  • Slide 34/56

    Example 1

    The input/target pairs for our test problem are

    X_1 = [1, 2]^T, d_1 = 1;   X_2 = [-1, 2]^T, d_2 = -1;   X_3 = [0, -1]^T, d_3 = -1

    which are to be used in the training together with the learning rate 0.5 and the initial weights

    w^1 = [1.0, -0.8]^T

    Show how the learning proceeds.

  • Slide 35/56

    Iteration 1. The input is X_1 and the desired output is d_1:

    net^1 = w^1T X_1 = [1.0, -0.8] [1, 2]^T = -0.6

    y^1 = sgn(-0.6) = -1

    Since d_1 = 1, the error signal is e = 1 - (-1) = 2.

    Then a correction (weight update) in this step is necessary. The updated weight vector is

    w^2 = w^1 + 0.5 (1 - (-1)) X_1 = [1.0, -0.8]^T + [1, 2]^T = [2.0, 1.2]^T

    This operation is illustrated in the adjacent figure.

  • Slide 36/56

    Iteration 2. The input is X_2 and the desired output is d_2:

    net^2 = w^2T X_2 = [2.0, 1.2] [-1, 2]^T = 0.4

    y^2 = sgn(0.4) = 1

    Since d_2 = -1, the error signal is e = -1 - 1 = -2.

    Then a correction (weight update) in this step is necessary. The updated weight vector is

    w^3 = w^2 + 0.5 (-1 - 1) X_2 = [2.0, 1.2]^T - [-1, 2]^T = [3.0, -0.8]^T

    This operation is illustrated in the adjacent figure.

  • Slide 37/56

    Iteration 3. The input is X_3 and the desired output is d_3:

    net^3 = w^3T X_3 = [3.0, -0.8] [0, -1]^T = 0.8

    y^3 = sgn(0.8) = 1

    Since d_3 = -1, the error signal is e = -1 - 1 = -2.

    Then a correction (weight update) in this step is necessary. The updated weight vector is

    w^4 = w^3 + 0.5 (-1 - 1) X_3 = [3.0, -0.8]^T - [0, -1]^T = [3.0, 0.2]^T

    This operation is illustrated in the adjacent figure.

  • Slide 38/56

    Check: the weights changed in iterations 1, 2, and 3, so do not stop; continue with the next epoch.

  • Slide 39/56

    Iteration 4: net^4 = w^4T X_1 = [3.0, 0.2] [1, 2]^T = 3.4,  y = sgn(3.4) = 1 = d_1

    Iteration 5: net^5 = w^4T X_2 = [3.0, 0.2] [-1, 2]^T = -2.6,  y = sgn(-2.6) = -1 = d_2

    Iteration 6: net^6 = w^4T X_3 = [3.0, 0.2] [0, -1]^T = -0.2,  y = sgn(-0.2) = -1 = d_3

    No weights change in iterations 4, 5, and 6, so the stopping criterion is satisfied.

  • Slide 40/56

    The diagram to the left shows that the perceptron has finally learned to classify the three vectors properly. If we present any of the input vectors to the neuron, it will output the correct class for that input vector.
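    For reference, the six iterations of Example 1 can be replayed in a few lines of code; this sketch assumes the data and learning rate stated on Slide 34 (Example 1):

    import numpy as np

    # Replay Example 1: no bias, w(0) = [1.0, -0.8], learning rate 0.5.
    X = [np.array([1.0, 2.0]), np.array([-1.0, 2.0]), np.array([0.0, -1.0])]
    d = [1, -1, -1]

    w = np.array([1.0, -0.8])
    for step in range(6):                      # two passes over the three patterns
        x, t = X[step % 3], d[step % 3]
        net = np.dot(w, x)
        y = 1 if net >= 0 else -1
        w = w + 0.5 * (t - y) * x              # changes w only when y != t
        print("iteration", step + 1, "net =", round(net, 1), "w =", w)

    # nets: -0.6, 0.4, 0.8, 3.4, -2.6, -0.2; final w = [3.0, 0.2] as on the slides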

  • Slide 41/56

    Your turn (Homework 1)

    Consider the following set of training vectors X_1, X_2, and X_3:

    X_1 = [1, -2, 0, -1]^T,   X_2 = [0, 1.5, -0.5, -1]^T,   X_3 = [-1, 1, 0.5, -1]^T

    with learning rate equal to 0.1, and the desired responses and weights initialized as:

    d_1 = -1,  d_2 = -1,  d_3 = 1;   w^1 = [1, -1, 0, 0.5]^T

    Show how the learning proceeds.

  • Slide 42/56

    Example 2: the AND Problem

  • Slide 43/56

    Example 2

    Consider the AND problem with inputs

    x_1 = [1, 1]^T,  x_2 = [1, -1]^T,  x_3 = [-1, 1]^T,  x_4 = [-1, -1]^T

    with bias b = 0.5, learning rate 0.1, and the desired responses and weights initialized as:

    d_1 = 1,  d_2 = -1,  d_3 = -1,  d_4 = -1;   w = [0.3, 0.7]^T

    Show how the learning proceeds.

  • Slide 44/56

    Consider the augmented input vectors (with a leading +1 for the bias) and the augmented weight vector:

    X_1 = [1, 1, 1]^T,  X_2 = [1, 1, -1]^T,  X_3 = [1, -1, 1]^T,  X_4 = [1, -1, -1]^T

    w^0 = [0.5, 0.3, 0.7]^T

  • Slide 45/56

    Iteration 1. Input X_1 is presented with its desired output d_1:

    net^1 = w^0T X_1 = [0.5, 0.3, 0.7] [1, 1, 1]^T = 1.5

    y^1 = sgn(1.5) = 1

    Since d_1 = 1, the error signal is e_1 = 1 - 1 = 0.

    A correction (weight update) in this step is not necessary: w^1 = w^0.

  • Slide 46/56

    Iteration 2. Input X_2 is presented with its desired output d_2:

    net^2 = w^1T X_2 = [0.5, 0.3, 0.7] [1, 1, -1]^T = 0.1

    y^2 = sgn(0.1) = 1

    Since d_2 = -1, the error signal is e_2 = -1 - 1 = -2, so a correction in this step is necessary.

    The updated weight vector is

    w^2 = w^1 + 0.1 (-1 - 1) X_2 = [0.5, 0.3, 0.7]^T - 0.2 [1, 1, -1]^T = [0.3, 0.1, 0.9]^T

  • Slide 47/56

    Iteration 3. Input X_3 is presented with its desired output d_3:

    net^3 = w^2T X_3 = [0.3, 0.1, 0.9] [1, -1, 1]^T = 1.1

    y^3 = sgn(1.1) = 1

    Since d_3 = -1, the error signal is e_3 = -1 - 1 = -2, so a correction in this step is necessary.

    The updated weight vector is

    w^3 = w^2 + 0.1 (-1 - 1) X_3 = [0.3, 0.1, 0.9]^T - 0.2 [1, -1, 1]^T = [0.1, 0.3, 0.7]^T

  • Slide 48/56

    Iteration 4. Input X_4 is presented with its desired output d_4:

    net^4 = w^3T X_4 = [0.1, 0.3, 0.7] [1, -1, -1]^T = -0.9

    y^4 = sgn(-0.9) = -1

    Since d_4 = -1, the error signal is e_4 = -1 - (-1) = 0.

    A correction in this step is not necessary: w^4 = w^3.

  • Slide 49/56

    Check: the weights changed in iterations 2 and 3, so the stopping criterion is not satisfied. Continue with epoch 2.

  • Slide 50/56

    Iteration 5: net^5 = w^4T X_1 = [0.1, 0.3, 0.7] [1, 1, 1]^T = 1.1,  y = sgn(1.1) = 1 = d_1, so w^5 = w^4

    Iteration 6: net^6 = w^5T X_2 = [0.1, 0.3, 0.7] [1, 1, -1]^T = -0.3,  y = sgn(-0.3) = -1 = d_2, so w^6 = w^5

    Iteration 7: net^7 = w^6T X_3 = [0.1, 0.3, 0.7] [1, -1, 1]^T = 0.5,  y = sgn(0.5) = 1 ≠ d_3 = -1  (misclassified)

  • Slide 51/56

    Iteration 7. Input X_3 is presented with its desired output d_3:

    net^7 = w^6T X_3 = [0.1, 0.3, 0.7] [1, -1, 1]^T = 0.5

    y^7 = sgn(0.5) = 1

    Since d_3 = -1, the error signal is e_7 = -1 - 1 = -2, so a correction in this step is necessary.

    The updated weight vector is

    w^7 = w^6 + 0.1 (-1 - 1) X_3 = [0.1, 0.3, 0.7]^T - 0.2 [1, -1, 1]^T = [-0.1, 0.5, 0.5]^T

  • Slide 52/56

    Iteration 8. Now all the patterns give the correct response. Try it.

    We can draw the solution found using the line:

    0.5 x_1 + 0.5 x_2 - 0.1 = 0
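    As a final check, the sketch below (illustrative, not from the slides) confirms that the weight vector found in iteration 7 classifies all four AND patterns correctly, which is why iteration 8 and beyond make no further changes:

    import numpy as np

    # Final weights from iteration 7 (bias first): check all four AND patterns.
    w = np.array([-0.1, 0.5, 0.5])
    patterns = [(np.array([1.0,  1.0,  1.0]),  1),
                (np.array([1.0,  1.0, -1.0]), -1),
                (np.array([1.0, -1.0,  1.0]), -1),
                (np.array([1.0, -1.0, -1.0]), -1)]

    for x, d in patterns:
        y = 1 if np.dot(w, x) >= 0 else -1
        print(x[1:], "->", y, "target:", d)    # every output matches its target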

  • Slide 53/56

    Perceptron Limitations

  • Slide 54/56

    Perceptron Limitations: Linearly Inseparable Problems

    The perceptron learning rule is guaranteed to converge to a solution in a finite number of steps, as long as a solution exists.

    The perceptron can be used to classify input vectors that can be separated by a linear boundary. Unfortunately, many problems are not linearly separable. The classic example is the XOR gate. This problem is illustrated graphically as follows:

    [Figure: the XOR input space, which no single straight line can separate.]

  • Slide 55/56

    Rosenblatt had investigated more complex networks, which he felt would overcome the limitations of the basic perceptron, but he was never able to effectively extend the perceptron rule to such networks.

    In a later lecture we will introduce multilayer perceptrons, which can solve arbitrary classification problems, and we will describe the backpropagation algorithm, which can be used to train them.

    End of Perceptron

  • Slide 56/56

    Homework 1, cont.

    Problems: 1.1, 1.2, and 1.3

    Computer Experiment: 1.6