Intro to ML and Learning Algorithms for Single-Layer NN



SC 549 Artificial Neural Networks 2015/2016

Topic 02: Introduction to Machine Learning and Learning Algorithms for Single-Layer Neural Networks

Postgraduate Institute of Science, MSc in Computer Science, SC549 ANN

    Contents

Machine learning and learning algorithms

    Supervised and unsupervised

    Learning in neural networks

    Hebb rule (Hebbian learning)

    Perceptron and its learning algorithm

    ADALINE and its learning algorithm


    Learning in Neural Networks

Learning (training) in a neural network essentially means selecting, from the set of allowed models, the one that minimizes a cost function.

It is the process of finding the decision boundary by adjusting the weights.

It is the process of finding the weight matrix that provides the correct classification.


    Machine Learning

Machine learning:

a scientific discipline that explores the construction and study of algorithms that can learn from data and make predictions on data

the science of getting computers to act without being explicitly programmed


    Types of Learning Algorithms

    Supervised learning

    infers a function from labeled training data

    Unsupervised learning

    tries to find hidden structure in unlabeled data


    Supervised Learning

The NN is trained repeatedly by a teacher.

Each input presented to the network has an associated desired output.

In each learning cycle, the error between the actual and the desired output is used to adjust the weights.

When the error reaches an acceptable level, learning stops.

Applications: Classification and regression problems


    Unsupervised Learning

A teacher is not involved.

The network uses only the inputs.

The inputs form automatic clusters based on some closeness or similarity criteria.

Meanings are associated with these clusters depending on the data.

Applications: Clustering, dimensionality reduction


    Machine Learning

[Diagram: taxonomy of machine learning]

Unsupervised learning (models p(x) from the inputs x alone):
Clustering (K-means, GMM (EM), mean shift)
Dimensionality reduction (PCA, LDA)

Supervised learning (learns f: x -> t from labeled pairs (x, t)):
Classification (t ∈ {1, ..., n}) and regression (t: continuous variable)
Methods: ANN, SVM, decision tree, polynomial curve fit, Gaussian process


    Application of Machine Learning

    Pattern Recognition

Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data.


    Pattern Recognition

Predicting tumor cells as benign or malignant

Classifying credit card transactions as legitimate or fraudulent

Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil

Categorizing news stories as finance, weather, entertainment, sports, etc.


    Pattern Recognition Basic Concepts

Given a collection of records (training set)

Each record contains a set of attributes; one of the attributes is the class.

Find a model for the class attribute as a function of the values of the other attributes.


    Pattern Recognition Basic Concepts

Goal: previously unseen records should be assigned a class as accurately as possible.

A test set is used to determine the accuracy of the model.

Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
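
As a concrete illustration of this split, here is a minimal Python sketch using scikit-learn's train_test_split (the library and the toy records are assumptions, not part of the slides):

    from sklearn.model_selection import train_test_split

    # Hypothetical records: attribute values X and class labels y
    X = [[125, 1], [100, 0], [70, 0], [120, 1], [95, 0], [60, 0], [220, 1], [85, 0]]
    y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes"]

    # Hold out 25% of the records as the test set used to validate the model
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)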


    Pattern Recognition Example

[Diagram: learn a model from the training set, then apply the model to the test set]

Training set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

    Classification Techniques

Decision-tree-based methods

Rule-based methods

Memory-based reasoning

Neural networks

Naïve Bayes and Bayesian belief networks

Support vector machines


Classifiers Examples

Support vector machine: LibSVM, SVMLight

Decision tree: J48 (C4.5)

K-nearest neighbor

Bayesian: Naïve Bayes

Artificial neural networks: Perceptron, Multilayer perceptron, Self-organizing maps

Homework: Go through the list of classifiers in Weka


Learning Algorithms for Single-Layer Neural Networks

    Hebbian learning (Hebb rule)

    Perceptron learning

    Least mean square (LMS) learning


    Hebb Nets and Hebbian Learning

Donald Hebb, in his influential book The Organization of Behavior (1949), claimed:

Behavior changes are primarily due to changes of the synaptic strengths (wij) between neurons i and j.

The weight between two neurons increases if the two neurons activate simultaneously, and reduces if they activate separately.

That is, wij increases only when both i and j (two connected neurons) are on: the Hebbian learning law (algorithm).


    Hebb Nets and Hebbian Learning

In an ANN, the Hebbian law can be stated: the weight wi increases only if the outputs of both units, xi and y, have the same sign. This is a generalized version of the Hebbian law.

The weights are increased as follows:

wi(new) = wi(old) + xi * y

Sometimes there is a learning rate η:

wi(new) = wi(old) + η * xi * y


    Hebbian learning algorithm

Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. For each training sample s : t, do Steps 2-4
        /* s is the input pattern, t the target output of the sample */
Step 2. xi := si, i = 1 to n            /* set s to input units */
Step 3. y := t                          /* set y to the target */
Step 4. wi := wi + xi * y, i = 1 to n   /* update weights */
        b := b + y                      /* update bias */
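
A minimal Python sketch of these steps (an illustration, not part of the original slides; inputs and targets are plain Python lists):

    def hebb_train(samples):
        """samples: list of (s, t) pairs; s is an input vector, t its target output."""
        n = len(samples[0][0])
        w = [0.0] * n                # Step 0: wi = 0
        b = 0.0                      # Step 0: b = 0
        for s, t in samples:         # Step 1: one pass over the training samples
            x, y = s, t              # Steps 2-3: set inputs and target output
            for i in range(n):       # Step 4: wi := wi + xi * y
                w[i] += x[i] * y
            b += y                   # Step 4: b := b + y
        return w, b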


    Hebb Net Example AND Function

Examples: AND function

Binary units (1, 0)

(x1, x2)   y=t   w1   w2   b
(1, 1)      1     1    1   1
(1, 0)      0     1    1   1
(0, 1)      0     1    1   1
(0, 0)      0     1    1   1

An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each sample once.
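
Running the sketch above on the binary AND samples reproduces this table: every row leaves w1 = w2 = b = 1, because updates with y = 0 change nothing.

    binary_and = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
    w, b = hebb_train(binary_and)
    print(w, b)   # [1.0, 1.0] 1.0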


    Hebb Net Example AND Function

A boundary 1 + x1 + x2 = 0 is learned. This is not the correct boundary.


    Hebb Net Example AND Function

Bipolar units (-1, 1)

(x1, x2)    y=t   w1   w2    b
(1, 1)        1    1    1    1
(1, -1)      -1    0    2    0
(-1, 1)      -1    1    1   -1
(-1, -1)     -1    2    2   -2

A correct boundary, -1 + x1 + x2 = 0, is successfully learned. (-2 + 2x1 + 2x2 = 0 is the learned boundary; the common factor 2 cancels out.)
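
The same sketch run on the bipolar samples reproduces this table and ends at the correct weights:

    bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    w, b = hebb_train(bipolar_and)
    print(w, b)   # [2.0, 2.0] -2.0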


    Hebb Net Example AND Function

With bipolar units, a correct boundary, -1 + x1 + x2 = 0, is successfully learned.


Stronger Learning Methods Are Needed

Classification error can be used to determine the weight update.

Training samples can be used repeatedly, each time changing the weights only slightly.

The learning methods of the Perceptron and ADALINE models are error-driven.


    Perceptron

The perceptron occupies a special place in the historical development of neural networks:

It was the first algorithmically described neural network.

It was invented by Frank Rosenblatt, a psychologist (1962).


    Perceptron

Rosenblatt's perceptron is built around the McCulloch-Pitts model of a neuron.

Basically, it consists of a single neuron with adjustable synaptic weights and bias.

The perceptron works as a binary classifier.


    Perceptron

[Figure: perceptron model. Activation function = signum function]


    Perceptron

The output of the perceptron, y = f(s), is computed using the signum (sign) activation function:

y = sgn(s) = +1 if s >= 0, -1 if s < 0

where s = Σ wi xi + b, or equivalently s = wT x with the bias folded into the weight vector.
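
A small Python sketch of this computation (illustrative only, assuming the bias is folded into the weight vector as w[0] with a fixed first input of 1):

    def sgn(s):
        """Signum function: +1 if s >= 0, else -1 (the convention at s = 0 varies)."""
        return 1 if s >= 0 else -1

    def perceptron_output(w, x):
        """y = sgn(w^T x); w and x include the bias term at index 0."""
        return sgn(sum(wi * xi for wi, xi in zip(w, x)))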


    Perceptron

Perceptrons can differentiate patterns only if they are linearly separable.

Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes.

The proof of convergence of the algorithm is known as the perceptron convergence theorem.


    Perceptron learning algorithm

Variables and Parameters:

x(n) = (m + 1)-by-1 input vector = [1, x1(n), x2(n), ..., xm(n)]

w(n) = (m + 1)-by-1 weight vector = [b, w1(n), w2(n), ..., wm(n)]

b = bias

y(n) = actual response

d(n) = desired response (target)

η = learning-rate parameter


    Perceptron learning algorithm

1. Initialization. Set w(0) = 0. Then perform the following computations for time step (iteration) n = 1, 2, ....

2. Activation. At time step n, activate the perceptron by applying the continuous-valued input vector x(n) and desired response d(n).

3. Computation of Actual Response. Compute the actual response of the perceptron as y(n) = sgn[wT(n)x(n)], where sgn(·) is the signum function.

4. Adaptation of Weight Vector. Update the weight vector of the perceptron to obtain w(n + 1) = w(n) + η[d(n) - y(n)]x(n).

5. Continuation. Increment time step n by one and go back to Step 2 until convergence.
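
A minimal Python sketch of these steps (illustrative, not from the slides; the mean-error stopping test anticipates the convergence condition on a later slide):

    def sgn(s):
        return 1 if s >= 0 else -1    # signum activation

    def perceptron_train(samples, eta=0.1, max_epochs=100):
        """samples: list of (x, d) pairs, with x = [1, x1(n), ..., xm(n)]."""
        m = len(samples[0][0])
        w = [0.0] * m                                    # Step 1: w(0) = 0
        for _ in range(max_epochs):                      # Step 5: iterate until convergence
            mean_error = 0.0
            for x, d in samples:                         # Step 2: apply x(n) and d(n)
                y = sgn(sum(wi * xi for wi, xi in zip(w, x)))   # Step 3: y(n) = sgn(w^T x)
                for i in range(m):                       # Step 4: w := w + eta*[d - y]*x
                    w[i] += eta * (d - y) * x[i]
                mean_error += abs(d - y) / len(samples)
            if mean_error == 0:                          # stop once every sample is correct
                return w
        return w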


    Perceptron learning algorithm

Weight update rule:

w(n + 1) = w(n) + η[d(n) - y(n)]x(n)

The weight update is based on this error-correction rule; the proof that it converges is known as the perceptron convergence theorem.

Learning parameter (learning rate): 0 < η ≤ 1

The initial weights are set to small random values.


Perceptron Convergence (Stopping Condition)

Each iteration goes through every sample in the training set; one such pass is called an epoch. The algorithm runs for several epochs until convergence.

Convergence: when the mean error, (1/m) Σi |ei(n)|, where ei(n) = di(n) - yi(n) and m = number of input samples, is less than a threshold value, or ideally when the mean error = 0.

Or when a predetermined number of iterations has been completed.


    An Application of Perceptron

Character recognition

7 characters (A, B, C, D, E, F, and G) from 3 fonts are provided, as shown in the next slide.

21 input samples

An algorithm should be developed to classify a given character into one of the seven characters.


An Application of Perceptron Input Samples

[Figure: the 21 input samples, characters A-G in 3 fonts as 9x7 pixel bitmaps]

An Application of Perceptron Character Recognition

Solution: Single-layer neural network of perceptrons

Input layer: 63 binary inputs, representing 9x7 pixels, where a dot is 0 and a hash is 1

Output layer: 7 perceptrons

Perceptron 1 outputs "A" or "Not A", perceptron 2 outputs "B" or "Not B", and so on.

E.g., the output vector for letter B is 0100000 (or -1 +1 -1 -1 -1 -1 -1): only perceptron 2 is active.
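
A Python sketch of this encoding and output scheme (the helper names are illustrative assumptions; the '.'/'#' pixel convention follows the slide):

    LETTERS = "ABCDEFG"

    def encode(bitmap_rows):
        """Flatten a 9x7 character bitmap into a 63-element 0/1 input vector."""
        return [1 if ch == "#" else 0 for row in bitmap_rows for ch in row]

    def targets_for(letter):
        """Desired outputs of the 7 perceptrons; e.g. 'B' -> [-1, 1, -1, -1, -1, -1, -1]."""
        return [1 if l == letter else -1 for l in LETTERS]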


    Learning Rate

The learning rate has to be chosen appropriately:

A small value will make the learning process extremely slow.

A large value will result in fast learning, but the learning process may not converge.


Least Mean Square (LMS) Learning and ADALINE

The least-mean-square (LMS) algorithm was the first linear adaptive-filtering algorithm, used for solving problems such as prediction and communication-channel equalization.

LMS finds a desired filter by computing the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal).

It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.


    Adaptive Filter

An adaptive filter is a system with a linear filter whose transfer function is controlled by variable parameters, together with a means to adjust those parameters according to an optimization (adaptive) algorithm.


    Adaptive Filter

[Diagram: adaptive filter. Input x(n) passes through the linear filter to produce y(n); the error e(n) = d(n) - y(n), formed at a summing junction (+ for d(n), - for y(n)), drives the adaptive algorithm that adjusts the filter]

This system can easily be modeled using a simple neuron (McCulloch-Pitts model).


    ADALINE

The Adaptive Linear Neuron (ADALINE), introduced by Widrow and Hoff (1960), is an implementation of an adaptive filter.

ADALINE networks are similar to the perceptron, but their transfer function is linear (f(u) = u) rather than hard-limiting (i.e., signum).

This allows their outputs to take on any value, whereas the perceptron output is limited to either 0 or 1 (or -1 or 1).

Hence, ADALINE is also built around the McCulloch-Pitts model of a neuron.


    ADALINE

[Figure: ADALINE model. Activation function = linear function]


    ADALINE

The ADALINE is trained using the least-mean-square (LMS) or Widrow-Hoff rule.


    LMS Learning Rule

The LMS learning rule is similar to perceptron learning, except for the weight update rule.

The LMS rule adjusts the weights to reduce the difference (error) between the net input (local induced field) and the desired outputs.

This is because the activation function is linear.


    LMS Learning Rule

Learning algorithm: similar to perceptron learning, except for the weight update rule:

w(n + 1) = w(n) + η[d(n) - y(n)]x(n)

where y(n) = wT(n)x(n)


LMS Convergence (Stopping Condition)

LMS stops when the mean-square error (MSE) is less than a certain threshold value. With error e(n) = d(n) - y(n) and threshold θ:

MSE = (1/m) Σn e(n)^2 < θ   (m = number of input samples)


    ADALINE LMS Algorithm

Step 0. Initialize weights. Set learning rate η.
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each training pair x : d, do Steps 3-5.
Step 3. Set activations of input units, i = 1 to n: xi(n) = si.
Step 4. Compute net input to output unit: y(n) = wT(n)x(n)
Step 5. Update bias and weights, i = 1 to n: w(n + 1) = w(n) + η[d(n) - y(n)]x(n)
Step 6. Test for stopping condition: if the mean square error is less than a threshold value θ, then stop; otherwise go to Step 2 and continue.
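
A minimal Python sketch of Steps 0-6 (illustrative only; the bias is folded into w as w[0] with a fixed first input of 1, and eta and theta stand for the learning rate η and the MSE threshold θ):

    def adaline_train(samples, eta=0.01, theta=1e-3, max_epochs=1000):
        """samples: list of (x, d) pairs, with x = [1, x1, ..., xn]."""
        n = len(samples[0][0])
        w = [0.0] * n                                      # Step 0: initialize weights
        for _ in range(max_epochs):                        # Step 1: repeat while not stopped
            mse = 0.0
            for x, d in samples:                           # Step 2: each training pair x : d
                y = sum(wi * xi for wi, xi in zip(w, x))   # Steps 3-4: net input y = w^T x
                for i in range(n):                         # Step 5: w := w + eta*[d - y]*x
                    w[i] += eta * (d - y) * x[i]
                mse += (d - y) ** 2 / len(samples)
            if mse < theta:                                # Step 6: stop when MSE < theta
                break
        return w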


    MADALINE

Extension of ADALINE

MADALINE (Many ADALINEs) is a three-layer (input, hidden, output), fully connected, feedforward artificial neural network architecture for classification that uses ADALINE units in its hidden and output layers.
