What is pattern recognition (lecture 3 of 6)

ERI SUMMER TRAINING, COMPUTERS & SYSTEMS DEPT. Dr. Randa Elanwar, Lecture 3


TRANSCRIPT

Page 1: What is pattern recognition (lecture 3 of 6)

ERI SUMMER TRAINING, COMPUTERS & SYSTEMS DEPT.

Dr. Randa Elanwar, Lecture 3

Page 2: What is pattern recognition (lecture 3 of 6)

Content

• Nonlinear problems

• Learning methods

  • Supervised learning

  • Unsupervised learning

  • Reinforcement learning

Page 3: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• So far we have seen that a NN can handle problems with:

  • feature vectors of dimension > 2 (i.e., a hyper feature space)

  • multiple classes (>= 2)

  • linear problems (linearly separable classes)

• Sometimes the nature of the problem, or the features chosen for the feature space, means the classes cannot be separated by a first-degree polynomial, as in the following example.


Page 4: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• Assume a shoe factory is packing pairs of shoes of a certain size on a moving belt. The machine should check that the two shoes in a pair are not alike. The factory labels a left shoe "0" and a right shoe "1".

• The factory needs a solution that checks the labels and, whenever it finds a matching pair (two left or two right shoes), stops the moving belt to prevent packing it.

• So we want to implement a NN that acts as a logical XOR function. In other words, whenever the input is "00" or "11" the output is "0" and the belt stops; otherwise the output is "1" and the belt keeps moving.

Page 5: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• The XOR problem

• No single line can ever separate the samples correctly. The only way to separate the positive from the negative examples is to draw two lines (i.e., we need two straight-line equations) or to draw a nonlinear region that captures one class only.

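The claim can be checked by brute force. A small sketch in plain Python: scan a grid of candidate line weights and confirm none of them classifies all four XOR samples (the particular weight grid is an illustrative choice; in fact no real-valued line separates XOR at all):

```python
import itertools

# XOR truth table: inputs -> class label
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, b):
    """True if the line w1*x1 + w2*x2 + b = 0 classifies all four XOR samples."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in XOR.items())

# scan a small grid of candidate weights; none of them separates XOR
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product([-2, -1, -0.5, 0.5, 1, 2], repeat=3))
print(found)  # False: no single line works
```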

Page 6: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• To implement the nonlinearity we need to insert one or more extra layers of nodes, called hidden layers, between the input layer and the output layer.


Page 7: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• The nonlinearity helps re-shape the straight-line decision boundary into a higher-order polynomial that can successfully separate the samples of each class.


Page 8: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• The higher the polynomial order, the more overfitting we get. But here we have to answer two questions:


• Why are the samples interfering in the feature space?

• How far should we pursue raising the order of the decision boundary (i.e., adding hidden layers to the network)?

Page 9: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• Well! First, you will notice that discriminative features are rare and have a limit.

• Real samples usually have similarities that cause class interference/overlap within any feature space.

• Moreover, some features may depend on other features used and not be distinct. This dependency usually allows more interference.


Page 10: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• Sometimes the number of features used is insufficient to distinctly separate the samples in the feature space.

• Increasing the number of distinct features helps widen the space between the samples of each class.

• However, increasing the number of features more than needed introduces noise in the form of irrelevant features, which does not improve the separation and also increases system complexity (remember that the number of nodes in the NN input layer equals the number of features used).

Page 11: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• Using a feed-forward network alone to solve the overlapping problem will not help, because the delta rule only works if the set of <input, output> pairs is learnable (representable). In that case the delta rule will find the necessary weights:

  • in a finite number of steps

  • independent of the initial weights

• In the case of interfering samples in the feature space, the delta rule will run in an infinite loop because the error will never diminish to zero, i.e., the sample pairs are not learnable. What we need is a nonlinear decision boundary (e.g., a hyperbola).

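For contrast, here is a small sketch of a delta-rule-style (perceptron) update on the linearly separable AND problem, where it does converge in a finite number of steps. The learning rate and the zero initial weights are illustrative choices, not values from the lecture:

```python
import numpy as np

# linearly separable AND problem: only input (1,1) is positive
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)   # initial weights (convergence is independent of these)
b = 0.0
lr = 0.5

for epoch in range(100):
    errors = 0
    for xi, ti in zip(X, t):
        out = 1.0 if xi @ w + b > 0 else 0.0
        err = ti - out                 # the error drives the weight update
        if err != 0.0:
            w += lr * err * xi
            b += lr * err
            errors += 1
    if errors == 0:                    # converged: every sample classified
        break

preds = [1.0 if xi @ w + b > 0 else 0.0 for xi in X]
print(preds)  # [0.0, 0.0, 0.0, 1.0]
```

On interfering samples such as XOR, the same loop would never reach `errors == 0` and would exhaust its epoch budget instead.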

Page 12: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• This solution is offered in either of two ways:

1. Adding hidden layers (a Multi-Layer Perceptron, MLP NN) to add nonlinearity to the decision boundary, up to some acceptable limit.

2. Making a sample transformation by kernels; in other words, multiplying the feature vectors of the patterns/samples by a set of orthogonal functions to re-locate them in space in a way that makes a linear solution possible (a Radial Basis Function, RBF NN).
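As a tiny illustration of the transformation idea (not an actual RBF network; the mapping here is a simple polynomial feature chosen for illustration): appending the product x1*x2 to the XOR feature vector relocates the samples so that a single line separates them:

```python
# feature map phi(x1, x2) = (x1, x2, x1*x2): XOR becomes linearly separable
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# one linear boundary in the mapped space: x1 + x2 - 2*(x1*x2) - 0.5 = 0
preds = {}
for (x1, x2), label in XOR.items():
    z = 1 * x1 + 1 * x2 - 2 * (x1 * x2) - 0.5
    preds[(x1, x2)] = 1 if z > 0 else 0

print(preds == XOR)  # True: the single line classifies all four samples
```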

Page 13: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• The learning of MLP and RBF networks depends on a method other than the delta rule, called the back-propagation algorithm. This algorithm also depends on minimizing the error function of the misclassified patterns. It has a long derivation that we will not discuss now; maybe later.

• But to imagine how things work: back propagation tries to transform the training patterns to make them almost linearly separable, and then uses a linear network.

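A compact sketch of back propagation in practice: a 2-4-1 sigmoid network trained by gradient descent on XOR. The layer sizes, learning rate, epoch count, and random seed are illustrative assumptions, not values from the lecture:

```python
import numpy as np

np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 4 hidden units -> 1 output
W1 = np.random.randn(2, 4); b1 = np.zeros(4)
W2 = np.random.randn(4, 1); b2 = np.zeros(1)
lr = 0.5

losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)            # forward pass: hidden activations
    Y = sigmoid(H @ W2 + b2)            # forward pass: network output
    losses.append(float(np.mean((Y - T) ** 2)))
    # backward pass: propagate the output error through the layers
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
```

After training, the mean squared error is far below its initial value, i.e., the hidden layer has re-shaped the patterns so the output unit can separate them.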

Page 14: What is pattern recognition (lecture 3 of 6)

Nonlinear problems

• In other words, if we need more than one straight line to separate the +ve and -ve patterns, we solve the problem in two phases:

• In phase 1: we first represent each straight line with a single perceptron and classify the training patterns (outputs).

• In phase 2: these outputs are then transformed into new patterns which are now linearly separable, and can be classified by an additional perceptron giving the final result.
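The two phases can be written out by hand for the XOR case (the particular line weights below are illustrative choices; other lines work too):

```python
# step activation shared by all perceptrons
step = lambda z: 1 if z > 0 else 0

# phase 1: two perceptrons, one per straight line
line1 = lambda x1, x2: step(x1 + x2 - 0.5)    # fires above the line x1 + x2 = 0.5 (an OR)
line2 = lambda x1, x2: step(1.5 - x1 - x2)    # fires below the line x1 + x2 = 1.5 (a NAND)

# phase 2: the transformed patterns (line1, line2) are linearly separable;
# one more perceptron (an AND) gives the final XOR result
xor = lambda x1, x2: step(line1(x1, x2) + line2(x1, x2) - 1.5)

results = [xor(0, 0), xor(0, 1), xor(1, 0), xor(1, 1)]
print(results)  # [0, 1, 1, 0]
```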

Page 15: What is pattern recognition (lecture 3 of 6)

Learning Methods

• Learning/training is the process of modifying the weights of the connections between network layers, with the objective of achieving the expected output.

• This is achieved through:

  • Supervised learning

  • Unsupervised learning

  • Reinforcement learning


Page 16: What is pattern recognition (lecture 3 of 6)

Supervised learning

• Each input vector requires a corresponding target vector: training pair = [input vector, target vector].

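For the shoe-belt example above, the training pairs would look like this (a plain-data sketch; the list layout is an illustrative choice):

```python
# training pairs [input vector, target vector] for the XOR-style shoe-pair check:
# input = (label of shoe 1, label of shoe 2), target = 1 means keep the belt moving
training_pairs = [
    ([0, 0], [0]),   # two left shoes  -> stop the belt
    ([0, 1], [1]),   # a proper pair   -> keep moving
    ([1, 0], [1]),   # a proper pair   -> keep moving
    ([1, 1], [0]),   # two right shoes -> stop the belt
]

inputs  = [p[0] for p in training_pairs]
targets = [p[1] for p in training_pairs]
```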

Page 17: What is pattern recognition (lecture 3 of 6)

Supervised learning

• During learning, the produced output is compared with the desired output.

• The difference between the two outputs is used to modify the learning weights according to the learning algorithm.

• Learning cases: pattern recognition problems.

• Neural network models using supervised learning: multi-layer perceptron, feed-forward networks, radial basis function networks, support vector machines.


Page 18: What is pattern recognition (lecture 3 of 6)

Unsupervised learning

• All similar input patterns are grouped together as clusters. If a matching input pattern is not found, a new cluster is formed.

• In unsupervised learning there is no error feedback, because targets are not provided.

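The "match an existing cluster or form a new one" behaviour can be sketched as a simple leader-clustering loop (the distance threshold and the sample points are assumed parameters for illustration):

```python
def leader_cluster(points, threshold=1.0):
    """Assign each point to the nearest existing cluster centre,
    or start a new cluster if no centre lies within `threshold`."""
    centres, labels = [], []
    for p in points:
        best, best_dist = None, float("inf")
        for i, c in enumerate(centres):
            d = sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > threshold:
            centres.append(p)               # no match: form a new cluster
            labels.append(len(centres) - 1)
        else:
            labels.append(best)             # match: join the existing cluster
    return labels, centres

labels, centres = leader_cluster([(0, 0), (0.2, 0), (5, 5), (5.1, 4.9)])
print(labels)  # [0, 0, 1, 1]
```

Note that no targets appear anywhere: the grouping emerges purely from similarities among the inputs.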

Page 19: What is pattern recognition (lecture 3 of 6)

Unsupervised learning

• The network must discover patterns, regularities, and features in the input data on its own, without reference to the output. This process is called self-organizing.

• Learning cases: appropriate for clustering tasks, like finding similar groups of documents on the web, content-addressable memory, and clustering in general.

• Neural network models using unsupervised learning: Kohonen self-organizing maps, Hopfield networks.

Page 20: What is pattern recognition (lecture 3 of 6)

Reinforcement learning

• Feedback is provided, but the exact desired output is absent. I.e., unlike unsupervised learning, there is feedback, but it comes in the form of "good/bad", "greater/less", etc.; no exact value is given for the desired output or the class membership.

• The net is only provided with guidance to determine whether the produced output is acceptable or not.
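A minimal sketch of learning from evaluative feedback only: the learner adjusts a single weight, and a critic answers only "better or not"; it never reveals the target value. The target, step size, and trial count here are illustrative assumptions:

```python
import random

random.seed(0)
TARGET = 0.7                     # known only to the critic, never to the learner

def critic(old_w, new_w):
    """Evaluative feedback only: is the new weight better than the old one?"""
    return abs(TARGET - new_w) < abs(TARGET - old_w)

w = 0.0
for _ in range(200):
    trial = w + random.uniform(-0.1, 0.1)   # propose a small random change
    if critic(w, trial):                    # keep only changes judged "good"
        w = trial
```

After the loop, `w` sits close to the hidden target even though the learner never saw an error value, only accept/reject verdicts.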

Page 21: What is pattern recognition (lecture 3 of 6)

Reinforcement learning

• Weights are modified in the units that have errors.


Page 22: What is pattern recognition (lecture 3 of 6)

Reinforcement learning

• When is reinforcement learning used?

• When less information is available about the target output values (critic information).

• Feedback in this case is only evaluative, not instructive.