
Page 1: Pattern Recognition Linear Classifier by Zaheer Ahmad

Pattern Recognition: Linear Classifiers

Zaheer Ahmad, PhD Scholar

[email protected]
Department of Computer Science, University of Peshawar

Page 2: Pattern Recognition Linear Classifier by Zaheer Ahmad

Agenda

• Pattern Recognition
  – Features and Patterns
  – Classifiers
  – Approaches
  – Design Cycle
• Linear Classification
  – Linear Discriminant Functions
  – Linear Separability
  – Fisher Discriminant Functions
  – Support Vector Machines (SVMs)

Page 3: Pattern Recognition Linear Classifier by Zaheer Ahmad

What is pattern recognition?

• “The assignment of a physical object or event to one of several pre-specified categories” –Duda and Hart

• “The science that concerns the description or classification (recognition) of measurements” –Schalkoff

• “The process of giving names to observations x.” –Schürmann

• Pattern Recognition is concerned with answering the question “What is this?” –Morse

Page 4: Pattern Recognition Linear Classifier by Zaheer Ahmad

Applications of PR

• Image processing
• Computer vision
• Speech recognition
• Data mining
• Automated target recognition
• Optical character recognition
• Seismic analysis
• Man and machine diagnostics
• Fingerprint identification
• Industrial inspection
• Financial forecasting
• Medical diagnosis
• ECG signal analysis

Page 5: Pattern Recognition Linear Classifier by Zaheer Ahmad

Terminology

• Recognition: during recognition (or classification), given objects are assigned to prescribed classes.

• Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

• An algorithm that implements classification, especially in a concrete implementation, is known as a classifier.

• A classifier is a machine that performs classification.

Page 6: Pattern Recognition Linear Classifier by Zaheer Ahmad

Features

• A feature is any distinctive aspect, quality or characteristic of an object.
• Features may be symbolic (e.g., color) or numeric (e.g., height).
• The combination of d features is a d-dimensional column vector called a feature vector.
• The d-dimensional space defined by the feature vector is called the feature space.
  – Objects are represented as points in feature space; the result is a scatter plot.
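As a concrete illustration, here is a minimal sketch (assuming NumPy; the two features and their values are hypothetical) of objects represented as points in a d-dimensional feature space:

```python
import numpy as np

# Hypothetical numeric features (e.g., height in cm, weight in kg):
# each object becomes a d-dimensional feature vector, i.e. a point
# in the d-dimensional feature space.
x1 = np.array([170.0, 65.0])   # feature vector of object 1
x2 = np.array([155.0, 48.0])   # feature vector of object 2

d = x1.shape[0]
print(f"d = {d}: objects are points in a {d}-dimensional feature space")
```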

Page 7: Pattern Recognition Linear Classifier by Zaheer Ahmad

Features

Page 8: Pattern Recognition Linear Classifier by Zaheer Ahmad

What is a “good” feature vector?

• The quality of a feature vector is related to its ability to discriminate examples from different classes.
  – Examples from the same class should have similar feature values.
  – Examples from different classes should have different feature values.

Page 9: Pattern Recognition Linear Classifier by Zaheer Ahmad

More feature properties

Page 10: Pattern Recognition Linear Classifier by Zaheer Ahmad

Pattern and Pattern Class

• A pattern is an object, process or event that can be given a name.
• A pattern is a composite of traits or features characteristic of an individual.
• In classification tasks, a pattern is a pair of variables {x, ω} where
  – x is a collection of observations or features (the feature vector)
  – ω is the concept behind the observation (the label/category)
• A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.
• A class/pattern class is a set of objects having some important properties in common.
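In code, a pattern {x, ω} maps naturally onto a (feature vector, label) pair. A minimal sketch, assuming NumPy and hypothetical data:

```python
import numpy as np

# Each pattern is a pair {x, ω}: the observed feature vector x
# and the label (category) ω behind the observation.
patterns = [
    (np.array([170.0, 65.0]), "omega_1"),
    (np.array([155.0, 48.0]), "omega_1"),
    (np.array([182.0, 90.0]), "omega_2"),
]

for x, omega in patterns:
    print(f"x = {x}  ->  class {omega}")
```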

Page 11: Pattern Recognition Linear Classifier by Zaheer Ahmad

Decision Boundary/Surface

• A line or curve separating the classes is a decision boundary.

• The equation g(x) = 0 defines the decision surface that separates points assigned to category ω1 from points assigned to category ω2.

• When g(x) is linear, the decision surface is a hyperplane.
• If x1 and x2 are both on the hyperplane, then wᵀx1 + w0 = wᵀx2 + w0 = 0, so wᵀ(x1 − x2) = 0: w is normal to any vector lying in the hyperplane.

Page 12: Pattern Recognition Linear Classifier by Zaheer Ahmad

Decision Boundary

Slope-intercept form of a line (straight line): the equation of a line with a defined slope m can also be written as y = mx + b.
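To connect the hyperplane form with the slope-intercept form: in two dimensions, writing x = (x1, x2) and w = (w1, w2), the decision surface g(x) = wᵀx + w0 = 0 is

  w1 x1 + w2 x2 + w0 = 0,

which for w2 ≠ 0 rearranges to

  x2 = −(w1/w2) x1 − (w0/w2),

i.e. a line with slope m = −w1/w2 and intercept b = −w0/w2.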

Page 13: Pattern Recognition Linear Classifier by Zaheer Ahmad

Classifiers

• The task of a classifier is to partition feature space into class-labeled decision regions.
• Borders between decision regions are called decision boundaries.
• The classification of a feature vector x consists of determining which decision region it belongs to, and assigning x to that class.

Page 14: Pattern Recognition Linear Classifier by Zaheer Ahmad

Pattern recognition approaches

Statistical
• Patterns are classified based on an underlying statistical model of the features.
  – The statistical model is defined by a family of class-conditional probability density functions p(x|ω) (the probability of feature vector x given class ω).

Neural
• Classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern).
  – “Knowledge” is stored in the connectivity and strength of the synaptic weights.
  – Trainable, non-algorithmic, black-box strategy.
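As a rough illustration of the statistical approach, here is a minimal sketch assuming one-dimensional features, Gaussian class-conditional densities p(x|ω), and equal priors; the parameters are hypothetical, standing in for values estimated from training data:

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Gaussian class-conditional density p(x|ω) with the given parameters."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical parameters for two classes: ω -> (mean, std)
params = {"omega_1": (2.0, 1.0), "omega_2": (5.0, 1.5)}

def classify(x):
    # With equal priors, assign x to the class maximizing p(x|ω).
    return max(params, key=lambda w: gaussian_pdf(x, *params[w]))

print(classify(3.0))   # -> "omega_1" (closer to that class's mean)
```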

Page 15: Pattern Recognition Linear Classifier by Zaheer Ahmad

• Very attractive since
  – it requires minimum a priori knowledge
  – with enough layers and neurons, ANNs can create any complex decision region

Syntactic
• Patterns are classified based on measures of structural similarity.
• “Knowledge” is represented by means of formal grammars or relational descriptions (graphs).
• Used not only for classification, but also for description.
• Typically, syntactic approaches formulate hierarchical descriptions of complex patterns built up from simpler subpatterns.

Page 16: Pattern Recognition Linear Classifier by Zaheer Ahmad
Page 17: Pattern Recognition Linear Classifier by Zaheer Ahmad

The pattern recognition design cycle

Data collection
• Probably the most time-intensive component of a PR project
• How many examples are enough?

Feature choice
• Critical to the success of the PR problem
  – “Garbage in, garbage out”
• Requires basic prior knowledge

Model choice
• Statistical, neural and structural approaches
• Parameter settings

Page 18: Pattern Recognition Linear Classifier by Zaheer Ahmad

Training
• Given a feature set and a “blank” model, adapt the model to explain the data
• Supervised, unsupervised and reinforcement learning

Evaluation
• How well does the trained model do?
• Overfitting vs. generalization
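One standard way to check overfitting vs. generalization is to evaluate on held-out data. A minimal sketch assuming scikit-learn, with synthetic data standing in for a real training set:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Synthetic two-class data in a 2-D feature space (a stand-in for real data).
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

clf = Perceptron().fit(X_train, y_train)               # supervised training
print("train accuracy:", clf.score(X_train, y_train))  # fit on seen data
print("test accuracy: ", clf.score(X_test, y_test))    # generalization
```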

Page 19: Pattern Recognition Linear Classifier by Zaheer Ahmad

Linear Classification

• Classification in which the decision boundary in the feature (input) space is linear.
• In linear classification the input space is split by (hyper)planes into regions, each with an assigned class.

Page 20: Pattern Recognition Linear Classifier by Zaheer Ahmad

Linearly Separable

• If a hyperplanar decision boundary exists that correctly classifies all the training samples for a c = 2 class problem, the samples are said to be linearly separable.
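When the samples are linearly separable, the classical perceptron rule is guaranteed to find a separating hyperplane w·x + w0 = 0 in finitely many updates. A minimal NumPy sketch on a hypothetical toy dataset:

```python
import numpy as np

# Toy linearly separable data: two classes labelled -1 and +1.
X = np.array([[2.0, 2.0], [1.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([-1, -1, 1, 1])

w, w0, lr = np.zeros(2), 0.0, 0.1
for _ in range(100):                    # enough epochs for this toy set
    for xi, yi in zip(X, y):
        if yi * (w @ xi + w0) <= 0:     # sample misclassified: update
            w += lr * yi * xi
            w0 += lr * yi

print("w =", w, " w0 =", w0)
print("predictions:", np.sign(X @ w + w0))   # matches y once separated
```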

Page 21: Pattern Recognition Linear Classifier by Zaheer Ahmad

Linear Discriminant Function

• A discriminant function that is a linear combination of the components of x is called a linear discriminant function and can be written as

  g(x) = wᵀx + w0

  where w is the weight vector and w0 is the bias (or threshold weight).

Page 22: Pattern Recognition Linear Classifier by Zaheer Ahmad
Page 23: Pattern Recognition Linear Classifier by Zaheer Ahmad

Linear Classifiers

• A linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane).
• It is one of the simplest classifiers we can imagine: “separate the two classes using a straight line in feature space.”
• In 2 dimensions, the decision boundary is a straight line.

Page 24: Pattern Recognition Linear Classifier by Zaheer Ahmad

2-Class Data with a Linear Decision Boundary

[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2); a linear decision boundary separates Decision Region 1 from Decision Region 2.]

Page 25: Pattern Recognition Linear Classifier by Zaheer Ahmad

Data that is Not “Linearly Separable”

[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2) that is not linearly separable; the decision boundary between Decision Region 1 and Decision Region 2 cannot be a straight line.]

Page 26: Pattern Recognition Linear Classifier by Zaheer Ahmad

Fisher’s linear discriminant

• A simple linear discriminant function is a projection of the data down to 1-D.
  – So choose the projection that gives the best separation of the classes.
• An obvious direction to choose is the direction of the line joining the class means.
  – But if the main direction of variance in each class is not orthogonal to this line, this will not give good separation (see the next figure).
• Fisher’s method chooses the direction that maximizes the ratio of between-class variance to within-class variance.
  – This is the direction in which the projected points contain the most information about class membership (under Gaussian assumptions).

Page 27: Pattern Recognition Linear Classifier by Zaheer Ahmad

• Classes well separated in D-dimensional space may strongly overlap in one dimension.
  – Adjust the components of the weight vector w.
  – Select the projection to maximize class separation.

• Can be generalized to multiple classes.

Page 28: Pattern Recognition Linear Classifier by Zaheer Ahmad

A picture showing the advantage of Fisher’s linear discriminant.

When projected onto the line joining the class means, the classes are not well separated.

Fisher chooses a direction that makes the projected classes much tighter, even though their projected means are less far apart.

Page 29: Pattern Recognition Linear Classifier by Zaheer Ahmad

Math of Fisher’s linear discriminants

• What linear transformation is best for discrimination?
• Projecting the input x onto a direction w gives a scalar value:

  y = wᵀx

• The projection onto the vector separating the class means seems sensible:

  w ∝ m2 − m1

• But we also want small variance within each class (yn is the projection of sample xn; m1, m2 below denote the projected class means):

  s1² = Σ_{n∈C1} (yn − m1)²
  s2² = Σ_{n∈C2} (yn − m2)²

• Fisher’s objective function is the ratio of between-class separation to within-class scatter:

  J(w) = (m2 − m1)² / (s1² + s2²)

Page 30: Pattern Recognition Linear Classifier by Zaheer Ahmad

More math of Fisher’s linear discriminants

• The objective can be rewritten in terms of scatter matrices:

  J(w) = (m2 − m1)² / (s1² + s2²) = (wᵀ S_B w) / (wᵀ S_W w)

• Between-class scatter matrix:

  S_B = (m2 − m1)(m2 − m1)ᵀ

• Within-class scatter matrix:

  S_W = Σ_{n∈C1} (xn − m1)(xn − m1)ᵀ + Σ_{n∈C2} (xn − m2)(xn − m2)ᵀ

• Optimal solution:

  w ∝ S_W⁻¹ (m2 − m1)
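A minimal NumPy sketch of the closed-form solution above, on hypothetical toy data: it builds S_W, computes w ∝ S_W⁻¹(m2 − m1), and projects both classes onto the resulting direction:

```python
import numpy as np

# Hypothetical toy samples for two classes.
X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])
X2 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)                # class means
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter
w = np.linalg.solve(S_W, m2 - m1)                        # w ∝ S_W^{-1}(m2 - m1)

y1, y2 = X1 @ w, X2 @ w                                  # 1-D projections y = w·x
print("projected class means:", y1.mean(), y2.mean())    # well separated in 1-D
```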

Page 31: Pattern Recognition Linear Classifier by Zaheer Ahmad

Support Vector Machines (SVMs)

• A support vector machine (SVM) is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis.

• An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.

• A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class.

• The larger the margin, the lower the generalization error of the classifier.
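A hedged sketch using scikit-learn’s linear SVM on hypothetical toy data; the fitted model exposes the maximum-margin hyperplane (w, w0) and the support vectors that determine it:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data.
X = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 5.0], [7.0, 7.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)   # large C approximates a hard margin
print("w  =", clf.coef_[0])                   # normal vector of the hyperplane
print("w0 =", clf.intercept_[0])              # bias term
print("support vectors:\n", clf.support_vectors_)
```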

Page 32: Pattern Recognition Linear Classifier by Zaheer Ahmad

A Separating Hyperplane

[Figure: samples labelled yᵢ = +1 and yᵢ = −1 in the (x1, x2) plane, separated by the hyperplane w·x + b = 0.]

But there are many possibilities for such hyperplanes!

Page 33: Pattern Recognition Linear Classifier by Zaheer Ahmad

Separating Hyperplanes

[Figure: the same labelled samples (yᵢ = ±1) with several candidate separating hyperplanes drawn.]

Yes, there are many possible separating hyperplanes. It could be this one, or this, or this, or maybe...! Which one should we choose?

Page 34: Pattern Recognition Linear Classifier by Zaheer Ahmad

Choosing a separating hyperplane

[Figure: a candidate hyperplane, the training samples xᵢ, and a new point x′ close to them.]

• The hyperplane should be as far as possible from any sample point.
• This way, new data that is close to the old samples will be classified correctly: good generalization!

Page 35: Pattern Recognition Linear Classifier by Zaheer Ahmad

Choosing a separating hyperplane. The SVM approach: linearly separable case

• The SVM idea is to maximize the distance between the hyperplane and the closest sample point.

• For the optimal hyperplane, the distance to the closest negative point equals the distance to the closest positive point.

Page 36: Pattern Recognition Linear Classifier by Zaheer Ahmad

Choosing a separating hyperplane. The SVM approach: linearly separable case

[Figure: the maximum-margin hyperplane; the closest samples xᵢ lie at distance d on either side, so the margin spans 2d. These closest samples are the support vectors.]

• Support vectors are the samples closest to the separating hyperplane.
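With the conventional scaling in which w·xᵢ + b = ±1 at the support vectors, each support vector lies at distance d = 1/‖w‖ from the hyperplane, so the margin is 2d = 2/‖w‖. Maximizing the margin is therefore equivalent to minimizing ‖w‖² subject to yᵢ(w·xᵢ + b) ≥ 1 for every training sample.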

Page 37: Pattern Recognition Linear Classifier by Zaheer Ahmad

Thank You