Pattern Classification and Evaluation

Contents: Introduction. Pattern Recognition. Pattern Classification. Evaluating a pattern recognition system.
What is Pattern Recognition? The study of how machines can observe the
environment, learn to distinguish patterns of interest from their
background, and make sound and reasonable decisions about the
categories of the patterns.
What is a pattern? What kinds of categories do we have?
What is a Pattern? As opposed to chaos, it is an entity, vaguely defined, that could be given a name.
For example, a pattern could be: a fingerprint image, a handwritten cursive word, a human face, a speech signal.
Pattern Recognition Models The four best known approaches
template matching statistical classification syntactic or structural matching neural networks
Pattern Representation A pattern is represented by a set of d features, or
attributes, viewed as a d-dimensional feature vector.
x = (x_1, x_2, ..., x_d)^T
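As a small, hypothetical illustration of such a feature vector, a tiny grayscale patch can be flattened into d features (the patch values below are made up):

```python
# A pattern represented as a d-dimensional feature vector x = (x1, ..., xd)^T.
# Hypothetical example: a 2x2 grayscale patch flattened into d = 4 features.
patch = [[12, 200],
         [34, 90]]

x = [value for row in patch for value in row]  # flatten row by row
d = len(x)
print(d, x)  # -> 4 [12, 200, 34, 90]
```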
Two Modes of a Pattern Recognition System
Classification mode: test pattern -> Preprocessing -> Feature Measurement -> Classification.
Training mode: training pattern -> Preprocessing -> Feature Extraction/Selection -> Learning.
Pattern Classification Pattern classification involves taking features
extracted from the image and using them to classify image objects automatically.
This is done by developing classification algorithms that use the feature information.
The primary uses of pattern classification are in the development of computer vision and image compression applications.
Pattern Classification Pattern classification is typically the final step
in the development of a computer vision algorithm.
In computer vision applications, the goal is to identify objects in order for the computer to perform some vision-related task.
These tasks range from computer diagnosis of medical images to object classification for robot control.
Pattern Classification In image compression, we want to remove
redundant information from the image and compress the important information as much as possible.
One way to compress information is to find a higher-level representation of it, which is exactly what feature analysis and pattern classification are all about.
Pattern Classification To develop a classification algorithm, we need
to divide our data into two sets. Training set: used to develop the classification scheme. Test set: used to test the classification algorithm.
Both the training and the test sets should represent the images that will be seen in the application domain.
Pattern Classification Theoretically, a larger training set would
give an increasingly higher success rate. However, since we normally have a finite
number of data (images), they are often divided equally between the two sets.
After the data have been divided, work can begin on the development of the classification algorithm.
Figure 6.4.1
Pattern Classification The general approach is to use the
information in the training set to classify the “unknown” samples in test set.
It is assumed that all samples available have a known classification.
The success rate is measured by the number of correct classifications.
Pattern Classification The simplest method for identifying a sample
from the test set is called the nearest neighbor method.
The object of interest is compared to every sample in the training set using a distance measure, a similarity measure, or a combination of measures.
Pattern Classification The “unknown” object is then identified as
belonging to the same class as the closest sample in the training set.
If a distance measure is used, the closest sample is indicated by the smallest number; if a similarity measure is used, it is indicated by the largest number.
This process is computationally intensive and not robust.
Pattern Classification We can make the nearest neighbor method
more robust by selecting not just the vector it is closest to, but a group of close feature vectors.
This method is known as the K-nearest neighbor method. K can be assigned any positive integer.
Pattern Classification Then we assign the unknown feature vector to
the class that occurs most often in the set of K-neighbors.
This is still very computationally expensive since we must compare each unknown sample to every sample in the training set.
Even worse, we normally want the training set to be as large as possible.
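The K-nearest neighbor method described above can be sketched as follows, assuming 2-D feature vectors, a Euclidean distance measure, and an illustrative, made-up training set:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance measure: a smaller number means a closer match.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(unknown, training_set, k=3):
    """Assign `unknown` to the class occurring most often
    among its k nearest training samples."""
    # The unknown sample is compared to EVERY training sample,
    # which is why the method is computationally expensive.
    neighbors = sorted(training_set, key=lambda s: euclidean(unknown, s[0]))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training set: two classes with 2-D feature vectors.
train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
         ((5.0, 5.0), "B"), ((5.1, 4.8), "B")]
print(knn_classify((1.1, 1.0), train, k=3))  # -> A
```

With k = 1 this reduces to the plain nearest neighbor method.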
Pattern Classification One way to reduce the amount of
computation is by using a method called nearest centroid.
Here, we find a centroid vector that represents each class as a whole.
The centroids are calculated by finding the average value of each vector component over the class's training samples.
Pattern Classification The unknown sample then only needs to be
compared with each class's representative centroid, which reduces the number of comparisons and consequently the amount of calculation.
Template matching is a pattern classification method that uses the raw image data as a feature vector.
Pattern Classification A template is devised, possibly via a training
set, and is then compared to subimages using a distance or similarity measure.
Typically, a threshold is set on this measure to determine when we have found a match.
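Template matching on raw pixel data can be sketched like this, using the sum of squared differences as the distance measure; the template, patch, and threshold values are illustrative:

```python
def ssd(patch, template):
    # Distance measure on raw image data: sum of squared differences.
    return sum((p - t) ** 2
               for row_p, row_t in zip(patch, template)
               for p, t in zip(row_p, row_t))

def matches(patch, template, threshold):
    # A match is declared when the distance falls below the threshold.
    return ssd(patch, template) < threshold

# Hypothetical 2x2 template and a slightly noisy subimage.
template = [[0, 255], [255, 0]]
noisy = [[2, 250], [253, 1]]
print(ssd(noisy, template))                 # -> 34
print(matches(noisy, template, threshold=100))  # -> True
```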
More sophisticated methods using fuzzy logic, artificial neural networks, and probability density models are also commonly used.
Evaluating a pattern recognition system: recognition rate, cross-validation, interpretation of the results.
Recognition rate
In your system specifications you need success criteria for your project (product): HW, SW, real-time behavior, recognition rate, …
Recognition rate = (number of correctly classified samples) / (number of tested samples). Multiply by 100% and you have it as a percentage.
How do you test a system? How do you present and interpret the
results?
Test
The training data contains variations. Are these variations similar to the variations in “real life” data?
The system will never be better than its training data!
The right question to ask is how well the trained system generalizes; that is, how well does the system recognize UNKNOWN data? NEVER TEST ON TRAINING DATA!!! (Testing on the training data does, however, provide an upper limit for the recognition rate.)
Test methods:
Cross-validation M-fold cross validation
Methods for test
Cross-validation: train on a percentage of the samples and test on the rest. The training share is typically 90%, depending on the number of samples and the complexity of the system.
M-fold cross-validation: divide (randomly) all samples into M equally sized groups; use M-1 groups to train the system and test on the remaining group; do this M times and average the results.
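The M-fold procedure above can be sketched as follows. The `evaluate` callable stands in for training on M-1 groups and testing on the remaining one; the dummy used here simply reports the test-fold fraction as a placeholder "recognition rate":

```python
import random

def m_fold_cross_validation(samples, m, evaluate, seed=0):
    """Divide samples randomly into m equal groups, train on m-1,
    test on the rest, repeat m times, and average the results."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)          # divide randomly
    folds = [idx[i::m] for i in range(m)]     # m (nearly) equal groups
    rates = []
    for i in range(m):
        test = [samples[j] for j in folds[i]]
        train = [samples[j] for k in range(m) if k != i for j in folds[k]]
        rates.append(evaluate(train, test))   # train on m-1 groups, test on 1
    return sum(rates) / m                     # average the m results

samples = list(range(10))
avg = m_fold_cross_validation(samples, 5,
                              lambda tr, te: len(te) / len(samples))
print(avg)  # -> 0.2 (each test fold holds 1/5 of the samples)
```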
Training of the system Before we test, we need to train our system. How much should the system be trained?
How much should the different parameters be tuned? Danger of overfitting!
Interpretation of the results Recognition rate =
(number of correctly classified samples) / (number of tested samples). Multiply by 100% and you have it as a percentage.
Error % = 100% - (recognition rate x 100%). Distribution of errors? Confusion matrix.
Confusion matrix: 3 classes, 25 samples per class. Rows: input (the truth); columns: output (from the system).

        P1   P2   P3
  P1    18    5    1
  P2     0   24    1
  P3     1    4   19
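From this matrix the recognition rate and error rate follow directly, assuming 25 tested samples per class as stated on the slide:

```python
# Confusion matrix from the slide: rows = input (truth), columns = output.
conf = [[18, 5, 1],
        [0, 24, 1],
        [1, 4, 19]]

total = 3 * 25                               # 75 tested samples
trace = sum(conf[i][i] for i in range(3))    # correct classifications = 61
recognition_rate = trace / total             # ~0.813
error_rate = 1 - recognition_rate            # ~0.187

print(trace, round(recognition_rate * 100, 1))  # -> 61 81.3
```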
Confusion matrix
Provides insight into: Are the errors equally distributed, or are they associated with only a few classes? “Solutions”: sub-divide the classes, delete some classes, use other features, post-processing, … Is one class too big (“eats” many others)? Which classes are close? Etc.
Confusion matrix – overview
Number of errors = incorrectly recognized + not recognized.

         P1   P2   P3 | Number of samples | Recognized | Number of errors | Incorrectly recognized | Not recognized (reject class)
  P1      a    b    c |         l         |    a/l     |      l - a       |         b + c          |        l - (a+b+c)
  P2      d    e    f |         m         |    e/m     |      m - e       |         d + f          |        m - (d+e+f)
  P3      g    h    i |         n         |    i/n     |      n - i       |         g + h          |        n - (g+h+i)
  Total               |    S = l+m+n      |    T/S     |      S - T       |

Row sums: Σ1 = a+b+c, Σ2 = d+e+f, Σ3 = g+h+i. T = trace = a+e+i (the matrix diagonal: the successes).
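The decomposition of each class's errors into "incorrectly recognized" and "not recognized" can be checked numerically. The matrix below is hypothetical and includes a reject option, so row sums may fall short of the class sizes l, m, n:

```python
# Hypothetical confusion matrix WITH a reject option: rows = truth.
conf = [[18, 3, 1],   # P1: 3 of its 25 samples were rejected
        [0, 24, 1],   # P2: 0 rejected
        [1, 4, 19]]   # P3: 1 rejected
class_sizes = [25, 25, 25]               # l, m, n
S = sum(class_sizes)                     # S = l + m + n = 75
T = sum(conf[i][i] for i in range(3))    # trace = successes = 61

for i, (row, size) in enumerate(zip(conf, class_sizes)):
    incorrect = sum(row) - row[i]        # e.g. b + c for P1
    rejected = size - sum(row)           # e.g. l - (a + b + c) for P1
    errors = size - row[i]               # l - a
    # Number of errors = incorrectly recognized + not recognized:
    assert errors == incorrect + rejected
    print(f"P{i+1}: errors={errors} incorrect={incorrect} rejected={rejected}")

print(T / S)  # overall recognition rate T/S
```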
How does the system respond to a random input? (2x2 table: input “Yes”/“No” versus output “Yes”/“No”.)
General Representation of errors
Number of errors = incorrectly recognized + not recognized. The total number of errors can be represented like this (rows: input, the truth; columns: output, from the system):

  Input “Yes”, output “Yes”: recognized.
  Input “Yes”, output “No”: not recognized (Type II error; false negative = FN; false reject = FR; false reject rate = FRR; miss).
  Input “No”, output “Yes”: incorrectly recognized (Type I error; false positive = FP; false accept = FA; false accept rate = FAR; ghost object; false alarm).
Design your system wrt errors
One parameter (a threshold?) often controls FN and FP. Use this parameter when designing the system with respect to errors.
Bayes classifier: given an input, find the nearest class using the Mahalanobis distance r (blackboard). Are we 100% sure the input originates from a known class? Noise can result in unreliable data. Solution: besides the nearest class, we also introduce a threshold on r; that is, accept only if r < TH, otherwise ignore this sample (not recognized: FN).
(Blackboard: curves as a function of r.)
                 Output “Yes”                   Output “No”
  Input “Yes”    Recognized                     Not recognized (FN)
  Input “No”     Incorrectly recognized (FP)
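A sketch of classification with such a reject threshold, using per-class centroids and a plain Euclidean distance as a stand-in for the Mahalanobis distance r (centroids and TH below are made up):

```python
import math

def dist(a, b):
    # Stand-in for the Mahalanobis distance r used on the slide.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_with_reject(unknown, centroids, th):
    """Return the nearest class if r < th, otherwise reject (None)."""
    label = min(centroids, key=lambda c: dist(unknown, centroids[c]))
    r = dist(unknown, centroids[label])
    # Accept only if the nearest class is close enough; otherwise the
    # sample is not recognized (an FN if it truly belonged to a class).
    return label if r < th else None

cents = {"A": (0.0, 0.0), "B": (4.0, 4.0)}
print(classify_with_reject((0.2, 0.1), cents, th=1.0))  # -> A
print(classify_with_reject((2.0, 2.0), cents, th=1.0))  # -> None (rejected)
```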
Design your system wrt errors Choice of TH: DEPENDS ON THE APPLICATION! Default: the EER or the overall minimum error. If we want a low FP (few incorrectly recognized samples), this results in more FN (not recognized samples), and you therefore need to post-process these samples:
“Re-try” (as with conveyor belts), a “new” pattern recognizer with “new” features, or…
General post-processing: store the likelihoods for each classified sample and use them in the following processing.
General Representation of errors. Example: SETI.
Find intelligent signals in the input data. FN versus FP: are they equally important?
Here a false positive (a false alarm that can be checked afterwards) is acceptable, but a false negative (missing a real signal) is not.
General Representation of errors. Example: access control to nuclear weapons.
Is the person trying to enter OK? FN versus FP: are they equally important?
Here a false reject (turning away an authorized person) is acceptable, but a false accept (admitting an unauthorized person) is not.
Design your system wrt errors
When we also have true negatives (correct rejections), e.g. a system which assesses whether a person is sick or well, an alternative representation is the ROC curve. ROC = Receiver Operating Characteristic (blackboard).
X-axis: fraction of the well who were found to be sick. False Positive Rate (FPR) = FP / (FP + TN).
Y-axis: fraction of the sick who were found to be sick. True Positive Rate (TPR) = TP / (TP + FN).
ROC curves are good when comparing different systems: only one curve, with normalized axes [0, 1].
The closer a curve comes to the top-left corner (TPR = 1, FPR = 0), the better the system.
                     Sick (system output)           Well (system output)
  Sick (the truth)   Correct (TP)                   Not recognized (FN)
  Well (the truth)   Incorrectly recognized (FP)    Correct rejection (TN)
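Sweeping a decision threshold over such a system traces out the ROC curve. A sketch, assuming each sample is a (score, is_sick) pair where a higher score means "sick" (the data below are made up):

```python
def roc_point(samples, th):
    """FPR and TPR of a threshold classifier deciding score >= th."""
    tp = sum(1 for s, sick in samples if s >= th and sick)
    fn = sum(1 for s, sick in samples if s < th and sick)
    fp = sum(1 for s, sick in samples if s >= th and not sick)
    tn = sum(1 for s, sick in samples if s < th and not sick)
    tpr = tp / (tp + fn)   # fraction of the sick found to be sick
    fpr = fp / (fp + tn)   # fraction of the well found to be sick
    return fpr, tpr

data = [(0.9, True), (0.8, True), (0.7, False), (0.3, True), (0.2, False)]
# One (FPR, TPR) point per threshold; together they form the ROC curve.
curve = [roc_point(data, th) for th in (0.1, 0.5, 1.0)]
print(curve)
```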
What to remember (2/2)
Danger of overfitting!
Interpretation of results: confusion matrix. Error = incorrectly recognized (FP) + not recognized (FN). FP and FN depend on a threshold value. How is this threshold value defined in your system? FP and FN can be illustrated directly or as a ROC curve. Test on data from unknown classes.
Relevant for your project?
What to remember (1/2)
Success criteria for your project: HW, SW, real-time behavior, recognition rate.
Test: you perform tests to see how well the trained system generalizes. NEVER TEST ON TRAINING DATA!!!
Cross-validation: train on a percentage of the samples and test on the rest. What should that percentage be?
M-fold cross-validation: divide (randomly) all samples into M equally sized groups; use M-1 groups to train the system and test on the remaining group; do this M times and average the results.