Introduction to Statistical Pattern Recognition
Part II
Outline
• Bayes Detection Rule Revisited
• Probability of Error
• Evaluating the Classifier
• MATLAB illustrations
Bayes Decision Rule
• Two-class case
Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2.
• N-class case
Given a feature vector x, assign it to class ωj if:
P(ωj|x) > P(ωi|x) for all i ≠ j
Expanding P(ωj|x) and P(ωi|x) using Bayes' theorem, P(ω|x) = p(x|ω)P(ω)/p(x), and cancelling the common evidence term p(x) leads to the equivalent form on the next slide.
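In code, the posterior-based N-class rule is simply an argmax over the posteriors. A minimal MATLAB sketch (the posterior values here are illustrative, not from a real classifier):
% Posteriors P(w_i | x) for one observation x (illustrative values)
post = [0.2 0.5 0.3];
% Assign x to the class with the largest posterior
[~, j] = max(post);   % here j = 2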
Bayes Decision Rule
• N-class case
Given a feature vector x, assign it to class ωj if:
p(x|ωj)P(ωj) > p(x|ωi)P(ωi) for all i ≠ j
• Likelihood Ratio: 2-class case
Decide ω1 if
Λ(x) = p(x|ω1)/p(x|ω2) > P(ω2)/P(ω1)
where the left-hand side Λ(x) is the likelihood ratio and the right-hand side is the threshold.
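A minimal MATLAB sketch of the likelihood-ratio test, using the univariate Gaussian class-conditionals and priors from the worked example later in these slides (assumes the Statistics Toolbox for normpdf):
% Class-conditionals: p(x|w1) = N(-1,1), p(x|w2) = N(1,1)
% Priors: P(w1) = 0.6, P(w2) = 0.4
x = 0.2;                                % observation to classify
lr = normpdf(x,-1,1) / normpdf(x,1,1);  % likelihood ratio
thresh = 0.4/0.6;                       % threshold P(w2)/P(w1)
if lr > thresh
    disp('decide w1')
else
    disp('decide w2')
end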
Bayes Decision Rule: Probability of Error (N-class)
• An error is made when we classify an observation as class ωi when it really belongs to the j-th class. Denoting the decision region for class ωi as Ωi and its complement as Ωi^c, the probability of error is
P(error) = Σ_{i=1}^{N} ∫_{Ωi^c} p(x|ωi) P(ωi) dx
Bayes Decision Rule: Probability of Error (2-class)
For two classes the error splits into two terms, corresponding to regions I and II in the figures below:
P(error) = ∫_{Ω1} p(x|ω2)P(ω2) dx + ∫_{Ω2} p(x|ω1)P(ω1) dx
• We can set the amount of error we will tolerate for misclassifying one of the classes
Case I: Fish Sorting Example (Salmon vs. Sea Bass)
[Figure: posteriors P(ω1|x) (Salmon) and P(ω2|x) (Sea Bass) versus feature x, with decision boundary x* separating error regions I and II.]
Salmon: $20/lb   Sea Bass: $10/lb
To satisfy customers, which error should be minimized: Error I or Error II?
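The two error regions can be evaluated in closed form with normcdf. A minimal sketch, assuming the Gaussian class-conditionals and priors of the later worked example, with class ω1 on the left and decision region Ω1 = {x < x*} (all values illustrative):
p1 = 0.6;  p2 = 0.4;                      % priors P(w1), P(w2)
xstar = -0.15;                            % decision boundary
% Region I: classified as w1 (x < xstar) but really from w2
errI = p2 * normcdf(xstar, 1, 1);
% Region II: classified as w2 (x > xstar) but really from w1
errII = p1 * (1 - normcdf(xstar, -1, 1));
perr = errI + errII                       % total probability of error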
Bayes Decision Rule: Probability of Error (2-class)
Case II: Cancerous vs. Healthy Tissue
[Figure: posteriors P(ω1|x) (Healthy) and P(ω2|x) (Cancerous) versus feature x, with decision boundary x* separating error regions I and II.]
Taking into account the patient's well-being, which error should be minimized: Error I or Error II?
Bayes Decision Rule: Probability of Error (2-class)
[Figure: posteriors P(ω1|x) (target class) and P(ω2|x) (non-target class) versus feature x, with decision boundary x* and false-alarm region I.]
Region I shows the probability of false alarm, i.e., the probability of wrongly classifying an observation as the target (class ω1) when it really belongs to class ω2.
Example
We will look at a univariate classification problem with two classes. The class-conditionals are given by the normal distributions as follows:
p(x|ω1) = N(-1, 1)   p(x|ω2) = N(1, 1)
The priors are
P(ω1) = 0.6   P(ω2) = 0.4
Adjust the decision boundary to achieve a desired probability of false alarm, P(FA) = 0.05, e.g., (a) the probability that cancerous tissue is classified as healthy, or (b) the probability that sea bass is classified as salmon.
Example
We need to find the value of x* such that
P(FA) = ∫_{-∞}^{x*} P(ω2) p(x|ω2) dx = 0.05
Dividing through by P(ω2) = 0.4 shows that x* is a quantile of the N(1, 1) distribution, i.e.,
P(x ≤ x* | ω2) = 0.05/0.4 = 0.125
In MATLAB:
xstar = norminv(0.05/0.4, 1, 1);
which gives x* ≈ -0.15.
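As a sanity check (a minimal sketch assuming the Statistics Toolbox), the false-alarm integral can be evaluated in closed form to confirm the threshold:
% P(FA) = P(w2) * P(x <= xstar | w2); should return 0.0500
pfa = 0.4 * normcdf(xstar, 1, 1)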
Evaluating the Classifier
• We need to evaluate the classifier's usefulness by measuring the percentage of observations it correctly classifies
• It is also important to report the probability of false alarms
Evaluating the Classifier
Independent Test Sample
• If the collected sample is large, divide it into a training set and a testing set
• Training set – used to build the classifier
• Testing set – observations in the test set are classified using our classification rule
• Estimated classification rate – the proportion of correctly classified test observations
• A common mistake that novice researchers make is to build a classifier using their sample and then use the same sample for testing
Evaluating the Classifier: Independent Test Sample
Database
• Iris flower data set – introduced by Sir Ronald Aylmer Fisher (1936)
• The dataset consists of 50 samples from each of three species of Iris flowers
• Four features were measured from each sample, i.e., the length and width of the sepal and petal, in centimeters
[Images: Iris setosa, Iris versicolor, Iris virginica]
Evaluating the Classifier: Independent Test Sample
Probability of Correct Classification – Independent Test Sample (Formal Procedure)
• Randomly separate 𝑛 samples into two sets of size 𝑛𝑡𝑟𝑎𝑖𝑛 and 𝑛𝑡𝑒𝑠𝑡, where 𝑛𝑡𝑟𝑎𝑖𝑛 + 𝑛𝑡𝑒𝑠𝑡 = 𝑛
• Build the classifier (e.g., Bayes Decision Rule) using the training set
• Present each pattern from the test set to the classifier and obtain a class label for it. Since we know the correct class label for these observations beforehand, we can count the number of patterns (𝑁𝑐𝑐) correctly classified
• Probability of correct classification is P(CC) = Ncc / ntest
Evaluating the Classifier: Independent Test Sample
MATLAB illustration (consider only the two species that are hard to separate, i.e., Iris versicolor and Iris virginica)
• Randomly separate n samples into two sets of size ntrain and ntest, where ntrain + ntest = n
% Load data
load iris
% Get data for the training and testing sets.
% Use only the first two features.
% (For simplicity the split here is deterministic: odd-numbered rows
% for training, even-numbered rows for testing, rather than random.)
indtrain = 1:2:50;
indtest = 2:2:50;
versitrain = versicolor(indtrain,1:2);
versitest = versicolor(indtest,1:2);
virgitrain = virginica(indtrain,1:2);
virgitest = virginica(indtest,1:2);
Evaluating the Classifier: Independent Test Sample
• Build the classifier (e.g., Bayes Decision Rule) using the training set; assume a multivariate normal model for these data
% Estimate the mean and covariance of each class from the training set
muver = mean(versitrain);
covver = cov(versitrain);
muvir = mean(virgitrain);
covvir = cov(virgitrain);
Evaluating the Classifier: Independent Test Sample
• Present each pattern from the test set to the classifier and obtain a class label for it. Since we know the correct class label for these observations beforehand, we can count the number of patterns (Ncc) correctly classified
• Use equal priors
% Put all of the test data into one matrix;
% rows 1-25 are versicolor, rows 26-50 are virginica.
X = [versitest; virgitest];
% Class-conditional densities p(x | versicolor) at the test points.
pxgver = csevalnorm(X, muver, covver);
% Class-conditional densities p(x | virginica) at the test points.
pxgvir = csevalnorm(X, muvir, covvir);
% With equal priors, the Bayes rule reduces to comparing the
% class-conditional densities. Count the correctly classified patterns.
ind = find(pxgver(1:25) > pxgvir(1:25));
ncc = length(ind);
ind = find(pxgvir(26:50) > pxgver(26:50));
ncc = ncc + length(ind);
pcc = ncc/50
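Note: csevalnorm comes from the Computational Statistics Toolbox that accompanies Martinez and Martinez. If that toolbox is not on the path, the Statistics Toolbox function mvnpdf evaluates the same multivariate normal density (an assumed equivalent, not the slides' original code):
% Equivalent density evaluation with the Statistics Toolbox
pxgver = mvnpdf(X, muver, covver);
pxgvir = mvnpdf(X, muvir, covvir);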
Evaluating the Classifier
Cross-validation
• Systematically partition the data into training and testing sets
• n − k observations are used to build the classifier, and the remaining k patterns are used to test it
Evaluating the Classifier: Cross-validation
Cross-validation (Formal Procedure) at k = 1 (also known as the leave-one-out method)
1. Set the number of correctly classified to 0, i.e., NCC = 0
2. Keep out one observation; call it xi
3. Build the classifier using the remaining n − 1 observations
4. Present the observation xi to the classifier and obtain a class label using the classifier from the previous step
5. If the class label is correct, increment NCC, i.e., NCC = NCC + 1
6. Repeat steps 2-5 for each pattern in the sample
• Probability of correct classification is P(CC) = NCC / n
Evaluating the Classifier: Cross-validation
MATLAB Illustration
• Use iris data and estimate the probability of correct classification
• Use cross-validation with k = 1
• Use versicolor and virginica only
• Equal priors
• Use first two features only
• Build the classifier (e.g., Bayes Decision Rule) using the training set; assume a multivariate normal model for these data
Evaluating the Classifier: Cross-validation
% Load data
load iris
% Initialize the count of correctly classified patterns
ncc = 0;
% Use only the first two (sepal) features
virginica(:,3:4) = [];
versicolor(:,3:4) = [];
% Sample sizes
[nver,d] = size(versicolor);
[nvir,d] = size(virginica);
n = nvir + nver;
Evaluating the Classifier: Cross-validation
% Loop first through all of the patterns corresponding to versicolor.
% The virginica parameters do not change during this loop.
muvir = mean(virginica);
covvir = cov(virginica);
for i = 1:nver
    % Take out the i-th observation as the testing point.
    versitrain = versicolor;
    x = versitrain(i,:);
    % Delete it from the training set;
    % the remaining n-1 observations form the training set.
    versitrain(i,:) = [];
    muver = mean(versitrain);
    covver = cov(versitrain);
    pxgver = csevalnorm(x,muver,covver);
    pxgvir = csevalnorm(x,muvir,covvir);
    if pxgver > pxgvir
        % then we correctly classified it
        ncc = ncc + 1;
    end
end
Evaluating the Classifier: Cross-validation
% Loop through all of the patterns of virginica.
% The versicolor parameters do not change during this loop.
muver = mean(versicolor);
covver = cov(versicolor);
for i = 1:nvir
    % Take out the i-th observation as the testing point.
    virtrain = virginica;
    x = virtrain(i,:);
    % Delete the test point from the training set.
    virtrain(i,:) = [];
    muvir = mean(virtrain);
    covvir = cov(virtrain);
    pxgver = csevalnorm(x,muver,covver);
    pxgvir = csevalnorm(x,muvir,covvir);
    if pxgvir > pxgver
        % then we correctly classified it
        ncc = ncc + 1;
    end
end
pcc = ncc/n
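For k > 1 (as in Homework #2) the same pattern applies: hold out k observations at a time, train on the remaining n − k, and classify the held-out patterns. A minimal sketch of the partition loop (a hedged skeleton only; the parameter estimation and counting steps are as above):
k = 2;
for i = 1:k:n
    % Indices of the k held-out test patterns
    testidx = i:min(i+k-1, n);
    % The remaining n-k observations form the training set
    trainidx = setdiff(1:n, testidx);
    % ... estimate means/covariances from the trainidx rows,
    % classify the testidx rows, and update ncc as before ...
end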
Homework #2
(A)
• Use iris data and estimate the probability of correct classification
• Use cross-validation with k = 2
• Use versicolor and virginica only
• Equal priors
• Use first two features only
• Build the classifier (e.g., Bayes Decision Rule) using the training set; assume a multivariate normal model for these data
(B)
• Use iris data and estimate the probability of correct classification
• Use cross-validation with k = 2
• Use versicolor and virginica only
• Equal priors
• Use all four features
• Build the classifier (e.g., Bayes Decision Rule) using the training set; assume a multivariate normal model for these data
Future topics
• Receiver Operating Characteristics (ROCs)
• Face Detection in Color Images using Skin Models
References
R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition, John Wiley & Sons, Inc., 2000.
S. Aksoy, CS 551 (Pattern Recognition) Course Website, http://www.cs.bilkent.edu.tr/~saksoy/courses/cs551-Spring2010/index.html
W. Martinez and A. Martinez, Computational Statistics Handbook with MATLAB, 2nd edition, Chapman and Hall/CRC, 2007.