Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Upload: oliver-johnston

Post on 01-Jan-2016

Page 1: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Classification

ECE 847: Digital Image Processing

Stan Birchfield, Clemson University

Page 2: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Acknowledgment

Many slides are courtesy of Frank Dellaert and Jim Rehg at Georgia Tech, from http://www-static.cc.gatech.edu/classes/AY2007/cs4495_fall/html/materials.html

Page 3: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Classification problems

• Detection – Search set, find all instances of class

• Recognition – Given instance, label its identity

• Verification – Given instance and hypothesized identity, verify whether correct

• Tracking – Like detection, but local search and fixed identity

Page 4: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Classification issues

• Feature extraction – needed for practical reasons; the distinction is somewhat arbitrary:
– Perfect feature extraction → classification is trivial
– Perfect classifier → no need for feature extraction

• occlusion (missing features)

• mereology – study of part/whole relationships: POLOPONY, BEATS (not BE EATS)

• segmentation – how can we classify before segmenting? how can we segment before classifying?

• context

• computational complexity: a 20x20 binary input has 2^400 ≈ 10^120 patterns!
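The size of the pattern space for a 20x20 binary input can be checked with a few lines of arithmetic. This is only a sanity check on the order of magnitude; the variable names are illustrative.

```python
# Each of the 20*20 = 400 pixels is either 0 or 1, so there are 2**400 images.
num_pixels = 20 * 20
num_patterns = 2 ** num_pixels

# The number of decimal digits minus one gives the power of ten.
order_of_magnitude = len(str(num_patterns)) - 1
print(order_of_magnitude)  # 120
```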

Page 5: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Mereology example

What does this say?

Page 6: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Decision theory

• Decision theory – goal is to make a decision (i.e., set a decision boundary) so as to minimize cost

• Pattern classification is perhaps the most important subfield of decision theory

• Supervised learning: features, data sets, algorithm

[Figure: decision boundary]

Page 7: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Overfitting

[Figure: decision boundary]

Could separate perfectly using nearest neighbors, but with poor generalization (overfitting): it will not work well on new data.

Occam’s razor – The simplest explanation is the best. (Philosophical principle based upon the orderliness of the creation.)

Page 8: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Bayes decision theory

[Figure: class-conditional pdfs of the two classes. Easy to measure with enough examples.]

Problem: Given a feature x, determine the most likely class: 1 or 2

Page 9: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Bayes’ rule

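$$P(j \mid x) = \frac{p(x \mid j)\,P(j)}{p(x)}$$

• posterior: P(j|x)

• likelihood (class-conditional pdf): p(x|j)

• prior: P(j)

• evidence (normalization factor): p(x) = p(x|1) P(1) + p(x|2) P(2)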

Page 10: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

What is this P(1|x)?

• Probability of class 1 given data x

[Figure: P(1|x) and P(2|x) plotted against x, on a vertical axis from 0.0 to 1.0]

P(1|x) + P(2|x) = 1 for all x

Note: Area under each curve is not 1

Page 11: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Bayes Classifier

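• Classifier: Select the class with the largest posterior, i.e., class 1 if P(1|x) > P(2|x) and class 2 otherwise

• Decision boundaries occur where P(1|x) = P(2|x)

[Figure: P(1|x) and P(2|x) plotted against x, on a vertical axis from 0.0 to 1.0, with regions labeled select 2, select 1, select 2]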
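A minimal sketch of a two-class Bayes classifier follows. It assumes Gaussian class-conditional pdfs with made-up priors, means, and standard deviations (the `MODEL` table is hypothetical, not from the slides); any other density estimate could be substituted for `gaussian_pdf`.

```python
import math

def gaussian_pdf(x, mean, std):
    """Class-conditional density p(x | class); Gaussian is an assumption here."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical model: class -> (prior, mean, std).
MODEL = {1: (0.5, 0.0, 1.0), 2: (0.5, 2.0, 1.0)}

def posteriors(x, model=MODEL):
    """Bayes' rule: posterior = likelihood * prior / evidence."""
    joint = {c: gaussian_pdf(x, m, s) * p for c, (p, m, s) in model.items()}
    evidence = sum(joint.values())  # normalization factor p(x)
    return {c: j / evidence for c, j in joint.items()}

def bayes_classify(x, model=MODEL):
    """Select the class with the largest posterior."""
    post = posteriors(x, model)
    return max(post, key=post.get)
```

With equal priors and equal variances, as assumed above, the single decision boundary falls midway between the two means (x = 1), where both posteriors equal 0.5.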

Page 12: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Bayes Risk

[Figure: P(1|x) and P(2|x) plotted against x, on a vertical axis from 0.0 to 1.0, with a shaded area]

The shaded area is called the Bayes risk

The total risk is the expected loss when using the classifier:

where

(We’re assuming loss is constant here)

Page 13: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Discriminative vs. Generative

Finding a decision boundary is not the same as modeling a conditional density.

Note: Bug in Forsyth-Ponce book: P(1|x)+P(2|x) != 1

Page 14: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Histograms

• One way to compute class-conditional pdfs is to collect a bunch of examples and store a histogram

• Then normalize
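As a rough illustration of "collect, then normalize", the sketch below builds a one-dimensional histogram estimate of a class-conditional pdf. The function name, the 1-D setting, and the choice to ignore out-of-range samples are assumptions made for the example.

```python
def histogram_pdf(samples, lo, hi, num_bins):
    """Estimate a pdf on [lo, hi) as per-bin densities from example values."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for s in samples:
        if lo <= s < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            counts[min(int((s - lo) / width), num_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_bins
    # Normalize so that the densities times the bin width sum to 1.
    return [c / (total * width) for c in counts]

pdf = histogram_pdf([0.1, 0.2, 0.6, 0.9], lo=0.0, hi=1.0, num_bins=2)
```

One such histogram per class gives the class-conditional pdfs; class priors can be obtained by counting how many examples fall in each class.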

Page 15: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Application: Skin Histograms

• Skin has a very small range of (intensity independent) colours, and little texture
– Compute colour measure, check if colour is in this range, check if there is little texture (median filter)
– See this as a classifier: we can set up the tests by hand, or learn them.
– Get class-conditional densities (histograms) and priors from data (counting)

• Classifier is

Page 16: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Finding skin color

3D histogram in RGB space

M. J. Jones and J. M. Rehg, Statistical Color Models with Application to Skin Detection, Int. J. of Computer Vision, 46(1):81-96, Jan 2002.

Page 17: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Histogram

skin non-skin

Page 18: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Results

Note: We have assumed that all pixels are independent! Context is ignored.

Page 19: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Confusion matrix

true positive = hit

false positive = false alarm = false detection = Type I error

false negative = miss = false dismissal = Type II error

• sensitivity = true positive rate = hit rate = recall: TPR = TP / (TP+FN)

• false negative rate: FNR = FN / (TP+FN)

• false positive rate = false alarm rate = fallout: FPR = FP / (FP+TN)

• specificity: SPC = TN / (FP+TN)

TPR + FNR = 1 and FPR + SPC = 1
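The four rates follow directly from the confusion-matrix counts. The sketch below uses invented counts purely to illustrate the formulas and the two identities.

```python
def rates(tp, fp, fn, tn):
    """Compute the standard rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity, hit rate, recall
        "FNR": fn / (tp + fn),  # miss rate
        "FPR": fp / (fp + tn),  # false alarm rate, fallout
        "SPC": tn / (fp + tn),  # specificity
    }

# Hypothetical counts: 100 actual positives and 100 actual negatives.
r = rates(tp=90, fp=5, fn=10, tn=95)
```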

Page 20: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Receiver operating characteristic (ROC) curve

[Figure: ROC curve, TPR plotted against FPR]

equal error rate (EER) = 88%
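An ROC curve is traced by sweeping the decision threshold over the classifier's scores and recording (FPR, TPR) at each setting. The sketch below assumes that higher scores mean "more likely positive" and that labels are 1 (positive) or 0 (negative); the scores are invented.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, sweeping the threshold from strict to lenient.

    Assumes at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing detected
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

points = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
```

For these perfectly separable scores the curve passes through (0, 1), the ideal operating point.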

confusion matrix for image classifier:

Page 21: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Cross-validation

Page 22: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Naïve Bayes

• Quantize image patches, then compute a histogram of patch types within a face

• But histograms suffer from the curse of dimensionality

• Histogram in N dimensions is intractable with N>5

• To solve this, assume independence among the pixels

• Features are the patch types

P(image|face) = P(label 1 at (x1,y1)|face) ... P(label k at (xk,yk)|face)
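Under the independence assumption the image likelihood is a product of per-location terms, which is usually evaluated as a sum of logs to avoid underflow. The sketch below is illustrative only: the table layout, the probability floor for unseen entries, and the comparison against a non-face model (a likelihood-ratio test) are assumptions, not details taken from the slides.

```python
import math

def log_likelihood(patch_labels, table, floor=1e-6):
    """Sum of log P(label at position | class) over all patches.

    patch_labels: list of (position, label) pairs.
    table: dict mapping (position, label) to a probability.
    """
    return sum(math.log(table.get((pos, lab), floor)) for pos, lab in patch_labels)

def is_face(patch_labels, face_table, nonface_table, log_threshold=0.0):
    """Likelihood-ratio test between the face and non-face models."""
    llr = (log_likelihood(patch_labels, face_table)
           - log_likelihood(patch_labels, nonface_table))
    return llr > log_threshold

# Hypothetical two-patch example.
face_table = {((0, 0), "a"): 0.8, ((0, 1), "b"): 0.6}
nonface_table = {((0, 0), "a"): 0.2, ((0, 1), "b"): 0.3}
observed = [((0, 0), "a"), ((0, 1), "b")]
```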

Page 23: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Histograms applied to faces and cars

H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000)

Page 24: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Alternative: Kernel density estimation (Parzen windows)

K/N is fraction of samples that fall into volume V

Page 25: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Parzen windows

• Non-parametric technique

• Center kernel at each data point, sum results (and normalize) to get pdf
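A one-dimensional sketch of both kernel choices is given below. The box version is the K / (N V) estimate with V equal to the window width h; the Gaussian version places a normalized Gaussian of standard deviation h at each sample. Function names and the 1-D setting are assumptions for illustration.

```python
import math

def parzen_box(x, samples, h):
    """Box kernel: fraction of samples within h/2 of x, divided by the volume h."""
    k = sum(1 for s in samples if abs(x - s) <= h / 2)
    return k / (len(samples) * h)

def parzen_gaussian(x, samples, h):
    """Gaussian kernel: average of Gaussians centered at each sample."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
```

The kernel size h plays the role of the smoothing parameter: a small h gives a spiky estimate, a large h a heavily blurred one.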

Page 26: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Parzen windows

Page 27: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Gaussian Parzen Windows

Page 28: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Parzen Window Density Estimation

Page 29: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Comparison

Histograms
• non-parametric
• smoothing parameter = # of bins
• discard data afterwards
• discontinuous
• boundaries arbitrary
• d dimensions → M^d bins (curse of dimensionality)

Parzen windows
• non-parametric
• smoothing parameter = size of kernel
• need data always
• discontinuous (box) or continuous (Gaussian)
• boundaries data driven (box) or no boundaries (Gaussian)
• dimensionality not as much of a curse

Page 30: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Another alternative: Locally Weighted Averaging (LWA)

• Keep instance database

• At each query point, form locally weighted average

• Equivalent to Parzen windows

• Memory based, lazy learning, applicable to any kernel, can be slow

f(i) = 1 for positive examples, 0 for negative examples
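A locally weighted average can be sketched as a kernel-weighted mean of the labels f(i) stored in the instance database. The Gaussian kernel and the fallback value of 0.5 when all weights underflow are assumptions of this example; any kernel could be used instead.

```python
import math

def lwa_posterior(query, points, labels, bandwidth):
    """Kernel-weighted average of f(i) (1 = positive, 0 = negative) at the query."""
    weights = [math.exp(-0.5 * (math.dist(query, p) / bandwidth) ** 2)
               for p in points]
    total = sum(weights)
    if total == 0.0:
        return 0.5  # no nearby data: fall back to "undecided" (an assumption)
    return sum(w * f for w, f in zip(weights, labels)) / total

# Hypothetical instance database with one positive and one negative example.
points = [(0.0, 0.0), (1.0, 0.0)]
labels = [1, 0]
```

The result can be read as an estimate of the posterior of the positive class at the query point; thresholding it at 0.5 gives a classifier.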

Page 31: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

LWA Classifier, Circular Kernel

[Figure panels: All Data; Data, 2 classes; Kernel Weights; LWA Posterior]

Page 32: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

K-Nearest Neighbors

Classification = majority vote of K nearest neighbors
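A brute-force sketch of the majority vote, assuming Euclidean distance and a small in-memory data set (both assumptions; the example data are invented):

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k):
    """Label the query by majority vote among its k nearest neighbors."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "b", "b", "b"]
```

Choosing an odd K avoids ties in the two-class case.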

Page 33: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Recognition by finding patterns

• We have seen very simple template matching (under filters)

• Some objects behave like quite simple templates
– Frontal faces

• Strategy:
– Find image windows
– Correct lighting
– Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces

Page 34: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Finding faces

• Faces “look like” templates (at least when they’re frontal).

• General strategy:
– Search image windows at a range of scales
– Correct for illumination
– Present corrected window to classifier

• Issues
– How corrected?
– What features?
– What classifier?

[Figure: block diagram with training database, training image, test image, feature extraction, learner, classifier, decision]

Page 35: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Face detection

http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

Page 36: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Face recognition

http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

Page 37: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Linear discriminant functions

• g(x) = w^T x + w_0

• decision surface is hyperplane

• w is perpendicular to hyperplane

• neural network: combination of linear discriminant functions

• sigmoid function is differentiable, enables backpropagation
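The sketch below evaluates a linear discriminant and the sigmoid with its derivative. Deciding class 1 when g(x) > 0 is the usual convention and is assumed here; the weights in the usage note are invented.

```python
import math

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w_0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(x, w, w0):
    """The hyperplane g(x) = 0 is the decision surface."""
    return 1 if g(x, w, w0) > 0 else 2

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def sigmoid_derivative(a):
    """Closed-form derivative, which is what backpropagation relies on."""
    s = sigmoid(a)
    return s * (1.0 - s)
```

For example, with w = (1, -1) and w_0 = 0 the decision surface is the line x1 = x2, and w is perpendicular to it.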

Page 38: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Neural networks for detecting faces

Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, Neural Network-Based Face Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.

Page 39: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Neural networks for detecting faces

positive training images: scaled, rotated, translated, and mirrored

negative training images

Page 40: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Neural networks for detecting faces

Page 41: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Arbitration

Page 42: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Bootstrapping

• Hardest examples to classify are those near the decision boundary

• These are also the most useful for training

• Approach: Run detector, find examples of misclassification, feed back into training process
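The feedback loop can be sketched generically. Here `train` is a hypothetical callback that takes positive and negative examples and returns a classifier (a callable returning True for "object"), and `pool` is a set of examples known to contain no object, so every detection in it is a misclassification. None of these names come from the slides.

```python
def bootstrap_negatives(train, positives, negatives, pool, rounds=3):
    """Retrain repeatedly, adding false alarms from the pool as new negatives."""
    negatives = list(negatives)
    classifier = train(positives, negatives)
    for _ in range(rounds):
        # Run the detector on object-free data; anything it accepts is an error.
        false_alarms = [x for x in pool if classifier(x) and x not in negatives]
        if not false_alarms:
            break
        negatives.extend(false_alarms)
        classifier = train(positives, negatives)
    return classifier, negatives
```

Only the examples the current classifier gets wrong are added, so the negative set grows with cases near the decision boundary rather than with easy ones.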

Page 43: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Results

Page 44: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Real-time face detection

• Components
– Cascade architecture
– Box sum features (integral image)

[Figure: cascade of classifiers H1, H2, ..., Hn with outputs labeled Non-face, Non-face, and Face]

Viola and Jones, CVPR 2001

Page 45: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Haar-like features

(Integral image makes computation fast)
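The speed-up comes from the fact that, once the integral image is built, the sum over any rectangle needs only four lookups. A minimal sketch using nested lists (the zero-padded layout and function names are choices made for this example):

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
```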

Page 46: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

More features

Page 47: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Example

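• A feature’s value is calculated as the difference between the sums of the pixels within the white and black rectangle regions:

$$f_i = \mathrm{Sum}(r_{i,\text{white}}) - \mathrm{Sum}(r_{i,\text{black}})$$

$$h_i(x) = \begin{cases} +1 & \text{if } f_i > \text{threshold} \\ -1 & \text{if } f_i < \text{threshold} \end{cases}$$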
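A small sketch of a two-rectangle feature and the thresholded weak classifier built on it. Rectangles are given as (top, left, height, width), plain summation is used instead of an integral image for brevity, and mapping the equality case to -1 is a choice made here.

```python
def region_sum(img, top, left, height, width):
    """Sum of the pixels inside one rectangle."""
    return sum(img[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))

def two_rect_feature(img, white, black):
    """Feature value f = Sum(white region) - Sum(black region)."""
    return region_sum(img, *white) - region_sum(img, *black)

def weak_classifier(f, threshold):
    """h = +1 if the feature value exceeds the threshold, otherwise -1."""
    return 1 if f > threshold else -1

# Hypothetical 2x4 image: bright left half, dark right half.
img = [[9, 9, 1, 1], [9, 9, 1, 1]]
f = two_rect_feature(img, white=(0, 0, 2, 2), black=(0, 2, 2, 2))
```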

Page 48: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Boosting

Page 49: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Adaboost

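$$F = \mathrm{sign}(w_1 h_1 + w_2 h_2 + \dots + w_n h_n)$$

where

$$h_i(x) = \begin{cases} +1 & \text{if } f_i > \theta_i \\ -1 & \text{if } f_i < \theta_i \end{cases}$$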

The more distinctive the feature, the larger the weight.
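The final decision is a weighted vote of the weak classifiers. The sketch below shows only this combination step, not the AdaBoost training procedure that selects the features and sets the weights; the weights are invented, and treating a zero sum as -1 is a choice made here.

```python
def strong_classifier(weak_outputs, weights):
    """F = sign(w_1 h_1 + ... + w_n h_n), with each h_i equal to +1 or -1."""
    total = sum(w * h for w, h in zip(weights, weak_outputs))
    return 1 if total > 0 else -1

# Three weak classifiers voting +1, -1, +1.
votes = [1, -1, 1]
```

With weights (0.5, 0.3, 0.4) the two agreeing classifiers win; with weights (0.2, 0.9, 0.3) the single heavily weighted classifier overrules them.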

Page 50: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Training images

Page 51: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Results

Page 52: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Training

[Figure: Viola-Jones training compared with Direct Feature Selection (two orders of magnitude faster)]

Jianxin Wu, James M. Rehg, Matthew D. Mullin. Learning a Rare Event Detection Cascade by Direct Feature Selection, NIPS 2003.

Page 53: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Using OpenCV detector

1. Collect a database of positive samples and a database of negative samples.

2. Mark the objects with objectmarker.exe.

3. Build a vec file out of the positive samples using createsamples.exe.

4. Run haartraining.exe to build the classifier.

5. Run performance.exe to evaluate the classifier.

6. Run haarconv.exe to convert the classifier to an .xml file.

Page 54: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Using OpenCV detector

1. Mark positive samples: info.txt

2. Use createsamples.exe to pack the positive samples into the “hw.vec” file.
   createsamples -info info.txt -vec hw.vec -w 15 -h 12
   (The minimum size of the marked object was 15 by 12.)

3. Use haartraining.exe to train the classifier.
   haartraining -data hw -vec hw.vec -bg background.txt -mem 100 -w 15 -h 12 -nstages 18

4. Convert the classifier to xml.
   Convert hw hw.xml 15 12

5. Use performance.exe to check the performance.
   performance -data hw.xml -info info.txt -w 15 -h 12 -ni

6. Use the PatternDetector class in Blepo to display the results.
   m_Detector = new PatternDetector(xml_file_name);

7. In the results, you will see an object detected twice or more, with overlap.

from Zhichao Chen

Page 55: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Using OpenCV detector

Result from checking performance:

Here you can see that the classifier detected 469 positive objects and missed 36. The number of false positives is larger (1991), because

• A positive object might be detected many times at slightly different positions. Some “good” detections are regarded as “false”.

• We only used 18 stages. More stages would reduce the false positives, at the expense of more training time.

• No background image was included for training.

Conclusions:

• Use the proper sample size for training. Basically, the sample size should be similar to the minimum size of the marked object.

• If the FPR is too high, increase the number of stages.

from Zhichao Chen

Page 56: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

OpenCV detector links

• Original Viola-Jones paper: http://research.microsoft.com/~viola/Pubs/Detect/violaJones_CVPR2001.pdf

• OpenCV library: http://sourceforge.net/projects/opencvlibrary

• How-to build a cascade of boosted classifiers based on Haar-like features: http://lab.cntl.kyutech.ac.jp/~kobalab/nishida/opencv/OpenCV_ObjectDetection_HowTo.pdf

• Objectmarker.exe and haarconv.exe, *.dll: http://www.iem.pw.edu.pl/~domanskj/haarkit.rar

from Zhichao Chen

Page 57: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Fisher linear discriminant

http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

Page 58: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Linear SVMs

http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

Page 59: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Non-linear SVMs

http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

Page 60: Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Eigenfaces