Lecture 6: Classification – Boosting and SVMs
CAP 5415, Fall 2006
Course Project
Basic Requirement: Implement a vision algorithm
How complex? The experiments and implementation details should be interesting enough for a 4-5 page write-up. If you choose a relatively simple algorithm, then you should do interesting experiments to test the algorithm's limits.
Groups
I encourage you to work in groups
Groups can do more interesting projects
Group projects should be more interesting
Come talk to me if you would like to work in a group, but don't know anyone
Group write-up: 6-8 pages
Possible goal: CVPR07 submission (Dec 4)
~20% acceptance rate, so don't plan on submitting second-rate work
How do I pick a project?
Strategy #1: Pick a topic that you think is interesting
Read three papers on that topic
Implement one, or implement your own solution
Could be original research
Lots of opportunity in the area of computational photography
Come talk to me!!! I can point you to interesting papers that have come out recently
Strategy #2
I have a few original research ideas: computational photography, surveillance, object segmentation
Come talk to me to see what you're interested in and if you need help finding partners for a group project
Picking one of my ideas gives no advantage in terms of grading
Q: I work in one of the vision groups. Can I just turn in my CVPR07 submission?
A: No
Well, actually
Your project may be related, but should not just be your current research project
Examples:
A related side project that you haven't had time to pursue in depth
Applying algorithms that you have developed for one problem to a different problem
Should have interesting experiments
Getting it done
Write-ups due Dec 2
Brief proposal due Nov 7th (I would prefer Oct 18th or 25th)
Whatever you work on, keep me updated!!!! I am here to help!
Grading
I will give you feedback on your proposal
The earlier you touch base with me, the better
Once we agree, if you do what your proposal stated and turn in a good-quality write-up, you will get an “A”
What if it doesn't work?
It happens a lot! Turn in a good write-up explaining what went wrong, what you think the underlying problems are, and how you would fix them if you were to keep working on this project
I'm not talking about “I didn't understand the math” or “My code kept crashing”
You can still get an “A”
One last thing about projects
I will be scheduling project meetings to meet with each group at the end of November
Class will be canceled on November 21; that class will be your project meeting.
What's wrong with this decision boundary?
(Assume this is the training data)
What if you then tested on this data?
This decision boundary overfits the training data
Overfitting is hard to do with a linear classifier, but easy with a non-linear classifier
How to tell if your classifier is overfitting
Strategy #1: Hold out part of your data as a test set
What if data is hard to come by?
Strategy #2: k-fold cross-validation
Break the data set into k parts
For each part, hold that part out, train the classifier on the other k-1 parts, and use the held-out part as a test set
Slower than the test-set method, but a more efficient use of limited data
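The k-fold procedure above can be sketched in Python as follows. This is a minimal sketch, not code from the lecture: `k_fold_indices`, `cross_validate`, and the toy classifier (`train_threshold`, `accuracy`) are hypothetical names chosen for illustration, and the toy classifier assumes 1-D data where the positive class lies above the negative class.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X: Sequence, y: Sequence[int], k: int,
                   train: Callable[[list, list], object],
                   evaluate: Callable[[object, list, list], float]) -> float:
    """Average held-out score over k folds; every example is tested exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(evaluate(model, [X[i] for i in held_out], [y[i] for i in held_out]))
    return sum(scores) / k


# Hypothetical toy classifier: a threshold halfway between the two class means (1-D data).
def train_threshold(xs: list, ys: list) -> float:
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold: float, xs: list, ys: list) -> float:
    preds = [1 if x > threshold else -1 for x in xs]
    return sum(p == t for p, t in zip(preds, ys)) / len(ys)
```

On the toy data `X = [0, 1, 2, 3, 10, 11, 12, 13]` with labels `[-1]*4 + [1]*4`, `cross_validate(X, y, 4, train_threshold, accuracy)` returns `1.0`, since the classes are well separated. Setting k equal to the number of examples gives leave-one-out cross-validation, the slowest extreme.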
Basic Set-up for Boosting
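We want to learn a classifier $y = \mathrm{sign}(F(x))$, with labels $y \in \{-1, +1\}$
We will assume that F(x) has the form $F(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$, where each $h_t(x) \in \{-1, +1\}$ is a weak learner and $\alpha_t$ is its weight
Basic Idea: Iteratively choose weak learners and set the weights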
AdaBoost
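Initialization: given training examples $(x_1, y_1), \ldots, (x_m, y_m)$, set $D_1(i) = 1/m$ for $i = 1, \ldots, m$
D is a distribution over the training examples; it can also be thought of as a weight on each example
(From “A short introduction to boosting” by Freund and Schapire)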
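Next Step: Get a Weak Learner
The weak learner $h_t$ is trained to do as well as possible on the weighted training set. Its weighted error $\epsilon_t = \sum_{i : h_t(x_i) \neq y_i} D_t(i)$ must be below 1/2, i.e., it must have better than 50% accuracy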
From “A short introduction to boosting” by Freund and Schapire
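Next: Reset the Weights
Set $\alpha_t = \frac{1}{2}\ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$ and update $D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$, where $Z_t$ normalizes $D_{t+1}$ to be a distribution. Misclassified examples gain weight; correctly classified examples lose weight.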
From “A short introduction to boosting” by Freund and Schapire
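The reweighting step can be written out for a single round. This sketch assumes labels and weak-learner outputs in {-1, +1} and follows the $\epsilon_t$, $\alpha_t$, and $D_{t+1}$ formulas in Freund and Schapire's formulation; `adaboost_reweight` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_reweight(D: Sequence[float], y: Sequence[int],
                      h: Sequence[int]) -> Tuple[float, List[float]]:
    """One AdaBoost weight update.

    D: current weights D_t(i), summing to 1
    y: true labels y_i in {-1, +1}
    h: weak-learner outputs h_t(x_i) in {-1, +1}
    Returns (alpha_t, D_{t+1}).
    """
    # Weighted error: total weight on the misclassified examples.
    eps = sum(d for d, yi, hi in zip(D, y, h) if yi != hi)
    if not 0 < eps < 0.5:
        # eps = 0 would make alpha infinite; eps >= 1/2 is no better than chance.
        raise ValueError("weak learner must have 0 < weighted error < 1/2")
    alpha = 0.5 * math.log((1 - eps) / eps)
    # y_i * h_t(x_i) is +1 when correct (weight shrinks) and -1 when wrong (weight grows).
    unnormalized = [d * math.exp(-alpha * yi * hi) for d, yi, hi in zip(D, y, h)]
    Z = sum(unnormalized)  # the normalizer Z_t
    return alpha, [w / Z for w in unnormalized]
```

With four equally weighted examples and one mistake, $\epsilon = 1/4$ and $\alpha = \frac{1}{2}\ln 3 \approx 0.549$; the misclassified example's weight rises from 1/4 to 1/2 while the others drop to 1/6. In general, after the update the misclassified examples carry exactly half of the total weight, so $h_t$ itself is no better than chance on the new weights.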
Demo
In this demo, each weak learner is a stump of the form (ax+by)>c
Looking at the algorithm again
From “A short introduction to boosting” by Freund and Schapire
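Putting the steps together gives a small AdaBoost sketch. To keep it short it uses axis-aligned stumps of the form x[f] > c (with a polarity flip) rather than the oriented stumps (ax+by)>c used in the demo; the exhaustive stump search, the epsilon clamp, and all function names are illustrative assumptions, not the demo's code.

```python
import math
from typing import List, Sequence, Tuple

# A stump is (feature index f, threshold c, polarity s): it predicts s if x[f] > c, else -s.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, c, s = stump
    return s if x[f] > c else -s


def fit_stump(X, y, D) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error under D."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        # Candidate thresholds: one below every value (a constant classifier) plus midpoints.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for c in thresholds:
            for s in (1, -1):
                err = sum(d for x, yi, d in zip(X, y, D) if stump_predict((f, c, s), x) != yi)
                if err < best_err:
                    best, best_err = (f, c, s), err
    return best, best_err


def adaboost(X, y, T: int) -> List[Tuple[float, Stump]]:
    m = len(X)
    D = [1.0 / m] * m  # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        stump, eps = fit_stump(X, y, D)
        if eps >= 0.5:  # no weak learner beats chance; stop early
            break
        eps = max(eps, 1e-12)  # guard against log(1/0) when a stump is perfect
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        D = [d * math.exp(-alpha * yi * stump_predict(stump, x)) for d, x, yi in zip(D, X, y)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble


def predict(ensemble, x) -> int:
    """sign(F(x)) with F(x) = sum_t alpha_t h_t(x); ties go to -1."""
    F = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if F > 0 else -1
```

On the 1-D points x = 1, ..., 6 labeled +, +, -, -, +, +, which no single stump classifies correctly, three rounds of `adaboost` produce an ensemble whose `predict` output matches every training label.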
Advantages
A simple algorithm for learning robust classifiers (Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998)
Provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, 2000; Viola & Jones, 2003)
Easy to implement; does not require external optimization tools
(From the Tutorial on Object Detection by Torralba, Fergus, and Li – ICCV 2005)
Where do the weak learners come from?
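Any classifier can be a weak learner, as long as it does better than chance on the weighted training set
Common ones:
Stump: r(x) > c, a threshold on a single feature response r(x)
Decision tree (another kind of classifier)
Decision trees combined with AdaBoost have been dubbed the “best off-the-shelf classifier” (Friedman, Hastie, and Tibshirani)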
Application: Face Detection (Viola and Jones 2001)
Features
Threshold on the response to simple features
(Figures copied from “Robust Real-time Object Detection” by Viola and Jones, 2001)
Why?
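Viola and Jones introduce a trick, called the integral image, that lets them compute the response to these features very quickly
First step: compute a running, cumulative sum across the image, so that each entry holds the sum of all pixels above and to the left of it
Integral Image
With the integral image, the sum over any rectangle can be computed with just four lookups, regardless of the rectangle's size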
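The integral image and the four-lookup rectangle sum can be sketched as follows. The zero-padded layout and the `two_rect_feature` helper (left half minus right half, one kind of simple rectangle feature) are illustrative assumptions rather than Viola and Jones's implementation.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """ii[r][c] = sum of img over rows < r and columns < c (zero-padded row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0  # running sum along the current row
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii: Image, top: int, left: int, bottom: int, right: int) -> int:
    """Sum of img[top:bottom][left:right] (half-open) from four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Hypothetical two-rectangle feature: left half minus right half of a box."""
    half = width // 2
    left_sum = box_sum(ii, top, left, top + height, left + half)
    right_sum = box_sum(ii, top, left + half, top + height, left + width)
    return left_sum - right_sum
```

For the 3x3 image with rows [1, 2, 3], [4, 5, 6], [7, 8, 9], `box_sum(ii, 1, 1, 3, 3)` returns 5 + 6 + 8 + 9 = 28 using only four array reads, however large the box is.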
These features also capture important structure in faces (for example, the eye region is often darker than the cheeks)
How well does it work?
95% Detection Rate with a false positive rate of 1 in 14084
Is it fast?
In 2001: one 384x288 image every 0.7 seconds
Not real-time
How can we make it faster?
Use a cascade
A classifier with 2 weak learners detects 100% of the faces with a 40% false positive rate
This eliminates 60% of the non-face examples with very little computation
We can then train a slightly more complicated classifier to eliminate even more examples
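The early-rejection logic of a cascade can be sketched like this. Each stage is modeled as a hypothetical (score function, threshold) pair standing in for a boosted classifier; how the stages and their thresholds are trained is not shown.

```python
from typing import Callable, Sequence, Tuple

# A stage is (score function, threshold); in the face detector each stage would be a
# boosted classifier, here any callable stands in for it.
Stage = Tuple[Callable[[object], float], float]


def cascade_classify(stages: Sequence[Stage], window) -> Tuple[bool, int]:
    """Run the stages in order and reject as soon as one stage says "not a face".

    Returns (is_face, number_of_stages_evaluated).
    """
    for i, (score, threshold) in enumerate(stages):
        if score(window) < threshold:
            return False, i + 1  # early rejection: later, costlier stages never run
    return True, len(stages)
```

With two stages that both score a window by its own value and thresholds 0.5 and 0.8, a window scoring 0.3 is rejected after one stage, 0.6 after two, and 0.9 is accepted. Because most windows in an image are not faces, most are rejected by the cheap early stages, which is where the speed-up comes from.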
The implementation
32 layers:
Layer 1 – two weak learners (rejects 60% of non-faces)
Layer 2 – five weak learners (rejects 80% of non-faces)
Layers 3-5 – 20 weak learners
Layers 6-7 – 50 weak learners
Layers 8-12 – 100 weak learners
Layers 13-32 – 200 weak learners
Computation
On average, only 8 features out of 4297 possible features are evaluated at every pixel
On a 700 MHz Pentium III, the cascade can process a 384x288 image in 0.067 seconds
Almost as accurate as without a cascade
The Support Vector Machine
Boosted classifiers and SVMs are probably the two most popular classifiers today
I won't get into the math behind SVMs; if you are interested, you should take the pattern recognition course (highly recommended)
Last time, we considered the problem of linear classification
We used probabilities to fit the line
Consider a different criterion, called the margin
Margin – minimum distance from a data point to the decision boundary
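The margin definition can be computed directly for a linear boundary $w \cdot x + b = 0$. This sketch, with the hypothetical name `geometric_margin`, uses the signed form $y_i (w \cdot x_i + b) / \|w\|$, which equals the distance to the boundary when every point is correctly classified; it only evaluates a given boundary and does not perform the SVM's maximization.

```python
import math
from typing import Sequence


def geometric_margin(w: Sequence[float], b: float,
                     X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """min_i y_i (w . x_i + b) / ||w||.

    For a boundary that classifies every point correctly, this is the minimum
    distance from a data point to the boundary w . x + b = 0; a negative value
    means some point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    signed = [yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, yi in zip(X, y)]
    return min(signed) / norm
```

For points (0, 0) and (0, 1) labeled -1 and (2, 0) and (2, 1) labeled +1, the boundary x = 1 (w = (1, 0), b = -1) has margin 1.0, while x = 0.5 (b = -0.5) also separates the data but has margin only 0.5; the SVM criterion prefers the first.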
The SVM finds the decision boundary that maximizes the margin
Data points that lie on the margin, closest to the decision boundary, are known as support vectors
Non-Linear Classification in SVMs
Last time, I showed how you could do non-linear classification by using non-linear transformations of the features
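[Figure: the decision boundary plotted in the (x, y) plane]
This is the decision boundary from $x^2 + 8xy + y^2 > 0$
This is the same as making a new set of features, $(x^2, xy, y^2)$, and then doing linear classification with weights $(1, 8, 1)$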
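The equivalence can be checked numerically: the quadratic rule is a linear rule with weights (1, 8, 1) applied to the features (x^2, xy, y^2). The function names here are illustrative.

```python
def phi(x: float, y: float) -> tuple:
    """Map a 2-D point to the new features (x^2, xy, y^2)."""
    return (x * x, x * y, y * y)


W = (1.0, 8.0, 1.0)  # weights read off from x^2 + 8xy + y^2


def linear_rule_on_features(x: float, y: float) -> bool:
    """Linear classification in the transformed feature space."""
    return sum(w * f for w, f in zip(W, phi(x, y))) > 0


def original_rule(x: float, y: float) -> bool:
    """The non-linear rule in the original (x, y) space."""
    return x * x + 8 * x * y + y * y > 0
```

On an integer grid from -5 to 5 in both coordinates, the two rules agree at every point; for example (1, 1) gives 10 > 0 and (1, -1) gives -6.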
The decision function can be expressed in terms of dot-products: $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, (x_i \cdot x) + b\right)$
Each $\alpha_i$ will be zero unless $x_i$ is a support vector
What if we wanted to do non-linear classification?
We could transform the features and compute the dot product of the transformed features.
But there may be an easier way!
The Kernel Trick
Let Φ(x) be a function that transforms x into a different space
A kernel function K is a function such that $K(x, z) = \Phi(x) \cdot \Phi(z)$
Example (Burges 98)
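If $x = (x_1, x_2)$ and $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$
Then $K(x, z) = \Phi(x) \cdot \Phi(z) = (x \cdot z)^2$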
This is called the polynomial kernel
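The Burges example can be verified numerically: the kernel $(x \cdot z)^2$ gives the same value as explicitly mapping both points with $\Phi$ and taking the dot product, without ever forming $\Phi$. This is a sketch with hypothetical function names, restricted to 2-D inputs.

```python
import math
from typing import Sequence, Tuple


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map (x1^2, sqrt(2) x1 x2, x2^2) for a 2-D point."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Degree-2 polynomial kernel (x . z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_dot(x: Sequence[float], z: Sequence[float]) -> float:
    """The same quantity via the explicit transformation: Phi(x) . Phi(z)."""
    return sum(a * b for a, b in zip(phi(x), phi(z)))
```

For x = (1, 2) and z = (3, 4), both `poly_kernel` and `explicit_dot` give 121 (the latter up to floating-point rounding).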
Gaussian RBF Kernel: $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$
One of the most commonly used kernels
Equivalent to doing a dot-product in an infinite-dimensional space
So, with a kernel function K, the new classification rule is $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$
Basic idea: computing the kernel function should be easier than computing a dot-product in the transformed space
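The kernelized classification rule can be sketched with interchangeable kernels. The support vectors, labels, α values, and b used below are made up for illustration; in practice they come from SVM training, which this sketch does not perform. Function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Plain dot product: recovers the un-kernelized decision function."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def svm_decision(x: Vector, support_vectors: Sequence[Vector], labels: Sequence[int],
                 alphas: Sequence[float], b: float,
                 kernel: Callable[[Vector, Vector], float]) -> int:
    """sign(sum_i alpha_i y_i K(x_i, x) + b); a score of exactly 0 maps to +1."""
    score = sum(a * yi * kernel(xi, x)
                for xi, yi, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1
```

With hypothetical support vectors (0, 0) labeled -1 and (2, 0) labeled +1, both α = 1 and b = 0, `svm_decision((2.0, 0.0), ..., rbf_kernel)` returns 1 and `svm_decision((0.0, 0.0), ..., rbf_kernel)` returns -1. Swapping in `linear_kernel` gives the dot-product form of the decision function.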
Other algorithms, like logistic regression, can also be “kernelized”
So what if I want to use an SVM?
There are well-developed packages with Python and MATLAB interfaces: libSVM, SVMLight, SVMTorch