TRANSCRIPT
BOOSTING
(ADABOOST ALGORITHM)
Eric Emer
Consider Horse-Racing Gambler
Rules of Thumb for determining Win/Loss:
Most favored odds
Fastest recorded lap time
Most wins recently, say, in the past month
It is hard to determine how the gambler combines his analysis of the feature set into a single bet.
Consider MIT Admissions
2-class system (Admit/Deny)
Both quantitative and qualitative data
We consider (Y/N) answers to be quantitative (-1, +1). Region, for instance, is qualitative.
Rules of Thumb, Weak Classifiers
It is easy to come up with rules of thumb that correctly classify the training data at better than chance.
E.g., IF GoodAtMath == Y THEN predict Admit.
It is difficult to find a single, highly accurate prediction rule. This is where boosting, and in particular AdaBoost, helps us.
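For instance, such a rule of thumb can be written as a one-feature decision stump. A minimal sketch in Python, assuming the (Y/N) answers are encoded as +1/-1 as on the previous slide (the feature name is hypothetical, not from the slides):

```python
# A rule of thumb as a one-feature "decision stump" weak classifier.
# The feature name and the +1/-1 encoding (Y = +1, N = -1) are illustrative assumptions.
def good_at_math_stump(applicant):
    """Predict Admit (+1) if GoodAtMath == +1, else Deny (-1)."""
    return +1 if applicant["GoodAtMath"] == +1 else -1

print(good_at_math_stump({"GoodAtMath": +1}))  # -> 1 (Admit)
print(good_at_math_stump({"GoodAtMath": -1}))  # -> -1 (Deny)
```

Such a stump only needs to beat chance on the training data to be usable as a weak classifier.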
What is a Weak Learner?
For any distribution, with high probability, given polynomially many examples and polynomial time, we can find a classifier whose generalization error is better than random guessing: at most 1/2 - γ for some γ > 0.
Weak Learning Assumption
We assume that our weak learning algorithm (WeakLearner) can consistently find weak classifiers (rules of thumb which classify the data correctly at better than 50%).
Given this assumption, we can use boosting to generate a single weighted classifier which correctly classifies our training data at 99%-100%.
AdaBoost Specifics
How does AdaBoost weight training examples optimally? It focuses on the difficult data points: the ones that have been misclassified most by the previous weak classifiers.
How does AdaBoost combine these weak classifiers into a comprehensive prediction? It uses an optimally weighted majority vote of the weak classifiers.
AdaBoost Technical Description
Missing details: How do we generate the distribution D_t? How do we get a single classifier?
Constructing D_t

Initially:

D_1(i) = 1/m

Given D_t and h_t:

D_{t+1}(i) = (D_t(i) / Z_t) · e^{-α_t}   if y_i = h_t(x_i)
D_{t+1}(i) = (D_t(i) / Z_t) · e^{+α_t}   if y_i ≠ h_t(x_i)

Equivalently,

D_{t+1}(i) = (D_t(i) / Z_t) · e^{-α_t y_i h_t(x_i)}

where Z_t is a normalization constant and

α_t = (1/2) ln((1 - ε_t) / ε_t) > 0
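In code, one round of this reweighting might look like the following sketch (assuming NumPy arrays for the labels, the weak classifier's predictions, and the current distribution; not the author's implementation):

```python
import numpy as np

def update_distribution(D_t, y, h_pred):
    """One AdaBoost reweighting step: returns (D_{t+1}, alpha_t).

    D_t    : current distribution over the m training examples (sums to 1)
    y      : true labels in {-1, +1}
    h_pred : predictions of the weak classifier h_t in {-1, +1}
    """
    eps_t = np.sum(D_t * (h_pred != y))            # weighted error of h_t under D_t
    alpha_t = 0.5 * np.log((1.0 - eps_t) / eps_t)  # alpha_t = 1/2 ln((1 - eps_t)/eps_t)
    # D_{t+1}(i) is proportional to D_t(i) * exp(-alpha_t * y_i * h_t(x_i)):
    # misclassified points get heavier, correctly classified points get lighter.
    D_next = D_t * np.exp(-alpha_t * y * h_pred)
    Z_t = D_next.sum()                             # normalization constant Z_t
    return D_next / Z_t, alpha_t
```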
Getting a Single Classifier
H_final(x) = sign(Σ_t α_t h_t(x))
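Putting the two slides together, a minimal sketch of the full training loop and the weighted majority vote, reusing the update_distribution sketch above and assuming a hypothetical weak_learner(X, y, D) that returns a classifier with weighted error below 1/2:

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Run T rounds of boosting; return H_final(x) = sign(sum_t alpha_t h_t(x)).

    weak_learner(X, y, D) is a hypothetical interface: it must return a callable h
    with h(X) in {-1, +1} and weighted error below 1/2 under the distribution D.
    Uses update_distribution() from the previous sketch.
    """
    m = len(y)
    D = np.full(m, 1.0 / m)          # D_1(i) = 1/m
    hs, alphas = [], []
    for _ in range(T):
        h = weak_learner(X, y, D)    # weak classifier for round t
        D, alpha = update_distribution(D, y, h(X))
        hs.append(h)
        alphas.append(alpha)

    def H_final(X_new):
        """Optimally weighted majority vote of the weak classifiers."""
        votes = sum(a * h(X_new) for a, h in zip(alphas, hs))
        return np.sign(votes)

    return H_final
```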
Mini-Problem
Training Error Analysis
Claim: write ε_t = 1/2 - γ_t, so γ_t measures how much better than chance h_t does. Then

training error(H_final) ≤ Π_t [2 √(ε_t (1 - ε_t))] = Π_t √(1 - 4 γ_t²) ≤ exp(-2 Σ_t γ_t²),

so as long as every weak classifier beats random guessing by some margin γ_t ≥ γ > 0, the training error drops exponentially fast in the number of rounds.
Proof
Step 1: unwrapping the recurrence for D_{t+1}.
Step 2: show that training error(H_final) ≤ Π_t Z_t.
Step 3: show that Z_t = 2 √(ε_t (1 - ε_t)).
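For completeness, the standard algebra behind Step 3 and the final bound (a sketch of the usual derivation, not reproduced from the slides):

```latex
Z_t = \sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}
    = (1-\epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{+\alpha_t}
    = 2\sqrt{\epsilon_t (1-\epsilon_t)}
    \quad\text{for}\quad \alpha_t = \tfrac{1}{2}\ln\tfrac{1-\epsilon_t}{\epsilon_t}.

\text{With } \epsilon_t = \tfrac{1}{2}-\gamma_t:\qquad
Z_t = \sqrt{1-4\gamma_t^2} \;\le\; e^{-2\gamma_t^2},
\qquad\text{hence}\qquad
\text{training error}(H_{\text{final}}) \;\le\; \prod_t Z_t \;\le\; e^{-2\sum_t \gamma_t^2}.
```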
How might test error react to AdaBoost?
We expect to encounter:
Occam's Razor
Overfitting
Empirical results of test error
Test error does not increase even after 1000 rounds.
Test error continues to drop after training error reaches zero.
Difference from Expectation: The Margins
Explanation: training error only measures the correctness of classifications; it neglects the confidence of classifications. How can we measure the confidence of a classification?

Margin(x, y) close to +1: high confidence, correct.
Margin(x, y) close to -1: high confidence, incorrect.
Margin(x, y) close to 0: low confidence.

H_final(x) = sign(f(x)), where

f(x) = (Σ_t α_t h_t(x)) / (Σ_t α_t) ∈ [-1, +1]

margin(x, y) = y · f(x)
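A minimal sketch of this margin computation, assuming the weak classifiers and weights collected during the training loop sketched earlier:

```python
import numpy as np

def margins(X, y, hs, alphas):
    """margin(x, y) = y * f(x) with f(x) = sum_t alpha_t h_t(x) / sum_t alpha_t in [-1, +1].

    hs and alphas are the weak classifiers and their weights from boosting
    (e.g. the lists built up in the training loop sketched earlier).
    """
    alphas = np.asarray(alphas, dtype=float)
    votes = sum(a * h(X) for a, h in zip(alphas, hs))
    f = votes / alphas.sum()   # normalized vote, a confidence score in [-1, +1]
    return y * f               # near +1: confident & correct; near -1: confident & wrong; near 0: unsure
```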
Empirical Evidence Supporting Margins
[Figure: cumulative distribution of margins on the training examples, for H_final(x) = sign(f(x)) with f(x) = (Σ_t α_t h_t(x)) / (Σ_t α_t) ∈ [-1, +1] and margin(x, y) = y · f(x).]
Pros/Cons of AdaBoost
Pros:
Fast
Simple and easy to program
No parameters to tune (except T)
No prior knowledge needed about the weak learner
Provably effective given the Weak Learning Assumption
Versatile

Cons:
Weak classifiers that are too complex can lead to overfitting.
Weak classifiers that are too weak can lead to low margins, and can also lead to overfitting.
From empirical evidence, AdaBoost is particularly vulnerable to uniform noise.
Predicting College Football Results
Training Data: 2009 NCAAF Season
Test Data: 2010 NCAAF Season