
    BOOSTING (ADABOOST ALGORITHM)

    Eric Emer


    Consider Horse-Racing Gambler

    Rules of thumb for determining win/loss:
    • Most favored odds
    • Fastest recorded lap time
    • Most wins recently, say, in the past month

    It is hard to determine how the gambler combines his analysis of the feature set into a single bet.


    Consider MIT Admissions

    • 2-class system (Admit/Deny)
    • Both quantitative data and qualitative data

    We consider (Y/N) answers to be quantitative (-1, +1). Region, for instance, is qualitative.


    Rules of Thumb, Weak Classifiers

    It is easy to come up with rules of thumb that correctly classify the training data at better than chance.

    E.g., IF GoodAtMath == Y THEN predict Admit.

    It is difficult to find a single, highly accurate prediction rule. This is where boosting, via the AdaBoost algorithm, helps us.


    What is a Weak Learner?

    For any distribution, with high probability, given polynomially many examples and polynomial time, we can find a classifier with generalization error better than random guessing:

    \exists\, \gamma > 0 \text{ such that generalization error} \le \tfrac{1}{2} - \gamma
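
    A slightly more explicit phrasing of this definition, reconstructed in standard PAC terms rather than taken verbatim from the slide: for every distribution D over labeled examples and every failure probability \delta > 0, given poly(1/\delta) examples and running time, the weak learner returns a classifier h with

    \Pr\big[\, \mathrm{err}_D(h) \le \tfrac{1}{2} - \gamma \,\big] \ge 1 - \delta \qquad \text{for some fixed } \gamma > 0

    Boosting only needs this fixed edge \gamma over random guessing; the next slide calls this the Weak Learning Assumption.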


    Weak Learning Assumption

    We assume that our weak learning algorithm (WeakLearner) can consistently find weak classifiers (rules of thumb which classify the data correctly at better than 50%).

    Given this assumption, we can use boosting to generate a single weighted classifier which correctly classifies our training data at 99%-100%.


    AdaBoost Specifics

    How does AdaBoost weight training examples optimally? It focuses on the difficult data points: the data points that were misclassified most by the previous weak classifier.

    How does AdaBoost combine these weak classifiers into a comprehensive prediction? It uses an optimally weighted majority vote of the weak classifiers.


    AdaBoost Technical Description

    Missing details: How do we generate the distribution D_t? How do we get a single classifier from the weak classifiers? Both are constructed on the next two slides; a code sketch of the full loop follows below.
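
    Since the algorithm box from the original slide did not survive the transcript, here is a minimal sketch of the AdaBoost loop described on this and the next two slides. It assumes binary labels in {-1, +1} and uses a one-feature threshold stump as the weak learner; the function names (train_stump, adaboost, predict) are illustrative, not from the slides.

    import numpy as np

    def train_stump(X, y, D):
        """Weak learner: choose the single-feature threshold stump with the
        lowest weighted error under the current distribution D."""
        m, n = X.shape
        best, best_err = None, np.inf
        for j in range(n):                       # feature index
            for thr in np.unique(X[:, j]):       # candidate threshold
                for sign in (+1, -1):            # orientation of the stump
                    pred = np.where(X[:, j] <= thr, sign, -sign)
                    err = D[pred != y].sum()     # weighted training error
                    if err < best_err:
                        best_err, best = err, (j, thr, sign)
        return best, best_err

    def stump_predict(stump, X):
        j, thr, sign = stump
        return np.where(X[:, j] <= thr, sign, -sign)

    def adaboost(X, y, T=10):
        """y must use -1/+1 labels. Returns a list of (alpha_t, h_t) pairs."""
        m = len(y)
        D = np.full(m, 1.0 / m)                  # D_1(i) = 1/m
        ensemble = []
        for t in range(T):
            h_t, eps_t = train_stump(X, y, D)    # weak classifier and its weighted error
            eps_t = np.clip(eps_t, 1e-12, 1 - 1e-12)
            alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)   # alpha_t = 1/2 ln((1-eps_t)/eps_t)
            pred = stump_predict(h_t, X)
            D = D * np.exp(-alpha_t * y * pred)  # shrink correct points, grow mistakes
            D = D / D.sum()                      # divide by Z_t (normalization constant)
            ensemble.append((alpha_t, h_t))
        return ensemble

    def predict(ensemble, X):
        """H_final(x) = sign(sum_t alpha_t h_t(x))."""
        votes = sum(alpha * stump_predict(h, X) for alpha, h in ensemble)
        return np.where(votes >= 0, 1, -1)

    In the MIT-admissions example from earlier, each stump plays the role of a rule of thumb such as IF GoodAtMath == Y THEN Admit; boosting supplies the weights for the vote.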


    Constructing D_t

    D_1(i) = 1/m

    Given D_t and h_t:

    D_{t+1}(i) = \frac{D_t(i)}{Z_t} \cdot c(x_i), \quad \text{where} \quad
    c(x_i) = \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \ne h_t(x_i) \end{cases}

    Equivalently,

    D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}

    where Z_t is a normalization constant and

    \alpha_t = \frac{1}{2} \ln\!\Big(\frac{1 - \epsilon_t}{\epsilon_t}\Big) > 0
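
    As a small worked example of this update (the numbers are invented for illustration): with five training points, uniform weights, and a weak classifier that misclassifies only the fifth point, we get \epsilon_t = 0.2 and \alpha_t = \tfrac{1}{2} \ln(0.8/0.2) \approx 0.693.

    import numpy as np

    # One round of the D_t -> D_{t+1} update on 5 examples (illustrative numbers).
    D_t   = np.full(5, 0.2)                      # uniform D_1(i) = 1/m
    y     = np.array([+1, +1, -1, -1, +1])
    h_t   = np.array([+1, +1, -1, -1, -1])       # h_t misclassifies only example 5
    eps_t = np.sum(D_t[h_t != y])                # weighted error = 0.2
    alpha = 0.5 * np.log((1 - eps_t) / eps_t)    # ~ 0.693

    D_unnorm = D_t * np.exp(-alpha * y * h_t)    # correct points shrink, the mistake grows
    Z_t      = D_unnorm.sum()                    # normalization constant
    D_next   = D_unnorm / Z_t

    print(np.round(D_next, 3))   # -> [0.125 0.125 0.125 0.125 0.5]

    Note that Z_t = 0.8 = 2\sqrt{\epsilon_t(1 - \epsilon_t)}, the quantity that reappears in the training error analysis a few slides later.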


    Getting a Single Classifier

    H_{final}(x) = \mathrm{sign}\Big(\sum_t \alpha_t h_t(x)\Big)
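
    A direct transcription of this formula, assuming the list-of-(alpha_t, h_t) representation from the sketch above; the names H_final and ensemble are illustrative:

    def H_final(x, ensemble):
        """ensemble: list of (alpha_t, h_t) pairs, each h_t a callable returning +1 or -1."""
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1

    # Example with two hand-made weak classifiers on a scalar input:
    hs = [(0.7, lambda x: 1 if x > 0 else -1),
          (0.3, lambda x: 1 if x > 5 else -1)]
    print(H_final(2, hs))   # 0.7*(+1) + 0.3*(-1) = 0.4 >= 0  ->  prints 1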


    Mini-Problem


    Training Error Analysis

    Claim: writing the weighted error of each weak classifier as \epsilon_t = \tfrac{1}{2} - \gamma_t (so \gamma_t is its edge over random guessing), then

    \text{training error}(H_{final}) \le \prod_t 2\sqrt{\epsilon_t(1 - \epsilon_t)} = \prod_t \sqrt{1 - 4\gamma_t^2} \le \exp\Big(-2\sum_t \gamma_t^2\Big)

    so the training error drops exponentially fast in the number of rounds.


    Proof

    Step 1: unwrapping the recurrence

    Step 2: show that \text{training error}(H_{final}) \le \prod_t Z_t

    Step 3: show that Z_t = 2\sqrt{\epsilon_t(1 - \epsilon_t)}
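
    Spelling the three steps out (a reconstruction from the distribution update defined earlier, not a verbatim copy of the original slide):

    % Step 1: unwrap D_{t+1}(i) = D_t(i) e^{-\alpha_t y_i h_t(x_i)} / Z_t starting from D_1(i) = 1/m:
    D_{T+1}(i) = \frac{e^{-y_i f(x_i)}}{m \prod_t Z_t}, \qquad f(x) = \sum_t \alpha_t h_t(x)

    % Step 2: a misclassified point has y_i f(x_i) \le 0, so \mathbf{1}[y_i \ne H_{final}(x_i)] \le e^{-y_i f(x_i)}:
    \text{training error}(H_{final}) \le \frac{1}{m} \sum_i e^{-y_i f(x_i)} = \sum_i D_{T+1}(i) \prod_t Z_t = \prod_t Z_t

    % Step 3: plug \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t} into the definition of Z_t:
    Z_t = \sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} = (1 - \epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} = 2\sqrt{\epsilon_t (1 - \epsilon_t)}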


    How might test error react to AdaBoost?

    We expect to encounter:

    • Occam's Razor
    • Overfitting


    Empirical results of test error

    Test error does not increase even after 1,000 rounds. Test error continues to drop after training error reaches zero.


    Difference from Expectation: The Margins

    Explanation: Our training error only measures the correctness of classifications; it neglects the confidence of classifications. How can we measure the confidence of classifications?

    H_{final}(x) = \mathrm{sign}(f(x)), \qquad f(x) = \frac{\sum_t \alpha_t h_t(x)}{\sum_t \alpha_t} \in [-1, 1], \qquad \mathrm{margin}(x, y) = y f(x)

    • margin(x, y) close to +1: high confidence, correct.
    • margin(x, y) close to -1: high confidence, incorrect.
    • margin(x, y) close to 0: low confidence.
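
    A hedged sketch of how these margins could be computed, again assuming the (alpha_t, h_t) representation from the earlier sketches (names are illustrative):

    import numpy as np

    def margin(x, y, ensemble):
        """margin(x, y) = y * f(x), with f(x) = sum_t alpha_t h_t(x) / sum_t alpha_t in [-1, 1]."""
        alphas = np.array([a for a, _ in ensemble])
        votes  = np.array([h(x) for _, h in ensemble])    # each h returns +1 or -1
        f_x = np.dot(alphas, votes) / alphas.sum()
        return y * f_x

    # Example: two weak classifiers that both vote +1 on this point -> margin of +1.
    hs = [(0.9, lambda x: 1 if x > 0 else -1),
          (0.4, lambda x: 1 if x > -1 else -1)]
    print(margin(x=2.0, y=+1, ensemble=hs))   # (0.9*1 + 0.4*1) / 1.3 = 1.0

    Plotting the cumulative distribution of these margins over the training set is exactly what the next slide refers to.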


    Empirical Evidence Supporting Margins

    Cumulative distribution of margins on training examples


    Pros/Cons of AdaBoost

    Pros

    • Fast
    • Simple and easy to program
    • No parameters to tune (except the number of rounds T)
    • No prior knowledge needed about the weak learner
    • Provably effective given the Weak Learning Assumption
    • Versatile

    Cons

    • Weak classifiers that are too complex can lead to overfitting.
    • Weak classifiers that are too weak can lead to low margins, and can also lead to overfitting.
    • From empirical evidence, AdaBoost is particularly vulnerable to uniform noise.


    Predicting College Football Results

    Training data: 2009 NCAAF season
    Test data: 2010 NCAAF season