
Boosting

Rong Jin

Inefficiency with Bagging

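[Figure: Bagging. The training set D is bootstrap-sampled into D1, D2, …, Dk; a classifier h1, h2, …, hk is trained on each sample, and their predictions Pr(c | h_i, x) are combined.]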

Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between “easy” examples and “difficult” examples

Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers
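Both inefficiencies show up in a minimal Python sketch of Bagging's sampling and voting steps. The function names `bootstrap_sample` and `bagging_vote` are hypothetical, labels are assumed to be in {+1, −1}, and the training of the classifiers themselves is omitted.

```python
import random
from typing import List, Sequence

def bootstrap_sample(n: int, rng: random.Random) -> List[int]:
    """Bagging's sampling: n indices drawn with replacement, each with
    probability 1/n, so easy and difficult examples are treated alike."""
    return [rng.randrange(n) for _ in range(n)]

def bagging_vote(predictions: Sequence[int]) -> int:
    """Bagging's combination: every classifier's {+1, -1} vote counts the same,
    however accurate the classifier is (ties broken toward +1)."""
    return 1 if sum(predictions) >= 0 else -1

rng = random.Random(0)
indices = bootstrap_sample(5, rng)  # duplicates and omissions are expected
print(indices)
print(bagging_vote([1, -1, 1]))  # 1
```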

Improve the Efficiency of Bagging

Better sampling strategy
• Focus on the examples that are difficult to classify

Better combination strategy
• More accurate models should be assigned larger weights
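A matching sketch of the two improvements, again with hypothetical function names: examples are drawn with probability proportional to a weight vector, so difficult examples with larger weights are drawn more often, and votes are combined with per-classifier weights. How the weights are chosen is left open here; AdaBoost's choice is developed in the following slides.

```python
import random
from typing import List, Sequence

def weighted_sample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Draw len(weights) indices with replacement, with probability proportional
    to weights, so heavily weighted (difficult) examples are drawn more often."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)

def weighted_vote(alphas: Sequence[float], predictions: Sequence[int]) -> int:
    """Combine {+1, -1} votes, giving larger weights alpha to more accurate
    classifiers (ties broken toward +1)."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1

rng = random.Random(0)
print(weighted_sample([0.0, 0.0, 1.0], rng))  # [2, 2, 2]: all mass on example 2
print(weighted_vote([2.0, 0.5, 0.5], [1, -1, -1]))  # 1
```

In the second call, one accurate classifier with weight 2.0 outvotes two weaker ones with weight 0.5 each, whereas an unweighted majority vote would return −1.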

Intuition

[Figure: four training examples (X1, Y1), …, (X4, Y4). Classifier1 makes mistakes on (X1, Y1) and (X3, Y3); with Classifier2 added, (X1, Y1) is still a mistake; with Classifier3 added there are no training mistakes, but the combination may overfit.]

AdaBoost Algorithm

AdaBoost Example: $\alpha_t = \ln 2$

Training examples: (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5)

D0: 1/5, 1/5, 1/5, 1/5, 1/5
Sample (x5, y5), (x3, y3), (x1, y1) → training → h1
Update weights with h1 → D1: 2/7, 1/7, 2/7, 1/7, 1/7
Sample (x3, y3), (x1, y1) → training → h2
Update weights with h2 → D2: 2/9, 1/9, 4/9, 1/9, 1/9
Sample …
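The weights in this example can be reproduced exactly with Python's `fractions` module. Two things are assumptions, inferred from the numbers rather than stated on the slide: h1 misclassifies x1 and x3 while h2 misclassifies x3, and the update multiplies the weight of each misclassified example by $e^{\alpha_t} = 2$, leaves the others unchanged, and renormalizes. (The form $D_{t+1}(i) \propto D_t(i)\exp(-\alpha_t y_i h_t(x_i))$ with $\alpha_t = \ln 2$ would instead give 4/11 and 1/11 after the first round.) The helper name `update_weights` is hypothetical.

```python
from fractions import Fraction
from typing import List, Set

def update_weights(d: List[Fraction], mistakes: Set[int], factor: Fraction) -> List[Fraction]:
    """Multiply the weight of each misclassified example by factor = e^{alpha_t},
    keep the other weights, then renormalize so the weights sum to 1."""
    scaled = [w * factor if i in mistakes else w for i, w in enumerate(d)]
    total = sum(scaled)
    return [w / total for w in scaled]

factor = Fraction(2)                  # e^{alpha_t} with alpha_t = ln 2
d0 = [Fraction(1, 5)] * 5             # uniform D0 over x1..x5
d1 = update_weights(d0, mistakes={0, 2}, factor=factor)  # assumed: h1 wrong on x1, x3
d2 = update_weights(d1, mistakes={2}, factor=factor)     # assumed: h2 wrong on x3
print([str(w) for w in d1])  # ['2/7', '1/7', '2/7', '1/7', '1/7']
print([str(w) for w in d2])  # ['2/9', '1/9', '4/9', '1/9', '1/9']
```

Only the set of examples that $h_t$ gets wrong on the full training set enters the update; the sample each classifier was trained on matters only through the classifier it produces.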

How To Choose $\alpha_t$ in AdaBoost?

How to construct the best distribution $D_{t+1}(i)$:
1. $D_{t+1}(i)$ should be significantly different from $D_t(i)$.
2. $D_{t+1}(i)$ should create a situation in which classifier $h_t$ performs poorly.
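Condition 2 can be checked numerically for the standard AdaBoost choice, assumed here because the slide's formula did not survive extraction: $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and $D_{t+1}(i) \propto D_t(i)\exp(-\alpha_t y_i h_t(x_i))$, where $\epsilon_t$ is the weighted error of $h_t$ under $D_t$. With this choice $h_t$ has weighted error exactly 1/2 under $D_{t+1}$, i.e. it does no better than random guessing. The labels, predictions, and helper names below are hypothetical.

```python
import math
from typing import List, Sequence

def weighted_error(d: Sequence[float], y: Sequence[int], h: Sequence[int]) -> float:
    """Total weight of the examples whose prediction h_i differs from label y_i."""
    return sum(w for w, yi, hi in zip(d, y, h) if yi != hi)

def next_distribution(d: Sequence[float], y: Sequence[int], h: Sequence[int]) -> List[float]:
    """One AdaBoost reweighting step (standard form, assumed); requires 0 < eps < 1."""
    eps = weighted_error(d, y, h)
    alpha = 0.5 * math.log((1 - eps) / eps)
    scaled = [w * math.exp(-alpha * yi * hi) for w, yi, hi in zip(d, y, h)]
    z = sum(scaled)  # normalizer Z_t
    return [w / z for w in scaled]

d_t = [0.2] * 5
y = [1, 1, -1, -1, 1]
h_t = [-1, 1, 1, -1, 1]  # hypothetical predictions: wrong on examples 0 and 2
d_next = next_distribution(d_t, y, h_t)
print(weighted_error(d_t, y, h_t))     # 0.4
print(weighted_error(d_next, y, h_t))  # approximately 0.5
```

The reason: after the update, the misclassified examples carry total weight proportional to $\epsilon_t e^{\alpha_t} = \sqrt{\epsilon_t(1-\epsilon_t)}$, and the correctly classified ones carry $(1-\epsilon_t)e^{-\alpha_t} = \sqrt{\epsilon_t(1-\epsilon_t)}$, the same amount.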

How To Choose $\alpha_t$ in AdaBoost?

Optimization View for Choosing $\alpha_t$

$h_t(x): x \to \{1, -1\}$; a base (weak) classifier

$H_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$: a linear combination of base classifiers

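Goal: minimize the training error $\frac{1}{n}\sum_{i=1}^{n} I\big(y_i H_T(x_i) \le 0\big)$

Approximate the errors with an exponential function: $I\big(y_i H_T(x_i) \le 0\big) \le \exp\big(-y_i H_T(x_i)\big)$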

AdaBoost: Greedy Optimization
Fix $H_{T-1}(x)$, and solve for $h_T(x)$ and $\alpha_T$
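One way to carry out this greedy step is the standard derivation sketched below, assuming labels and predictions in {1, −1} and writing $H_T(x) = H_{T-1}(x) + \alpha_T h_T(x)$. The normalized weights $D_T(i)$ and the weighted error $\epsilon_T$ are notation introduced here; the slide's own derivation did not survive extraction.

```latex
\begin{aligned}
\sum_{i=1}^{n} e^{-y_i H_T(x_i)}
  &= \sum_{i=1}^{n} e^{-y_i H_{T-1}(x_i)}\, e^{-\alpha_T y_i h_T(x_i)}
   \;\propto\; \sum_{i=1}^{n} D_T(i)\, e^{-\alpha_T y_i h_T(x_i)},
   \qquad D_T(i) = \frac{e^{-y_i H_{T-1}(x_i)}}{\sum_{j=1}^{n} e^{-y_j H_{T-1}(x_j)}} \\
  &\propto (1-\epsilon_T)\, e^{-\alpha_T} + \epsilon_T\, e^{\alpha_T},
   \qquad \epsilon_T = \sum_{i:\, h_T(x_i) \neq y_i} D_T(i) \\
0 &= \frac{\partial}{\partial \alpha_T}\Big[(1-\epsilon_T)\, e^{-\alpha_T} + \epsilon_T\, e^{\alpha_T}\Big]
   = -(1-\epsilon_T)\, e^{-\alpha_T} + \epsilon_T\, e^{\alpha_T}
   \;\Longrightarrow\; \alpha_T = \frac{1}{2}\ln\frac{1-\epsilon_T}{\epsilon_T}
\end{aligned}
```

For a fixed $\alpha_T > 0$, the expression $(1-\epsilon_T)e^{-\alpha_T} + \epsilon_T e^{\alpha_T}$ is smallest when $h_T$ has the smallest weighted error $\epsilon_T$, which is why $h_T$ is trained to do well on $D_T$; the weights for the next round then follow as $D_{T+1}(i) \propto D_T(i)\exp(-\alpha_T y_i h_T(x_i))$.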

Empirical Study of AdaBoost

AdaBoosting decision trees
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights of AdaBoost
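The procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the setup of the study: decision stumps on one-dimensional inputs stand in for full decision trees, each stump is fit to the weighted data directly rather than to a sample drawn from $D_t$, and the update is the standard one with $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$. The helper names and the toy data are hypothetical; `rounds=50` mirrors the 50 trees.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold theta, sign s): h(x) = s if x >= theta else -s

def stump_predict(stump: Stump, x: float) -> int:
    theta, s = stump
    return s if x >= theta else -s

def fit_stump(xs: Sequence[float], ys: Sequence[int], d: Sequence[float]) -> Tuple[Stump, float]:
    """Hypothetical weak learner: the threshold rule with the smallest weighted error under d."""
    best, best_err = (xs[0], 1), float("inf")
    for theta in sorted(set(xs)):
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, d) if stump_predict((theta, s), x) != y)
            if err < best_err:
                best, best_err = (theta, s), err
    return best, best_err

def adaboost(xs: Sequence[float], ys: Sequence[int], rounds: int = 50) -> List[Tuple[float, Stump]]:
    n = len(xs)
    d = [1.0 / n] * n  # D_1: uniform weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, eps = fit_stump(xs, ys, d)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # avoid log(0) if a stump is perfect
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Raise the weights of misclassified examples, lower the rest, renormalize.
        d = [w * math.exp(-alpha * y * stump_predict(stump, x)) for x, y, w in zip(xs, ys, d)]
        z = sum(d)
        d = [w / z for w in d]
    return ensemble

def predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Sign of the linear combination H_T(x) = sum_t alpha_t h_t(x)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]  # hypothetical toy labels: no single stump fits them
ensemble = adaboost(xs, ys, rounds=50)
print([predict(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

On these toy labels the best single stump still misclassifies two of the six points, but the weighted combination of 50 stumps classifies every training point correctly.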

In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging

Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both variance and bias simultaneously

[Figure: bias and variance of a single decision tree, Bagging decision trees, and AdaBoosting decision trees.]
