Boosting Rong Jin

Upload: zavier-donson

Post on 14-Dec-2015

TRANSCRIPT

Page 1: Boosting Rong Jin. Inefficiency with Bagging D Bagging … D1D1 D2D2 DkDk Boostrap Sampling h1h1 h2h2 hkhk Inefficient boostrap sampling: Every example

Boosting

Rong Jin

Page 2

Inefficiency with Bagging

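[Diagram: Bagging draws bootstrap samples D_1, D_2, …, D_k from the training set D, trains classifiers h_1, h_2, …, h_k on them, and combines their predictions Pr(c | h_i, x).]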

Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between “easy” examples and “difficult” examples

Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers

Page 3

Improve the Efficiency of Bagging

Better sampling strategy
• Focus on the examples that are difficult to classify

Better combination strategy
• Accurate models should be assigned larger weights
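A minimal Python sketch of these two ideas, assuming labels in {+1, -1}. The helper names (`bootstrap_sample`, `weighted_sample`, `weighted_vote`) and all numbers are illustrative and do not come from the slides.

```python
import random

def bootstrap_sample(examples, k, rng):
    """Bagging-style sampling: every example is equally likely to be drawn."""
    return rng.choices(examples, k=k)

def weighted_sample(examples, weights, k, rng):
    """Sampling that favors examples with larger weights (the 'difficult' ones)."""
    return rng.choices(examples, weights=weights, k=k)

def weighted_vote(predictions, classifier_weights):
    """Combine +1/-1 predictions, letting heavier classifiers count more."""
    score = sum(w * p for p, w in zip(predictions, classifier_weights))
    return 1 if score >= 0 else -1

rng = random.Random(0)
examples = ["x1", "x2", "x3"]
# Hypothetical difficulty weights: x3 is treated as the hard example.
print(weighted_sample(examples, [1, 1, 4], 6, rng))
# With equal weights the two -1 votes win; a heavier first classifier flips the result.
print(weighted_vote([1, -1, -1], [1, 1, 1]), weighted_vote([1, -1, -1], [3, 1, 1]))
```

Setting all example weights and all classifier weights equal recovers the constant-weight behavior of bagging.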

Page 4

Intuition

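[Diagram: Classifier1 is trained on the training examples (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4) and makes mistakes on (X1, Y1) and (X3, Y3). Classifier2 is trained on those mistakes and still makes a mistake on (X1, Y1), which Classifier3 is then trained on. The combination Classifier1 + Classifier2 + Classifier3 makes no training mistakes, but it may overfit.]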

Page 5

AdaBoost Algorithm

Page 6

AdaBoost Example: α_t = ln 2

Examples: (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5)

D0: 1/5, 1/5, 1/5, 1/5, 1/5

Sample (x5, y5), (x3, y3), (x1, y1) from D0 and train h1.

Update the weights using h1:

D1: 2/7, 1/7, 2/7, 1/7, 1/7

Sample (x3, y3), (x1, y1) from D1 and train h2.

Update the weights using h2:

D2: 2/9, 1/9, 4/9, 1/9, 1/9

Sample …
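The weights in this example can be reproduced with a short Python sketch. It rests on two assumptions that the transcript does not state explicitly: the update multiplies the weight of each misclassified example by e^{α_t} = 2 and then renormalizes, and h1 misclassifies x1 and x3 while h2 misclassifies only x3 (inferred from which weights grow). The function name `update_weights` is illustrative.

```python
from fractions import Fraction

def update_weights(weights, mistakes, factor=2):
    """Scale misclassified examples by `factor` (e^alpha = 2 here), then renormalize."""
    scaled = [w * factor if wrong else w for w, wrong in zip(weights, mistakes)]
    total = sum(scaled)
    return [w / total for w in scaled]

d0 = [Fraction(1, 5)] * 5
# Assumed: h1 is wrong on x1 and x3.
d1 = update_weights(d0, [True, False, True, False, False])
# Assumed: h2 is wrong on x3 only.
d2 = update_weights(d1, [False, False, True, False, False])

print([str(w) for w in d1])
print([str(w) for w in d2])
```

Exact fractions are used so the results can be compared directly with D1 and D2 on the slide.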

Page 7

How to Choose α_t in AdaBoost?

How to construct the best distribution D_{t+1}(i):
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly
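One way to make the second criterion concrete, sketched under assumptions: if the misclassified examples are scaled by (1 - ε_t)/ε_t, where ε_t is the weighted error of h_t under D_t, then h_t has weighted error exactly 1/2 under the new distribution, which is no better than random guessing. This corresponds to choosing α_t = ln((1 - ε_t)/ε_t) in an update that multiplies only the misclassified examples by e^{α_t}. The slide's own formula did not survive in the transcript, so this form and the names `reweight` and `weighted_error` are assumptions.

```python
from fractions import Fraction

def weighted_error(dist, mistakes):
    """Total weight of the examples the classifier gets wrong."""
    return sum(w for w, wrong in zip(dist, mistakes) if wrong)

def reweight(dist, mistakes):
    """Scale misclassified examples by (1 - eps) / eps and renormalize.

    Requires 0 < eps < 1, i.e. the classifier is neither perfect nor always wrong.
    """
    eps = weighted_error(dist, mistakes)
    factor = (1 - eps) / eps  # e^alpha with alpha = ln((1 - eps) / eps)
    scaled = [w * factor if wrong else w for w, wrong in zip(dist, mistakes)]
    total = sum(scaled)
    return [w / total for w in scaled]

d_t = [Fraction(1, 5)] * 5
mistakes = [True, False, True, False, False]  # hypothetical mistakes of h_t
d_next = reweight(d_t, mistakes)

print(weighted_error(d_t, mistakes), weighted_error(d_next, mistakes))
```

After the update the mass on the mistakes equals the mass on the correct examples, so the next classifier cannot do well by simply imitating h_t.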

Page 8

How to Choose α_t in AdaBoost?

Page 9

Optimization View for Choosing α_t

h_t(x): x → {1, -1}; a base (weak) classifier

H_T(x): a linear combination of base classifiers

Goal: minimize training error

Approximate the error with an exponential function
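The slide's formulas did not survive in the transcript. In the standard formulation, which is assumed here, the approximation reads as follows, with N training pairs (x_i, y_i) and labels y_i in {1, -1}:

```latex
H_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \qquad
\mathrm{err} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ y_i H_T(x_i) \le 0 \right]
\;\le\; \frac{1}{N} \sum_{i=1}^{N} \exp\left( -y_i H_T(x_i) \right)
```

The inequality holds term by term: when y_i H_T(x_i) ≤ 0 the exponential is at least 1, and otherwise it is still positive. Minimizing the smooth right-hand side therefore pushes down an upper bound on the training error.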

Page 10

AdaBoost: Greedy Optimization

Fix H_{T-1}(x), and solve for h_T(x) and α_t
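The greedy step can be sketched in plain Python. This is the common textbook form of AdaBoost, not necessarily the exact variant on the slides: each round keeps the earlier classifiers fixed, picks the threshold stump with the lowest weighted error ε, sets α = ½ ln((1 - ε)/ε), and reweights with exp(-α y h(x)). The stump learner, the function names, and the 1-D toy data are all assumptions made for illustration.

```python
import math

def stump_predict(x, thr, sign):
    """Threshold stump: returns `sign` when x > thr, otherwise -sign."""
    return sign if x > thr else -sign

def train_stump(xs, ys, weights):
    """Return (weighted_error, thr, sign) for the best stump under `weights`."""
    points = sorted(set(xs))
    thresholds = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for thr in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, thr, sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds):
    """Greedy boosting: earlier rounds stay fixed, each round adds one weighted stump."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Misclassified examples (y * h = -1) gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * stump_predict(x, thr, sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy data a single stump misclassifies three points, while the three-stump combination fits all ten, which mirrors the "no training mistakes" picture from the intuition slide.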

Page 11

Empirical Study of AdaBoost

AdaBoosting decision trees
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights from AdaBoost

In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging

Page 12

Bias-Variance Tradeoff for AdaBoost

• AdaBoost can reduce both variance and bias simultaneously

[Figure: bias and variance of a single decision tree, bagged decision trees, and AdaBoosted decision trees.]