
AdaBoost

Jiri Matas and Jan Šochman

Centre for Machine Perception, Czech Technical University, Prague

http://cmp.felk.cvut.cz


Presentation Outline:

• AdaBoost algorithm
  – Why is it of interest?
  – How does it work?
  – Why does it work?
• AdaBoost variants
• AdaBoost with a Totally Corrective Step (TCS)
• Experiments with a Totally Corrective Step


Introduction

• 1990 – Boost-by-majority algorithm (Freund)
• 1995 – AdaBoost (Freund & Schapire)
• 1997 – Generalized version of AdaBoost (Schapire & Singer)
• 2001 – AdaBoost in Face Detection (Viola & Jones)

Interesting properties:

• AB is a linear classifier with all its desirable properties.
• AB output converges to the logarithm of the likelihood ratio.
• AB has good generalization properties.
• AB is a feature selector with a principled strategy (minimisation of an upper bound on the empirical error).
• AB is close to sequential decision making (it produces a sequence of gradually more complex classifiers).


What is AdaBoost?

• AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination

   f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)

  of "simple" "weak" classifiers h_t(x).

Terminology

• h_t(x) ... "weak" or basis classifier, hypothesis, "feature"
• H(x) = sign(f(x)) ... "strong" or final classifier/hypothesis

Comments

• The h_t(x)'s can be thought of as features.
• Often (typically) the set H = {h(x)} is infinite.


(Discrete) AdaBoost Algorithm – Schapire & Singer (1997)

Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, 1}

Initialize weights D_1(i) = 1/m

For t = 1, ..., T:

1. Call WeakLearn, which returns the weak classifier h_t : X → {−1, 1} with minimum error w.r.t. distribution D_t;

2. Choose α_t ∈ R,

3. Update

   D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t}

   where Z_t is a normalization factor chosen so that D_{t+1} is a distribution

Output the strong classifier:

   H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)

Comments

• The computational complexity of selecting h_t is independent of t.
• All information about previously selected "features" is captured in D_t!
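The loop above maps almost line for line onto code. A minimal NumPy sketch of the discrete algorithm, assuming a helper weak_learn(X, y, D) that returns a ±1-valued classifier with small D-weighted error (function and variable names are illustrative, not the authors' implementation):

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Discrete AdaBoost. X: (m, d) inputs, y: (m,) labels in {-1, +1}.
    weak_learn(X, y, D) must return a callable h with h(X) in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
    hs, alphas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)             # weak classifier for distribution D_t
        pred = h(X)
        eps = np.sum(D * (pred != y))       # weighted error eps_t
        if eps >= 0.5:                      # prerequisite eps_t < 1/2
            break
        r = np.clip(np.sum(D * y * pred), -1 + 1e-12, 1 - 1e-12)  # r_t
        alpha = 0.5 * np.log((1 + r) / (1 - r))                   # optimal alpha_t
        D = D * np.exp(-alpha * y * pred)   # reweighting
        D /= D.sum()                        # divide by Z_t so D_{t+1} is a distribution
        hs.append(h)
        alphas.append(alpha)

    def H(X_query):                         # strong classifier: sign(sum_t alpha_t h_t(x))
        f = sum(a * h(X_query) for a, h in zip(alphas, hs))
        return np.sign(f)

    return H, hs, alphas
```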


WeakLearn

Loop step: Call WeakLearn, given distribution D_t; returns weak classifier h_t : X → {−1, 1} from H = {h(x)}

• Select the weak classifier with the smallest weighted error

   h_t = \arg\min_{h_j \in H} \varepsilon_j, \qquad \varepsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]

• Prerequisite: ε_t < 1/2 (otherwise stop)

• WeakLearn examples:
  – Decision tree builder, perceptron learning rule – H infinite
  – Selecting the best one from a given finite set H

Demonstration example

Training set: one class drawn from N(0, 1), the other from \frac{1}{r\sqrt{8\pi^3}}\, e^{-\frac{1}{2}(r-4)^2}. Weak classifier = perceptron.
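When H is a finite set, WeakLearn is just an exhaustive search for the member with the smallest D_t-weighted error. A sketch for axis-aligned decision stumps, the weak learner used in the experiments later in the talk (the demonstration above uses perceptrons instead); it matches the weak_learn signature assumed in the earlier sketch and is an illustration, not the authors' code:

```python
import numpy as np

def weak_learn_stump(X, y, D):
    """Return the decision stump h(x) = s * sign(x_j - theta) with the smallest
    weighted error  eps = sum_i D(i) [y_i != h(x_i)]."""
    m, d = X.shape
    best_eps, best = np.inf, None
    for j in range(d):                             # feature index
        for theta in np.unique(X[:, j]):           # candidate threshold
            raw = np.where(X[:, j] > theta, 1, -1)
            for s in (+1, -1):                     # polarity
                eps = np.sum(D * (s * raw != y))
                if eps < best_eps:
                    best_eps, best = eps, (j, theta, s)
    j, theta, s = best
    return lambda Xq: s * np.where(Xq[:, j] > theta, 1, -1)
```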


AdaBoost as a Minimiser of an Upper Bound on the Empirical Error

• The main objective is to minimize \varepsilon_{tr} = \frac{1}{m}\,|\{i : H(x_i) \neq y_i\}|

• It can be upper bounded by \varepsilon_{tr}(H) \leq \prod_{t=1}^{T} Z_t

How to set α_t?

• Select α_t to greedily minimize Z_t(α) in each step

• Z_t(α) is a convex differentiable function with one extremum

   ⇒ for h_t(x) ∈ {−1, 1} the optimal \alpha_t = \frac{1}{2}\log\frac{1+r_t}{1-r_t}, where r_t = \sum_{i=1}^{m} D_t(i)\,h_t(x_i)\,y_i

• Z_t = 2\sqrt{\varepsilon_t(1-\varepsilon_t)} \leq 1 for the optimal α_t

   ⇒ justification of selecting h_t according to ε_t

Comments

• The process of selecting α_t and h_t(x) can be interpreted as a single optimization step minimising the upper bound on the empirical error. Improvement of the bound is guaranteed, provided that ε_t < 1/2.

• The process can be interpreted as a component-wise local optimization (Gauss-Southwell iteration) in the (possibly infinite dimensional!) space of α = (α_1, α_2, . . .), starting from α_0 = (0, 0, . . .).
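For reference, the bound ε_tr ≤ ∏_t Z_t quoted above follows in two lines: unrolling the reweighting rule (next slide) gives the first equality, and the fact that [y_i f(x_i) ≤ 0] ≤ e^{−y_i f(x_i)} gives the inequality:

```latex
D_{T+1}(i) \;=\; \frac{\exp\!\big(-y_i \sum_{t=1}^{T}\alpha_t h_t(x_i)\big)}{m\,\prod_{t=1}^{T} Z_t}
          \;=\; \frac{e^{-y_i f(x_i)}}{m\,\prod_{t=1}^{T} Z_t},
\qquad\text{hence}\qquad
\varepsilon_{tr} \;=\; \frac{1}{m}\sum_{i=1}^{m} [\![\, y_i f(x_i) \le 0 \,]\!]
 \;\le\; \frac{1}{m}\sum_{i=1}^{m} e^{-y_i f(x_i)}
 \;=\; \sum_{i=1}^{m} D_{T+1}(i)\,\prod_{t=1}^{T} Z_t
 \;=\; \prod_{t=1}^{T} Z_t .
```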


Reweighting

Effect on the training set

Reweighting formula:

   D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t}
             = \frac{\exp\!\big(-y_i \sum_{q=1}^{t} \alpha_q h_q(x_i)\big)}{m \prod_{q=1}^{t} Z_q}

   \exp(-\alpha_t y_i h_t(x_i))
   \begin{cases} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \neq h_t(x_i) \end{cases}

⇒ Increase (decrease) the weight of wrongly (correctly) classified examples. The weight is the upper bound on the error of a given example!

[Figure: err as a function of y f(x).]

Effect on h_t

• α_t minimizes Z_t ⇒ \sum_{i: h_t(x_i) = y_i} D_{t+1}(i) = \sum_{i: h_t(x_i) \neq y_i} D_{t+1}(i)

• The error of h_t on D_{t+1} is 1/2

• The next weak classifier is the most "independent" one

[Figure: weighted error e of the selected weak classifiers at steps t = 1, ..., 4, shown against the 0.5 level.]
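The claim that h_t has error exactly 1/2 under D_{t+1} is easy to check numerically. A tiny sketch (the toy numbers are made up purely for illustration):

```python
import numpy as np

# toy step: 6 examples, h_t misclassifies the last two
y     = np.array([+1, +1, +1, -1, -1, -1])
pred  = np.array([+1, +1, +1, -1, +1, +1])       # h_t(x_i)
D     = np.full(6, 1 / 6)                        # D_t
eps   = np.sum(D * (pred != y))                  # 1/3
alpha = 0.5 * np.log((1 - eps) / eps)            # optimal alpha_t (same as the r_t form)

D_next = D * np.exp(-alpha * y * pred)           # reweighting
D_next /= D_next.sum()                           # normalise by Z_t

print(np.sum(D_next * (pred != y)))              # 0.5: error of h_t on D_{t+1}
```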


Summary of the Algorithm

Initialization ...

For t = 1, ..., T:

• Find h_t = \arg\min_{h_j \in H} \varepsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]

• If ε_t ≥ 1/2 then stop

• Set α_t = \frac{1}{2}\log\frac{1+r_t}{1-r_t}

• Update

   D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t}

Output the final classifier:

   H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)

[Figure: demonstration run for t = 1, ..., 7 and t = 40; training error (0–0.35) as a function of the number of weak classifiers (0–40).]


Does AdaBoost generalize?

Margins in SVM:

   \max \min_{(x,y)\in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_2}

Margins in AdaBoost:

   \max \min_{(x,y)\in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_1}

Maximizing margins in AdaBoost:

   P_S[y f(x) \leq \theta] \leq 2^T \prod_{t=1}^{T} \sqrt{\varepsilon_t^{1-\theta}(1-\varepsilon_t)^{1+\theta}} \quad \text{where} \quad f(x) = \frac{\vec{\alpha}\cdot\vec{h}(x)}{\|\vec{\alpha}\|_1}

Upper bounds based on the margin:

   P_D[y f(x) \leq 0] \leq P_S[y f(x) \leq \theta] + O\!\left(\frac{1}{\sqrt{m}}\left(\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)\right)^{1/2}\right)
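Concretely, the AdaBoost margin of a training example is y f(x) with f normalised by ‖α‖₁, so the empirical margin distribution P_S[y f(x) ≤ θ] can be read off directly from the weak-classifier outputs. A short sketch (the array names are assumptions for illustration):

```python
import numpy as np

def adaboost_margins(alphas, outputs, y):
    """Margins y * f(x) with f(x) = (alpha . h(x)) / ||alpha||_1.
    outputs: (T, m) array where outputs[t, i] = h_t(x_i) in {-1, +1}."""
    alphas = np.asarray(alphas, dtype=float)
    f = alphas @ outputs / np.abs(alphas).sum()   # normalised f(x_i) for every example
    return y * f

# empirical margin distribution P_S[y f(x) <= theta]:
# np.mean(adaboost_margins(alphas, outputs, y) <= theta)
```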


AdaBoost variants

Freund & Schapire 1995

• Discrete (h : X → {0, 1})
• Multiclass AdaBoost.M1 (h : X → {0, 1, ..., k})
• Multiclass AdaBoost.M2 (h : X → [0, 1]^k)
• Real-valued AdaBoost.R (Y = [0, 1], h : X → [0, 1])

Schapire & Singer 1997

• Confidence-rated prediction (h : X → R, two-class)
• Multilabel AdaBoost.MR, AdaBoost.MH (different formulation of the minimized loss)

... Many other modifications since then (Totally Corrective AB, Cascaded AB)


Pros and cons of AdaBoost

Advantages

• Very simple to implement
• Feature selection on very large sets of features
• Fairly good generalization

Disadvantages

• Suboptimal solution for α
• Can overfit in the presence of noise


AdaBoost with a Totally Corrective Step (TCA)

Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, 1}
Initialize weights D_1(i) = 1/m

For t = 1, ..., T :

1. Call WeakLearn, which returns the weak classifier h_t : X → {−1, 1} with minimum error w.r.t. distribution D_t;

2. Choose αt ∈ R,

3. Update Dt+1

4. Call WeakLearn on the set of h's with non-zero α's. Update α. Update D_{t+1}. Repeat until |ε_t − 1/2| < δ, ∀t.

Comments

• All weak classifiers have ε_t ≈ 1/2, therefore the classifier selected at t + 1 is "independent" of all classifiers selected so far.

• It can easily be shown that the totally corrective step reduces the upper bound on the empirical error without increasing classifier complexity.

• The TCA was first proposed by Kivinen and Warmuth, but their α_t is set as in standard AdaBoost.

• Generalization of the TCA is an open question.
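A minimal sketch of one way step 4 could be realised: cycle over the weak classifiers with non-zero α, re-optimise each coefficient against the distribution implied by the current strong classifier, and stop once every weighted error is within δ of 1/2. This is one reading of the slide, not the authors' code; the names are illustrative:

```python
import numpy as np

def totally_corrective_step(alphas, outputs, y, delta=1e-3, max_cycles=1000):
    """outputs: (T, m) array with outputs[t, i] = h_t(x_i) in {-1, +1}.
    Adjusts the alphas until |eps_t - 1/2| < delta for every selected h_t."""
    alphas = np.array(alphas, dtype=float)
    for _ in range(max_cycles):
        f = alphas @ outputs                       # current strong-classifier scores
        D = np.exp(-y * f)
        D /= D.sum()                               # distribution implied by the current f
        errs = (outputs != y) @ D                  # weighted error of every selected h_t
        worst = np.argmax(np.abs(errs - 0.5))
        if abs(errs[worst] - 0.5) < delta:         # |eps_t - 1/2| < delta for all t: done
            return alphas
        r = np.clip(D @ (y * outputs[worst]), -1 + 1e-12, 1 - 1e-12)
        alphas[worst] += 0.5 * np.log((1 + r) / (1 - r))   # coordinate (Gauss-Southwell) step
    return alphas
```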


Experiments with TCA on the IDA Database

• Discrete AdaBoost, Real AdaBoost, and Discrete and Real TCA evaluated
• Weak learner: stumps.

• Data from the IDA repository (Ratsch:2000):

  Dataset          Input dim.   Training patterns   Testing patterns   Realizations
  Banana                2            400                 4900              100
  Breast cancer         9            200                   77              100
  Diabetes              8            468                  300              100
  German               20            700                  300              100
  Heart                13            170                  100              100
  Image segment        18           1300                 1010               20
  Ringnorm             20            400                 7000              100
  Flare solar           9            666                  400              100
  Splice               60           1000                 2175               20
  Thyroid               5            140                   75              100
  Titanic               3            150                 2051              100
  Twonorm              20            400                 7000              100
  Waveform             21            400                 4600              100

• Note that the training sets are fairly small


Results with TCA on the IDA Database

• Training error (dashed line), test error (solid line)
• Discrete AdaBoost (blue), Real AdaBoost (green)
• Discrete AdaBoost with TCA (red), Real AdaBoost with TCA (cyan)
• The black horizontal line: the error of AdaBoost with RBF-network weak classifiers from (Ratsch-ML:2000)

[Figures: training and test error vs. the length of the strong classifier (log scale, 1 to 1000) for the IMAGE, FLARE, GERMAN, RINGNORM, SPLICE, THYROID, TITANIC, BANANA, BREAST, DIABETIS and HEART data sets.]

Conclusions

• The AdaBoost algorithm was presented and analysed
• A modification of the Totally Corrective AdaBoost was introduced
• Initial tests show that the TCA outperforms AB on some standard data sets.
