Ch7: Adaboost for building robust classifiers
KH Wong
Overview
• Objective of AdaBoost
• 2-class problems
– Training
– Detection
– Examples
Objective
• Automatically classify inputs into different categories of similar features
• Examples
– Spam mail detection and filtering
– Face detection: find the faces in the input image
– Vision-based gesture recognition [Chen 2007]
Different detection problems
• Two-class problems (will be discussed here)
– E.g. face detection: in a picture, are there any faces or no face?
• Multi-class problems (not discussed here)
– AdaBoost can be extended to handle multi-class problems, e.g. in a picture, are there any faces of men, women, or children? (Still an unsolved problem)
Define a 2-class classifier: its method and procedures
• Supervised training
– Show many positive samples (face) to the system
– Show many negative samples (non-face) to the system
– Learn the parameters and construct the final strong classifier
• Detection
– Given an unknown input image, the system can tell if there are faces or not
We will learn
• Training procedures
– Give +ve and -ve examples to the system; the system will then learn to classify an unknown input
– E.g. give pictures of faces (+ve examples) and non-faces (-ve examples) to train the system
• Detection procedures
– Input an unknown sample (e.g. an image); the system will tell you whether it is a face or not
(Figure: example face and non-face images)
A linear decision-line example
• A line y = mx + c, with m = 3, c = 2
• When x=1, y = 3*1+2 = 5. So at x=1, a point with y>5 is above the line, and with y<5 it is below the line.
• When x=0, y = 3*0+2 = 2. So at x=0, a point with y>2 is above the line, and with y<2 it is below the line.
• When x=-1, y = 3*(-1)+2 = -1. So at x=-1, a point with y>-1 is above the line, and with y<-1 it is below the line.
• Conclusion:
– if a point (x,y) is above and on the left of the line y = mx + c, then y - mx > c
– if a point (x,y) is below and on the right side of the line y = mx + c, then y - mx < c
(Figure: the line y = mx + c through (0,2), (1,5) and (-1,-1); the region y - mx > c lies above-left, the region y - mx < c below-right)
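To make the rule concrete, here is a minimal MATLAB sketch that tests which side of the line a point lies on (the test points are illustrative, not from the slide):

  % Which side of the line y = m*x + c is a point on? (sketch)
  m = 3; c = 2;
  pts = [1 6; 0 0; -1 -1];          % rows of (x, y) test points
  for k = 1:size(pts, 1)
      x = pts(k, 1); y = pts(k, 2);
      if y - m*x > c
          fprintf('(%g,%g) is above-left of the line\n', x, y);
      elseif y - m*x < c
          fprintf('(%g,%g) is below-right of the line\n', x, y);
      else
          fprintf('(%g,%g) is on the line\n', x, y);   % e.g. (-1,-1)
      end
  end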
First, let us learn what a weak classifier h( ) is
• Consider the decision line v = mu + c (i.e. v - mu = c) in [u,v] space; m and c are used to define the line.
• Any point in the gray area satisfies v - mu < c; any point in the white area satisfies v - mu > c.
• Case 1: if a point x = [u,v] is in the "gray" area, then h(x) = 1, otherwise -1. It can be written as:
h(x) = 1 if (v - mu) < c, -1 otherwise, where m, c are given constants.
• Case 2: if a point x = [u,v] is in the "white" area, then h(x) = 1, otherwise -1. It can be written as:
h(x) = 1 if -(v - mu) < -c, -1 otherwise.
• At time t, combine case 1 and case 2 into one equation, and use the polarity p_t to control which case you want to use:
h_t(x) = 1 if p_t f_t(x) < p_t θ_t, -1 otherwise,
where polarity p_t ∈ {+1, -1}; f is the function f(x = [u,v]) = v - mu and the threshold is θ_t = c; m, c are constants and u, v are variables. So h_t(x) becomes:
h_t(x) = 1 if p_t (v - mu) < p_t c, -1 otherwise ... equation (i)
(Figure: a line of gradient m through intercept c in the (u,v) plane; v - mu < c on one side, v - mu > c on the other)
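Equation (i) reduces to one line of MATLAB (a sketch; the handle name weak_h is mine):

  % Equation (i) as an anonymous function (sketch).
  % 2*(condition)-1 maps logical {0,1} to the labels {-1,+1}.
  weak_h = @(u, v, m, c, p) 2*(p*(v - m*u) < p*c) - 1;
  weak_h(0, 0, 1, 2, +1)   % returns +1: (0,0) has v - mu = 0 < c = 2
  weak_h(0, 0, 1, 2, -1)   % returns -1: reversed polarity flips the label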
The weak classifier (a summary)
• By definition, a weak classifier should be slightly better than a random choice (probability of correct classification = 0.5). Otherwise you should use a dice!
• In [u,v] space, the decision function f: (v - mu) = c is a straight line defined by m and c.
(Figure: the straight line v = mu + c, i.e. v - mu = c, with gradient m; v - mu < c on one side of it, v - mu > c on the other)
Example A
• Find the equation of the line v = mu + c.
– Answer: c = 2, m = (6-2)/10 = 0.4, so v = 0.4u + 2
• Assume polarity p_t = 1; classify P1, P2, P3, P4.
• P1 (u=5, v=9):
– Answer: v - mu = 9 - 0.4*5 = 7; since c = 2, v - mu > c, so it is class -1
• P2 (u=9, v=4):
– Answer: v - mu = 4 - 0.4*9 = 0.4; since c = 2, v - mu < c, so it is class +1
• P3 (u=6, v=3): • P4 (u=2, v=3): (answers on the next slide)
• Repeat using p_t = -1
Recall equation (i): h_t(x) = 1 if p_t f(x) < p_t θ_t, -1 otherwise; here h_t(x) = 1 if p_t (v - mu) < p_t c, -1 otherwise ... (ib)
Assuming polarity p_t is 1: class +1 when v - mu < c; class -1 when v - mu > c. The line is v = mu + c, i.e. v - mu = c.
Answer for Example A
• P3 (u=6, v=3): v - mu = 3 - 0.4*6 = 0.6; since c = 2, v - mu < c, so it is class +1
• P4 (u=2, v=3): v - mu = 3 - 0.4*2 = 2.2; since c = 2, v - mu > c, so it is class -1
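The four classifications of Example A can be checked numerically (a MATLAB sketch, reusing the weak_h handle defined earlier):

  % Verify Example A (sketch).
  weak_h = @(u, v, m, c, p) 2*(p*(v - m*u) < p*c) - 1;
  m = 0.4; c = 2;                 % the line v = 0.4*u + 2
  P = [5 9; 9 4; 6 3; 2 3];       % P1..P4 from the slide
  for k = 1:4
      fprintf('P%d: class %+d (p=+1), class %+d (p=-1)\n', k, ...
              weak_h(P(k,1), P(k,2), m, c, +1), ...
              weak_h(P(k,1), P(k,2), m, c, -1));
  end
  % Expected with p=+1: P1 -> -1, P2 -> +1, P3 -> +1, P4 -> -1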
Learn what h( ), a weak classifier, is: the decision stump
• Decision stump definition: a decision stump is a machine learning model consisting of a one-level decision tree.[1] That is, it is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes. A decision stump makes a prediction based on the value of just a single input feature. Sometimes they are also called 1-rules.[2]
• From http://en.wikipedia.org/wiki/Decision_stump
• Example: temperature T
– T <= 10°C: cold
– 10°C < T < 28°C: mild
– T >= 28°C: hot
A weak learner (classifier) is a decision stump
• Define weak learners based on rectangle features:
h_t(x) = 1 if p_t f_t(x) < p_t θ_t, -1 otherwise
– f_t(x) is the function of a decision line in space
– p_t = polarity ∈ {+1, -1}: selects which side separated by the line you prefer
– θ_t = threshold
(Figure: a decision line splitting the plane into a +1 region and a -1 region)
Weak classifier we use here: the axis-parallel weak classifier
• In order to simplify the derivation, we will use the simple "axis-parallel weak classifier".
• It assumes the gradient (m) of the decision line is 0 (horizontal) or infinite (vertical).
• The decision line is parallel to either the horizontal or the vertical axis.
• For a horizontal line, use f(x = [u,v]) = v with threshold v0:
h_t(x = [u,v]) = 1 if p_t v < p_t v0, -1 otherwise
(Figure: a horizontal decision line at v0. If polarity p_t = 1, the region with v < v0 is +1 and the region with v > v0 is -1; if p_t = -1, the two regions are swapped.)
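In MATLAB both axis-parallel variants reduce to one-liners (a sketch; the handle names are mine):

  % Axis-parallel weak classifiers (sketch): h = +1 if p*f(x) < p*threshold.
  h_horiz = @(v, v0, p) 2*(p*v < p*v0) - 1;   % horizontal line at v = v0, f(x) = v
  h_vert  = @(u, u0, p) 2*(p*u < p*u0) - 1;   % vertical line at u = u0,  f(x) = u
  h_horiz(3, 5, +1)    % +1: v = 3 is below the threshold v0 = 5
  h_horiz(3, 5, -1)    % -1: reversed polarity flips the label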
An example to show how AdaBoost works
• Training: present ten samples to the system: [xi = {ui, vi}, yi = {'+' or '-'}]
– 5 +ve (blue, diamond) samples
– 5 -ve (red, circle) samples
– Train up the system
• Detection:
– Give an input xj = (1.5, 3.4); the system will tell you it is '+' or '-', e.g. face or non-face
• Example: u = weight, v = height; classification: suitability to play boxing
(Figure: the ten training samples in the (u,v) plane; two of the '+' samples are at [xi={-0.48,0}, yi='+'] and [xi={-0.2,-0.5}, yi='+'])
AdaBoost concept
• Using this training data, how can we make a classifier?
– Training data: 6 squares, 5 circles
– Objective: train a classifier to classify an unknown input to see if it is a circle or a square
• Only one axis-parallel weak classifier cannot achieve 100% classification. E.g. h1( ), h2( ) or h3( ) alone will fail: no matter how you place the decision line (horizontally or vertically) you cannot get a 100% classification result. You may try it yourself!
• A strong classifier with a complex decision boundary H_complex( ) should work, but how can we find it?
– ANSWER: combine many weak classifiers to achieve it. Each classifier may not be perfect, but each can achieve over a 50% correct rate.
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
(Figure: the weak classifiers h_{t=1}( ), h_{t=2}( ), ..., h_{t=7}( ), each with a weight α_t for t = 1, 2, ..., 7, combine to form the final strong classifier, which outputs the classification result)
THE ADABOOST ALGORITHM

The AdaBoost algorithm (v.4a, for reference only):

Given: (x_1, y_1), ..., (x_n, y_n), where x_i ∈ X and y_i ∈ Y = {-1, +1}.

Initialization:
Initialize the distribution (weight) D_1(i) = 1/n, such that n = M + L, where M = number of positive (+1) examples and L = number of negative (-1) examples.

Main training loop. For t = 1, ..., T (each processing time step):
• Step 1a: Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to the distribution D_t; that means h_t = argmin_{h_j} ε_j.
• Step 1b: error ε_t = Σ_{i=1}^{n} D_t(i) * I(y_i ≠ h_t(x_i)), where I = 1 if h_t(x_i) ≠ y_i (classified incorrectly) and 0 otherwise.
Checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
• Step 2: α_t = (1/2) ln[(1 - ε_t) / ε_t], the weight (or confidence value).
• Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t.
• Step 4: CE_t = total error of all training samples x_{i=1,...,N} using the strong classifier up to stage t, H_t(x) = sign(Σ_{τ=1}^{t} α_τ h_τ(x)). If CE_t = 0 (or small enough), then break (done).

The final strong classifier:
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

Note: Z_t is a normalization factor, so D_{t+1} becomes a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{correctly classified i} D_t(i) e^{-α_t} + Σ_{incorrectly classified i} D_t(i) e^{α_t}
since y_i h_t(x_i) = 1 when correctly classified and y_i h_t(x_i) = -1 when incorrectly classified.
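The whole loop fits in a short MATLAB sketch using axis-parallel stumps. This is illustrative code under my own naming, not the author's demo (see [Kroon 2010] for a full implementation); it uses the Example C data that appears later in the chapter:

  % AdaBoost training sketch with axis-parallel decision stumps (illustrative).
  X = [-26 38;  3 34;  32 3;   42 10;      % blue (*) samples, class -1
        23 38; -4 -33; -22 -25; -37 -31];  % red (O) samples, class +1
  y = [-1 -1 -1 -1  1  1  1  1]';
  n = size(X, 1);
  D = ones(n, 1) / n;                      % initialization: D_1(i) = 1/n
  T = 3;
  alpha = zeros(T, 1); stump = zeros(T, 3);  % each row: [dimension threshold polarity]
  for t = 1:T
      % Step 1a: exhaustive search over all axis-parallel stumps
      best_err = inf;
      for dim = 1:2
          for thr = X(:, dim)'             % candidate thresholds at sample values
              for p = [-1 1]
                  pred = 2*(p*X(:, dim) < p*thr) - 1;
                  err  = sum(D .* (pred ~= y));
                  if err < best_err
                      best_err = err; stump(t, :) = [dim thr p]; best_pred = pred;
                  end
              end
          end
      end
      if best_err >= 0.5, break; end       % Step 1b: prerequisite check
      alpha(t) = 0.5 * log((1 - best_err) / best_err);   % Step 2
      D = D .* exp(-alpha(t) * y .* best_pred);          % Step 3 ...
      D = D / sum(D);                      % ... divide by Z_t to renormalize
      % Step 4: error of the strong classifier built so far
      F = zeros(n, 1);
      for k = 1:t
          F = F + alpha(k) * (2*(stump(k,3)*X(:,stump(k,1)) < stump(k,3)*stump(k,2)) - 1);
      end
      CE = mean(sign(F) ~= y);
      if CE == 0, break; end               % all samples correct: done
  end

On this data the first stage finds a stump with weighted error 0.125 and α_1 = 0.973, matching the Answer C slides later in the chapter.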
Initialization
• Given: (x_1, y_1), ..., (x_n, y_n), where x_i ∈ X, y_i ∈ Y = {-1, +1}
– X = [u, v] is the location of the training sample
– Y is the class, either +1 or -1 here
• Initialize the distribution (weight) D_1(i) = 1/n, such that n = M + L
– M = number of positive (+1) examples; L = number of negative (-1) examples

D_t(i) = weight
• D_t(i) = probability distribution of the i-th training sample at time t, i = 1, 2, ..., n.
• It shows how much you trust this sample.
• At t = 1, all samples have the same, equal weight: D_{t=1}(i) is the same for all i.
• At t > 1, D_t(i) will be modified, as we will see later.
Main loop (steps 1, 2, 3)
t is the index for the weak classifier at stage t; T is the number of weak classifiers needed.
For t = 1, ..., T:
• Step 1a: Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t; that means h_t = argmin_{h_j} ε_j.
• Step 1b: error ε_t = Σ_{i=1}^{n} D_t(i) * I(y_i ≠ h_t(x_i)), where I = 1 if h_t(x_i) ≠ y_i (classified incorrectly) and 0 otherwise.
Checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
• Step 2: α_t = (1/2) ln[(1 - ε_t) / ε_t], the weight (or confidence value).
• Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t (see the next slide for an explanation).
Z_t = normalization factor; it will be discussed later.
Main loop, Step 4 (updated)
• Step 4: CE_t = total error of all training samples x_{i=1,...,N} using the strong classifier up to stage t, H_t(x) = sign(Σ_{τ=1}^{t} α_τ h_τ(x)). If CE_t = 0 (or small enough), then break (done).
• Step 4 in English: if all training samples x_{i=1,...,N} are correctly classified by H( ), then stop the training.
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
Note: to find the normalization factor Z_t in Step 3
Recall Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t,
where Z_t is the normalization factor, so D_{t+1} becomes a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{correctly classified i} D_t(i) e^{-α_t} + Σ_{incorrectly classified i} D_t(i) e^{α_t}
Note: y_i h_t(x_i) = 1 when correctly classified; y_i h_t(x_i) = -1 when incorrectly classified.
AdaBoost chooses this weight update function, D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)), deliberately. The idea is:
• when a training sample is correctly classified, its weight decreases
• when a training sample is incorrectly classified, its weight increases
An example to show how AdaBoost works
• Training: present ten samples to the system: [xi = {ui, vi}, yi = {'+' or '-'}]
– 5 +ve (blue, diamond) samples
– 5 -ve (red, circle) samples
– Train up the classification system
• Detection example:
– Give an input xj = (1.5, 3.4); the system will tell you it is '+' or '-', e.g. face or non-face
• Example: you may treat u = skills, v = height; classification task: suitability to play in the basketball team
(Figure: the ten training samples in the (u,v) plane; two of the '+' samples are at [xi={-0.48,0}, yi='+'] and [xi={-0.2,-0.5}, yi='+'])
Initialization
• M = 5 +ve (blue, diamond) samples
• L = 5 -ve (red, circle) samples
• n = M + L = 10 (usually make M ≈ L)
• Initialize the weight D_{t=1}(i) = 1/10 for all i = 1, 2, ..., 10
– So D_1(1) = 0.1, D_1(2) = 0.1, ..., D_1(10) = 0.1
Recall: given (x_1, y_1), ..., (x_n, y_n) where x_i ∈ X, y_i ∈ {-1, +1}, initialize D_1(i) = 1/n such that n = M + L (M = number of positive examples, L = number of negative examples).
Main training loop

Step 1a, 1b
Select h( ): for simplicity of implementation we use the axis-parallel weak classifier.
Recall: h_t(x) = 1 if p_t f_t(x) < p_t θ_t, -1 otherwise,
where polarity p_t ∈ {+1, -1} and θ_t is the threshold; f is the function f(x = (u,v)) = v - mu, with m, c constants and u, v variables.
Axis-parallel weak classifiers:
• h_a(x): f = v is a line of gradient m = 0 (horizontal line); the position of the line can be controlled by v0
• h_b(x): f = u is a line of gradient m = ∞ (vertical line); the position of the line can be controlled by u0
(Figure: a horizontal classifier h_a(x) at v0 and a vertical classifier h_b(x) at u0)
Step 1a, 1b
• Assume h( ) can only be a horizontal or vertical separator (axis-parallel weak classifier).
• There are still many ways to set h( ); here, if this h_q( ) is selected, there will be 3 incorrectly classified training samples.
• See the 3 circled training samples.
• We can go through all h( )s and select the best one with the least misclassification (see the following 2 slides).
Step 1a: find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t; that means h_t = argmin_{h_j} ε_j.
Step 1b (checking step, prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
(Figure: a candidate classifier h_q( ) with the 3 incorrectly classified samples circled)
Example: training example slides from [Smyth 2007]; classify the ten red (circle)/blue (diamond) dots. Step 1a:
h_i(x) = 1 if p_i u < p_i u_i, -1 otherwise
(v of x = (u,v) is not used because h_i(x) is parallel to the vertical axis; polarity p_i ∈ {+1, -1})
Initialize: D_{t=1}(i) = 1/10.
• You may choose one of the following axis-parallel (vertical line) classifiers; the vertical dotted lines are possible choices:
h_{i=1}(x), ..., h_{i=4}(x), ..., h_{i=9}(x) at positions u1, u2, ..., u9 on the u-axis
• There are 9x2 choices here: h_{i=1,2,...,9} (polarity +1) and h'_{i=1,2,...,9} (polarity -1)
Example: training example slides from [Smyth 2007]; classify the ten red (circle)/blue (diamond) dots. Step 1a:
h_j(x) = 1 if p_j v < p_j v_j, -1 otherwise
(u of x = (u,v) is not used because h_j(x) is parallel to the horizontal axis; polarity p_j ∈ {+1, -1})
Initialize: D_{t=1}(i) = 1/10.
• You may choose one of the following axis-parallel (horizontal line) classifiers; the horizontal dotted lines are possible choices:
h_{j=1}(x), h_{j=2}(x), ..., h_{j=9}(x) at positions v1, v2, ..., v9 on the v-axis
• There are 9x2 choices here: h_{j=1,2,...,9} (polarity +1) and h'_{j=1,2,...,9} (polarity -1)
• All together, including the previous slide, there are 36 choices.
Step 1b: find and check the error of the weak classifier h( )
• To evaluate how successful your selected weak classifier h( ) is, we can evaluate its error rate.
• For axis-parallel weak classifiers, if you have N (+ve plus -ve) training samples, you will have (N-1)x4 classifier choices. (Prove it!)
• ε_t = misclassification probability of h( ).
• Checking: if ε_t >= 0.5 (something is wrong), stop the training.
– Because, by definition, a weak classifier should be slightly better than a random choice (probability = 0.5).
– So if ε_t >= 0.5, your h( ) is a bad choice; redesign another h''( ) and do the training based on the new h''( ).
Step 1b: error ε_t = Σ_{i=1}^{n} D_t(i) * I(y_i ≠ h_t(x_i)), where I = 1 if h_t(x_i) ≠ y_i (incorrectly classified) and 0 otherwise. Checking step (prerequisite): ε_t < 0.5, otherwise stop.
Example B for Step 1a, 1b
• Assume h( ) can only be a horizontal or vertical separator.
• How many different classifiers are available?
• If the h_j( ) shown is selected, circle the misclassified training samples. Find ε( ) to see the misclassification probability if the probability distribution (D) for each sample is the same.
• Find the h( ) with minimum error.
Step 1a: find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t. Step 1b (checking step, prerequisite): ε_t < 0.5, otherwise stop.
(Figure: the candidate h_j( ); below the line are squares, above are circles)
Answer: Example B for Step 1a, 1b
• Assume h( ) can only be a horizontal or vertical separator.
• How many different classifiers are available?
– Answer: because there are 12 training samples, we have 11x2 vertical + 11x2 horizontal classifiers, so the total is (11x2 + 11x2) = 44. (updated)
• If the h_j( ) shown is selected, circle the misclassified training samples. Find ε( ) to see the misclassification probability if the probability distribution (D) for each sample is the same.
– Answer: each sample has weight (1/12); there are 4 misclassified (circled) samples, so ε = 4*(1/12).
• Find the h( ) with minimum error. Answer:
– Repeat the above and find ε_j( ) for each of the h_{j=1,...,44}( ), compare the ε_j( ), and find the smallest ε_j( ). That indicates the best h_j( ).
(Figure: the candidate h_j( ); below the line are squares, above are circles)
Weak classifiers required
• D = dimension of the problem
• N = number of total training samples
• M = number of weak classifiers required = D x 2 x (N-1)
• Explain why.
• If the problem is three-dimensional and the number of total training samples (positive +ve plus negative -ve samples) is 20, calculate M.
• Answer: M = 3 x 2 x (20-1) = 114. Explain the answer.
Result of Step 2 at t=1
(Figure: the chosen weak classifier h_{t=1}(x); 3 samples are incorrectly classified by h_{t=1}(x))
Step 2 at t=1 (refer to the previous slide)
• Using ε_{t=1} = 0.3, because 3 samples are incorrectly classified:
ε_{t=1} = 0.1 + 0.1 + 0.1 = 0.3
• Step 2: α_t = (1/2) ln[(1 - ε_t)/ε_t], where ε_t is the weighted error rate of classifier h_t.
So α_{t=1} = (1/2) ln[(1 - 0.3)/0.3] = 0.424 (confidence value).
The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
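Both numbers can be checked in one MATLAB line each (a sketch):

  % Check the slide's numbers: three samples of weight 0.1 are wrong.
  eps1   = 0.1 + 0.1 + 0.1                 % weighted error = 0.3
  alpha1 = 0.5 * log((1 - eps1) / eps1)    % = 0.4236..., the 0.424 above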
Step 3 at t=1: update D_t to D_{t+1}
• Update the weight D_t(i) for each training sample i:
Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t,
where Z_t is the normalization factor, so D_{t+1} is a distribution (probability function).
Note: y_i h_t(x_i) = 1 if the classification is correct; y_i h_t(x_i) = -1 if the classification is incorrect.
The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
Step 3: first find Z (the normalization factor). Note that D_{t=1} = 0.1 and α_{t=1} = 0.424.
Z_t = correct_weight + incorrect_weight = Σ_{y_i = h(x_i)} D_t(i) e^{-α_t y_i h(x_i)} + Σ_{y_i ≠ h(x_i)} D_t(i) e^{-α_t y_i h(x_i)}
• correctly classified: y_i = h(x_i), so y_i h(x_i) = 1; put it in e^{-α_t y_i h(x_i)} to get e^{-α_t}
• incorrectly classified: y_i ≠ h(x_i), so y_i h(x_i) = -1; put it in e^{-α_t y_i h(x_i)} to get e^{α_t}
Z_{t=1} = (total correct weight) e^{-α} + (total incorrect weight) e^{α}
= 0.1 * 7 * e^{-0.424} + 0.1 * 3 * e^{0.424}
≈ 0.1 * 7 * 0.65 + 0.1 * 3 * 1.52
= 0.455 + 0.456 ≈ 0.911
Note: currently t = 1 and D_{t=1}(i) = 0.1 for all i; 7 samples are correctly classified and 3 are incorrectly classified.
Step 3 example: update D_t to D_{t+1}. If correctly classified, the weight D_{t+1} will decrease, and vice versa.
D_{t+1}(i) = D_t(i) e^{-α_t y_i h_t(x_i)} / Z_t, with Z_{t=1} ≈ 0.911 and α_{t=1} = 0.424:
• decrease (correctly classified): D_{t+1}(i) = 0.1 * e^{-0.424} / 0.911 ≈ 0.1 * 0.65 / 0.911 ≈ 0.0714
• increase (incorrectly classified): D_{t+1}(i) = 0.1 * e^{0.424} / 0.911 ≈ 0.1 * 1.52 / 0.911 ≈ 0.167
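A short MATLAB sketch reproduces Z_{t=1} and both updated weights (exact exponentials give Z ≈ 0.917; the slide's 0.911 comes from the rounded values 0.65 and 1.52):

  % Reproduce Z_{t=1} and the updated weights (sketch).
  D = 0.1; a = 0.424;
  Z = 7*D*exp(-a) + 3*D*exp(a)    % ~0.917 (7 correct, 3 incorrect samples)
  D_correct   = D*exp(-a) / Z     % ~0.0714: weight of each correct sample
  D_incorrect = D*exp(a)  / Z     % ~0.167:  weight of each incorrect sample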
Now run the main training loop the second time (t=2)
• The updated weights become the distribution at t=2:
D_{t=2}(i) ≈ 0.0714 for the correctly classified samples, and D_{t=2}(i) ≈ 0.167 for the incorrectly classified samples.
Now run the main training loop a second time (t=2), and then t=3.
Final classifier by combining three weak classifiers
Combined classifier for t = 1, 2, 3. Exercise: work out α_2 and α_3.
H(x) = sign( 0.424*h_1(x) + α_2 h_2(x) + α_3 h_3(x) )
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
(Figure: the weak classifiers h_{t=1}( ), h_{t=2}( ), h_{t=3}( ), with weights α_{t=1} = 0.424, α_{t=2}, α_{t=3}, combine to form the classifier. One more step may be needed for the final classifier.)
Example C
if example == 1
  blue = [ -26 38;  3 34;  32 3;   42 10 ];     % blue, plotted as *
  red  = [  23 38; -4 -33; -22 -25; -37 -31 ];  % red, plotted as O
  datafeatures = [blue; red];
  dataclass = [ -1 -1 -1 -1 1 1 1 1 ];
Answer C: initialization, t=1. Find the best h( ) by inspection.
• What is D(i) for all i = 1 to 8? (By initialization, D_1(i) = 1/8 = 0.125 for all i.)
Answer C, t=1: h1 (upper half = *, lower = o)
• Weak classifier h1 (upper half = *, lower = o): we see that feature (5) is wrongly classified; 1 sample is wrong.
• err = ε(t) = D(t)*1 = 0.125
• α = 0.5*ln[(1 - ε(t))/ε(t)] = 0.973
• Find the next D(t+1) = D(t)*exp(-α y h(x)):
– incorrect: D_{t+1}(i) = D_t(i)*exp(α); D(5) = 0.125*exp(0.973) = 0.3307 (not normalized yet)
– correct: D_{t+1}(i) = D_t(i)*exp(-α); D(1) = 0.125*exp(-0.973) = 0.0472 (not normalized yet)
• Z = 7*0.0472 + 0.3307 = 0.6611
• After normalization, D at t+1:
– D(5) = 0.3307 / Z = 0.5002
– D(1) = D(2) = ... = 0.0472 / Z = 0.0714
(Recall Step 1b and Step 2: ε_t = Σ_i D_t(i) I(y_i ≠ h_t(x_i)), prerequisite ε_t < 0.5; α_t = (1/2) ln[(1 - ε_t)/ε_t].)
(Figure: the chosen weak classifier h1( ))
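The stage t=1 numbers can be reproduced with a few MATLAB lines (a sketch):

  % Reproduce Answer C at t=1: sample 5 is the only mistake of h1.
  D = ones(8, 1) / 8;                  % D(i) = 0.125 for all 8 samples
  err   = D(5);                        % = 0.125
  alpha = 0.5 * log((1 - err) / err)   % = 0.973
  Dnew  = D * exp(-alpha);             % correctly classified: weight shrinks
  Dnew(5) = D(5) * exp(alpha);         % misclassified sample 5: weight grows
  Z = sum(Dnew)                        % = 0.6611
  Dnew = Dnew / Z;                     % D(5) -> 0.5002, all others -> 0.0714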
Answer C, result at t=1
##display result t_step=1 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
>i=1, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=2, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=3, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=4, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=5, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=1, CE_=1
>i=6, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
>i=7, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
>i=8, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
>weak classifier specifications:
-dimension: 1=vertical :direction:1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
-dimension: 2=horizontal:direction:1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
>#-new weak classifier at stage(1): dimension=2, threshold=-25.00, direction=-1
>Cascaded classifier error up to stage(t=1) for (N=8 training samples) = [sum(CE_)/N] = 0.125
Use Step 4 of the AdaBoost algorithm to find CE_t.
Answer C, t=2
• Weak classifier h2 (left = o, right = *): features (1), (2) are wrongly classified; 2 samples are wrong.
• err = ε(t) = D_t(1) + D_t(2) = 0.0714 + 0.0714 = 0.1428
• α = 0.5*ln[(1 - ε(t))/ε(t)] = 0.8961
• Find the next D(t+1) = D(t)*exp(-α y h(x)), i.e.:
– incorrect: D_{t+1}(i) = D_t(i)*exp(α); D(1) = D(2) = 0.0714*exp(0.8961) = 0.1749 (not normalized yet)
– correct: D_{t+1}(i) = D_t(i)*exp(-α); D(3) = D(4) = D(6) = D(7) = D(8) = 0.0714*exp(-0.8961) = 0.029,
but D(5) = 0.5*exp(-0.8961) = 0.2041
• Z = 2*0.1749 + 5*0.029 + 0.2041 = 0.6989
• After normalization, D at t+1:
– D(1) = D(2) = 0.1749 / 0.6989 = 0.2503
– D(5) = 0.2041 / 0.6989 = 0.292
– D(8) = 0.029 / 0.6989 = 0.0415 (same for samples (3), (4), (6), (7))
(Recall Step 1b and Step 2 as before.)
Answer C, result at t=2
##display result t_step=2 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
>i=1, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=-1, CE_=0
>i=2, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=-1, CE_=0
>i=3, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, O_=-1.869, S_=-1.000, Y_=-1, CE_=0
>i=4, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, O_=-1.869, S_=-1.000, Y_=-1, CE_=0
>i=5, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=1, CE_=1
>i=6, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
>i=7, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
>i=8, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
>weak classifier specifications:
-dimension: 1=vertical :direction:1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
-dimension: 2=horizontal:direction:1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
>#-new weak classifier at stage(2): dimension=1, threshold=23.00, direction=-1
>Cascaded classifier error up to stage(t=2) for (N=8 training samples) = [sum(CE_)/N] = 0.125
Use Step 4 of the AdaBoost algorithm to find CE_t.
Answer C, t=3
• Worked in the same way as t=1 and t=2. Use Step 4 of the AdaBoost algorithm to find CE_t.

Answer C, result at t=3
##display result t_step=3 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
>i=1, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=-0.745, S_=-1.000, Y_=-1, CE_=0
>i=2, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=-0.745, S_=-1.000, Y_=-1, CE_=0
>i=3, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, a3*h3(xi)=0.668, O_=-1.201, S_=-1.000, Y_=-1, CE_=0
>i=4, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, a3*h3(xi)=0.668, O_=-1.201, S_=-1.000, Y_=-1, CE_=0
>i=5, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=0.668, O_=0.590, S_=1.000, Y_=1, CE_=0
>i=6, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
>i=7, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
>i=8, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
>weak classifier specifications:
-dimension: 1=vertical :direction:1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
-dimension: 2=horizontal:direction:1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
>#-new weak classifier at stage(3): dimension=1, threshold=3.00, direction=1
>Cascaded classifier error up to stage(t=3) for (N=8 training samples) = [sum(CE_)/N] = 0.000
Answer C: the strong classifier
(Figure: the strong classifier formed by combining h1, h2 and h3)
Answer C: test result, example 5.1
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
Use Step 4 of the AdaBoost algorithm to find CE_t.
Class exercise 7.1
if example == 2
  blue = [ -46 18; -30 -30; -31 -19; -8 15;  8 -45; -22 2 ];   % blue = * (star)
  red  = [  33 38;  30 10;   21 35;   1 19; 14 23;  37 -41 ];  % red = o (circle)
  datafeatures = [blue; red];
  dataclass = [ -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 ];

Class exercise 7.1, t=0
(Figure: the training samples; blue = * (star), red = o (circle))
Summary
• We learned:
– what a classifier is
– the classification technique AdaBoost
Appendix
Theory
• We first make up a measurement function called the "exponential loss function" to measure the strength of a strong classifier.
– Exponential loss function L(H) = a measurement of the misclassification rate of a strong classifier H.
• For a strong classifier H(x) = sign( Σ_{k=1}^{T} α_k h_k(x) ), the exponential loss function L(H) is defined as:
L(H) = (1/n) Σ_{i=1}^{n} e^{-y_i H(x_i)}
• y_i H(x_i) = +1 (correctly classified)
• y_i H(x_i) = -1 (incorrectly classified)
• A good strong classifier should have a low L(H).
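As a quick numeric illustration (a MATLAB sketch; the margin vectors are made up):

  % Exponential loss of a strong classifier (sketch).
  % yH is the vector of y_i * H(x_i) values (+1 correct, -1 incorrect).
  L = @(yH) mean(exp(-yH));
  L([ 1  1  1  1])    % all correct: low loss, e^-1 = 0.3679
  L([ 1  1  1 -1])    % one mistake raises the loss to 0.9555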
Theory: by definition, the weight update rule is chosen to achieve adaptive boosting
• AdaBoost chooses this weight update function deliberately:
D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i))
• Because:
– when a training sample is correctly classified, its weight decreases
– when a training sample is incorrectly classified, its weight increases
• Some other systems may use different weight update formulas but with the same spirit (correctly classified samples will result in decreased weight, and vice versa).
Theory: part 1a
Given: ε_t = the probability of incorrect classification of the weak classifier h_t(x) selected at stage t. We want to prove:
α_t = (1/2) log[(1 - ε_t)/ε_t]
Define: the loss function of H_t at stage t is L(H_t); the loss function of H_{t+1} at stage t+1 is L(H_t + α_t h_t). (For simplification, write α = α_t, h = h_t.)
The objective is: find h, α to minimize L(H_t + α h).
Proof:
L(H_t + α h) = Σ_{i=1}^{n} e^{-y_i (H_t(x_i) + α h(x_i))} = Σ_{all n samples} e^{-y_i H_t(x_i)} e^{-α y_i h(x_i)}   ... (i)
correct_cases: { y_i = h(x_i), hence y_i h(x_i) = 1 }
incorrect_cases: { y_i ≠ h(x_i), hence y_i h(x_i) = -1 }
Theory: part 2a
Put correct_cases { y_i = h(x_i), hence y_i h(x_i) = 1 } and incorrect_cases { y_i ≠ h(x_i), hence y_i h(x_i) = -1 } into the formula (i) above:
L(H_t + α h) = e^{-α} Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + e^{α} Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)}   ... (ii)
In (ii), the sums Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} and Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} are independent of α, so they are constants. To minimize (ii), set:
d L(H_t + α h) / dα = 0
-e^{-α} Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + e^{α} Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} = 0   ... (iii)
Theory: part 3a
From (iii):
e^{α} Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} = e^{-α} Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)}
Take the log of both sides:
α + log[ Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} ] = -α + log[ Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} ]
2α = log[ Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} / Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} ]
α = (1/2) log[ Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} / Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} ]   ... (iv)
Recall: for the weak classifier h_t(x), the incorrect classification probability is
ε_t = Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} / Z, hence the correct classification probability is
1 - ε_t = Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} / Z,
where Z = Σ_{all i} e^{-y_i H_t(x_i)} is the normalization factor. (This is because of Step 3 of the previous stage (t-1) and Step 1b of the current stage t.)
Put the above in (iv):
α = (1/2) log[ (1 - ε_t) Z / (ε_t Z) ] = (1/2) log[(1 - ε_t)/ε_t]
So α_t = (1/2) log[(1 - ε_t)/ε_t], as shown earlier. Proved!
Advanced topic: Viola-Jones' implementation, compared with the original AdaBoost
• Also, classes can be {1, 0} rather than {1, -1}.
Original AdaBoost:
Recall Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t is the normalization factor so D is a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{correct i} D_t(i) e^{-α_t} + Σ_{incorrect i} D_t(i) e^{α_t}
Note:
• correct (y_i h_t(x_i) = 1): D_{t+1}(i) = D_t(i) e^{-α_t} / Z_t, so the weight decreases
• incorrect (y_i h_t(x_i) = -1): D_{t+1}(i) = D_t(i) e^{α_t} / Z_t, so the weight increases
Viola-Jones AdaBoost:
Update the weights: w_{t+1,i} = w_{t,i} β_t^{1-e_i},
where e_i = 0 if example x_i is classified correctly and e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t).
(Note: assume w_t is already normalized.)
• correct (e_i = 0): w_{t+1,i} = w_{t,i} β_t = w_{t,i} ε_t/(1 - ε_t); since ε_t < 0.5, the weight decreases
• incorrect (e_i = 1): w_{t+1,i} = w_{t,i}, so the weight does not change
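A side-by-side numeric check of the two update rules for ε_t = 0.3 (a MATLAB sketch):

  % Compare the two weight-update rules for eps_t = 0.3 (sketch).
  e = 0.3;
  a    = 0.5 * log((1 - e) / e);   % original AdaBoost confidence, 0.4236
  beta = e / (1 - e);              % Viola-Jones beta_t, 0.4286
  [exp(-a) exp(a)]   % original: correct weight x0.655, incorrect x1.528
  [beta 1]           % Viola-Jones: correct weight x0.429, incorrect unchanged
  % Both rules shift relative weight toward the misclassified samples.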
Face detection idea
• 1) In AdaBoost, use the axis-parallel (decision-stump) classifier. 2) In Viola-Jones, the weak classifier is the specially designed classifier described in the paper.

Useful features learned by boosting

A cascade of classifiers will be discussed in the next chapter.
Reference
• [Chen 2007] Qing Chen, Nicolas D. Georganas and Emil M. Petriu, "Real-Time Vision-Based Gesture Recognition Using Haar-like Features", IMTC 2007, Warsaw, Poland, May 1-3, 2007
• [Smyth 2007] slides: Smyth, "Face Detection using the Viola-Jones Method", http://www.ics.uci.edu/~smyth/courses/cs175/slides12_viola_jones_face_detection.ppt
• [Deng 2007] slides: Hongbo Deng, "A brief introduction to adaboost", 6 Feb 2007, http://dtpapers.googlecode.com/files/Hongbo.ppt
• [Freund] slides: "A Tutorial on Boosting", www.cs.toronto.edu/~hinton/csc321/notes/boosting.pdf
• [Hoiem 2004] slides: Derek Hoiem, "Adaboost", March 31, 2004, http://www.cs.uiuc.edu/~dhoiem/presentations/Adaboost_Tutorial.ppt
• [Jensen 2008] Jensen, "Implementing the Viola-Jones Face Detection Algorithm", http://orbit.dtu.dk/getResource?recordId=223656&objectId=1&versionId=1
• http://informatik.unibas.ch/lehre/ws05/cs232/_Folien/08_AdaBoost.pdf
• [Boris Babenko] Boris Babenko, "Note: A Derivation of Discrete AdaBoost", Department of Computer Science and Engineering, University of California, San Diego, http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf
• [Kroon 2010] http://www.mathworks.com/matlabcentral/fileexchange/27813-classic-adaboost-classifier
Matlab demo
• [Kroon 2010] http://www.mathworks.com/matlabcentral/fileexchange/27813-classic-adaboost-classifier
• http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html
Answer: Exercise 7.1, t=1, α_{t=1} = 1.1989

Answer: Exercise 7.1, t=2, α_{t=2} = 1.5223

Answer: Exercise 7.1, t=3, α_{t=3} = 1.4979

Answer: Exercise 7.1, strong classifier
(Figure: the strong classifier formed by combining h1, h2 and h3)

Answer 7.1: testing example 7.1
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
Use Step 4 of the AdaBoost algorithm to find CE_t.
Exercise 7.2 (take-home exercise)
Find the strong classifier from this training data set. Write clearly the types of h( ) (e.g. left = blue, right = red, threshold at u or v, etc.) and the values of ε and α of each stage t.
if example == 3   % assign 13
  blue = [ -23 19; 18 13;  3 23;  12 22 ];
  red  = [  13 18; -43 -3; 21 -14; 29 -8 ];
  datafeatures = [blue; red];
  dataclass = [ -1 -1 -1 -1 -1 1 1 1 1 1 ];
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

Take-home exercise 7.2 on AdaBoost, t=0