Ch7: Adaboost for building robust classifiers
KH Wong
Overview
• Objective of AdaBoost
• 2-class problems
– Training
– Detection
– Examples
Objective
• Automatically classify inputs into different categories of similar features
• Examples
– Spam mail detection and filtering
– Face detection: find the faces in the input image
– Vision-based gesture recognition [Chen 2007]
Different detection problems
• Two-class problems (will be discussed here)
– E.g. face detection: in a picture, are there any faces or no face?
• Multi-class problems (not discussed here)
– AdaBoost can be extended to handle multi-class problems, e.g. in a picture, are there any faces of men, women, or children? (Still an unsolved problem)
Define a 2-class classifier: its method and procedures
• Supervised training
– Show many positive samples (face) to the system
– Show many negative samples (non-face) to the system
– Learn the parameters and construct the final strong classifier
• Detection
– Given an unknown input image, the system can tell if there are faces or not
We will learn
• Training procedures
– Give +ve and -ve examples to the system; the system will then learn to classify an unknown input
– E.g. give pictures of faces (+ve examples) and non-faces (-ve examples) to train the system
• Detection procedures
– Input an unknown sample (e.g. an image); the system will tell you whether it is a face or not
(Figure: example face and non-face images)
A linear decision-line example
• A line y = mx + c, with m = 3, c = 2
• When x=1, y = 3*1+2 = 5. So at x=1, a point with y>5 is above the line, and with y<5 it is below the line.
• When x=0, y = 3*0+2 = 2. So at x=0, a point with y>2 is above the line, and with y<2 it is below the line.
• When x=-1, y = 3*(-1)+2 = -1. So at x=-1, a point with y>-1 is above the line, and with y<-1 it is below the line.
• Conclusion:
– if a point (x,y) is above and on the left of the line y = mx + c, then y - mx > c
– if a point (x,y) is below and on the right side of the line y = mx + c, then y - mx < c
(Figure: the line y = mx + c through (0,2), (1,5) and (-1,-1); the region y - mx > c lies above-left, the region y - mx < c below-right)
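To make the rule concrete, here is a minimal MATLAB sketch that tests which side of the line a point lies on (the test points are illustrative, not from the slide):

  % Which side of the line y = m*x + c is a point on? (sketch)
  m = 3; c = 2;
  pts = [1 6; 0 0; -1 -1];          % rows of (x, y) test points
  for k = 1:size(pts, 1)
      x = pts(k, 1); y = pts(k, 2);
      if y - m*x > c
          fprintf('(%g,%g) is above-left of the line\n', x, y);
      elseif y - m*x < c
          fprintf('(%g,%g) is below-right of the line\n', x, y);
      else
          fprintf('(%g,%g) is on the line\n', x, y);   % e.g. (-1,-1)
      end
  end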
First, let us learn what a weak classifier h( ) is
• Consider the decision line v = mu + c (i.e. v - mu = c) in [u,v] space; m and c are used to define the line.
• Any point in the gray area satisfies v - mu < c; any point in the white area satisfies v - mu > c.
• Case 1: if a point x = [u,v] is in the "gray" area, then h(x) = 1, otherwise -1. It can be written as:
h(x) = 1 if (v - mu) < c, -1 otherwise, where m, c are given constants.
• Case 2: if a point x = [u,v] is in the "white" area, then h(x) = 1, otherwise -1. It can be written as:
h(x) = 1 if -(v - mu) < -c, -1 otherwise.
• At time t, combine case 1 and case 2 into one equation, and use the polarity p_t to control which case you want to use:
h_t(x) = 1 if p_t f_t(x) < p_t θ_t, -1 otherwise,
where polarity p_t ∈ {+1, -1}; f is the function f(x = [u,v]) = v - mu and the threshold is θ_t = c; m, c are constants and u, v are variables. So h_t(x) becomes:
h_t(x) = 1 if p_t (v - mu) < p_t c, -1 otherwise ... equation (i)
(Figure: a line of gradient m through intercept c in the (u,v) plane; v - mu < c on one side, v - mu > c on the other)
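Equation (i) reduces to one line of MATLAB (a sketch; the handle name weak_h is mine):

  % Equation (i) as an anonymous function (sketch).
  % 2*(condition)-1 maps logical {0,1} to the labels {-1,+1}.
  weak_h = @(u, v, m, c, p) 2*(p*(v - m*u) < p*c) - 1;
  weak_h(0, 0, 1, 2, +1)   % returns +1: (0,0) has v - mu = 0 < c = 2
  weak_h(0, 0, 1, 2, -1)   % returns -1: reversed polarity flips the label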
The weak classifier (a summary)
• By definition, a weak classifier should be slightly better than a random choice (probability of correct classification = 0.5). Otherwise you should use a dice!
• In [u,v] space, the decision function f: (v - mu) = c is a straight line defined by m and c.
(Figure: the straight line v = mu + c, i.e. v - mu = c, with gradient m; v - mu < c on one side of it, v - mu > c on the other)
Example A
• Find the equation of the line v = mu + c.
– Answer: c = 2, m = (6-2)/10 = 0.4, so v = 0.4u + 2
• Assume polarity p_t = 1; classify P1, P2, P3, P4.
• P1 (u=5, v=9):
– Answer: v - mu = 9 - 0.4*5 = 7; since c = 2, v - mu > c, so it is class -1
• P2 (u=9, v=4):
– Answer: v - mu = 4 - 0.4*9 = 0.4; since c = 2, v - mu < c, so it is class +1
• P3 (u=6, v=3): • P4 (u=2, v=3): (answers on the next slide)
• Repeat using p_t = -1
Recall equation (i): h_t(x) = 1 if p_t f(x) < p_t θ_t, -1 otherwise; here h_t(x) = 1 if p_t (v - mu) < p_t c, -1 otherwise ... (ib)
Assuming polarity p_t is 1: class +1 when v - mu < c; class -1 when v - mu > c. The line is v = mu + c, i.e. v - mu = c.
Answer for Example A
• P3 (u=6, v=3): v - mu = 3 - 0.4*6 = 0.6; since c = 2, v - mu < c, so it is class +1
• P4 (u=2, v=3): v - mu = 3 - 0.4*2 = 2.2; since c = 2, v - mu > c, so it is class -1
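The four classifications of Example A can be checked numerically (a MATLAB sketch, reusing the weak_h handle defined earlier):

  % Verify Example A (sketch).
  weak_h = @(u, v, m, c, p) 2*(p*(v - m*u) < p*c) - 1;
  m = 0.4; c = 2;                 % the line v = 0.4*u + 2
  P = [5 9; 9 4; 6 3; 2 3];       % P1..P4 from the slide
  for k = 1:4
      fprintf('P%d: class %+d (p=+1), class %+d (p=-1)\n', k, ...
              weak_h(P(k,1), P(k,2), m, c, +1), ...
              weak_h(P(k,1), P(k,2), m, c, -1));
  end
  % Expected with p=+1: P1 -> -1, P2 -> +1, P3 -> +1, P4 -> -1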
Learn what h( ), a weak classifier, is: the decision stump
• Decision stump definition: a decision stump is a machine learning model consisting of a one-level decision tree.[1] That is, it is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes. A decision stump makes a prediction based on the value of just a single input feature. Sometimes they are also called 1-rules.[2]
• From http://en.wikipedia.org/wiki/Decision_stump
• Example: temperature T
– T <= 10°C: cold
– 10°C < T < 28°C: mild
– T >= 28°C: hot
A weak learner (classifier) is a decision stump
• Define weak learners based on rectangle features:
h_t(x) = 1 if p_t f_t(x) < p_t θ_t, -1 otherwise
– f_t(x) is the function of a decision line in space
– p_t = polarity ∈ {+1, -1}: selects which side separated by the line you prefer
– θ_t = threshold
(Figure: a decision line splitting the plane into a +1 region and a -1 region)
Weak classifier we use here: the axis-parallel weak classifier
• In order to simplify the derivation, we will use the simple "axis-parallel weak classifier".
• It assumes the gradient (m) of the decision line is 0 (horizontal) or infinite (vertical).
• The decision line is parallel to either the horizontal or the vertical axis.
• For a horizontal line, use f(x = [u,v]) = v with threshold v0:
h_t(x = [u,v]) = 1 if p_t v < p_t v0, -1 otherwise
(Figure: a horizontal decision line at v0. If polarity p_t = 1, the region with v < v0 is +1 and the region with v > v0 is -1; if p_t = -1, the two regions are swapped.)
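In MATLAB both axis-parallel variants reduce to one-liners (a sketch; the handle names are mine):

  % Axis-parallel weak classifiers (sketch): h = +1 if p*f(x) < p*threshold.
  h_horiz = @(v, v0, p) 2*(p*v < p*v0) - 1;   % horizontal line at v = v0, f(x) = v
  h_vert  = @(u, u0, p) 2*(p*u < p*u0) - 1;   % vertical line at u = u0,  f(x) = u
  h_horiz(3, 5, +1)    % +1: v = 3 is below the threshold v0 = 5
  h_horiz(3, 5, -1)    % -1: reversed polarity flips the label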
An example to show how AdaBoost works
• Training: present ten samples to the system: [xi = {ui, vi}, yi = {'+' or '-'}]
– 5 +ve (blue, diamond) samples
– 5 -ve (red, circle) samples
– Train up the system
• Detection:
– Give an input xj = (1.5, 3.4); the system will tell you it is '+' or '-', e.g. face or non-face
• Example: u = weight, v = height; classification: suitability to play boxing
(Figure: the ten training samples in the (u,v) plane; two of the '+' samples are at [xi={-0.48,0}, yi='+'] and [xi={-0.2,-0.5}, yi='+'])
AdaBoost concept
• Using this training data, how can we make a classifier?
– Training data: 6 squares, 5 circles
– Objective: train a classifier to classify an unknown input to see if it is a circle or a square
• Only one axis-parallel weak classifier cannot achieve 100% classification. E.g. h1( ), h2( ) or h3( ) alone will fail: no matter how you place the decision line (horizontally or vertically) you cannot get a 100% classification result. You may try it yourself!
• A strong classifier with a complex decision boundary H_complex( ) should work, but how can we find it?
– ANSWER: combine many weak classifiers to achieve it. Each classifier may not be perfect, but each can achieve over a 50% correct rate.
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
(Figure: the weak classifiers h_{t=1}( ), h_{t=2}( ), ..., h_{t=7}( ), each with a weight α_t for t = 1, 2, ..., 7, combine to form the final strong classifier, which outputs the classification result)
THE ADABOOST ALGORITHM

The AdaBoost algorithm (v.4a, for reference only):

Given: (x_1, y_1), ..., (x_n, y_n), where x_i ∈ X and y_i ∈ Y = {-1, +1}.

Initialization:
Initialize the distribution (weight) D_1(i) = 1/n, such that n = M + L, where M = number of positive (+1) examples and L = number of negative (-1) examples.

Main training loop. For t = 1, ..., T (each processing time step):
• Step 1a: Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to the distribution D_t; that means h_t = argmin_{h_j} ε_j.
• Step 1b: error ε_t = Σ_{i=1}^{n} D_t(i) * I(y_i ≠ h_t(x_i)), where I = 1 if h_t(x_i) ≠ y_i (classified incorrectly) and 0 otherwise.
Checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
• Step 2: α_t = (1/2) ln[(1 - ε_t) / ε_t], the weight (or confidence value).
• Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t.
• Step 4: CE_t = total error of all training samples x_{i=1,...,N} using the strong classifier up to stage t, H_t(x) = sign(Σ_{τ=1}^{t} α_τ h_τ(x)). If CE_t = 0 (or small enough), then break (done).

The final strong classifier:
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

Note: Z_t is a normalization factor, so D_{t+1} becomes a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{correctly classified i} D_t(i) e^{-α_t} + Σ_{incorrectly classified i} D_t(i) e^{α_t}
since y_i h_t(x_i) = 1 when correctly classified and y_i h_t(x_i) = -1 when incorrectly classified.
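The whole loop fits in a short MATLAB sketch using axis-parallel stumps. This is illustrative code under my own naming, not the author's demo (see [Kroon 2010] for a full implementation); it uses the Example C data that appears later in the chapter:

  % AdaBoost training sketch with axis-parallel decision stumps (illustrative).
  X = [-26 38;  3 34;  32 3;   42 10;      % blue (*) samples, class -1
        23 38; -4 -33; -22 -25; -37 -31];  % red (O) samples, class +1
  y = [-1 -1 -1 -1  1  1  1  1]';
  n = size(X, 1);
  D = ones(n, 1) / n;                      % initialization: D_1(i) = 1/n
  T = 3;
  alpha = zeros(T, 1); stump = zeros(T, 3);  % each row: [dimension threshold polarity]
  for t = 1:T
      % Step 1a: exhaustive search over all axis-parallel stumps
      best_err = inf;
      for dim = 1:2
          for thr = X(:, dim)'             % candidate thresholds at sample values
              for p = [-1 1]
                  pred = 2*(p*X(:, dim) < p*thr) - 1;
                  err  = sum(D .* (pred ~= y));
                  if err < best_err
                      best_err = err; stump(t, :) = [dim thr p]; best_pred = pred;
                  end
              end
          end
      end
      if best_err >= 0.5, break; end       % Step 1b: prerequisite check
      alpha(t) = 0.5 * log((1 - best_err) / best_err);   % Step 2
      D = D .* exp(-alpha(t) * y .* best_pred);          % Step 3 ...
      D = D / sum(D);                      % ... divide by Z_t to renormalize
      % Step 4: error of the strong classifier built so far
      F = zeros(n, 1);
      for k = 1:t
          F = F + alpha(k) * (2*(stump(k,3)*X(:,stump(k,1)) < stump(k,3)*stump(k,2)) - 1);
      end
      CE = mean(sign(F) ~= y);
      if CE == 0, break; end               % all samples correct: done
  end

On this data the first stage finds a stump with weighted error 0.125 and α_1 = 0.973, matching the Answer C slides later in the chapter.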
Initialization
• Given: (x_1, y_1), ..., (x_n, y_n), where x_i ∈ X, y_i ∈ Y = {-1, +1}
– X = [u, v] is the location of the training sample
– Y is the class, either +1 or -1 here
• Initialize the distribution (weight) D_1(i) = 1/n, such that n = M + L
– M = number of positive (+1) examples; L = number of negative (-1) examples

D_t(i) = weight
• D_t(i) = probability distribution of the i-th training sample at time t, i = 1, 2, ..., n.
• It shows how much you trust this sample.
• At t = 1, all samples have the same, equal weight: D_{t=1}(i) is the same for all i.
• At t > 1, D_t(i) will be modified, as we will see later.
Main loop (steps 1, 2, 3)
t is the index for the weak classifier at stage t; T is the number of weak classifiers needed.
For t = 1, ..., T:
• Step 1a: Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t; that means h_t = argmin_{h_j} ε_j.
• Step 1b: error ε_t = Σ_{i=1}^{n} D_t(i) * I(y_i ≠ h_t(x_i)), where I = 1 if h_t(x_i) ≠ y_i (classified incorrectly) and 0 otherwise.
Checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
• Step 2: α_t = (1/2) ln[(1 - ε_t) / ε_t], the weight (or confidence value).
• Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t (see the next slide for an explanation).
Z_t = normalization factor; it will be discussed later.
Main loop, Step 4 (updated)
• Step 4: CE_t = total error of all training samples x_{i=1,...,N} using the strong classifier up to stage t, H_t(x) = sign(Σ_{τ=1}^{t} α_τ h_τ(x)). If CE_t = 0 (or small enough), then break (done).
• Step 4 in English: if all training samples x_{i=1,...,N} are correctly classified by H( ), then stop the training.
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
Note: to find the normalization factor Z_t in Step 3
Recall Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t,
where Z_t is the normalization factor, so D_{t+1} becomes a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{correctly classified i} D_t(i) e^{-α_t} + Σ_{incorrectly classified i} D_t(i) e^{α_t}
Note: y_i h_t(x_i) = 1 when correctly classified; y_i h_t(x_i) = -1 when incorrectly classified.
AdaBoost chooses this weight update function, D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)), deliberately. The idea is:
• when a training sample is correctly classified, its weight decreases
• when a training sample is incorrectly classified, its weight increases
An example to show how AdaBoost works
• Training: present ten samples to the system: [xi = {ui, vi}, yi = {'+' or '-'}]
– 5 +ve (blue, diamond) samples
– 5 -ve (red, circle) samples
– Train up the classification system
• Detection example:
– Give an input xj = (1.5, 3.4); the system will tell you it is '+' or '-', e.g. face or non-face
• Example: you may treat u = skills, v = height; classification task: suitability to play in the basketball team
(Figure: the ten training samples in the (u,v) plane; two of the '+' samples are at [xi={-0.48,0}, yi='+'] and [xi={-0.2,-0.5}, yi='+'])
Initialization
• M = 5 +ve (blue, diamond) samples
• L = 5 -ve (red, circle) samples
• n = M + L = 10 (usually make M ≈ L)
• Initialize the weight D_{t=1}(i) = 1/10 for all i = 1, 2, ..., 10
– So D_1(1) = 0.1, D_1(2) = 0.1, ..., D_1(10) = 0.1
Recall: given (x_1, y_1), ..., (x_n, y_n) where x_i ∈ X, y_i ∈ {-1, +1}, initialize D_1(i) = 1/n such that n = M + L (M = number of positive examples, L = number of negative examples).
Main training loop

Step 1a, 1b
Select h( ): for simplicity of implementation we use the axis-parallel weak classifier.
Recall: h_t(x) = 1 if p_t f_t(x) < p_t θ_t, -1 otherwise,
where polarity p_t ∈ {+1, -1} and θ_t is the threshold; f is the function f(x = (u,v)) = v - mu, with m, c constants and u, v variables.
Axis-parallel weak classifiers:
• h_a(x): f = v is a line of gradient m = 0 (horizontal line); the position of the line can be controlled by v0
• h_b(x): f = u is a line of gradient m = ∞ (vertical line); the position of the line can be controlled by u0
(Figure: a horizontal classifier h_a(x) at v0 and a vertical classifier h_b(x) at u0)
Step 1a, 1b
• Assume h( ) can only be a horizontal or vertical separator (axis-parallel weak classifier).
• There are still many ways to set h( ); here, if this h_q( ) is selected, there will be 3 incorrectly classified training samples.
• See the 3 circled training samples.
• We can go through all h( )s and select the best one with the least misclassification (see the following 2 slides).
Step 1a: find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t; that means h_t = argmin_{h_j} ε_j.
Step 1b (checking step, prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
(Figure: a candidate classifier h_q( ) with the 3 incorrectly classified samples circled)
Example: training example slides from [Smyth 2007]; classify the ten red (circle)/blue (diamond) dots. Step 1a:
h_i(x) = 1 if p_i u < p_i u_i, -1 otherwise
(v of x = (u,v) is not used because h_i(x) is parallel to the vertical axis; polarity p_i ∈ {+1, -1})
Initialize: D_{t=1}(i) = 1/10.
• You may choose one of the following axis-parallel (vertical line) classifiers; the vertical dotted lines are possible choices:
h_{i=1}(x), ..., h_{i=4}(x), ..., h_{i=9}(x) at positions u1, u2, ..., u9 on the u-axis
• There are 9x2 choices here: h_{i=1,2,...,9} (polarity +1) and h'_{i=1,2,...,9} (polarity -1)
Example: training example slides from [Smyth 2007]; classify the ten red (circle)/blue (diamond) dots. Step 1a:
h_j(x) = 1 if p_j v < p_j v_j, -1 otherwise
(u of x = (u,v) is not used because h_j(x) is parallel to the horizontal axis; polarity p_j ∈ {+1, -1})
Initialize: D_{t=1}(i) = 1/10.
• You may choose one of the following axis-parallel (horizontal line) classifiers; the horizontal dotted lines are possible choices:
h_{j=1}(x), h_{j=2}(x), ..., h_{j=9}(x) at positions v1, v2, ..., v9 on the v-axis
• There are 9x2 choices here: h_{j=1,2,...,9} (polarity +1) and h'_{j=1,2,...,9} (polarity -1)
• All together, including the previous slide, there are 36 choices.
Step 1b: find and check the error of the weak classifier h( )
• To evaluate how successful your selected weak classifier h( ) is, we can evaluate its error rate.
• For axis-parallel weak classifiers, if you have N (+ve plus -ve) training samples, you will have (N-1)x4 classifier choices. (Prove it!)
• ε_t = misclassification probability of h( ).
• Checking: if ε_t >= 0.5 (something is wrong), stop the training.
– Because, by definition, a weak classifier should be slightly better than a random choice (probability = 0.5).
– So if ε_t >= 0.5, your h( ) is a bad choice; redesign another h''( ) and do the training based on the new h''( ).
Step 1b: error ε_t = Σ_{i=1}^{n} D_t(i) * I(y_i ≠ h_t(x_i)), where I = 1 if h_t(x_i) ≠ y_i (incorrectly classified) and 0 otherwise. Checking step (prerequisite): ε_t < 0.5, otherwise stop.
Example B for Step 1a, 1b
• Assume h( ) can only be a horizontal or vertical separator.
• How many different classifiers are available?
• If the h_j( ) shown is selected, circle the misclassified training samples. Find ε( ) to see the misclassification probability if the probability distribution (D) for each sample is the same.
• Find the h( ) with minimum error.
Step 1a: find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t. Step 1b (checking step, prerequisite): ε_t < 0.5, otherwise stop.
(Figure: the candidate h_j( ); below the line are squares, above are circles)
Answer: Example B for Step 1a, 1b
• Assume h( ) can only be a horizontal or vertical separator.
• How many different classifiers are available?
– Answer: because there are 12 training samples, we have 11x2 vertical + 11x2 horizontal classifiers, so the total is (11x2 + 11x2) = 44. (updated)
• If the h_j( ) shown is selected, circle the misclassified training samples. Find ε( ) to see the misclassification probability if the probability distribution (D) for each sample is the same.
– Answer: each sample has weight (1/12); there are 4 misclassified (circled) samples, so ε = 4*(1/12).
• Find the h( ) with minimum error. Answer:
– Repeat the above and find ε_j( ) for each of the h_{j=1,...,44}( ), compare the ε_j( ), and find the smallest ε_j( ). That indicates the best h_j( ).
(Figure: the candidate h_j( ); below the line are squares, above are circles)
Weak classifiers required
• D = dimension of the problem
• N = number of total training samples
• M = number of weak classifiers required = D x 2 x (N-1)
• Explain why.
• If the problem is three-dimensional and the number of total training samples (positive +ve plus negative -ve samples) is 20, calculate M.
• Answer: M = 3 x 2 x (20-1) = 114. Explain the answer.
Result of Step 2 at t=1
(Figure: the chosen weak classifier h_{t=1}(x); 3 samples are incorrectly classified by h_{t=1}(x))
Step 2 at t=1 (refer to the previous slide)
• Using ε_{t=1} = 0.3, because 3 samples are incorrectly classified:
ε_{t=1} = 0.1 + 0.1 + 0.1 = 0.3
• Step 2: α_t = (1/2) ln[(1 - ε_t)/ε_t], where ε_t is the weighted error rate of classifier h_t.
So α_{t=1} = (1/2) ln[(1 - 0.3)/0.3] = 0.424 (confidence value).
The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
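Both numbers can be checked in one MATLAB line each (a sketch):

  % Check the slide's numbers: three samples of weight 0.1 are wrong.
  eps1   = 0.1 + 0.1 + 0.1                 % weighted error = 0.3
  alpha1 = 0.5 * log((1 - eps1) / eps1)    % = 0.4236..., the 0.424 above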
Step 3 at t=1: update D_t to D_{t+1}
• Update the weight D_t(i) for each training sample i:
Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t,
where Z_t is the normalization factor, so D_{t+1} is a distribution (probability function).
Note: y_i h_t(x_i) = 1 if the classification is correct; y_i h_t(x_i) = -1 if the classification is incorrect.
The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
Step 3: first find Z (the normalization factor). Note that D_{t=1} = 0.1 and α_{t=1} = 0.424.
Z_t = correct_weight + incorrect_weight = Σ_{y_i = h(x_i)} D_t(i) e^{-α_t y_i h(x_i)} + Σ_{y_i ≠ h(x_i)} D_t(i) e^{-α_t y_i h(x_i)}
• correctly classified: y_i = h(x_i), so y_i h(x_i) = 1; put it in e^{-α_t y_i h(x_i)} to get e^{-α_t}
• incorrectly classified: y_i ≠ h(x_i), so y_i h(x_i) = -1; put it in e^{-α_t y_i h(x_i)} to get e^{α_t}
Z_{t=1} = (total correct weight) e^{-α} + (total incorrect weight) e^{α}
= 0.1 * 7 * e^{-0.424} + 0.1 * 3 * e^{0.424}
≈ 0.1 * 7 * 0.65 + 0.1 * 3 * 1.52
= 0.455 + 0.456 ≈ 0.911
Note: currently t = 1 and D_{t=1}(i) = 0.1 for all i; 7 samples are correctly classified and 3 are incorrectly classified.
Step 3 example: update D_t to D_{t+1}. If correctly classified, the weight D_{t+1} will decrease, and vice versa.
D_{t+1}(i) = D_t(i) e^{-α_t y_i h_t(x_i)} / Z_t, with Z_{t=1} ≈ 0.911 and α_{t=1} = 0.424:
• decrease (correctly classified): D_{t+1}(i) = 0.1 * e^{-0.424} / 0.911 ≈ 0.1 * 0.65 / 0.911 ≈ 0.0714
• increase (incorrectly classified): D_{t+1}(i) = 0.1 * e^{0.424} / 0.911 ≈ 0.1 * 1.52 / 0.911 ≈ 0.167
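A short MATLAB sketch reproduces Z_{t=1} and both updated weights (exact exponentials give Z ≈ 0.917; the slide's 0.911 comes from the rounded values 0.65 and 1.52):

  % Reproduce Z_{t=1} and the updated weights (sketch).
  D = 0.1; a = 0.424;
  Z = 7*D*exp(-a) + 3*D*exp(a)    % ~0.917 (7 correct, 3 incorrect samples)
  D_correct   = D*exp(-a) / Z     % ~0.0714: weight of each correct sample
  D_incorrect = D*exp(a)  / Z     % ~0.167:  weight of each incorrect sample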
Now run the main training loop the second time (t=2)
• The updated weights become the distribution at t=2:
D_{t=2}(i) ≈ 0.0714 for the correctly classified samples, and D_{t=2}(i) ≈ 0.167 for the incorrectly classified samples.
Now run the main training loop a second time (t=2), and then t=3.
Final classifier by combining three weak classifiers
Combined classifier for t = 1, 2, 3. Exercise: work out α_2 and α_3.
H(x) = sign( 0.424*h_1(x) + α_2 h_2(x) + α_3 h_3(x) )
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
(Figure: the weak classifiers h_{t=1}( ), h_{t=2}( ), h_{t=3}( ), with weights α_{t=1} = 0.424, α_{t=2}, α_{t=3}, combine to form the classifier. One more step may be needed for the final classifier.)
Example C
if example == 1
  blue = [ -26 38;  3 34;  32 3;   42 10 ];     % blue, plotted as *
  red  = [  23 38; -4 -33; -22 -25; -37 -31 ];  % red, plotted as O
  datafeatures = [blue; red];
  dataclass = [ -1 -1 -1 -1 1 1 1 1 ];
Answer C: initialization, t=1. Find the best h( ) by inspection.
• What is D(i) for all i = 1 to 8? (By initialization, D_1(i) = 1/8 = 0.125 for all i.)
Answer C, t=1: h1 (upper half = *, lower = o)
• Weak classifier h1 (upper half = *, lower = o): we see that feature (5) is wrongly classified; 1 sample is wrong.
• err = ε(t) = D(t)*1 = 0.125
• α = 0.5*ln[(1 - ε(t))/ε(t)] = 0.973
• Find the next D(t+1) = D(t)*exp(-α y h(x)):
– incorrect: D_{t+1}(i) = D_t(i)*exp(α); D(5) = 0.125*exp(0.973) = 0.3307 (not normalized yet)
– correct: D_{t+1}(i) = D_t(i)*exp(-α); D(1) = 0.125*exp(-0.973) = 0.0472 (not normalized yet)
• Z = 7*0.0472 + 0.3307 = 0.6611
• After normalization, D at t+1:
– D(5) = 0.3307 / Z = 0.5002
– D(1) = D(2) = ... = 0.0472 / Z = 0.0714
(Recall Step 1b and Step 2: ε_t = Σ_i D_t(i) I(y_i ≠ h_t(x_i)), prerequisite ε_t < 0.5; α_t = (1/2) ln[(1 - ε_t)/ε_t].)
(Figure: the chosen weak classifier h1( ))
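The stage t=1 numbers can be reproduced with a few MATLAB lines (a sketch):

  % Reproduce Answer C at t=1: sample 5 is the only mistake of h1.
  D = ones(8, 1) / 8;                  % D(i) = 0.125 for all 8 samples
  err   = D(5);                        % = 0.125
  alpha = 0.5 * log((1 - err) / err)   % = 0.973
  Dnew  = D * exp(-alpha);             % correctly classified: weight shrinks
  Dnew(5) = D(5) * exp(alpha);         % misclassified sample 5: weight grows
  Z = sum(Dnew)                        % = 0.6611
  Dnew = Dnew / Z;                     % D(5) -> 0.5002, all others -> 0.0714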
Answer C, result at t=1
##display result t_step=1 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
>i=1, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=2, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=3, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=4, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=-1, CE_=0
>i=5, a1*h1(xi)=-0.973, O_=-0.973, S_=-1.000, Y_=1, CE_=1
>i=6, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
>i=7, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
>i=8, a1*h1(xi)=0.973, O_=0.973, S_=1.000, Y_=1, CE_=0
>weak classifier specifications:
-dimension: 1=vertical :direction:1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
-dimension: 2=horizontal:direction:1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
>#-new weak classifier at stage(1): dimension=2, threshold=-25.00, direction=-1
>Cascaded classifier error up to stage(t=1) for (N=8 training samples) = [sum(CE_)/N] = 0.125
Use Step 4 of the AdaBoost algorithm to find CE_t.
Answer C, t=2
• Weak classifier h2 (left = o, right = *): features (1), (2) are wrongly classified; 2 samples are wrong.
• err = ε(t) = D_t(1) + D_t(2) = 0.0714 + 0.0714 = 0.1428
• α = 0.5*ln[(1 - ε(t))/ε(t)] = 0.8961
• Find the next D(t+1) = D(t)*exp(-α y h(x)), i.e.:
– incorrect: D_{t+1}(i) = D_t(i)*exp(α); D(1) = D(2) = 0.0714*exp(0.8961) = 0.1749 (not normalized yet)
– correct: D_{t+1}(i) = D_t(i)*exp(-α); D(3) = D(4) = D(6) = D(7) = D(8) = 0.0714*exp(-0.8961) = 0.029,
but D(5) = 0.5*exp(-0.8961) = 0.2041
• Z = 2*0.1749 + 5*0.029 + 0.2041 = 0.6989
• After normalization, D at t+1:
– D(1) = D(2) = 0.1749 / 0.6989 = 0.2503
– D(5) = 0.2041 / 0.6989 = 0.292
– D(8) = 0.029 / 0.6989 = 0.0415 (same for samples (3), (4), (6), (7))
(Recall Step 1b and Step 2 as before.)
Answer C, result at t=2
##display result t_step=2 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
>i=1, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=-1, CE_=0
>i=2, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=-1, CE_=0
>i=3, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, O_=-1.869, S_=-1.000, Y_=-1, CE_=0
>i=4, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, O_=-1.869, S_=-1.000, Y_=-1, CE_=0
>i=5, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, O_=-0.077, S_=-1.000, Y_=1, CE_=1
>i=6, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
>i=7, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
>i=8, a1*h1(xi)=0.973, a2*h2(xi)=0.896, O_=1.869, S_=1.000, Y_=1, CE_=0
>weak classifier specifications:
-dimension: 1=vertical :direction:1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
-dimension: 2=horizontal:direction:1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
>#-new weak classifier at stage(2): dimension=1, threshold=23.00, direction=-1
>Cascaded classifier error up to stage(t=2) for (N=8 training samples) = [sum(CE_)/N] = 0.125
Use Step 4 of the AdaBoost algorithm to find CE_t.
Answer C, t=3
• Worked in the same way as t=1 and t=2. Use Step 4 of the AdaBoost algorithm to find CE_t.

Answer C, result at t=3
##display result t_step=3 ## O_=cascaded_sum, S_=sign(O_), Y=train_class, CE=classification error##
>i=1, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=-0.745, S_=-1.000, Y_=-1, CE_=0
>i=2, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=-0.745, S_=-1.000, Y_=-1, CE_=0
>i=3, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, a3*h3(xi)=0.668, O_=-1.201, S_=-1.000, Y_=-1, CE_=0
>i=4, a1*h1(xi)=-0.973, a2*h2(xi)=-0.896, a3*h3(xi)=0.668, O_=-1.201, S_=-1.000, Y_=-1, CE_=0
>i=5, a1*h1(xi)=-0.973, a2*h2(xi)=0.896, a3*h3(xi)=0.668, O_=0.590, S_=1.000, Y_=1, CE_=0
>i=6, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
>i=7, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
>i=8, a1*h1(xi)=0.973, a2*h2(xi)=0.896, a3*h3(xi)=-0.668, O_=1.201, S_=1.000, Y_=1, CE_=0
>weak classifier specifications:
-dimension: 1=vertical :direction:1=(left="blue_*", right="red_O"); -1=(reverse direction of 1)
-dimension: 2=horizontal:direction:1=(up="red_O", down="blue_*"); -1=(reverse direction of 1)
>#-new weak classifier at stage(3): dimension=1, threshold=3.00, direction=1
>Cascaded classifier error up to stage(t=3) for (N=8 training samples) = [sum(CE_)/N] = 0.000
Answer C: the strong classifier
(Figure: the strong classifier formed by combining h1, h2 and h3)
Answer C: test result, example 5.1
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
Use Step 4 of the AdaBoost algorithm to find CE_t.
Class exercise 7.1
if example == 2
  blue = [ -46 18; -30 -30; -31 -19; -8 15;  8 -45; -22 2 ];   % blue = * (star)
  red  = [  33 38;  30 10;   21 35;   1 19; 14 23;  37 -41 ];  % red = o (circle)
  datafeatures = [blue; red];
  dataclass = [ -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 ];

Class exercise 7.1, t=0
(Figure: the training samples; blue = * (star), red = o (circle))
Summary
• We learned:
– what a classifier is
– the classification technique AdaBoost
Appendix
Theory
• We first make up a measurement function called the "exponential loss function" to measure the strength of a strong classifier.
– Exponential loss function L(H) = a measurement of the misclassification rate of a strong classifier H.
• For a strong classifier H(x) = sign( Σ_{k=1}^{T} α_k h_k(x) ), the exponential loss function L(H) is defined as:
L(H) = (1/n) Σ_{i=1}^{n} e^{-y_i H(x_i)}
• y_i H(x_i) = +1 (correctly classified)
• y_i H(x_i) = -1 (incorrectly classified)
• A good strong classifier should have a low L(H).
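As a quick numeric illustration (a MATLAB sketch; the margin vectors are made up):

  % Exponential loss of a strong classifier (sketch).
  % yH is the vector of y_i * H(x_i) values (+1 correct, -1 incorrect).
  L = @(yH) mean(exp(-yH));
  L([ 1  1  1  1])    % all correct: low loss, e^-1 = 0.3679
  L([ 1  1  1 -1])    % one mistake raises the loss to 0.9555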
Theory: by definition, the weight update rule is chosen to achieve adaptive boosting
• AdaBoost chooses this weight update function deliberately:
D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i))
• Because:
– when a training sample is correctly classified, its weight decreases
– when a training sample is incorrectly classified, its weight increases
• Some other systems may use different weight update formulas but with the same spirit (correctly classified samples will result in decreased weight, and vice versa).
Theory: part 1a
Given: ε_t = the probability of incorrect classification of the weak classifier h_t(x) selected at stage t. We want to prove:
α_t = (1/2) log[(1 - ε_t)/ε_t]
Define: the loss function of H_t at stage t is L(H_t); the loss function of H_{t+1} at stage t+1 is L(H_t + α_t h_t). (For simplification, write α = α_t, h = h_t.)
The objective is: find h, α to minimize L(H_t + α h).
Proof:
L(H_t + α h) = Σ_{i=1}^{n} e^{-y_i (H_t(x_i) + α h(x_i))} = Σ_{all n samples} e^{-y_i H_t(x_i)} e^{-α y_i h(x_i)}   ... (i)
correct_cases: { y_i = h(x_i), hence y_i h(x_i) = 1 }
incorrect_cases: { y_i ≠ h(x_i), hence y_i h(x_i) = -1 }
Theory: part 2a
Put correct_cases { y_i = h(x_i), hence y_i h(x_i) = 1 } and incorrect_cases { y_i ≠ h(x_i), hence y_i h(x_i) = -1 } into the formula (i) above:
L(H_t + α h) = e^{-α} Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + e^{α} Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)}   ... (ii)
In (ii), the sums Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} and Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} are independent of α, so they are constants. To minimize (ii), set:
d L(H_t + α h) / dα = 0
-e^{-α} Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} + e^{α} Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} = 0   ... (iii)
Theory: part 3a
From (iii):
e^{α} Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} = e^{-α} Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)}
Take the log of both sides:
α + log[ Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} ] = -α + log[ Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} ]
2α = log[ Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} / Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} ]
α = (1/2) log[ Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} / Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} ]   ... (iv)
Recall: for the weak classifier h_t(x), the incorrect classification probability is
ε_t = Σ_{y_i ≠ h(x_i)} e^{-y_i H_t(x_i)} / Z, hence the correct classification probability is
1 - ε_t = Σ_{y_i = h(x_i)} e^{-y_i H_t(x_i)} / Z,
where Z = Σ_{all i} e^{-y_i H_t(x_i)} is the normalization factor. (This is because of Step 3 of the previous stage (t-1) and Step 1b of the current stage t.)
Put the above in (iv):
α = (1/2) log[ (1 - ε_t) Z / (ε_t Z) ] = (1/2) log[(1 - ε_t)/ε_t]
So α_t = (1/2) log[(1 - ε_t)/ε_t], as shown earlier. Proved!
Advanced topic: Viola-Jones' implementation, compared with the original AdaBoost
• Also, classes can be {1, 0} rather than {1, -1}.
Original AdaBoost:
Recall Step 3: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t is the normalization factor so D is a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{correct i} D_t(i) e^{-α_t} + Σ_{incorrect i} D_t(i) e^{α_t}
Note:
• correct (y_i h_t(x_i) = 1): D_{t+1}(i) = D_t(i) e^{-α_t} / Z_t, so the weight decreases
• incorrect (y_i h_t(x_i) = -1): D_{t+1}(i) = D_t(i) e^{α_t} / Z_t, so the weight increases
Viola-Jones AdaBoost:
Update the weights: w_{t+1,i} = w_{t,i} β_t^{1-e_i},
where e_i = 0 if example x_i is classified correctly and e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t).
(Note: assume w_t is already normalized.)
• correct (e_i = 0): w_{t+1,i} = w_{t,i} β_t = w_{t,i} ε_t/(1 - ε_t); since ε_t < 0.5, the weight decreases
• incorrect (e_i = 1): w_{t+1,i} = w_{t,i}, so the weight does not change
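A side-by-side numeric check of the two update rules for ε_t = 0.3 (a MATLAB sketch):

  % Compare the two weight-update rules for eps_t = 0.3 (sketch).
  e = 0.3;
  a    = 0.5 * log((1 - e) / e);   % original AdaBoost confidence, 0.4236
  beta = e / (1 - e);              % Viola-Jones beta_t, 0.4286
  [exp(-a) exp(a)]   % original: correct weight x0.655, incorrect x1.528
  [beta 1]           % Viola-Jones: correct weight x0.429, incorrect unchanged
  % Both rules shift relative weight toward the misclassified samples.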
Face detection idea
• 1) In AdaBoost, use the axis-parallel (decision-stump) classifier. 2) In Viola-Jones, the weak classifier is the specially designed classifier described in the paper.

Useful features learned by boosting

A cascade of classifiers will be discussed in the next chapter.
Reference
• [Chen 2007] Qing Chen, Nicolas D. Georganas and Emil M. Petriu, "Real-Time Vision-Based Gesture Recognition Using Haar-like Features", IMTC 2007, Warsaw, Poland, May 1-3, 2007
• [Smyth 2007] slides: Smyth, "Face Detection using the Viola-Jones Method", http://www.ics.uci.edu/~smyth/courses/cs175/slides12_viola_jones_face_detection.ppt
• [Deng 2007] slides: Hongbo Deng, "A brief introduction to adaboost", 6 Feb 2007, http://dtpapers.googlecode.com/files/Hongbo.ppt
• [Freund] slides: "A Tutorial on Boosting", www.cs.toronto.edu/~hinton/csc321/notes/boosting.pdf
• [Hoiem 2004] slides: Derek Hoiem, "Adaboost", March 31, 2004, http://www.cs.uiuc.edu/~dhoiem/presentations/Adaboost_Tutorial.ppt
• [Jensen 2008] Jensen, "Implementing the Viola-Jones Face Detection Algorithm", http://orbit.dtu.dk/getResource?recordId=223656&objectId=1&versionId=1
• http://informatik.unibas.ch/lehre/ws05/cs232/_Folien/08_AdaBoost.pdf
• [Boris Babenko] Boris Babenko, "Note: A Derivation of Discrete AdaBoost", Department of Computer Science and Engineering, University of California, San Diego, http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf
• [Kroon 2010] http://www.mathworks.com/matlabcentral/fileexchange/27813-classic-adaboost-classifier
Matlab demo
• [Kroon 2010] http://www.mathworks.com/matlabcentral/fileexchange/27813-classic-adaboost-classifier
• http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html
Answer: Exercise 7.1, t=1, α_{t=1} = 1.1989

Answer: Exercise 7.1, t=2, α_{t=2} = 1.5223

Answer: Exercise 7.1, t=3, α_{t=3} = 1.4979

Answer: Exercise 7.1, strong classifier
(Figure: the strong classifier formed by combining h1, h2 and h3)

Answer 7.1: testing example 7.1
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
Use Step 4 of the AdaBoost algorithm to find CE_t.
Exercise 7.2 (take-home exercise)
Find the strong classifier from this training data set. Write clearly the types of h( ) (e.g. left = blue, right = red, threshold at u or v, etc.) and the values of ε and α of each stage t.
if example == 3   % assign 13
  blue = [ -23 19; 18 13;  3 23;  12 22 ];
  red  = [  13 18; -43 -3; 21 -14; 29 -8 ];
  datafeatures = [blue; red];
  dataclass = [ -1 -1 -1 -1 -1 1 1 1 1 1 ];
The final strong classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

Take-home exercise 7.2 on AdaBoost, t=0