ENSEMBLE LEARNING: ADABOOST

Jianping Fan, Dept of Computer Science, UNC-Charlotte
ENSEMBLE LEARNING

A machine learning paradigm where multiple learners are used to solve the same problem.

[Diagram: previously, a single learner was trained on a problem; in an ensemble, many learners are trained and combined.]
The generalization ability of the ensemble is usually significantly better than that of an individual learner
Boosting is one of the most important families of ensemble methods
A BRIEF HISTORY

Resampling for estimating a statistic:
- Bootstrapping

Resampling for classifier design:
- Bagging
- Boosting (Schapire 1989)
- AdaBoost (Schapire 1995)
BOOTSTRAP ESTIMATION

Used to estimate a statistic (parameter) and its variance:
- Repeatedly draw n samples from D
- For each set of samples, estimate the statistic
- The bootstrap estimate is the mean of the individual estimates
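To make the procedure concrete, here is a minimal bootstrap sketch in Python; the normal toy data, the mean as the statistic, and B = 1000 replicates are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)   # D: the observed sample (toy data)

B = 1000                                          # number of bootstrap replicates
estimates = np.empty(B)
for b in range(B):
    # repeatedly draw n samples from D, with replacement
    resample = rng.choice(data, size=len(data), replace=True)
    estimates[b] = resample.mean()                # the statistic for this replicate

print("bootstrap estimate of the mean:", estimates.mean())
print("bootstrap estimate of its variance:", estimates.var(ddof=1))
```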
BAGGING: BOOTSTRAP AGGREGATING

For i = 1 .. M:
- Draw n* < n samples from D with replacement
- Learn classifier Ci

The final classifier is a vote of C1 .. CM.
Bagging increases classifier stability and reduces variance.
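A minimal bagging sketch, under stated assumptions: scikit-learn decision trees as base learners, labels in {-1, +1}, full-size bootstrap samples (the slide's n* < n variant would just shrink the sample size), and M = 25 rounds:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, seed=0):
    rng = np.random.default_rng(seed)
    n, models = len(X), []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)          # bootstrap sample, with replacement
        models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (M, n_samples)
    return np.sign(votes.sum(axis=0))             # majority vote over C1 .. CM
```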
BAGGING

[Diagram: M random samples are drawn from the training set with replacement; a learner (ML) is trained on each to produce f1, f2, ..., fT, which are combined into the final classifier f.]
BOOSTING

[Diagram: the first learner (ML) is trained on the original training sample to produce f1; each subsequent learner is trained on a re-weighted sample to produce f2, ..., fT; the outputs are combined into the final classifier f.]
REVISIT BAGGING
BOOSTING CLASSIFIER
BAGGING VS BOOSTING
Bagging: the construction of complementary base learners is left to chance and to the instability of the learning method.

Boosting: actively seeks complementary base learners by training the next learner on the mistakes of the previous ones.
BOOSTING (SCHAPIRE 1989)

- Randomly select n1 < n samples from D without replacement to obtain D1; train weak learner C1
- Select n2 < n samples from D, with half of them misclassified by C1, to obtain D2; train weak learner C2
- Select all samples from D on which C1 and C2 disagree; train weak learner C3
- The final classifier is a vote of the weak learners
ADABOOST (SCHAPIRE 1995)

- Instead of sampling, re-weight the training examples
- The previous weak learner has only 50% accuracy over the new distribution
- Can be used to learn weak classifiers
- The final classification is based on a weighted vote of the weak classifiers
ADABOOST TERMS

- Learner = Hypothesis = Classifier
- Weak learner: < 50% error over any distribution
- Strong classifier: thresholded linear combination of weak learner outputs
ADABOOST = ADAPTIVE BOOSTING

A learning algorithm that builds a strong classifier from a lot of weaker ones.
ADABOOST CONCEPT

Weak classifiers, each slightly better than random:
$$h_1(x) \in \{-1,+1\}, \quad h_2(x) \in \{-1,+1\}, \quad \dots, \quad h_T(x) \in \{-1,+1\}$$

Strong classifier:
$$H_T(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
WEAKER CLASSIFIERS

The weak classifiers $h_t(x) \in \{-1,+1\}$, each slightly better than random, feed the strong classifier $H_T(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.

- Each weak classifier learns by considering one simple feature
- The T most beneficial features for classification should be selected
- How to:
  - define features?
  - select beneficial features?
  - train weak classifiers?
  - manage (weight) training samples?
  - associate a weight with each weak classifier?

A decision-stump sketch of such a single-feature weak classifier follows this list.
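As a concrete (hypothetical) instance of a weak classifier built on one simple feature, here is a decision stump; the fit_stump/stump_predict names and the exhaustive threshold search are illustrative choices, not from the slides:

```python
import numpy as np

def fit_stump(X, y, weights):
    """Search (feature, threshold, polarity) for the lowest weighted error w.r.t. D_t."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)                    # (error, feature, threshold, polarity)
    for j in range(d):                            # each stump considers one feature
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] > thr, 1, -1)
                err = weights[pred != y].sum()    # weighted error
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, X):
    _, j, thr, pol = stump
    return pol * np.where(X[:, j] > thr, 1, -1)   # h(x) in {-1, +1}
```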
THE STRONG CLASSIFIERS

Combining the weak classifiers $h_t(x) \in \{-1,+1\}$, each slightly better than random, gives the strong classifier $H_T(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.

How good will the strong one be?
THE ADABOOST ALGORITHM

Given: $(x_1, y_1), \dots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1,+1\}$

Initialization: $D_1(i) = \frac{1}{m}$, $i = 1, \dots, m$ ($D_t(i)$: probability distribution over the $x_i$'s at time $t$)

For $t = 1, \dots, T$:
- Find the classifier $h_t : X \to \{-1,+1\}$ that minimizes the error w.r.t. $D_t$, i.e., $h_t = \arg\min_{h_j} \epsilon_j$ where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_j(x_i)]$ (minimize the weighted error)
- Weight the classifier: $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$ (chosen to minimize the exponential loss)
- Update the distribution: $D_{t+1}(i) = \frac{D_t(i)\exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is a normalization factor (gives misclassified patterns more chance to be learned)
THE ADABOOST ALGORITHM

The loop above outputs the final classifier:
$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
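Putting the steps together, a compact sketch of the loop above; it reuses the hypothetical fit_stump/stump_predict helpers from the weak-classifier section as its base learner:

```python
import numpy as np

def adaboost_fit(X, y, T=50):
    m = len(y)
    D = np.full(m, 1.0 / m)                       # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        stump = fit_stump(X, y, D)                # h_t: minimize error w.r.t. D_t
        pred = stump_predict(stump, X)
        eps = D[pred != y].sum()                  # epsilon_t, the weighted error
        if eps == 0 or eps >= 0.5:                # no longer a useful weak learner
            break
        alpha = 0.5 * np.log((1 - eps) / eps)     # alpha_t = (1/2) ln((1-eps)/eps)
        D *= np.exp(-alpha * y * pred)            # up-weight mistakes, down-weight hits
        D /= D.sum()                              # normalize by Z_t
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)                         # H(x) = sign(sum_t alpha_t h_t(x))
```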
BOOSTING ILLUSTRATION
Weak Classifier 1
BOOSTING ILLUSTRATION
Weights Increased
THE ADABOOST ALGORITHM

$$D_{t+1}(i) = \frac{D_t(i)\exp[-\alpha_t y_i h_t(x_i)]}{Z_t},$$
where $Z_t$ is a normalization factor and typically $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$.

The weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set.
BOOSTING ILLUSTRATION
Weak Classifier 2
BOOSTING ILLUSTRATION
Weights Increased
BOOSTING ILLUSTRATION
Weak Classifier 3
BOOSTING ILLUSTRATION
The final classifier is a combination of the weak classifiers.
THE ADABOOST ALGORITHM

Recall the full algorithm above. What goal does AdaBoost want to reach?
THE ADABOOST ALGORITHM

The two design choices in the loop, the classifier weight $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$ and the distribution update, are goal dependent.
GOAL

Minimize the exponential loss:
$$\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$$

Final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
GOAL

Minimizing the exponential loss $E_{x,y}\left[e^{-yH(x)}\right]$ maximizes the margin $yH(x)$: the loss $e^{-yH(x)}$ shrinks as the margin grows.
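A quick numeric illustration (toy margin values, not from the slides) of how the loss decays with the margin:

```python
import numpy as np

margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # toy values of y*H(x)
print(np.exp(-margins))  # [7.389 1.649 1.000 0.607 0.135]: larger margin, smaller loss
```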
GOAL

Final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Minimize $\mathrm{loss}_{\exp}(H(x)) = E_{x,y}\left[e^{-yH(x)}\right]$.

Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$; then $H(x) = H_T(x)$.

$$E_{x,y}\left[e^{-yH_t(x)}\right] = E_x\left[E_{y|x}\left[e^{-yH_t(x)} \mid x\right]\right] = E_x\left[E_{y|x}\left[e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x\right]\right]$$
$$= E_x\left[E_{y|x}\left[e^{-yH_{t-1}(x)}\,e^{-\alpha_t y h_t(x)} \mid x\right]\right] = E_x\left[e^{-yH_{t-1}(x)}\left(e^{-\alpha_t}P(y = h_t(x)) + e^{\alpha_t}P(y \ne h_t(x))\right)\right]$$
To choose $\alpha_t$, set the derivative of the loss to zero:
$$\frac{\partial\, E_{x,y}\left[e^{-yH_t(x)}\right]}{\partial \alpha_t} = 0$$
$$E_x\left[e^{-yH_{t-1}(x)}\left(-e^{-\alpha_t}P(y = h_t(x)) + e^{\alpha_t}P(y \ne h_t(x))\right)\right] = 0$$
What is $\alpha_t$?
Solving for $\alpha_t$ gives
$$\alpha_t = \frac{1}{2}\ln\frac{P(y = h_t(x))}{P(y \ne h_t(x))} = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t},$$
where $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_t(x_i)]$, taking $P(x_i, y_i) \propto D_t(i)$.
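As a quick sanity check with illustrative error rates (not from the slides): accurate weak learners get large weights, and a coin-flip learner gets weight zero.

$$\epsilon_t = 0.1 \;\Rightarrow\; \alpha_t = \tfrac{1}{2}\ln\tfrac{0.9}{0.1} \approx 1.10, \qquad \epsilon_t = 0.4 \;\Rightarrow\; \alpha_t = \tfrac{1}{2}\ln\tfrac{0.6}{0.4} \approx 0.20, \qquad \epsilon_t = 0.5 \;\Rightarrow\; \alpha_t = 0.$$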
This is exactly the classifier weight used in the algorithm box: $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$ with $\epsilon_t = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_t(x_i)]$.
The remaining question: what should the updated distribution $D_{t+1}$ be?
Expand the loss at round $t$:
$$E_{x,y}\left[e^{-yH_t}\right] = E_{x,y}\left[e^{-yH_{t-1}}\,e^{-\alpha_t y h_t}\right]$$
Using the second-order Taylor approximation $e^{-\alpha_t y h_t} \approx 1 - \alpha_t y h_t + \frac{1}{2}\alpha_t^2 y^2 h_t^2$:
$$h_t = \arg\min_h E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \tfrac{1}{2}\alpha_t^2 y^2 h^2\right)\right]$$
Since $y^2 h^2 = 1$,
$$h_t = \arg\min_h E_{x,y}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \tfrac{1}{2}\alpha_t^2\right)\right] = \arg\min_h E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\left(1 - \alpha_t y h + \tfrac{1}{2}\alpha_t^2\right) \Big|\, x\right]\right]$$
Dropping the terms that do not depend on $h$ (and using $\alpha_t > 0$):
$$h_t = \arg\max_h E_x\left[E_{y|x}\left[e^{-yH_{t-1}}\,y\,h(x) \mid x\right]\right]$$
$$= \arg\max_h E_x\left[(+1)\,h(x)\,e^{-H_{t-1}(x)}P(y = 1 \mid x) + (-1)\,h(x)\,e^{H_{t-1}(x)}P(y = -1 \mid x)\right]$$
Equivalently, under the tilted distribution $x, y \sim e^{-yH_{t-1}(x)}P(y|x)$:
$$h_t = \arg\max_h E_{x,\, y \sim e^{-yH_{t-1}(x)}P(y|x)}\left[y\,h(x)\right]$$
which is maximized when $h(x)$ matches the sign of the tilted conditional expectation of $y$:
$$h_t(x) = \mathrm{sign}\left(E_{x,\, y \sim e^{-yH_{t-1}(x)}P(y|x)}\left[y \mid x\right]\right) = \mathrm{sign}\left(P_{y \sim e^{-yH_{t-1}(x)}P(y|x)}(y = 1 \mid x) - P_{y \sim e^{-yH_{t-1}(x)}P(y|x)}(y = -1 \mid x)\right)$$
So at time $t$, the best weak classifier is the one trained as if the examples were drawn from $x, y \sim e^{-yH_{t-1}(x)}P(y|x)$.
At time $1$: $x, y \sim P(y|x)$ with $P(y_i|x_i) = 1$, so $D_1(i) = \frac{1}{m}$ (taking $Z_1 = m$).

At time $t+1$: $e^{-yH_t(x)}P(y|x) = e^{-yH_{t-1}(x)}P(y|x)\,e^{-\alpha_t y h_t(x)} \propto D_t\,e^{-\alpha_t y h_t(x)}$.

This recovers the distribution update in the algorithm:
$$D_{t+1}(i) = \frac{D_t(i)\exp[-\alpha_t y_i h_t(x_i)]}{Z_t}, \quad Z_t \text{ a normalization factor.}$$
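A tiny numeric check of this update (the five-example toy vectors are illustrative): after one round, the misclassified examples together carry exactly half of the total weight.

```python
import numpy as np

D = np.full(5, 0.2)                               # uniform D_t over five examples
y    = np.array([ 1,  1, -1, -1,  1])
pred = np.array([ 1, -1, -1,  1,  1])             # h_t misclassifies examples 2 and 4
eps = D[pred != y].sum()                          # epsilon_t = 0.4
alpha = 0.5 * np.log((1 - eps) / eps)             # ~0.203
D_next = D * np.exp(-alpha * y * pred)
D_next /= D_next.sum()                            # divide by Z_t
print(D_next)                   # [0.167 0.25 0.167 0.25 0.167]
print(D_next[pred != y].sum())  # 0.5: mistakes now hold half the probability mass
```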
PROS AND CONS OF ADABOOST

Advantages:
- Very simple to implement
- Does feature selection, resulting in a relatively simple classifier
- Fairly good generalization

Disadvantages:
- Suboptimal solution
- Sensitive to noisy data and outliers
INTUITION
Train a set of weak hypotheses: h1, …, hT.
The combined hypothesis H is a weighted majority vote of the T weak hypotheses. Each hypothesis ht has a weight αt.
During the training, focus on the examples that are misclassified. At round t, example xi has the weight Dt(i).
BASIC SETTING

- Binary classification problem
- Training data: $(x_1, y_1), \dots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in Y = \{-1, +1\}$
- $D_t(i)$: the weight of $x_i$ at round $t$; $D_1(i) = 1/m$
- A learner $L$ that finds a weak hypothesis $h_t : X \to Y$ given the training set and $D_t$
- The error of a weak hypothesis $h_t$: $\epsilon_t = \sum_{i:\,h_t(x_i) \ne y_i} D_t(i)$
THE BASIC ADABOOST ALGORITHM

For $t = 1, \dots, T$:
- Train the weak learner using the training data and $D_t$
- Get $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \sum_{i:\,h_t(x_i) \ne y_i} D_t(i)$
- Choose $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$
- Update
$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \ne y_i \end{cases} \;=\; \frac{D_t(i)\,e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$
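To close the loop, a hypothetical usage of the adaboost_fit/adaboost_predict sketch from earlier, on toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)        # toy linearly separable labels

ensemble = adaboost_fit(X, y, T=20)
acc = (adaboost_predict(ensemble, X) == y).mean()
print(f"training accuracy: {acc:.2f}")
```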
THE GENERAL ADABOOST ALGORITHM