SemiBoost : Boosting for Semi-supervised Learning
Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE,
Anil K. Jain, Fellow, IEEE, and Yi Liu, Student Member, IEEE
Presented by Yueng-Tien Lo. Reference: P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu, "SemiBoost: Boosting for Semi-supervised Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 31(11):2000-2014, 2009.
outline
• Introduction
• Related Work
• Semi-Supervised Boosting
• Results and Discussion
• Conclusions and Future Work
Introduction
• The key idea of semi-supervised learning, specifically semi-supervised classification, is to exploit both labeled and unlabeled data to learn a classification model.
• There is an immense need for algorithms that can utilize the small amount of labeled data, combined with the large amount of unlabeled data to build efficient classification systems.
Introduction
• Existing semi-supervised classification algorithms may be classified into two categories based on their underlying assumptions:
  – manifold assumption
  – cluster assumption
Introduction
• manifold assumption : the data lie on a low dimensional manifold in the input space
• cluster assumption : the data samples with high similarity between them must share the same label.
Introduction
• cluster/manifold assumption
Introduction
• Most semi-supervised learning approaches design specialized learning algorithms to effectively utilize both labeled and unlabeled data.
• We refer to this problem of improving the performance of any supervised learning algorithm using unlabeled data as Semi-supervised Improvement, to distinguish our work from the standard semi-supervised learning problems.
Introduction
• The key difficulties in designing SemiBoost are:
  – how to sample the unlabeled examples for training a new classification model at each iteration
  – what class labels should be assigned to the selected unlabeled examples.
Introduction
• One way to address the above questions is to exploit both the clustering assumption and the large margin criterion.
• Select the unlabeled examples with the highest classification confidence and assign them the class labels predicted by the current classifier.
• A problem with this strategy is that the introduction of examples with predicted class labels may only help to increase the classification margin, without actually providing any novel information to the classifier.
Introduction
• To overcome the above problem, we propose using the pairwise similarity measurements to guide the selection of unlabeled examples at each iteration, as well as for assigning class labels to them.
Related Work
Related Work
• An inductive algorithm can be used to predict the labels of samples that are unseen during training (irrespective of whether those samples were labeled or unlabeled).
• Transductive algorithms are limited to predicting only the labels of the unlabeled samples seen during training.
Related Work
• A popular way to define the inconsistency between the labels y = {y_i}_{i=1}^n of the samples x = {x_i}_{i=1}^n and the pairwise similarities S_{ij} is the quadratic criterion:

F(y, S) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} S_{i,j} (y_i - y_j)^2 = y^T L y

where L is the combinatorial graph Laplacian.
• The task is to assign values to the unknown labels in such a way that the overall inconsistency is minimized.
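The quadratic criterion can be checked numerically on a toy graph. The similarity values and labels below are hypothetical, chosen only for illustration; the identity between the pairwise sum and the Laplacian quadratic form holds for any symmetric S:

```python
import math

# Hypothetical 3-node graph: symmetric similarity matrix S and labels y.
S = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.5],
     [0.1, 0.5, 0.0]]
y = [1.0, 1.0, -1.0]
n = len(y)

# Pairwise inconsistency: 0.5 * sum_ij S_ij (y_i - y_j)^2
F_pairwise = 0.5 * sum(S[i][j] * (y[i] - y[j]) ** 2
                       for i in range(n) for j in range(n))

# The same quantity via the graph Laplacian L = D - S: F = y^T L y,
# where D is the diagonal degree matrix.
deg = [sum(S[i]) for i in range(n)]
L = [[(deg[i] if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]
F_laplacian = sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))

assert math.isclose(F_pairwise, F_laplacian)
```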
Semi-Supervised Improvement
• Let D = {x_1, x_2, ..., x_n} denote the entire data set, including both the labeled and the unlabeled examples.
  – the first n_l examples are labeled, given by y_l = (y_1^l, y_2^l, ..., y_{n_l}^l)
  – y_u = (y_1^u, y_2^u, ..., y_{n_u}^u) denotes the imputed class labels of the unlabeled examples
• Let S = [S_{i,j}]_{n \times n} denote the symmetric similarity matrix, where S_{i,j} \ge 0 represents the similarity between x_i and x_j.
  – S_{ll} denotes the n_l \times n_l submatrix of the similarity matrix over the labeled examples
  – S_{lu} denotes the n_l \times n_u submatrix between the labeled and the unlabeled examples
• Let A denote the given supervised learning algorithm.
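A minimal sketch of this notation, assuming a toy data set with n_l = 2 labeled and n_u = 2 unlabeled points (all similarity values are hypothetical):

```python
# Toy symmetric similarity matrix over n = n_l + n_u = 4 points;
# the first n_l rows/columns correspond to the labeled examples.
n_l, n_u = 2, 2
S = [[0.0, 0.3, 0.7, 0.2],
     [0.3, 0.0, 0.1, 0.9],
     [0.7, 0.1, 0.0, 0.4],
     [0.2, 0.9, 0.4, 0.0]]

# S_ll: n_l x n_l block over labeled points.
S_ll = [row[:n_l] for row in S[:n_l]]
# S_lu: n_l x n_u block between labeled rows and unlabeled columns.
S_lu = [row[n_l:] for row in S[:n_l]]

assert len(S_ll) == n_l and len(S_ll[0]) == n_l
assert len(S_lu) == n_l and len(S_lu[0]) == n_u
# S must be symmetric with non-negative entries.
assert all(S[i][j] == S[j][i] >= 0 for i in range(4) for j in range(4))
```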
Semi-Supervised Improvement
• The goal of semi-supervised improvement is to improve the performance of ‘A ‘ iteratively by treating ‘A ‘ like a black box, using the unlabeled examples and the pairwise similarity S.
• In the semi-supervised improvement problem, we aim to build an ensemble classifier which utilizes the unlabeled samples in the way a graph-based approach would utilize them.
An outline of the SemiBoost algorithm for semi-supervised improvement
• Start with an empty ensemble.
• At each iteration:
  – Compute the pseudo label (and its confidence) for each unlabeled example (using the existing ensemble and the pairwise similarity).
  – Sample the most confident pseudo-labeled examples; combine them with the labeled samples and train a component classifier using the supervised learning algorithm A.
  – Update the ensemble by including the component classifier with an appropriate weight.
SemiBoost
• The unlabeled samples must be assigned labels following two main criteria:
  – The points with high similarity among unlabeled samples must share the same label.
  – Those unlabeled samples which are highly similar to a labeled sample must share its label.
• Our objective function F(y, S) is a combination of two terms:
  – F_l(y, S), measuring the inconsistency between labeled and unlabeled examples
  – F_u(y_u, S), measuring the inconsistency among the unlabeled examples
SemiBoost
• Inspired by the harmonic function approach, we define F_u(y_u, S), the inconsistency between the class labels y and the similarity measurement S, as

F_u(y_u, S) = \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(y_i^u - y_j^u)   (1)

• Note that (1) can be expanded, due to the symmetry of S, as

F_u(y_u, S) = \frac{1}{2} \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \left( \exp(y_i^u - y_j^u) + \exp(y_j^u - y_i^u) \right)
SemiBoost
• We have

F_u(y_u, S) = \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \cosh(y_i^u - y_j^u)   (2)

where \cosh(y_i - y_j) = (\exp(y_i - y_j) + \exp(-(y_i - y_j)))/2 is the hyperbolic cosine function.
• Rewriting (1) using the \cosh(\cdot) function reveals the connection between the quadratic penalty used in the graph-Laplacian-based approaches and the exponential penalty used in the current approach.
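For a symmetric S, the exponential form in (1) and the cosh form in (2) give the same value, since the cross terms pair up. A numeric spot-check on hypothetical toy values:

```python
import math

# Hypothetical imputed labels for three unlabeled points, and a
# symmetric similarity matrix over them.
y_u = [0.7, -0.4, 1.2]
S = [[0.0, 0.6, 0.2],
     [0.6, 0.0, 0.5],
     [0.2, 0.5, 0.0]]
n_u = len(y_u)

# Eq. (1): exponential form of the inconsistency.
F_exp = sum(S[i][j] * math.exp(y_u[i] - y_u[j])
            for i in range(n_u) for j in range(n_u))

# Eq. (2): the same quantity written with cosh, using
# cosh(a) = (exp(a) + exp(-a)) / 2 and the symmetry of S.
F_cosh = sum(S[i][j] * math.cosh(y_u[i] - y_u[j])
             for i in range(n_u) for j in range(n_u))

assert math.isclose(F_exp, F_cosh)
```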
SemiBoost
• Using a \cosh(\cdot) penalty not only facilitates the derivation of boosting-based algorithms but also increases the classification margin.
• The inconsistency F_l(y, S) between labeled and unlabeled examples is defined as

F_l(y, S) = \sum_{i=1}^{n_l} \sum_{j=1}^{n_u} S_{i,j}^{lu} \exp(-2 y_i^l y_j^u)   (3)
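Equation (3) penalizes an unlabeled example exponentially when it disagrees with a similar labeled point. In the toy check below (hypothetical labels and similarities), an assignment that agrees with its labeled neighbors scores lower than the sign-flipped assignment:

```python
import math

# Hypothetical labeled labels, imputed unlabeled labels, and the
# labeled-unlabeled similarity block S_lu (rows: labeled, cols: unlabeled).
y_l = [1, -1]
y_u = [0.8, -0.3]
S_lu = [[0.7, 0.2],
        [0.1, 0.9]]

def F_l(y_u):
    # Eq. (3): sum_i sum_j S_ij exp(-2 * y_i^l * y_j^u)
    return sum(S_lu[i][j] * math.exp(-2.0 * y_l[i] * y_u[j])
               for i in range(len(y_l)) for j in range(len(y_u)))

# Agreement with highly similar labeled points is penalized lightly;
# flipping the imputed labels raises the penalty sharply.
assert F_l(y_u) < F_l([-v for v in y_u])
```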
SemiBoost
• Combining (1) and (3) leads to the objective function

F(y, S) = F_l(y, S) + C F_u(y_u, S)   (4)

• The constant C is introduced to weight the importance between the labeled and the unlabeled data.
• Given the objective function in (4), the optimal class label y_u is found by minimizing F.
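The combined objective can be sketched as follows. The helper names and toy values are assumptions for illustration; the key point is that a labeling consistent with the similarities scores lower than its sign-flipped counterpart:

```python
import math

def F_u(y_u, S_uu):
    """Eq. (1): inconsistency among unlabeled examples."""
    n = len(y_u)
    return sum(S_uu[i][j] * math.exp(y_u[i] - y_u[j])
               for i in range(n) for j in range(n))

def F_l(y_l, y_u, S_lu):
    """Eq. (3): inconsistency between labeled and unlabeled examples."""
    return sum(S_lu[i][j] * math.exp(-2.0 * y_l[i] * y_u[j])
               for i in range(len(y_l)) for j in range(len(y_u)))

def F(y_l, y_u, S_lu, S_uu, C=1.0):
    """Eq. (4): combined objective F = F_l + C * F_u."""
    return F_l(y_l, y_u, S_lu) + C * F_u(y_u, S_uu)

# Hypothetical toy data.
y_l = [1, -1]
S_lu = [[0.7, 0.2], [0.1, 0.9]]
S_uu = [[0.0, 0.4], [0.4, 0.0]]

# A labeling consistent with the labeled neighbors scores lower than
# the flipped one.
good, bad = [0.8, -0.3], [-0.8, 0.3]
assert F(y_l, good, S_lu, S_uu) < F(y_l, bad, S_lu, S_uu)
```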
SemiBoost
• The problem can now be formally expressed as

\min_{y_u} F(y, S)  s.t.  \hat{y}_i^l = y_i^l,  i = 1, ..., n_l   (5)

• The following procedure is adopted to derive the boosting algorithm:
  – The labels y_i^u for the unlabeled samples are replaced by the ensemble predictions over the corresponding data samples.
  – A bound-optimization-based approach is then used to find the ensemble classifier minimizing the objective function.
  – The bounds are simplified further to obtain the sampling scheme and other required parameters.
The SemiBoost algorithm
• Compute the pairwise similarity S_{i,j} between every pair of examples.
• Initialize H(x) = 0.
• For t = 1, 2, ..., T:
  – Compute p_i and q_i for every example using Equations (9) and (10).
  – Compute the class label z_i = sign(p_i − q_i) for each example.
  – Sample examples x_i by the weight |p_i − q_i|.
  – Apply the algorithm A to train a binary classifier h_t(x) using the sampled examples and their class labels z_i.
  – Compute α_t using Equation (11).
  – Update the classification function as H(x) ← H(x) + α_t h_t(x).
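The loop above can be sketched end-to-end. This is a simplified illustration, not the authors' implementation: the base learner is a hypothetical 1-D threshold stump, probabilistic sampling is replaced by greedily taking the most confident half of the unlabeled points, and a small epsilon guards the log in Eq. (11):

```python
import math

def semiboost(X, y_l, n_l, similarity, base_learner, C=1.0, T=10):
    """Sketch of the SemiBoost loop; the base learner is a black box."""
    n = len(X)
    n_u = n - n_l
    S = [[similarity(X[i], X[j]) for j in range(n)] for i in range(n)]
    H = [0.0] * n                  # ensemble scores H(x) on all points
    ensemble = []                  # list of (alpha_t, h_t) pairs
    for _ in range(T):
        p, q = [], []
        for i in range(n_l, n):    # unlabeled examples only
            # Eq. (9)/(10): confidence for the positive / negative class.
            pi = sum(S[j][i] * math.exp(-2.0 * H[i])
                     for j in range(n_l) if y_l[j] == 1)
            pi += C / 2.0 * sum(S[i][j] * math.exp(H[j] - H[i])
                                for j in range(n_l, n))
            qi = sum(S[j][i] * math.exp(2.0 * H[i])
                     for j in range(n_l) if y_l[j] == -1)
            qi += C / 2.0 * sum(S[i][j] * math.exp(H[i] - H[j])
                                for j in range(n_l, n))
            p.append(pi)
            q.append(qi)
        z = [1 if pi > qi else -1 for pi, qi in zip(p, q)]  # pseudo labels
        w = [abs(pi - qi) for pi, qi in zip(p, q)]          # sampling weights
        # Greedy stand-in for sampling: keep the most confident half.
        order = sorted(range(n_u), key=lambda i: -w[i])[:max(1, n_u // 2)]
        Xt = [X[j] for j in range(n_l)] + [X[n_l + i] for i in order]
        yt = list(y_l) + [z[i] for i in order]
        h = base_learner(Xt, yt)
        hu = [h(X[n_l + i]) for i in range(n_u)]
        # Eq. (11): combination weight.
        num = sum(pi for pi, hi in zip(p, hu) if hi == 1) + \
              sum(qi for qi, hi in zip(q, hu) if hi == -1)
        den = sum(pi for pi, hi in zip(p, hu) if hi == -1) + \
              sum(qi for qi, hi in zip(q, hu) if hi == 1)
        alpha = 0.25 * math.log((num + 1e-12) / (den + 1e-12))
        if alpha <= 0:             # stopping criterion
            break
        ensemble.append((alpha, h))
        H = [Hi + alpha * h(x) for Hi, x in zip(H, X)]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

def stump_learner(X, y):
    """Minimal base classifier: best threshold stump on 1-D inputs."""
    best = None
    for t in set(X):
        for s in (1, -1):
            err = sum(1 for x, yi in zip(X, y)
                      if s * (1 if x >= t else -1) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    _, t, s = best
    return lambda x: s * (1 if x >= t else -1)

# Two 1-D clusters with only one labeled point per cluster.
X = [0.0, 10.0, 0.5, 1.0, 1.5, 9.0, 9.5, 10.5]
y_l = [-1, 1]
sim = lambda a, b: math.exp(-abs(a - b) ** 2 / 8.0)
clf = semiboost(X, y_l, n_l=2, similarity=sim, base_learner=stump_learner)
assert [clf(x) for x in [0.2, 1.2, 9.8, 10.2]] == [-1, -1, 1, 1]
```

With two labeled points alone, a supervised stump could sit anywhere between the clusters; the similarity-driven pseudo labels pull the ensemble toward the cluster boundary, which is the intended semi-supervised improvement.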
SemiBoost-Algorithm
• Let h^t(x): X → {−1, +1} denote the two-class classification model that is learned at the t-th iteration by the algorithm A.
• Let H(x): X → R denote the combined classification model learned after the first T iterations:

H(x) = \sum_{t=1}^{T} \alpha_t h^t(x)

where \alpha_t > 0 is the combination weight.
SemiBoost-Algorithm
• This leads to the following optimization problem:

\min_{h(x), \alpha} \sum_{i=1}^{n_l} \sum_{j=1}^{n_u} S_{i,j}^{lu} \exp(-2 y_i^l (H_j + \alpha h_j)) + C \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i - H_j) \exp(\alpha (h_i - h_j))   (6)

s.t.  h(x_i) = y_i^l,  i = 1, ..., n_l   (7)

• This expression involves products of the variables \alpha and h_i, making it nonlinear and, hence, difficult to optimize.
SemiBoost-Algorithm (prop.1)
• Minimizing (6) is equivalent to minimizing the function

F_1 = \sum_{i=1}^{n_u} \exp(-2 \alpha h_i) p_i + \exp(2 \alpha h_i) q_i   (8)

where

p_i = \sum_{j=1}^{n_l} S_{j,i}^{lu} e^{-2 H_i} \delta(y_j, 1) + \frac{C}{2} \sum_{j=1}^{n_u} S_{i,j}^{uu} e^{H_j - H_i}   (9)

q_i = \sum_{j=1}^{n_l} S_{j,i}^{lu} e^{2 H_i} \delta(y_j, -1) + \frac{C}{2} \sum_{j=1}^{n_u} S_{i,j}^{uu} e^{H_i - H_j}   (10)

and \delta(x, y) = 1 when x = y, and 0 otherwise.
The quantities p_i and q_i can be interpreted as the confidence in classifying the unlabeled example x_i into the positive class and the negative class, respectively.
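A small numeric sketch of p_i and q_i at the first iteration (H = 0), with hypothetical similarity blocks. At H = 0 the unlabeled parts of p_i and q_i are identical, so the pseudo label sign(p_i − q_i) is driven entirely by the labeled neighbors:

```python
import math

# Hypothetical setup: 2 labeled points, 3 unlabeled points.
y_l = [1, -1]
S_lu = [[0.7, 0.2, 0.1],      # labeled rows x unlabeled columns
        [0.1, 0.8, 0.3]]
S_uu = [[0.0, 0.5, 0.2],
        [0.5, 0.0, 0.4],
        [0.2, 0.4, 0.0]]
H = [0.0, 0.0, 0.0]           # ensemble scores at the first iteration
C = 1.0
n_l, n_u = 2, 3

def confidences(i):
    # Eq. (9): p_i, confidence for the positive class.
    p = sum(S_lu[j][i] * math.exp(-2.0 * H[i])
            for j in range(n_l) if y_l[j] == 1)
    p += C / 2.0 * sum(S_uu[i][j] * math.exp(H[j] - H[i]) for j in range(n_u))
    # Eq. (10): q_i, confidence for the negative class.
    q = sum(S_lu[j][i] * math.exp(2.0 * H[i])
            for j in range(n_l) if y_l[j] == -1)
    q += C / 2.0 * sum(S_uu[i][j] * math.exp(H[i] - H[j]) for j in range(n_u))
    return p, q

# Unlabeled example 1 is most similar to the negative labeled point
# (0.8 vs 0.2), so q dominates and the pseudo label is z_1 = -1.
p1, q1 = confidences(1)
assert q1 > p1 > 0
```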
SemiBoost-Algorithm (prop.2)
• The expression in (8) is difficult to optimize since the weight \alpha and the classifier h(x) are coupled together.
• Minimizing (8) is equivalent to minimizing its upper bound:

F_1 \le \sum_{i=1}^{n_u} (p_i + q_i) \left( e^{2\alpha} + e^{-2\alpha} - 1 \right) - 2 \sum_{i=1}^{n_u} \alpha h_i (p_i - q_i)

• We denote the upper bound in the above equation by F_2.
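The bound can be spot-checked numerically on toy values of p_i, q_i, and h_i (hypothetical numbers, several values of α):

```python
import math

# Hypothetical confidences and base-classifier outputs.
p = [0.6, 1.2, 0.3]
q = [0.9, 0.4, 0.7]
h = [-1, 1, -1]

for alpha in [0.05, 0.3, 1.0]:
    # F_1 from Eq. (8).
    F1 = sum(pi * math.exp(-2 * alpha * hi) + qi * math.exp(2 * alpha * hi)
             for pi, qi, hi in zip(p, q, h))
    # The upper bound F_2.
    F2 = sum((pi + qi) * (math.exp(2 * alpha) + math.exp(-2 * alpha) - 1)
             - 2 * alpha * hi * (pi - qi)
             for pi, qi, hi in zip(p, q, h))
    assert F1 <= F2 + 1e-12
```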
SemiBoost-Algorithm (prop.3)
• To minimize F_2, the optimal class label z_i for the example x_i is z_i = sign(p_i − q_i), and the weight for sampling example x_i is |p_i − q_i|. The optimal \alpha that minimizes F_1 is

\alpha = \frac{1}{4} \ln \frac{\sum_{i=1}^{n_u} p_i \delta(h_i, 1) + \sum_{i=1}^{n_u} q_i \delta(h_i, -1)}{\sum_{i=1}^{n_u} p_i \delta(h_i, -1) + \sum_{i=1}^{n_u} q_i \delta(h_i, 1)}   (11)
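A direct transcription of (11), checked on hypothetical toy confidences. A classifier that agrees everywhere with sign(p_i − q_i) gets a positive weight; flipping it everywhere makes the weight negative, which is the stopping condition:

```python
import math

def delta(a, b):
    """delta(x, y) = 1 when x == y, else 0."""
    return 1 if a == b else 0

def alpha_star(p, q, h):
    """Eq. (11): optimal combination weight for the new classifier h."""
    num = sum(pi * delta(hi, 1) for pi, hi in zip(p, h)) + \
          sum(qi * delta(hi, -1) for qi, hi in zip(q, h))
    den = sum(pi * delta(hi, -1) for pi, hi in zip(p, h)) + \
          sum(qi * delta(hi, 1) for qi, hi in zip(q, h))
    return 0.25 * math.log(num / den)

p = [0.9, 0.2, 0.8]
q = [0.1, 0.7, 0.3]
h = [1, -1, 1]                     # matches sign(p_i - q_i) everywhere

assert alpha_star(p, q, h) > 0
assert alpha_star(p, q, [-hi for hi in h]) < 0
```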
SemiBoost-Algorithm
• At each relaxation, the “touch-point” is maintained between the objective function and the upper bound.
• As a result, the procedure guarantees: – The objective function always decreases through iterations– The final solution converges to a local minimum.
The SemiBoost algorithm
• Compute the pairwise similarity S_{i,j} between every pair of examples.
• Initialize H(x) = 0.
• For t = 1, 2, ..., T:
  – Compute p_i and q_i for every example using Equations (9) and (10).
  – Compute the class label z_i = sign(p_i − q_i) for each example.
  – Sample examples x_i by the weight |p_i − q_i|.
  – Apply the algorithm A to train a binary classifier h_t(x) using the sampled examples and their class labels z_i.
  – Compute α_t using Equation (11).
  – Update the classification function as H(x) ← H(x) + α_t h_t(x).
Implementation- Sampling
• Sampling forms the most important step in SemiBoost, just like in any other boosting algorithm. The sampling criterion usually considers the following issues:
  – 1) How many samples must be selected from the unlabeled samples available for training?
    • The choice is currently made empirically; selecting the top 10 percent of the samples seems to work well in practice.
  – 2) According to what distribution must the sampling be done?
    • The sampling is done probabilistically according to the distribution:

P_s(x_i) = \frac{|p_i - q_i|}{\sum_{i=1}^{n_u} |p_i - q_i|}
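A sketch of this sampling distribution on hypothetical confidences. An example with p_i = q_i carries no confidence either way, so it gets zero probability and is never sampled:

```python
import math
import random

# Hypothetical confidences for four unlabeled examples.
p = [0.9, 0.2, 0.8, 0.5]
q = [0.1, 0.7, 0.3, 0.5]

# P_s(x_i) = |p_i - q_i| / sum_i |p_i - q_i|
w = [abs(pi - qi) for pi, qi in zip(p, q)]
P_s = [wi / sum(w) for wi in w]

assert math.isclose(sum(P_s), 1.0)
assert P_s[3] == 0.0                      # p_3 == q_3: zero confidence

# Drawing from the distribution (the slides also mention simply taking
# the top 10 percent by weight).
random.seed(0)
picked = random.choices(range(4), weights=P_s, k=10)
assert 3 not in picked
```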
Implementation- Stopping Criterion
• According to the optimization procedure, SemiBoost stops when \alpha \le 0, indicating that the addition of that classifier would increase the objective function instead of decreasing it.
• We currently use an empirically chosen fixed number of classifiers in the ensemble, specified as a parameter T. We set the value of T = 20.
Results and Discussion
Conclusions and Future Work
• The strength of SemiBoost lies in its ability to improve the performance of any given base classifier in the presence of unlabeled samples.
• We are working toward obtaining theoretical results that will guarantee the performance of SemiBoost, when the similarity matrix reveals the underlying structure of data (e.g., the probability that two points may share the same class).
Appendix

Using the convexity of the exponential function,

\exp(\alpha (h_i - h_j)) \le \frac{1}{2} \left( \exp(2 \alpha h_i) + \exp(-2 \alpha h_j) \right)

so the objective in (6) is bounded as

F \le \sum_{i=1}^{n_l} \sum_{j=1}^{n_u} S_{i,j}^{lu} \exp(-2 y_i^l H_j) \exp(-2 \alpha y_i^l h_j) + \frac{C}{2} \sum_{i,j=1}^{n_u} S_{i,j}^{uu} \exp(H_i - H_j) \left( \exp(2 \alpha h_i) + \exp(-2 \alpha h_j) \right) = F_1
Appendix

For x \in [-1, 1],

\exp(\alpha x) \le \exp(\alpha) + \exp(-\alpha) - 1 + \alpha x

which gives

F_1 \le \sum_{i=1}^{n_u} (p_i + q_i) \left( \exp(2\alpha) + \exp(-2\alpha) - 1 \right) - 2 \sum_{i=1}^{n_u} \alpha h_i (p_i - q_i) = F_2
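Both Appendix inequalities can be spot-checked numerically over random draws (a sanity check, not a proof):

```python
import math
import random

random.seed(1)

# Linear bound: for x in [-1, 1],
#   exp(a * x) <= exp(a) + exp(-a) - 1 + a * x
for _ in range(1000):
    a = random.uniform(-3, 3)
    x = random.uniform(-1, 1)
    assert math.exp(a * x) <= math.exp(a) + math.exp(-a) - 1 + a * x + 1e-12

# Convexity bound:
#   exp(a * (h_i - h_j)) <= (exp(2*a*h_i) + exp(-2*a*h_j)) / 2
for _ in range(1000):
    a = random.uniform(0, 2)
    h_i = random.uniform(-1, 1)
    h_j = random.uniform(-1, 1)
    lhs = math.exp(a * (h_i - h_j))
    rhs = 0.5 * (math.exp(2 * a * h_i) + math.exp(-2 * a * h_j))
    assert lhs <= rhs + 1e-12
```

The first bound holds because both sides agree in convexity: the left side is convex in x and the right side is the chord-like linear majorant checked at x = ±1; the second is the arithmetic-geometric mean inequality applied to exp.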