Online Multiple Kernel Classification Steven C.H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang Machine Learning (2013) Presented by Audrey Cheong Electrical & Computer Engineering MATH 6397: Data Mining


Online Multiple Kernel Classification

Steven C.H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang

Machine Learning (2013)

Presented by Audrey Cheong

Electrical & Computer Engineering

MATH 6397: Data Mining


Online Multiple Kernel Classification (OMKC)

Background - Online

• Online learning
  • Learns one instance at a time and predicts labels for future instances

Learning cycle (repeated for each instance):
1) Learner is given an instance
2) Learner predicts the label of the instance
3) Learner is given the correct label
4) Learner refines its prediction mechanism
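The four-step cycle can be sketched as a plain loop. This is a minimal illustration, not code from the paper: `run_online` and the toy `MajorityLearner` (which predicts whichever label it has seen more often) are hypothetical names introduced here only to make the protocol concrete.

```python
from typing import Iterable, Sequence, Tuple


class MajorityLearner:
    """Toy learner (an assumption for illustration): predicts the label seen most often."""

    def __init__(self) -> None:
        self.balance = 0  # (# of +1 labels seen) minus (# of -1 labels seen)

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.balance >= 0 else -1

    def update(self, x: Sequence[float], y: int) -> None:
        self.balance += y


def run_online(learner, stream: Iterable[Tuple[Sequence[float], int]]) -> int:
    """Run the online protocol and return the number of mistakes."""
    mistakes = 0
    for x, y in stream:             # 1) learner is given an instance
        y_hat = learner.predict(x)  # 2) learner predicts its label
        if y_hat != y:              # 3) learner is given the correct label
            mistakes += 1
        learner.update(x, y)        # 4) learner refines its prediction mechanism
    return mistakes


stream = [((0.0,), 1), ((1.0,), 1), ((2.0,), -1), ((3.0,), 1)]
print(run_online(MajorityLearner(), stream))
```

The mistake count accumulated this way is the usual yardstick for online learners, and it is the quantity reported in the mistake-rate results later in the deck.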


Background – Multiple Kernel

• Composed of two online learning algorithms:
  • Perceptron algorithm (Rosenblatt 1958)
    • Type of linear classifier
    • Learns a classifier for a given kernel
  • Hedge algorithm (Freund and Schapire 1997)
    • Combines classifiers by linear weights

[Diagram: three Perceptrons produce Classifier 1, Classifier 2, and Classifier 3, one per kernel; Hedge combines them into a single classifier]


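Perceptron algorithm

• Input vector: $x_i$
• Output vector: $y_i \in \{-1, 1\}$
• Weights: $\alpha$
• Threshold: $\theta$
• Arithmetic test: $\alpha \cdot x_i \ge \theta$
• Minimize:

$$y_i = \begin{cases} -1 & \text{if } \alpha \cdot x_i < \theta \\ 1 & \text{if } \alpha \cdot x_i \ge \theta \end{cases}$$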
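The threshold test can be written out in a few lines. The sketch below is illustrative only: the prediction rule follows the slide, while the mistake-driven update `alpha += lr * y * x` is the textbook Perceptron rule and is an assumption here, since the slide's objective did not survive. The names `predict`, `perceptron_step`, `train`, and `lr` are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(alpha: Sequence[float], x: Sequence[float], theta: float) -> int:
    """Return 1 if alpha . x >= theta, otherwise -1 (the slide's arithmetic test)."""
    score = sum(a * xi for a, xi in zip(alpha, x))
    return 1 if score >= theta else -1


def perceptron_step(alpha: Sequence[float], x: Sequence[float], y: int,
                    theta: float, lr: float = 1.0) -> List[float]:
    """Assumed textbook update: change the weights only when the prediction is wrong."""
    if predict(alpha, x, theta) != y:
        return [a + lr * y * xi for a, xi in zip(alpha, x)]
    return list(alpha)


def train(data: Sequence[Tuple[Sequence[float], int]], theta: float,
          epochs: int = 5) -> List[float]:
    """Cycle over the data a few times, starting from zero weights."""
    alpha = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            alpha = perceptron_step(alpha, x, y, theta)
    return alpha


data = [([1.0, 1.0], 1), ([2.0, 1.0], 1), ([-1.0, -1.0], -1), ([-2.0, -1.0], -1)]
alpha = train(data, theta=0.0)
print(alpha, [predict(alpha, x, 0.0) for x, _ in data])
```

In the kernel setting used by OMKC, the dot product would be replaced by a kernel evaluation against stored examples; the linear form is kept here because it is what the slide shows.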


Hedge algorithm

• Distribute weight among classifiers
• Setting new weights: $w_i(t+1) = w_i(t)\,\beta^{z_i(t)}$ for discount weight $\beta \in (0, 1)$
• $z_i(t) = 1$ if the prediction is incorrect and $z_i(t) = 0$ if correct
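A small numeric sketch of the Hedge weight update may help. It assumes the standard multiplicative form, in which a classifier's weight is multiplied by the discount weight `beta` when it errs and left unchanged otherwise; the function names and the example values are hypothetical.

```python
from typing import List, Sequence


def hedge_update(weights: Sequence[float], mistakes: Sequence[int],
                 beta: float) -> List[float]:
    """Multiply each weight by beta**z, where z is 1 for a mistake and 0 otherwise."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return [w * (beta ** z) for w, z in zip(weights, mistakes)]


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale the weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Three classifiers; the first and third predicted incorrectly on this trial.
weights = hedge_update([1.0, 1.0, 1.0], mistakes=[1, 0, 1], beta=0.5)
print(weights)             # [0.5, 1.0, 0.5]
print(normalize(weights))  # [0.25, 0.5, 0.25]
```

Because the update is multiplicative, a classifier that errs on k trials ends up scaled by `beta**k`, so persistently poor classifiers lose influence geometrically.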


Notations

• $t$: trial
• : mixture of kernel classifiers
• : indicates if training instance is misclassified by the kernel classifier at trial $t$
• : indicator function
• : prediction from combination of $m$ kernel classifiers
• : classifier function


Proposed framework

• We define the optimal margin classification error for the kernel with respect to a collection of training examples as

where


Algorithms

Deterministic approach: all kernels are used

Stochastic approach: a subset of kernels is used

Either approach can be applied to each of two steps:

Update: Deterministic or Stochastic

Combination: Deterministic or Stochastic


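OMKC(D,D)

• Training sample is given to the kernel classifiers: $f_1(x), f_2(x), \dots, f_m(x)$
• Prediction: $z_1, z_2, \dots, z_m$
• Combined prediction: $\hat{y}(x) = \sum_{i=1}^{m} w_i z_i$
• Weights: each of $w_1, w_2, \dots, w_m$ is reduced if its classifier's prediction is incorrect
• Deterministic update
• Deterministic combination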
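The deterministic variant can be sketched end to end by pairing one kernel Perceptron per kernel with a Hedge-style weight update. This is a rough illustration under stated assumptions, not the authors' listing: the combined label is taken as the sign of $\sum_i w_i z_i$, a classifier is updated and discounted whenever its own prediction is wrong, and `beta=0.5`, the two kernels, and all names are hypothetical choices.

```python
import math
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 0.5) -> float:
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


class KernelPerceptron:
    """Stores misclassified examples as support vectors for one kernel."""

    def __init__(self, kernel: Kernel) -> None:
        self.kernel = kernel
        self.support: List[Tuple[Sequence[float], int]] = []

    def score(self, x: Sequence[float]) -> float:
        return sum(y * self.kernel(sx, x) for sx, y in self.support)

    def update(self, x: Sequence[float], y: int) -> None:
        self.support.append((x, y))


def omkc_dd(stream, kernels: Sequence[Kernel], beta: float = 0.5):
    """Deterministic update and combination: every kernel is used on every trial."""
    learners = [KernelPerceptron(k) for k in kernels]
    weights = [1.0] * len(kernels)
    mistakes = 0
    for x, y in stream:
        z = [sign(l.score(x)) for l in learners]              # per-kernel predictions
        y_hat = sign(sum(w * zi for w, zi in zip(weights, z)))  # combined prediction
        if y_hat != y:
            mistakes += 1
        for i, learner in enumerate(learners):
            if z[i] != y:              # this kernel classifier was wrong
                weights[i] *= beta     # Hedge: reduce its weight
                learner.update(x, y)   # Perceptron: add a support vector
        total = sum(weights)
        weights = [w / total for w in weights]                # normalize the weights
    return weights, mistakes


xor = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
weights, mistakes = omkc_dd(xor * 10, [linear_kernel, gaussian_kernel])
print(weights, mistakes)
```

On this XOR-style stream the linear kernel keeps erring, so nearly all of the weight migrates to the Gaussian kernel, which is the behavior the weighting scheme is meant to produce.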


OMKC(S,S)

• Training sample is given to the kernel classifiers: $f_1(x), f_2(x), \dots, f_m(x)$
• Prediction: $z_1, z_2, \dots, z_m$
• Combined prediction: $\hat{y}(x) = \sum_{i=1}^{m} w_i z_i$
• Stochastic combination: in the example shown, $w_1 \neq 0$ while $w_2 = 0$ and $w_m = 0$
• Stochastic update: only one weight is reduced, if its classifier's prediction is incorrect


Experimental setup

[Table: binary datasets]


Experimental setup

• 15 diverse datasets obtained from LIBSVM and UCI machine learning repository
• Predefine 16 kernel functions
  • 3 polynomial kernels (i.e. )
  • 13 Gaussian kernels (i.e.)
• Fix discount weight
• Results are averaged over 20 runs
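A pool of 16 kernels of this shape could be assembled as below. The kernel forms are the standard polynomial and Gaussian definitions; the specific degrees (1 to 3) and bandwidths ($2^{-6}$ to $2^{6}$) are assumptions chosen only so the counts come out to 3 and 13, since the slide's parameter values are not shown here. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def polynomial_kernel(degree: int) -> Kernel:
    """k(a, b) = (a . b) ** degree"""
    def k(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(p * q for p, q in zip(a, b)) ** degree
    return k


def gaussian_kernel(sigma: float) -> Kernel:
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))"""
    def k(a: Sequence[float], b: Sequence[float]) -> float:
        sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


def build_kernel_pool() -> List[Kernel]:
    degrees = [1, 2, 3]                         # assumed: 3 polynomial kernels
    sigmas = [2.0 ** e for e in range(-6, 7)]   # assumed: 13 Gaussian bandwidths
    return ([polynomial_kernel(d) for d in degrees]
            + [gaussian_kernel(s) for s in sigmas])


pool = build_kernel_pool()
print(len(pool))                          # 16
print(pool[1]([1.0, 2.0], [3.0, 4.0]))    # degree-2 polynomial: (1*3 + 2*4)**2 = 121.0
```

Spreading the Gaussian bandwidths over several orders of magnitude is a common way to let the weighting scheme, rather than a separate validation pass, pick the useful scale.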


Evaluation of the deterministic OMKC algorithm

• Comparison of the deterministic OMKC algorithm with three Perceptron-based algorithms and one online MKL algorithm

• Perceptron: the well-known Perceptron baseline algorithm with a linear kernel (Rosenblatt 1958; Freund and Schapire 1999)

• Perceptron(u): another Perceptron baseline algorithm with an unbiased/uniform combination of all the kernels

• Perceptron(*): an online validation procedure that searches for the best kernel among the pool of kernels (using the first 10% of the training examples), and then applies the Perceptron algorithm with the best kernel

• OM-2: a state-of-the-art online learning algorithm for multiple kernel learning (Jie et al. 2010; Orabona et al. 2010)


Evaluation of the deterministic OMKC algorithm


Average mistake rate (20 runs)


Number of support vectors (20 runs)


Kernel weights


Effect of optimal $\beta$

$$\beta = \frac{\sqrt{T}}{\sqrt{T} + \sqrt{\ln m}}$$

where $T$ is the number of training examples and $m$ is the number of kernels
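To get a feel for the magnitude of this optimal discount weight, the formula can be evaluated directly. The helper name `optimal_beta` and the example values of $T$ are illustrative assumptions; $m = 16$ matches the size of the kernel pool in the experimental setup.

```python
import math


def optimal_beta(num_examples: int, num_kernels: int) -> float:
    """beta = sqrt(T) / (sqrt(T) + sqrt(ln m))"""
    if num_examples <= 0 or num_kernels < 1:
        raise ValueError("need T > 0 and m >= 1")
    root_t = math.sqrt(num_examples)
    return root_t / (root_t + math.sqrt(math.log(num_kernels)))


for T in (100, 1000, 10000):
    print(T, round(optimal_beta(T, 16), 4))
```

The value moves toward 1 as $T$ grows, so with long streams each individual mistake is discounted more gently.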


Time Efficiency

Decreases as size increases


Conclusion

• All the OMKC algorithms usually perform better than
  • the regular Perceptron algorithm with an unbiased linear combination of multiple kernels
  • the Perceptron algorithm with the best kernel found by validation
  • the state-of-the-art online MKL algorithm

• The deterministic combination strategy usually performs better than the stochastic combination strategy

• The stochastic updating strategy improves computational efficiency without decreasing the accuracy significantly


Questions?

1) How many kernel classifiers were used in the stochastic combination?

2) How was the number of support vectors determined? Should the support vectors be given in terms of the number of support vectors per kernel classifier? Did support vectors overlap between kernel classifiers?


References

• Hoi, S. C. H., Jin, R., Zhao, P., & Yang, T. (2012). Online Multiple Kernel Classification. Machine Learning, 90(2), 289–316. doi:10.1007/s10994-012-5319-2


Algorithm 1

All kernels are used

• : represents the classifier at trial t
• : combination of m kernel classifiers

Annotated steps: Normalize the weights, Update, Combination


Algorithm 1 → 2

• : represents the classifier at trial t
• : combination of m kernel classifiers
• Stochastic combination
• Deterministic update


Algorithm 2 → 3

• Deterministic combination
• Stochastic update
• Guarantees that each kernel will be selected with at least probability
• Tradeoff between exploration and exploitation (Auer et al. 2003)
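One way to realize such a minimum selection probability is to mix each kernel's relative weight with a constant floor. The sketch below is an assumption for illustration, not the formula from the paper: the smoothing parameter `delta` and the form `(1 - delta) * w_i / max(w) + delta` are hypothetical choices that keep every selection probability between `delta` and 1.

```python
import random
from typing import List, Sequence


def selection_probabilities(weights: Sequence[float], delta: float) -> List[float]:
    """Assumed smoothing: p_i = (1 - delta) * (w_i / max_j w_j) + delta."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    top = max(weights)
    return [(1.0 - delta) * (w / top) + delta for w in weights]


def sample_kernels(weights: Sequence[float], delta: float,
                   rng: random.Random) -> List[int]:
    """Select each kernel independently with its own probability."""
    probs = selection_probabilities(weights, delta)
    return [i for i, p in enumerate(probs) if rng.random() < p]


rng = random.Random(0)
weights = [0.7, 0.2, 0.1]
print(selection_probabilities(weights, delta=0.1))
print(sample_kernels(weights, delta=0.1, rng=rng))
```

A larger `delta` explores low-weight kernels more often at the cost of more updates per trial, while a smaller `delta` concentrates the work on the kernels that currently look best.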


Algorithm 4

• Stochastic update
• Stochastic combination