proximal plane classification kdd 2001 san francisco august 26-29, 2001

Proximal Plane Classification KDD 2001 San Francisco August 26-29, 2001 Glenn Fung & Olvi Mangasarian Second Annual Review June 1, 2001 Data Mining Institute University of Wisconsin - Madison

Page 1

Proximal Plane Classification

KDD 2001

San Francisco August 26-29, 2001

Glenn Fung & Olvi Mangasarian

Second Annual Review

June 1, 2001

Data Mining Institute University of Wisconsin - Madison

Page 2

Key Contributions

Fast new support vector machine classifier

An order of magnitude faster than standard classifiers

Extremely simple to implement

4 lines of MATLAB code

No optimization packages (LP, QP) needed

Page 3

Outline of Talk

(Standard) Support vector machine (SVM) classifiers

Proximal support vector machine (PSVM) classifiers

Geometric motivation

Linear PSVM classifier

Nonlinear PSVM classifier

Full and reduced kernels

Numerical results

Correctness comparable to standard SVM

Much faster classification!

2-million points in 10-space in 21 seconds

Compared to over 10 minutes for standard SVM

Page 4

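Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: point sets A+ and A− separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$ and margin $\frac{2}{\|w\|_2}$ between the planes]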
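The margin in the figure is the distance between two parallel planes with the same normal $w$: $|(\gamma + 1) - (\gamma - 1)| / \|w\|_2 = 2/\|w\|_2$. A minimal Python check of that formula follows; the helper names and the sample values of $w$ and $\gamma$ are illustrative only and are not from the talk.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def margin(w):
    """Distance 2 / ||w||_2 between the planes x'w = gamma + 1 and x'w = gamma - 1."""
    return 2.0 / math.sqrt(dot(w, w))

# Geometric check: start on the plane x'w = gamma + 1, step a distance margin(w)
# along the unit normal -w/||w||_2, and confirm we land on x'w = gamma - 1.
w, gamma = [3.0, 4.0], 0.5
norm_w = math.sqrt(dot(w, w))
x1 = [wi * (gamma + 1) / dot(w, w) for wi in w]              # satisfies x1'w = gamma + 1
x2 = [xi - margin(w) * wi / norm_w for xi, wi in zip(x1, w)]
print(margin(w), dot(x1, w), dot(x2, w))
```

Maximizing $2/\|w\|_2$ is the same as minimizing $\frac{1}{2}\|w\|_2^2$, which is where the regularization term in the formulations that follow comes from.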

Page 5

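Proximal Vector Machines: Fitting the Data Using Two Parallel Bounding Planes

[Figure: point sets A+ and A− clustered around the parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal vector $w$ and separation $\frac{2}{\|w\|_2}$ between the planes]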

Page 6

SVM as an Unconstrained Minimization Problem

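Changing to the 2-norm and measuring the margin in $(w, \gamma)$ space gives:

$$\text{(QP)}\qquad \min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0$$

At the solution of (QP):

$$y = \big(e - D(Aw - e\gamma)\big)_+, \qquad \text{where } (\cdot)_+ = \max\{\cdot, 0\}$$

Hence (QP) is equivalent to:

$$\min_{w,\gamma}\ \frac{\nu}{2}\big\|\big(e - D(Aw - e\gamma)\big)_+\big\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$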
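To make the elimination of $y$ concrete, here is a small Python sketch that evaluates the unconstrained objective directly. It assumes the usual conventions of the talk: $A$ holds one point per row, $D = \mathrm{diag}(d)$ with labels $d_i = \pm 1$, and $e$ is a vector of ones. The function names and the toy data are hypothetical.

```python
def plus(r):
    """Plus function (r)_+ = max{r, 0}, applied componentwise."""
    return [max(ri, 0.0) for ri in r]

def residual(A, d, w, gamma):
    """Compute e - D(Aw - e*gamma), where D = diag(d) and e is a vector of ones."""
    return [1.0 - di * (sum(aij * wj for aij, wj in zip(row, w)) - gamma)
            for row, di in zip(A, d)]

def svm_2norm_objective(A, d, w, gamma, nu):
    """nu/2 * ||(e - D(Aw - e*gamma))_+||^2 + 1/2 * ||w, gamma||^2.

    The slack y is not a free variable here: for fixed (w, gamma) its optimal
    value is y = (e - D(Aw - e*gamma))_+, as on the slide.
    """
    y = plus(residual(A, d, w, gamma))
    return 0.5 * nu * sum(yi * yi for yi in y) + 0.5 * (sum(wj * wj for wj in w) + gamma ** 2)

A = [[2.0, 0.0], [-1.0, 0.0]]   # one point per row
d = [1.0, -1.0]                 # labels, the diagonal of D
print(svm_2norm_objective(A, d, [0.5, 0.0], 0.0, nu=1.0))  # prints 0.25
```

The plus function is what keeps this objective piecewise quadratic rather than quadratic, which is why standard SVM solvers still need an iterative method.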

Page 7

PSVM Formulation

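We have from the QP SVM formulation, with the inequality constraint replaced by an equality:

$$\text{(QP)}\qquad \min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e$$

This simple but critical modification changes the nature of the optimization problem tremendously!

Solving for $y$ in terms of $w$ and $\gamma$ gives:

$$\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$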
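With the equality constraint, $y = e - D(Aw - e\gamma)$ may take either sign, so the plus function disappears and the objective becomes a smooth quadratic in $(w, \gamma)$. The Python sketch below (hypothetical names, toy data) compares the two objectives at the same point. Since $\|(r)_+\|_2 \le \|r\|_2$ for any vector $r$, the PSVM objective is never smaller than the plus-function objective.

```python
def residual(A, d, w, gamma):
    """Compute e - D(Aw - e*gamma), where D = diag(d) and e is a vector of ones."""
    return [1.0 - di * (sum(aij * wj for aij, wj in zip(row, w)) - gamma)
            for row, di in zip(A, d)]

def regularizer(w, gamma):
    return 0.5 * (sum(wj * wj for wj in w) + gamma ** 2)

def svm_2norm_objective(A, d, w, gamma, nu):
    """Inequality-constrained version: only positive residuals are penalized."""
    r = residual(A, d, w, gamma)
    return 0.5 * nu * sum(max(ri, 0.0) ** 2 for ri in r) + regularizer(w, gamma)

def psvm_objective(A, d, w, gamma, nu):
    """PSVM version: every residual is penalized, so points are pulled toward
    the planes x'w = gamma + 1 and x'w = gamma - 1 from both sides."""
    r = residual(A, d, w, gamma)
    return 0.5 * nu * sum(ri * ri for ri in r) + regularizer(w, gamma)

A = [[2.0, 0.0], [-1.0, 0.0]]
d = [1.0, -1.0]
print(svm_2norm_objective(A, d, [1.0, 0.0], 0.0, nu=1.0))  # 0.5
print(psvm_objective(A, d, [1.0, 0.0], 0.0, nu=1.0))       # 1.0
```

Here the first point lies beyond its bounding plane (residual $-1$): the SVM objective ignores it, while PSVM penalizes it. This is the "proximal" behavior pictured on the earlier slide, where the planes fit the data rather than bound it.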

Page 8

Advantages of New Formulation

Objective function remains strongly convex

An explicit exact solution can be written in terms of the problem data

PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space

Exact leave-one-out correctness can be obtained in terms of the problem data
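The slide does not spell out the leave-one-out formula. One standard route, shown here as a sketch and not necessarily the derivation used by the authors: because $D^2 = I$, $\|e - D(Aw - e\gamma)\|_2 = \|De - H[w;\gamma]\|_2$ with $H = [A\ \ -e]$, so PSVM is a ridge regression of the labels on $H$. The Sherman-Morrison formula then gives each held-out value $x_i'w - \gamma$ from the full-data fit $z$ as $d_i - (d_i - h_i'z)/(1 - S_{ii})$, where $h_i$ is row $i$ of $H$ and $S_{ii} = h_i'(I/\nu + H'H)^{-1}h_i$. The Python below (all function names hypothetical) checks the formula against brute-force retraining.

```python
def solve(M, b):
    """Solve M x = b for square nonsingular M (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[p] = aug[p], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def system_matrix(H, nu):
    """I/nu + H'H."""
    k = len(H[0])
    return [[sum(h[j] * h[l] for h in H) + (1.0 / nu if j == l else 0.0)
             for l in range(k)] for j in range(k)]

def fit(H, d, nu):
    """[w; gamma] = (I/nu + H'H)^{-1} H'De, using H'De = H'd."""
    k = len(H[0])
    v = [sum(h[j] * dj for h, dj in zip(H, d)) for j in range(k)]
    return solve(system_matrix(H, nu), v)

def loo_scores_closed_form(A, d, nu):
    """Held-out scores x_i'w - gamma without retraining: each point needs one
    (n+1) x (n+1) solve with the full-data matrix I/nu + H'H."""
    H = [list(row) + [-1.0] for row in A]      # H = [A  -e]
    M = system_matrix(H, nu)
    z = fit(H, d, nu)
    scores = []
    for h, di in zip(H, d):
        s_ii = dot(h, solve(M, h))              # leverage S_ii
        scores.append(di - (di - dot(h, z)) / (1.0 - s_ii))
    return scores

def loo_scores_brute_force(A, d, nu):
    """Reference: retrain without point i, then score point i."""
    H = [list(row) + [-1.0] for row in A]
    return [dot(H[i], fit(H[:i] + H[i + 1:], d[:i] + d[i + 1:], nu)) for i in range(len(H))]

def loo_correctness(scores, d):
    """Fraction of points whose held-out score has the sign of their label."""
    return sum((s >= 0) == (di > 0) for s, di in zip(scores, d)) / len(d)

A = [[2.0, 1.0], [3.0, -1.0], [1.0, 0.5], [-2.0, 0.0], [-1.0, 1.0], [-3.0, -2.0]]
d = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
fast = loo_scores_closed_form(A, d, nu=2.0)
slow = loo_scores_brute_force(A, d, nu=2.0)
print(max(abs(f - s) for f, s in zip(fast, slow)), loo_correctness(fast, d))
```

The point of the closed form is that it never refits on the $m$ reduced data sets; everything comes from the full-data matrix $I/\nu + H'H$.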

Page 9

Linear PSVM

We want to solve:

$$\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2$$

Setting the gradient equal to zero gives a nonsingular system of linear equations.

Solution of the system gives the desired PSVM classifier
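Written out, the gradient step looks as follows. This assumes the talk's usual conventions: $e$ is a vector of ones and $D$ is a diagonal matrix of $\pm 1$ labels, so $D^2 = I$. Let $z = [w; \gamma]$ and $H = [A\ \ -e]$, so that $Aw - e\gamma = Hz$:

```latex
\begin{aligned}
f(z) &= \frac{\nu}{2}\,\|e - DHz\|_2^2 + \frac{1}{2}\,\|z\|_2^2,\\
\nabla f(z) &= -\nu H'D\,(e - DHz) + z
             = -\nu H'De + \nu H'H z + z \qquad (D^2 = I),\\
\nabla f(z) = 0 \;&\iff\; \Big(\frac{I}{\nu} + H'H\Big) z = H'De .
\end{aligned}
```

For $\nu > 0$ the matrix $I/\nu + H'H$ is symmetric positive definite, which is why the system is always nonsingular.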

Page 10

Linear PSVM Solution

$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \Big(\frac{I}{\nu} + H'H\Big)^{-1} H'De, \qquad \text{where } H = [A \ \ -e]$$

The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$.

$n$ is usually much smaller than $m$.

Page 11

Linear Proximal SVM Algorithm

Input $A$, $D$

Define $H = [A \ \ -e]$

Calculate $v = H'De$

Solve $\Big(\frac{I}{\nu} + H'H\Big)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$

Classifier: $\operatorname{sign}(w'x - \gamma)$
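The four algorithm steps translate directly into code. Below is a dependency-free Python sketch; the names `psvm_train`, `psvm_classify` and the small `solve` helper are illustrative, and a real implementation would call a linear-algebra library instead, as the MATLAB version does with `\`. It assumes points are rows of `A` and `d` holds the $\pm 1$ diagonal of $D$.

```python
def solve(M, b):
    """Solve M x = b for square nonsingular M (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[p] = aug[p], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def psvm_train(A, d, nu):
    """Linear PSVM: solve (I/nu + H'H)[w; gamma] = H'De with H = [A  -e]."""
    n = len(A[0])
    H = [list(row) + [-1.0] for row in A]                  # H = [A  -e]
    v = [sum(h[j] * dj for h, dj in zip(H, d))             # v = H'De = H'd
         for j in range(n + 1)]
    M = [[sum(h[j] * h[k] for h in H) + (1.0 / nu if j == k else 0.0)
          for k in range(n + 1)] for j in range(n + 1)]    # I/nu + H'H
    r = solve(M, v)
    return r[:n], r[n]                                     # w, gamma

def psvm_classify(x, w, gamma):
    """sign(w'x - gamma); a score of exactly 0 is sent to +1 (arbitrary tie-break)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - gamma >= 0 else -1

A = [[2.0], [3.0], [-2.0], [-3.0]]
d = [1.0, 1.0, -1.0, -1.0]
w, gamma = psvm_train(A, d, nu=1.0)
print(w, gamma)                                            # w = [10/27] (about 0.370), gamma = 0.0
print(psvm_classify([1.0], w, gamma), psvm_classify([-1.0], w, gamma))  # 1 -1
```

Only the $(n+1) \times (n+1)$ matrix $I/\nu + H'H$ is ever factored, so the cost of the solve does not grow with the number of points $m$; only forming $H'H$ does.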

Page 12

Nonlinear PSVM Formulation

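Linear PSVM (linear separating surface $x'w = \gamma$):

$$\text{(QP)}\qquad \min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|w,\gamma\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e$$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|u,\gamma\|_2^2$$

Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$$\min_{u,\gamma}\ \frac{\nu}{2}\|e - D(K(A, A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|u,\gamma\|_2^2$$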

Page 13

The Nonlinear Classifier

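The nonlinear classifier (separating surface):

$$K(x', A')Du = \gamma$$

where $K$ is a nonlinear kernel $K(A, B): R^{m\times n} \times R^{n\times l} \to R^{m\times l}$, e.g.:

Gaussian (Radial Basis) Kernel: $K(A, A')_{ij} = \exp\big(-\mu\|A_i - A_j\|_2^2\big), \quad i, j = 1, \ldots, m$

Polynomial Kernel: $(AA' + \mu aa')^{d}_{\bullet}$ (componentwise power of degree $d$)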
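As a sketch, assuming points are stored as rows (so the slide's $K(A, A')$ is an $m \times m$ matrix and $K(A, C')$ is $m \times l$ for $l$ points $C$), the Gaussian kernel can be computed in Python as follows. The function name is hypothetical.

```python
import math

def gaussian_kernel(A, C, mu):
    """Gaussian kernel K(A, C') with entries exp(-mu * ||A_i - C_j||_2^2).

    A (m points) and C (l points) both have one point per row, so the result
    is m x l; gaussian_kernel(A, A, mu) is the square kernel K(A, A').
    """
    return [[math.exp(-mu * sum((a - c) ** 2 for a, c in zip(Ai, Cj))) for Cj in C]
            for Ai in A]

A = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gaussian_kernel(A, A, mu=1.0)
print(K[0])   # [1.0, exp(-1), exp(-4)]
```

The diagonal of $K(A, A')$ is always 1, and larger $\mu$ makes the kernel more local: each point then influences only its close neighbors.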

Page 14

Nonlinear PSVM

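Defining $H$ slightly differently:

$$H = [K(A, A') \ \ -e]$$

and writing $u$ for $Du$ (which leaves $\|u\|_2$ unchanged, since $D$ is a diagonal matrix of $\pm 1$ labels), setting the gradient equal to zero, similar to the linear case, gives:

$$\begin{bmatrix} u \\ \gamma \end{bmatrix} = \Big(\frac{I}{\nu} + H'H\Big)^{-1} H'De$$

Here, the linear system to solve is of size $(m+1) \times (m+1)$.

However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.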
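The slide only names the reduced-kernel idea. As a sketch of one common form of it (an assumption here, following the RSVM idea of keeping a random subset of the rows of $A$): replace the $m \times m$ kernel $K(A, A')$ by the rectangular $K(A, \bar A')$, where $\bar A$ holds $\bar m \ll m$ randomly chosen rows of $A$. The PSVM system then has size $(\bar m + 1) \times (\bar m + 1)$. The Mushroom result later in the talk uses a rectangular kernel of size $8124 \times 215$. Helper names below are hypothetical.

```python
import math
import random

def gaussian_kernel(A, C, mu):
    """K(A, C') with entries exp(-mu * ||A_i - C_j||^2); rows of A and C are points."""
    return [[math.exp(-mu * sum((a - c) ** 2 for a, c in zip(Ai, Cj))) for Cj in C]
            for Ai in A]

def reduced_kernel(A, m_bar, mu, seed=0):
    """Rectangular m x m_bar kernel K(A, A_bar') from m_bar random rows of A.

    Returns the kernel and A_bar; A_bar is also needed at classification time,
    where a new point x is scored through K(x', A_bar').
    """
    rng = random.Random(seed)
    rows = sorted(rng.sample(range(len(A)), m_bar))
    A_bar = [A[i] for i in rows]
    return gaussian_kernel(A, A_bar, mu), A_bar

A = [[float(i), float(i % 3)] for i in range(10)]   # 10 toy points in R^2
K_bar, A_bar = reduced_kernel(A, m_bar=3, mu=0.5)
print(len(K_bar), len(K_bar[0]))                    # 10 3
```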

Page 15

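Nonlinear Proximal SVM Algorithm

Input $A$, $D$

Define $K = K(A, A')$ and $H = [K \ \ -e]$ (linear case: $H = [A \ \ -e]$)

Calculate $v = H'De$

Solve $\Big(\frac{I}{\nu} + H'H\Big)\begin{bmatrix} u \\ \gamma \end{bmatrix} = v$, where $u$ stands for $Du$

Classifier: $\operatorname{sign}(K(x', A')u - \gamma)$ (linear case: $\operatorname{sign}(w'x - \gamma)$)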
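A self-contained Python sketch of the nonlinear steps, under the same assumptions as before (points as rows, $\pm 1$ labels in `d`, Gaussian kernel). Passing a subset of the rows of `A` as `C` gives a reduced, rectangular kernel. All names are hypothetical, and $u$ already stands for $Du$, as in the algorithm.

```python
import math

def solve(M, b):
    """Solve M x = b for square nonsingular M (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[p] = aug[p], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def gaussian_kernel(A, C, mu):
    """K(A, C') with entries exp(-mu * ||A_i - C_j||^2); rows of A and C are points."""
    return [[math.exp(-mu * sum((a - c) ** 2 for a, c in zip(Ai, Cj))) for Cj in C]
            for Ai in A]

def nonlinear_psvm_train(A, d, nu, mu, C=None):
    """Solve (I/nu + H'H)[u; gamma] = H'De with H = [K(A, C')  -e].

    C defaults to A (full square kernel); a subset of A's rows gives a reduced kernel.
    """
    C = A if C is None else C
    H = [row + [-1.0] for row in gaussian_kernel(A, C, mu)]   # H = [K  -e]
    k = len(H[0])
    v = [sum(h[j] * dj for h, dj in zip(H, d)) for j in range(k)]
    M = [[sum(h[j] * h[l] for h in H) + (1.0 / nu if j == l else 0.0)
          for l in range(k)] for j in range(k)]
    r = solve(M, v)
    return r[:-1], r[-1], C                                    # u, gamma, kernel centers

def nonlinear_psvm_classify(x, u, gamma, C, mu):
    """Classifier sign(K(x', C')u - gamma); a score of exactly 0 is sent to +1."""
    kx = gaussian_kernel([x], C, mu)[0]
    return 1 if sum(ki * ui for ki, ui in zip(kx, u)) - gamma >= 0 else -1

# XOR-like toy data, which no linear classifier can separate.
A = [[0.0, 0.0], [3.0, 3.0], [0.0, 3.0], [3.0, 0.0]]
d = [1.0, 1.0, -1.0, -1.0]
u, gamma, C = nonlinear_psvm_train(A, d, nu=100.0, mu=1.0)
print([nonlinear_psvm_classify(x, u, gamma, C, mu=1.0) for x in A])  # [1, 1, -1, -1]
```

With the full kernel the solve is $(m+1) \times (m+1)$, which is why the reduced-kernel option matters for large $m$.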

Page 16

PSVM MATLAB Code

```matlab
function [w, gamma] = psvm(A, d, nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A, d, nu);
[m, n] = size(A); e = ones(m, 1); H = [A -e];
v = (d'*H)';                      % v = H'*D*e
r = (speye(n+1)/nu + H'*H)\v;     % solve (I/nu + H'*H) r = v
w = r(1:n); gamma = r(n+1);       % getting w, gamma from r
```

Page 17

Linear PSVM Comparisons with Other SVMs

Much Faster, Comparable Correctness

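Test % is ten-fold test set correctness; times are in seconds.

| Data Set (m x n) | PSVM Test % | PSVM Time (sec.) | SSVM Test % | SSVM Time (sec.) | SVM^light Test % | SVM^light Time (sec.) |
|---|---|---|---|---|---|---|
| WPBC (60 mo.) 110 x 32 | 68.5 | 0.02 | 68.5 | 0.17 | 62.7 | 3.85 |
| Ionosphere 351 x 34 | 87.3 | 0.17 | 88.7 | 1.23 | 88.0 | 2.19 |
| Cleveland Heart 297 x 13 | 85.9 | 0.01 | 86.2 | 0.70 | 86.5 | 1.44 |
| Pima Indians 768 x 8 | 77.5 | 0.02 | 77.6 | 0.78 | 76.4 | 37.00 |
| BUPA Liver 345 x 6 | 69.4 | 0.02 | 70.0 | 0.78 | 69.5 | 6.65 |
| Galaxy Dim 4192 x 14 | 93.5 | 0.34 | 95.0 | 5.21 | 94.1 | 28.33 |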

Page 18

Linear PSVM Comparisons on Larger Adult Dataset

Much Faster & Comparable Correctness

Attributes = 123. Each cell gives testing correctness % / running time (sec.); "-" means no correctness was reported.

| Dataset Size (Train, Test) | PSVM | LSVM | SSVM | SOR | SMO | SVM^light |
|---|---|---|---|---|---|---|
| (11221, 21341) | 84.48 / 2.5 | 84.84 / 38.9 | 84.79 / 14.1 | 84.37 / 18.8 | - / 17.0 | 84.68 / 306.6 |
| (16101, 16461) | 84.78 / 3.7 | 85.01 / 60.5 | 84.96 / 21.5 | 84.62 / 24.8 | - / 35.3 | 84.83 / 667.2 |
| (22697, 9865) | 85.16 / 5.2 | 85.35 / 92.0 | 85.35 / 29.0 | 85.06 / 31.3 | - / 85.7 | 85.17 / 1425.6 |
| (32562, 16282) | 84.56 / 7.4 | 85.05 / 140.9 | 85.02 / 44.5 | 84.96 / 83.9 | - / 163.6 | 85.05 / 2184.0 |

Page 19

Linear PSVM vs LSVM 2-Million Dataset

Over 30 Times Faster

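| Dataset | Method | Training Correctness % | Testing Correctness % | Time (sec.) |
|---|---|---|---|---|
| NDC "Easy" | LSVM | 90.86 | 91.23 | 658.5 |
| NDC "Easy" | PSVM | 90.80 | 91.13 | 20.8 |
| NDC "Hard" | LSVM | 69.80 | 69.44 | 655.6 |
| NDC "Hard" | PSVM | 69.84 | 69.52 | 20.6 |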
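The "over 30 times faster" headline follows directly from the timing column. A quick Python check using only the table's numbers:

```python
# Training times in seconds, copied from the NDC table above.
times = {
    'NDC "Easy"': {"LSVM": 658.5, "PSVM": 20.8},
    'NDC "Hard"': {"LSVM": 655.6, "PSVM": 20.6},
}
speedup = {name: t["LSVM"] / t["PSVM"] for name, t in times.items()}
for name, s in speedup.items():
    print(f"{name}: {s:.1f}x faster")   # 31.7x and 31.8x
```

The LSVM times of roughly 656 to 659 seconds are also the "over 10 minutes" quoted in the outline, against about 21 seconds for PSVM.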

Page 20

Nonlinear PSVM: Spiral Dataset, 94 Red Dots & 94 White Dots

Page 21

Nonlinear PSVM Comparisons

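Test % is ten-fold test set correctness; times are in seconds.

| Data Set (m x n) | PSVM Test % | PSVM Time (sec.) | SSVM Test % | SSVM Time (sec.) | LSVM Test % | LSVM Time (sec.) |
|---|---|---|---|---|---|---|
| Ionosphere 351 x 34 | 95.2 | 4.60 | 95.8 | 25.25 | 95.8 | 14.58 |
| BUPA Liver 345 x 6 | 73.6 | 4.34 | 73.7 | 20.65 | 73.7 | 30.75 |
| Tic-Tac-Toe 958 x 9 | 98.4 | 74.95 | 98.4 | 395.30 | 94.7 | 350.64 |
| Mushroom* 8124 x 22 | 88.0 | 35.50 | 88.8 | 307.66 | 87.8 | 503.74 |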

* A rectangular kernel of size 8124 x 215 was used.

Page 22

Conclusion

PSVM is an extremely simple procedure for generating linear and nonlinear classifiers

A linear PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space

Comparable test set correctness to standard SVM

Much faster than standard SVMs: running time is typically an order of magnitude less

Page 23

Future Work

Extension of PSVM to multicategory classification

Massive data classification using an incremental PSVM

Parallel extension and implementation of PSVM