Semidefinite Programming Machines
Thore Graepel and Ralf Herbrich
Microsoft Research Cambridge
Overview
Invariant Pattern Recognition
Semidefinite Programming (SDP)
From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)
Experimental Illustration
Future Work
Typical Invariances for Images
Translation
Rotation
Shear
Toy Features for Handwritten Digits
[Figure: a handwritten digit with three toy feature values: φ_1 = 0.48, φ_2 = 0.58, φ_3 = 0.37.]
Warning: Highly Non-Linear
[Figure: trajectory of a transformed digit in the (φ_1, φ_2) feature plane; the curve is highly non-linear.]
Motivation: Classification Learning
[Figure: training examples of two classes in the (φ_1(x), φ_2(x)) feature plane.]
Can we learn with infinitely many examples?
Motivation: Version Spaces
[Figure: two version-space plots, one for the original patterns and one for the transformed patterns.]
Semidefinite Programs (SDPs)
Linear objective function
Positive semidefinite (psd) constraints
Each psd constraint is equivalent to infinitely many linear constraints
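In standard form (standard notation, not recovered from the slide), an SDP over x in R^n is

$$\min_{x \in \mathbb{R}^n} c^\top x \quad \text{s.t.} \quad A(x) := A_0 + \sum_{j=1}^{n} x_j A_j \succeq 0,$$

with symmetric matrices A_j. The single psd constraint A(x) ⪰ 0 encodes the infinite family of linear constraints v^T A(x) v ≥ 0 for all v.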
SVM as a Quadratic Program
Given: a sample ((x_1, y_1), …, (x_m, y_m)).
SVMs find the weight vector w that maximises the margin on the sample.
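In standard form, the hard-margin SVM primal is

$$\min_{w} \tfrac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_i \langle w, x_i \rangle \ge 1, \quad i = 1, \dots, m.$$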
SVM as a Semidefinite Program (I)
A (block)-diagonal matrix is psd if and only if all its blocks are psd.
$$A_j := \operatorname{diag}(g_{1,j}, \dots, g_{i,j}, \dots, g_{m,j}), \qquad B := \operatorname{diag}(1, \dots, 1) = I.$$
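Reading g_{i,j} := y_i (x_i)_j (an assumption consistent with the SVM constraints above), the m margin constraints stack into one diagonal psd constraint:

$$\sum_{j=1}^{n} w_j A_j - B \succeq 0 \iff y_i \langle w, x_i \rangle - 1 \ge 0 \quad \text{for } i = 1, \dots, m,$$

since a diagonal matrix is psd exactly when all diagonal entries are non-negative.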
SVM as a Semidefinite Program (II)
Transform the quadratic objective into a linear one.
This adds a new (n+1)×(n+1) block to A_j and B.
Use Schur's complement lemma.
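By Schur's complement lemma, for t in R,

$$\begin{pmatrix} I_n & w \\ w^\top & t \end{pmatrix} \succeq 0 \iff t \ge w^\top w,$$

so minimising ‖w‖² is equivalent to minimising the linear objective t subject to one extra (n+1)×(n+1) psd block.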
Taylor Approximation of Invariance
Let T(x, θ) be an invariance transformation with parameter θ (e.g., the angle of rotation).
A Taylor expansion about θ = 0 gives a polynomial approximation to the trajectory.
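In symbols (a reconstruction; the slide's own formula was lost):

$$x(\theta) := T(x, \theta) \approx \sum_{k=0}^{r} \frac{\theta^k}{k!} \, x^{(k)}, \qquad x^{(k)} := \left.\frac{\partial^k T(x, \theta)}{\partial \theta^k}\right|_{\theta = 0}.$$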
Extension to Polynomials
Consider the polynomial trajectory x(θ).
Each training example (x^(0), …, x^(r), y) now induces an infinite number of constraints:
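Explicitly:

$$y \, \langle w, x(\theta) \rangle \ge 1 \quad \text{for all } \theta, \qquad x(\theta) = \sum_{k=0}^{r} \frac{\theta^k}{k!} \, x^{(k)};$$

the left-hand side minus 1 is a polynomial of degree r in θ whose coefficients are linear in w.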
Non-Negative Polynomials (I)
Theorem (Nesterov,2000): If r=2l then 1. For every psd matrix P the polynomial
p(µ)=µTP µ is non-negative everywhere.
2. For every non-negative polynomial p there exists a psd matrix P such that p(µ)=µTPµ.
Example:
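The slide's example was lost in extraction; a minimal stand-in for statement 2:

$$p(\theta) = (1 - \theta)^2 = \begin{pmatrix} 1 \\ \theta \end{pmatrix}^{\!\top} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \theta \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \succeq 0 \;\; (\text{eigenvalues } 0 \text{ and } 2).$$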
Non-Negative Polynomials (II)
(1) follows directly from the psd definition.
(2) follows from the sum-of-squares lemma.
Note that (2) states mere existence:
A polynomial of degree r has r + 1 parameters.
The coefficient matrix P has (r + 2)(r + 4)/8 parameters.
For r > 2, we have to introduce another r(r − 2)/8 auxiliary variables to find P.
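Sanity check for r = 4 (l = 2), where P is a symmetric 3×3 matrix:

$$\underbrace{\tfrac{(r+2)(r+4)}{8}}_{=\,6} \;-\; \underbrace{(r+1)}_{=\,5} \;=\; \underbrace{\tfrac{r(r-2)}{8}}_{=\,1},$$

i.e. one auxiliary degree of freedom in choosing P for the same degree-4 polynomial.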
Semidefinite Programming Machines
Extension of SVMs as a (non-trivial) SDP. The scalar diagonal entries g_{i,j} become matrix blocks G_{i,j}, one per training example, and the 1s in B become blocks:

$$A_j := \operatorname{diag}(G_{1,j}, \dots, G_{i,j}, \dots, G_{m,j}), \qquad B := \operatorname{diag}\!\left( \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \dots, \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \right).$$
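Block-diagonality again decouples the constraint per example (my reading of the block structure, made concrete on the next slide):

$$\sum_{j=1}^{n} w_j A_j - B \succeq 0 \iff \sum_{j=1}^{n} w_j G_{i,j} - \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \succeq 0 \quad \text{for all } i.$$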
Example: Second-Order SDPMs
2nd-order Taylor expansion:
Resulting polynomial in θ:
Set of constraint matrices:
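Reconstructed in the notation of the previous slides (the slide's own formulas were lost):

$$x_i(\theta) = x_i + \theta\, x_i' + \tfrac{\theta^2}{2}\, x_i'', \qquad p_i(\theta) := y_i \langle w, x_i(\theta) \rangle - 1 = a_{i,0} + a_{i,1}\theta + a_{i,2}\theta^2,$$

with a_{i,0} = y_i⟨w, x_i⟩ − 1, a_{i,1} = y_i⟨w, x_i'⟩, a_{i,2} = ½ y_i⟨w, x_i''⟩. For degree 2, Nesterov's characterisation is exact with a single matrix:

$$p_i(\theta) \ge 0 \;\; \forall \theta \iff \begin{pmatrix} a_{i,0} & a_{i,1}/2 \\ a_{i,1}/2 & a_{i,2} \end{pmatrix} \succeq 0.$$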
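A minimal runnable sketch of this second-order construction, using cvxpy on synthetic data (solver choice, data, and variable names are mine for illustration, not the authors' implementation; the derivatives are chosen so the toy problem stays feasible, since for a real rotation global non-negativity in θ fails, which is why the paper restricts θ to a segment):

```python
import cvxpy as cp
import numpy as np

# Toy data: two linearly separable classes in the plane.
X = np.array([[1.0, 1.5], [1.2, 0.8], [0.9, 1.1],
              [-1.0, -1.2], [-0.8, -1.5], [-1.1, -0.9]])
dX = 0.1 * X    # stand-ins for the first derivatives x_i'(0)
ddX = 0.1 * X   # stand-ins for the second derivatives x_i''(0)
y = np.array([1, 1, 1, -1, -1, -1])
m, n = X.shape

w = cp.Variable(n)
constraints = []
for i in range(m):
    # Coefficients of p_i(theta) = y_i <w, x_i + theta x_i' + theta^2/2 x_i''> - 1.
    a0 = y[i] * (X[i] @ w) - 1
    a1 = y[i] * (dX[i] @ w)
    a2 = y[i] * (ddX[i] @ w) / 2
    # p_i(theta) >= 0 for all theta <=> its 2x2 coefficient matrix is psd (Nesterov).
    P = cp.Variable((2, 2), PSD=True)
    constraints += [P[0, 0] == a0, P[1, 1] == a2, P[0, 1] == a1 / 2]

# The slides linearise ||w||^2 via Schur's complement; cvxpy accepts the
# equivalent quadratic objective directly.
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()
print("w* =", w.value, " objective =", problem.value)
```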
Non-Negative on Segment
Given a polynomial p of degree 2l, consider the transformed polynomial q (a reconstruction of its definition follows below).
Note that q is a polynomial of degree 4l. If q is positive everywhere, then p is positive everywhere on [−τ, +τ].
[Figure: plot of f(θ) showing a polynomial that is non-negative on a segment but not on all of the real line.]
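A construction matching the stated degree and segment (my reconstruction; not recovered verbatim from the slide):

$$q(\theta) := (1 + \theta^2)^{2l} \; p\!\left( \tau \, \frac{2\theta}{1 + \theta^2} \right).$$

Since 2θ/(1+θ²) ranges over exactly [−1, 1], the substitution sweeps out the segment [−τ, +τ]; clearing the denominators of the degree-2l polynomial p leaves q of degree 4l, so q ≥ 0 everywhere implies p ≥ 0 on [−τ, +τ].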
Truly Virtual Support Vectors
Dual complementarity yields an expansion of the weight vector.
The truly virtual support vectors are linear combinations of the example and its derivatives:
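In the second-order case (coefficients α_i and parameters θ_i* are reconstructed notation; the conclusions slide confirms the form x_i(θ_i*)):

$$w = \sum_{i} \alpha_i \, y_i \, x_i(\theta_i^*), \qquad x_i(\theta_i^*) = x_i + \theta_i^* \, x_i' + \tfrac{(\theta_i^*)^2}{2} \, x_i''.$$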
Visualisation: USPS “1” vs. “9”
[Figure: feature-space visualisation of USPS “1” vs. “9” with rotations up to τ = 20°; an inset zooms in on the trajectory of a truly virtual support vector.]
Results: Experimental Setup
All 45 USPS classification tasks (1-v-1).20 training images; 250 test images.Rotation is applied to all training images
with ¿ = 10º.All results are averaged over 50 random
training sets.Compared to SVM and virtual SVM.
Results: SDPM vs. SVM
[Figure: scatter plot of test errors over the 45 tasks, SVM error (x-axis) vs. SDPM error (y-axis).]
Results: SDPM vs. Virtual SVM
[Figure: scatter plot of test errors over the 45 tasks, virtual SVM error (x-axis) vs. SDPM error (y-axis).]
Results: Curse of Dimensionality
[Figure: results for 1 transformation parameter vs. 2 transformation parameters.]
Extensions & Future Work
Multiple parameters θ_1, θ_2, …, θ_D.
(Efficient) adaptation to kernel space.
Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
Sparsification by efficiently finding the example x and transformation θ with maximal information (idea of Neil Lawrence).
Expectation propagation for BPMs (idea of Tom Minka).
Conclusions & Future Work
Learning from infinitely many examples.
Truly virtual support vectors x_i(θ_i*).
Multiple parameters θ_1, θ_2, …, θ_D.
(Efficient) adaptation to kernel space.
Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).