TRANSCRIPT
Approximate Leave-One-Out Error Estimation for Learning with Smooth, Strictly Convex Margin Loss Functions

Christopher P. Diehl
Research and Technology Development Center, Johns Hopkins Applied Physics Laboratory
September 29, 2004
Overview of the Talk

- General Learning Framework
- Model Selection
- LOO Error Estimation
- Unlearning and LOO Margin Estimation
- Comments on the Method
- Evaluating the Approximation
- Results
- Conclusion and Future Work
General Large Margin Learning Framework

Class of objective functions includes:
- Smoothed Support Vector Machines
- Kernel Logistic Regression

  \min_{\alpha, b} L(\alpha, b) =
    \underbrace{\Omega(\alpha, b)}_{\text{Structural Risk}}
    + \underbrace{\sum_{i=1}^{N} C_i \, g(z_i)}_{\text{Empirical Risk}}

where

  z_i = y_i f(x_i), \qquad
  f(x) = \sum_{j \in I_S} \alpha_j y_j K(x_j, x) + b,

and g and \Omega are strictly convex.
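As a concrete instance of this framework, the sketch below evaluates a kernel logistic regression objective, taking g(z) = log(1 + exp(-z)) and, as an illustrative assumption not specified on the slides, the common choice Omega(alpha, b) = (1/2) alpha^T K alpha with an RBF kernel. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def decision_values(alpha, b, y, K):
    # f(x_i) = sum_j alpha_j y_j K(x_j, x_i) + b
    return K @ (alpha * y) + b

def objective(alpha, b, y, K, C):
    # L(alpha, b) = Omega(alpha, b) + sum_i C_i g(z_i),
    # with z_i = y_i f(x_i) and g(z) = log(1 + exp(-z))
    z = y * decision_values(alpha, b, y, K)
    structural = 0.5 * alpha @ K @ alpha          # assumed form of Omega
    empirical = np.sum(C * np.log1p(np.exp(-z)))  # per-example C_i weights
    return structural + empirical
```

Note that each example carries its own weight C_i, which is what makes the unlearning construction later in the talk possible (setting C_k = 0 removes example k from the empirical risk).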
Model Selection

The Goal
- Identify the model that yields the best generalization performance

The Need
- An efficient means of estimating the generalization performance (error rate) using the available data

Common Approaches
- Validation Set
- K-fold Cross-Validation
- Leave-One-Out (N-fold Cross-Validation)
  - Almost unbiased estimator of the generalization error
  - Valuable tool when data is scarce for one or more classes
LOO Error Estimation

Definition

  e_{LOO} = \frac{1}{N} \sum_{k=1}^{N} I\!\left( z_k^{(k)} < 0 \right),

where z_k^{(k)} = y_k f^{(k)}(x_k) and f^{(k)} is the classifier trained on all data except (x_k, y_k).

Approximation Strategy
- Estimate f^{(k)} based on f to derive an estimate of the LOO margin z_k^{(k)}
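The definition above can always be computed by brute force, retraining N times. A minimal sketch, assuming a hypothetical `fit(X, y)` routine that returns a trained decision function:

```python
import numpy as np

def loo_error(X, y, fit):
    # e_LOO = (1/N) * sum_k I(z_k^(k) < 0), where z_k^(k) = y_k f^(k)(x_k)
    # and fit(X, y) is any training routine returning a decision function
    # (hypothetical signature, for illustration only).
    N = len(y)
    errors = 0
    for k in range(N):
        mask = np.arange(N) != k        # leave example k out
        f_k = fit(X[mask], y[mask])     # retrain on the remaining N - 1 examples
        if y[k] * f_k(X[k]) < 0:        # negative LOO margin => LOO error
            errors += 1
    return errors / N
```

The N retrainings are exactly the cost the approximation in the following slides avoids.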
Unlearning and LOO Margin Estimation

Repeated Steps in LOO
- Unlearn the example (x_k, y_k)
- Compute the LOO margin z_k^{(k)}

Exact Unlearning
- Set the regularization parameter C_k = 0 to remove the example's influence
- Update the classifier parameters to minimize

  \min_{\alpha, b} L^{(k)}(\alpha, b) = \Omega(\alpha, b) + \sum_{i=1,\, i \neq k}^{N} C_i \, g(z_i)

  via multiple Newton steps.
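Exact unlearning re-minimizes the reduced objective L^(k) with repeated Newton steps. A generic undamped Newton loop of that kind, with hypothetical `grad` and `hess` callables supplied by the caller:

```python
import numpy as np

def newton_minimize(grad, hess, w0, tol=1e-8, max_iter=50):
    # Repeated Newton steps w <- w - H(w)^{-1} grad(w). Exact unlearning
    # runs this on L^(k) (i.e., with C_k = 0), warm-started from the
    # full-data solution (alpha*, b*).
    w = w0.copy()
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:   # stationary point reached
            break
        w = w - np.linalg.solve(hess(w), g)
    return w
```

For a strictly convex objective the Hessian is positive definite, so the linear solve is well posed at every iterate; in practice a damped or line-search Newton variant would be safer, but this plain version matches the step used on the slides.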
Unlearning and LOO Margin Estimation

Approximate Unlearning
- Compute only one Newton step to approximate the change in the classifier parameters:

  (\Delta\alpha, \Delta b) =
    -\left[ \nabla^2_{\alpha,b} L(\alpha^*, b^*) \right]^{-1}
      \nabla_{\alpha,b} L^{(k)}(\alpha^*, b^*),
  \qquad \text{where } (\alpha^*, b^*) = \arg\min_{\alpha, b} L(\alpha, b)

- The estimate of the LOO margin is then

  z_k^{(k)} \approx z_k + y_k \left( \sum_{j \in I_S} \Delta\alpha_j \, y_j K(x_j, x_k) + \Delta b \right)

- Substituting the Newton step yields the closed form

  z_k^{(k)} \approx z_k +
    \frac{C_k \, g'(z_k) \, q_k^T \hat{H}^{-1} q_k}
         {1 - C_k \, g''(z_k) \, q_k^T \hat{H}^{-1} q_k},

  \text{where } \hat{H} = \nabla^2_{\alpha,b} L(\alpha^*, b^*)
  \text{ and } q_k = \left[\, y_k y_j K(x_j, x_k) \;\; \forall j \in I_S, \;\; y_k \,\right]^T
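The single-Newton-step estimate z_k^(k) ≈ z_k + C_k g'(z_k) s / (1 - C_k g''(z_k) s), with s = q_k^T H^{-1} q_k, is a direct formula once the inverse Hessian is in hand. A sketch treating g'(z_k), g''(z_k), and H^{-1} as precomputed inputs (function and argument names are illustrative):

```python
import numpy as np

def approx_loo_margin(z_k, C_k, g1, g2, q_k, H_inv):
    # One-Newton-step approximation of the LOO margin:
    #   z_k^(k) ~= z_k + C_k * g'(z_k) * s / (1 - C_k * g''(z_k) * s),
    # where s = q_k^T H^{-1} q_k.
    # g1, g2 are g'(z_k) and g''(z_k); H_inv is the inverse Hessian
    # of L at (alpha*, b*), assumed available from the training phase.
    s = q_k @ H_inv @ q_k
    return z_k + (C_k * g1 * s) / (1.0 - C_k * g2 * s)
```

Since g is a decreasing margin loss (g' <= 0), H is positive definite (s >= 0), and g'' > 0 by strict convexity, the correction term is non-positive whenever the denominator is positive, matching the claim on the next slide that unlearning can only shrink the margin.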
Comments about the Method

Main Computational Expense
- Computing q_k^T \hat{H}^{-1} q_k for each example
- The inverse Hessian is available from the training phase

Margin Sensitivity
- The limiting form of the approximation yields

  \frac{\partial z_k}{\partial C_k} = -g'(z_k) \, q_k^T \hat{H}^{-1} q_k \geq 0

- The margin increases monotonically with decreasing regularization (increasing C_k)
- This implies \Delta z_k \leq 0 when unlearning the example: examples classified incorrectly will remain in error after unlearning

Generalizations
- Approximations can be derived for other strictly convex objective functions and cross-validation methods (e.g. K-fold cross-validation)
Evaluating the Approximation

Two Questions
1. How much approximation error are we incurring?
2. How much computational savings are we gaining?

Evaluation Approach
1. Compute the fraction of margin sign errors
   - Fraction of examples that yield exact and approximate LOO margins with different signs
2. Compute the ratio of CPU time used to compute the exact and approximate LOO error estimates
   - Used the cputime command in MATLAB to collect measurements
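The first metric, the fraction of margin sign errors, can be sketched as follows (a straightforward implementation of the definition above; the function name is illustrative):

```python
import numpy as np

def margin_sign_error_fraction(z_exact, z_approx):
    # Fraction of examples whose exact and approximate LOO margins
    # disagree in sign: only sign disagreements change the LOO error
    # estimate, so this directly bounds the damage the approximation does.
    z_exact = np.asarray(z_exact, dtype=float)
    z_approx = np.asarray(z_approx, dtype=float)
    return float(np.mean(np.sign(z_exact) != np.sign(z_approx)))
```

Only sign flips matter here because e_LOO counts negative margins; an approximate margin that is off in magnitude but right in sign yields the same LOO error estimate.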
Fraction of Margin Sign Errors
Reduction in CPU Time
LOO for Model Selection
Conclusion and Future Work

Recap
- Introduced an approximate LOO error estimation method for strictly convex objective functions
- The method estimates the change in the margin by computing a single Newton step to approximately unlearn a given example

Future Work
- Investigate causes of the LOO error failing to track the test error
- Define and evaluate a method for model selection based on the LOO margin distribution