TRANSCRIPT
Approximate Leave-One-Out Error Estimation for Learning with Smooth, Strictly Convex Margin Loss Functions

Christopher P. Diehl
Research and Technology Development Center, Johns Hopkins Applied Physics Laboratory
September 29, 2004
Overview of the Talk

- General Learning Framework
- Model Selection
- LOO Error Estimation
- Unlearning and LOO Margin Estimation
- Comments on the Method
- Evaluating the Approximation
- Results
- Conclusion and Future Work
General Large Margin Learning Framework

Class of objective functions includes:
- Smoothed Support Vector Machines
- Kernel Logistic Regression

  \min_{\alpha, b} L(\alpha, b) =
    \underbrace{\Omega(\alpha, b)}_{\text{Structural Risk}}
    + \underbrace{\sum_{i=1}^{N} C_i \, g(z_i)}_{\text{Empirical Risk}}

where

  z_i = y_i f(x_i), \qquad
  f(x) = \sum_{j \in I_S} \alpha_j y_j K(x_j, x) + b,

and g and \Omega are strictly convex.
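As a concrete instance of this framework, the sketch below evaluates a kernel logistic regression objective, taking g(z) = log(1 + exp(-z)) and, as an illustrative assumption not specified on the slides, the common choice Omega(alpha, b) = (1/2) alpha^T K alpha with an RBF kernel. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def decision_values(alpha, b, y, K):
    # f(x_i) = sum_j alpha_j y_j K(x_j, x_i) + b
    return K @ (alpha * y) + b

def objective(alpha, b, y, K, C):
    # L(alpha, b) = Omega(alpha, b) + sum_i C_i g(z_i),
    # with z_i = y_i f(x_i) and g(z) = log(1 + exp(-z))
    z = y * decision_values(alpha, b, y, K)
    structural = 0.5 * alpha @ K @ alpha          # assumed form of Omega
    empirical = np.sum(C * np.log1p(np.exp(-z)))  # per-example C_i weights
    return structural + empirical
```

Note that each example carries its own weight C_i, which is what makes the unlearning construction later in the talk possible (setting C_k = 0 removes example k from the empirical risk).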
Model Selection

The Goal
- Identify the model that yields the best generalization performance

The Need
- An efficient means of estimating the generalization performance (error rate) using the available data

Common Approaches
- Validation Set
- K-fold Cross-Validation
- Leave-One-Out (N-fold Cross-Validation)
  - Almost unbiased estimator of the generalization error
  - Valuable tool when data is scarce for one or more classes
LOO Error Estimation

Definition

  e_{LOO} = \frac{1}{N} \sum_{k=1}^{N} I\!\left( z_k^{(k)} < 0 \right),

where z_k^{(k)} = y_k f^{(k)}(x_k) and f^{(k)} is the classifier trained on all data except (x_k, y_k).

Approximation Strategy
- Estimate f^{(k)} based on f to derive an estimate of the LOO margin z_k^{(k)}
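The definition above can always be computed by brute force, retraining N times. A minimal sketch, assuming a hypothetical `fit(X, y)` routine that returns a trained decision function:

```python
import numpy as np

def loo_error(X, y, fit):
    # e_LOO = (1/N) * sum_k I(z_k^(k) < 0), where z_k^(k) = y_k f^(k)(x_k)
    # and fit(X, y) is any training routine returning a decision function
    # (hypothetical signature, for illustration only).
    N = len(y)
    errors = 0
    for k in range(N):
        mask = np.arange(N) != k        # leave example k out
        f_k = fit(X[mask], y[mask])     # retrain on the remaining N - 1 examples
        if y[k] * f_k(X[k]) < 0:        # negative LOO margin => LOO error
            errors += 1
    return errors / N
```

The N retrainings are exactly the cost the approximation in the following slides avoids.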
Unlearning and LOO Margin Estimation

Repeated Steps in LOO
- Unlearn the example (x_k, y_k)
- Compute the LOO margin z_k^{(k)}

Exact Unlearning
- Set the regularization parameter C_k = 0 to remove the example's influence
- Update the classifier parameters to minimize

  \min_{\alpha, b} L^{(k)}(\alpha, b) = \Omega(\alpha, b) + \sum_{i=1,\, i \neq k}^{N} C_i \, g(z_i)

  via multiple Newton steps.
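Exact unlearning re-minimizes the reduced objective L^(k) with repeated Newton steps. A generic undamped Newton loop of that kind, with hypothetical `grad` and `hess` callables supplied by the caller:

```python
import numpy as np

def newton_minimize(grad, hess, w0, tol=1e-8, max_iter=50):
    # Repeated Newton steps w <- w - H(w)^{-1} grad(w). Exact unlearning
    # runs this on L^(k) (i.e., with C_k = 0), warm-started from the
    # full-data solution (alpha*, b*).
    w = w0.copy()
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:   # stationary point reached
            break
        w = w - np.linalg.solve(hess(w), g)
    return w
```

For a strictly convex objective the Hessian is positive definite, so the linear solve is well posed at every iterate; in practice a damped or line-search Newton variant would be safer, but this plain version matches the step used on the slides.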
Unlearning and LOO Margin Estimation

Approximate Unlearning
- Compute only one Newton step to approximate the change in the classifier parameters:

  (\Delta\alpha, \Delta b) =
    -\left[ \nabla^2_{\alpha,b} L(\alpha^*, b^*) \right]^{-1}
      \nabla_{\alpha,b} L^{(k)}(\alpha^*, b^*),
  \qquad \text{where } (\alpha^*, b^*) = \arg\min_{\alpha, b} L(\alpha, b)

- The estimate of the LOO margin is then

  z_k^{(k)} \approx z_k + y_k \left( \sum_{j \in I_S} \Delta\alpha_j \, y_j K(x_j, x_k) + \Delta b \right)

- Substituting the Newton step yields the closed form

  z_k^{(k)} \approx z_k +
    \frac{C_k \, g'(z_k) \, q_k^T \hat{H}^{-1} q_k}
         {1 - C_k \, g''(z_k) \, q_k^T \hat{H}^{-1} q_k},

  \text{where } \hat{H} = \nabla^2_{\alpha,b} L(\alpha^*, b^*)
  \text{ and } q_k = \left[\, y_k y_j K(x_j, x_k) \;\; \forall j \in I_S, \;\; y_k \,\right]^T
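The single-Newton-step estimate z_k^(k) ≈ z_k + C_k g'(z_k) s / (1 - C_k g''(z_k) s), with s = q_k^T H^{-1} q_k, is a direct formula once the inverse Hessian is in hand. A sketch treating g'(z_k), g''(z_k), and H^{-1} as precomputed inputs (function and argument names are illustrative):

```python
import numpy as np

def approx_loo_margin(z_k, C_k, g1, g2, q_k, H_inv):
    # One-Newton-step approximation of the LOO margin:
    #   z_k^(k) ~= z_k + C_k * g'(z_k) * s / (1 - C_k * g''(z_k) * s),
    # where s = q_k^T H^{-1} q_k.
    # g1, g2 are g'(z_k) and g''(z_k); H_inv is the inverse Hessian
    # of L at (alpha*, b*), assumed available from the training phase.
    s = q_k @ H_inv @ q_k
    return z_k + (C_k * g1 * s) / (1.0 - C_k * g2 * s)
```

Since g is a decreasing margin loss (g' <= 0), H is positive definite (s >= 0), and g'' > 0 by strict convexity, the correction term is non-positive whenever the denominator is positive, matching the claim on the next slide that unlearning can only shrink the margin.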
Comments about the Method

Main Computational Expense
- Computing q_k^T \hat{H}^{-1} q_k for each example
- The inverse Hessian is available from the training phase

Margin Sensitivity
- The limiting form of the approximation yields

  \frac{\partial z_k}{\partial C_k} = -g'(z_k) \, q_k^T \hat{H}^{-1} q_k \geq 0

- The margin increases monotonically with decreasing regularization (increasing C_k)
- This implies \Delta z_k \leq 0 when unlearning the example: examples classified incorrectly will remain in error after unlearning

Generalizations
- Approximations can be derived for other strictly convex objective functions and cross-validation methods (e.g. K-fold cross-validation)
Evaluating the Approximation

Two Questions
1. How much approximation error are we incurring?
2. How much computational savings are we gaining?

Evaluation Approach
1. Compute the fraction of margin sign errors
   - Fraction of examples that yield exact and approximate LOO margins with different signs
2. Compute the ratio of CPU time used to compute the exact and approximate LOO error estimates
   - Used the cputime command in MATLAB to collect measurements
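The first metric, the fraction of margin sign errors, can be sketched as follows (a straightforward implementation of the definition above; the function name is illustrative):

```python
import numpy as np

def margin_sign_error_fraction(z_exact, z_approx):
    # Fraction of examples whose exact and approximate LOO margins
    # disagree in sign: only sign disagreements change the LOO error
    # estimate, so this directly bounds the damage the approximation does.
    z_exact = np.asarray(z_exact, dtype=float)
    z_approx = np.asarray(z_approx, dtype=float)
    return float(np.mean(np.sign(z_exact) != np.sign(z_approx)))
```

Only sign flips matter here because e_LOO counts negative margins; an approximate margin that is off in magnitude but right in sign yields the same LOO error estimate.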
Fraction of Margin Sign Errors
Reduction in CPU Time
LOO for Model Selection
Conclusion and Future Work

Recap
- Introduced an approximate LOO error estimation method for strictly convex objective functions
- The method estimates the change in the margin by computing a single Newton step to approximately unlearn a given example

Future Work
- Investigate causes of the LOO error failing to track the test error
- Define and evaluate a method for model selection based on the LOO margin distribution