The Forgetron: A kernel-based Perceptron on a fixed budget
Problem Setting
•Online learning:
•Choose an initial hypothesis f0
•For t=1,2,…
•Receive an instance xt and predict sign(ft(xt))
•If (yt ft(xt) <= 0)
•update the hypothesis f
•Goal: minimize the number of prediction mistakes
•Kernel-based hypotheses
•Example: the dual Perceptron
•The weight of every example in the hypothesis is always 1
•Initial hypothesis:
•Update rule:
•Competitive analysis
•Compete against any g:
•Goal: A provably correct algorithm on a budget: |It| <= B
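The online protocol and the dual Perceptron example above can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian kernel, the `gamma` parameter, the function names, and the toy data are all assumptions introduced here. The hypothesis is stored as a list of (instance, label) pairs, so an empty list plays the role of the zero initial hypothesis, and each mistake adds the current example with weight 1.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel (an assumed choice); K(x, x) = 1 for every x."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def predict(active, x, kernel):
    """f(x) = sum over stored examples of y_i * K(x_i, x)."""
    return sum(y_i * kernel(x_i, x) for x_i, y_i in active)

def dual_perceptron(stream, kernel=rbf_kernel):
    """Run the online protocol; return (stored examples, mistake count)."""
    active = []      # stored (x_i, y_i) pairs; empty list means f = 0
    mistakes = 0
    for x_t, y_t in stream:
        if y_t * predict(active, x_t, kernel) <= 0:   # prediction mistake
            mistakes += 1
            active.append((x_t, y_t))                 # add example t, weight 1
    return active, mistakes

# Hypothetical XOR-like toy stream, repeated for a few passes
data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
active, mistakes = dual_perceptron(data * 5)
```

Note that the stored set grows by one on every mistake and never shrinks, which is the memory problem the budget constraint |It| <= B is meant to address.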
The Forgetron
Proof Technique
Experiments
Hardness Result
•For any kernel-based algorithm that satisfies |It| <= B, we can find g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss.
•For each ft based on B examples, there exists x in X such that ft(x) = 0
•norm of the competitor is
we cannot compete with hypotheses whose norm is larger than
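One way to see the intuition behind the hardness result is the small construction below. It is an illustrative assumption made here, not necessarily the construction behind the poster's result: the linear kernel, the orthonormal instances, the budget value, and the labels are all hypothetical. With B + 1 orthonormal instances, any hypothesis built from at most B of them evaluates to 0 on the one it lacks, while a competitor that sums all labelled instances attains margin 1 on every instance.

```python
import math

def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))

def basis(i, dim):
    """i-th standard basis vector; K(e_i, e_i) = 1 under the linear kernel."""
    return tuple(1.0 if j == i else 0.0 for j in range(dim))

B = 4                                   # hypothetical budget
dim = B + 1                             # one more instance than the budget
instances = [basis(i, dim) for i in range(dim)]
labels = [1, -1, 1, 1, -1]              # arbitrary fixed labels

# A hypothesis supported on B of the B + 1 instances, with arbitrary weights
support = instances[:B]
coeffs = [0.7, -1.2, 0.3, 2.0]

def f(x):
    return sum(c * linear_kernel(x_i, x) for c, x_i in zip(coeffs, support))

missing = instances[B]
value_on_missing = f(missing)           # zero, so y * f(x) <= 0 for either label

# Competitor g = sum_i y_i * e_i has margin exactly 1 on every instance
g = tuple(float(y) for y in labels)
margins = [y * linear_kernel(g, x) for x, y in zip(instances, labels)]
norm_g = math.sqrt(linear_kernel(g, g))  # equals sqrt(B + 1) in this construction
```

An adversary that always presents an instance outside the current support therefore forces a mistake on every round, while g has zero hinge loss throughout.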
•Initialize:
•For t=1,2…
•define
•Receive an instance xt, predict sign(ft(xt)), and then receive yt
•If yt ft(xt) <= 0, set Mt = M_{t-1} + 1 and update:
•If |I’t|<= B skip the next two steps
•define rt = min It
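The control flow of the steps above can be sketched as a budgeted Perceptron with three stages on each mistake: Perceptron update, shrinking, and removal of the oldest active example. This is a sketch under stated assumptions, not the Forgetron itself: the poster's rule for choosing the shrinking coefficient is not recoverable from the text here, so a fixed constant `phi` (a hypothetical parameter) stands in for it, and the linear kernel and function names are likewise assumptions.

```python
def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))

def budgeted_perceptron(stream, budget, phi=0.9, kernel=linear_kernel):
    """Perceptron update, then shrinking and removal when over budget.

    `phi` is a placeholder constant in (0, 1]; it is NOT the poster's
    rule for setting the shrinking coefficient.
    """
    active = []      # entries [weight, x_i, y_i], oldest first
    mistakes = 0
    for x_t, y_t in stream:
        f_x = sum(w * y_i * kernel(x_i, x_t) for w, x_i, y_i in active)
        if y_t * f_x <= 0:
            mistakes += 1
            active.append([1.0, x_t, y_t])   # step 1: Perceptron update
            if len(active) > budget:         # otherwise skip steps 2 and 3
                for entry in active:         # step 2: shrink every weight
                    entry[0] *= phi
                active.pop(0)                # step 3: remove the oldest example
    return active, mistakes

# Hypothetical toy stream
stream = [((1.0, 0.0), 1), ((0.0, 1.0), -1),
          ((-1.0, 0.0), -1), ((0.0, -1.0), 1)] * 3
active, mistakes = budgeted_perceptron(stream, budget=2)
```

Because shrinking is applied before removal, the oldest example has been scaled down on every over-budget mistake since it was added, so its weight is small by the time it is dropped.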
•Progress measure: can be rewritten as
•The Perceptron update leads to positive progress:
•The shrinking and removal steps might lead to negative progress. Can we quantify the deviation they might cause?
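The claim that the Perceptron update makes positive progress can be checked with a short derivation. The poster's own progress formula is missing from this text, so the sketch below assumes the common choice of measuring progress as the decrease in squared distance to the competitor g. Here f' denotes the hypothesis after the Perceptron step alone, and the hinge loss of g on round t is written as ℓ_t; both symbols are introduced here for illustration.

```latex
\begin{align*}
\Delta &= \|f_t - g\|^2 - \|f' - g\|^2,
        \qquad f' = f_t + y_t K(x_t,\cdot) \\
       &= -2\,y_t \langle f_t - g,\; K(x_t,\cdot) \rangle - K(x_t,x_t) \\
       &= 2\,y_t\, g(x_t) - 2\,y_t f_t(x_t) - K(x_t,x_t) \\
       &\ge 2\,(1 - \ell_t) - 1 \;=\; 1 - 2\,\ell_t ,
\end{align*}
% using y_t f_t(x_t) <= 0 on a mistake round, K(x_t,x_t) <= 1,
% and y_t g(x_t) >= 1 - \ell_t with \ell_t = \max\{0,\, 1 - y_t g(x_t)\}.
```

Under these assumptions, a mistake round on which g suffers no loss moves the hypothesis closer to g by at least 1 in squared distance.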
synthetic (10% label noise)
The Hebrew University Jerusalem, Israel
Ofer Dekel
Shai Shalev-Shwartz
Yoram Singer
•A sequence {(x1,y1),…,(xT,yT)} such that K(xt,xt) <= 1
•Budget B >= 84
•Competitor g s.t.
•Then,
Step (1) - Perceptron
Step (2) - Shrinking
Step (3) - Removal
•Compare to Crammer, Kandola, Singer (NIPS’03)
•Display online error vs. budget constraint
Deviation due to Shrinking
Deviation due to Removal
•Case I: the shrinking is projection onto the ball of radius U = ||g||:
•Case II: aggressive shrinking
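Case I, projecting onto the ball of radius U, amounts to rescaling the hypothesis by min(1, U / ||f||). A minimal sketch follows, assuming the hypothesis is given by signed weights over stored examples and a precomputed Gram matrix; the function names and the toy numbers are hypothetical.

```python
import math

def rkhs_norm(weights, gram):
    """||f|| for f = sum_i weights[i] * K(x_i, .), with gram[i][j] = K(x_i, x_j)."""
    n = len(weights)
    sq = sum(weights[i] * weights[j] * gram[i][j]
             for i in range(n) for j in range(n))
    return math.sqrt(max(sq, 0.0))

def project_onto_ball(weights, gram, radius):
    """Scale f by phi = min(1, radius / ||f||); return (new weights, phi)."""
    norm = rkhs_norm(weights, gram)
    phi = 1.0 if norm <= radius else radius / norm
    return [phi * w for w in weights], phi

# Two orthonormal stored examples, so ||f|| = sqrt(3^2 + 4^2) = 5
gram = [[1.0, 0.0], [0.0, 1.0]]
shrunk, phi = project_onto_ball([3.0, 4.0], gram, radius=1.0)
```

A hypothesis already inside the ball is returned unchanged with phi = 1, so in this case shrinking only acts when the norm exceeds U.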
•If example (x,y) is activated on round r and deactivated on round t then,
Perceptron’s progress
Deviation due to removal:
Mistake Bound
hinge loss: max { 0 , 1 - yt g(xt) }
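For concreteness, the hinge loss of a competitor on a single example can be computed as below. The function name and the sample values are illustrative assumptions.

```python
def hinge_loss(y, g_x):
    """max{0, 1 - y * g(x)}: zero only when g classifies x with margin >= 1."""
    return max(0.0, 1.0 - y * g_x)

# Hypothetical (label, g(x)) pairs: confident and correct, correct but
# inside the margin, and wrong
losses = [hinge_loss(1, 2.0), hinge_loss(1, 0.25), hinge_loss(-1, 0.5)]
```

A correct prediction with margin below 1 still incurs a positive loss, so the cumulative hinge loss of g upper-bounds its number of mistakes.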
Figure: average error vs. budget size B, Forgetron vs. CKS, on four datasets: synthetic (10% label noise), USPS, census-income (adult), and MNIST.