work.caltech.edu/slides/slides12.pdf
TRANSCRIPT
Review of Lecture 11
• Overfitting: fitting the data more than is warranted
[Figure: data points, target function, and an overfitting fit]
VC allows it; doesn't predict it
Fitting the noise, stochastic/deterministic
• Deterministic noise
[Figure: target f and the best hypothesis h∗; their difference is the deterministic noise]
[Figure: overfit measure as a function of the number of data points N and the target complexity Qf]
Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 12: Regularization
Sponsored by Caltech's Provost Office, E&AS Division, and IST • Thursday, May 10, 2012
Outline
• Regularization - informal
• Regularization - formal
• Weight decay
• Choosing a regularizer
Two approaches to regularization
Mathematical: ill-posed problems in function approximation
Heuristic: handicapping the minimization of $E_{\text{in}}$
A familiar example
[Figure: the fit to the data without regularization vs. with regularization]
and the winner is . . .
[Figure: average hypothesis ḡ(x) against the target sin(πx), without and with regularization]
without regularization: bias = 0.21, var = 1.69
with regularization: bias = 0.23, var = 0.33
The polynomial model
$\mathcal{H}_Q$: polynomials of order $Q$, i.e. linear regression in $\mathcal{Z}$ space

$\mathbf{z} = \begin{bmatrix} 1 \\ L_1(x) \\ \vdots \\ L_Q(x) \end{bmatrix} \qquad \mathcal{H}_Q = \left\{ \sum_{q=0}^{Q} w_q\, L_q(x) \right\}$

Legendre polynomials:
$L_1 = x,\quad L_2 = \tfrac{1}{2}(3x^2 - 1),\quad L_3 = \tfrac{1}{2}(5x^3 - 3x),\quad L_4 = \tfrac{1}{8}(35x^4 - 30x^2 + 3),\quad L_5 = \tfrac{1}{8}(63x^5 - \cdots)$
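As a sketch (not from the slides), the $\mathcal{Z}$-space transform can be built with NumPy's Legendre utilities; legvander evaluates $L_0, \dots, L_Q$ at each input, giving the Z-space data matrix:

```python
# Sketch: Legendre feature transform z = [1, L1(x), ..., LQ(x)].
# numpy.polynomial.legendre.legvander is a standard NumPy helper.
import numpy as np
from numpy.polynomial import legendre

def legendre_features(x, Q):
    """Rows are z_n = [L0(x_n), L1(x_n), ..., LQ(x_n)]; note L0 = 1."""
    return legendre.legvander(np.asarray(x), Q)

Z = legendre_features(np.linspace(-1, 1, 20), 5)
print(Z.shape)  # (20, 6): one row per point, one column per L_q
```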
Unconstrained solution
Given $(x_1, y_1), \dots, (x_N, y_N) \longrightarrow (\mathbf{z}_1, y_1), \dots, (\mathbf{z}_N, y_N)$

Minimize $E_{\text{in}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N}\left(\mathbf{w}^{\mathsf T}\mathbf{z}_n - y_n\right)^2$

Minimize $\frac{1}{N}(Z\mathbf{w} - \mathbf{y})^{\mathsf T}(Z\mathbf{w} - \mathbf{y})$

$\mathbf{w}_{\text{lin}} = (Z^{\mathsf T} Z)^{-1} Z^{\mathsf T}\mathbf{y}$
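A minimal sketch of the unconstrained fit, reusing legendre_features from the sketch above; the sinusoidal target and the noise level are illustrative assumptions, not from the slides:

```python
# Sketch: w_lin minimizes (1/N) (Zw - y)^T (Zw - y).
# lstsq solves least squares without forming (Z^T Z)^{-1} explicitly.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 15)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(15)  # assumed noisy target

Z = legendre_features(x, 10)                  # Q = 10 polynomial model
w_lin, *_ = np.linalg.lstsq(Z, y, rcond=None)
```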
Constraining the weights
Hard constraint: $\mathcal{H}_2$ is a constrained version of $\mathcal{H}_{10}$ with $w_q = 0$ for $q > 2$

Softer version: $\sum_{q=0}^{Q} w_q^2 \le C$ (soft-order constraint)

Minimize $\frac{1}{N}(Z\mathbf{w} - \mathbf{y})^{\mathsf T}(Z\mathbf{w} - \mathbf{y})$ subject to $\mathbf{w}^{\mathsf T}\mathbf{w} \le C$

Solution: $\mathbf{w}_{\text{reg}}$ instead of $\mathbf{w}_{\text{lin}}$
Solving for $\mathbf{w}_{\text{reg}}$
Minimize $E_{\text{in}}(\mathbf{w}) = \frac{1}{N}(Z\mathbf{w} - \mathbf{y})^{\mathsf T}(Z\mathbf{w} - \mathbf{y})$ subject to $\mathbf{w}^{\mathsf T}\mathbf{w} \le C$

[Figure: contours of constant $E_{\text{in}}$ around $\mathbf{w}_{\text{lin}}$ and the constraint surface $\mathbf{w}^{\mathsf T}\mathbf{w} = C$; at the solution, $\nabla E_{\text{in}}$ is normal to the surface]

$\nabla E_{\text{in}}(\mathbf{w}_{\text{reg}}) \propto -\mathbf{w}_{\text{reg}}$, say $\nabla E_{\text{in}}(\mathbf{w}_{\text{reg}}) = -2\,\frac{\lambda}{N}\,\mathbf{w}_{\text{reg}}$

$\nabla E_{\text{in}}(\mathbf{w}_{\text{reg}}) + 2\,\frac{\lambda}{N}\,\mathbf{w}_{\text{reg}} = \mathbf{0}$

Minimize $E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf T}\mathbf{w}$ $\qquad (C \uparrow\ \ \lambda \downarrow)$
Augmented error
Minimizing $E_{\text{aug}}(\mathbf{w}) = E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf T}\mathbf{w} = \frac{1}{N}(Z\mathbf{w} - \mathbf{y})^{\mathsf T}(Z\mathbf{w} - \mathbf{y}) + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf T}\mathbf{w}$ unconditionally

solves

Minimizing $E_{\text{in}}(\mathbf{w}) = \frac{1}{N}(Z\mathbf{w} - \mathbf{y})^{\mathsf T}(Z\mathbf{w} - \mathbf{y})$ subject to $\mathbf{w}^{\mathsf T}\mathbf{w} \le C$ ⟵ VC formulation
The solution
Minimize $E_{\text{aug}}(\mathbf{w}) = E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf T}\mathbf{w} = \frac{1}{N}\left((Z\mathbf{w} - \mathbf{y})^{\mathsf T}(Z\mathbf{w} - \mathbf{y}) + \lambda\,\mathbf{w}^{\mathsf T}\mathbf{w}\right)$

$\nabla E_{\text{aug}}(\mathbf{w}) = \mathbf{0} \implies Z^{\mathsf T}(Z\mathbf{w} - \mathbf{y}) + \lambda\,\mathbf{w} = \mathbf{0}$

$\mathbf{w}_{\text{reg}} = (Z^{\mathsf T} Z + \lambda I)^{-1} Z^{\mathsf T}\mathbf{y}$ (with regularization)
as opposed to $\mathbf{w}_{\text{lin}} = (Z^{\mathsf T} Z)^{-1} Z^{\mathsf T}\mathbf{y}$ (without regularization)
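The closed form translates directly to code. This sketch solves the linear system $(Z^{\mathsf T}Z + \lambda I)\mathbf{w} = Z^{\mathsf T}\mathbf{y}$ rather than forming the inverse, which is numerically safer and gives the same $\mathbf{w}_{\text{reg}}$:

```python
# Sketch: w_reg = (Z^T Z + lambda I)^{-1} Z^T y.
import numpy as np

def fit_regularized(Z, y, lam):
    d = Z.shape[1]
    # Solve (Z^T Z + lam I) w = Z^T y instead of inverting explicitly
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

w_reg = fit_regularized(Z, y, lam=0.01)   # Z, y from the earlier sketch
```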
The result
Minimizing $E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf T}\mathbf{w}$ for different $\lambda$'s:

[Figure: four fits for $\lambda = 0,\ 0.0001,\ 0.01,\ 1$; from overfitting at $\lambda = 0$ to underfitting at $\lambda = 1$]
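To reproduce the sweep, one can estimate $E_{\text{out}}$ on a dense test grid; the grid and the noiseless test targets are illustrative assumptions, reusing Z, y, and fit_regularized from the earlier sketches:

```python
# Sketch: sweep lambda from overfitting (lambda = 0) to underfitting (lambda = 1).
x_test = np.linspace(-1, 1, 200)
Z_test = legendre_features(x_test, 10)
y_test = np.sin(np.pi * x_test)          # noiseless target for the E_out estimate

for lam in [0.0, 0.0001, 0.01, 1.0]:
    w = fit_regularized(Z, y, lam)
    e_out = np.mean((Z_test @ w - y_test) ** 2)
    print(f"lambda = {lam:<8} estimated E_out = {e_out:.3f}")
```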
Weight 'decay'
Minimizing $E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\,\mathbf{w}^{\mathsf T}\mathbf{w}$ is called weight decay. Why?

Gradient descent:
$\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\,\nabla E_{\text{in}}(\mathbf{w}(t)) - 2\,\eta\,\frac{\lambda}{N}\,\mathbf{w}(t) = \mathbf{w}(t)\left(1 - \frac{2\eta\lambda}{N}\right) - \eta\,\nabla E_{\text{in}}(\mathbf{w}(t))$

Applies in neural networks:
$\mathbf{w}^{\mathsf T}\mathbf{w} = \sum_{l=1}^{L}\ \sum_{i=0}^{d^{(l-1)}}\ \sum_{j=1}^{d^{(l)}} \left(w_{ij}^{(l)}\right)^2$
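As a sketch, here is one batch gradient-descent step with weight decay: the weights shrink by the factor $(1 - 2\eta\lambda/N)$ before the usual gradient move ($\eta$ and $\lambda$ are illustrative choices):

```python
# Sketch: w(t+1) = w(t) * (1 - 2*eta*lam/N) - eta * grad_Ein(w(t)).
import numpy as np

def weight_decay_step(w, Z, y, eta=0.1, lam=0.01):
    N = len(y)
    grad_Ein = (2.0 / N) * Z.T @ (Z @ w - y)   # gradient of (1/N)||Zw - y||^2
    return w * (1 - 2 * eta * lam / N) - eta * grad_Ein
```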
Variations of weight decay
Emphasis of certain weights: $\sum_{q=0}^{Q} \gamma_q\, w_q^2$

Examples: $\gamma_q = 2^q \implies$ low-order fit; $\gamma_q = 2^{-q} \implies$ high-order fit
Neural networks: different layers get different $\gamma$'s
Tikhonov regularizer: $\mathbf{w}^{\mathsf T}\,\Gamma^{\mathsf T}\Gamma\,\mathbf{w}$
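A sketch of the low-order-fit emphasis written as a Tikhonov regularizer: a hypothetical diagonal $\Gamma$ with $(\Gamma^{\mathsf T}\Gamma)_{qq} = 2^q$ recovers $\gamma_q = 2^q$:

```python
# Sketch: emphasis sum_q gamma_q w_q^2 as the Tikhonov form w^T Gamma^T Gamma w.
import numpy as np

Q = 10
Gamma = np.diag(np.sqrt(2.0 ** np.arange(Q + 1)))  # so (Gamma^T Gamma)_qq = 2^q

def fit_tikhonov(Z, y, lam, Gamma):
    # Minimizes (1/N)( ||Zw - y||^2 + lam * w^T Gamma^T Gamma w )
    return np.linalg.solve(Z.T @ Z + lam * Gamma.T @ Gamma, Z.T @ y)
```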
Even weight growth!
We 'constrain' the weights to be large - bad!

Practical rule:
stochastic noise is 'high-frequency'
deterministic noise is also non-smooth
⟹ constrain learning towards smoother hypotheses

[Figure: expected $E_{\text{out}}$ vs. regularization parameter $\lambda$ for weight decay and weight growth]
General form of augmented error
Calling the regularizer $\Omega = \Omega(h)$, we minimize

$E_{\text{aug}}(h) = E_{\text{in}}(h) + \frac{\lambda}{N}\,\Omega(h)$

Rings a bell? $\quad E_{\text{out}}(h) \le E_{\text{in}}(h) + \Omega(\mathcal{H})$

$E_{\text{aug}}$ is better than $E_{\text{in}}$ as a proxy for $E_{\text{out}}$
Outline
• Regularization - informal
• Regularization - formal
• Weight decay
• Choosing a regularizer
The perfect regularizer Ω
Constraint in the 'direction' of the target function (going in circles)
Guiding principle: direction of smoother or simpler
Chose a bad Ω? We still have λ!
Regularization is a necessary evil
Neural-network regularizers
Weight decay: from linear to logical

[Figure: linear, tanh, and hard-threshold units; tanh interpolates between the linear regime (small weights) and a hard ±1 threshold (large weights)]

Weight elimination: fewer weights ⟹ smaller VC dimension

Soft weight elimination:
$\Omega(\mathbf{w}) = \sum_{i,j,l} \frac{\left(w_{ij}^{(l)}\right)^2}{\beta^2 + \left(w_{ij}^{(l)}\right)^2}$
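A sketch of the soft-weight-elimination penalty: for $|w| \ll \beta$ it behaves like weight decay, and for $|w| \gg \beta$ it saturates toward 1 per weight, nudging small weights toward zero:

```python
# Sketch: Omega(w) = sum over all weights of w^2 / (beta^2 + w^2),
# where `weights` is a list of per-layer weight matrices.
import numpy as np

def soft_weight_elimination(weights, beta=1.0):
    return sum(np.sum(W**2 / (beta**2 + W**2)) for W in weights)
```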
Early stopping as a regularizer

[Figure: $E_{\text{in}}$ (bottom curve) and $E_{\text{out}}$ (top curve) vs. epochs; $E_{\text{out}}$ bottoms out at the early-stopping point while $E_{\text{in}}$ keeps decreasing]

Regularization through the optimizer!
When to stop? Validation
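A sketch of early stopping driven by a validation set; the model interface (train_step, error) is a hypothetical placeholder, not an API from the lecture:

```python
# Sketch: train for many epochs but return the weights that minimized
# the validation error, i.e. stop (in effect) at the E_out minimum.
import copy

def train_with_early_stopping(model, train_set, val_set, epochs=10000):
    best_err, best_model = float("inf"), copy.deepcopy(model)
    for _ in range(epochs):
        model.train_step(train_set)     # hypothetical: one optimizer pass
        err = model.error(val_set)      # hypothetical: validation error
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
    return best_model
```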
The optimal λ

Stochastic noise:
[Figure: expected $E_{\text{out}}$ vs. regularization parameter $\lambda$ for noise levels $\sigma^2 = 0,\ 0.25,\ 0.5$]

Deterministic noise:
[Figure: expected $E_{\text{out}}$ vs. regularization parameter $\lambda$ for target complexities $Q_f = 15,\ 30,\ 100$]