Lecture 08 – Part 01: Model Complexity — sierra.ucsd.edu/cse151a-2020-sp/lectures/08...
TRANSCRIPT
Lecture 08 – Part 01: Model Complexity
[Plot: data points, y vs. x]
Empirical Risk Minimization
1. Pick a model.
   ▶ E.g., linear prediction rules.
2. Pick a loss.
   ▶ E.g., mean squared error.
3. Find a prediction rule minimizing the risk.
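The three steps can be sketched in code for a linear model with squared loss (a minimal illustration with made-up data, not from the lecture):

```python
import numpy as np

# Toy data: one feature, with a roughly linear relationship plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 5 + 10 * x + rng.normal(scale=1.0, size=20)

# Step 1: model = linear prediction rules H(x) = w0 + w1*x.
X = np.column_stack([np.ones_like(x), x])

# Step 2: loss = mean squared error.
# Step 3: minimize the empirical risk; for MSE this is the
# least-squares solution.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = np.mean((X @ w - y) ** 2)
```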
Big Decision
▶ Pick a model.
▶ Picking the wrong model causes problems.
Underfitting
▶ Fit 𝐻(𝑥) = 𝑤0 + 𝑤1𝑥?
▶ We have underfit the data.
[Plot: straight-line fit to the data, y vs. x]
Overfitting
▶ Fit 𝐻(𝑥) = 𝑤0 + 𝑤1𝑥 + 𝑤2𝑥² + … + 𝑤10𝑥¹⁰?
▶ We have overfit the data.
[Plot: degree-10 polynomial fit to the data, y vs. x]
Model Complexity
▶ Difference? Complexity.
▶ Complex models are highly flexible.
   ▶ They tend to overfit.
▶ Simple models are stiff.
   ▶ They tend to underfit.
Degree 10 Polynomial
[Plot: degree-10 polynomial fit to the data points]
Degree 1 Polynomial
[Plot: degree-1 (linear) fit to the data points]
Example: kNN
▶ 1NN: complex model, likely to overfit.
▶ 20NN: less complex model, likely to underfit.
Choosing Model Complexity
▶ How do we choose between two models?
   ▶ Between degree 10 and degree 1?
   ▶ Between 1NN and 20NN?
▶ Not always obvious.
Bad Idea: Use training MSE
▶ Which has smaller MSE on training data?
▶ Problem: the best degree-10 polynomial will always have smaller MSE on the training data.
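The problem is easy to demonstrate on synthetic data (an illustrative sketch; the dataset is made up, not the one from the slides). A degree-1 model is a special case of a degree-10 model, so the best degree-10 fit can never have larger training MSE:

```python
import numpy as np

# Data that is actually linear, plus noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, size=15))
y = 5 + 10 * x + rng.normal(scale=1.0, size=15)

def train_mse(degree):
    # Fit the best polynomial of this degree by least squares,
    # then score it on the same training points.
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# The flexible model always wins on training data,
# even though the truth is linear.
assert train_mse(10) <= train_mse(1) + 1e-6
```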
Good Idea: Use validation MSE
▶ We care about generalization.
▶ So keep a small amount of data hidden in a validation set.
▶ Fit the model on the training data, compute the MSE on the validation set.
▶ Pick whichever model has smaller validation error.
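The procedure above can be sketched as follows (a hedged illustration on made-up data; the split sizes are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=40)
y = 5 + 10 * x + rng.normal(scale=1.0, size=40)

# Hold out 10 of the 40 points as a hidden validation set.
idx = rng.permutation(len(x))
val, train = idx[:10], idx[10:]

def validation_mse(degree):
    # Fit on the training data only...
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    # ...then compute MSE on the held-out validation points.
    return np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)

# Pick whichever model has smaller validation error.
best = min([1, 10], key=validation_mse)
```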
What do you expect?
▶ You fit a complex model on training data.
▶ Test it on a validation set.
▶ Likely: validation MSE > training MSE.
What do you expect?
▶ You fit a very simple model on training data.
▶ Test it on a validation set.
▶ Likely: validation MSE ≈ training MSE.
Cross-Validation
▶ We want all the training data we can get.
▶ Reserving some of it is wasteful.
▶ Idea: split the data into pieces; each piece takes a turn as the validation set.
k-Fold Cross Validation
1. Split the data set into 𝑘 pieces, 𝑆1, … , 𝑆𝑘.
2. Loop 𝑘 times; on iteration 𝑖:
   ▶ Use 𝑆𝑖 as the validation set; the rest as training.
   ▶ Compute the validation error 𝜖𝑖.
3. Overall error: (1/𝑘) ∑ 𝜖𝑖.
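The three steps above can be sketched directly (an illustrative implementation on synthetic data; fold count and dataset are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=30)
y = 5 + 10 * x + rng.normal(scale=1.0, size=30)

def k_fold_cv_error(degree, k=5):
    # 1. Split the data set into k pieces S_1, ..., S_k.
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    # 2. Loop k times; each piece takes a turn as validation set.
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        errors.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))
    # 3. Overall error is the average of the k validation errors.
    return np.mean(errors)
```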
Leave-One-Out Cross Validation
▶ Suppose we have 𝑛 labeled data points.
▶ LOOCV: 𝑘-fold CV with 𝑘 = 𝑛.
Another Approach
▶ We can control complexity by choosing the model.
▶ Also: via regularization.
Regularization
▶ Let’s fit a complex model: 𝑤0 + 𝑤1𝑥 + … + 𝑤10𝑥¹⁰.
▶ But impose a budget on the weights, 𝑤0, … , 𝑤𝑑.
[Plot: data points, y vs. x]
Budgeting Weights
▶ One way to budget: ask that ‖w⃗‖₂ is small.
▶ Before: minimize
   R_sq(w⃗) = (1/n) ∑ᵢ₌₁ⁿ (w⃗ ⋅ x⃗⁽ⁱ⁾ − yᵢ)²
▶ Now: minimize
   R̃_sq(w⃗) = (1/n) ∑ᵢ₌₁ⁿ (w⃗ ⋅ x⃗⁽ⁱ⁾ − yᵢ)² + λ‖w⃗‖₂²
Solution
▶ The regularized Normal Equations:
   (XᵀX + λI) w⃗ = Xᵀ y⃗
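The regularized normal equations translate to one linear solve (a sketch on made-up data; the true weights and noise level here are arbitrary):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Solve (X^T X + lam * I) w = X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 3.0])
y = X @ w_true + rng.normal(scale=0.1, size=50)

w_small = ridge_fit(X, y, lam=1e-6)   # nearly ordinary least squares
w_big = ridge_fit(X, y, lam=100.0)    # weights shrunk toward zero

# A larger lambda means a tighter weight budget.
assert np.linalg.norm(w_big) < np.linalg.norm(w_small)
```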
Example
λ = 0
[Plot: degree-10 fit with λ = 0]
Example
λ = 1 × 10⁻⁴
[Plot: degree-10 fit with λ = 1 × 10⁻⁴]
Example
λ = 1
[Plot: degree-10 fit with λ = 1]
Regularization
▶ As λ increases, simpler models are preferred.
▶ Pick λ using cross-validation.
Other Penalizations
▶ ‖w⃗‖₂² is ℓ2 regularization (explicit solution)
   ▶ a.k.a. ridge regression
▶ ‖w⃗‖₁ is ℓ1 regularization (no explicit solution)
   ▶ a.k.a. the LASSO
   ▶ encourages sparse w⃗
Lecture 08 – Part 02: Logistic Regression
Note
▶ The midterm will cover everything up to right now.
▶ Regularization: yes.
▶ Logistic regression: no.
Predicting Heart Disease
▶ Classification problem: Does a patient have heart disease?
▶ Features: blood pressure, cholesterol level, exercise amount, maximum heart rate, sex.
Better idea...
▶ Instead of predicting yes/no...
▶ Give a probability that they have heart disease.
   ▶ 1 = definitely yes
   ▶ 0 = definitely no
   ▶ 0.75 = probably yes
   ▶ …
Associations
▶ If cholesterol is high, increased likelihood.
   ▶ Positive association.
▶ If exercise is low, increased likelihood.
   ▶ Negative association.
The Model
▶ Measure cholesterol (𝑥1), exercise (𝑥2), etc.
▶ Idea: a weighted¹ “vote” for heart disease:
   𝑤1𝑥1 + 𝑤2𝑥2 + … + 𝑤𝑑𝑥𝑑
▶ Convention:
   ▶ A positive number = vote for yes
   ▶ A negative number = vote for no

¹We’ll learn the weights later.
The Model
▶ Add a “bias” term:
   𝑤0 + 𝑤1𝑥1 + 𝑤2𝑥2 + … + 𝑤𝑑𝑥𝑑 = w⃗ ⋅ Aug(x⃗)
▶ The more positive w⃗ ⋅ Aug(x⃗), the more likely.
▶ The more negative w⃗ ⋅ Aug(x⃗), the less likely.
Converting to a Probability
▶ Probabilities are between 0 and 1.
▶ Problem: w⃗ ⋅ Aug(x⃗) can be anything in (−∞, ∞).
▶ We need to convert it to a probability.
The Logistic Function
   σ(t) = 1 / (1 + e⁻ᵗ)
[Plot: the logistic function σ(t), rising from 0 to 1 as t runs from −∞ to ∞]
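The formula above is one line of code (a minimal sketch; the function name `sigma` is our own choice):

```python
import numpy as np

def sigma(t):
    # Logistic function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

# At t = 0 the output is exactly 0.5; large positive inputs
# approach 1, large negative inputs approach 0.
assert sigma(0) == 0.5
```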
The Model
▶ Our simplified model for the probability of heart disease:
   𝐻(x⃗) = σ(w⃗ ⋅ Aug(x⃗))
▶ What should w⃗ be?
▶ To find w⃗, use the principle of maximum likelihood.
Maximum Likelihood
▶ Suppose you have an unfair coin.
▶ The probability of heads is 𝑝, unknown.
▶ Flip 8 times and see: H, H, T, H, H, H, H, T.
▶ Which is more likely: 𝑝 = 0.5 or 𝑝 = 0.75?
Maximum Likelihood
▶ Assume coin flips are independent.
▶ The likelihood of H, H, T, H, H, H, H, T is:
   L(𝑝) = 𝑝 ⋅ 𝑝 ⋅ (1 − 𝑝) ⋅ 𝑝 ⋅ 𝑝 ⋅ 𝑝 ⋅ 𝑝 ⋅ (1 − 𝑝) = 𝑝⁶(1 − 𝑝)²
▶ Idea: find 𝑝 maximizing L(𝑝).
   ▶ Equivalently, find 𝑝 maximizing log L(𝑝).
Maximum Likelihood
Find 𝑝 maximizing log L(𝑝) = log[𝑝⁶(1 − 𝑝)²]:
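The slide leaves the maximization for in-class work; the standard calculation (our own sketch, not from the transcript) is:

```latex
\log L(p) = 6\log p + 2\log(1-p), \qquad
\frac{d}{dp}\log L(p) = \frac{6}{p} - \frac{2}{1-p} = 0
\;\Longrightarrow\; 6(1-p) = 2p
\;\Longrightarrow\; p = \frac{6}{8} = 0.75.
```

So of the two candidates from the earlier slide, 𝑝 = 0.75 is the more likely.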
Maximum Likelihood
▶ In general, given 𝑛 observed flips 𝑓1, … , 𝑓𝑛 (𝑛1 heads and 𝑛2 tails), maximize:
   log [𝑃(𝐹1 = 𝑓1) ⋅ 𝑃(𝐹2 = 𝑓2) ⋯ 𝑃(𝐹𝑛 = 𝑓𝑛)] = ∑ᵢ₌₁ⁿ log 𝑃(𝐹ᵢ = 𝑓ᵢ)
Back to Logistic Regression
▶ The probability that person 𝑖 has heart disease:
   𝐻(x⃗⁽ⁱ⁾) = σ(w⃗ ⋅ Aug(x⃗⁽ⁱ⁾))
▶ Gather a data set, (x⃗⁽¹⁾, 𝑦1), … , (x⃗⁽ⁿ⁾, 𝑦𝑛).
▶ What is the most likely w⃗?
Maximum Likelihood
▶ Suppose 3 people, (+, −, +).
▶ Likelihood:
   𝐻(x⃗⁽¹⁾) ⋅ (1 − 𝐻(x⃗⁽²⁾)) ⋅ 𝐻(x⃗⁽³⁾)
   = σ(w⃗ ⋅ Aug(x⃗⁽¹⁾)) ⋅ (1 − σ(w⃗ ⋅ Aug(x⃗⁽²⁾))) ⋅ σ(w⃗ ⋅ Aug(x⃗⁽³⁾))
   = 1/(1 + e^(−w⃗⋅Aug(x⃗⁽¹⁾))) ⋅ (1 − 1/(1 + e^(−w⃗⋅Aug(x⃗⁽²⁾)))) ⋅ 1/(1 + e^(−w⃗⋅Aug(x⃗⁽³⁾)))
Observation
▶ Note:
   1 − 1/(1 + e⁻ᵗ) = 1/(1 + eᵗ)
Maximum Likelihood
▶ The likelihood:
   1/(1 + e^(−w⃗⋅Aug(x⃗⁽¹⁾))) ⋅ 1/(1 + e^(w⃗⋅Aug(x⃗⁽²⁾))) ⋅ 1/(1 + e^(−w⃗⋅Aug(x⃗⁽³⁾)))
▶ Suppose 𝑦ᵢ = 1 if positive, 𝑦ᵢ = −1 if negative:
   1/(1 + e^(−𝑦₁ w⃗⋅Aug(x⃗⁽¹⁾))) ⋅ 1/(1 + e^(−𝑦₂ w⃗⋅Aug(x⃗⁽²⁾))) ⋅ 1/(1 + e^(−𝑦₃ w⃗⋅Aug(x⃗⁽³⁾)))
Maximum Likelihood
▶ In general, the likelihood is:
   L(w⃗) = ∏ᵢ₌₁ⁿ 1/(1 + e^(−𝑦ᵢ w⃗⋅Aug(x⃗⁽ⁱ⁾)))
▶ The log likelihood is:
   log L(w⃗) = −∑ᵢ₌₁ⁿ log[1 + e^(−𝑦ᵢ w⃗⋅Aug(x⃗⁽ⁱ⁾))]
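The log likelihood above is straightforward to evaluate (a sketch; the three augmented feature vectors and weights below are hypothetical numbers, not data from the lecture):

```python
import numpy as np

def log_likelihood(w, X_aug, y):
    # y entries are +1 or -1; X_aug rows are augmented feature vectors.
    margins = y * (X_aug @ w)
    # log L(w) = -sum_i log(1 + exp(-y_i * w . Aug(x^(i))))
    return -np.sum(np.log(1.0 + np.exp(-margins)))

# Tiny hypothetical example: 3 people, 1 feature plus bias.
X_aug = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, -1, 1])
w = np.array([0.0, 1.0])

ll = log_likelihood(w, X_aug, y)
# A log of probabilities is never positive.
assert ll < 0
```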
Maximizing Likelihood
▶ Goal: find w⃗ maximizing log L.
▶ Take the gradient, set it to zero, solve?
▶ Problem: try it, and you’ll get stuck.
▶ Unlike least-squares regression, there is no explicit solution.
Next Time
How to maximize the log likelihood with gradient descent.