Lecture 08 – Part 01: Model Complexity
sierra.ucsd.edu/cse151a-2020-sp/lectures/08...

Posted 25-Jun-2020

TRANSCRIPT

Page 1:

[Figure: the logistic function s(y), plotted for y from roughly −4 to 4; s(y) runs from 0.0 to 1.0.]

Lecture 08 – Part 01: Model Complexity

Page 2:

[Figure: a scatter of the data, y against x; x axis 0.0–1.0, y axis roughly −2.5 to 17.5.]

Page 3:

Empirical Risk Minimization

1. Pick a model.
   ▶ E.g., linear prediction rules.

2. Pick a loss.
   ▶ E.g., mean squared error.

3. Find a prediction rule minimizing the risk.
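The three steps can be sketched in a few lines of numpy. Everything below (the synthetic data, the coefficients) is illustrative, not from the lecture; for a linear model with MSE loss, step 3 is a least-squares problem:

```python
import numpy as np

# Hypothetical synthetic data: y is roughly linear in x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=30)
y = 3.0 + 5.0 * x + rng.normal(scale=0.5, size=30)

# 1. Model: linear prediction rules H(x) = w0 + w1*x.
# 2. Loss: mean squared error.
# 3. Minimize the empirical risk; for MSE this is least squares.
X = np.column_stack([np.ones_like(x), x])   # augmented design matrix
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # w = [w0, w1]
train_mse = np.mean((X @ w - y) ** 2)
```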

Page 4:

Big Decision

▶ Pick a model.

▶ Picking the wrong model causes problems.

Page 5:

Underfitting

▶ Fit H(x) = w₀ + w₁x?

▶ We have underfit the data.

[Figure: a straight-line fit to the data; x axis 0.0–1.0, y axis roughly −2.5 to 17.5.]

Page 6:

Overfitting

▶ Fit H(x) = w₀ + w₁x + w₂x² + … + w₁₀x¹⁰?

▶ We have overfit the data.

[Figure: a wiggly degree-10 fit through the data; x axis 0.0–1.0, y axis 0 to 15.]

Page 7:

Model Complexity

▶ Difference? Complexity.

▶ Complex models are highly flexible.
   ▶ They tend to overfit.

▶ Simple models are stiff.
   ▶ They tend to underfit.
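The flexibility gap is easy to see numerically. A sketch using `np.polyfit` on hypothetical data (the data and the degrees are made up for illustration): since the degree-10 model contains the degree-1 model as a special case, its training error can never be worse.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=15)
y = 10 * x**2 - 5 * x + 2 + rng.normal(scale=0.5, size=15)

# Fit a stiff model (degree 1) and a flexible one (degree 10).
train_mse = {}
for degree in (1, 10):
    coeffs = np.polyfit(x, y, deg=degree)
    train_mse[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)

# The flexible model always matches the training data at least as well;
# that is exactly why training error cannot be used to compare them.
```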

Pages 8–17:

Degree 10 Polynomial

[Figure: the degree-10 polynomial fit, shown across ten consecutive slides; x axis 0.0–1.0, y axis roughly −5 to 25.]

Pages 18–27:

Degree 1 Polynomial

[Figure: the degree-1 (straight-line) fit, shown across ten consecutive slides; x axis 0.0–1.0, y axis roughly −5 to 25.]

Page 28:

Example: kNN

▶ 1NN: complex model, likely to overfit.

▶ 20NN: less complex model, likely to underfit.

Page 29:

Choosing Model Complexity

▶ How do we choose between two models?
   ▶ Between degree 10 and degree 1?
   ▶ Between 1NN and 20NN?

▶ Not always obvious.

Page 30:

Bad Idea: Use training MSE

▶ Which has smaller MSE on training data?

▶ Problem: the best degree-10 polynomial will always have smaller MSE on training data.

Page 32:

Good Idea: Use validation MSE

▶ We care about generalization.

▶ So keep a small amount of data hidden in a validation set.

▶ Fit the model on the training data; compute MSE on the validation set.

▶ Pick whichever model has smaller validation error.
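This procedure can be sketched directly; the dataset, the 30/10 split, and the candidate degrees below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=40)
y = 10 * x**2 - 5 * x + 2 + rng.normal(scale=0.5, size=40)

# Hold out 10 of the 40 points as a validation set.
idx = rng.permutation(40)
train, val = idx[:30], idx[30:]

val_mse = {}
for degree in (1, 10):
    coeffs = np.polyfit(x[train], y[train], deg=degree)   # fit on training data
    val_mse[degree] = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)

best_degree = min(val_mse, key=val_mse.get)   # smaller validation error wins
```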

Page 33:

What do you expect?

▶ You fit a complex model on training data.

▶ Test it on a validation set.

▶ Likely: validation MSE > training MSE.

Page 34:

What do you expect?

▶ You fit a very simple model on training data.

▶ Test it on a validation set.

▶ Likely: validation MSE ≈ training MSE.

Page 35:

Cross-Validation

▶ We want all the training data we can get.

▶ Reserving some of it is wasteful.

▶ Idea: split the data into pieces; each piece takes a turn as the validation set.

Page 36:

k-Fold Cross Validation

1. Split the data set into k pieces, S₁, …, S_k.

2. Loop k times; on iteration i:
   ▶ Use Sᵢ as the validation set; the rest as training.
   ▶ Compute the validation error εᵢ.

3. Overall error: (1/k) Σᵢ₌₁ᵏ εᵢ
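The three steps translate almost line for line into code. A minimal sketch, with a hypothetical `k_fold_error` helper and made-up data (degree-1 polynomial fits stand in for the model being evaluated):

```python
import numpy as np

def k_fold_error(x, y, degree, k):
    """k-fold CV: each piece S_i takes a turn as the validation set."""
    folds = np.array_split(np.arange(len(x)), k)          # S_1, ..., S_k
    errors = []
    for i in range(k):
        val = folds[i]                                    # S_i is held out
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        errors.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))
    return np.mean(errors)                                # (1/k) * sum of eps_i

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=50)
y = 3.0 + 5.0 * x + rng.normal(scale=0.5, size=50)
cv_error = k_fold_error(x, y, degree=1, k=5)
```

Setting `k = len(x)` in the same helper gives leave-one-out cross validation.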

Page 37:

Leave-One-Out Cross Validation

▶ Suppose we have n labeled data points.

▶ LOOCV: k-fold CV with k = n.

Page 38:

Another Approach

▶ We can control complexity by choosing the model.

▶ Also: via regularization.

Page 39:

Regularization

▶ Let's fit a complex model: w₀ + w₁x + … + w₁₀x¹⁰.

▶ But impose a budget on the weights, w₀, …, w_d.

[Figure: a scatter of the data, y against x; x axis 0.0–1.0, y axis roughly −2.5 to 17.5.]

Page 40:

Budgeting Weights

▶ One way to budget: ask that ‖w⃗‖₂ is small.

▶ Before: minimize

  R_sq(w⃗) = (1/n) Σᵢ₌₁ⁿ (w⃗ ⋅ x⃗⁽ⁱ⁾ − yᵢ)²

▶ Now: minimize

  R̃_sq(w⃗) = (1/n) Σᵢ₌₁ⁿ (w⃗ ⋅ x⃗⁽ⁱ⁾ − yᵢ)² + λ‖w⃗‖₂²

Page 41:

Solution

▶ The regularized normal equations:

  (Xᵀ X + λ I) w⃗ = Xᵀ y⃗
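The regularized system can be solved directly with a linear solver. A minimal numpy sketch on hypothetical data (the design matrix, true weights, and λ are all made up); note that setting λ = 0 recovers the ordinary normal equations, and that the penalty shrinks the weight vector:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))                       # hypothetical design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 0.1
d = X.shape[1]

# Solve (X^T X + lam * I) w = X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# For comparison: the unregularized (lam = 0) solution.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
```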

Page 42:

Example

λ = 0

[Figure: the unregularized degree-10 fit; x axis 0.0–1.0, y axis roughly −5 to 25.]

Page 43:

Example

λ = 1 × 10⁻⁴

[Figure: the regularized degree-10 fit; x axis 0.0–1.0, y axis roughly −5 to 25.]

Page 44:

Example

λ = 1

[Figure: the regularized degree-10 fit; x axis 0.0–1.0, y axis roughly −5 to 25.]

Page 45:

Regularization

▶ As λ increases, simpler models are preferred.

▶ Pick λ using cross-validation.

Page 46:

Other Penalizations

▶ ‖w⃗‖₂² is ℓ2 regularization (explicit solution)
   ▶ a.k.a. ridge regression

▶ ‖w⃗‖₁ is ℓ1 regularization (no explicit solution)
   ▶ a.k.a. the LASSO
   ▶ encourages sparse w⃗

Page 47:

[Figure: the logistic function s(y), plotted for y from roughly −4 to 4; s(y) runs from 0.0 to 1.0.]

Lecture 08 – Part 02: Logistic Regression

Page 48:

Note

▶ The midterm will cover everything up to right now.

▶ Regularization: yes.

▶ Logistic regression: no.

Page 49:

Predicting Heart Disease

▶ Classification problem: does a patient have heart disease?

▶ Features: blood pressure, cholesterol level, exercise amount, maximum heart rate, sex.

Page 50:

Better idea...

▶ Instead of predicting yes/no...

▶ Give a probability that they have heart disease.
   ▶ 1 = definitely yes
   ▶ 0 = definitely no
   ▶ 0.75 = probably yes
   ▶ …

Page 51:

Associations

▶ If cholesterol is high, increased likelihood.
   ▶ Positive association.

▶ If exercise is low, increased likelihood.
   ▶ Negative association.

Page 52:

The Model

▶ Measure cholesterol (x₁), exercise (x₂), etc.

▶ Idea: a weighted¹ "vote" for heart disease:

  w₁x₁ + w₂x₂ + … + w_d x_d

▶ Convention:
   ▶ A positive number = vote for yes
   ▶ A negative number = vote for no

¹We'll learn the weights later.

Page 53:

The Model

▶ Add a "bias" term:

  w₀ + w₁x₁ + w₂x₂ + … + w_d x_d = w⃗ ⋅ Aug(x⃗)

▶ The more positive w⃗ ⋅ Aug(x⃗), the more likely.

▶ The more negative w⃗ ⋅ Aug(x⃗), the less likely.
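A tiny sketch of the vote with an explicit augmentation helper; the `aug` function, the weights, and the feature values below are hypothetical, chosen only to make the arithmetic concrete:

```python
import numpy as np

def aug(x):
    """Aug(x): prepend a 1 so the bias w0 folds into the dot product."""
    return np.concatenate([[1.0], x])

w = np.array([-1.0, 0.8, -0.5])   # hypothetical weights: w0, w1, w2
x = np.array([2.0, 1.0])          # hypothetical features: x1, x2

vote = w @ aug(x)                 # w0 + w1*x1 + w2*x2
# vote > 0 leans toward "yes"; vote < 0 leans toward "no"
```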

Page 54:

Converting to a Probability

▶ Probabilities are between 0 and 1.

▶ Problem: w⃗ ⋅ Aug(x⃗) can be anything in (−∞, ∞).

▶ We need to convert it to a probability.

Page 55:

The Logistic Function

  σ(t) = 1 / (1 + e⁻ᵗ)

[Figure: σ(t) plotted for t from roughly −4 to 4, rising from 0.0 to 1.0.]
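The formula above is one line of code; a minimal sketch:

```python
import numpy as np

def sigmoid(t):
    """The logistic function sigma(t) = 1 / (1 + e^{-t}), mapping R into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

sigmoid(0.0)   # 0.5, the midpoint
```

Large negative inputs map near 0 and large positive inputs map near 1, which is exactly the squashing the previous slide asks for.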

Page 56:

The Model

▶ Our simplified model for the probability of heart disease:

  H(x⃗) = σ(w⃗ ⋅ Aug(x⃗))

▶ What should w⃗ be?

▶ To find w⃗, use the principle of maximum likelihood.

Page 57:

Maximum Likelihood

▶ Suppose you have an unfair coin.

▶ The probability of heads is p, unknown.

▶ Flip 8 times and see: H, H, T, H, H, H, H, T.

▶ Which is more likely: p = 0.5 or p = 0.75?

Page 58:

Maximum Likelihood

▶ Assume the coin flips are independent.

▶ The likelihood of H, H, T, H, H, H, H, T is:

  L(p) = p ⋅ p ⋅ (1 − p) ⋅ p ⋅ p ⋅ p ⋅ p ⋅ (1 − p) = p⁶(1 − p)²

▶ Idea: find p maximizing L(p).
   ▶ Equivalently, find p maximizing log L(p).

Page 59:

Maximum Likelihood

Find p maximizing log L(p) = log [p⁶(1 − p)²]:
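The slide leaves the maximization as a worked exercise. Setting the derivative of log L(p) = 6 log p + 2 log(1 − p) to zero gives 6/p − 2/(1 − p) = 0, i.e. p = 6/8 = 0.75, the observed fraction of heads. A quick numeric sanity check on a grid:

```python
import numpy as np

# Evaluate log L(p) = 6*log(p) + 2*log(1 - p) on a fine grid of p values.
p = np.linspace(0.01, 0.99, 9801)
log_L = 6 * np.log(p) + 2 * np.log(1 - p)

p_hat = p[np.argmax(log_L)]   # the maximizer: 6/8 = 0.75
```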

Page 60:

Maximum Likelihood

▶ In general, given n₁ observed heads and n₂ observed tails, maximize:

  log [P(F₁ = f₁) ⋅ P(F₂ = f₂) ⋯ P(Fₙ = fₙ)] = Σᵢ₌₁ⁿ log P(Fᵢ = fᵢ)

Page 61:

Back to Logistic Regression

▶ The probability that person i has heart disease:

  H(x⃗⁽ⁱ⁾) = σ(w⃗ ⋅ Aug(x⃗⁽ⁱ⁾))

▶ Gather a data set, (x⃗⁽¹⁾, y₁), …, (x⃗⁽ⁿ⁾, yₙ).

▶ What is the most likely w⃗?

Page 62:

Maximum Likelihood

▶ Suppose 3 people, (+, −, +).

▶ Likelihood:

  H(x⃗⁽¹⁾) ⋅ (1 − H(x⃗⁽²⁾)) ⋅ H(x⃗⁽³⁾)

  = σ(w⃗ ⋅ Aug(x⃗⁽¹⁾)) ⋅ (1 − σ(w⃗ ⋅ Aug(x⃗⁽²⁾))) ⋅ σ(w⃗ ⋅ Aug(x⃗⁽³⁾))

  = 1/(1 + e^(−w⃗⋅Aug(x⃗⁽¹⁾))) ⋅ (1 − 1/(1 + e^(−w⃗⋅Aug(x⃗⁽²⁾)))) ⋅ 1/(1 + e^(−w⃗⋅Aug(x⃗⁽³⁾)))

Page 63:

Observation

▶ Note:

  1 − 1/(1 + e⁻ᵗ) = 1/(1 + eᵗ)

Page 64:

Maximum Likelihood

▶ The likelihood:

  1/(1 + e^(−w⃗⋅Aug(x⃗⁽¹⁾))) ⋅ 1/(1 + e^(w⃗⋅Aug(x⃗⁽²⁾))) ⋅ 1/(1 + e^(−w⃗⋅Aug(x⃗⁽³⁾)))

▶ Suppose yᵢ = 1 if positive, yᵢ = −1 if negative. Then the likelihood is:

  1/(1 + e^(−y₁ w⃗⋅Aug(x⃗⁽¹⁾))) ⋅ 1/(1 + e^(−y₂ w⃗⋅Aug(x⃗⁽²⁾))) ⋅ 1/(1 + e^(−y₃ w⃗⋅Aug(x⃗⁽³⁾)))

Page 65:

Maximum Likelihood

▶ In general, the likelihood is:

  L(w⃗) = ∏ᵢ₌₁ⁿ 1/(1 + e^(−yᵢ w⃗⋅Aug(x⃗⁽ⁱ⁾)))

▶ The log likelihood is:

  log L(w⃗) = −Σᵢ₌₁ⁿ log [1 + e^(−yᵢ w⃗⋅Aug(x⃗⁽ⁱ⁾))]

Page 66:

Maximizing Likelihood

▶ Goal: find w⃗ maximizing log L.

▶ Take the gradient, set it to zero, solve?

▶ Problem: try it, and you'll get stuck.

▶ Unlike least-squares regression, there is no explicit solution.

Page 67:

Next Time

How to maximize the log likelihood (equivalently, minimize the log loss) with gradient descent.