TRANSCRIPT
- Slide 1
- Machine Learning Week 4 Lecture 1
- Slide 2
- Hand-In: Data is coming online later today. I keep a test set with approx. 1000 test images; that will be your real test. You are most welcome to add regularization as we discussed last week; it is not a requirement. Hand-In Version 4 available.
- Slide 3
- Recap: what is going on, and ways to fix it.
- Slide 4
- Overfitting: Data increases -> overfitting decreases. Noise increases -> overfitting increases. Target complexity increases -> overfitting increases.
- Slide 5
- Learning Theory Perspective: in-sample error + model complexity. Instead of picking a simpler hypothesis set, prefer simpler hypotheses h from H. Define what "simple" means in a complexity measure Omega(h), and minimize E_in(h) + Omega(h).
- Slide 6
- Regularization: in-sample error + model complexity. Weight decay: minimize E_in(w) + (lambda/N) w^T w. The gradient step becomes w <- w (1 - 2 eta lambda / N) - eta grad E_in(w): every round we take a step towards the zero vector.
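The weight-decay update on the slide can be sketched in code. This is a minimal illustration on a 1-D least-squares problem; the toy data, step size `eta`, and function names are my assumptions, not from the lecture.

```python
# Gradient descent with weight decay on a 1-D least-squares fit.
# Each round: shrink w toward zero, then take a gradient step.
def gd_weight_decay(xs, ys, lam=0.0, eta=0.05, steps=500):
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        # gradient of the in-sample squared error (1/n) * sum (w*x - y)^2
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # weight-decay update from the slide: w <- w(1 - 2*eta*lam/n) - eta*grad
        w = w * (1 - 2 * eta * lam / n) - eta * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                     # true slope is 2
w_plain = gd_weight_decay(xs, ys, lam=0.0)
w_decay = gd_weight_decay(xs, ys, lam=5.0)
# with decay, the learned weight is pulled toward the zero vector
```

Running both variants shows the regularized weight ending up strictly smaller than the unregularized one, which is the "step towards the zero vector" the slide describes.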
- Slide 7
- Why are small weights better? A practical perspective: because in practice we believe that noise is noisy. Stochastic noise is high frequency; deterministic noise is also non-smooth. Sometimes weights are weighed differently, and the bias term gets a free ride.
- Slide 8
- Regularization Summary: more art than science. Use VC and bias-variance as guides. Weight decay is a universal technique, based on the practical belief that noise is noisy (non-smooth). Question: which regularizer to use? Many other regularizers exist. Extremely important. Quote from the book: a "necessary evil".
- Slide 9
- Validation: regularization estimates the overfit penalty; validation estimates the out-of-sample error directly. Remember the test set.
- Slide 10
- Model Selection: t models m_1, ..., m_t. Which is better? Train each on D_train, validate on D_val, compute E_val(m_1), E_val(m_2), ..., E_val(m_t), and pick the minimum one. Use this to find lambda for my weight decay.
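The selection procedure on the slide can be sketched as follows. The candidate models are ridge fits with different weight decays lambda; the 1-D closed-form fit, the data, and the names `ridge_fit_1d`/`val_error` are illustrative assumptions, not from the lecture.

```python
# Model selection with a validation set: train each candidate on D_train,
# compute E_val on D_val, and pick the minimizer.
def ridge_fit_1d(data, lam):
    # closed-form minimizer of sum (w*x - y)^2 + lam * w^2
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def val_error(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

d_train = [(1.0, 1.9), (2.0, 4.3), (3.0, 5.8)]   # roughly slope 2, noisy
d_val = [(1.5, 3.0), (2.5, 5.1)]

lams = [0.0, 0.1, 1.0, 10.0]                      # candidate weight decays
errs = {lam: val_error(ridge_fit_1d(d_train, lam), d_val) for lam in lams}
best_lam = min(errs, key=errs.get)                # pick the minimum E_val
```

Note that after the choice is made, E_val(best_lam) is an optimistically biased estimate, which is why the slide says to remember the test set.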
- Slide 11
- Cross-Validation Dilemma: increasing the validation size K tightens the E_val estimate, but E_val itself increases (less data left for training). Small K vs. large K: we would like to have both. Cross-validation.
- Slide 12
- K-Fold Cross-Validation: split the data into N/K parts of size K. Train on all but one part; test on the remaining one. Pick the model that is best on average over the N/K partitions. Usual choice: K = N/10, i.e. 10 folds (we do not have all day).
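The splitting scheme above can be sketched directly. The "model" here (fit a constant, the mean of the training points) is a stand-in for a real learner, chosen only to keep the example self-contained; the names are assumptions.

```python
# Cross-validation as on the slide: split N points into N/K parts of
# size K, train on all but one part, test on the held-out part, average.
def cross_val_error(data, k):
    n = len(data)
    assert n % k == 0, "slide assumes N is a multiple of K"
    folds = [data[i:i + k] for i in range(0, n, k)]
    errs = []
    for i, held_out in enumerate(folds):
        train = [y for j, f in enumerate(folds) if j != i for y in f]
        w = sum(train) / len(train)               # "train": fit the mean
        errs.append(sum((w - y) ** 2 for y in held_out) / k)
    return sum(errs) / len(errs)                  # average over N/K partitions

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cv = cross_val_error(data, k=1)   # with N = 10, K = N/10 = 1 gives 10 folds
```

With K = 1 this is leave-one-out; larger K leaves less data for training in each round, which is the dilemma from the previous slide.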
- Slide 13
- Today: Support Vector Machines. Margins intuition; the optimization problem; convex optimization; Lagrange multipliers; Lagrange for SVM. WARNING: linear algebra and functional analysis coming up.
- Slide 14
- Support Vector Machines: today and next time.
- Slide 15
- Notation: the target y is in {-1,+1}. We write the parameters as w and b. The hyperplane we consider is w^T x + b = 0. Data D = {(x_i, y_i)}. For now, assume D is linearly separable.
- Primal Problem: if x is primal infeasible, then either g_i(x) > 0 for some i, and maximizing over lambda_i >= 0 makes lambda_i g_i(x) unbounded; or h_i(x) != 0 for some i, and maximizing over mu_i makes mu_i h_i(x) unbounded. (x is primal infeasible if g_i(x) > 0 for some i or h_i(x) != 0 for some i.)
- Slide 31
- If x is primal feasible: g_i(x) <= 0 for all i, so maximizing over lambda_i >= 0 gives the optimum lambda_i = 0; and h_i(x) = 0 for all i, so mu_i h_i(x) = 0 and the choice of mu_i is irrelevant.
- Slide 32
- Primal Problem: we made the constraints into a value in the optimization function: min over x of max over lambda >= 0, mu of L(x, lambda, mu) equals p*, attained at an optimal x. Which is what we are looking for!
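The feasible/infeasible case split above can be made concrete with a tiny numeric check. This is an illustrative sketch for a single inequality constraint g(x) <= 0; the grid of multipliers stands in for the true supremum and is my assumption.

```python
# Why the inner maximization encodes feasibility: for one constraint
# g(x) <= 0, sup over lam >= 0 of lam * g(x) is 0 when g(x) <= 0
# and grows without bound when g(x) > 0, so the primal min-max
# automatically rules out infeasible x.
def inner_sup(gx, lams=(0.0, 1.0, 10.0, 1e6)):
    return max(lam * gx for lam in lams)  # finite grid stand-in for the sup

feasible_value = inner_sup(-2.0)   # g(x) <= 0: optimum at lam = 0
infeasible_value = inner_sup(0.5)  # g(x) > 0: blows up with lam
```

The feasible case returns 0 (the constraint contributes nothing), while the infeasible case is only bounded by how large a multiplier we allow.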
- Slide 33
- Dual Problem: lambda, mu are dual feasible if lambda_i >= 0 for all i. This implies g(lambda, mu) = min over x of L(x, lambda, mu) <= p*.
- Slide 34
- Weak and Strong Duality: weak duality, d* <= p*, always holds. Question: when are they equal?
- Slide 35
- Strong Duality, Slater's Condition: if f and the g_i are convex, the h_i are affine, and the problem is strictly feasible, i.e. there exists a primal feasible x such that g_i(x) < 0 for all i, then d* = p* (strong duality). Assume that is the case.
- Slide 36
- Complementary Slackness: let x* be primal optimal and lambda*, mu* dual optimal with p* = d*. Each term lambda_i* g_i(x*) is non-positive, yet together they must sum to zero, so lambda_i* g_i(x*) = 0 for all i: complementary slackness.
- Slide 37
- Karush-Kuhn-Tucker (KKT) Conditions: let x* be primal optimal and lambda*, mu* dual optimal (p* = d*). Then: g_i(x*) <= 0 and h_i(x*) = 0 for all i (primal feasibility); lambda_i* >= 0 for all i (dual feasibility); lambda_i* g_i(x*) = 0 for all i (complementary slackness); and, since x* minimizes L(x, lambda*, mu*), the gradient of L at x* is zero (stationarity). The KKT conditions for optimality are necessary and sufficient.
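All four KKT conditions can be verified numerically on a toy problem. This is an illustrative example of my choosing: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0, whose optimum x* = 1 with multiplier lambda* = 2 follows from stationarity of the Lagrangian.

```python
# KKT check for: minimize x^2 subject to 1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x); stationarity gives
# 2x - lam = 0, and the active constraint gives x = 1, hence lam = 2.
x_star, lam_star = 1.0, 2.0

g = 1 - x_star                       # constraint value at the optimum
stationary = 2 * x_star - lam_star   # dL/dx at (x*, lam*)
checks = {
    "primal feasibility": g <= 0,
    "dual feasibility": lam_star >= 0,
    "complementary slackness": abs(lam_star * g) < 1e-12,
    "stationarity": abs(stationary) < 1e-12,
}
```

All four checks pass, and since the problem is convex with a strictly feasible point (e.g. x = 2), the KKT conditions certify that x* = 1 is the global optimum.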
- Slide 38
- Finally, back to SVM. Minimize (1/2) w^T w subject to y_i (w^T x_i + b) >= 1 for all i. Define the Lagrangian L(w, b, alpha) = (1/2) w^T w - sum_i alpha_i [y_i (w^T x_i + b) - 1] (no mu multipliers required: there are no equality constraints).
- Slide 39
- SVM Dual Form: we need to minimize L over w and b. Take the derivative with respect to w and solve for 0: grad_w L = w - sum_i alpha_i y_i x_i = 0, so w = sum_i alpha_i y_i x_i. w is a vector that is a specific linear combination of the input points.
- Slide 40
- SVM Dual Form: the derivative with respect to b is -sum_i alpha_i y_i, which must be 0. We get the constraint sum_i alpha_i y_i = 0.
- Slide 41
- SVM Dual Form: insert w = sum_i alpha_i y_i x_i into the Lagrangian above.
- Slide 42
- SVM Dual Form: the b term becomes -b sum_i alpha_i y_i, which vanishes by the constraint above.
- Slide 43
- SVM Dual Form: what remains is L(alpha) = sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j.
- Slide 44
- SVM Dual Problem: we found the minimum over w and b; now maximize over alpha. Maximize sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j subject to alpha_i >= 0 for all i and sum_i alpha_i y_i = 0. Remember w = sum_i alpha_i y_i x_i.
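The dual can be solved by hand on a tiny data set, which makes a good sanity check. This is an illustrative sketch of my own: two points x1 = (1, 0), y1 = +1 and x2 = (-1, 0), y2 = -1, where the equality constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2, so a scalar grid search suffices.

```python
# Brute-force solve of the SVM dual on a 2-point data set.
xs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [1.0, -1.0]

def dual(a):
    al = [a, a]  # alpha_1 = alpha_2 = a satisfies sum alpha_i y_i = 0
    quad = sum(al[i] * al[j] * ys[i] * ys[j]
               * (xs[i][0] * xs[j][0] + xs[i][1] * xs[j][1])
               for i in range(2) for j in range(2))
    return sum(al) - 0.5 * quad

# grid search over a in [0, 2]; here the dual reduces to 2a - 2a^2
a_star = max((a / 1000 for a in range(0, 2001)), key=dual)

# recover w = sum_i alpha_i y_i x_i, then b from an active constraint
w = [sum(a_star * ys[i] * xs[i][d] for i in range(2)) for d in range(2)]
b = 1.0 / ys[0] - (w[0] * xs[0][0] + w[1] * xs[0][1])
```

The maximizer is alpha = 1/2 for both points, giving w = (1, 0) and b = 0: the separating hyperplane x_1 = 0, midway between the two points, exactly as the margin intuition predicts.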
- Slide 45
- Intercept b*: recover b from an active constraint y_i (w^T x_i + b) = 1. Case y_i = +1: b* = 1 - w^T x_i. Case y_i = -1: b* = -1 - w^T x_i.
- Slide 46
- Making Predictions: predict the sign of w^T x + b = sum over the support vectors of alpha_i y_i x_i^T x + b.
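The prediction rule can be sketched directly in the dual form. The support vectors and multipliers below are the ones from the 2-point example (alpha = 0.5 for both points, b = 0), stated here as assumptions rather than lecture material.

```python
# Prediction from the dual solution: sign of
# sum over support vectors of alpha_i * y_i * (x_i . x)  +  b.
support = [((1.0, 0.0), 1.0, 0.5),    # (x_i, y_i, alpha_i)
           ((-1.0, 0.0), -1.0, 0.5)]
b = 0.0

def predict(x):
    s = sum(a * y * (sx[0] * x[0] + sx[1] * x[1]) for sx, y, a in support)
    return 1 if s + b >= 0 else -1
```

Only the points with alpha_i > 0 appear in the sum, so at prediction time the rest of the training set can be discarded.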
- Slide 47
- w and Complementary Slackness: alpha_i > 0 only for points with y_i (w^T x_i + b) = 1, i.e. the points on the margin. The support vectors are the vectors that support the plane.
- Slide 48
- SVM Summary: maximize sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j subject to alpha_i >= 0 and sum_i alpha_i y_i = 0. The support vectors (alpha_i > 0) determine w = sum_i alpha_i y_i x_i.