50.579 Optimization for Machine Learning
Ioannis Panageas
ISTD, SUTD
L05 Non-convex Optimization: GD + noise converges to second-order stationarity
Recap
• This is only true in the unconstrained case!
Vanishing step-sizes
Definitions
Convergence to first order stationarity
Perturbed Gradient Descent
Analysis of Perturbed Gradient Descent
• High-level proof strategy:
1) When the current iterate is not an 𝜖-second-order stationary point, it must either (a) have a large gradient or (b) have a Hessian with a strictly negative eigenvalue.
2) We can show that both cases yield a significant decrease in function value within a controlled number of iterations.
3) Since the total decrease cannot be more than 𝑓(𝑥₀) − 𝑓(𝑥∗) (the global minimum value 𝑓(𝑥∗) is finite), we reach a contradiction.
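The counting argument behind step 3 can be sketched as follows. The symbols $T$ (iterations per phase) and $F$ (guaranteed decrease per phase) are placeholders introduced here for illustration, not constants taken from the slides, and the sketch ignores the small failure probability of the random perturbation.

```latex
% Assumption (from steps 1 and 2): whenever x_t is not an
% \epsilon-second-order stationary point, the function value drops
% by at least F > 0 within the next T iterations:
\[
  f(x_{t+T}) \le f(x_t) - F .
\]
% Suppose the iterates x_0, x_T, \dots, x_{(k-1)T} are all
% non-stationary. Chaining the inequality over the k phases gives
\[
  f(x^*) \le f(x_{kT}) \le f(x_0) - kF
  \quad\Longrightarrow\quad
  k \le \frac{f(x_0) - f(x^*)}{F} .
\]
% Hence after more than (f(x_0) - f(x^*)) / F phases, i.e. more than
% T (f(x_0) - f(x^*)) / F iterations, at least one phase must have
% started at an \epsilon-second-order stationary point.
```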
Conclusion
• Introduction to Non-convex Optimization.
– Perturbed Gradient Descent avoids strict saddles!
– Same is true for Perturbed SGD.
• Next lecture we will talk more about accelerated methods.
• In Week 8 we are going to talk about min-max optimization (GANs).
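The claim that Perturbed Gradient Descent avoids strict saddles can be illustrated on a toy problem. The sketch below is a minimal illustration, not the exact algorithm or constants analysed in the lecture. The test function, step size, gradient threshold `grad_tol`, perturbation `radius`, and waiting time `wait` are all assumptions chosen for this example. The function has a strict saddle at the origin and minima at (0, ±1).

```python
import numpy as np

def f(z):
    # Toy objective (assumed for illustration): strict saddle at (0, 0),
    # minima at (0, +1) and (0, -1) with value -0.25.
    x, y = z
    return 0.5 * x**2 - 0.5 * y**2 + 0.25 * y**4

def grad_f(z):
    x, y = z
    return np.array([x, -y + y**3])

def gradient_descent(z0, step=0.1, iters=500):
    # Plain GD: started exactly at a saddle, the gradient is zero
    # and the iterate never moves.
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z = z - step * grad_f(z)
    return z

def perturbed_gradient_descent(z0, step=0.1, iters=500, grad_tol=1e-3,
                               radius=1e-2, wait=20, seed=0):
    # GD plus noise: when the gradient is small and no perturbation was
    # added in the last `wait` iterations, add a point sampled uniformly
    # from a ball of the given radius, then continue with GD steps.
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    last_perturb = -wait
    for t in range(iters):
        if np.linalg.norm(grad_f(z)) <= grad_tol and t - last_perturb >= wait:
            d = rng.normal(size=z.shape)          # random direction
            d *= radius * rng.uniform() ** (1 / z.size) / np.linalg.norm(d)
            z = z + d
            last_perturb = t
        z = z - step * grad_f(z)
    return z

z_gd = gradient_descent([0.0, 0.0])
z_pgd = perturbed_gradient_descent([0.0, 0.0])
print("GD :", z_gd, "f =", f(z_gd))
print("PGD:", z_pgd, "f =", f(z_pgd))
```

Started at the saddle, plain GD stays put, while the perturbed variant drifts along the negative-curvature direction and settles near one of the two minima. Because the perturbation also fires near a minimum, the final iterate is only accurate up to roughly the perturbation radius.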
SUTD ISTD 50.004 Intro to Algorithms