Mini-Course 3: Convergence Analysis of Neural …

Optimization
- In practice, SGD always finds good local minima.
- SGD: stochastic gradient descent
- $x_{t+1} = x_t - \eta g_t$, with $\mathbb{E}[g_t] = \nabla f(x_t)$
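A minimal sketch of the SGD update, assuming a hypothetical one-dimensional toy objective $f(x) = \frac{1}{n}\sum_i \tfrac{1}{2}(x - a_i)^2$ chosen only for illustration. Sampling one data point uniformly gives $g_t = x_t - a_i$, whose expectation is exactly $\nabla f(x_t)$, so it satisfies the unbiasedness condition on the slide. The function names `sgd` and `make_mean_objective`, the constant step size `eta`, and the data are assumptions, not taken from the course.

```python
import random
from typing import Callable, List, Sequence


def sgd(
    x0: float,
    stochastic_grad: Callable[[float, random.Random], float],
    eta: float,
    steps: int,
    seed: int = 0,
) -> float:
    """Run x_{t+1} = x_t - eta * g_t with a constant (assumed) step size eta."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    x = x0
    for _ in range(steps):
        g = stochastic_grad(x, rng)  # noisy estimate; E[g] should equal f'(x)
        x = x - eta * g
    return x


def make_mean_objective(data: Sequence[float]) -> Callable[[float, random.Random], float]:
    """Hypothetical toy objective f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2.

    f is minimized at mean(data). Drawing one index i uniformly gives
    g = x - a_i, and E[g] = x - mean(data) = f'(x), so g is unbiased.
    """
    def grad(x: float, rng: random.Random) -> float:
        a_i = data[rng.randrange(len(data))]  # sample one data point
        return x - a_i
    return grad


data: List[float] = [1.0, 2.0, 3.0, 4.0, 5.0]
x_final = sgd(x0=10.0, stochastic_grad=make_mean_objective(data), eta=0.05, steps=2000)
print(x_final)  # close to, but generally not exactly, mean(data) = 3.0
```

With a constant step size the iterate keeps fluctuating around the minimizer because each $g_t$ is noisy; shrinking $\eta$ over time (for example proportionally to $1/t$) is the usual way to make that noise die out.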