TRANSCRIPT
Noise Robustness of Optimization Algorithms via Statistical Queries
Cristóbal Guzmán ([email protected])
CWI Scientific Meeting, 27-11-2015
Motivation

• In algorithms: Faster is better, right?
• Sometimes, other objectives are as important
• This time, we care about robustness
Example: Gradient Methods

Goal: Find $f^* = \min_{x \in X} f(x)$, where $f : X \to \mathbb{R}$ is a smooth convex function and $X \subseteq \mathbb{R}^d$ is a compact convex set.

• Gradient Descent [Euler:1770]:
  $x_{t+1} = x_t - \gamma_t \nabla f(x_t) \;\Rightarrow\; f(x_T) - f^* = O(1/T)$
• Accelerated Gradient Method [Nesterov:1983]:
  $x_t = y_t - \gamma_t \nabla f(y_t), \quad y_{t+1} = x_t + \frac{\alpha_t - 1}{\alpha_{t+1}} (x_t - x_{t-1}) \;\Rightarrow\; f(x_T) - f^* = O(1/T^2)$
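The two rates above can be checked numerically. A minimal sketch, not from the talk: the diagonal quadratic test problem, the step size $1/L$, and the unconstrained setting $X = \mathbb{R}^d$ (so no projection step) are my own choices.

```python
import numpy as np

d, T = 50, 500
lam = np.logspace(-4, 0, d)   # eigenvalues of a diagonal quadratic
L = lam.max()                 # smoothness constant of f

def f(x):
    return 0.5 * np.sum(lam * x * x)   # f* = 0, attained at x = 0

def grad(x):
    return lam * x

x0 = np.ones(d)

# Gradient descent with constant step 1/L: f(x_T) - f* = O(1/T).
x = x0.copy()
for _ in range(T):
    x = x - grad(x) / L
gd_gap = f(x)

# Nesterov's accelerated method: f(x_T) - f* = O(1/T^2).
x, y, alpha = x0.copy(), x0.copy(), 1.0
for _ in range(T):
    x_new = y - grad(y) / L
    alpha_new = (1 + np.sqrt(1 + 4 * alpha**2)) / 2
    y = x_new + (alpha - 1) / alpha_new * (x_new - x)
    x, alpha = x_new, alpha_new
agd_gap = f(x)

print(gd_gap, agd_gap)  # the accelerated gap is the smaller one
```

On this ill-conditioned spectrum the accelerated method reaches a noticeably smaller optimality gap after the same number of iterations, matching the $O(1/T)$ vs. $O(1/T^2)$ rates.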
Example: Gradient Methods (noisy oracle)

[Plots comparing the two methods with exact and noisy gradients; source: Moritz Hardt's blog]

Exact oracle: $g(x) = \nabla f(x)$. Noisy oracle: $g(x) = \nabla f(x) + \xi$, with $\xi \sim \mathcal{N}(0, \sigma^2)$, $\sigma = 0.1$.
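A rough re-creation of the kind of experiment behind such plots; the test function, seed, and step sizes are my own assumptions, and only the noise model $\xi \sim \mathcal{N}(0, 0.1^2)$ comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, sigma = 50, 500, 0.1
lam = np.logspace(-4, 0, d)   # spectrum of a diagonal quadratic
L = lam.max()                 # smoothness constant

def f(x):
    return 0.5 * np.sum(lam * x * x)   # f* = 0

def noisy_grad(x):
    # g(x) = grad f(x) + xi, xi ~ N(0, sigma^2) in each coordinate
    return lam * x + sigma * rng.standard_normal(d)

x0 = np.ones(d)

# Gradient descent with the noisy oracle.
x = x0.copy()
for _ in range(T):
    x = x - noisy_grad(x) / L
gd_noisy = f(x)

# Nesterov's accelerated method with the same noisy oracle.
x, y, alpha = x0.copy(), x0.copy(), 1.0
for _ in range(T):
    x_new = y - noisy_grad(y) / L
    alpha_new = (1 + np.sqrt(1 + 4 * alpha**2)) / 2
    y = x_new + (alpha - 1) / alpha_new * (x_new - x)
    x, alpha = x_new, alpha_new
agd_noisy = f(x)

print(gd_noisy, agd_noisy)  # acceleration typically ends up with the larger gap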
Why Noise Robustness?

• There are other important reasons why acceleration could be harmful
• Overfitting, privacy leakage, etc.
• Goal: A theory of optimization algorithms that incorporates noise sensitivity

[Diagram: trade-offs between accuracy $\varepsilon$, iterations $T$, and data]
Statistical Queries

• Unknown distribution $D$ supported on $W$
• Algorithm has oracle access to $D$: for a query $\phi : W \to [-1, 1]$, the oracle returns $\mathbb{E}_{w \sim D}[\phi(w)] \pm \tau$
• An oracle query can be emulated by $1/\tau^2$ samples
• Goals:
  • Efficiency ~ small number of queries
  • Noise tolerance ~ as large as possible $\tau$
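The emulation claim can be sketched as follows. The toy distribution and the constant in the sample count are my own choices; Hoeffding's inequality gives $n = O(\log(1/\delta)/\tau^2)$ samples for confidence $1 - \delta$.

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_oracle(phi, sample, tau):
    """Answer E_{w~D}[phi(w)] within +-tau by averaging ~1/tau^2 samples."""
    n = int(np.ceil(8.0 / tau**2))   # extra constant factor for high confidence
    ws = sample(n)
    return float(np.mean(phi(ws)))

# Toy example: D = Uniform[-1, 1] on W = R, query phi(w) = w^2 in [-1, 1].
tau = 0.05
ans = sq_oracle(lambda w: w**2, lambda n: rng.uniform(-1.0, 1.0, n), tau)
true_val = 1.0 / 3.0   # E[w^2] for Uniform[-1, 1]
print(ans, abs(ans - true_val) <= tau)
```

The point of the model is that the algorithm never sees individual samples, only tolerance-$\tau$ answers, which is exactly what makes SQ algorithms robust to this kind of noise.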
Why Statistical Queries?

SQ algorithms have additional properties, or provide the basis for designing algorithms with additional features:

• Noise tolerance [Kearns:1994]
• Differential privacy [Blum, Dwork, McSherry, Nissim:2005], [Kasiviswanathan, Lee, Nissim, Raskhodnikova, Smith:2011]
• Distributed computation [Balcan, Blum, Fine, Mansour:2012]
• Generalization in adaptive data analysis [Dwork, Feldman, Hardt, Pitassi, Reingold, Roth:2014]
Statistical Query Convex Opt.

• Stochastic convex optimization: $\min_{x \in X} \mathbb{E}_{w \sim D}[f(x, w)]$
• Statistical queries:
  • Function value: $\phi(\cdot) = f(x, \cdot)$
  • Gradient: $\phi(\cdot) = \frac{\partial f(x, \cdot)}{\partial x_i}$, for $i = 1, \dots, d$
• Difficulty: estimation in 2-norm accumulates errors:
  $g_i = \mathbb{E}_{w \sim D}\!\left[\frac{\partial f(x, w)}{\partial x_i}\right] \pm \tau \;\Rightarrow\; \|\nabla \mathbb{E}_w[f(x, w)] - g\|_2 \approx \sqrt{d}\,\tau$
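A quick numerical check of the $\sqrt{d}\,\tau$ accumulation; the per-coordinate error models here are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
tau = 0.01
for d in [10, 100, 1000, 10000]:
    # Worst case: every coordinate estimate is off by the full tolerance tau.
    worst = np.linalg.norm(np.full(d, tau))
    # A more typical case: coordinate errors spread uniformly in [-tau, tau].
    typical = np.linalg.norm(rng.uniform(-tau, tau, d))
    print(d, worst, typical)   # worst grows exactly as sqrt(d) * tau
```

Even in the benign uniform model the $\ell_2$ error still grows like $\sqrt{d/3}\,\tau$, which is why coordinate-wise estimation alone cannot give dimension-independent gradient accuracy.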
Optimal Estimation Algorithm [Feldman, G., Vempala:2015]

Q: Is it possible to avoid noise accumulation in gradient estimation?

[Diagram: distribution $D$ on $W$]

With high probability, $\|\nabla \mathbb{E}_w[f(x, w)] - g\|_2 \approx O(\sqrt{\log d})\,\tau$
Further Consequences [Feldman, G., Vempala:2015]

New statistical query algorithms for optimization and machine learning:

• Gradient methods: mirror-descent, accelerated method, strongly convex minimization
• Polynomial-time algorithms: center of gravity, interior-point method
• Improved algorithms for high-dimensional classification (Perceptron) and regression
• Improved differentially-private convex optimization algorithms

We provide new structural lower bounds for convex optimization.
Thank you