TRANSCRIPT
Blind online optimization: Gradient descent without a gradient
Abie Flaxman CMU
Adam Tauman Kalai TTI
Brendan McMahan CMU
Standard convex optimization
Convex feasible set S ⊆ ℝ^d
Concave function f : S → ℝ
Goal: find x with f(x) ≥ max_{z∈S} f(z) − ε = f(x*) − ε
[Figure: feasible set S in ℝ^d with maximizer x*]
Steepest ascent
• Move in the direction of steepest ascent
• Compute f′(x) (∇f(x) in higher dimensions)
• Works for convex optimization
(and many other problems)
[Figure: iterates x1, x2, x3, x4 ascending the function]
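To make the update concrete, here is a minimal sketch of steepest ascent in code (the test function, step size, and iteration count are illustrative choices, not from the talk):

```python
import numpy as np

def gradient_ascent(grad_f, x0, eta=0.1, steps=100):
    """Repeatedly move in the direction of steepest ascent."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + eta * grad_f(x)  # step along the gradient
    return x

# Maximize the concave f(x) = -(x - 3)^2, whose gradient is -2(x - 3).
print(gradient_ascent(lambda x: -2 * (x - 3), x0=[0.0]))  # ~[3.0]
```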
Typical application
• Company produces certain numbers of cars per month
• Vector x ∈ ℝ^d (#Corollas, #Camrys, …)
• Profit of company is concave function of production vector
• Maximize total (equivalently, average) profit
PROBLEMS
• Sequence of unknown concave functions f_1, f_2, …
• Period t: pick x_t ∈ S, find out only f_t(x_t)
• Feasible set S convex
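A sketch of this interaction protocol, with a hypothetical learner interface (the names `pick_point` and `update` are mine, introduced for illustration):

```python
import numpy as np

class FixedPointLearner:
    """Trivial baseline learner: always plays the same point."""
    def __init__(self, x):
        self.x = np.asarray(x, dtype=float)
    def pick_point(self):
        return self.x          # choose x_t in S
    def update(self, value):
        pass                   # a real learner would use the observed f_t(x_t)

def run_protocol(learner, fs):
    """fs: unknown concave functions f_1, ..., f_T; only f_t(x_t) is revealed."""
    total = 0.0
    for f_t in fs:
        x_t = learner.pick_point()
        value = f_t(x_t)       # the ONLY feedback each period
        learner.update(value)
        total += value
    return total
```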
Problem definition and results
Theorem: the algorithm's expected regret against the best fixed point x*, E[Σ_t f_t(x*) − Σ_t f_t(x_t)], grows sublinearly in the number of periods T.
Online model
• Holds for arbitrary sequences f_1, f_2, …
• Stronger than the stochastic model, where f_1, f_2, … are i.i.d. from a distribution D and x* = arg max_{x∈S} E_D[f(x)]
Outline
• Problem definition
• Simple algorithm
• Analysis sketch
• Variations
• Related work & applications
First try
[Figure: PROFIT vs. #CAMRYS; each period t we pick x_t and observe only the value f_t(x_t), while the unknown functions f_1, f_2, f_3, f_4 change from period to period]
Zinkevich ’03:
If we could only compute gradients…
Idea: one-point gradient estimate
[Figure: PROFIT vs. #CAMRYS, with evaluation points x − δ and x + δ around x]
With probability ½, estimate = f(x + δ)/δ
With probability ½, estimate = −f(x − δ)/δ
E[estimate] = (f(x + δ) − f(x − δ)) / 2δ ≈ f′(x)
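A quick numerical check of the one-point estimate (the test function and δ are my choices):

```python
import numpy as np

def one_point_estimate(f, x, delta, rng):
    """One evaluation per period: +f(x+δ)/δ or -f(x-δ)/δ, each with prob. 1/2."""
    if rng.random() < 0.5:
        return f(x + delta) / delta
    return -f(x - delta) / delta

rng = np.random.default_rng(0)
f = lambda x: -(x - 3) ** 2    # f'(2) = 2
avg = np.mean([one_point_estimate(f, 2.0, 0.1, rng) for _ in range(200000)])
print(avg)  # ~2.0: matches the central difference (f(2.1) - f(1.9)) / 0.2
```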
d-dimensional online algorithm
[Figure: iterates x_1, x_2, x_3, x_4 moving inside the convex feasible set S]
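Putting the pieces together, a minimal sketch of the d-dimensional algorithm under my own illustrative choices (S is a Euclidean ball, and the step size, δ, and test functions are made up for the demo): sample a uniform unit vector u, evaluate f_t once at x_t + δu, step along (d/δ)·f_t(x_t + δu)·u, and project back so the next sample stays feasible.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto a ball: our illustrative feasible set S."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def bandit_gradient_ascent(fs, d, radius, delta, eta, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    values = []
    for f_t in fs:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # uniform on the unit sphere
        y = x + delta * u                      # the single evaluation point
        g = (d / delta) * f_t(y) * u           # one-point gradient estimate
        x = project_ball(x + eta * g, radius - delta)  # keep x_t + δu inside S
        values.append(f_t(y))
    return values

# Demo: repeated concave f(x) = -|x - c|^2 with optimum c inside S.
c = np.array([0.5, -0.3])
vals = bandit_gradient_ascent([lambda x: -np.sum((x - c) ** 2)] * 2000,
                              d=2, radius=1.0, delta=0.1, eta=0.01)
print(np.mean(vals[-100:]))  # near the optimal value 0, up to exploration noise
```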
Outline
• Problem definition
• Simple algorithm
• Analysis sketch
• Variations
• Related work & applications
Analysis ingredients
• E[1-point estimate] is the gradient of a smoothed function f̂(x) = E_v[f(x + δv)], v uniform in the unit ball
• f̂ − f is small (|f̂(x) − f(x)| → 0 as δ → 0)
• Online gradient ascent analysis [Z03]
• Online expected gradient ascent analysis
• (Hidden complications)
1-pt gradient analysis
[Figure: PROFIT vs. #CAMRYS, with evaluation points x − δ and x + δ]
1-pt gradient analysis (d-dim)
• Estimate = (d/δ) f(x + δu) u, for u uniform on the unit sphere
• E[1-point estimate] is the gradient of the smoothed f̂(x) = E_v[f(x + δv)], v uniform in the unit ball
• |f̂(x) − f(x)| is small for small δ
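A Monte Carlo check of the first ingredient (the quadratic test function, x, and δ are my choices; for a quadratic, ∇f̂ happens to coincide with ∇f):

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta = 3, 0.05
A = np.diag([1.0, 2.0, 3.0])
f = lambda x: -x @ A @ x                  # concave, gradient -2 A x
x = np.array([0.4, -0.2, 0.1])

def estimate():
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                # uniform unit vector
    return (d / delta) * f(x + delta * u) * u

avg = np.mean([estimate() for _ in range(500000)], axis=0)
print(avg)  # ~[-0.8, 0.8, -0.6] = -2 A x, the true gradient
```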
Online gradient ascent [Z03]
• Update: x_{t+1} = projection onto S of x_t + η ∇f_t(x_t)
• Regret: Σ_t f_t(x*) − Σ_t f_t(x_t) = O(√T)
(concave, bounded gradient)
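For contrast, a sketch of the full-information [Z03] update (the ball-shaped S and fixed step size are illustrative):

```python
import numpy as np

def online_gradient_ascent(grads, x0, radius, eta):
    """With true gradients revealed: x <- project_S(x + eta * grad f_t(x))."""
    x = np.asarray(x0, dtype=float)
    plays = []
    for grad_t in grads:
        plays.append(x.copy())
        x = x + eta * grad_t(x)            # ascent step on the revealed gradient
        n = np.linalg.norm(x)
        if n > radius:                     # Euclidean projection back into S
            x *= radius / n
    return plays
```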
Expected gradient ascent analysis
• In expectation, gradient ascent with unbiased random estimates g_t behaves like deterministic gradient ascent on the smoothed functions, so the [Z03] bound carries over to expected regret
(concave, bounded gradient)
Hidden complication…
• The evaluation point x_t + δu_t must itself lie in S, so the algorithm must stay on a slightly shrunken set S′
• Thin sets are bad: shrinking a thin S can move us far from the optimum
• Round sets are good
• …reshape S into “isotropic position” [LV03]
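One way to see the fix in code, as a sketch (the guarantee in the comment follows from convexity of S; the function name is mine):

```python
import numpy as np

def shrink_toward_center(x, center, delta, r):
    """If a ball B(center, r) lies inside S, convexity gives
    (1 - delta/r) * x + (delta/r) * B(center, r) ⊆ S for any x in S,
    i.e. a ball of radius delta around the shrunken point stays feasible.
    A thin S (tiny r) forces a large shift, hence reshaping to isotropic position."""
    lam = delta / r
    return (1 - lam) * np.asarray(x) + lam * np.asarray(center)
```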
Outline
• Problem definition
• Simple algorithm
• Analysis sketch
• Variations
• Related work & applications
Variations
• Regret bound scales with the diameter of S and a bound on the gradients of the f_t
• Works against an adaptive adversary, which chooses f_t knowing x_1, x_2, …, x_{t−1}
• Also works if we only get a noisy estimate of f_t(x_t), i.e. E[h_t(x_t) | x_t] = f_t(x_t) (the one-point estimate is linear in the observed value, so it stays unbiased)
Related convex optimization

Regular (single f):
– Sighted (see entire function): gradient descent, …
– Blind (evaluations only): ellipsoid, random walk [BV02], simulated annealing [KV05], finite difference

Stochastic (dist. over f’s or dist. over errors):
– Sighted: gradient descent (stochastic)
– Blind: finite difference [G89,S97]

Online (f1, f2, f3, …):
– Sighted: gradient descent (online) [Z03]
– Blind: 1-pt. gradient appx. [BKM04], finite difference [Kleinberg04]
Multi-armed bandit (experts)
[Figure: slot machines with payoffs 1, 0, 0, 0; feasible set S]
[R52,ACFS95,…]
Driving to work (online routing)
Exponentially many paths…
Exponentially many slot machines?
Finite dimensions
Exploration/exploitation tradeoff
[TW02,KV02,AK04,BM04]
Online product design
Conclusions and future work
• Can “learn” to optimize a sequence of unrelated functions from evaluations
• Answer to: “What is the sound of one hand clapping?”
• Applications: cholesterol, paper airplanes, advertising
• Future work: many players using the same algorithm (game theory)