stochastic optimization for large-scale optimal transport (ot) · 2018. 3. 27. · stochastic...
Stochastic Optimization for Large-scale Optimal Transport (OT)
Sixiong You
Mechanical and Aerospace Engineering Department
I. Introduction
• Motivation
  • Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way.
  • Previous methods are purely discrete and cannot cope with continuous densities; the only known class of methods that can overcome this limitation is the so-called semi-discrete solvers.
  • In addition, the practical impact of OT is still limited by its computational burden.
• This paper proposes a new class of stochastic optimization algorithms to cope with large-scale OT problems.
I. Introduction
• This paper introduces three kinds of stochastic optimization methods to cope with three possible settings:
  • Discrete OT, comparing a discrete measure with another discrete measure: stochastic averaged gradient (SAG) method
  • Semi-discrete OT, comparing a discrete measure with a continuous measure: averaged stochastic gradient descent (SGD)
  • Continuous OT, comparing a continuous measure with another continuous measure: kernel SGD, which makes use of an expansion of the dual variables in a reproducing kernel Hilbert space (RKHS)
II. Problem Formulation
The definition of joint probability measures
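For reference, a standard way to write the set of joint probability measures (couplings) with marginals μ on 𝒳 and ν on 𝒴 is given below; the symbols Π(μ, ν) and ℳ₊¹ (probability measures) are notation assumed here. In the discrete case, μ ∈ ℝⁿ and ν ∈ ℝᵐ are weight vectors and a coupling is a nonnegative n × m matrix whose row sums give μ and whose column sums give ν.

```latex
\Pi(\mu,\nu) = \Big\{ \pi \in \mathcal{M}_+^1(\mathcal{X}\times\mathcal{Y}) :\;
  \pi(A\times\mathcal{Y}) = \mu(A),\ \pi(\mathcal{X}\times B) = \nu(B)
  \ \text{for all measurable } A \subseteq \mathcal{X},\ B \subseteq \mathcal{Y} \Big\}

% discrete case
\Pi(\mu,\nu) = \big\{ \pi \in \mathbb{R}_+^{n\times m} : \pi\,\mathbf{1}_m = \mu,\ \pi^{\top}\mathbf{1}_n = \nu \big\}
```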
II. Problem Formulation
The definition of Kullback-Leibler divergence
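A minimal sketch of the discrete Kullback-Leibler divergence KL(π | ξ) = Σ_ij π_ij log(π_ij / ξ_ij), assuming π and ξ are nonnegative matrices of the same shape (for OT, ξ is typically the product measure μ ⊗ ν). The function names are hypothetical. Some entropic-OT formulations write KL(π | ξ) = ∫ (log(dπ/dξ) − 1) dπ instead; for a probability measure π this only subtracts the constant 1, so it does not change the minimizer of a regularized problem.

```python
import math

def product_measure(mu, nu):
    """Product of two discrete weight vectors mu and nu, returned as a matrix."""
    return [[mi * nj for nj in nu] for mi in mu]

def kl_divergence(pi, xi):
    """Discrete KL(pi | xi) = sum_ij pi_ij * log(pi_ij / xi_ij).

    Uses the convention 0 * log(0 / q) = 0 and returns +inf when pi puts
    mass where xi has none (pi not absolutely continuous w.r.t. xi).
    """
    total = 0.0
    for pi_row, xi_row in zip(pi, xi):
        for p, q in zip(pi_row, xi_row):
            if p == 0.0:
                continue  # zero mass contributes nothing
            if q == 0.0:
                return math.inf
            total += p * math.log(p / q)
    return total
```

For example, with μ = ν = [0.5, 0.5], the independent coupling μ ⊗ ν has KL 0 with respect to itself, while the diagonal coupling [[0.5, 0], [0, 0.5]] has KL equal to log 2.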
II. Problem Formulation
The Kantorovich formulation of OT and its entropic regularization can be written as a single convex optimization problem, in which c(x, y) is the cost of moving a unit of mass from x to y.
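A common way to write this single problem, assuming Π(μ, ν) denotes the set of couplings (joint probability measures with marginals μ and ν) and the product measure μ ⊗ ν serves as the reference measure of the KL term; setting ε = 0 recovers the unregularized Kantorovich problem.

```latex
W_\varepsilon(\mu,\nu) = \min_{\pi\in\Pi(\mu,\nu)}
  \int_{\mathcal{X}\times\mathcal{Y}} c(x,y)\,\mathrm{d}\pi(x,y)
  + \varepsilon\,\mathrm{KL}(\pi \,|\, \mu\otimes\nu),
  \qquad \varepsilon \ge 0 \qquad (\mathcal{P}_\varepsilon)
```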
II. Problem Formulation
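Fenchel-Rockafellar duality theorem
For $\varepsilon > 0$, solving $\partial F_\varepsilon(u, v) / \partial u = 0$ gives $u = v^{c,\varepsilon}$; plugging this expression back into $(\mathcal{D}_\varepsilon)$ eliminates $u$ and leaves the semi-dual problem, a maximization over $v$ alone.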
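A sketch of the derivation, assuming KL is taken in the form ∫ (log(dπ/dξ) − 1) dπ with ξ = μ ⊗ ν, and with u and v the dual potentials on 𝒳 and 𝒴. Once u = v^{c,ε}, the exponential term integrates to 1 in y for every x, which produces the constant −ε in h_ε. With the plain KL (no −1), F_ε and h_ε both gain the constant +ε and the maximizers are unchanged.

```latex
(\mathcal{D}_\varepsilon)\quad \max_{u,v}\ F_\varepsilon(u,v)
  = \int_{\mathcal{X}} u\,\mathrm{d}\mu + \int_{\mathcal{Y}} v\,\mathrm{d}\nu
  - \varepsilon \int_{\mathcal{X}\times\mathcal{Y}}
    \exp\!\Big(\frac{u(x)+v(y)-c(x,y)}{\varepsilon}\Big)\,\mathrm{d}\mu(x)\,\mathrm{d}\nu(y)

\frac{\partial F_\varepsilon}{\partial u} = 0 \;\Longrightarrow\;
  u(x) = v^{c,\varepsilon}(x)
  := -\varepsilon \log \int_{\mathcal{Y}} \exp\!\Big(\frac{v(y)-c(x,y)}{\varepsilon}\Big)\,\mathrm{d}\nu(y)

(\mathcal{S}_\varepsilon)\quad \max_{v}\ \mathbb{E}_{X\sim\mu}\big[h_\varepsilon(X,v)\big],
  \qquad h_\varepsilon(x,v) = \int_{\mathcal{Y}} v\,\mathrm{d}\nu + v^{c,\varepsilon}(x) - \varepsilon
```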
II. Problem Formulation
III. Discrete Optimal Transport
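Stochastic gradient descent (SGD): the discrete semi-dual objective is a weighted sum of terms, one per source point x_i, so the gradient of a single randomly sampled term can be used as a proxy for the full gradient in a standard gradient ascent step to maximize that objective.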
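A minimal sketch of this step in the discrete case, assuming μ = Σ_i μ_i δ_{x_i}, ν = Σ_j ν_j δ_{y_j} with every ν_j > 0, a precomputed cost matrix cost[i][j] = c(x_i, y_j), and the semi-dual term h_ε(x_i, v) = Σ_j v_j ν_j + v^{c,ε}(x_i) − ε with v^{c,ε}(x_i) = −ε log Σ_j ν_j exp((v_j − c(x_i, y_j))/ε). All function names are hypothetical. The gradient of one term is ν − χ_i(v), where χ_i is a softmax over j; sampling i with probability μ_i makes it an unbiased estimate of the full gradient.

```python
import math
import random

def _logits(v, costs_i, nu, eps):
    # z_j = (v_j - c(x_i, y_j)) / eps + log(nu_j); assumes every nu_j > 0
    return [(vj - cj) / eps + math.log(nj) for vj, cj, nj in zip(v, costs_i, nu)]

def c_eps_transform(v, costs_i, nu, eps):
    """v^{c,eps}(x_i) = -eps * log sum_j nu_j * exp((v_j - c(x_i, y_j)) / eps)."""
    z = _logits(v, costs_i, nu, eps)
    z_max = max(z)  # log-sum-exp shift for numerical stability
    return -eps * (z_max + math.log(sum(math.exp(zj - z_max) for zj in z)))

def h_eps(v, costs_i, nu, eps):
    """One term of the discrete semi-dual: <v, nu> + v^{c,eps}(x_i) - eps."""
    return sum(vj * nj for vj, nj in zip(v, nu)) + c_eps_transform(v, costs_i, nu, eps) - eps

def grad_h_eps(v, costs_i, nu, eps):
    """Gradient of h_eps w.r.t. v: nu_j - chi_j, where chi = softmax(z)."""
    z = _logits(v, costs_i, nu, eps)
    z_max = max(z)
    w = [math.exp(zj - z_max) for zj in z]
    s = sum(w)
    return [nj - wj / s for nj, wj in zip(nu, w)]

def sgd_ascent_step(v, cost, mu, nu, eps, step, rng):
    """Sample one source point i ~ mu and take an ascent step on that term only."""
    i = rng.choices(range(len(mu)), weights=mu)[0]
    g = grad_h_eps(v, cost[i], nu, eps)
    return [vj + step * gj for vj, gj in zip(v, g)]
```

Because the entries of ν and of χ_i(v) both sum to 1, each stochastic gradient sums to zero, so adding a constant to all v_j leaves the objective unchanged.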
III. Discrete Optimal Transport
Stochastic gradient descent (SGD)
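Example: consider the optimization problem
$$\min_{\omega} Q(\omega) = \min_{\omega} \frac{1}{n} \sum_{i=1}^{n} Q_i(\omega),$$
where the objective function Q(ω) is the average of n summand functions Q_i(ω).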
When used to minimize the above function, a standard (or "batch") gradient descent method performs the following iterations:
$$\omega := \omega - \eta \nabla Q(\omega) = \omega - \eta \sum_{i=1}^{n} \nabla Q_i(\omega) / n$$
where η is a step size (learning rate). However, evaluating the sum-gradient may require expensive evaluations of the gradients of all summand functions. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of summand functions at every step. This is very effective for large-scale machine learning problems, where n is large.
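A minimal sketch of the batch iteration, assuming each summand's gradient is available as a Python callable; the function name and the toy summands Q_i(ω) = (ω − a_i)²/2 (gradient ω − a_i, minimizer the mean of the a_i) are hypothetical. Every iteration calls all n gradients, which is exactly the cost the stochastic variants avoid.

```python
def batch_gradient_descent(grads, w0, step, num_iters):
    """Minimize Q(w) = (1/n) * sum_i Q_i(w); grads[i](w) returns the gradient of Q_i at w."""
    n = len(grads)
    w = w0
    for _ in range(num_iters):
        full_grad = sum(g(w) for g in grads) / n  # touches all n summands every iteration
        w = w - step * full_grad
    return w
```

With a = [1, 2, 3, 4, 5] and grads = [lambda w, ai=ai: w - ai for ai in a], calling batch_gradient_descent(grads, 0.0, 0.5, 100) returns 3.0 up to floating-point error, the mean of a.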
III. Discrete Optimal Transport
Stochastic gradient descent (SGD)
In stochastic (or "on-line") gradient descent, the true gradient of Q(ω) is approximated by the gradient at a single example:
$$\omega := \omega - \eta \nabla Q_i(\omega)$$
In pseudocode, stochastic gradient descent can be presented as follows:
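A minimal runnable sketch, assuming the common form of the pseudocode: choose an initial ω and a step size η, then repeatedly shuffle the examples and apply ω := ω − η∇Q_i(ω) for each i in turn. The function name, the fixed number of epochs, and the toy summands Q_i(ω) = (ω − a_i)²/2 are hypothetical.

```python
import random

def stochastic_gradient_descent(grads, w0, step, num_epochs, seed=0):
    """SGD with a constant step; grads[i](w) returns the gradient of Q_i at w."""
    rng = random.Random(seed)
    order = list(range(len(grads)))
    w = w0
    for _ in range(num_epochs):          # "repeat until an approximate minimum is obtained"
        rng.shuffle(order)               # randomly shuffle the examples each epoch
        for i in order:
            w = w - step * grads[i](w)   # gradient of a single summand Q_i
    return w
```

With a constant step, the iterate ends near, but not exactly at, the minimizer; a smaller η reduces that residual error at the price of slower progress.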
III. Discrete Optimal Transport
Averaged stochastic gradient descent (averaged SGD)
Averaged SGD, invented independently by Ruppert and Polyak in the late 1980s, is ordinary stochastic gradient descent that records an average of its parameter vector over time. That is, the update is the same as for ordinary stochastic gradient descent, but the algorithm also keeps track of
$$\bar{\omega} = \frac{1}{t} \sum_{i=0}^{t-1} \omega_i .$$
When optimization is done, this averaged parameter vector takes the place of ω.
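A minimal sketch of averaged SGD, assuming a constant step, uniform sampling of one summand per step, and the same hypothetical toy summands Q_i(ω) = (ω − a_i)²/2; the function name is also hypothetical. The running mean is updated incrementally so that after t steps it equals (1/t) Σ_{i=0}^{t−1} ω_i without storing the iterates.

```python
import random

def averaged_sgd(grads, w0, step, num_steps, seed=0):
    """Plain SGD with a constant step, plus the Polyak-Ruppert running average.

    Returns (w_last, w_bar), where w_bar = (1/t) * sum_{i=0}^{t-1} w_i with t = num_steps.
    """
    rng = random.Random(seed)
    w = w0
    w_bar = 0.0
    for t in range(1, num_steps + 1):
        w_bar += (w - w_bar) / t          # fold iterate w_{t-1} into the running mean
        i = rng.randrange(len(grads))     # sample one summand uniformly (with replacement)
        w = w - step * grads[i](w)        # ordinary SGD update, unchanged
    return w, w_bar
```

On the toy problem, the last iterate keeps fluctuating around the minimizer because of the constant step, while the averaged iterate settles much closer to it.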
Stochastic averaged gradient (SAG): the stochastic average gradient method with a (user-supplied) constant step size.
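A minimal sketch of SAG on a finite sum Q(ω) = (1/n) Σ_i Q_i(ω): it keeps the most recently computed gradient of every summand (initialized to 0), refreshes one of them per iteration, and steps along the average of all stored gradients with a constant step. The function name, the step 1/16, and the toy summands Q_i(ω) = (ω − a_i)²/2 are hypothetical.

```python
import random

def sag(grads, w0, step, num_iters, seed=0):
    """Stochastic average gradient with a constant step; grads[i](w) is the gradient of Q_i."""
    rng = random.Random(seed)
    n = len(grads)
    w = w0
    memory = [0.0] * n   # last gradient computed for each summand, initialized to 0
    total = 0.0          # running sum of the stored gradients
    for _ in range(num_iters):
        i = rng.randrange(n)            # sample one summand uniformly
        g_new = grads[i](w)
        total += g_new - memory[i]      # swap the old stored gradient for the new one
        memory[i] = g_new
        w = w - (step / n) * total      # step along the average of the stored gradients
    return w
```

Unlike SGD with a constant step, SAG reaches the exact minimizer of this strongly convex toy problem, at the price of storing one gradient per summand.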
III. Discrete Optimal Transport
Figure: The flow chart of SAG for discrete OT (steps: step size; initial g_i = 0; gradient update; output).
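A minimal sketch of SAG on the discrete semi-dual Σ_i μ_i h_ε(x_i, v), following the steps of the flow chart: a constant step size C, stored gradients g_i initialized to 0, one refreshed gradient per iteration, and the dual vector v as output. It assumes h_ε(x_i, v) = Σ_j v_j ν_j + v^{c,ε}(x_i) − ε, so that ∇_v h_ε(x_i, v) = ν − χ_i(v) with χ_i a softmax over j, and it assumes i is sampled uniformly. All names are hypothetical; the helper `coupling_from_dual` recovers the coupling π_ij = μ_i χ_ij(v), whose column sums should approach ν at convergence.

```python
import math
import random

def grad_h_eps(v, costs_i, nu, eps):
    """Gradient of h_eps(x_i, .) at v: nu_j - chi_ij, chi_i = softmax_j((v_j - c_ij)/eps + log nu_j)."""
    z = [(vj - cj) / eps + math.log(nj) for vj, cj, nj in zip(v, costs_i, nu)]
    z_max = max(z)  # log-sum-exp shift for numerical stability
    w = [math.exp(zj - z_max) for zj in z]
    s = sum(w)
    return [nj - wj / s for nj, wj in zip(nu, w)]

def sag_discrete_ot(cost, mu, nu, eps, step, num_iters, seed=0):
    """SAG ascent on sum_i mu_i * h_eps(x_i, v); returns the dual vector v."""
    rng = random.Random(seed)
    n, m = len(mu), len(nu)
    v = [0.0] * m
    g = [[0.0] * m for _ in range(n)]  # stored gradient for each i, initialized to 0
    d = [0.0] * m                      # d = sum_i g_i
    for _ in range(num_iters):
        i = rng.randrange(n)           # uniform sample of one source point
        g_new = [mu[i] * gj for gj in grad_h_eps(v, cost[i], nu, eps)]
        d = [dj - old + new for dj, old, new in zip(d, g[i], g_new)]
        g[i] = g_new
        v = [vj + step * dj for vj, dj in zip(v, d)]  # ascent step of size C along d
    return v

def coupling_from_dual(v, cost, mu, nu, eps):
    """Coupling pi_ij = mu_i * chi_ij(v) recovered from the dual vector v."""
    return [[mu[i] * (nj - gj) for nj, gj in zip(nu, grad_h_eps(v, cost[i], nu, eps))]
            for i in range(len(mu))]
```

In a small example (four source points, three targets, ε = 0.5), a step C = ε/16 is a conservative, assumed choice: each h_ε(x_i, ·) has a (1/ε)-Lipschitz gradient.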
III. Discrete Optimal Transport
Numerical Illustrations on Bags of Word-Embeddings
IV. Semi-Discrete Optimal Transport
Figure: The flow chart of SGD for semi-discrete OT (steps: step size; update; output).
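A minimal sketch of averaged SGD for semi-discrete OT, assuming a one-dimensional continuous μ that can be sampled, a discrete ν = Σ_j ν_j δ_{y_j} with every ν_j > 0, the squared cost c(x, y) = (x − y)², and the semi-dual term h_ε(x, v) = Σ_j v_j ν_j + v^{c,ε}(x) − ε, whose gradient in v is ν − χ(x, v) with χ a softmax over j. Each iteration draws a fresh x from μ, takes an ascent step of size C/√k, and updates the running average v̄ that is returned. The names and the choice of C are hypothetical; `marginal_error` is a diagnostic that replaces μ by an equally weighted grid.

```python
import math
import random

def grad_h_eps(v, x, ys, nu, eps):
    """Gradient of h_eps(x, .) at v for targets ys with weights nu; cost (x - y)^2."""
    z = [(vj - (x - yj) ** 2) / eps + math.log(nj) for vj, yj, nj in zip(v, ys, nu)]
    z_max = max(z)  # log-sum-exp shift for numerical stability
    w = [math.exp(zj - z_max) for zj in z]
    s = sum(w)
    return [nj - wj / s for nj, wj in zip(nu, w)]

def averaged_sgd_semi_discrete(sample_x, ys, nu, eps, c_step, num_iters, seed=0):
    """Averaged SGD on the semi-dual; sample_x(rng) draws one x from the continuous mu."""
    rng = random.Random(seed)
    m = len(nu)
    v_tilde = [0.0] * m  # current SGD iterate
    v_bar = [0.0] * m    # running average of the iterates (returned)
    for k in range(1, num_iters + 1):
        x = sample_x(rng)
        g = grad_h_eps(v_tilde, x, ys, nu, eps)
        step = c_step / math.sqrt(k)  # decreasing step C / sqrt(k)
        v_tilde = [vt + step * gj for vt, gj in zip(v_tilde, g)]
        v_bar = [vb + (vt - vb) / k for vb, vt in zip(v_bar, v_tilde)]
    return v_bar

def marginal_error(v, x_grid, ys, nu, eps):
    """Max |E_mu[chi(x, v)] - nu|, with an equally weighted grid standing in for mu."""
    marg = [0.0] * len(nu)
    for x in x_grid:
        g = grad_h_eps(v, x, ys, nu, eps)
        for j, (nj, gj) in enumerate(zip(nu, g)):
            marg[j] += (nj - gj) / len(x_grid)  # nj - gj = chi_j(x, v)
    return max(abs(a - b) for a, b in zip(marg, nu))
```

For instance, with μ = Uniform[0, 1] (sample_x = lambda r: r.random()), targets [0.1, 0.5, 0.9] with weights [0.6, 0.3, 0.1], ε = 0.1 and C = 0.1, the marginal error of v̄ after a few thousand iterations should be well below its value at v = 0.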
IV. Semi-Discrete Optimal Transport
Numerical Illustrations
V. Continuous Optimal Transport
V. Continuous Optimal Transport
Figure: The flow chart of kernel SGD for continuous OT (steps: step size C and kernels; update; output).
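A minimal sketch of kernel SGD for continuous OT, assuming one-dimensional μ and ν that can be sampled, Gaussian kernels for both potentials (the bandwidth 0.2 and all function names are hypothetical), and the dual form max E[u(X) + v(Y) − ε exp((u(X) + v(Y) − c(X, Y))/ε)] with X ~ μ and Y ~ ν independent. Each iteration samples a pair (x_k, y_k), evaluates u(x_k) and v(y_k) from the current expansions u = Σ_i α_i κ(·, x_i) and v = Σ_i α_i ℓ(·, y_i), and appends one shared coefficient α_k = (C/√k)(1 − exp((u(x_k) + v(y_k) − c(x_k, y_k))/ε)). The cost of an iteration grows linearly with k, since every previous sample stays in the expansion.

```python
import math
import random

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel on the real line; the bandwidth is a hypothetical choice."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def eval_expansion(alphas, centers, point):
    """RKHS expansion sum_i alpha_i * kernel(point, center_i) of one dual potential."""
    return sum(a * gaussian_kernel(point, c) for a, c in zip(alphas, centers))

def kernel_sgd_continuous_ot(sample_mu, sample_nu, cost, eps, c_step, num_iters, seed=0):
    """Kernel SGD on the entropic dual; u and v share the coefficients alpha_k."""
    rng = random.Random(seed)
    alphas, xs, ys = [], [], []
    for k in range(1, num_iters + 1):
        x, y = sample_mu(rng), sample_nu(rng)
        u_x = eval_expansion(alphas, xs, x)  # u(x_k) from the previous k - 1 terms
        v_y = eval_expansion(alphas, ys, y)  # v(y_k) from the previous k - 1 terms
        # Step C / sqrt(k) along the functional gradient of
        # u(x) + v(y) - eps * exp((u(x) + v(y) - c(x, y)) / eps)
        alpha = (c_step / math.sqrt(k)) * (1.0 - math.exp((u_x + v_y - cost(x, y)) / eps))
        alphas.append(alpha)
        xs.append(x)
        ys.append(y)
    return alphas, xs, ys

def mean_exp_term(alphas, xs, ys, cost, eps, grid):
    """Average of exp((u(x) + v(y) - c(x, y)) / eps) over grid x grid,
    with the grid standing in for both mu and nu; close to 1 near the optimum."""
    us = [eval_expansion(alphas, xs, g) for g in grid]
    vs = [eval_expansion(alphas, ys, g) for g in grid]
    total = sum(math.exp((u + v - cost(x, y)) / eps)
                for x, u in zip(grid, us) for y, v in zip(grid, vs))
    return total / len(grid) ** 2
```

For instance, with μ = ν = Uniform[0, 1], c(x, y) = (x − y)², ε = 0.1 and C = 0.02, `mean_exp_term` gives a cheap check of how far the first-order optimality condition (the exponential term averaging to 1) is from being satisfied.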
V. Continuous Optimal Transport
Numerical Illustrations
Thank You!