![Page 1: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/1.jpg)
Geometry and Statistics in High-Dimensional Structured Optimization
Yuanming Shi
ShanghaiTech University
1
![Page 2: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/2.jpg)
Outline
Motivations: issues on computation, storage, nonconvexity, …
Two Vignettes:
  Structured Sparse Optimization
    Geometry of Convex Statistical Optimization
    Fast Convex Optimization Algorithms
  Generalized Low-rank Optimization
    Geometry of Nonconvex Statistical Optimization
    Scalable Riemannian Optimization Algorithms
Concluding remarks
![Page 3: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/3.jpg)
Motivation: High-Dimensional Statistical Optimization
3
![Page 4: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/4.jpg)
Motivations
The era of massive data sets: leads to new issues related to modeling, computing, and statistics.
Statistical issues:
  Concentration of measure: high-dimensional probability
  Importance of “low-dimensional” structures: sparsity and low-rankness
Algorithmic issues:
  Excessively large problem dimension, parameter size
  Polynomial-time algorithms often not fast enough
  Non-convexity in general formulations
![Page 5: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/5.jpg)
Issue A: Large-scale structured optimization
Explosion in scale and complexity of the optimization problem for massive data set processing
Question:
How to exploit the low-dimensional structures (e.g., sparsity and low-rankness) to design efficient algorithms?
5
![Page 6: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/6.jpg)
Issue B: Computational vs. statistical efficiency
Massive data sets require very fast algorithms but with rigorous guarantees: parallel computing and approximations are essential
Questions:
When is there a gap between polynomial-time and exponential-time algorithms?
What are the trade-offs between computational and statistical efficiency?
6
![Page 7: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/7.jpg)
Issue C: Scalable nonconvex optimization
Nonconvex optimization may be super scary: saddle points, local optima
Question: How to exploit the geometry of nonconvex programs to guarantee optimality and enable scalability in computation and storage?
Fig. credit: Chen
![Page 8: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/8.jpg)
Vignette A: Structured Sparse Optimization
8
1. Geometry of Convex Statistical Estimation
  1) Phase transitions of random convex programs
  2) Convex geometry, statistical dimension
2. Fast Convex Optimization Algorithms
  1) Homogeneous self-dual embedding
  2) Operator splitting, ADMM
![Page 9: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/9.jpg)
High-dimensional sparse optimization
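Let $x^\natural \in \mathbb{R}^d$ be an unknown structured sparse signal
Individual sparsity for compressed sensing
Let $f$ be a convex function that reflects structure, e.g., the $\ell_1$-norm
Let $A \in \mathbb{R}^{m \times d}$ be a measurement operator
Observe $z = A x^\natural$
Find estimate $\hat{x}$ by solving the convex program
$$\text{minimize } f(x) \quad \text{subject to } Ax = z$$
Hope: $\hat{x} = x^\natural$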
![Page 10: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/10.jpg)
Application: High-dimensional IoT data analysis
Machine-type communication (e.g., massive IoT devices) with sporadic traffic: massive device connectivity
10
Sporadic traffic: only a small fraction of a potentially large number of devices are active for data acquisition (e.g., temperature measurement)
![Page 11: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/11.jpg)
Application: High-dimensional IoT data analysis
Cellular network with massive number of devices
Single-cell uplink with a BS with antennas; Total single-antenna devices, active devices (sporadic traffic)
Define diagonal activity matrix with non-zero diagonals
denotes the received signal across antennas
: channel matrix from all devices to the BS
: known transmit pilot matrix from devices
![Page 12: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/12.jpg)
Group sparse estimation
Let (unknown): group sparsity in rows of matrix
Let be a known measurement operator (pilot matrix)
Observe
Find estimate by solving a convex program
The regularizer is the mixed $\ell_1/\ell_2$-norm (sum of row-wise Euclidean norms), which reflects the group sparsity structure
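The mixed norm used for group sparsity can be illustrated with a minimal sketch. It assumes the norm is the sum of the Euclidean norms of the rows, uses real-valued entries for simplicity (wireless channels would be complex), and the function names and the 4-device matrix are hypothetical.

```python
import math


def mixed_l1_l2_norm(rows):
    """Sum of the Euclidean norms of the rows of a matrix (list of lists)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in rows)


def count_active_rows(rows, tol=1e-12):
    """Number of rows with non-negligible energy, i.e. active groups."""
    return sum(1 for row in rows if math.sqrt(sum(v * v for v in row)) > tol)


# Hypothetical example: 4 devices, 2 antennas, only devices 0 and 2 active.
theta = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8], [0.0, 0.0]]
norm_value = mixed_l1_l2_norm(theta)
active = count_active_rows(theta)
```

Each inactive device contributes a zero row, so minimizing this norm tends to switch entire rows off rather than individual entries.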
![Page 13: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/13.jpg)
Geometry of Convex Statistical Optimization
13
![Page 14: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/14.jpg)
Geometric view: sparsity
Sparse approximation via convex hull
14
1-sparse vectors of Euclidean norm 1
convex hull: $\ell_1$-norm (unit ball)
![Page 15: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/15.jpg)
Geometric view: low-rank
Low-rank approximation via convex hull
15
2x2 rank 1 symmetric matrices (normalized)
convex hull: nuclear norm
![Page 16: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/16.jpg)
Geometry of sparse optimization
Descent cone of a function $f$ at a point $x$ is
$$\mathcal{D}(f, x) = \{ h : f(x + \epsilon h) \le f(x) \text{ for some } \epsilon > 0 \}$$
References: Rockafellar 1970
Fig. credit: Chen
![Page 17: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/17.jpg)
Geometry of sparse optimization
17
References: Candes–Romberg–Tao 2005, Rudelson–Vershynin 2006, Chandrasekaran et al. 2010, Amelunxen et al. 2013
Fig. credit: Tropp
![Page 18: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/18.jpg)
Sparse optimization with random data
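Assume
The vector $x^\natural \in \mathbb{R}^d$ is unknown
The observation $z = A x^\natural$, where $A \in \mathbb{R}^{m \times d}$ is standard normal
The vector $\hat{x}$ solves: minimize $f(x)$ subject to $Ax = z$
Then
$$m \ge \delta(\mathcal{D}(f, x^\natural)) + C\sqrt{d} \implies \hat{x} = x^\natural \text{ with high probability}$$
$$m \le \delta(\mathcal{D}(f, x^\natural)) - C\sqrt{d} \implies \hat{x} \ne x^\natural \text{ with high probability}$$
Here $\delta(\mathcal{D}(f, x^\natural))$ is the statistical dimension of the descent cone of $f$ at $x^\natural$ [Amelunxen-McCoy-Tropp’13]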
![Page 19: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/19.jpg)
Statistical dimension
The statistical dimension of a closed, convex cone $K$ is
$$\delta(K) = \mathbb{E}\left[ \|\Pi_K(g)\|_2^2 \right]$$
$\Pi_K$ is the Euclidean projection onto $K$; $g$ is a standard normal vector
Fig. credit: Tropp
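The definition of the statistical dimension as an expected squared projection norm can be checked numerically. The sketch below is a Monte Carlo estimate for one cone chosen purely for illustration, the nonnegative orthant, whose projection is a coordinate-wise clipping at zero and whose statistical dimension is known to be d/2. The function name, trial count, and seed are assumptions.

```python
import random


def statistical_dimension_orthant(d, trials=20000, seed=0):
    """Monte Carlo estimate of E||Pi_K(g)||^2 for K = nonnegative orthant in R^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Projecting g onto the orthant keeps positive coordinates and zeroes the rest.
        total += sum(max(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(d))
    return total / trials


estimate = statistical_dimension_orthant(d=10)  # should be close to 10 / 2
```

For descent cones of norms such as the $\ell_1$-norm the projection is less trivial, so this sketch only illustrates the definition, not the phase-transition computations themselves.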
![Page 20: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/20.jpg)
Examples for statistical dimension
Example I: $\ell_1$-minimization for compressed sensing
with non-zero entries
Example II: $\ell_1/\ell_2$-minimization for massive device connectivity
with non-zero rows
20
![Page 21: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/21.jpg)
Numerical phase transition
Compressed sensing with $\ell_1$-minimization
21
Fig. credit: Amelunxen-McCoy-Tropp’13
![Page 22: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/22.jpg)
Numerical phase transition
User activity detection via $\ell_1/\ell_2$-minimization
22
group-structured sparsity estimation
![Page 23: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/23.jpg)
Summary of convex statistical optimization
Theoretical foundations for sparse optimization
Convex relaxation: convex hull, convex analysis
Fundamental bounds for convex methods: convex geometry, high-dimensional statistics
Computational limits for (convexified) sparse optimization
Custom methods (e.g., stochastic gradient descent): not generalizable for complicated problems
Generic methods (e.g., CVX): not scalable to large problem sizes
23
Can we design a unified framework for general large-scale convex programs?
![Page 24: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/24.jpg)
Fast Convex Optimization Algorithms
24
![Page 25: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/25.jpg)
Large-scale convex optimization
Proposal: Two-stage approach for large-scale convex optimization
Matrix stuffing: Fast homogeneous self-dual embedding (HSD) transformation
Operator splitting (ADMM): Large-scale homogeneous self-dual embedding
25
fast homogeneous self-dual embedding (HSD) transformation
large-scale homogeneous self-dual embedding solving
![Page 26: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/26.jpg)
Smith form reformulation
Goal: Transform the classical form to conic form
Key idea: Introduce a new variable for each subexpression in classical form [Smith ’96]
The Smith form is ready for standard cone programming transformation
26
![Page 27: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/27.jpg)
Example
Coordinated beamforming problem family
Smith form reformulation
27
Reference: Grant-Boyd’08
Smith form for (1) Smith form for (2)
QoS constraints
Per-BS power constraint
The Smith form is ready to be reformulated as a standard cone program
![Page 28: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/28.jpg)
Optimality condition
KKT conditions (necessary and sufficient, assuming strong duality)
Primal feasibility:
Dual feasibility:
Complementary slackness:
Feasibility:
28
zero duality gap
no solution if primal or dual problem infeasible/unbounded
![Page 29: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/29.jpg)
Homogeneous self-dual (HSD) embedding
HSD embedding of the primal-dual pair of the transformed standard cone program (based on KKT conditions) [Ye et al. 94]
This feasibility problem is homogeneous and self-dual
29
Solving the embedding amounts to finding a nonzero solution
![Page 30: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/30.jpg)
Recovering solution or certificates
Any HSD solution falls into one of three cases:
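Case 1: $\tau > 0$, $\kappa = 0$, then $(x/\tau, y/\tau, s/\tau)$ is a solution
Case 2: $\tau = 0$, $\kappa > 0$, implies $c^{\mathsf T} x + b^{\mathsf T} y < 0$
If $b^{\mathsf T} y < 0$, then $y / (-b^{\mathsf T} y)$ certifies primal infeasibility
If $c^{\mathsf T} x < 0$, then $x / (-c^{\mathsf T} x)$ certifies dual infeasibility
Case 3: $\tau = \kappa = 0$, nothing can be said about original problem
HSD embedding: 1) obviates need for phase I / phase II solves to handle infeasibility/unboundedness; 2) used in all interior-point cone solvers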
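The three-way case analysis can be sketched as a small decision function. This is an illustration only: the names `tau`, `kappa`, `c_dot_x`, and `b_dot_y` are assumptions following the usual HSD notation (two scalar embedding variables and the inner products of the cost and right-hand-side vectors with the primal and dual iterates), and the tolerance is hypothetical.

```python
def interpret_hsd_solution(tau, kappa, c_dot_x, b_dot_y, tol=1e-9):
    """Map a nonzero HSD solution to a list of status labels.

    tau, kappa : scalar embedding variables (assumed names)
    c_dot_x    : inner product of the cost vector with the primal iterate
    b_dot_y    : inner product of the right-hand side with the dual iterate
    """
    if tau > tol and abs(kappa) <= tol:
        # Case 1: rescaling the iterates by 1/tau gives a primal-dual solution.
        return ["solved"]
    if abs(tau) <= tol and kappa > tol:
        # Case 2: at least one infeasibility certificate is available.
        labels = []
        if b_dot_y < -tol:
            labels.append("primal infeasible")
        if c_dot_x < -tol:
            labels.append("dual infeasible")
        return labels
    # Case 3: tau = kappa = 0 carries no information about the original problem.
    return ["undetermined"]


status = interpret_hsd_solution(tau=0.0, kappa=1.0, c_dot_x=0.5, b_dot_y=-2.0)
```

A single solve therefore returns either a solution or a certificate, which is why no separate phase I step is needed.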
30
![Page 31: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/31.jpg)
Operator Splitting
31
![Page 32: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/32.jpg)
Alternating direction method of multipliers
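ADMM: an operator splitting method solving convex problems in form
$$\text{minimize } f(x) + g(z) \quad \text{subject to } x = z$$
$f$, $g$ convex, not necessarily smooth, can take infinite values
The basic ADMM algorithm [Boyd et al., FTML 11]
$$x^{k+1} = \arg\min_x \left( f(x) + \tfrac{\rho}{2} \|x - z^k - \lambda^k\|_2^2 \right)$$
$$z^{k+1} = \arg\min_z \left( g(z) + \tfrac{\rho}{2} \|x^{k+1} - z - \lambda^k\|_2^2 \right)$$
$$\lambda^{k+1} = \lambda^k - x^{k+1} + z^{k+1}$$
$\rho > 0$ is a step size; $\lambda$ is the (scaled) dual variable associated with the constraint $x = z$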
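A toy instance may help make the alternating updates concrete. The sketch below applies ADMM to a problem chosen only for illustration: a quadratic term in one block, an $\ell_1$ penalty in the other, and the constraint that both blocks agree. It is written in the common scaled form with a dual variable `u`; the problem, names, and parameter values are assumptions, not the solver described in the talk.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(a, gamma, rho=1.0, iters=100):
    """ADMM for: minimize 0.5*||x - a||^2 + gamma*||z||_1 subject to x = z."""
    n = len(a)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable for the constraint x = z
    for _ in range(iters):
        # x-update: minimize the quadratic plus the augmented penalty (closed form).
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: proximal step on the l1 term.
        z = [soft_threshold(x[i] + u[i], gamma / rho) for i in range(n)]
        # dual update: accumulate the constraint residual.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


result = admm_l1_denoise([3.0, -0.5, 1.2], gamma=1.0)
```

For this separable problem the exact minimizer is the soft-thresholded input, which gives a simple way to check that the iterations have converged.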
![Page 33: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/33.jpg)
Alternating direction method of multipliers
Convergence of ADMM: Under benign conditions ADMM guarantees
the dual iterates $\lambda^k \to \lambda^\star$, an optimal dual variable
Same as many other operator splitting methods for consensus problems, e.g., the Douglas-Rachford method
Pros: 1) inherits the robustness of the method of multipliers; 2) can support decomposition
33
![Page 34: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/34.jpg)
Operator splitting
Transform HSD embedding in ADMM form: Apply the operator splitting method (ADMM)
Final algorithm
34
Steps of each iteration: subspace projection; parallel cone projection; a computationally trivial update
![Page 35: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/35.jpg)
Parallel cone projection
Proximal algorithms for parallel cone projection [Parikh & Boyd, FTO 14]
Projection onto the second-order cone:
Closed-form, computationally scalable (we mainly focus on SOCP)
Projection onto positive semidefinite cone:
SVD is computationally expensive
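The closed-form second-order cone projection can be sketched as follows. The sketch assumes the cone is written as the set of pairs (x, t) with the Euclidean norm of x at most t; the function name and the test point are hypothetical.

```python
import math


def project_second_order_cone(x, t):
    """Euclidean projection of (x, t) onto the cone {(x, t): ||x||_2 <= t}."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x <= t:
        # Already inside the cone: nothing to do.
        return list(x), t
    if norm_x <= -t:
        # Inside the polar cone: the projection is the origin.
        return [0.0] * len(x), 0.0
    # Otherwise project onto the boundary of the cone.
    alpha = 0.5 * (norm_x + t)
    return [alpha * v / norm_x for v in x], alpha


proj_x, proj_t = project_second_order_cone([3.0, 4.0], 0.0)
```

Each cone in a product of cones can be projected independently with this rule, which is what makes the cone projection step easy to parallelize.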
35
![Page 36: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/36.jpg)
Numerical results
Power minimization coordinated beamforming problem (SOCP)
36
[Ref] Y. Shi, J. Zhang, B. O’Donoghue, and K. B. Letaief, “Large-scale convex optimization for dense wireless cooperative networks,” IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4729-4743, Sept. 2015. (The 2016 IEEE Signal Processing Society Young Author Best Paper Award)
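| Network Size (L=K) | 20 | 50 | 100 | 150 |
|---|---|---|---|---|
| Interior-Point Solver: Solving Time [sec] | 4.2835 | 326.2513 | N/A | N/A |
| Interior-Point Solver: Objective [W] | 12.2488 | 6.5216 | N/A | N/A |
| Operator Splitting: Solving Time [sec] | 0.1009 | 2.4821 | 23.8088 | 81.0023 |
| Operator Splitting: Objective [W] | 12.2523 | 6.5193 | 3.1296 | 2.0689 |

ADMM achieves a speedup of up to about 130x over the interior-point method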
![Page 37: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/37.jpg)
Cone programs with random constraints
Phase transitions in cone programming: independent standard normal entries in and
Fig. credit: Amelunxen-McCoy-Tropp’13
![Page 38: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/38.jpg)
Vignette B: Generalized Low-Rank Optimization
38
Optimization over Riemannian Manifolds (non-Euclidean geometry)
1. Geometry of Nonconvex Statistical Estimation
2. Scalable Riemannian Optimization Algorithms
![Page 39: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/39.jpg)
Generalized low-rank matrix optimization
Rank-constrained matrix optimization problem
is a real linear map on matrices
is convex and differentiable
A prevalent model in signal processing, statistics and machine learning (e.g., low-rank matrix completion)
Challenge I: Reliably solve the low-rank matrix problem at scale
Challenge II: Develop optimization algorithms with optimal storage
![Page 40: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/40.jpg)
Application: Topological interference alignment
Blessings: partial connectivity in dense wireless networks for massive data processing and transmission
Approach: topological interference management (TIM) [Jafar, TIT 14]
Maximize the achievable DoF: only based on the network topology information (no CSIT)
40
[Figure: partially connected network of transmitters and receivers, with path-loss and shadowing; question posed: Degrees of Freedom?]
![Page 41: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/41.jpg)
Application: Topological interference alignment
Goal: Deliver one data stream per user over time slots
Transmitter transmits , receiver receives
Receiver decodes symbol by projecting into the space
Topological interference alignment condition
41
: network connectivity pattern
![Page 42: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/42.jpg)
Generalized low-rank model
Generalized low-rank optimization with network side information
: precoding vectors and decoding vectors
equals the inverse of achievable degrees-of-freedom (DoF)
42
topological interference alignment condition
side information
![Page 43: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/43.jpg)
Nuclear norm fails
Convex relaxation fails: it always returns the identity matrix!
Fact:
Proposal: Solve the nonconvex problems directly with rank adaptivity
43
Riemannian manifold optimization problem
manifold constraint
![Page 44: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/44.jpg)
Recent advances in nonconvex optimization
2009–Present: Nonconvex heuristics
Burer–Monteiro factorization idea + various nonlinear programming methods
Store low-rank matrix factors
Guaranteed solutions: Global optimality with statistical assumptions
Matrix completion/recovery: [Sun-Luo’14], [Chen-Wainwright’15], [Ge-Lee-Ma’16],…
Phase retrieval: [Candes et al., 15], [Chen-Candes’ 15], [Sun-Qu-Wright’16]
Community detection/phase synchronization [Bandeira-Boumal-Voroninski’16], [Montanari et al., 17],…
When are nonconvex optimization problems not scary?
![Page 45: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/45.jpg)
Geometry of Nonconvex Statistical Optimization
45
![Page 46: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/46.jpg)
First-order stationary points
Saddle points and local minima:
[Figure panels: local minima; saddle points/local maxima]
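The distinction between local minima, local maxima, and saddle points at a first-order stationary point is decided by the signs of the Hessian eigenvalues. The sketch below does this for a two-variable function using the closed-form eigenvalues of a symmetric 2x2 matrix; the function name and the tolerance are assumptions for illustration.

```python
import math


def classify_stationary_point(hxx, hxy, hyy, tol=1e-12):
    """Classify a stationary point of a 2-D function from its symmetric Hessian."""
    mean = 0.5 * (hxx + hyy)
    radius = math.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
    lam_min, lam_max = mean - radius, mean + radius
    if lam_min > tol:
        return "local minimum"
    if lam_max < -tol:
        return "local maximum"
    if lam_min < -tol and lam_max > tol:
        return "saddle point"
    return "degenerate"


# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] at the origin.
kind = classify_stationary_point(2.0, 0.0, -2.0)
```

A saddle point has a zero gradient just like a minimum, so first-order information alone cannot tell the two apart.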
![Page 47: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/47.jpg)
First-order stationary points
Applications: PCA, matrix completion, dictionary learning etc.
Local minima: Either all local minima are global minima or all local minima are as good as global minima
Saddle points: Very poor compared to global minima; there are several such points
Bottom line: Local minima are much more desirable than saddle points
47
![Page 48: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/48.jpg)
Summary of nonconvex statistical optimization
Convex methods:
Slow memory hogs
Convex relaxation fails sometimes, e.g., topological interference alignment
High computational complexity, e.g., eigenvalue decomposition
Nonconvex methods: fast, lightweight
Under certain statistical models with benign global geometry: no spurious local optima
48
How to escape saddle points efficiently?
Fig. credit: Sun, Qu & Wright
![Page 49: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/49.jpg)
Riemannian Optimization Algorithms
49
Escape saddle points via manifold optimization
![Page 50: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/50.jpg)
What is manifold optimization?
Manifold (or manifold-constrained) optimization problem
is a smooth function
is a Riemannian manifold: spheres, orthonormal bases (Stiefel), rotations, positive definite matrices, fixed-rank matrices, Euclidean distance matrices, semidefinite fixed-rank matrices, linear subspaces (Grassmann), phases, essential matrices, fixed-rank tensors, Euclidean spaces...
50
![Page 51: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/51.jpg)
Escape saddle points via manifold optimization
Convergence guarantees for Riemannian trust regions
Global convergence to second-order critical points
Quadratic convergence rate locally
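Reach an $(\epsilon_g, \epsilon_H)$-second-order stationary point, i.e., $\|\mathrm{grad}\, f(x)\| \le \epsilon_g$ and $\mathrm{Hess}\, f(x) \succeq -\epsilon_H \mathrm{Id}$,
in $O(\max\{\epsilon_H^{-3}, \epsilon_g^{-2} \epsilon_H^{-1}\})$ iterations under Lipschitz assumptions [Cartis & Absil’16]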
Other approaches: Gradient descent by adding noise [Ge et al., 2015], [Jordan et al., 17] (slow convergence rate in general)
51
Escape strict saddle points via finding second-order stationary point
![Page 52: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/52.jpg)
Recent applications of manifold optimization
Matrix/tensor completion/recovery: [Vandereycken’13], [Boumal-Absil’15], [Kasai-Mishra’16],…
Gaussian mixture models: [Hosseini-Sra’15], Dictionary learning: [Sun-Qu-Wright’17], Phase retrieval: [Sun-Qu-Wright’17],…
Phase synchronization/community detection: [Boumal’16], [Bandeira-Boumal-Voroninski’16],…
Wireless transceivers design: [Shi-Zhang-Letaief’16], [Yu-Shen-Zhang-K. B. Letaief’16], [Shi-Mishra-Chen’16],…
52
![Page 53: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/53.jpg)
The power of manifold optimization paradigms
Generalize Euclidean gradient (Hessian) to Riemannian gradient (Hessian)
We need Riemannian geometry: 1) linearize the search space $\mathcal{M}$ into a tangent space $T_x\mathcal{M}$; 2) pick a metric on $T_x\mathcal{M}$ to give intrinsic notions of gradient and Hessian
53
Riemannian Gradient Euclidean Gradient
Retraction Operator
![Page 54: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/54.jpg)
54
An excellent book: Optimization Algorithms on Matrix Manifolds
A Matlab toolbox
![Page 55: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/55.jpg)
Taking A Close Look at Gradient Descent
55
![Page 56: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/56.jpg)
Optimization on the manifold: main idea
56
![Page 57: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/57.jpg)
Optimization on the manifold: main idea
57
![Page 58: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/58.jpg)
Optimization on the manifold: main idea
58
![Page 59: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/59.jpg)
Optimization on the manifold: main idea
59
![Page 60: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/60.jpg)
Example: Rayleigh quotient
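Optimization over the (sphere) manifold $S^{n-1} = \{x \in \mathbb{R}^n : x^T x = 1\}$
The cost function $f$ (the Rayleigh quotient of the symmetric matrix $A$) is smooth on $S^{n-1}$
Step 1: Compute the Euclidean gradient $\nabla f(x)$ in $\mathbb{R}^n$
Step 2: Compute the Riemannian gradient on $S^{n-1}$ by projecting $\nabla f(x)$ onto the tangent space $T_x S^{n-1}$ using the orthogonal projector $I - x x^T$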
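The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the method from the slides: it assumes the cost is f(x) = x^T A x minimized over the unit sphere (the sign convention is an assumption), it maps each iterate back to the sphere by normalization, and it uses a fixed step size tied to the spectral norm of A. All function and variable names are hypothetical.

```python
import numpy as np

def euclidean_grad(A, x):
    # Step 1: gradient of the assumed cost f(x) = x^T A x in R^n (A symmetric).
    return 2.0 * (A @ x)

def riemannian_grad(A, x):
    # Step 2: project the Euclidean gradient onto the tangent space
    # {u : x^T u = 0} with the orthogonal projector (I - x x^T).
    g = euclidean_grad(A, x)
    return g - (x @ g) * x

def rayleigh_descent(A, x0, iters=2000):
    # Fixed step size chosen from the spectral norm of A (an assumption
    # made here for stability, not a rule taken from the slides).
    step = 0.25 / np.linalg.norm(A, 2)
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = x - step * riemannian_grad(A, x)
        x = x / np.linalg.norm(x)  # map the iterate back onto the sphere
    return x

# Small demo on a diagonal matrix whose eigenvalues are known by inspection.
A_demo = np.diag([1.0, 2.0, 5.0])
x_min = rayleigh_descent(A_demo, np.ones(3))
value = x_min @ A_demo @ x_min
print(value)  # should be close to the smallest eigenvalue of A_demo
```

Under these assumptions the iterates approach an eigenvector for the smallest eigenvalue of A. Negating the cost would target the largest eigenvalue instead.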
60
![Page 61: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/61.jpg)
Example: Generalized low-rank optimization
Generalized low-rank optimization for topological interference alignment via Riemannian optimization
61
![Page 62: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/62.jpg)
Convergence rates
Optimize over fixed-rank matrices (quotient matrix manifold)
62
[Ref] Y. Shi, J. Zhang, and K. B. Letaief, “Low-rank matrix completion for topological interference management by Riemannian pursuit,” IEEE Trans. Wireless Commun., vol. 15, no. 7, Jul. 2016.
Riemannian algorithms:
1. Exploit the rank structure in a principled way
2. Develop second-order algorithms systematically
3. Scalable, SVD-free
![Page 63: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/63.jpg)
Phase transitions for topological IA
63
The heat map indicates the empirical probability of success
(blue=0%; yellow=100%)
![Page 64: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/64.jpg)
Concluding remarks
Structured sparse optimization
Convex geometry and analysis provide statistical optimality guarantees
Matrix stuffing for fast HSD embedding transformation
Operator splitting for solving large-scale HSD embedding
Future directions:
Statistical analysis for more complicated problems, e.g., cone programs
Operator splitting for large-scale sparse SDP problems [Zheng-Fantuzzi-Papachristodoulou-Goulart-Wynn’17]
More applications: deep neural network compression via sparse optimization
64
![Page 65: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/65.jpg)
Concluding remarks
Generalized low-rank optimization
Nonconvex statistical optimization may not be that scary: no spurious local optima
Riemannian optimization is powerful: 1) Exploit the manifold geometry of fixed-rank matrices; 2) Escape saddle points
Future directions:
Geometry of neural network loss surfaces via random matrix theory [Pennington-Bahri’17]: 1) Are all minima global? 2) What is the distribution of critical points?
More applications: blind deconvolution for IoT, big data analytics (e.g., ranking)
65
![Page 66: Geometry and Statistics in High-Dimensional Structured ... · Geometry and Statistics in High-Dimensional Structured Optimization Yuanming Shi ... Massive data sets require very fast](https://reader034.vdocuments.us/reader034/viewer/2022042319/5f081eed7e708231d4207134/html5/thumbnails/66.jpg)
To learn more... Web: http://shiyuanming.github.io/
Papers:
Y. Shi, J. Zhang, and K. B. Letaief, “Group sparse beamforming for green Cloud-RAN,” IEEE Trans. Wireless Commun., vol. 13, no. 5, pp. 2809-2823, May 2014. (The 2016 Marconi Prize Paper Award)
Y. Shi, J. Zhang, B. O’Donoghue, and K. B. Letaief, “Large-scale convex optimization for dense wireless cooperative networks,” IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4729-4743, Sept. 2015. (The 2016 IEEE Signal Processing Society Young Author Best Paper Award)
Y. Shi, J. Zhang, and K. B. Letaief, “Low-rank matrix completion for topological interference management by Riemannian pursuit,” IEEE Trans. Wireless Commun., vol. 15, no. 7, pp. 4703-4717, Jul. 2016.
Y. Shi, J. Zhang, W. Chen, and K. B. Letaief, “Generalized sparse and low-rank optimization for ultra-dense networks,” IEEE Commun. Mag., to appear.
66