method and approximation algorithms i - d steurerΒ Β· for approximation algorithms: need...

34
Cargese Workshop, 2014 SUM-OF-SQUARES method and approximation algorithms I David Steurer Cornell

Upload: others

Post on 17-Jul-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

Cargese Workshop, 2014

SUM-OF-SQUARES method and approximation algorithms I

David SteurerCornell

Page 2: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

meta-task

given: functions 𝑓1, … , π‘“π‘š: Β±1𝑛 β†’ ℝ

find: solution π‘₯ ∈ Β±1 𝑛 to 𝑓1 = 0,… , π‘“π‘š = 0

encoded as low-degree polynomial in ℝ π‘₯

example: 𝑓(π‘₯) = 𝑖,π‘—βˆˆ 𝑛 𝑀𝑖𝑗 β‹… π‘₯𝑖 βˆ’ π‘₯𝑗2

examples: combinatorial optimization problem on graph 𝐺

MAX CUT: 𝐿𝐺 = 1 βˆ’ νœ€ over Β±1 𝑛

MAX BISECTION: 𝐿𝐺 = 1 βˆ’ νœ€, 𝑖 π‘₯𝑖 = 0 over Β±1 𝑛

where 1 βˆ’ νœ€ is guess for optimum value

Laplacian 𝐿𝐺 =1

𝐸 𝐺 π‘–π‘—βˆˆπΈ 𝐺

1

4π‘₯𝑖 βˆ’ π‘₯𝑗

2

goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation

(β€œon the edge intractability” need strongest possible relaxations)

Page 3: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

price of convexity: individual solutions distributions over solutions

price of tractability: can only enforce β€œefficiently checkable knowledge” about solutions

individual solutions

distributions over solutions

β€œpseudo-distributions over solutions”(consistent with efficiently checkable knowledge)

given: functions 𝑓1, … , π‘“π‘š: Β±1𝑛 β†’ ℝ

find: solution π‘₯ ∈ Β±1 𝑛 to 𝑓1 = 0,… , π‘“π‘š = 0

meta-task

goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation

Page 4: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

distribution 𝐷 over Β±1 𝑛

function 𝐷: Β±1 𝑛 β†’ ℝ

non-negativity: 𝐷 π‘₯ β‰₯ 0 for all π‘₯ ∈ Β±1 𝑛

normalization: π‘₯∈ Β±1 𝐷 π‘₯ = 1

distribution 𝐷 satisfies 𝑓1 = 0,… , π‘“π‘š = 0 for some 𝑓𝑖: Β±1𝑛 β†’ ℝ

𝔼𝐷𝑓12 +β‹―+ π‘“π‘š

2 = 0 (equivalently: ℙ𝐷 βˆ€π‘–. 𝑓𝑖 β‰  0 = 0)

examplesuniform distribution: 𝐷 = 2βˆ’π‘›

fixed 2-bit parity: 𝐷 π‘₯ = (1 + π‘₯1π‘₯2)/2𝑛

examplesfixed 2-bit parity distribution satisfies π‘₯1π‘₯2 = 1uniform distribution does not satisfy 𝑓 = 0 for any 𝑓 β‰  0

convex: 𝐷,𝐷′ satisfy conditions 𝐷 + 𝐷′ /2 satisfies conditions

# function values is exponential need careful representation

# independent inequalities is exponential not efficiently checkable

Page 5: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

distribution 𝐷 over Β±1 𝑛

function 𝐷: Β±1 𝑛 β†’ ℝ

non-negativity: 𝐷 π‘₯ β‰₯ 0 for all π‘₯ ∈ Β±1 𝑛

normalization: π‘₯∈ Β±1 𝐷 π‘₯ = 1

distribution 𝐷 satisfies 𝑓1 = 0,… , π‘“π‘š = 0 for some 𝑓𝑖: Β±1𝑛 β†’ ℝ

𝔼𝐷𝑓12 +β‹―+ π‘“π‘š

2 = 0 (equivalently: ℙ𝐷 βˆ€π‘–. 𝑓𝑖 β‰  0 = 0)

deg.-𝑑 pseudo-distribution 𝐷

π‘₯∈ Β±1 𝑛𝐷 π‘₯ 𝑓 π‘₯ 2 β‰₯ 0 for

every deg.-𝑑/2 polynomial 𝑓

convenient notation: 𝔼𝐷𝑓 ≔ π‘₯𝐷 π‘₯ 𝑓 π‘₯β€œpseudo-expectation of 𝑓 under 𝐷”

𝔼𝐷

deg.-2𝑛 pseudo-distributions are actual distributions

(point-indicators 𝟏 π‘₯ have deg. 𝑛 𝐷 π‘₯ = π”Όπ·πŸ π‘₯2 β‰₯ 0)

pseudo-

Page 6: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

deg.-𝑑 pseudo-distr. 𝐷: Β±1 𝑛 β†’ ℝ

non-negativity: 𝔼𝐷𝑓2 β‰₯ 0 for every deg.-𝑑/2 poly. 𝑓

normalization: 𝔼𝐷1 = 1

pseudo-distr. 𝐷 satisfies 𝑓1 = 0,… , π‘“π‘š = 0 for some 𝑓𝑖: Β±1𝑛 β†’ ℝ

𝔼𝐷𝑓12 +β‹―+ π‘“π‘š

2 = 0 (equivalently: 𝔼𝐷𝑓𝑖 β‹… 𝑔 = 0 whenever deg 𝑔 ≀ 𝑑 βˆ’ deg 𝑓𝑖)

notation: 𝔼𝐷𝑓 ≔ π‘₯𝐷 π‘₯ 𝑓 π‘₯ , β€œpseudo-expectation of 𝑓 under 𝐷”

Page 7: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

deg.-𝑑 pseudo-distr. 𝐷: Β±1 𝑛 β†’ ℝ

non-negativity: 𝔼𝐷𝑓2 β‰₯ 0 for every deg.-𝑑/2 poly. 𝑓

normalization: 𝔼𝐷1 = 1

pseudo-distr. 𝐷 satisfies 𝑓1 = 0,… , π‘“π‘š = 0 for some 𝑓𝑖: Β±1𝑛 β†’ ℝ

𝔼𝐷𝑓12 +β‹―+ π‘“π‘š

2 = 0 (equivalently: 𝔼𝐷𝑓𝑖 β‹… 𝑔 = 0 whenever deg 𝑔 ≀ 𝑑 βˆ’ deg 𝑓𝑖)

notation: 𝔼𝐷𝑓 ≔ π‘₯𝐷 π‘₯ 𝑓 π‘₯ , β€œpseudo-expectation of 𝑓 under 𝐷”

claim: can compute such 𝐷 in time 𝑛𝑂(𝑑) if it exists (otherwise, certify that no solution to original problem exists)

(can assume 𝐷 is deg.-𝑑 polynomial separation problem min𝑓

𝔼𝐷𝑓2 is 𝑛𝑑-

dim. eigenvalue prob. 𝑛𝑂(𝑑)-time via grad. descent / ellipsoid method)

[Shor, Parrilo, Lasserre]

Page 8: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

surprising property: 𝔼𝐷𝑓 β‰₯ 0 for many* low-degree polynomials 𝑓such that 𝑓 β‰₯ 0 follows from 𝑓1 = 0,… , π‘“π‘š = 0 by β€œexplicit proof”

soon: examples of such properties and how to exploit them

deg.-𝑑 pseudo-distr. 𝐷: Β±1 𝑛 β†’ ℝ

non-negativity: 𝔼𝐷𝑓2 β‰₯ 0 for every deg.-𝑑/2 poly. 𝑓

normalization: 𝔼𝐷1 = 1

pseudo-distr. 𝐷 satisfies 𝑓1 = 0,… , π‘“π‘š = 0 for some 𝑓𝑖: Β±1𝑛 β†’ ℝ

𝔼𝐷𝑓12 +β‹―+ π‘“π‘š

2 = 0 (equivalently: 𝔼𝐷𝑓𝑖 β‹… 𝑔 = 0 whenever deg 𝑔 ≀ 𝑑 βˆ’ deg 𝑓𝑖)

notation: 𝔼𝐷𝑓 ≔ π‘₯𝐷 π‘₯ 𝑓 π‘₯ , β€œpseudo-expectation of 𝑓 under 𝐷”

Page 9: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

surprising property: 𝔼𝐷𝑓 β‰₯ 0 for many* low-degree polynomials 𝑓such that 𝑓 β‰₯ 0 follows from 𝑓1 = 0,… , π‘“π‘š = 0 by β€œexplicit proof”

deg.-𝑑 pseudo-distr. 𝐷: Β±1 𝑛 β†’ ℝ

non-negativity: 𝔼𝐷𝑓2 β‰₯ 0 for every deg.-𝑑/2 poly. 𝑓

normalization: 𝔼𝐷1 = 1

pseudo-distr. 𝐷 satisfies 𝑓1 = 0,… , π‘“π‘š = 0 for some 𝑓𝑖: Β±1𝑛 β†’ ℝ

𝔼𝐷𝑓12 +β‹―+ π‘“π‘š

2 = 0 (equivalently: 𝔼𝐷𝑓𝑖 β‹… 𝑔 = 0 whenever deg 𝑔 ≀ 𝑑 βˆ’ deg 𝑓𝑖)

notation: 𝔼𝐷𝑓 ≔ π‘₯𝐷 π‘₯ 𝑓 π‘₯ , β€œpseudo-expectation of 𝑓 under 𝐷”

soon: examples of such properties and how to exploit them

emerging algorithm-design paradigm:analyze algorithm pretending that underlying actual distribution exists; verify only afterwards that low-deg. pseudo-distr.’s satisfy required properties

pseudo-distr. over optimal solutions

approximate solution (to original problem)

efficient algorithm

deg.-𝑑 part of actual distr. over optimal solutions

π‘›π‘œ(𝑑)-time algorithms cannot* distinguish between deg.-𝑑 pseudo-distributions and deg.-𝑑 part of actual distr.’s

Page 10: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

dual view (sum-of-squares proof system)

eitherβˆƒ deg.-𝑑 pseudo-distribution 𝐷 over Β±1 𝑛 satisfying 𝑓1 = 0,… , π‘“π‘š = 0

or βˆƒ 𝑔1, … , π‘”π‘š and β„Ž1, … , β„Žπ‘˜ such that 𝑖 𝑓𝑖 β‹… 𝑔𝑖 + 𝑗 β„Žπ‘—

2 = βˆ’1 over Β±1 𝑛

and deg 𝑓𝑖 + deg𝑔𝑖 ≀ 𝑑 and deg β„Žπ‘– ≀ 𝑑/2

derivation of unsatisfiable constraint βˆ’1 β‰₯ 0from 𝑓1 = 0,… , π‘“π‘š = 0 over Β±1 𝑛

βˆ’1

𝐷𝑓𝑓1𝑓2π‘“π‘š

𝐾𝑑 = 𝑓 = 𝑖 𝑓𝑖 β‹… 𝑔𝑖 + 𝑗 β„Žπ‘—2

𝐾𝑑

if βˆ’1 βˆ‰ 𝐾𝑑 then βˆƒ separating hyperplane 𝐷with 𝔼𝐷 βˆ’ 1 = βˆ’1 and 𝔼𝐷𝑓 β‰₯ 0 for all 𝑓 ∈ 𝐾𝑑

Page 11: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

pseudo-distribution satisfies all local properties of ±𝟏 𝒏

claimsuppose 𝑓 β‰₯ 0 is 𝑑/2-junta over Β±1 𝑛 (depends on ≀ 𝑑/2 coordinates)then, 𝔼𝐷𝑓 β‰₯ 0

proof: 𝑓 has degree ≀ 𝑑/2 𝔼𝐷𝑓 = 𝔼𝐷 𝑓2β‰₯ 0

corollaryfor any set 𝑆 of ≀ 𝑑 coordinates, marginal 𝐷′ = π‘₯𝑆 𝐷 is actual distribution

𝐷′ π‘₯𝑆 =

π‘₯ 𝑛 βˆ–π‘†

𝐷 π‘₯𝑆, π‘₯ 𝑛 βˆ–π‘† = π”Όπ·πŸ π‘₯𝑆 β‰₯ 0

𝑑-junta

(also captured by LP methods, e.g., Sherali–Adams hierarchies … )

example: triangle inequalities over Β±1 𝑛

𝔼𝐷 π‘₯𝑖 βˆ’ π‘₯𝑗2+ π‘₯𝑗 βˆ’ π‘₯π‘˜

2βˆ’ π‘₯𝑖 βˆ’ π‘₯π‘˜

2 β‰₯ 0

Page 12: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

conditioning pseudo-distributions

claim

βˆ€π‘– ∈ 𝑛 , 𝜎 ∈ Β±1 . 𝐷′ = π‘₯ ∣ π‘₯𝑗 = 𝜎𝐷

is deg.- 𝑑 βˆ’ 2 pseudo-distr.

proof

𝐷′ π‘₯ =1

ℙ𝐷 π‘₯𝑗=𝜎𝐷 π‘₯ β‹… 1 π‘₯𝑗=𝜎

𝔼𝐷′𝑓2 ∝ 𝔼𝐷1 π‘₯𝑗=πœŽπ‘“2 = 𝔼𝐷 1 π‘₯𝑗=𝜎

𝑓2β‰₯ 0

deg 𝑓 ≀ (𝑑 βˆ’ 2)/2 deg𝟏 π‘₯𝑗=πœŽπ‘“ ≀ 𝑑/2

(also captured by LP methods, e.g., Sherali–Adams hierarchies … )

Page 13: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

pseudo-covariances are covariances of distributions over ℝ𝒏

claimthere exists a (Gaussian) distr. πœ‰ over ℝ𝑛 such that

𝔼𝐷π‘₯ = 𝔼 πœ‰ and 𝔼𝐷π‘₯π‘₯𝑇 = 𝔼 πœ‰πœ‰π‘‡

let πœ‡ = 𝔼𝐷π‘₯ and 𝑀 = 𝔼𝐷 π‘₯ βˆ’ πœ‡ π‘₯ βˆ’ πœ‡ 𝑇

choose πœ‰ to be Gaussian with mean πœ‡ and covariance 𝑀

matrix 𝑀 p.s.d. because 𝑣𝑇𝑀𝑣 = 𝔼𝐷 𝑣𝑇π‘₯ 2 β‰₯ 0 for all 𝑣 ∈ ℝ𝑛

consequence: π”Όπ·π‘ž = 𝔼 πœ‰ π‘ž

for every π‘ž of deg. 2

square of linear form

proof

Page 14: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

claimfor every univariate 𝑝 β‰₯ 0 over ℝ and every 𝑛-variate polynomial π‘žwith deg 𝑝 β‹… deg π‘ž ≀ 𝑑,

𝔼𝐷𝑝 π‘ž π‘₯ β‰₯ 0

enough to show: 𝑝 is sum of squares

choose: minimizer 𝛼 of 𝑝

proof by induction on deg 𝑝

squares sum of squares by ind. hyp.

𝛼

𝑝 𝛼 β‰₯ 0

then: p= 𝑝 𝛼 + π‘₯ βˆ’ 𝛼 2 β‹… 𝑝′ for some polynomial 𝑃′ with deg 𝑝′ < deg 𝑝

ℝ

pseudo-distr.’s satisfy (compositions of) low-deg. univariate properties

useful class of non-local higher-deg. inequalities

𝑝

Page 15: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

MAX CUT

given: deg.-𝑑 pseudo-distr. 𝐷 over Β±1 𝑛, satisfies 𝐿𝐺 = 1 βˆ’ νœ€

𝐿𝐺 =1

𝐸 𝐺 π‘–π‘—βˆˆπΈ 𝐺

1

4π‘₯𝑖 βˆ’ π‘₯𝑗

2

goal: find 𝑦 ∈ Β±1 𝑛 with 𝐿𝐺 𝑦 β‰₯ 1 βˆ’ 𝑂 νœ€

algorithm

sample from Gaussian distr. πœ‰ over ℝ𝑛 with 𝔼 πœ‰πœ‰π‘‡ = 𝔼𝐷 π‘₯π‘₯𝑇

output 𝑦 = sgn πœ‰

analysis

claim: ℙ𝐷 π‘₯𝑖 β‰  π‘₯𝑗 = 1 βˆ’ πœ‚ β‡’ β„™ 𝑦𝑖 β‰  𝑦𝑗 β‰₯ 1 βˆ’ 𝑂 πœ‚

proof: πœ‰π‘– , πœ‰π‘— satisfies βˆ’π”Ό πœ‰π‘–πœ‰π‘— = βˆ’ 𝔼𝐷π‘₯𝑖π‘₯𝑗 = 1 βˆ’ 𝑂 πœ‚ and π”Όπœ‰π‘–2 = π”Όπœ‰π‘—

2 = 1

(tedious calculation) β„™ sgn πœ‰π‘– β‰  sgn πœ‰π‘— β‰₯ 1 βˆ’ 𝑂 πœ‚

[Goeman-Williamson]

Page 16: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

low global correlation in (pseudo-)distributions

claimβˆ€π‘Ÿ. βˆƒ deg.- 𝑑 βˆ’ 2π‘Ÿ pseudo-distribution 𝐷′, obtained by conditioning 𝐷,

Avg𝑖,π‘—βˆˆ 𝑛 𝐼𝐷′ π‘₯𝑖 , π‘₯𝑗 ≀ 1/π‘Ÿ

[Barak-Raghavendra-S.,Raghavendra-Tan]

proofpotential Avgπ‘–βˆˆ 𝑛 𝐻 π‘₯𝑖 ; greedily condition on variables to maximize

potential decrease until global correlation is low

mutual information: 𝐼 π‘₯, 𝑦 = 𝐻 π‘₯ βˆ’ 𝐻 π‘₯ 𝑦

potential decrease β‰₯ Avgπ‘–βˆˆ 𝑛 𝐻 π‘₯𝑖 βˆ’ Avgπ‘—βˆˆ 𝑛 Avgπ‘–βˆˆ 𝑛 𝐻 π‘₯𝑖 ∣ π‘₯𝑗= Avg𝑖,π‘—βˆˆ 𝑛 𝐼𝐷′ π‘₯𝑖 , π‘₯𝑗

how often do we need to condition?

only need to condition ≀ π‘Ÿ times

Page 17: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

MAX BISECTION

given: deg.-𝑑 pseudo-distr. 𝐷 over Β±1 𝑛, satisfies 𝐿𝐺 = 1 βˆ’ νœ€, 𝑖 π‘₯𝑖 = 0

goal: find 𝑦 ∈ Β±1 𝑛 with 𝐿𝐺 𝑦 β‰₯ 1 βˆ’ 𝑂 νœ€ and 𝑖 𝑦𝑖 = 0

𝑑 = 1/νœ€π‘‚ 1

algorithm

let 𝐷′ be conditioning of 𝐷 with global correlation ≀ νœ€π‘‚ 1

sample Gaussian πœ‰ with same deg.-2 moments as 𝐷′output 𝑦 with 𝑦𝑖 = sgn(πœ‰π‘– βˆ’ 𝑑𝑖) (choose 𝑑𝑖 ∈ ℝ so that 𝔼 𝑦𝑖 = 𝔼𝐷π‘₯𝑖)

analysis

almost as before:ℙ𝐷′ π‘₯𝑖 β‰  π‘₯𝑗 β‰₯ 1 βˆ’ πœ‚ β‡’ β„™ 𝑦𝑖 β‰  𝑦𝑗 β‰₯ 1 βˆ’ 𝑂 πœ‚

(𝑑𝑖 = 0 is worst case same analysis as MAX CUT)

new: 𝐼 π‘₯𝑖 , π‘₯𝑗 ≀ νœ€π‘‚ 1 β‡’ 𝔼𝑦𝑖𝑦𝑗 = 𝔼 π‘₯𝑖π‘₯𝑗 Β± νœ€π‘‚(1)

𝔼 𝑖 𝑦𝑖 ≀ 𝔼 𝑖 𝑦𝑖2 1/2 = 𝔼 𝑖 π‘₯𝑖

2 1/2+ νœ€π‘‚ 1 β‹… 𝑛 = νœ€π‘‚ 1 β‹… 𝑛

get bisection 𝑦′ from 𝑦 by correcting νœ€π‘‚(1) fraction of vertices

[Raghavendra-Tan]

𝔼 𝑖 π‘₯𝑖2 = 0

Page 18: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

Cargese Workshop, 2014

SUM-OF-SQUARES method and approximation algorithms II

David SteurerCornell

Page 19: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

sparse vector

given: linear subspace π‘ˆ βŠ† ℝ𝑛 (represented by some basis), parameter π‘˜ ∈ 𝑛

promise: βˆƒπ‘£0 ∈ π‘ˆ such that 𝑣0 is π‘˜-sparse (and 𝑣0 ∈ 0,Β±1 𝑛)goal: find π‘˜-sparse vector 𝑣 ∈ π‘ˆ

efficient approximation algorithm for π‘˜ = Ξ© 𝑛 would be major step toward refuting Khot’s Unique Games Conjecture and improved guarantees for MAX CUT, VERTEX COVER, …

planted / average-case version (benchmark for unsupervised learning tasks)

subspace π‘ˆ spanned by 𝑑 βˆ’ 1 random vectors and some π‘˜-sparse vector 𝑣0

previous best algorithms only work for very sparse vectors π‘˜

𝑛≀ 1/ 𝑑

[Spielman-Wang-Wright, Demanet-Hand]

here: deg.-4 pseudo-distributions work for π‘˜

𝑛= Ξ© 𝑛 up to 𝑑 ≀ 𝑂 𝑛

[Barak-Kelner-S.]

Page 20: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

limitations of β„“βˆž/β„“1 (previous best algorithm; exact via linear programming)

limitations of std. SDP relaxation for β„“2/β„“1 (best proxy for sparsity)

analytical proxy for sparsity

if vector 𝑣 is π‘˜-sparse then 𝑣 ∞

𝑣 1β‰₯

1

π‘˜, 𝑣 2

2

𝑣 12 β‰₯

1

π‘˜, and

𝑣 44

𝑣 24 β‰₯

1

π‘˜

(tight if 𝑣 ∈ 0,Β±1 𝑛)

𝑣 = sum of 𝑑 random Β±1 vectors with same first coordinate

β€– ‖𝑣 ∞ β‰₯ 𝑑 , β€– ‖𝑣 1 ≀ 𝑑 + 𝑛 𝑑 ratio β‰ˆπ‘‘

𝑛

β„“βˆž/β„“1 algorithm fails for π‘˜

𝑛β‰₯

1

𝑑

β€œideal object”: distribution 𝐷 over β„“2 unit sphere of subspace π‘ˆβ„“1-constraint: 𝔼𝐷 𝑣 1

2 ≀ π‘˜

tractable relaxation: 𝑖,𝑗 𝔼𝐷𝑣𝑖𝑣𝑗 ≀ π‘˜

not a low-deg. polynomial in 𝑣 unclear how to represent(also NP-hard in worst-case)

[d'Aspremont-El Ghaoui-Jordan-Lanckriet]

but: for uniform distr. 𝐷 over β„“2 sphere of 𝑑-dim. rand. subspace

𝑖,𝑗 𝔼𝐷𝑣𝑖𝑣𝑗 β‰ˆπ‘›

𝑑 same limitation as β„“βˆž/β„“1

Page 21: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

deg.-𝑑 pseudo-distr. 𝐷: 𝑣 ∈ π‘ˆ; 𝑣 2 = 1 β†’ ℝ over unit β„“2-sphere of π‘ˆ

degree-𝒅 SOS relaxation for β„“4/β„“2

pseudo-distribution satisfies 𝑣 44 = 1/π‘˜

notation: 𝔼𝐷𝑓 ≔ π‘£βˆˆπ‘ˆ;𝑣 =1

𝐷 β‹… 𝑓 (only consider polynomials easy to integrate)

normalization: 𝔼𝐷1 = 1

non-negativity: π”Όπ·β„Ž(𝑣)2 β‰₯ 0 for every β„Ž of deg. ≀ 𝑑/2

orthogonality: 𝔼𝐷 𝑣 44 βˆ’

1

π‘˜β‹… 𝑔(𝑣) = 0 for every 𝑔 of deg. ≀ 𝑑 βˆ’ 4

Page 22: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

set of deg.-𝑑 pseudo-distributions

= convex set with 𝑛𝑂 𝑑 -time separation oracle

separation problemgiven: function 𝐷 (represented as deg.-𝑑 polynomial)

check: quadratic form 𝑓 ↦ 𝔼𝐷𝑓2 is p.s.d. or output

violated constraint 𝔼𝐷𝑓2 < 0

how to find pseudo-distributions?

deg.-𝑑 pseudo-distr. 𝐷: 𝑣 ∈ π‘ˆ; 𝑣 2 = 1 β†’ ℝ over unit β„“2-sphere of π‘ˆ

degree-𝒅 SOS relaxation for β„“4/β„“2

pseudo-distribution satisfies 𝑣 44 = 1/π‘˜

notation: 𝔼𝐷𝑓 ≔ π‘£βˆˆπ‘ˆ;𝑣 =1

𝐷 β‹… 𝑓 (only consider polynomials easy to integrate)

normalization: 𝔼𝐷1 = 1

non-negativity: π”Όπ·β„Ž(𝑣)2 β‰₯ 0 for every β„Ž of deg. ≀ 𝑑/2

orthogonality: 𝔼𝐷 𝑣 44 βˆ’

1

π‘˜β‹… 𝑔(𝑣) = 0 for every 𝑔 of deg. ≀ 𝑑 βˆ’ 4

Page 23: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

rule of thumb: set of deg.-𝑑 pseudo-moments 𝔼𝐷𝑓 ∣ deg 𝑓 ≀ 𝑑 difficult*

to distinguish / separate from deg.-𝑑 moments of actual distr. of solutions

(* unless you invest 𝑛Ω 𝑑 time to distinguish)

also: values 𝔼𝐷𝑓 ∣ deg 𝑓 > 𝑑 do not carry additional information

no need to look at them

how to use pseudo-distributions?

deg.-𝑑 pseudo-distr. 𝐷: 𝑣 ∈ π‘ˆ; 𝑣 2 = 1 β†’ ℝ over unit β„“2-sphere of π‘ˆ

degree-𝒅 SOS relaxation for β„“4/β„“2

pseudo-distribution satisfies 𝑣 44 = 1/π‘˜

notation: 𝔼𝐷𝑓 ≔ π‘£βˆˆπ‘ˆ;𝑣 =1

𝐷 β‹… 𝑓 (only consider polynomials easy to integrate)

normalization: 𝔼𝐷1 = 1

non-negativity: π”Όπ·β„Ž(𝑣)2 β‰₯ 0 for every β„Ž of deg. ≀ 𝑑/2

orthogonality: 𝔼𝐷 𝑣 44 βˆ’

1

π‘˜β‹… 𝑔(𝑣) = 0 for every 𝑔 of deg. ≀ 𝑑 βˆ’ 4

Page 24: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

dual view (SOS certificates)

𝑣 44 βˆ’

1

π‘˜β‹… 𝑔 + 𝑗 β„Žπ‘—

2 = βˆ’1 over 𝑣 ∈ π‘ˆ; 𝑣 2 = 1

for some 𝑔 of deg. ≀ 𝑑 βˆ’ 4 and {β„Žπ‘—} of deg. ≀ 𝑑/2

⇔ no deg.-𝑑 pseudo-distr. exists ( no solution exists)

for approximation algorithms: need pseudo-distr. to extract approx. solution(hard to exploit non-existence of SOS certificate directly)

deg.-𝑑 pseudo-distr. 𝐷: 𝑣 ∈ π‘ˆ; 𝑣 2 = 1 β†’ ℝ over unit β„“2-sphere of π‘ˆ

degree-𝒅 SOS relaxation for β„“4/β„“2

pseudo-distribution satisfies 𝑣 44 = 1/π‘˜

notation: 𝔼𝐷𝑓 ≔ π‘£βˆˆπ‘ˆ;𝑣 =1

𝐷 β‹… 𝑓 (only consider polynomials easy to integrate)

normalization: 𝔼𝐷1 = 1

non-negativity: π”Όπ·β„Ž(𝑣)2 β‰₯ 0 for every β„Ž of deg. ≀ 𝑑/2

orthogonality: 𝔼𝐷 𝑣 44 βˆ’

1

π‘˜β‹… 𝑔(𝑣) = 0 for every 𝑔 of deg. ≀ 𝑑 βˆ’ 4

Page 25: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

Cauchy–Schwarz inequality

HΓΆlder’s inequality

β„“4-triangle inequality

𝔼𝐷 𝑒, 𝑣 ≀ 𝔼𝐷 𝑒 2 1/2 𝔼𝐷 𝑣 2 1/2

let 𝐷 = 𝑒, 𝑣 be a deg.-4 pseudo-distribution over ℝ𝑛 Γ— ℝ𝑛

𝔼𝐷 𝑖 𝑒𝑖3 β‹… 𝑣𝑖 ≀ 𝔼𝐷 𝑒 4

4 3/4 𝔼𝐷 𝑣 44 1/4

𝔼𝐷 𝑒 + 𝑣 44 ≀ 𝔼𝐷 𝑒 4

4 1/4+ 𝔼𝐷 𝑣 4

4 1/4

following inequalities hold as expected (same as for distributions)

general properties of pseudo-distributions

Page 26: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

claimlet π‘ˆβ€² βŠ† ℝ𝑛 be a random 𝑑-dim. subspace with 𝑑 β‰ͺ 𝑛let 𝑃′ be the orthogonal projector into π‘ˆβ€²

then w.h.p, 𝑃′𝑣 44 =

𝑂 1

𝑛𝑣 2

4 βˆ’ 𝑗 β„Žπ‘— 𝑣 2 over 𝑣 ∈ ℝ𝑛 for β„Žπ‘—β€™s of deg. 4

[Barak-Brandao-Harrow-Kelner-S.-Zhou]

proof sketch

(SOS certificate for classical inequality 𝑃′𝑣 44 ≀

𝑂 1

𝑛𝑣 2

4)

basis change: let π‘₯ = 𝐡𝑇𝑣 where 𝐡’s columns are orthonormal basis of π‘ˆ

(so that 𝑃′ = 𝐡𝐡𝑇) 𝑃′𝑣 44 =

1

𝑛2 𝑖 𝑏𝑖 , π‘₯

4 with 𝑏1, … , 𝑏𝑛 close to i.i.d.

standard Gaussian vectors (so that 𝔼𝑏 𝑏, π‘₯ 2 = π‘₯ 22 and 𝔼𝑏 𝑏, π‘₯ 4 = 3 β‹…

π‘₯ 24)

enough to show:1

𝑛 𝑖=1𝑛 𝑏𝑖 , π‘₯

4 = 𝑂 1 β‹… 𝔼𝑏 𝑏, π‘₯ 4 βˆ’ 𝑗 β„Žπ‘—β€² π‘₯ 2

reduce to deg. 2: 1

𝑛 𝑖=1𝑛 𝑏𝑖

βŠ—2, 𝑦 2 ≀ 𝑂 1 β‹… 𝔼𝑏 π‘βŠ—2, 𝑦 2 (𝑦 = π‘₯βŠ—2)

use concentration inequalities for quadratic forms (aka matrices)

Page 27: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

given: some basis of subspace π‘ˆ = span π‘ˆβ€² βˆͺ 𝑣0 βŠ† ℝ𝑛,where π‘ˆβ€² βŠ† ℝ𝑛 random 𝑑-dim. subspace,

and 𝑣0 ∈ ℝ𝑛 with 𝑣0 βŠ₯ π‘ˆβ€², 𝑣0 44 =

1

π‘˜, and 𝑣0 2

4 = 1 (e.g., π‘˜-sparse)

approximation algorithm for planted sparse vector

compute deg.-4 pseudo-distr. 𝐷 = {𝑣} over unit ball of π‘ˆ satisfying 𝑣 44 =

1

π‘˜

goal: find unit vector 𝑀 with 𝑀, 𝑣02 β‰₯ 1 βˆ’ 𝑂 π‘˜/𝑛 1/4

algorithm

sample Gaussian distr. 𝑀 with 𝔼 𝑀𝑀𝑇 = 𝔼𝐷𝑣𝑣𝑇 and renormalize

analysis

claim: 𝔼𝐷 𝑣, 𝑣02 β‰₯ 1 βˆ’ 𝑂 π‘˜/𝑛 1/4 ( Gaussian 𝑀 almost 1-dim.)

Page 28: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

analysis

claim: 𝔼𝐷 𝑣, 𝑣02 β‰₯ 1 βˆ’ 𝑂 π‘˜/𝑛 1/4 ( Gaussian 𝑀 almost 1-dim.)

1

π‘˜1/4= 𝔼𝐷 𝑣 4

4 1/4(𝐷 satisfies β€– ‖𝑣 4

4 = 1 π‘˜ )

= 𝔼𝐷 𝑣, 𝑣0 𝑣0 + 𝑃′𝑣 44 1/4

(same function)

≀ 𝔼𝐷 𝑣, 𝑣0 𝑣0 44 1 4

+ 𝔼𝐷 𝑃′𝑣 44 1 4

(β„“4-triangle inequ.)

≀1

π‘˜ 1 4 β‹… 𝔼𝐷 𝑣, 𝑣04 1 4

+𝑂 1

𝑛1/4(SOS cert. for π‘ˆβ€²)

𝔼𝐷 𝑣, 𝑣04 β‰₯ 1 βˆ’ 𝑂 π‘˜/𝑛 1/4

𝔼𝐷 𝑣, 𝑣02 β‰₯ 1 βˆ’ 𝑂 π‘˜/𝑛 1/4 (because 𝑣, 𝑣0

4 = 1 βˆ’ 𝑃′𝑣 22 𝑣, 𝑣0

2)

given: some basis of subspace π‘ˆ = span π‘ˆβ€² βˆͺ 𝑣0 βŠ† ℝ𝑛,where π‘ˆβ€² βŠ† ℝ𝑛 random 𝑑-dim. subspace,

and 𝑣0 ∈ ℝ𝑛 with 𝑣0 βŠ₯ π‘ˆβ€², 𝑣0 44 =

1

π‘˜, and 𝑣0 2

4 = 1 (e.g., π‘˜-sparse)

approximation algorithm for planted sparse vector

goal: find unit vector 𝑀 with 𝑀, 𝑣02 β‰₯ 1 βˆ’ 𝑂 π‘˜/𝑛 1/4

Page 29: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

Cauchy–Schwarz inequality

HΓΆlder’s inequality

β„“4-triangle inequality

𝔼𝐷 𝑒, 𝑣 ≀ 𝔼𝐷 𝑒 2 1/2 𝔼𝐷 𝑣 2 1/2

let 𝐷 = 𝑒, 𝑣 be a deg.-4 pseudo-distribution over ℝ𝑛 Γ— ℝ𝑛

𝔼𝐷 𝑖 𝑒𝑖3 β‹… 𝑣𝑖 ≀ 𝔼𝐷 𝑒 4

4 3/4 𝔼𝐷 𝑣 44 1/4

𝔼𝐷 𝑒 + 𝑣 44 ≀ 𝔼𝐷 𝑒 4

4 1/4+ 𝔼𝐷 𝑣 4

4 1/4

following inequalities hold as expected (same as for distributions)

general properties of pseudo-distributions

Page 30: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

products of pseudo-distributions

claimsuppose 𝐷,𝐷′: Ξ© β†’ ℝ is deg.-𝑑 pseudo-distr. over Ξ©then, π·βŠ—π·β€²: Ξ© Γ— Ξ© β†’ ℝ is deg.-𝑑 pseudo-distr. over Ξ© Γ— Ξ©

proof

tensor products of positive semidefinite matrices are positive semidefinite

Page 31: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

Cauchy–Schwarz inequality

𝔼𝐷 𝑒, 𝑣 ≀ 𝔼𝐷 𝑒 22 1/2 𝔼𝐷 𝑣 2

2 1/2

let 𝐷 = 𝑒, 𝑣 be a deg.-2 pseudo-distribution over ℝ𝑛 Γ— ℝ𝑛

𝔼𝐷 𝑒, 𝑣2

= π”Όπ·βŠ—π· 𝑒, 𝑣 𝑒′, 𝑣′ (𝐷 βŠ—π·β€² is product pseudo-distr.)

= π”Όπ·βŠ—π· 𝑖𝑗 𝑒𝑖𝑣𝑖𝑒𝑗′𝑣𝑗

β€²

≀1

2 π”Όπ·βŠ—π· 𝑖𝑗 𝑒𝑖

2 𝑣𝑗′ 2

+ 𝑖𝑗 𝑒𝑗′ 2

𝑣𝑖2 (2π‘Žπ‘ = π‘Ž2 + 𝑏2 βˆ’ π‘Ž βˆ’ 𝑏 2)

=1

2 π”Όπ·βŠ—π· 𝑒 2

2 𝑣′ 22 + 𝑒′ 2

2 𝑣 22

= 𝔼𝐷 𝑒 22 β‹… 𝔼𝐷 𝑣 2

2 (𝐷 βŠ—π·β€² is product pseudo-distr.)

proof

Page 32: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

let 𝐷 = 𝑒, 𝑣 be a deg.-4 pseudo-distribution over ℝ𝑛 Γ— ℝ𝑛

proof

HΓΆlder’s inequality

𝔼𝐷 𝑖 𝑒𝑖3 β‹… 𝑣𝑖 ≀ 𝔼𝐷 𝑒 4

4 3/4 𝔼𝐷 𝑣 44 1/4

𝔼𝐷 𝑖 𝑒𝑖3 β‹… 𝑣𝑖

≀ 𝔼𝐷 𝑖 𝑒𝑖4 1 2

β‹… 𝔼𝐷 𝑖 𝑒𝑖2 β‹… 𝑣𝑖

2 1/2(Cauchy-Schwarz)

≀ 𝔼𝐷 𝑖 𝑒𝑖4 1 2

β‹… 𝔼𝐷 𝑖 𝑒𝑖4 β‹… 𝔼𝐷 𝑖 𝑣𝑖

4 1/4(Cauchy-Schwarz)

we also used:{𝑒, 𝑣} deg-4 pseudo-distr. 𝑒 βŠ— 𝑒, 𝑒 βŠ— 𝑣 deg.-2 pseudo-distr.(every deg.-2 poly. in 𝑒 βŠ— 𝑒, 𝑒 βŠ— 𝑣 is deg.-4 poly. in 𝑒, 𝑣 )

Page 33: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

let 𝐷 = 𝑒, 𝑣 be a deg.-4 pseudo-distribution over ℝ𝑛 Γ— ℝ𝑛

β„“4-triangle inequality

𝔼𝐷 𝑒 + 𝑣 44 1/4

≀ 𝔼𝐷 𝑒 44 1/4

+ 𝔼𝐷 𝑣 44 1/4

proof

expand 𝑒 + 𝑣 44 in terms of 𝑖 𝑒𝑖

4, 𝑖 𝑒𝑖3𝑣𝑖 , 𝑖 𝑒𝑖

2𝑣𝑖2, 𝑖 𝑒𝑖𝑣𝑖

3, 𝑖 𝑣𝑖4

bound pseudo-expect. of β€œmixed terms” using Cauchy-Schwarz / HΓΆlder

check that total is equal to right-hand side

Page 34: method and approximation algorithms I - d SteurerΒ Β· for approximation algorithms: need pseudo-distr. to extract approx. solution (hard to exploit non-existence of SOS certificate

tensor decomposition

given: tensor 𝑇 β‰ˆ 𝑖 π‘Žπ‘–βŠ—4

(in spectral norm) for nice π‘Ž1, … , π‘Žπ‘š ∈ ℝ𝑛

goal: find set of vectors 𝐡 β‰ˆ Β±π‘Ž1, … , Β±π‘Žπ‘š

for simplicity: orthonormal and π‘š = 𝑛

approach

show β€œuniqueness”: 𝑖 π‘Žπ‘–βŠ—4

β‰ˆ 𝑖 π‘π‘–βŠ—4

β‡’ Β±π‘Ž1, … , Β±π‘Žπ‘š β‰ˆ ±𝑏1, … , Β±π‘π‘š

show that uniqueness proof translates to SOS certificate

any pseudo-distribution over decomposition is β€œconcentrated” on unique decomposition Β±π‘Ž1, … , Β±π‘Žπ‘š

recover decomposition by reweighing pseudo-distribution by log 𝑛degree polynomial (approximation to 𝛿 function)