
spectrahedral lifts of combinatorial polytopes

James R. Lee (University of Washington)
Prasad Raghavendra (UC Berkeley)
David Steurer (Cornell)

combinatorial optimization

Traveling Salesman Problem: Given n cities {1, 2, …, n} and costs c_ij ≥ 0 for traveling between cities i and j, find the permutation π of {1, 2, …, n} that minimizes

c_{π(1)π(2)} + c_{π(2)π(3)} + ⋯ + c_{π(n)π(1)}

Gian Carlo Rota, 1969


TSP_n = conv{ 1_τ ∈ {0,1}^{n²} : τ is a tour } ⊆ ℝ^{n²}

Can find an optimal tour by minimizing a linear function over TSP_n:  min{ ⟨c, x⟩ : x ∈ TSP_n }

Problem: TSP𝑛 has exponentially (in 𝑛) many facets.

One can tell the same (short) story for many polytopes associated to NP-complete problems.
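As a toy illustration of this short story, here is a minimal Python sketch (with a made-up cost matrix, not an instance from the talk) of the optimization min{ ⟨c, x⟩ : x ∈ TSP_n }, done the only way the exponential facet count seems to force: brute force over the vertices of TSP_n, i.e., the tour indicator vectors.

```python
# A minimal sketch (hypothetical instance): minimizing a linear cost over
# TSP_n by brute force over its exponentially many vertices (the tours).
# A linear function over a polytope attains its minimum at a vertex,
# so this is exactly min{<c, x> : x in TSP_n}.
from itertools import permutations

n = 5
# toy symmetric costs c[i][j] >= 0 (assumption: points on a line, cost = distance)
c = [[abs(i - j) for j in range(n)] for i in range(n)]

def tour_cost(pi):
    # cost c_{pi(1)pi(2)} + ... + c_{pi(n)pi(1)} of the tour given by pi
    return sum(c[pi[k]][pi[(k + 1) % n]] for k in range(n))

best = min(tour_cost(pi) for pi in permutations(range(n)))
print(best)
```

For points on a line the optimal tour just sweeps out and back, so the enumeration is only a sanity check; the point is that without a small lift, no smaller certificate is apparent.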

polyhedral lifts

Minimum Spanning Tree:Given 𝑛 cities {1,2, … , 𝑛} and costs 𝑐𝑖𝑗 ≥ 0 between cities 𝑖 and 𝑗, find a spanning tree of minimum cost.

ST_n = conv{ 1_τ ∈ {0,1}^{n²} : τ is a spanning tree }

Again, has exponentially (in 𝑛) many facets.

There is a lift of ST_n in n³ dimensions with only O(n³) facets.

A lift Q of P ⊆ ℝ^d is a polytope Q ⊆ ℝ^N for N ≥ d such that Q linearly projects to P. If we can optimize over Q, then we can optimize over P: for the projection π,

max{ ⟨c, x⟩ : x ∈ P } = max{ ⟨c, π(y)⟩ : y ∈ Q }

example: the permutahedron

Permutahedron: Π_n ⊆ ℝ^n

Convex hull of vectors (i_1, i_2, …, i_n) such that {i_1, i_2, …, i_n} = {1, 2, …, n}.

Number of facets is 2^n − 2.

Let B_n ⊆ ℝ^{n²} be the convex hull of permutation matrices (the Birkhoff polytope).

The map A ↦ (1, 2, …, n)A is a linear projection from B_n onto Π_n (i.e., B_n is a lift of Π_n).

Birkhoff-von Neumann: B_n is the set of matrices A = (a_ij) satisfying a_ij ≥ 0 for all i, j with all row and column sums equal to one. Thus B_n has n² facets.

[Goemans 2013]: Smallest lift of Π_n has Θ(n log n) facets.
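To make the lift concrete, the following sketch (with an arbitrary objective vector of my choosing) checks numerically that maximizing a linear function over the permutahedron's vertices agrees with maximizing it through the lift, i.e., over the projections (1, 2, …, n)A of permutation matrices A.

```python
# A small sketch of the permutahedron lift: the convex hull of permutation
# matrices projects linearly onto the permutahedron via A -> (1,2,...,n)A,
# so maximizing <c, x> over either polytope gives the same value.
from itertools import permutations

n = 4
c = [3, 1, 4, 2]  # an arbitrary toy objective (assumption: any c works)

def perm_matrix(pi):
    # permutation matrix P with P[i][j] = 1 iff pi(i) = j
    return [[1 if pi[i] == j else 0 for j in range(n)] for i in range(n)]

def project(P):
    # the linear map A -> (1,...,n) A applied to a matrix P
    return [sum((i + 1) * P[i][j] for i in range(n)) for j in range(n)]

# directly over the permutahedron's vertices: permutations of (1,...,n)
direct = max(sum(ci * vi for ci, vi in zip(c, v))
             for v in permutations(range(1, n + 1)))
# through the lift: project each permutation matrix, then evaluate <c, .>
lifted = max(sum(ci * vi for ci, vi in zip(c, project(perm_matrix(pi))))
             for pi in permutations(range(n)))
print(direct, lifted)
```

The two maxima coincide because the projection maps vertices onto vertices; the gain, per Birkhoff-von Neumann, is that the lift needs only n² inequalities.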

a general model of (small) linear programs

Powerful model of computation:For a polytope 𝑃 ⊆ ℝ𝑑 , what is the minimal # of facets in a lift of 𝑃?

Even more powerful when we allow approximation:

Indication of power:Integrality gaps for LPs often lead to NP-hardness of approximation.

Polynomial-size LPs for NP-hard problems would show that NP ⊆ P/poly. [Rothvoss 2013]

an analog for semi-definite programs

Semidefinite programs (aka spectrahedral lifts):

Let 𝒮_+^k denote the cone of k × k symmetric, positive semidefinite matrices.

A spectrahedron is the intersection 𝒮_+^k ∩ ℒ for some affine subspace ℒ.

This is precisely the feasible region of an SDP.

Definition: A polytope P admits a PSD lift of size k if P is a linear projection of a spectrahedron 𝒮_+^k ∩ ℒ.

• Easy to see that the minimal size of a PSD lift is ≤ the minimal size of a polyhedral lift.

• Assuming the Unique Games Conjecture, integrality gaps for SDPs translate directly into NP-hardness of approximation results.

[Khot-Kindler-Mossel-O’Donnell 2004, Austrin 2007, Raghavendra 2008]

a brief history of extended formulations

1980s: Swart tries to prove that 𝑃 = 𝑁𝑃 by giving a linear program for TSP.

1989: Yannakakis (the referee for Swart’s submissions) shows that every symmetric LP for TSP must have exponential size.

. . .

2012: Fiorini, Massar, Pokutta, Tiwary, de Wolf show that every LP for TSP must have exponential size.

2013: Chan, L, Raghavendra, Steurer show that no polynomial-size LP can approximate the MAX-CUT problem within a factor better than 2. [Goemans-Williamson 1998: SDPs can do factor ≈ 1.139.]

2014: Rothvoss shows that every LP for the matching polytope must have exponential size.

????: Spectrahedral lifts? [Briët-Dadush-Pokutta 2014]: Random {0,1}-polytopes do not have PSD lifts of size e^{o(n)}.

our results [L-Raghavendra-Steurer]

Lower bounds on PSD lift size

The TSP_n, CUT_n, and STAB(G_n) polytopes do not admit PSD lifts of size c^{n^{2/11}}

(for some constant 𝑐 > 1 and some family {𝐺𝑛} of 𝑛-vertex graphs)

Lower bounds on PSD rank via sums-of-squares / Lasserre hierarchy

Approximation hardness for constraint satisfaction problems

For instance, no family of polynomial-size SDP relaxations can achieve better than a 7/8-approximation for MAX 3-SAT.

slack matrices & psd factorizations

Consider a non-negative matrix M ∈ ℝ_+^{m×n}.

rank(M) = min{ r : M_ij = ⟨u_i, v_j⟩, {u_i}, {v_j} ⊆ ℝ^r }

rank_+(M) = min{ r : M_ij = ⟨u_i, v_j⟩, {u_i}, {v_j} ⊆ ℝ_+^r }

rank_psd(M) = min{ r : M_ij = ⟨U_i, V_j⟩, {U_i}, {V_j} ⊆ 𝒮_+^r }

where ⟨U, V⟩ = Tr(Uᵀ V) for U, V ∈ 𝒮_+^r.

Suppose P ⊆ ℝ^d is a polytope defined by

P = { x : ⟨a_i, x⟩ ≤ b_i, i = 1, …, m },  and  P = conv{ x_1, …, x_n }.

The corresponding slack matrix M ∈ ℝ^{m×n} is given by M_ij = b_i − ⟨a_i, x_j⟩.
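A minimal worked example of the slack-matrix definition, using the unit square [0,1]² (my choice of polytope, not one from the talk): each entry records how slack vertex x_j leaves in facet inequality ⟨a_i, x⟩ ≤ b_i.

```python
# Slack matrix of the unit square [0,1]^2: facets <a_i, x> <= b_i,
# vertices x_j, entries M_ij = b_i - <a_i, x_j>, which are always >= 0.
facets = [((1, 0), 1), ((-1, 0), 0), ((0, 1), 1), ((0, -1), 0)]   # (a_i, b_i)
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]                        # x_j

M = [[b - sum(ai * xi for ai, xi in zip(a, x)) for x in vertices]
     for (a, b) in facets]
assert all(entry >= 0 for row in M for entry in row)  # slack is nonnegative
for row in M:
    print(row)
```

Note that M_ij = 0 exactly when vertex x_j lies on facet i, so the zero pattern of the slack matrix encodes the face lattice.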

Yannakakis Factorization Theorem[Yannakakis 1989, Fiorini-Massar-Pokutta-Tiwary-de Wolf, Gouveia-Parrilo-Thomas 2011]:

The minimum size of a polyhedral (resp., PSD) lift for P is precisely rank_+(M) (resp., rank_psd(M)) for any slack matrix M of P.

polytopes as a set of “theorems”


Polytope 𝑃 is a collection of valid linear inequalities.

Farkas’ Lemma: Every valid linear inequality is a conic combination of defining inequalities.

Polyhedral lift size: Smallest set of “axioms” that generate all valid inequalities for P. But we are allowed to use auxiliary variables.

For the rest of the talk, restrict attention to the polytope CUT_n ⊆ ℝ^{n²}:

Convex hull of cuts in the complete graph.

valid linear inequalities for CUT𝑛

Let f : ℝ^n → ℝ be a quadratic polynomial such that the restriction f|_{{0,1}^n} is non-negative.

For instance:  f(x) = (x_1 + x_2 + ⋯ + x_n − 1)²,  or

f(x) = (x_1 + x_2 + ⋯ + x_n − n/2)² − 1/4  for n odd.

Then “f(x) ≥ 0 for x ∈ {0,1}^n” is a valid “linear” inequality, and every valid inequality is of this form.

[Can consider this as a linear inequality in 𝑥 ⊗ 𝑥.]

Let 𝒬 be a subspace of L²({0,1}^n), and define

sos(𝒬) = cone{ q² : q ∈ 𝒬 } ⊆ L²({0,1}^n)

QML_+^n = { f : {0,1}^n → ℝ_+ : deg(f) ≤ 2 }

minimal size of PSD lift for CUT_n ≈ min{ dim 𝒬 : sos(𝒬) ⊇ QML_+^n }

minimal size of polyhedral lift for CUT_n = min{ |𝒬| : sos(𝒬) ⊇ QML_+^n }
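Behind the sos(𝒬) viewpoint is the standard Gram-matrix identity: f ∈ sos(𝒬) exactly when f(x) = v(x)ᵀ A v(x) for some PSD matrix A, where v(x) lists a basis of 𝒬. The following toy sketch (with 𝒬 the affine functions in two variables, my choice) exhibits such a certificate for f(x) = (x_1 + x_2 − 1)².

```python
# f is in sos(Q) iff f(x) = v(x)^T A v(x) for a PSD "Gram" matrix A,
# with v(x) a basis of Q.  Here Q = P_1 (affine functions), v(x) = (1, x1, x2),
# f(x) = (x1 + x2 - 1)^2, so A = q q^T for the coefficient vector q of
# x1 + x2 - 1 -- a rank-one matrix, PSD by construction.
from itertools import product

q = [-1, 1, 1]                              # x1 + x2 - 1 in the basis (1, x1, x2)
A = [[qi * qj for qj in q] for qi in q]     # rank-one Gram matrix

def f_via_gram(x1, x2):
    v = [1, x1, x2]
    return sum(A[i][j] * v[i] * v[j] for i in range(3) for j in range(3))

# verify the certificate agrees with f on all of {0,1}^2
for x1, x2 in product((0, 1), repeat=2):
    assert f_via_gram(x1, x2) == (x1 + x2 - 1) ** 2
print("f = (x1 + x2 - 1)^2 lies in sos(P_1)")
```

In general, finding such an A is a feasibility SDP of size dim 𝒬, which is exactly why dim 𝒬 measures PSD lift size.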

sums of (low-degree) squares

For f : {0,1}^n → ℝ_+, define

deg_sos(f) = min{ d : f ∈ sos(𝒫_d) }

where 𝒫_d is the space of degree ≤ d multi-linear polynomials on {0,1}^n.

Easy: degsos 𝑓 ≤ 𝑛. (Trivial case of the Krivine-Stengle Positivstellensatz.)

Fix a function g : {0,1}^m → ℝ_+ and a number n ≥ m.

For every subset S ⊆ {1, …, n} with |S| = m, let g_S(x) = g(x|_S).

Let ℱ(g, n) = { g_S : {0,1}^n → ℝ_+ : |S| = m }.

Theorem: For any g : {0,1}^m → ℝ_+ and n ≥ 2m,

1 + n^d ≥ min{ dim 𝒬 : ℱ(g, n) ⊆ sos(𝒬) } ≥ C (n / log n)^{(d−1)/2}

where d = deg_sos(g) and C = C(g) > 0.

a “hard” quadratic function

Theorem [Grigoriev 2001]: For every odd integer m ≥ 1, the function g : {0,1}^m → ℝ_+ given by

g(x) = (x_1 + ⋯ + x_m − m/2)² − 1/4

has deg_sos(g) ≥ (m+1)/2.
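Grigoriev's theorem is about sos degree, but the nonnegativity of g itself is elementary: for odd m the bit-sum differs from m/2 by at least 1/2. This quick exhaustive check (m = 5, exact rational arithmetic, my choice of parameters) confirms g ≥ 0 on {0,1}^m with minimum value 0.

```python
# Check that g(x) = (x1+...+xm - m/2)^2 - 1/4 is nonnegative on {0,1}^m
# for odd m: the integer bit-sum is at least 1/2 away from m/2, so the
# square is at least 1/4, with equality when the sum is (m-1)/2 or (m+1)/2.
from fractions import Fraction
from itertools import product

m = 5
half = Fraction(m, 2)
values = [(sum(x) - half) ** 2 - Fraction(1, 4)
          for x in product((0, 1), repeat=m)]
assert all(v >= 0 for v in values)
print(min(values))
```

So g is a perfectly ordinary nonnegative quadratic; the content of the theorem is that certifying this by sums of squares requires degree growing linearly in m.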

Main Theorem [L-Raghavendra-Steurer 2015]: For any g : {0,1}^m → ℝ_+ and n ≥ 2m,

1 + n^d ≥ min{ dim 𝒬 : ℱ(g, n) ⊆ sos(𝒬) } ≥ C (n / log n)^{(d−1)/2}

where d = deg_sos(g) and C = C(g) > 0.

Corollary: The minimal dimension of a PSD lift of CUT_n grows faster than any polynomial.

proof outline

Let 𝒬 ⊆ L²({0,1}^n) be a subspace of dimension k.

Pre-processing: sos(𝒬) ≈ sos(𝒬′) for some “well-conditioned” 𝒬′ with dim 𝒬 = dim 𝒬′.
[John’s theorem (factorization through a cone) via Briët-Dadush-Pokutta 2014]

Approximation: sos(𝒬′) is approximately contained in sos(𝒫_{O(log k)}) (with respect to a certain class of test functionals).

Degree reduction: For a uniformly random subset S ⊆ {1, 2, …, n} with |S| = m, with high probability sos(𝒬)|_S ⊆ sos(𝒫_{log k / log n}).

quantum learning

[Diagram: starting from 𝒬_0 = span(𝟏), repeatedly learn a simple approximator to 𝒬 by restricting to a random subset S ⊆ {1, 2, …, n} with |S| = m, until 𝒬 = 𝒬_T.]

dim 𝒬 small ⇒ 𝒬 has small von Neumann entropy ⇒ learning cannot go on for too long.

quantum learning

Consider a map Q : {0,1}^n → 𝒮_+^k with E_x Tr(Q(x)) = 1.

Want to approximate Q by a simpler map Q̃ (s.t. E_x Tr(Q̃(x)) = 1) with respect to a class 𝒯 of test functionals Λ : {0,1}^n → 𝒮^k:

| E_x Tr(Λ(x)(Q(x) − Q̃(x))) | ≤ ε

Theorem: If deg(Λ) ≤ κ and max_x ‖Λ(x)‖ ≤ ω for all Λ ∈ 𝒯, then there is a function R : {0,1}^n → 𝒮_+^k with E_x Tr(R(x)²) = 1 such that

deg(R) ≤ O( (κω/ε) · S(U_Q ‖ 𝒰) )

| E_x Tr(Λ(x)(Q(x) − R(x)²)) | ≤ ε  for all Λ ∈ 𝒯

where U_Q = E_x [e_x e_xᵀ ⊗ Q(x)],  𝒰 = Id/Tr(Id),  and  S(A ‖ B) = Tr(A(log A − log B)).
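The relative entropy S(A ‖ B) here is a matrix quantity; in the commuting (simultaneously diagonalizable) case it reduces to a classical KL divergence between eigenvalue distributions. The following toy computation (spectrum chosen by me) illustrates this against the maximally mixed state 𝒰.

```python
# Quantum relative entropy S(A||B) = Tr A(log A - log B) in the commuting
# (diagonal) case, where it is the KL divergence of the eigenvalue lists.
# Against the maximally mixed state U = Id/Tr(Id) it equals log(dim) - H(A).
import math

a = [0.5, 0.25, 0.25]          # eigenvalues of a density matrix A (trace one)
u = [1 / 3] * 3                # eigenvalues of the maximally mixed state

S = sum(ai * (math.log(ai) - math.log(ui)) for ai, ui in zip(a, u))
print(S)
```

The identity S(U_Q ‖ 𝒰) ≤ log(dim) is what ties "dim 𝒬 small" to "entropy small" in the learning argument.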

construction of Q̃

Among all maps Q̃ : {0,1}^n → 𝒮_+^k with E_x Tr(Q̃(x)) = 1 satisfying the test constraints | E_x Tr(Λ(x)(Q(x) − Q̃(x))) | ≤ ε for Λ ∈ 𝒯:

minimize S(U_Q̃ ‖ 𝒰) subject to these constraints, where U_Q̃ = E_x [e_x e_xᵀ ⊗ Q̃(x)].

Optimality dictates: Q̃*(x) ∝ exp( Σ_{Λ∈𝒯} c_Λ Λ(x) ).

If one can achieve Σ_Λ |c_Λ| small, then the Taylor series of the exponential can be truncated at a low degree.
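The truncation step can be seen already in the scalar case: when the total coefficient mass is small, a low-degree Taylor truncation of the exponential is already accurate. A sketch with an explicit remainder bound (the standard Lagrange estimate, not a bound from the talk):

```python
# Truncate exp(t) at degree d and compare to the true value.  For |t| <= 1/2
# (think: sum of |c_Lambda| small), the Lagrange remainder gives
# |exp(t) - T_d(t)| <= |t|^{d+1}/(d+1)! * e^{|t|}, tiny even for small d.
import math

def exp_truncated(t, d):
    # degree-d Taylor polynomial of exp at 0
    return sum(t ** k / math.factorial(k) for k in range(d + 1))

t, d = 0.5, 4
err = abs(math.exp(t) - exp_truncated(t, d))
bound = abs(t) ** (d + 1) / math.factorial(d + 1) * math.exp(abs(t))
assert err <= bound
print(err, bound)
```

For matrices the same estimate applies in operator norm, which is why small Σ_Λ |c_Λ| lets Q̃* be replaced by a low-degree polynomial map.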

future directions

Work modulo ideals other than { X_i² − X_i : i = 1, …, n }:
• Perfect matchings
• Cliques in G(n, p)

Sums of squares of sparse polynomials: Let 𝒱_s be the subset of multi-linear polynomials on {0,1}^n that have at most s non-zero monomials. Can one find an explicit f : {0,1}^n → ℝ_+ such that f ∉ sos(𝒱_s)?