expander codes and pseudorandom subspaces of R^N
expander codes and pseudorandom subspaces of R^N
James R. Lee, University of Washington
[joint with Venkatesan Guruswami (Washington) and Alexander Razborov (IAS/Steklov)]
random sections of the cross polytope
Classical high-dimensional geometry [Kashin 77, Figiel-Lindenstrauss-Milman 77]:
For a random subspace X ⊆ R^N with dim(X) = N/2 (e.g., choose X = span{v_1, …, v_{N/2}} where the v_i are i.i.d. on the unit sphere), every x ∈ X satisfies ‖x‖₁ ≥ c·√N·‖x‖₂ for a universal constant c > 0.
In other words, every x ∈ X has its L₂ mass very “spread out.”
This holds not only for each v_i, but for every linear combination.
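A minimal numerical sketch of this phenomenon (assuming numpy; note that sampling only probes typical elements of X, while the theorem controls even the worst x):

import numpy as np

rng = np.random.default_rng(0)
N = 256
# Orthonormal basis for a random subspace X of dimension N/2.
Q, _ = np.linalg.qr(rng.standard_normal((N, N // 2)))

ratios = []
for _ in range(1000):
    x = Q @ rng.standard_normal(N // 2)   # random element of X
    ratios.append(np.linalg.norm(x, 1) / (np.sqrt(N) * np.linalg.norm(x)))
print("min ||x||_1 / (sqrt(N) ||x||_2):", round(min(ratios), 3))  # stays above a constant c > 0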
an existential crisis
Geometric functional analysts face a dilemma we know well: Almost every subspace satisfies this property, but we can’t pinpoint even one.
[Szarek, ICM06; Milman, GAFA01; Johnson-Schechtman, handbook01] asked: Can we find an explicit subspace on which the L1 and L2 norms are equivalent?
This is a prominent example of the (now ubiquitous) use of the probabilistic method in asymptotic convex geometry.
Related questions about explicit, high-dimensional constructions arose (concurrently) in CS:
- explicit embeddings of L₂ into L₁ for nearest-neighbor search (Indyk)
- explicit compressed sensing matrices M : R^N → R^n for n ≪ N (DeVore)
- explicit Johnson-Lindenstrauss (dimension reduction) transform (Ailon-Chazelle)
Why do analysts / computer scientists care about explicit high-dimensional constructions?
distortion
For a subspace X ⊆ R^N, we define the distortion of X by
Δ(X) = max_{x ∈ X, x ≠ 0} √N·‖x‖₂ / ‖x‖₁.
By Cauchy-Schwarz, we always have N^{1/2} ≥ Δ(X) ≥ 1.
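Computing Δ(X) exactly is intractable in general, but evaluating the ratio at projections of standard basis vectors onto X gives a simple lower bound; a sketch (assuming numpy; distortion_lower_bound is an illustrative helper, not from the talk):

import numpy as np

def distortion_lower_bound(B):
    # B: N x k matrix with orthonormal columns spanning X (so P = B @ B.T).
    # Returns max_i sqrt(N)*||P e_i||_2 / ||P e_i||_1, a lower bound on Delta(X).
    N = B.shape[0]
    best = 1.0
    for i in range(N):
        x = B @ B[i, :]                   # projection of e_i onto X
        n1 = np.linalg.norm(x, 1)
        if n1 > 1e-12:
            best = max(best, np.sqrt(N) * np.linalg.norm(x) / n1)
    return best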
Random construction: A random X ⊆ R^N satisfies:
- dim(X) = Ω_ε(N) and Δ(X) ≤ 1+ε [Figiel-Lindenstrauss-Milman 77]
- dim(X) = (1−ε)N and Δ(X) = O_ε(1) [Kashin 77]
Example (Hadamard): Let X = ker(first N/2 rows of the Hadamard matrix); then Δ(X) ≈ N^{1/4}.
applications
Applications occupy different points of the distortion/dimension trade-off: nearest-neighbor search, compressive sensing, coding in characteristic zero, geometric functional analysis.
View X as an embedding of ℓ₂^{dim(X)} into ℓ₁^N:
- 1+ε distortion, small blowup in dimension
- O(1) distortion, Ω(N) dimension
Want a map A : R^N → R^n with n ≪ N, such that any r-sparse signal x ∈ R^N (vector with at most r non-zero entries) can be uniquely and efficiently recovered from Ax.
Relation to distortion [Kashin-Temlyakov]: one can uniquely and efficiently recover any r-sparse signal for r ≤ N/[2Δ(ker(A))]². (This even tolerates additional “noise” in the “non-sparse” parts of the signal.)
(Milman believes this to be impossible.)
sensing and distortion
Want a map A : R^N → R^n such that any r-sparse signal x ∈ R^N (vector with at most r non-zero entries) can be uniquely and efficiently recovered from Ax.
Want to solve: Given a compressed signal y, minimize ‖x‖₀ subject to Ax = y. (P0)
This is a highly non-convex optimization problem, NP-hard for general A.
Basis Pursuit: Given a compressed signal y, minimize ‖x‖₁ subject to Ax = y. (P1)
Can use linear programming!
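Concretely, (P1) becomes a linear program after splitting x into positive and negative parts; a minimal sketch with scipy (sizes are illustrative):

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    # min ||x||_1 s.t. Ax = y, via x = xp - xm with xp, xm >= 0
    n, N = A.shape
    c = np.ones(2 * N)                            # objective: sum(xp + xm) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:N] - res.x[N:]

# Recover a 5-sparse signal from 100 random sign measurements.
rng = np.random.default_rng(1)
N, n, r = 200, 100, 5
A = rng.choice([-1.0, 1.0], size=(n, N))
v = np.zeros(N)
v[rng.choice(N, size=r, replace=False)] = rng.standard_normal(r)
print("max recovery error:", np.abs(basis_pursuit(A, A @ v) - v).max())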
[KT07]: If y = Av and v has at most N/[2Δ(ker(A))]² non-zero coordinates, then (P0) and (P1) give the same answer.
let’s prove this
[Lots of work has been done here: Donoho et al.; Candes-Tao-Romberg; etc.]
sensing and distortion
[KT07]: If y = Av and v has at most N/[2Δ(ker(A))]² non-zero coordinates, then (P0) and (P1) give the same answer.
For x ∈ R^N and S ⊆ [N], let x_S be x restricted to the coordinates in S.
If x ∈ ker(A) and |S| ≤ N/[2Δ(ker(A))]², then
‖x_S‖₁ ≤ √|S|·‖x_S‖₂ ≤ √|S|·‖x‖₂ ≤ √|S|·Δ(ker(A))·‖x‖₁/√N ≤ ‖x‖₁/2,
so at least half of the ℓ₁ mass of any kernel vector lies outside S. Hence adding a nonzero kernel vector to a signal supported on S cannot decrease the ℓ₁ norm, and the sparse v is the minimizer of (P1).
previous results: explicit
Sub-linear dimension:
Rudin ’60 (and later LLR ’94) achieve dim(X) ≈ N^{1/2} and Δ(X) ≤ 3 (X = span of 4-wise independent vectors).
Indyk ’07 achieves dim(X) ≈ N/2^{(log log N)²} and Δ(X) = 1+o(1).
Indyk ’00 achieves dim(X) ≈ exp((log N)^{1/2}) and Δ(X) = 1+o(1).
Our result: We construct an explicit subspace X ⊆ R^N with dim(X) = (1−o(1))N and Δ(X) ≤ (log N)^{O(log log log N)}.
In our constructions, X = ker(explicit sign matrix).
previous results: derandomization
Partial derandomization:
Let A_{k,N} be a random k × N sign matrix (entries are ±1 i.i.d.). Kashin’s technique shows that, almost surely, Δ(ker(A_{k,N})) = O(1) for k = Ω(N) (and dim(ker(A_{k,N})) ≥ N − k).
- Can reduce to O(N log² N) random bits [Indyk 00]
- Can reduce to O(N log N) random bits [Artstein-Milman 06]
- Can reduce to O(N) random bits [Lovett-Sodin 07]
Our result [Guruswami-L-Wigderson]:
- With N^{o(1)} random bits, we get Δ(X) ≤ polylog(N).
- With εN random bits (for any ε > 0), we get Δ(X) = O(1).
the expander code construction
G = ([N], [n], E): a bipartite graph with N ≫ n, d-right-regular, and L ⊆ R^d a subspace. Define
X(G,L) = { x ∈ R^N : x_{Γ(j)} ∈ L for every j ∈ [n] },
where x_S ∈ R^{|S|} is x restricted to the coordinates in S ⊆ [N] and Γ(j) is the neighborhood of j.
Resembles the constructions of Gallager and Tanner (L is the “inner” code).
Following Tanner and Sipser-Spielman, we will show that if L is “good” and G is an “expander,” then X(G,L) is even better (in some parameters).
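A literal (inefficient) sketch of the construction, assuming G is given by its right-vertex neighborhoods and L by a matrix M_L with L = ker(M_L) (both names introduced here for illustration):

import numpy as np
from scipy.linalg import null_space

def expander_subspace(N, Gamma, M_L):
    # X(G,L) = { x in R^N : x restricted to Gamma[j] lies in L, for all j }
    # Gamma: list of n index-lists, each of length d; M_L: m x d with L = ker(M_L)
    blocks = []
    for nbrs in Gamma:                     # one block of linear constraints per right vertex
        R = np.zeros((M_L.shape[0], N))
        R[:, nbrs] = M_L                   # enforces M_L @ x[nbrs] = 0
        blocks.append(R)
    return null_space(np.vstack(blocks))   # orthonormal basis for X(G,L)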
some quantitative matters
Say that a subspace L ⊆ R^d is (t, ε)-spread if every x ∈ L satisfies ‖x_{[d]∖S}‖₂ ≥ ε·‖x‖₂ for every S ⊆ [d] with |S| ≤ t.
If L is (Ω(d), Ω(1))-spread, then Δ(L) = O(1).
Conversely, if L has Δ(L) = O(1), then L is (Ω(d), Ω(1))-spread.
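The spread of a given basis can be probed by sampling: for a fixed x the worst set S of size t consists of the t largest-magnitude coordinates, so sampling yields an empirical upper bound on the ε for which L can be (t, ε)-spread (a heuristic sketch, assuming numpy):

import numpy as np

def spread_estimate(B, t, samples=1000, seed=0):
    # B: d x k orthonormal basis for L; returns an empirical upper bound on eps
    rng = np.random.default_rng(seed)
    d, k = B.shape
    eps = 1.0
    for _ in range(samples):
        x = B @ rng.standard_normal(k)
        x2 = np.sort(x * x)[::-1]          # coordinate masses, largest first
        eps = min(eps, np.sqrt(x2[t:].sum() / x2.sum()))
    return eps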
For a bipartite graph G = ([N], [n], E), the expansion profile of G is
Λ_G(q) = min { |Γ(S)| : S ⊆ [N], |S| = q }.
(This is expansion from left to right.)
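For tiny graphs the profile can be computed by brute force (exponential time; illustration only):

from itertools import combinations

def expansion_profile(N, Gamma_left, q):
    # Lambda_G(q) = min |Gamma(S)| over S in [N] with |S| = q;
    # Gamma_left[i] is the set of right neighbors of left vertex i.
    return min(len(set().union(*(Gamma_left[i] for i in S)))
               for S in combinations(range(N), q))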
spread-boosting theorem
Setup: G = ([N], [n], E) a bipartite graph, d-right-regular, with left degree ≤ D; L ⊆ R^d a (t, ε)-spread subspace.
Conclusion: If X(G,L) is (T, δ)-spread, then X(G,L) is roughly (t·T, ε·δ)-spread, with the precise gain in set size governed by the left degree D and the expansion profile Λ_G.
How to apply: Assume D = O(1) and Λ_G(q) = Ω(q) for all q ∈ [N] (impossible to achieve). Then:
X(G,L) is (½, 1)-spread ⇒ (t, ε)-spread ⇒ (t², ε²)-spread ⇒ … ⇒ (Ω(N), ε^{log_t N})-spread ⇒ Δ(X(G,L)) ≲ (1/ε)^{log_t N}.
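Numerically the idealized chain is just a recurrence (t and eps below are hypothetical inner-code parameters):

t, eps = 16, 0.5            # hypothetical (t, eps)-spread inner code
N = 10**6
T, delta, steps = 1.0, 1.0, 0
while T < N:                # each idealized boosting step: T -> T*t, delta -> delta*eps
    T, delta, steps = T * t, delta * eps, steps + 1
print(steps, "steps; distortion bound ~", 1 / delta)   # about (1/eps)^(log_t N)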
spread-boosting theorem
Proof idea: the set S should “leak” L₂ mass outside (since L is spreading and G is an expander), unless most of the mass in S is concentrated on a small subset B (impossible by assumption).
when L is random
Let H be a (non-bipartite) d-regular graph with second eigenvalue λ = O(d^{1/2}). Let G be the edge-vertex incidence graph of H: the left vertices of G are the edges of H, the right vertices are the nodes of H, and each edge is connected to its two endpoints.
Alon-Chung: any q edges of H touch at least Ω(min{q/λ, √(nq/d)}) nodes, which lower-bounds the expansion profile Λ_G.
A random subspace L ⊆ R^d is (Ω(d), Ω(1))-spread.
Letting d = N^{1/4}, the spread-boosting theorem gives: X(G,L) is (T, δ)-spread ⇒ X(G,L) is roughly (T², Ω(δ))-spread.
It takes O(log log N) steps to reach Ω(N)-sized sets ⇒ poly(log N) distortion.
(Explicit such graphs exist by Margulis and Lubotzky-Phillips-Sarnak.)
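Concretely (a sketch assuming networkx, with a random regular graph standing in for an explicit Ramanujan graph):

import networkx as nx

H = nx.random_regular_graph(d=8, n=100, seed=0)  # stand-in for a Ramanujan graph
edges = list(H.edges())                          # left vertices of G: the N = nd/2 edges of H
Gamma = [[j for j, e in enumerate(edges) if v in e] for v in H.nodes()]
# Right vertex v is adjacent to the d = 8 edges incident to it; left degree D = 2.
# These neighborhoods can be fed into expander_subspace above.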
explicit construction: ingredients for L
Spectral Lemma: Let A be any k × d matrix whose columns a_1, …, a_d ∈ R^k are unit vectors such that for every i ≠ j, |⟨a_i, a_j⟩| ≤ μ. Then ker(A) is well-spread, with better spread the smaller μ is.
Kerdock codes (a.k.a. Mutually Unbiased Bases) [Kerdock ’72, Cameron-Seidel ’73] give explicit column systems of this type with very small μ.
Kerdock + Spectral Lemma gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1−ε)d, for every ε > 0.
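A toy example in the spirit of the lemma (not Kerdock codes themselves, but the simplest pair of mutually unbiased bases: the standard basis together with the normalized Hadamard basis, giving 2k unit vectors in R^k with coherence 1/sqrt(k)):

import numpy as np
from scipy.linalg import hadamard

k = 64
A = np.hstack([np.eye(k), hadamard(k) / np.sqrt(k)])  # k x 2k matrix with unit columns
mu = (np.abs(A.T @ A) - np.eye(2 * k)).max()          # largest off-diagonal |<a_i, a_j>|
print("coherence:", mu, "vs 1/sqrt(k) =", 1 / np.sqrt(k))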
boosting L with sum-product expanders
Kerdock + Spectral Lemma gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1−ε)d for every ε > 0.
Problem: If G = Ramanujan construction and L = Kerdock, the spread-boosting theorem gives nothing. (Ramanujan loses d^{1/2} and Kerdock gains only d^{1/2}.)
Solution: Produce L' = X(G,L) where L = Kerdock and G = sum-product expander.
Sum-product theorems [Bourgain-Katz-Tao, …]: For A ⊆ F_p with |A| ≤ p^{0.99}, we have max{|A+A|, |A·A|} ≥ |A|^{1+δ} for some absolute constant δ > 0.
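A quick illustration (a multiplicatively structured set has a tiny product set, so the theorem forces its sumset to be large):

p = 10007
A = {pow(3, i, p) for i in range(50)}                 # geometric progression in F_p, |A| = 50
sums = {(a + b) % p for a in A for b in A}
prods = {(a * b) % p for a in A for b in A}
print(len(A), len(sums), len(prods))                  # |A.A| ~ 2|A| is tiny, |A+A| is large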
Using [Barak-Impagliazzo-Wigderson / BKSSW] and the spread-boosting theorem, L' is (d^{1/2+c}, Ω(1))-spread for some c > 0.
Now we can plug L' into G = Ramanujan and get non-trivial boosting.
(almost done…)
some open questions
- Improve the current bounds: First attempt would be O(1) distortion with sub-linear randomness.
- Stronger pseudorandom properties: Restricted Isometry Property [T. Tao’s blog]
- Improve the dependence on the co-dimension (important for compressed sensing): if dim(X) ≥ (1−η)N, we get distortion dependence (1/η)^{O(log log N)}.
- Breaking the diameter bound: Show that the kernel of a random {0,1} matrix with only 100 ones per row has small distortion, or prove that sparse matrices cannot work. One could hope for Δ(X) = O(1); see the sketch after this list.
- Find an explicit collection of unit vectors v_1, v_2, …, v_N ∈ R^n with N ≫ n so that every small enough sub-collection is “nearly orthogonal.”
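The sparse-matrix question invites direct experiment; a sketch combining the helpers above (distortion_lower_bound from earlier; sizes are illustrative):

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)
N, n = 1000, 500
A = np.zeros((n, N))
for i in range(n):
    A[i, rng.choice(N, size=100, replace=False)] = 1.0  # exactly 100 ones per row
B = null_space(A)                                       # basis for ker(A), dim >= N - n
print("distortion lower bound:", distortion_lower_bound(B))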
some open questions
- Refuting random subspaces with high distortion: Give efficiently computable certificates for Δ(X) being small, or for the Restricted Isometry Property, which exist almost surely for a random X ⊆ R^N.
- Linear-time expander decoding? Are there recovery schemes that run faster than Basis Pursuit?