expander codes and pseudorandom subspaces of R^N
expander codes and pseudorandom subspaces of R^N
James R. Lee, University of Washington
[joint with Venkatesan Guruswami (Washington) and Alexander Razborov (IAS/Steklov)]
random sections of the cross polytope
Classical high-dimensional geometry [Kashin 77, Figiel-Lindenstrauss-Milman 77]:
For a random subspace X ⊆ R^N with dim(X) = N/2 (e.g., choose X = span{v_1, …, v_{N/2}} where the v_i are i.i.d. on the unit sphere), every x ∈ X satisfies ‖x‖₁ ≥ c·√N·‖x‖₂ for a universal constant c > 0.
In other words, every x ∈ X has its L₂ mass very “spread out.”
This holds not only for each v_i, but for every linear combination.
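A minimal numerical sketch of this phenomenon (assuming numpy; note that sampling only probes typical elements of X, while the theorem controls even the worst x):

import numpy as np

rng = np.random.default_rng(0)
N = 256
# Orthonormal basis for a random subspace X of dimension N/2.
Q, _ = np.linalg.qr(rng.standard_normal((N, N // 2)))

ratios = []
for _ in range(1000):
    x = Q @ rng.standard_normal(N // 2)   # random element of X
    ratios.append(np.linalg.norm(x, 1) / (np.sqrt(N) * np.linalg.norm(x)))
print("min ||x||_1 / (sqrt(N) ||x||_2):", round(min(ratios), 3))  # stays above a constant c > 0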
an existential crisis
Geometric functional analysts face a dilemma we know well: Almost every subspace satisfies this property, but we can’t pinpoint even one.
[Szarek, ICM06; Milman, GAFA01; Johnson-Schechtman, handbook01] asked: Can we find an explicit subspace on which the L1 and L2 norms are equivalent?
This is a prominent example of the (now ubiquitous) use of the probabilistic method in asymptotic convex geometry.
Related questions about explicit, high-dimensional constructions arose (concurrently) in CS:
- explicit embeddings of L₂ into L₁ for nearest-neighbor search (Indyk)
- explicit compressed sensing matrices M : R^N → R^n for n ≪ N (DeVore)
- explicit Johnson-Lindenstrauss (dimension reduction) transform (Ailon-Chazelle)
Why do analysts / computer scientists care about explicit high-dimensional constructions?
distortion
For a subspace X ⊆ R^N, we define the distortion of X by
Δ(X) = max_{x ∈ X, x ≠ 0} √N·‖x‖₂ / ‖x‖₁.
By Cauchy-Schwarz, we always have N^{1/2} ≥ Δ(X) ≥ 1.
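Computing Δ(X) exactly is intractable in general, but evaluating the ratio at projections of standard basis vectors onto X gives a simple lower bound; a sketch (assuming numpy; distortion_lower_bound is an illustrative helper, not from the talk):

import numpy as np

def distortion_lower_bound(B):
    # B: N x k matrix with orthonormal columns spanning X (so P = B @ B.T).
    # Returns max_i sqrt(N)*||P e_i||_2 / ||P e_i||_1, a lower bound on Delta(X).
    N = B.shape[0]
    best = 1.0
    for i in range(N):
        x = B @ B[i, :]                   # projection of e_i onto X
        n1 = np.linalg.norm(x, 1)
        if n1 > 1e-12:
            best = max(best, np.sqrt(N) * np.linalg.norm(x) / n1)
    return best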
Random construction: A random X ⊆ R^N satisfies:
- dim(X) = Ω_ε(N) and Δ(X) ≤ 1+ε [Figiel-Lindenstrauss-Milman 77]
- dim(X) = (1−ε)N and Δ(X) = O_ε(1) [Kashin 77]
Example (Hadamard): Let X = ker(first N/2 rows of the Hadamard matrix); then Δ(X) ≈ N^{1/4}.
applications
Applications occupy different points of the distortion/dimension trade-off: nearest-neighbor search, compressive sensing, coding in characteristic zero, geometric functional analysis.
View X as an embedding of ℓ₂^{dim(X)} into ℓ₁^N:
- 1+ε distortion, small blowup in dimension
- O(1) distortion, Ω(N) dimension
Want a map A : R^N → R^n with n ≪ N, such that any r-sparse signal x ∈ R^N (vector with at most r non-zero entries) can be uniquely and efficiently recovered from Ax.
Relation to distortion [Kashin-Temlyakov]: one can uniquely and efficiently recover any r-sparse signal for r ≤ N/[2Δ(ker(A))]². (This even tolerates additional “noise” in the “non-sparse” parts of the signal.)
(Milman believes this to be impossible.)
sensing and distortion
Want a map A : R^N → R^n such that any r-sparse signal x ∈ R^N (vector with at most r non-zero entries) can be uniquely and efficiently recovered from Ax.
Want to solve: Given a compressed signal y, minimize ‖x‖₀ subject to Ax = y. (P0)
This is a highly non-convex optimization problem, NP-hard for general A.
Basis Pursuit: Given a compressed signal y, minimize ‖x‖₁ subject to Ax = y. (P1)
Can use linear programming!
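Concretely, (P1) becomes a linear program after splitting x into positive and negative parts; a minimal sketch with scipy (sizes are illustrative):

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    # min ||x||_1 s.t. Ax = y, via x = xp - xm with xp, xm >= 0
    n, N = A.shape
    c = np.ones(2 * N)                            # objective: sum(xp + xm) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:N] - res.x[N:]

# Recover a 5-sparse signal from 100 random sign measurements.
rng = np.random.default_rng(1)
N, n, r = 200, 100, 5
A = rng.choice([-1.0, 1.0], size=(n, N))
v = np.zeros(N)
v[rng.choice(N, size=r, replace=False)] = rng.standard_normal(r)
print("max recovery error:", np.abs(basis_pursuit(A, A @ v) - v).max())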
[KT07]: If y = Av and v has at most N/[2Δ(ker(A))]² non-zero coordinates, then (P0) and (P1) give the same answer.
let’s prove this
[Lots of work has been done here: Donoho et al.; Candes-Tao-Romberg; etc.]
sensing and distortion
[KT07]: If y = Av and v has at most N/[2Δ(ker(A))]² non-zero coordinates, then (P0) and (P1) give the same answer.
For x ∈ R^N and S ⊆ [N], let x_S be x restricted to the coordinates in S.
If x ∈ ker(A) and |S| ≤ N/[2Δ(ker(A))]², then
‖x_S‖₁ ≤ √|S|·‖x_S‖₂ ≤ √|S|·‖x‖₂ ≤ √|S|·Δ(ker(A))·‖x‖₁/√N ≤ ‖x‖₁/2,
so at least half of the ℓ₁ mass of any kernel vector lies outside S. Hence adding a nonzero kernel vector to a signal supported on S cannot decrease the ℓ₁ norm, and the sparse v is the minimizer of (P1).
previous results: explicit
Sub-linear dimension:
Rudin ’60 (and later LLR ’94) achieve dim(X) ≈ N^{1/2} and Δ(X) ≤ 3 (X = span of 4-wise independent vectors).
Indyk ’07 achieves dim(X) ≈ N/2^{(log log N)²} and Δ(X) = 1+o(1).
Indyk ’00 achieves dim(X) ≈ exp((log N)^{1/2}) and Δ(X) = 1+o(1).
Our result: We construct an explicit subspace X ⊆ R^N with dim(X) = (1−o(1))N and Δ(X) ≤ (log N)^{O(log log log N)}.
In our constructions, X = ker(explicit sign matrix).
previous results: derandomization
Partial derandomization:
Let A_{k,N} be a random k × N sign matrix (entries are ±1 i.i.d.). Kashin’s technique shows that, almost surely, Δ(ker(A_{k,N})) = O(1) for k = Ω(N) (and dim(ker(A_{k,N})) ≥ N − k).
- Can reduce to O(N log² N) random bits [Indyk 00]
- Can reduce to O(N log N) random bits [Artstein-Milman 06]
- Can reduce to O(N) random bits [Lovett-Sodin 07]
Our result [Guruswami-L-Wigderson]:
- With N^{o(1)} random bits, we get Δ(X) ≤ polylog(N).
- With εN random bits (for any ε > 0), we get Δ(X) = O(1).
the expander code construction
G = ([N], [n], E): a bipartite graph with N ≫ n, d-right-regular, and L ⊆ R^d a subspace. Define
X(G,L) = { x ∈ R^N : x_{Γ(j)} ∈ L for every j ∈ [n] },
where x_S ∈ R^{|S|} is x restricted to the coordinates in S ⊆ [N] and Γ(j) is the neighborhood of j.
Resembles the constructions of Gallager and Tanner (L is the “inner” code).
Following Tanner and Sipser-Spielman, we will show that if L is “good” and G is an “expander,” then X(G,L) is even better (in some parameters).
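A literal (inefficient) sketch of the construction, assuming G is given by its right-vertex neighborhoods and L by a matrix M_L with L = ker(M_L) (both names introduced here for illustration):

import numpy as np
from scipy.linalg import null_space

def expander_subspace(N, Gamma, M_L):
    # X(G,L) = { x in R^N : x restricted to Gamma[j] lies in L, for all j }
    # Gamma: list of n index-lists, each of length d; M_L: m x d with L = ker(M_L)
    blocks = []
    for nbrs in Gamma:                     # one block of linear constraints per right vertex
        R = np.zeros((M_L.shape[0], N))
        R[:, nbrs] = M_L                   # enforces M_L @ x[nbrs] = 0
        blocks.append(R)
    return null_space(np.vstack(blocks))   # orthonormal basis for X(G,L)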
some quantitative matters
Say that a subspace L ⊆ R^d is (t, ε)-spread if every x ∈ L satisfies ‖x_{[d]∖S}‖₂ ≥ ε·‖x‖₂ for every S ⊆ [d] with |S| ≤ t.
If L is (Ω(d), Ω(1))-spread, then Δ(L) = O(1).
Conversely, if L has Δ(L) = O(1), then L is (Ω(d), Ω(1))-spread.
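The spread of a given basis can be probed by sampling: for a fixed x the worst set S of size t consists of the t largest-magnitude coordinates, so sampling yields an empirical upper bound on the ε for which L can be (t, ε)-spread (a heuristic sketch, assuming numpy):

import numpy as np

def spread_estimate(B, t, samples=1000, seed=0):
    # B: d x k orthonormal basis for L; returns an empirical upper bound on eps
    rng = np.random.default_rng(seed)
    d, k = B.shape
    eps = 1.0
    for _ in range(samples):
        x = B @ rng.standard_normal(k)
        x2 = np.sort(x * x)[::-1]          # coordinate masses, largest first
        eps = min(eps, np.sqrt(x2[t:].sum() / x2.sum()))
    return eps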
For a bipartite graph G = ([N], [n], E), the expansion profile of G is
Λ_G(q) = min { |Γ(S)| : S ⊆ [N], |S| = q }.
(This is expansion from left to right.)
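For tiny graphs the profile can be computed by brute force (exponential time; illustration only):

from itertools import combinations

def expansion_profile(N, Gamma_left, q):
    # Lambda_G(q) = min |Gamma(S)| over S in [N] with |S| = q;
    # Gamma_left[i] is the set of right neighbors of left vertex i.
    return min(len(set().union(*(Gamma_left[i] for i in S)))
               for S in combinations(range(N), q))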
spread-boosting theorem
Setup: G = ([N], [n], E) a bipartite graph, d-right-regular, with left degree ≤ D; L ⊆ R^d a (t, ε)-spread subspace.
Conclusion: If X(G,L) is (T, δ)-spread, then X(G,L) is roughly (t·T, ε·δ)-spread, with the precise gain in set size governed by the left degree D and the expansion profile Λ_G.
How to apply: Assume D = O(1) and Λ_G(q) = Ω(q) for all q ∈ [N] (impossible to achieve). Then:
X(G,L) is (½, 1)-spread ⇒ (t, ε)-spread ⇒ (t², ε²)-spread ⇒ … ⇒ (Ω(N), ε^{log_t N})-spread ⇒ Δ(X(G,L)) ≲ (1/ε)^{log_t N}.
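Numerically the idealized chain is just a recurrence (t and eps below are hypothetical inner-code parameters):

t, eps = 16, 0.5            # hypothetical (t, eps)-spread inner code
N = 10**6
T, delta, steps = 1.0, 1.0, 0
while T < N:                # each idealized boosting step: T -> T*t, delta -> delta*eps
    T, delta, steps = T * t, delta * eps, steps + 1
print(steps, "steps; distortion bound ~", 1 / delta)   # about (1/eps)^(log_t N)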
spread-boosting theorem
Proof idea: the set S should “leak” L₂ mass outside (since L is spreading and G is an expander), unless most of the mass in S is concentrated on a small subset B (impossible by assumption).
when L is random
Let H be a (non-bipartite) d-regular graph with second eigenvalue λ = O(d^{1/2}). Let G be the edge-vertex incidence graph of H: the left vertices of G are the edges of H, the right vertices are the nodes of H, and each edge is connected to its two endpoints.
Alon-Chung: any q edges of H touch at least Ω(min{q/λ, √(nq/d)}) nodes, which lower-bounds the expansion profile Λ_G.
A random subspace L ⊆ R^d is (Ω(d), Ω(1))-spread.
Letting d = N^{1/4}, the spread-boosting theorem gives: X(G,L) is (T, δ)-spread ⇒ X(G,L) is roughly (T², Ω(δ))-spread.
It takes O(log log N) steps to reach Ω(N)-sized sets ⇒ poly(log N) distortion.
(Explicit such graphs exist by Margulis and Lubotzky-Phillips-Sarnak.)
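Concretely (a sketch assuming networkx, with a random regular graph standing in for an explicit Ramanujan graph):

import networkx as nx

H = nx.random_regular_graph(d=8, n=100, seed=0)  # stand-in for a Ramanujan graph
edges = list(H.edges())                          # left vertices of G: the N = nd/2 edges of H
Gamma = [[j for j, e in enumerate(edges) if v in e] for v in H.nodes()]
# Right vertex v is adjacent to the d = 8 edges incident to it; left degree D = 2.
# These neighborhoods can be fed into expander_subspace above.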
explicit construction: ingredients for L
Spectral Lemma: Let A be any k × d matrix whose columns a_1, …, a_d ∈ R^k are unit vectors such that for every i ≠ j, |⟨a_i, a_j⟩| ≤ μ. Then ker(A) is well-spread, with better spread the smaller μ is.
Kerdock codes (a.k.a. Mutually Unbiased Bases) [Kerdock ’72, Cameron-Seidel ’73] give explicit column systems of this type with very small μ.
Kerdock + Spectral Lemma gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1−ε)d, for every ε > 0.
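A toy example in the spirit of the lemma (not Kerdock codes themselves, but the simplest pair of mutually unbiased bases: the standard basis together with the normalized Hadamard basis, giving 2k unit vectors in R^k with coherence 1/sqrt(k)):

import numpy as np
from scipy.linalg import hadamard

k = 64
A = np.hstack([np.eye(k), hadamard(k) / np.sqrt(k)])  # k x 2k matrix with unit columns
mu = (np.abs(A.T @ A) - np.eye(2 * k)).max()          # largest off-diagonal |<a_i, a_j>|
print("coherence:", mu, "vs 1/sqrt(k) =", 1 / np.sqrt(k))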
boosting L with sum-product expanders
Kerdock + Spectral Lemma gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1−ε)d for every ε > 0.
Problem: If G = Ramanujan construction and L = Kerdock, the spread-boosting theorem gives nothing. (Ramanujan loses d^{1/2} and Kerdock gains only d^{1/2}.)
Solution: Produce L' = X(G,L) where L = Kerdock and G = sum-product expander.
Sum-product theorems [Bourgain-Katz-Tao, …]: For A ⊆ F_p with |A| ≤ p^{0.99}, we have max{|A+A|, |A·A|} ≥ |A|^{1+δ} for some absolute constant δ > 0.
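A quick illustration (a multiplicatively structured set has a tiny product set, so the theorem forces its sumset to be large):

p = 10007
A = {pow(3, i, p) for i in range(50)}                 # geometric progression in F_p, |A| = 50
sums = {(a + b) % p for a in A for b in A}
prods = {(a * b) % p for a in A for b in A}
print(len(A), len(sums), len(prods))                  # |A.A| ~ 2|A| is tiny, |A+A| is large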
Using [Barak-Impagliazzo-Wigderson / BKSSW] and the spread-boosting theorem, L' is (d^{1/2+c}, Ω(1))-spread for some c > 0.
Now we can plug L' into G = Ramanujan and get non-trivial boosting.
(almost done…)
some open questions
- Improve the current bounds: First attempt would be O(1) distortion with sub-linear randomness.
- Stronger pseudorandom properties: Restricted Isometry Property [T. Tao’s blog]
- Improve the dependence on the co-dimension (important for compressed sensing): if dim(X) ≥ (1−η)N, we get distortion dependence (1/η)^{O(log log N)}.
- Breaking the diameter bound: Show that the kernel of a random {0,1} matrix with only 100 ones per row has small distortion, or prove that sparse matrices cannot work. One could hope for Δ(X) = O(1); see the sketch after this list.
- Find an explicit collection of unit vectors v_1, v_2, …, v_N ∈ R^n with N ≫ n so that every small enough sub-collection is “nearly orthogonal.”
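The sparse-matrix question invites direct experiment; a sketch combining the helpers above (distortion_lower_bound from earlier; sizes are illustrative):

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)
N, n = 1000, 500
A = np.zeros((n, N))
for i in range(n):
    A[i, rng.choice(N, size=100, replace=False)] = 1.0  # exactly 100 ones per row
B = null_space(A)                                       # basis for ker(A), dim >= N - n
print("distortion lower bound:", distortion_lower_bound(B))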
some open questions
- Refuting random subspaces with high distortion: Give efficiently computable certificates for Δ(X) being small, or for the Restricted Isometry Property, which exist almost surely for a random X ⊆ R^N.
- Linear-time expander decoding? Are there recovery schemes that run faster than Basis Pursuit?