1 techniques for time-space tradeoff lower bounds for branching programs: part i paul beame...

1

Techniques for Time-Space Tradeoff Lower Bounds for Branching Programs: Part I

Paul Beame, University of Washington

Joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun

2

Branching programs

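[Figure: an example branching program whose nodes query variables among x1, …, x8, with edges labeled 0 and 1 and sinks labeled 0 and 1; example input x = (1,1,0,1,...)]

To compute $f:\{0,1\}^n \to \{0,1\}$

on input $(x_1,\dots,x_n)$, follow the path from the source to a sink

Time T = length of the longest path

Space S = $\log_2$(# of nodes)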
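A minimal sketch of these definitions, assuming a hypothetical dictionary encoding of a Boolean BP: a sink maps to its output bit, and a query node maps to (variable index, successor on 0, successor on 1). `evaluate` follows the path selected by x, `time_T` returns the length of the longest path, and `space_S` returns log2 of the number of nodes; all names are illustrative.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a sink maps to its output bit (0 or 1);
# a query node maps to (variable index, successor on 0, successor on 1).
Node = Union[int, Tuple[int, str, str]]


def evaluate(bp: Dict[str, Node], source: str, x: List[int]) -> Tuple[int, List[str]]:
    """Follow the path selected by x from the source to a sink; return (output, path)."""
    node, path = source, [source]
    while not isinstance(bp[node], int):
        var, succ0, succ1 = bp[node]
        node = succ1 if x[var] else succ0
        path.append(node)
    return bp[node], path


def time_T(bp: Dict[str, Node], source: str) -> int:
    """Time T: number of queries on the longest source-to-sink path (the BP is a DAG)."""
    memo: Dict[str, int] = {}

    def longest(v: str) -> int:
        if isinstance(bp[v], int):
            return 0
        if v not in memo:
            _, s0, s1 = bp[v]
            memo[v] = 1 + max(longest(s0), longest(s1))
        return memo[v]

    return longest(source)


def space_S(bp: Dict[str, Node]) -> float:
    """Space S = log2(# of nodes)."""
    return math.log2(len(bp))


# Example: a 5-node BP computing x0 XOR x1.
xor_bp: Dict[str, Node] = {
    "s": (0, "a", "b"),
    "a": (1, "zero", "one"),
    "b": (1, "one", "zero"),
    "zero": 0,
    "one": 1,
}
```

On the 5-node XOR program every path makes 2 queries, so T = 2 and S = log2 5 ≈ 2.32.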

3

Branching program properties

Simulate random-access machines with the same time T and space S

A multi-way version, with each $x_i$ in a domain D, is good for modeling RAM input registers

BPs will be leveled w.l.o.g.: same time T, at most $2^S$ nodes per level
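Levelling can be sketched with the same hypothetical encoding: each node gets one copy (v, t) per level t, so each level holds at most $2^S$ nodes, and a sink reached early is padded forward so that every path has length exactly T. The function below only lists which nodes appear on each level; it is an illustration under these assumptions, and the levelled program has up to $(T+1)\cdot 2^S$ nodes in total.

```python
from typing import Dict, List, Set, Tuple, Union

Node = Union[int, Tuple[int, str, str]]  # sink bit, or (variable, succ on 0, succ on 1)


def levels(bp: Dict[str, Node], source: str, T: int) -> List[Set[str]]:
    """Level t lists the original nodes that can be reached after exactly t steps.

    The levelled program has one copy (v, t) of node v per level t and an edge
    (v, t) -> (w, t + 1) for each edge v -> w. A sink reached early gets a dummy
    edge to its own copy on the next level, so every path has length exactly T.
    """
    out = [{source}]
    for _ in range(T):
        nxt: Set[str] = set()
        for v in out[-1]:
            if isinstance(bp[v], int):
                nxt.add(v)  # padding: the sink is carried forward
            else:
                _, s0, s1 = bp[v]
                nxt.update((s0, s1))
        out.append(nxt)
    return out


# Example: computes x0 OR x1; if x0 = 1 the path stops after one query.
early_sink_bp: Dict[str, Node] = {
    "s": (0, "a", "one"),
    "a": (1, "zero", "one"),
    "zero": 0,
    "one": 1,
}
```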

4

Overall approach to lower bounds

If $f: D^n \to \{0,1\}$ is computed using small time and space

then $f^{-1}(1)$ has a special combinatorial structure.

Lower bounds for f follow if $f^{-1}(1)$ does not have the structure

How do we find such structures?

5

Levelled BPs and Layers

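[Figure: levelled BP of height kn from source v0 to sinks 1 and 0, cut into layers L1, L2, …, Lr, each of height kn/r]

Break the BP into r layers $L_1,\dots,L_r$ of height $kn/r$

Assume time $T \le kn$ and w.l.o.g. that the BP is levelled ($\le 2^S$ nodes per level)

Partition (a subset of) the layers $L_j$ into sets $\Pi_1, \Pi_2, \dots, \Pi_p$, $p \ge 2$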

6

The Trace of an Input

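[Figure: levelled BP of height kn from source v0 to sinks 1 and 0, with layers L1, L2, …, L5 of height kn/r; the path on input x passes through nodes v1, v2, v3 where it moves between partition sets]

The trace of input x:

• the sequence of nodes reached on input x as the computation moves from one set $\Pi_i$ to another

• E.g. trace(x) = (v1, v2, v3)

• a = length of trace = # of alternations in the partition

• at most $2^{Sa}$ possible traces

Partition of (a subset of) the layers $L_j$ into sets $\Pi_1, \Pi_2, \dots, \Pi_p$, $p \ge 2$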
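The trace can be read off the levelled path, sketched here assuming every layer is assigned to some $\Pi_i$ and that r divides T (both simplifications of the slide's "subset of the layers"). `trace` and `max_traces` are hypothetical helper names.

```python
from typing import Hashable, List, Sequence


def trace(path: Sequence[Hashable], r: int, classes: Sequence[int]) -> List[Hashable]:
    """Nodes reached at the layer boundaries where the computation moves between sets.

    path[t] is the node at level t (t = 0..T) of a levelled BP; layer j (0-based)
    covers levels j*h .. (j+1)*h with h = T // r; classes[j] = i means layer j is in Pi_i.
    Assumes every layer is assigned and that r divides T.
    """
    T = len(path) - 1
    if T % r != 0 or len(classes) != r:
        raise ValueError("sketch assumes r divides T and one class per layer")
    h = T // r
    return [path[j * h] for j in range(1, r) if classes[j] != classes[j - 1]]


def max_traces(S: int, a: int) -> int:
    """Each entry is one of at most 2**S nodes on a fixed level, so at most 2**(S*a) traces."""
    return 2 ** (S * a)
```

Because the partition fixes the levels at which alternations happen, each trace entry ranges over at most $2^S$ nodes, which is where the $2^{Sa}$ count comes from.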

7

Branching program time-space lower bounds using these ideas

Oblivious (the same variable is queried at every node of a level): [Chandra-Furst-Lipton 83], [Alon-Maass 86],

[Babai-Nisan-Szegedy 89]

(Syntactic) read-k (no variable is queried more than k times on any path):

[Borodin-Razborov-Smolensky 89], [Okol'nishnikova 89]

General BPs: [B-Jayram-Saks 98], [Ajtai 99a], [Ajtai 99b],

[B-Saks-Sun-Vee 00], [B-Vee 02]

8

The Case of Oblivious BPs

[Figure: levelled BP of height kn from source v0 to sinks 1 and 0, with layers L1, L2, …, L5 of height kn/r and trace nodes v1, v2, v3]

Partition of the layers $L_j$ into sets $\Pi_1, \Pi_2, \dots, \Pi_p$, $p \ge 2$

When the BP is oblivious:

• Each $\Pi_i$ is associated with the subset $A_i$ of variables read in the levels in $\Pi_i$

• trace(x) can be used as the messages on input x in a communication protocol between p players computing f, where the i-th player has the values of the variables in $A_i$

9

The Oblivious Case

Let $C = \bigcap_{i \le p} A_i$ be the common variables for the players and $A'_i = A_i - C$

For any assignment to C, the trace can be used to compute f

Space bound: $S \ge CC(f; A'_1,\dots,A'_p)/a$ for any such partition

Want: $n - |A'_i|$ large for all i, and

a small # of alternations a
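The space bound is a counting step. Written out, assuming each trace entry is announced by the player responsible for the layers where that alternation occurs, and ignoring the one-bit final answer:

```latex
% Each of the a trace entries names one of at most 2^S nodes on a fixed level: S bits.
\mathrm{CC}(f; A'_1,\dots,A'_p) \;\le\; \underbrace{S + \dots + S}_{a\ \text{entries}} \;=\; Sa
\quad\Longrightarrow\quad
S \;\ge\; \frac{\mathrm{CC}(f; A'_1,\dots,A'_p)}{a}
```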

10

The Read-k Case

W.l.o.g. first make the read-k BP uniform: for any pair of nodes

u, v, the multi-set of variables queried between u and v is the same on any path

Call this set $A_{uv}$

Then apply levelling etc.

[Figure: two nodes u and v joined by several paths]

Add extra 'dummy' queries on each path if necessary

11

Read-k Case Argument Overview

Variation of the usual argument

First fix the node sequence $s = (v_0, v_1, \dots, v_r)$ for the r layers

This defines the sets of variables $A_{v_0v_1}, \dots, A_{v_{r-1}v_r}$ read during these layers

$f_s$ is an AND of functions defined on these sets of variables: a (k,r)-rectangle

Then choose a layer partition $\Pi_1, \Pi_2$ that is good for $A_{v_0v_1}, \dots, A_{v_{r-1}v_r}$

The subsequence of $(v_0, v_1, \dots, v_r)$ at the alternations forms the trace, which is also good

[Figure: BP with node sequence v0, v1, v2, v3, v4, …, vr at the layer boundaries, ending in sinks 1 and 0]

12

Partitioning the layers

r layers (of height kn/r)

Let Layers(x, i) be the set of layers in which variable $x_i$ is read on input x; $|\mathrm{Layers}(x,i)| \le k$

For a set $\Pi$ of layers:

$\mathrm{unread}(x,\Pi) = \{ i : \mathrm{Layers}(x,i) \cap \Pi = \emptyset \}$

$\mathrm{core}(x,\Pi) = \{ i : \mathrm{Layers}(x,i) \subseteq \Pi \}$

The partition is good if these are large for $\Pi = \Pi_1, \Pi_2$
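A sketch computing these sets for one input x, assuming the levelled path is given as the list of variables queried at levels 0, …, T−1, that r divides T, and that layers are numbered from 0. Function names and the example path are illustrative.

```python
from typing import Dict, List, Set


def layers(queries: List[int], r: int) -> Dict[int, Set[int]]:
    """Layers(x, i): queries[t] is the variable read at level t on input x (t = 0..T-1);
    layer j holds levels j*h .. (j+1)*h - 1 with h = T // r."""
    if len(queries) % r:
        raise ValueError("sketch assumes r divides T")
    h = len(queries) // r
    out: Dict[int, Set[int]] = {}
    for t, i in enumerate(queries):
        out.setdefault(i, set()).add(t // h)
    return out


def unread(lay: Dict[int, Set[int]], n: int, Pi: Set[int]) -> Set[int]:
    """{ i : Layers(x, i) and Pi are disjoint }"""
    return {i for i in range(n) if not (lay.get(i, set()) & Pi)}


def core(lay: Dict[int, Set[int]], n: int, Pi: Set[int]) -> Set[int]:
    """{ i : Layers(x, i) is a subset of Pi }"""
    return {i for i in range(n) if lay.get(i, set()) <= Pi}


# A read-2 path on n = 4 variables with T = 6 and r = 3 layers of height 2.
example_queries = [0, 1, 2, 0, 3, 1]
lay = layers(example_queries, 3)
```

With $\Pi_1 = \{0,1\}$ and $\Pi_2 = \{2\}$, both $\mathrm{core}(x,\Pi_1)$ and $\mathrm{unread}(x,\Pi_2)$ come out as {0, 2}, as expected when every layer is assigned to one of the two sets.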

13

How to partition the layers

Assign every layer to $\Pi_1$ or $\Pi_2$

$A = \mathrm{core}(x,\Pi_1) = \mathrm{unread}(x,\Pi_2)$

$B = \mathrm{core}(x,\Pi_2) = \mathrm{unread}(x,\Pi_1)$

C = set of variables read in common

Two techniques, both using the probabilistic method:

[Borodin-Razborov-Smolensky 89]: $|A|, |B| \ge n/2^{k+1}$, $a \le r \le k^2 2^{2k}$

[Okol'nishnikova 89]: $|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$, $r = 2k^2$

14

The Read-k Case: Fixing the Trace

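[Figure: levelled BP of height kn from source v0 to sinks 1 and 0, with layers L1, L2, …, L5 of height kn/r; the path passes through v1, v2, v3 and ends at vf]

Fix a node sequence and then partition the layers $L_j$ into sets $\Pi_1, \Pi_2$, yielding a trace t

Define $f_t(x) = 1 \iff f(x) = 1$ and x follows t

Again, by uniformity, the trace determines which variables are read in each component of the partition

$f_t(x) = g(x_{A\cup C}) \wedge h(x_{B\cup C})$

$f_t^{-1}(1)$ is a pseudo-rectangle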

15

Rectangles and Pseudo-rectangles

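Ordinary combinatorial rectangle in $\{0,1\}^n$:

Partition [n] into A and B: $R_A \times R_B$ for sets $R_A \subseteq \{0,1\}^A$ and $R_B \subseteq \{0,1\}^B$

Alternatively $\{x : x_A \in R_A \text{ and } x_B \in R_B\}$

Pseudo-rectangle: $[n] = D \cup E$, sets $R_D \subseteq \{0,1\}^D$ and $R_E \subseteq \{0,1\}^E$

$\{x : x_D \in R_D \text{ and } x_E \in R_E\}$

Or, partition [n] into A, B and C: $\{x : x_{A\cup C} \in R_{A\cup C} \text{ and } x_{B\cup C} \in R_{B\cup C}\}$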
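A brute-force sketch of the last form, assuming A, B, C partition the coordinates 0, …, n−1 and that each $R_{A\cup C}$, $R_{B\cup C}$ is given as a set of tuples listed in increasing coordinate order. The function names are hypothetical.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Point = Tuple[int, ...]


def proj(x: Point, S: Iterable[int]) -> Point:
    """x restricted to the coordinates in S, listed in increasing order."""
    return tuple(x[i] for i in sorted(S))


def pseudo_rectangle(n: int, A: Set[int], B: Set[int], C: Set[int],
                     R_AC: Set[Point], R_BC: Set[Point]) -> Set[Point]:
    """{x in {0,1}^n : x restricted to A|C is in R_AC and x restricted to B|C is in R_BC}."""
    AC, BC = A | C, B | C
    return {x for x in product((0, 1), repeat=n)
            if proj(x, AC) in R_AC and proj(x, BC) in R_BC}
```

For example, with A = {0}, B = {1}, C = {2} and both relations equal to "the two coordinates agree", the result is {000, 111}, which is not a product set for any split of the three coordinates: that is what the shared part C buys. With C empty the construction reduces to an ordinary rectangle.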

16

Read-k lower bounds

If f is computed by a (nondeterministic) read-k branching program of size $2^S$, then

the ones of f, $f^{-1}(1)$, can be covered by $2^{Sa}$ pseudo-rectangles R with |A| and |B| large and f(R) = 1:

$|A|, |B| \ge n/2^{k+1}$, $a \le k^2 2^{2k}$ [BRS 89]

$|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$ [Okol 89]

Prove an upper bound δ on the # of inputs in any such pseudo-rectangle on which f is constant 1

Then $2^S \ge (|f^{-1}(1)|/\delta)^{1/a}$, or $S \ge \frac{1}{a}\log(|f^{-1}(1)|/\delta)$
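The last step, written out: the at most $2^{Sa}$ pseudo-rectangles each contain at most δ ones of f, and together they cover all of them.

```latex
% f^{-1}(1) is covered by at most 2^{Sa} pseudo-rectangles, each with at most \delta points
|f^{-1}(1)| \;\le\; 2^{Sa}\,\delta
\;\Longrightarrow\;
2^{S} \;\ge\; \left(\frac{|f^{-1}(1)|}{\delta}\right)^{1/a}
\;\Longleftrightarrow\;
S \;\ge\; \frac{1}{a}\,\log_2\frac{|f^{-1}(1)|}{\delta}
```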

17

Lower bounds for general BPs [BST 98]

Major problem to handle: fixing the node sequence and the layer partition

does not fix the sets $A = \mathrm{core}(x,\Pi_1)$ or $B = \mathrm{core}(x,\Pi_2)$

Solutions: apply one layer partition for all inputs

Use an extension of the [BRS 89] partition method; ignore inputs for which the partition is bad

A probabilistic-method argument bounds the # of bad inputs; partition the remaining inputs based on the values of

$\mathrm{core}(x,\Pi_1)$ and $\mathrm{core}(x,\Pi_2)$ as well as on their traces

18


Number of rectangles increases: multiply $2^{Sa}$ by the number of choices of

$\mathrm{core}(x,\Pi_1)$ and $\mathrm{core}(x,\Pi_2)$; the a priori bound is $3^n$ since the sets are disjoint

Observation: a pseudo-rectangle w.r.t. A, B, C remains a pseudo-rectangle w.r.t. A', B', C' if $A' \subseteq A$, $B' \subseteq B$, and $C' = C \cup (A - A') \cup (B - B')$

Partition based on only the first $m = n/2^{k+1}$ elements of $\mathrm{core}(x,\Pi_1)$ and $\mathrm{core}(x,\Pi_2)$

# of choices is at most $\binom{n}{m,m} \le \binom{n}{m}^2$
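The observation can be checked mechanically on small examples. The sketch relies on a standard characterization, not taken from the slides: a set P is a pseudo-rectangle w.r.t. A, B, C exactly when, for any two points of P that agree on C, the point taking its A- and C-parts from the first and its B-part from the second is also in P. The example sets and helper names are hypothetical.

```python
from itertools import combinations, product
from typing import Iterable, Iterator, Set, Tuple

Point = Tuple[int, ...]


def proj(x: Point, S: Iterable[int]) -> Point:
    return tuple(x[i] for i in sorted(S))


def is_pseudo_rectangle(P: Set[Point], A: Set[int], B: Set[int], C: Set[int]) -> bool:
    """True iff P is closed under mixing: for x, y in P with the same C-part,
    the point equal to x on A and C and to y on B must also lie in P."""
    for x in P:
        for y in P:
            if proj(x, C) == proj(y, C):
                z = list(x)
                for i in B:
                    z[i] = y[i]
                if tuple(z) not in P:
                    return False
    return True


def subsets(s: Set[int]) -> Iterator[Set[int]]:
    items = sorted(s)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield set(c)


# Example pseudo-rectangle on n = 4 with A = {0, 1}, B = {2}, C = {3}.
A, B, C = {0, 1}, {2}, {3}
R_AC = {(0, 0, 0), (1, 0, 1), (1, 1, 1)}   # allowed values of (x0, x1, x3)
R_BC = {(0, 0), (1, 1), (0, 1)}            # allowed values of (x2, x3)
P = {x for x in product((0, 1), repeat=4)
     if proj(x, A | C) in R_AC and proj(x, B | C) in R_BC}
```

Enlarging C to C' only restricts which pairs must be mixed, and for those pairs the mixed point is the same one the original A, B, C check already required, so the check keeps passing for every $A' \subseteq A$, $B' \subseteq B$.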

19


If f is computed by a (nondeterministic) time-kn branching program of size $2^S$,

then most of $f^{-1}(1)$ can be covered by $2^{Sa}$

pseudo-rectangles with $|A| = |B| = m = n/2^{k+1}$, where $a \le k^2 2^{2k}$ (the cover is a partition if the program is deterministic)

# of pseudo-rectangles is at most $\binom{n}{m}^2 2^{Sa} \le 2^{4\log_2(n/m)\,m + Sa} = 2^{4(k+1)m + Sa}$

Is that good?
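A quick numerical check of the counting step, a sketch assuming $2^{k+1}$ divides n and $k \ge 1$: choosing two disjoint m-sets gives $\binom{n}{m}\binom{n-m}{m} \le \binom{n}{m}^2$ options, and $\binom{n}{m} \le (en/m)^m = (e\,2^{k+1})^m \le 2^{2(k+1)m}$ because $e \le 2^{k+1}$. The helper name `counts` is hypothetical.

```python
from math import comb


def counts(n: int, k: int):
    """For m = n / 2**(k+1): (# of ways to pick two disjoint m-sets, binom(n, m)**2,
    2**(4(k+1)m)), all as exact integers. Assumes 2**(k+1) divides n."""
    m = n >> (k + 1)
    if m << (k + 1) != n:
        raise ValueError("sketch assumes 2**(k+1) divides n")
    two_sets = comb(n, m) * comb(n - m, m)   # multinomial n! / (m! m! (n-2m)!)
    return two_sets, comb(n, m) ** 2, 2 ** (4 * (k + 1) * m)
```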

20

Using the Bound: Embedded Rectangles

Pseudo-rectangles are hard to reason about

Easier objects: embedded rectangles. Start with a pseudo-rectangle on A, B, C and fix an assignment σ to the common set C

We get a simpler object: a combinatorial rectangle $R_A \times R_B$ on $A \times B$, and an assignment σ to $C = \overline{A \cup B}$, the spine

The result is an embedded rectangle
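Fixing the spine can be sketched by grouping the points of a pseudo-rectangle by their C-part; each group is then exactly $R_A \times R_B$ for its own σ. The function name and example set are hypothetical, and A, B, C are assumed to partition the coordinates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[int, ...]


def proj(x: Point, S: Iterable[int]) -> Point:
    return tuple(x[i] for i in sorted(S))


def embedded_rectangles(P: Set[Point], A: Set[int], B: Set[int], C: Set[int]):
    """Split a pseudo-rectangle P (w.r.t. A, B, C) by its spine sigma = x_C.
    Returns {sigma: (R_A, R_B)}; each slice is exactly R_A x R_B with x_C = sigma."""
    slices: Dict[Point, Set[Point]] = defaultdict(set)
    for x in P:
        slices[proj(x, C)].add(x)
    out = {}
    for sigma, part in slices.items():
        R_A = {proj(x, A) for x in part}
        R_B = {proj(x, B) for x in part}
        # part is contained in R_A x R_B, so equal sizes mean equal sets
        if len(part) != len(R_A) * len(R_B):
            raise ValueError("slice is not a combinatorial rectangle")
        out[sigma] = (R_A, R_B)
    return out


# Example: {x in {0,1}^3 : x0 <= x2 and x1 <= x2} with A = {0}, B = {1}, C = {2}.
P = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)}
```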

21

Partition of most of f-1(1) into embedded rectangles

Input space is $D^n$

Each pseudo-rectangle can be partitioned into at most $|D|^{n-2m}$ embedded rectangles R with

$|A| = |B| = m = n/2^{k+1}$; A and B are the feet of R

Total number of such embedded rectangles partitioning most of $f^{-1}(1)$: at most $2^{4(k+1)m+Sa}|D|^{n-2m}$

Total number of inputs is $|D|^n$

Non-trivial only if, e.g., $|D| \ge 2^{3(k+1)}$: large domain

22

Lower bound on embedded rectangle size for which f is constant

Suppose $|f^{-1}(1)| \ge \delta |D|^n$

Since there are at most $2^{4(k+1)m+Sa}|D|^{n-2m}$ embedded rectangles, their average size is at least $\delta\, 2^{-4(k+1)m-Sa-1}|D|^{2m}$, and at least 1/4 of $f^{-1}(1)$ is covered by those of size at least

$\delta\, 2^{-4(k+1)m-Sa-2}|D|^{2m}$

Such a rectangle, defined by $(\sigma, A, B, R_A, R_B)$, must

have $|R_A|/|D|^m,\ |R_B|/|D|^m \ge \delta\, 2^{-4(k+1)m-Sa-2}$

Typical 2-party communication complexity results* say $|R_A|/|D|^m,\ |R_B|/|D|^m \le |D|^{-\epsilon m}$

*With extra work to handle σ and the easiest A, B
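The step from the size of a large rectangle to the density of each side, written out (|A| = |B| = m, so each side has at most $|D|^m$ points); the bound for $R_B$ is symmetric:

```latex
% R = R_A x R_B with spine sigma and |A| = |B| = m, so |R_A|, |R_B| <= |D|^m
|R_A|\,|R_B| \;\ge\; \delta\,2^{-4(k+1)m-Sa-2}\,|D|^{2m}
\quad\text{and}\quad |R_B| \le |D|^{m}
\;\Longrightarrow\;
\frac{|R_A|}{|D|^{m}} \;\ge\; \frac{\delta\,2^{-4(k+1)m-Sa-2}\,|D|^{2m}}{|D|^{m}\cdot|D|^{m}}
\;=\; \delta\,2^{-4(k+1)m-Sa-2}
```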

23

The time space tradeoff lower bounds [BST 98]

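Therefore, for such a hard f, $\delta\, 2^{-4(k+1)m-Sa-2} \le |D|^{-\epsilon m}$

So if δ is constant and $|D| \ge 2^{9(k+1)/\epsilon}$: $Sa \ge [\epsilon \log|D| - 4(k+1)]\,m - c \ge (\epsilon/2)\, m \log|D|$

Since $m = n/2^{k+1}$ and $a \le k^2 2^{2k}$, for some $C > 1$: $S \ge C^{-k}\, n \log|D|$

Therefore $T/n = k \ge c' \log((n \log|D|)/S)$, i.e. $T = \Omega\!\left(n \log \frac{n \log|D|}{S}\right)$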
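The first inequality chain, written out. Here c is read as $2 - \log_2\delta$, which is our interpretation of the slide's unnamed constant, and the last step needs m large enough:

```latex
% Take log_2 of  \delta 2^{-4(k+1)m-Sa-2} \le |D|^{-\epsilon m}, with c = 2 - \log_2\delta
\log_2\delta - 4(k+1)m - Sa - 2 \;\le\; -\epsilon m \log_2|D|
\;\Longrightarrow\;
Sa \;\ge\; [\epsilon\log_2|D| - 4(k+1)]\,m - c .
% If |D| \ge 2^{9(k+1)/\epsilon} then 4(k+1) \le \tfrac{4}{9}\epsilon\log_2|D|, hence
[\epsilon\log_2|D| - 4(k+1)]\,m - c \;\ge\; \tfrac{5}{9}\,\epsilon\, m\log_2|D| - c
\;\ge\; \tfrac{\epsilon}{2}\, m\log_2|D|
\quad\text{once } \tfrac{\epsilon}{18}\, m\log_2|D| \ge c .
```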

24

What functions are this hard?

Computing $x^T M x \equiv 0 \pmod q$, $q \ge n$ [BST 98]

Non-optimal bound when M is a Sylvester matrix
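A sketch of the quadratic-form function with a Sylvester matrix, assuming the standard ±1 Sylvester-Hadamard construction (entry (i, j) is $(-1)^{\langle i, j\rangle}$ over the bits of i and j); which exact matrix and modulus [BST 98] use is not recoverable from the slide. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def sylvester(log_n: int) -> Matrix:
    """Sylvester-type Hadamard matrix of order n = 2**log_n: entry (i, j) = (-1)**popcount(i & j)."""
    n = 1 << log_n
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)] for i in range(n)]


def quad_form_is_zero(x: List[int], M: Matrix, q: int) -> bool:
    """Decide x^T M x == 0 (mod q); Python's % already returns a value in 0..q-1."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n)) % q == 0
```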

Let $\epsilon < 1/2$ and $c \ge 2/(1 - H_2(\epsilon))$. HAM: $[n^c]^n \to \{0,1\}$: is any pair $(x_i, x_j)$ close in

Hamming distance, $\Delta(x_i, x_j) \le \epsilon c \log n$? Any two sets in $[n^c]^m$, each of density $n^{-m}$, contain a

pair of coordinates that are within $\epsilon c \log n$ of each other. Defined in [Ajtai 99a], where weaker lower bounds were proved

using a generalization of [Okol 89] instead of [BRS 89]. The best bounds follow immediately from [BST 98]
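A direct sketch of HAM, assuming each $x_i \in [n^c]$ is encoded as the binary representation of an integer (so a $(c\log_2 n)$-bit string) and that logarithms are base 2. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def hamming(u: int, v: int) -> int:
    """Hamming distance between the binary representations of u and v."""
    return bin(u ^ v).count("1")


def ham(xs: Sequence[int], n: int, c: float, eps: float) -> int:
    """1 iff some pair i != j has Hamming distance <= eps * c * log2(n)."""
    threshold = eps * c * math.log2(n)
    return int(any(hamming(u, v) <= threshold for u, v in combinations(xs, 2)))
```

With n = 4, c = 2 and ε = 1/4, the values are 4-bit strings and the threshold is 1: the input (0000, 0111, 1011, 1101) has all pairwise distances at least 2, so HAM is 0, while replacing 1101 by 0001 creates a pair at distance 1.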

25

What functions are this hard?

Computing $x^T M_y x \equiv 0 \pmod q$ for $x \in GF(q)^n$, $y \in GF(q)^{2n-1}$, $q \ge n$

Function defined in [Ajtai 99b]; the case q = 2 is used for the Boolean lower bounds

Key to the improvement: for some y, $M_y$ has better rigidity properties than Sylvester matrices have

Defining these matrices and analyzing their rigidity properties is the key contribution of [Ajtai 99b]

Most of the hard work in the Boolean lower bounds is in the second half of [Ajtai 99a], much of which does not fit in the STOC version

26

Ajtai’s matrices

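[Figure: the n × n matrix $M_y$, showing a region of 0 entries and the entries $y_1, y_2, \dots, y_n, \dots, y_{2n-1}$ laid out along its anti-diagonals]

$M_y$ is constant on anti-diagonals below the main diagonal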
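A sketch of one plausible reading of the figure: with 1-based indices, $M_y(i,j) = y_{i+j-1}$ for $i \ge j$ and 0 above the diagonal, which uses all 2n−1 entries of y. The exact placement, including whether the diagonal itself carries y-entries, is our assumption, and the function names are hypothetical.

```python
from typing import List


def ajtai_matrix(y: List[int]) -> List[List[int]]:
    """n x n matrix from y of length 2n - 1 (assumed reading of the figure):
    entry (i, j), 0-based, is y[i + j] on and below the main diagonal and 0 above it,
    so it is constant along each anti-diagonal i + j = const below the diagonal."""
    if len(y) % 2 == 0:
        raise ValueError("y must have odd length 2n - 1")
    n = (len(y) + 1) // 2
    return [[y[i + j] if i >= j else 0 for j in range(n)] for i in range(n)]


def is_zero_form(x: List[int], y: List[int], q: int) -> bool:
    """Decide x^T M_y x == 0 (mod q)."""
    M = ajtai_matrix(y)
    n = len(M)
    if len(x) != n:
        raise ValueError("x must have length n")
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n)) % q == 0
```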

27

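$x^T M_y x$ on an embedded $(m,\alpha)$-rectangle

[Figure: the matrix $M_y$ with the rows and columns indexed by A and B marked, acting on x]

For every assignment σ to the variables outside $A \cup B$: $f(x_{A\cup B}, \sigma, y)$

$= x_A^T M_{AB}\, x_B$

$+\, g(x_A, y) + h(x_B, y)$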
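A sketch verifying this split for a fixed integer matrix M (standing in for $M_y$ with y fixed). Both cross blocks are folded into one bilinear term with matrix $K = M_{AB} + M_{BA}^T$; that the slide's $M_{AB}$ denotes this combined block is our assumption. Terms with both indices in A or the spine go to g, and the remaining terms, which touch B but not A, go to h. Since the identity holds over the integers, it also holds mod q.

```python
from typing import Dict, List, Sequence

Matrix = List[List[int]]


def split_quadratic(M: Matrix, A: List[int], B: List[int], sigma: Dict[int, int]):
    """With the variables outside A and B fixed by sigma (index -> value), write
    x^T M x = cross(x_A, x_B) + g(x_A) + h(x_B), where
    cross(x_A, x_B) = sum over a in A, b in B of x_a (M[a][b] + M[b][a]) x_b."""
    n = len(M)
    S = [i for i in range(n) if i not in A and i not in B]   # spine coordinates

    def assemble(xA: Sequence[int], xB: Sequence[int]) -> List[int]:
        x = [0] * n
        for i in S:
            x[i] = sigma[i]
        for i, v in zip(A, xA):
            x[i] = v
        for i, v in zip(B, xB):
            x[i] = v
        return x

    def cross(xA: Sequence[int], xB: Sequence[int]) -> int:
        return sum(xa * (M[a][b] + M[b][a]) * xb
                   for a, xa in zip(A, xA) for b, xb in zip(B, xB))

    def g(xA: Sequence[int]) -> int:  # all terms with both indices in A or the spine
        x, idx = assemble(xA, [0] * len(B)), list(A) + S
        return sum(x[i] * M[i][j] * x[j] for i in idx for j in idx)

    def h(xB: Sequence[int]) -> int:  # remaining terms: at least one index in B, none in A
        x, idx = assemble([0] * len(A), xB), list(B) + S
        return sum(x[i] * M[i][j] * x[j] for i in idx for j in idx if i in B or j in B)

    return cross, g, h
```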

28

Rectangles, rank, & rigidity

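Largest rectangle on which $x_A^T M x_B$ is constant has density $\le q^{-\mathrm{rank}(M)}$ [BRS 89]

Lemma [Ajtai 99b]: one can fix y such that every $\epsilon n \times \epsilon n$ minor $M_{AB}$ of $M_y$ has $\mathrm{rank}(M_{AB}) \ge c\,\epsilon n/\log_2(1/\epsilon) \ge \epsilon^{1+\delta} n$, better than the comparable rigidity bound of $\epsilon^2 n$ for

Sylvester matrices [BRS 89], [BST 98]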
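Since the rank over GF(q) is what controls the largest constant rectangle, here is a small sketch computing it by Gauss-Jordan elimination, assuming q is prime (the modular inverse `pow(a, -1, p)` needs Python 3.8+). `max_constant_density` simply evaluates the slide's bound $q^{-\mathrm{rank}(M)}$; both names are hypothetical.

```python
from typing import List


def rank_mod_p(M: List[List[int]], p: int) -> int:
    """Rank of M over GF(p), p prime, by Gauss-Jordan elimination."""
    A = [[v % p for v in row] for row in M]
    rows = len(A)
    cols = len(A[0]) if A else 0
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if A[r][c]), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][c], -1, p)             # modular inverse of the pivot
        A[rank] = [v * inv % p for v in A[rank]]
        for r in range(rows):
            if r != rank and A[r][c]:
                factor = A[r][c]
                A[r] = [(v - factor * w) % p for v, w in zip(A[r], A[rank])]
        rank += 1
        if rank == rows:
            break
    return rank


def max_constant_density(M: List[List[int]], q: int) -> float:
    """The slide's bound: a rectangle on which x_A^T M x_B is constant has density <= q**-rank(M)."""
    return float(q) ** -rank_mod_p(M, q)
```

For instance, the 4 × 4 Sylvester matrix has full rank 4 mod 3 (its determinant is ±16 ≡ ±1), but rank 1 mod 2, where all its ±1 entries become 1.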

29

How to partition the layers

Assign every layer to $\Pi_1$ or $\Pi_2$

$A = \mathrm{core}(x,\Pi_1) = \mathrm{unread}(x,\Pi_2)$

$B = \mathrm{core}(x,\Pi_2) = \mathrm{unread}(x,\Pi_1)$

C = set of variables read in common

Two techniques for the read-k case, both using the probabilistic method:

[Borodin-Razborov-Smolensky 89]: $|A|, |B| \ge n/2^{k+1}$, $a \le r \le k^2 2^{2k}$

[Okol'nishnikova 89]: $|A| \ge n/k^{O(k)}$, $|B| \ge n/2$, $a = 2k$, $r = 2k^2$

30

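Read-k case: Branching program with node sequence

[Figure: levelled BP of height kn with node sequence v0, v1, v2, …, vr−1, vr at the boundaries of layers L1, L2, …, Lr, each of height kn/r, ending in sinks 1 and 0]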


32

Partitioning the layers [Okol’nishnikova 89]

Fix a node sequence s and an input x that follows s. Choose a random subset $\Pi_1$ of k of the r

layers. For each index i,

$\Pr[\mathrm{Layers}(x,i) \subseteq \Pi_1] \ge 1/\binom{r}{k}$

Thus

$E\big[\#\{i : \mathrm{Layers}(x,i) \subseteq \Pi_1\}\big] \ge n/\binom{r}{k}$

Fix a partition achieving the average
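The per-variable probability can be computed exactly: a fixed set of $\ell \le k$ layers lies inside a uniformly random k-subset of the r layers with probability $\binom{r-\ell}{k-\ell}/\binom{r}{k}$, which is smallest, namely $1/\binom{r}{k}$, when $\ell = k$. A minimal sketch (the name `prob_contained` is hypothetical):

```python
from fractions import Fraction
from math import comb


def prob_contained(r: int, k: int, l: int) -> Fraction:
    """Pr[a fixed set of l layers lies inside a uniformly random k-subset of the r layers]."""
    if l > k:
        return Fraction(0)
    return Fraction(comb(r - l, k - l), comb(r, k))
```

For r = 8 and k = 2, a variable read in 2 layers is caught with probability 1/28, one read in a single layer with probability 1/4.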

33

Partitioning the layers [Okol’nishnikova 89]

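I.e., for each such x, $|\mathrm{core}(x,\Pi_1)| \ge n/\binom{r}{k}$

Only k layers of height kn/r, so at most a = 2k alternations

In total at most $k^2 n/r \le n/2$ variables are read in $\Pi_1$ if $r = 2k^2$

So $|\mathrm{core}(x,\Pi_2)| \ge n/2$ and $|\mathrm{core}(x,\Pi_1)| \ge n/\binom{2k^2}{k} \ge n/k^{O(k)}$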
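The two estimates, written out with the standard bound $\binom{N}{k} \le (eN/k)^k$:

```latex
% r = 2k^2
\binom{2k^2}{k} \;\le\; \left(\frac{e\cdot 2k^2}{k}\right)^{k} \;=\; (2ek)^k \;=\; k^{O(k)},
\qquad
k \cdot \frac{kn}{r} \;=\; \frac{k^2 n}{2k^2} \;=\; \frac{n}{2}
```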

34

Partitioning the layers [BRS 89]

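Assign each layer independently: $\Pr[L_j \in \Pi_1] = \Pr[L_j \in \Pi_2] = 1/2$

For $\Pi = \Pi_1$ or $\Pi_2$:

Let $\chi_i = 1$ if $\mathrm{Layers}(x,i) \subseteq \Pi$ and 0 otherwise

$\Pr[\chi_i = 1] = \Pr[\mathrm{Layers}(x,i) \subseteq \Pi] \ge 1/2^k$

since each variable is read in at most k layers

$E[\sum_i \chi_i] = E[\#\{i : \mathrm{Layers}(x,i) \subseteq \Pi\}] \ge n/2^k$

i.e., $E[|\mathrm{core}(x,\Pi)|] \ge n/2^k$

and $E[|\mathrm{unread}(x,\Pi)|] \ge n/2^k$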
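A Monte Carlo sketch of the random assignment, assuming the sets Layers(x, i) are given directly as sets of layer numbers; it compares the empirical mean of $|\mathrm{core}(x,\Pi_1)|$ with the exact value $\sum_i 2^{-|\mathrm{Layers}(x,i)|}$. The helper name and example are hypothetical, and the fixed seed makes the run reproducible.

```python
import random
from typing import List, Set


def avg_core_size(layer_sets: List[Set[int]], r: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E|core(x, Pi_1)|: each of the r layers joins Pi_1 with
    probability 1/2, independently; layer_sets[i] = Layers(x, i)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        Pi1 = {j for j in range(r) if rng.random() < 0.5}
        total += sum(1 for L in layer_sets if L <= Pi1)
    return total / trials


# A read-2 example with n = 8 variables and r = 6 layers.
example = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}, {0}, {3}]
exact = sum(2.0 ** -len(L) for L in example)   # 6 * 1/4 + 2 * 1/2 = 2.5
```

Here k = 2, so the slide's guarantee is $n/2^k = 2$, below the exact mean 2.5 because two variables are read in only one layer.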

35

Modification for general BP [BST 98]

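Let $\ell(i) = |\mathrm{Layers}(x,i)|$; $\sum_i \ell(i) \le kn$

$\Pr[\chi_i = 1] = \Pr[\mathrm{Layers}(x,i) \subseteq \Pi] = 2^{-\ell(i)}$

$E[|\mathrm{core}(x,\Pi)|] = E[\sum_i \chi_i] = \sum_i 2^{-\ell(i)}$

By the arithmetic-geometric mean inequality this is $\ge n\big(\prod_i 2^{-\ell(i)}\big)^{1/n} = n\,2^{-\sum_i \ell(i)/n} \ge n/2^k$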
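A numerical sketch of the AM-GM step, with the values $\ell(i)$ chosen by hand for illustration: the exact expectation $\sum_i 2^{-\ell(i)}$ dominates $n\,2^{-\sum_i \ell(i)/n}$, which in turn dominates $n/2^k$ whenever $\sum_i \ell(i) \le kn$. Function names are hypothetical.

```python
from typing import List


def expected_core(ls: List[int]) -> float:
    """E|core(x, Pi)| = sum_i 2^-l(i) when each layer joins Pi independently with prob 1/2."""
    return sum(2.0 ** -l for l in ls)


def amgm_lower_bound(ls: List[int]) -> float:
    """n (prod_i 2^-l(i))^(1/n) = n 2^(-sum_i l(i) / n)."""
    n = len(ls)
    return n * 2.0 ** (-sum(ls) / n)
```

For ℓ = (0, 1, 5, 2, 2, 6), n = 6 and $\sum_i \ell(i) = 16 \le 3n$, the three quantities are about 2.05, 0.94 and 6/8 = 0.75.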

36

Second Moment Method [BRS 89][BST 98]

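If r is big enough, $|\mathrm{core}(x,\Pi)|$ is concentrated around its mean. Bound $\mathrm{Var}[|\mathrm{core}(x,\Pi)|] = \mathrm{Var}[\sum_i \chi_i]$

The events for $\chi_i$ and $\chi_j$ are correlated only if $x_i$ and $x_j$ are read in the same layer

At most $\ell(i)\,kn/r$ variables are read in the same layer as $x_i$

Each contributes at most $\Pr[\chi_i = 1] = 2^{-\ell(i)}$ to the variance

$\mathrm{Var}[\sum_i \chi_i] \le (kn/r) \sum_i \ell(i)\, 2^{-\ell(i)}$

$\le (k/r) \big(\sum_j \ell(j)\big) \sum_i 2^{-\ell(i)}$

$\le (k^2 n/r) \sum_i 2^{-\ell(i)} = (k^2 n/r)\, E[|\mathrm{core}(x,\Pi)|]$

The second step is an FKG-like inequality of Chebyshev: the terms $\ell(i)$ and $2^{-\ell(i)}$ are anti-correlated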
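A numerical sketch of the variance chain for hand-picked $\ell(i)$, k and r (all illustrative). The first step is Chebyshev's sum inequality for the oppositely ordered sequences $\ell(i)$ and $2^{-\ell(i)}$, namely $n\sum_i \ell(i)2^{-\ell(i)} \le \sum_j \ell(j) \sum_i 2^{-\ell(i)}$; the second uses $\sum_j \ell(j) \le kn$.

```python
from typing import List, Tuple


def variance_chain(ls: List[int], k: int, r: int) -> Tuple[float, float, float]:
    """The three right-hand sides of the variance bound, for l(i) = ls[i]:
    (kn/r) sum_i l(i) 2^-l(i) <= (k/r)(sum_j l(j)) sum_i 2^-l(i) <= (k^2 n / r) sum_i 2^-l(i).
    The second inequality needs sum_j l(j) <= k n."""
    n = len(ls)
    e_core = sum(2.0 ** -l for l in ls)
    first = (k * n / r) * sum(l * 2.0 ** -l for l in ls)
    second = (k / r) * sum(ls) * e_core
    third = (k * k * n / r) * e_core
    return first, second, third
```

When all $\ell(i)$ are equal and $\sum_j \ell(j) = kn$, the three values coincide, so neither step can be improved in general.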

38

The Boolean case is much harder

[BST 98] showed only $T \ge 1.017n$ for $S = o(n)$ for the quadratic form problem. Uses pseudo-rectangles but is specialized to splitting the BP only

at the T/2 level; deterministic

[Ajtai 99a] Shows lower bounds for Element Distinctness over $[n^2]$ that work for density $2^{-m}$

Embedded rectangles, not pseudo-rectangles; deterministic. [Ajtai 99b]: $T = O(n) \Rightarrow S = \Omega(n)$ for Boolean BPs!!!

[B-Saks-Sun-Vee 00] Improved bounds and extension to the O(n/T)-error randomized case (talk later)

39

Power of the Large Domain Technique

For oblivious BPs, the best bound using two-party CC is $T = \Omega(n \log(n/S))$ [Alon-Maass 86]

Bounds match for general BPs over large domains

Best oblivious BP bounds use multiparty CC: $T = \Omega(n \log^2(n/S))$ [Babai-Nisan-Szegedy 89]. [B-Vee 02]: matching bounds for general BPs over

large domains (Erik Vee talk later)
