Smooth Boolean Functions are Easy:Efficient Algorithms for Low-Sensitivity Functions
Rocco Servedio
Joint work with
Parikshit Gopalan (MSR), Noam Nisan (MSR / Hebrew University),
Kunal Talwar (Google), Avi Wigderson (IAS)
ITCS 2016
The star of our show:
f: {0,1}^n → {0,1}
“Complexity” and “Boolean functions”
Certificate complexity
Decision tree depth (deterministic, randomized, quantum)
Sensitivity
Block sensitivity
PRAM complexity
Real Polynomial degree (exact, approximate)
…
Complexity measures: combinatorial/analytic ways to “get a handle on how complicated f is”
All lie in {0,1,…,n} for n-variable Boolean functions.
“Complexity” and “Boolean functions” revisited
Unrestricted circuit size
Unrestricted formula size
AC0 circuit size
DNF size
…
Complexity classes: computational ways to “get a handle on how complicated f is”
All lie in {0,1,…,2^n} for n-variable Boolean functions.
High-level summary of this work:
A computational perspective on a classic open question about complexity measures of Boolean functions….
...namely, the sensitivity conjecture of [NisanSzegedy92].
Background: Complexity measures
Certificate complexity
Decision tree depth (deterministic, randomized, quantum)
Block sensitivity
PRAM complexity
Real Polynomial degree (exact, approximate)
Fundamental result(s) in Boolean function complexity:
For any Boolean function f, the above complexity measures are all polynomially related to each other.
Examples: DT-depth and real degree
• DT-depth(f) = minimum depth of any decision tree computing f
• deg_R(f) = degree of the unique real multilinear polynomial computing f: {0,1}^n → {0,1}
[Figure: a decision tree with 0/1 leaves. DT-depth is 4]
DT-depth and real degree are polynomially related
[Figure: the same depth-4 decision tree, with one 1-leaf highlighted]
(Lower bound is trivial: for each 1-leaf at depth d, there is a degree-d polynomial outputting 1 iff its input reaches that leaf, else 0. Sum these.)
Polynomial for this leaf: x_1·x_2·(1−x_4)·x_3
Theorem: [NisanSzegedy92,NisanSmolensky,Midrijanis04] For any Boolean function f,
deg_R(f) ≤ DT-depth(f) ≤ 2·deg_R(f)^3.
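Both measures in the theorem can be computed by brute force on small functions. A minimal sketch (the function name `real_degree` and the examples are mine, not from the talk; Möbius inversion over subsets recovers the multilinear coefficients):

```python
from itertools import product

def real_degree(f, n):
    """Degree of the unique multilinear real polynomial agreeing with f on
    {0,1}^n.  By Mobius inversion, the coefficient of prod_{i in S} x_i is
    sum over subsets T of S of (-1)^{|S|-|T|} * f(indicator of T)."""
    deg = 0
    for S in product([0, 1], repeat=n):            # S as a 0/1 indicator
        idx = [i for i in range(n) if S[i]]
        coeff = 0
        for bits in product([0, 1], repeat=len(idx)):   # subsets T of S
            T = [0] * n
            for i, b in zip(idx, bits):
                T[i] = b
            coeff += (-1) ** (len(idx) - sum(bits)) * f(tuple(T))
        if coeff != 0:
            deg = max(deg, sum(S))
    return deg

# AND of 3 bits has real degree 3 (polynomial x_1 x_2 x_3).
print(real_degree(lambda x: x[0] & x[1] & x[2], 3))  # 3
# XOR of 2 bits has real degree 2 (polynomial x_1 + x_2 - 2 x_1 x_2).
print(real_degree(lambda x: x[0] ^ x[1], 2))         # 2
```

Pairing this with an exhaustive decision-tree search gives a direct (if exponential-time) check of the polynomial relationship on small n.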
An outlier among complexity measures: Sensitivity
• s(f,x) = sensitivity of f at x = number of neighbors y of x such that f(x) ≠ f(y)
• s(f) = (max) sensitivity of f = max of s(f,x) over all x in {0,1}n
Folklore: s(f) ≤ DT-depth(f).
Question: [Nisan91, NisanSzegedy92]
Is DT-depth(f) ≤ poly(s(f))?
The sensitivity conjecture
Conjecture: DT-depth(f) ≤ poly(s(f)).
Despite much effort, the best known upper bounds are exponential in sensitivity:
• [Simon82]: # relevant variables ≤ s(f)·4^{s(f)}
• [KenyonKutin04]: bs(f) ≤ e^{s(f)}
• [AmbainisBavarianGaoMaoSunZuo14]: bs(f), C(f) ≤ s(f)·2^{s(f)−1}, deg(f) ≤ 2^{s(f)(1+o(1))}
• [AmbainisPrusisVihrovs15]: bs(f) ≤ (s(f) − 1/3)·2^{s(f)−1}
Equivalently, block sensitivity, certificate complexity, real degree, approximate degree, randomized/quantum DT-depth…
This work: Computational view on the sensitivity conjecture
Previous approaches: combinatorial/analytic. But the conjecture is also a strong computational statement: low-sensitivity functions are very easy to compute!
Conjecture: DT-depth(f) ≤ poly(s(f)).
The conjecture implies that every sensitivity-s function has circuits of size n^{poly(s)}.
Prior to this work, even that seems not to have been known. In fact, even an upper bound on the # of sensitivity-s functions seems not to have been known.
Results
Theorem: Every n-variable sensitivity-s function is computed by a Boolean circuit of size n^{O(s)}. In fact, every such function is computed by a Boolean formula of depth O(s log n).
So now the picture is: [figure of the complexity-measure landscape, with question marks for the still-open relationships]
Results (continued)
Theorem: Any n-variable sensitivity-s function can be self-corrected from a 2^{−cs} fraction of worst-case errors using n^{O(s)} queries and runtime.
(Conjecture ⇒ low-sensitivity f has low deg_R ⇒ has low deg_2 ⇒ has a self-corrector.)
All results are fairly easy. (Lots of directions for future work!)
Circuit/formula size bounds are consequences of the conjecture. Another consequence of the conjecture:
Simple but crucial insight
Fact: If f has sensitivity s, then f(x) is completely determined once you know f’s values on any 2s+1 neighbors of x.
[Figure: x with 2s+1 of its neighbors, some where f = 0 and some where f = 1]
Among the 2s+1 neighbors, either at least s+1 have f = 0 or at least s+1 have f = 1. The value of f(x) must equal this majority value! (If it disagreed, we would have s(f) ≥ s(f,x) ≥ s+1.)
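This majority-vote fact is easy to check by brute force on small examples. A minimal Python sketch (the "dictator" function f(x) = x_0 is my illustrative choice, not from the talk):

```python
from itertools import product, combinations

def sensitivity(f, n):
    """Max over all x of the number of neighbors y with f(y) != f(x)."""
    best = 0
    for x in product([0, 1], repeat=n):
        flips = sum(f(x[:i] + (1 - x[i],) + x[i+1:]) != f(x) for i in range(n))
        best = max(best, flips)
    return best

def majority_rule_holds(f, n, s):
    """For every x and every choice of 2s+1 neighbors of x,
    f(x) must equal the majority of f over those neighbors."""
    for x in product([0, 1], repeat=n):
        nbrs = [x[:i] + (1 - x[i],) + x[i+1:] for i in range(n)]
        for chosen in combinations(nbrs, 2 * s + 1):
            vals = [f(y) for y in chosen]
            if int(sum(vals) > s) != f(x):   # majority of 2s+1 bits
                return False
    return True

# Dictator function f(x) = x_0: sensitivity 1, so any 3 neighbors suffice.
f = lambda x: x[0]
print(sensitivity(f, 5))             # 1
print(majority_rule_holds(f, 5, 1))  # True
```

Here at most one of any 3 neighbors flips coordinate 0, so the majority always reproduces x_0.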
Theorem: Every sensitivity-s function on n variables is uniquely specified by its values on any Hamming ball of radius 2s.
[Figure: the cube from 0^n to 1^n; weight levels 0,…,2s are given, and each point at weight level 2s+1 has 2s+1 down-neighbors, so its value follows by majority vote]
Fill in all of {0,1}^n this way, level by level.
Corollary: There are at most 2^(n choose ≤2s) sensitivity-s functions over {0,1}^n, where (n choose ≤2s) = Σ_{i≤2s} (n choose i) is the size of a radius-2s Hamming ball.
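The level-by-level filling turns directly into code. A sketch, again using the hypothetical sensitivity-1 dictator f(x) = x_0 as the example (all names here are mine):

```python
from itertools import product

def reconstruct(ball_values, n, s):
    """Given f on every point of Hamming weight <= 2s, fill in the rest of
    {0,1}^n level by level.  A point at weight w >= 2s+1 has w >= 2s+1
    down-neighbors (flip one 1 to 0); by the majority rule, f there is the
    majority of f over any 2s+1 of them."""
    table = dict(ball_values)
    for w in range(2 * s + 1, n + 1):
        for x in product([0, 1], repeat=n):
            if sum(x) != w:
                continue
            down = [x[:i] + (0,) + x[i+1:] for i in range(n) if x[i] == 1]
            votes = [table[y] for y in down[: 2 * s + 1]]
            table[x] = int(sum(votes) > s)
    return table

# Recover the dictator f(x) = x_0 (sensitivity 1) from its radius-2 ball.
f = lambda x: x[0]
n, s = 5, 1
ball = {x: f(x) for x in product([0, 1], repeat=n) if sum(x) <= 2 * s}
table = reconstruct(ball, n, s)
print(all(table[x] == f(x) for x in product([0, 1], repeat=n)))  # True
```

The same code, run over all 2^(n choose ≤2s) possible ball labelings, would also witness the counting corollary: each labeling extends to at most one function.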
Can we use this insight to compute
sensitivity-s functions efficiently?
Small circuits for sensitivity-s functions
Theorem: Every n-variable sensitivity-s function has a circuit of size O(s·n^{2s+1}).
The algorithm has the values of f on the bottom 2s+1 layers “hard-coded” in. Bottom 2s+1 layers = Hamming ball centered at 0^n.
Algorithm: For |x| stages,
• Shift the center of the Hamming ball along a shortest path to x
• Use the values of f on the previous Hamming ball to compute the values on the new ball (at most n^{2s} new values to compute; each one easy using majority vote)
[Figure: the ball centered at 0^n shifting toward x, computing at most n^{2s} new values per shift]
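A brute-force rendering of the ball-shifting evaluation (it enumerates all of {0,1}^n at every shift, so it only illustrates correctness, not the O(s·n^{2s+1}) circuit size; keeping the ball at radius 2s+1, so that every entering point already has 2s+1 known neighbors, is my choice of bookkeeping):

```python
from itertools import product

def evaluate_by_ball_shift(hardcoded, n, s, x):
    """Keep f's values on a Hamming ball of radius R = 2s+1, initially
    centered at 0^n (hard-coded), and shift the center one coordinate at
    a time along a shortest path to x.  Each point entering the ball is
    at distance R from the new center and R+1 from the old one, so its
    R = 2s+1 neighbors lying one step closer to the new center are all
    already known; a majority vote over them gives its value."""
    R = 2 * s + 1
    dist = lambda a, b: sum(ai != bi for ai, bi in zip(a, b))
    center = (0,) * n
    table = dict(hardcoded)        # f on the radius-R ball around 0^n
    for i in range(n):
        if x[i] == center[i]:
            continue
        center = center[:i] + (x[i],) + center[i+1:]
        for z in product([0, 1], repeat=n):
            if dist(z, center) == R and z not in table:
                closer = [z[:j] + (1 - z[j],) + z[j+1:] for j in range(n)
                          if z[j] != center[j]]     # exactly R of them
                votes = [table[y] for y in closer]
                table[z] = int(sum(votes) > s)      # majority of 2s+1
    return table[x]

# Example: dictator f(x) = x_0, sensitivity 1, hard-coded radius 3.
f = lambda x: x[0]
n, s = 5, 1
hard = {z: f(z) for z in product([0, 1], repeat=n) if sum(z) <= 2 * s + 1}
print(evaluate_by_ball_shift(hard, n, s, (1, 1, 1, 1, 1)))  # 1
```

The loop maintains the invariant that the table covers the whole current ball, which is exactly what the majority-vote step needs.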
Shallow circuits for sensitivity-s functions?
The algorithm we just saw seems inherently sequential – takes n stages.
Can we parallelize?
Yes, by being bolder: go n/s levels at each stage rather than one.
Extension of earlier key insight
Sensitivity-s functions are noise-stable at every input x.
• Pick any vertex x.
• Flip n/(11s) random coordinates to get y.
• View the t-th coordinate flipped as chosen from the ‘untouched’ n − t + 1 coordinates. At each stage, at most s coordinates are sensitive. Get
Pr[f(x) ≠ f(y)] ≤ Pr[stage 1 flips f] + Pr[stage 2 flips f] + …
≤ s/n + s/(n−1) + … + s/(n − n/(11s) + 1)
< 1/10.
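The telescoping sum can be sanity-checked numerically with exact rational arithmetic (`flip_prob_bound` is my name for the slide's union bound):

```python
from fractions import Fraction

def flip_prob_bound(n, s):
    """The slide's union bound: Pr[f(x) != f(y)] is at most the sum over
    the n/(11s) flip stages of s/(n - t + 1), for t = 1, ..., n/(11s)."""
    stages = n // (11 * s)
    return sum(Fraction(s, n - t + 1) for t in range(1, stages + 1))

# The bound approaches s * ln(1 / (1 - 1/(11s))) ~ 1/11, safely below 1/10.
b = flip_prob_bound(10_000, 3)
print(float(b))             # roughly 0.09
print(b < Fraction(1, 10))  # True
```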
Downward walks
Similar statement holds for “random downward walks.”
• Pick any vertex x with |x| many ones.
• Flip |x|/(11s) randomly chosen 1’s to 0’s to get y.
• View the t-th coordinate flipped as chosen from the ‘untouched’ |x| − t + 1 coordinates. Get
Pr[f(x) ≠ f(y)] ≤ s/|x| + s/(|x|−1) + … + s/(|x| − |x|/(11s) + 1)
< 1/10.
Shallow circuits for sensitivity-s functions
Theorem: Every n-variable sensitivity-s function has a formula of depth O(s log n).
The algorithm has the values of f on the bottom 11s layers “hard-coded” in.
Parallel-Alg: Given x,
• If |x| < 11s, return the hard-coded value.
• Sample C = O(1) points x_1, x_2, …, x_C from a “downward random walk” of length |x|/(11s). Call Parallel-Alg on each one.
• Return the majority vote of the C results.
[Figure: x taking downward random walks toward the hard-coded weight levels 0,…,11s]
• Parallel-Alg(x) = f(x) with probability ≥ 19/20, for all x (proof: induction on |x|)
• After O(s log n) stages the recursion bottoms out in the hard-coded “red zone”, so the parallel runtime is O(s log n)
• C = O(1), so the total work is C^{O(s log n)} = n^{O(s)}
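Parallel-Alg reads naturally as recursive Python. A sketch (C = 5 is an illustrative constant, not the one from the analysis, and the base case simply calls f to stand in for the hard-coded table):

```python
import random

def parallel_alg(f, n, s, x, C=5, rng=random):
    """If |x| < 11s, return the hard-coded value (here: just call f).
    Otherwise take C independent downward random walks of length
    |x|/(11s) -- flip that many random 1s to 0s -- recurse on each
    endpoint, and return the majority vote of the C results."""
    w = sum(x)
    if w < 11 * s:
        return f(x)                       # stand-in for hard-coded values
    steps = max(1, w // (11 * s))
    results = []
    for _ in range(C):
        y = list(x)
        ones = [i for i in range(n) if y[i] == 1]
        for i in rng.sample(ones, steps): # one downward random walk
            y[i] = 0
        results.append(parallel_alg(f, n, s, tuple(y), C, rng))
    return int(sum(results) > C // 2)     # majority vote

# Deterministic sanity check: on a constant function every vote agrees.
print(parallel_alg(lambda x: 1, 16, 1, (1,) * 16))  # 1
```

The recursion depth is logarithmic in |x| because each level shrinks the weight by a (1 − 1/(11s)) factor, matching the O(s log n) bound above.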
Conclusion / Questions
Many questions remain about computational properties of low-sensitivity functions.
We saw there are at most 2^(n choose ≤2s) many sensitivity-s functions.
Can this bound be sharpened?
We saw every sensitivity-s function has a formula of depth O(s log n).
Does every such function have a
• TC0 circuit / AC0 circuit / DNF / decision tree of size n^{poly(s)}?
• PTF of degree poly(s)? DNF of width poly(s)? GF(2) polynomial of degree poly(s)?
A closing puzzle/request
We saw that sensitivity-s functions obey a “majority rule” (f(x) = MAJ of f over any 2s+1 neighbors of x).
It is well known that degree-d functions obey a “parity rule” (the parity of f over any (d+1)-dimensional subcube must equal 0).
If the conjecture is true, then low-sensitivity functions have low degree…
…and these two very different-looking rules must coincide!
Explain this!
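The parity rule itself is easy to test exhaustively. A sketch (the degree-2 example f = x_0 AND x_1 is mine; the check uses GF(2) degree, which lower-bounds the real degree):

```python
from itertools import product, combinations

def parity_rule_holds(f, n, d):
    """XOR of f over every (d+1)-dimensional subcube of {0,1}^n;
    it vanishes on all of them whenever f has (GF(2)) degree at most d."""
    for free in combinations(range(n), d + 1):      # subcube directions
        fixed = [i for i in range(n) if i not in free]
        for base in product([0, 1], repeat=len(fixed)):
            total = 0
            for vals in product([0, 1], repeat=d + 1):
                x = [0] * n
                for i, b in zip(fixed, base):
                    x[i] = b
                for i, v in zip(free, vals):
                    x[i] = v
                total ^= f(tuple(x))
            if total != 0:
                return False
    return True

# f = x_0 AND x_1 has degree 2: 3-dim subcubes XOR to 0, 2-dim ones need not.
f = lambda x: x[0] & x[1]
print(parity_rule_holds(f, 4, 2))  # True
print(parity_rule_holds(f, 4, 1))  # False
```

Running this next to the majority-rule check above makes the puzzle concrete: two local rules, one threshold-like and one linear, certified on the same functions.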
Thank you for your attention