1
Provably hard problems below the satisfiability threshold
Paul Beame Univ. of Washington
Michael Molloy Univ. of Toronto
Dimitris Achlioptas Microsoft Research
A sharp threshold in proof complexity yields
lower bounds for satisfiability search
2
CNF Satisfiability
• (x1 ∨ x2 ∨ x4) ∧ (x1 ∨ x3) ∧ (x3 ∨ x2) ∧ (x4 ∨ x3)
• NP-complete, but many heuristics exist because of its practical importance
– presumably exponential in the worst case
• If you know the formula is satisfiable
– How hard is it to find an assignment?
– No lower bounds known for interesting heuristics.
3
Satisfiability Algorithms
• Local search (incomplete)
– GSAT [Selman,Levesque,Mitchell 92]
– Walksat [Kautz,Selman 96]
• Backtracking search (complete)
– DPLL [Davis,Putnam 60] [Davis,Logemann,Loveland 62]
– DPLL + “clause learning”
Backtracking search/DPLL
• Free step: select* a literal l (some x or ¬x)
– Remove all clauses containing l
– Shrink all clauses containing ¬l
– Yields the `residual formula’
• While there are 1-clauses
– Pick some (arbitrary) 1-clause, satisfy it and simplify
• If there is a 0-clause (contradiction)
– Backtrack to the last free step
*many options for select
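The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation analysed in the talk: literals are encoded as nonzero integers (+v for a variable, -v for its negation), and `select_first` is a hypothetical placeholder for the select rule.

```python
from typing import Callable, FrozenSet, List, Optional

Clause = FrozenSet[int]  # assumption: +v is variable v, -v is its negation


def simplify(clauses: List[Clause], lit: int) -> List[Clause]:
    """Set `lit` true: remove clauses containing lit, shrink those containing -lit."""
    return [c - {-lit} for c in clauses if lit not in c]


def select_first(clauses: List[Clause]) -> int:
    """Hypothetical select rule: lowest-numbered variable, tried True first."""
    return min(abs(l) for c in clauses for l in c)


def dpll(clauses: List[Clause],
         select: Callable[[List[Clause]], int] = select_first) -> Optional[List[int]]:
    """Return a list of literals that satisfies every clause, or None if unsatisfiable."""
    forced: List[int] = []
    while True:
        if any(len(c) == 0 for c in clauses):   # 0-clause: contradiction
            return None
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit                           # satisfy the 1-clause and simplify
        forced.append(lit)
        clauses = simplify(clauses, lit)
    if not clauses:                             # nothing left to satisfy
        return forced
    lit = select(clauses)                       # free step
    for choice in (lit, -lit):                  # second pass is the backtrack
        rest = dpll(simplify(clauses, choice), select)
        if rest is not None:
            return forced + [choice] + rest
    return None
```

Recursion stands in for the explicit "backtrack to last free step"; the formula passed to each recursive call is the residual formula.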
5
Resolution
• Start with clauses of CNF formula F
• Resolution rule
– Given (A ∨ x), (B ∨ ¬x) can derive (A ∨ B)
• F is unsatisfiable ⇔ 0-clause derivable
– Proof size = # of clauses
Running DPLL (with any select) on an unsatisfiable formula F
results in a tree-resolution refutation of F
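As a small illustration of the resolution rule, the sketch below resolves two clauses on a variable and derives the 0-clause from a tiny unsatisfiable formula. The integer encoding of literals (+v / -v) and the function name `resolve` are assumptions made for this example.

```python
from typing import FrozenSet

Clause = FrozenSet[int]  # assumption: +v is variable v, -v is its negation


def resolve(c1: Clause, c2: Clause, x: int) -> Clause:
    """From (A or x) and (B or not x) derive (A or B)."""
    if x not in c1 or -x not in c2:
        raise ValueError("c1 must contain x and c2 must contain -x")
    return (c1 - {x}) | (c2 - {-x})


# Refutation of (x1 or x2), (not x1 or x2), (not x2): three input clauses, two derived.
step1 = resolve(frozenset({1, 2}), frozenset({-1, 2}), 1)   # derives (x2)
step2 = resolve(step1, frozenset({-2}), 2)                  # derives the 0-clause
```

Counting the three input clauses and the two derived ones, this refutation has size 5 under the "proof size = # of clauses" measure.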
6
Random CNF formulas
• Random 2-CNF formula with sn clauses
– is satisfiable w.h.p. for s < 1
• and simple DPLL will find a satisfying assignment in linear time w.h.p.
– is unsatisfiable w.h.p. for s > 1
• and simple DPLL will finish and yield a resolution proof of unsatisfiability in linear time w.h.p.
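For experimenting with these statements, random formulas can be generated along the following lines. This is a sketch under an assumed sampling model (each clause picks k distinct variables uniformly and negates each independently with probability 1/2, clauses drawn with replacement); the slides do not spell out the exact model, and the function names are illustrative.

```python
import random
from typing import FrozenSet, List

Clause = FrozenSet[int]  # assumption: +v is variable v, -v is its negation


def random_kcnf(n: int, m: int, k: int, rng: random.Random) -> List[Clause]:
    """m random k-clauses over variables 1..n (assumed model, see lead-in)."""
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        clauses.append(frozenset(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def random_mixed_cnf(n: int, s: float, delta: float, rng: random.Random) -> List[Clause]:
    """About s*n random 2-clauses together with about delta*n random 3-clauses."""
    return random_kcnf(n, int(s * n), 2, rng) + random_kcnf(n, int(delta * n), 3, rng)
```

For example, `random_kcnf(1000, 900, 2, random.Random(0))` draws a 2-CNF with s = 0.9, in the regime the slide describes as satisfiable w.h.p.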
7
DPLL on random 3-CNF*
[Figure: probability that the formula is satisfiable (from 0 to 1) and # of DPLL backtracks, plotted against the ratio of clauses to variables; axis mark at 4.26]
* n = 50 variables
Can prove 2^Ω(n/Δ) time is required for
unsatisfiable formulas above the threshold (Δ = ratio of clauses to variables)
What about satisfiable formulas below the threshold?
8
Phase transitions and algorithmic complexity
• Easy connection
– Hardest random problems will always be at a monotone sharp threshold Δ*n, if it exists
• Can randomly reduce satisfiable problems of lower density to those at the threshold
– Given a formula with Δn clauses, can always add (Δ* − Δ)n random clauses to make it a random problem nearly at the threshold, and use that solution
• Can reduce unsatisfiable problems of larger density to those at the threshold
– Given a formula with Δn clauses, ignore all but the first Δ*n of them
9
Hard satisfiable formulas?
• With a non-deterministic select we could simply guess n correct value assignments.
– So how can a satisfiable formula possibly be hard?
• Any implementation of select must run in polynomial time.
– Very simple heuristics are used in practice
Some standard select rules for DPLL algorithms
• UC
– Pick variables in a fixed order
– Always set True first
• UCwm
– Pick variables in a fixed order
– Apply a majority vote among 3-clauses for assigning each value
• GUC
– Pick a variable v in a shortest clause C
– Set v to satisfy C
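The three rules can be written as small functions that return the literal to set true on a free step. This is a sketch with assumed details: literals are integers (+v / -v), "fixed order" is taken to be increasing variable index, ties in the majority vote go to True, and GUC breaks ties deterministically where the analysed algorithms would pick at random.

```python
from typing import FrozenSet, List

Clause = FrozenSet[int]  # assumption: +v is variable v, -v is its negation


def select_uc(clauses: List[Clause]) -> int:
    """UC: next variable in a fixed order, always tried True first."""
    return min(abs(l) for c in clauses for l in c)


def select_ucwm(clauses: List[Clause]) -> int:
    """UCwm: next variable in a fixed order, value by majority vote among 3-clauses."""
    v = min(abs(l) for c in clauses for l in c)
    pos = sum(1 for c in clauses if len(c) == 3 and v in c)
    neg = sum(1 for c in clauses if len(c) == 3 and -v in c)
    return v if pos >= neg else -v      # tie-break towards True is an assumption


def select_guc(clauses: List[Clause]) -> int:
    """GUC: take a shortest clause C and return one of its literals, satisfying C."""
    shortest = min(clauses, key=len)
    return min(shortest, key=abs)       # deterministic choice within C (assumption)
```

Each function expects a residual formula with no 0-clauses or 1-clauses left, which is the situation at a free step.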
Contributions
• These natural DPLL algorithms take exponential time on satisfiable formulas
• A family of unsatisfiable random formulas, parametrized by s, s.t. w.h.p.
– s > 1 ⇒ linear-size resolution proofs
– s < 1 ⇒ only exponential-size resolution proofs possible
12
Key property of each of the select rules we’ve seen
• On random 3-CNF, before the first backtrack occurs, the residual formula is a uniformly random mix of 2-clauses and 3-clauses
– If it has m2 2-clauses and m3 3-clauses then it is equally likely to be any formula with these properties
• This key property underlies the proofs of the algorithms’ success without backtracking
What do long runs look like?
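[Figure: DPLL search tree for a long run]
Residual formula at each node is a mix of 2- and 3-clauses
Residual formula at some node is unsatisfiable
Every resolution proof of its unsatisfiability has size 2^Ω(n), so the algorithm’s proof of unsatisfiability is exponentially long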
14
Proof Complexity
Theorem [Chvátal-Szemerédi 88]. W.h.p. every resolution refutation of a random CNF formula with Δn 3-clauses has size 2^Ω(n).
– Formula is unsatisfiable w.h.p. for Δ ≥ 4.57
[Achlioptas,B.,Molloy 2001] The same holds with sn 2-clauses added, where s < 1.
– Unsatisfiable w.h.p. for s ≥ 1−ε and Δ ≥ ????
15
Non-rigorous results
[Figure: SAT and UNSAT regions in the plane of 3-clause ratio (axis marks 2/3, 4.26, 4.57) against 2-clause ratio s (axis mark 1)] [Kirkpatrick, Monasson, Selman, Zecchina 97]
We can add (2/3)n 3-clauses but not εn 2-clauses
16
Rigorous results [Achlioptas, Kirousis, Kranakis, Krizanc 97]
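[Figure: rigorously known SAT and UNSAT regions in the plane of 3-clause ratio (axis marks 2/3, 2.28, 8/3, 4.57) against 2-clause ratio s (axis mark 1); the region in between is open and marked with question marks]
We can add (2/3)n 3-clauses but not εn 2-clauses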
17
Proof Complexity
Theorem [Achlioptas,B.,Molloy 2001]. W.h.p. every resolution refutation of a random CNF formula with Δn 3-clauses and sn 2-clauses, where s < 1, has size 2^Ω(n).
– Formula is unsatisfiable w.h.p. for Δ ≥ 4.57, and also for Δ ≥ 2.28 with s ≥ 1−ε, for ε = .0001
– Sharp threshold, since resolution refutations have linear size for s ≥ 1+ε
18
These DPLL algorithms follow trajectories
[Figure: trajectories of UC and GUC in the plane of 3-clause ratio (axis marks 2/3, 8/3, 3.26) against 2-clause ratio s (axis mark 1)]
[Chao,Franco 88] [Frieze,Suen 95] [Achlioptas 00] [Achlioptas,Sorkin 00]
19
DPLL crossing into the bad zone
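[Figure: an algorithm trajectory in the plane of 3-clause ratio (axis marks 3.26, 4.26, 4.57) against 2-clause ratio s (axis mark 1), leaving the "Provably SAT & Easy" region and entering the "Provably UNSAT & Hard" region]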
Exponential lower bounds far below the threshold
Theorem. Let A ∈ {UC, UCwm, GUC}. Let Δ_UC = 3.81, Δ_UCwm = 3.83, Δ_GUC = 4.01.
W.h.p. algorithm A takes 2^Ω(n) steps on a random 3-CNF with Δ_A·n clauses
Lower bound also applies to any resolution-based algorithm that extends the ‘first’ branch of the execution of A
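To see where a number like 3.81 can come from, one can follow UC's first branch numerically. The sketch below assumes the standard mean-path formulas from the Chao-Franco style analysis of UC, which are recalled here and not stated on the slides: after a fraction t of the variables is set, the residual formula has about Δ(1−t)² 3-clauses and (3Δ/2)·t(1−t) 2-clauses per remaining variable. The function names are illustrative.

```python
import math
from typing import Optional, Tuple


def uc_ratios(delta: float, t: float) -> Tuple[float, float]:
    """(3-clause ratio, 2-clause ratio) per remaining variable at time t (assumed formulas)."""
    three = delta * (1 - t) ** 2
    two = 1.5 * delta * t * (1 - t)
    return three, two


def first_crossing(delta: float, s: float = 1.0) -> Optional[float]:
    """Smallest t at which the 2-clause ratio reaches s, or None if it never does."""
    # Solve 1.5 * delta * t * (1 - t) = s, i.e. t^2 - t + s / (1.5 * delta) = 0.
    disc = 1 - 4 * s / (1.5 * delta)
    if disc < 0:
        return None
    return (1 - math.sqrt(disc)) / 2


t_star = first_crossing(3.81)
three_at_crossing, two_at_crossing = uc_ratios(3.81, t_star)
```

Under these assumed formulas the 2-clause ratio never reaches 1 for densities below 8/3, while for Δ = 3.81 it reaches 1 when the 3-clause ratio is still roughly 2.28, which is the value that appears in the unsatisfiability condition of the theorem.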
21
Related Work
• Experiments suggested DPLL algorithms may not be polynomial all the way to the threshold
• [Cocco, Monasson 01] applied non-rigorous methods to suggest exponential GUC behavior below the threshold
– Assumed every branch of GUC tree operates like an independent version of the first branch
– Independent of our work
22
Implications for phase transitions and algorithmic complexity
• Difference between polynomial and exponential hardness is not necessarily a function of the phase transition
– Applies in both phases, not just the over-constrained phase
– Algorithmically dependent
• A good algorithm will have a transition in a different place from a bad algorithm
• Can’t study the hardness transition in the absence of the study of algorithms
23
Proof Ideas
• Connection between pure literals and resolution proof size [Chvátal,Szemerédi 88] [Ben-Sasson,Wigderson 99]
– pure literals are those that occur only positively or only negatively in a formula
• Digraph structure of random 2-CNF subformula
– New graph-theoretic notion “clan”
• generalization of connected component
– Sharp concentration properties for clan size
• moment generating function argument
– Amortization of pure literals across clans
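The definition of a pure literal translates directly into code. A minimal sketch, assuming the integer encoding of literals (+v for a variable, -v for its negation):

```python
from typing import FrozenSet, Iterable, Set


def pure_literals(clauses: Iterable[FrozenSet[int]]) -> Set[int]:
    """Literals that occur in the formula while their negation does not."""
    literals = {l for c in clauses for l in c}
    return {l for l in literals if -l not in literals}


example = [frozenset({1, 2}), frozenset({-2, 3}), frozenset({1, -3})]
pure = pure_literals(example)   # variable 1 occurs only positively
```

Here variables 2 and 3 occur with both signs, so only the literal 1 is pure.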
24
Resolution proof size and pure literals [Ben-Sasson,Wigderson 99]
• If formula has an ε > 0 s.t.
– Every subformula with ≤ εn clauses has at least one pure literal
– Every subformula with between ½εn and εn clauses has a linear # of pure literals
• Then
– all resolution proofs of the formula require size 2^Ω(n)
25
Basic idea of argument
• By sparsity of the 2-clause part of the formula, any subset of the 2-clauses will have lots of pure literals
– Clan size analysis & amortization
• In a subformula involving both 2-clauses and 3-clauses, either there are
– so many 3-clauses that they create lots of new pure literals on their own (easy case), or
– so few 3-clauses that they can’t cover all the pure literals in the 2-clauses (analysis of clans)
26
2-CNF Digraph on literals
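[Figure: digraph on the literals of the variables x, y, z, w, c, d (each variable and its negation), built from 2-clauses over the variable pairs {d,y}, {y,x}, {z,y}, {c,w}, {x,w}, {w,z}]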
Hyper/Digraph on literals
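[Figure: the digraph on literals extended to the variables a, b, f, g, with hyperedges for 3-clauses over the variable triples {a,b,z} and {f,g,w}]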
28
Pure literals
[Figure: pure literals in the hyper/digraph on literals]
29
Pure cycle
[Figure: a pure cycle in the hyper/digraph on literals]
30
Pure Items & Clans of G
• Clans
– small subgraphs of G
• one clan per vertex; they cover G
• analog of connected components in sparse random graphs
– pure items: typically two per clan, like leaves in acyclic connected components in an ordinary graph
– mostly constant size
– never more than log^3 n vertices
• if x ∈ clan(y) then y ∈ clan(x)
31
What are clans?
Simpler notion first
in(y) for vertex y in an ordinary digraph
32
in(y) in ordinary digraph
[Figure: ordinary digraph on vertices t, u, v, w, x, y, z with in(y) marked]
in(y) = subgraph of vertices that can reach y = ancestors of y
33
clan(y) in ordinary digraph
[Figure: the same digraph on vertices t, u, v, w, x, y, z with clan(y) marked]
clan(y) = descendants of ancestors of y
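For an ordinary digraph, the two definitions can be computed with plain graph searches. A minimal sketch, assuming the graph is given as a dictionary from each vertex to its set of successors and that every vertex counts as its own ancestor and descendant; the function names are illustrative.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def reachable(adj: Graph, start: Hashable) -> Set[Hashable]:
    """Vertices reachable from `start` (including `start`) by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse(adj: Graph) -> Graph:
    """The same graph with every edge reversed."""
    rev: Graph = {}
    for v, targets in adj.items():
        for w in targets:
            rev.setdefault(w, set()).add(v)
    return rev


def in_set(adj: Graph, y: Hashable) -> Set[Hashable]:
    """in(y): the vertices that can reach y, i.e. the ancestors of y."""
    return reachable(reverse(adj), y)


def clan(adj: Graph, y: Hashable) -> Set[Hashable]:
    """clan(y): descendants of ancestors of y."""
    result: Set[Hashable] = set()
    for a in in_set(adj, y):
        result |= reachable(adj, a)
    return result


g = {"x": {"y", "z"}, "w": {"y"}, "y": {"t"}, "u": {"v"}}
```

On this small example `in_set(g, "y")` is {x, w, y} and `clan(g, "y")` is {x, w, y, z, t}. The symmetry "if x ∈ clan(y) then y ∈ clan(x)" holds because both vertices are then descendants of a common ancestor.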
34
clan(y) in 2-CNF digraph
[Figure: clan(y) in the 2-CNF digraph on literals]
35
A complication - bad events
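[Figure: 2-CNF digraph on the literals of x, y, z, w, c, d in which a bad event occurs, built from 2-clauses over the variable pairs {d,y}, {z,y}, {c,w}, {x,w}, {w,z}, {w,d}]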
36
in(y) in a bad case
[Figure: in(y) in a bad case]
37
clan(y) in a bad case
[Figure: clan(y) in a bad case]
This can cascade and get even worse!
38
Analysis
• If we ignore bad edges, |in(y)| is dominated by a component process in a sub-critical random undirected graph
– like trimmed out-trees [Bollobás,Borgs,Chayes,Kim,Wilson]
• Ignoring bad edges, |clan(y)| is dominated by a 2-level process
– run a component process to get in(y)
– take the union of |in(y)| independent component processes added to in(y)
39
Analysis
• w.h.p. no more than one bad event happens per clan
– |in(y)| is always dominated by the 2-level component process
• w.h.p. no more than C log n bad events occur in the whole digraph
– fewer than polylog n literals interact with bad clans
– rest of clans dominated by 2-level process
40
Analysis
• Ordinary sub-critical component process on 2n vertices: w.h.p.
– # of vertices with component size i is at most 2n(1−ε)^i for some fixed ε > 0
• We show: for a sub-critical 2-level component process on 2n vertices, w.h.p.
– for i ≥ i0, # of vertices with 2-level size i is at most 2n(1−ε)^i for some fixed ε > 0
This is false for a 3-level component process!
41
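Open problem
Conjecture. For every Δ > 2/3 there exists an s < 1 such that a random (2,3)-CNF with Δn 3-clauses and sn 2-clauses is w.h.p. unsatisfiable
[Figure: SAT and UNSAT regions in the plane of 3-clause ratio (axis marks 2/3, 3.26, 4.57) against 2-clause ratio (axis mark 1); the open region is marked with question marks]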
42
Open problem
Conjecture. For every Δ > 2/3 there exists an s < 1 such that a random (2,3)-CNF with Δn 3-clauses and sn 2-clauses is w.h.p. unsatisfiable
Implies. For every card-game algorithm A there exists a critical density Δ_A such that for random 3-CNF formulas with Δn clauses
– For Δ < Δ_A, w.h.p. A takes linear time
– For Δ > Δ_A, w.h.p. A takes exponential time