
Provably hard problems below the satisfiability threshold

Paul Beame Univ. of Washington

Michael Molloy Univ. of Toronto

Dimitris Achlioptas Microsoft Research

A sharp threshold in proof complexity yields lower bounds for satisfiability search

CNF Satisfiability

• (x1 ∨ x2 ∨ x4) ∧ (x1 ∨ x3) ∧ (x3 ∨ x2) ∧ (x4 ∨ x3)

• NP-complete, but many heuristics exist because of its practical importance

– presumably exponential in the worst case

• If you know the formula is satisfiable:

– How hard is it to find a satisfying assignment?

– No lower bounds known for interesting heuristics.

Satisfiability Algorithms

• Local search (incomplete)

– GSAT [Selman, Levesque, Mitchell 92]

– Walksat [Kautz,Selman 96]

• Backtracking search (complete)

– DPLL [Davis, Putnam 60] [Davis, Logemann, Loveland 62]

– DPLL + “clause learning”

Backtracking search/DPLL

• Select* a literal ℓ (some x or ¬x): this is the free step

– Remove all clauses containing ℓ

– Shrink all clauses containing ¬ℓ

• While there are 1-clauses: pick some (arbitrary) 1-clause, satisfy it and simplify; this yields the ‘residual formula’

• If there is a 0-clause (contradiction): backtrack to the last free step

*many options for select
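The loop above can be sketched as a recursive DPLL with a pluggable select rule. This is a minimal sketch under our own assumptions, not the authors' code: clauses are frozensets of non-zero integers (−v stands for ¬v), the names `simplify`, `unit_propagate`, `dpll` and the toy rule `first_literal` are hypothetical, and no clause learning is done.

```python
from typing import Callable, FrozenSet, List, Optional

Clause = FrozenSet[int]                  # literals are non-zero ints; -v means "not v"
Select = Callable[[List[Clause]], int]   # hypothetical select interface


def simplify(clauses: List[Clause], lit: int) -> List[Clause]:
    """Make lit true: remove clauses containing lit, shrink clauses containing -lit."""
    return [c - {-lit} for c in clauses if lit not in c]


def unit_propagate(clauses: List[Clause], trail: List[int]) -> List[Clause]:
    """While there is a 1-clause, satisfy it and simplify; stop at a 0-clause."""
    while not any(len(c) == 0 for c in clauses):
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit
        trail.append(lit)
        clauses = simplify(clauses, lit)
    return clauses


def dpll(clauses: List[Clause], select: Select, stats: dict) -> Optional[List[int]]:
    """Return true literals satisfying `clauses`, or None if they are unsatisfiable."""
    trail: List[int] = []
    residual = unit_propagate(clauses, trail)   # the residual formula
    if any(len(c) == 0 for c in residual):      # 0-clause: contradiction
        return None
    if not residual:                            # every clause satisfied
        return trail
    lit = select(residual)                      # free step
    for choice in (lit, -lit):
        rest = dpll(simplify(residual, choice), select, stats)
        if rest is not None:
            return trail + [choice] + rest
        # this branch failed: count it as a backtrack
        stats["failed_branches"] = stats.get("failed_branches", 0) + 1
    return None


def first_literal(residual: List[Clause]) -> int:
    """A naive, hypothetical select rule: smallest-index literal of the first clause."""
    return min(residual[0], key=abs)
```

Any rule matching `Select` can be plugged in; the `failed_branches` counter is one simple way to measure the number of backtracks.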

Resolution

• Start with clauses of CNF formula F

• Resolution rule

– Given (A ∨ x) and (B ∨ ¬x), can derive (A ∨ B)

• F is unsatisfiable ⇔ the 0-clause is derivable

– Proof size = # of clauses
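A small sketch of the resolution rule, plus a brute-force check that F is unsatisfiable exactly when the 0-clause can be derived. The encoding (frozensets of non-zero integers) and the function names are our own; saturation can blow up exponentially, so this is only meant for tiny formulas and does not find short proofs.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Clause = FrozenSet[int]  # literals are non-zero ints; -v means "not v"


def resolve(c1: Clause, c2: Clause, x: int) -> Clause:
    """Resolution rule: from (A ∨ x) and (B ∨ ¬x) derive (A ∨ B)."""
    if x not in c1 or -x not in c2:
        raise ValueError("clauses do not clash on x")
    return (c1 - {x}) | (c2 - {-x})


def zero_clause_derivable(clauses: Iterable[Clause]) -> bool:
    """Close the clause set under resolution; True iff the 0-clause appears."""
    known: Set[Clause] = set(clauses)
    if frozenset() in known:
        return True
    while True:
        new: Set[Clause] = set()
        for c1, c2 in combinations(known, 2):
            for x in c1:                      # x may be a negative literal
                if -x in c2:
                    r = resolve(c1, c2, x)
                    if not any(-l in r for l in r):   # drop tautologies
                        new.add(r)
        if frozenset() in new:
            return True
        if new <= known:                      # saturated without the 0-clause
            return False
        known |= new
```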

Running DPLL (with any select) on an unsatisfiable formula F yields a tree-resolution proof of the unsatisfiability of F

Random CNF formulas

• Random 2-CNF formula with sn clauses

– is satisfiable w.h.p. for s < 1, and simple DPLL will find a satisfying assignment in linear time w.h.p.

– is unsatisfiable w.h.p. for s > 1, and simple DPLL will finish and yield a resolution proof of unsatisfiability in linear time w.h.p.
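To experiment with the 2-CNF threshold at s = 1, one can generate random 2-CNF formulas and test them. Note the swap: this sketch uses the classic linear-time implication-graph test (Aspvall, Plass and Tarjan), not DPLL. The clause model (each clause on 2 distinct variables with random signs, clauses drawn independently) and all names are our assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_kcnf(n: int, m: int, k: int, rng: random.Random) -> List[Tuple[int, ...]]:
    """m independent clauses, each on k distinct variables of 1..n with random signs."""
    return [tuple(v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), k))
            for _ in range(m)]


def two_sat_satisfiable(n: int, clauses: Sequence[Tuple[int, int]]) -> bool:
    """Satisfiable iff no x and ¬x lie in the same strongly connected component."""
    lits = [l for v in range(1, n + 1) for l in (v, -v)]
    graph: Dict[int, List[int]] = {l: [] for l in lits}
    rev: Dict[int, List[int]] = {l: [] for l in lits}
    for a, b in clauses:
        for u, w in ((-a, b), (-b, a)):        # (a ∨ b) gives ¬a → b and ¬b → a
            graph[u].append(w)
            rev[w].append(u)
    # Kosaraju pass 1: iterative DFS recording finishing order.
    order, seen = [], set()
    for s in lits:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(u)
            elif w not in seen:
                seen.add(w)
                stack.append((w, iter(graph[w])))
    # Kosaraju pass 2: DFS on the reversed graph in decreasing finishing time.
    comp: Dict[int, int] = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        todo = [s]
        while todo:
            u = todo.pop()
            for w in rev[u]:
                if w not in comp:
                    comp[w] = s
                    todo.append(w)
    return all(comp[v] != comp[-v] for v in range(1, n + 1))
```

For example, calling `two_sat_satisfiable(n, random_kcnf(n, int(s * n), 2, rng))` for n in the thousands should typically return True for s well below 1 and False for s well above 1.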

DPLL on random 3-CNF

[Figure: probability satisfiable and # of DPLL backtracks vs. ratio of clauses to variables, for random 3-CNF with n = 50 variables]

Can prove that exponential time, 2^{Ω(n)} at any fixed density, is required for unsatisfiable formulas above the threshold

What about satisfiable formulas below the threshold?

Phase transitions and algorithmic complexity

• Easy connection

– Hardest random problems will always be at a monotone sharp threshold Δ*n, if it exists

• Can randomly reduce satisfiable problems of lower density to those at the threshold

– Given a formula with Δn clauses, can always add (Δ* − Δ − ε)n random clauses to make it a random problem nearly at the threshold, and use that solution

• Can reduce unsatisfiable problems of larger density to those at the threshold

– Given a formula with Δn clauses, ignore all but the first Δ*n of them
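Both reductions are essentially one-liners. A sketch, assuming a formula is a list of independent uniformly random 3-clauses and densities are passed as a hypothetical `target` clause-to-variable ratio: padding with fresh random clauses keeps the formula uniformly random at the higher density, and any satisfying assignment of the padded formula satisfies the original; truncating keeps a subset of the clauses, so any refutation of the truncated formula also refutes the original.

```python
import random
from typing import List, Tuple

Clause = Tuple[int, int, int]  # literals are non-zero ints; -v means "not v"


def random_3clause(n: int, rng: random.Random) -> Clause:
    """One uniformly random 3-clause on distinct variables of 1..n."""
    a, b, c = (v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), 3))
    return (a, b, c)


def pad_to_density(formula: List[Clause], n: int, target: float,
                   rng: random.Random) -> List[Clause]:
    """Satisfiable side: append random clauses up to int(target * n) clauses."""
    extra = max(0, int(target * n) - len(formula))
    return formula + [random_3clause(n, rng) for _ in range(extra)]


def truncate_to_density(formula: List[Clause], n: int, target: float) -> List[Clause]:
    """Unsatisfiable side: keep only the first int(target * n) clauses."""
    return formula[: int(target * n)]
```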

Hard satisfiable formulas?

With non-deterministic select we could simply guess n correct value assignments.

So how can a satisfiable formula possibly be hard?

Any implementation of select must run in polynomial time, and in practice very simple heuristics are used.

Some standard select rules for DPLL algorithms

• UC

– Pick variables in a fixed order

– Always set True first

• UCwm

– Pick variables in a fixed order

– Choose each value by a majority vote among the 3-clauses

• GUC

– Pick a variable v in a shortest clause C

– Set v to satisfy C
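The three rules can be written as small functions on the residual formula. This is a sketch with our own conventions: each rule takes the residual clauses and a fixed variable order (a hypothetical signature) and returns the literal to set true first; tie-breaking (ties to True in UCwm, first shortest clause and earliest variable in GUC) is our choice, since the rules allow any.

```python
from typing import FrozenSet, List, Sequence

Clause = FrozenSet[int]  # literals are non-zero ints; -v means "not v"


def first_free_var(residual: List[Clause], order: Sequence[int]) -> int:
    """First variable, in the fixed order, that still occurs in the residual formula."""
    present = {abs(l) for c in residual for l in c}
    return next(v for v in order if v in present)


def select_uc(residual: List[Clause], order: Sequence[int]) -> int:
    """UC: variables in a fixed order, always set True first."""
    return first_free_var(residual, order)


def select_ucwm(residual: List[Clause], order: Sequence[int]) -> int:
    """UCwm: fixed order; the value is chosen by majority vote among 3-clauses."""
    v = first_free_var(residual, order)
    pos = sum(1 for c in residual if len(c) == 3 and v in c)
    neg = sum(1 for c in residual if len(c) == 3 and -v in c)
    return v if pos >= neg else -v       # ties go to True (our own choice)


def select_guc(residual: List[Clause], order: Sequence[int]) -> int:
    """GUC: return a literal of a shortest clause C, so setting it true satisfies C."""
    rank = {v: i for i, v in enumerate(order)}
    shortest = min(residual, key=len)    # ties: first shortest clause in the list
    return min(shortest, key=lambda lit: rank[abs(lit)])
```

To use one of these with a one-argument select interface, bind the order first, e.g. `functools.partial(select_guc, order=list(range(1, n + 1)))`.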

Contributions

These natural DPLL algorithms take exponential time on satisfiable formulas

A family of unsatisfiable random formulas, parametrized by s, such that w.h.p.

– s > 1: linear-size resolution proofs

– s < 1: only exponential-size resolution proofs possible

Key property of each of the select rules we’ve seen

• On random 3-CNF, before the first backtrack occurs, the residual formula is a uniformly random mix of 2-clauses and 3-clauses

– If it has m2 2-clauses and m3 3-clauses, then it is equally likely to be any formula with these properties

• Key property ⇒ proofs of the algorithms’ success without backtracking

What do long runs look like?

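[Figure: the DPLL search tree; the residual formula at each node is a mix of 2- and 3-clauses. At some node the residual formula is unsatisfiable, and every resolution proof of its unsatisfiability is exponentially long, so the algorithm’s proof of unsatisfiability is exponentially long.]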

Proof Complexity

[Chvátal-Szemerédi 88]

Formula is unsatisfiable w.h.p. for Δ > 4.57

Theorem [Achlioptas, B., Molloy 2001]. A random CNF formula with Δn 3-clauses and sn 2-clauses, where s < 1, has w.h.p. no resolution refutation of size 2^{δn} (for some constant δ > 0)

For s ≤ 1 − ε and ????

Non-rigorous results [Kirkpatrick, Monasson, Selman, Zecchina 97]

[Figure: 2-clause ratio vs. 3-clause ratio; marked values 2/3, 4.26, 4.57]

We can add (2/3)n 3-clauses but not n 2-clauses

Rigorous results [Achlioptas, Kirousis, Kranakis, Krizanc 97]

[Figure: 2-clause ratio vs. 3-clause ratio; marked values 2/3, 8/3]

We can add (2/3)n 3-clauses but not n 2-clauses

Proof Complexity

The theorem [Achlioptas, B., Molloy 2001] holds for Δ ≤ 2.28 and s ≤ 1 − ε, with ε = .0001

Sharp threshold, since resolution proofs are linear-size for s ≥ 1 + ε

These DPLL algorithms follow trajectories

[Chao, Franco 88] [Frieze, Suen 95] [Achlioptas 00] [Achlioptas, Sorkin 00]

[Figure: trajectories of UC and GUC, plotted as 2-clause ratio vs. 3-clause ratio of the residual formula; marked value 3.26]
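A trajectory can also be observed empirically. The sketch below follows UC's first branch on a random 3-CNF formula and records the residual formula's 3-clause and 2-clause counts, each divided by the number of unset variables (our assumption about how the ratios are normalized). It is a plain simulation, not the differential-equation analysis used in the cited papers; `random_3cnf` and `uc_trajectory` are hypothetical names.

```python
import random
from typing import FrozenSet, List, Tuple


def random_3cnf(n: int, m: int, rng: random.Random) -> List[FrozenSet[int]]:
    """m uniformly random 3-clauses on distinct variables of 1..n."""
    return [frozenset(v if rng.random() < 0.5 else -v
                      for v in rng.sample(range(1, n + 1), 3))
            for _ in range(m)]


def uc_trajectory(n: int, clauses: List[FrozenSet[int]]
                  ) -> Tuple[List[Tuple[float, float]], bool]:
    """Follow UC without backtracking.

    Before each step, records (m3 / free, m2 / free) for the residual formula,
    where free = number of unset variables. Returns (trajectory, failed), with
    failed True if a 0-clause appears, i.e. the first backtrack would occur.
    """
    residual = list(clauses)
    assigned = set()
    order = iter(range(1, n + 1))           # UC's fixed variable order
    trajectory = []
    while residual:
        free = n - len(assigned)
        m2 = sum(1 for c in residual if len(c) == 2)
        m3 = sum(1 for c in residual if len(c) == 3)
        trajectory.append((m3 / free, m2 / free))
        unit = next((c for c in residual if len(c) == 1), None)
        if unit is not None:
            (lit,) = unit                   # forced step: satisfy a 1-clause
        else:
            lit = next(v for v in order if v not in assigned)  # free step: True first
        assigned.add(abs(lit))
        residual = [c - {-lit} for c in residual if lit not in c]
        if any(len(c) == 0 for c in residual):
            return trajectory, True
    return trajectory, False
```

For instance, `uc_trajectory(1000, random_3cnf(1000, 3000, random.Random(1)))` gives a list of points that can be plotted in the same plane as the figure.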

DPLL crossing into the bad zone

[Figure: an algorithm trajectory, plotted as 2-clause ratio vs. 3-clause ratio, crossing from the ‘Provably SAT & Easy’ region into the ‘Provably UNSAT & Hard’ region; marked values 3.26 and 4.26]

Exponential lower bounds far below the threshold.

Theorem. Let A ∈ {UC, UCwm, GUC}, and let Δ_UC = 3.81, Δ_UCwm = 3.83, Δ_GUC = 4.01.

W.h.p. algorithm A takes more than 2^{δn} steps, for some constant δ > 0, on a random 3-CNF formula with Δ_A n clauses

The lower bound also applies to any resolution-based algorithm that extends the ‘first’ branch of the execution of A

Related Work

• Experiments suggested DPLL algorithms may not be polynomial all the way to the threshold

• [Cocco, Monasson 01] applied non-rigorous methods to suggest exponential GUC behavior below the threshold

– Assumed every branch of the GUC tree operates like an independent version of the first branch

– Independent of our work

Implications for phase transitions and algorithmic complexity

• The difference between polynomial and exponential hardness is not necessarily a function of the phase transition

– Applies in both phases, not just the over-constrained phase

– Algorithmically dependent

• A good algorithm will have a transition in a different place from a bad algorithm

• Can’t study the hardness transition in the absence of the study of algorithms

Proof Ideas

• Connection between pure literals and resolution proof size [Chvátal,Szemerédi 88] [Ben-Sasson,Wigderson 99]

– pure literals are those that occur only positively or only negatively in a formula

• Digraph structure of random 2-CNF subformula

– New graph-theoretic notion, the “clan”: a generalization of connected component

– Sharp concentration properties for clan size, via a moment generating function argument

– Amortization of pure literals across clans
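Pure literals are easy to compute, and they matter because setting a pure literal true satisfies all its clauses without falsifying any. A sketch with our own encoding and hypothetical names: `pure_literal_elimination` repeatedly deletes clauses containing a pure literal, so an empty result certifies that the formula is satisfiable.

```python
from typing import FrozenSet, List, Set

Clause = FrozenSet[int]  # literals are non-zero ints; -v means "not v"


def pure_literals(clauses: List[Clause]) -> Set[int]:
    """Literals that occur in the formula while their negation never does."""
    lits = {l for c in clauses for l in c}
    return {l for l in lits if -l not in lits}


def pure_literal_elimination(clauses: List[Clause]) -> List[Clause]:
    """Repeatedly delete every clause containing a pure literal.

    If the result is empty, setting the eliminated pure literals true
    satisfies the input formula.
    """
    remaining = list(clauses)
    while True:
        pure = pure_literals(remaining)
        if not pure:
            return remaining
        remaining = [c for c in remaining if not (c & pure)]
```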

Resolution proof size and pure literals [Ben-Sasson,Wigderson 99]

• If the formula has an α > 0 s.t.

– Every subformula with ≤ αn clauses has at least one pure literal

– Every subformula with between (α/2)n and αn clauses has a linear # of pure literals

• Then all resolution proofs of the formula require size 2^{Ω(n)}
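For very small formulas the first condition can be checked by brute force, which makes its meaning concrete: since pure-literal elimination succeeds whenever every subformula has a pure literal, it forces every subformula of at most k clauses to be satisfiable. This exhaustive sketch (hypothetical names, our encoding) is exponential in k and is not how the condition is established for random formulas; it does not check the second, linear-count condition.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set

Clause = FrozenSet[int]  # literals are non-zero ints; -v means "not v"


def pure_literals(clauses: Sequence[Clause]) -> Set[int]:
    """Literals that occur while their negation never does."""
    lits = {l for c in clauses for l in c}
    return {l for l in lits if -l not in lits}


def small_subformulas_have_pure_literal(clauses: List[Clause], k: int) -> bool:
    """True iff every subformula with 1..k clauses has at least one pure literal."""
    for size in range(1, k + 1):
        for sub in combinations(clauses, size):
            if not pure_literals(sub):
                return False
    return True
```

For the 3-clause cycle (x1 ∨ x2), (¬x2 ∨ x3), (¬x3 ∨ ¬x1), every subformula of at most 2 clauses has a pure literal, but the whole formula has none.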

Basic idea of argument

• By sparsity of the 2-clause part of the formula, any subset of the 2-clauses will have lots of pure literals

• Clan size analysis & amortization

• In a subformula involving both 2-clauses and 3-clauses, either there are

– so many 3-clauses that they create lots of new pure literals on their own (easy case), or

– so few 3-clauses that they can’t cover all the pure literals in the 2-clauses (analysis of clans)

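2-CNF Digraph on literals

[Figure: digraph on literals for the 2-clauses (d ∨ y), (y ∨ x), (z ∨ y), (c ∨ w), (x ∨ w), (w ∨ z)]

Hyper/Digraph on literals

[Figure: the same digraph with the 3-clauses (a ∨ b ∨ z), (f ∨ g ∨ w) added; pure literals and a pure cycle are marked]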
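A sketch of building a digraph on literals from the 2-clauses. We assume the standard implication digraph, where the clause (a ∨ b) contributes the edges ¬a → b and ¬b → a; the exact orientation conventions used in the talk may differ, and `implication_digraph` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def implication_digraph(two_clauses: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Digraph on literals: the 2-clause (a ∨ b) gives edges ¬a → b and ¬b → a."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for a, b in two_clauses:
        graph[-a].add(b)
        graph[-b].add(a)
    return dict(graph)
```

By construction the graph is skew-symmetric: every edge u → w comes paired with ¬w → ¬u.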

Pure Items & Clans of G

• Clans

– small subgraphs of G

• one clan per vertex; they cover G

• analog of connected components in sparse random graphs

– pure items: typically two per clan, like the leaves of acyclic connected components in an ordinary graph

– mostly constant size

– never more than log³ n vertices

• if x ∈ clan(y) then y ∈ clan(x)

What are clans?

Simpler notion first: in(y) for a vertex y in an ordinary digraph

• in(y): the subgraph of vertices that can reach y, i.e., the ancestors of y

• clan(y): the descendants of the ancestors of y

[Figures: in(y) and clan(y) in an ordinary digraph; clan(y) in the 2-CNF digraph]
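For an ordinary digraph, in(y) and clan(y) are two reachability computations. This sketch covers only the ordinary-digraph notion; the 2-CNF version, with complementary literals and bad events, is not modeled, and the function names are ours. It also lets one check the symmetry x ∈ clan(y) ⇔ y ∈ clan(x): if x descends from an ancestor z of y, then z is an ancestor of x and y descends from it.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def reverse(g: Graph) -> Graph:
    """Reverse every edge of g."""
    r: Graph = {}
    for u, outs in g.items():
        r.setdefault(u, set())
        for w in outs:
            r.setdefault(w, set()).add(u)
    return r


def reachable(g: Graph, sources: Iterable[Hashable]) -> Set[Hashable]:
    """All vertices reachable from `sources` (sources included), by BFS."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for w in g.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def in_set(g: Graph, y: Hashable) -> Set[Hashable]:
    """in(y): vertices that can reach y (ancestors of y, y included)."""
    return reachable(reverse(g), [y])


def clan(g: Graph, y: Hashable) -> Set[Hashable]:
    """clan(y) in an ordinary digraph: descendants of the ancestors of y."""
    return reachable(g, in_set(g, y))
```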

A complication: bad events

[Figure: in(y) and clan(y) in a bad case, for the clauses (d ∨ y), (z ∨ y), (c ∨ w), (x ∨ w), (w ∨ z)]

This can cascade and get even worse!

Analysis

• If we ignore bad edges, |in(y)| is dominated by a component process in a sub-critical random undirected graph

– like trimmed out-trees [Bollobás, Borgs, Chayes, Kim, Wilson]

• Ignoring bad edges, |clan(y)| is dominated by a 2-level process

– run a component process to get in(y)

– take the union of |in(y)| independent component processes added to in(y)
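The 1-level and 2-level processes can be simulated directly. As an assumption of this sketch, the component process is modeled by a Galton-Watson tree with Poisson(c) offspring, c < 1, the usual stand-in for component exploration in a sparse sub-critical random graph; the 2-level size adds the sizes of N independent copies to the first-level size N, which upper-bounds the size of their union. All names are hypothetical.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method for one Poisson(lam) sample (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def component_size(c: float, rng: random.Random) -> int:
    """Total size of a Galton-Watson tree with Poisson(c) offspring (finite a.s. for c < 1)."""
    size = frontier = 1
    while frontier:
        children = sum(poisson(c, rng) for _ in range(frontier))
        size += children
        frontier = children
    return size


def two_level_size(c: float, rng: random.Random) -> int:
    """First-level component of size N, plus N independent component processes."""
    first = component_size(c, rng)
    return first + sum(component_size(c, rng) for _ in range(first))
```

For c < 1 the mean 1-level size is 1/(1 − c), and by Wald's identity the mean 2-level size is 1/(1 − c) + 1/(1 − c)². Tabulating how often each size i occurs gives an empirical look at the geometric tails discussed in the analysis.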

Analysis

• W.h.p. no more than one bad event happens per clan

– |in(y)| is always dominated by the 2-level component process

• W.h.p. no more than C log n bad events occur in the whole digraph

– fewer than polylog n literals interact with bad clans

– the rest of the clans are dominated by the 2-level process

Analysis

• Ordinary sub-critical component process on 2n vertices, w.h.p.:

– # of vertices with component size i is at most 2n(1 − ε)^i for some fixed ε > 0

• We show, for the sub-critical 2-level component process on 2n vertices, w.h.p.:

– for i ≥ i₀, # of vertices with 2-level size i is at most 2n(1 − ε)^i for some fixed ε > 0

This is false for a 3-level component process!

Open problem

Conjecture. For every Δ > 2/3 there exists an s < 1 such that a random (2,3)-CNF with Δn 3-clauses and sn 2-clauses is w.h.p. unsatisfiable

[Figure: 2-clause ratio vs. 3-clause ratio, with the UNSAT region marked; marked values 1, 2/3, 3.26, 4.57]

Implies. For every card-game algorithm A there exists a critical density Δ_A such that, for random 3-CNF formulas with Δn clauses:

– for Δ < Δ_A, w.h.p. A takes linear time

– for Δ > Δ_A, w.h.p. A takes exponential time
