Understanding Random SAT: Beyond the Clauses-to-Variables Ratio. Eugene Nudelman (Stanford University). Joint work with Kevin Leyton-Brown and Holger Hoos (University of British Columbia), and Alex Devkar and Yoav Shoham (Stanford University).


Page 1: Understanding Random SATkevinlb/talks/RandomSat - CP2004.pdf · (mean, CV) CP 2004. Features: Local Search Probing 0 200 400 600 800 1000 1200 ... (DPLL with Satz heuristic) • LP

Understanding Random SAT

Beyond the Clauses-to-Variables Ratio

Eugene Nudelman
Stanford University

joint work with…

Kevin Leyton-Brown
Holger Hoos
University of British Columbia

Alex Devkar
Yoav Shoham
Stanford University


Introduction

• SAT is one of the most studied problems in CS

• Lots known about its worst-case complexity

– But often, particular instances of NP-hard problems like SAT are easy in practice

• “Drosophila” for average-case and empirical (typical-case) complexity studies

• (Uniformly) random SAT provides a way to bridge analytical and empirical work

CP 2004


Previously…

• Easy-hard-less hard transitions discovered in the behaviour of DPLL-type solvers [Selman, Mitchell, Levesque]

– Strongly correlated with phase transition in solvability

– Spawned a new enthusiasm for using empirical methods to study algorithm performance

• Follow up included study of:

– Islands of tractability [Kolaitis et al.]

– SLS search space topologies [Frank et al., Hoos]

– Backbones [Monasson et al., Walsh and Slaney]

– Backdoors [Williams et al.]

– Random restarts [Gomes et al.]

– Restart policies [Horvitz et al., Ruan et al.]

– …

[Figure: 4 * Pr(SAT) - 2 and log(Kcnfs runtime) plotted against c / v, for c / v from 3.3 to 5.3; vertical axis from -2 to 2]


Empirical Hardness Models

• We proposed building regression models as a disciplined way of predicting and studying algorithms’ behaviour

[Leyton-Brown, Nudelman, Shoham, CP-02]

• Applications of this machine learning approach:

1) Predict running time

– Useful to know how long an algorithm will run

2) Gain theoretical understanding

– Which variables are important to the hardness model?

3) Build algorithm portfolios

– Can select the right algorithm on a per-instance basis

4) Tune distributions for hardness

– Can generate harder benchmarks by rejecting easy instances
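As a rough illustration of what "building a regression model" of runtime can look like, the sketch below fits log10(runtime) by ordinary least squares on a quadratic expansion of the instance features. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the tiny `ridge` stabiliser and the use of plain least squares are all choices made here for illustration.

```python
import math

def quadratic_basis(x):
    """Expand features into [1, x_i ..., x_i * x_j for i <= j]."""
    terms = [1.0] + list(x)
    for i in range(len(x)):
        for j in range(i, len(x)):
            terms.append(x[i] * x[j])
    return terms

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit(features, runtimes, ridge=1e-9):
    """Least-squares fit of log10(runtime); `ridge` is an assumed stabiliser."""
    phi = [quadratic_basis(x) for x in features]
    y = [math.log10(t) for t in runtimes]
    d = len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(p[i] * yi for p, yi in zip(phi, y)) for i in range(d)]
    return solve(gram, rhs)

def predict(w, x):
    """Predicted runtime (not log runtime) for one feature vector."""
    return 10 ** sum(wi * ti for wi, ti in zip(w, quadratic_basis(x)))
```

Calling `fit` on a list of feature vectors and measured runtimes returns a weight vector, and `predict(w, x)` then gives a runtime estimate for a new instance with features `x`.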


Outline

• Features

• Experimental Results

–Variable Size Data

–Fixed Size Data


Features: Local Search Probing

[Figure: BEST # Unsat Clauses (0 to 1200) against Step Number (1 to 18), with a Long Plateau and a Short Plateau marked]


Features: Local Search Probing

[Figure: BEST # Unsat Clauses (0 to 1200) against Step Number (1 to 18); highlighted feature: Best Solution (mean, CV)]


Features: Local Search Probing

[Figure: BEST # Unsat Clauses (0 to 1200) against Step Number (1 to 18); highlighted feature: Number of Steps to Optimal (mean, median, CV, 10%, 90%)]


Features: Local Search Probing

[Figure: BEST # Unsat Clauses (0 to 1200) against Step Number (1 to 18); highlighted feature: Ave. Improvement To Best Per Step (mean, CV)]


Features: Local Search Probing

[Figure: BEST # Unsat Clauses (0 to 1200) against Step Number (1 to 18); highlighted feature: First LM Ratio (mean, CV)]


Features: Local Search Probing

[Figure: BEST # Unsat Clauses (0 to 1200) against Step Number (1 to 18); highlighted feature: BestCV (CV of Local Minima) (mean, CV)]
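Some of the probing features named on these slides could be computed from recorded local search runs roughly as follows. This is a sketch under assumptions: each run is taken to be a list of unsatisfied-clause counts per step, steps are counted from 0, CV is computed with the population standard deviation, and the exact definitions used in the talk may differ. `probe_features` and `mean_and_cv` are names invented here.

```python
import statistics

def mean_and_cv(values):
    """Return (mean, coefficient of variation) of a list of numbers."""
    mean = statistics.fmean(values)
    if mean == 0:
        return mean, 0.0
    return mean, statistics.pstdev(values) / mean

def probe_features(trajectories):
    """Summarise local search runs; each run lists #unsat clauses per step."""
    best, steps_to_best, improvement = [], [], []
    for run in trajectories:
        lowest = min(run)
        first_hit = run.index(lowest)  # first step reaching the best value
        best.append(lowest)
        steps_to_best.append(first_hit)
        # Average drop in unsatisfied clauses per step until the best is found.
        improvement.append((run[0] - lowest) / first_hit if first_hit else 0.0)
    return {
        "best_solution": mean_and_cv(best),
        "steps_to_best": mean_and_cv(steps_to_best),
        "avg_improvement_per_step": mean_and_cv(improvement),
    }
```

Each entry of the returned dictionary is a `(mean, CV)` pair taken across runs, matching the "(mean, CV)" annotation on the slides.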


Features: DPLL, LP

• DPLL search space size estimate

– Random probing with unit propagation

– Compute mean depth till contradiction

– Estimate log(#nodes)

• Cumulative number of unit propagations at different depths (DPLL with Satz heuristic)

• LP relaxation

– Objective value

– stats of integer slacks

– #vars set to an integer
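The "random probing with unit propagation" idea can be sketched as below: repeatedly assign a random unassigned variable, propagate unit clauses, and count decisions until a contradiction appears. This is an illustrative sketch only. Clauses are assumed to be lists of non-zero integers (DIMACS-style literals), the function names are invented here, and the log(#nodes) estimate mentioned on the slide is not implemented.

```python
import random

def unit_propagate(clauses, assignment):
    """Extend `assignment` by unit propagation; return False on contradiction."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return False  # every literal in the clause is false
            if len(free) == 1:
                assignment[abs(free[0])] = free[0] > 0
                changed = True
    return True

def probe_depth(clauses, num_vars, rng):
    """Random decisions made before a contradiction (or a full assignment)."""
    assignment = {}
    depth = 0
    if not unit_propagate(clauses, assignment):
        return depth
    while len(assignment) < num_vars:
        unassigned = [v for v in range(1, num_vars + 1) if v not in assignment]
        assignment[rng.choice(unassigned)] = rng.random() < 0.5
        depth += 1
        if not unit_propagate(clauses, assignment):
            break
    return depth

def mean_probe_depth(clauses, num_vars, probes=100, seed=0):
    """Mean depth over several independent random probes."""
    rng = random.Random(seed)
    return sum(probe_depth(clauses, num_vars, rng) for _ in range(probes)) / probes
```

A probe that fills in every variable without contradiction simply reports the number of decisions it made; how such probes should be treated in the feature is left open here.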


Other Features

• Problem Size (used for normalizing many other features):

– v (#vars)

– c (#clauses)

– Powers of c/v, v/c, |c/v - 4.26|

• Graphs:

– Variable-Clause (VCG, bipartite)

– Variable (VG, edge whenever two variables occur in the same clause)

– Clause (CG, edge iff two clauses share a variable with opposite sign)

• Balance

– #pos vs. #neg literals

– unary, binary, ternary clauses

• Proximity to Horn formula

[Figure: small example graphs with Var and Clause nodes illustrating the VCG, VG and CG]
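The three graphs defined on this slide can be built directly from a clause list. The sketch below follows the edge definitions as stated (VCG: variable occurs in clause; VG: two variables share a clause; CG: two clauses share a variable with opposite sign) and adds one simple balance statistic. Clauses are assumed to be lists of non-zero integers, and all function names are invented for illustration.

```python
from itertools import combinations

def variable_clause_graph(clauses):
    """Bipartite VCG edges: (variable, clause index) for each occurrence."""
    return {(abs(lit), i) for i, clause in enumerate(clauses) for lit in clause}

def variable_graph(clauses):
    """VG edges: two variables are joined when they occur in the same clause."""
    edges = set()
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        edges.update(combinations(variables, 2))
    return edges

def clause_graph(clauses):
    """CG edges: two clauses are joined iff they share a variable with opposite sign."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(clauses), 2):
        if any(-lit in b for lit in a):
            edges.add((i, j))
    return edges

def positive_literal_fraction(clauses):
    """Balance statistic: share of literal occurrences that are positive."""
    literals = [lit for clause in clauses for lit in clause]
    return sum(1 for lit in literals if lit > 0) / len(literals)
```

Node-degree statistics (mean, CV and so on) can then be taken over these edge sets to obtain numeric features.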


Outline

• Features

• Experimental Results

–Variable Size Data

–Fixed Size Data


Experimental Setup

• Uniform random 3-SAT, 400 vars

• Datasets (20000 instances each)

– Variable-ratio dataset (1 CPU-month)

• c/v uniform in [3.26, 5.26] (∴ c ∈[1304,2104])

– Fixed-ratio dataset (4 CPU-months)

• c/v=4.26 (∴ v=400, c=1704)

• Solvers

– Kcnfs [Dubois and Dequen]

– OKsolver [Kullmann]

– Satz [Chu Min Li]

• Quadratic regression with logistic response function

• Training : test : validation split – 70 : 15 : 15
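The clause counts quoted above follow from c = (c/v) × v with v = 400: 3.26 × 400 = 1304, 4.26 × 400 = 1704 and 5.26 × 400 = 2104. The sketch below generates instances of both kinds. It assumes one common reading of "uniform random 3-SAT" (three distinct variables per clause, each negated with probability 1/2, repeated clauses allowed); the generator actually used for the datasets is not described on the slide, and the seed is arbitrary.

```python
import random

def random_3sat(num_vars, ratio, rng):
    """Uniform random 3-SAT: 3 distinct variables per clause, random signs."""
    num_clauses = round(ratio * num_vars)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

rng = random.Random(2004)  # arbitrary seed, for reproducibility only

# Fixed-ratio instance: c/v = 4.26 with v = 400.
fixed = random_3sat(400, 4.26, rng)

# Variable-ratio instance: c/v drawn uniformly from [3.26, 5.26].
variable = random_3sat(400, rng.uniform(3.26, 5.26), rng)
```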


Kcnfs Data

[Figure: 4 * Pr(SAT) - 2 and log(Kcnfs runtime) plotted against c / v, for c / v from 3.3 to 5.3; vertical axis from -2 to 2]


Kcnfs Data

[Figure: Runtime (s), log scale from 0.01 to 1000, against Clauses-to-Variables Ratio from 3.26 to 5.26]


Variable Ratio Prediction (Kcnfs)

[Figure: Predicted Runtime [CPU sec] against Actual Runtime [CPU sec], both on log scales from 0.01 to 1000]


Variable Ratio - UNSAT

[Figure: Predicted Runtime [CPU sec] against Actual Runtime [CPU sec], both on log scales from 0.01 to 1000]


Variable Ratio - SAT

[Figure: Predicted Runtime [CPU sec] against Actual Runtime [CPU sec], both on log scales from 0.01 to 1000]


Kcnfs vs. Satz (UNSAT)

[Figure: Satz time [CPU sec] against Kcnfs time [CPU sec], both on log scales from 0.01 to 1000]


Kcnfs vs. Satz (SAT)

[Figure: Satz time [CPU sec] against Kcnfs time [CPU sec], both on log scales from 0.01 to 1000]


Feature Importance – Variable Ratio

• Subset selection can be used to identify features sufficient for approximating full model performance

• Other (correlated) sets could potentially achieve similar performance

Variable                           Cost of Omission

|c/v - 4.26|                       100

|c/v - 4.26|^2                     69

(v/c)^2 × SapsBestCVMean           53

|c/v - 4.26| × SapsBestCVMean      33
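The slides do not spell out how "cost of omission" is computed. One plausible reading, assumed here, is the increase in model error when a single feature is dropped from the selected subset, rescaled so that the most important feature scores 100. The sketch below implements that reading; `rmse_of` is a hypothetical callback that trains and evaluates a model on a given feature subset.

```python
def cost_of_omission(features, rmse_of):
    """Score each feature by the error increase when it is dropped (top score = 100)."""
    full = frozenset(features)
    baseline = rmse_of(full)
    # Raw cost: how much worse the model gets without this one feature.
    raw = {f: rmse_of(full - {f}) - baseline for f in features}
    top = max(raw.values())
    if top <= 0:
        return {f: 0.0 for f in features}
    return {f: 100.0 * max(delta, 0.0) / top for f, delta in raw.items()}
```

Because correlated features can stand in for one another, a low score under this measure does not by itself mean a feature is uninformative.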


Fixed Ratio Data

[Figure: Runtime (s), log scale from 0.01 to 1000, against Clauses-to-Variables Ratio from 3.26 to 5.26]


Fixed Ratio Prediction (Kcnfs)

[Figure: Predicted Runtime [CPU sec] against Actual Runtime [CPU sec], both on log scales from 0.01 to 1000]


Feature Importance – Fixed Ratio

Variable                                   Cost of Omission

SapsBestSolMean^2                          100

SapsBestSolMean × MeanDPLLDepth            74

GsatBestSolCV × MeanDPLLDepth              21

VCGClauseMean × GsatFirstLMRatioMean       9


SAT vs. UNSAT

• Training models separately for SAT and UNSAT instances:

– good models require fewer features

– model accuracy improves

– c/v no longer an important feature for variable-ratio (VR) data

– Completely different features are useful for SAT than for UNSAT

• Feature importance on SAT instances:

– Local Search features sufficient

• 7 features for good VR model

• 1 feature for good fixed-ratio (FR) model (SAPSBestSolCV × SAPSAveImpMean)

– If LS features omitted, LP + DPLL search space probing

• Feature importance on UNSAT instances:

– DPLL search space probing

– Clause graph features


Beyond Ratio: Weighted CG Clustering Coefficient

• Byproduct of our analysis: a very strong correlation between weighted CG clustering coefficient and v/c

• Clustering coefficient is a more fundamental concept than v/c, since it describes the structure of the constraints explicitly, not implicitly.

– correlation between (unweighted) CC and hardness has been shown for other constraint problems (e.g., graph coloring, combinatorial auctions)

• We have a proof sketch of this correlation
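For readers unfamiliar with the term, the standard (unweighted) clustering coefficient of a graph can be computed as below: for each node, the fraction of pairs of its neighbours that are themselves connected, averaged over all nodes. The weighting scheme behind the "weighted CG clustering coefficient" is not given on the slide, so this sketch covers only the unweighted version; nodes of degree below 2 are assumed to contribute 0.

```python
def clustering_coefficient(num_nodes, edges):
    """Mean local clustering coefficient of an undirected graph on nodes 0..num_nodes-1."""
    if num_nodes == 0:
        return 0.0
    neighbours = {n: set() for n in range(num_nodes)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    total = 0.0
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            continue  # no neighbour pairs, counted as 0
        # Number of edges that exist among this node's neighbours.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in neighbours[a])
        total += 2 * links / (k * (k - 1))
    return total / num_nodes
```

Applied to a clause graph, the nodes are clause indices and the edges join clauses that share a variable with opposite sign.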


Conclusions

• Can construct good models for DPLL solvers

• These models can be analyzed to gain understanding about what makes instances hard or easy for solvers

• Algorithm portfolios can be constructed (SATzilla)

• More specifically:

– Strong relationship between LS and DPLL search spaces

– Our approach automatically identified importance of c/v

– SAT/UNSAT instances have very different performance characteristics; it helps to model them separately

– Clustering Coefficient explains why c/v is important in terms of local properties of constraint graph
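The portfolio idea mentioned in the conclusions can be illustrated in a few lines: given one runtime model per solver, run whichever solver is predicted to be fastest on the instance at hand. This is a toy sketch, not a description of how SATzilla is implemented; the predictor functions and the `ratio_dist` feature name in the usage example are hypothetical.

```python
def choose_solver(features, predictors):
    """Pick the solver whose model predicts the smallest runtime for this instance."""
    return min(predictors, key=lambda name: predictors[name](features))

# Hypothetical per-solver runtime models, for illustration only.
toy_predictors = {
    "kcnfs": lambda f: 2.0 + f["ratio_dist"],
    "satz": lambda f: 1.0 + 5.0 * f["ratio_dist"],
}

near_threshold = choose_solver({"ratio_dist": 0.0}, toy_predictors)
far_from_threshold = choose_solver({"ratio_dist": 1.0}, toy_predictors)
```

With these toy models, `near_threshold` is `"satz"` (1.0 versus 2.0) and `far_from_threshold` is `"kcnfs"` (3.0 versus 6.0).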
