TRANSCRIPT
BIRS Workshop, Banff, Canada
Jan 22, 2014 © 2014 IBM Corporation
Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers
George Katsirelos, MIAT, INRA, Toulouse, France
Ashish Sabharwal, IBM Watson, USA
Horst Samulowitz, IBM Watson, USA
Laurent Simon, Univ. Paris-Sud, LRI/CNRS, Orsay, France
[published at AAAI-2013]
Trend Towards Parallelization
Focus Shifting From Single-Thread Performance to Multi-Processor Performance
– 100s and even 1000s of compute cores easily accessible
– Classical Algorithm Parallelization, e.g., parallel sort, shortest path, PRAM model, AC circuits
– Significant Advances in Data Parallelism, e.g., MapReduce, Hadoop, SystemML, R statistics
Challenge: Search and Optimization on 1000s of Processors
– Tremendous advances in the Sequential case of Combinatorial Search, e.g., SAT solvers can tackle instances with ~2M variables, 10M constraints!
– Exponential search appears to be an “obvious” candidate to parallelize!
– In fact, many SAT/CSP/MIP solvers already do support multi-core and multi-machine runs
2 Banff Workshop on SAT, 2014 | Katsirelos, Samulowitz, Sabharwal, Simon
Parallelization of Combinatorial Search
Fact: State-of-the-Art Search Engines Do NOT Parallelize Well
– Brute Force exponential search is, of course, trivial to parallelize
– But sophisticated search engines that adapt (through e.g. clause learning, variable impact aggregation, etc.) have inherent sequential aspects
– Modern SAT/MIP/“adapting”-CP solvers do not parallelize well
  • Supporting data: next slide
AAAI 2012 Challenge Paper on the topic [Hamadi & Wintersteiger 2012]
– P-completeness of Unit Propagation a key barrier (solvers spend ~80% of the time Unit Propagating and we don’t know how to parallelize P well)
– Our result: barriers exist even if Unit Propagation came for free!
Parallelization of Combinatorial Search: SAT
Rather Disappointing Performance at SAT Competitions – e.g., in 2011:
– Average speedup on 8 cores only ~1.8x, on 32 cores only ~3x
– Top performing parallel solvers were based on little to no communication (CryptoMinisat-MT [Soos 2012], Plingeling [Biere 2012])
– Winners were “simple” Portfolio solvers (ppfolio [Roussel], pfolioUZK [Wotzlaw et al])
Plingeling-ats-587 [Dec 2013]
– Single machine with 128 cores and 128 GB memory
– Benchmark set used in this work, restricted to the 142 instances solved by 1 core in [10, 5000) seconds
[Figure: Plingeling ats 587. Speedup (geometric average) vs. Number of Cores. Data labels on the curve, in extraction order: 1.00, 1.25, 1.90, 2.57, 2.69, 1.63]
What makes parallelization of SAT solvers hard?
Can we obtain insights into their behavior beyond eventual wall-clock performance?
Contributions of the Work
A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
– Focus on understanding rather than on engineering
– Are there inherent bottlenecks that may hinder parallelization, irrespective of which heuristics are used to share information?
1. A Practical Study: Interesting properties of Actual Proofs
– Proofs generated by state-of-the-art SAT solvers contain narrow bottlenecks
2. Proof-Based Measures that capture Best-Case Parallelizability
– Coarse measure: “Depth” of the proof graph
– Refined measure: Makespan of a resource constrained scheduling problem
3. Empirical Findings: Correlations and Parallelization Limits
– Typical sequential proofs are not very parallelizable even in the best case!
– “Schedule speedup” / makespan correlates with observed speedup
Approach: Proof Complexity (applied here to Typically Generated Proofs)
Proof Complexity [Cook & Reckhow, 1979]: Study of the nature (e.g., size, width, space, depth, “shape”, etc.) of Proofs of Unsatisfiability
– Resolution Graph of Conflict-Directed-Clause-Learning (CDCL) SAT Solvers
Runtime(any SAT solver, F) ≥ min over proofs of Size(Resolution proof of F)
– Note: Insights applicable also to Satisfiable instances!
  • Solvers prove a lot of sub-formulas to be unsatisfiable before hitting the first solution
  • Formal characterization [Achlioptas et al, 2001 & 2004]
Study of Proofs has provided strong insights into CDCL SAT solvers
– What does “clause learning” bring?
– What do “restarts” add?
[Beame et al, 2004; Buss et al, 2008, 2012; Hertel et al, 2008; Pipatsrisawat et al, 2011]
Worst case / Best case results
Underlying Inference Principle: Resolution
CDCL SAT solvers produce Resolution Derivations
Proof Graph and Depth:
– Each initial and derived constraint is a node, annotated with its proof depth
– proofdepth(initial clause C) = 0
– proofdepth(derived clause C) = 1 + max over parents C' of C of proofdepth(C')
[Figure: proof graph for formula F; each node shows its Constraint ID and Depth]
– Initial clauses C1 to C6: depth 0
– Derived clauses: C7: 1, C8: 2, C9: 1, C10: 3, C11: 2, C12: 3, C13: 4
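The proof-depth recurrence can be sketched in a few lines of Python. This is only an illustration: the slide gives the depth of each node but not the edges, so the parent lists in `PARENTS` are hypothetical, chosen to be consistent with the depths shown. The function name `proof_depths` is likewise an assumption, not something from the talk.

```python
# Hypothetical parent lists (the slide does not give the edges); they are
# chosen only so that the resulting depths match the ones on the slide.
# Clauses that never appear as a key (C1..C6) are treated as initial clauses.
PARENTS = {
    "C7": ("C1", "C2"),
    "C8": ("C7", "C3"),
    "C9": ("C4", "C5"),
    "C10": ("C8", "C9"),
    "C11": ("C9", "C6"),
    "C12": ("C11", "C8"),
    "C13": ("C10", "C12"),
}


def proof_depths(parents):
    """Return {derived clause: proof depth}.

    Assumes the dict lists clauses in derivation order, i.e. every derived
    parent appears before the clauses that use it.
    """
    depth = {}
    for clause, clause_parents in parents.items():
        for p in clause_parents:
            if p in parents and p not in depth:
                raise ValueError(f"{p} is used before it is derived")
        # Initial clauses have depth 0; derived parents were computed earlier.
        depth[clause] = 1 + max(depth.get(p, 0) for p in clause_parents)
    return depth


if __name__ == "__main__":
    for clause, d in proof_depths(PARENTS).items():
        print(clause, d)
```

A single pass in derivation order is used instead of recursion because solver proofs can contain very long dependency chains.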
How Parallelizable are Resolution Refutations?
Refutation(F) = Resolution Proof that derives the empty (“false”) clause
Depth of the proof clearly limits the amount of potential parallelization
– Chain of dependencies
– Theorem: All Resolution Proof Graphs of certain “pebbling” style instances have large depth; also holds for all Conflict Resolution Graphs (XOR substitution trick)
However, proofdepth bound on parallelization is very crude
– Does not explain poor performance with small k (e.g., 8, 32, … processors)
What does a typical sequential SAT solver proof look like?
– Setup for Experiments:
  • Sequential Glucose 2.1 extended with proof output
  • GluSatX10: using SatX10 to run a k-processor version of Sequential Glucose
– Working Assumption: Proofs produced by GluSatX10 on k cores look “similar” to proofs produced by Sequential Glucose
http://x10-lang.org/satx10 [IBM Teams: X10 and SAT/CSP]
** simplified statements; see paper for more formal notions
Proof Graph Example: Very Complex Structure
[Easy sequential case, solved in ~30 seconds]
Bottlenecks in Typical SAT Proofs
Proofs Generated by SAT Solvers Exhibit Surprisingly Narrow “Bottlenecks”, i.e., Depths with Very Few (~1) Clauses!
– Nothing deeper can be derived before the bottleneck clauses ⇒ Sequentiality
[Figure: Number of Clauses Derived at that Depth (log-scale) vs. Depth in the proof]
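A minimal sketch of how such bottlenecks could be detected once the depth of every derived clause is known: count the clauses at each depth and report the depths whose count is at most a threshold. The function names, the `max_width` parameter, and the sample depths are all made up for illustration; they are not data or code from the study.

```python
from collections import Counter


def width_per_depth(depths):
    """Count how many derived clauses sit at each proof depth (depth 0 = initial)."""
    return Counter(d for d in depths if d > 0)


def bottleneck_depths(depths, max_width=1):
    """Return, in increasing order, the depths holding at most max_width clauses."""
    widths = width_per_depth(depths)
    return sorted(d for d, w in widths.items() if w <= max_width)


# Made-up depths of ten derived clauses (not taken from any real proof).
example_depths = [1, 1, 1, 2, 2, 3, 4, 4, 4, 5]

if __name__ == "__main__":
    print(dict(width_per_depth(example_depths)))
    print(bottleneck_depths(example_depths))
```

Plotting `width_per_depth` on a log scale against depth would give the kind of profile the figure describes.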
Best-Case Parallelization with k Processors
Given Proof P and k Processors, Best-Case Parallelization of P = Resource Constrained Scheduling Problem with Precedences
Let Mk(P) = makespan of the optimal schedule of P on k processors
– Even approximating Mk(P) within 4/3 is NP-hard, but (2 – 1/k) approx. is easy
Best-Case k processor speedup on P: Sk(P) = M1(P) / Mk(P)
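[Figure: the example proof graph, nodes labeled with Constraint ID and Depth, extended with one more derived clause C’9]
– Initial clauses C1 to C6: depth 0
– Derived clauses: C7: 1, C8: 2, C9: 1, C’9: 1, C10: 3, C11: 2, C12: 3, C13: 4
– Numeric labels on the derived nodes: 1, 1, 2, 2, 3, 3, 4, 5
Example: M1(P) = 8, M2(P) = 5, M3(P) = 4, M4(P) = 4, …, depth = 4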
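Since the optimal makespan is hard to compute in general, a simple way to experiment is greedy list scheduling, which always yields a feasible schedule and therefore an upper bound on Mk(P). The sketch below is not the procedure used in the paper. It assumes every derivation costs one time unit, gives priority to clauses with the longest chain of dependents, and uses hypothetical parent lists, since the slide does not give the edges. With these assumed edges the greedy schedule happens to reproduce the makespans quoted in the example.

```python
# Hypothetical edges, chosen only to match the depths on the slide.
# Clauses that never appear as a key (C1..C6) are initial clauses.
# The dict must list every derived parent before the clauses that use it.
PARENTS = {
    "C7": ("C1", "C2"),
    "C9": ("C4", "C5"),
    "C'9": ("C5", "C6"),
    "C8": ("C7", "C3"),
    "C11": ("C9", "C'9"),
    "C10": ("C8", "C9"),
    "C12": ("C11", "C8"),
    "C13": ("C10", "C12"),
}


def heights(parents):
    """Length of the longest chain of derived clauses starting at each clause."""
    children = {c: [] for c in parents}
    for clause, clause_parents in parents.items():
        for p in clause_parents:
            if p in children:
                children[p].append(clause)
    height = {}
    for clause in reversed(list(parents)):  # children are visited before parents
        height[clause] = 1 + max((height[ch] for ch in children[clause]), default=0)
    return height


def greedy_makespan(parents, k):
    """Time steps used by list scheduling of unit-time derivations on k processors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    height = heights(parents)
    done, steps = set(), 0
    while len(done) < len(parents):
        # A clause is ready when all of its derived parents finished earlier.
        ready = [c for c in parents if c not in done
                 and all(p in done or p not in parents for p in parents[c])]
        ready.sort(key=lambda c: -height[c])  # longest remaining chain first
        done.update(ready[:k])
        steps += 1
    return steps


makespans = {k: greedy_makespan(PARENTS, k) for k in (1, 2, 3, 4)}
speedups = {k: makespans[1] / m for k, m in makespans.items()}

if __name__ == "__main__":
    for k in makespans:
        print(k, makespans[k], round(speedups[k], 2))
```

Any list schedule of this kind stays within the (2 – 1/k) factor mentioned above, so on real proofs it gives a conservative estimate of the best-case speedup rather than the exact value.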
Makespan vs. Proof Depth
Schedule Makespan yields a finer-grained lower bound, Sk(P), on best-case parallelization than proof depth
– proofdepth(P) : limit of parallelization of P with “infinite” processors
– Mk(P) ≥ proofdepth(P)
– Mk(P) → proofdepth(P) as k → ∞
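The relation between makespan and depth can be written out as a short chain of inequalities. This is a sketch under the assumption that every derivation costs one time unit, so that M_1(P) is the number of derived clauses. The upper bound is the standard guarantee for greedy list scheduling, not a result taken from the talk.

```latex
\[
\max\!\left(\mathrm{proofdepth}(P),\ \left\lceil \frac{M_1(P)}{k} \right\rceil\right)
\;\le\; M_k(P) \;\le\;
\frac{M_1(P)}{k} + \left(1 - \frac{1}{k}\right)\mathrm{proofdepth}(P)
\]
\[
S_k(P) \;=\; \frac{M_1(P)}{M_k(P)} \;\le\; \min\!\left(k,\ \frac{M_1(P)}{\mathrm{proofdepth}(P)}\right)
\]
```

For the example with M1(P) = 8 and depth = 4, the second line gives Sk(P) ≤ 2 for every k, which agrees with M3(P) = M4(P) = 4. As k grows, the upper bound in the first line tends to proofdepth(P), which is the limit behavior stated above.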
Empirical Findings
Even Best-Case Parallelization Efficiency is Low Beyond 100 Processors
Best-Case Efficiency of parallelizing P with k processors = 100 * (Sk(P) / k)
E.g., 100% = full utilization of k processors ⇒ speedup = k
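As a small worked example of the efficiency formula, the sketch below plugs in the makespans from the earlier slide example (M1 = 8, M2 = 5, M3 = 4, M4 = 4). The function name `best_case_efficiency` is an assumption made for illustration.

```python
def best_case_efficiency(m1, mk, k):
    """Best-case efficiency in percent: 100 * (S_k / k), with S_k = M_1 / M_k."""
    if k < 1 or mk <= 0:
        raise ValueError("k must be >= 1 and the makespan must be positive")
    return 100.0 * m1 / (mk * k)


# Makespans quoted in the earlier slide example, keyed by processor count k.
example_makespans = {1: 8, 2: 5, 3: 4, 4: 4}

if __name__ == "__main__":
    m1 = example_makespans[1]
    for k, mk in example_makespans.items():
        print(k, round(best_case_efficiency(m1, mk, k), 1))
```

On this toy example the efficiency already drops to 50% at four processors, because the makespan cannot go below the proof depth.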
Proofs of Some Instances Exhibit Very Low Best-Case Schedule Speedup
A) Even with 1024 processors, best-case speedup ~ 50-100
B) 128 processors insufficient to achieve a speedup of ~ 90
Best-Case Schedule Speedup Correlates With Actual Observed Runtime Speedup
Average over a sliding window
(Makes the study of the best-case schedule speedup relevant)
Summary
A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
– Focus on understanding rather than on engineering
Main Findings:
A. Typical Sequential Refutations Contain Surprisingly Narrow Bottlenecks
B. Typical Sequential Refutations are Not Parallelizable Beyond a Few Processors, even in the best case of offline ‘schedule speedup’ produced in hindsight
C. Observed Runtime Speedup with k processors weakly correlates with Best-Case Schedule Speedup of a Sequential Proof produced in hindsight
Open Question: Can we design SAT solvers that generate Proofs that are inherently More Parallelizable?
Caveat: assumption that proofs generated by GluSatX10 on k cores look “similar” to proofs generated by Sequential Glucose