efficient verification of network fault tolerance via ...nickgian.github.io/cav2019-slides.pdf · a...

Efficient verification of network fault tolerance via counterexample-guided refinement Nick Giannarakis 1 Ryan Beckett 2 Ratul Mahajan 34 David Walker 1 1 Princeton University 2 Microsoft Research 3 University of Washington 4 Intentionet


Post on 10-Jul-2020


Page 1: Efficient verification of network fault tolerance via ...nickgian.github.io/cav2019-slides.pdf · A network compression theory compatible with failures. Origami, a tool that combines

Efficient verification of network fault tolerance via counterexample-guided refinement

Nick Giannarakis (1), Ryan Beckett (2), Ratul Mahajan (3, 4), David Walker (1)

(1) Princeton University, (2) Microsoft Research, (3) University of Washington, (4) Intentionet


Network misconfigurations


Configuration challenges

✓ Complexity. Configurations are overly complex.
✓ Changing environment. Peers send arbitrary routes.
✓ Failures. Exponential number of behaviors to check.
✗ Scale. Millions of configuration lines on thousands of devices.

MineSweeper [Beckett 2017]: SMT-based verifier.

[Plot: verification time against network size]


Network compression: Bonsai [Beckett et al., 2018]

[Figure: Concrete Network ⇒ Abstract Network, both with destination d; one link is marked as failed]

Exploit topology/policy symmetries.
Concrete nodes route “in the same way” as their abstraction.
But: does not preserve fault tolerance properties!


Our contribution

Is compression possible in the presence of failures?

Yes! In this talk:
A network compression theory compatible with failures.
Origami, a tool that combines graph algorithms and SMT reasoning to compress a network and verify reachability properties in the presence of link failures.


Network Compression Theory


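The routing problem

[Figure: network with destination d (holding the initial route), intermediate nodes b1, b2, b3 and node a; two links are marked as failed]

ℱ : edge → bool
ℒ : node → route

Formal model: Stable paths [Griffin et al., 2002], routing algebras [Sobrinho, 2005].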
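A minimal sketch of the two maps on this slide, assuming plain hop-count (shortest-path) routing as the policy. Reading ℱ as "which links have failed" and ℒ as "the route each node ends up with" is an interpretation of the slide, and the topology, the function name `solve_labels` and the use of hop counts as routes are all illustrative assumptions rather than part of the talk.

```python
from collections import deque


def solve_labels(edges, dest, failed):
    """Return L: node -> hop count to dest (None if unreachable).

    `failed` plays the role of F: it maps an edge to True when the link is down.
    Hop-count routing is an assumption made for this sketch.
    """
    # Build adjacency over the links that are still up.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if not failed.get((u, v), False):
            adj[u].add(v)
            adj[v].add(u)

    labels = {node: None for node in adj}
    labels[dest] = 0  # the destination holds the initial route
    queue = deque([dest])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if labels[v] is None:
                labels[v] = labels[u] + 1
                queue.append(v)
    return labels


# Hypothetical topology: d is linked to b1, b2, b3, and each of them to a.
EDGES = [("d", "b1"), ("d", "b2"), ("d", "b3"),
         ("b1", "a"), ("b2", "a"), ("b3", "a")]

no_failures = solve_labels(EDGES, "d", {})
two_failures = solve_labels(EDGES, "d", {("d", "b1"): True, ("d", "b2"): True})
```

With two of the three links into d down, a still reaches d through b3 in two hops, while b1 and b2 now have to route through a.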


Topological symmetries: ∀∃-abstraction

[Figure: concrete network (nodes d, a1 to a4, b1 to b4) and an abstract network (nodes d, a1, a234, b)]

Concrete and abstract networks have similar connectivity.
Example: the blue abstract node has an edge to the pink abstract node iff all blue concrete nodes have an edge to some pink concrete node.
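The connectivity condition in the example can be checked mechanically. The sketch below covers only the topological part as stated on the slide and ignores any policy conditions. The function names, the concrete topologies and the mapping dictionaries are hypothetical, chosen only so that a coarse abstraction (d, a, b) and a split one (d, a1, a234, b) can be compared.

```python
def undirected(pairs):
    """Expand undirected links into a set of directed edges."""
    return {(u, v) for u, v in pairs} | {(v, u) for u, v in pairs}


def is_forall_exists_abstraction(concrete_edges, abstract_edges, f):
    """Check the forall-exists condition for every ordered pair of abstract nodes.

    f maps each concrete node to its abstract node. An abstract edge (A, B) must
    exist iff every concrete node in A has an edge to some concrete node in B.
    """
    members = {}
    for node, abstract_node in f.items():
        members.setdefault(abstract_node, set()).add(node)

    neighbors = {node: set() for node in f}
    for u, v in concrete_edges:
        neighbors[u].add(v)

    for src, src_members in members.items():
        for dst, dst_members in members.items():
            if src == dst:
                continue  # edges inside one abstract node are not modeled here
            forall_exists = all(neighbors[u] & dst_members for u in src_members)
            if forall_exists != ((src, dst) in abstract_edges):
                return False
    return True


# Hypothetical symmetric topology: every a_i links to d and to two b nodes.
A_D = [("d", a) for a in ("a1", "a2", "a3", "a4")]
A_B = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
       ("a3", "b3"), ("a3", "b4"), ("a4", "b3"), ("a4", "b4")]
SYMMETRIC = undirected(A_D + A_B)
# Same topology, but a1 has lost its links to the b nodes.
ASYMMETRIC = undirected(A_D + [e for e in A_B if e[0] != "a1"])

F_COARSE = {"d": "d", **{n: "a" for n in ("a1", "a2", "a3", "a4")},
            **{n: "b" for n in ("b1", "b2", "b3", "b4")}}
F_SPLIT = {**F_COARSE, "a1": "a1", "a2": "a234", "a3": "a234", "a4": "a234"}

ABSTRACT_COARSE = undirected([("d", "a"), ("a", "b")])
ABSTRACT_SPLIT = undirected([("d", "a1"), ("d", "a234"), ("a234", "b")])

coarse_ok = is_forall_exists_abstraction(SYMMETRIC, ABSTRACT_COARSE, F_COARSE)
coarse_broken = is_forall_exists_abstraction(ASYMMETRIC, ABSTRACT_COARSE, F_COARSE)
split_ok = is_forall_exists_abstraction(ASYMMETRIC, ABSTRACT_SPLIT, F_SPLIT)
```

In the asymmetric variant the coarse mapping fails because a1 has no edge to any b node, while separating a1 from a2, a3 and a4 restores the condition.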


Challenges with link failures I

[Figure: concrete network (nodes d, a1 to a4, b1 to b4) and its abstract network (nodes d, a, b)]

Concrete pink nodes have 2 disjoint paths, their abstraction has only 1.


Challenges with link failures II

[Figure: concrete network (nodes d, a1 to a4, b1 to b4) with one failed link, and its abstract network (nodes d, a, b)]

a1 no longer has similar routing behavior to a2, a3 and a4.
The abstract node a does not capture both behaviors.


Plausible abstractions

[Figure: concrete network (nodes d, a1 to a4, b1 to b4) and an abstract network (nodes d, a13, a24, b)]

Plausible abstraction: Nodes have two disjoint paths!
A necessary (but not sufficient) condition for 1-fault tolerance.
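The "two disjoint paths" condition can be tested with a unit-capacity max-flow computation, since the number of edge-disjoint paths between two nodes equals the size of the minimum edge cut (Menger's theorem). The sketch below is an illustration under assumptions: the helper names, the `failures` parameter (generalizing "more than 1" to "more than k") and the two small abstract graphs are hypothetical, and this is not necessarily how Origami computes the check.

```python
from collections import deque


def edge_disjoint_paths(edges, src, dst):
    """Count edge-disjoint paths between src and dst in an undirected graph."""
    if src == dst:
        raise ValueError("src and dst must differ")
    # Each undirected link becomes two arcs of capacity 1.
    cap, adj = {}, {}
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            cap[(x, y)] = cap.get((x, y), 0) + 1
            adj.setdefault(x, set()).add(y)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow
        # Push one unit of flow back along the path found.
        v = dst
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def is_plausible(edges, dest, nodes, failures=1):
    """Necessary condition only: every node keeps more than `failures` disjoint paths."""
    return all(edge_disjoint_paths(edges, n, dest) > failures for n in nodes)


COARSE = [("d", "a"), ("a", "b")]
SPLIT = [("d", "a13"), ("d", "a24"), ("a13", "b"), ("a24", "b")]

coarse_plausible = is_plausible(COARSE, "d", ["a", "b"])
split_plausible = is_plausible(SPLIT, "d", ["a13", "a24", "b"])
```

The coarse graph gives every node a single path to d, so it cannot be used to argue about one link failure. The split graph gives every node two edge-disjoint paths.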


Approximating concrete networks

[Figure: concrete network (nodes d, a1 to a4, b1 to b4) with one failed link, and an abstract network (nodes d, a13, a24, b) annotated with 2 failures]

If a node has a route to the destination with k failures, then it has a route that is at least as good with k' failures (k' < k).
Abstract network over-approximates link failures.
Approximation is key to achieving compression.


Theory of approximation

Label approximation theorem
Given a network and its effective abstraction 𝑓, for any solution (ℒ, ℱ) of the concrete network there exists a solution (ℒ̂, ℱ̂) of the abstract network such that ℒ(𝑢) ⪯ ℒ̂(𝑓(𝑢)).

Holds for networks whose policy is monotonic and isotonic. In the two conditions below, 𝑓 stands for the policy applied to a route, not the abstraction function of the theorem.
Monotonic: route ≺ 𝑓(route)
Isotonic: route1 ≺ route2 ⇒ 𝑓(route1) ≺ 𝑓(route2)

Reachability in the abstraction implies reachability in the concrete network.
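The monotonic and isotonic conditions can be tested by brute force over a small, finite set of routes. In this sketch routes are hop counts, "a ≺ b" is read as "a is strictly preferred to b" and implemented as `a < b`, and both policies are made-up examples. None of these choices come from the talk.

```python
from itertools import product


def is_monotonic(routes, policy, prefer):
    """route ≺ policy(route) for every route: applying the policy never improves a route."""
    return all(prefer(r, policy(r)) for r in routes)


def is_isotonic(routes, policy, prefer):
    """route1 ≺ route2 implies policy(route1) ≺ policy(route2)."""
    return all(
        prefer(policy(r1), policy(r2))
        for r1, r2 in product(routes, repeat=2)
        if prefer(r1, r2)
    )


def shorter(a, b):
    """Assumed preference: fewer hops is strictly better."""
    return a < b


def add_hop(route):
    """Shortest-path style policy: each link adds one hop."""
    return route + 1


def reset_long_routes(route):
    """Contrived policy that rewrites long routes to look short."""
    return 0 if route > 5 else route + 1


ROUTES = range(10)
good = (is_monotonic(ROUTES, add_hop, shorter), is_isotonic(ROUTES, add_hop, shorter))
bad = (is_monotonic(ROUTES, reset_long_routes, shorter),
       is_isotonic(ROUTES, reset_long_routes, shorter))
```

The hop-adding policy satisfies both conditions. The resetting policy violates both, because a route of length 6 becomes more preferred after the policy is applied and overtakes the image of a route of length 5.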


Abstraction Algorithm


Abstract + verify

[Figure: concrete network with nodes s1 to s4, a1 to a7, b1 to b8 and destination d]

Pink nodes do not announce routes to blue nodes.

(1) Can blue and pink nodes reach the destination when there is 1 link failure?
(2) Start from the smallest abstraction.
(3) REFINE to obtain a plausible abstraction: |mincut(Graph, blue)| > 1 and |mincut(Graph, pink)| > 1.
(4) Run the verification procedure.

Blue node cannot reach the destination!


Counterexample-guided refinement

[Figure: abstract network with destination d; the counterexample is annotated with 6 failures]
Spurious counterexample ⇒ refine the abstraction.

No progress!

[Figure: abstract network with destination d; two edges are marked as disabled]
Learned that pink nodes do not send routes to blue nodes.
Start over, REFINE until |mincut(Graph - disabled, blue)| > 1.

[Figure: final abstract network with destination d]
Verifies reachability under any single link failure.
Carries over to the concrete network by the soundness theorem!
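The loop described on these slides can be sketched as a generic driver. The four callbacks (`refine`, `verify`, `is_real`, `learn_disabled`), the result tags and the stopping rule are hypothetical placeholders. The slides do not specify Origami's actual interfaces or what it does when nothing new is learned, so this shows only the shape of the loop.

```python
def cegar(refine, verify, is_real, learn_disabled, max_rounds=100):
    """Counterexample-guided refinement loop (skeleton under stated assumptions).

    refine(disabled)     -> a plausible abstraction that ignores disabled edges
    verify(abstraction)  -> None if the property holds, else a counterexample
    is_real(cex)         -> True if the counterexample also exists concretely
    learn_disabled(cex)  -> edges found to carry no routes, learned from cex
    """
    disabled = frozenset()
    for _ in range(max_rounds):
        abstraction = refine(disabled)
        cex = verify(abstraction)
        if cex is None:
            return "verified", abstraction
        if is_real(cex):
            return "violated", cex
        # Spurious counterexample: record what it taught us and start over.
        learned = frozenset(learn_disabled(cex)) - disabled
        if not learned:
            return "unknown", abstraction  # no progress is possible
        disabled = disabled | learned
    return "unknown", None


# Toy callbacks: the "abstraction" is just the number of disabled edges, and
# verification succeeds once two edges have been learned.
result = cegar(
    refine=lambda disabled: len(disabled),
    verify=lambda size: None if size >= 2 else ("cex", size),
    is_real=lambda cex: False,
    learn_disabled=lambda cex: {("edge", cex[1])},
)
```

A "verified" result on the abstraction would then carry over to the concrete network by the soundness theorem, which is what makes the spurious-or-real distinction the only point where the concrete network has to be consulted.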


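The REFINE procedure

[Figure: concrete network (nodes d, a1 to a4, b1 to b4) and candidate abstractions: splitting a into a12 and a34 (shown together with b split into b12 and b34), and splitting a into a13 and a24 with b left whole]

Goal: Compute a plausible abstraction.
Split abstract nodes, but:
(1) Which nodes to split?
(2) How to split them?
(3) Must remain a valid ∀∃-abstraction.
(4) Need to make the right splitting choices.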


The REFINE procedure

Computing the smallest plausible abstraction seems difficult!
Instead:
Explore many plausible abstractions.
Guide the search by a set of heuristics.
Pick the smallest abstraction found.


Evaluation


Compression and Verification results

Topo   V/E          # Failed   Abs V/E   Abstraction Time   SMT Time
FT20   500/8000     1          9/20      0.1                0.1
FT20   500/8000     3          40/192    1.0                7.6
FT20   500/8000     5          96/720    2.5                248
FT40   2000/64000   1          12/28     0.1                0.1
FT40   2000/64000   3          45/220    33                 12.3
FT40   2000/64000   5          109/880   762.3              184.1

Evaluated on synthetic datacenter topologies.
Often reduced edges by more than 100x.
Abstraction time is insignificant.
SMT verification is possible.
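The edge reduction behind the "more than 100x" remark can be worked out directly from the V/E and Abs V/E columns. The sketch below copies the edge counts from the table and divides concrete edges by abstract edges. The variable and function names are illustrative.

```python
# (topology, concrete edges, failed links, abstract edges), taken from the table.
ROWS = [
    ("FT20", 8000, 1, 20),
    ("FT20", 8000, 3, 192),
    ("FT20", 8000, 5, 720),
    ("FT40", 64000, 1, 28),
    ("FT40", 64000, 3, 220),
    ("FT40", 64000, 5, 880),
]


def edge_reduction(concrete_edges, abstract_edges):
    """How many times fewer edges the abstract network has."""
    return concrete_edges / abstract_edges


reductions = {
    (topo, failed): edge_reduction(concrete, abstract)
    for topo, concrete, failed, abstract in ROWS
}
over_100x = sorted(key for key, ratio in reductions.items() if ratio > 100)

for (topo, failed), ratio in reductions.items():
    print(f"{topo} with {failed} failed link(s): {ratio:.1f}x fewer edges")
```

The factor shrinks as more failures are considered, because tolerating more failures needs an abstraction with more disjoint paths and therefore more edges.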


Heuristics effectiveness

[Plot: abstraction size (y-axis, 0 to 300) against search breadth (x-axis, 1 to 25) for FT20 (500/8000) with 5 link failures; series: heuristics off, all heuristics]

Random searches will not achieve high compression.
Heuristics make (costly) mistakes.


Conclusions

We enable verification of fault tolerance of large networks:

Based on a new theory of network compression.
Origami, a tool that can handle networks out of reach to current state-of-the-art tools.
Geared towards reachability only.
Some properties are not preserved by approximation.


Thank you!