
Formal Reasoning about Networks

Mooly Sagiv

[email protected]

03-640-7606

Tel Aviv University

Sunday 14-16

http://www.cs.tau.ac.il/~msagiv/courses/rsys.html

Outline

• Why bother about network verification?

• Verifying Software Defined Networks [PLDI’14]

• Middlebox Verification [TACAS’16]

• Azure Verification [NSDI’15, POPL’16]

[PLDI’14] T. Ball, N. Bjørner, A. Gember, S. Itzhaky, A. Karbyshev, M. Sagiv, M. Schapira, A. Valadarsky: VeriCon: Towards Verifying Controller Programs in Software-Defined Networks.

[TACAS’16] Y. Velner, K. Alpernas, A. Panda, A. Rabinovich, M. Sagiv, S. Shenker, S. Shoham: Some Complexity Results for Stateful Network Verification.

[POPL’16] G. Plotkin, N. Bjørner, N. Lopes, A. Rybalchenko, G. Varghese: Scaling Network Verification Using Symmetry and Surgery.

[NSDI’15] N. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, G. Varghese: Checking Beliefs in Dynamic Networks.

The Internet: A Remarkable Story

• Tremendous success – From research experiment to global infrastructure

• Brilliance of under-specifying – Network: best-effort packet delivery

– Hosts: arbitrary applications

• Enables innovation in applications – Web, P2P, VoIP, social networks, virtual worlds

• But, change is easy only at the edge…


Inside the ‘Net: A Different Story…

• Closed equipment

– Software bundled with hardware

– Vendor-specific interfaces

• Over-specified

– Slow protocol standardization

• Few people can innovate

– Equipment vendors write the code

– Long delays to introduce new features


Impacts performance, security, reliability, cost…

Do We Need Innovation Inside?


Many boxes (routers, switches, firewalls, …), with different interfaces.

How Hard are Networks to Manage?

• Operating a network is expensive – More than half the cost of a network

– Yet, operator error causes most outages

• Buggy software in the equipment – Routers with 20+ million lines of code

– Cascading failures, vulnerabilities, etc.

• The network is “in the way” – Especially a problem in data centers

– … and home networks


Creating Foundation for Networking

• A domain, not a discipline – Alphabet soup of protocols

– Header formats, bit twiddling

– Preoccupation with artifacts

• From practice, to principles – Intellectual foundation for networking

– Identify the key abstractions

– … and support them efficiently

• To build networks worthy of society’s trust


Rethinking the “Division of Labor”

Traditional Computer Networks

• Data plane (packet streaming): forward, filter, buffer, mark, rate-limit, and measure packets

• Control plane (distributed algorithms): track topology changes, compute routes, install forwarding rules

• Management plane (human time scale): collect measurements and configure the equipment

Shortest-Path Routing

• Management: set the link weights

• Control: compute shortest paths

• Data: forward packets to next hop
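The division of labor on this slide can be sketched on a toy topology. This is a minimal illustration, not the slide's own figure: the three-node network (A, B, D), its link weights, and the name `shortest_path_next_hops` are assumptions made for the example. The weights play the role of the management plane, Dijkstra's algorithm stands in for the control plane, and the returned next-hop table is what the data plane would consult.

```python
import heapq

def shortest_path_next_hops(weights, source):
    """Return {destination: next_hop} for `source`, given symmetric link weights."""
    # Build an undirected adjacency map from the (u, v) -> weight dictionary.
    graph = {}
    for (u, v), w in weights.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w

    dist = {source: 0}
    next_hop = {}
    heap = [(0, source, None)]  # (distance, node, first hop on the path)
    while heap:
        d, node, first = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        if first is not None and node not in next_hop:
            next_hop[node] = first
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the first hop is the neighbor itself.
                heapq.heappush(heap, (nd, nbr, nbr if first is None else first))
    return next_hop

# Hypothetical weights: management sets them, control computes the paths.
weights = {("A", "D"): 1, ("A", "B"): 1, ("B", "D"): 3}
print(shortest_path_next_hops(weights, "B"))  # B reaches D through A (cost 2 < 3)
```

Raising the A-D weight in this toy network changes the computed next hops, which is the lever that traffic engineering pulls.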

[Figure: example network with link weights 1, 1, 3, 1, 1]

Inverting the Control Plane

• Traffic engineering

– Change link weights

– … to induce the paths

– … that alleviate congestion

[Figure: example network with link weights 5, 1, 3, 1, 1]

Avoiding Transient Anomalies

• Distributed protocol

– Temporary disagreement among the nodes

– … leaves packets stuck in loops

– Even though the change was planned!
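A transient loop of this kind can be reproduced with hand-written forwarding tables. This is a minimal sketch under stated assumptions: the nodes A, B, D and the `old`/`new` tables are hypothetical (they correspond to raising one link weight so that A starts routing to D through B), and `trace` is a helper introduced here. The mixed state models the moment when A has installed its new table and B has not.

```python
def trace(tables, source, destination):
    """Follow next hops from source; return (path, looped)."""
    path, node = [source], source
    while node != destination:
        node = tables[node][destination]
        if node in path:
            return path + [node], True  # revisited a node: forwarding loop
        path.append(node)
    return path, False

old = {"A": {"D": "D"}, "B": {"D": "A"}}  # before the weight change
new = {"A": {"D": "B"}, "B": {"D": "D"}}  # after both nodes have converged
mixed = {"A": new["A"], "B": old["B"]}    # A has updated, B has not yet

for name, tables in (("old", old), ("new", new), ("mixed", mixed)):
    print(name, trace(tables, "A", "D"))
```

Both the old and the new configuration deliver the packet; only the intermediate state loops, even though each endpoint of the change is correct.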

[Figure: example network with one link weight changing from 1 to 5]

Death to the Control Plane!

• Simpler management

– No need to “invert” control-plane operations

• Faster pace of innovation

– Less dependence on vendors and standards

• Easier interoperability

– Compatibility only in “wire” protocols

• Simpler, cheaper equipment

– Minimal software


Software Defined Networking (SDN)

• Logically-centralized control: smart, slow

• API to the data plane (e.g., OpenFlow)

• Switches: dumb, fast

Classical Networking

Ted Stevens was right

• Networks provide end-to-end connectivity

• Just contain hosts and switches

• All interesting processing at the hosts

[Figure: hosts Alice, Bob, Trent, and Mallory]

Security & Performance

• Security (firewalls, IDSs, …)

• Performance (caches, load balancers, …)

• New functionality (proxies, …)

[Figure: hosts Alice, Bob, Trent, and Mallory with a firewall, a load balancer, and a cache in the network]

Middleboxes

• Middleboxes are intermediaries – Interposed in‐between the communicating hosts

– Often without knowledge of one or both parties

• Examples – Network address translators (NAT)

– Firewall

– Traffic shapers

– Intrusion detection systems (IDSs)

– Transparent Web proxy caches

– Application accelerators

NAT

local prt global

10.0.0.1 1 138.76.29.7

Firewalls

[Figure: a firewall that maintains a set of trusted hosts; hosts A, B, C, D, and H]

Learning Switch

[Figure: a learning switch with ports 1, 2, and 3 and hosts A, B, and D; learned entries: A on 1, D on 3]

Web Clients and Servers

• Most Web applications use a client-server protocol

– Client sends a request

– Server sends a response

• Proxies play both roles

– A server to the client

– A client to the server

www.cnn.com

www.google.com

Cache

Two Views of Middleboxes

• An abomination (toevah) – Violation of layering

– Breaks the functional model

– Responsible for many subtle bugs

• A practical necessity – Significant part of the network

– Solving real and pressing problems

– Needs that are not likely to go away

– Local functionality enhancements

Local enhancements: Riverbed

Overloaded

Cache Proxy

Normal Load

Middlebox code can get complex

• Source code complexity

– Bro network intrusion detection system: 101,500 lines of C++, Python, Perl, Awk, Lex, Yacc

– Snort IDS: 220,000 lines of C, …

– pfSense: 476,438 lines of C, PHP, scripts, …

• Hard to specify correctness

– What is a correct IDS?


Programming error

• The middlebox code fails to implement the required functionality

• Incorrect intrusion detection system

– 10 CVE reports in 2014 for pfSense, a popular firewall

– CVE on firewall hardware from Palo Alto Networks (2010)

• Misinterprets HTTP cookie options, etc.

• Heartbleed bug – allows anyone on the Internet to read the memory of the systems protected by the vulnerable versions of the OpenSSL software

• Requires code analysis

Hypothesis

• There are only a few types of middleboxes

• Can abstract the model of middleboxes as finite state machines

Safety of Computer Networks

• Show that something bad cannot happen

• Early detection of potential bugs

• Isolation:

• A packet of type t sent from host A never reaches host B

• Isolation between two universities

• SSH packets from host A cannot reach B

Safety with middleboxes

• Safety can be checked when the network only has switches with static routing rules

• Trace the forwarding graph

• Middleboxes make everything harder

• Arbitrary behavior – black box

• Rewrite packet headers

• Middleboxes behave differently over time – need to reason about history

• Composition may violate safety
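For the easy case named first on this slide (only switches with static rules), isolation reduces to reachability in the forwarding graph. The following is a minimal sketch of that idea; the rule format, the node names, and the helpers `reachable` and `isolated` are assumptions made for the example, not part of any tool discussed here.

```python
from collections import deque

def reachable(rules, start, packet_type):
    """Nodes a packet of `packet_type` can reach from `start`.

    rules: {node: {packet_type: [next nodes]}}, static forwarding rules.
    """
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in rules.get(node, {}).get(packet_type, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def isolated(rules, a, b, packet_type):
    """True when no packet of `packet_type` sent from a can reach b."""
    return b not in reachable(rules, a, packet_type)

# Hypothetical network: switch s1 forwards HTTP to B but has no SSH rule.
rules = {
    "A": {"ssh": ["s1"], "http": ["s1"]},
    "s1": {"http": ["B"]},
}
print(isolated(rules, "A", "B", "ssh"))   # SSH from A never reaches B
print(isolated(rules, "A", "B", "http"))  # HTTP from A does reach B
```

This traversal is sound only because the rules never change. Once a middlebox rewrites headers or changes behavior with history, a single graph no longer describes the network, which is the difficulty the remaining bullets point at.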

Firewall Misconfiguration

[Figure: a firewall with rule “Deny A” and a cache proxy P between hosts A and B; intended property: A is isolated from B]

Complex misconfiguration

[Figure: two load balancers and two IDSs between hosts A and B; each IDS allows at most one packet from B]

VeriCon: Towards Verifying Controller Programs in SDNs

Thomas Ball, Nikolaj Bjørner, Aaron Gember, Shachar Itzhaky, Aleksandr Karbyshev, Mooly Sagiv, Michael Schapira, Asaf Valadarsky

Traditional Computer Networks


Data plane: packet streaming

Control plane: distributed algorithms

New Paradigm: Software Defined Networking (SDN)


API to the data plane (e.g., OpenFlow)

logically-centralized control in software

switches

smart but slow software

dumb but fast hardware

Controller: Programmability

[Figure: applications (APP) running on top of the controller]

• Events from switches: topology changes, traffic statistics, arriving packets

• Commands to switches: (un)install rules, query statistics

Firewall Pseudocode

ft = {}
rel trusted(SW, HO) = {}
while true {
  event(switch, src → dst, in-port)
  if exists out-port s.t. <switch, src → dst, in-port → out-port> ∈ ft
    switch.forward(src → dst, in-port → out-port)  // handled by switch
  else if in-port = 0
    switch.forward(src → dst, 0 → 1)               // forward to outside world
    trusted.insert(switch, dst)                    // dst is now trusted
    ft.insert(switch, src → dst, 0 → 1)            // insert a per-flow rule to forward future packets
  else if in-port = 1                              // packets from the outside world
    if <switch, src> ∈ trusted
      switch.forward(src → dst, 1 → 0)             // forward the packet to trusted hosts
      ft.insert(switch, src → dst, 1 → 0)          // insert a per-flow rule to forward future packets
}

Desired Network Properties

• Routing

–No forwarding loops, no black holes, …

• Security

–ACL, firewall, middleboxes, …

• Traffic Engineering

– Load balancing, VM migration, …

• …


How can we guarantee such properties?


Traditional Networks vs. SDN

• Guaranteeing these properties in traditional networks is hard

– Switch/router code is a “black box”

– Protocols are distributed across devices

• SDN opens up the possibility of applying formal software verification to networks!

– Accessible code

– Centralized control (sequential core)

– Distributed switches with simple semantics

Existing Approaches for SDN Verification

• Finite-state model checking

– NICE & Verificare, FlowLog

• Analyzing network snapshots

– Header Space Analysis

• Run-time checks

– VeriFlow & NetPlumber

Might miss bugs!

Discover bugs too late & run-time overhead

Dream Scenario

• Verify network-wide properties at compile time

– Find violations before they occur!

• Provable verification

– Prove correctness for correct programs

– Parametric network topologies

– Find a counterexample for incorrect programs (useful for debugging)

An Ideal Tool

[Figure: the controller code (P), the desired properties, and the restrictions on topology (T) are fed to a verification conditions generator; its output goes to a FOL satisfiability solver (Z3), which returns either a proof or a counterexample]

switch1 switch2. port. link(switch1, port, switch2)

∀ switch, packet. ssh(packet) ⇒ !forward(switch, packet)

In general P is not expressible in FOL

Inductive Invariants

• An invariant Inv is inductive if:

1. The initial state satisfies Inv

2. Whenever an event E is executed on an arbitrary state satisfying Inv, the resulting state satisfies Inv: {Inv} E {Inv}

• Permits compositional verification

• … but may be hard for programmers

• Can be inferred by backward propagation (WP)

Example: x = 2; while true do x := 2 * x - 1

– Non-inductive: x > 0

– Inductive: x > 1
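The difference between the two candidate invariants for the loop x := 2 * x - 1 can be seen by searching for a state that satisfies the invariant before the step and violates it afterwards. This is a minimal sketch, assuming x ranges over the rationals (over the integers, x > 0 would be preserved by the step); the helper names and the sampled grid are choices made for the example, and a sampled search is a bounded check, not a proof.

```python
from fractions import Fraction

def step(x):
    """One loop iteration: x := 2 * x - 1."""
    return 2 * x - 1

def find_inductiveness_counterexample(inv, candidates):
    """Return a state that satisfies inv but whose successor does not, if any."""
    for x in candidates:
        if inv(x) and not inv(step(x)):
            return x
    return None

# Sample states from -5 to 10 in steps of 1/4.
candidates = [Fraction(n, 4) for n in range(-20, 41)]

print(find_inductiveness_counterexample(lambda x: x > 0, candidates))
print(find_inductiveness_counterexample(lambda x: x > 1, candidates))
```

The state x = 1/4 satisfies x > 0 but steps to -1/2, so x > 0 is not inductive even though every state reachable from x = 2 satisfies it. No sampled state breaks x > 1, consistent with the algebra: x > 1 implies 2x - 1 > 1.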

A Less Ideal Tool

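[Figure: as in the ideal tool, with an invariant Inv as an additional input next to the controller code (P), the desired properties, and the restrictions on topology (T); the verification conditions generator produces Init ⇒ Inv and {Inv} event {Inv} for the FOL satisfiability solver (Z3), which returns either a proof or a counterexample]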


Desired Properties Firewall

• S.frwd(Src → Dst, 1 → 0) ⇒ ∃Src′: HO. S.frwd(Src′ → Src, 0 → 1)

• S.ft(Src → Dst, 1 → 0) ⇒ ∃Src′: HO. S.frwd(Src′ → Src, 0 → 1)

[Figure: switch s with ports 0 and 1, host a, the controller with its trusted relation, and a forwarding table with columns Src, In, Dst, Out; an event arrives on port 1]

Inductive Invariant Firewall

• S.frwd(Src → Dst, 1 → 0) ⇒ ∃Src′: HO. S.frwd(Src′ → Src, 0 → 1)

• S.ft(Src → Dst, 1 → 0) ⇒ ∃Src′: HO. S.frwd(Src′ → Src, 0 → 1)

• <S, H> ∈ trusted ⇒ ∃Src: HO. S.frwd(Src → H, 0 → 1)
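The three conjuncts can be exercised on a Python transcription of the firewall pseudocode. This is a minimal sketch, not VeriCon itself: it assumes a single switch, hypothetical host names, and reads each conjunct as "anything allowed from port 1 to port 0 (a forwarded packet, a flow-table rule, or a trusted host) is justified by an earlier packet forwarded 0 → 1 to that host". Random testing only samples executions; the tool's point is to prove the invariant for all of them.

```python
import random

HOSTS = ["a", "b", "c", "d"]  # hypothetical host names

class Firewall:
    def __init__(self):
        self.ft = set()       # flow table entries: (src, dst, in_port, out_port)
        self.trusted = set()  # hosts allowed to send from port 1 to port 0
        self.frwd = set()     # history of forwarded packets

    def event(self, src, dst, in_port):
        for out_port in (0, 1):
            if (src, dst, in_port, out_port) in self.ft:
                self.frwd.add((src, dst, in_port, out_port))  # handled by switch
                return
        if in_port == 0:                       # packet towards the outside world
            self.frwd.add((src, dst, 0, 1))
            self.trusted.add(dst)
            self.ft.add((src, dst, 0, 1))
        elif in_port == 1 and src in self.trusted:
            self.frwd.add((src, dst, 1, 0))
            self.ft.add((src, dst, 1, 0))

def invariant_holds(fw):
    def was_contacted(host):
        # Some packet was forwarded 0 -> 1 with `host` as its destination.
        return any(d == host and (i, o) == (0, 1) for (_, d, i, o) in fw.frwd)
    i1 = all(was_contacted(s) for (s, _, i, o) in fw.frwd if (i, o) == (1, 0))
    i2 = all(was_contacted(s) for (s, _, i, o) in fw.ft if (i, o) == (1, 0))
    i3 = all(was_contacted(h) for h in fw.trusted)
    return i1 and i2 and i3

def random_run(steps, seed):
    """Replay random events; return False as soon as the invariant breaks."""
    rng = random.Random(seed)
    fw = Firewall()
    for _ in range(steps):
        fw.event(rng.choice(HOSTS), rng.choice(HOSTS), rng.choice((0, 1)))
        if not invariant_holds(fw):
            return False
    return True

print(all(random_run(200, seed) for seed in range(20)))
```

Dropping the `src in self.trusted` check in `event` makes the first conjunct fail quickly, which mirrors the kind of counterexample a verifier would report for a buggy controller.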

Programs Proved

Program               Property

Firewall              Correct forwarding for a basic firewall abstraction

MigFirewall           Correct forwarding for a firewall supporting migration of “safe” hosts

Learning              Topology learning for a simple learning switch

Resonance             Access control for host authentication in enterprises

Stratos (Simplified)  Forwarding traffic through a sequence of middleboxes

Incorrect Programs

Program                                 CE #Host   CE #Switch

Auth-NoFlowRemoval                      3          2

Firewall-ForgotConsistency              5          3

Firewall-ForgotPortCheck                6          3

Firewall-ForgotTrustedInvariant         6          3

Learning-NoSend                         11         1

Resonance-StatesNotMutuallyExclusive    11         4

StatelessFireWall-AllowAll2to1Traffic   4          2

VeriCon: Challenges and Solutions

• Inductive invariants

– We describe a simple tool that infers inductive invariants for some SDN programs

– Iterative WP

– Future research: Abstract Interpretation, CEGAR

• SDN programs must be coded in a specific language (CSDN)

– VeriCon can be extended to support Java, Python, etc.

• SAT solver might not terminate!

– Many properties are in a sub-family of FOL (∃*∀*)

– … solver termination guaranteed!

• VeriCon assumes atomicity of events

– “Existing” solutions

– Future research: verify stronger properties

Summary

• SDN opens up an opportunity for applying formal verification to networks

• VeriCon is the first system to directly prove correctness of generic SDN programs at compile time

– for unbounded topologies, #packets, etc.


On the Complexity of Verifying Stateful Networks

A. Panda S. Shenker Y. Velner K. Alpernas A. Rabinovich S. Shoham

Topology Assumptions

• Finite set of hosts H

• Fixed set of middleboxes M

– Switches are degenerate middleboxes

• Fixed undirected topology E ⊆ (H × Pr × M) ∪ (M × Pr × Pr × M)

Packet Assumptions

• Finite set of packet types T

• Finite set of ports Pr per middlebox

• Finite set of packet headers (t, src, dst, pr): P = T × H × H × Pr

• No bound on the number of packets sent

• Many packets may be sent before a safety violation occurs

Middlebox Abstract Semantics

• The abstract semantics of each middlebox is a function

– m: P* × P → 2^P = P* → (P → 2^P)

– Packet bodies are unchanged

Common middleboxes

Middlebox: function m(h, p)

Switch: {p[out ↦ pr] | pr ∈ Pr − {p.ip}}

Firewall: if trusted(p, h) then {p[out ↦ pr] | pr ∈ Pr − {p.ip}} // forward else {} // drop

Learning Switch: if there exists pr0 ∈ Pr such that connected(p.dst, h, pr0) then {p[out ↦ pr0]} // forward else {p[out ↦ pr] | pr ∈ Pr, pr ≠ p.ip} // flood

IDS: if trusted(p, h) then {p[out ↦ pr] | pr ∈ Pr − {p.ip}} // forward else {} // drop

Cache Proxy: if avail(p.body, h, response) then {p[src ↦ me, dst ↦ p.src, body ↦ response]} else {p[src ↦ me]}

Modeling Middleboxes by FSMs

• A transducer <S, s0, P, λ, δ> where

– S are the states of the middlebox

– s0 ∈ S is the initial state

– λ: S × P → 2^P is the current forwarding behavior

– δ: S × P → 2^S is the next state

– Extend δ to histories: δ*([]) = {s0} and δ*(h · p) = δ(δ*(h), p)

• The transducer models m: P* × P → 2^P when for all h ∈ P* and p ∈ P:

– λ(δ*(h), p) = m(h, p)
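The "models" condition can be checked by brute force for a small firewall. This is a minimal sketch under stated assumptions: packets are triples (src, dst, port), the transducer is deterministic (a special case of the definition, which allows sets of next states), the state is the set of trusted hosts, and the names `forward`, `next_state`, `state_after`, and `m` are introduced here. It compares the transducer run on every short history against the history-based function.

```python
from itertools import product

S0 = frozenset()  # initial state: no trusted hosts yet

def forward(state, packet):
    """Forwarding behavior in a given state: set of output ports."""
    src, dst, port = packet
    if port == 1:
        return {2}
    if port == 2 and src in state:
        return {1}
    return set()

def next_state(state, packet):
    """Transition: a packet arriving on port 1 makes its destination trusted."""
    src, dst, port = packet
    return state | {dst} if port == 1 else state

def state_after(history):
    """Extension of the transition function to whole histories."""
    state = S0
    for packet in history:
        state = next_state(state, packet)
    return state

def m(history, packet):
    """The same firewall written directly as a function of the history."""
    trusted = {dst for (_, dst, port) in history if port == 1}
    src, dst, port = packet
    if port == 1:
        return {2}
    if port == 2 and src in trusted:
        return {1}
    return set()

PACKETS = list(product("ab", "ab", (1, 2)))  # two hypothetical hosts, two ports

def models(packets, max_len=3):
    return all(forward(state_after(h), p) == m(h, p)
               for n in range(max_len + 1)
               for h in product(packets, repeat=n)
               for p in packets)

print(models(PACKETS))
```

The finite state (a subset of hosts) summarizes everything about the unbounded history that the firewall's behavior depends on, which is what makes the finite-state hypothesis plausible for this middlebox.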

Partial FSM for Firewall

[Figure: part of the firewall FSM; transitions are labeled (Type, Source, Destination, Port)/{Forwarded Packets}; the state shown has Trusted = {2}]

The Safety Problem

• Given a fixed topology of middleboxes

• A finite state transducer for each of the middleboxes

• Prove that there exists no scenario of packet transmissions leading to a bad state

• Otherwise, identify such scenarios

Undecidability

• Checking safety properties such as isolation is undecidable even for finite state middleboxes

– Cycles in the topology allow counting

– Even in the absence of forwarding loops

Obtaining Decidability

• Show that if there is a scenario leading to a safety violation then there is also a bounded one

• Reduction to a decision procedure

Non-Deterministic Packet Handling

• Assumes that order of packet processing is arbitrary

• It may be that a packet p arrives before q and yet the middlebox processes q first

• If the network is safe under the non-deterministic assumption, it is also safe under the FIFO assumption

• May lead to false alarms

– Middlebox can impose orders based on acknowledgements

Decidability

• Under non-deterministic assumptions safety is decidable

• More packets per state means more forwarding options – Order is immaterial

– Terminating backward reachability

• Well Quasi-Order on Packet Multisets

• Reduction to Coverability in Petri Net – But complexity is high

• EXPSPACE-Complete

Middlebox classification

[Figure: middlebox classes Stateless, Increasing, Progressing, and Arbitrary, with the examples Switch, NAT, Learning Switch, Firewall, IDS, Cache, and Load Balancer placed among them]

Stateless Middleboxes

• Behavior independent of the history – Can maintain configuration information

• For all h, h′ ∈ P*:

– m(h) = m(h′)

– For all p ∈ P: m(h, p) = m(h′, p)

• Examples – Switches and Routers

– ACL Firewall

– Simple load-balancer

Increasing Middleboxes

• For every history, adding packets increases forwarding behavior

• For all h1, h2 ∈ P* and p, p′ ∈ P: m(h1:h2, p) ⊆ m(h1:p′:h2, p)

• Good examples – Stateless – Firewall

• Bad examples – Learning Switch – Cache
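The classification can be tried out by brute force, reading the condition as m(h1:h2, p) ⊆ m(h1:p′:h2, p). This is a minimal sketch with assumptions stated up front: packets are triples (src, dst, in_port), outputs are sets of output ports, the two middlebox functions are simplified models written for this example, and the search covers only short histories, so a `True` result is evidence rather than a proof.

```python
from itertools import product

def firewall(history, packet):
    """Port 1 is the trusted side; destinations contacted from it become trusted."""
    trusted = {dst for (_, dst, port) in history if port == 1}
    src, dst, port = packet
    if port == 1:
        return {2}
    if port == 2 and src in trusted:
        return {1}
    return set()

def learning_switch(history, packet, ports=(1, 2, 3)):
    """Forward to the learned port of dst, otherwise flood."""
    learned = {}
    for (src, _, port) in history:
        learned[src] = port  # remember where each source was last seen
    src, dst, port = packet
    if dst in learned:
        return {learned[dst]}
    return {q for q in ports if q != port}

def is_increasing(m, packets, max_len=2):
    """Check m(h, p) is a subset of m(h with one extra packet inserted, p)."""
    for n in range(max_len + 1):
        for h in product(packets, repeat=n):
            for cut in range(n + 1):
                for extra in packets:
                    longer = h[:cut] + (extra,) + h[cut:]
                    for p in packets:
                        if not m(h, p) <= m(longer, p):
                            return False
    return True

FW_PACKETS = list(product("ab", "ab", (1, 2)))
LS_PACKETS = list(product("ab", "ab", (1, 2, 3)))

print(is_increasing(firewall, FW_PACKETS))         # no violation found
print(is_increasing(learning_switch, LS_PACKETS))  # flooding shrinks once dst is learned
```

The learning switch fails because learning a destination replaces flooding with a single output port, so an extra packet in the history removes forwarding options instead of adding them.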


Abstract Middlebox Definition Language

• Powerful enough to express the behavior of interesting middleboxes

• Succinct

– Sometimes exponential state saving

• Simple enough for analysis

• Lends itself to classification of middleboxes

– Same worst case complexity

– But sometimes exponential saving

Firewall (AMDL)

firewall(self) =
  receive(p, prt)
    when prt = 1
      trusted_hosts.insert p.dst
      forward p to 2
    when prt = 2 and p.src ∈ trusted_hosts
      forward p to 1

Proxy (AMDL)

proxy(self) =
  receive(p, prt)
    when (p.type, response) ∈ cache
      // stored response
      forward response[src = self.host] to prt
    when (p.type, p.src, p.dst, rport) ∈ requested
      // first response
      cache.insert (p.type, p);
      forward p[src = self.host] to rport
    otherwise // new message
      requested.insert (p.type, p.src, p.dst, prt);
      forward p[src = self.host] to oprt
        forall oprt ∈ AllPrt and oprt != prt

Firewall vs. FSM

firewall(self) =
  receive(p, prt)
    when prt = 1
      trusted_hosts.insert p.dst
      forward p to 2
    when prt = 2 and p.src ∈ trusted_hosts
      forward p to 1

The MuteVer Toolset

[Figure: an AMDL spec passes through a front-end that produces either Datalog for LogicBlox or a Petri net for LoLA; the result is a proof or a counterexample]

Amazon EC2 Security Groups model

[Figure: a fat-tree of switches connecting tenants 1 to n; each tenant has public hosts (Public 1, Public 2) and private hosts (Private 1, Private 2)]

Query

• Q1: Can a packet from tenant 7 reach a private host of a faulty tenant, provided that the private host never sent a packet to tenant 7? (YES)

• Q2: Can a packet from tenant 7 reach a private host of tenant 2 (not faulty), provided that the private host never sent a packet to tenant 7? (NO)

Results (muZ)

[Figure: time per query in seconds (axis 0 to 70) against the number of tenants (axis 0 to 1200, 4 hosts per tenant); series: SAT (bug) and UNSAT (no bug)]

Summary

• Middlebox classification

• Complexity results

• Initial toolset

Checking Beliefs in Dynamic Networks

N. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, G. Varghese

Monitoring at Scale

Cloud

Explosion

A Cloud Harnessed by Logic/SE

Network Policies: Complexity, Challenge and Opportunity

Several devices, vendors, formats

• Net filters

• Firewalls

• Routers

Challenge in the field

• Do devices enforce policy?

• Ripple effect of policy changes

Arcane

• Low-level configuration files

• Mostly manual effort

• Kept working by “Masters of Complexity”

[Figure: Human Errors by Activity; categories: Config Changes, Device hw/sw updates, WA Cluster Setup; shares: 74%, 13%, 13%]

Human errors > 4 x DOS attacks

A Data-center Architecture

[Figure: a data-center network with a policy attached to each device]

Windows Azure Network Monitoring Infrastructure

[Figure: WANetmon, a StreamInsight Complex Event Processing (CEP) application; configuration streams from Azure and GNS edge network devices and a contract stream from the contract database feed SecGuru (ACL validation with a theorem prover) and device validation; results go to a reports database, alerts, and reporting]

SecGuru workflow

Access Control

Contract: DNS ports on DNS servers are accessible from tenant devices over both TCP and UDP.

Contract: The SSH ports on management devices are inaccessible from tenant devices.
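Contracts of this shape can be checked against an ACL by asking whether any packet in the contract's range gets the wrong verdict. The sketch below is an illustration only and does not reproduce SecGuru: it assumes first-match ACL semantics with default deny, ignores the TCP/UDP distinction, uses made-up address ranges, and enumerates packets exhaustively where the real tool uses a theorem prover, so it is feasible only for tiny ranges.

```python
import ipaddress

def permits(acl, src, dst, port):
    """First-match semantics; packets matching no rule are denied."""
    for action, src_net, dst_net, ports in acl:
        if src in src_net and dst in dst_net and port in ports:
            return action == "permit"
    return False

def violations(acl, src_net, dst_net, ports, expect_permit):
    """All (src, dst, port) in the contract range with the wrong verdict."""
    return [(s, d, p)
            for s in src_net for d in dst_net for p in ports
            if permits(acl, s, d, p) != expect_permit]

# Hypothetical address plan.
ANY = ipaddress.ip_network("0.0.0.0/0")
tenants = ipaddress.ip_network("10.0.0.0/30")
mgmt = ipaddress.ip_network("10.1.0.0/31")
dns = ipaddress.ip_network("10.2.0.0/31")

good_acl = [
    ("deny", tenants, mgmt, range(22, 23)),   # block SSH to management devices
    ("permit", ANY, ANY, range(0, 65536)),    # allow everything else
]
bad_acl = list(reversed(good_acl))            # permit-all now shadows the deny

print(violations(good_acl, tenants, mgmt, [22], expect_permit=False))
print(violations(good_acl, tenants, dns, [53], expect_permit=True))
print(len(violations(bad_acl, tenants, mgmt, [22], expect_permit=False)))
```

The reordered ACL contains the same rules and still violates the SSH contract, which is the kind of semantic error that a line-by-line review of configuration files tends to miss.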

SecGuru in WANetmon

• 40,000 ACL checks per month

• Each check 50-200ms

• 20 bugs/month (mostly for build-out)

SecGuru for GNS edge ACLs

[Figure: regression contracts and each Edge ACL are checked by SecGuru]

• Regression test suite + SecGuru check correctness of Edge ACL prior to deployment

• Several major Edge ACL pushes

• 2700+ to 1000 ACLs

• No major impact on any services

• Stable state

Policies as Logical Formulas

Combining semantics

Precise Semantics as formulas

Contracts/Policies

Semantic Diffs

Traditional low level of configuration that network managers use


Semantic Diffs

[Figure: semantic differences shown over the fields srcIp, srcPort, and dstIp]

Beyond Z3: a new idea to go from one violation to all violations

SecGuru contains an optimized algorithm for turning a single solution into all solutions (a product of ranges)