
ONOS: Open Network Operating System

An Experimental Open-Source Distributed SDN OS

Pankaj Berde, Umesh Krishnaswamy, Jonathan Hart, Masayoshi Kobayashi, Pavlin Radoslavov, Pingping Lin, Sean Corcoran, Tim Lindberg, Rachel Sverdlov, Suibin Zhang, William Snow, Guru Parulkar

Agenda

Overview of ONRC (Open Networking Research Center)

ONOS
• Architecture
• Scale-out and high availability
• Network graph as north-bound abstraction
• DEMO
• Consistency models
• Development and test environment
• Performance
• Next steps

Leadership

National Academy of Engineering members, ACM SIGCOMM Award winners, Fellows of IEEE and ACM, entrepreneurs; impact on the practice of networking and cloud.

Nick McKeown: Professor, Stanford; KP, Mayfield, Sequoia
Larry Peterson: Bob Kahn Professor, Princeton; Chief Architect, ON.LAB
Scott Shenker: Professor, UC Berkeley; Chief Scientist, ICSI

Stanford/Berkeley SDN Activities With Partners

Timeline figure, 2007-2011: Ethane, then demo, deployment, and platform development.
• OpenFlow spec: v0.8.9, v1.0, v1.1
• Reference switch: NetFPGA, software
• Network OS: NOX, SNAC, Beacon
• Virtualization: FlowVisor, FlowVisor (Java)
• Tools: test suite, oftrace, Mininet, measurement tools
• GENI software suite: Expedient / Opt-in Manager / FOAM

Deployments:
• Stanford University: ~45 switches/APs, ~25 users in the McKeown group; CIS/EE building production network
• US R&E community: GENI (8 universities + Internet2 + NLR), many other campuses
• Other countries: over 68 countries (Europe, Japan, China, Korea, Brazil, etc.)

Demos: SDN concept (Best Demo), VM migration (Best Demo), trans-Pacific VM migration, Baby GENI, nationwide GENI, "The OpenFlow Show" (IT World)
Venues: SIGCOMM08, GEC3, SIGCOMM09, GEC6, GEC9, Interop 2011 (+Broadcom)

Scaling of SDN Innovation

Standardize OpenFlow and promote SDN; ~100 members from all parts of the industry

Bring the best SDN content and facilitate high-quality dialogue; 3 successive sold-out events with participation of the ecosystem

Build a strong intellectual foundation; bring open-source SDN tools and platforms to the community

SDN Academy

Bring the best SDN training to companies to accelerate SDN development and adoption

ONRC Organizational Structure

Berkeley: Scott Shenker, Sylvia Ratnasamy
Stanford: Nick McKeown, Guru Parulkar, Sachin Katti
PhD students/postdocs: research

Open Network Lab (ON.LAB)
Exec Director: Guru Parulkar
VP Eng: Bill Snow
Chief Architect: Larry Peterson
16-19 engineers/tech leads (includes the PlanetLab team)
Tools/platforms for the SDN community
OpenCloud demonstration of XaaS and SDN

Mission

Bring innovation and openness to internet and cloud infrastructure with open-source tools and platforms

Tools & Platforms

Figure: apps run over open interfaces on a Network OS, with a network hypervisor between the Network OS and the forwarding plane; third-party components plug in alongside. Tools and platforms include:
• Network hypervisors: FlowVisor, OpenVirteX
• Mininet, Cluster Edition
• ONOS
• SDN-IP peering
• TestON with debugging support
• NetSight

Open Network OS (ONOS)

Architecture

Scale-out and high availability

Network graph as north-bound abstraction

DEMO

Consistency models

Development and test environment

Performance

Next steps

ONOS: Executive Summary

ONOS:
• Distributed Network OS
• Network Graph northbound abstraction
• Horizontally scalable
• Highly available
• Built using open-source components

Status (Version 0.1):
• Flow API, shortest-path computation, sample application
• Build & QA (Jenkins, sanity tests, perf/scale tests, CHO)
• Deployment in progress at REANNZ (SDN-IP peering)

Next:
• Exploring performance & reactive computation frameworks
• Expand the graph abstraction to more types of network state
• Control functions: intra-domain & inter-domain routing
• Example use cases: traffic engineering, dynamic virtual networks on demand, …

ONOS – Architecture Overview

Figure: control applications (routing, traffic engineering, mobility) run on top of the Network OS, which speaks OpenFlow to the data plane (packet-forwarding elements, programmable base stations) and exposes a global network view.

Open Network OS focus (started in Summer 2012): scale-out design, fault tolerance, global network view.

Prior Work

ONIX:
• Distributed control platform for large-scale networks
• Focus on reliability, scalability, and generality
• Scale-out NOS focused on network virtualization in data centers
• State distribution primitives, global network view, ONIX API

Other work:
• Helios (NEC), MidoNet (Midokura), HyperFlow, Maestro, Kandoo
• NOX, POX, Beacon, Floodlight, Trema controllers

The community needs an open-source distributed SDN OS.

ONOS High Level Architecture

Figure: three ONOS instances, each running an OpenFlow controller (Floodlight drivers), control the switches and hosts of the data plane. The instances share two distributed stores: the Network Graph (Titan graph DB on a Cassandra in-memory DHT, eventually consistent) and the Distributed Registry (ZooKeeper, strongly consistent).

Scale-out & HA

ONOS Scale-Out

Figure: a Distributed Network OS of multiple instances (Instance 1, 2, 3) shares the Network Graph (global network view) and controls the data plane.
• An instance is responsible for maintaining a part of the network graph.
• Control capacity can grow with network size or application need.

ONOS Control Plane Failover

Figure: the Distributed Network OS (Instances 1, 2, 3) and the Distributed Registry manage switches A-F and their attached hosts. Failover sequence for switch A:
• Initially: Master for switch A = ONOS 1; Candidates = ONOS 2, ONOS 3 (seen by all instances).
• ONOS 1 fails: Master = NONE; Candidates = ONOS 2, ONOS 3.
• After re-election: Master = ONOS 2; Candidates = ONOS 3.

Network Graph

Figure: the Network Graph is stored in the Titan graph DB on top of the Cassandra in-memory DHT. Vertices (e.g. Id: 1, 2, 3) and labeled edges (e.g. Id: 101-106) represent the network objects.

ONOS Network Graph Abstraction

Figure: a graph of switch, port, device, and host objects; ports are on switches, links connect ports, and hosts attach to the network as devices on ports.

Network state is naturally represented as a graph. The graph contains basic network objects such as switches, ports, devices, and links. An application writes to this graph and programs the data plane.
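To make the abstraction concrete, here is a minimal, self-contained Java sketch of the kind of objects the Network Graph holds (switches, ports, links, hosts) and of an application writing to it. The class and method names are illustrative stand-ins, not the actual ONOS API.

import java.util.*;

// Minimal illustration of the Network Graph object model; hypothetical types, not ONOS classes.
class NetworkGraph {
    // switch dpid -> set of port numbers on that switch
    final Map<String, Set<Integer>> switchPorts = new HashMap<>();
    // directed links between "dpid:port" endpoints
    final Map<String, String> links = new HashMap<>();
    // host MAC -> attachment point "dpid:port"
    final Map<String, String> hosts = new HashMap<>();

    void addSwitch(String dpid)              { switchPorts.putIfAbsent(dpid, new HashSet<>()); }
    void addPort(String dpid, int port)      { switchPorts.get(dpid).add(port); }
    void addLink(String sDpid, int sPort, String dDpid, int dPort) {
        links.put(sDpid + ":" + sPort, dDpid + ":" + dPort);
    }
    void addHost(String mac, String dpid, int port) { hosts.put(mac, dpid + ":" + port); }
}

public class GraphDemo {
    public static void main(String[] args) {
        NetworkGraph g = new NetworkGraph();
        // The switch/link/device managers (or an application) write to the graph...
        g.addSwitch("00:01"); g.addPort("00:01", 1); g.addPort("00:01", 2);
        g.addSwitch("00:02"); g.addPort("00:02", 1); g.addPort("00:02", 2);
        g.addLink("00:01", 2, "00:02", 1);
        g.addHost("aa:bb:cc:dd:ee:01", "00:01", 1);
        // ...and read the global view back when programming the data plane.
        System.out.println("switches: " + g.switchPorts.keySet());
        System.out.println("links: " + g.links);
    }
}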

Example: Path Computation App on Network Graph

Figure: a flow path is stored in the graph as a flow vertex connected to flow-entry vertices; each flow entry points to its switch and its in/out ports, alongside the underlying switch/port/device/host topology.

• The application computes a path by traversing the links from source to destination.
• The application writes each flow entry for the path.

Thus the path computation app does not need to worry about topology maintenance.
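A minimal sketch of the pattern described above: traverse links from the source switch to the destination switch (here a plain BFS over a hop-count topology), then emit one flow entry per switch on the path. Names and the flow-entry shape are illustrative, not the ONOS Flow API.

import java.util.*;

// Illustrative path computation over a switch-level topology; not the ONOS Flow API.
public class PathComputationDemo {
    public static void main(String[] args) {
        // Adjacency read from the Network Graph: switch dpid -> neighbouring dpids.
        Map<String, List<String>> topo = Map.of(
            "s1", List.of("s2", "s3"),
            "s2", List.of("s1", "s4"),
            "s3", List.of("s1", "s4"),
            "s4", List.of("s2", "s3"));

        List<String> path = shortestPath(topo, "s1", "s4");
        System.out.println("path: " + path);

        // One flow entry per switch along the path (match/actions elided).
        for (String dpid : path) {
            System.out.println("flow entry on " + dpid + " for flow path s1->s4");
        }
    }

    // BFS is enough for unweighted, hop-count shortest paths.
    static List<String> shortestPath(Map<String, List<String>> topo, String src, String dst) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(src, src);
        queue.add(src);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(dst)) break;
            for (String next : topo.getOrDefault(cur, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(dst)) return List.of();   // no path found
        LinkedList<String> path = new LinkedList<>();
        for (String v = dst; !v.equals(src); v = parent.get(v)) path.addFirst(v);
        path.addFirst(src);
        return path;
    }
}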

Example: A simpler abstraction on network graph?

Logical Crossbar

Figure: the physical network graph (switches, ports, links, devices, hosts) is abstracted as a single logical crossbar whose edge ports map to physical edge ports; virtual network objects map onto real network objects.
• An app or service on top of ONOS maintains the mapping from the simpler abstraction to the complex one.
• This makes applications even simpler and enables new abstractions.
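A sketch of what "maintains the mapping from simpler to complex" could look like in code: the logical crossbar exposes only edge ports, and a mapping layer translates a virtual edge-port pair into a physical path request against the real network graph. All names here are hypothetical.

import java.util.*;

// Hypothetical illustration of a logical-crossbar layer on top of the network graph.
public class LogicalCrossbarDemo {
    // Virtual edge port id -> physical attachment point "dpid:port".
    static final Map<String, String> EDGE_PORT_MAP = Map.of(
        "edge-1", "00:01:1",
        "edge-2", "00:04:3");

    // The app sees a single crossbar: "connect edge-1 to edge-2".
    static void connect(String srcEdge, String dstEdge) {
        String physSrc = EDGE_PORT_MAP.get(srcEdge);
        String physDst = EDGE_PORT_MAP.get(dstEdge);
        // The mapping layer turns this into a path computation plus flow entries
        // on the real network graph (elided here).
        System.out.println("request physical path " + physSrc + " -> " + physDst);
    }

    public static void main(String[] args) {
        connect("edge-1", "edge-2");
    }
}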

Network Graph Representation

Figure: a flow-path vertex (10 properties) connects to flow-entry vertices (11 properties) and switch vertices (3 properties). Each vertex is represented as a Cassandra row; its properties (e.g. dpid, state) and its edges are stored as columns of that row. An edge column's value carries the label id + direction, the primary key (edge id, vertex id, signature properties), and other properties. Row indices allow fast vertex-centric queries.
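A rough illustration of that layout, assuming the Titan-on-Cassandra scheme in the figure: each vertex is one row keyed by vertex id, and both properties and edges appear as columns of that row. This is a toy in-memory model for illustration only, not Titan's actual serialization format.

import java.util.*;

// Toy model of the vertex-as-row / edge-as-column layout; not Titan's real on-disk format.
public class VertexRowDemo {
    public static void main(String[] args) {
        // Row key = vertex id; columns = property columns plus edge columns.
        Map<String, Map<String, String>> rows = new HashMap<>();

        Map<String, String> switchRow = new LinkedHashMap<>();
        switchRow.put("prop:type", "switch");                 // property column
        switchRow.put("prop:dpid", "00:00:00:00:00:01");
        switchRow.put("prop:state", "ACTIVE");
        // Edge column: name encodes label + direction + edge id, value points to the other vertex.
        switchRow.put("edge:on:out:101", "targetVertex=2");   // switch --on--> port
        rows.put("1", switchRow);

        // A vertex-centric query ("which ports are on switch 1?") scans just that one row.
        rows.get("1").forEach((col, val) -> {
            if (col.startsWith("edge:on:")) System.out.println(col + " -> " + val);
        });
    }
}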

Network Graph: Switches

Figure: each ONOS instance runs a Switch Manager (SM) that talks OpenFlow to its switches and writes the switch and port objects into the Network Graph.

Network Graph: Links

Figure: each instance also runs Link Discovery (LD), which uses LLDP probes to discover links between switches and writes them into the Network Graph.
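A sketch of the link-discovery idea in the figure: the LD module sends an LLDP probe out each switch port and, when the probe arrives as a packet-in on another switch, records a link between the two ports in the graph. The types below are placeholders, not the Floodlight/ONOS link-discovery code.

import java.util.*;

// Illustrative LLDP-style link discovery; probe send/receive is simulated.
public class LinkDiscoveryDemo {
    record PortId(String dpid, int port) {}
    record Probe(PortId sentFrom) {}          // the LLDP payload carries the origin (dpid, port)

    static final Map<PortId, PortId> discoveredLinks = new HashMap<>();

    // Called when a probe comes back to the controller as a packet-in.
    static void onPacketIn(PortId receivedOn, Probe probe) {
        // The probe left probe.sentFrom() and arrived on receivedOn: that is a link.
        discoveredLinks.put(probe.sentFrom(), receivedOn);    // write the link into the Network Graph
    }

    public static void main(String[] args) {
        PortId s1p2 = new PortId("00:01", 2);
        PortId s2p1 = new PortId("00:02", 1);
        // LD emits a probe out switch s1 port 2; the wire delivers it to s2 port 1,
        // and s2 punts it to the controller as a packet-in.
        onPacketIn(s2p1, new Probe(s1p2));
        System.out.println("links: " + discoveredLinks);
    }
}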

Network Graph: Devices

Figure: on top of SM and LD, each instance runs a Device Manager (DM) that learns hosts from packet-in events and writes device objects into the Network Graph.

Network Graph: Flow Paths

Figure: each instance runs a Path Computation (PC) module on top of SM, LD, and DM. Computed flow paths (Flow 1-8) and their per-switch flow entries are written into the Network Graph.

Network Graph: Flows

Figure: each instance runs a Flow Manager (FM) on top of PC, DM, LD, and SM. The Flow Manager reads the flow paths and flow entries (Flow 1-8) from the Network Graph and programs its switches with flow-mods.
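A sketch of the Flow Manager behaviour in the figure: read the flow entries from the Network Graph and push a flow-mod for each entry whose switch this instance currently masters. The types and the flow-mod sender are placeholders, not Floodlight or ONOS APIs.

import java.util.*;

// Illustrative Flow Manager loop; the flow-mod sender is a placeholder.
public class FlowManagerDemo {
    record FlowEntry(String flowId, String dpid, String match, String actions) {}

    public static void main(String[] args) {
        String myInstance = "ONOS-1";
        // Mastership, as read from the distributed registry: dpid -> master instance.
        Map<String, String> master = Map.of("00:01", "ONOS-1", "00:02", "ONOS-2");
        // Flow entries, as read from the Network Graph.
        List<FlowEntry> entries = List.of(
            new FlowEntry("flow-1", "00:01", "in_port=1", "output=2"),
            new FlowEntry("flow-1", "00:02", "in_port=1", "output=3"));

        for (FlowEntry e : entries) {
            // Only program switches this instance masters; other instances handle the rest.
            if (myInstance.equals(master.get(e.dpid()))) {
                sendFlowMod(e);
            }
        }
    }

    static void sendFlowMod(FlowEntry e) {
        // Placeholder for the OpenFlow FLOW_MOD sent via the controller drivers.
        System.out.println("FLOW_MOD to " + e.dpid() + ": match=" + e.match() + " actions=" + e.actions());
    }
}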

ONOS High Level Architecture (recap)

Figure: three ONOS instances with OpenFlow controllers (Floodlight drivers) share the eventually consistent Network Graph (Titan on a Cassandra in-memory DHT) and the strongly consistent Distributed Registry (ZooKeeper).

DEMO

Consistency Deep Dive

Consistency Definition

• Strong consistency: upon an update to the network state by an instance, all subsequent reads by any instance return the last updated value.
• Strong consistency adds complexity and latency to distributed data management.
• Eventual consistency is a slight relaxation, allowing readers to be behind for a short period of time.

Strong Consistency using Registry

Figure: master election for switch A across the Distributed Network OS (Instances 1, 2, 3), the Network Graph, and the Registry.

Timeline:
• Initially the Registry holds Switch A Master = NONE, and all instances see Master = NONE.
• A master is elected for switch A: the Registry records Switch A Master = ONOS 1.
• After the delay of locking & consensus, Instances 1, 2, and 3 all see Switch A Master = ONOS 1.

Why Strong Consistency is needed for Master Election

Weaker consistency might mean that a master election performed on instance 1 is not yet visible on the other instances. That can lead to multiple masters for a switch, and multiple masters would break our semantics of control isolation. A strong locking semantic is therefore needed for master election.
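One common way to get this strong locking semantic is leader election on top of ZooKeeper. The sketch below uses Apache Curator's LeaderLatch recipe as an assumed example of a per-switch election; ONOS's actual registry code may use a different recipe.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Sketch: per-switch master election via ZooKeeper, using Curator's LeaderLatch recipe.
// Only one instance can hold the latch for /onos/masters/<dpid> at a time, so two
// controllers can never both believe they are master for the same switch.
public class MasterElectionSketch {
    public static void main(String[] args) throws Exception {
        String instanceId = "ONOS-1";
        String dpid = "00:00:00:00:00:0a";

        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        LeaderLatch latch = new LeaderLatch(zk, "/onos/masters/" + dpid, instanceId);
        latch.start();   // joins the election; ZooKeeper serializes the candidates
        latch.await();   // blocks until this instance becomes master for the switch

        if (latch.hasLeadership()) {
            System.out.println(instanceId + " is master for switch " + dpid);
            // ... take control of the switch; if this instance dies, its ephemeral node
            // vanishes and one of the remaining candidates is elected.
        }
    }
}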

Eventual Consistency in Network Graph

Figure: switch A connects to ONOS instance 1, which writes Switch A STATE = ACTIVE into the DHT.

Timeline:
• Initially all instances see Switch A STATE = INACTIVE.
• After the switch connects, instance 1 sees Switch A = ACTIVE while instances 2 and 3 still see INACTIVE.
• After the delay of eventual consensus, all instances see Switch A STATE = ACTIVE.

Cost of Eventual Consistency

A short delay means that switch A's state is not yet ACTIVE on some ONOS instances in the previous example. Applications on one instance will compute flows through switch A while the other instances will not yet use switch A for path computation. Eventual consistency becomes more visible during control-plane network congestion.
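A tiny sketch of the trade-off being described: with an eventually consistent store, each instance may briefly read a stale switch state, and one common way such stores converge is last-write-wins on a timestamp. This is a toy model for illustration, not Cassandra's implementation.

import java.util.*;

// Toy last-write-wins replica model to illustrate the eventual-consistency window.
public class EventualConsistencyDemo {
    record Versioned(String value, long timestamp) {}

    // One replica of the switch-state map per ONOS instance.
    static List<Map<String, Versioned>> replicas =
            List.of(new HashMap<>(), new HashMap<>(), new HashMap<>());

    public static void main(String[] args) {
        // All instances start with switch A INACTIVE.
        for (Map<String, Versioned> r : replicas) r.put("switchA", new Versioned("INACTIVE", 1));

        // Switch A connects to instance 1, which writes ACTIVE locally first.
        replicas.get(0).put("switchA", new Versioned("ACTIVE", 2));
        System.out.println("before convergence: " + read());   // instances 2 and 3 are still stale

        // Anti-entropy: replicas exchange entries and keep the highest timestamp.
        Versioned newest = replicas.stream().map(r -> r.get("switchA"))
                .max(Comparator.comparingLong(Versioned::timestamp)).orElseThrow();
        for (Map<String, Versioned> r : replicas) r.put("switchA", newest);
        System.out.println("after convergence:  " + read());
    }

    static List<String> read() {
        return replicas.stream().map(r -> r.get("switchA").value()).toList();
    }
}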

Why is Eventual Consistency good enough for Network State?

• The physical network state changes asynchronously.
• Strong consistency across the data and control planes is too hard.
• Control apps know how to deal with eventual consistency: in the current distributed control plane, each router makes its own decisions based on old information from other parts of the network, and it works fine.
• Strong consistency is more likely to lead to an inaccurate view of network state, since control-plane congestion is real.

Consistency learning

• One consistency model does not fit all.
• The consequences of delays need to be well understood.
• More research needs to be done on different kinds of state using different consistency models.

Development & test environment

ONOS Development & Test cycle

• Source code on GitHub
• Agile: 3-4 week sprints
• Mostly Java, plus many utility scripts
• CI: Maven, Jenkins, JUnit, coverage, TestON
• Vagrant-based development VM
• Daily 4-hour Continuous Hours of Operation (CHO) tests as part of the build
• Several CHO cycles simulating rapid churn in the network and failures of ONOS instances
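A toy sketch of what a CHO churn cycle could look like: repeatedly inject network events and ONOS-instance failures, then check invariants. The actions here just print; the real tests drive Mininet/TestON, and the event names are illustrative.

import java.util.*;

// Toy churn loop in the spirit of the CHO tests; actions are simulated with prints.
public class ChoChurnSketch {
    static final List<String> EVENTS = List.of(
            "link down", "link up", "switch down", "switch up",
            "kill ONOS instance", "restart ONOS instance");

    public static void main(String[] args) throws InterruptedException {
        Random rnd = new Random(42);
        for (int cycle = 0; cycle < 10; cycle++) {           // real CHO runs for hours
            String event = EVENTS.get(rnd.nextInt(EVENTS.size()));
            System.out.println("cycle " + cycle + ": inject " + event);
            Thread.sleep(100);                                // let the system converge
            System.out.println("cycle " + cycle + ": verify topology, flows, mastership");
        }
    }
}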

ONOS Development Environment

A single installation script creates a cluster of VirtualBox VMs.

Test Lab Topology

ON.LAB ONOS Test Implementation

The ON.LAB team has implemented the following automated tests:
• ONOS unit tests (70% coverage)
• ONOS system tests for functionality, scale, performance, and resiliency (85% coverage)
• White-box Network Graph performance measurements
All tests are executed nightly in the Jenkins continuous integration environment.

Performance

Key performance metrics in a Network OS

Network scale (# switches, # ports) -> delay and throughput:
• Link failure, switch failure, switch port failure
• Packet_in (requests for setting up reactive flows)
• Reading and searching the network graph
• Network graph traversals
• Setup of proactive flows

Application scale (# operations, # applications):
• Number of network events propagated to applications (delay & throughput)
• Number of operations on the Network Graph (delay & throughput)
• Parallelism/threading for applications (parallelism on the Network Graph)
• Parallel path computation performance

Performance: Hard Problems

• Off-the-shelf open source does not perform well enough.
• Ultra-low-latency requirements are unique.
• Distributed/parallel programming techniques are needed to scale control applications.
• Reactive control applications need an event-driven framework that scales.

ONOS: Summary

ONOS:
• Distributed Network OS
• Network Graph northbound abstraction
• Horizontally scalable
• Highly available
• Built using open-source components

Status (Version 0.1):
• Flow API, shortest-path computation, sample application
• Build & QA (Jenkins, sanity tests, perf/scale tests, CHO)
• Deployment in progress at REANNZ (SDN-IP peering)

Next:
• Exploring performance & reactive computation frameworks
• Expand the graph abstraction to more types of network state
• Control functions: intra-domain & inter-domain routing
• Example use cases: traffic engineering, dynamic virtual networks on demand, …

www.onlab.us
