SDN: Google's B4 and Traffic Engineering · 2016-04-04
Outline
1 B4: Experience with a Globally-Deployed Software Defined WAN
2 Achieving high utilization with software-driven WAN
Introduction
Modern WANs are critical to performance and reliability.
Typically provisioned to 30-40% average utilization (2-3x bandwidth cost from over-provisioning).
This overhead comes on top of an already high bandwidth requirement.
Introduction
Google’s WAN is one of the largest in the Internet.
It delivers a range of services such as search, video, and cloud computing.
Architecturally, it is two distinct WANs:
1 User-facing network: peers with other Internet domains and carries user traffic.
2 B4:
◮ Connectivity between data centers.
◮ 90% of internal traffic runs on this network,
e.g. asynchronous data copies, end-user data replication, etc.
Why two different WANs? Different requirements (e.g. priority, latency, etc.)
Internet traffic continues to grow rapidly, but Google’s WAN traffic grows even faster.
Introduction
SDN approach for the data center (DC) WAN interconnect.
Motivation:
◮ Deploy routing and TE protocols customized to Google’s unique requirements.
Design goals:
◮ Treat failures as common events.
◮ Switches provide a programmatic interface under central control.
Introduction
Why an SDN-based solution?
Limitations of traditional WAN architectures.
Elastic bandwidth demands: the majority of traffic is tolerant to transient failures.
Moderate number of sites: a few dozen data centers.
End application control: control the network at every level with more flexibility, thus reducing over-provisioning of resources.
Cost sensitivity: nearly impossible to match the growing demand with traditional approaches.
Others include the success of SDN and OpenFlow (OF), rapid iteration on novel protocols, improved capacity planning, scalability, flexibility, etc.
Introduction
Manage switches using SDN principles.
SDN application: supports standard routing protocols plus a centralized TE service.
◮ Edge servers arbitrate among competing demands based on resource availability.
◮ Use multipath forwarding based on application priority.
◮ Dynamically reallocate bandwidth on link/switch failures.
This allows B4 to achieve:
◮ near 100% utilization on many B4 links
◮ 70% average utilization across all links
(i.e. 2-3x efficiency improvement vs. standard practice)
Design - Overview
Logically, a three-layer architecture.
Switch hardware layer - the B4 WAN consists of multiple sites; within each site, the switch hardware layer forwards traffic.
Site controller layer - consists of Network Control Servers (NCS) hosting both OpenFlow Controllers (OFC) and Network Control Applications (NCAs).
- OFC maintains network state based on NCA directives
- Paxos for fault tolerance of individual servers
Global layer - logically centralized applications such as the SDN Gateway and the central TE server.
- enables central control of the entire network
- SDN Gateway provides abstractions to the TE server
Design - Overview
Options for integrating existing routing protocols with centralized traffic engineering:
Approach 1: Build one integrated, centralized service combining both routing and TE
Approach 2: Build routing and centralized TE as separate, independent services
Which one would you prefer?
Design - Overview
Approach 2: Build routing and centralized TE as separate, independent services.
Why?
Focus initial work on SDN infrastructure development.
Debug the SDN architecture before adding new features.
The TE layer sits on top of the routing protocols.
BIG RED BUTTON to disable TE (falls back to shortest-path forwarding).
Design - Switch Design
Conventional WAN switch design needs deep buffers, large forwarding tables, and hardware support for high availability (HA).
For B4, Google avoids these requirements by:
◮ adjusting transmission rates through careful endpoint management
◮ having a modest number of DCs + abstraction = smaller forwarding tables
◮ moving software functionality from switches to upper layers
Need for custom switches:
Switches that could export low-level control over switch forwarding behavior
Design - Switch Design
High-radix switch - deploying fewer, larger switches ⇒ easier management and software scalability
B4 switches - use multiple merchant silicon switch chips in a two-stage Clos topology
Figure: High-radix switch
Design - Network Control Functionality
Most control functionality runs on NCS.
Paxos handles leader election for all control functionality:
◮ Failure detection
◮ New leader election
Modified Onix for the OFC:
◮ OFC maintains the Network Information Base (NIB),
e.g. topology info, trunk configs, link status, etc.
Design - Routing
How to integrate OpenFlow-based switches with existing routing protocols?
Google chose the Quagga stack for BGP/IS-IS on NCS.
Developed an SDN application called the "Routing Application Proxy (RAP)".
RAP provides connectivity between Quagga and OF switches for:
◮ BGP/IS-IS route updates
◮ routing-protocol packets flowing between switches and Quagga
◮ interface updates from the switches to Quagga
Traffic Engineering
Goal: share bandwidth among competing applications/flow groups
Objective function: max-min fair allocation
Traffic Engineering
Notions
Network Topology: a graph that represents sites as vertices and site-to-site connectivity as edges.
Flow Group (FG): applications are aggregated into flow groups, each defined as a {source site, dest site, QoS} tuple.
Tunnel (T): a site-level path in the network, e.g. a sequence of sites (A ⇒ B ⇒ C).
Tunnel Group (TG): maps an FG to a set of tunnels (T) and corresponding weights.
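These notions map naturally onto small record types. The sketch below is illustrative only: the class and field names (`FlowGroup`, `Tunnel`, `TunnelGroup`, `splits`) and the `validate` check are assumptions made for this example, not identifiers from B4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowGroup:
    """A {source site, dest site, QoS} tuple (field names are hypothetical)."""
    src_site: str
    dst_site: str
    qos: str


@dataclass(frozen=True)
class Tunnel:
    """A site-level path, e.g. ("A", "B", "C")."""
    sites: tuple


@dataclass
class TunnelGroup:
    """Maps one flow group to a set of tunnels and their split weights."""
    flow_group: FlowGroup
    splits: dict  # Tunnel -> weight (fraction of the FG's traffic)

    def validate(self):
        # Every tunnel must start and end at the flow group's sites.
        for tunnel in self.splits:
            if (tunnel.sites[0] != self.flow_group.src_site
                    or tunnel.sites[-1] != self.flow_group.dst_site):
                raise ValueError(f"tunnel {tunnel.sites} does not match the flow group")
        # Split weights must cover all of the flow group's traffic.
        if abs(sum(self.splits.values()) - 1.0) > 1e-9:
            raise ValueError("split weights must sum to 1")


fg = FlowGroup("A", "C", "bulk")
tg = TunnelGroup(fg, {Tunnel(("A", "B", "C")): 0.75, Tunnel(("A", "C")): 0.25})
tg.validate()
```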
Traffic Engineering
Figure: Overview of Traffic Engineering
TE - Bandwidth Functions
Associate a bandwidth function with every application.
Admin-specified static weights determine the slope of each function.
Allocate bandwidth based on each flow’s relative priority (its fair share).
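One way to picture a bandwidth function is as a line whose slope is the application's weight, capped at the application's demand. The sketch below assumes exactly that shape; the function names and the (weight, demand) values are hypothetical and chosen only for illustration.

```python
def app_bandwidth(fair_share, weight, demand):
    """Bandwidth for one application at a given fair share.

    Assumed shape: grows linearly with slope `weight`, then flattens at `demand`.
    """
    return min(weight * fair_share, demand)


def fg_bandwidth(fair_share, apps):
    """Bandwidth function of a flow group: the sum over its applications."""
    return sum(app_bandwidth(fair_share, w, d) for w, d in apps)


# Hypothetical applications: (weight, demand in Gbps).
apps = [(10.0, 15.0), (1.0, 5.0)]
for share in (0.5, 1.0, 1.5, 5.0):
    print(share, fg_bandwidth(share, apps))
```

At equal fair share the high-weight application receives ten times the bandwidth of the low-weight one until its demand is met, which is how relative priority shows up in the allocation.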
TE - Max-Min Fair Allocation
Formal definition:
Resources are allocated to sources in order of increasing demand.
No source gets a resource share larger than its demand.
Sources with unsatisfied demands get an equal share of the resource.
Source: S. Keshav, An Engineering Approach to Computer Networking, Addison-Wesley, Reading, MA, 1997, pp. 215-217.
TE - Max-Min Fair Allocation
Figure: Example of Max-Min Fair Allocation
1 Assign (10 Mbps / 4 flows) = 2.5 Mbps per flow
2 Sum the over-assigned amount (residual): flow 1 is over-assigned by 0.5 Mbps
3 Assign (residual / no. of under-assigned flows) to each under-assigned flow: 0.5/3 ≈ 0.167 Mbps
4 Repeat steps 2 and 3 with the new residual until no residual is left or every demand is satisfied
Final assignment: Flow 1 = 2 Mbps, Flow 2 = 2.6 Mbps, Flow 3 = 2.7 Mbps, Flow 4 = 2.7 Mbps
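The redistribution steps can be sketched in a few lines of Python. The figure with the per-flow demands is not reproduced in this transcript, so the demands used here (2, 2.6, 4 and 5 Mbps) are an assumption, chosen because they are consistent with the final assignment listed on the slide. The function name is also hypothetical.

```python
def max_min_fair(capacity, demands, eps=1e-9):
    """Max-min fair allocation by repeatedly redistributing the residual."""
    alloc = [0.0] * len(demands)
    unsatisfied = list(range(len(demands)))
    residual = float(capacity)
    while unsatisfied and residual > eps:
        # Split the residual equally among flows that still want more.
        share = residual / len(unsatisfied)
        residual = 0.0
        for i in unsatisfied:
            alloc[i] += share
            # Collect whatever was assigned beyond the flow's demand.
            if alloc[i] > demands[i]:
                residual += alloc[i] - demands[i]
                alloc[i] = demands[i]
        unsatisfied = [i for i in unsatisfied if alloc[i] < demands[i] - eps]
    return alloc


# Assumed demands in Mbps (see the note above); capacity is 10 Mbps.
result = max_min_fair(10, [2, 2.6, 4, 5])
print([round(x, 3) for x in result])
```

With these assumed demands the loop runs three rounds: the first frees 0.5 Mbps from flow 1, the second frees a further 0.067 Mbps from flow 2, and the third splits that between flows 3 and 4.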
TE - Weighted Max-Min Fair Allocation
Figure: Example of Weighted Max-Min Fair Allocation
1 Normalize weights (so that the smallest weight is 1): W = [5, 8, 1, 2]
2 Unit share = (total resource / sum of normalized weights) = 16/16 = 1
3 Assign every flow [unit share × normalized weight of flow] units of resource
4 Calculate the over-assigned resources and repeat steps 1, 2, 3, and 4 with this residual
Final assignment: Flow 1 = 4 Mbps, Flow 2 = 2 Mbps, Flow 3 = 4 Mbps, Flow 4 = 6 Mbps
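A weighted variant can be sketched as a water-filling loop: compute the unit share over the flows that are still unsatisfied, cap any flow whose demand fits within its weighted share, and repeat with what remains. The weights [5, 8, 1, 2] and the capacity of 16 come from the slide; the demands (4, 2, 4 and 6 Mbps) are hypothetical, picked only because they agree with the final assignment above, since the figure itself is not part of this transcript.

```python
def weighted_max_min(capacity, demands, weights, eps=1e-9):
    """Weighted max-min fair allocation (water-filling form)."""
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = float(capacity)
    while active and remaining > eps:
        unit = remaining / sum(weights[i] for i in active)
        # Flows whose outstanding demand fits inside their weighted share.
        capped = [i for i in active
                  if demands[i] - alloc[i] <= unit * weights[i] + eps]
        if not capped:
            # Nobody can be fully satisfied: hand out the weighted shares and stop.
            for i in active:
                alloc[i] += unit * weights[i]
            break
        for i in capped:
            remaining -= demands[i] - alloc[i]
            alloc[i] = float(demands[i])
            active.discard(i)
    return alloc


# Capacity and weights from the slide; demands are an assumption.
print(weighted_max_min(16, [4, 2, 4, 6], [5, 8, 1, 2]))
```

Setting every weight to 1 reduces this to the unweighted max-min allocation.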
TE - Optimization
An LP-based optimal solution for allocating fair share among FGs is expensive and does not scale well.
The B4 team designed their own algorithm, which achieves at least 99% of the bandwidth utilization of the LP solution while running 25 times faster.
Two main components:
1 Tunnel Group Generation: allocates bandwidth to FGs using bandwidth functions to prioritize at bottleneck edges.
2 Tunnel Group Quantization: changes split ratios in each TG to match the granularity supported by switch hardware tables.
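To make the quantization step concrete, the sketch below rounds a set of split ratios to multiples of a fixed quantum using largest-remainder rounding. This is a stand-in technique chosen for simplicity, not the algorithm B4 uses; the function name and the example ratios are hypothetical.

```python
import math
from fractions import Fraction


def quantize_splits(weights, units):
    """Round split weights to multiples of 1/units that still sum to 1.

    Largest-remainder rounding: floor every share, then give the leftover
    quanta to the tunnels with the largest fractional parts.
    """
    total = sum(weights)
    exact = [w / total * units for w in weights]
    quanta = [math.floor(x + 1e-9) for x in exact]
    leftover = units - sum(quanta)
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: exact[i] - quanta[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        quanta[i] += 1
    return [Fraction(q, units) for q in quanta]


# Hypothetical ideal split 0.5 : 0.3 : 0.2, quantum 1/4.
print(quantize_splits([0.5, 0.3, 0.2], 4))
```

A coarser quantum needs fewer hardware table entries per tunnel group but moves the installed split further from the ideal one.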
TE Protocol & OF - TE State and OpenFlow
Three modes of B4 switch:
1 Encapsulating switch
2 Transit switch
3 Decapsulating switch
TE Protocol & OF - TE State and OpenFlow
The source switch maps packets to an FG using <dest ip> and forwards them to the corresponding TG.
The TG hashes each packet to a tunnel (T) in the desired ratio.
Each site along the path maintains per-tunnel forwarding rules.
The source site encapsulates each packet with an outer header (i.e. tunnel ID).
A transit switch uses the tunnel ID to match rules and forward the packet.
The decapsulating switch terminates the tunnel based on the tunnel ID.
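Hashing packets to tunnels "in the desired ratio" can be sketched as weighted bucket selection: hash a flow key into a fixed number of buckets and assign each tunnel as many buckets as it has quanta. The flow-key format, the use of SHA-256, and the function name are assumptions for illustration; switch hardware uses its own hash functions.

```python
import hashlib


def pick_tunnel(flow_key, tunnels):
    """Pick a tunnel for a flow.

    tunnels: list of (tunnel_id, quanta) pairs; a tunnel with 3 quanta out
    of 4 receives roughly 3/4 of the flows.
    """
    total = sum(quanta for _, quanta in tunnels)
    digest = hashlib.sha256(flow_key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for tunnel_id, quanta in tunnels:
        if bucket < quanta:
            return tunnel_id
        bucket -= quanta
    raise ValueError("tunnels must carry at least one quantum in total")


# Hypothetical tunnel group with a 3:1 split.
tunnels = [("T1", 3), ("T2", 1)]
print(pick_tunnel("10.0.0.1:1234->10.1.0.1:80/tcp", tunnels))
```

Because the choice depends only on the flow key, all packets of one flow follow the same tunnel, which avoids reordering within a flow.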
TE Protocol & OF - Composing Routing and TE
B4 supports two routing services:
1 Shortest-path routing (uses the Longest Prefix Match, LPM, table)
2 TE (uses the Access Control List, ACL, table)
Map different flows and groups to the appropriate tables.
ACL entries take strict precedence over LPM entries.
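The precedence rule can be illustrated with a two-table lookup: consult the ACL (TE) entries first and fall back to longest-prefix match only when no ACL entry applies. The table layouts, prefixes, and action names below are hypothetical.

```python
import ipaddress


def lookup(dst, acl, lpm):
    """Return ("TE", action) if an ACL entry matches, else the LPM result."""
    addr = ipaddress.ip_address(dst)
    # ACL entries take strict precedence, regardless of prefix length.
    for prefix, action in acl:
        if addr in ipaddress.ip_network(prefix):
            return ("TE", action)
    # Otherwise use the most specific shortest-path route.
    best = None
    for prefix, next_hop in lpm:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return ("LPM", best[1]) if best else None


# Hypothetical tables.
acl = [("10.2.0.0/16", "tunnel-group-7")]
lpm = [("10.0.0.0/8", "port-1"), ("10.2.3.0/24", "port-2")]
print(lookup("10.2.3.4", acl, lpm))  # TE wins even though the LPM route is more specific
print(lookup("10.2.3.4", [], lpm))   # with no TE entries, shortest-path routing applies
```

Clearing the ACL entries leaves the LPM table untouched, which is what makes disabling TE a safe fallback to shortest-path forwarding.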
TE Protocol & OF - Coordinating TE State Across Sites
Figure: Overview of Traffic Engineering
TE Protocol & OF - Coordinating TE State Across Sites
The TE server coordinates T/TG/FG rule installation across multiple OFCs.
TED - the Traffic Engineering Database captures the state needed to forward packets along multiple paths.
TED - a <key, value> data store.
The TE server computes a per-site TED and generates TE Ops to the OFCs.
Each TE Op adds, modifies, or deletes TED entries at an OFC.
Each OFC converts TE Ops into flow-programming instructions and sends them to all devices in its site.
Finally, the OFC responds to the original TE Op.
TE Protocol & OF - Dependencies and Failures
Dependencies among Ops:
◮ to avoid packet drops, not all Ops can be issued simultaneously,
e.g. configure a T at all sites before configuring the TG/FG that uses it
Synchronizing TED between TE and OFC:
◮ requires a common TED view
◮ a TE session supports this synchronization
◮ TE synchronizes the TED with persistent storage to handle simultaneous failures
Ordering issues:
◮ site-specific sequence IDs are assigned to TE Ops
◮ enables ordering among operations
TE Op failures:
◮ due to RPC failure, OFC rejection, etc.
◮ dirty/clean bit for each TED entry
◮ enables resuming TE Ops from the point of failure
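The interplay of sequence IDs and dirty bits can be sketched with a toy model. Everything here is a simplification under stated assumptions: the class names (`TEOp`, `SiteOFC`, `TEServerView`), the rule that an OFC rejects any Op whose sequence ID is not higher than the last one it applied, and the rule that an entry stays dirty until the OFC acknowledges the Op are illustrative choices, not B4's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class TEOp:
    seq: int            # site-specific sequence ID
    action: str         # "add", "modify" or "delete"
    key: str
    value: object = None


@dataclass
class SiteOFC:
    """Per-site TED held by an OFC (a plain key-value store here)."""
    ted: dict = field(default_factory=dict)
    last_seq: int = 0

    def apply(self, op):
        # Assumed rule: reject stale or reordered Ops.
        if op.seq <= self.last_seq:
            return False
        if op.action in ("add", "modify"):
            self.ted[op.key] = op.value
        elif op.action == "delete":
            self.ted.pop(op.key, None)
        else:
            raise ValueError(f"unknown action {op.action!r}")
        self.last_seq = op.seq
        return True


class TEServerView:
    """TE server's view of one site's TED, with a dirty bit per entry."""

    def __init__(self):
        self.entries = {}  # key -> (value, dirty)

    def issue(self, ofc, op):
        self.entries[op.key] = (op.value, True)       # dirty until acknowledged
        if ofc.apply(op):
            self.entries[op.key] = (op.value, False)  # acknowledged: clean

    def dirty_keys(self):
        return [k for k, (_, dirty) in self.entries.items() if dirty]


ofc, te = SiteOFC(), TEServerView()
te.issue(ofc, TEOp(1, "add", "T1", ("A", "B", "C")))
te.issue(ofc, TEOp(1, "modify", "T1", ("A", "C")))  # stale sequence ID, rejected
print(ofc.ted, te.dirty_keys())
```

After a failure, the entries still marked dirty are exactly the ones whose Ops need to be reissued, which is what allows work to resume from the point of failure rather than from scratch.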
Evaluation - Deployment and Evolution
Network traffic doubled in the year 2012
Evaluation - Deployment and Evolution
Observations:
1 Topology aggregation significantly reduces path churn and system load.
2 Edge removals happen multiple times a day.
3 WAN links are susceptible to frequent port flaps and benefit from dynamic centralized management.
Evaluation - TE Ops Performance
100x reduction in the number of TE Ops by caching recently used tunnels.
Reduction in failed Ops.
Reduced latency.
Evaluation - TE Ops Performance
Notes:
TG Ops run for every topology change or change in demand
Growth in no. of TG Ops due to addition of network sites
Reduction in failure of TG Ops due to optimizations
Evaluation - Impact of Failures
Figure: Impact of failure between two sites
Failure of a transit router requires a longer convergence time (≈ 3.3 sec):
◮ multipath table entries must be updated for potentially several tunnels
◮ each update Op is slow
Evaluation - TE Algorithm Evaluation
Throughput improves as the number of paths increases.
Adding more paths and using finer-granularity traffic splitting gives TE more flexibility, but consumes more hardware table resources.
B4's deployment uses TE with a quantum of 1/4 and 4 paths.
Evaluation - Link Utilization
Utilization close to 100%
Ability to mix priority classes across all edges
Use separate edges for different classes
Evaluation - Link Utilization
Figure: Per-link utilization in a trunk, demonstrating the effectiveness of hashing
For at least 75% of site-to-site edges, the max:min ratio of link utilization is:
◮ 1.05 without failures (i.e. 5% from optimal)
◮ 2.0 with failures
Conclusion
B4 now serves more traffic than Google’s public-facing WAN, with a higher growth rate.
SDN delivered cost-effective WAN bandwidth, running many links at near 100% utilization.
A hybrid approach is an effective way to introduce SDN into existing deployments.
Leveraging control at the edge increases WAN utilization and improves fault tolerance.