running bgp in data centers at scale - usenix

52
Running BGP in Data Centers at Scale Anubhavnidhi Abhashkumar, Kausik Subramanian, Alexey Andreyev, Hyojeong Kim , Nanda Kishore Salem, Jingyi Yang, Petr Lapukhov, Aditya Akella, Hongyi Zeng University of Wisconsin – Madison, Facebook April 2021

Upload: others

Post on 07-Apr-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Running BGP in Data Centers at ScaleAnubhavnidhi Abhashkumar, Kausik Subramanian, Alexey Andreyev, Hyojeong Kim, Nanda Kishore Salem, Jingyi Yang, Petr Lapukhov, Aditya Akella, Hongyi Zeng

University of Wisconsin – Madison, Facebook April 2021

Motivation

April 2021

Facebook Data Center Growth

April 2021

Facebook Data Center Growth

Tremendous user growth

500 million in 2010

2.5 billion in 2020

April 2021

Facebook Data Center Growth

Tremendous user growth

500 million in 2010

2.5 billion in 2020

Our data center networks have grown to meet with user demand

Source: FBOSS: Building Switch Software at Scale, SIGCOMM 18

April 2021

Data Center Routing Requirements

April 2021

Data Center Routing Requirements

Scalable design to support quick growth

April 2021

Data Center Routing Requirements

Scalable design to support quick growth

Reliability, expecting failures to be common

April 2021

Data Center Routing Requirements

Scalable design to support quick growth

Operational efficiency to innovate rapidly while providing production reliability

Reliability, expecting failures to be common

April 2021

BGP as the DC routing protocol

April 2021

BGP (Border Gateway Protocol) is chosen for scalability, extensive policy control, and long-term stability.

BGP as the DC routing protocol

April 2021

BGP (Border Gateway Protocol) is chosen for scalability, extensive policy control, and long-term stability.

BGP on vendor devices helped grow our networks rapidly initially,compared to developing an SDN routing solution from scratch.

BGP as the DC routing protocol

April 2021

BGP Challenges

April 2021

Long convergence, routing instabilities and misconfigurations in the Internet

BGP Challenges

April 2021

Long convergence, routing instabilities and misconfigurations in the Internet

Problematic in the DC as well due to frequent failures and maintenance events

BGP Challenges

April 2021

Long convergence, routing instabilities and misconfigurations in the Internet

Problematic in the DC as well due to frequent failures and maintenance events

Vendor BGP software not able to support evolving routing requirements fast

BGP Challenges

April 2021

Our Solution

April 2021

Scalable BGP routing design adhering to the principles of configuration uniformity and operational simplicity

Our Solution

April 2021

Scalable BGP routing design adhering to the principles of configuration uniformity and operational simplicity

Routing policies to achieve reliability of the network under failures

Our Solution

April 2021

Scalable BGP routing design adhering to the principles of configuration uniformity and operational simplicity

Routing policies to achieve reliability of the network under failures

In-house BGP agent coupled with automated testing and incremental deployment framework to achieve operational efficiency

Our Solution

DC Routing Design with BGP

April 2021

Scalable Network Topology

Spine Switches(SSW)

Spine Plane 1 Spine Plane 4

Server Pod 1 Server Pod N

Fabric Switches(FSW)

Rack Switches(RSW)

Server Server

April 2021

Scalable Network Topology

Spine Switches(SSW)

Spine Plane 1 Spine Plane 4

Server Pod 1 Server Pod N

Fabric Switches(FSW)

Rack Switches(RSW)

Server Server

April 2021

Scalable Network Topology

Spine Switches(SSW)

Spine Plane 1 Spine Plane 4

Server Pod 1 Server Pod N

Fabric Switches(FSW)

Rack Switches(RSW)

Server Server

April 2021

Scalable Network Topology

Spine Switches(SSW)

Spine Plane 1 Spine Plane 4

Server Pod 1 Server Pod N

Fabric Switches(FSW)

Rack Switches(RSW)

Server Server

April 2021

Scalable Network Topology

Spine Switches(SSW)

Spine Plane 1 Spine Plane 4

Server Pod 1 Server Pod N

Fabric Switches(FSW)

Rack Switches(RSW)

Server Server

April 2021

Scalable Network Topology

Spine Switches(SSW)

Spine Plane 1 Spine Plane 4

Server Pod 1 Server Pod N

Fabric Switches(FSW)

Rack Switches(RSW)

Server Server

April 2021

Reusable BGP ASN Allocation

ASF1

AS F2

AS F3

AS F4

AS R1

AS R2 AS R3 AS R4 AS R5 AS RN

Spine Plane 65001

Plane AS: 65001

AS YAS Y

AS YAS Y

AS YAS Y

AS YAS Y

Confed AS: 65101

Server Pod 65101

FSW

RSW

SSW

65101 . . . . . .

65102

65103

65104

65105

65106

65107

65108

65109

NAS

65301AS

65302AS

65303AS

65304

AS 65401

AS 65402

AS 65403

AS 65404

AS 65405

AS N

AS

65001 AS

65001 AS

65001 AS

65001 AS

65001AS

65001 . . . .

FSW

RSW

SSW

April 2021

Route SummarizationControl route scale

April 2021

• Minimize HW FIB size requirements

Route SummarizationControl route scale

April 2021

• Minimize HW FIB size requirements• Reduce control plane processing

Route SummarizationControl route scale

April 2021

• Minimize HW FIB size requirements• Reduce control plane processing

Route SummarizationControl route scale

Example of Production routes, reflecting topology hierarchy

ASF1

AS F2

AS F3

AS F4

AS R1

AS R2 AS R3 AS R4 AS R5 AS RN

FSW

RSW

FSW

RSW

Pod

Servers

April 2021

• Minimize HW FIB size requirements• Reduce control plane processing

Route SummarizationControl route scale

Example of Production routes, reflecting topology hierarchy

ASF1

AS F2

AS F3

AS F4

AS R1

AS R2 AS R3 AS R4 AS R5 AS RN

FSW

RSW

FSW

RSW

Pod

Servers

Server IP addressesface:b00c::1:0/128

April 2021

• Minimize HW FIB size requirements• Reduce control plane processing

Route SummarizationControl route scale

Example of Production routes, reflecting topology hierarchy

ASF1

AS F2

AS F3

AS F4

AS R1

AS R2 AS R3 AS R4 AS R5 AS RN

FSW

RSW

FSW

RSW

Pod

Servers

Server IP addressesface:b00c::1:0/128

Rack Aggregateface:b00c::0/96

April 2021

• Minimize HW FIB size requirements• Reduce control plane processing

Route SummarizationControl route scale

Example of Production routes, reflecting topology hierarchy

ASF1

AS F2

AS F3

AS F4

AS R1

AS R2 AS R3 AS R4 AS R5 AS RN

FSW

RSW

FSW

RSW

Pod

Servers

Server IP addressesface:b00c::1:0/128

Rack Aggregateface:b00c::0/96

Pod Aggregateface:b00c::0/64

April 2021

DC Policy Goals

Goal Description

Reliability Enforce route propagation scopes.Predefine backup paths for failure.

Scalability Enforce route summarizationAvoid backup path explosion

Maintainability Isolate/remediate problematic nodes without disrupting traffic

Service reachability Avoid service disruptions when instances of services are added/removed/migrated

In the Internet, BGP polices implement business relationships.

April 2021

DC Policy Goals

Goal Description

Reliability Enforce route propagation scopes.Predefine backup paths for failure.

Scalability Enforce route summarizationAvoid backup path explosion

Maintainability Isolate/remediate problematic nodes without disrupting traffic

Service reachability Avoid service disruptions when instances of services are added/removed/migrated

In the Internet, BGP polices implement business relationships.

April 2021

DC Policy Goals

Goal Description

Reliability Enforce route propagation scopes.Predefine backup paths for failure.

Scalability Enforce route summarizationAvoid backup path explosion

Maintainability Isolate/remediate problematic nodes without disrupting traffic

Service reachability Avoid service disruptions when instances of services are added/removed/migrated

In the Internet, BGP polices implement business relationships.

April 2021

DC Policy Goals

Goal Description

Reliability Enforce route propagation scopes.Predefine backup paths for failure.

Scalability Enforce route summarizationAvoid backup path explosion

Maintainability Isolate/remediate problematic nodes without disrupting traffic

Service reachability Avoid service disruptions when instances of services are added/removed/migrated

In the Internet, BGP polices implement business relationships.

April 2021

DC Policy Goals

Goal Description

Reliability Enforce route propagation scopes.Predefine backup paths for failure.

Scalability Enforce route summarizationAvoid backup path explosion

Maintainability Isolate/remediate problematic nodes without disrupting traffic

Service reachability Avoid service disruptions when instances of services are added/removed/migrated

In the Internet, BGP polices implement business relationships.

April 2021

FSW1 FSW2

RSW1 RSW2

ReliabilityPredefine backup paths for link failure

During BGP Route Convergence

April 2021

FSW1 FSW2

RSW1 RSW2

ReliabilityPredefine backup paths for link failure

During BGP Route Convergence

action: add tag‘rack_prefix’

RSW1’s rack_prefix:-> RSW1 (best)

April 2021

FSW1 FSW2

RSW1 RSW2

ReliabilityPredefine backup paths for link failure

match: ‘rack_prefix’action: add tag‘backup_path’

match: ‘backup_path’action: allow

During BGP Route Convergence

action: add tag‘rack_prefix’

backup_path

RSW1’s rack_prefix:-> RSW1 (best)-> RSW2--FSW2--RSW1 (backup)

April 2021

ReliabilityPredefine backup paths for link failure

FSW1 FSW2

RSW1 RSW2

Traffic flow on link failure

April 2021

ReliabilityPredefine backup paths for link failure

FSW1 FSW2

RSW1 RSW2

Traffic flow on link failure

RSW1

April 2021

FSW1 FSW2

RSW1 RSW2

During BGP Route Convergence

match: ‘backup_path’action: add tag

‘completed_backup_path’

completed_backup_path

Routes with tag ‘completed_backup_path’ won’t be propagated to other devices.

Enforce route summarization and avoid backup path explosion

Scalability

April 2021

match: ‘backup_path’action: add tag

‘completed_backup_path’

ScalabilityEnforce route summarization and avoid backup path explosion

FSW1 FSW2

RSW1 RSW2action: add tag‘rack_prefix’

match: ‘rack_prefix’action: add tag‘backup_path’

match: ‘backup_path’action: allow

During BGP Route Convergence

Convergence happens locally within a server pod

Operational Experience

April 2021

Built Operational PipelineSupport BGP like any other SW, enabling fast incremental updates!

Configuration (Policy)

BGP agent

Testing

Emulation Canary

Deployment

P1 …P2 P6

Conclusion

April 2021

Scalable BGP-based routing design: Reusable ASN allocation and route summarization coupled with our scalable topology design

Reliable under failures: Routing policies such as predefined backup path

Operationally Efficient: Automated testing and incremental deployment framework for our in-house BGP implementation

Conclusion