self-healing networking with flow label

Post on 12-Jun-2022

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Self-healing Networkingwith Flow Label

Alexander Azimov mitradir@yandex-team.ru

ToR + 2xPlanes + ToR

๐‘†22๐‘†22๐‘†21

๐‘‹22๐‘‹21

๐‘†12๐‘†12๐‘†11

๐‘‹12๐‘‹11

๐‘‡๐‘œ๐‘…1 ๐‘‡๐‘œ๐‘…2

Servers Servers

โ„Ž๐‘Ž๐‘ โ„Ž

๐‘๐‘Ÿ๐‘œ๐‘ก๐‘œ๐‘ ๐‘Ÿ๐‘_๐‘–๐‘๐‘‘๐‘ ๐‘ก_๐‘–๐‘

๐‘ ๐‘Ÿ๐‘_๐‘๐‘œ๐‘Ÿ๐‘ก๐‘‘๐‘ ๐‘ก_๐‘๐‘œ๐‘Ÿ๐‘ก

Theory DC: Many-Many Paths

N_PLANES: Number of planes in DC;

N_X_SPINES: Number of super spines (X) in each plane;

โ€ข Inside ToR: 1

โ€ข Inside PoD: N_PLANES

โ€ข Between PoDs: N_PLANES x N_X_SPINES

Real DC: Many-Many Paths

N_PLANES: Number of planes in DC; (8)

N_X_SPINES: Number of super spines (X) in each plane; (32)

โ€ข Inside ToR: 1

โ€ข Inside PoD: N_PLANES = 8

โ€ข Between PoDs: N_PLANES x N_X_SPINES = 256

๐‘‹11 is Broken: Constant Loss

๐‘†22๐‘†22๐‘†21

๐‘‹22๐‘‹21

๐‘†12๐‘†12๐‘†11

๐‘‹12๐‘‹11

๐‘‡๐‘œ๐‘…1 ๐‘‡๐‘œ๐‘…2

Servers Servers

โ„Ž๐‘Ž๐‘ โ„Ž

๐‘๐‘Ÿ๐‘œ๐‘ก๐‘œ๐‘ ๐‘Ÿ๐‘_๐‘–๐‘๐‘‘๐‘ ๐‘ก_๐‘–๐‘

๐‘ ๐‘Ÿ๐‘_๐‘๐‘œ๐‘Ÿ๐‘ก๐‘‘๐‘ ๐‘ก_๐‘๐‘œ๐‘Ÿ๐‘ก

Unhappy TCP Flow

๐‘†22๐‘†22๐‘†21

๐‘‹22๐‘‹21

๐‘†12๐‘†12๐‘†11

๐‘‹12๐‘‹11

๐‘‡๐‘œ๐‘…1 ๐‘‡๐‘œ๐‘…2

โ„Ž๐‘Ž๐‘ โ„Ž

๐‘๐‘Ÿ๐‘œ๐‘ก๐‘œ๐‘ ๐‘Ÿ๐‘_๐‘–๐‘๐‘‘๐‘ ๐‘ก_๐‘–๐‘

๐‘ ๐‘Ÿ๐‘_๐‘๐‘œ๐‘Ÿ๐‘ก๐‘‘๐‘ ๐‘ก_๐‘๐‘œ๐‘Ÿ๐‘ก

RTOServers Servers

RTO & SYN_RTO Timeouts

0

20

40

60

80

100

120

140

1th retry 2th retry 3th retry 4th retry 5th retry 6th retry 7th retry

Timeout in Seconds

SYN DATA

RTO_MIN SYN_RTO

200ms 1s

Timeouts

Real RTT

1ms

RTO = MAX(RTO_MIN, RTT)

LinuxKernel

2014

LinuxKernel

2015

LinuxKernel

2016

TCP RTO & skb->hash

skb->hashRTO

IP6 Flow Label

GRE Encap: KEY

UDP Encap: SRC Port

IP6 Ecnap: Flow Label

net.ipv6.auto_flowlabels

0: automatic flow labels are completely disabled

1: automatic flow labels are enabled by default, they can be disabled on a per socket basis using the IPV6_AUTOFLOWLABEL socket option

2: automatic flow labels are allowed, they may be enabled on a per socket basis using the IPV6_AUTOFLOWLABEL socket option

3: automatic flow labels are enabled and enforced, they cannot be disabled by the socket option

Default: 1

Unhappy TCP Flow Becomes Happier

๐‘†22๐‘†22๐‘†21

๐‘‹22๐‘‹21

๐‘†12๐‘†12๐‘†11

๐‘‹12๐‘‹11

๐‘‡๐‘œ๐‘…1 ๐‘‡๐‘œ๐‘…2

โ„Ž๐‘Ž๐‘ โ„Ž

๐‘๐‘Ÿ๐‘œ๐‘ก๐‘œ๐‘ ๐‘Ÿ๐‘_๐‘–๐‘๐‘‘๐‘ ๐‘ก_๐‘–๐‘

๐‘ ๐‘Ÿ๐‘_๐‘๐‘œ๐‘Ÿ๐‘ก๐‘‘๐‘ ๐‘ก_๐‘๐‘œ๐‘Ÿ๐‘ก๐‘“๐‘™๐‘œ๐‘ค ๐‘™๐‘Ž๐‘๐‘’๐‘™

RTO

Servers Servers

Evaluation: Without Flow Label

One of four ToR uplinks drops packets, significant service degradation

75%

Evaluation: Flow Label + eBPF

One of four ToR uplink drops packets, no effect on the service!

75%

Self-healing Datacenter: Cookbook

โ€ข Does it scale? Yes!

โ€ข Does it have many paths? Yes!

โ€ข Does it have fault tolerance? Use IPv6! Use flow label!

โ€ข How do I change RTO? eBPF is the answer!

โ€ข Without documentation!

Theory Internet: Many-Many Paths

Multihomed at the edge;

Multiple connections between peers;

Multiple connection with upstreams;

Real Internet: Many-Many Paths

Average number of best paths: 3.8

Maximum number of best paths: 44

>60% of prefixes have more then 1 path

A Real Outage

RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IP

Src IP 1 Dst IP 2 FL=X1

Src Port 1 Dst Port 2

Ack=A Seq=S

RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IPRTO

Src IP 1 Dst IP 2 FL=X2

Src Port 1 Dst Port 2

Ack=A Seq=S

SYN RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IP

SYN

Src IP 1 Dst IP 2 FL=X1

Src Port 1 Dst Port 2

Ack=0 Seq=S1

SYN RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IP

SYN/ACK

Src IP 2 Dst IP 1 FL=Y1

Src Port 2 Dst Port 1

Ack=S1+1 Seq=S2

SYN RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IPSYN

Src IP 1 Dst IP 2 FL=X2

Src Port 1 Dst Port 2

Ack=0 Seq=S1

SYN RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IP

SYN/ACK

SYN/ACK

Src IP 2 Dst IP 1 FL=Y1

Src Port 2 Dst Port 1

Ack=S1+1 Seq=S2

Src IP 2 Dst IP 1 FL=Z1

Src Port 2 Dst Port 1

Ack=S1+1 Seq=S3

SYN RTO & Anycast

TCP Proxy 2

TCP Proxy 1Anycast IP

Anycast IPACK

Src IP 1 Dst IP 2 FL=X2

Src Port 1 Dst Port 2

Ack=S2 + 1 Seq=S1 + 1

Flow Label: Safe Mode

Client โ€“ sends SYN, Server โ€“ responds with SYN&ACK

โ€ข In case of SYN_RTO or RTO events Server SHOULD recalculate its TCP socket hash, thus change Flow Label. This behavior MAY be switched on by default;

โ€ข In case of SYN_RTO or RTO events Client MAY recalculate its TCP socket hash, thus change Flow Label. This behavior MUST be switched off by default;

Self-healing Datacenter: Cookbook

โ€ข Flow label provides is a way to โ€˜jumpโ€™ from a failing path;

โ€ข Already works in controlled environment;

โ€ข Can disrupt TCP connection with stateful anycast services;

โ€ข We need to change Linux defaults!

โ€ข This time we need to document it!

TCP

top related