VPP Host Stack TCP and Session Layers Florin Coras, Dave Barach, Keith Burns, Dave Wallace


EFFICIENCY

PERFORMANCE

SOFTWARE DEFINED NETWORKING

CLOUD NETWORK SERVICES

LINUX FOUNDATION

VPP - A Universal Terabit Network Platform For Native Cloud Network Services

Superior Performance

Most Efficient on the Planet

Flexible and Extensible

Open Source

Cloud Native

Breaking the Barrier of Software Defined Network Services
1 Terabit Services on a Single Intel® Xeon® Server!

VPP - How does it work?
Compute Optimized SW Network Platform

1. Packet processing is decomposed into a directed graph of nodes…

2. …packets move through graph nodes in a vector…

3. …graph nodes are optimized to fit inside the instruction cache…

4. …packets are pre-fetched into the data cache.

[Figure: a vector of packets (Packet 0 to Packet 10) moving through graph nodes on a microprocessor, with the instruction cache (3) and the data cache (4) marked.]

Makes use of modern Intel® Xeon® Processor micro-architectures. Instruction cache and data cache always hot → minimized memory latency and usage.
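To make the vector idea concrete, here is a minimal Python sketch (not VPP code). It runs one node over the whole vector of packets before moving on to the next node, which is the property that keeps a node's instructions hot in the cache. The node names are borrowed from the slides; the packet dictionaries, the ethertype/TTL logic, and the `run_graph` helper are assumptions made purely for illustration.

```python
from collections import defaultdict, deque

# Toy model of vector packet processing. The packet layout and the per-node
# logic are hypothetical; only the node names come from the slides.

def ethernet_input(pkt):
    # Pick the next node from the (assumed) ethertype field.
    return "ip4-input" if pkt["ethertype"] == 0x0800 else "drop"

def ip4_input(pkt):
    # Decrement the TTL; expired packets go to the drop node.
    pkt["ttl"] -= 1
    return "interface-output" if pkt["ttl"] > 0 else "drop"

NODES = {"ethernet-input": ethernet_input, "ip4-input": ip4_input}

def run_graph(vector, start="ethernet-input"):
    """Process a whole vector at one node before visiting the next node."""
    pending = deque([(start, vector)])
    done = defaultdict(list)   # terminal node name -> packets that ended there
    visits = []                # (node name, vector size) in dispatch order
    while pending:
        name, pkts = pending.popleft()
        if name not in NODES:  # terminal node (output or drop)
            done[name].extend(pkts)
            continue
        visits.append((name, len(pkts)))
        next_vectors = defaultdict(list)
        for pkt in pkts:       # the same node code runs for every packet
            next_vectors[NODES[name](pkt)].append(pkt)
        pending.extend(next_vectors.items())
    return done, visits

packets = [{"id": i, "ethertype": 0x0800, "ttl": 64} for i in range(10)]
packets.append({"id": 10, "ethertype": 0x0806, "ttl": 64})  # one non-IPv4 frame
done, visits = run_graph(packets)
```

Each node is dispatched once per vector rather than once per packet, so the per-node setup cost is amortized over all packets in the vector.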

[Figure: example packet processing graph. Nodes shown, from input to output: vhost-user-input, af-packet-input, dpdk-input; ethernet-input; mpls-input, lldp-input, arp-input, cdp-input, ...-no-checksum, ip6-input, l2-input, ip4-input; ip4-lookup-multicast, ip4-lookup*; ip4-load-balance, mpls-policy-encap, ip4-rewrite-transit, ip4-midchain; interface-output.]

* Each graph node implements a “micro-NF”, a “micro-Network Function” processing packets.

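Motivation: Container networking

DPDK Summit North America 2017

[Figure: two processes, PID 1234 and PID 4321, communicate through glibc and the kernel. One side calls send(), the other recv(); on each side the data crosses FIFO, TCP, IP (routing), and device.]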

Motivation: Container networking

[Figure: PID 1234 (send()) and PID 4321 (recv()) each still cross their own FIFO, TCP, IP (routing), and device layers, and are now connected through VPP. VPP attaches to the devices through af_packet and dpdk and runs Ethernet, IP4/6, MPLS, ACL, SR, VXLAN, LISP, etc.]

Why not this?

[Figure: PID 1234 calls send() and PID 4321 calls recv() directly against a pair of FIFOs served by VPP, which runs Session, TCP, IP, and DPDK.]

VPP Host Stack

[Figure: an App attached to VPP over the Binary API. Inside VPP, the Session layer sits above TCP and IP, DPDK. App and VPP share a shm segment holding rx and tx FIFOs.]

VPP Host Stack: Session Layer

[Figure: host stack diagram (App, Binary API, Session, TCP, IP, DPDK, shm segment with rx and tx FIFOs).]

• Maintains per-app state and conveys session events to/from apps
• Allocates and manages sessions/segments/fifos
• Isolates network resources via namespacing
• Session lookup tables (5-tuple) and local/global session rule tables (filters)
• Support for pluggable transport protocols
• Binary/native C API for external/builtin applications
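The 5-tuple session lookup mentioned above can be pictured as a keyed table. The following Python sketch is illustrative only: `FiveTuple`, `ToySessionTable`, and the fallback from an exact match to a listener keyed on protocol, local address, and local port are assumptions, not the layout used in src/vnet/session.

```python
from typing import NamedTuple, Optional

class FiveTuple(NamedTuple):
    proto: str
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int

class ToySessionTable:
    """Hypothetical 5-tuple session lookup with a listener fallback."""

    def __init__(self):
        self._sessions = {}    # full 5-tuple -> session handle
        self._listeners = {}   # (proto, local_ip, local_port) -> listener handle

    def add_listener(self, proto, ip, port, handle):
        self._listeners[(proto, ip, port)] = handle

    def add_session(self, key: FiveTuple, handle):
        self._sessions[key] = handle

    def lookup(self, key: FiveTuple) -> Optional[str]:
        # Established sessions match on the full 5-tuple first.
        if key in self._sessions:
            return self._sessions[key]
        # Otherwise fall back to a listener on the local endpoint (assumed).
        return self._listeners.get((key.proto, key.local_ip, key.local_port))

table = ToySessionTable()
table.add_listener("tcp", "10.0.0.1", 80, "listener-0")
known = FiveTuple("tcp", "10.0.0.1", 80, "10.0.0.2", 40000)
table.add_session(known, "session-7")
unknown = FiveTuple("tcp", "10.0.0.1", 80, "10.0.0.3", 40001)
```

Here `table.lookup(known)` resolves to the established session, while `table.lookup(unknown)` falls back to the listener, which is where a new connection would be accepted.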

VPP Host Stack: SVM FIFOs

[Figure: host stack diagram (App, Binary API, Session, TCP, IP, DPDK, shm segment with rx and tx FIFOs).]

• Allocated within shared memory segments
• Fixed position and size
• Lock-free enqueue/dequeue but atomic size increment
• Option to dequeue/peek data
• Support for out-of-order data enqueues
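The FIFO properties listed above (fixed size, separate producer and consumer cursors, peek versus dequeue, out-of-order enqueue) can be sketched in a few lines of Python. This is a single-threaded toy, not the implementation in src/svm: `ToyFifo` and its method names are hypothetical, and the size is derived from the two cursors instead of being a shared counter that is updated atomically.

```python
class ToyFifo:
    """Fixed-size byte FIFO with one producer and one consumer (illustrative)."""

    def __init__(self, nitems):
        self.nitems = nitems
        self.buf = bytearray(nitems)  # fixed position and size
        self.head = 0                 # consumer cursor (absolute stream offset)
        self.tail = 0                 # producer cursor (absolute stream offset)
        self.ooo = {}                 # absolute start -> length of early data

    @property
    def size(self):
        # A real shared-memory FIFO would keep this as an atomic counter.
        return self.tail - self.head

    def _write(self, pos, data):
        for i, byte in enumerate(data):
            self.buf[(pos + i) % self.nitems] = byte

    def enqueue(self, data):
        """Append in-order data; returns the number of bytes accepted."""
        n = min(len(data), self.nitems - self.size)
        self._write(self.tail, data[:n])
        self.tail += n
        self._absorb_ooo()
        return n

    def enqueue_with_offset(self, offset, data):
        """Store data that belongs `offset` bytes past the tail."""
        if self.size + offset + len(data) > self.nitems:
            return False
        start = self.tail + offset
        self._write(start, data)
        self.ooo[start] = max(self.ooo.get(start, 0), len(data))
        return True

    def _absorb_ooo(self):
        # Advance the tail over early segments that are now contiguous.
        for start in sorted(self.ooo):
            if start > self.tail:
                break
            self.tail = max(self.tail, start + self.ooo.pop(start))

    def peek(self, offset, n):
        """Read without consuming."""
        n = max(0, min(n, self.size - offset))
        start = self.head + offset
        return bytes(self.buf[(start + i) % self.nitems] for i in range(n))

    def dequeue(self, n):
        data = self.peek(0, n)
        self.head += len(data)
        return data

fifo = ToyFifo(8)
fifo.enqueue_with_offset(3, b"DEF")   # arrives early; the tail does not move
early_size = fifo.size
fifo.enqueue(b"ABC")                  # fills the gap; the tail jumps past DEF
peeked = fifo.peek(0, 6)
first = fifo.dequeue(4)
accepted = fifo.enqueue(b"GHIJKLMN")  # only the free space is accepted
rest = fifo.dequeue(8)
```

Because the producer only moves `tail` and the consumer only moves `head`, the two sides never write the same cursor, which is what makes a lock-free design possible in the real shared-memory setting.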

VPP Host Stack: TCP

[Figure: host stack diagram (App, Binary API, Session, TCP, IP, DPDK, shm segment with rx and tx FIFOs).]

• Clean-slate implementation
• “Complete” state machine implementation
• Connection management and flow control (window management)
• Timers and retransmission, fast retransmit, SACK
• NewReno congestion control, SACK-based fast recovery
• Checksum offloading
• Linux compatibility tested with IWL TCP protocol tester
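As a rough illustration of the NewReno behaviour named above, the sketch below follows the window arithmetic of RFC 5681 and RFC 6582: slow start, congestion avoidance, fast retransmit on the third duplicate ACK, and window inflation and deflation during fast recovery. It is not taken from src/vnet/tcp. The class name, the initial window, and the use of `cwnd` as a stand-in for the amount of data in flight are assumptions, and the caller decides whether an ACK is partial.

```python
class ToyNewReno:
    """Simplified NewReno congestion window bookkeeping, in bytes."""

    def __init__(self, mss=1000, initial_cwnd=2000):
        self.mss = mss
        self.cwnd = initial_cwnd   # assumed initial window
        self.ssthresh = 1 << 30    # effectively unlimited at start
        self.dupacks = 0
        self.in_recovery = False

    def on_ack(self, acked_bytes, partial=False):
        """New cumulative ACK. Returns True when a retransmit is due."""
        if self.in_recovery:
            if partial:
                # Partial ACK: deflate by the acked data, add back one MSS,
                # retransmit the next hole, and stay in recovery.
                self.cwnd -= acked_bytes
                if acked_bytes >= self.mss:
                    self.cwnd += self.mss
                return True
            # Full ACK: leave recovery and deflate the window.
            self.cwnd = self.ssthresh
            self.in_recovery = False
            self.dupacks = 0
            return False
        self.dupacks = 0
        if self.cwnd < self.ssthresh:
            self.cwnd += min(acked_bytes, self.mss)                 # slow start
        else:
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)   # cong. avoidance
        return False

    def on_dupack(self):
        """Duplicate ACK. Returns True when fast retransmit should fire."""
        self.dupacks += 1
        if self.in_recovery:
            self.cwnd += self.mss  # inflate for each segment that left the network
            return False
        if self.dupacks == 3:
            # cwnd stands in for FlightSize here (assumption).
            self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss
            self.in_recovery = True
            return True
        return False

cc = ToyNewReno()
for _ in range(4):
    cc.on_ack(1000)
after_slow_start = cc.cwnd
retransmits = [cc.on_dupack() for _ in range(3)]
after_fast_retransmit = (cc.cwnd, cc.ssthresh)
cc.on_dupack()
partial_retransmit = cc.on_ack(1000, partial=True)
cc.on_ack(4000)                 # full ACK ends recovery
after_recovery = cc.cwnd
cc.on_ack(1000)                 # now in congestion avoidance
after_avoidance = cc.cwnd
```

The SACK-based recovery mentioned on the slide is not modelled; the sketch only covers the cumulative-ACK path.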

VPP Host Stack: Comms Library (VCL)

[Figure: host stack diagram (App, Binary API, Session, TCP, IP, DPDK, shm segment with rx and tx FIFOs).]

• Comms library (VCL) apps can link against
• LD_PRELOAD library for legacy apps
• epoll

Application Attachment

[Figure: the App talks to the VPP Session layer over the Binary API: attach, then bind (server) or connect (client). App and VPP share a shm segment.]

Session Establishment

[Figure sequence: a Client app and a Server app, each attached over the Binary API to a VPP running Session, TCP, IP, and DPDK.]

1. Server: attach, bind; VPP listens.
2. Client: attach, connect; VPP opens the connection.
3. The two TCP layers perform the handshake.
4. After the handshake: new client on the server side, connect succeeded on the client side.
5. VPP sends accept notify to the Server and connect reply to the Client; each app now has a shm segment with rx and tx FIFOs.
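The message sequence above can be walked through with a small in-process model. This Python sketch is only an illustration: `ToySessionLayer` collapses both VPP instances into one object, assumes the TCP handshake succeeds, and uses plain queues in place of shm FIFOs, with the client's tx queue doubling as the server's rx queue. None of these names are VPP Binary API messages.

```python
from collections import deque

class ToySessionLayer:
    """Single in-process stand-in for both VPP instances (hypothetical)."""

    def __init__(self):
        self.apps = set()
        self.listeners = {}   # (ip, port) -> server app name
        self.events = {}      # app name -> queue of (event name, payload)

    def attach(self, app):
        self.apps.add(app)
        self.events[app] = deque()

    def _require_attached(self, app):
        if app not in self.apps:
            raise RuntimeError(f"{app} must attach first")

    def bind(self, app, ip, port):
        self._require_attached(app)
        self.listeners[(ip, port)] = app   # the session layer now listens

    def connect(self, app, ip, port):
        self._require_attached(app)
        server = self.listeners.get((ip, port))
        if server is None:
            self.events[app].append(("connect_reply", {"ok": False}))
            return
        # The TCP handshake would run here; the sketch assumes it succeeds.
        c2s, s2c = deque(), deque()        # one queue per direction
        self.events[server].append(("accept_notify", {"rx": c2s, "tx": s2c}))
        self.events[app].append(
            ("connect_reply", {"ok": True, "rx": s2c, "tx": c2s}))

vpp = ToySessionLayer()
vpp.attach("server")
vpp.bind("server", "10.0.0.1", 80)
vpp.attach("client")
vpp.connect("client", "10.0.0.1", 80)

server_event, server_session = vpp.events["server"].popleft()
client_event, client_session = vpp.events["client"].popleft()
client_session["tx"].append(b"hello")      # client writes
received = server_session["rx"].popleft()  # server reads
```

In the real stack the two sides have separate FIFO pairs and TCP moves the bytes between them; sharing one queue per direction is a shortcut to keep the sketch short.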

Data Transfer

[Figure: Client and Server apps, each attached over the Binary API to a VPP running Session, TCP, IP, and DPDK, each with rx and tx FIFOs. On write, the app places data in its tx FIFO and a tx write evt is signalled; TCP copies the data to a buffer and sends it with congestion control and reliable transport. On the peer, TCP copies received data to the rx FIFO, an rx write evt is signalled, and the app reads it.]

Not yet part of CSIT, but some rough numbers on an E2690: 200k CPS and 8 Gbps/core!

Redirected Connections (Cut-through)

[Figure sequence: a Client app and a Server app attached to the same VPP (Session, TCP, IP, DPDK).]

1. Server: bind over the Binary API.
2. Client: connect over the Binary API; the connection is redirected (cut-through).

Throughput is memory bandwidth constrained: ~120 Gbps!

Ongoing work

• Overall integration with k8s
    • Istio/Envoy
• TCP
    • Rx policer/tx pacer
    • TSO
    • New congestion control algorithms
    • PMTU discovery
    • Optimization/hardening/testing
• VCL/LD_PRELOAD
    • Iperf, nginx, wget, curl

Next steps - Get involved

• Get the Code, Build the Code, Run the Code
    • Session layer: src/vnet/session
    • TCP: src/vnet/tcp
    • SVM: src/svm
    • VCL: src/vcl
• Read/Watch the VPP Tutorials
• Join the Mailing Lists

Thank you!

?

Florin Coras
email: ([email protected])
irc: florinc

Multi-threading

[Figure: App1 attached over the Binary API to VPP. Core 0 and Core 1 each run their own Session, TCP, and IP layers above DPDK, and each has its own rx and tx FIFOs towards the app.]

Features: Namespaces

[Figure: an App attached to VPP over the Binary API. VPP hosts namespaces ns1, ns2, and ns3, each with its own Session, TCP, and IP layers, mapped onto fib1 and fib2.]

Request access to vpp ns + secret

Features: Session Tables

[Figure: App1 attached over the Binary API. Namespaces ns1 and ns2 each have an NS Local Session Table above TCP; fib1 has a Global Session Table.]

Request access to global and/or local scope

• Both tables have a “rules table” that can be used for filtering
• Local tables are namespace specific and can be used for egress filtering
• Global tables are fib table specific and can be used for ingress filtering
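The split between local and global rules tables can be sketched as two dictionaries of filters, one keyed by namespace and one keyed by fib table. Everything in this Python sketch is an assumption made for illustration: the rule format (prefix, optional port, action), the longest-prefix-wins matching, and the default of allowing traffic when no rule matches.

```python
import ipaddress

class ToyRulesTable:
    """Hypothetical rules table: longest matching prefix decides the action."""

    def __init__(self):
        self.rules = []  # (network, port or None for any port, action)

    def add(self, prefix, port, action):
        self.rules.append((ipaddress.ip_network(prefix), port, action))

    def match(self, ip, port, default="allow"):
        addr = ipaddress.ip_address(ip)
        best = None
        for net, rule_port, action in self.rules:
            if addr in net and rule_port in (None, port):
                if best is None or net.prefixlen > best[0]:
                    best = (net.prefixlen, action)
        return best[1] if best else default

# Local tables are per namespace (egress); global tables are per fib (ingress).
local_rules = {"ns1": ToyRulesTable()}
global_rules = {"fib1": ToyRulesTable()}

def egress_allowed(ns, remote_ip, remote_port):
    """Checked when an app in namespace `ns` connects out."""
    table = local_rules.get(ns)
    return table is None or table.match(remote_ip, remote_port) == "allow"

def ingress_allowed(fib, remote_ip, local_port):
    """Checked when a connection arrives on fib table `fib`."""
    table = global_rules.get(fib)
    return table is None or table.match(remote_ip, local_port) == "allow"

local_rules["ns1"].add("10.0.0.0/8", None, "deny")
local_rules["ns1"].add("10.1.0.0/16", 80, "allow")
global_rules["fib1"].add("192.168.0.0/16", 22, "deny")
```

With these rules, an app in ns1 may connect to 10.1.2.3 port 80 but not to other 10.0.0.0/8 destinations, and incoming connections from 192.168.0.0/16 to port 22 on fib1 are rejected.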