VPP Host Stack - wiki.fd.io
TRANSCRIPT
EFFICIENCY
PERFORMANCE
SOFTWARE DEFINED NETWORKING
CLOUD NETWORK SERVICES
LINUX FOUNDATION
VPP - A Universal Terabit Network Platform For Native Cloud Network Services
Superior Performance
Most Efficient on the Planet
Flexible and Extensible
Open Source
Cloud Native
Breaking the Barrier of Software Defined Network Services
1 Terabit Services on a Single Intel® Xeon® Server!
VPP - How does it work?
Compute Optimized SW Network Platform
1. Packet processing is decomposed into a directed graph of nodes…
2. …packets move through graph nodes in vector…
3. …graph nodes are optimized to fit inside the instruction cache…
4. …packets are pre-fetched into the data cache.
[Figure: a vector of packets (Packet 0 through Packet 10) moving through graph nodes on a microprocessor, with callouts for the instruction cache (3) and the data cache (4).]
Makes use of modern Intel® Xeon® Processor micro-architectures. Instruction cache & data cache always hot ⇒ minimized memory latency and usage.
vhost-user-input
af-packet-input dpdk-input
ip4-lookup-multicast ip4-lookup*
ethernet-input
mpls-input lldp-input
arp-input cdp-input ...-no-checksum
ip6-input l2-input ip4-input
ip4-load-balance
mpls-policy-encap
ip4-rewrite-transit
ip4-midchain
interface-output
* Each graph node implements a "micro-NF", a "micro-Network Function" processing packets.
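The vector idea above can be illustrated with a toy dispatcher. This is only a sketch under stated assumptions, not VPP code: the node functions, the `GRAPH` table, `run_graph`, and the `drop` terminal are hypothetical names, and only the node names `ethernet-input`, `ip4-input`, `ip4-lookup` and `interface-output` come from the slide. The point it shows is that each node runs once per vector of packets rather than once per packet.

```python
from collections import defaultdict, deque

# Each node takes a vector (list) of packets and returns (next_node, packet) pairs.
def ethernet_input(vector):
    return [("ip4-input" if p["ethertype"] == 0x0800 else "drop", p) for p in vector]

def ip4_input(vector):
    out = []
    for p in vector:
        p = dict(p, ttl=p["ttl"] - 1)  # decrement TTL on a copy of the packet
        out.append(("ip4-lookup" if p["ttl"] > 0 else "drop", p))
    return out

def ip4_lookup(vector):
    return [("interface-output", p) for p in vector]

GRAPH = {
    "ethernet-input": ethernet_input,
    "ip4-input": ip4_input,
    "ip4-lookup": ip4_lookup,
}
TERMINAL = {"interface-output", "drop"}  # "drop" is a hypothetical sink node

def run_graph(vector, start="ethernet-input"):
    """Dispatch whole vectors node by node; return packets per sink and call counts."""
    pending = deque([(start, vector)])
    done = defaultdict(list)
    calls = defaultdict(int)
    while pending:
        node, vec = pending.popleft()
        if node in TERMINAL:
            done[node].extend(vec)
            continue
        calls[node] += 1  # one invocation handles the whole vector
        buckets = defaultdict(list)
        for nxt, pkt in GRAPH[node](vec):
            buckets[nxt].append(pkt)
        pending.extend(buckets.items())
    return done, calls

packets = [{"id": i, "ethertype": 0x0800, "ttl": 64} for i in range(11)]
packets[3]["ethertype"] = 0x86DD  # one non-IPv4 packet, sent to the sink
sent, calls = run_graph(packets)
```

With eleven packets in the vector, every node function is still called only once, which is how the node's instructions stay hot in the instruction cache while the packets stream through.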
Motivation: Container networking
DPDK Summit North America 2017
[Figure: two processes, PID 1234 (send()) and PID 4321 (recv()), communicating through glibc and the kernel; each side's kernel path is FIFO, TCP, IP (routing), device.]
Motivation: Container networking
[Figure: PID 1234 (send()) and PID 4321 (recv()), each with its own stack of FIFO, TCP, IP (routing), device, now connected through VPP via af_packet devices. Layers shown inside VPP: dpdk and af_packet devices, Ethernet, IP4/6, MPLS, and ACL, SR, VXLAN, LISP, etc.]
Why not this?
[Figure: PID 1234 (send()) and PID 4321 (recv()) each attached through a FIFO directly to VPP, which runs Session, TCP, IP and DPDK.]
VPP Host Stack: Session Layer
[Figure: App, Binary API, shm segment (rx/tx), and the VPP stack: Session, TCP, IP, DPDK.]
• Maintains per-app state and conveys session events to/from apps
• Allocates and manages sessions/segments/fifos
• Isolates network resources via namespacing
• Session lookup tables (5-tuple) and local/global session rule tables (filters)
• Support for pluggable transport protocols
• Binary/native C API for external/builtin applications
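A 5-tuple session lookup, as named in the bullets above, can be sketched as a dictionary keyed on the connection's endpoints. This is an illustration under assumptions, not the VPP data structure: `FiveTuple`, `SessionTable`, the string handles, and the fallback from an exact match to a listener (including a wildcard `0.0.0.0` listener) are all hypothetical.

```python
from typing import NamedTuple, Optional

class FiveTuple(NamedTuple):
    proto: str
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int

class SessionTable:
    """Hypothetical lookup table: established sessions plus listeners."""

    def __init__(self) -> None:
        self._sessions = {}   # full 5-tuple -> session handle
        self._listeners = {}  # (proto, local_ip, local_port) -> listener handle

    def add_listener(self, proto: str, local_ip: str, local_port: int, handle: str) -> None:
        self._listeners[(proto, local_ip, local_port)] = handle

    def add_session(self, key: FiveTuple, handle: str) -> None:
        self._sessions[key] = handle

    def lookup(self, key: FiveTuple) -> Optional[str]:
        # Exact 5-tuple match first (established session).
        if key in self._sessions:
            return self._sessions[key]
        # Otherwise fall back to a listener on the local address, then a wildcard one.
        for ip in (key.local_ip, "0.0.0.0"):
            handle = self._listeners.get((key.proto, ip, key.local_port))
            if handle is not None:
                return handle
        return None

table = SessionTable()
table.add_listener("tcp", "0.0.0.0", 80, "listener-80")
established = FiveTuple("tcp", "10.0.0.1", 80, "10.0.0.2", 40000)
table.add_session(established, "session-1")
```

A packet for the established connection resolves to `session-1`, while a SYN from a new remote endpoint on port 80 resolves to the listener.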
VPP Host Stack: SVM FIFOs
[Figure: App, Binary API, shm segment (rx/tx), and the VPP stack: Session, TCP, IP, DPDK.]
• Allocated within shared memory segments
• Fixed position and size
• Lock free enqueue/dequeue but atomic size increment
• Option to dequeue/peek data
• Support for out-of-order data enqueues
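The FIFO properties listed above can be modelled with a small ring buffer. This is a sketch under assumptions rather than the SVM FIFO implementation: the class `Fifo`, its method names, and the bookkeeping of out-of-order segments in a dictionary are hypothetical. The producer owns `tail`, the consumer owns `head`, and `used` is the single field both sides update, which is where a real shared-memory version would need an atomic operation.

```python
class Fifo:
    """Fixed-size single-producer/single-consumer byte ring (illustrative only)."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.head = 0   # absolute count of bytes dequeued (consumer-owned)
        self.tail = 0   # absolute count of in-order bytes enqueued (producer-owned)
        self.used = 0   # in-order bytes available; shared, would be atomic in shm
        self.ooo = {}   # absolute start -> absolute end of out-of-order segments

    def _write(self, pos: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.buf[(pos + i) % self.size] = b

    def enqueue(self, data: bytes) -> int:
        n = min(len(data), self.size - self.used)
        self._write(self.tail, data[:n])
        self.tail += n
        self.used += n
        self._collect_ooo()
        return n

    def enqueue_ooo(self, offset: int, data: bytes) -> int:
        """Store data that belongs `offset` bytes past the current tail."""
        if offset + len(data) > self.size - self.used:
            return 0  # does not fit in the free space, reject
        start = self.tail + offset
        self._write(start, data)
        self.ooo[start] = max(self.ooo.get(start, 0), start + len(data))
        return len(data)

    def _collect_ooo(self) -> None:
        # Absorb segments that have become contiguous with the in-order data.
        for start in sorted(self.ooo):
            if start > self.tail:
                break
            end = self.ooo.pop(start)
            if end > self.tail:
                self.used += end - self.tail
                self.tail = end

    def peek(self, n: int, offset: int = 0) -> bytes:
        n = max(0, min(n, self.used - offset))
        return bytes(self.buf[(self.head + offset + i) % self.size] for i in range(n))

    def dequeue(self, n: int) -> bytes:
        data = self.peek(n)
        self.head += len(data)
        self.used -= len(data)
        return data

fifo = Fifo(8)
fifo.enqueue(b"ab")
fifo.enqueue_ooo(2, b"ef")   # arrives early; not yet readable
fifo.enqueue(b"cd")          # fills the hole, so "ef" becomes readable too
```

Peeking lets a transport such as TCP keep data in the FIFO until it is acknowledged, while the out-of-order path lets it place segments that arrive ahead of a gap.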
VPP Host Stack: TCP
[Figure: App, Binary API, shm segment (rx/tx), and the VPP stack: Session, TCP, IP, DPDK.]
• Clean-slate implementation
• "Complete" state machine implementation
• Connection management and flow control (window management)
• Timers and retransmission, fast retransmit, SACK
• NewReno congestion control, SACK based fast recovery
• Checksum offloading
• Linux compatibility tested with IWL TCP protocol tester
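For readers unfamiliar with NewReno, the congestion window behaviour named above can be sketched as follows. This is a simplified model in the spirit of RFC 6582, not VPP's TCP code: the class and method names are hypothetical, the initial window of two segments is an assumption, sequence-number wraparound is ignored, and SACK is not modelled.

```python
class NewReno:
    """Simplified NewReno congestion window, in bytes (illustrative only)."""

    def __init__(self, mss: int = 1460) -> None:
        self.mss = mss
        self.cwnd = 2 * mss            # assumed initial window
        self.ssthresh = float("inf")
        self.dup_acks = 0
        self.in_recovery = False
        self.recover = 0               # highest sequence sent when loss was detected

    def on_dup_ack(self, flight_size: int, snd_nxt: int) -> bool:
        """Return True when the caller should fast-retransmit."""
        if self.in_recovery:
            self.cwnd += self.mss      # window inflation per extra duplicate ACK
            return False
        self.dup_acks += 1
        if self.dup_acks < 3:
            return False
        # Third duplicate ACK: halve the window and enter fast recovery.
        self.ssthresh = max(flight_size // 2, 2 * self.mss)
        self.cwnd = self.ssthresh + 3 * self.mss
        self.recover = snd_nxt
        self.in_recovery = True
        return True

    def on_new_ack(self, ack_seq: int, bytes_acked: int) -> bool:
        """Return True when a partial ACK calls for another retransmission."""
        self.dup_acks = 0
        if self.in_recovery:
            if ack_seq >= self.recover:            # full ACK: leave recovery
                self.cwnd = self.ssthresh
                self.in_recovery = False
                return False
            # Partial ACK: deflate by the acked amount, add back one MSS if earned.
            self.cwnd -= bytes_acked
            if bytes_acked >= self.mss:
                self.cwnd += self.mss
            self.cwnd = max(self.cwnd, self.mss)
            return True
        if self.cwnd < self.ssthresh:
            self.cwnd += min(bytes_acked, self.mss)                 # slow start
        else:
            self.cwnd += max(self.mss * self.mss // self.cwnd, 1)   # congestion avoidance
        return False

cc = NewReno(mss=1000)
cc.on_new_ack(1000, 1000)   # slow start: cwnd grows by one MSS per ACK
cc.on_new_ack(2000, 1000)
```

The distinguishing NewReno step is the partial-ACK branch: the sender stays in fast recovery and retransmits the next hole instead of waiting for a timeout.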
VPP Host Stack: Comms Library (VCL)
[Figure: App, Binary API, shm segment (rx/tx), and the VPP stack: Session, TCP, IP, DPDK.]
• Comms library (VCL) apps can link against
• LD_PRELOAD library for legacy apps
• epoll
Application Attachment
[Figure: the App attaches to VPP (Session, TCP, IP, DPDK) over the Binary API: attach, then bind (server) or connect (client); a shm segment is shared between the App and VPP.]
Session Establishment
[Figure sequence: a Client app and a Server app, each attached over the Binary API to its own VPP stack (Session, TCP, IP, DPDK).]
1. Server: attach, bind; VPP: listen
2. Client: attach, connect; VPP: open
3. Handshake between the two TCP stacks
4. New client; connect succeeded
5. Client: connect reply; Server: accept notify; each app now has a shm segment with rx/tx FIFOs
Data Transfer
[Figure: Client and Server apps, each with rx/tx FIFOs, attached to their VPP stacks (Session, TCP, IP, DPDK). Labels: write (copy to fifo), read (copy to buffer), congestion control and reliable transport in TCP, and tx write evt / rx write evt notifications.]
Not yet part of CSIT, but some rough numbers on an E2690: 200k CPS and 8 Gbps/core!
Redirected Connections (Cut-through)
[Figure sequence: a Client app and a Server app attached to the same VPP (Session, TCP, IP, DPDK) over the Binary API.]
1. Server: bind
2. Client: connect; VPP: redirect
Throughput is memory bandwidth constrained: ~120 Gbps!
Ongoing work
• Overall integration with k8s
  • Istio/Envoy
• TCP
  • Rx policer/tx pacer
  • TSO
  • New congestion control algorithms
  • PMTU discovery
  • Optimization/hardening/testing
• VCL/LD_PRELOAD
  • Iperf, nginx, wget, curl
Next steps - Get involved
• Get the Code, Build the Code, Run the Code
  • Session layer: src/vnet/session
  • TCP: src/vnet/tcp
  • SVM: src/svm
  • VCL: src/vcl
• Read/Watch the VPP Tutorials
• Join the Mailing Lists
Multi-threading
[Figure: App1 attached over the Binary API, with one rx/tx FIFO pair per core; Core 0 and Core 1 each run their own Session, TCP and IP on top of DPDK.]
Features: Namespaces
[Figure: three Session/TCP/IP stacks inside VPP, in namespaces ns1, ns2 and ns3, mapped to fib1 and fib2; the App attaches over the Binary API.]
Request access to vpp ns + secret
Features: Session Tables
[Figure: App1 attached over the Binary API; namespaces ns1 and ns2 each have an NS Local Session Table and TCP; both map to fib1, which has a Global Session Table.]
Request access to global and/or local scope
• Both tables have "rules tables" that can be used for filtering
• Local tables are namespace specific and can be used for egress filtering
• Global tables are fib table specific and can be used for ingress filtering
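The split between local (egress) and global (ingress) filtering described above can be sketched with two sets of rule tables. Everything in this example is an assumption made for illustration: the `Rule` fields, the `allow`/`deny` actions, the default-allow policy, the most-specific-prefix tie-break, and the helper names `may_connect` and `may_accept` are hypothetical and are not taken from VPP.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    proto: str
    local_net: str     # CIDR prefix
    local_port: int    # 0 matches any port
    remote_net: str    # CIDR prefix
    remote_port: int   # 0 matches any port
    action: str        # "allow" or "deny"

class RulesTable:
    def __init__(self, default_action: str = "allow") -> None:
        self.rules = []
        self.default_action = default_action

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def check(self, proto, local_ip, local_port, remote_ip, remote_port) -> str:
        best, best_score = None, -1
        for r in self.rules:
            lnet = ipaddress.ip_network(r.local_net)
            rnet = ipaddress.ip_network(r.remote_net)
            if (r.proto != proto
                    or ipaddress.ip_address(local_ip) not in lnet
                    or ipaddress.ip_address(remote_ip) not in rnet
                    or r.local_port not in (0, local_port)
                    or r.remote_port not in (0, remote_port)):
                continue
            score = lnet.prefixlen + rnet.prefixlen  # assumed: most specific rule wins
            if score > best_score:
                best, best_score = r, score
        return best.action if best else self.default_action

local_tables = {"ns1": RulesTable()}     # one per namespace, consulted on connect
global_tables = {"fib1": RulesTable()}   # one per fib table, consulted on accept

local_tables["ns1"].add(Rule("tcp", "0.0.0.0/0", 0, "10.0.0.0/8", 80, "deny"))
global_tables["fib1"].add(Rule("tcp", "0.0.0.0/0", 22, "192.168.1.0/24", 0, "deny"))

def may_connect(ns, proto, local_ip, local_port, remote_ip, remote_port) -> bool:
    """Egress check against the namespace-local table."""
    return local_tables[ns].check(proto, local_ip, local_port, remote_ip, remote_port) == "allow"

def may_accept(fib, proto, local_ip, local_port, remote_ip, remote_port) -> bool:
    """Ingress check against the fib-wide global table."""
    return global_tables[fib].check(proto, local_ip, local_port, remote_ip, remote_port) == "allow"
```

Here apps in `ns1` cannot open TCP connections to port 80 in `10.0.0.0/8`, and nothing reachable through `fib1` accepts SSH from `192.168.1.0/24`, while all other traffic falls through to the default.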