
PVPP: A Programmable Vector Packet Processor

Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim

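Fixed-Function Switch Chip: Fixed Set of Protocols

[Figure: protocol stack supported by a fixed-function chip: Ethernet, IPv4, IPv6, TCP, UDP, BGP, HTTP, TLS]

Programmable Switching Chip: Custom Protocols

[Figure: the same stack with CUSTOM_P where UDP appeared: Ethernet, IPv4, IPv6, TCP, CUSTOM_P, BGP, HTTP, TLS]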

Software Switch

[Figure: a software switch connecting two VMs: 3 virtual ports, 1 physical port]

[Chart: Approx. Number of Physical Ports vs. Virtual Ports [1], 2010 to 2015, y-axis 0 to 60. Series: Physical Ports, Virtual Ports]

[1] Martin Casado, VMWorld 2013

Software Switch: Custom Protocols

[Figure: protocol stack on a software switch: Ethernet, IPv4, IPv6, TCP, CUSTOM_P, BGP, HTTP]

PISCES [1], BMv2 [2]

[1] PISCES. ACM SIGCOMM 2016.
[2] https://github.com/p4lang/behavioral-model

Throughput on Eth+IPv4+ACL benchmark application [1]

[Chart: Throughput (Gbps) at 64-byte packet size, y-axis 0 to 16]

PISCES v0.1: 7.59
PISCES v1.0: 13.32
Native OVS: 13.43

Performance overhead of <2%

[1] PISCES. ACM SIGCOMM 2016.

So… why ANOTHER P4 software switch?

[Figure: programmable switching chip pipeline: Parser -> Match+Action Tables -> Queues/Scheduling, carrying Packet and Metadata]

Initially, the switching chip is not programmed and does not know any protocols.

[Figure: Protocol Authoring. L2_L3.p4 -> Compile -> Configure the pipeline (Parser, Match+Action Tables, Queues/Scheduling). Once configured, the pipeline shows Eth, VLAN, IPv4, IPv6, TCP and a New protocol. The Switch OS sits above the Run-time API and Driver.]

[Figure: the same flow with OF1-3.p4 added.]

[Figure: a Software Switch (Parser, Match-Action Pipeline) running on Kernel / DPDK]

[Figure: a Domain-Specific Language (DSL) program describing the Parser and Match-Action Pipeline is compiled onto the Software Switch]

[Figure: DSL 1 is compiled onto Software Switch and DSL 2 onto Software Switch 2; each switch has its own Parser and Match-Action Pipeline on Kernel / DPDK]

PISCES
• P4 to OvS

BMv2
• P4 to a C++ custom switch

What’s wrong with this design?

• Not designed for CPU-based architectures
• Limited in expressiveness
• Limited APIs to access low-level constructs

=> Lots of room for improvement!

Vector Packet Processing (VPP) Platform

• Open source version of Cisco’s Vector Packet Processing technology
• Modular packet processing node graph abstraction
• Each node processes a vector of packets to reduce CPU I-cache thrashing
• Extensible and dynamically reconfigurable via plugins
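The vector-at-a-time idea in the bullets above can be sketched in a few lines of C. This is an illustration only, not VPP's actual node API: `packet_t`, `node_fn`, `run_graph` and the two toy nodes are hypothetical names. The point is the loop order: each node runs over the whole vector before the next node starts, so a node's instructions stay hot in the I-cache while it works through the packets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet record; real VPP buffers carry far more state. */
typedef struct {
    uint8_t  ttl;
    uint32_t nodes_visited;
} packet_t;

/* A graph node is a function applied to a whole vector of packets. */
typedef void (*node_fn)(packet_t *vec, size_t n);

/* Toy node: only records that the packet passed through. */
static void node_validate(packet_t *vec, size_t n) {
    for (size_t i = 0; i < n; i++)
        vec[i].nodes_visited++;
}

/* Toy node: decrements the TTL, saturating at zero. */
static void node_decrement_ttl(packet_t *vec, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (vec[i].ttl > 0)
            vec[i].ttl--;
        vec[i].nodes_visited++;
    }
}

/* Vector-at-a-time: finish one node for every packet in the vector
 * before moving on to the next node in the (here, linear) graph. */
static void run_graph(const node_fn *nodes, size_t n_nodes,
                      packet_t *vec, size_t n) {
    for (size_t k = 0; k < n_nodes; k++)
        nodes[k](vec, n);
}
```

A caller would build an array such as `{node_validate, node_decrement_ttl}` and pass it to `run_graph` together with the packet vector. A packet-at-a-time design would swap the two loops and run every node on one packet before touching the next.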

Vector Packet Processing (VPP) Platform

• Proven Performance [1]
• Multiple MPPS from a single x86_64 core
• >100 Gbps full-duplex on a single physical host
• Outperforms Open vSwitch in various scenarios

1 core: 9 MPPS ipv4 in+out forwarding
2 cores: 13.4 MPPS ipv4 in+out forwarding
4 cores: 20.0 MPPS ipv4 in+out forwarding

[1] https://wiki.fd.io/view/VPP/What_is_VPP%3F

[Figure: VPP node graph. A packet vector enters at dpdk-input, fans out to ip4-input, ip6-input and llc-input, and continues through ip6-lookup and ip6-rewrite-transmit to dpdk-output. These are the vanilla VPP nodes.]

[Figure: a Custom Plugin adds its own nodes (Custom-input, Node 1, Node 2, ... Node i, Node j, Node k) alongside the vanilla VPP nodes. The plugin is enabled via CLI.]

PVPP Overview

• Creates a plugin based on the input P4 program
• No changes to the existing VPP code base
• Compiles either a single-node or a multiple-node plugin
• Multiple nodes are split by the number of tables in the input P4 program
• P4 programs can be swapped dynamically

[Figure: Multi-Node PVPP Plugin. Alongside the vanilla VPP nodes (dpdk-input, ip4-input, ip6-input, llc-input, ip6-lookup, ip6-rewrite-transmit, dpdk-output), the plugin, enabled via CLI, starts at pvpp-input and has one node per table: Table 1, Table 2, ... Table i, Table j, Table k.]

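[Figure: PVPP compiler toolchain. A P4 Program enters the P4 Compiler (P4C): Front-end Compiler -> BMv2 Mid-end Compiler -> BMv2 Back-end Compiler, which emits JSON. The JSON-VPP Compiler combines the JSON with VPP Plugin Cog Templates to generate C Files in a VPP Plugin Directory.]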

Details of PVPP Plugin

• Headers are defined as C structs

header_type ethernet_t {
    fields {
        dstAddr: 48;
        srcAddr: 48;
        etherType: 16;
    }
}

typedef struct {
    u8 dstAddr[6];
    u8 srcAddr[6];
    u16 etherType;
} p4_type_Ethernet_h;

• The action interface takes pointers to all header, metadata, and runtime data; the compiler selects the correct pointer and the set of primitives to perform on the data.
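To make the header struct and the action interface more concrete, here is a minimal sketch in C. Only the `p4_type_Ethernet_h` layout comes from the slide (with `uint8_t`/`uint16_t` standing in for `u8`/`u16`). Everything else, including `headers_t`, `metadata_t`, `set_dmac_data_t`, `action_fn`, `parse_ethernet` and `action_set_dmac`, is a hypothetical name chosen for illustration, and the code PVPP actually generates may look quite different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the slide's struct, using standard fixed-width types. */
typedef struct {
    uint8_t  dstAddr[6];
    uint8_t  srcAddr[6];
    uint16_t etherType;   /* kept in network byte order, as on the wire */
} p4_type_Ethernet_h;

/* The struct must match the 14-byte Ethernet header exactly. */
_Static_assert(sizeof(p4_type_Ethernet_h) == 14, "unexpected padding");

/* Hypothetical containers for the three kinds of data an action sees. */
typedef struct { p4_type_Ethernet_h *ethernet; } headers_t;
typedef struct { uint32_t egress_port; } metadata_t;
typedef struct { uint8_t dmac[6]; } set_dmac_data_t;

/* Assumed action signature: pointers to headers, metadata, runtime data. */
typedef void (*action_fn)(headers_t *hdr, metadata_t *meta,
                          const void *runtime_data);

/* Copy the first 14 bytes of a frame into the header struct.
 * Returns 0 on success, -1 if the frame is too short. */
static int parse_ethernet(const uint8_t *pkt, size_t len,
                          p4_type_Ethernet_h *out) {
    if (len < sizeof(*out))
        return -1;
    memcpy(out, pkt, sizeof(*out));
    return 0;
}

/* Example action: it only needs the Ethernet header pointer and its own
 * runtime data, so the metadata pointer goes unused. */
static void action_set_dmac(headers_t *hdr, metadata_t *meta,
                            const void *runtime_data) {
    const set_dmac_data_t *d = runtime_data;
    (void)meta;
    memcpy(hdr->ethernet->dstAddr, d->dmac, sizeof(d->dmac));
}
```

Copying into the struct with `memcpy` sidesteps alignment and aliasing concerns in this sketch. A performance-oriented implementation would more likely overlay the struct directly on the packet buffer.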

Details of PVPP Plugin

• A table definition contains two parts
1. A match definition that defines the type of match (EXACT, LPM) and which fields to match on
2. An action definition that contains a set of action pointers corresponding to the match result
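The two-part table definition can be illustrated with a small C sketch. It assumes a single 32-bit match field and a linear scan over the entries. All names (`table_t`, `table_entry_t`, `table_apply`, `pkt_ctx_t` and the two actions) are hypothetical, and a real implementation would presumably use a hash table for EXACT and a trie or similar structure for LPM.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MATCH_EXACT, MATCH_LPM } match_type_t;

/* Hypothetical per-packet context that actions operate on. */
typedef struct {
    uint32_t dst_addr;
    uint32_t nhop;
    int      dropped;
} pkt_ctx_t;

typedef void (*action_fn)(pkt_ctx_t *ctx, uint32_t runtime_data);

static void action_set_nhop(pkt_ctx_t *ctx, uint32_t data) { ctx->nhop = data; }
static void action_drop(pkt_ctx_t *ctx, uint32_t data) { (void)data; ctx->dropped = 1; }

/* One rule: match key (plus prefix length for LPM), the action pointer
 * to run on a hit, and the runtime data handed to that action. */
typedef struct {
    uint32_t  key;
    uint8_t   prefix_len;   /* 0..32; ignored for MATCH_EXACT */
    action_fn action;
    uint32_t  runtime_data;
} table_entry_t;

/* Part 1 is the match definition (match_type); part 2 is the set of
 * action pointers (one per entry, plus a default for misses). */
typedef struct {
    match_type_t         match_type;
    const table_entry_t *entries;
    size_t               n_entries;
    action_fn            default_action;
} table_t;

/* Mask with the top `len` bits set; assumes len <= 32. */
static uint32_t prefix_mask(uint8_t len) {
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Find the matching entry (longest prefix wins for LPM, first hit for
 * EXACT) and run its action, or the default action on a miss. */
static void table_apply(const table_t *t, pkt_ctx_t *ctx) {
    const table_entry_t *best = NULL;
    for (size_t i = 0; i < t->n_entries; i++) {
        const table_entry_t *e = &t->entries[i];
        uint32_t mask = (t->match_type == MATCH_EXACT)
                            ? 0xFFFFFFFFu
                            : prefix_mask(e->prefix_len);
        if ((ctx->dst_addr & mask) != (e->key & mask))
            continue;
        if (best == NULL || e->prefix_len > best->prefix_len)
            best = e;
    }
    if (best != NULL)
        best->action(ctx, best->runtime_data);
    else
        t->default_action(ctx, 0);
}
```

With entries for 10.0.0.0/8 and 10.1.0.0/16, an LPM lookup of 10.1.2.3 would run the action of the /16 entry, while the same table used as EXACT would only hit on a key equal to the full address.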

PVPP CLI

• Two CLIs are currently supported

1. Enable/Disable PVPP Pipeline

$ pvpp [ingress interface name]

2. CLI to install match rules for a particular table

$ pvpp insert-rule [table name] [match value] [action name] [runtime data]

Experimental Setup

[Figure: three machines, M1, M2 and M3. Labels: PVPP on DPDK; MoonGen Sender/Receiver (two instances); 10G x 3 links on each side.]

CPU: Intel Xeon E5-2640 v3 2.6 GHz
Memory: 32 GB RDIMM, 2133 MT/s, Dual Rank
NICs: Intel X710 DP/QP DA SFP+ Cards
HDD: 1 TB 7.2K RPM NLSAS 6 Gbps

Benchmark Application

[Figure: pipeline of Parse Ethernet/IPv4 followed by three tables]

Table             Match         Action
IPv4_match        ip.dstAddr    Set_nhop, drop
Destination MAC   ip.dstAddr    Set_dmac, drop
Source MAC        egress_port   Set_dmac, drop

Baseline Performance

[Chart: Throughput (Mpps) at 64-byte packet size, y-axis 0 to 9]

Single Node: 7.86
Multiple Node: 7.05

Compiler optimizations

• Remove redundant tables
• Reducing metadata access
• Bypassing redundant VPP nodes
• Reduce pointer dereference
• Caching logical HW interfaces
• Unrolling loops for multiple packet processing

Loop Unrolling

Manually fetches two packets
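Fetching two packets per iteration can be sketched as follows. This is not the code PVPP generates: `packet_t`, `process_one` and `process_vector_dual` are hypothetical names, and the per-packet work is a placeholder. The shape is what matters: a dual loop that handles packets in pairs, followed by a single loop for any leftover packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet record with one field to mutate. */
typedef struct { uint8_t ttl; } packet_t;

/* Placeholder for the per-packet work a node would do. */
static inline void process_one(packet_t *p) {
    if (p->ttl > 0)
        p->ttl--;
}

/* Process a vector of packet pointers; returns the number handled. */
static size_t process_vector_dual(packet_t **vec, size_t n) {
    size_t i = 0;

    /* Dual loop: fetch two packets up front, then process both. */
    while (n - i >= 2) {
        packet_t *p0 = vec[i];
        packet_t *p1 = vec[i + 1];
        process_one(p0);
        process_one(p1);
        i += 2;
    }

    /* Single loop: pick up the odd packet, if any. */
    while (i < n) {
        process_one(vec[i]);
        i++;
    }
    return i;
}
```

Unrolling halves the loop-control overhead and gives the compiler and CPU two independent packets to overlap. Production code in this style often also prefetches the next pair of packets while the current pair is being processed.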

Optimized Performance

Throughput (Mpps), 64-byte packets, single 10G port

Optimization                     Single Node   Multiple Node
Baseline                         7.86          7.05
Removing Redundant Tables        9.25          8.38
Reducing Metadata Access         9.51          8.50
Loop Unrolling                   9.51          8.80
Bypassing Redundant Nodes        9.58          8.89
Reducing Pointer Dereferences    10.01         9.02
Caching Logical HW Interface     10.21         9.20

Optimized Performance

Throughput (Mpps) by packet size

Packet Size (Bytes)   Single Node   Multiple Node
64                    10.21         9.20
128                   8.07          8.07
192                   5.63          5.65
256                   4.38          4.38

Optimized Performance

[Chart: Throughput (Mbps) vs. Packet Size (Bytes) at 64, 128, 192 and 256 bytes, y-axis 0 to 10000. Series: Single Node, Multiple Node]
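For readers moving between the Mpps and Mbps charts, the conversion can be sketched as below. Two assumptions are mine rather than the slides': the packet size is taken to be the full Ethernet frame, and the Mbps figure counts frame bytes only. The slides do not say which convention their Mbps chart uses. The 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) is standard Ethernet. The function names are hypothetical.

```c
#include <stdint.h>

/* Per-frame overhead on the wire: 7 B preamble + 1 B start-of-frame
 * delimiter + 12 B inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Packet rate (Mpps) to throughput (Mbps), counting frame bytes only. */
static double mpps_to_mbps(double mpps, unsigned frame_bytes) {
    return mpps * frame_bytes * 8.0;
}

/* Highest packet rate (Mpps) a link of `link_gbps` can carry for a
 * given frame size, once wire overhead is included. */
static double line_rate_mpps(double link_gbps, unsigned frame_bytes) {
    double bits_per_frame = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0;
    return link_gbps * 1000.0 / bits_per_frame;
}
```

Under these assumptions, `line_rate_mpps(10.0, 64)` comes to roughly 14.88 Mpps for one 10G port, and `mpps_to_mbps(10.21, 64)` to roughly 5228 Mbps for the optimized single-node figure at 64 bytes.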

Optimized Performance

Average CPU Cycles per Packet by packet size

Packet Size (Bytes)   Single Node   Multiple Node
64                    133.00        159.00
128                   149.00        172.30
192                   171.00        222.30
256                   194.00        255.20

Scalability

Throughput (Mpps), 64-byte packets across 3 x 10G ports

Number of CPUs   Single Node   Multiple Node
1                8.52          8.14
2                17.03         16.57
3                26.40         24.14
4                35.83         33.41
5                44.23         40.69
6                53.11         49.34

Performance Comparison

Throughput (Mpps) by packet size

Packet Size (Bytes)   PVPP    PISCES (with Microflow)   PISCES (without Microflow)
64                    59.53   63.49                     30.22
128                   49.31   47.23                     30.22
192                   34.71   34.72                     30.20
256                   26.78   26.78                     26.78

Future Work

• Automated node splits based on the input program
• More compiler annotations for low-level constructs
• Extending P4 support such as data plane states
• VPP-specific P4_16 backend compiler
• Extending PVPP CLI features

Summary

[Figure labels: PVPP, VPP, P4]

- A performant and dynamically reconfigurable P4 switch based on a different packet processing abstraction
- More improvements planned over the summer prior to public release

Questions?
