feature rich flow monitoring with p4 - netronome...2. evict to cpu on collision. packets...
TRANSCRIPT
![Page 1: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/1.jpg)
1 ©2017 Open-NFP
Feature Rich Flow Monitoring with P4
John Sonchack University of Pennsylvania
![Page 2: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/2.jpg)
2 ©2017 Open-NFP
Outline
▪ Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record Generation ▪ Benchmarks and Optimizations
![Page 3: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/3.jpg)
3 ©2017 Open-NFP
Outline
▪ Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record Generation ▪ Benchmarks and Optimizations
![Page 4: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/4.jpg)
4 ©2017 Open-NFP
Introduction: Flow Records
▪ A flow record summarizes groups of packets in the same TCP / UDP stream
Flow key (IP 5-tuple)
Flow features (statistics and meta-data)
![Page 5: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/5.jpg)
5 ©2017 Open-NFP
Introduction: Flow Record Applications
▪ Flow records are input to analysis applications
▪ Examples: • Security: Detect botnets, intrusions,
DoS attacks, port scans • Traffic management: Traffic
classification, QoS routing, flow scheduling, load balancing
• Debugging: Performance monitoring, loop and black hole detection
• More: – “A survey of network flow
applications” B. Li, et al. – “An overview of ip flow-based
intrusion detection” A.Sperotto, et al. – NetFlow/IPFIX applications
Flow records
Information
Reconfiguration
![Page 6: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/6.jpg)
6 ©2017 Open-NFP
Introduction: Flow Record Benefits
▪ Flow records reduce monitoring costs
▪ Important for high coverage monitoring • visibility into many (or all) links • costs add up
Packet Headers Flow Records Volume 2GB per second 1.38MB per second
Rate 328k per second 8.9k per second Tablesource:1hour10Gbit/scoreroutertrace(CAIDA 02/2015 Chicago dir. Ah5ps://www.caida.org/data/passive/trace_stats/)
Flow records
Information
Reconfiguration
![Page 7: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/7.jpg)
7 ©2017 Open-NFP
Introduction: Flow Record Richness
▪ An important quality of flow records is their information richness:
Accuracy: Account for every packet and flow on monitored links
Feature richness: provide the features that applications use
![Page 8: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/8.jpg)
8 ©2017 Open-NFP
Introduction: Flow Record Generation
▪ The problem: current approaches trade overhead for information richness
So#wareGenera+on SamplingSwitches DedicatedASICs
+Informa+onrich-Highoverhead
-Reducesaccuracy+Lowoverhead
-Limitedfeatureset+Lowoverhead
Flow recordsSampled flow records Fixed flow records
![Page 9: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/9.jpg)
9 ©2017 Open-NFP
Introduction: our Goal
▪ Flow record generation that is efficient, accurate, and feature rich
+Informa+onrich+Lowoverhead
Thiswork:so#waregenera+on+P4hardwareaccelera+onSo#waregenerators
+Informa+onrich-Highoverhead
Flow recordsFlow records
+
![Page 10: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/10.jpg)
10 ©2017 Open-NFP
Outline
▪ Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record
Generation ▪ Benchmarks and Optimizations
![Page 11: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/11.jpg)
11 ©2017 Open-NFP
Packets
Flow Records
Flow Records
Microflow Records
Packets
P4 Accelerated Flow Record Generation ▪ Main Idea: Use P4 hardware to preprocess packets into
micro flow records that summarize per-flow packet bursts
Totalwork:map8
packetsto3flowrecords
Totalwork:map4micro
flowrecordsto3flowrecords
• Informa+onrich• featuresarefully
customizable• Efficient
• P4hardwarereducesCPUworkload
So#wareflowrecordgenera+on
P4acceleratedflowrecordgenera+on
![Page 12: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/12.jpg)
12 ©2017 Open-NFP
First attempt: CPU Managed Tables
PacketsFlow Record Generator.P4
Insert
Update
Key Pkt Ct.
1
12
FlowGenerator_controller.c
RemoveInsert
CPU
Flow Records
P4 Hardware Forwarding Table
![Page 13: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/13.jpg)
13 ©2017 Open-NFP
PacketsFlow Record Generator.P4
Insert
Update
Key Pkt Ct.
1
12
FlowGenerator_controller.c
RemoveInsert
CPU
Flow Records
P4 Hardware Forwarding Table
First attempt: CPU Managed Tables
▪ Bottleneck: table update operations ▪ P4 designed for forwarding tables that are not highly dynamic
NotLineRate!
Flowarrivalrate:~50kflowspersec.(10Gbit/sInternetlink)
Controllerruleupdaterate:~200–5kpersecond(hwdep.)
![Page 14: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/14.jpg)
14 ©2017 Open-NFP
A Better Design: Minimal CPU Interaction 1. Use a flat array, i.e., P4 register, indexed by hash to store records.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.8
Hash
Key1.1.1.1—>…
52.2.2.2—>…
![Page 15: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/15.jpg)
15 ©2017 Open-NFP
A Better Design: Minimal CPU Interaction 1. Use a flat array, i.e., P4 register, indexed by hash to store records.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.8
Hash
Key1.1.1.1—>…
52.2.2.2—>…
![Page 16: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/16.jpg)
16 ©2017 Open-NFP
1. Use a flat array, i.e., P4 register, indexed by hash to store records.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.8
Hash
Key1.1.1.1—>…
52.2.2.2—>…
A Better Design: Minimal CPU Interaction
![Page 17: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/17.jpg)
17 ©2017 Open-NFP
1. Use a flat array, i.e., P4 register, indexed by hash to store records.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.9
Hash
Key1.1.1.1—>…
52.2.2.2—>…
A Better Design: Minimal CPU Interaction
![Page 18: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/18.jpg)
18 ©2017 Open-NFP
1. Use a flat array, i.e., P4 register, indexed by hash to store records.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.11
Hash
Key1.1.1.1—>…
72.2.2.2—>…
A Better Design: Minimal CPU Interaction
![Page 19: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/19.jpg)
19 ©2017 Open-NFP
1. Use a flat array, i.e., P4 register, indexed by hash to store records.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.11
Hash
Key1.1.1.1—>…
72.2.2.2—>…
Collision
A Better Design: Minimal CPU Interaction
![Page 20: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/20.jpg)
20 ©2017 Open-NFP
1. Use a flat array, i.e., P4 register, indexed by hash to store records. 2. Evict to CPU on collision.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.11
Hash
Key1.1.1.1—>…
13.3.3.3—>…
Insert
Normal Hash Table
72.2.2.2—>…
Collision
A Better Design: Minimal CPU Interaction
![Page 21: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/21.jpg)
21 ©2017 Open-NFP
1. Use a flat array, i.e., P4 register, indexed by hash to store records. 2. Evict to CPU on collision.
Packets
MicroFlowGenerator.P4
MicroFlowAggregator.c
P4 Hardware
CPUFlow Records
Pkt Ct.11
Hash
Key1.1.1.1—>…
13.3.3.3—>…
Insert
Normal Hash Table
72.2.2.2—>…
Timeout
A Better Design: Minimal CPU Interaction
![Page 22: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/22.jpg)
22 ©2017 Open-NFP
A Simple Implementation
![Page 23: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/23.jpg)
23 ©2017 Open-NFP
Packets
Flow Records
Flow Records
Microflow Records
Packets
P4 Accelerated Flow Record Generation ▪ Main Idea: Use P4 hardware to preprocess packets into
micro flow records that summarize per-flow packet bursts
Totalwork:map8
packetsto3flowrecords
Totalwork:map4micro
flowrecordsto3flowrecords
• Informa+onrich• featuresarefully
customizable• Efficient
• P4HWreducescpuworkload
Commodityserverflowrecordgenera+on
P4acceleratedflowrecordgenera+on
![Page 24: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/24.jpg)
24 ©2017 Open-NFP
Outline
▪ Introduction: Flow Records ▪ Design and Implementation: P4 Accelerated Flow Record Generation ▪ Benchmarks and Optimizations
![Page 25: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/25.jpg)
25 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload? 2. What is the maximum throughput of the P4 component? 3. How can we optimize for the NFP-4000?
![Page 26: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/26.jpg)
26 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload? 2. What is the maximum throughput of the P4 component? 3. How can we optimize for the NFP-4000?
![Page 27: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/27.jpg)
27 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
Workload:1hour10Gbit/scoreroutertrace(CAIDA 02/2015 Chicago dir. Ah5ps://www.caida.org/data/passive/trace_stats/)
Eachmicroflowrepresents~10packets,onaverage
![Page 28: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/28.jpg)
28 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
Workload:1hour10Gbit/scoreroutertrace(CAIDA 02/2015 Chicago dir. Ah5ps://www.caida.org/data/passive/trace_stats/)
Eachmicroflowrepresents~10packets,onaverage
Sta$s$c Packetsaggrega+on Microflowaggrega+on
Avg.ratefor10Gbit/slink
~500k ~50k
CPUaggregatorthroughput(percore)
~600k ~600k
totalCPUmonitoringcapacity(percore)
~1x10Gbit/slink ~10x10Gbit/slinks
Packets
Flow Records
Flow Records
Microflow Records
Packets
![Page 29: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/29.jpg)
29 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component? 3. How can we optimize it for the NFP-4000?
![Page 30: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/30.jpg)
30 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
![Page 31: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/31.jpg)
31 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
packetsize bitrate@10Mpkts/sec
64 ~5Gbit/s
128 ~10Gbit/s
256 ~20Gbit/s
512 ~40Gbit/s
1024 ~80Gbit/s
Averagepacketsizeson10Gbit/sinternetrouterlinks(caida)
![Page 32: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/32.jpg)
32 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
(~40-80 Gbit/s with average size packets)
packetsize bitrate@10Mpkts/sec
64 ~5Gbit/s
128 ~10Gbit/s
256 ~20Gbit/s
512 ~40Gbit/s
1024 ~80Gbit/s
Averagepacketsizeson10Gbit/sinternetrouterlinks(caida)
![Page 33: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/33.jpg)
33 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
(~40-80 Gbit/s with average size packets) 3. How can we optimize for the NFP-4000?
![Page 34: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/34.jpg)
34 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
(~40-80 Gbit/s with average size packets) 3. How can we optimize for the NFP-4000?
Op[miza[on1:finergrainedsemaphores
![Page 35: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/35.jpg)
35 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
(~40-80 Gbit/s with average size packets) 3. How can we optimize for the NFP-4000?
Op[miza[on1:finergrainedsemaphores
![Page 36: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/36.jpg)
36 ©2017 Open-NFP
Core 1
Core 2
Pkt Ct.Key
53.3.3.3—>…
packet.key = 4.4.4.4 —> … packet.hash = 3
1: read key @ 3
packet.key = 3.3.3.3 —> … packet.hash = 3
2: read key @ 3
3: replace 3
4: update 3
Optimization 1: Fine Grained P4 Semaphores
Q: Are there race conditions?
![Page 37: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/37.jpg)
37 ©2017 Open-NFP
Core 1
Core 2
Pkt Ct.Key
53.3.3.3—>…
packet.key = 4.4.4.4 —> … packet.hash = 3
1: read key @ 3
packet.key = 3.3.3.3 —> … packet.hash = 3
2: read key @ 3
3: replace 3
4: update 3
Optimization 1: Fine Grained P4 Semaphores
Incorrect!
Q: Are there race conditions? A: Yes.
![Page 38: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/38.jpg)
38 ©2017 Open-NFP
Optimization 1: Fine Grained P4 Semaphores
Core 1
Core 2
Pkt Ct.Key
53.3.3.3—>…
packet.key = 4.4.4.4 —> … packet.hash = 3
1: read key @ 3
packet.key = 3.3.3.3 —> … packet.hash = 3
3: read key @ 3
2: replace 3
4: replace 3
WAIT
Q: Are there race conditions? A: Yes.
![Page 39: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/39.jpg)
39 ©2017 Open-NFP
Optimization 1: Fine Grained P4 Semaphores
Core 1
Core 2
Pkt Ct.Key
53.3.3.3—>…
packet.key = 4.4.4.4 —> … packet.hash = 3
1: read key @ 3
packet.key = 3.3.3.3 —> … packet.hash = 3
3: read key @ 3
2: replace 3
4: replace 3
WAIT
![Page 40: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/40.jpg)
40 ©2017 Open-NFP
Optimization 1: Fine Grained P4 Semaphores
Core 1
Core 2
Pkt Ct.Key
53.3.3.3—>…
1: read key @ 3
2: replace 3WAIT
Core 3
WAIT
Core N
WAIT
…
![Page 41: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/41.jpg)
41 ©2017 Open-NFP
Optimization 1: Fine Grained P4 Semaphores
lock/unlockbycallingaP4ac[on.
Lockasingleentry,nottheen[rearray.
![Page 42: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/42.jpg)
42 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
(~40-80 Gbit/s with average size packets) 3. How can we optimize for the NFP-4000?
1. Fine grained P4 semaphores
Op[miza[on2:combiningtables
![Page 43: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/43.jpg)
43 ©2017 Open-NFP
Optimization 2: Combining Tables
Version Cycles/packet
Improvement
Baseline 5084 -
![Page 44: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/44.jpg)
44 ©2017 Open-NFP
Optimization 2: Combining Tables
Version Cycles/packet
Improvement
Baseline 5084 -
computehashandgetkeymerged
4191 18%
![Page 45: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/45.jpg)
45 ©2017 Open-NFP
Optimization 2: Combining Tables
Version Cycles/packet
Improvement
Baseline 5084 -
computehashandgetkeymerged
4191 18%
Changingtablesisasexpensiveas~5ememops(register_reads/writes).
![Page 46: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/46.jpg)
46 ©2017 Open-NFP
Optimization 2: Combining Tables
Canwemergeallthetables?Challenge:
branches.
![Page 47: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/47.jpg)
47 ©2017 Open-NFP
Optimization 2: Combining Tables
InC,wecouldusecondi[onalassignments.
Canwemergeallthetables?Challenge:
branches.
![Page 48: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/48.jpg)
48 ©2017 Open-NFP
Optimization 2: Combining Tables
InP4,wecanuseacondiAonalmask.
Canwemergeallthetables?Challenge:
branches.
hdp://homolog.us/blogs/blog/2014/12/04/the-en[re-world-of-bit-twiddling-hacks/
Encodeasmodify_field+register_writeinP4.
![Page 49: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/49.jpg)
49 ©2017 Open-NFP
Optimization 2: Combining Tables
hdp://homolog.us/blogs/blog/2014/12/04/the-en[re-world-of-bit-twiddling-hacks/
Version Cycles/packet
Improvement
Baseline 5084 -
computehashandgetkeymerged
4191 18%
singletable 2915 42%
Endresult:~20-30%bederthroughput.
![Page 50: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/50.jpg)
50 ©2017 Open-NFP
Benchmarks and Optimizations 1. How much does the P4 hardware reduce CPU workload?
(~10x with 1 MB of P4 HW memory) 2. What is the maximum throughput of the P4 component?
(~40-80 Gbit/s with average size packets) 3. How can we optimize for the NFP-4000?
1. Fine grained P4 semaphores 2. Combine tables, use conditional masks
![Page 51: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/51.jpg)
51 ©2017 Open-NFP
Summary
▪ Flow records are powerful for high coverage, low overhead network monitoring.
▪ Generating them efficiently, without sacrificing information richness, is a challenge.
▪ Our P4 accelerator can increase flow generator capacity by factor of 10 or more, without sacrificing information richness.
▪ Table merging, conditional masks, and P4 accessible semaphores are useful and portable optimization techniques.
▪ Code available (soon) at: https://github.com/jsonch/p4_code ▪ Thank you for attending!
![Page 52: Feature Rich Flow Monitoring with P4 - Netronome...2. Evict to CPU on collision. Packets MicroFlowGenerator.P4 MicroFlowAggregator.c P4 Hardware CPU Flow Records Pkt Ct. 11 Hash Key](https://reader035.vdocuments.us/reader035/viewer/2022062923/5f0a509c7e708231d42b0ceb/html5/thumbnails/52.jpg)
52 ©2017 Open-NFP
THANK YOU!