
Network Traffic Monitoring and Analysis with GPUs
Wenji Wu, Phil DeMar

1. The Problem

In high-speed networks (10 Gbps and above), network traffic monitoring and analysis applications that require scrutiny on a per-packet basis typically demand immense computing power and very high I/O throughput. These applications face extreme performance and scalability challenges.

2. Our Solution: Use GPU-based Traffic Monitoring and Analysis Tools

At Fermilab, we have prototyped a network traffic monitoring and analysis system using GPUs.

3. Why Choose GPU?

Characteristics of packet-based network monitoring & analysis applications:

• Time constraints on packet processing.
• Compute and I/O throughput-intensive.
• High levels of data parallelism.
• Extremely poor temporal locality for data.

Requirements on computing platform for high performance network monitoring & analysis applications:

• High compute power
• Ample memory bandwidth
• Capability of handling the data parallelism inherent in network data
• Easy programmability

Three types of computing platforms available:

• NPU/ASIC
• CPU
• GPU

The GPU is well suited for network monitoring and analysis in high-speed networks.

Features                         NPU/ASIC   CPU   GPU
High compute power               Varies     ✖     ✔
High memory bandwidth            Varies     ✖     ✔
Easy programmability             ✖          ✔     ✔
Data-parallel execution model    ✖          ✖     ✔

Architecture Comparison

4. System Architecture

[Figure: System architecture. Network packets are captured from the NICs into packet buffers, grouped into packet chunks during preprocessing, and copied into the GPU domain, where the monitoring & analysis kernels run; results are returned to user space for output display.]

Four types of Logical Entities:

(1) Traffic Capture (2) Preprocessing (3) Monitoring & Analysis (4) Output Display

Key Mechanisms:

• Partial packet capture approach: the GPU has a relatively small memory, so only packet headers are copied into the GPU domain instead of the entire packets (see the sketch after this list).

• A new packet I/O engine: uses commodity NICs to capture network traffic into the CPU domain without packet drops.

• A GPU-accelerated library for network monitoring and analysis, consisting of dozens of CUDA kernels that can be combined in multiple ways for the intended tasks.
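The partial packet capture step can be illustrated with a minimal host-side sketch. This is not the prototype's actual code: SNAP_LEN, the PacketChunk layout, and the helper names are assumptions made for illustration. Only the first SNAP_LEN bytes of each packet (enough for the protocol headers) are staged into a pinned buffer and copied to the GPU as one batch.

// Sketch only (not the authors' code): copy just the packet headers
// (partial capture) into a pinned staging buffer and upload one batch
// to the GPU. SNAP_LEN, PacketChunk, and the field names are assumptions.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstring>

constexpr int SNAP_LEN   = 96;    // bytes kept per packet (headers only)
constexpr int BATCH_SIZE = 4096;  // packets per batch sent to the GPU

struct PacketChunk {                       // hypothetical captured-chunk layout
    int      num_pkts;
    uint16_t pkt_len[BATCH_SIZE];
    const uint8_t *pkt_data[BATCH_SIZE];   // pointers into the capture buffer
};

// Stage headers into pinned memory, then issue one async H2D transfer.
void stage_and_upload(const PacketChunk &chunk,
                      uint8_t *h_pinned,   // from cudaHostAlloc, BATCH_SIZE * SNAP_LEN bytes
                      uint8_t *d_headers,  // from cudaMalloc,    BATCH_SIZE * SNAP_LEN bytes
                      cudaStream_t stream)
{
    for (int i = 0; i < chunk.num_pkts; ++i) {
        size_t n = chunk.pkt_len[i] < SNAP_LEN ? chunk.pkt_len[i] : SNAP_LEN;
        memcpy(h_pinned + (size_t)i * SNAP_LEN, chunk.pkt_data[i], n);
    }
    cudaMemcpyAsync(d_headers, h_pinned,
                    (size_t)chunk.num_pkts * SNAP_LEN,
                    cudaMemcpyHostToDevice, stream);
    // The monitoring & analysis kernels would then be launched on 'stream'.
}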

[Figure: Packet I/O Engine. Pre-allocated packet buffer chunks are attached to the NIC's receive descriptor ring in the OS kernel; incoming packets fill a chunk, the filled chunk is captured into user space for processing, and the chunk is then recycled back to the pool of free packet buffer chunks.]

Key techniques:
• Pre-allocated large packet buffers
• Packet-level batch processing
• Memory-mapping based zero-copy
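These three techniques are not unique to the prototype's custom engine. As a rough illustration only, the sketch below uses Linux's public AF_PACKET PACKET_RX_RING interface as a stand-in: a large ring of packet frames is pre-allocated in the kernel, mapped into user space once (zero-copy), consumed in batches, and recycled frame by frame. The prototype's own packet I/O engine is a separate implementation.

// Illustration only: the prototype uses its own packet I/O engine; this sketch
// shows the same ideas (pre-allocated buffers, batching, mmap-based zero-copy)
// with the standard Linux AF_PACKET PACKET_RX_RING API as a stand-in.
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>
#include <poll.h>
#include <unistd.h>
#include <cstdint>

int main()
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));   // needs root

    // Pre-allocate a large ring of packet frames once, at startup.
    tpacket_req req{};
    req.tp_block_size = 1 << 22;                               // 4 MiB blocks
    req.tp_block_nr   = 64;
    req.tp_frame_size = 2048;
    req.tp_frame_nr   = req.tp_block_nr * (req.tp_block_size / req.tp_frame_size);
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    // Map the ring into user space: captured packets are visible with no copy.
    size_t ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
    uint8_t *ring = (uint8_t *)mmap(nullptr, ring_len, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);

    unsigned frames_per_block = req.tp_block_size / req.tp_frame_size;
    for (unsigned i = 0; ; i = (i + 1) % req.tp_frame_nr) {
        uint8_t *frame = ring + (i / frames_per_block) * req.tp_block_size
                              + (i % frames_per_block) * req.tp_frame_size;
        auto *hdr = (tpacket_hdr *)frame;
        while (!(hdr->tp_status & TP_STATUS_USER)) {           // wait for the NIC
            pollfd pfd{fd, POLLIN, 0};
            poll(&pfd, 1, -1);
        }
        const uint8_t *pkt = frame + hdr->tp_mac;              // packet bytes, in place
        (void)pkt;                // ...batch headers here and hand them to the GPU...
        hdr->tp_status = TP_STATUS_KERNEL;                     // recycle the frame
    }
}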


[Figure: Packet-Filtering Kernel. Step 1 (Filtering) evaluates the filter over raw_pkts[ ] in parallel, writing a 0/1 match flag per packet into filtering_buf[ ]; step 2 (Scan) runs an exclusive prefix sum over the flags into scan_buf[ ], giving each matching packet its output index; step 3 (Compact) gathers the matching packets into filtered_pkts[ ]. In the example, packets p1, p3, p4, and p7 of p1 through p8 pass the filter.]

Advanced packet filtering capabilities at wire speed are necessary so that only the packets of interest are analyzed.
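A minimal CUDA sketch of this filter, scan, compact pattern is shown below. It is not the production kernel: it assumes the packet headers already sit in GPU memory at a fixed SNAP_LEN stride, uses a hard-coded "is UDP" test in place of a compiled BPF filter, and records matching packet indices rather than gathering the packets themselves. The buffer names mirror the figure, and Thrust provides the exclusive prefix sum.

// Sketch of the filter -> scan -> compact pattern (not the production kernels).
// Assumes headers are stored at a fixed SNAP_LEN stride in d_headers and that
// byte 23 holds the IPv4 protocol field (untagged Ethernet + IPv4), a deliberate
// simplification of a real BPF-style filter.
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>
#include <cstdint>

constexpr int SNAP_LEN = 96;

// Step 1 (Filtering): one thread per packet writes a 0/1 match flag.
__global__ void filter_kernel(const uint8_t *d_headers, int n, int *d_flags)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const uint8_t *pkt = d_headers + (size_t)i * SNAP_LEN;
    d_flags[i] = (pkt[23] == 17);             // IPPROTO_UDP == 17 ("udp" filter)
}

// Step 3 (Compact): scatter the index of each matching packet to its output slot
// (the real kernel gathers the packets themselves into filtered_pkts[ ]).
__global__ void compact_kernel(const int *d_flags, const int *d_scan, int n,
                               int *d_out_idx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (d_flags[i]) d_out_idx[d_scan[i]] = i;  // d_scan[i] = output slot for pkt i
}

// Host driver: filter, exclusive prefix sum (Step 2, Scan), then compact.
// Returns the number of packets that passed the filter.
int run_filter(const uint8_t *d_headers, int n,
               int *d_flags, int *d_scan, int *d_out_idx)
{
    if (n == 0) return 0;
    int threads = 256, blocks = (n + threads - 1) / threads;
    filter_kernel<<<blocks, threads>>>(d_headers, n, d_flags);

    thrust::device_ptr<int> flags(d_flags), scan(d_scan);
    thrust::exclusive_scan(flags, flags + n, scan);            // Step 2 (Scan)

    compact_kernel<<<blocks, threads>>>(d_flags, d_scan, n, d_out_idx);

    int last_flag = 0, last_scan = 0;          // match count = scan[n-1] + flags[n-1]
    cudaMemcpy(&last_flag, d_flags + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last_scan, d_scan  + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    return last_scan + last_flag;
}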

5. Initial Results

• The GPU can significantly speed up packet processing. Compared to a single-core CPU, the GPU's speedup ratios vary from 8.82 to 17.04; compared to a 6-core CPU, the speedup ratios range from 1.54 to 3.20.

[Figure: GPU Packet-filtering Algorithm Evaluation. Two panels, one per BPF filter ("UDP" and "net 131.225.107 & tcp"), plot throughput (million packets/s) for Data-set-1 through Data-set-4 across the configurations mmap-gpu, standard-gpu, s-cpu-1.6/2.0/2.4, and m-cpu-1.6/2.0/2.4.]