Network Traffic Monitoring and Analysis with GPUs
Wenji Wu, Phil DeMar
1. The Problem
In high-speed networks (10 Gbps and above), network traffic monitoring and analysis applications that require scrutiny on a per-packet basis typically demand immense computing power and very high I/O throughput. These applications face extreme performance and scalability challenges.
2. Our Solution: GPU-Based Traffic Monitoring and Analysis Tools
At Fermilab, we have prototyped a network traffic monitoring and analysis system using GPUs.
3. Why Choose GPU?
Characteristics of packet-based network monitoring & analysis applications:
• Time constraints on packet processing.
• Compute- and I/O-throughput-intensive.
• High levels of data parallelism.
• Extremely poor temporal locality of data.
Requirements on computing platform for high performance network monitoring & analysis applications:
• High compute power
• Ample memory bandwidth
• Ability to handle the data parallelism inherent in network data
• Easy programmability
Three types of computing platforms available:
• NPU/ASIC
• CPU
• GPU
The GPU is well suited for network monitoring and analysis in high-speed networks:
Features                        NPU/ASIC   CPU   GPU
High compute power              Varies     ✖     ✔
High memory bandwidth           Varies     ✖     ✔
Easy programmability            ✖          ✔     ✔
Data-parallel execution model   ✖          ✖     ✔
Architecture Comparison
[Figure: System architecture. (1) Traffic Capture: network packets arriving at the NICs are captured into packet buffers in host memory. (2) Preprocessing: captured data is assembled into packet chunks and copied into the GPU domain. (3) Monitoring & Analysis: monitoring and analysis kernels process the packets on the GPU. (4) Output Display: results are returned to user space for display.]
Four types of Logical Entities:
(1) Traffic Capture (2) Preprocessing (3) Monitoring & Analysis (4) Output Display
Key Mechanisms:
• Partial packet capture approach: the GPU has a relatively small memory, so only packet headers, rather than entire packets, are copied into the GPU domain.
• A new packet I/O engine: uses commodity NICs to capture network traffic into the CPU domain without packet drops.
• A GPU-accelerated library for network monitoring and analysis, consisting of dozens of CUDA kernels that can be combined in multiple ways for the intended tasks.
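As a concrete illustration of the partial-capture idea, the sketch below copies only the first few bytes of each packet into a fixed-stride header buffer destined for the GPU. This is a minimal sequential C sketch; the function name, the `SNAPLEN` value, and the buffer layout are assumptions for illustration, not the system's actual code.

```c
#include <stdint.h>
#include <string.h>

/* Assumed snapshot length: Ethernet (14) + IPv4 (20) + TCP (20) headers. */
#define SNAPLEN 54

/* Partial packet capture: copy only the first SNAPLEN bytes of each packet
 * into a contiguous, fixed-stride header buffer for transfer to the GPU.
 * The original wire length is recorded so analysis kernels can still reason
 * about packet sizes. Returns the number of bytes to transfer. */
static size_t snapshot_headers(const uint8_t *const *pkts, const size_t *lens,
                               size_t npkts, uint8_t *hdr_buf,
                               uint16_t *orig_len)
{
    size_t off = 0;
    for (size_t i = 0; i < npkts; i++) {
        size_t n = lens[i] < SNAPLEN ? lens[i] : SNAPLEN;
        memcpy(hdr_buf + off, pkts[i], n);   /* truncate the payload     */
        orig_len[i] = (uint16_t)lens[i];     /* remember the wire length */
        off += SNAPLEN;                      /* fixed stride on the GPU  */
    }
    return off;
}
```

The fixed stride keeps GPU memory accesses regular: thread `i` finds its header at offset `i * SNAPLEN`.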
4. System Architecture
Packet I/O Engine
[Figure: Incoming packets are received by the NIC into pre-allocated packet buffer chunks attached to the receive descriptor ring in the OS kernel. Captured chunks are memory-mapped into user space for processing, then recycled to the pool of free packet buffer chunks.]
Key techniques:
• Pre-allocated large packet buffers
• Packet-level batch processing
• Memory-mapping based zero-copy
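A minimal sketch of the pre-allocation and recycling idea, assuming a simple chunk pool (all names and sizes here are illustrative; the real engine attaches chunks to the NIC descriptor ring and hands them to user space via mmap):

```c
#include <stddef.h>

/* Illustrative sizes; the real engine pre-allocates large buffers. */
#define CHUNK_BYTES 2048
#define NCHUNKS 8

/* A chunk holds a batch of packets: the NIC driver fills it, user space
 * processes the whole batch (zero-copy via mmap in the real engine),
 * and the chunk is then recycled rather than freed. */
struct chunk { unsigned char data[CHUNK_BYTES]; size_t used; };

static struct chunk pool[NCHUNKS];   /* allocated once at startup */
static int free_stack[NCHUNKS];
static int nfree;

static void pool_init(void) {
    nfree = NCHUNKS;
    for (int i = 0; i < NCHUNKS; i++)
        free_stack[i] = i;
}

/* Attach a free chunk to the receive ring (NULL if the pool is empty). */
static struct chunk *chunk_get(void) {
    return nfree > 0 ? &pool[free_stack[--nfree]] : NULL;
}

/* Return a processed chunk to the free pool for reuse. */
static void chunk_recycle(struct chunk *c) {
    c->used = 0;
    free_stack[nfree++] = (int)(c - pool);
}
```

Because chunks are allocated once and recycled, the capture path performs no per-packet allocation, which is one reason the engine can sustain capture without drops.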
Packet-Filtering Kernel
[Figure: The packet-filtering kernel runs in three data-parallel steps over a batch of packets raw_pkts[] = [p1 … p8]:
(1) Filtering: each packet is tested against the filter, producing filtering_buf[] = [1, 0, 1, 1, 0, 0, 1, 0] (1 = match).
(2) Scan: an exclusive prefix sum over filtering_buf[] yields scan_buf[] = [0, 1, 1, 2, 3, 3, 3, 4], the output index of each matching packet.
(3) Compact: matching packets are gathered into filtered_pkts[] = [p1, p3, p4, p7].]
Advanced packet-filtering capabilities at wire speed are necessary so that only the packets of interest are analyzed.
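The filtering, scan, and compact steps of the packet-filtering kernel can be sketched sequentially in C as follows. This is a simplified illustration, not the actual CUDA code: packets are reduced to their IP protocol numbers, and the names `filter_udp`, `exclusive_scan`, and `compact` are hypothetical.

```c
/* Sequential sketch of the three steps; in the actual system each step is
 * a data-parallel CUDA kernel with (roughly) one thread per packet. */

/* Step 1: Filtering — flag each packet that matches the filter ("udp"). */
static void filter_udp(const int *proto, int n, int *flags) {
    for (int i = 0; i < n; i++)
        flags[i] = (proto[i] == 17);        /* 17 = IPPROTO_UDP */
}

/* Step 2: Scan — exclusive prefix sum assigns each match an output slot. */
static void exclusive_scan(const int *flags, int n, int *scan) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        scan[i] = sum;
        sum += flags[i];
    }
}

/* Step 3: Compact — gather matching packets into a dense output array;
 * returns the number of packets that passed the filter. */
static int compact(const int *pkts, const int *flags, const int *scan,
                   int n, int *out) {
    for (int i = 0; i < n; i++)
        if (flags[i])
            out[scan[i]] = pkts[i];
    return n > 0 ? scan[n - 1] + flags[n - 1] : 0;
}
```

The scan step is what makes the compaction data-parallel on the GPU: each matching packet learns its output index independently, so all matches can be written concurrently without contention.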
5. Initial Results
• The GPU can significantly speed up packet processing. Compared to a single CPU core, the GPU's speedup ratios range from 8.82 to 17.04; compared to a 6-core CPU, they range from 1.54 to 3.20.
GPU Packet-filtering Algorithm Evaluation
[Figure: Throughput (unit: million packets/s) for two BPF filters, "udp" (left) and "net 131.225.107 & tcp" (right), on Data-set-1 through Data-set-4. Configurations compared: mmap-gpu, standard-gpu, and single- and multi-core CPU implementations (s-cpu / m-cpu) at 1.6, 2.0, and 2.4 GHz.]