Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Haiyang Jiang, Gaogang Xie, Kave Salamatian and Laurent Mathy


Post on 01-Feb-2016


Page 1: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Haiyang Jiang, Gaogang Xie, Kave Salamatian and Laurent Mathy

Page 2: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Outline: Background & Motivation, Our Approach, Evaluation, Conclusion

04/22/23


Page 3: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Signature-based NIDS (the de facto standard)

Deep Packet Inspection (DPI) is a crucial component of NIDS
DPI consumes 70%-80% of the processing time


Page 4: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Due to the increase in traffic and ruleset size

Link rate   Cycles available per byte (2.5 GHz CPU)
1 Gbps      20
10 Gbps     2
40 Gbps     0.5

Traffic ↑, Ruleset ↑
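
The cycle budget above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the figures are counted per byte of traffic on a single 2.5 GHz core (that reading reproduces 20, 2 and 0.5 exactly); the function name is illustrative and not from the slides.

```python
def cycles_per_byte(cpu_hz: float, link_bps: float) -> float:
    """CPU cycles available per byte of traffic at line rate on one core."""
    bytes_per_second = link_bps / 8
    return cpu_hz / bytes_per_second


CPU_HZ = 2.5e9  # 2.5 GHz, as stated on the slide

for gbps in (1, 10, 40):
    budget = cycles_per_byte(CPU_HZ, gbps * 1e9)
    print(f"{gbps} Gbps: {budget:g} cycles per byte")
```

At 40 Gbps the budget drops below one cycle per byte, which is why a single core cannot keep up.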


Page 5: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Beyond single-core processors, due to powerful parallelism

The Mother of All CPU Charts 2005/2006, Bert Töpelt, Daniel Schuhmann, Frank Völkel, Tom's Hardware Guide, Nov. 2005.


Page 6: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Many-core processor-based NIDS
Higher flexibility and lower cost
But lower performance than other solutions

[Figure: performance versus flexibility and cost. Software designs: flexible, cheap. Hardware designs: inflexible, expensive, unscalable.]

Underlying platform   Performance  Flexibility  Price
TCAM                  High         Low          High
FPGA                  High         Low          High
GPU                   High         Medium       Medium
Many-core processor   Low          High         Low
Network processor     High         Medium       Medium


Page 7: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Two kinds of parallel models for NIDS: data parallelism

Advantages: thread isolation

Disadvantages: memory consumption, reference locality

[Figure: data parallelism, with packets scattered across parallel IDS instances]


Page 8: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Two kinds of parallel models for NIDS: function parallelism

Advantages: fine-grained, reference locality

Disadvantages: stage contention, message transfer among stages

[Figure: functional parallelism, with scatter and gather steps around the stages]


Page 9: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Communication, contention, bottleneck

Coherence, cooperation and communications

[Figure: contention bottleneck on shared state]


Page 10: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Dozens of cores (TILERAGX with 36 cores)

Hardware acceleration modules
mPIPE: packet capture engine
User Dynamic Network (UDN): on-chip communication among cores

[Figure: Example many-core processor (TILERAGX 36). Chip level: mPIPE, four 10 GE ports, two memory controllers. Tile architecture: processor, L1 cache, L2 cache, cache controller, and a switch attached to the SDN, IDN, MDN, CDN, TDN and UDN networks.]


Page 11: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Goal: high-performance, flexible, scalable, inexpensive

Two schemes: hybrid parallel scheme, hybrid load balancing scheme

[Figure: performance versus flexibility. Software designs: flexible, inexpensive. Hardware designs: inflexible, expensive, unscalable. Goal: flexible, high performance, inexpensive, scalable.]


Page 12: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Combination of two models
Data parallel among Packet Processing Modules (PPMs)
Function parallel within each PPM


[Figure: hybrid parallel architecture. mPIPE feeds several Packet Processing Modules. Inside each module, a Packet Capture module, Protocol Processing threads and Detection Engine threads are connected by MSG queues, and each thread has its own private variables. Public variables shared in the system: Message (MSG) pool, raw packets, Multi-Pattern Matching Engine reference.]

Page 13: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Shared resource among PPMs: the Message (MSG) pool



Page 14: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Due to the lock on the MSG pool:
Exploit mPIPE to access the MSG pool in parallel
Each packet has an individual MSG structure

[Figure: packet descriptors in mPIPE (slots holding pkt addresses) and the MSG pool shared among all the modules, with one MSG per descriptor slot. Packet Processing Modules interact with mPIPE through Capture, Get and Release operations.]


The lock on the MSG pool is eliminated because each raw packet has its own corresponding MSG.
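
The idea of one MSG per packet descriptor can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, the `Msg` fields and the slot count are hypothetical, and the sketch assumes that the capture engine hands each descriptor slot to exactly one thread at a time.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Per-packet metadata passed between stages (fields are illustrative)."""
    pkt_address: int = 0
    in_use: bool = False


class MsgPool:
    """MSG pool with one preallocated MSG per packet descriptor slot.

    The descriptor index alone identifies the MSG, so threads never
    search or modify a shared free list and no lock is required.
    """

    def __init__(self, num_descriptors: int) -> None:
        self._msgs = [Msg() for _ in range(num_descriptors)]

    def get(self, descriptor_index: int, pkt_address: int) -> Msg:
        msg = self._msgs[descriptor_index]
        msg.pkt_address = pkt_address
        msg.in_use = True
        return msg

    def release(self, descriptor_index: int) -> None:
        self._msgs[descriptor_index].in_use = False


pool = MsgPool(num_descriptors=10)
msg = pool.get(descriptor_index=7, pkt_address=0x1000)
pool.release(7)
```

The design choice is to trade memory (one MSG per descriptor, allocated up front) for the removal of a shared allocator that every module would otherwise contend on.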

Page 15: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Due to MSG propagation among stages:
Exploit UDN to transfer MSGs
Higher bandwidth and lower latency

Mechanism                  Bandwidth  Latency
UDN                        60 Tbps    (1 + core_hop) cycles
Shared-memory-based queue  170 Gbps   L1 hit: 2 cycles; L2 hit: 11 cycles; remote L2 hit: 40 cycles; main memory: 80 cycles
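
The latency figures above can be compared with a small model. This is a rough sketch under stated assumptions: `core_hop` is taken to be the Manhattan distance between tiles, and the 36 tiles are assumed to form a 6 x 6 mesh; neither detail is given on the slide, and the function name is illustrative.

```python
def udn_latency_cycles(src: tuple, dst: tuple) -> int:
    """Latency of one UDN transfer, using the (1 + core_hop) formula.

    core_hop is modeled as the Manhattan distance between the two tiles
    (an assumption; the slide only gives the formula).
    """
    core_hop = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return 1 + core_hop


# Access latencies of the shared-memory-based queue, taken from the table.
SHARED_MEMORY_CYCLES = {
    "L1 hit": 2,
    "L2 hit": 11,
    "remote L2 hit": 40,
    "main memory": 80,
}

# Worst case on the assumed 6 x 6 mesh: opposite corners are 10 hops apart.
worst_case = udn_latency_cycles((0, 0), (5, 5))
print(worst_case, SHARED_MEMORY_CYCLES["remote L2 hit"])
```

Under these assumptions, even a corner-to-corner UDN transfer costs no more than a local L2 hit, and far less than a remote L2 hit or a main memory access.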


Page 16: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

First level: PPMs. Flow-based hashing for load balancing in mPIPE

Second level: Protocol Processing threads. Flow-based hashing for load balancing in the pipeline

Third level: Detection Engine threads. Rule partition balancing (RPB)
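
The first two levels can be sketched in software as follows. This is only an illustration: on the real chip the first level is done by the mPIPE hardware, whose hash function is not described here. CRC32, the direction-independent key, and the counts `NUM_PPMS` and `THREADS_PER_PPM` are assumptions chosen for the example.

```python
import zlib

NUM_PPMS = 4          # assumed number of Packet Processing Modules
THREADS_PER_PPM = 2   # assumed Protocol Processing threads per PPM


def flow_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int, proto: int) -> bytes:
    """Build a direction-independent key so both directions of a flow match."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()


def dispatch(key: bytes) -> tuple:
    """Return (ppm_index, thread_index) for a flow.

    The two levels use different parts of the hash value; reusing the
    same low bits for both would send every flow of a PPM to one thread.
    """
    h = zlib.crc32(key)
    ppm = h % NUM_PPMS
    thread = (h // NUM_PPMS) % THREADS_PER_PPM
    return ppm, thread


forward = dispatch(flow_key("10.0.0.1", "10.0.0.2", 1234, 80, 6))
reverse = dispatch(flow_key("10.0.0.2", "10.0.0.1", 80, 1234, 6))
print(forward == reverse)
```

Because the hash depends only on the flow, all packets of a connection reach the same thread, which keeps per-flow state private to that thread.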



Page 17: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Each engine works on a sub-ruleset
Offline partition
Small detection engine

Packet skipping: if one engine finds any intrusion in a packet, the other engines can skip over it.

See the details in our paper
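
The two ideas, sub-rulesets and packet skipping, can be illustrated with a toy sketch. It is not the RPB algorithm from the paper: the round-robin split, the substring matching and all names are assumptions made for the example, and the engines run one after another here instead of in parallel threads.

```python
def partition_rules(rules: list, num_engines: int) -> list:
    """Offline step: split the ruleset into one sub-ruleset per engine.

    Round-robin is used here for simplicity; a real partition would
    balance the matching cost of the sub-rulesets.
    """
    return [rules[i::num_engines] for i in range(num_engines)]


class Packet:
    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self.alerted = False  # set by the first engine that finds an intrusion


def run_engine(sub_ruleset: list, packet: Packet) -> bool:
    """Return True if this engine scanned the packet, False if it skipped it."""
    if packet.alerted:
        return False  # packet skipping: another engine already raised an alert
    if any(signature.encode() in packet.payload for signature in sub_ruleset):
        packet.alerted = True
    return True


rules = ["cmd.exe", "/etc/passwd", "union select", "<script>"]
engines = partition_rules(rules, num_engines=2)
pkt = Packet(b"GET /?q=union select 1")
scanned = [run_engine(sub_ruleset, pkt) for sub_ruleset in engines]
print(scanned)
```

In this run the first engine matches "union select" and marks the packet, so the second engine skips it without scanning.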


Page 18: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

1.5 Mpps with 9 cores: 1 Packet Capture thread, 2 Protocol Processing threads, 6 Detection Engine threads


Page 19: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Outline: Background & Motivation, Our Approach, Evaluation, Conclusion


Page 20: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

TILERAGX36 processor: 36 cores at 1.2 GHz

Suricata (open-source NIDS) implementation

Snort ruleset: 7571 rules

Synthetic traffic generator


Page 21: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

7.2 Gbps (100-byte packets)


Page 22: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors


Page 23: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

17.40 Mbps/$: 8 times higher than MIDeA, 3 times higher than Kargus


Name             Throughput (Gbps)  Processor cost ($)  Throughput per dollar (Mbps/$)
MIDeA            3.2                1138                2.8
Kargus           19.0               3164                6.0
Proposed design  11.0               650                 17.4
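
The last column is throughput divided by processor cost. A short sketch of that arithmetic, using only the rounded table entries (the function and variable names are illustrative):

```python
systems = {
    "MIDeA": (3.2, 1138),
    "Kargus": (19.0, 3164),
    "Proposed design": (11.0, 650),
}


def mbps_per_dollar(throughput_gbps: float, cost_dollars: float) -> float:
    """Throughput per dollar in Mbps/$ (1 Gbps = 1000 Mbps)."""
    return throughput_gbps * 1000 / cost_dollars


for name, (gbps, cost) in systems.items():
    print(f"{name}: {mbps_per_dollar(gbps, cost):.1f} Mbps/$")
```

With the rounded entries this gives about 2.8, 6.0 and 16.9 Mbps/$; the small gap to the reported 17.4 for the proposed design presumably comes from rounding of the throughput figure in the table.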

Page 24: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Two parallel designs: hybrid parallel scheme, hybrid load balancing scheme

NIDS evaluation on TILERAGX 36: high throughput per dollar


Page 25: Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Thank you!
