Scalable High-Performance Parallel Design for NIDS on Many-Core Processors

Haiyang Jiang, Gaogang Xie, Kave Salamatian and Laurent Mathy

Outline: Background & Motivation, Our Approach, Evaluation, Conclusion

04/22/23


Signature-based NIDS (the de facto standard)

Deep Packet Inspection (DPI) is a crucial component of a NIDS: it consumes 70%-80% of the processing time.


Traffic volume and rulesets keep growing (Traffic ↑, Ruleset ↑), so the time available to inspect each byte keeps shrinking.

Cycle budget per byte of traffic for a 2.5 GHz CPU:

Link rate | Cycles per byte
1 Gbps | 20
10 Gbps | 2
40 Gbps | 0.5
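The budgets in this table follow from dividing the clock rate by the byte rate: at 1 Gbps a link delivers 125 MB/s, so a 2.5 GHz core has 2.5×10⁹ / 1.25×10⁸ = 20 cycles per byte. A minimal Python sketch of that arithmetic (the function names are illustrative, and framing overhead such as the Ethernet preamble is ignored):

```python
def cycles_per_byte(cpu_hz: float, link_bps: float) -> float:
    """Cycles one core can spend on each byte while keeping up with the link."""
    bytes_per_second = link_bps / 8
    return cpu_hz / bytes_per_second


def cycles_per_packet(cpu_hz: float, link_bps: float, packet_bytes: int) -> float:
    """Per-packet budget; ignores Ethernet preamble and inter-frame gap."""
    return cycles_per_byte(cpu_hz, link_bps) * packet_bytes


if __name__ == "__main__":
    for gbps in (1, 10, 40):
        budget = cycles_per_byte(2.5e9, gbps * 1e9)
        print(f"{gbps:>2} Gbps: {budget:g} cycles/byte")
```

At 10 Gbps, for example, a 100-byte packet gets roughly 200 cycles on a single 2.5 GHz core.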


Moving beyond single-core processors to exploit powerful parallelism

The Mother of All CPU Charts 2005/2006, Bert Töpelt, Daniel Schuhmann, Frank Völkel, Tom's Hardware Guide, Nov. 2005.


Many-core processor-based NIDS: higher flexibility and lower cost, but lower performance than other solutions

(Figure: performance vs. flexibility & cost. Software designs: flexible, cheap. Hardware designs: inflexible, expensive, unscalable.)

Underlying platform | Performance | Flexibility | Price
TCAM | High | Low | High
FPGA | High | Low | High
GPU | High | Medium | Medium
Many-core processor | Low | High | Low
Network processor | High | Medium | Medium


Two kinds of parallel models for NIDS: data parallelism

Advantages: thread isolation

Disadvantages: memory consumption, poor reference locality

(Figure: data parallelism. Incoming traffic is scattered across several complete IDS instances.)
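A minimal Python sketch of the data-parallel model, assuming a toy substring matcher in place of a real multi-pattern engine and round-robin scattering for brevity (a real deployment would keep each flow on one instance). RULESET, IDSInstance and data_parallel are hypothetical names. Each worker owns a complete IDS instance, so workers share nothing and need no locks, but the rule tables are replicated once per worker:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy signatures standing in for a real ruleset.
RULESET = [b"attack", b"/etc/passwd", b"cmd.exe"]


class IDSInstance:
    """One complete IDS; every instance holds its own rule table."""

    def __init__(self, ruleset):
        # A private table per instance: isolation at the cost of memory
        # (a real NIDS would build a full matching automaton here).
        self.rules = list(ruleset)
        self.alerts = []

    def inspect(self, pkt_id, payload):
        for rule in self.rules:
            if rule in payload:
                self.alerts.append((pkt_id, rule))


def data_parallel(packets, n_instances):
    instances = [IDSInstance(RULESET) for _ in range(n_instances)]
    # Scatter: packet i goes to instance i mod n (round robin).
    shards = [[] for _ in range(n_instances)]
    for pkt_id, payload in enumerate(packets):
        shards[pkt_id % n_instances].append((pkt_id, payload))

    def run(idx):
        # Each thread touches only its own instance, so no locks are needed.
        for pkt_id, payload in shards[idx]:
            instances[idx].inspect(pkt_id, payload)

    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        list(pool.map(run, range(n_instances)))
    return instances
```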


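Two kinds of parallel models for NIDS: function parallelism

Advantages: fine-grained parallelism, good reference locality

Disadvantages: contention between stages, message transfer among stages

(Figure: functional parallelism. Traffic is scattered into a pipeline of processing stages and the results are gathered at the end.)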
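The function-parallel model can be sketched as a two-stage pipeline connected by queues. This is an illustrative Python sketch, not the NIDS's actual code: stripping a hypothetical 4-byte header stands in for protocol processing and a substring test stands in for the detection engine. Every packet crosses a stage boundary, which is where the message-transfer cost comes from:

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker passed down the pipeline
RULESET = [b"attack", b"cmd.exe"]  # hypothetical toy signatures


def protocol_stage(inq, outq):
    """Stage 1: 'decode' each packet (here: strip a toy 4-byte header)."""
    while (msg := inq.get()) is not SENTINEL:
        pkt_id, raw = msg
        outq.put((pkt_id, raw[4:]))  # message transfer to the next stage
    outq.put(SENTINEL)


def detection_stage(inq, alerts):
    """Stage 2: match the decoded payload against the ruleset."""
    while (msg := inq.get()) is not SENTINEL:
        pkt_id, payload = msg
        alerts.extend((pkt_id, r) for r in RULESET if r in payload)


def run_pipeline(packets):
    q1, q2, alerts = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=protocol_stage, args=(q1, q2)),
        threading.Thread(target=detection_stage, args=(q2, alerts)),
    ]
    for t in stages:
        t.start()
    for pkt_id, raw in enumerate(packets):  # scatter into the first stage
        q1.put((pkt_id, raw))
    q1.put(SENTINEL)
    for t in stages:
        t.join()  # gather
    return alerts
```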


Communication contention bottleneck

Coherence, cooperation and communication over shared state create a contention bottleneck.


Dozens of cores (e.g., the Tilera TILE-Gx36 with 36 cores)

Hardware acceleration modules: mPIPE, a packet capture engine; User Dynamic Network (UDN), for on-chip communication among cores

(Figure: example many-core processor, the Tilera TILE-Gx36. The chip integrates mPIPE, two memory controllers and four 10 GE interfaces; each tile contains a processor, L1 and L2 caches, a cache controller and a switch onto the on-chip networks SDN, IDN, MDN, CDN, TDN and UDN.)


Goal: a NIDS that is high-performance, flexible, scalable and inexpensive

Two schemes: a hybrid parallel scheme and a hybrid load balancing scheme

(Figure: performance vs. flexibility. Software designs: flexible, inexpensive. Hardware designs: inflexible, expensive, unscalable. Proposed design: flexible, high performance, inexpensive, scalable.)


Hybrid parallel scheme: a combination of the two models, with data parallelism among Packet Processing Modules (PPMs) and function parallelism within each PPM
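A rough Python sketch of how the two models combine, under the assumption that a software dispatcher stands in for mPIPE and that a CRC32 of a flow key replaces its hardware hash. Each hypothetical PPM object is a private two-stage pipeline (lower-casing is a toy stand-in for protocol normalization); packets of the same flow always reach the same PPM, while the read-only RULESET is shared by reference:

```python
import queue
import threading
import zlib

RULESET = [b"attack", b"cmd.exe"]  # hypothetical shared, read-only ruleset
STOP = None


class PPM:
    """One Packet Processing Module: a private protocol -> detection pipeline."""

    def __init__(self):
        self.capture_q = queue.Queue()  # fed by the dispatcher (mPIPE's role)
        self.detect_q = queue.Queue()   # MSG queue between the two stages
        self.alerts = []                # private variables: nothing shared
        self.threads = [
            threading.Thread(target=self._protocol),
            threading.Thread(target=self._detect),
        ]

    def _protocol(self):
        while (msg := self.capture_q.get()) is not STOP:
            flow, payload = msg
            self.detect_q.put((flow, payload.lower()))  # toy normalization
        self.detect_q.put(STOP)

    def _detect(self):
        while (msg := self.detect_q.get()) is not STOP:
            flow, payload = msg
            self.alerts.extend((flow, r) for r in RULESET if r in payload)


def run_hybrid(packets, n_ppms):
    """packets: iterable of (flow_key: bytes, payload: bytes)."""
    ppms = [PPM() for _ in range(n_ppms)]
    for p in ppms:
        for t in p.threads:
            t.start()
    for flow, payload in packets:
        # Data parallelism across PPMs: all packets of a flow go to one PPM.
        ppms[zlib.crc32(flow) % n_ppms].capture_q.put((flow, payload))
    for p in ppms:
        p.capture_q.put(STOP)
        for t in p.threads:
            t.join()
    return ppms
```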


(Figure: PPM architecture. mPIPE feeds several Packet Processing Modules; inside each PPM, a Packet Capture thread, Protocol Processing threads and Detection Engine threads are chained by MSG queues, and every thread keeps its own private variables.)

Public variables shared in the system: the message (MSG) pool, the raw packets and the multi-pattern matching engine (accessed by reference)

Shared resource among PPMs: the message (MSG) pool


Because the shared MSG pool is protected by a lock, exploit mPIPE to access the MSG pool in parallel: each packet has an individual MSG structure.

(Figure: packet descriptors in mPIPE each hold a packet address and map one-to-one onto MSG structures in the MSG pool, which is shared among all the modules. Packet Processing Modules get a packet on capture and release it when processing ends.)


The MSG pool lock is eliminated because each raw packet has its own corresponding MSG.
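The lock can be dropped because the mapping from packet to MSG is fixed: whichever thread owns a packet buffer implicitly owns the matching MSG slot. A minimal Python sketch under that assumption, where buffer indices stand in for mPIPE packet descriptors and Msg, MsgPool, run_workers and the field names are all hypothetical:

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Per-packet metadata passed between stages (fields are illustrative)."""
    pkt_addr: int = 0
    length: int = 0
    flow_hash: int = 0
    verdicts: list = field(default_factory=list)


class MsgPool:
    """MSG pool with exactly one MSG slot per packet buffer.

    The packet engine hands each buffer to one thread at a time, so the
    thread that owns a packet also owns its MSG and no lock is required.
    """

    def __init__(self, n_buffers, buf_size):
        self.buf_size = buf_size
        self.msgs = [Msg() for _ in range(n_buffers)]

    def get(self, buffer_index, length, flow_hash):
        msg = self.msgs[buffer_index]  # fixed 1:1 mapping: no free list, no lock
        msg.pkt_addr = buffer_index * self.buf_size
        msg.length = length
        msg.flow_hash = flow_hash
        msg.verdicts.clear()
        return msg

    def release(self, msg):
        msg.length = 0  # the slot is reused when its buffer is recycled


def run_workers(pool, n_threads):
    """Threads work on disjoint buffer indices, so their MSGs never collide."""
    def work(first):
        for idx in range(first, len(pool.msgs), n_threads):
            msg = pool.get(idx, length=64, flow_hash=idx)
            msg.verdicts.append("clean")
            pool.release(msg)

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A conventional pool would instead pop free MSGs from a shared free list, which every capture thread must lock.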

Because MSGs must propagate among pipeline stages, exploit UDN to transfer MSGs, which gives higher bandwidth and lower latency:

Mechanism | Bandwidth | Latency
UDN | 60 Tbps | (1 + core_hop) cycles
Shared-memory-based queue | 170 Gbps | L1 hit: 2 cycles; L2 hit: 11 cycles; remote L2 hit: 40 cycles; main memory: 80 cycles
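To see why UDN latency stays small, assume the 36 tiles form a 6 × 6 mesh and that core_hop is the Manhattan distance between tiles under dimension-ordered routing (both are assumptions, not stated above). The farthest pair of tiles is then 10 hops apart, so a UDN transfer costs at most 11 cycles, compared with the 40-cycle remote L2 hit a shared-memory queue can incur when producer and consumer sit on different tiles. A small Python model (names are illustrative):

```python
# Shared-memory latencies from the comparison above, in cycles.
SHARED_MEM_LATENCY = {"l1": 2, "l2": 11, "remote_l2": 40, "dram": 80}

MESH_WIDTH = 6  # assumption: the 36 tiles are laid out as a 6 x 6 mesh


def tile_xy(tile_id):
    """Map a tile number to (x, y) coordinates, row by row."""
    return tile_id % MESH_WIDTH, tile_id // MESH_WIDTH


def core_hops(src, dst):
    """Hop count under dimension-ordered routing: the Manhattan distance."""
    (x0, y0), (x1, y1) = tile_xy(src), tile_xy(dst)
    return abs(x0 - x1) + abs(y0 - y1)


def udn_latency(src, dst):
    """UDN latency model from the table: (1 + core_hop) cycles."""
    return 1 + core_hops(src, dst)
```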


First level (among PPMs): flow-based hashing for load balancing in mPIPE

Second level (among protocol processing threads): flow-based hashing for load balancing in the pipeline

Third level (among detection engine threads): rule partition balancing (RPB)
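Flow-based hashing at the first two levels can be sketched as follows. This is an illustration under stated assumptions, not mPIPE's hardware hash: it sorts the two endpoints so both directions of a connection produce the same key, then uses CRC32 to pick a PPM and, from different bits of the same hash, a protocol processing thread inside it (flow_key and two_level_dispatch are hypothetical names):

```python
import ipaddress
import zlib


def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent 5-tuple key: both directions of a connection
    serialize to the same bytes, so they hash to the same worker."""
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    (ip_lo, port_lo), (ip_hi, port_hi) = sorted((a, b))
    return (ip_lo + port_lo.to_bytes(2, "big")
            + ip_hi + port_hi.to_bytes(2, "big") + bytes([proto]))


def two_level_dispatch(pkt, n_ppms, n_proto_threads):
    """Level 1 picks a PPM; level 2 picks a protocol thread inside that PPM."""
    h = zlib.crc32(flow_key(*pkt))
    ppm = h % n_ppms
    # Use the remaining hash bits so flows on one PPM still spread across its threads.
    thread = (h // n_ppms) % n_proto_threads
    return ppm, thread
```

Taking the second index from h // n_ppms rather than h % n_proto_threads matters: with 4 PPMs and 2 threads each, h % 2 is fixed by h % 4, so every flow reaching a given PPM would land on the same thread.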


Rule partition balancing: each detection engine works on a sub-ruleset

Offline partition: the ruleset is split ahead of time, so each detection engine stays small

Packet skipping: if one engine finds an intrusion in a packet, the other engines can skip over it.

See the details in our paper.
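Rule partition balancing with packet skipping can be sketched like this. The greedy, length-weighted partition is an assumed stand-in for the paper's offline partitioning, and the shared flag list is a simplified form of packet skipping; partition_rules, engine and run_engines are hypothetical names, and substring tests stand in for a multi-pattern matcher:

```python
import threading


def partition_rules(rules, k):
    """Offline partition: give each pattern, longest first, to the sub-ruleset
    with the smallest total pattern length so far."""
    parts = [[] for _ in range(k)]
    load = [0] * k
    for rule in sorted(rules, key=len, reverse=True):
        i = load.index(min(load))
        parts[i].append(rule)
        load[i] += len(rule)
    return parts


def engine(sub_rules, packets, detected, alerts, scanned):
    """One detection engine scanning every packet against its own sub-ruleset."""
    for pkt_id, payload in enumerate(packets):
        if detected[pkt_id]:
            continue  # packet skipping: another engine already flagged it
        scanned.append(pkt_id)
        for rule in sub_rules:
            if rule in payload:
                detected[pkt_id] = True  # visible to the other engines
                alerts.append((pkt_id, rule))
                break


def run_engines(rules, packets, k):
    detected = [False] * len(packets)  # shared per-packet flags
    alerts = [[] for _ in range(k)]
    scanned = [[] for _ in range(k)]
    threads = [
        threading.Thread(target=engine,
                         args=(sub, packets, detected, alerts[i], scanned[i]))
        for i, sub in enumerate(partition_rules(rules, k))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return alerts, scanned
```

The flag is only a hint: if two engines reach the same packet concurrently, both may scan it, which costs duplicate work but never leaves a malicious packet unflagged.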


1.5 Mpps with 9 cores: 1 packet capture thread, 2 protocol processing threads and 6 detection engine threads



Tilera TILE-Gx36 processor: 36 cores at 1.2 GHz

Implementation based on Suricata (an open-source NIDS)

Snort ruleset: 7,571 rules

Synthetic traffic generator


Throughput: 7.2 Gbps with 100-byte packets
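A bit rate converts to a packet rate by dividing by the packet size in bits. Assuming the 7.2 Gbps figure counts packet bytes only (no Ethernet preamble or inter-frame gap), 100-byte packets correspond to 7.2×10⁹ / 800 = 9 million packets per second. A small Python helper (function names are illustrative):

```python
def gbps_to_mpps(gbps, packet_bytes):
    """Packet rate for a bit rate, counting packet bytes only
    (no Ethernet preamble or inter-frame gap)."""
    return gbps * 1e9 / (packet_bytes * 8) / 1e6


def mpps_to_gbps(mpps, packet_bytes):
    """Inverse conversion: packets per second back to bits per second."""
    return mpps * 1e6 * packet_bytes * 8 / 1e9
```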


Throughput per dollar: 17.40 Mbps/$, 8 times higher than MIDeA and 3 times higher than Kargus


Name | Throughput (Gbps) | Processor cost ($) | Throughput per dollar (Mbps/$)
MIDeA | 3.2 | 1138 | 2.8
Kargus | 19.0 | 3164 | 6.0
Proposed design | 11.0 | 650 | 17.4

Two parallel designs: a hybrid parallel scheme and a hybrid load balancing scheme

NIDS evaluation on the Tilera TILE-Gx36: high throughput per dollar


Thank you!
