Scalable High-Performance Parallel Design for NIDS on Many-Core Processors
Haiyang Jiang, Gaogang Xie, Kave Salamatian and Laurent Mathy
Outline: Background & Motivation; Our Approach; Evaluation; Conclusion
04/22/23
Signature-based NIDS is the de facto standard.
Deep Packet Inspection (DPI) is a crucial component of NIDS and consumes 70%-80% of the processing time.
The per-byte CPU budget shrinks as traffic and rulesets grow.
CPU clock: 2.5 GHz

Link rate    Cycles available per byte of traffic
1 Gbps       20
10 Gbps      2
40 Gbps      0.5

Both traffic volume and ruleset size keep increasing.
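The cycle budget above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the budget is simply the clock rate divided by the byte arrival rate at line speed; under that assumption the figures 20, 2, and 0.5 come out as cycles per byte. The function name `cycles_per_byte` is illustrative and not from the slides.

```python
def cycles_per_byte(clock_hz: float, link_bps: float) -> float:
    """Clock cycles available for each byte arriving at line rate."""
    bytes_per_second = link_bps / 8
    return clock_hz / bytes_per_second


CLOCK_HZ = 2.5e9  # 2.5 GHz CPU, as stated on the slide

for gbps in (1, 10, 40):
    budget = cycles_per_byte(CLOCK_HZ, gbps * 1e9)
    print(f"{gbps:>2} Gbps: {budget:g} cycles per byte")
```

Below one cycle per byte, a single core cannot even touch every byte of the stream, which is the motivation for spreading the work across many cores.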
Moving beyond single-core processors, motivated by the powerful parallelism of multi-core and many-core designs.
The Mother of All CPU Charts 2005/2006, Bert Töpelt, Daniel Schuhmann, Frank Völkel, Tom's Hardware Guide, Nov. 2005.
Many-core processor-based NIDS: higher flexibility and lower cost, but lower performance than other solutions.

Software designs: flexible, cheap.
Hardware designs: inflexible, expensive, unscalable.

Underlying platform    Performance   Flexibility   Price
TCAM                   High          Low           High
FPGA                   High          Low           High
GPU                    High          Medium        Medium
Many-core processor    Low           High          Low
Network processor      High          Medium        Medium
Two kinds of parallel models for NIDS (1 of 2): data parallelism
Advantages: thread isolation
Disadvantages: memory consumption, poor reference locality
[Figure: data parallelism. A scatter step distributes packets to several complete IDS instances.]
Two kinds of parallel models for NIDS (2 of 2): function parallelism
Advantages: fine-grained, good reference locality
Disadvantages: contention between stages, message transfer among stages
[Figure: functional parallelism. Packets are scattered across pipeline stages and gathered at the end.]
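The two models can be contrasted in a small sketch. This is an illustration only, assuming a toy two-step pipeline (`decode` then `detect`) with a one-word "signature"; none of these names come from the slides, and Python threads stand in for cores.

```python
import queue
import threading


def decode(pkt: bytes) -> str:
    """Toy protocol-processing step."""
    return pkt.decode("ascii", errors="replace")


def detect(text: str) -> bool:
    """Toy detection step with a single hypothetical signature."""
    return "attack" in text


def data_parallel(packets, workers=2):
    """Data parallelism: every worker runs the whole pipeline on its own share."""
    results = [None] * len(packets)

    def run(worker_id):
        # Each worker touches a disjoint set of indices, so no locking is needed.
        for i in range(worker_id, len(packets), workers):
            results[i] = detect(decode(packets[i]))

    threads = [threading.Thread(target=run, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def function_parallel(packets):
    """Function parallelism: one thread per stage, linked by a message queue."""
    handoff = queue.Queue()
    results = []

    def decode_stage():
        for pkt in packets:
            handoff.put(decode(pkt))
        handoff.put(None)  # sentinel: no more packets

    def detect_stage():
        while True:
            text = handoff.get()
            if text is None:
                break
            results.append(detect(text))

    threads = [threading.Thread(target=decode_stage),
               threading.Thread(target=detect_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


packets = [b"hello", b"an attack here", b"ok"]
print(data_parallel(packets))
print(function_parallel(packets))
```

The data-parallel version needs no communication between workers but each worker carries the full pipeline state; the function-parallel version keeps each stage's working set small but pays for every hand-off through the queue.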
Communication contention bottleneck
Coherence, cooperation and communications all go through shared state.
[Figure: contention bottleneck at the shared state.]
Dozens of cores (Tilera TILE-Gx with 36 cores)
Accelerated hardware modules:
mPIPE: packet capturing engine
User Dynamic Network (UDN): on-chip communication network among cores
[Figure: example many-core processor (TILE-Gx36). Chip level: mPIPE, two memory controllers, four 10 GE ports. Tile architecture: processor, L1 cache, L2 cache, cache controller, and a switch attached to the SDN, IDN, MDN, CDN, TDN and UDN networks.]
Goal: high-performance, flexible, scalable, inexpensive
Two schemes: hybrid parallel scheme; hybrid load balancing scheme
[Figure: performance versus flexibility. Software designs: flexible, inexpensive. Hardware designs: inflexible, expensive, unscalable. Goal: flexible, high performance, inexpensive, scalable.]
Hybrid parallel scheme: a combination of the two models
Data parallelism among Packet Processing Modules (PPMs)
Function parallelism within each PPM
[Figure: hybrid parallel architecture. mPIPE feeds several Packet Processing Modules. Each module contains a Packet Capture module, Protocol Processing threads and Detection Engine threads, linked by MSG queues; every thread keeps its own private variables. Public variables shared across the system: the Message (MSG) pool, raw packets, and the multi-pattern matching engine reference.]
Shared resource among PPMs: the Message (MSG) pool
Problem: the lock on the MSG pool
Solution: exploit mPIPE to access the MSG pool in parallel; each packet has an individual MSG structure
[Figure: packet descriptors in mPIPE, each holding a pkt address, next to the MSG pool shared among all the modules. Packet Processing Modules interact with them through Capture, Get and Release operations.]
The lock on the MSG pool is eliminated because each raw packet has its own corresponding MSG.
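The lock-free idea can be sketched as a preallocated table with one MSG slot per packet buffer. This is an assumption-laden illustration: the slides do not show the real data layout, and the names `Msg`, `MsgPool`, `get` and `release` as well as the fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Per-packet message; the fields are illustrative, not from the slides."""
    pkt_index: int
    flow_id: int = 0
    alerts: list = field(default_factory=list)


class MsgPool:
    """One preallocated MSG per packet buffer, addressed by buffer index.

    There is no free list and therefore no allocator lock: whichever thread
    currently owns packet buffer i is the only user of slot i.
    """

    def __init__(self, n_buffers: int):
        self.slots = [Msg(pkt_index=i) for i in range(n_buffers)]

    def get(self, pkt_index: int) -> Msg:
        # Constant-time lookup; no search and no synchronization.
        return self.slots[pkt_index]

    def release(self, pkt_index: int) -> None:
        # Reset the slot so the next packet using this buffer starts clean.
        msg = self.slots[pkt_index]
        msg.flow_id = 0
        msg.alerts.clear()


pool = MsgPool(n_buffers=8)
msg = pool.get(3)
msg.alerts.append("example alert")
pool.release(3)
```

Because ownership of the MSG follows ownership of the packet buffer, the capture hardware's buffer management implicitly serializes access to each slot.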
Problem: MSG propagation among stages
Solution: exploit UDN to transfer MSGs, with higher bandwidth and lower latency

Channel                     Bandwidth   Latency
UDN                         60 Tbps     (1 + core_hop) cycles
Shared-memory-based queue   170 Gbps    L1 hit: 2 cycles; L2 hit: 11 cycles; remote L2 hit: 40 cycles; main memory: 80 cycles
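The latency comparison can be made concrete with a short sketch. It assumes the 36 tiles form a 6 x 6 mesh and that `core_hop` is the Manhattan distance between two tiles; both are assumptions for illustration, and the tile numbering and function names are hypothetical.

```python
GRID = 6  # assumption: 36 tiles arranged as a 6 x 6 mesh

# Shared-memory queue latencies, in cycles, as listed in the table above.
SHARED_MEMORY_CYCLES = {
    "L1 hit": 2,
    "L2 hit": 11,
    "Remote L2 hit": 40,
    "Main memory": 80,
}


def tile_coords(tile_id: int) -> tuple:
    """Row and column of a tile under row-major numbering (hypothetical)."""
    return divmod(tile_id, GRID)


def udn_latency_cycles(src_tile: int, dst_tile: int) -> int:
    """UDN latency: (1 + core_hop) cycles."""
    (r1, c1), (r2, c2) = tile_coords(src_tile), tile_coords(dst_tile)
    core_hop = abs(r1 - r2) + abs(c1 - c2)  # assumption: Manhattan distance
    return 1 + core_hop


worst_case = max(udn_latency_cycles(0, t) for t in range(GRID * GRID))
print("UDN, adjacent tiles:", udn_latency_cycles(0, 1), "cycles")
print("UDN, corner to corner:", worst_case, "cycles")
for level, cycles in SHARED_MEMORY_CYCLES.items():
    print(f"Shared memory, {level}: {cycles} cycles")
```

Under these assumptions even the longest UDN path costs no more than a local L2 hit, and far less than a remote L2 hit, which is the case a queue shared between two cores tends to hit.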
Hybrid load balancing scheme, three levels:
First level (PPMs): flow-based hashing for load balancing in mPIPE
Second level (protocol processing threads): flow-based hashing for load balancing in the pipeline
Third level (detection engine threads): rule partition balancing (RPB)
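The first two levels can be sketched as one flow hash used twice. This is a minimal illustration, not the mPIPE hashing logic: CRC32 as the hash, the symmetric ordering of endpoints, the count of 4 PPMs, and the function names are all assumptions.

```python
import zlib


def flow_hash(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              proto: int) -> int:
    """Direction-independent hash of a 5-tuple (CRC32 is a stand-in)."""
    # Sorting the endpoints makes both directions of a connection hash alike.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key)


def place(flow: tuple, n_ppm: int = 4, n_pp_threads: int = 2) -> tuple:
    """Pick a PPM (level 1) and a protocol processing thread (level 2)."""
    h = flow_hash(*flow)
    ppm = h % n_ppm
    # Use different bits for the second level. Reusing h % n_pp_threads would
    # correlate with the first choice and could leave threads idle.
    thread = (h // n_ppm) % n_pp_threads
    return ppm, thread


forward = ("10.0.0.1", 40000, "192.0.2.7", 80, 6)
reverse = ("192.0.2.7", 80, "10.0.0.1", 40000, 6)
print(place(forward), place(reverse))
```

Keeping every packet of a flow on the same thread means per-flow state such as stream reassembly buffers can stay in private variables and never needs a lock.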
Rule partition balancing (RPB): each engine works on a sub-ruleset
Offline partition
Small detection engines
Packet skipping: if one engine finds an intrusion in a packet, the other engines can skip that packet.
See the details in our paper.
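Packet skipping can be sketched as follows. The partitioning step here is a plain round-robin placeholder, not the RPB algorithm from the paper, and the rule strings, class and function names are hypothetical. The engines are run one after another for clarity; in a real system they would run concurrently, so the skip would be best effort.

```python
def partition_rules(rules, n_engines):
    """Round-robin placeholder for the offline partition step (not RPB)."""
    return [rules[i::n_engines] for i in range(n_engines)]


class Msg:
    """Per-packet message shared by all detection engines."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.matched_rule = None  # set by the first engine that finds a match


def run_engine(sub_ruleset, msg) -> bool:
    """Scan msg against one sub-ruleset; return False if the packet was skipped."""
    if msg.matched_rule is not None:
        return False  # another engine already flagged this packet
    for rule in sub_ruleset:
        if rule in msg.payload:
            msg.matched_rule = rule
            break
    return True


rules = [b"evil", b"cmd.exe", b"/etc/passwd", b"DROP TABLE"]
engines = partition_rules(rules, n_engines=2)

msg = Msg(b"GET /etc/passwd HTTP/1.1")
scanned = [run_engine(sub, msg) for sub in engines]
print(scanned, msg.matched_rule)
```

Each engine only needs the automaton for its own sub-ruleset, which is what keeps the individual detection engines small.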
1.5 Mpps with 9 cores: 1 Packet Capture thread, 2 Protocol Processing threads, 6 Detection Engine threads
Outline: Background & Motivation; Our Approach; Evaluation; Conclusion
Evaluation setup:
Processor: Tilera TILE-Gx36, 36 cores at 1.2 GHz
Implementation: based on Suricata (open-source NIDS)
Ruleset: Snort ruleset, 7571 rules
Traffic: synthetic traffic generator
Throughput: 7.2 Gbps with 100-byte packets
Throughput per dollar: 17.40 Mbps/$, 8 times larger than MIDeA and 3 times larger than Kargus
Name              Throughput (Gbps)   Processor cost ($)   Throughput per dollar (Mbps/$)
MIDeA             3.2                 1138                 2.8
Kargus            19.0                3164                 6.0
Proposed design   11.0                650                  17.4
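The throughput-per-dollar column is throughput divided by processor cost. The sketch below recomputes it from the throughput and cost figures as listed, assuming 1 Gbps = 1000 Mbps. Recomputing from these rounded inputs gives roughly 16.9 Mbps/$ for the proposed design, slightly below the listed 17.4; the slides do not state which unrounded inputs produced that figure.

```python
# (throughput in Gbps, processor cost in dollars), copied from the table above.
DESIGNS = {
    "MIDeA": (3.2, 1138),
    "Kargus": (19.0, 3164),
    "Proposed design": (11.0, 650),
}


def mbps_per_dollar(throughput_gbps: float, cost_dollars: float) -> float:
    """Throughput per dollar, in Mbps/$."""
    return throughput_gbps * 1000 / cost_dollars


for name, (gbps, cost) in DESIGNS.items():
    print(f"{name}: {mbps_per_dollar(gbps, cost):.1f} Mbps/$")
```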
Conclusion:
Two parallel designs: hybrid parallel scheme; hybrid load balancing scheme
NIDS evaluation on the TILE-Gx36: high throughput per dollar of processor cost
Thank you!