
Para-Snort: A Multi-thread Snort on Multi-Core IA Platform

Tsinghua University

PDCS 2009

November 3, 2009

Xinming Chen, Yiyao Wu, Lianghong Xu, Yibo Xue and Jun Li

2

Outline

Introduction of NIDS on IA
Some previous work
Structure of our system, what's different?
Detailed module design
Breaking the bottlenecks
Para-Snort Performance
Conclusions

3

NIDS on IA platform

A NIDS (Network Intrusion Detection System) inspects both the header and the payload of packets to identify intrusions.

Why on an IA platform? Low price, easy to develop, and flexible in structure and ruleset.

But not as fast as ASICs or FPGAs!

4

The structure of NIDS

Snort, by Sourcefire Inc., is the most popular open-source NIDS on the IA platform. The Preprocess and Detect stages cost the most computation power.

[Pipeline: Data acquisition → Decoder → Preprocess → Detect → Output]

5

Way to speed up?

Multi-core IA platforms lead the trend of rising processor computation power, but exploiting them requires a parallel software structure, which is rarely leveraged in existing NIDS.

Two previous works: Supra-linear and MultiSnort

6

Supra-linear Packet Processing

Intel Corp., 2006. A single data acquisition component; every other component is duplicated per thread; no memory sharing.

[Diagram: packet capture → packet classification hash → per-thread pipelines (packet decoder, preprocessors, detection engine, output plug-in) for Thread 1 through Thread 4]

7

MultiSnort

Derek L. Schuff, Purdue University. Uses memory sharing, but does not have a clean-cut modular structure.

[Diagram: packet capture → minimal decode → queue assignment → distributed task queues → per-thread full decode, preprocessors, and detection engine over shared data → output module]

8

Our design – ParaSnort

Based on SnortSP 3.0, a new and different branch of Snort
Modular design, with multifunction processing modules
Memory sharing
Optimization of the core algorithms
Sufficient speedup

[Diagram: Data Source Module → Load Balance Module → per-core Packet Queues → Processing Modules over Shared Data → Output Module, across Core 1 through Core 8]

9

Detailed module design

Data Source: data acquisition and decoder.

Load Balance: dispatches traffic and enables multi-staged processing.

Processing Module: each one is a single thread containing the preprocessors and detection engine; it is easy to develop functions other than intrusion detection, such as antivirus or URL filtering. A sketch of one processing module follows below.

Output Module: generates alerts.
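A minimal sketch of one processing module, under my own assumptions (the pthread queue, struct names, and placeholder preprocess/detect stages are illustrative, not the actual SnortSP/Para-Snort code): one thread drains its private packet queue, runs the stages, and raises an alert on a hit.

```c
/* Sketch only: one Para-Snort-style processing module as a pthread.
 * Struct and function names are my assumptions, not the real API.      */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_CAP 1024

struct packet { unsigned char data[64]; unsigned len; };

/* Bounded, mutex-protected packet queue: one per processing module,
 * filled by the load balancer thread.                                   */
struct pkt_queue {
    struct packet  *slots[QUEUE_CAP];
    int             head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
};

static void queue_push(struct pkt_queue *q, struct packet *p)
{
    pthread_mutex_lock(&q->lock);
    q->slots[q->tail] = p;                 /* sketch: assume never full  */
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static struct packet *queue_pop(struct pkt_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    struct packet *p = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return p;                              /* NULL packet == shut down   */
}

/* Placeholders standing in for the real preprocessors/detection engine. */
static int preprocess(struct packet *p) { (void)p; return 0; }
static int detect(struct packet *p)     { return memchr(p->data, '!', p->len) != NULL; }

struct proc_module { int id; struct pkt_queue queue; };

/* One processing module == one thread draining its own queue. */
static void *processing_thread(void *arg)
{
    struct proc_module *m = arg;
    struct packet *p;

    while ((p = queue_pop(&m->queue)) != NULL) {
        if (preprocess(p) == 0 && detect(p))
            printf("module %d: alert on packet of %u bytes\n", m->id, p->len);
        free(p);
    }
    return NULL;
}

int main(void)
{
    struct proc_module m = { .id = 0 };
    pthread_mutex_init(&m.queue.lock, NULL);
    pthread_cond_init(&m.queue.not_empty, NULL);

    pthread_t tid;
    pthread_create(&tid, NULL, processing_thread, &m);

    /* Load-balancer side: hand one packet over, then a NULL sentinel.   */
    struct packet *p = calloc(1, sizeof(*p));
    p->len = (unsigned)snprintf((char *)p->data, sizeof(p->data), "evil payload!");
    queue_push(&m.queue, p);
    queue_push(&m.queue, NULL);

    pthread_join(tid, NULL);
    return 0;
}
```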

10

Optimize Load Balancing

SnortSP 3.0 provides an IP hash algorithm, which is not well balanced when there are few flows. Three improved methods (the hash and M-JSQ dispatch are sketched below):

5-tuple hash: hash over (src IP, dst IP, src port, dst port, protocol) to select a Processing Module ID
Join the Shortest Queue (JSQ)
Modified JSQ (M-JSQ): reassign a flow when it has been silent for a long time
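A minimal sketch of two of these dispatch policies (names, table sizes, and the silence threshold are my own assumptions, not Para-Snort's code): the 5-tuple hash fixes a flow's queue by its tuple alone, while M-JSQ keeps an active flow on its queue but re-binds it to the currently shortest queue once it has been silent for a long time.

```c
/* Sketch only: 5-tuple hash dispatch and a modified Join-the-Shortest-
 * Queue (M-JSQ).  Constants and names are illustrative assumptions.    */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NUM_QUEUES     8
#define FLOW_TABLE_SZ  65536
#define SILENCE_SEC    30          /* assumed silence threshold          */

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct flow_entry {
    struct five_tuple key;
    int    queue_id;               /* processing module the flow is bound to */
    time_t last_seen;
    int    valid;
};

static struct flow_entry flow_table[FLOW_TABLE_SZ];
static int queue_len[NUM_QUEUES];  /* current depth of each packet queue */

/* Simple mixing hash over the 5-tuple. */
static uint32_t hash_5tuple(const struct five_tuple *t)
{
    uint32_t h = t->src_ip ^ (t->dst_ip * 2654435761u) ^
                 ((uint32_t)t->src_port << 16) ^ t->dst_port ^ t->proto;
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

/* Policy 1: plain 5-tuple hash -- the queue is fixed by the flow alone. */
static int dispatch_hash(const struct five_tuple *t)
{
    return (int)(hash_5tuple(t) % NUM_QUEUES);
}

static int same_tuple(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Policy 2: M-JSQ -- keep an active flow on its queue (so per-flow state
 * stays on one core), but if the flow is new or has been silent longer
 * than SILENCE_SEC, bind it to the shortest queue right now.            */
static int dispatch_mjsq(const struct five_tuple *t, time_t now)
{
    struct flow_entry *e = &flow_table[hash_5tuple(t) % FLOW_TABLE_SZ];

    if (!e->valid || !same_tuple(&e->key, t) || now - e->last_seen > SILENCE_SEC) {
        int best = 0;
        for (int q = 1; q < NUM_QUEUES; q++)
            if (queue_len[q] < queue_len[best])
                best = q;
        e->key      = *t;
        e->queue_id = best;
        e->valid    = 1;
    }
    e->last_seen = now;
    return e->queue_id;
}

int main(void)
{
    struct five_tuple t = { 0x0a000001u, 0x0a000002u, 1234, 80, 6 };
    printf("hash dispatch  -> queue %d\n", dispatch_hash(&t));
    printf("M-JSQ dispatch -> queue %d\n", dispatch_mjsq(&t, time(NULL)));
    return 0;
}
```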

11

Optimize Multi-pattern Matching

SnortSP 3.0 provides the AC (Aho-Corasick) algorithm. AC is fast, and when there are few matches the cache locality is high; but when the traffic triggers many matches, the cache locality turns bad. We introduced AC-WM to reduce the size of the state machines compiled from the ruleset. While it costs much less memory, AC-WM is a bit slower than AC on ordinary traffic, so users can decide which to use according to their network environment.
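To illustrate where the state-machine size comes from, here is a minimal full-matrix Aho-Corasick matcher (my own sketch, not SnortSP's AC or AC-WM implementation): every state stores a complete 256-entry next-state row, so memory grows with the number of states compiled from the ruleset, and a large automaton stops fitting in cache.

```c
/* Sketch only: full-matrix Aho-Corasick.  Memory is roughly
 * num_states * 256 * sizeof(int), which is why big rulesets blow up.   */
#include <stdio.h>
#include <string.h>

#define MAX_STATES 1024            /* enough for the toy patterns below  */
#define ALPHABET   256

static int go[MAX_STATES][ALPHABET];   /* next-state table               */
static int fail[MAX_STATES];           /* failure links                  */
static int out[MAX_STATES];            /* bitmask of patterns ending here */
static int num_states = 1;             /* state 0 is the root            */

/* Insert one pattern into the trie. */
static void ac_add(const char *pat, int id)
{
    int s = 0;
    for (const unsigned char *p = (const unsigned char *)pat; *p; p++) {
        if (go[s][*p] == 0)
            go[s][*p] = num_states++;
        s = go[s][*p];
    }
    out[s] |= 1 << id;
}

/* Build failure links with a BFS and flatten them into the full matrix,
 * so the search loop never has to chase failure pointers.              */
static void ac_build(void)
{
    int queue[MAX_STATES], head = 0, tail = 0;

    for (int c = 0; c < ALPHABET; c++)
        if (go[0][c])
            queue[tail++] = go[0][c];      /* depth-1 states fail to root */

    while (head < tail) {
        int s = queue[head++];
        for (int c = 0; c < ALPHABET; c++) {
            int t = go[s][c];
            if (t) {
                fail[t] = go[fail[s]][c];
                out[t] |= out[fail[t]];
                queue[tail++] = t;
            } else {
                go[s][c] = go[fail[s]][c];
            }
        }
    }
}

/* Scan a payload; report every pattern match by bitmask and offset. */
static void ac_search(const unsigned char *buf, size_t len)
{
    int s = 0;
    for (size_t i = 0; i < len; i++) {
        s = go[s][buf[i]];
        if (out[s])
            printf("match mask 0x%x at offset %zu\n", out[s], i);
    }
}

int main(void)
{
    const char *patterns[] = { "cmd.exe", "/etc/passwd", "select" };
    for (int i = 0; i < 3; i++)
        ac_add(patterns[i], i);
    ac_build();

    const char payload[] = "GET /etc/passwd HTTP/1.0";
    ac_search((const unsigned char *)payload, sizeof(payload) - 1);
    return 0;
}
```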

12

Para-Snort Performance

13

The Setup

For tcpdump traces: a testing machine replays traces with TCPreplay into eth0 of the NIDS platform running Para-Snort.

For real traffic: testing clients and testing servers exchange traffic through the NIDS platform, which forwards between eth0 and eth1 while Para-Snort inspects it.

NIDS platform: two quad-core Xeon E5335 CPUs at 2.00 GHz, 4 GB DRAM, Ubuntu 8.04, Linux kernel version 2.6.27.

14

15

Performance of 400~800Mbps

[Chart: processing speed (Mbps) vs. number of processing engine threads (1 to 7), reaching roughly 400~800 Mbps, for the LL1, LL2, CERNET, and http traces]

16

Speedup of 4~7, almost linear for LL

[Chart: speedup vs. number of processing engine threads (1 to 7) for the LL1, LL2, CERNET, and http traces]

17

Performance of Different Load Balancers

18

Performance of Different Pattern Matching

19

Performance Summary

Good speedup, up to 7; performance up to 800 Mbps

M-JSQ is the fastest load balancer; AC-WM costs less memory, but is slower

20

Conclusions

The multi-thread design fully utilizes multi-core CPUs.

Modular design with multifunction processing modules makes it easy to add new modules.

Solves the issues in load balancing and multi-pattern matching.

Can act as a NIPS if an inline data source module is added.

21

Questions

Thank You
