
Post on 22-Dec-2015


Gnort: High Performance Intrusion Detection Using Graphics Processors

Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos, Sotiris Ioannidis

Institute of Computer Science, Foundation for Research and Technology Hellas

General Idea

• How to speed up the processing throughput of intrusion detection systems by offloading the pattern matching operations to the GPU.

2Giorgos Vasiliadis ICS-FORTH

Introduction

• The problem
– Network Intrusion Detection Systems (NIDS) rely on string matching to detect and prevent well-known attacks
– String matching accounts for up to 75% of the total CPU processing time
• String matching algorithms
– Aho-Corasick
• Specialized hardware devices (NP, FPGAs, ASICs)
– Complex to modify and program
– Poor flexibility
• Graphics cards
– Easy to program
– Powerful and ubiquitous
– Researchers have begun exploring ways to tap their power for non-graphics applications

Why use the GPU?

• The GPU is specialized for compute-intensive, highly parallel computation


NVIDIA GeForce SIMD Architecture

• Many Multiprocessors
• Each Multiprocessor contains many Stream Processors
• Memory model
– Shared On-Chip Memory
  • 1 cycle
– Constant Memory
  • 400-600 cycles; 1 cycle if cached
– Texture Memory
  • 400-600 cycles; 1 cycle if cached
– Global Device Memory
  • 400-600 cycles

The GPU can be used as a general-purpose processor, capable of executing many threads in parallel

The Aho-Corasick Algorithm

• Used in most modern NIDSes
– Scans for multiple patterns simultaneously
• Preprocess all patterns to build a state machine
• The state machine is used to scan for multiple patterns simultaneously in linear time
– Complexity is independent of the number of patterns

Example: P={he, she, his, hers}
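The example pattern set above can be run through a small Aho-Corasick implementation. The following is a minimal Python sketch, not the Snort or Gnort code; the function names `build_automaton` and `search` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for the given patterns."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # patterns recognized on entering each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return sorted (start_offset, pattern) pairs for all matches in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return sorted(matches)


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

The input is scanned once regardless of how many patterns were loaded, which is the property the slide refers to.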


Mapping Aho-Corasick on GPU

• How to represent the State Machine?
• Snort represents each state as an array of pointers
– It is difficult to map them onto the GPU memory
– Transform to a 2D array
– Can easily bind to Texture Memory
  • Texture fetches are cached
  • Aho-Corasick exhibits strong locality of reference
  • Random access memory read

Using Texture Memory improves GPU execution time by about 19%
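One way to picture the 2D-array representation is a dense table with one row per state and one column per byte value, where failure transitions are folded into the table ahead of time. The sketch below is a CPU-side Python illustration under that assumption; the names `build_state_table` and `match_end_offsets` are hypothetical, and the binding of the table to Texture Memory is not shown.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value


def build_state_table(patterns):
    """Return (table, accepting); table[s][b] is the next state for byte b."""
    table = [[0] * ALPHABET]   # row 0 is the root; 0 also marks "no edge yet"
    accepting = [False]

    # Insert the patterns as a trie stored directly in the table.
    for pat in patterns:
        state = 0
        for b in pat:
            if table[state][b] == 0:
                table.append([0] * ALPHABET)
                accepting.append(False)
                table[state][b] = len(table) - 1
            state = table[state][b]
        accepting[state] = True

    # Breadth-first pass: fill missing entries from the failure state's row,
    # so scanning needs exactly one lookup per input byte.
    fail = [0] * len(table)
    queue = deque(s for s in table[0] if s)
    while queue:
        state = queue.popleft()
        accepting[state] = accepting[state] or accepting[fail[state]]
        for b in range(ALPHABET):
            nxt = table[state][b]
            if nxt:
                fail[nxt] = table[fail[state]][b]
                queue.append(nxt)
            else:
                table[state][b] = table[fail[state]][b]
    return table, accepting


def match_end_offsets(payload, table, accepting):
    """Return the offsets at which at least one pattern ends."""
    state = 0
    ends = []
    for i, b in enumerate(payload):
        state = table[state][b]   # single array lookup per byte
        if accepting[state]:
            ends.append(i)
    return ends


table, accepting = build_state_table([b"he", b"she", b"his", b"hers"])
print(match_end_offsets(b"ushers", table, accepting))
```

The inner loop touches only the row of the current state, and consecutive bytes tend to revisit a small set of rows, which is consistent with the locality the slide attributes to Aho-Corasick.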


Parallelizing Packet Searching (1/2)

• Assigning a Single Packet to each Multiprocessor
– Each packet is copied to the shared memory of the Multiprocessor
– Stream Processors search different parts of the packet concurrently
– Overlapping computation
  • Matching patterns may span consecutive chunks of the packet
– Same amount of work per Stream Processor
  • Stream Processors will be synchronized
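The overlap requirement can be illustrated with a sequential Python sketch. It assumes each worker scans its own chunk plus the next `max_pattern_length - 1` bytes, and reports only matches that start inside its own chunk. The names `find_matches` and `chunked_search` are hypothetical, the workers run one after another here, and `bytes.find` stands in for the Aho-Corasick scan.

```python
def find_matches(data, patterns):
    """Return the set of (offset, pattern) pairs found in data."""
    found = set()
    for pat in patterns:
        pos = data.find(pat)
        while pos != -1:
            found.add((pos, pat))
            pos = data.find(pat, pos + 1)
    return found


def chunked_search(packet, patterns, num_workers):
    """Search a packet in fixed-size chunks that overlap into the next chunk."""
    overlap = max(len(p) for p in patterns) - 1
    chunk = -(-len(packet) // num_workers)   # ceiling division
    found = set()
    for w in range(num_workers):             # each iteration models one worker
        start = w * chunk
        window = packet[start:start + chunk + overlap]
        for pos, pat in find_matches(window, patterns):
            if pos < chunk:                  # match starts in this worker's chunk
                found.add((start + pos, pat))
    return found


patterns = [b"he", b"she", b"his", b"hers"]
packet = b"xxhersxxshexx"
print(sorted(chunked_search(packet, patterns, 4)))
```

Every worker scans a window of the same size (except at the end of the packet), which matches the "same amount of work" point, at the cost of scanning the overlapping bytes twice.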


Parallelizing Packet Searching (2/2)

• Assigning a Single Packet to each Stream Processor
– Each packet is processed by a different Stream Processor
– No overlapping computation
– Different amount of work per Stream Processor
  • Stream Processors of the same Multiprocessor will have to wait until all have finished
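The cost of waiting can be estimated with a toy model. The Python sketch below assumes, purely for illustration, that scanning one byte takes one step and that a group of Stream Processors is occupied until its longest packet is done; `lockstep_utilization` and `group_size` are hypothetical names, not measurements or parameters from the talk.

```python
def lockstep_utilization(packet_lengths, group_size):
    """Fraction of occupied processor steps that do useful work.

    Toy model: one byte costs one step, and every processor in a group
    stays occupied until the longest packet in that group is finished.
    """
    useful = sum(packet_lengths)
    occupied = 0
    for i in range(0, len(packet_lengths), group_size):
        group = packet_lengths[i:i + group_size]
        occupied += max(group) * len(group)
    return useful / occupied if occupied else 1.0


print(lockstep_utilization([800, 800, 800, 800], 4))    # equal-sized packets
print(lockstep_utilization([100, 1500, 100, 1500], 4))  # mixed packet sizes
```

Under this model, equal-sized packets keep every processor busy, while a mix of short and long packets leaves the processors holding short packets idle for part of the time.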


Software Mapping

• Packets are transferred to the GPU in batches
– Performs much better than transferring each packet separately
– Packets are stored in a buffer that is copied to the GPU when it gets full
• Use page-locked memory to store the packets
– Higher transfer throughput from host to device
– Copies are performed using DMA, without occupying the CPU
• CPU and GPU execution can overlap
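The batching scheme can be sketched on the host side as follows. This is a minimal Python illustration, assuming a fixed-capacity buffer that is handed to a `flush` callback once the next packet no longer fits; the class `PacketBatcher` and the callback are hypothetical, and the callback merely stands in for the host-to-device copy.

```python
class PacketBatcher:
    """Accumulate packet payloads in one buffer and flush them as a batch."""

    def __init__(self, capacity, flush):
        self.buffer = bytearray(capacity)  # stands in for the page-locked buffer
        self.capacity = capacity
        self.flush = flush                 # called with (data, offsets) per batch
        self.used = 0
        self.offsets = []                  # (start, length) of each packet

    def add(self, payload):
        if len(payload) > self.capacity:
            raise ValueError("payload larger than batch buffer")
        if self.used + len(payload) > self.capacity:
            self.drain()                   # buffer is full: ship the batch first
        end = self.used + len(payload)
        self.buffer[self.used:end] = payload
        self.offsets.append((self.used, len(payload)))
        self.used = end

    def drain(self):
        """Flush whatever is buffered, then reset the buffer."""
        if self.offsets:
            self.flush(bytes(self.buffer[:self.used]), list(self.offsets))
        self.used = 0
        self.offsets.clear()


batches = []
batcher = PacketBatcher(8, lambda data, offsets: batches.append((data, offsets)))
for payload in (b"abc", b"defg", b"hi"):
    batcher.add(payload)
batcher.drain()
print(batches)
```

Keeping the per-packet offsets alongside the data lets the receiving side locate each packet inside the batch without a separate transfer per packet.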

Evaluation (1/2)

• Scalability as a function of the number of patterns
• We ran Snort using randomly generated patterns
• All patterns are matched against every packet
• Payload trace contained 800-byte UDP packets with random payloads

Throughput remains constant as the number of patterns increases
2.4x faster than the CPU

Evaluation (2/2)

• Throughput as a function of the packet size
• Ran Snort using 1000 random patterns
• All patterns are matched against every packet

2.3 Gbit/s for full packets
3.2x faster compared to the CPU
The two GPU implementations do not differ significantly in performance

Evaluation with real input and rules

• Experimental setup
– Two PCs connected via a 1 Gbit/s Ethernet switch
• To directly compare with prior work [Jacob et al], we re-implemented the Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms on the GPU.

Evaluation with real input and rules

• Snort loaded about 8000 patterns
• Preprocessors and PCRE were disabled

Original Snort (AC) cannot process all packets at rates higher than 300 Mbit/s
GPU-assisted Snort (AC1, AC2) begins to lose packets at 600 Mbit/s, a 2x improvement
The KMP and BM algorithms from [Jacob et al] perform worse in all cases

Conclusion

• Graphics cards can be used effectively to speed up Network Intrusion Detection Systems
– Low-cost
– Easy programming
• Future work includes
– Transfer the packets directly from the NIC to the GPU
– Utilize multiple GPUs on multi-slot motherboards

Thank you

Any questions?

[email protected]
