Post on 26-Mar-2020

Building High-Performance Networked Systems with Innovative Hardware and Software Techniques

Kai Zhang University of Science and Technology of China


The Growth Trend of Business Data


The volume of business data worldwide is expected to double every 2 years

Source: Oracle


The Growth Trend of Network Speed

Source: IEEE 802.3 Higher Speed Study Group - Tutorial

Solutions for Next Generation Networked Systems

NETWORKED SYSTEM
• Network Intrusion Detection System
• Firewall
• IPSec Gateway
• Load Balancer
• Router
• Key-Value Store
• … …

OPERATING SYSTEM (Linux, FreeBSD)
• Socket
• TCP/IP Stack
• NIC Driver
• New drivers that bypass the OS: DPDK, Netmap, …

HARDWARE
• CPUs
• GPUs, CPUs with Integrated GPUs, Xeon Phi: How to utilize?

Solutions for Next Generation Networked Systems

• Demonstration of Two Networked Systems
• Mega-KV
  • A key-value store system with the highest throughput
  • 100x higher throughput than Memcached
• Snort with DPDK
  • Enhance the efficiency of network I/O of Snort with DPDK
  • A cooperation between USTC and Intel for educational purposes
  • To be a course lab for Advanced Computer Networks


Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores


Key-Value Stores

๏ A simple but effective method to manage data where a data record (or a value) is stored and retrieved with its associated key
  • variable type and length of record (value)
  • simple or no schema
  • easy software development for many applications
๏ Key-value stores have been widely deployed in production data processing systems

Simple and Easy Interfaces of Key-Value Stores

• set (key, value)
• value = get (key)

[Figure: a client sends a GET with key john_age to the key-value store; the index over variable-length keys and values locates the record, and the value 38 is returned.]
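The two-call interface above can be sketched in a few lines of Python. This is only an illustration: the class name `KeyValueStore` is hypothetical, and a plain dictionary stands in for the index, which is an assumption and not how Mega-KV stores data.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store exposing only set and get."""

    def __init__(self) -> None:
        # The dict plays the role of the index: it maps a key to its value.
        self._index: Dict[bytes, bytes] = {}

    def set(self, key: bytes, value: bytes) -> None:
        # Keys and values are arbitrary-length byte strings.
        self._index[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # Returns None when the key has never been stored.
        return self._index.get(key)


store = KeyValueStore()
store.set(b"john_age", b"38")
```

Calling `store.get(b"john_age")` then mirrors the GET exchange shown in the slide's figure.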

Key-Value Stores: Examples

System                      Keys               Values
Amazon                      Customer ID        Customer profile (e.g., credit card, buying history)
Facebook, Twitter           User ID            User profile (e.g., friends, photos, posts)
iCloud/iTunes               Movie/song name    Movie, song
Distributed file systems    Block ID           Block


Workflow of a Typical In-Memory Key-Value Store

Network Processing

Memory Management

Index Operations

Access Value


Workflow of a Typical In-Memory Key-Value Store

Queries: GET, SET, DELETE

• Network Processing: TCP/IP Processing, Query Parsing
• Memory Management: Allocate (memory not full) or Evict (memory full)
• Index Operations: Search in Index, Insert into Index, Delete from Index
• Access Value: Read & Send Value
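As a rough illustration of how the four stages fit together, the sketch below handles GET, SET, and DELETE in one toy class. The text protocol, the `TinyStore` name, the byte-count capacity, and the least-recently-used eviction policy are all assumptions made for the example. So is the pairing of stages with operations (SET triggers allocation or eviction, and an evicted item is also deleted from the index), which the slide's figure suggests but does not spell out.

```python
from collections import OrderedDict
from typing import Optional


class TinyStore:
    """Toy store that walks through the four workflow stages."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Index: key -> value, kept in least-recently-used order.
        self.index: "OrderedDict[bytes, bytes]" = OrderedDict()

    def handle(self, query: bytes) -> Optional[bytes]:
        # Network processing: parse "OP key [value]" from the request bytes.
        parts = query.split(b" ", 2)
        op, key = parts[0], parts[1]
        if op == b"GET":
            return self._get(key)
        if op == b"SET":
            self._set(key, parts[2])
            return b"STORED"
        if op == b"DELETE":
            self._delete(key)
            return b"DELETED"
        raise ValueError("unknown operation")

    def _get(self, key: bytes) -> Optional[bytes]:
        # Index operation: search. Access value: read and return it.
        if key not in self.index:
            return None
        self.index.move_to_end(key)  # mark as most recently used
        return self.index[key]

    def _set(self, key: bytes, value: bytes) -> None:
        size = len(key) + len(value)
        if size > self.capacity_bytes:
            raise ValueError("item larger than the whole store")
        self._delete(key)  # drop any previous version of this key
        # Memory management: evict while memory is full, then allocate.
        while self.used_bytes + size > self.capacity_bytes:
            old_key, old_value = self.index.popitem(last=False)
            self.used_bytes -= len(old_key) + len(old_value)
        # Index operation: insert.
        self.index[key] = value
        self.used_bytes += size

    def _delete(self, key: bytes) -> None:
        # Index operation: delete, and release the item's memory.
        value = self.index.pop(key, None)
        if value is not None:
            self.used_bytes -= len(key) + len(value)
```

With a 10-byte capacity, storing a third 5-byte item evicts whichever of the first two was used least recently.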

Where does time go in KV-Store MICA [NSDI’14]

[Figure: stacked bars of execution time percentage (0 to 1) for four data sets, broken down into Access Value, Index Operations, and Network Processing (w/ DPDK) & Memory Management.]

Four Data Sets

1) 128B Key 1024B Value
2) 32B Key 512B Value
3) 16B Key 64B Value
4) 8B Key 8B Value

Index operation becomes one of the major bottlenecks

Random Memory Accesses in In-Memory Key-Value Stores

[Figure: a query passes through Network Processing & Memory Management, Index Operation (hash table), and Access Value; random memory accesses are highlighted along this path.]

Random Memory Accesses of Indexing are Expensive

[Figure: time (nanoseconds, 0 to 240) versus number of memory accesses (1 to 16) for sequential memory access and random memory access; labeled data points: 72 and 163.]

CPU: Intel Xeon E5-2650v2
Memory: 1600 MHz DDR3

Inability of CPUs to Accelerate Random Memory Accesses

1. Cache
• Working set is large (~100 GB), CPU cache is small (~10 MB)
2. Prefetch
• Not easy to predict next memory address
3. Multithreading
• Limited number of hardware threads
• Limited number of Miss Status Holding Registers (MSHRs)

CPU spends a large portion of its time idling, waiting for data
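A quick back-of-the-envelope calculation shows why the cache helps so little. It uses the two rough sizes from the slide and assumes, purely for illustration, that accesses are uniformly random over the working set.

```python
WORKING_SET_BYTES = 100 * 10**9  # ~100 GB, from the slide
CPU_CACHE_BYTES = 10 * 10**6     # ~10 MB, from the slide

# Fraction of the working set that can sit in the cache at any one time.
cache_fraction = CPU_CACHE_BYTES / WORKING_SET_BYTES

# Assumption: accesses are uniformly random over the working set, so the
# expected hit rate cannot exceed the cached fraction.
print(f"at most {cache_fraction:.4%} of random accesses can hit the cache")
```

Real workloads are usually skewed toward popular keys, so actual hit rates can be higher than this uniform-access bound.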

Mega-KV addresses two issues: large number of queries and random memory access delay

• Network Processing: DPDK, Multiget, UDP
• Memory Management: Bitmap, Optimistic concurrent access
• Access Value: Prefetch
• Index Operation: accelerated by GPUs

The Goal of Mega-KV

๏ Throughput is the critical issue in big data environments
  • Throughput measures the capability of a key-value store system to process a growing number of queries on an increasingly large data set
๏ Acceptable in-memory key-value store latency
  • < 1 millisecond, e.g. Facebook, Amazon, …
๏ Our goal: Maximize throughput subject to an acceptable latency

CPU vs. GPU

Intel Xeon E5-2650v2:
• 2.3 billion Transistors
• 8 Cores
• 59.7 GB/s memory bandwidth

Nvidia GTX 780:
• 7 billion Transistors
• 2,304 Cores (12 SMXs)
• 288.4 GB/s memory bandwidth

[Figure: chip layout diagrams labeled Control, Cache, and Massive ALUs.]

Two Advantages of GPUs for Key-Value Stores

1. Massive Processing Units to Address Large Number of Concurrent Queries
• KV Store: simple independent memory access operations
• GPU: thousands of cores for parallel processing

2. Massively Hiding Memory Access Latency
• KV Store: random memory accesses in index operations
• GPUs can effectively hide memory access latency with massive hardware threads and zero-overhead thread scheduling (a GPU hardware support)

[Figure: a GPU core whose instruction buffer holds Threads A, B, and C; on each cache miss the memory request is issued and the core switches to another thread.]

Basic Design of Mega-KV

• RX: DPDK
• Pre-Processing: Network processing, Memory management
• GPU Processing: Index Operations
• Post-Processing: Read & Send Value
• TX: DPDK

Basic Design

• Pre-Processing: Network processing, Memory management
• Batch
• Parallel Processing in GPUs
• Post-Processing: Read & Send Value
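The staged design can be pictured as three functions connected by batches. The sketch below is a CPU-only stand-in written for illustration: the function names, the "GET key" packet format, and the batch size are hypothetical, and the list comprehension in `index_lookup_batch` merely takes the place of the parallel GPU step.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def pre_process(packets: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Parse "GET key" packets and group the keys into batches."""
    batch: List[bytes] = []
    for packet in packets:
        _op, key = packet.split(b" ", 1)
        batch.append(key)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def index_lookup_batch(index: Dict[bytes, int], batch: List[bytes]) -> List[Optional[int]]:
    """Stand-in for the GPU step: one independent index lookup per key."""
    return [index.get(key) for key in batch]


def post_process(values: List[bytes], locations: List[Optional[int]]) -> List[Optional[bytes]]:
    """Read each value from its location to build the responses."""
    return [values[loc] if loc is not None else None for loc in locations]


def serve(packets: Iterable[bytes], index: Dict[bytes, int],
          values: List[bytes], batch_size: int = 4) -> List[Optional[bytes]]:
    responses: List[Optional[bytes]] = []
    for batch in pre_process(packets, batch_size):
        locations = index_lookup_batch(index, batch)
        responses.extend(post_process(values, locations))
    return responses
```

Because every lookup in a batch is independent, the middle stage is the natural candidate for parallel execution.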

Challenges of Offloading Index Operations to GPUs

1. GPUs’ memory capacity is small: ~10 GB
• Working set may be hundreds of gigabytes
2. Low PCIe bandwidth
• PCIe is generally the bottleneck of GPUs if a large bulk of data needs to be transferred
3. Handling variable-length data is inefficient for GPUs
• Imbalanced load between GPU cores

Our Approach

• Compress the input data (keys) into compressed fixed-length signatures
  • Addresses challenges 2, 3 (PCIe bandwidth and variable-length data)
• Index: GPU-optimized cuckoo hash table that stores key signatures and value locations
  • Addresses challenges 1, 3 (GPU memory capacity and variable-length data)

Our Approach: Search Index

[Figure: the key is compressed into a signature; the signature is searched in the GPU cuckoo hash table; the key in the located KV object (Key, Value) is compared with the query key; the value is sent to the client.]
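The search path can be sketched on the CPU as follows. This is a simplified illustration and not the Mega-KV implementation: the 32-bit BLAKE2 signature, the one-slot buckets, the second hash constant, and the names `SignatureCuckooIndex` and `get` are all assumptions. The point it shows is that the index holds only fixed-length signatures and locations, so a final comparison against the full key is needed to rule out signature collisions.

```python
import hashlib
from typing import List, Optional, Tuple

Slot = Optional[Tuple[int, int]]  # (signature, location), or None if empty


def signature(key: bytes) -> int:
    """Compress a variable-length key into a fixed 32-bit signature."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=4).digest(), "little")


class SignatureCuckooIndex:
    def __init__(self, num_buckets: int = 64, max_kicks: int = 32) -> None:
        self.slots: List[Slot] = [None] * num_buckets
        self.max_kicks = max_kicks

    def _buckets(self, sig: int) -> Tuple[int, int]:
        # Both candidate buckets derive from the signature alone, so an
        # entry can be relocated without access to the original key.
        n = len(self.slots)
        return sig % n, ((sig * 0x9E3779B1) >> 16) % n

    def insert(self, key: bytes, location: int) -> bool:
        entry: Slot = (signature(key), location)
        b1, b2 = self._buckets(entry[0])
        for b in (b1, b2):
            if self.slots[b] is None:
                self.slots[b] = entry
                return True
        b = b1
        for _ in range(self.max_kicks):
            # Place the entry and pick up whatever occupied the slot.
            entry, self.slots[b] = self.slots[b], entry
            if entry is None:
                return True
            v1, v2 = self._buckets(entry[0])
            b = v2 if b == v1 else v1  # move the victim to its other bucket
        return False  # simplification: the last displaced entry is dropped

    def search(self, sig: int) -> List[int]:
        """Return the locations whose stored signature matches."""
        hits = []
        for b in set(self._buckets(sig)):
            slot = self.slots[b]
            if slot is not None and slot[0] == sig:
                hits.append(slot[1])
        return hits


def get(index: SignatureCuckooIndex, objects: List[Tuple[bytes, bytes]],
        key: bytes) -> Optional[bytes]:
    for loc in index.search(signature(key)):
        stored_key, value = objects[loc]
        if stored_key == key:  # key comparison filters signature collisions
            return value
    return None
```

In this sketch `objects` plays the role of the KV objects kept in host memory, while the index itself stays small enough for limited GPU memory.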

Evaluation - Hardware Setup

CPU: Intel Xeon E5-2650v2 octa-core, 2.6GHz (total 16 CPU cores)
GPU: Nvidia GTX 780, 2304 cores, 863MHz (total 4608 cores)
NIC: Intel dual-port 10Gbps NIC (total 40 Gbps)

Reaching a Record High Throughput

[Figure: throughput (MOPS, 0 to 180) of the fastest CPU-based KV store and Mega-KV on four data sets: 8B key / 8B value, 16B key / 64B value, 32B key / 512B value, 128B key / 1024B value. Speedup labels as they appear: 2.1x, 1.9x, 2.8x, 2.1x.]

Latency

[Figure: CDF (0 to 1) of round trip time (microseconds, 0 to 600) at 160 MOPS.]

• Mega-KV at 160 MOPS: 256 (50th), 390 (95th)
• Compared with Facebook: 300 (50th), 1,200 (95th)


Accelerating the Network I/O of Snort with DPDK


Snort

• Snort is a multi-mode packet analysis tool
  • Sniffer
  • Packet Logger
  • Forensic Data Analysis tool
  • Network Intrusion Detection System
• Snort is able to perform network traffic analysis both in real-time and for forensic post processing
• Snort “Metrics”
  • Fast (High probability of detection for an attack on high speed networks)
  • Configurable (Easy rules language, many reporting/logging options)


Snort Architecture


Packet I/O

Packet Decoder

Preprocessor

Detection Engine

Output

Incapability in Handling 10Gbps Network Traffic

Cycles Needed in Snort
• Packet I/O: 1,200
• Detection, etc.: 400 - 25,000…+

Your Budget: 1,400
• 10Gbps, min-sized packets, quad-core 2.66GHz CPUs

(in x86, cycle numbers are from RouteBricks [Dobrescu09] and PacketShader [Han10])
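The budget figure can be approximately reproduced with simple arithmetic. The sketch below assumes standard Ethernet framing (a 64-byte minimum frame plus 20 bytes of preamble and inter-frame gap) and, as a guess at the slide's setup, two quad-core CPUs for 8 cores in total; the slide states only "quad-core 2.66GHz CPUs", so the core count is an assumption.

```python
ETHERNET_OVERHEAD_BYTES = 8 + 12  # preamble/SFD plus inter-frame gap


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Packets per second when the link is saturated with equal-sized frames."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycle_budget(link_bps: float, frame_bytes: int, cores: int, clock_hz: float) -> float:
    """CPU cycles available per packet across all cores at line rate."""
    return cores * clock_hz / line_rate_pps(link_bps, frame_bytes)


pps = line_rate_pps(10e9, 64)  # minimum-sized packets on a 10 Gbps link
budget = cycle_budget(10e9, 64, cores=8, clock_hz=2.66e9)  # 8 cores is assumed
print(f"{pps / 1e6:.2f} Mpps, about {budget:.0f} cycles per packet")
```

Under these assumptions the budget comes out close to the slide's 1,400 cycles, while packet I/O alone is listed at 1,200.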

Incapability of Snort for High Speed Networks

• Snort was designed to detect attacks on 100Mbps links
  • Current network speed reaches 10Gbps and 40Gbps
  • Snort becomes incapable of detecting intrusions on current backbone and data center networks
• Accelerate network I/O with DPDK
  • Snort 2.9 introduces the Data Acquisition library (DAQ) for packet I/O
  • Currently supported: pcap, AF_PACKET, netmap, ipfw
  • Our work: add DPDK support in DAQ

Opportunities from DPDK

Cycles Needed
• Packet I/O: 1,200; with DPDK: 80
• Detection, etc.: 400 - 25,000…+

EXPERIMENTAL SETUP

• Hardware
  • CPU: Intel(R) Xeon(R) CPU E5-2650 v3
  • NIC: 82599ES 10-Gigabit SFI/SFP+ Network Cards
• Software
  • Linux 3.19.0
  • Snort 2.9.8.0
  • DPDK 2.1.0

Snort Modes

• Two Snort modes: Passive Mode and Inline Mode
  • Snort is bypassed in the Passive Mode
  • Snort filters packets in the Inline Mode

[Figure: Snort in Passive Mode and Snort in Inline Mode.]
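The difference between the two modes can be illustrated with a toy model. It reads "bypassed" as meaning that traffic is forwarded regardless of what the detector finds, which is an interpretation of the slide. The function names and the substring-based detector are hypothetical and unrelated to Snort's real rule engine or DAQ API.

```python
from typing import Callable, Iterable, List, Tuple

Packet = bytes
Detector = Callable[[Packet], bool]  # True when a packet looks malicious


def run_passive(packets: Iterable[Packet], detect: Detector) -> Tuple[List[Packet], List[Packet]]:
    """Passive mode: every packet is forwarded; detections only raise alerts."""
    forwarded: List[Packet] = []
    alerts: List[Packet] = []
    for packet in packets:
        forwarded.append(packet)
        if detect(packet):
            alerts.append(packet)
    return forwarded, alerts


def run_inline(packets: Iterable[Packet], detect: Detector) -> Tuple[List[Packet], List[Packet]]:
    """Inline mode: detected packets are dropped instead of forwarded."""
    forwarded: List[Packet] = []
    alerts: List[Packet] = []
    for packet in packets:
        if detect(packet):
            alerts.append(packet)  # filtered out of the traffic
        else:
            forwarded.append(packet)
    return forwarded, alerts


def toy_detector(packet: Packet) -> bool:
    # Hypothetical rule: flag any payload containing the bytes "attack".
    return b"attack" in packet
```

In inline mode the detector sits on the forwarding path, which is why its packet I/O cost directly limits the throughput the link can sustain.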

Snort Throughput (Inline Mode w/o DPI)

Throughput Stability of DPDK and Netmap

[Figure: throughput (Mpps, axis 13 to 15) over six runs for netmap and dpdk. One series reads 14.86, 14.88, 14.86, 14.88, 14.86, 14.88; the data labels of the other series, in extraction order, are 14.25, 13.90, 13.49, 14.20, 13.72, 13.90.]

Latency (Inline Mode w/o DPI)

Latency Stability of DPDK and Netmap

[Figure: latency (ms, axis 0.0 to 0.3) over six runs for netmap and dpdk. One series reads 0.15, 0.16, 0.15, 0.15, 0.15, 0.14; the data labels of the other series, in extraction order, are 0.26, 0.25, 0.34, 0.25, 0.34, 0.33.]

Snort Throughput (Passive Mode w/ DPI)

[Figure: horizontal bars of throughput (Mpps, axis 0 to 5). Bar labels as listed: DPDK, netmap, AF_PACKET, pcap. Data labels as listed: 1.36, 1.56, 4.24, 4.28.]

Summary

๏ Mega-KV provides the highest throughput
  • Fastest in-memory key-value store on commodity processors
  • More than 100x faster than Memcached
  • Open source at http://kay21s.github.io/megakv/
๏ Integrating DPDK into Snort
  • Improving the latency and throughput of Snort
  • For educational purposes: to be a course assignment at USTC


Thanks!

