How to Build a 100 Gbps DDoS Traffic Generator

DIY with a Single Commodity-off-the-shelf (COTS) Server

Surasak Sanguanpong [email protected]


DISCLAIMER

THE FOLLOWING CONTENT HAS BEEN APPROVED FOR APPROPRIATE AUDIENCES

THE PRESENTATION HAS BEEN RATED RESTRICTED

NON-TECHNICAL AUDIENCES REQUIRE AN ACCOMPANYING MENTOR

Any action based on the examples in this talk must be carried out without disturbing other computer systems, so as not to violate Section 10, which may constitute an offense under Section 12, as defined by the Computer-Related Crime Act (No. 2) B.E. 2560 (2017).

USE AT YOUR OWN RISK

A sample of DDoS in Q2 2017: 350 Gbps peak, 190 Mpps peak

Equivalent to 18,300 simultaneous HD TV channels

Why DDoS traffic generator?

R&D tool for: network behavior, traffic log, traffic analysis, anti-DDoS

Testing network middleboxes: IDS, IPS, firewall, router

Synthetic traffic, but close to realistic traffic

HW vs. SW generator

Items        Dedicated Hardware   Server with Software
Precision    High                 Moderate
Latency      Low                  Moderate
Capability   Full max rate        Near max rate
Cost         High                 Economical

Goal: 100 Gb/s DDoS traffic generator

Constraints:
• A single COTS server
• A single 100 GigE NIC (not 10x10 GigE)

Outline

• Introduction
  • DDoS understanding
  • Ethernet revisiting
• HW and SW solution for a 100 Gb/s generator
  • Server and components
  • Linux networking stack
  • Open source SW generator
• Testbed and performance results

Introduction: Understanding DDoS

DoS: a single source attacks the target. DDoS: many distributed sources attack the target.

The same traffic will be simulated.

Broad types of DDoS

• Volume-based attacks: saturate the bandwidth of the attacked site; measured in bits per second (bps)

• Application-layer attacks: low-and-slow attacks to crash targets; measured in requests per second (rps)

• Protocol attacks: consume actual target resources, or intermediate communication equipment (firewalls, load balancers, etc.); measured in packets per second (pps)

Introduction: Ethernet Update

Understanding Ethernet wire speed and throughput calculations

Evolution of Ethernet

• Capacity and speed requirements on data links keep increasing

• Big Data, AI require more bandwidth

• Servers have begun to be capable of sustaining 100G to memory

Timeline: 10 Mb/s (1983) → 100 Mb/s (1995) → 1 Gb/s (1998) → 10 Gb/s (2002) → 40/100 Gb/s (2010) → 25 Gb/s (2015) → 200/400 Gb/s (IEEE Std 802.3bs, 2017): a 40,000x increase in 34 years

Understanding Ethernet Wire speed

Wire speed refers to the theoretical peak bit rate of the link

Q: What is the maximum packets per second (pps) that can be generated at a specific Ethernet speed?

Frame sizes matter. Two options for consideration:

1. Minimum frame size: a large number of frames per unit time
2. Maximum frame size: a small number of frames per unit time

Ethernet delivers frame by frame. On the wire each frame carries:
Preamble (7) + SFD (1) + DA (6) + SA (6) + Type (2) + Payload (46 to 1,500) + FCS (4) + IFG (12) bytes

Fields         Min size (64 B frame)   Max size (1,518 B frame)
Preamble+SFD   8                       8
Dst Address    6                       6
Src Address    6                       6
Type           2                       2
Payload        46                      1,500
FCS            4                       4
IFG            12                      12
Total          84                      1,538

Max @100 GigE

• Maximum frame rate for 64-byte frames over a 100 GigE link (84 bytes = 672 bits on the wire):
  M = Speed/Size = 100x10^9 / 672 = 148,809,523 pps
  Maximum throughput: T = M x 64 x 8 = 76.19 Gbps

• Maximum frame rate for 1,518-byte frames over a 100 GigE link (1,538 bytes = 12,304 bits on the wire):
  M = Speed/Size = 100x10^9 / 12,304 = 8,127,438 pps
  Maximum throughput: T = M x 1518 x 8 = 98.69 Gbps
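The two calculations above can be reproduced in a few lines of Python (a sketch of the slide's arithmetic; frame sizes exclude preamble/SFD and IFG, which are added back for the on-wire size):

```python
# Wire-speed math for 100 GigE, as in the slides. On the wire each frame
# also occupies 8 B of preamble+SFD and 12 B of inter-frame gap (IFG),
# so a 64 B frame takes 84 B = 672 bits and a 1,518 B frame 1,538 B = 12,304 bits.

LINK_BPS = 100e9  # 100 GigE

def max_pps(frame_bytes: int) -> int:
    """Maximum frames per second for a given frame size (excl. preamble/IFG)."""
    wire_bits = (frame_bytes + 8 + 12) * 8  # add preamble+SFD and IFG
    return int(LINK_BPS // wire_bits)

def throughput_gbps(frame_bytes: int) -> float:
    """Frame throughput in Gb/s (frame bits times frame rate)."""
    return max_pps(frame_bytes) * frame_bytes * 8 / 1e9

print(max_pps(64), throughput_gbps(64))      # ~148,809,523 pps, ~76.19 Gbps
print(max_pps(1518), throughput_gbps(1518))  # ~8,127,438 pps, ~98.70 Gbps
```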

100 GigE performance: max frame rate at different speeds

Rate (Gb/s)   #Frames (Min: 64 B)   #Frames (Max: 1,518 B)
1             1.48 M                81 K
10            14.88 M               812 K
40            59.52 M               3.25 M
100           148.81 M              8.12 M

Challenge for Packet Processing

While packet 1 is still being looked up and processed, packet 2 is already arriving: the inter-packet arrival time is the hard bound on per-packet processing time.

#Frames and timing at 64-byte frame length

Rate (Gb/s)   #Frames (Million)   Inter-packet arrival time (ns)
1             1.48                672
10            14.88               67.2
40            59.52               16.8
100           148.81              6.72
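The timing column follows directly from the 672 bits a minimum-size frame occupies on the wire; a small sketch of that arithmetic:

```python
# Inter-packet arrival time at minimum frame size: a 64 B frame occupies
# 84 B = 672 bits on the wire (including preamble+SFD and IFG).

WIRE_BITS = 672  # bits per minimum-size frame on the wire

def inter_packet_ns(rate_gbps: float) -> float:
    """Time between back-to-back minimum-size frames, in nanoseconds."""
    return WIRE_BITS / rate_gbps  # bits / (Gb/s) comes out in nanoseconds

for rate in (1, 10, 40, 100):
    print(rate, inter_packet_ns(rate))  # 672, 67.2, 16.8, 6.72 ns
```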

Time/CPU budget at 100 Gb/s

• With 148.81 Mpps, the time budget for processing a single packet is:
  1/(148.81x10^6) = 6.72 nanoseconds

• Considering a server with a 3 GHz CPU, how many clock cycles are available to handle each minimum-size frame at the 100 Gb/s packet rate?
  6.72x10^-9 x 3x10^9 ≈ 20 clock cycles
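The budget calculation above, written out (the 3 GHz figure is the slide's example CPU):

```python
# CPU-cycle budget per packet at 100 Gb/s line rate with 64 B frames.
PPS = 148.81e6   # packets per second at 100 GigE, minimum frame size
CPU_HZ = 3e9     # the slide's example: a 3 GHz core

budget_s = 1 / PPS            # seconds available per packet
cycles = budget_s * CPU_HZ    # clock cycles available per packet
print(budget_s * 1e9, cycles) # ~6.72 ns, ~20 cycles
```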

Hardware Investigation

To answer: is the hardware capable of delivering 100 GigE?

Four crucial components sit on the 100 GbE path:
1. CPU
2. Interconnection (QPI)
3. PCI bus
4. Memory bus

Hardware Capability

• QPI: 156 Gb/s
• PCIe 3.0: up to 40 lanes per socket (252 Gb/s for x16)
• 4 channels of DDR4 2133 MHz: up to 546 Gb/s

Yes, the hardware is capable.
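As a sanity check on the PCIe figure (my numbers, not the slide's): PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so an x16 slot delivers about 126 Gb/s per direction, comfortably above what one 100 GigE NIC needs:

```python
# Sanity check with assumed PCIe 3.0 parameters (not from the slides):
# 8 GT/s per lane, 128b/130b line coding, 16 lanes.
LANE_GTPS = 8e9
ENCODING = 128 / 130
LANES = 16

per_direction_bps = LANE_GTPS * ENCODING * LANES
print(per_direction_bps / 1e9)  # ~126 Gb/s per direction, > 100 Gb/s
```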

OS Kernel & Network Stack Investigation

To answer: is the software capable?

OS’s obstacle

• The traditional OS network stack is problematic

• It was not designed with this speed in mind

• It carries many features essential for general networking: filtering, connection tracking, memory management, VLANs, overlays, and process isolation

• It does not scale, even with the many CPU cores available these days

http://www.makelinux.net/kernel_map/

Overhead in Linux kernel

• Socket-based system calls

• Context switching and blocking I/O

• Data copying from kernel to userspace

• Interrupt handling

• High latency

Linux Network Stack Walkthrough (2.4.20)

https://wiki.openwrt.org/doc/networking/praxis

How to solve this obstacle?

Solution: Kernel Bypass

Conventional stack vs. kernel bypass

• Let's bypass the kernel and work directly with NICs

• Allows applications to access the hardware directly

• Uses a set of libraries for fast packet processing

• Reduces latency, allowing more packets to be processed

• Handles each packet within a minimum number of CPU cycles

• But…
  • Provides only a very basic set of functions (memory management, ring buffers, poll-mode drivers)
  • Requires reimplementation of the other IP stack features

Conventional (socket-based): the application (user space) goes through sockets, the TCP/IP stack, and the network driver (kernel space) to reach the hardware.

Kernel bypass (RDMA-based): the application plus a packets library (user space) talk to the hardware directly; the kernel's TCP/IP stack and network driver are skipped.

Zero copying (ZC) with RDMA

Conventional (socket-based): data is copied three times on its way up, from the device buffer (hardware) to the socket buffer (kernel) to the application buffer (user space).

Kernel bypass (RDMA-based): a shared buffer is mapped between the hardware and the application's packet libraries, so packets are delivered with zero copying (Remote Direct Memory Access).
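The zero-copy idea in miniature (a Python analogy for illustration only, not RDMA itself): a memoryview exposes an existing buffer without duplicating it, so a write by the "NIC" side is visible to the "application" side with no intermediate copy.

```python
# Toy analogy for zero copy: memoryview shares the underlying buffer
# instead of copying it, like a DMA region mapped into userspace.
nic_buffer = bytearray(b"\x00" * 8)  # pretend this is the DMA region
app_view = memoryview(nic_buffer)    # shared, zero-copy view of it

nic_buffer[0:4] = b"PKT1"            # the "NIC" writes a packet
print(bytes(app_view[0:4]))          # b'PKT1' - seen without any copy
```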

Scalable with multicores

Core 0 is left to the kernel; each remaining core runs its own application instance with its own packet libraries, bound to a dedicated NIC queue pair (Tx0/Rx0, Tx1/Rx1, …).
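How packets end up on a particular core's queue can be sketched as receive-side-scaling-style hashing (my illustration, not any specific NIC's algorithm): hash the flow's 5-tuple and take it modulo the number of queues, so every packet of a flow lands on the same core.

```python
# Sketch of RSS-style flow-to-queue distribution (illustrative only):
# hash the 5-tuple, modulo the number of per-core Rx queues.
N_QUEUES = 4  # one Rx queue per worker core

def rx_queue(src_ip, dst_ip, src_port, dst_port, proto="udp"):
    """Pick the Rx queue (and thus the core) that handles this flow."""
    return hash((src_ip, dst_ip, src_port, dst_port, proto)) % N_QUEUES

q = rx_queue("16.0.0.1", "48.0.0.1", 1024, 53)
print(q)  # some queue in 0..3; stable for all packets of the same flow
```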

Fast (Userspace) Packet Processing

• Kernel bypass is also known as:
  • Fast packet processing
  • High-performance packet I/O
  • Data plane processing acceleration framework

             DPDK              Netmap            PF_RING
OS           Linux, FreeBSD    FreeBSD, Linux    Linux
License      BSD               BSD               LGPL + paid
Language     C                 C                 C
Use case     Appliances, NFV   NFV, router       Packet capture, IDS/IPS
NIC vendors  Several           Intel             Intel
Support      Community         Community         Company

DPDK

• Data Plane Development Kit
• A set of libraries and drivers for fast packet processing
• Main libraries:
  • multicore framework
  • huge page memory
  • ring buffers
  • poll-mode drivers

Currently managed as an open-source project under the Linux Foundation

http://dpdk.org/
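One of those building blocks, the ring buffer, can be sketched in miniature (a plain single-producer/single-consumer ring in Python for illustration; DPDK's actual ring library is a lock-free C implementation):

```python
# Toy single-producer/single-consumer ring buffer, illustrating the idea
# behind DPDK's ring library (not its actual lock-free implementation).
class Ring:
    def __init__(self, size):
        self.buf = [None] * size
        self.head = 0  # next slot the producer writes
        self.tail = 0  # next slot the consumer reads
        self.size = size

    def enqueue(self, pkt):
        if self.head - self.tail == self.size:
            return False  # ring full: drop the packet
        self.buf[self.head % self.size] = pkt
        self.head += 1
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None  # ring empty
        pkt = self.buf[self.tail % self.size]
        self.tail += 1
        return pkt

ring = Ring(4)
ring.enqueue("pkt-0")
print(ring.dequeue())  # pkt-0
```

The fixed size and index arithmetic are what make the real thing fast: no allocation on the hot path, just two counters and a modulo.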

DPDK Architecture

DPDK in Linux Distros

• Available as part of several OS distributions (e.g., Clear Linux)

DPDK-based Open Source Projects

• SPDK (Storage Performance Development Kit): libraries for writing high-performance, scalable, user-mode storage applications
• Packet-journey: scalable Linux software router, proven with 500k routes
• pktgen-dpdk: software-based traffic generator
• Virtual multilayer switch integrated into various cloud platforms
• Carrier-grade, integrated, open source platform to accelerate Network Function Virtualization (NFV)
• I/O services framework for network and storage software with Vector Packet Processing
• The stateful traffic generator for L1-L7
• Flexible stateless/stateful traffic generator for L4-L7

What can be built with DPDK?

• Switch/router
• Stateless and stateful firewall
• IDS/IPS
• Load balancer
• Traffic recorder
• Fast internet scanners
• Stateless packet generator
• Stateful, application-like flow generator
• IPsec VPN gateway
• Accelerated key-value DB
• Accelerated NAS

TRex

• DPDK-based stateful/stateless traffic generator (L4-L7)

• Replays real traffic (pcap), scalable to 10K parallel streams

• Supports about 10-30 Mpps per core, scalable with the number of cores

• Scales to 200 Gb/s on one COTS server

Use cases:
• High-scale benchmarks for stateful networking gear (firewall/NAT/DPI)
• Generating high-scale DDoS attacks
• High-scale, flexible testing for switches
• Scale tests for huge numbers of clients/servers

https://trex-tgn.cisco.com/

TRex sample Traffic config file

• 255 clients talking to 255 servers

root:~/trex-core/scripts# cat cap2/dns.yaml
- duration : 1.0
  generator :
    distribution : "seq"
    clients_start : "16.0.0.1"
    clients_end : "16.0.0.255"
    servers_start : "48.0.0.1"
    servers_end : "48.0.0.255"
    clients_per_gb : 201
    min_clients : 101
    dual_port_mask : "1.0.0.0"
    tcp_aging : 0
    udp_aging : 0
  cap_info :
    - name : cap2/dns.pcap
      cps : 10.0
      ipg : 10000
      rtt : 10000
      w : 1
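The "255 clients" figure follows from the address ranges declared in the file; a quick check with Python's ipaddress module:

```python
# Count the addresses in the client and server ranges from cap2/dns.yaml.
from ipaddress import IPv4Address

def range_size(start: str, end: str) -> int:
    """Number of addresses in an inclusive IPv4 range."""
    return int(IPv4Address(end)) - int(IPv4Address(start)) + 1

print(range_size("16.0.0.1", "16.0.0.255"))  # 255 clients
print(range_size("48.0.0.1", "48.0.0.255"))  # 255 servers
```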

Testbed and Performance Measurements

40 Gb/s Traffic Generator Reports

• Pktgen-dpdk• https://www.chelsio.com/wp-content/uploads/resources/T5-40Gb-

Linux-DPDK.pdf• TRex

• https://trex-tgn.cisco.com/trex/doc/trex_stateless_bench.html• Warp17

• https://github.com/Juniper/warp17

Where is the 100 Gb/s results?

Testbed

• HW: Dell R430
  • 2x Intel Xeon E5-2640 v4 2.40 GHz, dual socket, 10-core
  • 64 GB RAM (4x16 GB DDR4 2400 MHz)
  • 1.5 TB NL-SCSI
  • DPDK-based 100 Gbps NIC

• SW
  • CentOS 7.3, kernel 3.10
  • DPDK 17.05.2
  • TRex 2.29

Two Dell R430s connected back to back over 100 GigE: sender and receiver

CLI Output

• UDP packet generator
• Random source IP addresses

Results at 64 bytes and at 1518 bytes

Ongoing R&D Project

• Porting traffic recorder
  • HTTP log and flow log
  • Current testbed: 30 Gb/s capability (4x10 Gb/s)
  • ~60,000 flows/s
  • ~10 million active flows
  • Supports both IPv4 and IPv6

• Development of stateless DDoS mitigation

• Development of traffic-based IoT device auto discovery and analysis

Summary

• A COTS server is capable of 100 GigE

• Data plane solutions are the future for COTS-based appliances

• Rising trend of SW-based network appliances for high-speed networks

Thanks for your attention

Q&A