TRANSCRIPT
Experiences in Building a 100 Gbps (D)DoS Traffic Generator
DIY with a Single Commodity-off-the-shelf (COTS) Server
Surasak, [email protected], 31, 2018
(Photo: Umeda Sky Building Escalators)
About me
• Teaching at Kasetsart University, Computer Engineering
• Head of the Applied Network Research Lab
• Chairman of the UNINET Network Monitoring Working Group
• Electronic Transactions Committee (DE Ministry)
• Areas of interest
  • Internet system security
  • Traffic analysis and measurement
  • ISP-application collaboration
About This Talk
• How do you DIY a 100 Gb/s (D)DoS traffic generator?
  → HW and SW solutions
• What are the underlying technologies and techniques?
  → Theory and tools
• What are the lessons learned from the deployment?
  → Experiences and outcomes
Goal and Constraints
• Full 100 Gb/s (~100 Mpps) capability
• Running on a single COTS server
• Running on a single 100 GigE NIC
• Closed-network deployment, testing with synthetic traffic
Outline
PART I: Introduction
  • Understanding DDoS
  • Ethernet Revisited
PART II: HW and SW Solution
  • Hardware Components
  • OS and Software Tools
PART III: Testbed and Performance Results
  • Throughput
  • CPU Utilization
PART IV: Lessons Learned
  • Experiences
  • Outcomes
  • Related Projects
PART I
Introduction: Understanding DDoS
2018: Welcome to the New Tb/s DDoS Era!
• Feb 28, 2018: Biggest-ever 1.35 Tb/s DDoS attack hits GitHub, using misconfigured Memcached servers to amplify the DDoS
• Mar 5, 2018: Memcached amplification attack breaks a new record; Arbor confirms a 1.7 Tb/s attack targeted at a customer of a U.S.-based ISP (roughly the bandwidth of ~91,500 simultaneous HD TV channels)
Source: https://thehackernews.com/2018/03/ddos-attack-memcached.html
(Diagram: DoS from a single source vs. DDoS from many distributed sources; the latter is what we are simulating)
Broad Types of DDoS
• Volume-based attacks: saturate the bandwidth of the attacked site; measured in bits per second (bps)
• Protocol attacks: consume the target's resources or intermediate communication equipment (firewalls, IPS, load balancers, etc.); measured in packets per second (pps)
• Application-layer attacks: mostly low-and-slow attacks that crash targets; measured in requests per second (rps)
PART I: Introduction (continued)
Ethernet Revisited: Understanding Ethernet Wire Speed and Throughput Calculations
Evolution of Ethernet
• Capacity and speed requirements on data links keep increasing
• Servers have begun to be capable of sustaining 100 Gb/s to memory
• Milestones: 10 Mb/s (1983), 100 Mb/s (1995), 1 Gb/s (1998), 10 Gb/s (2002), 40/100 Gb/s (2010), 25 Gb/s (2015), 200/400 Gb/s (IEEE Std 802.3bs, 2017)
• 40,000x speed increase in 34 years
Theoretical 100 GigE Characteristics (Wire Speed)

Frame Type | Frame Size  | Max Packet Rate | Max Bandwidth | Frame Duration
Minimum    | 64 bytes    | 148.8 Mpps      | 76.19 Gb/s    | 6.72 ns
Maximum    | 1,518 bytes | 8.1 Mpps        | 98.69 Gb/s    | 123.04 ns
Frame Size Matters
(Diagram: frames delivered over one second)
1. Smallest (minimum frame size): high rate, low volume
2. Largest (maximum frame size): low rate, high volume
Ethernet Frame-by-Frame Delivery
Field sizes in bytes: PA 7 | SFD 1 | DA 6 | SA 6 | Type 2 | Payload 46 to 1,500 | FCS 4 | IFG 12

Minimum on-wire size: 7+1+(6+6+2+46+4)+12 = 84 bytes (672 bits), for a 64-byte frame*
Maximum on-wire size: 7+1+(6+6+2+1,500+4)+12 = 1,538 bytes (12,304 bits), for a 1,518-byte frame*

* The frame size excludes the 20 bytes of PA (7) + SFD (1) + IFG (12)
On the wire: 84 to 1,538 bytes; as frames: 64 to 1,518 bytes
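Written once as a formula, the on-wire cost of a frame carrying a payload of p bytes (46 <= p <= 1,500) is:

$$\text{wire bytes} = \underbrace{7}_{\text{PA}} + \underbrace{1}_{\text{SFD}} + \underbrace{6+6+2+p+4}_{\text{frame}} + \underbrace{12}_{\text{IFG}} = p + 38$$

so p = 46 gives 84 bytes (672 bits) and p = 1,500 gives 1,538 bytes (12,304 bits).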
Maximum Frame Rate for 100 GigE
• At 64-byte frames:
  Max frame rate M = Speed / Size = 100x10^9 / (84 x 8) = 148,809,523 pps
  Maximum throughput T = M x 64 x 8 = 76.19 Gb/s
• At 1,518-byte frames:
  Max frame rate M = Speed / Size = 100x10^9 / (1,538 x 8) = 8,127,438 pps
  Maximum throughput T = M x 1,518 x 8 = 98.69 Gb/s
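In general form, for a line rate R = 100 Gb/s and a frame size of f bytes (f + 20 bytes on the wire):

$$M(f) = \frac{R}{8\,(f+20)}, \qquad T(f) = M(f)\cdot 8f = R\cdot\frac{f}{f+20}$$

$$M(64) = \frac{10^{11}}{8\cdot 84} \approx 148.8\ \text{Mpps}, \quad T(64) \approx 76.19\ \text{Gb/s}$$

$$M(1{,}518) = \frac{10^{11}}{8\cdot 1{,}538} \approx 8.13\ \text{Mpps}, \quad T(1{,}518) \approx 98.7\ \text{Gb/s}$$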
Theoretical 100 GigE Performance

Maximum frame rate:
Line rate | Frames/s (@64 B) | Frames/s (@1,518 B)
1 Gb/s    | 1.48 M           | 81 K
10 Gb/s   | 14.88 M          | 812 K
100 Gb/s  | 148.8 M          | 8.1 M

Maximum bandwidth:
Line rate | Throughput (@64 B) | Throughput (@1,518 B)
1 Gb/s    | 762 Mb/s           | 987 Mb/s
10 Gb/s   | 7.62 Gb/s          | 9.87 Gb/s
100 Gb/s  | 76.2 Gb/s          | 98.7 Gb/s

Frame duration at 100 Gb/s: 1/(148.8x10^6) = 6.72 ns (@64 B); 1/(8.127x10^6) = 123.04 ns (@1,518 B)
Timing and CPU budget in 100 GigE
(Timeline figure: back-to-back 64-byte frames arrive every 6.72 ns, while back-to-back 1,518-byte frames arrive every 123.04 ns, plotted against a 3 GHz CPU clock whose markers fall every 10 ns, i.e., every 30 cycles.)
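This translates directly into a per-frame CPU cycle budget. At a 3 GHz clock one cycle takes about 0.33 ns, so:

$$6.72\ \text{ns} \times 3\ \text{GHz} \approx 20\ \text{cycles per 64-byte frame}, \qquad 123.04\ \text{ns} \times 3\ \text{GHz} \approx 369\ \text{cycles per 1,518-byte frame}$$

A single core cannot do meaningful per-packet work in ~20 cycles, which is why line-rate small-frame generation has to be spread across multiple queues and cores.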
PART II
HW and SW Investigation: Is a COTS Server with a Multicore CPU Capable?
Delivering 100 GigE with a COTS Server
Performance Characteristics of Buses
(Diagram: the path from the 100 GbE NIC to the CPU crosses four crucial components)
1. CPU: multicore, multithreaded, high clock speed
2. Interconnect: QPI @ 153 Gb/s, HyperTransport @ 102.4 Gb/s
3. PCI bus: PCIe 3.0 x16 @ 128 Gb/s, PCIe 4.0 x16 @ 256 Gb/s
4. Memory bus: DDR4-2400 quad channel @ 512 Gb/s, DDR4-2666 six channel @ 720 Gb/s
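A quick sanity check on the narrowest of these links, PCIe 3.0 x16 (counting only the 128b/130b line coding, not per-packet TLP overhead):

$$16\ \text{lanes} \times 8\ \text{GT/s} = 128\ \text{Gb/s raw}, \qquad 128 \times \tfrac{128}{130} \approx 126\ \text{Gb/s usable} > 100\ \text{Gb/s}$$

Per-packet TLP headers eat into this further, so minimum-size frames press closer to the PCIe limit, but the slot is still wide enough for a single 100 GigE NIC.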
Yes, the hardware is capable.
Next: SW investigation, focusing on the OS kernel and network stack.
The OS's Obstacle
• Traditional OS network stacks are problematic
  • Not designed with this speed in mind
  • Carry many features essential for general-purpose networking: filtering, connection tracking, memory management, VLANs, overlays, and process isolation
  • Do not scale, even with the many CPU cores available these days
• Overhead in the Linux kernel
  • Socket-based system calls
  • Context switching and blocking I/O
  • Data copying from kernel space to userspace
  • Interrupt handling
• The Linux stack was designed as a control plane, not a data plane
• It does NOT scale, and latency is high
Linux kernel map: http://www.makelinux.net/kernel_map/
Linux Network Stack Walkthrough (2.4.20): https://wiki.openwrt.org/doc/networking/praxis
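A back-of-the-envelope illustration of why this cannot keep up (assuming, purely for illustration, an optimistic ~1 µs of kernel-path work per packet):

$$148.8 \times 10^{6}\ \tfrac{\text{packets}}{\text{s}} \times 1\ \tfrac{\mu\text{s}}{\text{packet}} \approx 149\ \text{fully busy cores}$$

far beyond a single COTS socket, before even counting copies and interrupt handling.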
How do we overcome this obstacle?
Solution: Kernel Bypass
Conventional Stack vs. Kernel Bypass
• Bypass the kernel and work directly with the NICs
  • Allows applications to access the hardware directly
  • Uses a set of libraries for fast packet processing
  • Reduces latency while processing more packets
  • Handles each packet within a minimum number of CPU cycles
• But...
  • Provides only a very basic set of functions (memory management, ring buffers, poll-mode drivers)
  • Requires reimplementation of the other IP stack features
(Diagram: in the conventional, socket-based path, the userspace application goes through sockets, the kernel TCP/IP stack, and the network driver before reaching the hardware; in the kernel-bypass (RDMA-based) path, the application uses a userspace packet library that drives the hardware directly, skipping the kernel TCP/IP stack.)
Zero Copy (ZC) with RDMA
(Diagram: in the conventional, socket-based path the data is copied three times, from the device buffer to the kernel socket buffer to the application buffer; in the kernel-bypass path the packet library and the NIC share one buffer mapped into the application, so no copies are needed: zero copy with Remote Direct Memory Access.)
Fast (Userspace) Packet Processing
• Kernel bypass is also known as
  • Fast packet processing
  • High-performance packet I/O
  • Data-plane processing acceleration frameworks

            | DPDK            | Netmap         | PF_RING
OS          | Linux, FreeBSD  | FreeBSD, Linux | Linux
License     | BSD             | BSD            | LGPL + paid
Language    | C               | C              | C
Use case    | Appliances, NFV | NFV, routers   | Packet capture, IDS/IPS
NIC vendors | Several         | Intel          | Intel
Support     | Community       | Community      | Company
DPDK (Data Plane Development Kit)
• A set of libraries and drivers for fast packet processing
• Main libraries
  • Multicore framework
  • Huge-page memory
  • Ring buffers
  • Poll-mode drivers
• Originally developed by Intel; currently managed as an open-source project under the Linux Foundation
http://dpdk.org/
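To make the poll-mode, burst-oriented model concrete, below is a minimal TX-only sketch in DPDK-style C. This is not the code used in this talk; it assumes DPDK 17.x-era APIs, a single TX queue on port 0, and omits error handling and real header construction.

```c
#include <stdlib.h>

#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define BURST     32
#define FRAME_LEN 64   /* minimum-size Ethernet frame */

int main(int argc, char **argv)
{
    uint16_t port = 0;

    /* Initialize the Environment Abstraction Layer (huge pages, cores, PCI). */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* Packet-buffer pool in huge-page memory, shared with the NIC. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "mbuf_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    /* TX-only setup: no RX queues, one TX queue, default device config. */
    struct rte_eth_conf conf = {0};
    rte_eth_dev_configure(port, 0, 1, &conf);
    rte_eth_tx_queue_setup(port, 0, 1024, rte_eth_dev_socket_id(port), NULL);
    rte_eth_dev_start(port);

    for (;;) {
        struct rte_mbuf *burst[BURST];
        uint16_t n = 0, sent, i;

        /* Build a burst of placeholder 64-byte frames.  A real generator
         * would fill in Ethernet/IPv4/UDP headers and payload here. */
        for (i = 0; i < BURST; i++) {
            struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
            if (m == NULL)
                break;
            m->pkt_len = m->data_len = FRAME_LEN;
            burst[n++] = m;
        }

        /* Hand the whole burst to the poll-mode driver: no system calls,
         * no interrupts, no kernel network stack involved. */
        sent = rte_eth_tx_burst(port, 0, burst, n);

        /* Free whatever the NIC ring did not accept this round. */
        for (i = sent; i < n; i++)
            rte_pktmbuf_free(burst[i]);
    }

    return 0;
}
```

TRex builds on these same primitives and adds flow generation, scheduling, and statistics on top.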
DPDK Architecture
(Figure: DPDK programmable packet-processing pipelines)
Source: https://schd.ws/hosted_files/2ndp4workshop2015/a6/Intel,%20P4%20Workshop%20Nov%2018%202015.pdf
DPDK-based Open Source Projects
• pktgen-dpdk: the original DPDK traffic generator
• TRex: flexible stateless/stateful traffic generator for L4-L7
• A stateful traffic generator for L1-L7
• Packet-journey: Linux router; scalable software routing, proven with 500k routes
• SPDK: libraries for high-performance, scalable, user-mode storage applications
• A virtual multilayer switch integrated into various cloud platforms
• A carrier-grade, integrated, open-source platform to accelerate Network Function Virtualization (NFV)
• An I/O services framework for network and storage software with Vector Packet Processing
TRex
• DPDK-based stateful/stateless traffic generator (L4-L7)
• Replays real traffic (pcap), scalable to 10K parallel streams
• Sustains about 10-30 Mpps per core, scaling with the number of cores
• Scales to 200 Gb/s on a single COTS server
• Use cases
  • High-scale benchmarks for stateful networking gear (firewall/NAT/DPI)
  • Generating high-scale DDoS attacks
  • High-scale, flexible testing for switches
  • Scale tests for huge numbers of clients/servers
https://trex-tgn.cisco.com
PART III
Testbed and Performance Measurements
Testbed
• HW: two rack servers (sender and receiver) connected back-to-back over 100 GigE
  • Xeon E5-2640 v4 @ 2.40 GHz, 10 cores
  • 64 GB RAM (4 x 16 GB DDR4-2400)
  • 1.5 TB NL-SCSI disk
  • PCIe Gen3 x16
  • Dual-port 100 GigE NIC
• OS & SW
  • CentOS 7.3, kernel 3.10
  • DPDK 17.05.2
  • TRex 2.29
(Topology: Sender --- 100 GigE --- Receiver)
TRex Sample Configuration File
• 65,535 clients talking to 255 servers

trex: ~/trex-core/scripts# cat cap2/imix64.yaml
- duration : 1.0
  generator :
    distribution : "seq"
    clients_start : "16.0.0.1"
    clients_end : "16.0.255.255"
    servers_start : "48.0.0.1"
    servers_end : "48.0.0.255"
    clients_per_gb : 201
    min_clients : 101
    dual_port_mask : "1.0.0.0"
    tcp_aging : 0
    udp_aging : 0
  cap_info :
    - name: cap2/udp_64B.pcap
      cps : 100000000.0
      ipg : 10000
      rtt : 10000
      w : 1

(TRex console screenshot)
Testbed Scenario
• UDP packets with source IP addresses drawn randomly from 65,535 clients, sent to 255 destination IP addresses (see the sketch below)
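TRex produces this client/server spread from its YAML generator section. Purely as an illustration of the underlying idea, a DPDK-style sender could randomize the IPv4 source of each outgoing frame as in the sketch below (DPDK 17.x header and field names; the frame is assumed to already be a built Ethernet/IPv4/UDP packet, and the UDP checksum is assumed to be left at zero, which IPv4 permits):

```c
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_random.h>

/* Illustrative sketch only: overwrite the IPv4 source address of an
 * already-built Ethernet/IPv4/UDP frame with a pseudo-random address in
 * 16.0.0.0/16, then refresh the IPv4 header checksum. */
static void randomize_src_ip(struct rte_mbuf *m)
{
    struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
                                                  sizeof(struct ether_hdr));

    uint32_t host = (uint32_t)(rte_rand() & 0xFFFF);       /* 65,536 hosts */
    ip->src_addr = rte_cpu_to_be_32((16u << 24) | host);   /* 16.0.x.y     */

    ip->hdr_checksum = 0;
    ip->hdr_checksum = rte_ipv4_cksum(ip);                  /* recompute    */
}
```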
• Measurements taken at 64-byte and at 1,518-byte frames:
  • Throughput vs. number of CPU cores
  • CPU utilization vs. number of CPU cores
Throughput Measurements
(Charts: throughput vs. number of CPU cores at 64-byte and 1,518-byte frames; theoretical maxima of 148.8 Mpps and 76.2 Gb/s marked for 64-byte frames)

CPU Utilization
(Charts: CPU utilization vs. number of CPU cores at 64-byte and 1,518-byte frames)
PART IV
Lessons Learned and Related Projects
Why a DDoS Traffic Generator?
(Mind map: packet processing at the core, surrounded by the areas it enables)
• DDoS detection: test tools for IDS, IPS, firewalls, routers, and load balancers
• Traffic analytics: traffic profiles, usage behavior
• Traffic logging: accounting, quota control, law enforcement
• Deep packet inspection: IoT discovery, data exfiltration, protocol discovery
Six projects in four groups are introduced below.
(1) DDoS Detection/Mitigation
• Inline 100 GigE stateless DDoS detection/mitigation ("PacketGuardian")
• Experiments: SYN flooding and simple P2P detection; result: 90 Mpps detection capability
• Research tasks
  • Investigation of efficient detection/mitigation methodologies
  • HW/SW optimization techniques
(Deployment model: inline on the 100 GigE link between the gateway router facing the Internet and the core router of the internal network)
• Status: R&D in progress
(2) HTTP Flood Detection (1 x 100 GigE)
• PCAP traffic replaying: pure HTTP GET flood attacks with NO background packets
• Detection against 86K signatures
• Generator feeding the detector over 100 GigE at 31.1 Mpps / 99.5 Gb/s
• Preliminary results: the detector (E5-2640 v4, 10 cores @ 2.4 GHz) sustained 43 Gb/s (8.3 Mpps)
(3) HTTP Logger (10 x 10 GigE)
• PCAP traffic replaying: HTTP packets mixed with background packets
• Inspect and log HTTP traffic only
• Three generators feed the logger: Gen #1 (2 x 10 GigE), Gen #2 (2 x 10 GigE), Gen #3 (6 x 10 GigE)
• Aggregate load: 31.1 Mpps / 99.5 Gb/s
(4) Traffic Logger Performance
• Real deployment in a 10 Gb/s campus network
• Real-time HTTP and packet-header logs
• Repository for data analytics
• Data lake statistics: peak 2,100 req/s (33 GB/day) with 14.1 billion records (2.57 TB total); peak 380,000 req/s (330 GB/day) with 3.27 trillion records (28.03 TB total)
• ELK Stack as the indexing platform, with an indexing rate of 80K records/s per machine

Sample HTTP log record:
554455 1467551484.180000 67686345 [email protected] 1467551484.163681 4 158.108.2.X 198.51.100.X TCP 5566 80 GET www.domain.com /index.html

Sample packet-header log records:
2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123
(5) Traffic Analytics
(6) Traffic Accounting/Control
• Tracks sessions and flows to account for bandwidth usage once a user logs in
• Dashboard: login sessions (IPv4 and IPv6), number of active sessions, all active addresses, today's usage, dual authentication, one-click session termination, ads, max burst
• Observed up to 65X,XXX concurrent flows
Lessons Learned
• The server is really faster than you think!
• Faster is better
  • Use the latest PCIe Gen3 x16 slots
  • A faster CPU clock is preferable to a larger number of cores
• Reducing inter-processor communication cost is key
• An in-depth understanding of the packet I/O "C" code implementation is required
Summary
• A generic OS with the default network stack cannot handle 100 GigE saturated with the smallest frames
• Proven solution: a data-plane fast-packet framework
• A COTS server is capable of 100 GigE
• Rising trends
  • SW-based appliances for high-speed networks
  • COTS security appliances based on fast-packet frameworks
Thank you for your attention
Collaboration and student recruitment welcome!
Q & A
(Photo: Sunset at Narita Airport)