Post on 16-Dec-2015
Beer Garden
A defense against high-density attacks
Michael N. Gagnon
Founder and Director, HellaSec LLC
mike@hellasec.com
This work was funded by DARPA’s Cyber Fast Track program. Distribution Statement “A” (Approved for Public Release, Distribution Unlimited). The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Contents
• What is a high-density attack?
• Beer Garden defense
  • Theory
  • Demo
  • Implementation
• Appendix 1: Near-term solutions
• Appendix 2: Examples
Background: conventional DoS
• A server is powerful
• It takes an army of PCs to take down one server
What is a high-density attack?
1. Each PC sends as much traffic as possible
2. This traffic overloads the server
3. The server becomes unresponsive
[Figure label: attack traffic]
High-density attacks
• It takes one PC to take down a single server by sending “high-density” attack traffic
1. A single attacker sends attack traffic
2. This traffic overloads the server
3. The server becomes unresponsive
[Figure label: high-density attack traffic]
Density
• Mass = resources consumed
• Volume = number of requests
• Density = resources consumed per request
  • Ratio of mass to volume
• Examples of low-density requests
  • Most legitimate traffic
  • Conventional DoS traffic
• Examples of high-density requests
  • Algorithmic-complexity attacks
  • Legitimate requests for expensive operations
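The mass/volume/density analogy can be made concrete with a few lines of Python. This is a minimal sketch: the `TrafficSample` class and the numbers are made up for illustration, and CPU seconds stand in for "resources consumed".

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    """Aggregate measurements for a batch of requests (hypothetical structure)."""
    cpu_seconds: float  # "mass": total resources consumed
    requests: int       # "volume": number of requests

    def density(self) -> float:
        """Resources consumed per request, i.e. mass divided by volume."""
        return self.cpu_seconds / self.requests


# Illustrative, made-up numbers: both batches consume the same total CPU time.
conventional_dos = TrafficSample(cpu_seconds=50.0, requests=100_000)
high_density = TrafficSample(cpu_seconds=50.0, requests=10)
```

Both batches have the same mass, but the second reaches it with a tiny volume, which is what makes a single attacking PC sufficient.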
How do they work?
• Trigger exceptional resource usage. For example:
  • Cause poor algorithmic performance, i.e. an “algorithmic complexity attack”
  • Trigger an infinite loop bug
  • See Appendix 2 for details and more examples
• What types of resources?
  • CPU
  • Memory
  • Bandwidth
  • Disk
  • “Virtual resources” (e.g. connections)
Are you at risk?
• A dubious “best practice”: not planning for worst-case performance because you assume it’s sufficiently rare
  • This assumption is unrealistic: you do not know the probability distribution of your algorithm’s inputs
  • Inputs could become accidentally skewed
  • An attacker could give you worst-case input
• You are most at risk if you have algorithms that have poor worst-case performance that you do not regularly experience
  • And it is easy to intentionally trigger worst-case performance
Beer Garden: Theory
A defense against CPU-bound high-density attacks that target web applications and web services.
Ambitious Goals
• Generic
• Fully automated
• Easy configuration
• Security guarantees
General idea
• Treat server like a crowded beer garden
• Doorman: “you have to pay to enter”
  • Limits volume of attack requests that are admitted
• Bouncer: “you need to leave now”
  • Limits damage of admitted attack requests
http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf
Operation during overload
• One FastCGI worker process per core
  • Each worker can only handle 1 request at a time
  • (Also keep a few “spare” workers on deck)
• Doorman keeps a queue of requests
  • Only forwards request to a worker process if it is idle
  • If there are no idle workers, and a request has timed out, then ask the Bouncer to evict that request
  • During overloads, timeouts are very aggressive
  • Keep the queue short by insisting each visitor solve a computational “puzzle”
• Signature service
  • Learns to identify the “density” of requests (real-time machine learning)
  • Doorman creates harder puzzles for suspicious requests
• Bouncer
  • Kill (and restart) workers when Doorman asks
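The queueing and eviction rules above can be modeled in a few lines of Python. This is a toy sketch, not the nginx module: the `Doorman` class, its method names, and the use of plain strings for requests are all assumptions made for illustration, and "evicting" a request here only records it instead of killing a real worker.

```python
import collections
from typing import Deque, Dict, List, Optional, Tuple


class Doorman:
    """Toy model of the overload loop (hypothetical names throughout)."""

    def __init__(self, num_workers: int, timeout: float) -> None:
        self.timeout = timeout
        self.queue: Deque[str] = collections.deque()
        self.idle: List[int] = list(range(num_workers))
        self.busy: Dict[int, Tuple[str, float]] = {}  # worker -> (request, start time)
        self.evicted: List[str] = []

    def admit(self, request: str) -> None:
        """Queue a request whose visitor has already solved a puzzle."""
        self.queue.append(request)

    def complete(self, worker: int) -> None:
        """A worker finished its request and becomes idle again."""
        self.busy.pop(worker)
        self.idle.append(worker)

    def _timed_out_worker(self, now: float) -> Optional[int]:
        """Return the longest-running worker if it exceeded the timeout."""
        if not self.busy:
            return None
        worker = min(self.busy, key=lambda w: self.busy[w][1])
        return worker if now - self.busy[worker][1] >= self.timeout else None

    def tick(self, now: float) -> None:
        """Forward queued requests; evict timed-out requests when no worker is idle."""
        while self.queue:
            if not self.idle:
                victim = self._timed_out_worker(now)
                if victim is None:
                    break  # every worker is busy and within its time budget
                request, _started = self.busy.pop(victim)
                self.evicted.append(request)  # stand-in for "ask the Bouncer"
                self.idle.append(victim)
            worker = self.idle.pop()
            self.busy[worker] = (self.queue.popleft(), now)
```

Note that eviction only happens when a queued request is actually waiting for a worker, so a slow request on an otherwise idle server is left alone.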
Security Guarantees
• During an attack:
  • At least 95% of legitimate requests will be serviced within 250 ms
  • At least 3,000 low-density requests can be serviced per second (assuming attacker can solve at most 30 puzzles a second)
• Actual values depend on the application, available resources, and Beer Garden configuration
  • Use our “trainer” tool to determine security guarantees
• Depends on assumptions
http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf
Doorman module
• Not yet implemented
• Requirement: hot path must be lightning fast to handle high volume of requests (most exposed component)
• nginx module
• Classifies incoming HTTP requests using signatures
• Give JavaScript puzzles* in response to HTTP requests
  • The more suspicious a request is, the harder the puzzle is
• Once visitor solves puzzle, put request in the queue
  • If queue gets too big, increase puzzle complexity
• If the queue is non-empty:
  • If there is an idle worker, then forward a request to Load Balancer
  • If a worker has timed out, then forward a request to Load Balancer
• Send copies of HTTP requests to the Request Cache
  • Signature service analyzes these requests to generate signatures
  • If there is a high volume of requests, then send samples
  • Send the first megabyte of the request along with the size of the request
Beer Garden: Implementation
*Ari Juels and John Brainard, "Client Puzzles: A Cryptographic Countermeasure Against Connection Depletion Attacks," in Proceedings of NDSS '99 (Network and Distributed System Security Symposium), 1999.
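To illustrate how a puzzle can be cheap to verify yet tunably expensive to solve, here is a minimal hash-based sketch in Python. It is an assumption-laden stand-in: the slides call for JavaScript puzzles served by an nginx module, the exact construction in the cited paper differs, and `make_puzzle`, `solve`, `verify`, and `difficulty_for` (including its scaling constants) are hypothetical names and values.

```python
import hashlib
import os
from typing import Tuple


def make_puzzle(difficulty_bits: int) -> Tuple[bytes, int]:
    """Server side: issue a fresh random nonce plus a difficulty."""
    return os.urandom(16), difficulty_bits


def verify(nonce: bytes, difficulty_bits: int, solution: int) -> bool:
    """Server side: one hash checks that the digest starts with enough zero bits."""
    digest = hashlib.sha256(nonce + solution.to_bytes(8, "big")).digest()
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - difficulty_bits) == 0


def solve(nonce: bytes, difficulty_bits: int) -> int:
    """Client side: brute force, about 2**difficulty_bits hashes on average."""
    solution = 0
    while not verify(nonce, difficulty_bits, solution):
        solution += 1
    return solution


def difficulty_for(suspicion: float, queue_length: int,
                   base_bits: int = 8, max_bits: int = 20) -> int:
    """Hypothetical policy: harder puzzles for suspicious requests and long queues.

    `suspicion` is assumed to be a score in [0, 1] from the signature classifier.
    """
    bits = base_bits + round(suspicion * 8) + queue_length // 100
    return min(bits, max_bits)
```

Each extra difficulty bit roughly doubles the solver's expected work while verification stays at a single hash, which is the asymmetry the Doorman relies on.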
Load balancer module
• Mostly implemented
• nginx module
  • Only forwards requests to idle workers
  • Send alerts to kill workers, as needed
• Let A = number of idle workers
• Let B = number of “spare” workers
• There should always be at least B idle workers.
  • A should be >= B
• If A < B, then choose the request that has been in the system the longest, and send an alert for that worker to Alert Router. (That worker will be killed.)
• Nginx notifies Load Balancer every time a request completes.
  • Send “request complete” message for each successfully completed request to Alert Router. (So that Signature service can know which requests are low density.)
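The A/B rule can be sketched as a small Python function. This only illustrates the decision logic, not the nginx module; `choose_victim` and the `in_flight` mapping (worker id to request arrival time) are hypothetical.

```python
from typing import Dict, Optional


def choose_victim(in_flight: Dict[int, float],
                  idle_workers: int,
                  spare_workers: int) -> Optional[int]:
    """Return the worker to kill, or None if enough workers are idle.

    in_flight maps worker id -> time its current request entered the system.
    idle_workers is A and spare_workers is B in the slide's notation.
    """
    if idle_workers >= spare_workers or not in_flight:
        return None
    # A < B: pick the worker whose request has been in the system the longest.
    return min(in_flight, key=in_flight.get)
```

For example, with two workers busy since times 10.0 and 12.5, no idle workers, and one required spare, the worker busy since 10.0 is selected.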
Alert Router
• Mostly implemented
• Python service
  • Reads messages from Load Balancer via named pipe
  • Sends messages via Thrift RPC
• Receives two kinds of messages:
  • Alerts to kill workers
  • “Request complete” messages
• Forwards alerts to:
  • Bouncer, so it can kill (and restart) the worker
  • Signature Service, so it knows which requests have high density
• Forwards “request complete” messages to:
  • Signature Service, so it knows which requests have low density
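The routing rules fit in a short dispatch function. This sketch deliberately replaces the named pipe and Thrift RPC with plain Python callables, and the message layout (a dict with a `kind` field) is an assumption, since the slides do not specify the wire format.

```python
from typing import Callable, Dict


def route(message: Dict[str, str],
          notify_bouncer: Callable[[Dict[str, str]], None],
          label_request: Callable[[Dict[str, str], str], None]) -> None:
    """Forward one Load Balancer message to the right consumers.

    notify_bouncer and label_request stand in for RPC stubs to the Bouncer
    and the Signature Service (hypothetical interfaces).
    """
    kind = message["kind"]
    if kind == "alert":
        notify_bouncer(message)                 # worker will be killed and restarted
        label_request(message, "high-density")  # the request timed out
    elif kind == "request_complete":
        label_request(message, "low-density")   # the request finished normally
    else:
        raise ValueError(f"unknown message kind: {kind!r}")
```

Keeping the router this thin means the Load Balancer only ever writes to one pipe, and the labeling of requests as high or low density falls out of which message type was seen.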
Bouncer Process Manager
• Mostly implemented
• One bouncer per backend machine
• Monitors worker processes
  • Restarts workers when they crash (or are killed)
• Thrift service, implemented in Python
  • Receives alerts via Thrift RPC
• When Bouncer receives an alert, it kills the selected worker (because it timed out).
  • Automatically restarts it
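A process manager with this kill-and-restart behavior can be sketched with Python's `subprocess` module. The sketch omits the Thrift interface, and `WORKER_CMD` is a stand-in (a Python process that just sleeps) for a real FastCGI worker; the class and method names are hypothetical.

```python
import subprocess
import sys
from typing import List

# Stand-in for a FastCGI worker: a child process that simply sleeps.
WORKER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


class Bouncer:
    """Keeps a fixed pool of worker processes alive (hypothetical sketch)."""

    def __init__(self, worker_cmd: List[str], num_workers: int) -> None:
        self.worker_cmd = worker_cmd
        self.workers = [self._spawn() for _ in range(num_workers)]

    def _spawn(self) -> subprocess.Popen:
        return subprocess.Popen(self.worker_cmd)

    def on_alert(self, index: int) -> None:
        """Kill the selected worker (its request timed out) and restart it."""
        old = self.workers[index]
        old.kill()
        old.wait()  # reap the process so it does not linger as a zombie
        self.workers[index] = self._spawn()

    def restart_crashed(self) -> None:
        """Replace any worker that has exited on its own."""
        for i, worker in enumerate(self.workers):
            if worker.poll() is not None:
                self.workers[i] = self._spawn()

    def shutdown(self) -> None:
        for worker in self.workers:
            worker.kill()
            worker.wait()
```

A real deployment would call `restart_crashed` periodically or react to `SIGCHLD`; polling is used here only to keep the sketch short.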
Signature Service
• Request Cache
  • Will be implemented as instance of memcached
  • Keeps a cache of text from HTTP requests
• Signature Service
  • Will be implemented as Thrift service in Python
  • The Alert Router tells the Signature Service which requests are high-density and which are low-density.
  • The Signature Service periodically analyzes the recent examples of high- and low-density requests to learn their characteristics
  • Generates signatures for high-density requests and submits them to Doorman
  • Requirements:
    • Classifying requests using signatures must be lightning fast
    • Code to classify requests must either exist in C or be sufficiently simple (so I can implement them in C)
    • Generating signatures must not be too slow
    • Analyze relevant features, develop good signatures
  • Machine learning algorithms TBD
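Since the learning algorithm is still TBD, the following Python sketch only illustrates the shape of the problem with a deliberately naive stand-in: a signature is a token that is common in high-density examples and absent from low-density ones. The functions, the tokenization, and the `min_support` threshold are all assumptions, not the planned design.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def tokens(request_text: str) -> Set[str]:
    """Split request text into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", request_text.lower()))


def generate_signatures(high: List[str], low: Iterable[str],
                        min_support: float = 0.75) -> Set[str]:
    """Tokens seen in at least min_support of high-density examples
    and in none of the low-density examples."""
    if not high:
        return set()
    high_counts = Counter(t for request in high for t in tokens(request))
    low_tokens = set().union(*(tokens(request) for request in low))
    return {t for t, count in high_counts.items()
            if count / len(high) >= min_support and t not in low_tokens}


def suspicion(request_text: str, signatures: Set[str]) -> int:
    """Classification is a set intersection, simple enough to port to C."""
    return len(tokens(request_text) & signatures)
```

Whatever algorithm is eventually chosen, the split matters: signature generation can be slow and run offline in Python, while classification has to stay trivial because it sits on the Doorman's hot path.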
Backup algorithms
• Complementary to Beer Garden
• When overload occurs, flip a switch that replaces poor worst-case algorithms with good worst-case algorithms
• What kind of algorithms?
  • Approximate algorithms
  • Algorithms that are less complete
  • Algorithms that have poor average-case performance
  • Algorithms that exhibit worst-case performance under different conditions
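As one illustration of a "less complete" backup algorithm, the Python sketch below bounds the work of a search while the server is overloaded and reports whether the result is complete. The `search` function, its `overloaded` flag, and the `budget` default are hypothetical; how the switch gets flipped is outside the sketch.

```python
from typing import List, Sequence, Tuple


def search(documents: Sequence[str], term: str,
           overloaded: bool, budget: int = 100) -> Tuple[List[int], bool]:
    """Return (indexes of matching documents, whether the scan was complete).

    Normal mode scans every document. Backup mode scans at most `budget`
    documents, so its worst-case cost no longer depends on the corpus size.
    """
    limit = min(budget, len(documents)) if overloaded else len(documents)
    hits = [i for i, doc in enumerate(documents[:limit]) if term in doc]
    return hits, limit == len(documents)
```

Returning the completeness flag lets the caller tell users that results are partial instead of silently degrading.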
Appendix 1: Near-term solutions
Randomized algorithms
• Let’s say you must always use an algorithm that has bad worst-case performance
• Is it easy to intentionally trigger worst-case performance?
• Can you make it hard to intentionally trigger worst-case performance?
• Examples:
  • Shuffle before quicksort
  • Randomize hash seed
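The first example can be sketched in Python as follows. The quicksort below uses the first element as pivot, so already-sorted input is its worst case; shuffling first with an unpredictable random source means an attacker can no longer choose which input triggers that case. The function names are illustrative.

```python
import random
from typing import List, Optional


def quicksort(items: List[int]) -> List[int]:
    """First-element-pivot quicksort: O(n^2) on already-sorted input."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


def shuffled_quicksort(items: List[int],
                       rng: Optional[random.Random] = None) -> List[int]:
    """Shuffle a copy of the input before sorting it."""
    rng = rng or random.SystemRandom()  # OS entropy, not a guessable seed
    work = list(items)
    rng.shuffle(work)
    return quicksort(work)
```

The second example has a direct counterpart in CPython, which randomizes the hash seed for `str` and `bytes` per process by default (controllable through the `PYTHONHASHSEED` environment variable).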
Approximate Beer Garden
• Beer Garden is ambitious
  • Generic defense
  • Fully automated
  • Easy configuration
  • Security guarantees
• An application-specific approximation of Beer Garden will be much easier to implement and still be valuable in practice
• Approximate Signature Service
  • Heuristically detect high-density requests
  • Which requests in your app have potential for high density?
  • Allow admin to manually specify signatures during emergencies
• Approximate Doorman: try to allocate resources “securely”
  • Give logged-in users preference
  • Each “identity” (IP address or username) gets a certain number of requests per minute
  • Give non-suspicious requests preferential treatment. For example:
    • Quarantine suspicious requests: if you have 10 backend machines, send the suspicious requests to 1 designated backend. Send all other requests to the remaining 9.
• Approximate Bouncer
  • During overloads, increase aggressiveness of timeouts
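Two of the Approximate Doorman ideas, per-identity quotas and quarantining suspicious requests on one designated backend, can be combined in a short Python sketch. Everything here is hypothetical: the class name, the sliding one-minute window, round-robin over the remaining backends, and the `is_suspicious` callback that an application would supply.

```python
import collections
from typing import Callable, Deque, Dict, List, Optional


class ApproximateDoorman:
    """Per-identity rate limit plus quarantine routing (illustrative sketch)."""

    def __init__(self, backends: List[str], limit_per_minute: int,
                 is_suspicious: Callable[[str], bool]) -> None:
        self.quarantine = backends[0]   # designated backend for suspicious requests
        self.normal = backends[1:]      # everything else is spread over the rest
        self.limit = limit_per_minute
        self.is_suspicious = is_suspicious
        self.history: Dict[str, Deque[float]] = collections.defaultdict(collections.deque)
        self._next = 0

    def route(self, identity: str, request: str, now: float) -> Optional[str]:
        """Return the backend for this request, or None if the identity is over quota."""
        window = self.history[identity]
        while window and now - window[0] >= 60.0:
            window.popleft()            # forget requests older than one minute
        if len(window) >= self.limit:
            return None
        window.append(now)
        if self.is_suspicious(request):
            return self.quarantine
        backend = self.normal[self._next % len(self.normal)]
        self._next += 1
        return backend
```

If the heuristic misfires, the damage is contained: a wrongly flagged request is still served, just by the quarantine backend, and an attack that matches the heuristic can overload only that one machine.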
Service-Oriented Architecture
• Services provide performance isolation
• Instead of embedding “dangerous” algorithms in application code, put each in a separate service.
  • E.g. a “quicksort” service
• If that service gets overloaded, then that feature is no longer available
  • But everything else should work
  • Application should be developed to gracefully handle crashed services
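Graceful handling of a crashed service might look like the Python sketch below. The remote "quicksort" service is represented by a plain callable that raises `ConnectionError` or `TimeoutError` when unavailable; the function names and the page format are made up for illustration.

```python
from typing import Callable, List, Optional


def sort_via_service(items: List[int],
                     sort_service: Callable[[List[int]], List[int]]) -> Optional[List[int]]:
    """Call the (hypothetical) remote sorting service; None means unavailable."""
    try:
        return sort_service(items)
    except (ConnectionError, TimeoutError):
        return None


def render_page(items: List[int],
                sort_service: Callable[[List[int]], List[int]]) -> str:
    """Render the page with sorted items, or unsorted if the service is down."""
    ranked = sort_via_service(items, sort_service)
    if ranked is None:
        return "items: " + ", ".join(map(str, items)) + " (sorting temporarily unavailable)"
    return "items: " + ", ".join(map(str, ranked))
```

The page still renders when the service is down; only the sorting feature is lost, which is the isolation property the slide describes.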
Related Work
• For other ideas, see the related work section in http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf
Linux-kernel vulnerability
• Attack packets cause collisions in hash table in Linux kernel
  • Hash table operations normally O(1)
  • During attack O(n)
• http://www.enyo.de/fw/security/notes/linux-dst-cache-dos.html
[Figure: attack packets enter through the network device driver and reach the routing decision, which delivers or forwards each packet. The routing cache is implemented as a hash table.]
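The O(1) to O(n) degradation can be reproduced with any hash table. The Python sketch below uses a `dict` rather than the kernel's routing cache, and counts key comparisons while inserting keys that either hash normally or all share one hash value; the `Key` class and `comparisons_to_insert` helper are illustrative.

```python
class Key:
    """Dictionary key whose hash can be forced to collide."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value: int, colliding: bool) -> None:
        self.value = value
        self._hash = 0 if colliding else hash(value)

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def comparisons_to_insert(n: int, colliding: bool) -> int:
    """Insert n distinct keys into a dict and return how many comparisons it took."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, colliding)] = True
    return Key.eq_calls
```

With well-distributed hashes, inserts need few or no comparisons. When every key collides, each insert has to be compared against the keys already present, so the total work grows roughly quadratically with the number of attacker-chosen keys.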
Appendix 2: Examples
Wikipedia high-density accident (1/2)
• On June 25, 2009 rumors of Michael Jackson’s death led to an increase of traffic to his Wikipedia page
• Because Jackson’s page contained an unusually complex subsection, rendering the page caused Wikipedia’s servers to consume an excessive amount of CPU resources—leading to a site-wide DoS.
http://dom.as/2009/06/26/embarrassment/
http://blog.wikimedia.org/2009/06/25/current-events/
Wikipedia high-density accident (2/2)
A negligible increase in network traffic (300 packets per second) caused CPU usage to go over capacity, resulting in a DoS
Floating point bug
• A bug in both Java and PHP language runtimes
  • If you tried to parse a particular string as a floating point number, it would cause an infinite loop
  • Practical significance: unauthenticated users can cause any Java or PHP web application to crash by giving it a particular floating-point value in the header
• PHP runtime: 545-line function zend_strtod
  • Source code for zend_strtod is almost correct
  • But the compiled code performs double-precision arithmetic on an extended-precision number
    • Number converges before it is sufficiently precise (an erroneous fixed point)
  • The bug fix simply declares the variable as volatile
    • Forces the use of double-precision numbers

for (;;) { incrementally adjust number until it is sufficiently precise }