Beer Garden: A defense against high-density attacks

Michael N. Gagnon
Founder and Director, HellaSec LLC
[email protected]

This work was funded by DARPA’s Cyber Fast Track program. Distribution Statement “A” (Approved for Public Release, Distribution Unlimited). The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.


Contents

• What is a high-density attack?
• Beer Garden defense
  • Theory
  • Demo
  • Implementation
• Appendix 1: Near-term solutions
• Appendix 2: Examples

What is a high-density attack?


Background: conventional DoS

• A server is powerful
• It takes an army of PCs to take down one server

1. Each PC sends as much traffic as possible
2. This traffic overloads the server
3. The server becomes unresponsive

[Figure: attack traffic from many PCs overloading one server]

High-density attacks

• It takes one PC to take down a single server by sending “high-density” attack traffic

1. A single attacker sends attack traffic
2. This traffic overloads the server
3. The server becomes unresponsive

[Figure: high-density attack traffic from a single attacker overloading the server]

Density

• Mass = resources consumed
• Volume = number of requests
• Density = resources consumed per request
  • Ratio of mass to volume
• Examples of low-density requests
  • Most legitimate traffic
  • Conventional DoS traffic
• Examples of high-density requests
  • Algorithmic-complexity attacks
  • Legitimate requests for expensive operations
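The mass, volume, and density definitions can be made concrete with a minimal Python sketch. The request types and CPU figures are invented for illustration, and "mass" is assumed to be CPU milliseconds, although any resource would do.

```python
from collections import defaultdict


def density_by_type(log):
    """Return resources consumed per request (mass / volume) for each request type.

    `log` is an iterable of (request_type, cpu_ms) pairs. Both the record
    layout and the choice of CPU time as "mass" are assumptions.
    """
    mass = defaultdict(float)   # total resources consumed per type
    volume = defaultdict(int)   # number of requests per type
    for request_type, cpu_ms in log:
        mass[request_type] += cpu_ms
        volume[request_type] += 1
    return {t: mass[t] / volume[t] for t in mass}


# Hypothetical log: many cheap page views and two expensive searches.
example_log = [("view", 2.0)] * 1000 + [("search", 900.0)] * 2
densities = density_by_type(example_log)
```

In this made-up log the two searches consume nearly as much CPU as the thousand page views, which is what makes them high-density.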

How do they work?

• Trigger exceptional resource usage. For example:
  • Cause poor algorithmic performance, i.e. an “algorithmic-complexity attack”
  • Trigger an infinite-loop bug
  • See Appendix 2 for details and more examples
• What types of resources?
  • CPU
  • Memory
  • Bandwidth
  • Disk
  • “Virtual resources” (e.g. connections)

Are you at risk?

• A dubious “best practice”: not planning for worst-case performance because you assume it is sufficiently rare
  • This assumption is unrealistic: you do not know the probability distribution of your algorithm’s inputs
  • Inputs could become accidentally skewed
  • An attacker could give you worst-case input
• You are most at risk if you have algorithms with poor worst-case performance that you do not regularly experience
  • And it is easy to intentionally trigger worst-case performance

Beer Garden: Theory

A defense against CPU-bound high-density attacks that target web applications and web services.

Ambitious Goals

• Generic
• Fully automated
• Easy configuration
• Security guarantees

General idea

• Treat the server like a crowded beer garden
• Doorman: “you have to pay to enter”
  • Limits the volume of attack requests that are admitted
• Bouncer: “you need to leave now”
  • Limits the damage of admitted attack requests

http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf

Operation during overload

• One FastCGI worker process per core
  • Each worker can only handle 1 request at a time
  • (Also keep a few “spare” workers on deck)
• Doorman keeps a queue of requests
  • Only forwards a request to a worker process if it is idle
  • If there are no idle workers, and a request has timed out, then ask the Bouncer to evict that request
  • During overloads, timeouts are very aggressive
  • Keep the queue short by insisting each visitor solve a computational “puzzle”
• Signature service
  • Learns to identify the “density” of requests (real-time machine learning)
  • Doorman creates harder puzzles for suspicious requests
• Bouncer
  • Kill (and restart) workers when Doorman asks
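The queue-and-evict behaviour during overload can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the class and method names are hypothetical, the Bouncer is reduced to a callback, and time is passed in explicitly so the logic is easy to follow.

```python
from collections import deque


class Doorman:
    """Toy model of the overload logic; every name here is hypothetical."""

    def __init__(self, worker_ids, timeout, evict):
        self.queue = deque()                              # admitted requests waiting for a worker
        self.busy_since = {w: None for w in worker_ids}   # None means the worker is idle
        self.timeout = timeout                            # seconds a request may run during overload
        self.evict = evict                                # callback standing in for the Bouncer

    def admit(self, request):
        """Queue a request whose puzzle has been solved."""
        self.queue.append(request)

    def finish(self, worker):
        """Mark a worker idle after its request completes."""
        self.busy_since[worker] = None

    def dispatch(self, now):
        """Forward queued requests to workers; return a list of (worker, request)."""
        assigned = []
        while self.queue:
            worker = self._idle_worker()
            if worker is None:
                worker = self._evict_timed_out(now)
            if worker is None:
                break                                     # all workers busy and within their timeout
            self.busy_since[worker] = now
            assigned.append((worker, self.queue.popleft()))
        return assigned

    def _idle_worker(self):
        for worker, since in self.busy_since.items():
            if since is None:
                return worker
        return None

    def _evict_timed_out(self, now):
        for worker, since in self.busy_since.items():
            if since is not None and now - since > self.timeout:
                self.evict(worker)                        # Bouncer kills and restarts this worker
                self.busy_since[worker] = None
                return worker
        return None
```

With one worker and a 1-second timeout, a second request waits until the first either finishes or exceeds the timeout, at which point the first is evicted and the second is forwarded.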

Logical flow of a bad request


Security Guarantees

• During an attack:
  • At least 95% of legitimate requests will be serviced within 250 ms
  • At least 3,000 low-density requests can be serviced per second (assuming the attacker can solve at most 30 puzzles per second)
• Actual values depend on the application, available resources, and Beer Garden configuration
  • Use our “trainer” tool to determine security guarantees
• Depends on assumptions

http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf

Beer Garden: Demo

Beer Garden: Implementation

git clone git://github.com/mikegagnon/nginx-overload-handler.git


Architecture


Doorman module

• Not yet implemented
• Requirement: the hot path must be lightning fast to handle a high volume of requests (most exposed component)
• nginx module
• Classifies incoming HTTP requests using signatures
• Gives JavaScript puzzles* in response to HTTP requests
  • The more suspicious a request is, the harder the puzzle is
• Once the visitor solves the puzzle, put the request in the queue
  • If the queue gets too big, increase puzzle complexity
• If the queue is non-empty:
  • If there is an idle worker, then forward a request to the Load Balancer
  • If a worker has timed out, then forward a request to the Load Balancer
• Sends copies of HTTP requests to the Request Cache
  • The Signature Service analyzes these requests to generate signatures
  • If there is a high volume of requests, then send samples
  • Send the first megabyte of the request along with the size of the request

*Ari Juels and John Brainard, "Client Puzzles: A Cryptographic Countermeasure Against Connection Depletion Attacks," in Proceedings of NDSS '99 (Network and Distributed System Security Symposium), 1999.
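One way to picture "harder puzzles for more suspicious requests" is a hash-based proof of work whose difficulty grows with a suspicion score. The sketch below is an assumption-laden illustration, not the Doorman's design: it is written in Python rather than JavaScript, it is not the Juels and Brainard construction, and the function names, the `[0, 1]` suspicion score, and the bit counts are all hypothetical.

```python
import hashlib
import os


def difficulty_bits(suspicion, base_bits=8, max_extra_bits=8):
    """Map a suspicion score in [0, 1] to a required number of leading zero bits."""
    suspicion = min(max(suspicion, 0.0), 1.0)      # clamp out-of-range scores
    return base_bits + round(suspicion * max_extra_bits)


def make_puzzle():
    """Return a fresh random challenge as a hex string."""
    return os.urandom(16).hex()


def is_solution(challenge, nonce, bits):
    """Check that SHA-256(challenge:nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge, bits):
    """Brute-force search a visitor would run; expected work is about 2**bits hashes."""
    nonce = 0
    while not is_solution(challenge, nonce, bits):
        nonce += 1
    return nonce
```

Verification costs the server a single hash, while each extra difficulty bit doubles the visitor's expected work, so raising the difficulty for suspicious requests limits how many of them one attacker can get into the queue.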

Load balancer module

• Mostly implemented
• nginx module
  • Only forwards requests to idle workers
  • Sends alerts to kill workers, as needed
• Let A = number of idle workers
• Let B = number of “spare” workers
• There should always be at least B idle workers
  • A should be >= B
• If A < B, then choose the request that has been in the system the longest, and send an alert for that worker to the Alert Router (that worker will be killed)
• nginx notifies the Load Balancer every time a request completes
  • Send a “request complete” message for each successfully completed request to the Alert Router (so that the Signature Service knows which requests are low density)
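The spare-worker rule (keep A >= B, otherwise evict the longest-running request) can be sketched in a few lines of Python. The real component is an nginx module; the dictionary representation and the function name below are assumptions made for illustration.

```python
def choose_victim(workers, spare):
    """Return the worker to kill, or None if at least `spare` workers are idle.

    `workers` maps a worker id to the time its current request started, or to
    None if the worker is idle (a hypothetical representation).
    """
    idle = sum(1 for started in workers.values() if started is None)   # A
    if idle >= spare:                                                   # A >= B: nothing to do
        return None
    busy = {w: started for w, started in workers.items() if started is not None}
    if not busy:
        return None
    # The request that has been in the system the longest started earliest.
    return min(busy, key=busy.get)
```

For example, with three busy workers whose requests started at times 3.0, 5.0, and 1.0 and one required spare, the worker whose request started at 1.0 is selected.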

Alert Router

• Mostly implemented
• Python service
  • Reads messages from the Load Balancer via a named pipe
  • Sends messages via Thrift RPC
• Receives two kinds of messages:
  • Alerts to kill workers
  • “Request complete” messages
• Forwards alerts to:
  • Bouncer, so it can kill (and restart) the worker
  • Signature Service, so it knows which requests have high density
• Forwards “request complete” messages to:
  • Signature Service, so it knows which requests have low density
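The routing rules amount to a small fan-out function. In this hedged sketch the named pipe and the Thrift RPC calls are replaced by plain Python callables, and the message tuple layout is invented for illustration.

```python
def route(message, bouncer, signature_service):
    """Fan one Load Balancer message out to the services that need it.

    `message` is a (kind, worker_id, request_id) tuple. The tuple layout and
    both callables are hypothetical stand-ins for the pipe and RPC endpoints.
    """
    kind, worker_id, request_id = message
    if kind == "alert":
        bouncer(worker_id)                        # kill and restart the worker
        signature_service(request_id, "high")     # the request had to be evicted
    elif kind == "complete":
        signature_service(request_id, "low")      # the request finished normally
    else:
        raise ValueError(f"unknown message kind: {kind!r}")
```

Keeping the router this thin means the Bouncer and the Signature Service never need to know about each other.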

Bouncer Process Manager

• Mostly implemented
• One Bouncer per backend machine
• Monitors worker processes
  • Restarts workers when they crash (or are killed)
• Thrift service, implemented in Python
  • Receives alerts via Thrift RPC
• When the Bouncer receives an alert, it kills the selected worker (because it timed out)
  • Automatically restarts it
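The kill-and-restart behaviour can be sketched with Python's standard `subprocess` module. This is a minimal illustration, not the project's code: the class, its method names, and the sleeping stand-in worker are assumptions, and the Thrift interface is omitted.

```python
import subprocess
import sys


class Bouncer:
    """Keeps one worker process per slot and replaces workers on request."""

    def __init__(self, command, slots):
        self.command = command
        self.workers = {slot: subprocess.Popen(command) for slot in slots}

    def handle_alert(self, slot):
        """Kill the worker in `slot` (its request timed out) and start a fresh one."""
        old = self.workers[slot]
        old.kill()
        old.wait()                                  # reap the process so no zombie is left
        self.workers[slot] = subprocess.Popen(self.command)

    def restart_crashed(self):
        """Restart any worker that has exited on its own."""
        for slot, proc in self.workers.items():
            if proc.poll() is not None:
                self.workers[slot] = subprocess.Popen(self.command)

    def shutdown(self):
        """Stop every worker."""
        for proc in self.workers.values():
            proc.kill()
            proc.wait()


# Stand-in worker: a Python process that only sleeps. A real deployment would
# launch the FastCGI worker command here instead.
WORKER_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Killing the whole process is a blunt instrument, but it bounds the damage of an admitted high-density request regardless of what the worker was doing.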

Signature Service

• Request Cache
  • Will be implemented as an instance of memcached
  • Keeps a cache of text from HTTP requests
• Signature Service
  • Will be implemented as a Thrift service in Python
  • The Alert Router tells the Signature Service which requests are high-density and which are low-density
  • The Signature Service periodically analyzes the recent examples of high- and low-density requests to learn their characteristics
  • Generates signatures for high-density requests and submits them to the Doorman
  • Requirements:
    • Classifying requests using signatures must be lightning fast
    • Code to classify requests must either exist in C or be sufficiently simple (so I can implement it in C)
    • Generating signatures must not be too slow
    • Analyze relevant features, develop good signatures
  • Machine learning algorithms TBD
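The slides leave the learning algorithm open, so the following Python sketch only illustrates the shape of the interface: labelled requests go in, cheap-to-check signatures come out. The token-set heuristic, the function names, and the `min_support` threshold are placeholders chosen for illustration, not the planned algorithm.

```python
import re
from collections import Counter


def tokens(request_text):
    """Split raw request text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", request_text.lower()))


def generate_signatures(high_density, low_density, min_support=0.8):
    """Return tokens seen in most high-density requests and in no low-density request."""
    if not high_density:
        return set()
    counts = Counter(t for text in high_density for t in tokens(text))
    seen_in_low = set().union(*(tokens(text) for text in low_density))
    needed = min_support * len(high_density)
    return {t for t, n in counts.items() if n >= needed and t not in seen_in_low}


def suspicion(request_text, signatures):
    """Fraction of signatures present in the request, from 0.0 to 1.0."""
    if not signatures:
        return 0.0
    return len(tokens(request_text) & signatures) / len(signatures)
```

Classification here is a set intersection, which is simple enough to port to C, in line with the stated requirement that the classifier be trivial even if signature generation is not.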

Appendix 1: Near-term solutions

Backup algorithms

• Complementary to Beer Garden
• When overload occurs, flip a switch that replaces algorithms that have poor worst-case performance with algorithms that have good worst-case performance
• What kind of algorithms?
  • Approximate algorithms
  • Algorithms that are less complete
  • Algorithms that have poor average-case performance
  • Algorithms that exhibit worst-case performance under different conditions
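As a small illustration of the "less complete" option, the Python sketch below switches a search feature to a variant that examines a bounded number of documents when the overload switch is on. The search feature, the function names, and the limit of 100 are hypothetical.

```python
def search_exact(documents, term):
    """Normal mode: scan every document, so the cost grows with the corpus."""
    return [d for d in documents if term in d]


def search_bounded(documents, term, limit=100):
    """Backup mode: examine at most `limit` documents, trading completeness for bounded work."""
    return [d for d in documents[:limit] if term in d]


def search(documents, term, overloaded, limit=100):
    """Pick the algorithm based on the overload switch."""
    if overloaded:
        return search_bounded(documents, term, limit)
    return search_exact(documents, term)
```

Results under overload may be incomplete, but the work per request no longer depends on how large the corpus is.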

Randomized algorithms

• Let’s say you must always use an algorithm that has bad worst-case performance
• Is it easy to intentionally trigger worst-case performance?
• Can you make it hard to intentionally trigger worst-case performance?
• Examples:
  • Shuffle before quicksort
  • Randomize hash seed
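The "shuffle before quicksort" example can be demonstrated with a deliberately naive first-element-pivot quicksort that counts how many elements it partitions. This is a teaching sketch, not production sorting code; the counter argument is an assumption added to make the cost visible.

```python
import random


def quicksort(items, counter):
    """First-element-pivot quicksort; counter[0] accumulates elements compared to a pivot."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    counter[0] += len(rest)
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


def shuffled_quicksort(items, counter, rng=random):
    """Shuffle first, so the input order chosen by an attacker no longer controls the pivots."""
    items = list(items)
    rng.shuffle(items)
    return quicksort(items, counter)
```

On already-sorted input of length n the plain version performs n(n-1)/2 comparisons, while the shuffled version performs on the order of n log n in expectation whatever order the caller supplies. Python applies the second idea from the slide itself: string hashing is salted per process by default, controlled by the `PYTHONHASHSEED` environment variable.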

Approximate Beer Garden

• Beer Garden is ambitious
  • Generic defense
  • Fully automated
  • Easy configuration
  • Security guarantees
• An application-specific approximation of Beer Garden will be much easier to implement and still be valuable in practice
• Approximate Signature Service
  • Heuristically detect high-density requests
  • Which requests in your app have potential for high density?
  • Allow admin to manually specify signatures during emergencies
• Approximate Doorman: try to allocate resources “securely”
  • Give logged-in users preference
  • Each “identity” (IP address or username) gets a certain number of requests per minute
  • Give non-suspicious requests preferential treatment. For example:
    • Quarantine suspicious requests: if you have 10 backend machines, send the suspicious requests to 1 designated backend. Send all other requests to the remaining 9.
• Approximate Bouncer
  • During overloads, increase the aggressiveness of timeouts
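Two of the Approximate Doorman ideas, a per-identity request budget and quarantining suspicious requests on one backend, can be sketched in Python as follows. The class and function names are hypothetical, the sliding-window policy is one of several reasonable choices, and the designated quarantine backend is assumed to be the last one in the list.

```python
from collections import defaultdict, deque


class IdentityRateLimiter:
    """Allow each identity (IP address or username) a fixed number of requests per window."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)      # identity -> timestamps of admitted requests

    def allow(self, identity, now):
        recent = self.history[identity]
        while recent and now - recent[0] >= self.window:
            recent.popleft()                   # forget requests that left the window
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


def pick_backend(suspicious, backends, key):
    """Send suspicious requests to the last backend; spread the rest over the others.

    Assumes at least two backends.
    """
    if suspicious:
        return backends[-1]
    normal = backends[:-1]
    return normal[hash(key) % len(normal)]
```

With ten backends, a flood of suspicious requests can saturate at most the one quarantine machine while the other nine keep serving everyone else.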

Service-Oriented Architecture

• Services provide performance isolation
• Instead of embedding “dangerous” algorithms in application code, put each in a separate service
  • E.g. a “quicksort” service
• If that service gets overloaded, then that feature is no longer available
  • But everything else should work
  • The application should be developed to gracefully handle crashed services
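Graceful handling of a crashed service can be as simple as catching the failure and serving a degraded page. In this Python sketch the remote "quicksort" service is simulated by a local callable, and the exception type and page structure are invented for illustration; a real service would run in a separate process reached over RPC, which is what provides the isolation.

```python
class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) service client when the service is down or overloaded."""


def render_page(items, sort_service):
    """Render a listing, falling back to unsorted output if the sort service fails."""
    try:
        ordered = sort_service(items)
        note = ""
    except ServiceUnavailable:
        ordered = list(items)                  # degrade: show the items unsorted
        note = "sorting is temporarily unavailable"
    return {"items": ordered, "note": note}


def healthy_sort_service(items):
    return sorted(items)


def overloaded_sort_service(items):
    raise ServiceUnavailable("sort service overloaded")
```

The page still renders when the sort service is down, which is the property the slide asks for: one overloaded feature should not take the rest of the application with it.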

Related Work

• For other ideas, see the related work section in http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf

Appendix 2: Examples


Linux-kernel vulnerability

• Attack packets cause collisions in a hash table in the Linux kernel
  • Hash table operations are normally O(1)
  • During an attack they are O(n)
• http://www.enyo.de/fw/security/notes/linux-dst-cache-dos.html

[Figure: attack packets pass from the network device driver to the routing decision (deliver packet or forward packet); the routing cache is implemented as a hash table]
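The jump from O(1) to O(n) per operation can be reproduced with a toy chained hash table in Python. This is not the kernel's routing cache; the table size and the key sets are assumptions chosen so that, with Python's integer hashing, the attack keys all land in one bucket.

```python
class ChainedHashTable:
    """Fixed-size hash table with separate chaining that counts key comparisons."""

    def __init__(self, buckets=1024):
        self.buckets = [[] for _ in range(buckets)]
        self.comparisons = 0

    def insert(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(chain):
            self.comparisons += 1              # walking the chain is the O(n) part
            if existing == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))


def comparisons_for(keys):
    """Insert every key into a fresh table and return the total comparisons made."""
    table = ChainedHashTable()
    for key in keys:
        table.insert(key, True)
    return table.comparisons


normal_keys = range(1000)                          # each key gets its own bucket
attack_keys = [i * 1024 for i in range(1000)]      # every key maps to bucket 0
```

Inserting the 1,000 normal keys needs no chain walking at all, while the 1,000 colliding keys need 499,500 comparisons, which is the quadratic total an attacker is aiming for.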

Wikipedia high-density accident (1/2)

• On June 25, 2009, rumors of Michael Jackson’s death led to an increase in traffic to his Wikipedia page
• Because Jackson’s page contained an unusually complex subsection, rendering the page caused Wikipedia’s servers to consume an excessive amount of CPU resources, leading to a site-wide DoS

Wikipedia high-density accident (2/2)

• A negligible increase in network traffic (300 packets per second) caused CPU usage to go over capacity, resulting in a DoS

http://dom.as/2009/06/26/embarrassment/
http://blog.wikimedia.org/2009/06/25/current-events/

Floating point bug

• A bug in both the Java and PHP language runtimes
  • If you tried to parse a particular string as a floating-point number, it would cause an infinite loop
  • Practical significance: unauthenticated users can cause any Java or PHP web application to crash by giving it a particular floating-point value in the header
• PHP runtime: 545-line function zend_strtod
  • Source code for zend_strtod is almost correct
  • But the compiled code performs double-precision arithmetic on an extended-precision number
    • The number converges before it is sufficiently precise (an erroneous fixed point)
  • The bug fix simply declares the variable as volatile
    • Forces the use of double-precision numbers

for (;;) {
    incrementally adjust number until it is sufficiently precise
}