
Computer Science

Post-Attack Analysis of Unknown Vulnerabilities

Peng Ning

With Emre C. Sezer, Chongkyung Kil, and Jun Xu


Nov 14, 2007 | 2007 GMU-CSA Workshop | Computer Science

Motivation

• Vulnerability analysis
  – Essential for
    • Patching
    • Vulnerability-based signature generation
  – Painstakingly slow
    • Depends on human efforts
• Existing approaches
  – Static analysis (e.g., [Chen et al. 04], [Feng et al. 04], [Larochelle & Evans 01])
    • False positives
  – Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
    • Used for detection; inadequate vulnerability information
  – Symbolic execution (e.g., Exe [Cadar et al. 06], DACODA [Crandall et al. 05])
    • Scalability issues
  – Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Lacosto et al. 07])
    • Change of application semantics


MemSherlock

• MemSherlock is an automated debugger
  – Automated analysis of unknown memory corruption vulnerabilities
  – Appeared in ACM CCS ’07
• MemSherlock provides
  – Statement that causes the memory corruption
  – Dynamic program slice leading to the corruption
  – Program variables involved in the vulnerability
  – All presented at programming language level
• Implications
  – Generating vulnerability conditions
  – Improves signature or patch generation speed


General Framework: Web Application Example

[Figure: general framework diagram. Labels: Traffic, Light-weight IDS, Program, Logger, Trigger, Replayer, Instrumented Program, MemSherlock.]


MemSherlock Overview

• Goal is to provide vulnerability information
  – Intuitive, easy to understand for the programmer
• Not only the corruption point
  – Slice of program involved in the vulnerability
  – Effects of user inputs
  – Program variables involved
  – Variable relationships (e.g., pointer aliasing)
  – Type of vulnerability (e.g., stack buffer overflow)
• MemSherlock performs two important tasks
  – Finding the corruption point
  – Tracking program state


MemSherlock: Finding Corruption Point

• Observation: A memory object is modified by a small set of statements (inspired by AccMon)

• For a memory object m, the write set WS(m) is the set of statements that legitimately modify m

• Security Condition: Memory object m should only be updated by statements in WS(m)
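The security condition can be illustrated with a minimal C sketch. It assumes a statement is identified by its source line only; the names `WriteSet`, `ws_contains`, and `violates_security_condition` are hypothetical and are not taken from MemSherlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation: a statement is identified by its source line. */
typedef struct {
    const int *lines;   /* statements that may legitimately modify the object */
    size_t count;
} WriteSet;

/* Returns true when statement `line` is a member of the write set. */
bool ws_contains(const WriteSet *ws, int line) {
    assert(ws != NULL);
    for (size_t i = 0; i < ws->count; i++) {
        if (ws->lines[i] == line) {
            return true;
        }
    }
    return false;
}

/* Security condition: a write to m is a violation unless the writing
 * statement is in WS(m). */
bool violates_security_condition(const WriteSet *ws_m, int line) {
    return !ws_contains(ws_m, line);
}
```

With WS(m) = {3, 7}, a write from line 7 passes the check, while a write from line 9 is flagged as a violation.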


MemSherlock: Assembly Line

• Pre-Debugging Phase
  – Instruments the program for debugging phase
  – Extracts program information via static analysis
  – Needs to be performed once
• Debugging Phase
  – Tracks program state
  – Monitors memory writes and checks for violation of security condition
  – Tracks tainted data and its propagation


MemSherlock Architecture

[Figure: MemSherlock architecture, pre-debugging phase highlighted. Labels: Original source files, Source Code Rewriting, Static Analyzer, Library specification, Compiler (CC), Program executable, Debugging information (proc/var/addr table), Malicious input, Debugging Agent, Vulnerability information.]


Pre-debugging: Generating Write Sets

• MemSherlock analyses source code to determine write sets
• For a program variable v, WS(v) includes
  – Assignment statements (i.e., v=expr)
  – Library function calls where v is passed as an argument that can be modified (e.g., memcpy(&v, src, n))
• MemSherlock treats DLLs as black boxes
  – Assumption: A DLL is internally secure, but externally insecure
    • e.g., no stack overflows in the library functions
    • Sound for common, well tested libraries (e.g., clib)
  – Requires library specifications
  – For each DLL, a list of functions and the arguments they might modify
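A library specification of this kind could be sketched in C as a table that maps each function to the arguments it may write through. The layout (`LibSpecEntry`, a bitmask of argument positions) and the helper `call_may_modify` are assumptions made for illustration; the slides do not describe MemSherlock's actual specification format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical specification entry: bit i of modified_args is set when the
 * function may write through its i-th argument (0-based). */
typedef struct {
    const char *function;
    unsigned modified_args;
} LibSpecEntry;

/* Illustrative entries for a few standard C library functions. */
static const LibSpecEntry LIBC_SPEC[] = {
    { "memcpy", 1u << 0 },   /* writes through dest */
    { "strcpy", 1u << 0 },   /* writes through dest */
    { "memset", 1u << 0 },   /* writes through s */
    { "strlen", 0u      },   /* reads only */
};

/* Returns 1 if a call to `function` may modify argument `arg_index`, which
 * means the call site belongs in the write set of the variable passed there.
 * Functions missing from the table are treated as modifying nothing. */
int call_may_modify(const char *function, unsigned arg_index) {
    assert(function != NULL);
    if (arg_index >= 32u) {
        return 0;
    }
    size_t n = sizeof LIBC_SPEC / sizeof LIBC_SPEC[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(LIBC_SPEC[i].function, function) == 0) {
            return (int)((LIBC_SPEC[i].modified_args >> arg_index) & 1u);
        }
    }
    return 0;
}
```

In this sketch a missing entry makes a legitimate library write look like a violation, so the completeness of the table matters.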


Dealing with Pointers

• For a pointer variable p two write sets are kept
  – WS(p): statements that modify p
  – WS(ref(p)): statements that modify the referent (e.g., *p=5)
• ref(p) is resolved during runtime (debugging)
• Perform the same analysis for pointer-type function arguments at function calls
  – Removes the requirement for inter-procedural static analysis

(a) Code example

  1 int i = 0;
  2 int *p = &i;
  3 *p = 1;
  4 p = NULL;

(b) Write sets after static analysis

  WS(i) = {1}
  WS(p) = {2,4}
  WS(ref(p)) = {3}

(c) ref(p) and WS(i) during monitoring

  Line   ref(p)   WS(i)
  1      N/A      {1}
  2      i        {1,3}
  3      i        {1,3}
  4      NULL     {1}
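The run-time behaviour in table (c) can be mimicked with a small C sketch. It assumes a single pointer and line-number write sets; `LineSet`, `Var`, `Pointer`, `pointer_retarget`, and `may_write` are hypothetical names, and the real tool may merge write sets differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINES 8

typedef struct {
    int lines[MAX_LINES];
    size_t count;
} LineSet;

typedef struct {
    const char *name;
    LineSet ws;            /* WS(v) from static analysis */
} Var;

typedef struct {
    LineSet ws_ref;        /* WS(ref(p)) from static analysis */
    const Var *referent;   /* resolved at run time; NULL if p has no tracked referent */
} Pointer;

bool lineset_has(const LineSet *s, int line) {
    assert(s != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (s->lines[i] == line) {
            return true;
        }
    }
    return false;
}

/* Called when monitoring observes that the value of p has changed. */
void pointer_retarget(Pointer *p, const Var *new_referent) {
    p->referent = new_referent;
}

/* Effective write set of v: WS(v), plus WS(ref(p)) while p points at v. */
bool may_write(const Var *v, const Pointer *p, int line) {
    if (lineset_has(&v->ws, line)) {
        return true;
    }
    return p->referent == v && lineset_has(&p->ws_ref, line);
}
```

Following the code example in (a): before line 2 runs, line 3 is not allowed to write i; once p points at i it is; after `p = NULL` it is rejected again.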


Chained Dereferences

• Earlier technique can only handle simple dereferences

• Source code rewriting is used to convert all chained dereferences to simple dereferences

• Any other dereference that is not simple is converted in the same manner

Before rewriting:

  int z;
  int *y = &z;
  int **x = &y;
  **x = 10;

After rewriting:

  int z;
  int *y = &z;
  int **x = &y;
  int *temp = *x;
  *temp = 10;


Output of Pre-debugging Phase

• Simplified program
  – Simplified pointer dereferences
  – Compiled with debugging options
• Input file for the debugger
  – Program variables and their write sets
  – Addresses of global symbols
  – Frame pointer offsets of local variables
  – Other flags that help the debugger


MemSherlock Architecture: Debugging

[Figure: MemSherlock architecture, debugging phase highlighted. Labels: Original source files, Source Code Rewriting, Static Analyzer, Library specification, Compiler (CC), Program executable, Debugging information (proc/var/addr table), Malicious input, Debugging Agent, Vulnerability information.]


Debugging: Dynamic Monitoring

• Runtime monitoring
  – State Maintenance
  – Incorporates taint analysis from TaintCheck
    • Produces a dynamic slice of the program leading to the vulnerability
• Write Checking
  – Monitors and validates memory writes
  – Write sets are file name and line number pairs <f,l>
    • Instruction pointer IP is translated into <f,l>
  – Write sets are associated with program variables
    • A destination address is translated into a program variable


Keeping Program State

• A given memory region may correspond to different program variables depending on program state

• Dynamic monitor keeps track of memory mapping

[Figure: virtual address space in two program states. Program State 1 stack, from the stack base: main, fnc A, fnc B. Program State 2 stack, from the stack base: main, fnc A, fnc C. Each state shows a memory write to 0xABABABAB.]


Debugging: Key Data Structures

• Keeps two lists of memory regions
  – ActiveMemoryRegions
    • Memory corresponding to program variables or their referent memory regions
  – NonWritableRegions
    • Saved registers, return addresses, metadata encapsulating dynamically allocated memory regions


Debugging: State Maintenance

• Function calls/returns (memory)
  – Local variable addresses are calculated and added to ActiveMemoryRegions
  – Locations of the return address and saved registers are added to the NonWritableRegions list
• Heap memory (memory)
  – malloc/free calls are intercepted
  – Allocated memory is added to ActiveMemoryRegions
  – The metadata encapsulating the buffer is added to NonWritableRegions
• Pointer value updates (write sets)
  – Searches ActiveMemoryRegions to find the referent and updates its WS
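The function call/return part of state maintenance might look roughly like the following C sketch. Everything here is an assumption for illustration: the `Region`, `RegionList`, and `LocalVar` types, the fixed-capacity lists, and the frame layout (saved frame pointer at `fp`, return address directly above it). Heap tracking and pointer updates are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64

typedef struct {
    uintptr_t start;
    size_t size;
    const char *label;
} Region;

typedef struct {
    Region items[MAX_REGIONS];
    size_t count;
} RegionList;

/* Hypothetical record produced by the pre-debugging phase. */
typedef struct {
    const char *name;
    ptrdiff_t fp_offset;   /* offset of the variable from the frame pointer */
    size_t size;
} LocalVar;

void region_add(RegionList *list, uintptr_t start, size_t size, const char *label) {
    assert(list->count < MAX_REGIONS);
    list->items[list->count++] = (Region){ start, size, label };
}

/* Drops every region whose start address lies in [lo, hi). */
void region_remove_range(RegionList *list, uintptr_t lo, uintptr_t hi) {
    size_t kept = 0;
    for (size_t i = 0; i < list->count; i++) {
        uintptr_t s = list->items[i].start;
        if (s < lo || s >= hi) {
            list->items[kept++] = list->items[i];
        }
    }
    list->count = kept;
}

/* On function entry: locals become active, control data becomes non-writable. */
void on_function_entry(RegionList *active, RegionList *non_writable,
                       uintptr_t fp, const LocalVar *locals, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Unsigned wraparound makes negative offsets work as expected. */
        uintptr_t addr = fp + (uintptr_t)locals[i].fp_offset;
        region_add(active, addr, locals[i].size, locals[i].name);
    }
    /* Assumed layout: saved frame pointer at fp, return address above it. */
    region_add(non_writable, fp, sizeof(void *), "saved frame pointer");
    region_add(non_writable, fp + sizeof(void *), sizeof(void *), "return address");
}

/* On function return: forget everything that lived in the popped frame. */
void on_function_return(RegionList *active, RegionList *non_writable,
                        uintptr_t frame_lo, uintptr_t frame_hi) {
    region_remove_range(active, frame_lo, frame_hi);
    region_remove_range(non_writable, frame_lo, frame_hi);
}
```

Because regions are added and removed as frames come and go, the same address can resolve to different variables at different times, which is the situation described under Keeping Program State.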


Debugging: Write Checking

• When instruction IP modifies memory m
  – if m is in ActiveMemoryRegions
    • determines the variable v it belongs to
    • converts IP into <f,l>
    • checks if <f,l> is in WS(v)
• If the memory write check fails or m is in NonWritableRegions
  – Marks the operation as a memory corruption
  – Displays the vulnerability information
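The write check can be sketched in C as follows. This is a simplified illustration under stated assumptions: the instruction pointer is taken to be already translated into a `<f,l>` pair (`Stmt`), regions are stored in plain arrays, and the `WRITE_UNTRACKED` outcome for addresses in neither list is an addition of this sketch, since the slides do not say how such writes are treated. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A statement as a <file, line> pair. */
typedef struct {
    const char *file;
    int line;
} Stmt;

typedef struct {
    uintptr_t start;
    size_t size;
    const Stmt *ws;       /* write set of the variable; unused for non-writable regions */
    size_t ws_count;
} Region;

typedef enum { WRITE_OK, WRITE_CORRUPTION, WRITE_UNTRACKED } Verdict;

/* Linear search for the region containing addr; NULL if none does. */
const Region *find_region(const Region *regions, size_t n, uintptr_t addr) {
    for (size_t i = 0; i < n; i++) {
        if (addr >= regions[i].start && addr - regions[i].start < regions[i].size) {
            return &regions[i];
        }
    }
    return NULL;
}

Verdict check_write(const Region *active, size_t n_active,
                    const Region *non_writable, size_t n_non_writable,
                    uintptr_t dest, Stmt writer) {
    assert(writer.file != NULL);
    /* Any write into a non-writable region is a corruption. */
    if (find_region(non_writable, n_non_writable, dest) != NULL) {
        return WRITE_CORRUPTION;
    }
    const Region *v = find_region(active, n_active, dest);
    if (v == NULL) {
        return WRITE_UNTRACKED;
    }
    /* The writing statement must appear in WS(v). */
    for (size_t i = 0; i < v->ws_count; i++) {
        if (v->ws[i].line == writer.line && strcmp(v->ws[i].file, writer.file) == 0) {
            return WRITE_OK;
        }
    }
    return WRITE_CORRUPTION;
}
```

Checking the non-writable list first means a write that runs off the end of a buffer into adjacent metadata is reported even when the writing statement is in the buffer's own write set.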


Generating Vulnerability Information

• The slice of program contributing to the vulnerability
  – Statements that have propagated tainted values
  – Statements that have modified related memory regions
• Dependency between memory objects involved in the vulnerability
  – Points-to analysis shows memory regions and how they were accessed
• Program state
  – Call stack information
  – Write set information
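One way to picture how the statements that propagated tainted values could be reported is a backward walk over per-value taint records. The C sketch below assumes each tainted value remembers the statement that tainted it and a link to the value it was derived from; `TaintNode` and `taint_chain` are hypothetical, and TaintCheck's real bookkeeping is not described in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical taint record: where a value was tainted and what it came from. */
typedef struct TaintNode {
    const char *file;
    int line;
    const struct TaintNode *from;   /* NULL at the original input source */
} TaintNode;

/* Walks from a tainted value back to the input source, storing at most `max`
 * records in `out`. Returns the number of records stored. */
size_t taint_chain(const TaintNode *t, const TaintNode **out, size_t max) {
    assert(out != NULL);
    size_t n = 0;
    for (; t != NULL && n < max; t = t->from) {
        out[n++] = t;
    }
    return n;
}
```

The returned records are ordered from the point of use back to the input source, which is a natural order for listing a propagation chain in a report.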


Example Test Case: Null HTTP

http.c

   91: void ReadPOSTData(int sid) {
       …
  100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
  101: if (conn[sid].PostData==NULL) { ...
  107: do {
  108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
  109: …

Error Report:

  --20361-- Error type: Heap Buffer Overflow
  --20361-- Dest Addr: 3AB3E360
  --20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
  --20361-- Dest address resolved to:
  --20361-- Global variable "heap var" @ 3AB3E280 (size: 224)
  --20361--
  --20361-- Memory allocated by 0x804E531: ReadPOSTData (http.c:100)
  --20361-- TAINTED destination 3AB3E360
  --20361-- Fully tainted from:
  --20361-- 0x804E5C7: ReadPOSTData (http.c:108)
  --20361--
  --20361-- TAINTED size used during allocation
  --20361-- Tainted from:
  --20361-- 0x804E456: ReadPOSTData (http.c:100)
  --20361-- 0x804FBB5: read_header (http.c:153)
  --20361-- 0x805121B: sgets (server.c:211)


Vulnerability Analysis Example

http.c

   91: void ReadPOSTData(int sid) {
   92: char *pPostData;
       ...
  100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
       ...
  107: do {
  108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
       ...

[Diagram labels: Heap Object, Create]


Vulnerability Analysis Example

http.c

  119: int read_header(int sid) {
  121: char line[2048];
       ...
  127: do {
  128: memset(line, 0, sizeof(line));
  129: sgets(line, sizeof(line)-1, conn[sid].socket);
       ...
  153: conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
       ...
  169: if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
  170: ReadPOSTData(sid);

http.c

   91: void ReadPOSTData(int sid) {
   92: char *pPostData;
       ...
  100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
       ...
  107: do {
  108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
       ...

[Diagram labels: Object, Use, Object, Taint]


Vulnerability Analysis Example

http.c

  119: int read_header(int sid) {
  121: char line[2048];
       ...
  127: do {
  128: memset(line, 0, sizeof(line));
  129: sgets(line, sizeof(line)-1, conn[sid].socket);
       ...
  153: conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
       ...
  169: if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
  170: ReadPOSTData(sid);

server.c

  202: int sgets(char *buffer, int max, int fd)
  203: {
       ...
  209: conn[sid].atime=time((time_t*)0);
  210: while (n<max) {
  211: if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) {
       ...

[Diagram labels: Object, Object, Taint, Taint, Create]


Implementation

• Source code is rewritten using CIL (C Intermediate Language)
• CodeSurfer was used to extract program variables and their write sets
  – A commercial static analysis tool
• objdump and dwarfdump were used to extract global symbol information
• Dynamic Monitoring is implemented in Valgrind
  – An open source emulator


Evaluation

• Tested 11 real-world applications with known memory corruption vulnerabilities

• Test cases included
  – Stack/Heap buffer overflow, Format string
  – Both control flow and non-control data attacks
• Testing methodology
  – Programs were run under MemSherlock
  – Exploit programs were used to attack the applications
  – Log and replay were not used


Evaluation Results

Application Name   Vuln. Type   Description                                  Captured?   #FP
GHTTP              S            A small HTTP server                          Yes         7
Icecast            S            An mp3 broadcast server                      Yes         0
Sumus              S            A game server for ‘mus’                      Yes         0
Monit              S            Multi-purpose anomaly detector               Yes         0
Newspost           S            Automatic news posting                       Yes         2
Prozilla           S            A download accelerator for Linux             No          0
NullHTTP           H            An HTTP server                               Yes         0
Xtelnet            H            A telnet server                              Yes         4
Wsmp3              H            Web server with mp3 broadcasting             Yes         0
OpenVMPS           F            Open source VLan management policy server    Yes         2
Power              F            UPS monitoring utility                       Yes         10

Type abbreviations: (S)tack overflow, (H)eap overflow and (F)ormat string


False Negatives

• Prozilla:
  – memcpy uses a kernel function to manipulate page tables when copying entire pages
  – Valgrind cannot trace into kernel
  – Can be prevented by function wrappers
• Other false negatives are theoretically possible
  – structs within unions or arrays
    • Current implementation does not support unions
    • Currently do not differentiate between elements of an array
  – Memory corruption errors inside DLLs


False Positives

• Embedded assembly

• Incomplete library specification
  – library functions keeping internal state (e.g., strtok(NULL, delim))
  – library functions that modify global variables as side effects (e.g., optarg, errno)
  – pointers that point to hidden global structures (e.g., getdatetime() in time.h)
• struct pointers
  – void pointers that are type-cast to modify struct variables
  – since the pointer is not of type struct, MemSherlock fails to update accordingly


Conclusion

• Fully automated vulnerability analysis

• The analysis output is intuitive and human readable

• Future Challenges
  – Automated, long-term fix of vulnerabilities
    • Semantic consistency is a great challenge
  – Automated, temporary fix of vulnerabilities
    • Generating vulnerability condition
    • Improving signature generation


Computer Science

Thank You