Computer Science

Post-Attack Analysis of Unknown Vulnerabilities

Peng Ning

With Emre C. Sezer, Chongkyung Kil, and Jun Xu

Nov 14, 2007, 2007 GMU-CSA Workshop

Motivation

• Vulnerability analysis
  – Essential for
    • Patching
    • Vulnerability-based signature generation
  – Painstakingly slow
    • Depends on human effort
• Existing approaches
  – Static analysis (e.g., [Chen et al. 04], [Feng et al. 04], [Larochelle & Evans 01])
    • False positives
  – Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
    • Used for detection; inadequate vulnerability information
  – Symbolic execution (e.g., EXE [Cadar et al. 06], DACODA [Crandall et al. 05])
    • Scalability issues
  – Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Locasto et al. 07])
    • Change of application semantics


MemSherlock

• MemSherlock is an automated debugger
  – Automated analysis of unknown memory corruption vulnerabilities
  – Appeared in ACM CCS ’07
• MemSherlock provides
  – The statement that causes the memory corruption
  – The dynamic program slice leading to the corruption
  – The program variables involved in the vulnerability
  – All presented at the programming-language level
• Implications
  – Generating vulnerability conditions
  – Improves signature or patch generation speed


General Framework: Web Application Example

[Figure: block diagram of the framework. Components: Traffic, Program, Light-weight IDS, Logger, Trigger, Replayer, Instrumented Program, MemSherlock.]


MemSherlock Overview

• Goal is to provide vulnerability information
  – Intuitive, easy to understand for the programmer
• Not only the corruption point
  – Slice of the program involved in the vulnerability
  – Effects of user inputs
  – Program variables involved
  – Variable relationships (e.g., pointer aliasing)
  – Type of vulnerability (e.g., stack buffer overflow)
• MemSherlock performs two important tasks
  – Finding the corruption point
  – Tracking program state


MemSherlock: Finding Corruption Point

• Observation: A memory object is modified by a small set of statements (inspired by AccMon)

• For a memory object m, the write set WS(m) is the set of statements that legitimately modify m

• Security Condition: Memory object m should only be updated by statements in WS(m)
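The security condition can be sketched as a set-membership test. This is an illustrative sketch, not MemSherlock's implementation: the `WriteSet` type, its fixed capacity, and the use of integer statement IDs are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WS_MAX 16  /* hypothetical capacity, chosen for the example */

/* Hypothetical write set: IDs of the statements allowed to modify one object. */
typedef struct {
    int    stmts[WS_MAX];
    size_t count;
} WriteSet;

/* Record that statement `stmt` may legitimately modify the object. */
static void ws_add(WriteSet *ws, int stmt) {
    assert(ws->count < WS_MAX);
    ws->stmts[ws->count++] = stmt;
}

/* Security condition: a write to m is legitimate only if the
 * writing statement is a member of WS(m). */
static bool ws_allows(const WriteSet *ws, int stmt) {
    for (size_t i = 0; i < ws->count; i++) {
        if (ws->stmts[i] == stmt) {
            return true;
        }
    }
    return false;
}
```

A write by a statement for which `ws_allows` returns false would be reported as a violation of the security condition.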


MemSherlock: Assembly Line

• Pre-Debugging Phase
  – Instruments the program for the debugging phase
  – Extracts program information via static analysis
  – Needs to be performed only once
• Debugging Phase
  – Tracks program state
  – Monitors memory writes and checks for violations of the security condition
  – Tracks tainted data and its propagation


MemSherlock Architecture

[Figure: MemSherlock architecture, with the pre-debugging phase highlighted. Inputs: original source files, library specification, malicious input. Components: Static Analyzer, Source Code Rewriting, Compiler, Debugging Agent. Intermediate artifacts: program executable, debugging information. Output: vulnerability information.]


Pre-debugging: Generating Write Sets

• MemSherlock analyzes source code to determine write sets
• For a program variable v, WS(v) includes
  – Assignment statements (i.e., v = expr)
  – Library function calls where v is passed as an argument that can be modified (e.g., memcpy(&v, src, n))
• MemSherlock treats DLLs as black boxes
  – Assumption: a DLL is internally secure, but externally insecure
    • e.g., no stack overflows inside the library functions
    • Sound for common, well-tested libraries (e.g., the C library)
  – Requires library specifications
  – For each DLL, a list of functions and the arguments they might modify
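A library specification of this kind could be represented as a small lookup table. The sketch below is only an illustration: the `LibSpecEntry` layout, the bitmask encoding, and the helper `spec_may_modify` are assumptions, not MemSherlock's file format. The argument positions for memcpy, strcpy, and strlen follow their standard C signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SPEC_MAX_ARGS 8  /* hypothetical limit on tracked arguments */

/* Hypothetical specification entry: bit i of `modified_args` is set when
 * the function may write through its i-th argument (0-based). */
typedef struct {
    const char *name;
    unsigned    modified_args;
} LibSpecEntry;

static const LibSpecEntry libc_spec[] = {
    { "memcpy", 1u << 0 },  /* writes through dest (argument 0) */
    { "strcpy", 1u << 0 },  /* writes through dest (argument 0) */
    { "strlen", 0u      },  /* reads only */
};

/* Returns true when a call to `fn` may modify the object passed as
 * argument `arg`. Functions missing from the table are reported as
 * non-modifying, which is a simplification of this sketch. */
static bool spec_may_modify(const char *fn, unsigned arg) {
    assert(fn != NULL);
    if (arg >= SPEC_MAX_ARGS) {
        return false;
    }
    size_t n = sizeof libc_spec / sizeof libc_spec[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(libc_spec[i].name, fn) == 0) {
            return ((libc_spec[i].modified_args >> arg) & 1u) != 0;
        }
    }
    return false;
}
```

With such a table, a call like memcpy(&v, src, n) would add the calling statement to WS(v) because argument 0 is marked as modifiable.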


Dealing with Pointers

• For a pointer variable p, two write sets are kept
  – WS(p): statements that modify p
  – WS(ref(p)): statements that modify the referent (e.g., *p = 5)
• ref(p) is resolved at runtime (during debugging)
• The same analysis is performed for pointer-type function arguments at function calls
  – Removes the requirement for inter-procedural static analysis

(a) Code example

1 int i = 0;
2 int *p = &i;
3 *p = 1;
4 p = NULL;

(b) Write sets after static analysis

WS(i) = {1}
WS(p) = {2,4}
WS(ref(p)) = {3}

(c) ref(p) and WS(i) during monitoring

Line   ref(p)   WS(i)
1      N/A      {1}
2      i        {1,3}
3      i        {1,3}
4      NULL     {1}
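The runtime behavior in part (c) can be sketched as follows. This is a minimal sketch under stated assumptions: `VarInfo`, `PtrInfo`, `ptr_retarget`, and the bitmask encoding of line numbers are hypothetical names introduced for illustration, and the detach step assumes that only one pointer refers to a given variable at a time.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned LineSet;        /* bit n set <=> source line n is in the set */
#define LINE(n) (1u << (n))

/* Hypothetical per-variable record kept by the monitor. */
typedef struct {
    LineSet static_ws;    /* WS(v) from static analysis */
    LineSet borrowed_ws;  /* WS(ref(p)) of the pointer currently referring to v */
} VarInfo;

/* Hypothetical per-pointer record. */
typedef struct {
    LineSet  ref_ws;      /* WS(ref(p)) from static analysis */
    VarInfo *referent;    /* ref(p), resolved at runtime; NULL if none */
} PtrInfo;

/* Called when p is assigned: detach WS(ref(p)) from the old referent and
 * attach it to the new one. Simplification: assumes a single aliasing pointer. */
static void ptr_retarget(PtrInfo *p, VarInfo *new_ref) {
    assert(p != NULL);
    if (p->referent != NULL) {
        p->referent->borrowed_ws &= ~p->ref_ws;
    }
    p->referent = new_ref;
    if (new_ref != NULL) {
        new_ref->borrowed_ws |= p->ref_ws;
    }
}

/* Effective write set of v at the current point of the execution. */
static LineSet var_ws(const VarInfo *v) {
    return v->static_ws | v->borrowed_ws;
}
```

Replaying the four-line example, `ptr_retarget(&p, &i)` at line 2 extends WS(i) from {1} to {1,3}, and `ptr_retarget(&p, NULL)` at line 4 restores {1}.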


Chained Dereferences

• The technique described so far can only handle simple dereferences

• Source code rewriting is used to convert all chained dereferences to simple dereferences

• Any other dereference that is not simple is converted in the same manner

Before rewriting:

int z;
int *y = &z;
int **x = &y;
**x = 10;

After rewriting:

int z;
int *y = &z;
int **x = &y;
int *temp = *x;
*temp = 10;


Output of Pre-debugging Phase

• Simplified program
  – Simplified pointer dereferences
  – Compiled with debugging options
• Input file for the debugger
  – Program variables and their write sets
  – Addresses of global symbols
  – Frame pointer offsets of local variables
  – Other flags that help the debugger


MemSherlock Architecture: Debugging

[Figure: MemSherlock architecture, with the debugging phase highlighted. Inputs: original source files, library specification, malicious input. Components: Static Analyzer, Source Code Rewriting, Compiler, Debugging Agent. Intermediate artifacts: program executable, debugging information. Output: vulnerability information.]


Debugging: Dynamic Monitoring

• Runtime monitoring
  – State maintenance
  – Incorporates taint analysis from TaintCheck
    • Produces a dynamic slice of the program leading to the vulnerability
• Write checking
  – Monitors and validates memory writes
  – Write sets are file name and line number pairs <f,l>
    • The instruction pointer IP is translated into <f,l>
  – Write sets are associated with program variables
    • A destination address is translated into a program variable


Keeping Program State

• A given memory region may correspond to different program variables depending on program state

• Dynamic monitor keeps track of memory mapping

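[Figure: the virtual address space in two program states. In Program State 1 the stack holds frames for main, fnc A, and fnc B; in Program State 2 it holds frames for main, fnc A, and fnc C. A memory write to the same address, 0xABABABAB, is shown in both states.]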


Debugging: Key Data Structures

• Keeps two lists of memory regions
  – ActiveMemoryRegions
    • Memory corresponding to program variables or their referent memory regions
  – NonWritableRegions
    • Saved registers, return addresses, metadata encapsulating dynamically allocated memory regions


Debugging: State Maintenance

• Function calls/returns (memory)
  – Local variable addresses are calculated and added to ActiveMemoryRegions
  – Locations of the return address and saved registers are added to the NonWritableRegions list
• Heap memory (memory)
  – malloc/free calls are intercepted
  – Allocated memory is added to ActiveMemoryRegions
  – The metadata encapsulating the buffer is added to NonWritableRegions
• Pointer value updates (write sets)
  – Searches ActiveMemoryRegions to find the referent and updates its WS
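The heap case of state maintenance can be sketched with two region lists. This is an illustration only, covering just malloc/free: the `Region` and `RegionList` types, the fixed capacity, and the `on_malloc`/`on_free` hooks are hypothetical, and `header_size` stands in for allocator metadata whose real layout varies between allocators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64  /* hypothetical capacity */

typedef struct { uintptr_t start; size_t size; } Region;
typedef struct { Region items[MAX_REGIONS]; size_t count; } RegionList;

static void region_add(RegionList *l, uintptr_t start, size_t size) {
    assert(l->count < MAX_REGIONS);
    l->items[l->count].start = start;
    l->items[l->count].size = size;
    l->count++;
}

/* Remove the region that begins at `start`, if present. */
static void region_remove(RegionList *l, uintptr_t start) {
    for (size_t i = 0; i < l->count; i++) {
        if (l->items[i].start == start) {
            l->items[i] = l->items[--l->count];  /* swap with the last entry */
            return;
        }
    }
}

static bool region_contains(const RegionList *l, uintptr_t addr) {
    for (size_t i = 0; i < l->count; i++) {
        if (addr >= l->items[i].start &&
            addr - l->items[i].start < l->items[i].size) {
            return true;
        }
    }
    return false;
}

/* Intercepted malloc: the buffer becomes active, and the metadata in front
 * of it becomes non-writable. */
static void on_malloc(RegionList *active, RegionList *nonwritable,
                      uintptr_t buf, size_t size, size_t header_size) {
    region_add(active, buf, size);
    region_add(nonwritable, buf - header_size, header_size);
}

/* Intercepted free: both regions are dropped. */
static void on_free(RegionList *active, RegionList *nonwritable,
                    uintptr_t buf, size_t header_size) {
    region_remove(active, buf);
    region_remove(nonwritable, buf - header_size);
}
```

Function calls and returns would use the same lists, adding local variables to the active list and the return-address and saved-register slots to the non-writable list on entry, then removing them on return.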


Debugging: Write Checking

• When instruction IP modifies memory m
  – If m is in ActiveMemoryRegions
    • Determines the variable v it belongs to
    • Converts IP into <f,l>
    • Checks whether <f,l> is in WS(v)
• If the memory write check fails or m is in NonWritableRegions
  – Marks the operation as a memory corruption
  – Displays the vulnerability information
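The write check can be sketched as a single function. This is a sketch under stated assumptions, not the tool's code: the types `Stmt`, `ActiveVar`, `NwRegion`, and `Verdict` are hypothetical, the instruction pointer is assumed to be already translated into an <f,l> pair, and the `WRITE_UNTRACKED` result for addresses found in neither list is an assumption, since the slides do not say how such writes are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *file; int line; } Stmt;   /* the <f,l> pair */

typedef struct {
    const char *name;    /* program variable v */
    uintptr_t   start;
    size_t      size;
    const Stmt *ws;      /* WS(v) */
    size_t      ws_len;
} ActiveVar;

typedef struct { uintptr_t start; size_t size; } NwRegion;

typedef enum { WRITE_OK, WRITE_CORRUPTION, WRITE_UNTRACKED } Verdict;

static bool in_range(uintptr_t addr, uintptr_t start, size_t size) {
    return addr >= start && addr - start < size;
}

/* Check one memory write to `dest` performed by the statement `at`. */
static Verdict check_write(const ActiveVar *vars, size_t nvars,
                           const NwRegion *nw, size_t nnw,
                           uintptr_t dest, Stmt at) {
    assert(at.file != NULL);
    /* Any write into a non-writable region is a corruption. */
    for (size_t i = 0; i < nnw; i++) {
        if (in_range(dest, nw[i].start, nw[i].size)) {
            return WRITE_CORRUPTION;
        }
    }
    /* Otherwise resolve dest to a variable v and test <f,l> against WS(v). */
    for (size_t i = 0; i < nvars; i++) {
        if (!in_range(dest, vars[i].start, vars[i].size)) {
            continue;
        }
        for (size_t j = 0; j < vars[i].ws_len; j++) {
            if (vars[i].ws[j].line == at.line &&
                strcmp(vars[i].ws[j].file, at.file) == 0) {
                return WRITE_OK;
            }
        }
        return WRITE_CORRUPTION;  /* <f,l> is not in WS(v) */
    }
    return WRITE_UNTRACKED;       /* dest maps to no known region */
}
```

A `WRITE_CORRUPTION` verdict corresponds to the point where the operation is marked as a memory corruption and the vulnerability information is displayed.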


Generating Vulnerability Information

• The slice of the program contributing to the vulnerability
  – Statements that have propagated tainted values
  – Statements that have modified related memory regions
• Dependency between memory objects involved in the vulnerability
  – Points-to analysis shows memory regions and how they were accessed
• Program state
  – Call stack information
  – Write set information


Example Test Case: Null HTTP

http.c

 91: void ReadPOSTData(int sid) {
     …
100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
101: if (conn[sid].PostData==NULL) { ...
107: do {
108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
109: …

Error Report:

--20361-- Error type: Heap Buffer Overflow
--20361-- Dest Addr: 3AB3E360
--20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
--20361-- Dest address resolved to:
--20361-- Global variable "heap var" @ 3AB3E280 (size: 224)
--20361--
--20361-- Memory allocated by 0x804E531: ReadPOSTData (http.c:100)
--20361-- TAINTED destination 3AB3E360
--20361-- Fully tainted from:
--20361-- 0x804E5C7: ReadPOSTData (http.c:108)
--20361--
--20361-- TAINTED size used during allocation
--20361-- Tainted from:
--20361-- 0x804E456: ReadPOSTData (http.c:100)
--20361-- 0x804FBB5: read_header (http.c:153)
--20361-- 0x805121B: sgets (server.c:211)
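The 224-byte heap object in the report is consistent with a negative Content-Length value. The sketch below mirrors the size computation at http.c lines 153, 169, and 100 under stated assumptions: the header value "-800" and the limit `MAX_POSTSIZE_ASSUMED` are hypothetical, chosen so that the result matches the size in the report, and `post_alloc_size` is an illustrative helper, not a function in Null HTTP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_POSTSIZE_ASSUMED (32 * 1024)  /* hypothetical upper limit */
#define RECV_CHUNK 1024                   /* bytes requested per recv call */

/* Computes the calloc size for a given Content-Length header value.
 * Returns false when the request is rejected by the upper-bound check. */
static bool post_alloc_size(const char *content_length, long *size_out) {
    assert(content_length != NULL && size_out != NULL);
    int len = atoi(content_length);        /* line 153: sign is not checked */
    if (!(len < MAX_POSTSIZE_ASSUMED)) {   /* line 169: upper bound only */
        return false;
    }
    *size_out = (long)len + 1024;          /* line 100: allocation size */
    return true;
}
```

Under these assumptions, a header value of "-800" passes the check and yields a 224-byte buffer, which is smaller than the 1024 bytes requested by a single recv call at line 108.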


Vulnerability Analysis Example

http.c

 91: void ReadPOSTData(int sid) {
 92: char *pPostData;
     ...
100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
     ...
107: do {
108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
     ...

http.c

119: int read_header(int sid) {
121: char line[2048];
     ...
127: do {
128: memset(line, 0, sizeof(line));
129: sgets(line, sizeof(line)-1, conn[sid].socket);
     ...
153: conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
     ...
169: if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
170: ReadPOSTData(sid);

server.c

202: int sgets(char *buffer, int max, int fd)
203: {
     ...
209: conn[sid].atime=time((time_t*)0);
210: while (n<max) {
211: if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) {
     ...

[Figure annotations: the listings are marked with the heap object's Create, Use, and Taint events. Per the error report, the object is created at http.c:100 and overflowed at http.c:108, and the tainted allocation size traces back through http.c:153 to server.c:211.]


Implementation

• Source code is rewritten using CIL (C Intermediate Language)
• CodeSurfer was used to extract program variables and their write sets
  – A commercial static analysis tool
• objdump and dwarfdump were used to extract global symbol information
• Dynamic monitoring is implemented in Valgrind
  – An open source emulator


Evaluation

• Tested 11 real-world applications with known memory corruption vulnerabilities

• Test cases included
  – Stack/heap buffer overflows, format string vulnerabilities
  – Both control-flow and non-control-data attacks
• Testing methodology
  – Programs were run under MemSherlock
  – Exploit programs were used to attack the applications
  – Log and replay was not used


Evaluation Results

Application   Vuln. Type   Description                                  Captured?   #FP
GHTTP         S            A small HTTP server                          Yes         7
Icecast       S            An mp3 broadcast server                      Yes         0
Sumus         S            A game server for ‘mus’                      Yes         0
Monit         S            Multi-purpose anomaly detector               Yes         0
Newspost      S            Automatic news posting                       Yes         2
Prozilla      S            A download accelerator for Linux             No          0
NullHTTP      H            An HTTP server                               Yes         0
Xtelnet       H            A telnet server                              Yes         4
Wsmp3         H            Web server with mp3 broadcasting             Yes         0
OpenVMPS      F            Open source VLAN management policy server    Yes         2
Power         F            UPS monitoring utility                       Yes         10

Type abbreviations: (S)tack overflow, (H)eap overflow, and (F)ormat string. #FP is the number of false positives.


False Negatives

• Prozilla
  – memcpy uses a kernel function to manipulate page tables when copying entire pages
  – Valgrind cannot trace into the kernel
  – Can be prevented by function wrappers
• Other false negatives are theoretically possible
  – structs within unions or arrays
    • Current implementation does not support unions
    • Currently does not differentiate between elements of an array
  – Memory corruption errors inside DLLs


False Positives

• Embedded assembly

• Incomplete library specification
  – Library functions keeping internal state (e.g., strtok(NULL, delim))
  – Library functions that modify global variables as side effects (e.g., optarg, errno)
  – Pointers that point to hidden global structures (e.g., getdatetime() in time.h)
• struct pointers
  – void pointers that are type-cast to modify struct variables
  – Since the pointer is not of type struct, MemSherlock fails to update accordingly


Conclusion

• Fully automated vulnerability analysis

• The analysis output is intuitive and human readable

• Future challenges
  – Automated, long-term fix of vulnerabilities
    • Semantic consistency is a great challenge
  – Automated, temporary fix of vulnerabilities
    • Generating vulnerability conditions
    • Improving signature generation


Thank You
