Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)


Penumbra: Automatically Identifying Failure-Relevant Inputs

James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology

Supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech.


Automated Debugging

Code-centric

• Gupta and colleagues ’05
• Jones and colleagues ’02
• Korel and Laski ’88
• Liblit and colleagues ’05
• Nainar and colleagues ’07
• Renieris and Reiss ’03
• Seward and Nethercote ’05
• Tucek and colleagues ’07
• Weiser ’81
• Zhang and colleagues ’05
• Zhang and colleagues ’06
• ...

What about the inputs that cause the failure?


Data-centric Techniques

• Chan and Lakhotia ’98
• Zeller and Hildebrandt ’02
• Misherghi and Su ’06

Delta Debugging requires:
1. Multiple executions
2. Large amounts of manual effort (oracle creation, setup)

Penumbra offers comparable performance but requires:
1. Only a single execution
2. Reduced manual effort


Intuition and Terminology

• Failure-revealing input vector
• Failure-relevant subset: the inputs that are useful for investigating the failure

Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.

Motivating Example: fileinfo

int main(int argc, char **argv) {
 1.   int verbose, i, total_size = 0;
 2.   struct stat buf;
 3.   verbose = atoi(argv[1]);
 4.   for(i = 2; i < argc; i++) {
 5.     int fd = open(argv[i], O_RDONLY);
 6.     fstat(fd, &buf);
 7.     char *out = malloc(60);
 8.     sprintf(out, "%d", buf.st_size);
 9.     if(verbose) {
10.       char *pview = malloc(51);
11.       read(fd, pview, 50);
12.       pview[50] = '\0';
13.       strcat(out, pview);
14.     }
15.     printf("%s: %s\n", argv[i], out);
16.     total_size += buf.st_size;
17.   }
18.   printf("total: %d\n", total_size);
}

Input vector:
• Command-line arguments (flag, list of file names)
• File statistics, for each file (size, last modified date, ...)
• File contents, for each file (first 50 characters)

Failure: the strcat on line 13 overflows out when buf.st_size ≥ 1GB, verbose is true, and 50 characters are read from the file.

Observations:
1. There are many more inputs than lines of code.
2. Understanding the failure requires tracing interactions between inputs from multiple sources.
3. Only a small percentage of all inputs are relevant for the failure.

Penumbra Overview

Example: running fileinfo on three files — Foo (512B), Bar (1KB), and Baz (1.5GB) — produces the output:

foo: 512 ...
bar: 1024 ...
baz: 150...
total: 150...

Relevant context:
1. When the failure occurs: line 13, strcat(out, pview). In general, it is chosen using traditional debugging methods.
2. Which data are involved in the failure.

The technique proceeds in three steps:
1. Taint inputs: assign taint marks (e.g., 0–9) to the inputs as they are read.
2. Propagate taint marks through the execution.
3. Identify relevant inputs: the marks that reach the failure (here 0, 8, and 9) identify the failure-relevant inputs — verbose is true, buf.st_size ≥ 1GB, and the 50 characters read from the file.

Outline

• Penumbra approach
  1. Tainting inputs
  2. Propagating taint marks
  3. Identifying relevant inputs
• Evaluation
• Conclusions and future work

1: Tainting Inputs

Assign a taint mark to each input as it enters the application. When a taint mark is assigned to an input, log the input’s value and where the input was read from.

Three assignment strategies:

• Per-byte: assign a unique taint mark to each byte (e.g., bytes read from files). Precise identification, but unnecessarily expensive.
• Per-entity: assign the same taint mark to related bytes (argv, argc, fstat, ...). Maintains per-byte precision; increases scalability.
• Domain-specific: assign taint marks based on user-provided information. Maintains per-byte precision; further increases scalability.

2: Propagating Taint Marks

• Data-flow propagation (DF): taint marks flow along only data dependencies.

  C = A + B;   // if A has mark 1 and B has mark 2, C receives {1, 2}

• Data- and control-flow propagation (DF + CF): taint marks flow along data and control dependencies.

  if(X) { C = A + B; }   // if X also has mark 3, C receives {1, 2, 3}

The effectiveness of each option depends on the particular failure.

3: Identifying Relevant Inputs

1. The relevant context indicates which data are involved in the considered failure.
2. Identify which taint marks are associated with the data indicated by the relevant context.
3. Use the recorded logs to reconstruct the inputs identified by those taint marks.

Prototype Implementation

The prototype consists of two components, implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007]:

• Trace generator: takes the input vector and the executable and produces an execution trace.
• Trace processor: takes the trace and the relevant context and produces the failure-relevant input subsets (DF and DF + CF).

Evaluation

Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging

Subjects:

Application | Version | KLoC  | Fault location
bc          | 1.06    | 10.5  | more_arrays:177
gzip        | 1.24    | 6.3   | get_istat:828
ncompress   | 4.24    | 1.4   | comprexx:896
pine        | 4.44    | 239.1 | rfc822_cat:260
squid       | 2.3     | 69.9  | ftpBuildTitleUrl:1024

We selected a failure-revealing input vector for each subject.

Data Generation

Penumbra:
• Setup (manual): choose a relevant context.
  - Location: the statement where the failure occurs.
  - Data: any data read by that statement.
• Execution (automated): use the prototype tool to identify failure-relevant inputs (DF and DF + CF).

Delta Debugging:
• Setup (manual): create an automated oracle.
  - Use gdb to inspect the stack trace and program data.
  - One-second timeout to prevent incorrect results.
• Execution (automated): use the standard Delta Debugging implementation to minimize inputs.

Study 1: Effectiveness

Is the information that Penumbra provides helpful for debugging real failures?

Study 1 Results: gzip & ncompress

Crash when a file name is longer than 1,024 characters (e.g., running ./gzip on foo, bar, and a file with a very long name).

# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3

Study 1 Results: pine

Crash when a “from” field contains 22 or more double quote characters. Example message:

From clause@boar Tue Feb 20 11:49:53 2007
Return-Path: <clause@boar>
X-Original-To: clause
Delivered-To: clause@boar
Received: by boar (Postfix, from userid 1000) id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST)
To: clause@boar
Subject: test
Message-Id: <20070220164953.88EDD1724523@boar>
Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST)
From: "\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\""@host.fubar
X-IMAPbase: 1172160370 390
Status: O
X-Status:
X-Keywords:
X-UID: 5

# Inputs: 15,103,766
# Relevant (DF): 26
# Relevant (DF + CF): 15,100,344


Study 1: Conclusions

1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
   ➡ Use data-flow first; then, if necessary, use control-flow.

2. Inputs identified by Penumbra correspond to the failure conditions.
   ➡ Our technique is effective in assisting the debugging of real failures.

Study 2: Comparison with Delta Debugging

RQ1: How much manual effort does each technique require?
RQ2: How long does it take to fix a considered failure given the information provided by each technique?

RQ1: Manual Effort

Use setup time as a proxy for manual (developer) effort.

[Bar chart: setup time (s) for Penumbra vs. Delta Debugging on gzip, ncompress, bc, pine, and squid. Delta Debugging’s setup times run into the thousands of seconds (e.g., 5,400 and 12,600); Penumbra’s are far smaller.]

Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).

RQ2: Debugging Effort

Use the number of relevant inputs as a proxy for debugging effort.

Subject    | Penumbra (DF) | Penumbra (DF + CF) | Delta Debugging
bc         | 209           | 743                | 285
gzip       | 1             | 3                  | 1
ncompress  | 1             | 3                  | 1
pine       | 26            | 15,100,344         | 90
squid      | 89            | 2,056              | —

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.

Conclusions & Future Work

• Novel technique for identifying failure-relevant inputs.
• Overcomes limitations of existing approaches:
  - Single execution
  - Minimal manual effort
  - Comparable effectiveness
• Future work: combine Penumbra with existing code-centric techniques.
