Are Automated Debugging Techniques Actually Helping Programmers?
Chris Parnin, Georgia Tech, @chrisparnin (Twitter)
Alessandro (Alex) Orso, Georgia Tech, @alexorso (Twitter)
Finding bugs can be hard…
Automated debugging to the rescue!
I’ll help you find the location of the bug!
How it works (Ranking-Based)
Give me a failing program and its input.
Calculating…
I have calculated the most likely location of the bug!
Here is your ranked list of statements.
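The ranking step above is typically computed from test coverage, as in statistical fault localization. Below is a minimal sketch of a Tarantula-style suspiciousness ranking; the coverage format and all function names are illustrative assumptions, not the specific tool used in the talk.

```python
# A minimal sketch of ranking-based fault localization using a
# Tarantula-style suspiciousness score. The coverage format and all
# names here are illustrative assumptions, not the tool from the talk.

def suspiciousness(failed_cov, passed_cov, total_failed, total_passed):
    """Statements executed mostly by failing tests score closer to 1.0."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0

def rank_statements(coverage, outcomes):
    """coverage: {test_name: set_of_statements};
    outcomes: {test_name: 'pass' or 'fail'}.
    Returns statements sorted from most to least suspicious."""
    total_failed = sum(1 for o in outcomes.values() if o == 'fail')
    total_passed = len(outcomes) - total_failed
    statements = set().union(*coverage.values())

    def score(s):
        f = sum(1 for t, cov in coverage.items()
                if s in cov and outcomes[t] == 'fail')
        p = sum(1 for t, cov in coverage.items()
                if s in cov and outcomes[t] == 'pass')
        return suspiciousness(f, p, total_failed, total_passed)

    return sorted(statements, key=score, reverse=True)
```

For example, a statement covered only by the failing test ranks above statements covered by both passing and failing tests.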
But how does a programmer use a ranked list of statements?
…
Conceptual Model
Here is a list of places to check out: 1) 2) 3) 4)
OK, I will check out your suggestions one by one.
…
Found the bug!
Does the conceptual model make sense?
Have we evaluated it?
A Skeptic
Let’s see… over 50 years of research on automated debugging.
1962. Symbolic Debugging (UNIVAC FLIT)
1981. Weiser. Program Slicing
1999. Delta Debugging
2001. Statistical Debugging
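Delta debugging, one of the milestones listed above, is an algorithm rather than a ranking: it automatically shrinks a failing input to a small input that still fails. Here is a minimal sketch of a simplified, complement-only variant of Zeller's ddmin, where `check` is an assumed caller-supplied predicate returning True when the input still triggers the bug.

```python
# A minimal, simplified sketch of delta debugging (ddmin): repeatedly try
# removing chunks of the input; keep any smaller input that still fails.
# This variant tests only chunk complements, a common simplification.

def ddmin(data, check, n=2):
    """Shrink list `data` to a smaller sublist still satisfying `check`."""
    while len(data) >= 2:
        size = len(data) // n
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        reduced = False
        for i in range(len(chunks)):
            # Input with chunk i removed.
            complement = [x for j, c in enumerate(chunks) if j != i for x in c]
            if check(complement):  # bug persists without this chunk
                data, n = complement, max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(data):     # finest granularity reached; stop
                break
            n = min(len(data), n * 2)  # refine granularity and retry
    return data
```

For instance, if a bug is triggered only when two particular elements are both present, ddmin discards everything else and returns just those two.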
Did you see anything?
Only 5 papers have evaluated automated debugging techniques with actual programmers.
• Most find no benefit
• Most done on programs < 100 LOC
More generally, these techniques rely on two strong assumptions.
Do you see a bug?
Assumption #1: Perfect bug understanding. Programmers recognize the fault as soon as they inspect the faulty statement, even when using an automated tool.
Assumption #2: Programmers inspect statements linearly and exhaustively until finding the bug.
Is this realistic?
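Under these two assumptions, evaluations commonly score debugging "cost" as the rank of the faulty statement, i.e., how much code a perfectly linear, perfectly perceptive programmer would inspect. A minimal sketch of that scoring, with illustrative names:

```python
# Illustrative effort score implied by the two assumptions above: if
# programmers really inspected the ranked list top-down and recognized
# the bug on sight, effort would equal the fault's 1-based rank.

def inspection_cost(ranked_statements, faulty_statement):
    """Number of statements inspected before (and including) the fault."""
    return ranked_statements.index(faulty_statement) + 1

def percent_inspected(ranked_statements, faulty_statement):
    """Effort as a percentage of the ranked list."""
    cost = inspection_cost(ranked_statements, faulty_statement)
    return 100.0 * cost / len(ranked_statements)
```

Whether this score reflects real debugging effort is exactly what the study below questions.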
Conceptual model: What if we gave a developer a list of statements to inspect?
How would they use the list?
Would they be able to see the bug after visiting it?
Is ranking important?
Benefit: What if we evaluated programmers with and without automated debuggers?
We could also observe what works and what doesn’t.
Study Setup
• Participants: 34 developers, MS/PhD students, different levels of expertise (low, medium, high)
• Software subjects: Tetris (2.5 KLOC), NanoXML (4.5 KLOC)
• Tools: traditional debugger; Eclipse ranking plugin (logged activity)
• Tasks: 2 debugging tasks (one fault each), 30 minutes per task, questionnaire at the end
Bugs
Bug #1: Pressing rotate key causes square figure to move up!
Bug #2: Exception on an input XML document.
When running the NanoXML program (main is in class Parser1_vw_v1), the following exception is thrown:

Exception in thread "main" net.n3.nanoxml.XMLParseException: XML Not Well-Formed at Line 19: Closing tag does not match opening tag: `ns:Bar' != `:Bar'
    at net.n3.nanoxml.XMLUtil.errorWrongClosingTag(XMLUtil.java:497)
    at net.n3.nanoxml.StdXMLParser.processElement(StdXMLParser.java:438)
    at net.n3.nanoxml.StdXMLParser.scanSomeTag(StdXMLParser.java:202)
    at net.n3.nanoxml.StdXMLParser.processElement(StdXMLParser.java:453)
    at net.n3.nanoxml.StdXMLParser.scanSomeTag(StdXMLParser.java:202)
    at net.n3.nanoxml.StdXMLParser.scanData(StdXMLParser.java:159)
    at net.n3.nanoxml.StdXMLParser.parse(StdXMLParser.java:133)
    at net.n3.nanoxml.Parser1_vw_v1.main(Parser1_vw_v1.java:50)

The input, testvm_22.xml, contains the following XML document:

<Foo a="test">
  <ns:Bar>
    <Blah x="1" ns:x="2"/>
  </ns:Bar>
</Foo>
Study Setup: Groups
Participants were split into four groups (A, B, C, D), varying which group received the Rank tool for which task.
Results
How do developers use a ranked list?
• 37% of visits jumped around the list (average jump: 10 positions).
• Navigation patterns zig-zagged (avg. 10 zigzags).
• Low performers did follow the list.
• Survey responses say participants searched through the statements.
Is perfect bug understanding realistic?
Only 1 out of 10 programmers who clicked on the bug stopped investigating there. The others spent an average of ten minutes continuing the investigation.
Are automated tools speeding up debugging?
Automated group = traditional group? No ✘
But… stratifying participants:
• Low performers: ✘ ✘
• Medium performers: ✘ ✔
• High performers: ✔ ✔
Significant difference for “experts”: high performers were, on average, 5 minutes faster.
Are automated tools speeding up debugging?
Automated group = traditional group? No ✘
Experts with the automated tool > experts with the traditional debugger? Yes! ✔
Observations
Developers searched through statements.
Developers without the tool fixed symptoms (not the problem).
Developers wanted explanations rather than recommendations.
Future directions
Moving beyond fault space reduction
We can keep building better tools. But we can’t keep abstracting away the human.
Performing further studies
• Does a different granularity work better for inspection? Documents? Methods?
• How do different interfaces or visualizations impact a technique?
• Do other automated debugging techniques fare any better?
How do developers use a ranked list?
Is perfect bug understanding realistic?
Are Automated Debugging Tools Helpful?
Human studies, human studies, human studies!
35 years of Scientific Progress
1969: 800,000 miles → 2004: 64,000,000 miles

30 years of Scientific Progress
1981: 352 LOC (median of 8 programs) → 2011: 63.5 LOC (median of 4 programs)