Summarizing the Content of Large Traces to Facilitate the Understanding of the Behaviour of a Software System

Abdelwahab Hamou-Lhadj, Timothy Lethbridge

ICPC 2006, Athens, Greece

Motivation

- Software engineers need to explore traces:
  - To understand an unexpected behaviour
  - For general understanding
- Traces tend to be excessively large and complex, which makes them hard to understand
- Tools are needed; studies conducted at QNX and Mitel confirm this

Limitations of Existing Techniques

Existing tools have key limitations:

1. Manual exploration has to be done bottom-up: start from a full, complex trace, then apply filters and searches to uncover the needed information. This is often difficult to perform.

2. They do not interoperate well, which limits the sharing of data and techniques.

Trace Summarization

Goal: permit top-down or middle-out exploration of traces
- Top-down: start with a small summary, then selectively expand
- Middle-out: start with a much simplified trace, then selectively contract and expand

An initial higher-level view of the trace enables quicker comprehension

Searching in the hidden details can still be available
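The top-down mode can be sketched as follows. This is a minimal illustration, assuming a trace is held in memory as a tree of calls; the `Call` class, the `render` function, the path strings and the routine names are all hypothetical and not taken from the authors' tools. Collapsed nodes report how many calls they hide, and the hidden subtrees remain in the data, so they can still be searched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Call:
    """One routine invocation in a trace, with the calls it made (hypothetical model)."""
    name: str
    children: list[Call] = field(default_factory=list)

    def size(self) -> int:
        # Number of calls in this subtree, including this one.
        return 1 + sum(child.size() for child in self.children)


def render(node: Call, expanded: set[str], depth: int = 0, parent: str = "") -> list[str]:
    """Render a trace top-down; only nodes whose path is in `expanded` show their callees."""
    path = f"{parent}/{node.name}"
    line = "  " * depth + node.name
    if node.children and path not in expanded:
        # Collapsed node: report how many calls are hidden rather than listing them.
        return [f"{line} [+{node.size() - 1} hidden]"]
    lines = [line]
    for child in node.children:
        lines.extend(render(child, expanded, depth + 1, path))
    return lines


# Hypothetical trace: routine names are illustrative only.
trace = Call("main", [
    Call("loadData", [Call("openFile"), Call("parseRow"), Call("parseRow")]),
    Call("train", [Call("buildModel", [Call("split")])]),
])

# Top-down: start from a small summary with only the root expanded ...
print("\n".join(render(trace, expanded={"/main"})))
# ... then selectively expand one branch of interest.
print("\n".join(render(trace, expanded={"/main", "/main/train"})))
```

Middle-out exploration fits the same model: start from a larger `expanded` set and remove paths to contract branches, or add paths to expand them.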

What is a Trace Summary?

Definition of a text summary (S. K. Jones): “a derivative of a source text condensed by selection and/or generalization on important content”

Summarizing traces is analogous to summarizing text:
- Select the most important information, or remove the information of least importance
- Generalize by treating similar things as the same:
  - Showing only one instance of an iteration or recursion
  - Treating similar elements and patterns as if they were the same

Content Selection in Traces

Find implementation details and remove them:
- Calls to well-known libraries, classes and functions of little interest to the summary, such as math functions, string comparison and (perhaps) user interface calls
- Automatically detected utilities (discussed later)
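A sketch of content selection over a call tree, assuming implementation details can be recognized from a routine's fully qualified name. The prefix list, the `select` and `is_library_call` functions and the application routine names are hypothetical examples, not a list from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Call:
    name: str
    children: list[Call] = field(default_factory=list)


def select(node: Call, is_detail: Callable[[str], bool]) -> Call:
    """Copy the trace, dropping implementation-detail calls.

    A dropped call takes its whole subtree with it, since those nested
    calls only implement the detail being hidden.
    """
    kept = [select(c, is_detail) for c in node.children if not is_detail(c.name)]
    return Call(node.name, kept)


# Hypothetical exclusion list of well-known library prefixes.
LIBRARY_PREFIXES = ("java.lang.Math.", "java.lang.String.", "javax.swing.")


def is_library_call(name: str) -> bool:
    return name.startswith(LIBRARY_PREFIXES)


trace = Call("Model.train", [
    Call("java.lang.Math.log"),
    Call("Dataset.load", [Call("java.lang.String.equals"), Call("Dataset.add")]),
    Call("javax.swing.JProgressBar.setValue"),
])
summary = select(trace, is_library_call)
print([c.name for c in summary.children])
```

Because the filter is just a predicate, automatically detected utilities can be plugged in the same way, for example by testing whether a routine's utilityhood exceeds a chosen threshold.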

Content Generalization in Traces

Replace specific content with more abstract information:
- Show only one instance of an iteration or recursion
- Treat similar sequences of events as if they were the same, by varying a similarity function used to compare sequences of calls, e.g. ABC, ABBC, ABABABC --> ABC
- Identify patterns found in many traces:
  - A library of these can be built, so each time one works with a similar trace, known patterns can be flagged
  - Known patterns can be replaced with a user-defined label
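The two generalizations above can be sketched on flat sequences of call names. `collapse_repeats` uses exact equality as its similarity function and keeps one instance of any block that repeats back to back, which reproduces the ABC, ABBC, ABABABC --> ABC example; `label_patterns` replaces known patterns from a user-supplied library with their labels. Both functions, the greedy first-match strategy and the example names are assumptions for illustration; a looser similarity function would need a different comparison.

```python
from __future__ import annotations


def collapse_repeats(calls: list[str]) -> list[str]:
    """Keep one instance of any block of calls that is immediately repeated.

    Tries block lengths k = 1, 2, ...; whenever calls[i:i+k] is followed by an
    identical block, the second copy is deleted. Repeats until stable.
    """
    seq = list(calls)
    changed = True
    while changed:
        changed = False
        for k in range(1, len(seq) // 2 + 1):
            i = 0
            while i + 2 * k <= len(seq):
                if seq[i:i + k] == seq[i + k:i + 2 * k]:
                    del seq[i + k:i + 2 * k]  # drop the repeated copy
                    changed = True
                else:
                    i += 1
    return seq


def label_patterns(calls: list[str], library: dict[tuple[str, ...], str]) -> list[str]:
    """Replace known (non-empty) call patterns with their user-defined labels.

    Scans left to right and uses the first library pattern that matches.
    """
    out: list[str] = []
    i = 0
    while i < len(calls):
        for pattern, label in library.items():
            if pattern and tuple(calls[i:i + len(pattern)]) == pattern:
                out.append(label)
                i += len(pattern)
                break
        else:  # no known pattern starts here
            out.append(calls[i])
            i += 1
    return out


for s in ("ABC", "ABBC", "ABABABC"):
    print(s, "-->", "".join(collapse_repeats(list(s))))

# Hypothetical pattern library and trace fragment.
library = {("openFile", "readHeader"): "<load input>"}
calls = ["openFile", "readHeader", "parseRow", "parseRow", "parseRow", "closeFile"]
print(label_patterns(collapse_repeats(calls), library))
```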

Trace Summarization Process

Step 1: Set the parameters for the summarization process:
- When to stop the process
  - How much detail is desired
- Known implementation details (libraries, etc.)
- Known patterns
- Similarity function to use
- Other algorithm parameters

Step 2: Run the selection and generalization

Step 3: Output the result in a format that can be manipulated by the analyst

Trace Summarization Process (Cont.)

After Step 3, the maintainer can evaluate the result and, if not satisfied:
- Adjust the parameters and run the process again, or
- Manually manipulate the output:
  - Contract the trace further
  - Expand various branches
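The three steps and the feedback loop can be sketched as below. This is a simplified illustration, assuming the trace is a flat list of routine names, that utilityhood scores are already known, and that the exit condition compares summary length to trace length; `SummaryParams`, `summarize`, `iterate` and the threshold schedule are hypothetical. Only the parameter-adjustment path is automated here; manual contraction and expansion of the output is left to the analyst.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SummaryParams:
    """Step 1: hypothetical parameters for one summarization run."""
    target_ratio: float = 0.10              # exit condition: summary size / trace size
    utility_threshold: float = 0.8          # drop routines whose utilityhood reaches this
    excluded: frozenset[str] = frozenset()  # known implementation details


def summarize(trace: list[str], utility: dict[str, float], p: SummaryParams) -> list[str]:
    """Step 2: selection, then generalization; the returned list is the Step 3 output."""
    # Selection: drop known implementation details and likely utilities.
    kept = [r for r in trace
            if r not in p.excluded and utility.get(r, 0.0) < p.utility_threshold]
    # Generalization: keep one instance of each immediately repeated call.
    out: list[str] = []
    for r in kept:
        if not out or out[-1] != r:
            out.append(r)
    return out


def iterate(trace: list[str], utility: dict[str, float], p: SummaryParams,
            thresholds: tuple[float, ...] = (0.6, 0.4, 0.2)) -> tuple[list[str], SummaryParams]:
    """Adjust the utility threshold and rerun until the exit condition holds.

    If the schedule runs out first, the last result is returned anyway so the
    analyst can contract or expand it by hand.
    """
    result = summarize(trace, utility, p)
    for t in thresholds:
        if len(result) <= p.target_ratio * len(trace):
            break
        p = replace(p, utility_threshold=t)
        result = summarize(trace, utility, p)
    return result, p


# Hypothetical utilityhood scores and flattened trace.
utility = {"main": 0.0, "train": 0.1, "eval": 0.2, "split": 0.3, "log": 0.86}
trace = ["main", "log", "train", "log", "split", "split", "log", "eval", "log", "log"]
result, final = iterate(trace, utility, SummaryParams(target_ratio=0.3))
print(result, final.utility_threshold)
```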

A Key Step: Detecting Utilities

A utility:
- Is something called from several places
- Can be packaged in a non-utility module
- Is used to facilitate implementation rather than being a core part of the architecture

Utilityhood Metric

N: size of the static call graph built from the system under study (its number of routines)

U(r) ranges from 0 (not a utility) to 1 (most likely to be a utility)

$$U(r) = \frac{\mathrm{Fanin}(r)}{N} \times \frac{\log\left(\frac{N}{\mathrm{Fanout}(r) + 1}\right)}{\log N}$$

Automatic Detection of Utilities

Routine   Fanin   Fanout   U
r2        6       0        0.86
r5        3       1        0.27
r6        2       2        0.12
r3        1       2        0.06
r7        1       2        0.06
r4        1       4        0.02
r1        0       3        0.00

[Figure: static call graph over routines r1 to r7, annotated "Utilities at narrower scopes: r5"]
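The metric and the table can be checked with a short script. The slide's call-graph figure did not survive, so the edge list `EDGES` below is a hypothetical graph chosen only because it has the same fan-in and fan-out values as the table. Fan-in and fan-out count distinct callers and callees, and N is taken as the number of routines (7), which is the reading under which the formula reproduces the U column.

```python
from __future__ import annotations

import math
from collections import defaultdict


def utilityhood(fanin: int, fanout: int, n: int) -> float:
    """U(r) = Fanin(r)/N * log(N / (Fanout(r) + 1)) / log(N); requires n >= 2."""
    return (fanin / n) * math.log(n / (fanout + 1)) / math.log(n)


def rank_utilities(edges: list[tuple[str, str]]) -> list[tuple[str, int, int, float]]:
    """Return (routine, fan-in, fan-out, U) for every routine, most utility-like first."""
    routines = {r for edge in edges for r in edge}
    n = len(routines)  # N taken as the number of routines in the static call graph
    callers: dict[str, set[str]] = defaultdict(set)
    callees: dict[str, set[str]] = defaultdict(set)
    for caller, callee in edges:
        callees[caller].add(callee)
        callers[callee].add(caller)
    rows = []
    for r in routines:
        fi, fo = len(callers[r]), len(callees[r])
        rows.append((r, fi, fo, utilityhood(fi, fo, n)))
    # Sort by decreasing U; ties broken by name for a stable ordering.
    return sorted(rows, key=lambda row: (-row[3], row[0]))


# Hypothetical static call graph (caller, callee) matching the table's fan-in/fan-out.
EDGES = [
    ("r1", "r2"), ("r1", "r3"), ("r1", "r4"),
    ("r3", "r2"), ("r3", "r6"),
    ("r4", "r2"), ("r4", "r5"), ("r4", "r6"), ("r4", "r7"),
    ("r5", "r2"),
    ("r6", "r2"), ("r6", "r5"),
    ("r7", "r2"), ("r7", "r5"),
]

for name, fi, fo, u in rank_utilities(EDGES):
    print(f"{name}  fanin={fi}  fanout={fo}  U={u:.2f}")
```

Printed to two decimals, r5 comes out as 0.28 (0.276) rather than the table's 0.27; the other values match.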

Some Considerations

- To detect utilities, it is important to have the static call graph available, rather than the dynamic call graph
  - The dynamic graph will give a false impression of the extent to which something is a utility
- Polymorphic calls can be resolved using various approaches in the literature
- It is hard to determine the scope of a utility if the system architecture is not clear
  - Architecture recovery techniques can be a useful adjunct
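The difference between the static and the dynamic view can be illustrated with a toy example. The graph and the observed call pairs are hypothetical, and fan-in here counts distinct callers. A trace of one scenario only exercises some call sites, so a routine that is called from many places in the code can look far less like a utility in the dynamic graph.

```python
from __future__ import annotations


def fanin(edges: set[tuple[str, str]]) -> dict[str, int]:
    """Count the distinct callers of each routine in a set of (caller, callee) edges."""
    counts: dict[str, int] = {}
    for _caller, callee in edges:
        counts[callee] = counts.get(callee, 0) + 1
    return counts


# Hypothetical static call graph: formatDate is called from four places in the code.
static_edges = {
    ("Report.print", "formatDate"), ("Logger.write", "formatDate"),
    ("Invoice.render", "formatDate"), ("Audit.record", "formatDate"),
    ("Report.print", "Logger.write"),
}

# Hypothetical trace of one scenario: only some call sites are exercised.
# The dynamic graph keeps the distinct (caller, callee) pairs that were observed.
observed_calls = [
    ("Report.print", "formatDate"), ("Report.print", "formatDate"),
    ("Report.print", "Logger.write"), ("Logger.write", "formatDate"),
]
dynamic_edges = set(observed_calls)

print(fanin(static_edges)["formatDate"])   # 4 distinct callers in the code
print(fanin(dynamic_edges)["formatDate"])  # only 2 of them appear in this trace
```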

Case Study

Target system: Weka
- System: machine learning algorithms
- Object-oriented, written in Java
- 10 packages, 147 classes, 1642 public methods, and 95 KLOC

Process description:
- Instrument the system
- Run the system by selecting a software feature
- Generate a static call graph from the Weka structure
- Apply the trace summarization algorithm

Setting the Algorithm Parameters

Exit condition: the number of distinct subtrees of the summary is 10% of the number of distinct subtrees of the initial trace

Implementation details: accessor methods, constructors, methods of inner classes, user-defined utilities
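One way to evaluate this exit condition is sketched below, assuming that "distinct subtrees" means structurally identical subtrees are counted once, and treating the condition as "at most 10%". The `Call` class, the helper names and the example trace are hypothetical; only the 10% ratio comes from the case study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Immutable call-tree node: identical subtrees compare and hash equal."""
    name: str
    children: tuple[Call, ...] = ()


def distinct_subtrees(root: Call) -> int:
    """Count structurally distinct subtrees (every node roots one subtree)."""
    seen: set[Call] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.children)  # children of a repeated subtree are already counted
    return len(seen)


def exit_condition_met(summary: Call, initial: Call, ratio: float = 0.10) -> bool:
    """Stop once the summary has at most `ratio` times the initial number of distinct subtrees."""
    return distinct_subtrees(summary) <= ratio * distinct_subtrees(initial)


# Hypothetical trace: the three parseRow leaves and the two load subtrees
# are identical, so each is counted once.
row = Call("parseRow")
load = Call("load", (row, row, row))
initial = Call("main", (load, load, Call("train", (Call("split"),))))
summary = Call("main", (Call("train"),))

print(distinct_subtrees(initial), distinct_subtrees(summary))
print(exit_condition_met(summary, initial))
```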

Quantitative Results

Measure | Initial trace | After Step 2: removal of implementation details | After Step 3: automatic removal of utilities | After manual manipulation
Number of calls | 97413 | 31102 (32%) | 3219 (3%) | 453 (0.5%)
Number of distinct subtrees | 275 | 120 (44%) | 67 (24%) | 26 (10%)
Number of distinct methods | 181 | 95 (52%) | 51 (28%) | 26 (14%)

Percentages are relative to the initial trace.

Manual manipulation was performed using a trace analysis tool called SEAT

Validation

The summary was converted into a UML sequence diagram

Participants: Nine software engineers with good to excellent knowledge of Weka

The evaluation focused on how well the summary represents the main events of the trace, in the subjective view of the participants

Main Questions Asked

Q1. How would you rank the quality of the summary with respect to whether it captures the main interactions of the traced scenario?

Q2. If you designed or had to design a sequence diagram (or any other behavioural model) for the traced feature while you were designing the Weka system, how similar do you think that your sequence diagram would be to the extracted summary?

Q3. In your opinion, how effective can a summary of a trace be in software maintenance?

Feedback of the Participants

Questions P1 P2 P3 P4 P5 P6 P7 P8 P9 Average

Q1 (Quality) 4 4 4 4 4 4 5 5 3 4.1

Q2 (Diagram) 4 5 3 4 4 4 4 5 3 4

Q3 (Effectiveness) 4 4 5 5 5 4 4 5 4 4.4

Scale: from ‘Very poor’ (score of 1) to ‘Excellent’ (score of 5)

Observations

Participants agreed that:
- The summary is a good high-level representation of the traced feature
- Summaries can help understand the dynamics of a poorly documented system

The level of detail needed varies from one participant to another:
- E.g. P3 (an expert) commented that more details might be needed
- A tool must allow manipulation of the level of detail displayed

Conclusions

Summarizing large traces can help in understanding the features of a system and the causes of problems

The approach:
- Uses a mix of selection and generalization
- Facilitates quick iteration to arrive at the most useful summary in the eyes of the user

Detecting utilities with a utilityhood metric is important

Case study results show that the method is promising

Future Directions

Experiments with many other traces to further validate the approach

Improve tools to speed iteration and interactivity

Explore variants on the approach to utility detection