TRANSCRIPT
Summarizing the Content of Large Traces to Facilitate the Understanding of the Behaviour of a Software System
Abdelwahab Hamou-Lhadj Timothy Lethbridge
ICPC 2006, Athens, Greece
2
Motivation
- Software engineers need to explore traces
  - To understand an unexpected behaviour
  - For general understanding
- Traces tend to be excessively large and complex
  - Hard to understand
- Tools are needed
  - Studies conducted at QNX and Mitel confirm this
3
Limitations of Existing Techniques
Existing tools have key limitations:
1. Manual exploration has to be done bottom-up:
   - Start from a full, complex trace
   - Apply filters and searches to uncover the needed information
   - Often difficult to perform
2. They do not interoperate well, limiting the sharing of data and techniques
4
Trace Summarization
Goal: Permit top-down or middle-out exploration of traces
- Top-down: start with a small summary, then selectively expand
- Middle-out: start with a much simplified trace, then selectively contract and expand
- An initial higher-level trace view enables quicker comprehension
- Searching in hidden details could still be available
5
What is a Trace Summary?
Definition of a text summary (K. Spärck Jones): "a derivative of a source text condensed by selection and/or generalization on important content"
Summarizing traces is analogous to summarizing text:
- Select the most important information, or remove information of least importance
- Generalize by treating similar things as the same
  - Showing only one instance of an iteration or recursion
  - Treating similar elements and patterns as if they were the same
6
Content Selection in Traces
- Find implementation details and remove them
- Calls to well-known libraries, classes and functions of little interest to the abstraction
  - Math functions, string comparison, user interface calls (perhaps), etc.
- Automatically-detected utilities (discussed later)
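To make the selection step concrete, here is a minimal Python sketch; the method names and the detail set are hypothetical, not taken from the case study. Calls that appear in a list of known implementation details are simply dropped from the trace.

```python
# Content selection sketch: remove calls to known implementation details.
# The callee names below are purely illustrative.
KNOWN_DETAILS = {"Math.sqrt", "String.compareTo", "JPanel.repaint"}

def select_content(trace):
    """Keep only the calls that are not known implementation details."""
    return [call for call in trace if call not in KNOWN_DETAILS]

trace = ["Parser.parse", "Math.sqrt", "Classifier.train", "String.compareTo"]
print(select_content(trace))  # ['Parser.parse', 'Classifier.train']
```

In a real trace the events form a call tree, so removing a call would also remove or splice its subtree; the flat list above only illustrates the selection idea.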
7
Content Generalization in Traces
- Replace specific content with more abstract information
- Show only one instance of an iteration or recursion
- Treat similar sequences of events as if they were the same, by varying a similarity function used to compare sequences of calls
  - E.g. ABC, ABBC, ABABABC --> ABC
- Identify patterns found in many traces
  - A library of these can be built, so each time one works with a similar trace, known patterns can be flagged
  - Known patterns can be replaced with a user-defined label
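The iteration and recursion collapsing can be sketched as follows, assuming exact equality as the similarity function (the slides allow more permissive similarity functions): immediately repeated subsequences are reduced to a single instance, so ABBC and ABABABC both become ABC.

```python
def collapse_repetitions(seq):
    """Collapse immediately repeated subsequences to one instance.
    Uses exact equality; a real similarity function could be more permissive."""
    changed = True
    while changed:
        changed = False
        n = len(seq)
        # Look for two adjacent, identical blocks of any size and drop one copy.
        for size in range(1, n // 2 + 1):
            for i in range(n - 2 * size + 1):
                if seq[i:i + size] == seq[i + size:i + 2 * size]:
                    seq = seq[:i + size] + seq[i + 2 * size:]
                    changed = True
                    break
            if changed:
                break
    return seq

print("".join(collapse_repetitions(list("ABBC"))))     # ABC
print("".join(collapse_repetitions(list("ABABABC"))))  # ABC
```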
8
Trace Summarization Process
Step 1: Set the parameters for the summarization process
- When to stop the process (how much detail is desired)
- Known implementation details (libraries etc.)
- Known patterns
- Similarity function to use
- Other algorithm parameters
Step 2: Run the selection and generalization
Step 3: Output the result in a format that can be manipulated by the analyst
9
Trace Summarization Process (Cont.)
After Step 3, the maintainer can evaluate the result and, if not satisfied:
- Adjust the parameters and run the process again, or
- Manually manipulate the output
  - Contract the trace further
  - Expand various branches
10
A Key Step: Detecting Utilities
A utility:
- Is something called from several places
- Can be packaged in a non-utility module
- Is used to facilitate implementation rather than being a core part of the architecture
11
Utilityhood Metric
N: size of the static call graph built from the system under study
U(r) ranges from 0 (not a utility) to 1 (most likely to be a utility)

U(r) = (Fanin(r) / N) × (log(N / (Fanout(r) + 1)) / log(N))
12
Automatic Detection of Utilities
Routine   Fanin   Fanout   U
r2        6       0        0.86
r5        3       1        0.27
r6        2       2        0.12
r3        1       2        0.06
r7        1       2        0.06
r4        1       4        0.02
r1        0       3        0.00
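A small Python sketch of the metric as written above; plugging in the fan-in and fan-out values from this slide (with N = 7 routines in the example graph) reproduces the listed utilityhood scores up to rounding.

```python
import math

def utilityhood(fanin, fanout, n):
    """U(r) = (Fanin(r)/N) * (log(N/(Fanout(r)+1)) / log(N)),
    where n is the size of the static call graph."""
    return (fanin / n) * (math.log(n / (fanout + 1)) / math.log(n))

N = 7  # routines r1..r7 in the example graph
for name, fanin, fanout in [("r2", 6, 0), ("r6", 2, 2), ("r4", 1, 4), ("r1", 0, 3)]:
    print(name, round(utilityhood(fanin, fanout, N), 2))
# r2 0.86, r6 0.12, r4 0.02, r1 0.0
```

A routine called from many places (high fan-in) that itself calls little (low fan-out) scores close to 1, matching the intuition of a utility.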
[Figure: example static call graph over routines r1–r7, illustrating "Utilities at narrower scopes" with r5 highlighted]
13
Some Considerations
- To detect utilities, it is important to have the static call graph available, rather than the dynamic call graph
  - The dynamic graph will give a false impression of the extent to which something is a utility
  - Polymorphic calls can be resolved using various approaches in the literature
- It is hard to determine the scope of a utility if the system architecture is not clear
  - Architecture recovery techniques can be a useful adjunct
14
Case Study
Target system: Weka
- Machine learning algorithms
- Object-oriented, written in Java
- 10 packages, 147 classes, 1642 public methods, and 95 KLOC
Process description:
- Instrument the system
- Run the system by selecting a software feature
- Generate a static call graph from the Weka structure
- Apply the trace summarization algorithm
15
Setting the Algorithm Parameters
- Exit condition: the number of distinct subtrees of the summary is 10% of the number of subtrees of the initial trace
- Implementation details: accessor methods, constructors, methods of inner classes, user-defined utilities
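The exit condition requires counting the distinct subtrees of a call tree. A minimal sketch (the `(name, children)` tuple representation is assumed for illustration): each subtree is reduced to a canonical string, so duplicates collapse in a set.

```python
def _canonical(node, seen):
    """Return the canonical string of a (name, children) call-tree node,
    recording the form of every subtree in `seen`."""
    name, children = node
    form = name + "(" + ",".join(_canonical(c, seen) for c in children) + ")"
    seen.add(form)
    return form

def distinct_subtrees(tree):
    """Count the distinct subtrees of a call tree."""
    seen = set()
    _canonical(tree, seen)
    return len(seen)

# main calls helper twice; the repeated helper->log subtree counts only once
trace = ("main", [("helper", [("log", [])]), ("helper", [("log", [])])])
print(distinct_subtrees(trace))  # 3
```

Under this criterion, summarization would stop once the summary's distinct-subtree count falls to 10% of the count for the initial trace.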
16
Quantitative Results
Metric                        Initial trace   Step 2: removal of        Step 3: automatic       Manual
                                              implementation details    removal of utilities    manipulation
Number of calls               97413           31102 (32%)               3219 (3%)               453 (0.5%)
Number of distinct subtrees   275             120 (44%)                 67 (24%)                26 (10%)
Number of distinct methods    181             95 (52%)                  51 (28%)                26 (14%)

Percentages are relative to the initial trace. Manual manipulation was performed using a trace analysis tool called SEAT.
17
Validation
- The summary was converted into a UML sequence diagram
- Participants: nine software engineers with good to excellent knowledge of Weka
- Evaluation focused on the ability of the summary to represent the main events of the trace, in the subjective view of the participants
18
Main Questions Asked
Q1. How would you rank the quality of the summary with respect to whether it captures the main interactions of the traced scenario?
Q2. If you designed or had to design a sequence diagram (or any other behavioural model) for the traced feature while you were designing the Weka system, how similar do you think that your sequence diagram would be to the extracted summary?
Q3. In your opinion, how effective can a summary of a trace be in software maintenance?
19
Feedback of the Participants
Question             P1  P2  P3  P4  P5  P6  P7  P8  P9  Average
Q1 (Quality)         4   4   4   4   4   4   5   5   3   4.1
Q2 (Diagram)         4   5   3   4   4   4   4   5   3   4.0
Q3 (Effectiveness)   4   4   5   5   5   4   4   5   4   4.4

Scale: 1 = 'Very poor' to 5 = 'Excellent'
20
Observations
Participants agreed that:
- The summary is a good high-level representation of the traced feature
- Summaries can help in understanding the dynamics of a poorly documented system

The level of detail needed varies from one participant to another:
- E.g. P3 (an expert) commented that more details might be needed
- A tool must therefore allow manipulation of the level of detail displayed
21
Conclusions
- Summarizing large traces can help understand the features of a system and the causes of problems
- The approach:
  - Uses a mix of selection and generalization
  - Facilitates quick iteration to arrive at the most useful summary in the eyes of the user
- Detecting utilities with a utilityhood metric is important
- Case study results show that the method is promising
22
Future Directions
- Experiment with many other traces to further validate the approach
- Improve tools to speed iteration and interactivity
- Explore variants on the approach to utility detection