Program Understanding: What Programmers Really Want
Einar W. Høst
May 5th 2011
Agenda
- Introduction
- Understanding program understanding
- Understanding program analysis
- How analysis can help understanding
- Look at some well-known analyses: description, examples, evaluation
- Consider limitations of current tools
Program Understanding
Why is program understanding important?
It’s what programmers do!
“To program is to understand” - Kristen Nygaard
Why is program understanding hard?
Size.
Complexity.
Heterogeneity.
Side-effects.
The concept assignment problem.
Intangibility.
=>
Disorientation.
“Few are the programmers who can explain their code well enough so that reading it is not incredibly frustrating.” - Jef Raskin
How can program analysis help?
$s.=($f=$ARGV[$_%2])x(substr$s,$_,1or$f)for0..500;
Program Understanding
?
Program Understanding
!
Through the magic lens of a Tool
$s.=($f=$ARGV[$_%2])x(substr$s,$_,1or$f)for0..500;
What do programmers want?
“Help me in finding what needs to be changed...
...make the change...
...and get the !$&?/% out!”
Just-in-time program understanding.
Complete understanding is
- Not realistic
- Not cost-effective
- Not necessary
Understand the program well enough to
- implement a new feature
- fix a bug
- improve performance
Without change, there is no need for understanding!
Program understanding at the code level
- Understanding execution behavior
- Understanding control flow
- Understanding data flow
- Understanding dependencies
- Understanding side-effects
- Understanding change impact
- Reasoning about design
Program understanding at the task level
- Understanding what is relevant
Comprehension strategies
Top-down
Reconstruct knowledge about the program domain and map it to the source code.
Bottom-up
Read code statements and mentally group these statements into higher level abstractions.
Systematic
Read code in detail, follow control flow, gain global understanding.
Opportunistic
Scan code, looking for clues indicating relevance to the task at hand.
Inquiries
- read
- question
- conjecture
- search
Integrated
Understanding is formed by switching between different strategies as needed.
Tools for program understanding
- Presentation
- Analysis
Analysis tools
- Call graphs
- Program slicing
- Feature location
- Effects of change
- Software metrics
Presentation tools
- Visualisations
- Context-awareness
What is a program?
print "This is not a program."
print("This is not a program.")
The program as
- text
- binary
- versioned
- process
The program ecosystem
- Source code / Compiled binary
- Execution environment (runtime)
- Test suite
- Version control system
- Issue tracking system
- Integrated development environment
- The programmer
- External sources
What questions can we ask about a program?
Behavior-centric
- What does this piece of code do?
- What happens if I change this?
- Where is this code used?
- Which parts of the program are affected by change?
Requires program code
Evolution-centric
- Who wrote this piece of code?
- Who can help me understand this?
- Why was this feature implemented this way?
- What are the most bug-prone parts of the program?
- Where do changes happen most often?
Requires program history
Interaction-centric
- How was this piece of code written?
- Which tasks are hard to accomplish?
- Which parts of the program are hard to change?
Requires coding-session history
What is program analysis?
Traditional program analysis
All you need is code
All you need is code
All you need is code, code
Code is all you need
Static
- All possible executions
- Sound + conservative
- Trades precision for soundness
Dynamic
- Sample executions
- Efficient + precise
- Trades completeness for efficiency
Static + Dynamic
- Synergy
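To make the "sample executions" point concrete, here is a minimal sketch of a dynamic analysis in Python. It records caller/callee pairs with the standard `sys.setprofile` hook while one run executes. The functions `main`, `g` and `h` and the helper `trace_calls` are hypothetical names chosen for illustration, not part of any tool mentioned in the talk.

```python
import sys

def trace_calls(func, *args):
    """Run func(*args) and return the set of (caller, callee) pairs observed."""
    edges = set()

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; frame.f_back is the caller.
        if event == "call" and frame.f_back is not None:
            edges.add((frame.f_back.f_code.co_name, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return edges

# Hypothetical program under analysis.
def h():
    pass

def g(flag):
    if flag:
        h()

def main(flag):
    g(flag)

edges_true = trace_calls(main, True)    # this run reaches h
edges_false = trace_calls(main, False)  # this run never calls h
```

The observed edges are precise for each run, but the edge from `g` to `h` is only seen when the sampled input exercises it. A static analysis would report that edge regardless of input.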
Program analysis for program understanding
Call graph construction
Essential idea
Show calling relationships between parts of a program.
Sample graph
Nodes: main, f, g, h
The ideal call graph
The relation describing exactly those calls made from one entity to another in any possible execution of the program.
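As a rough illustration, a static approximation of a call graph can be sketched with Python's standard `ast` module. The sample source and the edges between `main`, `f`, `g` and `h` are assumptions made up for this example. The sketch only sees direct calls by plain name, so it is far from the ideal call graph defined above.

```python
import ast

# Hypothetical program; the calling relationships are invented for illustration.
SOURCE = """
def h():
    pass

def g():
    h()

def f():
    g()
    h()

def main():
    f()
    g()
"""

def call_graph(source):
    """Map each top-level function name to the sorted names it calls directly."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                # Only calls of the form name(...) are recognised here.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = sorted(callees)
    return graph

graph = call_graph(SOURCE)
for caller, callees in graph.items():
    print(caller, "->", ", ".join(callees) or "(nothing)")
```

Method calls, higher-order functions and dynamic dispatch are all missed by this sketch, which is one reason real call graph construction has to be conservative.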
Benefit
Make the program seem less fragmented by showing the control flow links between program parts.
Limitation
The graph becomes too large and complex to be comprehensible.
“The sight of gcc's call graph frightened my students so much that they requested a different project” - Arun Lakhotia
Program slicing
Essential idea
Find a reduced program exhibiting the same behavior of interest as the full program.
The reduced program is called a program slice.
Slicing criterion
C = < x, V >
x : a statement in program P
V : a subset of variables in P
1 begin
2 read(x,y)
3 total := 0.0
4 sum := 0.0
5 if x <= 1
6 then sum := y
7 else begin
8 read(z)
9 total := x*y
10 end
11 write(total, sum)
12 end.
1 begin
2 read(x,y)
5 if x <= 1
6 then
7 else
8 read(z)
12 end.
Criterion: <12, z>
1 begin
2 read(x,y)
12 end.
Criterion: <12, x>
1 begin
2 read(x,y)
3 total := 0.0
4 sum := 0.0
5 if x <= 1
6 then sum := y
7 else begin
8 read(z)
9 total := x*y
10 end
11 write(total, sum)
12 end.
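The slices shown above can be reproduced with a small backward-slicing sketch. The encoding below is an assumption made for illustration: each statement of the example program is described by the variables it defines, the variables it uses, and the line of the `if` that controls it. The algorithm is a simplified, conservative one (it never kills a definition and handles no loops), not the algorithm of any particular slicing tool.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Stmt:
    line: int
    defs: set = field(default_factory=set)   # variables written
    uses: set = field(default_factory=set)   # variables read
    parent: Optional[int] = None              # line of the controlling "if", if any

# Hand-written encoding of the example program (begin/end lines omitted).
PROGRAM = [
    Stmt(2, defs={"x", "y"}),                          # read(x,y)
    Stmt(3, defs={"total"}),                           # total := 0.0
    Stmt(4, defs={"sum"}),                             # sum := 0.0
    Stmt(5, uses={"x"}),                               # if x <= 1
    Stmt(6, defs={"sum"}, uses={"y"}, parent=5),       # then sum := y
    Stmt(8, defs={"z"}, parent=5),                     # read(z)
    Stmt(9, defs={"total"}, uses={"x", "y"}, parent=5),  # total := x*y
    Stmt(11, uses={"total", "sum"}),                   # write(total, sum)
]

def backward_slice(program, line, variables):
    """Return the lines that may affect `variables` just before `line`."""
    relevant = set(variables)
    in_slice = set()
    for stmt in reversed(program):
        if stmt.line >= line:
            continue
        if stmt.defs & relevant:
            in_slice.add(stmt.line)
        if stmt.line in in_slice:
            relevant |= stmt.uses          # follow data dependences
            if stmt.parent is not None:
                in_slice.add(stmt.parent)  # follow control dependences
    return sorted(in_slice)

slice_z = backward_slice(PROGRAM, 12, {"z"})
slice_x = backward_slice(PROGRAM, 12, {"x"})
```

For the criterion <12, z> the sketch keeps lines 2, 5 and 8, and for <12, x> it keeps only line 2, matching the two slices on the slides.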
Forward conditioning
What would the program look like if we assume an initial state satisfying C?
Deletes statements that will not be executed given the initial state.
Backward conditioning
What would the program look like if we assume an eventual state satisfying C?
Deletes statements which cannot lead to the eventual state.
Benefit
The reduced program is smaller.
Limitation
The reduced program looks foreign.
Limitation
Hard to integrate into the programmer’s workflow.
Concept analysis
Essential idea
Identify groupings of objects that have common attributes.
Formal context
C = (O, A, R)
O : set of objects
A : set of attributes
R : relation, R ⊆ O × A
Common attributes
σ(O) = { a ∈ A | ∀ o ∈ O : (o, a) ∈ R }
Common objects
τ(A) = { o ∈ O | ∀ a ∈ A : (o, a) ∈ R }
Definition of concept
A pair (O, A) is a concept if A = σ(O) and O = τ(A).
(As arguments of σ and τ, and in the definition of a concept, O and A stand for subsets of the objects and attributes.)
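The definitions of σ, τ and concept can be tried out on a tiny context. The sketch below is a brute-force enumeration, suitable only for very small inputs. The subprogram names, feature names and the relation `R` are invented for illustration.

```python
from itertools import combinations

# Hypothetical formal context: objects are subprograms, attributes are features.
OBJECTS = {"parse", "render", "save", "log"}
ATTRIBUTES = {"open", "export"}
R = {
    ("parse", "open"),
    ("render", "open"), ("render", "export"),
    ("save", "export"),
    ("log", "open"), ("log", "export"),
}

def sigma(objs):
    """Attributes shared by every object in objs."""
    return frozenset(a for a in ATTRIBUTES if all((o, a) in R for o in objs))

def tau(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o in OBJECTS if all((o, a) in R for a in attrs))

def concepts():
    """Enumerate all concepts by closing every subset of the objects."""
    found = set()
    for k in range(len(OBJECTS) + 1):
        for subset in combinations(sorted(OBJECTS), k):
            attrs = sigma(subset)
            found.add((tau(attrs), attrs))  # (extent, intent) pair
    return found

all_concepts = concepts()
for extent, intent in sorted(all_concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

In this made-up context the concept with attributes {open, export} groups `render` and `log`, the subprograms shared by both features, while `parse` appears only in the concept for `open`.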
Feature location
O => Subprograms
A => Features
(s, f) ∈ R if subprogram s is invoked when feature f is invoked
Benefit
Identify parts of the program relevant for a feature.
Benefit
Recovery of components and generation of high-level architecture views.
Limitation
Imperfect high-level descriptions have limited application for concrete tasks.
Limitation
Requires effort from the programmer, with unclear benefits.
Limitation
Hard to integrate into the programmer’s workflow.
Change impact analysis
Essential idea
Identify the potential consequences of a program change.
Essential idea
Estimate what must be modified to accomplish a change of behavior.
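One simple way to approximate the potential consequences of a change is to follow dependencies backwards from the changed element. The sketch below assumes a dependency graph is already available. The element names and the `DEPENDS_ON` table are hypothetical, and real change impact analyses are considerably more refined than plain reachability.

```python
from collections import deque

# Hypothetical dependency graph: element -> elements it depends on.
DEPENDS_ON = {
    "ui.render": {"core.layout", "core.model"},
    "core.layout": {"core.model"},
    "export.pdf": {"core.layout"},
    "core.model": set(),
    "util.log": set(),
}

def impacted_by(changed, depends_on):
    """Return every element that directly or transitively depends on `changed`."""
    # Invert the graph so we can walk from an element to its dependents.
    dependents = {}
    for source, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(source)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen - {changed}

impact_of_layout = impacted_by("core.layout", DEPENDS_ON)
impact_of_model = impacted_by("core.model", DEPENDS_ON)
```

The result is a conservative boundary: everything outside the returned set cannot be reached through the recorded dependencies, but not everything inside it is necessarily affected.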
Challenges in OO languages
Subtyping and dynamic dispatch mean that change impact can be non-local and unexpected.
Atomic changes
- Add empty class
- Delete empty class
- Add field
- Delete field
- Add empty method
- Delete empty method
- Change method body
- Change method lookup
Benefit
Provide a boundary around the effects of a program edit.
Benefit
Provide confidence that an edit does not have unexpected effects outside the boundary.
Limitation
Limited by the precision of the change impact analysis.
Limitation
Still need assurance that no unexpected effects occur in the impacted part of the program.
Limitation
Hard to integrate into the programmer’s workflow.
Program metrics
Essential idea
Quantify aspects of the program presumed to be relevant.
Code-centric metrics
- Lines of code
- Depth of inheritance tree
- Internal cohesion
- Coupling to other elements
- Cyclomatic complexity
- Halstead complexity
- ...
Other metrics
- Test coverage
- Bug density
- Change rate
- ...
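As an example of a code-centric metric, cyclomatic complexity can be approximated by counting decision points and adding one. The sketch below does this for Python source using the standard `ast` module. The choice of which node types count as decisions is an assumption of this sketch, and the `classify` sample function is made up for illustration.

```python
import ast

# Node types treated as decision points in this sketch.
DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: decision points + 1."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b or c" adds one branch per extra operand.
            complexity += len(node.values) - 1
    return complexity

SAMPLE = """
def classify(n):
    if n < 0 or n > 100:
        return "out of range"
    for d in (2, 3):
        if n % d == 0:
            return "divisible"
    return "other"
"""

sample_complexity = cyclomatic_complexity(SAMPLE)
```

The sample has two `if` statements, one `or` and one `for`, so the sketch reports 5.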
Benefit
Answer questions regarding program quality.
Benefit
Identify potential problem areas in the program.
Limitation
Metrics are indirect indicators of quality.
Limitation
Task-generating, not task-solving.
Software visualisation
Goal
Avoid overwhelming the programmer by compressing information and using graphics.
Table lens
Idea
Zoom out the table layout and display cells as pixel bars scaled and colored by data values.
Example (figure)
Application
Present a compact view of software metrics for program elements.
Treemap
Idea
Display hierarchical data as nested rectangles.
Example (figure)
Application
Compare metric values for various program elements.
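The nested-rectangle idea behind a treemap can be sketched with the simple slice-and-dice layout, which splits a rectangle among children in proportion to their size and alternates the split direction at each level. The package names and sizes below are hypothetical (think lines of code per module), and real tools typically use more refined layouts.

```python
def size(node):
    """Size of a leaf, or the sum of the sizes of a node's children."""
    if "size" in node:
        return node["size"]
    return sum(size(child) for child in node["children"])

def treemap(node, x, y, w, h, horizontal=True, out=None):
    """Assign each node a rectangle (x, y, width, height) inside its parent's."""
    out = {} if out is None else out
    out[node["name"]] = (x, y, w, h)
    total = size(node)
    offset = 0.0
    for child in node.get("children", []):
        share = size(child) / total
        if horizontal:   # split the width among the children
            treemap(child, x + offset * w, y, share * w, h, False, out)
        else:            # split the height among the children
            treemap(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out

# Hypothetical program structure with a metric value per leaf.
TREE = {"name": "app", "children": [
    {"name": "ui", "size": 200},
    {"name": "core", "children": [
        {"name": "parser", "size": 100},
        {"name": "eval", "size": 100},
    ]},
]}

layout = treemap(TREE, 0, 0, 400, 100)
```

Here `ui` and `core` each take half of the 400 by 100 canvas, and `parser` and `eval` split the `core` rectangle vertically, so rectangle area tracks the metric value.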
Polymetric view
Idea
Combine several metrics for an entity into a single view, by relating a visual aspect to each metric.
Example (figure)
Application
Present a compact view of software metrics for program elements.
Hierarchical edge bundles
Idea
Bundle association data with structure data in a radial view.
Example (figure)
Application
Compact view of associations between program elements.
What about task-oriented program understanding?
The paradox of software visualisation
Programmers use visualisation all the time when discussing programs informally.
Computers are good at presenting information graphically.
...and yet...
Software visualisation tools are rarely used for everyday software development and understanding.
Why?
Hypothesis
Research is out of touch with reality!
Claim
Software visualisation is a generic solution looking for problems.
Claim
Programmers need specific solutions to specific problems.
Remedy
An easy-to-use tool that lets the programmer define the problem that needs visualisation.
Task-aware environments
Idea
Adapt the interface to reflect the task the programmer is currently working on.
Details
- Introduces the notion of development sessions.
- Links development sessions to tasks.
- Sessions can be thrown away or saved for later.
Code bubbles
Summary
- Introduction
- Understanding program understanding
- Understanding program analysis
- How analysis can help understanding
- Look at some well-known analyses: description, examples, evaluation
- Consider limitations of current tools