Application of Graph Theory
to OO Software Engineering
Alexander Chatzigeorgiou, Nikolaos Tsantalis, George Stephanides
Department of Applied Informatics
University of Macedonia
Thessaloniki, Greece
WISER 2006, May 20, 2006, Shanghai, China
Motivation
• Application of Graph Theory to SE is not new:
Planning: network diagrams (CPM, PERT)
Analysis: DFDs, FSMs, Petri Nets
Design: everything is essentially a graph
Testing: McCabe's complexity measure
. . .
• Graph Theory is suitable for object-oriented SE:
Class diagrams can be perfectly mapped to graphs
Identification of "God" classes
• Goal: to identify heavily loaded classes of an OO design
• such "God" classes imply a poor model
• Inspiration comes from the Web (HITS algorithm)
HITS
[Figure: MyHumbleHomePage shown in two link neighborhoods. Among pages such as Eat Anything, Car Loans, Anti-Wrinkle, Super Cars, Mykonos and AlternativeMusic, its relative importance is low. Among SEI, IEEE TSE, ACM SIGSOFT, ICSE, GoF and NASA, its relative importance is high.]
Identification of "God" classes
• OO system : directed graph G=(V, E)
• classes → vertices
• associations → edges
• Each edge is annotated with an integer $m_{p,q}$: the number of discrete messages sent in the direction from p to q.
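The mapping from classes and messages to an annotated graph can be sketched as follows. This is only an illustration: the class names and the message list are hypothetical, and "discrete" is read here as "distinct", so a message invoked several times along the same edge is counted once.

```python
from collections import Counter


def message_matrix(classes, messages):
    """Return A where A[p][q] is the number of distinct messages sent from p to q."""
    index = {name: i for i, name in enumerate(classes)}
    # A set drops repeated (sender, receiver, message) triples.
    distinct = set(messages)
    counts = Counter((sender, receiver) for sender, receiver, _ in distinct)
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for (sender, receiver), m in counts.items():
        matrix[index[sender]][index[receiver]] = m
    return matrix


# Hypothetical three-class design, not taken from the slides.
classes = ["Controller", "Sensor", "Display"]
messages = [
    ("Controller", "Sensor", "read"),
    ("Controller", "Sensor", "reset"),
    ("Controller", "Display", "show"),
    ("Sensor", "Controller", "alarm"),
    ("Controller", "Sensor", "read"),  # repeated call, counted once
]
A = message_matrix(classes, messages)
for name, row in zip(classes, A):
    print(name, row)
```

Row p of the result holds the outgoing message counts of class p, so the matrix is generally not symmetric.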
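[Figure: classes q1, q2, q3, with hub weights $h_{q_1}, h_{q_2}, h_{q_3}$, send messages to class p; the edge from q2 carries $m_{q_2,p}$.]
Authority weight of p: $a_p = \sum_{q} m_{q,p}\, h_q$
[Figure: class p sends messages to classes q1, q2, q3, with authority weights $a_{q_1}, a_{q_2}, a_{q_3}$; the edge to q2 carries $m_{p,q_2}$.]
Hub weight of p: $h_p = \sum_{q} m_{p,q}\, a_q$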
Identification of "God" classes
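[Figure: four classes, numbered 1 to 4, exchanging ten messages (message1 to message10), and the corresponding directed graph with the message counts on its edges.]
Authority weights:
$a_1 = 0 h_1 + 1 h_2 + 0 h_3 + 0 h_4$
$a_2 = 2 h_1 + 0 h_2 + 1 h_3 + 1 h_4$
$a_3 = 0 h_1 + 1 h_2 + 0 h_3 + 1 h_4$
$a_4 = 0 h_1 + 1 h_2 + 2 h_3 + 0 h_4$
Hub weights:
$h_1 = 0 a_1 + 2 a_2 + 0 a_3 + 0 a_4$
$h_2 = 1 a_1 + 0 a_2 + 1 a_3 + 1 a_4$
$h_3 = 0 a_1 + 1 a_2 + 0 a_3 + 2 a_4$
$h_4 = 0 a_1 + 1 a_2 + 1 a_3 + 0 a_4$
In matrix form: $a = A^T h$, $h = A a$, with
A =
  0 2 0 0
  1 0 1 1
  0 1 0 2
  0 1 1 0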
Identification of "God" classes
• Using theorems from Linear Algebra, the authority and hub weights can be obtained as the principal eigenvectors of $A^T A$ and $A A^T$, respectively
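[Figure: class diagram with seven classes, numbered 1 to 7: Oven, Button, Door, Timer, Light, Power Tube, Beeper. Messages shown on the associations: cook, cancel, doorOpen, doorClose, isOpen, add60sec, countDown, setTimeZero, expired, turnOn and turnOff (each appearing twice), beep.]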
Identification of "God" classes
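[Figure: directed graph of the seven classes, numbered 1 to 7, with the message counts on its edges.]
A (row = sender, column = receiver):
     1  2  3  4  5  6  7
1    0  0  2  0  0  0  0
2    0  0  2  0  0  0  0
3    0  1  0  3  2  2  1
4    0  0  1  0  0  0  0
5    0  0  0  0  0  0  0
6    0  0  0  0  0  0  0
7    0  0  0  0  0  0  0
$a_n^T = [0,\ 0.229,\ 0,\ 0.688,\ 0.459,\ 0.459,\ 0.229]$
$h_n^T = [0,\ 0,\ 1,\ 0,\ 0,\ 0,\ 0]$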
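The authority and hub vectors of this example can be reproduced with a few lines of power iteration. The sketch below assumes the 7x7 matrix as read from the slide, with rows as senders and columns as receivers. The function names are illustrative, and the fixed iteration count is an arbitrary choice rather than a tuned stopping rule.

```python
import math

# A[p][q] = number of messages from class p+1 to class q+1 (as read from the slide).
A = [
    [0, 0, 2, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0],
    [0, 1, 0, 3, 2, 2, 1],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def normalize(v):
    """Scale v to unit Euclidean length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def hits(matrix, iterations=100):
    """Alternate a = A^T h and h = A a, normalizing after each step."""
    n = len(matrix)
    hub = [1.0] * n
    authority = [1.0] * n
    for _ in range(iterations):
        authority = normalize(
            [sum(matrix[p][q] * hub[p] for p in range(n)) for q in range(n)]
        )
        hub = normalize(
            [sum(matrix[p][q] * authority[q] for q in range(n)) for p in range(n)]
        )
    return authority, hub


authority, hub = hits(A)
print([round(x, 3) for x in authority])
print([round(x, 3) for x in hub])
```

Rounded to three decimals, the result agrees with the vectors on the slide: class 3 carries the entire hub weight, which marks it as the candidate "God" class.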
Clustering
• Goal: to partition the system into strongly communicating classes
• might imply relevance of functionality
• might imply possible reusable components
• Spectral graph partitioning employs the degree matrix D (a diagonal matrix containing the degrees of the vertices) and the Laplacian matrix, defined as L = D − A
• the smallest eigenvalue of L is always zero
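A small sketch makes the zero eigenvalue concrete: every row of L = D − A sums to zero, so the all-ones vector is an eigenvector with eigenvalue 0. The four-vertex graph below is hypothetical and unweighted.

```python
def laplacian(adjacency):
    """Return L = D - A for a symmetric adjacency matrix A."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical undirected graph: a triangle 0-1-2 with a pendant vertex 3 on vertex 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(A)
# Multiplying L by the all-ones vector is the same as summing each row.
L_times_ones = [sum(row) for row in L]
print(L)
print(L_times_ones)
```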
Clustering
• the properties of the eigenvector $x_2$ associated with the second smallest eigenvalue $\lambda_2$ have been explored by M. Fiedler
• Clustering a graph G into two sub-graphs according to the positive and negative entries of the Fiedler vector yields a partition that approximately minimizes the weight of the cut set.
[Figure: example graph with edge weights 11, 6, 7, 1, 9, 6, 7.]
Clustering
[Figure: three candidate cuts of the same graph (edge weights 11, 6, 7, 1, 9, 6, 7), with cut-set weights 17, 18 and 1. The cut of weight 1 is the one provided by the Fiedler vector.]
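The bisection can be sketched without a linear algebra library. The code below approximates the Fiedler vector by power iteration on (c·I − L) while repeatedly removing the all-ones component, which is one simple technique among several and not necessarily the one used by the authors. The six-vertex graph (two triangles joined by one light edge) is a hypothetical layout, and the shift c and the iteration count are assumptions chosen for small graphs.

```python
import math


def fiedler_vector(W, iterations=500):
    """Approximate the eigenvector of the second smallest Laplacian eigenvalue."""
    n = len(W)
    degrees = [sum(row) for row in W]
    shift = 2 * max(degrees)  # upper bound on the largest eigenvalue of L = D - W
    x = [i - (n - 1) / 2 for i in range(n)]  # starting vector orthogonal to all-ones
    for _ in range(iterations):
        # y = (shift * I - L) x = shift * x - D x + W x
        y = [
            (shift - degrees[i]) * x[i] + sum(W[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(y) / n
        y = [v - mean for v in y]  # project out the all-ones eigenvector
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def cut_weight(W, part):
    """Total weight of the edges leaving the vertex set `part`."""
    inside = set(part)
    return sum(W[i][j] for i in inside for j in range(len(W)) if j not in inside)


# Hypothetical weighted graph: triangle 0-1-2, triangle 3-4-5, light bridge 2-3.
edges = [(0, 1, 11), (0, 2, 6), (1, 2, 7), (2, 3, 1), (3, 4, 9), (3, 5, 6), (4, 5, 7)]
W = [[0] * 6 for _ in range(6)]
for u, v, w in edges:
    W[u][v] = W[v][u] = w

x2 = fiedler_vector(W)
positive = [i for i, v in enumerate(x2) if v > 0]
negative = [i for i, v in enumerate(x2) if v <= 0]
parts = sorted([positive, negative])
print(parts, cut_weight(W, parts[0]))
```

The overall sign of the vector is arbitrary, so only the grouping of the vertices matters, not which group is labeled positive.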
Clustering
• Application to OO systems: edges are undirected, and the weight of an edge is the total number of messages exchanged in both directions
• Partitioning is performed iteratively
• When to stop? When a resulting sub-graph is less cohesive than its parent graph
Clustering
[Figure: class diagram with ten classes (BusinessLogic, Entity1, Entity2, MainFrame, InputForm, Confirmation, DB, Connection, Statement, Result) and message counts on the associations, together with the corresponding undirected graph with vertices 1 to 10 and weighted edges.]
Clustering
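[Figure: the weighted graph with vertices 1 to 10, partitioned in two steps.]
First bisection (separates the DB cluster), entry magnitudes of $x_2^T$: [0.446, 0.359, 0.317, 0.359, 0.152, 0.152, 0.108, 0.313, 0.388, 0.366]
Second bisection (separates Logic from GUI), entry magnitudes of $x_2^T$: [0.491, 0.491, 0.192, 0.285, 0.480, 0.410]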
Design Pattern Detection
• Design patterns (descriptions of communicating classes) form solutions to common problems
• According to Parnas, software engineering deals with multi-version projects
• Multiple versions + large number of components = complicated and messy architecture
• Patterns impose structure
• Consequently, the identification of implemented patterns is useful for understanding an existing design and enables further improvements
Design Pattern Detection
[Figure: overview of the detection approach. A UML class diagram (elements I, A, D, X, Y and Z, each declaring doIt(); X, Y and Z also declare doX(), doY() and doZ(); an association labeled core) is mapped to a graph representation and then, with further annotations, to a set of 0/1 matrices. The matrices of the system under study are matched against the matrices of the sought design pattern.]
Design Pattern Detection
• Classical pattern matching algorithms fail since patterns often differ from the standard representation
[Figure: System Segment 1, System Segment 2 and Pattern, drawn with vertices A, B, C; 1, 2; and a, b.]
Design Pattern Detection
• Exploiting recent research on graph similarity [Blondel2004], it is possible to measure the degree of similarity between a vertex of one graph and a vertex of another
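A minimal sketch of such a vertex similarity computation, following the iteration described by Blondel et al.: S ← B S Aᵀ + Bᵀ S A, normalized after every step, with the result read off after an even number of iterations. The two tiny graphs are hypothetical, and the scores are not rescaled in the way the detection tool may rescale them.

```python
import math


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def transpose(X):
    return [list(row) for row in zip(*X)]


def vertex_similarity(A, B, iterations=100):
    """S[i][j] scores vertex i of graph B against vertex j of graph A.

    Assumes both graphs contain at least one edge; use an even iteration
    count, since odd and even iterates may converge to different limits.
    """
    S = [[1.0] * len(A) for _ in range(len(B))]
    At, Bt = transpose(A), transpose(B)
    for _ in range(iterations):
        forward = matmul(matmul(B, S), At)
        backward = matmul(matmul(Bt, S), A)
        S = [[f + b for f, b in zip(fr, br)] for fr, br in zip(forward, backward)]
        norm = math.sqrt(sum(v * v for row in S for v in row))
        S = [[v / norm for v in row] for row in S]
    return S


# Hypothetical system graph: vertex 0 points to vertices 1 and 2.
system = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
# Hypothetical pattern graph: vertex 0 points to vertex 1.
pattern = [[0, 1], [0, 0]]

S = vertex_similarity(system, pattern)
for row in S:
    print([round(v, 3) for v in row])
```

In this example the pattern's source vertex scores only against the system's source vertex, and the pattern's target vertex scores equally against both system targets.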
Design Pattern Detection
[Figure: generalization graphs and association graphs over the vertices A, B, C; a, b; and 1, 2, annotated with vertex similarity scores of 1, 1, 0, 0 and 0.5, 0.5, 1, 1.]
Design Pattern Detection
• Experimental Results:
JHotDraw v5.1 (172 classes)
JRefactory v2.6.24 (572 classes)
JUnit 3.7 (99 classes)
Design Pattern Detection
TP = true positives, FN = false negatives

Design Pattern       JHotDraw v5.1    JRefactory v2.6.24    JUnit v3.7
                     TP    FN         TP    FN              TP    FN
Adapter*/Command     18    0          7     0               1     0
Composite            1     0          0     0               1     0
Decorator            3     0          1     0               1     0
Factory Method       2     1          1     3               0     0
Observer             5     0          0     0               4     0
Prototype            1     0          0     0               0     0
Singleton            2     0          12    0               0     0
State/Strategy       22    1          11    1               3     0
Template Method      5     0          17    0               1     0
Visitor              1     0          2     0               0     0
Scale-Freeness of OO Systems
• Popular topic: investigating whether certain systems (technological, biological, social, etc.) are scale-free
• A scale-free phenomenon shows up statistically in the form of a power law
• For a network, the probability P(k) that a node connects with k other nodes is $P(k) \sim k^{-\gamma}$
Scale-Freeness of OO Systems
• Naturally, research has also focused on OO systems
• Scale-freeness is usually graphically detected, since the relationship of P(k) vs. k, plotted on a log-log scale, appears as a line with slope -γ
[Figure: log-log plot of cumulative frequency versus k for JUnit, JHotDraw and JRefactory.]
Scale-Freeness of OO Systems
[Figure: panels (a) to (e); the plot shows cumulative frequency versus vertex degree on log-log axes.]
Degree Sequence = {16, 8, 8, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
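The graphical check can be imitated numerically. The sketch below takes the degree sequence above, computes the cumulative frequency (the number of vertices with degree at least k), and fits a least-squares line on log-log axes. The function names are illustrative. For a pure power law $P(k) \sim k^{-\gamma}$ the cumulative curve is expected to have slope $-(\gamma - 1)$, so the fitted value should be read with that in mind.

```python
import math
from collections import Counter

# Degree sequence from the slide: one 16, two 8s, four 4s, eight 2s, sixteen 1s.
degrees = [16] + [8] * 2 + [4] * 4 + [2] * 8 + [1] * 16


def cumulative_frequency(sequence):
    """Map each occurring degree k to the number of vertices with degree >= k."""
    counts = Counter(sequence)
    return {k: sum(c for d, c in counts.items() if d >= k) for k in sorted(counts)}


def loglog_slope(freq):
    """Least-squares slope of log(frequency) against log(k)."""
    xs = [math.log(k) for k in freq]
    ys = [math.log(f) for f in freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


freq = cumulative_frequency(degrees)
slope = loglog_slope(freq)
print(freq)
print(round(slope, 2))
```

Five points are too few for a reliable fit. The sketch only shows the mechanics of the log-log test.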
Scale-Freeness of OO Systems
• Recently, in [Li2005], a structural metric has been proposed to evaluate the scale-freeness of a network.
• For an undirected, simple and connected graph g = (V, E) with vertex degrees $d_i$:
$s(g) = \sum_{(i,j) \in E} d_i\, d_j$
• The metric value is maximized when high-degree nodes ("hubs") are connected to other high-degree nodes.
• Among all graphs having the same degree sequence, one graph attains the maximum value $s_{max}$ of the metric s(g) and one attains the minimum value $s_{min}$. Thus:
$S = \frac{s(g) - s_{min}}{s_{max} - s_{min}}$
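As a worked illustration, the sketch below evaluates $s(g) = \sum_{(i,j) \in E} d_i d_j$ for two hypothetical trees that share the degree sequence {3, 3, 2, 1, 1, 1, 1}. These two trees are the only connected simple graphs with that sequence, so their values can stand in for $s_{max}$ and $s_{min}$ here. For larger systems, finding the extreme graphs is a separate problem that this sketch does not address.

```python
def s_metric(edges):
    """Sum of d_i * d_j over all edges (i, j) of an undirected simple graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree[u] * degree[v] for u, v in edges)


def normalized_s(s, s_min, s_max):
    """S = (s - s_min) / (s_max - s_min); assumes s_max > s_min."""
    return (s - s_min) / (s_max - s_min)


# Hypothetical tree 1: the two degree-3 hubs a and b are directly connected.
hubs_adjacent = [("a", "b"), ("a", "c"), ("c", "l1"), ("a", "l2"), ("b", "l3"), ("b", "l4")]
# Hypothetical tree 2: the hubs are separated by the degree-2 vertex c.
hubs_separated = [("a", "c"), ("c", "b"), ("a", "l1"), ("a", "l2"), ("b", "l3"), ("b", "l4")]

s_high = s_metric(hubs_adjacent)
s_low = s_metric(hubs_separated)
print(s_high, s_low)
print(normalized_s(s_high, s_low, s_high), normalized_s(s_low, s_low, s_high))
```

Connecting the hubs directly raises s(g), which matches the observation that the metric is maximized when high-degree nodes are linked to each other.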
Scale-Freeness of OO Systems
Given such a metric, it is possible:
• to validate whether a given OO system is scale-free
• to assess whether an optimization increases scale-freeness
• to evaluate the evolution of systems in terms of scale-freeness
[Figure: scale-free metric S (vertical axis, 0.3 to 0.65) across successive versions of JUnit, JHotDraw and JRefactory.]
Conclusions
• Graph Theory has been widely applied in several CS fields
• It can provide a powerful "tool" for analyzing OO systems
• quantification of properties
• identification of structures
• Graph Theory is important for CS curricula