Components - Graph-based Detection of Library API Imitations
DESCRIPTION
Paper: Graph-based Detection of Library API Imitations
Authors: Chengnian Sun, Siau-Cheng Khoo, Shao Jie Zhang (all from National University of Singapore)
Session: Research Track Session 7: Component
TRANSCRIPT
Graph-based Detection of
Library API Imitations
October 6, 2011
Chengnian Sun, Siau-Cheng Khoo, Shao Jie Zhang
National University of Singapore
Motivation – Software Libraries
Common practice to employ 3rd-party software libraries
Providing certain functionalities / hiding implementation details
Improving productivity
Well tested
Enhancing program quality
Application Programming Interfaces (APIs)
Exported by libraries
Ways for programmers to interact with libraries
Motivation – Problem
APIs are not always effectively used by programmers
Imitation: client code re-implements the behavior of library APIs
Reasons
Unfamiliarity with the library
Library evolution
Cost
Wastes resources, time, and energy
Error-prone; causes software maintenance issues
Motivation – Example from JBoss
Imitation (1): method.getInterceptors() == null || method.getInterceptors().length < 1
API: return (interceptors != null && interceptors.length > 0)
Refactor to: !method.hasAdvices()
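To make the JBoss example concrete, here is a minimal Java sketch. The class `MethodInfo`, its constructor, and the `Object[]` element type are hypothetical stand-ins for the real JBoss type; only the method names `getInterceptors()` and `hasAdvices()` and the expressions come from the slides. It shows that the client's hand-written condition and the suggested refactoring `!method.hasAdvices()` agree.

```java
import java.util.Arrays;

// Hypothetical stand-in for the JBoss class on the slide; only the method
// names getInterceptors() and hasAdvices() are taken from the talk.
class MethodInfo {
    private final Object[] interceptors; // element type is an assumption

    MethodInfo(Object[] interceptors) {
        this.interceptors = interceptors;
    }

    Object[] getInterceptors() {
        return interceptors;
    }

    // Library API: body as shown on the slide.
    boolean hasAdvices() {
        return (interceptors != null && interceptors.length > 0);
    }
}

class ImitationExample {
    // Client code that re-implements the negation of hasAdvices().
    static boolean noAdvicesImitation(MethodInfo method) {
        return method.getInterceptors() == null
                || method.getInterceptors().length < 1;
    }

    // Suggested refactoring: call the existing API instead.
    static boolean noAdvicesRefactored(MethodInfo method) {
        return !method.hasAdvices();
    }

    public static void main(String[] args) {
        Object[][] cases = { null, new Object[0], new Object[] { "i1" } };
        for (Object[] c : cases) {
            MethodInfo m = new MethodInfo(c);
            System.out.println(Arrays.toString(c) + " -> imitation="
                    + noAdvicesImitation(m) + ", refactored=" + noAdvicesRefactored(m));
        }
    }
}
```

The two conditions agree by De Morgan's law: the negation of (interceptors != null && length > 0) is (interceptors == null || length <= 0), and for an integer length, length <= 0 is the same as length < 1.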
Motivation
A library API imitation can be
Not exactly the same as the API's implementation
Inter-procedural (spread across several methods)
Goal: to accurately detect such imitations
Detection of Library API Imitations
Motivation
Definitions
Data Dependency Graph
Trace & Subtrace
Trace Subsumption
Potential Imitation
Algorithms
Pre-processing & Post-validation
Case Studies
Conclusion
Definitions – Overview
Employing data dependency graphs (DDGs) to represent code
Semantic representation
Capturing data flows within a method
Carrying a portion of control-flow information
A library DDG is trace-subsumed by a client DDG ⇒ potential API imitation
Trace subsumption: a relaxation of subgraph isomorphism
More efficient to check
Tolerant of minor differences
Definitions – Data Dependency Graph
DDG – a graphical representation of a method
Vertices: basic statements (three-address form)
Edges v → u: the direction represents data dependency
Vertex u is data dependent on vertex v if a variable var is
defined at v,
used at u,
and there is an execution path P from v to u along which var is not redefined.
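The data-dependency rule can be sketched for the simplest case. This Java sketch assumes straight-line three-address code (no branches or loops), so the only execution path from v to u is the statement sequence itself; a real DDG builder would need a reaching-definitions analysis over the control-flow graph. The `TacStmt` record and the example program are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical three-address statement: defines at most one variable (def may be null).
record TacStmt(String text, String def, List<String> uses) {}

class StraightLineDdg {
    // Returns DDG edges {v, u}, meaning u is data dependent on v.
    // Assumes straight-line code: the most recent definition of var before u
    // is the only one that reaches u without var being redefined.
    static List<int[]> edges(List<TacStmt> stmts) {
        Map<String, Integer> lastDef = new HashMap<>();
        List<int[]> result = new ArrayList<>();
        for (int u = 0; u < stmts.size(); u++) {
            TacStmt s = stmts.get(u);
            // Uses are resolved before s's own definition,
            // so "x = x + 1" depends on the previous definition of x.
            for (String var : new LinkedHashSet<>(s.uses())) {
                Integer v = lastDef.get(var);
                if (v != null) {
                    result.add(new int[] { v, u });
                }
            }
            if (s.def() != null) {
                lastDef.put(s.def(), u); // later uses now depend on this vertex
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TacStmt> prog = List.of(
                new TacStmt("a = p", "a", List.of("p")),
                new TacStmt("b = a + 1", "b", List.of("a")),
                new TacStmt("a = b * 2", "a", List.of("b")),
                new TacStmt("c = a + b", "c", List.of("a", "b")));
        for (int[] e : edges(prog)) {
            System.out.println(prog.get(e[0]).text() + "  ->  " + prog.get(e[1]).text());
        }
    }
}
```

In the example, `c = a + b` depends on `a = b * 2` but not on `a = p`, because `a` is redefined in between.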
Definitions – Trace & Subtrace
A trace in a data dependency graph
A path of vertices, <v1, v2, …, vm>
The first vertex is an entry of the graph
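Given two traces $T_1 = \langle v_1, v_2, \ldots, v_m \rangle$ and $T_2 = \langle u_1, u_2, \ldots, u_n \rangle$,
$T_1$ is a subtrace of $T_2$ ($T_1 \leq T_2$) if there exists an integer $i$ with $0 \leq i \leq n - m$ such that
$\mathit{match}(v_1, u_{1+i}), \mathit{match}(v_2, u_{2+i}), \ldots, \mathit{match}(v_m, u_{m+i})$ all hold.
The subtrace relation is a generalization of the substring relation.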
T1 = <C, D, E>
T2 = <A, B, C, D, E, F>
$i = 2$: C, D, E match $u_3, u_4, u_5$ of $T_2$
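The subtrace test is a direct generalization of substring search. Below is a minimal Java sketch that assumes vertices are compared through a `match` predicate supplied by the caller; the slides do not define `match` here, so the usage simply compares vertex labels for equality. It returns the smallest offset i, or -1 if T1 is not a subtrace of T2.

```java
import java.util.List;
import java.util.function.BiPredicate;

class Subtrace {
    // Returns the smallest i with 0 <= i <= n - m such that match(t1[k], t2[k + i])
    // holds for every k, or -1 if there is none. Indices are 0-based here,
    // so t2.get(k + i) corresponds to u_{k+1+i} in the 1-based definition.
    static <V, U> int offset(List<V> t1, List<U> t2, BiPredicate<V, U> match) {
        int m = t1.size();
        int n = t2.size();
        for (int i = 0; i <= n - m; i++) {
            boolean all = true;
            for (int k = 0; k < m && all; k++) {
                all = match.test(t1.get(k), t2.get(k + i));
            }
            if (all) {
                return i;
            }
        }
        return -1;
    }

    static <V, U> boolean isSubtrace(List<V> t1, List<U> t2, BiPredicate<V, U> match) {
        return offset(t1, t2, match) >= 0;
    }

    public static void main(String[] args) {
        List<String> t1 = List.of("C", "D", "E");
        List<String> t2 = List.of("A", "B", "C", "D", "E", "F");
        System.out.println(offset(t1, t2, String::equals)); // prints 2
    }
}
```

As with substrings, the matched vertices must be contiguous: with the same T2, the trace <C, E> is not a subtrace, because D lies between C and E.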
Definitions – Trace Subsumption
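A data dependency graph $G_{lib}$ (of a library API)
A data dependency graph $G_{clt}$ (of a client method)
$G_{clt}$ trace-subsumes $G_{lib}$ if and only if
for each trace $T_{lib}$ in $G_{lib}$ there exists at least one trace $T_{clt}$ in $G_{clt}$
such that $T_{lib}$ is a subtrace of $T_{clt}$ ($T_{lib} \leq T_{clt}$)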
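For small graphs the definition can be checked literally by enumerating traces. This naive Java sketch is not the algorithm from the talk (which uses a depth-first search); it assumes acyclic DDGs, treats vertices with no incoming edges as entries, uses label equality as `match`, and enumerates only complete entry-to-sink traces. That restriction is safe on acyclic graphs: every prefix of a matched library trace is matched as well, and extending a client trace to a sink never breaks a match. The `TraceDdg` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical labeled DDG: vertex i has label labels.get(i) and successors succ.get(i).
class TraceDdg {
    final List<String> labels;
    final List<List<Integer>> succ;

    TraceDdg(List<String> labels, List<List<Integer>> succ) {
        this.labels = labels;
        this.succ = succ;
    }

    // Entries: vertices with no incoming edges (an assumption for this sketch).
    List<Integer> entries() {
        boolean[] hasPred = new boolean[labels.size()];
        for (List<Integer> out : succ) {
            for (int w : out) {
                hasPred[w] = true;
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int v = 0; v < labels.size(); v++) {
            if (!hasPred[v]) {
                result.add(v);
            }
        }
        return result;
    }

    // All entry-to-sink traces as label sequences; terminates only on acyclic graphs.
    List<List<String>> maximalTraces() {
        List<List<String>> out = new ArrayList<>();
        for (int e : entries()) {
            extend(e, new ArrayList<>(), out);
        }
        return out;
    }

    private void extend(int v, List<String> path, List<List<String>> out) {
        path.add(labels.get(v));
        if (succ.get(v).isEmpty()) {
            out.add(new ArrayList<>(path));
        } else {
            for (int w : succ.get(v)) {
                extend(w, path, out);
            }
        }
        path.remove(path.size() - 1);
    }
}

class NaiveTraceSubsumption {
    // G_clt trace-subsumes G_lib if every library trace is a subtrace of some client trace.
    // With label equality as match, a subtrace is exactly a contiguous sublist.
    static boolean subsumes(TraceDdg clt, TraceDdg lib) {
        List<List<String>> cltTraces = clt.maximalTraces();
        for (List<String> tLib : lib.maximalTraces()) {
            boolean found = false;
            for (List<String> tClt : cltTraces) {
                if (Collections.indexOfSubList(tClt, tLib) >= 0) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

The number of traces can grow exponentially with graph size, which is one reason to prefer a search that checks the match step by step instead of materializing every trace.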
Definitions – Potential Imitation
A client method Clt potentially imitates a library method Lib if there exist
a DDG $G_{clt}$ of Clt, resulting from inlining zero or more method calls into Clt, and
a DDG $G_{lib}$ of Lib, resulting from inlining zero or more method calls into Lib,
such that $G_{clt}$ trace-subsumes $G_{lib}$
Detection of Library API Imitations
Motivation
Definitions
Algorithms
Overall Algorithm
Trace Subsumption Checking
Pre-processing & Post-validation
Case Studies
Conclusion
Algorithms – Overall Algorithm
Input:
  A library API Lib
  A client method Clt
  A set S of all method calls in both Lib and Clt
Output: true if Clt potentially imitates Lib
Body:
  for each subset s of S {
    Lib’ = a copy of Lib with the calls in s inlined
    Clt’ = a copy of Clt with the calls in s inlined
    if the DDG of Clt’ trace-subsumes the DDG of Lib’
      return true
  }
  return false
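The outer loop tries every subset of S, so it may perform up to 2^|S| subsumption checks. A minimal Java sketch of just that loop, enumerating subsets as bitmasks starting from the empty set (no calls inlined). The `subsumesAfterInlining` hook is hypothetical: it stands in for inlining the chosen calls into copies of Lib and Clt and running the trace-subsumption check, neither of which is specified on the slide.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class OverallAlgorithm {
    // calls: the set S of method calls appearing in Lib and Clt.
    // subsumesAfterInlining: hypothetical hook that inlines the given calls into
    // copies of Lib and Clt and reports whether DDG(Clt') trace-subsumes DDG(Lib').
    static boolean potentiallyImitates(List<String> calls,
                                       Predicate<Set<String>> subsumesAfterInlining) {
        int n = calls.size();
        if (n > 30) {
            throw new IllegalArgumentException("too many calls for an int bitmask");
        }
        for (int mask = 0; mask < (1 << n); mask++) {
            // Bit k of mask selects calls.get(k) for inlining.
            Set<String> s = new LinkedHashSet<>();
            for (int k = 0; k < n; k++) {
                if ((mask & (1 << k)) != 0) {
                    s.add(calls.get(k));
                }
            }
            if (subsumesAfterInlining.test(s)) {
                return true; // some inlining choice yields trace subsumption
            }
        }
        return false;
    }
}
```

Because the search stops at the first subset that works, trying smaller subsets first (as the bitmask order roughly does) keeps the common case cheap; the worst case is still exponential in |S|.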
Algorithms – Trace Subsumption
Input:
  A DDG of a library API, $G_{lib}$
  A DDG of a client method, $G_{clt}$
Output: true if $G_{clt}$ trace-subsumes $G_{lib}$
Approach: depth-first search, checking the match step by step
Algorithms – An Example
[Figure: a library DDG and a client DDG. Each step shows the Current pair and the Stack; a pair (X, S) is a library vertex X with the set S of client vertices matching it]
Step 1: Locate all client vertices matching each entry of the library and push (A, {A, A}).
Current: none; Stack: (A, {A, A})
Step 2: Pop (A, {A, A}); locate client vertices matching library A’s successor D and push (D, {D}).
Current: (A, {A, A}); Stack: (D, {D})
Step 3: Locate client vertices matching library A’s successor B and push (B, {B}).
Current: (A, {A, A}); Stack (top first): (B, {B}), (D, {D})
Step 4: Pop (B, {B}); B has no successors in the library, so nothing is pushed.
Current: (B, {B}); Stack: (D, {D})
Step 5: Pop (D, {D}); locate client vertices matching library D’s successor M.
Current: (D, {D}); Stack: empty
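The walkthrough can be sketched as a stack-based depth-first search over the library DDG, where each stack item pairs a library vertex with the client vertices it is matched to along the current path. This Java sketch is one possible reading of the example, not the authors' implementation. It assumes acyclic DDGs, label equality as `match`, and entries as vertices without incoming edges; starting from every matching client vertex is safe because, in an acyclic graph, every vertex lies on a trace from some entry. The `LabeledDdg` and `StackItem` records are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical labeled DDG: vertex i has label labels().get(i) and successors succ().get(i).
record LabeledDdg(List<String> labels, List<List<Integer>> succ) {
    // Entry: a vertex with no incoming edge (assumption for this sketch).
    boolean isEntry(int v) {
        for (List<Integer> out : succ) {
            if (out.contains(v)) {
                return false;
            }
        }
        return true;
    }
}

// Stack item: a library vertex and the client vertices matched to it on the current path.
record StackItem(int libVertex, Set<Integer> cltVertices) {}

class DfsTraceSubsumption {
    static boolean subsumes(LabeledDdg clt, LabeledDdg lib) {
        Deque<StackItem> stack = new ArrayDeque<>();
        // Locate all client vertices matching each entry of the library.
        for (int e = 0; e < lib.labels().size(); e++) {
            if (!lib.isEntry(e)) {
                continue;
            }
            Set<Integer> matched = new HashSet<>();
            for (int u = 0; u < clt.labels().size(); u++) {
                if (clt.labels().get(u).equals(lib.labels().get(e))) {
                    matched.add(u);
                }
            }
            if (matched.isEmpty()) {
                return false; // a library trace cannot even start in the client
            }
            stack.push(new StackItem(e, matched));
        }
        while (!stack.isEmpty()) {
            StackItem current = stack.pop();
            // For each library successor w, keep the client successors that match w.
            // A library vertex without successors ends a fully matched trace.
            for (int w : lib.succ().get(current.libVertex())) {
                Set<Integer> next = new HashSet<>();
                for (int u : current.cltVertices()) {
                    for (int u2 : clt.succ().get(u)) {
                        if (clt.labels().get(u2).equals(lib.labels().get(w))) {
                            next.add(u2);
                        }
                    }
                }
                if (next.isEmpty()) {
                    return false; // no client path continues this library trace
                }
                stack.push(new StackItem(w, next));
            }
        }
        return true;
    }
}
```

Pushing successors in order (D, then B) leaves B on top, which reproduces the order of the example: (B, {B}) is popped before (D, {D}). Items are pushed per library path, so two paths reaching the same library vertex keep separate client sets.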
Detection of Library API Imitations
Motivation
Definitions
Algorithms
Pre-processing & Post-validation
Case Studies
Conclusion
Pre-processing Libraries
Remove nullness checks, e.g.
  if (a == null) {
    return Constant;
  } else {
    a.XXX();
  }
Remove assertions, e.g.
  if (…)
    throw new Exception();
  …
Remove exception handlers, e.g.
  try {
  } catch (…) {}
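The three library rewrites can be sketched on a toy statement tree. The IR below is hypothetical (the talk does not say at which level the rewriting is done), and the exact rewrite rules are assumptions: a nullness check keeps only its non-null branch, an assertion-style `if (…) throw` is dropped, and a `try` block keeps its body while its handler is discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical statement tree, just detailed enough for the three rewrites.
interface IrNode {}
record Plain(String text) implements IrNode {}
record NullCheck(String var, List<IrNode> ifNull, List<IrNode> ifNonNull) implements IrNode {}
record ThrowIf(String condition) implements IrNode {}
record TryCatch(List<IrNode> body, List<IrNode> handler) implements IrNode {}

class LibraryPreprocessor {
    static List<IrNode> preprocess(List<IrNode> stmts) {
        List<IrNode> out = new ArrayList<>();
        for (IrNode s : stmts) {
            if (s instanceof NullCheck nc) {
                // Nullness check: keep only the branch taken when the variable is non-null.
                out.addAll(preprocess(nc.ifNonNull()));
            } else if (s instanceof TryCatch tc) {
                // Exception handler: keep the protected body, drop the handler.
                out.addAll(preprocess(tc.body()));
            } else if (!(s instanceof ThrowIf)) {
                // Ordinary statements are kept; ThrowIf (assertion-style checks) is dropped.
                out.add(s);
            }
        }
        return out;
    }
}
```

After this rewriting, the first slide pattern reduces to `a.XXX()`, so a client that calls `a.XXX()` without its own null guard can still be matched against the library's core behavior.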
Post-validating Reported Imitations
Reject the following two cases
Unmatched Inlined Vertices in Client
Matching All References to Library Locals
Case Studies
Evaluation measure
Subjects – 10 open-source Java projects
Testbed:
Intel Core 2 Quad CPU at 3.00 GHz, 8 GB memory
Case Studies – Two Experiments
Detecting Imitations of Imported Libraries
Testing all method pairs (lib, clt) where the declaring class of lib is already imported in the client class
Precision = 313 / 383 = 82%
Runtime = 314 seconds
Detecting Imitations of Static Libraries
Testing all method pairs (lib, clt) where lib is a public static method
Precision = 116 / 155 = 75%
Runtime = 396 seconds
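Precision is the fraction of reported imitations that are true imitations. Reading 383 and 155 as the numbers of reported imitations in the two experiments, the percentages work out as follows (rounded to the nearest percent):

```latex
\text{Precision} = \frac{\text{true imitations}}{\text{reported imitations}}, \qquad
\frac{313}{383} \approx 0.817 \approx 82\%, \qquad
\frac{116}{155} \approx 0.748 \approx 75\%
```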
Case Studies – Example of Static API
Conclusion
It is common practice to employ 3rd-party software libraries
Client code may re-implement the behavior of existing APIs
An algorithm based on data dependency graphs to detect complex imitations
Precision of 82% and 75% in the two experiments
Thank you.
Q&A