sentence-to-code traceability recovery with domain ontologies
DESCRIPTION
Presented at APSEC 2010 http://dx.doi.org/10.1109/APSEC.2010.51TRANSCRIPT
Shinpei Hayashi, Takashi Yoshikawa, Motoshi SaekiTokyo Institute of Technology, Japan
Sentence-to-CodeTraceability Recovery
with Domain Ontologies
NL
sent
ence
Source code
Results: It works well!− Implemented an automated tool− Performed a case study using JDraw
• Recovered traceability between 7 sentences and code• Obtained more accurate results than the cases
without using ontology
Documentation-to-code traceability links are important− For reducing maintenance costs − For software reuse & extension
Our focus: recoveringsentence-to-code traceability− In some software products, there’s only
documentation of simple sentences without any detailed descriptionE.g., by use of agile/bazaar-style processes
Background
4
JDraw (http://jdraw.sf.net/)
Example of Sentence
5
JDraw (http://jdraw.sf.net/)
Example of Sentence
JDraw (http://jdraw.sf.net/)
Example of Sentence
6
plain, filled andgradient filled rectangles
Just single sentence, no detailed documents
7
To detect the set of methods related to the input sentence
Aim
Users can drawa plain oval.
NL sentence(a set of words)
draw Oval()
getColor()
OvalTool()DrawPanel()
writeLog()
getCanvas()
setPixel()
getColorPallete()
Source code (a set of methods)
8
How to find the correct set?− Word similarity: leads to false positives/negatives
Challenge
Users can drawa plain oval.
NL sentence(a set of words)
getColor()
OvalTool()DrawPanel()
writeLog()
getCanvas()
setPixel()
getColorPallete()
Source code (a set of methods)
Falsenegatives
Falsepositives
void setPixel(...) {... draw ...
}
draw Oval()
9
How to find the correct set?− Word similarity: leads to false positives/negatives− Method invocation: leads to false positives
Challenge
Users can drawa plain oval.
NL sentence(a set of words)
getColor()
OvalTool()DrawPanel()
writeLog()
getCanvas()
setPixel()
getColorPallete()
Source code (a set of methods)Falsepositive
drawOval()
10
Another criterion required− To judge whether a method invocation is needed− Considering the problem domain
Challenge
Users can drawa plain oval.
NL sentence(a set of words)
drawOval()
getColor()
OvalTool()DrawPanel()
writeLog()
getCanvas()
setPixel()
getColorPallete()
Source code (a set of methods)
Notimportantimportant
11
Formally representing the knowledge of the target problem domain− As relationships between concepts (words)
Domain Ontology
An ontology forpainting tools (excerpt)
canvas
draw
oval color
A concept “canvas” is a possible targetto “draw”.
The function “draw”concerns the concept “color”.
12
Choosing method invocationsusing domain ontology
Solution
Users can drawa plain oval.
NL sentence(a set of words)
draw Oval()
getColor()
OvalTool()DrawPanel()
writeLog()
getCanvas()
setPixel()
getColorPallete()
Source code (a set of methods and their invocations)
Ontology
canvas
draw
oval color
13
System OverviewInputs
FunctionalCall-graphs
(FCGs)
Outputs
Sentence
SourceCode
DomainOntology
2ndscore
85
1stscore100
14
Procedure
m1
m4
… draw oval
m2 m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
1. Extractingcall-graph
2. Extracting words3. Extracting functional
call-graphs (FCGs)4. Prioritizing FCGs
15
Procedure
m1
m4
… draw oval
m2 m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
1. Extractingcall-graphby static analysis
2. Extracting words3. Extracting functional
call-graphs (FCGs)4. Prioritizing FCGs
16
Procedure
m1
m4
… draw oval
m2 m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
1. Extractingcall-graph
2. Extracting words• Stemming• Removing stopwords
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs
{..., draw, oval}{..., draw}
{..., draw}
{..., oval}{..., color}
{..., pixel} {..., log}
{..., color}
{..., scale}
17
Procedure
m1
m4
… draw oval
m2 m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
Sa
Sb
1. Extractingcall-graph
2. Extracting words3. Extracting
functional call-graphs (FCGs)
4. Prioritizing FCGs
18
Procedure
m1
m4
… draw oval
m2 m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
Sa
Sb
Score100
1. Extractingcall-graph
2. Extracting words3. Extracting functional
call-graphs (FCGs)4. Prioritizing FCGs
Score85
19
Extracting FCGs
m1
m4
{wa, wb}
1. Root selection
2. Traversal
m2 m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
20
Extracting FCGs
m1role: {wa}
m4role: {wb}
{wb, ...}
{wa, ...}{wa, wb}
1. Root selection– Choose the methods
having the words in the input sentence
– These words are tagged as roles
2. Traversal
m2role: {wb}
m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
{wb, ...}
21
Extracting FCGs
{wa, wb}
1. Root selection2. Traversal– Traverse method
invocations fromthe roots
– Forwards if the invocation satisfies one of the traversal rules
m3
m5
m6 m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
m4role: {wb}
m2role: {wb}
m1role: {wa}
m6 m7 m8
m4role: {wb}
1. Root selection2. Traversal
– Traverse method invocations from the roots if the invocation satisfies one of the traversal rules
Source code (a set of methods and their invocations)
m3
m5
22
Extracting FCGs
m2role: {wb}
Rule #1 (Sentence-based)holds if the callee methodhas a word in the givensentence
{wa, wb}
NL sentence(a set of words)
m1role: {wa}
{..., wb}
m6 m7 m8
m4role: {wb}
{wa, wb}
1. Root selection2. Traversal
– Traverse method invocations from the roots if the invocation satisfies one of the traversal rules
NL sentence(a set of words) Source code
(a set of methods and their invocations)
m2role: {wb}
m5
23
Extracting FCGs
m1role: {wa}
Rule #2 (Ontology-based)
holds if the method invocationmatches the relationships inthe given ontology
m3role: {wc}
{..., wc}
m6 m7 m8
m4role: {wb}
{wa, wb}
1. Root selection2. Traversal
– Traverse method invocations from the roots if the invocation satisfies one of the traversal rules
NL sentence(a set of words) Source code
(a set of methods and their invocations)
m1role: {wa}
m2role: {wb}
24
Extracting FCGs
m5role: {wc}
m3role: {wc}Rule #3 (Inheritance)
holds if the callee methodhas a word in the roles of the caller method
{..., wc}
25
Ontology-Based Rule
canvas
draw
Holds if the invocation matches the relationships in the given ontology
Ontology
Caller
CalleegetCanvas()role: {canvas}
drawOval()role: {draw, oval}
26
Extracting FCGs
{wa, wb}
1. Root selection2. Traversal– Traverse method
invocations from the roots
– Traversed methods will be a FCG
m2role: {wb}
m5role: {wc}
m6role: {…}
m7 m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
m1role: {wa}
m3role: {wc}
m4role: {wb}
Sa
Sb
27
Using an weighting scheme Criteria: we prioritize FCGs which
1. include methods having important roles as their names
2. include method invocations matching to the relationships in the ontology and/or the sentence
3. cover many words in the input sentence
Prioritizing FCGs
28
Evaluation target: JDraw 1.1.5− We picked up 7 sentences from JDraw's manual
on the Web− Prepared an ontology for painting tools
(including 38 concepts and 45 relationships)− Prepared control (answer) sets of methods by an
expert
Evaluation Criteria− Calculating precision and recall values by
comparing the extracted sets with the control sets
Case Study
29
Use of ontologyInput sentences
YesPrec.
Yes Recall
NoPrec.
NoRecall
1. "plain, filled and gradient filled rectangles" 0.83 0.94 1.00 0.192. "plain, filled and gradient filled ovals" 0.82 0.98 1.00 0.213. "image rotation" 1.00 0.35 0.00 0.004. "image scaling" 0.22 0.68 1.00 0.585. "save JPEGs of configurable quality" 0.40 1.00 0.67 1.006. "colour reduction" 0.74 0.95 0.74 0.957. "grayscaling" - - - -
Results
Picked up the FCGs having the highest F-measure from the results ranked in the top 5− F-measure = 2 / (precision-1 + recall-1)
30
Use of ontologyInput sentences
YesPrec.
Yes Recall
NoPrec.
NoRecall
1. "plain, filled and gradient filled rectangles" 0.83 0.94 1.00 0.192. "plain, filled and gradient filled ovals" 0.82 0.98 1.00 0.213. "image rotation" 1.00 0.35 0.00 0.004. "image scaling" 0.22 0.68 1.00 0.585. "save JPEGs of configurable quality" 0.40 1.00 0.67 1.006. "colour reduction" 0.74 0.95 0.74 0.957. "grayscaling" - - - -
Results
Accurate results by using the ontology− precision > 0.7 for 3 cases− recall > 0.9 for 4 cases
31
Use of ontologyInput sentences
YesPrec.
Yes Recall
NoPrec.
NoRecall
1. "plain, filled and gradient filled rectangles" 0.83 0.94 1.00 0.192. "plain, filled and gradient filled ovals" 0.82 0.98 1.00 0.213. "image rotation" 1.00 0.35 0.00 0.004. "image scaling" 0.22 0.68 1.00 0.585. "save JPEGs of configurable quality" 0.40 1.00 0.67 1.006. "colour reduction" 0.74 0.95 0.74 0.957. "grayscaling" - - - -
Results
Improvement by using the ontology− Improved recall for 1st and 2nd cases− detected traceability for 3rd case
Domain ontology gives us valuable guidesfor sentence-to-code traceability recovery.
Future Work Case study++
− Larger− Another/multiple domains
Supporting ontology construction Improving NLP techniques Combination of other techniques
− Dynamic analysis− Relevance feedback
5
JDraw (http://jdraw.sf.net/)
Example of Sentence
12
Choosing method invocationsusing domain ontology
Solution
Users can drawa plain oval.Users can drawa plain oval.
NL sentence(a set of words)
draw Oval()draw Oval()
getColor()getColor()
OvalTool()OvalTool()DrawPanel()DrawPanel()
writeLog()writeLog()
getCanvas()getCanvas()
setPixel()
getColorPallete()getColorPallete()
Source code (a set of methods and their invocations)
Ontology
canvascanvas
drawdraw
ovaloval colorcolor
28
Procedure
{wa, wb}{wa, wb}
1. Root selection2. Traversal
– Traverse method invocations from the roots if the invocation satisfies one of the traversal rules
– Traversed methods will be a FCG
m2role: {wb}m2
role: {wb}
m5role: {wc}m5
role: {wc}
m6role: {…}m6
role: {…}m7m7 m8m8
NL sentence(a set of words) Source code
(a set of methods and their invocations)
m1role: {wa}
m1role: {wa}
m3role: {wc}m3
role: {wc}
m4role: {wb}m4
role: {wb}
Sa
Sb
34
Results
Improvement by using the ontology− Improved recall for 1st and 2nd cases− detected traceability for 3rd case
Domain ontology gives us valuable guidesfor sentence-to-code traceability recovery.
additional slides
39
System Overview
Inputs
Ordered functionalCall-graphs
Outputs
Sentence
SourceCode
Call-graph
DomainOntology
Prioritizing
FunctionalCall-graphs
Words in theSentence
Words in theCode
Traversingcall-graph
static analysis
Extracting Wordssplitting, stemming,
removing stop words...extractidentifiers
class Tool ....class OvalTool extends Tool ...class Drawer {
public void startDraw() {Tool tool =
Tool.getCurrent();tool.draw();
}}
Tooldraw()
OvalTooldraw()
40
Statically extracting invocations Consideration of overridden methods
Extracting Call Graph
Drawer#startDraw
Tool#draw
Drawer#startDraw
OvalTool#draw
extracted
From source code
From sentence
Normalization− Stemming− Removing stop words
Extractingidentifiers
41
Extracting Words
startsWith,startDraw, ...
{starts,with,
starts, draw, ...}
TokenizingcamelCases
Normalizingdraw a circle
{draw, a,circle}
{draw,oval}
{start,draw, ...}
42
Use of ontologyInput sentences
YesPrec.
Yes Recall
NoPrec.
NoRecall
1. "plain, filled and gradient filled rectangles" 0.83 0.94 1.00 0.192. "plain, filled and gradient filled ovals" 0.82 0.98 1.00 0.213. "image rotation" 1.00 0.35 0.00 0.004. "image scaling" 0.22 0.68 1.00 0.585. "save JPEGs of configurable quality" 0.40 1.00 0.67 1.006. "colour reduction" 0.74 0.95 0.74 0.957. "grayscaling" - - - -
Bad Results
Bad effects, no improvement− reason: use of external modules− reason: use of compound words (grayscale vs. grayScale)