Sentence-to-Code Traceability Recovery with Domain Ontologies

Shinpei Hayashi, Takashi Yoshikawa, Motoshi Saeki
Tokyo Institute of Technology, Japan


DESCRIPTION

Presented at APSEC 2010 http://dx.doi.org/10.1109/APSEC.2010.51

TRANSCRIPT

Page 1: Sentence-to-Code Traceability Recovery with Domain Ontologies

Shinpei Hayashi, Takashi Yoshikawa, Motoshi Saeki (Tokyo Institute of Technology, Japan)

Sentence-to-Code Traceability Recovery with Domain Ontologies

Page 2: Sentence-to-Code Traceability Recovery with Domain Ontologies

[Overview figure: recovering traceability links between an NL sentence and source code]

Results: It works well!
− Implemented an automated tool
− Performed a case study using JDraw
  • Recovered traceability between 7 sentences and code
  • Obtained more accurate results than the cases without using the ontology

Page 3: Sentence-to-Code Traceability Recovery with Domain Ontologies

Background

Documentation-to-code traceability links are important
− For reducing maintenance costs
− For software reuse & extension

Our focus: recovering sentence-to-code traceability
− Some software products have only documentation consisting of simple sentences, without any detailed description (e.g., due to agile/bazaar-style processes)

Page 4: Sentence-to-Code Traceability Recovery with Domain Ontologies

Example of Sentence

JDraw (http://jdraw.sf.net/)

Page 5: Sentence-to-Code Traceability Recovery with Domain Ontologies

Example of Sentence

JDraw (http://jdraw.sf.net/)

Page 6: Sentence-to-Code Traceability Recovery with Domain Ontologies

Example of Sentence

JDraw (http://jdraw.sf.net/)

"plain, filled and gradient filled rectangles"

Just a single sentence, no detailed documents

Page 7: Sentence-to-Code Traceability Recovery with Domain Ontologies

Aim

To detect the set of methods related to the input sentence

NL sentence (a set of words): "Users can draw a plain oval."

Source code (a set of methods): drawOval(), getColor(), OvalTool(), DrawPanel(), writeLog(), getCanvas(), setPixel(), getColorPallete()
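The aim can be pictured as mapping a bag of sentence words to a subset of the program's methods. Below is a minimal sketch of that data in Java; the type and field names (Sentence, Method, TraceabilityLink) are illustrative, not the authors' tool.

    import java.util.Set;

    // Hypothetical data types for the traceability-recovery problem:
    // the input is a sentence reduced to a set of words, the output is
    // the subset of methods judged relevant to that sentence.
    record Sentence(Set<String> words) {}                 // e.g., {"draw", "oval"}
    record Method(String name, Set<String> words) {}      // e.g., drawOval -> {"draw", "oval"}
    record TraceabilityLink(Sentence sentence, Set<Method> methods) {}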

Page 8: Sentence-to-Code Traceability Recovery with Domain Ontologies

Challenge

How to find the correct set?
− Word similarity: leads to false positives/negatives

[Figure: the sentence "Users can draw a plain oval." matched against the methods drawOval(), getColor(), OvalTool(), DrawPanel(), writeLog(), getCanvas(), setPixel(), getColorPallete(); word similarity yields both false positives and false negatives, e.g.:]

    void setPixel(...) {
        ... draw ...
    }

Page 9: Sentence-to-Code Traceability Recovery with Domain Ontologies

Challenge

How to find the correct set?
− Word similarity: leads to false positives/negatives
− Method invocation: leads to false positives

[Figure: following method invocations from drawOval() also reaches methods unrelated to the sentence (false positives).]

Page 10: Sentence-to-Code Traceability Recovery with Domain Ontologies

Challenge

Another criterion is required
− To judge whether a method invocation is needed
− Considering the problem domain

[Figure: among the methods reached from drawOval(), some invocations are important for the sentence "Users can draw a plain oval." and others are not.]

Page 11: Sentence-to-Code Traceability Recovery with Domain Ontologies

Domain Ontology

Formally representing the knowledge of the target problem domain
− As relationships between concepts (words)

An ontology for painting tools (excerpt): concepts "canvas", "draw", "oval", "color"
− The concept "canvas" is a possible target to "draw".
− The function "draw" concerns the concept "color".
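As a rough illustration (not the authors' data model), an ontology of this kind can be stored as labeled relationships between concept words. The relationship labels ("target", "concern") and the draw-oval link below are assumptions based on the excerpt above.

    import java.util.Set;

    // Minimal sketch: an ontology as labeled concept-to-concept relationships.
    record Relationship(String from, String label, String to) {}

    class PaintingOntology {
        // Excerpt corresponding to the slide: canvas / draw / oval / color.
        static final Set<Relationship> RELATIONSHIPS = Set.of(
            new Relationship("draw", "target", "canvas"),   // "canvas" is a possible target to "draw"
            new Relationship("draw", "concern", "color"),   // "draw" concerns the concept "color"
            new Relationship("draw", "target", "oval")      // assumption: "oval" is something one can draw
        );

        // True if some relationship connects the two words, in either direction.
        static boolean related(String a, String b) {
            return RELATIONSHIPS.stream().anyMatch(r ->
                (r.from().equals(a) && r.to().equals(b)) ||
                (r.from().equals(b) && r.to().equals(a)));
        }
    }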

Page 12: Sentence-to-Code Traceability Recovery with Domain Ontologies

Solution

Choosing method invocations using the domain ontology

[Figure: the sentence "Users can draw a plain oval.", the source code (a set of methods and their invocations: drawOval(), getColor(), OvalTool(), DrawPanel(), writeLog(), getCanvas(), setPixel(), getColorPallete()), and the ontology (canvas, draw, oval, color) guiding which invocations to follow.]

Page 13: Sentence-to-Code Traceability Recovery with Domain Ontologies

System Overview

Inputs: Sentence, Source Code, Domain Ontology
Outputs: Functional Call-graphs (FCGs) ranked by score (1st: score 100, 2nd: score 85, ...)

Page 14: Sentence-to-Code Traceability Recovery with Domain Ontologies

Procedure

1. Extracting call-graph
2. Extracting words
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs

[Figure: the NL sentence (a set of words, e.g., "... draw oval") and the source code (a set of methods m1..m8 and their invocations).]

Page 15: Sentence-to-Code Traceability Recovery with Domain Ontologies

Procedure

1. Extracting call-graph: by static analysis
2. Extracting words
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs

Page 16: Sentence-to-Code Traceability Recovery with Domain Ontologies

Procedure

1. Extracting call-graph
2. Extracting words: stemming, removing stopwords
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs

[Figure: each method m1..m8 is annotated with its extracted word set, e.g., {..., draw, oval}, {..., draw}, {..., oval}, {..., color}, {..., pixel}, {..., log}, {..., scale}.]

Page 17: Sentence-to-Code Traceability Recovery with Domain Ontologies

Procedure

1. Extracting call-graph
2. Extracting words
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs

[Figure: two FCGs, Sa and Sb, extracted from the call-graph.]

Page 18: Sentence-to-Code Traceability Recovery with Domain Ontologies

Procedure

1. Extracting call-graph
2. Extracting words
3. Extracting functional call-graphs (FCGs)
4. Prioritizing FCGs

[Figure: the FCGs Sa and Sb, prioritized with scores 100 and 85.]
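To make the four-step procedure concrete, here is a skeletal pipeline in Java. It is only an outline under assumed types (SourceCode, CallGraph, Fcg, Ontology and the step methods are placeholders, not the tool's real API); the following slides and sketches fill in the individual steps.

    import java.util.List;
    import java.util.Set;

    // Skeleton of the four-step procedure (all types and helpers are hypothetical).
    class TraceabilityPipeline {
        List<Fcg> recover(String sentence, SourceCode code, Ontology ontology) {
            CallGraph cg = extractCallGraph(code);              // 1. static analysis
            Set<String> words = extractWords(sentence);         // 2. stemming, stop-word removal
            List<Fcg> fcgs = extractFcgs(cg, words, ontology);  // 3. root selection + traversal
            return prioritize(fcgs, words, ontology);           // 4. weighting scheme
        }

        // Placeholders for the individual steps sketched elsewhere in this transcript.
        CallGraph extractCallGraph(SourceCode code) { throw new UnsupportedOperationException(); }
        Set<String> extractWords(String sentence) { throw new UnsupportedOperationException(); }
        List<Fcg> extractFcgs(CallGraph cg, Set<String> words, Ontology o) { throw new UnsupportedOperationException(); }
        List<Fcg> prioritize(List<Fcg> fcgs, Set<String> words, Ontology o) { throw new UnsupportedOperationException(); }
    }

    // Opaque placeholder types.
    class SourceCode {}
    class CallGraph {}
    class Fcg {}
    class Ontology {}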

Page 19: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
2. Traversal

[Figure: the NL sentence (word set {wa, wb}) and the call-graph of methods m1..m8.]

Page 20: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
− Choose the methods having the words in the input sentence
− These words are tagged as roles
2. Traversal

[Figure: for the sentence {wa, wb}, methods m1 (words {wa, ...}, role {wa}), m2 (words {wb, ...}, role {wb}), and m4 (words {wb, ...}, role {wb}) are selected as roots.]
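A minimal sketch of root selection in Java (the Method record and word sets are assumptions, not the tool's implementation): a method becomes a root if its identifier words intersect the sentence words, and the shared words become its roles.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    record Method(String name, Set<String> words) {}

    class RootSelection {
        // For each method, keep the sentence words appearing in its name as "roles";
        // methods with a non-empty role set are the traversal roots.
        static Map<Method, Set<String>> selectRoots(Set<Method> methods, Set<String> sentenceWords) {
            Map<Method, Set<String>> roles = new HashMap<>();
            for (Method m : methods) {
                Set<String> shared = new HashSet<>(m.words());
                shared.retainAll(sentenceWords);
                if (!shared.isEmpty()) {
                    roles.put(m, shared);   // e.g., drawOval -> {draw, oval}
                }
            }
            return roles;
        }
    }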

Page 21: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
2. Traversal
− Traverse method invocations from the roots
− Forwards, if the invocation satisfies one of the traversal rules

[Figure: traversal starts from the roots m1 (role {wa}), m2 (role {wb}), and m4 (role {wb}) in the call-graph of m1..m8.]

Page 22: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
2. Traversal
− Traverse method invocations from the roots if the invocation satisfies one of the traversal rules

Rule #1 (Sentence-based): holds if the callee method has a word in the given sentence.

[Figure: an invocation is followed because the callee's word set {..., wb} contains wb from the sentence {wa, wb}.]

Page 23: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
2. Traversal
− Traverse method invocations from the roots if the invocation satisfies one of the traversal rules

Rule #2 (Ontology-based): holds if the method invocation matches the relationships in the given ontology.

[Figure: the invocation of m3 (words {..., wc}) is followed, and m3 is tagged with role {wc}.]

Page 24: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
2. Traversal
− Traverse method invocations from the roots if the invocation satisfies one of the traversal rules

Rule #3 (Inheritance): holds if the callee method has a word in the roles of the caller method.

[Figure: the invocation of m5 (words {..., wc}) from m3 (role {wc}) is followed, and m5 inherits role {wc}.]
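Putting the three rules together, an invocation from a caller to a callee is followed if any of them holds. The sketch below expresses them as boolean checks in Java; the representation of roles, word sets, and the ontology's related() check are my assumptions, not the authors' code.

    import java.util.Set;

    // Hypothetical checks for the three traversal rules: the invocation
    // caller -> callee is followed if at least one of them holds.
    class TraversalRules {
        interface Ontology { boolean related(String wordA, String wordB); }

        // Rule #1 (Sentence-based): the callee method has a word in the given sentence.
        static boolean sentenceBased(Set<String> calleeWords, Set<String> sentenceWords) {
            return calleeWords.stream().anyMatch(sentenceWords::contains);
        }

        // Rule #2 (Ontology-based): a caller role and a callee word are related in the ontology.
        static boolean ontologyBased(Set<String> callerRoles, Set<String> calleeWords, Ontology ontology) {
            return callerRoles.stream().anyMatch(a ->
                   calleeWords.stream().anyMatch(b -> ontology.related(a, b)));
        }

        // Rule #3 (Inheritance): the callee method has a word among the caller's roles.
        static boolean inheritance(Set<String> calleeWords, Set<String> callerRoles) {
            return calleeWords.stream().anyMatch(callerRoles::contains);
        }

        static boolean shouldTraverse(Set<String> callerRoles, Set<String> calleeWords,
                                      Set<String> sentenceWords, Ontology ontology) {
            return sentenceBased(calleeWords, sentenceWords)
                || ontologyBased(callerRoles, calleeWords, ontology)
                || inheritance(calleeWords, callerRoles);
        }
    }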

Page 25: Sentence-to-Code Traceability Recovery with Domain Ontologies

Ontology-Based Rule

Holds if the invocation matches the relationships in the given ontology.

[Example: the ontology relates "draw" and "canvas"; the caller drawOval() (role: {draw, oval}) invokes the callee getCanvas() (role: {canvas}), so the invocation is followed.]

Page 26: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting FCGs

1. Root selection
2. Traversal
− Traverse method invocations from the roots
− Traversed methods will be an FCG

[Figure: the traversed methods (m1 role {wa}, m2 role {wb}, m3 role {wc}, m4 role {wb}, m5 role {wc}, m6, ...) form two FCGs, Sa and Sb.]
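A sketch of the traversal itself, under assumed representations (the call-graph as a name-to-callees map is mine, not the tool's): starting from a root, invocations are followed while a traversal rule holds, and the visited methods form one FCG.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical FCG extraction by traversing the call-graph from one root.
    class FcgExtraction {
        interface Rules {
            // True if the invocation caller -> callee satisfies at least one traversal rule.
            boolean shouldTraverse(String caller, String callee);
        }

        // callGraph: method name -> names of the methods it invokes.
        static Set<String> extractFcg(String root, Map<String, Set<String>> callGraph, Rules rules) {
            Set<String> fcg = new HashSet<>();
            Deque<String> worklist = new ArrayDeque<>();
            worklist.add(root);
            while (!worklist.isEmpty()) {
                String caller = worklist.remove();
                if (!fcg.add(caller)) continue;                        // already visited
                for (String callee : callGraph.getOrDefault(caller, Set.of())) {
                    if (rules.shouldTraverse(caller, callee)) {
                        worklist.add(callee);                          // follow this invocation
                    }
                }
            }
            return fcg;   // the traversed methods form one FCG
        }
    }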

Page 27: Sentence-to-Code Traceability Recovery with Domain Ontologies

Prioritizing FCGs

Using a weighting scheme. Criteria: we prioritize FCGs which
1. include methods having important roles as their names,
2. include method invocations matching the relationships in the ontology and/or the sentence, and
3. cover many words in the input sentence.
(An illustrative scoring sketch follows below.)
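The slide does not give the actual weighting scheme, so the following is only an illustrative scoring function reflecting the three criteria (role importance, matched invocations, sentence-word coverage); the weights are arbitrary placeholders.

    import java.util.Set;

    // Illustrative FCG scoring combining the three prioritization criteria.
    // The weights (3.0, 2.0, 5.0) are arbitrary, not the paper's values.
    class FcgScoring {
        static double score(int methodsWithImportantRoles,   // criterion 1
                            int matchedInvocations,          // criterion 2 (ontology/sentence matches)
                            Set<String> coveredWords,        // criterion 3
                            Set<String> sentenceWords) {
            double coverage = sentenceWords.isEmpty()
                    ? 0.0
                    : (double) coveredWords.size() / sentenceWords.size();
            return 3.0 * methodsWithImportantRoles
                 + 2.0 * matchedInvocations
                 + 5.0 * coverage;
        }
    }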

Page 28: Sentence-to-Code Traceability Recovery with Domain Ontologies

Case Study

Evaluation target: JDraw 1.1.5
− We picked up 7 sentences from JDraw's manual on the Web
− Prepared an ontology for painting tools (including 38 concepts and 45 relationships)
− Prepared control (answer) sets of methods by an expert

Evaluation Criteria
− Calculating precision and recall values by comparing the extracted sets with the control sets (a small sketch of the computation follows below)
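The evaluation compares each extracted method set against the expert's control set. Here is a self-contained sketch of how precision, recall, and the F-measure used on the next slides are computed; these are the standard definitions, not code from the study.

    import java.util.HashSet;
    import java.util.Set;

    // Precision/recall/F-measure of an extracted method set against a control (answer) set.
    class Evaluation {
        static double precision(Set<String> extracted, Set<String> control) {
            return extracted.isEmpty() ? 0.0 : (double) intersection(extracted, control).size() / extracted.size();
        }

        static double recall(Set<String> extracted, Set<String> control) {
            return control.isEmpty() ? 0.0 : (double) intersection(extracted, control).size() / control.size();
        }

        // F-measure = 2 / (precision^-1 + recall^-1), i.e., the harmonic mean.
        static double fMeasure(double precision, double recall) {
            return (precision == 0.0 || recall == 0.0) ? 0.0 : 2.0 * precision * recall / (precision + recall);
        }

        private static Set<String> intersection(Set<String> a, Set<String> b) {
            Set<String> result = new HashSet<>(a);
            result.retainAll(b);
            return result;
        }
    }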

Page 29: Sentence-to-Code Traceability Recovery with Domain Ontologies

Results

                                                    With ontology     Without ontology
Input sentence                                      Prec.   Recall    Prec.   Recall
1. "plain, filled and gradient filled rectangles"   0.83    0.94      1.00    0.19
2. "plain, filled and gradient filled ovals"        0.82    0.98      1.00    0.21
3. "image rotation"                                 1.00    0.35      0.00    0.00
4. "image scaling"                                  0.22    0.68      1.00    0.58
5. "save JPEGs of configurable quality"             0.40    1.00      0.67    1.00
6. "colour reduction"                               0.74    0.95      0.74    0.95
7. "grayscaling"                                    -       -         -       -

Picked up the FCGs having the highest F-measure from the results ranked in the top 5
− F-measure = 2 / (precision^-1 + recall^-1), i.e., the harmonic mean of precision and recall

Page 30: Sentence-to-Code Traceability Recovery with Domain Ontologies

Results

[Same table as on Page 29.]

Accurate results by using the ontology
− precision > 0.7 for 3 cases
− recall > 0.9 for 4 cases

Page 31: Sentence-to-Code Traceability Recovery with Domain Ontologies

Results

[Same table as on Page 29.]

Improvement by using the ontology
− Improved recall for the 1st and 2nd cases
− Detected traceability for the 3rd case

The domain ontology gives us valuable guides for sentence-to-code traceability recovery.

Page 32: Sentence-to-Code Traceability Recovery with Domain Ontologies

Future Work

Case study++
− Larger
− Another/multiple domains

Supporting ontology construction
Improving NLP techniques
Combination with other techniques
− Dynamic analysis
− Relevance feedback

Page 33: Sentence-to-Code Traceability Recovery with Domain Ontologies

[Summary: a recap of the example sentence from JDraw, the ontology-based solution, the FCG extraction procedure, and the results. Domain ontology gives us valuable guides for sentence-to-code traceability recovery.]

Page 34: Sentence-to-Code Traceability Recovery with Domain Ontologies

Additional Slides

Page 35: Sentence-to-Code Traceability Recovery with Domain Ontologies

System Overview

Inputs: Sentence, Source Code, Domain Ontology
Outputs: Ordered Functional Call-graphs

[Dataflow: extracting identifiers from the code and extracting words (splitting, stemming, removing stop words) yield the words in the sentence and the words in the code; static analysis yields the call-graph; traversing the call-graph with the domain ontology produces Functional Call-graphs, which are prioritized into the ordered output.]

Page 36: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting Call Graph

Statically extracting invocations; consideration of overridden methods.

    class Tool { ... }                    // declares draw()
    class OvalTool extends Tool { ... }   // overrides draw()
    class Drawer {
        public void startDraw() {
            Tool tool = Tool.getCurrent();
            tool.draw();
        }
    }

Extracted edges:
− Drawer#startDraw → Tool#draw
− Drawer#startDraw → OvalTool#draw (the overriding method is also considered)
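A rough sketch, in the spirit of the Drawer/Tool example above, of how a statically resolved call site can be expanded to overriding methods (class-hierarchy-analysis style). The type-hierarchy maps and naming are my assumptions, not the tool's implementation.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // For a call site "tool.draw()" with static receiver type Tool, add edges to
    // Tool#draw and to draw() in every subtype that overrides it.
    class CallGraphEdges {
        // subtypes: class name -> names of its (transitive) subclasses, e.g., Tool -> {OvalTool}.
        static Set<String> targets(String receiverType, String methodName,
                                   Map<String, Set<String>> subtypes,
                                   Map<String, Set<String>> declaredMethods) {
            Set<String> edges = new HashSet<>();
            if (declaredMethods.getOrDefault(receiverType, Set.of()).contains(methodName)) {
                edges.add(receiverType + "#" + methodName);                   // e.g., Tool#draw
            }
            for (String sub : subtypes.getOrDefault(receiverType, Set.of())) {
                if (declaredMethods.getOrDefault(sub, Set.of()).contains(methodName)) {
                    edges.add(sub + "#" + methodName);                        // e.g., OvalTool#draw
                }
            }
            return edges;
        }
    }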

Page 37: Sentence-to-Code Traceability Recovery with Domain Ontologies

Extracting Words

From source code
− Extracting identifiers: startsWith, startDraw, ...
− Tokenizing camelCases: {starts, with, start, draw, ...}
− Normalization (stemming, removing stop words): {start, draw, ...}

From sentence
− Tokenizing: "draw a circle" → {draw, a, circle}
− Normalization (stemming, removing stop words), e.g., {draw, oval}
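A compact sketch of this word-extraction step: camelCase splitting, stop-word removal, and a deliberately naive suffix-stripping stemmer standing in for a real one (the stop-word list and stemming rule are placeholders).

    import java.util.LinkedHashSet;
    import java.util.Set;

    // Hypothetical word extraction: tokenize camelCase identifiers / sentences,
    // drop stop words, and crudely stem the rest.
    class WordExtraction {
        static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "can", "of", "and");

        static Set<String> extract(String text) {
            Set<String> words = new LinkedHashSet<>();
            // Split on camelCase boundaries and non-letters: "startDraw" -> ["start", "Draw"].
            for (String token : text.split("(?<=[a-z])(?=[A-Z])|[^A-Za-z]+")) {
                String word = token.toLowerCase();
                if (word.isEmpty() || STOP_WORDS.contains(word)) continue;
                words.add(stem(word));
            }
            return words;
        }

        // Naive stand-in for a real stemmer: strip a trailing "s".
        static String stem(String word) {
            return word.endsWith("s") && word.length() > 3 ? word.substring(0, word.length() - 1) : word;
        }
    }

With these assumptions, extract("startsWith") yields {start, with}, and extract("Users can draw a plain oval.") yields {user, draw, plain, oval}.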

Page 38: Sentence-to-Code Traceability Recovery with Domain Ontologies

Bad Results

[Same table as on Page 29.]

Bad effects or no improvement for some cases
− Reason: use of external modules
− Reason: use of compound words (grayscale vs. grayScale)