Post on 01-Jan-2016
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Finding Code Clones for Refactoring with Clone Metrics : A Case Study of Open Source Software
Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano*
†Osaka University, Japan
‡Nara Institute of Science and Technology, Japan
*NEC Corporation, Japan
Contents
1. Background
2. Clone Metrics
3. Industrial Case Study
4. Case Study of Open Source Software
5. Summary and Future Work
Background: Code Clone
Identical or similar code fragments in source code
The presence of code clones is an indication of low software maintainability
  If a bug is found in one code clone, the other code clones have to be checked for the same defect.
Background: Refactoring [Fowler1999] (1/2)
Refactoring is the process of restructuring existing code.
  It alters the software's internal structure without changing its external behavior
  It improves the maintainability of the software
[Fowler1999] M. Fowler, et al., Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
Background: Refactoring [Fowler1999] (2/2)
Refactoring code clones
  Merge code clones into a single program unit
[Figure: refactoring replaces the code clones with call statements]
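As an illustration of merging code clones into a single program unit, the following Java sketch shows a clone set of two clones before and after an Extract Method style refactoring. The class and method names (`ExtractMethodSketch`, `appendHeader`, and so on) are hypothetical; they do not come from the slides or from any of the studied systems.

```java
// Hypothetical example; none of these names appear in the original slides.
final class ExtractMethodSketch {

    // Before refactoring: the three header statements are duplicated
    // in two methods, so they form a clone set with two code clones.
    static String invoiceBefore(String customer) {
        StringBuilder sb = new StringBuilder();
        sb.append("== ");
        sb.append(customer);
        sb.append(" ==\n");
        sb.append("invoice body");
        return sb.toString();
    }

    static String receiptBefore(String customer) {
        StringBuilder sb = new StringBuilder();
        sb.append("== ");
        sb.append(customer);
        sb.append(" ==\n");
        sb.append("receipt body");
        return sb.toString();
    }

    // After refactoring: the duplicated statements live in one program unit.
    static void appendHeader(StringBuilder sb, String customer) {
        sb.append("== ");
        sb.append(customer);
        sb.append(" ==\n");
    }

    static String invoiceAfter(String customer) {
        StringBuilder sb = new StringBuilder();
        appendHeader(sb, customer); // call statement replaces the clone
        sb.append("invoice body");
        return sb.toString();
    }

    static String receiptAfter(String customer) {
        StringBuilder sb = new StringBuilder();
        appendHeader(sb, customer); // call statement replaces the clone
        sb.append("receipt body");
        return sb.toString();
    }
}
```

The external behavior is unchanged: each "after" method returns the same string as its "before" counterpart, while a future fix to the header only has to be made in `appendHeader`.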
Background: Language-Dependent Code Clone
A code clone that unavoidably exists in source code because of the specification of the programming language used.
Example of a language-dependent code clone (consecutive setter invocations):
    replacement.setTaskType(taskType);
    replacement.setTaskName(taskName);
    replacement.setLocation(location);
    replacement.setOwningTarget(target);
    replacement.setRuntime(wrapper);
    wrapper.setProxy(replacement);
Background: Clone Set
A set of code clones
[Figure: Code Clone 1, Code Clone 2, and Code Clone 3 together form one clone set]
Background: Clone Metrics [Higo2007]
Quantitative information on clone sets
  E.g., LEN(S), RNR(S), POP(S)
Purposes
  To check features of code clones in software
  To extract code clones for several purposes
    E.g., the code clones with the greatest length…
[Higo2007] Y. Higo, T. Kamiya, S. Kusumoto, K. Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)
Clone Metrics: LEN(S)
The average length of the token sequences of the code clones in a clone set S
Example: the token sequence [a b b] is detected as a code clone, so LEN(S) = 3
[Figure: clone set S, whose code clones are occurrences of the token sequence a b b]
Clone Metrics: RNR(S)
The ratio of non-repeated token sequences of code clones in a clone set S
  Selecting clone sets with a high RNR value eliminates language-dependent code clones
$\mathrm{RNR}(S) = \frac{\text{length of non-repeated token sequence}}{\text{length of whole token sequence}} \times 100 = \frac{1}{3} \times 100 = 33.3$
[Figure: clone set S, whose code clones are occurrences of the token sequence a b b]
Clone Metrics: POP(S)
The number of code clones in a clone set S
Example: POP(S) = 3
[Figure: clone set S containing three code clones, numbered 1 to 3]
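The LEN(S) and POP(S) definitions can be sketched in a few lines of Java. The representation used here (a code clone as a list of token strings, a clone set as a list of code clones) and the class name `LenPopSketch` are assumptions made for illustration; the slides do not show how CCFinder stores its results.

```java
import java.util.List;

// Illustrative sketch; the data model is an assumption, not CCFinder's.
final class LenPopSketch {

    // LEN(S): average length of the token sequences of the code clones in S.
    static double len(List<List<String>> cloneSet) {
        return cloneSet.stream().mapToInt(List::size).average().orElse(0.0);
    }

    // POP(S): number of code clones in S.
    static int pop(List<List<String>> cloneSet) {
        return cloneSet.size();
    }

    public static void main(String[] args) {
        // The slide example: the token sequence [a b b] occurs three times.
        List<String> clone = List.of("a", "b", "b");
        List<List<String>> s = List.of(clone, clone, clone);
        System.out.println("LEN(S) = " + len(s));
        System.out.println("POP(S) = " + pop(s));
    }
}
```

For the slide example this yields a length of 3 and a population of 3, matching the LEN(S) = 3 and POP(S) = 3 values given on the slides.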
Single Clone Metric (1/3)
Clone sets whose LEN(S) is higher
  They include many consecutive if (or if-else) blocks that involve similar but different conditional expressions.
Code clone in a clone set whose POP(S) is the highest in Ant 1.7.0:
    if ((p = getProject().getProperty("ant.netrexxc.binary")) != null) {
        this.binary = Project.toBoolean(p);
    }
    // classpath makes no sense
    if ((p = getProject().getProperty("ant.netrexxc.comments")) != null) {
        this.comments = Project.toBoolean(p);
    }
    …………The last part is omitted……………………
Single Clone Metric (2/3)
Clone sets whose RNR(S) is higher
  They do not form a single semantic unit
    Semantic unit: a group of instructions that together implement a single functionality
Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 (only a part of a semantic unit):
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {
Single Clone Metric (3/3)
Clone sets whose POP(S) is higher
  They include many language-dependent code clones
Code clone in a clone set whose POP(S) is higher than others:
    out.println("\">");
    out.println("");
    out.print("<!ELEMENT project (target | ");
    out.print(TASKS);
    out.print(" | ");
    out.print(TYPES);
Key Idea
In our experience, a single clone metric is not sufficient for extracting code clones that are suitable for refactoring
We propose a method based on combined clone metrics
  To overcome the weakness of single-metric-based extraction
Combined Clone Metrics
Clone sets whose RNR(S) and POP(S) are higher
  Each code clone forms a single semantic unit
Code clone in a clone set whose RNR(S) and POP(S) are higher than others:
        if (ifProperty != null && p.getProperty(ifProperty) == null) {
            return false;
        } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) {
            return false;
        }
        return true;
    }
Appropriate for refactoring!
Industrial Case Study (1/2)
Goal: validating our key idea
  Using combined clone metrics is a feasible method for extracting code clones for refactoring
Target system
  Industrial Java software developed by NEC
  110 KLOC, 736 clone sets
Industrial Case Study (2/2)
Experimental steps
  1. Selected 62 clone sets from CCFinder's output using clone metrics.
  2. Conducted a survey about these clone sets and received feedback from a developer.
[Figure: source files → CCFinder → clone sets selected using clone metrics → survey → feedback]
Subject Code Clones (1/2)
Clone sets for which one of the clone metric values is high
  SLEN: clone sets whose LEN(S) value is in the top 10
  SRNR: clone sets whose RNR(S) value is in the top 10
  SPOP: clone sets whose POP(S) value is in the top 10
Subject Code Clones (2/2)
Clone sets whose combined clone metric values are high
  SLEN•RNR: 15 clone sets whose LEN(S) and RNR(S) values both rank in the top 15
  SLEN•POP: 7 clone sets whose LEN(S) and POP(S) values both rank in the top 15
  SRNR•POP: 18 clone sets whose RNR(S) and POP(S) values both rank in the top 15
  SLEN•RNR•POP: 1 clone set whose LEN(S), RNR(S), and POP(S) values all rank in the top 15
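One possible reading of the combined selection is sketched below in Java: take the clone sets that rank highest for each metric and keep only those that appear in both rankings. This is an assumption made for illustration. The slides do not state exactly how "rank in the top 15" is computed or how ties are handled, and the `CloneSet` class, its fields, and the tie-breaking by id are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

// Illustrative sketch under stated assumptions; not the authors' tool.
final class CombinedMetricsSketch {

    // Hypothetical holder for the metric values of one clone set.
    static final class CloneSet {
        final String id;
        final double len, rnr, pop;

        CloneSet(String id, double len, double rnr, double pop) {
            this.id = id;
            this.len = len;
            this.rnr = rnr;
            this.pop = pop;
        }
    }

    // Ids of the k clone sets with the highest value of one metric.
    // Ties are broken by id (an assumption).
    static Set<String> topK(List<CloneSet> all, ToDoubleFunction<CloneSet> metric, int k) {
        List<CloneSet> sorted = new ArrayList<>(all);
        Comparator<CloneSet> byMetricDesc = Comparator.comparingDouble(metric).reversed();
        sorted.sort(byMetricDesc.thenComparing(c -> c.id));
        Set<String> ids = new TreeSet<>();
        for (CloneSet c : sorted.subList(0, Math.min(k, sorted.size()))) {
            ids.add(c.id);
        }
        return ids;
    }

    // Clone sets that are in the top k for both metrics (set intersection).
    static Set<String> combined(List<CloneSet> all, ToDoubleFunction<CloneSet> first,
                                ToDoubleFunction<CloneSet> second, int k) {
        Set<String> result = topK(all, first, k);
        result.retainAll(topK(all, second, k));
        return result;
    }

    public static void main(String[] args) {
        // Made-up metric values, for demonstration only.
        List<CloneSet> all = List.of(
            new CloneSet("A", 10, 90, 5),
            new CloneSet("B", 50, 20, 2),
            new CloneSet("C", 40, 80, 9),
            new CloneSet("D", 5, 10, 1));
        System.out.println(combined(all, c -> c.rnr, c -> c.pop, 2)); // prints [A, C]
    }
}
```

With k = 15, `combined(all, c -> c.rnr, c -> c.pop, 15)` would correspond to SRNR•POP under this reading.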
In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[] Perform refactoring
[] Write comments about code clones, but don't perform refactoring.
[] Change nothing.
[] Others. ( )
In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[√] Perform refactoring
[] Write comments about code clones, but don't perform refactoring.
[] Change nothing.
[] Others. ( )
√ = Appropriate for refactoring
In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[] Perform refactoring
[√] Write comments about code clones, but don't perform refactoring.
[√] Change nothing.
[√] Others. ( )
√ = Inappropriate for refactoring
Results of Case Study (1/2)
#Selected Clone Sets: the number of selected clone sets
#Refactoring: the number of clone sets marked as "Perform refactoring" in the survey
| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 14 | 0.47 |
| Combined clone metrics | 41 | 34 | 0.87 |
Results of Case Study (2/2)
Precision: "How many refactoring candidates were accepted by a developer?"
$\mathrm{Precision} = \frac{\#\text{Refactoring}}{\#\text{Selected Clone Sets}}$
Clone sets selected with combined clone metrics were accepted as refactoring candidates by the developer more often than those selected with a single clone metric
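The precision defined on this slide is a simple ratio; the Java sketch below computes it for the single-metric row of the industrial case study (14 clone sets marked "Perform refactoring" out of 30 selected). The class and method names are hypothetical.

```java
import java.util.Locale;

// Illustrative sketch; names are not from the slides.
final class PrecisionSketch {

    // Precision = #Refactoring / #Selected Clone Sets
    static double precision(int refactoring, int selected) {
        if (selected <= 0) {
            throw new IllegalArgumentException("selected must be positive");
        }
        return (double) refactoring / selected;
    }

    public static void main(String[] args) {
        // Single-metric row of the industrial case study: 14 of 30.
        System.out.println(String.format(Locale.ROOT, "%.2f", precision(14, 30))); // prints 0.47
    }
}
```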
Case Study of Open Source Software
Goal: validating our key idea
  Using combined clone metrics is a feasible method for extracting code clones for refactoring
  Using open source software
Experimental steps
  1. Selected clone sets from CCFinder's output using clone metrics.
  2. Checked whether the clone sets are appropriate for refactoring.
Target Systems
Implemented in Java
  Apache Ant: 198 KLOC, 998 clone sets
  JBoss: 633 KLOC, 4284 clone sets
Subject Clone Sets
Subject clone sets
  Apache Ant: 87 clone sets
  JBoss: 299 clone sets
Clone sets for which one of the clone metric values is in the top 10
Clone sets whose combined clone metric values rank in the top 15
Subject Code Clones (Apache Ant)
| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 6 | 0.20 |
| Combined clone metrics | 60 | 31 | 0.53 |
Subject Code Clones (JBoss)
| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 9 | 0.30 |
| Combined clone metrics | 298 | 76 | 0.25 |
Q. Why do the results differ between the software systems?
  Because the open source software does not follow a coding rule?
Analysis of Results: Defects of the RNR Metric (1/2)
The RNR metric sometimes extracts unintended code clones
  E.g., language-dependent code clones
Analysis of Results: Defects of the RNR Metric (2/2)
    lIndex = lReturn.indexOf( "*" );
    while( lIndex >= 0 ) {
        lReturn = ( lIndex > 0 ? lReturn.substring( 0, lIndex ) : "" ) + "%2a" + ( ( lIndex + 1 ) < lReturn.length() ? lReturn.substring( lIndex + 1 ) : "" );
        lIndex = lReturn.indexOf( "*" );
    }
    lIndex = lReturn.indexOf( ":" );
    while( lIndex >= 0 ) {
        lReturn = ( lIndex > 0 ? lReturn.substring( 0, lIndex ) : "" ) + "%3a" + ( ( lIndex + 1 ) < lReturn.length() ? lReturn.substring( lIndex + 1 ) : "" );
        lIndex = lReturn.indexOf( ":" );
    }
Code clone in a clone set whose LEN(S) and RNR(S) (= 96) values rank in the top 15 in JBoss
The value of RNR is really 96?
RNR value of this clone set: RNR(S) = 50
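The two while loops in the JBoss code clone above differ only in the character searched for and the escape string, so one way they could be merged is sketched below in Java. The helper name `replaceEach` and the class name are hypothetical, and the sketch is not taken from JBoss. It keeps the original loop, which assumes a one-character search string and a replacement that does not contain that character.

```java
// Illustrative sketch; not the JBoss implementation.
final class EscapeSketch {

    // Replaces every occurrence of a one-character target in s.
    // Same loop as in the clone; it terminates only if the replacement
    // does not contain the target.
    static String replaceEach(String s, String target, String replacement) {
        int index = s.indexOf(target);
        while (index >= 0) {
            s = (index > 0 ? s.substring(0, index) : "")
                + replacement
                + ((index + 1) < s.length() ? s.substring(index + 1) : "");
            index = s.indexOf(target);
        }
        return s;
    }

    // The two duplicated loops become two calls to the merged unit.
    static String escape(String s) {
        s = replaceEach(s, "*", "%2a");
        return replaceEach(s, ":", "%3a");
    }
}
```

For these inputs the standard `String.replace(CharSequence, CharSequence)` method gives the same result, so `s.replace("*", "%2a").replace(":", "%3a")` would be an even shorter alternative.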
Summary and Future Work
Summary
  We conducted case studies to validate our key idea and discussed the results
Future work
  Update the metrics used
  Investigate recall
  Use more metrics
  Conduct case studies of open source software
Thank You for Your Attention!
Example of a clone set that is not selected…
It is too short to form a semantic unit.
The RNR metric sometimes extracts unintended code clones
  E.g., language-dependent code clones
    boolean isEqual(final DeweyDecimal other) {
        final int max = Math.max(other.components.length, components.length);
        for (int i = 0; i < max; i++) {
            final int component1 = (i < components.length) ? components[ i ] : 0;
            final int component2 = (i < other.components.length) ? other.components[ i ] : 0;
            if (
Clone sets whose RNR(S) is higher than others
Each code clone in a clone set S consists of more non-repeated token sequences
    /* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {
        /* … */
Clone sets whose RNR(S) is lower than others
They consist of more repeated token sequences
They tend to be language-dependent code clones (here, consecutive variable declarations)
    /* Code Clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */
    String sosCmdDir = null;
    …… skip code….
    private String filename = null;
    private boolean noCompress = false;
    private boolean noCache = false;
    private boolean recursive = false;
    private boolean verbose = false;
    /* … */
Clone metric: RNR(S) (1/2)
Files:
  F1: a b c a b
  F2: c c* c* a b
  F3: d a b e f
  F4: c c* d e f
The superscript * indicates that the token is in a repeated token sequence
Clone set S1: { ab, ab, ab, ab }
RNR(S1) of clone set S1 is
$\mathrm{RNR}(S_1) = \frac{2 + 2 + 2 + 2}{2 + 2 + 2 + 2} \times 100 = 100$
Clone metric: RNR(S) (2/2)
Files:
  F1: a b c a b
  F2: c c* c* a b
  F3: d a b e f
  F4: c c* d e f
The superscript * indicates that the token is in a repeated token sequence
Clone set S2: { c c*, c* c*, c c* }
RNR(S2) of clone set S2 is
$\mathrm{RNR}(S_2) = \frac{1 + 0 + 1}{2 + 2 + 2} \times 100 = 33.3$
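The two RNR calculations above can be reproduced with a short Java sketch. It assumes that each code clone is given as a list of flags, one per token, where `true` marks a token that is in a repeated token sequence (the tokens marked * on the slides). How CCFinder decides which tokens are repeated is outside this sketch, and the class name is hypothetical.

```java
import java.util.List;

// Illustrative sketch; the input representation is an assumption.
final class RnrSketch {

    // RNR(S) = 100 * (non-repeated tokens in all clones) / (all tokens in all clones)
    static double rnr(List<List<Boolean>> cloneSet) {
        int nonRepeated = 0;
        int total = 0;
        for (List<Boolean> clone : cloneSet) {
            for (boolean repeated : clone) {
                if (!repeated) {
                    nonRepeated++;
                }
                total++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("clone set has no tokens");
        }
        return 100.0 * nonRepeated / total;
    }

    public static void main(String[] args) {
        // S1: four occurrences of "a b", no repeated tokens.
        List<Boolean> ab = List.of(false, false);
        System.out.println(rnr(List.of(ab, ab, ab, ab)));
        // S2: { c c*, c* c*, c c* }
        System.out.println(rnr(List.of(
            List.of(false, true), List.of(true, true), List.of(false, true))));
    }
}
```

For S1 every token is non-repeated, giving 100; for S2 only 2 of the 6 tokens are non-repeated, giving roughly 33.3, in line with the slide values.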
The Number of Duplicate Clone Sets (Industrial)
|SRNR ∩ SPOP ∩ SRNR•POP| = 1
|SRNR ∩ SRNR•POP| = 2
|SPOP ∩ SRNR•POP| = 2
|SLEN•RNR ∩ SLEN•POP ∩ SRNR•POP ∩ SLEN•RNR•POP| = 1
CS Seminar, 2010/12/01
The Number of Duplicate Clone Sets (Apache Ant)
|SRNR ∩ SRNR•POP| = 1
|SPOP ∩ SRNR•POP| = 1
|SPOP ∩ SLEN•POP| = 1
The Number of Duplicate Clone Sets (JBoss)
|SRNR ∩ SLEN•RNR| = 3
|SRNR ∩ SRNR•POP| = 1
|SLEN•RNR ∩ SLEN•POP ∩ SRNR•POP ∩ SLEN•RNR•POP| = 2
Clone Set Metrics
LEN(C): length of the token sequence of each element in clone set C
POP(C): number of elements in clone set C
RAD(C): distribution in the file system of the elements in clone set C
DFL(C): estimate of how many tokens would be removed from the source files if all code fragments of clone set C were replaced with caller statements of a new identical routine
[Figure: code fragments replaced by caller statements to a new subroutine]
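The DFL(C) description can be read as a before/after token count, sketched below in Java. The formula is an assumption derived only from the wording on this slide, not necessarily the exact definition used by the authors: before refactoring there are POP(C) fragments of LEN(C) tokens each, and afterwards there are POP(C) caller statements plus one new routine of LEN(C) tokens. The size of a caller statement (`callerTokens`) is a hypothetical parameter, and the declaration overhead of the new routine is ignored.

```java
// Illustrative sketch under stated assumptions; not the published DFL formula.
final class DflSketch {

    // Estimated number of tokens removed when every fragment of the clone
    // set is replaced by a caller statement of one new routine.
    static int dfl(int len, int pop, int callerTokens) {
        int before = len * pop;               // pop fragments of len tokens
        int after = callerTokens * pop + len; // pop callers plus one routine body
        return before - after;
    }

    public static void main(String[] args) {
        // Made-up values: 3 clones of 20 tokens, caller statement of 5 tokens.
        System.out.println(dfl(20, 3, 5)); // prints 25
    }
}
```

Under this reading, long clones with many occurrences give the largest estimated reduction, while a clone set with only short fragments can even yield a negative value.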
Results and Precision of Each Group of Clone Sets in the Survey
| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Clone sets whose LEN(S) value is in the top 10 | 10 | 7 | 0.70 |
| Clone sets whose RNR(S) value is in the top 10 | 10 | 4 | 0.40 |
| Clone sets whose POP(S) value is in the top 10 | 10 | 3 | 0.30 |
| Clone sets whose LEN(S) and RNR(S) values rank in the top 15 | 15 | 13 | 0.87 |
| Clone sets whose LEN(S) and POP(S) values rank in the top 15 | 7 | 6 | 0.86 |
| Clone sets whose RNR(S) and POP(S) values rank in the top 15 | 18 | 14 | 0.78 |
| Clone set whose LEN(S), RNR(S), and POP(S) values rank in the top 15 | 1 | 1 | 1.00 |
Subject Code Clones (Apache Ant)
| Clone Sets | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| SLEN | 10 | 0 | 0.00 |
| SRNR | 10 | 6 | 0.60 |
| SPOP | 10 | 0 | 0.00 |
| SLEN•RNR | 8 | 6 | 0.75 |
| SLEN•POP | 18 | 9 | 0.50 |
| SRNR•POP | 34 | 16 | 0.47 |
| SLEN•RNR•POP | - | - | - |
Subject Code Clones (JBoss)
| Clone Sets | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| SLEN | 10 | 2 | 0.20 |
| SRNR | 10 | 7 | 0.60 |
| SPOP | 10 | 0 | 0.00 |
| SLEN•RNR | 63 | 37 | 0.59 |
| SLEN•POP | 104 | 5 | 0.05 |
| SRNR•POP | 129 | 32 | 0.25 |
| SLEN•RNR•POP | 2 | 2 | 1.00 |