MSR 2011 Talk Slides (Transcript)
Comparative Study
Retrieval from Software Libraries for Bug Localization: A Comparative Study of Generic and Composite Text Models
Shivani Rao and Avinash Kak
School of ECE, Purdue University
May 21, 2011, MSR, Hawaii
Mining Software Repositories, Hawaii, 2011
Outline
1 Bug localization
2 IR(Information Retrieval)-based bug localization
3 Text Models
4 Preprocessing of the source files
5 Evaluation Metrics
6 Results
7 Conclusion
Bug localization
Bug localization means locating the files, methods, classes, etc., that are directly related to the problem causing abnormal execution behavior of the software.
IR-based bug localization means locating a bug from its textual description.
Background
A typical bug localization process
A typical bug report: JEdit
Past work on IR-based bug localization
Authors/Paper       Model            Software dataset
Marcus et al. [1]   VSM              JEdit
Cleary et al. [2]   LM, LSA and CA   Eclipse JDT
Lukins et al. [3]   LDA              Mozilla, Eclipse, Rhino and JEdit
Drawbacks
1 None of the reported work has been evaluated on a standard dataset.
2 Inability to compare with static and dynamic techniques.
3 The number of bugs studied is only on the order of 5-30.
iBUGS
Created by Dallmeier and Zimmermann [4], iBUGS contains a large number of real bugs with corresponding test suites in order to generate failing and passing test runs.
ASPECTJ software
Software Library Size (Number of files)   6546
Lines of Code                             75 KLOC
Vocabulary Size                           7553
Number of bugs                            291
Table: The iBUGS dataset after preprocessing
A typical bug report in the iBUGS repository
Text Models
Text models
VSM : Vector Space Model
LSA : Latent Semantic Analysis Model
UM : Unigram Model
LDA : Latent Dirichlet Allocation Model
CBDM : Cluster-Based Document Model
Vector Space Model
If V is the vocabulary, then queries and documents are |V|-dimensional vectors.

sim(q, d_m) = (w_q · w_m) / (|w_q| |w_m|)

A sparse yet high-dimensional space.
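The VSM scoring above can be sketched as follows; a minimal sketch over raw term frequencies (a real system would add tf-idf weighting and the preprocessing described later):

```python
import math
from collections import Counter

def cosine_similarity(query_terms, doc_terms):
    """Cosine similarity between term-frequency vectors, as in the
    VSM formula: sim(q, d_m) = (w_q . w_m) / (|w_q| |w_m|)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

# Identical term distributions score 1.0; disjoint ones score 0.0.
print(cosine_similarity(["null", "pointer"], ["null", "pointer"]))  # 1.0
print(cosine_similarity(["null"], ["parser"]))                      # 0.0
```

In practice the document vectors are precomputed once for the whole library and only the query vector is built at retrieval time.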
Latent semantic analysis: Eigen decomposition
A = U Σ V^T
LSA based models
Topic-based representation: w̃_K(m), a K-dimensional vector in the eigenspace that represents the mth document w_m:

w̃_K(m) = Σ_K^{-1} U_K^T w_m
q_K = Σ_K^{-1} U_K^T q

sim(q, d_m) = (q_K · w̃_K(m)) / (|q_K| |w̃_K(m)|)

LSA2: Fold the K-dimensional representation back into a smoothed |V|-dimensional representation, Â = U_K Σ_K V_K^T, and compare directly with the query q.

Combined representation: combines the LSA2 representation with the VSM representation using the mixture parameter λ: A_combined = λ A + (1 − λ) Â
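Under the definitions above, the eigenspace projection can be sketched with a truncated SVD. The matrix A and query q here are toy stand-ins, not data from the study:

```python
import numpy as np

# Toy term-document matrix A (|V| = 4 terms, M = 3 documents).
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.],
              [0., 1., 2.]])

K = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_K = U[:, :K]
S_K_inv = np.diag(1.0 / s[:K])

def project(vec):
    """Map a |V|-dimensional vector into the K-dimensional eigenspace:
    w_K = Sigma_K^{-1} U_K^T w."""
    return S_K_inv @ U_K.T @ vec

q = np.array([1., 0., 1., 0.])   # hypothetical query vector
q_K = project(q)
doc_K = project(A[:, 0])         # first document in the eigenspace

# Cosine similarity in the reduced space, as on the slide.
sim = (q_K @ doc_K) / (np.linalg.norm(q_K) * np.linalg.norm(doc_K))
print(q_K.shape, float(sim))
```

The LSA2 variant would instead reconstruct Â = U_K Σ_K V_K^T and score against its smoothed |V|-dimensional columns directly.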
Unigram model to represent documents using probability distributions [5]
The term frequencies in a document are taken to be its probability distribution.
The term frequencies in a query become the query's probability distribution.
Similarities are established by comparing the probability distributions using KL divergence.
For smoothing, we mix in the probability distribution over the entire source library:

p_uni(w|D_m) = µ · c(w, d_m)/|d_m| + (1 − µ) · (Σ_{m=1}^{|D|} c(w, d_m)) / (Σ_{m=1}^{|D|} |d_m|)

p_uni(w|q) = µ · c(w, q)/|q| + (1 − µ) · (Σ_{m=1}^{|D|} c(w, d_m)) / (Σ_{m=1}^{|D|} |d_m|)
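The smoothed unigram scoring above can be sketched as follows. The tiny corpus and µ = 0.5 (the mixture weight reported later in the results) are for illustration only:

```python
import math
from collections import Counter

def unigram_dist(tokens, collection_counts, collection_len, mu=0.5):
    """Smoothed unigram distribution p_uni(w|d): the document's relative
    frequencies mixed with the whole-collection distribution via mu."""
    c, n = Counter(tokens), len(tokens)
    return {w: mu * c[w] / n + (1 - mu) * collection_counts[w] / collection_len
            for w in collection_counts}

def kl_divergence(p, q):
    """KL(p || q); both dicts over the same smoothed vocabulary, so q(w) > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

docs = [["parse", "error", "null"], ["null", "pointer", "error"]]
coll = Counter(t for d in docs for t in d)
coll_len = sum(len(d) for d in docs)

query = ["null", "pointer"]
p_q = unigram_dist(query, coll, coll_len)
# Rank documents by increasing KL divergence from the query distribution.
ranked = sorted(range(len(docs)),
                key=lambda m: kl_divergence(p_q, unigram_dist(docs[m], coll, coll_len)))
print(ranked[0])  # 1 -- the document sharing both query terms ranks first
```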
LDA: A mixture model to represent documents using topics/concepts [6]
LDA based models [7]
Topic-based representation: θ_m, a K-dimensional probability vector that indicates the topic proportions present in the mth document.

Maximum-likelihood representation: folds back into the |V|-dimensional term space:

p_lda(w|D_m) = Σ_{t=1}^{K} p(w|z = t) · p(z = t|D_m) = Σ_{t=1}^{K} φ(t, w) · θ_m(t)

Combined representation: combines the Unigram representation of a document with its MLE-LDA representation:

p_combined(w|D_m) = λ · p_lda(w|D_m) + (1 − λ) · p_uni(w|D_m)
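The fold-back and mixing steps can be sketched directly from the formulas. The two-topic φ, θ_m, and unigram probabilities below are hypothetical; λ = 0.9 matches the mixture weight reported in the results:

```python
def mle_lda_prob(phi, theta):
    """p_lda(w|D_m) = sum_t phi(t, w) * theta_m(t): fold the K-dimensional
    topic proportions back into the |V|-dimensional term space."""
    vocab = phi[0].keys()
    return {w: sum(phi[t][w] * theta[t] for t in range(len(theta))) for w in vocab}

# Hypothetical 2-topic model over a 2-word vocabulary.
phi = [{"null": 0.8, "pointer": 0.2},
       {"null": 0.3, "pointer": 0.7}]
theta = [0.5, 0.5]                       # topic proportions theta_m
p_lda = mle_lda_prob(phi, theta)

lam = 0.9                                # mixture weight from the results slide
p_uni = {"null": 0.5, "pointer": 0.5}    # hypothetical smoothed unigram probs
p_combined = {w: lam * p_lda[w] + (1 - lam) * p_uni[w] for w in p_uni}
print({w: round(v, 3) for w, v in p_combined.items()})
```

The combined distribution is then compared against the query's unigram distribution with KL divergence, exactly as in the pure unigram model.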
Cluster Based Document Model (CBDM) [8]
Cluster the documents into K clusters using deterministic algorithms such as K-means, hierarchical or agglomerative clustering.

Represent each cluster by a multinomial distribution over the terms in the vocabulary, commonly denoted p_ML(w|Cluster_j). The probability distribution for a word w in a document d_m ∈ Cluster_j is then:

p_cbdm(w|w_m) = λ_1 · w_m(w) / (Σ_{n=1}^{|V|} w_m(n)) + λ_2 · p_c(w) + λ_3 · p_ML(w|Cluster_j)   (1)
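Equation (1) can be sketched as a three-way mixture. The weights (0.81, 0.09, 0.1) are the best-performing row of the results table below; the collection and cluster models here are hypothetical placeholders:

```python
from collections import Counter

def cbdm_prob(w, doc_tokens, p_collection, p_cluster, lams=(0.81, 0.09, 0.1)):
    """Equation (1): lam1 * document MLE + lam2 * collection model
    + lam3 * cluster model p_ML(w|Cluster_j), with lam1+lam2+lam3 = 1."""
    lam1, lam2, lam3 = lams
    tf = Counter(doc_tokens)
    p_doc = tf[w] / len(doc_tokens)      # w_m(w) / sum_n w_m(n)
    return lam1 * p_doc + lam2 * p_collection[w] + lam3 * p_cluster[w]

doc = ["null", "pointer", "null"]
p_c = {"null": 0.4, "pointer": 0.2}      # hypothetical collection model p_c(w)
p_cl = {"null": 0.5, "pointer": 0.3}     # hypothetical cluster model p_ML(w|Cluster_j)
print(round(cbdm_prob("null", doc, p_c, p_cl), 4))
```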
Summary of Text Models used in the comparative study
Model     Representation                                  Similarity Metric
VSM       frequency vector                                Cosine similarity
LSA       K-dimensional vector in the eigenspace          Cosine similarity
Unigram   |V|-dimensional probability vector (smoothed)   KL divergence
LDA       K-dimensional probability vector                KL divergence
CBDM      |V|-dimensional combined probability vector     KL divergence or likelihood

Table: Generic models used in the comparative evaluation
Summary of Text Models used in the comparative study (cont.)
Model     Representation                                 Similarity Metric
LSA2      |V|-dimensional representation in term-space   Cosine similarity
MLE-LDA   |V|-dimensional MLE-LDA probability vector     KL divergence or likelihood

Table: The variations on two of the generic models used in the comparative evaluation
Summary of Text Models used in the comparative study (cont.)
Model           Representation                                        Similarity Metric
Unigram + LDA   |V|-dimensional combined probability vector           KL divergence or likelihood
VSM + LSA       |V|-dimensional combined VSM and LSA representation   Cosine similarity

Table: The two composite models used
Preprocessing of the source files
If a patch file does not exist in /trunk, it is searched for and added to the source library from the other branches/tags of ASPECTJ.
The source library consists of ".java" files only. After this step, our library ended up with 6546 Java files.
The repository.xml file documents all the information related to a bug: the BugID, the bug description, the relevant source files, and so on. We refer to this ground-truth information as relevance judgments.
Bugs that are documented in iBUGS but have no relevant source files in the library resulting from the previous step are eliminated. After this step, we are left with 291 bugs.
Preprocessing of the source files (cont.)
Hard-words, camel-case words, and soft-words are handled using popular identifier-splitting methods [9, 10].
The stop-list consists of the most commonly occurring words, e.g., "for," "else," "while," "int," "double," "long," "public," "void," etc. There are 375 such words in the iBUGS ASPECTJ software. We also drop all unicode strings from the vocabulary.
The vocabulary is pruned further by calculating the relative importance of terms and eliminating ubiquitous and rarely occurring terms.
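The splitting step can be sketched with a simple regex-based splitter; a minimal stand-in for the methods of [9, 10], which also handle abbreviations and soft-words more carefully:

```python
import re

def split_identifier(name):
    """Split underscore ("hard-word") and camel-case identifiers into
    lowercase terms, e.g. resolveTypeBinding -> resolve, type, binding."""
    parts = re.split(r"[_\W]+", name)    # break on underscores/punctuation
    terms = []
    for p in parts:
        # Break camelCase and ABBRWord boundaries, keep digit runs.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", p)
    return [t.lower() for t in terms if t]

print(split_identifier("resolveTypeBinding"))  # ['resolve', 'type', 'binding']
print(split_identifier("AST_Parser2"))         # ['ast', 'parser', '2']
```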
Evaluation Metrics
Mean Average Precision (MAP)
Calculated using the following two sets:

retrieved(N_r): the top N_r documents from the ranked list of documents retrieved vis-a-vis the query.
relevant: extracted from the relevance judgments available in repository.xml.

Precision and Recall:

Precision(P@N_r) = |{relevant} ∩ {retrieved}| / |{retrieved}|

Recall(R@N_r) = |{relevant} ∩ {retrieved}| / |{relevant}|
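The two definitions above can be sketched directly; the ranked list and relevant set below are hypothetical file names:

```python
def precision_recall_at(ranked, relevant, n):
    """P@N and R@N over a ranked retrieval list, as defined above."""
    retrieved = ranked[:n]
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

ranked = ["Parser.java", "Lexer.java", "Ast.java", "Scope.java"]
relevant = {"Parser.java", "Scope.java"}
p, r = precision_recall_at(ranked, relevant, 2)
print(p, r)  # 0.5 0.5 -- one of the top 2 is relevant; half the relevant set found
```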
Mean Average Precision (MAP) (cont.)
1 If we were to plot a typical P-R curve from the values of P@N_r and R@N_r, we would get a monotonically decreasing curve with high Precision at low Recall and vice versa.
2 The area under the P-R curve is called the Average Precision.
3 Taking the mean of the Average Precision over all queries gives the Mean Average Precision (MAP).
4 Physical significance of MAP: same as that of Precision.
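The steps above can be sketched with the common discrete form of Average Precision (mean of P@N at each rank where a relevant document appears); the two toy queries are illustrative:

```python
def average_precision(ranked, relevant):
    """Average Precision: mean of P@N at the ranks N where a relevant
    document appears -- a discrete area under the P-R curve."""
    hits, precisions = 0, []
    for n, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / n)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: mean of the per-query Average Precision values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [(["a", "b", "c"], {"a"}),        # AP = 1.0
        (["a", "b", "c"], {"b", "c"})]   # AP = (1/2 + 2/3) / 2
print(round(mean_average_precision(runs), 4))  # 0.7917
```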
Rank of Retrieved Files [3]
The number of queries/bugs for which relevant source files were retrieved with ranks r_low ≤ R ≤ r_high is reported.
For the retrieval performance reported in [3], the rank bins used are R = 1, 2 ≤ R ≤ 5, 6 ≤ R ≤ 10, and R > 10.
SCORE [11]
1 Indicates the proportion of the program that needs to be examined in order to locate or localize a fault.
2 For each range of this proportion (e.g., 10-20%), the number of test runs (bugs) is reported.
Results
Models using LDA
Figure: MAP using the three LDA models for different values of K. The experimental parameters for the LDA+Unigram model are λ = 0.9, µ = 0.5, β = 0.01, and α = 50/K.
The combined LDA+Unigram model
Figure: MAP plotted for different values of the mixture proportions (λ and µ) of the LDA+Unigram combined model.
Models using LSA
Figure: MAP using the LSA model and its variations and combinations for different values of K. The experimental parameter for the LSA+VSM combined model is λ = 0.5.
CBDM
Model parameters        K
λ1      λ2      λ3      100        250      500      1000
0.25    0.25    0.5     0.093144   0.0914   0.08666  0.07664
0.15    0.35    0.5     0.0883     0.0897   0.0963   0.0932
0.81    0.09    0.1     0.143      0.102    0.108    0.09952
0.27    0.63    0.1     0.1306     0.117    0.111    0.0998
0.495   0.495   0.01    0.141      0.141    0.141    0.141
0.05    0.05    0.99    0.069      0.075    0.072    0.065

Table: Retrieval performance (MAP) with the CBDM. λ1 + λ2 + λ3 = 1; λ1: Unigram model, λ2: Collection model, λ3: Cluster model.
Rank based metric
Figure: The height of the bars shows the number of queries (bugs) for which at least one relevant source file was retrieved at rank 1.
SCORE: IR-based bug localization tools
SCORE: Comparison with AMPLE and FINDBUGS

Figure: SCORE values calculated over 44 bugs in iBUGS ASPECTJ using AMPLE [12]

SCORE with FINDBUGS: none of the bugs were localized correctly.
Conclusion
IR-based bug localization techniques are equally or more effective compared to static or dynamic bug localization tools.
Sophisticated models like LDA, LSA, or CBDM do not outperform simpler models like the Unigram model or VSM for IR-based bug localization on large software systems.
An analysis of the spread of the word distributions over the source files, with the help of measures such as tf and idf, can give useful insights into the usability of topic- and cluster-based models for localization.
End of Presentation
Thanks to
Questions?
Threats to validity

We have tested on a single dataset, iBUGS. How does this generalize?
We have eliminated XML files from those that are indexed and queried. Maybe not a valid assumption?
References
[1] A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic, "An Information Retrieval Approach to Concept Location in Source Code," in Proceedings of the 11th Working Conference on Reverse Engineering (WCRE 2004), pp. 214-223, IEEE Computer Society, 2004.
[2] B. Cleary, C. Exton, J. Buckley, and M. English, "An Empirical Analysis of Information Retrieval based Concept Location Techniques in Software Comprehension," Empirical Softw. Engg., vol. 14, no. 1, pp. 93-130, 2009.
[3] S. K. Lukins, N. A. Kraft, and L. H. Etzkorn, "Source Code Retrieval for Bug Localization using Latent Dirichlet Allocation," in 15th Working Conference on Reverse Engineering, 2008.
References (cont.)
[4] V. Dallmeier and T. Zimmermann, "Extraction of Bug Localization Benchmarks from History," in ASE '07: Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering, (New York, NY, USA), pp. 433-436, ACM, 2007.
[5] C. Zhai and J. Lafferty, "A Study of Smoothing Methods for Language Models Applied to Information Retrieval," ACM Transactions on Information Systems, pp. 179-214, 2004.
[6] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, pp. 993-1022, 2003.
References (cont.)
[7] X. Wei and W. B. Croft, "LDA-Based Document Models for Ad-hoc Retrieval," in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2006.
[8] X. Liu and W. B. Croft, "Cluster-Based Retrieval Using Language Models," in ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2004.
[9] H. Feild, D. Binkley, and D. Lawrie, "An Empirical Comparison of Techniques for Extracting Concept Abbreviations from Identifiers," in Proceedings of the IASTED International Conference on Software Engineering and Applications, 2006.
References (cont.)
[10] E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker, "Mining Source Code to Automatically Split Identifiers for Software Analysis," in Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories, MSR '09, (Washington, DC, USA), pp. 71-80, IEEE Computer Society, 2009.
[11] J. A. Jones and M. J. Harrold, "Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique," in Automated Software Engineering, 2005.
[12] V. Dallmeier and T. Zimmermann, "Automatic Extraction of Bug Localization Benchmarks from History," tech. rep., Universität des Saarlandes, Saarbrücken, Germany, June 2007.