![Page 1: Informa(on)Retrieval)Support)for) So3ware)Engineering)Tasks)burmeste/Haiduc_2015.pdf · Informa(on)Retrieval)Support)for) So3ware)Engineering)Tasks) Sonia Haiduc)! AssistantProfessor!!](https://reader033.vdocuments.us/reader033/viewer/2022042010/5e7202b80619503631084b4a/html5/thumbnails/1.jpg)
Information Retrieval Support for Software Engineering Tasks
Sonia Haiduc
Assistant Professor, Department of Computer Science
Florida State University
Short Bio
What is Information Retrieval?
SE Tasks Supported by Information Retrieval
• Concept/Feature Location
• Impact Analysis
• Traceability Link Recovery
• Code Reuse
• Bug Triage
• Program Comprehension
• Architecture/Design Recovery
• Quality Assessment
• Software Evolution Analysis
• Automatic Documentation
• Requirements Analysis
• Defect Prediction and Debugging
• Refactoring
• Software Categorization
• Licensing Analysis
• Clone Detection
• Effort Estimation
• Domain Analysis
• Web Services Discovery
Software Changes
Software Costs (pie chart): Software Maintenance 75%, Initial Development 25%
• Adding new features
• Modifying existing features
• Fixing bugs
• Improving performance
• Adapting to changes in hardware
• Refactoring
• Etc.
Software Change is Difficult
• Millions of lines of code
– S-class Mercedes-Benz: 20 million
– OpenOffice: 30 million
– Windows XP: 45 million
• Developed by large, distributed teams
• Developers have to change software with:
– Limited domain knowledge
– Absence of the original developer
– Bad, missing, or out-of-date documentation
Concept Location
• Finding the implementation of a concept in the code, i.e., a place in the source code from which to start a change
• Sources of information:
– Structure: the structural aspects of the source code (e.g., control and data flow, class diagrams)
– Dynamic: behavioral aspects of the program (e.g., execution traces)
– Text: captures the problem domain and developer intentions (e.g., identifiers, comments) -> Text Retrieval
Text Retrieval for Concept Location
[Diagram: INPUT (Query, Source Code Text) -> TR Engine -> Relevant Code Elements]
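The pipeline on this slide (a query and the source code text go into a TR engine, which returns ranked code elements) can be sketched in a few lines. This is only an illustrative sketch, assuming a TF-IDF vector space model with cosine similarity; the slides do not say which TR engine was actually used. The function names (`tokenize`, `build_index`, `rank`) are hypothetical, and the tiny `METHODS` corpus reuses the three method descriptions shown later in the FileZilla example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase words, splitting camelCase identifiers into their parts."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]


def build_index(corpus):
    """Turn {code element name: text} into TF-IDF vectors (assumed weighting)."""
    names = list(corpus)
    token_lists = [tokenize(corpus[name]) for name in names]
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in token_lists]
    return names, vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, corpus):
    """Return code element names ordered from most to least similar to the query."""
    names, vectors, idf = build_index(corpus)
    # Query terms that never occur in the corpus cannot match anything.
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in counts.items()}
    scored = [(cosine(q_vec, vec), name) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored]


# Toy corpus: method name -> text extracted from the method (illustrative only).
METHODS = {
    "getRemoteFolder": "get remote folder destination",
    "viewUserSettings": "view user settings pane cache",
    "confirmFileTransfer": "confirm file transfer popup window",
}

print(rank("confirm delete folder view", METHODS))
```

A real setup would index every method or class in the system and usually add stemming and stop-word removal; those steps are left out here to keep the sketch short.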
Problems
• Query: Developers have a hard time formulating good queries in unfamiliar software systems
• Source Code Text: The results of TR depend on the quality of identifiers found in the source code
• Results Presentation: The presentation of the results does not offer enough information to understand if the results are relevant
Problem #1: Query
Problem
• Developers have a hard time formulating good queries in unfamiliar software systems
Research Questions
• How can query formulation be made easy for developers?
• How can bad queries be improved?
Solution
• Automatic query reformulation
Approaches
• Semi-automatic: Relevance feedback
– People cannot always express well what they are looking for, but can recognize it when they see it
– The developer provides feedback about the relevance of search results, and the query is automatically reformulated
• Fully automatic: Learning the best reformulation for each query
– The developer need not be involved
– Use machine learning techniques to learn the best reformulation for queries based on their lexical properties
FileZilla Bug Report #3272
No confirm for delete in folder view
Reported by: trellmor
Priority: normal
Component: FileZilla client
Description: If you try to delete a folder by “right click -> delete” in the remote folder window, it won’t ask for confirmation.
Initial Query: confirm delete folder view
TR results:
1. getRemoteFolder() get remote folder destination
2. viewUserSettings() view user settings pane cache
3. confirmFileTransfer() confirm file transfer popup window
RF:
+ words in documents: +get +remote +folder +destination
- words in documents: -view -confirm
Reformulated Query: get remote folder destination delete folder
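The reformulation step in this example can be reproduced with a small sketch. It is a deliberately simplified, set-based take on relevance feedback (add the words of documents judged relevant, drop query words that occur only in documents judged irrelevant), not the weighted formula the original study may have used. It also assumes the developer marked result 1 as relevant and results 2 and 3 as irrelevant, which is what the + and - word lists on the slide suggest. The function name `reformulate` is hypothetical.

```python
def reformulate(query, relevant_docs, irrelevant_docs):
    """Simplified relevance feedback on plain word lists.

    Adds every distinct word from the relevant documents and removes query
    words that appear only in irrelevant documents.
    """
    # Distinct words of the relevant documents, in first-seen order.
    added = []
    for doc in relevant_docs:
        for term in doc.split():
            if term not in added:
                added.append(term)

    # Words seen in irrelevant documents but in no relevant document.
    penalized = {t for doc in irrelevant_docs for t in doc.split()} - set(added)

    # Original query words that survive the negative feedback.
    kept = [t for t in query.split() if t not in penalized]
    return " ".join(added + kept)


new_query = reformulate(
    "confirm delete folder view",
    relevant_docs=["get remote folder destination"],           # assumed: result 1
    irrelevant_docs=["view user settings pane cache",          # assumed: result 2
                     "confirm file transfer popup window"],    # assumed: result 3
)
print(new_query)  # get remote folder destination delete folder
```

The word "folder" appears twice in the output because it is both added from the relevant document and kept from the original query, which matches the reformulated query on the slide.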
Evaluation
• Empirical evaluation: locating bugs in code based on text found in bug reports
• Patches in bug reports used for identifying buggy methods
• 3 large software systems, 18 queries
– Eclipse: IDE for Java (2500 KLOC)
– jEdit: programming editor (300 KLOC)
– Adempiere: enterprise resource planning (330 KLOC)
• In 72% of cases, queries reformulated using relevance feedback led to better results
Refoqus: Automatically Determining the Best Reformulation
• In relevance feedback, developers need to spend time providing feedback, so an automated solution is desirable
• Queries are different: different types of queries may require different reformulation approaches (query expansion, query contraction, etc.)
Refoqus
[Diagram: Training queries (query properties, best reformulation) -> LEARN -> MODEL; New query (query properties) -> MODEL -> Best reformulation]
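The learn-then-predict flow in this diagram can be illustrated with a toy classifier. Everything specific below is an assumption made for the sketch: the two query properties (number of terms, share of terms present in the code vocabulary), the 1-nearest-neighbour learner, and the made-up training examples. The slides say only that machine learning is applied to lexical properties of queries, so this is not a description of how Refoqus itself is built.

```python
import math


def query_properties(query, vocabulary):
    """Two hypothetical lexical properties of a query.

    Returns (number of terms, fraction of terms found in the code vocabulary).
    """
    terms = query.lower().split()
    if not terms:
        return (0.0, 0.0)
    known = sum(1 for t in terms if t in vocabulary)
    return (float(len(terms)), known / len(terms))


def learn(training_examples):
    """LEARN step: for 1-nearest-neighbour this just stores the examples."""
    return list(training_examples)


def predict(model, properties):
    """MODEL step: return the label of the closest stored example."""
    nearest = min(model, key=lambda example: math.dist(example[0], properties))
    return nearest[1]


# Made-up training data: (query properties, best reformulation for that query).
TRAINING = [
    ((2.0, 0.50), "expansion"),
    ((3.0, 1.00), "expansion"),
    ((12.0, 0.90), "contraction"),
    ((15.0, 0.60), "contraction"),
]

model = learn(TRAINING)
vocabulary = {"confirm", "folder", "view", "remote", "get"}  # assumed code terms
props = query_properties("confirm delete folder view", vocabulary)
print(props, predict(model, props))
```

A short query lands near the short training queries and is routed to expansion, while a long one would be routed to contraction. In practice the properties would need scaling so that query length does not dominate the distance.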
Evaluation
• Empirical evaluation: locating bugs in code based on text found in bug reports
• 6 software systems, 30 queries each
– Adempiere (330 KLOC)
– jEdit (300 KLOC)
– Atunes (80 KLOC)
– Mahout (110 KLOC)
– FileZilla (240 KLOC)
– WinMerge (410 KLOC)
• Refoqus outperformed any individual reformulation technique; in 85% of cases it improved the results of TR-based concept location
Problem #2: Source Code Text
Problem
• The results of TR depend on the quality of identifiers found in the source code
Research Question
• How can we improve the results of TR-based concept location when bad identifiers are present?
Solution
• Identifying and renaming bad identifiers
Lexicon Bad Smells
• Poorly named identifiers can be misleading and impact the results of TR techniques
• Defined a catalog of bad smells in identifiers
• Proposed a set of renaming operations to fix bad smells
• Empirical evaluation on concept location
• Results: improved TR-based concept location after removing bad smells
Problem #3: Results Presentation
Problem
• The presentation of the results does not offer enough information to understand if the results are relevant
Research Question
• How can the results of TR-based concept location be presented in a more informative way?
Solution
• Automatic code summaries
Code Summaries
• Brief but relevant descriptions of source code entities (methods, classes, etc.)
• Text retrieval and text summarization techniques extract the most representative information from code
• User evaluation for method and class summaries
• Results: users agreed with the summaries created (score 3.2 out of 4)
• Current work: people summarize code differently (user studies)
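One simple way to picture a term-based code summary is to list the words that are most characteristic of a method compared with the rest of the system. The sketch below does that with a TF-IDF style score. It is an assumed, minimal stand-in for the text retrieval and summarization techniques mentioned above, not the approach evaluated in the study. The helper names (`terms_of`, `summarize`), the keyword list, and the two code snippets in `SOURCES` are all hypothetical.

```python
import math
import re
from collections import Counter

# Small, incomplete list of language keywords to ignore (illustrative only).
KEYWORDS = {"void", "string", "int", "return", "public", "private", "static", "new"}


def terms_of(code):
    """Extract lowercase words from code, splitting camelCase identifiers."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", code)
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]
    return [w for w in words if w not in KEYWORDS]


def summarize(target, sources, k=3):
    """Return the k terms that best characterize sources[target]."""
    docs = {name: terms_of(code) for name, code in sources.items()}
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    term_freq = Counter(docs[target])
    # Frequent in this element, rare elsewhere -> high score.
    score = {t: c * math.log(1 + n_docs / doc_freq[t]) for t, c in term_freq.items()}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(score, key=lambda t: (-score[t], t))[:k]


# Hypothetical code elements used only to exercise the sketch.
SOURCES = {
    "deleteRemoteFolder": (
        "void deleteRemoteFolder(String path) { "
        "/* delete folder on remote server */ client.delete(path); }"
    ),
    "viewUserSettings": "void viewUserSettings() { settingsPane.show(); }",
}

print(summarize("deleteRemoteFolder", SOURCES))
```

A term list like this could be shown next to each search result so that a developer can judge relevance without opening the code.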