Software Analytics: Towards Software Mining that Matters Tao Xie University of Illinois at Urbana-Champaign http://www.cs.illinois.edu/homes/taoxie/ taoxie@illinois.edu


Page 1: Software Analytics: Towards Software Mining that Matters Tao Xie University of Illinois at Urbana-Champaign  taoxie@illinois.edu

Software Analytics:

Towards Software Mining

that Matters

Tao Xie

University of Illinois at Urbana-Champaign

http://www.cs.illinois.edu/homes/taoxie/

taoxie@illinois.edu

Page 2:

Should I test\review my?

©A. Hassan

Page 3:

©A. Hassan

Page 4:

©A. Hassan

Page 5:

©A. Hassan

Page 6:

Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.

[MALETS’11 Zhang et al.]

Page 7:

Software Intelligence & Analytics for Software Development

http://people.engr.ncsu.edu/txie/publications/foser10-si.pdf

http://thomas-zimmermann.com/publications/files/buse-foser-2010.pdf

Page 8:

• use Data Exploration and Analysis: Mining Software Repositories (MSR)

• for Software Practitioners: Beyond Software Developers

• obtain Insightful and Actionable info: Need to get real as well

• Analytic Techniques

• Producing Impact on Practice

Page 9:

Look through your software data

©A. Hassan

Page 10:

Mine through the data!

http://msrconf.org

An international effort to make software repositories actionable

http://promisedata.org

Promise Data Repository

©A. Hassan

Page 11:

Mining Software Repositories (MSR)

• Transforms static record-keeping repositories to active repositories

• Makes repository data actionable by uncovering hidden patterns and trends

[Figure: repositories shown: Mailing list, Bugzilla, Crashes, Field logs, CVS/SVN]

©A. Hassan

Page 12:

[Figure: repository categories: Historical Repositories, Runtime Repos, Code Repos. Labels shown: Field Logs, Source Control (CVS/SVN), Bugzilla, Mailing lists, Crash Repos, SourceForge, Google Code]

©A. Hassan

Page 13:

Bugzilla, CVS/SVN, Mailing list, Crashes

MSR researchers analyze and cross-link repositories

fixed bug

discussions

Buggy change & Fixing change

Field crashes

Estimate fix effort

Mark duplicates

Suggest experts and fix!

New Bug Report

©A. Hassan

Page 14:

• use Data Exploration and Analysis: Mining Software Repositories (MSR)

• for Software Practitioners: Beyond Software Developers

• obtain Insightful and Actionable info: Need to get real as well

• Analytic Techniques

• Producing Impact on Practice

Page 15:

We continue to help practitioners (esp. developers)

©A. Hassan

Page 16:

©A. Hassan

Page 17:

©A. Hassan

Page 18:

©A. Hassan

Page 19:

Detection and Management of Code Clones

©A. Hassan

Page 20:

Support Logs

Source Code

©A. Hassan

Page 21:

©A. Hassan

Page 22:

• use Data Exploration and Analysis: Mining Software Repositories (MSR)

• for Software Practitioners: Beyond Software Developers

• obtain Insightful and Actionable info: Need to get real as well

• Analytic Techniques

• Case Studies

Page 23:

Predicting Bugs

• Studies have shown that most complexity metrics correlate well with LOC!

– Graves et al. 2000 on commercial systems

– Herraiz et al. 2007 on open source systems

• Noteworthy findings:

– Previous bugs are good predictors of future bugs

– The more a file changes, the more likely it will have bugs in it

– Recent changes affect the bug potential of a file more than older changes (weighted time damp models)

– Number of developers is of little help in predicting bugs

– Hard to generalize bug predictors across projects unless in similar domains [Nagappan, Ball et al. 2006]
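The idea that recent changes weigh more than older ones can be illustrated with a small sketch. The slides do not give the exact weighted time damp formula, so the exponential decay, the `decayRate` parameter, and the class name below are assumptions made purely for illustration, not the model from the cited studies.

```java
import java.util.List;

final class ChangeDecayScore {
    /**
     * Sums exp(-decayRate * ageInDays) over all changes to a file,
     * so a recent change contributes close to 1 and an old change close to 0.
     * The decay form and rate are illustrative assumptions.
     */
    static double score(List<Double> changeAgesInDays, double decayRate) {
        double total = 0.0;
        for (double age : changeAgesInDays) {
            total += Math.exp(-decayRate * age);
        }
        return total;
    }

    public static void main(String[] args) {
        double rate = 0.01; // hypothetical decay rate per day
        List<Double> recentlyChanged = List.of(1.0, 3.0, 7.0);
        List<Double> changedLongAgo = List.of(300.0, 400.0, 500.0);
        // Same number of changes, but the recently changed file scores higher.
        System.out.println("recent: " + score(recentlyChanged, rate));
        System.out.println("old:    " + score(changedLongAgo, rate));
    }
}
```

Both files have three changes, so a plain change count cannot tell them apart; the decayed sum can.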

Page 24:

Using Imports in Eclipse to Predict Bugs

import org.eclipse.jdt.internal.compiler.lookup.*;
import org.eclipse.jdt.internal.compiler.*;
import org.eclipse.jdt.internal.compiler.ast.*;
import org.eclipse.jdt.internal.compiler.util.*;
...
import org.eclipse.pde.core.*;
import org.eclipse.jface.wizard.*;
import org.eclipse.ui.*;

14% of all files that import ui packages had to be fixed later on.

71% of files that import compiler packages had to be fixed later on.

[Schröter et al. 06]
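A statistic of this kind (the share of files importing a given package that were later fixed) can be computed along the following lines. This is a minimal sketch: the `SourceFile` type, the `fixRate` helper, and the toy data are hypothetical and are not the Eclipse data set or the tooling behind the slide's numbers.

```java
import java.util.List;

final class ImportFixRate {
    /** Hypothetical record of one source file: its imports and whether it was fixed later. */
    static final class SourceFile {
        final List<String> imports;
        final boolean fixedLater;

        SourceFile(List<String> imports, boolean fixedLater) {
            this.imports = imports;
            this.fixedLater = fixedLater;
        }
    }

    /** Fraction of files importing a package with the given prefix that were later fixed. */
    static double fixRate(List<SourceFile> files, String packagePrefix) {
        int importing = 0;
        int fixed = 0;
        for (SourceFile file : files) {
            boolean matches = file.imports.stream().anyMatch(p -> p.startsWith(packagePrefix));
            if (matches) {
                importing++;
                if (file.fixedLater) {
                    fixed++;
                }
            }
        }
        return importing == 0 ? 0.0 : (double) fixed / importing;
    }

    /** Made-up toy data for illustration only. */
    static List<SourceFile> sampleFiles() {
        return List.of(
            new SourceFile(List.of("org.eclipse.jdt.internal.compiler.ast"), true),
            new SourceFile(List.of("org.eclipse.jdt.internal.compiler.util"), true),
            new SourceFile(List.of("org.eclipse.jdt.internal.compiler.lookup", "org.eclipse.ui"), false),
            new SourceFile(List.of("org.eclipse.ui"), false),
            new SourceFile(List.of("org.eclipse.ui", "org.eclipse.jface.wizard"), true));
    }

    public static void main(String[] args) {
        List<SourceFile> files = sampleFiles();
        System.out.println("compiler: " + fixRate(files, "org.eclipse.jdt.internal.compiler"));
        System.out.println("ui:       " + fixRate(files, "org.eclipse.ui"));
    }
}
```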

Page 25:

Percentage of bug-introducing changes for Eclipse

Don’t program on Fridays ;-)

[Zimmermann et al. 05]

Page 26:

Failure is a 4-letter Word

[PROMISE’11 Zeller et al.]

Page 27:

Actionable Alone is not Enough!

[PROMISE’11 Zeller et al.]

Page 28:

Who produces more buggy code?

©A. Hassan

Page 29:

• use Data Exploration and Analysis: Mining Software Repositories (MSR)

• for Software Practitioners: Beyond Software Developers

• obtain Insightful and Actionable info: Need to get real as well

• Analytic Techniques

• Producing Impact on Practice

Page 30:

Analytic Techniques in SE

• Association rules and frequent patterns

• Classification

• Clustering

• Text mining/Natural language processing

• Visualization

More details are at

• https://sites.google.com/site/xsoftanalytics/

Page 31:

Solution-Driven: Where can I apply X miner?

Problem-Driven: What patterns do we really need?

• Basic mining algorithms (e.g., association rule, frequent itemset mining…)

• Advanced mining algorithms (e.g., frequent partial order mining [ESEC/FSE 07])

• New/adapted mining algorithms (e.g., [ICSE 09], [ASE 09])

Page 32:

Traditional approaches: Mining

Code repositories (1, 2, …, N; e.g., Eclipse, Linux, …) → mining → patterns

Often lack sufficient relevant data points (e.g., API call sites)

Our new approaches: Searching + Mining

Code search engine (e.g., open source code on the web) → searching → code repositories (1, 2, …) → mining → patterns

Page 33:

Existing approaches produce high % of false positives

One major observation:

Programmers often write code in different ways for achieving the same task

Some ways are more frequent than others

Frequent ways

Infrequent ways

Mined Patterns

mine patterns

detect violations

S. Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. ASE 2009.

Page 34:

Example: java.util.Iterator.next()

Code Sample 1

PrintEntries1(ArrayList<String> entries) {
  …
  Iterator it = entries.iterator();
  if (it.hasNext()) {
    String last = (String) it.next();
  }
  …
}

Code Sample 2

PrintEntries2(ArrayList<String> entries) {
  …
  if (entries.size() > 0) {
    Iterator it = entries.iterator();
    String last = (String) it.next();
  }
  …
}

java.util.Iterator.next() throws NoSuchElementException when invoked on a list without any elements
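The two guards from Code Samples 1 and 2, and the failure they prevent, can be reproduced in a small runnable form. The class and method names are hypothetical, and returning `null` for an empty list is a choice made here for illustration; the slide's samples elide what happens around the guard.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class IteratorGuards {
    /** Guard as in Code Sample 1: check hasNext() before calling next(). */
    static String firstWithHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null; // empty list: nothing to return
    }

    /** Guard as in Code Sample 2: check size() > 0 before calling next(). */
    static String firstWithSizeCheck(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null; // empty list: nothing to return
    }

    /** No guard: next() throws NoSuchElementException on an empty list. */
    static String firstUnguarded(List<String> entries) {
        return entries.iterator().next();
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        System.out.println(firstWithHasNext(empty));
        System.out.println(firstWithSizeCheck(empty));
        try {
            firstUnguarded(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded call failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Both guarded versions are safe, which is why a detector that only knows the `hasNext()` pattern would wrongly flag the `size()` version.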

Page 35:

Example: java.util.Iterator.next()

1243 code examples

Code Sample 1 (1218 / 1243)

PrintEntries1(ArrayList<String> entries) {
  …
  Iterator it = entries.iterator();
  if (it.hasNext()) {
    String last = (String) it.next();
  }
  …
}

Code Sample 2 (6 / 1243)

PrintEntries2(ArrayList<String> entries) {
  …
  if (entries.size() > 0) {
    Iterator it = entries.iterator();
    String last = (String) it.next();
  }
  …
}

Mined pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next”

Page 36:

Example: java.util.Iterator.next()

Require more general patterns (alternative patterns): P1 or P2

P1: boolean check on return of Iterator.hasNext before Iterator.next

P2: boolean check on return of ArrayList.size before Iterator.next

Cannot be mined by existing approaches, since alternative P2 is infrequent

Code Sample 1

PrintEntries1(ArrayList<String> entries) {
  …
  Iterator it = entries.iterator();
  if (it.hasNext()) {
    String last = (String) it.next();
  }
  …
}

Code Sample 2

PrintEntries2(ArrayList<String> entries) {
  …
  if (entries.size() > 0) {
    Iterator it = entries.iterator();
    String last = (String) it.next();
  }
  …
}

Page 37:

Our Solution: ImMiner Algorithm

Mines alternative patterns of the form P1 or P2

Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1

1243 code examples

Sample 1 (1218 / 1243)

Sample 2 (6 / 1243)

P2 is frequent among code examples not supporting P1

P2 is infrequent among entire 1243 code examples

[ASE 09]
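As a worked check of the counts on this slide: assuming that the 1218 examples are exactly those supporting P1 and that all 6 examples following Sample 2 lie among the remaining 1243 − 1218 = 25 (the slide implies this but does not state it), the support of P2 in the two populations is roughly:

```latex
\[
\mathrm{sup}_{\text{all}}(P_2) = \frac{6}{1243} \approx 0.005,
\qquad
\mathrm{sup}_{\neg P_1}(P_2) = \frac{6}{1243 - 1218} = \frac{6}{25} = 0.24
\]
```

Under these assumptions P2 is about fifty times more frequent among the examples that do not support P1 than in the whole set, which is the gap the algorithm exploits.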

Page 38:

Alternative Patterns

ImMiner mines three kinds of alternative patterns of the general form “P1 or P2”

Balanced: all alternatives (both P1 and P2) are frequent

Imbalanced: some alternatives (P1) are frequent and others are infrequent (P2). Represented as “P1 or P̂2”

Single: only one alternative

Page 39:

ImMiner Algorithm

Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively

An input database with the following APIs for Iterator.next()

[Tables: input database; mapping of IDs to APIs]

Page 40:

ImMiner Algorithm: Frequent Alternatives

Input database

Frequent itemset mining (min_sup 0.5)

Frequent item: 1

P1: boolean-check on the return of Iterator.hasNext() before Iterator.next()

Page 41:

ImMiner: Infrequent Alternatives of P1

Split input database into two databases: Positive database (PSD) and Negative database (NSD)

Mine patterns that are frequent in NSD and are infrequent in PSD

Reason: Only such patterns serve as alternatives for P1

Alternative Pattern: P2 “const check on the return of ArrayList.size() before Iterator.next()”

Alattin applies ImMiner algorithm to detect neglected conditions
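The split-and-mine step can be sketched on a toy database. This is a simplified illustration, not the ImMiner implementation: it looks at single items rather than itemsets, it assumes PSD holds the transactions containing the frequent item and NSD the rest, and the item IDs, thresholds, and names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AlternativeItemSketch {
    /** Fraction of transactions in db that contain the item (0 for an empty db). */
    static double support(List<Set<Integer>> db, int item) {
        if (db.isEmpty()) {
            return 0.0;
        }
        long count = db.stream().filter(t -> t.contains(item)).count();
        return (double) count / db.size();
    }

    /** Items frequent in NSD (transactions lacking frequentItem) but infrequent in PSD. */
    static Set<Integer> alternativesOf(List<Set<Integer>> db, int frequentItem,
                                       double minSupNegative, double maxSupPositive) {
        List<Set<Integer>> psd = new ArrayList<>();
        List<Set<Integer>> nsd = new ArrayList<>();
        for (Set<Integer> transaction : db) {
            if (transaction.contains(frequentItem)) {
                psd.add(transaction);
            } else {
                nsd.add(transaction);
            }
        }
        Set<Integer> candidates = new TreeSet<>();
        for (Set<Integer> transaction : nsd) {
            candidates.addAll(transaction);
        }
        Set<Integer> alternatives = new TreeSet<>();
        for (int item : candidates) {
            if (support(nsd, item) >= minSupNegative && support(psd, item) < maxSupPositive) {
                alternatives.add(item);
            }
        }
        return alternatives;
    }

    /** Toy data: 1 = hasNext check, 2 = size check, 3 = an item present in every example. */
    static List<Set<Integer>> sampleDatabase() {
        return List.of(Set.of(1, 3), Set.of(1, 3), Set.of(1, 3), Set.of(1, 3),
                       Set.of(2, 3), Set.of(2, 3), Set.of(3));
    }

    public static void main(String[] args) {
        // Item 2 is rare overall (2 of 7) but common among examples without item 1 (2 of 3).
        System.out.println(alternativesOf(sampleDatabase(), 1, 0.5, 0.5));
    }
}
```

Item 3 is frequent in NSD as well, but it is just as frequent in PSD, so it is not reported as an alternative; only item 2 is.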

Page 42:

Neglected Conditions

Neglected conditions refer to

• Missing conditions that check the arguments or receiver of the API call before the API call

• Missing conditions that check the return or receiver of the API call after the API call

One primary reason for many fatal issues, such as security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]
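The two kinds of conditions can be illustrated with standard library calls. The examples are chosen here for illustration and do not come from the slides: `Map.get` may return `null`, so its return value is checked after the call, and `String.substring` rejects out-of-range indices, so its arguments are checked before the call. The class and method names are hypothetical.

```java
import java.util.Map;

final class NeglectedConditionExample {
    /** Condition after the call: Map.get may return null, so check the return value. */
    static int lengthOfValue(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null) { // omitting this check would be a neglected condition
            return 0;
        }
        return value.length();
    }

    /** Condition before the call: check receiver and argument before substring. */
    static String prefix(String s, int n) {
        if (s == null || n < 0 || n > s.length()) { // omitting this check would be a neglected condition
            return "";
        }
        return s.substring(0, n);
    }

    public static void main(String[] args) {
        System.out.println(lengthOfValue(Map.of("host", "localhost"), "port"));
        System.out.println(prefix("hello", 2));
        System.out.println(prefix("hi", 5).isEmpty());
    }
}
```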

Page 43:

• use Data Exploration and Analysis: Mining Software Repositories (MSR)

• for Software Practitioners: Beyond Software Developers

• obtain Insightful and Actionable info: Need to get real as well

• Analytic Techniques

• Producing Impact on Practice

Page 44:

Machine Learning that Matters

http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf

[ICML’12 Wagstaff]

Page 45:

• Hyper-Focus on Benchmark Data Sets

• Hyper-Focus on Abstract Metrics

• Lack of Follow-Through

http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf

[ICML’12 Wagstaff]

Page 46:

• Meaningful Evaluation Methods

• Involvement of the World Outside ML

• Eyes on the Prize

http://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf

[ICML’12 Wagstaff]

Page 47:

MSRA Software Analytics Group

Utilize data-driven approach to help create highly performing, user friendly, and efficiently developed and operated software and services.

Information Visualization

Analysis Algorithms

Large-scale Computing

Research Topics

Technology Pillars

Vertical

Horizontal

Contact: Dongmei Zhang

http://research.microsoft.com/groups/sa/

Page 48:

Software Analytics in Practice

Page 49:

Adoption Challenges for Software Analytics

Must show value before data quality improves

Correlation vs. Causation

Page 50:

ICSE Papers: Industry vs. Academia

Source: © Carlo Ghezzi

OSDI 2008 26% vs. xSE ?%: Developers, Programmers, Architects Among All Attendees

ICSM 11 Keynote

ICSE 09 Keynote

MSR 12 Keynote

MSR 11 Keynote

SCAM 12 Keynote

Page 51:

"Are Automated Debugging [Research] Techniques Actually Helping Programmers?"

• 50 years of automated debugging research

– N papers, only 5 evaluated with actual programmers

[ISSTA11 Parnin & Orso]

Page 52:

Are Regression Testing [Research] Techniques Actually Helping Industry?

• Likely most studied testing problems

– N papers

[STVR11 Yoo & Harman]

Page 53:

Are [Some] Failure-Proneness Prediction [Research] Techniques Actually Helping?

• Empirical software engineering (on prediction)

– N papers

[PROMISE11 Zeller et al.]

Page 54:

A Researcher's Observation in HCI Research Community

• “The reviewers simply do not value the difficulty of building real systems and how hard controlled studies are to run on real systems for real tasks. This is in contrast with how easy it is to build new interaction techniques and then to run tight, controlled studies on these new techniques with small, artificial tasks”

“I give up on CHI/UIST” by James Landay

http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html

Source: © J. Landay

Page 55:

A Researcher's Observation in HCI Research Community

• “This attitude is a joke and it offers researchers no incentive to do systems work. Why should they? Why should we put 3-4 person years into every CHI publication? Instead we can do 8 weeks of work on an idea piece or create a new interaction technique and test it tightly in 8-12 weeks and get a full CHI paper.”

“I give up on CHI/UIST” by James Landay

http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html

Source: © J. Landay

Page 56:

A Researcher's Observation in HCI Research Community

• “When will this community wake up and understand that they are going to run out any work on creating new systems (rather than small pieces of systems) and cede that important endeavor to industry?”

• “We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.”

Does our research community have similar issues?

“I give up on CHI/UIST” by James Landay

http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html

Source: © J. Landay

Page 57:

MS Academic Search: “Pointer Analysis”

Page 58:

“Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01]

“During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tomes of work on this topic one may wonder, “Haven't we solved this problem yet?” With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.”

Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001).

Source: © M. Hind

Page 59:

“Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE’01]

Section 4.3 Designing an Analysis for a Client’s Needs

“Barbara Ryder expands on this topic: “… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.”

Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001).

Source: © M. Hind & B. Ryder

Page 60:

MS Academic Search: “Clone Detection”

Research typically focuses and evaluates on intermediate steps (e.g., clone detection) instead of ultimate tasks (e.g., bug detection or refactoring), even when the field has already grown mature with n years of effort on intermediate steps

Page 61:

Some Success Stories of Applying Clone Detection [Focus on Ultimate Tasks]

Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004.

MSRA XIAO

Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012.

http://patterninsight.com/

http://www.blackducksoftware.com/

http://research.microsoft.com/en-us/groups/sa/

Page 62:

Suggested Actions Tech Adoption

• Get research problems from real practice

• Get feedback from real practice

• Collaborate across disciplines

• Collaborate with industry

Page 63:

• Software Analytics

– Data Exploration and Analysis

– For Software Practitioners

– Obtain Insightful and Actionable info

– With Analytic Techniques

• Producing Impact on Practice

Page 64:

Acknowledgments

• Microsoft Research Asia Software Analytics Group

• Ahmed Hassan, Lin Tan, Jian Pei

• Many other colleagues

Page 65:

Q&A

Page 66:

• Software Analytics

– Data Exploration and Analysis

– For Software Practitioners

– Obtain Insightful and Actionable info

– With Analytic Techniques

• Producing Impact on Practice