
Post on 02-Jan-2016


Page 1: Mining Software Data: Code Tao Xie University of Illinois at Urbana-Champaign taoxie/ taoxie@illinois.edu

Mining Software Data: Code

Tao Xie
University of Illinois at Urbana-Champaign

http://web.engr.illinois.edu/~taoxie/

taoxie@illinois.edu

Page 2

Mining Software Engineering Data

MAIN GOAL
– Transform static record-keeping SE data to active data
– Make SE data actionable by uncovering hidden patterns and trends

[Figure: SE data sources: Mailings, Bugzilla, Code repository, Execution traces, CVS]

Page 3

Mining Software Engineering Data

[Figure: software engineering data (code bases, change history, program states, structural entities, bug reports/NL) → data mining techniques → software engineering tasks (programming, defect detection, testing, debugging, maintenance)]

https://sites.google.com/site/asergrp/dmse


Page 5

Motivation

Programmers commonly reuse APIs of existing frameworks or libraries

– Advantages: High productivity of development
– Challenges: Complexity and lack of documentation
– Consequences:
  • Spend more effort on understanding APIs
  • Introduce defects in API client code
– Solution: Mining API properties as common patterns across API client code

Page 6

Solution-Driven: Where can I apply X miner?
Problem-Driven: What patterns do we really need?

– Basic mining algorithms: e.g., association rule mining, frequent itemset mining…
– Advanced mining algorithms: e.g., frequent partial order mining [ESEC/FSE 07]
– New/adapted mining algorithms: e.g., [ICSE 09], [ASE 09]

Page 7

Mining vs. Searching + Mining

Traditional approaches (mining): mine patterns from code repositories 1, 2, …, N, e.g., Eclipse, Linux, …
– Often lack sufficient relevant data points (e.g., API call sites)

Our new approaches (searching + mining): search open source code on the web with a code search engine, then mine patterns from the results

Page 8

Agenda

– Motivation
– Mining Sequence Association Rules (CAR-Miner) [ICSE 09]
  • Detecting Exception-Handling Defects
– Mining Alternative Patterns (Alattin) [ASE 09]
  • Detecting Neglected Condition Defects
– Conclusion

Page 9

Exception Handling

APIs throw exceptions when runtime errors occur
– Example: Session API of Hibernate framework throws HibernateException

APIs expect client applications to implement recovery actions after exceptions occur
– Example: Hibernate Session API expects the client application to roll back open uncommitted transactions after a HibernateException occurs

Failure to handle exceptions results in
– Fatal issues, e.g., a database lock won't be released if the transaction is not rolled back

Page 10

Problem Addressed by CAR-Miner

Use exception-handling specifications to detect violations as defects

Problem: Specifications are often not documented
Solution: Mine specifications from existing API client code
Challenges:
– Limited data points: Only from a few code bases (addressed by searching + mining)
– Limited expressiveness: Not sufficient to characterize common exception-handling behaviors: why?

Page 11

Example

Scenario 1

1.1:  ...
1.2:  OracleDataSource ods = null; Session session = null; Connection conn = null; Statement statement = null;
1.3:  logger.debug("Starting update");
1.4:  try {
1.5:    ods = new OracleDataSource();
1.6:    ods.setURL("jdbc:oracle:thin:scott/tiger@192.168.1.2:1521:catfish");
1.7:    conn = ods.getConnection();
1.8:    statement = conn.createStatement();
1.9:    statement.executeUpdate("DELETE FROM table1");
1.10:   conn.commit(); }
1.11: catch (SQLException se) {
1.13:   logger.error("Exception occurred"); }
1.14: finally {
1.15:   if(statement != null) { statement.close(); }
1.16:   if(conn != null) { conn.close(); }
1.17:   if(ods != null) { ods.close(); } }
1.18: }

Defect: No rollback done when SQLException occurs (missing "conn.rollback()")

Requires a specification such as "Connection should be rolled back when a connection is created and SQLException occurs"

Q: Does every connection instance have to be rolled back when SQLException occurs?

Page 12

Example (cont.)

Scenario 1 (with the rollback added at 1.12)

1.1:  ...
1.2:  OracleDataSource ods = null; Session session = null; Connection conn = null; Statement statement = null;
1.3:  logger.debug("Starting update");
1.4:  try {
1.5:    ods = new OracleDataSource();
1.6:    ods.setURL("jdbc:oracle:thin:scott/tiger@192.168.1.2:1521:catfish");
1.7:    conn = ods.getConnection();
1.8:    statement = conn.createStatement();
1.9:    statement.executeUpdate("DELETE FROM table1");
1.10:   conn.commit(); }
1.11: catch (SQLException se) {
1.12:   if (conn != null) { conn.rollback(); }
1.13:   logger.error("Exception occurred"); }
1.14: finally {
1.15:   if(statement != null) { statement.close(); }
1.16:   if(conn != null) { conn.close(); }
1.17:   if(ods != null) { ods.close(); } }
1.18: }

Scenario 2

2.1:  Connection conn = null;
2.2:  Statement stmt = null;
2.3:  BufferedWriter bw = null; FileWriter fw = null;
2.3:  try {
2.4:    fw = new FileWriter("output.txt");
2.5:    bw = new BufferedWriter(fw);
2.6:    conn = DriverManager.getConnection("jdbc:pl:db", "ps", "ps");
2.7:    stmt = conn.createStatement();
2.8:    ResultSet res = stmt.executeQuery("SELECT Path FROM Files");
2.9:    while (res.next()) {
2.10:     bw.write(res.getString(1));
2.11:   }
2.12:   res.close();
2.13: } catch(IOException ex) { logger.error("IOException occurred");
2.14: } finally {
2.15:   if(stmt != null) stmt.close();
2.16:   if(conn != null) conn.close();
2.17:   if (bw != null) bw.close();
2.18: }

Specification: "Connection creation => Connection rollback"
– Satisfied by Scenario 1 but not by Scenario 2
– But Scenario 2 has no defect

Pages 13-16

Example (cont.)

Simple association rules of the form "FCa => FCe" are not expressive enough

Requires more general association rules (sequence association rules) such as

(FCc1 FCc2) ∧ FCa => FCe1, where

FCc1 -> Connection conn = OracleDataSource.getConnection()   //Context
FCc2 -> Statement stmt = conn.createStatement()              //Context
FCa  -> stmt.executeUpdate()                                 //Triggering Action
FCe1 -> conn.rollback()                                      //Recovery Action

Page 17

CAR-Miner Approach

Input: an application. Goal: check whether there are any exception-related defects.

1. Extract classes and functions reused by the input application
2. Issue queries and collect relevant code examples from open source projects on the web, e.g., "lang:java java.sql.Statement executeUpdate"
3. Construct exception-flow graphs
4. Collect static traces
5. Mine static traces (sequence association rules)
6. Detect violations
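The query step can be sketched in a few lines of Java. This is only an illustration of the query format quoted on the slide; the class name `QueryBuilder` and its methods are hypothetical, and the actual tool may build its queries differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns reused API methods into code-search queries.
class QueryBuilder {
    // Mirrors the slide's example format: "lang:java <qualified class> <method>".
    static String buildQuery(String qualifiedClass, String method) {
        return "lang:java " + qualifiedClass + " " + method;
    }

    // Each entry is {qualifiedClass, method}; one query is produced per entry.
    static List<String> buildQueries(List<String[]> reusedMethods) {
        List<String> queries = new ArrayList<>();
        for (String[] m : reusedMethods) {
            queries.add(buildQuery(m[0], m[1]));
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String[]> reused = new ArrayList<>();
        reused.add(new String[] {"java.sql.Statement", "executeUpdate"});
        reused.add(new String[] {"java.util.Iterator", "next"});
        for (String q : buildQueries(reused)) {
            System.out.println(q);
        }
    }
}
```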

Page 18

CAR-Miner Approach

[Figure: pipeline overview: input application → classes and functions → open source projects on web (1, 2, …, N) → exception-flow graphs → static traces → sequence association rules → violations]

Page 19

Exception-Flow-Graph Construction

Based on a previous algorithm [Sinha&Harrold TSE 00]

[Figure: exception-flow graph; solid edges: normal execution path; dashed edges (----): exceptional execution path]

Page 20

Exception-Flow-Graph Construction

Prevent infeasible edges using a sound static analysis [Robillard&Murphy FSE 99]

Page 21

CAR-Miner Approach

[Figure: pipeline overview: input application → classes and methods → open source projects on web (1, 2, …, N) → exception-flow graphs → static traces → sequence association rules → violations]

Page 22

Static Trace Generation

Collect static traces with the actions taken when exceptions occur

A static trace for Node 7: "4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17"

Page 23

Static Trace Generation

Includes 3 sections:
– Normal function-call sequence (4 -> 5 -> 6)
– Function call (7)
– Exception function-call sequence (15 -> 16 -> 17)

A static trace for Node 7: "4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17"
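A minimal Java sketch of the three-section split, using the node numbers from the trace above. The class `StaticTrace` and its field names are hypothetical and only illustrate the data layout, not how CAR-Miner stores traces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for the three sections of a static trace.
class StaticTrace {
    final List<Integer> normal;       // calls before the triggering call
    final int trigger;                // the function call that may throw
    final List<Integer> exceptional;  // calls on the exception path

    StaticTrace(List<Integer> normal, int trigger, List<Integer> exceptional) {
        this.normal = normal;
        this.trigger = trigger;
        this.exceptional = exceptional;
    }

    // Splits a flat trace at the first occurrence of the trigger node.
    static StaticTrace split(List<Integer> trace, int triggerNode) {
        int i = trace.indexOf(triggerNode);
        if (i < 0) {
            throw new IllegalArgumentException("trigger node not in trace");
        }
        return new StaticTrace(
                new ArrayList<>(trace.subList(0, i)),
                triggerNode,
                new ArrayList<>(trace.subList(i + 1, trace.size())));
    }

    public static void main(String[] args) {
        StaticTrace t = split(List.of(4, 5, 6, 7, 15, 16, 17), 7);
        System.out.println(t.normal + " | " + t.trigger + " | " + t.exceptional);
    }
}
```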

Page 24

Trace Post-Processing

Identify and remove unrelated function calls using data dependency

"4 -> 5 -> 6 -> 7 -> 15 -> 16 -> 17"

4: FileWriter fw = new FileWriter("output.txt")

5: BufferedWriter bw = new BufferedWriter(fw)

...

7: Statement stmt = conn.createStatement()

...

Filtered sequence: "6 -> 7 -> 15 -> 16"
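The filtering can be approximated as follows. This sketch assumes that data dependency is modeled simply as "shares a variable, directly or transitively, with the triggering call"; that simplification, the class `TraceFilter`, and the per-node variable sets (read off the Scenario 2 listing) are assumptions for illustration rather than the analysis CAR-Miner actually performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TraceFilter {
    // One call in a static trace plus the variables it defines or uses.
    static class Call {
        final int node;
        final Set<String> vars;
        Call(int node, String... vars) {
            this.node = node;
            this.vars = new HashSet<>(Arrays.asList(vars));
        }
    }

    // Keeps only calls whose variables are connected to the trigger's variables.
    static List<Integer> filter(List<Call> trace, int triggerNode) {
        Set<String> related = new HashSet<>();
        for (Call c : trace) {
            if (c.node == triggerNode) related.addAll(c.vars);
        }
        boolean grew = true;
        while (grew) { // fixed point: pull in variables of every connected call
            grew = false;
            for (Call c : trace) {
                if (!Collections.disjoint(related, c.vars) && related.addAll(c.vars)) {
                    grew = true;
                }
            }
        }
        List<Integer> kept = new ArrayList<>();
        for (Call c : trace) {
            if (!Collections.disjoint(related, c.vars)) kept.add(c.node);
        }
        return kept;
    }

    // Scenario 2 trace; variable sets are read off the listing by hand.
    static List<Call> scenario2Trace() {
        return Arrays.asList(
                new Call(4, "fw"), new Call(5, "bw", "fw"), new Call(6, "conn"),
                new Call(7, "stmt", "conn"), new Call(15, "stmt"),
                new Call(16, "conn"), new Call(17, "bw"));
    }

    public static void main(String[] args) {
        System.out.println(filter(scenario2Trace(), 7)); // prints [6, 7, 15, 16]
    }
}
```

With node 7 as the trigger, the file-writer calls (4, 5, 17) share no variable with `stmt` or `conn` and are dropped, which reproduces the filtered sequence on the slide.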

Page 25

CAR-Miner Approach

[Figure: pipeline overview: input application → classes and methods → open source projects on web (1, 2, …, N) → exception-flow graphs → static traces → sequence association rules → violations]

Page 26

Static Trace Mining

Handle traces of each function call (triggering function call) individually

Input: Two sequence databases with a one-to-one mapping
• normal function-call sequences (context)
• exception function-call sequences (recovery)

Objective: Generate sequence association rules of the form

(FCc1 ... FCcn) ∧ FCa => FCe1 ... FCen
(Context: FCc1 ... FCcn; Trigger: FCa; Recovery: FCe1 ... FCen)

Page 27

Mining Problem Definition

Input: Two sequence databases with a one-to-one mapping

Objective: To get association rules of the form

FC1 FC2 ... FCm -> FE1 FE2 ... FEn
(Context: FC1 ... FCm; Recovery: FE1 ... FEn)

where {FC1, FC2, ..., FCm} ∈ SDB1 and {FE1, FE2, ..., FEn} ∈ SDB2

Existing association rule mining algorithms cannot be directly applied to multiple sequence databases

Page 28

Mining Problem Solution

– Annotate the sequences to generate a single combined database
– Apply frequent subsequence mining algorithm [Wang and Han, ICDE 04] to get frequent sequences
– Transform mined sequences into sequence association rules
– Rank rules based on the support assigned by frequent subsequence mining algorithm

Example rule: (3 10) ∧ FCa => (2 8)
(Context: 3 10; Trigger: FCa; Recovery: 2 8)
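The annotate-and-combine idea can be sketched as below. This is not the frequent subsequence miner of [Wang and Han, ICDE 04]; it only shows one plausible annotation scheme (the `C:` and `R:` prefixes are an assumption), how the support of a single candidate sequence would be counted over the combined database, and how a mined sequence is split back into a rule. The class `CombinedDb` and the sample IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CombinedDb {
    // Tags context and recovery items so both fit into one sequence.
    static List<String> annotate(List<String> context, List<String> recovery) {
        List<String> out = new ArrayList<>();
        for (String c : context) out.add("C:" + c);
        for (String r : recovery) out.add("R:" + r);
        return out;
    }

    // True if pattern occurs in seq in order (gaps allowed).
    static boolean isSubsequence(List<String> pattern, List<String> seq) {
        int i = 0;
        for (String s : seq) {
            if (i < pattern.size() && pattern.get(i).equals(s)) i++;
        }
        return i == pattern.size();
    }

    // Fraction of combined sequences that contain the pattern.
    static double support(List<String> pattern, List<List<String>> db) {
        if (db.isEmpty()) return 0.0;
        int hits = 0;
        for (List<String> seq : db) {
            if (isSubsequence(pattern, seq)) hits++;
        }
        return (double) hits / db.size();
    }

    // Splits an annotated sequence back into "(context) AND FCa => (recovery)".
    static String toRule(List<String> pattern) {
        List<String> ctx = new ArrayList<>();
        List<String> rec = new ArrayList<>();
        for (String p : pattern) {
            if (p.startsWith("C:")) ctx.add(p.substring(2));
            else if (p.startsWith("R:")) rec.add(p.substring(2));
        }
        return "(" + String.join(" ", ctx) + ") AND FCa => (" + String.join(" ", rec) + ")";
    }

    // Three made-up traces for one triggering call; items are function-call IDs.
    static List<List<String>> exampleDb() {
        List<List<String>> db = new ArrayList<>();
        db.add(annotate(List.of("1", "3", "10"), List.of("2", "8")));
        db.add(annotate(List.of("3", "10"), List.of("2", "5", "8")));
        db.add(annotate(List.of("3", "7"), List.of("8")));
        return db;
    }

    public static void main(String[] args) {
        List<String> candidate = List.of("C:3", "C:10", "R:2", "R:8");
        System.out.println(toRule(candidate) + " support=" + support(candidate, exampleDb()));
    }
}
```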

Page 29

CAR-Miner Approach

[Figure: pipeline overview: input application → classes and methods → open source projects on web (1, 2, …, N) → exception-flow graphs → static traces → sequence association rules → violations]

Page 30

Violation Detection

Analyze each call site of triggering call FCa

Step 1: Extract context call sequence "CC1 CC2 ... CCm" from the beginning of the function to the call site of FCa

Step 2: If CC1 CC2 ... CCm is a super-sequence of FCc1 ... FCcn, report any missing function calls of {FCe1 ... FCen} in any exception path

API client: (CC1 CC2 ... CCm) ∧ FCa => Missing any?
API rule: (FCc1 ... FCcn) ∧ FCa => FCe1 ... FCen
(The client context is compared with the rule context via isSuperSeqOf. Context: FCc1 ... FCcn; Trigger: FCa; Recovery: FCe1 ... FCen)
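The two steps can be sketched in Java as follows. The sketch makes simplifying assumptions: calls are plain strings, a single exception path is checked, and recovery calls are matched by membership rather than by order. The class `ViolationDetector` and the call names used in the example are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class ViolationDetector {
    // True if sub occurs in seq in order (gaps allowed).
    static boolean isSuperSeqOf(List<String> seq, List<String> sub) {
        int i = 0;
        for (String s : seq) {
            if (i < sub.size() && sub.get(i).equals(s)) i++;
        }
        return i == sub.size();
    }

    // Returns the recovery calls of the rule that the client's exception path lacks.
    // An empty result means the rule does not apply or is satisfied.
    static List<String> missingRecovery(List<String> clientContext,
                                        List<String> ruleContext,
                                        List<String> ruleRecovery,
                                        List<String> clientExceptionPath) {
        List<String> missing = new ArrayList<>();
        if (!isSuperSeqOf(clientContext, ruleContext)) {
            return missing; // Step 2 precondition fails: rule not applicable here
        }
        for (String r : ruleRecovery) {
            if (!clientExceptionPath.contains(r)) missing.add(r);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Scenario 1 without the rollback, written as Type.method names.
        List<String> context = List.of("OracleDataSource.setURL",
                "OracleDataSource.getConnection", "Connection.createStatement");
        List<String> exceptionPath = List.of("Logger.error", "Statement.close",
                "Connection.close", "OracleDataSource.close");
        System.out.println(missingRecovery(context,
                List.of("OracleDataSource.getConnection", "Connection.createStatement"),
                List.of("Connection.rollback"), exceptionPath));
    }
}
```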

Page 31

Evaluation

Research Questions:

1. Do the mined rules represent real rules?
2. Do the detected violations represent real defects?
3. Does CAR-Miner perform better than WN-miner [Weimer and Necula, TACAS 05]?
4. Do the sequence association rules help detect new defects?

Page 32

Subjects

– Internal Info: classes and methods belonging to the app
– External Info: classes and methods used by the app
– Code examples: #files collected through code search engine

Page 33

RQ1: Real Rules

Do the mined rules represent real rules?

– Real rules: 55% (Total: 294)
– Usage patterns: 3%
– False positives: 43%

Page 34

RQ1: Distribution of Real Rules for Axion

[Figure: distribution of rules based on ranks assigned by CAR-Miner]

The number of false positives is quite low among the rules ranked 1 to 60

Page 35

RQ2: Detected Violations

Do the detected violations represent real defects?

– Total number of defects: 160
– New defects not found by WN-Miner approach: 87

Page 36

RQ2: Status of Detected Violations

HsqlDB developers responded on the first 10 reported defects
– Accepted 7 defects
– Rejected 3 defects

Reason given by HsqlDB developers for rejected defects: "Although it can throw exceptions in general, it should not throw with HsqlDB, So it is fine"

Page 37

RQ3: Comparison with WN-miner

Does CAR-Miner perform better than WN-miner?

– Found 224 new rules and missed 32 rules
– CAR-Miner detected most of the rules mined by WN-miner
– Two major factors:
  • Sequence association rules
  • Increase in the data scope

Page 38

RQ4: New defects by sequence association rules

Do the sequence association rules detect new defects?

Detected 21 new real defects among all applications

Page 39

Agenda

– Motivation
– Mining Sequence Association Rules (CAR-Miner) [ICSE 09]
  • Detecting Exception-Handling Defects
– Mining Alternative Patterns (Alattin) [ASE 09]
  • Detecting Neglected Condition Defects
– Conclusion

Page 40

Large Number of False Positives

Existing approaches produce a large number of false positives

One major observation:
– Programmers often write code in different ways for achieving the same task
– Some ways are more frequent than others

[Figure: frequent ways → (mine patterns) → mined patterns → (detect violations) → infrequent ways]

Page 41

Example: java.util.Iterator.next()

Code Sample 1

PrintEntries1(ArrayList<String> entries)
{
  …
  Iterator it = entries.iterator();
  if(it.hasNext()) {
    String last = (String) it.next();
  }
  …
}

Code Sample 2

PrintEntries2(ArrayList<String> entries)
{
  …
  if(entries.size() > 0) {
    Iterator it = entries.iterator();
    String last = (String) it.next();
  }
  …
}

java.util.Iterator.next() throws NoSuchElementException when invoked on a list without any elements

Page 42

Example: java.util.Iterator.next()

1243 code examples
– Sample 1 (1218 / 1243)
– Sample 2 (6 / 1243)

Mined pattern from existing approaches: "boolean check on return of Iterator.hasNext before Iterator.next"

Page 43

Example: java.util.Iterator.next()

Require more general patterns (alternative patterns): P1 or P2

P1: boolean check on return of Iterator.hasNext before Iterator.next
P2: boolean check on return of ArrayList.size before Iterator.next

Cannot be mined by existing approaches, since alternative P2 is infrequent

Page 44

Our Solution: ImMiner Algorithm

Mines alternative patterns of the form P1 or P2

Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1

1243 code examples
– Sample 1 (1218 / 1243)
– Sample 2 (6 / 1243)

P2 is infrequent among the entire 1243 code examples

P2 is frequent among code examples not supporting P1

Page 45

Alternative Patterns

ImMiner mines three kinds of alternative patterns of the general form "P1 or P2"

– Balanced: all alternatives (both P1 and P2) are frequent
– Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent. Represented as "P1 or P̂2"
– Single: only one alternative

Page 46

ImMiner Algorithm

Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively

An input database with the following APIs for Iterator.next()

[Tables: input database; mapping of IDs to APIs]

Page 47

ImMiner Algorithm: Frequent Alternatives

Input database → frequent itemset mining (min_sup 0.5)

Frequent item: 1
P1: boolean check on the return of Iterator.hasNext() before Iterator.next()

Page 48

ImMiner: Infrequent Alternatives of P1

Split input database into two databases: Positive database (PSD) and Negative database (NSD)

Mine patterns that are frequent in NSD and are infrequent in PSD
– Reason: Only such patterns serve as alternatives for P1

Alternative pattern P2: "const check on the return of ArrayList.size() before Iterator.next()"

Alattin applies the ImMiner algorithm to detect neglected conditions
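A simplified Java sketch of the split-and-mine step. It assumes that PSD holds the code examples that support P1 and NSD holds the rest, and it counts single items instead of running a real frequent-itemset miner such as the one cited as [Burdick et al. ICDE 01]. The class `ImMinerSketch`, the item IDs (1 for the hasNext check, 2 for the size check, 3 for an unrelated check), and the ten-example database are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ImMinerSketch {
    // Fraction of transactions that contain the item.
    static double support(int item, List<Set<Integer>> db) {
        if (db.isEmpty()) return 0.0;
        int hits = 0;
        for (Set<Integer> t : db) {
            if (t.contains(item)) hits++;
        }
        return (double) hits / db.size();
    }

    // Items frequent in NSD (examples without p1) but infrequent in PSD.
    static List<Integer> infrequentAlternatives(List<Set<Integer>> db, int p1, double minSup) {
        List<Set<Integer>> psd = new ArrayList<>();
        List<Set<Integer>> nsd = new ArrayList<>();
        for (Set<Integer> t : db) {
            if (t.contains(p1)) psd.add(t); else nsd.add(t);
        }
        Set<Integer> candidates = new TreeSet<>();
        for (Set<Integer> t : nsd) candidates.addAll(t);
        List<Integer> alternatives = new ArrayList<>();
        for (int item : candidates) {
            if (support(item, nsd) >= minSup && support(item, psd) < minSup) {
                alternatives.add(item);
            }
        }
        return alternatives;
    }

    // Eight examples use check 1, two use check 2 instead.
    static List<Set<Integer>> exampleDb() {
        List<Set<Integer>> db = new ArrayList<>();
        for (int i = 0; i < 6; i++) db.add(Set.of(1));
        db.add(Set.of(1, 3));
        db.add(Set.of(1, 3));
        db.add(Set.of(2));
        db.add(Set.of(2));
        return db;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = exampleDb();
        System.out.println("overall support of item 2: " + support(2, db));
        System.out.println("alternatives of item 1: " + infrequentAlternatives(db, 1, 0.5));
    }
}
```

In this toy database item 2 has overall support 0.2, below min_sup 0.5, yet it appears in every example that lacks item 1, so it is reported as an alternative.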

Page 49

Neglected Conditions

Neglected conditions refer to
– Missing conditions that check the arguments or receiver of the API call before the API call
– Missing conditions that check the return or receiver of the API call after the API call

One of the primary reasons for many fatal issues, such as security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]

Page 50

Alattin Approach

Input: an application under analysis. Goal: detect neglected conditions.

– Extract classes and methods reused by the application under analysis
– Phase 1: Issue queries and collect relevant code samples from open source projects on the web, e.g., "lang:java java.util.Iterator next"
– Phase 2: Generate pattern candidates
– Phase 3: Mine alternative patterns
– Phase 4: Detect neglected conditions statically (violations)

Page 51

Evaluation

Research Questions:

1. Do alternative patterns exist in real applications?

2. What percentage of false positives is eliminated (with low or no increase in false negatives) in detected violations?

Page 52

Subjects

Two categories of subjects:
– 3 Java default API libraries
– 3 popular open source libraries

#Samples: number of code examples collected from Google code search

Page 53

RQ1: Balanced and Imbalanced Patterns

What percentage of balanced and imbalanced patterns exists in real apps?

– Balanced patterns: 0% to 30% (average: 9.69%)
– Imbalanced patterns:
  • 30% to 100% (average: 65%) for Java default API libraries
  • 0% to 9.5% (average: 5%) for open source libraries

Explanation: Java default API libraries provide more different ways of writing code compared to open source libraries

Page 54

RQ2: False Positives and False Negatives

What percentage of false positives is eliminated (with low or no increase in false negatives)?

Applied mined patterns ("P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj") in three modes:

– Existing mode: P1, P2, ..., Pi
– Balanced mode: "P1 or P2 or ... or Pi"
– Imbalanced mode: "P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj"

Page 55

RQ2: False Positives and False Negatives

Existing Mode vs Balanced Mode

Application        Existing Mode               Balanced Mode
                   Defects  False Positives    Defects  False Positives  % of reduction  False Negatives
Java Util          37       104                37       104              0               0
Java Transaction   51       105                51       105              0               0
Java SQL           56       143                56       90               37.06           0
BCEL               2        14                 2        8                42.86           0
HSqlDB             1        0                  1        0                0               0
Hibernate          10       9                  10       8                11.11           0
AVERAGE/TOTAL                                                            15.17           0

Balanced mode reduced false positives by 15.17% without any increase in false negatives
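The "% of reduction" column can be reproduced from the false-positive counts. The sketch below assumes that the reduction is (before - after) / before, that a row with zero false positives counts as 0%, and that the reported average is the unweighted mean of the six per-application percentages; those assumptions match the figures in the table. The class `FalsePositiveReduction` is hypothetical.

```java
class FalsePositiveReduction {
    // Percentage of false positives removed when going from 'before' to 'after'.
    static double reductionPercent(int before, int after) {
        if (before == 0) return 0.0; // nothing to reduce (the HSqlDB row)
        return 100.0 * (before - after) / before;
    }

    // Unweighted mean of the given values.
    static double average(double... values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return values.length == 0 ? 0.0 : sum / values.length;
    }

    public static void main(String[] args) {
        // False positives per application: {existing mode, balanced mode}.
        int[][] rows = {{104, 104}, {105, 105}, {143, 90}, {14, 8}, {0, 0}, {9, 8}};
        double[] reductions = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            reductions[i] = reductionPercent(rows[i][0], rows[i][1]);
            System.out.printf("%.2f%n", reductions[i]);
        }
        System.out.printf("average: %.2f%n", average(reductions));
    }
}
```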

Page 56

RQ2: False Positives and False Negatives

Existing Mode vs Imbalanced Mode

Application        Existing Mode               Imbalanced Mode
                   Defects  False Positives    Defects  False Positives  % of reduction  False Negatives
Java Util          37       104                36       74               28.85           1
Java Transaction   51       105                47       76               27.62           4
Java SQL           56       143                53       81               43.36           3
BCEL               2        14                 2        6                57.14           0
HSqlDB             1        0                  1        0                0               0
Hibernate          10       9                  10       8                11.11           0
AVERAGE/TOTAL                                                            28.01           8

Imbalanced mode reduced false positives by 28% with a quite small increase in false negatives

Page 57

Conclusion

Problem-driven methodology: identifying new problems, patterns, mining algorithms, defects

CAR-Miner [ICSE 09]: mining sequence association rules of the form
(FCc1 ... FCcn) ∧ FCa => (FCe1 ... FCen)
(Context: FCc1 ... FCcn; Trigger: FCa; Recovery: FCe1 ... FCen)
– reduce false negatives

Alattin [ASE 09]: mining alternative patterns classified into three categories: balanced, imbalanced, and single
P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj
– reduce false positives

Page 58

Thank You

Questions?

Tao Xie
University of Illinois at Urbana-Champaign

http://web.engr.illinois.edu/~taoxie/

taoxie@illinois.edu