reducing redundancies in multi-revision code analysis

65
Carol V. Alexandru, Sebastiano Panichella, Harald C. Gall Software Evolution and Architecture Lab University of Zurich, Switzerland {alexandru,panichella,gall}@ifi.uzh.ch 22.02.2017 SANER’17 Klagenfurt, Austria

Upload: sebastiano-panichella

Post on 03-Mar-2017

88 views

Category:

Presentations & Public Speaking


0 download

TRANSCRIPT

Page 1: Reducing Redundancies in Multi-Revision Code Analysis

Carol V. Alexandru, Sebastiano Panichella, Harald C. GallSoftware Evolution and Architecture Lab

University of Zurich, Switzerland{alexandru,panichella,gall}@ifi.uzh.ch

22.02.2017

SANER’17

Klagenfurt, Austria

Page 2: Reducing Redundancies in Multi-Revision Code Analysis

The Problem Domain

• Static analysis (e.g. #Attr., McCabe, coupling...)

1

Page 3: Reducing Redundancies in Multi-Revision Code Analysis

The Problem Domain

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

2

• Static analysis (e.g. #Attr., McCabe, coupling...)

Page 4: Reducing Redundancies in Multi-Revision Code Analysis

The Problem Domain

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

2

• Static analysis (e.g. #Attr., McCabe, coupling...)

• Many revisions, fine-grained historical data

Page 5: Reducing Redundancies in Multi-Revision Code Analysis

A Typical Analysis Process

3

www

clone

select project

Page 6: Reducing Redundancies in Multi-Revision Code Analysis

A Typical Analysis Process

www

clone

checkout

select project

select revision

3

Page 7: Reducing Redundancies in Multi-Revision Code Analysis

A Typical Analysis Process

www

clone

checkout analysis tool

Res

select project

select revision

applytool

storeanalysis

results

3

Page 8: Reducing Redundancies in Multi-Revision Code Analysis

A Typical Analysis Process

www

clone

checkout analysis tool

Res

select project

select revision

applytool

storeanalysis

results

more revisions?

3

Page 9: Reducing Redundancies in Multi-Revision Code Analysis

A Typical Analysis Process

www

clone

checkout analysis tool

Res

select project

select revision

applytool

storeanalysis

results

more revisions?

more projects?

3

Page 10: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

4

Redundancies in historical code analysis

Impact on Code Study Tools

Page 11: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

Redundancies in historical code analysis

Few files change

Only small parts of a file change

Across RevisionsImpact on Code

Study Tools

4

Page 12: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

Redundancies in historical code analysis

Few files change

Only small parts of a file change

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

4

Page 13: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

Redundancies in historical code analysis

Few files change

Only small parts of a file change

Changes may not even affect results

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

4

Page 14: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

4

Redundancies in historical code analysis

Few files change

Across Languages

Only small parts of a file change

Changes may not even affect results

Each language has their own toolchain

Yet they share many metrics

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

Page 15: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

Redundancies in historical code analysis

Few files change

Across Languages

Only small parts of a file change

Changes may not even affect results

Each language has their own toolchain

Yet they share many metrics

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

Re-implementing identical analyses

Generalizability is expensive

4

Page 16: Reducing Redundancies in Multi-Revision Code Analysis

Redundancies all over...

Redundancies in historical code analysis

Few files change

Across Languages

Only small parts of a file change

Changes may not even affect results

Each language has their own toolchain

Yet they share many metrics

Across RevisionsImpact on Code

Study Tools

Repeated analysis of "known" code

Storing redundant results

Re-implementing identical analyses

Generalizability is expensive

5

Page 17: Reducing Redundancies in Multi-Revision Code Analysis

Important!

6

Techniques implemented

in LISA

Your favouriteanalysis features

Pick what you like!

Page 18: Reducing Redundancies in Multi-Revision Code Analysis

#1: Avoid Checkouts

Page 19: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

7

clone

Page 20: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

7

clone

checkout

read write

Page 21: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

7

clone

checkout

read

read write

analyze

Page 22: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

7

clone

checkout

read

For every file: 2 read ops + 1 write opCheckout includes irrelevant filesNeed 1 CWD for every revision to be analyzed in parallel

read write

analyze

Page 23: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

8

clone analyze

read

Page 24: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

8

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

read

Page 25: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

8

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Git

Analysis Tool

File Abstraction Layer

read

Page 26: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

8

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Git

Analysis Tool

File Abstraction Layer

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {override def getCharContent(): CharSequence = code

}

read

Page 27: Reducing Redundancies in Multi-Revision Code Analysis

Avoid checkouts

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Git

Analysis Tool

File Abstraction Layer

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {override def getCharContent(): CharSequence = code

}

read

9

Page 28: Reducing Redundancies in Multi-Revision Code Analysis

#2: Use a multi-revision representationof your sources

Page 29: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

10

Page 30: Reducing Redundancies in Multi-Revision Code Analysis

rev. 1

rev. 2

rev. 3

rev. 4

11

rev. 1

Merge ASTs

Page 31: Reducing Redundancies in Multi-Revision Code Analysis

rev. 1

rev. 2

rev. 3

rev. 4

12

rev. 2

Merge ASTs

Page 32: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

13

rev. 3

Page 33: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

14

rev. 4

Page 34: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

15

Page 35: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

16

rev. range [1-4]

rev. range [1-2]

Page 36: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

17

Page 37: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

18

Page 38: Reducing Redundancies in Multi-Revision Code Analysis

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

19

Page 39: Reducing Redundancies in Multi-Revision Code Analysis

#3: Store AST nodes only ifthey're needed for analysis

Page 40: Reducing Redundancies in Multi-Revision Code Analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

What's the complexity (1+#forks) and name for each method and class?

20

Page 41: Reducing Redundancies in Multi-Revision Code Analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

140 AST nodes(using ANTLR)

parse

What's the complexity (1+#forks) and name for each method and class?

20

Page 42: Reducing Redundancies in Multi-Revision Code Analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

140 AST nodes(using ANTLR)

parseCompilationUnit

TypeDeclaration

Modifiers

public

Members

Method

Modifiers

public

Name

run

Name

Demo

Parameters ReturnType

PrimitiveType

VOID

Body

Statements

...

...

What's the complexity (1+#forks) and name for each method and class?

20

Page 43: Reducing Redundancies in Multi-Revision Code Analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

What's the complexity (1+#forks) and name for each method and class?

140 AST nodes(using ANTLR)

parse filtered parse

TypeDeclaration

Method

Name

run

Name

DemoForStatement

IfStatement

ConditionalExpression

7 AST nodes(using ANTLR)

21

Page 44: Reducing Redundancies in Multi-Revision Code Analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

What's the complexity (1+#forks) and name for eachmethod and class?

140 AST nodes(using ANTLR)

parse filtered parse

TypeDeclaration

Method

Name

run

Name

DemoForStatement

IfStatement

ConditionalExpression

7 AST nodes(using ANTLR)

22

Page 45: Reducing Redundancies in Multi-Revision Code Analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

}

}

}

What's the complexity (1+#forks) and name for eachmethod and class?

140 AST nodes(using ANTLR)

parse filtered parse

TypeDeclaration

Method

Name

run

Name

DemoForStatement

IfStatement

ConditionalExpression

7 AST nodes(using ANTLR)

23

Page 46: Reducing Redundancies in Multi-Revision Code Analysis

#4: Use non-duplicative data structuresto store your results

Page 47: Reducing Redundancies in Multi-Revision Code Analysis

rev. 1

rev. 2

rev. 3

rev. 4

24

Page 48: Reducing Redundancies in Multi-Revision Code Analysis

rev. 1

rev. 2

rev. 3

rev. 4

24

Page 49: Reducing Redundancies in Multi-Revision Code Analysis

rev. 1

rev. 2

rev. 3

rev. 4

24

[1-1]

label

#attr

mcc

[4-4]

label

#attr

mcc

InnerClass

0 4

1 2 4

[2-3]

label

#attr

mcc

Page 50: Reducing Redundancies in Multi-Revision Code Analysis

rev. 1

rev. 2

rev. 3

rev. 4 [1-1]

label

#attr

mcc

[4-4]

label

#attr

mcc

InnerClass

0 4

1 2 4

[2-3]

label

#attr

mcc

25

Page 51: Reducing Redundancies in Multi-Revision Code Analysis

LISA also does:#5: Parallel Parsing#6: Asynchronous graph computation#7: Generic graph computationsapplying to ASTs from compatiblelanguages

26

Page 52: Reducing Redundancies in Multi-Revision Code Analysis

To Summarize...

Page 53: Reducing Redundancies in Multi-Revision Code Analysis

The LISA Analysis Process

27

www

clone

select project

Page 54: Reducing Redundancies in Multi-Revision Code Analysis

The LISA Analysis Process

27

www

clone

parallel parseinto mergedgraph

select project

Language Mappings(Grammar to Analysis)

determines which ASTnodes are loaded

ANTLRv4Grammar used by

Generates Parser

Page 55: Reducing Redundancies in Multi-Revision Code Analysis

The LISA Analysis Process

27

www

clone

parallel parseinto mergedgraph

Res

select project

storeanalysis

results

Analysis formulated as Graph Computation

Language Mappings(Grammar to Analysis) used by

Async. compute

determines which ASTnodes are loaded

determines whichdata is persisted

ANTLRv4Grammar used by

Generates Parser

runson graph

Page 56: Reducing Redundancies in Multi-Revision Code Analysis

The LISA Analysis Process

27

www

clone

parallel parseinto mergedgraph

Res

select project

storeanalysis

results

more projects?

Analysis formulated as Graph Computation

Language Mappings(Grammar to Analysis) used by

Async. compute

determines which ASTnodes are loaded

determines whichdata is persisted

ANTLRv4Grammar used by

Generates Parser

runson graph

Page 57: Reducing Redundancies in Multi-Revision Code Analysis

How well does it work, then?

Page 58: Reducing Redundancies in Multi-Revision Code Analysis

Marginal cost for +1 revision

41.670

4.6330.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033

0

5

10

15

20

25

30

35

40

45

50

1 10 100 1000 2000 3000 4000 5000 6000 7000

Average Parsing+Computation time per Revision when analyzing n revisions of AspectJ (10 common metrics)

Seco

nd

s

# of revisions

28

Page 59: Reducing Redundancies in Multi-Revision Code Analysis

Overall Performance Stats

29

Language Java C# JavScript

#Projects 100 100 100

#Revisions 646'261 489'764 204'301

#Files (parsed!) 3'235'852 3'234'178 507'612

#Lines (parsed!) 1'370'998'072 961'974'773 194'758'719

Total Runtime (RT)¹ 18:43h 52:12h 29:09h

Median RT¹ 2:15min 4:54min 3:43min

Tot. Avg. RT per Rev.² 84ms 401ms 531ms

Med. Avg. RT per Rev.² 30ms 116ms 166ms

¹ Including cloning and persisting results² Excluding cloning and persisting results

Page 60: Reducing Redundancies in Multi-Revision Code Analysis

What's the catch?(There are a few...)

Page 61: Reducing Redundancies in Multi-Revision Code Analysis

The (not so) minor stuff

• Must implement analyses from scratch

• No help from a compiler

• Non-file-local analyses need some effort

30

Page 62: Reducing Redundancies in Multi-Revision Code Analysis

The (not so) minor stuff

• Must implement analyses from scratch

• No help from a compiler

• Non-file-local analyses need some effort

• Moved files/methods etc. add overhead

• Uniquely identifying files/entities is hard

• (No impact on results, though)

30

Page 63: Reducing Redundancies in Multi-Revision Code Analysis

Language matters

31

Javascript C# Java

E.g.: Javascript takes longer because:• Larger files, less modularization• Slower parser (automatic semicolon-insertion)

Page 64: Reducing Redundancies in Multi-Revision Code Analysis

LISA is E X TREME

32

complexfeature-richheavyweight

simplegeneric

lightweight

Page 65: Reducing Redundancies in Multi-Revision Code Analysis

Thank you for your attention

SANER ‘17, Klagenfurt, 22.02.2017

Read the paper: http://t.uzh.ch/Fj

Try the tool: http://t.uzh.ch/Fk

Get the slides: http://t.uzh.ch/Fm

Contact me: [email protected]