
Post on 21-Jan-2018


Graph-Based Source Code Analysis of JavaScript Repositories

Budapest University of Technology and Economics
Department of Measurement and Information Systems

Fault Tolerant Systems Research Group

Dániel Stein, Gábor Szárnyas

Content

1. Context

2. Tooling

3. Use Cases

4. Neo4j Observations


Continuous Integration (CI)

– Developers working together

– Prevent integration problems

– Examples
  – Jenkins
  – Hudson
  – Travis CI

[Figure: CI workflow. Development → Version Control System → Compilation → Unit and Integration Tests]

[Figure, annotated "whoops". Source: Apple, https://blog.codecentric.de/en/2014/02/curly-braces/]

Static Analysis

– No need for compilation or execution of the application
– Formatting, structural and semantic rule checking
– Can extend and improve the continuous integration workflow
– In this research we used code analysis based on pattern matching

[Figure: CI workflow extended with a static analysis step. Development → Version Control System → Compilation → Unit and Integration Tests → Static Analysis]

– Java
  – FindBugs
  – PMD
  – CheckStyle
– JavaScript
  – ESLint
  – Facebook Infer, Flow
  – Tern
  – TAJS

Problems to Solve

– Thorough code analysis is time-consuming and resource-intensive
– For large projects it can be too slow
– Temporary solution: batching

[Figure: day and night timeline (☼ ☆☾☆) with the labels "unit tests" and "static analysis"]

Present results as soon and as fast as possible.

Problems to Solve

– Memory limits appear when...
  – Global rules are checked
  – Storing the structure in-memory
  – For large code repositories
– Not being incremental
  – Batched execution simply does not cut it
  – Small change induces complete recheck

Our Approach

– Incremental methodology
  – Instead of batched execution
– Update the prepared results with the effects of the change
– Only store the required parts in memory
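The incremental idea can be sketched in a few lines. This is a minimal illustration and not the framework's implementation: `analyze_file`, `IncrementalAnalyzer` and the content-hash cache are hypothetical names, and the "analysis" is a toy stand-in that counts `var` keywords.

```python
import hashlib


def analyze_file(source: str) -> int:
    """Toy stand-in for an expensive per-file analysis: count `var` keywords."""
    return source.split().count("var")


class IncrementalAnalyzer:
    """Caches per-file results and re-analyzes only files whose content changed."""

    def __init__(self):
        self._hashes = {}   # path -> content hash seen at the last update
        self._results = {}  # path -> cached analysis result
        self.analyzed = []  # paths re-analyzed by the most recent update

    def update(self, workspace: dict) -> dict:
        """Bring the cached results in line with `workspace` (path -> source text)."""
        self.analyzed = []
        # Drop the results of deleted files.
        for path in list(self._results):
            if path not in workspace:
                del self._results[path]
                del self._hashes[path]
        # Re-analyze only new or modified files.
        for path, source in workspace.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            if self._hashes.get(path) != digest:
                self._hashes[path] = digest
                self._results[path] = analyze_file(source)
                self.analyzed.append(path)
        return dict(self._results)


analyzer = IncrementalAnalyzer()
analyzer.update({"Main.js": "var a = 1;", "Parser.js": "var b = 2; var c = 3;"})
first_run = list(analyzer.analyzed)    # both files are new, both are analyzed
results = analyzer.update({"Main.js": "var a = 1;", "Parser.js": "let b = 2;"})
second_run = list(analyzer.analyzed)   # only the modified file is analyzed again
```

A batched run would call `analyze_file` on every file each time; here the second update touches only `Parser.js`.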

Architecture overview

[Figure: architecture overview. Column labels: VCS, Workspace, Abstract Syntax Tree, Abstract Semantic Graph, Well-formedness Rules, Query Execution, Database. Illustrations: a change summary (Main.js | ++----, Dependency.js | +++++-, FIterator.js | ----, Parser.js | ++); a workspace file tree (analyzer, discoverer, iterators, whitepages; ChangeProcessor.js, CommandParser.js, FileIterator.js, DepCollector.js, FileDiscoverer.js, InitIterator.js, Main.js, ConnectionMgr.js, DependencyMgr.js); an AST fragment (Module; edges declaration, declarators, items, binding, init, left, right); neo4j; Validation Report. Outputs: Automatic Well-formedness Rule Evaluation, Manual Execution and Data Extraction, Querying and Transformation.]

Code Processing Steps

[Figure: processing pipeline. source code → tokenizer → tokens → parser → AST → scope analyzer → ASG]

Source code: a sequence of statements formalized in a given language.

Token: the shortest character sequence that still has meaning.

Token        Token type
VAR          Keyword
IDENTIFIER   Ident
ASSIGN       Punctuator
NUMBER       NumericLiteral
DIV          Punctuator
NUMBER       NumericLiteral
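A tokenizer producing the token sequence listed above can be sketched with regular expressions. The input statement `var foo = 1 / 0` is an assumption inferred from the token list and the AST on the slides. The regular expressions are illustrative only, and this is not how the Shift tokenizer is implemented.

```python
import re

# (token name, token type, pattern), tried in this order.
TOKEN_SPEC = [
    ("VAR", "Keyword", r"\bvar\b"),
    ("NUMBER", "NumericLiteral", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", "Ident", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("ASSIGN", "Punctuator", r"="),
    ("DIV", "Punctuator", r"/"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, _, pattern in TOKEN_SPEC))
TOKEN_TYPE = {name: kind for name, kind, _ in TOKEN_SPEC}


def tokenize(source):
    """Return a list of (token name, token type, text) triples."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():  # whitespace separates tokens but is not a token
            pos += 1
            continue
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        name = match.lastgroup  # name of the alternative that matched
        tokens.append((name, TOKEN_TYPE[name], match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("var foo = 1 / 0")
for name, kind, text in tokens:
    print(f"{name:<11}{kind:<16}{text}")
```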

Abstract Syntax Tree (AST)

– Tree representation of the grammar structure of the sequence of tokens.

[Figure: AST of the example]

Module
  items: VariableDeclarationStatement
    declaration: VariableDeclaration
      declarators: VariableDeclarator
        binding: BindingIdentifier (name = `foo`)
        init: BinaryExpression (operator = `Div`)
          left: LiteralNumericExpression (value = 1.0)
          right: LiteralNumericExpression (value = 0.0)
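The AST of the running example can be written down as a nested data structure. The sketch below uses a hypothetical `Node` class, not the Shift API. The node types and edge labels are the ones shown on the slide, while the way the edges connect the nodes is reconstructed and should be read as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    props: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)  # edge label -> list of child nodes


def walk(node, edge=None, depth=0):
    """Yield (depth, incoming edge label, node) in pre-order."""
    yield depth, edge, node
    for label, kids in node.children.items():
        for kid in kids:
            yield from walk(kid, label, depth + 1)


ast = Node("Module", children={"items": [
    Node("VariableDeclarationStatement", children={"declaration": [
        Node("VariableDeclaration", children={"declarators": [
            Node("VariableDeclarator", children={
                "binding": [Node("BindingIdentifier", {"name": "foo"})],
                "init": [Node("BinaryExpression", {"operator": "Div"}, {
                    "left": [Node("LiteralNumericExpression", {"value": 1.0})],
                    "right": [Node("LiteralNumericExpression", {"value": 0.0})],
                })],
            }),
        ]}),
    ]}),
]})

# Print the tree with one line per node, indented by depth.
for depth, edge, node in walk(ast):
    label = f"{edge}: " if edge else ""
    print("  " * depth + label + node.type, node.props or "")

node_count = sum(1 for _ in walk(ast))
```

Every node has exactly one parent here, which is what makes the structure a tree.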

Abstract Semantic Graph (ASG)

– Graph, not necessarily a tree.
– Semantic information besides the syntactic structure.
– Contains cross-edges.

[Figure: the AST of the example extended with scope information. Added nodes: GlobalScope, Scope, Variable (name = `foo`), Reference (accessibility = `Write`), Declaration (kind = `Var`). Added edge labels: variables, references, children, declarations, node, astNode.]
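Why the ASG is "not necessarily a tree" can be shown on an edge list. The sketch below is illustrative: the node and edge names come from the slide, but how exactly the scope edges connect the nodes is an assumption, and `is_tree` is a simplified check that only looks at incoming edges.

```python
def incoming_counts(edges):
    """Count incoming edges per target node."""
    counts = {}
    for _source, _label, target in edges:
        counts[target] = counts.get(target, 0) + 1
    return counts


def is_tree(edges):
    """Simplified check: in a tree no node has more than one incoming edge."""
    return all(count <= 1 for count in incoming_counts(edges).values())


# Syntactic edges of the running example: (source, edge label, target).
ast_edges = [
    ("Module", "items", "VariableDeclarationStatement"),
    ("VariableDeclarationStatement", "declaration", "VariableDeclaration"),
    ("VariableDeclaration", "declarators", "VariableDeclarator"),
    ("VariableDeclarator", "binding", "BindingIdentifier foo"),
    ("VariableDeclarator", "init", "BinaryExpression"),
]

# Semantic edges added by a scope analysis (assumed wiring).
scope_edges = [
    ("GlobalScope", "astNode", "Module"),
    ("GlobalScope", "variables", "Variable foo"),
    ("Variable foo", "declarations", "Declaration"),
    ("Variable foo", "references", "Reference"),
    ("Declaration", "node", "BindingIdentifier foo"),
    ("Reference", "node", "BindingIdentifier foo"),
]

asg_edges = ast_edges + scope_edges
# Nodes reached by more than one edge are the targets of cross-edges.
shared = sorted(n for n, c in incoming_counts(asg_edges).items() if c > 1)
```

In this sketch the `BindingIdentifier` is reachable both syntactically (via `binding`) and semantically (via `node`), so the combined structure is a graph.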

AST vs ASG

1 SLOC: 20-40-50 nodes

Overview of the Approach

[Figure: CI workflow. Development → Version Control System → Compilation → Unit and Integration Tests → Static Analysis]

[Figure: steps of the approach with the technology used at each step]

– Version control system, integrated development environment: Git, Visual Studio Code
– Tokenizer, parser, scope analyzer (source code → tokens → AST → ASG): Shape Security Shift
– Transformation: Java, Cypher
– Graph database: Neo4j
– Result processing
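The transformation step loads the syntax graph into the graph database. A minimal sketch of the idea, assuming a nested-dictionary AST and a hypothetical `to_cypher` helper, is to flatten the tree into Cypher `CREATE` clauses. The framework itself does this in Java, and its actual schema may differ. A real implementation would also pass property values as query parameters instead of building strings.

```python
import itertools
import json


def to_cypher(tree):
    """Flatten a nested AST into Cypher CREATE clauses (one node or edge per line).

    `tree` is a dict with 'type' and optional 'props' and 'children'
    (edge label -> list of subtrees).
    """
    counter = itertools.count()
    clauses = []

    def visit(node):
        var = f"n{next(counter)}"
        props = ", ".join(
            f"{key}: {json.dumps(value)}" for key, value in node.get("props", {}).items()
        )
        if props:
            clauses.append(f"CREATE ({var}:{node['type']} {{{props}}})")
        else:
            clauses.append(f"CREATE ({var}:{node['type']})")
        for label, kids in node.get("children", {}).items():
            for kid in kids:
                child_var = visit(kid)
                clauses.append(f"CREATE ({var})-[:{label}]->({child_var})")
        return var

    visit(tree)
    return "\n".join(clauses)


tree = {
    "type": "BinaryExpression",
    "props": {"operator": "Div"},
    "children": {
        "left": [{"type": "LiteralNumericExpression", "props": {"value": 1.0}}],
        "right": [{"type": "LiteralNumericExpression", "props": {"value": 0.0}}],
    },
}
cypher = to_cypher(tree)
print(cypher)
```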

Graph Pattern Matching

– Graph pattern: a declarative, graph-like formalism expressing constraints.

[Figure: example graph. A VariableDeclarator with a BindingIdentifier (name = `foo`) and a BinaryExpression (operator = `Div`) whose operands are LiteralNumericExpression nodes with the values 1.0 and 0.0. Labels highlighted in the pattern: binding, be, right.]

[Figure: graph pattern query expressed in Cypher, looking for a division by zero]

[Figure: results of the pattern matching: BindingIdentifier (name = `foo`)]
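The division-by-zero pattern can be mimicked on a small in-memory graph. The sketch below is not the query from the slide, which is not part of this transcript. The `init` edge, the node ids and the Cypher text in `CYPHER_SKETCH` are assumptions that mirror the AST names used earlier.

```python
# Node id -> (label, properties), and edges as (source, type, target).
nodes = {
    1: ("VariableDeclarator", {}),
    2: ("BindingIdentifier", {"name": "foo"}),
    3: ("BinaryExpression", {"operator": "Div"}),
    4: ("LiteralNumericExpression", {"value": 1.0}),
    5: ("LiteralNumericExpression", {"value": 0.0}),
}
edges = [(1, "binding", 2), (1, "init", 3), (3, "left", 4), (3, "right", 5)]

# A Cypher query of roughly this shape could express the same pattern
# (labels and relationship types are assumed, not taken from the framework).
CYPHER_SKETCH = """
MATCH (vd:VariableDeclarator)-[:binding]->(b:BindingIdentifier),
      (vd)-[:init]->(be:BinaryExpression {operator: 'Div'}),
      (be)-[:right]->(:LiteralNumericExpression {value: 0.0})
RETURN b.name
"""


def targets(source, edge_type):
    """Ids of the nodes reached from `source` over edges of the given type."""
    return [dst for src, kind, dst in edges if src == source and kind == edge_type]


def find_division_by_zero():
    """Names of variables initialized with a division whose right operand is 0."""
    matches = []
    for vd, (label, _) in nodes.items():
        if label != "VariableDeclarator":
            continue
        for be in targets(vd, "init"):
            be_label, be_props = nodes[be]
            if be_label != "BinaryExpression" or be_props.get("operator") != "Div":
                continue
            for right in targets(be, "right"):
                r_label, r_props = nodes[right]
                if r_label == "LiteralNumericExpression" and r_props.get("value") == 0.0:
                    matches.extend(nodes[b][1]["name"] for b in targets(vd, "binding"))
    return matches


matches = find_division_by_zero()
```

The nested loops are what a declarative pattern spares the rule author from writing: the query only states the constraints, and the database searches for matching subgraphs.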

Use Cases: static analysis

– Searching for local bad smells (linter warnings)
  – without a case
  – value set more than once
  – Unused variable
– Global rules
  – Unreachable code parts
– Framework
  – Freely extendable
  – User-defined rules
  – Easier to use than visitor pattern solutions
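Two of the local rules above ("value set more than once" and the unused variable) can be sketched over the reference information of a semantic graph. The input format, the `lint` function and the warning texts are hypothetical. The framework expresses such rules as graph queries rather than Python code.

```python
from collections import Counter

# (variable name, accessibility) pairs, as a scope analysis might report them.
references = [
    ("foo", "Write"),
    ("bar", "Write"),
    ("bar", "Write"),
    ("bar", "Read"),
]


def lint(references):
    """Return (variable, warning) pairs for two simple local rules."""
    writes = Counter(name for name, access in references if access == "Write")
    reads = {name for name, access in references if access == "Read"}
    warnings = []
    for name in sorted({name for name, _ in references}):
        if name not in reads:
            warnings.append((name, "not used"))
        if writes[name] > 1:
            warnings.append((name, "value set more than once"))
    return warnings


warnings = lint(references)
```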

Use Cases: transformation

Control Flow Graph (CFG)

– Graph representation of every possible statement sequence during code execution.

[Figure: example control flow graph built up step by step from statement nodes, an if node with its condition, a return node and an error node]
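The definition of the CFG as a representation of "every possible statement sequence" can be made concrete by enumerating the paths of a small graph. The graph below is a hypothetical example loosely modeled on the node labels of the figure, and the enumeration assumes an acyclic CFG (loops would need a bound).

```python
# Successor lists of a small, acyclic control flow graph (hypothetical example).
cfg = {
    "statement_1": ["if"],
    "if": ["statement_2", "error"],
    "statement_2": ["return"],
    "error": [],
    "return": [],
}


def statement_sequences(cfg, node, prefix=()):
    """Yield every path from `node` to a node without successors."""
    path = prefix + (node,)
    if not cfg[node]:
        yield path
        return
    for successor in cfg[node]:
        yield from statement_sequences(cfg, successor, path)


sequences = list(statement_sequences(cfg, "statement_1"))
for sequence in sequences:
    print(" -> ".join(sequence))
```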

Use Cases: test generation

– Inspecting control flows
  – Is the given statement reachable given the constraints on the edges?
  – Which one is the shortest route?
– Producing test input for dynamic testing

[Figure: control flow graph with statement nodes, an if node with its condition, a return node and an error node; routes through the graph are highlighted step by step]
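Both questions above (reachability under edge constraints and the shortest route) map onto a breadth-first search. The sketch below uses a hypothetical CFG whose edges carry optional conditions. The function name `shortest_route` and the string form of the conditions are assumptions made for illustration.

```python
from collections import deque

# Node -> list of (successor, condition on the edge or None); hypothetical CFG.
cfg = {
    "entry": [("check", None)],
    "check": [("compute", "x != 0"), ("error", "x == 0")],
    "compute": [("log", None), ("return", None)],
    "log": [("return", None)],
    "error": [],
    "return": [],
}


def shortest_route(cfg, start, goal, allowed=lambda condition: True):
    """Breadth-first search over the edges whose condition passes `allowed`.

    Returns the shortest route as a list of nodes, or None if `goal` is unreachable.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for successor, condition in cfg[node]:
            if successor not in seen and allowed(condition):
                seen.add(successor)
                queue.append(path + [successor])
    return None


route = shortest_route(cfg, "entry", "return")
# If the input is known to be non-zero, the error branch cannot be taken.
blocked = shortest_route(cfg, "entry", "error", allowed=lambda c: c != "x == 0")
```

The conditions collected along a route describe the inputs that drive execution down that route, which is one way such a search could feed test input generation.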

Use Cases: type inference

– Supporting dynamically typed languages
  – Python
  – JavaScript / ECMAScript

http://marijnhaverbeke.nl/blog/tern.html

Use Cases: impact analysis

– Adapting to the continuous integration workflow
– Handling multiple branches
– Following the modifications in a branch
– File-level incremental granularity
– Giving differential reports to the developers

Why Neo4j?

+++
– Quick prototyping
– Supporting transactions
– Great tooling

--
– Not scaling well
– Only disk-based

Remarks: MERGE

– MATCH or CREATE
– Great for the lazy
– Can be expensive
– Possible solutions:
  – Less MERGE
  – Separating queries
    – Create first if not present
    – Use MATCH instead of MERGE
– Prevention
  – Prepare the structure when inserting the data

Remarks: if-then-else

– Not a language element in Cypher
– Can be solved with a trick
  – Verrrrrry sloww
– Solution:
  – Two smaller, disjoint cases

∞ vs 15 sec

These are not chickens.

Remarks: reachability

– Transitive closure without length constraints is slow.
– Transitive closure over a repeating node/edge pattern is only possible using tricks.

[Figure: nodes A and B connected by a path of arbitrary length (*)]
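The effect of a length constraint on a transitive closure can be illustrated outside the database. The sketch below is plain Python with a hypothetical `reachable_within` function. In Cypher the corresponding distinction is between an unbounded variable-length pattern such as `-[*]->` and a bounded one such as `-[*1..3]->`.

```python
def reachable_within(graph, start, max_hops):
    """Nodes reachable from `start` in 1..max_hops steps, explored level by level."""
    frontier = {start}
    reached = set()
    for _ in range(max_hops):
        # Expand one more hop, keeping only nodes not seen before.
        frontier = {succ for node in frontier for succ in graph.get(node, ())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


# A small cyclic graph: A -> X -> Y -> B -> A.
graph = {"A": ["X"], "X": ["Y"], "Y": ["B"], "B": ["A"]}

two_hops = reachable_within(graph, "A", 2)
three_hops = reachable_within(graph, "A", 3)
```

With a bound, the work is limited to the given number of expansion rounds; without one, the search has to continue until no new node is found.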

Conclusions

– Source code analyzer framework
– Searching for global error patterns
– Close to real time feedback
– Type inference possible
– Test input generation possible
– Approach for both dynamically and statically typed languages
– Using Neo4j for
– Storing
– Pattern matching
– Transforming
– Version control
– Storing metadata

Project Details

– The framework prototype is open-source: https://github.com/ftsrg/codemodel-rifle
– Our work was supported by:
  – ÚNKP*
  – Microsoft Azure for Research
  – MTA-BME Lendület Program

*Supported by the ÚNKP-16-2-I. New National Excellence Program of the Ministry of Human Capacities.

Project Details

– Supervisors

– Ádám Lippai

– Dávid Honfi

– Gábor Szárnyas

– Helped my research

– Tamás Soma Lucz

– Industrial case study
