example application: source code analysis


Upload: india-keith

Post on 01-Jan-2016


DESCRIPTION

Example application: source code analysis. 125 file types; 8029 files; 4689 non-Java; 1112 svn revisions. Querying software artefacts: source code, query engine, IDE plugin, version history, parsers, developer, bug reports, build scripts, dashboard, software repository, databases.

TRANSCRIPT

Page 1: Example application: source code analysis

Example application: source code analysis

[Chart: File Types in Alfresco Source; logarithmic axis from 1 to 10000]

125 file types; 8029 files; 4689 non-Java; 1112 svn revisions

Page 2: Example application: source code analysis

Querying Software Artefacts

[Diagram labels: build scripts, version history, spreadsheets, databases, config files, web pages, bug reports, software repository, parsers, query engine, analyst, dashboard, IDE plugin, Excel add-in, source code, developer, manager]

Page 3: Example application: source code analysis

The problem

design query language and engine for accessing vast repository of different types of source artefact

libraries of queries: tailor framework to different types of artefact

Page 4: Example application: source code analysis

Tough problem!

Difficulties:
- does not scale
- efficient queries extremely hard to write
- specific to one kind of source artefact

Dozens of attempts, in industry and academia since 1984: databases, Prolog, domain-specific query languages

18 man-years of research at University of Oxford, 1996-2005, to discover ingredients of solution

15 man-years to implement an industrial product

3 patents pending, several more in pipeline

Page 5: Example application: source code analysis

SemmleCode: the power of .QL

Page 6: Example application: source code analysis

The query language .QL

Object-oriented, for creating libraries of queries

Recursive queries, as in logic programming

Syntax familiar to Java and SQL developers

On top of any traditional relational database

Syntax-highlighting, error-checking and auto-completion

Page 7: Example application: source code analysis

How it works

[Diagram labels: .QL library, .QL query, RDBMS, procedural SQL, java / jar, bytecode for search, XML files, template for RDBMS, Semmle optimiser]

Page 8: Example application: source code analysis

Demo

The source we shall explore:

Alfresco: Enterprise Content Management
Spring: Java/JEE Application Framework
Builds on Tomcat, JBoss, …

Demo parts:

• out-of-the-box
• writing your own queries
• querying XML config files

Vital statistics:

50553 Java methods
6647 Java types
516 XML files

Page 9: Example application: source code analysis

Using SemmleCode out-of-the-box

115 pre-packaged queries

Find common bug patterns: e.g. compareTo/equals, cloning, serialisation, internationalization

Compute metrics: 42 different metrics, including Robert Martin’s package metrics

Examine dependencies: e.g. cyclic package dependencies

Visualization: pie charts, bar charts, tables, graphs, warnings/errors
- easy navigation to source
- exportable for generating reports
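
As a rough illustration of what a package metric and a cyclic-dependency check compute, here is a minimal Python sketch. It is not one of SemmleCode's pre-packaged queries. The package names and the `DEPENDS_ON` graph are hypothetical, and counting couplings per package rather than per class is a simplifying assumption. The formula used is Robert Martin's instability, I = Ce / (Ca + Ce).

```python
# Hypothetical package dependency graph: package -> packages it depends on.
DEPENDS_ON = {
    "web": {"service"},
    "service": {"repo", "util"},
    "repo": {"service", "util"},  # repo and service depend on each other
    "util": set(),
}


def instability(pkg):
    """Instability I = Ce / (Ca + Ce), counted at package granularity."""
    ce = len(DEPENDS_ON[pkg])  # efferent: packages this one depends on
    ca = sum(pkg in deps for deps in DEPENDS_ON.values())  # afferent: dependants
    return ce / (ca + ce) if ca + ce else 0.0


def reachable(start):
    """Packages reachable from start through one or more dependency edges."""
    seen, stack = set(), list(DEPENDS_ON[start])
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(DEPENDS_ON[pkg])
    return seen


# A package takes part in a cycle if it can reach itself.
cyclic = sorted(p for p in DEPENDS_ON if p in reachable(p))

print(cyclic)
print({p: instability(p) for p in DEPENDS_ON})
```

In this toy graph only `repo` and `service` lie on a cycle, and `util`, which depends on nothing, gets instability 0.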

Page 10: Example application: source code analysis

Writing queries of your own: select

from Method m
where m.fromSource()
  and m.hasName("compareTo")
  and not m.getDeclaringType().getAMethod().hasName("equals")
select m, "missing equals?"

In general:

from <variable-declarations>
where <conditions>
select <results>
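
To make the from-where-select shape concrete, the following Python sketch evaluates the same "compareTo without equals" condition over a hand-made toy model. The `Method` record, its fields, the helper `methods_of`, and the sample data are all assumptions for illustration; they are not SemmleCode's schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    declaring_type: str
    from_source: bool


# Hypothetical toy data standing in for an extracted code base.
METHODS = [
    Method("compareTo", "Version", True),
    Method("equals", "Version", True),
    Method("compareTo", "Priority", True),
    Method("compareTo", "LibraryKey", False),
]


def methods_of(type_name):
    """All methods declared by a type (plays the role of getAMethod())."""
    return [m for m in METHODS if m.declaring_type == type_name]


def missing_equals():
    # from   -> iterate over all Method values
    # where  -> keep those satisfying the conditions
    # select -> build the result tuples
    return [
        (m, "missing equals?")
        for m in METHODS
        if m.from_source
        and m.name == "compareTo"
        and not any(o.name == "equals" for o in methods_of(m.declaring_type))
    ]


results = missing_equals()
for m, message in results:
    print(m.declaring_type, message)
```

The `not any(...)` clause mirrors how a many-valued expression such as `getAMethod()` is read in the slide's query: the negated condition holds only when no method of the declaring type is named `equals`.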

Page 11: Example application: source code analysis

Writing queries of your own: aggregates

select sum (CompilationUnit cu | cu.fromSource() | cu.getNumberOfLinesOfCode())

In general:

agg( T1 x1, …, Tn xn | condition | expr )
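
One way to read the general form `agg( vars | condition | expr )` is as a fold over every binding that satisfies the condition. The Python sketch below follows that reading; the helper `agg`, the dictionary fields, and the sample compilation units are hypothetical and only stand in for the .QL construct.

```python
def agg(combine, domain, condition, expr):
    """Apply combine to expr(x) for every x in domain that satisfies condition."""
    return combine(expr(x) for x in domain if condition(x))


# Hypothetical compilation units; "loc" is the number of lines of code.
COMPILATION_UNITS = [
    {"name": "A.java", "from_source": True, "loc": 120},
    {"name": "B.java", "from_source": True, "loc": 80},
    {"name": "lib/C.class", "from_source": False, "loc": 500},
]

# sum(CompilationUnit cu | cu.fromSource() | cu.getNumberOfLinesOfCode())
total_loc = agg(
    sum,
    COMPILATION_UNITS,
    lambda cu: cu["from_source"],
    lambda cu: cu["loc"],
)

# The same shape works for other aggregates, for example max.
largest = agg(
    max,
    COMPILATION_UNITS,
    lambda cu: cu["from_source"],
    lambda cu: cu["loc"],
)

print(total_loc, largest)
```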

Page 12: Example application: source code analysis

Writing queries of your own: recursion

from RefType s, RefType t, RefType it
where it.hasName("PasswordInputTag")
  and it.hasSupertype*(s)
  and it.hasSupertype*(t)
  and t.hasSupertype(s)
select t, s

In general, can write recursive predicate definitions
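
The `hasSupertype*` chaining in the query above can be read as the reflexive transitive closure of the direct-supertype relation. The Python sketch below computes that closure over a toy hierarchy and then selects the direct edges inside it, which is what the query returns. Only the name `PasswordInputTag` comes from the slide; the supertypes listed for it are invented for illustration.

```python
# Hypothetical direct-supertype edges: type -> set of direct supertypes.
DIRECT_SUPERTYPES = {
    "PasswordInputTag": {"InputTag"},
    "InputTag": {"HtmlTag"},
    "HtmlTag": {"Object"},
    "Object": set(),
}


def supertypes_star(t):
    """Reflexive transitive closure: t itself plus every type reachable from it."""
    seen = {t}
    stack = [t]
    while stack:
        current = stack.pop()
        for s in DIRECT_SUPERTYPES.get(current, ()):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def hierarchy_edges(root):
    """Pairs (t, s) with t and s in root's closure and s a direct supertype of t."""
    closure = supertypes_star(root)
    return sorted(
        (t, s)
        for t in closure
        for s in DIRECT_SUPERTYPES.get(t, ())
        if s in closure
    )


edges = hierarchy_edges("PasswordInputTag")
for t, s in edges:
    print(t, "->", s)
```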

Page 13: Example application: source code analysis

Queries in .QL

from-where-select: autocompletion, typechecking, emptiness tests

aggregates: arbitrary nesting, no group-by needed

recursion: implicit with chaining; or explicit

Page 14: Example application: source code analysis

Defining new classes in .QL

class ClassAttribute extends XMLAttribute {
  ClassAttribute() { this.getName() = "class" }

  string getClassName() { this.getValue() = result }

  RefType getType() { result.getQualifiedName() = this.getClassName() }

  predicate noType() { not exists(this.getType()) }
}

from ClassAttribute ca
where ca.noType()
  and ca.getClassName().matches("org.alfresco%")
select ca, ca.getClassName() + " not found"
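
The class definition above can be mimicked in Python to show what each part means: the "constructor" becomes a filter predicate, `getType` becomes a relation that may yield zero or more results, and `noType` tests that the relation is empty. This is a sketch of the semantics only. The `XMLAttribute` record, `KNOWN_TYPES`, and the sample attributes are hypothetical, and `startswith` approximates the `%` wildcard in `matches`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XMLAttribute:
    name: str
    value: str


# Hypothetical data: attributes found in XML config files, and known Java types.
ATTRIBUTES = [
    XMLAttribute("class", "org.alfresco.repo.Foo"),
    XMLAttribute("class", "org.alfresco.repo.Missing"),
    XMLAttribute("class", "com.example.Other"),
    XMLAttribute("id", "org.alfresco.bean"),
]
KNOWN_TYPES = {"org.alfresco.repo.Foo"}


def is_class_attribute(a):
    """Characteristic predicate, i.e. the 'constructor' of ClassAttribute."""
    return a.name == "class"


def get_type(a):
    """Relation from an attribute to matching types; may be empty."""
    return {t for t in KNOWN_TYPES if t == a.value}


def no_type(a):
    """Holds when get_type yields no result at all."""
    return not get_type(a)


report = [
    (a, a.value + " not found")
    for a in ATTRIBUTES
    if is_class_attribute(a) and no_type(a) and a.value.startswith("org.alfresco")
]

for _, message in report:
    print(message)
```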

Page 15: Example application: source code analysis

Classes in .QL

classes are logical properties
- “constructor” specifies characteristic property

methods
- body is relation between this, result and parameters
- more than one result allowed

predicates
- methods without a result
- body is relation between this and parameters

Page 16: Example application: source code analysis

The key points of .QL

classes are predicates

inheritance is implication

nondeterministic expressions

recursion with super-simple semantics

syntax familiar to SQL and Java programmers

designed for creating libraries of queries

excellent error checking and IDE integration
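
The phrase "classes are predicates, inheritance is implication" can be illustrated with a small Python sketch, assuming a toy universe of elements. A class is modelled as a boolean function, and a subclass as the conjunction of its superclass predicate with its own characteristic property, so membership in the subclass implies membership in the superclass. The element records and function names are hypothetical.

```python
# Hypothetical universe of XML items extracted from config files.
ELEMENTS = [
    {"kind": "attribute", "name": "class", "value": "org.alfresco.Foo"},
    {"kind": "attribute", "name": "id", "value": "bean1"},
    {"kind": "element", "name": "bean", "value": ""},
]


def xml_attribute(e):
    """Superclass predicate."""
    return e["kind"] == "attribute"


def class_attribute(e):
    """Subclass predicate: superclass predicate AND own characteristic property."""
    return xml_attribute(e) and e["name"] == "class"


def extent(pred):
    """All elements of the universe that satisfy a class predicate."""
    return [e for e in ELEMENTS if pred(e)]


# Inheritance is implication: every ClassAttribute is also an XMLAttribute.
inheritance_holds = all(xml_attribute(e) for e in extent(class_attribute))

print(len(extent(xml_attribute)), len(extent(class_attribute)), inheritance_holds)
```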

Page 17: Example application: source code analysis

Concluding remarks

Page 18: Example application: source code analysis

Couldn’t you use LINQ instead of .QL?

Different design goals: ORM versus libraries of queries

LINQ does not provide recursion

LINQ cannot do the optimisations across multiple queries that are key to efficiency in .QL

“Fortunately, there is light in the darkness. Based on decades of programming language research, the brilliant team at Semmle has created an elegant, industrial strength object-oriented query language called .QL with full support for recursive queries and aggregation… .QL has all the requisites to become a runaway success.”

(Erik Meijer, Creator of LINQ, Microsoft)

Page 19: Example application: source code analysis

Too good to be true?

Jeff Ullman, 1991:

It is not possible for a query language to be seriously logical and seriously object-oriented at the same time.

key breakthroughs are Semmle’s proprietary technology:
- design of .QL
- optimisations on “bytecode for search”

Page 20: Example application: source code analysis

Wrapping up

Java is not enough: source code analysis tools must process a multitude of artefacts

libraries of queries: a means to achieve such heterogeneous tools

.QL: object-oriented queries over trees and graphs made fast and easy