Polyglot: An Extensible Compiler Framework for Java

Nathaniel Nystrom, Michael R. Clarkson, Andrew C. Myers
Cornell University



2

Language extension
• Language designers often create extensions to existing languages
  • e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif, ArchJava, ESCJava, Polyphonic C#, ...
• Want to reuse existing compiler infrastructure as much as possible
• Polyglot is a framework for writing compiler extensions for Java

3

Requirements
• Language extension
  • Modify both syntax and semantics of the base language
  • Not necessarily backward compatible
• Goals:
  • Easy to build and maintain extensions
  • Extensibility should be scalable
  • No code duplication
  • Compilers for language extensions should be open to further extension

4

Rejected approaches
• In-place modification
• Macro languages
  • Limited to syntax extensions
  • Semantic checks after macro expansion

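[Figure: copy & modify. Base compiler 1.0 is copied and modified into extension compiler 1.0. Bug fixes & upgrades turn base compiler 1.0 into base compiler 2.0, after which the copy & modify step has to be done again to get extension compiler 2.0, together with the bug fixes & upgrades (again).]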

5

Polyglot
• Base compiler is a complete Java front end
  • 25K lines of Java
  • Name resolution, inner class support, type checking, exception checking, uninitialized variable analysis, unreachable code analysis, ...
• Can reuse and extend through inheritance

6

Scalable extensibility

• Most compiler passes are sparse:

[Figure: grid of passes (name resolution, type checking, exception checking, constant folding) against AST nodes (+, if, x, e.f, =).]

Changes to the compiler should be proportional to changes in the language.

7

Non-scalable approaches

[Table: whether it is easy to add or modify passes and AST nodes when using visitors, using a pass as an AST node method ("naive OO"), and using Polyglot.]

8

Polyglot architecture

[Figure: three parallel pipelines.
 Base Polyglot compiler: Java source, Java parser, AST rewriting passes, code generator, Java target.
 Extension compiler: Ext source, Ext parser, AST rewriting passes, code generator, Java target.
 Second extension compiler: Ext2 source, Ext2 parser, AST rewriting passes, code generator, Java target.]

9

Architecture details
• Parser written using PPG
  • Adds grammar inheritance to Java CUP
• AST nodes constructed using a node factory
  • Decouples node types from implementation
• AST rewriting passes:
  • Each pass lazily creates a new AST
  • From naive OO: traverse AST invoking a method at each node
  • From visitors: AST traversal factored out
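The pass structure in the bullets above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Polyglot's actual API: the names PassSketch, Pass, leave, visitChildren, IntLit, Var, and ConstantFolder are all hypothetical. Traversal lives in the node classes, each pass supplies one method that is invoked at every node, and a node is copied only when one of its children actually changed.

```java
// Hypothetical sketch; none of these names are taken from Polyglot itself.
final class PassSketch {
    interface Pass {
        // Called once per node, after the node's children have been rewritten.
        Node leave(Node n);
    }

    static abstract class Node {
        // Traversal is factored out of the passes and kept here.
        final Node accept(Pass p) {
            return p.leave(visitChildren(p));
        }

        // Returns this node itself unless some child was rewritten.
        abstract Node visitChildren(Pass p);
    }

    static final class IntLit extends Node {
        final int value;
        IntLit(int value) { this.value = value; }
        Node visitChildren(Pass p) { return this; }
    }

    static final class Var extends Node {
        final String name;
        Var(String name) { this.name = name; }
        Node visitChildren(Pass p) { return this; }
    }

    static final class Add extends Node {
        final Node lhs, rhs;
        Add(Node lhs, Node rhs) { this.lhs = lhs; this.rhs = rhs; }
        Node visitChildren(Pass p) {
            Node l = lhs.accept(p);
            Node r = rhs.accept(p);
            // Lazy copy: allocate a new Add only if a child changed.
            return (l == lhs && r == rhs) ? this : new Add(l, r);
        }
    }

    // A sparse pass: it only touches Add nodes whose operands are both literals.
    static final class ConstantFolder implements Pass {
        public Node leave(Node n) {
            if (n instanceof Add) {
                Add a = (Add) n;
                if (a.lhs instanceof IntLit && a.rhs instanceof IntLit) {
                    return new IntLit(((IntLit) a.lhs).value + ((IntLit) a.rhs).value);
                }
            }
            return n;
        }
    }
}
```

In this sketch a pass that changes nothing returns the original tree object, so an untouched subtree is shared between the input and output ASTs rather than copied.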

10

Example: PAO
• Primitive types as subclasses of Object
• Changes type system, relaxes Java syntax
• Implementation: insert boxing and unboxing code where needed

PAO source:

  HashMap m = new HashMap();
  m.put("two", 2);
  int v = (int) m.get("two");

Translated Java:

  HashMap m = new HashMap();
  m.put("two", new Integer(2));
  int v = ((Integer) m.get("two")).intValue();

11

PAO implementation
• Modify parser and type-checking pass to permit e instanceof int
• Parser changes with PPG:

  include "java.cup"
  drop { rel_expr ::= rel_expr INSTANCEOF ref_type }
  extend rel_expr ::= rel_expr:a INSTANCEOF type:b
      {: RESULT = node_factory.Instanceof(a, b); :}

• Add one new pass to insert boxing and unboxing code

12

Implementing a new pass
• Want to extend Node interface with rewrite() method
  • Default implementation: identity translation
  • Specialized implementations: boxing and unboxing
• Mixin extensibility: extensions to a base class should be inherited by subclasses

[Figure: class hierarchy, before and after. Node declares typeCheck() and codeGen(); its subclasses If (fields cond, then, else) and Add (fields lhs, rhs) declare the same methods. After the change, Node, If, and Add each also declare rewrite().]

13

Inheritance is inadequate

[Figure: the Node, If, and Add hierarchy (typeCheck(), codeGen()) next to the extended hierarchy PaoNode, PaoIf, and PaoAdd, in which every class also declares rewrite().]

14

Inheritance is inadequate

[Figure: the same situation for the full AST hierarchy. Every class under Node declares typeCheck() and codeGen(), and every corresponding class under PaoNode declares typeCheck(), codeGen(), and rewrite().]
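The problem on these slides can be reproduced in a few lines of Java. The sketch below is illustrative only: the method bodies are hypothetical stand-ins for real passes, and InheritanceSketch and inheritsFromPaoNode are names invented for this example. Adding rewrite() in a PaoNode subclass of Node does not make it available in If, and PaoIf cannot extend both If and PaoNode, so the default rewrite() ends up repeated in each Pao subclass.

```java
// Hypothetical sketch of the single-inheritance problem; not Polyglot code.
final class InheritanceSketch {
    static class Node {
        String typeCheck() { return "Node.typeCheck"; }
    }

    static class If extends Node {
        @Override String typeCheck() { return "If.typeCheck"; }
    }

    // The extension adds a pass by subclassing the base class...
    static class PaoNode extends Node {
        Node rewrite() { return this; } // default: identity translation
    }

    // ...but PaoIf has to extend If to keep If's behavior, so it cannot
    // inherit rewrite() from PaoNode and must declare it again.
    static class PaoIf extends If {
        Node rewrite() { return this; } // duplicated default
    }

    // True only when c would pick up PaoNode's members by inheritance.
    static boolean inheritsFromPaoNode(Class<?> c) {
        return PaoNode.class.isAssignableFrom(c);
    }
}
```

With one duplicated method the cost looks small, but it is paid again for every node class and every new pass, which is the growth the slide's diagram depicts.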

15

Extension objects

Use composition to mixin methods and fields into AST node classes

[Figure: Node, If (cond, then, else), and Add (lhs, rhs) each declare typeCheck(), codeGen(), and an ext field. The ext field points to a PaoExt object that declares rewrite() and has its own ext field, which is null.]

PAO extension objects; installed into all nodes by node factory

16

Extension objects

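Extension objects have their own ext field to leave extension open

[Figure: Node, If (cond, then, else), and Add (lhs, rhs) each declare typeCheck(), codeGen(), and an ext field. The ext field points to a PaoExt object (rewrite()), whose own ext field points to a further extension object that declares typeCheck() and ext_type_info; that object's ext field is null.]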
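A rough Java sketch of the extension-object idea follows. It is a simplification under stated assumptions rather than Polyglot's real classes: types are represented as plain strings, and ExtSketch, install, and the node back-pointer field are hypothetical names. The base node classes know nothing about PAO; they only carry an ext field, and the PAO pass lives entirely in the objects hung off that field.

```java
// Hypothetical sketch of extension objects; names are illustrative only.
final class ExtSketch {
    // Base compiler node: only knows that it may carry an extension object.
    static class Node {
        Ext ext; // null when no extension is installed
        String typeCheck() { return "Node.typeCheck"; }
    }

    static class If extends Node {
        @Override String typeCheck() { return "If.typeCheck"; }
    }

    static class Instanceof extends Node {
        final String compareType; // simplification: a type is just its name
        Instanceof(String compareType) { this.compareType = compareType; }
    }

    // Root of all extension objects. It has its own ext field so that a
    // further extension can be chained onto this one.
    static class Ext {
        Node node; // back-pointer to the node being extended
        Ext ext;   // next extension in the chain, or null
    }

    // PAO's default implementation of the new pass, shared by all node types.
    static class PaoExt extends Ext {
        Node rewrite() { return node; } // identity translation
    }

    // Specialized implementation for one node type.
    static class PaoInstanceofExt extends PaoExt {
        @Override Node rewrite() {
            Instanceof e = (Instanceof) node;
            // "e instanceof int" becomes "e instanceof Integer"
            return e.compareType.equals("int") ? new Instanceof("Integer") : e;
        }
    }

    // Attaches an extension object to a node (a node factory would do this).
    static <N extends Node> N install(N n, Ext e) {
        n.ext = e;
        e.node = n;
        return n;
    }
}
```

Because If inherits the ext field from Node, installing a PaoExt gives If the default rewrite() without any PaoIf class, which is the mixin behavior plain inheritance could not provide.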

17

Method invocation
• A method may be implemented in the node or in any one of several extension objects.
• Extension should call: node.ext.ext.typeCheck()
• Base compiler should call: node.typeCheck()
• Cannot hardcode the calls

[Figure: Node (typeCheck(), codeGen()) with an ext field pointing to PaoExt (rewrite()), whose ext field points to a further extension object (typeCheck(), ext_type_info) whose ext field is null.]

18

Delegate objects
• Each node & extension object has a del field
• Delegate object implements same interface as node or ext
• Directs call to appropriate method implementation
• Ex: node.del.typeCheck()
• Ex: node.ext.del.rewrite()
• Run-time overhead < 2%

[Figure: Node (typeCheck(), codeGen()), PaoExt (rewrite()), and the further extension object (typeCheck(), ext_type_info) each have del and ext fields; the last ext field is null. Node's del field points to a JavaDel object whose typeCheck() is { node.ext.ext.typeCheck() } and whose codeGen() is { node.codeGen() }.]
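The redirection performed by a delegate can be sketched as follows. This is an assumption-laden illustration, not Polyglot's implementation: DelSketch, NodeOps, TypeExt, and RedirectDel are hypothetical names, and the methods return strings only so the dispatch target is visible. Callers always go through node.del, so neither the base compiler nor an extension hardcodes where a method is implemented.

```java
// Hypothetical sketch of delegate objects; names are illustrative only.
final class DelSketch {
    // The interface shared by a node and its delegate.
    interface NodeOps {
        String typeCheck();
        String codeGen();
    }

    static class Node implements NodeOps {
        NodeOps del = this; // by default a node is its own delegate
        Ext ext;
        public String typeCheck() { return "base typeCheck"; }
        public String codeGen() { return "base codeGen"; }
    }

    static class Ext {
        Ext ext; // next extension object in the chain, or null
    }

    // First extension: adds a new pass.
    static class PaoExt extends Ext {
        String rewrite() { return "pao rewrite"; }
    }

    // Second extension: supplies a new typeCheck() implementation.
    static class TypeExt extends Ext {
        String typeCheck() { return "extension typeCheck"; }
    }

    // Delegate that sends typeCheck() to the second extension object
    // and leaves codeGen() with the node itself.
    static class RedirectDel implements NodeOps {
        private final Node node;
        RedirectDel(Node node) { this.node = node; }
        public String typeCheck() { return ((TypeExt) node.ext.ext).typeCheck(); }
        public String codeGen() { return node.codeGen(); }
    }
}
```

A caller writes n.del.typeCheck() in both configurations; only the object stored in del decides which implementation runs.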

19

Scalable extensibility
• To add a new pass:
  • Use an extension object to mixin default implementation of the pass for the Node base class
  • Use extension objects to mixin specialized implementations as needed
• To change the implementation of an existing pass:
  • Use delegate object to redirect to method providing new implementation
• To create an AST node type:
  • Create a new subclass of Node
  • Or, mixin new fields to existing node using an extension object

20

Polyglot family tree

[Figure: family tree rooted at Polyglot base (Java), with the extensions parameterized types, Coffer, PolyJ, Jif, PAO, Jif/split, JMatch, and covariant return.]

21

Results
• Can build small extensions in hours or days
• 10% of base code is interfaces and factories

Extension               # Tokens   % of Base
Polyglot base (Java)    166K       100
Jif                     129K       78
JMatch                  108K       65
Jif/split               99K        60
PolyJ                   79K        48
Coffer                  24K        14
PAO                     6.1K       3.6
parameterized types     3.2K       2
covariant return        1.6K       1
javac 1.1               132K       80

22

Related work
• Other extensible compilers
  • e.g., CoSy, SUIF
  • e.g., JastAdd, JaCo
• Macros
  • e.g., EPP, Java Syntax Extender, Jakarta
  • e.g., Maya
• Visitors
  • e.g., staggered visitors, extensible visitors

23

Conclusions
• Several Java extensions have been implemented with Polyglot
• Programmer effort scales well with size of difference with Java
• Extension objects and delegate objects provide scalable extensibility
• Download from: http://www.cs.cornell.edu/projects/polyglot

24

Acknowledgments

Brandon Bray       JMatch
Michael Brukman    PPG
Steve Chong        Jif, Jif/split, covariant return
Matt Harren        JMatch
Aleksey Kliger     JLtools, PolyJ
Jed Liu            JMatch
Naveen Sastry      JLtools
Dan Spoonhower     JLtools
Steve Zdancewic    Jif, Jif/split
Lantian Zheng      Jif, Jif/split

http://www.cs.cornell.edu/projects/polyglot

Questions?

26

Mixin extensibility

[Figure: class hierarchy, before and after. Node, If (cond, then, else), and Add (lhs, rhs) declare typeCheck() and codeGen(); after the change all three also declare rewrite().]

Inheritance does not provide mixin extensibility: when a base class is extended, subclasses should inherit the changes

27

Other Polyglot features
• Quasi-quoting library
  • Useful for translation from extension language AST to base language or intermediate language AST

    qqStmt("if (%e.relabelsTo(%e)) %s; else %s;",
           new Object[] { L, Li, then_body, else_body });

• Automatic separate compilation
  • Serialize type information and store in the AST
  • Encoded into the class file via javac
  • Extracted from class file using reflection
• Data-flow analysis framework

28

PAO rewriting

• rewrite(ts) called for each AST node:

  class PaoExt extends Ext {
    // Default: identity translation.
    Node rewrite(PaoTypeSystem ts) { return node(); }
  }

  class PaoInstanceofExt extends PaoExt {
    Node rewrite(PaoTypeSystem ts) {
      Instanceof e = (Instanceof) node();
      Type rtype = e.compareType();
      // e.g., "e instanceof int" becomes "e instanceof Integer"
      if (rtype.isPrimitive())
        return e.compareType(ts.boxedType(rtype));
      else
        return e;  // not a primitive type: leave the node unchanged
    }
  }

29

Node factories
• Each extension has a node factory (nf)
• To create a node of type T, call method nf.T()
• T() may return an extension-specific subclass of T
• T() attaches the extension and delegate objects to T via a call to extT()
• Mixin extensibility: if T is a subclass of S, then extT() will call extS()
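The extT()/extS() chaining in the last bullet can be sketched like this. The sketch is hypothetical and much reduced (no delegate objects, no constructor arguments); FactorySketch, Expr, and PaoNodeFactory are names assumed for illustration, with only the nf.T() and extT() naming pattern taken from the slide. Each extT() hook defers to the hook for T's superclass, so overriding the hook for Node alone installs an extension object in every node type.

```java
// Hypothetical sketch of a node factory with chained ext hooks.
final class FactorySketch {
    static class Node { Object ext; }
    static class Expr extends Node {}
    static class Instanceof extends Expr {}
    static class Add extends Expr {}

    static class PaoExt {}
    static class PaoInstanceofExt extends PaoExt {}

    // Base factory: T() creates a node and passes it through extT().
    static class NodeFactory {
        Instanceof Instanceof() { return extInstanceof(new Instanceof()); }
        Add Add() { return extAdd(new Add()); }

        // Each hook calls the hook of the superclass node type.
        <N extends Node> N extNode(N n) { return n; }
        <N extends Expr> N extExpr(N n) { return extNode(n); }
        Instanceof extInstanceof(Instanceof n) { return extExpr(n); }
        Add extAdd(Add n) { return extExpr(n); }
    }

    // Extension factory: one override covers all nodes, one specializes.
    static class PaoNodeFactory extends NodeFactory {
        @Override <N extends Node> N extNode(N n) {
            n.ext = new PaoExt(); // default extension for every node type
            return n;
        }
        @Override Instanceof extInstanceof(Instanceof n) {
            n.ext = new PaoInstanceofExt(); // specialized extension
            return n;
        }
    }
}
```

Here Add receives the default PaoExt without PaoNodeFactory mentioning Add at all, because extAdd() reaches the overridden extNode() through extExpr().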

30

Results
• Can build small extensions in hours, days
• 10% of base compiler code is interfaces and factories

Extension                        Lexer   Parser   Total   % of Base
javac 1.1 (excl. bytecode asm)                    119K    72
Polyglot base                    2.7K    11K      166K    100
Jif                              2.7K    5.4K     129K    78
JMatch                           2.7K    14K      108K    65
PolyJ                            2.7K    3.5K     79K     48
Coffer                           2.7K    2.7K     24K     14
PAO                              2.7K    169      6.1K    3.6
Param                            0       0        3.2K    2
covariant return                 0       0        1.6K    1

31

Why not output bytecode?
• Wanted to be able to read the output
• The symmetry is satisfying
• Limitations of Java as a target language
  • Scoping rules sometimes make it difficult to output Java code, especially with inner classes
  • Lack of goto can make generated control flow inefficient

32

[Figure: compiler phases labeled name resolution, semantic checking, and translation.]