generative programming (mostly parser generation)

58
Generative Programming Prof. Dr. Ralf Lämmel Universität Koblenz-Landau Software Languages Team Mainly with applications to language processing

Upload: ralf-laemmel

Post on 06-May-2015

656 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Generative programming (mostly parser generation)

Generative Programming

Prof. Dr. Ralf LämmelUniversität Koblenz-LandauSoftware Languages Team

Mainly with applications to language processing

Page 2: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Program generation

Specification ProgramProgram generator

Page 3: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Parser generation

Grammarwith some

codeParser

Program generator

Page 4: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Generative programming

Generative programming is a style of computer programming that uses automated source code creation through generic frames, classes, prototypes, templates, aspects, and code generators to improve programmer productivity ... It is often related to code-reuse topics such as component-based software engineering and product family engineering.

Retrieved from Wikipediaon 3 May, 2011.

Page 5: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Language processing patterns

manual patterns

generative patterns

1. The Chopper Pattern2. The Lexer Pattern3. The Copy/Replace Pattern4. The Acceptor Pattern5. The Parser Pattern6. The Lexer Generation Pattern7. The Acceptor Generation Pattern8. The Parser Generation Pattern9. The Text-to-object Pattern10.The Object-to-text Pattern11.The Text-to-tree Pattern12.The Tree-walk Pattern13.The Parser Generation2 Pattern

Page 6: Generative programming (mostly parser generation)

The Lexer Generation Pattern

See the Lexer Patternfor comparison

Page 7: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Lexer Generation Pattern

Intent:

Analyze text at the lexical level.

Use token-level grammar as input.Operational steps (compile time):

1. Generate code from the lexer grammar.

2. Write boilerplate code to drive the lexer.

3. Write code to process token/lexeme stream.

4. Build generated and hand-written code.

Page 8: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Token-level grammar for 101

COMPANY : 'company';DEPARTMENT : 'department';EMPLOYEE : 'employee';MANAGER : 'manager';ADDRESS : 'address';SALARY : 'salary';OPEN : '{';CLOSE : '}';STRING : '"' (~'"')* '"';FLOAT : ('0'..'9')+ ('.' ('0'..'9')+)?;WS : (' '|'\r'? '\n'|'\t')+;

Page 9: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Lexer generation

....g ....javaANTLR

Page 10: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

ANTLR

“ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting.”

http://www.antlr.org/

Page 11: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Boilerplate code for lexer execution

FileInputStream stream = new FileInputStream(file); ANTLRInputStream antlr = new ANTLRInputStream(stream); MyLexer lexer = new MyLexer(antlr); Token token; while ((token = lexer.nextToken()) !=Token.EOF_TOKEN) {

...}

Grammar-derived class

Page 12: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrLexer

Page 13: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Define regular expressions for all tokens.Generate code for lexer.Set up input and token/lexeme stream.Implement operations by iteration over pairs.Build generated and hand-written code.

Summary of implementation aspects

Page 14: Generative programming (mostly parser generation)

The Acceptor Generation Pattern

See the Acceptor Pattern

for comparison

Page 15: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Acceptor Generation Pattern

Intent:Verify syntactical correctness of input.Use context-free grammar as input.Operational steps (compile time):

1. Generate code from the context-free grammar.

2. Write boilerplate code to drive the acceptor (parser).

3. Build generated and hand-written code.

Page 16: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

company : 'company' STRING '{' department* '}' EOF; department : 'department' STRING '{' ('manager' employee) ('employee' employee)* department* '}';

employee : STRING '{' 'address' STRING 'salary' FLOAT '}';

Wanted: a generated acceptor

Context-free grammar for 101

Page 17: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Parser generation

.g

...Lexer.java

...Parser.java

ANTLR

Page 18: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Driver for acceptor

import org.antlr.runtime.*;

public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); }}

This is basically boilerplate code. Copy and Paste!

Page 19: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Driver for acceptor

import org.antlr.runtime.*;

public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); }}

Prepare System.in as ANTLR input

stream

Page 20: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Driver for acceptor

import org.antlr.runtime.*;

public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); }} Set up lexer

for language

Page 21: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Driver for acceptor

import org.antlr.runtime.*;

public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); }} Obtain token

stream from lexer

Page 22: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Driver for acceptor

import org.antlr.runtime.*;

public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); }} Set up parser and

feed with token stream of lexer

Page 23: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Driver for acceptor

import org.antlr.runtime.*;

public class Test { public static void main(String[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new ...Lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); ...Parser parser = new ...Parser(tokens); parser....(); }} Invoke start

symbol of parser

Page 24: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrAcceptor

Page 25: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Define regular expressions for all tokens.

Define context-free productions for all nonterminals.

Generate code for acceptor (parser).

Set up input and token/lexeme stream.

Feed parser with token stream.

Acceptor (parser) execution = call start symbol.

Build generated and hand-written code.

Summary of implementation aspects

Page 26: Generative programming (mostly parser generation)

The Parser Generation Pattern

See the Parser Patternfor comparison

Page 27: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Parser Generation Pattern

Intent:Enable structure-aware functionality.

Use context-free grammar as the primary input.Operational steps (compile time):1. Include semantic actions into context-free grammar.

2. Generate code from enhanced grammar.

3. Write boilerplate code to drive the parser.

4. Build generated and hand-written code.

Page 28: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

A grammar rule with a semantic action

employee : STRING '{' 'address' STRING 'salary' s=FLOAT { total += Double.parseDouble($s.text); } '}';

Semantic actions are enclosed

between braces.

Variable local to parser execution

Retrieve and parse lexeme for the number

Page 29: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrParser

Page 30: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Parser descriptions can also use “synthesized” attributes.

Add synthesized attributes to nonterminals.Add semantic actions to productions.

expr returns [int value] : e=multExpr {$value = $e.value;} ( '+' e=multExpr {$value += $e.value;} | '-' e=multExpr {$value -= $e.value;} )*;

(Context-free part in bold face.)

Compute an int value

Perform addition & Co.

Page 31: Generative programming (mostly parser generation)

The Text-to-object Pattern

Use parsing for AST construction

Page 32: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Text-to-object Pattern

Intent:Enable structure-aware OO programs.

Build AST objects along parsing.Operational steps (compile time):1. Define object model for AST.

2. Instantiate Parser Generation pattern as follows:

Use semantic actions for AST construction.

Page 33: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Acceptor---for comparison

company : 'company' STRING '{'

department*

'}' EOF;

Page 34: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

With semantic actions for object construction included

company returns [Company c]: { $c = new Company(); } 'company' STRING { $c.setName($STRING.text); } '{' ( topdept= department { $c.getDepts().add($topdept.d); } )* '}' ;

Construct company object.

Set name of company.

Add all departments.

Page 35: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrObjects

Page 36: Generative programming (mostly parser generation)

The Object-to-text Pattern

See the Copy/Replace Pattern

for comparison

Page 37: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Text-to-object Pattern

Intent:Map abstract syntax (in objects) to concrete syntax.

Operational steps (run/compile time):This is ordinary OO programming.Walk the object model for the AST.Generate output via output stream.

Page 38: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Methods for pretty printing

public void ppCompany(Company c, String s) throws IOException { writer = new OutputStreamWriter(new FileOutputStream(s)); write("company"); space(); write(c.getName()); space(); write("{"); right(); nl(); for (Department d : c.getDepts()) ppDept(d); left(); indent(); write("}"); writer.close(); }

Undo indentation

Indent

Write into OutputStream

Pretty print substructures

Page 39: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrObjects

Page 40: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Discussion

We should assume Visitor pattern on object model.Challenges:

How to ensure that correct syntax is generated?How can we maintain layout of input?

Alternative technology / technique options:Template processorsPretty printing libraries.

Page 41: Generative programming (mostly parser generation)

The Text-to-tree Pattern

Use trees rather than POJOsfor AST construction

along parsing

Mentioned in passing

Page 42: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Text-to-tree Pattern

Intent:

Enable structure-aware grammar-based programs.

Build AST trees along parsing.

Use grammar with tree construction actions as input.

Operational steps (compile time):1. Author grammar with tree construction actions.2. Generate parser.

Advanced stuff!

Page 43: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Tree construction with ANTLR

department :  'department' name=STRING '{'     manager    ('employee' employee)*    department*  '}'  -> ^(DEPT $name manager employee* department*)  ;

ANTLR’s special semantic actions for tree construction

Page 44: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrTrees

Page 45: Generative programming (mostly parser generation)

The Tree-walk Pattern

Mentioned in passing

Page 46: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Tree-walk Pattern

Intent:

Operate on trees in a grammar-based manner.

Productions describe tree structure.

Semantic actions describe actions along tree walking.

Operational steps (compile time):1. Author tree grammar (as opposed to CFG).2. Add semantic actions.3. Generate and build tree walker.

Advanced stuff!

Page 47: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Total: Tree walking with ANTLRcompany :  ^(COMPANY STRING dept*)  ;  dept :  ^(DEPT STRING manager employee* dept*)  ;    manager :   ^(MANAGER employee)  ;     employee :  ^(EMPLOYEE STRING STRING FLOAT)  { total += Double.parseDouble($FLOAT.text); }  ;

Matching trees rather than

parsing text.

Semantic actions as before.

Page 48: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Cut: Tree walking with ANTLR

Match trees

Rebuild tree

topdown : employee;        employee :  ^(EMPLOYEE STRING STRING s=FLOAT)  -> ^(EMPLOYEE STRING STRING FLOAT[ Double.toString( Double.parseDouble( $s.text) / 2.0d)])  ;

Tree walking strategy

Page 49: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:antlrTrees

Page 50: Generative programming (mostly parser generation)

The Parser Generation2 Pattern

(That is, the Parser Generation Generation Pattern.)

See the Parser Pattern

andthe Parser Generation Pattern

for comparison

Page 51: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The Parser Generation2 Pattern

Intent:

Enable structure-aware OO programs.

Build AST objects along parsing.

Use idomatic context-free grammar as the only input.Operational steps (compile time):1. Author idiomatic context-free grammar.

2. Generate object model from context-free grammar.

3. Generate enhanced grammar for AST construction.

4. Generate parser and continue as before.

Page 52: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

More declarativeparser generation

idiomaticgrammar

Program generator

....g

....java

object model

Page 53: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Idiomatic grammar for 101

Company : ...;

Department : 'department' dname=QqString '{' " 'manager' manager=Employee"" subdepartments=Department*

" " employees=NonManager* " '}';

NonManager : ...;Employee : ...;

Label all nonterminal occurrences

Leverage predefined token-level grammar

Page 54: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Demo

http://101companies.org/wiki/Contribution:yapg

Page 55: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

Bootstrapping1. Develop ANTLR-based parser for idiomatic grammars.2. Develop object model for ASTs for idiomatic grammars.3. Develop ANTLR generator.

4. Develop object model generator.5. Define idiomatic grammar of idiomatic grammars.

6. Generate grammar parser from said grammar.7. Generate object model from said grammar.8. Remove (1.) and (2.).

Page 56: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

The grammar of grammars

Grammar " " : prods=Production*;

Production " : lhs=Id ':' rhs=Expression ';';

Expression ": Sequence | Choice;

Sequence "" : list=Atom*;

Choice "" " : first=Id rest=Option*;

Atom" " " " : Terminal | Nonterminal | Many;

Terminal "" : symbol=QString;

Nonterminal ": label=Id '=' symbol=Id;

Option "" " : '|' symbol=Id;

Many " " " : elem=Nonterminal '*';

Page 57: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

SummaryLanguage processing is a programming domain.

Grammars are program specifications in that domain.

Those specifications may be enhanced semantically.

Those specifications may also be constrained idiomatically.

Manual implementation of such specifications is not productive.

Parser generators automatically implement such specifications.

Parser generation is an instance of generative programming.

Page 58: Generative programming (mostly parser generation)

(C) 2010-2013 Prof. Dr. Ralf Lämmel, Universität Koblenz-Landau (where applicable)

A development cyclefor language-based tools

1. Author language samples.

2. Complete samples into test cases.

3. Author the EBNF for the language.

4. Refine EBNF into acceptor (using generation or not).

5. Build and test the acceptor.

6. Extend the acceptor with semantic actions.

7. Build and test the language processor.

Some bits of software engineering