Bernd Fischer bfischer@cs.sun.ac.za RW713: Compiler and Software Language Engineering


Page 1: Bernd Fischer bfischer@cs.sun.ac.za RW713: Compiler and Software Language Engineering



Parsing and Unparsing... in a Broad Sense

Vadim Zaytsev, Anya Helene Bagge: Parsing in a Broad Sense. MoDELS'14, LNCS 8767, pp. 50-67, Springer, 2014.


Program Representations

SLE tools use different program representations at different abstraction levels:

• textual: strings, tokens, ...
• structural: parse trees, ASTs, ...
• graphical: vector drawings, graphs, UML models, ...

Different representations are typically connected by pairs of transformations that run in opposite directions:

text --parsing--> AST
AST --unparsing--> text


Textual Representations

• unstructured string of individual characters

    f arg = arg +1;

• flat sequence of strings (lexemes)
  – incl. spaces, line breaks, comments etc. (layout)

    f ' ' arg ' ' = ' ' arg ' ' + 1 ;

• flat sequence of typed tokens
  – with attributes and lexemes but without layout

    id(f) id(arg) = id(arg) + num(1) ;

• structured sequence of typed token groups

    id(f) id(arg) = id(arg) + num(1) ;

Structural Representations

• set of alternative parse trees (parse forest), with the alternatives joined at an “ambiguity node”
• parse tree (incl. layout)
• concrete syntax tree (without layout)
• abstract syntax tree


Graphical Representations

• rasterized picture
• vector graph (drawing)
• generic graph
• abstract graph model


All representations can be merged into one “Mega-Model”.


Tokenization

Definition: A tokenizer tok_L : Str → Tok for a lexical grammar L maps a character sequence c1,...,cn to a token sequence w1,...,wk so that their concatenations are equal (i.e., c1+...+cn = w1+...+wk). Its reverse operation is concat, which is language-independent. tok_L and concat satisfy the following equations:

∀x ∈ Str: concat(tok_L(x)) = x

∀y ∈ Tok: tok_L(concat(y)) = y
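The two equations can be checked on the running example with a small sketch. The lexical grammar below (identifiers, numbers, whitespace, single-character symbols) is an assumption made for illustration, and the names `LEXEME`, `tok` and `concat` are hypothetical; the slides do not prescribe an implementation.

```python
import re

# Hypothetical lexical grammar for the running example.
# Whitespace is kept as a lexeme, so no character is ever dropped.
LEXEME = re.compile(r"\s+|[A-Za-z_]\w*|\d+|.", re.DOTALL)

def tok(text: str) -> list[str]:
    """Split text into lexemes, including layout."""
    return LEXEME.findall(text)

def concat(lexemes: list[str]) -> str:
    """Language-independent reverse operation: glue lexemes together."""
    return "".join(lexemes)

source = "f arg = arg +1;"
lexemes = tok(source)

# First equation: concatenating the lexemes gives back the input.
roundtrip_text = concat(lexemes)

# Second equation: re-tokenizing the concatenation gives back the lexemes.
roundtrip_lexemes = tok(roundtrip_text)
```

In this sketch the second equation holds for sequences that the tokenizer itself can produce; an arbitrary list such as `["a", "b"]` would be re-tokenized as the single lexeme `"ab"`.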


Adding/Removing Layout

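Definition: A strip operation strip : Tok → TTk removes layout information, while a format operation format_L : TTk → Tok introduces it. strip and format_L satisfy the following equation:

∀x ∈ TTk: strip(format_L(x)) = x

What about

∀y ∈ Tok: format_L(strip(y)) = y ?

This does not hold in general: strip is not injective, so token sequences that differ only in layout are mapped to the same result, and format_L cannot recover the original layout.

• strip is language-independent
• strip is not injective
• the same operations exist for trees and graphs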
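A minimal sketch of the two operations on lexeme lists illustrates why only one of the two equations holds. The formatting policy used here (one space between any two tokens) is just one possible choice of format operation, and the names `strip` and `format_tokens` are hypothetical.

```python
def strip(tokens: list[str]) -> list[str]:
    """Remove layout: drop every lexeme that consists only of whitespace."""
    return [t for t in tokens if not t.isspace()]

def format_tokens(tokens: list[str]) -> list[str]:
    """One possible format operation: a single space between tokens."""
    out: list[str] = []
    for i, t in enumerate(tokens):
        if i > 0:
            out.append(" ")
        out.append(t)
    return out

with_layout = ["f", " ", "arg", " ", "=", " ", "arg", " ", "+", "1", ";"]
without_layout = strip(with_layout)

# strip(format(x)) = x holds for every layout-free sequence x.
holds = strip(format_tokens(without_layout)) == without_layout

# format(strip(y)) = y fails here: the original had no space in "+1;".
reformatted = format_tokens(strip(with_layout))
```

Because `strip` maps many differently laid-out sequences to the same layout-free sequence, no choice of format operation can make the second equation hold for all inputs.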


Parsing/Unparsing

[Figure: parsing and unparsing between text and trees; one variant keeps layout, the other ignores layout.]


Imploding/Exploding ASTs


Pretty-Printing


What is Pretty-Printing?

A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)

... so that the output is “pretty”


What is Pretty?

w = 128:

var x:integer; y:char; begin x := 1; y := 'a'; end

w = 40 and w = 30:

[Figure: alternative layouts of the same program, broken to fit widths 40 and 30; the line breaks and indentation of the figure are not preserved in this transcript.]


What is Pretty-Printing?

A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)

... so that
• line breaks and indentation represent logical structure
• line breaks and indentation are used consistently
• line breaks are minimized


Pretty-Printing Architecture

Bad Ideas:
• print text during AST traversal
• post-processing on raw text

Instead:
• generate (new) AST containing text and mark-up for layout hints
• interpret mark-up AST and generate raw text

(source) AST --unparsing--> (mark-up) AST --layout--> raw text

• unparsing is syntax-directed
• layout is language-independent


Oppen-style Mark-up

Oppen’s (core) algorithm uses two mark-up elements:
• blanks: positions where a line can be broken
  – can denote number of indentation spaces
• groups: sequences of elements that are printed on one line, if possible; otherwise each element is printed on its own line
  – represented as pair of opening and closing brackets
  – any two elements must be separated by a blank
  – blanks can be “inconsistent”, i.e., printer tries to fit as many elements as possible on one line before breaking


Examples:

[[var blank(2) [x:integer; blank(0) y:char;]]
 blank(0)
 [begin blank(2) [x := 1; blank(0) y := 'a';] blank(0) end]]

vs.

[[var x:integer; blank(4) y:char;]
 blank(0)
 [begin blank(0) [x := 1; blank(2) y := 'a';] blank(0) end]]
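The first mark-up example can be interpreted with a simplified layout function. This is a sketch, not Oppen's algorithm itself: it measures each group recursively instead of streaming, and it handles consistent blanks only. It also assumes that an unbroken blank prints as one space and that a broken blank(n) indents by n relative to the column where its group starts. The names `Blank`, `Group`, `flat` and `layout` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Blank:
    indent: int  # extra indentation if the line is broken at this blank

@dataclass
class Group:
    items: list  # strings, Blank values and nested Group values

def flat(item) -> str:
    """Render an item on a single line (assumed: blank = one space)."""
    if isinstance(item, str):
        return item
    if isinstance(item, Blank):
        return " "
    return "".join(flat(i) for i in item.items)

def layout(group: Group, width: int) -> str:
    out = []     # output fragments
    column = 0   # current output column

    def emit(item, base, broken):
        nonlocal column
        if isinstance(item, str):
            out.append(item)
            column += len(item)
        elif isinstance(item, Blank):
            if broken:
                # Break the line and indent relative to the group start.
                out.append("\n" + " " * (base + item.indent))
                column = base + item.indent
            else:
                out.append(" ")
                column += 1
        else:
            # A group stays on one line only if all of it fits.
            fits = column + len(flat(item)) <= width
            start = column
            for child in item.items:
                emit(child, start, not fits)

    emit(group, 0, False)
    return "".join(out)

doc = Group([
    Group(["var", Blank(2), Group(["x:integer;", Blank(0), "y:char;"])]),
    Blank(0),
    Group(["begin", Blank(2),
           Group(["x := 1;", Blank(0), "y := 'a';"]),
           Blank(0), "end"]),
])
```

With this sketch, `layout(doc, 128)` keeps the program on one line, `layout(doc, 40)` breaks only between the `var` part and the `begin` part, and narrower widths break the inner groups as well.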


Box-style Mark-up

Uses boxes (similar to Oppen’s groups)
• basic boxes
  – plain strings: "foo"
  – keywords: KW [ "foo" ]
  – subtrees: _1
• horizontal boxes: H hs=x [ B B B ]

    B B B

• vertical boxes: V vs=y is=i [ B B B ]

    B
    B
    B

• more: HV, HOV, I, ALT
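A small renderer for H and V boxes shows how the spacing options could be interpreted. It is a sketch under stated assumptions: `hs` is read as the number of spaces between neighbouring boxes, `vs` as the number of blank lines between boxes, and `is` as the indentation of every box after the first. The field is named `is_` because `is` is a Python keyword. The classes `H`, `V` and the function `render` are hypothetical and do not reproduce any particular Box tool.

```python
from dataclasses import dataclass

@dataclass
class H:
    boxes: list
    hs: int = 1   # assumed: spaces between neighbouring boxes

@dataclass
class V:
    boxes: list
    vs: int = 0   # assumed: blank lines between boxes
    is_: int = 0  # assumed: indentation of all boxes but the first

def render(box) -> list[str]:
    """Render a box (or a plain string) to a list of text lines."""
    if isinstance(box, str):
        return [box]
    parts = [p for p in (render(b) for b in box.boxes) if p]
    if not parts:
        return []
    lines = list(parts[0])
    for part in parts[1:]:
        if isinstance(box, V):
            lines.extend([""] * box.vs)
            lines.extend(" " * box.is_ + line for line in part)
        else:
            # H box: continue on the last line, keep later lines aligned.
            prefix = lines[-1] + " " * box.hs
            lines[-1] = prefix + part[0]
            lines.extend(" " * len(prefix) + line for line in part[1:])
    return lines

horizontal = render(H(["B", "B", "B"], hs=2))
vertical = render(V(["B", "B", "B"], vs=0, is_=2))
nested = render(V([H(["let", "x"]), H(["in", "y"]), "end"]))
```

Nesting is what makes the box model compositional: each box is rendered to a block of lines, and the enclosing box only decides how the blocks are placed relative to one another.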


Pretty-print tables can be generated.

"if" Exp "then" Exp -> Exp {"IfThen"}
"let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}

Exp.IfThen -- KW["if"] _1 KW["then"] _2,
Exp.Let -- KW["let"] _1 KW["in"] _2 KW["end"],
Exp.Let.1:iter-star -- _1,
Exp.Let.2:iter-star-sep -- _1 KW[";"]


Pretty-print tables can be modified.

"if" Exp "then" Exp -> Exp {"IfThen"}
"let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}

Exp.Let -- V vs=1 is=0 [
  V vs=1 is=2 [KW["let"] _1]
  V vs=1 is=2 [KW["in"] _2]
  KW["end"]
]
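How a pretty-print table drives unparsing can be sketched with a dictionary that mirrors the generated entries for IfThen and Let. The in-memory encoding is an assumption made for illustration: strings stand for KW[...] entries, integers for the placeholders _1 and _2, AST nodes are tuples of a constructor name and its children, and list children model Dec* and {Exp ";"}*. The names `TABLE`, `SEPARATORS` and `unparse` are hypothetical.

```python
# Hypothetical encoding of the table: constructor -> template.
TABLE = {
    "IfThen": ["if", 1, "then", 2],
    "Let": ["let", 1, "in", 2, "end"],
}

# Separator for list-valued children (from Exp.Let.2:iter-star-sep).
SEPARATORS = {("Let", 2): ";"}

def unparse(node) -> list[str]:
    """Turn an AST node into a flat list of words, guided by TABLE."""
    if isinstance(node, str):      # leaf: identifier or literal
        return [node]
    cons, *children = node
    words: list[str] = []
    for entry in TABLE[cons]:
        if isinstance(entry, str):  # keyword from the table
            words.append(entry)
            continue
        child = children[entry - 1]  # placeholder _n selects child n
        if isinstance(child, list):
            sep = SEPARATORS.get((cons, entry))
            for i, element in enumerate(child):
                if sep is not None and i > 0:
                    words.append(sep)
                words.extend(unparse(element))
        else:
            words.extend(unparse(child))
    return words

ast = ("Let", ["x"], [("IfThen", "a", "b"), "c"])
words = unparse(ast)
```

The sketch produces a flat word list only; replacing the templates with box expressions, as in the modified Exp.Let entry, is what adds the layout hints.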


Further Reading

• Derek C. Oppen: Prettyprinting. ACM Trans. Program. Lang. Syst. 2(4):465-483 (1980).

• Philip Wadler: A prettier printer. In: The Fun of Programming. A symposium in honour of Professor Richard Bird's 60th birthday, Examination Schools, Oxford, 24-25 March 2003. http://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf


Further Reading (II)

• Mark van den Brand, Eelco Visser: Generation of Formatters for Context-Free Languages. ACM Trans. Softw. Eng. Methodol. 5(1):1-41 (1996).

• Tobi Vollebregt, Lennart C. L. Kats, Eelco Visser: Declarative specification of template-based textual editors. LDTA 2012: 8