Bernd Fischer (bfischer@cs.sun.ac.za)

RW713: Compiler and Software Language Engineering

Post on 19-Jan-2016



Parsing and Unparsing... in a Broad Sense

Vadim Zaytsev, Anya Helene Bagge: Parsing in a Broad Sense. MoDELS’14, LNCS 8767, pp. 50-67, Springer, 2014.

Program Representations

SLE tools use different program representations at different abstraction levels:

• textual: strings, tokens, ...
• structural: parse trees, ASTs, ...
• graphical: vector drawings, graphs, UML models, ...

Different representations are typically connected by pairs of bidirectional transformations:

[Figure: text and AST, connected by parsing (text → AST) and unparsing (AST → text)]

Textual Representations

• unstructured string of individual characters

    f arg = arg +1;

• flat sequence of strings (lexemes)
  – incl. spaces, line breaks, comments etc. (layout)

    f ' ' arg ' ' = ' ' arg ' ' + 1 ;

• flat sequence of typed tokens
  – with attributes and lexemes but without layout

    id(f) id(arg) = id(arg) + num(1) ;

• structured sequence of typed token groups

    id(f) id(arg) = id(arg) + num(1) ;

Structural Representations

• set of alternative parse trees (parse forest), joined by “ambiguity nodes”
• parse tree (incl. layout)
• concrete syntax tree (without layout)
• abstract syntax tree
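The step from parse tree to concrete syntax tree to AST can be sketched as follows. This is only an illustration: the tree shape for `f arg = arg +1;`, the node labels `Def` and `Plus`, and the set `LITERALS` are hypothetical and do not come from the slides.

```python
# Trees are (label, children) tuples; leaves are lexeme strings.
# The labels "Def"/"Plus" and the LITERALS set are assumptions for illustration.
LITERALS = {"=", "+", ";"}

def is_layout(leaf):
    """A leaf counts as layout if it consists of whitespace only."""
    return isinstance(leaf, str) and leaf.isspace()

def to_cst(tree):
    """Parse tree (incl. layout) -> concrete syntax tree (without layout)."""
    if isinstance(tree, str):
        return tree
    label, kids = tree
    return (label, [to_cst(k) for k in kids if not is_layout(k)])

def implode(tree):
    """Concrete syntax tree -> AST: drop literal leaves such as '=' and ';'."""
    if isinstance(tree, str):
        return tree
    label, kids = tree
    return (label, [implode(k) for k in kids
                    if not (isinstance(k, str) and k in LITERALS)])

parse_tree = ("Def", ["f", " ", "arg", " ", "=", " ",
                      ("Plus", ["arg", " ", "+", "1"]), ";"])
cst = to_cst(parse_tree)
ast = implode(cst)
print(cst)
print(ast)
```

Each step discards information (first layout, then literals), which is why the reverse directions need extra input such as a grammar or formatting rules.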

Graphical Representations

• rasterized picture
• vector graph (drawing)
• generic graph
• abstract graph model

All representations can be merged into one “Mega-Model”.

Tokenization

Definition: A tokenizer tok_L : Str → Tok for a lexical grammar L maps a character sequence c1, ..., cn to a token sequence w1, ..., wk so that their concatenations are equal (i.e., c1 + ... + cn = w1 + ... + wk). Its reverse operation is concat. tok_L and concat satisfy the following equations:

∀x ∈ Str: concat(tok_L(x)) = x

∀y ∈ Tok: tok_L(concat(y)) = y

concat is language-independent.
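A minimal sketch of the two equations, assuming a made-up lexical grammar (identifiers, numbers, whitespace, single-character symbols). The regular expression `TOKEN_RE` and the function names `tok` and `concat` are illustrative choices, not part of the slides.

```python
import re
from typing import List

# Hypothetical lexical grammar L: identifier | number | layout | any single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)

def tok(text: str) -> List[str]:
    """Split text into lexemes; layout is kept, so no character is lost."""
    return TOKEN_RE.findall(text)

def concat(tokens: List[str]) -> str:
    """Language-independent reverse operation: join the lexemes."""
    return "".join(tokens)

source = "f arg = arg +1;"
tokens = tok(source)
print(tokens)

# The two equations from the definition, checked on this one input.
assert concat(tokens) == source
assert tok(concat(tokens)) == tokens
```

Note that only `tok` depends on the lexical grammar; `concat` never looks at token types.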

Adding/Removing Layout

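Definition: A strip operation strip : Tok → TTk removes layout information, while a format operation format_L : TTk → Tok introduces it. strip and format_L satisfy the following equation:

∀x ∈ TTk: strip(format_L(x)) = x

What about

∀y ∈ Tok: format_L(strip(y)) = y ?

This does not hold in general: strip is language-independent but not injective (token sequences that differ only in layout are mapped to the same result), so format_L cannot restore the original layout. Strip and format operations also exist for trees and graphs.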
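The asymmetry between the two equations can be checked on a small example. In this sketch tokens are plain lexeme strings rather than typed tokens, and `format_l` is a hypothetical formatter that puts exactly one space between tokens; both simplifications are assumptions made for illustration.

```python
import re
from typing import List

# Same hypothetical lexer as a tokenizer would use: layout lexemes are kept.
LEXEME_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)

def strip(tokens: List[str]) -> List[str]:
    """Remove layout: drop every whitespace-only lexeme."""
    return [t for t in tokens if not t.isspace()]

def format_l(tokens: List[str]) -> List[str]:
    """Hypothetical formatter: insert a single space between adjacent tokens."""
    out: List[str] = []
    for i, t in enumerate(tokens):
        if i > 0:
            out.append(" ")
        out.append(t)
    return out

y = LEXEME_RE.findall("f arg = arg +1;")   # tokens with the original layout
x = strip(y)                               # tokens without layout

assert strip(format_l(x)) == x             # holds: formatting only adds layout
assert format_l(strip(y)) != y             # fails to round-trip: "+1;" had no spaces
```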

Parsing/Unparsing

[Figure: parsing/unparsing between representations; annotations: “keeps layout”, “ignores layout”]

Imploding/Exploding ASTs


Pretty-Printing

What is Pretty-Printing?


A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)

... so that the output is “pretty”.

What is Pretty?

[Example: the program var x:integer; y:char; begin x := 1; y := ’a’; end laid out for text widths w = 128, w = 40, and w = 30, followed by alternative layouts of the same program with different line breaks and indentation.]

The output of a pretty printer is “pretty” if
• line breaks and indentation represent logical structure
• line breaks and indentation are used consistently
• line breaks are minimized

Pretty-Printing Architecture

Bad ideas:
• print text during AST traversal
• post-processing on raw text

Instead:
• generate a (new) AST containing text and mark-up
  – the mark-up provides layout hints
• interpret the mark-up AST and generate raw text

(source) AST → unparsing (syntax-directed) → (mark-up) AST → layout (language-independent) → raw text

Oppen-style Mark-up

Oppen’s (core) algorithm uses two mark-up elements:
• blanks: positions where a line can be broken
  – can denote number of indentation spaces
• groups: sequences of elements that are printed on one line, if possible; otherwise each element is printed on its own line
  – represented as pair of opening and closing brackets
  – any two elements must be separated by a blank
  – blanks can be “inconsistent”, i.e., the printer tries to fit as many elements as possible on one line before breaking

Examples:

    [[var blank(2) [x:integer; blank(0) y:char;]]
     blank(0)
     [begin blank(2) [x := 1; blank(0) y := ’a’;] blank(0) end]]

vs.

    [[var x:integer; blank(4) y:char;]
     blank(0)
     [begin blank(0) [x := 1; blank(2) y := ’a’;] blank(0) end]]
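The group/blank mark-up can be interpreted with a small recursive layouter. This is a simplified sketch, not Oppen's actual streaming algorithm: it handles only consistent breaks, and it assumes that `blank(n)` indents by n spaces relative to the start column of the enclosing group. The class names `Blank` and `Group` and the function `render` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Blank:
    indent: int              # assumed: extra indentation if the line is broken here

@dataclass
class Group:
    items: List["Item"]      # printed on one line if it fits, otherwise broken at every blank

Item = Union[str, Blank, Group]

def flat_width(item) -> int:
    """Width of an item when printed on a single line (an unbroken blank is one space)."""
    if isinstance(item, str):
        return len(item)
    if isinstance(item, Blank):
        return 1
    return sum(flat_width(i) for i in item.items)

def render(doc: Group, width: int) -> str:
    out: List[str] = []
    col = 0

    def emit(item, base: int, broken: bool) -> None:
        nonlocal col
        if isinstance(item, str):
            out.append(item)
            col += len(item)
        elif isinstance(item, Blank):
            if broken:                       # enclosing group did not fit
                col = base + item.indent
                out.append("\n" + " " * col)
            else:
                out.append(" ")
                col += 1
        else:                                # each group decides for itself
            fits = col + flat_width(item) <= width
            start = col
            for child in item.items:
                emit(child, start, not fits)

    emit(doc, 0, False)
    return "".join(out)

# First mark-up example from the slide.
doc = Group([
    Group(["var", Blank(2), Group(["x:integer;", Blank(0), "y:char;"])]),
    Blank(0),
    Group(["begin", Blank(2), Group(["x := 1;", Blank(0), "y := 'a';"]),
           Blank(0), "end"]),
])

for w in (128, 40, 20):
    print(f"w = {w}:")
    print(render(doc, w))
```

With w = 128 everything stays on one line, with w = 40 only the outermost group is broken, and with w = 20 the `var` and `begin ... end` groups are broken as well while the innermost groups still fit.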

Box-style Mark-up

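Uses boxes (similar to Oppen’s groups)
• basic boxes
  – plain strings: "foo"
  – keywords: KW [ "foo" ]
  – subtrees: _1
• horizontal boxes: H hs=x [ B B B ] places the boxes side by side, separated by horizontal space x
• vertical boxes: V vs=y is=i [ B B B ] stacks the boxes, with vertical space y and indentation i
• more: HV, HOV, I, ALT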
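A minimal sketch of how H and V boxes compose. A rendered box is modeled as a non-empty list of text lines. The exact meaning of the options is an assumption here: `hs` is the number of spaces between horizontal neighbours, `vs` the number of blank lines between vertical neighbours, and `is_` indents every box except the first. The function names are hypothetical.

```python
from typing import List

Box = List[str]   # a rendered box: a non-empty list of text lines

def text(s: str) -> Box:
    """Basic box holding a plain string."""
    return [s]

def H(boxes: List[Box], hs: int = 1) -> Box:
    """Horizontal box: append each box to the last line of the previous one."""
    result: Box = []
    for box in boxes:
        if not result:
            result = list(box)
            continue
        col = len(result[-1]) + hs
        result[-1] += " " * hs + box[0]
        result += [" " * col + line for line in box[1:]]
    return result

def V(boxes: List[Box], vs: int = 0, is_: int = 0) -> Box:
    """Vertical box: stack boxes; all but the first are indented by is_ (assumed semantics)."""
    result: Box = []
    for i, box in enumerate(boxes):
        if i > 0:
            result += [""] * vs
        pad = " " * (is_ if i > 0 else 0)
        result += [pad + line for line in box]
    return result

stmt = H([text("x"), text(":="), text("1;")])
block = V([V([text("begin"), stmt], is_=2), text("end")])
print("\n".join(block))
```

Because boxes are plain values, the layout step stays independent of the language being printed; only the code that builds the boxes knows the syntax.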

Pretty-print tables can be generated.

Grammar productions:

    "if" Exp "then" Exp -> Exp {"IfThen"}
    "let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}

Generated table entries:

    Exp.IfThen -- KW["if"] _1 KW["then"] _2,
    Exp.Let -- KW["let"] _1 KW["in"] _2 KW["end"],
    Exp.Let.1:iter-star -- _1,
    Exp.Let.2:iter-star-sep -- _1 KW[";"]

Pretty-print tables can be modified.

Grammar productions:

    "if" Exp "then" Exp -> Exp {"IfThen"}
    "let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}

Modified table entry:

    Exp.Let -- V vs=1 is=0 [
                 V vs=1 is=2 [KW["let"] _1]
                 V vs=1 is=2 [KW["in"] _2]
                 KW["end"]
               ]
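How a pretty-print table drives unparsing can be sketched with a dictionary keyed by constructor name. The encoding below is an assumption made for illustration: `("KW", s)` stands for `KW["s"]`, an integer n stands for the subtree `_n`, and `LIST_SEP` plays the role of the `iter-star-sep` entry. The AST shape and the names `PP_TABLE`, `LIST_SEP`, and `unparse` are hypothetical, and layout (boxes) is left out.

```python
from typing import List

# Table entries mirroring Exp.IfThen and Exp.Let from the generated table.
PP_TABLE = {
    "IfThen": [("KW", "if"), 1, ("KW", "then"), 2],
    "Let":    [("KW", "let"), 1, ("KW", "in"), 2, ("KW", "end")],
}

# Separator for list-valued subtrees, mirroring Exp.Let.2:iter-star-sep.
LIST_SEP = {("Let", 2): ";"}

def unparse(node) -> List[str]:
    """Turn an AST node (constructor, children) into a flat list of tokens."""
    if isinstance(node, str):                 # leaf: already a lexeme
        return [node]
    cons, children = node
    out: List[str] = []
    for item in PP_TABLE[cons]:
        if isinstance(item, tuple):           # keyword entry
            out.append(item[1])
            continue
        child = children[item - 1]            # _n refers to the n-th subtree
        if isinstance(child, list):           # iterated subtree (Dec*, {Exp ";"}*)
            sep = LIST_SEP.get((cons, item))
            for k, elem in enumerate(child):
                if k > 0 and sep is not None:
                    out.append(sep)
                out += unparse(elem)
        else:
            out += unparse(child)
    return out

ast = ("Let", [["var x", "var y"], [("IfThen", ["c", "x"]), "y"]])
print(" ".join(unparse(ast)))
```

Replacing a table entry changes the output without touching `unparse`, which is the point of keeping the table separate from the traversal.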

Further Reading

• Derek C. Oppen: Prettyprinting. ACM Trans. Program. Lang. Syst. 2(4):465-483 (1980).

• Philip Wadler: A prettier printer. In: The Fun of Programming. A symposium in honour of Professor Richard Bird's 60th birthday, Examination Schools, Oxford, 24-25 March 2003. http://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf

• Mark van den Brand, Eelco Visser: Generation of Formatters for Context-Free Languages. ACM Trans. Softw. Eng. Methodol. 5(1):1-41 (1996).

• Tobi Vollebregt, Lennart C. L. Kats, Eelco Visser: Declarative specification of template-based textual editors. LDTA 2012: 8
