Abstract Syntax Trees
Compiler
Baojian Hua
Front End
source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR
Recap
Lexer: program source to token sequence
Parser: token sequence to a Y or N answer (is the program grammatical?)
Today’s topic: abstract syntax trees
Abstract Syntax Trees
Parse trees encode the grammatical structure of the source program.
However, they contain a lot of unnecessary information.
What is essential here?
[Figure: parse tree for 15 * (3 + 4): E(E(15), "*", E("(", E(E(3), "+", E(4)), ")"))]
Abstract Syntax Trees
For the compiler to understand an expression, it only needs to know the operators and operands; punctuation, parentheses, etc. are not needed.
The same holds for statements, functions, etc.
Abstract Syntax Trees
[Figure: parse tree (left) and abstract syntax tree (right) for 15 * (3 + 4)]
Abstract syntax tree: Times(Int 15, Plus(Int 3, Int 4))
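To make the point concrete, here is a minimal C sketch of the tree on the right. The names `struct exp`, `INT`, `PLUS`, `TIMES` and `eval` are illustrative assumptions, not the course's code. There is no node for the parentheses: the fact that 3 + 4 is computed first is recorded only by making the `PLUS` node a child of the `TIMES` node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal AST for integer expressions; names are illustrative. */
enum kind { INT, PLUS, TIMES };

struct exp {
    enum kind kind;
    int value;                /* used when kind == INT */
    const struct exp *left;   /* used when kind == PLUS or TIMES */
    const struct exp *right;
};

/* Grouping comes from the shape of the tree, so no parenthesis node is needed. */
int eval(const struct exp *e) {
    switch (e->kind) {
    case INT:   return e->value;
    case PLUS:  return eval(e->left) + eval(e->right);
    case TIMES: return eval(e->left) * eval(e->right);
    }
    assert(!"malformed AST");
    return 0;
}

/* Times(Int 15, Plus(Int 3, Int 4)): the AST for 15 * (3 + 4). */
static const struct exp fifteen = { INT, 15, NULL, NULL };
static const struct exp three   = { INT, 3, NULL, NULL };
static const struct exp four    = { INT, 4, NULL, NULL };
static const struct exp sum     = { PLUS, 0, &three, &four };
static const struct exp prog    = { TIMES, 0, &fifteen, &sum };
```

Evaluating `&prog` gives 15 * 7 = 105, even though the tree holds five nodes and no parentheses.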
Concrete and Abstract Syntax
Concrete syntax is needed for parsing: it includes punctuation symbols, factoring, and elimination of left recursion, and it depends on the format of the input.
Abstract syntax is simpler: a more convenient internal representation, and a clean interface between the parser and the later phases of the compiler.
Concrete and Abstract Syntax
[Figure: concrete parse tree for 2 + 3 * x under the grammar below: S(E(E(T(F(2))), "+", T(T(F(3)), "*", F(x))))]
E ::= E + T
| T
T ::= T * F
| F
F ::= id
| num
| ( E )
2 + 3 * x
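A hand-written recursive-descent parser shows how the concrete grammar drives parsing while only abstract syntax is kept. This is a sketch, not the course's parser: the functions `parse_e`, `parse_t`, `parse_f`, `parse` and `show` are hypothetical, identifiers are single letters, and error handling is reduced to `assert`. The left-recursive rules E ::= E + T and T ::= T * F are rewritten as loops (recursive descent cannot use left recursion directly), and neither the E/T/F chain nodes nor the parentheses end up in the tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical AST: one kind per alternative of E ::= id | num | E + E | E * E. */
enum kind { NUM, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    int num;              /* NUM */
    char id;              /* ID: single-letter identifiers keep the sketch short */
    struct exp *e1, *e2;  /* ADD, TIMES */
};

static const char *p;     /* cursor into the input */

static struct exp *node(enum kind k, struct exp *e1, struct exp *e2) {
    struct exp *e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->kind = k;
    e->e1 = e1;
    e->e2 = e2;
    return e;
}

static void skip(void) { while (*p == ' ') p++; }
static struct exp *parse_e(void);

/* F ::= id | num | ( E ): parentheses are consumed, only the inner tree is kept */
static struct exp *parse_f(void) {
    skip();
    if (*p == '(') {
        p++;
        struct exp *inner = parse_e();
        skip();
        assert(*p == ')');
        p++;
        return inner;
    }
    struct exp *e;
    if (isdigit((unsigned char)*p)) {
        e = node(NUM, NULL, NULL);
        while (isdigit((unsigned char)*p)) e->num = e->num * 10 + (*p++ - '0');
    } else {
        assert(isalpha((unsigned char)*p));
        e = node(ID, NULL, NULL);
        e->id = *p++;
    }
    return e;
}

/* T ::= T * F | F, left recursion rewritten as a loop building left-nested trees */
static struct exp *parse_t(void) {
    struct exp *t = parse_f();
    for (skip(); *p == '*'; skip()) {
        p++;
        t = node(TIMES, t, parse_f());
    }
    return t;
}

/* E ::= E + T | T, handled the same way */
static struct exp *parse_e(void) {
    struct exp *e = parse_t();
    for (skip(); *p == '+'; skip()) {
        p++;
        e = node(ADD, e, parse_t());
    }
    return e;
}

struct exp *parse(const char *src) {
    p = src;
    return parse_e();
}

/* Write the tree in the slides' notation; returns the length snprintf reports. */
int show(const struct exp *e, char *buf, size_t n) {
    char l[256], r[256];
    switch (e->kind) {
    case NUM: return snprintf(buf, n, "Int %d", e->num);
    case ID:  return snprintf(buf, n, "Id %c", e->id);
    default:
        show(e->e1, l, sizeof l);
        show(e->e2, r, sizeof r);
        return snprintf(buf, n, "%s(%s, %s)", e->kind == ADD ? "Plus" : "Times", l, r);
    }
}
```

For example, `show(parse("2 + 3 * x"), buf, sizeof buf)` writes `Plus(Int 2, Times(Int 3, Id x))` into `buf`, the abstract syntax tree for 2 + 3 * x.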
Concrete and Abstract Syntax
2 + 3 * x
E ::= id
| num
| E + E
| E * E
| ( E )
Abstract syntax tree: Plus(Int 2, Times(Int 3, Id x))
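The abstract grammar above is ambiguous (2 + 3 * x has two derivations), but that is harmless once the parser has produced a tree: the tree itself fixes the grouping. The small C sketch below uses hypothetical names and, for simplicity, evaluates every identifier to one given value; it builds both candidate trees and evaluates them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST matching the abstract grammar E ::= id | num | E + E | E * E. */
enum kind { NUM, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    int num;                      /* NUM */
    const struct exp *e1, *e2;    /* ADD, TIMES */
};

/* Evaluate, binding every identifier to x_value (enough for a one-variable example). */
int eval(const struct exp *e, int x_value) {
    switch (e->kind) {
    case NUM:   return e->num;
    case ID:    return x_value;
    case ADD:   return eval(e->e1, x_value) + eval(e->e2, x_value);
    case TIMES: return eval(e->e1, x_value) * eval(e->e2, x_value);
    }
    assert(!"malformed AST");
    return 0;
}

/* Two trees the ambiguous abstract grammar allows for the text 2 + 3 * x. */
static const struct exp two      = { NUM, 2, NULL, NULL };
static const struct exp three    = { NUM, 3, NULL, NULL };
static const struct exp x        = { ID, 0, NULL, NULL };
static const struct exp times3x  = { TIMES, 0, &three, &x };
static const struct exp plus23   = { ADD, 0, &two, &three };
static const struct exp intended = { ADD, 0, &two, &times3x };  /* Plus(Int 2, Times(Int 3, Id x)) */
static const struct exp other    = { TIMES, 0, &plus23, &x };   /* Times(Plus(Int 2, Int 3), Id x) */
```

With x = 4, `eval(&intended, 4)` returns 14 while `eval(&other, 4)` returns 20. Precedence therefore has to be settled while parsing, using the concrete grammar, before the AST exists.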
AST Data Structures
In the compiler, abstract syntax uses the data types of the implementation language to represent the essential parts of the grammatical structure.
The design is highly dependent on the target and implementation languages: more art than science.
AST in SML
(* data structures *)
datatype exp = Int of int
             | Id of string
             | Add of exp * exp
             | Times of exp * exp
E ::= id
| num
| E + E
| E * E
| ( E )
(* to encode "2+3*x" *)
val prog = Add (Int 2, Times (Int 3, Id "x"))
(* compile "2+3*x"; to be covered later *)
val x86 = compile (prog)
AST in SML
(* calculate the number of nodes in an AST *)
fun numNodes e =
    case e
     of Int _ => 1
      | Id _ => 1
      | Add (e1, e2) =>
          1 + numNodes e1 + numNodes e2
      | Times (e1, e2) =>
          1 + numNodes e1 + numNodes e2
(* Note: this may be too inefficient. Why? *)
AST in SML
(* tail recursion with an accumulator: the call on e2 is a tail call *)
fun numNodes (e, n) =
    case e
     of Int _ => 1 + n
      | Id _ => 1 + n
      | Add (e1, e2) =>
          let val n' = numNodes (e1, n)
          in numNodes (e2, 1 + n')
          end
      | Times (e1, e2) => … (* similar *)
AST in SML
(* yet another version, using a reference *)
val nodes = ref 0
val op ++ = fn x => x := !x + 1
fun numNodes e =
    case e
     of Int _ => ++ nodes
      | Id _ => ++ nodes
      | Add (e1, e2) => (numNodes e1; ++ nodes; numNodes e2)
      | Times (e1, e2) => … (* similar *)
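One likely answer to the question on the first version: every call that is not a tail call keeps a stack frame alive, so stack use grows with the depth of the tree, and a left-nested chain such as ((1 + 1) + 1) + … nests very deeply. The accumulator version still recurses on `e1` without a tail call, and the reference version recurses on both children. As a sketch under those assumptions (the node layout and this `numNodes` signature are illustrative, not from the course code), a C version can avoid recursion entirely with an explicit, heap-allocated stack:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout: only the tag and the two children matter here. */
enum kind { INT, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    struct exp *e1, *e2;   /* children, used only by ADD and TIMES */
};

/* Count nodes without recursion. Pending subtrees live on a heap-allocated
   stack that doubles when full, so tree depth is limited by memory rather
   than by the size of the call stack. */
long numNodes(struct exp *root) {
    size_t cap = 64, top = 0;
    struct exp **stack = malloc(cap * sizeof *stack);
    assert(stack != NULL);
    long count = 0;

    stack[top++] = root;
    while (top > 0) {
        struct exp *e = stack[--top];
        count++;
        if (e->kind == ADD || e->kind == TIMES) {
            if (top + 2 > cap) {
                cap *= 2;
                struct exp **bigger = realloc(stack, cap * sizeof *stack);
                assert(bigger != NULL);
                stack = bigger;
            }
            stack[top++] = e->e1;
            stack[top++] = e->e2;
        }
    }
    free(stack);
    return count;
}
```

The design trades call-stack depth for heap memory; nodes are visited in a different order than in the SML versions, but the count is the same.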
AST in C
/* data structures */
typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};
E ::= id
| num
| E + E
| E * E
| ( E )
AST in C
/* sample program "2+3*x" */
#include <stdlib.h>   /* malloc */
exp e1 = malloc (sizeof (*e1));
e1->kind = INT;
e1->u.i = 3;
exp e2 = malloc (sizeof (*e2));
e2->kind = ID;
e2->u.id = "x";
exp e3 = malloc (sizeof (*e3));
e3->kind = TIMES;
e3->u.times.e1 = e1;
e3->u.times.e2 = e2;
…
/* really boring and error-prone :-( */
E ::= id
| num
| E + E
| E * E
| ( E )
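A common cure for the boring, error-prone construction code is one constructor function per alternative of the grammar, so that the tag and the matching union member are always set together in one place. The helpers `newExp`, `mkInt`, `mkId`, `mkAdd` and `mkTimes` are hypothetical names; the struct layout repeats the one from the data-structure slide so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the slide's AST in C. */
typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};

/* Allocate a node and set its tag (hypothetical helper). */
static exp newExp(enum expKind kind) {
  exp e = malloc(sizeof (*e));
  assert(e != NULL);   /* a real compiler would report out-of-memory properly */
  e->kind = kind;
  return e;
}

/* One constructor per alternative of E ::= id | num | E + E | E * E */
exp mkInt(int i)   { exp e = newExp(INT); e->u.i = i;   return e; }
exp mkId(char *id) { exp e = newExp(ID);  e->u.id = id; return e; }
exp mkAdd(exp a, exp b) {
  exp e = newExp(ADD);
  e->u.add.e1 = a;
  e->u.add.e2 = b;
  return e;
}
exp mkTimes(exp a, exp b) {
  exp e = newExp(TIMES);
  e->u.times.e1 = a;
  e->u.times.e2 = b;
  return e;
}
```

With these helpers, "2+3*x" becomes a single expression that mirrors the SML version: `exp prog = mkAdd(mkInt(2), mkTimes(mkInt(3), mkId("x")));`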
AST in C
/* number of nodes again */
int numNodes (exp e) {
  switch (e->kind) {
  case INT: return 1;
  case ID: return 1;
  case ADD:
  case TIMES:
    /* add and times have the same layout, so u.add serves both */
    return 1 + numNodes(e->u.add.e1) + numNodes(e->u.add.e2);
  default:
    error ("impossible");   /* error: a reporting helper assumed to exist */
  }
}
Aha, the C compiler is stupid!
AST in OO
/* data structures */
abstract class Exp {}
class Int extends Exp {…}
class Id extends Exp {…}
class Add extends Exp {…}
class Times extends Exp {…}
E ::= id
| num
| E + E
| E * E
| ( E )
/* to encode "2+3*x" */
Exp prog = new Add (new Int (2), new Times (new Int (3), new Id ("x")));
/* Not as ugly as C, but still boring */
AST in OO
/* number of nodes again */
int numNodes (Exp e) {
  if (e instanceof Int) return 1;
  else if (e instanceof Id) return 1;
  else if (e instanceof Add) {
    Add f = (Add) e;
    return 1 + numNodes(f.e1) + numNodes(f.e2);
  }
  …
}
AST Generation
ML-Yacc uses an attribute-grammar scheme:
each nonterminal may have a semantic value associated with it
when the parser reduces with a production (X ::= s1…sn), a semantic action is executed; it uses the semantic values of the symbols si
when parsing completes successfully, the parser returns the semantic value associated with the start symbol, usually an abstract syntax tree
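The mechanism can be sketched without ML-Yacc. The C code below uses operator-precedence parsing, a simple bottom-up (shift-reduce) technique, instead of the LALR(1) tables ML-Yacc generates; all names are hypothetical and there is no error recovery beyond `assert`. Alongside the operator stack it keeps a stack of semantic values, one tree per E. Each reduce of E ::= E op E pops two trees and runs the semantic action that builds an Add or Times node, and the single value left at the end is the AST.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical AST; Num/Id/Add/Times follow the slides' datatype. */
enum kind { NUM, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    int num;              /* NUM */
    char id;              /* ID: single letters keep the sketch short */
    struct exp *e1, *e2;  /* ADD, TIMES */
};

static struct exp *mk(enum kind k, int num, char id, struct exp *e1, struct exp *e2) {
    struct exp *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = k; e->num = num; e->id = id; e->e1 = e1; e->e2 = e2;
    return e;
}

/* '*' binds tighter than '+'; both are left-associative. */
static int prec(char op) { return op == '*' ? 2 : 1; }

/* Reduce E ::= E op E: pop two semantic values and run the action building a node. */
static void reduce(struct exp **vals, int *nv, const char *ops, int *no) {
    struct exp *r = vals[--*nv];
    struct exp *l = vals[--*nv];
    char op = ops[--*no];
    vals[(*nv)++] = mk(op == '+' ? ADD : TIMES, 0, 0, l, r);
}

#define MAXSTACK 64   /* fixed bound: enough for a sketch */

struct exp *parse(const char *s) {
    struct exp *vals[MAXSTACK];   /* semantic-value stack: one tree per E */
    char ops[MAXSTACK];           /* pending operators */
    int nv = 0, no = 0;
    for (; *s; s++) {
        if (*s == ' ') continue;
        if (isdigit((unsigned char)*s)) {         /* NUM: action (Num NUM) */
            int n = 0;
            while (isdigit((unsigned char)*s)) n = n * 10 + (*s++ - '0');
            s--;                                  /* the for loop advances again */
            assert(nv < MAXSTACK);
            vals[nv++] = mk(NUM, n, 0, NULL, NULL);
        } else if (isalpha((unsigned char)*s)) {  /* ID: action (Id ID) */
            assert(nv < MAXSTACK);
            vals[nv++] = mk(ID, 0, *s, NULL, NULL);
        } else {                                  /* reduce while the stacked operator wins, then shift */
            assert(*s == '+' || *s == '*');
            while (no > 0 && prec(ops[no - 1]) >= prec(*s)) reduce(vals, &nv, ops, &no);
            assert(no < MAXSTACK);
            ops[no++] = *s;
        }
    }
    while (no > 0) reduce(vals, &nv, ops, &no);
    assert(nv == 1);
    return vals[0];   /* the start symbol's semantic value: the AST */
}
```

`parse("2 + 3 * 4")` returns Add(Num 2, Times(Num 3, Num 4)), and `parse("1 + 2 + 3")` groups to the left because an operator of equal precedence on the stack triggers a reduce before the next shift.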
Attribute Grammars
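[Figure: shift-reduce parse of 2 + 3 * 4; each symbol on the stack carries a tree as its semantic value]
stack | remaining input
(empty) | 2 + 3 * 4
2 | + 3 * 4
factor | + 3 * 4
term | + 3 * 4
exp | + 3 * 4
exp + | 3 * 4
exp + 3 | * 4
exp + factor | * 4
exp + term | * 4
Each nonterminal is associated with a tree.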
Attribute Grammars
datatype exp
  = Id of string
  | Num of int
  | Add of exp * exp
  | Times of exp * exp
%%
%%
e : e PLUS e (Add (e1, e2))
  | e TIMES e (Times (e1, e2))
  | ID (Id ID)
  | NUM (Num NUM)
Source Position
In a one-pass compiler, error messages are precise, because an error is reported while the parser still sits at the offending token; early compilers never had to worry about this.
But in a multi-pass compiler, source positions must be stored in the AST itself.
(* Example *)
type pos = …
datatype exp = Int of int * pos
             | Id of string * pos
             | Add of exp * exp * pos
             | Times of exp * exp * pos
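As a sketch of why the position has to live in the node: a later pass, here a hypothetical check that every identifier is declared, runs long after the parser has moved on, so the tree is the only place it can find a line and column. `struct pos`, `check` and the `env` array (a minimal stand-in for a symbol table) are assumptions; the slide leaves `pos` abstract.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical position type; the slide leaves pos abstract. */
struct pos { int line, col; };

enum kind { INT, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    struct pos pos;        /* where the construct appears in the source */
    int i;                 /* INT */
    const char *id;        /* ID */
    struct exp *e1, *e2;   /* ADD, TIMES */
};

/* Hypothetical later pass: every identifier must appear in env. On failure,
   write "line:col: message" into msg using the position stored in the node. */
int check(const struct exp *e, const char *const *env, size_t nenv, char *msg, size_t n) {
    switch (e->kind) {
    case INT:
        return 1;
    case ID:
        for (size_t k = 0; k < nenv; k++)
            if (strcmp(env[k], e->id) == 0) return 1;
        snprintf(msg, n, "%d:%d: undeclared variable %s", e->pos.line, e->pos.col, e->id);
        return 0;
    case ADD:
    case TIMES:
        return check(e->e1, env, nenv, msg, n) && check(e->e2, env, nenv, msg, n);
    }
    assert(!"malformed AST");
    return 0;
}
```

For 2 + 3 * y with only x declared, and positions stamped into the nodes by the parser, `check` fails and writes `1:9: undeclared variable y`.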
Source Position
datatype exp
  = Id of string * pos
  | Num of int * pos
  | Add of exp * exp * pos
  | Times of exp * exp * pos
%%
%%
e : e PLUS e (Add (e1, e2, PLUSleft))
  | e TIMES e (Times (e1, e2, TIMESleft))
  | ID (Id (ID, IDleft))
  | NUM (Num (NUM, NUMleft))
Labs
For lab #4, your job is to produce abstract syntax trees from source programs.
We've offered a code skeleton; you should first familiarize yourself with it. Your job is to understand the "layout" function, etc., and to glue the parser to the AST by adding semantic actions.
Test your compiler carefully to make sure it parses the source programs correctly.
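One way to check that the parser builds the right trees is to print each tree back with every binary node parenthesized and compare the result with the grouping you expect. This is not the skeleton's `layout` function, whose interface is not shown here; `unparse` and `unparses_to` are hypothetical helpers written in C for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for E ::= id | num | E + E | E * E. */
enum kind { NUM, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    int num;                      /* NUM */
    const char *id;               /* ID */
    const struct exp *e1, *e2;    /* ADD, TIMES */
};

/* Print with every binary node parenthesized, making the parser's grouping visible. */
void unparse(const struct exp *e, char *buf, size_t n) {
    char l[128], r[128];
    switch (e->kind) {
    case NUM: snprintf(buf, n, "%d", e->num); return;
    case ID:  snprintf(buf, n, "%s", e->id); return;
    case ADD:
    case TIMES:
        unparse(e->e1, l, sizeof l);
        unparse(e->e2, r, sizeof r);
        snprintf(buf, n, "(%s %c %s)", l, e->kind == ADD ? '+' : '*', r);
        return;
    }
    assert(!"malformed AST");
}

/* Test helper: does the tree print as the expected fully parenthesized text? */
int unparses_to(const struct exp *e, const char *expected) {
    char buf[256];
    unparse(e, buf, sizeof buf);
    return strcmp(buf, expected) == 0;
}
```

A correct parse of 2 + 3 * x should print as `(2 + (3 * x))`; seeing `((2 + 3) * x)` instead would reveal a precedence bug in the grammar or its declarations.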
Summary
Abstract syntax trees are the compiler's internal representation of source programs, and the interface between the front end and the later parts of the compiler.
The design of abstract syntax trees is language-dependent, and more art than science.