![Page 1: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/1.jpg)
Compilation 0368-3133
Lecture 1: Introduction
Lexical Analysis
Noam Rinetzky
Admin
• Lecturer: Noam Rinetzky
  – [email protected]
  – http://www.cs.tau.ac.il/~maon
• T.A.: Orr Tamir
• Textbooks:
  – Modern Compiler Design
  – Compilers: principles, techniques and tools
• Compiler Project 40%
  – 4.5 practical exercises
  – Groups of 3
• 1 theoretical exercise 10%
  – Groups of 1
• Final exam 50%
  – must pass
Course Goals
• What is a compiler
• How does it work
• (Reusable) techniques & tools
• Programming language implementation
  – runtime systems
• Execution environments
  – Assembly, linkers, loaders, OS
Introduction
Compilers: principles, techniques and tools
What is a Compiler?
“A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language).
The most common reason for wanting to transform source code is to create an executable program.”
--Wikipedia
[Figure: source text (txt, in the source language) → Compiler → executable code (exe, in the target language)]
[Figure: the compiler translates source text into executable code. Example source:]
    int a, b;
    a = 2;
    b = a*2 + 1;
[Example target code:]
    MOV R1,2
    SAL R1
    INC R1
    MOV R2,R1
[Figure: a compiler maps a source language to a target language]
• Source languages: C, C++, Pascal, Java, Postscript, TeX, Perl, JavaScript, Python, Ruby, Prolog, Lisp, Scheme, ML, OCaml
• Target languages: IA32, IA64, SPARC, C, C++, Pascal, Java, Java Bytecode, …
High Level Programming Languages
• Imperative: Algol, PL1, Fortran, Pascal, Ada, Modula, C
  – Closely related to “von Neumann” computers
• Object-oriented: Simula, Smalltalk, Modula3, C++, Java, C#, Python
  – Data abstraction and “evolutionary” form of program development
    • Class: an implementation of an abstract data type (data + code)
    • Objects: instances of a class
    • Inheritance + generics
• Functional: Lisp, Scheme, ML, Miranda, Hope, Haskell, OCaml, F#
• Logic programming: Prolog
More Languages
• Hardware description languages: VHDL
  – The program describes hardware components
  – The compiler generates hardware layouts
• Graphics and text processing: TeX, LaTeX, PostScript
  – The compiler generates page layouts
• Scripting languages: Shell, C-shell, Perl
  – Include primitive constructs from the current software environment
• Web/Internet: HTML, Telescript, Java, JavaScript
• Intermediate languages: Java bytecode, IDL
High Level Prog. Lang., Why?
Compiler vs. Interpreter
Compiler
• A program which transforms programs
• Input: a program (P)
• Output: an object program (O)
  – For any x, “O(x)” “=” “P(x)”
[Figure: source text (txt), P → Compiler → executable code (exe), O]
Compiling C to Assembly
    int x;
    scanf("%d", &x);
    x = x + 1;
    printf("%d", x);
[Figure: the compiler translates the program above into the assembly below; running it on input 5 outputs 6]
    add %fp,-8, %l1
    mov %l1, %o1
    call scanf
    ld [%fp-8],%l0
    add %l0,1,%l0
    st %l0,[%fp-8]
    ld [%fp-8], %l1
    mov %l1, %o1
    call printf
Interpreter
• A program which executes a program
• Input: a program (P) + its input (x)
• Output: the computed output (P(x))
[Figure: source text (txt) and input → Interpreter → output]
Interpreting (running) .py programs
• A program which executes a program
• Input: a program (P) + its input (x)
• Output: the computed output (“P(x)”)
[Figure: the interpreter receives the program below together with the input 5 and outputs 6]
    int x;
    scanf("%d", &x);
    x = x + 1;
    printf("%d", x);
Compiler vs. Interpreter
[Figure: two pipelines.
 Compiler: Source Code → (preprocessing) → Executable Code → (processing) → Machine.
 Interpreter: Source Code → (preprocessing) → Intermediate Code → (processing) → Interpreter.]
Compiled programs are usually more efficient than interpreted programs
    scanf("%d", &x);
    y = 5;
    z = 7;
    x = x + y * z;
    printf("%d", x);
[Figure: the compiler translates the program above into the assembly below; y * z has already been evaluated to 35]
    add %fp,-8, %l1
    mov %l1, %o1
    call scanf
    mov 5, %l0
    st %l0,[%fp-12]
    mov 7,%l0
    st %l0,[%fp-16]
    ld [%fp-8], %l0
    ld [%fp-8],%l0
    add %l0, 35 ,%l0
    st %l0,[%fp-8]
    ld [%fp-8], %l1
    mov %l1, %o1
    call printf
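The constant 35 in the assembly is the product 5 * 7, computed at compile time rather than at run time. A minimal sketch of that idea (constant folding) on a small expression tree follows. The Node type and the make_* and fold functions are hypothetical names introduced here for illustration, not part of the lecture's code, and the sketch assumes the constants 5 and 7 have already been substituted for y and z.

```c
#include <stdlib.h>

/* Hypothetical expression node: a constant, a variable, or a binary operation. */
typedef enum { CONST, VAR, BINOP } Kind;

typedef struct Node {
    Kind kind;
    int value;                  /* CONST: the constant's value */
    char name;                  /* VAR: one-letter variable name */
    char op;                    /* BINOP: '+' or '*' */
    struct Node *left, *right;  /* BINOP: operands */
} Node;

static Node *new_node(Kind kind) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->kind = kind;
    return n;
}

Node *make_const(int value) { Node *n = new_node(CONST); n->value = value; return n; }
Node *make_var(char name)   { Node *n = new_node(VAR);   n->name = name;   return n; }

Node *make_binop(char op, Node *left, Node *right) {
    Node *n = new_node(BINOP);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Fold bottom-up: an operation whose operands are both constants
   is replaced by a constant holding its value. */
Node *fold(Node *n) {
    if (n->kind != BINOP) return n;
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == CONST && n->right->kind == CONST) {
        int l = n->left->value, r = n->right->value;
        free(n->left);
        free(n->right);
        n->left = n->right = NULL;
        n->kind = CONST;
        n->value = (n->op == '+') ? l + r : l * r;
    }
    return n;
}
```

For example, folding the tree for x + 5 * 7 leaves an addition whose right operand is the single constant 35, so only the addition remains to be done at run time.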
Compilers report input-independent possible errors
• Input program
    scanf("%d", &y);
    if (y < 0)
        x = 5;
    ...
    if (y <= 0)
        z = x + 1;
• Compiler output
  – “line 88: x may be used before set”
Interpreters report input-specific definite errors
• Input program
    scanf("%d", &y);
    if (y < 0)
        x = 5;
    ...
    if (y <= 0)
        z = x + 1;
• Input data
  – y = -1
  – y = 0
Interpreter vs. Compiler
Interpreter:
• Conceptually simpler
  – “define” the prog. lang.
• Can provide more specific error report
• Easier to port
• Faster response time
• [More secure]
Compiler:
• How do we know the translation is correct?
• Can report errors before input is given
• More efficient code
  – Compilation can be expensive
  – move computations to compile-time
• compile-time + execution-time < interpretation-time is possible
Concluding Remarks
• Both compilers and interpreters are programs written in a high-level language
• Compilers and interpreters share functionality
• In this course we focus on compilers
Ex 0: A Simple Interpreter
Toy compiler/interpreter
• Trivial programming language
• Stack machine
• Compiler/interpreter written in C
• Demonstrate the basic steps
• Textbook: Modern Compiler Design 1.2
Conceptual Structure of a Compiler
[Figure: Source text (txt) → Frontend (analysis): Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis → Semantic Representation: Intermediate Representation (IR) → Backend (synthesis): Code Generation → Executable code (exe)]
Structure of toy Compiler / interpreter
[Figure: Source text (txt) → Frontend (analysis): Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis (NOP) → Semantic Representation: Intermediate Representation (AST) → Backend (synthesis): either Code Generation → Executable code (exe), or Execution Engine → Output*]
* Programs in our PL do not take input
Source Language
• Fully parenthesized expressions
• Arguments can be a single digit
  ✓ (4 + (3 * 9))
  ✗ 3 + 4 + 5
  ✗ (12 + 3)
expression → digit | ‘(’ expression operator expression ‘)’
operator → ‘+’ | ‘*’
digit → ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
The abstract syntax tree (AST)
• Intermediate program representation
• Defines a tree
  – Preserves program hierarchy
• Generated by the parser
• Keywords and punctuation symbols are not stored
  – Not relevant once the tree exists
Concrete syntax tree (parse tree) for 5*(a+b)
expression
  number ‘5’
  ‘*’
  expression
    ‘(’
    expression
      identifier ‘a’
      ‘+’
      identifier ‘b’
    ‘)’
Abstract Syntax tree for 5*(a+b)
‘*’
  ‘5’
  ‘+’
    ‘a’
    ‘b’
Annotated Abstract Syntax tree
[Figure: the AST for 5*(a+b), with each node annotated with a type (real or integer) and, where applicable, a location (reg1, reg2, sp+8, sp+24)]
Driver for the toy compiler/interpreter
    #include "parser.h"   /* for type AST_node */
    #include "backend.h"  /* for Process() */
    #include "error.h"    /* for Error() */

    int main(void) {
        AST_node *icode;

        if (!Parse_program(&icode))
            Error("No top-level expression");
        Process(icode);
        return 0;
    }
Lexical Analysis
• Partitions the input into tokens
  – DIGIT
  – EOF
  – ‘*’
  – ‘+’
  – ‘(’
  – ‘)’
• Each token has its representation
• Ignores whitespace
lex.h: Header File for Lexical Analysis
    /* Define class constants */
    /* Integers are used to encode characters + special codes */
    /* Values 0-255 are reserved for ASCII characters */
    #define EoF   256
    #define DIGIT 257

    typedef struct {
        int class;
        char repr;
    } Token_type;

    extern Token_type Token;
    extern void get_next_token(void);
Lexical Analyzer
    #include <stdio.h>    /* for getchar() */
    #include "lex.h"

    static int Layout_char(int ch);    /* defined below */

    Token_type Token;    /* global variable */

    void get_next_token(void) {
        int ch;

        /* skip layout characters; report EoF at end of input */
        do {
            ch = getchar();
            if (ch < 0) {
                Token.class = EoF;
                Token.repr = '#';
                return;
            }
        } while (Layout_char(ch));

        if ('0' <= ch && ch <= '9') { Token.class = DIGIT; }
        else { Token.class = ch; }
        Token.repr = ch;
    }

    static int Layout_char(int ch) {
        switch (ch) {
        case ' ': case '\t': case '\n': return 1;
        default: return 0;
        }
    }
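To experiment with the lexer without redirecting standard input, one possible variant tokenizes a string instead. This is only a sketch: tokenize and is_layout are hypothetical names introduced here, while the token classes (EoF, DIGIT, or the character itself) follow lex.h.

```c
#include <stddef.h>

#define EoF   256
#define DIGIT 257

typedef struct {
    int  class;   /* DIGIT, EoF, or the character itself */
    char repr;    /* the character that was read */
} Token_type;

/* Returns 1 for layout (whitespace) characters, which are skipped. */
static int is_layout(int ch) {
    return ch == ' ' || ch == '\t' || ch == '\n';
}

/* Hypothetical string-based variant of the lexer: writes the tokens of `src`
   into `out` (capacity `max`), always ending with an EoF token, and returns
   the number of tokens written. */
size_t tokenize(const char *src, Token_type *out, size_t max) {
    size_t n = 0;
    if (max == 0) return 0;
    for (; *src != '\0' && n + 1 < max; src++) {
        unsigned char ch = (unsigned char)*src;
        if (is_layout(ch)) continue;
        out[n].class = ('0' <= ch && ch <= '9') ? DIGIT : ch;
        out[n].repr = *src;
        n++;
    }
    out[n].class = EoF;
    out[n].repr = '#';
    return n + 1;
}
```

Tokenizing "(4 + (3 * 9))" this way yields nine tokens for the parentheses, digits and operators, followed by the EoF token; the spaces produce no tokens.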
Parser
• Invokes lexical analyzer
• Reports syntax errors
• Constructs AST
Parser Header File
typedef int Operator;
typedef struct _expression {
char type; /* 'D' or 'P' */
int value; /* for 'D' type expression */
struct _expression *left, *right; /* for 'P' type expression */
Operator oper; /* for 'P' type expression */
} Expression;
typedef Expression AST_node; /* the top node is an Expression */
extern int Parse_program(AST_node **);
AST for (2 * ((3*4)+9))
[Each node holds: type, oper (for ‘P’ nodes) or value (for ‘D’ nodes), left, right]
P *
  D 2
  P +
    P *
      D 3
      D 4
    D 9
Parser Environment
    #include <stdlib.h>    /* for malloc(), free() */
    #include "lex.h"
    #include "error.h"
    #include "parser.h"

    static Expression *new_expression(void) {
        return (Expression *)malloc(sizeof (Expression));
    }

    /* releases a node allocated by new_expression() */
    static void free_expression(Expression *expr) {
        free(expr);
    }

    static int Parse_operator(Operator *oper_p);
    static int Parse_expression(Expression **expr_p);

    int Parse_program(AST_node **icode_p) {
        Expression *expr;

        get_next_token();    /* start the lexical analyzer */
        if (Parse_expression(&expr)) {
            if (Token.class != EoF) {
                Error("Garbage after end of program");
            }
            *icode_p = expr;
            return 1;
        }
        return 0;
    }
Top-Down Parsing
• Optimistically build the tree from the root to leaves
• For every P → A1 A2 … An | B1 B2 … Bm
  – If A1 succeeds
    • If A2 succeeds & A3 succeeds & …
    • Else fail
  – Else if B1 succeeds
    • If B2 succeeds & B3 succeeds & …
    • Else fail
  – Else fail
• Recursive descent parsing
  – Simplified: no backtracking
• Can be applied for certain grammars
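For the toy grammar, this scheme can be sketched roughly as follows. It is an illustrative variant, not the lecture's parser: it reads from a string through a hypothetical Cursor type, and it computes the value of the expression directly instead of building an AST. Each grammar rule becomes one function, and each alternative is tried in order without backtracking.

```c
#include <stddef.h>

/* Hypothetical cursor over the input string; not part of the lecture's code. */
typedef struct { const char *p; } Cursor;

static void skip_layout(Cursor *c) {
    while (*c->p == ' ' || *c->p == '\t' || *c->p == '\n') c->p++;
}

/* operator -> '+' | '*' */
static int parse_operator(Cursor *c, char *op) {
    skip_layout(c);
    if (*c->p == '+' || *c->p == '*') {
        *op = *c->p;
        c->p++;
        return 1;
    }
    return 0;
}

/* expression -> digit | '(' expression operator expression ')' */
static int parse_expression(Cursor *c, int *value) {
    int left, right;
    char op;

    skip_layout(c);
    if ('0' <= *c->p && *c->p <= '9') {      /* first alternative */
        *value = *c->p - '0';
        c->p++;
        return 1;
    }
    if (*c->p == '(') {                      /* second alternative */
        c->p++;
        if (!parse_expression(c, &left)) return 0;
        if (!parse_operator(c, &op)) return 0;
        if (!parse_expression(c, &right)) return 0;
        skip_layout(c);
        if (*c->p != ')') return 0;
        c->p++;
        *value = (op == '+') ? left + right : left * right;
        return 1;
    }
    return 0;                                /* neither alternative applies */
}

/* Returns 1 and stores the value if `src` is exactly one expression, else 0. */
int parse_program(const char *src, int *value) {
    Cursor c = { src };
    if (!parse_expression(&c, value)) return 0;
    skip_layout(&c);
    return *c.p == '\0';
}
```

With this sketch, "(4 + (3 * 9))" is accepted, while "3 + 4 + 5" and "(12 + 3)" are rejected, matching the examples given for the source language.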
Parser
    static int Parse_expression(Expression **expr_p) {
        Expression *expr = *expr_p = new_expression();

        /* first alternative: digit */
        if (Token.class == DIGIT) {
            expr->type = 'D';
            expr->value = Token.repr - '0';
            get_next_token();
            return 1;
        }
        /* second alternative: '(' expression operator expression ')' */
        if (Token.class == '(') {
            expr->type = 'P';
            get_next_token();
            if (!Parse_expression(&expr->left)) { Error("Missing expression"); }
            if (!Parse_operator(&expr->oper)) { Error("Missing operator"); }
            if (!Parse_expression(&expr->right)) { Error("Missing expression"); }
            if (Token.class != ')') { Error("Missing )"); }
            get_next_token();
            return 1;
        }
        /* failed on both attempts */
        free_expression(expr);
        return 0;
    }

    static int Parse_operator(Operator *oper) {
        if (Token.class == '+') { *oper = '+'; get_next_token(); return 1; }
        if (Token.class == '*') { *oper = '*'; get_next_token(); return 1; }
        return 0;
    }
Semantic Analysis
• Trivial in our case
• No identifiers
• No procedures / functions
• A single type for all expressions
Intermediate Representation
[Figure: the AST for (2 * ((3*4)+9)); each node shows its type, then its oper or value]
P *
  D 2
  P +
    P *
      D 3
      D 4
    D 9
Alternative IR: 3-Address Code
“Simple Basic-like programming language”
    L1: _t0=a
        _t1=b
        _t2=_t0*_t1
        _t3=d
        _t4=_t2-_t3
        GOTO L1
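As a rough illustration of how such code could be produced, the sketch below emits three-address instructions for a toy expression tree, using one fresh temporary per node. The Expr type and the gen_tac function are assumptions made for this example; only the naming of temporaries (_t0, _t1, …) follows the slide.

```c
#include <stdio.h>
#include <string.h>

/* Toy expression node, mirroring the Expression type of the parser header. */
typedef struct Expr {
    char type;                  /* 'D' (digit) or 'P' (parenthesized) */
    int value;                  /* for 'D' */
    char oper;                  /* for 'P': '+' or '*' */
    struct Expr *left, *right;  /* for 'P' */
} Expr;

/* Appends one instruction per node to the string `out` (capacity `cap`) and
   returns the number of the temporary that holds the node's result.
   `*next` is the number of the next unused temporary. */
int gen_tac(const Expr *e, int *next, char *out, size_t cap) {
    char line[64];
    int l, r, t;

    if (e->type == 'D') {
        t = (*next)++;
        snprintf(line, sizeof line, "_t%d=%d\n", t, e->value);
    } else {
        l = gen_tac(e->left, next, out, cap);    /* operands first */
        r = gen_tac(e->right, next, out, cap);
        t = (*next)++;
        snprintf(line, sizeof line, "_t%d=_t%d%c_t%d\n", t, l, e->oper, r);
    }
    strncat(out, line, cap - strlen(out) - 1);
    return t;
}
```

For the expression (3*4), starting with an empty buffer and the counter at 0, this produces the three lines _t0=3, _t1=4 and _t2=_t0*_t1.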
![Page 63: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/63.jpg)
Structure of toy Compiler / interpreter
[Diagram repeated from slide 60.]
* Programs in our PL do not take input
63
![Page 64: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/64.jpg)
Code generation
• Stack based machine
• Four instructions
  – PUSH n
  – ADD
  – MULT
  – PRINT
64
![Page 65: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/65.jpg)
Code generation

    #include <stdio.h>
    #include "parser.h"
    #include "backend.h"

    /* Emit stack-machine code in post-order: operands first, then the operator. */
    static void Code_gen_expression(Expression *expr) {
        switch (expr->type) {
        case 'D':   /* digit leaf */
            printf("PUSH %d\n", expr->value);
            break;
        case 'P':   /* parenthesized binary expression */
            Code_gen_expression(expr->left);
            Code_gen_expression(expr->right);
            switch (expr->oper) {
            case '+': printf("ADD\n"); break;
            case '*': printf("MULT\n"); break;
            }
            break;
        }
    }

    void Process(AST_node *icode) {
        Code_gen_expression(icode);
        printf("PRINT\n");
    }

65
![Page 66: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/66.jpg)
Compiling (2*((3*4)+9))
PUSH 2
PUSH 3
PUSH 4
MULT
PUSH 9
ADD
MULT
AST (node fields: type, oper, left, right):
    P (oper *)
      left: D 2
      right: P (oper +)
        left: P (oper *)
          left: D 3
          right: D 4
        right: D 9
66
![Page 67: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/67.jpg)
Executing Compiled Program
[Diagram repeated from slide 60: structure of the toy compiler / interpreter.]
* Programs in our PL do not take input
67
![Page 68: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/68.jpg)
Generated Code Execution
PUSH 2   ← executing
PUSH 3
PUSH 4
MULT
PUSH 9
ADD
MULT
Stack (before): empty
Stack’ (after): 2
68
![Page 69: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/69.jpg)
Generated Code Execution
PUSH 2
PUSH 3   ← executing
PUSH 4
MULT
PUSH 9
ADD
MULT
Stack (before): 2
Stack’ (after, top first): 3, 2
69
![Page 70: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/70.jpg)
Generated Code Execution
PUSH 2
PUSH 3
PUSH 4   ← executing
MULT
PUSH 9
ADD
MULT
Stack (before, top first): 3, 2
Stack’ (after, top first): 4, 3, 2
70
![Page 71: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/71.jpg)
Generated Code Execution
PUSH 2
PUSH 3
PUSH 4
MULT   ← executing
PUSH 9
ADD
MULT
Stack (before, top first): 4, 3, 2
Stack’ (after, top first): 12, 2
71
![Page 72: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/72.jpg)
Generated Code Execution
PUSH 2
PUSH 3
PUSH 4
MULT
PUSH 9   ← executing
ADD
MULT
Stack (before, top first): 12, 2
Stack’ (after, top first): 9, 12, 2
72
![Page 73: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/73.jpg)
Generated Code Execution
PUSH 2
PUSH 3
PUSH 4
MULT
PUSH 9
ADD   ← executing
MULT
Stack (before, top first): 9, 12, 2
Stack’ (after, top first): 21, 2
73
![Page 74: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/74.jpg)
Generated Code Execution
PUSH 2
PUSH 3
PUSH 4
MULT
PUSH 9
ADD
MULT   ← executing
Stack (before, top first): 21, 2
Stack’ (after): 42
74
![Page 75: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/75.jpg)
Generated Code Execution
PUSH 2
PUSH 3
PUSH 4
MULT
PUSH 9
ADD
MULT
Stack (before): 42
Stack’ (after): empty
75
![Page 76: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/76.jpg)
Shortcuts
• Avoid generating machine code
• Use local assembler
• Generate C code
76
![Page 77: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/77.jpg)
Structure of toy Compiler / interpreter
[Diagram repeated from slide 60.]
* Programs in our PL do not take input
77
![Page 78: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/78.jpg)
Interpretation
• Bottom-up evaluation of expressions
• The same interface as the compiler
78
![Page 79: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/79.jpg)
    #include <stdio.h>
    #include "parser.h"
    #include "backend.h"

    /* Evaluate expr bottom-up: operands first, then apply the operator. */
    static int Interpret_expression(Expression *expr) {
        switch (expr->type) {
        case 'D':   /* digit leaf */
            return expr->value;
        case 'P': { /* block scope for the local declarations */
            int e_left = Interpret_expression(expr->left);
            int e_right = Interpret_expression(expr->right);
            switch (expr->oper) {
            case '+': return e_left + e_right;
            case '*': return e_left * e_right;
            }
            break;
        }
        }
        return 0;   /* not reached for a well-formed AST */
    }

    void Process(AST_node *icode) {
        printf("%d\n", Interpret_expression(icode));
    }
79
![Page 80: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/80.jpg)
Interpreting (2*((3*4)+9))
AST (node fields: type, oper, left, right):
    P (oper *)
      left: D 2
      right: P (oper +)
        left: P (oper *)
          left: D 3
          right: D 4
        right: D 9
80
![Page 81: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/81.jpg)
Summary: Journey inside a compiler
Stages: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
Source text (txt): x = b*b - 4*a*c
Token Stream: <ID,"x"> <EQ> <ID,"b"> <MULT> <ID,"b"> <MINUS> <INT,4> <MULT> <ID,"a"> <MULT> <ID,"c">
81
![Page 82: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/82.jpg)
Summary: Journey inside a compiler
Stages: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
Token Stream: <ID,"x"> <EQ> <ID,"b"> <MULT> <ID,"b"> <MINUS> <INT,4> <MULT> <ID,"a"> <MULT> <ID,"c">
Syntax Tree: [Figure: parse tree rooted at Statement with children ID ‘x’, EQ, and an expression. The expression expands through expression, term, and factor nodes, joined by MINUS and MULT, down to the leaves ‘b’, ‘b’, ‘4’, ‘a’, ‘c’.]
82
![Page 83: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/83.jpg)
Summary: Journey inside a compiler
[Figure repeated from slide 82: token stream and syntax tree.]
83
![Page 84: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/84.jpg)
Summary: Journey inside a compiler
Stages: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
Abstract Syntax Tree: [Figure: a MINUS node whose left child is MULT(‘b’, ‘b’) and whose right child is MULT(MULT(‘4’, ‘a’), ‘c’).]
84
![Page 85: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/85.jpg)
Summary: Journey inside a compiler
Stages: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
Annotated Abstract Syntax Tree: [Figure: the AST with a type and a location attached to every node.]
• ‘a’: type: int, loc: sp+8
• ‘4’: type: int, loc: const
• ‘b’ (both occurrences): type: int, loc: sp+16
• ‘c’: type: int, loc: sp+24
• MULT(‘4’, ‘a’) and MULT(MULT(‘4’, ‘a’), ‘c’): type: int, loc: R2
• MULT(‘b’, ‘b’) and MINUS: type: int, loc: R1
85
![Page 86: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/86.jpg)
Journey inside a compiler
Stages: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
[Figure repeated from slide 85: annotated AST.]
Intermediate Representation:
    R2 = 4*a
    R1 = b*b
    R2 = R2*c
    R1 = R1-R2
86
![Page 87: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/87.jpg)
Journey inside a compiler
Stages: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.
[Figure repeated from slide 85: annotated AST.]
Intermediate Representation:
    R2 = 4*a
    R1 = b*b
    R2 = R2*c
    R1 = R1-R2
Assembly Code:
    MOV R2,(sp+8)
    SAL R2,2
    MOV R1,(sp+16)
    MUL R1,(sp+16)
    MUL R2,(sp+24)
    SUB R1,R2
87
![Page 88: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/88.jpg)
Error Checking
• In every stage…
• Lexical analysis: illegal tokens
• Syntax analysis: illegal syntax
• Semantic analysis: incompatible types, undefined variables, …
• Every phase tries to recover and proceed with compilation (why?)
  – Divergence is a challenge
88
![Page 89: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/89.jpg)
The Real Anatomy of a Compiler
[Diagram: Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions (SI) → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe)]
89
![Page 90: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/90.jpg)
Optimizations
• “Optimal code” is out of reach
  – many problems are undecidable or too expensive (NP-complete)
  – Use approximation and/or heuristics
• Loop optimizations: hoisting, unrolling, …
• Peephole optimizations
• Constant propagation
  – Leverage compile-time information to save work at runtime (pre-computation)
• Dead code elimination
  – space
• …
90
![Page 91: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/91.jpg)
Machine code generation
• Register allocation
  – Optimal register assignment is NP-Complete
  – In practice, known heuristics perform well
• Assign variables to memory locations
• Instruction selection
  – Convert IR to actual machine instructions
• Modern architectures
  – Multicores
  – Challenging memory hierarchies
91
![Page 92: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/92.jpg)
And on a More General Note
92
![Page 93: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/93.jpg)
Course Goals
• What is a compiler
• How does it work
• (Reusable) techniques & tools
• Programming language implementation
  – runtime systems
• Execution environments
  – Assembly, linkers, loaders, OS
93
![Page 94: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/94.jpg)
To Compilers, and Beyond …
• Compiler construction is successful
  – Clear problem
  – Proper structure of the solution
  – Judicious use of formalisms
• Wider application
  – Many conversions can be viewed as compilation
• Useful algorithms
94
![Page 95: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/95.jpg)
Conceptual Structure of a Compiler
[Diagram: Source text (txt) → Compiler: Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe)]
95
![Page 96: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/96.jpg)
Conceptual Structure of a Compiler
[Diagram: Source text (txt) → Compiler → Executable code (exe). Inside the compiler: Frontend (analysis): Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis; Semantic Representation: Intermediate Representation (IR); Backend (synthesis): Code Generation.]
96
![Page 97: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/97.jpg)
Judicious use of formalisms
• Regular expressions (lexical analysis)
• Context-free grammars (syntactic analysis)
• Attribute grammars (context analysis)
• Code generator generators (dynamic programming)
• But also some nitty-gritty programming
[Diagram: Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis → Intermediate Representation (IR) → Code Generation]
97
![Page 98: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/98.jpg)
Use of program-generating tools
• Parts of the compiler are automatically generated from specification
[Diagram: regular expressions → JLex → scanner; input program → scanner → stream of tokens]
98
![Page 99: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/99.jpg)
Use of program-generating tools
• Parts of the compiler are automatically generated from specification
[Diagram: context free grammar → Jcup → parser; stream of tokens → parser → syntax tree]
99
![Page 100: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/100.jpg)
Use of program-generating tools
• Simpler compiler construction
  – Less error prone
  – More flexible
• Use of pre-canned tailored code
  – Use of dirty program tricks
• Reuse of specification
[Diagram: specification → tool → (generated) code; input → (generated) code → output]
100
![Page 101: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/101.jpg)
Compiler Construction Toolset
• Lexical analysis generators
  – Lex, JLex
• Parser generators
  – Yacc, Jcup
• Syntax-directed translators
• Dataflow analysis engines
101
![Page 102: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/102.jpg)
Wide applicability
• Structured data can be expressed using context free grammars
  – HTML files
  – Postscript
  – Tex/dvi files
  – …
102
![Page 103: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/103.jpg)
Generally useful algorithms
• Parser generators
• Garbage collection
• Dynamic programming
• Graph coloring
103
![Page 104: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/104.jpg)
How to write a compiler?
104
![Page 105: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/105.jpg)
How to write a compiler?
[Diagram: L2 Compiler source (txt, written in L1) → L1 Compiler → Executable compiler (exe)]
105
![Page 106: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/106.jpg)
How to write a compiler?
[Diagram: L2 Compiler source (txt, written in L1) → L1 Compiler → Executable compiler (exe) = L2 Compiler; Program source (txt, written in L2) → L2 Compiler → Executable program (exe)]
106
![Page 107: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/107.jpg)
How to write a compiler?
[Diagram: L2 Compiler source (txt, written in L1) → L1 Compiler → Executable compiler (exe) = L2 Compiler; Program source (txt, written in L2) → L2 Compiler → Executable program (exe) = Program; Input X → Program → Output Y]
107
![Page 108: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/108.jpg)
Bootstrapping a compiler
108
![Page 109: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/109.jpg)
Bootstrapping a compiler
[Diagram, three compilation steps:
1. Simple L2 compiler source (txt, written in L1) → L1 Compiler → simple L2 executable compiler (exe) = L2s Compiler
2. Advanced L2 compiler source (txt, written in L2) → L2s Compiler → inefficient advanced L2 executable compiler (exe) = L2 Compiler
3. Advanced L2 compiler source (X) → L2 Compiler → efficient advanced L2 executable compiler (Y)]
109
![Page 110: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/110.jpg)
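Proper Design
• Simplify the compilation phase
  – Portability of the compiler frontend
  – Reusability of the compiler backend
• Professional compilers are integrated
[Diagram: source languages Java, C, Pascal, C++, ML and targets Pentium, MIPS, Sparc, shown twice: once with every language connected directly to every target, and once with all languages and targets connected through a shared IR.]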
110
![Page 111: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/111.jpg)
Modularity
[Diagram: three source texts (txt) → Frontend SL1 / Frontend SL2 / Frontend SL3 → Semantic Representation → Backend TL1 / Backend TL2 / Backend TL3 → three executable targets (exe)]
Example source:
    int a, b;
    a = 2;
    b = a*2 + 1;
Example target code:
    MOV R1,2
    SAL R1
    INC R1
    MOV R2,R1
Example target code for another machine:
    SET R1,2
    STORE #0,R1
    SHIFT R1,1
    STORE #1,R1
    ADD R1,1
    STORE #2,R1
111
![Page 112: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/112.jpg)
112
![Page 113: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/113.jpg)
Lexical Analysis
Modern Compiler Design: Chapter 2.1
113
![Page 114: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/114.jpg)
Conceptual Structure of a Compiler
[Diagram: Source text (txt) → Compiler → Executable code (exe). Inside the compiler: Frontend: Lexical Analysis → Syntax Analysis (Parsing) → Semantic Analysis; Semantic Representation: Intermediate Representation (IR); Backend: Code Generation.]
114
![Page 115: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/115.jpg)
Conceptual Structure of a Compiler
[Diagram repeated from slide 114, with two annotations: “words” at Lexical Analysis and “sentences” at Syntax Analysis.]
115
![Page 116: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/116.jpg)
What does Lexical Analysis do?
• Language: fully parenthesized expressions
    Expr → Num | LP Expr Op Expr RP
    Num → Dig | Dig Num
    Dig → ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
    LP → ‘(’
    RP → ‘)’
    Op → ‘+’ | ‘*’
( ( 23 + 7 ) * 19 )
116
![Page 117: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/117.jpg)
What does Lexical Analysis do?
• Language: fully parenthesized expressions
( ( 23 + 7 ) * 19 )
Context free language:
    Expr → Num | LP Expr Op Expr RP
Regular languages:
    Num → Dig | Dig Num
    Dig → ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
    LP → ‘(’
    RP → ‘)’
    Op → ‘+’ | ‘*’
117
![Page 118: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/118.jpg)
What does Lexical Analysis do?
[Slide content repeated from slide 117.]
118
![Page 119: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/119.jpg)
What does Lexical Analysis do?
[Slide content repeated from slide 117.]
119
![Page 120: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/120.jpg)
What does Lexical Analysis do?
• Language: fully parenthesized expressions
Input: ( ( 23 + 7 ) * 19 )
Tokens: LP LP Num Op Num RP Op Num RP
[Grammar repeated from slide 117.]
120
![Page 121: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/121.jpg)
What does Lexical Analysis do?
• Language: fully parenthesized expressions
Value: ( ( 23 + 7 ) * 19 )
Kind: LP LP Num Op Num RP Op Num RP
[Grammar repeated from slide 117.]
121
![Page 122: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/122.jpg)
What does Lexical Analysis do?
• Language: fully parenthesized expressions
Value: ( ( 23 + 7 ) * 19 )
Kind: LP LP Num Op Num RP Op Num RP
Each (kind, value) pair is a token: Token Token …
[Grammar repeated from slide 117.]
122
![Page 123: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/123.jpg)
What does Lexical Analysis do?
• Partitions the input into a stream of tokens
  – Numbers
  – Identifiers
  – Keywords
  – Punctuation
• Usually represented as (kind, value) pairs
  – (Num, 23)
  – (Op, ‘*’)
• “word” in the source language
• “meaningful” to the syntactical analysis
123
![Page 124: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/124.jpg)
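From scanning to parsing
Program text: ((23 + 7) * x)
[Diagram: program text → Lexical Analyzer → token stream → Parser → Abstract Syntax Tree (valid) or syntax error]
Characters: ( ( 23 + 7 ) * ? )
Token stream: LP LP Num OP Num RP OP Id RP
Grammar:
    Expr → ... | Id
    Id → ‘a’ | ... | ‘z’
Abstract Syntax Tree: Op(*) with children Op(+) and Id(?), where Op(+) has children Num(23) and Num(7)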
124
![Page 125: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/125.jpg)
Why Lexical Analysis?
• Well, not strictly necessary, but …
  – Regular languages ⊆ Context-Free languages
• Simplifies the syntax analysis (parsing)
  – And language definition
• Modularity
• Reusability
• Efficiency
125
![Page 126: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/126.jpg)
Lecture goals
• Understand role & place of lexical analysis
• Lexical analysis theory
• Using program-generating tools
![Page 127: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/127.jpg)
Lecture Outline
✓ Role & place of lexical analysis
• What is a token?
• Regular languages
• Lexical analysis
• Error handling
• Automatic creation of lexical analyzers
![Page 128: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/128.jpg)
What is a token? (Intuitively)
• A “word” in the source language
  – Anything that should appear in the input to syntax analysis
    • Identifiers
    • Values
    • Language keywords
• Usually represented as a pair of (kind, value)
![Page 129: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/129.jpg)
Example Tokens
Type      Examples
ID        foo, n_14, last
NUM       73, 00, 517, 082
REAL      66.1, .5, 5.5e-10
IF        if
COMMA     ,
NOTEQ     !=
LPAREN    (
RPAREN    )
![Page 130: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/130.jpg)
Example Non Tokens
Type                     Examples
comment                  /* ignored */
preprocessor directive   #include <foo.h>
                         #define NUMS 5.6
macro                    NUMS
whitespace               \t, \n, \b, ‘ ‘
![Page 131: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/131.jpg)
Some basic terminology
• Lexeme (aka symbol) - a series of letters separated from the rest of the program according to a convention (space, semicolon, comma, etc.)
• Pattern - a rule specifying a set of strings. Example: “an identifier is a string that starts with a letter and continues with letters and digits”
  – (Usually) a regular expression
• Token - a pair of (pattern, attributes)
![Page 132: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/132.jpg)
Example
void match0(char *s) /* find a zero */
{
  if (!strncmp(s, "0.0", 3))
    return 0.;
}

VOID ID(match0) LPAREN CHAR DEREF ID(s) RPAREN
LBRACE
IF LPAREN NOT ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN
RETURN REAL(0.0) SEMI
RBRACE
EOF
![Page 133: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/133.jpg)
Example Non Tokens
• Lexemes that are recognized but get consumed rather than transmitted to the parser
  – if
  – i/*comment*/f
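A small sketch of what "consumed rather than transmitted" means for the two inputs above. It assumes C-like rules in which a comment separates tokens; the names SCANNER, KEYWORDS and scan are made up for the illustration.

```python
import re

KEYWORDS = {"if"}

# The alternatives cannot overlap, so their order does not matter here.
SCANNER = re.compile(
    r"(?P<COMMENT>/\*.*?\*/)"            # C-style comment, non-greedy
    r"|(?P<WS>[ \t\n]+)"                 # blanks, tabs, newlines
    r"|(?P<WORD>[A-Za-z_][A-Za-z_0-9]*)",
    re.DOTALL,
)


def scan(text):
    """Return the tokens handed to the parser; comments and whitespace are dropped."""
    out, pos = [], 0
    while pos < len(text):
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup == "WORD":
            word = m.group()
            out.append(("IF", word) if word in KEYWORDS else ("ID", word))
        # COMMENT and WS are recognized, then consumed without emitting a token.
        pos = m.end()
    return out


print(scan("if"))
print(scan("i/*comment*/f"))
```

Under these assumptions the first input yields a single IF token, while the second yields two identifiers, i and f: the comment is consumed, but it still ends the lexeme i.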
![Page 134: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/134.jpg)
How can we define tokens?
• Keywords – easy!
  – if, then, else, for, while, …
• Identifiers?
• Numerical Values?
• Strings?
• Characterize unbounded sets of values using a bounded description?
![Page 135: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/135.jpg)
Lecture Outline
✓ Role & place of lexical analysis
✓ What is a token?
• Regular languages
• Lexical analysis
• Error handling
• Automatic creation of lexical analyzers
![Page 136: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/136.jpg)
Regular languages
• Formal languages
  – Σ = finite set of letters
  – Word = sequence of letters
  – Language = set of words
• Regular languages defined equivalently by
  – Regular expressions
  – Finite-state automata
![Page 137: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/137.jpg)
Common format for reg-exps

Basic Patterns          Matching
x                       The character x
.                       Any character, usually except a new line
[xyz]                   Any of the characters x, y, z
[^x]                    Any character except x

Repetition Operators
R?                      An R or nothing (= optionally an R)
R*                      Zero or more occurrences of R
R+                      One or more occurrences of R

Composition Operators
R1R2                    An R1 followed by R2
R1|R2                   Either an R1 or R2

Grouping
(R)                     R itself
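
Most of these operators carry over unchanged to Python's re module, so the table can be tried out directly. The sketch below is an illustration only; the helper name matches is an assumption, and "any character except x" is written as the character class [^x], which is how Python spells it.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text is described by pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, input, expected result)
examples = [
    ("x", "x", True),           # the character x
    (".", "q", True),           # any character ...
    (".", "\n", False),         # ... except a new line
    ("[xyz]", "y", True),       # any of x, y, z
    ("[^x]", "y", True),        # any character except x
    ("[^x]", "x", False),
    ("R?", "", True),           # optionally an R
    ("R*", "RRR", True),        # zero or more
    ("R+", "", False),          # one or more
    ("ab", "ab", True),         # concatenation
    ("a|b", "b", True),         # alternation
    ("(ab)*", "abab", True),    # grouping
    ("ab*|cd?", "abbb", True),  # parsed as (ab*)|(cd?)
    ("ab*|cd?", "abd", False),
]

for pattern, text, expected in examples:
    assert matches(pattern, text) == expected

print("all", len(examples), "examples behave as the table says")
```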
![Page 138: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/138.jpg)
Examples
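• ab*|cd? = {a, ab, abb, …} ∪ {c, cd}
• (a|b)* = all strings over {a, b}, including the empty string ε
• (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)* = all strings of decimal digits, including ε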
![Page 139: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/139.jpg)
Escape characters
• What is the expression for one or more + symbols?
  – (+)+ won’t work
  – (\+)+ will
• A backslash \ before an operator turns it into a standard character
  – \*, \?, \+, a\(b\+\*, (a\(b\+\*)+, …
• Double quotes surrounding text turn the whole text into standard characters
  – "a(b+*", "a(b+*"+
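
The same escaping issue shows up in Python's re module, which makes for a quick experiment. Note the assumption: Python has no double-quote syntax for literal text, so the sketch uses re.escape as the closest stand-in; the variable names are made up for the example.

```python
import re

# Escaping the operator with a backslash makes it an ordinary character.
plus_run = re.compile(r"(\+)+")
print(bool(plus_run.fullmatch("+++")))

# An unescaped + has nothing to repeat, so Python rejects the pattern outright.
try:
    re.compile("(+)+")
    unescaped_ok = True
except re.error:
    unescaped_ok = False
print(unescaped_ok)

# re.escape inserts the backslashes for a whole piece of literal text.
literal = re.escape("a(b+*")
quoted_run = re.compile(f"(?:{literal})+")
print(literal)
print(bool(quoted_run.fullmatch("a(b+*a(b+*")))
```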
![Page 140: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/140.jpg)
Shorthands
• Use names for expressions
  – letter = a | b | … | z | A | B | … | Z
  – letter_ = letter | _
  – digit = 0 | 1 | 2 | … | 9
  – id = letter_ (letter_ | digit)*
• Use hyphen to denote a range
  – letter = a-z | A-Z
  – digit = 0-9
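Named shorthands can be mimicked in Python by building the pattern from string variables. This is a sketch under that assumption; the variable names follow the slide (ident is used instead of id, which is a Python built-in).

```python
import re

# Each shorthand is a string fragment; larger patterns are assembled from them.
letter = "[a-zA-Z]"                          # a-z | A-Z
letter_ = f"(?:{letter}|_)"                  # letter | _
digit = "[0-9]"                              # 0-9
ident = f"{letter_}(?:{letter_}|{digit})*"   # letter_ (letter_ | digit)*

ID = re.compile(ident)

for word in ["foo", "n_14", "_tmp", "14n", ""]:
    print(repr(word), bool(ID.fullmatch(word)))
```

The non-capturing groups (?: … ) play the role of the parentheses that are implicit when a named expression is substituted into a larger one.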
![Page 141: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/141.jpg)
Examples
• if = if
• then = then
• relop = < | > | <= | >= | = | <>
• digit = 0-9
• digits = digit+
![Page 142: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/142.jpg)
Example
• A number is
number = ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+
         ( ε | . ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+
               ( ε | E ( ε | + | - ) ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )+ )
         )
• Using shorthands it can be written as
number = digits ( ε | . digits ( ε | E ( ε | + | - ) digits ) )
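The shorthand form translates almost symbol for symbol into a Python pattern: ( ε | X ) becomes (?:X)?. The sketch below assumes that translation and keeps the slide's nesting, in which an exponent is only allowed after a fractional part.

```python
import re

digits = "[0-9]+"
# digits ( ε | . digits ( ε | E ( ε | + | - ) digits ) )
number = rf"{digits}(?:\.{digits}(?:E[+-]?{digits})?)?"
NUMBER = re.compile(number)

for text in ["23", "66.1", "5.5E-10", "5.5E+3", "5E10", ".5", "5."]:
    print(repr(text), bool(NUMBER.fullmatch(text)))
```

With this definition "5E10", ".5" and "5." are not numbers, because the pattern demands digits on both sides of the point and a fraction before any exponent.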
![Page 143: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/143.jpg)
Exercise 1 - Question
• Language of rational numbers in decimal representation (no leading or trailing zeros)
  – 0
  – 123.757
  – .933333
  – Not 007
  – Not 0.30
![Page 144: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/144.jpg)
Exercise 1 - Answer
• Language of rational numbers in decimal representation (no leading or trailing zeros)
  – Digit = 1|2|…|9
  – Digit0 = 0|Digit
  – Num = Digit Digit0*
  – Frac = Digit0* Digit
  – Pos = Num | .Frac | 0.Frac | Num.Frac
  – PosOrNeg = (ε|-)Pos
  – R = 0 | PosOrNeg
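One way to check the answer is to transcribe the definitions into Python and run them on the examples from the question. The sketch assumes the definitions exactly as listed; only the Python spelling (character classes, escaped dots) is new.

```python
import re

Digit = "[1-9]"
Digit0 = "[0-9]"
Num = f"{Digit}{Digit0}*"            # no leading zero
Frac = f"{Digit0}*{Digit}"           # no trailing zero
Pos = rf"(?:{Num}|\.{Frac}|0\.{Frac}|{Num}\.{Frac})"
R = re.compile(rf"0|-?{Pos}")        # R = 0 | (ε|-)Pos

for text in ["0", "123.757", ".933333", "-0.5", "007", "0.30", "1.0", "-0"]:
    print(repr(text), bool(R.fullmatch(text)))
```

The first four inputs are accepted and the last four rejected, matching the "Not 007" and "Not 0.30" cases of the question.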
![Page 145: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/145.jpg)
Exercise 2 - Question
• Equal number of opening and closing parentheses: [ⁿ]ⁿ = [], [[]], [[[]]], …
![Page 146: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/146.jpg)
Exercise 2 - Answer
• Equal number of opening and closing parentheses: [ⁿ]ⁿ = [], [[]], [[[]]], …
• Not regular
• Context-free
• Grammar: S ::= [] | [S]
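The grammar S ::= [] | [S] can be recognized by a function that calls itself once per nesting level, which is exactly the unbounded memory a finite-state automaton lacks. A minimal sketch; the function names parse_S and in_language are made up for the illustration.

```python
def parse_S(s: str, pos: int = 0) -> int:
    """Try to read one S starting at pos; return the position after it, or -1."""
    if pos < len(s) and s[pos] == "[":
        # S ::= []
        if pos + 1 < len(s) and s[pos + 1] == "]":
            return pos + 2
        # S ::= [S]
        inner = parse_S(s, pos + 1)
        if inner != -1 and inner < len(s) and s[inner] == "]":
            return inner + 1
    return -1


def in_language(s: str) -> bool:
    """True if the whole string is derived from S."""
    return parse_S(s) == len(s)


for text in ["[]", "[[]]", "[[[]]]", "[[]", "[][]", ""]:
    print(repr(text), in_language(text))
```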
![Page 147: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/147.jpg)
Challenge: Ambiguity
• if = if
• id = letter_ (letter_ | digit)*
• “if” is a valid word in the language of identifiers… so what should it be?
• How about the identifier “iffy”?
• Solution
  – Always find the longest matching token
  – Break ties using the order of definitions… first definition wins (=> list rules for keywords before identifiers)
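
Both rules can be sketched by trying every definition at the current position and keeping the longest match, replacing the current best only when a later rule is strictly longer. This is an illustrative sketch, not how generated scanners work internally; RULES, next_token and tokenize are assumed names. Each pattern is tried separately because Python's own | picks the first alternative that matches, not the longest.

```python
import re

# Order matters: keywords are listed before identifiers.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("WS", re.compile(r"\s+")),
]


def next_token(text, pos):
    """Return (kind, lexeme, end) for the longest match at pos; first rule wins ties."""
    best_kind, best_end = None, pos
    for kind, rx in RULES:
        m = rx.match(text, pos)
        if m and m.end() > best_end:  # strictly longer: earlier rules keep ties
            best_kind, best_end = kind, m.end()
    if best_kind is None:
        raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
    return best_kind, text[pos:best_end], best_end


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        kind, lexeme, pos = next_token(text, pos)
        if kind != "WS":
            tokens.append((kind, lexeme))
    return tokens


print(tokenize("if iffy"))
```

For "if" both IF and ID match two characters, so the earlier rule IF wins; for "iffy" the ID rule matches four characters and beats IF on length.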
![Page 148: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/148.jpg)
Creating a lexical analyzer
• Given a list of token definitions (pattern name, regex), write a program such that
  – Input: String to be analyzed
  – Output: List of tokens
• How do we build an analyzer?
![Page 149: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1](https://reader034.vdocuments.us/reader034/viewer/2022043004/5f853fc241190567362ab2b0/html5/thumbnails/149.jpg)