![Page 1: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/1.jpg)
Language Processing Systems
Prof. Mohamed Hamada
Software Engineering Lab. The University of Aizu
Japan
![Page 2: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/2.jpg)
Today’s Outline
• Anatomy of a compiler
• Compiler front-end and back-end
• Regular expressions
![Page 3: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/3.jpg)
Anatomy of a Compiler
Program written in a programming language → Compiler (translation) → Assembly language
![Page 4: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/4.jpg)
What is a compiler?
program in some source language → compiler → executable code for target machine
A compiler is a program that reads a program written in one language and translates it into another language.
Traditionally, compilers go from high-level languages to low-level languages.
![Page 5: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/5.jpg)
Example
X=a+b*10
→ compiler →
MOV id3, R2
MUL #10.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
![Page 6: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/6.jpg)
What is a compiler?
program in some source language → front-end (analysis) → semantic representation (intermediate representation) → back-end (synthesis) → executable code for target machine
The compiler consists of the front-end, the semantic representation, and the back-end.
![Page 7: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/7.jpg)
Compiler Architecture
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → parse tree → Semantic Analysis → AST → IC generator → intermediate language → Code Optimizer → OIL → Code Generator → target language
Error Handler and Symbol Table: shared by the phases
The phases are grouped into a Front End and a Back End.
![Page 8: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/8.jpg)
front-end: from program text to AST
program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST
(these three phases make up the front-end)
![Page 9: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/9.jpg)
front-end: from program text to AST
program text → lexical analysis (Scanner) → tokens → syntax analysis (Parser) → AST → context handling (Semantic analysis) → annotated AST (semantic representation)
token description → scanner generator → Scanner
language grammar → parser generator → Parser
![Page 10: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/10.jpg)
Semantic representation
• heart of the compiler
• intermediate code
– linked lists of pseudo instructions
– abstract syntax tree (AST)
program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine
![Page 11: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/11.jpg)
AST example
• expression grammar
expression → expression ‘+’ term | expression ‘-’ term | term
term → term ‘*’ factor | term ‘/’ factor | factor
factor → identifier | constant | ‘(’ expression ‘)’
• example expression b*b – 4*a*c
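As a rough illustration of how a parser for this grammar could build an AST, the following Python sketch uses recursive descent. The left-recursive rules are rewritten as loops, which is a common technique and not something the slides prescribe. The names `tokenize`, `Parser`, and `parse`, and the nested-tuple node layout, are assumptions made for this example; the minus sign is typed as an ASCII `-`.

```python
import re

# One token per match: identifier, numeric constant, or any other
# non-space character (checked against the operator set below).
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|(\S))")

def tokenize(text):
    """Split the input into (kind, value) pairs."""
    tokens = []
    for ident, const, op in TOKEN_RE.findall(text):
        if ident:
            tokens.append(("identifier", ident))
        elif const:
            tokens.append(("constant", const))
        elif op in "+-*/()":
            tokens.append((op, op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Kind of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expression(self):
        # expression -> expression ('+' | '-') term | term, as a loop
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()[0]
            node = (op, node, self.term())
        return node

    def term(self):
        # term -> term ('*' | '/') factor | factor, as a loop
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()[0]
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor -> identifier | constant | '(' expression ')'
        kind = self.peek()
        if kind in ("identifier", "constant"):
            return self.take()[1]
        if kind == "(":
            self.take()
            node = self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.take()
            return node
        raise SyntaxError(f"unexpected token {kind!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree

print(parse("b*b - 4*a*c"))
# ('-', ('*', 'b', 'b'), ('*', ('*', '4', 'a'), 'c'))
```

Because the loops build nodes left to right, `4*a*c` groups as `(4*a)*c`, matching the left-recursive grammar. Parentheses steer the parse but leave no node behind, so `parse("b*b - (4*a*c)")` returns the same tuple.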
![Page 12: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/12.jpg)
parse tree: b*b – 4*a*c
expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    term
      term
        factor
          constant ‘4’
      ‘*’
      factor
        identifier ‘a’
    ‘*’
    factor
      identifier ‘c’
![Page 13: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/13.jpg)
AST: b*b – 4*a*c
‘-’
  ‘*’
    ‘b’
    ‘b’
  ‘*’
    ‘*’
      ‘4’
      ‘a’
    ‘c’
![Page 14: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/14.jpg)
annotated AST: b*b – 4*a*c
• identifier • constant • term • expression
‘-’ type: real loc: reg1
  ‘*’ type: real loc: reg1
    ‘b’ type: real loc: sp+16
    ‘b’ type: real loc: sp+16
  ‘*’ type: real loc: reg2
    ‘*’ type: real loc: reg2
      ‘4’ type: real loc: const
      ‘a’ type: real loc: sp+8
    ‘c’ type: real loc: sp+24
![Page 15: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/15.jpg)
Example
position = initial + rate * 60
Scanner:
id1 := id2 + id3 * 60
Parser:
:=
  id1
  +
    id2
    *
      id3
      60
Semantic Analyzer:
:=
  id1
  +
    id2
    *
      id3
      int-to-real
        60
![Page 16: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/16.jpg)
AST exercise
• expression grammar
expression → expression ‘+’ term | expression ‘-’ term | term
term → term ‘*’ factor | term ‘/’ factor | factor
factor → identifier | constant | ‘(’ expression ‘)’
• example expression b*b – (4*a*c)
• draw parse tree and AST
![Page 17: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/17.jpg)
answer parse tree: b*b – 4*a*c
expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    term
      term
        factor
          constant ‘4’
      ‘*’
      factor
        identifier ‘a’
    ‘*’
    factor
      identifier ‘c’
![Page 18: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/18.jpg)
answer parse tree: b*b – (4*a*c)
expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    factor
      ‘(’
      expression
        ‘4*a*c’
      ‘)’
![Page 19: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/19.jpg)
Advantages of Using Front-end and Back-end
1. Retargeting - Build a compiler for a new machine by attaching a new code generator to an existing front-end.
2. Optimization - reuse intermediate code optimizers in compilers for different languages and different machines.
Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are all used interchangeably.
![Page 20: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/20.jpg)
Compiler structure
• L+M modules = LxM compilers
[Diagram: several front-ends (analysis), one per source language, feed a shared semantic representation; several back-ends (synthesis), one per target machine, produce executable code from it.]
![Page 21: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/21.jpg)
Limitations of modular approach
• performance
– generic vs specific
– loss of information
• variations must be small
– same programming paradigm
– similar processor architecture
![Page 22: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/22.jpg)
Front-end and Back-end
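• Suppose you want to write compilers for 3 source languages targeting 4 computer platforms:
Source languages: C++, Java, FORTRAN
Target platforms: MIPS, SPARC, Pentium, PowerPC
With one compiler per language/platform pair, we need to write 3 × 4 = 12 programs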
![Page 23: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/23.jpg)
Front-end and Back-end
• But we can do it better
– IR: Intermediate Representation
– FE: Front-End
– BE: Back-End
C++, Java, FORTRAN → FE (one per language) → IR → BE (one per platform) → MIPS, SPARC, Pentium, PowerPC
We need to write only 3 + 4 = 7 programs
![Page 24: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/24.jpg)
Front-end and Back-end
• Suppose you want to write compilers from m source languages to n computer platforms. A naïve solution requires n*m programs:
C++, Java, FORTRAN → MIPS, SPARC, Pentium, PowerPC (one compiler per language/platform pair)
• but we can do it with n+m programs:
C++, Java, FORTRAN → FE (one per language) → IR → BE (one per platform) → MIPS, SPARC, Pentium, PowerPC
– IR: Intermediate Representation
– FE: Front-End
– BE: Back-End
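The n+m argument can be sketched in a few lines of Python. The front-ends and back-ends below are hypothetical stand-ins that only build strings; `make_front_end`, `make_back_end`, and `compose` are names invented for this illustration. The point is that any front-end can be paired with any back-end because both agree on the IR.

```python
from itertools import product

def make_front_end(language):
    """Hypothetical front-end: source text -> IR (here just a string)."""
    return lambda source: f"IR[{language}: {source}]"

def make_back_end(platform):
    """Hypothetical back-end: IR -> target code (here just a string)."""
    return lambda ir: f"{platform} code for {ir}"

def compose(front_end, back_end):
    """A compiler is a front-end followed by a back-end."""
    return lambda source: back_end(front_end(source))

languages = ["C++", "Java", "FORTRAN"]
platforms = ["MIPS", "SPARC", "Pentium", "PowerPC"]

front_ends = {lang: make_front_end(lang) for lang in languages}
back_ends = {plat: make_back_end(plat) for plat in platforms}

# Every (language, platform) pair gets a compiler without writing new modules.
compilers = {
    (lang, plat): compose(front_ends[lang], back_ends[plat])
    for lang, plat in product(languages, platforms)
}

modules_written = len(front_ends) + len(back_ends)
print(modules_written, len(compilers))  # 7 12
```

Adding a fifth platform in this sketch means writing one more back-end, after which three new compilers become available.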
![Page 25: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/25.jpg)
Compiler Example
position=initial+rate*60
→ compiler →
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
![Page 26: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/26.jpg)
Example
position := initial + rate * 60
Scanner:
id1 := id2 + id3 * 60
Parser:
:=
  id1
  +
    id2
    *
      id3
      60
Semantic Analyzer:
:=
  id1
  +
    id2
    *
      id3
      int-to-real
        60
Intermediate Code Generator:
temp1 := int-to-real (60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code Optimizer:
temp1 := id3 * 60.0
id1 := id2 + temp1
Code Generator:
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
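The intermediate code generation step can be sketched as a post-order walk over the tree produced by the semantic analyzer. This is only an illustration: the nested-tuple tree layout and the function name `generate` are assumptions of the sketch, and the optimizer and code generator stages are not covered.

```python
from itertools import count

def generate(ast):
    """Emit three-address code for an assignment tree of nested tuples."""
    code = []
    temps = count(1)

    def new_temp():
        return f"temp{next(temps)}"

    def walk(node):
        """Emit code for a subtree and return the name holding its value."""
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        if node[0] == "int-to-real":         # unary conversion node
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := int-to-real ({operand})")
            return temp
        op, left, right = node               # binary operator node
        left_place = walk(left)
        right_place = walk(right)
        temp = new_temp()
        code.append(f"{temp} := {left_place} {op} {right_place}")
        return temp

    _, target, value = ast                   # (":=", target, expression)
    code.append(f"{target} := {walk(value)}")
    return code

# Tree after semantic analysis, with the int-to-real conversion inserted.
ast = (":=", "id1", ("+", "id2", ("*", "id3", ("int-to-real", "60"))))
for line in generate(ast):
    print(line)
# temp1 := int-to-real (60)
# temp2 := id3 * temp1
# temp3 := id2 + temp2
# id1 := temp3
```

Each interior node gets a fresh temporary, which is why the unoptimized code uses three temporaries for such a short statement.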
![Page 27: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/27.jpg)
Regular Expressions
A regular expression is built up out of simpler regular expressions using a set of defining rules:
Symbol: a
  A regular expression that matches the symbol a.
Alternation: M | N
  A regular expression that matches M or N.
Concatenation: (M • N)
  A regular expression that matches M followed by N.
Repetition: (M*)
  A regular expression that matches zero or more repetitions of M.
Empty Set: Φ
  A regular expression that denotes the empty set (it matches no string).
Lambda: λ
  A regular expression that denotes the empty string.
![Page 28: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/28.jpg)
Regular Expressions
Language: The language denoted by a regular expression r is written L(r).
Operator precedence: () > * > • > |
Precedence lets us drop redundant parentheses and so simplify regular expressions.
Example: (a)|((b)*(c)) can be written as a|b*c.
Regular expressions allow us to define the tokens of programming languages, such as identifiers and numbers.
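To make the link between regular expressions and tokens concrete, here is a small scanner sketch built on Python's `re` module. The token names and patterns in `TOKEN_SPEC` are assumptions chosen for this example, not definitions from the slides.

```python
import re

# Each token class is described by one regular expression.
TOKEN_SPEC = [
    ("number",     r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("assign",     r":="),
    ("operator",   r"[+\-*/]"),
    ("skip",       r"\s+"),
    ("error",      r"."),
]
# Combine the classes into one pattern with a named group per class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Return the list of (token class, lexeme) pairs found in text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

print(scan("position := initial + rate * 60"))
# [('identifier', 'position'), ('assign', ':='), ('identifier', 'initial'),
#  ('operator', '+'), ('identifier', 'rate'), ('operator', '*'), ('number', '60')]
```

The order of entries in `TOKEN_SPEC` matters, since the first alternative that matches wins; the catch-all `error` class therefore comes last.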
![Page 29: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/29.jpg)
Regular Expressions
Examples:
1. a* is a regular expression that denotes the set {λ, a, aa, …}
2. a|b is a regular expression that denotes the set {a} ∪ {b}
3. a*|b is a regular expression that denotes the set {λ, a, aa, …} ∪ {b}
4. a*b is a regular expression that denotes the set {b, ab, aab, …}
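These sets can be explored by enumerating short strings and keeping the ones a pattern matches in full. The helper `language` below is a hypothetical name for this sketch; it relies on Python's `re.fullmatch`, whose syntax for these four expressions happens to coincide with the notation used here (λ appears as the empty string `''`).

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """Strings over alphabet, up to max_len long, that the pattern matches in full."""
    regex = re.compile(pattern)
    return [
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if regex.fullmatch("".join(chars))
    ]

for pattern in ["a*", "a|b", "a*|b", "a*b"]:
    print(pattern, language(pattern))
# a* ['', 'a', 'aa', 'aaa']
# a|b ['a', 'b']
# a*|b ['', 'a', 'b', 'aa', 'aaa']
# a*b ['b', 'ab', 'aab']
```

The lists are finite only because of `max_len`; the languages of a*, a*|b, and a*b are infinite.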
![Page 30: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/30.jpg)
Match and Create the Regular Expressions
1. 0(0|1)*0
2. ((λ|0)1*)*
3. ((0|1)0(0|1))*
• All strings of 0’s and 1’s that do not contain the substring 011
a. 000000
b. 01010
c. 010101
d. 101010
e. 001100
![Page 34: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/34.jpg)
Match and Create the Regular Expressions
1. 0(0|1)*0
2. ((λ|0)1*)*
3. ((0|1)0(0|1))*
• All strings of 0’s and 1’s that do not contain the substring 011
– 1*(0|01)*
a. 000000
b. 01010
c. 010101
d. 101010
e. 001100
![Page 35: Language Processing Systems - 会津大学公式 ...hamada/LP/L02-LP.pdf · Language Processing Systems Prof. Mohamed Hamada Software Engineering Lab The University of Aizu Japan](https://reader034.vdocuments.us/reader034/viewer/2022042316/5f0489797e708231d40e75da/html5/thumbnails/35.jpg)
END