![Page 1: Programming Language Concepts Lexical and Syntactic Analysis · 2018-08-27 · Most Important Steps in Compilation I Optional Preprocessing I Lexical analysis (scanning) I Syntax](https://reader033.vdocuments.us/reader033/viewer/2022041808/5e563fa5a0895e7a2c7fd582/html5/thumbnails/1.jpg)
Programming Language Concepts: Lexical and Syntactic Analysis
Janyl Jumadinova
24 January, 2017
Most Important Steps in Compilation
- Optional preprocessing
- Lexical analysis (scanning)
- Syntax analysis (parsing)
- Semantic analysis
- Intermediate code generation
- Optimization (usually machine-independent)
- Final code generation
- Optional final optimization
Lexical Analysis
For each token type, give a description:
- either a literal string (e.g., “≤” or “while” to describe an operator or reserved word),
- or a <rule> (e.g., the rule <unsigned int> might stand for “a sequence of one or more digits”; the rule <identifier> might stand for “a letter followed by a sequence of zero or more letters or digits”).
Lexical Analysis
Lexical analysis produces a “token stream” in which the program is reduced to a sequence of token types, each with its identifying number and the actual string (in the program) corresponding to it.
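A minimal sketch of such a scanner, written in Python for brevity. The token names, their identifying numbers, and the ASCII spelling "<=" for “≤” are assumptions made for illustration; the patterns follow the <unsigned int> and <identifier> descriptions given earlier.

```python
import re

# Token types in priority order (names and numbering are illustrative):
# the reserved word is tried before the identifier rule.
TOKEN_SPEC = [
    ("WHILE", r"while\b"),                    # literal string: reserved word
    ("LE", r"<="),                            # literal string: operator
    ("UNSIGNED_INT", r"[0-9]+"),              # one or more digits
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # letter, then letters or digits
    ("SKIP", r"\s+"),                         # whitespace produces no token
]

# Identifying number for each token type (SKIP is never emitted).
TOKEN_IDS = {
    name: number
    for number, (name, _) in enumerate(TOKEN_SPEC, start=1)
    if name != "SKIP"
}

MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(program):
    """Return the token stream as (type number, type name, actual string) triples."""
    tokens = []
    pos = 0
    while pos < len(program):
        match = MASTER.match(program, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {program[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((TOKEN_IDS[match.lastgroup], match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("while count <= 10"):
    print(token)
```

Each triple carries the token type's number, its name, and the matching program text, which is the information the parser needs from the scanner.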
Syntactic Analysis
- The syntax of a language is described by a grammar that specifies the legal combinations of tokens.
- Grammars are often specified in BNF notation (“Backus-Naur Form”):
<item1> ::= valid replacements for <item1>
<item2> ::= valid replacements for <item2>
Grammars (Context-Free Grammars)
- Collection of VARIABLES (things that can be replaced by other things), also called NON-TERMINALS.
- Collection of TERMINALS (“constants”, strings that can’t be replaced).
- One special variable called the START SYMBOL.
- Collection of RULES, also called PRODUCTIONS.
variable → rule1 | rule2 | rule3 | ...
You can also write each rule on a separate line (as in the book).
Grammars (Context-Free Grammars)
Grammar
A, B, and C are non-terminals. 0, 1, and 2 are terminals. The start symbol is A. The rules are:
- A → 0A | 1C | 2B | 0
- B → 0B | 1A | 2C | 1
- C → 0C | 1B | 2A | 2
Can 2011020 be parsed?
Can 1112202 be parsed?
Can 00102 be parsed?
Can 2120 be parsed?
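One way to check answers to these questions is to let a program apply the rules. The sketch below is an illustration in Python, not part of the original slides: it encodes each alternative as a terminal followed by an optional non-terminal, then tracks which non-terminals could still be waiting to be replaced after each input symbol. The names RULES and can_parse are hypothetical.

```python
# Each alternative is (terminal, next non-terminal); None means the rule
# ends the derivation, as in A -> 0.
RULES = {
    "A": [("0", "A"), ("1", "C"), ("2", "B"), ("0", None)],
    "B": [("0", "B"), ("1", "A"), ("2", "C"), ("1", None)],
    "C": [("0", "C"), ("1", "B"), ("2", "A"), ("2", None)],
}


def can_parse(text, start="A"):
    """Return True if `text` can be derived from the start symbol."""
    if not text:
        return False  # every rule produces at least one terminal
    pending = {start}  # non-terminals that may still need replacing
    for symbol in text[:-1]:
        pending = {
            nxt
            for variable in pending
            for terminal, nxt in RULES[variable]
            if terminal == symbol and nxt is not None
        }
    # The last input symbol must be produced by a rule that ends the derivation.
    return any(
        terminal == text[-1] and nxt is None
        for variable in pending
        for terminal, nxt in RULES[variable]
    )


for candidate in ["2011020", "1112202", "00102", "2120"]:
    print(candidate, can_parse(candidate))
```

Working one string by hand first, and only then running the check, keeps the exercise useful.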
Syntactic Analysis
- The process of verifying that a token stream represents a valid application of the rules is called parsing.
- Using the BNF rules we can construct a parse tree:
Sample Parse Tree (portion)
Sample Parse Tree (failed)
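The tree diagrams from these two slides did not survive extraction. As a stand-in, the following Python sketch builds a parse tree for the small A/B/C grammar shown earlier and returns None when parsing fails. The nested-tuple tree representation and the names RULES and parse_tree are assumptions chosen for illustration.

```python
# The example grammar: each alternative is (terminal, next non-terminal),
# with None marking a rule that ends in a terminal only.
RULES = {
    "A": [("0", "A"), ("1", "C"), ("2", "B"), ("0", None)],
    "B": [("0", "B"), ("1", "A"), ("2", "C"), ("1", None)],
    "C": [("0", "C"), ("1", "B"), ("2", "A"), ("2", None)],
}


def parse_tree(text, variable="A"):
    """Return a nested tuple (variable, terminal[, subtree]), or None on failure."""
    if not text:
        return None
    head, rest = text[0], text[1:]
    for terminal, nxt in RULES[variable]:
        if terminal != head:
            continue
        if nxt is None:
            if not rest:
                return (variable, terminal)      # leaf: rule such as C -> 2
        else:
            subtree = parse_tree(rest, nxt)
            if subtree is not None:
                return (variable, terminal, subtree)
    return None  # no rule applies: the parse fails


print(parse_tree("12"))    # ('A', '1', ('C', '2'))
print(parse_tree("2120"))  # None
```

Each tuple is one node: the non-terminal being replaced, the terminal it produces, and the subtree for the non-terminal that follows.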
Grammar for Java (version 8)
- Overview of notation used: https://docs.oracle.com/javase/specs/jls/se8/html/jls-2.html
- The full syntax grammar: https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
Compiling
So far, we have looked at:
- the scanner (lexical analysis): tokenizes input
- the parser (syntactic analysis): validates structure
- Next, we do semantic analysis: is the program meaningful?
EXAMPLE:
In Java,
    int i, i, i;
has the right structure for a declaration, but it’s not legal to redeclare i within the same block of code.
Semantic Analysis
- During lexical analysis and parsing, as we process tokens we gather the user-defined names into a symbol table.
- The symbol table contains information such as:
  - where the symbol first appeared (usually in a declaration)
  - whether it has an initial value (parsing will tell us this)
  - what its type is (parsing tells us), etc.
As the parser encounters names, it looks them up to see if they are already declared; if not, it creates a table entry. (Some names are pre-declared as part of the language.)
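The lookup-then-create behaviour can be sketched in a few lines of Python. This is a simplified illustration with a single scope (one block of code); the class and field names are hypothetical, and treating a second declaration of any existing name as an error is an assumption that mirrors the `int i, i, i;` example.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    first_line: int                  # where the symbol first appeared (0 = pre-declared)
    type_name: str                   # what its type is
    has_initial_value: bool = False  # whether it has an initial value


class SymbolTable:
    """A single-scope symbol table (a simplifying assumption)."""

    def __init__(self, predeclared=()):
        # Names that are part of the language are entered before parsing starts.
        self.entries = {
            name: Symbol(name, first_line=0, type_name="predeclared")
            for name in predeclared
        }

    def lookup(self, name):
        """Return the entry for `name`, or None if it is not declared yet."""
        return self.entries.get(name)

    def declare(self, name, line, type_name, has_initial_value=False):
        """Create an entry for a new name; reject a redeclaration."""
        if self.lookup(name) is not None:
            raise NameError(f"line {line}: '{name}' is already declared")
        entry = Symbol(name, line, type_name, has_initial_value)
        self.entries[name] = entry
        return entry


table = SymbolTable()
table.declare("i", line=1, type_name="int")
try:
    table.declare("i", line=1, type_name="int")  # the second i in: int i, i, i;
except NameError as error:
    print(error)
```

A real compiler would keep one such table per scope so that an inner block can look names up in the enclosing blocks.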
Intermediate Code Generation
- During this phase, the parsed program is converted into a simpler, step-by-step description in some intermediate language.
- The intermediate language may exist only as an internal representation within the compiler (it does not need to be an “actual” language).
- A simple example is something called “three-address code.”
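To make three-address code concrete, here is a small Python sketch that flattens an assignment into instructions with at most one operator each. It borrows Python's own `ast` module as a stand-in for the scanner and parser, which is a shortcut rather than how a compiler for another language would do it; the temporary names t1, t2, ... and the function name three_address are assumptions.

```python
import ast
import itertools

# Operator symbols for the binary operations this sketch supports.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address(source):
    """Translate one assignment, e.g. 'x = a + b * c', into three-address code."""
    code = []
    temps = itertools.count(1)

    def emit(node):
        # Names and constants are already usable as operands.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = emit(node.left)
            right = emit(node.right)
            temp = f"t{next(temps)}"  # fresh temporary for this result
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    statement = ast.parse(source).body[0]
    if not isinstance(statement, ast.Assign) or not isinstance(statement.targets[0], ast.Name):
        raise ValueError("expected a simple assignment")
    result = emit(statement.value)
    code.append(f"{statement.targets[0].id} = {result}")
    return code


for line in three_address("x = a + b * c"):
    print(line)
# t1 = b * c
# t2 = a + t1
# x = t2
```

Every generated line names at most three "addresses" (two operands and a result), which is what makes the later translation to machine instructions simple.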
Why Intermediate Code Generation?
Optimization
Here is a list of the many, many kinds of optimizations: http://www.compileroptimizations.com/
(The examples show the effects on source code, but the optimizations are usually made on the intermediate code.)
Code Generation
- It is usually very straightforward to generate machine instructions from intermediate code, since the intermediate code is simple.
- Some further machine-specific optimizations may take place during or after this stage.
Pipelining
- The steps in compilation don’t need to be done in whole phases, one after the other, but can be “pipelined”:
  - int count = 1;
    create tokens, pass to parser, generate some intermediate code
  - j = j + count;
    create tokens, pass to parser, generate some intermediate code
  - ... etc. ...
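Python generators give a compact way to see this interleaving. In the sketch below each phase is a generator that handles one statement at a time and hands it on; the phase bodies are placeholders (the "parser" only wraps the tokens, the "code generator" only emits a comment), and the names scan, parse, generate, and log are hypothetical.

```python
import re


def scan(statements, log):
    """Lexical analysis: produce the tokens of one statement at a time."""
    for text in statements:
        log.append(f"scan: {text}")
        yield re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", text)


def parse(token_stream, log):
    """Syntax analysis (placeholder): wrap each token list in a trivial tree."""
    for tokens in token_stream:
        log.append(f"parse: {' '.join(tokens)}")
        yield ("statement", tokens)


def generate(tree_stream, log):
    """Intermediate code generation (placeholder): emit one line per statement."""
    for _, tokens in tree_stream:
        log.append(f"generate: {' '.join(tokens)}")
        yield f"; code for: {' '.join(tokens)}"


log = []
program = ["int count = 1;", "j = j + count;"]
output = list(generate(parse(scan(program, log), log), log))

for entry in log:
    print(entry)
```

The log shows scan, parse, and generate running for the first statement before the second statement is scanned at all, because each generator only asks the previous phase for more input when it needs it.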