Description of programming languages
Using regular expressions and context free grammars
Introduction
• Programming languages must be described in an exact language
  – So there can be no discussion about whether a language element is legal or not
• I will introduce 2 description languages
  – Regular expressions
    • Used to describe the “small” parts of a programming language
      – Identifiers, numbers, etc.
  – Context free grammars
    • Used to describe the “bigger” parts of a programming language
      – Expressions, statements, classes, etc.
Regular expressions defined
• We need an alphabet, called Σ
  – Example alphabets: ASCII, Unicode
• A regular expression denotes a set of strings over Σ
  – Ø (the empty set) is a regular expression
  – { ε } is a regular expression
    • ε means the empty string
  – Every set {a}, where a is in the alphabet Σ, is a regular expression
  – From two regular expressions R and S we can build more regular expressions
    • R | S: the union R ∪ S
    • RS: all concatenations rs of a string r from R and a string s from S
    • R*: all concatenations of zero or more strings from R; if R is {a} then R* is {ε, a, aa, aaa, … }
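The definition maps almost directly onto code. Below is a minimal Java sketch, assuming Java 16 or later for records and pattern matching with instanceof. The names RegexSets, Re and lang are made up for illustration. Each way of building a regular expression becomes a record, and lang(re, max) computes the strings of the denoted set up to length max. The length bound is needed because R* is usually an infinite set.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the set semantics of regular expressions,
// truncated to strings of length <= max so that R* stays finite.
public class RegexSets {
    interface Re {}
    record Empty() implements Re {}            // Ø
    record Eps() implements Re {}              // { ε }
    record Sym(char a) implements Re {}        // { a }, a in Σ
    record Union(Re r, Re s) implements Re {}  // R | S
    record Concat(Re r, Re s) implements Re {} // RS
    record Star(Re r) implements Re {}         // R*

    static Set<String> lang(Re re, int max) {
        Set<String> out = new TreeSet<>();
        if (re instanceof Eps) {
            out.add("");                                  // the empty string ε
        } else if (re instanceof Sym sym) {
            if (max >= 1) out.add(String.valueOf(sym.a()));
        } else if (re instanceof Union u) {
            out.addAll(lang(u.r(), max));                 // R ∪ S
            out.addAll(lang(u.s(), max));
        } else if (re instanceof Concat c) {
            Set<String> right = lang(c.s(), max);
            for (String x : lang(c.r(), max))             // every r from R ...
                for (String y : right)                    // ... followed by every s from S
                    if (x.length() + y.length() <= max) out.add(x + y);
        } else if (re instanceof Star st) {
            Set<String> base = lang(st.r(), max);
            out.add("");                                  // zero repetitions
            boolean changed = true;
            while (changed) {                             // add one more repetition until nothing new fits
                Set<String> next = new TreeSet<>(out);
                for (String x : out)
                    for (String y : base)
                        if (x.length() + y.length() <= max) next.add(x + y);
                changed = next.size() > out.size();
                out = next;
            }
        }
        // Empty falls through: Ø denotes no strings at all
        return out;
    }

    public static void main(String[] args) {
        Re a = new Sym('a');
        System.out.println(lang(new Star(a), 3));                  // [, a, aa, aaa]
        Re ab = new Union(a, new Sym('b'));
        System.out.println(lang(new Concat(ab, new Star(ab)), 2)); // [a, aa, ab, b, ba, bb]
    }
}
```

The first line of output is the slide's example {ε, a, aa, aaa, … } cut off at length 3. The leading empty entry is ε.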
Regular expressions examples
• Set of non-negative integers
  – (0|1|2|3|4|5|6|7|8|9) (0|1|2|3|4|5|6|7|8|9)*
  – Leading zeros are allowed: 007 is in the set
• Set of words in English
  – (a|b|…|z)(a|b|…|z)*
  – Not exactly English …
    • bbz is in the set, but is not an English word
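Java's regular expression syntax uses the same operators: | for union, juxtaposition for concatenation, * for star and parentheses for grouping. So the two long-form expressions can be tried out unchanged. In this sketch the class and method names are made up. Pattern.matches(regex, input) is from java.util.regex and succeeds only when the whole input belongs to the set.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the two example sets written without any shorthand.
public class LongFormRegex {
    static final String DIGIT = "(0|1|2|3|4|5|6|7|8|9)";
    static final String LETTER = "(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)";
    static final String INTEGER = DIGIT + DIGIT + "*";   // one digit, then zero or more digits
    static final String WORD = LETTER + LETTER + "*";    // one letter, then zero or more letters

    static boolean isInteger(String s) { return Pattern.matches(INTEGER, s); }
    static boolean isWord(String s) { return Pattern.matches(WORD, s); }

    public static void main(String[] args) {
        System.out.println(isInteger("2024"));    // true
        System.out.println(isInteger("007"));     // true: leading zeros are allowed
        System.out.println(isInteger(""));        // false: at least one digit is required
        System.out.println(isWord("language"));   // true
        System.out.println(isWord("bbz"));        // true, although bbz is not an English word
        System.out.println(isWord("Java"));       // false: only lower-case letters
    }
}
```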
Regular expressions, short hand notation
• R+ means R R*
  – 1 or more occurrences
• R? means ε | R
  – 0 or 1 occurrence
• [a-z] means a|b|c|…|z
• [a-zA-Z] means [a-z] | [A-Z]
• Examples
  – Integer: -?[0-9]+
  – Identifier: [a-zA-Z][a-zA-Z0-9]*
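Java accepts the shorthand notation as well. The two example patterns can be compiled once with Pattern.compile and reused through Matcher.matches(). The class and method names below are illustrative only. Note that this identifier pattern is stricter than Java's own identifier rules, which also allow _ and $.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the shorthand examples, compiled once and reused.
public class ShorthandRegex {
    static final Pattern INTEGER = Pattern.compile("-?[0-9]+");              // optional minus, 1 or more digits
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // matches() succeeds only if the whole input is in the set
    static boolean isInteger(String s) { return INTEGER.matcher(s).matches(); }
    static boolean isIdentifier(String s) { return IDENTIFIER.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] inputs = {"42", "-7", "4-2", "x1", "1x", "_tmp"};
        for (String s : inputs)
            System.out.printf("%-5s integer=%b identifier=%b%n", s, isInteger(s), isIdentifier(s));
    }
}
```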
Regular expressions in Java
• Java API which uses regular expressions
  – Class String
    • String[] split(String regex)
    • "Java is my favorite language".split(" ")
      – produces the array {"Java", "is", "my", "favorite", "language"}
      – " " is a very simple regular expression
  – Package java.util.regex
    • Class Pattern
    • Class Matcher
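Here is a short sketch of both APIs. The class name and the identifiers() helper are made up. split, Pattern.compile, matcher, find and group are standard Java methods. The identifier pattern is the one from the shorthand examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaRegexDemo {
    // String.split: the argument " " is itself a (very simple) regular expression
    static String[] words(String sentence) {
        return sentence.split(" ");
    }

    // Pattern + Matcher: collect every substring that is an identifier
    static List<String> identifiers(String source) {
        Pattern p = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
        Matcher m = p.matcher(source);
        List<String> found = new ArrayList<>();
        while (m.find()) {          // find() scans forward to the next match
            found.add(m.group());   // group() is the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Java is my favorite language")));
        // [Java, is, my, favorite, language]
        System.out.println(identifiers("sum2 = x + 42 * y1"));
        // [sum2, x, y1]
    }
}
```

Pattern.compile is worth the extra line when the same expression is used many times, since the pattern is then only compiled once.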
What regular expressions can’t do
• Regular expressions can describe only simple languages
• Regular expressions have no “memory”
  – They cannot describe parenthesis structures nested to arbitrary depth, such as
    • (((a + b) + c) + d)
    • if (…) { if (…) … else … } else …
• We need something stronger!
  – Context free grammars
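The missing memory is easy to see in code. A regular expression in the sense defined above corresponds to a finite automaton with a fixed number of states. Checking that parentheses balance needs a counter that can grow without bound. This hypothetical Java sketch uses exactly one such counter.

```java
// Hypothetical sketch: balance checking needs an unbounded counter,
// which is the kind of memory a regular expression does not have.
public class Balanced {
    static boolean balanced(String s) {
        int depth = 0;                          // current nesting depth
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) return false;    // a ')' with no matching '('
            }
        }
        return depth == 0;                      // every '(' has been closed
    }

    public static void main(String[] args) {
        System.out.println(balanced("(((a + b) + c) + d)")); // true
        System.out.println(balanced("((a + b) + c) + d)"));  // false
        System.out.println(balanced("(a + b"));              // false
    }
}
```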
Context free grammars defined
• A context free grammar consists of 4 parts
  – V is an alphabet
  – Σ is a set of terminals, Σ ⊂ V
    • The elements of the set V − Σ are called non-terminals
  – R is a finite set of production rules, R ⊆ (V − Σ) × V*
    • A rule (A, w) is written A → w
  – S is the start symbol, S ∈ V − Σ
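The four parts can be written down as data. In this hypothetical Java sketch (Java 16+ records), symbols are single characters to keep it small, and isWellFormed checks the conditions of the definition. The names Cfg, Rule, Grammar and isWellFormed are made up for illustration.

```java
import java.util.List;
import java.util.Set;

public class Cfg {
    // A production A → w: the left side is one symbol, the right side a string over V
    record Rule(char left, String right) {}

    // The four parts (V, Σ, R, S)
    record Grammar(Set<Character> v, Set<Character> sigma, List<Rule> rules, char start) {

        boolean isNonTerminal(char c) {
            return v.contains(c) && !sigma.contains(c);     // c ∈ V − Σ
        }

        // Checks the conditions from the definition
        boolean isWellFormed() {
            if (!v.containsAll(sigma)) return false;        // Σ ⊆ V
            if (!isNonTerminal(start)) return false;        // S ∈ V − Σ
            for (Rule r : rules) {
                if (!isNonTerminal(r.left())) return false; // left side in V − Σ
                for (char c : r.right().toCharArray())
                    if (!v.contains(c)) return false;       // right side in V*
            }
            return true;
        }
    }

    public static void main(String[] args) {
        // The "a, b" example grammar
        Grammar g = new Grammar(
                Set.of('a', 'b', 'A'),
                Set.of('a', 'b'),
                List.of(new Rule('A', "Aa"), new Rule('A', "Ab"),
                        new Rule('A', "a"), new Rule('A', "b")),
                'A');
        System.out.println(g.isWellFormed()); // true
    }
}
```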
Context free grammars examples
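• Example: a, b
  – Alphabet {a, b, A}
  – Terminals { a, b }
  – Non-terminals { A }
  – Start symbol A
  – Productions
    • {A → Aa, A → Ab, A → a, A → b}
  – Some derivations (⇒ means “derives in one step”)
    • A ⇒ Aa ⇒ Aaa ⇒ Abaa ⇒ abaa
    • A ⇒ Ab ⇒ ab
    • A ⇒ Ab ⇒ bb
  – The grammar generates every non-empty string of a's and b's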
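A derivation step replaces a non-terminal by the right side of one of its productions. The following hypothetical Java sketch replays the first derivation above. The helper names step and derive are made up, and derive assumes the start symbol A as in this example.

```java
import java.util.ArrayList;
import java.util.List;

public class Derivation {
    // One derivation step: replace the leftmost occurrence of the
    // non-terminal 'left' in the sentential form by 'right'.
    static String step(String form, char left, String right) {
        int i = form.indexOf(left);
        if (i < 0) throw new IllegalArgumentException(left + " does not occur in " + form);
        return form.substring(0, i) + right + form.substring(i + 1);
    }

    // Starting from A, applies A → right for each given right side in order
    static List<String> derive(String... rights) {
        List<String> forms = new ArrayList<>();
        String form = "A";
        forms.add(form);
        for (String right : rights) {
            form = step(form, 'A', right);
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        // Uses A → Aa, A → Aa, A → Ab, A → a
        System.out.println(String.join(" => ", derive("Aa", "Aa", "Ab", "a")));
        // A => Aa => Aaa => Abaa => abaa
    }
}
```

Every sentential form in this grammar contains at most one A, which is why "leftmost" makes no difference here.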
Example: Boolean expressions
• We only state the productions explicitly
  – Terminals and non-terminals can be inferred by looking at the productions
  – Convention
    • Capital letters: non-terminals
    • Everything else (lower-case words such as true, and symbols such as && and ( ): terminals
• Boolean expressions, start symbol E
  – E → true
  – E → false
  – E → E && E
  – E → E || E
  – E → (E)
  – E → !E
• Derivation (⇒ means one step, ⇒* means zero or more steps)
  – E ⇒ E && E ⇒ E && (E) ⇒ E && (E || E) ⇒* true && (false || true)
A derivation is often pictured as a parse tree.
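Here is a hedged Java sketch of the parse tree for the derivation of true && (false || true), with one record per production (Java 16+; the record names are hypothetical). Reading the leaves from left to right, as text() does, gives back the derived string. eval() shows one thing a compiler or interpreter can do with such a tree.

```java
public class BoolTree {
    // One node type per production of E
    interface E {}
    record True() implements E {}                // E → true
    record False() implements E {}               // E → false
    record And(E left, E right) implements E {}  // E → E && E
    record Or(E left, E right) implements E {}   // E → E || E
    record Paren(E inner) implements E {}        // E → (E)
    record Not(E operand) implements E {}        // E → !E

    // The leaves read from left to right give back the derived string
    static String text(E e) {
        if (e instanceof True) return "true";
        if (e instanceof False) return "false";
        if (e instanceof And a) return text(a.left()) + " && " + text(a.right());
        if (e instanceof Or o) return text(o.left()) + " || " + text(o.right());
        if (e instanceof Paren p) return "(" + text(p.inner()) + ")";
        if (e instanceof Not n) return "!" + text(n.operand());
        throw new IllegalArgumentException("unknown node " + e);
    }

    // The same tree also determines the value of the expression
    static boolean eval(E e) {
        if (e instanceof True) return true;
        if (e instanceof False) return false;
        if (e instanceof And a) return eval(a.left()) && eval(a.right());
        if (e instanceof Or o) return eval(o.left()) || eval(o.right());
        if (e instanceof Paren p) return eval(p.inner());
        if (e instanceof Not n) return !eval(n.operand());
        throw new IllegalArgumentException("unknown node " + e);
    }

    public static void main(String[] args) {
        // Tree for E => E && E => E && (E) => E && (E || E) =>* true && (false || true)
        E tree = new And(new True(), new Paren(new Or(new False(), new True())));
        System.out.println(text(tree)); // true && (false || true)
        System.out.println(eval(tree)); // true
    }
}
```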
What context free grammars can’t do
• Context free grammars cannot be used to check that a variable is declared before it is used
  – Let alone to check a variable's type
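Such checks are usually done with a symbol table during semantic analysis. This Java sketch uses a hypothetical mini-language with one statement per string. "var x" declares x, and "x = y + z" assigns to x while using y and z. A context free grammar can describe both statement forms. Whether a name was declared earlier depends on context, though, so it is tracked in a set of declared names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a declared-before-use check with a symbol table.
public class DeclaredBeforeUse {
    static List<String> undeclaredUses(List<String> program) {
        Set<String> symbolTable = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (String stmt : program) {
            String[] tokens = stmt.trim().split("\\s+");
            if (tokens[0].equals("var")) {
                symbolTable.add(tokens[1]);           // record the declaration
                continue;
            }
            for (String t : tokens) {
                // every identifier in an assignment must already be declared
                if (t.matches("[a-zA-Z][a-zA-Z0-9]*") && !symbolTable.contains(t)) {
                    errors.add(t + " used before declaration in \"" + stmt + "\"");
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> program = List.of("var x", "x = 1", "y = x + 1", "var y");
        undeclaredUses(program).forEach(System.out::println);
        // y used before declaration in "y = x + 1"
    }
}
```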
The phases of a compiler
• Lexical analysis (scanning)
  – Using regular expressions
• Syntax analysis (parsing)
  – Using context free grammars
• Semantic analysis
  – Using a symbol table
• Code generation
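As a hedged illustration of the first phase, here is a small Java scanner. It combines one regular expression per token kind into a single pattern with named groups. At each position it uses Matcher.region and lookingAt to read the next token. The token kinds and the names TinyLexer and scan are hypothetical choices for this sketch, not a fixed convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyLexer {
    // One regular expression per token kind, combined with | into one pattern
    static final Pattern TOKEN = Pattern.compile(
            "(?<SPACE>\\s+)"
          + "|(?<NUMBER>[0-9]+)"
          + "|(?<IDENT>[a-zA-Z][a-zA-Z0-9]*)"
          + "|(?<OP>&&|\\|\\||[-+*/=!()])");

    static final String[] KINDS = {"NUMBER", "IDENT", "OP"};   // SPACE is skipped

    // Splits the input into tokens written as KIND:text
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt())                  // no token starts at pos
                throw new IllegalArgumentException("Unexpected character at " + pos + ": " + input.charAt(pos));
            for (String kind : KINDS) {
                if (m.group(kind) != null) tokens.add(kind + ":" + m.group(kind));
            }
            pos = m.end();                       // continue after the matched token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("x1 = 42 + y"));
        // [IDENT:x1, OP:=, NUMBER:42, OP:+, IDENT:y]
    }
}
```

In a real compiler the parser would then consume this token stream instead of raw characters. That is why the grammar for Boolean expressions can treat true and && as single terminals.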