cps 506 comparative programming languages


Upload: abedi

Post on 15-Jan-2016


CPS 506 Comparative Programming Languages

Syntax Specification

Compiling Process Steps

• Program → Lexical Analysis
– Convert characters into a stream of tokens
• Lexical Analysis → Syntactic Analysis
– Send the tokens to the parser, which builds an abstract representation or parse tree

Compiling Process Steps (cont'd)

• Syntactic Analysis → Semantic Analysis
– Send the parse tree to be checked for semantic consistency and transformed to run efficiently on the target architecture (optimization)
• Semantic Analysis → Machine Code
– Convert the abstract representation to executable machine code using code generation
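The stages above can be observed, roughly, with Python's own standard library. This is only a sketch: the one-line program is a made-up example, and Python's `compile` produces bytecode for its virtual machine rather than native machine code.

```python
import ast
import io
import tokenize

source = "p = 3.14\n"  # hypothetical one-line program

# Lexical analysis: characters -> stream of tokens
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # drop NEWLINE/ENDMARKER tokens, which carry no visible text
]

# Syntactic analysis: ast.parse runs the lexer and parser and returns a syntax tree
tree = ast.parse(source)

# Code generation: tree -> executable code object (bytecode, not machine code)
code = compile(tree, "<example>", "exec")

namespace = {}
exec(code, namespace)

print(tokens)                       # ['p', '=', '3.14']
print(type(tree.body[0]).__name__)  # Assign
print(namespace["p"])               # 3.14
```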

Formal Methods and Language Processing

• Meta-Language
– A language to define other languages
• BNF (Backus-Naur Form)
– A set of rewriting rules ρ
– A set of terminal symbols Σ
– A set of non-terminal symbols N
– A start symbol S ∈ N
– ρ : A → ω
– A ∈ N and ω ∈ (N ∪ Σ)*
– Right-hand side: a sequence of terminal and non-terminal symbols
– Left-hand side: a non-terminal symbol

BNF (cont'd)

• The words in N: grammatical categories
– Identifier, Expression, Loop, Program, …
– S: the principal grammatical category
– Symbols in Σ: the basic alphabet
– Example 1:
binaryDigit → 0
binaryDigit → 1
• or
binaryDigit → 0 | 1
– Example 2:
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
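A grammar in this sense is just four pieces of data. The sketch below writes the Integer grammar of Example 2 as Python values and checks that every rule has a non-terminal on the left and only known symbols on the right. The names `NONTERMINALS`, `TERMINALS`, `RULES`, `START`, and `is_well_formed` are illustrative choices, not part of the course material.

```python
NONTERMINALS = {"Integer", "Digit"}   # N
TERMINALS = set("0123456789")         # Σ
START = "Integer"                     # S

# Each rule is (left-hand side, right-hand side as a sequence of symbols)
RULES = [
    ("Integer", ("Digit",)),
    ("Integer", ("Integer", "Digit")),
] + [("Digit", (d,)) for d in "0123456789"]


def is_well_formed(nonterminals, terminals, rules, start):
    """Check S ∈ N, N and Σ disjoint, and every rule A → ω with A ∈ N, ω ∈ (N ∪ Σ)*."""
    symbols = nonterminals | terminals
    return (
        start in nonterminals
        and nonterminals.isdisjoint(terminals)
        and all(
            lhs in nonterminals and all(sym in symbols for sym in rhs)
            for lhs, rhs in rules
        )
    )


print(is_well_formed(NONTERMINALS, TERMINALS, RULES, START))  # True
```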

BNF (cont'd)

• Derivation
Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 2 Digit Digit ⇒ 28 Digit ⇒ 281

• Parse Tree
Integer
├── Integer
│   ├── Integer
│   │   └── Digit
│   │       └── 2
│   └── Digit
│       └── 8
└── Digit
    └── 1
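The derivation of 281 follows a fixed pattern: expand Integer → Integer Digit until there are enough symbols, apply Integer → Digit once, then replace each Digit from left to right. A minimal sketch that reproduces those steps for any digit string is given here; `derive_integer` is a hypothetical helper name.

```python
DIGITS = "0123456789"


def derive_integer(numeral: str) -> list[str]:
    """Return the leftmost derivation of `numeral` from Integer, one sentential form per step."""
    if not numeral or any(ch not in DIGITS for ch in numeral):
        raise ValueError("numeral must be a non-empty string of decimal digits")
    n = len(numeral)
    steps = ["Integer"]
    # Apply Integer -> Integer Digit (n - 1) times
    for k in range(1, n):
        steps.append(" ".join(["Integer"] + ["Digit"] * k))
    # Apply Integer -> Digit once
    steps.append(" ".join(["Digit"] * n))
    # Replace the leftmost remaining Digit with the next character
    for i in range(1, n + 1):
        steps.append(" ".join([numeral[:i]] + ["Digit"] * (n - i)))
    return steps


print(" ⇒ ".join(derive_integer("281")))
```

Each entry in the returned list is one sentential form, so the length of the list minus one is the number of rule applications.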

BNF (cont'd)

• Lexeme: the lowest-level syntactic unit
• Tokens: the set of all grammatical categories that define strings of non-blank characters (lexical syntax)
– Identifier (variable names, function names, …)
– Literal (integer and decimal numbers, …)
– Operator (+, -, *, /, …)
– Separator (;, ., (, ), {, }, …)
– Keyword (int, if, for, where, …)

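BNF (cont'd)

// comments …
void main ( ) {
    float p;
    p = 3.14 ;
}

Token categories in this program:
– Comment: // comments …
– Keyword: void, float
– Identifier: main, p
– Operator: =
– Separator: ( ) { } ;
– Literal: 3.14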
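A small lexer can reproduce this classification. The sketch below is one possible approach using Python's `re` module with named groups. The keyword set and the exact patterns are assumptions chosen to cover the sample program, not a full lexical specification of C.

```python
import re

KEYWORDS = {"void", "float", "int", "if", "for"}  # assumed subset of keywords

# Order matters: Comment must be tried before Operator so "//" is not read as two "/"
TOKEN_SPEC = [
    ("Comment",    r"//[^\n]*"),
    ("Literal",    r"\d+\.\d+|\d+"),
    ("Identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("Operator",   r"[+\-*/=]"),
    ("Separator",  r"[;.,(){}]"),
    ("Skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, lexeme) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "Skip":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"  # keywords are identifiers reserved by the language
        tokens.append((kind, text))
    return tokens


program = "// comments\nvoid main ( ) {\n    float p;\n    p = 3.14 ;\n}\n"
for kind, text in tokenize(program):
    print(f"{kind:<10} {text}")
```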


Regular Expressions

• An alternative to BNF for defining the lexical rules of a language
– x : a character
– “abc” : a literal string
– A | B : A or B
– A B : concatenation of A and B
– A* : zero or more occurrences of A
– A+ : one or more occurrences of A
– A? : zero or one occurrence of A
– [a-z A-Z] : any alphabetic character
– [0-9] : any digit
– . : any single character
• Example
Integer : [0-9]+
Identifier : [a-z A-Z][a-z A-Z 0-9]*
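The two example rules can be tried directly with Python's `re` module. This is a sketch: the spaces inside the character classes are dropped, since in Python a space inside `[...]` would match a literal space, and the helper names `is_integer` and `is_identifier` are illustrative.

```python
import re

INTEGER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def is_integer(text):
    # fullmatch requires the whole string to match, not just a prefix
    return INTEGER.fullmatch(text) is not None


def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None


print(is_integer("281"), is_integer("28a"))      # True False
print(is_identifier("x1"), is_identifier("1x"))  # True False
```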

Syntactic Analysis

• Primary tool: BNF
• Input: tokens from lexical analysis
• Output: parse tree
• Syntactic categories
– Program
  • Declaration
  • Assignment
  • Expression
  • Loop
  • Function definition

Syntactic Analysis (cont'd)

• Example
Arithmetic Expression → Term | Arithmetic Expression + Term | Arithmetic Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → Identifier | Literal | ( Arithmetic Expression )

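Syntactic Analysis (cont'd)

• Example: parse tree for 2 * a - 3

Arithmetic Expression
├── Arithmetic Expression
│   └── Term
│       ├── Term
│       │   └── Factor
│       │       └── Literal
│       │           └── Integer
│       │               └── 2
│       ├── *
│       └── Factor
│           └── Identifier
│               └── Letter
│                   └── a
├── -
└── Term
    └── Factor
        └── Literal
            └── Integer
                └── 3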

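Syntactic Analysis (cont'd)

• BNF limitations: some rules cannot be expressed in the grammar
– Has an identifier been declared before it is used?
– Has an identifier been given an initial value?
• In statically typed languages
– The type system handles the first problem
– Violations are detected at compile time or at run time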

Ambiguous Grammar

• A grammar is ambiguous if some string can be parsed into two or more different trees
• Example
Exp → Identifier | Literal | Exp - Exp
Input: A - B - C
Output:
1. A - (B - C)
2. (A - B) - C
• Another example is the “dangling else”, which can be resolved by
– Using BNF rules
– Using extra-grammatical rules
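The ambiguity can be made concrete by enumerating every tree the rule Exp → Exp - Exp allows for a given operand sequence. The sketch below shows each tree as a fully parenthesised string. `parse_trees` is a hypothetical helper, and it treats every operand as an Identifier.

```python
def parse_trees(operands):
    """List every parse of `a - b - c ...` under Exp -> Identifier | Exp - Exp."""
    if len(operands) == 1:
        return [operands[0]]
    trees = []
    # Any "-" may be the root: everything left of it is one Exp, everything right another
    for split in range(1, len(operands)):
        for left in parse_trees(operands[:split]):
            for right in parse_trees(operands[split:]):
                trees.append(f"({left} - {right})")
    return trees


for tree in parse_trees(["A", "B", "C"]):
    print(tree)
# (A - (B - C))
# ((A - B) - C)
```

With numbers substituted the two trees give different results (for example 8 - (4 - 2) is 6 but (8 - 4) - 2 is 2), which is why the ambiguity matters.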

Operator Precedence

<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>

A = B + C * A is parsed as A = B + (C * A)
A = B * C + A is parsed as A = B * (C + A)

Solution

<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

A = B + C * A is parsed as A = B + (C * A)
A = B * C + A is parsed as A = (B * C) + A
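To see why the first grammar groups B * C + A incorrectly, here is a minimal sketch of a parser that follows it literally: whatever operator comes first becomes the root, and everything to its right becomes the right operand. Parenthesised sub-expressions are left out for brevity, and `parse_flat` is an illustrative name.

```python
def parse_flat(tokens):
    """Parse with <expr> -> <id> + <expr> | <id> * <expr> | <id>, returning nested tuples."""
    if not tokens or not tokens[0].isidentifier():
        raise SyntaxError("expected an identifier")
    if len(tokens) == 1:
        return tokens[0]
    if tokens[1] not in ("+", "*"):
        raise SyntaxError(f"expected + or *, got {tokens[1]!r}")
    # The first operator is the root, regardless of which operator it is
    return (tokens[1], tokens[0], parse_flat(tokens[2:]))


print(parse_flat(["B", "+", "C", "*", "A"]))  # ('+', 'B', ('*', 'C', 'A'))
print(parse_flat(["B", "*", "C", "+", "A"]))  # ('*', 'B', ('+', 'C', 'A'))
```

The second result has + nested under *, which is the wrong grouping. The layered expr/term/factor grammar avoids this by only allowing * inside a term.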

Associativity of Operators

A + B + C    A * B * C    A / B / C …

• Left Associativity
– Left recursive: in a grammar rule, the LHS also appears at the beginning of its RHS
<expr> → <expr> + <term> | <term>
A + B + C is parsed as (A + B) + C

• Right Associativity
– Right recursive: in a grammar rule, the LHS also appears at the end of its RHS
<factor> → <exp> ** <factor> | <exp>
<exp> → ( <expr> ) | <id>
A + B ** C is parsed as A + (B ** C)
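Associativity changes the result whenever the operator is not associative in the mathematical sense. The sketch below groups the same operand list from the left and from the right. The operand values are arbitrary, and `fold_left` / `fold_right` are illustrative names.

```python
import operator
from functools import reduce


def fold_left(op, values):
    """Group as ((a op b) op c) ..., the grouping a left-recursive rule produces."""
    return reduce(op, values)


def fold_right(op, values):
    """Group as a op (b op (c ...)), the grouping a right-recursive rule produces."""
    result = values[-1]
    for value in reversed(values[:-1]):
        result = op(value, result)
    return result


print(fold_left(operator.sub, [8, 4, 2]))   # (8 - 4) - 2 = 2
print(fold_right(operator.sub, [8, 4, 2]))  # 8 - (4 - 2) = 6
print(fold_left(operator.pow, [2, 3, 2]))   # (2 ** 3) ** 2 = 64
print(fold_right(operator.pow, [2, 3, 2]))  # 2 ** (3 ** 2) = 512
```

Python itself treats `-` as left associative and `**` as right associative, so `8 - 4 - 2` evaluates to 2 and `2 ** 3 ** 2` to 512.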

Extended BNF (EBNF)

• Optional part of an RHS: [ ]
<if_stmt> → if ( <expr> ) <statement> [ else <statement> ]

• Repeated part of an RHS (replaces recursion): { }
<id_list> → <id> { , <id> }

• Multiple-choice part of an RHS: ( | )
<term> → <term> ( * | / | % ) <factor>

• Optional use of * and + to mark repetition
<id_list> → <id> { , <id> }*
<integer> → {0 | … | 9}+
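One practical benefit of EBNF is that its braces map directly onto a loop in a hand-written parser. A minimal sketch for the rule `<id_list> → <id> { , <id> }` follows. The input is assumed to be an already tokenised list of strings, and `parse_id_list` is a hypothetical name.

```python
def parse_id_list(tokens):
    """Parse <id_list> -> <id> { , <id> } and return the identifiers in order."""
    pos = 0

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"expected an identifier at position {pos}")
        ident = tokens[pos]
        pos += 1
        return ident

    ids = [expect_id()]
    # The { , <id> } part: repeat zero or more times
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        ids.append(expect_id())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ids


print(parse_id_list(["a", ",", "b", ",", "c"]))  # ['a', 'b', 'c']
```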

Extended BNF (EBNF) (cont'd)

• opt subscript
Conditional Statement → if ( Expr ) Statement { else Statement }opt

• Syntax Diagram
(Syntax diagram for Term: a Factor, optionally followed by repeated * or / and another Factor.)

Case Study

• A BNF or EBNF for one construct, such as Expression, different Literals, or the if Statement in Java, C, C++, or Pascal
• BNF or EBNF for floating-point numbers in Java, C, C++
• BNF or EBNF for loop statements in one language

Abstract Syntax

• Consider the following code:

• Pascal
while i < 10 do
begin
    i := i + 1;
end;

• C or Java
while (i < 10) {
    i = i + 1;
}

Although the syntax differs, the two loops are essentially equivalent

• Abstract syntax is a way to show the essential elements of a language

Abstract Syntax (cont'd)

• General Form
Abstract Syntax Class = list of essential components

• Example
Loop = Expression test; Statement body

• A Java class for the abstract syntax of Loop
class Loop extends Statement {
    Expression test;
    Statement body;
}

(Callouts in the original slide: Member, Element)

Abstract Syntax (cont'd)

• More examples
Assignment = Variable target; Expression source

• A Java class for the abstract syntax of Assignment
class Assignment extends Statement {
    Variable target;
    Expression source;
}

(Callouts in the original slide: Member, Element)

Abstract Syntax Tree

• A tree that shows the abstract syntax of a construct

Example: x = 2; (C or Java) and x := 2; (Pascal)

Assignment = Variable target; Expression source

Statement
└── Assignment
    ├── Variable
    │   └── x
    └── Expression
        └── Value
            └── 2
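The point that different concrete syntaxes share one abstract syntax can be sketched with Python dataclasses that mirror the Loop and Assignment definitions. The class layout follows the slides, but the `Value` fields, the regular expression, and `parse_assignment` are assumptions made for this illustration and only handle assignments of an integer literal.

```python
import re
from dataclasses import dataclass


class Statement:
    pass


class Expression:
    pass


@dataclass
class Variable(Expression):
    name: str


@dataclass
class Value(Expression):
    value: int


@dataclass
class Assignment(Statement):
    target: Variable
    source: Expression


@dataclass
class Loop(Statement):
    test: Expression
    body: Statement


# Accepts either "=" (C, Java) or ":=" (Pascal) as the assignment symbol
ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*(:=|=)\s*(\d+)\s*;\s*")


def parse_assignment(text):
    match = ASSIGN.fullmatch(text)
    if match is None:
        raise SyntaxError(f"not a simple assignment: {text!r}")
    return Assignment(Variable(match.group(1)), Value(int(match.group(3))))


print(parse_assignment("x = 2;"))
print(parse_assignment("x = 2;") == parse_assignment("x := 2;"))  # True
```

The assignment symbol is matched and then discarded, so only the essential components (target and source) survive in the tree.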

Recursive Descent Parser

• A top-down parser that verifies the syntax of a stream of text, reading from left to right

• It contains several recursive methods, each of which implements a rule of the grammar

• More details and parsing algorithms are covered in the Compiler course
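A minimal sketch of such a parser for the Arithmetic Expression / Term / Factor grammar is given here. One assumption to note: a recursive descent method cannot call itself first without looping forever, so the left-recursive rules are rewritten in EBNF style as `expr → term { (+ | -) term }` and `term → factor { (* | /) factor }`, with the loops building left-associative trees. The names `lex` and `Parser` and the tuple representation of trees are illustrative choices.

```python
import re


def lex(text):
    """Split the input into identifiers, integer literals, operators and parentheses."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        # expr -> term { (+ | -) term }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # term -> factor { (* | /) factor }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor -> ( expr ) | identifier | literal
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


print(Parser(lex("2 * a - 3")).parse())    # ('-', ('*', '2', 'a'), '3')
print(Parser(lex("A - B - C")).parse())    # ('-', ('-', 'A', 'B'), 'C')
print(Parser(lex("(A + B) * C")).parse())  # ('*', ('+', 'A', 'B'), 'C')
```

Each method corresponds to one grammar rule, and the nesting of calls (expr calls term, term calls factor) is what gives * and / higher precedence than + and -.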

Exercises

1. Modify the following grammar to add a unary minus operator that has higher precedence than either + or *.

<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

Exercises (cont'd)

2. Consider the following grammar:

<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a

Which of the following sentences are in the language generated by this grammar?

a) baab
b) bbbab
c) bbaaaaa
d) bbaab

Exercises (cont'd)

3. Convert the following EBNF to BNF:

S → A { bA }
A → a [b]A

4. Using the grammar in question 1, add the ++ and -- unary operators of Java.

5. Using the grammar in question 1, show a parse tree and a leftmost derivation for each of the following statements:

a) A = (A+B) * C
b) A = B * (C * (A + B))

Exercises (cont'd)

6. Rewrite the BNF in question 1 to give + precedence over *, and force + to be right associative.

7. Using BNF, write a grammar for the language consisting of strings a^n b^n, where n > 0, such as ab, aabb, … .

Can you write this using regular expressions?