CSE 425: Syntax II


Upload: megan-blake

Post on 04-Jan-2016



Context Free Grammars and BNF

• In context free grammars (CFGs), structures are independent of the other structures surrounding them
• Backus-Naur form (BNF) notation describes CFGs
  – Symbols are either tokens or nonterminal symbols
  – Productions are of the form nonterminal → definition, where definition defines the structure of a nonterminal
  – Rules may be recursive, with a nonterminal symbol appearing both on the left side of a production and in its own definition
  – Metasymbols identify the parts of a production (arrow) and the alternative definitions of a nonterminal (vertical bar)
  – Next time we’ll extend metasymbols for repeated (braces) or optional (square brackets) structure in a definition (EBNF)
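As a rough illustration of the vocabulary above, a BNF grammar can be held as plain data: each nonterminal maps to its alternatives, and each alternative is a sequence of symbols. The sketch below is only one possible encoding; the type names and the helpers make_number_grammar, is_nonterminal, and is_directly_recursive are hypothetical names chosen for this example, and the grammar used is number → DIGIT | DIGIT number.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One alternative of a definition: a sequence of symbols
// (each symbol is either a token or a nonterminal).
using Alternative = std::vector<std::string>;

// A grammar maps each nonterminal (left side of a production) to its
// alternatives, i.e. the parts separated by the vertical bar in BNF.
using Grammar = std::map<std::string, std::vector<Alternative>>;

// Hypothetical helper: builds number -> DIGIT | DIGIT number.
Grammar make_number_grammar() {
    Grammar g;
    g["number"] = {Alternative{"DIGIT"}, Alternative{"DIGIT", "number"}};
    return g;
}

// A symbol is a nonterminal if it appears on the left side of a production;
// in this encoding every other symbol is treated as a token.
bool is_nonterminal(const Grammar& g, const std::string& symbol) {
    return g.count(symbol) > 0;
}

// A rule is directly recursive if the nonterminal appears in its own definition.
bool is_directly_recursive(const Grammar& g, const std::string& nt) {
    auto it = g.find(nt);
    if (it == g.end()) return false;
    for (const Alternative& alt : it->second)
        for (const std::string& sym : alt)
            if (sym == nt) return true;
    return false;
}
```

With this encoding, is_directly_recursive(make_number_grammar(), "number") is true, while DIGIT is classified as a token because it never appears on a left side.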


Parse Trees and Abstract Syntax Trees

• Parse trees show derivation of a structure from BNF
  – E.g., number → DIGIT | DIGIT number
• Abstract syntax trees (ASTs) encapsulate the details
  – Very useful for converting between structurally similar forms

[Figure: a parse tree of nested number nodes, each with a DIGIT child, over the digits 4, 2, 5; beside it, an abstract syntax tree with root hornclause and children head and body (predicate …)]
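The parse tree for number → DIGIT | DIGIT number can be built directly from the production. The following is a minimal sketch, assuming the input is a string of characters and that DIGIT means any decimal digit; NumberNode, parse_number, depth, and value are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Parse-tree node for number -> DIGIT | DIGIT number.
struct NumberNode {
    char digit;                        // the DIGIT token at this level
    std::unique_ptr<NumberNode> rest;  // nested number, or null for the DIGIT-only alternative
};

bool is_digit_at(const std::string& in, std::size_t pos) {
    return pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])) != 0;
}

// Builds the parse tree for the digits starting at pos and advances pos past them.
// Returns null if the input at pos does not start with a DIGIT.
std::unique_ptr<NumberNode> parse_number(const std::string& in, std::size_t& pos) {
    if (!is_digit_at(in, pos)) return nullptr;
    auto node = std::make_unique<NumberNode>();
    node->digit = in[pos++];
    // If another DIGIT follows, take the alternative DIGIT number (the recursive case).
    if (is_digit_at(in, pos)) node->rest = parse_number(in, pos);
    return node;
}

// Number of nested number nodes in the tree.
std::size_t depth(const NumberNode* n) {
    return n ? 1 + depth(n->rest.get()) : 0;
}

// Collapses the tree into one integer, the kind of summary an AST node
// might keep instead of the full derivation.
int value(const NumberNode* n) {
    int v = 0;
    for (; n; n = n->rest.get()) v = v * 10 + (n->digit - '0');
    return v;
}
```

For the input "425" this produces three nested number nodes, and value collapses them to 425, which hints at how an AST can hide derivation detail that the parse tree keeps.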


Ambiguity, Associativity, Precedence

• If any statement in the language has more than one distinct parse tree, the grammar is ambiguous
  – Ambiguity can be removed implicitly, as in always replacing the leftmost remaining nonterminal (an implementation hack)
• Recursive production structure also can disambiguate
  – E.g., adding another production to the grammar to establish precedence (lower in the parse tree gives higher precedence)
  – E.g., replacing exp → exp + exp with one of the alternative productions exp → exp + term or exp → term + exp
• Recursive productions also define associativity
  – I.e., the left-recursive form exp → exp + term is left-associative, and the right-recursive form exp → term + exp is right-associative
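Associativity is easiest to see with an operator where grouping changes the result. The sketch below swaps + for - (an assumption made only for illustration, since addition gives the same value either way) and assumes the input is single-digit operands separated by '-' with no whitespace and no error handling; eval_left and eval_right are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Left-associative reading, matching the left-recursive form exp -> exp - term.
// The loop folds operands from the left: "8-3-2" is read as (8-3)-2.
int eval_left(const std::string& s) {
    assert(!s.empty());
    int acc = s[0] - '0';
    for (std::size_t i = 1; i + 1 < s.size(); i += 2) {
        acc -= s[i + 1] - '0';  // s[i] is the '-' operator, s[i + 1] the next operand
    }
    return acc;
}

// Right-associative reading, matching the right-recursive form exp -> term - exp.
// The recursion groups from the right: "8-3-2" is read as 8-(3-2).
int eval_right(const std::string& s, std::size_t pos = 0) {
    assert(pos < s.size());
    int term = s[pos] - '0';
    if (pos + 2 < s.size() && s[pos + 1] == '-') {
        return term - eval_right(s, pos + 2);
    }
    return term;
}
```

On the input "8-3-2", eval_left returns 3 and eval_right returns 7, so the choice between the two recursive forms changes the meaning of the expression, not just the shape of its tree.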


Extended Backus-Naur Form (EBNF)

• Optional/repeated structure is common in programs
  – E.g., whether or not there are any arguments to a function
  – E.g., if there are arguments, how many there are
• We can extend BNF with metasymbols
  – E.g., square brackets indicate optional elements, as in the production function → name ‘(’ [args] ‘)’
  – E.g., curly braces indicate zero or more repetitions of elements, as in the production args → arg {‘,’ arg}
  – Doesn’t change the expressive power of the grammar
• A limitation of EBNF is that it obscures associativity
  – Better to use standard BNF to generate parse/syntax trees


Recursive-Descent Parsing

• Shift-reduce (bottom-up) parsing techniques are powerful, but complex to design/implement manually
  – Further details about them are in another course (CSE 431)
  – You will still want to understand how they work and use the techniques
• Recursive-descent (top-down) parsing is often more straightforward, and can be used in many cases
  – We’ll focus on these techniques somewhat in this course
• Key idea is to design (potentially recursive) parsing functions based on the productions’ right-hand sides
  – Then, work through a grammar from more general rules to more specific ones, consuming input tokens upon a match
  – EBNF helps with left recursion removal (making a loop) and left factoring (making the remainder of a parse function optional)
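The key idea can be sketched as one parsing function per nonterminal, with each EBNF repetition turned into a loop. The grammar below is an assumption for illustration (the slides only give exp → exp + term): it adds a term level with '*' to show precedence, uses single-character tokens with no whitespace, and evaluates while parsing. Parser and evaluate are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed grammar (EBNF):
//   exp    -> term { '+' term }
//   term   -> factor { '*' factor }
//   factor -> DIGIT | '(' exp ')'
class Parser {
public:
    explicit Parser(std::string input) : in_(std::move(input)) {}

    // Parses the whole input as an exp and returns its value.
    int parse() {
        int v = parse_exp();
        if (pos_ != in_.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    // Current token, or '\0' at end of input.
    char peek() const { return pos_ < in_.size() ? in_[pos_] : '\0'; }

    // Consumes the expected token or reports a syntax error.
    void match(char expected) {
        if (peek() != expected) throw std::runtime_error("syntax error");
        ++pos_;
    }

    // exp -> term { '+' term }: the left recursion becomes a loop,
    // and folding into v from the left keeps '+' left-associative.
    int parse_exp() {
        int v = parse_term();
        while (peek() == '+') { match('+'); v += parse_term(); }
        return v;
    }

    // term sits lower in the grammar than exp, so '*' binds tighter than '+'.
    int parse_term() {
        int v = parse_factor();
        while (peek() == '*') { match('*'); v *= parse_factor(); }
        return v;
    }

    // factor -> DIGIT | '(' exp ')'
    int parse_factor() {
        if (std::isdigit(static_cast<unsigned char>(peek())) != 0) {
            int v = peek() - '0';
            ++pos_;
            return v;
        }
        match('(');
        int v = parse_exp();
        match(')');
        return v;
    }

    std::string in_;
    std::size_t pos_ = 0;
};

// Convenience wrapper for one-shot use.
int evaluate(const std::string& text) { return Parser(text).parse(); }
```

With this sketch, evaluate("2+3*4") gives 14 and evaluate("(2+3)*4") gives 20. The functions are called from the most general rule (exp) down to the most specific (factor), and a token is consumed only when it matches.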


Lookahead with First and Follow Sets

• Recursive descent parsing functions are easiest to write if they only have to consider the current token
  – I.e., the head of a stream or list of input tokens
• Optional and repeated elements complicate this a bit
  – E.g., function → name ‘(’ [args] ‘)’, arg → 0 | … | 9, and args → arg {‘,’ arg}, with ( ) , and the digits 0 … 9 as terminal symbols
• But, EBNF structure helps in handling these two cases
  – First set: the set of tokens that can be first in a valid sequence, e.g., each digit in 0 | … | 9 is in the first set for arg (and for args)
  – Follow set: the set of tokens that can follow a valid sequence of tokens, e.g., ‘)’ is in the follow set for args
  – A token from the first set gives a parse function permission to start, while one from the follow set directs it to end
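The first and follow sets for the grammar above can drive a recognizer that only ever inspects the current character. This is a minimal sketch under stated assumptions: the slides do not define name, so it is assumed here to be one or more letters, and whitespace is not allowed. CallParser, count_args, and the other helper names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Grammar from the slide (name is an assumption: one or more letters):
//   function -> name '(' [args] ')'
//   args     -> arg { ',' arg }
//   arg      -> 0 | ... | 9
struct CallParser {
    std::string in;
    std::size_t pos = 0;
    int arg_count = 0;

    explicit CallParser(std::string text) : in(std::move(text)) {}

    // Current token, or '\0' at end of input.
    char peek() const { return pos < in.size() ? in[pos] : '\0'; }

    // First set of arg (and of args): the digits.
    static bool in_first_of_arg(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }
    // Follow set of args: the closing parenthesis.
    static bool in_follow_of_args(char c) { return c == ')'; }

    bool match(char c) {
        if (peek() != c) return false;
        ++pos;
        return true;
    }

    bool parse_name() {
        if (std::isalpha(static_cast<unsigned char>(peek())) == 0) return false;
        while (std::isalpha(static_cast<unsigned char>(peek())) != 0) ++pos;
        return true;
    }

    bool parse_arg() {
        if (!in_first_of_arg(peek())) return false;
        ++pos;
        ++arg_count;
        return true;
    }

    // args -> arg { ',' arg }: the braces become a loop that continues on ','
    // and the follow set confirms the list ended where it should.
    bool parse_args() {
        if (!parse_arg()) return false;
        while (peek() == ',') {
            ++pos;
            if (!parse_arg()) return false;
        }
        return in_follow_of_args(peek());
    }

    // function -> name '(' [args] ')': the brackets become an if that is
    // entered only when the current token is in the first set of args.
    bool parse_function() {
        if (!parse_name() || !match('(')) return false;
        if (in_first_of_arg(peek()) && !parse_args()) return false;
        return match(')') && pos == in.size();
    }
};

// Returns the number of arguments, or -1 if text is not a valid function call.
int count_args(const std::string& text) {
    CallParser p(text);
    return p.parse_function() ? p.arg_count : -1;
}
```

For example, count_args("f()") is 0 and count_args("max(1,2,3)") is 3, while count_args("f(1,)") is -1 because ‘)’ is not in the first set of arg.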


Today’s Studio Exercises

• We’ll code up ideas from Scott Chapter 2.3
  – Looking at more ideas and mechanisms for parsing, especially ones that are relevant to the lab assignment
• Today’s exercises are again all in C++
  – Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
  – As always, please ask us for help as needed
• When done, email your answers to the course account with “Syntax Studio II” in the subject line