Languages and Compilers (SProg og Oversættere): Parsing

Uploaded by lora-hardy on 27-Dec-2015.




Parsing

– Describe the purpose of the parser

– Discuss top-down vs. bottom-up parsing

– Explain the necessary conditions for constructing recursive descent parsers

– Discuss the construction of an RD parser from a grammar


Top-down parsing

[Figure: top-down parse of "The cat sees a rat ." The tree is grown from the root downward: Sentence is expanded to Subject Verb Object ., Subject to The Noun, Noun to cat, Verb to sees, Object to a Noun, and Noun to rat.]
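A top-down parser traces a leftmost derivation: at each step it expands the leftmost nonterminal. As a worked example, here is that derivation for the sentence above, written with the lowercase terminal the that the Micro-English grammar later in these slides uses:

```latex
\begin{aligned}
\textit{Sentence} &\Rightarrow \textit{Subject}\ \textit{Verb}\ \textit{Object}\ . \\
&\Rightarrow \texttt{the}\ \textit{Noun}\ \textit{Verb}\ \textit{Object}\ . \\
&\Rightarrow \texttt{the cat}\ \textit{Verb}\ \textit{Object}\ . \\
&\Rightarrow \texttt{the cat sees}\ \textit{Object}\ . \\
&\Rightarrow \texttt{the cat sees a}\ \textit{Noun}\ . \\
&\Rightarrow \texttt{the cat sees a rat .}
\end{aligned}
```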


Bottom-up parsing

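[Figure: bottom-up parse of "The cat sees a rat ." The tree is built from the leaves upward: cat is reduced to Noun and The Noun to Subject, sees to Verb, rat to Noun and a Noun to Object, and finally Subject Verb Object . to Sentence.]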
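A bottom-up parser instead finds a rightmost derivation in reverse: each step reduces a substring that matches some right-hand side to the nonterminal on its left. For the same sentence (again with lowercase the, as in the Micro-English grammar), the sequence of reductions is:

```latex
\begin{aligned}
& \texttt{the cat sees a rat .} \\
\Leftarrow\ & \texttt{the}\ \textit{Noun}\ \texttt{sees a rat .} \\
\Leftarrow\ & \textit{Subject}\ \texttt{sees a rat .} \\
\Leftarrow\ & \textit{Subject}\ \textit{Verb}\ \texttt{a rat .} \\
\Leftarrow\ & \textit{Subject}\ \textit{Verb}\ \texttt{a}\ \textit{Noun}\ . \\
\Leftarrow\ & \textit{Subject}\ \textit{Verb}\ \textit{Object}\ . \\
\Leftarrow\ & \textit{Sentence}
\end{aligned}
```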


Top-Down vs Bottom-Up parsing

[Figure: LL analysis (top-down) proceeds by derivation, with look-ahead; LR analysis (bottom-up) proceeds by reduction, with look-ahead.]


Development of a Recursive Descent Parser

(1) Express the grammar in EBNF

(2) Grammar transformations: left factorization and left recursion elimination

(3) Create a parser class with
– a private variable currentToken
– methods to call the scanner: accept and acceptIt

(4) Implement the parsing methods:
– a private parseN method for each nonterminal N
– a public parse method that
• gets the first token from the scanner
• calls parseS (S is the start symbol of the grammar)
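Step (2) matters because a left-recursive rule such as the hypothetical Expression ::= Expression + Digit | Digit would make parseExpression call itself before consuming any token, so it would recurse forever. Eliminating the left recursion gives the equivalent EBNF rule Expression ::= Digit (+ Digit)*, and the iteration becomes a while loop. The following is a minimal sketch under stated assumptions: the grammar and the names SumParser, parseExpression and parseDigit are hypothetical, every token is a single character, and the parser sums the digits only to make its result easy to check.

```java
// Hypothetical grammar after left-recursion elimination:
//   Expression ::= Digit ( '+' Digit )*
//   Digit      ::= '0' | '1' | ... | '9'
// Tokens are single characters; whitespace is not allowed.
class SumParser {
    private final String input;
    private int pos = 0;

    private SumParser(String input) {
        this.input = input;
    }

    // Public entry point: parse the whole input as an Expression and return its sum.
    static int parse(String input) {
        SumParser p = new SumParser(input);
        int value = p.parseExpression();
        if (p.pos != input.length()) {
            throw new IllegalArgumentException("unexpected '" + input.charAt(p.pos) + "' at " + p.pos);
        }
        return value;
    }

    // Expression ::= Digit ( '+' Digit )*   (the star becomes a while loop)
    private int parseExpression() {
        int value = parseDigit();
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos++;                  // accept '+'
            value += parseDigit();
        }
        return value;
    }

    // Digit ::= '0' | ... | '9'
    private int parseDigit() {
        if (pos < input.length()) {
            char c = input.charAt(pos);
            if (c >= '0' && c <= '9') {
                pos++;              // accept the digit
                return c - '0';
            }
        }
        throw new IllegalArgumentException("digit expected at " + pos);
    }
}
```

For example, SumParser.parse("1+2+3") returns 6, while SumParser.parse("1+") throws because a digit is missing after the last '+'.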


Recursive Descent Parsing

Sentence ::= Subject Verb Object .
Subject ::= I | a Noun | the Noun
Object ::= me | a Noun | the Noun
Noun ::= cat | mat | rat
Verb ::= like | is | see | sees

Define a procedure parseN for each non-terminal N

private void parseSentence();
private void parseSubject();
private void parseObject();
private void parseNoun();
private void parseVerb();


Recursive Descent Parsing

public class MicroEnglishParser {

    private TerminalSymbol currentTerminal;

    // Auxiliary methods will go here ...

    // Parsing methods will go here ...
}


Recursive Descent Parsing: Auxiliary Methods

public class MicroEnglishParser {

    private TerminalSymbol currentTerminal;

    private void accept(TerminalSymbol expected) {
        if (currentTerminal matches expected)
            currentTerminal = next input terminal;
        else
            report a syntax error
    }

    ...
}


Recursive Descent Parsing: Parsing Methods

Sentence ::= Subject Verb Object .

private void parseSentence() {
    parseSubject();
    parseVerb();
    parseObject();
    accept('.');
}


Recursive Descent Parsing: Parsing Methods

Subject ::= I | a Noun | the Noun

private void parseSubject() {
    if (currentTerminal matches 'I')
        accept('I');
    else if (currentTerminal matches 'a') {
        accept('a');
        parseNoun();
    } else if (currentTerminal matches 'the') {
        accept('the');
        parseNoun();
    } else
        report a syntax error
}


Recursive Descent Parsing: Parsing Methods

Noun ::= cat | mat | rat

private void parseNoun() {
    if (currentTerminal matches 'cat')
        accept('cat');
    else if (currentTerminal matches 'mat')
        accept('mat');
    else if (currentTerminal matches 'rat')
        accept('rat');
    else
        report a syntax error
}
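The auxiliary and parsing methods above can be assembled into a minimal runnable sketch. Several details here are assumptions rather than part of the slides: terminals are plain String values, the "scanner" is just a whitespace split of the input, end of input is represented by null, and syntax errors are reported by throwing a hypothetical SyntaxError exception. parseObject and parseVerb are not shown on the slides; they follow the same pattern. Terminals are case-sensitive, so "The" is rejected where the grammar expects "the".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical exception type used to report syntax errors.
class SyntaxError extends RuntimeException {
    SyntaxError(String message) {
        super(message);
    }
}

class MicroEnglishParser {
    private final List<String> terminals;   // stand-in for a real scanner
    private int index = 0;
    private String currentTerminal;         // null means end of input

    MicroEnglishParser(String sentence) {
        // Assumption: terminals are separated by whitespace, e.g. "the cat sees a rat ."
        this.terminals = Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Public parse method: get the first terminal, parse the start symbol, check end of input.
    public void parse() {
        currentTerminal = nextTerminal();
        parseSentence();
        if (currentTerminal != null) {
            throw new SyntaxError("unexpected '" + currentTerminal + "' after the sentence");
        }
    }

    // ---- Auxiliary methods ----
    private String nextTerminal() {
        return index < terminals.size() ? terminals.get(index++) : null;
    }

    private boolean matches(String t) {
        return t.equals(currentTerminal);
    }

    // accept: consume the current terminal only if it is the expected one.
    private void accept(String expected) {
        if (matches(expected)) currentTerminal = nextTerminal();
        else throw new SyntaxError("expected '" + expected + "' but found '" + currentTerminal + "'");
    }

    // acceptIt: consume the current terminal unconditionally (the caller has already checked it).
    private void acceptIt() {
        currentTerminal = nextTerminal();
    }

    // ---- Parsing methods ----
    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (matches("I")) acceptIt();
        else if (matches("a") || matches("the")) { acceptIt(); parseNoun(); }
        else throw new SyntaxError("subject expected, found '" + currentTerminal + "'");
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        if (matches("me")) acceptIt();
        else if (matches("a") || matches("the")) { acceptIt(); parseNoun(); }
        else throw new SyntaxError("object expected, found '" + currentTerminal + "'");
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        if (matches("cat") || matches("mat") || matches("rat")) acceptIt();
        else throw new SyntaxError("noun expected, found '" + currentTerminal + "'");
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        if (matches("like") || matches("is") || matches("see") || matches("sees")) acceptIt();
        else throw new SyntaxError("verb expected, found '" + currentTerminal + "'");
    }
}
```

Usage: new MicroEnglishParser("the cat sees a rat .").parse() returns normally, whereas "the cat sees rat ." throws a SyntaxError in parseObject because the article is missing.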


LL(1) Grammars

• The presented algorithm for converting EBNF into a parser does not work for all possible grammars.

• It only works for so-called “LL(1)” grammars.

• Basically, an LL(1) grammar is a grammar that can be parsed by a top-down parser with a lookahead of one token (in the input stream of tokens).

• What grammars are LL(1)?

How can we recognize that a grammar is (or is not) LL(1)? We can deduce the necessary conditions from the parser generation algorithm, or we can use a formal definition.


LL(1) Grammars

parse X*

    while (currentToken.kind is in starters[X]) {
        parse X
    }

Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*.

parse X|Y

    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            report syntax error
    }

Condition: starters[X] and starters[Y] must be disjoint sets.
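Both templates can be seen in a small Java sketch. The grammar below is hypothetical, chosen only so that one rule uses X* and another uses X|Y: Phrases ::= Subject* . with Subject and Noun as in Micro-English. Here starters[Subject] = {I, a, the}, which is disjoint from {.}, the only token that can follow Subject*, and the alternatives of Subject start with pairwise-disjoint terminals, so both conditions hold. The class and method names are illustrative, and tokens are assumed to be whitespace-separated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical grammar used only to illustrate the two templates:
//   Phrases ::= Subject* .
//   Subject ::= I | a Noun | the Noun
//   Noun    ::= cat | mat | rat
class TemplateDemo {
    // starters[Subject]; disjoint from {"."}, the tokens that can follow Subject*
    private static final Set<String> STARTERS_SUBJECT = Set.of("I", "a", "the");

    private final List<String> tokens;
    private int pos = 0;

    TemplateDemo(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private String current() {
        return pos < tokens.size() ? tokens.get(pos) : "<eof>";
    }

    private void accept(String expected) {
        if (!current().equals(expected)) {
            throw new IllegalStateException("expected '" + expected + "' but found '" + current() + "'");
        }
        pos++;
    }

    // Phrases ::= Subject* .   Returns the number of subjects parsed.
    int parsePhrases() {
        int count = 0;
        while (STARTERS_SUBJECT.contains(current())) {   // template for X*
            parseSubject();
            count++;
        }
        accept(".");
        if (pos != tokens.size()) throw new IllegalStateException("unexpected '" + current() + "'");
        return count;
    }

    // Subject ::= I | a Noun | the Noun   (template for X | Y: one case group per alternative)
    private void parseSubject() {
        switch (current()) {
            case "I":
                pos++;                                   // acceptIt
                break;
            case "a":
            case "the":
                pos++;                                   // acceptIt
                parseNoun();
                break;
            default:
                throw new IllegalStateException("subject expected but found '" + current() + "'");
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (current()) {
            case "cat": case "mat": case "rat":
                pos++;
                break;
            default:
                throw new IllegalStateException("noun expected but found '" + current() + "'");
        }
    }
}
```

For instance, new TemplateDemo("I a cat the rat .").parsePhrases() returns 3, and "." alone yields 0 because the while loop stops immediately on a token outside starters[Subject].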


Formal definition of LL(1)

A grammar G is LL(1) iff for each set of productions M ::= X1 | X2 | … | Xn:

1. starters[X1], starters[X2], …, starters[Xn] are all pairwise disjoint;

2. if Xi ⇒* ε, then starters[Xj] ∩ follow[M] = Ø for 1 ≤ j ≤ n, j ≠ i.

If G is ε-free, condition 1 is sufficient.
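Since Micro-English is ε-free, checking condition 1 is enough for it. The sketch below does that check mechanically. Assumptions: the grammar is a hypothetical map from each nonterminal to its alternatives (lists of symbols), a symbol counts as a nonterminal exactly when it is a key of that map, and no alternative is empty, so the starters of an alternative are simply the starters of its first symbol. Left recursion is reported as an error, since it must be removed first (step 2 of the development recipe).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Checks condition 1 of the LL(1) definition for an epsilon-free BNF grammar.
class LL1Check {

    // starters of one symbol: {t} for a terminal, the union over all alternatives for a nonterminal.
    static Set<String> starters(Map<String, List<List<String>>> g, String symbol, Set<String> visiting) {
        if (!g.containsKey(symbol)) return Set.of(symbol);          // terminal
        if (!visiting.add(symbol)) {
            throw new IllegalArgumentException("left recursion through " + symbol);
        }
        Set<String> result = new HashSet<>();
        for (List<String> alt : g.get(symbol)) {
            result.addAll(starters(g, alt.get(0), visiting));       // epsilon-free: first symbol decides
        }
        visiting.remove(symbol);
        return result;
    }

    // True iff, for every rule M ::= X1 | ... | Xn, starters[X1..Xn] are pairwise disjoint.
    static boolean isLL1(Map<String, List<List<String>>> g) {
        for (List<List<String>> alts : g.values()) {
            Set<String> seen = new HashSet<>();
            for (List<String> alt : alts) {
                for (String t : starters(g, alt.get(0), new HashSet<>())) {
                    if (!seen.add(t)) return false;                 // t starts two alternatives
                }
            }
        }
        return true;
    }

    // The Micro-English grammar from the slides, in this representation.
    static Map<String, List<List<String>>> microEnglish() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("Sentence", List.of(List.of("Subject", "Verb", "Object", ".")));
        g.put("Subject", List.of(List.of("I"), List.of("a", "Noun"), List.of("the", "Noun")));
        g.put("Object", List.of(List.of("me"), List.of("a", "Noun"), List.of("the", "Noun")));
        g.put("Noun", List.of(List.of("cat"), List.of("mat"), List.of("rat")));
        g.put("Verb", List.of(List.of("like"), List.of("is"), List.of("see"), List.of("sees")));
        return g;
    }
}
```

LL1Check.isLL1(LL1Check.microEnglish()) yields true. Changing Subject to I | a Noun | a Verb makes it false, because two alternatives then share the starter a.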


Converting EBNF into RD parsers

• The conversion of an EBNF specification into a Java implementation of a recursive descent parser is so “mechanical” that it can easily be automated!

=> JavaCC (“Java Compiler Compiler”)
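To see why the conversion is mechanical, here is a toy generator that turns one BNF rule into the text of a parseN method in the same pseudocode style as the parsing methods above. It is not JavaCC and makes no claim about how JavaCC works internally. Simplifying assumptions: the set of nonterminal names is passed in explicitly, and in a rule with several alternatives each alternative begins with a terminal, so the choice is made by comparing currentTerminal with that terminal.

```java
import java.util.List;
import java.util.Set;

// Toy code generator: emits the text of a parseN method for one BNF rule.
class RdGenerator {

    static String generate(String lhs, List<List<String>> alternatives, Set<String> nonterminals) {
        StringBuilder sb = new StringBuilder();
        sb.append("private void parse").append(lhs).append("() {\n");
        if (alternatives.size() == 1) {
            appendBody(sb, alternatives.get(0), nonterminals, "    ");
        } else {
            for (int i = 0; i < alternatives.size(); i++) {
                List<String> alt = alternatives.get(i);
                // Assumption: the first symbol of each alternative is a terminal that selects it.
                sb.append(i == 0 ? "    if" : "    } else if")
                  .append(" (currentTerminal matches '").append(alt.get(0)).append("') {\n");
                appendBody(sb, alt, nonterminals, "        ");
            }
            sb.append("    } else {\n        report a syntax error\n    }\n");
        }
        return sb.append("}\n").toString();
    }

    // One statement per symbol: parseN() for a nonterminal, accept('t') for a terminal.
    private static void appendBody(StringBuilder sb, List<String> symbols,
                                   Set<String> nonterminals, String indent) {
        for (String s : symbols) {
            sb.append(indent)
              .append(nonterminals.contains(s) ? "parse" + s + "();" : "accept('" + s + "');")
              .append('\n');
        }
    }
}
```

Given the rule Sentence ::= Subject Verb Object . and the five Micro-English nonterminals, the generator reproduces parseSentence exactly as written earlier; for Subject it produces the if/else-if chain of parseSubject.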


JavaCC and JJTree


LR parsing

– The algorithm makes use of a stack.

– The first item on the stack is the initial state of a DFA

– A state of the automaton is a set of LR(0)/LR(1) items.

– The initial state is constructed from items of the form S ::= • α [, $] (where S is the start symbol of the CFG, S ::= α is one of its productions, and the look-ahead part [, $] is present only in LR(1) items)

– The stack contains, in alternating order:

• A DFA state

• A terminal symbol or part (subtree) of the parse tree being constructed

– The items on the stack are related by transitions of the DFA

– There are two basic actions in the algorithm:

• shift: get next input token

• reduce: build a new node (remove children from stack)
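The shift/reduce loop described above can be sketched for a tiny hypothetical grammar, S ::= ( S ) | x, with single-character tokens. The LR(0) automaton was constructed by hand (its item sets are listed in the comments) rather than by a generator, and the stack is kept as two parallel stacks (DFA states, and terminals or subtrees) instead of one alternating stack, which carries the same information. Because the automaton is LR(0), states 3 and 5 reduce without looking at the next token.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-built LR(0) parser for a hypothetical grammar:
//   (0) S' ::= S $     (1) S ::= ( S )     (2) S ::= x
// DFA states (sets of LR(0) items):
//   0: S'::=.S$  S::=.(S)  S::=.x      2: S::=(.S)  S::=.(S)  S::=.x
//   1: S'::=S.$                        3: S::=x.
//   4: S::=(S.)                        5: S::=(S).
class TinyLR {

    // GOTO table for the nonterminal S.
    private static int gotoS(int state) {
        if (state == 0) return 1;
        if (state == 2) return 4;
        throw new IllegalStateException("no goto on S from state " + state);
    }

    // Returns the parse tree as a string such as "S[(S[x])]"; throws on a syntax error.
    static String parse(String input) {
        String in = input + "$";                      // '$' marks the end of the input
        Deque<Integer> states = new ArrayDeque<>();   // DFA states
        Deque<String> nodes = new ArrayDeque<>();     // terminals and subtrees, one per non-initial state
        states.push(0);
        int pos = 0;
        while (true) {
            int state = states.peek();
            char tok = in.charAt(pos);
            if (state == 0 || state == 2) {           // shift '(' or 'x'
                if (tok == '(') states.push(2);
                else if (tok == 'x') states.push(3);
                else throw error(tok, pos);
                nodes.push(String.valueOf(tok));
                pos++;
            } else if (state == 4) {                  // shift ')'
                if (tok != ')') throw error(tok, pos);
                states.push(5);
                nodes.push(")");
                pos++;
            } else if (state == 3) {                  // reduce S ::= x
                states.pop();
                nodes.pop();
                nodes.push("S[x]");
                states.push(gotoS(states.peek()));
            } else if (state == 5) {                  // reduce S ::= ( S ): remove three children
                states.pop();
                states.pop();
                states.pop();
                nodes.pop();                          // ")"
                String inner = nodes.pop();           // subtree for the inner S
                nodes.pop();                          // "("
                nodes.push("S[(" + inner + ")]");
                states.push(gotoS(states.peek()));
            } else {                                  // state 1: accept only at end of input
                if (tok != '$') throw error(tok, pos);
                return nodes.pop();
            }
        }
    }

    private static IllegalArgumentException error(char tok, int pos) {
        return new IllegalArgumentException("unexpected '" + tok + "' at position " + pos);
    }
}
```

For example, TinyLR.parse("((x))") builds the tree bottom-up and returns "S[(S[(S[x])])]", while "(x" is rejected in state 4 because ')' is missing.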


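JavaCUP: An LALR generator for Java

[Figure: structure of the syntactic analyzer. A BNF-like grammar specification is processed by JavaCUP into a Java file containing the parser class, which parses a stream of tokens and uses the scanner to get them. The definition of tokens, given as regular expressions, is processed by JFlex into a Java file containing the scanner class, which recognizes tokens.]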


Steps to build a compiler with SableCC

1. Create a SableCC specification file

2. Call SableCC

3. Create one or more working classes, possibly inherited from classes generated by SableCC

4. Create a Main class activating lexer, parser and working classes

5. Compile with Javac


Hierarchy