grammars and parsing for ssc1 - university of birminghamhxt/teaching/ssc1/parsing08.pdf · yacc,...

Grammars and Parsing for SSC1 Hayo Thielecke Introduction Grammars Derivations Parse trees abstractly From grammars to code Translation to Java methods Parse trees in Java Parser generators Yacc, ANTLR, JavaCC and SableCC Summary Grammars and Parsing for SSC1 Hayo Thielecke November 2008



Grammars and Parsing for SSC1

Hayo Thielecke

November 2008


Outline of the parsing part of the module

1 Introduction

2 Grammars
    Derivations
    Parse trees abstractly

3 From grammars to code
    Translation to Java methods
    Parse trees in Java

4 Parser generators
    Yacc, ANTLR, JavaCC and SableCC

5 Summary


Motivation: a challenge

Write a program that reads strings like this and evaluates them:

(2+3*(2-3-4))*2

In particular, brackets and precedence must be handled correctly (* binds more tightly than +).

If you attempt brute-force hacking, you may end up with something that is inefficient, incorrect, or both. Try it!

With parsing, that sort of problem is straightforward.

Moreover, the techniques scale up to more realistic problems.


Books

I hope these slides are detailed enough, but if you want to dig deeper:
There are lots of books on compilers.
The ones which I know best are:

Appel, Modern Compiler Implementation in Java.

Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, nicknamed “The Dragon Book”.

Parsing is covered in both, but the Composite Pattern for trees and Visitors for tree walking only in Appel.
See also the websites for ANTLR (http://antlr.org) and SableCC (http://sablecc.org).


Why do you need to learn about grammars?

Grammars are widespread in programming.

XML is based on grammars and needs to be parsed.

Knowing grammars makes it much easier to learn the syntax of a new programming language.

Powerful tools exist for parsing (e.g., yacc, bison, ANTLR, SableCC, . . . ). But you have to understand grammars to use them.

Grammars give us examples of some more advanced object-oriented programming: Composite Pattern and polymorphic methods.

You may need parsing in your final-year project for reading complex input.


Describing syntax

One often has to describe syntax precisely, particularly when dealing with computers.

“A block consists of a sequence of statements enclosed in curly brackets”

Informal English descriptions are too clumsy and not precise enough. Rather, we need precise rules, something like Block → . . ..

Such precise grammar rules exist for all programming languages. See for example http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html.


Example from programming language syntax

Some rules for statements S in Java or C:

S → if (E ) S else S

S → while (E ) S

S → V = E;

S → { B }

B → S B

B →

Here V is for variables and E for expressions.

E → E - 1

E → ( E )

E → 1

E → E == 0

V → foo


Nesting in syntax

Specifically, we need to be very careful about bracketing and nesting. Compare:

while(i < n)
    a[i] = 0;
    i = i + 1;

and

while(i < n)
{
    a[i] = 0;
    i = i + 1;
}

These snippets look very similar. But their difference is clear in the parse tree: in the first, only a[i] = 0; belongs to the loop body; in the second, both assignments do.


What do we need in a grammar

Some symbols can occur in the actual syntax, like while or cat.

We also need other symbols that act like variables (for nouns in English or statements in Java, say).

Rules then say how to replace these symbols, e.g. a noun can be cat or mat.


Grammars: formal definition

A context-free grammar consists of

some terminal symbols a, b, . . . , +, ),. . .

some non-terminal symbols A, B, S ,. . .

a distinguished non-terminal start symbol S

some rules of the form

A → X1 . . . Xn

where n ≥ 0, A is a non-terminal, and the Xi are symbols.
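To make the definition concrete, a grammar can be written down directly as data. The following is only a sketch of one possible encoding: the class names Rule and Grammar, the helper blocks(), and the convention that every symbol is a single character (upper-case for non-terminals) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoding: every symbol is one character; upper-case
// letters are non-terminals, everything else is a terminal.
class Rule {
    final char lhs;     // the non-terminal A
    final String rhs;   // the symbols X1 ... Xn (empty when n = 0)

    Rule(char lhs, String rhs) {
        this.lhs = lhs;
        this.rhs = rhs;
    }

    @Override
    public String toString() {
        return lhs + " -> " + (rhs.isEmpty() ? "(empty)" : rhs);
    }
}

class Grammar {
    final char start;                            // distinguished start symbol
    final List<Rule> rules = new ArrayList<>();  // the rules, in order

    Grammar(char start) {
        this.start = start;
    }

    // Adds a rule and returns this grammar so that calls can be chained.
    Grammar rule(char lhs, String rhs) {
        rules.add(new Rule(lhs, rhs));
        return this;
    }

    static boolean isNonTerminal(char symbol) {
        return Character.isUpperCase(symbol);
    }

    // The block rules S -> { B }, B -> S B, B -> (empty) as a Grammar value.
    static Grammar blocks() {
        return new Grammar('S').rule('S', "{B}").rule('B', "SB").rule('B', "");
    }
}
```

The rule with the empty right-hand side corresponds to the case n = 0 in the definition.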


Notation: Greek letters

Mathematicians and computer scientists are inordinately fond of Greek letters.

α alpha

β beta

γ gamma

ε epsilon


Notational conventions for grammars

We will use Greek letters α, β, . . . , to stand for strings of symbols that may contain both terminals and non-terminals.
In particular, ε is used for the empty string (of length 0).
We will write A, B, . . . for non-terminals.
Terminal symbols are usually written in typewriter font, like for, while, [.
These conventions are handy once you get used to them and are found in most books.


Some abbreviations in grammars (BNF)

A rule with an alternative, written as a vertical bar |,

A → α | β

is the same as having two rules for the same non-terminal:

A → α

A → β

There is also some shorthand notation for repetitions:
α∗ stands for zero or more occurrences of α, and
α+ stands for one or more occurrences of α.
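The repetition shorthands add no expressive power: they can be expanded into ordinary rules using a fresh non-terminal. In the sketch below, R and P are names invented for this illustration.

```latex
% R generates \alpha^{*} (zero or more copies of \alpha)
R \to \varepsilon \mid \alpha\, R
% P generates \alpha^{+} (one or more copies of \alpha)
P \to \alpha\, R
```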


Derivations

If A → α is a rule, we can replace A by α for any strings β and γ on the left and right:

β A γ ⇒ β α γ

This is one derivation step.

A string w consisting only of terminal symbols is generated by the grammar if there is a sequence of derivation steps leading to it from the start symbol S:

S ⇒ · · · ⇒ w


An example derivation

Recall the rules

S → { B }

B → S B

B →

Always replacing the leftmost non-terminal symbol, we have:

S ⇒ { B }

⇒ { S B }

⇒ { { B } B }

⇒ { { } B }

⇒ { { } S B }

⇒ { { } { B } B }

⇒ { { } { } B }

⇒ { { } { } }
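The derivation above can be replayed mechanically. The following sketch assumes sentential forms are plain strings without blanks and that upper-case letters are the non-terminals; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class LeftmostDerivation {
    // One derivation step: replace the leftmost non-terminal (an upper-case
    // letter in this encoding) by the given right-hand side.
    static String step(String form, char lhs, String rhs) {
        for (int i = 0; i < form.length(); i++) {
            char c = form.charAt(i);
            if (Character.isUpperCase(c)) {
                if (c != lhs) {
                    throw new IllegalArgumentException(
                        "leftmost non-terminal is " + c + ", not " + lhs);
                }
                return form.substring(0, i) + rhs + form.substring(i + 1);
            }
        }
        throw new IllegalArgumentException("no non-terminal left in " + form);
    }

    // Replays the derivation of { { } { } } and returns every sentential form.
    static List<String> example() {
        String[][] steps = {
            {"S", "{B}"}, {"B", "SB"}, {"S", "{B}"}, {"B", ""},
            {"B", "SB"}, {"S", "{B}"}, {"B", ""}, {"B", ""}
        };
        List<String> forms = new ArrayList<>();
        String form = "S";
        forms.add(form);
        for (String[] s : steps) {
            form = step(form, s[0].charAt(0), s[1]);
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        for (String form : example()) {
            System.out.println(form);
        }
    }
}
```

Each call to step corresponds to one ⇒ in the derivation; the eight steps produce nine sentential forms, ending in a string of terminals only.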


The language of a grammar

In this context, a language is a set of strings (of terminal symbols).

For a given grammar, the language of that grammar consists of all strings of terminals that can be derived from the start symbol.

For a useful grammar, there are usually infinitely many strings in its language (e.g., all Java programs).

Two different grammars can define the same language. Sometimes we may redesign the grammar as long as the language remains the same.


Grammars and brackets

Grammars are good at expressing various forms of bracketing structure.
In XML: an element begins with <tag> and ends with </tag>.
The most typical context-free language is the Dyck language of balanced brackets:

D →

D → D D

D → ( D )

D → [ D ]

(This makes grammars more powerful than Regular Expressions.)
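As an illustration, here is a small recognizer for balanced brackets. It is a sketch, not the grammar above verbatim: the rule D → D D is awkward to turn into a recursive method directly, so the code uses the rules D → ( D ) D | [ D ] D | ε instead, which generate the same language. The class name Dyck and its methods are invented for this example.

```java
class Dyck {
    private final String input;
    private int pos = 0;

    private Dyck(String input) {
        this.input = input;
    }

    // D -> ( D ) D | [ D ] D | (empty)
    private boolean parseD() {
        if (pos < input.length()) {
            char open = input.charAt(pos);
            if (open == '(' || open == '[') {
                char close = (open == '(') ? ')' : ']';
                pos++;                            // consume the opening bracket
                if (!parseD()) {
                    return false;                 // error inside the brackets
                }
                if (pos >= input.length() || input.charAt(pos) != close) {
                    return false;                 // wrong or missing closing bracket
                }
                pos++;                            // consume the closing bracket
                return parseD();                  // whatever follows the pair
            }
        }
        return true;                              // empty rule: consume nothing
    }

    // True if the whole string consists of correctly nested brackets.
    static boolean isBalanced(String s) {
        Dyck d = new Dyck(s);
        return d.parseD() && d.pos == s.length();
    }
}
```

For instance, isBalanced("([])[]") holds, whereas "([)]" and "((" are rejected.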


Matching brackets

Note that the brackets have to match in the rule

D → [ D ]

This is different from “any number of [ and then any number of ]”.
For comparison, we could have a different grammar in which the brackets do not have to match:

D → O D C

O → [ O

O → ε

C → C ]

C → ε
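The language “any number of [ and then any number of ]” needs no counting, so a regular expression is enough to describe it. The sketch below assumes that the D rule of the comparison grammar can terminate (for instance via D → ε), so that the grammar generates exactly this language; the class name is invented.

```java
import java.util.regex.Pattern;

class UnmatchedBrackets {
    // "Any number of [ followed by any number of ]" as a regular expression.
    private static final Pattern LOOSE = Pattern.compile("\\[*\\]*");

    static boolean accepts(String s) {
        return LOOSE.matcher(s).matches();
    }
}
```

Strings such as "[[]" and "[]]]" are accepted here although their brackets do not match, which is exactly what the rule D → [ D ] rules out.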


Another pitfall in reading rules

Recall this rule:

D → D D

Note that the repetition D D does not mean that the same string has to be repeated. It means any string derived from a D followed by any string derived from a D. They may be the same, but need not.

D ⇒ D D

. . . ⇒ [ ] D

. . . ⇒ [ ] [ [ ] ]


Recursion in grammars

A symbol may occur on the right-hand side of one of its own rules:

E → (E )

We often have mutual recursion in grammars:

S → { B }

B → S B

Mutual recursion also exists in Java: for example, method f calls g and g calls f.
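A minimal sketch of mutual recursion between two Java methods, using invented names isEven and isOdd in place of f and g:

```java
class MutualRecursion {
    // isEven calls isOdd and isOdd calls isEven, just as S refers to B
    // and B refers to S in the grammar. Both assume n >= 0.
    static boolean isEven(int n) {
        return n == 0 || isOdd(n - 1);
    }

    static boolean isOdd(int n) {
        return n != 0 && isEven(n - 1);
    }
}
```

As in the grammar, the recursion only ends because one alternative (here n == 0) does not refer back to the other method.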


Recursion in grammars and in Java

Compare recursion in grammars, methods and classes:

T → . . . T . . . T . . .

int sumTree(Tree t) {
    ...
    ... return sumTree(t.left) + sumTree(t.right);
}

and classes

public class Tree {
    ...
    public Tree left;
    public Tree right;
}


Parse trees abstractly

The internal nodes are labelled with nonterminals.
If there is a rule A → X1 . . . Xn, then an internal node can have the label A and children X1, . . . , Xn.
The root node of the whole tree is labelled with the start symbol.
The leaf nodes are labelled with terminal symbols or ε.

[Figure: schematic parse tree. The root is labelled with the start symbol, internal nodes with non-terminals such as A and B, and the leaves with terminals a, . . . , z.]


Example: parse trees

We define a grammar such that the parse trees are binary trees:

B → 1 | 2 | . . .

B → B B

Here are two parse trees (for the strings “1” and “1 1 2”):

    B            B
    |           / \
    1          B   B
               |  / \
               1 B   B
                 |   |
                 1   2


Parse trees and derivations

Parse trees and derivations are equivalent; they contain the same information.

For each derivation of a word, there is a parse tree for the word.
(Idea: each step using A → α tells us that the children of some A-labelled node are labelled with the symbols in α.)

For each parse tree, there is a (unique leftmost) derivation.
(Idea: walk over the tree in depth-first order; each internal node gives us a rule.)
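The second idea can be sketched in code: starting from the root, repeatedly replace the leftmost internal node by its children and record each sentential form. The class ParseNode and its methods are assumptions made for this illustration, not code from the module.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParseNode {
    final String label;
    final List<ParseNode> children;   // null for a leaf (terminal symbol)

    private ParseNode(String label, List<ParseNode> children) {
        this.label = label;
        this.children = children;
    }

    static ParseNode leaf(String terminal) {
        return new ParseNode(terminal, null);
    }

    // Internal node for a rule nonTerminal -> children (possibly none).
    static ParseNode node(String nonTerminal, ParseNode... children) {
        return new ParseNode(nonTerminal, Arrays.asList(children));
    }

    // Returns the sentential forms of the leftmost derivation for this tree.
    static List<String> leftmostDerivation(ParseNode root) {
        List<ParseNode> form = new ArrayList<>();
        form.add(root);
        List<String> result = new ArrayList<>();
        result.add(show(form));
        while (true) {
            int i = 0;
            while (i < form.size() && form.get(i).children == null) {
                i++;                              // skip terminals on the left
            }
            if (i == form.size()) {
                return result;                    // only terminals are left
            }
            ParseNode n = form.remove(i);         // leftmost non-terminal
            form.addAll(i, n.children);           // apply the rule stored in the tree
            result.add(show(form));
        }
    }

    private static String show(List<ParseNode> form) {
        StringBuilder sb = new StringBuilder();
        for (ParseNode n : form) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.label);
        }
        return sb.toString();
    }
}
```

For the parse tree of "1 1 2" in the grammar B → 1 | 2 | B B, this yields the forms B, B B, 1 B, 1 B B, 1 1 B and 1 1 2.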


Automata and grammars: what we are not going to do

Pushdown automata (stack machines) are covered in Models of Computation.
The use of a stack for parsing will be covered in more detail in Compilers and Languages.
Independently of formal automata models, we can use a programming perspective: grammars give us classes or methods.
For experts: we let Java manage the stack for us (as its call stack).


Overview: from grammars to Java code

Methods for processing the language follow the structure of the grammar:
Each non-terminal gives a method (with mutual recursion between such methods).
Each grammar gives us a Java class hierarchy (with mutual recursion between such classes).
Each word in the language of the grammar gives us an object of the class corresponding to the start symbol (a parse tree).


Recursive methods

From grammars to mutually recursive methods:

For each non-terminal A there is a method A. The method body is a switch statement that chooses a rule for A.

For each rule A → X1 . . . Xn, there is a branch in the switch statement. There are method calls for all the non-terminals among X1, . . . , Xn.

Each grammar gives us some recursive methods.
For each derivation in the language, we have a sequence of method calls.


Reminder: the switch statement in Java

Assume test() returns an integer. We can use switch to branch on which integer is returned:

switch(test()) {
    case 1:
        // test() is 1
        break;   // we are done with this case
    case 2: case 3:
        // test() is 2 or 3
        break;
    default:
        // test() had some other value
        break;
}


Switch and if

We could write the same more clumsily with if and else:

int temp = test();
if (temp == 1) {
    // test() returned 1
}
else if (temp == 2 || temp == 3) {
    // test() returned 2 or 3
}
else {
    // test() returned some other value
}


Grammar of switch

We extend the grammar for statements S with switch statements:

S → switch (E) { B }

B → C B | ε

C → L T

L → K | default:

K → case V : | case V : K

T → break; | return E; | S T

Here E can be any expression and V any constant value.


Example: a Java method for a grammar rule

The grammar rule for while-loops S → while (E ) S gives us this Java code:

public static void S() {          // method for S
    switch(...) {                 // which rule for S?
        case ...:                 // if this one:
            ... "while (" ...     // process some text
            E();                  // call method for E
            ... ")" ...           // process some text
            S();                  // call method for S
            break;                // finished with this rule
        case ...:                 // other rules
    }
}


Recognizing or parsing

Abstractly, a parser reads a string and constructs a parse tree for that string if there is one; otherwise, the parser reports an error. Deciding whether such a tree exists is equivalent to recognizing the string.

The parse tree need not be constructed as an actual data structure in memory, since that can consume a lot of storage. It is enough if the parser provides enough information to construct the parse tree in principle, using events or parsing actions.

Example: the Xerces XML parser can construct a parse tree (as a DOM tree); SAX does not, it only reports events.
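To illustrate parsing with events instead of a tree, the following sketch uses the SAX API from the Java standard library (javax.xml.parsers and org.xml.sax) to count the elements of an XML document. No tree is ever built; the parser merely calls startElement for each opening tag. The class name ElementCounter and the helper countElements are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    private int count = 0;

    // Called by the SAX parser each time it sees the start of an element.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        count++;
    }

    // Parses the given XML text and returns the number of elements in it.
    static int countElements(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
            return handler.count;
        } catch (Exception e) {
            // Malformed input or parser set-up problems end up here.
            throw new IllegalStateException("could not parse input", e);
        }
    }
}
```

The only state kept is a single counter, however large the document is.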


Recognizing a grammar

There are many different parsing technologies (LL(k), LR(1), LALR(1), . . . ).

Here we consider only predictive parsers, sometimes also called recursive descent parsers. They correspond to translating grammar rules into code as described above.

The hard part is choosing the rules according to the lookahead.


Lookahead example

S → if (E ) S else S

S → while (E ) S

S → V = E;

S → { B }

B → S B

B →

If we see if in the input, we choose the first rule; if we see while, the second.
Less obvious: how to choose between the last two rules.


An easy grammar to parse

Many constructs start with a keyword telling us immediately what it is.
Some languages are easy to parse with lookahead, e.g. Lisp and Scheme. Idea: everything is a prefix operator.

E → (+ E E)

E → (* E E)

E → 1

Obvious with 2 symbols of lookahead. Can do with 1 by tweaking the grammar.
C and Java are not so easy to parse (and C++ is worse). Any type name can begin a function definition, for instance.
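One possible tweak, sketched here under assumptions, is to factor out the operator: E → ( O E E ) | 1 and O → + | *, so that a single symbol of lookahead suffices. The non-terminal O, the class name and the fact that the code also computes a value are all additions made for this illustration.

```java
class PrefixEvaluator {
    private final String input;
    private int pos = 0;

    private PrefixEvaluator(String input) {
        this.input = input.replace(" ", "");   // blanks carry no meaning here
    }

    // Next symbol without removing it; '\0' stands for end of input.
    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void match(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("expected " + c + " at position " + pos);
        }
        pos++;
    }

    // E -> ( O E E ) | 1
    private int parseE() {
        switch (lookahead()) {
            case '(': {
                match('(');
                char op = parseO();
                int left = parseE();
                int right = parseE();
                match(')');
                return op == '+' ? left + right : left * right;
            }
            case '1':
                match('1');
                return 1;
            default:
                throw new IllegalArgumentException("unexpected input at position " + pos);
        }
    }

    // O -> + | *
    private char parseO() {
        char op = lookahead();
        if (op != '+' && op != '*') {
            throw new IllegalArgumentException("expected + or * at position " + pos);
        }
        match(op);
        return op;
    }

    // Evaluates a complete expression; trailing input is an error.
    static int eval(String s) {
        PrefixEvaluator p = new PrefixEvaluator(s);
        int value = p.parseE();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("trailing input at position " + p.pos);
        }
        return value;
    }
}
```

For example, eval("(* (+ 1 1) (+ 1 (+ 1 1)))") multiplies 2 by 3.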


Predictive parser

A predictive parser can be constructed from grammar rules.

The parser is allowed to “look ahead” in the input; based on what it sees there, it then makes predictions.

Canonical example: matching brackets.
If the parser sees a [ as the next input symbol, it “predicts” that the input contains something in brackets.

More technically: switch on the lookahead; [ labels one case.


The lookahead and match methods

A predictive parser relies on two methods for accessing the input string:

char lookahead() returns the next symbol in the input, without removing it.

void match(char c) compares the next symbol in the input to c. If they are the same, the symbol is removed from the input. Otherwise, the parsing is stopped with an error; in Java, this can be done by throwing an exception.
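A minimal sketch of these two methods over a string held in memory. The class name CharInput, the helper atEnd(), the use of '\0' for end of input and the choice of IllegalStateException as the error are assumptions made for this example.

```java
class CharInput {
    private final String text;
    private int pos = 0;   // index of the next unread symbol

    CharInput(String text) {
        this.text = text;
    }

    // Returns the next symbol without removing it; '\0' signals end of input.
    char lookahead() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    // Removes the next symbol if it equals c; otherwise stops with an error.
    void match(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalStateException(
                "expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    // True once every symbol has been matched.
    boolean atEnd() {
        return pos == text.length();
    }
}
```

Calling lookahead() repeatedly returns the same symbol each time; only match moves forward in the input.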


Simplified view of parsing

In the real world, lookahead and match are calls to the lexical analyzer, and they return tokens, not characters.

There are efficiency issues of buffering the input file, etc.

We ignore all that to keep the parser as simple as possible. (Only single-letter keywords.)

But this setting is sufficient to demonstrate the principles.


Parsing with lookahead

Parsing with lookahead is easy if every rule for a given non-terminal starts with a different terminal symbol:

S → [ S ]

S → +

Idea: suppose you are trying to parse an S. Look at the first symbol in the input:
if it is a [, use the first rule;
if it is a +, use the second rule.


Translation to code (only recognizing)

void parseS() throws ParseError {
    switch(lookahead()) {     // what is in the input?
        case '[':             // If I have seen a [
            match('[');       // remove the [
            parseS();         // now parse what is inside [...]
            match(']');       // make sure there is a ]
            break;            // done in this case
        case '+':             // If I have seen a +
            match('+');       // get rid of it
            break;            // and we are done
        default: error();     // throws ParseError
    }
}
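To try the method out, it needs a surrounding class. The following is a self-contained sketch: the class name BracketRecognizer, the definition of ParseError, the string-based lookahead and match, and the wrapper accepts are all assumptions filled in for this example.

```java
class BracketRecognizer {
    // Assumed definition; any checked exception would do.
    static class ParseError extends Exception {
        ParseError(String message) {
            super(message);
        }
    }

    private final String input;
    private int pos = 0;

    BracketRecognizer(String input) {
        this.input = input;
    }

    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void match(char c) throws ParseError {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new ParseError("expected " + c + " at position " + pos);
        }
        pos++;
    }

    // S -> [ S ] | +
    void parseS() throws ParseError {
        switch (lookahead()) {
            case '[':
                match('[');
                parseS();
                match(']');
                break;
            case '+':
                match('+');
                break;
            default:
                throw new ParseError("unexpected input at position " + pos);
        }
    }

    // True if the whole string can be derived from S.
    static boolean accepts(String s) {
        BracketRecognizer r = new BracketRecognizer(s);
        try {
            r.parseS();
            return r.pos == s.length();   // reject trailing input
        } catch (ParseError e) {
            return false;
        }
    }
}
```

With this, accepts("[[+]]") holds, while "[+", "[]" and "+]" are rejected.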


How do we get the symbols for the case labels?

Parsing with lookahead is easy if every rule for a given non-terminal starts with a different terminal symbol.

In that case, the lookahead immediately tells us which rule to choose.

But what if not? The right-hand side could instead start with a non-terminal, or be the empty string.

More general methods for using the lookahead: FIRST and FOLLOW construction.


FIRST and FOLLOW sets

We define FIRST, FOLLOW and nullable:

A terminal symbol b is in FIRST(α) if there is a derivation

α ⇒* b β

(b is the first symbol in something derivable from α).

A terminal symbol b is in FOLLOW(X) if there is a derivation

S ⇒* α X b γ

(b can appear immediately after X in some derivation).

α is nullable if α ⇒* ε (we can derive the empty string from it).
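These three definitions can be computed together by repeating simple rules until nothing changes. The sketch below does this for a toy grammar S → A b, A → a A | ε, which is an assumption chosen for illustration and does not appear in the slides. It also assumes that upper-case letters are non-terminals, that every other character is a terminal, and that the empty string stands for ε.

```java
import java.util.*;

// Sketch: nullable, FIRST and FOLLOW by fixpoint iteration.
// Conventions assumed here: upper-case letters are non-terminals,
// every other character is a terminal, "" is an epsilon right-hand side.
class FirstFollow {
    final Map<Character, List<String>> rules = new LinkedHashMap<>();
    final Set<Character> nullable = new HashSet<>();
    final Map<Character, Set<Character>> first = new HashMap<>();
    final Map<Character, Set<Character>> follow = new HashMap<>();

    void rule(char lhs, String rhs) {
        rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(rhs);
        first.computeIfAbsent(lhs, k -> new TreeSet<>());
        follow.computeIfAbsent(lhs, k -> new TreeSet<>());
    }

    // A string of symbols is nullable if every symbol in it is nullable.
    boolean isNullable(String alpha) {
        for (char x : alpha.toCharArray()) {
            if (!nullable.contains(x)) return false;
        }
        return true;
    }

    // FIRST of a string of symbols, using the sets computed so far.
    Set<Character> firstOf(String alpha) {
        Set<Character> result = new TreeSet<>();
        for (char x : alpha.toCharArray()) {
            if (!Character.isUpperCase(x)) { result.add(x); break; }
            result.addAll(first.get(x));
            if (!nullable.contains(x)) break;
        }
        return result;
    }

    // Apply the defining rules repeatedly until no set grows any more.
    void compute() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Character, List<String>> e : rules.entrySet()) {
                char a = e.getKey();
                for (String rhs : e.getValue()) {
                    if (isNullable(rhs)) changed |= nullable.add(a);
                    changed |= first.get(a).addAll(firstOf(rhs));
                    for (int i = 0; i < rhs.length(); i++) {
                        char x = rhs.charAt(i);
                        if (!Character.isUpperCase(x)) continue;
                        String rest = rhs.substring(i + 1);
                        // Whatever can start the rest can follow x.
                        changed |= follow.get(x).addAll(firstOf(rest));
                        // If the rest can vanish, whatever follows a follows x.
                        if (isNullable(rest)) {
                            changed |= follow.get(x).addAll(new TreeSet<>(follow.get(a)));
                        }
                    }
                }
            }
        }
    }

    // Toy grammar (an assumption): S -> A b, A -> a A | epsilon
    static FirstFollow example() {
        FirstFollow g = new FirstFollow();
        g.rule('S', "Ab");
        g.rule('A', "aA");
        g.rule('A', "");
        g.compute();
        return g;
    }

    public static void main(String[] args) {
        FirstFollow g = example();
        System.out.println("nullable  = " + g.nullable);        // [A]
        System.out.println("FIRST(S)  = " + g.first.get('S'));  // [a, b]
        System.out.println("FOLLOW(A) = " + g.follow.get('A')); // [b]
    }
}
```

Because A is nullable, b ends up in FIRST(S) even though no rule for S starts with b directly.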

FIRST and FOLLOW give the case labels

FIRST and FOLLOW give us the case labels for the branches of the switch statement.

The branch for A → α gets the labels in FIRST(α).

The branch for A → ε gets the labels in FOLLOW(A).

More generally, if α is nullable, the branch for A → α also gets the labels in FOLLOW(A).

FIRST and FOLLOW are tedious to compute by hand. We won’t go into the details here.

Parser generators compute this sort of information automatically.
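As an illustration of case labels taken from FOLLOW, consider the grammar S → [ S ] S | ε for balanced brackets. This grammar is an assumption for the example and is not one of the grammars in the slides. Here FIRST([ S ] S) = { [ } and FOLLOW(S) = { ] }. The sketch also treats end of input as something that may follow S, using a hypothetical END marker.

```java
// Sketch: predictive recognizer for the hypothetical grammar
//   S -> [ S ] S | epsilon
// The class name Balanced and the END marker are assumptions.
class Balanced {
    private static final char END = '$'; // assumed end-of-input marker
    private final String input;
    private int pos = 0;

    Balanced(String input) { this.input = input; }

    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : END;
    }

    private void match(char expected) {
        if (lookahead() != expected) {
            throw new IllegalStateException("expected " + expected + " at " + pos);
        }
        pos++;
    }

    void parseS() {
        switch (lookahead()) {
            case '[':   // label from FIRST([ S ] S) = { [ }
                match('[');
                parseS();
                match(']');
                parseS();
                break;
            case ']':   // label from FOLLOW(S) = { ] }
            case END:   // end of input may also follow S
                break;  // S -> epsilon consumes nothing
            default:
                throw new IllegalStateException("unexpected symbol at " + pos);
        }
    }

    // True if the whole input can be derived from S.
    static boolean accepts(String input) {
        Balanced p = new Balanced(input);
        try {
            p.parseS();
            return p.pos == input.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("[[][]]")); // true
        System.out.println(accepts("[[]"));    // false
    }
}
```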

From grammars to Java classes

We translate a grammar to some mutually recursive Java classes:

For each non-terminal A there is an abstract class A.

For each rule A → X1 . . . Xn, there is a concrete subclass of A. It has fields for all non-terminals among X1, . . . , Xn.

The toString method of that subclass concatenates the toString results of all the Xi:
if Xi is a terminal symbol, it is already a string;
if Xi is a non-terminal, its toString method is called.

Thus each grammar gives us a class hierarchy.
Instead of an abstract class, we could also use an interface for each non-terminal A, which the classes for the rules then implement.

Parse trees as Java objects

Suppose we translate the grammar to Java classes. Then for each string in the language, we have an object. Its fields may refer to other objects.
Together, these objects represent the parse tree for that word.
Its toString method constructs a (leftmost) derivation of the sentence.

Example: binary trees

abstract class BinTree
{
    public abstract String toString();
}

class Leaf extends BinTree
{
    private int label;

    Leaf(int n) { label = n; }

    public String toString() { return "" + label; }
}

Example continued

class Node extends BinTree
{
    private BinTree left, right;

    Node(BinTree l, BinTree r)
    {
        left = l; right = r;
    }

    public String toString()
    {
        return "(" + left.toString() + "," + right.toString() + ")";
    }
}

Exercise: create an object whose toString() prints ((1,2),(1,2)).
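To show how such objects are put together, here is a small self-contained sketch that builds a different tree from the one in the exercise. It repeats the three classes so that it compiles on its own; the class TreeDemo and its method are assumptions added for the example.

```java
// Sketch: building a binary tree object and printing it.
abstract class BinTree {
    @Override public abstract String toString();
}

class Leaf extends BinTree {
    private final int label;
    Leaf(int n) { label = n; }
    @Override public String toString() { return "" + label; }
}

class Node extends BinTree {
    private final BinTree left, right;
    Node(BinTree l, BinTree r) { left = l; right = r; }
    @Override public String toString() {
        return "(" + left.toString() + "," + right.toString() + ")";
    }
}

// TreeDemo is a hypothetical driver class.
class TreeDemo {
    static BinTree example() {
        // The nesting of constructor calls mirrors the nesting of the tree.
        return new Node(new Leaf(1), new Node(new Leaf(2), new Leaf(3)));
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints (1,(2,3))
    }
}
```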

Example: a Java class for a grammar rule

The grammar rule for while-loops

S → while (E ) S

gives us this Java class:

public class While extends S
{
    private E testExp;

    private S loopBody;

    public String toString()
    {
        return "while (" + testExp.toString() + ")" + loopBody;
    }
}

COMPOSITE pattern and (parse) trees

The representation of parse trees is an instance of the COMPOSITE pattern in object-oriented software. Gamma et al. define it as:
“Compose objects into tree structures to represent part-whole hierarchies.”
(From Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides.)
This is essentially the representation of parse trees described above. It describes a “hierarchy” as a tree, where a node is the whole and its children are the parts.

Composite pattern in UML notation

Consider the binary tree grammar above. In UML notation, we have the following class diagram:

[Class diagram: abstract class B at the top, with class Leaf and class Node below it.]

A B “is a” Leaf or a Node, and a Node “has a” B.

Building the parse tree

For the parser that only recognizes, we had a void return type.
We extend our translation of grammar rules to code:

The method for non-terminal A has as its return type the abstract class that we created for A.

Whenever we call a method for a non-terminal, we remember its return value in a local variable.

At the end of the translation of a rule, we call the constructor of the class for that rule, passing it the remembered values.

Classes for the parse tree

Grammar: S → [ S ] | +

abstract class S { ... }

class Bracket extends S
{
    private S inBrackets;

    Bracket(S s) { inBrackets = s; }
    ...
}

class Plus extends S { ... }

Parsing while building the parse tree

S parseS()   // return type is abstract class S
{
    switch (lookahead()) {
    case '[':
        {
            S treeS;
            match('[');
            treeS = parseS();          // remember tree
            match(']');
            return new Bracket(treeS); // call the constructor for this rule
        }
    ...
    }
}
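Filling in the elided parts gives a complete tree-building parser for S → [ S ] | +. The following is a sketch under assumptions: the toString bodies, the unchecked ParseError, the END marker and the class TreeParser with its static parse method are all chosen for illustration and are not taken from the slides.

```java
// Sketch: recursive-descent parser that builds a parse tree for
//   S -> [ S ] | +
// TreeParser, END and parse are hypothetical names.
class ParseError extends RuntimeException {
    ParseError(String message) { super(message); }
}

abstract class S {
    @Override public abstract String toString();
}

class Bracket extends S {
    private final S inBrackets;
    Bracket(S s) { inBrackets = s; }
    @Override public String toString() { return "[" + inBrackets + "]"; }
}

class Plus extends S {
    @Override public String toString() { return "+"; }
}

class TreeParser {
    private static final char END = '$'; // assumed end-of-input marker
    private final String input;
    private int pos = 0;

    TreeParser(String input) { this.input = input; }

    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : END;
    }

    private void match(char expected) {
        if (lookahead() != expected) {
            throw new ParseError("expected " + expected + " at position " + pos);
        }
        pos++;
    }

    S parseS() {
        switch (lookahead()) {
            case '[': {
                match('[');
                S treeS = parseS();        // remember the subtree
                match(']');
                return new Bracket(treeS); // constructor for S -> [ S ]
            }
            case '+':
                match('+');
                return new Plus();         // constructor for S -> +
            default:
                throw new ParseError("unexpected symbol at position " + pos);
        }
    }

    // Parse a whole string; leftover input is an error.
    static S parse(String input) {
        TreeParser p = new TreeParser(input);
        S tree = p.parseS();
        if (p.pos != input.length()) {
            throw new ParseError("trailing input at position " + p.pos);
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("[[+]]")); // prints [[+]]
    }
}
```

Printing the tree gives back the input string, which is a quick check that the parser and the toString methods agree with each other.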

Methods in the parse tree

To perform useful work after the parse tree has been constructed, we call its methods.
Canonical example: expression evaluation. Suppose we have a grammar for arithmetical expressions:

E → E + E

E → E ∗ E

E → 1 | 2 | 3 | . . .

Then each expression “knows” how to evaluate itself (depending on whether it is an addition, a multiplication, . . . ).
The methods in the parse tree can be given additional parameters to move information around the tree.

Expression evaluation as a method of the parse tree

abstract class Expression { abstract int eval(); ... }

class Plus extends Expression
{
    ....
    public int eval()
    {
        return left.eval() + right.eval();
    }
    ....
}

Adding parameters to the methods

Suppose we also have variables: E → x | . . ..

abstract class Expression
{
    abstract int eval(Environment env);
}

class Variable extends Expression
{
    private String name;

    public int eval(Environment env)
    {
        return env.get(name);
    }
}
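The slides do not define Environment. One possible sketch, assuming it is a thin wrapper around a map from variable names to integers, is given below together with enough expression classes to evaluate (x + 2) * x. The classes Num, Times and EvalDemo, and the put method, are assumptions added for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: evaluation with an environment passed down the tree.
// Environment as a map wrapper is an assumption, not the slides' definition.
class Environment {
    private final Map<String, Integer> values = new HashMap<>();

    void put(String name, int value) { values.put(name, value); }

    int get(String name) {
        Integer v = values.get(name);
        if (v == null) {
            throw new IllegalArgumentException("unbound variable " + name);
        }
        return v;
    }
}

abstract class Expression {
    abstract int eval(Environment env);
}

class Num extends Expression {          // hypothetical class for E -> 1 | 2 | ...
    private final int value;
    Num(int value) { this.value = value; }
    @Override int eval(Environment env) { return value; }
}

class Variable extends Expression {     // E -> x
    private final String name;
    Variable(String name) { this.name = name; }
    @Override int eval(Environment env) { return env.get(name); }
}

class Plus extends Expression {         // E -> E + E
    private final Expression left, right;
    Plus(Expression l, Expression r) { left = l; right = r; }
    @Override int eval(Environment env) { return left.eval(env) + right.eval(env); }
}

class Times extends Expression {        // E -> E * E
    private final Expression left, right;
    Times(Expression l, Expression r) { left = l; right = r; }
    @Override int eval(Environment env) { return left.eval(env) * right.eval(env); }
}

class EvalDemo {
    // Evaluates (x + 2) * x for the given value of x.
    static int demo(int x) {
        Environment env = new Environment();
        env.put("x", x);
        Expression e = new Times(new Plus(new Variable("x"), new Num(2)),
                                 new Variable("x"));
        return e.eval(env);
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // prints 15
    }
}
```

Each node simply hands the same environment on to its children, so only Variable ever needs to look anything up.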

OO and parsing

Parse trees are an instance of the composite pattern.

The classes made for rules A → α extend the class for A.

The return type of a parsing method for A is the abstract class A, but what is actually returned is an instance of one of its subclasses.

The methods in the abstract class are overridden by the subclasses. During a tree walk, we rely on dynamic dispatch.

Polymorphism is crucial.

Parser generators

Except when the grammar is very simple, one typically does not program a parser by hand from scratch.
Instead, one uses a parser generator.
Compare:

Java  --Compiler-->  JVM code

Grammar  --Parser generator-->  Parser

Examples of parser generators: yacc, bison, ANTLR, JavaCC, SableCC, . . .

More on parser generators

Parser generators use some ASCII syntax rather thansymbols like →.

With yacc, one attaches parsing actions to eachproduction that tell the parser what to do.

Some parsers construct the parse tree automatically. All one has to do is tree-walking.

Parser generators often come with a collection of useful grammars for Java, XML, HTML and other languages.

If you need to parse a non-standard language, you need a grammar suitable for input to the parser generator.

Pitfalls: ambiguous grammars, left recursion.

A parser generator is often combined with a lexical analyzer generator (like lex and yacc).

Yacc parser generator

“Yet Another Compiler-Compiler”

An early (1975) parser generator geared towards C.

Technically, an LALR(1) parser generator. LALR(1) is too complicated to explain here.

Very influential. You should have heard about it for historical reasons (like 1066).

Still widely referred to: “This is the yacc for 〈blah〉” means “This largely automates doing 〈blah〉 while hiding much of the complexity”.

The GNU version, the usual one on Linux, is called bison; see http://www.gnu.org/software/bison/.

ANTLR parser generator

Works with Java or C

Download and documentation at http://antlr.org

Uses LL(k): similar to our predictive parsers, but using more than one symbol of lookahead

Parse tree can be constructed automatically

You just have to annotate the grammar to tell ANTLR when to construct a node

The parse trees are a somewhat messy data structure, not very typed or OO

ANTLR has been used in several student projects here

See http://www.antlr.org/wiki/display/ANTLR3/Expression+evaluator for a simple example.

JavaCC parser generator

JavaCC is a parser generator aimed at Java.

See https://javacc.dev.java.net/ for downloads and documentation.

Uses LL(1) if possible

Blurb: “Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications.”

SableCC parser generator

Works with Java

Download and documentation at http://sablecc.org

Uses LALR(1) for parsing, just like Yacc

SableCC has no problem with left recursion, as LR parsing does not only depend on the look-ahead

You may get cryptic errors about shift/reduce and reduce/reduce conflicts if the grammar is unsuitable

SableCC constructs an object-oriented parse tree, similar to the way we have constructed Java classes

Uses the Visitor Pattern for processing the parse tree

SableCC has been used in several student projects here

Visitor pattern for walking the parse tree

Having to modify the methods in the tree classes whenever we want new functionality is poor software engineering.

It would be much better if we had a general-purpose tree walker into which we can plug whatever functionality is desired.

The canonical way to do that is the Visitor Design pattern.

The tree is separate from specialized code that “visits” it.

The tree classes only have “accept” methods for visitor objects.

See the Design Patterns book by Gamma, Helm, Johnson and Vlissides, or www.sablecc.org.
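A minimal sketch of the idea, applied to the binary trees from earlier, is shown below. The interface Visitor, its method names and the two example visitors are assumptions made for illustration; the classes that SableCC actually generates look different.

```java
// Sketch of the Visitor pattern for binary trees.
// Visitor, visitLeaf, visitNode, SumVisitor and PrintVisitor are
// hypothetical names, not SableCC's generated API.
interface Visitor<R> {
    R visitLeaf(Leaf leaf);
    R visitNode(Node node);
}

abstract class BinTree {
    // The only method the tree classes need: hand yourself to the visitor.
    abstract <R> R accept(Visitor<R> v);
}

class Leaf extends BinTree {
    final int label;
    Leaf(int label) { this.label = label; }
    @Override <R> R accept(Visitor<R> v) { return v.visitLeaf(this); }
}

class Node extends BinTree {
    final BinTree left, right;
    Node(BinTree left, BinTree right) { this.left = left; this.right = right; }
    @Override <R> R accept(Visitor<R> v) { return v.visitNode(this); }
}

// One piece of functionality: add up all the labels.
class SumVisitor implements Visitor<Integer> {
    public Integer visitLeaf(Leaf leaf) { return leaf.label; }
    public Integer visitNode(Node node) {
        return node.left.accept(this) + node.right.accept(this);
    }
}

// Another piece of functionality, added without touching the tree classes.
class PrintVisitor implements Visitor<String> {
    public String visitLeaf(Leaf leaf) { return "" + leaf.label; }
    public String visitNode(Node node) {
        return "(" + node.left.accept(this) + "," + node.right.accept(this) + ")";
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        BinTree t = new Node(new Node(new Leaf(1), new Leaf(2)), new Leaf(3));
        System.out.println(t.accept(new SumVisitor()));   // prints 6
        System.out.println(t.accept(new PrintVisitor())); // prints ((1,2),3)
    }
}
```

Adding a third operation means writing a third visitor class; BinTree, Leaf and Node stay as they are.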

A problem: left recursion

Left recursion is a problem for the simple parsers we have written, and for some parser generators, like ANTLR and JavaCC.
Example:

E → E − E

E → 1

The symbol 1 is in the lookahead for both rules, so it cannot guide our choice between them.
Worse, a parsing method for the first rule would call itself immediately without consuming any input, and so recurse forever.
If you have this situation, and you want to use ANTLR, you need to rewrite the grammar.
There is a standard technique for eliminating left recursion, described in most compiler textbooks.
Left recursion is not a problem for LR parsers like yacc or SableCC.
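As a sketch of what such a rewrite can look like, take the left-recursive but unambiguous variant E → E − 1 | 1, which is an assumption for this example. The standard technique turns it into E → 1 R, R → − 1 R | ε. The parser below follows the rewritten grammar and still computes the left-to-right result by carrying the value so far as a parameter. The class name Subtraction, the END marker and the use of the ASCII character - for − are choices made for the example.

```java
// Sketch: parsing after eliminating left recursion.
//   Original (assumed):  E -> E - 1 | 1
//   Rewritten:           E -> 1 R,   R -> - 1 R | epsilon
class Subtraction {
    private static final char END = '$'; // assumed end-of-input marker
    private final String input;
    private int pos = 0;

    Subtraction(String input) { this.input = input; }

    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : END;
    }

    private void match(char expected) {
        if (lookahead() != expected) {
            throw new IllegalStateException("expected " + expected + " at " + pos);
        }
        pos++;
    }

    // E -> 1 R
    int parseE() {
        match('1');
        return parseR(1);
    }

    // R -> - 1 R | epsilon
    // acc is the value of everything parsed to the left so far.
    int parseR(int acc) {
        if (lookahead() == '-') {
            match('-');
            match('1');
            return parseR(acc - 1);
        }
        return acc; // epsilon: nothing more to subtract
    }

    // Parse and evaluate a whole string; leftover input is an error.
    static int eval(String input) {
        Subtraction p = new Subtraction(input);
        int value = p.parseE();
        if (p.pos != input.length()) {
            throw new IllegalStateException("trailing input at " + p.pos);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(eval("1-1-1")); // prints -1, i.e. (1-1)-1
    }
}
```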

Another problem: ambiguous grammars

A grammar is ambiguous if there is a string that has more than one parse tree.
Standard example:

E → E − E

E → 1

One such string is 1-1-1. It could mean (1-1)-1 or 1-(1-1) depending on how you parse it.
Ambiguous grammars are a problem for parsing, as we do not know what is intended.
Again, one can try to rewrite the grammar to eliminate the problem.
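That the choice of parse tree matters can be checked directly by building both trees for 1-1-1 as objects and evaluating them. The classes Expr, One, Minus and AmbiguityDemo below are assumptions written for this illustration.

```java
// Sketch: the two parse trees of 1-1-1 evaluate to different values.
abstract class Expr {
    abstract int eval();
}

class One extends Expr {                 // E -> 1
    @Override int eval() { return 1; }
    @Override public String toString() { return "1"; }
}

class Minus extends Expr {               // E -> E - E
    private final Expr left, right;
    Minus(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override int eval() { return left.eval() - right.eval(); }
    // Brackets are added only to make the tree shape visible.
    @Override public String toString() { return "(" + left + "-" + right + ")"; }
}

class AmbiguityDemo {
    static Expr leftTree() {
        return new Minus(new Minus(new One(), new One()), new One());
    }

    static Expr rightTree() {
        return new Minus(new One(), new Minus(new One(), new One()));
    }

    public static void main(String[] args) {
        System.out.println(leftTree() + " = " + leftTree().eval());   // ((1-1)-1) = -1
        System.out.println(rightTree() + " = " + rightTree().eval()); // (1-(1-1)) = 1
    }
}
```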

Parser generator overview

Parser generator   LR or LL   Tree processing
Yacc/Bison         LALR(1)    Parsing actions in C
SableCC            LALR(1)    Visitor Pattern in Java
ANTLR              LL(k)      Tree grammars + Java or C++
JavaCC             LL(k)      JJTree + Visitors in Java

Summary

We have seen some basics of parsing (grammars, derivations, parse trees).

We have translated grammars to Java code.
You should be able to do this (exercise, exam).

We have touched upon some more advanced material (FIRST/FOLLOW, parser generators, Visitor pattern).

The code contains generally useful programming concepts: mutual recursion of methods and data, composite pattern, abstract classes, polymorphism, exceptions, . . .

In practice, one would use a parser generator rather than reinvent the wheel.

One still needs to have some idea of what a parser generator does, as it is not just a button to click on.