Using JavaCC: Professor Yihjia Tsai, Tamkang University


Post on 21-Dec-2015


Page 1: Using JavaCC Professor Yihjia Tsai Tamkang University

Using JavaCC

Professor Yihjia Tsai

Tamkang University

Page 2: Using JavaCC Professor Yihjia Tsai Tamkang University

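Automating Lexical Analysis: Overall Picture

[Diagram: a scanner generator takes regular expressions (RE), builds an NFA, converts it to a DFA, minimizes the DFA, and emits a Java scanner program that simulates the DFA. The scanner program reads a string stream and produces tokens.]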

Page 3: Using JavaCC Professor Yihjia Tsai Tamkang University

Building Faster Scanners from the DFA

Table-driven recognizers waste a lot of effort:
• Read (and classify) the next character
• Find the next state
• Assign to the state variable
• Branch back to the top

We can do better:
• Encode state and actions in the code
• Do transition tests locally
• Generate ugly, spaghetti-like code (that is OK, it is automatically generated code)
• Takes (many) fewer operations per input character

Skeleton of the table-driven recognizer:

state = s0;
string = ε;
char = get_next_char();
while (char != eof) {
    state = δ(state, char);
    string = string + char;
    char = get_next_char();
}
if (state in Final) then report acceptance;
else report failure;
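The contrast between the two styles can be sketched in Java. This is a minimal illustration, not the code JavaCC generates: the class name NumeralRecognizers, the three-state table, and both method names are assumptions made for the example, which recognizes the numeral pattern [0-9]+.

```java
public class NumeralRecognizers {
    // Hypothetical DFA for [0-9]+ : state 0 = start, 1 = accepting, 2 = dead.
    // Column 0 is the character class "digit", column 1 is "anything else".
    private static final int[][] DELTA = {
        {1, 2}, // from state 0
        {1, 2}, // from state 1
        {2, 2}, // from state 2
    };

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Table-driven: classify the character, look up the next state,
    // assign the state variable, branch back to the top of the loop.
    public static boolean tableDriven(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            int charClass = isDigit(input.charAt(i)) ? 0 : 1;
            state = DELTA[state][charClass];
        }
        return state == 1;
    }

    // Direct-coded: each state becomes a piece of code, and the
    // transition tests are done locally without a table lookup.
    public static boolean directCoded(String input) {
        int i = 0;
        int n = input.length();
        // State 0: at least one digit is required.
        if (i == n || !isDigit(input.charAt(i))) {
            return false;
        }
        i++;
        // State 1: stay here while digits keep coming.
        while (i < n && isDigit(input.charAt(i))) {
            i++;
        }
        // Accept only if the whole input was consumed in state 1.
        return i == n;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "", "12a"}) {
            System.out.println("\"" + s + "\": table=" + tableDriven(s)
                    + " direct=" + directCoded(s));
        }
    }
}
```

Both methods accept the same language. The direct-coded version stops at the first non-digit instead of running the dead state to the end of the input, which is one source of the saved operations.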

Page 4: Using JavaCC Professor Yihjia Tsai Tamkang University

Inside a Lexical Analyzer Generator

• How does a lexical analyzer generator work?
  – Get input from the user, who defines tokens in a form that is equivalent to a regular grammar
  – Turn the regular grammar into an NFA
  – Convert the NFA into a DFA
  – Generate the code that simulates the DFA

Page 5: Using JavaCC Professor Yihjia Tsai Tamkang University

Flow for Using JavaCC

Extracted from http://www.cs.unb.ca/profs/nickerson/courses/cs4905/Labs/L1_2006.pdf

Page 6: Using JavaCC Professor Yihjia Tsai Tamkang University

Structure of a JavaCC File

• A JavaCC file is composed of 3 portions:
  – Options
  – Class declaration
  – Specification for lexical analysis (tokens), and specification for syntax analysis
• For the very first example of JavaCC, let's recognize two tokens: "+" and numerals.
• Use an editor to edit the file and save it under the name numeral.jj

[Slide callout: Focus of this Lecture]

Page 7: Using JavaCC Professor Yihjia Tsai Tamkang University

Using JavaCC for Lexical Analysis

• JavaCC is a “top-down” parser generator.
• Some parser generators (such as yacc, bison, and JavaCUP) need a separate lexical-analyzer generator.
• With JavaCC, you can specify the tokens within the parser generator.

Page 8: Using JavaCC Professor Yihjia Tsai Tamkang University

Example File

/* main class definition */
PARSER_BEGIN(Numeral)
public class Numeral {
    public static void main(String[] args) throws ParseException, TokenMgrError {
        Numeral numeral = new Numeral(System.in);
        while (numeral.getNextToken().kind != EOF);
    }
}
PARSER_END(Numeral)

/* token definitions */
TOKEN:
{
    <ADD: "+">
  | <NUMERAL: (["0"-"9"])+>
}

Page 9: Using JavaCC Professor Yihjia Tsai Tamkang University

Options

• The options portion is optional and is omitted in the previous example.
• STATIC is a boolean option whose default value is true. If true, all methods and class variables are specified as static in the generated parser and token manager.
  – This allows only one parser object to be present, but it improves the performance of the parser.
  – To perform multiple parses during one run of your Java program, you will have to call the ReInit() method to reinitialize your parser if it is static.
  – If the parser is non-static, you may use the "new" operator to construct as many parsers as you wish. These can all be used simultaneously from different threads.

Page 10: Using JavaCC Professor Yihjia Tsai Tamkang University

The example file again, with the slide's callouts ("Start" and "Simple Loop Getting Tokens") written as comments:

/* main class definition */
PARSER_BEGIN(Numeral)
public class Numeral {
    // Start
    public static void main(String[] args) throws ParseException, TokenMgrError {
        Numeral numeral = new Numeral(System.in);
        // Simple loop getting tokens
        while (numeral.getNextToken().kind != EOF);
    }
}
PARSER_END(Numeral)

/* token definitions */
TOKEN:
{
    <ADD: "+">
  | <NUMERAL: (["0"-"9"])+>
}

Page 11: Using JavaCC Professor Yihjia Tsai Tamkang University

Compilation

After calling javacc to compile numeral.jj, seven files are generated if no error messages occur.

They are Numeral.java, NumeralConstants.java, NumeralTokenManager.java, ParseException.java, SimpleCharStream.java, Token.java, and TokenMgrError.java.

bash-2.05$ javacc numeral.jj

Java Compiler Compiler Version 3.2 (Parser Generator)

(type "javacc" with no arguments for help)

Reading from file numeral.jj . . .

File "TokenMgrError.java" does not exist. Will create one.

File "ParseException.java" does not exist. Will create one.

File "Token.java" does not exist. Will create one.

File "SimpleCharStream.java" does not exist. Will create one.

Parser generated successfully

Page 12: Using JavaCC Professor Yihjia Tsai Tamkang University

JavaCC Specification of a Lexer

[Slide image: a JavaCC lexer specification, with the callouts "Note the need for ( )!" and "Defining Whitespace".]

Page 13: Using JavaCC Professor Yihjia Tsai Tamkang University

A Full Example

See the sample file.

Page 14: Using JavaCC Professor Yihjia Tsai Tamkang University

Dealing with Errors

• Error reporting: 123e+q
• Could consider it an invalid token (lexical error), or
• return a sequence of valid tokens
  – 123, e, +, q
  – and let the parser deal with the error.
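The second strategy can be sketched in plain Java. This is an illustration under stated assumptions, not JavaCC output: the class name SplitTokens and the three token kinds (NUMERAL, IDENT, ADD) are made up for the example, and the scanner simply matches one token at a time using java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitTokens {
    // Hypothetical token set: numerals, identifiers, and "+".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<NUMERAL>[0-9]+)|(?<IDENT>[A-Za-z]+)|(?<ADD>\\+)");

    // Splits the input into a sequence of valid tokens. A character that
    // starts no token at all is still reported as a lexical error.
    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Lexical error at offset " + pos);
            }
            String kind;
            if (m.group("NUMERAL") != null) {
                kind = "NUMERAL";
            } else if (m.group("IDENT") != null) {
                kind = "IDENT";
            } else {
                kind = "ADD";
            }
            tokens.add(kind + "(" + m.group() + ")");
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [NUMERAL(123), IDENT(e), ADD(+), IDENT(q)]
        System.out.println(scan("123e+q"));
    }
}
```

With this approach the scanner raises no error for 123e+q. Whether the sequence numeral, identifier, plus, identifier is legal is left for the parser to decide.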

Page 15: Using JavaCC Professor Yihjia Tsai Tamkang University

Lexical Error Correction?

• Sometimes interaction between the scanner and the parser can help
  – especially in a top-down (predictive) parse
  – The parser, when it calls the scanner, can pass as an argument the set of allowable tokens.
  – Suppose the scanner sees calss (a misspelling of class) in a context where only a top-level definition is allowed.

Page 16: Using JavaCC Professor Yihjia Tsai Tamkang University

Same Symbol, Different Meaning

• How can the scanner distinguish between binary minus and unary minus?
  – x = -a; vs x = 3 - a;
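One common answer is that the scanner does not distinguish them at all: it reports the same token for both, and the parser decides from the position in the expression. The sketch below illustrates that idea with a tiny hand-written evaluator. The class MinusDemo and its grammar (expr := unary ('-' unary)*, unary := '-' unary | NUMERAL) are assumptions made for the example and are not taken from the slides.

```java
public class MinusDemo {
    private final String src;
    private int pos = 0;

    private MinusDemo(String src) {
        // Whitespace carries no meaning in this toy language.
        this.src = src.replace(" ", "");
    }

    // Evaluates an expression built from numerals and '-'.
    public static int eval(String text) {
        MinusDemo parser = new MinusDemo(text);
        int value = parser.expr();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at offset " + parser.pos);
        }
        return value;
    }

    // expr := unary ( '-' unary )*
    private int expr() {
        int value = unary();
        while (pos < src.length() && src.charAt(pos) == '-') {
            pos++; // a '-' seen after an operand is binary minus
            value -= unary();
        }
        return value;
    }

    // unary := '-' unary | NUMERAL
    private int unary() {
        if (pos < src.length() && src.charAt(pos) == '-') {
            pos++; // a '-' seen where an operand is expected is unary minus
            return -unary();
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Numeral expected at offset " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("3 - 2"));  // binary minus
        System.out.println(eval("-2"));     // unary minus
        System.out.println(eval("3 - -2")); // both in one expression
    }
}
```

The character '-' is handled in two places, and which one fires depends only on whether the parser is expecting an operand or an operator at that point.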

Page 17: Using JavaCC Professor Yihjia Tsai Tamkang University

Scanner “Troublemakers”

• Unclosed strings
• Unclosed comments
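To make the first troublemaker concrete, here is a minimal sketch of how a hand-written scanner might notice an unclosed string literal. The class name UnclosedString, the backslash-escape convention, and the rule that a string may not span lines are all assumptions made for the example.

```java
public class UnclosedString {
    // Scans a string literal whose opening quote is at index start.
    // Returns the index just past the closing quote, or -1 if the
    // literal is never closed before the end of the line or input.
    public static int scanString(String input, int start) {
        for (int i = start + 1; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. \" inside the literal
                continue;
            }
            if (c == '"') {
                return i + 1; // properly closed
            }
            if (c == '\n') {
                return -1; // assumption: strings may not span lines
            }
        }
        return -1; // ran out of input before the closing quote
    }

    public static void main(String[] args) {
        System.out.println(scanString("\"abc\" + x", 0)); // closed literal
        System.out.println(scanString("\"abc + x", 0));   // unclosed literal
    }
}
```

Stopping at the end of the line keeps one missing quote from swallowing the rest of the file. An unclosed comment is harder to contain, because comments are usually allowed to span lines.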

Page 18: Using JavaCC Professor Yihjia Tsai Tamkang University

JavaCC as a Parsing Tool

Page 19: Using JavaCC Professor Yihjia Tsai Tamkang University

JavaCC Overview

• Generates a top-down parser.
  Could be used for generating a Prolog parser, which is in LL.
• Generates a parser in Java.
  Hence can be integrated with any Java-based Prolog compiler/interpreter to continue our example.
• Token specification and grammar specification structures are in the same file => easier to debug.

Page 20: Using JavaCC Professor Yihjia Tsai Tamkang University

Types of Productions in JavaCC

There can be four different kinds of productions.
• Javacode
  For something that is not context-free or is difficult to write a grammar for,
  e.g. recognizing matching braces and error processing.
• Regular Expressions
  Used to describe the tokens (terminals) of the grammar.
• BNF
  Standard way of specifying the productions of the grammar.
• Token Manager Declarations
  The declarations and statements are written into the generated Token Manager (lexer) and are accessible from within lexical actions.

Page 21: Using JavaCC Professor Yihjia Tsai Tamkang University

JavaCC Look-ahead Mechanism

• Exploration of tokens further ahead in the input stream.
• Backtracking is unacceptable due to the performance hit.
• By default JavaCC has 1-token look-ahead. Any number could be specified for look-ahead.
• Two types of look-ahead mechanisms
  – Syntactic: A particular token is looked ahead in the input stream.
  – Semantic: Any arbitrary Boolean expression can be specified as a look-ahead parameter.

e.g. A -> aBc and B -> b ( c )?   Valid strings: “abc” and “abcc”
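The example grammar shows why one token of look-ahead is not always enough. The sketch below is a hand-written recognizer for A -> aBc, B -> b ( c )? with a configurable look-ahead depth at the optional c. The class LookaheadDemo and its parse method are assumptions made for the illustration. In a JavaCC grammar the corresponding fix would be, roughly, a LOOKAHEAD(2) specification at that choice point.

```java
public class LookaheadDemo {
    // True if the character at index i exists and equals c.
    private static boolean at(String s, int i, char c) {
        return i < s.length() && s.charAt(i) == c;
    }

    // Recognizes A -> a B c, B -> b ( c )? without backtracking.
    // lookahead is the number of characters inspected when deciding
    // whether to enter the optional ( c )? inside B.
    public static boolean parse(String input, int lookahead) {
        int pos = 0;
        if (!at(input, pos, 'a')) {
            return false;
        }
        pos++;
        if (!at(input, pos, 'b')) {
            return false;
        }
        pos++;

        boolean takeOptional;
        if (lookahead <= 1) {
            // One token: any 'c' looks like the optional one.
            takeOptional = at(input, pos, 'c');
        } else {
            // Two tokens: take the optional 'c' only if another 'c'
            // follows it to serve as the closing 'c' of A.
            takeOptional = at(input, pos, 'c') && at(input, pos + 1, 'c');
        }
        if (takeOptional) {
            pos++;
        }

        // The closing 'c' of A.
        if (!at(input, pos, 'c')) {
            return false;
        }
        pos++;
        return pos == input.length();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abcc"}) {
            System.out.println(s + ": k=1 " + parse(s, 1) + ", k=2 " + parse(s, 2));
        }
    }
}
```

With one token of look-ahead the recognizer consumes the only c of "abc" as the optional one and then fails to find the closing c, although the string is valid. With two tokens it accepts both "abc" and "abcc".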

Page 22: Using JavaCC Professor Yihjia Tsai Tamkang University

References

• Compilers: Principles, Techniques, and Tools, by Aho, Sethi, and Ullman