design lexer && parser · 2013-05-24 · lexer process of converting a sequence of...

21
Design Lexer && parser

Upload: others

Post on 16-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

DesignLexer && parser

Page 2: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

Grammarprogram -> decls stmts

block -> {decls stmts}

decls -> decls decl | ϵ

decl -> type *id*;

type -> type [*num*] | *basic* | *record*

{decls}

stmts -> stmts stmt | ϵ

stmt -> assign;

| *if*(assign) stmt

| *if*( assign ) stmt *else* stmt

| *while*( assign ) stmt

| *do* stmt *while* ( assign );

| *break*;

| *return*;

| *return* loc;

| *print* loc;

| block

loc -> loc [assign] | *id* | loc.*id*

assign -> loc=assign | bool

bool -> bool||join | join

join -> join&&equality | equality

equality -> equality==rel | equality!=rel | rel

rel -> expr<expr | expr<=expr | expr>=expr |

expr>expr | expr

expr -> expr+term | expr-term | term

term -> term*unary | term/unary | unary

unary -> !unary | -unary | factor

factor -> (assign) | loc | *num* | *real* | *true* |

*false* | *string*

Page 3: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

GrammarTerminals

digit -> [0-9]

digits -> digit+

integralDigits -> -? digits

exponent -> ((e | E) integralDigits)

num -> digits exponent?

real -> digits . digits exponent?

basic -> (long | double | string | bool)

alpha -> [a-zA-Z]

alphaNumeric -> [a-zA-Z0-9]

id -> alpha alphaNumeric*

Page 4: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

The structure of a compiler

Page 5: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

Lexer

● Process of converting a sequence of characters into a sequence of tokens

● Called by a parser

● Usually based on a finite-state machine

● May involve backtracking

Page 6: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

Lexer and parserSome interface discussions

There are various options with pros and conse.g. Fortran 90

DO 5 I = 1.25<variable, "DO5I"> <assign> <number,"1.25">

DO 5 I = 1,25<do> <number, "5"> <variable, "I"> <assign> <number, "1"> <comma> <number, "25">

Page 7: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

LexerInterface

Page 8: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

LexerInterface

Page 9: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

LexerInterface

Page 10: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserOverview

Parser

ID, "myInt" AR_OP, "Plus" LOG, false

Arith. OperationResult Type: Int

Identifier,Declared Type: Int

left expression right expression

ConstantResult Type: BoolValue= false

Page 11: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserTasks

Parse

● convert sequence of tokens to a sequence of derivations

● deal with ambiguousness ○ dangling else

Page 12: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserTasks

Analyze

● semantic analysis○ Convert parse tree to AST○ assign attributes○ deal with circular dependencies

Page 13: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserImplementation

● parse

CFGExperimenter ( http://www.curtclifton.net/cfg-experimenter )

myInt + false

Page 14: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserImplementation

Arith. OperationResult Type: Int

Identifier,Declared Type: Int

left expression right expression

ConstantResult Type: BoolValue= false

● parse● to AST

myInt + false

Page 15: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserImplementation

Arith. OperationResult Type: Int

Identifier,Declared Type: Int

left expression right expression

ConstantResult Type: BoolValue= false

● parse● to AST● analyze

semantic

PEW PEW!Type Error!

myInt + false

Page 16: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserTo be more precise...

https://github.com/swp-uebersetzerbau-ss13/common/blob/master/doc/ast.jpeg

Page 17: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserTo be more precise...

● two design approaches○ tree structure

■ ASTNode::GetNodeList()■ reflects grammar■ supports visualisation

○ interface only design■ ASTNode::GetNodeType()/SetNodeType(n)■ implementation can use an arbitrary inheritance

hierarchy■ maximum degree of freedom

Page 18: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserRecap the situation...

● ArithmeticBinaryExpression○ extends ExpressionNode

■ extends StatementNode● extends ASTNode

● BasicIdentifier○ extends IdentifierNode

■ extends ExpressionNode● ....

● LiteralNode○ extends ExpressionNode

■ ...

myInt + false

Page 19: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserRecap the situation...

myInt + false

Arith. OperationResult Type: Int

Identifier,Declared Type: Int

left expression right expression

ConstantResult Type: BoolValue= false

Idea Realisation

ArithmeticBinaryExpression- Op: ADD (+)

BasicIdentifier- id: "myInt"- type: Boolean

getRightValue()

LiteralNode- Type: Boolean- literal:"false"

getLeftValue()

Page 20: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

ParserRecap the situation...

myInt + false

Arith. OperationResult Type: Int

Identifier,Declared Type: Int

left expression right expression

ConstantResult Type: BoolValue= false

Idea Realisation

ArithmeticBinaryExpression- Op: ADD (+)

BasicIdentifier- id: "myInt"

getRightValue()

LiteralNode- Type: Boolean- literal:"false"

getLeftValue()

PEW PEW!Type Error!

Page 21: Design Lexer && parser · 2013-05-24 · Lexer Process of converting a sequence of characters into a sequence of tokens Called by a parser Usually based on a finite-state machine

Sources

● http://cg.scs.carleton.ca/~morin/teaching/3002/notes/lex.pdf

● Frau Prof. Fehr

● Aaaand special thanks go to ...○ Florian○ Frank