Download - Syntax Analysis With Bison
Lexical and Syntax Analysis(of Programming Languages)
Bison, a Parser Generator
Bison: a parser generator
Specificationof a parser
Contextfree grammarwith a C action for each
production.
Match the input stringand execute the actions of the
productions used.
Bison
C function called
yyparse()
Input to Bison
The structure of a Bison (.y) fileis as follows.
/* Declarations */
%%
/* Grammar rules */
%%
/* C Code (including main function) */
Any text enclosed in /* and */is treated as a comment.
Grammar rules
Let α be any sequence ofterminals and nonterminals. Agrammar rule defining nonterminal n is of the form:
n : α1 action1| α2 action2| ⋯| αn actionn;
Each action is a C statement, ora block of C statements of theform {⋯ }.
Example 1
/* No declarations */
%%
e : 'x'| 'y'| '(' e '+' e ')'| '(' e '*' e ')'
%%
/* No main function */
expr1.y
/* No actions */
Nonterminal
Terminal
Output of Bison
Bison generates a C function
int yyparse() {⋯
}
Takes as input a stream of tokens.
Returns zero if input conforms togrammar, and nonzero otherwise.
Calls yylex() to get the next token.
Stops when yylex() returns zero.
When a grammar rule is used, thatrule’s action is executed.
Example 1, revisted
/* No declarations */
%%
e : 'x'| 'y'| '(' e '+' e ')'| '(' e '*' e ')'
%%
int yylex() {char c = getchar();if (c == '\n') return 0; else return c;
}
voidmain() {printf(“%i\n”, yyparse());
}
expr1.y
/* No actions */
Running Example 1
At a command prompt '>':
> bison o expr1.c expr1.y
> gcc o expr1 expr1.c ly
> expr1(x+(y*x))0
Input
Important!
Output(0 means successful parse)
Example 2
Terminals can be declared using a%token declaration, for example,to represent arithmetic variables:
%token VAR
%%
e : VAR| '(' e '+' e ')'| '(' e '*' e ')'
%%
/* main() and yylex() */
expr2.y
Example 2 (continued)
int yylex() {int c = getchar();
/* Ignore white space */while (c == ' ') c = getchar();
if (c == '\n') return 0;if (c >= 'a' && c <= 'z')return VAR;
return c;}
voidmain() {printf(“%i\n”, yyparse());
}
expr2.y
Return a VAR token
Example 2 (continued)
Alternatively, the yylex() functioncan be generated by Flex.
%{#include "expr2.h"%}
%%
" " /* Ignore spaces */\n return 0;[az] return VAR;. return yytext[0];
%%
expr2.lex
Generated by Bison
Running Example 2
At a command prompt '>':
> bison defines o expr2.c expr2.y
> flex o expr2lex.c expr2.lex
> gcc o expr2 expr2.c expr2lex.c –ly lfl
> expr2(a + ( b * c ))0
Parser
Output(0 means successful parse)
Lexer
Generate expr2.h
Example 3
Adding numeric literals:
%token VAR%token NUM
%%
e : VAR| NUM| '(' e '+' e ')'| '(' e '*' e ')'
%%
voidmain() {printf(“%i\n”, yyparse());
}
expr3.y
Numeric Literal
Example 3 (continued)
%{#include "expr3.h"%}
%%
" " /* Ignore spaces */\n return 0;[az] return VAR;[09]+ return NUM;. return yytext[0];
%%
expr3.lex
Numeric Literal
Adding numeric literals:
Semantic valuesof tokens
A token can have a semanticvalue associated with it.
A NUM token contains an integer.
A VAR token contains a variable name.
Semantic values are returned viathe yylval global variable.
Example 3 (revisited)
%{#include "expr3.h"%}
%%
" " /* Ignore spaces */\n return 0;[az] { yylval = yytext[0];
return VAR; }[09]+ { yylval = atoi(yytext);
return NUM;}
. return yytext[0];
%%
expr3.lex
Returning values via yylval:
Type of yylval
Problem: different tokens mayhave semantic values of differenttypes. So what is type of yylval?
%union{char var;int num;
}
Solution: a union type, whichcan be specified using the%union declaration, e.g.
yylval is eithera char or an int
Example 3 (revisted)
%{ #include "expr3.h" %}
%%
" " /* Ignore spaces */\n return 0;[az] { yylval.var = yytext[0];
return VAR; }[09]+ { yylval.num = atoi(yytext);
return NUM;}
. return yytext[0];
%%
expr3.lex
Returning values via yylval:
Tokens have types
The type of token’s semanticvalue can be specified in a %tokendeclaration.
%union{char var;int num;
}
%token <var> VAR;%token <num> NUM;
Semantic valuesof nonterminals
A nonterminal can also have asemantic value associated with it.
$n refers to the semantic value ofthe nth symbol in the rule;
$$ refers to the semantic value of theresult of the rule.
In the action of a grammar rule:
%type <num> e;
The type can be specified in a%type declaration, e.g.
Example 4
%{int env[256]; /* Variable environment */
%}%union{ int num; char var; }%token <num> NUM%token <var> VAR%type <num> e
%%
s : e { printf(“%i\n”, $1); }
e : VAR { $$ = env[$1]; }| NUM { $$ = $1; }| '(' e '+' e ')' { $$ = $2 + $4; }| '(' e '*' e ')' { $$ = $2 * $4; }
%%
voidmain() { env['x'] = 100; yyparse(); }
expr4.y
Exercise 1
Modify Example 4 so that yyparse()constructs an abstract syntax tree.
typedef enum { Add, Mul } Op;
struct expr {enum { Var, Num, App } tag;union {char var;int num;struct {struct expr* e1; Op op; struct expr* e2;
} app;};
};
typedef struct expr Expr;
Consider the following abstract syntax.
Precedence and associativity
The associativity of an operatorcan be specified using a %left,%right, or %nonassoc directive.
%left '+’%left '*'%right '&'%nonassoc '='
Operators specified in increasingorder of precedence, e.g. '*' hashigher precedence than '+'.
Example 5
%token VAR%token NUM
%left '+'%left '*'
%%
e : VAR| NUM| e '+' e| e '*' e| ( e )
%%
void main() {printf(“%i\n”, yyparse());
}
expr5.y
Conflicts
Sometimes Bison cannot deducethat a grammar is unambiguous,even if it is*.
In such cases, Bison will report:
* Not surprising: ambiguity detection isundecidable in general!
a shiftreduce conflict; or
a reducereduce conflict.
ShiftReduce Conflicts
Bison does not know whether toconsume more tokens (shift) or tomatch a production (reduce), e.g.
stmt : IF expr THEN stmt| IF expr THEN stmt ELSE stmt
Bison defaults to shift.
ReduceReduce Conflicts
Bison does not know whichproduction to choose, e.g.
expr : functionCall| arrayLookup| ID
functionCall : ID '(' ID ')'
arrayLookup : ID '(' expr ')'
Bison defaults to using the firstmatching rule in the file.
Variants of Bison
There are Bison variants availablefor many languages:
Language Tool
Java JavaCC, CUP
Haskell Happy
Python PLY
C# Grammatica
* ANTLR
Summary
Bison converts a contextfreegrammar to a parsing functioncalled yyparse().
yyparse() calls yylex() to obtainnext token, so easy to connectFlex and Bison.
Summary
Each grammar production mayhave an action.
Terminals and nonterminalshave semantic values.
Easy to construct abstractsyntax trees inside actionsusing semantic values.
Gives a declarative (high level)way to define parsers.