context free grammars - missouri state...

Post on 08-Sep-2021


Context Free Grammars
https://courses.missouristate.edu/anthonyclark/333/

Some notes adapted from Professor GianLuigi Ferrari at University of Pisa

Outline

Topics and Learning Objectives
• Formal grammar theory
• Context free grammars

Assessments
• Context free grammars

Picture from “Crafting Interpreters”

Interpreter

// GCD Program (in C)
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}

Source Code (Plain Text)

int main ( ) {
int i = getint ( ) , j = getint ( ) ;
while ( i != j ) {
if ( i > j ) i = i - j ;
else j = j - i ;
}
putint ( i ) ;
}

Lexemes/Tokens

Lexer/Scanner

Group characters into smallest meaningful units
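The grouping step can be sketched in a few lines of Python. This is only an illustration, not the course's lexer: the token pattern and the `tokenize` helper are assumptions made for the GCD example, and characters the pattern does not recognize are silently skipped rather than reported as errors.

```python
import re

# Hypothetical token pattern for the GCD example: identifiers/keywords,
# integer literals, two-character operators, then single-character symbols.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|!=|==|<=|>=|[-+*/=<>(){},;]")


def tokenize(source):
    """Return the lexemes in `source`, ignoring whitespace and // comments."""
    without_comments = re.sub(r"//[^\n]*", "", source)
    return TOKEN_PATTERN.findall(without_comments)


print(tokenize("while (i != j) { if (i > j) i = i - j; }"))
```

Two-character operators are listed before single characters so that `!=` is kept as one lexeme instead of being split.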

Parser

Put Tokens in a Tree-like data structure that represents semantics of program

Evaluate/Interpret

“Walk” the tree and evaluate the program given optional input from outside of the program

User input / File Input / Sockets / Etc.

Result
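As a rough sketch of the "walk the tree" step, the snippet below evaluates a calculator expression that has already been parsed. The nested-tuple tree shape `(operator, left, right)` and the `evaluate` function are assumptions chosen for brevity, not the representation used in the assignments.

```python
import operator

# Map each operator lexeme to the function that implements it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Recursively evaluate a tree of numbers and (op, left, right) tuples."""
    if isinstance(node, (int, float)):
        return node  # leaf: a number evaluates to itself
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


# Tree for (1 + 2) * 3; the parentheses are implied by the tree's shape.
tree = ("*", ("+", 1, 2), 3)
print(evaluate(tree))
```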

We’ll create our own simple calculator language

You’ve already started this

Regular Expressions and Finite State Machines

Self-evident

Formal Grammars and Push-Down Automata

Assignments 5 through 7

We’ll build our own representation

Tree-Walking is pretty straightforward

Assignment 8

Interpreter (diagram): components Lexer, Parser, Tree Walker, and I/O Console; arrows labeled read, request token, send token, request AST, and send AST.

Chomsky Hierarchy

• Type-0: Turing machine

• Type-1: Linear bounded automaton

• Type-2: Pushdown automaton

• Type-3: Finite state automaton

Scanning vs Parsing

• Regular expressions for regular languages recognized by lexers
• Context free grammars for context-free languages recognized by parsers

REs cannot “count”
• Cannot balance parentheses
• Cannot balance if then else expressions
• Etc.
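To make the "counting" point concrete, here is a minimal sketch of a balanced-parentheses check. The `is_balanced` name is hypothetical. The unbounded `depth` counter stands in for the stack of a pushdown automaton; a finite state machine has only a fixed number of states, so it cannot track arbitrarily deep nesting.

```python
def is_balanced(text):
    """Return True if every '(' in `text` has a matching ')'."""
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with nothing open
                return False
    return depth == 0


print(is_balanced("putint((i))"))
print(is_balanced("while (i != j"))
```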

Example from: Writing An Interpreter In Go

Lexemes/Tokens

Abstract Syntax Tree

No parentheses, semicolons, braces, etc.

We now care about syntax!

Grammar

A grammar is a tool for describing a language
A grammar is a set of rules (productions) for creating valid strings

grammar English;

sentence : subject verbPhrase object;
subject : 'This' | 'Computers' | 'I';
verbPhrase : adverb verb | verb;
adverb : 'never';
verb : 'is' | 'run' | 'am' | 'tell';
object : 'the' noun | 'a' noun | noun;
noun : 'university' | 'world' | 'cheese' | 'lies';

Generating Strings

We can use the grammar to generate strings
Start with the top-level rule: sentence
Replace each rule name with one alternative from its right-hand side (RHS) until only terminals remain
For example: This is a university
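The generation procedure can be sketched by encoding the English grammar as a Python dictionary and expanding every alternative. The `GRAMMAR` encoding and the `expand` helper are assumptions for illustration; exhaustive expansion only terminates here because this particular grammar has no recursive rules.

```python
from itertools import product

# Each rule name maps to its alternatives; each alternative is a symbol list.
GRAMMAR = {
    "sentence": [["subject", "verbPhrase", "object"]],
    "subject": [["This"], ["Computers"], ["I"]],
    "verbPhrase": [["adverb", "verb"], ["verb"]],
    "adverb": [["never"]],
    "verb": [["is"], ["run"], ["am"], ["tell"]],
    "object": [["the", "noun"], ["a", "noun"], ["noun"]],
    "noun": [["university"], ["world"], ["cheese"], ["lies"]],
}


def expand(symbol):
    """Return every list of terminals that can be derived from `symbol`."""
    if symbol not in GRAMMAR:  # terminals have no rule of their own
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = {" ".join(words) for words in expand("sentence")}
print(len(sentences))
print("This is a university" in sentences)
```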


Syntax vs Semantics

• You can also create syntactically valid strings that do not make sense semantically
• “Computers run cheese”
• “This am a lies”
• These are valid sentences, but they do not have any real meaning
• The same can be true for our programming language rules
• We’ll worry about semantics later.

def f():
    return "hi"

float y = f() + 5;

Error types

Invalid lexemes (all languages will catch this problem early)
var x = 5 @ "6" # '@' is not a valid operator in most languages

Valid lexemes, invalid syntax (all languages will catch this problem early)
x var= 5 5 * # all valid lexemes, but in the wrong order

Valid lexemes, valid syntax, invalid semantics (catch at compile time or runtime)
var x = 5 * "6" # many languages will not multiply an integer and a string

Valid lexemes, valid syntax, valid semantics
var x = 5 * 6
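The stages at which these errors surface can be observed in Python itself. This is a sketch with adapted examples, not the slide's own: Python accepts `@` as an operator and allows `5 * "6"`, so `$` and subtraction are used instead, and Python reports invalid lexemes and invalid syntax alike as `SyntaxError`. The `classify` helper is hypothetical.

```python
def classify(source):
    """Classify a one-line Python program by the stage at which it fails."""
    try:
        code = compile(source, "<example>", "exec")  # lexing and parsing
    except SyntaxError:
        return "lexical or syntax error"
    try:
        exec(code, {})  # run in a fresh, empty namespace
    except TypeError:
        return "semantic error"
    return "ok"


for program in ['x = 5 $ "6"', "x 5 = 5 *", 'x = 5 - "6"', "x = 5 * 6"]:
    print(f"{program!r}: {classify(program)}")
```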

Formal Grammars

Grammar: a set of rules for creating valid strings

Nonterminal: a grammar symbol that can be replaced by a sequence of symbols

Terminal: a word in the language (cannot be replaced with something else)

Production: a single rule in the grammar (X → Y1 Y2 …)

Derivation: a sequence of rule applications that produces a valid string

Start Symbol: the rule used to start all derivations

Null Symbol: the ε symbol is used to say that a nonterminal can be replaced with nothing

Example

Write a regular expression for the following regular language.
At least 1 zero followed by at least 1 one

Now write a grammar for the same language

0 0* 1 1*
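One way to check a candidate grammar against the regular expression is to run both over every short binary string. The grammar below is one possible answer and an assumption, not taken from the slides: S → 0A, A → 0A | 1B, B → 1B | ε. Because each rule has the shape "terminal then nonterminal", it can be followed like a state machine.

```python
import re

PATTERN = re.compile(r"00*11*")

# Candidate grammar: each nonterminal maps to (terminal, next nonterminal).
RULES = {
    "S": [("0", "A")],
    "A": [("0", "A"), ("1", "B")],
    "B": [("1", "B")],
}
ACCEPTING = {"B"}  # nonterminals that may be replaced with ε


def grammar_accepts(text):
    """Follow the rules one character at a time, starting from S."""
    state = "S"
    for ch in text:
        for terminal, next_state in RULES[state]:
            if terminal == ch:
                state = next_state
                break
        else:
            return False  # no rule of this nonterminal starts with ch
    return state in ACCEPTING


for sample in ["01", "0011", "10", "000"]:
    print(sample, bool(PATTERN.fullmatch(sample)), grammar_accepts(sample))
```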

Example

• Define a regular expression where you have n 0’s followed by n 1’s

• Define a context-free grammar where you have some number of 0’s followed by the same number of 1’s
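The first bullet has no regular-expression answer, because matching counts is exactly what REs cannot do. For the second bullet, one possible grammar is S → 0 S 1 | ε, assuming "some number" may be zero. A direct recursive sketch of that grammar follows; the `matches_s` name is hypothetical.

```python
def matches_s(text):
    """Recognize S -> 0 S 1 | ε, i.e. n zeros followed by n ones."""
    if text == "":
        return True  # S -> ε
    # S -> 0 S 1: peel one 0 from the front and one 1 from the back.
    return text[0] == "0" and text[-1] == "1" and matches_s(text[1:-1])


for sample in ["", "01", "000111", "0011100", "001"]:
    print(repr(sample), matches_s(sample))
```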

Activity

1. Write three strings that can be generated by the following grammar.

S → 1S | 0T | ε
T → 1T | 0S

What does this recognize?

S → 1S | 0T | ε
T → 1T | 0S

Grammar: rules for creating valid strings of a language

Nonterminal: can be replaced by a sequence of symbols

Terminal: a word in the language

Production: a single rule in the grammar

Derivation: a sequence of rule applications that produces a valid string

Start Symbol: the rule used to start all derivations

Null Symbol: the ε symbol, a nonterminal can be replaced with nothing

Language: set of all strings that can be derived from a grammar


S ⇒ 1S ⇒ 11S ⇒ 110T ⇒ 1101T ⇒ 11010S ⇒ 11010


• 1
• “”
• 0101
• 1111
• 100
• 00
• …
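Enumerating derivations mechanically suggests what the S/T grammar recognizes: binary strings with an even number of 0s (S means "even so far", T means "odd so far"). The sketch below generates every string up to a length bound; the `RULES` encoding and the `generate` helper are assumptions for illustration, and they rely on every intermediate form being terminals followed by a single nonterminal.

```python
# S -> 1S | 0T | ε   and   T -> 1T | 0S, with "" standing for ε.
RULES = {"S": ["1S", "0T", ""], "T": ["1T", "0S"]}


def generate(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    done, frontier = set(), {"S"}
    while frontier:
        next_frontier = set()
        for form in frontier:
            prefix, nonterminal = form[:-1], form[-1]
            for rhs in RULES[nonterminal]:
                new = prefix + rhs
                if new == "" or new[-1] not in RULES:
                    done.add(new)  # only terminals remain
                elif len(new) - 1 <= max_len:
                    next_frontier.add(new)  # still ends in a nonterminal
        frontier = next_frontier
    return done


strings = generate(4)
print(sorted(strings, key=lambda s: (len(s), s)))
print(all(s.count("0") % 2 == 0 for s in strings))
```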

2. Write a grammar for palindromes where your terminals are the symbols ‘a’ and ‘b’.

3. Write a grammar for representing all strings that start with x number of a’s, followed by y number of b’s, followed by z number of a’s, where y = x + z.
