syntax: 10/18/2015it 3271 semantics: describe the structures of programs describe the meaning of...

Post on 01-Jan-2016


Syntax: describe the structures of programs

04/20/23 IT 327 1

Semantics: describe the meaning of programs

Programming Languages (formal languages)

-- How to describe them?
-- How to use them? (machine and human)

Grammars -- Ambiguous (sometimes)
solution: using unambiguous only

Textbook, manuals -- Confusing (always)
solution: denotational semantics (for nuts only)

English Grammar

The man hit the ball.
(subject, verb, object)

The man saw the girl with a telescope.
(subject, verb, object)

The purpose of grammar:

To have a device to generate all valid sentences in the target language (from a root).

To tell whether a sentence is valid.

Chomsky (old-fashioned):

Noam Chomsky, 1928 -


http://www.canada.com/nationalpost/news/issuesideas/story.html?id=1385b76d-6c34-4c22-942a-18b71f2c4a44

Syntactic Structures (1957)

Generative Grammar

A valid sentence is generated from a root according to some fixed rules (grammar).

A generative grammar in Syntactic Structures

S → NP + VP
NP → T + N
VP → Verb + NP
T → the | a
N → man | ball | car | …
Verb → hit | take | took | run | ran | …

S is the root. S, NP, VP, T, N and Verb are non-terminal symbols. The words (the, a, man, ball, hit, ...) are terminal symbols.

Syntactic Structures

Tree for "the man hit the ball":

S
  NP
    T: the
    N: man
  VP
    Verb: hit
    NP
      T: the
      N: ball

Backus-Naur Form, BNF

Grammar 1

<S> ::= <NP> <VP>
<NP> ::= <T> <N>
<VP> ::= <V> <NP>
<T> ::= the
<N> ::= man | ball
<V> ::= hit | took

Grammar 2

<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat

Grammars 1 and 2 combined

<S> ::= <NP> <V> <NP> | <NP> <VP>
<NP> ::= <T> <N> | <A> <N>
<V> ::= loves | hates | eats | hit | took
<A> ::= a | the
<T> ::= the
<N> ::= dog | cat | rat | man | ball
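The first purpose of a grammar, generating every valid sentence from the root, can be tried directly on Grammar 1. The following is a minimal Python sketch and not part of the original slides: the dictionary encoding and the function name `generate` are illustrative assumptions, and the approach only terminates because Grammar 1 has no recursive rules.

```python
from itertools import product

# Grammar 1 from the slide, encoded as: non-terminal -> list of alternatives.
# Each alternative is a list of symbols. Any symbol that is not a key of the
# dictionary is treated as a terminal (this encoding is an assumption).
GRAMMAR_1 = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<T>", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<T>": [["the"]],
    "<N>": [["man"], ["ball"]],
    "<V>": [["hit"], ["took"]],
}


def generate(symbol, grammar):
    """Return every word list derivable from `symbol`.

    Only safe for grammars without recursion, where the language is finite.
    """
    if symbol not in grammar:
        return [[symbol]]  # a terminal derives only itself
    results = []
    for alternative in grammar[symbol]:
        # All expansions of each symbol in this alternative ...
        parts = [generate(s, grammar) for s in alternative]
        # ... combined in every possible way, in order.
        for combo in product(*parts):
            results.append([word for piece in combo for word in piece])
    return results


sentences = [" ".join(words) for words in generate("<S>", GRAMMAR_1)]
for sentence in sentences:
    print(sentence)
```

Grammar 1 yields two noun phrases and two verbs, so the loop prints a small finite list that includes "the man hit the ball". A grammar with recursive rules, such as the arithmetic expression grammar later in the deck, generates infinitely many sentences and would need a depth limit.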

Derivation: the sequence of steps that generates a sentence

Grammar 1

<S> ::= <NP> <VP>
<NP> ::= <T> <N>
<VP> ::= <V> <NP>
<T> ::= the
<N> ::= man | ball
<V> ::= hit | took

Derivation of "the man hit the ball":

<S>
⇒ <NP> <VP>
⇒ <T> <N> <VP>
⇒ the <N> <VP>
⇒ the man <VP>
⇒ the man <V> <NP>
⇒ the man hit <NP>
⇒ the man hit <T> <N>
⇒ the man hit the <N>
⇒ the man hit the ball
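A derivation like this one can be searched for mechanically by always expanding the leftmost non-terminal. The sketch below is an illustration in Python rather than anything from the slides. The name `leftmost_derivation` and the dictionary encoding are assumptions, and the pruning relies on Grammar 1 having no empty and no recursive productions.

```python
# Grammar 1, encoded as: non-terminal -> list of alternatives (an assumption).
GRAMMAR_1 = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["<T>", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<T>": [["the"]],
    "<N>": [["man"], ["ball"]],
    "<V>": [["hit"], ["took"]],
}


def leftmost_derivation(target, grammar, start="<S>"):
    """Return the list of sentential forms from `start` to `target`, or None."""
    goal = target.split()

    def expand(form):
        # Position of the leftmost non-terminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            return [form] if form == goal else None
        # Prune: the terminals already produced must be a prefix of the goal,
        # and (with no empty productions) a form can never shrink.
        if len(form) > len(goal) or form[:idx] != goal[:idx]:
            return None
        for alternative in grammar[form[idx]]:
            rest = expand(form[:idx] + alternative + form[idx + 1:])
            if rest is not None:
                return [form] + rest
        return None

    return expand([start])


steps = leftmost_derivation("the man hit the ball", GRAMMAR_1)
for step in steps:
    print(" ".join(step))
```

For a string the grammar cannot generate, such as "the hit man the ball", the function returns `None`, which corresponds to the second purpose of a grammar: telling whether a sentence is valid.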


Parse: v. To break (a sentence) down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.

(American Heritage Dict.)

the dog loves the cat

the loves dog the cat ×

loves the dog the cat ×

A Parse Tree

<S>
  <NP>
    <A>: the
    <N>: dog
  <V>: loves
  <NP>
    <A>: the
    <N>: cat

Grammar

<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat

“the loves dog the cat” doesn’t have a parse tree
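Building the parse tree can also be done by a program. What follows is a small backtracking top-down parser in Python, offered as a sketch rather than as the method the course uses. The names `parse` and `parse_tree` and the tuple representation of trees are assumptions, and this style of parser only works for grammars without left recursion, which holds for this grammar.

```python
# The grammar from the slide, encoded as: non-terminal -> list of alternatives.
GRAMMAR_2 = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["loves"], ["hates"], ["eats"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}


def parse(symbol, words, pos, grammar):
    """Yield (tree, next_pos) for each way `symbol` matches words from `pos`."""
    if symbol not in grammar:
        # Terminal: it must equal the next word.
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in grammar[symbol]:
        # Each partial match is (children so far, position reached).
        partial = [([], pos)]
        for s in alternative:
            partial = [
                (kids + [tree], nxt)
                for kids, p in partial
                for tree, nxt in parse(s, words, p, grammar)
            ]
        for kids, end in partial:
            yield (symbol, kids), end


def parse_tree(sentence, grammar, start="<S>"):
    """Return a parse tree covering the whole sentence, or None."""
    words = sentence.split()
    for tree, end in parse(start, words, 0, grammar):
        if end == len(words):
            return tree
    return None


print(parse_tree("the dog loves the cat", GRAMMAR_2))
print(parse_tree("the loves dog the cat", GRAMMAR_2))
```

The first call prints a nested structure with the same shape as the tree on the slide, and the second prints `None` because no parse tree exists.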

A grammar for Arithmetic Expression

<exp> ::= <exp> + <exp>
        | <exp> * <exp>
        | ( <exp> )
        | a | b | c

Example: ((a+b)*c)

Is this expression valid?

<exp>
⇒ ( <exp> )
⇒ ( <exp> * <exp> )
⇒ (( <exp> ) * <exp> )
⇒ (( <exp> + <exp> ) * <exp> )
⇒ ((a + <exp> ) * <exp> )
⇒ ((a+b) * <exp> )
⇒ ((a+b)*c)

Yes

A Parse Tree for ((a+b)*c)

<exp>
  (
  <exp>
    <exp>
      (
      <exp>
        <exp>: a
        +
        <exp>: b
      )
    *
    <exp>: c
  )

Parse Trees for a+b*c

Tree 1, grouping as (a+b)*c:

<exp>
  <exp>
    <exp>: a
    +
    <exp>: b
  *
  <exp>: c

Tree 2, grouping as a+(b*c):

<exp>
  <exp>: a
  +
  <exp>
    <exp>: b
    *
    <exp>: c

What is the meaning of a+b*c?
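The ambiguity can be checked by counting how many distinct parse trees the expression grammar allows for a given string. The Python sketch below is an illustration, not part of the slides. The function name `count_parse_trees` is an assumption, and the code assumes every token is a single character, which is true for this grammar.

```python
from functools import lru_cache


def count_parse_trees(text):
    """Count parse trees of `text` under
    <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
    """
    tokens = tuple(text.replace(" ", ""))  # one character per token

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from <exp>.
        if j - i == 1:
            return 1 if tokens[i] in ("a", "b", "c") else 0
        total = 0
        # Alternative: ( <exp> )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # Alternatives: <exp> + <exp> and <exp> * <exp>,
        # trying every operator position k as the top-level operator.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("a+b*c"))
print(count_parse_trees("((a+b)*c)"))
```

A count above 1 means the grammar is ambiguous for that string. The fully parenthesized expression has exactly one tree, while a+b*c has the two trees shown on the slide.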

Restrictions on Grammars

Diagram in terms of the sizes of the set of restrictions, from fewest restrictions to most:

Unrestricted Grammars (type-0)
Context Sensitive (type-1)
Context Free (type-2)
Right/Left Linear Grammars (type-3)

Why do context-sensitive grammars have fewer restrictions than context-free grammars?

Chomsky Hierarchy

Diagram in terms of the sizes of the language families, from smallest to largest:

Regular Expressions (type-3)
Context-free languages (type-2)
Context-sensitive languages (type-1)
Computable (formal) languages (type-0)

Grammars in BNF (Backus-Naur Form)

<S> ::= <NP> <VP>
<NP> ::= <T> <N>
<VP> ::= <V> <NP>
<T> ::= the
<N> ::= man | ball
<V> ::= hit | took

• A BNF grammar consists of four parts:
– The finite set of tokens (terminal symbols)
– The finite set of non-terminal symbols
– The start symbol
– The finite set of production rules

Constructing Grammars

• Using divide and conquer to simplify the job.
• Data types, variable names (identifiers)
• One variable, one type (enforcing this is not the grammar's job)

Examples of <var-dec>:

float a;
boolean a, b, c;
int a, b;

Here float, boolean and int are primitive type names.

Using divide and conquer


<var-dec> ::= <type-name> <declarator-list> ;

<type-name> ::= boolean | byte | short | int | long | char | float | double

<declarator-list> ::= <declarator> | <declarator> , <declarator-list>

<declarator> ::= <variable-name> | <variable-name> = <expr>
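Each rule of the <var-dec> grammar maps naturally to one function, which is the divide-and-conquer idea applied to a recognizer. The Python sketch below is illustrative only. The slides do not define <variable-name> or <expr>, so the code assumes a variable name is a C-style identifier and that <expr> is a single number or variable name. It also expects the input already split into tokens.

```python
import re

TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}
IDENT = re.compile(r"[A-Za-z_]\w*")   # ASSUMPTION: shape of <variable-name>
NUMBER = re.compile(r"\d+(\.\d+)?")   # ASSUMPTION: numeric part of <expr>


def is_variable_name(tok):
    return IDENT.fullmatch(tok) is not None and tok not in TYPE_NAMES


def match_expr(tokens, pos):
    # ASSUMPTION: <expr> is one number or one variable name.
    if pos < len(tokens) and (
        NUMBER.fullmatch(tokens[pos]) or is_variable_name(tokens[pos])
    ):
        return pos + 1
    return None


def match_declarator(tokens, pos):
    # <declarator> ::= <variable-name> | <variable-name> = <expr>
    if pos >= len(tokens) or not is_variable_name(tokens[pos]):
        return None
    pos += 1
    if pos < len(tokens) and tokens[pos] == "=":
        return match_expr(tokens, pos + 1)
    return pos


def match_declarator_list(tokens, pos):
    # <declarator-list> ::= <declarator> | <declarator> , <declarator-list>
    pos = match_declarator(tokens, pos)
    if pos is None:
        return None
    if pos < len(tokens) and tokens[pos] == ",":
        return match_declarator_list(tokens, pos + 1)
    return pos


def is_var_dec(tokens):
    # <var-dec> ::= <type-name> <declarator-list> ;
    if not tokens or tokens[0] not in TYPE_NAMES:
        return False
    pos = match_declarator_list(tokens, 1)
    return pos is not None and pos == len(tokens) - 1 and tokens[pos] == ";"


print(is_var_dec("boolean a , b , c ;".split()))
print(is_var_dec("int a b ;".split()))
```

Note that `is_var_dec("int a , a ;".split())` is accepted: checking that a variable is declared only once is not the grammar's job.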

Tokens:

• How is such a program file (a sequence of characters) divided into a sequence of tokens?

• Programs stored in files are just sequences of characters, but we want to divide them into tokens before further analysis.

Tokens are atoms of the program, e.g.

• identifiers (const, x, fact)
• keywords (if, const): reserved words
• operators (==)
• constants (123.4), etc.

Lexical Structure And Phrase Structure

• Grammars so far have defined phrase structure: how a program is built from a sequence of tokens

• We also need to define lexical structure: how a text file is divided into tokens
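Lexical structure is usually described with regular expressions, one per token class. The following scanner is a minimal Python sketch, not taken from the slides. The keyword set, the operator list and the name `scan` are assumptions chosen to cover the token examples mentioned earlier (identifiers, keywords, ==, 123.4).

```python
import re

KEYWORDS = {"if", "const"}  # ASSUMPTION: a tiny set of reserved words

# One named group per token class. Order matters: "==" is tried before "=".
TOKEN_RE = re.compile(
    r"""
    (?P<constant>\d+(?:\.\d+)?)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<operator>==|[=+\-*/<>();])
  | (?P<space>\s+)
    """,
    re.VERBOSE,
)


def scan(text):
    """Divide a string of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"  # reserved words look like identifiers
        if kind != "space":   # white space separates tokens but is not one
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


print(scan("if x == 123.4"))
```

The output of a scanner like this is the token sequence that a phrase-structure grammar then works on.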


Separate Grammars

• Usually there are two separate grammars
– to construct a sequence of tokens from a file of characters (Lexical Structure)
– to construct a parse tree from a sequence of tokens (Phrase Structure)

<program-file> ::= <end-of-file> | <element> <program-file>
<element> ::= <token> | <one-white-space> | <comment>
<one-white-space> ::= <space> | <tab> | <end-of-line>
<token> ::= <identifier> | <operator> | <constant> | …

Separate Compiler Passes

• Scanner: string → tokens
• Parser: tokens → parse tree

• (more to do afterwards)


Historical Note #1

• Early languages sometimes did not separate lexical structure from phrase structure

– Early Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keyword

– Other languages, such as PL/I and early Fortran, allow keywords to be used as identifiers

This makes them difficult to scan and parse

It also reduces readability


Historical Note #2

• Some languages have a fixed-format lexical structure -- column positions are significant

– One statement per line (i.e. per card)

– First few columns for statement label

– Etc.

• Early dialects of Fortran, Cobol, and Basic

• Almost all modern languages are free-format: column positions are ignored


Other Grammar Forms

• BNF variations

• EBNF variations

• Syntax diagrams


BNF Variations

• Some use → or = instead of ::=

• Some leave out the angle brackets and use a distinct typeface for tokens

• Some allow single quotes around tokens, for example to distinguish ‘|’ as a token from | as a meta-symbol


Sir, please Step away from the ASR-33

Interesting operator!! Or not!

EBNF Variations

• Additional syntax to simplify some grammar chores:

– {x} to mean zero or more repetitions of x

– [x] to mean x is optional (i.e. x | <empty>)

– () for grouping

– | anywhere to mean a choice among alternatives

– Quotes around tokens, if necessary, to distinguish from meta-symbols


EBNF Examples

• Anything that extends BNF this way is called an Extended BNF: EBNF

• There are many variations


<stmt-list> ::= {<stmt> ;}

<if-stmt> ::= if <expr> then <stmt> [else <stmt>]

<thing-list> ::= { (<stmt> | <declaration>) ;}

Syntax Diagrams

• Syntax diagrams (“railroad diagrams”)

[Syntax diagram for if-stmt: if → expr → then → stmt → else → stmt]

<if-stmt> ::= if <expr> then <stmt> else <stmt>

Bypasses

[Syntax diagram for if-stmt: if → expr → then → stmt, with a bypass around the optional else → stmt]

<if-stmt> ::= if <expr> then <stmt> [else <stmt>]

Branching

[Syntax diagram for exp, with one branch per alternative: exp + exp, exp * exp, ( exp ), a, b, c]

<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c

Loops

<exp> ::= <addend> {+ <addend>}

[Syntax diagram for exp: an addend, with a loop back through + to another addend]
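The loop in the diagram, like the EBNF braces, translates directly into a `while` loop in a hand-written parser. The Python sketch below shows this for the rule above. It is an illustration under assumptions: the slide does not define <addend>, so the code assumes <addend> ::= a | b | c, and the name `parse_exp` and the tuple tree shape are hypothetical.

```python
def parse_exp(tokens):
    """Parse <exp> ::= <addend> {+ <addend>} and return a left-nested tree.

    ASSUMPTION: <addend> ::= a | b | c (not defined on the slide).
    """
    pos = 0

    def addend():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("a", "b", "c"):
            tok = tokens[pos]
            pos += 1
            return tok
        raise SyntaxError(f"expected an addend at position {pos}")

    tree = addend()
    # {+ <addend>}: zero or more repetitions become a loop.
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("+", tree, addend())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree


print(parse_exp(list("a+b+c")))
```

Because the loop folds each new addend into the tree built so far, a+b+c comes out grouped as (a+b)+c, so this form of the rule avoids the ambiguity of <exp> ::= <exp> + <exp>.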

Syntax Diagrams, Pro and Con

• Easier for humans to read (follow)

• Difficult to perceive the phrase structures (syntax tree)?

• Harder for machines to read (for automatic parser-generators)

Conclusion

• We use grammars to define programming language syntax, both lexical structure and phrase structure
• Connection between theory and practice
– Two grammars, two compiler passes
– Parser-generators can produce code for those two passes automatically from grammars (compiler tools)
