Allyson M. Hoss,
January 28, 2008
CSC 7101 Programming Language Structures
Spring 2008
Louisiana State University
CSC 7101 Programming Language Structures
• Research Assignment
• Miscellaneous Issues
• PL Design Goals
• Syntax & Semantics
• Attribute Grammars
Topic & References
• Sources - published papers (incl. IEEE / ACM)
(not: class websites; blogs; advertisements)
• Limit sources from Wikipedia
• Trouble finding or accessing papers?
• Do your own research!
• Your research is YOURS!
• Example Topic Description
Example of a Good Topic Description
Topic: Parallel Programming Languages
Focus: Usability on today's multi-core and
future multi-core / multi-processor systems
Approach: Compare & contrast programming languages,
strengths and weaknesses w.r.t. multi-core processors
Focus: Determine if current Software Development Toolkits
built on top of existing languages are better suited
and easier to use
Systems: IBM's X10 language, Cray Inc's Chapel,
possibly Sun's Fortress language.
Toolkits: OpenMP and Intel's Threading Building Blocks
Guidelines for Outline
• Topic sentence(s)
• Focus (narrowing the topic)
• Approach (your review will take)
• Address PL design goals of your topic
• Include a comparative analysis table/diagram of related research organized based on your approach
• Address open issues / research directions
• Review syntax briefly … focus on semantics
• Minimal review of HW
Teaching Assistant
• John W. Burris
• Office Hours
Monday 11:00 AM - 12:30 PM
Tuesday 11:00 AM - 1:00 PM
• Thursday by appointment
(with at least 12 hours notice)
• Coates 162
Website
My main home page will be:
http://www.csc.lsu.edu/~hoss/index.html
Additional Reading Material
Slonneger, K. and Kurtz, B., Formal Syntax and Semantics of Programming Languages: A Laboratory-Based Approach,
Addison-Wesley, Reading, MA, 1995.
ISBN: 0-201-65697-3
http://www.cs.uiowa.edu/~slonnegr/plf/Book/
Please Read:
Chapter 1, pp. 1-8; 21-29
Chapter 3, pp. 59-71
Design Questions
• What design decisions make each language different from the others?
• Are these differences a result of minor syntactic rules or important underlying semantic issues?
• Is a controversial design decision necessary to make the language appropriate for its intended use or was the design decision an accident?
Design Questions
• Could different design decisions result in a language with more strengths and fewer weaknesses?
• Are the good parts of different languages mutually exclusive or could they be efficiently combined?
• Can a language be extended to compensate for its weaknesses?
Design Goals
What do you think are some design goals?
Design Goals
• Initially: time (execution) vs space (memory)
• Next: simplicity, expressiveness, generality
• Then: reliability, maintainability, efficiency
Design Goals
What do you think are the current goals today?
Design Goals NOW
• Simplicity : easy to learn, use, understand
• Robustness: security, safety
(strongly typed; restricts ptrs)
• Portability: architectures
(run-time bytecode interpreters)
• Internet Compatibility: access SW anywhere
(class libraries)
• Concurrency: multi-interaction
(multi-threading & conc.primitives)
Programming Language Paradigms
• Procedural or Imperative
• Functional
• Declarative
• Object-oriented
• Rule-based, Event-driven, Parallel or Concurrent, Scripting, Markup, Specification, Assembly, Visual, …
Procedural or Imperative Paradigm
• Computation based on commands such as
do this, do that, do the next thing
• Variables represent memory locations
• Assignment statements store values
• Destructive assignment
• Uses iteration for repetition
• Acts on stored data, modifies system state
• Assembly, Fortran, COBOL, C, C++, Java
Functional Paradigm
• View programs as function definitions and sets of expressions
• Computation based on math functions: “black boxes” that accept inputs and return outputs
• Apply functions to arguments
• Minimal use of variables or assignment statements
• No extraneous side effects
• Natural recursion (primary form of repetition)
• Lisp, Scheme, ML
Declarative (Logic) Paradigm
• Not procedural: commands describe what is and not how to
• Computation using symbolic logic, facts, and rules
• Typically interpreted
• Looping via recursion
• Prolog
Object-Oriented Paradigm
• Data abstraction: objects & methods
• Information hiding
• Classes, class hierarchy, instances of classes
• Computation via interaction of objects
• Inheritance
• Smalltalk
• OO features incorporated into most modern PLs: C++, C#, Java
Defining a PL
• Language: set of strings (possibly infinite) of symbols from a finite alphabet
• Language Specification:
• Syntax:
arrangement of symbols
well-formedness (not ambiguous; values well defined)
grammar
• Semantics: meaning of syntactically valid strings
relationship between input and output
steps of program execution
rules for legal programs that syntax often can not describe
• Pragmatics: extra information
Usage of the language (ease of use, efficiency)
Features of the implementation (optimization)
Syntax & Semantics
• Syntax
how does a program “look”
form and structure of language constructs
(programs, procedures, statements, …)
• Semantics
what do the language constructs “do”
meaning (behavior) of the syntactic units
(what does an “if” statement “do”)
Syntax & Semantics
• Syntax
grammar of a natural language statement
a set of rules to define a language
• Semantics
meaning of a natural language statement
English Grammar
<sentence> ::= <noun phrase> <verb phrase>
A sentence is a noun phrase
followed by a verb phrase
::= can be read
“is defined to be” or
“is composed of”
also written →
English Grammar
<sentence> ::= <noun phrase> <verb phrase>
<noun phrase> ::= <determiner> <noun>
| <determiner> <noun> <prepositional phrase>
<verb phrase> ::= <verb> | <verb> <noun phrase>
| <verb> <noun phrase> <prepositional phrase>
<prepositional phrase> ::= <preposition> <noun phrase>
<noun> ::= boy | dog | leash | ball
<determiner> ::= a | the
<verb> ::= walked | threw
<preposition> ::= with | to
::= can be read “is defined to be” or “may be composed of”; also written →
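The grammar above can be tried out with a small recognizer. This is a minimal sketch, not part of the original slides: the names `GRAMMAR`, `ends`, and `is_sentence` are made up for illustration, and the final period shown in the parse-tree figure is left out because the BNF does not define it.

```python
# Toy English grammar from the slide, one list of alternatives per nonterminal.
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["determiner", "noun"],
                    ["determiner", "noun", "prepositional phrase"]],
    "verb phrase": [["verb"],
                    ["verb", "noun phrase"],
                    ["verb", "noun phrase", "prepositional phrase"]],
    "prepositional phrase": [["preposition", "noun phrase"]],
    "noun": [["boy"], ["dog"], ["leash"], ["ball"]],
    "determiner": [["a"], ["the"]],
    "verb": [["walked"], ["threw"]],
    "preposition": [["with"], ["to"]],
}


def ends(symbol, words, start):
    """Return every position where `symbol` can end when matched from `start`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    result = set()
    for alternative in GRAMMAR[symbol]:
        positions = {start}
        for part in alternative:
            positions = {e for p in positions for e in ends(part, words, p)}
        result |= positions
    return result


def is_sentence(text):
    """True if the whole word sequence can be derived from <sentence>."""
    words = text.split()
    return len(words) in ends("sentence", words, 0)
```

Both "the boy walked the dog with a leash" and "the dog threw the boy to the ball" are accepted, since the grammar checks form only and says nothing about whether a sentence makes sense.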
Parse Tree
[Figure: parse tree for “the boy walked the dog with a leash .” The root <sentence> has children <noun phrase>, <verb phrase>, and <.>. The subject <noun phrase> is <det> <noun> (the boy). The <verb phrase> is <verb> (walked), <noun phrase> (<det> <noun>: the dog), and <prep phrase> (<prep>: with, followed by <noun phrase> <det> <noun>: a leash).]
Parse Tree alone can not validate semantics
[Figure: parse tree with the same shape for “the dog threw the boy to the ball .” The subject <noun phrase> is the dog; the <verb phrase> is <verb> (threw), <noun phrase> (the boy), and <prep phrase> (to the ball). The sentence is syntactically valid although its meaning is questionable.]
FORMAL SYNTAX
BNF (Backus-Naur Form)
FORMAL SEMANTICS
Static: attribute grammars
Dynamic: operational, axiomatic, denotational
Formal Syntax
• Formal Translation Models
• Grammar – a formal definition of syntax
• Types of Grammars (0..3)
• BNF
Formal Languages
• Language: set of strings containing symbols from an alphabet
• What strings can you form over the alphabet {a, b}?
1.{abbb}
2.{baa, baaa, baaaa, baaaaa, . . . }
3.{ab, aabb, aaabbb, aaaabbbb, aaaaabbbbb, . . . }
4.{aba, abba, abbba, …}
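Each of the four example sets can be described by a membership test. The sketch below is illustrative only; the function names are invented here, and the patterns are one reading of the listed strings (for example, set 2 is taken to be b followed by two or more a's).

```python
import re


def in_l1(s):
    """Set 1: the single string abbb."""
    return s == "abbb"


def in_l2(s):
    """Set 2: b followed by two or more a's."""
    return re.fullmatch(r"ba{2,}", s) is not None


def in_l3(s):
    """Set 3: n a's followed by n b's, n >= 1 (needs counting, not a plain regex)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def in_l4(s):
    """Set 4: a, then one or more b's, then a."""
    return re.fullmatch(r"ab+a", s) is not None
```

Sets 1, 2, and 4 can be matched by regular expressions; set 3 requires the counts of a's and b's to agree, which is why it is checked by construction instead.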
Formal Languages
• Definition of a formal language: a model that generates & recognizes all (and only) the strings of the language
• A programming language grammar is used to parse a program, producing a parse tree
• The parse tree contains every symbol in the input program as well as all sets of symbols used in the program's derivation
• Grammars play an important role in the design and implementation of programming languages
Grammar
• Required to define a formal language
• Alphabet: finite set Σ of symbols
• String: finite sequence of symbols
Empty string: ε
Σ* - set of all strings over Σ (incl. ε)
Σ+ - set of all non-empty strings over Σ
• Language: set of strings L ⊆ Σ*
• Set of Rules to determine legal strings
Grammars
• G = (T, N, S, P)
• Finite set of terminal symbols T
• Finite set of non-terminal symbols N
• Starting non-terminal symbol S ∈ N
• Finite set of productions P
x → y (x ::= y)
x ∈ (N ∪ T)+, y ∈ (N ∪ T)*
• Applying a production x → y: uxv ⇒ uyv
Grammars
G = (T, N, S, P)
terminal symbols: symbols (words) of an alphabet (word set) from which strings of the language can be created; {a, b, c...}
nonterminal symbols: symbols describing sets of strings (syntactic categories); {A, B, C...}
start symbol S: marks starting point for string derivations; unique in the grammar
productions: rules describing how each nonterminal is defined in terms of terminal symbols and nonterminals; ordered pairs of strings (x, y) such that x → y (x ::= y)
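As a rough illustration of the four-tuple, a grammar can be held in a small record and a production applied by string rewriting. The class `Grammar`, the function `apply_production`, and the sample grammar (S → Ab, A → a | Aa, with single-character symbols) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S, a member of N
    productions: tuple        # P, pairs (x, y) meaning x -> y


def apply_production(form, production, position):
    """Rewrite u x v into u y v, where x starts at `position` in `form`."""
    x, y = production
    if form[position:position + len(x)] != x:
        raise ValueError(f"{x!r} does not occur at position {position} in {form!r}")
    return form[:position] + y + form[position + len(x):]


# Hypothetical example grammar with single-character symbols.
G = Grammar(
    terminals=frozenset("ab"),
    nonterminals=frozenset("SA"),
    start="S",
    productions=(("S", "Ab"), ("A", "a"), ("A", "Aa")),
)

step1 = apply_production(G.start, G.productions[0], 0)  # S   => Ab
step2 = apply_production(step1, G.productions[2], 0)    # Ab  => Aab
step3 = apply_production(step2, G.productions[1], 0)    # Aab => aab
```

Each call performs one derivation step; the final form contains only terminals, so it is a string of the language.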
Grammar
• String derivation – sequence of rule application
• w1 ⇒ w2 ⇒ … ⇒ wn; denoted w1 ⇒* wn
• Language generated by a grammar
• L(G) = { w ∈ T* | S ⇒* w }
• Traditional classification
• Regular
• Context-free
• Context-sensitive
• Unrestricted
Chomsky Hierarchy
Type 0: Recursively Enumerable Languages; most unrestricted; recognized by Turing machine
Type 1: Context-Sensitive Languages; recognized by linear-bounded automata
Type 2: Context-Free Languages; most PL; recognized by push-down automata
Type 3: Regular Languages; most restricted; recognized by finite automata
Regular Languages (Type 3)
• Most restricted
• LHS is a single non-terminal
• RHS has exactly one terminal and at most one nonterminal
• All productions are A → wB and A → w,
with A, B ∈ N and w ∈ T*
• Or all productions are A → Bw and A → w
Regular Languages Examples (Type 3)
• L = { a^n b | n > 0 } is a regular language
S → Ab and A → a | Aa
• What are the strings that can be generated using this grammar?
{ab, aab, aaab, …}
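The listed strings can be reproduced by exhaustively applying the productions. The following is a minimal sketch, assuming single-character symbols and a hypothetical helper `generate` that explores sentential forms breadth-first up to a length bound.

```python
from collections import deque

PRODUCTIONS = {"S": ["Ab"], "A": ["a", "Aa"]}  # S -> Ab, A -> a | Aa


def generate(max_length):
    """Collect every terminal string of length <= max_length derivable from S."""
    found = set()
    seen = {"S"}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if there is one.
        index = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if index is None:
            found.add(form)  # only terminals left: a string of the language
            continue
        for rhs in PRODUCTIONS[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # No production shrinks a form, so overly long forms can be dropped.
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=len)
```

With a bound of 4 this yields ab, aab, and aaab, matching the start of the set shown above.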
Regular Languages Examples (Type 3)
• Binary numerals
B → 0
B → 1
B → 0 B
B → 1 B
Uses of Regular Grammar
• Lexical analysis in compilers
e.g. identifier = letter (letter|digit)*
Token sequence for syntactic analysis done by parser
tokens = terminals for CFG
• Pattern matching
grep "a\+b" foo.txt
Prints every line of foo.txt that contains a string from the
language L = { a^n b | n > 0 }, i.e. the language for reg. expr. a+b
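Both uses can be imitated with Python's `re` module. This is only a sketch: the helper names are invented, "letter" is assumed to mean an ASCII letter, and `grep` here is a stand-in function over a list of lines, not the command-line tool.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # letter (letter|digit)*
A_PLUS_B = re.compile(r"a+b")                      # a^n b with n > 0


def is_identifier(text):
    """True if the whole text is a single identifier token."""
    return IDENTIFIER.fullmatch(text) is not None


def grep(pattern, lines):
    """Keep the lines that contain a match anywhere, as grep does."""
    return [line for line in lines if pattern.search(line)]


matches = grep(A_PLUS_B, ["xaab", "ba", "b", "cab!"])
```

Note the difference between `fullmatch`, which a lexer-style check needs, and `search`, which mirrors grep's "line contains a match" behaviour.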
Context-Free Languages (Type 2)
• LHS must be a single nonterminal ;
• All productions are xAy → xzy
or xAy → xZy,
with A, Z ∈ N, z ∈ T*, and x, y = ε (the empty string)
• A can be rewritten by the strings z or Z on the right regardless of the context in which A finds itself
• A → z, A → Z, Z → z
Context-Free Languages (Type 2)
Example:
L1 = { a^n b^n | n > 0 } is c.f. but not regular
L2 = { a^x b^y | x > 0, y > 0 }
What are the strings that belong to the languages L1 and L2?
L1: {ab, aabb, aaabbb, …}
L2: {ab, aab, aaab, …, abb, abbb, abbbb, …, aabb, aabbb, …}
Context-Free Languages (Type 2)
S → AB
S → ASB
A → a
B → b
S ⇒ AB ⇒ aB ⇒ ab
S ⇒ ASB ⇒ aSB ⇒ aABB ⇒ aaBB ⇒ aabB ⇒ aabb
This grammar can be simplified by removing the nonterminals A and B, leaving just two rewrite rules:
S → ab
S → aSb
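The simplified rules make the derivation of any string a^n b^n mechanical: apply S → aSb n-1 times, then S → ab once. The function `derive` below is a hypothetical helper written for this sketch.

```python
def derive(n):
    """Return the sentential forms in the derivation of a^n b^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S -> aSb
    forms.append(forms[-1].replace("S", "ab"))       # finish with S -> ab
    return forms
```

For example, `derive(3)` lists the forms S, aSb, aaSbb, aaabbb.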
Uses of Context-Free Languages
• Describe the essential features of all current PLs
• Syntax of a programming language
• e.g. Java
• Terminals: identifiers, keywords, literals, separators, operators
• Starting non-terminal: CompilationUnit
• Implementation of most parsers in a compiler to determine syntactic structure and produce CFG parse trees
• Backus-Naur Form (BNF) : alternative notation for context-free grammars; John Backus and Peter Naur, for ALGOL60
Limitations of Context-Free Languages
• Cannot represent semantics
• e.g. “every variable used in a statement should be declared in advance”
• e.g. “the use of a variable should conform to its type” (type checking)
• cannot say “string s1 divided by string s2”
• Solution: attribute grammars
For certain kinds of semantic analysis
Context-Sensitive Languages (Type 1)
• RHS contains no fewer symbols than LHS
• All productions are xAy → xzy,
with A ∈ N, x, y, z ∈ T*, and z ≠ Ø
• A can be rewritten by z only when it is in the context of x and y
(when the string x precedes A and the string y follows it)
• Example Rule
ABC → AbbC
Context-Sensitive Languages (Type 1)
• Example language
L = { a^n b^n c^n | n >= 1 }
• More powerful than context-free grammars
• All context-free languages are also context-sensitive
• Not all context-sensitive languages are context-free
Recursively Enumerable Languages (Type 0)
• No restrictions: most general grammar (linguists find useless)
aYb → bY, with Y ∈ N and a, b ∈ T*
• Language accepted by a Turing machine, a general model of computation
(a finite-state machine in which each transition prints a symbol on a tape; the tape head can move in either direction; the tape is infinite to the right)
• models a human being solving a problem in an algorithmic way
Using Grammars
How do we represent graphically a sequence of productions from a formal grammar?
Derivation Tree
[Figure: derivation tree for “the boy walked the dog with a leash .” The root <sentence> has children <noun phrase> (the boy), <verb phrase>, and <.>; the <verb phrase> has children <verb> (walked), <noun phrase> (the dog), and <prep phrase> (with a leash).]
Derivation Tree
• Derivation tree = parse tree
Leaf nodes: terminals
Inner nodes: non-terminals
Root: starting non-terminal of the grammar
• Describes a particular way to derive a string
Leaf nodes from left to right are the string
To get the string: depth-first traversal, following the leftmost unexplored branch
• Begins with the start symbol and replaces one nonterminal at a time by its corresponding right-hand side in some production for that nonterminal
Derivation Tree
• Types of Parsing:
Top-down parsing
Bottom-up parsing
Depth-first left-corner parsing
Derivation Tree
• Top-down parsing : starts from the start symbol (S) and works down to the leaves
+ Only builds trees that are rooted in S
− Wastes time building trees that don’t match the input
Derivation Tree
• Bottom-up parsing : starts from the leaves and works up to the start symbol (S)
+ Only builds trees that match the input
− Wastes time building trees that will never lead to S
Derivation Tree
• Depth-first left-corner parsing : combines the best of both types of parsing:
+ Only build trees that are rooted in S
+ Only build trees that match the input
Derivation Sequence
•Each tree represents a set of derivation sequences
•The tree “filters out” the choice of order of production application
•Filtering out the order
•Leftmost derivation: expand leftmost non-terminal
•Rightmost derivation: expand rightmost non-terminal
•A derivation may be neither leftmost nor rightmost
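The difference between the two orders can be seen on a tiny grammar. This sketch assumes the productions S → AB, A → a, B → b (one production per nonterminal, so the only choice left is which nonterminal to expand); the names `RULES` and `derivation` are illustrative.

```python
RULES = {"S": "AB", "A": "a", "B": "b"}  # one production per nonterminal


def derivation(start, leftmost=True):
    """List the sentential forms when always expanding the leftmost
    (or rightmost) nonterminal."""
    forms = [start]
    form = start
    while any(ch in RULES for ch in form):
        indices = [i for i, ch in enumerate(form) if ch in RULES]
        i = indices[0] if leftmost else indices[-1]
        form = form[:i] + RULES[form[i]] + form[i + 1:]
        forms.append(form)
    return forms


left = derivation("S")                   # S, AB, aB, ab
right = derivation("S", leftmost=False)  # S, AB, Ab, ab
```

Both sequences correspond to the same tree; they differ only in the order in which the productions are applied.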
Backus-Naur (Normal) Form (BNF)
• Describes Context-Free Languages
• Metalanguage to describe most PL syntax
• John Backus and Peter Naur
• Algol-60
• Essential in compiler construction
Guides the parser
Should not be ambiguous
Although a parser may not produce a derivation tree, the structure of the tree is embodied in the parsing process
Backus-Naur (Normal) Form (BNF)
Special Symbols
<...> : nonterminals, e.g. <expression>
(no special symbols) : terminals, e.g. if, while, (
::= : is defined as / composed of
| : alternatives, e.g. <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
BNF example
<stmt> ::= while <exp> do <stmt>
| if <exp> then <stmt>
| if <exp> then <stmt> else <stmt>
| <exp> := <exp>
| <id> ( <exps> )
<exps> ::= <exp> | <exps> , <exp>
* note: recursion with <stmt>
Limitations of BNF
• Describes only syntax, not semantics
• Does not specify implementation details
• Difficult to impose length limitations
(e.g. maximum length of variable names)
• Impossible to impose requirements such as a variable must be declared before it is used
• Can not indicate issues such as
blank lines in a program
one statement spans multiple lines
• BUT … nothing better yet
Extended BNF (EBNF)
[...] options (0 or 1 occurrences)
<stmt> ::= if <cond> then <stmt> [ else <stmt>]
{...} repetition (0 or more occurrences)
<unsigned> ::= <digit> {<digit>}
BNF: <a> ::= <b> | <a> ; <b>
EBNF: <a> ::= <b> {;<b>}
BNF: <a> ::= x <c>
<c> ::= <b> | <c>, <b>
EBNF: <a> ::= x <b> {, <b>}
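One practical reading of EBNF repetition is that {...} becomes a loop in a hand-written parser, where plain BNF would need recursion. The sketch below assumes <b> is an <unsigned> number purely for illustration; `parse_unsigned` and `parse_list` are hypothetical names.

```python
DIGITS = "0123456789"


def parse_unsigned(text, pos=0):
    """<unsigned> ::= <digit> {<digit>}; return (value, next position)."""
    if pos >= len(text) or text[pos] not in DIGITS:
        raise SyntaxError(f"digit expected at position {pos}")
    end = pos + 1
    while end < len(text) and text[end] in DIGITS:  # {<digit>}: zero or more
        end += 1
    return int(text[pos:end]), end


def parse_list(text, pos=0):
    """<a> ::= <b> {; <b>}, with <b> assumed to be <unsigned>."""
    value, pos = parse_unsigned(text, pos)
    items = [value]
    while pos < len(text) and text[pos] == ";":     # {; <b>}: zero or more
        value, pos = parse_unsigned(text, pos + 1)
        items.append(value)
    return items, pos
```

For instance, `parse_list("12;7;305")` collects the three numbers in one pass, with no left recursion to eliminate.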
Ambiguous Grammar
• Generates two or more distinct parse trees for the same string
• Unambiguous grammars required so compiler can produce correct code
(parse tree provides precedence and associativity of operators)
Ambiguous Grammar
[Figure: first parse tree for “the boy walked the dog with a leash .” The <prep phrase> (with a leash) is a child of the <verb phrase>, next to <verb> (walked) and <noun phrase> (the dog).]
Ambiguous Grammar
[Figure: second parse tree for the same sentence. The <verb phrase> has only <verb> (walked) and <noun phrase>; the <prep phrase> (with a leash) is a child of that <noun phrase>, after <det> <noun> (the dog).]
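The ambiguity can be confirmed by counting the distinct parse trees the English grammar allows for a sentence. This is a sketch under the assumption that the final period is ignored; `GRAMMAR` restates the slide's productions and `count_parses` is a made-up helper.

```python
from functools import lru_cache

GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["determiner", "noun"],
                    ["determiner", "noun", "prepositional phrase"]],
    "verb phrase": [["verb"],
                    ["verb", "noun phrase"],
                    ["verb", "noun phrase", "prepositional phrase"]],
    "prepositional phrase": [["preposition", "noun phrase"]],
    "noun": [["boy"], ["dog"], ["leash"], ["ball"]],
    "determiner": [["a"], ["the"]],
    "verb": [["walked"], ["threw"]],
    "preposition": [["with"], ["to"]],
}


def count_parses(text):
    """Count the distinct parse trees for the sentence in `text`."""
    words = tuple(text.split())

    @lru_cache(maxsize=None)
    def count(symbol, start, end):
        # Number of trees deriving words[start:end] from `symbol`.
        if symbol not in GRAMMAR:
            return int(end - start == 1 and words[start] == symbol)
        return sum(count_seq(tuple(alt), start, end) for alt in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, start, end):
        # Number of ways the symbol sequence derives words[start:end].
        if not symbols:
            return int(start == end)
        head, rest = symbols[0], symbols[1:]
        # Every symbol derives at least one word, so leave room for the rest.
        return sum(count(head, start, mid) * count_seq(rest, mid, end)
                   for mid in range(start + 1, end - len(rest) + 1))

    return count("sentence", 0, len(words))
```

The sentence from the figures has two trees (the prepositional phrase attaches either to the verb phrase or to the object noun phrase), while "the boy walked the dog" has one.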
Ambiguous Grammar
• One famous ambiguity is “dangling else”
<stmt> ::= if <cond> then <stmt> [else <stmt>]
• Solve syntactically by adding nonterminals & productions
<stmt> ::= <matched> | <unmatched>
<matched> ::= if <cond> then <matched> else <matched>
<unmatched> ::= if <cond> then <stmt>
| if <cond> then <matched> else <unmatched>
• Solve semantically by adding constraint “elses are associated with immediately preceding unmatched then”
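The "nearest unmatched then" rule is what a straightforward recursive-descent parser does if it greedily consumes an else whenever one follows. The sketch below assumes a toy token stream in which conditions and simple statements are single words; `parse_stmt` is a hypothetical name.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement; return (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("statement expected")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # any other token is a simple statement
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected: if <cond> then")
    cond = tokens[pos + 1]
    then_part, pos = parse_stmt(tokens, pos + 3)
    else_part = None
    if pos < len(tokens) and tokens[pos] == "else":
        # Greedy choice: the else binds to the innermost open then.
        else_part, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_part, else_part), pos


tree, _ = parse_stmt("if c1 then if c2 then s1 else s2".split())
```

In the resulting tree the else branch s2 belongs to the inner if (the one testing c2), and the outer if has no else branch.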
Syntax Graphs
• Are equivalent to CFGs
• Terminals in circles
• Non-terminals in rectangles
• Lines and arrows indicate how constructs are built
Syntax Graphs
Part of the Context-Free Syntax for Mini-Language Core in BNF
<program> ::= program <declaration-sequence> begin <statement-sequence> end ;
<declaration-sequence> ::= <declaration> | <declaration> <declaration-sequence>
<declaration> ::= <identifier-list> : integer ;
<identifier-list> ::= <identifier> | <identifier> , <identifier-list>
Syntax Graphs
[Figure: Syntax Graph for Mini-Language Core: program, declaration, begin, statement, end, ;]
[Figure: Declaration Syntax Graph: identifier (repeated, separated by ","), then : integer ;]
Formal Semantics
• Static
Attribute Grammars
• Dynamic
Operational
Axiomatic
Denotational
Formal Semantics - Static
• Not all program properties can be checked by a context free parser
• Context free parsing can be extended with attributes
• Useful to specify things BNF can not
• Determined at compile time
• Attribute Grammars
Attribute Grammars
• Extension of CFG
• Provides context-sensitive information such as declarations and type checking to facilitate semantic checking
e.g. boolean state information to help control the parsing process itself
symbol table information
• Adds attributes (typed values) to some nonterminals
Synthesized & Inherited
• Each attribute has a domain of possible values
Attribute Grammars
• Functions added to productions to assign values to the attributes
• Attributes evaluated in assignments or conditions during parse tree walk
• Conditions to reject invalid parse trees
• Evaluation order depends on attribute dependencies
Attribute Grammars
• Algorithms exist to test for the circularity of attribute dependencies in an attribute grammar
• Incorporated into some parser generator tools
Yacc, for example, is not attribute based, but provides a mechanism for accessing the results of child nodes in the parse tree when performing a reduction
Synthesized vs Inherited Attributes
• Synthesized attribute
gets its values from the attributes attached to the children of its nonterminal
• Inherited attribute
gets its values from the attributes attached to the parent (or siblings) of its nonterminal
[Figure: tree diagram with nodes S, A, and t, marking the synthesized (SYN) and inherited (INH) attributes of A]
Evaluation Rules
• Synthesized attribute associated with N:
each alternative in N’s production should contain a rule for evaluating the attribute
• Inherited attribute associated with N:
for every occurrence of N on the right-hand side of any alternative, there must be a rule for evaluating the attribute
Attribute Grammar Example
• L = { a^n b^n c^n | n > 0 }; not context-free
• BNF
<start> ::= <A><B><C>
<A> ::= a | a<A>
<B> ::= b | b<B>
<C> ::= c | c<C>
• Attributes (Value domain = integers )
Na: associated with <A>
Nb: associated with <B>
Nc: associated with <C>
Evaluation
• Evaluation rules (similar for <B>, <C>)
<A> ::= a
Na(<A>) := 1
| a<A>
Na(<A>) := 1 + Na(<A>2)
• Conditions
<start> ::= <A><B><C>
Cond: Na(<A>) = Nb(<B>) = Nc(<C>)
Alternative notation: <A>.Na
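The evaluation rules and the condition can be sketched as a small recognizer that computes the synthesized counts while parsing. The function names `count_run` and `accepts` are invented for this illustration; the rules follow the ones given for <A> and are assumed to be analogous for <B> and <C>.

```python
def count_run(text, pos, letter):
    """Evaluate <X> ::= letter | letter <X>; return (N, next position).

    N is the synthesized attribute: 1 for a single letter, otherwise
    1 + N of the inner nonterminal.
    """
    if pos >= len(text) or text[pos] != letter:
        raise SyntaxError(f"{letter!r} expected at position {pos}")
    if pos + 1 < len(text) and text[pos + 1] == letter:
        inner, end = count_run(text, pos + 1, letter)
        return 1 + inner, end
    return 1, pos + 1


def accepts(text):
    """<start> ::= <A><B><C> with Cond: Na = Nb = Nc."""
    try:
        na, pos = count_run(text, 0, "a")
        nb, pos = count_run(text, pos, "b")
        nc, pos = count_run(text, pos, "c")
    except SyntaxError:
        return False
    if pos != len(text):
        return False  # trailing input: not a valid tree for the BNF
    return na == nb == nc  # the condition attached to <start>
```

The BNF alone accepts strings such as aabbc; it is the condition on the attributes that rejects them.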
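Parse Tree
[Figure: attributed parse tree for the input aabbcc. <start> (Cond: true) has children <A> (Na:2), <B> (Nb:2), and <C> (Nc:2); each expands to its letter followed by an inner <A>, <B>, or <C> with attribute value 1.]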
Parse Tree for an Attribute Grammar
• Valid tree for the underlying BNF
• Each node has a set of (attribute,value) pairs
One pair for each attribute associated with the terminal or non-terminal in the node
• Some nodes have boolean conditions
• Valid parse tree
Attribute values conform to the evaluation rules
All boolean conditions are true
Example: Binary Numbers
• Context-free grammar
For simplicity, will use X instead of <X>
B ::= D
B ::= D B
D ::= 0
D ::= 1
Goal: compute the value of a binary number
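BNF Parse Tree for Input 1010
[Figure: right-nested parse tree. Each B expands to D B until the last B expands to D alone; the four D nodes derive the digits 1, 0, 1, 0 from left to right.]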
Add attributes
B: synthesized val
B: synthesized pos
D: synthesized val
D: inherited pow
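Evaluated Parse Tree
[Figure: parse tree for input 1010 annotated with attribute values]
B (derives 1010): pos:4 val:10
B (derives 010): pos:3 val:2
B (derives 10): pos:2 val:2
B (derives 0): pos:1 val:0
D (first digit, 1): pow:3 val:8
D (second digit, 0): pow:2 val:0
D (third digit, 1): pow:1 val:2
D (fourth digit, 0): pow:0 val:0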
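The values in the evaluated tree are consistent with the following attribute rules, which are an assumption reconstructed from the figure and not stated on the slide: for B ::= D, pos = 1, D.pow = 0, val = D.val; for B ::= D B2, pos = B2.pos + 1, D.pow = B2.pos, val = D.val + B2.val; and D.val is 0 for the digit 0 and 2^pow for the digit 1. A minimal sketch with a hypothetical function `evaluate`:

```python
def evaluate(bits):
    """Evaluate B for a non-empty string of 0s and 1s; return (pos, val)."""
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty binary numeral")
    if len(bits) == 1:                 # B ::= D
        pos = 1
        d_pow = 0                      # inherited by D
        rest_val = 0
    else:                              # B ::= D B2
        rest_pos, rest_val = evaluate(bits[1:])
        pos = rest_pos + 1             # synthesized from B2
        d_pow = rest_pos               # inherited by D from its sibling B2
    d_val = 2 ** d_pow if bits[0] == "1" else 0  # D ::= 0 | 1
    return pos, d_val + rest_val
```

Evaluating "1010" gives pos 4 and val 10, and the intermediate calls reproduce the pos/val pairs of the inner B nodes.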
No Class Next Week