informatica 3 - intranet deibhome.deib.polimi.it/restelli/mywebsite/pdf/syntax_semantics.pdf ·...
TRANSCRIPT
9/15/079/30/07
Marcello Restelli
Laurea in Ingegneria InformaticaPolitecnico di Milano
Informatica 3Syntax and Semantics
2
Introduction
Introduction to the concepts of syntax and semanticsBindingVariablesRoutines
3
Programming Languages
Tools for writing softwareFormal notations for describing algorithms for execution by computersFormal notations need
Syntaxset of rules specifying how programs are writtenspecifies the validity of a programformal notations: EBNF, Syntax Diagrams
Semanticsset of rules specifying “the meaning” of syntactically correct programsspecifies the effect of a programoperational semantics
4
Syntax
Rules that define admissible sequences of symbolsLexical rules
Define the “alphabet” of the languageE.g, set of chars, combinationsExample
Pascal does not distinguish between uppercase and lowercase chars, while C and Ada do<> is a valid operator in Pascal, but denoted by != in C, and /= in Ada
Syntactic rulesDefine how to compose words to build syntactically correct programsHelps programmer to write syntactically correct programAllows to check whether a program is syntactically correct
5
EBNF – Extended Backus Naur Form
Early descriptions were done using natural languageALGOL60 was described by John Backus using a context free grammar (BNF)EBNF is a metalanguage
I.e., a language to describe other languagesmetasymbols
::= symbol definition< > elements defined later (nonterminals)“{“ “;” “}” “if” “else” terminals| choice* zero or more occurrences of the preceding element+ one or more occurrences of the preceding element
6
Syntax Rules
Example:<program> ::= { <statement>* } rule<statement> ::= <assignment>|<conditional>|<loop> <assignment>::= <identifier> = <expr> ;<conditional>::=
if <expr> { <statement>+ } | if <expr> {<statement>+} else {<statement>+}
<loop> ::= while <expr> {<statement>+}<expr> ::= <identifier> | <number> |
( <expr> ) | <expr> <operator> <expr>
7
Lexical Rules
Example<operator> ::= + | - | * | / | = | /= | < | > | <= | >=<identifier> ::= <letter> <ld>*<ld> ::= <letter> | <digit><number> ::= <digit>+<letter> ::= a | b | c | . . . | z<digit> ::= 0 | 1 | . . . | 9
8
Syntax Diagrams
Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows
9
Syntax Diagrams
Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows
10
Syntax Diagrams
Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows
11
Syntax Diagrams
Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows
12
Syntax Diagrams
Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows
13
Syntax Diagrams
Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows
14
Abstract Syntax vs Concrete Syntax
There can be construction in different languages that have the same conceptual structure, but different lexical rules
Differences“{” and “}” instead of “begin” and “end”“!=” instead of “<>”In Pascal, “(” and “)” can be omitted
Same abstract syntax, but different concrete syntax
C Language
while (x != y) {
...}
Pascal
while x <> y do begin
...end
15
Pragmatics
Concrete syntax has a high impact on usabilityExample
In C language, when loops have only one instruction, parentheses may be omittedThis feature can be error proneOther languages (e.g., Modula-2 and Ada) address this issue by closing the conditional/loop statements with an “end” keywordIt is useful to adopt a programming style that avoids such problems (i.e., in C you should always use the curly braces even when there is only a single statement)
16
Language processing
InterpretationLanguage statements are directly executed
TranslationHigh-level language statements are translated into machine-level statements before being executed
17
Interpretation
Main cycleGet the next statementDetermine the actions to be executedPerform the actions
Interpretation is a simulation on a host of a special purpose computer with a high-level machine language
18
Translation
Cross-Translationtranslation is performed on a machine that is different from the one used for execution
needed when the execution machine has a special purpose processor
19
Interpretation vs Translation
Translation to intermediate codeInterpretation of the intermediate code of a virtual machinePortability of executable codeExamples
Java bytecode
20
Basic Concepts of a Programming Language
Programs deal with different types of entitiesvariablesroutinesstatements
Entities have different properties or attributesVariables
Name, type, storage area…
RoutinesName, formal parameters, parameter-passing information…
StatementsActions to be taken
Attribute information is stored in the descriptorBinding sets attributes for entities
21
The Concept of Binding
Is a key element to define the semantics of a programming languageDifferent programming languages…
…have different number of entities…have different number of attributes bound to entities…allow different binding time…have different binding stability
Binding timeDetermine when the binding occurs
StabilityDetermine whether certain bindings are fixed or modifiableStatic binding cannot be modified
22
Binding Time
Language definition timeE.g., in most languages (FORTRAN, Ada, C++) the integer type is bound at language definition
Language implementation timeE.g., then during implementation, the integer type is bound to a memory representation
Compile-time (translation-time) Pascal provides a way to redefine the integer type, the new representation is bound to the type when the program is compiled
Execution-time (run-time)Variables are usually bounded to a value during execution
StaticBinding
DynamicBinding
23
Static Binding vs Dynamic Binding
Static BindingEstablished before execution and cannot be changed thereafterE.g., FORTRAN, Ada, C, C++ integer bound to a set of values at definition/implementation time
Dynamic BindingEstablished at run-time and usually modifiable during executionE.g., variable values
Exceptionsin Pascal, read-only constant are variables whose value is initialized at run-time but cannot be modified thereafter
24
Variables
Conventional language variables are memory abstractions
In main memory, cells are identified by an addressThe contents of a cell are encoded representation of a valueThe value can be modified at run-time
VariablesVariables abstract the concept of memory cellThe variable name abstracts the address informationThe assignment statement abstracts the cell modification
25
Formal Definition of a Variable
A variable is a 5-tuple<name, scope, type, l-value, r-value>
NameString used in program statement to denote the variable
ScopeRange of program instructions over which the name is defined
TypeL-value
Memory location associated to the variable
R-valueEncoded value stored in the variable’s location
26
Name and Scope
The variable's name is a string of chars that is introduced by a declaration statementThe scope is the range of program statements over which the name is known
The variable’s scope usually extends from the declaration until a later closing point
A program can use variables through their name inside their scope
it means that a variable is visible with its name only inside its scope
27
Scope and Binding
The rules of binding of a variable inside its scope are language dependentStatic scope binding
The variable’s scope is defined in terms of the syntactic structure of the programThe scope of a variable can be determined just examining the program textMany programming languages adopt static scope binding
Dynamic scope bindingThe variable’s scope is defined during program executionEach declaration extends its effect over all the instructions executed thereafter, until a new declaration for the same variable is encounteredAPL, SNOBOL4, and LISP use dynamic scope binding
28
Example: Static Scope Binding
main(){
int x,y;...{
int temp;...temp = x-y;
}}
main(){
int x,y;...{
int temp;int x;...temp = x-y;
}}
29
Example: Dynamic Scope Binding
Case 1: A then CVariable x in block C refers to x declared in block A
Case 2: B then CVariable x in block C refers to x declared in block B
{ /* block A */ int x;}{ /* block B */ float x;}{ /* block C */ x = ...;}
30
Dynamic Scope Binding
Rules for dynamic scope binding are simple and rather easy to implementThe major disadvantages are in terms of
programming disciplineefficiency of implementation
Programs are hard to read and difficult to debug
31
Type
Set of values that can be associated to a variable, and...Set of operations that can be legally performed on such values
creating, reading, and accessing the values
It protects variables from nonsensical operationsA variable of a given type is also called an instance of the typeA language can be
Typeless or untyped, e.g., assembly languagesDynamically typed, e.g., LISPStatically typed
32
Type and Binding
Binding between type and set of admissible values and operations that modify the valuesBuilt-in types (e.g., integer)
the set of admissible value is associated when the language is definedthe operations are defined when the type is implemented
Declarations of new types (e.g., typedef float reale;)Binding between the type name and implementation at translation timeNew type inherits all operations
Abstract Data Typesallow to define new types by specifying the set of admissible values and the set of legal operations
33
Abstract Data Types
Typical ADT declaration is structured astypedef new_type_name{ data structure for new_type_name; operations to manipulate data objects;}
34
Abstract Data Types: Example
class StackOfChar {private: int size; char* top; char* s;public: StackOfChar (int sz) { top = s = new char [size=sz]; } ~StackOfChar ( ) {delete [ ] s;} void push (char c) {*top++ = c;} char pop ( ) {return *--top;} int length ( ) {return top - s;}};
35
Static vs Dynamic Type Binding
Statically typed language (C, C++, Pascal, etc.)Binding occurs at compile-timeBinding cannot be changed at run-time
type checking can be performed at compile-time
Dynamically typed language (LISP, APL, Smalltalk, etc.)Binding occurs at run-timeBinding can be changed at run-time variables are polymorphic
the type depends on the values assigned to the variabletype checking cannot be performed statically
No typed languages (script and assembly languages)memory cells and registries contain strings that can be interpreted as values of any type
36
lvalue
The l-value of a variable is the storage area bound to the variable during executionThe lifetime or extent of a variable is the period of time in which such a binding existsThe l-value is used to hold the r-valueData object is the pair <l-value,r-value>The binding between the variable and the l-value is the memory allocation
Memory allocation acquires storage and binds the variableLifetime extends from allocation to deallocation
37
Memory Allocation
Static allocation Allocation is performed before run-time, deallocation is performed upon termination
Dynamic allocationAllocation and deallocation are at runt-time performed on explicit request (through creation statements) or automatically
38
rvalue
The r-value of a variable is the encoded value stored in the location (l-value) of the variableThe r-value is interpreted according to variable's typeInstructions
Access variables through their l-value (lefthand side of assignments)Modify their r-value (righthand side of assignments)
x = y; // x is a location, while y is a value
The binding between variables and values is usually dynamicGenerally, it is static only for constants
39
Initialization
What is the r-value when the variable is created?Different languages, different solutions
Some languages (e.g., ML) requires that the binding is established at variable creationOther languages, like C and Ada, support such a binding but do not require it (int I, J=0;)
What if initialization is not provided?Ignore
the bit string found in the storage area is considered the variable initial value
System-defined initializationint = zero, char = blank
Special undefined valueThe variable is considered initialized to an undefined value, and any read access to an undefined variable is trapped
40
Reference and Unnamed Variables
Some languages allow variables that can be accessed through the r-value of another variableReference or pointers are variables whose r-value allow to access the content
of another variableof an unnamed variable
Access via a pointer is called dereferencing
int x = 5; int* px;px = &x;
5
xpx
41
Reference and Unnamed Variables
Some languages allow variables that can be accessed through the r-value of another variableReference or pointers are variables whose r-value allow to access the content
of another variableof an unnamed variable
Access via a pointer is called dereferencing
int* px;
px = malloc(sizeof(int)); 5
px
42
Reference and Unnamed Variables
Some languages allow variables that can be accessed through the r-value of another variableReference or pointers are variables whose r-value allow to access the content
of another variableof an unnamed variable
Access via a pointer is called dereferencingtype PI = ^integer;type PPI = ^PI;var pxi: PI;var ppxi: PPI;...new(pxi);pxi^ := 0;...new(ppxi);ppxi^ := pxi;
0
ppxi
pxi
43
Routines
Routines allow programs to be decomposed in a number of functional unitsRoutines are general concepts found in most languages
Subprograms in assembly languageSubroutines in FORTRANProcedures and functions in Pascal and AdaFunctions in C
Functions are routines that return a valueProcedures are routines that do not return a valueFormally, they can be defined as variables
<name, scope, type, l-value, r-value>
44
Routines: Name and Scope
NameIntroduced by a routine declaration
int sum(int n); // C prototype
ScopeThe scope of the routine name extends from the declaration point to some closing point, either statically or dynamically determined
In C, the end of the file where the routine is defined
ActivationRoutines activation is achieved through a routine invocation or routine callMust be specified routine's name and argumentsThe call statement must be in routine’s scope
45
Routines: Scope
ScopeLocal scope
Routines also define a scope for the declarations that are nested in them
local declarationssuch local declarations are only visible within the routine
non-local declarationsvisible from other units, but not from the routine
global declarationsvisible from all the units in the program
46
Routines: Type
The type of a routine is defined from its headername of the functiontype of input parameterstype of returned value
int function(int,int,float)
Type correctif the call conforms to the routine type
Signature
int sum(int n){ ... }...i = sum(10);
int sum(int n){ ... }...i = sum(5.3);
47
Routines: lvalue and rvalue
l-valueThe reference to the area where the routine body is stored
r-valueThe body that is currently bound to the routineUsually this binding is statically determined at translation timeSome languages allow variables of type “routine”
in C, we can define pointers to routines
Routines are considered first-class objects when... ...it is possible to distinguish between their l-value and r-value ...so as to allow the definition of variables of type routine
int sum(int);int (*ps)(int);ps = ∑int i = (*ps)(5);
48
Routines: Declaration and Definition
Some languages distinguish between routine declaration and definitionDeclaration
introduces the routine’s header without specifying the bodyspecifies scope
Definitionspecifies both the header and the body
The distinction between declaration and definition support mutual recursion
/* declaration of A, A is visible from now on */int A (int x, int y);/* definition of B, B is visible from now on */float B (int z){ int w, u;
/* A is visible here */ w = A (z, u); ...};int A(int x, int y){ ...
/* B is visible here */ float t = B(x); }
49
Routine instance
The representation of a routine during execution is called routine instanceCode segment
Contains the instructions of the unit, the content is fixed
Activation record (or frame)Includes all the data objects associated with the local variables of a specific routine instanceContains all the information needed to execute the routineIt is changeable
Referencing Environment of an instance U contains:U’s local variable, i.e., its local environmentU’s non local variables, i.e., its non local environmentNon local variables are modifiable via side-effect
50
Recursive Attivation
All instances of the same unit composed ofsame code segmentdifferent activation records
The binding between code segment and activation record constitutes a new unit instanceThe binding between an activation record and its code segment is dynamic
51
Routines: Parameters
Information passingdataroutine (some languages)
Parametersformal: routine's definitionactual: routine's call
Bindingpositional method
routine R(F1, F2, ..., Fn);call R(A1, A2, ..., An);
named associationroutine R(A:T1, B:T2, C:T3);call R(C=>Z, A=>X, B=>Y);
52
Generic Routines
ExampleRoutine to sort an array of integers
... int i,j; ... swap(i,j); //caso 1
Routine to sort an array of floats... float i,j; ... swap(i,j); //caso 2
void swap(int& a, int& b){
int temp = a;a = b;b = temp;
}
void swap(float& a, float& b){
float temp = a;a = b;b = temp;
}
template <class T> void swap(T& a, T& b){
T temp = a;a = b;b = temp;
}
Binding at compile-time
53
Overloading
It occurs when...more than one entity is bound to a name at a given point... and name occurrence provides enough information to disambiguate the binding
Example
int i,j,k;
float a,b,c;
i = j + k; // int additiona = b + c; // float addition
a = b + c + b();// b is overloaded
a = b + c + b(i);// b is overloaded
54
Aliasing
It is the opposite of overloadingTwo names, N1 and N2, are aliases if they denote the same entity at the same program pointN1 and N2 share the same object in the same referencing environmentMain issue: modifications under N1 have an effect visible under N2Aliasing may lead to error prone and difficult to read programsint x =0;
int *i = &x;
int *j = &x;
int q1 (int a)
{
return a *= a;
}
void q2 (int &a)
{
a *= a;
}
...
int x = 2;
cout << q1(x);
cout << x;
q2(x);
cout << x;
Stampa 4Stampa 2
Stampa 4