informatica 3 - intranet deibhome.deib.polimi.it/restelli/mywebsite/pdf/syntax_semantics.pdf ·...

9/15/079/30/07

Marcello Restelli

Laurea in Ingegneria InformaticaPolitecnico di Milano

Informatica 3Syntax and Semantics

2

Introduction

Introduction to the concepts of syntax and semanticsBindingVariablesRoutines

3

Programming Languages

Tools for writing softwareFormal notations for describing algorithms for execution by computersFormal notations need

Syntaxset of rules specifying how programs are writtenspecifies the validity of a programformal notations: EBNF, Syntax Diagrams

Semanticsset of rules specifying “the meaning” of syntactically correct programsspecifies the effect of a programoperational semantics

4

Syntax

Rules that define admissible sequences of symbolsLexical rules

Define the “alphabet” of the languageE.g, set of chars, combinationsExample

Pascal does not distinguish between uppercase and lowercase chars, while C and Ada do<> is a valid operator in Pascal, but denoted by != in C, and /= in Ada

Syntactic rulesDefine how to compose words to build syntactically correct programsHelps programmer to write syntactically correct programAllows to check whether a program is syntactically correct

5

EBNF – Extended Backus Naur Form

Early descriptions were done using natural languageALGOL60 was described by John Backus using a context free grammar (BNF)EBNF is a metalanguage

I.e., a language to describe other languagesmetasymbols

::= symbol definition< > elements defined later (nonterminals)“{“ “;” “}” “if” “else” terminals| choice* zero or more occurrences of the preceding element+ one or more occurrences of the preceding element

6

Syntax Rules

Example:<program> ::= { <statement>* } rule<statement> ::= <assignment>|<conditional>|<loop> <assignment>::= <identifier> = <expr> ;<conditional>::=

if <expr> { <statement>+ } | if <expr> {<statement>+} else {<statement>+}

<loop> ::= while <expr> {<statement>+}<expr> ::= <identifier> | <number> |

( <expr> ) | <expr> <operator> <expr>

7

Lexical Rules

Example<operator> ::= + | - | * | / | = | /= | < | > | <= | >=<identifier> ::= <letter> <ld>*<ld> ::= <letter> | <digit><number> ::= <digit>+<letter> ::= a | b | c | . . . | z<digit> ::= 0 | 1 | . . . | 9

8

Syntax Diagrams

Graphical representation of the syntax rulesterminal symbols are put inside circlesnonterminals symbols are put inside rectanglesadmissible sentences are obtained by traveling the diagram according to the arrows

9

Syntax Diagrams


10

Syntax Diagrams


11

Syntax Diagrams


12

Syntax Diagrams


13

Syntax Diagrams


14

Abstract Syntax vs Concrete Syntax

There can be construction in different languages that have the same conceptual structure, but different lexical rules

Differences“{” and “}” instead of “begin” and “end”“!=” instead of “<>”In Pascal, “(” and “)” can be omitted

Same abstract syntax, but different concrete syntax

C Language

while (x != y) {

...}

Pascal

while x <> y do begin

...end

15

Pragmatics

Concrete syntax has a high impact on usabilityExample

In C language, when loops have only one instruction, parentheses may be omittedThis feature can be error proneOther languages (e.g., Modula-2 and Ada) address this issue by closing the conditional/loop statements with an “end” keywordIt is useful to adopt a programming style that avoids such problems (i.e., in C you should always use the curly braces even when there is only a single statement)

16

Language processing

InterpretationLanguage statements are directly executed

TranslationHigh-level language statements are translated into machine-level statements before being executed

17

Interpretation

Main cycleGet the next statementDetermine the actions to be executedPerform the actions

Interpretation is a simulation on a host of a special purpose computer with a high-level machine language

18

Translation

Cross-Translationtranslation is performed on a machine that is different from the one used for execution

needed when the execution machine has a special purpose processor

19

Interpretation vs Translation

Translation to intermediate codeInterpretation of the intermediate code of a virtual machinePortability of executable codeExamples

Java bytecode

20

Basic Concepts of a Programming Language

Programs deal with different types of entitiesvariablesroutinesstatements

Entities have different properties or attributesVariables

Name, type, storage area…

RoutinesName, formal parameters, parameter-passing information…

StatementsActions to be taken

Attribute information is stored in the descriptorBinding sets attributes for entities

21

The Concept of Binding

Is a key element to define the semantics of a programming languageDifferent programming languages…

…have different number of entities…have different number of attributes bound to entities…allow different binding time…have different binding stability

Binding timeDetermine when the binding occurs

StabilityDetermine whether certain bindings are fixed or modifiableStatic binding cannot be modified

22

Binding Time

Language definition timeE.g., in most languages (FORTRAN, Ada, C++) the integer type is bound at language definition

Language implementation timeE.g., then during implementation, the integer type is bound to a memory representation

Compile-time (translation-time) Pascal provides a way to redefine the integer type, the new representation is bound to the type when the program is compiled

Execution-time (run-time)Variables are usually bounded to a value during execution

StaticBinding

DynamicBinding

23

Static Binding vs Dynamic Binding

Static BindingEstablished before execution and cannot be changed thereafterE.g., FORTRAN, Ada, C, C++ integer bound to a set of values at definition/implementation time

Dynamic BindingEstablished at run-time and usually modifiable during executionE.g., variable values

Exceptionsin Pascal, read-only constant are variables whose value is initialized at run-time but cannot be modified thereafter

24

Variables

Conventional language variables are memory abstractions

In main memory, cells are identified by an addressThe contents of a cell are encoded representation of a valueThe value can be modified at run-time

VariablesVariables abstract the concept of memory cellThe variable name abstracts the address informationThe assignment statement abstracts the cell modification

25

Formal Definition of a Variable

A variable is a 5-tuple<name, scope, type, l-value, r-value>

NameString used in program statement to denote the variable

ScopeRange of program instructions over which the name is defined

TypeL-value

Memory location associated to the variable

R-valueEncoded value stored in the variable’s location

26

Name and Scope

The variable's name is a string of chars that is introduced by a declaration statementThe scope is the range of program statements over which the name is known

The variable’s scope usually extends from the declaration until a later closing point

A program can use variables through their name inside their scope

it means that a variable is visible with its name only inside its scope

27

Scope and Binding

The rules of binding of a variable inside its scope are language dependentStatic scope binding

The variable’s scope is defined in terms of the syntactic structure of the programThe scope of a variable can be determined just examining the program textMany programming languages adopt static scope binding

Dynamic scope bindingThe variable’s scope is defined during program executionEach declaration extends its effect over all the instructions executed thereafter, until a new declaration for the same variable is encounteredAPL, SNOBOL4, and LISP use dynamic scope binding

28

Example: Static Scope Binding

main(){

int x,y;...{

int temp;...temp = x-y;

}}

main(){

int x,y;...{

int temp;int x;...temp = x-y;

}}

29

Example: Dynamic Scope Binding

Case 1: A then CVariable x in block C refers to x declared in block A

Case 2: B then CVariable x in block C refers to x declared in block B

{ /* block A */ int x;}{ /* block B */ float x;}{ /* block C */ x = ...;}

30

Dynamic Scope Binding

Rules for dynamic scope binding are simple and rather easy to implementThe major disadvantages are in terms of

programming disciplineefficiency of implementation

Programs are hard to read and difficult to debug

31

Type

Set of values that can be associated to a variable, and...Set of operations that can be legally performed on such values

creating, reading, and accessing the values

It protects variables from nonsensical operationsA variable of a given type is also called an instance of the typeA language can be

Typeless or untyped, e.g., assembly languagesDynamically typed, e.g., LISPStatically typed

32

Type and Binding

Binding between type and set of admissible values and operations that modify the valuesBuilt-in types (e.g., integer)

the set of admissible value is associated when the language is definedthe operations are defined when the type is implemented

Declarations of new types (e.g., typedef float reale;)Binding between the type name and implementation at translation timeNew type inherits all operations

Abstract Data Typesallow to define new types by specifying the set of admissible values and the set of legal operations

33

Abstract Data Types

Typical ADT declaration is structured astypedef new_type_name{ data structure for new_type_name; operations to manipulate data objects;}

34

Abstract Data Types: Example

class StackOfChar {private: int size; char* top; char* s;public: StackOfChar (int sz) { top = s = new char [size=sz]; } ~StackOfChar ( ) {delete [ ] s;} void push (char c) {*top++ = c;} char pop ( ) {return *--top;} int length ( ) {return top - s;}};

35

Static vs Dynamic Type Binding

Statically typed language (C, C++, Pascal, etc.)Binding occurs at compile-timeBinding cannot be changed at run-time

type checking can be performed at compile-time

Dynamically typed language (LISP, APL, Smalltalk, etc.)Binding occurs at run-timeBinding can be changed at run-time variables are polymorphic

the type depends on the values assigned to the variabletype checking cannot be performed statically

No typed languages (script and assembly languages)memory cells and registries contain strings that can be interpreted as values of any type

36

lvalue

The l-value of a variable is the storage area bound to the variable during executionThe lifetime or extent of a variable is the period of time in which such a binding existsThe l-value is used to hold the r-valueData object is the pair <l-value,r-value>The binding between the variable and the l-value is the memory allocation

Memory allocation acquires storage and binds the variableLifetime extends from allocation to deallocation

37

Memory Allocation

Static allocation Allocation is performed before run-time, deallocation is performed upon termination

Dynamic allocationAllocation and deallocation are at runt-time performed on explicit request (through creation statements) or automatically

38

rvalue

The r-value of a variable is the encoded value stored in the location (l-value) of the variableThe r-value is interpreted according to variable's typeInstructions

Access variables through their l-value (lefthand side of assignments)Modify their r-value (righthand side of assignments)

x = y; // x is a location, while y is a value

The binding between variables and values is usually dynamicGenerally, it is static only for constants

39

Initialization

What is the r-value when the variable is created?Different languages, different solutions

Some languages (e.g., ML) requires that the binding is established at variable creationOther languages, like C and Ada, support such a binding but do not require it (int I, J=0;)

What if initialization is not provided?Ignore

the bit string found in the storage area is considered the variable initial value

System-defined initializationint = zero, char = blank

Special undefined valueThe variable is considered initialized to an undefined value, and any read access to an undefined variable is trapped

40

Reference and Unnamed Variables

Some languages allow variables that can be accessed through the r-value of another variableReference or pointers are variables whose r-value allow to access the content

of another variableof an unnamed variable

Access via a pointer is called dereferencing

int x = 5; int* px;px = &x;

5

xpx

41




Access via a pointer is called dereferencing

int* px;

px = malloc(sizeof(int)); 5

px

42




Access via a pointer is called dereferencingtype PI = ^integer;type PPI = ^PI;var pxi: PI;var ppxi: PPI;...new(pxi);pxi^ := 0;...new(ppxi);ppxi^ := pxi;

0

ppxi

pxi

43

Routines

Routines allow programs to be decomposed in a number of functional unitsRoutines are general concepts found in most languages

Subprograms in assembly languageSubroutines in FORTRANProcedures and functions in Pascal and AdaFunctions in C

Functions are routines that return a valueProcedures are routines that do not return a valueFormally, they can be defined as variables

<name, scope, type, l-value, r-value>

44

Routines: Name and Scope

NameIntroduced by a routine declaration

int sum(int n); // C prototype

ScopeThe scope of the routine name extends from the declaration point to some closing point, either statically or dynamically determined

In C, the end of the file where the routine is defined

ActivationRoutines activation is achieved through a routine invocation or routine callMust be specified routine's name and argumentsThe call statement must be in routine’s scope

45

Routines: Scope

ScopeLocal scope

Routines also define a scope for the declarations that are nested in them

local declarationssuch local declarations are only visible within the routine

non-local declarationsvisible from other units, but not from the routine

global declarationsvisible from all the units in the program

46

Routines: Type

The type of a routine is defined from its headername of the functiontype of input parameterstype of returned value

int function(int,int,float)

Type correctif the call conforms to the routine type

Signature

int sum(int n){ ... }...i = sum(10);

int sum(int n){ ... }...i = sum(5.3);

47

Routines: lvalue and rvalue

l-valueThe reference to the area where the routine body is stored

r-valueThe body that is currently bound to the routineUsually this binding is statically determined at translation timeSome languages allow variables of type “routine”

in C, we can define pointers to routines

Routines are considered first-class objects when... ...it is possible to distinguish between their l-value and r-value ...so as to allow the definition of variables of type routine

int sum(int);int (*ps)(int);ps = ∑int i = (*ps)(5);

48

Routines: Declaration and Definition

Some languages distinguish between routine declaration and definitionDeclaration

introduces the routine’s header without specifying the bodyspecifies scope

Definitionspecifies both the header and the body

The distinction between declaration and definition support mutual recursion

/* declaration of A, A is visible from now on */int A (int x, int y);/* definition of B, B is visible from now on */float B (int z){ int w, u;

/* A is visible here */ w = A (z, u); ...};int A(int x, int y){ ...

/* B is visible here */ float t = B(x); }

49

Routine instance

The representation of a routine during execution is called routine instanceCode segment

Contains the instructions of the unit, the content is fixed

Activation record (or frame)Includes all the data objects associated with the local variables of a specific routine instanceContains all the information needed to execute the routineIt is changeable

Referencing Environment of an instance U contains:U’s local variable, i.e., its local environmentU’s non local variables, i.e., its non local environmentNon local variables are modifiable via side-effect

50

Recursive Attivation

All instances of the same unit composed ofsame code segmentdifferent activation records

The binding between code segment and activation record constitutes a new unit instanceThe binding between an activation record and its code segment is dynamic

51

Routines: Parameters

Information passingdataroutine (some languages)

Parametersformal: routine's definitionactual: routine's call

Bindingpositional method

routine R(F1, F2, ..., Fn);call R(A1, A2, ..., An);

named associationroutine R(A:T1, B:T2, C:T3);call R(C=>Z, A=>X, B=>Y);

52

Generic Routines

ExampleRoutine to sort an array of integers

... int i,j; ... swap(i,j); //caso 1

Routine to sort an array of floats... float i,j; ... swap(i,j); //caso 2

void swap(int& a, int& b){

int temp = a;a = b;b = temp;

}

void swap(float& a, float& b){

float temp = a;a = b;b = temp;

}

template <class T> void swap(T& a, T& b){

T temp = a;a = b;b = temp;

}

Binding at compile-time

53

Overloading

It occurs when...more than one entity is bound to a name at a given point... and name occurrence provides enough information to disambiguate the binding

Example

int i,j,k;

float a,b,c;

i = j + k; // int additiona = b + c; // float addition

a = b + c + b();// b is overloaded

a = b + c + b(i);// b is overloaded

54

Aliasing

It is the opposite of overloadingTwo names, N1 and N2, are aliases if they denote the same entity at the same program pointN1 and N2 share the same object in the same referencing environmentMain issue: modifications under N1 have an effect visible under N2Aliasing may lead to error prone and difficult to read programsint x =0;

int *i = &x;

int *j = &x;

int q1 (int a)

{

return a *= a;

}

void q2 (int &a)

{

a *= a;

}

...

int x = 2;

cout << q1(x);

cout << x;

q2(x);

cout << x;

Stampa 4Stampa 2

Stampa 4

informatica 3 - intranet deibhome.deib.polimi.it/restelli/mywebsite/pdf/syntax_semantics.pdf ·...

Documents