languages and compilers (sprog og oversættere)

1

Languages and Compilers(SProg og Oversættere)

Bent Thomsen

Department of Computer Science

Aalborg University

With acknowledgement to Simon Gay and Elizabeth White who’s slides this lecture is based on.

2

Types

• Watt & Brown may leave you with the impression that types in languages are simple and type checking is a minor part of the compiler

• However, type system design and type checking and/or inferencing algorithms is one of the hottest topics in programming language research at present!

• Types:– Provides a precise criterion for safety and sanity of a design.

– “Features” correspond to types.

– Close connections with logics and semantics.

3

Type checking

Triangle is statically typed: all type errors are detected at compile-time.Most modern languages have a large emphasis on static type checking.(But object-oriented programming requires some runtime type checking:e.g. Java has a lot of compile-time type checking but it is still necessaryfor some potential runtime type errors to be detected by the runtime system.) Scripting languages such as Perl and Python are exceptions,having little or no static type checking.

Type checking involves calculating or inferring the types of expressions(by using information about the types of their components) andchecking that these types are what they should be (e.g. the condition inan if statement must have type Boolean).

4

Types

Each type is associated with a set of values.

Example: the set of values of type Boolean is {false, true}.The set of values of type Integer is a finite set such as {-maxint … maxint }, not the mathematical set of integers.

Operations are naturally associated with types: for example, + makessense for integers but not for booleans (even though, ultimately, bothintegers and booleans are represented by bit patterns). The purpose oftype checking is to protect the programmer by detecting errors of thiskind. Type information is also good documentation.

It is useful to consider the structure of types and type constructors,independently of the form which they take in particular languages.

5

Primitive Types

Basic data types, (more or less) easily understood as sets of possiblevalues. E.g. (Java) boolean, int, float, double.

Strings are sometimes viewed as primitive types, but are more often aspecial case of a structured type. E.g. in Ada a string is an array ofcharacters; in Java a string is an object.

Some languages (e.g. Pascal, Ada) provide enumerated types which areuser-defined sets of values.E.g. type Colour = (Red, Blue, Green);

This is different from enum types in C, which create abbreviations fora range of integers: enum {red, blue, green}yields red = 0, blue = 1, green = 2, so that red+blue = blue etc.

6

Arrays

An array is a collection of values, all of the same type, indexed bya range of integers (or sometimes a range within an enumerated type).

In Ada: a : array (1..50) of Float;In Java: float[] a;

Most languages check at runtime that array indices are within thebounds of the array: a(51) is an error. (In C you get the contents of thememory location just after the end of the array!)

If the bounds of an array are viewed as part of its type, then arraybounds checking can be viewed as typechecking, but it is impossible todo it statically: consider a(f(1)) for an arbitrary function f.

Static typechecking is a compromise between expressiveness andcomputational feasibility.

7

Products and RecordsIf T and U are types, then T U (written (T * U) in SML) is the typewhose values are pairs (t,u) where t has type T and u has type U.Mathematically this corresponds to the cartesian product of sets. Moregenerally we have tuple types with any number of components. Thecomponents can be extracted by means of projection functions.

Product types more often appear as record types, which attach a labelor field name to each component. Example (Ada):

type T isrecord x : Integer; y : Floatend record

8

Products and Records

type T isrecord x : Integer; y : Floatend record

If v is a value of type T then v containsan Integer and a Float. Writing v.x and v.y can be more readable than fst(v) and snd(v).

Record types are mathematically equivalent toproducts.

An object can be thought of as a record in which some fields arefunctions, and a class definition as a record type definition in whichsome fields have function types. Object-oriented languages alsoprovide inheritance, leading to subtyping relationships betweenobject types.

9

Variant RecordsIn Pascal, the value of one field of a record can determine the presenceor absence of other fields. Example: type T = record

x : integer; case b : boolean of false : (y : integer); true : (z : boolean) end

It is not possible for statictype checking to eliminate all typeerrors from programs which usevariant records in Pascal:the compiler cannot check consistency between the tag field and thedata which is stored in the record. The following code passes thetype checker in Pascal: var r : T, a : integer;

begin r.x := 1; r.b := true; r.z := false; a := r.y * 5end

10

Variant Records in AdaAda handles variant records safely. Instead of a tag field, the typedefinition has a parameter, which is set when a particular record iscreated and then cannot be changed.

type T(b : Boolean) is record x : Integer; case b is when False => y : Integer; when True => z : Boolean end caseend record;

declare r : T(True), a : Integer;begin r.x := 1; r.z := False; a := r.y * 5;end;

r does not have field y, and never will

this type error can be detected statically

11

Disjoint Unions

The mathematical concept underlying variant record types is thedisjoint union. A value of type T+U is either a value of type T or avalue of type U, tagged to indicate which type it belongs to:

T+U = { left(x) | x T } { right(x) | x U }

SML and other functional languages support disjoint unions bymeans of algebraic datatypes, e.g.

datatype X = Alpha String | Numeric IntThe constructors Alpha and Numeric can be used as functions to buildvalues of type X, and pattern-matching can be used on a value of typeX to extract a String or an Int as appropriate.

An enumerated type is a disjoint union of copies of the unit type (whichhas just one value). Algebraic datatypes unify enumerations and disjointunions (and recursive types) into a convenient programming feature.

12

Variant Records and Disjoint Unions

The Ada type: type T(b : Boolean) is record x : Integer; case b is when False => y : Integer; when True => z : Boolean end caseend record;

can be interpreted as

(Integer Integer) + (Integer Boolean)

where the Boolean parameter b plays the role of the left or right tag.

13

Functions

In a language which allows functions to be treated as values, we needto be able to describe the type of a function, independently of itsdefinition.

In Ada, defining function f(x : Float) return Integer is …

produces a function f whose type isfunction (x : Float) return Integer

the name of the parameter is insignificant (it is a bound name) so thisis the same type as function (y : Float) return Integer

In SML this type is written Float Int

14

Functions and Procedures

Float Int Int

A function with several parameters can be viewed as a function withone parameter which has a product type:

function (x : Float, y : Integer) return Integer

In Ada, procedure types are different from function types:

procedure (x : Float, y : Integer)

whereas in Java a procedure is simply a function whose result typeis void. In SML, a function with no interesting result could begiven a type such as Int ( ) where ( ) is the empty product type(also known as the unit type) although in a purely functional languagethere is no point in defining such a function.

15

Structural and Name EquivalenceAt various points during type checking, it is necessary to check that twotypes are the same. What does this mean?

structural equivalence: two types are the same if they have the samestructure: e.g. arrays of the same size and type, records with the samefields.

name equivalence: two types are the same if they have the same name.

Example: if we define type A = array 1..10 of Integer;type B = array 1..10 of Integer;function f(x : A) return Integer is …var b : B;

then f(b) is correct in a language which uses structural equivalence,but incorrect in a language which uses name equivalence.

16

Structural and Name EquivalenceDifferent languages take different approaches, and some use both kinds.

Ada uses name equivalence. Triangle uses structural equivalence.Haskell uses structural equivalence for types defined by type (these are viewed as new names for existing types) and name equivalence for types defined by data (these are algebraic datatypes; they are genuinely new types).

Structural equivalence is sometimes convenient for programming, butdoes not protect the programmer against incorrect use of values whosetypes accidentally have the same structure but are logically distinct.

Name equivalence is easier to implement in general, especially in alanguage with recursive types (this is not an issue in Triangle).

17

Recursive Types

Example: a list is either empty, or consists of a value (the head)and a list (the tail)

SML: datatype List = Nil | Cons (Int * List)

Cons 2 (Cons 3 (Cons 4 Nil)) represents [2,3,4]

Abstractly: List = Unit + (Int List)

18

Recursive Types

Ada: type ListCell;type List is access ListCell;type ListCell is record head : Integer; tail : List; end record;

so that the name ListCellis known here

this is a pointer(i.e. a memory address)

In SML, the implementation uses pointers, but the programmer doesnot have to think in terms of pointers.

In Ada we use an explicit null pointer null to stand for the empty list.

19

Recursive Types

Java: class List { int head; List tail;}

The Java definition does not mention pointers, but in the same way asAda, we use the explicit null pointer null to represent the empty list.

20

Equivalence of Recursive Types

In the presence of recursive types, defining structural equivalence ismore difficult.

We expect List = Unit + (Int List)

and NewList = Unit + (Int NewList)

to be equivalent, but complications arise from the (reasonable)requirement that List = Unit + (Int List)

and NewList = Unit + (Int (Unit + (Int NewList)))

should be equivalent.

It is usual for languages to avoid this issue by using name equivalencefor recursive types.

21

Other Practical Type System Issues

• Implicit versus explicit type conversions– Explicit user indicates (Ada, SML)

– Implicit built-in (C int/char) -- coercions

• Overloading – meaning based on context– Built-in

– Extracting meaning – parameters/context

• Polymorphism• Subtyping

22

Polymorphism

Polymorphism describes the situation in which a particular operator orfunction can be applied to values of several different types. There is afundamental distinction between:• ad hoc polymorphism, usually called overloading, in which a single name refers to a number of unrelated operations. Example: +• parametric polymorphism, in which the same computation can be applied to a range of different types which have structural similarities. Example: reversing a list.

Most languages have some support for overloading.

Parametric polymorphism is familiar from functional programming,but less common (or less well developed) in imperative languages.

23

Subtyping

The interpretation of a type as a set of values, and the fact that one setmay be a subset of another set, make it natural to think about whena value of one type may be considered to be a value of another type.

Example: the set of integers is a subset of the set of real numbers.Correspondingly, we might like to consider the type Integer to be asubtype of the type Float. This is often written Integer <: Float.

Different languages provide subtyping in different ways, including(in some cases) not at all. In object-oriented languages, subtypingarises from inheritance between classes.

24

Subtyping for Product Types

The rule is:if A <: T and B <: U then A B <: T U

This rule, and corresponding rules for other structured types, can beworked out by following the principle:

T <: U means that whenever a value of type U is expected, it issafe to use a value of type T instead.

What can we do with a value v of type T U ?• use fst(v) , which is a value of type T• use snd(v) , which is a value of type UIf w is a value of type A B then fst(w) has type A and can be usedinstead of fst(v). Similarly snd(w) can be used instead of snd(v).Therefore w can be used where v is expected.

25

Subtyping for Function Types

if T <: A and B <: U then A B <: T U

Suppose we have f : A B and g: T U and we want to use f in place of g.

It must be possible for the result of f to be used in place of the resultof g , so we must have B <: U.

It must be possible for a value which could be a parameter of g to begiven as a parameter to f , so we must have T <: A.

Therefore:

Compare this with the rule for product types, and notice thecontravariance: the condition on subtyping between A and T is theother way around.

26

Subtyping in Java

Instead of defining subtyping, the specification of Java says whenconversion between types is allowed, in two situations:• assignments x = e where the declared type of x is U and the type of the expression e is T• method calls where the type of a formal parameter is U and the type of the corresponding actual parameter is T.

In most cases, saying that type T can be converted to type U means thatT <: U (exceptions: e.g. byte x = 10 is OK even though 10 : int and itis not true that int <: byte )

Conversions between primitive types are as expected, e.g. int <: float.

For non-primitive types:• if class T extends class U then T <: U (inheritance) • if T <: U then T[] <: U[] (rule for arrays)

27

Subtyping in JavaConversions which can be seen to be incorrect at compile-time generatecompile-time type errors. Some conversions cannot be seen to beincorrect until runtime. Therefore runtime type checks are introduced,so that conversion errors can generate exceptions instead ofexecuting erroneous code.

Example: class Point {int x, y;}

class ColouredPoint extends Point {int colour;}

A Point object has fields x, y. A ColouredPoint object has fieldsx, y, colour. Java specifies that ColouredPoint <: Point, and this makessense: a ColouredPoint can be used as if it were a Point, if we forgetabout the colour field.

28

Point and ColouredPoint

Point[] pvec = new Point[5];

ColouredPoint[] cpvec = new ColouredPoint[5];

CP CP CP CP CPP P P P P

pvec cpvec

29





pvec cpvec

pvec = cpvec; pvec now refers to an array of ColouredPointsOK because ColouredPoint[] <: Point[]

30





pvec cpvec


pvec[0] = new Point( ); OK at compile-time, but throws an exceptionat runtime

31





pvec cpvec


cpvec = pvec;compile-time error because it is not the casethat Point[] <: ColouredPoint[]BUT it’s obviously OK at runtime becausepvec actually refers to a ColouredPoint[]

32





pvec cpvec


cpvec = (ColouredPoint[])pvec;

introduces a runtime check that the elementsof pvec are actually ColouredPoints

33

Subtyping Arrays in Java

The rule if T <: U then T[] <: U[]

is not consistent with the principle that

T <: U means that whenever a value of type U is expected, it issafe to use a value of type T instead

because one of the operations possible on a U array is to put a Uinto one of its elements, but this is not safe for a T array.

The array subtyping rule in Java is unsafe, which is why runtime typechecks are needed, but it has been included for programmingconvenience.

34

Subtyping and Polymorphism

abstract class Shape { abstract float area( ); }

the idea is to define several classes of shape,all of which define the area function

class Square extends Shape { float side; float area( ) {return (side * side); } }

class Circle extends Shape { float radius; float area( ) {return ( PI * radius * radius); } }

Square <: Shape

Circle <: Shape

35

Subtyping and Polymorphism

float totalarea(Shape s[]) { float t = 0.0; for (int i = 0; i < s.length; i++) { t = t + s[i].area( ); }; return t;}

totalarea can be applied to any array whose elements are subtypesof Shape. (This is why we want Square[] <: Shape[] etc.)

This is an example of a concept called bounded polymorphism.

36

Formalizing Type Systems

• The Triangle type system is extremely simple– Thus its typing rules are easy to understand from a verbal

description in English

• Languages with more complex type systems, such as SML, has a type system with formalized type rules– Mathematical characterizations of the type system

– Type soundness theorems

• Some languages with complex type rules, like Java, ought to have had a formal type system before implementation! – But a lot of effort has been put into creating formal typing

rules for Java

37

How to go about formalizing Type systems

• Very similar to formalizing language semantic with structural operational semantics

• Assertions made with respect to the typing environment.Judgment: |- where is an assertion, is a static typing

environment and the free variables of are declared in Judgments can be regarded as valid or invalid.

39

Example Type Rules

(addition) |- int, |- int

|- E + E: int

(conditional)

|- bool, |- S1: T, S2: T

|- if E then S1 else S2: T

(function call)

|- FT1 T2, |- E: T1

|- F(E): T2

40

Very simple example

• Consider inferring the type of 1 + F(1+1) where we know 1: int and F: int int

• 1 + 1: int by addition rule• F(1+1): int by function call rules• 1 + F(1 + 1) : int by addition rule

41

Type Derivations

• A derivation is a tree of judgments where each judgment is obtained from the ones immediately above by some type rule of the system.

• Type inference – the discovery of a derivation for an expression

• Implementing type checking or type inferencing based on a formal type system is an (relatively) easy task of implementing a set of recursive functions.

42

Connection with Semantics

• Type system is sometimes called static semantics– Static semantics: the well-formed programs– Dynamic semantics: the execution model

• Safety theorem: types predict behavior.– Types describe the states of an abstract machine model.– Execution behavior must cohere with these descriptions.

• Thus a type is a specification and a type checker is a theorem prover.

• Type checking is the most successful formal method!– In principal there are no limits.– In practice there is no end in sight.

• Examples:– Using types for low-level languages, say inside a compiler.– Extending the expressiveness of type systems for high-level languages.

languages and compilers (sprog og oversættere)

Documents