chapter 7: data types

82
1 Chapter 7: Data Types

Upload: velika

Post on 06-Jan-2016

34 views

Category:

Documents


2 download

DESCRIPTION

Chapter 7: Data Types. Chapter 7: Data type. 7.1 Type Systems 7.2 Type Checking 7.3 Records (Structures) and Variants (Unions) 7.4 Arrays 7.5 Sets 7.6 Lists 7.7 Pointers and Recursive types. Data Types. Computers manipulate sequences of bits - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chapter 7: Data Types

11

Chapter 7: Data Types

Page 2: Chapter 7: Data Types

22

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 3: Chapter 7: Data Types

33

Data Types

• Computers manipulate sequences of bits

• But most programs manipulate more general data

– Numbers– String– Lists– …

• Programming languages provide data types that raise the level of abstraction from bits to data

– But computer hardware only knows about bits!

Page 4: Chapter 7: Data Types

44

Data TypesThe Purpose of Types

• Types provide implicit context– Compilers can infer more information, so programmers

write less code– E.g. the expression a+b in Java may be adding two

integer, two floats or two strings depending on the context

• Types provides a set of semantically valid operations

– Compilers can detect semantic mistakes– Make sure that certain meaningless operations do not

occur. Type checking cannot prevent all meaningless operations, but it catches enough of them to be useful.

Page 5: Chapter 7: Data Types

55

Type Systems

• High-level languages have type systems– All objects and expressions have a type– E.g. int (*)(const void *, const void *) is the

type of a C++ function: int compare(const void *, const void *)

• A type system consists of– A mechanism for defining types and associating them

with certain language constructs– A set of rules for type checking

» Type equivalence» Type compatibility» Type inference

Page 6: Chapter 7: Data Types

66

Type SystemsType Checking

• Type checking is the process of ensuring that a program obeys the language’s type compatibility rules

• Strongly typed languages always detect types errors

– Weakly typed languages do not– It means that the language prevents you from

applying an operation to data on which it is not appropriate.

– All expressions and objects must have a type– All operations must be applied in appropriate type

contexts

Page 7: Chapter 7: Data Types

77

• Static Typing means that the compiler can do all the checking at compile time.

– Common Lisp is strongly typed, but not statically typed.

– Ada is statically typed. – Pascal is almost statically typed. – Java is strongly typed, with a non-trivial mix of things

that can be checked statically and things that have to be checked dynamically.

Type SystemsStatic Type Checking

Page 8: Chapter 7: Data Types

88

• Dynamic type checking is performed at run time. It is a form of late binding.

• It tends to be found in languages that delay other issues as well.

– Lisp, Scheme, and Smalltalk are dynamically typed

• Languages with dynamic scoping are generally dynamically typed.

Type SystemsDynamic Type Checking

Page 9: Chapter 7: Data Types

99

What is a type?

• Three points of view:– Denotational: a set of values– Constructive: a type is built-in type or a composite

type» Composite types are created using type constructors» E.g. In Java, boolean is a built-in type, while boolean[] is a composite type

– Abstraction-based: a type is an interface that defines a set of consistent operations

• These points of view complement each other

Page 10: Chapter 7: Data Types

1010

Classification of TypesBuilt-in Types

• Built-in/Primitive/Elementary types– Mimic hardware units– E.g. boolean, character, integer, real (float)

• Their terminology and implementation varies across languages

• Characters are traditionally one-byte quantities using the ASCII character set

– Early computers had a different byte sizes» Byte = 8 bits standardized by Fred Brooks et al.

thanks to the IBM System/360– Other character sets have also been used– Newer languages use the Unicode character set

Page 11: Chapter 7: Data Types

1111

Built-in TypesNumeric Types

• Most languages support integers and floats– The range of value is implementation dependent

• Some languages support other numeric types– Complex numbers (e.g. Fortran, Python)– Rational numbers (e.g. Scheme, Common Lisp)– Signed and unsigned integers (e.g. C, Modula-2)– Fixed point numbers (e.g. Ada)

• Some languages distinguish numeric types depending on their precision

– Single vs. double precision numbers» C’s int (4 bytes) and long (8 bytes)

Page 12: Chapter 7: Data Types

1212

Classification of TypesEnumerations

• Enumeration improve program readability and error checking

• They were first introduced in Pascal– E.g. type weekday = (sun, mon, tue, wed, thu, fri, sat);

– They define an order, so they can be used in enumeration-controlled loops

– The same feature is available in C» enum weekday {sun, mon, tue, wed, thu, fri, sat};

– Other languages use constants to define enumeration– Pascal’s approach is more complete: integers and

enumerations are not compatible

Page 13: Chapter 7: Data Types

1313

Classification of TypesSubranges

• Subranges improve program readability and error checking

• They were first introduced in Pascal– E.g. type test_score = 0..100; type workday = mon..fri;

– They define an order, so they can be used in enumeration-controlled loops

– The distinction between derived types and subranges (i.e., subtypes) is a feature of Ada.

type test_score is new integer range 0..100;subtype workday is weekday range mon..fri;

Page 14: Chapter 7: Data Types

1414

Classification of TypesComposite Types

• Composite/Constructed types are created applying a constructor to one or more simpler types

• Examples– Records– Variant Records– Arrays– Sets– Pointers– Lists– Files

Page 15: Chapter 7: Data Types

1515

Classification of TypesOrthogonality

• Orthogonality is an important property in the design of type systems

• A collection of features is orthogonal if there are no restrictions on the ways in which the features can be combined.

– Pascal is more orthogonal than Fortran, because it allows arrays of anything, for instance.

• Orthogonality is nice primarily because it makes a language easy to understand, easy to use, and easy to reason about.

Page 16: Chapter 7: Data Types

1616

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 17: Chapter 7: Data Types

1717

Type Checking

• Type equivalence– When are the types of two components the same?

• Type compatibility– When can a value of type A be used in a context that

expects type B?

• Type inference– What is the type of an expression, given the types of

the operands?

Page 18: Chapter 7: Data Types

1818

Type Checking

• Type compatibility - When can a value of type A be used in a context that expects type B?

• Type inference - What is the type of an expression, given the types of the operands?

• Type equivalence - When are the types of two objects the same?

Page 19: Chapter 7: Data Types

1919

Type Checking

• Type equivalence - When are the types of two objects the same?

Why is this important?

The compiler wants to determine if an object can be used in a certain context.

At a minimum, the object can be used if its type and thetype expected are equivalent.

Page 20: Chapter 7: Data Types

2020

Type Equivalence

• Type equivalence is defined in two principal ways– Structural equivalence– Name equivalence

• Two types are structurally equivalent if they have identical type structures

– They must have the same components– E.g. Fortran

• Two type are nominally equivalent (name equivalent) if they have the same name

– E.g. Java– Name equivalence is more fashionable these days

Page 21: Chapter 7: Data Types

2121

• Structural equivalence depends on simple comparison of type descriptions.

How to compare two structures: substitute all names and expand all the way to built-in types. Original types are equivalent if the expanded type descriptions are the same

• Pointers complicate matters, but the Algol folks figured out how to handle it in the late 1960's. The simple (not quite correct) approach is to pretend all pointers are equivalent.

Type EquivalenceStructural Equivalence

Page 22: Chapter 7: Data Types

2222

Type EquivalenceStructural Equivalence

• Consider these two structures

typedef struct {typedef struct {

int a, b;int a, b;

} foo2;} foo2;

typedef struct { int a, b; } foo1;typedef struct { int a, b; } foo1;

• All languages consider these two types structurally equivalent

Page 23: Chapter 7: Data Types

2323

Type EquivalenceStructural Equivalence

• Is the order in the declaration relevant?

typedef struct {

int a;

int b;

} foo1;

typedef struct {

int b;

int a;

} foo2;

• Most languages consider these two types structurally equivalent

Page 24: Chapter 7: Data Types

2424

Type EquivalenceStructural Equivalence Pitfall

• Are these two types structurally equivalent?

typedef struct {

char *name; char *address;int age;

} student;

typedef struct {

char *name; char *address;int age;

} school;

• They are, and it is unlikely that the programmer intended to allow these two types to be usedin the same context (although the compiler willhave no trouble doing so).

Page 25: Chapter 7: Data Types

2525

Type EquivalenceName Equivalence

• Name Equivalence assumes type definitions with different names are different types

– Otherwise, why did the programmer create two definitions?

• It solves the previous problem– student and school are not nominally equivalent

• Aliases:– Under strict name equivalence, aliases are not

equivalent» type A = B is a definition (i.e. new object)

– Under loose name equivalence, aliases are equivalent» type A = B is a declaration (i.e. binding)

Page 26: Chapter 7: Data Types

2626

Type EquivalenceName Equivalence and Aliases

• Loose name equivalence may introduce undesired type equivalences

Page 27: Chapter 7: Data Types

2727

7.2.2 Type Conversion

• A values of one type can be used in a context that expects another type using type conversion or type cast

• Two flavors:– Converting type cast: underlying bits are changed

» E.g. In C,int i; float f = 3.4;i = int(f);

– Nonconverting type cast: underlying bits are not altered» E.g. In C,int i; float f; int j = 4;i = j;i = *((int *) &f);

Page 28: Chapter 7: Data Types

2828

Notes

Does Java use structural or name equivalence?Make an entry in your Java notebook for this.

p 419

Page 29: Chapter 7: Data Types

2929

Type Checking

• Type compatibility - When can a value of type A be used in a context that expects type B?

• Type inference - What is the type of an expression, given the types of the operands?

• Type equivalence - When are the types of two objects the same?

Page 30: Chapter 7: Data Types

3030

7.2.3 Type Compatibility

• Most languages do not require type equivalence in every context

• Instead, a value's type is required to be compatible with that of the context in which it appears.

• Java example:

result = tirandXmlClient.process(GET_COST, xmlString);

Assume that process() returns an Account object. What is the type of result?

Page 31: Chapter 7: Data Types

3131

7.2.3 Type Compatibility

• In Ada, two types T and S are compatible if any of the following conditions is true:

– T and S are equivalent– T is a subtype of S– S is a subtype of T– T and S are arrays with the same number of

elements and the same type of elements

Page 32: Chapter 7: Data Types

3232

Type Compatibility

• Implicit type conversion due to type compatibility are known as type coercions

– They may involve low-level conversions

• Example: type weekday is (sun,mon,tue,wed,thur,fri,sat);

subtype workday is weekday range mon..fri;

Page 33: Chapter 7: Data Types

3333

Type Compatibility

• Type coercions make type systems weaker

• Example:

Page 34: Chapter 7: Data Types

3434

Type Compatibility

• Generic container objects require a generic reference type

• Example

Page 35: Chapter 7: Data Types

3535

Type Checking

• Type compatibility - When can a value of type A be used in a context that expects type B?

• Type inference - What is the type of an expression, given the types of the operands?

• Type equivalence - When are the types of two values the same?

Page 36: Chapter 7: Data Types

3636

Type Inference

• Type inference– What is the type of an expression, given the

types of the operands?

– Generally easy, but subranges and composites may complicate this issue

result = 5 + 7.5 + sin(j) - 10.5E-12

Example

Page 37: Chapter 7: Data Types

3737

An Example

Strict name equivalence: types are equivalent if they refer to the same declaration

Loose name equivalencetypes are equivalent if they refer to the same outermost

constructor,i.e. if they refer to the same declaration after factoring out any type

aliases.

Example:type alink = pointer to cell;subtype blink = alink;p, q : pointer to cell;r : alink;s : blink;u : alink;

Page 38: Chapter 7: Data Types

3838

Structural equivalence says all five variables have same type.

Strict name equivalence equates types of p & q and r & u, because they refer back to the same type declaration (a constructor on the RHS of a var declaration is an implicit declaration of an ANONYMOUS type)

Loose name equivalence equates types of p & q and r, s, & u because in both cases we can trace back to the same "^" constructor.

An Example

Page 39: Chapter 7: Data Types

3939

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 40: Chapter 7: Data Types

4040

Records

• Also known as ‘structs’ and ‘types’.– Cstruct resident { char initials[2]; int ss_number; bool married;};

• fields – the components of a record, usually referred to using dot notation.

Page 41: Chapter 7: Data Types

4141

Nesting Records

• Most languages allow records to be nested within each other.

– Pascaltype two_chars = array [1..2] of char;type married_resident = record

initials: two_chars; ss_number: integer; incomes: record husband_income: integer; wife_income: integer; end; end;

Page 42: Chapter 7: Data Types

4242

Memory Layout of Records

initials(2 bytes)

married (1 byte)

ss_number(4 bytes)

• Fields are stored adjacently in memory.

• Memory is allocated for records based on the order the fields are created.

• Variables are aligned for easy reference.

initials(2 bytes)

ss_number(4 bytes)

married (1 byte)

Optimized for space Optimized for memory alignment

4 bytes / 32 bits 4 bytes / 32 bits

Page 43: Chapter 7: Data Types

4343

Simplifying Deep Nesting

• Modifying records with deep nesting can become bothersome. book[3].volume[7].issue[11].name := ‘Title’;

book[3].volume[7].issue[11].cost := 199;

book[3].volume[7].issue[11].in_print := TRUE;

• Fortunately, this problem can be simplified.

• In Pascal, keyword with “opens” a record.

with book[3].volume[7].issue[11] do

begin

name := ‘Title’;

cost := 199;

in_print := TRUE;

end;

Page 44: Chapter 7: Data Types

4444

Simplifying Deep Nesting

• Modula-3 and C provide better methods for manipulation of deeply nested records.

– Modula-3 assigns aliases to allow multiple openings with var1 = book[1].volume[6].issue[12], var2 = book[5].volume[2].issue[8] DO var1.name = var2.name; var2.cost = var1.cost;END;

– C allows pointers to types» What could you write in C to mimic the code above?

Page 45: Chapter 7: Data Types

4545

Variant Records

• variant records – provide two or more alternative fields.

• discriminant – the field that determines which alternative fields to use.

• Useful for when only one type of record can be valid at a given time.

Page 46: Chapter 7: Data Types

4646

Variant Records – Pascal Example

type resident = record

initials: array [1..2] of char;

case married: boolean of

true: (

husband_income: integer;

wife_income: integer;

);

false: (

income: real;

);

id_number: integer;

end;

initials (2 bytes)

married (1 byte)

husband_income (4 bytes)

wife_income (4 bytes)

initials (2 bytes)

married (1 byte)

income (4 bytes)

Case is TRUE

Case is FALSE

Page 47: Chapter 7: Data Types

4747

Unions

• A union is like a record– But the different fields take up the same space within memory

union foo { int i; float f; char c[4];}

• Union size is 4 bytes!

Page 48: Chapter 7: Data Types

4848

Union example (from an assembler)

union DisasmInst {

#ifdef BIG_ENDIAN

struct { unsigned char a, b, c, d; } chars;

#else

struct { unsigned char d, c, b, a; } chars;

#endif

int intv;

unsigned unsv;

struct { unsigned offset:16, rt:5, rs:5, op:6; } itype;

struct { unsigned offset:26, op:6; } jtype;

struct { unsigned function:6, sa:5, rd:5, rt:5, rs:5,

op:6; } rtype;

};

Page 49: Chapter 7: Data Types

4949

void CheckEndian() {

union {

char charword[4];

unsigned int intword;

} check;

check.charword[0] = 1;

check.charword[1] = 2;

check.charword[2] = 3;

check.charword[3] = 4;

#ifdef BIG_ENDIAN

if (check.intword != 0x01020304) { /* big */

cout << "ERROR: Host machine is not Big Endian.\nExiting.\n";

exit (1);

}

#else

#ifdef LITTLE_ENDIAN

if (check.intword != 0x04030201) { /* little */

cout << "ERROR: Host machine is not Little Endian.\nExiting.\n";

exit (1);

}

#else

cout << "ERROR: Host machine not defined as Big or Little Endian.\n";

cout << "Exiting.\n";

exit (1);

#endif // LITTLE_ENDIAN

#endif // BIG_ENDIAN

}

Another union example

Page 50: Chapter 7: Data Types

5050

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 51: Chapter 7: Data Types

5151

Arrays

• Group a homogenous type into indexed memory.

• Language differences: A(3) vs. A[3]. – Brackets are preferred since parenthesis are typically

used for functions/subroutines.

• Subscripts are usually integers, though most languages support any discrete type.

Page 52: Chapter 7: Data Types

5252

Array Dimensions

• C uses 0 -> (n-1) as the array bounds.– float values[10]; // ‘values’ goes from 0 -> 9

• Fortran uses 1 -> n as the array bounds.– real(10) values ! ‘values’ goes from 1 -> 10

• Some languages let the programmer define the array bounds.– var values: array [3..12] of real; (* ‘values’ goes from 3 -> 12 *)

Page 53: Chapter 7: Data Types

5353

Multidimensional Arrays

• Two ways to make multidimensional arrays– Both examples from Ada

– Construct specifically as multidimensional.

matrix: array (1..10, 1..10) of real;-- Reference example: matrix(7, 2)

» Looks nice, but has limited functionality.

– Construct as being an array of arrays.

matrix: array (1..10) of array (1..10) of real;-- Reference example: matrix(7)(2)

» Allows us to take ‘slices’ of data.

Page 54: Chapter 7: Data Types

5454

Array Memory Allocation

• An array’s “shape” (dimensions and bounds) determines how its memory is allocated.

– The time at which the shape is determined also plays a role in determining allocation.

• At least 5 different cases for determining memory allocation:

Page 55: Chapter 7: Data Types

5555

Memory Layout Options

• Ordering of array elements can be accomplished in two ways:

– row-major order – Elements travel across rows, then across columns.

– column-major order – Elements travel across columns, then across rows.

Row-major Column-major

Page 56: Chapter 7: Data Types

5656

Row Pointers vs. Contiguous Allocation

• Row pointers – an array of pointers to an array. Creates a new dimension out of allocated memory.

• Avoids allocating holes in memory.

yadrut

aSyadirF

yadsruhTy

adsendeWy

adseuTyad

noMyadnuS

day[4]

day[6]

day[5]

day[3]

day[2]

day[1]

day[0]

Array = 57 bytes

Pointers = 28 bytes

Total Space = 85 bytes

Page 57: Chapter 7: Data Types

5757

Row Pointers vs. Contiguous Allocation

• Contiguous allocation - array where each element has a row of allocated space.

• This is a true multi-dimensional array.– It is also a ragged array

S u n d a y

M o n d a y

T u e s d a y

W e d n e s d a y

T h u r s d a y

F r i d a y

S a t u r d a y

Array = 70 bytes

Page 58: Chapter 7: Data Types

5858

Array Address Calculation

• Calculate the size of an element (1D)

• Calculate the size of a row (2D)– row = element_size * (Uelement - Lelement + 1)

• Calculate the size of a plane (3D)– plane = row_size * (Urows - Lrows + 1)

• Calculate the size of a cube (4D)

::

Page 59: Chapter 7: Data Types

5959

Array Address Calculation

• Address of a 3-dimenional array A(i, j, k) is:address of A

+ ((i - Lplane) * size of plane)

+ ((j - Lrow) * size of row)

+ ((k - Lelement) * size of element)

A

Memory

A(i, j)A(i) A(i, j, k)

Page 60: Chapter 7: Data Types

6060

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 61: Chapter 7: Data Types

6161

Sets

• Introduced by Pascal, found in most recent languages as well.

• Common implementation uses a bit vector to denote “is a member of”.

– Example: U = {‘a’, ‘b’, …, ‘g’}A = {‘a’, ‘c’, ‘e’, ‘g’} = 1010101

• Hash tables needed for larger implementations. – Set of integers = (232 values) / 8 = 536,870,912 bytes

Page 62: Chapter 7: Data Types

6262

Enumerations

• enumeration – set of named elements– Values are usually ordered, can compareenum weekday {sun,mon,tue,wed,thu,fri,sat}if (myVarToday > mon) { . . . }

• Advantages– More readable code– Compiler can catch some errors

• Is sun==0 and mon==1?– C/C++: yes; Pascal: no

• Can also choose ordering in Cenum weekday {mon=0,tue=1,wed=2…}

Page 63: Chapter 7: Data Types

6363

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 64: Chapter 7: Data Types

6464

Lists

• list – the empty list or a pair consisting of an object (list or atom) and another list(a . (b . (c . (d . nil))))

• improper list – list whose final pair contains two elements, as opposed to the empty list(a . (b . (c . d)))

• basic operations: cons, car, cdr, append

• list comprehensions (e.g. Miranda and Haskell)[i * i | i <- [1..100]; i mod 2 = 1]

Page 65: Chapter 7: Data Types

6565

Chapter 7: Data type

7.1 Type Systems

7.2 Type Checking

7.3 Records (Structures) and Variants (Unions)

7.4 Arrays

7.5 Sets

7.6 Lists

7.7 Pointers and Recursive types

Page 66: Chapter 7: Data Types

6666

Pointers

• pointer – a variable whose value is a reference to some object

– pointer use may be restricted or not» only allow pointers to point to heap (e.g. Pascal)» allow “address of” operator (e.g. ampersand in C)

– pointers not equivalent to addresses!– how reclaim heap space?

» explicit (programmer’s duty)» garbage collection (language implementation’s duty)

Page 67: Chapter 7: Data Types

6767

Recursive Types

• recursive type - type whose objects may contain references to other objects of the same type

– Most are records (consisting of reference objects and other “data” objects)

– Used for linked data structures: lists, treesstruct Node {Node *left, *right;int data;

}

Page 68: Chapter 7: Data Types

6868

Recursive Types

• In reference model of variables (e.g. Lisp, Java), recursive type needs no special support. Every variable is a reference anyway.

• In value model of variables (e.g. Pascal, C), need pointers.

Page 69: Chapter 7: Data Types

6969

Value vs. Reference

• Functional languages– almost always reference model

• Imperative languages– value model (e.g. C)– reference model (e.g. Smalltalk)

» implementation approach: use actual values for immutable objects

– combination (e.g. Java)

Page 70: Chapter 7: Data Types

7070

Value Model – More on C

• Pointers and single dimensional arrays interchangeable, though space allocation at declaration differentint a[10]; int *b;

• For subroutines, pointer to array is passed, not full array

• Pointer arithmetic– <Pointer, Integer> addition

int a[10];int n;n = *(a+3);

– <Pointer, Pointer> subtraction and comparisonint a[10];int * x = a + 4;int * y = a + 7;int closer_to_front = x < y;

Page 71: Chapter 7: Data Types

7171

Dangling References

• dangling reference – a live pointer that no longer points to a valid object

– to heap object: in explicit reclamation, programmer reclaims an object to which a pointer still refers

– to stack object: subroutine returns while some pointer in wider scope still refers to local object of subroutine

• How do we prevent them?

Page 72: Chapter 7: Data Types

7272

Dangling References

• Prevent pointer from pointing to objects with shorter lifetimes (e.g. Algol 68, Ada 95). Difficult to enforce

• Tombstones

• Locks and Keys

Page 73: Chapter 7: Data Types

7373

Tombstones

• Idea– Introduce another level of indirection: pointer contain the address of the

tombstone; tombstone contains address of object– When object is reclaimed, mark tombstone (zeroed)

• Time overheads– Create tombstone– Check validity of access– Double indirection

• Space overheads– when to reclaim??

• Extra benefits– easy to compact heap– works for heap and stack

Page 74: Chapter 7: Data Types

7474

Page 75: Chapter 7: Data Types

7575

Locks and Keys

• Idea– Every pointer is <address, key> tuple– Every object starts with same lock as pointer’s key– When object is reclaimed, object’s lock marked (zeroed)

• Advantages– No need to keep tombstones around

• Disadvantages– Objects need special key field (usually implemented only for heap

objects)– Probabilistic protection

• Time overheads– Lock to key comparison costly

Page 76: Chapter 7: Data Types

7676

Page 77: Chapter 7: Data Types

7777

Garbage Collection

• Language implementation notices when objects are no longer useful and reclaims them automatically

– essential for functional languages– trend for imperative languages

• When is object no longer useful?– Reference counts– Mark and sweep– “Conservative” collection

Page 78: Chapter 7: Data Types

7878

Reference Counts

• Idea– Counter in each object that tracks number of pointers

that refer to object– Recursively decrement counts for objects and

reclaim those objects with count of zero

• Must identify every pointer– in every object (instance of type)– in every stack frame (instance of method)– use compiler-generated type descriptors

Page 79: Chapter 7: Data Types

7979

Type descriptors example

public class MyProgram {

public void myFunc () {

Car c;

}

}

public class Car {

char a, b, c;

Engine e;

Wheel w;

}

public class Engine {

char x, y;

Valve v;

}

public class Wheel {...}

public class Valve {...}

Car type descriptor at 0x018

i offset address

0 4 (Engine) 0x0A2

1 5 (Wheel) 0x005

Engine type descriptor at 0x0A2

i offset address

0 3 (Valve) 0xB05

Wheel type descriptor at 0x005

Valve type descriptor at 0xB05

myFunc type descriptor at 0x104

i offset address

0 0 (Car) 0x018

Page 80: Chapter 7: Data Types

8080

Mark-and-Sweep

• Idea… when space low1. Mark every block “useless”2. Beginning with pointers outside the heap,

recursively explore all linked data structures and mark each traversed as useful

3. Return still marked blocks to freelist

• Must identify pointers– in every block– use type descriptors

Page 81: Chapter 7: Data Types

8181

Garbage Collection Comparison

• Reference Count– Will never reclaim circular

data structures– Must record counts

• Mark-and-Sweep– Lower overhead during

regular operation– Bursts of activity (when

collection performed, usually when space is low)

Page 82: Chapter 7: Data Types

8282

Conservative collection

• Idea– Number of blocks in heap is much smaller than number of possible

addresses (232) – a word that could be a pointer into heap is probably pointer into heap

– Scan all word-aligned quantities outside the heap; if any looks like block address, mark block useful and recursively explore words in block

• Advantages– No need for type descriptors– Usually safe, though could “hide” pointers

• Disadvantages– Some garbage is unclaimed– Can not compact (not sure what is pointer and what isn’t)