
8/10/2019 2 Programming Language Concepts Using C and C++_Data Level Structure - Wikibooks, open books for an open


Programming Language Concepts Using C and C++/Data Level Structure

In this chapter, we will start with defining properties common to all data items usable in a programming context and

then move on to classifying data according to their structure and type. While doing so, we will also try to give an idea

of how they can be laid out in memory.

Contents

1 General Properties

1.1 Mutability

1.1.1 Immutable Data

1.1.2 Mutable Data

1.2 Visibility

1.3 Accessibility

2 Data Categories

2.1 Data Elements

2.1.1 Primitive Data Elements

2.1.2 Data Elements that Are Addresses

2.1.3 Compound Data Elements

2.2 Structures

2.2.1 Data Structures

2.2.2 Storage (Memory) Structures

2.2.3 File Structures

3 Data Types

3.1 Relationships between Data Types

3.1.1 Equivalence

3.1.2 Extension

3.1.3 Implementation

3.2 Declarations

3.2.1 Explicit Type Declarations

3.2.2 Implicit Type Declarations

3.3 Scalar Data Types

3.3.1 Numeric Types

3.3.2 Logical (Boolean) Type

3.3.3 Pointer Type

3.4 Structured Data Types

3.4.1 Strings

3.4.2 Arrays

3.4.2.1 Multidimensional Arrays

3.4.2.2 Associative Arrays

3.4.3 Lists

3.4.3.1 Multilists

3.4.4 Dynamic Arrays

3.4.5 Records

3.4.5.1 Variant Records

3.4.5.2 Variable-length Records

3.4.5.3 Alignment Requirements

3.4.6 Sets

3.4.7 Trees

3.4.8 Graphs

3.4.9 User-defined Types

3.4.10 Abstract Data Types

gramming Language Concepts Using C and C++/Data Level Structure... http://en.wikibooks.org/wiki/Programming_Language_Concepts_Using...

30 20-11-2014 13:00



4 Notes

General Properties

Mutability

Immutable Data

A constant is a data item that remains unchanged throughout its lifetime. A constant may be used literally or may be named. Named constants are sometimes termed symbolic constants or figurative constants.

Literal and named constants.

3, '5', .TRUE., and "String literal" are examples of literal constants, whereas const double pi = 3.141592654; is the C++ definition of [an approximation to] π as a named constant.

Some programming languages make a distinction between constants whose values are determined at compile time and those whose values are determined at run time. In C#, for example, the former is tagged with the keyword const while the latter is tagged with readonly. In Java, constancy of a field (or local identifier) is flagged with the final keyword, and the presence or absence of the static modifier classifies the associated data item as a compile-time or run-time constant, respectively.

Example: Compile-time constants in C#.

    public class Math {
      ...
      public const /* static final in Java */ double pi = 3.141592654;
      public const double e = 2.718281;
      ...
    }

Here the values for pi and e (Euler's number) can be determined even before the program starts to run. So they are defined to be const.

Note that the same definitions are valid for all instances, if there are any, of the Math class. That is, the value of pi or e does not change from one instance to another. As a matter of fact, they exist independently of the instances as class fields. In other words, they are [implicitly] static. Explicit use of static together with const in C++, which is a source of inspiration for C#, is a manifestation of this.

Example: Run-time constants in C#.

    public class Citizen {
      ...
      Citizen(..., long SSNoftheNewCitizen, ...) {
        ...
        SSN = SSNoftheNewCitizen;
        ...
      }
      ...
      private readonly long SSN; /* private final long SSN in Java */
    } // end of class Citizen

Here, the value for SSN cannot be determined before the related Citizen object is created. Once the object is created,



      {
        double i1;
        double d;
        ...
      } /* end of inner block */
      ...
    } /* end of outer block */
    ...

In the above code fragment, the scopes of the variables (identifiers) are:

i1 (of outer block), i2: from their declaration points to the end of the outer block

i1 (of inner block), d: from their declaration points to the end of the inner block

The visibilities of the variables are:

i2, i1 (of inner block), d: same as their scope

i1 (of outer block): its scope minus the inner block

Accessibility

A notion found in modular and object-oriented languages, accessibility constraints can be applied to data. This is done for two purposes:

1. To preserve the consistency of the data, and

2. To give the implementer the freedom to change the implementation details.

Example: Access constraints in Java.

    public class Stack {
      ...
      private int _top;
      private Object[] _contents;
    } // end of class Stack

Declaring the implementation details in the above class definition as private marks them off-limits to the user, which means the statements below are not permitted.

    stk._top = 0;
    stk._contents[7] = new Integer(5);

Issuing the first statement would probably cause a nonempty Stack object to look empty. Similarly, the second statement would probably populate the Stack object by adding a new element to some location other than that indicated by _top, which is definitely against the definition of a stack.

Additionally, such a definition enables the implementer of the Stack class to change the implementation details. For instance, she may choose to use a Vector as the underlying data structure. Since this choice is not known to the outside world, the users of the Stack class will not be affected by it.

Data Categories

Data Elements

The most basic data entity is the data element. These data entities may be grouped together to form structures.

Primitive Data Elements



Primitive data elements are those that can be directly operated on by machine-language instructions and are broadly divided into numeric, character, and logical (boolean) elements. Numeric data elements can further be divided into integers and reals.

Example: Primitive data elements in C/C++.

In C/C++, char, short, int, long, and long long are used for representing integers. char can also be interpreted as holding a single-byte character. float, double, and long double represent floating-point single, double, and extended precision values. bool is used to represent boolean values.[1]

Data Elements that Are Addresses

An address is a value that indicates a location in the process image created as a result of running the program. Depending on the program segment it points to, an address falls into one of two groups.

1. A label is actually the address of a program statement, that is, an address in the code segment of a program. It may also be thought of as a data element that may be operated upon by a goto operation.

2. Pointers reference or point to other elements in the code and data segments. In the former case, a pointer indicating the start of a subprogram can be used to invoke the subprogram dynamically, which enables implementation of polymorphic subprograms by means of callbacks. Pointers into the data segment often refer to unnamed data elements. They are heavily used in constructing dynamic data structures and recursive algorithms. Handles, which may be considered as intelligent pointers, serve the same purpose.

Compound Data Elements

Strings of characters, generally referred to simply as strings, are linear sequences of characters. They are sometimes considered primitive data elements, because they are directly operated on by machine-language instructions, and sometimes classified as data structures.

Structures

Data Structures

Data structures are organized collections of data elements that are subject to certain allowable operations. Data structures are logical entities in the sense that they are created by programmers and are operated on by high-level programs. These may have little bearing on the physical entities, that is, the storage structures operated on by machine language code.

Data structures may be classified as:

Linear vs. nonlinear: A linear, as opposed to nonlinear, data structure is one in which the individual components form an ordered sequence. Examples of linear data structures are strings, arrays, and lists. Examples of nonlinear data structures include trees, graphs, and sets.

Static vs. dynamic: A static structure is one that has no capacity for change, specifically with regard to its size, during the course of execution. Arrays and records are examples of static data structures. Lists are an example of dynamic data structures.

Storage (Memory) Structures

Storage structures are data structures after they have been mapped to memory. While the data structure is the logical organization of your data, the storage structure represents the way in which your data is physically stored in memory during the execution of your program.



Possible layouts for multi-dimensional arrays.

Suppose you are working with a two-dimensional array. Your array, say n-by-m, is not actually stored in two

dimensions in the memory, but as a linear sequence of elements. In row-major order, the array is stored as

follows:

A(1, 1), A(1, 2), ..., A(1, m), A(2, 1), ..., A(n, 1), A(n, 2), ..., A(n, m)

In column-major order, the same array is laid out in memory as follows:

A(1, 1), A(2, 1), ..., A(n, 1), A(1, 2), ..., A(1, m), A(2, m), ..., A(n, m)

Storage can be allocated in two ways:

Sequential: A structure allocated this way may also be called a static structure since it is incapable of change

throughout its lifetime. Such structures use implicit ordering: components are ordered by virtue of their

sequential ordering in the structure (indexing).

Linked: A structure allocated this way may also be called a dynamic structure since the data structure can

grow and shrink during its lifetime. They use explicit ordering: each component contains within itself the

address of the next item so that it is in effect "pointing" to its own successor.

As for the pros and cons of each method:

In static structures, full storage remains allocated during the entire lifetime of the structure. Additionally, in

order to avoid overflow, maximum theoretical size is used in the declaration of the structure. For these reasons,

static structures are not memory efficient. In dynamic structures, there is a space overhead due to the pointer

field(s).

Due to the possibility of shifting, which can be avoided at the expense of some extra memory, insertions into

and deletions from any position other than the end of a static structure are expensive. Such is not the case for

dynamic structures.

Using the index value, direct access is possible in static structures, which means accessing any item in the structure takes constant time. This does not hold for dynamic structures, although the weakness can be alleviated by imposing a hierarchical structure on the data.

File Structures

File structures refer to data residing in secondary storage. They are the only structures that survive the termination of the program.

The data hierarchy refers to the logical organization of data that is probably stored on secondary or external storage

media such as magnetic tape.

A file is a collection of related records, related to a particular application. A record is a collection of related data

items, or fields, related to a single object of processing. A field is a data item, a piece of information (either a data

element or a structured data item) that is contained within the record.

A file can be accessed for input, output, input/output, and append. It can be processed in two different modes: batch mode and query mode. In batch mode, the component records of the file are operated on in sequential order. Report generation for test results of a class is an example of this type of processing. In query mode, individual records are manipulated by accessing them directly. Retrieving the record of one single student falls into this category.

Another important issue is the organization of the file on secondary storage. This (physical) ordering of records can

be done in three ways:

Sequential: The file is seen as a linear sequence of records. Such files cannot be accessed in input/output access mode. Text files are typical examples of sequential files.

Relative: Each record in the file can be accessed directly by location. Naturally, such organization becomes

possible only when the file is stored on a direct-access storage device (DASD). The mapping between the key

field in the record and the location on disk can be done in two ways: direct-mapping and hashing.



Indexed sequential: This is a compromise between the first two methods. Files are stored sequentially on a

DASD but there is also an index file that allows optimum direct access by way of a search on index.

Each file organization technique must be evaluated on the basis of:

Access time: The time it takes to find a particular data item.

Insertion time: The time it takes to insert a new data item. This includes the time it takes to find the correct

place to insert the new data item as well as the time it takes to update the index structure.

Deletion time: The time it takes to delete a data item. This includes the time it takes to find the item to be deleted as well as the time it takes to update the index structure.

Space overhead: The additional space occupied by an index structure.

These in turn are affected by three factors. The system must first move the head to the appropriate track or cylinder. This head movement is called a seek and the time it takes to complete is the seek time. Once the head is at the right track, it must wait until the desired block rotates under the read-write head. This delay is called latency time. Finally, the actual transfer of data between the disk and main memory can take place. This last part is the transfer time.

Typical operations on files are: open, close, read, write, EOF, and maintenance operations such as sorting, merging,

updating, and backup.

Data Types

A data type, usually referred to simply as a type, is composed of a domain of data elements and a set of operations that act on those elements: operations that can construct, destroy, or modify instances of those data elements.

In addition to built-in types, which can be structured as well as primitive data types, most languages have facilities

for the definition of new data types by the user. [ALGOL68 and Pascal were the first programming languages to

provide this.]

A type system is a facility for defining new types and for declaring variables to be of such types. A type system may

also have the capability for type checking, which may be static or dynamic depending on whether this checking is

done at compile time or during execution.

Example: (Restricted) type system of C.

Basic types: char, short, int, long, float, double

Type constructors: *, [], (), struct, union

    typedef char Performance[2];
    typedef char* String;

    typedef struct {
      String first_name;
      String last_name;
      Performance grades;
    } Student;

    typedef void (*EVAL_PERF)(Student* Class);
    ...

A strongly typed programming language is one in which the types of all variables are determined at compile time. Programs written in such a language, which is said to have static typing, must explicitly declare all programmer-defined words. Storage requirements for global and local variables are determined completely during compile time. A strongly typed programming language may include a typing facility for defining new types.

With dynamic typing, variables are not bound to type at compile time. Languages that allow for dynamic typing of variables, which are also classified as weakly typed, may utilize dynamic type checking, which requires the presence



    var
      v1, v2: arrtype1;
      v3: arrtype2;

In the above fragment, v1 and v2 have equivalent types, while the type of v3 is different from that of these variables.

Extension

A type is said to extend another if all of its instances can be seen as instances of the type it extends. The base type, the one being extended, is called a generalization of the extended type, while the extended type is said to be a specialization of its base type.

Thanks to the nature of the relation, an expression of the extended type is assignment-compatible with a variable of the base type.

The most popular technique used to provide this relation is inheritance, common to most object-oriented programming languages.

Implementation

A type is said to implement another if it provides an implementation for all operations listed in the interface of the

implemented type.

Implementation can be seen as a special case of extension, where the base type does not provide any implementation

of its operations.

In many object-oriented programming languages, such a relation is found between an interface and its implementing class.

Declarations

In languages with static type checking, the program must somehow communicate the types of the identifiers it uses.

Complete knowledge of identifier types at compile time leads to a more efficient program because of the following

reasons.

More efficient allocation of storage: For instance, all integer types can be stored similarly using the largest such type. But if the exact type is known, the largest size does not have to be allocated.

More efficient routines at run time: A + B is handled differently depending upon whether A and B are integers or real numbers.

Compile-time checks: Many invalid uses of programming constructs are spotted before the program even starts to run.

On the other hand, as the price of ensuring type safety, compilers may at times reject valid programs.

Identifier types can be communicated in two ways.

Explicit Type Declarations

Widely preferred over the alternative, explicit declaration has the programmer use declarative statements to communicate the types of variables, functions, and so on. Some programming languages, such as ML and Haskell, do not require the programmer to provide type declarations for all identifiers. Through a process called type inference, the compiler does its best to figure out the types of expressions.

Implicit Type Declarations

In (some versions of) languages like FORTRAN and BASIC, the way a variable is named reveals its type.



Implicit type declarations in Fortran.

In versions of FORTRAN before FORTRAN 90, an identifier beginning with one of the letters I through N is taken to be an integer, and an identifier beginning with any other letter is taken to be real.

Scalar Data Types

A scalar data type has a domain composed only of individual primitive data elements.

Numeric Types

Numeric types are related to or represent quantities in the outside world. However, whereas in real life these types

may have infinite domains, in the world of computers their domains are finite.

Numeric types.

Integer, floating-point, fixed-point, and complex numbers.

Logical (Boolean) Type

Variables of such a type can take on only two values, true or false, which may be represented in the machine as 1 and 0, or as non-zero and zero. Typical operations on booleans are and, or, and not.

Pointer Type

A pointer is a reference to an object or data element. A pointer variable is an identifier whose value is a reference to an object.

Pointers are important for the dynamic allocation of a previously undetermined amount of data. Additionally, they

permit the same data to reside on different structures simultaneously. In other words, they make sharing of data

possible.

Pointer variables point to and provide the means for accessing unnamed, or anonymous variables. Consequently,

operations on pointers must distinguish between operations on the pointer variable itself and operations on the

quantity to which the pointer is pointing.

Example: Pointers in C/C++

    1. int num1, num2, *pnum;
    2. ...
    3. ...
    4. pnum = &num1;
    5. num1 = 15;
    6. num2 = *pnum;
    7. ...

Memory layout after line 6

pnum in the above example is used to access a location holding an int value. As a matter of fact, it can hold a reference to ints only. This is the most important difference between a pointer and a plain address: while an address can be used to refer to values of any type, a pointer holds a reference to a specific data type. However, a precious tool in implementing generic collections in C, type agnosticism of addresses can be reclaimed by disciplined use of void *.

In C, where pointers are heavily used, advantages may turn into maintenance nightmares. The following program is an example of this.

Example: Pointer pitfall in C.

     1. #include <stdio.h>
     2. #include <stdlib.h>
     3. int main(void) {
     4.   int ch = 65;
     5.   int *p2i;
     6.   const int *p2ci = &ch;
     7.
     8.   p2i = p2ci; /* !!! */
     9.   printf("p2ci: %i\tp2i: %i\n", *p2ci, *p2i);
    10.   *p2i = ch + 1; /* !!! */
    11.   printf("p2ci: %i\tp2i: %i\n", *p2ci, *p2i);
    12.
    13.   exit(0);
    14. } /* end of int main(void) */

When we compile and run this program, it will produce the following output.

p2ci: 65 p2i: 65

p2ci: 66 p2i: 66

This is a violation of the contract we have made. On line 6, we guarantee that the content of the location pointed to by p2ci will not change. On line 8, we assign p2ci to p2i, which is another pointer, one that permits updates to the location it points to. We later go on to change the value found at the location through the non-constant pointer p2i, which means the value pointed to by p2ci is also changed.

The compiler may issue a warning for this error, which is the case in the GNU C compiler. But you cannot rely on

this: not all compilers will issue a warning. Sometimes, the programmer will turn off the warnings to avoid reading

annoying messages and the message will go unnoticed.

Structured Data Types

A structured data type has domain elements that are themselves composed of other scalar or structured type

elements.

Strings

A string is an ordered sequence of data elements whose size can change dynamically.

Two attributes can be associated with a string: type and length. Type refers to the domain of the individual elements while length refers to the number of elements. (Note that length and size are two different things.)

The type attribute is generally character, although bit strings are also commonly used for implementing sets.

Example: Character strings in C/C++.

    const char *name = "Atilla";

Another possible representation of the same string in a different language would be:

Note that the number of bytes reserved for the length prefix can change from implementation to implementation.

Another point to keep in mind: ASCII is not without competition. Alternatives include Unicode and the ISO/IEC 8859 series of ASCII-based encodings. In the case of Unicode encoded as UTF-16, each character in the Basic Multilingual Plane takes up two bytes in memory.[2]

Example: BSTR type used in [OLE] Automation.[3]

    CString name = _T("Atilla");
    BSTR bstrName = name.AllocSysString();

As hinted by the above figure, BSTR is used for exchanging character string data between components possibly written in different languages. As a matter of fact, the length prefix is inherited from Visual Basic while the terminating null character is taken from C.

Note that the length prefix holds the number of bytes the character string proper occupies and the bstrName identifier is actually a pointer to the first character.

Typical operations on strings are:

Concatenation creates a string from smaller strings.

Substring creates a string from a subsequence of another string.

Index tests for containment of a smaller string in a larger string. It returns the index value where the subsequence containing the smaller string starts.

Length returns the number of components in the string.

Insert, delete, replace, ...

Arrays

An array may be defined as a fixed-size, ordered set of homogeneous data. It is stored in contiguous memory locations and this makes direct access possible. Its fixed size makes an array a static structure.

In some programming languages such as Standard Pascal, the size must be known at compile time, while in many others it can also be given at run time. But, even in the latter case, the size of the array does not change during its lifetime.

Example: Static nature of arrays in Standard Pascal.

    program primes(input, output);
    const N = 1000;
    var a: array [1..N] of boolean;
    ...
    begin
      ...
    end.

This Pascal fragment must be recompiled and run for a different value of N.

Example: Open arrays in Oberon.

    PROCEDURE SetZero (VAR v: ARRAY OF REAL);
    VAR j: INTEGER;
    BEGIN
      j := 0;
      WHILE j < LEN(v) DO v[j] := 0; INC(j) END
    END SetZero;

In the above Oberon fragment, any actual (one-dimensional) array parameter with element type REAL is compatible with v. The subprogram can be called with a one-dimensional array of any size. But, once the array is passed as an argument, its size cannot change.

Among the important attributes of an array are component type, dimensionality of the array, and size in each

dimension.

Arrays can be represented in the memory in two different ways:

1. Row-major (almost all major programming languages)

2. Column-major (FORTRAN)

In order to minimize the access time to individual array components, we should make sure that fastest changing

indexes in our programs (that is, the innermost loop variables) correspond to the fastest changing indexes in the

memory layout. If we ignore the loop ordering, with large multidimensional arrays, the virtual memory performance

may suffer badly.

Example: Multidimensional array usage in Pascal.

    var a: array [3..5, 1..2, 1..4] of integer;
    ...
    for i := 3 to 5 do
      for j := 1 to 2 do
        for k := 1 to 4 do
          {some processing done using a[i, j, k]}

This program fragment shows the correct loop order for a language that represents arrays using row-major representation. Note that the innermost loop variable (the fastest changing one) in the fragment corresponds to the fastest changing index in the memory layout. This correspondence must be extended to the other loop variables and indexes, i.e. the second innermost loop variable must correspond to the second fastest changing index in memory, and so on.

Example: Multidimensional array usage in Fortran.

    DO k = 1, 4
      DO j = 1, 2
        DO i = 3, 5
          {some processing done using a(i, j, k)}
        END DO
      END DO
    END DO

This second fragment gives the correct loop order for a language with column-major representation.



It should be noted that header information included in the array representation is not standard and may vary or even disappear in some programming languages. The most notable example of this is Java, in which the size of the array is not used in type checking. This means one can use an array handle to manipulate arrays of different sizes.

Example: Type-compatibility of arrays in Java.

    int[] intArray = new int[10];
    ... // use intArray as an array of 10 ints
    intArray = new int[20]; // OK!
    ... // use intArray as an array of 20 ints

This may at first seem to contradict the static nature of arrays. After all, the size of intArray has been changed from 10 to 20. Not really! What we have done in the previous code fragment is to make intArray indicate two different array objects in the heap. In other words, we have changed the array handle, not the array object itself.



Determining the address of an element in a Pascal-style array.

Given the pseudo-Pascal declaration

    var a: array [5..10, 0..3, -2..2] of integer;

calculate the address of a[7, 2, 1] assuming a) column-major representation and b) row-major representation. For both cases, assume that the base of a is 200 and an integer is represented in four bytes.

a) (10 - 5 + 1) * (3 - 0 + 1) * (1 - (-2)) + 1 gives us the order of the component at [5, 0, 1]. We need (10 - 5 + 1) * (2 - 0) more to get to [5, 2, 1]. (7 - 5) more and we reach [7, 2, 1]. So, [7, 2, 1] is the (6 * 4 * 3 + 1) + (6 * 2) + 2 = 87th component. It can be found at address 200 + (87 - 1) * 4 = 544.

The above calculation can be generalized as follows:

In C-based programming languages, where the lower bound is always taken to be 0, this formula can be

simplified to:

Memory address of a particular component is given as follows:[4]

b) (2 - (-2) + 1) * (3 - 0 + 1) * (7 - 5) + 1 gives us the order of the component at [7, 0, -2]. (2 - (-2) + 1) * (2 - 0) more and we are at [7, 2, -2]. [7, 2, 1] is (1 - (-2)) locations after [7, 2, -2]. So, [7, 2, 1] is the (5 * 4 * 2 + 1) + (5 * 2) + 3 = 54th component. Its address is 200 + (54 - 1) * 4 = 412.

After generalization we have:

For C-based programming languages this is simplified to:



Determining the address of an element in a C-style array.

Given the pseudo-C declaration

    double a[13][10][9];

calculate the address of a[7][6][8] assuming a) column-major representation and b) row-major representation. For both cases, assume that the base of a is 200 and a double is represented in eight bytes.

a)

b)

Multidimensional Arrays

Support for multidimensional arrays can be provided in two ways: jagged arrays, sometimes referred to as ragged arrays, and rectangular arrays.[5]

Example: Jagged arrays in Java.

    int[][] numArr = { {1, 2, 3}, {4, 5, 6, 7}, {8, 9} };

What we have here is actually an array of arrays. Since different arrays can possibly have differing component counts, sub-arrays in our example can and do have different lengths. The same array can be formed using the following code, which reflects the way arrays are treated in Java.

    int[][] numArr = new int[3][];
    numArr[0] = new int[3];
    for (int i = 0; i



This leads us to the conclusion, which is also reflected in the layout given before, that Java-style multidimensional

arrays are not necessarily allocated in contiguous memory locations.[6]

Example: Rectangular arrays in C#.

    int[,] numMatrix = { {1, 2}, {3, 4}, {5, 6} };

Here, we have a 3-by-2 matrix. It is also guaranteed that the entire matrix is allocated in contiguous memory locations.

Associative Arrays

Array index is usually of a scalar type although languages such as Perl and Tcl provide for non-scalar indexes

through the use of hashing.5 Such arrays are called associative arrays.

Example: Associative arrays in Perl.

    $dictionary{'word'} = 'sözcük, kelime';
    $grade{'Emrah'} = 90;
    $dictionary{'sentence'} = 'tümce, cümle';

Thanks to the operator name overloading facility, languages such as C++ and C# incorporate such a facility by

means of a class that overloads the subscript operator ([ ] ).

Example: Associative arrays in C#.

    Hashtable dictionary = new Hashtable();
    ...
    dictionary["word"] = "sözcük, kelime";
    dictionary["sentence"] = "tümce, cümle";
    ...
    Console.Write("sentence in Turkish is {0}", dictionary["sentence"]);
    // will print "sentence in Turkish is tümce, cümle" on the standard output

In languages that lack operator name overloading, such as Java, one has to resort to using the relevant class with its

messages.

Example: Associative arrays in Java.

    Map dictionary = new HashMap();
    ...
    dictionary.put("word", "sözcük, kelime");
    dictionary.put("sentence", "tümce, cümle");
    ...
    System.out.print("sentence in Turkish is " + dictionary.get("sentence"));
    // will print "sentence in Turkish is tümce, cümle" on the standard output

Lists

Lists are generalized, dynamic, linear data structures. A (linked) list is a set of items organized sequentially, just like

an array. In an array, the sequential organization is provided implicitly (by the position in the array); in a list, we use

an explicit arrangement in which each item is part of a node that also contains a link to the next node.

Example: Linked lists in Pascal.

type
  link = ^node;
  node = record
    key : integer;
    next : link
  end;
var
  head : link;
...

Depending on how links are provided, we have:

Singly linked lists: These provide us only with a forward pointer pointing to the next node in the list.

Doubly linked lists: These provide us with two pointers, one pointing to the next and one pointing to the

previous node in the list.

Another classification is possible with how the first and last nodes in the list are related:

Circular lists: In circular lists, the first and last nodes in the list are connected with link(s). In a circular doubly linked list, the next link of the last node points to the first node and the previous link of the first node points to the last node of the list. In the case of a circular singly linked list, only the former link exists; there is no pointer from the first node to the last one.

Linear lists: In linear lists, there are no links between the first and last nodes. The next field of the last node

and the previous field of the first node point nowhere, i.e. they are null.

In the implementation of a list, one can make use of a header and/or a dummy end node.

Operations on lists are:

Creation/destruction operations.

Insert: Before/after a node with a certain key value; at the end; at the beginning.

Remove: A component with a certain key value; the first, last component.

Search: A component with a certain key value.

Empty: Test whether or not the list is empty.
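The operations listed above can be sketched for a linear, singly linked list as follows. This is only a minimal illustration; the names List, insert_front, remove_key and the others are assumptions of the sketch, not part of any library.

```cpp
// A node of a linear, singly linked list of integers.
struct Node {
    int key;
    Node* next;   // null in the last node
};

// The list itself; head is null when the list is empty.
struct List {
    Node* head = nullptr;
};

// Empty: test whether or not the list is empty.
bool is_empty(const List& l) { return l.head == nullptr; }

// Insert: at the beginning.
void insert_front(List& l, int key) {
    l.head = new Node{key, l.head};
}

// Search: returns the first node holding key, or null if there is none.
Node* search(const List& l, int key) {
    for (Node* p = l.head; p != nullptr; p = p->next)
        if (p->key == key) return p;
    return nullptr;
}

// Remove: unlinks and frees the first node holding key.
// Returns false if no such node exists.
bool remove_key(List& l, int key) {
    // link always addresses the pointer that leads to the current node,
    // so the head and interior nodes are handled by the same assignment.
    for (Node** link = &l.head; *link != nullptr; link = &(*link)->next) {
        if ((*link)->key == key) {
            Node* victim = *link;
            *link = victim->next;
            delete victim;
            return true;
        }
    }
    return false;
}

// Destruction: frees every node and leaves the list empty.
void destroy(List& l) {
    while (l.head != nullptr) {
        Node* next = l.head->next;
        delete l.head;
        l.head = next;
    }
}
```

A doubly linked or circular variant would add a previous pointer or close the chain, at the cost of more pointer updates per operation.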

Some commonly used, specialized (restricted) forms of lists are stack (LIFO), queue (FIFO), deque (Double-ended

queue), output-restricted deque, and input-restricted deque.

Multilists

Multilists are similar to lists. The only difference is that nodes reside on more than one list simultaneously.

Example: Pascal representation of a sparse matrix using a multilist.

type
  link = ^node;
  node = record
    row_no, column_no : integer;
    key : integer;
    next_row, next_column : link
  end;



Dynamic Arrays

A cross between arrays and lists, dynamic arrays can grow or shrink at run time. Known to the Java world as Vector and to the C# world as ArrayList, a dynamic array maintains an internal, heap-allocated array and replaces it with a larger (smaller) one as the need arises.
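The replacement of the internal array can be sketched as follows. The class name IntVector and the growth policy (double when full, halve when only a quarter is in use) are assumptions made for this illustration; actual libraries choose their own policies.

```cpp
#include <algorithm>
#include <cstddef>

// A growable array of int backed by a heap-allocated buffer.
class IntVector {
public:
    IntVector() : data_(new int[1]), size_(0), capacity_(1) {}
    ~IntVector() { delete[] data_; }
    IntVector(const IntVector&) = delete;             // copying is left out
    IntVector& operator=(const IntVector&) = delete;  // to keep the sketch short

    void push_back(int value) {
        if (size_ == capacity_) replace_buffer(capacity_ * 2);  // grow
        data_[size_++] = value;
    }

    // Precondition: size() > 0.
    void pop_back() {
        --size_;
        // Shrink only when a quarter is in use, so that alternating
        // push/pop calls near a boundary do not reallocate every time.
        if (capacity_ > 1 && size_ <= capacity_ / 4)
            replace_buffer(capacity_ / 2);
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    // Allocates a buffer of the new capacity, copies the live elements
    // into it, and releases the old buffer.
    void replace_buffer(std::size_t new_capacity) {
        int* fresh = new int[new_capacity];
        std::copy(data_, data_ + size_, fresh);
        delete[] data_;
        data_ = fresh;
        capacity_ = new_capacity;
    }

    int* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Because the buffer is contiguous, components are still accessed by subscript in constant time; only the occasional replacement costs time proportional to the number of components.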

Records

A record as a logical data structure is a fixed-size, ordered collection of possibly heterogeneous components that

may themselves be structured. It may also be called the hierarchical or structured type. Record components, often

called fields, are accessed by name rather than by subscript.

Example: Record type definition in COBOL.

01 STUDENT-RECORD.
   02 NAME.
      03 FIRST-NAME     PIC X(15).
      03 MIDDLE-INITIAL PIC X.
      03 LAST-NAME      PIC X(15).
   02 STUDENT-NO        PIC 9(9).
   02 TEST-SCORES.
      03 MIDTERM        PIC 9(3).99.
      03 FINAL          PIC 999.99.
   02 ASSIGNMENTS OCCURS 5 TIMES PICTURE IS 9(3).9(2).

Variant Records

Some languages allow us to have variations of a record. Such a record with variations is called a variant record. This

means that there will be some fields that are common to all variations and some fields that are unique to each

variation.

Example: Variant records in C.

enum shape_tag {Circle, Square};

struct SHAPE {
  float area;
  enum shape_tag tag;
  union {
    float radius;
    float edge_length;
  };
};
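How such a variant record might be used can be sketched in C++ as follows. The type mirrors the fields of the C fragment above; the function names make_circle, make_square, and perimeter, as well as the use of an anonymous union, are assumptions of this sketch.

```cpp
enum ShapeTag { Circle, Square };

struct Shape {
    float area;        // field common to all variations
    ShapeTag tag;      // records which variation is currently stored
    union {            // fields unique to each variation share one storage area
        float radius;
        float edge_length;
    };
};

const float kPi = 3.14159265f;

Shape make_circle(float radius) {
    Shape s;
    s.tag = Circle;
    s.radius = radius;
    s.area = kPi * radius * radius;
    return s;
}

Shape make_square(float edge_length) {
    Shape s;
    s.tag = Square;
    s.edge_length = edge_length;
    s.area = edge_length * edge_length;
    return s;
}

// The tag must be consulted before touching the variant part; reading the
// member that was not stored last is not checked by the compiler.
float perimeter(const Shape& s) {
    switch (s.tag) {
        case Circle: return 2.0f * kPi * s.radius;
        case Square: return 4.0f * s.edge_length;
    }
    return 0.0f;  // unreachable for valid tags
}
```

Since radius and edge_length overlap in memory, the record is only as large as its largest variation, not the sum of all of them.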



Before we see examples of how records are laid out in memory, let's look at a rather important issue affecting it: alignment requirements.

Alignment Requirements

For reasons of efficiency, some architectures do not let a data element start at an address that is not a multiple of the data element's size. As a result of this alignment, it takes fewer memory accesses to fetch the data required in the program. This improvement in running speed, however, comes at the expense of more memory.

It should be noted that the alignment scheme offered in the following examples is not the only one. Programming

environments may in some way let you alter the way data is aligned. As a matter of fact, programming environments

built on top of the Intel architecture may even let you turn on and off alignment.

Alignment requirement of double

An IEEE 754 double-precision floating-point number is represented in 8 bytes. A data type compatible with this specification and implemented on an architecture with the alignment requirement stated above cannot start at memory location, say, 4 or 6. It must start at a location whose address is divisible by 8. So, it can start at 0, 8, ..., 33272, ... .

The alignment requirement for a record type will be at least as stringent as for the component having the most

stringent requirements. The record must terminate on the same alignment boundary on which it started.

Example: Alignment in records.

struct S {
  double value;
  char name[10];
};

struct S {
  char name[10];
  double value;
};

Note that the alignment requirement of an array is equal to the alignment requirement of its component. Whether the array size is 1 or a million bytes, its alignment requirement is always the same. This is due to the fact that an array declaration is shorthand for defining multiple variables. The first definition above should actually be considered as given below:

struct S {
  double value;
  char name0, name1, name2, ..., name9;
};

So, the alignment requirement of both structure definitions is equal to that of value: 8.
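The effect on layout can be checked with the alignof, sizeof, and offsetof facilities of C++. The sketch below renames the two definitions to ValueFirst and NameFirst so that they can coexist; these names and the helper functions are assumptions of the illustration. On a typical platform where double is 8 bytes with an alignment of 8, both records are expected to occupy 24 bytes although their fields add up to only 18.

```cpp
#include <cstddef>

struct ValueFirst {
    double value;
    char name[10];
};

struct NameFirst {
    char name[10];
    double value;
};

// Smallest multiple of alignment that is not less than offset.
constexpr std::size_t round_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Offset at which value is expected to land when it follows the 10-byte
// name: the next address that satisfies the alignment of double.
constexpr std::size_t expected_value_offset() {
    return round_up(sizeof(char[10]), alignof(double));
}

// Expected total size of NameFirst: the record must also end on its own
// alignment boundary, so trailing padding may be added.
constexpr std::size_t expected_size() {
    return round_up(expected_value_offset() + sizeof(double),
                    alignof(NameFirst));
}
```

With 8-byte alignment, NameFirst carries 6 bytes of padding between name and value, while ValueFirst carries the same 6 bytes after name, at the end of the record.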

Example: Alignment requirements in variant records.

type
  tagtype = (first, second);
  vtype = record
    f1 : integer;
    f2 : real;
    case c : tagtype of
      first : (f3, f4 : integer);
      second : (f5, f6 : real)
  end;

var
  v : vtype;

If we change the selector of the variant part to case tagtype of, the tag field c is no longer stored in the record.

Depending on the architecture, this saves us 4 to 8 bytes for each record. But it leaves us without type safety.

Sets

A set is a nonlinear structure containing an unordered collection of distinct values. Typical operations on sets are

insertion of an individual component, deletion of an individual component, union, intersection, difference, and

membership.

Example: Sets in Pascal.

type
  days = (sun, mon, tues, wed, thur, fri, sat);
  dayset = set of days;
var
  weekdays, weekend : dayset;
...
begin
  weekdays := [mon, tues, wed, thur, fri];
  weekend := [sun, sat];
  ...
  ...
end.

Base types in sets are restricted to scalar types and, due to the storage requirements, the number of potential members is severely limited.
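One common way to store such a set is a bit vector with one bit per potential member, which also explains why the number of potential members has to be small. A minimal sketch using std::bitset follows; the names DaySet, make_set, and contains are made up for the illustration.

```cpp
#include <bitset>
#include <initializer_list>

// day_count is not a day; it only records how many enumerators precede it.
enum Day { sun, mon, tues, wed, thur, fri, sat, day_count };

// One bit per potential member of the set.
using DaySet = std::bitset<day_count>;

// Builds a set from a list of members; duplicates are harmless because
// setting a bit twice has no further effect.
DaySet make_set(std::initializer_list<Day> members) {
    DaySet s;
    for (Day d : members) s.set(d);
    return s;
}

// Membership test.
bool contains(const DaySet& s, Day d) { return s.test(d); }

// Union, intersection, and difference map onto bitwise operators:
//   a | b    union
//   a & b    intersection
//   a & ~b   difference
```

For example, the union of the weekdays and weekend sets has all seven bits set, while their intersection has none.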



Trees

A tree is a nonempty collection of nodes and edges that satisfies certain requirements. A node is a simple object

containing link(s) to other nodes and some value; a link to another node is called an edge.

In a tree,

The predecessor of a node is called its parent.

The successor(s) of a node is (are) called its child(ren).

Nodes without successors are called leaves.

The node without a predecessor is called the root.

No node is unreachable from the root.

There is exactly one path between the root and any other node.

Trees are encountered frequently in everyday life. For example, many people keep track of their ancestors and/or

descendants with a family tree. As a matter of fact, much of the terminology derives from this usage. Another

example is found in the organization of sports tournaments.

Example: Representing binary trees in Pascal.

type
  link = ^node;
  node = record
    info : char;
    left_child, right_child : link
  end;
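A C++ counterpart of this representation, together with two functions that rely on the terminology introduced above, might look as follows. The names TreeNode, is_leaf, count_leaves, and height are assumptions of this sketch.

```cpp
// A binary tree node: a value plus links (edges) to at most two children.
struct TreeNode {
    char info;
    TreeNode* left_child;
    TreeNode* right_child;
};

// A node without successors is a leaf.
bool is_leaf(const TreeNode* n) {
    return n->left_child == nullptr && n->right_child == nullptr;
}

// Counts the leaves reachable from root; an empty subtree has none.
int count_leaves(const TreeNode* root) {
    if (root == nullptr) return 0;
    if (is_leaf(root)) return 1;
    return count_leaves(root->left_child) + count_leaves(root->right_child);
}

// Number of edges on the longest path from root down to a leaf.
// By the convention chosen here, an empty subtree has height -1.
int height(const TreeNode* root) {
    if (root == nullptr) return -1;
    int l = height(root->left_child);
    int r = height(root->right_child);
    return 1 + (l > r ? l : r);
}
```

Both functions terminate because there is exactly one path from the root to each node, so no node is visited twice.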

Graphs

Similar to a tree, a graph is a collection of nodes and edges. Unlike trees, graphs do not have the notion of a root, a leaf, a parent, or a child. A node, also called a vertex, can live in isolation; that is, it can be unreachable. For some vertex pairs, there can be more than one path connecting them to each other.

A great many problems are naturally formulated using graphs. For example, given an airline route map of Europe, we

might be interested in questions like: "What's the fastest way to get from Izmir to St. Petersburg?" It's very likely that

many city pairs have more than one path connecting them. Another example is found in finite-state machines.

A graph can be represented in two ways:

1. Adjacency matrix representation

A V-by-V array of boolean values, where V is the number of vertices, is maintained, with a[x, y] set to true if there is an edge from vertex x to vertex y and false otherwise.

Example: Adjacency matrix representation in Pascal.

program adjmatrix (input, output);
const
  max_no_of_vertices = 50;
type
  matrix_type = array [1..max_no_of_vertices, 1..max_no_of_vertices] of boolean;
var
  a : matrix_type;
...
...



2. Adjacency-structure representation

In this representation all the vertices connected to each vertex are listed on an adjacency list for that vertex.

Example: Adjacency structure representation in Pascal.

program adjlist (input, output);
const
  max_no_of_vertices = 100;
type
  link = ^node;
  node = record
    v : integer;
    next : link
  end;
  bucket_array = array [1..max_no_of_vertices] of link;
var
  adj : bucket_array;
...
...

While the adjacency matrix is the better choice for dense graphs, the adjacency-structure representation turns out to be a more feasible solution for sparse graphs.
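The trade-off can be made concrete with a small C++ sketch of both representations for a directed graph. The type names MatrixGraph and ListGraph and the member stored_entries are assumptions of this illustration; vertices are numbered from 0 here, unlike the 1-based Pascal arrays.

```cpp
#include <cstddef>
#include <vector>

// Adjacency matrix: V*V booleans, no matter how few edges exist.
struct MatrixGraph {
    std::vector<std::vector<bool>> a;

    explicit MatrixGraph(std::size_t v) : a(v, std::vector<bool>(v, false)) {}

    void add_edge(std::size_t x, std::size_t y) { a[x][y] = true; }
    bool has_edge(std::size_t x, std::size_t y) const { return a[x][y]; }
    std::size_t stored_entries() const { return a.size() * a.size(); }
};

// Adjacency structure: one list per vertex holding its neighbors, so the
// storage is proportional to the number of edges actually present.
struct ListGraph {
    std::vector<std::vector<std::size_t>> adj;

    explicit ListGraph(std::size_t v) : adj(v) {}

    void add_edge(std::size_t x, std::size_t y) { adj[x].push_back(y); }

    // Has to scan the list of x, whereas the matrix answers in one lookup.
    bool has_edge(std::size_t x, std::size_t y) const {
        for (std::size_t w : adj[x])
            if (w == y) return true;
        return false;
    }

    std::size_t stored_entries() const {
        std::size_t total = 0;
        for (const auto& list : adj) total += list.size();
        return total;
    }
};
```

For a graph with 50 vertices and only two edges, the matrix holds 2500 entries while the adjacency structure holds two.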

User-defined Types

Besides enhancing the readability and clarity of the program text, user-defined types make it possible, first, to compose a complicated data structure once and then create as many instances (variables of that type) of it as necessary and, second, to use the language's own type-checking facility for input data validation such as range or consistency checking.

User-defined types come in two flavors:

1. Enumeration types

An enumeration type provides for the enumeration of the domain of the type by the programmer. The domain

values are listed in a declarative statement.

Example: Enumeration type in C++

In C++, instead of

const int input = 1;
const int output = 2;
const int append = 3;
...
bool open_file(string file_name, int open_mode);
...
if (open_file("SalesReport", append)) ...

we can have

enum open_modes {input = 1, output, append};
...
bool open_file(string file_name, open_modes om);
...
if (open_file("SalesReport", input)) ...

2. Subtypes



A subtype is the specification of the domain as a subrange of another already existing type.

Example: Subtypes in Pascal.

type
  ShoeSizeType = 35..46;
var
  ShoeSize : ShoeSizeType;

With these declarations in place, we cannot assign any value outside the range to the ShoeSize variable. Any such attempt would be caught by the typing system and cause the program to terminate with an error. [In programming languages supporting exceptions, this could be handled more gracefully without terminating the program.]
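C++ has no built-in subrange types, but the effect, including the exception-based handling mentioned above, can be approximated with a small class. The following is a minimal sketch; the class name ShoeSize, its members, and the choice of std::out_of_range are assumptions of the illustration.

```cpp
#include <stdexcept>

// An integer restricted to the range 35..46. Every value entering the
// object passes through checked(), so an out-of-range value can never
// be stored.
class ShoeSize {
public:
    static constexpr int kMin = 35;
    static constexpr int kMax = 46;

    explicit ShoeSize(int value) : value_(checked(value)) {}

    ShoeSize& operator=(int value) {
        value_ = checked(value);   // throws before the old value is lost
        return *this;
    }

    int value() const { return value_; }

private:
    static int checked(int value) {
        if (value < kMin || value > kMax)
            throw std::out_of_range("shoe size out of range");
        return value;
    }

    int value_;
};
```

A caller can wrap an assignment in a try block and recover from a bad value instead of letting the program terminate.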

Note that enumeration and subtype definitions do not allow the programmer to specify operations on the newly defined data type.

Abstract Data Types

The defining characteristic of an abstract data type (ADT) is that nothing outside of the definition of the data

structure and the algorithms operating on it should refer to anything inside, except through function and procedure

calls for the fundamental operations.

Example: Stack implementation in Pascal.

type
  link = ^node;
  node = record
    key : integer;
    next : link
  end;
var
  head, z : link;

procedure stackinit;
begin
  new(head); new(z);
  head^.next := z; z^.next := z
end;

procedure push(v : integer);
var t : link;
begin
  new(t);
  t^.key := v; t^.next := head^.next;
  head^.next := t
end;

function pop : integer;
var t : link;
begin
  t := head^.next;
  pop := t^.key;
  head^.next := t^.next;
  dispose(t)
end;

function peek : integer;
begin
  peek := head^.next^.key
end;

function stackempty : boolean;
begin
  stackempty := (head^.next = z)
end;

The above Pascal fragment is unfortunately not an ideal implementation of an ADT. The fields of the record definition are open to manipulation; there is no language construct that prohibits changing the underlying structure directly by changing its components. It is also possible for the programmer to use any subprogram, be it one meant for export or one meant for implementing some auxiliary functionality. A mechanism to provide controlled access to subprograms and record fields, such as that provided by access specifiers in object-oriented programming languages, is missing.

Organization of the related subprograms into a compilation unit is not enforced by the language. One could add subprograms that are irrelevant to the ADT being implemented. There is no compiler-enforced rule relating the pre-existing and/or newly added subprograms to the ADT in question. The best we can do is to stick to conventions for better organizing our programs.

Another weakness of the preceding fragment, one that can be remedied very easily, is that we cannot create more than one stack.

What we need is something like the module construct found in modular programming languages or the class construct

of object-oriented programming languages.

Example: Stack implementation in Ada83.

PACKAGE Stack_Package IS
  TYPE Stack_Type IS PRIVATE;
  PROCEDURE Init (Stack: IN OUT Stack_Type);
  PROCEDURE Push (Stack: IN OUT Stack_Type; Item: IN Integer);
  FUNCTION Pop (Stack: IN OUT Stack_Type) RETURN Integer;
  FUNCTION Peek (Stack: IN Stack_Type) RETURN Integer;
  FUNCTION Empty (Stack: IN Stack_Type) RETURN Boolean;
PRIVATE
  Stack_Size: CONSTANT Integer := 10;
  TYPE Integer_List_Type IS ARRAY (1..Stack_Size) OF Integer;
  TYPE Stack_Type IS RECORD
    Top: Integer RANGE 0..Stack_Size;
    Elements: Integer_List_Type;
  END RECORD;
END Stack_Package;

PACKAGE BODY Stack_Package IS
  PROCEDURE Init (Stack: IN OUT Stack_Type) IS
  BEGIN
    Stack.Top := 0;
  END Init;

  PROCEDURE Push (Stack: IN OUT Stack_Type; Item: IN Integer) IS
  BEGIN
    Stack.Top := Stack.Top + 1;
    Stack.Elements (Stack.Top) := Item;
  END Push;

  FUNCTION Pop (Stack: IN OUT Stack_Type) RETURN Integer IS
    Item: Integer := Stack.Elements (Stack.Top);
  BEGIN
    Stack.Top := Stack.Top - 1;
    RETURN Item;
  END Pop;

  FUNCTION Peek (Stack: IN Stack_Type) RETURN Integer IS
  BEGIN
    RETURN Stack.Elements (Stack.Top);
  END Peek;

  FUNCTION Empty (Stack: IN Stack_Type) RETURN Boolean IS
  BEGIN
    IF Stack.Top = 0 THEN
      RETURN True;
    ELSE
      RETURN False;
    END IF;
  END Empty;
END Stack_Package;

Using this implementation, a procedure reading ten integers and printing them in reverse order can be implemented

as given below.

WITH Basic_IO, Stack_Package;

PROCEDURE Reverse_Integers IS
  x: Integer;
  St: Stack_Package.Stack_Type;
BEGIN
  Stack_Package.Init(St);
  FOR i IN 1..10 LOOP
    Basic_IO.Get(x);
    Stack_Package.Push(St, x);
  END LOOP;

  FOR i IN 1..10 LOOP
    x := Stack_Package.Pop(St);
    Basic_IO.Put(x);
    Basic_IO.New_Line;
  END LOOP;
END Reverse_Integers;

The above implementation is a lot better than the previous one: The only way a stack can be manipulated is through

the operations defined on it. The representation of the ADT is declared to belong to the private part of the package.

This certainly is a great improvement over the first example. But, it still requires the user to perform creation and

initialization in two separate steps, a rather error-prone process. What if we forget to call the initialization routine?[7]
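In a language with classes, creation and initialization can be fused by a constructor. The following C++ sketch shows one way this might look for the same bounded stack of ten integers; the class name Stack, its member names, and the decision to throw exceptions on overflow and underflow are assumptions of the illustration rather than a translation of the Ada code.

```cpp
#include <stdexcept>

class Stack {
public:
    // Runs automatically whenever a Stack is created, so there is no
    // separate initialization routine to forget.
    Stack() : top_(0) {}

    void push(int item) {
        if (top_ == kSize) throw std::overflow_error("stack is full");
        elements_[top_++] = item;
    }

    int pop() {
        if (empty()) throw std::underflow_error("stack is empty");
        return elements_[--top_];
    }

    int peek() const {
        if (empty()) throw std::underflow_error("stack is empty");
        return elements_[top_ - 1];
    }

    bool empty() const { return top_ == 0; }

private:
    // The representation is private: it can be reached only through the
    // operations above.
    static constexpr int kSize = 10;
    int top_;              // number of items currently on the stack
    int elements_[kSize];
};
```

Each variable of type Stack carries its own representation, so any number of independent stacks can be created.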

One other problem with the package-based approach is its lack of adaptability. Once the type has been defined, the user cannot alter its behavior except by modifying its definition. This may at times be very restrictive. Consider defining a shape type in a graphics system. Although shapes may differ in many aspects, there are certain operations that can be applied to all shapes, such as drawing, rotating, and translating. At first we might come up with a central routine where we call the appropriate drawing routine of the specific shape type. An implementation, which is typical of non-object-oriented programming languages, would look as follows:



enum kind { circle, triangle, rectangle };

class Shape {
  // attributes common to all shapes
  kind k;
  ...
public:
  // common interface of all shapes
  void draw(void);
  void rotate(int degree);
  ...
};

void Shape::draw(void) {
  switch (k) {
    case circle: // draw the circle
      break;
    case triangle: // draw the triangle
      break;
    case rectangle: // draw the rectangle
      break;
    default: error(...);
  } // end of switch(k)
} // end of void Shape::draw(void)

void Shape::rotate(int howManyDegrees) {
  switch (k) {
    case circle: // rotate the circle
      break;
    case triangle: // rotate the triangle
      break;
    case rectangle: // rotate the rectangle
      break;
    default: error();
  } // end of switch(k)
} // end of void Shape::rotate(int)

One weakness of the preceding solution is the requirement that the central functions draw and rotate must know about all the different kinds of shapes. If you define a new shape, every operation on a shape must be examined and possibly modified. As a matter of fact, you cannot even add a new shape to the system unless you have access to the source code. And generally, you do not have such a privilege when you are a mere user of the class. So, the implementer faces a dilemma: either ship the implementation with missing shapes or indefinitely postpone shipping it until you make sure you have an exhaustive list of possible shapes.[8] Ideally, you would like to ship your software as early as you can and as fully functional as possible, and let its users extend it without modifying it. This is referred to as the open-closed principle: software should be open for extension but closed for modification. But how can you possibly provide exhaustive functionality when you have limited time and you don't have much idea about the alternatives? The answer is to get the user to extend your software, and the keyword is inheritance.[9] In C++, you would provide the following solution:



class Shape {
  // attributes common to all shapes. No attributes for holding the kind!
  ...
public:
  // interface common to all shapes
  virtual void draw(void);
  virtual void rotate(int degree);
  ...
}; // end of class Shape

class Circle : public Shape {
  // attributes special to Circle.
  ...
public:
  void draw(void) { /* circle-specific implementation of draw */ }
  void rotate(int degree) { /* circle-specific impl. of rotate */ }
  ...
}; // end of class Circle

class Triangle : public Shape {
  // attributes special to Triangle.
  ...
public:
  void draw(void) { /* triangle-specific implementation of draw */ }
  void rotate(int degree) { /* triangle-specific impl. of rotate */ }
  ...
}; // end of class Triangle

class NewShape : public Shape {
  // attributes special to the new shape.
  ...
public:
  void draw(void) { /* new shape-specific implementation of draw */ }
  void rotate(int degree) { /* new shape-specific impl. of rotate */ }
  ...
}; // end of class NewShape

In addition to the aforementioned advantages, it is the compiler that controls the whole process, whereas in the previous case it was the user doing all the bookkeeping.

Notes

1. Although C is a language conceived in the early 1970s, bool was added to it only in 1999. Therefore, you will probably not see it being used very frequently. Instead, you will see conventional uses of integer values through macros and typedefs.

2. A character taking up two bytes in memory does not mean that it will be serialized as a sequence of two bytes; that is, it will not necessarily consume two bytes of disk space or be transmitted in two bytes over the network. Different encoding techniques may be used. A commonly used technique is UTF-8, which assumes that the most commonly used characters are those found in the ASCII subset of the Unicode standard. In this scheme, the following mappings are used to encode individual characters: 0000 0000 0xxx xxxx → 0xxx xxxx; 0000 0xxx xxxx xxxx → 110x xxxx 10xx xxxx; xxxx xxxx xxxx xxxx → 1110 xxxx 10xx xxxx 10xx xxxx

3. Automation is a Microsoft technology that allows you to take advantage of an existing program's content and functionality, and to incorporate it into your own applications. A typical example is the automation of MS Office applications (automation objects) by VBA (automation controller).

4. The formulas given here are made slightly more complex to avoid having a 0th component. Normally, the compiler implementer would not add 1, just to subtract it in the next computation.

5. This term admittedly invokes the image of a two-dimensional array. However, it covers arrays of any rank.

6. This is not true for components of the same sub-array. That is, numArr[0] and numArr[1], numArr[1][2] and numArr[1][3], and so on are guaranteed to reside in contiguous locations. Anything else would have ruled out random access. However, numArr[0][2] and numArr[1][0], which are neighboring components living in different sub-arrays, cannot be guaranteed to occupy adjacent locations.

7. As a matter of fact, this is exactly why constructors are provided in object-oriented programming languages. Similarly, object-oriented programming languages without automatic garbage collection provide a destructor routine for each class.

8. A third option would be giving out the source code of the class, which would spell disaster for a software company: bankruptcy.

9. Note that inheritance is not the only way to extend your software. Composition can be used as an alternative to inheritance.

Retrieved from "http://en.wikibooks.org/w/index.php?title=Programming_Language_Concepts_Using_C_and_C%2B%2B/Data_Level_Structure&oldid=2717092"

This page was last modified on 23 October 2014, at 20:47.

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.
