lecture 13 - review


Upload: gregory-mccall

Post on 02-Jan-2016


Page 1: Lecture 13 - Review

Lecture 13 - Review

Page 2: Lecture 13 - Review

Review

Page 3: Lecture 13 - Review

Lecture 1 - Address Map - Global vs Local

Page 4: Lecture 13 - Review

Pointer

int var; (var is an integer variable and typically occupies 4 bytes)

int *var; (var is a pointer that points to an integer; *var is the integer it points to)

Page 5: Lecture 13 - Review

Example

Page 6: Lecture 13 - Review

Example 1 *

int var = 108;
int *varpointer;
varpointer = &var; (afterwards, *varpointer = 123 sets var to 123)

Pointer to an integer
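A minimal sketch of Example 1 in C, assuming the intent is to point at var and then write through the pointer. The function names set_through_pointer and example1 are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes 123 into whatever integer p points to. */
void set_through_pointer(int *p)
{
    *p = 123;   /* dereference: change the pointed-to integer */
}

/* Walks through Example 1 and returns the final value of var. */
int example1(void)
{
    int var = 108;
    int *varpointer = &var;   /* varpointer holds the address of var */

    printf("before: var = %d, *varpointer = %d\n", var, *varpointer);
    set_through_pointer(varpointer);
    printf("after:  var = %d\n", var);
    return var;
}
```

Calling example1() from main should show var changing from 108 to 123 although var itself is never assigned directly.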

Page 7: Lecture 13 - Review

Example 2

Point to location 0x0012FF7C

Value is 0x61 = 'a'

Page 8: Lecture 13 - Review

Array and Pointer *

char a[10] = "1234567890"; (10 characters, so there is no room for a terminating 0x00)
a[0] is '1'
a[1] is '2'
char *ptr;
ptr = &a[0]; (or just ptr = a; in C)
*ptr -> '1' (a[0])
*(ptr + 1) -> '2' (a[1])

Same result, different expression:

array[i] is the same as *(array + i)
&array[i] is the same as array + i
array[i + j] is the same as *(array + i + j)
&array[i + j] is the same as array + i + j
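The equivalences between index and pointer expressions can be checked directly. This is a small sketch; the function names identities_hold and read_via_pointer are assumptions made for the example.

```c
#include <stddef.h>

/* Returns 1 if the four index/pointer identities hold for array at i and j. */
int identities_hold(const char *array, size_t i, size_t j)
{
    return array[i] == *(array + i)
        && &array[i] == array + i
        && array[i + j] == *(array + i + j)
        && &array[i + j] == array + i + j;
}

/* Reads element n of the array through a pointer, as *(ptr + n). */
char read_via_pointer(const char *a, size_t n)
{
    const char *ptr = a;   /* same as &a[0] */
    return *(ptr + n);
}
```

With char a[11] = "1234567890"; read_via_pointer(a, 0) gives '1' and read_via_pointer(a, 1) gives '2', matching a[0] and a[1].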

Page 9: Lecture 13 - Review

Multi-Dimensional Arrays – pointer of pointer

array2[7][10]: array2[i][j] is the same as *(*(array2 + i) + j), so **array2 is array2[0][0]

Page 10: Lecture 13 - Review

Naughty Pointers

The value pointed to by a pointer can be modified, even by mistake.

A wrong pointer can corrupt data or crash the program, so careless pointer use is not recommended.

However, if you master pointers, you can write very elegant programs.

Page 11: Lecture 13 - Review

Summary

Integer: 4 bytes, such as int a = 3;

0x0065FDF1: 03

0x0065FDF2: 00

0x0065FDF3: 00

0x0065FDF4: 00

The bytes are stored least significant first (little-endian), so reverse their order to read the value 0x00000003.
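A short sketch of the byte order in the dump above, assuming a little-endian machine such as a PC. The function names are made up for this example.

```c
#include <stdint.h>

/* Rebuilds a 32-bit value from four bytes stored lowest address first
   (little-endian order, as in the memory dump of int a = 3). */
uint32_t from_little_endian(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Returns 1 if this machine stores the low byte of an int first. */
int is_little_endian(void)
{
    unsigned int probe = 1;
    return *(unsigned char *)&probe == 1;
}
```

Feeding the bytes 03 00 00 00 to from_little_endian gives 3, which is the "rotation" the slide describes.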

Page 12: Lecture 13 - Review

Summary

short: two bytes

short a = 3;

0x0065FDF3: 03

0x0065FDF4: 00

short b = 4;

0x0065FDF0: 04

0x0065FDF1: 00

Not used: as a short uses only 2 bytes, the remaining two bytes in memory, 0x0065FDF2 (0xCC) and 0x0065FDF1 (0xCC), are not used.

Page 13: Lecture 13 - Review

Lecture 2

Page 14: Lecture 13 - Review

Example – abc program name

#include <stdio.h>

int first;
int second;

void callee(int first)
{
    int second;

    second = 1;
    first = 2;
    printf("callee: first = %d second = %d\n", first, second);
}

int main(int argc, char *argv[])
{
    first = 1;
    second = 2;
    callee(first);
    printf("caller: first = %d second = %d\n", first, second);
    return 0;
}

DOS> abc 12 34

Here, argc = 3, argv[0] = abc, argv[1] = 12, argv[2] = 34

Same variable name "second", but a different memory location

Page 15: Lecture 13 - Review

Example (passed by pointer) *

/* first and second are the global variables from the previous example */
void callee(int *first) // not a variable, but an address
{
    int second;

    second = 1;
    *first = 2;
    printf("callee: first = %d second = %d\n", *first, second);
}

int main(int argc, char *argv[])
{
    first = 1;
    second = 2;
    callee(&first); // passed by address
    printf("caller: first = %d second = %d\n", first, second);
    return 0;
}

Content by address
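The difference between the two callee versions can be reduced to the sketch below. The names by_value and by_pointer are illustrative and do not come from the lecture.

```c
#include <stdio.h>

/* Receives a copy: changing n here does not affect the caller. */
void by_value(int n)
{
    n = 2;
    printf("by_value: n = %d\n", n);
}

/* Receives an address: writing through p changes the caller's variable. */
void by_pointer(int *p)
{
    *p = 2;
    printf("by_pointer: *p = %d\n", *p);
}
```

If the caller holds int first = 1; then by_value(first) leaves first at 1, while by_pointer(&first) changes it to 2.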

Page 16: Lecture 13 - Review

Diagram - stack push (create)

Page 17: Lecture 13 - Review

Diagram - stack pop (return)

Page 18: Lecture 13 - Review

The CPU also Has Memory

The CPU also maintains its own banks of memory called registers.

They temporarily hold the data.

As a result, the program is faster.

Memory hierarchy (fastest to slowest): register, cache memory, main memory, disk

Page 19: Lecture 13 - Review

Lecture 3 - attention

Page 20: Lecture 13 - Review

Bit Operations

AND &

OR |

ONE'S COMPLEMENT ~

EXCLUSIVE OR ^

SHIFT (right) >>

SHIFT (left) <<

Page 21: Lecture 13 - Review

Operation - examples

AND: 1 & 1 = 1; 1 & 0 = 0; 0 & 0 = 0

OR: 1 | 1 = 1; 1 | 0 = 1; 0 | 0 = 0

~: ~1 = 0; ~0 = 1 (for a single bit)

^: 0 ^ 0 = 0; 1 ^ 1 = 0; 1 ^ 0 = 1; 0 ^ 1 = 1

>>: 0x02 >> 1 = 0x01 (binary 010 becomes 001)

<<: 0x01 << 1 = 0x02 (binary 001 becomes 010)
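The operators can be tried on whole bytes as well as single bits. This sketch wraps each one in a small function; the function names are invented here, and unsigned char is used so that the results stay within 8 bits.

```c
/* Each helper applies one operator to 8-bit values. The cast back to
   unsigned char drops the extra high bits produced by integer promotion. */
unsigned char bit_and(unsigned char a, unsigned char b) { return (unsigned char)(a & b); }
unsigned char bit_or(unsigned char a, unsigned char b)  { return (unsigned char)(a | b); }
unsigned char bit_xor(unsigned char a, unsigned char b) { return (unsigned char)(a ^ b); }
unsigned char bit_not(unsigned char a)                  { return (unsigned char)~a; }
unsigned char shift_left(unsigned char a, int n)        { return (unsigned char)(a << n); }
unsigned char shift_right(unsigned char a, int n)       { return (unsigned char)(a >> n); }
```

For instance, bit_xor(0xf2, 0xfe) gives 0x0c and bit_not(0xf2) gives 0x0d.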

Page 22: Lecture 13 - Review

One's complement

  1111 0010 (0xf2)
~ -----------------
  0000 1101 (0x0d)

char c = 0xf2;
char e = ~c; // e is 0x0d

Page 23: Lecture 13 - Review

EXCLUSIVE OR

  1111 0010 (0xf2)
^ 1111 1110 (0xfe)
  -----------------
  0000 1100 (0x0c)

char c = 0xf2;
char d = 0xfe;
char e = c ^ d; // e is 0x0c

Page 24: Lecture 13 - Review

SHIFT >> (right) by one bit

1111 0010 (0xf2)
>> 1 (shift right by one bit)
---------------------
0111 1001 (0x79)

unsigned char c = 0xf2;
unsigned char e = c >> 1; // e is 0x79 (with a signed char the sign bit may be copied in instead)

Page 25: Lecture 13 - Review

SHIFT << (left) by one bit

1111 0010 (0xf2)
<< 1 (shift left by one bit)
---------------------
1110 0100 (0xe4)

unsigned char c = 0xf2;
unsigned char e = c << 1; // e is 0xe4 (the bit shifted out at the top is dropped)

Page 26: Lecture 13 - Review

SHIFT << (left) by two bits

1111 0010 (0xf2)
<< 2 (shift left by two bits)
---------------------
1100 1000 (0xc8)

unsigned char c = 0xf2;
unsigned char e = c << 2; // e is 0xc8

Page 27: Lecture 13 - Review

Lecture 4

Page 28: Lecture 13 - Review

Expression

1 sign bit, 8-bit exponent and 23-bit mantissa (32 bits in total)

(-1)^Sign * 2^(Exponent - 127) * (1 + Mantissa * 2^-23)

Positive: sign bit is 0. Negative: sign bit is 1.

The exponent is stored as an unsigned value and 127 is subtracted from it. If the stored value is 128, it means 128 – 127 = 1; if it is 254, it means 254 – 127 = 127; if it is 1, it means 1 – 127 = -126. The stored values 0 and 255 are reserved (zero and denormalised numbers, infinity and NaN).

Page 29: Lecture 13 - Review

Example

Page 30: Lecture 13 - Review

Example

2.5 (floating point)

0100 0000 0010 0000 0000 0000 0000 0000

Sign: 0 (positive)

Exponent: 1000 0000 = 128 (128 – 127 = 1)

Mantissa: 1.010 0000 0000 0000 0000 0000 (binary) = 1.25

Result: +1 x 1.25 x 2^1 = 2.5
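The three fields of 2.5 can be pulled out of the bit pattern in code. This is a sketch that assumes float is a 32-bit IEEE 754 value, which holds on common platforms; the struct and function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit single-precision number. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with 127 added */
    uint32_t mantissa;  /* 23 bits */
};

/* Copies the bit pattern of f into an integer and splits it into fields.
   Assumes float is a 32-bit IEEE 754 value. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);   /* well-defined way to view the bits */
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For 2.5f the fields come out as sign 0, exponent 128 and mantissa 0x200000, where 0x200000 * 2^-23 = 0.25.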

Page 31: Lecture 13 - Review

String

A string is an array of characters and is terminated by a null character (0x00).

char a[4] = "Hi?";

a[0] = 'H';
a[1] = 'i';
a[2] = '?';
a[3] = 0x00;

Incorrect declaration: char a[3] = "Hi?"; as there is no room for the terminating 0x00

Page 32: Lecture 13 - Review

An example

struct {
    char a, b, c, cc;
    int i;
    double d;
} mystruct;

The variable's name is mystruct

Page 33: Lecture 13 - Review

Lecture 5

Page 34: Lecture 13 - Review

Static Allocation

The word static (fixed) refers to things that happen at compile time and link time, when the program is constructed.

For example, you can define

char a[9] = "12345678"; // assign 9 bytes for array a

The compiler reserves 9 bytes during compilation.

The linker assigns the correct address for array a.

You cannot change the size, even if you find you need 10 bytes while running the program.

Page 35: Lecture 13 - Review

An example

#include <stdbool.h>

int my_var[128]; // a statically allocated variable
static bool my_var_initialized = false; // static declaration; initially, it is false

void my_fn(void)
{
    if (my_var_initialized)
        return;
    my_var_initialized = true;
    for (int i = 0; i < 128; i++)
        my_var[i] = 0;
}

Page 36: Lecture 13 - Review

Dynamic allocation

Limitations of Static Allocation

If two procedures use a local variable named i, there will be a conflict if both i's are globally visible. If i is only declared once, then i will be shared by the two procedures. One might call the other, even indirectly, and cause i to be overwritten unexpectedly. It would be better if each procedure could have its own copy of i.

Page 37: Lecture 13 - Review

Grab memory

To grab memory, we have to use malloc(size). For example:

ptr = malloc(4) returns a pointer to a block of 4 bytes

ptr = malloc(4 * sizeof(int)) returns a pointer to 16 bytes = 4 x 4 (integer size)

ptr = malloc(4 * sizeof(long)) returns a pointer to 16 bytes = 4 x 4 (long size, where a long is 4 bytes)

free(ptr) returns the memory that ptr points to
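A minimal sketch of the malloc and free pair in use. The function name sum_dynamic is an assumption for the example; sizeof is used so that the code does not depend on int being 4 bytes.

```c
#include <stdlib.h>

/* Allocates count integers, fills them with 1..count, sums them and
   frees the block again. Returns -1 if malloc fails. */
long sum_dynamic(size_t count)
{
    int *ptr = malloc(count * sizeof *ptr);   /* count * sizeof(int) bytes */
    long sum = 0;

    if (ptr == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        ptr[i] = (int)(i + 1);
    for (size_t i = 0; i < count; i++)
        sum += ptr[i];
    free(ptr);                                /* give the block back */
    return sum;
}
```

sum_dynamic(4) allocates room for four integers, the same request as malloc(4 * sizeof(int)) on the slide, and returns 1 + 2 + 3 + 4.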

Page 38: Lecture 13 - Review

Fragmentation – holes

Although there is enough free memory in total, it is split into separate holes, so a large request can still fail.

Page 39: Lecture 13 - Review

Example of First fit

Page 40: Lecture 13 - Review

Example of Best fit

Page 41: Lecture 13 - Review

Example of Worst fit
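The diagrams for the three placement policies are not reproduced in this transcript, so the sketch below shows how each policy might pick a hole. It assumes the free holes are given as a plain array of sizes; real allocators usually keep a linked free list, and the function names are invented here.

```c
#include <stddef.h>

/* Each function scans an array of free hole sizes and returns the index
   of the hole chosen for a request, or -1 if no hole is big enough. */

/* First fit: take the first hole that is large enough. */
int first_fit(const size_t *holes, int n, size_t request)
{
    for (int i = 0; i < n; i++)
        if (holes[i] >= request)
            return i;
    return -1;
}

/* Best fit: take the smallest hole that is still large enough. */
int best_fit(const size_t *holes, int n, size_t request)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (holes[i] >= request && (best < 0 || holes[i] < holes[best]))
            best = i;
    return best;
}

/* Worst fit: take the largest hole. */
int worst_fit(const size_t *holes, int n, size_t request)
{
    int worst = -1;
    for (int i = 0; i < n; i++)
        if (holes[i] >= request && (worst < 0 || holes[i] > holes[worst]))
            worst = i;
    return worst;
}
```

With holes of 20, 8, 12 and 40 bytes and a request for 10 bytes, first fit picks the 20-byte hole, best fit the 12-byte hole and worst fit the 40-byte hole.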

Page 42: Lecture 13 - Review

Lecture 6

Page 43: Lecture 13 - Review

Block sizes – the size to hold the data for users' usage

The standard method for determining the size of a block, given a pointer to the block, is to store its size in the word before the pointer.

Here, the memory that can be used is only 16 bytes; the block size is 20 bytes, including 4 bytes for the size.

Page 44: Lecture 13 - Review

Determine the size

Note that it uses [-1] to point to the location before the pointer (the location that contains the block size).

As the size is a multiple of 4, its lowest two bits are not needed for the size and are cleared to read it (3 = 0000 0000 0000 0011 in binary, ~3 = 1111 1111 1111 1100).

Free (low-order bit is binary 1) means the block can be used by the user.

size = ((int *) ptr)[-1]; // read integer before the memory block
correct_size = size & ~3; // clear the lower 2 bits
is_free = size & 1; // get low-order bit (named is_free so that it does not clash with free())
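The header word can be exercised on a small array standing in for the heap. This is only a sketch: the function names are made up, and a real allocator would manage raw memory, not an int array.

```c
/* Packs a block size (a multiple of 4) and a free flag into one header word. */
int make_header(int size, int is_free)
{
    return (size & ~3) | (is_free ? 1 : 0);
}

/* Reads the header word stored just before the user pointer. */
int header_of(const void *ptr)
{
    return ((const int *)ptr)[-1];
}

/* Block size: the header with the two flag bits cleared. */
int block_size(const void *ptr)
{
    return header_of(ptr) & ~3;
}

/* Free flag: the low-order bit of the header. */
int block_is_free(const void *ptr)
{
    return header_of(ptr) & 1;
}
```

If heap[0] holds make_header(20, 1) and the user pointer is &heap[1], then block_size gives 20 and block_is_free gives 1.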

Page 45: Lecture 13 - Review

Splitting a Free Block

The heap is normally initialized to look like one giant free block. (40 bytes)

When allocations occur, it would be wasteful to return a large block of free memory when a small one would do just as well. (If I need 8 bytes, there is no point in returning 40 bytes.)

Therefore, the memory allocator will typically split a block if the block size is larger than the requested size. (12 allocated and 28 free)

Page 46: Lecture 13 - Review

Common bug in scanf

Note that you should supply the address rather than the variable.

Use &i instead of i.

It is important in your exam.

int i;
double d;

scanf("%d %lf", i, d); // wrong!!! passes the values, not the addresses

// here is the correct call (a double needs %lf in scanf):
scanf("%d %lf", &i, &d);
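The same rule can be tried without keyboard input by using sscanf, which reads from a string but takes the same format and address arguments as scanf. The function name parse_pair is an assumption for this sketch.

```c
#include <stdio.h>

/* Parses "<int> <double>" from text.
   Returns 1 if both values were read, 0 otherwise. */
int parse_pair(const char *text, int *i, double *d)
{
    /* i and d are already addresses here, so no & is needed */
    return sscanf(text, "%d %lf", i, d) == 2;
}
```

A caller with int i; double d; writes parse_pair("42 2.5", &i, &d), passing addresses just as in the correct scanf call.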

Page 47: Lecture 13 - Review

Overwriting Memory

Here, i is incremented from 0 to array_size, not to array_size – 1, so the last iteration writes past the end of the block.

The solution is to use i < array_size; not i <= array_size;

#define array_size 100

int **a = (int **) malloc(sizeof(int *) * array_size);
for (int i = 0; i <= array_size; i++)
    a[i] = NULL;

Page 48: Lecture 13 - Review

Memory bug

Here, the memory allocated is 100 bytes, not 400 bytes, although a is used as an array of 100 integers.

The solution is:

int *a = (int *) malloc(array_size * sizeof(int));

The buggy code:

#define array_size 100

int *a = (int *) malloc(array_size);
a[99] = 0; // this overwrites memory beyond the block

Page 49: Lecture 13 - Review

String must be terminated by 0x00

A string must be terminated by 0x00, so the copy needs one extra byte.

The solution is:

char *new_s = (char *) malloc(len + 1);

The buggy code:

char *heapify_string(char *s)
{
    int len = strlen(s);
    char *new_s = (char *) malloc(len); // no room for the terminating 0x00
    strcpy(new_s, s);
    return new_s;
}
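For comparison, here is a sketch of heapify_string with the fix from this slide applied. The NULL check and the const qualifier are additions that the slide does not show.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if malloc fails.
   The caller is responsible for calling free() on the result. */
char *heapify_string(const char *s)
{
    size_t len = strlen(s);
    char *new_s = malloc(len + 1);   /* +1 for the terminating 0x00 */

    if (new_s == NULL)
        return NULL;
    strcpy(new_s, s);                /* copies the characters and the 0x00 */
    return new_s;
}
```

For "Hi?" strlen returns 3, so 4 bytes are allocated and the fourth holds 0x00.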

Page 50: Lecture 13 - Review

Memory leaks

Memory that is no longer used is not returned to the memory pool.

The result is that the system will eventually run out of memory.

The failure to deallocate (free) a block of memory when it is no longer needed is often called a memory leak.

Page 51: Lecture 13 - Review

Lecture 7

Page 52: Lecture 13 - Review

Procedure

1. Write a slow but correct program.

2. Modify the program to make it faster.

Page 53: Lecture 13 - Review

What to Measure (Wall clock)

An alternative is to measure real time, or "wall clock time". This is the time an ordinary clock on the wall or a wrist watch shows.

The difference between CPU time and wall time can give some indication of the time spent waiting for I/O.

Diagram labels: wall time, CPU time, I/O time
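Both kinds of time can be read with the standard C library. The sketch below is one possible way to do it; the names measure and busy_loop are invented, and time() only has a resolution of one second, so short runs will show a wall time of 0.

```c
#include <time.h>

/* CPU time and wall clock time, both in seconds. */
struct timing {
    double cpu_seconds;
    double wall_seconds;
};

/* Example workload: a loop the compiler cannot remove. */
void busy_loop(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Runs work() once and measures both kinds of time.
   clock() counts processor time used by the program;
   time() reads the calendar clock. */
struct timing measure(void (*work)(void))
{
    struct timing t;
    clock_t cpu_start = clock();
    time_t wall_start = time(NULL);

    work();

    t.cpu_seconds = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = difftime(time(NULL), wall_start);
    return t;
}
```

A workload that waits for a disk or the keyboard would show a wall time clearly larger than its CPU time, which is the difference the slide points to.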

Page 54: Lecture 13 - Review

Principles - Performance

The 80/20 Rule – 80% of the CPU time is spent in 20% of the program.

In this case, you get the most improvement by optimising this 20%.

Amdahl's Law – for parallel processing, the performance is limited by the sequential part of the program.

Page 55: Lecture 13 - Review

Example of 80/20: 10% on one means 2% as a whole

A program consists of 5 modules

Before: 20 ms, 20 ms, 20 ms, 20 ms, 20 ms (total 100 ms)

After: 20 ms, 18 ms, 20 ms, 20 ms, 20 ms (total 98 ms)

Page 56: Lecture 13 - Review

Example of 80/20: 10% on one means about 5.6% as a whole

A program consists of 5 modules

Before: 10 ms, 50 ms, 10 ms, 10 ms, 10 ms (total 90 ms)

After: 10 ms, 45 ms, 10 ms, 10 ms, 10 ms (total 85 ms)

Conclusion: focus on the module with more CPU time
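The arithmetic behind the two examples can be written as one small function. The name overall_saving_percent and its parameters are chosen for this sketch only.

```c
/* Percentage of total run time saved when one module is made faster.
   times_ms holds the original time of each module; module_saving_percent
   is the improvement applied to module `which` only. */
double overall_saving_percent(const int *times_ms, int n, int which,
                              double module_saving_percent)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += times_ms[i];
    return module_saving_percent * times_ms[which] / total;
}
```

With five 20 ms modules a 10% gain on one of them saves 2% overall; with times of 10, 50, 10, 10 and 10 ms the same 10% gain on the 50 ms module saves 10 x 50 / 90, a little over 5.5%.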

Page 57: Lecture 13 - Review

Lecture 8

Page 58: Lecture 13 - Review

Coding for Speed (mainly from this web site: http://www.abarnett.demon.co.uk/tutorial.html)

Array Indices
Aliases
Registers
Integers
Loop Jamming
Dynamic Loop Unrolling
Faster for() loops
Switch
Pointers
Early loop breaking
Misc
Using array indices

There are many ways to speed up the operation.

Page 59: Lecture 13 - Review

Aliases (1)

void func1( int *data )
{
    int i;

    for(i=0; i<10; i++)
    {
        somefunc2( *data, i);
    }
}

Not very good

Page 60: Lecture 13 - Review

Aliases – better change to this

void func1( int *data )
{
    int i;
    int localdata;

    localdata = *data;
    for(i=0; i<10; i++)
    {
        somefunc2( localdata, i);
    }
}

Better way

Page 61: Lecture 13 - Review

Loop Jamming

Never use two loops where one will suffice:

for(i=0; i<100; i++)
{
    stuff();
}

for(i=0; i<100; i++)
{
    morestuff();
}

Better combine them
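The combined version of the two loops might look like the sketch below. stuff() and morestuff() are replaced by hypothetical stand-ins that only count their calls, so that the effect can be checked.

```c
/* Hypothetical stand-ins for stuff() and morestuff(): each one just
   counts how often it was called. */
static int stuff_calls;
static int morestuff_calls;

static void stuff(void)     { stuff_calls++; }
static void morestuff(void) { morestuff_calls++; }

/* The two loops jammed into one: the loop counter is tested and
   incremented 100 times instead of 200. */
void jammed(void)
{
    int i;

    for (i = 0; i < 100; i++)
    {
        stuff();
        morestuff();
    }
}
```

Jamming interleaves the calls, so it is only safe when morestuff() does not rely on every stuff() call having finished first.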

Page 62: Lecture 13 - Review

Early loop breaking

This loop searches a list of 10000 numbers to see if there is a -99 in it.

found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");

This works well but searches the whole list.

Page 63: Lecture 13 - Review

Early loop breaking

A better way is to abort the search when it is found.

found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
        break;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");

Page 64: Lecture 13 - Review

Lecture 9

Page 65: Lecture 13 - Review

Memory and CPU

Program here

Cache and register here

Page 66: Lecture 13 - Review

Memory hierarchies

Within CPU

Page 67: Lecture 13 - Review

Common memory technologies

Static Random Access Memory (SRAM)
Dynamic Random Access Memory (DRAM)
Magnetic disks
Magnetic tapes
Optical disks

Page 68: Lecture 13 - Review

Speed

Page 69: Lecture 13 - Review

Size and Cost

Page 70: Lecture 13 - Review

Principle of Locality

References to a single address occur close together in time (this is called temporal locality).

For example, the same variable i is used again and again.

References to addresses that are near to each other occur together in time (this is called spatial locality).

For example, with int i, j; the program uses i and then j, which is stored next to it.

Page 71: Lecture 13 - Review

Principle of locality

The principle of locality of reference is not an assurance, but rather a conjecture (a GUESS).

Empirically, however, there is little doubt that programs behave according to this principle.

Think about it: if you need to use the same variable i later, it is better to keep it in the cache than to release it to main memory.

Page 72: Lecture 13 - Review

Graph showing CPU, DRAM & SRAM

Page 73: Lecture 13 - Review

Four-level hierarchy

Page 74: Lecture 13 - Review

Lecture 10

Page 75: Lecture 13 - Review

Example

/* Assumes n is a power of two */
void merge_sort(int *data, int n)
{
    int half = n >> 1;

    if (n == 1) return;
    merge_sort(data, half);          /* sort the first half */
    merge_sort(data + half, half);   /* sort the second half */
    merge(data, data + half, half);  /* merge the two sorted halves */
}

// no need to memorise
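The slide does not show merge(), so the sketch below supplies one possible version in order to make the example complete. The helper is an assumption: it uses a temporary buffer and expects the two halves to be adjacent in memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: merges the sorted runs left[0..n) and right[0..n),
   which are assumed adjacent in memory, back into left[0..2n).
   If malloc fails the data is left unmerged. */
static void merge(int *left, int *right, int n)
{
    int *tmp = malloc(2 * (size_t)n * sizeof *tmp);
    int i = 0, j = 0, k = 0;

    if (tmp == NULL)
        return;
    while (i < n && j < n)
        tmp[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < n)
        tmp[k++] = left[i++];
    while (j < n)
        tmp[k++] = right[j++];
    memcpy(left, tmp, 2 * (size_t)n * sizeof *tmp);
    free(tmp);
}

/* Assumes n is a power of two, as on the slide. */
void merge_sort(int *data, int n)
{
    int half = n >> 1;

    if (n <= 1)
        return;
    merge_sort(data, half);
    merge_sort(data + half, half);
    merge(data, data + half, half);
}
```

Each level of recursion walks over the whole array once, which is why the memory access pattern of merge sort is a useful subject for the cache measurements that follow.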

Page 76: Lecture 13 - Review

Graph of Merge Sort

The access times are in nanoseconds (ns) for the L1 cache (T1), L2 cache (T2), L3 cache (T3), and main memory (Tm).

Page 77: Lecture 13 - Review

Looking at the Caches

We can deduce many things about the cache design of a particular computer by carefully examining its memory performance.

We can design a benchmark program whose locality we control.

int data[MAXSIZE];

for (r = 0; r < repeat; r++) {     /* the outer loop needs its own counter */
    for (i = 0; i < N; i++) {
        dummy = data[i];
    }
}

Page 78: Lecture 13 - Review

Control the spatial locality

Here, stride controls the amount of spatial locality

int data[MAXSIZE];

for (r = 0; r < repeat; r++) {     /* the outer loop needs its own counter */
    for (i = 0; i < N; i += stride) {
        dummy = data[i];
    }
}
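A compilable sketch of the strided loop is given below. It returns a sum so that the reads cannot be optimised away; the function name strided_pass is invented, and stride is assumed to be at least 1.

```c
#include <stddef.h>

/* Reads every stride-th element of data[0..n) and repeats the whole pass
   `repeat` times. A larger stride means less spatial locality.
   stride must be at least 1. */
long strided_pass(const int *data, size_t n, size_t stride, int repeat)
{
    long sum = 0;

    for (int r = 0; r < repeat; r++)
        for (size_t i = 0; i < n; i += stride)
            sum += data[i];
    return sum;
}
```

Timing this function for different values of n and stride is one way to produce a graph like the one on the next slide.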

Page 79: Lecture 13 - Review

Graph showing the effect

Page 80: Lecture 13 - Review

Example of a matrix

int data[M][N];

for (i = 0 ; i < N; i++) {

for (j = 0; j < M; j++) {

sum += data[j][i];

}

}

Page 81: Lecture 13 - Review

Changing the order of the iterations is not always better. Below is an example.

int original[M][N];
int transposed[N][M];

for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        transposed[j][i] = original[i][j];
    }
}

Page 82: Lecture 13 - Review

Insufficient Temporal Locality

int original[M][N];
int transposed[N][M];

/* work on blocks of m x n elements; assumes m divides M and n divides N */
for (k = 0; k < M / m; k++) {
    for (l = 0; l < N / n; l++) {
        for (i = k*m; i < (k+1)*m; i++) {
            for (j = l*n; j < (l+1)*n; j++) {
                transposed[j][i] = original[i][j];
            }
        }
    }
}
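The blocked loop can be compared against a plain transpose on a small matrix. The sizes below (4 x 6 with 2 x 3 blocks) are chosen only for illustration, and the function names are assumptions.

```c
enum { M = 4, N = 6, BM = 2, BN = 3 };   /* BM divides M, BN divides N */

/* Plain transpose: one pass over the whole matrix. */
void transpose_plain(int original[M][N], int transposed[N][M])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            transposed[j][i] = original[i][j];
}

/* Blocked transpose: finishes one BM x BN block before moving on, so the
   rows touched by a block can stay in the cache while they are reused. */
void transpose_blocked(int original[M][N], int transposed[N][M])
{
    for (int k = 0; k < M / BM; k++)
        for (int l = 0; l < N / BN; l++)
            for (int i = k * BM; i < (k + 1) * BM; i++)
                for (int j = l * BN; j < (l + 1) * BN; j++)
                    transposed[j][i] = original[i][j];
}
```

Both functions produce the same result; the blocked one only changes the order in which the elements are visited, which matters once the matrix is much larger than the cache.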

Page 83: Lecture 13 - Review

Lecture 11

Page 84: Lecture 13 - Review

Example of a matrix

int data[M][N];

for (i = 0 ; i < N; i++) {

for (j = 0; j < M; j++) {

sum += data[j][i];

}

}

This is an M x N matrix

Page 85: Lecture 13 - Review

Row-major and Column-major

Row major – sequence of accessing the data

Column major

Page 86: Lecture 13 - Review

Accessing a column-major

Page 87: Lecture 13 - Review

Accessing row data

It will be faster, as it accesses [0][0], [0][1], [0][2] and so on. After [0][0] is read, the elements up to [1][3] are loaded into the same cache line, because the data is already stored in memory in this sequence.

Row major is faster than column major
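The two traversal orders are sketched below on a small 3 x 4 matrix. The sizes and function names are assumptions for the example; on a matrix this small no speed difference will be visible, only the access order differs.

```c
enum { ROWS = 3, COLS = 4 };

/* Row-major traversal: the inner loop walks along one row, so consecutive
   reads are next to each other in memory. */
long sum_row_major(int data[ROWS][COLS])
{
    long sum = 0;

    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += data[i][j];
    return sum;
}

/* Column-major traversal: the inner loop jumps COLS elements each step. */
long sum_column_major(int data[ROWS][COLS])
{
    long sum = 0;

    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += data[i][j];
    return sum;
}
```

C stores data[0][0], data[0][1], ... one after another, so the row-major version touches memory in address order while the column-major version jumps COLS * sizeof(int) bytes on every step.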

Page 88: Lecture 13 - Review

Segment address translation

Disk

Memory

Page 89: Lecture 13 - Review

Paging

The allocation of memory into chunks of varying size causes external fragmentation.

To solve this problem we can change the nature of the address translation so that, instead of mapping virtual to physical addresses in big chunks of varying size, it maps them in small chunks of constant size (pages).
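With constant-size chunks, translation comes down to splitting an address into a page number and an offset. The sketch below assumes a page size of 4096 bytes and uses a plain array as the page table; both are illustrative choices, not taken from the slides.

```c
#include <stdint.h>

enum { PAGE_BYTES = 4096 };   /* assumed page size, not taken from the slides */

/* Page number: which fixed-size chunk the virtual address falls in. */
uint32_t page_number(uint32_t virtual_address)
{
    return virtual_address / PAGE_BYTES;
}

/* Offset: position of the address inside its page. */
uint32_t page_offset(uint32_t virtual_address)
{
    return virtual_address % PAGE_BYTES;
}

/* Translates through a table that maps page numbers to frame numbers.
   The table is a plain array here, standing in for a real page table. */
uint32_t translate(const uint32_t *frame_of_page, uint32_t virtual_address)
{
    return frame_of_page[page_number(virtual_address)] * PAGE_BYTES
         + page_offset(virtual_address);
}
```

For example, virtual address 0x1234 lies in page 1 at offset 0x234; if page 1 is mapped to frame 3, the physical address is 0x3234.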

Page 90: Lecture 13 - Review

Paging

Page 91: Lecture 13 - Review

Impact of VM on Performance

int data[M][N];

for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];
    }
}
// column major – more page faults

Page 92: Lecture 13 - Review

Impact of VM on Performance

int data[M][N];

for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        sum += data[j][i];
    }
}
// row major – fewer page faults

Page 93: Lecture 13 - Review

Example of Context Switching

[Diagram: every label reads "CPU"]

Page 94: Lecture 13 - Review

Process state

Here, there are three states for one process. Running means it is using the CPU, ready means it is ready to use the CPU, while suspended means it is waiting for I/O.

Page 95: Lecture 13 - Review

Non-Preemptive process

A process must finish before the CPU can switch to another one. Say you have three processes, P1, P2 and P3.

Page 96: Lecture 13 - Review

Preemptive process

The CPU can switch to another process without finishing the current one.

Page 97: Lecture 13 - Review

Lecture 12

Network Programming

Page 98: Lecture 13 - Review

Review

Client Server Programming Model

Networks

Global IP Internet

Socket Interface

Web Servers

Page 99: Lecture 13 - Review

Client Server Programming Model

1. When a client needs service, it initiates a transaction.
2. The server receives the request, interprets it and manipulates its resource.
3. The server sends a response to the client and waits for the next request.
4. The client receives the response and manipulates it.

Diagram labels: client process, server process, resource

Page 100: Lecture 13 - Review

Hardware and Software Organisations

Client

TCP/IP

Network Adaptor

Client

TCP/IP

Network Adaptor

Page 101: Lecture 13 - Review

Internet Domain Names

Unnamed root

mil edu gov com

cityu cuhk

DCO

/* DNS entry structure */
struct hostent {
    char *h_name;        /* official domain name of host */
    char **h_aliases;    /* null-terminated array of domain names */
    int h_addrtype;      /* host address type */
    int h_length;        /* length of an address in bytes */
    char **h_addr_list;  /* null-terminated array of in_addr structs */
};

Page 102: Lecture 13 - Review

Internet Connection

Internet clients and servers communicate by sending and receiving streams of bytes over connections.

A connection is point-to-point.

A connection is full duplex in the sense that data can flow in both directions.

A socket is an end point of a connection.

Page 103: Lecture 13 - Review

Socket Connection

Client Server

Each socket has a corresponding socket address that consists of an IP address and a 16-bit integer port. It is denoted by address:port (such as 121.2.3.4:12345).
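On a POSIX system a socket address of this kind is held in a struct sockaddr_in. The sketch below fills one in from text such as 121.2.3.4 and a port; the function name make_socket_address is invented, and IPv4 is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fills in an IPv4 socket address from dotted-decimal text and a port.
   Returns 1 on success, 0 if the address text is not valid. */
int make_socket_address(const char *ip_text, unsigned short port,
                        struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);   /* 16-bit port in network byte order */
    return inet_pton(AF_INET, ip_text, &out->sin_addr) == 1;
}
```

The filled-in structure is what a client would pass to connect(), or a server to bind(), after a cast to struct sockaddr *.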

Page 104: Lecture 13 - Review

104104

Socket InterfaceSocket Interface

Socket

Connect

Rio_written

Rio_readlineb

close

Socket

accept

Rio_readlineb

Rio_written

close

listen

bind

Rio_readlineb

Page 105: Lecture 13 - Review

105105

Socket FunctionSocket Function

/* listen function

#include <sys/socket.h>

int listen (int sockfd, int backlog); /* return -1 on Unix error, -2 on DNS error */

/* accept function */

#include <sys/socket.h>

int accept (int listenfd, struct sockaddr, *addr, int *addrlen); /* return -1 on Unix error, -2 on DNS error */

Page 106: Lecture 13 - Review

Role of the Listening and Connected Descriptors

[Diagram: three stages, each showing Client and Server]

Page 107: Lecture 13 - Review

Last Page