Code Optimization

Upload: gloria-joseph

Post on 18-Jan-2018

Page 1: 1 Code Optimization. 2 Outline Machine-Independent Optimization Code motion Memory optimization Suggested reading 5.2 ~ 5.6

Code Optimization

Outline

• Machine-Independent Optimization
  – Code motion
  – Memory optimization
• Suggested reading
  – 5.2 ~ 5.6

Motivation

• Constant factors matter too!
  – easily see 10:1 performance range depending on how code is written
  – must optimize at multiple levels
    • algorithm, data representations, procedures, and loops

Motivation

• Must understand system to optimize performance
  – how programs are compiled and executed
  – how to measure program performance and identify bottlenecks
  – how to improve performance without destroying code modularity and generality

5.3 Program Example

Vector ADT P384

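typedef int data_t;    /* element type; declared first because vec_rec uses it */

typedef struct {
    int len;
    data_t *data;
} vec_rec, *vec_ptr;

[Figure 5.3 (P385): the vector header holds a length field and a data pointer; data points to an array of elements indexed 0, 1, 2, ..., length-1.]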

Procedures P386

• vec_ptr new_vec(int len)
  – Create vector of specified length (P386)
• data_t *get_vec_start(vec_ptr v)
  – Return pointer to start of vector data (P392)

Procedures

• int get_vec_element(vec_ptr v, int index, int *dest)
  – Retrieve vector element, store at *dest
  – Return 0 if out of bounds, 1 if successful
• Similar to array implementations in Pascal, Java
  – E.g., always do bounds checking

Vector ADT

vec_ptr new_vec(int len)
{
    /* allocate header structure */
    vec_ptr result = (vec_ptr) malloc(sizeof(vec_rec));
    if (!result)
        return NULL;
    result->len = len;    /* record the length so vec_length can return it */

    /* allocate array */
    if (len > 0) {
        data_t *data = (data_t *) calloc(len, sizeof(data_t));
        if (!data) {
            free((void *) result);
            return NULL;    /* couldn't allocate storage */
        }
        result->data = data;
    } else
        result->data = NULL;
    return result;
}

Vector ADT

/*
 * Retrieve vector element and store at dest.
 * Return 0 (out of bounds) or 1 (successful)
 */
int get_vec_element(vec_ptr v, int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

Vector ADT

/* Return length of vector */
int vec_length(vec_ptr v)
{
    return v->len;
}

/* Return pointer to start of vector data */
data_t *get_vec_start(vec_ptr v)
{
    return v->data;
}

P392

Optimization Example P385

#ifdef ADD
#define IDENT 0
#define OPER +
#else
#define IDENT 1
#define OPER *
#endif

Optimization Example P387

void combine1(vec_ptr v, data_t *dest)
{
    int i;

    *dest = IDENT;
    for (i = 0; i < vec_length(v); i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest = *dest OPER val;
    }
}

Optimization Example

• Procedure
  – Compute sum (product) of all elements of vector
  – Store result at destination location
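
The declarations and procedures from the preceding slides can be gathered into a single translation unit. The following is a minimal sketch rather than the textbook listing: it assumes the sum variant is selected by defining ADD ahead of the macros, and it assumes new_vec stores len in the header, since vec_length reads that field.

```c
#include <assert.h>
#include <stdlib.h>

#define ADD    /* assumption: pick the sum variant (IDENT 0, OPER +) */

#ifdef ADD
#define IDENT 0
#define OPER +
#else
#define IDENT 1
#define OPER *
#endif

typedef int data_t;

typedef struct {
    int len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Create a zero-filled vector of the given length, or return NULL on failure. */
vec_ptr new_vec(int len)
{
    vec_ptr result = malloc(sizeof(vec_rec));
    if (!result)
        return NULL;
    result->len = len;
    result->data = NULL;
    if (len > 0) {
        result->data = calloc(len, sizeof(data_t));
        if (!result->data) {
            free(result);
            return NULL;
        }
    }
    return result;
}

int vec_length(vec_ptr v) { return v->len; }

data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Bounds-checked read: returns 0 if index is out of range, 1 on success. */
int get_vec_element(vec_ptr v, int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Unoptimized baseline: calls vec_length and get_vec_element every iteration. */
void combine1(vec_ptr v, data_t *dest)
{
    int i;

    assert(v != NULL && dest != NULL);
    *dest = IDENT;
    for (i = 0; i < vec_length(v); i++) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OPER val;
    }
}
```

Removing the `#define ADD` line switches the same combine1 to the product variant without touching the loop.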

5.2 Expressing Program Performance

Time Scales P382

• Absolute Time
  – Typically use nanoseconds
    • 10^-9 seconds
  – Time scale of computer instructions

Time Scales

• Clock Cycles
  – Most computers controlled by high frequency clock signal
  – Typical Range
    • 100 MHz
      – 10^8 cycles per second
      – Clock period = 10 ns
    • 2 GHz
      – 2 × 10^9 cycles per second
      – Clock period = 0.5 ns
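
The clock periods above follow from period = 1 / frequency. A small sketch of that arithmetic; the function names clock_period_ns and cycles_to_ns are illustrative and do not come from the slides.

```c
#include <assert.h>

/* Clock period in nanoseconds for a clock running at freq_hz cycles per second. */
double clock_period_ns(double freq_hz)
{
    assert(freq_hz > 0.0);
    return 1e9 / freq_hz;
}

/* Absolute time in nanoseconds taken by a given number of clock cycles. */
double cycles_to_ns(double cycles, double freq_hz)
{
    return cycles * clock_period_ns(freq_hz);
}
```

With these helpers, clock_period_ns(100e6) gives the 10 ns period of the 100 MHz case and clock_period_ns(2e9) gives the 0.5 ns period of the 2 GHz case.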

CPE P383

void vsum1(int n)
{
    int i;

    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

CPE P383

/* Sum vector of n elements (n must be even) */
void vsum2(int n)
{
    int i;

    for (i = 0; i < n; i += 2) {
        /* Compute two elements per iteration */
        c[i] = a[i] + b[i];
        c[i+1] = a[i+1] + b[i+1];
    }
}

Cycles Per Element

• Convenient way to express the performance of a program that operates on vectors or lists
• Length = n
• T = CPE*n + Overhead

Cycles Per Element Figure 5.2 P383

[Figure 5.2 (P383): plot of cycles (0 to 1000) against number of elements (0 to 200). vsum1: slope = 4.0; vsum2: slope = 3.5.]
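
Since T = CPE*n + Overhead is a straight line, the CPE is the slope of measured cycle counts plotted against n. One way to estimate it is an ordinary least-squares line fit, sketched below. The type cpe_fit and the function fit_cpe are hypothetical names, and least squares is an assumption here; the slides do not say how the slopes were obtained.

```c
#include <assert.h>
#include <stddef.h>

/* Result of fitting T = cpe * n + overhead to measured data. */
typedef struct {
    double cpe;         /* slope: cycles per element */
    double overhead;    /* intercept: fixed cost in cycles */
} cpe_fit;

/* Least-squares fit over count samples of (n[i], cycles[i]). */
cpe_fit fit_cpe(const double *n, const double *cycles, size_t count)
{
    double sum_n = 0.0, sum_t = 0.0, sum_nn = 0.0, sum_nt = 0.0;
    double denom;
    cpe_fit fit;
    size_t i;

    assert(count >= 2);
    for (i = 0; i < count; i++) {
        sum_n += n[i];
        sum_t += cycles[i];
        sum_nn += n[i] * n[i];
        sum_nt += n[i] * cycles[i];
    }
    denom = (double) count * sum_nn - sum_n * sum_n;
    assert(denom != 0.0);    /* needs at least two distinct values of n */

    fit.cpe = ((double) count * sum_nt - sum_n * sum_t) / denom;
    fit.overhead = (sum_t - fit.cpe * sum_n) / (double) count;
    return fit;
}
```

Feeding in samples that lie exactly on a line with slope 4.0 returns cpe = 4.0, matching the vsum1 slope; with real measurements the fit smooths out noise in the individual timings.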

5.3 Program Example
5.4 Eliminating Loop Inefficiencies
5.5 Reducing Procedure Calls
5.6 Eliminating Unneeded Memory References

Time Scales P387

void combine1(vec_ptr v, int *dest)
{
    int i;

    *dest = 0;
    for (i = 0; i < vec_length(v); i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

5.3 Program Example

Time Scales P385

• Procedure
  – Compute sum of all elements of integer vector
  – Store result at destination location
  – Vector data structure and operations defined via abstract data type
• Pentium II/III Performance: CPE
  – 42.06 (Compiled -g) 31.25 (Compiled -O2)

Understanding Loop

void combine1_goto(vec_ptr v, int *dest)
{
    int i = 0;
    int val;

    *dest = 0;
    if (i >= vec_length(v))
        goto done;
loop:                               /* start of 1 iteration */
    get_vec_element(v, i, &val);
    *dest += val;
    i++;
    if (i < vec_length(v))          /* end of 1 iteration */
        goto loop;
done:
    return;
}

Inefficiency

• Procedure vec_length called every iteration
• Even though result always the same

5.4 Eliminating Loop Inefficiencies

Code Motion P388

void combine2(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);

    *dest = 0;
    for (i = 0; i < length; i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

Code Motion P388

• Optimization
  – Move call to vec_length out of inner loop
    • Value does not change from one iteration to next
    • Code motion
  – CPE: 22.61 (Compiled -O2)
    • vec_length requires only constant time, but significant overhead
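
The effect of the code motion can be made visible by counting calls. The sketch below instruments vec_length with a counter; the counter length_calls and the helper count_length_calls are hypothetical additions for illustration, and the vector is built on the stack to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

typedef int data_t;

typedef struct {
    int len;
    data_t *data;
} vec_rec, *vec_ptr;

static long length_calls = 0;    /* hypothetical instrumentation counter */

int vec_length(vec_ptr v)
{
    length_calls++;
    return v->len;
}

int get_vec_element(vec_ptr v, int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Loop test calls vec_length on every iteration. */
void combine1(vec_ptr v, int *dest)
{
    int i;

    *dest = 0;
    for (i = 0; i < vec_length(v); i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

/* Call to vec_length moved out of the loop. */
void combine2(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);

    *dest = 0;
    for (i = 0; i < length; i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

/* Run one combine function and return how many times it called vec_length. */
long count_length_calls(void (*combine)(vec_ptr, int *), vec_ptr v, int *dest)
{
    assert(combine != NULL);
    length_calls = 0;
    combine(v, dest);
    return length_calls;
}
```

For a vector of n elements, combine1 evaluates its loop test n + 1 times and so makes n + 1 calls, while combine2 makes exactly one; both compute the same sum.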

Reduction in Strength P392

void combine3(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);
    int *data = get_vec_start(v);

    *dest = 0;
    for (i = 0; i < length; i++) {
        *dest += data[i];
    }
}

5.5 Reducing Procedure Calls

Reduction in Strength

• Optimization
  – Avoid procedure call to retrieve each vector element
    • Get pointer to start of array before loop
    • Within loop just do pointer reference
    • Not as clean in terms of data abstraction
  – CPE: 6.00 (Compiled -O2)
    • Procedure calls are expensive!
    • Bounds checking is expensive

Eliminate Unneeded Memory References P394

void combine4(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);
    int *data = get_vec_start(v);
    int sum = 0;

    for (i = 0; i < length; i++)
        sum += data[i];
    *dest = sum;
}

5.6 Eliminating Unneeded Memory References

Eliminate Unneeded Memory References

• Optimization
  – Don’t need to store in destination until end
  – Local variable sum held in register
  – Avoids 1 memory read, 1 memory write per iteration
  – CPE: 2.00 (Compiled -O2)
• Memory references are expensive!
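
One reason a compiler may not make this change on its own is memory aliasing: if dest points into the vector itself, accumulating in memory and accumulating in a local variable give different answers. The sketch below shows this with plain arrays standing in for the vector ADT; sum_in_memory and sum_in_register are hypothetical names that mirror the loop styles of combine3 and combine4.

```c
#include <assert.h>
#include <stddef.h>

/* combine3 style: every iteration reads and writes *dest. */
void sum_in_memory(int *data, int length, int *dest)
{
    int i;

    assert(data != NULL && dest != NULL);
    *dest = 0;
    for (i = 0; i < length; i++)
        *dest += data[i];
}

/* combine4 style: accumulate in a local, store to *dest once at the end. */
void sum_in_register(int *data, int length, int *dest)
{
    int i;
    int sum = 0;

    assert(data != NULL && dest != NULL);
    for (i = 0; i < length; i++)
        sum += data[i];
    *dest = sum;
}
```

With data = {2, 3, 4} and dest pointing at the last element, sum_in_memory first overwrites that element with 0, accumulates 2 + 3 = 5 into it, then adds the element to itself and leaves 10 there. sum_in_register reads the original 4 and stores 9. When dest does not alias the data, both store 9, which is the case the hand-written combine4 relies on.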

Detecting Unneeded Memory References

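Combine3

.L18:
    movl (%ecx,%edx,4),%eax    # load data[i]
    addl %eax,(%edi)           # *dest += data[i] (reads and writes memory)
    incl %edx                  # i++
    cmpl %esi,%edx             # compare i with length
    jl .L18                    # loop while i < length

Combine4

.L24:
    addl (%eax,%edx,4),%ecx    # sum += data[i] (sum stays in a register)
    incl %edx                  # i++
    cmpl %esi,%edx             # compare i with length
    jl .L24                    # loop while i < length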

Detecting Unneeded Memory References

• Performance
  – Combine3
    • 5 instructions in 6 clock cycles
    • addl must read and write memory
  – Combine4
    • 4 instructions in 2 clock cycles