
Introduction to Code Optimization: Instruction Scheduling
Spring 2010


Page 1: Title

Introduction to Code Optimization
Instruction Scheduling

Spring 2010

Page 2: Outline

• Modern architectures
• Introduction to instruction scheduling
• List scheduling
• Resource constraints
• Scheduling across basic blocks
• Trace scheduling

Saman Amarasinghe 2 6.035 ©MIT Fall 1998

Page 3: Simple Machine Model

• Instructions are executed in sequence
  – Fetch, decode, execute, store results
  – One instruction at a time
• For branch instructions, start fetching from a different location if needed
  – Check branch condition
  – Next instruction may come from a new location given by the branch instruction

Page 4: Simple Execution Model

• 5-stage pipeline:

  fetch → decode → execute → memory → writeback

• Fetch: get the next instruction
• Decode: figure out what that instruction is
• Execute: perform the ALU operation
  – address calculation in a memory op
• Memory: do the memory access in a memory op
• Write Back: write the results back

Page 5: Simple Execution Model

Pipelined execution over time (one instruction enters the pipeline per cycle):

  Inst 1:  IF DE EXE MEM WB
  Inst 2:     IF DE EXE MEM WB
  Inst 3:        IF DE EXE MEM WB
  Inst 4:           IF DE EXE MEM WB
  Inst 5:              IF DE EXE MEM WB

Page 6: Outline

• Modern architectures
• Introduction to instruction scheduling
• List scheduling
• Resource constraints
• Scheduling across basic blocks
• Trace scheduling

Page 7: From a Simple Machine Model to a Real Machine Model

• Many pipeline stages
  – Pentium: 5
  – Pentium Pro: 10
  – Pentium IV (130nm): 20
  – Pentium IV (90nm): 31
  – Core 2 Duo: 14
• Different instructions take different amounts of time to execute
• Hardware stalls the pipeline if an instruction uses a result that is not ready

Page 8: Real Machine Model cont.

• Most modern processors have multiple cores
  – Will deal with multicores next week
• Each core has multiple execution units (superscalar)
  – If the instruction sequence is correct, multiple operations will happen in the same cycles
  – Even more important to have the right instruction sequence

Page 9: Instruction Scheduling

• Reorder instructions so that pipeline stalls are minimized

Page 10: Constraints on Scheduling

• Data dependencies
• Control dependencies
• Resource constraints

Page 11: Data Dependency between Instructions

• If two instructions access the same variable, they can be dependent
• Kinds of dependencies
  – True: write → read
  – Anti: read → write
  – Output: write → write
• What to do if two instructions are dependent
  – The order of execution cannot be reversed
  – Reduces the possibilities for scheduling
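The three dependence kinds above can be sketched in a few lines of Python. This is a minimal illustration, not the compiler's actual representation: it assumes each instruction is modeled as hypothetical `defs`/`uses` register-name sets.

```python
# Classify the dependence from instruction i to a later instruction j,
# where each instruction is modeled as a (defs, uses) pair of register sets.
def dependences(i_defs, i_uses, j_defs, j_uses):
    kinds = []
    if i_defs & j_uses:
        kinds.append("true")    # i writes, j reads  (write -> read)
    if i_uses & j_defs:
        kinds.append("anti")    # i reads, j writes  (read -> write)
    if i_defs & j_defs:
        kinds.append("output")  # both write         (write -> write)
    return kinds

# 1: r2 = *(r1 + 4)   defs={r2}, uses={r1}
# 3: r4 = r2 + r3     defs={r4}, uses={r2, r3}
print(dependences({"r2"}, {"r1"}, {"r4"}, {"r2", "r3"}))  # → ['true']
```

Any non-empty result means the pair's order cannot be reversed, which is exactly what restricts the scheduler.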

Page 12: Computing Dependencies

• For basic blocks, compute dependencies by walking through the instructions
• Identifying register dependencies is simple
  – is it the same register?
• For memory accesses
  – simple: base + offset1 ?= base + offset2
  – data dependence analysis: a[2i] ?= a[2i+1]
  – interprocedural analysis: global ?= parameter
  – pointer alias analysis: p1→foo ?= p2→foo

Page 13: Representing Dependencies

• Using a dependence DAG, one per basic block
• Nodes are instructions, edges represent dependencies

Page 14: Representing Dependencies

• Using a dependence DAG, one per basic block
• Nodes are instructions, edges represent dependencies

  1: r2 = *(r1 + 4)
  2: r3 = *(r1 + 8)
  3: r4 = r2 + r3
  4: r5 = r2 - 1

  DAG: 1 → 3, 1 → 4, 2 → 3 (each edge labeled with latency 2)

• Each edge is labeled with a latency:
  – v(i, j) = delay required between the initiation times of i and j, minus the execution time required by i
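The per-basic-block walk from the previous slide can be sketched as follows. This is an illustrative model only: instructions are hypothetical `(defs, uses)` register-set pairs, and the edge label is simplified to a fixed per-instruction latency rather than the full v(i, j) formula.

```python
# Build a dependence DAG for a basic block by walking the instructions.
# instrs:    list of (defs, uses) register sets, in program order.
# latencies: per-instruction latency used as the edge label (simplified).
# Returns edges as {(i, j): latency} with 1-based ids, as on the slide.
def build_dag(instrs, latencies):
    edges = {}
    for j, (j_defs, j_uses) in enumerate(instrs):
        for i in range(j):
            i_defs, i_uses = instrs[i]
            true_dep   = bool(i_defs & j_uses)
            anti_dep   = bool(i_uses & j_defs)
            output_dep = bool(i_defs & j_defs)
            if true_dep or anti_dep or output_dep:
                edges[(i + 1, j + 1)] = latencies[i]
    return edges

instrs = [
    ({"r2"}, {"r1"}),        # 1: r2 = *(r1 + 4)
    ({"r3"}, {"r1"}),        # 2: r3 = *(r1 + 8)
    ({"r4"}, {"r2", "r3"}),  # 3: r4 = r2 + r3
    ({"r5"}, {"r2"}),        # 4: r5 = r2 - 1
]
latencies = [2, 2, 1, 1]     # assume loads take 2 cycles
print(sorted(build_dag(instrs, latencies)))  # → [(1, 3), (1, 4), (2, 3)]
```

This reproduces the DAG drawn on the slide: both operands of instruction 3 come from the two loads, and instruction 4 also consumes the first load's result.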

Page 15: Example

  1: r2 = *(r1 + 4)
  2: r3 = *(r2 + 4)
  3: r4 = r2 + r3
  4: r5 = r2 - 1

  DAG: 1 → 2, 1 → 3, 1 → 4, 2 → 3 (edges out of the loads labeled with latency 2)

Page 16: Another Example

  1: r2 = *(r1 + 4)
  2: *(r1 + 4) = r3
  3: r3 = r2 + r3
  4: r5 = r2 - 1

  DAG: 1 → 2, 1 → 3, 1 → 4, 2 → 3
  (true dependencies on the load carry latency 2; anti dependencies carry latency 1)

Page 17: Control Dependencies and Resource Constraints

• For now, let's only worry about basic blocks
• For now, let's look at simple pipelines

Page 18: Example

  1: lea var_a, %rax
  2: add $4, %rax
  3: inc %r11
  4: mov 4(%rsp), %r10
  5: add %r10, 8(%rsp)
  6: and 16(%rsp), %rbx
  7: imul %rax, %rbx

Page 19: Example

  Results in:
  1: lea var_a, %rax        1 cycle
  2: add $4, %rax           1 cycle
  3: inc %r11               1 cycle
  4: mov 4(%rsp), %r10      3 cycles
  5: add %r10, 8(%rsp)      3 cycles
  6: and 16(%rsp), %rbx     4 cycles
  7: imul %rax, %rbx        3 cycles

Page 20: Example

  Results in:
  1: lea var_a, %rax        1 cycle
  2: add $4, %rax           1 cycle
  3: inc %r11               1 cycle
  4: mov 4(%rsp), %r10      3 cycles
  5: add %r10, 8(%rsp)      3 cycles
  6: and 16(%rsp), %rbx     4 cycles
  7: imul %rax, %rbx        3 cycles

  Issue order:  1  2  3  4  st  st  5  6  st  st  st  7

Page 21: Outline

• Modern architectures
• Introduction to instruction scheduling
• List scheduling
• Resource constraints
• Scheduling across basic blocks
• Trace scheduling

Page 22: List Scheduling Algorithm

• Idea
  – Do a topological sort of the dependence DAG
  – Consider when an instruction can be scheduled without causing a stall
  – Schedule the instruction if it causes no stall and all its predecessors are already scheduled
• Optimal list scheduling is NP-complete
  – Use heuristics when necessary

Page 23: List Scheduling Algorithm

• Create a dependence DAG of a basic block
• Topological sort:

  READY = nodes with no predecessors
  Loop until READY is empty
      Schedule each node in READY when no stalling
      Update READY
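The loop above can be sketched concretely. This is a simplified model, not the course's reference implementation: the DAG is given as `{(i, j): latency}` edges over 1-based ids, and ties in READY are broken by a caller-supplied heuristic priority (the subject of the next slides).

```python
# Simplified list scheduler: prefer a READY node that issues without a
# stall at the current cycle, breaking ties by a heuristic priority.
def list_schedule(n, edges, priority):
    preds = {j: {i for (i, k) in edges if k == j} for j in range(1, n + 1)}
    earliest = {i: 0 for i in range(1, n + 1)}   # earliest stall-free issue cycle
    done, schedule, cycle = set(), [], 0
    while len(done) < n:
        ready = [i for i in range(1, n + 1) if i not in done and preds[i] <= done]
        # sort key: (would stall?, -priority, id) -- stall-free nodes first
        node = min(ready, key=lambda i: (earliest[i] > cycle, -priority(i), i))
        issue = max(cycle, earliest[node])       # stall only if unavoidable
        schedule.append(node)
        done.add(node)
        for (i, j), lat in edges.items():
            if i == node:
                earliest[j] = max(earliest[j], issue + lat)
        cycle = issue + 1
    return schedule

# DAG from the slide: 1 -> 3, 1 -> 4, 2 -> 3, all latency 2
print(list_schedule(4, {(1, 3): 2, (1, 4): 2, (2, 3): 2}, lambda i: 0))
```

With the slide's DAG this yields the order 1, 2, 4, 3: instruction 4 is moved between the loads and the add so that no cycle is wasted waiting for r3.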

Page 24: Heuristics for Selection

• Heuristics for selecting from the READY list
  – pick the node with the longest path to a leaf in the dependence graph
  – pick the node with the most immediate successors
  – pick a node that can go to a less busy pipeline (in a superscalar)

Page 25: Heuristics for Selection

• Pick the node with the longest path to a leaf in the dependence graph
• Algorithm (for node x):
  – If no successors: d_x = 0
  – d_x = MAX(d_y + c_xy) over all successors y of x
  – Computed in reverse breadth-first visitation order
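The d_x recurrence above can be computed with one backward pass, as sketched below. Assumption for the sketch: the nodes are supplied already in topological order, so visiting them in reverse guarantees every successor's d is known first.

```python
# Compute d_x = longest latency-weighted path from x to a leaf.
# nodes: ids in topological order; edges: {(x, y): c_xy}.
def longest_path_to_leaf(nodes, edges):
    succs = {x: [(y, c) for (x2, y), c in edges.items() if x2 == x]
             for x in nodes}
    d = {}
    for x in reversed(nodes):                    # reverse visitation order
        d[x] = max((d[y] + c for y, c in succs[x]), default=0)  # leaves get 0
    return d

# DAG from page 14: 1 -> 3, 1 -> 4, 2 -> 3, all latency 2
print(longest_path_to_leaf([1, 2, 3, 4], {(1, 3): 2, (1, 4): 2, (2, 3): 2}))
```

Nodes with larger d lie on longer critical paths, so scheduling them early leaves the most slack for hiding latencies.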

Page 26: Heuristics for Selection

• Pick the node with the most immediate successors
• Algorithm (for node x):
  – f_x = number of successors of x

Page 27: Example

  Results in:
  1: lea var_a, %rax        1 cycle
  2: add $4, %rax           1 cycle
  3: inc %r11               1 cycle
  4: mov 4(%rsp), %r10      3 cycles
  5: add %r10, 8(%rsp)      3 cycles
  6: and 16(%rsp), %rbx     4 cycles
  7: imul %rax, %rbx        3 cycles
  8: mov %rbx, 16(%rsp)
  9: lea var_b, %rax

Page 28: Example

  1: lea var_a, %rax
  2: add $4, %rax
  3: inc %r11
  4: mov 4(%rsp), %r10
  5: add %r10, 8(%rsp)
  6: and 16(%rsp), %rbx
  7: imul %rax, %rbx
  8: mov %rbx, 16(%rsp)
  9: lea var_b, %rax

  Dependence DAG: 1 → 2 → 7, 6 → 7, 4 → 5, 7 → 8, 7 → 9

Page 29: Example

  Heuristic values (d = longest path to a leaf, f = number of immediate successors):

  1: d=5 f=1   2: d=4 f=1   3: d=0 f=0
  4: d=3 f=1   5: d=0 f=0   6: d=7 f=1
  7: d=3 f=2   8: d=0 f=0   9: d=0 f=0

Page 30: READY = { }

Page 31: READY = { 1, 3, 4, 6 }

Page 32: READY = { 1, 4, 3 }        scheduled: 6

Page 33: READY = { 2, 4, 3 }        scheduled: 6 1

Page 34: READY = { 7, 4, 3 }        scheduled: 6 1 2

Page 35: READY = { 7, 3, 5 }        scheduled: 6 1 2 4

Page 36: READY = { 3, 5, 8, 9 }     scheduled: 6 1 2 4 7

Page 37: READY = { 5, 8, 9 }        scheduled: 6 1 2 4 7 3

Page 38: READY = { 9 }              scheduled: 6 1 2 4 7 3 5 8

Page 39: READY = { 9 }              scheduled: 6 1 2 4 7 3 5 8 9

Page 40: READY = { }                scheduled: 6 1 2 4 7 3 5 8 9

Page 41: Example

  1: lea var_a, %rax        1 cycle
  2: add $4, %rax           1 cycle
  3: inc %r11               1 cycle
  4: mov 4(%rsp), %r10      3 cycles
  5: add %r10, 8(%rsp)      3 cycles
  6: and 16(%rsp), %rbx     4 cycles
  7: imul %rax, %rbx        3 cycles
  8: mov %rbx, 16(%rsp)
  9: lea var_b, %rax

  Original:  1 2 3 4 st st 5 6 st st st 7 8 9   (14 cycles)
  Scheduled: 6 1 2 4 7 3 5 8 9                  (9 cycles)

Page 42: Outline

• Modern architectures
• Introduction to instruction scheduling
• List scheduling
• Resource constraints
• Scheduling across basic blocks
• Trace scheduling

Page 43: Resource Constraints

• Modern machines have many resource constraints
• Superscalar architectures:
  – can run a few parallel operations
  – but have constraints

Page 44: Resource Constraints of a Superscalar Processor

• Example:
  – One fully pipelined reg-to-reg unit
      • all integer operations take one cycle
  – in parallel with
  – One fully pipelined memory-to/from-reg unit
      • data loads take two cycles
      • data stores take one cycle

Page 45: List Scheduling Algorithm with Resource Constraints

• Represent the superscalar architecture as multiple pipelines
  – Each pipeline represents some resource

Page 46: List Scheduling Algorithm with Resource Constraints

• Represent the superscalar architecture as multiple pipelines
  – Each pipeline represents some resource
• Example
  – One single-cycle reg-to-reg ALU unit
  – One two-cycle pipelined reg-to/from-memory unit

  Pipelines:  ALU   (one stage)
              MEM 1 → MEM 2   (two stages)

Page 47: List Scheduling Algorithm with Resource Constraints

• Create a dependence DAG of a basic block
• Topological sort:

  READY = nodes with no predecessors
  Loop until READY is empty
      Let n ∈ READY be the node with the highest priority
      Schedule n in the earliest slot that satisfies
          precedence + resource constraints
      Update READY
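The resource-constrained variant above can be sketched with a per-unit reservation set. This is a simplified model under the previous slide's assumptions: two hypothetical fully pipelined units ("ALU" and "MEM"), so each unit can accept at most one new issue per cycle.

```python
# List scheduling with resource constraints.
# instrs:   {id: unit name}; edges: {(i, j): latency}; priority: id -> number.
# Returns the chosen issue cycle for each instruction.
def schedule_with_resources(instrs, edges, priority):
    preds = {j: {i for (i, k) in edges if k == j} for j in instrs}
    busy = {"ALU": set(), "MEM": set()}   # issue cycles already taken per unit
    start, done = {}, set()
    while len(done) < len(instrs):
        ready = [i for i in instrs if i not in done and preds[i] <= done]
        n = max(ready, key=priority)      # highest-priority READY node
        # earliest cycle satisfying precedence constraints...
        t = max((start[i] + lat for (i, j), lat in edges.items() if j == n),
                default=0)
        # ...then advance past cycles where the unit is already issuing
        while t in busy[instrs[n]]:
            t += 1
        busy[instrs[n]].add(t)
        start[n] = t
        done.add(n)
    return start

# Two loads and one ALU op that consumes the first load (latency 2).
instrs = {1: "MEM", 2: "MEM", 3: "ALU"}
edges = {(1, 3): 2}
print(schedule_with_resources(instrs, edges, lambda i: {1: 2, 2: 0, 3: 1}[i]))
```

The second load slides into cycle 1 on the MEM unit while the ALU op waits for its operand, so both constraints (precedence and one-issue-per-unit-per-cycle) are met without idle units.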

Page 48: Example

  1: lea var_a, %rax
  2: add 4(%rsp), %rax
  3: inc %r11
  4: mov 4(%rsp), %r10
  5: mov %r10, 8(%rsp)
  6: and $0x00ff, %rbx
  7: imul %rax, %rbx
  8: lea var_b, %rax
  9: mov %rbx, 16(%rsp)

Page 49: Example

  1: lea var_a, %rax
  2: add 4(%rsp), %rax
  3: inc %r11
  4: mov 4(%rsp), %r10
  5: mov %r10, 8(%rsp)
  6: and $0x00ff, %rbx
  7: imul %rax, %rbx
  8: lea var_b, %rax
  9: mov %rbx, 16(%rsp)

  Resulting schedule (one row per pipeline stage, one column per cycle):

  ALU:    1  6  3  7  8
  MEM 1:  4  2  5  9
  MEM 2:     4  2  5  9

Page 50: Outline

• Modern architectures
• Introduction to instruction scheduling
• List scheduling
• Resource constraints
• Scheduling across basic blocks
• Trace scheduling

Page 51: Scheduling Across Basic Blocks

• The number of instructions in a basic block is small
  – Cannot keep multiple units with long pipelines busy by just scheduling within a basic block
• Need to handle control dependence
  – Scheduling constraints across basic blocks
  – Scheduling policy

Page 52: Moving Across Basic Blocks

• Downward to an adjacent basic block

      A
     / \
    B   C

Page 53: Moving Across Basic Blocks

• Downward to an adjacent basic block (an instruction moves from A into B)

      A
     / \
    B   C

Page 54: Moving Across Basic Blocks

• Downward to an adjacent basic block

      A
     / \
    B   C

• A path to B that does not execute A?

Page 55: Moving Across Basic Blocks

• Upward to an adjacent basic block

    B   C
     \ /
      A

Page 56: Moving Across Basic Blocks

• Upward to an adjacent basic block (an instruction moves from C into A)

    B   C
     \ /
      A

Page 57: Moving Across Basic Blocks

• Upward to an adjacent basic block

    B   C
     \ /
      A

• A path from C that does not reach A?

Page 58: Control Dependencies

• Constraints in moving instructions across basic blocks

Page 59: Control Dependencies

• Constraints in moving instructions across basic blocks

  if ( . . . )
      a = b op c

Page 60: Control Dependencies

• Constraints in moving instructions across basic blocks

  if ( . . . )
      a = b op c

Page 61: Control Dependencies

• Constraints in moving instructions across basic blocks

  if ( c != 0 )
      a = b / c

  NO!!!

Page 62: Control Dependencies

• Constraints in moving instructions across basic blocks

  if ( . . . )
      d = *(a1)

Page 63: Control Dependencies

• Constraints in moving instructions across basic blocks

  if ( valid address? )
      d = *(a1)

Page 64: Outline

• Modern architectures
• Introduction to instruction scheduling
• List scheduling
• Resource constraints
• Scheduling across basic blocks
• Trace scheduling

Page 65: Trace Scheduling

• Find the most common trace of basic blocks
  – Use profile information
• Combine the basic blocks in the trace and schedule them as one block
• Create clean-up code if the execution goes off-trace

Page 66: Trace Scheduling

      A
     / \
    B   C
     \ /
      D
      |
      E
     / \
    F   G
     \ /
      H

Page 67: Trace Scheduling

  Most common trace: A → B → D → E → G → H

Page 68: Large Basic Blocks via Code Duplication

• Create large extended basic blocks by duplication
• Schedule the larger blocks

      A
     / \
    B   C
     \ /
      D
      |
      E

Page 69: Large Basic Blocks via Code Duplication

• Create large extended basic blocks by duplication
• Schedule the larger blocks

      A                A
     / \              / \
    B   C     →      B   C
     \ /             |   |
      D              D   D
      |              |   |
      E              E   E

Page 70: Trace Scheduling

  (Figure: the trace A–B–D–E–G–H is scheduled as one block; duplicated copies
  of the shared blocks handle the off-trace paths through C and F.)

Page 71: Next

• Scheduling for loops
• Loop unrolling
• Software pipelining
• Interaction with register allocation
• Hardware vs. Compiler

Page 72: MIT OpenCourseWare

http://ocw.mit.edu

6.035 Computer Language Engineering, Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.