Code Optimization

Page 1: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Code Optimization

Announcements

● Programming Project 4 due Saturday at 11:30AM.
    ● Stop by office hours!
    ● Ask questions via email!
    ● Ask questions via Piazza!

● Keith has extra office hours this Wednesday and Friday from 2PM – 4PM.

● Online course evaluation available on Axess.
    ● Please give feedback!

Where We Are

[Figure: compiler pipeline. Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → IR Generation → IR Optimization → Code Generation → Optimization → Machine Code]

Final Code Optimization

● Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level.

● Critical step in most compilers, but often very messy:
    ● Techniques developed for one machine may be completely useless on another.
    ● Techniques developed for one language may be completely useless with another.

Outline

● Goals for Today:
    ● Explore important properties of machines and how they impact performance.
    ● Survey common optimization techniques and the theory behind them.
    ● Motivate you to take CS243 to learn more!

● Non-Goals for Today:
    ● Explore tricky details of the algorithms.

Optimizations for Pipelining

Processor Pipelines

add $t2, $t0, $t1 # $t2 = $t0 + $t1
add $t5, $t3, $t4 # $t5 = $t3 + $t4
add $t8, $t6, $t7 # $t8 = $t6 + $t7

[Figure: processor pipeline built from an Instruction Decoder, a Register File (Read Port and Write Port), and an ALU. Pipeline stages: ID, RR, ALU, RW.]

Pipeline Hazards

add $t2, $t0, $t1 # $t2 = $t0 + $t1
add $t4, $t3, $t2 # $t4 = $t3 + $t2
add $t7, $t5, $t6 # $t7 = $t5 + $t6
add $t0, $t0, $t7 # $t0 = $t0 + $t7

[Figure: the instructions above moving through the ID, RR, ALU, and RW stages of the pipeline.]

This value isn't ready yet! Stall pipeline until value is ready.

With the second and third instructions swapped:

add $t2, $t0, $t1 # $t2 = $t0 + $t1
add $t7, $t5, $t6 # $t7 = $t5 + $t6
add $t4, $t3, $t2 # $t4 = $t3 + $t2
add $t0, $t0, $t7 # $t0 = $t0 + $t7

Two clock cycles faster!

Instruction Scheduling

● Because of processor pipelining, the order in which instructions are executed can impact performance.

● Instruction scheduling is the reordering or insertion of machine instructions to increase performance.

● All good optimizing compilers have some sort of instruction scheduling support.
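
To make the effect of instruction order concrete, here is a minimal sketch of a cycle counter. It assumes a simplified in-order pipeline with the four stages ID, RR, ALU, and RW, and it assumes that a register written in RW can be read by an RR stage in the same cycle. The function name `total_cycles` and the tuple encoding of instructions are illustrative choices, not something taken from the lecture.

```python
from typing import Dict, List, Tuple

# An instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def total_cycles(program: List[Instr]) -> int:
    """Return the cycle in which the last instruction completes its RW stage."""
    ready: Dict[str, int] = {}   # register -> cycle in which it is written (RW)
    prev = [0, 0, 0, 0]          # cycles the previous instruction spent in ID, RR, ALU, RW
    for dest, srcs in program:
        # An instruction enters ID once the previous one has left it.
        id_c = max(prev[0] + 1, prev[1])
        # RR follows ID, stays behind the previous instruction,
        # and stalls until every source operand has been written.
        rr_c = max(id_c + 1, prev[1] + 1, *(ready.get(s, 0) for s in srcs))
        alu_c = max(rr_c + 1, prev[2] + 1)
        rw_c = max(alu_c + 1, prev[3] + 1)
        ready[dest] = rw_c
        prev = [id_c, rr_c, alu_c, rw_c]
    return prev[3]


ORIGINAL = [
    ("$t2", ("$t0", "$t1")),
    ("$t4", ("$t3", "$t2")),
    ("$t7", ("$t5", "$t6")),
    ("$t0", ("$t0", "$t7")),
]
# Same instructions with the second and third swapped.
REORDERED = [ORIGINAL[0], ORIGINAL[2], ORIGINAL[1], ORIGINAL[3]]

print(total_cycles(ORIGINAL), total_cycles(REORDERED))
```

Under these assumptions the original sequence finishes in 9 cycles and the reordered one in 7. A pipeline with different stage timing or forwarding hardware would give different absolute numbers.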

Data Dependencies

● A data dependency in machine code is a set of instructions whose behavior depends on one another.

● Intuitively, a set of instructions that cannot be reordered around each other.

● Three types of data dependencies:

Read-after-Write (RAW)    Write-after-Read (WAR)    Write-after-Write (WAW)
x = …                     … = x                     x = …
… = x                     x = …                     x = …
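
The three cases can be checked mechanically. The sketch below is a hypothetical helper (the name `dependencies` and the tuple encoding are assumptions for illustration) that reports which kinds of dependency hold between an earlier and a later instruction, looking only at that pair and ignoring anything in between.

```python
from typing import List, Tuple

# An instruction is (variable written, variables read).
Instr = Tuple[str, Tuple[str, ...]]


def dependencies(first: Instr, second: Instr) -> List[str]:
    """Kinds of data dependency of `second` on the earlier instruction `first`."""
    written_1, read_1 = first[0], set(first[1])
    written_2, read_2 = second[0], set(second[1])
    kinds = []
    if written_1 in read_2:      # second reads what first wrote
        kinds.append("RAW")
    if written_2 in read_1:      # second overwrites what first read
        kinds.append("WAR")
    if written_1 == written_2:   # both write the same variable
        kinds.append("WAW")
    return kinds


# t0 = t1 + t2 followed by t1 = t0 + t1
print(dependencies(("t0", ("t1", "t2")), ("t1", ("t0", "t1"))))
# t3 = t2 + t4 followed by t6 = t2 + t7
print(dependencies(("t3", ("t2", "t4")), ("t6", ("t2", "t7"))))
```

An empty result means the two instructions can be swapped without changing the values either one computes.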

Finding Data Dependencies

t0 = t1 + t2
t1 = t0 + t1
t3 = t2 + t4
t0 = t1 + t2
t5 = t3 + t4
t6 = t2 + t7

[Figure: the data dependency graph for these instructions, built up one edge at a time.]

Data Dependency Graphs

● The graph of the data dependencies in a basic block is called the data dependency graph.

● Always a directed acyclic graph:
    ● Directed: One instruction depends on the other.
    ● Acyclic: No circular dependencies allowed. (Why?)

● Can schedule instructions in a basic block in any order as long as we never schedule a node before all of its parents have been scheduled.

● Idea: Do a topological sort of the data dependency graph and output instructions in that order.
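
The idea can be sketched as follows. This is a minimal illustration rather than the lecture's own algorithm: it builds the graph by comparing every pair of instructions for RAW, WAR, and WAW conflicts, then sorts it with Kahn's algorithm. The helper names (`parse`, `build_graph`, `topological_order`) are hypothetical, and the program order used for `BLOCK` is an assumption chosen to be consistent with the schedules shown on the slides.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def parse(line: str) -> Tuple[str, Tuple[str, ...]]:
    """Split 'a = b + c' into (variable written, variables read)."""
    dest, expr = line.split("=")
    return dest.strip(), tuple(v.strip() for v in expr.split("+"))


def build_graph(block: List[str]) -> Dict[int, Set[int]]:
    """Map each instruction index to the later instructions that depend on it."""
    instrs = [parse(line) for line in block]
    succs: Dict[int, Set[int]] = {i: set() for i in range(len(instrs))}
    for j, (wj, rj) in enumerate(instrs):
        for i in range(j):
            wi, ri = instrs[i]
            if wi in rj or wj in ri or wi == wj:  # RAW, WAR, WAW
                succs[i].add(j)
    return succs


def topological_order(succs: Dict[int, Set[int]]) -> List[int]:
    """Kahn's algorithm: repeatedly emit a node with no unscheduled parents."""
    indegree = {n: 0 for n in succs}
    for children in succs.values():
        for child in children:
            indegree[child] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(succs[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order


# Assumed program order for the example basic block.
BLOCK = [
    "t0 = t1 + t2",
    "t1 = t0 + t1",
    "t3 = t2 + t4",
    "t0 = t1 + t2",
    "t5 = t3 + t4",
    "t6 = t2 + t7",
]
GRAPH = build_graph(BLOCK)
ORDER = topological_order(GRAPH)
print([BLOCK[i] for i in ORDER])
```

Any order produced this way is legal. Whether it is fast on a given pipeline is a separate question.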

Instruction Scheduling

[Figure: the data dependency graph with its nodes emitted one at a time in topological order.]

t3 = t2 + t4
t5 = t3 + t4
t0 = t1 + t2
t1 = t0 + t1
t0 = t1 + t2
t6 = t2 + t7

Is this a legal schedule?

Is this a good schedule?

Instruction Scheduling

[Figure: the same data dependency graph, scheduled again in a different topological order.]

t0 = t1 + t2
t3 = t2 + t4
t1 = t0 + t1
t5 = t3 + t4
t6 = t2 + t7
t0 = t1 + t2

One Small Problem

● There can be many valid topological orderings of a data dependency graph.

● How do we pick one that works well with the pipeline?

● In general, finding the fastest instruction schedule is known to be NP-hard.
    ● Don't expect a polynomial-time algorithm anytime soon!

● Heuristics are used in practice:
    ● Schedule instructions that can run to completion without interference before instructions that cause interference.
    ● Schedule instructions with more dependents before instructions with fewer dependents.
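The two heuristics can be sketched as a small greedy list scheduler. This is only an illustration under stated assumptions, not the algorithm of any particular compiler: it assumes a single-issue pipeline, a uniform delay of 3 cycles on every dependence (`DELAY`), and it treats any read-after-write, write-after-read, or write-after-write pair as a dependence. The names `Instr`, `build_preds`, `issue_cycles`, and `list_schedule` are hypothetical. The instruction list is the six-instruction example used in these slides.

```python
from dataclasses import dataclass

DELAY = 3  # assumption: a dependent instruction must issue at least 3 cycles later


@dataclass(frozen=True)
class Instr:
    dest: str
    srcs: tuple

    def __str__(self):
        return f"{self.dest} = {' + '.join(self.srcs)}"


def build_preds(code):
    """preds[j] = indices of earlier instructions that instruction j must follow."""
    preds = [set() for _ in code]
    for j, later in enumerate(code):
        for i in range(j):
            earlier = code[i]
            raw = earlier.dest in later.srcs   # later reads what earlier wrote
            war = later.dest in earlier.srcs   # later overwrites what earlier read
            waw = later.dest == earlier.dest   # both write the same temporary
            if raw or war or waw:
                preds[j].add(i)
    return preds


def earliest_issue(i, preds, issued):
    """First cycle at which i may issue, given when its predecessors issued."""
    return max((issued[p] + DELAY for p in preds[i]), default=0)


def issue_cycles(order, preds):
    """Issue cycle of every instruction when executed in the given order."""
    issued, cycle = {}, 0
    for i in order:
        cycle = max(cycle, earliest_issue(i, preds, issued))  # stall if needed
        issued[i] = cycle
        cycle += 1
    return issued


def list_schedule(code):
    """Greedy topological order guided by the two heuristics."""
    preds = build_preds(code)
    dependents = [sum(i in p for p in preds) for i in range(len(code))]
    issued, order, cycle = {}, [], 0
    while len(order) < len(code):
        ready = [i for i in range(len(code))
                 if i not in issued and all(p in issued for p in preds[i])]
        # Heuristic 1: prefer the instruction that can issue soonest (no stall).
        # Heuristic 2: break ties in favor of more dependents, then source order.
        best = min(ready, key=lambda i: (max(cycle, earliest_issue(i, preds, issued)),
                                         -dependents[i], i))
        cycle = max(cycle, earliest_issue(best, preds, issued))
        issued[best] = cycle
        order.append(best)
        cycle += 1
    return order


CODE = [
    Instr("t0", ("t1", "t2")),
    Instr("t1", ("t0", "t1")),
    Instr("t3", ("t2", "t4")),
    Instr("t0", ("t1", "t2")),
    Instr("t5", ("t3", "t4")),
    Instr("t6", ("t2", "t7")),
]
```

Under these assumptions, `list_schedule(CODE)` orders the instructions as t0, t3, t6, t1, t5, t0, with the last instruction issuing at cycle 6. Passing the original order to `issue_cycles` puts the last issue at cycle 8.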

Instruction Scheduling

[Figure: animated walk-through of scheduling the data dependency graph. Dependence edges are labeled +3, nodes carry numbers that start at 0 and are updated to 3, 4, and 6 as scheduling proceeds, and instructions are moved one at a time from the graph into a pipeline with stages ID, RR, ALU, RW.]

Instructions are emitted in this order:

t0 = t1 + t2
t3 = t2 + t4
t6 = t2 + t7
t1 = t0 + t1
t5 = t3 + t4
t0 = t1 + t2

For Comparison

t0 = t1 + t2

t3 = t2 + t4

t6 = t2 + t7

t1 = t0 + t1

t5 = t3 + t4

t0 = t1 + t2

ID RR ALU RW

t0 = t1 + t2

t1 = t0 + t1

t3 = t2 + t4

t0 = t1 + t2

t5 = t3 + t4

t6 = t2 + t7

ID RR ALU RW

More Advanced Scheduling

● Modern optimizing compilers can do far more aggressive scheduling to obtain impressive performance gains.

● One powerful technique: Loop unrolling
    ● Expand out several loop iterations at once.
    ● Use previous algorithm to schedule instructions more intelligently.
    ● Can find pipelining-level parallelism across loop iterations.

● Even more powerful technique: Software pipelining
    ● Loop unrolling on steroids; can convert loops using tens of cycles into loops averaging two or three cycles.
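As a rough source-level picture of loop unrolling, the sketch below sums a list with the loop body expanded four times. The unroll factor of 4 and the function names are assumptions for illustration. A compiler would apply this transformation to IR or machine code rather than to Python source, and the example uses integers so that reordering the additions cannot change the result.

```python
def sum_rolled(xs):
    total = 0
    for i in range(len(xs)):
        total = total + xs[i]      # every add depends on the previous add
    return total


def sum_unrolled(xs):
    """Same sum with four loop iterations expanded into one body."""
    a = b = c = d = 0
    n = len(xs)
    limit = n - n % 4              # largest multiple of 4 not exceeding n
    for i in range(0, limit, 4):
        # Four adds that do not depend on each other, so a scheduler
        # is free to overlap them in the pipeline.
        a = a + xs[i]
        b = b + xs[i + 1]
        c = c + xs[i + 2]
        d = d + xs[i + 3]
    total = a + b + c + d
    for i in range(limit, n):      # leftover iterations when n is not a multiple of 4
        total = total + xs[i]
    return total
```

The unrolled body exposes four independent dependence chains instead of one, which is the situation the scheduling heuristics above can exploit.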

Optimizations for Locality

Memory Caches

● Because computers use different types of memory, there are a variety of memory caches in the machine.

● Caches are designed to anticipate common use patterns.

● Compilers often have to rewrite code to take maximal advantage of these designs.

Locality

● Empirically, many programs exhibit temporal locality and spatial locality.

● Temporal locality: Memory read recently is likely to be read again.

● Spatial locality: Memory read recently will likely have nearby objects read as well.

● Most memory caches are designed to exploit temporal and spatial locality by
    ● Holding recently-used memory addresses in cache.
    ● Loading nearby memory addresses into cache.
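Both behaviors can be seen in a toy cache model. The sketch below is an assumption-laden illustration, not a description of real hardware: it assumes blocks of 4 elements, room for 2 blocks, and least-recently-used eviction, and the names `BlockCache` and `trace` are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    def __init__(self, block_size=4, num_lines=2):
        self.block_size = block_size
        self.num_lines = num_lines
        self.lines = OrderedDict()   # block number -> True, least recently used first

    def access(self, index):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = index // self.block_size
        if block in self.lines:
            self.lines.move_to_end(block)      # temporal locality: keep recent blocks
            return True
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)     # evict the least recently used block
        self.lines[block] = True               # spatial locality: neighbors are loaded too
        return False


def trace(indices, **kwargs):
    """Hit (True) or miss (False) for each array index, starting from an empty cache."""
    cache = BlockCache(**kwargs)
    return [cache.access(i) for i in indices]
```

For example, `trace([0, 2, 10, 1])` gives a miss, a hit, a miss, and a hit: indices 2 and 1 fall in the block that index 0 already brought in.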

Memory Caches

arr[0] = 5;
arr[2] = 6;
arr[10] = 13;
arr[1] = 4;

[Figure: animated walk-through of the four writes above, with panels labeled Memory and Cache. Memory holds arr[0] through arr[11], all initially 0, and the cache loads elements in groups of four. The first write loads arr[0] to arr[3] into the cache; the write to arr[2] is annotated "Already in cache!"; the write to arr[10] loads arr[8] to arr[11]; after the last write the cached copy of arr[0] to arr[3] holds 5, 4, 6, 0.]

The Problem with Caches

int[][] array;
for (j = 0; j < 4; j = j + 1)
    for (i = 0; i < 4; i = i + 1)
        array[i][j] = 0;

[Figure: the 4x4 array drawn as a grid (00 01 02 03 / 10 11 12 13 / 20 21 22 23 / 30 31 32 33) and then laid out in memory as the sequence 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33. The animation frames show the cache holding one four-element row at a time: first 00 01 02 03, then 10 11 12 13, then 20 21 22 23, then 30 31 32 33.]

Page 155: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Improving Locality

● Programmers frequently write code without understanding the locality implications.
    ● Languages don't expose low-level memory details.

● Some compilers are capable of rewriting code to take advantage of locality.

● Cool optimizations:
    ● Loop reordering.
    ● Structure peeling.

Page 156: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Loop Reordering

Before:

int[][] array;
for (j = 0; j < 4; j = j + 1)
    for (i = 0; i < 4; i = i + 1)
        array[i][j] = 0;

After swapping the two loops:

int[][] array;
for (i = 0; i < 4; i = i + 1)
    for (j = 0; j < 4; j = j + 1)
        array[i][j] = 0;

Memory: 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33

Cache contents with the reordered loops:

● i = 0: 00 01 02 03 (loaded once, then reused for the rest of the row)

● i = 1: 10 11 12 13 (loaded once, then reused for the rest of the row)
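The effect of the two loop orders can be checked with a small simulation. This is only a sketch under stated assumptions: the array is modeled as a flat row-major block of 16 ints, and the cache holds a single line of four ints, as in the figure. The class and method names (OneLineCacheSim, countMisses, columnOrder, rowOrder) are made up for illustration, and real Java int[][] arrays are not guaranteed to be laid out this way.

```java
// Hypothetical model: a cache that holds exactly one line of LINE_SIZE ints.
final class OneLineCacheSim {
    static final int N = 4;          // the array is N x N, as on the slides
    static final int LINE_SIZE = 4;  // assumption: one row fits in one cache line

    /** Counts misses for a sequence of flat (row-major) element addresses. */
    static int countMisses(int[] addresses) {
        int cachedLine = -1;  // -1 means the cache is empty
        int misses = 0;
        for (int addr : addresses) {
            int line = addr / LINE_SIZE;
            if (line != cachedLine) {  // the wanted line is not cached: load it
                misses++;
                cachedLine = line;
            }
        }
        return misses;
    }

    /** Addresses touched when j is the outer loop (the original order). */
    static int[] columnOrder() {
        int[] out = new int[N * N];
        int k = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                out[k++] = i * N + j;
        return out;
    }

    /** Addresses touched when i is the outer loop (the reordered version). */
    static int[] rowOrder() {
        int[] out = new int[N * N];
        int k = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                out[k++] = i * N + j;
        return out;
    }

    public static void main(String[] args) {
        System.out.println("j outer: " + countMisses(columnOrder()) + " misses");
        System.out.println("i outer: " + countMisses(rowOrder()) + " misses");
    }
}
```

Under this model the original order misses on every access, while the reordered loops miss once per row.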

Page 171: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Structure Peeling

Page 172: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Structure Peeling

class Point2D {
    int x;
    int y;
}

void MyFunction() {
    Point2D[] pts = new Point2D[1024];

    /* … initialize the points … */

    int maxX = 0, maxY = 0;
    for (int i = 0; i < 512; ++i)
        maxX = max(pts[i].x, maxX);
    for (int i = 512; i < 1024; ++i)
        maxY = max(pts[i].y, maxY);
}

Memory: pts[0].x pts[0].y pts[1].x pts[1].y pts[2].x pts[2].y pts[3].x pts[3].y pts[4].x pts[4].y pts[5].x pts[5].y ...

Cache (first loop): pts[0].x pts[0].y pts[1].x pts[1].y, then pts[2].x pts[2].y pts[3].x pts[3].y

Only half the cache is useful!

Array of classes to class of arrays!

Memory after peeling: pts[0].x pts[1].x pts[2].x pts[3].x pts[4].x ... pts[0].y pts[1].y pts[2].y pts[3].y pts[4].y ...

Cache (first loop): pts[0].x pts[1].x pts[2].x pts[3].x
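A hand-written version of the peeled layout might look like the sketch below. It is an illustration rather than what any particular compiler emits: PeeledPoints, xs, ys and maxInRange are hypothetical names, and the starting maximum of 0 is copied from the slide code.

```java
// Sketch of the "class of arrays" layout: one int[] per field of Point2D.
final class PeeledPoints {
    final int[] xs;  // all x fields, stored contiguously
    final int[] ys;  // all y fields, stored contiguously

    PeeledPoints(int n) {
        xs = new int[n];
        ys = new int[n];
    }

    /** Maximum of values[from..to), starting from 0 as the slide code does. */
    static int maxInRange(int[] values, int from, int to) {
        int best = 0;
        for (int i = from; i < to; ++i)
            best = Math.max(values[i], best);
        return best;
    }

    public static void main(String[] args) {
        PeeledPoints pts = new PeeledPoints(1024);
        /* … initialize the points … */
        int maxX = maxInRange(pts.xs, 0, 512);     // touches only x values
        int maxY = maxInRange(pts.ys, 512, 1024);  // touches only y values
        System.out.println(maxX + " " + maxY);
    }
}
```

Each loop now walks a single contiguous int array, so every element brought into the cache is one the loop actually reads.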

Page 188: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Optimizations for Parallelism

Page 189: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Everything is Parallel Now

● Virtually all new computers have multiple processors.
    ● Even some phones have multiple cores!

● High-end machines now usually have at least eight cores.

● How do we optimize code to work well in parallel?

Page 190: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Loop Parallelization

● Many numeric computations work by iterating over large arrays of elements.
    ● Especially true in FORTRAN code.

● Optimize loops over arrays by identifying inherent parallelism.

● Three-step process:
    ● Identify which array values depend on one another.
    ● Split each group of dependent values into its own task.
    ● Map each task onto one processor.

● Beautiful (but tricky!) framework for doing this automatically; I'm not even going to try to go into mathematical details.

Page 191: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Iteration Spaces

● The iteration space of a set of loops is the set of valid indices taken by the loop counters.
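As a concrete illustration, the iteration space of a two-deep loop nest can be listed explicitly. The sketch below is an assumption-laden example, not part of the lecture: IterationSpace and enumerate are hypothetical names, and the bounds used in main are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class IterationSpace {
    /** Every (i, j) pair visited by: for i in [iLo, iHi) for j in [jLo, jHi). */
    static List<int[]> enumerate(int iLo, int iHi, int jLo, int jHi) {
        List<int[]> points = new ArrayList<>();
        for (int i = iLo; i < iHi; ++i)
            for (int j = jLo; j < jHi; ++j)
                points.add(new int[] {i, j});
        return points;
    }

    public static void main(String[] args) {
        // Illustrative bounds: i runs over 0..7 and j over 1..7.
        List<int[]> space = enumerate(0, 8, 1, 8);
        System.out.println(space.size() + " points in the iteration space");
    }
}
```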


Page 195: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Identifying Dependent Accesses

for (int i = 0; i < 8; ++i)
    for (int j = 0; j < 8; ++j)
        arr[i][j] = 0;

Every iteration writes a different element and reads none, so no iteration depends on another.

Page 198: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Identifying Dependent Accesses

for (int i = 0; i < 8; ++i)
    for (int j = 1; j < 8; ++j)
        arr[i][j] = arr[i][j-1];

Each element depends on the element to its left in the same row; different rows are independent of one another.

Page 203: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Identifying Dependent Accesses

for (int i = 1; i < 8; ++i)
    for (int j = 1; j < 8; ++j)
        arr[i][j] = arr[i][j-1] + arr[i-1][j] + arr[i-1][j-1];

Each element depends on its left, upper, and upper-left neighbors.
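One well-known way to find parallelism in a loop whose elements depend on their left, upper, and upper-left neighbors is to process the array one anti-diagonal at a time, since all elements with the same i + j depend only on earlier diagonals. The sketch below assumes the assignment target is arr[i][j] and is not taken from the lecture: Wavefront, initial, sequential and byAntiDiagonals are hypothetical names, and the border values of 1 are an arbitrary choice.

```java
import java.util.stream.IntStream;

final class Wavefront {
    static final int N = 8;

    /** Arbitrary starting data: ones along the top row and left column. */
    static int[][] initial() {
        int[][] a = new int[N][N];
        for (int k = 0; k < N; ++k) {
            a[k][0] = 1;
            a[0][k] = 1;
        }
        return a;
    }

    /** The loop nest in its original sequential order. */
    static int[][] sequential() {
        int[][] arr = initial();
        for (int i = 1; i < N; ++i)
            for (int j = 1; j < N; ++j)
                arr[i][j] = arr[i][j - 1] + arr[i - 1][j] + arr[i - 1][j - 1];
        return arr;
    }

    /** Same computation, one anti-diagonal (i + j == diag) at a time. */
    static int[][] byAntiDiagonals() {
        int[][] arr = initial();
        for (int d = 2; d <= 2 * (N - 1); ++d) {
            final int diag = d;
            int lo = Math.max(1, diag - (N - 1));  // keeps j = diag - i <= N - 1
            int hi = Math.min(N - 1, diag - 1);    // keeps j = diag - i >= 1
            // Elements on one diagonal read only earlier diagonals, so they
            // can be computed in any order, including in parallel.
            IntStream.rangeClosed(lo, hi).parallel().forEach(i -> {
                int j = diag - i;
                arr[i][j] = arr[i][j - 1] + arr[i - 1][j] + arr[i - 1][j - 1];
            });
        }
        return arr;
    }
}
```

Both methods should produce the same array; the diagonal version only changes the order in which independent elements are computed.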

Page 207: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assigning Groups to Processors

Page 208: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assigning Groups to Processors

for (int i = 0; i < 8; ++i)
    for (int j = 0; j < 8; ++j)
        arr[i][j] = 0;

Processors: P0 P1 P2 P3
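Because no iteration of this loop depends on another, any split of the iteration space across P0 to P3 is legal. The sketch below shows one possible split, contiguous blocks of rows per worker thread, which also keeps each worker on its own rows. BlockAssignment, ownerOfRow and zeroInParallel are hypothetical names, and plain threads stand in for processors.

```java
import java.util.Arrays;

final class BlockAssignment {
    /** Processor that owns row i when rows are split into contiguous blocks. */
    static int ownerOfRow(int i, int rows, int processors) {
        int blockSize = (rows + processors - 1) / processors;  // ceiling division
        return i / blockSize;
    }

    /** Zeroes a rows x cols array using one thread per block of rows. */
    static int[][] zeroInParallel(int rows, int cols, int processors) {
        int[][] arr = new int[rows][cols];
        for (int[] row : arr)
            Arrays.fill(row, -1);  // sentinel so the zeroing is observable
        int blockSize = (rows + processors - 1) / processors;
        Thread[] workers = new Thread[processors];
        for (int p = 0; p < processors; ++p) {
            final int lo = p * blockSize;
            final int hi = Math.min(rows, lo + blockSize);
            workers[p] = new Thread(() -> {
                for (int i = lo; i < hi; ++i)
                    for (int j = 0; j < cols; ++j)
                        arr[i][j] = 0;
            });
            workers[p].start();
        }
        for (Thread t : workers) {
            try {
                t.join();  // wait until every block is finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return arr;
    }
}
```

With 8 rows and 4 processors, each worker gets two adjacent rows.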

Page 214: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assignment Considerations

● When assigning iterations to processors:
    ● Pick an assignment that is good for locality.
    ● Pick an assignment that maximizes the degree of parallelism.

● In many cases, can be determined automatically!
    ● See Ch. 11 of the Dragon Book.

Page 215: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assigning Groups to Processors

Page 216: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assigning Groups to Processors

for (int i = 0; i < 8; ++i)
    for (int j = 1; j < 8; ++j)
        arr[i][j] = arr[i][j-1];

Processors: P0 P1 P2 P3
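In this loop each element depends only on the element to its left, so a whole row forms one dependent group and different rows can run as separate tasks. A minimal sketch of that idea, using a parallel stream over row indices, is shown below; RowParallelCopy and propagate are hypothetical names, and how rows are mapped to actual cores is left to the Java runtime.

```java
import java.util.stream.IntStream;

final class RowParallelCopy {
    /** Runs arr[i][j] = arr[i][j-1] with rows as independent parallel tasks. */
    static void propagate(int[][] arr) {
        IntStream.range(0, arr.length).parallel().forEach(i -> {
            // Within a row the order matters, so the inner loop stays sequential.
            for (int j = 1; j < arr[i].length; ++j)
                arr[i][j] = arr[i][j - 1];
        });
    }

    public static void main(String[] args) {
        int[][] arr = new int[8][8];
        for (int i = 0; i < 8; ++i)
            arr[i][0] = i + 1;  // illustrative starting values
        propagate(arr);
        System.out.println(arr[7][7]);
    }
}
```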

Page 221: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assigning Groups to Processors

Page 222: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Assigning Groups to Processors

for (int i = 1; i < 8; ++i)
    for (int j = 1; j < 8; ++j)
        arr[i][j] = arr[i][j-1] + arr[i-1][j] + arr[i-1][j-1];

Processors: P0 P1 P2 P3

Page 226: Code Optimization - Stanford University...Final Code Optimization Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level. Critical step

Summary

● Instruction scheduling optimizations try to take advantage of the processor pipeline.

● Locality optimizations try to take advantage of cache behavior.

● Parallelism optimizations try to take advantage of multicore machines.

● There are many more optimizations out there!

Page 227

Where We Are

Source Code -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> IR Generation -> IR Optimization -> Code Generation -> Optimization -> Machine Code

Page 228

Where We Are

Source Code -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> IR Generation -> IR Optimization -> Code Generation -> Optimization -> Machine Code

One small step...

Page 230

Where We've Been

Source Code -> Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> IR Generation -> IR Optimization -> Code Generation -> Optimization -> Machine Code

Page 231

Where We've Been

Lexical Analysis
● Regular Expressions
● Finite Automata
● Maximal-Munch
● Subset Construction
● flex

Page 232

Where We've Been

Syntax Analysis
● Context-Free Grammars
● Parse Trees
● ASTs
● Leftmost DFS
● LL(1)
● Handles
● LR(0)
● SLR(1)
● LR(1)
● LALR(1)
● Earley
● Earley-on-DFA
● bison

Page 233

Where We've Been

Semantic Analysis
● Scope-Checking
● Spaghetti Stacks
● Function Overloading
● Type Systems
● Well-Formedness
● Null and Error Types
● Covariance

Page 234

Where We've Been

IR Generation
● Runtime Environments
● Function Stacks
● Closures
● Coroutines
● Parameter Passing
● Object Layouts
● Vtables
● Inline Caching
● TAC

Page 235

Where We've Been

IR Optimization
● Basic Blocks
● Control-Flow Graphs
● Common Subexpression Elimination
● Copy Propagation
● Dead Code Elimination
● Arithmetic Simplification
● Constant Folding
● Meet Semilattices
● Transfer Functions
● Global Constant Propagation
● Partial Redundancy Elimination

Page 236

Where We've Been

Code Generation
● Memory Hierarchies
● Live Ranges
● Live Intervals
● Linear-Scan Register Allocation
● Register Interference Graphs
● Chaitin's Graph-Coloring Algorithm
● Reference Counting
● Mark-and-Sweep Collectors
● Baker's Algorithm
● Stop-and-Copy Collectors
● Generational Collectors

Page 237

Where We've Been

Optimization
● Instruction Scheduling
● Loop Reordering
● Structure Peeling
● Automatic Parallelization

Page 238

Why Study Compilers? (Recap)

● Build a large, ambitious software system.
● See theory come to life.
● Learn how to build programming languages.
● Learn how programming languages work.
● Learn tradeoffs in language design.

Page 239

Where to Go from Here

Page 240

CS243: Program Analysis and Optimization

● In-depth treatment of optimization topics:
  ● Dataflow framework.
  ● Register allocation.
  ● Garbage collection.
  ● Instruction scheduling.
  ● Locality optimizations.
  ● Parallelization.
  ● Interprocedural analysis.

Page 241

CS242: Programming Languages

● Survey of programming languages, their innovations, and their implementations:
  ● Functional programming.
  ● Object-oriented languages through history.
  ● Runtime environments and optimizations.
  ● Language-level security.
  ● Modern language features.

Page 242

CS258: Intro to Programming Language Theory

● Mathematical exploration of programs and their properties:
  ● Operational semantics.
  ● Fixed-point operators and recursion.
  ● Axiomatic semantics.
  ● Formal algebras.
  ● Denotational semantics.

Page 243

CS343: Advanced Topics in Compilers

● Research-level topics in compilers and interpreters:
  ● Building fast JITs.
  ● Binary translation.
  ● Dynamic and static analysis.
  ● Sandboxing.
  ● Superoptimizers.

Page 245

Final Thoughts