cs738: advanced compiler optimizations - ssa continued

66
CS738: Advanced Compiler Optimizations SSA Continued Amey Karkare [email protected] http://www.cse.iitk.ac.in/~karkare/cs738 Department of CSE, IIT Kanpur

Upload: others

Post on 19-May-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS738: Advanced Compiler Optimizations - SSA Continued

CS738: Advanced Compiler Optimizations

SSA Continued

Amey Karkare

[email protected]

http://www.cse.iitk.ac.in/~karkare/cs738

Department of CSE, IIT Kanpur

Page 2: CS738: Advanced Compiler Optimizations - SSA Continued

Agenda

◮ Properties of SSA

◮ SSA to Executable

◮ SSA for Optimizations

Page 3: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

Page 4: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

Page 5: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

◮ A: number of assignments

Page 6: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

◮ A: number of assignments

◮ M: number of uses of variables

Page 7: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

◮ A: number of assignments

◮ M: number of uses of variables

◮ Computation of DF: O(R2)

Page 8: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

◮ A: number of assignments

◮ M: number of uses of variables

◮ Computation of DF: O(R2)

◮ Computation of SSA: O(R3)

Page 9: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

◮ A: number of assignments

◮ M: number of uses of variables

◮ Computation of DF: O(R2)

◮ Computation of SSA: O(R3)

◮ In practice, worst case is rare.

Page 10: CS738: Advanced Compiler Optimizations - SSA Continued

Complexity of Construction

◮ R = max(N,E ,A,M)

◮ N: nodes, E : edges in flow graph

◮ A: number of assignments

◮ M: number of uses of variables

◮ Computation of DF: O(R2)

◮ Computation of SSA: O(R3)

◮ In practice, worst case is rare.

◮ Practical complexity: O(R)

Page 11: CS738: Advanced Compiler Optimizations - SSA Continued

Linear Time Algorithm for φ-functions

◮ By Sreedhar and Gao, in POPL’95

Page 12: CS738: Advanced Compiler Optimizations - SSA Continued

Linear Time Algorithm for φ-functions

◮ By Sreedhar and Gao, in POPL’95

◮ Uses a new data structure called DJ-graph

Page 13: CS738: Advanced Compiler Optimizations - SSA Continued

Linear Time Algorithm for φ-functions

◮ By Sreedhar and Gao, in POPL’95

◮ Uses a new data structure called DJ-graph

◮ Linear time is achieved by careful ordering of nodes in the

DJ-graph

Page 14: CS738: Advanced Compiler Optimizations - SSA Continued

Linear Time Algorithm for φ-functions

◮ By Sreedhar and Gao, in POPL’95

◮ Uses a new data structure called DJ-graph

◮ Linear time is achieved by careful ordering of nodes in the

DJ-graph

◮ DF for a node is computed only once an reused later if

required.

Page 15: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form: Simple Example

x = . . .

. . . = x

y = . . .

z = . . .

x = . . .

. . . = x

y = . . .

z = . . .

. . . = y . . . = y

. . . = z

Original Program

Page 16: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form: Simple Example

x1 = . . .

. . . = x1

y1 = . . .

z1 = . . .

x2 = . . .

. . . = x2

y2 = . . .

z2 = . . .

. . . = y1 . . . = y2

x3 = φ(x1, x2)

y3 = φ(y1, y2)

z3 = φ(z1, z2)

. . . = z3

Minimal SSA form

Page 17: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form

◮ Minimal SSA still contains extraneous φ-functions

Page 18: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form

◮ Minimal SSA still contains extraneous φ-functions◮ Inserts some φ-functions where they are dead

Page 19: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form

◮ Minimal SSA still contains extraneous φ-functions◮ Inserts some φ-functions where they are dead◮ Would like to avoid inserting them

Page 20: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form

◮ Minimal SSA still contains extraneous φ-functions◮ Inserts some φ-functions where they are dead◮ Would like to avoid inserting them

◮ Pruned SSA

Page 21: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form

◮ Minimal SSA still contains extraneous φ-functions◮ Inserts some φ-functions where they are dead◮ Would like to avoid inserting them

◮ Pruned SSA

◮ Semi-Pruned SSA

Page 22: CS738: Advanced Compiler Optimizations - SSA Continued

Pruned SSA

◮ Only insert φ-functions where their value is live

Page 23: CS738: Advanced Compiler Optimizations - SSA Continued

Pruned SSA

◮ Only insert φ-functions where their value is live

◮ Inserts fewer φ-functions

Page 24: CS738: Advanced Compiler Optimizations - SSA Continued

Pruned SSA

◮ Only insert φ-functions where their value is live

◮ Inserts fewer φ-functions

◮ Costs more to do

Page 25: CS738: Advanced Compiler Optimizations - SSA Continued

Pruned SSA

◮ Only insert φ-functions where their value is live

◮ Inserts fewer φ-functions

◮ Costs more to do

◮ Requires global Live variable analysis

Page 26: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form: Pruned SSA Example

x1 = . . .

. . . = x1

y1 = . . .

z1 = . . .

x2 = . . .

. . . = x2

y2 = . . .

z2 = . . .

. . . = y1 . . . = y2

z3 = φ(z1, z2)

. . . = z3

Page 27: CS738: Advanced Compiler Optimizations - SSA Continued

Semi-Pruned SSA Form

◮ Discard names used in only one block

Page 28: CS738: Advanced Compiler Optimizations - SSA Continued

Semi-Pruned SSA Form

◮ Discard names used in only one block

◮ Total number of φ-functions between minimal and pruned

SSA

Page 29: CS738: Advanced Compiler Optimizations - SSA Continued

Semi-Pruned SSA Form

◮ Discard names used in only one block

◮ Total number of φ-functions between minimal and pruned

SSA

◮ Needs only local Live information

Page 30: CS738: Advanced Compiler Optimizations - SSA Continued

Semi-Pruned SSA Form

◮ Discard names used in only one block

◮ Total number of φ-functions between minimal and pruned

SSA

◮ Needs only local Live information

◮ Non-locals can be computed without iteration or elimination

Page 31: CS738: Advanced Compiler Optimizations - SSA Continued

Variants of SSA Form: Semi-pruned SSA Example

x1 = . . .

. . . = x1

y1 = . . .

z1 = . . .

x2 = . . .

. . . = x2

y2 = . . .

z2 = . . .

. . . = y1 . . . = y2

y3 = φ(y1, y2)

z3 = φ(z1, z2)

. . . = z3

Page 32: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

Page 33: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

Page 34: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

foreach instruction v = x op y {

Page 35: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

foreach instruction v = x op y {

if x not in defined

Page 36: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

foreach instruction v = x op y {

if x not in defined

non-locals = non-locals ∪ {x}

Page 37: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

foreach instruction v = x op y {

if x not in defined

non-locals = non-locals ∪ {x}if y not in defined

Page 38: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

foreach instruction v = x op y {

if x not in defined

non-locals = non-locals ∪ {x}if y not in defined

non-locals = non-locals ∪ {y}

Page 39: CS738: Advanced Compiler Optimizations - SSA Continued

Computing Non-locals

foreach block B {

defined = {}

foreach instruction v = x op y {

if x not in defined

non-locals = non-locals ∪ {x}if y not in defined

non-locals = non-locals ∪ {y}defined = defined ∪ {v}

}

}

Page 40: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code

Page 41: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

Page 42: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

◮ Basic idea

Page 43: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

◮ Basic idea◮ Insert copies in predecessors to mimic φ-function

Page 44: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

◮ Basic idea◮ Insert copies in predecessors to mimic φ-function◮ Simple algorithm

Page 45: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

◮ Basic idea◮ Insert copies in predecessors to mimic φ-function◮ Simple algorithm

◮ Works in most cases, but not always

Page 46: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

◮ Basic idea◮ Insert copies in predecessors to mimic φ-function◮ Simple algorithm

◮ Works in most cases, but not always

◮ Adds lots of copies

Page 47: CS738: Advanced Compiler Optimizations - SSA Continued

SSA to Executable

◮ At some point, we need executable code◮ Need to fix up the φ-function

◮ Basic idea◮ Insert copies in predecessors to mimic φ-function◮ Simple algorithm

◮ Works in most cases, but not always

◮ Adds lots of copies◮ Many of them will be optimized by later passes

Page 48: CS738: Advanced Compiler Optimizations - SSA Continued

φ-removal: Examplex0 = . . . x1 = . . .

x2 = φ(x0, x1)

. . . = x2

Page 49: CS738: Advanced Compiler Optimizations - SSA Continued

φ-removal: Examplex0 = . . . x1 = . . .

x2 = φ(x0, x1)

. . . = x2

⇓ φ-removal

x0 = . . .

x2 = x0

x1 = . . .

x2 = x1

. . . = x2

Page 50: CS738: Advanced Compiler Optimizations - SSA Continued

Lost Copy Problem

x = 1

y = x

x = x + 1

print y

Program

Page 51: CS738: Advanced Compiler Optimizations - SSA Continued

Lost Copy Problem

x = 1

y = x

x = x + 1

print y

x1 = 1

x2 = φ(x1, x3)

x3 = x2 + 1

print x2

Program SSA from with

copy propagation

Page 52: CS738: Advanced Compiler Optimizations - SSA Continued

Lost Copy Problem

x = 1

y = x

x = x + 1

print y

x1 = 1

x2 = φ(x1, x3)

x3 = x2 + 1

print x2

x1 = 1

x2 = x1

x3 = x2 + 1

x2 = x3

print x2

Program SSA from with After φ-removal

copy propagation

Page 53: CS738: Advanced Compiler Optimizations - SSA Continued

Lost Copy Problem: Solutions

x1 = 1

x2 = x1

x3 = x2 + 1

t = x2

x2 = x3

print t

1. Use of Temporary

Page 54: CS738: Advanced Compiler Optimizations - SSA Continued

Lost Copy Problem: Solutions

x1 = 1

x2 = x1

x3 = x2 + 1

t = x2

x2 = x3

print t

x1 = 1

x2 = x1

x3 = x2 + 1

print x2

x2 = x3

1. Use of Temporary 2. Critical Edge Split

Page 55: CS738: Advanced Compiler Optimizations - SSA Continued

Swap Problem

a = 1

b = 2

x = a

a = b

b = x

print b

Program

Page 56: CS738: Advanced Compiler Optimizations - SSA Continued

Swap Problem

a = 1

b = 2

x = a

a = b

b = x

print b

a1 = 1

b1 = 2

a2 = φ(a1, b2)

b2 = φ(b1, a2)

print a2

Program SSA form with

copy propagation

Page 57: CS738: Advanced Compiler Optimizations - SSA Continued

Swap Problem

a = 1

b = 2

x = a

a = b

b = x

print b

a1 = 1

b1 = 2

a2 = φ(a1, b2)

b2 = φ(b1, a2)

print a2

a1 = 1

b1 = 2

a2 = a1

b2 = b1

a2 = b2

b2 = a2

print a2

Program SSA form with After φ-removal

copy propagation

Page 58: CS738: Advanced Compiler Optimizations - SSA Continued

Swap Problem: Solution

◮ Fix requires compiler to detect and break dependency from

output of one φ-function to input of another φ-function.

Page 59: CS738: Advanced Compiler Optimizations - SSA Continued

Swap Problem: Solution

◮ Fix requires compiler to detect and break dependency from

output of one φ-function to input of another φ-function.

◮ May require temporary if cyclic dependency exists.

Page 60: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations

Page 61: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations◮ (Conditional) Constant propagation

Page 62: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations◮ (Conditional) Constant propagation◮ Dead code elimination

Page 63: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations◮ (Conditional) Constant propagation◮ Dead code elimination◮ Value numbering

Page 64: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations◮ (Conditional) Constant propagation◮ Dead code elimination◮ Value numbering◮ PRE

Page 65: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations◮ (Conditional) Constant propagation◮ Dead code elimination◮ Value numbering◮ PRE◮ Loop Invariant Code Motion

Page 66: CS738: Advanced Compiler Optimizations - SSA Continued

SSA Form for Optimizations

◮ SSA form can improve and/or speed up many analysesand optimizations◮ (Conditional) Constant propagation◮ Dead code elimination◮ Value numbering◮ PRE◮ Loop Invariant Code Motion◮ Strength Reduction