Optimal Chain Rule Placement for Instruction Selection based on SSA Graphs Stefan Schäfer, Bernhard Scholz (stefans|scholz)@it.usyd.edu.au School of IT, University of Sydney


Page 1: Optimal Chain Rule Placement for Instruction Selection based on

Optimal Chain Rule Placement for Instruction Selection based on SSA Graphs

Stefan Schäfer, Bernhard Scholz
(stefans|scholz)@it.usyd.edu.au

School of IT, University of Sydney

Page 2: Optimal Chain Rule Placement for Instruction Selection based on

Outline

● Related Work
● Motivation (Instruction Selection based on SSA Form)
● Chain Rule Placement
● Implementation
● Results
● Conclusion

Page 3: Optimal Chain Rule Placement for Instruction Selection based on

Instruction Selection based on SSA Graphs (1)

[Figure: compiler pipeline. Source Program -> Compiler Front-End -> Intermediate Representation in SSA Form -> Compiler Back-End -> Target Program. Stages shown: Machine-Independent Optimisations, Code Selection, Instruction Scheduling, Register Allocation.]

Page 4: Optimal Chain Rule Placement for Instruction Selection based on

Related Work (1)

● Tree Pattern Matching
C. Fraser, R. Henry, T. Proebsting: BURG – Fast Optimal Instruction Selection and Tree Parsing. ACM SIGPLAN Notices 27(4):68-76 (1992)
– Works fine with trees (expressions)
– Problem: control flow graphs are usually directed acyclic graphs

● Code Selection for DAGs
M. A. Ertl: Optimal Code Selection in DAGs. Proceedings of POPL 1999
– DAG matching is NP-complete.

Page 8: Optimal Chain Rule Placement for Instruction Selection based on

Related Work (2)

● Code Selection based on SSA Graphs
E. Eckstein, O. König, B. Scholz: Code Instruction Selection based on SSA Graphs. SCOPES 2003, Volume 2826 of Lecture Notes in Computer Science
– Introduced a heuristic code selection technique for DAGs
– Cost-optimal derivation of a graph grammar for a given SSA graph
– Chain rules used for type conversion, but optimal placement unaddressed
– "Optimal" means cost-minimal for a given cost metric

Page 9: Optimal Chain Rule Placement for Instruction Selection based on

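Instruction Selection based on SSA Graphs (2)

[Figure: control flow graph with the basic blocks b1 to b12 and b14, annotated with weight pairs. b1 (cast) is labelled [3014,1]; the other pairs shown are [3014,1], [293,1], [292,1], [1712,1], [1,1], [26,1], [59.2,1], [28,1], [25,1], [34,1], [124,1] and [94,1]. b11, b12 and b14 each contain an add.]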

Page 10: Optimal Chain Rule Placement for Instruction Selection based on

Instruction Selection based on SSA Graphs (2)

[Figure: SSA graph. The definition b1 (cast) is matched by sreg ::= cast(sreg) and yields sreg. Its users b14, b11 and b12 (add) are each matched by reg ::= add(reg, reg); their edges are labelled reg.]

Grammar rules with costs [time, space]:

reg ::= add(reg, reg) [10.0,1.0]
sreg ::= cast(sreg) [10.0,1.0]
reg ::= sreg [10.0,1.0]
sreg ::= reg [10.0,1.0]
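The grammar above mixes base rules (with an operator) and chain rules (one non-terminal derived directly from another). The following is only an illustrative sketch of that distinction: the `Rule` type, the `GRAMMAR` list and the `chain_rule` helper are hypothetical names, not part of the presented code generator generator, and reading the cost pair as [time, space] is an assumption based on the later cost table.

```python
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    lhs: str               # non-terminal produced by the rule
    op: Optional[str]      # operator, or None for a chain rule
    args: Tuple[str, ...]  # non-terminals consumed by the rule
    time: float            # assumed meaning of the first cost entry
    space: float           # assumed meaning of the second cost entry


# The four rules from the slide.
GRAMMAR = [
    Rule("reg", "add", ("reg", "reg"), 10.0, 1.0),
    Rule("sreg", "cast", ("sreg",), 10.0, 1.0),
    Rule("reg", None, ("sreg",), 10.0, 1.0),   # chain rule
    Rule("sreg", None, ("reg",), 10.0, 1.0),   # chain rule
]


def chain_rule(grammar, produced, required):
    """Return the chain rule turning `produced` into `required`, or None."""
    for rule in grammar:
        if rule.op is None and rule.lhs == required and rule.args == (produced,):
            return rule
    return None


# b1 (cast) yields sreg, while the adds consume reg operands.
conversion = chain_rule(GRAMMAR, produced="sreg", required="reg")
print(conversion.lhs, "::=", conversion.args[0])
```

The lookup only finds which conversion is needed between the cast and the adds; where in the control flow graph to place it is the question the rest of the talk addresses.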

Page 13: Optimal Chain Rule Placement for Instruction Selection based on

Instruction Selection based on SSA Graphs (2)

[Figure: same control flow graph as on Page 9, with b1 (cast) weighted [3014,1] and the adds in b11, b12 and b14.]

strategy                time          space   trade-off 1:4
def (b1)                30140         1       6028.8
uses (b11, b12, b14)    19300         3       3862.4
def/uses                19300         1       3862.4
optimal                 3510          1       704
placed at               b5, b9, b10   b1      b5, b7

reg ::= add(reg, reg) [10.0,1.0]
sreg ::= cast(sreg) [10.0,1.0]
reg ::= sreg [10.0,1.0]
sreg ::= reg [10.0,1.0]
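The trade-off column for the def and uses rows can be reproduced by weighting time and space 1:4 and dividing by the sum of the weights. This weighting is inferred from the numbers in the table and is not spelled out on the slide; `trade_off` is a hypothetical helper name.

```python
def trade_off(time, space, time_weight=1.0, space_weight=4.0):
    """Weighted mean of time and space cost (assumed 1:4 weighting)."""
    total = time_weight + space_weight
    return (time_weight * time + space_weight * space) / total


# (time, space) pairs copied from the table above.
strategies = {
    "def (b1)": (30140, 1),
    "uses (b11, b12, b14)": (19300, 3),
}

for name, (time, space) in strategies.items():
    print(f"{name}: {trade_off(time, space):.1f}")
```

Under this reading, the time figure itself is the rule's time cost of 10.0 multiplied by the execution weight of the chosen block, for example 10.0 × 3014 = 30140 for placement at b1.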

Page 19: Optimal Chain Rule Placement for Instruction Selection based on

SSA Form

● Static Single Assignment form
● There is at most one assignment to each variable.
● Each definition of a variable is distinct.
● Multiple definitions have to be resolved:
– if (e) b=32 else b=42; -> if (e) b1=32 else b2=42;
● Further uses induce φ-functions:
– a=b; -> a=φ(b1,b2);
● SSA graphs as intermediate data flow representation in SSA form

Page 21: Optimal Chain Rule Placement for Instruction Selection based on

Chain Rule Placement

● Map the CFG to a network
● Reduce the network for each definition and non-terminal (a definition node dominates all of its users)
● Find a minimum cut for each reduced network

Page 22: Optimal Chain Rule Placement for Instruction Selection based on

Mapping to a Network

[Figure: the CFG nodes d, u and v (weight 10 each) are mapped to the node pairs dn/dx, un/ux and vn/vx, each pair carrying the weight 10; the remaining edges are labelled ∞. The labels "tnt" and "d" also appear in the network.]
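A minimal sketch of the mapping and the cut, assuming the usual node-splitting construction: each block becomes an entry/exit pair joined by an edge whose capacity is the cost of placing the chain rule in that block, control flow edges get infinite capacity, and a plain Edmonds-Karp max-flow yields a minimum cut. The function name, the source/sink labels and the weights in the second call are illustrative and not taken from the slides.

```python
from collections import deque

INF = float("inf")


def min_cut_placement(weights, edges, definition, users):
    """Return (cost, blocks) of a cheapest block set separating definition from users."""
    cap = {}

    def add_edge(a, b, c):
        cap.setdefault(a, {}).setdefault(b, 0)
        cap.setdefault(b, {}).setdefault(a, 0)  # residual back edge
        cap[a][b] += c

    src, sink = "source", "sink"
    for node, weight in weights.items():
        add_edge((node, "n"), (node, "x"), weight)  # cutting here = placing the rule
    for a, b in edges:
        add_edge((a, "x"), (b, "n"), INF)           # control flow edges cannot be cut
    add_edge(src, (definition, "n"), INF)
    for user in users:
        add_edge((user, "x"), sink, INF)

    def search():
        """Breadth-first search in the residual network; returns parent links."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        return parent

    parent = search()
    while sink in parent:
        path, b = [], sink
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        parent = search()

    # Saturated split edges crossing from the reachable side form the cut.
    chosen = sorted(n for n in weights
                    if (n, "n") in parent and (n, "x") not in parent)
    return sum(weights[n] for n in chosen), chosen


cfg_edges = [("d", "u"), ("d", "v")]
# Weights from the slide: placing once at d is cheaper than at both users.
print(min_cut_placement({"d": 10, "u": 10, "v": 10}, cfg_edges, "d", ["u", "v"]))
# Illustrative weights: a hot definition and cold users move the cut to the users.
print(min_cut_placement({"d": 30, "u": 5, "v": 7}, cfg_edges, "d", ["u", "v"]))
```

Because only the split edges have finite capacity, every finite cut corresponds to a set of blocks, and its capacity is the total placement cost of that set.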

Page 24: Optimal Chain Rule Placement for Instruction Selection based on

Reducing each Network

● Done for each definition d and non-terminal
● Starts in each user u:
● Case 1: u is not a φ-node
– All nodes on all acyclic paths from d to u are dominated by d
– All those nodes are added to the reduced network

Page 26: Optimal Chain Rule Placement for Instruction Selection based on

Reducing each Network

● Done for each definition d and non-terminal
● Starts in each user u:
● Case 2: u is a φ-node, all v ∈ preds(u) are dominated by d
– All nodes on all acyclic paths from d to v are dominated by d
– All those nodes and u are added to the reduced network

[Figure: CFG with root r, the definitions w1 = op(...) and w2 = op'(...), the predecessor blocks v1 and v2, and the user u = φ(..., w1, ..., w2, ...).]

Page 28: Optimal Chain Rule Placement for Instruction Selection based on

Reducing each Network

● Done for each definition d and non-terminal
● Starts in each user u:
● Case 3: u is a φ-node, some v ∈ preds(u) is not dominated by d
– Stop traversal for all users of d and add only d to the reduced network
– Not cost-optimal, but does not occur very often: 2264628 nodes, 94183 φ-uses, case 3 occurs 1076 times

[Figure: CFG with root r, the definitions d1 = op(...) and d2 = op'(...), the blocks x1, x2 and y, and the user u = φ(..., d1, ..., d2, ...).]
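The three cases can be told apart with an ordinary dominator computation. The sketch below is an illustration under stated assumptions, not the presented implementation: the CFG is a dictionary from block to successor list, every block other than the entry has at least one predecessor, and `dominators`, `reduction_case` and the two example graphs are hypothetical.

```python
def dominators(succ, entry):
    """Iterative dominator sets for a CFG given as {block: [successors]}."""
    preds = {n: set() for n in succ}
    for node, targets in succ.items():
        for target in targets:
            preds[target].add(node)
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry:
                continue
            # A block is dominated by itself and by whatever dominates all its predecessors.
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds


def reduction_case(succ, entry, d, u, u_is_phi):
    """Classify user u of definition d into case 1, 2 or 3 from the slides."""
    dom, preds = dominators(succ, entry)
    if not u_is_phi:
        return 1
    if all(d in dom[v] for v in preds[u]):
        return 2
    return 3


# Every predecessor of the φ-node u is reached only through d.
all_dominated = {"r": ["d"], "d": ["v1", "v2"], "v1": ["u"], "v2": ["u"], "u": []}
# Predecessor x of the φ-node u is reachable without passing through d.
one_escapes = {"r": ["d", "x"], "d": ["u"], "x": ["u"], "u": []}

print(reduction_case(all_dominated, "r", "d", "u", u_is_phi=True))   # prints 2
print(reduction_case(one_escapes, "r", "d", "u", u_is_phi=True))     # prints 3
print(reduction_case(all_dominated, "r", "d", "v1", u_is_phi=False)) # prints 1
```

In case 3 the slides fall back to placing at d only, which keeps the result correct at the price of optimality for those few uses.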

Page 31: Optimal Chain Rule Placement for Instruction Selection based on

Implementation

[Figure: tool flow with the components Graph Grammar, Code Generator Generator, Source for Code Generator in L, Code Base in L, PBQP Library for L, Compiler for L, and Code Generator in L. Running the code generator on an input program in SSA form involves Base Rule Matching, Chain Rule Placement and Complete Matching.]

Page 35: Optimal Chain Rule Placement for Instruction Selection based on

Costs (Spec2000, Time:Space 1:4)

[Figure: chart of costs in % (axis from 0 to 100) for the strategies Use, Def, Def-Use and Min-Cut on the benchmarks 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf and 301.apsi.]

Page 36: Optimal Chain Rule Placement for Instruction Selection based on

Costs (MiBench, Time:Space 1:4)

[Figure: chart of costs in % (axis from 0% to 100%) for the strategies Use, Def, Def-Use and Min-Cut on the benchmarks bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast and untoast.]

Page 37: Optimal Chain Rule Placement for Instruction Selection based on

Execution Times (Spec2000)

[Figure: chart of execution time shares (axis "% Time", 0% to 100%) with the labels Misc, Min Cut, Network, PBQP and Program, for the benchmarks 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf and 301.apsi.]

Page 38: Optimal Chain Rule Placement for Instruction Selection based on

Execution Times (MiBench)

[Figure: chart of execution time shares (axis "% Time", 0% to 100%) with the labels Misc, Min Cut, Network, PBQP and Program, for the benchmarks bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast and untoast.]

Page 39: Optimal Chain Rule Placement for Instruction Selection based on

Contributions

● Contributed to code selection based on SSA graphs
● Main contributions:
– Formally addressed the unsolved problem of placing chain rules optimally
– Introduced an efficient and effective algorithm to place chain rules optimally with respect to an arbitrary cost metric
– Implemented a free, open-source code generator generator, enhancing rule matching with chain rule placement
– Proved the correctness of our algorithm
– Conducted experiments with the Spec2000 and MiBench suites

Page 40: Optimal Chain Rule Placement for Instruction Selection based on

Thank you for your attention!

Any questions or comments?