an ssa-based algorithm for optimal speculative code motion...

36
An SSA-based Algorithm for Optimal Speculative Code Motion under an Execution Profile Hucheng Zhou Tsinghua University June 2011 Joint work with: Wenguang Chen (Tsinghua University), Fred Chow (ICube Technology Corp.)

Upload: others

Post on 11-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

An SSA-based Algorithm for Optimal Speculative

Code Motion under an Execution Profile

Hucheng Zhou

Tsinghua University

June 2011

Joint work with: Wenguang Chen (Tsinghua University),

Fred Chow (ICube Technology Corp.)

Page 2: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 2

Contents

Basic Concepts

PRE

SSA

SSAPRE

Speculative Code Motion

MC-SSAPRE

Algorithm

Complexity

Experiments

Conclusion

Page 3: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 3

Partial Redundancy Elimination (PRE)

• Eliminates expressions redundant on some (not

necessarily all) paths

• One of the most important and widely applied

target-independent global optimization

• Subsumes global common subexpression and

loop invariant code motion

B3

B4 a+bB5a+b

B1 B2a+b

B3

B4 tB5t

B1 t=a+bB2t=a+b

PRE

Page 4: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 4

PRE Facts

• Applied to each lexically identified expression

independently

– e.g (a+b), (a-b), (a*c)

• Formulated as a Placement problem:

Step 1 – Determine where to perform insertions

– Render more computations fully redundant

Step 2 – Delete fully redundant computations

• Main challenge is in Step 1

Page 5: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 5

The Most Popular PRE Algorithms

Lazy Code Motion (Knoop et. al )

– Computationally and Life-time Optimal

– Ordinary program representation

– Bit-vector-based iterative data flow analyses

SSAPRE

– Computationally and Life-time Optimal

– SSA form of program representation

– Sparse solution of data flow properties

– Subsumes local common subexpression

• Insensitive to basic block boundaries

Page 6: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 6

Static Single Assignment (SSA)

• Program representation with built-in use-def

information

• Use-def edges factored at join points in CFG

• Use-def implicitly represented via unique names

• Each renamed variable has only one definition

B3

B4 =aB5=a

B1 a=B2a=

CFG

use-defB3

B4 =aB5=a

B1 a=B2a=

USE-DEF

a3 = (a1,a2)B3

B4 =a3B5=a3

B1 a2=B2a1=

factored use-def

Page 7: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 7

Factored Redundancy Graph (FRG)• Used in SSAPRE to represent redundant relationships among

occurrences of the same expression via edges

• The redundancy edges are factored as in SSA

• Can view as SSA applied to expressions

– Effectively put the t storing the expression after PRE in SSA form

B3

B4 a+bB5a+b

B1 a+bB2a+b

CFG

redundancyB3

B4 a+bB5a+b

B1 a+bB2a+b

Redundancy

t3= (t1,t2)B3

B4 t3B5t3

B1 t2=a+bB2t1=a+b

factored redundancy

Page 8: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 8

Speculative Code MotionClassical PRE only inserts at places where the

expression is anticipated (down-safe)– Many redundant computations cannot be eliminated

Speculative code motion ignores safety constraint

– Can remove more redundancies

– Not applicable to computations that may trigger runtime exceptions

Classical PRE

B3

B4 B5a+b

B1 B2a+b

CFG

B3

B4 B5t

B1 t=a+bB2t=a+b

Unsafe Path

Speculation

Page 9: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 9

While Loop Example

Classical PRE

Speculation

Invariant code motion involves speculation

Page 10: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 10

While Loop Restructuring

while loop

restructurePRE

• The common solution

• Speculation no longer necessary

• But code size increases

Page 11: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 11

Speculation not always beneficial

• Useless computations introduced for some paths

• Beneficial only if removed computations executed

more frequently than inserted computations

• Requires execution frequency information

B3

150

B4

50

B5

100a+b

B1

50

B2

100a+b

B3

150

B4

50

B5

100t

B1

50 t=a+bB2

100t=a+b

Non-beneficial

because

freq(B2) > freq(B4)

Page 12: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 12

Problem Statement

How to minimize the dynamic execution count of an

expression under an execution profile

• A more aggressive form of PRE

– Classical PRE beneficial regardless of execution

frequencies

• Cai and Xue (2003, 2006) first to apply min-cut to solve

this problem optimally

– Algorithm called MC-PRE

– Uses bit-vector-based data flow analyses

– Min-cut applied to CFG

• No SSA-based technique exists yet

Page 13: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 13

Topic of this Paper

MC-SSAPRE – a new algorithm that yields

optimal code placement under the SSAPRE

framework

Overview:

• Form a essential flow graph (EFG) out of the

FRG

• Map the BB execution frequencies to the EFG

nodes

• Apply min-cut to the EFG

Page 14: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 14

Algorithm Steps

SSAPRE Steps

• Construct FRG

F insertion

– Rename

• Data Flow Attributes

– DownSafety

– WillBeAvail

• Book-keeping

– Finalize

– CodeMotion

MC-SSAPRE Steps

• Construct FRG

o F insertion

o Rename

• Form EFG and perform min-cut

o Data flow

o Graph reduction

o Single source

o Single sink

o Minimum cut

o WillBeAvail

• Book-keeping

o Finalize

o CodeMotion

Page 15: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 15

Running example in SSA Form

a1+b1B1

50

B2

20

B3

70

a1+b1 exita1+b1

a1+b1 exit

exit exit

B4

50

B5

10

B6

10

B7

50

B8

60

B9

5

B10

5

B12

60

B12

5

a1+b1

Input Program

Page 16: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 16

FRG for Running Example

Introduce h so the FRG can be viewed from an SSA perspective

FFF

h1B1

50

h2= F(h1,^)B3

70

h4= F(h3,h2)

h4

h3h2

h2

B4

50 B6

10

B8

60B9

5

F Insertion

and

Rename

a1+b1B1

50

B2

20

B3

70

a1+b1 exita1+b1

a1+b1 exit

exit exit

B4

50

B5

10

B6

10

B7

50

B8

60

B9

5

B10

5

B12

60

B12

5

a1+b1

Input Program FRG

Page 17: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 17

Roles of Factored Redundancy Graph• Insertions need to be considered only at F’s

– associated with the F operands

• Medium to compute data flow properties to disqualify

more F’s from being insertion candidates

• SSA form for t (temporary to store the computed

value) will be carved out of the FRG

• Three kinds of nodes:

1.Real occurrences in original program

• Def – always non-redundant

• Use – partially redundant (including fully redundant)

2. F (def)

3. F operand (use) – can be ^

Page 18: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 18

Data Flow Properties for MC-SSAPRE

Fully available

• Insertions at these F’s always unnecessary

because the computed values are available

Partially anticipated

• Insertions should only be at these F’s

• otherwise, the inserted computation would

have no use

Page 19: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 19

Graph Reduction

Use computed data flow properties to further

narrow down the F candidates for insertion

Delete:

F’s that are fully available

F’s that are not partial anticipated

Use nodes (real occurrences or F

operands) that are fully redundant

Edges from/to above nodes

Page 20: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

Graph Reduction for Running Example

June 2011 MC-SSAPRE PLDI 20

graph reduction

h1B1

50

h2= F(h1,^)B3

70

h3h2

h2

B4

50 B6

10

B9

5

FFF

h2= F(h1, ^)B3

70

h4= F(h3,h2)

h4

h2B6

10

B8

60

rg_excluded

rg_excluded – fully redundant occurrences determined during Renaming

FFFh4= F(h3,h2)

h4

B8

60

Page 21: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 21

Form Essential Flow Graph (EFG)

• Introduce a virtual source node

– Add an edge from it to each ^ F operand

• Introduce a virtual sink node

– Add an edge from each real occurrence to it

• Result is a complete flow network

source

new edges

h2= F(h1,^)B3

70

h2B6

10

sink

FFFh4= F(h3,h2)

h4

B8

60

Page 22: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 22

Edges in EFG

Edges to the sink are never insertion candidate

– Mark with ∞ frequency

Other edges are:

Type 1 edge – Edges ending at a F operand

Type 2 edge – Edges from a F to a real occurrence

source

Type 1

h2= F (h1,^)B3

70

h2B6

10

sink

FFFh4= F(h3,h2)

h4

B8

60

Type 2

Page 23: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 23

Mapping Frequencies to EFG Edges

• Model insertion at a Type 1 edge by inserting at

exit of the predecessor BB corresponding to the

F operand

– Annotate the Type 1 edge by the node

frequency of that predecessor BB

• Insertion at a Type 2 edge means performing

the computation in place

– Annotate the Type 2 edge by the frequency of

the real occurrence

Page 24: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

h4= F(h3,h2)

h4

60

1010

20

EFG annotated with Frequencies

June 2011 MC-SSAPRE PLDI 24

a1+b1B1

50

B2

20

B3

70

a1+b1 exita1+b1

a1+b1 exit

exit exit

B4

50

B5

10

B6

10

B7

50

B8

60

B9

5

B10

5

B12

60

B12

5

a1+b1

source

h2= F(h1,^)B3

70

h2B6

10

sink

B8

60

Type 2

Original Program

Type 1

Final EFG

Page 25: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 25

Performing Minimum Cut

A minimum cut

• separates the flow network into two halves, such that

• the sum of the weights of the cut edges is minimized

By performing insertions at the cut edges, the number of execution of the computation is minimized

– Implies computational optimality

If min-cut not unique, choose the cut nearest the sink

– Induces life-time optimality

Page 26: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 26

Our Example

60

1010

20

source

h2= F(h1,^)B3

70

h2B6

10

sink

B8

60 h4= F(h3,h2)

h4

• Two possible min-cuts

• Pick later red one

min-cut

min-cut

Page 27: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 27

Final Result

B3

70

t2

t2 =a1+b1

t2t2=a1+b1 exitt1=a1+b1

t1

t1 exit

exit exit

B4

50

B5

10

B6

10

B7

50

B8

60

B9

5

B10

5

B11

10

B13

5

a1+b1B1

50

B2

20

60

1010

20

source

h2= F(h1,^)B3

70

h2B6

10

sink

B8

60 h4= F(h3,h2)

h4

min-cut

final transformed program

Page 28: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

V – number of FRG nodes

E – number of FRG edges

• Except the minimum cut

step, all the steps are

O(V+E)

• Performing minimum cut

is

• In general,

Vcfg > Vfrg > Vefg

Complexity of MC-SSAPRE

)2( EVO

June 2011 MC-SSAPRE PLDI 28

MC-SSAPRE Steps

• Construct FRG

o F insertion

o Rename

• Form EFG and perform min-cut

o Data flow

o Graph reduction

o Single source

o Single sink

o Minimum cut

o WillBeAvail

• Book-keeping

o Finalize

o CodeMotion

Page 29: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 29

Our Implementation

• Implemented MC-SSAPRE in the open

source Path64 compiler, a descendent of the

compiler with the original SSAPRE

• Leveraged existing SSAPRE infrastructure

• Resulting compiler will perform:

– SSAPRE when no profile available

• Perform speculation for loop-invariant

computations

– MC-SSAPRE with profile data

• Compiler always restructures while loops

Page 30: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 30

Setup of Experiment 1

• Target is Intel CoreTM i7-970 at 2.67GHz with 8MB

cache

• Ubuntu 9.10

• With 6GB on board memory

• Compare run-time performances of all of SPEC

CPU2006 (29 benchmarks)

• The 3 runs:

SSAPRE – no speculation, no profile data

SSAPREsp – loop-based speculation, no profile data

MC-SSAPRE – speculation based on profile data

Page 31: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 31

Experimental Results – CINT2006

• Average speedup of 2.13% over SSAPRE

• Average speedup of 2.25% over SSAPREsp

0.94

0.96

0.98

1

1.02

1.04

1.06

1.08

400.pe

rlben

ch

401.bz

ip2

403.gc

c

429.mcf

445.go

bmk

456.hm

mer

458.sjen

g

462.lib

quan

tum

464.h2

64re

f

471.om

netpp

473.as

tar

483.xa

lanc

bmk

Ave

rage

SSAPRE SSAPREsp MC-SSAPRE

Page 32: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 32

Experimental Results – CFP2006

• Average speedup of 2.76% over SSAPRE

• Average speedup of 1.96% over SSAPREsp

0.94

0.96

0.98

1

1.02

1.04

1.06

1.08

1.1

410.bwav

es

416.gam

ess

433.milc

434.ze

usmp

435.gromac

s

436.ca

ctus

ADM

437.leslie3d

444.nam

d

447.dea

lII

450.so

plex

453.pov

ray

454.ca

lculix

459.Gem

sFDTD

465.tonto

470.lbm

481.wrf

482.sp

hinx

3

Ave

rage

SSAPRE SSAPREsp MC-SSAPRE

Page 33: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 33

Setup of Experiment 2

• Calculate size of EFGs formed during MC-SSAPRE

• Same 29 SPEC CPU2006 benchmarks

• Target-independent

• Show

– Optimization overhead in MC-SSAPRE

– Impact of sparse approach

• Exclude empty EFGs

• Smallest EFG is 4 nodes:

– Source, sink, F, real occurrence

Page 34: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 34

Sizes of EFGs

• 183152 EFGs in the 29 SPEC CPU2006 benchmarks

• Near 50% of EFGs are only 4 nodes

• 86.5% of EFGs are less than 10 nodes

• 99.0% of EFGs are less than 50 nodes

• 24 EFGs larger than 300 nodes (largest size is 805)

0

20000

40000

60000

80000

100000

4 5 6 7 8 9 10 11–15 16–20 21–30 31–40 41–50 51–60 61–70 >=71

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

Number of EFGs Cumulative %

Number of Nodes in the EFG

Page 35: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

June 2011 MC-SSAPRE PLDI 35

Conclusion

• The minimum-cut technique for flow networks can

effectively be applied to SSA graphs

• SSA-based compilers can apply MC-SSAPRE to

achieve optimal speculative code motion under an

execution profile

• The sparse approach is effective in reducing the

problem sizes

• The polynomial time complexity of Min-cut only has

limited effect on MC-SSAPRE’s optimization efficiency

• MC-SSAPRE always improves program performance

over SSAPRE

Page 36: An SSA-based Algorithm for Optimal Speculative Code Motion ...pacman.cs.tsinghua.edu.cn/~cwg/papers_cwg/pldi11-slides.pdfAn SSA-based Algorithm for Optimal Speculative Code Motion

Questions?