Dynamic Removal of Redundant Computations

Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es
ICS'99, Rhodes (Greece) - June 20-25, 1999


Page 1: Dynamic Removal of Redundant Computations

Carlos Molina, Antonio González and Jordi Tubella

Universitat Politècnica de Catalunya - Barcelona

{cmolina,antonio,jordit}@ac.upc.es

ICS'99, Rhodes (Greece) - June 20-25, 1999

Page 2: Motivation

for (i=0; i<N; i++)
    A[i] = B[i]+C[i];
. . . . .
R = S / T ;
. . . . .
X = S / U ;
. . . . .

Labels on the slide: quasi-invariant, quasi-common subexpression
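To make the quasi-common subexpression case concrete, here is a minimal C sketch that is not part of the slides. It assumes the label refers to the two divisions, which share the dividend S and produce the same value whenever T and U happen to be equal at run time. The function name `count_quasi_common` and the arrays `t` and `u`, holding the divisor values seen on each iteration, are hypothetical.

```c
#include <stddef.h>

/* Counts the iterations in which x = s / u repeats the work of r = s / t,
 * that is, the iterations where both divisors hold the same value.
 * t[i] and u[i] are the (hypothetical) divisor values in iteration i. */
size_t count_quasi_common(const int *t, const int *u, size_t n)
{
    size_t redundant = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i] == u[i]) {
            redundant++;   /* s / u would yield the value already computed */
        }
    }
    return redundant;
}
```

A compiler cannot remove the second division statically because T and U are different variables; the redundancy only shows up in the values observed at run time.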

Page 3: Outline

Instruction Reuse

Related Work

Redundant Computation Buffer

Performance Results

Conclusions

Page 4: Instruction Reuse

[Figure: pipeline with the stages Fetch, Decode & Rename, OOO Execution and Commit, plus a Reuse Mechanism block; one arrow is labelled "index"]

Page 5: Related Work

Instruction Reuse
- Value Cache for the Tree Machine (Harbison 82)
- Result Cache (Richardson 92, Oberman et al. 95)
- Reuse Buffer (Sodani and Sohi 97)
- Physical Register Reuse (Jourdan et al. 98)

Trace Reuse
- Basic blocks (Huang and Lilja 99)
- General traces (González et al. 99)

Page 6: Related Work

Result Cache (Richardson 92, Oberman & Flynn 95)
- Special purpose (long latency operations)
- Indexed by operand values
- No reuse chaining
- Can reuse dynamic instances of other static instructions

Reuse Buffer (Sodani & Sohi 97)
- General purpose
- Indexed by PC
- Reuse chaining
- Only reuses dynamic instances of the same static instruction
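The indexing difference between the two schemes can be sketched in C. This is a simplified illustration, not either published design: the direct-mapped `Table`, the `Entry` layout, the size `TABLE_SIZE`, the opcode value `OP_DIV` and the hash in `value_index` are all assumptions made for the example, and reuse chaining is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 64            /* assumed size, for illustration only */

enum { OP_DIV = 1 };             /* hypothetical opcode encoding */

typedef struct {
    bool    valid;
    int     opcode;
    int64_t opnd1, opnd2;
    int64_t result;
} Entry;

typedef struct {
    Entry e[TABLE_SIZE];
} Table;

/* Reuse test used by both schemes: same opcode and same operand values. */
static bool reuse_test(const Entry *e, int opcode, int64_t a, int64_t b)
{
    return e->valid && e->opcode == opcode && e->opnd1 == a && e->opnd2 == b;
}

/* Result-cache style index: a made-up hash of opcode and operand values. */
static unsigned value_index(int opcode, int64_t a, int64_t b)
{
    uint64_t h = (uint64_t)a * 31u + (uint64_t)b * 17u + (uint64_t)opcode;
    return (unsigned)(h % TABLE_SIZE);
}

/* Reuse-buffer style index: the low bits of the instruction address. */
static unsigned pc_index(uint64_t pc)
{
    return (unsigned)(pc % TABLE_SIZE);
}

static void record(Entry *e, int opcode, int64_t a, int64_t b, int64_t result)
{
    e->valid = true;
    e->opcode = opcode;
    e->opnd1 = a;
    e->opnd2 = b;
    e->result = result;
}

/* Record an executed instruction in a value-indexed table. */
void value_insert(Table *t, int opcode, int64_t a, int64_t b, int64_t result)
{
    record(&t->e[value_index(opcode, a, b)], opcode, a, b, result);
}

/* Any static instruction with matching opcode and operands can hit here. */
bool value_lookup(const Table *t, int opcode, int64_t a, int64_t b, int64_t *out)
{
    const Entry *e = &t->e[value_index(opcode, a, b)];
    if (!reuse_test(e, opcode, a, b)) return false;
    *out = e->result;
    return true;
}

/* Record an executed instruction in a PC-indexed table. */
void pc_insert(Table *t, uint64_t pc, int opcode, int64_t a, int64_t b, int64_t result)
{
    record(&t->e[pc_index(pc)], opcode, a, b, result);
}

/* Only an entry stored under the same PC index can hit here
 * (no PC tag is kept in this sketch, so aliasing is ignored). */
bool pc_lookup(const Table *t, uint64_t pc, int opcode, int64_t a, int64_t b, int64_t *out)
{
    const Entry *e = &t->e[pc_index(pc)];
    if (!reuse_test(e, opcode, a, b)) return false;
    *out = e->result;
    return true;
}
```

With a division 8 / 2 recorded at PC 10, a second division 8 / 2 at PC 20 hits in the value-indexed table but misses in the PC-indexed one, which is the last bullet of each list in miniature.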

Page 7: Redundant Computation Buffer

[Figure: RCB organization]
Vtable entry: Atable pointer
Atable entry: opcode | result/address | opnd1 | opnd2 | pointer
Mtable entry: address tag | result
Other labels: Reuse Test, Reused Value, Reused Memory Value

Page 8: RCB (Working Example)

while (cond) { r = s / t ; ...... x = s / u ; }

I1: 8 / 2 = 4

Vtable 10:
Atable: div 8 2 4 nil
Value shown: 4

Page 9: RCB (Working Example)

while (cond) { r = s / t ; ...... x = s / u ; }

I2: 8 / 2 = 4

Vtable 10:
Atable: div 8 2 4 nil
Also shown: 20: div 8 2 4 nil
Value shown: 4

Page 10: RCB (Working Example)

while (cond) { r = s / t ; ...... x = s / u ; }

I2: 8 / 2 = 4

Vtable 10:
Atable: div 8 2 4 nil
Also shown: 20: div 8 2 4
Value shown: 4

Page 11: RCB (Working Example)

while (cond) { r = s / t ; ...... x = s / u ; }

I1: 9 / 3 = 3
I2: 9 / 3 = 3

Vtable 10:
Atable: div 8 2 4 nil, div 9 3 3 nil
Also shown: 20: div 8 2 4 nil
Values shown: 4, 3
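One possible reading of the walkthrough above, written as a C sketch. It is not the authors' implementation. The two-step order (follow the per-PC pointer first, then search by operand values) is inferred from the "1st Atable lookup, reuse test, 2nd Atable lookup" sequence on the timing slide; the table sizes, the hash in `value_index`, the opcode value `OP_DIV` and the replacement policy are assumptions. The Mtable and the Atable `pointer` field are left out.

```c
#include <stdbool.h>

#define VTABLE_SIZE 32      /* assumed sizes, for illustration only */
#define ATABLE_SIZE 32
#define NIL (-1)

enum { OP_DIV = 1 };        /* hypothetical opcode encoding */

typedef struct {
    bool valid;
    int  opcode;
    long opnd1, opnd2, result;
} AEntry;

typedef struct {
    int    vtable[VTABLE_SIZE];   /* per-PC index into atable, or NIL */
    AEntry atable[ATABLE_SIZE];   /* operands and result of past executions */
} Rcb;

void rcb_init(Rcb *r)
{
    for (int i = 0; i < VTABLE_SIZE; i++) r->vtable[i] = NIL;
    for (int i = 0; i < ATABLE_SIZE; i++) r->atable[i].valid = false;
}

/* Reuse test: same opcode and same operand values. */
static bool reuse_test(const AEntry *e, int opcode, long a, long b)
{
    return e->valid && e->opcode == opcode && e->opnd1 == a && e->opnd2 == b;
}

/* Made-up hash of opcode and operand values. */
static int value_index(int opcode, long a, long b)
{
    unsigned long h = (unsigned long)a * 31u + (unsigned long)b * 17u
                    + (unsigned long)opcode;
    return (int)(h % ATABLE_SIZE);
}

/* Returns true when the result was reused. Otherwise `executed` (the value
 * the functional unit would produce) is recorded for later instances. */
bool rcb_access(Rcb *r, unsigned pc, int opcode, long a, long b,
                long executed, long *out)
{
    int v = (int)(pc % VTABLE_SIZE);

    /* First lookup: follow the pointer left by this static instruction. */
    int p = r->vtable[v];
    if (p != NIL && reuse_test(&r->atable[p], opcode, a, b)) {
        *out = r->atable[p].result;
        return true;
    }

    /* Second lookup: search by opcode and operand values. */
    int i = value_index(opcode, a, b);
    r->vtable[v] = i;
    if (reuse_test(&r->atable[i], opcode, a, b)) {
        *out = r->atable[i].result;
        return true;
    }

    /* Miss: the instruction executes and its result is stored. */
    r->atable[i] = (AEntry){ true, opcode, a, b, executed };
    *out = executed;
    return false;
}
```

Replaying the four steps of the example (I1 at PC 10 and I2 at PC 20, first with 8 / 2 and then with 9 / 3) gives a miss for each I1 and a reuse for each I2 in this sketch, since I2 finds the entry that I1 has just written.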

Page 12: Enhancements to Other Schemes

Enhanced Result Cache (indexed by operands)
  Atable entry: opcode | result/address | opnd1 | opnd2
  Mtable entry: address tag | result

Enhanced Reuse Buffer (indexed by PC)
  Atable entry: opcode | result/address | opnd1 | opnd2
  Mtable entry: address tag | result

Page 13: Timing Considerations

Pipeline Stages: fetch | decode & rename | opnd read & dispatch | issue | execute | write back | commit

Latency of the Reuse Buffer: Atable lookup, reuse test
Latency of the RCB: 1st Atable lookup, reuse test, 2nd Atable lookup
Latency of the Result Cache: Atable lookup, reuse test

Page 14: Experimental Framework

Simulator
- Alpha version of the SimpleScalar Toolset

Benchmarks
- Spec95

Maximum Optimization Level
- DEC C & F77 compilers with -non_shared -O5

Statistics Collected for 125 million instructions
- Skipping initializations

Page 15: Basic Reuse Statistics

We evaluate different schemes
- Enhanced Result Cache (ERC)
- Enhanced Reuse Buffer (ERB)
- Redundant Computation Buffer (RCB)

We find the best configuration for each scheme
- Number of entries
- History depth

Best configurations will be evaluated
- Percentage of reuse
- Speedup

Page 16: Quasi-Common Subexpressions

[Chart: percentage of reuse (axis 0 to 45) for ERB and RCB at 32 KB]

Page 17: Study of Reuse (ERB)

[Chart: percentage of reuse (axis 10 to 55) versus size in Kbytes (8 to 4096), one curve each for 128, 256, 512, 1K, 2K, 4K, 8K and 16K entries]

Page 18: Study of Reuse (RCB)

[Chart: percentage of reuse (axis 15 to 60) versus size in Kbytes (8 to 4096), one curve each for 128, 256, 512, 1K, 2K, 4K, 8K and 16K entries]

Page 19: Study of Reuse (Comparative)

[Chart: percentage of reuse (axis 10 to 70) versus size in Kbytes (8 to 4096) for ERB, RCB and ERC]

Page 20: Performance Evaluation

Two different capacities are evaluated
- 32 KB
- 200 KB

Best configuration has been chosen for each reuse scheme

We present a performance evaluation for a superscalar processor
- Speedup
- Percentage of reuse

Page 21: Base Microarchitecture

Instruction fetch: 4 instructions per cycle
Branch predictor: 2048-entry bimodal predictor
Data cache: 16 KB, 2-way set associative, 32-byte block, 6-cycle miss latency
Instruction cache: 16 KB, direct mapped, 32-byte cache line, 6-cycle miss latency
Instruction issue/commit: out-of-order issue, 4 instructions committed per cycle, 32-entry reorder buffer, loads execute if preceding stores are known, store-load forwarding
Architected registers: 32 integer and 32 FP
Functional units: 4 integer ALUs, 2 load/store units, 4 FP adders, 1 integer mult/div, 1 FP mult/div
FU latency/repeat time: integer ALU 1/1, load/store 1/1, integer mult 3/, integer div 20/19, FP adder 2/1, FP mult 4/1, FP div 12/12

Page 22: Speedup (32 KB)

[Chart: speedup (axis 1.00 to 1.20) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]
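The charts label their summary bar H_Mean for speedup and A_Mean for reuse, which presumably stand for harmonic mean and arithmetic mean; the transcript does not expand the abbreviations. Assuming that reading, the two averages can be computed as in this short C sketch. The function names are illustrative, and no benchmark values from the slides are used.

```c
#include <stddef.h>

/* Harmonic mean of n positive values (for example per-benchmark speedups).
 * Returns 0.0 for an empty input. */
double harmonic_mean(const double *x, size_t n)
{
    if (n == 0) return 0.0;
    double sum_inv = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum_inv += 1.0 / x[i];
    }
    return (double)n / sum_inv;
}

/* Arithmetic mean of n values (for example per-benchmark reuse percentages).
 * Returns 0.0 for an empty input. */
double arithmetic_mean(const double *x, size_t n)
{
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}
```

For positive values that are not all equal, the harmonic mean is smaller than the arithmetic mean, so a single benchmark with a large speedup pulls the summary bar up less than it would under a plain average.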

Page 23: Speedup (200 KB)

[Chart: speedup (axis 1.00 to 1.25) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]

Page 24: Reuse (32 KB)

[Charts, two panels, one labelled "Ops ready": reuse (axis 0 to 70) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean]

Page 25: Reuse (200 KB)

[Charts, two panels, one labelled "Ops ready": reuse (axis 0 to 80) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean]

Page 26: Reuse by Instruction Category

[Charts, four panels (Load Value, Memory Address, Arithmetic, Cond Branch): reuse (axis 0 to 100) of ERB, RCB and ERC for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean]

Page 27: Hybrid Scheme

[Figure: hybrid scheme with Atable entries (opcode | result/address | opnd1 | opnd2 | pointer) reached by PC and Atable entries (opcode | result/address | opnd1 | opnd2) reached by operands (Opnds)]

Page 28: Speedup (Hybrid Scheme)

[Chart: speedup (axis 1.00 to 1.20) of RCB and Hybrid for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]

Page 29: Reuse (Hybrid Scheme)

[Chart: reuse (axis 0 to 70) of RCB and Hybrid for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and A_Mean]

Page 30: Speedup (Perfect Reuse Engine)

[Chart: speedup (axis 1.00 to 2.20) for Applu, Compress, Gcc, Go, Li, M88ksim, Mgrid, Perl, Swim, Turb3d, Vortex and H_Mean]

Page 31: Conclusions

Redundant Computation Buffer
- Quasi-invariants
- Quasi-common subexpressions

High reuse coverage and low latency
- 30% reuse
- 10% speedup
- Outperforms previous schemes