December 5, 2001 MICRO-34, Austin, Texas

Cool-Cache for Hot Multimedia

Osman S. Unsal, Raksit Ashok, Israel Koren, C. Mani Krishna, Csaba Andras Moritz

Department of Electrical and Computer Engineering

University of Massachusetts, Amherst

Power Density

[Figure: power density in W/cm2 (logarithmic axis, 1 to 1000) for the i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III and Pentium IV, with a nuclear reactor as a reference point.]

Source: Fred Pollack, Intel, Micro32

Cool-* Project

A compiler-enabled power-aware architecture.

CPU Power Dissipation by Block

Alpha 21464: Issue 46%, Mem 26%, Exec 22%, Fetch 6%

PowerPC: Cache 23%, Clock 19%, TLB 17%, Control L. 16%, Data Flow 11%, I/O 7%, PLA 5%, ROM 2%

StrongARM: Icache 26%, Ibox 18%, Dcache 16%, Clock 10%, IMMU 9%, EBOX 8%, DMMU 8%, Others 5%

Concentrate on L1 data cache

Sources: IEEE Journal of SSC, Nov. 96; Proceedings of ISSCC 94; Cool Chips, Micro-32, 99

Cool-Cache Philosophy

Speculatively employ static information to simplify memory accesses

Leverage multimedia sensitive compile-time partitioning of memory accesses

Conventional
•Non-adaptive
•Tags
•Single access mechanism

Cool-Cache
•Statically Speculative
•No Tags
•Multiple access mechanism

[Diagram labels: Data; Static and dynamic; Tag; Dynamic; SRAM Buffer]

Cool-Cache Framework

Minibuffer Scratchpad
– Scalars in media applications have low memory footprint, high access frequency
– Partition scalars from non-scalars

Hotlines
– Non-scalar locations in cache can be speculatively predicted
– Simplify memory accesses
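The partitioning above can be pictured with a minimal sketch. It assumes each memory access carries a compile-time annotation saying whether it is a scalar; scalars are routed to the minibuffer and everything else to the hotline path. The type and function names (`access_kind`, `mem_access`, `route_accesses`, `scalar_share_percent`) are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time annotation attached to each memory access. */
typedef enum { ACCESS_SCALAR, ACCESS_NONSCALAR } access_kind;

typedef struct {
    access_kind kind;  /* decided statically by the compiler */
    uint32_t    addr;  /* effective address, known only at run time */
} mem_access;

typedef struct {
    size_t minibuffer;    /* accesses served by the small scratchpad */
    size_t hotline_path;  /* accesses sent down the speculative path */
} route_counts;

/* Route every access by its static annotation and count both paths. */
route_counts route_accesses(const mem_access *trace, size_t n)
{
    route_counts c = {0, 0};
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (trace[i].kind == ACCESS_SCALAR)
            c.minibuffer++;
        else
            c.hotline_path++;
    }
    return c;
}

/* Share of all accesses that go to the minibuffer, in percent. */
double scalar_share_percent(route_counts c)
{
    size_t total = c.minibuffer + c.hotline_path;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)c.minibuffer / (double)total;
}
```

The routing decision uses only the annotation, never the address, which is the sense in which the partitioning is static.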

Cool-Cache Architecture


Hotline Approach

for (i = 0; i < 100; i++) {
    a[i] = a[i+1];  /* a[i] and a[i+1] can both be mapped to the same hotline */
    *p++ = b[i];    /* *p and b[i] go to separate hotlines without alias analysis */
}

•Based on:

Type analysis

Control-flow and loop-structure analysis

Alias analysis

• A fully predictable compile-time approach would require loop transformations to align accesses to cache line boundaries, and its scope would be limited to simple loops.
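To make the speculation concrete, here is a small simulation sketch of the loop above. The slides do not give the hardware details, so everything here is an assumption chosen for illustration: a hotline register simply remembers the last cache line it pointed to, lines are 32 bytes, elements are 4 bytes, and there are 4 registers. The names `hotline_reg`, `hotline_file`, `hotline_access` and `replay_loop` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES   32u  /* assumed cache line size in bytes */
#define ELEM_BYTES    4u  /* assumed array element size in bytes */
#define NUM_HOTLINES  4u  /* assumed number of hotline registers */

/* One hypothetical hotline register: remembers the last line it pointed to. */
typedef struct {
    bool     valid;
    uint32_t line;  /* address / LINE_BYTES */
} hotline_reg;

typedef struct {
    hotline_reg regs[NUM_HOTLINES];
    unsigned hits;    /* prediction held: no tag lookup needed */
    unsigned misses;  /* prediction failed: fall back and retrain */
} hotline_file;

void hotline_init(hotline_file *hf)
{
    memset(hf, 0, sizeof *hf);
}

/* Access addr through the compiler-chosen hotline id; returns true on a hit. */
bool hotline_access(hotline_file *hf, unsigned id, uint32_t addr)
{
    assert(id < NUM_HOTLINES);
    hotline_reg *r = &hf->regs[id];
    uint32_t line = addr / LINE_BYTES;

    if (r->valid && r->line == line) {
        hf->hits++;
        return true;
    }
    /* A wrong guess is safe: a real design would do a normal lookup here. */
    r->valid = true;
    r->line = line;
    hf->misses++;
    return false;
}

/* Replay the memory accesses of the loop for n iterations.
   a, b and p are the starting byte addresses of the three streams. */
void replay_loop(hotline_file *hf, uint32_t a, uint32_t b, uint32_t p,
                 unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        hotline_access(hf, 0, a + (i + 1) * ELEM_BYTES);  /* load  a[i+1] */
        hotline_access(hf, 0, a + i * ELEM_BYTES);        /* store a[i]   */
        hotline_access(hf, 1, b + i * ELEM_BYTES);        /* load  b[i]   */
        hotline_access(hf, 2, p + i * ELEM_BYTES);        /* store *p++   */
    }
}
```

In this model most accesses hit because consecutive elements share a line. The mispredictions cluster at line boundaries, where a[i] and a[i+1] briefly fall in different lines while sharing one register. A misprediction only costs a fallback access and never affects correctness.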

Hotlines Advantages

• Speculative prediction does not require static correctness

• Granularity of speculation is compiler controllable

• Hotlines do not increase code size

Cool-Cache Compiler

High-Level Analysis

Alias Analysis

Hotlines Analysis

Cool-Cache Specific Code Generation

Footprint Analysis

Annotations

High-Level Optimizations

Benchmarks

Benchmark   Description
ADPCM       Adaptive differential pulse code modulation audio coding
EPIC        Image compression coder based on wavelet decomposition
G721        Voice compression coder based on G.711, G.721, G.723 standards
GSM         Rate speech transcoding coder based on the GSM standard
JPEG        A lossy image compression coder
MESA        OpenGL clone: using Mipmap quadrilateral texture mapping
MPEG        Lossy motion video compression decoder
PEGWIT      Public key encryption coder generates a public key
RASTA       Speech recognition front-end processing

Experimental Setup

Parameter            Value
General Parameters   1GHz, 0.35μm, 2.5V
Issue                In-order, single
L1 D-Cache           64K, 2-way
Minibuffer           1K
L1 I-Cache           32K, 2-way
L2 Cache             None
Main memory          100 cycles

Minibuffer Footprint

Application   Size
Adpcm         0
Epic          203
G721 Enc.     32
Gsm Enc.      146
Jpeg Enc.     83
Mpeg Enc.     604
Pegwit        16
Rasta         152
PGP           358
Mesa          770

Application   32 reg.   16 reg.
Epic          32.0      62.4
G721          4.5       38.8
Gsm           2.3       37.2
Jpeg          1.1       46.5
Rasta         16.0      36.0

•Scalar memory requirements are low!

•The percentage of scalars in total memory accesses is high!

Impact of Minibuffer

[Figure: energy consumption (mJ), axis 0 to 900, for Epic, Pegwit, Rasta, Mipmap, Mpeg, G721, Gsm and Jpeg. Series: 16 registers with minibuffer; 16 registers without minibuffer; 32 registers without minibuffer.]

Minibuffer Energy Savings

[Figure: minibuffer energy savings (percent), axis 0 to 60, for Epic, Pegwit, Rasta, Mipmap, Mpeg, G721, Gsm and Jpeg. Series: 32-Register; 16-Register.]

Hotlines Hit Rate

[Figure: hotlines hit rate (percent), axis 0 to 100, for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, Pegwit and Adpcm. Legend: SW handler; Cache TLB; Static.]

Cool-Cache Relative Runtime

[Figure: Cool-Cache relative runtime, axis 0 to 1.4, for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa and Pegwit. Series: 1024; 256; 64.]

Cool-Cache Energy Savings(32 Registers)

[Figure: Cool-Cache energy savings with 32 registers (percent), axis 0 to 60, for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, Pegwit and Adpcm. Series: 4-Way; Direct.]

Cool-Cache Energy Savings(16 Registers)

[Figure: Cool-Cache energy savings with 16 registers (percent), axis 0 to 80, for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, Pegwit and Adpcm. Series: 4-Way.]

Conclusion

Cool-Cache: a compiler-enabled, power-aware data cache

Static speculative approach is powerful
