jenga: software-defined cache hierarchiespeople.csail.mit.edu/poantsai/talks/2017.jenga.isca...jenga...

Jenga: Software-Defined Cache Hierarchies Po-An Tsai, Nathan Beckmann, and Daniel Sanchez

Post on 23-Sep-2020

Page 1: Jenga: Software-Defined Cache Hierarchiespeople.csail.mit.edu/poantsai/talks/2017.jenga.isca...Jenga manages distributed and heterogeneous banks as a single resource pool and builds

Jenga: Software-Defined

Cache Hierarchies

Po-An Tsai, Nathan Beckmann, and Daniel Sanchez

Executive summary

Heterogeneous caches are traditionally organized as a rigid hierarchy

Easy to program but introduce expensive overheads when hierarchy is not helpful

Jenga builds application-specific cache hierarchies on the fly

Key contribution: New algorithms to find near-optimal hierarchies

Arbitrary application behaviors & changing resource constraints

Full system optimization at 36 cores in <1 ms

Jenga improves EDP by up to 85% vs. state-of-the-art

Deep, rigid hierarchies are running out of steam

Past: L1 (~1ns), L2 (~10ns), main memory (~100ns)

Systems had few cache levels with widely different sizes and latencies

Now: private L1 & L2 (~1ns, ~5ns), distributed SRAM L3 (~25ns), distributed DRAM L4 (~50ns), main memory (~100ns)

Higher overheads due to closer sizes and latencies across hierarchy levels

Rigid hierarchies must cater to the conflicting needs of many applications

App 1: Scan through a 256MB array repeatedly

[Figure: App 1 accesses private L1 & L2, then SRAM L3, then DRAM L4, then main memory. The array data sees a 0% hit rate in the private caches, a 0% hit rate in the SRAM L3, and a 100% hit rate in the DRAM L4.]

Hit latency = ~5ns + ~25ns + ~50ns = ~80ns

Hit latency = ~5ns + 0ns + ~50ns = ~55ns (30% lower)

Hit latency = ~5ns + 0ns + ~40ns = ~45ns (45% lower)

Even the best rigid hierarchy is a bad compromise!

(See paper for details)
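The hit-latency sums on this slide can be checked with a few lines of arithmetic. This is only a sketch: the per-level figures are the approximate values from the slide, and the function names are hypothetical.

```python
# Approximate per-level latencies in ns, taken from the slide.
PRIVATE_NS, SRAM_L3_NS, DRAM_L4_NS = 5, 25, 50


def hit_latency(*levels_ns):
    """Latency of a hit that traverses the given levels in order."""
    return sum(levels_ns)


def reduction(baseline_ns, new_ns):
    """Fractional latency reduction relative to the baseline."""
    return (baseline_ns - new_ns) / baseline_ns


rigid = hit_latency(PRIVATE_NS, SRAM_L3_NS, DRAM_L4_NS)  # all three levels
no_l3 = hit_latency(PRIVATE_NS, 0, DRAM_L4_NS)           # 0ns in the L3 slot
third = hit_latency(PRIVATE_NS, 0, 40)                   # ~40ns in the last slot

for name, ns in [("rigid", rigid), ("no L3", no_l3), ("third option", third)]:
    print(f"{name}: ~{ns}ns, {reduction(rigid, ns):.0%} lower than rigid")
```

The exact reductions are about 31% and 44%; the slide rounds them to 30% and 45%.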

Jenga: Software-defined cache hierarchies

Jenga manages distributed and heterogeneous banks as a single resource pool and builds virtual hierarchies tailored to each application in the system.

[Figure: a chip with SRAM banks and DRAM banks]

App 1: Scan through a 256MB array

Ideal hierarchy: private L1 & L2, a 256MB cache, main memory

App 2: Lookup a 5MB hashmap

Ideal hierarchy: private L1 & L2, a 5MB cache

App 3: Scan through two arrays (1MB and 256MB)

Hierarchy: private L1 & L2, a 1MB cache, a 256MB cache

Prior work to mitigate the cost of rigid hierarchies

Bypass levels to avoid cache pollution

Do not install lines at specific levels

Give lines low priority in replacement policy

Speculatively access up the hierarchy

Hit/miss predictors, prefetchers

Hide latency with speculative accesses

They must still check all levels for correctness!

Waste energy and bandwidth

It’s better to build the right hierarchy and avoid the root cause: unnecessary accesses to unwanted cache levels

Jenga = flexible hardware + smart software

[Figure: timeline with a hardware row and a software row. Software reads hardware monitors, optimizes hierarchies, and updates hierarchies, then optimizes again; the timeline is marked 100ms.]

Jenga hardware: supporting virtual hierarchies (VHs)

Cores consult virtual hierarchy table (VHT) to find the access path

Similar to Jigsaw [PACT’13, HPCA’15], but it supports two levels

[Figure: a tile with a core, TLB, VHT, private cache, SRAM bank, and NoC router, plus a DRAM bank. Labels: Addr, VH id. Caption: two-level using both SRAM and DRAM.]

Accessing a two-level virtual hierarchy

[Figure: Core 1 with its private caches and VHT, Tile 10, and a DRAM cache bank]

VHT entry: Virtual L1 (VL1) = SRAM (bank 10), Virtual L2 (VL2) = DRAM (bank 38)

1. Core miss → VL1 bank

2. VL1 miss → VL2 bank

3. VL2 hit, serve line

Access path: SRAM bank → DRAM bank → Mem
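The three-step walk through a two-level virtual hierarchy can be sketched in software as follows. This is a toy model, not the hardware design: `VHEntry`, `access`, and the idea of modelling each bank as a set of cached addresses are all hypothetical names and simplifications introduced here. Only the bank numbers (10 and 38) come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VHEntry:
    """One virtual hierarchy: the bank behind each virtual level (hypothetical)."""
    vl1_bank: int
    vl2_bank: Optional[int]  # None models a single-level hierarchy


def access(vht, vh_id, addr, bank_contents):
    """Return (where the line was served from, banks visited in order)."""
    entry = vht[vh_id]
    path = []
    for bank in (entry.vl1_bank, entry.vl2_bank):
        if bank is None:
            continue
        path.append(bank)  # this level is checked
        if addr in bank_contents.get(bank, set()):
            return f"bank {bank}", path  # hit: serve the line from here
    return "memory", path  # missed in every virtual level


# VL1 in SRAM bank 10, VL2 in DRAM bank 38, as on the slide.
vht = {0: VHEntry(vl1_bank=10, vl2_bank=38)}
contents = {10: set(), 38: {0x1000}}

print(access(vht, 0, 0x1000, contents))  # VL1 miss, VL2 hit
print(access(vht, 0, 0x2000, contents))  # misses both levels, goes to memory
```

Only the banks named in the entry are visited, which is the point of the table: levels that are not part of the virtual hierarchy are never checked.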

Accessing a single-level VH using SRAM + DRAM

With VHT, software can group any combination of banks to form a VH

[Figure: a single-level VH using both SRAM and DRAM banks, with example accesses to Addr X and Addr Y. Logically equivalent to: core, private caches, one cache level made of three SRAM banks and one DRAM bank, main memory.]
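One way to picture a single-level VH built from several banks is an address-to-bank mapping in which every line lives in exactly one bank, so only that bank is checked on an access. The sketch below uses a plain modulo over the line address; that mapping, the bank names, and the 64-byte line size are assumptions made for illustration and are not taken from the talk.

```python
def bank_for(addr, banks, line_bytes=64):
    """Pick the single bank of the VH that may hold this address.

    Illustrative modulo mapping over the cache-line index (an assumption).
    """
    line = addr // line_bytes
    return banks[line % len(banks)]


# A single-level VH grouping three SRAM banks and one DRAM bank (hypothetical names).
vh_banks = ["SRAM-0", "SRAM-1", "SRAM-2", "DRAM-0"]

addr_x, addr_y = 0x0000, 0x00C0
print(hex(addr_x), "->", bank_for(addr_x, vh_banks))
print(hex(addr_y), "->", bank_for(addr_y, vh_banks))
```

Because the mapping is a function of the address alone, the group behaves like one cache level whose capacity is the sum of its banks.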

Jenga software: finding near-optimal hierarchies

Periodically, Jenga reconfigures VHs to minimize data movement

[Figure: reconfiguration loop. Hardware: hardware monitors produce application miss curves. Software: Virtual Hierarchy Allocation chooses VH sizes & levels (VL1, VL2); Bandwidth-Aware Placement produces the final allocation; Reconfigure Virtual Hierarchies sets the VHTs.]

Modeling performance of heterogeneous caches

Treat SRAM and DRAM as different “flavors” of banks with different latencies

[Figure, left: banks colored by latency from a start point, giving cache access latency vs. total capacity, with the DRAM bank marked. Right: latency vs. virtual cache size. Access latency plus miss latency (miss curve from hardware monitors) gives total latency: the latency curve for a single-level, heterogeneous cache.]
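The slide builds a latency curve by adding access latency and miss latency at each virtual cache size. A minimal sketch of that sum is shown below. It assumes equally sized banks taken closest first, an access latency equal to the average latency of the banks in use, and a miss latency equal to the miss ratio times a fixed memory latency. Those modelling choices, the function name, and the numbers are assumptions, not the paper's exact model.

```python
def latency_curve(bank_latencies_ns, miss_ratio, mem_latency_ns):
    """Total latency per access for a cache built from the first k banks.

    bank_latencies_ns: latency of each equally sized bank, closest first.
    miss_ratio: miss_ratio[k] is the miss ratio with k banks (a miss curve).
    """
    curve = []
    for k in range(1, len(bank_latencies_ns) + 1):
        access = sum(bank_latencies_ns[:k]) / k  # average over the banks used
        miss = miss_ratio[k] * mem_latency_ns    # expected cost of misses
        curve.append(access + miss)
    return curve


# Two nearby SRAM-like banks and two slower DRAM-like banks (made-up values).
banks = [10, 10, 40, 40]
miss_ratio = [1.0, 0.8, 0.6, 0.2, 0.1]  # index 0 is the no-cache case
curve = latency_curve(banks, miss_ratio, mem_latency_ns=100)
print([round(x, 1) for x in curve])
```

In this toy input, adding the slower banks raises access latency but cuts misses enough that total latency keeps falling, which is the tradeoff the curve is meant to expose.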

Optimizing hierarchies by minimizing system latency

Our prior work proposed algorithms that take latency curves, allocate capacity, and place it on chip to minimize system latency

But it only builds single-level VHs

[Figure: latency vs. capacity curves for App1, App2, and App3, and a capacity bar divided among App2, App1, and App3]

14
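
The single-level allocation step can be sketched as follows. This is a minimal illustration, not the algorithm from the prior work: it assumes each latency curve is a list indexed by capacity units and hands out capacity greedily by marginal latency reduction, which is only optimal when the curves are convex. The name `allocate_capacity` and all numbers are hypothetical.

```python
def allocate_capacity(curves, total_units):
    """Split total_units among apps to reduce total latency.

    curves[name][k] is the (hypothetical) latency of app `name` when it
    owns k capacity units. Greedy stand-in, not the original algorithm.
    """
    alloc = {name: 0 for name in curves}
    for _ in range(total_units):
        best, best_gain = None, 0.0
        for name, curve in curves.items():
            k = alloc[name]
            if k + 1 < len(curve):
                gain = curve[k] - curve[k + 1]  # latency saved by one more unit
                if gain > best_gain:
                    best, best_gain = name, gain
        if best is None:  # no app benefits from more capacity
            break
        alloc[best] += 1
    return alloc


# Made-up latency curves for three apps (index = capacity units).
curves = {
    "App1": [100, 60, 40, 30, 25],
    "App2": [80, 69, 61, 55, 51],
    "App3": [50, 20, 15, 13, 12],
}
alloc = allocate_capacity(curves, 6)
print(alloc)
```

With these made-up curves, App1 receives three units, App2 two, and App3 one.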

Multi-level hierarchies are much more complex

Many intertwined factors

Best VL1 size depends on VL2 size

Best VL2 size depends on VL1 size

Should we have VL2? (Depends on total size)

Jenga encodes these tradeoffs in a single curve

Can reuse prior allocation algorithms

15

How to get a latency curve for a multi-level VH

Two-level hierarchies form a latency surface!

Project the surface: best 1- and 2-level hierarchy at every size

Best overall hierarchy at every size

Curve lets us optimize multi-level hierarchies!

16
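
The projection from a latency surface to a single curve can be sketched as below. This is an illustration under stated assumptions, not Jenga's implementation: the one-level curve is a list indexed by total size, the two-level surface is a dict keyed by `(VL1 size, VL2 size)`, and every number and the name `best_hierarchy_curve` are hypothetical.

```python
def best_hierarchy_curve(one_level, two_level):
    """For each total size, keep the lowest-latency hierarchy.

    one_level[s]       : latency of a VL1-only hierarchy with s units.
    two_level[(s1, s2)]: latency with VL1 = s1 units and VL2 = s2 units.
    Returns a list of (latency, (s1, s2)); s2 == 0 means "no VL2".
    """
    curve = []
    for total, lat1 in enumerate(one_level):
        best = (lat1, (total, 0))
        # Walk the diagonal s1 + s2 == total of the latency surface.
        for s1 in range(1, total):
            s2 = total - s1
            lat2 = two_level.get((s1, s2))
            if lat2 is not None and lat2 < best[0]:
                best = (lat2, (s1, s2))
        curve.append(best)
    return curve


# Made-up latencies; sizes are in arbitrary capacity units.
one_level = [100, 80, 70, 65, 62]
two_level = {
    (1, 1): 75,
    (1, 2): 60, (2, 1): 68,
    (1, 3): 50, (2, 2): 55, (3, 1): 64,
}
curve = best_hierarchy_curve(one_level, two_level)
for total, (latency, (s1, s2)) in enumerate(curve):
    print(total, latency, "VL1 only" if s2 == 0 else f"VL1={s1}, VL2={s2}")
```

In this toy data a single level wins up to two units, and a two-level hierarchy wins from three units on. The resulting list is an ordinary latency curve, so an allocator that only understands curves can consume it unchanged.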

Allocating virtual hierarchies

[Figure: latency curves for VH1, VH2, and VH3 feed the cache allocation algorithm, which outputs the total capacity of each VH; deciding the best hierarchy then gives each virtual hierarchy's size and levels (VL1 only, VL1 only, and VL1 + VL2)]

17

Bandwidth-aware virtual hierarchy placement

Place data close without saturating DRAM bandwidth

Every iteration, Jenga …

Chooses a VH (via an opportunity cost metric, see paper)

Greedily places a chunk of its data in its closest bank

Updates DRAM bank latency

[Figure: VHs (VL1 only, VL1 only, and VL1 + VL2) placed across SRAM and DRAM banks; DRAM bank latency labels rise from 1.0X to 1.1X to 1.3X as more data is placed]

18
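
The iteration above can be sketched as a greedy loop. This is a toy model, not Jenga's placement algorithm: the opportunity cost metric is described only in the paper, so the sketch substitutes "the VH with the most unplaced data", and it assumes a made-up contention model in which each chunk placed in a DRAM bank raises that bank's latency by 10%. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    name: str
    capacity: int   # free chunks
    is_dram: bool
    load: int = 0   # chunks placed so far

    def latency_scale(self, penalty=0.1):
        # Hypothetical model: only DRAM banks slow down as they fill.
        return 1.0 + penalty * self.load if self.is_dram else 1.0


def place(demand, base_latency, banks):
    """demand: {vh: chunks}; base_latency: {(vh, bank_name): cycles}."""
    remaining = dict(demand)
    placement = {vh: {} for vh in demand}
    while any(remaining.values()):
        # Stand-in for the opportunity cost metric (ties: first VH wins).
        vh = max(remaining, key=remaining.get)
        candidates = [b for b in banks if b.capacity > 0]
        if not candidates:
            break
        # Greedily pick the closest bank under its current latency.
        bank = min(candidates,
                   key=lambda b: base_latency[(vh, b.name)] * b.latency_scale())
        bank.capacity -= 1
        bank.load += 1  # raises the bank's latency if it is DRAM
        placement[vh][bank.name] = placement[vh].get(bank.name, 0) + 1
        remaining[vh] -= 1
    return placement


banks = [Bank("sram0", 1, False), Bank("dram0", 4, True), Bank("dram1", 4, True)]
base_latency = {
    ("VH1", "sram0"): 10, ("VH1", "dram0"): 20, ("VH1", "dram1"): 23,
    ("VH2", "sram0"): 12, ("VH2", "dram0"): 30, ("VH2", "dram1"): 18,
}
placement = place({"VH1": 4, "VH2": 1}, base_latency, banks)
print(placement)
```

In this example VH1 fills the SRAM bank, puts two chunks in its nearest DRAM bank, and then spills to the farther DRAM bank once the nearer one has become slower.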

Jenga adds small overheads

Hardware overheads

VHT requires ∼2.4 KB/tile

Monitors are 8 KB x 2/tile

In total, Jenga adds ∼20 KB per tile, 4% of the SRAM banks

Similar to Jigsaw

Software overheads

0.4% of system cycles at 36 tiles

Runs concurrently with applications; only needs to pause cores to update VHTs

Trivial to parallelize

19
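
The hardware overhead figures can be checked with a few lines of arithmetic. The sketch assumes the 18 MB of SRAM in the modeled system is split evenly over 36 tiles; the variable names are illustrative.

```python
# Per-tile hardware state itemized on the slide (KB).
VHT_KB = 2.4
MONITOR_KB = 8.0
MONITORS_PER_TILE = 2

itemized_kb = VHT_KB + MONITOR_KB * MONITORS_PER_TILE

# Assumption: 18 MB of SRAM divided evenly among 36 tiles.
SRAM_TOTAL_KB = 18 * 1024
TILES = 36
sram_per_tile_kb = SRAM_TOTAL_KB / TILES

# Total quoted on the slide (KB per tile).
QUOTED_TOTAL_KB = 20.0
overhead_fraction = QUOTED_TOTAL_KB / sram_per_tile_kb

print(f"itemized: {itemized_kb:.1f} KB/tile")
print(f"SRAM per tile: {sram_per_tile_kb:.0f} KB")
print(f"overhead: {overhead_fraction:.1%}")
```

The two itemized structures sum to 18.4 KB per tile, slightly under the quoted total of about 20 KB, and 20 KB over 512 KB of SRAM per tile is about 3.9%, which matches the quoted 4%.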

See paper for …

Hardware support for

Fast reconfiguration

Page reclassification

Efficient implementation of hierarchy allocation

OS integration

20

Evaluation

Modeled system

36 cores on 6x6 mesh

18MB SRAM

1GB Stacked DRAM

Workloads

36 copies of same app (SPECrate)

Random mixes of 36 SPECCPU apps

36-threaded SPECOMP apps

Compared 5 schemes

Scheme      SRAM               DRAM
S-NUCA      Rigid L3           -
Alloy       Rigid L3           Rigid L4
Jigsaw      App-specific L3    -
JigAlloy    App-specific L3    Rigid L4
Jenga       App-specific Virtual Hierarchies (across SRAM and DRAM)

21

Case study: 36 copies of xalanc

Working set: 6MB x 36 = 216 MB

[Figure: data path from Private L2 through Rigid SRAM L3 (~100% miss rate) to Memory]

Wasteful accesses to L3, should have gone to memory directly

22

Case study: 36 copies of xalanc

Working set: 6MB x 36 = 216 MB

[Figure: data path from Private L2 through Rigid SRAM L3 (~100% miss rate) and Rigid DRAM L4 (~0% miss rate) to Memory]

Cache working sets with DRAM L4

23

Case study: 36 copies of xalanc

Working set: 6MB x 36 = 216 MB

[Figure: data path from Private L2 through App-specific SRAM L3 (~90% miss rate) to Memory]

Reduce 10% misses with app-specific SRAM L3

24

Case study: 36 copies of xalanc

Working set: 6MB x 36 = 216 MB

[Figure: data path from Private L2 through App-specific SRAM L3 (~90% miss rate) and Rigid DRAM L4 (~0% miss rate) to Memory]

Combines Jigsaw’s and Alloy’s benefits, but still a rigid hierarchy

25

Case study: 36 copies of xalanc

Working set: 6MB x 36 = 216 MB

[Figure: data path from Private L2 through a 6MB, SRAM + DRAM VL1-only hierarchy (~0% miss rate) to Memory; chart annotations: 60% better, 20% better]

Single lookup to the working set!

No wasteful lookups!

Jenga improves performance and energy efficiency by creating the right hierarchy using the best available resources!

26
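
The cost of wasteful lookups in this case study can be illustrated with a simple sequential-lookup latency model. The miss rates are the approximate ones quoted on the slides; every latency value is a made-up placeholder, so the output shows only the qualitative ordering and does not reproduce the measured improvements.

```python
def avg_latency(levels, memory_latency):
    """Average latency after a private L2 miss.

    levels: list of (access_latency, miss_rate), looked up in order.
    Every level reached is paid for, even when it misses.
    """
    total, reach = 0.0, 1.0
    for latency, miss_rate in levels:
        total += reach * latency   # cost of looking up this level
        reach *= miss_rate         # fraction of accesses that go on
    return total + reach * memory_latency


# Hypothetical latencies in cycles (not from the talk).
SRAM_L3, DRAM_L4, MIXED_VL1, MEMORY = 20, 60, 50, 200

results = {
    "rigid SRAM L3": avg_latency([(SRAM_L3, 1.0)], MEMORY),
    "rigid SRAM L3 + rigid DRAM L4":
        avg_latency([(SRAM_L3, 1.0), (DRAM_L4, 0.0)], MEMORY),
    "app-specific SRAM L3": avg_latency([(SRAM_L3, 0.9)], MEMORY),
    "app-specific SRAM L3 + rigid DRAM L4":
        avg_latency([(SRAM_L3, 0.9), (DRAM_L4, 0.0)], MEMORY),
    "SRAM + DRAM VL1 only": avg_latency([(MIXED_VL1, 0.0)], MEMORY),
}
for name, cycles in results.items():
    print(f"{name}: {cycles:.0f} cycles")
```

Under these placeholder latencies the VL1-only hierarchy comes out lowest because each access pays for one lookup that almost always hits, while the rigid options pay for an SRAM L3 lookup that mostly misses.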

Jenga works across a wide range of behaviors

App with two-level working set

Working set      Jenga VH
0.5MB + 16MB     SRAM VL1 + DRAM VL2
1MB + 8MB        SRAM+DRAM VL1 + DRAM VL2

App with flat working set

Working set      Jenga VH
2.5MB            SRAM+DRAM VL1
8MB              SRAM+DRAM VL1
>50MB            DRAM VL1 or no caching

27

Jenga works for random multi-program mixes

Jenga consistently outperforms the other schemes for multi-program mixes

[Figure annotations: 2.6X over S-NUCA and 20% over JigAlloy; 1.7X over S-NUCA and 10% over JigAlloy]

28

See paper for more results

Full results for SPECCPU-rate

Multithreaded apps

Sensitivity study for Jenga’s software techniques

2.5D DRAM architectures

Jigsaw SRAM L3 + Jigsaw DRAM L4

And more

29

Conclusion

Rigid, multi-level cache hierarchies are ill-suited to many applications

They cause significant overhead when they are not helpful

We propose Jenga, a software-defined, reconfigurable cache hierarchy

Adopts application-specific organization on the fly

Uses new software algorithms to find near-optimal hierarchies efficiently

Jenga improves both performance and energy efficiency, by up to 85% in EDP, over a combination of state-of-the-art techniques

30

Thanks! Questions?

31

Thank you for your attention!

Questions?

Jenga: Software-Defined Cache Hierarchies