introductionintroduction methodologymethodology level based hybrid cache architecturelevel based...

18
Hybrid Cache Architecture with Disparate Memory Technologies Xiaoxia Wu† Jian Li‡ Lixin Zhang‡ Evan Speight‡ Ram Rajamony‡ Yuan † Pennsylvania State University ‡ IBM Austin Research Laboratory

Upload: robyn-cole

Post on 17-Jan-2018

229 views

Category:

Documents


0 download

DESCRIPTION

3/18

TRANSCRIPT

Page 1: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Hybrid Cache Architecture with Disparate Memory Technologies

Xiaoxia Wu† Jian Li‡ Lixin Zhang‡ Evan Speight‡ Ram Rajamony‡ Yuan Xie†† Pennsylvania State University

‡ IBM Austin Research Laboratory

Page 2: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Agenda• Introduction• Methodology• Level based Hybrid Cache Architecture• Region based Hybrid Cache Architecture• 3D Hybrid Cache Stacking• Conclusion

2/18

Page 3: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Introduction (1/3)• Traditional SRAM-based cache architecture

– Limited size with CMP: cache-core balance– Leakage power– More cache levels: Design overhead, coherence– Non-uniform Cache Architecture (wire delay)

• Improve cache power-performance with Emerging Memory Technologies, under the same chip area/footprint– Embedded DRAM– Magnetic RAM– Phase Change RAM– Three-dimensional space

3/18

Page 4: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Introduction (2/3)• Different Memory Technologies

4/23

SRAM(6T)

DRAM(1T 1C)

MRAM(1T 1J)

PRAM(1T 1J)

Density (ratio) Low (1) High (4) High (4) High (16)

Dynamic Power Low Medium Low for readHigh for write

Medium for read

High for write

Leakage Power High Medium Low Low

Speed Very fast Fast Fast for readSlow for write

Slow for readSlowest for

write

Non-volatility No No Yes Yes

Scalability Yes Yes Yes Yes

Endurance1610 1610 1510 810

Page 5: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Introduction (3/3)• Motivation

5/18

L2 Cache

Page 6: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Methodology (1/2)

Core / L1

L2(SRAM)

L3(SRAM)

L3(SRAM)

L2(SRAM)

Core / L1

Core / L1

L2(SRAM)

L3(SRAM)

L3(SRAM)

L2(SRAM)

Core / L1

Core / L1

L2(SRAM)

L3(SRAM)

L3(SRAM)

L2(SRAM)

Core / L1

Core / L1

L2(SRAM)

L3(SRAM)

L3(SRAM)

L2(SRAM)

Core / L1

Core / L1L2

(SRAM)

L3(SRAM)

Core / L1L2

(SRAM)L3

(eDRAMMRAMPRAM)

Core / L1L2

(SRAM)L2

(eDRAMMRAMPRAM)

L3 PRAM

L2 (SRAM)L2 (eDRAMMRAM)

Core / L1

L2 PRAM

L2 (SRAM)L2 (eDRAMMRAM)

Core / L1

L4 PRAM

L2 (SRAM)L2 (eDRAM

MRAM)

Core / L1

LHCA RHCA

3DHCA

(A) (B)

(C) (D) (E)

6/18

Page 7: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Methodology (2/2)

Cache Density Latency(cycle)

DynamicEnergy (nJ)

StaticPower

(W)

SRAM(1M) 1 8 0.388 1.36

eDRAM(4M) 4 24 0.72 0.4

MRAM(4M) 4 Read:20Write:60

Read: 0.4Write: 2.3 0.15

PRAM(16M) 16 Read:40Write:200

Read: 0.8Write: 1.5 0.3

Item Setting value

Processor 8-way issue out-of-order, 8-core, 4GhzL1 32KB DL1, 32KB IL1, 128B, 4-way, 1 R/W port

L2/L3/L4 eDRAM, MRAM, PRAM & 3D StackingMemory 400 cycles latency

• Benchmark: SpecInt06, Specjbb, NAS, Bioperf, Parsec

• Simulator: SystemSim full system simulatorBase line: 256KB (L2) + 1MB(L3)7/18

Page 8: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

LHCA (Level based Hybrid Cache Architec-ture)

8/18

Page 9: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 1/7)• Mutually exclusive regions• Parallel search unified LRU• Fast and slow regions in on cache level• Intra-cache data movement policy

– Move frequently used data to the fast region• Drowsy* RHCA

– Keep slow region in drowsy mode– The drowsy mode can be power-gating the non-volatile

memory cells and/or corresponding peripheral CMOS logic.

– It will be used the primitive drowsy mode for the DRAM.

Drowsy Mode: 캐시의 일정 성능을 유지하는 범위에서 소자에 전달되는 전력을 조절하여 전력 소모를 최적화 하는 방법 [15]9/18

Page 10: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 2/7)• Intra-cache data movement policy

– On a cache hit, if the corresponding cache line resides in the fast region, its sticky bit is always set.

10/18

Page 11: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 3/7)• Structure for swap operation.

11/18

Page 12: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 4/7)

RHCA (fast+slow) Fast region L2 total size (latency)

SRAM+eDRAM 256KB (6 cycles) 4MB (24 cycles)

SRAM+MRAM 256KB (6 cycles) 4MB (r: 20, w: 60)

SRAM+PRAM 256KB (6 cycles) 16MB (r: 40, w: 200)

• Slow region: 256KB/bank, 1 r/w port, block size 128B, associativity:16, 16, 64

• RHCA is 256KB less size than corresponding LHCA– Avoid odd-sized cache

• DNUCA policy: more fine grained, move a line to a closer bank to CPU on each hit, bank-based, same size(Dynamic Non-Uniform Cache Architectures)

12/18

Page 13: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 5/7)

eDRAM

MRAM

PRAM

Hit ratio

13/18

SRAM-eDRAM

Page 14: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 6/7)• Multi-core

• Wake-up latency

14/17

SRAM-eDRAM

Page 15: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

RHCA (Region based Hybrid Cache Architecture, 7/7)• Threshold

• Replacement and insertion policy

15/17

Baseline: LRU

Page 16: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

3D Hybrid Cache Stacking

L3 PRAM

L2 (SRAM)L2 (eDRAMMRAM)

Core / L1

L2 PRAM

L2 (SRAM)L2 (eDRAMMRAM)

Core / L1

L4 PRAM

L2 (SRAM)L2 (eDRAM

MRAM)

Core / L1

(C) (D) (E)

• 3DHCA-C (3D LHCA): 256KB L2 SRAM, 4M L3 eDRAM, 32M L4 PRAM

• 3DHCA-D: 32M L2 fast, middle, slow region (3D RHCA)– Data in slow region can be moved to fast and middle re-

gions• 3DHCA-E: 4M L2 fast+slow region, 32M L3 PRAM

(LHCA+RHCA) 16/18

Page 17: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

3D Hybrid Cache Stacking

17/18

Page 18: IntroductionIntroduction MethodologyMethodology Level based Hybrid Cache ArchitectureLevel based Hybrid Cache Architecture Region based Hybrid Cache ArchitectureRegion

Conclusion• Hybrid cache architecture is promising to improve

cache power-performance under same chip area/footprint

• RHCA and LHCA achieve better power-perfor-mance than SRAM-based design

• RHCA outperforms LHCA with minimal hardware support

• 3DHCA achieves better performance than LHCA and RHCA, while still maintains lower power than 2D SRAM baseline

18/18