Page 1: Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access Niladrish Chatterjee Manjunath Shevgoor Rajeev Balasubramonian Al Davis

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

Niladrish Chatterjee, Manjunath Shevgoor, Rajeev Balasubramonian, Al Davis, Zhen Fang‡†, Ramesh Illikkal*, Ravi Iyer*

University of Utah, NVIDIA‡ and Intel Labs*

†Work done while at Intel

Page 2

Memory Bottleneck

• DRAM power as high as 25% of total datacenter power
• Low-Power DRAM in place of DDR3
– BOOM from HP Labs
– Energy Proportional Memory from Stanford

[Figure: two systems side by side. BASELINE: CPU with DDR3 memory. Low Power Memory: CPU with LPDDR memory.]

Page 3

Latency Wall

• Memory latency wall not going away
– Emerging scale-out workloads, e.g. CloudSuite
– Move towards energy-efficient in-order cores

• Reduced Latency DRAM (RLDRAM) offers very low latency
– Row-cycle time (tRC) of 8-12 ns (DDR3 tRC = 48.75 ns, LPDDR2 tRC = 60 ns)
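
To put the row-cycle times quoted above in perspective, the short calculation below works out how many times longer the DDR3 and LPDDR2 tRC values are than the RLDRAM range. It uses only the numbers from this slide; the function and variable names are illustrative and not from the talk.

```python
# Row-cycle times (tRC) in nanoseconds, as quoted on the slide.
RLDRAM_TRC_NS = (8.0, 12.0)  # range quoted for Reduced Latency DRAM
DDR3_TRC_NS = 48.75
LPDDR2_TRC_NS = 60.0


def trc_ratio_range(other_trc_ns, rldram_range_ns=RLDRAM_TRC_NS):
    """Return (low, high): how many times longer other_trc_ns is than RLDRAM's tRC."""
    fastest, slowest = rldram_range_ns
    return (other_trc_ns / slowest, other_trc_ns / fastest)


ddr3_ratio = trc_ratio_range(DDR3_TRC_NS)      # (4.0625, 6.09375)
lpddr2_ratio = trc_ratio_range(LPDDR2_TRC_NS)  # (5.0, 7.5)

print(f"DDR3 tRC is {ddr3_ratio[0]:.1f}x to {ddr3_ratio[1]:.1f}x the RLDRAM tRC")
print(f"LPDDR2 tRC is {lpddr2_ratio[0]:.1f}x to {lpddr2_ratio[1]:.1f}x the RLDRAM tRC")
```

Note that tRC is only one timing parameter, so these ratios describe row-cycle time alone and not end-to-end access latency.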


Page 4

No one memory works best

[Figure: two bar charts comparing RLDRAM3, DDR3, and LPDDR2. Left: Power (mW), axis from 0 to 700. Right: Latency (ns), axis from 0 to 60. Bar values are not recoverable from the transcript.]

Page 5

Heterogeneous Memory

• Combine high-performance and low-power DRAM to outperform DDR3 at a lower energy cost

• Large number of possible designs
– Different DRAM device combinations
– Channel organization
– Data placement granularity

[Figure: CPU connected to PERFORMANCE OPTIMIZED DRAM and POWER OPTIMIZED DRAM]

Page 6

Critical Word Regularity


• Most DRAM requests are for word-0 of the cache-line

[Figure: Frequency of accesses to individual words of a cache-line]
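
A distribution like the one in this figure could be gathered from a trace of miss addresses roughly as follows. This is a minimal sketch, not the authors' methodology: it assumes 64-byte cache lines and 8-byte words (giving the 8 words per line the talk refers to), and the trace values are made up purely for illustration.

```python
from collections import Counter

LINE_BYTES = 64  # assumed cache-line size
WORD_BYTES = 8   # assumed word size, giving 8 words per line


def critical_word_index(miss_address):
    """Index (0-7) of the word within its cache line that the miss address falls in."""
    return (miss_address % LINE_BYTES) // WORD_BYTES


def critical_word_histogram(miss_addresses):
    """Fraction of misses whose critical word is word i, for each i in 0..7."""
    counts = Counter(critical_word_index(a) for a in miss_addresses)
    total = len(miss_addresses)
    words_per_line = LINE_BYTES // WORD_BYTES
    return [counts[i] / total if total else 0.0 for i in range(words_per_line)]


# Hypothetical trace: four misses on line-aligned addresses (word 0)
# and one miss 0x18 bytes into a line (word 3).
trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x2018]
hist = critical_word_histogram(trace)
print(hist)  # [0.8, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0]
```

A histogram dominated by index 0, as in the toy trace, is the regularity this slide describes.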

Page 7

Critical Word Acceleration

[Figure: CPU connected to RLDRAM, which holds Word 0, and to LPDDR, which holds Words 1 - 7]

• Critical Word fetched from RLDRAM to boost performance
• Rest of the cache-line placed in and retrieved from LPDRAM for energy efficiency
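
The placement described above can be sketched as a toy model: word 0 of every line lives in RLDRAM, words 1-7 live in LPDRAM, and the requested (critical) word arrives quickly only when it is word 0. Everything numeric here is a placeholder assumption, not a measurement from the talk: the latencies (10 ns and 50 ns) and the access fractions are hypothetical, and the model ignores queuing, bus transfer time, and controller effects.

```python
def split_line(line_words):
    """Split an 8-word cache line into the word kept in RLDRAM and the words kept in LPDRAM."""
    return {"rldram": line_words[:1], "lpdram": line_words[1:]}


def critical_word_latency(critical_index, rldram_latency_ns, lpdram_latency_ns):
    """Latency until the requested word arrives: fast only when it is word 0."""
    return rldram_latency_ns if critical_index == 0 else lpdram_latency_ns


def average_critical_latency(word_fractions, rldram_latency_ns, lpdram_latency_ns):
    """Expected critical-word latency given the fraction of requests per word index."""
    return sum(
        fraction * critical_word_latency(index, rldram_latency_ns, lpdram_latency_ns)
        for index, fraction in enumerate(word_fractions)
    )


placement = split_line(list(range(8)))
print(placement)  # {'rldram': [0], 'lpdram': [1, 2, 3, 4, 5, 6, 7]}

# Hypothetical inputs: 75% of requests target word 0, 25% target word 1.
fractions = [0.75, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
avg_ns = average_critical_latency(fractions, rldram_latency_ns=10.0, lpdram_latency_ns=50.0)
print(avg_ns)  # 20.0
```

The more the access distribution is skewed toward word 0, the closer the average critical-word latency gets to the RLDRAM latency, while seven of every eight words still reside in the lower-power memory.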

Page 8

Results


• Throughput improved by 12.9%

• System energy improved by 6%