Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access
Niladrish Chatterjee, Manjunath Shevgoor, Rajeev Balasubramonian, Al Davis, Zhen Fang‡†, Ramesh Illikkal*, Ravi Iyer*
University of Utah, NVidia‡ and Intel Labs*
†Work done while at Intel
Memory Bottleneck
• DRAM power as high as 25% of total datacenter power
• Low-Power DRAM in place of DDR3
– BOOM from HP Labs
– Energy Proportional Memory from Stanford
[Figure: BASELINE (a CPU connected to four DDR3 memories) vs. Low Power Memory (a CPU connected to four LPDDR memories)]
Latency Wall
• Memory latency wall not going away
– Emerging scale-out workloads, e.g. CloudSuite
– Move towards energy-efficient in-order cores
• Reduced Latency DRAM (RLDRAM) offers very low latency
– Row-cycle time (tRC) of 8-12 ns (DDR3 tRC = 48.75 ns, LPDDR2 tRC = 60 ns)
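To put the tRC figures above in proportion, here is a small worked calculation. It uses only the numbers quoted on the slide; the dictionary and function names are illustrative, and the ratios compare row-cycle time only, not end-to-end access latency.

```python
# Row-cycle times (ns) as quoted on the slide.
RLDRAM_TRC_NS = (8.0, 12.0)           # quoted range for Reduced Latency DRAM
OTHER_TRC_NS = {"DDR3": 48.75, "LPDDR2": 60.0}

def trc_ratio_range(trc_ns: float) -> tuple:
    """Return (min, max) ratio of a device's tRC to the RLDRAM tRC range."""
    lo, hi = RLDRAM_TRC_NS
    return (trc_ns / hi, trc_ns / lo)

for name, trc in OTHER_TRC_NS.items():
    low, high = trc_ratio_range(trc)
    print(f"{name}: tRC is {low:.2f}x to {high:.2f}x that of RLDRAM")
```

With these inputs, DDR3's row-cycle time comes out at roughly 4 to 6 times that of RLDRAM, and LPDDR2's at 5 to 7.5 times.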
No one memory works best
[Figure: two bar charts comparing RLDRAM3, DDR3 and LPDDR2. Left: Power (mW), axis 0 to 700. Right: Latency (ns), axis 0 to 60.]
Heterogeneous Memory
• Combine high-performance and low-power DRAM to outperform DDR3 at a lower energy cost
• Large number of possible designs
– Different DRAM device combinations
– Channel Organization
– Data Placement Granularity
[Figure: a CPU connected to both PERFORMANCE OPTIMIZED DRAM and POWER OPTIMIZED DRAM]
Critical Word Regularity
• Most DRAM requests are for word-0 of the cache-line
Frequency of accesses to individual words of a cache-line
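A minimal sketch of how such a per-word frequency could be computed from a trace of miss addresses. The 64-byte line and 8-byte word sizes are assumptions (the slides only imply 8 words per line), and the function names are illustrative, not taken from the authors' tooling.

```python
from collections import Counter

LINE_BYTES = 64   # assumed cache-line size
WORD_BYTES = 8    # assumed word size, giving 8 words per line
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES

def critical_word_index(addr: int) -> int:
    """Index (0-7) of the word within its cache line that the request targets."""
    return (addr % LINE_BYTES) // WORD_BYTES

def critical_word_histogram(miss_addrs):
    """Fraction of requests whose critical word is word 0, word 1, ..., word 7."""
    counts = Counter(critical_word_index(a) for a in miss_addrs)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * WORDS_PER_LINE
    return [counts.get(i, 0) / total for i in range(WORDS_PER_LINE)]

# Toy trace (made-up addresses): three requests hit word 0, one hits word 1.
print(critical_word_histogram([0x1000, 0x2000, 0x3008, 0x4000]))
```

A histogram dominated by index 0 is the regularity the slide describes.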
Critical Word Acceleration
[Figure: a CPU connected to RLDRAM, which holds Word 0, and to LPDDR, which holds Words 1 - 7]
• Critical Word fetched from RLDRAM to boost performance
• Rest of the cache-line placed & retrieved from LPDRAM for energy efficiency.
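The placement described above can be sketched as follows. This is an illustration only, assuming a 64-byte line made of eight 8-byte words; the dictionary keys and function names are hypothetical and do not model the actual memory controller.

```python
LINE_BYTES = 64   # assumed cache-line size
WORD_BYTES = 8    # assumed word size, giving words 0-7

def place_line(line: bytes) -> dict:
    """Split a cache line: word 0 goes to RLDRAM, words 1-7 go to LPDRAM."""
    if len(line) != LINE_BYTES:
        raise ValueError("expected a full cache line")
    return {"RLDRAM": line[:WORD_BYTES], "LPDRAM": line[WORD_BYTES:]}

def serving_memory(critical_idx: int) -> str:
    """Which memory returns the requested (critical) word of a line."""
    return "RLDRAM" if critical_idx == 0 else "LPDRAM"

def reassemble(placed: dict) -> bytes:
    """Rebuild the full line once both parts have arrived."""
    return placed["RLDRAM"] + placed["LPDRAM"]

line = bytes(range(LINE_BYTES))
placed = place_line(line)
print(len(placed["RLDRAM"]), len(placed["LPDRAM"]), serving_memory(0))
```

Under this split, only requests whose critical word is word 0 are served by the low-latency part, which is why the word-0 regularity matters.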
Results
• Throughput improved by 12.9%
• System energy improved by 6%