![Page 1: Optimizing DRAM Based Main Memories Using Intelligent Data Placement](https://reader036.vdocuments.us/reader036/viewer/2022081505/568162ca550346895dd357ff/html5/thumbnails/1.jpg)
Optimizing DRAM Based Main Memories Using Intelligent Data Placement
Ph.D. Thesis Proposal
Kshitij Sudan
Thesis Statement
Improving DRAM access latency, power consumption, and capacity by
leveraging intelligent data placement.
Overview
[Diagram: CPU and MC connected to DIMM …]
• Memory Interconnect: narrow, buffered channels to increase capacity (Proposed work)
• Memory Controller: maximize DRAM row-buffer utility (Micro-pages: ASPLOS 2010)
• System Re-design: increasing capacity within a fixed power budget (Tiered Memory: Under Review)
RE-ARCHITECTING MEMORY CHANNELS
Proposed Work
Challenges in Increasing DRAM Capacity
• Slow growth in CPU pin count limits number of memory channels
• Signal integrity limits capacity per channel
– Use serial, point-to-point links
• Drawbacks of using serial, point-to-point links
– Increased latency due to signal re-conditioning
– Memory controller complexity limits resource use
Increasing DRAM Capacity by Re-Architecting Memory Channel
• Re-architect CPU-to-DRAM channel
• Many skinny, serial channels vs. few, wide buses
• CMPs might have changed the playing field
• Improved signal integrity due to re-conditioning
• New channel topology to reduce latency
• Study effects of channel frequency
Re-Architecting Memory Channel
Organize modules as a binary tree, and move some MC functionality to a “Buffer Chip”
• Reduces module depth from O(n) to O(log n)
• Reduces worst case latency, improves signal integrity
• Buffer chip manages low-level DRAM operations and channel arbitration
• Not limited by worst-case latency like FB-DIMM
• NUMA-like DRAM access: leverage data mapping
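The O(n) versus O(log n) depth claim can be checked with a small calculation. The sketch below assumes a daisy chain in which the k-th module sits k hops from the memory controller, and a complete binary tree of modules filled level by level with the controller attached to the root. Both helper names are hypothetical and are not taken from the proposal.

```python
def chain_depth(num_modules: int) -> int:
    """Worst-case hops in a daisy chain: the last module is num_modules hops away."""
    return num_modules


def tree_depth(num_modules: int) -> int:
    """Worst-case hops in a complete binary tree of modules (root counts as 1 hop)."""
    # A complete binary tree with n >= 1 nodes has floor(log2(n)) + 1 levels,
    # which is exactly the bit length of n.
    return num_modules.bit_length()


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(f"{n:3d} modules: chain depth {chain_depth(n):3d}, tree depth {tree_depth(n)}")
```

Under these assumptions the worst-case depth for 64 modules drops from 64 hops to 7, and modules near the root are reached in fewer hops than those at the leaves, which is the source of the NUMA-like access behaviour.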
MICRO-PAGES
Past Work
Increasing Row-Buffer Utility with Data Placement
• Over-fetch due to large row-buffers
– 8 KB read into row buffer for a 64 byte cache line
– Row-buffer utilization for a single request < 1% (64 B / 8 KB ≈ 0.78%)
• Diminishing locality in multi-cores
– Increasingly randomized memory access stream
– Row-buffer hit rates bound to go down
• Open page policy and FR-FCFS request scheduling
– Memory controller schedules requests to open row-buffers first
Goal: Improve row-buffer hit-rates for CMPs
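The interaction between an open-page policy and FR-FCFS scheduling can be illustrated with a toy scheduler. This is a minimal sketch, not the controller model used in the thesis work: the `Request` type, the per-bank `open_rows` dictionary, and both function names are assumptions made for illustration, and DRAM timing is ignored entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    arrival: int  # arrival time; a smaller value means an older request
    bank: int
    row: int


def pick_fr_fcfs(queue: List[Request], open_rows: Dict[int, int]) -> Optional[Request]:
    """First-ready, first-come-first-served: oldest row-buffer hit, else oldest request."""
    if not queue:
        return None
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    candidates = hits if hits else queue
    return min(candidates, key=lambda r: r.arrival)


def hit_rate(queue: List[Request], open_rows: Dict[int, int]) -> float:
    """Service every request in FR-FCFS order and return the row-buffer hit rate."""
    pending = list(queue)
    rows = dict(open_rows)
    total = len(pending)
    hits = 0
    while pending:
        req = pick_fr_fcfs(pending, rows)
        if rows.get(req.bank) == req.row:
            hits += 1
        rows[req.bank] = req.row  # open-page policy: the row stays open after the access
        pending.remove(req)
    return hits / total if total else 0.0
```

With requests to rows 5, 7, 5 of the same bank arriving in that order, this scheduler services the second row-5 request before the row-7 request and gets one hit in three; strict arrival order would get none. The more interleaved the request stream, the fewer such hits are available to reorder.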
Key Observation
Post-L2 Cache Block Access Pattern Within OS Pages
For heavily accessed pages in a given time interval, accesses are usually to a few cache blocks
Basic Idea
[Figure: 4 KB OS pages divided into 1 KB micro-pages, ranked from hottest to coldest micro-pages; DRAM memory with a reserved DRAM region]
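One way to picture the selection step is to count accesses per 1 KB micro-page over an interval and keep the hottest ones up to the size of the reserved region. The sketch below assumes byte addresses as input and a hypothetical `hottest_micro_pages` helper; the counting mechanism and interval length used in the actual work are not described on the slide.

```python
from collections import Counter

MICRO_PAGE_BYTES = 1024  # 1 KB micro-pages, as on the slide


def hottest_micro_pages(addresses, reserved_bytes):
    """Return micro-page numbers that would fill the reserved region, hottest first."""
    counts = Counter(addr // MICRO_PAGE_BYTES for addr in addresses)
    slots = reserved_bytes // MICRO_PAGE_BYTES
    return [page for page, _ in counts.most_common(slots)]


if __name__ == "__main__":
    trace = [0, 64, 128, 4096, 4160, 9000]  # hypothetical post-L2 access addresses
    print(hottest_micro_pages(trace, reserved_bytes=2048))
```

Placing the selected micro-pages next to each other means several hot micro-pages can share one DRAM row, so a single row activation serves more requests.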
Hardware Implementation (HAM)
[Figure: Baseline vs. Hardware Assisted Migration (HAM). Baseline: a CPU memory request with physical address X accesses Page A in the 4 GB main memory. HAM: a mapping table translates old address X to new address Y, which lies in the 4 MB reserved DRAM region.]
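The mapping-table lookup in the figure can be sketched as a dictionary from old locations to slots in the reserved region. Everything below is an assumption for illustration: the `MappingTable` class, the 1 KB migration granularity, and the placement of the reserved region at the top of a 4 GB address space are not specified on the slide.

```python
MICRO_PAGE_BYTES = 1024  # assumed migration granularity (1 KB micro-pages)


class MappingTable:
    """Toy old-address to new-address table for migrated micro-pages."""

    def __init__(self, reserved_base, reserved_bytes):
        self.reserved_base = reserved_base
        self.slots = reserved_bytes // MICRO_PAGE_BYTES
        self.table = {}  # old micro-page number -> slot index in the reserved region

    def migrate(self, old_addr):
        """Give the micro-page containing old_addr a slot; return the new address."""
        page = old_addr // MICRO_PAGE_BYTES
        if page not in self.table:
            if len(self.table) >= self.slots:
                raise RuntimeError("reserved region is full")
            self.table[page] = len(self.table)
        return self.translate(old_addr)

    def translate(self, addr):
        """Return new address Y for address X, or X unchanged if it was never migrated."""
        page, offset = divmod(addr, MICRO_PAGE_BYTES)
        slot = self.table.get(page)
        if slot is None:
            return addr
        return self.reserved_base + slot * MICRO_PAGE_BYTES + offset


if __name__ == "__main__":
    # Assumption: the 4 MB reserved region occupies the top of a 4 GB address space.
    table = MappingTable(reserved_base=0xFFC00000, reserved_bytes=4 * 1024 * 1024)
    print(hex(table.translate(0x1234)))  # not migrated yet
    print(hex(table.migrate(0x1234)))    # now redirected into the reserved region
```

Because the redirection happens below the physical address seen by the OS, page tables and TLB entries do not change when a micro-page moves.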
Conclusions
• On average, for applications with room for improvement and with our best performing scheme
– Average performance ↑ 9% (max. 18%)
– Average memory energy consumption ↓ 18% (max. 62%)
– Average row-buffer utilization ↑ 38%
• Hardware assisted migration offers better returns due to fewer overheads of TLB shoot-down and misses
TIERED MEMORY
Past Work
Increase DRAM Capacity in Fixed Power Budget
• DRAM power budget increasing steadily with increases in capacity
– Memory power budget in large systems already close to 50% of total power budget
• DRAM low-power modes hard to use in current systems
– Granularity at which low-power modes operate (a DRAM rank)
– Data placement to increase bandwidth reduces opportunities to place ranks in low-power modes
DRAM Power Mgmt. Challenges
• DRAM supports low-power modes, but not easy to exploit:
– Granularity at which memory can be put in low-power mode is large.
– Random distribution of memory accesses across ranks
  • Memory interleaving.
  • Little co-ordination between memory managers (library, OS, and hypervisor).
• As a result, no rank experiences sufficient idleness to warrant being placed in a low-power mode.
Few systems can exploit DRAM low-power modes aggressively
Tiered Memory
• Accesses to 4 KB OS pages show a step curve
• Leverage this to place frequently accessed pages in active-mode DRAM ranks
• Place “cold” pages in low-power mode ranks
Iso-Power Tiered Memory-I
• A DRAM rank in self-refresh mode consumes ~15% of the power of an idle rank in active mode.
– 1 rank in active idle mode = 6 ranks in self-refresh.
• By maintaining most of the memory in a low-power mode, systems with a much larger memory capacity can be built in the same power budget.
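The "1 active idle rank = 6 self-refresh ranks" figure follows directly from the ~15% ratio, and the same arithmetic gives a rough ceiling on capacity. The sketch below is an idle-power-only estimate with hypothetical function names: it ignores the power spent servicing accesses and entering or exiting self-refresh, so it is an upper bound and not the analytical model used in this work.

```python
import math


def cold_ranks_per_active_rank(self_refresh_fraction):
    """Whole self-refresh ranks that fit in the idle power of one active rank."""
    return math.floor(1 / self_refresh_fraction)


def max_cold_ranks_idle_only(baseline_ranks, hot_ranks, self_refresh_fraction):
    """Upper bound on cold ranks when only idle power is counted.

    Power is measured in units of one idle active rank, so the budget freed
    by keeping only hot_ranks active is (baseline_ranks - hot_ranks).
    """
    freed_budget = baseline_ranks - hot_ranks
    return math.floor(freed_budget / self_refresh_fraction)


if __name__ == "__main__":
    print(cold_ranks_per_active_rank(0.15))       # matches the slide's 1:6 trade
    print(max_cold_ranks_idle_only(8, 4, 0.15))   # idle-only ceiling for 4 hot ranks
```

Any realistic configuration lands below this ceiling, because the hot ranks burn extra power serving requests and cold ranks occasionally have to wake up.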
Iso-Power Tiered Memory-II
• 2 tiers of DRAM with heterogeneous power and performance characteristics.
– “Hot” tier DRAM always available, “cold” tier DRAM uses self-refresh low-power mode when idle.
• Place frequently accessed data in hot tier.
– Maintain performance
– Fewer accesses to cold tier -> reduces power.
• Batch references to cold tier:
– Amortize entry/exit overheads of low-power mode.
– Stay in low-power mode for longer.
Intelligent Data Placement
• Counters keep track of hot pages with low overhead
• Every epoch, migrate hot pages from low-power ranks to active ranks
– Requires page-table updates, TLB flushes
– Still low overhead: after the first few epochs, there is little change in the hot page set
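The per-epoch decision can be sketched as a comparison between the pages that were hottest in the last epoch and the pages currently resident in the hot tier. The `plan_migrations` helper, the use of a `Counter` for the access counters, and the demotion of pages that fell out of the hot set are all assumptions for illustration; the slide only states that hot pages found in low-power ranks are moved to active ranks.

```python
from collections import Counter


def plan_migrations(epoch_counts, hot_capacity, in_hot_tier):
    """Return (promote, demote) page lists for the coming epoch.

    epoch_counts: Counter mapping page number -> accesses in the last epoch
    hot_capacity: number of pages the hot tier can hold
    in_hot_tier:  set of page numbers currently in active-mode ranks
    """
    hot_set = {page for page, _ in epoch_counts.most_common(hot_capacity)}
    promote = sorted(hot_set - in_hot_tier)  # hot pages sitting in low-power ranks
    demote = sorted(in_hot_tier - hot_set)   # assumed: make room by moving cold pages out
    return promote, demote


if __name__ == "__main__":
    counts = Counter({1: 10, 2: 8, 3: 1})
    print(plan_migrations(counts, hot_capacity=2, in_hot_tier={3, 4}))
    print(plan_migrations(counts, hot_capacity=2, in_hot_tier={1, 2}))
```

In the second call the hot set already matches the hot tier, so both lists are empty. That is the steady state the slide refers to: once the hot set stabilizes, few pages move and the page-table and TLB costs are rarely paid.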
Servicing cold-tier requests in batches
• Buffer requests at the memory controller for cold-tier accesses
• At most, delay any request by t_g – prevents starvation
• t_g chosen to amortize overheads of low-power mode entry/exit
• Requires minimal change to the memory controller
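The batching rule can be sketched as a buffer that releases everything it holds once the oldest request has waited t_g. This is a simplified illustration with a hypothetical `ColdTierBuffer` class and abstract time units; it assumes a single cold tier and does not model the wake-up latency of the rank itself.

```python
from collections import deque


class ColdTierBuffer:
    """Holds cold-tier requests and releases them as one batch."""

    def __init__(self, t_g):
        self.t_g = t_g          # maximum delay imposed on any buffered request
        self.pending = deque()  # (arrival_time, request), in arrival order

    def enqueue(self, now, request):
        self.pending.append((now, request))

    def drain_if_due(self, now):
        """Return the whole batch once the oldest request has waited t_g, else []."""
        if not self.pending or now - self.pending[0][0] < self.t_g:
            return []
        batch = [req for _, req in self.pending]
        self.pending.clear()
        return batch


if __name__ == "__main__":
    buf = ColdTierBuffer(t_g=100)
    buf.enqueue(0, "a")
    buf.enqueue(40, "b")
    print(buf.drain_if_due(99))   # oldest request has not yet waited t_g
    print(buf.drain_if_due(100))  # rank is woken once and serves both requests
```

Since the release is keyed to the oldest request, no request waits longer than t_g, and every request that arrived in the meantime shares a single low-power exit and re-entry.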
Attributions
• Re-architecting memory channel: Rajeev Balasubramonian, Al Davis, Niladrish Chatterjee, Manu Awasthi
• Micro-Pages: Rajeev Balasubramonian, Al Davis, Niladrish Chatterjee, Manu Awasthi
• Tiered Memory: Karthick Rajamani, Wei Huang, John Carter, Freeman Rawson
Thanks
Questions?
Backup Slides
Other Work
• Dynamic Hardware-Assisted Software-Controlled Page Placement to Manage Capacity Allocation and Sharing within Large Caches - Manu Awasthi, Kshitij Sudan, Rajeev Balasubramonian, John Carter, HPCA, February 2009.
• Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality of Service - Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi Iyer, Under Review.
• A Novel System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O - Kshitij Sudan, Saisanthosh Balakrishnan, Sean Lie, Min Xu, Dhiraj Mallick, Rajeev Balasubramonian, Gary Lauterbach, Under Review.
• Data Locality Optimization of Pthread Applications for Non-Uniform Cache Architectures - Gagan S. Sachdev, Kshitij Sudan, Rajeev Balasubramonian, Mary Hall, Under Review.
• Efficient Scrub Mechanisms for Error-Prone Emerging Memories - Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan, Bipin Rajendran, Rajeev Balasubramonian, Viji Srinivasan, To Appear at HPCA-18, Feb 2012.
• Hadoop Jobs Require One-Disk-per-Core, Myth or Fact? - Kshitij Sudan, Min Xu, Sean Lie, Saisanthosh Balakrishnan, Gary Lauterbach, XLDB-5 Lightning Talk, Oct. 2011.
• Handling PCM Resistance Drift with Device, Circuit, Architecture, and System Solutions - Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan, Rajeev Balasubramonian, Bipin Rajendran, Viji Srinivasan, Non-Volatile Memory Workshop, March 2011.
• Handling the Problems and Opportunities Posed by Multiple On-Chip Memory Controllers - Manu Awasthi, David Nellans, Kshitij Sudan, Rajeev Balasubramonian, Al Davis, PACT, September 2010.
• Improving Server Performance on Multi-Cores via Selective Off-loading of OS Functionality - David Nellans, Kshitij Sudan, Erik Brunvand, Rajeev Balasubramonian, WIOSCA, June 2010.
• Hardware Prediction of OS Run-Length For Fine-Grained Resource Customization - David Nellans, Kshitij Sudan, Erik Brunvand, Rajeev Balasubramonian, ISPASS-2010, March 2010.
Iso-Power Memory Configurations
[Figure: three panels, “Tiered Memory Size for u=0.5”, “Tiered Memory Size for u=0.7”, and “Tiered Memory Size for u=0.9”. x-axis: Idle Power Ratio (Hot/Cold), 5 to 11. y-axis: Number of Tiered Ranks, 15 to 33. One curve each for Nh=2, Nh=3, and Nh=4. Annotations: “4h,12c: 2X baseline” and “2h,22c: 3X baseline”.]
• 8 active ranks in baseline
• ratio of idle active and self-refresh power,
• fraction (u) of memory requests served by hot ranks,
• service rate,
• bandwidth.
Analytical model determines iso-power configurations for a given access rate to the active-mode (“hot”) DRAM ranks
Tiered Memory: Iso-Power Memory Architecture to Address Memory Power Wall
• Build tiers out of DRAM ranks
• Aggressively use low-power (LP) modes
• Intelligent data placement to reduce overheads of entry/exit from LP modes
• Buffer requests to ranks in LP and service them in batches to amortize entry/exit costs