Page 1: Optimizing DRAM Based Main Memories Using Intelligent Data Placement

Optimizing DRAM Based Main Memories Using Intelligent Data Placement

Ph.D. Thesis Proposal

Kshitij Sudan

Thesis Statement

Improving DRAM access latency, power consumption, and capacity by leveraging intelligent data placement.

Overview

[Diagram: CPU, memory controller (MC), DIMMs]

• Memory interconnect: narrow, buffered channels to increase capacity (proposed work)

• Memory controller: maximize DRAM row-buffer utility (Micro-pages, ASPLOS 2010)

• System re-design: increase capacity within a fixed power budget (Tiered Memory, under review)

RE-ARCHITECTING MEMORY CHANNELS

Proposed Work

Challenges in Increasing DRAM Capacity

• Slow growth in CPU pin count limits the number of memory channels

• Signal integrity limits capacity per channel
  – Use serial, point-to-point links

• Drawbacks of using serial, point-to-point links
  – Increased latency due to signal re-conditioning
  – Memory controller complexity limits resource use

Increasing DRAM Capacity by Re-Architecting Memory Channel

• Re-architect the CPU-to-DRAM channel
• Many skinny, serial channels vs. few, wide buses

• CMPs might have changed the playing field
• Improved signal integrity due to re-conditioning

• New channel topology to reduce latency
• Study effects of channel frequency

Re-Architecting Memory Channel

Organize modules as a binary tree, and move some memory controller (MC) functionality to a “buffer chip”

• Reduces module depth from O(n) to O(log n)

• Reduces worst case latency, improves signal integrity

• Buffer chip manages low-level DRAM operations and channel arbitration

• Unlike FB-DIMM, not limited by worst-case latency

• NUMA-like DRAM access: leverage data mapping
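To see where O(n) versus O(log n) comes from, here is a minimal Python sketch that counts the worst-case number of DIMM levels a request must traverse. It assumes each DIMM together with its buffer chip is one node and that the channel attaches at the root; `chain_depth` and `tree_depth` are hypothetical helper names, not part of the proposal.

```python
import math

def chain_depth(num_dimms: int) -> int:
    """Daisy chain (FB-DIMM style): the farthest DIMM is num_dimms hops away."""
    return num_dimms

def tree_depth(num_dimms: int) -> int:
    """Complete binary tree rooted at the channel: ceil(log2(n + 1)) levels."""
    return math.ceil(math.log2(num_dimms + 1))

for n in (4, 8, 16, 32):
    print(f"{n:>2} DIMMs: chain depth {chain_depth(n):>2}, tree depth {tree_depth(n)}")
```

With 32 DIMMs the farthest module is 32 hops away in a chain but only 6 levels deep in a tree, which is the source of the worst-case latency reduction.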

MICRO-PAGES

Past Work

Increasing Row-Buffer Utility with Data Placement

• Over-fetch due to large row buffers
  – 8 KB read into the row buffer for a 64-byte cache line
  – Row-buffer utilization for a single request < 1%

• Diminishing locality in multi-cores
  – Increasingly randomized memory access stream
  – Row-buffer hit rates bound to go down

• Open-page policy and FR-FCFS request scheduling
  – Memory controller schedules requests to open row buffers first

Goal: Improve row-buffer hit rates for CMPs
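The over-fetch arithmetic and the open-page/FR-FCFS interaction can be sketched in a few lines of Python. This is a simplified model under stated assumptions: one bank, a static queue with no new arrivals, and no DRAM timing constraints; `Request`, `fr_fcfs_pick`, and `schedule` are hypothetical names.

```python
from dataclasses import dataclass

ROW_BYTES = 8 * 1024   # 8 KB row buffer (from the slide)
LINE_BYTES = 64        # one 64-byte cache line per request

@dataclass
class Request:
    arrival: int  # lower value = older request
    bank: int
    row: int

def fr_fcfs_pick(queue, open_rows):
    """First-Ready FCFS: oldest row-buffer hit first, else oldest request."""
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    return min(hits or queue, key=lambda r: r.arrival)

def schedule(queue):
    """Drain a static queue under an open-page policy.

    Returns (service order as arrival ids, number of row-buffer hits)."""
    pending = list(queue)
    open_rows = {}  # bank -> row left open after the last access
    order, hits = [], 0
    while pending:
        req = fr_fcfs_pick(pending, open_rows)
        if open_rows.get(req.bank) == req.row:
            hits += 1
        open_rows[req.bank] = req.row  # open page: row stays open afterwards
        pending.remove(req)
        order.append(req.arrival)
    return order, hits

print(f"{LINE_BYTES / ROW_BYTES:.2%}")  # 0.78% of the row used by one request
reqs = [Request(0, 0, 5), Request(1, 0, 9), Request(2, 0, 5), Request(3, 0, 9)]
print(schedule(reqs))  # ([0, 2, 1, 3], 2)
```

Strict FCFS on the same queue would alternate rows 5, 9, 5, 9 and get no hits; FR-FCFS reorders to get two. Its benefit still depends on requests to the same row being present in the queue, which is what diminishing locality erodes.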

Key Observation: Post-L2 Cache Block Access Pattern Within OS Pages

For heavily accessed pages in a given time interval, accesses are usually concentrated on a few cache blocks

Basic Idea

[Figure: 4 KB OS pages are split into 1 KB micro-pages; the hottest micro-pages are moved into a reserved DRAM region of DRAM memory, separate from the coldest micro-pages]
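One way to pick the micro-pages to move is to count post-L2 accesses per 1 KB chunk over an epoch and take the top k. This is a minimal Python sketch, assuming raw access count is the hotness metric and that an epoch's trace of physical addresses is available; the slide does not specify the exact policy, and `micro_page_of` and `hottest_micro_pages` are hypothetical names.

```python
from collections import Counter

MICRO_PAGE = 1024  # 1 KB micro-page; a 4 KB OS page holds four of them

def micro_page_of(addr: int) -> int:
    """Micro-page number of a physical address."""
    return addr // MICRO_PAGE

def hottest_micro_pages(trace, k):
    """Count post-L2 accesses per micro-page over one epoch and return the
    k most frequently accessed ones (candidates for the reserved region)."""
    counts = Counter(micro_page_of(addr) for addr in trace)
    return [mp for mp, _ in counts.most_common(k)]

# Hypothetical epoch trace: OS pages 0 and 3 each hammer one 1 KB chunk
trace = [0x0000] * 5 + [0x3400] * 4 + [0x1800, 0x2C00]
print(hottest_micro_pages(trace, 2))  # [0, 13]
```

Micro-page 13 is the second 1 KB chunk of OS page 3, so only the hot chunks of pages 0 and 3 need to be packed together in the reserved region, not the cold remainder of either 4 KB page.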

Hardware Implementation (HAM)

[Figure: Baseline vs. Hardware Assisted Migration (HAM). A CPU memory request for physical address X (in Page A of the 4 GB main memory) is looked up in a mapping table of old address → new address pairs and redirected to new address Y in a 4 MB reserved DRAM region.]
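The mapping-table lookup in HAM can be sketched as a dictionary from old micro-page to new location. This is a minimal Python model, not the hardware design: the class name `MappingTable` is hypothetical, data copying is not modeled, and placing the 4 MB reserved region at the top of the 4 GB address space is an assumption.

```python
MICRO_PAGE = 1024                             # 1 KB migration granularity
RESERVED_SIZE = 4 * 1024**2                   # 4 MB reserved DRAM region
RESERVED_BASE = 4 * 1024**3 - RESERVED_SIZE   # assumption: top of 4 GB memory

class MappingTable:
    """Maps an old micro-page (address X) to its new location Y."""

    def __init__(self):
        self.table = {}      # old micro-page number -> new base address
        self.next_slot = 0   # next free 1 KB slot in the reserved region

    def migrate(self, old_micro_page: int) -> int:
        """Record a migration; copying the 1 KB of data is not modeled."""
        if old_micro_page in self.table:
            return self.table[old_micro_page]
        if (self.next_slot + 1) * MICRO_PAGE > RESERVED_SIZE:
            raise MemoryError("reserved DRAM region is full")
        new_base = RESERVED_BASE + self.next_slot * MICRO_PAGE
        self.table[old_micro_page] = new_base
        self.next_slot += 1
        return new_base

    def translate(self, phys_addr: int) -> int:
        """Redirect a CPU memory request if its micro-page was migrated."""
        micro_page, offset = divmod(phys_addr, MICRO_PAGE)
        new_base = self.table.get(micro_page)
        return phys_addr if new_base is None else new_base + offset

ham = MappingTable()
ham.migrate(0x3400 // MICRO_PAGE)   # move micro-page 13 (address X)
print(hex(ham.translate(0x3440)))   # 0xffc00040, inside the reserved region (Y)
print(hex(ham.translate(0x0100)))   # 0x100, not migrated
```

Because the redirection happens below the OS in this model, the OS-visible physical address X never changes, so no page-table update is required for a migration.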

Conclusions

• On average, for applications with room for improvement and with our best-performing scheme:
  – Average performance ↑ 9% (max. 18%)
  – Average memory energy consumption ↓ 18% (max. 62%)
  – Average row-buffer utilization ↑ 38%

• Hardware-assisted migration offers better returns because it incurs lower overheads from TLB shoot-downs and TLB misses

TIERED MEMORY

Past Work

Increase DRAM Capacity in Fixed Power Budget

• DRAM power budget is increasing steadily with increases in capacity
  – Memory power budget in large systems is already close to 50% of the total power budget

• DRAM low-power modes are hard to use in current systems
  – Low-power modes operate at a coarse granularity (a DRAM rank)
  – Data placement that increases bandwidth reduces opportunities to place ranks in low-power modes

DRAM Power Mgmt. Challenges

• DRAM supports low-power modes, but they are not easy to exploit:
  – The granularity at which memory can be put in a low-power mode is large.
  – Memory accesses are randomly distributed across ranks:
    • Memory interleaving.
    • Little coordination between memory managers (library, OS, and hypervisor).
  – As a result, no rank experiences sufficient idleness to warrant being placed in a low-power mode.

Few systems can exploit DRAM low-power modes aggressively
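How interleaving erodes rank idleness can be seen with a toy address mapping. This Python sketch assumes 8 ranks of 1 GB each and simple modulo interleaving at cache-line granularity; real controllers use more elaborate address hashes, and both mapping functions are hypothetical.

```python
LINE = 64             # cache-line size in bytes
NUM_RANKS = 8         # assumption: 8 ranks, as in the backup-slide baseline
RANK_BYTES = 1 << 30  # assumption: 1 GB per rank

def rank_interleaved(addr: int) -> int:
    """Cache-line interleaving: consecutive lines go to consecutive ranks."""
    return (addr // LINE) % NUM_RANKS

def rank_contiguous(addr: int) -> int:
    """No interleaving: each rank owns one contiguous address range."""
    return addr // RANK_BYTES

hot_page = range(0, 4096, LINE)  # every line of one hot 4 KB page
print(sorted({rank_interleaved(a) for a in hot_page}))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted({rank_contiguous(a) for a in hot_page}))   # [0]
```

Interleaving spreads even a single hot page across all eight ranks, which helps bandwidth but leaves no rank idle long enough to enter a low-power mode; the contiguous mapping confines it to one rank.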

Tiered Memory

• Accesses to 4 KB OS pages show a step curve

• Leverage this to place frequently accessed pages in active-mode DRAM ranks

• Place “cold” pages in low-power-mode ranks

Iso-Power Tiered Memory-I

• A DRAM rank in self-refresh mode consumes ~15% of the power of an idle rank in active mode
  – 1 rank in active idle mode ≈ 6 ranks in self-refresh

• By keeping most of the memory in a low-power mode, we can build systems with a much larger memory capacity within the same power budget.
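The “1 active rank ≈ 6 self-refresh ranks” figure follows from 1 / 0.15 ≈ 6.7. The Python sketch below extends that arithmetic to a first-order, idle-power-only estimate of how many cold ranks fit in the power freed by running fewer hot ranks than the baseline. Both function names are hypothetical, and the estimate deliberately ignores access power and low-power entry/exit costs.

```python
import math

def self_refresh_ranks_per_active(self_refresh_fraction: float) -> int:
    """Self-refresh ranks whose power fits in one active-idle rank's power."""
    return math.floor(1 / self_refresh_fraction)

def cold_ranks_idle_only(baseline_ranks: int, hot_ranks: int,
                         idle_power_ratio: float) -> int:
    """Cold ranks affordable with the idle power freed by using fewer hot
    ranks than the baseline; access power and mode transitions are ignored."""
    freed_active_ranks = baseline_ranks - hot_ranks
    return math.floor(freed_active_ranks * idle_power_ratio)

print(self_refresh_ranks_per_active(0.15))   # 6
print(cold_ranks_idle_only(8, 4, 1 / 0.15))  # 26
```

The analytical model in the backup slides also accounts for the fraction u of requests served by hot ranks, service rate, and bandwidth, so this idle-only number is a rough first-order estimate rather than a design point.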

Iso-Power Tiered Memory-II

• Two tiers of DRAM with heterogeneous power and performance characteristics
  – “Hot” tier DRAM is always available; “cold” tier DRAM uses the self-refresh low-power mode when idle

• Place frequently accessed data in the hot tier
  – Maintains performance
  – Fewer accesses to the cold tier reduce power

• Batch references to the cold tier:
  – Amortize entry/exit overheads of the low-power mode
  – Stay in the low-power mode for longer

Intelligent Data Placement

• Counters keep track of hot pages with low overhead

• Every epoch, migrate hot pages in low-power ranks to active ranks
  – Requires page-table updates and TLB flushes
  – Still low overhead: after the first few epochs, the hot page set changes little
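The per-epoch decision can be sketched as “keep the top-k pages by counter value in the hot tier.” This is a minimal Python sketch, assuming per-page access counters are read at the end of each epoch and the hot tier holds a fixed number of pages; `plan_migrations` is a hypothetical name and the actual migration (page-table update, TLB flush) is not modeled.

```python
from collections import Counter

def plan_migrations(counts: Counter, hot_tier: set, hot_capacity: int):
    """Given one epoch's per-page access counters, choose the hot_capacity
    most-accessed pages and return (promote, demote) sets of page numbers.
    Each move implies a page-table update and TLB flush (not modeled)."""
    new_hot = {page for page, _ in counts.most_common(hot_capacity)}
    promote = new_hot - hot_tier   # hot pages currently in low-power ranks
    demote = hot_tier - new_hot    # pages that cooled off this epoch
    return promote, demote

# Two hypothetical epochs with a 2-page hot tier that starts empty
epoch1 = Counter({1: 50, 2: 40, 3: 5, 4: 1})
promote, demote = plan_migrations(epoch1, set(), 2)
print(promote, demote)                    # {1, 2} set()
epoch2 = Counter({1: 45, 2: 30, 3: 35, 4: 2})
print(plan_migrations(epoch2, promote, 2))  # ({3}, {2})
```

Once the hot set stabilizes, most epochs return empty promote and demote sets, which is why the page-table and TLB costs stay low after the first few epochs.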

Servicing cold-tier requests in batches

• Buffer cold-tier requests at the memory controller

• Delay any request by at most t_g, which prevents starvation

• t_g is chosen to amortize the overheads of low-power mode entry/exit

• Requires minimal change to the memory controller
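The batching rule can be illustrated with a small timing model: the first request into an empty buffer arms a timer, and when t_g expires the cold rank exits self-refresh once and serves everything buffered. This Python sketch assumes arbitrary time units and ignores service time and exit latency; `batch_cold_requests` is a hypothetical name.

```python
def batch_cold_requests(arrivals, t_g):
    """Simulate batched servicing of cold-tier requests.

    Returns (wake_times, max_delay): when the rank leaves self-refresh,
    and the longest time any request waited in the buffer."""
    wakes, max_delay = [], 0
    batch_start = None  # arrival time of the oldest buffered request
    for t in sorted(arrivals):
        if batch_start is not None and t > batch_start + t_g:
            wakes.append(batch_start + t_g)  # previous batch already served
            batch_start = None
        if batch_start is None:
            batch_start = t                  # first request of a new batch
        max_delay = max(max_delay, batch_start + t_g - t)
    if batch_start is not None:
        wakes.append(batch_start + t_g)      # serve the final batch
    return wakes, max_delay

print(batch_cold_requests([0, 1, 2, 10, 11, 30], t_g=5))  # ([5, 15, 35], 5)
```

Six cold-tier requests cost only three low-power exits, and no request waits longer than t_g.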

Attributions

• Re-architecting memory channel: Rajeev Balasubramonian, Al Davis, Niladrish Chatterjee, Manu Awasthi

• Micro-Pages: Rajeev Balasubramonian, Al Davis, Niladrish Chatterjee, Manu Awasthi

• Tiered Memory: Karthick Rajamani, Wei Huang, John Carter, Freeman Rawson

Thanks

Questions?

Backup Slides

Other Work

• Dynamic Hardware-Assisted Software-Controlled Page Placement to Manage Capacity Allocation and Sharing within Large Caches - Manu Awasthi, Kshitij Sudan, Rajeev Balasubramonian, John Carter, HPCA, February 2009.

• Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality of Service - Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi Iyer, Under Review.

• A Novel System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O - Kshitij Sudan, Saisanthosh Balakrishnan, Sean Lie, Min Xu, Dhiraj Mallick, Rajeev Balasubramonian, Gary Lauterbach, Under Review.

• Data Locality Optimization of Pthread Applications for Non-Uniform Cache Architectures - Gagan S. Sachdev, Kshitij Sudan, Rajeev Balasubramonian, Mary Hall, Under Review.

• Efficient Scrub Mechanisms for Error-Prone Emerging Memories - Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan, Bipin Rajendran, Rajeev Balasubramonian, Viji Srinivasan, To Appear at HPCA-18, Feb 2012.

• Hadoop Jobs Require One-Disk-per-Core, Myth or Fact? - Kshitij Sudan, Min Xu, Sean Lie, Saisanthosh Balakrishnan, Gary Lauterbach, XLDB-5 Lightning Talk, Oct. 2011.

• Handling PCM Resistance Drift with Device, Circuit, Architecture, and System Solutions - Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan, Rajeev Balasubramonian, Bipin Rajendran, Viji Srinivasan, Non-Volatile Memory Workshop, March 2011.

• Handling the Problems and Opportunities Posed by Multiple On-Chip Memory Controllers - Manu Awasthi, David Nellans, Kshitij Sudan, Rajeev Balasubramonian, Al Davis, PACT, September 2010.

• Improving Server Performance on Multi-Cores via Selective Off-loading of OS Functionality - David Nellans, Kshitij Sudan, Erik Brunvand, Rajeev Balasubramonian, WIOSCA, June 2010.

• Hardware Prediction of OS Run-Length For Fine-Grained Resource Customization - David Nellans, Kshitij Sudan, Erik Brunvand, Rajeev Balasubramonian, ISPASS-2010, March 2010.

Iso-Power Memory Configurations

[Figure: three panels titled “Tiered Memory Size” for u = 0.5, u = 0.7, and u = 0.9. Each panel plots Number of Tiered Ranks (y-axis, 15 to 33) against Idle Power Ratio (Hot/Cold) (x-axis, 5 to 11) for Nh = 2, 3, and 4. Annotated configurations: 4h,12c (4 hot + 12 cold ranks) = 2X baseline; 2h,22c (2 hot + 22 cold ranks) = 3X baseline.]

• 8 active ranks in baseline
• Ratio of idle active-mode power to self-refresh power
• Fraction (u) of memory requests served by hot ranks
• Service rate
• Bandwidth

Analytical model determines iso-power configurations for a given access rate to the active-mode (“hot”) DRAM ranks

Iso-Power Memory Configurations

Analytical model determines iso-power configurations for a given access rate to the active-mode (“hot”) DRAM ranks

Tiered Memory: Iso-Power Memory Architecture to Address Memory Power Wall

• Build tiers out of DRAM ranks

• Aggressively use low-power (LP) modes

• Intelligent data placement to reduce overheads of entry/exit from LP modes

• Buffer requests to ranks in LP modes and service them in batches to amortize entry/exit costs
