DRAM background Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07 CS 8501, Mario D. Marino, 02/08

Page 1

DRAM background

Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07

CS 8501, Mario D. Marino, 02/08

Page 2

DRAM Background

Page 3

Typical Memory

• Busses: address, command, data, DIMM (Dual In-Line Memory Module) selection

Page 4

DRAM cell

Page 5

DRAM array

Page 6

DRAM device or chip

Page 7

Command/data movement: DRAM chip

Page 8

Operations (commands)

• protocol, timing

Page 9

Examples of DRAM operations (commands)

Page 10
Page 11

“The purpose of a row access command is to move data from the DRAM arrays to the sense amplifiers.”

tRCD and tRAS

Page 12

“A column read command moves data from the array of sense amplifiers of a given bank to the memory controller.”

tCAS, tBurst

Page 13

Precharge: separate phase that is a prerequisite for the subsequent phases of a row access operation (bitlines set to Vcc/2 or Vcc)
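The three phases above (precharge, row access, column read) can be combined into a rough read-latency estimate. The sketch below is illustrative only: the cycle counts are assumed values, not taken from the slides or the paper, and `tRP` (precharge time) is a name introduced here for the precharge phase. tRAS, the minimum time a row must stay active, is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timing:
    """Illustrative timing parameters in memory-clock cycles (assumed values)."""
    tRP: int = 5     # precharge: prepare the bitlines before a new row access
    tRCD: int = 5    # row access: delay from row command to column command
    tCAS: int = 5    # column read command to first data
    tBurst: int = 4  # cycles the data burst occupies the data bus

def read_latency(bank_state: str, t: Timing = Timing()) -> int:
    """Cycles from the first command to the end of the data burst for one read.

    bank_state: "hit"      requested row is already in the sense amplifiers
                "closed"   bank is precharged, no row is open
                "conflict" a different row is open and must be closed first
    """
    column = t.tCAS + t.tBurst
    if bank_state == "hit":
        return column
    if bank_state == "closed":
        return t.tRCD + column
    if bank_state == "conflict":
        return t.tRP + t.tRCD + column
    raise ValueError(f"unknown bank state: {bank_state}")

for state in ("hit", "closed", "conflict"):
    print(state, read_latency(state))
```

With these assumed values a row hit costs 9 cycles, an access to a precharged bank 14, and a row conflict 19.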

Page 14

Organization, access, protocols

Page 15

Logical Channels: set of physical channels connected to the same memory controller

Page 16

Examples of Logical Channels

Page 17

Rank = set of banks

Page 18
Page 19

Row = DRAM page

Page 20

Width: aggregating DRAM chips
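As a small worked example of width aggregation: the data width of a rank is the number of DRAM chips operating in lockstep times the data width of each chip. The function name and the device widths below are illustrative assumptions.

```python
def rank_data_width(chips_per_rank: int, device_width_bits: int) -> int:
    """Data-bus width of a rank built from identical chips working in lockstep."""
    return chips_per_rank * device_width_bits

# Two common ways to reach a 64-bit data bus (illustrative configurations):
print(rank_data_width(8, 8))    # eight x8 devices
print(rank_data_width(16, 4))   # sixteen x4 devices
```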

Page 21

Scheduling: banks

Page 22

Scheduling banks

Page 23

Scheduling: ranks

Page 24

Open-page vs. close-page

Open-page: data access to and from cells requires separate row and column commands

– Favors accesses on the same row (sense amps kept open)

– Typical general purpose computers (desktop/laptop)

Close-page:

– Intense amount of requests, favors random accesses

– Large multiprocessor/multicore systems
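The trade-off between the two policies can be sketched with a toy model of a single bank. This is a simplification under stated assumptions: the cycle counts are made up, `total_cycles` is a hypothetical helper, and the close-page policy is assumed to hide its precharge after each access so that it never appears on the critical path.

```python
def total_cycles(rows, policy, tRP=5, tRCD=5, tCAS=5):
    """Sum per-access latency for a sequence of row numbers sent to one bank."""
    if policy not in ("open", "close"):
        raise ValueError("policy must be 'open' or 'close'")
    open_row = None  # row currently held in the sense amplifiers, if any
    cycles = 0
    for row in rows:
        if policy == "close":
            # The bank is precharged right after each access, so every
            # access starts from a closed bank (precharge assumed hidden).
            cycles += tRCD + tCAS
            continue
        if row == open_row:
            cycles += tCAS                # row hit: column command only
        elif open_row is None:
            cycles += tRCD + tCAS         # first access to an idle bank
        else:
            cycles += tRP + tRCD + tCAS   # row conflict: close, then open
        open_row = row
    return cycles

same_row = [7, 7, 7, 7]        # high row locality
different_rows = [1, 2, 3, 4]  # no row locality

for name, trace in (("same row", same_row), ("different rows", different_rows)):
    print(name, total_cycles(trace, "open"), total_cycles(trace, "close"))
```

Under these assumptions open-page wins on the same-row trace (25 vs. 40 cycles) and close-page wins on the trace with no locality (40 vs. 55 cycles).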

Page 25

Available Parallelism in DRAM System Organization

Channel

Pros: performance

– different logical channels, independent memory controllers

– scheduling strategies

Cons:

– number of pins, power to deliver

– smart but not adaptive firmware

Page 26

Available Parallelism in DRAM System Organization

Rank

Pros: accesses can proceed in parallel in different ranks (busses availability)

Cons: rank-to-rank switching penalties in high-frequency, globally synchronous DRAM (global clock)

Page 27

Available Parallelism in DRAM System Organization

Bank

Different banks (busses availability)

Row

Only 1 row/bank can be active at any time period

Column

Depends on management (close-page / open-page)
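Which level of parallelism two requests can exploit depends on how the memory controller splits a physical address into channel, rank, bank, row and column fields. The sketch below decodes an address under one hypothetical mapping; the field order and bit widths are assumptions chosen for illustration, not a mapping from the slides or the paper.

```python
from collections import namedtuple

DramAddress = namedtuple("DramAddress", "channel rank bank row column")

# Hypothetical mapping, listed from the low-order bits upward: (field, width in bits).
FIELDS = (("channel", 1), ("column", 10), ("bank", 3), ("rank", 1), ("row", 14))

def decode(addr: int) -> DramAddress:
    """Split a block address into DRAM coordinates using FIELDS."""
    parts = {}
    for name, bits in FIELDS:
        parts[name] = addr & ((1 << bits) - 1)  # take the low `bits` bits
        addr >>= bits                           # then move on to the next field
    return DramAddress(**parts)

a = decode(0)
b = decode(1)        # differs only in the channel bit
c = decode(1 << 11)  # differs only in the lowest bank bit
print(a, b, c, sep="\n")
```

With the channel bit lowest, consecutive blocks alternate between channels; placing the bank bits below the row bits spreads nearby addresses over different banks rather than different rows of one bank.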

Page 28

Paper: Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh et al., HPCA'07

Page 29

processor #cores #MC #pins

Intel Core 2 2 2

4 1366

6 3 1973

6 939

AMD Bulldozer 12 1974

GT 200 8 2485

GTX 100/Fermi 512 6 -

Intel Nehalem

Intel Westmere

AMD Opteron

Page 30

Issues

• Parallel bus scaling: frequency, width, length, depth (many hops => latency)

• Number of memory controllers has increased in CPUs and GPUs

– #DIMMs/channel (depth) decreases

• 4 DIMMs/channel in DDR

• 2 DIMMs/channel in DDR2

• 1 DIMM/channel in DDR3

• scheduling

Page 31

Contributions

• Applied DDR-based memory controller policies to FBDIMM memory

• Evaluation of Performance

• Exploit FBDIMM depth: rank (DIMM) parallelism

• latency and bandwidth for FBDIMM and DDR

– high utilization of the channels, FBDIMM

• 7% in latency

• 10%

– low utilization of the channels

• 25% in latency

• 10% in bandwidth

Page 32

Northbound channel: reads / Southbound channel: writes

AMB: pass-through switch, buffer, serial/parallel converter
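Because each AMB acts as a pass-through switch, a request to a DIMM deeper in the chain crosses more AMBs in both directions. The sketch below is a deliberately simplified model of that effect; all delay values and the function name are hypothetical, and it ignores queueing and any mode in which the controller equalizes latency across DIMMs.

```python
def fbdimm_read_latency_ns(dimm_index, dram_ns=40.0, amb_hop_ns=3.0, serdes_ns=5.0):
    """Rough read round trip to the DIMM at dimm_index (0 = closest to the controller).

    dram_ns:    DRAM access time behind the target AMB (assumed value)
    amb_hop_ns: pass-through delay of one intermediate AMB (assumed value)
    serdes_ns:  serial/parallel conversion at the target AMB, per direction (assumed)
    """
    hops = dimm_index                # intermediate AMBs crossed in each direction
    southbound = hops * amb_hop_ns   # command travels away from the controller
    northbound = hops * amb_hop_ns   # read data travels back to the controller
    return southbound + serdes_ns + dram_ns + serdes_ns + northbound

for i in range(4):
    print(f"DIMM {i}: {fbdimm_read_latency_ns(i):.1f} ns")
```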

Page 33

Methodology: DRAMsim simulator

Execution-driven simulator

Detailed models of FBDIMM and DDR2 based on real standard configurations

Standalone / coupled with M5/SS/Sesc

Benchmarks: bandwidth-bound SVM from Bio-Parallel (r:90%)

SPEC-mixed: 16 independent (r:w = 2:1)

UA from NAS (r:w = 3:2)

ART (SPEC-2000, OpenMP) (r:w = 2:1)

Page 34

Methodology (cont.)

• Different scheduling policies: greedy, OBF, most/least pending and RIFF

• 16-way CMP, 8MB L2

• Multi-threaded traces gathered with CMP$im

• SPEC traces using SimpleScalar with 1MB L2, in-order core

• 1 rank/DIMM

Page 35

High-bandwidth utilization:

– Better bandwidth: FBDIMM

– Larger latency

Page 36

• ART and UA: latency reduction

Page 37

Low utilization: serialization cost

Depth: FBDIMM scheduler offsets serialization

Page 38

• Overhead: queue, south and rank availability

• Single-rank: higher latency

Page 39

Scheduling

• Best: RIFF, which gives reads priority over writes
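The idea of favoring reads over writes can be sketched as a tiny queue-selection routine. This is not the paper's RIFF implementation, only a minimal read-first selection under assumed request tuples of the form `(kind, address)`; `pick_next` is a hypothetical helper.

```python
def pick_next(queue):
    """Remove and return the next request: the oldest read if any, else the oldest write.

    queue is a list of (kind, address) tuples in arrival order,
    where kind is "R" for a read or "W" for a write.
    """
    for i, request in enumerate(queue):
        if request[0] == "R":
            del queue[i]          # the oldest pending read goes first
            return request
    return queue.pop(0) if queue else None  # no reads left: oldest write, or nothing

pending = [("W", 0x10), ("R", 0x20), ("W", 0x30), ("R", 0x40)]
order = []
while pending:
    order.append(pick_next(pending))
print(order)
```

Reads are served in arrival order ahead of all writes, which keeps them off the critical path of waiting loads; a real scheduler would also bound how long writes can be deferred.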

Page 40

Bandwidth is less sensitive th

Higher latency in open-page mode

More channels => lower channel utilization
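The last point follows from simple arithmetic: for a fixed demand spread evenly over the channels, per-channel utilization falls as channels are added. The figures and the function name below are illustrative assumptions, not measurements from the paper.

```python
def channel_utilization(demand_gbs: float, channels: int, peak_per_channel_gbs: float) -> float:
    """Fraction of each channel's peak bandwidth in use, assuming demand is
    fixed and evenly distributed across all channels."""
    return demand_gbs / (channels * peak_per_channel_gbs)

# Same workload demand on 1, 2 and 4 channels (assumed 6.4 GB/s peak per channel).
for n in (1, 2, 4):
    print(n, channel_utilization(6.4, n, 6.4))
```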