Dezső Sima Spring 2008 (Ver. 1.0) Sima Dezső, 2008 FB-DIMM technology



Motivations to introduce FB-DIMMs in servers/workstations

Shortcomings of the stub-bus topology used with conventional DRAM architectures [2]

Stub-bus topology

Data lines of the memory controller are electrically connected to the data lines of every DRAM device on the bus (memory channel).

Memory channels may have 8 DIMMs with 8 data DRAM devices per DIMM plus one ECC device each (i.e. 72 devices/channel).

Impedance discontinuities affect signal integrity [2].

Heavy signal loading due to the large number of devices and impedance discontinuities on the bus limit the number of DRAM devices that can be connected to the channel; the higher the data rate, the stricter the limit.

Figure: Scaling number of channels with memory hubs [7]. Two ranks of DRAM devices per DIMM are assumed. With a single rank per DIMM, the number of DIMMs per channel may be doubled, but the declining trend shown in the figure remains the same.

For higher DRAM speeds, fewer DRAM devices can be connected per memory channel [2].

Stub-bus channel capacity (device density x nr. of devices) has hit its ceiling [2],

but

increasing server performance doubles memory capacity demand about every two years [2].


from Jacob mem systems 2007


Increasing the number of memory channels

Each DDR2 memory channel requires 240 pins

FB-DIMM technology (1)

Principle of operation

• introduce packet-based serial transmission (as in the PCI-E, SATA and SAS buses)

• introduce full buffering (registered DIMMs buffer only addresses)

• CRC error checking (cyclic redundancy check)


Figure: FB-DIMM memory architecture [4]

FB-DIMM technology (2)

Figure: Maximum supported FB-DIMM configuration [6] (6 channels, 8 DIMMs per channel)

FB-DIMM technology (3)

Implementation details (1)

• Serial transmission between the North Bridge and the DIMMs (each bit needs a pair of wires)

• Number of serial links

  • 14 read lanes (2 wires each)

  • 10 write lanes (2 wires each)

• Read packets (frames, bursts): 168 bits (12 x 14 bits)

  • 144 data bits (equals the number of data bits produced by a 72-bit wide DDR2 module (64 data bits + 8 ECC bits) in two memory cycles)

  • 24 CRC bits.

• Every 12 link cycles (that is, every two memory cycles) constitute a packet.

• Write packets (frames, bursts): 120 bits (12 x 10 bits)

  • 98 payload bits

  • 22 CRC bits.

• Clocked at 6 x the double-pumped data rate, e.g. for a DDR2-667 DRAM the clock rate is 6 x 667 MHz ≈ 4 GHz
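The frame sizes on this slide follow directly from the lane counts: each lane carries one bit per link cycle, and a frame spans 12 link cycles. The short Python sketch below only re-derives the slide's numbers; the function and constant names are illustrative and do not come from any FB-DIMM specification.

```python
# Figures taken from the slide; names are illustrative only.
READ_LANES = 14             # lanes from the DIMMs towards the North Bridge
WRITE_LANES = 10            # lanes from the North Bridge towards the DIMMs
LINK_CYCLES_PER_FRAME = 12  # one frame (packet) spans 12 link cycles


def frame_bits(lanes: int, link_cycles: int = LINK_CYCLES_PER_FRAME) -> int:
    """Bits in one frame: every lane carries one bit per link cycle."""
    return lanes * link_cycles


def link_clock_mhz(dram_data_rate_mhz: float, multiplier: int = 6) -> float:
    """Link rate as a multiple of the double-pumped DRAM data rate."""
    return multiplier * dram_data_rate_mhz


read_frame = frame_bits(READ_LANES)    # 144 data bits + 24 CRC bits
write_frame = frame_bits(WRITE_LANES)  # 98 payload bits + 22 CRC bits

# The lane-based totals must match the field breakdown given on the slide.
assert read_frame == 144 + 24
assert write_frame == 98 + 22

print(read_frame, write_frame, link_clock_mhz(667))
```

For DDR2-667 the computed link rate is 4002 MHz, which the slide rounds to 4 GHz.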

FB-DIMM technology (4)

Implementation details (2)

The 98 payload bits of a write frame consist of

• 2 frame type bits,

• 24 bits of command,

• 72 bits for data and commands, according to the frame type, e.g. 72 bits of data, 36 bits of data + one command, or two commands.

Commands

• row select, precharge, refresh, read, write etc.

• all commands include a 3-bit FB-DIMM module address to select one of 8 modules.
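As a quick consistency check, the payload fields listed above can be added up and the three uses of the shared 72-bit slot compared against its size. This is a minimal sketch: the names are made up for illustration, and it assumes that an additional command placed in the shared slot is also 24 bits wide, which the slide does not state explicitly.

```python
# Field widths from the slide; all names are illustrative.
FRAME_TYPE_BITS = 2
COMMAND_BITS = 24
SHARED_SLOT_BITS = 72      # data and/or further commands, by frame type
MODULE_ADDRESS_BITS = 3    # carried inside every command

# Uses of the shared slot named on the slide.
# Assumption: an extra command in the slot is also COMMAND_BITS wide.
SLOT_VARIANTS = {
    "72 bits of data": 72,
    "36 bits of data + one command": 36 + COMMAND_BITS,
    "two commands": 2 * COMMAND_BITS,
}


def write_payload_bits() -> int:
    """Total payload of a write frame, excluding the 22 CRC bits."""
    return FRAME_TYPE_BITS + COMMAND_BITS + SHARED_SLOT_BITS


def addressable_modules(address_bits: int = MODULE_ADDRESS_BITS) -> int:
    """Number of modules selectable with the given address width."""
    return 2 ** address_bits


def variants_fit() -> bool:
    """True if every slot variant fits into the shared 72-bit slot."""
    return all(bits <= SHARED_SLOT_BITS for bits in SLOT_VARIANTS.values())


print(write_payload_bits(), addressable_modules(), variants_fit())
```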

FB-DIMM technology (5)

Implementation details (3)

Read bandwidth:

One FB-DIMM channel transfers in one frame (that is, in 12 cycles): 128 data bits + 16 ECC bits.

One frame lasts 2 memory cycles.

One DDR2 DIMM channel transfers in 2 memory cycles: 2 x 72 bits (2 x 64-bit data + 2 x 8-bit ECC).

The read bandwidth of an FB-DIMM channel equals the bandwidth of a DDR2 channel.

Write bandwidth:

The write bandwidth of an FB-DIMM channel is up to 0.5 x the read bandwidth.

But FB-DIMMs allow simultaneous read and write operation.
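The bandwidth comparison can be reproduced from the per-frame data bits. The sketch below treats a "memory cycle" as one data transfer, so one frame corresponds to two transfers. It also assumes that a 72-bit write slot splits into 64 data bits plus 8 ECC bits, by analogy with the read frame; that split is an assumption, not a figure from the slide. All names are illustrative.

```python
READ_DATA_BITS_PER_FRAME = 128   # plus 16 ECC bits, as stated on the slide
WRITE_DATA_BITS_PER_FRAME = 64   # assumption: 72-bit write slot = 64 data + 8 ECC


def frames_per_microsecond(data_rate_mts: float) -> float:
    """One frame spans two data transfers, so the frame rate is half the transfer rate."""
    return data_rate_mts / 2


def fbdimm_bandwidth_mb_s(data_bits_per_frame: int, data_rate_mts: float) -> float:
    """Payload bandwidth in MB/s for a given number of data bits per frame."""
    bytes_per_frame = data_bits_per_frame / 8
    return bytes_per_frame * frames_per_microsecond(data_rate_mts)


def ddr2_bandwidth_mb_s(data_rate_mts: float) -> float:
    """A DDR2 channel moves 64 data bits (8 bytes) per transfer."""
    return 8 * data_rate_mts


rate = 667  # DDR2-667, in mega-transfers per second
read_bw = fbdimm_bandwidth_mb_s(READ_DATA_BITS_PER_FRAME, rate)
write_bw = fbdimm_bandwidth_mb_s(WRITE_DATA_BITS_PER_FRAME, rate)

print(read_bw, write_bw, ddr2_bandwidth_mb_s(rate))
```

Under these assumptions the read bandwidth matches the DDR2 channel figure and the write bandwidth is half of it; because reads and writes use separate lanes, the two can proceed at the same time.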

FB-DIMM technology (6)

FB-DIMM data buffer (Advanced Memory Buffer, AMB)

Manages the read/write operations of the module

Figure: Different implementations of FB-DIMMs

Source: PC stats

FB-DIMM-4300 (DDR2-533 SDRAM); Clock Speed: 133 MHz, Data Rate: 532 MHz, Throughput: 4300 MB/s

PC2-5300 (DDR2-667 SDRAM); Clock Speed: 167 MHz, Data Rate: 667 MHz, Throughput: 5300 MB/s

PC2-6400 (DDR2-800 SDRAM); Clock Speed: 200 MHz, Data Rate: 800 MHz, Throughput: 6400 MB/s
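The three module ratings quoted from PC stats are related by simple arithmetic: the data rate is four times the quoted clock speed, and the throughput is the data rate times the 8-byte (64-bit) data width, rounded to the nearest hundred for the module name. The following Python sketch illustrates this; the function names are hypothetical and the rounding rule is inferred from the three quoted ratings only.

```python
def ddr2_data_rate_mhz(clock_mhz: int) -> int:
    """DDR2 transfers four data words per memory-cell clock cycle."""
    return 4 * clock_mhz


def throughput_mb_s(data_rate_mhz: int, bus_bytes: int = 8) -> int:
    """Peak throughput of a 64-bit (8-byte) wide module."""
    return data_rate_mhz * bus_bytes


def module_rating(clock_mhz: int) -> int:
    """Throughput rounded to the nearest 100 MB/s, as used in module names."""
    return round(throughput_mb_s(ddr2_data_rate_mhz(clock_mhz)), -2)


for clock in (133, 167, 200):
    rate = ddr2_data_rate_mhz(clock)
    print(clock, rate, throughput_mb_s(rate), module_rating(clock))
```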


(There are two Command/Address buses (C/A) to limit loads of 9 to 36 DRAMs)

Figure: Block diagram of the AMB [3]


FB-DIMM technology (7)

Necessary routing to connect the north bridge to the DIMM socket

a) In case of a DDR2 DIMM (240 pins): a 3-layer PCB is needed

b) In case of an FB-DIMM (69 pins): a 2-layer PCB is needed (but a third layer is used for power lines)

Figure: PCB routing [4]
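The pin counts in the routing comparison also explain why more channels become feasible. A rough sketch, using only the 240-pin and 69-pin figures from the slides and hypothetical helper names, compares how many channels fit into a given pin budget on the memory controller:

```python
DDR2_PINS_PER_CHANNEL = 240    # figure from the slides
FBDIMM_PINS_PER_CHANNEL = 69   # figure from the slides


def pins_needed(channels: int, pins_per_channel: int) -> int:
    """Controller pins required for the given number of channels."""
    return channels * pins_per_channel


def channels_within_budget(pin_budget: int, pins_per_channel: int) -> int:
    """Largest whole number of channels that fits into the pin budget."""
    return pin_budget // pins_per_channel


# Pin budget of two DDR2 channels, reused for FB-DIMM channels.
budget = pins_needed(2, DDR2_PINS_PER_CHANNEL)
print(budget, channels_within_budget(budget, FBDIMM_PINS_PER_CHANNEL))
```

With these figures, the pins of two DDR2 channels accommodate six FB-DIMM channels, which matches the maximum configuration cited elsewhere in the deck.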

Figure: Latency and bandwidth figures of different DRAM technologies for a mix of SPEC applications [5]

FB-DIMM technology (8)

FB-DIMM technology (9)

Pros and cons of FB-DIMMs

Advantages of FB-DIMMs vs DDR2 and DDR3 DIMMs

• more memory channels (up to 6): higher total bandwidth

• more DIMM modules (up to 8) per channel: higher memory capacity (up to 192 GB)

• fewer wires: simplified PCB routing

• simultaneous read/write operation in a channel

Disadvantages of FB-DIMMs vs DDR2 and DDR3 DIMMs

• higher latency and lower bandwidth figures for 4 to 8 DIMM modules

• higher cost

• higher dissipation

(Typical dissipation figures: DDR2: about 5 W, AMB: about 5 W, DDR2 FB-DIMM: about 10 W)
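The capacity and dissipation figures above can be related with a few lines of arithmetic. The sketch below derives the per-module capacity implied by the 192 GB maximum and the total dissipation of a fully populated system at the typical 10 W per FB-DIMM. Both results are derived values rather than figures quoted on the slide, and the names are illustrative.

```python
MAX_CHANNELS = 6
MAX_DIMMS_PER_CHANNEL = 8
MAX_CAPACITY_GB = 192
TYPICAL_FBDIMM_WATTS = 10  # about 5 W DDR2 devices + about 5 W AMB


def max_modules(channels: int = MAX_CHANNELS,
                dimms_per_channel: int = MAX_DIMMS_PER_CHANNEL) -> int:
    """Number of modules in a fully populated system."""
    return channels * dimms_per_channel


def implied_dimm_capacity_gb(total_gb: int = MAX_CAPACITY_GB) -> float:
    """Per-module capacity implied by the quoted maximum capacity."""
    return total_gb / max_modules()


def total_dissipation_watts(watts_per_dimm: int = TYPICAL_FBDIMM_WATTS) -> int:
    """Memory dissipation of a fully populated system."""
    return max_modules() * watts_per_dimm


print(max_modules(), implied_dimm_capacity_gb(), total_dissipation_watts())
```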

Latency

Latency is potentially the more troubling issue. Intel addressed it by not having the signals stored and then retransmitted: the data travels along a special fast pass-through channel in the buffer itself. This avoids much of the latency that a store-and-forward architecture would induce.


Figure: FB-DIMM heat sinks (heat spreaders)

FB-DIMM technology (10)

Market penetration of the FB-DIMM technology

• 5/2006 Intel adopts it in its Bensley platform (5000) for DPs

• 9/2006 AMD takes it off its roadmap

• 8/2007 Sun introduces it in the Niagara II

• 9/2007 Intel uses it in the Caneland platform (7000) for MPs

• 2007 Major memory manufacturers intend to develop DDR3 DIMMs instead of DDR3-based FB-DIMMs

Standardisation

3/2007 JESD205 DDR2 SDRAM Fully Buffered DIMM (FBDIMM) Design Specification

DDR2-533, DDR2-667, DDR2-800; x72 ECC, 240 pin; 256 Mb, 512 Mb, 1 Gb, 2 Gb, 4 Gb devices

1/2007 JESD206 FBDIMM Architecture and Protocol

FB-DIMM technology (11)

DDR2 vs (SDRAM) DDR

The key difference between DDR and DDR2 is that the DDR2 data bus is clocked at twice the speed of the memory cells, so four data words can be transferred in each memory cell cycle without speeding up the memory cells themselves.

Figure: Clocking schemes of the SDR, DDR and DDR2 SDRAM technologies [1]

Although introduced in Q2 2003 at 200/266 MHz, DDR2 could not be competitive initially because of its high latency figures. As lower-latency parts became available by the end of 2004, DDR2 became widespread.

DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers. However, latency is greatly increased as a trade-off. The DDR2 prefetch buffer is 4 bits deep, whereas it is 2 bits deep for DDR (and 8 bits deep for DDR3). While DDR SDRAM has typical read latencies of between 2 and 3 bus cycles, early DDR2 may have read latencies between 4 and 6 cycles.
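The effect of the prefetch depth can be shown numerically: with the memory cells running at the same clock, each generation multiplies the external data rate by its prefetch depth. A minimal Python sketch, using the prefetch depths quoted above and hypothetical function names:

```python
# Prefetch depths as quoted in the text (bits fetched per data pin per core cycle).
PREFETCH_DEPTH = {"DDR": 2, "DDR2": 4, "DDR3": 8}


def data_rate_mts(core_clock_mhz: float, generation: str) -> float:
    """External data rate for a given memory-cell (core) clock."""
    return core_clock_mhz * PREFETCH_DEPTH[generation]


def core_clock_mhz(data_rate: float, generation: str) -> float:
    """Memory-cell clock needed to sustain a given external data rate."""
    return data_rate / PREFETCH_DEPTH[generation]


# The same 200 MHz memory cells behind three different interfaces.
for gen in ("DDR", "DDR2", "DDR3"):
    print(gen, data_rate_mts(200, gen))
```

Seen the other way round, DDR-400, DDR2-800 and DDR3-1600 all operate their memory cells at 200 MHz.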

Memory            Timings    Latency    Bandwidth in dual-channel mode

DDR400 SDRAM      2.5–3–3    12.5 ns    6.4 GB/sec
DDR400 SDRAM      2–3–2      10 ns      6.4 GB/sec
DDR533 SDRAM      3–4–4      11.2 ns    8.5 GB/sec
DDR533 SDRAM      2.5–3–3    9.4 ns     8.5 GB/sec
DDR2-533 SDRAM    5–5–5      18.8 ns    8.5 GB/sec
DDR2-533 SDRAM    4–4–4      15 ns      8.5 GB/sec
DDR2-533 SDRAM    3–3–3      11.2 ns    8.5 GB/sec
DDR2-600 SDRAM    5–5–5      16.6 ns    9.6 GB/sec
DDR2-600 SDRAM    4–4–4      13.3 ns    9.6 GB/sec

Table: Burst timing, latency and bandwidth figures of DDR and DDR2 DRAM technologies [1]

Early DDR2-533 SDRAM modules available at the time of the announcement of the i925 and i915 chipsets (6/2004) had 4-4-4 timings (CAS Latency - RAS to CAS Delay - RAS Precharge Time).

CAS latency (Column Address Strobe latency, CL)

The time delay (in number of clock cycles) between the moment a memory chip is accessed for data and the moment the first data bit becomes available.

For instance, after accessing a 400 MHz CL3 device, the first bit arrives in 3 x 2.5 ns = 7.5 ns
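The CAS latency in nanoseconds is the CL value multiplied by the clock period. The sketch below reproduces the 400 MHz CL3 example and, assuming that the bus clock of a DDR or DDR2 part is half its data rate, the latency column of the DDR/DDR2 table from [1]. Function names are illustrative.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds."""
    return 1000.0 / clock_mhz


def cas_latency_ns(cl: float, clock_mhz: float) -> float:
    """Delay until the first data bit: CL clock cycles."""
    return cl * cycle_time_ns(clock_mhz)


def bus_clock_mhz(data_rate_mts: float) -> float:
    """Assumption: double data rate, so the bus clock is half the data rate."""
    return data_rate_mts / 2


# The example from the text: a 400 MHz CL3 device.
print(cas_latency_ns(3, 400))

# Rows of the DDR/DDR2 table: (name, data rate, CL).
for name, rate, cl in [("DDR400", 400, 2.5), ("DDR400", 400, 2),
                       ("DDR2-533", 533, 5), ("DDR2-533", 533, 4)]:
    print(name, cl, round(cas_latency_ns(cl, bus_clock_mhz(rate)), 1))
```

The computed values agree with the table to within rounding, since the nominal data rate of DDR2-533 is 533.33 MT/s rather than exactly 533.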


FB-DIMM technology ()

DDR2 DIMMs have 240 pins instead of the 184 pins used by DDR DIMMs

Power savings are achieved primarily due to a drop in operating voltage (1.8 V compared to DDR's 2.5 V).

Official JEDEC Specifications

                  DDR2               DDR3

Rated Speed       400-800 Mbps       800-1600 Mbps
Vdd/Vddq          1.8V +/- 0.1V      1.5V +/- 0.075V
Internal Banks    4                  8
Termination       Limited            All DQ signals
Topology          Conventional T     Fly-by
Driver Control    OCD Calibration    Self Calibration with ZQ
Thermal Sensor    No                 Yes (Optional)

Source: Anandtech

DDR3

Appeared mid 2007 e.g. in Intel’s P35 Bearlake

Source: Wiki


5.2. Speed gap between processor and memory (1a)

Figure 5.1a: DRAM types

1. DRAM (Dynamic RAM). Cycle time within a burst (for a 60 ns part): (60 ns). Full burst timing: (5-5-5-5).

2. FPM (Fast Page Mode DRAM). Cycle time within a burst: ~ 40 ns. Full burst timing: (5-7)-3-3-3, (5-7)-4-4-4. Examples: Triton I.: 7-3-3-3, Triton III.: 6-3-3-3.

3. EDO (Extended Data Out DRAM). Cycle time within a burst: ~ 25 ns. Full burst timing: (5-7)-2-2-2. Examples: Triton I.: 7-2-2-2, Triton II, III.: 6-2-2-2.

4. BEDO (Burst mode EDO). Cycle time within a burst: ~ 15 ns. Full burst timing: 5-1-1-1.

5. SDRAM (Synchronous DRAM). Cycle time within a burst: ~ 15/10/7.5 ns. Full burst timing: (5-7)-1-1-1. Examples: Triton III.: 7-1-1-1, 430 ZX.: 7-1-1-1.

6. DRDRAM (Direct Rambus DRAM). Cycle time within a burst: (4/3.3/2.8/2.5 ns). Examples: 820, 840.

Further rows of the figure without recoverable values: Max. bandwidth MB/s, Effective bandwidth MB/s, Level of overlapping.

Remarks in the figure (their assignment to the DRAM types is not recoverable from the transcript): Random access, typ. access time 60/70/80/100 ns; Asynchronous; Burst mode access (4*8B) on the same row (page); Access to 4 subsequent columns; Overlapping the read and address transfer operations; Up to 66 MHz bus frequency; Internal 2-bit address generator, dual banks; Developed by MICRON; Synchronous; Full pipelined operation, assuming at least dual banks; 66/100/133 MHz; Since 1996; Cached structure; Internal on-chip SRAM cache, page is filled in 1 clock cycle; 1-2 B wide data path; 256/300/356/400 MHz transfer rate; Developed by RAMBUS.

5.2. Speed gap between processor and memory (1b)

Figure 5.1b: Latency of DRAM chips. [Plot: row access time tRAC (ns, axis from 20 to 200) versus year (1981 to 2000), annotated with the processors and chipsets of the time (PC, AT, 386 DX, 486 DX, P, PPro, PII, PIII; 430 NX, 450 KX/GX, 440 BX, 815) and the typical DRAM parts (from 16 K to 256 M). The plotted tRAC values fall from 200 ns to 30 ns over the period.]

tRAC: Row access time (time from row address until data valid)

5.2. Speed gap between processor and memory (1c)

Figure 5.1c: System-level memory latency in x86-based PCs. [Plot: memory latency in ns (axis from 100 to 500) and memory latency in processor cycles (logarithmic axis from 1 to 1000) versus year (1981 to 2000), for the PC (8088), AT (286), 386 DX, 486 DX, P, PPro, PII, PIII and P4.]

5.2. Speed gap between processor and memory (1d)

Figure 5.1d: Latency of DRAM chips (in clock cycles). [Plot: memory latency (cycles, axis from 10 to 130) versus clock frequency fc (GHz, axis from 0.5 to 4.0), with data points for the 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4 and for the memory types FPM, EDO, PC 66, PC 100, PC 133, DDR 266, DDR 333, DDR 400, DDR2 533, RDRAM-40 and RDRAM-60.]

5.2. Speed gap between processor and memory (2)

Figure 5.2: Relative transfer rate of memories (D: dual channel). [Plot: Tmemory/fc (axis from 0.10 to 1.00) versus clock frequency fc (GHz, axis from 0.5 to 4.0), with data points for the Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4 and for the memory types FPM, EDO, PC-66, PC-100, PC-133, PC-800D, DDR 266, DDR 333, DDR 333D, DDR 400, DDR 400D and DDR 533D.]


References

[1]: Gavrichenkov I., "DDR2 vs. DDR: Revenge Gained," Xbit Laboratories, 12/17/2004, http://www.xbitlabs.com/articles/memory/display/ddr2-ddr.html

[2]: Vogt P., "Fully Buffered DIMM (FB-DIMM) Server Memory Architecture," Intel Developer Forum, Feb. 18, 2004, http://www.idt.com/content/OSA_S008_FB-DIMM-Arch.pdf

[3]: McTague M. & David H., "Fully Buffered DIMM (FB-DIMM) Design Considerations," Intel Developer Forum, Feb. 18, 2004, http://www.idt.com/content/OSA-S009.pdf

[4]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine, March 2005, pp. 1-7

[5]: Ganesh B., Jaleel A., Wang D., Jacob B., "Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling," Proc. HPCA 2007

[6]: "Introducing FB-DIMM Memory: Birth of Serial RAM?," PCStats, Dec. 23, 2005, http://www.pcstats.com/articleview.cfm?articleid=1812&page=1

[7]: Haas J. & Vogt P., "Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level," Technology@Intel Magazine, http://www.intel.com/technology/magazine/computing/fully-buffered-dimm-0305.htm