
Page 1: Computer Architecture Peripherals


Computer Architecture

Peripherals

By Dan Tsafrir, 6/6/2011
Presentation based on slides by Lihu Rappoport

Page 2: Computer Architecture Peripherals


MEMORY: REMINDER

Page 3: Computer Architecture Peripherals

Not so long ago…

[Chart: CPU vs. DRAM performance, 1980–2000, log scale. CPU performance improved ~60% per year (2x in 1.5 years); DRAM improved ~9% per year (2x in 10 years); the gap grew ~50% per year.]
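A back-of-the-envelope check on the "50% per year" figure (my arithmetic, not from the slides): if CPU performance improves by a factor of 1.60 per year and DRAM by 1.09, then the ratio between them grows by

    1.60 / 1.09 ≈ 1.47

per year, i.e. the CPU-DRAM gap itself compounds at roughly 50% annually.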

Page 4: Computer Architecture Peripherals

Not so long ago…

In 1994, in their paper “Hitting the Memory Wall: Implications of the Obvious”, William Wulf & Sally McKee said:

“We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed – each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs.

The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one.”

Page 5: Computer Architecture Peripherals

More recently (2008)…

[Chart: “The memory wall in the multicore era” – performance in seconds (lower = slower) vs. number of processor cores, for a conventional architecture.]

Page 6: Computer Architecture Peripherals

Memory Trade-Offs

Large (dense) memories are slow
Fast memories are small, expensive, and consume high power
Goal: give the processor the feeling that it has a memory which is large (dense), fast, cheap, and consumes low power
Solution: a hierarchy of memories

CPU → L1 cache → L2 cache → L3 cache → Memory (DRAM)
Speed: fastest → slowest
Size: smallest → biggest
Cost: highest → lowest
Power: highest → lowest

Page 7: Computer Architecture Peripherals

Typical levels in mem hierarchy

Response time   Size           Memory level
≈ 0.5 ns        ≈ 100 bytes    CPU registers
≈ 1 ns          ≈ 64 KB        L1 cache
≈ 15 ns         ≈ 1 – 4 MB     L2 cache
≈ 150 ns        ≈ 1 – 4 GB     Main memory (DRAM)
≈ 15 ms         ≈ 1 – 2 TB     Hard disk (SATA)
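To make these numbers concrete, here is a small sketch (in Python) of the usual average-memory-access-time calculation. The latencies come from the table above; the hit rates are assumptions chosen only for illustration.

# Average memory access time (AMAT) through a two-level cache + DRAM.
# Latencies (ns) are from the table above; hit rates are assumed.
L1_LATENCY, L2_LATENCY, DRAM_LATENCY = 1, 15, 150   # ns
L1_HIT, L2_HIT = 0.95, 0.80                          # illustrative hit rates

def amat(l1_hit=L1_HIT, l2_hit=L2_HIT):
    """AMAT = L1 latency + L1 miss rate * (L2 latency + L2 miss rate * DRAM latency)."""
    return L1_LATENCY + (1 - l1_hit) * (L2_LATENCY + (1 - l2_hit) * DRAM_LATENCY)

print(round(amat(), 2))   # 3.25 ns: the hierarchy "feels" almost as fast as L1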

Page 8: Computer Architecture Peripherals


DRAM & SRAM

Page 9: Computer Architecture Peripherals

DRAM basics

DRAM = Dynamic Random-Access Memory
Random access = access cost is the same for every location (well, not really)
The CPU thinks of DRAM as 1-dimensional – simpler
But DRAM is actually arranged as a 2-D grid
Need row & column addresses to access it
Given the “1-D address”, the DRAM interface splits it into row & column (see the sketch below)
Some time must elapse between the row access and the column access (10s of ns)
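A minimal sketch (in Python) of that row/column split; the 14-bit row / 10-bit column geometry is an assumption for illustration, not a value from the slides.

# Split a linear ("1-D") DRAM address into row and column indices.
# The field widths below are illustrative assumptions.
ROW_BITS, COL_BITS = 14, 10

def split_address(addr):
    """Return (row, column) for a linear DRAM address."""
    col = addr & ((1 << COL_BITS) - 1)                 # low bits select the column
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)   # next bits select the row
    return row, col

print(split_address(0x123456))   # -> (1165, 86)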

Page 10: Computer Architecture Peripherals

DRAM basics

Why 2-D? Why delayed row & column accesses?
Every address bit requires a physical pin
DRAMs are large (GBs nowadays)
=> would need many pins => more expensive
Multiplexing row & column over the same pins roughly halves the pin count (see the arithmetic below)

A DRAM array has
Row decoder
• Extracts the row number from the memory address
Column decoder
• Extracts the column number from the memory address
Sense amplifiers
• Hold the row when it is (1) written to, (2) read from, (3) refreshed (see next slide)
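A quick pin-count example (my numbers, for illustration only): a 1 Gbit array has 2^30 bit locations.

    Non-multiplexed: 30 address pins
    Multiplexed as 2^15 rows × 2^15 columns: 15 shared address pins (row sent first, column sent later)

The saving in pins is what pays for the delay between the row and column accesses.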

Page 11: Computer Architecture Peripherals

DRAM basics

One transistor-capacitor pair per bit
Capacitors leak => need to be refreshed every few ms
DRAM spends ~1% of its time refreshing (a rough check of this figure follows below)
“Opening” a row = fetching it into the sense amplifiers = refreshing it
Is it worth it to make the DRAM array a rectangle (rather than a square)?
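A back-of-the-envelope check on the ~1% figure (all numbers here are assumptions for illustration): if a device has 8192 rows, each row refresh occupies it for about 50 ns, and every row must be refreshed once per 64 ms, then

    8192 × 50 ns ≈ 0.41 ms of refresh work per 64 ms window ≈ 0.6% of the time,

which is the right order of magnitude.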

Page 12: Computer Architecture Peripherals

x1 DRAM

[Diagram: a single x1 DRAM – the row decoder selects one of the rows, the memory array (rows × columns) feeds the sense amplifiers, the column decoder picks one column, and the data in/out buffers deliver one bit.]

Page 13: Computer Architecture Peripherals

DRAM banks

Each DRAM memory array outputs one bit
DRAMs use multiple arrays to output multiple bits at a time
xN indicates a DRAM with N memory arrays
Typical today: x16, x32
Each collection of xN arrays forms a DRAM bank
Can read/write from/to each bank independently

Page 14: Computer Architecture Peripherals

x4 DRAM

[Diagram: four x1 DRAM arrays side by side, each with its own row decoder, memory array, sense amplifiers, column decoder, and data in/out buffers, together outputting four bits per access.]

Page 15: Computer Architecture Peripherals

Ranks & DIMMs

DIMM
(Dual In-line) Memory Module (the unit we connect to the motherboard)

Increase bandwidth by delivering data from multiple banks
Bandwidth from one bank is limited => put multiple banks on the DIMM
The bus has a higher clock frequency than any one DRAM
Control of the bus switches between banks to achieve a high data rate (see the example below)

Increase capacity by utilizing multiple ranks
Each rank is an independent set of banks that can be accessed for the full data bit-width
• 64 bits for non-ECC; 72 for ECC (error-correction code)
Ranks cannot be accessed simultaneously
• As they share the same data path
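For a sense of scale (my example, using assumed DDR3-1333 numbers rather than figures from the slides): a 64-bit (8-byte) DIMM interface running at 1333 mega-transfers per second has a peak bandwidth of

    8 bytes × 1333 × 10^6 transfers/s ≈ 10.7 GB/s,

and keeping that bus busy is exactly what interleaving accesses across banks is for.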

Page 16: Computer Architecture Peripherals

Ranks & DIMMs

[Illustration: a 1 GB 2Rx8 DIMM (= 2 ranks × 8 banks).]

Page 17: Computer Architecture Peripherals

Modern DRAM organization

A system has multiple DIMMs
Each DIMM has multiple DRAM banks, arranged in one or more ranks
Each bank has multiple DRAM arrays
Concurrency among banks increases memory bandwidth

Page 18: Computer Architecture Peripherals

Memory controller

[Diagram: a memory controller driving two ranks – each through an address/command bus and a data bus, with a separate chip-select line per rank (chip select 1, chip select 2).]

Page 19: Computer Architecture Peripherals

Memory controller

Functionality
Executes the processor’s memory requests

In earlier systems
A separate, off-processor chip

In modern systems
Integrated on-chip with the processor

Interconnect with the processor
A bus, but can be point-to-point, or through a crossbar

Page 20: Computer Architecture Peripherals

Lifetime of a memory access

1. Processor orders & queues memory requests
2. Request(s) sent to the memory controller
3. Controller queues & orders the requests
4. For each request in the queue, when the time is right:
   1. Controller waits until the requested DRAM is ready
   2. Controller breaks the address bits into rank, bank, row, and column fields (see the sketch below)
   3. Controller sends the chip-select signal to select the rank
   4. Selected bank is pre-charged to activate the selected row
   5. Row is activated within the selected DRAM bank
      • Uses “RAS” (row-address strobe signal)
   6. The (entire) row is sent to the sense amplifiers
   7. The desired column is selected
      • Uses “CAS” (column-address strobe signal)
   8. Data is sent back
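A minimal sketch (in Python) of the address-field breakdown in step 4.2; the field widths and their order are assumptions for illustration – real controllers choose (and often interleave) the bits based on the actual DIMM geometry.

# Break a physical address into DRAM coordinates: rank, bank, row, column.
# Field widths and ordering are illustrative assumptions.
FIELDS = [("col", 10), ("bank", 3), ("rank", 1), ("row", 14)]  # low bits first

def decode(addr):
    """Return a dict mapping each DRAM coordinate to its value."""
    coords = {}
    for name, width in FIELDS:
        coords[name] = addr & ((1 << width) - 1)   # take the low 'width' bits
        addr >>= width                             # shift on to the next field
    return coords

print(decode(0x123456))   # -> {'col': 86, 'bank': 5, 'rank': 1, 'row': 72}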

Page 21: Computer Architecture Peripherals

Basic DRAM array

· Timing (2 phases)
· Decode row address + RAS assert
· Wait for the “RAS to CAS delay”
· Decode column address + CAS assert
· Transfer DATA

[Diagram: the memory address bus feeds a row latch + row-address decoder (strobed by RAS#) and a column latch + column-address decoder (strobed by CAS#), which together select the Data out of the memory array.]

Page 22: Computer Architecture Peripherals

DRAM timing

CAS latency
Number of clock cycles to access a specific column of data
From the moment the memory controller issues the column address in the current row until the data is read out of the memory

RAS to CAS delay
Number of cycles between the row access and the column access

Row pre-charge time
Number of cycles to close the opened row & open the next row

(A worked latency example follows below.)
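A small sketch (in Python) of how these three parameters combine into an access latency; the 200 MHz clock and the 3-cycle values are assumptions for illustration, not numbers from the slides.

# Rough DRAM access latency from the three timing parameters above.
# Clock frequency and cycle counts are illustrative assumptions.
CLOCK_MHZ = 200
CYCLE_NS = 1000 / CLOCK_MHZ          # 5 ns per cycle

T_RP, T_RCD, CAS_LATENCY = 3, 3, 3   # row pre-charge, RAS-to-CAS, CAS latency (cycles)

def access_ns(row_already_open=False):
    """Latency to read a column from a new row, or from the already-open row."""
    cycles = CAS_LATENCY if row_already_open else T_RP + T_RCD + CAS_LATENCY
    return cycles * CYCLE_NS

print(access_ns())       # 45.0 ns: pre-charge, open the new row, then read the column
print(access_ns(True))   # 15.0 ns: the row is already in the sense amplifiers

This is also why the paged-mode scheme two slides ahead helps.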

Page 23: Computer Architecture Peripherals

Addressing sequence

· Access sequence
· Put the row address on the address bus (A[0:7]) and assert RAS#
· Wait for the RAS# to CAS# delay (tRCD)
· Put the column address on the address bus and assert CAS#
· DATA transfer (after the CAS latency)
· Pre-charge

[Timing diagram: RAS# and CAS# strobes against the A[0:7] address lines (Row i, Col n, Row j) and the Data bus (Data n), marking the RAS/CAS delay, the CAS latency, the access time, and the pre-charge delay.]

Page 24: Computer Architecture Peripherals

Computer Architecture 2011 – peripherals24

· Paged Mode DRAM– Multiple accesses to different columns from same row (special locality)– Saves time it takes to bring a new row (but might be unfair)

· Extended Data Output RAM (EDO RAM)– A data output latch enables to parallelize next column address with

current column data

Improved DRAM Schemes

RAS#

DataA[0:7]CAS#

Data n D n+1

Row X Col n X Col n+1 X Col n+2 X

D n+2

X

RAS#

DataA[0:7]CAS#

Data n Data n+1

Row X Col n X Col n+1 X Col n+2 X

Data n+2

X

Page 25: Computer Architecture Peripherals

Improved DRAM Schemes (cont.)

· Burst DRAM
– Generates the consecutive column addresses by itself

[Timing diagram: a single column address (Col n) with CAS# asserted yields a burst of Data n, Data n+1, Data n+2 on the Data bus.]

Page 26: Computer Architecture Peripherals

Synchronous DRAM (SDRAM)

Asynchrony in DRAM
Due to RAS & CAS arriving at any time

Synchronous DRAM
Uses a clock to deliver requests at regular intervals
More predictable DRAM timing => less skew => faster turnaround

SDRAMs support burst-mode access
Initial performance similar to BEDO (= burst + EDO)
Clock scaling enabled higher transfer rates later
• => DDR SDRAM => DDR2 => DDR3

Page 27: Computer Architecture Peripherals

DRAM vs. SRAM
(Random access = access time the same for all locations)

                 DRAM – Dynamic RAM         SRAM – Static RAM
Refresh          Yes (~1% of time)          No
Address          Multiplexed: row + col     Not multiplexed
Random access    Not really…                Yes
Density          High (1 transistor/bit)    Low (6 transistors/bit)
Power            Low                        High
Speed            Slow                       Fast
Price/bit        Low                        High
Typical usage    Main memory                Cache