OMSE 510: Computing Foundations 2: Disks, Buses, DRAM


Page 1: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

OMSE 510: Computing Foundations2: Disks, Buses, DRAM

Portland State University/OMSE

Page 2: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Outline of Comp. ArchitectureOutline of the rest of the computer architecture

section:

Start with a description of computer devices, work back towards the CPU.

Page 3: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Computer Architecture Is …

the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.

Amdahl, Blaauw, and Brooks, 1964


Page 4: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Today

Begin Computer Architecture

Disk Drives

The Bus

Memory

Page 5: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Computer System (Idealized)

[Figure: CPU and Memory attached to the System Bus; a Disk Controller connects the Disk to the bus]

Page 6: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

I/O Device Examples

Device Behavior Partner Data Rate (KB/sec)

Keyboard Input Human 0.01

Mouse Input Human 0.02

Line Printer Output Human 1.00

Floppy disk Storage Machine 50.00

Laser Printer Output Human 100.00

Optical Disk Storage Machine 500.00

Magnetic Disk Storage Machine 5,000.00

Network-LAN Input or Output Machine 20 – 1,000.00

Graphics Display Output Human 30,000.00

Page 7: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

A Device: The Disk

Disk Drives!

- eg. Your hard disk drive

- Where files are physically stored

- Long-term non-volatile storage device

Page 8: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Magnetic Drum

Page 9: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Spiral Format for Compact Disk

Page 10: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

A Device: The Disk. Magnetic Disks:

- Your hard disk drive

- Where files are physically stored

- Long-term non-volatile storage device

Page 11: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

A Magnetic Disk with Three Platters

Page 12: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Organization of a Disk Platter with a 1:2 Interleave Factor

Page 13: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Physical Characteristics

Platters: 1 to 20, with diameters from 1.3 to 8 inches (recording on both sides)

Tracks: 2,500 to 5,000 tracks/inch

Cylinders: all tracks in the same position on the platters

Sectors: 128-256 sectors/track, with gaps and sector-related info between them (typical sector: 256-512 bytes)

Page 14: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Physical Characteristics

Trend as of 2005: constant bit density (~10^5 bits/inch), i.e., more info (sectors) on outer tracks

Strangely enough, history reverses itself:
Originally, disks were constant bit density (more efficient)
Then, they went to a uniform #sectors/track (simpler, allowed easier optimization)
Returning now to constant bit density

Disk capacity follows Moore's law: doubles every 18 months

Page 15: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Example: Seagate Barracuda

Disk for server

10 platters, hence 20 surfaces

7500 cylinders, hence 7500*20 = 150000 total tracks

237 sectors/track (average)

512 bytes/sector

Total capacity:

150000 * 237 * 512 = 18,201,600,000 bytes

= 18 GB
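A quick check of this arithmetic (a minimal sketch; the drive parameters are the ones quoted above):

# Sanity check of the Barracuda capacity figure quoted above.
surfaces = 20            # 10 platters, both sides
cylinders = 7500
sectors_per_track = 237  # average
bytes_per_sector = 512

total_tracks = cylinders * surfaces                          # 150,000
capacity = total_tracks * sectors_per_track * bytes_per_sector
print(total_tracks)      # 150000
print(capacity)          # 18201600000 bytes
print(capacity / 1e9)    # ~18.2 GB (decimal gigabytes)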

Page 16: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Things to consider

Addressing modes: computers always refer to data in blocks (512 bytes is common). How to address blocks?

Old school: CHS (Cylinder-Head-Sector) - the computer has an idea of how the drive is structured

New school: LBA (Logical Block Addressing) - linear! (see the sketch below)
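The conventional CHS-to-LBA mapping (not spelled out on the slide) is linear in cylinder, then head, then sector; a small sketch with illustrative geometry values:

# Conventional CHS -> LBA conversion (CHS sector numbers start at 1).
# The geometry below (heads per cylinder, sectors per track) is illustrative only.
def chs_to_lba(c, h, s, heads_per_cylinder, sectors_per_track):
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)

# Example: a hypothetical drive with 16 heads and 63 sectors/track.
print(chs_to_lba(0, 0, 1, 16, 63))   # 0    (the first block)
print(chs_to_lba(1, 0, 1, 16, 63))   # 1008 (one full cylinder of blocks later)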

Page 17: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Performance

Steps to read from disk:

1. CPU tells drive controller “need data from this address”

2. Drive decodes instruction

3. Move read head over desired cylinder/track (seek)

4. Wait for desired sector to rotate under read head

5. Read the data as it goes under drive head

Page 18: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Performance

Components of disk performance:

Seek time (to move the arm on the right cylinder)

Rotation time (on average ½ rotation) (time to find the right sector)

Transfer time depends on rotation time

Disk controller time. Overhead to perform an access

Page 19: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Performance

So Disk Latency = Queuing Time + Controller time + Seek time + Rotation time + Transfer time

Page 20: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Seek Time

From 0 (if arm already positioned) to a maximum 15-20 ms

Note: This is not a linear function of distance (speedup + coast + slowdown + settle)

Even when reading tracks on the same cylinder, there is a minimal seek time (due to severe tolerances for head positioning)

Barracuda example: Average seek time = 8 ms, track to track seek time = 1 ms, full disk seek = 17ms

Page 21: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Rotation time

Rotation time:

Seagate Barracuda: 7200 RPM

(Disks these days are 3600, 4800, 5400, 7200 up to 10800 RPM)

7200 RPM = 120 RPS = 8.33ms per rotation

Average rotational latency = ½ worst case rotational latency = 4.17ms
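The numbers above follow directly from the spindle speed; a small sketch:

# Rotational latency from spindle speed (RPM).
def rotation_ms(rpm):
    return 60.0 / rpm * 1000.0        # time for one full rotation, in ms

for rpm in (3600, 5400, 7200, 10000):
    full = rotation_ms(rpm)
    # full rotation, and average (half-rotation) latency
    print(rpm, round(full, 2), round(full / 2, 2))
# 7200 RPM -> 8.33 ms per rotation, 4.17 ms average rotational latency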

Page 22: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Transfer time

Transfer time depends on rotation time, amount of data to transfer (minimum one sector), recording density, disk/memory connection

These days, transfer time around 2MB/s to 16MB/s

Page 23: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Controller Overhead
The disk controller contains a microprocessor + buffer memory + possibly a cache (for disk sectors)

Overhead to perform an access (on the order of 1 ms):
Receiving orders from the CPU and interpreting them
Managing the transfer between disk and memory (e.g., managing the DMA)

The transfer rate between disk and controller is smaller than between controller and memory, hence the need for a buffer in the controller. This buffer might take the form of a cache (mostly for read-ahead and write-behind).

Page 24: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Time Example
Disk parameters: transfer size is 8 KB; advertised average seek is 12 ms; disk spins at 7200 RPM; transfer rate is 4 MB/s
Controller overhead is 2 ms
Assume the disk is idle, so no queuing delay

What is the average disk time for a sector?
avg seek + avg rot delay + transfer time + controller overhead

____ + _____ + _____ + _____ (worked below)
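Filling in the blanks (a worked sketch using the slide's parameters; the answer matches the next slide):

# Average disk access time for one 8 KB transfer.
seek_ms = 12.0                        # advertised average seek
rotation_ms = 60.0 / 7200 * 1000      # 8.33 ms per rotation
avg_rot_ms = rotation_ms / 2          # 4.17 ms average rotational delay
transfer_ms = 8 / (4 * 1024) * 1000   # 8 KB at 4 MB/s, about 2 ms
controller_ms = 2.0

total = seek_ms + avg_rot_ms + transfer_ms + controller_ms
print(round(total, 1))                # ~20.1 ms, i.e. about 20 ms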

Page 25: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk Time Example. Answer: about 20 ms

But! The advertised seek time assumes no locality: the real seek is typically ¼ to ⅓ of the advertised seek time!

20 ms -> ~12 ms

Locality is an effect of smart placement of data by the operating system

Page 26: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

My Disk: Hitachi Travelstar 7K100 60GB ATA-6 2.5in, 7200 RPM Mobile Hard Drive w/8MB Buffer

Interface: ATA-6
Capacity (GB): 60
Sector size (bytes): 512
Data heads: 3
Disks: 2

Performance
Data buffer (MB): 8
Rotational speed (rpm): 7,200
Latency (average, ms): 4.2
Media transfer rate (Mbits/sec): 561
Max. interface transfer rate (MB/sec): 100 (Ultra DMA mode-5), 16.6 (PIO mode-4)
Command overhead: 1 ms

Seek time (ms): average 10 R / 11 W; track to track 1 R / 1.2 W; full stroke 18 R / 19 W

Sectors per track: 414-792
Max. areal density (Gbits/sq. inch): 66

Disk to buffer data transfer: 267-629 Mb/s

Buffer-host data transfer: 100 MB/s

Page 27: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Some other quotes: Hard Drives

Notebook: Toshiba MK8026GAX 80GB, 2.5", 9.5mm, 5400 RPM, 12ms seek, 100MB/s

Desktop: Seagate 250GB, 7200RPM, SATA II, 9-11ms seek

Buffer to host: 300MB/s

Buffer to disk: 93MB/s

Server: Seagate Raptor SATA, 10000RPM, SATA

Buffer to host: 150MB/s

Buffer to disk: 72MB/s

Page 28: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Next Topic

Disk Arrays

RAID!

Page 29: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disk capacity now doubles every 18 months; before 1990 it doubled every 36 months

• Today: Processing Power Doubles Every 18 months

• Today: Memory Size Doubles Every 18 months(4X/3yr)

• Today: Disk Capacity Doubles Every 18 months

• Disk Positioning Rate (Seek + Rotate) Doubles Every Ten Years!

• Caches in Memory and Device Controllers to Close the Gap

The I/O Gap

Technology Trends

Page 30: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Manufacturing Advantages of Disk Arrays

Disk Product Families
[Figure: conventional disk product families span four disk designs (14", 10", 5.25", 3.5") from low end to high end; a disk array needs only one disk design (3.5")]

Page 31: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Small # of Large Disks vs. Large # of Small Disks!

                 IBM 3390 (K)    IBM 3.5" 0061    3.5" x 70
Data Capacity    20 GBytes       320 MBytes       23 GBytes
Volume           97 cu. ft.      0.1 cu. ft.      11 cu. ft.
Power            3 KW            11 W             1 KW
Data Rate        15 MB/s         1.5 MB/s         120 MB/s
I/O Rate         600 I/Os/s      55 I/Os/s        3900 I/Os/s
MTTF             250 KHrs        50 KHrs          ??? Hrs
Cost             $250K           $2K              $150K

Disk arrays have potential for:
large data and I/O rates
high MB per cu. ft., high MB per KW
reliability?

Page 32: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

• Reliability of N disks = Reliability of 1 Disk ÷ N

50,000 Hours ÷ 70 disks ≈ 700 hours

Disk system MTTF: Drops from 6 years to 1 month!

• Arrays (without redundancy) too unreliable to be useful!
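A minimal sketch of the reliability arithmetic above (exact division gives roughly 714 hours, which the slide rounds to 700):

# MTTF of an N-disk array with no redundancy = MTTF of one disk / N.
disk_mttf_hours = 50_000
n_disks = 70

array_mttf = disk_mttf_hours / n_disks
print(round(array_mttf))                       # ~714 hours
print(round(disk_mttf_hours / (24 * 365), 1))  # one disk: ~5.7 years
print(round(array_mttf / (24 * 30), 1))        # the array: ~1 month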

Hot spares support reconstruction in parallel with access: very high media availability can be achieved

Array Reliability

Page 33: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Media Bandwidth/Latency Demands

Bandwidth requirements
High quality video: digital data = (30 frames/s) × (640 x 480 pixels) × (24-bit color/pixel) = 221 Mb/s (27.625 MB/s)
High quality audio: digital data = (44,100 audio samples/s) × (16-bit audio samples) × (2 audio channels for stereo) = 1.4 Mb/s (0.175 MB/s)
Compression reduces the bandwidth requirements considerably

Latency issues
How sensitive is your eye (ear) to variations in video (audio) rates?
How can you ensure a constant rate of delivery?
How important is synchronizing the audio and video streams?
15 to 20 ms early to 30 to 40 ms late is tolerable
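The two bandwidth figures above come straight from multiplying out the stream parameters; a quick check (small differences from the slide's MB/s values come from where the rounding is done):

# Uncompressed media bandwidth requirements.
video_bps = 30 * 640 * 480 * 24            # frames/s x pixels x bits/pixel
audio_bps = 44_100 * 16 * 2                # samples/s x bits/sample x channels

print(round(video_bps / 1e6), "Mb/s video")      # ~221 Mb/s
print(round(video_bps / 8 / 1e6, 1), "MB/s")     # ~27.6 MB/s
print(round(audio_bps / 1e6, 1), "Mb/s audio")   # ~1.4 Mb/s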

Page 34: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Dependability, Reliability, Availability

Reliability: a measure of continuous service accomplishment, measured by the mean time to failure (MTTF). Service interruption is measured by the mean time to repair (MTTR)

Availability – a measure of service accomplishment

Availability = MTTF/(MTTF + MTTR)
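A minimal sketch of the availability formula above (the MTTF/MTTR numbers below are made up for illustration):

# Availability = MTTF / (MTTF + MTTR)
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical example: MTTF of 1,000,000 hours, MTTR of 24 hours.
print(availability(1_000_000, 24))   # ~0.999976, roughly "four nines"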

To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components

1. Fault avoidance: preventing fault occurrence by construction

2. Fault tolerance: using redundancy to correct or bypass faulty components (hardware)

Fault detection versus fault correction
Permanent faults versus transient faults

Page 35: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAIDs: Disk Arrays
Redundant Array of Inexpensive Disks

Arrays of small and inexpensive disks: increase potential throughput by having many disk drives
Data is spread over multiple disks
Multiple accesses are made to several disks at a time

Reliability is lower than with a single disk
But availability can be improved by adding redundant disks (RAID):
Lost information can be reconstructed from redundant information
MTTR: mean time to repair is on the order of hours
MTTF: mean time to failure of disks is tens of years

Page 36: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID: Level 0 (No Redundancy; Striping)

Multiple smaller disks as opposed to one big disk
Spreading the data over multiple disks (striping) forces accesses to several disks in parallel, increasing the performance
Four times the throughput for a 4-disk system
Same cost as one big disk (assuming 4 small disks cost the same as one big disk)

No redundancy, so what if one disk fails?
Failure of one or more disks is more likely as the number of disks in the system increases

[Figure: sector 0 striped across the four disks as S0,b0, S0,b1, S0,b2, S0,b3 (S = sector number, b = bit number)]

Page 37: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID: Level 1 (Redundancy via Mirroring)

Uses twice as many disks as RAID 0 (e.g., 8 smaller disks, with the second set of 4 duplicating the first set), so there are always two copies of the data
Still four times the throughput
# redundant disks = # of data disks, so twice the cost of one big disk
Writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0

What if one disk fails?
If a disk fails, the system just goes to the "mirror" for the data

[Figure: sector 0 bits S0,b0-S0,b3 on four data disks, duplicated on four redundant (check) disks]

Page 38: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID: Level 2 (Redundancy via ECC)

ECC disks contain the parity of data on a set of distinct overlapping disks
Still four times the throughput
# redundant disks = log (total # of disks), so almost twice the cost of one big disk
Writes require computing parity to write to the ECC disks
Reads require reading the ECC disks and confirming parity

Can tolerate limited disk failure, since the data can be reconstructed

[Figure: data disks 3, 5, 6, 7 hold S0,b0-S0,b3; ECC disks 4, 2, 1 hold parity bits. ECC disks 4 and 2 point to either data disk 6 or 7, but ECC disk 1 says disk 7 is okay, so disk 6 must be in error]

Page 39: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID: Level 3 (Bit-Interleaved Parity)

Cost of higher availability is reduced to 1/N, where N is the number of disks in a protection group
Still four times the throughput
# redundant disks = 1 x # of protection groups
Writes require writing the new data to the data disk as well as computing the parity, meaning reading the other disks, so that the parity disk can be updated

Can tolerate limited disk failure, since the data can be reconstructed:
Reads then require reading all the operational data disks as well as the parity disk to calculate the missing data that was stored on the failed disk

[Figure: data disks hold S0,b0-S0,b3 plus one parity disk; if a data disk fails, its bit is recovered from the parity disk and the surviving disks]

Page 40: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID: Level 4 (Block-Interleaved Parity)

Cost of higher availability is still only 1/N, but the parity is stored as blocks associated with a set of data blocks
Still four times the throughput
# redundant disks = 1 x # of protection groups
Supports "small reads" and "small writes" (reads and writes that go to just one (or a few) data disks in a protection group):
By watching which bits change when writing new information, we need only change the corresponding bits on the parity disk
The parity disk must be updated on every write, so it is a bottleneck for back-to-back writes

Can tolerate limited disk failure, since the data can be reconstructed

[Figure: block-interleaved data disks with a dedicated parity disk]

Page 41: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Block Writes

RAID 3 block writes: writing new data to D0 in the stripe D0 D1 D2 D3 P involves all the disks: 5 writes

RAID 4 small writes: writing new data to D0 involves just two disks (D0 and P): 2 reads and 2 writes

Page 42: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID: Level 5 (Distributed Block-Interleaved Parity)

Cost of higher availability is still only 1/N, but the parity is spread throughout all the disks, so there is no single bottleneck for writes
Still four times the throughput
# redundant disks = 1 x # of protection groups
Supports "small reads" and "small writes" (reads and writes that go to just one (or a few) data disks in a protection group)
Allows multiple simultaneous writes as long as the accompanying parity blocks are not located on the same disk

Can tolerate limited disk failure, since the data can be reconstructed

Page 43: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

RAID-5: Small Write Algorithm

[Figure: to write new data D0' over old data D0 in the stripe D0 D1 D2 D3 P: (1. Read) the old data D0, (2. Read) the old parity P, XOR both with the new data to form the new parity P', then (3. Write) D0' and (4. Write) P']

1 Logical Write = 2 Physical Reads + 2 Physical Writes

Problems of Disk Arrays: Block Writes
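The small-write algorithm above reduces to XORs over the old data, old parity, and new data; a minimal sketch operating on byte strings (the block contents below are made up):

# RAID-5 small write: new parity = old data XOR old parity XOR new data.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0b1010, 0b0001])   # D0   (1. Read)
old_parity = bytes([0b0110, 0b1111])   # P    (2. Read)
new_data   = bytes([0b1100, 0b0001])   # D0'

new_parity = xor_blocks(xor_blocks(old_data, old_parity), new_data)
# (3. Write) new_data to the data disk, (4. Write) new_parity to the parity disk
print(new_parity.hex())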

Page 44: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Distributing Parity Blocks

RAID 4 (parity on a dedicated disk):
 0  1  2  3  P0
 4  5  6  7  P1
 8  9 10 11  P2
12 13 14 15  P3

RAID 5 (parity rotated across the disks):
 0  1  2  3  P0
 4  5  6  P1  7
 8  9  P2 10 11
12  P3 13 14 15

By distributing parity blocks to all disks, some small writes can be performed in parallel (a code sketch of this layout follows)
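A sketch of the rotated-parity bookkeeping implied by the RAID 5 layout above (this particular rotation reproduces the figure; real controllers may rotate parity differently):

# Map a logical data block number to (disk, stripe) for the RAID 5 layout above:
# 5 disks, 4 data blocks + 1 parity block per stripe, parity rotating leftward.
NUM_DISKS = 5

def raid5_layout(block):
    stripe = block // (NUM_DISKS - 1)
    parity_disk = (NUM_DISKS - 1) - (stripe % NUM_DISKS)
    disk = block % (NUM_DISKS - 1)
    if disk >= parity_disk:          # skip over the parity disk in this stripe
        disk += 1
    return disk, stripe, parity_disk

for b in range(8):
    print(b, raid5_layout(b))
# Stripe 0 puts parity on disk 4, stripe 1 on disk 3, matching the figure.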

Page 45: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Disks Summary
Four components of disk access time:
Seek time: advertised to be 3 to 14 ms, but lower in real systems
Rotational latency: 5.6 ms at 5400 RPM and 2.0 ms at 15000 RPM
Transfer time: 10 to 80 MB/s
Controller time: typically less than 0.2 ms

RAIDs can be used to improve availability:
RAID 0 and RAID 5: widely used in servers; one estimate is that 80% of disks in servers are in RAIDs
RAID 1 (mirroring): EMC, Tandem, IBM
RAID 3: Storage Concepts
RAID 4: Network Appliance

RAIDs have enough redundancy to allow continuous operation

Page 46: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Computer System (Idealized)

[Figure: CPU and Memory attached to the System Bus; a Disk Controller connects the Disk to the bus]

Page 47: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Next Topic

Buses

Page 48: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

[Figure: the five classic components of a computer - Processor (Control + Datapath), Memory, Input, Output]

What is a bus? A Bus Is:
a shared communication link
a single set of wires used to connect multiple subsystems

A bus is also a fundamental tool for composing large, complex systems: a systematic means of abstraction

Page 49: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Bridge-Based Bus Architecture

Bridging with dual Pentium II Xeon processors on Slot 2.

(Source: http://www.intel.com.)

Page 50: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Buses

Page 51: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

[Figure: Processor and Memory connected by a bus shared with several I/O devices]

Advantages of Buses

Versatility:
New devices can be added easily
Peripherals can be moved between computer systems that use the same bus standard

Low Cost:
A single set of wires is shared in multiple ways

Page 52: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

[Figure: Processor and Memory connected by a bus shared with several I/O devices]

Disadvantage of Buses

It creates a communication bottleneck:
The bandwidth of the bus can limit the maximum I/O throughput

The maximum bus speed is largely limited by:
The length of the bus
The number of devices on the bus
The need to support a range of devices with:
Widely varying latencies
Widely varying data transfer rates

Page 53: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

The General Organization of a Bus

Control lines:
Signal requests and acknowledgments
Indicate what type of information is on the data lines

Data lines carry information between the source and the destination:
Data and addresses
Complex commands

Page 54: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Master versus Slave

[Figure: the Bus Master issues the command; data can go either way between Bus Master and Bus Slave]

A bus transaction includes two parts:
Issuing the command (and address): the request
Transferring the data: the action

The master is the one who starts the bus transaction by issuing the command (and address)
The slave is the one who responds to the address by:
Sending data to the master if the master asks for data
Receiving data from the master if the master wants to send data

Page 55: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Types of Buses

Processor-Memory Bus (design specific):
Short and high speed
Only needs to match the memory system
Maximize memory-to-processor bandwidth
Connects directly to the processor
Optimized for cache block transfers

I/O Bus (industry standard):
Usually lengthy and slower
Needs to match a wide range of I/O devices
Connects to the processor-memory bus or backplane bus

Backplane Bus (standard or proprietary):
Backplane: an interconnection structure within the chassis
Allows processors, memory, and I/O devices to coexist
Cost advantage: one bus for all components

Page 56: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Example: Pentium System Organization

Processor/Memory Bus: design specific
Backplane Bus: PCI (PCI devices: graphics, I/O control)
I/O Busses: IDE, USB & SCSI

Page 57: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Standard Intel Pentium Read and Write Bus Cycles

Page 58: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Intel Pentium Burst Read Bus Cycle

Page 59: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

A Computer System with One Bus: Backplane Bus

[Figure: Processor, Memory, and I/O Devices all attached to a single backplane bus]

A single bus (the backplane bus) is used for:
Processor to memory communication
Communication between I/O devices and memory

Advantages: simple and low cost
Disadvantages: slow, and the bus can become a major bottleneck
Example: IBM PC-AT

Page 60: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

A Two-Bus System

[Figure: Processor and Memory on a processor-memory bus; three I/O buses attach through bus adaptors]

I/O buses tap into the processor-memory bus via bus adaptors to speed-match between bus types:
Processor-memory bus: mainly for processor-memory traffic
I/O buses: provide expansion slots for I/O devices

Apple Macintosh II:
NuBus: processor, memory, and a few selected I/O devices
SCSI Bus: the rest of the I/O devices

Page 61: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

A Three-Bus System (+ backside cache)

[Figure: Processor with an L2 cache on a backside cache bus; Processor and Memory on the processor-memory bus; I/O buses attach through bus adaptors]

A small number of backplane buses tap into the processor-memory bus
The processor-memory bus focuses on traffic to/from memory
I/O buses are connected to the backplane bus

Advantage: loading on the processor bus is greatly reduced & busses run at different speeds

Page 62: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Main components of Intel Chipset: Pentium II/III

Northbridge:
Handles memory
Graphics

Southbridge: I/O
PCI bus
Disk controllers
USB controllers
Audio (AC97)
Serial I/O
Interrupt controller
Timers

Page 63: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Bunch of Wires

Physical / Mechanical Characteristics (the connectors)

Electrical Specification

Timing and Signaling Specification

Transaction Protocol

What defines a bus?

Page 64: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Synchronous and Asynchronous Bus

Synchronous Bus:
Includes a clock in the control lines
A fixed protocol for communication that is relative to the clock
Advantage: involves very little logic and can run very fast
Disadvantages:
Every device on the bus must run at the same clock rate
To avoid clock skew, buses cannot be long if they are fast

Asynchronous Bus:
It is not clocked
It can accommodate a wide range of devices
It can be lengthened without worrying about clock skew
It requires a handshaking protocol

Page 65: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

[Figure: a bus master and several slaves connected by control lines, address lines, and data lines]

Busses so far

Bus Master: has ability to control the bus, initiates transaction

Bus Slave: module activated by the transaction

Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information.

Asynchronous Bus Transfers: control lines (req, ack) serve to orchestrate sequencing.

Synchronous Bus Transfers: sequence relative to common clock.

Page 66: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Simplest bus paradigm

All agents operate synchronously

All can source / sink data at same rate

=> simple protocol: just manage the source and target

Page 67: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Simple Synchronous Protocol

[Figure: bus request (BReq) and bus grant (BG), then command + address (R/W, Address), then Data1 and Data2 on the data lines]

Even memory busses are more complex than this:
memory (the slave) may take time to respond
it may need to control the data rate

Page 68: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Typical Synchronous Protocol

[Figure: as before, but the slave inserts a Wait cycle before Data1 is transferred]

Slave indicates when it is prepared for the data transfer
Actual transfer goes at bus rate

Page 69: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Asynchronous Handshake: Write Transaction

[Figure: Address, Data, Read, Req, and Ack signals over times t0-t5; the master asserts the address and data, then the next address follows]

t0: Master has obtained control and asserts address, direction, data; waits a specified amount of time for slaves to decode the target
t1: Master asserts the request line
t2: Slave asserts ack, indicating data received
t3: Master releases req
t4: Slave releases ack

Page 70: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Asynchronous Handshake: Read Transaction

[Figure: Address, Data, Read, Req, and Ack signals over times t0-t5; the slave drives the data]

t0: Master has obtained control and asserts address, direction; waits a specified amount of time for slaves to decode the target
t1: Master asserts the request line
t2: Slave asserts ack, indicating it is ready to transmit data
t3: Master releases req, data received
t4: Slave releases ack

Page 71: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

What is DMA (Direct Memory Access)?

Typical I/O devices must transfer large amounts of data to the memory of the processor:
Disk must transfer a complete block (4K? 16K?)
Large packets from the network
Regions of the video frame buffer

DMA gives an external device the ability to write memory directly: much lower overhead than having the processor request one word at a time.
The processor (or at least the memory system) acts like a slave

Issue: cache coherence:
What if I/O devices write data that is currently in the processor cache?
The processor may never see the new data!
Solutions:
Flush the cache on every I/O operation (expensive)
Have hardware invalidate cache lines

Page 72: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Bus Transaction

Arbitration: Who gets the bus

Request: What do we want to do

Action: What happens in response

Page 73: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Arbitration: Obtaining Access to the Bus

[Figure: Bus Master and Bus Slave; control: the master initiates requests; data can go either way]

One of the most important issues in bus design: how is the bus reserved by a device that wishes to use it?

Chaos is avoided by a master-slave arrangement:
Only the bus master can control access to the bus: it initiates and controls all bus requests
A slave responds to read and write requests

The simplest system:
The processor is the only bus master
All bus requests must be controlled by the processor
Major drawback: the processor is involved in every transaction

Page 74: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Multiple Potential Bus Masters: the Need for Arbitration

Bus arbitration scheme:
A bus master wanting to use the bus asserts the bus request
A bus master cannot use the bus until its request is granted
A bus master must signal the arbiter after it finishes using the bus

Bus arbitration schemes usually try to balance two factors:
Bus priority: the highest-priority device should be serviced first
Fairness: even the lowest-priority device should never be completely locked out from the bus

Bus arbitration schemes can be divided into four broad classes:
Daisy chain arbitration
Centralized, parallel arbitration
Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus
Distributed arbitration by collision detection: each device just "goes for it"; problems are found after the fact

Page 75: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

The Daisy Chain Bus Arbitration Scheme

[Figure: the Bus Arbiter passes the Grant signal from Device 1 (highest priority) through Device 2 to Device N (lowest priority); Release and Request are wired-OR]

Advantage: simple

Disadvantages:
Cannot assure fairness: a low-priority device may be locked out indefinitely
The use of the daisy chain grant signal also limits the bus speed

Page 76: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Centralized Parallel Arbitration

[Figure: each device (Device 1 ... Device N) has its own request and grant lines to a central Bus Arbiter]

Used in essentially all processor-memory busses and in high-speed I/O busses

Page 77: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Increasing the Bus Bandwidth

Separate versus multiplexed address and data lines:
Address and data can be transmitted in one bus cycle if separate address and data lines are available
Cost: (a) more bus lines, (b) increased complexity

Data bus width:
By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
Example: the SPARCstation 20's memory bus is 128 bits wide
Cost: more bus lines

Block transfers:
Allow the bus to transfer multiple words in back-to-back bus cycles
Only one address needs to be sent at the beginning
The bus is not released until the last word is transferred
Cost: (a) increased complexity, (b) decreased response time for other requests

Page 78: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Increasing Transaction Rate on Multimaster Bus

Overlapped arbitration:
perform arbitration for the next transaction during the current transaction

Bus parking:
a master can hold onto the bus and perform multiple transactions as long as no other master makes a request

Overlapped address / data phases (prev. slide):
requires one of the above techniques

Split-phase (or packet-switched) bus:
completely separate address and data phases
arbitrate separately for each
the address phase yields a tag which is matched with the data phase

"All of the above" in most modern buses

Page 79: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

PCI Read/Write Transactions

All signals sampled on the rising edge
Centralized parallel arbitration, overlapped with the previous transaction
All transfers are (unlimited) bursts
Address phase starts by asserting FRAME#
Next cycle the "initiator" asserts cmd and address
Data transfers happen when:
IRDY# is asserted by the master when ready to transfer data
TRDY# is asserted by the target when ready to transfer data
transfer occurs when both are asserted on a rising edge
FRAME# is deasserted when the master intends to complete only one more data transfer

Page 80: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

– Turn-around cycle on any signal driven by more than one agent

PCI Read Transaction

Page 81: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

PCI Write Transaction

Page 82: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

PCI Optimizations
Push bus efficiency toward 100% under common simple usage (like RISC)

Bus parking:
retain the bus grant for the previous master until another makes a request
the granted master can start the next transfer without arbitration

Arbitrary burst length:
initiator and target can exert flow control with xRDY
target can disconnect a request with STOP (abort or retry)
master can disconnect by deasserting FRAME
arbiter can disconnect by deasserting GNT

Delayed (pended, split-phase) transactions:
free the bus after a request to a slow device

Page 83: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Summary
Buses are an important technique for building large-scale systems
Their speed is critically dependent on factors such as length, number of devices, etc.
Critically limited by capacitance

Important terminology:
Master: the device that can initiate new transactions
Slaves: devices that respond to the master

Two types of bus timing:
Synchronous: the bus includes a clock
Asynchronous: no clock, just REQ/ACK strobing

Direct Memory Access (DMA) allows fast, burst transfer into the processor's memory:
The processor's memory acts like a slave
Probably requires some form of cache coherence so that DMA'ed memory can be invalidated from the cache

Page 84: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

The Big Picture: Where are We Now?

The Five Classic Components of a Computer
[Figure: Processor (Control + Datapath), Memory, Input, Output]

Next Topic:
Locality and Memory Hierarchy
SRAM Memory Technology
DRAM Memory Technology
Memory Organization

Page 85: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Technology Trends

DRAM

Year Size Cycle Time

1980 64 Kb 250 ns

1983 256 Kb 220 ns

1986 1 Mb 190 ns

1989 4 Mb 165 ns

1992 16 Mb 145 ns

1995 64 Mb 120 ns

1000:1! 2:1!

Capacity Speed (latency)

Logic: 2x in 3 years 2x in 3 years

DRAM: 4x in 3 years 2x in 10 years

Disk: 4x in 3 years 2x in 10 years

Page 86: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Who Cares About the Memory Hierarchy?

Processor-DRAM Memory Gap (latency)

[Figure: performance (log scale) versus year, 1980-2000. CPU performance ("Moore's Law") grows at ~60%/yr (2X/1.5 yr); DRAM performance grows at ~9%/yr (2X/10 yrs) ("Less' Law?"). The processor-memory performance gap grows about 50% per year.]

Page 87: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Today’s Situation: Microprocessor

Rely on caches to bridge gap

Microprocessor-DRAM performance gap = time of a full cache miss in instructions executed:

1st Alpha (7000): 340 ns/5.0 ns =  68 clks x 2 or136 instructions

2nd Alpha (8400): 266 ns/3.3 ns =  80 clks x 4 or320 instructions

3rd Alpha (t.b.d.): 180 ns/1.7 ns =108 clks x 6 or648 instructions

1/2X latency x 3X clock rate x 3X instr/clock => ~5X
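The instruction counts above are just (miss latency / cycle time) x issue width; a quick check (the slide's figures use slightly different rounding):

# Full cache-miss cost expressed in issue slots (clocks x instructions per clock).
alphas = [
    ("1st Alpha (7000)",   340, 5.0, 2),
    ("2nd Alpha (8400)",   266, 3.3, 4),
    ("3rd Alpha (t.b.d.)", 180, 1.7, 6),
]
for name, miss_ns, cycle_ns, issue_width in alphas:
    clks = miss_ns / cycle_ns
    print(name, round(clks), "clks,", round(clks * issue_width), "instruction slots")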

Page 88: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Cache Performance

CPU time = (CPU execution clock cycles + Memory stall clock cycles) x clock cycle time

Memory stall clock cycles = (Reads x Read miss rate x Read miss penalty + Writes x Write miss rate x Write miss penalty)

Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty
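A minimal sketch of the two stall-cycle formulas above as code (all inputs below are hypothetical counts and rates):

# Memory stall clock cycles, per the formulas above.
def stall_cycles_split(reads, read_miss_rate, read_penalty,
                       writes, write_miss_rate, write_penalty):
    return (reads * read_miss_rate * read_penalty +
            writes * write_miss_rate * write_penalty)

def stall_cycles_combined(accesses, miss_rate, miss_penalty):
    return accesses * miss_rate * miss_penalty

# Hypothetical program: 1,000,000 reads, 400,000 writes, 2% miss rate, 50-cycle penalty.
print(stall_cycles_split(1_000_000, 0.02, 50, 400_000, 0.02, 50))   # 1,400,000
print(stall_cycles_combined(1_400_000, 0.02, 50))                   # same total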

Page 89: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Impact on Performance

Suppose a processor executes at:
Clock rate = 200 MHz (5 ns per cycle)
Base CPI = 1.1
50% arith/logic, 30% ld/st, 20% control

Suppose that 10% of memory operations get a 50-cycle miss penalty
Suppose that 1% of instructions get the same miss penalty

CPI = Base CPI + average stalls per instruction
    = 1.1 (cycles/ins)
    + [0.30 (DataMops/ins) x 0.10 (miss/DataMop) x 50 (cycle/miss)]
    + [1 (InstMop/ins) x 0.01 (miss/InstMop) x 50 (cycle/miss)]
    = (1.1 + 1.5 + 0.5) cycles/ins = 3.1

About 65% of the time the processor is stalled waiting for memory! (2.0 of the 3.1 cycles per instruction are stalls)

Ideal CPI 1.1, Data Miss 1.5, Inst Miss 0.5
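The same calculation in code, using the parameters above:

# Impact of misses on CPI.
base_cpi = 1.1
data_ops_per_inst = 0.30
data_miss_rate = 0.10
inst_miss_rate = 0.01
miss_penalty = 50

data_stalls = data_ops_per_inst * data_miss_rate * miss_penalty   # 1.5
inst_stalls = 1.0 * inst_miss_rate * miss_penalty                 # 0.5
cpi = base_cpi + data_stalls + inst_stalls
print(cpi)                                          # 3.1
print(round((data_stalls + inst_stalls) / cpi, 3))  # ~0.645 of the time stalled on memory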

Page 90: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

The Goal: the illusion of large, fast, cheap memory

Fact: large memories are slow; fast memories are small

How do we create a memory that is large, cheap and fast (most of the time)?
Hierarchy
Parallelism

Page 91: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Why hierarchy works

[Figure: probability of reference versus address (0 to 2^n - 1): accesses cluster in a small region of the address space]

The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.

Page 92: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Memory Hierarchy: How Does it Work?

[Figure: blocks (Blk X, Blk Y) move between the upper-level memory (closer to the processor) and the lower-level memory]

Temporal Locality (locality in time): keep the most recently accessed data items closer to the processor
Spatial Locality (locality in space): move blocks consisting of contiguous words to the upper levels

Page 93: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Memory Hierarchy: Terminology

Hit: the data appears in some block in the upper level of the hierarchy (example: Block X is found in the L1 cache)
Hit Rate: the fraction of memory accesses found in the upper level
Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss

Miss: the data needs to be retrieved from a block in the lower level of the hierarchy (Block Y is not in the L1 cache and must be fetched from main memory)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor

Hit Time << Miss Penalty

Page 94: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Memory Hierarchy of a Modern Computer System

[Figure: the processor (Control, Datapath, Registers) with an on-chip cache, backed by a second-level cache (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (tape). Speeds run from ~1s of ns (registers) and 10s-100s of ns (caches) out to 10,000,000s of ns (10s of ms, disk) and 10,000,000,000s of ns (10s of sec, tape); sizes run from 100s of bytes (registers) through Ks and Ms (caches) to Gs (disk) and Ts (tape).]

By taking advantage of the principle of locality:
Present the user with as much memory as is available in the cheapest technology.
Provide access at the speed offered by the fastest technology.

Page 95: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

How is the hierarchy managed?

Registers <-> Memory: by the compiler (programmer?)

Cache <-> Memory: by the hardware

Memory <-> Disks: by the hardware and the operating system (disk caches & virtual memory), and by the programmer (files)

Page 96: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Memory Hierarchy Technology

Random Access:
"Random" is good: access time is the same for all locations
DRAM: Dynamic Random Access Memory
High density, low power, cheap, slow
Dynamic: needs to be "refreshed" regularly (1-2% of cycles)
SRAM: Static Random Access Memory
Low density, high power, expensive, fast
Static: content will last "forever" (until power is lost)

"Not-so-random" Access Technology:
Access time varies from location to location and from time to time
Examples: disk, CD-ROM

Sequential Access Technology: access time linear in location (e.g., tape)

We will concentrate on random access technology:
Main Memory: DRAMs; Caches: SRAMs

Page 97: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Main Memory Background

Performance of Main Memory:
Latency: cache miss penalty
Access time: time between the request and the word arriving
Cycle time: time between requests
Bandwidth: I/O & large block miss penalty (L2)

Main Memory is DRAM: Dynamic Random Access Memory
Dynamic since it needs to be refreshed periodically (8 ms)
Addresses divided into 2 halves (memory as a 2D matrix):
RAS or Row Access Strobe
CAS or Column Access Strobe

Cache uses SRAM: Static Random Access Memory
No refresh (6 transistors/bit vs. 1 transistor)
Size: DRAM/SRAM 4-8; Cost/Cycle time: SRAM/DRAM 8-16

Page 98: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Random Access Memory (RAM) Technology

Why do computer designers need to know about RAM technology?
Processor performance is usually limited by memory bandwidth
As IC densities increase, lots of memory will fit on the processor chip
Tailor on-chip memory to specific needs:
- Instruction cache
- Data cache
- Write buffer

What makes RAM different from a bunch of flip-flops?
Density: RAM is much denser

Page 99: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Main Memory Deep Background

“Out-of-Core”, “In-Core,” “Core Dump”?

“Core memory”?

Non-volatile, magnetic

Lost to 4 Kbit DRAM (today using 64Mbit DRAM)

Access time 750 ns, cycle time 1500-3000 ns

Page 100: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Static RAM Cell: 6-Transistor SRAM Cell

[Figure: cross-coupled inverters with access transistors connecting to the bit and bit-bar lines, selected by the word (row select) line; the pull-up transistors may be replaced with resistive pull-ups to save area]

Write:
1. Drive the bit lines (bit = 1, bit-bar = 0)
2. Select row

Read:
1. Precharge bit and bit-bar to Vdd or Vdd/2 => make sure they are equal!
2. Select row
3. Cell pulls one line low
4. Sense amp on the column detects the difference between bit and bit-bar

Page 101: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Typical SRAM Organization: 16-word x 4-bit

[Figure: a 16 x 4 array of SRAM cells. The address decoder (A0-A3) drives word lines Word 0 through Word 15; each of the 4 bit columns has a write driver & precharger (inputs Din 0-3, controls WrEn and Precharge) and a sense amp producing outputs Dout 0-3.]

Page 102: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Logic Diagram of a Typical SRAM

[Figure: a 2^N words x M bit SRAM with an N-bit address input A, an M-bit data pin D, and control inputs WE_L and OE_L]

Write Enable is usually active low (WE_L)

Din and Dout are combined to save pins:
A new control signal, output enable (OE_L), is needed
WE_L asserted (Low), OE_L deasserted (High): D serves as the data input pin
WE_L deasserted (High), OE_L asserted (Low): D is the data output pin
Both WE_L and OE_L asserted: the result is unknown. Don't do that!!!

Page 103: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Typical SRAM Timing

[Figure: the same 2^N words x M bit SRAM with address A, data pin D, WE_L, and OE_L]

Write timing: the write address and input data must be valid for the write setup time before WE_L is asserted and held for the write hold time afterward.

Read timing: with OE_L asserted, data out becomes valid one read access time after the read address is presented; otherwise D is high-Z.

Page 104: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

1-Transistor Memory Cell (DRAM)

[Figure: a single access transistor, gated by the row select line, connects a storage capacitor to the bit line]

Write:
1. Drive the bit line
2. Select row

Read:
1. Precharge the bit line to Vdd
2. Select row
3. Cell and bit line share charges (very small voltage changes on the bit line)
4. Sense (fancy sense amp): can detect changes of ~1 million electrons
5. Write: restore the value

Refresh:
1. Just do a dummy read to every cell.

Page 105: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Classical DRAM Organization (square)

[Figure: a square RAM cell array. The row decoder takes the row address and drives the word (row) select lines; the column selector & I/O circuits take the column address and select among the bit (data) lines; each intersection represents a 1-T DRAM cell.]

Row and column address together select 1 bit at a time

Page 106: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Logic Diagram of a Typical DRAM

[Figure: a 256K x 8 DRAM with a 9-bit multiplexed address input A, an 8-bit data pin D, and controls RAS_L, CAS_L, WE_L, OE_L]

Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low

Din and Dout are combined (D):
WE_L asserted (Low), OE_L deasserted (High): D serves as the data input pin
WE_L deasserted (High), OE_L asserted (Low): D is the data output pin

Row and column addresses share the same pins (A):
RAS_L goes low: pins A are latched in as the row address
CAS_L goes low: pins A are latched in as the column address
RAS/CAS are edge-sensitive

Page 107: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

DRAM Read Timing

[Figure: the 256K x 8 DRAM (A, D, RAS_L, CAS_L, WE_L, OE_L). RAS_L falls with the row address on A, then CAS_L falls with the column address; after the read access time (plus an output enable delay from OE_L), data appears on D, which is otherwise high-Z. The figure also marks the DRAM read cycle time.]

Every DRAM access begins at the assertion of RAS_L
2 ways to read: early or late v. CAS
Early read cycle: OE_L asserted before CAS_L
Late read cycle: OE_L asserted after CAS_L

Page 108: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

DRAM Write Timing

[Figure: the 256K x 8 DRAM (A, D, RAS_L, CAS_L, WE_L, OE_L). RAS_L falls with the row address on A, then CAS_L falls with the column address; data in must be valid on D for the WR access time. The figure also marks the DRAM write cycle time.]

Every DRAM access begins at the assertion of RAS_L
2 ways to write: early or late v. CAS
Early write cycle: WE_L asserted before CAS_L
Late write cycle: WE_L asserted after CAS_L

Page 109: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Key DRAM Timing Parameters

tRAC: minimum time from RAS line falling to the valid data output.

Quoted as the speed of a DRAM A fast 4Mb DRAM tRAC = 60 ns

tRC: minimum time from the start of one row access to the start of the next.

tRC = 110 ns for a 4Mbit DRAM with a tRAC of 60 ns

tCAC: minimum time from CAS line falling to valid data output.

15 ns for a 4Mbit DRAM with a tRAC of 60 ns

tPC: minimum time from the start of one column access to the start of the next.

35 ns for a 4Mbit DRAM with a tRAC of 60 ns

Page 110: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

DRAM PerformanceA 60 ns (tRAC) DRAM can

perform a row access only every 110 ns (tRC)

perform column access (tCAC) in 15 ns, but time between column accesses is at least 35 ns (tPC).

In practice, external address delays and turning around buses make it 40 to 50 ns

These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead.

Drive parallel DRAMs, external memory controller, bus to turn around, SIMM module, pins…

180 ns to 250 ns latency from processor to memory is good for a “60 ns” (tRAC) DRAM

Page 111: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Main Memory Performance

Simple: CPU, cache, bus, and memory all the same width (32 bits)

Wide: CPU/Mux 1 word; Mux/cache, bus, memory N words (Alpha: 64 bits & 256 bits)

Interleaved: CPU, cache, bus 1 word; memory N modules (4 modules); the example is word interleaved

Page 112: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Main Memory Performance

[Figure: timeline showing that the access time is only part of the cycle time]

DRAM (read/write) cycle time >> DRAM (read/write) access time (2:1; why?)

DRAM (read/write) cycle time: how frequently can you initiate an access?
Analogy: a little kid can only ask his father for money on Saturday

DRAM (read/write) access time: how quickly will you get what you want once you initiate an access?
Analogy: as soon as he asks, his father will give him the money

DRAM bandwidth limitation analogy: what happens if he runs out of money on Wednesday?

Page 113: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Increasing Bandwidth - Interleaving

Access pattern without interleaving: the CPU starts the access for D1, waits until D1 is available, and only then starts the access for D2.

Access pattern with 4-way interleaving: accesses to Bank 0, Bank 1, Bank 2, and Bank 3 are started one after another; by the time Bank 3 has been accessed, Bank 0 is ready to be accessed again.

[Figure: CPU connected to Memory Bank 0 through Memory Bank 3]

Page 114: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Main Memory Performance

Word addresses interleaved across four banks:
Bank 0: 0, 4, 8, 12
Bank 1: 1, 5, 9, 13
Bank 2: 2, 6, 10, 14
Bank 3: 3, 7, 11, 15

Timing model: 1 cycle to send the address, 4 cycles access time, 10-cycle memory cycle time, 1 cycle to send the data; a cache block is 4 words

Simple M.P.      = 4 x (1 + 10 + 1) = 48
Wide M.P.        = 1 + 10 + 1       = 12
Interleaved M.P. = 1 + 10 + 1 + 3   = 15
(a code sketch of these three miss penalties follows)
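A sketch of the three miss penalties under the timing model above (1 cycle to send the address, 10-cycle memory time per access, 1 cycle per word on the bus, 4-word blocks):

# Miss penalty (in bus clocks) for a 4-word cache block under the slide's timing model.
send_addr, mem_time, send_word, words = 1, 10, 1, 4

simple      = words * (send_addr + mem_time + send_word)      # one word at a time
wide        = send_addr + mem_time + send_word                # whole block at once
interleaved = send_addr + mem_time + send_word + (words - 1)  # banks overlap; 1 extra cycle per remaining word

print(simple, wide, interleaved)   # 48 12 15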

Page 115: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Independent Memory Banks

How many banks?
number of banks >= number of clocks to access a word in a bank
(for sequential accesses; otherwise the CPU returns to the original bank before it has the next word ready)

Increasing DRAM capacity => fewer chips => harder to have banks

Growth in bits/chip of DRAM: 50%-60%/yr
Nathan Myhrvold (Microsoft): mature software growth (33%/yr for NT) vs. growth in MB/$ of DRAM (25%-30%/yr)

Page 116: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Fewer DRAMs/System over Time

[Figure: minimum PC memory size versus DRAM generation ('86: 1 Mb, '89: 4 Mb, '92: 16 Mb, '96: 64 Mb, '99: 256 Mb, '02: 1 Gb). As the minimum memory size grows from 4 MB to 256 MB, the number of DRAM chips needed per system keeps shrinking (32, 16, 8, 4, 2, down to 1 chip per system along the diagonal).]

Memory per System grows @ 25%-30% / year
Memory per DRAM grows @ 60% / year

(from Pete MacWilliams, Intel)

Page 117: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Fast Page Mode Operation

Regular DRAM organization:
N rows x N columns x M bits
Read & write M bits at a time
Each M-bit access requires a RAS / CAS cycle

Fast Page Mode DRAM:
An N x M "SRAM" (row register) saves a row
After a row is read into the register, only CAS is needed to access other M-bit blocks on that row
RAS_L remains asserted while CAS_L is toggled

[Figure: the column address selects successive M-bit blocks out of the N x M row register; the timing diagram shows one RAS_L assertion (row address) followed by four CAS_L pulses (column addresses) for the 1st through 4th M-bit accesses]

Page 118: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

FP Mode DRAM

Fast page mode DRAM: in page mode, a row of the DRAM can be kept "open", so that successive reads or writes within the row do not suffer the delay of precharge and accessing the row. This increases the performance of the system when reading or writing bursts of data.

Page 119: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Key DRAM Timing ParameterstRAC: minimum time from RAS line falling to the valid data output.

Quoted as the speed of a DRAM A fast 4Mb DRAM tRAC = 60 ns

tRC: minimum time from the start of one row access to the start of the next.

tRC = 110 ns for a 4Mbit DRAM with a tRAC of 60 ns

tCAC: minimum time from CAS line falling to valid data output.

15 ns for a 4Mbit DRAM with a tRAC of 60 ns

tPC: minimum time from the start of one column access to the start of the next.

35 ns for a 4Mbit DRAM with a tRAC of 60 ns

Page 120: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

SDRAM: Synchronous DRAM

More complicated, on-chip controller
Operations synchronized to a clock
So: give the row address in one cycle
The column address some number of cycles later (say 3)
The data comes out later still (say 2 cycles later)

Burst modes:
Typical might be 1, 2, 4, 8, or 256-length bursts
Thus, RAS and CAS are given only once for all of these accesses

Multi-bank operation (on-chip interleaving):
Lets you overlap the startup latency (5 cycles above) of two banks

Careful of timing specs!
A "10 ns" SDRAM may still require 50 ns to get the first data!
A "50 ns" DRAM means the first data is out in 50 ns

Page 121: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Other Types of DRAM

Extended data out (EDO) DRAM: similar to Fast Page Mode DRAM, with the additional feature that a new access cycle can be started while keeping the data output of the previous cycle active. This allows a certain amount of overlap in operation (pipelining), allowing somewhat improved speed. It was 5% faster than Fast Page Mode DRAM, which it began to replace in 1993.

Page 122: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Other Types of DRAM

Double data rate (DDR) SDRAM: a later development of SDRAM, used in PC memory from 2000 onwards. All types of SDRAM use a clock signal that is a square wave: the clock alternates regularly between one voltage (low) and another (high), usually millions of times per second. Plain SDRAM, like most synchronous logic circuits, acts on the low-to-high transition of the clock and ignores the opposite transition. DDR SDRAM acts on both transitions, thereby halving the required clock rate for a given data transfer rate.

Page 123: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Memory Systems: Delay more than RAW DRAM

[Figure: the n-bit address from the bus goes through a memory timing controller and a DRAM controller, then through bus drivers to an array of 2^n x 1 DRAM chips, w bits wide]

Tc = Tcycle + Tcontroller + Tdriver

Page 124: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

DRAMs over Time

DRAM Generation (1st gen. sample):  '84    '87    '90    '93    '96    '99
Memory Size:                        1 Mb   4 Mb   16 Mb  64 Mb  256 Mb 1 Gb
Die Size (mm2):                     55     85     130    200    300    450
Memory Area (mm2):                  30     47     72     110    165    250
Memory Cell Area (µm2):             28.84  11.1   4.26   1.64   0.61   0.23

(from Kazuhiro Sakashita, Mitsubishi)

Page 125: OMSE 510: Computing Foundations 2: Disks, Buses, DRAM

Summary

Two different types of locality:
Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.

By taking advantage of the principle of locality:
Present the user with as much memory as is available in the cheapest technology.
Provide access at the speed offered by the fastest technology.

DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system
SRAM is fast but expensive and not very dense: a good choice for providing the user FAST access time