3.1 Memory

Upload: kaylaroberts
Posted on 03-Apr-2018

TRANSCRIPT

  • 7/28/2019 3.1Memory

    1/62

Chapter 5. The Memory System

• 2/62

    Overview

    Basic memory circuits

    Organization of the main memory

Cache memory concept

Virtual memory mechanism

    Secondary storage

• 3/62

    Some Basic Concepts

• 4/62

    Basic Concepts

The maximum size of the memory that can be used in any computer is determined by the addressing scheme: 16-bit addresses give 2^16 = 64K memory locations (32-bit: 4 GB, 40-bit: 1 TB).

Most modern computers are byte-addressable.

In a 32-bit memory address, the higher-order 30 bits determine the word to access.

[Figure: byte and word addresses in a byte-addressable 32-bit organization. Word addresses are 0, 4, 8, ..., 2^k - 4. (a) Big-endian assignment: within word 0 the byte addresses run 0, 1, 2, 3 starting at the most significant byte. (b) Little-endian assignment: within word 0 the byte addresses run 3, 2, 1, 0, so byte address 0 names the least significant byte.]
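The two byte-address assignments can be checked with a short Python sketch (not from the slides; the word value 0x12345678 is just an illustrative example):

```python
# How one 32-bit word, 0x12345678, is laid out at byte addresses 0..3
# under the big-endian and little-endian assignments shown above.
word = 0x12345678

big = word.to_bytes(4, "big")        # big-endian: MSB at the lowest address
little = word.to_bytes(4, "little")  # little-endian: LSB at the lowest address

for addr in range(4):
    print(f"byte address {addr}: big-endian 0x{big[addr]:02X}, "
          f"little-endian 0x{little[addr]:02X}")
```

The same word therefore reads back differently if it is written byte-by-byte on a machine of the other endianness.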

• 5/62

    Traditional Architecture

Figure 5.1. Connection of the memory to the processor: the processor's MAR drives a k-bit address bus and its MDR an n-bit data bus; control lines (R/W, MFC, etc.) complete the connection to a memory of up to 2^k addressable locations, with word length n bits.

Note: k can be smaller or larger than n. With k = 64, the address space is 2^64, roughly 1.8 x 10^19 locations (tera -> peta -> exa).

• 6/62

    Basic Concepts

Block transfer - bulk data transfer

Memory access time - the time between Read and MFC

Memory cycle time - the time between two successive R/W operations

RAM - any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address

    Cache memory

• 7/62

    Semiconductor RAM

    Memories

• 8/62

Internal Organization of Memory Chips

[Figure 5.2. Organization of bit cells in a memory chip: a 4-bit address (A0-A3) feeds an address decoder that selects one of 16 word lines (W0-W15); each row of memory cells (FF) connects through bit lines b7 ... b1, b0 to Sense/Write circuits, which drive the data input/output lines; CS and R/W control the chip.]

16 words of 8 bits each: a 16x8 memory organization. It has 16 external connections: address 4, data 8, control 2, power/ground 2.

1K memory cells organized as 128x8: external connections? 19 (7 + 8 + 2 + 2)

Organized as 1Kx1? 15 (10 + 1 + 2 + 2)
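The pin-count arithmetic above can be sketched as a small Python function (an illustration, not from the slides; it assumes the same 2 control and 2 power/ground pins as the examples):

```python
import math

def external_connections(words, bits_per_word):
    """Pin count for a words x bits_per_word memory chip:
    address lines + data lines + 2 control + 2 power/ground."""
    address_lines = math.ceil(math.log2(words))
    return address_lines + bits_per_word + 2 + 2

print(external_connections(16, 8))    # 16x8:  4 + 8 + 2 + 2 = 16
print(external_connections(128, 8))   # 128x8: 7 + 8 + 2 + 2 = 19
print(external_connections(1024, 1))  # 1Kx1: 10 + 1 + 2 + 2 = 15
```

For a fixed number of cells, a narrower organization trades data pins for address pins.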

• 9/62

    A Memory Chip

[Figure 5.3. Organization of a 1K x 1 memory chip: the 10-bit address splits into a 5-bit row address, decoded (5-bit decoder) to select one of word lines W0-W31 in a 32x32 memory cell array, and a 5-bit column address driving a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry, CS, and R/W complete the chip, with a single data input/output line.]

• 10/62

    Static Memories

The circuits are capable of retaining their state as long as power is applied.

[Figure 5.4. A static RAM cell: a latch of cross-coupled inverters is connected through transistors T1 and T2 to the bit lines b and b'; the word line (gated by the X and Y selects) turns T1 and T2 on for an access.]

• 11/62

    Figure A.15. CMOS realization of a NOT gate.

    (b) Truth table and transistor states

• 12/62

    Static Memories

    CMOS cell: low power consumption

• 13/62

Asynchronous DRAMs

Static RAMs are fast, but they occupy more area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely - they need to be periodically refreshed.

[Figure 5.6. A single-transistor dynamic memory cell: a capacitor C stores the bit; transistor T connects it to the bit line when the word line is asserted.]

• 14/62

    A Dynamic Memory Chip

[Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip: a 4096 x (512 x 8) cell array; the multiplexed address lines A20-9 / A8-0 feed a row address latch and row decoder (strobed by RAS) and a column address latch and column decoder (strobed by CAS); Sense/Write circuits, CS, and R/W connect the selected byte to data lines D7-D0.]

• 15/62

    Fast Page Mode

When the DRAM on the previous slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0.

Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row.

    A latch is added at the output of the senseamplifier in each column.

    Good for bulk transfer.

• 16/62

    Synchronous DRAMs

The operations of an SDRAM are controlled by a clock signal.

[Figure 5.8. Synchronous DRAM: the Row/Column address feeds a row address latch and row decoder and a column address counter and column decoder; a refresh counter, mode register, and timing control are driven by Clock, RAS, CAS, R/W, and CS; Read/Write circuits and latches connect the cell array to data input and data output registers on the Data lines.]

• 17/62

    Synchronous DRAMs

[Figure 5.9. Burst read of length 4 in an SDRAM: the Row address is presented with RAS and the Column address with CAS, R/W is held for a read, and the four data words D0, D1, D2, D3 appear on successive clock cycles.]

• 18/62

    Synchronous DRAMs

No CAS pulses are needed in burst operation.

Refresh circuits are included (refresh every 64 ms).

Clock frequency > 100 MHz (Intel PC100 and PC133)

DDR - double data rate

• 19/62

    Latency and Bandwidth

The speed and efficiency of data transfers among memory, processor, and disk have a large impact on the performance of a computer system.

Memory latency - the amount of time it takes to transfer a word of data to or from the memory.

Memory bandwidth - the number of bits or bytes that can be transferred in one second. It is used to measure how much time is needed to transfer an entire block of data.

• 20/62

    DDR SDRAM Double-Data-Rate SDRAM

Standard SDRAM performs all actions on the rising edge of the clock signal.

DDR SDRAM accesses the cell array in the same way, but transfers data on both edges of the clock.

The cell array is organized in two banks. Each can be accessed separately.

DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.

SDRAM: 10 ns, 100 MHz. DDR: 7.5/6/5 ns, 133/166/200 MHz. DDR2/3: 4 words or 8 words per read.
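The "both edges" point can be put in numbers with a small sketch (an illustration, not from the slides; it computes peak transfer rates only and ignores row-activation latency):

```python
def peak_transfers_per_second(clock_mhz, transfers_per_cycle):
    # Peak data-transfer rate: clock frequency times transfers per cycle.
    return clock_mhz * 1e6 * transfers_per_cycle

sdram = peak_transfers_per_second(100, 1)  # standard SDRAM: rising edge only
ddr = peak_transfers_per_second(133, 2)    # DDR-133: both clock edges

print(f"SDRAM-100: {sdram / 1e6:.0f} M transfers/s")
print(f"DDR-133:   {ddr / 1e6:.0f} M transfers/s")
```

Doubling transfers per cycle doubles peak bandwidth without raising the cell-array access rate.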

• 21/62

    Structures of Larger Memories

[Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips: the 21-bit address (A0-A20) splits into a 19-bit internal chip address and 2 bits driving a 2-bit decoder that generates the chip selects; the four 512K x 8 chips in each selected row supply the 8-bit slices D31-24, D23-16, D15-8, D7-0 of the 32-bit word; 2M words, 32 bits each.]

• 22/62

Memory System Considerations

The choice of a RAM chip for a given application depends on several factors:

Cost, speed, power, size

SRAMs are faster, more expensive, smaller.

DRAMs are slower, cheaper, larger.

Which one for cache and main memory, respectively?

Refresh overhead - suppose an SDRAM whose cells are organized in 8K rows; 4 clock cycles are needed to access each row; then it takes 8192 x 4 = 32,768 cycles to refresh all rows; at a clock rate of 133 MHz this takes 32,768 / (133 x 10^6) = 246 x 10^-6 seconds; with a typical refresh period of 64 ms, the refresh overhead is 0.246/64, roughly 0.4%.
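The refresh-overhead calculation above can be reproduced directly (same numbers as the slide):

```python
rows = 8 * 1024            # 8K rows
cycles_per_row = 4         # clock cycles to access one row
clock_hz = 133e6           # 133 MHz clock
refresh_period_s = 64e-3   # refresh every 64 ms

refresh_cycles = rows * cycles_per_row        # 32,768 cycles
refresh_time_s = refresh_cycles / clock_hz    # ~246 microseconds
overhead = refresh_time_s / refresh_period_s  # ~0.4%

print(f"refresh time: {refresh_time_s * 1e6:.0f} us, "
      f"overhead: {overhead:.2%}")
```

So refresh steals well under one percent of the memory's time at this clock rate.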

• 23/62

    Memory Controller

[Figure 5.11. Use of a memory controller: the processor sends Address, R/W, Request, and Clock to the memory controller, which generates the multiplexed Row/Column address, RAS, CAS, R/W, CS, and Clock for the memory; Data passes directly between processor and memory.]

• 24/62

    Read-Only Memories

• 25/62

Read-Only Memory

Volatile / non-volatile memory

ROM, PROM, EPROM, EEPROM

[Figure 5.12. A ROM cell: transistor T connects the bit line to ground through point P when the word line is activated; the connection at P is present to store a 0 and absent to store a 1.]

• 26/62

    Flash Memory

    Similar to EEPROM

Difference: it is only possible to write an entire block of cells, not a single cell

    Low power

Used in portable equipment. Implementations of such modules:

    Flash cards

    Flash drives

• 27/62

    Speed, Size, and Cost

[Figure 5.13. Memory hierarchy: registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; moving down the hierarchy, size increases while speed and cost per bit decrease.]

• 28/62

    Cache Memories

• 29/62

    Cache

What is a cache?

Why do we need it?

Locality of reference (very important)

- temporal
- spatial

Cache block - cache line

• 30/62

    Cache

Replacement algorithm

Hit / miss

Write-through / write-back

Load-through

[Figure 5.14. Use of a cache memory: Processor - Cache - Main memory.]

• 31/62

    Direct Mapping

[Figure 5.15. Direct-mapped cache: the cache holds 128 blocks of 16 words each (2K words); addresses are 16 bits; main memory holds 64K words (4K blocks of 16 words each); main-memory block j maps to cache block j mod 128; the memory address splits into a 5-bit tag, a 7-bit block field, and a 4-bit word field.]
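The 5/7/4 address split of this direct-mapped cache can be sketched in Python (an illustration, not from the slides; the address 0x1234 is just an example):

```python
def direct_mapped_fields(addr):
    """Split a 16-bit word address into the (tag, block, word) fields
    of a direct-mapped cache with 128 blocks of 16 words."""
    word = addr & 0xF           # low 4 bits: word within the 16-word block
    block = (addr >> 4) & 0x7F  # next 7 bits: cache block, i.e. j mod 128
    tag = addr >> 11            # high 5 bits: tag, i.e. j div 128
    return tag, block, word

# Example: address 0x1234 lies in main-memory block 0x123 = 291,
# and 291 mod 128 = 35, with tag 291 div 128 = 2.
print(direct_mapped_fields(0x1234))
```

The block field is exactly j mod 128 because the low 7 bits of the block number select the cache line.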

• 32/62

    Associative Mapping

[Figure 5.16. Associative-mapped cache: a main-memory block can be placed anywhere in the cache, so the memory address splits into a 12-bit tag and a 4-bit word field; on an access the entire cache is searched, comparing the tag against every block.]

• 33/62

    Set-Associative Mapping

[Figure 5.17. Set-associative-mapped cache with two blocks per set: the 128 cache blocks form 64 sets (Set 0 ... Set 63) of two blocks each; main-memory block j maps to set j mod 64; the memory address splits into a 6-bit tag, a 6-bit set field, and a 4-bit word field.]

Two or more blocks make a set.

A block may be placed in any one of the blocks of its set. Contention is minimized - there is a chance for either of the two blocks.

The search space is reduced (search 64 sets rather than 128 blocks).

• 34/62

    Cache Validity

Valid bit - indicates whether the data in a block is valid

Set to 0 when the system is powered on

Set to 0 when the block is in the cache but a DMA transfer into memory bypasses the cache

Set to 1 when the block is loaded with valid data

A similar issue arises for a DMA write from memory to disk: the memory may hold stale data (the cache has newer values)

Flush the cache before the DMA write

• 35/62

    Replacement Algorithms

Difficult to determine which blocks to kick out

Least Recently Used (LRU) block

The cache controller tracks references to all blocks as computation proceeds.

Increment / clear the tracking counters when a hit / miss occurs

    FIFO

• 36/62

    Performance Considerations

• 37/62

    Overview

Two key factors: performance and cost

Price/performance ratio

Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.

For a memory hierarchy, it is beneficial if transfers to and from the slower units can be done at a rate equal to that of the faster unit.

This is not possible if both the slow and the fast units are accessed in the same manner.

However, it can be achieved when parallelism is used in the organization of the slower unit.

• 38/62

    Interleaving

If the main memory is structured as a collection of physically separated modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.

[Figure 5.25. Addressing multiple-module memory systems. (a) Consecutive words in a module: the high-order k bits of the memory address select the module and the remaining m bits give the address within the module. (b) Consecutive words in consecutive modules: the low-order k bits select the module (0 ... 2^k - 1) and the high-order m bits give the address within the module.]

• 39/62

Hit Rate and Miss Penalty

The success rate in accessing information at various levels of the memory hierarchy - hit rate / miss rate.

Ideally, the entire memory hierarchy would appear to the processor as a single memory unit that has the access time of a cache on the processor chip and the size of a magnetic disk - this depends on the hit rate (>> 0.9).

A miss causes extra time to be needed to bring the desired information into the cache.

Example 5.2, page 332.

• 40/62

    Example

h - hit rate

M - miss penalty, the time to access main memory

C - time to access the cache

tavg = hC + (1 - h)M

No cache: time to get one word = 1 + 8 + 1 cycles.

Cache with 8-word blocks: time to get one block = 1 + 8 + 4 + 4 cycles (send the address, access the first word in all 4 interleaved modules simultaneously, transfer 4 words while the next 4 are accessed, then 4 cycles for the next 4 words).

1 cycle - send the address to memory

8 cycles - first word access

4 cycles - subsequent words

1 cycle - send a word to the cache
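The cycle counts and the tavg formula above can be combined in a short sketch (the 95% hit rate and 1-cycle cache time are assumed illustrative values, not from the slide):

```python
def t_avg(h, C, M):
    # Average access time: t_avg = h*C + (1 - h)*M
    return h * C + (1 - h) * M

# Cycle counts from the slide, with 4 interleaved memory modules:
no_cache_word = 1 + 8 + 1     # send address + access + send word = 10
block_miss = 1 + 8 + 4 + 4    # send address + first access (all 4 modules
                              # in parallel) + 4 words + next 4 words = 17

print(no_cache_word, block_miss)
print(t_avg(0.95, 1, block_miss))  # assumed h = 0.95, C = 1 cycle
```

Even with a 17-cycle miss penalty, a high hit rate keeps the average access close to the cache's own speed.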

• 41/62

    How to Improve Hit Rate?

Use a larger cache

Increase the block size while keeping the total cache size constant.

However, if the block size is too large, some items may not be referenced before the block is replaced - the miss penalty increases.

Load-through approach


• 42/62

Caches on the Processor Chip

On chip vs. off chip

Two separate caches for instructions and data, respectively, or a single cache for both

Which one has the better hit rate?

What is the benefit of separate caches?

Level 1 and Level 2 caches

L1 cache - faster and smaller. Access more than one word simultaneously and let the processor use them one at a time.

L2 cache - slower and larger.

How about the average access time?

Average access time:

tave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M

where h1 and h2 are the hit rates of the L1 and L2 caches, C1 and C2 their access times, and M the miss penalty to main memory.
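The two-level formula can be tried with assumed illustrative numbers (1-cycle L1, 10-cycle L2, 100-cycle main memory; none of these come from the slides):

```python
def t_ave(h1, h2, C1, C2, M):
    # Two-level average access time:
    # t_ave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

print(t_ave(h1=0.95, h2=0.90, C1=1, C2=10, M=100))
```

The L2 cache turns most L1 misses into 10-cycle events instead of 100-cycle trips to memory.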

• 43/62

    Other Enhancements

Write buffer - the processor doesn't need to wait for the memory write to be completed

Prefetching - prefetch data into the cache before it is needed

Lockup-free cache - the processor is able to access the cache while a miss is being serviced

• 44/62

    Virtual Memories

• 45/62

Overview

Physical main memory is not as large as the address space spanned by an address issued by the processor.

2^32 = 4 GB, 2^64 = 16 EB

When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices.

Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques.

Virtual addresses will be translated into physical addresses.


• 46/62

    Overview

• 47/62

    Address Translation

All programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory.

A page cannot be too small or too large.

The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage - similar to a cache.

• 48/62

    Address Translation

[Figure 5.27. Virtual-memory address translation: the virtual address from the processor splits into a virtual page number and an offset; the page table base register plus the virtual page number locate the page table entry, which holds control bits (valid bit, modified bit, read-only bit, ...) and the page frame; the page frame concatenated with the offset gives the physical address in main memory.]

• 49/62

    Address Translation

The page table information is used by the MMU for every access, so it is supposed to be kept with the MMU.

However, since the MMU is on the processor chip and the page table is rather large, only a small portion of it - the page table entries that correspond to the most recently accessed pages - can be accommodated within the MMU.

Translation Lookaside Buffer (TLB)

• 50/62

    TLB

[Figure 5.28. Use of an associative-mapped TLB: the virtual address from the processor splits into a virtual page number and an offset; the virtual page number is compared (=?) against the entries in the TLB; on a hit, the matching entry's control bits and page frame are read out and the page frame plus the offset form the physical address in main memory; on a miss, the page table in memory must be consulted.]

• 51/62

TLB

The contents of the TLB must be coherent with the contents of the page tables in memory.

Translation procedure

Page fault

Page replacement

Write-through is not suitable for virtual memory.

Locality of reference in virtual memory

• 52/62

Memory Management Requirements

    Multiple programs

    System space / user space

    Protection (supervisor / user state,privileged instructions)

    Shared pages

• 53/62

    Secondary Storage

• 54/62

    Magnetic Hard Disks

    Disk

    Disk drive

    Disk controller

• 55/62

    Organization of Data on a Disk

[Figure 5.30. Organization of one surface of a disk: concentric tracks divided into sectors, e.g. sector 0, track 0; sector 3, track n; sector 0, track 1.]

• 56/62

    Access Data on a Disk

Sector header

Following the data, there is an error-correction code (ECC).

Formatting process

Difference between inner tracks and outer tracks

Access time = seek time + rotational delay (latency time)

Data buffer / cache
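The access-time sum above can be sketched numerically (an illustration, not from the slides; the 9 ms seek time and 7200 RPM are assumed example figures, and the average rotational delay is taken as half a revolution):

```python
def avg_access_time_ms(seek_ms, rpm):
    # Access time = seek time + rotational delay; on average the head
    # waits half a revolution for the target sector.
    half_rev_ms = 0.5 * 60_000 / rpm  # 60,000 ms per minute
    return seek_ms + half_rev_ms

# Assumed example: 9 ms average seek on a 7200 RPM drive.
print(f"{avg_access_time_ms(9, 7200):.1f} ms")
```

Data transfer time would be added on top once the sector reaches the head.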

• 57/62

    Disk Controller

[Figure 5.31. Disks connected to the system bus: the processor and main memory attach to the system bus, as does a disk controller serving two disk drives.]

• 58/62

    Disk Controller

    Seek

    Read

    Write

    Error checking

• 59/62

    RAID Disk Arrays

Redundant Array of Inexpensive Disks

Using multiple disks makes huge storage cheaper and also makes it possible to improve the reliability of the overall system.

RAID 0 - data striping

RAID 1

RAID 2, 3, 4

• 60/62

Optical Disks

[Figure 5.32. Optical disk. (a) Cross-section: label, acrylic, aluminum, and polycarbonate plastic layers; a source/detector pair reads the surface - a pit gives no reflection, a land gives reflection. (b) A transition from pit to land (or land to pit) represents a 1. (c) Stored binary pattern: 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0.]

• 61/62

    Optical Disks

    CD-ROM

    CD-Recordable (CD-R)

    CD-ReWritable (CD-RW)

    DVD

    DVD-RAM

• 62/62

    Magnetic Tape Systems

[Figure 5.33. Organization of data on magnetic tape: 7 or 9 bits are recorded across the tape; records are separated by record gaps; files are separated by file gaps and marked by file marks.]