3.1 Memory
Chapter 5. The Memory System
Overview
Basic memory circuits
Organization of the main memory
Cache memory concept
Virtual memory mechanism
Secondary storage
Some Basic Concepts
Basic Concepts
The maximum size of the memory that can be used in any computer is determined by the addressing scheme: 16-bit addresses give 2^16 = 64K memory locations (32-bit: 4G, 40-bit: 1T).
Most modern computers are byte addressable.
With a 32-bit memory address, the higher-order 30 bits determine the word to access.
Figure: word and byte addresses for a 32-bit word length. Word addresses are 0, 4, 8, ..., 2^k - 4. (a) Big-endian assignment: within word 0 the byte addresses run 0, 1, 2, 3 from the most significant byte to the least. (b) Little-endian assignment: within word 0 the byte addresses run 3, 2, 1, 0, so byte 0 is the least significant byte.
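As a concrete illustration of the two assignments, here is a small C check; it is only a sketch, and the particular word value 0x01020304 is arbitrary:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t word = 0x01020304;            /* one 32-bit word */
    uint8_t *bytes = (uint8_t *)&word;     /* view the same word as 4 bytes */

    /* On a little-endian machine the lowest byte address holds 0x04 (the
       least significant byte); on a big-endian machine it holds 0x01. */
    for (int i = 0; i < 4; i++)
        printf("byte address +%d : 0x%02X\n", i, bytes[i]);

    printf("%s-endian\n", bytes[0] == 0x04 ? "little" : "big");
    return 0;
}
```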
Traditional Architecture
Figure 5.1. Connection of the memory to the processor: the processor's MAR drives the k-bit address bus and the MDR connects to the n-bit data bus; control lines (R/W, MFC, etc.) coordinate the transfer. The memory has up to 2^k addressable locations, with a word length of n bits.
Note: k can be smaller than n. If k = 64, there are 2^64, roughly 2 x 10^19, addressable locations (tera -> peta -> exa).
Basic Concepts
Block transfer: bulk data transfer.
Memory access time: time between Read and MFC.
Memory cycle time: time between two successive R/W operations.
RAM: any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address.
Cache memory
Semiconductor RAM Memories
Internal Organization of Memory Chips
Figure 5.2. Organization of bit cells in a memory chip: address lines A0-A3 feed an address decoder that drives the word lines W0-W15; each row of FF (flip-flop) memory cells forms one word, and the Sense/Write circuits connect the bit lines b7 ... b1, b0 to the data input/output lines, under control of CS and R/W.
A 16 x 8 organization stores 16 words of 8 bits each. It has 16 external connections: address 4, data 8, control 2, power/ground 2.
With 1K memory cells organized as 128 x 8, the external connections number 19 (7 + 8 + 2 + 2).
Organized as 1K x 1: 15 (10 + 1 + 2 + 2). A small sketch of this arithmetic follows.
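A minimal sketch of the pin-count arithmetic above; the helper name is made up for illustration:

```c
#include <stdio.h>

/* External connections = address pins + data pins + 2 control (CS, R/W)
   + 2 power/ground, as counted on the slide. */
int external_connections(int address_bits, int data_bits) {
    return address_bits + data_bits + 2 + 2;
}

int main(void) {
    printf("16x8  : %d\n", external_connections(4, 8));   /* 16 */
    printf("128x8 : %d\n", external_connections(7, 8));   /* 19 */
    printf("1Kx1  : %d\n", external_connections(10, 1));  /* 15 */
    return 0;
}
```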
A Memory Chip
Figure 5.3. Organization of a 1K x 1 memory chip: a 32 x 32 memory cell array; the 10-bit address is split into a 5-bit row address (decoded to word lines W0-W31) and a 5-bit column address that selects one bit through a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry, CS, and R/W complete the chip, with a single data input/output line.
Static Memories
The circuits are capable of retaining their state as long as power is applied.
Figure 5.4. A static RAM cell: two cross-coupled inverters (points X and Y) are connected to the bit lines b and b' through transistors T1 and T2, which are controlled by the word line.
Figure A.15. CMOS realization of a NOT gate.
(b) Truth table and transistor states
Static Memories
CMOS cell: low power consumption
Asynchronous DRAMs
Static RAMs are fast, but they cost more area and are more expensive.
Dynamic RAMs (DRAMs) are cheap and area efficient, but they cannot retain their state indefinitely; they need to be periodically refreshed.
Figure 5.6. A single-transistor dynamic memory cell: a transistor T connects the storage capacitor C to the bit line, under control of the word line.
A Dynamic Memory Chip
Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip: a 4096 x (512 x 8) cell array; the multiplexed address lines A20-9 / A8-0 are captured into the row and column address latches by the RAS and CAS strobes; the row decoder selects one of 4096 rows and the column decoder selects 8 bits through the Sense/Write circuits onto D7-D0, under control of CS and R/W.
Fast Page Mode
When the DRAM in the last slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0.
Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row.
A latch is added at the output of the sense amplifier in each column.
Good for bulk transfer.
Synchronous DRAMs
The operations of an SDRAM are controlled by a clock signal.
Figure 5.8. Synchronous DRAM: the cell array with a row address latch/decoder and a column address counter/decoder, Read/Write circuits and latches, data input and output registers, a refresh counter, and a mode register and timing control block, all operated under the Clock, CS, RAS, CAS, and R/W signals; the row/column address is multiplexed onto the same input lines.
Synchronous DRAMs
Figure 5.9. Burst read of length 4 in an SDRAM: the row address is strobed in with RAS and the column address with CAS; after the access latency, data words D0, D1, D2, D3 appear on the data lines in four successive clock cycles, with R/W held for a read.
Synchronous DRAMs
No additional CAS pulses are needed during a burst operation.
Refresh circuits are included (refresh every 64 ms).
Clock frequency > 100 MHz
Intel PC100 and PC133
DDR: double data rate
Latency and Bandwidth
The speed and efficiency of data transfers among memory, processor, and disk have a large impact on the performance of a computer system.
Memory latency: the amount of time it takes to transfer a word of data to or from the memory.
Memory bandwidth: the number of bits or bytes that can be transferred in one second. It indicates how much time is needed to transfer an entire block of data.
DDR SDRAM (Double-Data-Rate SDRAM)
Standard SDRAM performs all actions on the rising edge of the clock signal.
DDR SDRAM accesses the cell array in the same way, but transfers data on both edges of the clock.
The cell array is organized in two banks. Each can be accessed separately.
DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.
SDRAM: 10 ns, 100 MHz. DDR: 7.5/6/5 ns, 133/166/200 MHz. DDR2/DDR3: 4 or 8 words per read.
Structures of Larger Memories
Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips: the 21-bit address A20-A0 is split so that the two high-order bits drive a 2-bit decoder that generates the chip-select signal for one row of chips, while the 19-bit internal chip address goes to every 512K x 8 chip; the four chips in the selected row supply the 32-bit data word D31-24, D23-16, D15-8, D7-0, giving 2M words of 32 bits each.
Memory System Considerations
The choice of a RAM chip for a given application depends on several factors:
Cost, speed, power, size
SRAMs are faster, more expensive, smaller.
DRAMs are slower, cheaper, larger.
Which one for cache and main memory, respectively?
Refresh overhead: suppose an SDRAM whose cells are organized in 8K (8192) rows and 4 clock cycles are needed to access each row; then it takes 8192 x 4 = 32,768 cycles to refresh all rows. If the clock rate is 133 MHz, this takes 32,768 / (133 x 10^6) = 246 x 10^-6 seconds. With a typical refreshing period of 64 ms, the refresh overhead is 0.246 / 64, roughly 0.004, i.e. less than 0.5% of the memory's time.
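The same calculation, written out as a small C program (the values are the ones from the slide):

```c
#include <stdio.h>

int main(void) {
    double rows           = 8192;    /* 8K rows of cells */
    double cycles_per_row = 4;       /* clock cycles to refresh one row */
    double clock_hz       = 133e6;   /* 133 MHz clock */
    double refresh_period = 64e-3;   /* refresh every 64 ms */

    double refresh_time = rows * cycles_per_row / clock_hz;  /* ~246 us */
    double overhead     = refresh_time / refresh_period;     /* ~0.004  */

    printf("refresh time = %.0f us, overhead = %.2f%%\n",
           refresh_time * 1e6, overhead * 100.0);
    return 0;
}
```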
Memory Controller
Figure 5.11. Use of a memory controller: the processor sends the full address, R/W, Request, and clock to the memory controller, which generates the multiplexed Row/Column address, RAS, CAS, R/W, CS, and clock for the memory; the data lines connect the processor and the memory directly.
Read-Only Memories
Read-Only-Memory
Volatile / non-volatile memory
ROM
PROM
EPROM
EEPROM
Figure 5.12. A ROM cell: a transistor T can connect the bit line to ground at point P when the word line is activated; the connection is not made to store a 1 and is made to store a 0.
Flash Memory
Similar to EEPROM
Difference: it is only possible to write an entire block of cells, not a single cell
Low power
Used in portable equipment
Implementations of such modules:
Flash cards
Flash drives
Speed, Size, and Cost
Figure 5.13. Memory hierarchy: processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory; going down the hierarchy the size increases, while speed and cost per bit increase going up toward the processor.
Cache Memories
Cache
What is a cache?
Why do we need it?
Locality of reference (very important):
- temporal
- spatial
Cache block = cache line
Cache
Replacement algorithm
Hit / miss
Write-through / Write-back
Load through
Figure 5.14. Use of a cache memory: the cache sits between the processor and the main memory.
Direct Mapping
Figure 5.15. Direct-mapped cache.
Cache: 128 blocks of 16 words each (2K words). Main memory: 64K words with 16-bit addresses, i.e. 4K blocks of 16 words each.
Main memory block j maps to cache block (j mod 128).
The 16-bit main memory address is split into a 5-bit tag, a 7-bit block field, and a 4-bit word field.
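A short sketch of how such an address is decomposed; the field widths (5-bit tag, 7-bit block, 4-bit word) come from the figure, while the sample address is arbitrary:

```c
#include <stdio.h>
#include <stdint.h>

#define WORD_BITS  4   /* 16 words per block */
#define BLOCK_BITS 7   /* 128 cache blocks   */

int main(void) {
    uint16_t addr = 0xABCD;   /* an arbitrary 16-bit word address */

    unsigned word  = addr & ((1u << WORD_BITS) - 1);
    unsigned block = (addr >> WORD_BITS) & ((1u << BLOCK_BITS) - 1);
    unsigned tag   = addr >> (WORD_BITS + BLOCK_BITS);

    /* Main memory block j maps to cache block j mod 128, which is
       exactly the 7-bit block field. */
    printf("tag = %u, cache block = %u, word = %u\n", tag, block, word);
    return 0;
}
```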
Associative Mapping
Figure 5.16. Associative-mapped cache.
A main memory block can be placed anywhere in the cache.
The 16-bit main memory address is split into a 12-bit tag and a 4-bit word field.
On every access the entire cache must be searched (all tags compared).
Set-Associative Mapping
Figure 5.17. Set-associative-mapped cache with two blocks per set.
Two or more blocks make a set; here the 128 cache blocks form 64 sets of 2 blocks each.
The 16-bit main memory address is split into a 6-bit tag, a 6-bit set field, and a 4-bit word field; memory block j maps to set (j mod 64) and can be placed in either block of that set.
Contention is minimized: a block has a choice between the two blocks of its set.
The search space is reduced compared with the fully associative cache: only the tags of the blocks in one set are compared, rather than all 128.
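A minimal sketch of a lookup in this organization; the geometry (64 sets x 2 blocks, 16-word blocks) follows the figure, and everything else (names, sample address, tags-only cache) is an illustrative assumption:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 64
#define WAYS     2

struct line { bool valid; unsigned tag; };
static struct line cache[NUM_SETS][WAYS];      /* tags only; data omitted */

/* 16-bit address = 6-bit tag | 6-bit set | 4-bit word. */
bool lookup(uint16_t addr) {
    unsigned set = (addr >> 4) & 0x3F;
    unsigned tag = addr >> 10;
    for (int way = 0; way < WAYS; way++)       /* search only this set's 2 blocks */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}

int main(void) {
    printf("%s\n", lookup(0x1234) ? "hit" : "miss");   /* cold cache: miss */
    return 0;
}
```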
Cache Validity
Valid bit: indicates whether the cached data is valid.
Set to 0 when the system is powered on.
Set to 0 when a block is in the cache but a DMA transfer (e.g. from disk to main memory) bypasses the cache.
Set to 1 when the block is loaded with valid data.
A similar issue arises for a DMA write from memory to disk: with a write-back cache, main memory may hold stale data.
Flush the cache before the DMA write.
Replacement Algorithms
It is difficult to determine which blocks to kick out.
Least Recently Used (LRU) block
The cache controller tracks references to all blocks as computation proceeds.
Tracking counters are increased or cleared when a hit or miss occurs (a sketch follows this list).
FIFO
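A minimal sketch of LRU tracking with counters, as described above; the 4-way set size and all names are assumptions made only for illustration:

```c
#include <stdio.h>

#define WAYS 4

/* age[i] == 0 means block i is the most recently used; WAYS-1 the least. */
static unsigned age[WAYS] = {0, 1, 2, 3};

/* On a hit to 'way': blocks more recent than it age by one, and it becomes MRU. */
void touch(int way) {
    for (int i = 0; i < WAYS; i++)
        if (age[i] < age[way])
            age[i]++;
    age[way] = 0;
}

/* On a miss, the block with the largest age (least recently used) is evicted. */
int victim(void) {
    int v = 0;
    for (int i = 1; i < WAYS; i++)
        if (age[i] > age[v])
            v = i;
    return v;
}

int main(void) {
    touch(2);                           /* a hit on block 2 of the set */
    printf("victim = %d\n", victim());  /* block 3 is now least recently used */
    return 0;
}
```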
Performance Considerations
Overview
Two key factors: performance and cost (price/performance ratio).
Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.
For a memory hierarchy, it is beneficial if transfers to and from the faster units can be done at a rate equal to that of the faster unit.
This is not possible if both the slow and the fast units are accessed in the same manner.
However, it can be achieved when parallelism is used in the organization of the slower unit, as in the interleaving scheme on the next slide.
Interleaving
If the main memory is structured as a collection of physically separated modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.
Figure 5.25. Addressing multiple-module memory systems. The memory address is divided into k module bits and m address-in-module bits: (a) consecutive words in a module, where the high-order k bits select the module; (b) consecutive words in consecutive modules, where the low-order k bits select the module, so successive words go to successive modules (interleaving).
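A small sketch contrasting the two mappings; the module count and module size below are assumptions chosen only for illustration:

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_MODULES      4
#define WORDS_PER_MODULE 16

int main(void) {
    for (uint32_t addr = 0; addr < 8; addr++) {
        /* (a) Consecutive words in a module: high-order bits select the module. */
        uint32_t mod_a = addr / WORDS_PER_MODULE;
        uint32_t off_a = addr % WORDS_PER_MODULE;

        /* (b) Consecutive words in consecutive modules (interleaving):
               low-order bits select the module. */
        uint32_t mod_b = addr % NUM_MODULES;
        uint32_t off_b = addr / NUM_MODULES;

        printf("addr %u -> (a) module %u word %u | (b) module %u word %u\n",
               addr, mod_a, off_a, mod_b, off_b);
    }
    return 0;
}
```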
Hit Rate and Miss Penalty
The success rate in accessing information at the various levels of the memory hierarchy is the hit rate; the miss rate is its complement.
Ideally, the entire memory hierarchy would appear to the processor as a single memory unit that has the access time of the cache on the processor chip and the size of a magnetic disk; how closely this ideal is approached depends on the hit rate (>> 0.9).
A miss causes extra time to be spent bringing the desired information into the cache (the miss penalty).
Example 5.2, page 332.
Example
h: hit rate
M: miss penalty (time to access main memory)
C: time to access the cache
t_avg = hC + (1 - h)M

Timing assumed on the slide:
1 cycle to send the address to memory
8 cycles for the first word access
4 cycles for subsequent words
1 cycle to send a word to the cache

No cache: time to get one word = 1 + 8 + 1 cycles.
Cache with 8-word blocks and 4 interleaved memory modules: time to get one block = 1 + 8 + 4 + 4 cycles (send the address; access the first word in all 4 modules simultaneously; send 4 words in the next 4 cycles while the next addresses are accessed; 4 more cycles for the remaining 4 words).
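Plugging these timings into t_avg = hC + (1 - h)M; the cache access time and hit rate below are assumed values for illustration, not figures from the slide:

```c
#include <stdio.h>

int main(void) {
    double C = 1.0;    /* cycles per cache hit (assumed) */
    double M = 17.0;   /* miss penalty: 1 + 8 + 4 + 4 cycles to load an 8-word block */
    double h = 0.95;   /* assumed hit rate */

    double t_avg = h * C + (1.0 - h) * M;
    printf("t_avg = %.2f cycles per access\n", t_avg);  /* 0.95*1 + 0.05*17 = 1.80 */
    return 0;
}
```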
How to Improve Hit Rate?
Use larger cache
Increase the block size while keeping the total cache size constant.
However, if the block size is too large, some items may not be referenced before the block is replaced, and the miss penalty increases.
Load-through approach
Caches on the Processor Chip
Caches on the Processor Chip
On chip vs. off chip
Two separate caches for instructions and data, respectively
Single cache for both
Which one has the better hit rate?
What is the benefit of separating the caches?
Level 1 and Level 2 caches
L1 cache: faster and smaller. Accesses more than one word simultaneously and lets the processor use them one at a time.
L2 cache: slower and larger.
What about the average access time?
Average access time: t_avg = h1 C1 + (1 - h1) h2 C2 + (1 - h1)(1 - h2) M, where h1 and h2 are the hit rates of the L1 and L2 caches, C1 and C2 their access times, and M the miss penalty of the main memory.
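The same formula evaluated for one set of illustrative (assumed) hit rates and access times:

```c
#include <stdio.h>

int main(void) {
    double h1 = 0.96, C1 = 1.0;    /* assumed L1 hit rate and access time (cycles) */
    double h2 = 0.90, C2 = 10.0;   /* assumed L2 hit rate and access time */
    double M  = 100.0;             /* assumed main memory miss penalty */

    double t_avg = h1 * C1
                 + (1 - h1) * h2 * C2
                 + (1 - h1) * (1 - h2) * M;

    printf("t_avg = %.2f cycles\n", t_avg);   /* 0.96 + 0.36 + 0.40 = 1.72 */
    return 0;
}
```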
Other Enhancements
Write buffer: the processor doesn't need to wait for the memory write to be completed.
Prefetching: prefetch data into the cache before they are needed.
Lockup-free cache: the processor is able to access the cache while a miss is being serviced.
Virtual Memories
Overview
Physical main memory is not as large as the address space spanned by an address issued by the processor (2^32 = 4 GB; 2^64 is roughly 1.8 x 10^19 bytes).
When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices.
Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques.
Virtual addresses will be translated into physical addresses.
Overview
Address Translation
All programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory.
A page cannot be too small or too large.
The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage, similar to the cache.
Address Translation
Figure 5.27. Virtual-memory address translation. The virtual address from the processor is split into a virtual page number and an offset. The virtual page number is added to the contents of the page table base register to locate the page table entry, which holds control bits (valid bit, modified bit, read-only bit, ...) and the page frame in memory; the page frame combined with the offset gives the physical address in main memory.
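A compact sketch of this translation; the 4 KB page size, table size, and all names are assumptions for illustration, and the real MMU/TLB hardware details are omitted:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define OFFSET_BITS 12            /* assumed 4 KB pages */
#define NUM_PAGES   1024          /* size of this toy page table */

struct pte { bool valid; uint32_t frame; };   /* one page table entry */
static struct pte page_table[NUM_PAGES];

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> OFFSET_BITS;               /* virtual page number */
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);  /* offset within page  */

    if (vpn >= NUM_PAGES || !page_table[vpn].valid) {
        printf("page fault on virtual page %u\n", vpn);
        return 0;
    }
    return (page_table[vpn].frame << OFFSET_BITS) | offset;
}

int main(void) {
    page_table[3] = (struct pte){ .valid = true, .frame = 42 };
    printf("0x%X\n", translate((3u << OFFSET_BITS) | 0x1A4));  /* frame 42 + offset */
    translate(5u << OFFSET_BITS);                              /* not valid: page fault */
    return 0;
}
```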
Address Translation
The page table information is used by the MMU for every access, so it should be kept with the MMU.
However, since the MMU is on the processor chip and the page table is rather large, only a small portion of it, consisting of the page table entries that correspond to the most recently accessed pages, can be accommodated within the MMU.
Translation Lookaside Buffer (TLB)
TLB
Figure 5.28. Use of an associative-mapped TLB. The virtual page number of the incoming virtual address is compared (=?) with the virtual page numbers stored in the TLB; on a hit, the corresponding page frame and control bits are read out and the frame is combined with the offset to form the physical address in main memory; on a miss, the page table in memory must be consulted.
TLB
The contents of the TLB must be coherent with the contents of the page tables in memory.
Translation procedure.
Page fault
Page replacement
Write-through is not suitable for virtual memory.
Locality of reference in virtual memory
Memory Management Requirements
Multiple programs
System space / user space
Protection (supervisor / user state, privileged instructions)
Shared pages
Secondary Storage
Magnetic Hard Disks
Disk
Disk drive
Disk controller
Organization of Data on a Disk
Figure 5.30. Organization of one surface of a disk: concentric tracks divided into sectors (e.g. sector 0 of track 0, sector 0 of track 1, sector 3 of track n).
Access Data on a Disk
Sector header
Following the data, there is an error-correction code (ECC).
Formatting process
Difference between inner tracks and outer tracks
Access time = seek time + rotational delay (latency time); a worked sketch follows this list.
Data buffer/cache
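A back-of-the-envelope sketch of the access-time formula; the seek time and rotation speed are assumed values, and the transfer time is ignored for simplicity:

```c
#include <stdio.h>

int main(void) {
    double seek_ms    = 9.0;                  /* assumed average seek time */
    double rpm        = 7200.0;               /* assumed rotation speed */
    double latency_ms = 0.5 * 60000.0 / rpm;  /* average rotational delay: half a turn */

    printf("average access time = %.1f ms\n", seek_ms + latency_ms);  /* about 13.2 ms */
    return 0;
}
```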
Disk Controller
Figure 5.31. Disks connected to the system bus: the processor and main memory are attached to the system bus, and a disk controller on the bus serves one or more disk drives.
Disk Controller
Seek
Read
Write
Error checking
RAID Disk Arrays
Redundant Array of Inexpensive Disks
Using multiple disks makes huge storage cheaper and also makes it possible to improve the reliability of the overall system.
RAID 0: data striping
RAID 1
RAID 2, 3, 4
Optical Disks
Figure 5.32. Optical disk. (a) Cross-section: label, acrylic, aluminum reflecting layer, and polycarbonate plastic substrate with pits and lands; light from the source is reflected from a land back to the detector, while a pit scatters it (no reflection). (b) Transition from pit to land: a transition between a pit and a land is read as a 1, and no transition is read as a 0. (c) Stored binary pattern, e.g. 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0.
Optical Disks
CD-ROM
CD-Recordable (CD-R)
CD-ReWritable (CD-RW)
DVD
DVD-RAM
Magnetic Tape Systems
Figure 5.33. Organization of data on magnetic tape: data are recorded across 7 or 9 bit tracks; records are separated by record gaps, a group of records followed by a file mark forms a file, and files are separated by file gaps.