1
COMP 206: Computer Architecture and Implementation
Montek Singh
Wed., Oct. 23, 2002
Topic: Memory Hierarchy Design (HP3 Ch. 5)
(Caches, Main Memory and Virtual Memory)
2
The Big Picture: Where are We Now?
The Five Classic Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Input, Output]
This lecture (and next few): Memory System
3
The Motivation for Caches
Motivation
  Large (cheap) memories (DRAM) are slow
  Small (costly) memories (SRAM) are fast
Make the average access time small
  service most accesses from a small, fast memory
  reduce the bandwidth required of the large memory
[Figure: Processor connected to a Memory System made of a Cache backed by DRAM]
4
The Principle of Locality
Programs access a relatively small portion of the address space at any instant of time
  Example: 90% of time in 10% of the code
Two different types of locality
  Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon
  Spatial Locality (locality in space): if an item is referenced, items close by tend to be referenced soon
[Figure: probability of reference plotted over the address space, from 0 to 2^n]
5
Levels of the Memory Hierarchy
Level           Capacity        Access Time   Cost/bit
CPU Registers   500 Bytes       0.25 ns       ~$.01
Cache           16K-1M Bytes    1 ns          ~$.0001
Main Memory     64M-2G Bytes    100 ns        ~$.0000001
Disk            100 G Bytes     5 ms          10^-5 - 10^-7 cents
Tape/Network    “infinite”      secs.         10^-8 cents

Staging between adjacent levels (upper levels are faster, lower levels are larger)
Levels                            Transfer Unit   Managed by            Unit size
Registers and L1, L2, … Cache     Words           programmer/compiler   1-8 bytes
Cache and Memory                  Blocks          cache controller      8-128 bytes
Memory and Disk                   Pages           OS                    4-64K bytes
Disk and Tape/Network             Files           user/operator         Mbytes
6
Memory Hierarchy: Principles of Operation
At any given time, data is copied between only 2 adjacent levels
  Upper Level (Cache): the one closer to the processor
    Smaller, faster, and uses more expensive technology
  Lower Level (Memory): the one further away from the processor
    Bigger, slower, and uses less expensive technology
Block
  The smallest unit of information that can either be present or not present in the two-level hierarchy
[Figure: the Upper Level (Cache), holding Blk X, passes data to and from the processor; the Lower Level (Memory) holds Blk Y]
7
Memory Hierarchy: Terminology
Hit: data appears in some block in the upper level (e.g.: Block X in previous slide)
  Hit Rate = fraction of memory accesses found in upper level
  Hit Time = time to access the upper level
    = memory access time + time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (e.g.: Block Y in previous slide)
  Miss Rate = 1 - (Hit Rate)
  Miss Penalty: includes time to fetch a new block from lower level
    = time to replace a block in the upper level from lower level + time to deliver the block to the processor
Hit Time: significantly less than Miss Penalty
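The three quantities above combine into the average access time that the earlier slides want to keep small. The sketch below uses the standard textbook relation (average access time = hit time + miss rate × miss penalty), which the slide itself does not state. The function name and the sample numbers (1 ns hit time, 100 ns miss penalty, 5% miss rate) are illustrative assumptions.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and the fraction of accesses that miss also pays the miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Assumed numbers, loosely based on the cache and main-memory
    # access times listed on the hierarchy slide.
    print(average_access_time(1.0, 0.05, 100.0))
```

Even a small miss rate dominates the result when the miss penalty is two orders of magnitude larger than the hit time.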
8
Cache Addressing
[Figure: the cache holds Set 0 … Set j-1; each set holds Block 0 … Block k-1 plus replacement info; each block holds Sector 0 … Sector m-1 plus a Tag; each sector holds Byte 0 … Byte n-1 plus Valid, Dirty and Shared bits]
Block/line is unit of allocation
Sector/sub-block is unit of transfer and coherence
Cache parameters j, k, m, n are integers, and generally powers of 2
9
Examples of Cache Configurations
# Sets   # Blocks   # Sectors   # Bytes   Name
1        k          m           n         Fully associative
j        1          m           n         Direct mapped
j        k          1           n         A cache that is not sectored
j        4          m           n         4-way set-associative cache
64       8          2           32        PowerPC 601
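As a rough illustration of how the four parameters determine a cache's size and address fields, the sketch below computes capacity and field widths for a (j, k, m, n) configuration. It assumes byte addressing, a 32-bit address by default, and a tag made of all address bits not used for the set index or the byte-in-block offset. The function name and the dictionary keys are hypothetical.

```python
def cache_geometry(j, k, m, n, address_bits=32):
    """Capacity and address-field widths for a cache with j sets,
    k blocks per set, m sectors per block and n bytes per sector."""
    for value in (j, k, m, n):
        if value < 1 or value & (value - 1):
            raise ValueError("parameters are assumed to be powers of 2")
    offset_bits = (m * n).bit_length() - 1   # selects a byte within a block
    index_bits = j.bit_length() - 1          # selects a set
    return {
        "capacity_bytes": j * k * m * n,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
    }


if __name__ == "__main__":
    # PowerPC 601 row of the table: 64 sets, 8 blocks, 2 sectors, 32 bytes
    print(cache_geometry(64, 8, 2, 32))
```

For the PowerPC 601 row this gives 64 × 8 × 2 × 32 = 32768 bytes, which matches the 32 KB cache size quoted for that machine on the storage overhead slide.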
10
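Storage Overhead of Cache
$$\frac{\text{Total number of bits}}{\text{Number of data bits}} = \frac{j\,[\mathit{repl} + k\,(\mathit{tag} + m\,(3 + 8n))]}{8\,j\,k\,m\,n} = 1 + \frac{\mathit{repl} + k\,(\mathit{tag} + 3m)}{8\,k\,m\,n}$$
(repl: replacement bits per set; tag: tag bits per block; 3: the Valid, Dirty and Shared bits of each sector; 8n: data bits per sector)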
System                  # Address bits   (j,k,m,n)      Cache size   Storage overhead
IBM 360/85              24               (1,16,16,64)   16 KB        0.85%
IBM 3033                32               (64,16,1,64)   64 KB        5.95%
Motorola 68030          32               (24,4,2,2)     256 B        28.10%
Intel i486              32               (128,4,1,16)   8 KB         19.90%
DEC Alpha AXP 21064     34               (256,1,1,32)   8 KB         9.37%
IBM PowerPC 601         32               (64,8,2,32)    32 KB        5.76%
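The overhead column can be checked for the one row that needs no replacement state, the direct-mapped DEC Alpha AXP 21064. The sketch below computes overhead as the non-data bits per set (replacement bits, tags, and status bits) divided by the data bits per set. It assumes byte addressing, a tag made of the address bits left after the set index and byte-in-block offset, and 3 status bits per sector (valid, dirty, shared). The function name and the `repl_bits` default are hypothetical; the slide does not give replacement bits per set for the other machines, so their rows are not reproduced here.

```python
def storage_overhead(j, k, m, n, address_bits, repl_bits=0, status_bits=3):
    """Extra (non-data) bits as a fraction of data bits.

    Assumes byte addressing and status_bits per sector
    (valid, dirty, shared); repl_bits is per set."""
    index_bits = j.bit_length() - 1            # log2(j) for a power of 2
    offset_bits = (m * n).bit_length() - 1     # log2(m * n)
    tag_bits = address_bits - index_bits - offset_bits
    extra_bits_per_set = repl_bits + k * (tag_bits + status_bits * m)
    data_bits_per_set = 8 * k * m * n
    return extra_bits_per_set / data_bits_per_set


if __name__ == "__main__":
    # DEC Alpha AXP 21064: 34 address bits, (j,k,m,n) = (256,1,1,32)
    print(f"{storage_overhead(256, 1, 1, 32, address_bits=34):.4%}")
```

Under these assumptions the Alpha has a 21-bit tag plus 3 status bits per 256 data bits, i.e. 24/256 = 9.375%, in line with the 9.37% in the table.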
11
Cache Organization
Direct Mapped Cache
  Each memory location can only be mapped to 1 cache location
  No need to make any decision :-)
    Current item replaces previous item in that cache location
N-way Set Associative Cache
  Each memory location has a choice of N cache locations
Fully Associative Cache
  Each memory location can be placed in ANY cache location
Cache miss in an N-way Set Associative or Fully Associative Cache
  Bring in new block from memory
  Throw out a cache block to make room for the new block
  Need to decide which block to throw out!
12
Write Allocate versus Not Allocate
Assume that a 16-bit write to memory location 0x00 causes a cache miss
  Do we read in the block?
    Yes: Write Allocate
    No: Write No-Allocate
13
Basics of Cache Operation
READ
  Hit: CPU reads from cache
  Miss: Allocate and load block from MM, then CPU reads from it
WRITE (write through)
  Hit: Write into cache plus write through into MM
  Miss: Write through into MM with or without write allocate
WRITE (write back)
  Hit: Write into cache only and set dirty bit (so that on replacement, block is written back to MM only if modified)
  Miss: Write allocate with write back
14
Details of Simple Blocking Cache
Write Through
  READ hit: CPU reads cache
  READ miss: CPU detects miss, stalls; cache selects replacement block; new block loaded from MM; requested word sent to CPU; CPU resumes operation
  WRITE hit: CPU writes cache; CPU writes MM and stalls until write completes
  WRITE miss: CPU detects miss; CPU writes MM (cache also if write allocate); stalls until write completes
Write Back
  READ hit: CPU reads cache
  READ miss: CPU detects miss, stalls; cache selects replacement block; new block loaded from MM; word sent to CPU; CPU resumes operation
  WRITE hit: CPU writes cache
  WRITE miss: CPU detects miss, stalls; cache selects replacement block; old block evicted from cache; new block loaded from MM (write allocate); CPU resumes operation
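The hit/miss cases for the two write policies can be sketched as a toy direct-mapped cache that only counts traffic to main memory (MM). This is a simplified model under stated assumptions, not the blocking cache from the slide: it ignores stalls and data values, tracks one tag per block frame, and the class and counter names are hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts main-memory traffic."""

    def __init__(self, num_blocks, block_size, write_back, write_allocate):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.tags = [None] * num_blocks   # None marks an invalid frame
        self.dirty = [False] * num_blocks
        self.mm_block_reads = 0           # blocks loaded from MM
        self.mm_writes = 0                # write-throughs plus block write-backs

    def _lookup(self, address):
        block_address = address // self.block_size
        return block_address % self.num_blocks, block_address // self.num_blocks

    def _allocate(self, index, tag):
        if self.write_back and self.dirty[index]:
            self.mm_writes += 1           # evicted block was modified: write it back
        self.tags[index] = tag
        self.dirty[index] = False
        self.mm_block_reads += 1

    def read(self, address):
        index, tag = self._lookup(address)
        hit = self.tags[index] == tag
        if not hit:
            self._allocate(index, tag)
        return hit

    def write(self, address):
        index, tag = self._lookup(address)
        hit = self.tags[index] == tag
        if not hit and self.write_allocate:
            self._allocate(index, tag)
        if self.write_back and self.tags[index] == tag:
            self.dirty[index] = True      # MM is updated only on replacement
        else:
            self.mm_writes += 1           # write through, or a no-allocate miss
        return hit


if __name__ == "__main__":
    for write_back in (False, True):
        cache = DirectMappedCache(4, 32, write_back=write_back, write_allocate=True)
        for _ in range(10):
            cache.write(0x00)             # ten writes to the same location
        name = "write back" if write_back else "write through"
        print(name, "MM writes:", cache.mm_writes)
```

Ten writes to one location cost ten MM writes under write through but none under write back until the dirty block is replaced, which is the bandwidth saving the write-back policy is after.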
15
A-way Set-Associative Cache
A-way set associative: A entries for each cache index
  A direct-mapped caches operating in parallel
Example: Two-way set associative cache
  Cache Index selects a “set” from the cache
  The two tags in the set are compared in parallel
  Data is selected based on the tag result
[Figure: two banks of (Valid, Cache Tag, Cache Data) entries addressed by the Cache Index; each bank's tag is compared with the address tag; the two compare results are ORed to produce Hit and drive the mux selects (SEL0, SEL1) that choose the Cache Block]
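The lookup steps above can be sketched in software. This is only a behavioural model: hardware compares the tags of a set in parallel, while the code simply evaluates every comparison and then combines them. The function names and the representation of a set as a list of (valid, tag) pairs are assumptions made for illustration.

```python
def split_address(address, num_sets, block_size):
    """Split a byte address into (tag, set index, byte offset)."""
    offset = address % block_size
    block_address = address // block_size
    return block_address // num_sets, block_address % num_sets, offset


def lookup(sets, address, block_size):
    """Return (hit, way); sets[index] is a list of A (valid, tag) pairs."""
    tag, index, _ = split_address(address, len(sets), block_size)
    # One comparison per way; hardware performs these simultaneously.
    matches = [valid and stored_tag == tag for valid, stored_tag in sets[index]]
    hit = any(matches)                             # the OR of the compare outputs
    way = matches.index(True) if hit else None     # the mux select
    return hit, way


if __name__ == "__main__":
    # Two-way cache with 4 sets and 32-byte blocks; every entry starts invalid.
    sets = [[(False, None), (False, None)] for _ in range(4)]
    tag, index, _ = split_address(0x40, len(sets), 32)
    sets[index][1] = (True, tag)                   # place the block in way 1
    print(lookup(sets, 0x44, 32))                  # same block, different byte
    print(lookup(sets, 0x240, 32))                 # same set, different tag
```

The second lookup maps to the same set but carries a different tag, so it misses even though the set is not full.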
16
Fully Associative Cache
Push the set-associative idea to its limit!
  Forget about the Cache Index
  Compare the Cache Tags of all cache tag entries in parallel
  Example: Block Size = 32B, we need N 27-bit comparators
[Figure: address bits 31-5 form the Cache Tag (27 bits long) and bits 4-0 the Byte Select (Ex: 0x01); each entry has a Valid Bit, a Cache Tag with its own comparator, and Cache Data (Byte 0, Byte 1, … Byte 31; Byte 32, Byte 33, … Byte 63; …)]
17
Cache Shapes
Direct-mapped (A = 1, S = 16)
2-way set-associative (A = 2, S = 8)
4-way set-associative (A = 4, S = 4)
8-way set-associative (A = 8, S = 2)
Fully associative (A = 16, S = 1)
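All five shapes hold the same 16 block frames and differ only in how many frames a given memory block may occupy. The sketch below lists the candidate frames for one block under each shape. It assumes the usual modulo mapping from block address to set, and a frame numbering (set by set) chosen only for this illustration; the function name is hypothetical.

```python
def candidate_frames(block_address, associativity, num_sets):
    """Frames (numbered set by set) where a memory block may be placed."""
    set_index = block_address % num_sets     # assumed modulo (bit-selection) mapping
    first = set_index * associativity
    return list(range(first, first + associativity))


if __name__ == "__main__":
    for a, s in [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]:
        print(f"A={a:2d} S={s:2d}: block 13 may go in frames {candidate_frames(13, a, s)}")
```

As A grows the block has more places to go, so fewer conflicts force a replacement, at the price of more tags to compare on each access.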
18
Cache Block Replacement Policies
Random Replacement
  Hardware randomly selects a cache item and throws it out
Least Recently Used
  Hardware keeps track of the access history
  Replace the entry that has not been used for the longest time
  For 2-way set-associative cache, need one bit for LRU repl.
Example of a Simple “Pseudo” LRU Implementation
  Assume 64 Fully Associative entries
  Hardware replacement pointer points to one cache entry
  Whenever access is made to the entry the pointer points to:
    Move the pointer to the next entry
  Otherwise: do not move the pointer
[Figure: a Replacement Pointer pointing into a column of entries, Entry 0, Entry 1, … Entry 63]
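The pointer scheme above is small enough to sketch directly. The class and method names are hypothetical, and wrapping from the last entry back to entry 0 is an assumption; the slide only says "next entry".

```python
class PointerReplacement:
    """Replacement pointer over a fixed number of fully associative entries."""

    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        self.pointer = 0

    def access(self, entry):
        # Only an access to the pointed-to entry moves the pointer, so the
        # pointer never rests on the entry that was used most recently.
        if entry == self.pointer:
            self.pointer = (self.pointer + 1) % self.num_entries  # assumed wrap-around

    def victim(self):
        """Entry to replace on the next miss."""
        return self.pointer


if __name__ == "__main__":
    policy = PointerReplacement(64)
    for entry in (5, 0, 1, 40):
        policy.access(entry)
    print("replace entry", policy.victim())
```

The scheme needs a single pointer for the whole cache instead of per-entry history, which is why it only approximates LRU: the victim is guaranteed not to be the most recently used entry, but it is not necessarily the least recently used one.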
19
Cache Write Policy
Cache read is much easier to handle than cache write
  Instruction cache is much easier to design than data cache
Cache write
  How do we keep data in the cache and memory consistent?
Two options (decision time again :-)
  Write Back: write to cache only. Write the cache block to memory when that cache block is being replaced on a cache miss
    Need a “dirty bit” for each cache block
    Greatly reduces the memory bandwidth requirement
    Control can be complex
  Write Through: write to cache and memory at the same time
    What!!! How can this be? Isn’t memory too slow for this?
20
Write Buffer for Write Through
[Figure: Processor and Cache, with a Write Buffer between them and DRAM]
Write Buffer: needed between cache and main mem
  Processor: writes data into the cache and the write buffer
  Memory controller: writes contents of the buffer to memory
Write buffer is just a FIFO
  Typical number of entries: 4
  Works fine if store freq. (w.r.t. time) << 1 / DRAM write cycle
Memory system designer’s nightmare
  Store frequency (w.r.t. time) > 1 / DRAM write cycle
  Write buffer saturation
21
Write Buffer Saturation
Store frequency (w.r.t. time) > 1 / DRAM write cycle
  If this condition exists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row)
    Store buffer will overflow no matter how big you make it
    CPU Cycle Time << DRAM Write Cycle Time
Solutions for write buffer saturation
  Use a write back cache
  Install a second level (L2) cache
[Figure: Processor and Cache feeding a Write Buffer in front of DRAM; and the same path with an L2 Cache inserted between the Write Buffer and DRAM]
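The saturation condition can be illustrated with a small queueing sketch. It is a deliberately crude model under stated assumptions: the CPU tries to issue one store every `store_interval` cycles, DRAM retires one buffered store every `dram_write_cycle` cycles, and the CPU stalls whenever the FIFO is full. The function and parameter names are hypothetical.

```python
from collections import deque


def simulate_write_buffer(store_interval, dram_write_cycle, num_stores, entries=4):
    """Count CPU stall cycles caused by a full FIFO write buffer."""
    buffer = deque()          # completion time of each store still in the buffer
    time = 0
    stall_cycles = 0
    last_done = 0             # time at which DRAM finishes its latest write
    for _ in range(num_stores):
        time += store_interval
        while buffer and buffer[0] <= time:
            buffer.popleft()                  # writes DRAM has already finished
        if len(buffer) == entries:
            stall = buffer[0] - time          # wait for the oldest write to finish
            stall_cycles += stall
            time += stall
            buffer.popleft()
        start = max(time, last_done)          # DRAM handles one write at a time
        last_done = start + dram_write_cycle
        buffer.append(last_done)
    return stall_cycles


if __name__ == "__main__":
    print("slow stores:", simulate_write_buffer(20, 10, 1000))
    print("fast stores, 4 entries:", simulate_write_buffer(1, 10, 1000, entries=4))
    print("fast stores, 64 entries:", simulate_write_buffer(1, 10, 1000, entries=64))
```

When stores arrive more slowly than DRAM can retire them, the buffer never fills and the CPU never stalls. When they arrive faster, a larger buffer only postpones the first stall, consistent with the slide's point that the buffer overflows no matter how big it is made.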
22
Four Questions for Memory Hierarchy
Where can a block be placed in the upper level? (Block placement)
How is a block found if it is in the upper level? (Block identification)
Which block should be replaced on a miss? (Block replacement)
What happens on a write? (Write strategy)