Anshul Kumar, CSE IITD
CSL718 : Memory Hierarchy
Cache Memories
6th Feb, 2006
Anshul Kumar, CSE IITD slide 2
Memory technologies
• Semiconductor
  – Registers
  – SRAM
  – DRAM
  – FLASH
• Magnetic
  – FDD
  – HDD
• Optical
  – CD
  – DVD
Access: random access (semiconductor); random + sequential (magnetic, optical)
Anshul Kumar, CSE IITD slide 3
Hierarchical structure
[Figure: CPU connected to a chain of memory levels]
Level               Size      Cost / bit  Speed
Nearest the CPU     Smallest  Highest     Fastest
Farthest from CPU   Biggest   Lowest      Slowest
Anshul Kumar, CSE IITD slide 4
System Configuration: e-bay price: Rs. 37,500
Processor: Intel P4 3.2GHz (800FSB) 1024k CPU with Hyper Threading
CPU Fan: P4 Heavy Duty Cooling Fan With Heat Sink
Motherboard: D915G express chipset 800FSB (up to 3.6GHz support)
Memory: 1GB DDR400 PC3200 DUAL CHANNEL RAM
Video Card: GeForce FX 6200 256MB 16x PCI-e video with TV out
Hard drive: 160GB 7200RPM UDMA-150 SATA
CD drive: 52x32x52x16x CDRW + DVD ROM drive
Floppy drive: Sony 1.44MB 3.5" drive
Sound: AC 97 6 ch 5.1 Full duplex digital sound, stereo speakers
Network: 10/100 RJ45 onboard network (Ethernet, cable or DSL)
Modem: 56k v92 modem
Ports: Six USB 2.0 ports,1 serial, 1 parallel, 1 microphone jack
Case: Black i BOX 522 Mid Tower 400w power supply (front USB)
Keyboard: Black PS2 Windows Keyboard
Mouse: Black PS2 Scroll Mouse
Monitor: 17" SAMSUNG 793S MONITOR
Anshul Kumar, CSE IITD slide 5
Main Memory for Pentium IV: DDR (double data rate) DRAM
Size Interface Price
128 MB PC-333 Rs. 599
256 MB PC-333 Rs. 1,299
1 GB PC-333 Rs. 4,999
1 GB PC-400 Rs. 5,299
Anshul Kumar, CSE IITD slide 6
Disk drives: Seagate Barracuda 7200 RPM
Capacity Price
40 GB Rs. 2,999
80 GB Rs. 3,499
120 GB Rs. 4,499
160 GB Rs. 4,799
200 GB Rs. 5,500
250 GB Rs. 6,999
300 GB Rs. 9,900
400 GB Rs. 14,950
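The two price lists make the cost side of the hierarchy concrete. The sketch below reduces them to rupees per GB; the prices are copied from the slides, while the dictionary layout and the function name are illustrative choices, and only the PC-333 modules are included.

```python
# Prices copied from the DRAM and disk slides (Rs.); capacities in GB.
dram = {0.125: 599, 0.25: 1299, 1.0: 4999}            # PC-333 modules
disk = {40: 2999, 80: 3499, 120: 4499, 160: 4799,
        200: 5500, 250: 6999, 300: 9900, 400: 14950}  # 7200 RPM drives

def cost_per_gb(prices):
    """Return {capacity_gb: rupees per GB} for a price table."""
    return {size: price / size for size, price in prices.items()}

dram_cost = cost_per_gb(dram)
disk_cost = cost_per_gb(disk)

for size, cost in dram_cost.items():
    print(f"DRAM {size:7.3f} GB: Rs. {cost:8.2f} per GB")
for size, cost in disk_cost.items():
    print(f"Disk {size:7.0f} GB: Rs. {cost:8.2f} per GB")
```

Even the cheapest DRAM module in the list costs far more per GB than the most expensive disk, which is the cost gradient the hierarchy exploits.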
Anshul Kumar, CSE IITD slide 7
Data transfer between levels
• Unit of transfer = block
[Figure: the processor accesses the memory levels; an access is either a hit or a miss, and data transfer takes place between levels]
Anshul Kumar, CSE IITD slide 8
Principle of locality
• Temporal Locality
  – references repeated in time
• Spatial Locality
  – references repeated in space
  – Special case: Sequential Locality
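A small simulation can show why locality matters. This is a minimal sketch, assuming a tiny direct-mapped cache of 8 block frames with 4 addresses per block; the sizes, the function name and the three traces are all illustrative and not taken from the slides.

```python
def hit_ratio(addresses, num_blocks=8, block_size=4):
    """Hit ratio of a tiny direct-mapped cache (sizes are illustrative)."""
    frames = [None] * num_blocks        # block number held by each frame
    hits = 0
    for addr in addresses:
        block = addr // block_size      # memory block containing addr
        frame = block % num_blocks      # direct-mapped placement
        if frames[frame] == block:
            hits += 1
        else:
            frames[frame] = block       # miss: bring in the whole block
    return hits / len(addresses)

sequential = list(range(64))              # spatial (sequential) locality
repeated = [0, 1, 2, 3] * 16              # temporal locality
scattered = [i * 32 for i in range(64)]   # neither: each access hits a new block

print(hit_ratio(sequential), hit_ratio(repeated), hit_ratio(scattered))
```

The sequential trace misses once per block and hits on the remaining three addresses, the repeated trace misses only on its first access, and the scattered trace never hits.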
Anshul Kumar, CSE IITD slide 9
Memory Hierarchy Analysis
Memory $M_i$: $M_1, M_2, \ldots, M_n$
Capacity $s_i$: $s_1 < s_2 < \cdots < s_n$
Unit cost $c_i$: $c_1 > c_2 > \cdots > c_n$
Total cost $C_{total}$: $\sum_i c_i \cdot s_i$
Access time $t_i$: $\tau_1 + \tau_2 + \cdots + \tau_i$ ($\tau_i$ at level $i$), with $\tau_1 < \tau_2 < \cdots < \tau_n$
Hit ratios $h_i(s_i)$: $h_1 < h_2 < \cdots < h_n = 1$
Effective time $T_{eff}$: $\sum_i m_i \cdot h_i \cdot t_i = \sum_i m_i \cdot \tau_i$
Miss before level $i$, $m_i$: $(1-h_1)(1-h_2) \cdots (1-h_{i-1})$
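The two forms of the effective-time sum can be checked numerically. The sketch below evaluates the total cost and both sums for a hypothetical three-level hierarchy; the function name and every number in the example are illustrative assumptions, not values from the lecture.

```python
def hierarchy_metrics(sizes, unit_costs, taus, hit_ratios):
    """Total cost and effective access time of an n-level hierarchy.

    taus[i] is the delay added at level i; hit_ratios[-1] must be 1.
    Returns (C_total, sum of m_i*h_i*t_i, sum of m_i*tau_i).
    """
    total_cost = sum(c * s for c, s in zip(unit_costs, sizes))
    t_eff_hits = 0.0    # accumulates m_i * h_i * t_i
    t_eff_taus = 0.0    # accumulates m_i * tau_i
    m = 1.0             # probability of missing in every level before i
    t = 0.0             # cumulative access time t_i
    for tau, h in zip(taus, hit_ratios):
        t += tau
        t_eff_hits += m * h * t
        t_eff_taus += m * tau
        m *= 1.0 - h
    return total_cost, t_eff_hits, t_eff_taus

# Hypothetical three-level example (all numbers are made up).
cost, t_by_hits, t_by_taus = hierarchy_metrics(
    sizes=[64, 4096, 1_000_000],
    unit_costs=[100.0, 1.0, 0.01],
    taus=[1, 10, 1000],
    hit_ratios=[0.9, 0.99, 1.0],
)
print(cost, t_by_hits, t_by_taus)
```

Both sums agree because a level's delay is paid by every access that missed in all the levels before it.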
Anshul Kumar, CSE IITD slide 10
Cache Types
Instruction | Data | Unified | Split
Split vs. Unified:
• Split allows specializing each part
• Unified allows best use of the capacity
On-chip | Off-chip
• on-chip : fast but small
• off-chip : large but slow
Single level | Multi level
Anshul Kumar, CSE IITD slide 11
Cache Policies
• Placement: what gets placed where?
• Read: when? from where?
• Load: order of bytes/words?
• Fetch: when to fetch new block?
• Replacement: which one?
• Write: when? to where?
Anshul Kumar, CSE IITD slide 12
Block placement strategies
[Figure: placing memory block 12 in a cache whose entries each hold a Tag and Data; "Search" marks the entries whose tags are examined]
• Direct mapped: Block # 0 to 7
• Set associative: Set # 0 to 3
• Fully associative: no block or set index
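The candidate locations for block 12 can be computed directly. This is a minimal sketch assuming 8 block frames and modulo placement, with 2 frames per set in the set-associative case (8 frames over 4 sets); the function name and the frame numbering within sets are illustrative.

```python
def candidate_frames(block, num_frames=8, ways=1):
    """Frames in which a memory block may be placed.

    ways=1 is direct mapped; ways=num_frames is fully associative.
    Frames are numbered so that set k owns frames k*ways .. k*ways+ways-1.
    """
    num_sets = num_frames // ways
    set_index = block % num_sets
    return list(range(set_index * ways, (set_index + 1) * ways))

print(candidate_frames(12, ways=1))   # direct mapped
print(candidate_frames(12, ways=2))   # 2-way set associative (4 sets)
print(candidate_frames(12, ways=8))   # fully associative
```

Direct mapping leaves one frame to check (12 mod 8 = 4), the set-associative case searches the two frames of set 0 (12 mod 4 = 0), and full associativity searches all eight.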
Anshul Kumar, CSE IITD slide 13
Organization/placement policy
• Cache: Set 1 … Set S
• Set: Sector 1, Sector 2, … Sector SE, plus LRU
• Sector: Block 1, Block 2, … Block B, plus Tag
• Block: AU 1, AU 2, … AU A, plus V, D, S
Anshul Kumar, CSE IITD slide 14
Addressing Cache
Address: Sector Name | Set Index | Block | Displacement
• Sector Name: compared to tags
• Set Index: selects set
• Block: selects block
• Displacement: selects AU
Early select: access data after tag matching
Late select: access data while tag matching
Anshul Kumar, CSE IITD slide 15
Cache organization example
[Figure: 8 sets (rows 1 to 8); each set holds 2 sectors; each sector has one Tag and 2 blocks; each block has V and D bits and 2 AUs]
Row layout: Tag | V D AU AU | V D AU AU | Tag | V D AU AU | V D AU AU
Anshul Kumar, CSE IITD slide 16
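Cache access mechanism
[Figure: direct-mapped cache with entries 0 to 4095, each holding v, tag (18 bits) and data (32 bits); the stored tag is compared (=) with the address tag to produce Hit, and the entry supplies Data]
Address (bits 31 to 0): tag (18 bits) | index (12 bits) | byte offset (2 bits)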
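The address split used by this cache (18-bit tag, 12-bit index, 2-bit byte offset) can be written out as bit operations. The function name and the sample address are illustrative; the field widths are the ones on the slide.

```python
def split_address(addr, index_bits=12, byte_offset_bits=2):
    """Split a 32-bit address into (tag, index, byte offset)."""
    byte_offset = addr & ((1 << byte_offset_bits) - 1)
    index = (addr >> byte_offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (byte_offset_bits + index_bits)
    return tag, index, byte_offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), offset)

# The three fields reassemble into the original address.
rebuilt = (tag << 14) | (index << 2) | offset
```

The index selects one of the 4096 entries, and a hit requires the entry to be valid with a stored tag equal to `tag`.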
Anshul Kumar, CSE IITD slide 17
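Cache with 4 word blocks
[Figure: direct-mapped cache with entries 0 to 1023, each holding v, tag (18 bits) and data (four 32-bit words); the stored tag is compared (=) with the address tag to produce Hit, and a Mux selects one 32-bit word as Data using the block offset]
Address (bits 31 to 0): tag (18 bits) | index (10 bits) | block offset (2 bits) | byte offset (2 bits)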
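One effect of larger blocks is lower tag overhead for the same data capacity. The sketch below compares total storage for a 4096-entry cache with one-word blocks against a 1024-entry cache with four-word blocks, assuming each entry stores exactly one valid bit, an 18-bit tag and the data, as drawn. The function name is illustrative.

```python
def cache_bits(entries, tag_bits, data_bits, valid_bits=1):
    """Return (total storage in bits, fraction spent on valid + tag bits)."""
    per_entry = valid_bits + tag_bits + data_bits
    total = entries * per_entry
    overhead = entries * (valid_bits + tag_bits) / total
    return total, overhead

one_word = cache_bits(entries=4096, tag_bits=18, data_bits=32)
four_word = cache_bits(entries=1024, tag_bits=18, data_bits=4 * 32)

print(one_word)    # 4096 entries of 1 + 18 + 32 bits
print(four_word)   # 1024 entries of 1 + 18 + 128 bits
```

Both configurations hold 16 KB of data, but the four-word version spends a much smaller share of its bits on tags and valid bits.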
Anshul Kumar, CSE IITD slide 18
4-way set associative cache
[Figure: 256 sets (0 to 255), four ways; each way holds v, tag (20 bits) and data (128 bits); four comparators (=) check the stored tags against the address tag to produce Hit; in each way a Mux selects one 32-bit word using the block offset, and a final Mux delivers Data from the matching way]
Address (bits 31 to 0): tag (20 bits) | index (8 bits) | block offset (2 bits) | byte offset (2 bits)
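The lookup in this organization can be modelled in a few lines. This is a minimal sketch using the slide's field widths (20-bit tag, 8-bit index, 2-bit block offset, 2-bit byte offset); the class and method names are hypothetical, the tag comparison is done in a loop rather than in parallel, and replacement is left out.

```python
class SetAssociativeCache:
    """4-way set-associative lookup: 256 sets, four 32-bit words per block."""
    WAYS, SETS, WORDS = 4, 256, 4

    def __init__(self):
        # Each way of each set holds (valid, tag, [four words]).
        self.sets = [[(False, 0, [0] * self.WORDS) for _ in range(self.WAYS)]
                     for _ in range(self.SETS)]

    @staticmethod
    def fields(addr):
        """Return (tag, index, block offset); the byte offset is dropped."""
        block_offset = (addr >> 2) & 0x3
        index = (addr >> 4) & 0xFF
        tag = addr >> 12
        return tag, index, block_offset

    def fill(self, addr, words, way):
        """Install a block in a chosen way (replacement policy not modelled)."""
        tag, index, _ = self.fields(addr)
        self.sets[index][way] = (True, tag, list(words))

    def read(self, addr):
        """Return (hit, word); all four tags of the selected set are compared."""
        tag, index, block_offset = self.fields(addr)
        for valid, stored_tag, words in self.sets[index]:
            if valid and stored_tag == tag:
                return True, words[block_offset]
        return False, None

cache = SetAssociativeCache()
cache.fill(0x00001230, [10, 11, 12, 13], way=2)
print(cache.read(0x00001238))   # same block, word 2
print(cache.read(0x00002238))   # same set, different tag
```

The second read selects the same set but carries a different tag, so none of the four comparisons match and it misses.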
Anshul Kumar, CSE IITD slide 19
Read policies
• Sequential or concurrent
  – initiate memory access only after detecting a miss
  – initiate memory access along with cache access in anticipation of a miss
• With or without forwarding
  – give data to CPU after filling the missing block in cache
  – forward data to CPU as it gets filled in cache
Anshul Kumar, CSE IITD slide 20
Read Policies
[Figure: cache and memory timing diagrams for each policy, with segments of length 1 (cache) and T (memory)]
• Sequential Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 2)$
• Concurrent Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 1)$
• Sequential Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 1)$
• Concurrent Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot T$
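The four formulas differ only in the time charged on a miss, so they are easy to compare numerically. In the sketch below the miss times come from the slide, while the miss probability of 0.05 and memory time of 20 cycles are made-up illustrative values.

```python
# Time taken by a miss under each read policy, in cache cycles (from the slide).
MISS_TIME = {
    "sequential simple":  lambda T: T + 2,
    "concurrent simple":  lambda T: T + 1,
    "sequential forward": lambda T: T + 1,
    "concurrent forward": lambda T: T,
}

def t_eff(policy, p_miss, T):
    """Effective access time: (1 - p_m) * 1 + p_m * (miss time)."""
    return (1 - p_miss) * 1 + p_miss * MISS_TIME[policy](T)

for policy in MISS_TIME:
    print(f"{policy:20s} {t_eff(policy, p_miss=0.05, T=20):.2f}")
```

Starting the memory access concurrently and forwarding data each remove one cycle from the miss time, so each is worth $p_m$ cycles per access on average.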
Anshul Kumar, CSE IITD slide 21
Load policies
[Figure: a 4 AU block (AUs 0 1 2 3) with a cache miss on AU 1, showing how the block is loaded under each policy]
• Block Load
• Load Forward
• Fetch Bypass (wrap around load)
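The three load policies can be read as different orders for bringing in the AUs of the missing block. The sketch below encodes one common interpretation, which is an assumption here since the slide's figure did not survive: block load starts at AU 0, load forward brings in only the missed AU and those after it, and fetch bypass starts at the missed AU and wraps around. The function name is illustrative.

```python
def load_order(policy, missed_au, block_size=4):
    """Order in which AUs enter the cache after a miss (assumed semantics)."""
    if policy == "block load":      # whole block, starting at AU 0
        return list(range(block_size))
    if policy == "load forward":    # missed AU up to the end of the block
        return list(range(missed_au, block_size))
    if policy == "fetch bypass":    # missed AU first, then wrap around
        return [(missed_au + k) % block_size for k in range(block_size)]
    raise ValueError(f"unknown policy: {policy}")

for policy in ("block load", "load forward", "fetch bypass"):
    print(policy, load_order(policy, missed_au=1))
```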
Anshul Kumar, CSE IITD slide 22
Fetch Policies
• Fetch on miss (demand fetching)
• Software prefetching
• Hardware Prefetching
Anshul Kumar, CSE IITD slide 23
Fetch Policies
• Demand fetching
  – fetch only when required (miss)
• Hardware prefetching
  – automatically prefetch next block
• Software prefetching
  – programmer decides to prefetch
  – questions:
    – how much ahead (prefetch distance)
    – how often
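The benefit of prefetching the next block shows up on a sequential trace. This is a minimal sketch, assuming prefetch is triggered only on a miss and that the cache never evicts anything; both are simplifications chosen for illustration, as are the names.

```python
def count_misses(block_trace, prefetch_next=False):
    """Misses for a trace of block numbers in an unbounded cache.

    With prefetch_next=True, a miss on block b also brings in block b + 1.
    """
    present = set()
    misses = 0
    for block in block_trace:
        if block not in present:
            misses += 1
            present.add(block)
            if prefetch_next:
                present.add(block + 1)   # prefetched alongside the demand fetch
    return misses

trace = list(range(16))                  # purely sequential block accesses
print(count_misses(trace), count_misses(trace, prefetch_next=True))
```

On this trace, next-block prefetching halves the misses. A trace with no sequential pattern gains nothing and pays for the extra transfers.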
Anshul Kumar, CSE IITD slide 24
Software Control of Cache
Software visible cache
  – mode selection (WT, WB etc.)
  – block flush
  – block invalidate
  – block prefetch
Anshul Kumar, CSE IITD slide 25
Replacement Policies
• Least Recently Used (LRU)
• Least Frequently Used (LFU)
• First In First Out (FIFO)
• Random
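LRU and FIFO differ only in whether a hit refreshes a block's position in the eviction order. The sketch below counts misses for both in a single fully associative set; the function name, the capacity of 3 and the reference trace are illustrative, and LFU and Random are not modelled.

```python
from collections import OrderedDict

def replacement_misses(trace, capacity, policy):
    """Misses in one fully associative set of `capacity` blocks (LRU or FIFO)."""
    resident = OrderedDict()                 # oldest entry first
    misses = 0
    for block in trace:
        if block in resident:
            if policy == "LRU":
                resident.move_to_end(block)  # a hit refreshes recency
            continue                         # FIFO order is unaffected by hits
        misses += 1
        if len(resident) == capacity:
            resident.popitem(last=False)     # evict the oldest entry
        resident[block] = True
    return misses

trace = [1, 2, 3, 1, 4, 1, 2]
print(replacement_misses(trace, 3, "LRU"), replacement_misses(trace, 3, "FIFO"))
```

On this trace LRU keeps the frequently reused block 1 while FIFO evicts it, so LRU misses less. Other traces can favour FIFO.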
Anshul Kumar, CSE IITD slide 26
Write Policies
• Write Hit
  – Write Back
  – Write Through
• Write Miss
  – Write Back
  – Write Through (with or without Write Allocate)
Buffers are used in all cases to hide latencies
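The difference in memory traffic between the two write policies can be seen with a one-block cache. This is a minimal sketch, assuming write misses always allocate the block and ignoring write buffers; the function name and the operation trace are illustrative.

```python
def memory_writes(ops, policy):
    """Count writes reaching memory for a cache holding a single block.

    ops is a list of ("r" | "w", block). Write misses allocate the block.
    """
    cached, dirty, writes = None, False, 0
    for op, block in ops:
        if block != cached:                  # miss: replace the resident block
            if policy == "write back" and dirty:
                writes += 1                  # write the dirty victim back
            cached, dirty = block, False
        if op == "w":
            if policy == "write through":
                writes += 1                  # every write goes to memory
            else:
                dirty = True                 # deferred until eviction
    if policy == "write back" and dirty:
        writes += 1                          # final flush of the dirty block
    return writes

ops = [("w", 0), ("w", 0), ("w", 0), ("r", 1), ("w", 1)]
print(memory_writes(ops, "write through"), memory_writes(ops, "write back"))
```

Write through sends all four writes to memory, while write back merges the repeated writes to block 0 into a single write at eviction.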