Snoopy coherence protocols: small-scale multiprocessors
Post on 22-Dec-2015
Assumptions
• broadcast-style interconnect
  – e.g. shared bus, free-space optical, …
  – allows passive listeners
• assume write-back caches
  – invalidation after a write rather than update
• write-through with an update protocol is also possible
Invalidate vs. update
• Write-invalidate protocol:
  – write to shared data: an invalidate is sent to all caches, which snoop and invalidate their copies
  – read miss: snoop caches to find the most recent copy
• Write-update protocol:
  – write to shared data: broadcast on the bus; processors snoop and update any copies
  – read miss: memory is always up to date
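The contrast between the two policies can be sketched in a few lines of Python. This is a toy model, not a real protocol implementation: each cache is a plain dict from address to value, the function names `write_invalidate` and `write_update` are invented for illustration, and the update variant is assumed to write through to memory.

```python
def write_invalidate(caches, memory, writer, addr, value):
    """Writer keeps the only valid copy; every other cached copy is dropped."""
    for i, cache in enumerate(caches):
        if i != writer:
            cache.pop(addr, None)   # snooping caches invalidate their copy
    caches[writer][addr] = value    # memory is left stale (write-back assumed)


def write_update(caches, memory, writer, addr, value):
    """The new value is broadcast; existing copies and memory are updated."""
    caches[writer][addr] = value
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            cache[addr] = value     # snooping caches update in place
    memory[addr] = value            # memory stays up to date


# Two caches both hold address 0x40 with value 1 (hypothetical data).
inv_caches, inv_mem = [{0x40: 1}, {0x40: 1}], {0x40: 1}
write_invalidate(inv_caches, inv_mem, writer=0, addr=0x40, value=2)

upd_caches, upd_mem = [{0x40: 1}, {0x40: 1}], {0x40: 1}
write_update(upd_caches, upd_mem, writer=0, addr=0x40, value=2)
```

After the invalidating write, only cache 0 holds the block and memory still has the old value, so a later read miss must find the most recent copy in a cache. After the updating write, every copy and memory agree.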
Three-state MSI protocol
• Each block of memory is in one state:
  – Clean in all caches and up-to-date in memory (shared)
  – Dirty in exactly one cache (modified)
  – Not in any cache
• Each cache line is in one state:
  – Modified: cache has the only copy; it is writable and dirty
  – Shared: line can be read
  – Invalid: line contains no valid data
• A read miss puts a read on the bus, which all other caches snoop
• A write to a shared block is treated as a miss: it needs a (snoopy) bus transaction
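The processor side of these rules can be sketched as a small transition function. This is a minimal sketch under stated assumptions: the names `State` and `processor_access` are invented here, and a write miss is modelled as a bus read followed by a bus write, which is the convention the worked example in these slides uses.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


def processor_access(state, op):
    """Return (next_state, bus_transactions) for a local 'read' or 'write'.

    bus_transactions is an empty list when the access is a hit that
    needs no bus cycle.
    """
    if op == "read":
        if state is State.I:
            return State.S, ["Bus read"]            # read miss: fetch the block
        return state, []                            # read hit in S or M
    if op == "write":
        if state is State.M:
            return State.M, []                      # write hit on the only copy
        if state is State.S:
            return State.M, ["Bus write"]           # shared line: treated as a miss
        return State.M, ["Bus read", "Bus write"]   # write miss on an invalid line
    raise ValueError(f"unknown op: {op}")
```

Only hits on a Modified line, and read hits on a Shared line, stay off the bus; every other access is visible to the snooping caches.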
[Figure: MSI state diagram, panel b) Bus snooping. States: I, S, M. Edge labels as extracted: "Bus write"; "Bus read – send data to requestor"; "Bus write"; "Bus read"; "send data to requestor"; "Bus read or write".]
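The snooping side can be sketched the same way. The function below follows the usual MSI snoop rules rather than being read off the figure, whose edges did not survive extraction; the name `snoop` and the string encoding of states and bus operations are assumptions made for this example.

```python
def snoop(state, bus_op):
    """Return (next_state, supplies_data) when another cache's bus
    transaction for this line is observed.

    state is "M", "S" or "I"; bus_op is "Bus read" or "Bus write".
    """
    if bus_op not in ("Bus read", "Bus write"):
        raise ValueError(f"unknown bus op: {bus_op}")
    if state == "M":
        # This cache holds the only up-to-date copy, so it must send the
        # data to the requestor; it keeps a shared copy only after a read.
        return ("S" if bus_op == "Bus read" else "I"), True
    if state == "S":
        # A remote read leaves the copy shared; a remote write invalidates it.
        return ("S" if bus_op == "Bus read" else "I"), False
    return "I", False   # nothing cached, nothing to do
```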
Example
• assume cache line is initially invalid
• consider two addresses, A1 and A2
• assume A1 and A2 map to the same cache line, but A1 != A2
  – that is, A1 and A2 refer to completely different places in memory, not adjacent (or nearby) addresses that fit within the same block
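One way two distinct blocks end up on the same line is a direct-mapped cache, where the line index is taken from the middle bits of the address. The sketch below assumes a direct-mapped cache with made-up parameters (`BLOCK_SIZE`, `NUM_LINES`) and made-up addresses; the slides do not specify any of these.

```python
BLOCK_SIZE = 32    # bytes per block (assumed)
NUM_LINES = 256    # lines in the direct-mapped cache (assumed)


def line_index(addr):
    """Cache line that the block containing addr maps to."""
    return (addr // BLOCK_SIZE) % NUM_LINES


def tag(addr):
    """Tag that distinguishes blocks sharing the same line."""
    return addr // (BLOCK_SIZE * NUM_LINES)


A1 = 0x1040
A2 = A1 + BLOCK_SIZE * NUM_LINES   # exactly one cache size away from A1

same_line = line_index(A1) == line_index(A2)
different_block = tag(A1) != tag(A2)
```

Here both `same_line` and `different_block` are true: the two addresses compete for one line, so bringing in A2 must evict A1.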
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P1: write 10 to A1 | I                |                  | Bus read  P1   A1         |
Step 1a: Write miss, invalid line
- is A1 cached anywhere?
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P1: write 10 to A1 | I                |                  | Bus read  P1   A1         |
                   | M     A1   10    |                  | Bus write P1   A1         |
Step 1b: No other cache responds
- assert ownership
Wait a minute...
• if we only have one type of read transaction (“Bus read”), how do we tell the difference between memory and another cache responding?
• the bus cycle allows for an “intervention”
  – more properly, a cache-to-cache intervention
  – a cache pre-empts the bus and answers instead of memory
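An intervention can be sketched as a bus read that checks the other caches before falling back to memory. This is a toy model with invented names (`bus_read`, dict-based caches); it also assumes memory picks up the data during the intervention, which is what the memory column shows in step 3b of the example.

```python
def bus_read(addr, requester, caches, memory):
    """Return (value, source) for a bus read issued by cache `requester`.

    caches is a list of dicts mapping addr -> (state, value),
    with state "M" or "S".
    """
    for i, cache in enumerate(caches):
        if i == requester:
            continue
        line = cache.get(addr)
        if line is not None and line[0] == "M":
            # Intervention: the owning cache pre-empts memory and answers.
            value = line[1]
            cache[addr] = ("S", value)   # owner keeps a shared copy
            memory[addr] = value         # memory picks up the data as well
            return value, f"cache {i}"
    return memory[addr], "memory"        # no owner: memory answers


caches = [{0x10: ("M", 10)}, {}]         # cache 0 owns a dirty copy
memory = {0x10: 0}                       # memory is stale
first = bus_read(0x10, requester=1, caches=caches, memory=memory)
second = bus_read(0x10, requester=1, caches=caches, memory=memory)
```

The first read is answered by cache 0; once the line has been downgraded to shared and memory is current, the second read is answered by memory.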
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P1: read A1        | M     A1   10    |                  |                           |
Step 2: Read hit
- no bus action needed
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P2: read A1        |                  | I                | Bus read  P2   A1         |
Step 3a: Read miss
- does anyone have A1 cached?
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P2: read A1        |                  | I                | Bus read  P2   A1         |
                   | S     A1   10    | S     A1   10    | Bus write P1   A1   10    | A1   10
Step 3b: Cached elsewhere
- P1 replies
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P2: write 20 to A1 | I                | M     A1   20    | Bus write P2   A1         |
Step 4: Write hit, shared line
- now P2 owns it
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P2: write 40 to A2 |                  |                  | Bus write P2   A1   20    | A1   20
Step 5a: Write miss, A2 maps to the same line as A1
- first, write back the victim
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P2: write 40 to A2 |                  |                  | Bus write P2   A1   20    | A1   20
                   |                  |                  | Bus read  P2   A2         |
Step 5b: Service the miss
- does anyone have A2 cached?
Step               | P1               | P2               | Bus                       | Memory
                   | State Addr Value | State Addr Value | Action    Proc Addr Value | Addr Value
P1: write 10 to A1 | I                |                  | Bus read  P1   A1         |
                   | M     A1   10    |                  | Bus write P1   A1         |
P1: read A1        | M     A1   10    |                  |                           |
P2: read A1        |                  | I                | Bus read  P2   A1         |
                   | S     A1   10    | S     A1   10    | Bus write P1   A1   10    | A1   10
P2: write 20 to A1 | I                | M     A1   20    | Bus write P2   A1         |
P2: write 40 to A2 |                  |                  | Bus write P2   A1   20    | A1   20
                   |                  |                  | Bus read  P2   A2         |
                   |                  | M     A2   40    | Bus write P2   A2         |
Step 5c: Not cached elsewhere
- like the second half of step 1
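The whole example can be replayed with a small simulator. This is a sketch under the example's own assumptions: each cache has a single line (so A1 and A2 always conflict), a write miss is a bus read followed by a bus write, and an owner answering a bus read shows up as a "Bus write" carrying the data. The class and method names are invented for illustration.

```python
class Line:
    def __init__(self):
        self.state, self.addr, self.value = "I", None, None


class System:
    def __init__(self, names):
        self.names = names
        self.lines = {n: Line() for n in names}   # one cache line per processor
        self.memory = {}
        self.bus = []                             # (action, processor, addr, value)

    def _holds(self, p, addr):
        line = self.lines[p]
        return line.state != "I" and line.addr == addr

    def _evict(self, p):
        line = self.lines[p]
        if line.state == "M":                     # dirty victim: write it back
            self.memory[line.addr] = line.value
            self.bus.append(("Bus write", p, line.addr, line.value))
        line.state = "I"

    def _bus_read(self, p, addr):
        self.bus.append(("Bus read", p, addr, None))
        value = self.memory.get(addr)
        for q in self.names:
            other = self.lines[q]
            if q != p and self._holds(q, addr) and other.state == "M":
                # Intervention: the owner supplies the data, memory is updated.
                self.bus.append(("Bus write", q, addr, other.value))
                value = other.value
                self.memory[addr] = value
                other.state = "S"
        return value

    def _bus_write(self, p, addr):
        self.bus.append(("Bus write", p, addr, None))
        for q in self.names:
            if q != p and self._holds(q, addr):
                self.lines[q].state = "I"         # snoopers invalidate

    def read(self, p, addr):
        line = self.lines[p]
        if not self._holds(p, addr):              # read miss
            self._evict(p)
            line.value = self._bus_read(p, addr)
            line.addr, line.state = addr, "S"
        return line.value

    def write(self, p, addr, value):
        line = self.lines[p]
        if not self._holds(p, addr):              # write miss
            self._evict(p)
            self._bus_read(p, addr)
            line.addr = addr
            self._bus_write(p, addr)
        elif line.state == "S":                   # write hit on a shared line
            self._bus_write(p, addr)
        line.state, line.value = "M", value


sim = System(["P1", "P2"])
sim.write("P1", "A1", 10)   # step 1
sim.read("P1", "A1")        # step 2
sim.read("P2", "A1")        # step 3
sim.write("P2", "A1", 20)   # step 4
sim.write("P2", "A2", 40)   # step 5
for entry in sim.bus:
    print(entry)
```

The bus log contains the same eight transactions, in the same order, as the Bus column of the completed table, and the run ends with P1 invalid, P2 holding A2 = 40 in Modified, and memory holding A1 = 20.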
Four-state protocol
• add “exclusive” state
• indicates this is the only cached copy
• no need to broadcast an invalidation on a write hit to an E line
• goal is to reduce bus traffic
• works well for private data such as local variables, which no other cache holds
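The saving can be illustrated by counting bus transactions for a read miss followed by a write to the same line. This is a deliberately simplified count, not a measurement; the function name and the `"MSI"` / `"MESI"` labels (MESI being the common name for the four-state protocol) are assumptions made for the example.

```python
def bus_transactions(protocol, other_sharers):
    """Count bus transactions for: read miss, then write hit, on one line."""
    count = 1                                   # the read miss always uses the bus
    if protocol == "MESI" and not other_sharers:
        state = "E"                             # this is the only cached copy
    else:
        state = "S"                             # three-state protocol has no E
    if state == "S":
        count += 1                              # write hit on S broadcasts an invalidation
    # A write hit on E upgrades to M without touching the bus.
    return count


private_msi = bus_transactions("MSI", other_sharers=False)
private_mesi = bus_transactions("MESI", other_sharers=False)
shared_mesi = bus_transactions("MESI", other_sharers=True)
```

For private data the four-state protocol needs one transaction where the three-state protocol needs two; for genuinely shared data there is no difference.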
[Figure: four-state protocol diagram, panel a) Processor actions. States: I, E, S, M. Edge labels: "Write (hit)"; "Write (miss)"; "Read (hit)"; "Read or write (hit)"; "Read (miss) 1"; "Read (miss) 2".]
1: data comes from memory
2: data comes from another cache
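The processor-side transitions of the four-state protocol can be sketched as follows. The transitions follow the standard MESI rules rather than being read off the figure; the function name `mesi_next_state` and the `shared_elsewhere` flag are invented for this example.

```python
def mesi_next_state(state, op, shared_elsewhere=False):
    """Processor-side transition; returns (next_state, needs_bus).

    state is "M", "E", "S" or "I"; op is "read" or "write".
    shared_elsewhere only matters for a read miss: False means the data
    comes from memory (case 1), True means from another cache (case 2).
    """
    if op not in ("read", "write"):
        raise ValueError(f"unknown op: {op}")
    if state == "I":
        if op == "write":
            return "M", True                          # write miss
        return ("S" if shared_elsewhere else "E"), True   # read miss 2 / read miss 1
    if op == "read":
        return state, False                           # read hit in E, S or M
    # Write hit: only a shared line has to broadcast an invalidation.
    return "M", state == "S"
```

The only write hit that reaches the bus is the one on a Shared line; a write hit on an Exclusive line becomes Modified silently.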