ECSE 425 Lecture 29: More Snoopy Coherence
H&P Chapter 4
© 2011 Patterson, Vu, Meyer; © 2007 Elsevier Science. ECSE 425, Fall 2011, Lecture 29
Last Time
• Snoopy Coherence
Today
• Snoopy Coherence Example
• SMP Performance
Write-back State Machine-III
• State machine for CPU requests and for bus requests for each cache block
• States: Invalid, Shared (read only), Exclusive (read/write)
• CPU requests
– Invalid, CPU read: place read miss on bus; go to Shared
– Invalid, CPU write: place write miss on bus; go to Exclusive
– Shared, CPU read hit: stay in Shared
– Shared, CPU read miss: place read miss on bus; stay in Shared
– Shared, CPU write: place write miss on bus; go to Exclusive
– Exclusive, CPU read hit or CPU write hit: stay in Exclusive
– Exclusive, CPU read miss: write back block, place read miss on bus; go to Shared
– Exclusive, CPU write miss (replace block): write back cache block, place write miss on bus; stay in Exclusive
• Bus requests
– Shared, write miss for this block or invalidate for this block: go to Invalid
– Exclusive, write miss for this block: write back block (abort memory access); go to Invalid
– Exclusive, read miss for this block: write back block (abort memory access); go to Shared
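The state machine for one cache block can be sketched as two small functions, one for CPU requests and one for snooped bus requests. This is a minimal illustration, not code from the lecture: the names `cpu_request` and `bus_request` and the `tag_match` flag (false when the frame currently holds a different address, so the access is a replacement miss) are assumptions.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

def cpu_request(state, op, tag_match=True):
    """Return (next_state, bus_actions) for a CPU 'read' or 'write'.

    tag_match=False models a miss that replaces the block in the frame.
    """
    if state == INVALID or not tag_match:
        actions = []
        if state == EXCLUSIVE:
            actions.append("write back block")   # dirty block is evicted
        actions.append("read miss" if op == "read" else "write miss")
        return (SHARED if op == "read" else EXCLUSIVE), actions
    if op == "read":
        return state, []                          # read hit: no bus traffic
    if state == SHARED:
        return EXCLUSIVE, ["write miss"]          # write to a read-only copy
    return EXCLUSIVE, []                          # write hit in Exclusive

def bus_request(state, message):
    """Return (next_state, actions) for a snooped message about this block.

    message is 'read miss', 'write miss', or 'invalidate'.
    """
    if state == EXCLUSIVE and message == "read miss":
        return SHARED, ["write back block"]
    if state == EXCLUSIVE and message == "write miss":
        return INVALID, ["write back block"]
    if state == SHARED and message in ("write miss", "invalidate"):
        return INVALID, []
    return state, []                              # nothing to do otherwise

print(cpu_request(EXCLUSIVE, "read", tag_match=False))
print(bus_request(EXCLUSIVE, "read miss"))
```

Keeping the CPU side and the bus side as separate functions mirrors the two halves of the diagram: a cache controller reacts both to its own processor and to traffic it snoops from other processors.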
WB Invalidate Example
Assumes A1 and A2 map to the same cache block; initial cache state is invalid
WB Invalidate Example
Assumes A1 and A2 map to same cache block, but A1 != A2
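The step-by-step tables for this example did not survive in the transcript. As a stand-in, the sketch below runs a short access sequence through two single-frame write-back caches on a shared bus. The trace, the values 10, 20, and 40, and the names `Cache`, `simulate`, and `TRACE` are assumptions chosen for illustration, not necessarily what the slides showed. A1 and A2 are distinct addresses that compete for the one frame.

```python
class Cache:
    """One cache with a single block frame (A1 and A2 both map to it)."""
    def __init__(self, name):
        self.name = name
        self.state = "Invalid"   # Invalid, Shared, or Exclusive
        self.addr = None         # address currently held in the frame
        self.value = None

def simulate(trace):
    """Run (cpu, op, addr, value) steps; return (caches, memory, bus_log)."""
    caches = {name: Cache(name) for name in ("P1", "P2")}
    memory = {}
    bus_log = []
    for cpu, op, addr, value in trace:
        me = caches[cpu]
        present = me.state != "Invalid" and me.addr == addr
        if present and (op == "read" or me.state == "Exclusive"):
            if op == "write":
                me.value = value              # write hit in Exclusive
            continue                          # hit: no bus traffic
        if me.state == "Exclusive" and me.addr != addr:
            memory[me.addr] = me.value        # replacing a dirty block
            bus_log.append(f"WrBk {cpu} {me.addr}")
        bus_log.append(f"{'RdMs' if op == 'read' else 'WrMs'} {cpu} {addr}")
        for other in caches.values():         # every other cache snoops
            if other is me or other.state == "Invalid" or other.addr != addr:
                continue
            if other.state == "Exclusive":
                memory[addr] = other.value    # owner writes the block back
                bus_log.append(f"WrBk {other.name} {addr}")
            other.state = "Shared" if op == "read" else "Invalid"
        me.addr = addr
        if op == "read":
            me.state, me.value = "Shared", memory.get(addr)
        else:
            me.state, me.value = "Exclusive", value
    return caches, memory, bus_log

# Hypothetical trace: (processor, operation, address, value written)
TRACE = [
    ("P1", "write", "A1", 10),
    ("P1", "read", "A1", None),
    ("P2", "read", "A1", None),
    ("P2", "write", "A1", 20),
    ("P2", "write", "A2", 40),
]

caches, memory, bus_log = simulate(TRACE)
for entry in bus_log:
    print(entry)
for cache in caches.values():
    print(cache.name, cache.state, cache.addr, cache.value)
print("memory:", memory)
```

The last step is where A1 != A2 matters: P2 holds A1 in Exclusive, so writing A2 into the same frame first forces a write-back of A1.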
Limitations of Snoopy SMPs
• Bus-based multiprocessor
– Bus for coherence traffic, memory traffic
– Becomes the bottleneck
– Solutions: Multiple buses or interconnection networks
• Single memory accommodates all CPUs
– Multi-bank memory
• Coherence traffic can overwhelm the network and memory system
Opteron: a hybrid solution
• Memory connected directly to each chip
– Up to four chips, or eight cores
• In addition, point-to-point connections
– Coherence protocol broadcasts over the point-to-point connections to up to three other chips (snooping idea)
– Invalidates complete via explicit ack (directory idea)
• Similar remote and local memory latency
– OS can treat Opteron as a UMA MP
Performance of SMPs
• Cache performance is a combination of
– Uni-processor cache miss traffic
– Traffic caused by communication
  • Results in invalidations and subsequent cache misses
• Cache misses:
– Compulsory, Capacity, Conflict misses
– 4th C: coherence miss
Coherency Misses
1. True sharing misses
– Multiple processors read and write the same word
– Miss would still occur if block size were one word
2. False sharing misses
– Multiple processors read and write different words in the same block
– Block size matters: no miss if block size were one word
– Occurs because coherence state (valid/dirty) is kept per block, not per word
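Whether two variables can falsely share depends only on where their addresses fall relative to block boundaries. The sketch below makes that concrete; the function names, the 4-byte word size, and the example addresses are assumptions for illustration.

```python
def block_index(addr, block_bytes):
    """Index of the cache block that contains byte address addr."""
    return addr // block_bytes

def may_false_share(addr_a, addr_b, block_bytes, word_bytes=4):
    """True if the addresses are different words inside the same block."""
    different_words = addr_a // word_bytes != addr_b // word_bytes
    same_block = block_index(addr_a, block_bytes) == block_index(addr_b, block_bytes)
    return different_words and same_block

x1, x2 = 0x1000, 0x1004                          # two adjacent 4-byte words (hypothetical)
print(may_false_share(x1, x2, block_bytes=64))   # same 64-byte block
print(may_false_share(x1, x2, block_bytes=4))    # one-word blocks
```

With one-word blocks, two different words can never land in the same block, which is why the block-size test separates false sharing from true sharing.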
Example: True or False Sharing misses?
• Assume x1 and x2 in same cache block. P1 and P2 both read x1 and x2 before.
| Time | P1 | P2 | True, False, Hit? Why? |
|---|---|---|---|
| 1 | Write x1 | | True miss; invalidate x1 in P2 |
| 2 | | Read x2 | False miss; x1 irrelevant to P2 |
| 3 | Write x1 | | False miss; x1 irrelevant to P2 |
| 4 | | Write x2 | True miss; would occur even with block size 1 |
| 5 | Read x2 | | True miss; x2 was written by P2 |
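One way to check a classification like this mechanically is to apply the one-word test directly: run the same accesses through the protocol twice, once with x1 and x2 in a single block and once with each word in its own block. A miss that also happens with one-word blocks is a true sharing miss; one that disappears is a false sharing miss. The sketch below assumes the invalidate protocol from earlier in the lecture, where a write to a Shared block also goes to the bus and is counted as a miss. The names `access`, `classify`, and `TRACE` are illustrative.

```python
def access(states, cpu, op):
    """Apply one access to one block; return True if it needs the bus.

    states maps each processor to 'I' (Invalid), 'S' (Shared), 'E' (Exclusive).
    """
    if op == "read":
        if states[cpu] != "I":
            return False                 # read hit in Shared or Exclusive
        for other in states:
            if states[other] == "E":
                states[other] = "S"      # owner supplies the block
        states[cpu] = "S"
        return True
    if states[cpu] == "E":
        return False                     # write hit in Exclusive
    for other in states:
        states[other] = "I"              # invalidate every copy
    states[cpu] = "E"
    return True

def classify(trace, cpus=("P1", "P2"), words=("x1", "x2")):
    """Label each (cpu, op, word) access as 'true', 'false', or 'hit'."""
    # Both processors have read both words already, so everything starts Shared.
    big = {c: "S" for c in cpus}                        # x1 and x2 in one block
    small = {w: {c: "S" for c in cpus} for w in words}  # one-word blocks
    labels = []
    for cpu, op, word in trace:
        miss_big = access(big, cpu, op)
        miss_small = access(small[word], cpu, op)
        if not miss_big:
            labels.append("hit")
        elif miss_small:
            labels.append("true")        # still misses with one-word blocks
        else:
            labels.append("false")       # caused only by sharing the block
    return labels

TRACE = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),
    ("P1", "write", "x1"),
    ("P2", "write", "x2"),
    ("P1", "read", "x2"),
]
print(classify(TRACE))  # ['true', 'false', 'false', 'true', 'true']
```

Under these assumptions, step 4 comes out as a true sharing miss because P1 still holds a copy of x2 from its earlier read, so P2's write must invalidate it even with one-word blocks.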
MP Performance: Commercial Workload
• Three-level cache
• Uni-processor cache misses improve as cache size increases
• True sharing and false sharing misses are unchanged
MP Performance: Commercial Workload
• 2MB cache
• True sharing and false sharing misses increase as the number of CPUs grows from 1 to 8
Next Time
• Directory Coherence
• Synchronization and Consistency