in-network cache coherence micro’2006 noel eisley et.al, princeton univ. presented by pak, eunji
TRANSCRIPT
![Page 1: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/1.jpg)
In-network cache coher-ence
MICRO’2006Noel Eisley et.al, Princeton Univ.
Presented by PAK, EUNJI
![Page 2: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/2.jpg)
2
In-Network Cache Coherence
• Traditional Directory based Coherence• Decouple the protocol and the Communication medium
• Protocol optimizations for end-to-end communication• Network optimizations for reduced latency
• Known problems due to technology scaling• Wire delays• Storage overhead of directories (80 core CMPs)
• Expose the interconnection medium to the protocol• Directory = Centralized serializing point
• Distribute the directory with-in the network
![Page 3: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/3.jpg)
In-Network Read optimization
H A B
Read request
To sheerer dataDirectory based MSI
•Three end-to-end messages
In-Network Cache Coherence, Noel Eisley,
The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
![Page 4: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/4.jpg)
In-Network Read optimization
H A B
•Node B “bumps” into node A•While message in-transit to the home node H obtain the data directly from A
Read request
data In-Network MSI
In-Network Cache Coherence, Noel Eisley,
The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
![Page 5: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/5.jpg)
In-Network Write optimization
H A B
write request
Inv
InvDirectory based MSIAck
data
C
Ack
In-Network Cache Coherence, Noel Eisley,
The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
![Page 6: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/6.jpg)
In-Network Write optimization
H A B
write request
Inv+AckIn-network MSI
data
C
Ack +inv
•This in-transit optimization can reduce write communication from two round-trips to a single round-trip from C to H and back
In-Network Cache Coherence, Noel Eisley,
The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
![Page 7: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/7.jpg)
Current: Directory-Based Protocol
• Home node (H) keeps a list (or partial list) of sharers
• Read: Round trip over-head– Reads don’t take ad-
vantage of locality• Write: Up to two round
trip overhead– Invalidations ineffi-
cient
H
![Page 8: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/8.jpg)
New Approach: Virtual Network
• Network is not just for get-ting from A to B
• It can keep track of sharers and maintain coherence
• How? With trees stored within routers– Tree connecting home
node with sharers– Tree stored within
routers– On the way to home
node, when requests “bump” into trees, routers will re-route them towards sharers.
H
Bump into tree
Add to tree
![Page 9: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/9.jpg)
H
Reads Locate Data Efficiently
LegendH: Home node
Sharer node
Tree node (no data)
Read request/re-plyWrite request/reply
Teardown re-questAcknowledge-ment
• Read Request in-jected into the network
• Tree constructed as read reply is sent back
• New read in-jected into the network
• Request is redi-rected by the net-work
• Data is obtained from sharer and re-turned
![Page 10: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/10.jpg)
Writes Invalidate Data Efficiently
H
• Write Request in-jected into the network
• In-transit invali-dations
• Acknowledge-ments spawned at leaf nodes
• Wait for ac-knowledgements
• Send out write reply
LegendH: Home node
Sharer node
Tree node (no data)
Read request/re-plyWrite request/reply
Teardown re-questAcknowledge-ment
![Page 11: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/11.jpg)
Router Micro-architecture
Tree Cache
Link Tra-versal
Crossbar Traversal
Virtual Channel Arbitration
Physical Channel ArbitrationTree
Cache
Routing
Router Pipeline
Tag Data
En
try
H
![Page 12: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/12.jpg)
Router Micro-architecture: Virtual Tree Cache
• Each cache line in the virtual tree cache contains– 2 bits (To, From root
node) for each di-rection (NSEW)
– 1 bit if the local node contains valid data
– 1 busy bit– 1 Outstanding re-
quest bit
Tag Data
En
try
T T T TF F F FN S E W D B R
Virtual Tree Cache
![Page 13: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/13.jpg)
Methodology
• Benchmarks: SPLASH-2• Multithreaded simulator: Bochs running 16 or
64-way• Network and virtual tree simulator
– Trace-driven • Full-system simulation becomes prohibitive with many
cores• Motivation: explore scalability
– Cycle-accurate– Each message is modeled– Models network contention (arbitration)– Calculates average request completion times
BochsNetwork
Simulator
Applica-tion
Memory traces Memory access
latency perfor-mance
![Page 14: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/14.jpg)
Performance Scalability
• Compare in-network vir-tual tree protocol to standard directory proto-col
• Good improvement– 4x4 Mesh: Avg. 35.5%
read latency reduc-tion, 41.2% write la-tency reduction
• Scalable– 8x8 Mesh: Avg. 35%
read and 48% write latency reduction
4x4 Mesh
8x8 Mesh
![Page 15: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/15.jpg)
Storage Scalability
• Directory protocol: O(# Nodes)• Virtual tree protocol: O(# Ports) ~O(1)• Storage overhead
– 4x4 mesh: 56% more storage bits– 8x8 mesh: 58% fewer storage bits
16
64
# of Nodes Tree Cache Line
Directory Cache Line
28 bits
28 bits
18 bits
66 bits
![Page 16: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/16.jpg)
Issues
• Power Dissipation• Major Bottleneck
• complex functionality in the routers
• Complexity• Complexity of routers, arbiters, etc. • Lack of verification tools
• CAD tool support• More research needed for ease of adoption
![Page 17: In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI](https://reader036.vdocuments.us/reader036/viewer/2022072016/56649ef55503460f94c08b5f/html5/thumbnails/17.jpg)
Summary
• More and more cores on a CMP• Natural trend towards distribution of state• More interaction between interconnects and protocols
• Ring interconnects => medium size systems• Simple design, low cost• Limited Scalability
• In-Network coherence => large scale systems• Scalability• Severe implementation and power constraints on chip