![Page 1: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/1.jpg)
Deadlock Verification in Network-on-Chips
Julien Schmaltz and Freek Verbeek
![Page 2: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/2.jpg)
Julien and Freek
• Julien
  – French
  – Ph.D. from University of Grenoble 2006 (D. Borrione, TIMA Labs)
  – 1-year postdoc in Saarbrücken, Germany
  – 2.5-year postdoc Radboud University Nijmegen
  – Since October 2009, Assistant Professor Open Universiteit
    • but I still have an office within RUN!
  – Main research area in formal methods
• Freek
  – Dutch
  – Ph.D. student at the Radboud University Nijmegen
  – Started October 2008
  – Main research area in deadlock verification for NoCs
  – Official promotors: Frits Vaandrager and Marko van Eekelen
  – Daily supervisor: Julien
![Page 3: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/3.jpg)
Multicore shift has happened (A. Agarwal - Keynote NOCS 2011)
![Page 4: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/4.jpg)
Growing number of cores (W. Tichy - Keynote ICST 2011)
Intel, 2 cores: ~167 million transistors on 1.1 cm²
Intel, 4 cores: ~582 million transistors on 2.86 cm²
Intel, 8 cores: ~2.3 billion transistors on 6.8 cm²
AMD Opteron, 12 cores: ~1.8 billion transistors on 2 x 3.46 cm²
Sun Niagara3, 16 cores: ~1 billion transistors on 3.7 cm²
Intel SCC, 48 cores: ~1.3 billion transistors on 5.6 cm²
Tilera TILEPro64, 64 cores
Intel Research, 80 cores: ~100 million transistors on 2.75 cm²
![Page 5: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/5.jpg)
80 Cores Research Chip
• Teraflops, 62 Watts
• 100 million transistors, 275 mm²
• 25% node area for router
• ASCI Red Supercomputer
• Teraflops (Dec. 1996)
• 10,000 Pentium Pro
• 104 cabinets, 230 m²
![Page 6: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/6.jpg)
It is only the beginning...
Source: IEEE Computers 2011
![Page 7: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/7.jpg)
A key component: the communication fabric or Network-on-Chip (NoC)
• 80-core research chip
  – 25% area for the NoC
  – 30% power consumption for the NoC
• Communication fabrics are key
  – to performance and efficiency
  – to functional correctness
![Page 8: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/8.jpg)
Verification challenges
• NoCs are very large systems
  – Verification methods must scale up to 100s of agents
  – Large number of parameters (routing, switching, buffers, etc.)
  – Regular and irregular topologies
• NoCs must be fault-tolerant
  – Deep sub-micron effect
  – Not all routers/processors are working
  – Static and dynamic fault models
• NoCs have intricate message dependencies
  – Mix between interconnect and protocols, e.g. cache coherency or master/slave
  – Deadlocks can emerge from deadlock-free routing and protocols
![Page 9: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/9.jpg)
Networks-on-Chips: Example 1, Hermes
• XY minimal deterministic routing
• Wormhole switching
• Frame structure based on flits (header, control, data)
[Figure: frame structure; field labels: HD, NOF, DATA]
![Page 10: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/10.jpg)
A simple master/slave protocol
• Masters send requests and wait for responses
• Slaves produce responses when receiving requests
• Deadlock-free protocol
[Figure: Master, Slave]
![Page 11: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/11.jpg)
XY routing in a 2D-mesh
• Deterministic simple routing algorithm
• First route to the destination column and then to the correct row
• No cyclic dependencies and thus deadlock-free
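The XY rule can be sketched in a few lines of Python. This is a minimal sketch, not the Hermes implementation: the coordinate convention (x is the column, y is the row) and the names `xy_next_hop` and `xy_route` are assumptions made for illustration.

```python
def xy_next_hop(cur, dst):
    """Return the next node on the XY route from cur to dst, or None on arrival.

    Nodes are (x, y) pairs; x is the column and y the row (assumed convention).
    """
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        # First move along the X dimension until the destination column is reached.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # Then move along the Y dimension until the destination row is reached.
        return (cx, cy + (1 if dy > cy else -1))
    return None  # already at the destination


def xy_route(src, dst):
    """List every node visited from src to dst, both included."""
    path = [src]
    while (nxt := xy_next_hop(path[-1], dst)) is not None:
        path.append(nxt)
    return path
```

Because a packet never turns from the Y dimension back to the X dimension, the routing function alone induces no cyclic channel dependency, which is the property the slide refers to.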
![Page 12: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/12.jpg)
All nodes are both masters and slaves
• Is the system deadlock-free?
• No! Two nodes are sufficient to create a deadlock.
![Page 14: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/14.jpg)
Masters on even columns and slaves on odd columns
• Is the system deadlock-free?
• No if there are at least four columns, yes otherwise.
[Figure: mesh with master and slave columns]
Green request cannot be consumed as it is waiting for the blue request to leave.
![Page 16: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/16.jpg)
All masters on the right and all slaves on the left
• Is the system deadlock-free?
• Yes! A deadlock would require a response to wait for a request
[Figure: Master, Slave]
No dependencies between responses and requests
![Page 18: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/18.jpg)
From deadlock-free components a deadlock emerges !
![Page 19: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/19.jpg)
Networks-on-Chips: Example 2, Spidergon
• Design by STMicroelectronics
• Simple shortest path routing algorithm
• Regular for an even number of nodes
• Packet, circuit, or wormhole switching
RelAd = (dest - current) mod (4 * N)
if RelAd = 0 then stop
elseif 0 < RelAd <= N then go clockwise
elseif 3*N <= RelAd <= 4*N then go counter clockwise
else go across
endif
Route from 0 to 7 ? 1 to 6 ?
[Figure: Spidergon with 8 nodes, numbered 0 to 7]
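The routing rule above can be tried out with a short Python sketch. It assumes a Spidergon with 4*N nodes numbered so that going clockwise increases the node index by one and the across link connects node i to node i + 2N; these conventions, and the names `spidergon_next_hop` and `spidergon_route`, are assumptions for illustration rather than part of the original design.

```python
def spidergon_next_hop(current, dest, n):
    """Next node on a Spidergon with 4*n nodes, or None when the packet has arrived."""
    size = 4 * n
    rel_ad = (dest - current) % size       # relative address, always in 0 .. 4n-1
    if rel_ad == 0:
        return None                        # stop
    if rel_ad <= n:
        return (current + 1) % size        # go clockwise (assumed: index + 1)
    if rel_ad >= 3 * n:
        return (current - 1) % size        # go counter-clockwise (assumed: index - 1)
    return (current + 2 * n) % size        # go across (assumed: index + 2n)


def spidergon_route(src, dest, n):
    """List every node visited from src to dest, both included."""
    path = [src]
    while (nxt := spidergon_next_hop(path[-1], dest, n)) is not None:
        path.append(nxt)
    return path
```

Under these assumptions and with N = 2 (8 nodes), `spidergon_route(0, 7, 2)` gives `[0, 7]` and `spidergon_route(1, 6, 2)` gives `[1, 5, 6]`: the second packet takes the across link first and then one clockwise hop.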
![Page 20: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/20.jpg)
The interconnect has deadlocks !
• For instance, we can have a cycle of packets
[Figure: Spidergon with 8 nodes, numbered 0 to 7]
![Page 21: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/21.jpg)
The only possible cycle is the ring
• Because routing is across first
[Figure: Spidergon with 8 nodes, numbered 0 to 7; routes shown: 7 to 4 and 6 to 3]
6 to 3 routes through 2, not 7
![Page 22: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/22.jpg)
Nodes in first quarter are slaves only
• Is the system deadlock-free?
• Yes! A cycle would require a response going from 0 to 2.
[Figure: Spidergon with 8 nodes, numbered 0 to 7; legend: Slave, Master]
![Page 24: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/24.jpg)
From components with deadlocks a deadlock-free system emerges !
![Page 25: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/25.jpg)
Nodes in first quarter are slaves only
• Is the system deadlock-free?
• No!
[Figure: Spidergon with 16 nodes, numbered 0 to 15; legend: Slave, Master]
![Page 27: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/27.jpg)
Confusing...
• We need tools to (quickly) check for deadlocks
  – in large systems
  – with message dependencies
![Page 28: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/28.jpg)
Outline
• Intel's micro-architectural description language
  – xMAS language
  – Capturing high-level structure and message dependencies
• Deadlock verification for xMAS
  – Definition of deadlocks
  – Labelled dependency graph
  – Feasible logically closed subgraph
• Conclusion and future work
![Page 29: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/29.jpg)
Intel's abstraction for communication fabrics
• High level of abstraction
• Exploit high-level structure
Automatic proofs using invariant generation and hardware model-checking
![Page 30: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/30.jpg)
xMAS - Executable MicroArchitectural Specifications
• Fair sinks and sometimes sources
• Diagram is formal model
• Friendly to microarchitects
![Page 31: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/31.jpg)
Composing modules via channels
• Channels with three signals
  – data, input ready (irdy), target ready (trdy)
• Transfer cycle
  – both input ready and target ready are "true"
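The handshake can be illustrated with a small Python sketch. The `Channel` class and the `step` helper are hypothetical names introduced here; xMAS itself is a graphical language, so this only mimics the transfer condition stated above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Channel:
    """One xMAS-style channel, reduced to its three signals."""
    irdy: bool = False            # initiator has a datum to send
    trdy: bool = False            # target is able to accept a datum
    data: Optional[Any] = None    # the datum offered by the initiator

    def transfers(self) -> bool:
        # A transfer happens in a cycle only when both ready signals are true.
        return self.irdy and self.trdy


def step(channel: Channel) -> Optional[Any]:
    """Return the datum moved in this cycle, or None when nothing is transferred."""
    return channel.data if channel.transfers() else None
```

A channel with `irdy` true and `trdy` false holds a waiting packet; if that situation persists forever, the channel is a candidate for the "dead" channel discussed later in the talk.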
![Page 32: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/32.jpg)
A simple example
• Two sources
  – both inject responses
[Figure: xMAS network; channels labelled rsp, rsp, rsp, req]
![Page 33: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/33.jpg)
Another simple example
• Two message types
  – requests and responses
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req]
![Page 34: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/34.jpg)
An academic example - Dining Philosophers
• Philosophers model in xMAS
  – Hands as 2 message types
  – Spoons as queues of size 1
  – "Eat" as join between hands
[Figure: xMAS model of the dining philosophers; labels: left, right, spoons, l, r; queues of size 1]
![Page 35: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/35.jpg)
An example extracted from Intel's designs
• Two message types
  – blue and red
• Sorting queues
• Reds and blues synchronized before sink
[Figure: xMAS network with queues q0 and q1; channels labelled b and r]
![Page 36: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/36.jpg)
Processing node for XY routing in a 2D-mesh
[Figure: xMAS model of the processing node with ports L, N, S, E, W; routing conditions on the header h: h.x = X, h.x ≠ X, h.y = Y, h.y ≠ Y, h.x > X, h.x < X, h.y > Y, h.y < Y]
![Page 37: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/37.jpg)
Processing node with requests and responses
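[Figure: xMAS model of the processing node with ports L, N, S, E, W; routing conditions on the header h: h.x = X, h.x ≠ X, h.y = Y, h.y ≠ Y, h.x > X, h.x < X, h.y > Y, h.y < Y; message formats (dst, src, req) and (src, _, rsp); channels labelled req and rsp]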
![Page 38: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/38.jpg)
Processing node for Spidergon
[Figure: xMAS model of the Spidergon processing node with ports R, CW, ACR, CCW; routing conditions: relAd = 0, relAd ≠ 0, relAd ≤ N, relAd > N, relAd < 3N, relAd ≥ 3N]
![Page 39: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/39.jpg)
Formal definition of "deadlock" in xMAS
• Intuition is a "dead" channel
• Formal definition based on Linear Temporal Logic
  – Predicate logic
  – Temporal operators "eventually" (F) and "globally" (G)
• Channel u is dead iff
  – F (u.irdy ∧ G ~u.trdy)
  – Eventually the input is ready and the target is globally (forever) not ready
  – A packet arrives at a channel but will never be able to cross it
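To make the definition concrete, the sketch below checks it on a single execution written as a "lasso": a finite prefix followed by a non-empty loop that repeats forever. The lasso representation and the name `is_dead` are assumptions for illustration; the check follows the prose reading above (at some point the input is ready, and from that point on the target is never ready).

```python
def is_dead(prefix, loop):
    """Check deadness of one channel on the infinite trace prefix + loop + loop + ...

    Each step is a pair (irdy, trdy). The loop is assumed to be non-empty.
    Returns True iff there is a step where irdy holds and trdy is false
    at that step and at every later step.
    """
    # trdy must stay false forever, so it must be false on the whole loop.
    if any(trdy for _, trdy in loop):
        return False
    # If irdy holds somewhere in the loop, that step is a witness.
    if any(irdy for irdy, _ in loop):
        return True
    # Otherwise look for a witness in the prefix, scanning backwards:
    # a witness needs trdy to be false on the rest of the prefix as well.
    for irdy, trdy in reversed(prefix):
        if trdy:
            return False
        if irdy:
            return True
    return False
```

This checks one given trace only; proving that no reachable execution of a network contains a dead channel is the harder problem that the rest of the talk addresses.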
![Page 41: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/41.jpg)
A simple example
• Two sources
  – one for requests
  – one for responses
• Is it deadlock-free?
[Figure: xMAS network; channels labelled rsp, rsp, rsp, req; channel signals irdy, data, trdy]
![Page 42: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/42.jpg)
A simple example
• Two sources for responses
• There is a deadlock
  – no cycle
  – no message dependencies
[Figure annotations: idle channel, blocking queue]
![Page 45: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/45.jpg)
Another simple example - deadlock configuration
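• Two message types
  – requests and responses
• Which types can source x inject without creating a deadlock?
  – If x = rsp, no deadlock
  – If x = req, then requests get blocked in q1
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req; legend: requests, responses]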
![Page 47: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/47.jpg)
Another simple example - deadlock configuration (1)
• Inject two requests in q0
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req; legend: requests, responses]
![Page 48: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/48.jpg)
Another simple example - deadlock configuration (2)
• Inject two requests in q0
• Fork creates two copies
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req; legend: requests, responses]
![Page 49: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/49.jpg)
Another simple example - deadlock configuration (3)
• Inject two requests in q0
• Fork creates two copies
• One pair is sunk
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req; legend: requests, responses]
![Page 50: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/50.jpg)
Another simple example - deadlock configuration (4)
• Inject two requests in q0
• Fork creates two copies
• One pair is sunk
• Inject two responses in q0
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req; legend: requests, responses]
![Page 55: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/55.jpg)
Another simple example - deadlock configuration (5)
• Inject two requests in q0
• Fork creates two copies
• One pair is sunk
• Inject two responses in q0
• If x never injects responses, q1 is blocking
[Figure: xMAS network with source x (types req, rsp) and queues q0, q1, q2; channels labelled rsp and req; legend: requests, responses]
We have a deadlock without a circular wait!
![Page 57: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/57.jpg)
General approach for deadlock detection in xMAS networks
• Define deadlock equations for all components
  – Equations capture the reason why a component is idle or blocking
• Build a labelled waiting graph for each queue
  – Labels correspond to the equations
  – Graph captures the topology, i.e., the dependencies between the xMAS components
• Search for a feasible logically closed subgraph
  – Corresponds to a deadlock situation
  – Feasibility checked using Linear Programming
• This approach may output unreachable deadlocks
  – A first step generates invariants to rule out false deadlocks
  – Invariants are rather weak and simple; false deadlocks are in theory still possible
![Page 58: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/58.jpg)
Idle and blocked channels - searching for deadlocks
• Definition of a deadlock
  – F (u.irdy ∧ G ~u.trdy)
• Two reasons for a deadlock
  – a blocked channel (G~u.trdy)
  – an "idle" channel (G~u.irdy)
[Figure: channels u, v, w with signals irdy, data, trdy; annotations: v is blocked, w is idle]
![Page 60: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/60.jpg)
Deadlock equations for a channel
• Depends on the target component connected to the channel
• We look at the input port of the target component
[Figure: channel u with signals irdy, data, trdy between an initiator and a target]
![Page 61: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/61.jpg)
Deadlock equations for queues
• A queue is blocking when it is full and the message at its head is blocked
• We look at the input channel of the queue
• Block(u) = Full(q) . Block(v)
[Figure: queue with input channel u and output channel v; signals irdy, data, trdy]
![Page 62: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/62.jpg)
Deadlock equations for a join
• 2 cases
  – output is blocked
  – the other input is idle
• Block(u) = Idle(v) + Block(w)
[Figure: join with input channels u and v and output channel w; signals irdy, data, trdy]
We need to know when a channel is idle!
![Page 64: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/64.jpg)
Idle equations for a channel
• Depends on the initiator component connected to the channel
• We are looking at the input port of the initiator
[Figure: channel u with signals irdy, data, trdy between an initiator and a target]
![Page 65: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/65.jpg)
Idle equations for a join
• A join is idle if one of the input channels is idle
• Idle(w) = Idle(u) + Idle(v)
[Figure: join with input channels u and v and output channel w; signals irdy, data, trdy]
![Page 66: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/66.jpg)
Idle equations for a fork
• A fork output is idle if the input is idle or the other output is blocked
• Idle(w) = Idle(u) + Block(v)
[Figure: fork with input channel u and output channels v and w; each channel carries irdy, data and trdy]
![Page 67: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/67.jpg)
Idle equations for a queue
• A queue is idle if it is empty and its input channel is idle
• This is for one message type which might be blocked by another type
• Idle(w) = Empty(q) . Idle(u) + Block(w')
  – where w' is a message with a type different from w
[Figure: queue q with input channel u and output channel w; each channel carries irdy, data and trdy]
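The idle and block equations for the join, fork and queue can be read as Boolean formulas. The sketch below is a minimal illustration, assuming that "+" denotes logical OR and "." logical AND. The function names are hypothetical and are not taken from the authors' implementation.

```python
# Hypothetical encoding of the slide equations; "+" is read as OR, "." as AND.

def block_join_input(idle_other_input: bool, block_output: bool) -> bool:
    """Block(u) = Idle(v) + Block(w) for a join with inputs u, v and output w."""
    return idle_other_input or block_output


def idle_join_output(idle_u: bool, idle_v: bool) -> bool:
    """Idle(w) = Idle(u) + Idle(v): the join output is idle if either input is idle."""
    return idle_u or idle_v


def idle_fork_output(idle_input: bool, block_other_output: bool) -> bool:
    """Idle(w) = Idle(u) + Block(v) for a fork with input u and outputs v, w."""
    return idle_input or block_other_output


def idle_queue_output(empty_q: bool, idle_input: bool, block_other_type: bool) -> bool:
    """Idle(w) = Empty(q) . Idle(u) + Block(w') for one message type of a queue."""
    return (empty_q and idle_input) or block_other_type


# A join input is blocked as soon as its partner input never delivers.
print(block_join_input(idle_other_input=True, block_output=False))
# An empty queue whose input still delivers is not idle.
print(idle_queue_output(empty_q=True, idle_input=False, block_other_type=False))
```

Each function mirrors one equation, so a component's status can be evaluated once the status of its neighbouring channels is known.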
![Page 68: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/68.jpg)
Our quest for "dead" queues
• Definition of a deadlock
  – F (u.irdy => G~u.trdy)
• We look for a "dead" queue
  – with a message in it (u.irdy)
  – output blocked (G~u.trdy)
• Over-approximation
  – configuration not always reachable
  – we may output false deadlocks
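The "dead" queue described above holds a message (u.irdy) while its output is never accepted again (G~u.trdy). The sketch below checks this on a finite recorded trace, which can only approximate "globally" over an infinite run. The function name `is_dead_on_trace` and the trace representation are assumptions made for illustration.

```python
from typing import List, Tuple


def is_dead_on_trace(trace: List[Tuple[bool, bool]]) -> bool:
    """trace[i] = (irdy, trdy) of channel u at cycle i.

    Returns True if some cycle has irdy set and trdy stays low from that
    cycle to the end of the trace (a finite-trace stand-in for G~u.trdy).
    """
    trdy_low_from_here = True  # vacuously true past the last cycle
    for irdy, trdy in reversed(trace):
        trdy_low_from_here = trdy_low_from_here and not trdy
        if irdy and trdy_low_from_here:
            return True
    return False


# The message offered from cycle 1 onwards is never accepted.
print(is_dead_on_trace([(True, True), (True, False), (True, False)]))
# Here the target accepts in the last cycle, so nothing is dead.
print(is_dead_on_trace([(True, False), (True, True)]))
```

Scanning the trace backwards keeps the check linear in the trace length.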
![Page 69: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/69.jpg)
General approach for deadlock detection in xMAS networks
• Define deadlock equations for all components
  – Equations capture the reason why a component is idle or blocking
• Build a labelled waiting graph for each queue
  – Labels correspond to the equations
  – Graph captures the topology, i.e., the dependencies between the xMAS components
• Search for a feasible logically closed subgraph
  – Corresponds to a deadlock situation
  – Feasibility checked using Linear Programming
• This approach may output unreachable deadlocks
  – A first step generates invariants to rule out false deadlocks
  – Invariants are rather weak and simple - false deadlocks are in theory still possible
![Page 70: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/70.jpg)
Step 1 / simulation - req
inject req
queues with req
![Page 73: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/73.jpg)
Step 1 / simulation - rsp
inject rsp
queues with req and rsp
![Page 77: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/77.jpg)
Step 2 / labelled dependency graph (1)
start with a message in q1 and visit the join
[Figure: graph so far, nodes: start, q1, join; labels: q1.req ≥ 1]
![Page 78: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/78.jpg)
Step 2 / labelled dependency graph (2)
analyse the join according to its deadlock equation
Block(u) = Idle(v) + Block(w)
we go forward to the merge and backward to the switch
[Figure: graph so far, nodes: start, q1, join (channels u, v, w), "+" branch, mrg2, sw; labels: q1.req ≥ 1]
![Page 79: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/79.jpg)
Step 2 / labelled dependency graph (2)
forwards to the switch - then the sink can never be blocked
Block(u) = Block(w)
we assume fair sinks
[Figure: graph so far, nodes: start, q1, join, "+" branch, mrg2, sw, sink; labels: q1.req ≥ 1, false]
![Page 80: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/80.jpg)
Step 2 / labelled dependency graph (2)
backwards to the switch
Idle(u) = Idle(w)
[Figure: graph so far, nodes: start, q1, join, "+" branch, mrg2, sw, sink; labels: q1.req ≥ 1, false]
![Page 81: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/81.jpg)
Step 2 / labelled dependency graph (2)
backwards to the queue
Idle(u) = Idle(w) . Empty(q2)
note that we forgot the Block(w') case
[Figure: graph so far, nodes: start, q1, join, "+" branch, mrg2, sw, sink, q2; labels: q1.req ≥ 1, false, q2.rsp = 0]
![Page 82: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/82.jpg)
Step 2 / labelled dependency graph (2)
backwards to the merge and branch
Idle(w) = Idle(u) + Idle(v)
note branching is bad for us
[Figure: graph so far, nodes: start, q1, join, "+" branch, mrg2, sw, sink, q2, mrg1 (inputs u, v); labels: q1.req ≥ 1, false, q2.rsp = 0]
![Page 83: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/83.jpg)
Step 2 / labelled dependency graph (2)
backwards to the merge and branch
to the source - idle if no type produced
to the fork
Idle(u) = Block(v) + Idle(w)
[Figure: graph so far, nodes: start, q1, join, "+" branch, mrg2, sw, sink, q2, mrg1, "." branch, src2, frk; labels: q1.req ≥ 1, false, q2.rsp = 0, rsp ∉ x]
![Page 84: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/84.jpg)
Step 2 / labelled dependency graph (2)
backwards to q0 and the source
Idle(u) = Idle(w) . Empty(q0)
[Figure: graph so far, nodes: start, q1, join, "+" branch, mrg2, sw, sink, q2, mrg1, "." branch, src2, frk, q0, src1; labels: q1.req ≥ 1, false, q2.rsp = 0, rsp ∉ x, q0.rsp = 0, false]
![Page 85: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/85.jpg)
Step 2 / labelled dependency graph (2)
forwards back to q1 and stop expansion
Block(u) = Block(w) . Full(q1)
[Figure: complete graph, nodes: start, q1, join, "+" branches, mrg2, sw, sink, q2, mrg1, "." branch, src2, frk, q0, src1; labels: q1.req ≥ 1, false, q2.rsp = 0, rsp ∉ x, q0.rsp = 0, false, q1 = q1.size]
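The walk-through above expands the graph component by component and stops when it comes back to q1. A minimal worklist sketch of that expansion is given below. The successor table is a hypothetical simplification: it is loosely modelled on the slides, attaches labels to nodes, and is not the authors' exact graph.

```python
from collections import deque

# Hypothetical successor table: node -> (label or None, successor nodes).
RULES = {
    "q1":   ("q1.req >= 1",  ["join"]),
    "join": (None,           ["mrg2", "sw"]),  # Block(u) = Idle(v) + Block(w)
    "mrg2": (None,           ["sink"]),
    "sink": ("false",        []),              # fair sinks never block
    "sw":   (None,           ["q2"]),
    "q2":   ("q2.rsp = 0",   ["mrg1"]),
    "mrg1": (None,           ["src2", "frk"]),
    "src2": ("rsp not in x", []),
    "frk":  (None,           ["q0", "q1"]),    # the edge back to q1 closes a cycle
    "q0":   ("q0.rsp = 0",   ["src1"]),
    "src1": ("false",        []),
}


def build_graph(start):
    """Expand the dependency graph breadth-first from a start queue.

    A node that was already reached is not expanded again, which is what
    stops the expansion when the walk returns to q1.
    """
    edges, labels = {}, {}
    seen, work = {start}, deque([start])
    while work:
        node = work.popleft()
        label, succs = RULES[node]
        edges[node] = list(succs)
        if label is not None:
            labels[node] = label
        for nxt in succs:
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return edges, labels


edges, labels = build_graph("q1")
print(sorted(edges))
print(labels["q2"])
```

The `seen` set guarantees termination even though the fork points back to the start queue.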
![Page 87: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/87.jpg)
Step 2 / logically closed subgraph 1
[Figure: labelled dependency graph over q1, join, mrg2, sink, sw, q2, mrg1, frk, src1, src2 and q0, with "+" and "." branch nodes; labels: q1.req ≥ 1, q2.rsp = 0, rsp ∉ x, q0.rsp = 0, q1 = q1.size, false]
![Page 89: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/89.jpg)
Step 2 / logically closed subgraph 1
not feasible
[Figure: labelled dependency graph over q1, join, mrg2, sink, sw, q2, mrg1, frk, src1, src2 and q0, with "+" and "." branch nodes; labels: q1.req ≥ 1, q2.rsp = 0, rsp ∉ x, q0.rsp = 0, q1 = q1.size, false]
![Page 90: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/90.jpg)
Step 2 / logically closed subgraph 2
[Figure: labelled dependency graph over q1, join, mrg2, sink, sw, q2, mrg1, frk, src1, src2 and q0, with "+" and "." branch nodes; labels: q1.req ≥ 1, q2.rsp = 0, rsp ∉ x, q0.rsp = 0, q1 = q1.size, false]
![Page 92: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/92.jpg)
Step 2 / logically closed subgraph 2
feasible if x is req
[Figure: labelled dependency graph over q1, join, mrg2, sink, sw, q2, mrg1, frk, src1, src2 and q0, with "+" and "." branch nodes; labels: q1.req ≥ 1, q2.rsp = 0, rsp ∉ x, q0.rsp = 0, q1 = q1.size, false]
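The search for a feasible logically closed subgraph can be pictured as choosing one branch at every "+" node, keeping all branches at every "." node, and then checking whether the collected labels can hold together. The sketch below uses a hypothetical, much reduced graph, and a hand-written consistency check stands in for the Linear Programming step. All names in it are assumptions made for illustration.

```python
from itertools import product

# Hypothetical AND/OR graph: node -> (kind, label or None, successors).
GRAPH = {
    "q1":   ("and", "q1_has_req",     ["join"]),
    "join": ("or",  None,             ["sink", "q2"]),
    "sink": ("and", "false",          []),   # a fair sink never blocks
    "q2":   ("and", "q2_no_rsp",      ["src2"]),
    "src2": ("and", "src2_never_rsp", []),
}


def closed_subgraphs(graph, start):
    """Return the label sets of all logically closed subgraphs from start."""
    def expand(node, visiting):
        if node in visiting:              # cycle: dependency already assumed
            return [frozenset()]
        kind, label, succs = graph[node]
        own = frozenset([label]) if label else frozenset()
        if not succs:
            return [own]
        child_sets = [expand(s, visiting | {node}) for s in succs]
        if kind == "or":                  # "+": pick exactly one successor
            return [own | ls for sets in child_sets for ls in sets]
        # ".": keep every successor, combining one option from each
        return [own.union(*combo) for combo in product(*child_sets)]
    return expand(start, frozenset())


def feasible(labels, source_types):
    """Toy consistency check standing in for the LP feasibility test."""
    if "false" in labels:
        return False
    if "src2_never_rsp" in labels and "rsp" in source_types:
        return False
    return True


for labels in closed_subgraphs(GRAPH, "q1"):
    print(sorted(labels), feasible(labels, source_types={"req"}))
```

In this toy graph the branch through the sink is rejected because of its "false" label, while the other branch is accepted only when the source produces no rsp messages.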
![Page 93: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/93.jpg)
Implementation and case studies
• Implementation of algorithm in C
• 2 topologies
  – Spidergon from STMicroelectronics
  – HERMES from Univ. Rio Grande (Brazil)
![Page 94: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/94.jpg)
Experimental results
![Page 95: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/95.jpg)
An academic example - Dining Philosophers
• Philosophers model in xMAS
  – Hands as 2 message types
  – Spoons as queues of size 1
  – "Eat" as join between hands
• A ring of philosophers
  – Easy problem for our algorithm
  – 3,000 philosophers in 0.06 s
[Figure: ring of philosophers; labels: left, right, spoons, l, r]
![Page 96: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/96.jpg)
An example too hard for Intel but not for us!
• Two message types
  – blue and red
• Sorting queues
• Reds and blues synchronized before sink
• Is this network deadlock-free?
![Page 97: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/97.jpg)
deadlock-free
![Page 98: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/98.jpg)
ordering for blues and reds
![Page 99: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/99.jpg)
Outline
• Intel's micro-architectural description language
  – xMAS definition
  – examples
• Deadlock verification for xMAS
  – definition of deadlocks
  – labelled dependency graph
  – feasible logically closed subgraph
• Conclusion and future work
![Page 100: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/100.jpg)
Conclusion and future work
• Tool to detect message-dependent deadlocks
  – Very efficient
  – Intel is our only competitor
  – We can already handle more cases than Intel's techniques
• Still needs to be formally proven
  – Connection with our previous work on GeNoC
• Composition/Hierarchy
  – Check sub-networks first and then compose
• Memory consistency proofs
  – e.g. Producer Consumer relations
  – Open PhD position sponsored by Intel/Open Universiteit/Radboud
![Page 101: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/101.jpg)
Thanks!
![Page 102: Deadlock Verification in Network-on-Chipsfvaan/PV/aes_genoc_060611.pdf · 80 Cores Research Chip • Teraflops, 62 Watts • 100 millions transistors, 275 mm2 • 25% node area for](https://reader033.vdocuments.us/reader033/viewer/2022043023/5f3f41fc8c0f5a5dfc52687e/html5/thumbnails/102.jpg)
Deadlock example 3
• Channels with three signals
  – data, input ready, target ready
• Transfer cycle
  – both input ready and target ready are "true"
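The handshake above can be illustrated with a few lines of code: a transfer happens in exactly those cycles where input ready (irdy) and target ready (trdy) are both true. The function name `count_transfers` and the list-based trace format are assumptions made for this sketch.

```python
from typing import Sequence


def count_transfers(irdy: Sequence[bool], trdy: Sequence[bool]) -> int:
    """Count the cycles in which both irdy and trdy are true."""
    return sum(1 for i, t in zip(irdy, trdy) if i and t)


# Per-cycle signal values of one channel over four cycles.
irdy = [True, True, False, True]
trdy = [False, True, True, True]
print(count_transfers(irdy, trdy))  # 2: transfers happen in cycles 1 and 3
```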