A Systematic Methodology to Develop Resilient Cache Coherence Protocols
Konstantinos Aisopos (Princeton, MIT), Li-Shiuan Peh (MIT)
Motivation
• CMP era is here…
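• Enabled by aggressive transistor scaling: shrinking transistor dimensions -> unreliable silicon (10K-100K FITs, where 1 FIT = one failure per 10^9 device-hours; errors on the order of months) [1,2]
[Figure: CMP tiles, each with a processor core (P), private cache (P$), shared cache slice (S$) and network interface (NIC), connected by a mesh of routers (R)]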
[1] R. Baumann (TI), IEEE Design & Test of Computers, vol. 22, no. 3, 2005
[2] J. Graham (MoSys), EE Times, 2002
Motivation
• Goal: resilient cache coherence protocol
• Loss of a single coherence message => deadlock
[Figure: the same CMP; a data request from a requestor (R) to a node (S) crosses the router mesh]
Outline
• Motivation
• Methodology
  – Walkthrough: a resilient transaction
  – Defining resilience properties
  – Enforcing resilience properties
• Evaluation
  – Overhead
  – Performance
• Conclusions
Walkthrough Example: transaction -> resilient transaction
[Figure: initiator R (S -> SM -> M), sharers S1 and S2 (S -> I) and the directory (S -> BM -> M); messages: request (M), forwarded request (M), ack, ack, unblock]
1. The initiator sends a request to the directory.
2. The directory forwards the request to the sharers.
3. The sharers invalidate their copies and acknowledge.
4. The request completes and the initiator sends an unblock to the directory.
5. The directory updates the sharing vector and may now process succeeding requests.
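The five steps can be replayed as a message trace. The sketch below is a toy model, not the authors' simulator: it assumes a lossless FIFO network, assumes the sharers acknowledge the initiator directly, and uses hypothetical names (`run_baseline` and message strings such as `"request(M)"`).

```python
from collections import deque


def run_baseline(sharers):
    """Replay the non-resilient invalidation transaction of the walkthrough.

    Hypothetical model: messages are (src, dst, kind) tuples delivered in FIFO
    order on a lossless network; acks go straight to the initiator R.
    """
    states = {"R": "SM", "dir": "S", **{s: "S" for s in sharers}}
    sharing_vector = {"R", *sharers}
    net = deque([("R", "dir", "request(M)")])      # 1. initiator -> directory
    trace, acks = [], 0
    while net:
        src, dst, kind = net.popleft()
        trace.append((src, dst, kind))
        if dst == "dir" and kind == "request(M)":
            states["dir"] = "BM"                    # directory busy with this request
            for s in sharers:                       # 2. forward to every sharer
                net.append(("dir", s, "request(M)"))
        elif kind == "request(M)":
            states[dst] = "I"                       # 3. sharer invalidates ...
            net.append((dst, "R", "ack"))           #    ... and acknowledges
        elif dst == "R" and kind == "ack":
            acks += 1
            if acks == len(sharers):                # 4. all acks in: request completes
                states["R"] = "M"
                net.append(("R", "dir", "unblock"))
        elif dst == "dir" and kind == "unblock":
            states["dir"] = "M"                     # 5. update sharing vector, unblock
            sharing_vector = {"R"}
    return trace, states, sharing_vector


trace, states, sharing_vector = run_baseline(["S1", "S2"])
for msg in trace:
    print(msg)
```

If any one of the six messages were dropped, the loop would end with R stuck in SM or the directory stuck in BM: this is the deadlock mentioned in the Motivation.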
[Figure: initiator R (in SM) sends request (M) to the directory; the request is lost and R resends it after a timeout]
1. The initiator sends a request to the directory.
2. The request is lost.
3. The initiator resends the request after a timeout.
4. The directory forwards the request to the sharers (…the transaction continues identically as before).
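The recovery relies only on the initiator keeping its request pending and a timer running. A minimal sketch, assuming a hypothetical `Initiator` class driven one cycle at a time and an arbitrary timeout of 100 cycles (no value is given on this slide):

```python
class Initiator:
    """Hypothetical initiator that keeps its request pending in the transient
    state SM and regenerates it whenever a per-request timer expires."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.state = "SM"
        self.timer = 0
        self.sent = ["request(M)"]          # the original request

    def tick(self):
        """Advance one cycle; resend the request if no response arrived in time."""
        if self.state != "SM":
            return
        self.timer += 1
        if self.timer == self.timeout:
            self.sent.append("request(M)")  # regenerate the (possibly lost) request
            self.timer = 0

    def on_response(self):
        self.state = "M"                    # transaction done: the timer stops


init = Initiator(timeout=100)
for _ in range(150):                        # the first request was lost in the network
    init.tick()
init.on_response()                          # the resent request is served
print(init.sent, init.state)
```

A timer that fires while the original request is merely delayed, rather than lost, also produces a second copy, so the receivers must tolerate duplicate requests.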
[Figure: the directory (sharing vector S{R,S1,S2}) has already entered the busy state BM for R's request (M) when a later message of the transaction is lost; R times out and resends request (M), which reaches the directory as a duplicate. Inset: directory FSM fragment with busy states BS and BM, asking how BM should react to a duplicate request (M)]
To tolerate a duplicate request, a node must:
(1) transition to the same state, and
(2) generate the same messages.
1. The initiator resends its request.
2. The directory forwards the request to the sharers (again).
3. The sharers, already invalid (S -> I), acknowledge (again) (…the transaction completes identically as before).
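The duplicate rule can be expressed as idempotent transition functions. This is a hypothetical fragment (the function names, the `pending` field and the stall behaviour are assumptions): a replicate request (M) from the requestor already being served in BM yields the same next state and the same forwarded messages, and an invalid sharer re-sends the same ack.

```python
def directory_handle(state, pending, sharers, src, msg):
    """Hypothetical directory fragment: returns (next_state, pending, outgoing).

    `pending` is the requestor currently being served in the busy state BM.
    """
    if msg == "request(M)":
        if state == "S" or (state == "BM" and src == pending):
            # Original or replicate request: same next state, same forwarded messages.
            outgoing = [(s, "request(M)") for s in sorted(sharers - {src})]
            return "BM", src, outgoing
        if state == "BM":
            return state, pending, []   # conflicting request must wait (queueing not modeled)
    if msg == "unblock" and state == "BM" and src == pending:
        return "M", None, []
    raise ValueError(f"unexpected {msg} from {src} in state {state}")


def sharer_handle(state, msg):
    """A sharer that is already invalid re-sends the same ack for a replicate request."""
    if msg == "request(M)" and state in ("S", "I"):
        return "I", ["ack"]
    raise ValueError(f"unexpected {msg} in state {state}")


sharers = {"R", "S1", "S2"}
first = directory_handle("S", None, sharers, "R", "request(M)")
replica = directory_handle(first[0], first[1], sharers, "R", "request(M)")
print(first == replica)  # True: same state, same messages
```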
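Defining the Resilience Properties
• Message loss => the transaction is suspended; the requestor regenerates its request after a timeout.
• Every node that receives the regenerated request, or any message it triggers, must repeat what it did for the original: the same state transition and the same outgoing messages.
[Figure: a request from R triggers a chain of messages that ends with a response to R]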
Property 1: the initiator remains transient throughout the transaction.
Property 2: replicate messages roll back to the same earlier state (the state the original message moved the node into).
Property 3: retain the information needed to regenerate messages.
[Figure: initiator FSM, stable -> (request) -> transient … transient -> (last message) -> stable, plus FSM fragments in which msgA and msgB move a node between states, each illustrating one property]
Enforcing Property 1
Property 1: the initiator remains transient throughout a transaction, so that it can resend lost messages.
Counter-example: on the directory's response, the initiator sends unblock (the last message of the transaction) and enters a stable state, so it cannot resend unblock if that message is lost.
Enforcement:
- detect every outgoing message that transitions the initiator to a stable state
- replace the stable state with a transient state, and wait for done
[Figure: initiator FSM, stable -> (request) -> transient … transient -> (last message) -> stable, and the corrected version in which the last message leads to a transient state that waits for done from the directory]
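The enforcement rule is mechanical enough to express as an FSM rewrite. A sketch, assuming the initiator FSM is a dictionary from (state, event) to (next state, outgoing message). The `timeout` event and the `Xd` naming, which mirrors Md, Sd, ... in the overhead table, are assumptions:

```python
def enforce_property1(fsm, stable):
    """Rewrite a hypothetical initiator FSM so it never sends its last message
    while entering a stable state.

    fsm maps (state, event) -> (next_state, outgoing_message or None).
    Each transition that sends a message and lands in a stable state X is
    redirected to a new transient state "Xd" (X, waiting done), which can
    resend that message on a timeout and only reaches X when `done` arrives.
    Assumes each Xd waits on a single kind of last message.
    """
    resilient = dict(fsm)
    for (state, event), (nxt, msg) in fsm.items():
        if msg is not None and nxt in stable:
            waiting = nxt + "d"
            resilient[(state, event)] = (waiting, msg)
            resilient[(waiting, "done")] = (nxt, None)
            resilient[(waiting, "timeout")] = (waiting, msg)  # resend last message
    return resilient


base = {
    ("S", "store"): ("SM", "request(M)"),    # upgrade request
    ("SM", "last ack"): ("M", "unblock"),     # unblock sent while entering stable M
}
result = enforce_property1(base, stable={"M", "E", "S", "I"})
for key, value in result.items():
    print(key, "->", value)
```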
Enforcing Property 2
Property 2: a replicate message rolls back to the earlier state that the original message transitioned to.
- Problem: if msgA leads to T1 in one FSM branch and to T2 in another, and the two branches later merge into a common state TM, a replicate msgA received in TM cannot tell whether to roll back to T1 or T2.
- Enforcement: disassociate the branches after the merging point (TM -> TM1, TM2), so every state has a unique roll-back target.
[Figure: example in which initiator R, in I, sends request (M) and moves to M on receiving unique data from the directory; a replicate unique-data message must roll back to the same state]
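You can check whether a state has a unique roll-back target by walking the FSM and remembering, along each path, which state msgA last moved the FSM into. The checker and the FSM fragment below are hypothetical. The events `x`, `y` and states `P1`, `P2` are invented to create the two branches.

```python
from collections import deque


def rollback_targets(fsm, start, msg):
    """For every state, collect the states that the original `msg` moved the
    FSM into on some path from `start` (None: `msg` not yet received).

    fsm maps state -> {event: next_state}. Property 2 can be enforced in a
    state only if its set holds a single target.
    """
    targets, seen = {}, set()
    queue = deque([(start, None)])
    while queue:
        state, landing = queue.popleft()
        if (state, landing) in seen:
            continue
        seen.add((state, landing))
        targets.setdefault(state, set()).add(landing)
        for event, nxt in fsm.get(state, {}).items():
            queue.append((nxt, nxt if event == msg else landing))
    return targets


# Hypothetical fragment: two branches consume msgA, then merge in TM.
merged = {
    "S": {"x": "P1", "y": "P2"},
    "P1": {"msgA": "T1"}, "P2": {"msgA": "T2"},
    "T1": {"msgB": "TM"}, "T2": {"msgB": "TM"},
}
# Disassociated: one copy of TM per branch.
split = {**merged, "T1": {"msgB": "TM1"}, "T2": {"msgB": "TM2"}}

print(rollback_targets(merged, "S", "msgA")["TM"])   # two targets: ambiguous
print(rollback_targets(split, "S", "msgA")["TM1"])   # a single target: T1
```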
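Enforcing Property 3
Property 3: retain the information needed to regenerate every outgoing message, in case a replicate request is received.
Example: a sharer supplies unique data to R in response to R's request (M). If it dropped its copy immediately, it could not regenerate the unique data for a replicate request. Instead, the sharer retains the unique data in a transient state (T) until it receives an invalidate permission, and only then invalidates (I) and sends an invalidate ack.
[Figure: sharer FSM with messages unique data, invalidate permission and invalidate ack; the sharer retains the unique data while in T]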
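Here is a minimal sketch of the retained-data idea. It assumes a hypothetical `Owner` that starts in M and uses the message names from the figure. It illustrates Property 3; it is not the paper's sharer FSM.

```python
class Owner:
    """Hypothetical owner of a block in M that, per Property 3, keeps its unique
    data after replying, so a replicate request can be answered identically."""

    def __init__(self, data):
        self.state, self.data = "M", data

    def handle(self, msg):
        if msg == "request(M)" and self.state in ("M", "T"):
            self.state = "T"                      # transient: data sent, copy retained
            return ("unique data", self.data)     # original or regenerated reply
        if msg == "invalidate permission" and self.state in ("T", "I"):
            self.state, self.data = "I", None     # only now is it safe to drop the data
            return ("invalidate ack", None)       # re-sent if the permission is replicated
        raise ValueError(f"unexpected {msg} in state {self.state}")


owner = Owner(data=0xCAFE)
print(owner.handle("request(M)"))
print(owner.handle("request(M)"))             # replicate: identical reply
print(owner.handle("invalidate permission"))
```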
Evaluation: Overhead
Directory-based protocol (static directory node, MESI): 9 to 17 states (4 to 5 bits)
• Base stable states: Modified, Exclusive, Shared, Invalid
• Base transient states: IM (I->M), IS (I->S), SM (S->M), ISI (IS->I), MI (M->I)
• Added resilient states: Md (M, waiting done), Ed (E, waiting done), Sd (S, waiting done), Id (I, waiting done), Sp (S, waiting permission), Ip (I, waiting permission), Ma (M, waiting ack), Sa (S, waiting ack)
Broadcast-based protocol (AMD Hammer, MOESI): 12 to 22 states (4 to 5 bits)
• Base stable states: Modified, Owned, Exclusive, Shared, Invalid
• Base transient states: IM (I->M), IS (I->S), SM (S->M), SE (S->E), SS (S->S), OM (O->M), WB req
• Added resilient states: Md (M, waiting done), Ed (E, waiting done), Sd (S, waiting done), Id (I, waiting done), MId (MI, waiting done), Sp (S, waiting permission), Ip (I, waiting permission), Ma (M, waiting ack), Ea (E, waiting ack), Sa (S, waiting ack)
No state was introduced into the critical path of serving a request.
Miss Status Holding Register (MSHR): 4-32 entries
• Existing fields: PC, address, requestor, flags, state
• Added fields: timer (13 bits, 0 to 2^13), state (1 bit), response bitvector (64 bits), transID (6 bits): 11 bytes per entry
Total storage overhead: < 0.5 KB / core (worst case: 2 KB / core) (*)
(*) assuming a 64-node CMP with in-order cores
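The per-entry figure follows from the field widths: 13 + 1 + 64 + 6 = 84 bits, or 10.5 bytes, which rounds up to 11 bytes. With 4 to 32 entries, that comes to 44 to 352 bytes per core, consistent with the < 0.5 KB figure. The same rounding explains the state-encoding lines: 9 or 12 states fit in 4 bits, while 17 or 22 need 5. The sketch below only reproduces this arithmetic. It assumes one MSHR per core and does not attempt the 2 KB worst case, whose breakdown is not itemized here.

```python
import math

# Bits added to every MSHR entry, as listed on the slide.
ADDED_FIELDS = {"timer": 13, "state": 1, "response bitvector": 64, "transID": 6}


def entry_bytes(fields=ADDED_FIELDS):
    """Round the added bits per entry up to whole bytes."""
    return math.ceil(sum(fields.values()) / 8)


def state_bits(num_states):
    """Bits needed to encode `num_states` distinct protocol states."""
    return math.ceil(math.log2(num_states))


for entries in (4, 32):
    print(f"{entries} entries: {entries * entry_bytes()} bytes per core")
for protocol, (base, resilient) in {"directory": (9, 17), "broadcast": (12, 22)}.items():
    print(f"{protocol}: {state_bits(base)} -> {state_bits(resilient)} state bits")
```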
Evaluation: Performance
• Simulator: Wisconsin Multifacet GEMS
• Network-on-Chip: 8x8 mesh topology, 64-bit channels, 5 virtual networks (VNets), XY routing
• System configuration: in-order SPARC cores; L1 caches 64 KB/node, 3 cycles, 4-way, 64-byte blocks; L2 caches 1 MB/node, 6 cycles; memory 4 controllers x 1 GB, 160 cycles
Directory protocol: runtime overhead vs. non-resilient baseline (lower is better)
[Figure: runtime overhead (%, 0-12%) per benchmark, SPLASH (fft, fmm, lu, radix, water-nsq, water-sp) and PARSEC (blackscholes, canneal, fluidanimate, swaptions, x264), plus AVERAGE, under no faults, 1 fault / 1 msec, 1 fault / 100 μsec and 1 fault / 10 μsec. Annotated values: 7.4%, 11%, 1.4%, 1.8%, 1.1%, 3.5%]
Broadcast protocol: runtime overhead vs. non-resilient baseline (lower is better)
[Figure: runtime overhead (%, 0-30%) per benchmark, SPLASH (fft, fmm, lu, radix, water-nsq, water-sp) and PARSEC (blackscholes, canneal, fluidanimate, swaptions, x264), plus AVERAGE, under no faults, 1 fault / 1 msec, 1 fault / 100 μsec and 1 fault / 10 μsec. Annotated values: 2.4%, 5.1%, 0.5%, 20.4%, 51%, 56%]
Conclusions
We have presented a generic methodology:
• coherence protocol -> resilient coherence protocol, by enforcing 3 properties
• minimal hardware overhead (< 2 KB / node)
• small performance overhead
  – directory-based protocol: 1.4% (1 fault / msec)
  – broadcast-based protocol: 2.4% (1 fault / msec)
Thank You!
Questions?
BACKUP SLIDES
Why performance overhead?
• Transactions last longer => a request may have to wait for outstanding conflicting requests to complete
• Data remain in caches for longer (3-way handshake) => longer cache replacement duration
• More messages are injected into the NoC => higher network traffic => higher average NoC latency
Transaction Duration
[Figure: transaction duration (cycles, 0-350) per benchmark (fft, fmm, lu, radix, water-nsq, water-sp, blackscholes, canneal, fluidanimate, swaptions, x264), broken down into baseline transaction, Property 1 (done message) and Property 2 (invalidation handshake). B: baseline protocol, no faults; R: resilient protocol, 1 fault / 10 μsec; L1: transaction served by a sharer's L1; L2: transaction served by the directory (L2). Annotations: +12%, +18%, 11%, 24%]
• Large working sets and shared data => high number of requests (high traffic); (!) retransmissions saturate the network
Network Traffic
[Figure: link utilization (%, 0-70%) vs. request rate (total requests / cycle, 0.0-0.8), for the most congested link and averaged over all links; series: baseline protocol, no faults; resilient protocol, no faults; resilient protocol, 1 fault / 10 μsec; each with an extrapolated curve]
Enforcing the Resilience Properties
P2: a single message type transitions to a unique state in every FSM branch.
Case 2: identical messages in the same branch
• With a counter, identical messages from different senders (msgA from X, then from Y) move the node to T (count = 1) and then T (count = 2), so a replicate msgA cannot be told apart from a new one. Example: after sending request (M), initiator R counts acks (SM + acks = 0 -> SM + acks = 1 -> SM + acks = 2 -> … -> M), so a duplicate ack would be counted twice.
• Enforcement: record which sender each message came from. After msgA from X the node is in T [XYZ=100], after msgA from Y it is in T [XYZ=110], and a duplicate msgA from X leaves it in T [XYZ=100].
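The bit-per-sender bookkeeping can be sketched as follows. It assumes node ids are used directly as bit positions in a 64-bit response vector; the MSHR's response bitvector suggests this, but the mapping is an assumption. The helper names are hypothetical.

```python
def counter_acks(senders):
    """Baseline bookkeeping: count acks, so a duplicate is counted twice."""
    return len(senders)


def bitvector_acks(senders, num_nodes=64):
    """Resilient bookkeeping: one bit per sender (as in a 64-bit response
    bitvector); a duplicate ack re-sets a bit that is already set."""
    vec = 0
    for sender in senders:
        if not 0 <= sender < num_nodes:
            raise ValueError(f"node id {sender} out of range")
        vec |= 1 << sender
    return vec


expected = 2             # acks needed from sharers X = node 3 and Y = node 5
received = [3, 3]        # X's ack arrives twice, Y's has not arrived yet
print(counter_acks(received) >= expected)                    # True: completes too early
print(bin(bitvector_acks(received)).count("1") >= expected)  # False: still waiting for Y
```

A counter cannot tell a replicate from a new message, so a duplicate ack from X can complete the transaction before Y has responded. The bitvector turns the duplicate into a no-op, which is what Property 2 requires.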