rebound: scalable checkpointing for coherent shared...
TRANSCRIPT
![Page 1: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/1.jpg)
Rebound: Scalable Checkpointing for Coherent Shared Memory
Rishi Agarwal, Pranav Garg, and Josep Torrellas
Department of Computer Science
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu
![Page 2: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/2.jpg)
Checkpointing in Shared-Memory MPs
[Figure: execution timeline with two "save chkpt" points, a fault, and a rollback.]
• HW-based schemes for small CMPs use Global checkpointing
  – All procs participate in system-wide checkpoints
[Figure: processors P1 to P4 all taking part in each checkpoint.]
• Global checkpointing is not scalable
  – Synchronization, bursty movement of data, loss in rollback…
R. Agarwal, P. Garg, J. Torrellas
Rebound: Scalable Checkpointing
![Page 3: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/3.jpg)
Alternative: Coordinated Local Checkpointing
• Idea: threads coordinate their checkpointing in groups
• Rationale:
  – Faults propagate only through communication
  – Interleaving between non-comm. threads is irrelevant
[Figure: processors P1 to P5 under a single Global Chkpt versus separate Local Chkpts.]
+ Scalable: Checkpoint and rollback in processor groups
– Complexity: Record inter-thread dependences dynamically.
![Page 4: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/4.jpg)
Contributions
Rebound: First HW-based scheme for scalable, coordinated local checkpointing in coherent shared-memory
• Leverages directory protocol to track inter-thread deps.
• Optimizations to boost checkpointing efficiency:
  • Delaying write-back of data to safe memory at checkpoints
  • Supporting multiple checkpoints
  • Optimizing checkpointing at barrier synchronization
• Avg. performance overhead for 64 procs: 2%
  • Compared to 15% for global checkpointing
![Page 5: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/5.jpg)
Background: In-Memory Checkpt with ReVive
[Prvulovic-02]
[Figure: processors P1 to P3 and their caches during execution and at a checkpoint (CHK). Labels: Register Dump, Dirty Cache Displacement, Writebacks, Dirty Cache lines, Checkpoint, Logging, Log, Memory, Application Stalls.]
![Page 6: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/6.jpg)
Background: In-Memory Checkpt with ReVive
[Prvulovic-02]
[Figure: processors P1 to P3 and their caches after a fault following CHK. Labels: Old Register restored, Cache Invalidated, Memory Lines Reverted, Log, Memory.]
• Global: Broadcast protocol
• Local Coordinated: Scalable protocol
![Page 7: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/7.jpg)
Coordinated Local Checkpointing Rules
[Figure: three P1/P2 timelines in which one processor writes x (wr x) and the other reads it (rd x). Panels show a producer rollback with the matching consumer rollback, and a consumer checkpoint with the matching producer checkpoint.]
• If P checkpoints, P's producers checkpoint
• If P rolls back, P's consumers roll back
• Banatre et al. used Coordinated Local checkpointing for bus-based machines [Banatre96]
![Page 8: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/8.jpg)
Rebound Fault Model
[Figure: Chip Multiprocessor and Main Memory, with a Log (in SW).]
• Any part of the chip can suffer transient or permanent faults.
• A fault can occur even during checkpointing
• Off-chip memory and logs suffer no fault on their own (e.g. NVM)
• Fault detection outside our scope:
  • Fault detection latency has upper-bound of L cycles
![Page 9: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/9.jpg)
Rebound Architecture
[Figure: Chip Multiprocessor (P+L1, L2 Cache, Directory) and Main Memory, with Dep Registers (MyProducer, MyConsumer) and LW-ID.]
![Page 10: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/10.jpg)
Rebound Architecture
[Figure: Chip Multiprocessor (P+L1, L2 Cache, Directory) and Main Memory, with Dep Registers (MyProducer, MyConsumer) and LW-ID.]
• Dependence (Dep) registers in the L2 cache controller:
  • MyProducers: bitmap of procs. that produced data consumed by the local proc.
  • MyConsumers: bitmap of procs. that consumed data produced by the local proc.
![Page 11: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/11.jpg)
Rebound Architecture
[Figure: Chip Multiprocessor (P+L1, L2 Cache, Directory) and Main Memory, with Dep Registers (MyProducer, MyConsumer) and LW-ID.]
• Dependence (Dep) registers in the L2 cache controller:
  • MyProducers: bitmap of procs. that produced data consumed by the local proc.
  • MyConsumers: bitmap of procs. that consumed data produced by the local proc.
• Processor ID in each directory entry:
  • LW-ID: last writer to the line in the current checkpoint interval.
![Page 12: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/12.jpg)
Recording Inter-Thread Dependences
[Figure: P1 writes. Labels: MyProducers and MyConsumers registers of P1 and P2, line state D in P1, LW-ID, Log, Memory.]
Assume MESI protocol
![Page 13: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/13.jpg)
Recording Inter-Thread Dependences
[Figure: P2 reads. Labels: P1's MyConsumers holds P2, P2's MyProducers holds P1, line state goes from D in P1 to S, LW-ID, Write back, Logging, Log, Memory.]
Assume MESI protocol
![Page 14: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/14.jpg)
Recording Inter-Thread Dependences
[Figure: P1 writes. Labels: P1's MyConsumers holds P2, P2's MyProducers holds P1, line state S and then D in P1, LW-ID P1, Log, Memory.]
Assume MESI protocol
![Page 15: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/15.jpg)
Recording Inter-Thread Dependences
[Figure: P1 checkpoints. Labels: MyProducers and MyConsumers registers of P1 and P2, line states D and S, LW-ID P1, Writebacks, Logging, Log, Memory.]
• Clear Dep registers
• Clear LW-ID
  • LW-ID should remain set till the line is checkpointed
Assume MESI protocol
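The bookkeeping walked through on the last four slides can be sketched in software. This is a minimal Python model under stated assumptions, not the hardware design: the names `DepRegisters`, `ToyDirectory`, `read`, `write`, and `checkpoint` are made up for illustration, processors are numbered from 0 (so bit 0 stands for P1), only read-after-write dependences are tracked, and LW-ID is cleared immediately at a checkpoint even though the slide notes that it should remain set until the line is checkpointed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DepRegisters:
    """Per-processor dependence registers, stored as integer bitmaps."""
    my_producers: int = 0  # bit i set: processor i produced data we consumed
    my_consumers: int = 0  # bit i set: processor i consumed data we produced


class ToyDirectory:
    """Hypothetical directory model that keeps only the LW-ID of each line."""

    def __init__(self, num_procs: int) -> None:
        self.dep: Dict[int, DepRegisters] = {
            p: DepRegisters() for p in range(num_procs)
        }
        self.lw_id: Dict[int, int] = {}  # line address -> last writer

    def write(self, proc: int, addr: int) -> None:
        # The writer becomes the last writer of the line in this interval.
        self.lw_id[addr] = proc

    def read(self, proc: int, addr: int) -> None:
        writer = self.lw_id.get(addr)
        if writer is None or writer == proc:
            return  # no inter-thread dependence to record
        # The reader consumed data produced by the last writer.
        self.dep[proc].my_producers |= 1 << writer
        self.dep[writer].my_consumers |= 1 << proc

    def checkpoint(self, proc: int) -> None:
        # Simplification: clear the Dep registers and LW-IDs in one step.
        self.dep[proc] = DepRegisters()
        self.lw_id = {a: w for a, w in self.lw_id.items() if w != proc}


directory = ToyDirectory(num_procs=2)
directory.write(0, 0x40)  # "P1 writes"
directory.read(1, 0x40)   # "P2 reads"
directory.write(0, 0x40)  # "P1 writes" again
```

After these three accesses, processor 1 has bit 0 set in `my_producers` and processor 0 has bit 1 set in `my_consumers`; calling `directory.checkpoint(0)` then clears processor 0's registers and its LW-ID entries.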
![Page 16: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/16.jpg)
Distributed Checkpointing Protocol in SW
• Interaction Set [Pi]: set of producer processors (transitively) for Pi
  – Built using MyProducers
[Figure: processors P1 to P4; P1 initiates a checkpoint (chk). InteractionSet: P1]
![Page 17: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/17.jpg)
Distributed Checkpointing Protocol in SW
• Interaction Set [Pi]: set of producer processors (transitively) for Pi
  – Built using MyProducers
[Figure: processors P1 to P4; P1 initiates a checkpoint (chk) and "Ck?" messages go to P2 and P3. InteractionSet: P1, P2, P3]
![Page 18: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/18.jpg)
Distributed Checkpointing Protocol in SW
• Interaction Set [Pi]: set of producer processors (transitively) for Pi
  – Built using MyProducers
[Figure: processors P1 to P4; P1 initiates a checkpoint (chk), "Ck?" messages go to P2 and P3, and a "Ck?" message also reaches P4. InteractionSet: P1, P2, P3]
![Page 19: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/19.jpg)
Distributed Checkpointing Protocol in SW
• Interaction Set [Pi]: set of producer processors (transitively) for Pi
  – Built using MyProducers
[Figure: processors P1 to P4; P1 initiates a checkpoint (chk), "Ck?" messages go to P2 and P3, and a "Ck?" message also reaches P4. InteractionSet: P1, P2, P3]
![Page 20: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/20.jpg)
Distributed Checkpointing Protocol in SW
• Interaction Set [Pi]: set of producer processors (transitively) for Pi
  – Built using MyProducers
[Figure: processors P1 to P4; P1 initiates a checkpoint (chk), "Ck?" messages go to P2 and P3, and a "Ck?" message also reaches P4. InteractionSet: P1, P2, P3]
• Rollback handled similarly using MyConsumers
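Building an interaction set amounts to a transitive closure over the dependence registers. The sketch below is a centralized Python illustration of that idea, assuming the MyProducers and MyConsumers contents are available as plain sets; the real protocol is distributed and message-based, and the function name `interaction_set` and the example dependence maps are hypothetical.

```python
from typing import Dict, List, Set


def interaction_set(initiator: int, deps: Dict[int, Set[int]]) -> Set[int]:
    """Return the initiator plus every processor reachable through deps.

    Pass MyProducers contents to get the checkpoint set, or MyConsumers
    contents to get the rollback set.
    """
    members: Set[int] = {initiator}
    pending: List[int] = [initiator]
    while pending:
        proc = pending.pop()
        # Each edge followed here stands in for one "Ck?" style request.
        for other in deps.get(proc, set()):
            if other not in members:
                members.add(other)
                pending.append(other)
    return members


# Assumed example: P1 consumed data produced by P2 and P3; P4 is independent.
my_producers: Dict[int, Set[int]] = {1: {2, 3}, 2: set(), 3: set(), 4: set()}

# MyConsumers is the same information with the edges reversed.
my_consumers: Dict[int, Set[int]] = {p: set() for p in my_producers}
for consumer, producers in my_producers.items():
    for producer in producers:
        my_consumers[producer].add(consumer)

checkpoint_group = interaction_set(1, my_producers)  # P1 initiates a checkpoint
rollback_group = interaction_set(3, my_consumers)    # P3 detects a fault
```

With this assumed dependence map, a checkpoint initiated by P1 involves P1, P2, and P3, while a rollback of P3 drags in only its consumer P1. P4 is left alone in both cases, which is the point of checkpointing in groups.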
![Page 21: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/21.jpg)
Optimization1 : Delayed Writebacks
[Figure: two timelines (Time axis) going from Interval I1 through Checkpoint to Interval I2. Labels: sync, WB dirty lines, Stall.]
• Checkpointing overhead dominated by data writebacks
• Delayed Writeback optimization
  • Processors synchronize and resume execution
  • Hardware automatically writes back dirty lines in background
  • Checkpoint only completed when all delayed data written back
  • Still need to record inter-thread dependences on delayed data
![Page 22: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/22.jpg)
Delayed Writeback Pros/Cons
+ Significant reduction in checkpoint overhead
- Additional support:
  Each processor has two sets of Dep. registers
  Each cache line has a delayed bit
- Increased vulnerability
  A rollback event forces both intervals to roll back
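The delayed bit can be pictured with a small software model. The following Python sketch is an illustration under stated assumptions rather than the hardware mechanism: the class `DelayedWritebackCache` and its methods are hypothetical, a single cache is modeled, and it assumes that a write to a line whose delayed bit is still set first pushes the checkpointed value to memory.

```python
from typing import Dict, Set


class DelayedWritebackCache:
    """Toy single-cache model of the delayed bit (names are hypothetical)."""

    def __init__(self) -> None:
        self.cache: Dict[int, int] = {}   # line address -> current value
        self.dirty: Set[int] = set()      # lines written in the current interval
        self.delayed: Set[int] = set()    # lines with the delayed bit set
        self.memory: Dict[int, int] = {}  # safe memory

    def write(self, addr: int, value: int) -> None:
        if addr in self.delayed:
            # Assumption: the checkpointed value must reach memory
            # before the line is overwritten in the new interval.
            self._write_back(addr)
        self.cache[addr] = value
        self.dirty.add(addr)

    def begin_checkpoint(self) -> None:
        """Mark dirty lines as delayed; execution can resume right away."""
        self.delayed |= self.dirty
        self.dirty.clear()

    def background_step(self) -> None:
        """Write back one delayed line, standing in for the background hardware."""
        if self.delayed:
            self._write_back(min(self.delayed))

    def checkpoint_complete(self) -> bool:
        """The checkpoint completes only once no delayed line remains."""
        return not self.delayed

    def _write_back(self, addr: int) -> None:
        self.memory[addr] = self.cache[addr]
        self.delayed.discard(addr)


c = DelayedWritebackCache()
c.write(0x10, 1)
c.write(0x20, 2)
c.begin_checkpoint()   # both lines now carry the delayed bit
c.write(0x10, 9)       # forces the old value of 0x10 out to memory first
c.background_step()    # drains the remaining delayed line
```

In this run the checkpoint is incomplete right after `begin_checkpoint()` and becomes complete only after the last delayed line is drained, while the new value written to `0x10` stays in the cache as dirty data of the next interval.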
![Page 23: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/23.jpg)
Optimization2 : Multiple Checkpoints
• Problem: Fault detection is not instantaneous
  – Checkpoint is safe only after max fault-detection latency (L)
[Figure: timeline with Ckpt 1 (Dep registers 1) and Ckpt 2 (Dep registers 2), a fault at time tf, the Fault Detection Latency, and a Rollback.]
• Solution: Keep multiple checkpoints
  – On fault, roll back interacting processors to safe checkpoints
• No Domino Effect
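The safety condition can be written as a short selection rule. If a fault is detected at cycle T and detection latency is bounded by L, the fault occurred no earlier than T - L, so any checkpoint taken before that cycle predates the fault. The Python sketch below illustrates this; the function name `safe_checkpoint` and the cycle numbers are assumptions, and treating a checkpoint taken exactly at T - L as unsafe is a conservative choice made here, not something the slides specify.

```python
from typing import List, Optional


def safe_checkpoint(
    checkpoint_times: List[int], detect_time: int, max_latency: int
) -> Optional[int]:
    """Return the most recent checkpoint time that is certainly fault-free.

    Returns None when no retained checkpoint is old enough to be safe.
    """
    earliest_fault = detect_time - max_latency
    # Conservative assumption: a checkpoint taken exactly at the
    # earliest possible fault cycle is not treated as safe.
    safe = [t for t in checkpoint_times if t < earliest_fault]
    return max(safe) if safe else None


# Assumed example: checkpoints at cycles 100, 200, 300; L = 50 cycles.
target = safe_checkpoint([100, 200, 300], detect_time=320, max_latency=50)
```

Here the checkpoint at cycle 300 may already contain corrupted state, so the rollback target is the one at cycle 200. This is also why a single retained checkpoint is not enough when detection is not instantaneous.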
![Page 24: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/24.jpg)
Multiple Checkpoints: Pros/Cons
+ Realistic system: supports non-instantaneous fault detection
- Additional support:
  Each checkpoint has Dep registers
  Dep registers can be recycled only after fault detection latency
- Need to track communication across checkpoints
- Combination with Delayed Writebacks: one more Dep register set
![Page 25: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/25.jpg)
Optimization3 : Hiding Chkpt behind Global Barrier
• Global barriers require that all processors communicate
  – Leads to global checkpoints
• Optimization:
  – Proactively trigger a global checkpoint at a global barrier
  – Hide checkpoint overhead behind barrier imbalance spins
![Page 26: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/26.jpg)
Evaluation Setup
• Analysis tool using Pin + SESC cycle-acc. simulator + DRAMsim
• Applications: SPLASH-2, some PARSEC, Apache
• Simulated CMP architecture with up to 64 threads
• Checkpoint interval: 5 – 8 ms
• Modeled several environments:
  • Global: baseline global checkpointing
  • Rebound: Local checkpointing scheme with delayed writeback.
  • Rebound_NoDWB: Rebound without the delayed writebacks.
![Page 27: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/27.jpg)
Avg. Interaction Set: Set of Producer Processors
[Figure: chart of average interaction set size; the values 64 and 38 are called out.]
• Most apps: interaction set is a small set
  – Justifies coordinated local checkpointing
  – Averages brought up by global barriers
![Page 28: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/28.jpg)
Checkpoint Execution Overhead
[Figure: bar chart of % Checkpoint Overhead (axis 0 to 40) for Global, Rebound_NoDWB, and Rebound across Barnes, Cholesky, Fft, Fmm, Radix, Lu-C, Lu-NC, Volrend, Water-Sp, Water-Nsq, Radiosity, Ocean, Raytrace, and SP2-AVG; the values 2 and 15 are called out.]
• Rebound's avg checkpoint execution overhead is 2%
  – Compared to 15% for Global
![Page 29: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/29.jpg)
Checkpoint Execution Overhead
[Figure: bar chart of % Checkpoint Overhead (axis 0 to 40) for Global, Rebound_NoDWB, and Rebound across Barnes, Cholesky, Fft, Fmm, Radix, Lu-C, Lu-NC, Volrend, Water-Sp, Water-Nsq, Radiosity, Ocean, Raytrace, and SP2-AVG.]
• Rebound's avg checkpoint execution overhead is 2%
  – Compared to 15% for Global
• Delayed Writebacks complement local checkpointing
![Page 30: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/30.jpg)
Rebound Scalability
[Figure: scalability chart; constant problem size.]
• Rebound is scalable in checkpoint overhead
• Delayed Writebacks help scalability
![Page 31: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/31.jpg)
Also in the Paper
• Delayed write backs also useful in Global
• Barrier optimization is effective but not universally applicable
• Power increase due to hardware additions < 2%
• Rebound leads to only 4% increase in coherence traffic
![Page 32: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/32.jpg)
Conclusions
Rebound: First HW-based scheme for scalable, coordinated local checkpointing in coherent shared-memory
• Leverages directory protocol
• Boosts checkpointing efficiency:
  • Delayed write-backs
  • Multiple checkpoints
  • Barrier optimization
• Avg. execution overhead for 64 procs: 2%
• Future work:
  • Apply Rebound to non-hardware coherent machines
  • Scalability to hierarchical directories
![Page 33: Rebound: Scalable Checkpointing for Coherent Shared ...iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca11_2.pdf · • Processors synchronize and resume execution • Hardware automatically](https://reader033.vdocuments.us/reader033/viewer/2022053000/5f04e7b17e708231d4104adf/html5/thumbnails/33.jpg)
Rebound: Scalable Checkpointing for Coherent Shared Memory
Rishi Agarwal, Pranav Garg, and Josep Torrellas
Department of Computer Science
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu