Fault Tolerant Techniques on CGRA
Jihoon Kang
http://dclab.yonsei.ac.kr 2016-04-02
Dependable Computing Lab Department of Computer Science
Yonsei University
Committee: Kyoungwoo Lee, Bernd Burgstaller, Yosub Han
Preliminary Oral Examination for Master's Degree
April 22nd, 2013
Contents
Motivation
Related works
Problem definition
Thesis proposal
– Selective validation for efficient protection
Experiments
Conclusions
1. Motivation
Reliability is important
Computer systems can fail to work properly because of soft errors
Soft errors can cause significant financial loss
– The Japanese and London stock markets were halted
– The financial loss was quite large
Reliability is directly connected to our lives
– Unintended acceleration
– Incorrect operation of implanted devices
– Robot malfunctions due to heavy radiation (Fukushima nuclear accident)
Reliability is an important concern
Soft error – an increasing concern
Soft error: a temporary bit flip in a semiconductor device
[Figure: transistor (source, drain) with generated charges; the stored bit temporarily flips between 0 and 1]
$\mathrm{SER} \propto N_{flux} \times CS \times \exp\left(-\frac{Q_{critical}}{Q_s}\right)$, where $Q_{critical} = \text{Capacitance} \times \text{Voltage}$
Soft error rate
– Is now 1 per year
– Exponentially increases with technology scaling
Soft errors are becoming a critical design concern
D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN
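As a worked illustration of why the soft error rate grows exponentially with scaling, the SER relation on this slide can be compared at two values of the critical charge while the flux and the cross section CS are held fixed. The specific values below are illustrative assumptions, not figures from the slides.

```latex
\frac{\mathrm{SER}_{new}}{\mathrm{SER}_{old}}
  = \frac{\exp\left(-Q_{critical}^{new}/Q_s\right)}{\exp\left(-Q_{critical}^{old}/Q_s\right)}
  = \exp\left(\frac{Q_{critical}^{old}-Q_{critical}^{new}}{Q_s}\right)

% Illustrative assumption: scaling lowers capacitance and voltage so that
% Q_{critical}^{old} = 2 Q_s and Q_{critical}^{new} = Q_s.
\frac{\mathrm{SER}_{new}}{\mathrm{SER}_{old}} = \exp(1) \approx 2.72
```

Under these assumed values, halving the critical charge raises the soft error rate by a factor of about 2.7.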
CGRA Architecture
Emerging architecture
– High performance, flexibility, low power
[Figure: CGRA with a PE array, configuration memory, local memory, and main memory; each PE has an FU, an RF, and an output register, with connections from and to neighbor PEs]
Advantages of CGRA
Specific kernels in a thread can be power/performance critical
The kernel can be mapped and scheduled for execution on the CGRA
Using the CGRA as a co-processor (accelerator)
– Power-consuming processor execution is saved
– Better performance of the thread is realized
– Overall throughput is increased
[Figure: a program thread on the processor, with the kernel to accelerate running on the co-processor]
CGRAs are becoming more and more popular thanks to high performance, flexibility, and low power
A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP
Popular CGRA usage for reliability concern
CGRA can be used widely close to humans
– Reliability in CGRA is becoming an important issue
[Figure: example systems: vehicle engine control unit, airplane control unit, data server system, medical device]
Soft error occurrence in various embedded systems can cause catastrophic results
https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;
2. Related works
Previously proposed techniques for CGRAs
[Alnajiar, 2009]
– Key idea: dynamic operation modes in the CGRA architecture provide various levels of reliability under a performance constraint
– Experiment setup: compares the number of gates (using an RTL tool)
– Experiment result: area: 26.6% increase
– Comment: flexible protection mechanism
– Drawback: area overhead (voter implementation); performance degradation (resources cannot be fully used)
[Jafri, 2010]
– Key idea: self-checking residue mode for multiplication and addition operations on the DART architecture
– Experiment setup: compares the original FU, a DMR FU, and a self-checking FU (using STMicroelectronics 130 nm)
– Experiment result: area: 18% decrease compared to DMR; performance: 400% decrease compared to DMR
– Comment: less area overhead compared to DMR
– Drawback: area overhead (self-checking implementation); performance degradation; detection only; does not cover specific faults
[Schweizer, 2011]
– Key idea: exploits unused FUs for replication to increase reliability with minimal hardware overhead; the FEHM (Flexible Error Handling Module) supports DMR and TMR schemes on specific target architectures
– Experiment setup: compares TMR with a PE cluster that includes the FEHM
– Experiment result: area: 12.8% decrease compared to TMR; power: 1.6~18.6% decrease compared to TMR
– Comment: supports DMR and TMR schemes with a modified FEHM; considers power
– Drawback: area overhead (FEHM implementation); cannot be applied to data-intensive applications
[Schweizer, 2012]
– Key idea: to resolve the limitation of [Schweizer, 2011], multiple contexts are mapped onto the CGRA using the concept of temporal redundancy
– Experiment setup: maps an FFT application to the CGRA; estimates the required area as the context memory grows; estimates the time between read and store
– Experiment result: area: 31% decrease compared to TMR; performance: NR/TMR 26%/12% decrease
– Comment: applicable to permanent, transient, and timing faults; applicable to any application
– Drawback: performance degradation compared to TMR (more contexts); area overhead still exists
[K. Singh, 2006]
– Key idea: selectively applies a combined scheme to code that can cause a soft error
– Experiment setup: reliability is the percentage of time that fault-tolerant matrix multiplication can run on the RAW architecture without a system reset
– Experiment result: reliability: 89.2% (108 out of 1000 reset)
– Comment: no area overhead (software-based technique)
– Drawback: limited to the RAW architecture; performance overhead; TMR voting is not considered
[Lee, 2010]
– Key idea: replication and the voter are implemented on PEs to reduce area overhead; conditional execution and a column-wide bus are added to shorten the critical path; a thermal-aware optimal mapping on PEs is proposed
– Experiment setup: compares the base architecture with the proposed architecture at the RTL level; the ACS (time-consuming operations) module of a Viterbi decoder is mapped onto the CGRA
– Experiment result: area: 12% increase compared to base; performance: decrease of 700% (TMR) / 167% (DMR)
– Comment: software technique; area efficiency (minimal hardware overhead)
– Drawback: performance overhead (replication and voter are mapped onto PEs)
Soft error protection: hardware protection, software protection
HW based techniques - area overhead
DMR (Dual Modular Redundancy), TMR (Triple Modular Redundancy)
[Figure: TMR block diagram: Input, Module 1, Module 2, Module 3, Voter, Output]
[Figure: PE array (PE0 to PE15) with used and unused PEs; TMR on PE4, PE8, PE12 with the FEHM; DMR on PE0, PE4 with the EDM]
[Figure: FEHM PE-cluster: PE west, PE mid, PE east, and the FEHM]
[Figure: FU configurations in a cluster: No Protection, DMR with the EDM, TMR with the FEHM]
[Figure: Relative Area [Base: Std PE] (%) for East PE, Central PE, West PE, and Cluster; series: Standard PE, DoubleFU PE, TripleFU PE, FEHM PE-Cluster]
22% area overhead
T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.
SW based techniques - performance overhead
TMR (Triple Modular Redundancy): TMR voter implementation
[Figure: Input, Module 1, Module 2, Module 3, Voter, Output]
DMR (Dual Modular Redundancy): DMR comparator implementation
[Figure: Input, Module 1, Module 2, Comparator, Output]
[Figure: Performance Overhead of TMR/DMR; x-axis: Techniques (Base, DMR, TMR); y-axis: Normalized Runtime to Base; annotations: DMR 167%, TMR 700%]
…
for (i = 0; i < iteration; i++) {
    a[i] = (b[i] - X) / Y; /* X and Y are constants */
}
…
DFG is generated for loop kernel
[Figure: DFGs of the loop kernel; node legend: Store, Load, Subtraction, Compare, Logical AND, Addition, Voter]
G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.
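The TMR voter and DMR comparator named on these slides can be sketched in C as follows. This is a minimal illustration under assumptions: the function names `tmr_vote` and `dmr_mismatch` are hypothetical, and the bitwise majority shown here is a common textbook formulation, not necessarily the compare/AND mapping that the cited work places on the PEs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitwise 2-of-3 majority (hypothetical helper): each result bit takes
 * the value that at least two of the three copies agree on, so a fault
 * in any single copy is masked. */
static uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (b & c) | (a & c);
}

/* DMR comparison (hypothetical helper): two copies can only reveal that
 * a mismatch happened; they cannot tell which copy is the correct one. */
static bool dmr_mismatch(uint32_t a, uint32_t b)
{
    return a != b;
}
```

When such a voter is built from ordinary operations and mapped onto PEs, every voted value costs several extra operations, which is consistent with the runtime overhead the slides report for software TMR and DMR.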
3. Problem definition
Significantly expensive voting mechanism
[Figure: Triplication 38%, Voting 62%]
4. Thesis proposal: Selective validation for efficient protection
Selective validation : reduce expensive voting
…
for (i = 0; i < iteration; i++) {
    a[i] = (b[i] - X) / Y; /* X and Y are constants */
}
…
Suppose that the memory of the CGRA is protected against errors
– Traditional protection methodologies for memory are inexpensive compared to protecting execution cores
Do not replicate memory operations
A store is a synchronization point where corrupt data can propagate and cause failure
– Erroneous store operations can eventually cause incorrect outputs
The program will be executed correctly if corrupted data is not stored in the main memory
TMR with full voting
– Voting at every operation (except memory operations)
TMR with selective voting
– Voting at the operation just before a store
– Memory is protected by ECC or parity
– Corrupt data in memory leads to failure
[Figure: DFGs of the loop kernel for Base, TMR with full voting, and TMR with selective voting]
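The difference between full voting and selective voting can be sketched in C on the loop kernel. This is a software model only, under assumptions: the constants `X` and `Y` are given hypothetical values, the kernel is taken in the division form shown on the earlier slides, the helper `vote3` is hypothetical, and the three copies are written out by hand (a real compiler would fold them, whereas on the CGRA they occupy separate PEs).

```c
#include <stddef.h>
#include <stdint.h>

#define X 3 /* hypothetical constant */
#define Y 2 /* hypothetical constant */

/* Value-level majority of three copies (hypothetical helper). If no two
 * copies agree, there is no majority and b is returned arbitrarily. */
static int32_t vote3(int32_t a, int32_t b, int32_t c)
{
    return (a == b || a == c) ? a : b;
}

/* TMR with full voting: vote after every replicated operation.
 * Returns the number of votes performed. */
static size_t kernel_full_voting(int32_t *a, const int32_t *b, size_t n)
{
    size_t votes = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s0 = b[i] - X, s1 = b[i] - X, s2 = b[i] - X;
        int32_t s = vote3(s0, s1, s2);
        votes++;
        int32_t d0 = s / Y, d1 = s / Y, d2 = s / Y;
        a[i] = vote3(d0, d1, d2); /* voted value reaches the store */
        votes++;
    }
    return votes;
}

/* TMR with selective voting: the three copies run unchecked and are
 * voted on once, just before the store. */
static size_t kernel_selective_voting(int32_t *a, const int32_t *b, size_t n)
{
    size_t votes = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t d0 = (b[i] - X) / Y;
        int32_t d1 = (b[i] - X) / Y;
        int32_t d2 = (b[i] - X) / Y;
        a[i] = vote3(d0, d1, d2); /* the only vote per iteration */
        votes++;
    }
    return votes;
}
```

In this model both variants write the same values to `a`, but the selective variant performs one vote per iteration instead of two; longer operation chains before a store would widen that gap.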
Ex) Reduce expensive voting
[Figure: cost annotations on the example DFGs. Base: 100. TMR with full voting: 300, 700, 300, 700 (Total: 2000). TMR with selective voting: 300, 0, 300, 700 (Total: 1300)]
35% improvement
TMR with selective voting is more efficient than TMR with full voting in terms of performance
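The 35% figure follows from the two totals in the example. A minimal sketch of the arithmetic, assuming the per-stage costs are the annotated values 300, 700, 300, 700 for full voting and 300, 0, 300, 700 for selective voting (the helper names and the grouping into stages are assumptions made for illustration):

```c
#include <stddef.h>

/* Sum the per-stage costs of one scheme (hypothetical helper). */
static unsigned total_cost(const unsigned *costs, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += costs[i];
    }
    return total;
}

/* Relative improvement in percent: (before - after) / before * 100,
 * computed in integer arithmetic. */
static unsigned improvement_percent(unsigned before, unsigned after)
{
    return (before - after) * 100u / before;
}
```

With the full-voting costs summing to 2000 and the selective-voting costs summing to 1300, `improvement_percent(2000, 1300)` gives 35, matching the slide.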
Optimization by reducing # of store operations
Our optimization to reduce # of votings
– Reduce the # of store operations by merging store operations with add and shift operations
…
for (i = 0; i < iteration; i++) {
    a[i] = (b[i] - X) / Y; /* X and Y are constants */
}
…
[Figure: merging a[0] = 0x12 and a[1] = 0x34 into one 2-byte store]
– shift by sizeof(a[i]): 0x12 becomes 0x1200
– add a[i+1]: 0x1200 + 0x34
– store a[i], a[i+1]: 0x1234
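The shift-and-add merge from the store-reduction slides can be sketched in C. This is a minimal illustration assuming 1-byte elements merged into a 2-byte value, as in the 0x12 / 0x34 example; the function name `merge_pair` is hypothetical, and the byte order of the final 2-byte store depends on the target, which the slides do not specify.

```c
#include <stdint.h>

/* Merge two adjacent 1-byte results into one 2-byte value
 * (hypothetical helper): shift the first element left by the element
 * width in bits, then add the second element. */
static uint16_t merge_pair(uint8_t first, uint8_t second)
{
    /* shift by sizeof(a[i]) bytes: 0x12 -> 0x1200 */
    uint16_t shifted = (uint16_t)((uint16_t)first << (8 * sizeof first));
    /* add a[i+1]: 0x1200 + 0x34 -> 0x1234 */
    return (uint16_t)(shifted + second);
}
```

For example, `merge_pair(0x12, 0x34)` yields `0x1234`. Since selective validation votes once per store, writing the merged value with a single store would halve the number of votes for this pair, at the price of the extra shift and add.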
5. Experiments
Experimental setup
[Figure: tool flow with Benchmarks, DFG Generator, Compiler / Modulo Scheduler, and CGRA Simulator; inputs: CGRA Configuration, initial II; outputs: minimum II, Runtime]
Compile and simulate each benchmark kernel with input parameters
– Input parameters: DFG information, CGRA configuration, initial II
– DFG information: generated by the DFG Generator
– CGRA configuration: 4 X 4 CGRA, local register
– II: Initiation Interval
Benchmarks: SPEC 2000, OpenCV
Scheduler: modulo scheduling (the cost-based scheduling algorithm)
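The slides do not say how the initial II is chosen. A common starting point in modulo scheduling is the resource-constrained lower bound, sketched below as an assumption rather than as the setup actually used here; the function name `resource_min_ii` is hypothetical, and recurrence constraints are ignored.

```c
/* Resource-constrained lower bound on the initiation interval
 * (hypothetical helper): with num_pes PEs, each able to issue one
 * operation per cycle, a loop body of num_ops operations needs at least
 * ceil(num_ops / num_pes) cycles per iteration. */
static unsigned resource_min_ii(unsigned num_ops, unsigned num_pes)
{
    return (num_ops + num_pes - 1u) / num_pes;
}
```

On a 4 X 4 CGRA (16 PEs), a kernel of 16 operations has a bound of 1, while triplicating it to 48 operations raises the bound to 3 before any voting operations are added.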
Performance improvement w/ selective voting
[Figure: Runtime Evaluation of Selective Validation for TMR; x-axis: Benchmarks; y-axis: Normalized Runtime to the Base; series: TMR (Full Voting), Our Technique (Selective Voting); annotations: 59.6%, 40.9%]
Our selective validation for TMR can improve the performance
[Figure: Runtime Evaluation of Selective Validation for DMR; x-axis: Benchmarks; y-axis: Normalized Runtime to the Base; series: DMR (Full Comparison), Our Technique (Selective Comparison); annotations: 20.0%, 9.5%]
Performance optimization w/ store reduction
[Figure: Performance Improvement of Optimization for TMR; x-axis: Benchmarks; y-axis: Normalized Runtime to Base; series: TMR (Full Voting), Our Technique (Selective Voting), Our Optimization; annotations: 59.9%, 45.2%]
Our optimization with store reduction can further improve the performance as compared to our selective validation
[Figure: Performance Improvement of Optimization for DMR; x-axis: Benchmarks; y-axis: Normalized Runtime to the Base; series: DMR (Full Comparison), Our Technique (Selective Comparison), Our Optimization; annotations: 25.2%, 16.9%]
Energy consumption w/ optimization
[Figure: Energy Consumption of Optimization for TMR; x-axis: Benchmarks; y-axis: Normalized Energy Consumption to the Base; series: TMR (Full Voting), Our Technique (Selective Voting), Our Optimization; annotations: 33.7%, 26.1%]
Conclusions
Reliability is a critical design concern
CGRAs are becoming more popular
Previously proposed techniques have limitations in terms of area and performance
Software-based selective validation techniques
– Validation at the operation just before a store
Performance improvement
– TMR/DMR with selective validation: 40.9%/9.5%
– TMR/DMR with selective validation + optimization: 45.2%/16.9%
※ Previously proposed DMR/TMR techniques: 167%/700%
Energy consumption savings
– TMR/DMR with selective validation + optimization: 26.1%/13.8%
Future work
Continue energy consumption estimation
Design space exploration
– Tradeoff among reliability, energy consumption and performance
Further optimization
– CGRA scheduling algorithm
Q & A
Publication
Paper accepted
– Selective validations for efficient protections on coarse-grained reconfigurable architectures
– Application-specific Systems, Architectures and Processors (ASAP)
– The 24th IEEE International Conference, June 5-7, 2013
Plan to submit an extended version to a journal (TECS)
– Energy consumption estimation
– Fault coverage
Plan to submit a paper to the KCC conference
Reference
Baumann et al. “Determining the Impact of Alpha-Particle-Emitting Contamination From the Fukushima-Daiichi Disaster on Japanese Semiconductor Manufacturing Sites,” in RADECS, 2012.
Toshinori et al. “Radiation Effect Mitigation Methods for Electronic Systems,” in IEEE SII, 2012.
Jafri et al. “Design of a fault-tolerant coarse-grained reconfigurable architecture: a case study,” in ISQED, 2010.
Alnajiar et al. “Coarse-grained dynamically reconfigurable architecture with flexible reliability,” in FPL, 2009.
Schweizer et al. “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.
Schweizer et al. “Low-cost TMR for fault-tolerance on coarse-grained reconfigurable architectures,” in ReConFig, 2011.
Reis et al. “SWIFT: Software implemented fault tolerance,” in CGO, 2005.
K. Lee et al. “Mitigating the impact of hardware defects on multimedia applications: a cross-layer approach,” in ACM Multimedia, 2008.
G. Lee et al. “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.
G. Bradski, “The OpenCV library,” Doctor Dobbs Journal, 2000.
J. L. Henning, “SPEC CPU2000: Measuring CPU performance in the new millennium,” Computer, 2000.
http://technology-and-science.lawyers.com/blogs/archives/4461-Are-Cosmic-Rays-a-Factor-in-Toyota-Acceleration-Problems.html
http://www.chipdesignmag.com/payne/2010/04/14/alpha-particles-dram-seu-toyota-acceleration/