fault tolerant techniques on cgra - yonsei

98
Fault Tolerant Techniques on CGRA Jihoon Kang http://dclab.yonsei.ac.kr 2016-04-02 Dependable Computing Lab Department of Computer Science Yonsei University Committee Kyoungwoo Lee Bernd Burgstaller Yosub Han Preliminary Oral Examination for Master Degree April 22 nd , 2013

Upload: others

Post on 15-Oct-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fault Tolerant Techniques on CGRA - Yonsei

Fault Tolerant Techniques on CGRA

Jihoon Kang

http://dclab.yonsei.ac.kr 2016-04-02

Dependable Computing Lab Department of Computer Science

Yonsei University

Committee Kyoungwoo Lee

Bernd Burgstaller Yosub Han

Preliminary Oral Examination for Master Degree

April 22nd, 2013

Page 2: Fault Tolerant Techniques on CGRA - Yonsei

Contents

Motivation Related works Problem definition Thesis proposal

– Selective validation for efficient protection

Experiments Conclusions

2016-04-02 2

Page 3: Fault Tolerant Techniques on CGRA - Yonsei

2016-04-02

1. Motivation

Page 4: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 4

Motivation

Computer system doesn’t work properly because of soft error

Soft errors can cause significant financial loss

Page 5: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 5

Motivation

Soft errors can cause significant financial loss

Japanese and London stock market were halts

Page 6: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 6

Motivation

Soft errors can cause significant financial loss

Japanese and London stock market were halts

financial loss is quite large

Page 7: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 7

Motivation

- Reliability is directly connected our life - Vulnerable reliability

Unintended acceleration

Reliability is related to our life

Page 8: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 8

Motivation

Unintended acceleration

Incorrect operation of implanted devices

Reliability is related to our life

Page 9: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 9

Motivation

Robot malfunctions

Fukushima nuclear accident

Robot malfunctions due to lots of radiation

Page 10: Fault Tolerant Techniques on CGRA - Yonsei

Reliability is important

2016-04-02 10

Motivation

- Soft error damaged financial losing - Reliability is directly connected our life

- Reliability is directly connected our life - Vulnerable reliability

Reliability is an important concern

Page 11: Fault Tolerant Techniques on CGRA - Yonsei

Soft error – an increasing concern Motivation

2016-04-02 11

Transistor

source drain

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Temporary bit flip in a semiconductor device

0

Page 12: Fault Tolerant Techniques on CGRA - Yonsei

Soft error – an increasing concern Motivation

2016-04-02 12

Transistor

source drain

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Temporary bit flip in a semiconductor device

0

Page 13: Fault Tolerant Techniques on CGRA - Yonsei

Soft error – an increasing concern Motivation

2016-04-02 13

+ - +

+ + - - -

Transistor

source drain

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Temporary bit flip in a semiconductor device

0

Page 14: Fault Tolerant Techniques on CGRA - Yonsei

Soft error – an increasing concern Motivation

2016-04-02 14

+ - +

+ + - - -

Transistor

source drain

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Temporary bit flip in a semiconductor device

1

Page 15: Fault Tolerant Techniques on CGRA - Yonsei

Soft error – an increasing concern Motivation

2016-04-02 15

+ - +

+ + - - -

Transistor

source drain

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Temporary bit flip in a semiconductor device

0

Page 16: Fault Tolerant Techniques on CGRA - Yonsei

SER ∝ Nflux CS x exp Qcritical {- x

Qs }

Qcritical = Capacitance Voltage x where

Soft error – an increasing concern Motivation

2016-04-02 16

+ - +

+ + - - -

Transistor

source

1 0

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Soft error rate – Is now 1 per year – Exponentially increases

with technology scaling

Page 17: Fault Tolerant Techniques on CGRA - Yonsei

SER ∝ Nflux CS x exp Qcritical {- x

Qs }

Qcritical = Capacitance Voltage x where

Soft error – an increasing concern Motivation

2016-04-02 17

+ - +

+ + - - -

Transistor

source

1 0

D. Shivakumar et al. “Modeling the effect of technology trends on the soft error rate of combinational logic” 2002 DSN

Soft errors are becoming a critical design concern

Soft error rate – Is now 1 per year – Exponentially increases

with technology scaling

Page 18: Fault Tolerant Techniques on CGRA - Yonsei

CGRA Architecture Motivation

2016-04-02 18

Emerging architecture – High performance, flexibility, low power

PE PE

PE PE

PE PE

PE PE

PE PE

PE PE

PE PE

PE PE

Configuration memory

PE array

Local mem

ory

Main m

emory

Page 19: Fault Tolerant Techniques on CGRA - Yonsei

CGRA Architecture Motivation

2016-04-02 19

Emerging architecture – High performance, flexibility, low power

PE PE

PE PE

PE PE

PE PE

PE PE

PE PE

PE PE

PE PE

Configuration memory

PE array

Local mem

ory

Main m

emory

FU RF

Output register

From neighbor

To neighbor

Page 20: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 20

Specific kernels in a thread can be power/performance critical

Processor

Program thread

Kernel to accelerate

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

Page 21: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 21

Processor

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

Specific kernels in a thread can be power/performance critical

Page 22: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 22

Processor

co-processor

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

Specific kernels in a thread can be power/performance critical

Page 23: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 23

Specific kernels in a thread can be power/performance critical The kernel can be mapped and scheduled for execution on the CGRA Using the CGRA as a co-processor (accelerator)

Processor

co-processor

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

Page 24: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 24

Processor

co-processor

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

Specific kernels in a thread can be power/performance critical The kernel can be mapped and scheduled for execution on the CGRA Using the CGRA as a co-processor (accelerator)

Page 25: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 25

Specific kernels in a thread can be power/performance critical The kernel can be mapped and scheduled for execution on the CGRA Using the CGRA as a co-processor (accelerator)

– Power consuming processor execution is saved – Better performance of thread is realized – Overall throughput is increased

Processor

co-processor

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

Page 26: Fault Tolerant Techniques on CGRA - Yonsei

Advantages of CGRA Motivation

2016-04-02 26

Specific kernels in a thread can be power/performance critical The kernel can be mapped and scheduled for execution on the CGRA Using the CGRA as a co-processor (accelerator)

– Power consuming processor execution is saved – Better performance of thread is realized – Overall throughput is increased

Processor

co-processor

A. Shrivastava et al. “Enabling Multithreading on CGRAs” 2011 ICPP

CGRA becomes more and more popular thanks to high performance, flexibility and low power

Page 27: Fault Tolerant Techniques on CGRA - Yonsei

Popular CGRA usage for reliability concern Motivation

2016-04-02 27

https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;

CGRA can be used widely close to humans – Reliability in CGRA is becoming an important issue

Page 28: Fault Tolerant Techniques on CGRA - Yonsei

Popular CGRA usage for reliability concern Motivation

2016-04-02 28

https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;

CGRA can be used widely close to humans – Reliability in CGRA is becoming an important issue

Vehicle engine control unit

Page 29: Fault Tolerant Techniques on CGRA - Yonsei

Popular CGRA usage for reliability concern Motivation

2016-04-02 29

https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;

CGRA can be used widely close to humans – Reliability in CGRA is becoming an important issue

Vehicle engine control unit

Airplane control unit

Page 30: Fault Tolerant Techniques on CGRA - Yonsei

Popular CGRA usage for reliability concern Motivation

2016-04-02 30

https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;

Vehicle engine control unit

Airplane control unit Data server system

CGRA can be used widely close to humans – Reliability in CGRA is becoming an important issue

Page 31: Fault Tolerant Techniques on CGRA - Yonsei

Popular CGRA usage for reliability concern Motivation

2016-04-02 31

https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;

CGRA can be used widely close to humans – Reliability in CGRA is becoming an important issue

Vehicle engine control unit

Airplane control unit Data server system

Medical device

Page 32: Fault Tolerant Techniques on CGRA - Yonsei

Popular CGRA usage for reliability concern Motivation

2016-04-02 32

https://mocana.com/blog/tag/medical-device/; www.sustainabletechnolog.com; www.kia.co.nz; www.woodward.com;

Soft error occurrence in various embedded systems can cause catastrophic results

Page 33: Fault Tolerant Techniques on CGRA - Yonsei

2016-04-02

2. Related works

Page 34: Fault Tolerant Techniques on CGRA - Yonsei

Previously proposed techniques for CGRAs

2016-04-02 34

Related works

Paper Key Idea Experiment drawback Comment

[Alnajiar, 2009]

dynamic operation modes in CGRA architecture to provide the various levels of reliability under the performance constraint.

set up - compare to the number of gate (using tool : RTL) - flexible protection

mechanism

- area-overhead (implement voter) - performance degradation (cant use resource fully) result - area : 26.6% increase

[jafri, 2010] self-checking residue mode for multiplication and addition operations on DART architecture

set up - compare to original FU, DMR FU and self-checking FU (using STMicroelectronics 130 nm) - less area-overhead

compare to DMR

- area-overhead (implement self-checking) - performance degradation - only detection - not cover specific fault result

- area : 18% decrease compare to DMR - performance : 400% decrease compare to DMR

[Schweizer, 2011]

exploiting unused FUs for replications to increase the reliability with the minimal hardware overhead. FEHM (Flexible Error Handling Module) - supports DMR and TMR schemes on specific target architectures.

set up - compare to TMR and Clustering PE include FEHM - supports DMR and

TMR schemes with modified FEHM - considering power

- area-overhead (implement FEHM) - cant apply to data intensive application result

- area : 12.8% decrease compare to TMR - power : 1.6~18.6% decrease compare to TMR

[Schweizer, 2012]

to resolve previous ([Schweizer, 2011]) limitation, multiple contexts to be mapped on CGRA by using the concept of temporal Redundancy

set up

- mapping to CGRA used FFT application - estimate required area as context memory is increased - Estimate time between read and store

- enable to apply permanent, transient, timing fault - enable to apply any application

- performance degradation compare to TMR (increase context) - still exits area-overhead

result - area : 31% decrease compare to TMR - performance : NR/TMR 26%/12% decrease

[K. Singh, 2006] Selective apply combined scheme to code that can cause a soft error.

set up - reliability is the percentage of time that FT matrix multiplication can run on raw architecture without system reset

- no area-overhead (software based technique)

- limited RAW architecture - performance overhead - not consider TMR voting

result - reliability : 89.2%(108 out of 1000 reset)

[Lee, 2010]

replication & voter is implemented on PE to reduce area overhead; to reduce the critical path, add conditional execution & column wide bus; thermal impact optimal mapping on PE is proposed.

set up

- compare base arch with proposed arch based on RTL - ACS(time consuming operations) module in viterbi decoder map CGRA

- software technique - area-efficiency (minimal hardware overhead)

- performance overhead (replication & voter map on PE)

result - area : 12% increase compare to base - performance : decrease TMR(700%)/DMR (167%)

Soft Error : hardware protection software protection

Page 35: Fault Tolerant Techniques on CGRA - Yonsei

HW based techniques - area overhead

2016-04-02 35

Related works

used PE unused PE

PE0 PE1

PE4 PE5

PE2 PE3

PE6 PE7

PE8 PE9

PE12 PE13

PE10 PE11

PE14 PE15

T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.

Page 36: Fault Tolerant Techniques on CGRA - Yonsei

HW based techniques - area overhead

2016-04-02 36

Related works

used PE unused PE

PE0 PE1

PE4 PE5

PE2 PE3

PE6 PE7

PE8 PE9

PE12 PE13

PE10 PE11

PE14 PE15

FEHM

PE4 PE8 PE12

TMR

PE0 PE4

EDM

DMR

T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.

Page 37: Fault Tolerant Techniques on CGRA - Yonsei

HW based techniques - area overhead

2016-04-02 37

Related works

used PE unused PE

PE0 PE1

PE4 PE5

PE2 PE3

PE6 PE7

PE8 PE9

PE12 PE13

PE10 PE11

PE14 PE15

FEHM

PE4 PE8 PE12

TMR

PE0 PE4

EDM

DMR

T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.

Module 1

Module 2

Module 3

Input Voter Output

DMR (Dual Modular Redundancy) TMR (Triple Modular Redundancy)

Page 38: Fault Tolerant Techniques on CGRA - Yonsei

HW based techniques - area overhead

2016-04-02 38

Related works

used PE unused PE

PE0 PE1

PE4 PE5

PE2 PE3

PE6 PE7

PE8 PE9

PE12 PE13

PE10 PE11

PE14 PE15

FEHM

PE4 PE8 PE12

TMR

PE0 PE4

EDM

DMR

FEHM PE-cluster

PE8

PE west

PE8 PE12

FEHM

PE east PE mid

T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.

Module 1

Module 2

Module 3

Input Voter Output

DMR (Dual Modular Redundancy) TMR (Triple Modular Redundancy)

Page 39: Fault Tolerant Techniques on CGRA - Yonsei

100

120

East PE Central PE West PE Cluster

Rela

tive

Are

a [B

ase

: St

d PE

] (%

)

Standard PE

DoubleFU PE

TripleFU PE

FEHM PE-Cluster

HW based techniques - area overhead

2016-04-02 39

Related works

T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.

FEHM

FU FU FU

TMR

FU FU

EDM

DMR

Cluster

FU

PE west

FU FU

FEHM

PE east PE mid

FU

No Protection

Page 40: Fault Tolerant Techniques on CGRA - Yonsei

100

120

East PE Central PE West PE Cluster

Rela

tive

Are

a [B

ase

: St

d PE

] (%

)

Standard PE

DoubleFU PE

TripleFU PE

FEHM PE-Cluster

HW based techniques - area overhead

2016-04-02 40

Related works

T. Schweizer, A. Kuster, S. Eisenhardt, T. Kuhn, and W. Rosenstiel, “Using run-time reconfiguration to implement fault-tolerant coarse grained reconfigurable architectures,” in IPDPSW, 2012.

22% area overhead

FEHM

FU FU FU

TMR

FU FU

EDM

DMR

Cluster

FU

PE west

FU FU

FEHM

PE east PE mid

FU

No Protection

Page 41: Fault Tolerant Techniques on CGRA - Yonsei

SW based techniques - performance overhead

2016-04-02 41

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

Module 1

Module 2

Module 3

Input Voter Output

TMR (Triple Modular Redundancy)

DMR (Dual Modular Redundancy) Module 1

Module 2

Input Comparator Output

TMR voter implementation

DMR comparator implementation

Page 42: Fault Tolerant Techniques on CGRA - Yonsei

SW based techniques - performance overhead

2016-04-02 42

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

0

2

4

6

8

Base DMR TMR

Nor

mal

ized

Run

tim

e to

Bas

e

Techniques

Performance Overhead of TMR/DMR

Page 43: Fault Tolerant Techniques on CGRA - Yonsei

SW based techniques - performance overhead

2016-04-02 43

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

0

2

4

6

8

Base DMR TMR

Nor

mal

ized

Run

tim

e to

Bas

e

Techniques

Performance Overhead of TMR/DMR

167%

700%

Page 44: Fault Tolerant Techniques on CGRA - Yonsei

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

SW based techniques - performance overhead

2016-04-02 44

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

DFG is generated for loop kernel

Page 45: Fault Tolerant Techniques on CGRA - Yonsei

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

SW based techniques - performance overhead

2016-04-02 45

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

DFG is generated for loop kernel

Page 46: Fault Tolerant Techniques on CGRA - Yonsei

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

SW based techniques - performance overhead

2016-04-02 46

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

DFG is generated for loop kernel

Page 47: Fault Tolerant Techniques on CGRA - Yonsei

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

SW based techniques - performance overhead

2016-04-02 47

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

DFG is generated for loop kernel

Page 48: Fault Tolerant Techniques on CGRA - Yonsei

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

SW based techniques - performance overhead

2016-04-02 48

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

DFG is generated for loop kernel

Page 49: Fault Tolerant Techniques on CGRA - Yonsei

SW based techniques - performance overhead

2016-04-02 49

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

: Store : Load : Subtraction : Compare : Logical AND : Addition : Voter

Page 50: Fault Tolerant Techniques on CGRA - Yonsei

SW based techniques - performance overhead

2016-04-02 50

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

: Store : Load : Subtraction : Compare : Logical AND : Addition : Voter

Page 51: Fault Tolerant Techniques on CGRA - Yonsei

SW based techniques - performance overhead

2016-04-02 51

Related works

G. Lee and K. Choi, “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable array architecture,” in AHS, 2010.

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) / Y; /* X and Y are constants */ } …

: Store : Load : Subtraction : Compare : Logical AND : Addition : Voter

Page 52: Fault Tolerant Techniques on CGRA - Yonsei

2016-04-02

3. Problem definition

Page 53: Fault Tolerant Techniques on CGRA - Yonsei

Significantly expensive voting mechanism

2016-04-02 53

Problem definition

Page 54: Fault Tolerant Techniques on CGRA - Yonsei

Significantly expensive voting mechanism

2016-04-02 54

Problem definition

38%

Triplication

Page 55: Fault Tolerant Techniques on CGRA - Yonsei

Significantly expensive voting mechanism

2016-04-02 55

Problem definition

38%

62%

Triplication

Voting

Page 56: Fault Tolerant Techniques on CGRA - Yonsei

2016-04-02

4. Thesis proposal: Selective validation for efficient protection

Page 57: Fault Tolerant Techniques on CGRA - Yonsei

Selective validation : reduce expensive voting

2016-04-02 57

Thesis proposal: Selective validation for efficient protection … for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

Suppose that memory of CGRA is protected against errors – Traditional protection methodologies for memory are inexpensive as

compared to maintaining execution cores

Do not replicate the memory operation Synchronization point where corrupt data can propagate

and cause failure – erroneous store operations can eventually cause incorrect outputs

The program will be executed correctly if corrupted data is not stored in the main memory

Page 58: Fault Tolerant Techniques on CGRA - Yonsei

Selective validation : reduce expensive voting

2016-04-02 58

Thesis proposal: Selective validation for efficient protection

TMR with full voting – Voting at every operation (except

memory operation)

TMR with selective voting – Voting at operation just before store – Memory is protected by ECC or Parity – Corrupt data in memory failure

Base TMR with full voting TMR with selective voting

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

Page 59: Fault Tolerant Techniques on CGRA - Yonsei

Selective validation : reduce expensive voting

2016-04-02 59

Thesis proposal: Selective validation for efficient protection

TMR with full voting – Voting at every operation (except

memory operation)

TMR with selective voting – Voting at operation just before store – Memory is protected by ECC or Parity – Corrupt data in memory failure

Base TMR with full voting TMR with selective voting

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

Page 60: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 60

Thesis proposal: Selective validation for efficient protection

Base TMR with full voting TMR with selective voting

Page 61: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 61

Thesis proposal: Selective validation for efficient protection

100

Base TMR with full voting TMR with selective voting

Page 62: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 62

Thesis proposal: Selective validation for efficient protection

300

Base TMR with full voting TMR with selective voting

Page 63: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 63

Thesis proposal: Selective validation for efficient protection

700

300

Base TMR with full voting TMR with selective voting

Page 64: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 64

Thesis proposal: Selective validation for efficient protection

700

Base TMR with full voting TMR with selective voting

Page 65: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 65

Thesis proposal: Selective validation for efficient protection

Total : 2000 Base TMR with full voting TMR with selective voting

Page 66: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 66

Thesis proposal: Selective validation for efficient protection

Total : 2000

300

Base TMR with full voting TMR with selective voting

Page 67: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 67

Thesis proposal: Selective validation for efficient protection

Total : 2000

0

Base TMR with full voting TMR with selective voting

Page 68: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 68

Thesis proposal: Selective validation for efficient protection

Total : 2000

300

700

Base TMR with full voting TMR with selective voting

Page 69: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 69

Thesis proposal: Selective validation for efficient protection

Total : 2000 Total : 1300 Base TMR with full voting TMR with selective voting

Page 70: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 70

Thesis proposal: Selective validation for efficient protection

Total : 2000 Total : 1300 Base TMR with full voting TMR with selective voting

35% improvement

Page 71: Fault Tolerant Techniques on CGRA - Yonsei

Ex) Reduce expensive voting

2016-04-02 71

Thesis proposal: Selective validation for efficient protection

Base TMR with full voting TMR with selective voting

TMR with the selective voting is more efficient than TMR with full voting in terms of performance

Page 72: Fault Tolerant Techniques on CGRA - Yonsei

Optimization by reducing # of store operations

72

Thesis proposal: Selective validation for efficient protection

Our optimization to reduce # of votings – Reduce the # of store operations by merging store

operations with add and shift operations

0x12 0x34 ∙ ∙ ∙ ∙ ∙ ∙

a[0] a[1]

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

2016-04-02

Page 73: Fault Tolerant Techniques on CGRA - Yonsei

Optimization by reducing # of store operations

73

Thesis proposal: Selective validation for efficient protection

Our optimization to reduce # of votings – Reduce the # of store operations by merging store

operations with add and shift operations

0x12 0x34 ∙ ∙ ∙ ∙ ∙ ∙

a[0] a[1]

0x1200 0x34 ∙ ∙ ∙

shift sizeof(a[i])

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

2016-04-02

Page 74: Fault Tolerant Techniques on CGRA - Yonsei

Optimization by reducing # of store operations

74

Thesis proposal: Selective validation for efficient protection

Our optimization to reduce # of votings – Reduce the # of store operations by merging store

operations with add and shift operations

0x12 0x34 ∙ ∙ ∙ ∙ ∙ ∙

a[0] a[1]

2 byte

0x1200 0x34 ∙ ∙ ∙

0x1200+0x34 ∙ ∙ ∙ ∙ ∙ ∙

shift sizeof(a[i])

add a[i+1]

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

2016-04-02

Page 75: Fault Tolerant Techniques on CGRA - Yonsei

Optimization by reducing # of store operations

75

Thesis proposal: Selective validation for efficient protection

Our optimization to reduce # of votings – Reduce the # of store operations by merging store

operations with add and shift operations

0x12 0x34 ∙ ∙ ∙ ∙ ∙ ∙

a[0] a[1]

2 byte

0x1200 0x34 ∙ ∙ ∙

0x1200+0x34 ∙ ∙ ∙ ∙ ∙ ∙

0x1234 ∙ ∙ ∙ ∙ ∙ ∙

shift sizeof(a[i])

add a[i+1]

store a[i], a[i+1]

… for ( i = 0; i < iteration; i ++) { a[i] = ( b[i] – X ) Y; /* X and Y are constants */ } …

2016-04-02

Page 76: Fault Tolerant Techniques on CGRA - Yonsei

2016-04-02

5. Experiments

Page 77: Fault Tolerant Techniques on CGRA - Yonsei

Experimental setup

2016-04-02 77

Experiments

Compiler / Modular

Scheduler Benchmarks

DFG Generator

minimum II Runtime

CGRA Simulator

CGRA Configuration

initial II

compile and simulate each benchmark kernel with input parameters – Input parameters: DFG information, CGRA configuration, initial II

DFG information : generated by DFG Generator CGRA configuration : 4 X 4 CGRA, Local register II : Initiation Interval

Benchmarks : SPEC 2000, OpenCV Scheduler : modulo scheduling (the cost-based scheduling algorithm)

Page 78: Fault Tolerant Techniques on CGRA - Yonsei

0

2

4

6

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for TMR TMR (Full Voting) Our Technique (Selective Voting)

Performance improvement w/ selective voting

2016-04-02 78

Experiments

Page 79: Fault Tolerant Techniques on CGRA - Yonsei

0

2

4

6

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for TMR TMR (Full Voting) Our Technique (Selective Voting)

Performance improvement w/ selective voting

2016-04-02 79

Experiments

Page 80: Fault Tolerant Techniques on CGRA - Yonsei

0

2

4

6

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for TMR TMR (Full Voting) Our Technique (Selective Voting)

Performance improvement w/ selective voting

2016-04-02 80

Experiments

59.6%

Page 81: Fault Tolerant Techniques on CGRA - Yonsei

0

2

4

6

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for TMR TMR (Full Voting) Our Technique (Selective Voting)

Performance improvement w/ selective voting

2016-04-02 81

Experiments

59.6%

40.9%

Page 82: Fault Tolerant Techniques on CGRA - Yonsei

0

2

4

6

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for TMR TMR (Full Voting) Our Technique (Selective Voting)

Performance improvement w/ selective voting

2016-04-02 82

Experiments

59.6%

40.9%

Our selective validation for TMR can improve the performance

Page 83: Fault Tolerant Techniques on CGRA - Yonsei

0

1

2

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for DMR DMR (Full Comparison) Our Technique (Selective Comparison)

Performance improvement w/ selective voting

2016-04-02 83

Experiments

Page 84: Fault Tolerant Techniques on CGRA - Yonsei

0

1

2

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Runtime Evaluation of Selective Validation for DMR DMR (Full Comparison) Our Technique (Selective Comparison)

Performance improvement w/ selective voting

2016-04-02 84

Experiments

20.0%

9.5%

Page 85: Fault Tolerant Techniques on CGRA - Yonsei

Performance optimization w/ store reduction

2016-04-02 85

Experiments

0

2

4

6

Nor

mal

ized

Run

time

to B

ase

Benchmarks

Performance Improvement of Optimization for TMR TMR (Full Voting) Our Technique (Selective Voting) Our Optimization

Page 86: Fault Tolerant Techniques on CGRA - Yonsei

Performance optimization w/ store reduction

2016-04-02 86

Experiments

0

2

4

6

Nor

mal

ized

Run

time

to B

ase

Benchmarks

Performance Improvement of Optimization for TMR TMR (Full Voting) Our Technique (Selective Voting) Our Optimization

59.9%

Page 87: Fault Tolerant Techniques on CGRA - Yonsei

Performance optimization w/ store reduction

2016-04-02 87

Experiments

0

2

4

6

Nor

mal

ized

Run

time

to B

ase

Benchmarks

Performance Improvement of Optimization for TMR TMR (Full Voting) Our Technique (Selective Voting) Our Optimization

59.9%

45.2%

Page 88: Fault Tolerant Techniques on CGRA - Yonsei

Performance optimization w/ store reduction

2016-04-02 88

Experiments

0

2

4

6

Nor

mal

ized

Run

time

to B

ase

Benchmarks

Performance Improvement of Optimization for TMR TMR (Full Voting) Our Technique (Selective Voting) Our Optimization

59.9%

45.2%

Our optimization with store reduction can further improve the performance as compared to our selective validation

Page 89: Fault Tolerant Techniques on CGRA - Yonsei

0

1

2

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Performance Improvement of Optimization for DMR DMR (Full Comparison) Our Technique (Selective Comparison) Our Optimization

Performance optimization w/ store reduction

2016-04-02 89

Experiments

Page 90: Fault Tolerant Techniques on CGRA - Yonsei

0

1

2

Nor

mal

ized

Run

time

to th

e B

ase

Benchmarks

Performance Improvement of Optimization for DMR DMR (Full Comparison) Our Technique (Selective Comparison) Our Optimization

Performance optimization w/ store reduction

2016-04-02 90

Experiments

25.2%

16.9%

Page 91: Fault Tolerant Techniques on CGRA - Yonsei

Energy consumption w/ optimization

2016-04-02 91

Experiments

0

1

2

Nor

mal

ized

Ene

rgy

Con

sum

ptio

n to

the

Bas

e

Benchmarks

Energy Consumption of Optimization for TMR TMR (Full Voting) Our Technique(Selective Voting) Our Optimization

Page 92: Fault Tolerant Techniques on CGRA - Yonsei

Energy consumption w/ optimization

2016-04-02 92

Experiments

0

1

2

Nor

mal

ized

Ene

rgy

Con

sum

ptio

n to

the

Bas

e

Benchmarks

Energy Consumption of Optimization for TMR TMR (Full Voting) Our Technique(Selective Voting) Our Optimization

33.7%

Page 93: Fault Tolerant Techniques on CGRA - Yonsei

Energy consumption w/ optimization

2016-04-02 93

Experiments

0

1

2

Nor

mal

ized

Ene

rgy

Con

sum

ptio

n to

the

Bas

e

Benchmarks

Energy Consumption of Optimization for TMR TMR (Full Voting) Our Technique(Selective Voting) Our Optimization

26.1% 33.7%

Page 94: Fault Tolerant Techniques on CGRA - Yonsei

Conclusions

2016-04-02 94

Conclusions

Reliability is the critical design concern CGRA becomes more popular Previous proposed techniques have limitations in

terms of area and performance Software-based selective validation techniques

– Validation at operation just before store Performance improvement

– TMR/DMR with selective validation : 40.9%/9.5% – TMR/DMR with selective validation + optimization :

45.2%/16.9% ※ Previous proposed DMR/TMR techniques : 167%/700%

Energy consumption savings – TMR/DMR with selective validation + optimization :

26.1%/13.8%

Page 95: Fault Tolerant Techniques on CGRA - Yonsei

Future work

2016-04-02 95

Continue energy consumption

Design space exploration – Tradeoff among reliability, energy consumption and

performance

Further optimization

– CGRA scheduling algorithm

Page 96: Fault Tolerant Techniques on CGRA - Yonsei

Q & A

2016-04-02 96

Page 97: Fault Tolerant Techniques on CGRA - Yonsei

Publication

2016-04-02 97

Paper accept – Selective validations for efficient protections on

coarse-grained reconfigurable architectures – Application-specific Systems, Architectures and

Processors(ASAP) – The 24th IEEE International Conference, June 5-7

2013

Plan to submit extended version journal(TECS) – Energy consumption estimation – Fault coverage

Plan to submit a paper KCC Conference

Page 98: Fault Tolerant Techniques on CGRA - Yonsei

Reference

2016-04-02 98

Baumann et al. “Determining the Impact of Alpha-Particle-Emitting Contamination From the Fukushima-Daiichi Disaster on Japanese Semiconductor Manufacturing Sites” 2012 RADECS.

Toshinori et al. “Radiation Effect Mitigation Methods for Electronic Systems” 2012 IEEE SII . Jafri et al. “Design of a faulttolerant coarse-grained reconfigurable architecture: a case study,”

in ISQED, 2010. Alnajiar el al. “Coarse-grained dynamically reconfigurable architecture with flexible reliability,”

in FPL, 2009. Schweizer el al. “Using run-time reconfiguration to implement fault-tolerant coarse grained

reconfigurable architectures,” in IPDPSW, 2012. Schweizer el al “Low-cost TMR for fault-tolerance on coarse-grained reconfigurable

architectures,” in ReConFig, 2011. Reis el al. “SWIFT: Software implemented fault tolerance,” in CGO, 2005. K.Lee el al. “Mitigating the impact of hardware defects on multimedia applications: a cross-

layer approach,” in ACM Multimedia, 2008. G. Lee et al. “Thermal-aware fault-tolerant system design with coarse-grained reconfigurable

array architecture,” in AHS, 2010. G. Bradski, “The OpenCV library,” Doctor Dobbs Journal, 2000. J. L. Henning, “SPEC CPU2000: Measuring CPU performance in the new millennium,” Computer,

2000. http://technology-and-science.lawyers.com/blogs/archives/4461-Are-Cosmic-Rays-a-Factor-in-

Toyota-Acceleration-Problems.html http://www.chipdesignmag.com/payne/2010/04/14/alpha-particles-dram-seu-toyota-

acceleration/