Page 1: Resource Reduced Triple Modular Redundancy for Built-In Self-Repair in VLIW-Processors

Mario Schölzel

Computer Engineering Group

Brandenburg University of Technology at Cottbus

Page 2: Outline

Slide navigation bar: Motivation, VLIW Architecture, RR-TMR Idea, SW Modifications, HW Modifications, Conclusion

Computer Engineering Group, Brandenburg University of Technology at Cottbus

Mario Schölzel, SPA 2007

• Why Built-In Self-Repair?
• Base Architecture
• Resource Reduced TMR
• Program Modifications
• Architecture Modifications
• Conclusions and Limitations

Page 3: Why Built-In Self-Repair?

• Hardware is becoming unreliable (permanent faults due to small feature sizes)
• The ITRS Roadmap 2005 for Design predicts a requirement for reliable systems due to:
  – Infeasibility of a full functional test at manufacturing exit
  – Relaxation of the 100% correctness requirement (reduces functional test complexity and cost)
• Consequence: Redundancy in the system is required for robustness!

Page 4: Simple TMR Approach

[Figure: block diagram with Input, Processor 1, Processor 2, Processor 3, Voter, and Output]

We consider the following application domain:
• High-performance signal processing applications (e.g. image and audio processing)
• Real-time demands

Page 5: Basic Processor Architecture

[Figure: block diagram of the base VLIW processor. Instruction word: opcode1 src1.1 src1.2 dst1, opcode2 ..., opcoden srcn.1 srcn.2 dstn, Branch. Blocks: Data Path (Register File, Branch, FU 1 ... FU n, Extern), Control Path (Control Logic, Instruction Pointer), Program Memory, Data Memory]

Page 6: Idea of Resource Reduced TMR

• Redundant operators are naturally available in a VLIW data path
• In TMR, three results are only necessary when two results mismatch
• Idea of RR-TMR: execute every operation on only two operators and, in the fault-free case, use the third operator for executing regular operations
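The idea above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism from the talk: each functional unit is modeled as a hypothetical Python function, and the names `good_add`, `faulty_add`, and `rr_tmr_add` are invented for illustration. The third execution happens only after the first two results disagree.

```python
from typing import Callable, List, Tuple

# Hypothetical model: each functional unit (FU) is a function computing a + b.
FU = Callable[[int, int], int]


def good_add(a: int, b: int) -> int:
    """Fault-free adder."""
    return a + b


def faulty_add(a: int, b: int) -> int:
    """Assumed permanent fault for illustration: lowest result bit stuck at 1."""
    return (a + b) | 1


def rr_tmr_add(a: int, b: int, fus: List[FU]) -> Tuple[int, int]:
    """Return (result, number of FU executions that were needed)."""
    first = fus[0](a, b)    # original operation
    second = fus[1](a, b)   # duplicated operation
    if first == second:
        # Fault-free case: the third FU was never needed for this operation.
        return first, 2
    third = fus[2](a, b)    # mismatch: only now is a third result computed
    if third == first or third == second:
        return third, 3     # majority of the three results
    raise RuntimeError("no two results agree")
```

In the fault-free case `rr_tmr_add(2, 4, [good_add, good_add, good_add])` uses two executions; with `faulty_add` in the first position a third execution is triggered and the majority value is returned.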

Page 7: Modified VLIW Data Path

[Figure: block diagram of the modified processor. Blocks: Data Path, Regular Register File, Temporary Register File, FU 1 ... FU n, FD & C Logic (two instances shown), Voting Control Logic, Branch, Extern, Control Path, Control Logic, Instruction Pointer, Program Memory, Data Memory]

Limitation: Every operator must be available at least three times.

Page 8: Program Transformation

[Figure: data-flow graph with nodes 1 to 28; nodes 3, 6, 9, 10, 13, 14, 19, 20, and 25 to 28 are additions (+). Legend: Duplicated Operations; Pair of Reference Operations]

Page 9: Modified Part of Instruction Word

Fields per operation: opcode, src1, src2, dst, mod, RefReg, RefFU

• RefFU: number of the FU that executes the reference operation
• mod=0: RefReg is the target register in the TRF (Temporary Register File)
• mod=1: RefReg delivers the reference value from the TRF
• These fields must be set correctly for every operation and its duplicate after all operations have been scheduled (the original and the duplicate operation may be scheduled at different times)
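A minimal sketch of how a scheduler might fill in these fields for an operation and its duplicate. The class `OperationSlot` and the helper `make_pair` are hypothetical names, and the sketch assumes that the operation scheduled first gets mod=0 (it stores its result in the TRF) while the later one gets mod=1 (it compares against that stored value). The sample values reproduce the two instruction words shown on the "Example: Instruction Word" slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OperationSlot:
    """One operation's part of the VLIW instruction word (hypothetical model)."""
    opcode: str
    src1: str
    src2: str
    dst: str
    mod: int       # 0: RefReg is the target in the TRF, 1: RefReg delivers the reference
    ref_reg: str   # register in the temporary register file (TRF)
    ref_fu: int    # number of the FU that executes the reference operation


def make_pair(opcode: str, src1: str, src2: str, dst: str, ref_reg: str,
              first_fu: int, second_fu: int) -> Dict[int, OperationSlot]:
    """Build the slots for an operation and its duplicate, keyed by FU number.

    Assumption: the slot on first_fu is scheduled no later than the slot on
    second_fu, so it writes the TRF and the other slot reads it.
    """
    first = OperationSlot(opcode, src1, src2, dst, 0, ref_reg, second_fu)
    second = OperationSlot(opcode, src1, src2, dst, 1, ref_reg, first_fu)
    return {first_fu: first, second_fu: second}


# Values taken from the talk's example: "+ R3 R6 R0 0 R6 3" and "+ R3 R6 R0 1 R6 2".
pair = make_pair("+", "R3", "R6", "R0", "R6", first_fu=2, second_fu=3)
```

Setting the fields only after scheduling matters because RefFU depends on the FU each copy was finally assigned to.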

Page 10: Example: Instruction Word

[Figure: result of scheduling, showing two + operations in the columns FU 2 and FU 3 across time steps 8, 9, and 10, and the corresponding instruction words Instr. 8 to Instr. 10, split into the instruction word part of FU 2 and the instruction word part of FU 3]

Instruction word entries shown:

OpC  Src1  Src2  Dst  mod  RReg  RFU
+    R3    R6    R0   0    R6    3
+    R3    R6    R0   1    R6    2

Page 11: FD&C Logic Details

[Figure: FD&C logic of FU i. Labels: RefReg, RefFU, mod, opcode, Result of FU i, Cmp, Fault Vector, Fault Remember, errOpc, errDet, error, i_Fault, Write Port TRF, Read Port TRF, Control of TRF Read Ports, Write Port of FU i in RF, from voting logic, to voting logic]

Callouts in the figure:
• Every bit represents the fault status of the corresponding operator
• Opcode of the operation currently executed in the corresponding FU
• Compares the current result with the reference value from register RefReg in the TRF
• Decides whether an error occurs for the first time or not, and signals the Voting Logic
• Detects whether the current result is faulty
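The behaviour described by these callouts can be sketched in software. This is an interpretation, not the circuit from the talk: `fdc_step` and `classify` are hypothetical names, the TRF is modeled as a dictionary, and the fault vector is modeled as a set of (FU, opcode) pairs.

```python
from typing import Dict, Optional, Set, Tuple


def fdc_step(mod: int, ref_reg: str, result: int,
             trf: Dict[str, int]) -> Optional[bool]:
    """One step of the fault detection logic for a single FU.

    mod=0: store the result in the TRF as reference value, nothing to compare.
    mod=1: compare the result with the stored reference; True means mismatch.
    """
    if mod == 0:
        trf[ref_reg] = result
        return None
    return trf[ref_reg] != result


def classify(mismatch: bool, fu: int, ref_fu: int, opcode: str,
             fault_vector: Set[Tuple[int, str]]) -> str:
    """Decide whether a mismatch is new (assumed to trigger voting) or already known."""
    if not mismatch:
        return "ok"
    if (fu, opcode) in fault_vector or (ref_fu, opcode) in fault_vector:
        return "known fault"
    return "new fault"
```

A usage example: after `fdc_step(0, "R6", 9, trf)` the value 9 is held in `trf["R6"]`; a later `fdc_step(1, "R6", 8, trf)` returns `True`, and `classify` reports a new fault as long as neither FU is marked for that opcode.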

Page 12: Example: Correct Execution

Instruction word entries shown (rows Instr. 8 to Instr. 10; instruction word parts of FU 2 and FU 3):

OpC  Src1  Src2  Dst  mod  RReg  RFU
+    R3    R6    R0   0    R6    3
+    R3    R6    R0   1    R6    2

[Figure: FD&C logic of FU 2 and of FU 3, with the same structure as on the FD&C Logic Details slide. Signal values annotated in the figure: 0, 0, 0, 0]

Page 13: Example: FU 2 is Faulty

Instruction word entries shown (rows Instr. 8 to Instr. 10; instruction word parts of FU 2 and FU 3):

OpC  Src1  Src2  Dst  mod  RReg  RFU
+    R3    R6    R0   0    R6    3
+    R3    R6    R0   1    R6    2

[Figure: FD&C logic of FU 2 and of FU 3, with the same structure as on the FD&C Logic Details slide. Signal values annotated in the figure: 1, 1, 1, 0]

Page 14: Example: FU 3 is Faulty

Instruction word entries shown (rows Instr. 8 to Instr. 10; instruction word parts of FU 2 and FU 3):

OpC  Src1  Src2  Dst  mod  RReg  RFU
+    R3    R6    R0   0    R6    3
+    R3    R6    R0   1    R6    2

[Figure: FD&C logic of FU 2 and of FU 3, with the same structure as on the FD&C Logic Details slide. Signal values annotated in the figure: 0, 0, 0, 1]

Page 15: Example: Fault Detection (1)

Instruction word entries shown (rows Instr. 8 to Instr. 10; instruction word parts of FU 2 and FU 3):

OpC  Src1  Src2  Dst  mod  RReg  RFU
+    R3    R6    R0   0    R6    3
+    R3    R6    R0   1    R6    2

[Figure: FD&C logic of FU 2 and of FU 3, with the same structure as on the FD&C Logic Details slide. Signal values annotated in the figure: 0, 0, 0, 0]

Page 16: Example: Fault Detection (2)

The operation of FU 3 that caused the mismatch is executed again in another FU. One of the following two cases applies.

Re-executed instruction word (identical in both cases):

OpC  Src1  Src2  Dst  mod  RReg  RFU
+    R3    R6    R0   1    R6    2

• Case 1 (the figure shows the value 0): No mismatch is discovered. FU 2 and FU 4 computed the correct result. The write-back of FU 3 is suppressed.
• Case 2 (the figure shows the value 1): A mismatch is discovered again. It is assumed that FU 3 computed the correct result, which is written to the register file.

[Figure: FD&C logic of the re-executing FU for each of the two cases, labeled "Result of FU 1" and "Write Port of FU 1 in RF", with the same structure as on the FD&C Logic Details slide]
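The two cases can be written down as a small decision function. This is a sketch with hypothetical names (`vote` and its parameters); marking the reference FU as faulty in the second case is an inference that holds only under a single-fault assumption, which the slide does not state explicitly.

```python
from typing import Tuple


def vote(reference: int, suspect_result: int, recomputed: int,
         ref_fu: int, suspect_fu: int) -> Tuple[int, int]:
    """Resolve a mismatch between a reference value and a suspect result.

    reference:      value stored in the TRF by the reference FU
    suspect_result: result of the FU whose comparison failed
    recomputed:     result of the same operation re-executed on another FU
    Returns (value to write back, number of the FU judged faulty).
    """
    if recomputed == reference:
        # No mismatch on re-execution: reference FU and re-executing FU agree,
        # so the write-back of the suspect FU is suppressed.
        return reference, suspect_fu
    # Mismatch again: the suspect FU is assumed to have been correct.
    return suspect_result, ref_fu
```

With the numbering from the example, `ref_fu=2` and `suspect_fu=3`: if the re-execution matches the reference, FU 3 is judged faulty; otherwise the result of FU 3 is written back and FU 2 is judged faulty.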

Page 17: Details FD&C-Logic

[Figure: voting logic in the control path. Labels: Fetched Instruction, Decoder, Control Signal Queue (cs1, cs2, ..., csk), cs_sel, Voting Instruction, Voting-Logic, Fault_1 ... Fault_m, write_disable (to FD&C 1 ... to FD&C m), op_sel, fu_sel, Operation mode, Fault Memory, opMode (to data path)]

Callouts in the figure:
• Select a certain control word (normal: cs1)
• Current operation mode (normal, voting, resume)
• Select control signals of the fault-causing operation
• Redirect selected signals to a working FU
• Remember faulty operators
• Control of (de-)multiplexers
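The three operation modes named in the callouts suggest a small state machine. The sketch below is an assumption about how the modes follow each other (normal, then voting once a new fault is reported, then resume, then normal again); the event names `new_fault_reported`, `voting_done`, and `queue_drained` are hypothetical and do not appear in the talk.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    VOTING = "voting"
    RESUME = "resume"


def next_mode(mode: Mode, new_fault_reported: bool = False,
              voting_done: bool = False, queue_drained: bool = False) -> Mode:
    """Assumed mode transitions of the voting control logic."""
    if mode is Mode.NORMAL:
        # A newly reported fault stops normal execution and starts voting.
        return Mode.VOTING if new_fault_reported else Mode.NORMAL
    if mode is Mode.VOTING:
        # Once the fault-causing operation has been re-executed, resume.
        return Mode.RESUME if voting_done else Mode.VOTING
    # RESUME: replay the queued control words, then return to normal mode.
    return Mode.NORMAL if queue_drained else Mode.RESUME
```

Keeping the control words of in-flight instructions in a queue is what would allow the resume phase to continue without refetching, which is presumably why the figure shows a Control Signal Queue.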

Page 18: Example: FD&C-Logic

[Figure: left, an example schedule containing the operations +, &, and - with Instruction 1 (EX), Instruction 2 (Fetch), and Instruction 3; right, the situation of the FD&C logic, with the same structure as on the Details FD&C-Logic slide. Control Signal Queue entries shown: "* &", "& * + -", "nop nop nop". A fault is reported. Operation mode: normal]

Page 19: Example: FD&C-Logic

[Figure: left, an example schedule containing the operations +, &, and - with Instruction 1 (WB), Instruction 2 (EX, stopped), and Instruction 3 (Fetched, stopped); right, the situation of the FD&C logic, with the same structure as on the Details FD&C-Logic slide. Control Signal Queue entries shown: "* -", "*", "* + -", "* & &", "nop nop *", "nop nop nop", "...". Operation mode: voting]

Page 20: Example: FD&C-Logic

[Figure: situation of the FD&C logic, with the same structure as on the Details FD&C-Logic slide. Control Signal Queue entries shown: "* -", "* * + -", "* & &", "nop nop *", "nop nop nop", "...". A marker indicates "Resume starts here". Operation mode: resume]

Page 21: Limitations in Error Detection

[Figure: schedule over the columns Fu1, Fu2, Fu3, ... with a pair of + operations, shown for the problem and for the solution]

Assumption: Operator + in FU 1 is faulty.

Problem: The correctness of operator + in FU 2 can no longer be checked!

Solution: Check the correctness of FU 2 with a reference operation in FU 3.

Page 22: Preliminary Results

[Figure: the data-flow graph from the Program Transformation slide, with nodes 1 to 28; nodes 3, 6, 9, 10, 13, 14, 19, 20, and 25 to 28 are additions (+)]

Page 23: Preliminary Results

        Non-fault-tolerant     Fault-tolerant
L       FUs   Add   Mul        FUs   Add   Mul
8       4     4     4          8     8     8
9       4     3     4          8     7     8
10      3     3     3          6     6     6
11      3     3     3          6     6     6
12      3     3     2          5     5     5
13      3     3     2          5     5     5
14      2     2     2          5     5     4
15      2     2     2          4     4     4
16      2     2     2          4     4     3
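The relative overhead implied by these numbers can be computed directly. The sketch below copies the table rows as given; the column meaning of L is not stated on the slide and is left uninterpreted, and the names `ROWS` and `overhead_percent` are introduced here only for illustration.

```python
# Rows copied from the table: (L, FUs, Add, Mul) without fault tolerance,
# followed by (FUs, Add, Mul) with fault tolerance.
ROWS = [
    (8, 4, 4, 4, 8, 8, 8),
    (9, 4, 3, 4, 8, 7, 8),
    (10, 3, 3, 3, 6, 6, 6),
    (11, 3, 3, 3, 6, 6, 6),
    (12, 3, 3, 2, 5, 5, 5),
    (13, 3, 3, 2, 5, 5, 5),
    (14, 2, 2, 2, 5, 5, 4),
    (15, 2, 2, 2, 4, 4, 4),
    (16, 2, 2, 2, 4, 4, 3),
]


def overhead_percent(base: int, fault_tolerant: int) -> float:
    """Additional resources of the fault-tolerant variant, in percent of the base."""
    return 100.0 * (fault_tolerant - base) / base


# Overhead in the number of FUs for every value of L.
fu_overheads = {row[0]: overhead_percent(row[1], row[4]) for row in ROWS}
mean_fu_overhead = sum(fu_overheads.values()) / len(fu_overheads)

for length, percent in fu_overheads.items():
    print(f"L={length}: {percent:.1f}% more FUs")
print(f"mean: {mean_fu_overhead:.1f}%")
```

The FU overhead ranges from about 67% (L = 12 and 13) to 150% (L = 14), with a mean of roughly 98% over the nine rows.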

Page 24: Conclusion

• The method can detect and repair permanent and transient faults
• Known faults do not cause a delay; new faults cause a delay of at most 2maxLat+1
• Multiple known faults can be repaired (as long as at least one operation of every pair is executed by a non-faulty FU)
• Overhead in operators and register file ports is approximately 100%
• Overhead of the control logic is unknown so far (a VHDL model is missing)
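The condition for repairing multiple known faults can be expressed as a simple check over a schedule. This is a simplified sketch: faults are tracked per FU rather than per operator, and the name `repairable` and the representation of a schedule as a list of FU pairs are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def repairable(pairs: Iterable[Tuple[int, int]], faulty_fus: Set[int]) -> bool:
    """Check the condition from the conclusion slide.

    pairs:      for every operation, the FUs executing the original and its duplicate
    faulty_fus: FUs already known to be faulty
    Returns True if every pair has at least one copy on a non-faulty FU.
    """
    return all(a not in faulty_fus or b not in faulty_fus for a, b in pairs)
```

For example, `repairable([(1, 2), (2, 3)], {2})` holds because FU 1 and FU 3 each still cover one pair, while `repairable([(1, 2)], {1, 2})` does not.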

Page 25: Open Problems

• Handling multiple faults that first occur at the same time is possible but difficult
• Faults in wires, registers, the control path, and the FD & C logic
• Hardware implementation for better area and performance estimation

Page 26: Thank You!