An Architecture for Fail-Silent Operation of FPGAs and Configurable SoCs
Lee W. Lerner and Charles E. Stroud
Based on a presentation at the International Conf. on Embedded Systems & Applications, June 2006
VLSI Design & Test Seminar, Spring 2007 2
Outline of Presentation
Motivation and Background
  Overview of Fail-Silent operation
  Single Event Upsets (SEUs)
Fail-Silent Architecture
  Fault isolation with Guard Bands
Experimental Implementations
  Atmel AT94K series SoC
  Xilinx Virtex-4 series FPGAs
  Triple Modular Redundancy (TMR)
Summary
Future Work
Motivation and Background
Fail-Silent operation
  Halt all operation immediately upon occurrence of a fault
  Reduces need for periodic off-line system testing
Single Event Upsets (SEUs)
  Transient or soft radiation-induced errors in microelectronic devices
  Known to occur in high-radiation environments such as space
  Affect FPGA configuration memory
Single Event Upsets (SEUs)
Energetic particles causing SEUs
  Galactic cosmic rays
  Cosmic solar particles influenced by solar flares
  Trapped protons in radiation belts
Single Event Upsets (SEUs)
SEU effects on CMOS technology
  Change logic values of transistors
[Figure: CMOS inverter schematic (Vin, Vout, VDD, VSS) and device cross-section (p-type substrate, n-well, n+ and p+ source/drain regions, gate) struck by radiation (proton, ion, neutron, ...), with + and - charges shown along the particle track. Upset occurs if channel current is turned on. Latchup occurs if a parasitic current loop is initiated. Modified from Tribble, A. C., The Space Environment – Implications for Spacecraft Design, 2nd Ed., (Princeton, NJ: Princeton University Press, 2003).]
SEU Effects on an FPGA
[Figure: Configuration Memory Bit, a RAM cell of coupled inverters with word and BIT lines, controlling a Programmable Interconnect Point (PIP) between Wire A and Wire B.]
[Figure: Traditional TMR Approach, with a PIP connecting the routing of multiple modules (Module 1, Module 2). A deactivated PIP separates the isolated wire segments.]
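The PIP behavior on this slide can be sketched in a few lines of Python. This is a toy model, not the authors' tooling: `Pip`, `inject_seu`, and `drive` are hypothetical names, and treating two joined wires with different drivers as corrupted (`None`) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Pip:
    """Hypothetical model: one configuration memory bit controls one PIP."""
    config_bit: int = 0  # 0 = deactivated (wires isolated), 1 = wires joined

    def is_active(self) -> bool:
        return self.config_bit == 1


def inject_seu(pip: Pip) -> None:
    """Flip the configuration bit, as a single event upset would."""
    pip.config_bit ^= 1


def drive(module1_out: int, module2_out: int,
          pip: Pip) -> Tuple[Optional[int], Optional[int]]:
    """Return the values seen on Wire A and Wire B.

    Assumption: if the PIP joins two wires driven to different values,
    both wires are treated as corrupted (None).
    """
    if pip.is_active() and module1_out != module2_out:
        return None, None
    return module1_out, module2_out


pip = Pip(config_bit=0)        # deactivated PIP between two modules
before = drive(1, 0, pip)      # wires isolated, each module keeps its value
inject_seu(pip)                # a single upset activates the PIP
after = drive(1, 0, pip)       # the routing of both modules is now affected
```

In this model a single flipped bit disturbs two modules at once, which is the susceptibility the slide attributes to traditional TMR routing.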
Fail-Silent Architecture
Guard band region of isolation
  Isolate multiple working circuits
  No single fault can allow interaction between two working circuits
[Figure: Working Region #1 (input set #1, fail-silent output set #1) and Working Region #2 (input set #2, fail-silent output set #2), separated by a guard band with fault monitor circuit.]
Fail-Silent Architecture
Fault monitoring circuit
  For each output of independent working regions
  Pair-wise compare outputs of working regions
  Tri-state output when any mismatch occurs
  Initiate processor routine to reconfigure FPGA
[Figure: guard band with fault monitor circuit. Labels: output from region #1, output from region #2, PLB, PLBs for fault isolation, tri-state buffer, fail-silent output, to processor interrupt.]
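The monitor behavior listed above can be sketched as a small Python model. The class and method names are hypothetical. Latching the mismatch until `reconfigure()` is called is an assumption, chosen to reflect the "halt all operation" goal. The slides do not say how the real circuit holds its state.

```python
from typing import Optional, Sequence, Tuple

HIGH_Z = None  # stands in for a tri-stated (undriven) output


class FaultMonitor:
    """Toy model of a pair-wise output comparator with tri-state control."""

    def __init__(self) -> None:
        self.tripped = False  # assumption: a mismatch is latched

    def step(self, region1: Sequence[int],
             region2: Sequence[int]) -> Tuple[Optional[Tuple[int, ...]], bool]:
        """Compare outputs pair-wise; return (fail-silent output, interrupt)."""
        if len(region1) != len(region2):
            raise ValueError("working regions must have the same number of outputs")
        if any(a != b for a, b in zip(region1, region2)):
            self.tripped = True
        if self.tripped:
            return HIGH_Z, True       # outputs tri-stated, processor interrupted
        return tuple(region1), False  # outputs agree, pass them through

    def reconfigure(self) -> None:
        """Stands in for the processor routine that reconfigures the FPGA."""
        self.tripped = False


monitor = FaultMonitor()
ok = monitor.step([1, 0, 1], [1, 0, 1])     # regions agree
bad = monitor.step([1, 0, 1], [1, 1, 1])    # one output differs
later = monitor.step([1, 0, 1], [1, 0, 1])  # still silent until reconfigured
```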
Atmel AT94K Series Configurable SoC Architecture
[Figures: AT94K SoC architecture; our AT94K demo & development board.]
Atmel AT94K Routing Architecture
[Figure: three panels. Local Routing: a PLB with X and Y connections and local cross-point PIPs (each marker = Programmable Interconnect Point). Global Routing (1 PLB): local x4 and express x8 lines. Horizontal Repeaters in Global Routing: ×4 lines and ×8 lines with repeaters, with spans labeled 4 PLBs and 8 PLBs. A guard band is marked across the PLBs, repeaters, and express x8 lines.]
Guard Band Implementation in AT94K
  80-bit LFSR system functions
  4 PLB wide guard band region
  Fault monitor circuit in guard band region
[Figure: layout showing the System Function and Fault Monitor.]
Guard Band Implementation in AT94K
[Figure: layout showing the System Function and Fault Monitor.]
Basic Virtex-4 Architecture
PIPs and Routing resources
  4 types of PIPs
  Double lines (x2 lines) span 2 PLBs
  Hex lines (x6 lines) span 6 PLBs
  Long lines span width and length of PLB array
[Figure legend: PLBs (1,368 – 22,272), block RAMs (36 – 552), DSPs (32 – 512), PowerPCs (0 – 2).]
Horizontal guard bands work best with Virtex-4 architecture
Guard Band Implementation in Virtex-4
Xilinx ISE: constraints in PACE and routing in FPGA Editor
  Two 5-bit LFSR system functions
  6 PLB wide guard band region with fault monitoring circuit
[Figure: two layout views, each showing two System Functions, the Fault Monitor, the Guard Band, and an I/O buffer. One view also labels a PLB w/ 4 slices.]
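For readers unfamiliar with the LFSR system functions used in these experiments, here is a minimal Python sketch of a 5-bit Fibonacci LFSR and of two copies running in lockstep. The slides do not give the feedback polynomial, so the taps (5, 3) are an assumption, as are all function names.

```python
def lfsr_step(state: int, width: int = 5, taps=(5, 3)) -> int:
    """Advance a Fibonacci LFSR by one clock.

    Taps are 1-based bit positions XORed to form the bit shifted in.
    The default taps (5, 3) are an assumed choice, not from the slides.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def lfsr_period(seed: int, width: int = 5, taps=(5, 3)) -> int:
    """Count the clocks needed for the LFSR to return to its seed."""
    state = seed
    for count in range(1, (1 << width) + 1):
        state = lfsr_step(state, width, taps)
        if state == seed:
            return count
    raise ValueError("seed is not on a cycle for these taps")


def lockstep_mismatch(seed: int, cycles: int, upset_cycle=None, upset_mask: int = 0):
    """Run two identical LFSRs; optionally flip bits in the second copy.

    Returns the first cycle at which the two states differ, or None.
    """
    a = b = seed
    for cycle in range(cycles):
        if cycle == upset_cycle:
            b ^= upset_mask  # emulate a fault in one working region
        if a != b:
            return cycle
        a = lfsr_step(a)
        b = lfsr_step(b)
    return None


period = lfsr_period(1)                             # full-length sequence for these taps
clean = lockstep_mismatch(1, 40)                    # no fault injected
caught = lockstep_mismatch(1, 40, upset_cycle=7, upset_mask=0b00100)
```

With these taps every nonzero seed cycles through all 31 nonzero states, so two fault-free copies started from the same seed never disagree, while a flipped state bit shows up as a mismatch on the cycle it occurs.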
74-bit LFSR Implementation
[Figure: two layout views, each showing two System Functions, the Fault Monitor, the Guard Band, and an I/O buffer. One view also labels a PLB w/ 4 slices.]
Triple Modular Redundancy (TMR) Implementations in FPGAs
Traditional TMR SEU susceptibility problem
  Wire segments from a PIP can access multiple modules
  Therefore, 1 fault can destroy fault-tolerance
  Special place and route algorithms needed to avoid problem
[Figure: Module 1, Module 2, and Module 3 with a Majority Voter. A deactivated PIP separates isolated wire segments.]
TMR fault isolation with guard band regions
  Guard bands isolate module components and routing
[Figure: Module1, Module2, and Module3 separated by Guard Bands, with a Majority Voter.]
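The majority voter that TMR relies on is a standard two-out-of-three function. A minimal bitwise sketch in Python follows. The function name and the example values are illustrative, not taken from the slides.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three majority vote over three module outputs."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1010
# Module 2 has one output bit flipped; the other two modules outvote it.
voted = majority(golden, golden ^ 0b0100, golden)
```

The vote is only meaningful while faults stay confined to one module, which is why the slide stresses isolating module components and routing.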
Traditional TMR Implementation in AT94K
[Figure: layout with mixed routing of 3 different system functions.]
TMR Implementation in AT94K
[Figure: layout showing System Function A, System Function B, System Function C, and the Majority Voter Circuit.]
Fault Injection Results
AVR Fault Injection
  TMR - Pass 1: No fault injection. Majority Voter passes.
  TMR - Pass 2: Module 1 injected with fault. Majority Voter passes.
  TMR - Pass 3: Modules 1 & 3 injected with faults. Majority Voter fails.
[Figure: Module1, Module2, and Module3 separated by Guard Bands and feeding the Majority Voter, repeated for the passes with √ (pass) and × (fail) marks.]
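The three passes can be reproduced in a toy Python model. This is only a sketch of the voting logic, not the AVR-driven fault injection the authors ran: each module is reduced to a single output bit, an injected fault is assumed to invert that bit, and `run_pass` is a hypothetical helper.

```python
from typing import Iterable


def majority(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)


def run_pass(golden: int, faulty_modules: Iterable[int] = ()) -> bool:
    """Return True if the voter output still equals the fault-free value.

    Assumption: each module drives one bit, and a fault inverts it.
    """
    faulty = set(faulty_modules)
    outputs = [golden ^ 1 if module in faulty else golden for module in (1, 2, 3)]
    return majority(*outputs) == golden


results = {
    "Pass 1": run_pass(1),          # no fault injection
    "Pass 2": run_pass(1, [1]),     # Module 1 faulty
    "Pass 3": run_pass(1, [1, 3]),  # Modules 1 and 3 faulty
}
```

The model passes with zero or one faulty module and fails with two, matching the pass and fail pattern reported on the slide.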
Fault Injection Results
Guard Band:
  Injected 240 faults at edge of guard band with no failure
  Multiple specific faults required to cause failure
[Figure: guard band across the PLBs, showing repeaters, local x4 and express x8 lines, and local cross-point PIPs.]
Summary
Guard Band regions for FPGAs
  Isolate multiple working regions that contain functionally equivalent system functions
Fault monitoring circuits within guard bands
  Monitor and compare working region outputs
  Tri-state outputs when a mismatch occurs
Fail-Silent operation
  Halt operation immediately upon occurrence of a fault
  Area overhead only 2x that of non-fault-tolerant circuit
  Use with TMR to achieve fault-tolerance
Single Event Upsets (SEUs)
  Architecture provides immediate indication to initiate scrubbing of the configuration memory
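As a rough illustration of the last point, the sketch below shows what a scrubbing pass triggered by the fault indication might look like. It is a Python model under stated assumptions: configuration memory is a list of integer frames, a known-good ("golden") copy is available, and `scrub` is a hypothetical helper rather than a vendor or AVR API.

```python
from typing import List


def scrub(config: List[int], golden: List[int]) -> List[int]:
    """Rewrite every configuration frame that differs from the golden copy.

    Returns the indices of the frames that were repaired.
    """
    if len(config) != len(golden):
        raise ValueError("configuration and golden copy must be the same length")
    repaired = []
    for index, (current, expected) in enumerate(zip(config, golden)):
        if current != expected:
            config[index] = expected
            repaired.append(index)
    return repaired


golden = [0b1010, 0b0110, 0b1111]
config = list(golden)
config[1] ^= 0b0100               # emulate an SEU in frame 1
repaired = scrub(config, golden)  # would run when the monitor raises its interrupt
```

Because the fault monitor signals as soon as outputs diverge, a pass like this can be run on demand rather than on a fixed schedule.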