Post on 05-May-2022
Wind River - Germany, Andreas Buchwieser
On-chip-redundancy
On-chip redundancy according to IEC 61508, Annex E: Analysis of a combination of a Software Virtualization Layer
and a SPEAr1310 System-on-Chip (ARM Cortex-A9 Dual-Core)
International TÜV Rheinland Symposium in China, Functional Safety in Industrial Applications, 18 – 19 October 2011, Shanghai, China
Agenda
• Definitions
• Consolidation through Virtualization
• Compliant Item
• Analysis Approach
• Hardware Random Failures
• Common Cause Failures
• Outlook
• Summary
Definitions
• Virtualization: abstraction of computer resources, hiding the physical characteristics
• Hypervisor: virtualization platform that allows multiple operating systems to run on a host computer at the same time (Wikipedia)
• Virtual Board: environment for one operating system or bare application; has physical and/or virtual hardware controlled by the Hypervisor
Consolidation through Virtualization, HFT 0
[Figure: block diagram. Labels: Equipment Under Control; PE; Virtual Board 1; Virtual Board 2; COTS OS; VxWorks Cert; Application; Safe Application; Virtualization Mechanism - WR Hypervisor; Hardware.]
Safety Standard IEC 61508
• “Where the software is to implement both safety and non-safety functions, then all of the software shall be treated as safety-related, unless adequate design measures ensure that the failures of non-safety functions cannot adversely affect safety functions.”
• “... all of the software shall be treated as belonging to the highest SIL, unless adequate independence between the safety functions of the different safety integrity levels can be shown in the design. It shall be demonstrated either (1) that independence is achieved both in the spatial and temporal domain, or (2) that any violation of independence is controlled.”
[source: IEC 61508-3, Edition 2.0, Dated: 2010-04]
Consolidation through Virtualization, HFT 1
[Figure: block diagram of the dual-channel solution (the compliant item). Labels: Equipment Under Control; PE (two); Core 1; Core 2; Virtualization Mechanism - WR Hypervisor; Virtual Board 1; Virtual Board 2; Virtual Board 3; VxWorks Cert; Safe OS; COTS OS; Safe Application (two); Application.]
Safety Standard IEC 61508
• “... On-chip redundancy as used in this standard means a duplication (or triplication etc.) of functional units to establish a hardware fault tolerance greater than zero ...”
• “... A subsystem with a hardware fault tolerance greater than 0 can be realized using one single IC semi-conductor substrate (on-chip redundancy) ...”
[source: IEC 61508-2, Annex E, Edition 2.0, Dated: 2010-04]
1oo2D Architecture
• Both channels need to demand the safety function
• Diagnostic tests detect faults in either channel
• If a fault is detected in one channel, the voting is adapted so that the output follows the other channel
• Faults in both channels -> safe state
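The voting rules of this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Channel`, `Output` and `vote_1oo2d` are hypothetical and not taken from the presentation, and the sketch does not model a discrepancy between two fault-free channels, which a real voter would also have to handle.

```python
from dataclasses import dataclass
from enum import Enum


class Output(Enum):
    RUN = "run"          # normal operation, safety function not executed
    SAFE_STATE = "safe"  # safety function executed / safe state entered


@dataclass
class Channel:
    demands_safety: bool  # this channel requests the safety function
    fault_detected: bool  # this channel's diagnostics report a fault


def vote_1oo2d(ch1: Channel, ch2: Channel) -> Output:
    """Hypothetical 1oo2D voter following the four bullets of the slide."""
    # Faults in both channels: go to the safe state unconditionally.
    if ch1.fault_detected and ch2.fault_detected:
        return Output.SAFE_STATE
    # Fault in exactly one channel: voting is adapted, follow the other one.
    if ch1.fault_detected:
        return Output.SAFE_STATE if ch2.demands_safety else Output.RUN
    if ch2.fault_detected:
        return Output.SAFE_STATE if ch1.demands_safety else Output.RUN
    # No fault detected: both channels need to demand the safety function.
    if ch1.demands_safety and ch2.demands_safety:
        return Output.SAFE_STATE
    return Output.RUN


if __name__ == "__main__":
    healthy_demand = Channel(demands_safety=True, fault_detected=False)
    faulty_idle = Channel(demands_safety=False, fault_detected=True)
    # The faulty channel is ignored, so the healthy channel's demand wins.
    print(vote_1oo2d(faulty_idle, healthy_demand))
```

The order of the checks mirrors the priority implied by the bullets: diagnostics are evaluated first, and the two-channel agreement rule applies only when neither channel reports a fault.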
Safety Requirements Compliant Item
• Used for two kinds of safety functions:
  – Low-demand SIL 3
  – High-demand / continuous mode SIL 2
• Used in a safety function powered on for a long time
• Safe State
  – OS is informed about a fault
  – Voter is informed and is able to achieve or maintain a safe state
• Could execute both safety functions and non-safety functions
• Is assumed to be analyzed according to Route 1H and Route 1S
YOGITECH's fRMethodology
• YOGITECH's fRMethodology is a "white-box" approach to functional safety analysis and safety-oriented exploration of integrated circuits (IC) in compliance with IEC 61508 and ISO 26262:
  – splitting the circuit into elementary parts ("sensitive zones"); sensitive zones are extracted from the design database with automatic tools (to guarantee completeness)
  – computing their failure rates
  – using those failure rates to compute safety metrics
  – verifying the results with fault injection, covering these fault models:
     · Transient (single event upset and single event transient)
     · Permanent (stuck-at, bridging, stuck-open, stuck-on)
     · Common-cause failures (e.g. clock/PLL and reset faults)
  – allowing sensitivity analyses by changing parameters
  – delivering numbers to the customer for comparing different architectures
• YOGITECH is responsible for ISO 26262-10 Annex A, on how to apply ISO 26262 to microcontrollers
λ_SZ = f(λ_elem, C, D, F, DC)
  λ_elem = elementary failure rate for each fault model
  C = probability of failure in that sensitive zone in terms of area, number and type of gates, number of registers, type of interconnections, number of logic levels etc.
  D = dangerousness of the sensitive zone
  F = frequency of use of the sensitive zone
  DC = diagnostic coverage for the sensitive zone
Analysis Process*
[Figure: flow diagram. Starting from the safety architecture, the analysis splits into three branches. Each branch leads to an identification of safety gaps, followed by improvements.]
• HW random failures
  – Steps: criticality ranking; DC of available diagnostics; computation of metrics
  – Improvements: application measures (End Customer); PBIT/CBIT measures (WINDRIVER/End Customer); HW measures (Chip provider); evidences of low criticality (Chip provider or End Customer)
• Common-cause failures
  – Steps: Annex E of part 2
  – Improvements: HW measures (Chip provider or End Customer); layout measures (Chip provider); diversity (End Customer/WINDRIVER)
• Systematic failures
  – Steps: interference freeness; configuration; HW/SW/tools design guidelines
  – Improvements: application / SW separation (WINDRIVER/End Customer)
Source*: Yogitech fRMethodology, approved by TÜV SÜD (certificate Z10 06 11 61674 001)
Abstract View of Compliant Item Functions
[Figure: block diagram of the compliant item functions. Block labels: S; PEi; PEo; PEc; A; WDTe; DDR; COMM; Vi; Ve; VMONe; E2PROM; SR.]
Legend:
• Zone 0: Fully shared HW
• Zone 1: Separate HW
• Zone 1-: Shared HW used in time division
Allocation of SPEAr1310 modules to zones
[Figure: block diagram of the SPEAr1310 with each module assigned to one of the zone categories in the legend. The assignment of individual modules is not recoverable from the transcript.]
• On-chip modules: CPU1; CPU2; SCU; GIC; Cache Controller; L2 Cache; Power management; SMI (serial memory interface); MPMC (DDR2/3 interface); 32KB ROM (only for boot); 32KB+4KB SRAM (only for boot); Reset and clock control; Giga-Ethernet; Fast-Ethernet (1, 2, 3); OTP memory; GPT0; GPT1; GPT2,3; UART0; UART1; UART2; UART3,4,5; GPIO; I2C; RS485; ADC; RTC; THSENS; MISC
• External components: external watchdog (WDTe); external voter (Ve); VMONe; DDR; E2PROM
• Interfaces: safe comm. (input); safe comm. (output); safe inter-board comm.; unsafe comm.; unsafe inter-board comm.; unsafe analogue input; actuator; HW & VB programs and data
Legend:
• Separate HW used ONLY by unsafe VBs
• Separate HW used ONLY by safe VB channel 1
• Separate HW used ONLY by safe VB channel 2
• Separate HW used by safe channels but with some fully shared HW
• Shared HW used by safe channels in time division
• Shared HW used by safe channels in time division with some fully shared HW
• Fully shared HW (one-channel)
HW Random Failures: Allocation of Targets
HW Random Failures: Steps for quantitative Analysis
• Partition the compliant item into sub-modules
• Identify the fault models for each module and sub-module
• Estimate the failure rates for each sub-module and for each fault model, including the estimate of the amount of safe, no-effect and no-part failures according to IEC 61508 2nd edition for each sub-module
• Make a first estimate of the DC for each sub-module
• Rank the sub-modules in terms of the remaining risk of undetected failures
• Make a first computation of the safety metrics (SFF and PFD/PFH)
• Fix a new target for the DC for each sub-module in order to match the SFF targets defined during safety requirements allocation
• Define the conditions of use (CoUs) needed to match those targets
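The ranking and SFF steps above can be illustrated with a minimal Python sketch. It assumes the usual IEC 61508 definitions (λ_DD = DC · λ_D, λ_DU = (1 − DC) · λ_D, SFF = (Σλ_S + Σλ_DD) / (Σλ_S + Σλ_D)) and leaves no-effect and no-part failures out of the SFF. The sub-module names and failure rates are invented for illustration and are not figures from the analysis; the helper names are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class SubModule:
    name: str
    lambda_safe: float       # safe failure rate in FIT (illustrative value)
    lambda_dangerous: float  # dangerous failure rate in FIT (illustrative value)
    dc: float                # diagnostic coverage of dangerous failures, 0..1

    @property
    def lambda_dd(self) -> float:
        """Dangerous detected failure rate."""
        return self.lambda_dangerous * self.dc

    @property
    def lambda_du(self) -> float:
        """Dangerous undetected failure rate (the remaining risk)."""
        return self.lambda_dangerous * (1.0 - self.dc)


def rank_by_residual_risk(mods):
    """Sub-modules sorted by dangerous undetected rate, worst first."""
    return sorted(mods, key=lambda m: m.lambda_du, reverse=True)


def safe_failure_fraction(mods):
    """SFF over all sub-modules; no-effect/no-part failures are excluded."""
    safe = sum(m.lambda_safe for m in mods)
    detected = sum(m.lambda_dd for m in mods)
    total = sum(m.lambda_safe + m.lambda_dangerous for m in mods)
    return (safe + detected) / total


def uniform_dc_for_target_sff(mods, target_sff):
    """DC that, applied uniformly to every sub-module, reaches target_sff."""
    safe = sum(m.lambda_safe for m in mods)
    dangerous = sum(m.lambda_dangerous for m in mods)
    return max(0.0, (target_sff * (safe + dangerous) - safe) / dangerous)


if __name__ == "__main__":
    # Hypothetical sub-modules with invented failure rates.
    mods = [
        SubModule("cpu_core", 50.0, 50.0, 0.90),
        SubModule("bus_interconnect", 20.0, 20.0, 0.60),
        SubModule("sram", 30.0, 30.0, 0.99),
    ]
    for m in rank_by_residual_risk(mods):
        print(f"{m.name}: lambda_DU = {m.lambda_du:.2f} FIT")
    print(f"SFF = {safe_failure_fraction(mods):.4f}")
    print(f"uniform DC for SFF 0.99 = {uniform_dc_for_target_sff(mods, 0.99):.2f}")
```

In practice the new DC targets would be set per sub-module rather than uniformly, concentrating effort on the sub-modules at the top of the ranking; the uniform value only gives a first feel for how far the current diagnostics are from the SFF target.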
HW Random Failures: Diagnostic Measures in the Compliant Item
DM3, DM4: Internal and External Watchdog
Common Cause Failures: Critical Requirements IEC 61508-2, Annex E, E.1 and E.2
Common Cause Failures affecting redundant channels in MCUs
• Sleeping Faults
• Clock Faults
• Power Faults
• Temperature Faults (hot spots)
• Timing Faults
• Checker Faults: faults in the software checker or in the comparator of the dual core
• Cascading Faults: class of dependent failures
Common Cause Failures: Application of the βIC table, IEC 61508-2 Annex E.3, Zone 1
Summary
• Diversity required
  – Diversity at application level between the channels
  – Diversity at Hypervisor level between the channels
• CCF
  – More details on the MCU structure required, or
  – Specific structure of the external watchdog needed
• Complexity of SW tests
  – Need to reach high coverage for the MCU and the bus interconnect
  – Not possible to reach with simple SW based on the MCU instruction manual
  – Approach needs
     · Software tests with an MCU-aware approach
     · Verification strategy (fault injection)
• Safety Manual Quality
  – Guidelines for SW tests solid and verified upfront
  – Proper verification strategy
  – End customer will reach the claimed coverage by following the guidelines
Outlook
[Figure: project roadmap across Phase A1.0, Phase A1 and Phase A2/A3, with a go/no-go decision and an alternative path labelled "Skip application diversity".]
• Initial analysis of SPEAr1310-based logic solver
• Detailed specification of what has to be covered by the diagnostic measures (BIT, WDT-FPGA, etc.)
• Product detailed specification & implementation
• Verification of product implementation (V&V of BIT, WDT-FPGA etc. coverage)
• Verification of CCF (e.g. analysis of SPEAr1310 layout)
• Collection of evidences
• Product certification