Dependability Benchmarking of SoC-embedded Control Systems
Juan-Carlos Ruiz
Technical University of Valencia, Spain
3rd SIGDeB Workshop on Dependability Benchmarking
ISSRE 2005, Chicago, Illinois, USA, November 2005
Outline
• Benchmarking Context
• Benchmark Specification
• Benchmark Prototype
• Ongoing Work
SoC-embedded Automotive Control Applications
• Entertainment & Communications
• Body & Chassis
• Powertrain: electronically controlled automatic transmission, electronic combustion, per-cylinder knock, electronic valve timing, electronic fuel injection, electronic ignition
These functions are implemented by Electronic Control Units (ECUs).
Any engine ECU handles (at least):
• Fuel injection
  - Angle: when is the fuel injected?
  - Timing: how long is the injection?
• Air management
  - Flow shape: how does the air enter the cylinder?
  - Air mass: how much air enters the cylinder?
Engine ECU model
The Electronic Control Unit combines µController-/DSP-based hardware, control software and, optionally, an RTOS. Its control loops read the throttle and the engine internal variables through sensors, and their outputs drive the actuators: air mass and flow shape (air management), timing and angle (fuel injection).
Benchmark Specification
Powertrain System Failure Modes
• Unpredictable (e.g. the engine stops, the engine is damaged)
• Predictable
  - Noise & vibrations
  - Non-optimal behavior
Measures for powertrain system integrators (I)
ECU failure modes:
• No new data (stuck-at failure)
• Delayed data (missed deadline)
• Output control deviation
  - Close to nominal
  - Far from nominal
ECU control outputs:
• Fuel injection: angle, timing
• Air management: volume, flow shape
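As a minimal sketch of how one observed control-output sample could be mapped onto the ECU failure modes listed above, the Python function below compares it with its golden-run value. The deadline check, the absolute `tolerance` separating "close to" from "far from" nominal, and all names are assumptions for illustration, not part of the benchmark specification.

```python
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    NO_FAILURE = "no failure"
    NO_NEW_DATA = "no new data (stuck-at failure)"
    DELAYED_DATA = "delayed data (missed deadline)"
    CLOSE_TO_NOMINAL = "output control deviation, close to nominal"
    FAR_FROM_NOMINAL = "output control deviation, far from nominal"


def classify_output(golden: float, observed: Optional[float],
                    delivered_at_ms: Optional[float], deadline_ms: float,
                    tolerance: float) -> FailureMode:
    """Classify one ECU control-output sample against its golden-run value.

    observed / delivered_at_ms are None when the ECU produced no new value.
    tolerance is an assumed bound separating 'close to' from 'far from' nominal.
    """
    if observed is None or delivered_at_ms is None:
        return FailureMode.NO_NEW_DATA
    if delivered_at_ms > deadline_ms:
        return FailureMode.DELAYED_DATA      # time failure checked before value
    deviation = abs(observed - golden)
    if deviation == 0.0:
        return FailureMode.NO_FAILURE        # identical to the golden run
    if deviation <= tolerance:
        return FailureMode.CLOSE_TO_NOMINAL
    return FailureMode.FAR_FROM_NOMINAL
```

Checking the deadline before the value is a design choice of this sketch: a late output is reported as a time failure whatever its value.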
Measures for powertrain system integrators (II)
Unsafety level induced by each ECU failure mode on each ECU control output (1):
                            Fuel injection                        Air management
Failure mode                Angle            Timing               Air mass             Flow shape
Engine ECU reset            unpredictable
Value: no new data          unpredictable    non-optimal          noise & vibration    noise & vibration
Value: close to nominal     non-optimal      noise & vibration    non-optimal          non-optimal
Value: far from nominal     unpredictable    non-optimal          noise & vibration    noise & vibration
Time: delayed data          non-optimal      noise & vibration    non-optimal          non-optimal
Unsafety levels range from + (unpredictable engine behavior) to - (predictable engine behavior: noise & vibration, non-optimal).
Each cell is measured as {number of failures of this type in the considered output / total number of experiments}.
Note: (1) The table refers to a PSA DW12 diesel engine.
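The per-cell measure can be sketched in a few lines of Python. The unsafety levels below are transcribed from the PSA DW12 table (the ECU-reset row is left out); the helper names and the (failure mode, output) input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

OUTPUTS = ("angle", "timing", "air mass", "flow shape")
# Rows of the PSA DW12 table (ECU-reset row omitted); columns follow OUTPUTS.
TABLE = {
    "no new data":      ("unpredictable", "non-optimal", "noise & vibration", "noise & vibration"),
    "close to nominal": ("non-optimal", "noise & vibration", "non-optimal", "non-optimal"),
    "far from nominal": ("unpredictable", "non-optimal", "noise & vibration", "noise & vibration"),
    "delayed data":     ("non-optimal", "noise & vibration", "non-optimal", "non-optimal"),
}
UNSAFETY = {(mode, out): level
            for mode, row in TABLE.items()
            for out, level in zip(OUTPUTS, row)}

Cell = Tuple[str, str]  # (failure mode, control output)


def cell_measures(failures: Iterable[Cell],
                  total_experiments: int) -> Dict[Cell, Tuple[float, str]]:
    """Failures of this type in this output / total experiments, plus its unsafety level."""
    counts = Counter(failures)
    return {cell: (n / total_experiments, UNSAFETY[cell]) for cell, n in counts.items()}


def ratio_by_unsafety(failures: Iterable[Cell], total_experiments: int) -> Dict[str, float]:
    """Sum the cell measures per unsafety level."""
    totals: Dict[str, float] = {}
    for ratio, level in cell_measures(failures, total_experiments).values():
        totals[level] = totals.get(level, 0.0) + ratio
    return totals
```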
System Under Benchmarking (SUB) & Dependability Benchmark Target (DBT)
The DBT is the engine ECU: µController-/DSP-based hardware with an RTOS running the air management (volume, flow shape) and fuel injection (timing, angle) control loops. The SUB comprises the DBT together with its sensors and actuators, an engine model and a throttle model. The engine model supplies the engine internal variables and receives the outputs of the ECU control loops through the actuators; the throttle model supplies the throttle input.
For economical and safety reasons, the throttle and the engine are replaced by two models. These models must be customised according to each specific engine.
Tools supporting the definition of engine models
If available, a synthetic model of the engine can be used. Otherwise, the model can be obtained from the real engine:
1. Run the workload.
2. Trace the engine behaviour.
3. The resulting traces define a model of the engine behaviour in the absence of faults.
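The three steps above can be sketched as a trace-replay model in Python: samples recorded from the real engine during a fault-free run are looked up by time. The class, its methods and the sample format are hypothetical. The replay is open-loop (it ignores the ECU outputs), which matches the stated scope of a model of the engine behaviour in the absence of faults.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TraceEngineModel:
    """Hypothetical fault-free engine model built from traces of the real engine."""
    times_ms: List[float] = field(default_factory=list)
    samples: List[Dict[str, float]] = field(default_factory=list)

    def record(self, time_ms: float, variables: Dict[str, float]) -> None:
        # Steps 1-2: while the workload runs, trace the engine internal variables.
        if self.times_ms and time_ms < self.times_ms[-1]:
            raise ValueError("samples must be recorded in time order")
        self.times_ms.append(time_ms)
        self.samples.append(dict(variables))

    def variables_at(self, time_ms: float) -> Dict[str, float]:
        # Step 3: replay the trace, returning the latest sample at or before time_ms.
        i = bisect.bisect_right(self.times_ms, time_ms) - 1
        if i < 0:
            raise ValueError("no sample recorded at or before this time")
        return self.samples[i]
```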
Workload
Workload = Engine internal variables + Throttle
• Engine internal variables are generated by the engine.
• Throttle inputs are computed according to one of the following driving cycles:
  - Acceleration-Deceleration cycle
  - Urban Driving Cycle
  - Extra-Urban Driving Cycle
  The urban and extra-urban cycles are those used for the emission certification of light duty vehicles in Europe (EEC Directive 90/C81/01).
Workload detail
• Urban Driving Cycle: average speed 18.35 km/h, time 13.00 min, distance 3976.11 m, maximum speed 50.00 km/h
• Extra-Urban Driving Cycle: average speed 61.40 km/h, time 6.67 min, distance 6822.22 m, maximum speed 120.00 km/h
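As a quick arithmetic check of the workload figures, average speed is distance over duration, converted from m/s to km/h by a factor of 3.6. The extra-urban average of 61.40 km/h is reproduced by assuming the 6.67 min stands for 400 s rounded; plugging 6.67 min in literally gives about 61.37 km/h. The function name is hypothetical.

```python
def average_speed_kmh(distance_m: float, duration_s: float) -> float:
    """Average speed in km/h: (m / s) * 3.6."""
    return distance_m / duration_s * 3.6


URBAN = average_speed_kmh(3976.11, 13.00 * 60)   # 13.00 min = 780 s
EXTRA_URBAN = average_speed_kmh(6822.22, 400.0)  # assumption: 6.67 min = 400 s rounded

print(round(URBAN, 2), round(EXTRA_URBAN, 2))
```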
Faultload
“About 80% of hardware faults are transient, and as VLSI implementation use smaller geometries and lower power levels, the importance of transients increases” [Cunha DSN2002] [Somani Computer1997] [DBench ETIE2 report]
Transient physical (hardware) faults affecting the ECU memory that are software-emulated using the single bit-flip fault model
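A software-emulated single bit-flip is an XOR of one memory byte with a one-hot mask. The Python sketch below draws one such fault at random (address, bit, injection instant) and applies it to a `bytearray` standing in for the ECU memory; the uniform random choice and all names are assumptions, not the benchmark's fault-selection policy.

```python
import random
from typing import NamedTuple


class BitFlipFault(NamedTuple):
    address: int    # byte offset in the emulated ECU memory
    bit: int        # 0 (LSB) .. 7 (MSB)
    time_ms: float  # injection instant within the workload


def random_bit_flip_fault(rng: random.Random, memory_size: int,
                          window_ms: float) -> BitFlipFault:
    # Assumed policy: location, bit and instant drawn uniformly at random.
    return BitFlipFault(rng.randrange(memory_size), rng.randrange(8),
                        rng.uniform(0.0, window_ms))


def inject(memory: bytearray, fault: BitFlipFault) -> None:
    # XOR with a one-hot mask inverts exactly one bit of the target byte.
    memory[fault.address] ^= 1 << fault.bit
```

Injecting the same fault twice restores the original byte, since XOR is its own inverse.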
Benchmark Conduct
A golden run is followed by a sequence of experiments (Experiment 1, Experiment 2, …, Experiment N). Each experiment goes through start-up, fault injection, ECU activity and observation.
• Event-oriented measures: fault injection, fault activation (error), error detection, failure
• Time-oriented measures: error detection latency, observation time
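The event- and time-oriented measures of a single experiment can be sketched from the timestamps of its events. In this Python sketch, error detection latency is assumed to run from fault activation to error detection, and observation time from injection to the end of observation; the record layout and names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExperimentTimeline:
    """Hypothetical per-experiment record; None means the event was not observed."""
    injection_ms: float
    activation_ms: Optional[float]
    detection_ms: Optional[float]
    failure_ms: Optional[float]
    end_of_observation_ms: float


def detection_latency_ms(t: ExperimentTimeline) -> Optional[float]:
    # Assumed definition: from fault activation (error) to error detection.
    if t.activation_ms is None or t.detection_ms is None:
        return None
    return t.detection_ms - t.activation_ms


def observation_time_ms(t: ExperimentTimeline) -> float:
    # Time during which ECU activity is observed after the fault is injected.
    return t.end_of_observation_ms - t.injection_ms


def summarize(timelines: List[ExperimentTimeline]) -> dict:
    latencies = [l for l in map(detection_latency_ms, timelines) if l is not None]
    return {
        "experiments": len(timelines),
        "errors": sum(t.activation_ms is not None for t in timelines),
        "detected": sum(t.detection_ms is not None for t in timelines),
        "failures": sum(t.failure_ms is not None for t in timelines),
        "min_latency_ms": min(latencies, default=None),
        "max_latency_ms": max(latencies, default=None),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
    }
```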
Technical considerations
• Engine ECUs are typically manufactured as SoCs.
• Control software is stored and executes inside a chip (an observability and controllability issue).
• Running faultloads without introducing spatial and temporal intrusion is a challenge.
Our advice:
• Exploit the on-chip debugging (OCD) features currently supplied by most automotive embedded microcontrollers: on-the-fly memory access, and program and data tracing facilities.
• To increase portability, select standard, processor-independent OCD mechanisms, like the ones defined by Nexus.
Benchmark Prototype
Benchmark Prototype
The ECU software runs on an MPC565 evaluation board. A Nexus adapter connects the board to an in-circuit debugger for Nexus, which is linked to DBench-ECU through a USB link.
Benchmark Analysis
1. Raw measures, collected by probes:
   - Internals: memory positions, error handlers
   - Externals: periodic outputs, non-periodic outputs
2. Events:
   - Internals: error activation, error detection
   - Externals: failures (no failure, no new data, missed deadline, far from nominal, close to nominal)
3. Dependability measures:
   - Internals: error latencies, error coverage & distribution
   - Externals: failures inducing a predictable engine behavior (non-optimal, vibrations) or an unpredictable engine behavior
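Step 3 of the analysis can be sketched as an aggregation over per-experiment events: error coverage (detected errors over activated errors), the distribution of detected errors by detection mechanism, and failure ratios per engine-behaviour class. The event tuple and function name are hypothetical; mechanism labels such as "SEE" or "MCE" are used only as example strings.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class ExperimentEvents(NamedTuple):
    error: bool               # was the injected fault activated?
    mechanism: Optional[str]  # detecting mechanism, None if the error went undetected
    impact: Optional[str]     # "unpredictable", "noise & vibrations", "non-optimal"; None if no failure


def dependability_measures(events: Iterable[ExperimentEvents]) -> dict:
    events = list(events)
    errors = [e for e in events if e.error]
    detected = [e for e in errors if e.mechanism is not None]
    by_mechanism = Counter(e.mechanism for e in detected)
    by_impact = Counter(e.impact for e in events if e.impact is not None)
    n = len(events)
    return {
        "error_coverage": len(detected) / len(errors) if errors else None,
        "mechanism_distribution": {m: k / len(detected) for m, k in by_mechanism.items()},
        "failure_ratio_by_impact": {i: k / n for i, k in by_impact.items()} if n else {},
    }
```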
Case Study: Diesel ECUs (DECUs)
ECU version 1:
• Implemented on an RTOS called μC/OS-II
• Each control output is computed in a different OS task
• Tasks use the semaphores & waiting facilities of the OS
• The OS scheduling policy is rate-monotonic
ECU version 2:
• Implemented without an OS
• Each control output is computed in a different program procedure
• The main program schedules the execution of each program procedure
• The scheduling policy is computed off-line
Engine ECU control loops
• Inputs from sensors: intake air pressure, common rail pressure, crankshaft angle, camshaft angle, throttle position (reference speed), current engine speed (in rpm)
• Control loops (period): fuel timing (20 ms), fuel angle (50 ms), air shape (20 ms), air volume (500 ms)
• Outputs to actuators: common rail compressor discharge valve, swirl valve, waste gate valve, Injector1 … InjectorN
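For the RTOS version, rate-monotonic scheduling gives higher priority to shorter periods. The Python sketch below derives priorities from the four control-loop periods and applies the Liu & Layland utilization test, a sufficient (not necessary) condition that assumes independent periodic tasks with deadlines equal to periods; it ignores the semaphore blocking the tasks actually use. The execution times in the usage example are hypothetical.

```python
from typing import Dict, Tuple

# Control-loop periods of the diesel ECU, in ms.
PERIODS_MS = {"fuel_timing": 20, "fuel_angle": 50, "air_shape": 20, "air_volume": 500}


def rm_priorities(periods_ms: Dict[str, float]) -> Dict[str, int]:
    """Rate-monotonic: the shorter the period, the higher the priority (0 = highest).
    Ties keep their insertion order because sorted() is stable."""
    order = sorted(periods_ms, key=periods_ms.__getitem__)
    return {task: prio for prio, task in enumerate(order)}


def liu_layland_bound(n: int) -> float:
    """Utilization bound n(2^(1/n) - 1) for n independent periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def rm_utilization_test(periods_ms: Dict[str, float],
                        wcet_ms: Dict[str, float]) -> Tuple[float, bool]:
    """Sufficient RM schedulability test; assumes deadlines equal periods."""
    u = sum(wcet_ms[t] / periods_ms[t] for t in periods_ms)
    return u, u <= liu_layland_bound(len(periods_ms))


# Hypothetical worst-case execution times (ms), for illustration only.
u, ok = rm_utilization_test(PERIODS_MS, {"fuel_timing": 2, "fuel_angle": 5,
                                         "air_shape": 2, "air_volume": 20})
```

With these made-up execution times the utilization is 0.34, below the four-task bound of about 0.757. ECU version 2 avoids this analysis by fixing its schedule off-line.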
Some Results
(Results obtained from a 5-day benchmark execution, 300 experiments per driving cycle)
Driving cycle              ECU                 Failure ratio   Unpredictable   Noise & vibrations   Non-optimal
Acc.-Dec. Cycle            DECU with RTOS      1.775 %         0.598 %         0.539 %              0.638 %
Acc.-Dec. Cycle            DECU without RTOS   10.659 %        2.52 %          3.517 %              4.622 %
Urban Driving Cycle        DECU with RTOS      5.76 %          1.35 %          2.48 %               1.93 %
Urban Driving Cycle        DECU without RTOS   2.38 %          0.34 %          1.36 %               0.68 %
Extra-Urban Cycle          DECU with RTOS      5.10 %          2.72 %          0.00 %               2.38 %
Extra-Urban Cycle          DECU without RTOS   5.76 %          1.28 %          1.6 %                2.88 %
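In every row of the results, the failure ratio equals the sum of its three impact classes. The short Python check below encodes the table (values in %) and reports any row where that breakdown does not add up; the tolerance and names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# (failure ratio, unpredictable, noise & vibrations, non-optimal), in %
RESULTS: Dict[Tuple[str, str], Tuple[float, float, float, float]] = {
    ("Acc.-Dec. Cycle", "with RTOS"): (1.775, 0.598, 0.539, 0.638),
    ("Acc.-Dec. Cycle", "without RTOS"): (10.659, 2.52, 3.517, 4.622),
    ("Urban Driving Cycle", "with RTOS"): (5.76, 1.35, 2.48, 1.93),
    ("Urban Driving Cycle", "without RTOS"): (2.38, 0.34, 1.36, 0.68),
    ("Extra-Urban Cycle", "with RTOS"): (5.10, 2.72, 0.00, 2.38),
    ("Extra-Urban Cycle", "without RTOS"): (5.76, 1.28, 1.6, 2.88),
}


def breakdown_is_consistent(total: float, *parts: float, tol: float = 0.001) -> bool:
    # A failure ratio should equal the sum of its impact classes.
    return abs(total - sum(parts)) <= tol


def mismatches(results) -> List[Tuple[str, str]]:
    return [key for key, (total, *parts) in results.items()
            if not breakdown_is_consistent(total, *parts)]
```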
Practical considerations
• The observation time after fault injection is limited by the trace memory of the debugger connected to the debugging ports.
• The number of probes that can be connected to a debugging port is limited. Thus, obtaining the benchmark measures may require running the same golden run or experiment several times.
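The consequence of the probe limit can be sketched as a simple batching plan: with at most k probes per run, observing p probes needs ceil(p / k) executions of the same golden run or experiment. Merging the resulting traces assumes the runs are reproducible (same workload, same injection trigger). The function and probe names are hypothetical.

```python
from typing import List, Sequence


def plan_probe_runs(probes: Sequence[str], probes_per_run: int) -> List[List[str]]:
    """Split the probes into batches; each batch costs one more execution
    of the same golden run or experiment."""
    if probes_per_run < 1:
        raise ValueError("at least one probe per run is required")
    return [list(probes[i:i + probes_per_run])
            for i in range(0, len(probes), probes_per_run)]


runs = plan_probe_runs(["fuel_timing", "fuel_angle", "air_shape",
                        "air_volume", "error_handlers"], 2)
```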
Ongoing Work
Current Working Context
• ARTEMIS workshop, June-July 2005, Paris: increasing interest of the industrial community in the use of SW components in (SoC-)embedded systems
• Need for benchmarking other types of components in control systems (RTOS, middleware, etc.)
• To what extent can what we know be applied to this type of research?
Ongoing Research
• SoC systems are compositions of components
• Component = Interface + Implementation
• Parameter corruption techniques are of major interest to evaluate component robustness
• New technique for parameter corruption in SoCs using OCD mechanisms [PRDC11 (to appear)]
The key issue here is not to reinvent the wheel but rather to explore to what extent existing techniques can be applied to SoCs.
Thanks for your attention!
Any questions, comments or suggestions?
Benchmark measures
Failure modes in control outputs:
• Time failures (out-of-time control delivery)
• Value failures (no new value, value within tolerable bounds, value out of tolerable bounds)
Impact of failures on the system and users (unsafety levels):
• Without consequences
• With consequences, but non-catastrophic
• With catastrophic consequences
Benchmark performers must correlate, for each control output, failure modes with their impact on the system and users.
Experimental Set-up
• System Under Benchmarking (SUB): COTS software components (potential benchmark targets)
• Workload Controller: exercises the SoC-embedded components through the workload interface
• Faultload Controller: runs the fault injection process through the faultload interface
• SUB Monitor: monitors the benchmark target activity through the monitoring interface
• Experiment Repository: stores the experimental measurements
• Experiment Analyzer: derives the dependability measures from the stored measurements
• Benchmark Manager: coordinates the workload controller, the faultload controller and the SUB monitor
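A skeleton of this set-up can be sketched in Python, with each controller stood in for by a callable: the manager runs a golden run and then one experiment per fault, storing the monitored measurements in a repository that the analyzer compares against the golden run. Everything here is hypothetical, and the sequential calls simplify what in practice is an injection during ongoing workload activity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class BenchmarkManager:
    """Hypothetical orchestration; each callable stands for one controller."""
    start_workload: Callable[[], None]          # Workload Controller (workload interface)
    inject_fault: Callable[[Any], None]         # Faultload Controller (faultload interface)
    read_monitor: Callable[[], Dict[str, Any]]  # SUB Monitor (monitoring interface)
    repository: List[Dict[str, Any]] = field(default_factory=list)  # Experiment Repository

    def run_experiment(self, fault: Optional[Any]) -> Dict[str, Any]:
        self.start_workload()
        if fault is not None:
            self.inject_fault(fault)
        measurements = self.read_monitor()
        self.repository.append({"fault": fault, "measurements": measurements})
        return measurements

    def run_campaign(self, faults: List[Any]) -> List[Dict[str, Any]]:
        self.run_experiment(None)  # golden run: no fault injected
        for fault in faults:
            self.run_experiment(fault)
        return self.repository


def experiment_analyzer(repository: List[Dict[str, Any]]) -> float:
    """Fraction of FI experiments whose measurements deviate from the golden run."""
    golden, runs = repository[0]["measurements"], repository[1:]
    return sum(r["measurements"] != golden for r in runs) / len(runs) if runs else 0.0
```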
Results
(3000 experiments; SW configuration: RTOS (μC/OS-II); workload: Acceleration-Deceleration cycle)
• Error activation: error 31 %, no error 69 %
• Detected errors 26.7 %, non-detected errors 73.3 %; failure ratio 40 %
• Detected errors by detection mechanism: SEE 38 %, MCE 31 %, FPASE 13 %, ALE 6 %, CHSTP 6 %, OTHER 4 %, FPUVE 2 %, SYSE 0 %, DTLBER 0 %, ITLBER 0 %
Fault Injection Procedure
FI experiment setup:
1. Start the SoC: power-up reset; the SoC software is loaded in memory.
2. Set a watchpoint (spatial trigger).
3. Set an external timer, e.g. in a PC (temporal trigger).
4. The SoC software starts execution.
Fault injection process:
5. Wait until a watchpoint message is received (spatial trigger).
6. Wait until the timer expires (temporal trigger).
7. Read memory, bit-flip, write fault; end.
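The procedure above can be sketched against a simulated target in Python. `SimulatedSoC` is a hypothetical stand-in for a target reached through OCD, not a Nexus or debugger API. The external timer is modelled, as an assumption, as an absolute injection instant counted from the start of execution, so the fault is injected at the first step where the watchpoint has fired and the timer has expired.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimulatedSoC:
    """Hypothetical stand-in for an OCD-connected target: memory plus a clock."""
    memory: bytearray
    time_ms: float = 0.0
    watchpoint: Optional[int] = None
    watchpoint_hit: bool = False

    def access(self, address: int) -> None:
        # A watchpoint message is raised when the watched address is accessed.
        if address == self.watchpoint:
            self.watchpoint_hit = True


def run_fi_experiment(soc: SimulatedSoC, program_step: Callable[[SimulatedSoC], None],
                      watch_address: int, inject_at_ms: float, target_address: int,
                      mask: int, max_steps: int = 100_000) -> Optional[float]:
    soc.watchpoint = watch_address              # setup: spatial trigger
    for _ in range(max_steps):
        program_step(soc)                       # SoC software executes
        if not soc.watchpoint_hit:              # watchpoint message? No: keep running
            continue
        if soc.time_ms < inject_at_ms:          # timer expired? No: keep running
            continue
        value = soc.memory[target_address]      # 1. read memory
        faulty = value ^ mask                   # 2. bit-flip
        soc.memory[target_address] = faulty     # 3. write fault
        return soc.time_ms
    return None                                 # triggers never fired
```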
Hardware fault models
• Transient faults (single & multiple bit-flip), e.g. at memory location 0x00014B04:
  1. Read memory (e.g. bit7 1000.0011 bit0)
  2. Bit-flip = memory XOR mask (e.g. mask: bit7 0011.1000 bit0)
  3. Write fault (e.g. bit7 1011.1011 bit0)
• Permanent faults (stuck-at model): continuous monitoring of the location where the fault must be introduced
  - stuck-at “1” = memory OR mask (mask bits set to 1 at the positions forced to “1”)
  - stuck-at “0” = memory AND mask (mask bits set to 0 at the positions forced to “0”)
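The mask operations of both fault models can be sketched in Python on single bytes. The bit-flip example reproduces the one above (1000.0011 XOR 0011.1000 = 1011.1011). For permanent faults, `enforce_permanent_fault` is a hypothetical helper meant to be re-applied after every write to the monitored location, which is what continuous monitoring amounts to.

```python
def bit_flip(value: int, mask: int) -> int:
    """Transient fault: invert the bits set in mask (memory XOR mask)."""
    return value ^ mask


def stuck_at_1(value: int, or_mask: int) -> int:
    """Force to 1 the bits set in or_mask (memory OR mask)."""
    return value | or_mask


def stuck_at_0(value: int, and_mask: int) -> int:
    """Force to 0 the bits cleared in and_mask (memory AND mask)."""
    return value & and_mask


def enforce_permanent_fault(memory: bytearray, address: int,
                            or_mask: int = 0x00, and_mask: int = 0xFF) -> None:
    # Re-impose the stuck bits; call after each write to the monitored location.
    memory[address] = stuck_at_0(stuck_at_1(memory[address], or_mask), and_mask)
```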
Technical considerations
• The number of probes that can be connected to a debugging port is limited. Thus, studying the system activity in the presence of faults may require running a fault injection experiment several times.
• The observation time after a fault injection is limited by the trace memory of the components connected to the debugging ports.
FI campaign
Each FI experiment injects one fault. Each experiment is paired with a golden run and executed more than once (Golden run, Experiment 1, Experiment 1´; Golden run, Experiment 2, Experiment 2´; …; Golden run, Experiment N, Experiment N´), and the resulting traces are stored in the trace repository.
Each experiment covers fault injection, fault activation (error), error detection and failure.
INERTE: Integrated NExus-based Real-Time fault injection tool for Embedded systems
INERTE consists of an Experiment Generator Module, a Fault Injector and an Analysis Tool:
• For each fault injection campaign, the Experiment Generator Module produces a configuration file.
• For each fault injection experiment, the Fault Injector produces a golden run trace and an FI trace.
• The Analysis Tool processes these traces into an FI campaign report.
Experiment Generator Module
Configuration files (where & when faults are injected):
521 200000.ms
1 ROD 0x00015FB4 0x40 88265.ms OSUnMapTbl
2 ROD 0x00015F6D 0x10 70262.ms OSUnMapTbl
3 ROD 0x00015FB5 0x40 103116.ms OSUnMapTbl
4 COD 0x00014A85 0x02 57053.ms ConvertirDatosInyeccion
5 COD 0x00014A21 0x80 129717.ms ConvertirDatosInyeccion
6 COD 0x00014B04 0x01 115127.ms ConvertirDatosInyeccion
7 RWD 0x00070B25 0x10 77078.ms ConsignaPresionRail
8 RWD 0x00070B46 0x10 97479.ms ConsignaPresionRail
9 COD 0x00014419 0x02 139488.ms Interp2d
10 COD 0x00014138 0x20 79351.ms Interp2d
11 COD 0x000143F6 0x08 85503.ms Interp2d
12 COD 0x0001457A 0x40 59389.ms Interp2d
13 COD 0x000141D7 0x01 96898.ms Interp2d
14 COD 0x0001416B 0x01 146757.ms Interp2d
15 COD 0x000143C7 0x08 58150.ms Interp2d
16 COD 0x000141C4 0x20 128517.ms Interp2d
17 COD 0x00013FAA 0x80 76006.ms Interp2d
18 COD 0x000140BA 0x04 61788.ms Interp2d
19 COD 0x000140FF 0x08 136874.ms Interp2d
20 COD 0x0001427C 0x08 97722.ms Interp2d
…
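In each entry, the memory address and mask give where the fault is injected, and the time gives when (e.g. entry 1: address 0x00015FB4, mask 0x40, at 88265.ms).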
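A parser for such a configuration file can be sketched in Python with regular expressions. The field meanings are inferred from the listing and its where/when annotation. ROD/COD/RWD are kept as opaque tags; they plausibly denote read-only data, code and read-write data, but that is an assumption. The two header fields are parsed but not interpreted, and every entry is assumed to fit on one line. `is_single_bit` reflects that multi-bit flips are not considered.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

HEADER_RE = re.compile(r"^(\d+)\s+(\d+(?:\.\d+)?)\.?ms$")
ENTRY_RE = re.compile(
    r"^(\d+)\s+(ROD|COD|RWD)\s+0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s+"
    r"(\d+(?:\.\d+)?)\.?ms(?:\s+(\S+))?$"
)


@dataclass(frozen=True)
class FaultSpec:
    index: int
    area: str       # ROD / COD / RWD, kept verbatim
    address: int    # where: memory location
    mask: int       # where: bits to flip
    time_ms: float  # when: injection instant
    symbol: str     # program symbol owning the location ("" if absent)

    def is_single_bit(self) -> bool:
        return self.mask != 0 and self.mask & (self.mask - 1) == 0


def parse_config(text: str) -> Tuple[Tuple[int, float], List[FaultSpec]]:
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and ln not in ("…", "...")]  # skip elisions
    m = HEADER_RE.match(lines[0])
    if m is None:
        raise ValueError(f"unexpected header: {lines[0]!r}")
    header = (int(m.group(1)), float(m.group(2)))
    faults = []
    for ln in lines[1:]:
        e = ENTRY_RE.match(ln)
        if e is None:
            raise ValueError(f"unexpected entry: {ln!r}")
        faults.append(FaultSpec(int(e.group(1)), e.group(2), int(e.group(3), 16),
                                int(e.group(4), 16), float(e.group(5)), e.group(6) or ""))
    return header, faults
```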
Fault Injector
• Built on a commercial Nexus debugging tool from Lauterbach®, with access to the SoC application inputs & outputs, the SoC application tasks and the SoC internal registers
• Golden run processing and fault injection processing
• Fault injection script written in PRACTICE
• For the time being, multi-bit flips are not considered
Analysis Tool
Trace repository (excerpt of a trace listing):
B::Trace.List (-50000.)--(0.) address data ti.back mark.mark
record        address       d.l        ti.back     mark
-0000001128   D:00070BB8    00000000               -----
-0000001127   D:00070BBC    00000000   0.540us     -----
-0000001126   D:00070BC0    00000000   0.700us     -----
-0000001125   D:00070BC4    00000000   0.700us     -----
-0000001124   D:00070BC8    00000000   0.960us     -----
-0000001123   D:00070BCC    00000000   1.040us     -----
-0000001122   D:00070BD0    00000000   0.700us     -----
-0000001121   D:00070BD4    00000000   0.700us     -----
-0000001120   D:00070BB8    000003E8   1.026s      -----
-0000001119   D:00070BBA    00000014   1.760us     -----
-0000001118   D:00070BBC    0000000F   1.740us     -----
-0000001117                            239.200us   A---…
Fault activation vs non-activation: error 1173, no error 1786
Error syndrome:
• Detected errors: 431 (failure before error detection: 15)
• Non-detected errors: 742
  - Errors not provoking a failure: 454
  - Errors leading to failure: 288
Failures: data close to expected output 116, data far from expected output 172
Error detection mechanism: IBRK 0, LBRK 0, DTLBER 0, ITLBER 0, SEE 234, FPASE 26, SYSE 11, FPUVE 11, ALE 25, MCE 97, CHSTP 31, OTHER 5
Error detection latency: min 0.000008620, max 0.002321500, avg 0.000097840
Analysis completed: 3000 experiments analyzed; 41 dropped, 11 of them due to multi-bit flips.
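The counts in this report can be cross-checked and turned into ratios with a few lines of Python. Reading the 454/288 split as applying to the 742 non-detected errors, as the sums suggest, is an assumption, as are the key and function names. With the values above, about 39.6 % of the 2959 retained experiments activate an error, and about 36.7 % of the activated errors are detected.

```python
SUMMARY = {
    "experiments": 3000, "dropped": 41,
    "error": 1173, "no_error": 1786,
    "detected": 431, "not_detected": 742,
    "not_failing": 454, "failing": 288,
    "close": 116, "far": 172,
}


def consistency_checks(s: dict) -> dict:
    analyzed = s["experiments"] - s["dropped"]
    return {
        "activation": s["error"] + s["no_error"] == analyzed,
        "syndrome": s["detected"] + s["not_detected"] == s["error"],
        "undetected split": s["not_failing"] + s["failing"] == s["not_detected"],
        "failure split": s["close"] + s["far"] == s["failing"],
    }


def derived_ratios(s: dict) -> dict:
    analyzed = s["experiments"] - s["dropped"]
    return {
        "activation_ratio": s["error"] / analyzed,
        "detection_coverage": s["detected"] / s["error"],
        "undetected_failure_ratio": s["failing"] / s["not_detected"],
    }
```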
Anatomy of a SoC-based control system
A SoC is a chip-embedded computer: a controller or DSP whose internal memory holds an RTOS component, accessed through the RTOS interface, and a control component made of tasks (Task1 … TaskN). Sensor readings enter the SoC as inputs, and the control outputs it produces drive the actuators.