Lecture RS SWQ 4-6, 09.09.16
Reliability Analysis in Bayesian Networks
ROSAS Center Fribourg, SAFETY DAY Conference, Sept. 8, 2016
Reinhard Schlegel
Bayesian Approach to Reliability Analysis
Space Shuttle Challenger disaster in 1986
Seven crew members died
Trigger of a paradigm shift in Reliability Engineering
Statistics
Classical or Bayes
§ Inferential statistics is the science of inferring from a sample to a population → inference, inductive reasoning.
§ Inferential statistics comprises two different concepts: Classical Statistics and Bayesian Statistics.
§ Classical Statistics uses only the sample ("the data") for parameter estimation and hypothesis testing.
§ Bayesian Statistics additionally takes into account what one knows or believes about the problem (from "objective" and/or "subjective" sources).
§ This reflects different opinions on how to interpret the concept of probability:
> relative frequency in random experiments (the classical view, frequentist interpretation), or
> state of knowledge (the Bayesian view, epistemic interpretation).
Recommended reference: https://en.wikipedia.org/wiki/Bayesian_probability
See also the book by W. Tschirk, Springer (2014).
§ How to deal with probabilities mathematically is not controversial. The calculus is identical in both cases.
§ However, a remarkable difference between the classical and the Bayesian approach occurs in hypothesis testing:
§ A classical test indicates how well a sample fits a hypothesis.
Symbolically: p(Data | Hypothesis)
§ In Bayesian Statistics we can judge the probability of the hypothesis given the data.
Symbolically: p(Hypothesis | Data)
Probability Calculus based on Kolmogorov Axioms
In Appendix 1 you will find an example, explaining Bayesian inference in comparison to the frequentist approach.
Excursion to Probability Calculus
Total Probability
§ We consider a complete system of events, i.e. an at most countable set {B1, B2, …} of mutually exclusive events with the property B1 ∪ B2 ∪ … = Ω (sample space).
[Figure: sample space Ω partitioned into the events B1 to B5, together with an event A]
§ Then the probability P(A) for any event A can be traced back to the conditional probabilities P(A|Bi). Using the general product rule we get:
$$P(A) = \sum_{i \in I} P(B_i \cap A) = \sum_{i \in I} P(B_i) \cdot P(A \mid B_i)$$
§ The Σ-term is called the total probability of A (or equivalently: marginalization of A).
Excursion to Probability Calculus
Bayes' Theorem
§ The conditional probability of an event Bi given the conditioning event A is defined as:
$$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(B_i) \cdot P(A \mid B_i)}{P(A)}$$
§ Replacing P(A) with the total probability of A yields the famous Bayes' Theorem: Let {B1, B2, …} be a complete system of events, then:
$$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(B_i) \cdot P(A \mid B_i)}{\sum_{j \in I} P(B_j) \cdot P(A \mid B_j)} \quad \text{for } i \in I$$
§ If the probabilities P(Bi), the so-called prior probabilities, are known, then the posterior probabilities P(Bi|A) can be computed using Bayes' Theorem (assuming the other terms on the right-hand side can be estimated from empirical data).
Thomas Bayes (1702 – 1761)
Excursion to Probability Calculus
Bayes' Theorem – An Example from Chip Industry
Killer defect problem (traditional solution):
§ Given a population Ω of Si chips. Suspect chips shall be identified and sorted out by means of an automated surface inspection before further processing. Contamination with killer defects is 0.01% (i.e. 100 ppm). This proportion was determined in a separate investigation.
§ The test procedure identifies and sorts out defective ("sick") chips with a probability of 99.9% (positive diagnosis). Non-defective ("healthy") chips are accepted with a probability of 95% (negative diagnosis). What is the probability that a positively tested (i.e. rejected) chip indeed has a killer defect?
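The question above can be answered directly with Bayes' Theorem. A minimal Python sketch, using only the three numbers given in the problem; the function and parameter names (`posterior_defective`, `base_rate`, `sensitivity`, `specificity`) are illustrative choices, not taken from the lecture:

```python
def posterior_defective(base_rate: float, sensitivity: float, specificity: float) -> float:
    """P(defective | positive) for a binary test, via Bayes' Theorem."""
    p_pos_given_defective = sensitivity            # 99.9% of sick chips are rejected
    p_pos_given_healthy = 1.0 - specificity        # 5% of healthy chips are rejected too
    # Total probability of a positive test result (denominator of Bayes' Theorem)
    p_positive = (p_pos_given_defective * base_rate
                  + p_pos_given_healthy * (1.0 - base_rate))
    return p_pos_given_defective * base_rate / p_positive


p = posterior_defective(base_rate=0.0001, sensitivity=0.999, specificity=0.95)
print(f"P(defective | positive) = {p:.4f}")
```

Although the test is very sensitive, only about 0.2% of the rejected chips actually carry a killer defect: healthy chips vastly outnumber defective ones, and 5% of them are rejected as well.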
Bayesian Inference
The Theorem Broken down into 4 Pieces
P(H | D) Posterior distribution, probability of hypothesis H given the data D
P(H) Prior distribution, reflecting knowledge about hypothesis H independent from data D;
P(D | H) Likelihood or aleatory model, representing the process or mechanism leading to data D;
P(D) Marginal distribution, serving for normalization purposes.
Following the usual notation in literature: H = hypothesis, D = data, evidence
$$P(H \mid D) = \frac{P(H) \cdot P(D \mid H)}{P(D)}$$
Data are supporting a hypothesis if P(H|D) > P(H).
Data are irrelevant for a hypothesis if P(H|D) = P(H).
Data are against a hypothesis if P(H|D) < P(H).
Probability model, or likelihood function, for the observed data x given the unknown parameter θ
Prior distribution model for θ
Posterior distribution model for θ given that the data x have been observed
Normalization, i.e. total probability
Bayes’ Theorem written with probability densities:
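Written out in one common notation, the four pieces listed above combine as follows. The symbols f (likelihood) and π (prior and posterior density) are assumptions made here and are not necessarily the notation of the lecture; θ denotes the unknown parameter and x the observed data:

```latex
\pi(\theta \mid x) \;=\; \frac{f(x \mid \theta)\,\pi(\theta)}{\displaystyle\int f(x \mid \theta')\,\pi(\theta')\,\mathrm{d}\theta'}
```

The numerator is likelihood times prior, and the integral in the denominator is the continuous counterpart of the total probability sum, so the posterior integrates to one.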
Appendix 2 explains in more detail the basic procedure of Bayesian inference.
Bayes‘ Theorem
Important Application of Bayes' Theorem
Bayesian Networks
§ Bayesian networks are a calculus for probabilistic and causal reasoning.
§ They are one of the central technologies in the area of artificial intelligence (AI).
§ The foundations of Bayesian networks (probabilistic directed acyclic graphical models, causal inference, learning algorithms) were developed in the late 1980s.
§ The first SW tools became available as computing power continuously improved (which is really needed for the often extensive calculations).
§ During the last two decades Bayesian networks have experienced a steep upturn in disciplines like bioinformatics, medical diagnostics, safety and reliability analyses, risk analyses in economics, legal affairs etc.
Pathbreaking work of J. Pearl, S. Lauritzen, D. Spiegelhalter, F. Jensen etc.
Turing Award for Judea Pearl 2011
http://amturing.acm.org/award_winners/pearl_2658896.cfm
SW-Tools like Hugin, Netica, BayesiaLab, AgenaRisk etc
Bayesian Networks
A Short Definition
§ A Bayesian Network (BN) is a directed acyclic graph (DAG) with nodes and edges.
§ Nodes represent the random variables relevant to the given problem (observable quantities, latent variables, unknown parameters or hypotheses).
§ Edges represent conditional dependencies. Nodes that are not connected (no path from one of the variables to the other) are conditionally independent of each other.
§ Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables, and gives as output the probability distribution of the variable represented by the node → probabilistic directed acyclic graph, PDAG.
§ The direction of the edges represents the cause-effect relation. We identify serial, divergent and convergent basic network fragments.
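The bullets above imply the usual factorization of the joint distribution along the DAG. As a sketch, with X_1, …, X_n denoting the node variables and pa(X_i) the set of parents of X_i (notation assumed here, not taken from the slides):

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

For three nodes A, B, C the basic fragments then read: serial (A → B → C) gives P(A)·P(B|A)·P(C|B); divergent (A ← B → C) gives P(B)·P(A|B)·P(C|B); convergent (A → B ← C) gives P(A)·P(C)·P(B|A,C).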
What Bayesian networks are:
Quite substantial pioneering work on PDAGs was done at the University of Fribourg (CH), Institut für Informatik, Prof. Jürg Kohlas et al
A discussion of these very important structural properties of BNs is provided in Appendix 3 of this lecture.
Bayesian Networks
An Elementary Example
§ We consider an elementary example with only two nodes: “Quality” and “Test_result”.
§ Both nodes are of data type labeled, with two states “defect_free” and “defective” for “Quality”, and two states “negative” and “positive” for “Test_result”.
§ The edges represent influential relationships between variables. In our example: “Test_result” is influenced by “Quality”. “Quality” is the parent of “Test_result”.
§ The strength of influence of the parent is captured by the probability distribution for the node “Test_result”. In this case the distribution is just a 2x2 table of the conditional probabilities.
See also the exercise on calculating with Bayes' Theorem some slides before.
Data types for the nodes: labeled, boolean, ranked, continuous interval, integer interval
Background knowledge: "State of the world"
Performance of test: "Conditional detection probability"
Model in Hugin SW
Same data as in killer defect example before
Base rate = 0.0001
Specificity of test = 0.95 (true negative rate)
Sensitivity of test = 0.999 (true positive rate)
Denominator in Bayes' theorem:
P(negative) = P(negative | defective) · P(defective) + P(negative | defect_free) · P(defect_free)
P(positive) = P(positive | defective) · P(defective) + P(positive | defect_free) · P(defect_free)
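The two marginals can be computed from the prior of “Quality” and the 2x2 conditional probability table of “Test_result”. A small Python sketch with the numbers of the killer defect example; the dictionary layout and the helper name `marginal` are illustrative assumptions and do not reflect how Hugin stores the model:

```python
# Prior of the parent node "Quality" (base rate 0.0001)
prior = {"defective": 0.0001, "defect_free": 0.9999}

# Conditional probability table P(Test_result | Quality)
cpt = {
    "defective":   {"positive": 0.999, "negative": 0.001},  # sensitivity 0.999
    "defect_free": {"positive": 0.05,  "negative": 0.95},   # specificity 0.95
}


def marginal(test_result: str) -> float:
    """Total probability of a test result, summed over the states of Quality."""
    return sum(prior[q] * cpt[q][test_result] for q in prior)


p_positive = marginal("positive")
p_negative = marginal("negative")
print(f"P(positive) = {p_positive:.7f}")
print(f"P(negative) = {p_negative:.7f}")
```

About 5% of all chips are rejected, almost all of them healthy chips that were flagged by mistake.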
Evidence from testing = positive
Computation of posterior probability with Bayes' theorem:
P(defective | positive) = P(positive | defective) · P(defective) / P(positive)
Same result as before:
P(defective | positive) ≈ 0.002
Bayesian Approach to Reliability Assessment
Inclusion of Knowledge
During the next pages please always keep in mind:
§ All models are wrong, but some are useful (George Box, statistician, 1978).
…
Example 1a: Getting Started with BNs
Model of an Automotive Safety Related Incident
Relevant variables and their causal connections
Source: Thesis of Peter Bjorkman, Uppsala University (2011)
Prior probabilities
Source: Thesis of Peter Bjorkman, Uppsala University (2011)
Backward inference from evidence on “Crash”
Forward inference from evidence on “Loss_of_braking” and “Parking_lot” and “Steer_away”
Example 2a: FTA in Bayesian Networks
Fault Tree for a Computer System
Fault Tree Analysis (FTA) is a very popular and widespread technique for dependability modeling and evaluation of large safety-critical systems.
Mapping fault trees into BNs can provide significant advantages in terms of modeling and analysis. Dynamic fault trees (DFTs) can also be modelled using BNs. (See the respective literature from Portinale, Fenton, Langseth.)
FT of a generic computer system mapped into BN:
Node types = Boolean
Node states:
False = Probability of survival Pr(X > t)
True = Probability of failure Pr(X ≤ t)
X = Time to failure
t = Period of interest
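To illustrate how Boolean nodes of this kind carry a fault tree, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration: the exponential time-to-failure model, the failure rates, the period of interest, the independence of the basic events, and the tiny tree itself (system fails if the CPU fails OR both power supplies fail). It is not the computer system model shown in the lecture.

```python
import math


def failure_probability(rate_per_hour: float, t_hours: float) -> float:
    """Pr(X <= t), assuming an exponentially distributed time to failure X."""
    return 1.0 - math.exp(-rate_per_hour * t_hours)


def and_gate(*p_fail: float) -> float:
    """Output fails only if all (independent) inputs fail."""
    result = 1.0
    for p in p_fail:
        result *= p
    return result


def or_gate(*p_fail: float) -> float:
    """Output fails if at least one (independent) input fails."""
    p_all_survive = 1.0
    for p in p_fail:
        p_all_survive *= 1.0 - p
    return 1.0 - p_all_survive


t = 1000.0                               # period of interest in hours (hypothetical)
p_cpu = failure_probability(1e-4, t)     # hypothetical failure rates
p_ps1 = failure_probability(5e-4, t)
p_ps2 = failure_probability(5e-4, t)

p_power = and_gate(p_ps1, p_ps2)         # redundant power supplies
p_system = or_gate(p_cpu, p_power)       # top event
print(f"Pr(system fails within t) = {p_system:.4f}")
```

In a BN the AND and OR gates become deterministic conditional probability tables of the gate nodes, and the values computed here correspond to the prior probabilities of the state True.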
Initial FTA configuration with prior probabilities:
Posterior probabilities after forward and backward propagation of evidence on failures:
As an example we see the investigation after a failure of the computer system, where the manual backup could intervene successfully and power supply 3 was found healthy.
The diagnosis indicates: CPU failure has the highest posterior probability, i.e. it is the most probable cause of the computer failure.
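Backward propagation of this kind can be reproduced on a toy model by brute-force enumeration. The Python sketch below is an assumption-laden illustration, not the model from the lecture: the three basic events, their prior failure probabilities and the structure function `system_fails` are hypothetical, and real BN tools use far more efficient propagation algorithms.

```python
from itertools import product

# Hypothetical prior failure probabilities of independent basic events
priors = {"cpu": 0.05, "ps1": 0.10, "ps2": 0.10}


def system_fails(state: dict) -> bool:
    """Hypothetical fault tree: CPU fails OR both power supplies fail."""
    return state["cpu"] or (state["ps1"] and state["ps2"])


def posterior(query: str, evidence: dict) -> float:
    """P(query = True | evidence) by enumerating all joint states."""
    names = list(priors)
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        state["system"] = system_fails(state)
        # Skip joint states that contradict the observed evidence
        if any(state[name] != value for name, value in evidence.items()):
            continue
        weight = 1.0
        for name in names:
            weight *= priors[name] if state[name] else 1.0 - priors[name]
        denominator += weight
        if state[query]:
            numerator += weight
    return numerator / denominator


p_cpu_given_failure = posterior("cpu", {"system": True})
p_ps1_given_failure = posterior("ps1", {"system": True})
p_cpu_given_failure_ps2_ok = posterior("cpu", {"system": True, "ps2": False})
print(p_cpu_given_failure, p_ps1_given_failure, p_cpu_given_failure_ps2_ok)
```

In this toy tree, observing a system failure raises the CPU failure probability well above its prior, and finding one power supply healthy leaves the CPU as the only remaining explanation.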
Example 2b: FTA in Bayesian Networks
Fault Tree for a Programmable Logic Controller (PLC)
Block diagram of the system:
This is an often-cited example from:
Helge Langseth, Luigi Portinale, Bayesian networks in reliability, Reliability Engineering & System Safety 92 (2007) 92 – 108
A BN modelling the fault tree of this circuit can be found in Appendix 4.
Example 4: Reliability Assessment of Software
Causal Model to Predict Software Defects
§ Motivation: Inclusion of subjective data about process quality (e.g. CMMI model)
§ Table from: A Critique of Software Defect Prediction Models, Norman Fenton and Martin Neil, Centre for Software Reliability, 1999:
§ These data suggest some influence of process quality on product quality.
Building a BN model
Data on the relation of process quality to product quality to reliability are very rare in literature.
CMMI = Capability Maturity Model Integration
See also other process models like:
EQA = European Quality Award
IRIS = International Railway Industry Standard
Building a BN model
Original slides from AgenaRisk
Data of interest
The story behind the data (1)
The story behind the data (2)
The story behind the data (3)
The story behind the data (4)
The story behind the data (5)
The story behind the data (6)
The story behind the data (7)
The story behind the data (8)
BN model implemented in AgenaRisk…
Initial configuration, prior distributions
Discrete and continuous data types
Hybrid BNs
Enter 0 for defects found in operation and calculate.
Look at how the model has tried to ‘explain’ this observation. The most likely explanation is that there is very low operational usage. But note also that it is likely that testing and process quality are higher than average and problem complexity is lower than average.
Suppose we discover that, in fact, the operational usage is ‘medium’ and that a rather high number of defects, 30, were found in testing. After calculating, the model is more or less convinced that the explanation must be that the testing quality was so good that all the residual defects were found and fixed.
If, in fact, the testing quality was only ‘medium’, and we remove the evidence on defects found in testing, then the model starts to believe that low complexity and high design process quality are the reasons for the result.
A possible subnet for Testing Quality as an example for modeling process quality (the factors “Design process quality” and “Product complexity” definitely need further research)
Example 5: BN in Railway Risk and Safety Support
Safety Risk Model for Train Derailment
Source:
Thesis at TU Dresden (2014)
Fault Tree with top event “Train derailment”
caused by the combined action of 7 basic events
→ to event tree
Event Tree from event “Train derailment”
Evolution of seven states of derailed train into twelve types of accidents
Bow tie model relating fault trees, the hazard and event trees
→ from fault tree
Event Tree with top event “Train derailment”
Mapping of some accident types into SIL levels
→ from fault tree
Bayesian network
Equivalent to FT and ET “Train derailment”
BN modeling in original Mahboob thesis
Basic events from FT
Bayesian network
Equivalent to ET “Train derailment”
Modeling from Fenton / Neil book
Type of accident: Eschede, Santiago de Compostela
Risk of this type of accident: 2.84 × 10^-4 × 0.005 × 10^-2 = 1.42 × 10^-8
Risk of hazard “Train derailed”: 2.84 × 10^-4, computed from FTA
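The risk figure above is a product along one path of the event tree. A short Python check of the arithmetic; reading 0.005 and 10^-2 as the conditional branch probabilities along that path is an assumption made here, since the slide does not label them:

```python
p_hazard = 2.84e-4                    # "Train derailed", computed from the FTA
branch_probabilities = [0.005, 1e-2]  # assumed to be conditional branch probabilities

# Multiply the hazard probability by each branch probability along the path
risk = p_hazard
for p in branch_probabilities:
    risk *= p

print(f"Risk of this type of accident = {risk:.3g}")
```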
Quite Recently Published
Risk Analysis of High Speed and Conventional Railway Lines
Railway accident at Santiago de Compostela July-24, 2013
BN subnetwork as basic building block
“With the aim of homogenization, we have designed a general Bayesian subnetwork whose acyclic graph is shown ..., where we can see the different variables involved in the railway problem and the dependence structure of all of them.”
Driver's state:
“This variable takes the six following values: distracted, attentive, alert, minor accident, moderate accident, and severe accident ... In fact, this variable tries to reflect the fact that the driver can be in two situations: normal driving or accident, but we have divided each of them into three possible alternatives.”
Proposed model:
“... with the aim of evaluating the accident risk, we present a Markovian–Bayesian model that reproduces the process of traveling along the line and facing different risk situations that the driver encounters during the trip.”
Evaluation of proposed model: Applying the model to the Orense-Santiago accident, the authors conclude:
“… we obtain 8.22 × 10^-6 expected accidents (for each line travel) and it is clear that the maximum speed reduction signal at PK 84.273, indicated only on the drivers booklet but not by side signals and not covered by the ATP system (the conventional ASFA system) is the most critical point on the line. It was the place where the accident took place because of a driver’s distraction, who did not respect the 80 km/h speed limit traveling at 170 km/h, more than twice the maximum allowed speed. It is also clear that a detailed risk analysis would have almost definitely pointed out the possibility of this critical possible human error, almost definitely avoiding the accident.”
Bayesian Networks, Bayesian Inference
Recently Published Books
Bayesian Networks
Recently Published Books
Bayesian Networks
Books on Basics and Foundations
Published: 2000 (1st edition)
Published: 2001 (1st edition)
Voluminous, 1200 pages
Bayesian Networks
Brand New Books from 2016
Reliability, Safety and Bayesian Networks
The Leading Journal
Bayesian Networks
Commercial and Open Source SW Tools
Alternative to OpenBUGS
Literature about Risk Management
Reliability Analysis in Bayesian Networks
Short CV
• Reinhard Schlegel, born 1947 in Nördlingen (Bavaria), living in Lörrach (Baden-Württemberg).
• Studies of Physics in Freiburg (thesis on a subject in General Relativity Theory).
• Married, 2 daughters (Pedagogue, Political Scientist, Psychologist), 2 grandsons (still unbiased).
• Since 1980 at BBC / ABB in Baden, Turgi, Lenzburg (CH).
• 1987 until 1992 Department Head Component Engineering in Process Automation Division (among other tasks: test and development of VLSI digital circuits).
• 1993 until 2011 Vice-President Quality & Reliability at ABB Semiconductors (power semiconductors in bipolar and BiMOS technology).
• Since April 2011 Senior Expert Quality & Reliability at ABB Semiconductors (since 2012 consulting).
• 1993 until 2005 lectureship at University of Applied Sciences North-Western Switzerland in Quality Management, Reliability Engineering, Microelectronics Manufacturing Technology.
• Since 2011 lectureship at University of Cooperative Education Baden-Württemberg Lörrach for Logic and Algebra, for Statistics and Probability, and since 2013 for Software-Quality.