Chapter 8: Reasoning under Uncertainty - uni-hamburg.de
TRANSCRIPT

Chapter 8: Reasoning under Uncertainty
© D. Poole, A. Mackworth 2010, W. Menzel 2015, Artificial Intelligence, Chapter 8
“The mind is a neural computer, fitted by natural selection with combinatorial algorithms for causal and probabilistic reasoning about plants, animals, objects, and people.

“In a universe with any regularities at all, decisions informed about the past are better than decisions made at random. That has always been true, and we would expect organisms, especially informavores such as humans, to have evolved acute intuitions about probability. The founders of probability, like the founders of logic, assumed they were just formalizing common sense.”
Steven Pinker, How the Mind Works, 1997, pp. 524, 343.
Using Uncertain Knowledge
Agents don’t have complete knowledge about the world.
Agents need to make decisions based on their uncertainty.
It isn’t enough to assume what the world is like. Example: wearing a seat belt.
An agent needs to reason about its uncertainty.
Why Probability?
There is lots of uncertainty about the world, but agents still need to act.

Predictions are needed to decide what to do:
- definitive predictions: you will be run over tomorrow
- point probabilities: the probability you will be run over tomorrow is 0.002
- probability ranges: you will be run over with probability in range [0.001, 0.34]

Acting is gambling: agents who don’t use probabilities will lose to those who do (Dutch books).

Probabilities can be learned from data. Bayes’ rule specifies how to combine data and prior knowledge.
Probability
Probability is an agent’s measure of belief in some proposition (subjective probability).

An agent’s belief depends on its prior assumptions and what the agent observes.
Numerical Measures of Belief
Belief in a proposition f can be measured in terms of a number between 0 and 1; this is the probability of f.
- The probability of f being 0 means that f is believed to be definitely false.
- The probability of f being 1 means that f is believed to be definitely true.

Using 0 and 1 is purely a convention.

A probability of f strictly between 0 and 1 means the agent is ignorant of its truth value.

Probability is a measure of an agent’s ignorance.

Probability is not a measure of degree of truth.
Random Variables
A random variable is a term in a language that can take one of a number of different values.

The range of a variable X, written range(X), is the set of values X can take.

A tuple of random variables 〈X1, ..., Xn〉 is a complex random variable with range range(X1) × ··· × range(Xn). Often the tuple is written as X1, ..., Xn.

The assignment X = x means variable X has value x.

A proposition is a Boolean formula made from assignments of values to variables.
Possible World Semantics
A possible world specifies an assignment of one value to each random variable.

A random variable is a function from possible worlds into the range of the random variable.

ω |= X = x means variable X is assigned value x in world ω.

Logical connectives have their standard meaning:

ω |= α ∧ β if ω |= α and ω |= β
ω |= α ∨ β if ω |= α or ω |= β
ω |= ¬α if ω ⊭ α

Let Ω be the set of all possible worlds.
Semantics of Probability
For a finite number of possible worlds:

Assign a nonnegative measure µ(ω) to each world ω so that the measures of the possible worlds sum to 1.

The probability of proposition f is defined by:

P(f) = Σ_{ω |= f} µ(ω)
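This definition is directly executable. Below is a minimal sketch; the worlds, variables, and measures are invented for illustration:

```python
# Possible-world semantics: each world assigns a value to every random
# variable and carries a nonnegative measure; the measures sum to 1.
# (Toy worlds and numbers, invented for illustration.)
worlds = [
    ({"Flu": True,  "Sneeze": True},  0.16),
    ({"Flu": True,  "Sneeze": False}, 0.04),
    ({"Flu": False, "Sneeze": True},  0.32),
    ({"Flu": False, "Sneeze": False}, 0.48),
]

def prob(proposition):
    """P(f) = sum of mu(w) over all worlds w with w |= f."""
    return sum(mu for w, mu in worlds if proposition(w))

assert abs(sum(mu for _, mu in worlds) - 1.0) < 1e-9  # measures sum to 1

p_flu = prob(lambda w: w["Flu"])                           # 0.16 + 0.04 = 0.2
p_flu_or_sneeze = prob(lambda w: w["Flu"] or w["Sneeze"])  # 0.52
```

A proposition is represented here as a Boolean function of a world, matching the ω |= f relation above.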
Axioms of Probability: finite case
Three axioms define what follows from a set of probabilities:
Axiom 1 0 ≤ P(a) for any proposition a.
Axiom 2 P(true) = 1
Axiom 3 P(a ∨ b) = P(a) + P(b) if a and b cannot both be true.

These axioms are sound and complete with respect to the semantics.
Consequences
1. Negation of a proposition: P(¬α) = 1− P(α).
The propositions α ∨ ¬α and ¬(α ∧ ¬α) are tautologies. Therefore, P(α ∨ ¬α) = P(α) + P(¬α) = 1.
Consequences
2. Logically equivalent propositions have the same probability: if α ↔ β, then P(α) = P(β).

If α ↔ β, then α ∨ ¬β is a tautology and P(α ∨ ¬β) = 1. Moreover, α and ¬β cannot both be true, so with Axiom 3

P(α ∨ ¬β) = P(α) + P(¬β) = 1.

Since P(¬β) = 1 − P(β), also

P(α) + 1 − P(β) = 1,
and therefore P(α) = P(β).
Consequences
3. Reasoning by cases: P(α) = P(α ∧ β) + P(α ∧ ¬β).
The propositions α ↔ ((α ∧ β) ∨ (α ∧ ¬β)) and ¬((α ∧ β) ∧ (α ∧ ¬β)) are tautologies. Thus,

P(α) = P((α ∧ β) ∨ (α ∧ ¬β)) = P(α ∧ β) + P(α ∧ ¬β).

4. Reasoning by cases, generalized:

If V is a random variable with domain D, then, for all propositions α,

P(α) = Σ_{d ∈ D} P(α ∧ V = d).
Consequences
5. Disjunction for non-exclusive propositions:
P(α ∨ β) = P(α) + P(β)− P(α ∧ β).
(α ∨ β) ↔ ((α ∧ ¬β) ∨ β) is a tautology. Thus,

P(α ∨ β) = P((α ∧ ¬β) ∨ β) = P(α ∧ ¬β) + P(β).

Since P(α ∧ ¬β) = P(α) − P(α ∧ β), this gives

P(α ∨ β) = P(α) − P(α ∧ β) + P(β).
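Consequences like these hold for any measure and can be checked numerically; a quick sketch over two Boolean variables, with invented measures:

```python
# Checking consequences 1, 3, and 5 on a toy measure over two Boolean
# variables alpha and beta (numbers invented for illustration).
worlds = {
    (True, True): 0.1, (True, False): 0.3,
    (False, True): 0.2, (False, False): 0.4,
}

def P(f):
    """Sum the measures of the worlds satisfying proposition f."""
    return sum(mu for w, mu in worlds.items() if f(w))

alpha = lambda w: w[0]
beta = lambda w: w[1]

# 1. Negation: P(not alpha) = 1 - P(alpha)
assert abs(P(lambda w: not alpha(w)) - (1 - P(alpha))) < 1e-9
# 3. Reasoning by cases: P(alpha) = P(alpha and beta) + P(alpha and not beta)
assert abs(P(alpha) - (P(lambda w: alpha(w) and beta(w))
                       + P(lambda w: alpha(w) and not beta(w)))) < 1e-9
# 5. Inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b)
p_or = P(lambda w: alpha(w) or beta(w))
assert abs(p_or - (P(alpha) + P(beta)
                   - P(lambda w: alpha(w) and beta(w)))) < 1e-9
```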
Semantics of Probability: general case
In the general case, probability defines a measure on sets of possible worlds. We define µ(S) for some sets S ⊆ Ω satisfying:

µ(S) ≥ 0
µ(Ω) = 1
µ(S1 ∪ S2) = µ(S1) + µ(S2) if S1 ∩ S2 = ∅

Or sometimes σ-additivity:

µ(∪_i Si) = Σ_i µ(Si) if Si ∩ Sj = ∅ for i ≠ j

Then P(α) = µ({ω | ω |= α}).
Probability Distributions
A probability distribution on a random variable X is a function range(X) → [0, 1] such that

x ↦ P(X = x).

This is written as P(X).

This also includes the case where we have tuples of variables. E.g., P(X, Y, Z) means P(〈X, Y, Z〉).

When range(X) is infinite, we sometimes need a probability density function...
Conditioning
Probabilistic conditioning specifies how to revise beliefs based on new information.

An agent builds a probabilistic model taking all background information into account. This gives the prior probability.

All other information must be conditioned on.

If evidence e is all the information obtained subsequently, the conditional probability P(h|e) of h given e is the posterior probability of h.
Semantics of Conditional Probability
Evidence e rules out possible worlds incompatible with e.
Evidence e induces a new measure, µe, over possible worlds:

µe(S) = c × µ(S) if ω |= e for all ω ∈ S
µe(S) = 0        if ω ⊭ e for some ω ∈ S

We can show that c = 1/P(e).

The conditional probability of formula h given evidence e is

P(h|e) = µe({ω : ω |= h}) = P(h ∧ e) / P(e)
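The re-weighting can be sketched directly: worlds incompatible with e get measure 0 and the rest are scaled by c = 1/P(e). Worlds and numbers below are invented for illustration:

```python
# Conditioning as re-weighting the possible-world measure.
# (Toy worlds and numbers, invented for illustration.)
worlds = [
    ({"Flu": True,  "Sneeze": True},  0.16),
    ({"Flu": True,  "Sneeze": False}, 0.04),
    ({"Flu": False, "Sneeze": True},  0.32),
    ({"Flu": False, "Sneeze": False}, 0.48),
]

def condition(worlds, e):
    """Return the new measure mu_e induced by evidence e."""
    p_e = sum(mu for w, mu in worlds if e(w))   # P(e)
    c = 1.0 / p_e
    return [(w, c * mu if e(w) else 0.0) for w, mu in worlds]

post = condition(worlds, lambda w: w["Sneeze"])           # observe Sneeze
p_flu_given_sneeze = sum(mu for w, mu in post if w["Flu"])
# = 0.16 / (0.16 + 0.32) = 1/3
```

Note that the conditioned measure is again a proper measure: it sums to 1.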
Conditioning
Possible Worlds: [figure omitted]

Observe Color = orange: [figure omitted]
Exercise
Flu    Sneeze  Snore   µ
true   true    true    0.064
true   true    false   0.096
true   false   true    0.016
true   false   false   0.024
false  true    true    0.096
false  true    false   0.144
false  false   true    0.224
false  false   false   0.336
What is:
(a) P(flu ∧ sneeze)
(b) P(flu ∧ ¬sneeze)
(c) P(flu)
(d) P(sneeze | flu)
(e) P(¬flu ∧ sneeze)
(f) P(flu | sneeze)
(g) P(sneeze | flu ∧ snore)

(h) P(flu | sneeze ∧ snore)
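One way to check answers mechanically is to query the joint table in code; the sketch below computes parts (a), (c), and (d):

```python
# The joint measure from the exercise table, keyed by (flu, sneeze, snore).
joint = {
    (True, True, True): 0.064,   (True, True, False): 0.096,
    (True, False, True): 0.016,  (True, False, False): 0.024,
    (False, True, True): 0.096,  (False, True, False): 0.144,
    (False, False, True): 0.224, (False, False, False): 0.336,
}

def P(f):
    """Probability of a proposition f over (flu, sneeze, snore)."""
    return sum(mu for w, mu in joint.items() if f(*w))

p_a = P(lambda flu, sneeze, snore: flu and sneeze)   # (a) P(flu ∧ sneeze)
p_c = P(lambda flu, sneeze, snore: flu)              # (c) P(flu)
p_d = p_a / p_c                                      # (d) P(sneeze | flu)
```

The remaining parts follow the same pattern of summing and dividing entries of the table.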
Generalized conditional probability
Computation of a conditional probability from given joint probabilities:

P(fn | f1 ∧ ... ∧ fn−1) = P(f1 ∧ ··· ∧ fn−1 ∧ fn) / P(f1 ∧ ··· ∧ fn−1)
Chain Rule
Inverse of the generalized conditional probability: computation of a joint probability distribution from given conditional probabilities

P(f1 ∧ f2 ∧ ... ∧ fn)
= P(fn | f1 ∧ ··· ∧ fn−1) × P(f1 ∧ ··· ∧ fn−1)
= P(fn | f1 ∧ ··· ∧ fn−1) × P(fn−1 | f1 ∧ ··· ∧ fn−2) × P(f1 ∧ ··· ∧ fn−2)
= P(fn | f1 ∧ ··· ∧ fn−1) × P(fn−1 | f1 ∧ ··· ∧ fn−2) × ··· × P(f3 | f1 ∧ f2) × P(f2 | f1) × P(f1)
= ∏_{i=1}^{n} P(fi | f1 ∧ ··· ∧ fi−1)
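The decomposition can be verified numerically for n = 3; the sketch below reuses the flu/sneeze/snore joint from the exercise:

```python
# Chain rule check: P(f1 ∧ f2 ∧ f3) = P(f3|f1∧f2) × P(f2|f1) × P(f1).
joint = {
    (True, True, True): 0.064,   (True, True, False): 0.096,
    (True, False, True): 0.016,  (True, False, False): 0.024,
    (False, True, True): 0.096,  (False, True, False): 0.144,
    (False, False, True): 0.224, (False, False, False): 0.336,
}

def P(f):
    return sum(mu for w, mu in joint.items() if f(*w))

def cond(f, g):
    """P(f | g) = P(f ∧ g) / P(g)."""
    return P(lambda *w: f(*w) and g(*w)) / P(g)

f1 = lambda flu, sneeze, snore: flu
f2 = lambda flu, sneeze, snore: sneeze
f3 = lambda flu, sneeze, snore: snore

lhs = P(lambda *w: f1(*w) and f2(*w) and f3(*w))
rhs = cond(f3, lambda *w: f1(*w) and f2(*w)) * cond(f2, f1) * P(f1)
# lhs == rhs == 0.064
```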
Bayes’ theorem
The chain rule and commutativity of conjunction (h ∧ e is equivalent to e ∧ h) gives us:

P(h ∧ e) = P(h|e) × P(e)
         = P(e|h) × P(h).

If P(e) ≠ 0, divide the right hand sides by P(e):

P(h|e) = P(e|h) × P(h) / P(e).
This is Bayes’ theorem.
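A small worked example, with invented diagnostic numbers, shows the typical use: turning causal knowledge P(e|h) into evidential knowledge P(h|e). P(e) is obtained by reasoning by cases.

```python
# Bayes' theorem with invented numbers: a hypothesis h (e.g. a disease)
# with prior 0.01; evidence e (e.g. a symptom) occurs with probability
# 0.9 given h and 0.08 given ¬h.
p_h = 0.01               # P(h)
p_e_given_h = 0.9        # P(e|h), causal knowledge
p_e_given_not_h = 0.08   # P(e|¬h)

# P(e) = P(e ∧ h) + P(e ∧ ¬h), by reasoning by cases
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

p_h_given_e = p_e_given_h * p_h / p_e   # Bayes' theorem, about 0.102
```

Even with a highly reliable test, the posterior stays modest here because the prior is small.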
Why is Bayes’ theorem interesting?
Often you have causal knowledge:
P(symptom | disease)
P(light is off | status of switches and switch positions)
P(alarm | fire)
P(image looks like [image] | a tree is in front of a car)

and want to do evidential reasoning:
P(disease | symptom)
P(status of switches | light is off and switch positions)
P(fire | alarm)
P(a tree is in front of a car | image looks like [image])
Conditional independence
Random variable X is independent of random variable Y given random variable Z if, for all xi ∈ dom(X), yj ∈ dom(Y), yk ∈ dom(Y) and zm ∈ dom(Z),

P(X = xi | Y = yj ∧ Z = zm)
= P(X = xi | Y = yk ∧ Z = zm)
= P(X = xi | Z = zm).

That is, knowledge of Y’s value doesn’t affect your belief in the value of X, given a value of Z.
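The definition can be tested against a joint distribution; the sketch below builds a joint that satisfies the independence by construction (P(x,y,z) = P(z)P(x|z)P(y|z), all numbers invented) and checks it:

```python
from itertools import product

# Invented conditional tables for binary X, Y, Z with X independent
# of Y given Z by construction.
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.6, 1: 0.4}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product((0, 1), repeat=3)}

def P(f):
    return sum(p for w, p in joint.items() if f(*w))

ok = True
for x, y, z in product((0, 1), repeat=3):
    # P(X=x | Y=y ∧ Z=z) should equal P(X=x | Z=z)
    lhs = (P(lambda X, Y, Z: (X, Y, Z) == (x, y, z))
           / P(lambda X, Y, Z: (Y, Z) == (y, z)))
    rhs = (P(lambda X, Y, Z: (X, Z) == (x, z))
           / P(lambda X, Y, Z: Z == z))
    ok = ok and abs(lhs - rhs) < 1e-9
```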
Example domain (diagnostic assistant)
[Figure: electrical wiring diagram with lights l1 and l2, switches s1–s3 (including a two-way switch), wires w0–w6, circuit breakers cb1 and cb2, power outlets p1 and p2, and outside power.]
Examples of conditional independence
The identity of the queen of Canada is independent of whether light l1 is lit given whether there is outside power.

Whether there is someone in a room is independent of whether light l2 is lit given the position of switch s3.

Whether light l1 is lit is independent of the position of light switch s2 given whether there is power in wire w0.

Every other variable may be independent of whether light l1 is lit given whether there is power in wire w0 and the status of light l1 (if it’s ok, or if not, how it’s broken).
Idea of belief networks
Whether l1 is lit (L1_lit) depends only on the status of the light (L1_st) and whether there is power in wire w0. Thus, L1_lit is independent of the other variables given L1_st and W0. In a belief network, W0 and L1_st are parents of L1_lit.

[Figure: network fragment in which w1, w2, s2_pos, and s2_st are parents of w0, and w0 and l1_st are parents of l1_lit.]

Similarly, W0 depends only on whether there is power in w1, whether there is power in w2, the position of switch s2 (S2_pos), and the status of switch s2 (S2_st).
Belief networks
A belief network is a graph: the nodes are random variables; there is an arc from the parents of each node into that node.

Suppose X1, ..., Xn are the variables of interest.

Totally order the variables of interest: X1, ..., Xn.

Theorem of probability theory (chain rule):
P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | X1, ..., Xi−1)

The parents parents(Xi) of Xi are those predecessors of Xi that render Xi independent of the other predecessors. That is, parents(Xi) ⊆ {X1, ..., Xi−1} and
P(Xi | parents(Xi)) = P(Xi | X1, ..., Xi−1)

So P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
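The factorization means the full joint is just a product of the local conditional tables. A minimal sketch for a three-node fire/tampering/alarm network (all probabilities invented):

```python
from itertools import product

# P(Fire) and P(Tampering): nodes with no parents get priors.
p_fire = {True: 0.01, False: 0.99}
p_tampering = {True: 0.02, False: 0.98}
# P(Alarm = true | Fire, Tampering): a conditional probability table.
p_alarm_given = {
    (True, True): 0.5, (True, False): 0.99,
    (False, True): 0.85, (False, False): 0.0001,
}

def joint(fire, tampering, alarm):
    """P(Fire, Tampering, Alarm) = product of P(Xi | parents(Xi))."""
    p_a = p_alarm_given[(fire, tampering)]
    return (p_fire[fire] * p_tampering[tampering]
            * (p_a if alarm else 1 - p_a))

# Sanity check: the product of CPTs defines a proper joint distribution.
total = sum(joint(f, t, a) for f, t, a in product((True, False), repeat=3))
```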
Components of a belief network
A belief network consists of:
a directed acyclic graph with nodes labeled with random variables

a domain for each random variable

a set of conditional probability tables for each variable given its parents (including prior probabilities for nodes with no parents).
Example belief network
[Figure: belief network for the power domain, with nodes Outside_power, Cb1_st, Cb2_st, W0–W6, S1_pos–S3_pos, S1_st–S3_st, P1, P2, L1_st, L1_lit, L2_st, L2_lit.]
Example belief network (continued)
The belief network also specifies:
The domain of the variables:
W0, ..., W6 have domain {live, dead}
S1_pos, S2_pos, and S3_pos have domain {up, down}
S1_st has domain {ok, upside_down, short, intermittent, broken}

Conditional probabilities, including:
P(W1 = live | S1_pos = up ∧ S1_st = ok ∧ W3 = live)
P(W1 = live | S1_pos = up ∧ S1_st = ok ∧ W3 = dead)
P(S1_pos = up)
P(S1_st = upside_down)
Belief network summary
A belief network is automatically acyclic by construction.

A belief network is a directed acyclic graph (DAG) where nodes are random variables.

The parents of a node n are those variables on which n directly depends.

A belief network is a graphical representation of dependence and independence:
- A variable is independent of its non-descendants given its parents.
Constructing belief networks
To represent a domain in a belief network, you need to consider:

What are the relevant variables?
- What will you observe?
- What would you like to find out (query)?
- What other features make the model simpler?

What values should these variables take?

What is the relationship between them? This should be expressed in terms of local influence.

How does the value of each variable depend on its parents? This is expressed in terms of the conditional probabilities.
Using belief networks
The power network can be used in a number of ways:
Conditioning on the status of the switches and circuit breakers, whether there is outside power, and the position of the switches, you can simulate the lighting.

Given values for the switches, the outside power, and whether the lights are lit, you can determine the posterior probability that each switch or circuit breaker is ok or not.

Given some switch positions, some outputs, and some intermediate values, you can determine the probability of any other variable in the network.
What variables are affected by observing?
If you observe variable Y, the variables whose posterior probability is different from their prior are:
- the ancestors of Y, and
- their descendants.

Intuitively (if you have a causal belief network):
- you do abduction to possible causes, and
- prediction from the causes.
Common descendants
[Figure: tampering → alarm ← fire]

tampering and fire are independent

tampering and fire are dependent given alarm

Intuitively, tampering can explain away fire
Common ancestors
[Figure: smoke ← fire → alarm]

alarm and smoke are dependent

alarm and smoke are independent given fire

Intuitively, fire can explain alarm and smoke; learning one can affect the other by changing your belief in fire.
Chain
[Figure: alarm → leaving → report]

alarm and report are dependent

alarm and report are independent given leaving

Intuitively, the only way that the alarm affects report is by affecting leaving.
Pruning Irrelevant Variables
Suppose you want to compute P(X | e1, ..., ek):

Prune any variables that have no observed or queried descendants.

Connect the parents of any observed variable.

Remove arc directions.

Remove observed variables.

Remove any variables not connected to X in the resulting (undirected) graph.
Belief network inference
Four main approaches to determine posterior distributions in belief networks:

Variable elimination: exploit the structure of the network to eliminate (sum out) the non-observed, non-query variables one at a time.

Search-based approaches: enumerate some of the possible worlds, and estimate posterior probabilities from the worlds generated.

Stochastic simulation: random cases are generated according to the probability distributions.

Variational methods: find the closest tractable distribution to the (posterior) distribution we are interested in.
Factors
A factor is a representation of a function from a tuple of random variables into a number.
We will write factor f on variables X1, ..., Xj as f(X1, ..., Xj).
We can assign some or all of the variables of a factor:

f(X1 = v1, X2, ..., Xj), where v1 ∈ dom(X1), is a factor on X2, ..., Xj.

f(X1 = v1, X2 = v2, ..., Xj = vj) is a number that is the value of f when each Xi has value vi.

The former is also written as f(X1, X2, ..., Xj)_{X1 = v1}, etc.
Example factors
r(X, Y, Z):

X  Y  Z  val
t  t  t  0.1
t  t  f  0.9
t  f  t  0.2
t  f  f  0.8
f  t  t  0.4
f  t  f  0.6
f  f  t  0.3
f  f  f  0.7

r(X=t, Y, Z):

Y  Z  val
t  t  0.1
t  f  0.9
f  t  0.2
f  f  0.8

r(X=t, Y, Z=f):

Y  val
t  0.9
f  0.8

r(X=t, Y=f, Z=f) = 0.8
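A factor can be sketched as a dictionary from assignment tuples to numbers; assigning a variable then just selects and shrinks the matching rows. This mirrors the r(X, Y, Z) tables above:

```python
# The factor r(X, Y, Z) from the slide, keyed by (x, y, z) tuples.
r = {
    ('t', 't', 't'): 0.1, ('t', 't', 'f'): 0.9,
    ('t', 'f', 't'): 0.2, ('t', 'f', 'f'): 0.8,
    ('f', 't', 't'): 0.4, ('f', 't', 'f'): 0.6,
    ('f', 'f', 't'): 0.3, ('f', 'f', 'f'): 0.7,
}

def assign(factor, position, value):
    """Fix the variable at `position` to `value`, dropping it from the key."""
    return {key[:position] + key[position + 1:]: v
            for key, v in factor.items() if key[position] == value}

r_xt = assign(r, 0, 't')        # r(X=t, Y, Z): a factor on (Y, Z)
r_xt_zf = assign(r_xt, 1, 'f')  # r(X=t, Y, Z=f): a factor on (Y,)
value = r_xt_zf[('t',)]         # r(X=t, Y=t, Z=f) = 0.9
```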
Multiplying factors
The product of factor f1(X, Y) and f2(Y, Z), where Y are the variables in common, is the factor (f1 × f2)(X, Y, Z) defined by:

(f1 × f2)(X, Y, Z) = f1(X, Y) f2(Y, Z).
Multiplying factors example
f1:

A  B  val
t  t  0.1
t  f  0.9
f  t  0.2
f  f  0.8

f2:

B  C  val
t  t  0.3
t  f  0.7
f  t  0.6
f  f  0.4

f1 × f2:

A  B  C  val
t  t  t  0.03
t  t  f  0.07
t  f  t  0.54
t  f  f  0.36
f  t  t  0.06
f  t  f  0.14
f  f  t  0.48
f  f  f  0.32
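The product operation can be sketched over factors with named variables; rows whose shared variables agree are combined, reproducing the f1 × f2 table above:

```python
def multiply(vars1, f1, vars2, f2):
    """(f1 × f2) on the union of variables, matching the shared ones."""
    shared = [v for v in vars1 if v in vars2]
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for k1, v1 in f1.items():
        a = dict(zip(vars1, k1))
        for k2, v2 in f2.items():
            b = dict(zip(vars2, k2))
            if all(a[s] == b[s] for s in shared):  # rows agree on Y
                merged = {**a, **b}
                out[tuple(merged[v] for v in out_vars)] = v1 * v2
    return out_vars, out

f1 = {('t', 't'): 0.1, ('t', 'f'): 0.9, ('f', 't'): 0.2, ('f', 'f'): 0.8}
f2 = {('t', 't'): 0.3, ('t', 'f'): 0.7, ('f', 't'): 0.6, ('f', 'f'): 0.4}
vars_out, f12 = multiply(['A', 'B'], f1, ['B', 'C'], f2)
# f12[('t','t','t')] = 0.1 × 0.3 = 0.03, as in the table
```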
Summing out variables
We can sum out a variable, say X1 with domain {v1, . . . , vk}, from factor f(X1, . . . , Xj), resulting in a factor on X2, . . . , Xj defined by:

(∑_X1 f)(X2, . . . , Xj) = f(X1 = v1, . . . , Xj) + · · · + f(X1 = vk, . . . , Xj)
![Page 56: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/56.jpg)
Summing out a variable example
f3:

A B C  val
t t t  0.03
t t f  0.07
t f t  0.54
t f f  0.36
f t t  0.06
f t f  0.14
f f t  0.48
f f f  0.32

∑_B f3:

A C  val
t t  0.57
t f  0.43
f t  0.54
f f  0.46
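Summing out is a simple aggregation in the dict-backed sketch used above (the helper name `sum_out` is invented, not slide code); it reproduces the ∑_B f3 table:

```python
def sum_out(var, f_vars, f):
    """Sum a variable out of a factor, grouping rows that agree elsewhere."""
    i = list(f_vars).index(var)
    out_vars = [v for v in f_vars if v != var]
    out = {}
    for k, p in f.items():
        nk = k[:i] + k[i + 1:]          # drop the summed-out variable
        out[nk] = out.get(nk, 0.0) + p  # accumulate over its values
    return out_vars, out

f3 = {("t", "t", "t"): 0.03, ("t", "t", "f"): 0.07,
      ("t", "f", "t"): 0.54, ("t", "f", "f"): 0.36,
      ("f", "t", "t"): 0.06, ("f", "t", "f"): 0.14,
      ("f", "f", "t"): 0.48, ("f", "f", "f"): 0.32}
vars_ac, g = sum_out("B", ("A", "B", "C"), f3)
# g[("t", "t")] is 0.03 + 0.54 = 0.57, matching the table above
```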
![Page 57: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/57.jpg)
Evidence
If we want to compute the posterior probability of Z given evidence Y1 = v1 ∧ . . . ∧ Yj = vj:

P(Z | Y1 = v1, . . . , Yj = vj)
= P(Z, Y1 = v1, . . . , Yj = vj) / P(Y1 = v1, . . . , Yj = vj)
= P(Z, Y1 = v1, . . . , Yj = vj) / ∑_Z P(Z, Y1 = v1, . . . , Yj = vj)

So the computation reduces to the probability of the conjunction P(Z, Y1 = v1, . . . , Yj = vj). We normalize at the end.
![Page 60: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/60.jpg)
Probability of a conjunction
Suppose the variables of the belief network are X1, . . . , Xn. To compute P(Z, Y1 = v1, . . . , Yj = vj), we sum out the other variables, {Z1, . . . , Zk} = {X1, . . . , Xn} − {Z} − {Y1, . . . , Yj}. We order the Zi into an elimination ordering.

P(Z, Y1 = v1, . . . , Yj = vj)
= ∑_Zk · · · ∑_Z1 P(X1, . . . , Xn)_{Y1 = v1, . . . , Yj = vj}
= ∑_Zk · · · ∑_Z1 ∏_{i=1..n} P(Xi | parents(Xi))_{Y1 = v1, . . . , Yj = vj}
![Page 63: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/63.jpg)
Computing sums of products
Computation in belief networks reduces to computing sums of products.

How can we compute ab + ac efficiently?
Distribute out the a, giving a(b + c).

How can we compute ∑_Z1 ∏_{i=1..n} P(Xi | parents(Xi)) efficiently?
Distribute out those factors that don't involve Z1.
![Page 67: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/67.jpg)
Variable elimination algorithm
To compute P(Z | Y1 = v1 ∧ . . . ∧ Yj = vj):

- Construct a factor for each conditional probability.
- Set the observed variables to their observed values.
- Sum out each of the other variables (the Z1, . . . , Zk) according to some elimination ordering.
- Multiply the remaining factors. Normalize by dividing the resulting factor f(Z) by ∑_Z f(Z).
![Page 68: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/68.jpg)
Summing out a variable
To sum out a variable Zj from a product f1, . . . , fk of factors:

- Partition the factors into
  - those that don't contain Zj, say f1, . . . , fi,
  - those that contain Zj, say fi+1, . . . , fk.
- We know:

∑_Zj f1 × · · · × fk = f1 × · · · × fi × ( ∑_Zj fi+1 × · · · × fk )

- Explicitly construct a representation of the rightmost factor. Replace the factors fi+1, . . . , fk by the new factor.
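The step above can be sketched as one self-contained function (a hedged sketch with invented helper names, not slide code): partition the factors by whether they mention Zj, multiply the ones that do, sum Zj out, and return the updated factor list.

```python
from itertools import product

def eval_factor(fvars, table, env):
    """Look up a factor's value under an assignment of its variables."""
    return table[tuple(env[v] for v in fvars)]

def eliminate(zj, factors, domain=("t", "f")):
    """factors: list of (vars, table) pairs. Eliminate zj and return the
    new factor list."""
    keep = [f for f in factors if zj not in f[0]]     # don't contain zj
    touch = [f for f in factors if zj in f[0]]        # contain zj
    new_vars = sorted({v for fv, _ in touch for v in fv if v != zj})
    new_table = {}
    for vals in product(domain, repeat=len(new_vars)):
        env = dict(zip(new_vars, vals))
        total = 0.0
        for z in domain:                  # sum zj out of the product
            env[zj] = z
            prod = 1.0
            for fv, t in touch:
                prod *= eval_factor(fv, t, env)
            total += prod
        new_table[vals] = total
    return keep + [(tuple(new_vars), new_table)]

# Eliminating B from f1(A, B) x f2(B, C) reproduces the sum_B table
# from the earlier example.
f1 = {("t", "t"): 0.1, ("t", "f"): 0.9, ("f", "t"): 0.2, ("f", "f"): 0.8}
f2 = {("t", "t"): 0.3, ("t", "f"): 0.7, ("f", "t"): 0.6, ("f", "f"): 0.4}
after = eliminate("B", [(("A", "B"), f1), (("B", "C"), f2)])
```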
![Page 69: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/69.jpg)
Variable elimination example
[Belief network: Tampering and Fire are parents of Alarm; Fire is the parent of Smoke; Alarm is the parent of Leaving; Leaving is the parent of Report.]

P(Tampering, Fire, Alarm, Smoke, Leaving, Report)
= P(Tampering) × P(Fire) × P(Alarm | Fire, Tampering) × P(Smoke | Fire) × P(Leaving | Alarm) × P(Report | Leaving)
![Page 70: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/70.jpg)
Variable elimination example
P(tampering) = 0.02
P(fire) = 0.01
P(alarm | fire ∧ tampering) = 0.5
P(alarm | fire ∧ ¬tampering) = 0.99
P(alarm | ¬fire ∧ tampering) = 0.85
P(alarm | ¬fire ∧ ¬tampering) = 0.0001
P(smoke | fire) = 0.9
P(smoke | ¬fire) = 0.01
P(leaving | alarm) = 0.88
P(leaving | ¬alarm) = 0.001
P(report | leaving) = 0.75
P(report | ¬leaving) = 0.01
Query: P(Tampering |Smoke = true ∧ Report = true).
![Page 71: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/71.jpg)
Variable elimination example
Conditional probabilities and factors
P(Tampering) → f0(Tampering):

true  0.02
false 0.98

P(Fire) → f1(Fire):

true  0.01
false 0.99
![Page 72: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/72.jpg)
Variable elimination example
Conditional probabilities and factors
P(Alarm | Tampering, Fire) → f2(Tampering, Fire, Alarm):

Tampering Fire  Alarm  val
true  true  true   0.5
true  true  false  0.5
true  false true   0.85
true  false false  0.15
false true  true   0.99
false true  false  0.01
false false true   0.0001
false false false  0.9999
![Page 73: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/73.jpg)
Variable elimination example
Conditional probabilities and factors
P(Smoke | Fire) → f3(Fire, Smoke):

Fire  Smoke  val
true  true   0.9
true  false  0.1
false true   0.01
false false  0.99

P(Leaving | Alarm) → f4(Alarm, Leaving):

Alarm Leaving  val
true  true   0.88
true  false  0.12
false true   0.001
false false  0.999
![Page 74: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/74.jpg)
Variable elimination example
Conditional probabilities and factors
P(Report | Leaving) → f5(Leaving, Report):

Leaving Report  val
true  true   0.75
true  false  0.25
false true   0.01
false false  0.99
![Page 75: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/75.jpg)
Variable elimination example
variables: Tampering, Fire, Alarm, Smoke, Leaving, Report

query: P(Tampering | Smoke = true ∧ Report = true)

to eliminate: Fire, Alarm, Smoke, Leaving, Report

distributions: P(Alarm | Tampering, Fire), P(Smoke | Fire), P(Leaving | Alarm), P(Report | Leaving), P(Tampering), P(Fire)
![Page 79: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/79.jpg)
Variable elimination example
Eliminate the observed variable Smoke
f3(Fire, Smoke):

Fire  Smoke  val
true  true   0.9
true  false  0.1
false true   0.01
false false  0.99

P(Smoke = true | Fire) → f′3(Fire):

true  0.9
false 0.01
![Page 80: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/80.jpg)
Variable elimination example
Eliminate the observed variable Report
f5(Leaving, Report):

Leaving Report  val
true  true   0.75
true  false  0.25
false true   0.01
false false  0.99

P(Report = true | Leaving) → f′5(Leaving):

true  0.75
false 0.01
![Page 81: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/81.jpg)
Variable elimination example
Select e.g. Fire to be eliminated next
Collect all the factors containing Fire:
f1(Fire):

true  0.01
false 0.99

f′3(Fire):

true  0.9
false 0.01

f2(Tampering, Fire, Alarm):

Tampering Fire  Alarm  val
true  true  true   0.5
true  true  false  0.5
true  false true   0.85
true  false false  0.15
false true  true   0.99
false true  false  0.01
false false true   0.0001
false false false  0.9999
![Page 82: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/82.jpg)
Variable elimination example
Compute a new factor for them, eliminating Fire
f6(Tampering, Alarm)
= ∑_Fire f1(Fire) × f2(Tampering, Fire, Alarm) × f′3(Fire)

Tampering Alarm  val
true  true   0.01292
true  false  0.00599
false true   0.00891
false false  0.00999

remaining factors: f0(Tampering), f4(Alarm, Leaving), f′5(Leaving), f6(Tampering, Alarm)
![Page 83: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/83.jpg)
Variable elimination example
Select e.g. Alarm to be eliminated next.
Collect the factors containing Alarm
f4(Alarm, Leaving):

Alarm Leaving  val
true  true   0.88
true  false  0.12
false true   0.001
false false  0.999

f6(Tampering, Alarm):

Tampering Alarm  val
true  true   0.0129
true  false  0.006
false true   0.0089
false false  0.01
![Page 84: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/84.jpg)
Variable elimination example
Compute a new factor for them, eliminating Alarm
f7(Tampering, Leaving)
= ∑_Alarm f6(Tampering, Alarm) × f4(Alarm, Leaving)

Tampering Leaving  val
true  true   0.01137
true  false  0.00753
false true   0.00785
false false  0.01105

remaining factors: f0(Tampering), f′5(Leaving), f7(Tampering, Leaving)
![Page 85: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/85.jpg)
Variable elimination example
Select Leaving to be eliminated next
Collect the factors containing Leaving
f′5(Leaving):

true  0.75
false 0.01

f7(Tampering, Leaving):

Tampering Leaving  val
true  true   0.01137
true  false  0.00753
false true   0.00785
false false  0.01105
![Page 86: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/86.jpg)
Variable elimination example
Compute a new factor for them, eliminating Leaving

f8(Tampering)
= ∑_Leaving f′5(Leaving) × f7(Tampering, Leaving)

true  0.0086
false 0.006
![Page 87: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/87.jpg)
Variable elimination example
Multiply the remaining factors for Tampering

f0(Tampering):
true  0.02
false 0.98

f8(Tampering):
true  0.0086
false 0.006

f9(Tampering) = f0(Tampering) × f8(Tampering):
true  0.00017
false 0.00588
![Page 88: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/88.jpg)
Variable elimination example
Posterior distribution for Tampering:

P(Tampering | Report, Smoke) = f9(Tampering) / ∑_Tampering f9(Tampering)

true  0.02844
false 0.97156
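The whole elimination chain can be checked by brute-force enumeration of the joint distribution, fixing Smoke = true and Report = true and normalizing (a sketch, not slide code; variable names abbreviated, CPTs from page 70):

```python
from itertools import product

P_ta = {True: 0.02, False: 0.98}
P_fi = {True: 0.01, False: 0.99}
P_al = {(True, True): 0.5, (True, False): 0.99,
        (False, True): 0.85, (False, False): 0.0001}  # keyed by (fire, tampering)
P_sm = {True: 0.9, False: 0.01}     # P(smoke = true | Fire)
P_le = {True: 0.88, False: 0.001}   # P(leaving = true | Alarm)
P_re = {True: 0.75, False: 0.01}    # P(report = true | Leaving)

def bern(val, p_true):
    """P(Var = val) given P(Var = true) = p_true."""
    return p_true if val else 1.0 - p_true

num = den = 0.0
for ta, fi, al, le in product([True, False], repeat=4):
    w = (P_ta[ta] * P_fi[fi]
         * bern(al, P_al[(fi, ta)])
         * P_sm[fi]                 # Smoke = true is observed
         * bern(le, P_le[al])
         * P_re[le])                # Report = true is observed
    den += w
    if ta:
        num += w

posterior_true = num / den   # ≈ 0.02844, matching the slide
```

den here is P(Smoke = true, Report = true) ≈ 0.00605, the normalization constant the slides divide by at the end.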
![Page 100: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/100.jpg)
Variable elimination example
[Elimination tree, built bottom-up:
Smoke = true applied to f3(Fire, Smoke) gives f′3(Fire);
Report = true applied to f5(Leaving, Report) gives f′5(Leaving);
f1(Fire), f′3(Fire) and f2(Tampering, Fire, Alarm) combine into f6(Tampering, Alarm);
f6(Tampering, Alarm) and f4(Alarm, Leaving) combine into f7(Tampering, Leaving);
f7(Tampering, Leaving) and f′5(Leaving) combine into f8(Tampering);
f8(Tampering) and f0(Tampering) combine into f9(Tampering);
normalization yields the posterior.]
![Page 101: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/101.jpg)
Variable Elimination example
[Belief network: a chain A → B → C → D → E → F, with G a child of C and H a child of E.]

Query: P(G | f); elimination ordering: A, H, E, D, B, C

P(G | f)
∝ ∑_C ∑_B ∑_D ∑_E ∑_H ∑_A P(A) P(B|A) P(C|B) P(D|C) P(E|D) P(f|E) P(G|C) P(H|E)
= ∑_C ( ∑_B ( ∑_A P(A) P(B|A) ) P(C|B) ) P(G|C) ( ∑_D P(D|C) ( ∑_E P(E|D) P(f|E) ∑_H P(H|E) ) )
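That the distributed form equals the flat sum of products can be checked numerically (a sketch with invented, randomly generated CPTs for Boolean variables with values 0/1, not slide code):

```python
from itertools import product
import random

rng = random.Random(1)

def cpt():
    """Random CPT for a Boolean child with one Boolean parent:
    parent value -> P(child = 1 | parent)."""
    return {0: rng.random(), 1: rng.random()}

pA = rng.random()                          # P(A = 1)
pB, pC, pD, pE, pF, pG, pH = (cpt() for _ in range(7))

def p(val, p1):
    """P(Var = val) given P(Var = 1) = p1."""
    return p1 if val else 1.0 - p1

f = 1   # observed value of F

def brute(G):
    """Flat sum of the joint over A, B, C, D, E, H with F = f, G fixed."""
    total = 0.0
    for A, B, C, D, E, H in product((0, 1), repeat=6):
        total += (p(A, pA) * p(B, pB[A]) * p(C, pC[B]) * p(D, pD[C])
                  * p(E, pE[D]) * p(f, pF[E]) * p(G, pG[C]) * p(H, pH[E]))
    return total

def nested(G):
    """The distributed form from the slide, with A, H, E, D innermost."""
    total = 0.0
    for C in (0, 1):
        sB = sum(sum(p(A, pA) * p(B, pB[A]) for A in (0, 1)) * p(C, pC[B])
                 for B in (0, 1))
        sD = sum(p(D, pD[C])
                 * sum(p(E, pE[D]) * p(f, pF[E])
                       * sum(p(H, pH[E]) for H in (0, 1))
                       for E in (0, 1))
                 for D in (0, 1))
        total += sB * p(G, pG[C]) * sD
    return total
```

The nested form touches far fewer terms: each inner sum is built once per value of its remaining variables rather than once per full joint assignment.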
![Page 104: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/104.jpg)
Stochastic Simulation
Idea: probabilities ↔ samples
Get probabilities from samples:

X      count        probability
x1     n1           n1/m
...    ...          ...
xk     nk           nk/m
total  m

If we could sample from a variable's (posterior) probability, we could estimate its (posterior) probability.
![Page 105: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/105.jpg)
Generating samples from a distribution
For a variable X with a discrete domain or a (one-dimensional) real domain:

- Totally order the values of the domain of X.
- Generate the cumulative probability distribution: f(x) = P(X ≤ x).
- Select a value y uniformly in the range [0, 1].
- Select the smallest x such that f(x) ≥ y.
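For a discrete domain this is a few lines of Python (a sketch of the procedure above; the function name `sample_discrete` is invented):

```python
import random

def sample_discrete(values, probs, rng=random):
    """Inverse-CDF sampling: draw y ~ U[0, 1] and return the first value
    whose cumulative probability reaches y."""
    y = rng.random()
    cum = 0.0
    for x, p in zip(values, probs):
        cum += p
        if y <= cum:
            return x
    return values[-1]   # guard against floating-point round-off
```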
![Page 106: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/106.jpg)
Cumulative Distribution
[Figure: the distribution P(X) over values v1 . . . v4 (left) and its cumulative distribution f(X) rising from 0 to 1 (right).]
![Page 107: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/107.jpg)
Forward sampling in a belief network
- Sample the variables one at a time; sample the parents of X before sampling X.
- Given values for the parents of X, sample from the probability of X given its parents.
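For the fire-alarm network this amounts to sampling in topological order (a sketch, not slide code; CPTs from page 70, names abbreviated):

```python
import random

def flip(p_true, rng):
    return rng.random() < p_true

def forward_sample(rng):
    """One forward sample: each variable is sampled after its parents."""
    ta = flip(0.02, rng)
    fi = flip(0.01, rng)
    al = flip({(True, True): 0.5, (True, False): 0.99,
               (False, True): 0.85, (False, False): 0.0001}[(fi, ta)], rng)
    sm = flip(0.9 if fi else 0.01, rng)
    le = flip(0.88 if al else 0.001, rng)
    re = flip(0.75 if le else 0.01, rng)
    return {"Ta": ta, "Fi": fi, "Al": al, "Sm": sm, "Le": le, "Re": re}

rng = random.Random(0)
samples = [forward_sample(rng) for _ in range(20000)]
p_fire = sum(s["Fi"] for s in samples) / len(samples)   # ≈ P(fire) = 0.01
```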
![Page 108: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/108.jpg)
Rejection Sampling
To estimate a posterior probability given evidence Y1 = v1 ∧ . . . ∧ Yj = vj:

- Reject any sample that assigns Yi to a value other than vi.
- The non-rejected samples are distributed according to the posterior probability:

P(α | evidence) ≈ (∑_{sample ⊨ α} 1) / (∑_sample 1)

where we consider only samples consistent with the evidence.
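A sketch of rejection sampling on the fire-alarm network (not slide code; CPTs from page 70, names abbreviated), estimating the query from the next slide:

```python
import random

def flip(p_true, rng):
    return rng.random() < p_true

def forward_sample(rng):
    ta = flip(0.02, rng)
    fi = flip(0.01, rng)
    al = flip({(True, True): 0.5, (True, False): 0.99,
               (False, True): 0.85, (False, False): 0.0001}[(fi, ta)], rng)
    sm = flip(0.9 if fi else 0.01, rng)
    le = flip(0.88 if al else 0.001, rng)
    re = flip(0.75 if le else 0.01, rng)
    return ta, sm, re

rng = random.Random(0)
n = 200000
used = tampering = 0
for _ in range(n):
    ta, sm, re = forward_sample(rng)
    if sm and re:          # keep only samples consistent with the evidence
        used += 1
        tampering += ta

acceptance_rate = used / n     # ≈ P(sm, re)
estimate = tampering / used    # ≈ P(ta | sm, re)
```

With P(sm) = 0.02 and P(re | sm) = 0.32 from the next slide, P(sm ∧ re) ≈ 0.0064, so only about 6 in every 1000 samples are used; the rest are rejected, which is why rejection sampling is wasteful for unlikely evidence.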
![Page 109: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/109.jpg)
Rejection Sampling Example: P(ta|sm, re)
[Belief network as on page 69, with nodes abbreviated Ta, Fi, Al, Sm, Le, Re.]

Observe Sm = true, Re = true

       Ta     Fi     Al     Sm     Le     Re
s1     false  true   false  true   false  false  ✗
s2     false  true   true   true   true   true   ✓
s3     true   false  true   false  —      —      ✗
s4     true   true   true   true   true   true   ✓
. . .
s1000  false  false  false  false  —      —      ✗

P(sm) = 0.02
P(re | sm) = 0.32
How many samples are rejected?
How many samples are used?
![Page 115: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/115.jpg)
Importance Sampling
Samples have weights: a real number associated with each sample that takes the evidence into account.

The probability of a proposition is the weighted average of the samples:

  P(α | evidence) ≈ ∑_{sample ⊨ α} weight(sample) / ∑_{sample} weight(sample)

- don't sample all of the variables, but weight each sample according to the proposal distribution P(evidence | sample)
- sum out the variables which are neither observed nor sampled (exact inference)
- the proposal distribution should be as close as possible to the posterior distribution (which is unknown at sampling time)
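A likelihood-weighting sketch of the idea above, for contrast with rejection sampling: the evidence variables are not sampled, and each sample is weighted by P(evidence | sample). The CPT numbers are illustrative assumptions (only P(sm) = 0.02 and P(re|sm) = 0.32 echo the slides), and Re is assumed here to depend on Ta as well so that the weights actually vary.

```python
import random

def weighted_sample():
    """Sample only the unobserved variable Ta; weight by P(evidence | sample)."""
    ta = random.random() < 0.02      # assumed P(tampering)
    w = 0.02                         # P(Sm = true): Sm is observed, not sampled
    w *= 0.5 if ta else 0.32         # assumed P(Re = true | Sm = true, Ta)
    return ta, w

def likelihood_weighting(n):
    """Estimate P(ta | sm, re) as a weighted average; no sample is ever rejected."""
    num = den = 0.0
    for _ in range(n):
        ta, w = weighted_sample()
        den += w
        num += w * ta                # samples satisfying the query contribute
    return num / den

print(likelihood_weighting(50_000))
```

Every sample contributes, so the estimator remains usable even when the evidence itself is unlikely.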
Applications of Bayesian Networks
- modelling human multimodal perception
  - human sensor data fusion
  - top-down influences in human perception
- multimodal human-computer interaction
Human Sensor Data Fusion
two general strategies (Ernst and Bulthoff, 2004):
- sensory combination: maximize the information delivered from the different sensory modalities
- sensory integration: reduce the variance in the sensory estimate to increase its reliability
Sensor Data Fusion
sensory integration has to produce a coherent percept

Which modality is the dominating one?
- visual capture: e.g. vision dominates haptic perception
- auditory capture: e.g. number of auditory beeps vs. number of visual flashes

modality precision, modality appropriateness, estimate precision: the most precise modality wins
Sensor Data Fusion
two possible explanations:
- maximum likelihood estimation: weighted sum of the individual estimates
  - all cues contribute to the percept
- cue switching:
  - the most precise cue takes over
  - the less precise cues have no influence
Sensor Data Fusion
maximum likelihood estimate:
- weighted sum of the individual estimates
- weights are proportional to the inverse variances

  s = ∑_i w_i s_i  with  ∑_i w_i = 1

  w_i = (1/σ_i²) / ∑_j (1/σ_j²)

- the most reliable unbiased estimate possible (the estimate with minimal variance)
- optimality is not really required; a good approximation might be good enough
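The fusion rule above is easy to state in code. A minimal sketch, with made-up example values for a visual and a haptic estimate:

```python
def fuse(estimates, variances):
    """Inverse-variance weighted average: w_i = (1/var_i) / sum_j (1/var_j)."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    fused = sum((iv / total) * e for iv, e in zip(inv, estimates))
    return fused, 1.0 / total   # the fused estimate has variance 1 / sum_j (1/var_j)

# assumed example: vision reports 10.0 (variance 1.0), haptics 12.0 (variance 4.0)
s, var = fuse([10.0, 12.0], [1.0, 4.0])
print(s, var)   # 10.4 0.8 -- the percept is pulled toward the more reliable cue
```

Note that the fused variance (0.8) is smaller than either individual variance, which is exactly the point of integration.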
Sensor Data Fusion
overwhelming evidence for the role of estimate precision ...

weighting within modalities:
- visual depth perception: motion + disparity, texture + disparity
- visual perception of slant
- visual perception of distance
- haptic shape perception: force + position

cross-modal weighting:
- vision + audition
- vision + haptics
- vision + proprioception
Sensor Data Fusion
... but no really conclusive evidence for the reliability hypothesis

problem: estimating the variance of a stimulus
- requires an independence assumption
- difficult to achieve in a unimodal task
- cues within one modality are correlated
- → multi-modal experiments
Sensor Data Fusion
Ernst and Banks (2002): vision-haptic integration
- modifying the visual reliability by adding noise to the visual channel
- two extreme cases:
  - vision dominates (little noise)
  - haptics dominate (high noise)

→ perception requires dynamic adjustment of weights
→ the nervous system has online access to sensory reliabilities
Sensor Data Fusion
open question: Where do the estimates come from?

prior experience vs. on-line estimation during perception

on-line is more likely: observing the fluctuations of responses to a signal
- over some period of time
- across a population of independent neurons (population codes)
Top-Down Influence
perception is modulated by contextual factors, e.g. scene or object properties

How to model top-down influence?
- can be captured by prior probabilities
- prior probabilities can be integrated by means of Bayes' rule
- → Bayesian reasoning
Top-Down Influence
[Figure: Kersten and Yuille (2003)]
Multimodal Human-Computer Interaction
Socher, Sagerer, Perona (2000), Wachsmuth, Sagerer (2002)
- multi-modal human-machine interaction using
  - speech
  - vision
  - (pointing gestures)
- data fusion from different reference systems
  - spatial (vision) vs. temporal (speech)
  - language-based instruction: fusion on the level of concepts
Multimodal Human-Computer Interaction
- noisy and partial interpretation of the sensory signals
- dealing with referential uncertainty
- goal: cross-modal synergy
- sensory data: properties (color) and (spatial) relationships: degree-of-membership representation (fuzziness)
- combination using Bayesian networks
- estimating the probabilities by means of psycholinguistic experiments
  - How do humans categorize objects and verbalize object descriptions?
Multimodal Human-Computer Interaction
[Figure: Bayesian network for object identification. An instruction node (probability of having been mentioned), with attribute nodes for type, color, size, and shape, is combined with object nodes object1 ... objectn (probability of the categorization, each with type and color attributes) and a scene node (probability of being part of the scene) to yield the identified object (likelihood of the object being the intended one). All object nodes range over 23 categories: 3-holed bar, 5-holed bar, 7-holed bar, red cube, blue cube, ...]
Multimodal Human-Computer Interaction
more sophisticated fusion model (Wachsmuth, Sagerer 2002):
- solution to the correspondence problem using selection variables
- more adequate modelling of naming habits
Multimodal Human-Computer Interaction
results for object identification:

|                                  | correct input | noisy speech | noisy vision | noisy input |
|----------------------------------|---------------|--------------|--------------|-------------|
| recognition error rates          | –             | 15%          | 20%          | 15% + 20%   |
| identification rates             | 0.85          | 0.81         | 0.79         | 0.76        |
| decrease of identification rates | –             | 5%           | 7%           | 11%         |
synergy between vision and speech
higher robustness due to redundancy between modalities
Bayesian Models for Sequences
the world is dynamic:
- old information becomes obsolete
- new information is available
- the decisions an agent takes need to reflect these changes

the dynamics of the world can be captured by means of state-based models
Bayesian Models for Sequences
changes in the world are modelled as transitions between subsequent states

state transitions can be
- clocked, e.g.
  - speech: every 10 ms
  - vision: every 40 ms
  - stock market trends: every 24 hours
- triggered by external events
  - language: every other word
  - travel planning: potential transfer points
Bayesian Models for Sequences
main purpose:
- predicting the probability of the next event
- computing the probability of a (sub-)sequence

important application areas:
- speech and language processing, genome analysis, time series predictions (stock market, natural disasters, ...)
Markov chain
Markov chain: a special sort of belief network for sequential observations

  S0 → S1 → S2 → S3 → S4

Thus, P(St+1 | S0, . . . , St) = P(St+1 | St).

Intuitively, St conveys all of the information about the history that can affect the future states.

"The future is independent of the past given the present."
Stationary Markov chain
A stationary Markov chain is one where for all t > 0, t′ > 0: P(St+1 | St) = P(St′+1 | St′).

We specify P(S0) and P(St+1 | St).
- Simple model, easy to specify
- Often the natural model
- The network can extend indefinitely
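Because a stationary chain is fully specified by those two tables, sampling it takes only a few lines. A sketch with assumed weather states and made-up transition numbers:

```python
import random

P0 = {"rain": 0.5, "sun": 0.5}                 # P(S0)
T  = {"rain": {"rain": 0.7, "sun": 0.3},       # one table P(S_{t+1} | S_t),
      "sun":  {"rain": 0.2, "sun": 0.8}}       # reused at every time step

def draw(dist):
    """Draw a value from a discrete distribution given as {value: prob}."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value                               # guard against rounding error

def sample_chain(length):
    """Unroll the chain: each step depends only on the present state."""
    seq = [draw(P0)]
    for _ in range(length - 1):
        seq.append(draw(T[seq[-1]]))
    return seq

print(sample_chain(10))
```

The same two tables generate a sequence of any length, which is the "extends indefinitely" property above.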
Higher-order Markov Models
modelling dependencies of various lengths

bigrams:

  S0 → S1 → S2 → S3 → S4

trigrams: each state additionally depends on its pre-predecessor (arcs Si−2 → Si)

- three different time slices have to be modelled:
  - for S0: P(S0)
  - for S1: P(S1 | S0)
  - for all others: P(Si | Si−2 Si−1)
Higher-order Markov Models
quadrograms: P(Si | Si−3 Si−2 Si−1), i.e. each state depends on its three predecessors

- four different kinds of time slices required
- Markov models can be applied to predict the probability of the next event
- e.g. for speech and language processing, genome analysis, time series predictions (stock market, natural disasters, ...)
Markov Models
examples of Markov chains for German letter sequences

- unigrams: aiobnin*tarsfneonlpiitdregedcoa*ds*e*dbieastnreleeucdkeaitb*dnurlarsls*omn*keu**svdleeoieei* . . .
- bigrams: er*agepteprteiningeit*gerelen*re*unk*ves*mterone*hin*d*an*nzerurbom* . . .
- trigrams: billunten*zugen*die*hin*se*sch*wel*war*gen*man*nicheleblant*diertunderstim* . . .
- quadrograms: eist*des*nich*in*den*plassen*kann*tragen*was*wiese*zufahr* . . .
Hidden Markov Model
Often the observation does not deterministically dependon the state of the model
This can be captured by a Hidden Markov Model(HMM)
... even if the state transitions are not directly observable
Hidden Markov Model
An HMM is a belief network where states and observations are separated:

  S0 → S1 → S2 → S3 → S4
  ↓    ↓    ↓    ↓    ↓
  O0   O1   O2   O3   O4

- P(S0) specifies the initial conditions
- P(St+1 | St) specifies the dynamics
- P(Ot | St) specifies the sensor model
Hidden Markov Models
A Hidden Markov Model consists of
- S: a finite set of states si
- O: a finite set of observations oi
- transition probabilities T : S × S → R+
- emission probabilities E : S × O → R+
- prior probabilities (for the initial states): π : S → R+

HMMs are a special case of belief networks
→ arbitrary distributions can be computed by means of variable elimination

HMMs make strong assumptions about the model topology
→ special algorithms for inference are available
Inference with Hidden Markov Models
Evaluation: What is the probability of an HMM λ having produced an observation sequence O = (o1, o2, . . . , ot)?
Problem: the hidden state sequence S = (s1, s2, . . . , st) is not known.

  P(O|λ) = ∑_∀S P(O|S, λ) P(S|λ)

with

  P(O|S, λ) = ∏_{i=1}^{t} P(Oi = oi | Si = si) = E_{s1,o1} E_{s2,o2} . . . E_{st,ot}

  P(S|λ) = P(S1 = s1) ∏_{i=2}^{t} P(Si = si | Si−1 = si−1) = π_{s1} T_{s1,s2} T_{s2,s3} . . . T_{st−1,st}
Inference with Hidden Markov Models
Forward algorithm

- there are exponentially many state sequences
  → naive computation requires exponentially many multiplications: O(t · |S|^t)
- efficient calculation using a recursive reformulation based on the Markov property
- forward coefficients: α_k(s) = P(O_{1:k}, S_k = s | λ)
Inference with Hidden Markov Models
Forward algorithm

initialize: α_1(s) = π_s E_{s,o1} for all s ∈ S

repeat, for k = 1 to k = t − 1 and for all s ∈ S:

  α_{k+1}(s) = E_{s,o_{k+1}} ∑_{q∈S} α_k(q) T_{q,s}

aggregate:

  P(O|λ) = ∑_{s∈S} α_t(s)

complexity reduced to O(t · |S|²)
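The recursion transcribes directly into code. A sketch for a tiny assumed HMM with two states and two observation symbols (all numbers made up); the naive sum over all state sequences is included only as a cross-check:

```python
import itertools
import math

pi = {"A": 0.6, "B": 0.4}                                     # prior pi_s
T  = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}   # T[q][s]
E  = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}   # E[s][o]

def forward(obs):
    """P(O | lambda) via the forward coefficients alpha_k(s); O(t * |S|^2)."""
    alpha = {s: pi[s] * E[s][obs[0]] for s in pi}             # alpha_1
    for o in obs[1:]:
        alpha = {s: E[s][o] * sum(alpha[q] * T[q][s] for q in alpha)
                 for s in pi}                                  # alpha_{k+1}
    return sum(alpha.values())                                 # aggregate over s

def naive(obs):
    """The exponential sum over all |S|^t state sequences, for comparison."""
    return sum(pi[q[0]] * E[q[0]][obs[0]]
               * math.prod(T[q[i - 1]][q[i]] * E[q[i]][obs[i]]
                           for i in range(1, len(obs)))
               for q in itertools.product(pi, repeat=len(obs)))

obs = ["x", "y", "x"]
print(forward(obs), naive(obs))   # both compute the same probability
```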
Inference with Hidden Markov Models
Backward algorithm

initialize: β_t(s) = 1 for all s ∈ S

repeat, for k = t − 1 down to k = 1 and for all s ∈ S:

  β_k(s) = ∑_{q∈S} β_{k+1}(q) T_{s,q} E_{q,o_{k+1}}

aggregate:

  P(O|λ) = ∑_{s∈S} π_s β_1(s) E_{s,o1}

complexity reduced to O(t · |S|²)
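The same kind of toy HMM (two assumed states, made-up numbers) serves for the backward pass; the aggregate reproduces the same P(O|λ) as the forward algorithm:

```python
pi = {"A": 0.6, "B": 0.4}
T  = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
E  = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

def backward(obs):
    """P(O | lambda) via the backward coefficients beta_k(s)."""
    beta = {s: 1.0 for s in pi}                               # beta_t(s) = 1
    for o in reversed(obs[1:]):                               # k = t-1 down to 1
        beta = {s: sum(beta[q] * T[s][q] * E[q][o] for q in pi)
                for s in pi}                                  # beta_k
    return sum(pi[s] * beta[s] * E[s][obs[0]] for s in pi)    # aggregate

print(backward(["x", "y", "x"]))
```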
Inference with Hidden Markov Models
Explanation

- Filtering: P(S_k | O_{1:k})
  given an observation sequence O_{1:k} = (o1, o2, . . . , ok), compute the distribution of S_k
- Smoothing: P(S_j | O_{1:k}), j < k
  given an observation sequence O_{1:k}, compute the distribution of S_j, j < k
- Prediction: P(S_j | O_{1:k}), j > k
  given an observation sequence O_{1:k}, compute the distribution of S_j, j > k
- Decoding: S*_{1:k} = arg max_{S} P(S_{1:k} | O_{1:k})
  given an observation sequence O_{1:k}, compute the most likely state sequence S_{1:k}
Filtering
Filtering:

  P(S_i | o1, . . . , oi)

What is the current belief state based on the observation history?

  P(S_i | o1, . . . , oi) ∝ P(S_i, o1, . . . , oi)
                         = P(oi | S_i) P(S_i, o1, . . . , oi−1)
                         = P(oi | S_i) ∑_{S_{i−1}} P(S_i, S_{i−1}, o1, . . . , oi−1)
                         = P(oi | S_i) ∑_{S_{i−1}} P(S_i | S_{i−1}) P(S_{i−1}, o1, . . . , oi−1)
                         ∝ P(oi | S_i) ∑_{S_{i−1}} P(S_i | S_{i−1}) P(S_{i−1} | o1, . . . , oi−1)
Example (1): robot localization
Suppose a robot wants to determine its location based onits actions and its sensor readings: Localization
This can be represented by the augmented HMM:
  Loc0 → Loc1 → Loc2 → Loc3 → Loc4
  (each Obst depends on Loct ; each Actt influences the transition from Loct to Loct+1)
Example localization domain
Circular corridor, with 16 locations:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Doors at positions: 2, 4, 7, 11.
Noisy Sensors
Stochastic Dynamics
Robot starts at an unknown location and must determinewhere it is.
Example Sensor Model
P(Observe Door | At Door) = 0.8
P(Observe Door | Not At Door) = 0.1
Example Dynamics Model
- P(loct+1 = L | actiont = goRight ∧ loct = L) = 0.1
- P(loct+1 = L + 1 | actiont = goRight ∧ loct = L) = 0.8
- P(loct+1 = L + 2 | actiont = goRight ∧ loct = L) = 0.074
- P(loct+1 = L′ | actiont = goRight ∧ loct = L) = 0.002 for any other location L′
  - All location arithmetic is modulo 16.
  - The action goLeft works the same but to the left.
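Putting the sensor and dynamics models together, one filtering step for the 16-location corridor can be sketched as below. The probabilities are taken from the slides (the door-sensor values for "no door observed" are the complements of the given ones); the code layout itself is only an illustration:

```python
N = 16
DOORS = {2, 4, 7, 11}

def door_likelihood(observed_door):
    """P(observation | loc) for each location, from the sensor model."""
    if observed_door:
        return [0.8 if loc in DOORS else 0.1 for loc in range(N)]
    return [0.2 if loc in DOORS else 0.9 for loc in range(N)]

def predict_go_right(belief):
    """Apply P(loc_{t+1} | goRight, loc_t); location arithmetic is modulo 16."""
    new = [0.0] * N
    for loc, p in enumerate(belief):
        for delta in range(N):
            q = {0: 0.1, 1: 0.8, 2: 0.074}.get(delta, 0.002)
            new[(loc + delta) % N] += p * q
    return new

def filter_step(belief, observed_door):
    """One filtering step: predict under goRight, weight by the sensor, renormalize."""
    predicted = predict_go_right(belief)
    posterior = [p * l for p, l in zip(predicted, door_likelihood(observed_door))]
    z = sum(posterior)
    return [p / z for p in posterior]

belief = [1.0 / N] * N                          # robot starts at an unknown location
belief = filter_step(belief, observed_door=True)
print(belief)                                   # mass concentrates on the door locations
```

Repeating `filter_step` with further actions and observations gradually disambiguates the robot's position.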
Combining sensor information
the robot can have many (noisy) sensors for signals from the environment
- e.g. a light sensor in addition to the door sensor

Sensor Fusion: combining information from different sources

  Loc0 → Loc1 → Loc2 → Loc3 → Loc4, with actions Act0 . . . Act3 driving the transitions and two sensor nodes per time step

- Dt : door sensor value at time t
- Lt : light sensor value at time t
Example (2): Medical diagnosis
milk infection test (Jensen and Nielsen 2007)
- having test data for a certain period of time available, what's the probability that a cow is currently infected?

the probability of the test outcome depends on the cow being infected or not:

  Infected? → Test

the probability of the cow being infected also depends on the cow being infected on the previous day
- first-order Markov model:

  Inf1 → Inf2 → Inf3 → Inf4 → Inf5, with each Testi depending on Infi
Example dynamics
the probability of the cow being infected depends on the cow being infected on the two previous days
- incubation and infection periods of more than one day
- second-order Markov model:

  Inf1 → Inf2 → Inf3 → Inf4 → Inf5, with additional arcs Infi−2 → Infi and each Testi depending on Infi

- assumes only random test errors

weaker independence assumptions:
- more powerful model
- more data required for training
Refined models of the dynamics
the probability of the test outcome also depends on the cow's health and the test outcome on the previous day
- can also capture systematic test errors
- second-order Markov model for the infection
- first-order Markov model for the test results:

  Inf1 → Inf2 → Inf3 → Inf4 → Inf5 as before, with additional arcs Testi−1 → Testi
Decoding
What's the state sequence which most likely produced the observation?

- filtering and smoothing produce probability distributions for the values of a state variable
- choosing the value with the highest probability gives only a pointwise best estimation
  - a sequence of pointwise best estimations need not be the best state value sequence
  - the model need not even be able to produce the pointwise best sequence
Decoding
Viterbi coefficients:

  δ_k(s) = max_{S_{1:k−1}} P(S_{1:k} = (S_{1:k−1}, s), O_{1:k} | λ)

- δ_k(s) is the probability of the most likely path ending in state s and generating the observation sequence O_{1:k}
- because of the Markov property the computation simplifies to

  δ_{k+1}(s) = max_q (δ_k(q) T_{q,s}) E_{s,o_{k+1}}

- this corresponds to the principle of dynamic programming
- first compute the deltas in a forward pass
- afterwards reconstruct the best path by means of pointers pred from each state q_k to its most likely predecessor q_{k−1}
Decoding
Viterbi algorithm

initialize, for all s ∈ S:
- δ_1(s) = π_s E_{s,o1}
- pred_1(s) = null

repeat recursively:
- δ_{k+1}(s) = max_q (δ_k(q) T_{q,s}) E_{s,o_{k+1}}
- pred_{k+1}(s) = arg max_q (δ_k(q) T_{q,s})

select the most likely terminal state s*_t = arg max_s δ_t(s), with p = δ_t(s*_t) being the probability of the most likely path

reconstruct the most likely path backwards: q_k = pred_{k+1}(q_{k+1})
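A transcription of the algorithm above for a small assumed HMM (two states, two observation symbols, all numbers made up):

```python
pi = {"A": 0.6, "B": 0.4}
T  = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
E  = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

def viterbi(obs):
    """Most likely state sequence for obs, plus its probability."""
    delta = {s: pi[s] * E[s][obs[0]] for s in pi}            # delta_1
    preds = []                                               # pred_2 ... pred_t
    for o in obs[1:]:
        pred = {s: max(delta, key=lambda q: delta[q] * T[q][s]) for s in pi}
        delta = {s: delta[pred[s]] * T[pred[s]][s] * E[s][o] for s in pi}
        preds.append(pred)
    last = max(delta, key=delta.get)                         # most likely terminal state
    path = [last]
    for pred in reversed(preds):                             # follow the back-pointers
        path.append(pred[path[-1]])
    return path[::-1], delta[last]

path, p = viterbi(["x", "y", "y"])
print(path, p)   # the most likely state sequence and its probability
```

The structure mirrors the forward algorithm exactly, with the sum over predecessor states replaced by a max plus a back-pointer.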
Decoding
Viterbi algorithm
- is similar to the forward algorithm
- uses maximization instead of summation
- has many applications in signal processing, pattern recognition, biocomputing, natural language processing, etc.
  - message reconstruction for noisy wireless communication
  - speech recognition / speech synthesis
  - machine translation
  - Swype keyboards
  - intron/exon detection
  - ...
c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 168
![Page 169: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/169.jpg)
Decoding
Viterbi algorithm
is similar to the forward algorithm
uses maximization instead of summation
has many applications in signal processing, patternrecognition, biocomputing, natural language processing,etc.I message reconstruction for noisy wireless communicationI speech recognition / speech synthesisI machine translationI swype keyboardsI intron/exon detectionI ...
c©D. Poole, A. Mackworth 2010, W. Menzel 2015 Artificial Intelligence, Chapter 8, Page 169
![Page 170: Chapter 8: Reasoning under Uncertainty - uni-hamburg.de](https://reader035.vdocuments.us/reader035/viewer/2022071312/62cd236286581275024156ce/html5/thumbnails/170.jpg)
Example (3): Part-of-Speech tagging

elementary procedure for Natural Language Processing

- annotating the word forms in a sentence with part-of-speech information:
  Yesterday/RB the/DT school/NNS was/VBD closed/VBN
- topic areas: He did some field work.
  field(military), field(agriculture), field(physics), field(social sci.), field(optics), ...
- semantic roles:
  The winner [Beneficiary] received the trophy [Theme] at the town hall [Location]
Example (3): Part-of-Speech tagging

- sequence labelling problem:
  the label depends on the current state and the most recent history
- one-to-one correspondence between states, tags, and word forms
Example dynamics

- causal (generative) model of the sentence generation process:
  - tags are assigned to states
  - the underlying state (tag) sequence produces the observations (word forms)
- typical independence assumptions:
  - trigram probabilities for the state transitions
  - word form probabilities depend only on the current state

[Figure: tag sequence Tag_1 ... Tag_5, each Tag_i emitting the observation Word_i]
Example dynamics

weaker independence assumption (stronger model):
- the probability of a word form also depends on the previous and subsequent state

[Figure: tag sequence Tag_1 ... Tag_5, each Word_i now depending on the neighbouring tags as well]
Two alternative graphical representations

- influence diagrams, belief networks, Bayesian networks, causal networks, graphical models, ...
- state transition diagrams (probabilistic finite state machines)

                            Bayesian networks                    State transition diagrams
  state nodes               variables with states as values      model states
  edges into state nodes    causal influence                     possible state transitions
                                                                 and their probabilities
  # state nodes             length of the observation sequence   # model states
  observation nodes         variables with observations          observation values
                            as values
  edges into observ. nodes  conditional probability tables       conditional probabilities
Two alternative graphical representations

Bigram tagging as a Bayesian network:

[Figure: chain Tag_1 → Tag_2 → ... → Tag_5, each Tag_i emitting Word_i]

- possible state transitions are not directly visible:
  indirectly encoded in the conditional probability tables
- sometimes state transition diagrams are better suited to illustrate the model topology
Two alternative graphical representations

Bigram tagging as a state transition diagram (can only be depicted for bigram models):

[Figure: four fully connected states t_1 ... t_4, each emitting word forms w_1 ... w_n]

ergodic model: full connectivity between all states
Example (4): Speech Recognition

- similar problem: Swype gesture recognition
- observation subsequences of unknown length are mapped to one label
  → alignment problem
- full connectivity is not needed: a phone/syllable/word realization cannot be reversed
Example dynamics

possible model topologies for phones (only transitions depicted):

    P(1|0)  P(1|1)    0       0      0
    P(2|0)  P(2|1)  P(2|2)    0      0
      0     P(3|1)  P(3|2)  P(3|3)   0
      0       0     P(4|2)  P(4|3)   0

    P(1|0)  P(1|1)    0       0      0
      0     P(2|1)  P(2|2)    0      0
      0     P(3|1)  P(3|2)  P(3|3)   0
      0       0       0     P(4|3)   0

    P(1|0)  P(1|1)    0       0      0
      0     P(2|1)  P(2|2)    0      0
      0       0     P(3|2)  P(3|3)   0
      0       0       0     P(4|3)   0

the more data is available, the more sophisticated (and powerful) the models that can be trained
Model composition

composition of submodels on multiple levels:
- phone models have to be concatenated into word models
- word models are concatenated into utterance models

[Figure: phone models [f], [a], [n] concatenated into the word model [f a n]]
Dynamic Bayesian Networks

- using complex state descriptions, encoded by means of features:
  the model can be in "different states" at the same time
- more efficient implementation of state transitions
- modelling of transitions between sub-models
- factoring out different influences on the outcome:
  interplay of several actuators (muscles, motors, ...)
- modelling partly asynchronized processes:
  - coordinated movement of different body parts (e.g. sign language)
  - synchronization between speech sounds and lip movements
  - synchronization between speech and gesture
  - ...
Dynamic Bayesian Networks

- problem: state-transition probability tables are sparse:
  they contain a large number of zero probabilities
- alternative model structure: separation of state and transition variables

[Figure: per time slice, deterministic state variables, stochastic transition variables, and observation variables]

- causal links can be stochastic or deterministic:
  - stochastic: conditional probabilities to be estimated
  - deterministic: to be specified manually (decision trees)
Dynamic Bayesian Networks

- state variables:
  - distinct values for each state of the corresponding HMM
  - the value at slice t + 1 is a deterministic function of the state and the transition of slice t
- transition variables:
  - probability distribution over which arc to take to leave a state of the corresponding HMM
  - the number of values is the outdegree of the corresponding state in the HMM
- using transition variables is more efficient than using stochastic state variables with zero probabilities for the impossible state transitions
Dynamic Bayesian Networks

composite models: some applications require a model to be composed out of sub-models:
- speech: phones → syllables → words → utterances
- vision: sub-parts → parts → composites
- genomics: nucleotides → amino acids → proteins
Dynamic Bayesian Networks

composite models:
- the length of the sub-segments is not known in advance
- naive concatenation would require generating all possible segmentations of the input sequence

[Figure: sub-model for /n/ followed by sub-model for /ow/, showing the evolution of articulation and the acoustic emission]

- which sub-model to choose next?
Dynamic Bayesian Networks

additional sub-model variables select the next sub-model to choose:

[Figure: per time slice, sub-model index variables, stochastic transition variables, sub-model state variables, and observation variables]

- sub-model index variables: which submodel to use at each point in time
- sub-model index and transition variables model legal sequences of sub-models (control layer)
- several control layers can be combined
Dynamic Bayesian Networks

factored models (1): factoring out different influences on the observation

- e.g. articulation: asynchronous movement of articulators (lips, tongue, jaw, ...)

[Figure: a state variable driving several articulator variables, which jointly produce the observation]

- if the data is drawn from a factored source, full DBNs are superior to the special case of HMMs
Dynamic Bayesian Networks

factored models (2): coupling of different input channels
- e.g. acoustic and visual information in speech processing

naïve approach (1): data level fusion

[Figure: a single state chain over mixtures emitting the fused observation]

→ too strong synchronisation constraints
Dynamic Bayesian Networks

naïve approach (2): independent input streams

[Figure: two separate chains, one for the acoustic channel and one for the visual channel]

→ no synchronisation at all
Dynamic Bayesian Networks

product model:

[Figure: a single state chain over mixtures emitting both the visual and the acoustic channel]

- state values are taken from the cross product of acoustic and visual states
- large probability distributions have to be trained
Dynamic Bayesian Networks

factorial model (Nefian et al. 2002):

[Figure: two hidden state chains (factor 1, factor 2) jointly emitting the acoustic and visual channels through mixtures]

- independent (hidden) states
- indirect influence by means of the "explaining away" effect
- loose coupling of input channels
Dynamic Bayesian Networks

- inference is extremely expensive:
  - nodes are connected across slices
  - domains are not locally restricted
  - cliques become intractably large
- but: the joint distribution usually need not be computed:
  - only maximum detection is required
  - finding the optimal path through a lattice
  - dynamic programming can be applied (Viterbi algorithm)
Learning of Bayesian Networks

- estimating the probabilities for a given structure:
  - for complete data:
    - maximum likelihood estimation
    - Bayesian estimation
  - for incomplete data:
    - expectation maximization
    - gradient descent methods
- learning the network structure
Maximum Likelihood Estimation

likelihood of the model M given the (training) data D:

    L(M|D) = ∏_{d∈D} P(d|M)

log-likelihood:

    LL(M|D) = ∑_{d∈D} log2 P(d|M)

choose among several possible models for describing the data according to the principle of maximum likelihood:

    Θ̂ = arg max_Θ L(M_Θ|D) = arg max_Θ LL(M_Θ|D)

the models only differ in the set of parameters Θ
Maximum Likelihood Estimation

complete data: estimating the parameters by counting

    P(A = a) = N(A = a) / ∑_{v∈dom(A)} N(A = v)

    P(A = a | B = b, C = c) = N(A = a, B = b, C = c) / N(B = b, C = c)
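Counting estimates of this form can be sketched directly in Python; the small weather data set below is made up for illustration:

```python
from collections import Counter

# Toy complete data set of (A, B) cases; variable names and values
# are illustrative only.
data = [("rain", "cloudy"), ("rain", "cloudy"),
        ("dry", "cloudy"), ("dry", "clear")]

joint = Counter(data)                  # N(A = a, B = b)
marg_b = Counter(b for _, b in data)   # N(B = b)

# maximum likelihood estimate P(A = a | B = b) = N(A = a, B = b) / N(B = b)
def p_a_given_b(a, b):
    return joint[(a, b)] / marg_b[b]

print(p_a_given_b("rain", "cloudy"))  # 2/3 ≈ 0.667
```

Note that an unseen combination yields probability 0 here, which is exactly the sparse-data problem discussed on the next slide.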
Rare events

sparse data results in pessimistic estimates for unseen events:
- if the count for an event in the data base is 0, the event is considered impossible by the model
- in many applications most events will never be observed, irrespective of the sample size
Rare events

Bayesian estimation: using an estimate of the prior probability as the starting point for counting
- estimation of maximum a posteriori parameters
- no zero counts can occur
- if nothing else is available, use a uniform distribution as the prior
- Bayesian estimate in the binary case with a uniform prior:

      P(yes) = (n + 1) / (n + m + 2)

  n: counts for yes, m: counts for no
- effectively adds virtual counts to the estimate
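A minimal sketch of this estimator (the counts passed in the calls are illustrative):

```python
# Bayesian estimate for a binary variable with a uniform prior:
# P(yes) = (n + 1) / (n + m + 2), n = counts for yes, m = counts for no.
def bayes_estimate(n, m):
    return (n + 1) / (n + m + 2)

print(bayes_estimate(0, 0))  # 0.5: no data yields the uniform prior
print(bayes_estimate(1, 0))  # 2/3: a single "yes" shifts the estimate
print(bayes_estimate(0, 5))  # 1/7: five "no"s still leave P(yes) > 0
```

The two virtual counts (one yes, one no) are what keep the estimate away from 0 and 1.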
Rare events

alternative: smoothing as a post-processing step

- remove probability mass from the frequent observations ...
- ... and distribute it to the unobserved ones:
  - floor method
  - discounting
  - ...
Rare events

Backoff: interpolate with the estimates of a less sophisticated model, e.g. combine trigram probabilities with bigram or unigram probabilities:

    P̂(o_n | o_{n−2}, o_{n−1}) =
        c_3 P(o_n | o_{n−2}, o_{n−1}) + c_2 P(o_n | o_{n−1}) + c_1 P(o_n)

good/acceptable coefficients c_i can be estimated on held-out data
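The interpolation above can be sketched as follows; the coefficients and component probabilities are made-up values (in practice the c_i are tuned on held-out data and sum to one):

```python
# Linear interpolation of trigram, bigram, and unigram estimates.
def interpolated(p_tri, p_bi, p_uni, c3=0.6, c2=0.3, c1=0.1):
    return c3 * p_tri + c2 * p_bi + c1 * p_uni

# An unseen trigram (p_tri = 0) still receives probability mass
# from the bigram and unigram components.
print(interpolated(0.0, 0.2, 0.05))  # 0.6*0 + 0.3*0.2 + 0.1*0.05 = 0.065
```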
Incomplete Data

- missing at random (MAR):
  - the probability that a value is missing depends only on the observed values
  - e.g. confirmation measurements: values are available only if the preceding measurement was positive/negative
- missing completely at random (MCAR):
  - the probability that a value is missing is also independent of the value itself
  - e.g. stochastic failures of the measurement equipment
  - e.g. hidden/latent variables (mixture coefficients of a Gaussian mixture distribution)
- nonignorable:
  - neither MAR nor MCAR
  - the probability depends on the unseen values, e.g. exit polls for extremist parties
Expectation Maximization

estimating the underlying distribution of variables that are not directly observable

- expectation:
  - "complete" the data set using the current estimate h = Θ to calculate expectations for the missing values
  - applies the model to be learned (Bayesian inference)
- maximization:
  - use the "completed" data set to find a new maximum likelihood estimate h′ = Θ′
Expectation Maximization

- the full data consists of tuples ⟨x_i1, ..., x_ik, z_i1, ..., z_il⟩; only the x_ij can be observed
- training data: X = x_1, ..., x_m
- hidden information: Z = z_1, ..., z_m
- parameters of the distribution to be estimated: Θ
- Z can be treated as a random variable with p(Z) = f(Θ, X)
- full data: Y = {y | y = x_i || z_i}
- hypothesis h of Θ, to be revised into h′
Expectation Maximization

goal of EM: h′ = arg max_{h′} E(log2 p(Y|h′))

define a function Q(h′|h) = E(log2 p(Y|h′) | h, X)

Estimation (E) step:
calculate Q(h′|h) using the current hypothesis h and the observed data X to estimate the probability distribution over Y:

    Q(h′|h) ← E(log2 p(Y|h′) | h, X)

Maximization (M) step:
replace hypothesis h by the h′ that maximizes the function Q:

    h ← arg max_{h′} Q(h′|h)
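The E/M alternation can be sketched on a classic toy problem: a mixture of two coins of unknown biases, where the coin used for each row of flips is the hidden variable Z. The data and initial hypothesis below are illustrative:

```python
# EM for a two-coin mixture: each row of flips comes from one of two coins
# with unknown biases (tA, tB); which coin produced a row is hidden.

def em_two_coins(rows, theta=(0.6, 0.5), iters=25):
    tA, tB = theta
    for _ in range(iters):
        hA = tlA = hB = tlB = 0.0
        for row in rows:
            h = sum(row)
            t = len(row) - h
            # E step: responsibility of coin A for this row under the
            # current hypothesis (uniform prior over the two coins)
            lA = tA ** h * (1 - tA) ** t
            lB = tB ** h * (1 - tB) ** t
            wA = lA / (lA + lB)
            hA += wA * h; tlA += wA * t
            hB += (1 - wA) * h; tlB += (1 - wA) * t
        # M step: maximum likelihood re-estimate from the "completed" data
        tA = hA / (hA + tlA)
        tB = hB / (hB + tlB)
    return tA, tB

rows = [[1,0,0,0,1,1,0,1,0,1], [1,1,1,1,0,1,1,1,1,1], [1,0,1,1,1,1,1,0,1,1],
        [1,0,1,0,0,0,1,1,0,0], [0,1,1,1,0,1,1,1,0,1]]
tA, tB = em_two_coins(rows)
print(round(tA, 2), round(tB, 2))  # converges to roughly 0.80 and 0.52
```

Each iteration performs one Bayesian-inference E step (the responsibilities wA) followed by one counting M step, and the likelihood never decreases.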
Expectation Maximization

- the expectation step requires applying the model to be learned (Bayesian inference)
- gradient ascent / hill-climbing search:
  - converges to the next local optimum
  - the global optimum is not guaranteed
Expectation Maximization

[Figure: likelihood surface with EM iterates climbing via successive maximizations of Q(h′|h)]

    Q(h′|h) ← E(ln p(Y|h′) | h, X)

    h ← arg max_{h′} Q(h′|h)

if Q is continuous, EM converges to a local maximum of the likelihood function P(Y|h′)
Learning the Network Structure

- the space of possible networks is extremely large (> O(2^n))
- a Bayesian network over a complete graph is always a possible answer, but not an interesting one (no modelling of independencies)
- problem of overfitting
- two approaches:
  - constraint-based learning
  - (score-based learning)
Constraint-based Structure Learning

- estimate the pairwise degree of independence using conditional mutual information
- determine the direction of the arcs between non-independent nodes
Estimating Independence

conditional mutual information:

    CMI(A, B | X) = ∑_X P(X) ∑_{A,B} P(A, B | X) log2 [ P(A, B | X) / (P(A | X) P(B | X)) ]

- two nodes are independent if CMI(A, B | X) = 0
- choose all pairs of nodes as non-independent where the significance of a χ²-test on the hypothesis CMI(A, B | X) = 0 is above a certain user-defined threshold
- high minimal significance level: more links are established
- the result is a skeleton of possible candidates for causal influence
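The CMI formula can be computed directly from a joint distribution table; a minimal sketch with binary variables (the distributions used for illustration are made up):

```python
import math

# CMI(A, B | X) from a joint table p[(a, b, x)] -> probability.
# Uses P(x) * P(a,b|x) = P(a,b,x) and
# P(a,b|x) / (P(a|x) P(b|x)) = P(a,b,x) P(x) / (P(a,x) P(b,x)).
def cmi(p):
    def marginal(keep):  # sum out the positions not named in `keep`
        out = {}
        for (a, b, x), pr in p.items():
            key = tuple(v for v, name in zip((a, b, x), "abx") if name in keep)
            out[key] = out.get(key, 0.0) + pr
        return out
    px, pax, pbx = marginal("x"), marginal("ax"), marginal("bx")
    return sum(pabx * math.log2(pabx * px[(x,)] / (pax[(a, x)] * pbx[(b, x)]))
               for (a, b, x), pabx in p.items() if pabx > 0)

# A and B independent given X: CMI = 0
indep = {(a, b, x): 0.125 for a in (0, 1) for b in (0, 1) for x in (0, 1)}
print(cmi(indep))  # 0.0

# A == B deterministically: CMI = 1 bit
dep = {(0, 0, 0): 0.25, (1, 1, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 1): 0.25}
print(cmi(dep))  # 1.0
```

In practice the probabilities would themselves be counting estimates from data, which is why the slide pairs the CMI with a significance test rather than checking for exact zero.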
Determining Causal Influence

Rule 1 (introduction of v-structures): given A − C and B − C but not A − B, introduce a v-structure A → C ← B if there exists a set of nodes X such that A is d-separated from B given X

[Figure: the skeleton A − C − B oriented into the v-structure A → C ← B]
Determining Causal Influence

Rule 2 (avoid new v-structures): when Rule 1 has been exhausted and there is a structure A → C − B but not A − B, then direct C → B

Rule 3 (avoid cycles): if A → B introduces a cycle in the graph, do A ← B

Rule 4 (choose randomly): if no other rule can be applied to the graph, choose an undirected link and give it an arbitrary direction
Determining Causal Influence

[Figure: worked example on a skeleton over the nodes A–G, oriented step by step by applying Rule 1, Rule 2, Rule 4, Rule 2, Rule 2, and Rule 4]
Determining Causal Influence

independence/non-independence candidates might contradict each other:

- ¬I(A,B), ¬I(A,C), ¬I(B,C), but I(A,B|C), I(A,C|B) and I(B,C|A)
- remove a link and build a chain out of the remaining ones

[Figure: the triangle over A, B, C reduced to a chain through C]

- uncertain region: different heuristics might lead to different structures
Determining Causal Influence

I(A,C), I(A,D), I(B,D)

[Figure: skeleton over A, B, C, D, and an alternative structure with an additional node E]

- the problem might be caused by a hidden variable E:
  E → B, E → C, A → B, D → C
Constraint-based Structure Learning

useful results can only be expected if:
- the data is complete
- no (unrecognized) hidden variables obscure the induced influence links
- the observations are a faithful sample of an underlying Bayesian network:
  - the distribution of cases in D reflects the distribution determined by the underlying network
  - the estimated probability distribution is very close to the underlying one
- the underlying distribution is recoverable from the observations
Constraint-based Structure Learning

example of an unrecoverable distribution:
- two switches: P(A = up) = P(B = up) = 0.5
- the light C is on exactly if both switches agree: P(C = on) = 1 if val(A) = val(B)
- → I(A,C), I(B,C)

[Figure: skeleton A − C − B]

problem: independence decisions are taken on individual links (CMI), not on complete link configurations

    P(C = on | A, B) = ( 1 0 )
                       ( 0 1 )
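This unrecoverability can be checked numerically; the sketch below computes the (unconditional) mutual information for each pair in the two-switch example:

```python
import math

# Two fair switches A, B; C = on exactly if A == B.
# Every pairwise dependence vanishes, so link-by-link independence tests
# cannot recover the structure, even though C is a function of (A, B).
states = [(a, b, int(a == b)) for a in (0, 1) for b in (0, 1)]  # prob 1/4 each

def mutual_information(pairs):
    # I(X;Y) for a uniform distribution over the listed (x, y) outcomes
    n = len(pairs)
    pxy, px, py = {}, {}, {}
    for x, y in pairs:
        pxy[(x, y)] = pxy.get((x, y), 0.0) + 1 / n
        px[x] = px.get(x, 0.0) + 1 / n
        py[y] = py.get(y, 0.0) + 1 / n
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

print(mutual_information([(a, c) for a, b, c in states]))  # 0.0: A, C independent
print(mutual_information([(b, c) for a, b, c in states]))  # 0.0: B, C independent
```

Only the joint configuration of both parents reveals the dependence, which is exactly why pairwise CMI tests fail here.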