TRANSCRIPT
4/10/19
Lecture09a: Bayesian Networks, CS540, 3/26/19
Announcements: Mid-term Exam
Will be posted to the assignments page of the class web site immediately after class. Due on Thursday by the start of class (11:00 AM MST). All work solo. Submit your answers by email to [email protected]. I will respond when I have received your submission.
We will form teams for the next assignment next week
Why Bayesian Networks?
So far, we have studied classical AI
◦ Search and related topics
◦ Exploration, not learning
The current hot topic in AI is machine learning
◦ Tabula rasa approaches that assume nothing (well, little)
◦ … as if we had no knowledge to bring to the decision
◦ … also we have CS545 (and CS445) to cover this
Bayesian networks are an important middle ground◦ Created using knowledge + statistical learning
◦ Structured: the opposite of tabula rasa
Probabilistic Agent

[Figure: an agent connected to its environment through sensors and actuators, thinking: "I believe that the sun will still exist tomorrow with probability 0.999999, and that it will be sunny with probability 0.6"]
Problem
At a certain time t, the KB of an agent is some collection of beliefs
◦ Every belief has a degree of certainty
At time t the agent's sensors make an observation that changes the strength of one or more of its beliefs. How should the agent update the strength of its other beliefs?
Purpose of Bayesian Networks
Facilitate the description of a collection of beliefs
◦ make causality relations explicit
◦ exploit conditional independence
Provide efficient methods for:◦ Representing a joint probability distribution ◦ Updating belief strengths when new evidence is observed
Review
Probabilities represent a degree of belief
◦ Bayesian interpretation
◦ Subjective measure of internal confidence
◦ Frequentist interpretation
◦ Based on sampling from random draws
Probability values
◦ 0 ≤ p ≤ 1
◦ P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Review
Conditional Probability: P(A|B)
◦ The probability of A given that B is true
◦ The frequency with which A will be observed, considering only samples in which B is true
Bayes Rule
P(A|B) = P(B|A) P(A) / P(B) = P(A ∧ B) / P(B)

P(A|B) P(B) = P(B|A) P(A)
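As a quick numeric check of Bayes' rule, consider a small sketch (the disease/test numbers below are hypothetical, not from the lecture):

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
# Hypothetical numbers: a condition with prior P(A) = 0.01 and a test
# with P(B|A) = 0.9 (true positive) and P(B|~A) = 0.05 (false positive).
p_a = 0.01
p_b_given_a = 0.9
p_b_given_not_a = 0.05

# Total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: a positive test raises the belief from 1% to about 15%.
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))
```

Note how the posterior stays modest despite the accurate test: the low prior dominates, which is exactly the kind of belief update the rule formalizes.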
Bayesian networks, a.k.a.
Belief networks
Causal networks
Graphical models
Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by a minor earthquake. Is there a burglary?
Network topology reflects "causal" knowledge:
- A burglary can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
[Figure: a directed acyclic graph (DAG). The causes Burglary and Earthquake point to Alarm; Alarm points to the effects JohnCalls and MaryCalls]

Intuitive meaning of an arrow from x to y: "x has direct influence on y"
Nodes are random variables
A Simple Network
[Figure: the Burglary network, with each node annotated by its table below]

P(B) = 0.001        P(E) = 0.002

B  E  | P(A|B,E)
T  T  | 0.95
T  F  | 0.94
F  T  | 0.29
F  F  | 0.001

A | P(J|A)        A | P(M|A)
T | 0.90          T | 0.70
F | 0.05          F | 0.01

Conditional Probability Tables
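These CPTs are enough to reconstruct any entry of the full joint distribution, since a Bayesian network factors the joint as the product of each node's CPT entry given its parents. A minimal sketch in Python (the variable names are mine; the probabilities are from the tables above):

```python
# CPTs of the burglary network, taken from the tables above.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) by the chain rule along the DAG."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(j, m, a, ~b, ~e) = 0.90 * 0.70 * 0.001 * 0.999 * 0.998 ≈ 0.00063
print(joint(False, False, True, True, True))
```

Five small tables thus stand in for a joint table with 2⁵ = 32 entries; the saving grows exponentially with the number of variables.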
What the BN Encodes
Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm
The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm
[Figures: the Burglary network again, and a car network with nodes Battery, Radio, SparkPlugs, Gas, Starts, Moves, labeling connections as linear, converging, and diverging]
Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
1. A node on the path is linear and in E
2. A node on the path is diverging and in E
3. A node on the path is converging and neither this node, nor any descendant, is in E
Independence Relations in BN
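Condition 1 applies to the path Burglary → Alarm → JohnCalls: once Alarm is observed, Burglary gives no further information about JohnCalls. This can be checked by brute-force enumeration (a sketch; the CPT numbers are the slides', the helper functions are mine):

```python
from itertools import product

# CPTs from the burglary network (same numbers as the slides).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)

def joint(b, e, a, j):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    return P_B[b] * P_E[e] * pa * pj

def cond(j, a, b=None):
    """P(J=j | A=a [, B=b]) by summing the joint over the other variables."""
    num = den = 0.0
    for bb, e, jj in product([True, False], repeat=3):
        if b is not None and bb != b:
            continue
        p = joint(bb, e, a, jj)
        den += p
        if jj == j:
            num += p
    return num / den

# Given Alarm, also conditioning on Burglary changes nothing: both are 0.9.
print(round(cond(True, True), 6), round(cond(True, True, b=True), 6))
```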
Where are we?
We know the basics of probability, e.g.
◦ The probability of an event P(e)
◦ Conditional independence: P(e; f | x)
We know how to represent networks of causal events. But… how do we infer probabilities given a causal network and a set of observations?
If Mary calls and John calls, do I believe there was a burglary?
Inference in BN

Set E of evidence variables that are observed, e.g., {JohnCalls, MaryCalls}
Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution P(X|E)
Distribution conditional to the observations made

[Figure: the Burglary network with M = T and J = T observed; query P(B | …) = ?]
Inference in BN
Simplest case: a single edge A → B

For Boolean variables:
P(b) = P(b|a) P(a) + P(b|¬a) P(¬a)
P(¬b) = P(¬b|a) P(a) + P(¬b|¬a) P(¬a)
Note: we are summing over all possible values of A
Inference on a chain

A → B → C
For a chain with n nodes where each node has k possible values, the complexity is O(n·k²).
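The O(n·k²) bound comes from marginalizing one node at a time, so every intermediate factor stays one-dimensional. A sketch of this forward pass (the CPT numbers are invented for illustration):

```python
# Inference on a chain A -> B -> C -> ... with k values per node.
# Eliminating nodes front-to-back costs one k x k pass per edge,
# i.e. O(n * k^2) total, instead of O(k^n) for brute-force enumeration.

def forward(prior, cpts):
    """prior: P(X1) as a list of length k.
    cpts: list of k x k tables, cpts[i][x][y] = P(X_{i+1}=y | X_i=x)."""
    p = prior
    for cpt in cpts:
        # P(next = y) = sum_x P(next = y | x) * P(x)
        p = [sum(p[x] * cpt[x][y] for x in range(len(p)))
             for y in range(len(cpt[0]))]
    return p

prior = [0.6, 0.4]                        # P(A); made-up numbers
cpt = [[0.7, 0.3], [0.2, 0.8]]            # P(next | current), reused per edge
result = forward(prior, [cpt, cpt, cpt])  # chain A -> B -> C -> D
print(result, sum(result))                # a valid distribution over D
```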
A → B → C → D
We can perform variable elimination using a different order:
Variable Elimination
Suppose we're interested in P(X_k). Write the query in the form

P(X_k) = Σ … Σ Π_i P(x_i | Parents(X_i)), summing over all variables except X_k

Iteratively:
◦ Move all irrelevant terms outside of the innermost sum
◦ Perform the innermost sum, getting a new term
◦ Insert the new term into the product
Inference Ex. 2
[Figure: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass]
P(W) = Σ_{c,s,r} P(w|r,s) P(r|c) P(s|c) P(c)
     = Σ_{s,r} P(w|r,s) Σ_c P(r|c) P(s|c) P(c)
     = Σ_{s,r} P(w|r,s) f_C(r,s)
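The same rearrangement can be carried out mechanically in code. A sketch of this elimination (the CPT numbers are invented; the lecture only gives the network structure):

```python
from itertools import product

# Hypothetical CPTs for the Cloudy/Sprinkler/Rain/WetGrass network.
P_C = {True: 0.5, False: 0.5}
P_S = {True: 0.1, False: 0.5}            # P(S=T | C)
P_R = {True: 0.8, False: 0.2}            # P(R=T | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}   # P(W=T | S, R)

def p(table, val, *parents):
    """Look up P(X=val | parents) from a True-probability table."""
    q = table[parents if len(parents) > 1 else parents[0]]
    return q if val else 1 - q

# Step 1: eliminate C, producing the factor f_C(r, s).
f_C = {(r, s): sum(p(P_R, r, c) * p(P_S, s, c) * P_C[c] for c in (True, False))
       for r, s in product((True, False), repeat=2)}

# Step 2: P(W=w) = sum_{s,r} P(w | r, s) * f_C(r, s)
p_w = sum(p(P_W, True, s, r) * f_C[(r, s)]
          for r, s in product((True, False), repeat=2))

# Sanity check: brute-force enumeration over all assignments agrees.
brute = sum(P_C[c] * p(P_S, s, c) * p(P_R, r, c) * p(P_W, True, s, r)
            for c, s, r in product((True, False), repeat=3))
print(round(p_w, 6), round(brute, 6))
```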
A More Complex Example
The "Asia" network:

[Figure: Visit to Asia → Tuberculosis; Smoking → Lung Cancer and Bronchitis; Tuberculosis and Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest and Bronchitis → Dyspnea]
Example (cont)

[Network: V → T; S → L, B; T, L → A; A → X; A, B → D]

Suppose we want to compute P(d). The form of the joint distribution:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|b,a)
Example (cont)

Need to eliminate: v, s, x, t, l, a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|b,a)

Eliminate: v
Compute: f_v(t) = Σ_v P(v) P(t|v)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|b,a)

Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term.
Example (cont)

Need to eliminate: s, x, t, l, a, b

f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|b,a)

Eliminate: s
Compute: f_s(b,l) = Σ_s P(s) P(b|s) P(l|s)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|b,a)

Summing on s results in a factor with two arguments, f_s(b,l). In general, the result of elimination may be a function of several variables.
Example (cont)

Need to eliminate: x, t, l, a, b

f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|b,a)

Eliminate: x
Compute: f_x(a) = Σ_x P(x|a)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|b,a)

Note: f_x(a) = 1 for all values of a. In fact, every variable that is not an ancestor of the query or evidence variables can be ignored!
Example (cont)

Need to eliminate: t, l, a, b

f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|b,a)

Eliminate: t
Compute: f_t(a,l) = Σ_t f_v(t) P(a|t,l)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|b,a)
Example (cont)

Need to eliminate: l, a, b

f_s(b,l) f_x(a) f_t(a,l) P(d|b,a)

Eliminate: l
Compute: f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)
⇒ f_l(a,b) f_x(a) P(d|b,a)

Note: we are now using functions of eliminated variables to eliminate further variables.
Example (cont)

Need to eliminate: a, b

f_l(a,b) f_x(a) P(d|b,a)

Eliminate: a, b
Compute: f_a(b,d) = Σ_a f_l(a,b) f_x(a) P(d|b,a), then f_b(d) = Σ_b f_a(b,d)
⇒ f_a(b,d) ⇒ f_b(d)

Variable Elimination
We now understand variable elimination as a sequence of rewriting operations.
Computation depends on the order of elimination.
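One way to see why the order matters is to track only the scopes of the factors each order creates. A sketch (the star network and the helper function are mine, not from the lecture):

```python
# Each elimination multiplies together every factor mentioning the chosen
# variable, creating a new factor over all their remaining variables.
# Tracking factor *scopes* (sets of variables) is enough to compare orders.

def max_factor_size(factors, order):
    """Largest factor (variable count, including the summed one) created
    while eliminating the given variables in the given order."""
    factors = [set(f) for f in factors]
    worst = 0
    for var in order:
        touching = [f for f in factors if var in f]
        new = set().union(*touching) - {var}
        worst = max(worst, len(new) + 1)
        factors = [f for f in factors if var not in f] + [new]
    return worst

# A star network: hub X is a parent of leaves Y1..Y4.
factors = [{"X"}, {"X", "Y1"}, {"X", "Y2"}, {"X", "Y3"}, {"X", "Y4"}]
leaves_first = ["Y1", "Y2", "Y3", "Y4", "X"]  # small factors throughout
hub_first = ["X", "Y1", "Y2", "Y3", "Y4"]     # one factor over all leaves
print(max_factor_size(factors, leaves_first))
print(max_factor_size(factors, hub_first))
```

Eliminating the hub first creates a factor over every leaf at once; eliminating leaves first never builds a factor larger than a single edge. Since table size is exponential in the factor's variable count, the order change is the difference between linear and exponential cost.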
Dealing with Evidence

How do we deal with evidence?
Suppose we get evidence V = t, S = f, D = t and want to compute P(L | V = t, S = f, D = t):
V S
LT
A B
X D
P(L | V = t, S = f, D = t) = P(L, V = t, S = f, D = t) / P(V = t, S = f, D = t)
We start by writing the factors:
Since we know that V = t, we don't need to eliminate V. Instead, we can replace the factors P(V) and P(T|V) with:
These "select" the appropriate parts of the original factors given the evidence. Note that f_{P(V)} is a constant, and thus does not appear in elimination of other variables.
Dealing with Evidence

f_{P(V)} = P(V = t)        f_{P(T|V)}(T) = P(T | V = t)

Compute P(L, V = t, S = f, D = t)

Initial factors, after setting evidence:
f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|b,a)}(a,b)
Dealing with Evidence

Compute P(L, V = t, S = f, D = t)

Initial factors, after setting evidence:
f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|b,a)}(a,b)

Eliminating x:
f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) f_x(a) f_{P(d|b,a)}(a,b)
Dealing with Evidence

Initial factors, after setting evidence:
f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) P(x|a) f_{P(d|b,a)}(a,b)

Eliminating x, we get
f_{P(v)} f_{P(s)} f_{P(t|v)}(t) f_{P(l|s)}(l) f_{P(b|s)}(b) P(a|t,l) f_x(a) f_{P(d|b,a)}(a,b)

Eliminating t, we get
f_{P(v)} f_{P(s)} f_{P(l|s)}(l) f_{P(b|s)}(b) f_t(a,l) f_x(a) f_{P(d|b,a)}(a,b)

Eliminating a, we get
f_{P(v)} f_{P(s)} f_{P(l|s)}(l) f_{P(b|s)}(b) f_a(b,l)

Eliminating b, we get
f_{P(v)} f_{P(s)} f_{P(l|s)}(l) f_b(l)
Variable Elimination

Compute the probability of X_k given values for the evidence variables E.
The algorithm is the same as before, except that there is no need to perform summation over the evidence variables.
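Applied to the earlier burglary network, this answers the query from the start of the lecture ("If Mary calls and John calls, do I believe there was a burglary?"). A sketch, with the evidence variables fixed rather than summed over (CPT numbers are the slides'; the helper is mine):

```python
from itertools import product

# Posterior query with evidence: P(B | J=T, M=T) in the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def unnormalized(b):
    """Sum the joint over the hidden variables E and A with J = M = T fixed:
    the evidence variables are set by the observation, never summed over."""
    total = 0.0
    for e, a in product((True, False), repeat=2):
        pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        total += P_B[b] * P_E[e] * pa * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
posterior = scores[True] / (scores[True] + scores[False])
print(round(posterior, 3))   # both neighbors calling leaves burglary < 30%
```

The normalization at the end implements the division by P(V = t, S = f, D = t)-style evidence probability seen above: the unnormalized scores are the numerators, and their sum is the denominator.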
Complexity of inference
Theorem: Computing P(X = x) in a Bayesian network is NP-hard
Approaches to inference
Exact inference
◦ Variable elimination
◦ Join tree algorithm
Approximate inference
◦ Simplify the structure of the network to make exact inference efficient (variational methods, loopy belief propagation)
Probabilistic methods
◦ Stochastic simulation / sampling methods
◦ Markov chain Monte Carlo methods