Download - PGM 2002/03 Tirgul 4 Exact Inference
PGM 2002/03 Tirgul 4
Exact Inference
Inference in Simple Chains
How do we compute P(X2)?
X1 X2
11
)|()(),()( 121212xx
xxPxPxxPxP
Inference in Simple Chains (cont.)
How do we compute P(X3)?
we already know how to compute P(X2)...
X1 X2
22
)|()(),()( 232323xx
xxPxPxxPxP
X3
11
)|()(),()( 121212xx
xxPxPxxPxP
Inference in Simple Chains (cont.)
How do we compute P(Xn)?
Compute P(X1), P(X2), P(X3), … We compute each term by using the previous one
Complexity: Each step costs O(|Val(Xi)|*|Val(Xi+1)|) operations Compare to naïve evaluation, that requires summing
over joint values of n-1 variables
ix
iiii xxPxPxP )|()()( 11
X1 X2 X3Xn
...
Inference in Simple Chains (cont.)
Suppose that we observe the value of X2 =x2
How do we compute P(X1|x2)? Recall that we it suffices to compute P(X1,x2)
X1 X2
)()|(),( 11221 xPxxPxxP
Inference in Simple Chains (cont.)
Suppose that we observe the value of X3 =x3
How do we compute P(X1,x3)?
How do we compute P(x3|x1)?
X1 X2
)|()(),( 13131 xxPxPxxP
X3
2
22
)|()|(
),|()|()|,()|(
2312
2131213213
x
xx
xxPxxP
xxxPxxPxxxPxxP
Inference in Simple Chains (cont.)
Suppose that we observe the value of Xn =xn
How do we compute P(X1,xn)?
We compute P(xn|xn-1), P(xn|xn-2), … iteratively
X1 X2
)|()(),( 111 xxPxPxxP nn
X3
i
i
xinii
xiniin
xxPxxP
xxxPxxP
)|()|(
)|,()|(
11
11
Xn...
Inference in Simple Chains (cont.)
Suppose that we observe the value of Xn =xn
We want to find P(Xk|xn )
How do we compute P(Xk,xn )?
We compute P(Xk ) by forward iterations
We compute P(xn | Xk ) by backward iterations
X1 X2
)|()(),( knknk xxPxPxxP
Xk Xn......
Elimination in Chains
We now try to understand the simple chain example using first-order principles
Using definition of probability, we have
d c b a
edcbaPeP ),,,,()(
A B C ED
Elimination in Chains
By chain decomposition, we get
A B C ED
d c b a
d c b a
dePcdPbcPabPaP
edcbaPeP
)|()|()|()|()(
),,,,()(
Elimination in Chains
Rearranging terms ...
A B C ED
d c b a
d c b a
abPaPdePcdPbcP
dePcdPbcPabPaPeP
)|()()|()|()|(
)|()|()|()|()()(
Elimination in Chains
Now we can perform innermost summation
This summation, is exactly the first step in the forward iteration we describe before
A B C ED
d c b
d c b a
bpdePcdPbcP
abPaPdePcdPbcPeP
)()|()|()|(
)|()()|()|()|()(
X
Elimination in Chains
Rearranging and then summing again, we get
A B C ED
d c
d c b
d c b
cpdePcdP
bpbcPdePcdP
bpdePcdPbcPeP
)()|()|(
)()|()|()|(
)()|()|()|()(
X X
Elimination in Chains with Evidence
Similarly, we understand the backward pass
We write the query in explicit form
A B C ED
b c d
b c d
dePcdPbcPabPaP
edcbaPeaP
)|()|()|()|()(
),,,,(),(
Elimination in Chains with Evidence
Eliminating d, we get
A B C ED
b c
b c d
b c d
cePbcPabPaP
dePcdPbcPabPaP
dePcdPbcPabPaPeaP
)|()|()|()(
)|()|()|()|()(
)|()|()|()|()(),(
X
Elimination in Chains with Evidence
Eliminating c, we get
A B C ED
b
b c
b c
bepabPaP
cePbcPabPaP
cePbcPabPaPeaP
)|()|()(
)|()|()|()(
)|()|()|()(),(
XX
Elimination in Chains with Evidence
Finally, we eliminate b
A B C ED
)|()(
)|()|()(
)|()|()(),(
aePaP
bepabPaP
bepabPaPeaP
b
b
XXX
Variable Elimination
General idea: Write query in the form
Iteratively Move all irrelevant terms outside of innermost
sum Perform innermost sum, getting a new term Insert the new term into the product
kx x x i
iin paxPXP3 2
)|(),( e
A More Complex Example
Visit to Asia
Smoking
Lung CancerTuberculosis
Abnormalityin Chest
Bronchitis
X-Ray Dyspnea
“Asia” network:
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: v,s,x,t,l,a,b
Initial factors
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: v,s,x,t,l,a,b
Initial factors
Eliminate: v
Note: fv(t) = P(t)In general, result of elimination is not necessarily a probability term
Compute: v
v vtPvPtf )|()()(
),|()|(),|()|()|()()( badPaxPltaPsbPslPsPtfv
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: s,x,t,l,a,b
Initial factors
Eliminate: s
Summing on s results in a factor with two arguments fs(b,l)In general, result of elimination may be a function of several variables
Compute: s
s slPsbPsPlbf )|()|()(),(
),|()|(),|()|()|()()( badPaxPltaPsbPslPsPtfv
),|()|(),|(),()( badPaxPltaPlbftf sv
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: x,t,l,a,b
Initial factors
Eliminate: x
Note: fx(a) = 1 for all values of a !!
Compute: x
x axPaf )|()(
),|()|(),|()|()|()()( badPaxPltaPsbPslPsPtfv
),|()|(),|(),()( badPaxPltaPlbftf sv
),|(),|()(),()( badPltaPaflbftf xsv
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: t,l,a,b
Initial factors
Eliminate: t
Compute: t
vt ltaPtflaf ),|()(),(
),|()|(),|()|()|()()( badPaxPltaPsbPslPsPtfv
),|()|(),|(),()( badPaxPltaPlbftf sv
),|(),|()(),()( badPltaPaflbftf xsv
),|(),()(),( badPlafaflbf txs
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: l,a,b
Initial factors
Eliminate: l
Compute: l
tsl laflbfbaf ),(),(),(
),|()|(),|()|()|()()( badPaxPltaPsbPslPsPtfv
),|()|(),|(),()( badPaxPltaPlbftf sv
),|(),|()(),()( badPltaPaflbftf xsv
),|(),()(),( badPlafaflbf txs
),|()(),( badPafbaf xl
V S
LT
A B
X D
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
We want to compute P(d) Need to eliminate: b
Initial factors
Eliminate: a,bCompute:
b
aba
xla dbfdfbadpafbafdbf ),()(),|()(),(),(
),|()|(),|()|()|()()( badPaxPltaPsbPslPsPtfv
),|()|(),|(),()( badPaxPltaPlbftf sv
),|(),|()(),()( badPltaPaflbftf xsv
),|()(),( badPafbaf xl),|(),()(),( badPlafaflbf txs
)(),( dfdbf ba
Variable Elimination
We now understand variable elimination as a sequence of rewriting operations
Actual computation is done in elimination step
Exactly the same computation procedure applies to Markov networks
Computation depends on order of elimination We will return to this issue in detail
Dealing with evidence
How do we deal with evidence?
Suppose get evidence V = t, S = f, D = t We want to compute P(L, V = t, S = f, D = t)
V S
LT
A B
X D
Dealing with Evidence
We start by writing the factors:
Since we know that V = t, we don’t need to eliminate V Instead, we can replace the factors P(V) and P(T|V) with
These “select” the appropriate parts of the original factors given the evidence
Note that fp(V) is a constant, and thus does not appear in elimination of other variables
),|()|(),|()|()|()|()()( badPaxPltaPsbPslPvtPsPvP
V S
LT
A B
X D
)|()()( )|()( tVTPTftVPf VTpVP
Dealing with Evidence Given evidence V = t, S = f, D = t Compute P(L, V = t, S = f, D = t ) Initial factors, after setting evidence:
),()|(),|()()()( ),|()|()|()|()()( bafaxPltaPbflftfff badPsbPslPvtPsPvP
V S
LT
A B
X D
Dealing with Evidence Given evidence V = t, S = f, D = t Compute P(L, V = t, S = f, D = t ) Initial factors, after setting evidence:
Eliminating x, we get
),()|(),|()()()( ),|()|()|()|()()( bafaxPltaPbflftfff badPsbPslPvtPsPvP
V S
LT
A B
X D
),()(),|()()()( ),|()|()|()|()()( bafafltaPbflftfff badPxsbPslPvtPsPvP
Dealing with Evidence Given evidence V = t, S = f, D = t Compute P(L, V = t, S = f, D = t ) Initial factors, after setting evidence:
Eliminating x, we get
Eliminating t, we get
),()|(),|()()()( ),|()|()|()|()()( bafaxPltaPbflftfff badPsbPslPvtPsPvP
V S
LT
A B
X D
),()(),|()()()( ),|()|()|()|()()( bafafltaPbflftfff badPxsbPslPvtPsPvP
),()(),()()( ),|()|()|()()( bafaflafbflfff badPxtsbPslPsPvP
Dealing with Evidence Given evidence V = t, S = f, D = t Compute P(L, V = t, S = f, D = t ) Initial factors, after setting evidence:
Eliminating x, we get
Eliminating t, we get
Eliminating a, we get
),()|(),|()()()( ),|()|()|()|()()( bafaxPltaPbflftfff badPsbPslPvtPsPvP
V S
LT
A B
X D
),()(),|()()()( ),|()|()|()|()()( bafafltaPbflftfff badPxsbPslPvtPsPvP
),()(),()()( ),|()|()|()()( bafaflafbflfff badPxtsbPslPsPvP
),()()( )|()|()()( lbfbflfff asbPslPsPvP
Dealing with Evidence Given evidence V = t, S = f, D = t Compute P(L, V = t, S = f, D = t ) Initial factors, after setting evidence:
Eliminating x, we get
Eliminating t, we get
Eliminating a, we get
Eliminating b, we get
),()|(),|()()()( ),|()|()|()|()()( bafaxPltaPbflftfff badPsbPslPvtPsPvP
V S
LT
A B
X D
),()(),|()()()( ),|()|()|()|()()( bafafltaPbflftfff badPxsbPslPvtPsPvP
),()(),()()( ),|()|()|()()( bafaflafbflfff badPxtsbPslPsPvP
),()()( )|()|()()( lbfbflfff asbPslPsPvP
)()()|()()( lflfff bslPsPvP
x
kxkx yyxfyyf ),,,('),,( 11
m
ilikx i
yyxfyyxf1
,1,1,11 ),,(),,,('
Complexity of variable elimination
Suppose in one elimination step we compute
This requires multiplications
For each value for x, y1, …, yk, we do m multiplications
additions For each value of y1, …, yk , we do |Val(X)| additions
Complexity is exponential in number of variables in the intermediate factor!
i
iYXm )Val()Val(
i
iYX )Val()Val(
Numeric example: “Green” Network
Car
Rain
Waste
Gazlan Hiker
Pollution
R0 R1
P(Ri) 0.8 0.2
R0 R1
P(H0|R) 0.2 0.7P(H1|R) 0.8 0.3
R0 R1
P(G0|R) 0.1 0.3P(G1|R) 0.9 0.7
F0 F1
P(P0|C) 0.8 0.1P(P1|C) 0.2 0.9
C0 C1
P(C0|H) 0.95 0.4P(C1|H) 0.05 0.6
G0H0 G0H1 G1H0 G1H1
P(W0|G,H) 0.9 0.8 0.7 0.15P(W1|G,H) 0.1 0.2 0.3 0.85
Example:
Suppose we wish to calculate P(c0|p1,w0), using the algorithm shown in class.
Answer:
First, We’ll calculate P(C,p1,w0).
We have the initial factors:
P(R)=fR (R) = r1 0.2
r0 0.8
Initial Factors
P(G|R) = fG(G,R) = r0 g0 0.1
r0 g1 0.9
r1 g0 0.3
r1 g1 0.7
P(H|R) = fH(H,R) = h0 r0 0.2
h0 r1 0.7
h1 r0 0.8
h1 r1 0.3
Initial Factors (cont.)
P(w0|G,H) = fW(w0,G,H) = w0 h0 g0 0.9
w0 h0 g1 0.7
w0 h1 g0 0.8
w0 h1 g1 0.15
P(C|H) = fC(C,H) = h0 c0 0.95
h0 c1 0.05
h1 c0 0.4
h1 c1 0.6
Initial Factors (cont.)
P(p1|C) = FP(p1,C) = p1 c0 0.2
p1 c1 0.9
We will now choose an elimination order:
P(F,p1,w0) =
H R GWGRHCP ffffff
Calculation
The steps are: gwghr = fWXfG = w0 g0 h0 r0 0.9X0.1 = 0.09
w0 g0 h0 r1 0.9X0.3 = 0.27
w0 g0 h1 r0 0.8X0.1 = 0.08
w0 g0 h1 r1 0.8X0.3 = 0.24
w0 g1 h0 r0 0.7X0.9 = 0.63
w0 g1 h0 r1 0.7X0.7 = 0.49
w0 g1 h1 r0 0.15X0.9 = 0.135
w0 g1 h1 r1 0.15X0.7 = 0.105
Calculation (cont.)
mhwr = = w0 h0 r0 0.09+0.63 = 0.72
w0 h0 r1 0.27 + 0.49 = 0.76
w0 h1 r0 0.08+0.135 = 0.215
w0 h1 r1 0.24+0.105 = 0.345
gwhr = fH x fR x mhwr = w0 h0 r0 0.8x0.2x0.72 = 0.1152
w0 h0 r1 0.2x0.7x0.76 = 0.1064
w0 h1 r0 0.8x0.8x0.215 = 0.1376
w0 h1 r1 0.2x0.3x0.345 = 0.0207
G
wghrg
Calculation (cont.)
mhw = = w0 h0 0.1152+0.1064 = 0.2216
w0 h1 0.1376+0.0207 = 0.1583
gwhc = fC x mhw = w0 h0 c0 0.2216x0.95 = 0.21052
w0 h0 c1 0.2216x0.05= 0.01108
w0 h1 c0 0.1583x0.4 = 0.06332
w0 h1 c1 0.1583x0.6 = 0.09498
mwc = = w0 c0 0.21052+0.06332= 0.27384
w0 c1 0.01108+0.09498= 0.10606
R
whrg
H
whcg
Calculation (cont.)
gwcp = fP x mwc = w0 c0 p1 0.2x0.27384 = 0.054768
w0 c1 p1 0.8x0.10606 = 0.095454
Now, we have:
3646.0095454.0054768.0
054768.0
)w,P(p
)w,pP(c)w,p|P(c
01
010,010
The Computational Cost
Computing gwghr = fW X fG requires 8 multiplications.
Computing mhwr = requires 4 additions.
Computing gwhr = fH x fR x mhwr requires 8 multiplications.
Computing mhw = requires 2 additions.
Computing gwhc = fC x mhw requires 4 multiplications.
Computing mwc = requires 2 additions.
Computing gwcp = fP x mwc requires 2 multiplications.
For a total of: 8+8+4+2 = 22 multiplications
and 4+2+2 = 8 additions.
G
wghrg
R
whrg
H
whcg
Elimination order does matter
We can choose another elimination order, say: R,G,H
For a total of 16+4+4+2 = 26 multiplications,
and 4+2+2 = 8 additions.