PGM Ch. 4.1-4.2 notes
LAC group, 16/06/2011
So far...
Directed graphical models (Bayesian networks)
Useful because both the structure and the parameters provide a natural representation for many types of real-world domains.
This chapter...
Undirected graphical models
Useful in modelling phenomena where we cannot determine the directionality of the interaction between the variables.
Offer a different, simpler perspective on directed models (both independence structure & inference task)
This chapter...
Introduce a framework that allows both directed and undirected edges
Note: some of the results in this chapter require that we restrict attention to distributions over discrete state spaces.
Discrete vs. continuous: e.g. Boolean-valued vs. real-valued variables (see e.g. sec. 2.1.6)
The 4 students example
(The misconception example sec. 3.4.2, ex.3.8)
4 students who get together in pairs to work on their homework for a class. The pairs that meet are shown via the edges (lines) of this undirected graph:
A : Alice
B : Bobby
C : Charles
D : Debbie
[Figure: undirected graph on the nodes A, B, C, D with edges A-B, B-C, C-D and D-A]
The 4 students example
We want to model a distribution that satisfies the following independencies:
1) A is independent of C given B and D: $(A \perp C \mid \{B, D\})$
2) B is independent of D given A and C: $(B \perp D \mid \{A, C\})$
The 4 students example
PROBLEM 1:
If we try to model these independencies with a Bayesian network, we will be in trouble:
Any Bayesian network I-map of such a distribution will have extraneous edges
At least one of the desired independence statements will not be captured
The 4 students example
Any Bayesian network will require us to specify the directionality of the influence
Also: the interactions look symmetrical, and we would like to model this somehow without representing a direction of influence.
The 4 students example
SOLUTION 1:
Undirected graph = (here) Markov network structure
Nodes (circles) represent variables
Edges (lines) represent a notion of direct probabilistic interaction between the neighbouring variables, not mediated by any other variable in the network.
[Figure: the undirected graph on A, B, C, D]
The 4 students example
PROBLEM 2: How to parameterise this undirected graph?
A CPD (conditional probability distribution) is not useful, as the interaction is not directed
We would like to capture the affinities between the related variables, e.g. Alice and Bobby are more likely to agree than disagree
[Figure: the undirected graph on A, B, C, D]
The 4 students example
SOLUTION 2: Associate A and B with a general-purpose function, called a factor.
The 4 students example
Here we focus only on non-negative factors.
Factor: Let D be a set of random variables. We define a factor φ to be a function from Val(D) to ℝ. A factor is non-negative if all its entries are non-negative.
Scope: The set of variables D is called the scope of the factor and is denoted as Scope[φ].
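For discrete variables, a factor can be stored as a plain table with one entry per assignment in Val(D). The sketch below is only one possible encoding: the `make_factor` and `is_non_negative` helpers, the variable names `X` and `Y`, and the example values are all hypothetical and not taken from the slides.

```python
from itertools import product

# Hypothetical representation: a factor is a dict holding its scope and a
# table that maps every joint assignment of the scope to a real number.
def make_factor(scope, domains, value_fn):
    """Build a table factor by evaluating value_fn on every assignment in Val(scope)."""
    assignments = product(*(domains[v] for v in scope))
    return {"scope": tuple(scope), "table": {a: value_fn(*a) for a in assignments}}

def is_non_negative(factor):
    """A factor is non-negative if all its entries are non-negative."""
    return all(value >= 0 for value in factor["table"].values())

# Illustrative binary variables and an illustrative "agreement" factor.
domains = {"X": (0, 1), "Y": (0, 1)}
phi = make_factor(["X", "Y"], domains, lambda x, y: 2.0 if x == y else 0.5)

print(phi["scope"])          # plays the role of Scope[phi]
print(len(phi["table"]))     # one entry per assignment in Val(X, Y)
print(is_non_negative(phi))
```

A dictionary keyed by assignment tuples keeps the sketch close to the definition; a dense array would be the more usual choice for larger scopes.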
The 4 students example
Let's define a factor over A and B that captures the fact that Alice and Bob are more likely to agree than disagree:
φ1(A,B) : Val(A,B) → ℝ+
The value associated with a particular assignment a,b denotes the affinity between the two values: the higher the value of φ1(a,b), the more compatible the two values are.
The 4 students example
Fig 4.1/a shows one possible compatibility factor for A and B
Not normalised (see the partition function later on for how to do this)
φ1(A,B)
a0 b0 30
a0 b1 5
a1 b0 1
a1 b1 10
0: right, 1: wrong/has the misconception
The 4 students example
φ1(A,B) asserts that:
It is more likely that Alice and Bob agree (φ1(a0, b0), φ1(a1, b1)) than disagree: they are more likely to be either both right or both wrong
If they disagree, Alice is more likely to be right (φ1(a0, b1)) than Bob (φ1(a1, b0))
φ1(A,B)
a0 b0 30
a0 b1 5
a1 b0 1
a1 b1 10
0: right, 1: wrong/has the misconception
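The two claims about φ1 can be read directly off its entries. A minimal sketch, assuming the table is stored as a Python dict keyed by (a, b) with 0 = right and 1 = has the misconception (the variable names are illustrative):

```python
# Entries of phi1(A, B) as listed on the slide.
phi1 = {
    (0, 0): 30,  # both right
    (0, 1): 5,   # Alice right, Bob wrong
    (1, 0): 1,   # Alice wrong, Bob right
    (1, 1): 10,  # both wrong
}

agree = [phi1[(0, 0)], phi1[(1, 1)]]
disagree = [phi1[(0, 1)], phi1[(1, 0)]]

# Every "agree" entry has a higher affinity than every "disagree" entry.
agreement_preferred = min(agree) > max(disagree)

# Among the disagreeing assignments, the one where Alice is right weighs more.
alice_more_likely_right = phi1[(0, 1)] > phi1[(1, 0)]

print(agreement_preferred, alice_more_likely_right)
```

These are comparisons of affinities within one factor, not probabilities; the probabilities also depend on the other factors in the network.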
The 4 students example
φ3(C,D) asserts that: Charles and Debbie argue all the time and will end up disagreeing anyway, so φ3(c0, d1) and φ3(c1, d0) carry the highest values
φ3(C,D)
c0 d0 1
c0 d1 100
c1 d0 100
c1 d1 1
0: right, 1: wrong/has the misconception
The 4 students example
So far: defined the local interactions between variables/nodes/circles
Next step: define a global model. We need to combine these interactions, i.e. multiply them, as with a Bayesian network
The 4 students example
A possible GLOBAL MODEL:
P(a,b,c,d) = φ1(a, b) ∙ φ2(b, c) ∙ φ3(c, d) ∙ φ4(d, a)
PROBLEM: Nothing guarantees that the result is a normalised distribution (see fig. 4.2 middle column)
The 4 students example
SOLUTION: Take the product of the local factors and normalise it:
P(a,b,c,d) = 1/Z ∙ φ1(a, b) ∙ φ2(b, c) ∙ φ3(c, d) ∙ φ4(d, a)
where
Z = ∑_{a,b,c,d} φ1(a, b) ∙ φ2(b, c) ∙ φ3(c, d) ∙ φ4(d, a)
Z is a normalising constant known as the partition function:
"partition" comes from Markov random fields in statistical physics;
"function" because Z is a function of the parameters [important for machine learning]
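The normalisation step can be sketched by brute-force enumeration of all 16 assignments. These notes only list φ1 and φ3; the φ2 and φ4 entries below are an assumption, taken from figure 4.1 of the book rather than from the slides, and the function and variable names are illustrative.

```python
from itertools import product

# 0 = right, 1 = has the misconception.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # phi1(A, B), from the notes
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi2(B, C), ASSUMED (book fig. 4.1)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}   # phi3(C, D), from the notes
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi4(D, A), ASSUMED (book fig. 4.1)

def unnormalised(a, b, c, d):
    """Product of the four local factors for one joint assignment."""
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

assignments = list(product((0, 1), repeat=4))

# Partition function: sum of the unnormalised measure over all assignments.
Z = sum(unnormalised(*x) for x in assignments)

# Normalised joint distribution P(a, b, c, d).
P = {x: unnormalised(*x) / Z for x in assignments}

print(Z)
print(sum(P.values()))
```

Enumeration is exponential in the number of variables, so this only works as an illustration for tiny networks such as this one.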
The 4 students example
See figure 4.2 for the calculations of the joint distribution
Exercise: calculate the unnormalised measure and the normalised probability of the assignment a1,b1,c0,d1
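For the assignment a1,b1,c0,d1, the arithmetic can be sketched as follows. The φ2 and φ4 entries and the value of Z are assumptions here: they are taken from figures 4.1 and 4.2 of the book, since these notes list only φ1 and φ3. The symbol $\tilde{P}$ for the unnormalised measure is also introduced only for this sketch.

```latex
\begin{align*}
\tilde{P}(a^1, b^1, c^0, d^1)
  &= \phi_1(a^1, b^1)\,\phi_2(b^1, c^0)\,\phi_3(c^0, d^1)\,\phi_4(d^1, a^1) \\
  &= 10 \cdot 1 \cdot 100 \cdot 100 = 100{,}000 \\
P(a^1, b^1, c^0, d^1)
  &= \frac{\tilde{P}(a^1, b^1, c^0, d^1)}{Z}
   = \frac{100{,}000}{7{,}201{,}840} \approx 0.014
\end{align*}
```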
The 4 students example
We can use the normalised joint distribution (which requires the partition function) to answer questions like:
How likely is Bob to have the misconception?
How likely is Bob to have the misconception, given that Charles doesn’t?
The 4 students example
How likely is Bob to have the misconception?
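P(b1) ≈ 0.732
P(b0) ≈ 0.268
With the encoding used above (1: has the misconception), Bob has the misconception with probability of about 73% and is right with probability of about 27%.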
The 4 students example
How likely is Bob to have the misconception, given that Charles doesn’t?
P(b1|c0) ≈ 0.06
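Both queries reduce to summing entries of the joint distribution. A minimal sketch by enumeration, assuming the φ2 and φ4 entries from figure 4.1 of the book (they are not listed in these notes); the `prob` helper is hypothetical. The results should land close to the values quoted above, though the last decimal may differ depending on rounding and on the factor entries used.

```python
from itertools import product

# 0 = right, 1 = has the misconception.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # phi1(A, B), from the notes
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi2(B, C), ASSUMED (book fig. 4.1)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}   # phi3(C, D), from the notes
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi4(D, A), ASSUMED (book fig. 4.1)

unnorm = {
    (a, b, c, d): phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]
    for a, b, c, d in product((0, 1), repeat=4)
}
Z = sum(unnorm.values())
P = {x: v / Z for x, v in unnorm.items()}

def prob(event):
    """Probability of the set of assignments for which event(a, b, c, d) holds."""
    return sum(p for x, p in P.items() if event(*x))

# Marginal query: sum out A, C and D.
p_b1 = prob(lambda a, b, c, d: b == 1)

# Conditional query: P(b1 | c0) = P(b1, c0) / P(c0).
p_b1_given_c0 = prob(lambda a, b, c, d: b == 1 and c == 0) / prob(lambda a, b, c, d: c == 0)

print(round(p_b1, 3), round(p_b1_given_c0, 3))
```

Observing that Charles does not have the misconception pulls Bob's probability down sharply because φ2 strongly favours B and C taking the same value.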
The 4 students example
Advantages of this approach:
Allows great flexibility in representing interactions between variables.
We can change the nature of the interaction between A and B by simply modifying the entries in the factor, without caring about normalisation constraints or the interaction with other factors
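This flexibility can be illustrated by swapping in a different φ1 and letting the partition function absorb the change. The sketch below is an assumption-laden example: φ2 and φ4 are taken from figure 4.1 of the book, the replacement factor `phi1_new` is made up for illustration, and the `joint` helper is hypothetical.

```python
from itertools import product

def joint(phi1, phi2, phi3, phi4):
    """Return (P, Z) for the model phi1(A,B) * phi2(B,C) * phi3(C,D) * phi4(D,A)."""
    unnorm = {
        (a, b, c, d): phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]
        for a, b, c, d in product((0, 1), repeat=4)
    }
    Z = sum(unnorm.values())
    return {x: v / Z for x, v in unnorm.items()}, Z

phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # from the notes
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # ASSUMED (book fig. 4.1)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}   # from the notes
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # ASSUMED (book fig. 4.1)

P_old, Z_old = joint(phi1, phi2, phi3, phi4)

# Made-up replacement: Alice and Bob no longer have any direct affinity.
phi1_new = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
P_new, Z_new = joint(phi1_new, phi2, phi3, phi4)

# Only phi1 was edited; Z changes, and the result is still a distribution.
print(Z_old, Z_new)
print(sum(P_old.values()), sum(P_new.values()))
```

No entry of the other three factors had to be touched, which is the contrast with CPDs, where each conditional distribution must itself sum to one.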
The 4 students example
Tight connection between factorisation of the distribution and its independence properties:
Factorisation:
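3) $P \models (X \perp Y \mid Z)$ if we can write $P$ as:
$P(\mathcal{X}) = \phi_1(X, Z)\,\phi_2(Y, Z)$
(here $\mathcal{X}$ is the set of all variables, and $Z$ is a set of variables, not the partition function)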
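Why such a factorisation gives the independence can be sketched in a few lines, assuming discrete variables, that the variables split into the three sets $X$, $Y$, $Z$, and that the sums below are positive:

```latex
\begin{align*}
P(Z) &= \sum_{x, y} \phi_1(x, Z)\,\phi_2(y, Z)
      = \Big(\sum_{x} \phi_1(x, Z)\Big)\Big(\sum_{y} \phi_2(y, Z)\Big) \\
P(X \mid Z) &= \frac{\phi_1(X, Z)\sum_{y}\phi_2(y, Z)}{P(Z)}
             = \frac{\phi_1(X, Z)}{\sum_{x}\phi_1(x, Z)},
\qquad
P(Y \mid Z) = \frac{\phi_2(Y, Z)}{\sum_{y}\phi_2(y, Z)} \\
P(X, Y \mid Z) &= \frac{\phi_1(X, Z)\,\phi_2(Y, Z)}{P(Z)}
                = P(X \mid Z)\,P(Y \mid Z)
\end{align*}
```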
The 4 students example
Using the formula in 3) we can decompose the distribution in several ways e.g.
P(A,B,C,D) = [1/Z ∙ φ1(A, B) ∙ φ2(B, C)] ∙ φ3(C, D) ∙ φ4(A, D)
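and infer that
$P \models (B \perp D \mid A, C)$
and, by grouping the factors as [φ1(A, B) ∙ φ4(A, D)] ∙ [φ2(B, C) ∙ φ3(C, D)] instead,
$P \models (A \perp C \mid B, D)$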
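Both independencies can also be checked numerically on the example. This is a brute-force sketch, assuming the φ2 and φ4 entries from figure 4.1 of the book (not listed in these notes); the `marginal` and `cond_independent` helpers are hypothetical. It tests the identity P(x, y, z) ∙ P(z) = P(x, z) ∙ P(y, z) for every assignment.

```python
from itertools import product

phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # phi1(A, B), from the notes
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi2(B, C), ASSUMED (book fig. 4.1)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}   # phi3(C, D), from the notes
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi4(D, A), ASSUMED (book fig. 4.1)

unnorm = {
    (a, b, c, d): phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]
    for a, b, c, d in product((0, 1), repeat=4)
}
Z = sum(unnorm.values())
P = {x: v / Z for x, v in unnorm.items()}

def marginal(keep):
    """Marginal over the variables at positions keep (0 = A, 1 = B, 2 = C, 3 = D)."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_independent(i, j, given, tol=1e-9):
    """Check P(xi, xj, g) * P(g) == P(xi, g) * P(xj, g) for all assignments."""
    p_all = marginal((i, j) + given)
    p_i = marginal((i,) + given)
    p_j = marginal((j,) + given)
    p_g = marginal(given)
    return all(
        abs(p * p_g[key[2:]] - p_i[(key[0],) + key[2:]] * p_j[(key[1],) + key[2:]]) < tol
        for key, p in p_all.items()
    )

a_c_given_bd = cond_independent(0, 2, (1, 3))   # (A ⊥ C | B, D)
b_d_given_ac = cond_independent(1, 3, (0, 2))   # (B ⊥ D | A, C)
a_c_marginal = cond_independent(0, 2, ())       # (A ⊥ C) with nothing observed

print(a_c_given_bd, b_d_given_ac, a_c_marginal)
```

The last check is included for contrast: without conditioning on B and D, A and C influence each other through both paths of the loop, so the test is expected to fail.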