An introduction to Bayesian networks
Stochastic Processes Course
Hossein Amirkhani
Spring 2011
Outline
Introduction
Bayesian Networks
Probabilistic Graphical Models
Conditional Independence
I-equivalence
Introduction
Our goal is to represent a joint distribution over some set of random variables X = {X1, ..., Xn}.
Even in the simplest case where these variables are binary-valued, a joint distribution requires the specification of 2^n − 1 numbers.
The explicit representation of the joint distribution is unmanageable from every perspective: computationally, cognitively, and statistically.
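To make the blow-up concrete, a quick sketch in plain Python of how many free parameters an explicit joint over n binary variables needs:

```python
# An explicit joint over n binary variables has 2**n outcomes; the
# probabilities must sum to 1, leaving 2**n - 1 free parameters.
def full_joint_params(n):
    return 2**n - 1

for n in (5, 10, 20, 30):
    print(n, full_joint_params(n))
```

Already at 30 variables the table has over a billion entries, which is why a more compact representation is needed.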
Bayesian Networks
Bayesian networks exploit conditional independence properties of the distribution in order to allow a compact and natural representation.
They are a specific type of probabilistic graphical model. BNs are directed acyclic graphs (DAGs).
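To see the compactness gain, a minimal sketch, assuming a hypothetical chain-structured network X1 → X2 → ... → Xn over binary variables (each node conditions only on its single parent):

```python
# Parameter counts over n binary variables.
def full_joint_params(n):
    # Explicit joint table: 2**n outcomes, probabilities sum to 1.
    return 2**n - 1

def chain_bn_params(n):
    # Chain BN X1 -> X2 -> ... -> Xn: X1 needs 1 free number,
    # and each later node needs P(Xi | Xi-1), i.e. 2 free numbers.
    return 1 + (n - 1) * 2

print(chain_bn_params(20), full_joint_params(20))  # 39 vs 1048575
```

The exponential table collapses to a number of parameters linear in n once the conditional independencies are exploited.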
Probabilistic Graphical Models
Nodes are the random variables in our domain. Edges correspond, intuitively, to the direct influence of one node on another.
Examples of graphical models: factor graphs, Markov random fields, Bayesian networks.
Graphs are an intuitive way of representing and visualising the relationships between many variables.
A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like "Is A dependent on B given that we know the value of C?" just by looking at the graph.
Graphical models allow us to define general message-passing algorithms that implement probabilistic inference efficiently.
Graphical models = statistics × graph theory × computer science.
Conditional Independence: Example 1
tail-to-tail at c: the path a ← c → b. In general a and b are dependent, but observing c blocks the path, so a ⊥ b | c.
Example: Smoking is a tail-to-tail node influencing both Lung Cancer and Yellow Teeth. The two effects are correlated, but once we know whether the person smokes, Lung Cancer and Yellow Teeth become independent.
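The tail-to-tail independence can be checked numerically. A small sketch with made-up probabilities (all numbers below are illustrative, not from the slides) for Smoking (S), Lung Cancer (C), and Yellow Teeth (Y):

```python
# Toy CPTs (made-up numbers) for the graph C <- S -> Y.
p_s = {1: 0.3, 0: 0.7}                                      # P(S)
p_c_given_s = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.01, 0: 0.99}}  # P(C | S)
p_y_given_s = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.1, 0: 0.9}}    # P(Y | S)

def joint(s, c, y):
    # Factorization implied by the tail-to-tail structure.
    return p_s[s] * p_c_given_s[s][c] * p_y_given_s[s][y]

# Marginally, C and Y are dependent:
p_c1_y1 = sum(joint(s, 1, 1) for s in (0, 1))
p_c1 = sum(joint(s, 1, y) for s in (0, 1) for y in (0, 1))
p_y1 = sum(joint(s, c, 1) for s in (0, 1) for c in (0, 1))
print(p_c1_y1, p_c1 * p_y1)            # unequal

# Given S = 1, C and Y are independent:
p_c1_y1_s1 = joint(1, 1, 1) / p_s[1]
p_c1_s1 = (joint(1, 1, 0) + joint(1, 1, 1)) / p_s[1]
p_y1_s1 = (joint(1, 0, 1) + joint(1, 1, 1)) / p_s[1]
print(p_c1_y1_s1, p_c1_s1 * p_y1_s1)   # equal
```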
Conditional Independence: Example 2
head-to-tail at c: the path a → c → b. Observing c blocks the path, so a ⊥ b | c.
Example chain: Type of Car → Speed → Amount of Speeding Fine. Given the car's speed, the fine is independent of the type of car.
Conditional Independence: Example 3
head-to-head at c: the path a → c ← b, also called a v-structure. Here a and b are marginally independent, but observing c (or any of its descendants) unblocks the path and makes them dependent.
Example: Ability of team A and Ability of team B are independent a priori, but once the Outcome of the A vs. B game is observed they become dependent: a win for A can be explained either by A being strong or by B being weak (explaining away).
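The explaining-away effect can also be checked numerically. A toy model with made-up numbers: abilities A, B ∈ {0, 1} are independent fair coins, and team A wins (W = 1) with probability 0.5 + 0.3 · (a − b):

```python
# Toy v-structure A -> W <- B with illustrative (made-up) probabilities.
def p_w1(a, b):
    return 0.5 + 0.3 * (a - b)

def joint(a, b, w):
    p = 0.25 * p_w1(a, b)          # P(a) * P(b) * P(W=1 | a, b)
    return p if w == 1 else 0.25 - p

# After observing the outcome W = 1, A and B become dependent:
p_a1_w1 = sum(joint(1, b, 1) for b in (0, 1)) / \
          sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
p_a1_w1_b1 = joint(1, 1, 1) / sum(joint(a, 1, 1) for a in (0, 1))
print(p_a1_w1, p_a1_w1_b1)   # 0.65 vs ~0.714: learning B is strong raises our belief in A
```

Unconditionally P(A=1) = 0.5 for every value of B, but given the outcome, additionally learning B's ability changes the posterior over A.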
D-separation
• A, B, and C are non-intersecting subsets of nodes in a directed graph.
• A path from A to B is blocked if it contains a node such that either
a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.
• If all paths from A to B are blocked, A is said to be d-separated from B by C.
• If A is d-separated from B by C, the joint distribution over all the variables in the graph satisfies A ⊥ B | C.
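The blocking rules above can be turned into a reachability procedure. A sketch of the Reachable algorithm from Koller & Friedman (Algorithm 3.1), here specialized to testing a single pair of nodes:

```python
from collections import defaultdict, deque

def ancestors_of(edges, nodes):
    """All nodes with a directed path into `nodes`, plus `nodes` themselves."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(edges, x, y, z):
    """True iff x is d-separated from y given the set z.
    `edges` is a list of directed (parent, child) pairs."""
    z = set(z)
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
        children[u].add(v)
    anc_z = ancestors_of(edges, z)

    reachable, visited = set(), set()
    frontier = deque([(x, "up")])
    while frontier:
        node, direction = frontier.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in z:
            reachable.add(node)
        if direction == "up" and node not in z:
            for p in parents[node]:
                frontier.append((p, "up"))
            for c in children[node]:
                frontier.append((c, "down"))
        elif direction == "down":
            if node not in z:        # head-to-tail / tail-to-tail: blocked if node in z
                for c in children[node]:
                    frontier.append((c, "down"))
            if node in anc_z:        # head-to-head: active if node or a descendant is in z
                for p in parents[node]:
                    frontier.append((p, "up"))
    return y not in reachable

chain = [("a", "c"), ("c", "b")]       # a -> c -> b
vstruct = [("a", "c"), ("b", "c")]     # a -> c <- b
print(d_separated(chain, "a", "b", {"c"}))    # True
print(d_separated(vstruct, "a", "b", {"c"}))  # False
```

Note the asymmetry between the two cases: the chain is blocked by conditioning on c, while the v-structure is activated by it.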
I-equivalence
Let P be a distribution over X. We define I(P) to be the set of independence assertions of the form (X ⊥ Y | Z) that hold in P.
Two graph structures K1 and K2 over X are I-equivalent if I(K1) = I(K2).
The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes.
The skeleton of a Bayesian network
The skeleton of a Bayesian network graph G over X is an undirected graph over X that contains an edge {X, Y} for every edge (X → Y) in G.
Immorality
A v-structure X → Z ← Y is an immorality if there is no direct edge between X and Y.
Relationship between immorality, skeleton and I-equivalence
Let G1 and G2 be two graphs over X. Then G1 and G2 have the same skeleton and the same set of immoralities if and only if they are I-equivalent.
We can use this theorem to recognize whether two BNs are I-equivalent.
In addition, this theorem can be used for learning the structure of the Bayesian network related to a distribution. We can construct the I-equivalence class for a distribution by determining its skeleton and its immoralities from the independence properties of the given distribution.
We then use both of these components to build a representation of the equivalence class.
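The theorem yields a direct I-equivalence test. A minimal sketch over edge lists (the helper names are mine, not from the slides):

```python
from itertools import combinations

def skeleton(edges):
    """Undirected skeleton: each directed edge as an unordered pair."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """All v-structures X -> Z <- Y with no edge between X and Y."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    result = set()
    for z, ps in parents.items():
        for x, y in combinations(sorted(ps), 2):
            if frozenset((x, y)) not in skel:
                result.add((x, z, y))
    return result

def i_equivalent(g1, g2):
    """Same skeleton and same immoralities <=> I-equivalent."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

chain = [("a", "c"), ("c", "b")]    # a -> c -> b
fork = [("c", "a"), ("c", "b")]     # a <- c -> b
vstruct = [("a", "c"), ("b", "c")]  # a -> c <- b
print(i_equivalent(chain, fork))    # True: same skeleton, no immoralities
print(i_equivalent(chain, vstruct)) # False: vstruct has the immorality (a, c, b)
```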
Identifying the Undirected Skeleton
The basic idea is to use independence queries of the form (X ⊥ Y | U) for different sets of variables U.
If X and Y are adjacent in G*, we cannot separate them with any set of variables.
Conversely, if X and Y are not adjacent in G*, we would hope to find a set of variables U that makes these two variables conditionally independent: we call this set a witness of their independence.
Let G* be an I-map of a distribution P, and let X and Y be two variables that are not adjacent in G*. Then either (X ⊥ Y | Pa(X)) or (X ⊥ Y | Pa(Y)) holds in P, where Pa(·) denotes a node's parents in G*.
Thus, if X and Y are not adjacent in G*, we can find a witness of bounded size.
In particular, if we assume that G* has bounded indegree, say less than or equal to d, then we do not need to consider witness sets larger than d.
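This bound suggests a simple skeleton-recovery loop. A sketch assuming a hypothetical independence oracle indep(x, y, u) that answers queries (x ⊥ y | u) about the distribution:

```python
from itertools import combinations

def build_skeleton(variables, indep, d):
    """Recover the undirected skeleton from an independence oracle.

    indep(x, y, u) should return True iff (x ⊥ y | u) holds;
    d bounds the witness-set size (the maximum indegree).
    Returns (skeleton_edges, witnesses)."""
    edges = {frozenset(p) for p in combinations(variables, 2)}
    witnesses = {}
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        found = False
        for size in range(d + 1):
            for u in combinations(rest, size):
                if indep(x, y, set(u)):
                    edges.discard(frozenset((x, y)))     # separable: not adjacent
                    witnesses[frozenset((x, y))] = set(u)
                    found = True
                    break
            if found:
                break
    return edges, witnesses

# Oracle for the chain a -> c -> b: the only independence is (a ⊥ b | c).
def indep(x, y, u):
    return {x, y} == {"a", "b"} and "c" in u

skel, wit = build_skeleton(["a", "b", "c"], indep, 1)
print(sorted(tuple(sorted(e)) for e in skel))   # [('a', 'c'), ('b', 'c')]
```

The recorded witnesses are kept because the next step, identifying immoralities, needs them.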
Identifying Immoralities
At this stage we have reconstructed the undirected skeleton. Now we want to recover edge directions.
Our goal is to consider potential immoralities in the skeleton and, for each one, determine whether it is indeed an immorality.
A triplet of variables X, Z, Y is a potential immorality if the skeleton contains the path X — Z — Y but does not contain an edge between X and Y.
A potential immorality X — Z — Y is an immorality if and only if Z is not in the witness set(s) for X and Y.
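This criterion translates directly into code. A sketch of the orientation step (a Mark-Immoralities-style procedure over the skeleton and witness sets produced earlier; the helper names are mine):

```python
from itertools import combinations

def mark_immoralities(skeleton_edges, witnesses):
    """Orient X -> Z <- Y for each potential immorality X - Z - Y
    whose witness set for (X, Y) does not contain Z.
    skeleton_edges: set of frozenset pairs; witnesses: pair -> witness set."""
    neighbors = {}
    for e in skeleton_edges:
        x, y = tuple(e)
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)
    directed = set()
    for z, nbrs in neighbors.items():
        for x, y in combinations(sorted(nbrs), 2):
            if frozenset((x, y)) in skeleton_edges:
                continue                       # x, y adjacent: not a potential immorality
            if z not in witnesses.get(frozenset((x, y)), set()):
                directed.add((x, z))           # orient x -> z
                directed.add((y, z))           # orient y -> z
    return directed

skel = {frozenset(("a", "c")), frozenset(("b", "c"))}
# Witness for (a, b) excludes c: a - c - b is a real immorality.
print(mark_immoralities(skel, {frozenset(("a", "b")): set()}))
# Witness includes c (as for the chain a -> c -> b): nothing is oriented.
print(mark_immoralities(skel, {frozenset(("a", "b")): {"c"}}))
```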
Representing Equivalence Classes
An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph or PDAG.
Let G be a DAG. A chain graph K is the class PDAG of the equivalence class of G if K shares the same skeleton as G, and K contains a directed edge X → Y if and only if all DAGs G' that are I-equivalent to G contain the edge X → Y.
If the edge is directed, then all the members of the equivalence class agree on the orientation of the edge.
If the edge is undirected, there are two DAGs in the equivalence class that disagree on the orientation of the edge.
Is the output of Mark-Immoralities the class PDAG? Clearly, edges involved in immoralities must be directed in K. The obvious question is whether K can contain directed edges that are not involved in immoralities; in other words, can there be additional edges whose direction is necessarily the same in every member of the equivalence class?
Rules
Example
References
D. Koller and N. Friedman: Probabilistic Graphical Models. MIT Press, 2009.
C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
THANKS