An introduction to Bayesian networks
Stochastic Processes Course
Hossein Amirkhani
Spring 2011
Outline
Introduction
Bayesian Networks
Probabilistic Graphical Models
Conditional Independence
I-equivalence
Introduction
Our goal is to represent a joint distribution over some set of random variables X = {X1, ..., Xn}.
Even in the simplest case where these variables are binary-valued, a joint distribution requires the specification of 2^n − 1 numbers.
The explicit representation of the joint distribution is unmanageable from every perspective: computationally, cognitively, and statistically.
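To make the blow-up concrete, a quick sketch in plain Python of how many free parameters an explicit joint over n binary variables needs:

```python
# An explicit joint over n binary variables has 2**n outcomes; the
# probabilities must sum to 1, leaving 2**n - 1 free parameters.
def full_joint_params(n):
    return 2**n - 1

for n in (5, 10, 20, 30):
    print(n, full_joint_params(n))
```

Already at 30 variables the table has over a billion entries, which is why a more compact representation is needed.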
Bayesian Networks
Bayesian networks exploit conditional independence properties of the distribution in order to allow a compact and natural representation.
They are a specific type of probabilistic graphical model. BNs are directed acyclic graphs (DAGs).
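To see the compactness gain, a minimal sketch, assuming a hypothetical chain-structured network X1 → X2 → ... → Xn over binary variables (each node conditions only on its single parent):

```python
# Parameter counts over n binary variables.
def full_joint_params(n):
    # Explicit joint table: 2**n outcomes, probabilities sum to 1.
    return 2**n - 1

def chain_bn_params(n):
    # Chain BN X1 -> X2 -> ... -> Xn: X1 needs 1 free number,
    # and each later node needs P(Xi | Xi-1), i.e. 2 free numbers.
    return 1 + (n - 1) * 2

print(chain_bn_params(20), full_joint_params(20))  # 39 vs 1048575
```

The exponential table collapses to a number of parameters linear in n once the conditional independencies are exploited.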
Probabilistic Graphical Models
Nodes are the random variables in our domain. Edges correspond, intuitively, to the direct influence of one node on another.
Examples of graphical models: factor graphs, Markov random fields, Bayesian networks.
Graphs are an intuitive way of representing and visualising the relationships between many variables.
A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like "Is A dependent on B given that we know the value of C?" just by looking at the graph.
Graphical models allow us to define general message-passing algorithms that implement probabilistic inference efficiently.
Graphical models = statistics × graph theory × computer science.
Conditional Independence: Example 1
tail-to-tail at c: the path a ← c → b. In general a and b are dependent, but observing c blocks the path, so a ⊥ b | c.
Example: Smoking is a tail-to-tail node influencing both Lung Cancer and Yellow Teeth. The two effects are correlated, but once we know whether the person smokes, Lung Cancer and Yellow Teeth become independent.
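The tail-to-tail independence can be checked numerically. A small sketch with made-up probabilities (all numbers below are illustrative, not from the slides) for Smoking (S), Lung Cancer (C), and Yellow Teeth (Y):

```python
# Toy CPTs (made-up numbers) for the graph C <- S -> Y.
p_s = {1: 0.3, 0: 0.7}                                      # P(S)
p_c_given_s = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.01, 0: 0.99}}  # P(C | S)
p_y_given_s = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.1, 0: 0.9}}    # P(Y | S)

def joint(s, c, y):
    # Factorization implied by the tail-to-tail structure.
    return p_s[s] * p_c_given_s[s][c] * p_y_given_s[s][y]

# Marginally, C and Y are dependent:
p_c1_y1 = sum(joint(s, 1, 1) for s in (0, 1))
p_c1 = sum(joint(s, 1, y) for s in (0, 1) for y in (0, 1))
p_y1 = sum(joint(s, c, 1) for s in (0, 1) for c in (0, 1))
print(p_c1_y1, p_c1 * p_y1)            # unequal

# Given S = 1, C and Y are independent:
p_c1_y1_s1 = joint(1, 1, 1) / p_s[1]
p_c1_s1 = (joint(1, 1, 0) + joint(1, 1, 1)) / p_s[1]
p_y1_s1 = (joint(1, 0, 1) + joint(1, 1, 1)) / p_s[1]
print(p_c1_y1_s1, p_c1_s1 * p_y1_s1)   # equal
```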
Conditional Independence: Example 2
head-to-tail at c: the path a → c → b. Observing c blocks the path, so a ⊥ b | c.
Example chain: Type of Car → Speed → Amount of Speeding Fine. Given the car's speed, the fine is independent of the type of car.
Conditional Independence: Example 3
head-to-head at c: the path a → c ← b, also called a v-structure. Here a and b are marginally independent, but observing c (or any of its descendants) unblocks the path and makes them dependent.
Example: Ability of team A and Ability of team B are independent a priori, but once the Outcome of the A vs. B game is observed they become dependent: a win for A can be explained either by A being strong or by B being weak (explaining away).
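The explaining-away effect can also be checked numerically. A toy model with made-up numbers: abilities A, B ∈ {0, 1} are independent fair coins, and team A wins (W = 1) with probability 0.5 + 0.3 · (a − b):

```python
# Toy v-structure A -> W <- B with illustrative (made-up) probabilities.
def p_w1(a, b):
    return 0.5 + 0.3 * (a - b)

def joint(a, b, w):
    p = 0.25 * p_w1(a, b)          # P(a) * P(b) * P(W=1 | a, b)
    return p if w == 1 else 0.25 - p

# After observing the outcome W = 1, A and B become dependent:
p_a1_w1 = sum(joint(1, b, 1) for b in (0, 1)) / \
          sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
p_a1_w1_b1 = joint(1, 1, 1) / sum(joint(a, 1, 1) for a in (0, 1))
print(p_a1_w1, p_a1_w1_b1)   # 0.65 vs ~0.714: learning B is strong raises our belief in A
```

Unconditionally P(A=1) = 0.5 for every value of B, but given the outcome, additionally learning B's ability changes the posterior over A.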
D-separation
• A, B, and C are non-intersecting subsets of nodes in a directed graph.
• A path from A to B is blocked if it contains a node such that either
a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.
• If all paths from A to B are blocked, A is said to be d-separated from B by C.
• If A is d-separated from B by C, the joint distribution over all the variables in the graph satisfies A ⊥ B | C.
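The blocking rules above can be turned into a reachability procedure. A sketch of the Reachable algorithm from Koller & Friedman (Algorithm 3.1), here specialized to testing a single pair of nodes:

```python
from collections import defaultdict, deque

def ancestors_of(edges, nodes):
    """All nodes with a directed path into `nodes`, plus `nodes` themselves."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(edges, x, y, z):
    """True iff x is d-separated from y given the set z.
    `edges` is a list of directed (parent, child) pairs."""
    z = set(z)
    parents, children = defaultdict(set), defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
        children[u].add(v)
    anc_z = ancestors_of(edges, z)

    reachable, visited = set(), set()
    frontier = deque([(x, "up")])
    while frontier:
        node, direction = frontier.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in z:
            reachable.add(node)
        if direction == "up" and node not in z:
            for p in parents[node]:
                frontier.append((p, "up"))
            for c in children[node]:
                frontier.append((c, "down"))
        elif direction == "down":
            if node not in z:        # head-to-tail / tail-to-tail: blocked if node in z
                for c in children[node]:
                    frontier.append((c, "down"))
            if node in anc_z:        # head-to-head: active if node or a descendant is in z
                for p in parents[node]:
                    frontier.append((p, "up"))
    return y not in reachable

chain = [("a", "c"), ("c", "b")]       # a -> c -> b
vstruct = [("a", "c"), ("b", "c")]     # a -> c <- b
print(d_separated(chain, "a", "b", {"c"}))    # True
print(d_separated(vstruct, "a", "b", {"c"}))  # False
```

Note the asymmetry between the two cases: the chain is blocked by conditioning on c, while the v-structure is activated by it.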
I-equivalence
Let P be a distribution over X. We define I(P) to be the set of independence assertions of the form (X ⊥ Y | Z) that hold in P.
Two graph structures K1 and K2 over X are I-equivalent if I(K1) = I(K2).
The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes.
The skeleton of a Bayesian network
The skeleton of a Bayesian network graph G over X is an undirected graph over X that contains an edge {X, Y} for every edge (X → Y) in G.
Immorality
A v-structure X → Z ← Y is an immorality if there is no direct edge between X and Y.
Relationship between immorality, skeleton and I-equivalence
Let G1 and G2 be two graphs over X. Then G1 and G2 have the same skeleton and the same set of immoralities if and only if they are I-equivalent.
We can use this theorem to recognize whether two BNs are I-equivalent.
In addition, this theorem can be used for learning the structure of the Bayesian network related to a distribution. We can construct the I-equivalence class for a distribution by determining its skeleton and its immoralities from the independence properties of the given distribution.
We then use both of these components to build a representation of the equivalence class.
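The theorem yields a direct I-equivalence test. A minimal sketch over edge lists (the helper names are mine, not from the slides):

```python
from itertools import combinations

def skeleton(edges):
    """Undirected skeleton: each directed edge as an unordered pair."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """All v-structures X -> Z <- Y with no edge between X and Y."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    result = set()
    for z, ps in parents.items():
        for x, y in combinations(sorted(ps), 2):
            if frozenset((x, y)) not in skel:
                result.add((x, z, y))
    return result

def i_equivalent(g1, g2):
    """Same skeleton and same immoralities <=> I-equivalent."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

chain = [("a", "c"), ("c", "b")]    # a -> c -> b
fork = [("c", "a"), ("c", "b")]     # a <- c -> b
vstruct = [("a", "c"), ("b", "c")]  # a -> c <- b
print(i_equivalent(chain, fork))    # True: same skeleton, no immoralities
print(i_equivalent(chain, vstruct)) # False: vstruct has the immorality (a, c, b)
```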
Identifying the Undirected Skeleton
The basic idea is to use independence queries of the form (X ⊥ Y | U) for different sets of variables U.
If X and Y are adjacent in G*, we cannot separate them with any set of variables.
Conversely, if X and Y are not adjacent in G*, we would hope to find a set of variables U that makes these two variables conditionally independent: we call this set a witness of their independence.
Let G* be an I-map of a distribution P, and let X and Y be two variables that are not adjacent in G*. Then either (X ⊥ Y | Pa(X)) or (X ⊥ Y | Pa(Y)) holds in P, where Pa(·) denotes a node's parents in G*.
Thus, if X and Y are not adjacent in G*, we can find a witness of bounded size.
In particular, if we assume that G* has bounded indegree, say less than or equal to d, then we do not need to consider witness sets larger than d.
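This bound suggests a simple skeleton-recovery loop. A sketch assuming a hypothetical independence oracle indep(x, y, u) that answers queries (x ⊥ y | u) about the distribution:

```python
from itertools import combinations

def build_skeleton(variables, indep, d):
    """Recover the undirected skeleton from an independence oracle.

    indep(x, y, u) should return True iff (x ⊥ y | u) holds;
    d bounds the witness-set size (the maximum indegree).
    Returns (skeleton_edges, witnesses)."""
    edges = {frozenset(p) for p in combinations(variables, 2)}
    witnesses = {}
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        found = False
        for size in range(d + 1):
            for u in combinations(rest, size):
                if indep(x, y, set(u)):
                    edges.discard(frozenset((x, y)))     # separable: not adjacent
                    witnesses[frozenset((x, y))] = set(u)
                    found = True
                    break
            if found:
                break
    return edges, witnesses

# Oracle for the chain a -> c -> b: the only independence is (a ⊥ b | c).
def indep(x, y, u):
    return {x, y} == {"a", "b"} and "c" in u

skel, wit = build_skeleton(["a", "b", "c"], indep, 1)
print(sorted(tuple(sorted(e)) for e in skel))   # [('a', 'c'), ('b', 'c')]
```

The recorded witnesses are kept because the next step, identifying immoralities, needs them.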
Identifying Immoralities
At this stage we have reconstructed the undirected skeleton. Now we want to recover edge directions.
Our goal is to consider potential immoralities in the skeleton and, for each one, determine whether it is indeed an immorality.
A triplet of variables X, Z, Y is a potential immorality if the skeleton contains the path X — Z — Y but does not contain an edge between X and Y.
A potential immorality X — Z — Y is an immorality if and only if Z is not in the witness set(s) for X and Y.
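This criterion translates directly into code. A sketch of the orientation step (a Mark-Immoralities-style procedure over the skeleton and witness sets produced earlier; the helper names are mine):

```python
from itertools import combinations

def mark_immoralities(skeleton_edges, witnesses):
    """Orient X -> Z <- Y for each potential immorality X - Z - Y
    whose witness set for (X, Y) does not contain Z.
    skeleton_edges: set of frozenset pairs; witnesses: pair -> witness set."""
    neighbors = {}
    for e in skeleton_edges:
        x, y = tuple(e)
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)
    directed = set()
    for z, nbrs in neighbors.items():
        for x, y in combinations(sorted(nbrs), 2):
            if frozenset((x, y)) in skeleton_edges:
                continue                       # x, y adjacent: not a potential immorality
            if z not in witnesses.get(frozenset((x, y)), set()):
                directed.add((x, z))           # orient x -> z
                directed.add((y, z))           # orient y -> z
    return directed

skel = {frozenset(("a", "c")), frozenset(("b", "c"))}
# Witness for (a, b) excludes c: a - c - b is a real immorality.
print(mark_immoralities(skel, {frozenset(("a", "b")): set()}))
# Witness includes c (as for the chain a -> c -> b): nothing is oriented.
print(mark_immoralities(skel, {frozenset(("a", "b")): {"c"}}))
```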
Representing Equivalence Classes
An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph or PDAG.
Let G be a DAG. A chain graph K is the class PDAG of the equivalence class of G if K shares the same skeleton as G, and K contains a directed edge X → Y if and only if all DAGs G' that are I-equivalent to G contain the edge X → Y.
If the edge is directed, then all the members of the equivalence class agree on the orientation of the edge.
If the edge is undirected, there are two DAGs in the equivalence class that disagree on the orientation of the edge.
Is the output of Mark-Immoralities the class PDAG? Clearly, edges involved in immoralities must be directed in K. The obvious question is whether K can contain directed edges that are not involved in immoralities; in other words, can there be additional edges whose direction is necessarily the same in every member of the equivalence class?
Rules
Example
References
D. Koller and N. Friedman: Probabilistic Graphical Models. MIT Press, 2009.
C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
THANKS