Lirong Xia Bayesian networks (1) Thursday, Feb 21, 2014

Page 1: Lirong  Xia

Lirong Xia

Bayesian networks (1)

Thursday, Feb 21, 2014

Page 2: Lirong  Xia

Reminder

• Reassigned Project 1 due today at midnight

– please upload it as a new assignment on LMS

– don’t update the old Project 1

• For Q6 and Q7, remember to briefly write down what your heuristic is

• Project 2 due on Feb 28

• Homework 1 out on Feb 28

• Ask questions on Piazza, come to OH2

Page 3: Lirong  Xia

Last class

• Probability

– marginal probability of events

– joint probability of events

– conditional probability of events

– independence of events

• Random variables

– marginal distribution

– joint distribution

– conditional distribution

– independent variables

Page 4: Lirong  Xia

Probability space

• A sample space Ω

– states of the world,

• or equivalently, outcomes

– only need to model the states that are relevant to the problem

• A probability mass function p over the sample space

– for all a ∈ Ω, p(a) ≥ 0

– Σa∈Ω p(a) = 1

Ω: each (Temperature, Weather) row is one outcome

Temperature  Weather  p
hot          sun      0.4
hot          rain     0.1
cold         sun      0.2
cold         rain     0.3

Page 5: Lirong  Xia

Events and marginal probability of an event

• An event is a subset of the sample space

• An atomic event is a single outcome in Ω

• Given

– a probability space (Ω, p)

– an event A

• The marginal probability of A, denoted by p(A), is Σa∈A p(a)

• Example

– p(sun) = 0.6

– p(hot) = 0.5

Ω: each (Temperature, Weather) row is one outcome

Temperature  Weather  p
hot          sun      0.4
hot          rain     0.1
cold         sun      0.2
cold         rain     0.3
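The marginal probability of an event can be checked directly against the table. The following is a minimal sketch; the names `p`, `prob`, `sun`, and `hot` are chosen here for illustration and do not come from the slides.

```python
# Probability space from the slide: each outcome is a (temperature, weather) pair.
p = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def prob(event):
    """Marginal probability of an event, given as a set of outcomes."""
    return sum(p[a] for a in event)


# Events are subsets of the sample space.
sun = {a for a in p if a[1] == "sun"}
hot = {a for a in p if a[0] == "hot"}

print(round(prob(sun), 10))  # 0.6
print(round(prob(hot), 10))  # 0.5
```

Rounding is used only to hide floating-point noise in the printed sums.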

Page 6: Lirong  Xia

Joint probability

• Given

– a probability space

– two events A and B

• The joint probability of A and B is p(A and B)

[Figure: Venn diagram of events A and B inside the sample space; the overlap is “A and B”]

Page 7: Lirong  Xia

Conditional probability

• p(A | B) = p(A and B) / p(B)

[Figure: Venn diagram of events A and B inside the sample space; the overlap is “A and B”]

• p(A | B) p(B) = p(A and B) = p(B | A) p(A)

• p(A | B) = p(B | A) p(A) / p(B)

– Bayes’ rule
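As a quick numerical check of the definition and of Bayes’ rule, the sketch below reuses the hot/cold, sun/rain table from the earlier slides. The helper names (`prob`, `cond`, `is_hot`, `is_sun`) are illustrative assumptions, not part of the lecture.

```python
# Weather table from the earlier slides.
p = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def is_hot(outcome):
    return outcome[0] == "hot"


def is_sun(outcome):
    return outcome[1] == "sun"


def prob(event):
    """p(A) for an event A given as a predicate over outcomes."""
    return sum(v for a, v in p.items() if event(a))


def cond(event_a, event_b):
    """p(A | B) = p(A and B) / p(B); assumes p(B) > 0."""
    return prob(lambda a: event_a(a) and event_b(a)) / prob(event_b)


# p(hot | sun) computed directly and via Bayes' rule.
direct = cond(is_hot, is_sun)
via_bayes = cond(is_sun, is_hot) * prob(is_hot) / prob(is_sun)

print(abs(direct - via_bayes) < 1e-9)  # True
```

Both routes give p(hot | sun) = 0.4 / 0.6, about 0.667.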

Page 8: Lirong  Xia

Independent events

• Given

– a probability space

– two events A and B

• A and B are independent if and only if

– p(A and B) = p(A)p(B)

– equivalently, p(A|B) = p(A)

• Interpretation

– Knowing that the state of the world is in B does not affect the probability that the state of the world is in A (vs. not in A)

• Different from [A and B are disjoint]

Page 9: Lirong  Xia

Random variables and joint distributions

• A random variable is a variable with a domain

– Random variables: capital letters, e.g. W, D, L

– values: small letters, e.g. w, d, l

• A joint distribution over a set of random variables X1, X2, …, Xn specifies a real number for each assignment (or outcome)

– p(X1 = x1, X2 = x2, …, Xn = xn)

– p(x1, x2, …, xn)

• This is a special (structured) probability space

– Sample space Ω: all combinations of values

– probability mass function is p

• A probabilistic model is a joint distribution over a set of random variables

– will be the focus of this course

p(T, W)

T     W     p
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Page 10: Lirong  Xia

Marginal Distributions

• Marginal distributions are sub-tables which eliminate variables

• Marginalization (summing out): combine collapsed rows by adding

$p(X_1 = x_1) = \sum_{x_2} p(X_1 = x_1, X_2 = x_2)$

p(T, W)

T     W     p
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

The same joint as a grid, with the marginals p(T) and p(W) in the margins

          W: sun  W: rain  p(T)
T: hot    0.4     0.1      0.5
T: cold   0.2     0.3      0.5
p(W)      0.6     0.4
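Summing out a variable is a short loop over the joint table. This is a minimal sketch; `marginal` and its `index` parameter are hypothetical names introduced here, with `index` selecting which position of the assignment tuple to keep.

```python
from collections import defaultdict

# Joint distribution p(T, W) from the slide.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def marginal(joint, index):
    """Sum out every variable except the one at position `index`."""
    out = defaultdict(float)
    for assignment, prob in joint.items():
        out[assignment[index]] += prob
    return dict(out)


p_T = marginal(joint, 0)  # hot: 0.5, cold: 0.5
p_W = marginal(joint, 1)  # sun: 0.6, rain: 0.4 (up to float rounding)

print({k: round(v, 10) for k, v in p_T.items()})
print({k: round(v, 10) for k, v in p_W.items()})
```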

Page 11: Lirong  Xia

Conditional Distributions

• Conditional distributions are probability distributions over some variables given fixed values of others

Joint distribution p(T, W)

T     W     p
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Conditional distributions p(W | T)

p(W | T = hot)

W     p
sun   0.8
rain  0.2

p(W | T = cold)

W     p
sun   0.4
rain  0.6
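The conditional tables follow from the joint by selecting the rows that match the fixed value and renormalizing. A minimal sketch, assuming the (T, W) ordering of the joint table; the function name `conditional_w_given_t` is made up for this example.

```python
# Joint distribution p(T, W) from the slide.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def conditional_w_given_t(joint, t):
    """p(W | T = t): keep rows with T = t, then divide by p(T = t)."""
    p_t = sum(prob for (temp, _), prob in joint.items() if temp == t)
    return {w: prob / p_t for (temp, w), prob in joint.items() if temp == t}


p_w_hot = conditional_w_given_t(joint, "hot")    # sun: 0.8, rain: 0.2
p_w_cold = conditional_w_given_t(joint, "cold")  # sun: 0.4, rain: 0.6

print({w: round(v, 10) for w, v in p_w_hot.items()})
print({w: round(v, 10) for w, v in p_w_cold.items()})
```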

Page 12: Lirong  Xia

Independence

• Two variables are independent in a joint distribution if for all x, y, the events X = x and Y = y are independent:

$p(X, Y) = p(X)\, p(Y)$

$\forall x, y:\ p(x, y) = p(x)\, p(y)$

– The joint distribution factors into a product of two simple ones

– Usually variables aren’t independent!

• Can use independence as a modeling assumption
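Whether a two-variable joint factors into its marginals can be tested entry by entry. The sketch below is one way to do it; `is_independent` and the `factored` table are assumptions made for illustration (the second table is deliberately built as a product of the marginals 0.5/0.5 and 0.6/0.4).

```python
from itertools import product


def is_independent(joint, tol=1e-9):
    """Check p(x, y) == p(x) * p(y) for every pair of values.

    `joint` maps (x, y) pairs to probabilities.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
        for x, y in product(xs, ys)
    )


# The weather table from the slides: p(hot, sun) = 0.4 but p(hot) p(sun) = 0.3.
weather = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# Hypothetical table constructed as a product of marginals.
factored = {
    ("hot", "sun"): 0.3,
    ("hot", "rain"): 0.2,
    ("cold", "sun"): 0.3,
    ("cold", "rain"): 0.2,
}

print(is_independent(weather))   # False
print(is_independent(factored))  # True
```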

Page 13: Lirong  Xia

The Chain Rule

• Write any joint distribution as an incremental product of conditional distributions

• Why is this always true?
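One way to see why the chain rule always holds is to apply the definition of conditional probability repeatedly. The derivation below is a sketch that assumes every conditioning event has positive probability.

```latex
\begin{align*}
p(x_1, x_2) &= p(x_1)\, p(x_2 \mid x_1) \\
p(x_1, x_2, x_3) &= p(x_1, x_2)\, p(x_3 \mid x_1, x_2)
                  = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \\
p(x_1, \dots, x_n) &= \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})
\end{align*}
```

Each step only rewrites a joint probability as p(A and B) = p(A) p(B | A), so no independence assumption is needed.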

Page 14: Lirong  Xia

Today’s schedule

• Conditional independence

• Bayesian networks

– definitions

– independence

Page 15: Lirong  Xia

Conditional Independence among random variables

• p(Toothache, Cavity, Catch)

• If I don’t have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache:

– p(+Catch|+Toothache,-Cavity) = p(+Catch|-Cavity)

• The same independence holds if I have a cavity:

– p(+Catch|+Toothache,+Cavity) = p(+Catch|+Cavity)

• Catch is conditionally independent of Toothache given Cavity:

– p(Catch|Toothache,Cavity) = p(Catch|Cavity)

• Equivalent statements:

– p(Toothache|Catch,Cavity) = p(Toothache|Cavity)

– p(Toothache,Catch|Cavity) = p(Toothache|Cavity)×p(Catch|Cavity)

– One can be derived from the other easily (part of Homework 1)

Page 16: Lirong  Xia

Conditional Independence

• Unconditional (absolute) independence is very rare

• Conditional independence is our most basic and robust form of knowledge about uncertain environments

• Definition: X and Y are conditionally independent given Z, if

– ∀ x, y, z: p(x, y|z) = p(x|z)×p(y|z)

– or equivalently, ∀ x, y, z: p(x|z, y) = p(x|z)

– X, Y, Z are random variables

– written as X ⊥ Y | Z

• What about Traffic, Umbrella, Raining?

• What about Fire, Smoke, Alarm?

• Brain teaser: in a probabilistic model with three random variables X, Y, Z

– If X and Y are independent, can we say X and Y are conditionally independent given Z?

– If X and Y are conditionally independent given Z, can we say X and Y are independent?

– Bonus questions in Homework 1

Page 17: Lirong  Xia

The Chain Rule

• p(X1,…, Xn) = p(X1) p(X2|X1) p(X3|X1, X2)…

• Trivial decomposition:

– p(Traffic, Rain, Umbrella) = p(Rain) p(Traffic|Rain) p(Umbrella|Traffic, Rain)

• With assumption of conditional independence:

– Traffic ⊥ Umbrella | Rain

– p(Traffic, Rain, Umbrella) = p(Rain) p(Traffic|Rain) p(Umbrella|Rain)

• Bayesian networks / graphical models help us express conditional independence assumptions
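The factored form can be checked numerically: build the joint as p(Rain) p(Traffic|Rain) p(Umbrella|Rain) and confirm that p(t, u | r) = p(t | r) p(u | r) for every assignment. This is a sketch only; all probabilities below are hypothetical values chosen for illustration, and the helper names are not from the slides.

```python
from itertools import product

# Hypothetical CPTs (not from the slides), chosen only for illustration.
p_rain = {True: 0.25, False: 0.75}
p_traffic_given_rain = {True: 0.75, False: 0.5}   # p(Traffic = True | Rain)
p_umbrella_given_rain = {True: 0.9, False: 0.1}   # p(Umbrella = True | Rain)


def bern(p_true, value):
    """Probability of a Boolean value when p(True) = p_true."""
    return p_true if value else 1.0 - p_true


# Joint built from the factored form p(r) p(t | r) p(u | r).
joint = {
    (t, r, u): p_rain[r]
    * bern(p_traffic_given_rain[r], t)
    * bern(p_umbrella_given_rain[r], u)
    for t, r, u in product([True, False], repeat=3)
}


def p(t=None, r=None, u=None):
    """Sum the joint entries consistent with a partial assignment."""
    return sum(
        v
        for (tt, rr, uu), v in joint.items()
        if (t is None or tt == t)
        and (r is None or rr == r)
        and (u is None or uu == u)
    )


def traffic_indep_umbrella_given_rain(tol=1e-12):
    """Check p(t, u | r) == p(t | r) * p(u | r) for all t, r, u."""
    for t, r, u in product([True, False], repeat=3):
        lhs = p(t=t, r=r, u=u) / p(r=r)
        rhs = (p(t=t, r=r) / p(r=r)) * (p(u=u, r=r) / p(r=r))
        if abs(lhs - rhs) > tol:
            return False
    return True


print(traffic_indep_umbrella_given_rain())  # True
```

The check succeeds by construction: any joint assembled from this factorization satisfies Traffic ⊥ Umbrella | Rain, whatever numbers are used.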

Page 18: Lirong  Xia

Bayesian networks: Big Picture

• Using full joint distribution tables

– Representation: n random variables, at least 2^n entries

– Computation: hard to learn (estimate) anything empirically about more than a few variables at a time

S       T     W     p
summer  hot   sun   0.30
summer  hot   rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  hot   sun   0.10
winter  hot   rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20

• Bayesian networks: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)

– More properly called graphical models

– We describe how variables locally interact

– Local interactions chain together to give global, indirect interactions

Page 19: Lirong  Xia

Example Bayesian networks: Car

• Initial observation: car won’t start

• Orange: “broken” nodes

• Green: testable evidence

• Gray: “hidden variables” to ensure sparse structure, reduce parameters

Page 20: Lirong  Xia

Graphical Model Notation

• Nodes: variables (with domains)

• Arcs: interactions

– Indicate “direct influence” between variables

– Formally: encode conditional independence (more later)

• For now: imagine that arrows mean direct causation (in general, they don’t!)

Page 21: Lirong  Xia

Example: Coin Flips


• n independent coin flips (different coins)

• No interactions between variables: independence

– Really? How about independent flips of the same coin?

– How about a skillful coin flipper?

– Bottom line: build an application-oriented model

Page 22: Lirong  Xia

Example: Traffic

• Variables:

– R: It rains

– T: There is traffic

• Model 1: independence

• Model 2: rain causes traffic

• Which model is better?

Page 23: Lirong  Xia

Example: Burglar Alarm Network

• Variables:

– B: Burglary

– A: Alarm goes off

– M: Mary calls

– J: John calls

– E: Earthquake!

Page 24: Lirong  Xia

Bayesian network


• Definition of Bayesian network (Bayes’ net or BN)

• A set of nodes, one per variable X

• A directed, acyclic graph

• A conditional distribution for each node

– A collection of distributions over X, one for each combination of parents’ values

p(X|a1,…, an)

– CPT: conditional probability table

– Description of a noisy “causal” process

A Bayesian network = Topology (graph) + Local Conditional Probabilities

Page 25: Lirong  Xia

Probabilities in BNs

• Bayesian networks implicitly encode joint distributions

– As a product of local conditional distributions: $p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{parents}(X_i))$

– Example:

• This lets us reconstruct any entry of the full joint

• Not every BN can represent every joint distribution

– The topology enforces certain conditional independencies

Page 26: Lirong  Xia

Example: Coin Flips

p(X1)      p(X2)      …    p(Xn)
h  0.5     h  0.5          h  0.5
t  0.5     t  0.5          t  0.5

p(h, h, t, h)

Only distributions whose variables are absolutely independent can be represented by a Bayesian network with no arcs.

Page 27: Lirong  Xia

Example: Traffic

p(R)

R    p
+r   0.25
-r   0.75

p(T | R)

R    T    p
+r   +t   0.75
+r   -t   0.25
-r   +t   0.50
-r   -t   0.50

p(r, t)
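Each entry of the joint p(r, t) is the product p(r) p(t | r) of one row from each table. A minimal sketch using the numbers on the slide; the dictionary names are illustrative.

```python
# CPTs from the slide.
p_R = {"+r": 0.25, "-r": 0.75}
p_T_given_R = {
    ("+r", "+t"): 0.75,
    ("+r", "-t"): 0.25,
    ("-r", "+t"): 0.50,
    ("-r", "-t"): 0.50,
}

# Joint entry: p(r, t) = p(r) * p(t | r).
joint = {(r, t): p_R[r] * p_t for (r, t), p_t in p_T_given_R.items()}

for assignment, prob in joint.items():
    print(assignment, prob)
# ('+r', '+t') 0.1875
# ('+r', '-t') 0.0625
# ('-r', '+t') 0.375
# ('-r', '-t') 0.375
```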

Page 28: Lirong  Xia

Example: Alarm Network

p(B)

B    p(B)
+b   0.001
-b   0.999

p(E)

E    p(E)
+e   0.002
-e   0.998

p(A | B, E)

B    E    A    p(A|B,E)
+b   +e   +a   0.95
+b   +e   -a   0.05
+b   -e   +a   0.94
+b   -e   -a   0.06
-b   +e   +a   0.29
-b   +e   -a   0.71
-b   -e   +a   0.001
-b   -e   -a   0.999

p(J | A)

A    J    p(J|A)
+a   +j   0.9
+a   -j   0.1
-a   +j   0.05
-a   -j   0.95

p(M | A)

A    M    p(M|A)
+a   +m   0.7
+a   -m   0.3
-a   +m   0.01
-a   -m   0.99
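Any entry of the full joint over B, E, A, J, M is the product of one row from each CPT: p(b) p(e) p(a | b, e) p(j | a) p(m | a). The sketch below uses the slide's numbers and stores only the "+" row of each Boolean CPT; the function `joint` and the variable `example` are illustrative names.

```python
from itertools import product

# CPTs from the slide; for Boolean children only the "+" value is stored.
p_B = {"+b": 0.001, "-b": 0.999}
p_E = {"+e": 0.002, "-e": 0.998}
p_A_true = {  # p(+a | b, e)
    ("+b", "+e"): 0.95,
    ("+b", "-e"): 0.94,
    ("-b", "+e"): 0.29,
    ("-b", "-e"): 0.001,
}
p_J_true = {"+a": 0.9, "-a": 0.05}  # p(+j | a)
p_M_true = {"+a": 0.7, "-a": 0.01}  # p(+m | a)


def joint(b, e, a, j, m):
    """p(b, e, a, j, m) = p(b) p(e) p(a | b, e) p(j | a) p(m | a)."""
    pa = p_A_true[(b, e)] if a == "+a" else 1.0 - p_A_true[(b, e)]
    pj = p_J_true[a] if j == "+j" else 1.0 - p_J_true[a]
    pm = p_M_true[a] if m == "+m" else 1.0 - p_M_true[a]
    return p_B[b] * p_E[e] * pa * pj * pm


# Burglary, no earthquake, alarm rings, both neighbors call.
example = joint("+b", "-e", "+a", "+j", "+m")
print(example)  # approximately 5.91e-4

# Sanity check: the 32 joint entries sum to 1.
total = sum(
    joint(b, e, a, j, m)
    for b, e, a, j, m in product(
        ["+b", "-b"], ["+e", "-e"], ["+a", "-a"], ["+j", "-j"], ["+m", "-m"]
    )
)
```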

Page 29: Lirong  Xia

Size of a Bayesian network

• How big is a joint distribution over N Boolean variables?

– 2^N

• How big is an N-node net if nodes have up to k parents?

– O(N × 2^(k+1))

• Both give you the power to calculate p(X1,…, XN)

• BNs: Huge space savings!

• Also easier to elicit local CPTs

• Also turns out to be faster to answer queries
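To make the space savings concrete, the two counts can be compared for specific values. The sketch below assumes Boolean variables and counts every CPT row entry (2^k parent combinations times 2 values per node); N = 30 and k = 3 are arbitrary illustrative choices.

```python
def full_joint_size(n):
    """Number of entries in a joint table over n Boolean variables."""
    return 2 ** n


def bn_size_bound(n, k):
    """Upper bound on CPT entries: n nodes, each with at most k Boolean parents."""
    return n * 2 ** (k + 1)


print(full_joint_size(30))   # 1073741824
print(bn_size_bound(30, 3))  # 480
```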

Page 30: Lirong  Xia

Bayesian networks

• So far: how a Bayesian network encodes a joint distribution

• Next: how to answer queries about that distribution

– Key idea: conditional independence

– Main goal: answer queries about conditional independence and influence from the graph

• After that: how to answer numerical queries (inference)

Page 31: Lirong  Xia

Conditional Independence in a BN

• Important question about a BN:

– Are two nodes independent given certain evidence?

– If yes, can prove using algebra (tedious in general)

– If no, can prove with a counterexample

– Example: X: pressure, Y: rain, Z: traffic

– Question: are X and Z necessarily independent?

• Answer: no. Example: low pressure causes rain, which causes traffic

• X can influence Z, Z can influence X (via Y)

Page 32: Lirong  Xia

Causal Chains

• This configuration is a “causal chain”

X: Low pressure → Y: Rain → Z: Traffic

$p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y)$

– Is X independent of Z given Y?

$p(z \mid x, y) = \frac{p(x, y, z)}{p(x, y)} = \frac{p(x)\, p(y \mid x)\, p(z \mid y)}{p(x)\, p(y \mid x)} = p(z \mid y)$

Yes!

– Evidence along the chain “blocks” the influence

Page 33: Lirong  Xia

Common Cause

• Another basic configuration: two effects of the same cause

X: Newsgroup busy ← Y: Project due → Z: Lab full

– Are X and Z independent?

– Are X and Z independent given Y?

$p(z \mid x, y) = \frac{p(x, y, z)}{p(x, y)} = \frac{p(y)\, p(x \mid y)\, p(z \mid y)}{p(y)\, p(x \mid y)} = p(z \mid y)$

Yes!

– Observing the cause blocks influence between effects.

Page 34: Lirong  Xia

Common Effect

• Last configuration: two causes of one effect (v-structure)

X: Raining → Y: Traffic ← Z: Ballgame

– Are X and Z independent?

• Yes: the ballgame and the rain cause traffic, but they are not correlated

• Still need to prove they must be (try it!)

– Are X and Z independent given Y?

• No: seeing traffic puts the rain and the ballgame in competition as explanations

– This is backwards from the other cases

• Observing an effect activates influence between possible causes.
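The "competition between explanations" can be seen numerically. In the sketch below all probabilities are hypothetical values chosen for illustration (the slides give none): rain and a ballgame are independent a priori, yet once traffic is observed, learning that there is a ballgame lowers the probability of rain.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration (not from the slides).
p_rain = 0.3
p_game = 0.2
p_traffic = {  # p(Traffic = True | Rain, Ballgame)
    (True, True): 0.95,
    (True, False): 0.8,
    (False, True): 0.7,
    (False, False): 0.1,
}


def bern(p_true, value):
    """Probability of a Boolean value when p(True) = p_true."""
    return p_true if value else 1.0 - p_true


# Joint from the v-structure: p(x, z, y) = p(x) p(z) p(y | x, z).
joint = {
    (x, z, y): bern(p_rain, x) * bern(p_game, z) * bern(p_traffic[(x, z)], y)
    for x, z, y in product([True, False], repeat=3)
}


def p(x=None, z=None, y=None):
    """Sum the joint entries consistent with a partial assignment."""
    return sum(
        v
        for (xx, zz, yy), v in joint.items()
        if (x is None or xx == x)
        and (z is None or zz == z)
        and (y is None or yy == y)
    )


# With no evidence, the two causes are independent: p(x, z) = p(x) p(z).
causes_independent = all(
    abs(p(x=x, z=z) - p(x=x) * p(z=z)) < 1e-12
    for x, z in product([True, False], repeat=2)
)

# Given traffic, they become dependent: the ballgame "explains away" the rain.
rain_given_traffic = p(x=True, y=True) / p(y=True)
rain_given_traffic_and_game = p(x=True, z=True, y=True) / p(z=True, y=True)

print(causes_independent)                                # True
print(rain_given_traffic > rain_given_traffic_and_game)  # True
```

With these numbers p(rain | traffic) is roughly 0.62, while p(rain | traffic, ballgame) drops to roughly 0.37.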

Page 35: Lirong  Xia

The General Case


• Any complex example can be analyzed using these three canonical cases

• General question: in a given BN, are two variables independent (given evidence)?

• Solution: analyze the graph

Page 36: Lirong  Xia

Reachability

• Recipe: shade evidence nodes

• Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are NOT conditionally independent

• Almost works, but not quite

– Where does it break?

– Answer: the v-structure at T doesn’t count as a link in a path unless “active”

Page 37: Lirong  Xia

Reachability (D-Separation)

• Question: are X and Y conditionally independent given evidence vars {Z}?

– Yes, if X and Y are “separated” by Z

– Look for active paths from X to Y

– No active paths = independence!

• A path is active if each triple is active:

– Causal chain A → B → C where B is unobserved (either direction)

– Common cause A ← B → C where B is unobserved

– Common effect A → B ← C where B or one of its descendants is observed

• All it takes to block a path is a single inactive segment
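The active-path test can be sketched as a graph search. The code below is one possible implementation, not necessarily the procedure used in the course: it follows the standard reachability formulation, searching over (node, direction) pairs so that a common effect is only passed through when it or a descendant is observed. The function name `d_separated`, the edge-list representation, and the three small example graphs are assumptions made for illustration.

```python
def d_separated(edges, x, y, evidence):
    """Return True if x and y are d-separated given the evidence nodes.

    `edges` is a list of (parent, child) pairs of a DAG.
    Assumes x and y are not themselves in the evidence set.
    """
    evidence = set(evidence)
    nodes = {n for edge in edges for n in edge} | {x, y} | evidence
    parents = {n: set() for n in nodes}
    children = {n: set() for n in nodes}
    for a, b in edges:
        parents[b].add(a)
        children[a].add(b)

    # Evidence nodes plus all their ancestors: a common effect is active
    # exactly when it belongs to this set.
    observed_or_ancestor = set()
    stack = list(evidence)
    while stack:
        n = stack.pop()
        if n not in observed_or_ancestor:
            observed_or_ancestor.add(n)
            stack.extend(parents[n])

    # "up" = node was reached from one of its children (or is the start),
    # "down" = node was reached from one of its parents.
    visited = set()
    stack = [(x, "up")]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y and node not in evidence:
            return False  # an active path reaches y
        if direction == "up" and node not in evidence:
            # Chain (upward) or common cause: active while unobserved.
            stack.extend((par, "up") for par in parents[node])
            stack.extend((ch, "down") for ch in children[node])
        elif direction == "down":
            if node not in evidence:
                # Chain (downward): active while unobserved.
                stack.extend((ch, "down") for ch in children[node])
            if node in observed_or_ancestor:
                # Common effect: active if it or a descendant is observed.
                stack.extend((par, "up") for par in parents[node])
    return True


# The three canonical triples, plus a descendant D of the common effect.
chain = [("A", "B"), ("B", "C")]
fork = [("B", "A"), ("B", "C")]
collider = [("A", "B"), ("C", "B"), ("B", "D")]

print(d_separated(chain, "A", "C", ["B"]))     # True
print(d_separated(fork, "A", "C", []))         # False
print(d_separated(collider, "A", "C", []))     # True
print(d_separated(collider, "A", "C", ["D"]))  # False
```

Tracking the direction of arrival is what lets a single search distinguish the chain and common-cause cases from the common-effect case at the same node.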

Page 38: Lirong  Xia

Example

• R ⊥ B

• R ⊥ B | T

• R ⊥ B | T'

Yes!

Page 39: Lirong  Xia

Example

• L ⊥ T' | T

• L ⊥ B

• L ⊥ B | T

• L ⊥ B | T'

• L ⊥ B | T, R

Yes!

Yes!

Yes!

Page 40: Lirong  Xia

Example

• Variables:

– R: Raining

– T: Traffic

– D: Roof drips

– S: I am sad

• Questions:

– T ⊥ D

– T ⊥ D | R

– T ⊥ D | R, S

Yes!