bayesian networks cse 473. © d. weld and d. fox 2 bayes nets in general, joint distribution p over...

29
Bayesian Networks CSE 473

Upload: avice-wilkins

Post on 20-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

Bayesian Networks

CSE 473

Page 2: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 2

Bayes Nets

•In general, joint distribution P over set of variables (X1 x ... x Xn) requires exponential space for representation & inference

•BNs provide a graphical representation of conditional independence relations in P

  usually quite compact  requires assessment of fewer parameters, those being

quite natural (e.g., causal)  efficient (usually) inference: query answering and belief

update

Page 3: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 3

An Example Bayes Net

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Pr(B=t) Pr(B=f) 0.05 0.95

Pr(A|E,B)e,b 0.9 (0.1)e,b 0.2 (0.8)e,b 0.85 (0.15)e,b 0.01 (0.99)

Page 4: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 4

Earthquake Example (cont’d)

•If we know Alarm, no other evidence influences our degree of belief in Nbr1Calls

  P(N1|N2,A,E,B) = P(N1|A)  also: P(N2|N1,A,E,B) = P(N2|A) and P(E|B) = P(E)

•By the chain rule we haveP(N1,N2,A,E,B) = P(N1|N2,A,E,B) ·P(N2|A,E,B)· P(A|E,B) ·P(E|B) ·P(B) = P(N1|A) ·P(N2|A) ·P(A|B,E) ·P(E) ·P(B)

•Full joint requires only 10 parameters (cf. 32)

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Page 5: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 5

BNs: Qualitative Structure

•Graphical structure of BN reflects conditional independence among variables

•Each variable X is a node in the DAG•Edges denote direct probabilistic influence

  usually interpreted causally  parents of X are denoted Par(X)

•X is conditionally independent of all nondescendents given its parents

  Graphical test exists for more general independence  “Markov Blanket”

Page 6: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 6

Given Parents, X is Independent of

Non-Descendants

Page 7: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 7

For Example

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Page 8: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 8

For Example

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Page 9: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 9

For Example

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Radio

Page 10: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 10

For Example

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Radio

Page 11: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 11

For Example

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Radio

Page 12: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 12

Given Markov Blanket, X is Independent of All Other Nodes

MB(X) = Par(X) Childs(X) Par(Childs(X))

Page 13: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 13

For Example

Earthquake Burglary

Alarm

Nbr2Calls

Radio

Nbr1Calls

Page 14: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 14

For Example

Earthquake Burglary

Alarm

Nbr2Calls

Radio

Nbr1Calls

Page 15: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 15

Conditional Probability Tables

Earthquake Burglary

Alarm

Nbr2CallsNbr1Calls

Pr(B=t) Pr(B=f) 0.05 0.95

Pr(A|E,B)e,b 0.9 (0.1)e,b 0.2 (0.8)e,b 0.85 (0.15)e,b 0.01 (0.99)

Page 16: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 16

Conditional Probability Tables•For complete spec. of joint dist., quantify BN

•For each variable X, specify CPT: P(X | Par(X))  number of params locally exponential in |Par(X)|

•If X1, X2,... Xn is any topological sort of the network, then we

are assured:P(Xn,Xn-1,...X1) = P(Xn| Xn-1,...X1)·P(Xn-1 | Xn-2,… X1)

… P(X2 | X1) · P(X1)

= P(Xn| Par(Xn)) · P(Xn-1 | Par(Xn-1)) … P(X1)

Page 17: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 17

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?

••

Bayes Net Construction Example

Page 18: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 18

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?

• No•

Example

Page 19: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 19

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? NoP(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?

• No•

Example

Page 20: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 20

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? NoP(B | A, J, M) = P(B | A)? YesP(B | A, J, M) = P(B)? NoP(E | B, A ,J, M) = P(E | A)?P(E | B, A, J, M) = P(E | A, B)?

• No•

Example

Page 21: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 21

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? NoP(B | A, J, M) = P(B | A)? YesP(B | A, J, M) = P(B)? NoP(E | B, A ,J, M) = P(E | A)? NoP(E | B, A, J, M) = P(E | A, B)? Yes

• No •

Example

Page 22: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 22

Example contd.

• Deciding conditional independence is hard in noncausal directions• (Causal models and conditional independence seem hardwired for

humans!)• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

•••

Page 23: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 23

Inference in BNs

•The graphical independence representation  yields efficient inference schemes

•We generally want to compute   Pr(X), or

  Pr(X|E) where E is (conjunctive) evidence

•Computations organized by network topology

•One simple algorithm:   variable elimination (VE)

Page 24: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 24

P(B | J=true, M=true)

Earthquake Burglary

Alarm

MaryJohn

P(b|j,m) = P(b,j,m,e,a) e,a

Page 25: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 25

P(B | J=true, M=true)

Earthquake Burglary

Alarm

MaryJohn

P(b|j,m) = P(b) P(e) P(a|b,e)P(j|a)P(m,a)e a

Page 26: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 26

Structure of Computation

Dynamic Programming

Page 27: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 27

Variable Elimination

•A factor is a function from some set of variables into a specific value: e.g., f(E,A,N1)

  CPTs are factors, e.g., P(A|E,B) function of A,E,B

•VE works by eliminating all variables in turn until there is a factor with only query variable

•To eliminate a variable:  join all factors containing that variable (like DB)  sum out the influence of the variable on new factor  exploits product form of joint distribution

Page 28: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 28

Example of VE: P(N1)

Earthqk Burgl

Alarm

N2N1

P(N1)

= N2,A,B,E P(N1,N2,A,B,E)

= N2,A,B,E P(N1|A)P(N2|A) P(B)P(A|B,E)P(E)

= AP(N1|A) N2P(N2|A) BP(B) EP(A|B,E)P(E)

= AP(N1|A) N2P(N2|A) BP(B) f1(A,B)

= AP(N1|A) N2P(N2|A) f2(A)

= AP(N1|A) f3(A)

= f4(N1)

Page 29: Bayesian Networks CSE 473. © D. Weld and D. Fox 2 Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential

© D. Weld and D. Fox 29

Notes on VE•Each operation is a simple multiplication of factors and summing out a variable

•Complexity determined by size of largest factor  in our example, 3 vars (not 5)  linear in number of vars,   exponential in largest factor elimination ordering greatly

impacts factor size  optimal elimination orderings: NP-hard  heuristics, special structure (e.g., polytrees)

•Practically, inference is much more tractable using structure of this sort