
Page 1

Artificial Intelligence: Probabilistic reasoning

Fall 2008

Professor: Luigi Ceccaroni

Page 2

Bayesian networks

• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.

• Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (links ≈ "directly influences")
– a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
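As a concrete illustration of this syntax, here is a minimal sketch in Python (not from the slides; the Rain/WetGrass toy variables and the dictionary layout are assumptions for illustration):

    # A Bayesian network as a dict: node -> (parents, CPT).
    # For Boolean variables, the CPT maps a tuple of parent values
    # to P(node = True).
    network = {
        "Rain":     ([], {(): 0.2}),                  # P(rain) = 0.2
        "WetGrass": (["Rain"], {(True,): 0.9,         # P(wet | rain)
                                (False,): 0.1}),      # P(wet | ¬rain)
    }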

Page 3

Example

• Topology of network encodes conditional independence assertions:

• Weather is independent of the other variables.
• Toothache and Catch are conditionally independent given Cavity.

Page 4

Example

• What is the probability of having a heart attack?

• This probability depends on 4 variables:
– Sport
– Diet
– Blood pressure
– Smoking

• Knowing the dependencies among these variables lets us build a Bayesian network.


Page 5

Constructing Bayesian networks

• 1. Choose an ordering of variables X1, …, Xn.

• 2. For i = 1 to n:
– add Xi to the network
– select parents from X1, …, Xi-1 such that
P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)

This choice of parents guarantees:

P(X1, …, Xn) = ∏i=1..n P(Xi | X1, …, Xi-1)   (chain rule)
= ∏i=1..n P(Xi | Parents(Xi))   (by construction)
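A minimal sketch of how such a factored joint is evaluated, assuming the toy dictionary representation from the sketch above:

    # Toy network from the earlier sketch.
    network = {
        "Rain":     ([], {(): 0.2}),
        "WetGrass": (["Rain"], {(True,): 0.9, (False,): 0.1}),
    }

    def joint_probability(network, assignment):
        # P(X1, ..., Xn) = product over i of P(Xi | Parents(Xi)).
        # `assignment` maps every variable name to True or False.
        p = 1.0
        for var, (parents, cpt) in network.items():
            p_true = cpt[tuple(assignment[pa] for pa in parents)]
            p *= p_true if assignment[var] else 1.0 - p_true
        return p

    # P(rain ∧ wet) = 0.2 * 0.9 = 0.18 (up to floating-point rounding)
    print(joint_probability(network, {"Rain": True, "WetGrass": True}))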

Page 6

Example

[Network diagram: Sport and Diet are parents of Blood pressure; Blood pressure and Smoking are parents of Heart attack.]

Diet P(Di)

balanced 0.4

unbalanced 0.6

Sport P(Sp)

yes 0.1

no 0.9

Smoking P(Sm)

yes 0.4

no 0.6

Diet Sport P(Bp = high) P(Bp = normal)

bal. yes 0.01 0.99

unbal. yes 0.2 0.8

bal. no 0.25 0.75

unbal. no 0.7 0.3

Bp Sm P(Ha=yes) P(Ha=no)

high yes 0.8 0.2

norm. yes 0.6 0.4

high no 0.7 0.3

norm. no 0.3 0.7
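The same tables can be transcribed directly as Python dictionaries (a sketch; the identifier names are illustrative):

    # Priors for the root variables.
    P_Sp = {"yes": 0.1, "no": 0.9}               # Sport
    P_Di = {"balanced": 0.4, "unbalanced": 0.6}  # Diet
    P_Sm = {"yes": 0.4, "no": 0.6}               # Smoking

    # P(Bp = high | Di, Sp); P(Bp = normal | ...) is 1 minus this.
    P_Bp_high = {("balanced", "yes"): 0.01, ("unbalanced", "yes"): 0.2,
                 ("balanced", "no"): 0.25, ("unbalanced", "no"): 0.7}

    # P(Ha = yes | Bp, Sm); P(Ha = no | ...) is 1 minus this.
    P_Ha_yes = {("high", "yes"): 0.8, ("normal", "yes"): 0.6,
                ("high", "no"): 0.7, ("normal", "no"): 0.3}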

Page 7

Compactness

• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values.

• Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p).

• If each of the n variables has no more than k parents (k << n), the complete network requires O(n · 2^k) numbers.

Page 8

Representation cost

• The network grows linearly with n, vs. O(2^n) for the full joint distribution.

• Examples:
– With 10 variables and at most 3 parents: 80 vs. 1024
– With 100 variables and at most 5 parents: 3200 vs. 2^100 ≈ 10^30
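These counts can be verified in a couple of lines (note that the second comparison is 3200 vs. 2^100 ≈ 10^30):

    # Numbers needed: n * 2^k for the network vs. 2^n for the full joint.
    for n, k in [(10, 3), (100, 5)]:
        print(f"n={n}, k={k}: {n * 2**k} vs. {2**n}")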

Page 9

Semantics

The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏i=1..n P(Xi | Parents(Xi))

Example:
P(sp ∧ Di=balanced ∧ Bp=high ∧ ¬sm ∧ ¬ha) =

= P(sp) P(Di=balanced) P(Bp=high | sp, Di=balanced) P(¬sm) P(¬ha | Bp=high, ¬sm)

Page 10

Bayesian networks - Joint distribution - Example

P(ha ∧ Bp=high ∧ sm ∧ sp ∧ Di=balanced) =

= P(ha | Bp = high, sm) P(Bp = high | sp, Di = balanced) P(sm) P(sp) P(Di = balanced)

= 0.8 x 0.01 x 0.4 x 0.1 x 0.4

= 0.000128
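The same product, written out in Python with the CPT entries from the tables above:

    p = (0.8      # P(ha | Bp=high, sm)
         * 0.01   # P(Bp=high | sp, Di=balanced)
         * 0.4    # P(sm)
         * 0.1    # P(sp)
         * 0.4)   # P(Di=balanced)
    print(p)      # 0.000128 (up to floating-point rounding)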


Page 11

Exact inference in Bayesian networks: example

• Inference by enumeration:

P(X | e) = α P(X, e) = α Σy P(X, e, y)

• Let's calculate: P(Smoking | Heart attack = yes, Sport = no)
• The full joint distribution of the network is:

P(Sp, Di, Bp, Sm, Ha) =

= P(Sp) P(Di) P(Bp | Sp, Di) P(Sm) P(Ha | Bp, Sm)

• We want to calculate: P(Sm | ha, ¬sp).

Page 12

Exact inference in Bayesian networks: example

P(Sm | ha, ¬sp) = α P(Sm, ha, ¬sp) =

= α ΣDi∈{b,¬b} ΣBp∈{h,n} P(Sm, ha, ¬sp, Di, Bp) =

= α P(¬sp) P(Sm) ΣDi∈{b,¬b} P(Di) ΣBp∈{h,n} P(Bp | ¬sp, Di) P(ha | Bp, Sm) =

= α <0.9 * 0.4 * (0.4 * (0.25 * 0.8 + 0.75 * 0.6) + 0.6 * (0.7 * 0.8 + 0.3 * 0.6)),
  0.9 * 0.6 * (0.4 * (0.25 * 0.7 + 0.75 * 0.3) + 0.6 * (0.7 * 0.7 + 0.3 * 0.3))> =

= α <0.253, 0.274> = <0.48, 0.52>
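A sketch of this enumeration in Python, with the CPT entries repeated inline so that the snippet runs on its own (identifier names are illustrative):

    P_Di = {"balanced": 0.4, "unbalanced": 0.6}
    P_Sm = {"yes": 0.4, "no": 0.6}
    P_sp_no = 0.9
    # P(Bp = high | Di, Sp = no); Bp = normal is the complement.
    P_Bp_high = {"balanced": 0.25, "unbalanced": 0.7}
    P_Ha_yes = {("high", "yes"): 0.8, ("normal", "yes"): 0.6,
                ("high", "no"): 0.7, ("normal", "no"): 0.3}

    unnorm = {}
    for sm in ("yes", "no"):
        total = 0.0
        for di in ("balanced", "unbalanced"):
            for bp in ("high", "normal"):
                p_bp = P_Bp_high[di] if bp == "high" else 1 - P_Bp_high[di]
                total += P_Di[di] * p_bp * P_Ha_yes[(bp, sm)]
        unnorm[sm] = P_sp_no * P_Sm[sm] * total

    alpha = 1 / sum(unnorm.values())
    print({sm: round(alpha * p, 2) for sm, p in unnorm.items()})
    # {'yes': 0.48, 'no': 0.52}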

Page 13

Variable elimination algorithm

• The variable elimination algorithm lets us avoid the repeated calculations of inference by enumeration.
• Each variable is represented by a factor.
• Intermediate results are saved to be reused later.
• Non-relevant variables, being constant factors, are not directly computed.


Page 14

Variable elimination algorithm


CALCULA-FACTOR generates the factor corresponding to variable var in the joint probability distribution function.

PRODUCTO-Y-SUMA multiplies factors and sums over the hidden variable.

PRODUCTO multiplies a set of factors.
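A minimal sketch of these operations in Python (CALCULA-FACTOR, PRODUCTO-Y-SUMA and PRODUCTO are Spanish for COMPUTE-FACTOR, PRODUCT-AND-SUM and PRODUCT; the factor representation below is an assumption for illustration):

    from itertools import product as cartesian

    # A factor is a pair (vars, table): vars is a tuple of variable
    # names; table maps a tuple of their values to a number.
    DOMAINS = {"Bp": ("high", "normal"), "Sm": ("yes", "no"),
               "Di": ("balanced", "unbalanced")}

    def factor_product(f, g):
        # PRODUCTO for two factors: pointwise multiplication.
        fv, ft = f
        gv, gt = g
        out_vars = fv + tuple(v for v in gv if v not in fv)
        table = {}
        for values in cartesian(*(DOMAINS[v] for v in out_vars)):
            assign = dict(zip(out_vars, values))
            table[values] = (ft[tuple(assign[v] for v in fv)] *
                             gt[tuple(assign[v] for v in gv)])
        return (out_vars, table)

    def sum_out(var, f):
        # The summation step of PRODUCTO-Y-SUMA: eliminate `var`.
        fv, ft = f
        out_vars = tuple(v for v in fv if v != var)
        table = {}
        for values, p in ft.items():
            key = tuple(v for name, v in zip(fv, values) if name != var)
            table[key] = table.get(key, 0.0) + p
        return (out_vars, table)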

Page 15

Variable elimination algorithm - Example

α P(¬sp) P(Sm) ΣDi∈{b,¬b} P(Di) ΣBp∈{h,n} P(Bp | ¬sp, Di) P(ha | Bp, Sm)

• Factor for variable Heart attack P(ha | Bp, Sm), fHa(Bp, Sm):


Bp Sm fHa(Bp, Sm)

high yes 0.8

high no 0.7

normal yes 0.6

normal no 0.3

Page 16

Variable elimination algorithm - Example

• Factor for variable Blood pressure P(Bp | ¬sp, Di), fBp(Bp, Di):

• To put together the factors just obtained, we calculate the product fHa(Bp, Sm) x fBp(Bp, Di) = fHa Bp(Bp, Sm, Di).

Bp Di fBp(Bp, Di)

high balanced 0.25

high unbalanced 0.7

normal balanced 0.75

normal unbalanced 0.3

Page 17

Variable elimination algorithm - Example

fHa Bp(Bp, Sm, Di) = fHa(Bp, Sm) x fBp(Bp, Di)

Bp Sm Di fHa Bp(Bp, Sm, Di)

high yes balanced 0.8 * 0.25

high yes unbalanced 0.8 * 0.7

high no balanced 0.7 * 0.25

high no unbalanced 0.7 * 0.7

normal yes balanced 0.6 * 0.75

normal yes unbalanced 0.6 * 0.3

normal no balanced 0.3 * 0.75

normal no unbalanced 0.3 * 0.3


Page 18

Variable elimination algorithm - Example

• We sum over the values of variable Bp to obtain factor fHa Bp(Sm, Di):
• Factor for variable Di, fDi(Di):


Sm Di fHa Bp(Sm, Di)

yes balanced 0.8 * 0.25 + 0.6 * 0.75 = 0.65

yes unbalanced 0.8 * 0.7 + 0.6 * 0.3 = 0.74

no balanced 0.7 * 0.25 + 0.3 * 0.75 = 0.4

no unbalanced 0.7 * 0.7 + 0.3 * 0.3 = 0.58

Di fDi(Di)

balanced 0.4

unbalanced 0.6

Page 19

Variable elimination algorithm - Example

• fHa Di Bp(Sm, Di) = fDi(Di) x fHa Bp(Sm, Di)
• We sum over the values of variable Di to obtain factor fHa Di Bp(Sm).

Sm Di fHa Di Bp(Sm, Di)

yes balanced 0.65 * 0.4

yes unbalanced 0.74 * 0.6

no balanced 0.4 * 0.4

no unbalanced 0.58 * 0.6

Sm fHa Di Bp(Sm)

yes 0.65 * 0.4 + 0.74 * 0.6 = 0.704

no 0.4 * 0.4 + 0.58 * 0.6 = 0.508

Page 20

Variable elimination algorithm - Example

• Factor for variable Sm, fSm(Sm):

• fHa Sm Di Bp(Sm) = fSm(Sm) x fHa Di Bp(Sm)

• Normalizing, we obtain:

Sm fSm(Sm)

yes 0.4

no 0.6

Sm fHa Sm Di Bp(Sm)

yes 0.4 * 0.704 ≈ 0.282

no 0.6 * 0.508 ≈ 0.305

Sm P(Sm | ha, ¬sp)

yes 0.48

no 0.52
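The whole elimination can be replayed in a few lines of Python (a sketch; the factor tables are transcribed from the slides, and the constant factor P(¬sp) = 0.9 is dropped because it cancels in the normalization, as noted earlier):

    # Factors from the slides, as dicts.
    f_Ha = {("high", "yes"): 0.8, ("high", "no"): 0.7,      # keyed (Bp, Sm)
            ("normal", "yes"): 0.6, ("normal", "no"): 0.3}
    f_Bp = {("high", "balanced"): 0.25,                     # keyed (Bp, Di)
            ("high", "unbalanced"): 0.7,
            ("normal", "balanced"): 0.75,
            ("normal", "unbalanced"): 0.3}
    f_Di = {"balanced": 0.4, "unbalanced": 0.6}
    f_Sm = {"yes": 0.4, "no": 0.6}

    # Multiply f_Ha and f_Bp, then sum out Bp -> factor over (Sm, Di).
    f_SmDi = {(sm, di): sum(f_Ha[(bp, sm)] * f_Bp[(bp, di)]
                            for bp in ("high", "normal"))
              for sm in ("yes", "no")
              for di in ("balanced", "unbalanced")}

    # Multiply by f_Di, then sum out Di -> factor over Sm.
    f_final = {sm: sum(f_Di[di] * f_SmDi[(sm, di)]
                       for di in ("balanced", "unbalanced"))
               for sm in ("yes", "no")}

    # Multiply by f_Sm and normalize.
    unnorm = {sm: f_Sm[sm] * f_final[sm] for sm in ("yes", "no")}
    alpha = 1 / sum(unnorm.values())
    print({sm: round(alpha * p, 2) for sm, p in unnorm.items()})
    # {'yes': 0.48, 'no': 0.52}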

Page 21

Summary

• Bayesian networks provide a natural representation for (causally induced) conditional independence.
• Topology + CPTs = compact representation of joint distribution.
• Generally easy for domain experts to construct.