
Page 1

Lecture09a: Bayesian Networks (CS540, 3/26/19)

Announcements
Mid-term Exam
◦ Will be posted to the assignments page of the class web site immediately after class
◦ Due on Thursday by the start of class (11:00 AM MST)
◦ All work solo
◦ Submit your answers by email to [email protected]; I will respond when I have received your submission

We will form teams for the next assignment next week

Why Bayesian Networks?
So far, we have studied classical AI
◦ Search and related topics
◦ Exploration, not learning

The current hot topics in AI are machine learning
◦ Tabula rasa approaches that assume nothing (well, little)
◦ … as if we had no knowledge to bring to the decision
◦ … also, we have CS545 (and CS445) to cover this

Bayesian networks are an important middle ground
◦ Created using knowledge + statistical learning
◦ Structured: the opposite of tabula rasa

Probabilistic Agent

[Figure: an agent connected to its environment through sensors and actuators, with a thought bubble: "I believe that the sun will still exist tomorrow with probability 0.999999 and that it will be sunny with probability 0.6."]

Problem
At a certain time t, the KB of an agent is some collection of beliefs
◦ Every belief has a degree of certainty

At time t the agent's sensors make an observation that changes the strength of one or more of its beliefs. How should the agent update the strength of its other beliefs?

Purpose of Bayesian Networks
Facilitate the description of a collection of beliefs
◦ make causality relations explicit
◦ exploit conditional independence

Provide efficient methods for:
◦ Representing a joint probability distribution
◦ Updating belief strengths when new evidence is observed

Page 2

Review
Probabilities represent a degree of belief
◦ Bayesian interpretation
◦ Subjective measure of internal confidence
◦ Frequentist interpretation
◦ Based on sampling from random draws

Probability values
◦ 0 ≤ P ≤ 1
◦ P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

Review
Conditional Probability: P(A|B)
◦ The probability of A given that B is true
◦ The frequency with which A will be observed, considering only samples in which B is true

Bayes Rule

P(A|B) = P(B|A) P(A) / P(B) = P(A ∧ B) / P(B)

P(A|B) P(B) = P(B|A) P(A)

Bayesian networks, a.k.a.
◦ Belief networks
◦ Causal networks
◦ Graphical models

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by a minor earthquake. Is there a burglary?

Network topology reflects "causal" knowledge:
- A burglary can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

[Figure: a directed acyclic graph (DAG) with the causes Burglary and Earthquake at the top, both pointing to Alarm, which points to the effects JohnCalls and MaryCalls.]

Nodes are random variables

Intuitive meaning of an arrow from x to y: "x has direct influence on y"

A Simple Network

Conditional Probability Tables

[Figure: the burglary network again, with a CPT attached to each node:]

P(B) = 0.001        P(E) = 0.002

  B  E  | P(A|B,E)
  T  T  | 0.95
  T  F  | 0.94
  F  T  | 0.29
  F  F  | 0.001

  A | P(J|A)         A | P(M|A)
  T | 0.90           T | 0.70
  F | 0.05           F | 0.01
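To make the tables concrete, here is a minimal Python sketch (my own representation, not from the slides) that stores the CPTs above and evaluates one entry of the full joint via the factored form P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a):

P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(A=T | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                     # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                     # P(M=T | A)

def bern(p, value):
    # Probability that a Boolean variable with P(True) = p takes `value`.
    return p if value else 1.0 - p

def joint(b, e, a, j, m):
    # P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# No burglary, no earthquake, but the alarm rings and both neighbors call:
print(joint(False, False, True, True, True))   # approx 0.000628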

Page 3

What the BN Encodes

Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm

The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm

Independence Relations in BN

[Figures: the burglary network again, plus a second example network over Battery, Radio, SparkPlugs, Gas, Starts, and Moves, with nodes on a path labeled as linear, converging, or diverging.]

Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
1. A node on the path is linear and in E
2. A node on the path is diverging and in E
3. A node on the path is converging and neither this node, nor any descendant, is in E
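These three conditions are easy to mechanize for a single path. Below is a minimal sketch (my own helper names; the slides give no code): classify each interior node of the path from the edge directions, then apply the rules above.

def path_blocked(path, parents, descendants, evidence):
    # True if the undirected `path` (a list of nodes) is blocked given the
    # set `evidence`. parents[x] / descendants[x] describe the DAG.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        into_from_prev = prev in parents[node]   # edge prev -> node?
        into_from_next = nxt in parents[node]    # edge nxt -> node?
        if into_from_prev and into_from_next:    # converging (head-to-head)
            # Condition 3: blocks unless the node or a descendant is observed
            if node not in evidence and not (descendants[node] & evidence):
                return True
        elif node in evidence:                   # linear or diverging node
            return True                          # conditions 1 and 2
    return False

# The burglary network:
parents = {"B": set(), "E": set(), "A": {"B", "E"}, "J": {"A"}, "M": {"A"}}
descendants = {"B": {"A", "J", "M"}, "E": {"A", "J", "M"},
               "A": {"J", "M"}, "J": set(), "M": set()}

# J <- A -> M diverges at A, so observing A blocks the path:
print(path_blocked(["J", "A", "M"], parents, descendants, {"A"}))   # True
# B -> A <- E converges at A; with A unobserved the path is blocked:
print(path_blocked(["B", "A", "E"], parents, descendants, set()))   # True

Two beliefs are then independent given E when every undirected path between them is blocked.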

Where are we?
We know the basics of probability, e.g.
◦ The probability of an event, P(e)
◦ Conditional independence: P(e; f | x)

We know how to represent networks of causal events. But… how do we infer probabilities given a causal network and a set of observations? If Mary calls and John calls, do I believe there was a burglary?

Inference in BN

Set E of evidence variables that are observed, e.g., {JohnCalls, MaryCalls}
Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution P(X|E)
◦ The distribution conditioned on the observations made

  J  M  | P(B | …)
  T  T  | ?

Inference in BN
Simplest case: a two-node network, A → B

For Boolean variables:
P(b) = P(b|a) P(a) + P(b|¬a) P(¬a)
P(¬b) = P(¬b|a) P(a) + P(¬b|¬a) P(¬a)

Note: we are summing over all possible values of A

Inference on a chain

A → B → C

Marginalizing in the same way, one variable at a time: P(c) = Σ_b P(c|b) Σ_a P(b|a) P(a)

Page 4

Inference on a chain

A → B → C → D

P(d) = Σ_c P(d|c) Σ_b P(c|b) Σ_a P(b|a) P(a)

Summary: sum out one variable at a time, passing the intermediate result along the chain. For a chain with n nodes, where each node has k possible values, the complexity is O(n k²).

We can perform variable elimination using a different order, e.g. summing out C first: P(d) = Σ_a P(a) Σ_b P(b|a) Σ_c P(c|b) P(d|c)
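A minimal sketch of the first ordering as code (my own names, not from the slides): the marginal of the last node is computed by repeated k×k matrix-vector products, which is exactly the O(n k²) bound.

import numpy as np

def chain_marginal(prior, cpts):
    # prior[i] = P(X1 = i); cpts[t][i, j] = P(X_{t+1} = j | X_t = i).
    msg = np.asarray(prior, dtype=float)
    for cpt in cpts:
        msg = msg @ cpt        # sum out the current variable: O(k^2) per step
    return msg

# A Boolean chain A -> B -> C -> D with illustrative CPTs:
prior = [0.3, 0.7]
cpt = np.array([[0.9, 0.1],
                [0.2, 0.8]])
print(chain_marginal(prior, [cpt, cpt, cpt]))   # P(D); entries sum to 1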

Variable Elimination
Suppose we're interested in P(Xk). Write the query in the form

P(Xk) = Σ_{x1} … Σ_{xk−1} Σ_{xk+1} … Σ_{xn} Π_i P(xi | parents(Xi))

Iteratively:
◦ Move all irrelevant terms outside of the innermost sum
◦ Perform the innermost sum, getting a new term
◦ Insert the new term into the product

Inference Ex. 2

[Figure: the network Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass.]

P(W) = Σ_{C,S,R} P(w|r,s) P(r|c) P(s|c) P(c)
     = Σ_{S,R} P(w|r,s) Σ_C P(r|c) P(s|c) P(c)
     = Σ_{S,R} P(w|r,s) f_C(r, s)
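The derivation above translates almost line for line into code. A minimal sketch (the CPT numbers are illustrative, since the slide gives none): eliminate C first, producing f_C(r, s), then sum the remaining product over R and S.

import itertools

P_c = 0.5
P_s = {True: 0.10, False: 0.50}                  # P(S=T | C)
P_r = {True: 0.80, False: 0.20}                  # P(R=T | C)
P_w = {(True, True): 0.99, (True, False): 0.90,  # P(W=T | R, S)
       (False, True): 0.90, (False, False): 0.00}

def bern(p, v):
    return p if v else 1.0 - p

# Step 1: f_C(r, s) = sum_c P(r|c) P(s|c) P(c)
f_C = {(r, s): sum(bern(P_r[c], r) * bern(P_s[c], s) * bern(P_c, c)
                   for c in (True, False))
       for r, s in itertools.product((True, False), repeat=2)}

# Step 2: P(W=T) = sum_{r,s} P(w|r,s) f_C(r, s)
p_w = sum(P_w[(r, s)] * f_C[(r, s)]
          for r, s in itertools.product((True, False), repeat=2))
print(p_w)    # 0.6471 with these illustrative numbers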

A More Complex Example

The "Asia" network:

[Figure: Visit to Asia (V) → Tuberculosis (T); Smoking (S) → Lung Cancer (L) and Bronchitis (B); T and L → Abnormality in Chest (A); A → X-Ray (X); A and B → Dyspnea (D).]

Page 5

Example (cont)

Suppose we want to compute P(d). The form of the joint distribution:

P(v, s, t, l, a, b, x, d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Example (cont)

Need to eliminate: v, s, x, t, l, a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: v

Compute: f_v(t) = Σ_v P(v) P(t|v)

⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term.

Example (cont)

Need to eliminate: s, x, t, l, a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: s

Compute: f_s(b, l) = Σ_s P(s) P(b|s) P(l|s)

⇒ f_v(t) f_s(b, l) P(a|t,l) P(x|a) P(d|a,b)

Note: summing on s results in a factor with two arguments, f_s(b, l). In general, the result of elimination may be a function of several variables.

Example (cont)

Need to eliminate: x, t, l, a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: x

Compute: f_x(a) = Σ_x P(x|a)

⇒ f_v(t) f_s(b, l) f_x(a) P(a|t,l) P(d|a,b)

Note: f_x(a) = 1 for all values of a. In fact, every variable that is not an ancestor of the query or evidence variables can be ignored!

Example (cont)

Need to eliminate: t, l, a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) f_x(a) P(a|t,l) P(d|a,b)

Eliminate: t

Compute: f_t(a, l) = Σ_t f_v(t) P(a|t,l)

⇒ f_s(b, l) f_x(a) f_t(a, l) P(d|a,b)

Example (cont)

Need to eliminate: l, a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b, l) f_x(a) f_t(a, l) P(d|a,b)

Eliminate: l

Compute: f_l(a, b) = Σ_l f_s(b, l) f_t(a, l)

⇒ f_x(a) f_l(a, b) P(d|a,b)

Note: we are using functions of eliminated variables to eliminate further variables.

Page 6

Example (cont)

Need to eliminate: a, b

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b, l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b, l) f_x(a) f_t(a, l) P(d|a,b)
⇒ f_x(a) f_l(a, b) P(d|a,b)

Eliminate: a, b

Compute: f_a(b, d) = Σ_a f_x(a) f_l(a, b) P(d|a,b)    f_b(d) = Σ_b f_a(b, d)

⇒ f_a(b, d) ⇒ f_b(d)

Variable Elimination

We now understand variable elimination as a sequence of rewriting operations. Computation depends on the order of elimination.

Dealing with Evidence

How do we deal with evidence? Suppose we get evidence V = t, S = f, D = t, and want to compute P(L | V = t, S = f, D = t).

P(L | V = t, S = f, D = t) = P(L, V = t, S = f, D = t) / P(V = t, S = f, D = t)

We start by writing the factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Since we know that V = t, we don't need to eliminate V. Instead, we can replace the factors P(V) and P(T|V) with

f_P(V) = P(V = t)        f_P(T|V)(T) = P(T | V = t)

These "select" the appropriate parts of the original factors given the evidence. Note that f_P(V) is a constant, and thus does not appear in elimination of other variables.

Compute P(L, V = t, S = f, D = t). Initial factors, after setting evidence:

f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) P(x|a) f_P(d|a,b)(a, b)
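A tiny sketch of this "selection" step (my own dict-based factor representation; the P(T|V) numbers are invented): restricting a factor to the evidence keeps only the rows that agree with the observed value and drops that variable.

# A factor as a dict from assignments (tuples, one slot per variable) to
# numbers. P(T|V) as a factor over (V, T), with illustrative entries:
P_T_given_V = {("t", "t"): 0.05, ("t", "f"): 0.95,
               ("f", "t"): 0.01, ("f", "f"): 0.99}

def restrict(factor, position, value):
    # Keep entries whose variable at `position` equals `value`,
    # and drop that variable from the key.
    return {key[:position] + key[position + 1:]: p
            for key, p in factor.items() if key[position] == value}

# Evidence V = t turns P(T|V) into a factor over T alone: f(T) = P(T | V=t).
print(restrict(P_T_given_V, 0, "t"))   # {('t',): 0.05, ('f',): 0.95}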

Dealing with Evidence

Eliminating x:

⇒ f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) f_x(a) f_P(d|a,b)(a, b)

Page 7

Dealing with Evidence

Eliminating t, we get
⇒ f_P(v) f_P(s) f_P(l|s)(l) f_P(b|s)(b) f_t(a, l) f_x(a) f_P(d|a,b)(a, b)

Eliminating a, we get
⇒ f_P(v) f_P(s) f_P(l|s)(l) f_P(b|s)(b) f_a(b, l)

Eliminating b, we get
⇒ f_P(v) f_P(s) f_P(l|s)(l) f_b(l)

Variable Elimination

Compute the probability of Xk given values for the evidence variables E.

The algorithm is the same as before, with no need to perform summation over the evidence variables.
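Putting the pieces together on the burglary network: the sketch below answers P(B | J = t, M = t) by brute-force summation of the factored joint over the hidden variables E and A (variable elimination computes the same sums more efficiently by caching factors; this version is just easiest to verify). The CPTs are the ones given earlier in the lecture.

import itertools

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bern(p, v):
    return p if v else 1.0 - p

def joint(b, e, a, j, m):
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def posterior_burglary(j, m):
    # P(B=T | J=j, M=m): sum out E and A, then normalize.
    hidden = list(itertools.product((True, False), repeat=2))
    num = sum(joint(True, e, a, j, m) for e, a in hidden)
    den = num + sum(joint(False, e, a, j, m) for e, a in hidden)
    return num / den

print(posterior_burglary(True, True))   # approx 0.284: burglary still unlikely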

Complexity of Inference

Theorem: Computing P(X = x) in a Bayesian network is NP-hard.

Approaches to Inference

Exact inference
◦ Variable elimination
◦ Join tree algorithm

Approximate inference
◦ Simplify the structure of the network to make exact inference efficient (variational methods, loopy belief propagation)

Probabilistic methods
◦ Stochastic simulation / sampling methods
◦ Markov chain Monte Carlo methods
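As a taste of the sampling methods (a minimal sketch, not from the slides): rejection sampling draws joint samples in topological order and keeps only those that match the evidence, estimating the same P(B | J = t, M = t) as above.

import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def sample_once(rng):
    # Draw one joint sample, parent before child.
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = rng.random() < P_J[a]
    m = rng.random() < P_M[a]
    return b, j, m

rng = random.Random(0)
kept = burglaries = 0
for _ in range(2_000_000):
    b, j, m = sample_once(rng)
    if j and m:               # reject samples that contradict the evidence
        kept += 1
        burglaries += b
print(burglaries / kept)      # noisy estimate of P(B=T | j, m), approx 0.284

Rejection sampling wastes most samples here because the evidence J = t, M = t is rare (roughly 0.2% of draws); likelihood weighting and MCMC methods avoid this waste.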