The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)


Page 1: The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)

Page 2:

Recall the rule for inference in Bayesian networks:

P((X_1 = x_1) ∧ ... ∧ (X_n = x_n)) = ∏_{i=1}^{n} P(X_i = x_i | parents(X_i))

Example: What is P(C, R, ¬S, W)?

A (slightly) harder question: What is P(C | W, R)?
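For concreteness, here is one way to encode the network's conditional probability tables and the joint above in Python. The numbers in the Cloudy = true rows are the ones used in the worked example on the later slides, and the Cloudy = false rows are consistent with the .0945 computed there; the P(W | ¬S, ¬R) = 0 entry is the textbook value and is never used on these slides. The dictionary layout and names are our own.

```python
# Sprinkler network: Cloudy -> {Sprinkler, Rain} -> WetGrass.
P_C = 0.5                                  # P(Cloudy = true)
P_S = {True: 0.1, False: 0.5}              # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}              # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}  # P(WetGrass = true | S, R)

def val(p, x):
    """P(X = x) given p = P(X = true)."""
    return p if x else 1.0 - p

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) as the product of P(x_i | parents(X_i))."""
    return val(P_C, c) * val(P_S[c], s) * val(P_R[c], r) * val(P_W[(s, r)], w)

print(joint(True, False, True, True))   # P(C, ¬S, R, W) = .5 × .9 × .8 × .9 = 0.324
```

With these tables, the first question has the answer P(C, R, ¬S, W) = .5 × .9 × .8 × .9 = .324.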

Page 3:

Exact Inference in Bayesian Networks

General question: What is P(X | e)?

Notation convention: upper-case letters refer to random variables; lower-case letters refer to specific values of those variables.

Page 4:

General question: Given query variable X and observed evidence variable values e, what is P(X | e)?

P(X | e) = P(X, e) / P(e)   (definition of conditional probability)

 = α P(X, e),   where α = 1 / P(e)

 = α Σ_y P(X, e, y)   (where Y are the non-evidence variables other than X)

 = α Σ_y ∏_{z ∈ {X, e, y}} P(z | parents(Z))   (semantics of Bayesian networks)
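A minimal sketch of the enumeration procedure this derivation describes, written against the `joint` function from the earlier sketch (the names are ours, not the slides'):

```python
import itertools

VARS = ["C", "S", "R", "W"]   # Cloudy, Sprinkler, Rain, WetGrass

def enumeration_ask(query_var, evidence, joint):
    """Return (P(query=true | e), P(query=false | e)) by summing the joint.

    evidence is a dict such as {"W": True, "R": True}; joint is a function
    of (c, s, r, w), e.g. the one defined in the earlier sketch.
    """
    dist = []
    for q in (True, False):
        fixed = dict(evidence, **{query_var: q})
        hidden = [v for v in VARS if v not in fixed]
        total = 0.0
        # Sum P(X, e, y) over all assignments y to the hidden variables Y.
        for ys in itertools.product((True, False), repeat=len(hidden)):
            full = dict(fixed, **dict(zip(hidden, ys)))
            total += joint(full["C"], full["S"], full["R"], full["W"])
        dist.append(total)
    alpha = 1.0 / sum(dist)   # alpha = 1 / P(e)
    return tuple(alpha * p for p in dist)
```

The inner loop is exactly the α Σ_y P(X, e, y) line above; the exponential cost comes from enumerating itertools.product over the hidden variables.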

Page 5:

Example: What is P(C | W, R)?

Page 6:

P(C | W, R) = α [ P(C, R, W, S) + P(C, R, W, ¬S) ]

 = α [ ∏_{z ∈ {C,R,W,S}} P(z | parents(Z)) + ∏_{z ∈ {C,R,W,¬S}} P(z | parents(Z)) ]

 = α [ P(C) P(R | C) P(S | C) P(W | S, R) + P(C) P(R | C) P(¬S | C) P(W | ¬S, R) ]

 = α [ (.5 × .8 × .1 × .99) + (.5 × .8 × .9 × .9) ]

 = α (.3636)

By the same computation, P(¬C | R, W) = α (.0945). Normalizing:

⟨P(C | R, W), P(¬C | R, W)⟩ = α ⟨.3636, .0945⟩

 = ⟨.3636 / (.3636 + .0945), .0945 / (.3636 + .0945)⟩

 = ⟨.794, .206⟩
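This agrees with a direct check using the two sketches above (our hypothetical names):

```python
# Reusing joint() and enumeration_ask() from the earlier sketches.
p_c, p_not_c = enumeration_ask("C", {"W": True, "R": True}, joint)
print(round(p_c, 3), round(p_not_c, 3))   # 0.794 0.206
```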

Page 7:

• Worst-case complexity is exponential in n (the number of nodes).

• The problem is having to enumerate all value combinations of many variables, as in

Σ_s P(c) P(r | c) P(s | c) P(w | s, r)

Page 8:

Computation can be reduced by computing each term only once and storing it for future use.

See “variable elimination algorithm” in reading.
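A minimal illustration of the saving, using the Page 7 sum (a sketch of the idea, not the full variable elimination algorithm): the factors P(c) and P(r | c) do not mention s, so they can be computed once and pulled out of the sum.

```python
# Inner pairs are (P(s | c), P(w | s, r)) for s = true and s = false.
terms = [(0.1, 0.99), (0.9, 0.90)]

# Naive enumeration recomputes P(c) * P(r | c) inside the sum over s.
naive = sum(0.5 * 0.8 * p_s * p_w for p_s, p_w in terms)

# Factored form computes it once and pushes the sum inward.
factored = 0.5 * 0.8 * sum(p_s * p_w for p_s, p_w in terms)

assert abs(naive - factored) < 1e-12   # both equal 0.3636
```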

Page 9:

• In general, however, exact inference in Bayesian networks is too expensive.

Page 10:

Approximate inference in Bayesian networks

Instead of enumerating all possibilities, sample to estimate probabilities.

[Figure: a chain of nodes X1, X2, X3, ..., Xn]

Page 11:

Direct Sampling

• Suppose we have no evidence, but we want to determine P(C, S, R, W) for all C, S, R, W.

• Direct sampling:
 – Sample each variable in topological order, conditioned on the values of its parents.
 – I.e., always sample from P(X_i | parents(X_i)), as in the sketch below.
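A sketch of direct sampling for the sprinkler network, reusing the CPT dictionaries from the first sketch; each `random.random()` draw is compared against P(X = true | parents):

```python
import random

def prior_sample():
    """Sample (c, s, r, w) in topological order from P(X_i | parents(X_i))."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

print(prior_sample())   # e.g. (True, False, True, True)
```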

Page 12:

Example

1. Sample from P(Cloudy). Suppose it returns true.

2. Sample from P(Sprinkler | Cloudy = true). Suppose it returns false.

3. Sample from P(Rain | Cloudy = true). Suppose it returns true.

4. Sample from P(WetGrass | Sprinkler = false, Rain = true). Suppose it returns true.

Here is the sampled event: [true, false, true, true]

Page 13:

• Suppose there are N total samples, and let N_S(x_1, ..., x_n) be the observed frequency of the specific event x_1, ..., x_n. Then

N_S(x_1, ..., x_n) / N ≈ P(x_1, ..., x_n)

and, in the limit,

lim_{N→∞} N_S(x_1, ..., x_n) / N = P(x_1, ..., x_n)

• Suppose N samples, n nodes. Complexity is O(Nn).

• Problem 1: Need lots of samples to get good probability estimates.

• Problem 2: Many samples are unrepresentative; low-likelihood events are rarely generated.
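A quick check of the frequency estimate, reusing prior_sample and joint from the sketches above (N = 100,000 is an arbitrary choice of ours):

```python
from collections import Counter

N = 100_000
counts = Counter(prior_sample() for _ in range(N))
event = (True, False, True, True)   # (C, ¬S, R, W)
print(counts[event] / N, "vs exact", joint(*event))   # both ≈ 0.324
```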

Page 14:

Markov Chain Monte Carlo Sampling

• One of the most common methods used in real applications.

• Uses idea of Markov blanket of a variable Xi: – parents, children, children’s other parents

• Fact: By construction of Bayesian network, a node is conditionally independent of its non-descendants, given its parents.

Page 15:

• Proposition: A node Xi is conditionally independent of all other nodes in the network, given its Markov blanket.

What is the Markov blanket of Rain?

What is the Markov blanket of Wet Grass?
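Both questions can be answered mechanically from the definition; here is a small helper over an explicit parent list (the encoding and names are ours):

```python
PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}

def markov_blanket(x):
    """Parents of x, children of x, and the children's other parents."""
    children = [v for v, ps in PARENTS.items() if x in ps]
    blanket = set(PARENTS[x]) | set(children)
    for child in children:
        blanket |= set(PARENTS[child])   # children's other parents
    blanket.discard(x)
    return blanket

print(markov_blanket("Rain"))       # {'Cloudy', 'Sprinkler', 'WetGrass'}
print(markov_blanket("WetGrass"))   # {'Sprinkler', 'Rain'}
```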

Page 16:

Markov Chain Monte Carlo (MCMC) Sampling Algorithm

• Start with random sample from variables, with evidence variables fixed: (x1, ..., xn). This is the current “state” of the algorithm.

• Next state: Randomly sample a value for one non-evidence variable X_i, conditioned on the current values in the Markov blanket of X_i.
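A sketch of this loop for the query on the next slide, P(Rain | Sprinkler = true, WetGrass = true), reusing the CPTs and val() from the earlier sketches; the burn-in length and sample count are arbitrary choices of ours:

```python
import random

def gibbs_ask_rain(n_samples, burn_in=100):
    """Estimate P(Rain = true | Sprinkler = true, WetGrass = true)."""
    s, w = True, True                                  # evidence, kept fixed
    c = random.random() < 0.5                          # random initial state
    r = random.random() < 0.5
    rain_true = 0
    for i in range(n_samples + burn_in):
        # Resample Cloudy from P(C | s, r) ∝ P(C) P(s | C) P(r | C)
        # (the blanket of Cloudy is its children, Sprinkler and Rain).
        wt = {x: val(P_C, x) * val(P_S[x], s) * val(P_R[x], r)
              for x in (True, False)}
        c = random.random() < wt[True] / (wt[True] + wt[False])
        # Resample Rain from P(R | c, s, w) ∝ P(R | c) P(w | s, R)
        # (the blanket of Rain is Cloudy, WetGrass, and Sprinkler).
        wt = {x: val(P_R[c], x) * val(P_W[(s, x)], w) for x in (True, False)}
        r = random.random() < wt[True] / (wt[True] + wt[False])
        if i >= burn_in:
            rain_true += r                             # True counts as 1
    return rain_true / n_samples

print(gibbs_ask_rain(50_000))   # ≈ 0.32 with these CPTs
```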

Page 17:

Example

• Query: What is P(Rain | Sprinkler = true, WetGrass = true)?

• MCMC:
 – Random sample, with evidence variables fixed:
 [Cloudy, Sprinkler, Rain, WetGrass] = [true, true, false, true]
 – Repeat:
 1. Sample Cloudy, given the current values of its Markov blanket: Sprinkler = true, Rain = false. Suppose the result is false. New state: [false, true, false, true]
 2. Sample Rain, given the current values of its Markov blanket: Cloudy = false, Sprinkler = true, WetGrass = true. Suppose the result is true. New state: [false, true, true, true]

Page 18:

• Each sample contributes to the estimate for the query P(Rain | Sprinkler = true, WetGrass = true).

• Suppose we perform 100 such samples, 20 with Rain = true and 80 with Rain = false.

• Then the answer to the query is Normalize(⟨20, 80⟩) = ⟨.20, .80⟩.

• Claim: “The sampling process settles into a dynamic equilibrium in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability, given the evidence.”
 – That is: for all variables X_i, the probability of the value x_i of X_i appearing in a sample is equal to P(x_i | e).

• Proof of claim: reference on request.
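The normalization step is just rescaling the counts so they sum to 1; a one-liner for completeness:

```python
def normalize(counts):
    """Scale nonnegative counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

print(normalize([20, 80]))   # [0.2, 0.8], as in the example above
```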

Page 19:

Issues in Bayesian Networks

• Building / learning network topology

• Assigning / learning conditional probability tables

• Approximate inference via sampling

• Incorporating temporal aspects (e.g., evidence changes from one time step to the next).

Page 20:

Learning network topology

• Many different approaches, including:

– Heuristic search, with evaluation based on information theory measures

– Genetic algorithms

– Using “meta” Bayesian networks!

Page 21:

Learning conditional probabilities

• In general, random variables are not binary, but real-valued

• Conditional probability tables become conditional probability distributions.

• Estimate parameters of these distributions from data

• If data is missing on one or more variables, use “expectation maximization” algorithm


Page 24:

Speech Recognition

• Task: Identify sequence of words uttered by speaker, given acoustic signal.

• Uncertainty introduced by noise, speaker error, variation in pronunciation, homonyms, etc.

• Thus speech recognition is viewed as a problem of probabilistic inference.

Page 25:

Speech Recognition

• So far, we’ve looked at probabilistic reasoning in static environments.

• Speech: a time sequence of “static environments”.
 – Let X be the “state variables” (i.e., the set of non-evidence variables) describing the environment (e.g., the words said during time step t).
 – Let E be the set of evidence variables (e.g., features of the acoustic signal).

Page 26:

– The values of E and the joint probability distribution over X change over time:
 t1: X1, e1
 t2: X2, e2
 etc.

Page 27:

• At each time step t, we want to compute P(Words | S).

• We know from Bayes’ rule:

P(Words | S) = α P(S | Words) P(Words)

• P(S | Words), for all words, is a previously learned “acoustic model”.
 – E.g., for each word, a probability distribution over phones, and for each phone, a probability distribution over acoustic signals (which can vary in pitch, speed, and volume).

• P(Words), for all words, is the “language model”, which specifies the prior probability of each utterance.
 – E.g., the “bigram model”: the probability of each word following each other word. (A toy version is sketched below.)
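A toy bigram model of the kind described above, estimated by raw counting; the corpus and the absence of smoothing are illustrative simplifications of ours:

```python
from collections import Counter

corpus = "i am thirsty can i have something to drink".split()   # toy corpus

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of (word_{t-1}, word_t)
unigrams = Counter(corpus[:-1])              # counts of word_{t-1}

def p_bigram(word, prev):
    """Estimate P(word_t = word | word_{t-1} = prev) from raw counts."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_bigram("i", "can"))   # 1.0 in this tiny corpus
```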

Page 28:

• Speech recognition typically makes three assumptions:

1. The process underlying change is itself “stationary”, i.e., the state transition probabilities don’t change.

2. The current state X depends on only a finite history of previous states (the “Markov assumption”).
 – A Markov process of order n: the current state depends only on the n previous states.

3. The values e_t of the evidence variables depend only on the current state X_t (the “sensor model”).

Page 29:

From http://www.cs.berkeley.edu/~russell/slides/

Page 30:

From http://www.cs.berkeley.edu/~russell/slides/

Page 31:

Hidden Markov Models

• Markov model: given state X_t, what is the probability of transitioning to the next state X_{t+1}?

• E.g., word bigram probabilities give P(word_{t+1} | word_t).

• Hidden Markov model: there are observable states (e.g., the signal S) and “hidden” states (e.g., Words). An HMM represents the probabilities of the hidden states given the observable states.
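For decoding (recovering the most likely hidden-state sequence), the standard algorithm is Viterbi, which these slides do not name explicitly; here is a generic sketch with a placeholder umbrella-world model in the style of Russell's slides, not taken from them:

```python
def viterbi(states, start_p, trans_p, emit_p, observations):
    """Most likely hidden-state sequence for the observations (max-product DP)."""
    # best[s]: probability of the best path ending in state s; path[s]: that path.
    best = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_best, new_path = {}, {}
        for s in states:
            # Best predecessor for s under transition and emission probabilities.
            prev = max(states, key=lambda p: best[p] * trans_p[p][s])
            new_best[s] = best[prev] * trans_p[prev][s] * emit_p[s][obs]
            new_path[s] = path[prev] + [s]
        best, path = new_best, new_path
    last = max(states, key=lambda s: best[s])
    return path[last], best[last]

states = ("rainy", "sunny")
seq, p = viterbi(states,
                 start_p={"rainy": 0.5, "sunny": 0.5},
                 trans_p={"rainy": {"rainy": 0.7, "sunny": 0.3},
                          "sunny": {"rainy": 0.3, "sunny": 0.7}},
                 emit_p={"rainy": {"umbrella": 0.9, "none": 0.1},
                         "sunny": {"umbrella": 0.2, "none": 0.8}},
                 observations=["umbrella", "umbrella", "none"])
print(seq, p)   # ['rainy', 'rainy', 'sunny'] with its path probability
```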

Page 32:

From http://www.cs.berkeley.edu/~russell/slides/

Page 33:

From http://www.cs.berkeley.edu/~russell/slides/

Page 34:

Example: “I’m firsty, um, can I have something to dwink?”

From http://www.cs.berkeley.edu/~russell/slides/

Page 35:

From http://www.cs.berkeley.edu/~russell/slides/