© Daniel S. Weld

Statistical Learning
CSE 573
Lecture 16 slides, which overlap and fix several errors
Logistics

• Team Meetings
• Midterm
  Open book, notes
  Studying: see AIMA exercises
573 Topics

Agency
Problem Spaces
Search
Knowledge Representation & Inference
Planning
Supervised Learning
Logic-Based
Probabilistic
Reinforcement Learning
Topics

• Parameter Estimation:
  Maximum Likelihood (ML)
  Maximum A Posteriori (MAP)
  Bayesian
  Continuous case
• Learning Parameters for a Bayesian Network
• Naive Bayes
  Maximum Likelihood estimates
  Priors
• Learning Structure of Bayesian Networks
Coin Flip

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9

Which coin will I use?
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3

Prior: Probability of a hypothesis before we make any observations
Coin Flip

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9

Which coin will I use?
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3

Uniform Prior: All hypotheses are equally likely before we make any observations
Experiment 1: Heads

Which coin did I use?
P(C1|H) = ?   P(C2|H) = ?   P(C3|H) = ?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3
Experiment 1: Heads

Which coin did I use?
P(C1|H) = 0.066   P(C2|H) = 0.333   P(C3|H) = 0.600

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3

Posterior: Probability of a hypothesis given data
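To make the arithmetic concrete, here is a minimal Python sketch (my own, not from the slides) of the Bayes-rule update behind these numbers:

```python
# Posterior over the three coins after observing one Heads:
# P(Ci | H) = P(H | Ci) P(Ci) / P(H), where P(H) = sum_i P(H | Ci) P(Ci).
priors = {"C1": 1/3, "C2": 1/3, "C3": 1/3}
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}

unnormalized = {c: p_heads[c] * priors[c] for c in priors}
z = sum(unnormalized.values())                  # P(H) = 0.5 here
posterior = {c: v / z for c, v in unnormalized.items()}
print(posterior)  # C1: 0.067, C2: 0.333, C3: 0.600
```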
Terminology

• Prior: Probability of a hypothesis before we see any data
• Uniform Prior: A prior that makes all hypotheses equally likely
• Posterior: Probability of a hypothesis after we see some data
• Likelihood: Probability of data given hypothesis
Experiment 2: Tails

Which coin did I use?
P(C1|HT) = ?   P(C2|HT) = ?   P(C3|HT) = ?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3
Experiment 2: Tails

Which coin did I use?
P(C1|HT) = 0.21   P(C2|HT) = 0.58   P(C3|HT) = 0.21

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3
Your Estimate?

What is the probability of heads after two experiments?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 1/3   P(C2) = 1/3   P(C3) = 1/3

Most likely coin: C2
Best estimate for P(H): P(H|C2) = 0.5
Your Estimate?

Most likely coin: C2 (P(C2) = 1/3)
Best estimate for P(H): P(H|C2) = 0.5

Maximum Likelihood Estimate: The best hypothesis that fits the observed data, assuming a uniform prior
Using Prior Knowledge

• Should we always use a uniform prior?
• Background knowledge:
  Heads => we have a take-home midterm
  Dan doesn’t like take-homes…
  => Dan is more likely to use a coin biased in his favor

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
Using Prior Knowledge

We can encode it in the prior:

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 0.05   P(C2) = 0.25   P(C3) = 0.70
Experiment 1: Heads

Which coin did I use?
P(C1|H) = ?   P(C2|H) = ?   P(C3|H) = ?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 0.05   P(C2) = 0.25   P(C3) = 0.70
Experiment 1: Heads

Which coin did I use?
P(C1|H) = 0.006   P(C2|H) = 0.165   P(C3|H) = 0.829

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 0.05   P(C2) = 0.25   P(C3) = 0.70

Compare with the posterior under the uniform prior after Exp 1:
P(C1|H) = 0.066   P(C2|H) = 0.333   P(C3|H) = 0.600
Experiment 2: Tails

Which coin did I use?
P(C1|HT) = ?   P(C2|HT) = ?   P(C3|HT) = ?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 0.05   P(C2) = 0.25   P(C3) = 0.70
Experiment 2: Tails

Which coin did I use?
P(C1|HT) = 0.035   P(C2|HT) = 0.481   P(C3|HT) = 0.485

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 0.05   P(C2) = 0.25   P(C3) = 0.70
Your Estimate?

What is the probability of heads after two experiments?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1) = 0.05   P(C2) = 0.25   P(C3) = 0.70

Most likely coin: C3
Best estimate for P(H): P(H|C3) = 0.9
Your Estimate?

Most likely coin: C3 (P(C3) = 0.70)
Best estimate for P(H): P(H|C3) = 0.9

Maximum A Posteriori (MAP) Estimate: The best hypothesis that fits the observed data, assuming a non-uniform prior
Did We Do The Right Thing?

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1|HT) = 0.035   P(C2|HT) = 0.481   P(C3|HT) = 0.485

C2 and C3 are almost equally likely
A Better Estimate

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1|HT) = 0.035   P(C2|HT) = 0.481   P(C3|HT) = 0.485

Recall: P(H|HT) = Σi P(H|Ci) P(Ci|HT)
               = 0.1 × 0.035 + 0.5 × 0.481 + 0.9 × 0.485 = 0.680
Bayesian Estimate

P(H|C1) = 0.1   P(H|C2) = 0.5   P(H|C3) = 0.9
P(C1|HT) = 0.035   P(C2|HT) = 0.481   P(C3|HT) = 0.485

P(H|HT) = Σi P(H|Ci) P(Ci|HT) = 0.680

Bayesian Estimate: Minimizes prediction error, given data and (generally) assuming a non-uniform prior
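This sketch (variable names are mine, not from the slides) shows the weighted combination behind the 0.680 figure explicitly:

```python
# Bayesian (predictive) estimate of P(H) after observing H then T:
# each coin's bias, weighted by its posterior probability.
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
posterior_ht = {"C1": 0.035, "C2": 0.481, "C3": 0.485}

p_h = sum(p_heads[c] * posterior_ht[c] for c in p_heads)
print(round(p_h, 3))  # 0.68
```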
Comparison

After more experiments: HTHHHHHHHH (HT, then 8 more heads)

ML (Maximum Likelihood): P(H) = 0.5; after 10 experiments, P(H) = 0.9
MAP (Maximum A Posteriori): P(H) = 0.9; after 10 experiments, P(H) = 0.9
Bayesian: P(H) = 0.68; after 10 experiments, P(H) = 0.9
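The convergence of all three estimators can be checked with a short script; this is a sketch under the slide's numbers (the helper names are mine):

```python
# Compare ML, MAP, and Bayesian estimates of P(H) for the 3-coin model.
P_HEADS = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
UNIFORM = {"C1": 1/3, "C2": 1/3, "C3": 1/3}
INFORMED = {"C1": 0.05, "C2": 0.25, "C3": 0.70}

def posterior(prior, data):
    """P(Ci | data) ∝ P(data | Ci) P(Ci), for data like 'HT'."""
    unnorm = {}
    for c, p in prior.items():
        for flip in data:
            p *= P_HEADS[c] if flip == "H" else 1 - P_HEADS[c]
        unnorm[c] = p
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def estimates(data):
    post_u = posterior(UNIFORM, data)    # uniform prior  -> ML
    post_i = posterior(INFORMED, data)   # informed prior -> MAP, Bayesian
    ml = P_HEADS[max(post_u, key=post_u.get)]
    map_ = P_HEADS[max(post_i, key=post_i.get)]
    bayes = sum(P_HEADS[c] * post_i[c] for c in P_HEADS)
    return ml, map_, bayes

print(estimates("HT"))            # (0.5, 0.9, 0.68)
print(estimates("HT" + "H" * 8))  # (0.9, 0.9, ~0.9)
```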
Comparison

ML (Maximum Likelihood):
  Easy to compute

MAP (Maximum A Posteriori):
  Still easy to compute
  Incorporates prior knowledge

Bayesian:
  Minimizes error => great when data is scarce
  Potentially much harder to compute
Summary For Now

| Estimate                      | Prior   | Hypothesis           |
|-------------------------------|---------|----------------------|
| Maximum Likelihood Estimate   | Uniform | The most likely      |
| Maximum A Posteriori Estimate | Any     | The most likely      |
| Bayesian Estimate             | Any     | Weighted combination |
Continuous Case

• In the previous example, we chose from a discrete set of three coins
• In general, we have to pick from a continuous distribution of biased coins
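The slides draw this continuous case as curves; here is a grid-based sketch of the same update (the uniform-prior choice and the grid resolution are my assumptions):

```python
import numpy as np

# Posterior over a continuous coin bias theta in [0, 1], on a grid:
# P(theta | data) ∝ theta^h (1 - theta)^t P(theta), for h heads, t tails.
theta = np.linspace(0, 1, 101)
prior = np.ones_like(theta)      # uniform prior over biases
h, t = 1, 1                      # Exp 1: Heads, Exp 2: Tails

post = theta**h * (1 - theta)**t * prior
post /= np.trapz(post, theta)    # normalize to a density

map_estimate = theta[np.argmax(post)]           # posterior peak: 0.5
bayes_estimate = np.trapz(theta * post, theta)  # posterior mean: 0.5
```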
Continuous Case

[Plot: a density over the coin bias θ on [0, 1].]
Continuous Case

[Plots: densities over the coin bias, in two rows (uniform prior; prior with background knowledge) and three columns (Prior; after Exp 1: Heads; after Exp 2: Tails).]
Continuous Case

Posterior after 2 experiments:

[Plots: posterior densities under the uniform prior and under the prior with background knowledge, each marked with the ML, MAP, and Bayesian estimates.]
After 10 Experiments...

Posterior:

[Plots: posterior densities under the uniform prior and under the prior with background knowledge, each marked with the ML, MAP, and Bayesian estimates; both posteriors are now sharply peaked.]
After 100 Experiments...
Topics

• Parameter Estimation:
  Maximum Likelihood (ML)
  Maximum A Posteriori (MAP)
  Bayesian
  Continuous case
• Learning Parameters for a Bayesian Network
• Naive Bayes
  Maximum Likelihood estimates
  Priors
• Learning Structure of Bayesian Networks
Review: Conditional Probability

• P(A | B) is the probability of A given B
• Assumes that B is the only info known.
• Defined by:

  P(A | B) = P(A ∧ B) / P(B)

[Venn diagram: the universe "True" with overlapping events A and B; the overlap is A ∧ B.]
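As a tiny numeric check of the definition (the joint-table numbers are my own toy values):

```python
# P(A | B) = P(A ∧ B) / P(B), from a small joint distribution over (A, B).
joint = {(True, True): 0.12, (True, False): 0.28,
         (False, True): 0.18, (False, False): 0.42}

p_b = sum(p for (a, b), p in joint.items() if b)  # P(B) = 0.30
p_a_and_b = joint[(True, True)]                   # P(A ∧ B) = 0.12
print(p_a_and_b / p_b)                            # P(A | B) = 0.4
```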
Conditional Independence

[Venn diagram: the universe "True" with overlapping events A and B.]

A & B are not independent, since P(A|B) < P(A)
Conditional Independence

[Venn diagram: events A and B overlapping a third event C; the regions A ∧ C and B ∧ C are marked.]

But: A & B are made independent by C:

  P(A | C) = P(A | B, C)
Bayes Rule

  P(H | E) = P(E | H) P(H) / P(E)

Simple proof from the definition of conditional probability:

1. P(H | E) = P(H ∧ E) / P(E)          (Def. cond. prob.)
2. P(E | H) = P(H ∧ E) / P(H)          (Def. cond. prob.)
3. P(H ∧ E) = P(E | H) P(H)            (Multiply #2 by P(H))

QED: P(H | E) = P(E | H) P(H) / P(E)   (Substitute #3 into #1)
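A quick numeric check of the rule, reusing the coin numbers from earlier (the variable names are mine):

```python
# Bayes rule on the coin example: H = "coin is C3", E = "flip came up heads".
p_h = 1 / 3                     # P(H): prior that the coin is C3
p_e_given_h = 0.9               # P(E | H): C3's heads probability
p_e = (0.1 + 0.5 + 0.9) / 3     # P(E) by total probability over the coins

print(p_e_given_h * p_h / p_e)  # P(H | E) = 0.6, matching P(C3|H) above
```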
An Example Bayes Net

Structure: Earthquake → Radio; Earthquake → Alarm ← Burglary; Alarm → Nbr1Calls; Alarm → Nbr2Calls

| Pr(B=t) | Pr(B=f) |
|---------|---------|
| 0.05    | 0.95    |

Pr(A | E, B), with Pr(¬A | E, B) in parentheses:

| E  | B  | Pr(A)       |
|----|----|-------------|
| e  | b  | 0.9 (0.1)   |
| e  | ¬b | 0.2 (0.8)   |
| ¬e | b  | 0.85 (0.15) |
| ¬e | ¬b | 0.01 (0.99) |
Given Parents, X is Independent of Non-Descendants
Given Markov Blanket, X is Independent of All Other Nodes

MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))
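A small sketch of that definition as code (the helper and the parent-map encoding are mine, using the alarm network above):

```python
# Markov blanket: parents ∪ children ∪ parents-of-children (minus X itself).
def markov_blanket(x, parents):
    children = {n for n, ps in parents.items() if x in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents[x]) | children | spouses) - {x}

parents = {"Earthquake": [], "Burglary": [], "Radio": ["Earthquake"],
           "Alarm": ["Earthquake", "Burglary"],
           "Nbr1Calls": ["Alarm"], "Nbr2Calls": ["Alarm"]}
print(markov_blanket("Earthquake", parents))
# {'Radio', 'Alarm', 'Burglary'}: Burglary enters as Alarm's other parent
```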
Parameter Estimation and Bayesian Networks

| E | B | R | A | J | M |
|---|---|---|---|---|---|
| T | F | T | T | F | T |
| F | F | F | F | F | T |
| F | T | F | T | T | T |
| F | F | F | T | T | T |
| F | T | F | F | F | F |

...

We have: Bayes Net structure and observations
We need: Bayes Net parameters
Parameter Estimation and Bayesian Networks

(Same observations.)

P(B) = ?

[Plots: a prior density over P(B) + the data => a posterior density.]

Now compute either the MAP or the Bayesian estimate
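The ML ingredient for a root parameter like P(B) is just a frequency; a one-line check against the table (my encoding of its B column):

```python
# ML estimate of P(B=t): fraction of rows with B = True.
b_column = [False, False, True, False, True]  # B values from the data table
print(sum(b_column) / len(b_column))          # 0.4 on these five rows
```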
Parameter Estimation and Bayesian Networks

| E | B | R | A | J | M |
|---|---|---|---|---|---|
| T | F | T | T | F | T |
| F | F | F | F | F | T |
| F | T | F | T | T | T |
| F | F | F | T | T | T |
| F | T | F | F | F | F |

...

P(A|E,B) = ?   P(A|E,¬B) = ?   P(A|¬E,B) = ?   P(A|¬E,¬B) = ?
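These conditional parameters fall out of counting rows; a minimal sketch (the smoothing constant stands in for the prior discussed next, and the helper name is mine):

```python
# Estimate P(A=t | E=e, B=b) from complete data by counting.
# Rows are (E, B, R, A, J, M), True/False, copied from the table above.
data = [
    (True,  False, True,  True,  False, True),
    (False, False, False, False, False, True),
    (False, True,  False, True,  True,  True),
    (False, False, False, True,  True,  True),
    (False, True,  False, False, False, False),
]
E, B, A = 0, 1, 3  # column indices

def p_a_given(e, b, pseudo=1):
    """pseudo=0 gives the pure ML estimate; pseudo>0 adds Laplace
    smoothing, a simple uniform prior over the parameter."""
    rows = [r for r in data if r[E] == e and r[B] == b]
    hits = sum(r[A] for r in rows)
    return (hits + pseudo) / (len(rows) + 2 * pseudo)

print(p_a_given(False, True))  # P(A | ¬E, B) from the two matching rows
```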
Parameter Estimation and Bayesian Networks

(Same observations.)

P(A|E,B) = ?   P(A|E,¬B) = ?   P(A|¬E,B) = ?   P(A|¬E,¬B) = ?

[Plots: a prior density over each parameter + the data => a posterior density.]

Now compute either the MAP or the Bayesian estimate
Topics

• Parameter Estimation:
  Maximum Likelihood (ML)
  Maximum A Posteriori (MAP)
  Bayesian
  Continuous case
• Learning Parameters for a Bayesian Network
• Naive Bayes
  Maximum Likelihood estimates
  Priors
• Learning Structure of Bayesian Networks
Recap

• Given a BN structure (with discrete or continuous variables), we can learn the parameters of the conditional probability tables.

[Example structures: a naive-Bayes spam model (Spam → Nigeria, Nude, Sex) and the alarm network (Earthquake, Burglary → Alarm → N1, N2).]
What if we don’t know structure?
Learning The Structure of Bayesian Networks

• Search thru the space… of possible network structures! (For now, assume we observe all variables.)
• For each structure, learn parameters
• Pick the one that fits observed data best

Caveat: won’t we end up fully connected?!
• When scoring, add a penalty for model complexity
Learning The Structure of Bayesian Networks

• Search thru the space
• For each structure, learn parameters
• Pick the one that fits observed data best

• Problem? Exponential number of networks!
  And we need to learn parameters for each!
  Exhaustive search is out of the question!
• So what now?
Learning The Structure of Bayesian Networks

Local search!
  Start with some network structure
  Try to make a change (add, delete, or reverse an edge)
  See if the new network is any better (see the sketch below)
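A hedged sketch of that loop (all names here are mine; a real scorer would be a penalized likelihood such as BIC, and is_dag must reject cyclic candidates):

```python
import random

# Greedy local search over network structures, as described above.
def local_search(nodes, data, score, is_dag, steps=1000):
    edges = set()                      # start from some initial structure
    best = score(edges, data)
    for _ in range(steps):
        a, b = random.sample(nodes, 2)
        cand = set(edges)
        if (a, b) in cand:             # delete, or reverse, an existing edge
            cand.remove((a, b))
            if random.random() < 0.5:
                cand.add((b, a))
        else:                          # or add a new edge
            cand.add((a, b))
        if is_dag(cand):
            s = score(cand, data)      # fit plus complexity penalty
            if s > best:
                edges, best = cand, s
    return edges
```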
What should be the initial state?
Initial Network Structure?
• Uniform prior over random networks?
• Network which reflects expert knowledge?
Learning BN Structure
The Big Picture

• We described how to do MAP (and ML) learning of a Bayes net (including structure)
• How would Bayesian learning (of BNs) differ?
  Find all possible networks
  Calculate their posteriors
  When doing inference, return a weighted combination of predictions from all networks! (A sketch follows.)
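In code, that last step is model averaging; a sketch (the predict interface on each network object is hypothetical):

```python
# Bayesian prediction: weight each candidate network's answer by its
# posterior probability, instead of committing to a single structure.
def bayes_predict(networks, posteriors, query):
    return sum(w * net.predict(query)
               for net, w in zip(networks, posteriors))
```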