Probability and Statistics Review, Thursday Mar 12
TRANSCRIPT
Probability and Statistics Review
Thursday Mar 12
Learning Model
• What are the important variables?
• What is their range?
• What are the important combinations?
A Quick Review of Probability
• Event, collection of events
• Random variable
• Conditional probability, law of total probability, chain rule, Bayes' rule
• Dependence and independence
• Expectation, variance, and covariance
• Moments
Sample Space and Events
• Ω: sample space, the set of possible results of an experiment
  • If you toss a coin twice, Ω = {HH, HT, TH, TT}
• Event: a subset of Ω
  • First toss is head = {HH, HT}
• S: event space, a set of events
  • Closed under finite union and complement
  • Entails other binary operations: union, difference, etc.
  • Contains the empty event ∅ and Ω
Probability Measure
• Defined over (Ω, S) s.t.
  • P(α) ≥ 0 for all α in S
  • P(Ω) = 1
  • If α, β are disjoint, then P(α ∪ β) = P(α) + P(β)
• We can deduce other axioms from the above ones
  • Ex: for non-disjoint events, P(α ∪ β) = P(α) + P(β) − P(α ∩ β)
Visualization
[Figure: Venn diagram of overlapping events in the sample space]
• We can go on and define conditional probability, using the above visualization
Conditional Probability
• P(F|H) = fraction of worlds in which H is true that also have F true

$$P(F \mid H) = \frac{P(F \cap H)}{P(H)}$$
Rule of total probability
[Figure: an event A overlapping a partition B1, …, B7 of the sample space]

$$P(A) = \sum_i P(B_i)\, P(A \mid B_i)$$
Bayes Rule
• We know that P(smart) = .7
• If we also know that the student's grade is A+, then how does this affect our belief about his intelligence?
• Where does this come from?

$$P(x \mid y) = \frac{P(x)\, P(y \mid x)}{P(y)}$$
Example
Two machines, A and B, operate in a factory. 10% of the factory's output is produced by machine A and 90% by machine B. 1% of the products made by machine A and 5% of the products made by machine B are defective.
(a) A product is chosen at random. What is the probability that it is defective?
(b) A product is found to be defective. What is the probability that it was made by machine A?
(c) After a visit by a technician who services machine B, it is found that 1.9% of the factory's products are defective. What is now the probability that a product made by machine B is defective?

Solution: define the events A = the chosen product was made by machine A, B = the chosen product was made by machine B, C = the chosen product is defective.
(a) By the law of total probability: P(C) = P(C|A)P(A) + P(C|B)P(B) = 0.01·0.1 + 0.05·0.9 = 0.046.
(b) By Bayes' rule: P(A|C) = P(C|A)P(A)/P(C) = 0.01·0.1/0.046 ≈ 0.022.
(c) Now 0.019 = P(C|A)P(A) + P(C|B)P(B) = 0.01·0.1 + P(C|B)·0.9, so P(C|B) = (0.019 − 0.001)/0.9 = 0.02.
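A minimal Python sketch checking these numbers (the probabilities are taken from the example above):

```python
# Factory example: law of total probability and Bayes' rule.
p_A, p_B = 0.1, 0.9            # P(made by A), P(made by B)
p_C_A, p_C_B = 0.01, 0.05      # P(defective | A), P(defective | B)

p_C = p_C_A * p_A + p_C_B * p_B            # (a) total probability -> 0.046
p_A_C = p_C_A * p_A / p_C                  # (b) Bayes' rule       -> ~0.022
p_C_B_new = (0.019 - p_C_A * p_A) / p_B    # (c) new P(C|B)        -> 0.02

print(round(p_C, 3), round(p_A_C, 3), round(p_C_B_new, 3))
```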
Discrete Random Variable
• A random variable is a function from the general event space (the world) to an attribute space
• In effect, it lets us represent a new distribution induced by the random variable
• Modeling students (Grade and Intelligence):
  • Ω = all possible students
  • What are the events?
    • Grade_A = all students with grade A
    • Grade_B = all students with grade B
    • Intelligence_High = … with high intelligence
Random Variables
[Figure: the space of students partitioned by I: Intelligence (high, low) and G: Grade (A+, A, B)]
Random Variables
[Figure: the space of students partitioned by I: Intelligence (high, low) and G: Grade (A+, A, B)]
P(I = high) = P( {all students whose intelligence is high})
Joint Probability
• Joint probability distributions quantify this
  • P(X = x, Y = y) = P(x, y)
• How probable is it to observe these two attributes together?
• How can we manipulate joint probability distributions?

Example: a two-digit number is composed at random from the digits 1, 2, 3, 4 (16 equally likely outcomes). Let X be the number of distinct digits appearing in the number and Y the number of times the digit 1 appears. What is the joint distribution of the pair (X, Y)?

| Y \ X | X = 1 | X = 2 |
|-------|-------|-------|
| Y = 0 | 3/16  | 6/16  |
| Y = 1 | 0     | 6/16  |
| Y = 2 | 1/16  | 0     |
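A short Python sketch that reproduces this table by enumerating the 16 outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All two-digit numbers over the digits 1..4, equally likely.
outcomes = list(product([1, 2, 3, 4], repeat=2))

joint = Counter()
for d1, d2 in outcomes:
    x = len({d1, d2})            # number of distinct digits
    y = [d1, d2].count(1)        # occurrences of the digit 1
    joint[(x, y)] += Fraction(1, len(outcomes))

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
# P(X=1, Y=0) = 3/16   P(X=1, Y=2) = 1/16
# P(X=2, Y=0) = 3/8    P(X=2, Y=1) = 3/8   (6/16 reduced)
```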
Chain Rule
• Always true:
  P(x, y, z) = p(x) p(y|x) p(z|x, y) = p(z) p(y|z) p(x|y, z) = …

To complicate things a bit, let us add the following to the previous question: let Z be the number of times a digit strictly greater than 2 appears.

$$P(x{=}2, y{=}1, z{=}1) = P(x{=}2)\,P(y{=}1 \mid x{=}2)\,P(z{=}1 \mid y{=}1, x{=}2) = 0.75 \cdot \frac{6/16}{0.75} \cdot \frac{4/16}{6/16} = \frac{4}{16}$$
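A self-contained numeric check of this factorization, re-enumerating the outcomes:

```python
from itertools import product

outcomes = list(product([1, 2, 3, 4], repeat=2))   # 16 equally likely
count = lambda pred: sum(1 for o in outcomes if pred(o))

X = lambda o: len(set(o))                # distinct digits
Y = lambda o: list(o).count(1)           # occurrences of the digit 1
Z = lambda o: sum(d > 2 for d in o)      # digits strictly greater than 2

n = len(outcomes)
p_x = count(lambda o: X(o) == 2) / n                                 # 12/16
p_y_x = (count(lambda o: X(o) == 2 and Y(o) == 1)
         / count(lambda o: X(o) == 2))                               # 6/12
p_z_xy = (count(lambda o: X(o) == 2 and Y(o) == 1 and Z(o) == 1)
          / count(lambda o: X(o) == 2 and Y(o) == 1))                # 4/6

print(p_x * p_y_x * p_z_xy)                                          # 0.25
print(count(lambda o: X(o) == 2 and Y(o) == 1 and Z(o) == 1) / n)    # 0.25
```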
Conditional Probability
$$P(X{=}x \mid Y{=}y) = \frac{P(X{=}x,\, Y{=}y)}{P(Y{=}y)}$$

But we will always write it this way:

$$P(x \mid y) = \frac{p(x, y)}{p(y)}$$

where x, y are events.
Marginal Probability
• We know P(X, Y); what is P(X = x)?
• We can use the law of total probability. Why?

$$p(x) = \sum_y P(x, y) = \sum_y P(y)\, P(x \mid y)$$

[Figure: an event A overlapping a partition B1, …, B7 of the sample space]
Marginalization Cont.
• Another example
$$p(x) = \sum_{y,z} P(x, y, z) = \sum_{y,z} P(y, z)\, P(x \mid y, z)$$
Bayes Rule cont.
• You can condition on more variables
$$P(x \mid y, z) = \frac{P(x \mid z)\, P(y \mid x, z)}{P(y \mid z)}$$
Independence
• X is independent of Y means that knowing Y does not change our belief about X
  • P(X | Y = y) = P(X)
  • P(X = x, Y = y) = P(X = x) P(Y = y)
  • Why is this true?
• The above should hold for all x, y
• It is symmetric and written as X ⊥ Y
CI: Conditional Independence
• X ⊥ Y | Z if, once Z is observed, knowing the value of Y does not change our belief about X
• The following should hold for all x, y, z:
  • P(X = x | Z = z, Y = y) = P(X = x | Z = z)
  • P(Y = y | Z = z, X = x) = P(Y = y | Z = z)
  • P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
• We call these factors: a very useful concept!!
Properties of CI
• Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
• Decomposition: (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z)
• Weak union: (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z,W)
• Contraction: (X ⊥ W | Y,Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y,W | Z)
• Intersection: (X ⊥ Y | W,Z) & (X ⊥ W | Y,Z) ⇒ (X ⊥ Y,W | Z)
  • Only for positive distributions!
  • P(α) > 0, ∀α, α ≠ ∅
Monty Hall Problem
You're given the choice of three doors: behind one door is a car; behind the others, goats.
You pick a door, say No. 1. The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.
Do you want to pick door No. 2 instead?
[Figure: decision tree for the host's action. If the car is behind the contestant's door, the host reveals Goat A or Goat B; otherwise the host must reveal the single remaining goat.]
Monty Hall Problem: Bayes Rule
• C_i: the car is behind door i, i = 1, 2, 3
• H_ij: the host opens door j after you pick door i

$$P(C_i) = \frac{1}{3}$$

$$P(H_{ij} \mid C_k) = \begin{cases} 0 & \text{if } j = i \\ 0 & \text{if } j = k \\ 1/2 & \text{if } i = k,\ j \neq i \\ 1 & \text{if } i \neq k,\ j \neq k \end{cases}$$
Monty Hall Problem
WLOG, i=1, j=3
$$P(C_1 \mid H_{13}) = \frac{P(H_{13} \mid C_1)\, P(C_1)}{P(H_{13})}$$

$$P(H_{13} \mid C_1)\, P(C_1) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$$
Monty Hall Problem: Bayes Rule cont.
$$P(H_{13}) = P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3)$$

$$= P(H_{13} \mid C_1)\,P(C_1) + P(H_{13} \mid C_2)\,P(C_2) + P(H_{13} \mid C_3)\,P(C_3) = \frac{1}{6} + \frac{1}{3} + 0 = \frac{1}{2}$$

$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$
Monty Hall Problem: Bayes Rule cont.
$$P(C_1 \mid H_{13}) = \frac{1/6}{1/2} = \frac{1}{3}$$

$$P(C_2 \mid H_{13}) = 1 - P(C_1 \mid H_{13}) = \frac{2}{3} > \frac{1}{3} = P(C_1 \mid H_{13})$$

You should switch!
Moments
Mean (Expectation): $\mu_X = E[X]$
• Discrete RVs: $E[X] = \sum_i v_i\, P(X = v_i)$
• Continuous RVs: $E[X] = \int x\, f(x)\, dx$

Variance: $V[X] = E[(X - \mu_X)^2]$
• Discrete RVs: $V[X] = \sum_i (v_i - \mu_X)^2\, P(X = v_i)$
• Continuous RVs: $V[X] = \int (x - \mu_X)^2\, f(x)\, dx$
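For instance, a minimal sketch computing both moments for a fair six-sided die (a standard example, not one from the slides):

```python
# Mean and variance of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                        # uniform pmf

mean = sum(v * p for v in values)                # E[X] = 3.5
var = sum((v - mean) ** 2 * p for v in values)   # V[X] = 35/12 ~ 2.917
print(mean, var)
```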
Properties of Moments
Mean:
• $E[X + Y] = E[X] + E[Y]$
• $E[aX] = a\,E[X]$
• If X and Y are independent, $E[XY] = E[X]\,E[Y]$

Variance:
• $V[aX + b] = a^2\,V[X]$
• If X and Y are independent, $V[X + Y] = V[X] + V[Y]$
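A quick simulation sketch of the additivity properties for two independent variables (the distributions below are arbitrary choices):

```python
import random

random.seed(1)
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]    # a die roll
ys = [random.choice([0, 10]) for _ in range(n)]  # an independent coin

def mean(zs):
    return sum(zs) / len(zs)

def var(zs):
    m = mean(zs)
    return sum((z - m) ** 2 for z in zs) / len(zs)

sums = [x + y for x, y in zip(xs, ys)]
print(mean(sums), mean(xs) + mean(ys))   # E[X+Y] = E[X] + E[Y]
print(var(sums), var(xs) + var(ys))      # V[X+Y] = V[X] + V[Y] (independence)
```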
The Big Picture
[Figure: Model and Data connected by two arrows; probability takes the model to data, estimation/learning takes data back to the model]
Statistical Inference
• Given observations from a model:
  • What (conditional) independence assumptions hold? Structure learning.
  • If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation. Parameter learning.
MLE
• Maximum likelihood estimation: example on board
  • Given N coin tosses, what is the coin bias θ?
• Sufficient statistics (SS)
  • A useful concept that we will make use of later
  • In solving the above estimation problem, we only cared about N_h and N_t; these are called the SS of this model
  • All coin tosses that have the same SS will result in the same value of θ̂
  • Why is this useful?
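A minimal sketch of the estimate; θ̂ = N_h / (N_h + N_t) is the standard MLE for a coin bias, and the tosses below are simulated (hypothetical) data:

```python
import random

# Simulate N tosses of a coin with true bias 0.7 (made-up data).
random.seed(0)
tosses = ['H' if random.random() < 0.7 else 'T' for _ in range(1000)]

# Sufficient statistics: only the counts matter, not the order of tosses.
n_h = tosses.count('H')
n_t = tosses.count('T')

theta_hat = n_h / (n_h + n_t)    # MLE of the coin bias
print(f"N_h={n_h}, N_t={n_t}, theta_hat={theta_hat:.3f}")
```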
Statistical Inference
• Given observations from a model:
  • What (conditional) independence assumptions hold? Structure learning.
  • If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation. Parameter learning.
• We need some concepts from information theory
Information Theory
• P(X) encodes our uncertainty about X
• Some variables are more uncertain than others

[Figure: two example distributions P(X) and P(Y) with different degrees of uncertainty]

• How can we quantify this intuition?
• Entropy: the average number of bits required to encode X

$$H_P(X) = E_P\left[\log \frac{1}{P(x)}\right] = \sum_x P(x)\, \log \frac{1}{P(x)}$$
Information Theory cont.
• Entropy: average number of bits required to encode X
• We can define conditional entropy similarly
• We can also define chain rule for entropies (not surprising)
$$H_P(X) = E_P\left[\log \frac{1}{P(x)}\right] = \sum_x P(x)\, \log \frac{1}{P(x)}$$

$$H_P(X \mid Y) = E_P\left[\log \frac{1}{p(x \mid y)}\right] = H_P(X, Y) - H_P(Y)$$

$$H_P(X, Y, Z) = H_P(X) + H_P(Y \mid X) + H_P(Z \mid X, Y)$$
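A small sketch computing these quantities for a made-up joint pmf (the numbers are arbitrary):

```python
from math import log2

# A made-up joint pmf over pairs (x, y).
joint = {('a', 0): 0.25, ('a', 1): 0.25, ('b', 0): 0.40, ('b', 1): 0.10}

def entropy(pmf):
    """H(P) = sum_x P(x) log2(1 / P(x)), skipping zero-probability entries."""
    return sum(p * log2(1 / p) for p in pmf.values() if p > 0)

# Marginal over y, to get H(Y).
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

h_xy, h_y = entropy(joint), entropy(p_y)
print(f"H(X,Y)={h_xy:.3f}  H(Y)={h_y:.3f}  H(X|Y)={h_xy - h_y:.3f}")
```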
Mutual Information: MI
• Remember independence? If X ⊥ Y, then knowing Y won't change our belief about X
• Mutual information can help quantify this! (not the only way though)
• MI: $I_P(X; Y) = H_P(X) - H_P(X \mid Y)$
• Symmetric
• I(X; Y) = 0 iff X and Y are independent!
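In the same spirit as the previous sketch, a check that MI vanishes for an independent joint (the marginals are arbitrary):

```python
from math import log2

def entropy(pmf):
    return sum(p * log2(1 / p) for p in pmf.values() if p > 0)

# Build an independent joint: P(x, y) = P(x) P(y), so I(X;Y) should be 0.
p_x = {'a': 0.5, 'b': 0.5}
p_y = {0: 0.65, 1: 0.35}
joint = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}

h_x_given_y = entropy(joint) - entropy(p_y)   # H(X|Y) = H(X,Y) - H(Y)
mi = entropy(p_x) - h_x_given_y               # I(X;Y) = H(X) - H(X|Y)
print(f"I(X;Y) = {mi:.6f}")                   # ~0.000000
```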
Continuous Random Variables
• What if X is continuous?
• We use a probability density function (pdf) f(x) instead of a probability mass function (pmf)
• A pdf is any function that describes the probability density in terms of the input variable x
PDF: Properties of the pdf
• $f(x) \geq 0,\ \forall x$
• $\int_{-\infty}^{\infty} f(x)\, dx = 1$
• Actual probability can be obtained by taking the integral of the pdf
  • E.g., the probability of X being between 0 and 1 is
$$P(0 \leq X \leq 1) = \int_0^1 f(x)\, dx$$
• Can $f(x) > 1$???
Cumulative Distribution Function
$$F_X(v) = P(X \leq v)$$

• Discrete RVs: $F_X(v) = \sum_{v_i \leq v} P(X = v_i)$

• Continuous RVs: $F_X(v) = \int_{-\infty}^{v} f(x)\, dx, \qquad \frac{dF_X(x)}{dx} = f(x)$
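A numeric sketch of these relationships using an exponential density f(x) = λe^{−λx} (a standard example, not taken from the slides):

```python
import math

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)   # pdf on x >= 0
F = lambda x: 1 - math.exp(-lam * x)     # matching cdf

def integrate(a, b, steps=100_000):
    """Midpoint-rule integral of f over [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(steps))

print(f"integral of f ~ {integrate(0, 20):.4f}")     # ~1.0000
print(f"{integrate(0, 1):.4f} ~ {F(1) - F(0):.4f}")  # P(0 <= X <= 1) two ways

x0, h = 0.5, 1e-6                                    # dF/dx ~ f at x0
print(f"{(F(x0 + h) - F(x0 - h)) / (2 * h):.4f} ~ {f(x0):.4f}")
```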
Acknowledgment
• Andrew Moore tutorial: http://www.autonlab.org/tutorials/prob.html
• Monty Hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem
• http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html