Review of Probability Theory - KTI
TRANSCRIPT
Knowledge Discovery and Data Mining 1 (VO) (707.003)
Review of Probability Theory
Denis Helic
KTI, TU Graz
Oct 9, 2014
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 110
Big picture: KDDM
(Diagram: the Knowledge Discovery Process built on a Model, Mathematical Tools (Probability Theory, Linear Algebra, Information Theory, Statistical Inference), and Infrastructure (Hardware & Programming))
Outline
1 Introduction
2 Conditional Probability and Independence
3 Random Variables
4 Discrete Random Variables
5 Continuous Random Variables
Introduction
Random experiments
In random experiments we cannot predict the outcome in advance
We can observe some “regularity” if we repeat the experiment a large number of times
E.g. when tossing a coin we cannot predict the result of a single toss
If we toss many times we get an average of 50% of “heads” (fair coin)
Probability theory is a mathematical theory which describes such phenomena
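This “regularity” is easy to see in a quick simulation; a minimal sketch (the function name is mine, the seed is only for a reproducible demo):

```python
import random

random.seed(42)  # reproducible demo

def heads_fraction(n_tosses):
    """Toss a fair coin n_tosses times and return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# the fraction of heads drifts toward 0.5 as the number of tosses grows
for n in (10, 1000, 100000):
    print(n, heads_fraction(n))
```

For small n the fraction fluctuates wildly; for large n it settles near 0.5, which is exactly the regularity probability theory formalizes.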
Introduction
The state space
It is the set of all possible outcomes of the experiment
We denote the state space by Ω
Coin toss: Ω = {t, h}
Two successive coin tosses: Ω = {tt, th, ht, hh}
Dice roll: Ω = {1, 2, 3, 4, 5, 6}
The lifetime of a light-bulb: Ω = R+
Introduction
The events
An event is a property that either holds or does not hold after the experiment is done
Mathematically, an event is a subset of Ω
We denote the events by capital letters: A,B,C , . . .
E.g. getting at least one head in two successive coin tosses
A = {th, ht, hh}
Introduction
The events
Some basic properties of events
If A and B are two events, then:
The contrary event of A is the complement set Ac
The event “A or B” is the union A ∪ B
The event “A and B” is the intersection A ∩ B
Introduction
The events
Some basic properties of events
If A and B are two events, then:
The sure event is Ω
The impossible event is the empty set ∅
An elementary (atomic) event is a subset of Ω containing a single element, e.g. {ω}
Introduction
The events
We denote by A the family of all events
Very often A = 2^Ω, the set of all subsets of Ω
The family A should be closed under the operations from above
If A,B ∈ A, then we must have: Ac ∈ A, A ∩ B ∈ A, A ∪ B ∈ A
Also: Ω ∈ A and ∅ ∈ A
Introduction
The probability
With each event A we associate a number P(A) called the probability of A
P(A) is between 0 and 1
“Frequency” interpretation
P(A) is the limit of the “frequency” with which A is realized
P(A) = lim_{n→∞} f_n(A), where f_n(A) is the relative frequency of A in n repetitions
Introduction
The probability
Basic properties of the probabilities
(i) 0 ≤ P(A) ≤ 1
(ii) P(Ω) = 1
(iii) P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅
Introduction
The probability
The model is a triple (Ω,A,P)
Ω is the state space
A is the collection of all events
P is the collection of all probabilities P(A) for A ∈ A
P is a mapping from A into [0, 1] which satisfies at least properties (ii) and (iii) (Kolmogorov axioms)
Introduction
Venn diagrams
(Venn diagram: an event A inside the state space Ω)
Introduction
Probability measure
The probability P(A) of the event A is the area of the set in the diagram
The area of Ω is 1
E.g. if the radius of the event A is r = 0.2, then P(A) = πr² ≈ 0.1257
Introduction
Properties of probability measures
Properties
If P is a probability measure on (Ω,A), then:
(i) P(∅) = 0
(ii) For every finite sequence A_1, . . . , A_m of pairwise disjoint (whenever i ≠ j, A_i ∩ A_j = ∅) elements of A we have:

P(A_1 ∪ · · · ∪ A_m) = P(A_1) + · · · + P(A_m)

Property (ii) is called additivity
Introduction
Additivity
(Venn diagram: two disjoint events A and B inside Ω)
Introduction
Probability measure
The probability P(A ∪ B) of the event A ∪ B is the sum of the areas of the sets A and B in the diagram
P(A) = 0.1257
P(B) = 0.1257
P(A ∪ B) = 0.2514
Introduction
Properties of probability measures
Properties
If P is a probability measure on (Ω,A), then:
(i) For A,B ∈ A, A ⊂ B =⇒ P(A) ≤ P(B)
(ii) For A,B ∈ A, P(A ∪ B) = P(A) + P(B)− P(A ∩ B)
(iii) For A ∈ A, P(A) = 1− P(Ac)
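These properties can be checked by exhaustive enumeration over a small finite state space; a minimal sketch with a fair die (the events A = “even” and B = “at least 4” are hypothetical examples of mine):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die: all outcomes equally likely
A = {2, 4, 6}                # example event: "even"
B = {4, 5, 6}                # example event: "at least 4"

def P(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

# inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A.union(B)) == P(A) + P(B) - P(A & B)
# complement rule: P(A) = 1 - P(A^c)
assert P(A) == 1 - P(omega - A)
print(P(A.union(B)))  # 2/3
```

Using `Fraction` keeps every probability exact, so the identities hold with `==` rather than up to floating-point error.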
Introduction
Subsets
(Venn diagram: an event B contained in A, inside Ω)
Introduction
Union
(Venn diagram: events A and B with overlap A ∩ B, inside Ω)
Introduction
Complement
(Venn diagram: event A and its complement Ac in Ω)
Conditional Probability and Independence
Conditional probability
We always have the triple (Ω,A,P)
Typically we suppress (Ω,A) and talk only about P
Nevertheless, they are always present!
Conditional probability and independence are crucial for applications of probability theory in data mining!
Conditional Probability and Independence
Conditional probability
Definition
If P(B) > 0 then we define the conditional probability of A given B:
P(A|B) = P(A ∩ B) / P(B)
Conditional Probability and Independence
Conditional probability
(Venn diagram with unit-square Ω: events A and B with overlap A ∩ B)
Conditional Probability and Independence
Conditional probability
P(B) = 0.16
P(A) = 0.12
P(A ∩ B) = 0.04
P(A|B) = P(A ∩ B)/P(B) = 0.25
Conditional Probability and Independence
Conditional probability
One intuitive explanation is that B occurred first and then we ask what is the probability that now A occurs as well
Time dimension
Another intuitive explanation is that our knowledge about the world increased
We have more information and know that B already occurred
Technically, B restricts the state space (makes it smaller)
Conditional Probability and Independence
Conditional probability
(Venn diagram with unit-square Ω: the restricted state space Ω′ = B, containing A ∩ B)
Conditional Probability and Independence
Conditional probability: example
We throw two dice
Event A = snake eyes
Event B = doubles
Ω = {(1, 1), (1, 2), . . . , (6, 5), (6, 6)}
A = {(1, 1)}, B = {(1, 1), (2, 2), . . . , (6, 6)}
Conditional Probability and Independence
Conditional probability: example
P(A) = 1/36

P(B) = ∑_{i=1}^{6} 1/36 = 1/6, by finite additivity and because the elementary events are pairwise disjoint

A ∩ B = A

P(A|B) = P(A ∩ B)/P(B) = (1/36)/(1/6) = 1/6

A = {(1, 1)}, B = {(1, 1), (2, 2), . . . , (6, 6)}
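The value 1/6 can be confirmed by enumerating the 36 equally likely outcomes directly; a minimal sketch (variable names are mine):

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # all 36 equally likely rolls
A = {(1, 1)}                                 # snake eyes
B = {(i, i) for i in range(1, 7)}            # doubles

def P(event):
    """Probability under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

# conditional probability by definition: P(A|B) = P(A ∩ B) / P(B)
P_A_given_B = P(A & B) / P(B)
print(P_A_given_B)  # 1/6
```

Note that P(A|B) = 1/6 > P(A) = 1/36: knowing that a double occurred makes snake eyes six times more likely.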
Conditional Probability and Independence
Conditional probability: example
We have two boxes: red and blue
Each box contains apples and oranges
We first pick a box at random
Then we pick a fruit from that box again at random
We are interested in the conditional probabilities of picking a specificfruit if a specific box was selected
Conditional Probability and Independence
Conditional probability: example
Figure: From the book “Pattern Recognition and Machine Learning” by Bishop
Conditional Probability and Independence
Conditional probability: example
Here A = apple, O = orange, B = blue box, R = red box
P(A|B) = 3/4
P(O|B) = 1/4
P(A|R) = 1/4
P(O|R) = 3/4
Conditional Probability and Independence
Independence
Definition
Events A and B are independent if:
P(A ∩ B) = P(A)P(B)
Conditional Probability and Independence
Independence
Events A and B are not related to each other
We flip a coin once: event A
The second flip is the event B
The outcome of the second flip is not dependent on the outcome ofthe first flip
Intuitively, A and B are independent
Conditional Probability and Independence
Independence
(Venn diagram with unit-square Ω: independent events A and B with overlap A ∩ B)
Conditional Probability and Independence
Independence
P(A) = 0.5
P(B) = 0.5
P(A ∩ B) = 0.25
P(A)P(B) = 0.25
Conditional Probability and Independence
Conditional independence
Definition
Suppose P(C) > 0. Events A and B are conditionally independent given C if:
P(A ∩ B|C ) = P(A|C )P(B|C )
Conditional Probability and Independence
Conditional independence
(Venn diagram with unit-square Ω: events A, B and C)
Conditional Probability and Independence
Conditional independence
P(A) = 1/8, P(B) = 1/8, P(C) = 1/4
P(A|C) = 1/2, P(B|C) = 1/2
P(A ∩ B|C) = 1/4, P(A|C)P(B|C) = 1/4
P(A ∩ B) = 1/16, P(A)P(B) = 1/64
Conditional Probability and Independence
Conditional independence
Remark
(i) Independence does not imply conditional independence
(ii) Conditional independence does not imply independence
Remark
Suppose P(B) > 0. Events A and B are independent if and only if P(A|B) = P(A).
Conditional Probability and Independence
Independence: Example
Pick a card at random from a deck of 52 cards
A = the card is a heart, B = the card is a Queen
P(ω) = 1/52 for every card ω
By additivity, P(A) = 13/52, P(B) = 4/52
P(A ∩ B) = 1/52 (Queen of hearts), P(A)P(B) = (13/52)(4/52) = 1/52
A and B are independent
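Enumerating the full deck confirms the factorization; a minimal sketch (the suit/rank encodings are mine):

```python
from fractions import Fraction
from itertools import product

suits = ["hearts", "spades", "diamonds", "clubs"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = set(product(suits, ranks))           # 52 equally likely cards

A = {c for c in deck if c[0] == "hearts"}   # the card is a heart
B = {c for c in deck if c[1] == "Q"}        # the card is a Queen

def P(event):
    """Probability under the uniform measure on the deck."""
    return Fraction(len(event), len(deck))

# independence: P(A ∩ B) = P(A)P(B), i.e. 1/52 = (13/52)(4/52)
assert P(A & B) == P(A) * P(B)
print(P(A & B))  # 1/52
```

Intuitively: the suit of a uniformly drawn card carries no information about its rank, which is exactly what the factorization expresses.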
Conditional Probability and Independence
Bayes rule
Remark
Suppose P(A) > 0 and P(B) > 0. Then,
P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
Theorem
Suppose P(A) > 0 and P(B) > 0. Then,
P(B|A) = P(A|B)P(B) / P(A)
Conditional Probability and Independence
Bayes rule
One of the most important concepts in statistical inference
Bayesian statistics
You start with a probabilistic model with parameters B
You observe data A and you are interested in the probability of the parameters given the data
Conditional Probability and Independence
Chain & partition rule
Theorem
If A1,A2, . . . ,An are events and P(A1 ∩ · · · ∩ An−1) > 0, then
P(A1∩· · ·∩An) = P(A1)P(A2|A1)P(A3|A1∩A2) . . .P(An|A1∩· · ·∩An−1)
Definition
A partition of Ω is a finite or countable collection (B_n) with B_n ∈ A such that:
(i) P(Bn) > 0, ∀n
(ii) B_i ∩ B_j = ∅, ∀i ≠ j (pairwise disjoint)
(iii) ∪_i B_i = Ω
Conditional Probability and Independence
Partition rule
(Diagram: the state space Ω divided into the disjoint blocks of a partition)
Conditional Probability and Independence
Partition rule
Theorem
Let (B_n), n ≥ 1, be a finite or countable partition of Ω. Then, if A ∈ A:

P(A) = ∑_n P(A|B_n)P(B_n)

P(A) = ∑_n P(A ∩ B_n)
Conditional Probability and Independence
Partition rule
(Diagram: an event A cut across the blocks of a partition of Ω)
Conditional Probability and Independence
Bayes rule revisited
Theorem
Let (B_n), n ≥ 1, be a finite or countable partition of Ω and suppose P(A) > 0. Then

P(B_n|A) = P(A|B_n)P(B_n) / ∑_m P(A|B_m)P(B_m)
Conditional Probability and Independence
Bayes rule: example
Medical tests
Donated blood is screened for AIDS. Suppose that if the blood is HIV positive the test will be positive in 99% of cases. The test also has a 5% false positive rate. In this age group one in ten thousand people is HIV positive. Suppose that a person is screened as positive. What is the probability that this person has AIDS?
Conditional Probability and Independence
Bayes rule: example
P(A) = 0.0001
P(Ac) = 0.9999
P(P|A) = 0.99
P(N|A) = 0.01
P(P|Ac) = 0.05
P(N|Ac) = 0.95
P(A|P) = ?
Conditional Probability and Independence
Bayes rule: example
P(A|P) = P(P|A)P(A) / P(P)
       = P(P|A)P(A) / (P(P|A)P(A) + P(P|Ac)P(Ac))
       = 0.00198
Conditional Probability and Independence
Bayes rule: example
The disease is so rare that the false positives outnumber the people who have the disease
E.g. what can we expect in a population of 1 million?
100 will have the disease and 99 will be correctly diagnosed
999,900 will not have the disease but 49,995(!) will be falsely diagnosed
If your test is positive the likelihood that you have the disease is 99/(99 + 49995) = 0.00198
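The posterior follows directly from Bayes' rule with the law of total probability in the denominator; a minimal sketch (variable names are mine):

```python
p_A = 0.0001            # prior: fraction of HIV-positive people
p_P_given_A = 0.99      # test sensitivity
p_P_given_Ac = 0.05     # false positive rate

# law of total probability: P(P) = P(P|A)P(A) + P(P|Ac)P(Ac)
p_P = p_P_given_A * p_A + p_P_given_Ac * (1 - p_A)

# Bayes' rule: P(A|P) = P(P|A)P(A) / P(P)
p_A_given_P = p_P_given_A * p_A / p_P
print(round(p_A_given_P, 5))  # 0.00198
```

The denominator is dominated by the false-positive term 0.05 × 0.9999, which is why the posterior stays tiny despite the 99% sensitivity.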
Conditional Probability and Independence
Bayes rule: example
Figure: From the book “Pattern Recognition and Machine Learning” by Bishop
Conditional Probability and Independence
Bayes rule: example
P(R) = 2/5
P(B) = 3/5
P(A|B) = 3/4
P(O|B) = 1/4
P(A|R) = 1/4
P(O|R) = 3/4
Conditional Probability and Independence
Bayes rule: example
Fruits
We select an orange. What is the probability the box was red? P(R|O) = ?

P(R|O) = P(O|R)P(R) / P(O)
       = P(O|R)P(R) / (P(O|R)P(R) + P(O|B)P(B))
       = 2/3
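The same computation with exact fractions; a minimal sketch (variable names are mine):

```python
from fractions import Fraction as F

p_R, p_B = F(2, 5), F(3, 5)                  # box priors
p_O_given_R, p_O_given_B = F(3, 4), F(1, 4)  # orange given each box

# partition rule: P(O) = P(O|R)P(R) + P(O|B)P(B)
p_O = p_O_given_R * p_R + p_O_given_B * p_B

# Bayes' rule: P(R|O) = P(O|R)P(R) / P(O)
p_R_given_O = p_O_given_R * p_R / p_O
print(p_R_given_O)  # 2/3
```

Even though the red box is the less likely pick a priori (2/5), observing an orange raises its posterior to 2/3, because oranges are three times as common in the red box.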
Random Variables
Random variables
We use random variables to refer to “random quantities”
E.g. we flip a coin 5 times and are interested in the number of heads
E.g. the lifetime of a light bulb
E.g. the number of occurrences of a word in a text document
Random Variables
Random variables
Definition
Given a probability measure space (Ω,A,P) a random variable is a function X : Ω → R such that {ω ∈ Ω : X(ω) ≤ x} ∈ A, ∀x ∈ R
Remark
The condition is a technicality ensuring that the set is measurable. We just need to know that a random variable is a function that maps outcomes onto numbers.
Random Variables
Random variables: example
We transmit 10 data packets over a communication channel
Events are of the form (S , S , S ,F ,F , S ,S ,S , S , S)
State space Ω contains all possible 2^10 sequences
What is the probability that we will observe n successful transmissions?
We associate a r.v. with the number of successful transmissions
The r.v. takes on the values 0, 1, . . . , 10
Discrete Random Variables
Discrete random variables
Definition
A r.v. X is discrete if X (Ω) is countable (finite or countably infinite).
Remark
E.g. X(Ω) = {x_1, x_2, . . .}
If Ω is countable =⇒ X(Ω) is countable and X is discrete
These r.v. are called discrete random variables
Discrete Random Variables
Discrete random variables
A discrete r.v. is characterized by its probability mass function (PMF):

pX(x) = P(X = x)

pX(x) = ∑_{ω:X(ω)=x} P(ω)
We will shorten the notation and write p(x)
Discrete Random Variables
Discrete random variables: example
Die rolls
Let X be the sum of two die rolls.
Ω = {(1, 1), (1, 2), . . . , (6, 5), (6, 6)}
X(Ω) = {2, 3, . . . , 12}
E.g. X((1, 2)) = 3, X((2, 1)) = 3, X((3, 5)) = 8, . . .
E.g. p(3) = ?

p(3) = ∑_{ω:X(ω)=3} P(ω) = P((1, 2)) + P((2, 1)) = 2/36
Discrete Random Variables
Discrete random variables: example
E.g. p(4) = ?

p(4) = ∑_{ω:X(ω)=4} P(ω) = P((1, 3)) + P((3, 1)) + P((2, 2)) = 3/36

p(x) = (x − 1)/36, 2 ≤ x ≤ 7
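The whole PMF of the sum can be tabulated by enumerating Ω; a minimal sketch (variable names are mine):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# count how many of the 36 outcomes map to each sum X(ω) = a + b
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

assert pmf[3] == Fraction(2, 36) and pmf[4] == Fraction(3, 36)
assert sum(pmf.values()) == 1   # a PMF sums to 1
for x, p in pmf.items():
    print(x, p)
```

The table rises linearly from p(2) = 1/36 to p(7) = 6/36 and falls symmetrically back down, matching the formula p(x) = (x − 1)/36 on 2 ≤ x ≤ 7.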
Discrete Random Variables
Joint probability mass function
We can introduce multiple r.v. on the same probability measure space (Ω,A,P)
Let X and Y be r.v. on that space, then the probability that X and Y take on values x and y is given by:

P({ω ∈ Ω | X(ω) = x, Y(ω) = y})
Shortly, we write:
P(X = x ,Y = y)
Discrete Random Variables
Joint probability mass function
We define the joint PMF as:
pXY (x , y) = P(X = x ,Y = y)
Shortly, we write:
p(x , y)
Discrete Random Variables
Joint PMF: example
Text classification
Suppose we have a collection of documents that are either about China or Japan (document topics). We model a word occurrence as an event ω in a probability space. Let Ω = {all word occurrences}. Let X be a r.v. that maps those occurrences to an enumeration of words and let Y be a r.v. that maps an occurrence to an enumeration of topics (either China or Japan). What is the joint PMF p(x, y)?
| Document | Class |
| --- | --- |
| Chinese Beijing Chinese | China |
| Chinese Chinese Shanghai | China |
| Chinese Macao | China |
| Tokyo Japan Chinese | Japan |
Discrete Random Variables
Joint PMF: example
| Word | Class |
| --- | --- |
| Chinese | China |
| Beijing | China |
| Chinese | China |
| Chinese | China |
| Chinese | China |
| Shanghai | China |
| Chinese | China |
| Macao | China |
| Tokyo | Japan |
| Japan | Japan |
| Chinese | Japan |
Discrete Random Variables
Joint PMF: example
| Y \ X | Chinese | Beijing | Shanghai | Macao | Tokyo | Japan |
| --- | --- | --- | --- | --- | --- | --- |
| China | 5 | 1 | 1 | 1 | 0 | 0 |
| Japan | 1 | 0 | 0 | 0 | 1 | 1 |
Discrete Random Variables
Joint PMF: example
| Y \ X | Chinese | Beijing | Shanghai | Macao | Tokyo | Japan |
| --- | --- | --- | --- | --- | --- | --- |
| China | 5/11 | 1/11 | 1/11 | 1/11 | 0 | 0 |
| Japan | 1/11 | 0 | 0 | 0 | 1/11 | 1/11 |
Discrete Random Variables
Marginal PMF
p(x) and p(y) are called marginal probability mass functions
Remark
p(x) = ∑_y p(x, y)

p(y) = ∑_x p(x, y)
Discrete Random Variables
Marginal PMF: example
| Y \ X | Chinese | Beijing | Shanghai | Macao | Tokyo | Japan | p(y) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| China | 5/11 | 1/11 | 1/11 | 1/11 | 0 | 0 | 8/11 |
| Japan | 1/11 | 0 | 0 | 0 | 1/11 | 1/11 | 3/11 |
| p(x) | 6/11 | 1/11 | 1/11 | 1/11 | 1/11 | 1/11 | |
Discrete Random Variables
Conditional PMF
Definition
A conditional probability mass function is defined as:
p(x|y) = p(x, y) / p(y)

Again, we can easily establish a connection to the underlying probability space and events
Discrete Random Variables
Conditional PMF: example
| x | Chinese | Beijing | Shanghai | Macao | Tokyo | Japan |
|---|---------|---------|----------|-------|-------|-------|
| p(x \| China) | 5/8 | 1/8 | 1/8 | 1/8 | 0 | 0 |
| p(x \| Japan) | 1/3 | 0 | 0 | 0 | 1/3 | 1/3 |
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 72 / 110
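The conditional rows above are the joint table divided by the marginal p(y). A short sketch of that computation (illustrative, not part of the slides):

```python
import numpy as np

joint = np.array([[5, 1, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 1]]) / 11.0

p_y = joint.sum(axis=1, keepdims=True)  # p(y) as a column vector
cond = joint / p_y                      # p(x|y) = p(x, y) / p(y), row by row

print(cond[0])  # p(x|China) = [5/8, 1/8, 1/8, 1/8, 0, 0]
print(cond[1])  # p(x|Japan) = [1/3, 0, 0, 0, 1/3, 1/3]
```

Each row of `cond` is itself a PMF, so it sums to 1.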
![Page 79: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/79.jpg)
Discrete Random Variables
Independence
Definition
Two r.v. X and Y are independent if:
p(x, y) = p(x)p(y), ∀x, y ∈ R

All rules are equivalent to the rules for events; we just work with PMFs instead.
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 73 / 110
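For the document example above, the definition can be checked directly: compare the joint table with the outer product of the marginals (a minimal sketch, not from the slides):

```python
import numpy as np

joint = np.array([[5, 1, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 1]]) / 11.0

p_x = joint.sum(axis=0)
p_y = joint.sum(axis=1)
product = np.outer(p_y, p_x)  # what the joint would be if X and Y were independent

independent = np.allclose(joint, product)
print(independent)  # False: knowing Y changes the distribution of X
```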
![Page 80: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/80.jpg)
Discrete Random Variables
Joint PMF
In general we can have many r.v. defined on the same probability measure space Ω
X1, . . . ,Xn
We define the joint PMF as:
p(x1, . . . , xn) = P(X1 = x1, . . . ,Xn = xn)
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 74 / 110
![Page 81: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/81.jpg)
Discrete Random Variables
Common discrete random variables
Certain random variables commonly appear in nature and applications
Bernoulli random variable
Binomial random variable
Geometric random variable
Poisson random variable
Power-law random variable
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 75 / 110
![Page 82: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/82.jpg)
Discrete Random Variables
Common discrete random variables
IPython Notebook examples
http://kti.tugraz.at/staff/denis/courses/kddm1/pmf.ipynb

Command Line

ipython notebook --pylab=inline pmf.ipynb
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 76 / 110
![Page 83: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/83.jpg)
Discrete Random Variables
Bernoulli random variable
PMF
p(x) =
1 − p if x = 0
p if x = 1
Bernoulli r.v. with parameter p
Models situations with two outcomes
E.g. we start a task on a cluster node. Does the node fail (X = 0) or successfully finish the task (X = 1)?
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 77 / 110
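A quick sketch of the Bernoulli PMF with scipy.stats (assuming scipy is available, as in the course notebooks; the parameter values are just for illustration):

```python
from scipy.stats import bernoulli

p = 0.3             # probability of success (node finishes its task)
rv = bernoulli(p)

print(rv.pmf(0))    # 0.7 -> P(X = 0), the node fails
print(rv.pmf(1))    # 0.3 -> P(X = 1), the node succeeds
```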
![Page 84: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/84.jpg)
Discrete Random Variables
Bernoulli random variable
[Figure: probability mass function of a Bernoulli random variable for p = 0.3 and p = 0.6]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 78 / 110
![Page 85: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/85.jpg)
Discrete Random Variables
Binomial random variable
Suppose X1, . . . ,Xn are independent and identically distributed Bernoulli r.v.
The Binomial r.v. with parameters (p, n) is
Y = X1 + · · ·+ Xn
Models the number of successes in n Bernoulli trials
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 79 / 110
![Page 86: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/86.jpg)
Discrete Random Variables
Binomial random variable
Cluster nodes
We start tasks on n cluster nodes. How many nodes successfully finish their task?

Probability of a single cluster configuration with k successes:

p(ω) = (1 − p)^(n−k) p^k
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 80 / 110
![Page 87: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/87.jpg)
Discrete Random Variables
Binomial random variable
How many successful configurations exist?
p(k) = N(k)(1 − p)^(n−k) p^k

N(k) = (n choose k) = n! / ((n − k)! k!)
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 81 / 110
![Page 88: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/88.jpg)
Discrete Random Variables
Binomial random variable
PMF
p(k) = (n choose k) (1 − p)^(n−k) p^k
E.g. how many heads we get in n coin flips
E.g. how many packets we transmit over n communication channels
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 82 / 110
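The PMF above can be evaluated either directly from the formula or via scipy.stats; both should agree. A minimal cross-check (parameter values are illustrative):

```python
from math import comb
from scipy.stats import binom

n, p = 20, 0.6
k = 12

# PMF directly from the formula on the slide: (n choose k)(1-p)^(n-k) p^k
manual = comb(n, k) * (1 - p) ** (n - k) * p ** k
print(manual)
print(binom.pmf(k, n, p))  # scipy's implementation of the same PMF
```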
![Page 89: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/89.jpg)
Discrete Random Variables
Binomial random variable
[Figure: probability mass function of a Binomial random variable for p = 0.1 and p = 0.6, k = 0, . . . , 19]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 83 / 110
![Page 90: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/90.jpg)
Discrete Random Variables
Power-law (Zipf) random variable
Power-law distribution is a very commonly occurring distribution
Word occurrences in natural language
Friendships in a social network
Links on the web
PageRank, etc.
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 84 / 110
![Page 91: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/91.jpg)
Discrete Random Variables
Power-law (Zipf) random variable
PMF
p(k) = k^(−α) / ζ(α), k ∈ N, k ≥ 1, α > 1

ζ(α) is the Riemann zeta function

ζ(α) = ∑_{k=1}^{∞} k^(−α)
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 85 / 110
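A short sketch of the Zipf PMF, computed once from the formula with scipy's zeta function and once with scipy.stats.zipf (assuming scipy is available; the values of α and k are illustrative):

```python
from scipy.special import zeta
from scipy.stats import zipf

alpha = 2.0
k = 3

# PMF from the slide: p(k) = k^(-alpha) / zeta(alpha)
manual = k ** (-alpha) / zeta(alpha)
print(manual)
print(zipf.pmf(k, alpha))  # scipy's implementation agrees
```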
![Page 92: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/92.jpg)
Discrete Random Variables
Power-law (Zipf) random variable
[Figure: probability mass function of a Zipf random variable for α = 2.0 and α = 3.0]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 86 / 110
![Page 93: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/93.jpg)
Discrete Random Variables
Power-law (Zipf) random variable
[Figure: probability mass function of a Zipf random variable for α = 2.0 and α = 3.0, on a logarithmic y-axis]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 87 / 110
![Page 94: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/94.jpg)
Discrete Random Variables
Expectation
Definition
The expectation of a discrete r.v. X with PMF p is
E[X] = ∑_{x∈X(Ω)} x p(x)
when this sum is “well-defined”, otherwise the expectation does not exist.
Remark
(i) “Well-defined”: it could be infinite, but it should not alternate between −∞ and ∞
(ii) Expectation is the average value of a r.v.
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 88 / 110
![Page 95: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/95.jpg)
Discrete Random Variables
Expectation: example
Gambling game
We repeatedly play a gambling game. Each time we play we either win 10€ or lose 10€. What are our average winnings?

Let wk be our winnings for game k. Then the average winnings in n games are:

W = (w1 + · · · + wn) / n
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 89 / 110
![Page 96: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/96.jpg)
Discrete Random Variables
Expectation: example
Let nW be the number of wins and nL the number of losses. Then,

W = (10 nW − 10 nL) / n = 10 (nW / n) − 10 (nL / n)

If we approximate P(win) ≈ nW / n and P(loss) ≈ nL / n, then

W = 10 P(win) − 10 P(loss) = ∑_{x∈X(Ω)} x p(x)
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 90 / 110
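The expectation of the gambling game is just the probability-weighted sum from the definition. A minimal sketch, including a hypothetical unfair variant with P(win) = 0.4:

```python
# Fair game: win +10 or lose -10, each with probability 1/2.
values = [10, -10]
probs = [0.5, 0.5]

# E[X] = sum over x of x * p(x)
expectation = sum(x * p for x, p in zip(values, probs))
print(expectation)  # 0.0 -> on average we neither win nor lose

# A hypothetical unfair variant with P(win) = 0.4:
unfair = 10 * 0.4 + (-10) * 0.6
print(unfair)  # -2.0 -> we lose 2 per game on average
```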
![Page 97: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/97.jpg)
Discrete Random Variables
Linearity of expectation
Theorem
Suppose X and Y are discrete r.v. such that E[X] < ∞ and E[Y] < ∞. Then,

E[aX] = a E[X], ∀a ∈ R
E[X + Y] = E[X] + E[Y]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 91 / 110
![Page 98: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/98.jpg)
Discrete Random Variables
Variance
Definition
The variance σ²(X), also written var(X), of a discrete r.v. X is the expectation of the r.v. (X − E[X])²

var(X) = E[(X − E[X])²]
Remark
Variance indicates how close X typically is to E [X ]
var(X) = E[X²] − (E[X])²
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 92 / 110
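The shortcut var(X) = E[X²] − (E[X])² can be sketched for a Bernoulli r.v., where the known answer is p(1 − p) (a minimal example, not from the slides):

```python
def expectation(values, probs):
    # E[X] = sum over x of x * p(x)
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    # var(X) = E[X^2] - (E[X])^2
    ex = expectation(values, probs)
    ex2 = expectation([x * x for x in values], probs)
    return ex2 - ex * ex

# Bernoulli(p): variance should be p * (1 - p)
p = 0.3
print(variance([0, 1], [1 - p, p]))  # 0.21 (up to floating-point rounding)
```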
![Page 99: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/99.jpg)
Discrete Random Variables
Covariance
Definition
The covariance cov(X, Y) of two discrete r.v. X and Y is the expectation of the r.v. (X − E[X])(Y − E[Y])

cov(X, Y) = E[(X − E[X])(Y − E[Y])]

cov(X, Y) = E[XY] − E[X]E[Y]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 93 / 110
![Page 100: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/100.jpg)
Discrete Random Variables
Covariance
Remark
Covariance measures how much two r.v. change together, i.e. do X and Y tend to be small together, or is X large when Y is small (or vice versa), or do they change independently of each other

If greater values of X correspond with greater values of Y, and the same holds for small values, then cov(X, Y) > 0
In the opposite case cov(X ,Y ) < 0
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 94 / 110
![Page 101: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/101.jpg)
Discrete Random Variables
Covariance
If X and Y are independent then cov(X ,Y ) = 0
This follows because E [XY ] = E [X ]E [Y ] in the case of independence
Is the opposite true?
If cov(X ,Y ) = 0 are X and Y independent?
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 95 / 110
![Page 102: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/102.jpg)
Discrete Random Variables
Covariance
Covariance and independence
Suppose X takes on values −2, −1, 1, 2 with equal probability. Suppose Y = X².

cov(X, Y) = E[XY] − E[X]E[Y] = E[X³] − E[X]E[X²] = 0

Clearly X and Y are not independent

They are uncorrelated (no linear dependence), but they are not independent in general
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 96 / 110
![Page 103: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/103.jpg)
Discrete Random Variables
Covariance
Remark
(i) Independence of X and Y ⟹ cov(X, Y) = 0

(ii) cov(X, Y) = 0 does NOT imply independence of X and Y
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 97 / 110
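The counterexample from the previous slide can be worked through numerically: X uniform on {−2, −1, 1, 2} and Y = X² have zero covariance although Y is a deterministic function of X. A minimal sketch:

```python
# X takes values -2, -1, 1, 2 with equal probability; Y = X^2.
xs = [-2, -1, 1, 2]
probs = [0.25] * 4

e_x = sum(x * p for x, p in zip(xs, probs))          # E[X]  = 0
e_y = sum(x * x * p for x, p in zip(xs, probs))      # E[Y]  = E[X^2]
e_xy = sum(x ** 3 * p for x, p in zip(xs, probs))    # E[XY] = E[X^3] = 0

cov = e_xy - e_x * e_y
print(cov)  # 0.0, yet Y is completely determined by X
```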
![Page 104: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/104.jpg)
Continuous Random Variables
Continuous random variables
Definition
A r.v. X is continuous (general) if X (Ω) is uncountable.
A general description is given by P(X ∈ (−∞, x ])
We define the cumulative distribution function (CDF) of X as:
FX (x) = P(X ∈ (−∞, x ])
We write F(x) for short, and P(X ≤ x) for P(X ∈ (−∞, x])
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 98 / 110
![Page 105: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/105.jpg)
Continuous Random Variables
Cumulative distribution function (CDF)
Definition
A cumulative distribution function (CDF) is a function F : R → R such that

(i) F is non-decreasing (x ≤ y ⟹ F(x) ≤ F(y))

(ii) F is right-continuous (lim_{x↓a} F(x) = F(a))

(iii) lim_{x→∞} F(x) = 1

(iv) lim_{x→−∞} F(x) = 0
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 99 / 110
![Page 106: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/106.jpg)
Continuous Random Variables
CDF: non-decreasing
[Figure: a CDF is non-decreasing, rising from 0 to 1]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 100 / 110
![Page 107: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/107.jpg)
Continuous Random Variables
CDF: right-continuous
[Figure: a CDF is right-continuous]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 101 / 110
![Page 108: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/108.jpg)
Continuous Random Variables
CDF: infinity limits
[Figure: a CDF tends to 0 as x → −∞ and to 1 as x → ∞]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 102 / 110
![Page 109: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/109.jpg)
Continuous Random Variables
Probability density function (PDF)
Suppose that the CDF is continuous and differentiable
Definition
A probability density function (PDF) of a r.v. X is defined as:
f(x) = dF(x)/dx
Definition
A joint PDF of two r.v. X and Y is defined as:
f(x, y) = ∂²F(x, y) / ∂x∂y
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 103 / 110
![Page 110: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/110.jpg)
Continuous Random Variables
Probability density function (PDF)
Definition
Suppose X and Y are two r.v. defined on the same probability measure space. The conditional PDF of X given Y is defined as:

f(x|y) = f(x, y) / f(y)
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 104 / 110
![Page 111: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/111.jpg)
Continuous Random Variables
Expectation
Definition
Expectation E [X ] of a r.v. X with a PDF f (x) is defined as:
E[X] = ∫_{−∞}^{∞} x f(x) dx
In a similar way we define variance and covariance for a joint PDF
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 105 / 110
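The integral definition of E[X] can be sketched numerically, e.g. for a normal PDF, where the expectation should come out as µ (a minimal example using scipy's numerical integration; the parameter values are illustrative):

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.5, 2.0

def f(x):
    # normal PDF with mean mu and variance sigma^2
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# E[X] = integral over R of x * f(x) dx
e_x, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
print(e_x)  # ~1.5, i.e. the mean mu
```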
![Page 112: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/112.jpg)
Continuous Random Variables
Common continuous random variables
Certain random variables commonly appear in nature and applications
Exponential random variable
Normal (Gaussian) random variable
Power-law random variable
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 106 / 110
![Page 113: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/113.jpg)
Continuous Random Variables
Common continuous random variables
IPython Notebook examples
http://kti.tugraz.at/staff/denis/courses/kddm1/pdf.ipynb

Command Line

ipython notebook --pylab=inline pdf.ipynb
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 107 / 110
![Page 114: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/114.jpg)
Continuous Random Variables
Normal (Gaussian) random variable
Normal distribution is a very commonly occurring distribution
Continuous approximation to the binomial for large n and p not too close to 0 or 1

Continuous approximation to the Poisson distribution when λ is large
Measurement errors
Student grades
Measures of sizes of living organisms
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 108 / 110
![Page 115: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/115.jpg)
Continuous Random Variables
Normal random variable
f(x) = (1 / √(2πσ²)) e^(−(x−µ)² / (2σ²))

µ is the mean (expectation) and σ² is the variance of a normally distributed r.v.
CDF
F(x) = Φ((x − µ)/σ), where Φ(x) = (1 / √(2π)) ∫_{−∞}^{x} e^(−x′²/2) dx′
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 109 / 110
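The standardization F(x) = Φ((x − µ)/σ) means every normal CDF reduces to the standard normal CDF Φ. A short sketch with scipy.stats (parameter values are illustrative):

```python
from scipy.stats import norm

mu, sigma = -2.0, 2.0
x = 0.0

# Evaluate the CDF directly, and via the standard normal CDF Phi
direct = norm.cdf(x, loc=mu, scale=sigma)
via_phi = norm.cdf((x - mu) / sigma)  # Phi((x - mu) / sigma)

print(direct)
print(via_phi)  # the same value: standardizing does not change F(x)
```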
![Page 116: Review of Probability Theory - KTI](https://reader031.vdocuments.us/reader031/viewer/2022032816/6206138d8c2f7b1730046229/html5/thumbnails/116.jpg)
Continuous Random Variables
Normal random variable
[Figure: PDF of a Normal random variable for µ = 0.0, σ = 1.0 and µ = −2.0, σ = 2.0]
Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 110 / 110