Introduction to Probability Theory - TU Wien
TRANSCRIPT
Introduction to Probability Theory
JESPER LARSSON TRÄFF, FRANCESCO VERSACI
– Lectures on Parallel Algorithms –
11 November, 2013
F. Versaci (TU Wien) Introduction to Probability Theory 11 November, 2013 1 / 44
References
C.M. Grinstead and J.L. Snell. Introduction to Probability. http://math.dartmouth.edu/~prob/prob/prob.pdf. Amer. Math. Soc., 1997.
M. Loève. Probability Theory I. Springer, 1977.
M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
Probability Space – 1/2

Sample space Ω
The set of outcomes of some random process
E.g., {Heads, Tails} for a coin toss, or {1, 2, 3, 4, 5, 6} if we roll a die

Measurable events F
A family of subsets of Ω which represents all the possible events for which we would like to compute the probability
E.g., {2, 4, 6} should be in F if we want to compute the probability that, by rolling a die, we get an even number as result
More formally, F is a σ-algebra over Ω
We will stick to discrete probability spaces, so we can take F to be the family of all subsets of Ω, i.e., F = 2^Ω
Probability Space – 2/2

Probability measure Pr
Pr : F → R assigns probabilities to events
E.g., if we roll a die, the probability to get an even number is one half:

Pr({2, 4, 6}) = 1/2

[Photo: Andrey Kolmogorov]

σ-algebra F over Ω
E ∈ F ⇒ Ω \ E ∈ F (closed under complementation)
E1, E2, . . . ∈ F ⇒ ⋃i Ei ∈ F (closed under countable unions)
F is non-empty (at least ∅ and Ω are in F)
Probability Measure

Pr : F → R
Non-negativity: ∀E ∈ F, Pr(E) ≥ 0
σ-additivity: for all countable sequences of pairwise disjoint events E1, E2, . . .

Pr(⋃i Ei) = Σi Pr(Ei)

Normalization: Pr(Ω) = 1
Null empty set: Pr(∅) = 0 (follows from the axioms above)

Banach–Tarski paradox
In general, we define probability on a σ-algebra and not simply on 2^Ω because if Ω is infinite weird things can happen. E.g., it is possible to divide a sphere in R^3 into a finite number of pairwise disjoint subsets and, by recombining these subsets (just by moving and rotating them), get two spheres, each as big as the original one.
Probability of Complementary Events

Let E be an event and Ē := Ω \ E its complement. Then we have

Pr(Ē) = 1 − Pr(E)

Proof. Ω = E ∪ Ē, with E and Ē disjoint. Then

Pr(Ω) = Pr(E) + Pr(Ē) = 1

and hence

Pr(Ē) = 1 − Pr(E)
Probability of Non-Disjoint Events
Subadditivity

Two events

Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)

[Venn diagram: E1, E2 and their overlap E1 ∩ E2]

Three events

Pr(E1 ∪ E2 ∪ E3) = Pr(E1) + Pr(E2) + Pr(E3)
− Pr(E1 ∩ E2) − Pr(E2 ∩ E3) − Pr(E1 ∩ E3)
+ Pr(E1 ∩ E2 ∩ E3)

n events (general case)

Pr(⋃_{i=1}^{n} Ei) = Σ_{l=1}^{n} (−1)^{l+1} Σ_{1 ≤ i1 < ··· < il ≤ n} Pr(⋂_{r=1}^{l} E_{i_r})
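The general inclusion–exclusion formula can be checked by brute force on a small discrete space. A minimal Python sketch, assuming a uniform measure; the sample space and the three events below are made up purely for illustration:

```python
from itertools import combinations

def prob(event, omega):
    # Uniform measure: Pr(E) = |E ∩ Omega| / |Omega|
    return len(event & omega) / len(omega)

def union_prob(events, omega):
    # Pr(U Ei) via the inclusion-exclusion formula
    total = 0.0
    for l in range(1, len(events) + 1):
        for subset in combinations(events, l):
            inter = set.intersection(*subset)
            total += (-1) ** (l + 1) * prob(inter, omega)
    return total

omega = set(range(1, 13))                              # toy sample space {1, ..., 12}
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8, 10}]   # arbitrary example events

direct = prob(set.union(*events), omega)               # Pr of the union, computed directly
assert abs(direct - union_prob(events, omega)) < 1e-12
```

The assertion confirms that summing over all non-empty index subsets with alternating signs reproduces the directly computed probability of the union.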
Independence

Independent events
Two events E and F are independent if and only if

Pr(E ∩ F) = Pr(E) Pr(F)

n events E1, . . . , En are mutually independent if and only if

∀I ⊆ {1, . . . , n}   Pr(⋂_{i∈I} Ei) = ∏_{i∈I} Pr(Ei)

Conditional probability
The conditional probability that event E occurs given that event F occurs is

Pr(E|F) = Pr(E ∩ F) / Pr(F)

We assume Pr(F) > 0

Note
If E and F are two independent events, then

Pr(E|F) = Pr(E)
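Both definitions can be checked exhaustively on a small sample space. A sketch, using the parities of two fair dice as the (illustrative) pair of independent events:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # all 36 equally likely two-dice outcomes

def pr(pred):
    # Probability of the event described by predicate pred, uniform measure
    return sum(1 for w in omega if pred(w)) / len(omega)

E = lambda w: w[0] % 2 == 1        # first die is odd
F = lambda w: w[1] % 2 == 1        # second die is odd
EF = lambda w: E(w) and F(w)

# E and F are independent: Pr(E ∩ F) = Pr(E) Pr(F)
assert pr(EF) == pr(E) * pr(F)

# Hence the conditional probability Pr(E|F) = Pr(E ∩ F) / Pr(F) equals Pr(E)
assert pr(EF) / pr(F) == pr(E)
```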
Law of Total Probability

Simple case
Let E and B be events, and Ē := Ω \ E. Then

Pr(B) = Pr(B ∩ E) + Pr(B ∩ Ē) = Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē)

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^{n} Ei = Ω). Then, for all events B,

Pr(B) = Σ_{i=1}^{n} Pr(B ∩ Ei) = Σ_{i=1}^{n} Pr(B|Ei) Pr(Ei)
Bayes’ Law

Simple case
Let E and B be events, and Ē := Ω \ E. Then

Pr(E|B) = Pr(E ∩ B) / Pr(B) = Pr(B|E) Pr(E) / Pr(B)
= Pr(B|E) Pr(E) / (Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē))

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^{n} Ei = Ω). Then, for all j and all events B,

Pr(Ej|B) = Pr(Ej ∩ B) / Pr(B) = Pr(B|Ej) Pr(Ej) / Σ_{i=1}^{n} Pr(B|Ei) Pr(Ei)

[Photo: Thomas Bayes]
Examples

We roll two dice, a white one and a black one. Let Ω1 := {1, 2, 3, 4, 5, 6} and Ω2 := {1, 2, 3, 4, 5, 6}.

What is the global sample space Ω?
Ω = Ω1 × Ω2 (Cartesian product) = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (6, 6)}

The dice are fair. What’s the probability that
1. The outcome is a given pair, say (1, 1)? Answer: 1/36
2. White and black outcomes are equal? Answer: 6/36 = 1/6
3. White and black outcomes are different? Answer: 1 − 1/6 = 5/6
4. The maximum of the two outcomes is less or equal to 3? Answer: (3 · 3)/36 = 1/4
5. White is larger than black? Answer: (5 + 4 + 3 + 2 + 1)/36 = 15/36
6. White is odd? Answer: 1/2
7. Both outcomes are odd? Answer: (3 · 3)/36 = 1/4
8. At least one outcome is odd? Answer: 1/2 + 1/2 − 1/4 = 3/4
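All eight answers can be verified by enumerating the 36 equally likely outcomes; a sketch using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # outcomes as (white, black)

def pr(pred):
    # Exact probability of the event described by predicate pred
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

assert pr(lambda w: w == (1, 1)) == Fraction(1, 36)                     # a given pair
assert pr(lambda w: w[0] == w[1]) == Fraction(1, 6)                     # equal
assert pr(lambda w: w[0] != w[1]) == Fraction(5, 6)                     # different
assert pr(lambda w: max(w) <= 3) == Fraction(1, 4)                      # max <= 3
assert pr(lambda w: w[0] > w[1]) == Fraction(15, 36)                    # white > black
assert pr(lambda w: w[0] % 2 == 1) == Fraction(1, 2)                    # white odd
assert pr(lambda w: w[0] % 2 == 1 and w[1] % 2 == 1) == Fraction(1, 4)  # both odd
assert pr(lambda w: w[0] % 2 == 1 or w[1] % 2 == 1) == Fraction(3, 4)   # at least one odd
```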
Examples

A data packet travels through n = 10 routers. Each relay has probability p = 1% to corrupt the packet. What’s the probability Pbad that the packet arrives corrupted at the destination?

Answer
The n events “packet is corrupted at router i” (with 1 ≤ i ≤ n) are independent. It is easier to compute the probability Pok for the packet to arrive unaltered, and then take the complementary event. The probability that at a given relay the packet remains unaltered is 1 − p, and hence

Pok = (1 − p)^n .

Finally we have

Pbad = 1 − Pok = 1 − (1 − p)^n ≈ 9.56% .
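The arithmetic above takes two lines to reproduce:

```python
n, p = 10, 0.01              # 10 relays, 1% corruption probability each
p_ok = (1 - p) ** n          # packet survives all n independent relays
p_bad = 1 - p_ok
print(f"Pbad = {p_bad:.2%}")   # → Pbad = 9.56%
```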
Examples
A coin is tossed twice. Consider the following events:A: Heads on the first tossB: Heads on the second tossC: The two tosses come out the sameAre A,B and C pairwise independent? Are they mutuallyindependent?
Answers: Yes, no.We roll a fair die.
1 What’s the probability that the outcome is ?Answer: Pr(
) = 1
62 We are told that the outcome is greater than 3. What is now the
probability that the outcome is ?Answer:
Pr(
|
, ,)
=Pr(
∩
, ,)
Pr(
, ,)
=Pr( )
Pr(
, ,) =
1/61/2
=13
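The coin-toss claim (pairwise but not mutually independent) can be verified by enumerating the four equally likely toss pairs; a sketch:

```python
from itertools import product

omega = list(product("HT", repeat=2))     # the four equally likely toss pairs

def pr(event):
    return sum(1 for w in omega if w in event) / len(omega)

A = {w for w in omega if w[0] == "H"}     # heads on the first toss
B = {w for w in omega if w[1] == "H"}     # heads on the second toss
C = {w for w in omega if w[0] == w[1]}    # the two tosses come out the same

# Pairwise independent: every pair of events factorizes
assert pr(A & B) == pr(A) * pr(B)
assert pr(A & C) == pr(A) * pr(C)
assert pr(B & C) == pr(B) * pr(C)

# ... but not mutually independent: the triple intersection does not
assert pr(A & B & C) != pr(A) * pr(B) * pr(C)
```

The triple intersection is {(H, H)}, with probability 1/4, while the product of the three individual probabilities is 1/8.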
Examples

A medical test for some disease has probability qF = 1% of false positives and qN = 2% of false negatives. The percentage of the population having the disease is qD = 5%.

1. What is the probability that someone, chosen at random, is positive to the test?
2. What is the probability that someone, who is negative to the test, nonetheless has the disease?
Examples

Answer – 1/2
We consider the following events:
T : The person is positive to the test
D : The person has the disease
We know that

Pr(T | D̄) = qF ,   Pr(T̄ | D) = qN ,   Pr(D) = qD ,

and we want to find Pr(T). The law of total probability tells us that

Pr(T) = Pr(T | D) Pr(D) + Pr(T | D̄) Pr(D̄) .

Since Pr(T | D) = 1 − Pr(T̄ | D) we have

Pr(T) = (1 − qN) qD + qF (1 − qD) = 5.85% .
Examples

Answer – 2/2
We now want to find Pr(D | T̄). Bayes’ law gives us

Pr(D | T̄) = Pr(T̄ | D) Pr(D) / Pr(T̄) = qN qD / (1 − ((1 − qN) qD + qF (1 − qD))) ≈ 0.106%

(If you don’t test, you have a 5% probability of having the disease; if you test and come out negative, the probability of having the disease drops to about 0.1%.)
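Both answers reduce to plugging the three given rates into the total-probability and Bayes formulas; a sketch:

```python
qF, qN, qD = 0.01, 0.02, 0.05   # false positive, false negative, disease prevalence

# Law of total probability: Pr(T) = Pr(T|D) Pr(D) + Pr(T|~D) Pr(~D)
p_T = (1 - qN) * qD + qF * (1 - qD)

# Bayes' law: Pr(D | negative test) = Pr(~T|D) Pr(D) / Pr(~T)
p_D_given_negative = qN * qD / (1 - p_T)

print(f"Pr(T)    = {p_T:.2%}")                    # → Pr(T)    = 5.85%
print(f"Pr(D|~T) = {p_D_given_negative:.3%}")     # → Pr(D|~T) = 0.106%
```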
Combinatorics
Permutations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider ordered sequences of length k: (a_{i1}, . . . , a_{ik}).
The number of such possible sequences (k-permutations of n) is

P(n, k) = n (n − 1) · · · (n − k + 1) = n! / (n − k)!

In particular, the number of permutations of n elements is n!.

Example
Let A = {♠, ♣, F}; the ordered sequences of 2 elements are:

(♠,♣) , (♠,F) , (♣,♠) , (♣,F) , (F,♠) , (F,♣) .

P(3, 2) = 3!/1! = 2 · 3 = 6
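The count matches what the standard library enumerates; a sketch (the three strings stand in for the card symbols):

```python
from itertools import permutations
from math import factorial

def P(n, k):
    # Number of k-permutations of n elements: n! / (n - k)!
    return factorial(n) // factorial(n - k)

A = ["spade", "club", "star"]            # stand-ins for the three symbols
seqs = list(permutations(A, 2))          # all ordered length-2 sequences

assert len(seqs) == P(3, 2) == 6
```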
Combinatorics
Combinations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider non-ordered selections of k elements: {a_{i1}, . . . , a_{ik}}.
The number of such possible selections (k-combinations of n) is

C(n, k) = (n choose k) = n! / (k! (n − k)!)

Example
Let A = {♠, ♣, F}; the non-ordered selections of 2 elements are:

{♠,♣} = {♣,♠} , {♠,F} = {F,♠} , {♣,F} = {F,♣} .

C(3, 2) = (3 choose 2) = 3! / (1! 2!) = 6/2 = 3
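Again the count matches the standard-library enumeration; a sketch:

```python
from itertools import combinations
from math import comb

A = ["spade", "club", "star"]      # stand-ins for the three symbols
sels = list(combinations(A, 2))    # all non-ordered 2-element selections

assert len(sels) == comb(3, 2) == 3
```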
Random Variables

A random variable X on a sample space Ω is a real function on Ω:

X : Ω → R

Remark
In general we should also require that the preimage of every Borel set belongs to the σ-algebra of Ω (i.e., X should be measurable), but that’s not an issue for discrete probability spaces.

Independence of random variables
Two random variables X and Y are independent if and only if

∀x ∀y   Pr((X = x) ∩ (Y = y)) = Pr(X = x) Pr(Y = y)
Random Variables
Example

We roll two dice, thus having Ω = {(1, 1), (1, 2), . . . , (6, 6)} as sample space. Consider the random variable X = product of the two outcomes.

We have, e.g., X(2, 3) = 6 and X(3, 6) = 18

We write X = a to refer to the set {ω ∈ Ω : X(ω) = a}

Pr(X = 12) = Pr({(2, 6), (3, 4), (4, 3), (6, 2)}) = 1/9
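The event {X = 12} and its probability can be computed by enumeration; a sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two-dice sample space
X = lambda w: w[0] * w[1]                      # product of the two outcomes

event = {w for w in omega if X(w) == 12}       # the set {ω ∈ Ω : X(ω) = 12}
assert event == {(2, 6), (3, 4), (4, 3), (6, 2)}
assert Fraction(len(event), len(omega)) == Fraction(1, 9)
```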
Probability Distribution Functions

A random variable X is typically defined using some distribution functions:

Discrete random variables
Non-cumulative: probability mass function (or pmf)
fX(a) := Pr(X = a)
Cumulative: cumulative distribution function (or cdf)
FX(a) := Pr(X ≤ a)

Continuous random variables
Non-cumulative: probability density function (or pdf)
∫_a^b fX(t) dt = Pr(a < X ≤ b)
Cumulative: cumulative distribution function (or cdf)
FX(a) := Pr(X ≤ a) = ∫_{−∞}^{a} fX(t) dt
Expectation

The expectation (or expected value) E[X] of a random variable X is

Discrete r.v.
E[X] := Σ_{x∈X} x fX(x)
The sum is done over the image of X: X := {x ∈ R : ∃ω ∈ Ω s.t. X(ω) = x}.

Continuous r.v.
E[X] := ∫_{−∞}^{+∞} t fX(t) dt

Absolute convergence of the series/integral is required

Example
A die is rolled. If the outcome is a prime number, you win 10€, otherwise you lose 4€. What’s the expected value of the game?

E[X] = −4·(1/6) + 10·(1/6) + 10·(1/6) − 4·(1/6) + 10·(1/6) − 4·(1/6) = −4·(1/2) + 10·(1/2) = 3
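The die-game expectation is a direct application of the discrete formula; a sketch:

```python
from fractions import Fraction

# Payoff per outcome: +10 on a prime (2, 3, 5), -4 otherwise (1, 4, 6)
payoff = {1: -4, 2: 10, 3: 10, 4: -4, 5: 10, 6: -4}

# E[X] = sum over outcomes of value * probability, each outcome having mass 1/6
EX = sum(Fraction(1, 6) * v for v in payoff.values())
assert EX == 3
```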
Sum of Random Variables

Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY, and let a, b and c be real parameters. Let

ΩZ = ΩX × ΩY ,   Z : ΩZ → R ,   Z = aX + bY + c .

Then Z is a random variable.

Linearity of expectation
Furthermore, we have

E[Z] = E[aX + bY + c] = a E[X] + b E[Y] + c

Note: this holds even if X and Y are not independent.
Product of Random Variables

Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY.

Definition
Let

ΩZ = ΩX × ΩY ,   Z : ΩZ → R ,   Z = XY .

Then Z is a random variable.

Expectation
In general, we have

E[Z] = E[XY] ≠ E[X] E[Y]

Note: but if X and Y are independent, then E[XY] = E[X] E[Y].
Functions of a Random Variable

Let X be a random variable on the sample space Ω and let g : R → R be a (measurable) function. Then g(X) is also a random variable.

Theorem (Law of the unconscious statistician)

E[g(X)] = Σ_{x∈X} g(x) fX(x)   ( E[g(X)] = ∫_{−∞}^{+∞} g(t) fX(t) dt in the continuous case )

Note
In general, E[g(X)] ≠ g(E[X]).

Jensen’s inequality
If g is a convex function (e.g., g : x ↦ x²)

E[g(X)] ≥ g(E[X])
Conditional Expectation

Let X and Y be two discrete random variables and y ∈ R. We define the conditional expectation of X, given Y = y, as

E[X | Y = y] = Σ_{x∈X} x Pr(X = x | Y = y)

E[X] = Σ_{y∈Y} fY(y) E[X | Y = y]   (total probability)

E[ Σ_{i=1}^{n} Xi | Y = y ] = Σ_{i=1}^{n} E[Xi | Y = y]   (linearity)

Continuous r.v.
The extension to continuous random variables is somewhat more complicated, and since we do not need it we are going to skip it. . .
Examples

We roll 4 dice. What is the expectation of the sum of the outcomes?
Answer:

E[Z] = E[Σ_{i=1}^{4} Xi] = Σ_{i=1}^{4} E[Xi] = 4 · 3.5 = 14

We pay 10€ to play a game: two dice are rolled and we win (in €) the sum of the two outcomes. Additionally, if the two outcomes are equal, we win a further 12€. What’s the expected value of the game?
Answer: Let ω1 and ω2 be the two outcomes. Consider the following random variables:

X(ω1, ω2) = ω1 + ω2      Y(ω1, ω2) = 12 if ω1 = ω2, 0 otherwise

Note that X and Y are not independent. However, because of the linearity of expectation we have

E[−10 + X + Y] = −10 + E[X] + E[Y] = −10 + 7 + 2 = −1
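The game expectation can be verified by enumerating the 36 outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)                                        # each outcome equally likely

EX = sum(p * (w1 + w2) for w1, w2 in omega)                # E[X] = 7
EY = sum(p * (12 if w1 == w2 else 0) for w1, w2 in omega)  # E[Y] = 2
assert EX == 7 and EY == 2

# Expected value of the game: pay 10, win X + Y
assert -10 + EX + EY == -1
```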
Examples
We roll one die. If the outcome is 6, then we also roll a second die. What is the expectation of the sum of the outcomes?
Answer: Let

- $X$ describe the first outcome (i.e., $X(⚀) = 1$, $X(⚁) = 2$, etc.)
- $Y$ describe the second outcome (with $Y = 0$ if the second die is not rolled)

Again, $X$ and $Y$ are not independent. We want to compute $E[X + Y] = E[X] + E[Y]$. $E[X]$ is clearly equal to 3.5; as for $E[Y]$,

$$E[Y] = f_X(6)\, E[Y \mid X = 6] + (1 - f_X(6))\, E[Y \mid X \neq 6]$$

Since $E[Y \mid X = 6] = 3.5$ and $E[Y \mid X \neq 6] = 0$, we finally obtain

$$E[X + Y] = 3.5 + \frac{3.5}{6} = \frac{49}{12} \approx 4.08$$
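As a sanity check (not from the slides), the conditional expectation above can be verified both exactly, by enumerating the outcomes with `fractions.Fraction`, and by a seeded simulation:

```python
from fractions import Fraction
import random

# Exact E[X + Y] by enumeration: the second die is rolled only on a 6
exact = Fraction(0)
for x in range(1, 7):
    if x == 6:
        for y in range(1, 7):
            exact += Fraction(x + y, 36)  # each (6, y) pair has probability 1/36
    else:
        exact += Fraction(x, 6)
print(exact)  # 49/12

# Seeded Monte Carlo cross-check
rng = random.Random(1)
trials = 200_000
total = 0
for _ in range(trials):
    x = rng.randint(1, 6)
    total += x + (rng.randint(1, 6) if x == 6 else 0)
print(total / trials)
```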
![Page 46: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/46.jpg)
Binomial Distribution – Definition
Bernoulli indicator
An experiment has probability $p$ of succeeding and $1 - p$ of failing (e.g., we toss a biased coin). We define the following random variable $X$:

$$X := \begin{cases} 1 & \text{if the experiment succeeds} \\ 0 & \text{otherwise} \end{cases}$$

Binomial distribution
We repeat the Bernoulli experiment $n$ times (independently, and with the same distribution). We now want to count the number of successful experiments, and hence define the new random variable $Y$:

$$Y := \sum_{i=1}^{n} X_i$$
![Page 47: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/47.jpg)
Binomial Distribution – Properties
Expected values

Bernoulli: $\forall i \quad E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p$

Binomial:

$$E[Y] = E\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = np$$

Distribution function of the binomial r.v.

$$f_Y(k) = \Pr(Y = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

Reminder: binomial theorem

$$\forall x \, \forall y \quad (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n - k}$$
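A short Python check (ours, not the lecture's): by the binomial theorem with $x = p$ and $y = 1 - p$ the pmf sums to one, and the mean comes out as $np$:

```python
from math import comb

def binom_pmf(k, n, p):
    """f_Y(k) = C(n, k) * p^k * (1 - p)^(n - k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Binomial theorem with x = p, y = 1 - p: the pmf sums to (p + (1 - p))^n = 1
print(sum(pmf))
# Mean of the binomial: E[Y] = np
print(sum(k * f for k, f in enumerate(pmf)))
```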
![Page 48: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/48.jpg)
Geometric Distribution – Definition
We repeat a Bernoulli trials process until an experiment succeeds. Let $X$ be the random variable which gives the number of trials we have to perform until the experiment finally succeeds.

Distribution function of the geometric r.v.

$$f_X(n) = \Pr(X = n) = (1 - p)^{n - 1} p$$

Why geometric?
Let's prove that $f_X$ is really a distribution (i.e., that it sums to one). Let $q := 1 - p$.

$$\sum_{n=1}^{\infty} f_X(n) = \sum_{n=1}^{\infty} (1 - p)^{n - 1} p = p \sum_{n=0}^{\infty} q^n = \frac{p}{1 - q} = 1$$
![Page 49: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/49.jpg)
Geometric Distribution – Properties
Expectation

$$E[X] = \sum_{n=1}^{\infty} n f_X(n) = \sum_{n=1}^{\infty} n (1 - p)^{n - 1} p = p \sum_{n=1}^{\infty} n q^{n - 1} = p \sum_{n=1}^{\infty} \frac{d(q^n)}{dq} = p \frac{d}{dq} \sum_{n=1}^{\infty} q^n$$

$$= p \frac{d}{dq}\!\left[\frac{1}{1 - q} - 1\right] = p \frac{d}{dq}\!\left[\frac{q}{1 - q}\right] = p \frac{1}{(1 - q)^2} = \frac{1}{p}$$

The geometric r.v. is memoryless

$$\Pr(X = n + k \mid X > k) = \Pr(X = n)$$

This means that if some number has not come out for 20 weeks in the Lotto game, it isn't any more likely to be drawn...
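Both $E[X] = 1/p$ and memorylessness can be observed empirically. A seeded Python sketch (ours; the sampler simply counts Bernoulli trials until the first success):

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(42)
p = 0.25
samples = [geometric(p, rng) for _ in range(200_000)]

# E[X] = 1/p = 4
mean = sum(samples) / len(samples)

# Memorylessness: Pr(X = m + k | X > k) should match Pr(X = m)
k, m = 3, 2
cond = [x for x in samples if x > k]
lhs = sum(x == m + k for x in cond) / len(cond)
rhs = sum(x == m for x in samples) / len(samples)
print(mean, lhs, rhs)  # lhs and rhs both near (1 - p)^(m - 1) * p = 0.1875
```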
![Page 50: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/50.jpg)
Example: Coupon Collector’s Problem
There are $n$ different coupons and we want to collect all of them. Each day we get a new (uniformly random) coupon. How long does it take, on average, to finish the collection?

Answer – 1/2
Consider the following random variables:

- $X_i$: number of coupons we get while we hold exactly $i - 1$ distinct coupons
- $X$: number of coupons we get until we have all the coupons

We have

$$X = \sum_{i=1}^{n} X_i \quad \Rightarrow \quad E[X] = \sum_{i=1}^{n} E[X_i]$$

Each $X_i$ is a geometric r.v., with

$$p_i = 1 - \frac{i - 1}{n} = \frac{n - i + 1}{n}$$
![Page 51: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/51.jpg)
Example: Coupon Collector’s Problem
Answer – 2/2
We finally have

$$E[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$$

$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n$$

If $n = 80$ coupons, then $E[X] \approx 397$ days.

Harmonic numbers

$$H_n := \sum_{i=1}^{n} \frac{1}{i} = \ln n + \gamma + O\!\left(\frac{1}{n}\right)$$

with $\gamma \approx 0.577$ being the Euler–Mascheroni constant.
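The formula $E[X] = n H_n$ is straightforward to confirm by simulation. A Python sketch (ours; function names are hypothetical):

```python
import random

def collect_all(n, rng):
    """Draw uniformly random coupons until all n types have been seen;
    return the total number of draws."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n = 80
exact = sum(n / (n - i + 1) for i in range(1, n + 1))  # n * H_n
rng = random.Random(7)
trials = 2000
avg = sum(collect_all(n, rng) for _ in range(trials)) / trials
print(exact, avg)  # both around 397
```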
![Page 52: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/52.jpg)
Example: Randomized Quicksort
When sorting $n$ elements, mergesort makes, in both the worst and average case, $M(n) = n \log_2(n) + O(n)$ comparisons. We implement quicksort choosing the pivot uniformly at random. What is the average number of comparisons made by quicksort?

Answer – 1/3
Let $[x_1, x_2, \ldots, x_n]$ be the sorted list of elements and $\forall j \, \forall i < j$ let $X_{ij}$ be the following Bernoulli random variable

$$X_{ij} := \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ get compared during the execution} \\ 0 & \text{otherwise} \end{cases}$$

and let $X$ be

$$X := \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$$

We want to compute $E[X]$.
![Page 53: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/53.jpg)
Example: Randomized Quicksort
Answer – 2/3
By linearity of expectation we have

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}],$$

and then we just need to compute the probability that a generic pair $(x_i, x_j)$ gets compared during the execution of the algorithm. Consider the segment $[x_i, x_{i+1}, \ldots, x_j]$; $x_i$ and $x_j$ are compared if and only if one of the two is chosen as pivot before all the intermediate values $x_{i+1}, \ldots, x_{j-1}$ (otherwise they are split into different sublists and never compared against each other). The probability for this to happen is

$$p_{ij} = \frac{2}{j - i + 1} \quad \Rightarrow \quad E[X_{ij}] = \frac{2}{j - i + 1}$$
![Page 54: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/54.jpg)
Example: Randomized Quicksort
Answer – 3/3

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} = \sum_{k=2}^{n} \sum_{i=1}^{n+1-k} \frac{2}{k} = \sum_{k=2}^{n} \frac{2(n + 1 - k)}{k}$$

$$= (n + 1) \left[\sum_{k=2}^{n} \frac{2}{k}\right] - 2(n - 1) = 2(n + 1) \left[\sum_{k=1}^{n} \frac{1}{k}\right] - 4n$$

$$= 2(n + 1) H_n - 4n = 2n \ln(n) + O(n) = \frac{2}{\log_2(e)}\, n \log_2(n) + O(n) \approx 1.39\, M(n)$$

Randomized quicksort makes about 39% more comparisons than mergesort on average.
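The count $2(n+1)H_n - 4n$ can be checked against an instrumented randomized quicksort. A Python sketch (ours; a simple out-of-place variant rather than the usual in-place partitioning, assuming distinct elements):

```python
import random

def quicksort_comparisons(items, rng):
    """Sort a copy of items with uniformly random pivots; return the number
    of element comparisons (each element vs. the chosen pivot)."""
    comps = 0

    def qs(lst):
        nonlocal comps
        if len(lst) <= 1:
            return lst
        pivot = rng.choice(lst)
        comps += len(lst) - 1  # the pivot is compared with every other element
        left = [x for x in lst if x < pivot]
        right = [x for x in lst if x > pivot]
        return qs(left) + [pivot] + qs(right)

    qs(list(items))
    return comps

n = 500
harmonic = sum(1 / k for k in range(1, n + 1))
predicted = 2 * (n + 1) * harmonic - 4 * n  # the exact expectation derived above

rng = random.Random(3)
trials = 100
avg = sum(quicksort_comparisons(range(n), rng) for _ in range(trials)) / trials
print(predicted, avg)
```

For $n = 500$ the predicted average is roughly 4800 comparisons, and the empirical mean should fall within a few percent of it.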
![Page 55: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/55.jpg)
Moments
Let X be a discrete random variable
Raw moments
The $n$-th (raw) moment of $X$ is

$$E[X^n] = \sum_{x \in X} x^n f_X(x)$$

Central moments
The $n$-th central moment of $X$ is

$$E[(X - E[X])^n] = \sum_{x \in X} (x - \mu)^n f_X(x)$$

where $\mu := E[X]$.
![Page 56: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/56.jpg)
Variance
Definition

$$\operatorname{Var}[X] = \sigma^2[X] := E\!\left[(X - E[X])^2\right]$$

Note

$$\operatorname{Var}[X] = E\!\left[(X - E[X])^2\right] = E\!\left[X^2 - 2X\,E[X] + (E[X])^2\right]$$

$$= E\!\left[X^2\right] - 2\,E[X]\,E[X] + (E[X])^2 = E\!\left[X^2\right] - (E[X])^2$$

Sum of independent variables
Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables. Then

$$\operatorname{Var}\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \operatorname{Var}[X_i]$$
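Both the identity $\operatorname{Var}[X] = E[X^2] - (E[X])^2$ and additivity over independent summands can be verified exactly with rational arithmetic. A Python sketch (ours), using two fair dice:

```python
from fractions import Fraction
from itertools import product

# Distribution of one fair die: value -> probability
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var[X] = E[X^2] - (E[X])^2
    mu = expect(dist)
    return sum(x * x * p for x, p in dist.items()) - mu * mu

# Exact distribution of the sum of two independent dice
pair = {}
for (a, pa), (b, pb) in product(die.items(), die.items()):
    pair[a + b] = pair.get(a + b, Fraction(0)) + pa * pb

print(variance(die))   # 35/12
print(variance(pair))  # 35/6, i.e. Var[X1] + Var[X2]
```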
![Page 57: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/57.jpg)
Examples
Bernoulli distribution: $\operatorname{Var}[X] = p(1 - p)$

Binomial distribution: $\operatorname{Var}[X] = np(1 - p)$

Geometric distribution: $\operatorname{Var}[X] = \dfrac{1 - p}{p^2}$
![Page 58: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/58.jpg)
Markov’s Inequality
Theorem
Let $X$ be a non-negative r.v. Then,

$$\forall a \in \mathbb{R},\, a > 0 \quad \Pr(X \geq a) \leq \frac{E[X]}{a}$$

Example
A coin is tossed 100 times. Let $X$ be the number of heads outcomes (binomial variable with $p = \frac{1}{2}$). We have

$$E[X] = 50 \qquad \operatorname{Var}[X] = 100 \cdot \frac{1}{2} \cdot \frac{1}{2} = 25$$

An upper bound on the probability that we have at least 80 heads is given by

$$\Pr(X \geq 80) \leq \frac{E[X]}{80} = \frac{5}{8} = 62.5\%$$
![Page 59: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/59.jpg)
Chebyshev’s Inequality
Theorem
Let $X$ be a r.v. Then,

$$\forall a \in \mathbb{R},\, a > 0 \quad \Pr(|X - E[X]| \geq a) \leq \frac{\operatorname{Var}[X]}{a^2}$$

Example
Again, a coin is tossed 100 times. An upper bound on the probability that we have at least 80 heads is given (using the symmetry of the distribution around its mean) by

$$\Pr(X \geq 80) = \frac{\Pr(|X - E[X]| \geq 30)}{2} \leq \frac{1}{2} \cdot \frac{\operatorname{Var}[X]}{30^2} = \frac{1}{2} \cdot \frac{25}{900} = \frac{1}{72} \approx 1.39\%$$
![Page 60: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/60.jpg)
Chernoff Bound – For the Sum of Poisson Trials
Theorem

$$X := \sum_{i=1}^{n} X_i, \quad \text{with } X_i \text{ Bernoulli r.v. with parameter } p_i$$

Let $\mu := E[X]$; then

$$\forall \delta \in \, ]0, 1[ \quad \Pr(|X - \mu| \geq \delta\mu) \leq 2 \exp\!\left(-\frac{\mu\delta^2}{3}\right)$$

Example
Again, a coin is tossed 100 times. Give an upper bound on the probability that we have at least 80 heads. Let $\mu\delta = 30 \Rightarrow \delta = \frac{3}{5}$; then (using the one-sided version of the bound, which drops the factor 2)

$$\Pr(X \geq 80) \leq \exp\!\left(-\frac{\mu\delta^2}{3}\right) = \exp\!\left(-\frac{50 \cdot 9}{3 \cdot 25}\right) = \exp(-6) \approx 0.248\%$$
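The three bounds on $\Pr(X \geq 80)$ from the last slides can be compared, alongside the exact binomial tail, in a few lines of Python (ours, not the lecture's):

```python
from math import comb, exp

n, p, a = 100, 0.5, 80
mu = n * p             # 50
var = n * p * (1 - p)  # 25

# Exact tail Pr(X >= 80) for X ~ Bin(100, 1/2)
exact = sum(comb(n, k) for k in range(a, n + 1)) / 2**n

markov = mu / a                        # 5/8 = 62.5%
chebyshev = var / (a - mu) ** 2 / 2    # 1/72, halved by symmetry
delta = (a - mu) / mu                  # 3/5
chernoff = exp(-mu * delta ** 2 / 3)   # e^-6

print(exact, chernoff, chebyshev, markov)
# The bounds get progressively tighter: Markov > Chebyshev > Chernoff > exact
```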