Introduction to Probability Theory - TU Wien
TRANSCRIPT
Introduction to Probability Theory
JESPER LARSSON TRÄFF, FRANCESCO VERSACI
– Lectures on Parallel Algorithms –
11 November, 2013
F. Versaci (TU Wien) Introduction to Probability Theory 11 November, 2013 1 / 44
References
C.M. Grinstead and J.L. Snell. Introduction to Probability. http://math.dartmouth.edu/~prob/prob/prob.pdf. Amer. Math. Soc., 1997.
M. Loève. Probability Theory I. Springer, 1977.
M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
Probability Space – 1/2

Sample space Ω
The set of outcomes of some random process
E.g., {Heads, Tails} for a coin toss, or {1, 2, 3, 4, 5, 6} if we roll a die

Measurable events F
A family of subsets of Ω which represents all the possible events for which we would like to compute the probability
E.g., {2, 4, 6} should be in F if we want to compute the probability that, by rolling a die, we get an even number as result
More formally, F is a σ-algebra over Ω
We will stick to discrete probability spaces, so we can take F to be the family of all subsets of Ω, i.e., F = 2^Ω
Probability Space – 2/2

Probability measure Pr
Pr : F → R assigns probabilities to events
E.g., if we roll a die, the probability to get an even number is one half:

Pr({2, 4, 6}) = 1/2

[Photo: Andrey Kolmogorov]

σ-algebra F over Ω
E ∈ F ⇒ Ω \ E ∈ F (closed under complementation)
E1, E2, . . . ∈ F ⇒ ⋃i Ei ∈ F (closed under countable unions)
F is non-empty (at least ∅ and Ω are in F)
Probability Measure

Pr : F → R
Non-negativity: ∀E ∈ F, Pr(E) ≥ 0
σ-additivity: for all countable sequences of pairwise disjoint events E1, E2, . . .

Pr(⋃i Ei) = Σi Pr(Ei)

Normalization: Pr(Ω) = 1
Null empty set: Pr(∅) = 0 (follows from the axioms above)

Banach–Tarski paradox
In general, we define probability on a σ-algebra and not simply on 2^Ω because if Ω is infinite weird things can happen. E.g., it is possible to divide a sphere in R^3 into a finite number of pairwise disjoint subsets and, by recombining these subsets (just by moving and rotating them), get two spheres, each as big as the original one.
Probability of Complementary Events

Let E be an event and Ē := Ω \ E its complement. Then we have

Pr(Ē) = 1 − Pr(E)

Proof. Ω = E ∪ Ē, with E and Ē disjoint. Then

Pr(Ω) = Pr(E) + Pr(Ē) = 1

and hence

Pr(Ē) = 1 − Pr(E)
Probability of Non-Disjoint Events
Subadditivity

Two events

Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)

[Venn diagram: E1, E2 and their overlap E1 ∩ E2]

Three events

Pr(E1 ∪ E2 ∪ E3) = Pr(E1) + Pr(E2) + Pr(E3)
− Pr(E1 ∩ E2) − Pr(E2 ∩ E3) − Pr(E1 ∩ E3)
+ Pr(E1 ∩ E2 ∩ E3)

n events (general case)

Pr(⋃_{i=1}^{n} Ei) = Σ_{l=1}^{n} (−1)^{l+1} Σ_{1 ≤ i1 < ··· < il ≤ n} Pr(⋂_{r=1}^{l} E_{i_r})
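The general inclusion–exclusion formula can be checked by brute force on a small discrete space. A minimal Python sketch, assuming a uniform measure; the sample space and the three events below are made up purely for illustration:

```python
from itertools import combinations

def prob(event, omega):
    # Uniform measure: Pr(E) = |E ∩ Omega| / |Omega|
    return len(event & omega) / len(omega)

def union_prob(events, omega):
    # Pr(U Ei) via the inclusion-exclusion formula
    total = 0.0
    for l in range(1, len(events) + 1):
        for subset in combinations(events, l):
            inter = set.intersection(*subset)
            total += (-1) ** (l + 1) * prob(inter, omega)
    return total

omega = set(range(1, 13))                              # toy sample space {1, ..., 12}
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8, 10}]   # arbitrary example events

direct = prob(set.union(*events), omega)               # Pr of the union, computed directly
assert abs(direct - union_prob(events, omega)) < 1e-12
```

The assertion confirms that summing over all non-empty index subsets with alternating signs reproduces the directly computed probability of the union.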
Independence

Independent events
Two events E and F are independent if and only if

Pr(E ∩ F) = Pr(E) Pr(F)

n events E1, . . . , En are mutually independent if and only if

∀I ⊆ {1, . . . , n}   Pr(⋂_{i∈I} Ei) = ∏_{i∈I} Pr(Ei)

Conditional probability
The conditional probability that event E occurs given that event F occurs is

Pr(E|F) = Pr(E ∩ F) / Pr(F)

We assume Pr(F) > 0

Note
If E and F are two independent events, then

Pr(E|F) = Pr(E)
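Both definitions can be checked exhaustively on a small sample space. A sketch, using the parities of two fair dice as the (illustrative) pair of independent events:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # all 36 equally likely two-dice outcomes

def pr(pred):
    # Probability of the event described by predicate pred, uniform measure
    return sum(1 for w in omega if pred(w)) / len(omega)

E = lambda w: w[0] % 2 == 1        # first die is odd
F = lambda w: w[1] % 2 == 1        # second die is odd
EF = lambda w: E(w) and F(w)

# E and F are independent: Pr(E ∩ F) = Pr(E) Pr(F)
assert pr(EF) == pr(E) * pr(F)

# Hence the conditional probability Pr(E|F) = Pr(E ∩ F) / Pr(F) equals Pr(E)
assert pr(EF) / pr(F) == pr(E)
```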
Law of Total Probability

Simple case
Let E and B be events, and Ē := Ω \ E. Then

Pr(B) = Pr(B ∩ E) + Pr(B ∩ Ē) = Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē)

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^{n} Ei = Ω). Then, for all events B,

Pr(B) = Σ_{i=1}^{n} Pr(B ∩ Ei) = Σ_{i=1}^{n} Pr(B|Ei) Pr(Ei)
Bayes’ Law

Simple case
Let E and B be events, and Ē := Ω \ E. Then

Pr(E|B) = Pr(E ∩ B) / Pr(B) = Pr(B|E) Pr(E) / Pr(B)
= Pr(B|E) Pr(E) / (Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē))

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^{n} Ei = Ω). Then, for all j and all events B,

Pr(Ej|B) = Pr(Ej ∩ B) / Pr(B) = Pr(B|Ej) Pr(Ej) / Σ_{i=1}^{n} Pr(B|Ei) Pr(Ei)

[Photo: Thomas Bayes]
Examples

We roll two dice, a white one and a black one. Let Ω1 := {1, 2, 3, 4, 5, 6} and Ω2 := {1, 2, 3, 4, 5, 6}.

What is the global sample space Ω?
Ω = Ω1 × Ω2 (Cartesian product) = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (6, 6)}

The dice are fair. What’s the probability that
1. The outcome is a given pair, say (1, 1)? Answer: 1/36
2. White and black outcomes are equal? Answer: 6/36 = 1/6
3. White and black outcomes are different? Answer: 1 − 1/6 = 5/6
4. The maximum of the two outcomes is less or equal to 3? Answer: (3 · 3)/36 = 1/4
5. White is larger than black? Answer: (5 + 4 + 3 + 2 + 1)/36 = 15/36
6. White is odd? Answer: 1/2
7. Both outcomes are odd? Answer: (3 · 3)/36 = 1/4
8. At least one outcome is odd? Answer: 1/2 + 1/2 − 1/4 = 3/4
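All eight answers can be verified by enumerating the 36 equally likely outcomes; a sketch using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # outcomes as (white, black)

def pr(pred):
    # Exact probability of the event described by predicate pred
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

assert pr(lambda w: w == (1, 1)) == Fraction(1, 36)                     # a given pair
assert pr(lambda w: w[0] == w[1]) == Fraction(1, 6)                     # equal
assert pr(lambda w: w[0] != w[1]) == Fraction(5, 6)                     # different
assert pr(lambda w: max(w) <= 3) == Fraction(1, 4)                      # max <= 3
assert pr(lambda w: w[0] > w[1]) == Fraction(15, 36)                    # white > black
assert pr(lambda w: w[0] % 2 == 1) == Fraction(1, 2)                    # white odd
assert pr(lambda w: w[0] % 2 == 1 and w[1] % 2 == 1) == Fraction(1, 4)  # both odd
assert pr(lambda w: w[0] % 2 == 1 or w[1] % 2 == 1) == Fraction(3, 4)   # at least one odd
```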
Examples

A data packet travels through n = 10 routers. Each relay has probability p = 1% to corrupt the packet. What’s the probability Pbad that the packet arrives corrupted at the destination?

Answer
The n events “packet is corrupted at router i” (with 1 ≤ i ≤ n) are independent. It is easier to compute the probability Pok for the packet to arrive unaltered, and then take the complementary event. The probability that at a given relay the packet remains unaltered is 1 − p, and hence

Pok = (1 − p)^n .

Finally we have

Pbad = 1 − Pok = 1 − (1 − p)^n ≈ 9.56% .
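The arithmetic above takes two lines to reproduce:

```python
n, p = 10, 0.01              # 10 relays, 1% corruption probability each
p_ok = (1 - p) ** n          # packet survives all n independent relays
p_bad = 1 - p_ok
print(f"Pbad = {p_bad:.2%}")   # → Pbad = 9.56%
```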
Examples
A coin is tossed twice. Consider the following events:A: Heads on the first tossB: Heads on the second tossC: The two tosses come out the sameAre A,B and C pairwise independent? Are they mutuallyindependent?
Answers: Yes, no.We roll a fair die.
1 What’s the probability that the outcome is ?Answer: Pr(
) = 1
62 We are told that the outcome is greater than 3. What is now the
probability that the outcome is ?Answer:
Pr(
|
, ,)
=Pr(
∩
, ,)
Pr(
, ,)
=Pr( )
Pr(
, ,) =
1/61/2
=13
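The coin-toss claim (pairwise but not mutually independent) can be verified by enumerating the four equally likely toss pairs; a sketch:

```python
from itertools import product

omega = list(product("HT", repeat=2))     # the four equally likely toss pairs

def pr(event):
    return sum(1 for w in omega if w in event) / len(omega)

A = {w for w in omega if w[0] == "H"}     # heads on the first toss
B = {w for w in omega if w[1] == "H"}     # heads on the second toss
C = {w for w in omega if w[0] == w[1]}    # the two tosses come out the same

# Pairwise independent: every pair of events factorizes
assert pr(A & B) == pr(A) * pr(B)
assert pr(A & C) == pr(A) * pr(C)
assert pr(B & C) == pr(B) * pr(C)

# ... but not mutually independent: the triple intersection does not
assert pr(A & B & C) != pr(A) * pr(B) * pr(C)
```

The triple intersection is {(H, H)}, with probability 1/4, while the product of the three individual probabilities is 1/8.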
Examples

A medical test for some disease has probability qF = 1% of false positives and qN = 2% of false negatives. The percentage of the population having the disease is qD = 5%.

1. What is the probability that someone, chosen at random, is positive to the test?
2. What is the probability that someone, who is negative to the test, nonetheless has the disease?
Examples

Answer – 1/2
We consider the following events:
T : The person is positive to the test
D : The person has the disease
We know that

Pr(T | D̄) = qF ,   Pr(T̄ | D) = qN ,   Pr(D) = qD ,

and we want to find Pr(T). The law of total probability tells us that

Pr(T) = Pr(T | D) Pr(D) + Pr(T | D̄) Pr(D̄) .

Since Pr(T | D) = 1 − Pr(T̄ | D) we have

Pr(T) = (1 − qN) qD + qF (1 − qD) = 5.85% .
Examples

Answer – 2/2
We now want to find Pr(D | T̄). Bayes’ law gives us

Pr(D | T̄) = Pr(T̄ | D) Pr(D) / Pr(T̄) = qN qD / (1 − ((1 − qN) qD + qF (1 − qD))) ≈ 0.106%

(If you don’t test, you have a 5% probability of having the disease; if you test and come out negative, the probability of having the disease drops to about 0.1%.)
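Both answers reduce to plugging the three given rates into the total-probability and Bayes formulas; a sketch:

```python
qF, qN, qD = 0.01, 0.02, 0.05   # false positive, false negative, disease prevalence

# Law of total probability: Pr(T) = Pr(T|D) Pr(D) + Pr(T|~D) Pr(~D)
p_T = (1 - qN) * qD + qF * (1 - qD)

# Bayes' law: Pr(D | negative test) = Pr(~T|D) Pr(D) / Pr(~T)
p_D_given_negative = qN * qD / (1 - p_T)

print(f"Pr(T)    = {p_T:.2%}")                    # → Pr(T)    = 5.85%
print(f"Pr(D|~T) = {p_D_given_negative:.3%}")     # → Pr(D|~T) = 0.106%
```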
Combinatorics
Permutations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider ordered sequences of length k: (a_{i1}, . . . , a_{ik}).
The number of such possible sequences (k-permutations of n) is

P(n, k) = n (n − 1) · · · (n − k + 1) = n! / (n − k)!

In particular, the number of permutations of n elements is n!.

Example
Let A = {♠, ♣, F}; the ordered sequences of 2 elements are:

(♠,♣) , (♠,F) , (♣,♠) , (♣,F) , (F,♠) , (F,♣) .

P(3, 2) = 3!/1! = 2 · 3 = 6
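The count matches what the standard library enumerates; a sketch (the three strings stand in for the card symbols):

```python
from itertools import permutations
from math import factorial

def P(n, k):
    # Number of k-permutations of n elements: n! / (n - k)!
    return factorial(n) // factorial(n - k)

A = ["spade", "club", "star"]            # stand-ins for the three symbols
seqs = list(permutations(A, 2))          # all ordered length-2 sequences

assert len(seqs) == P(3, 2) == 6
```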
Combinatorics
Combinations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider non-ordered selections of k elements: {a_{i1}, . . . , a_{ik}}.
The number of such possible selections (k-combinations of n) is

C(n, k) = (n choose k) = n! / (k! (n − k)!)

Example
Let A = {♠, ♣, F}; the non-ordered selections of 2 elements are:

{♠,♣} = {♣,♠} , {♠,F} = {F,♠} , {♣,F} = {F,♣} .

C(3, 2) = (3 choose 2) = 3! / (1! 2!) = 6/2 = 3
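Again the count matches the standard-library enumeration; a sketch:

```python
from itertools import combinations
from math import comb

A = ["spade", "club", "star"]      # stand-ins for the three symbols
sels = list(combinations(A, 2))    # all non-ordered 2-element selections

assert len(sels) == comb(3, 2) == 3
```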
Random Variables

A random variable X on a sample space Ω is a real function on Ω:

X : Ω → R

Remark
In general we should also require that the preimage of every Borel set belongs to the σ-algebra of Ω (i.e., X should be measurable), but that’s not an issue for discrete probability spaces.

Independence of random variables
Two random variables X and Y are independent if and only if

∀x ∀y   Pr((X = x) ∩ (Y = y)) = Pr(X = x) Pr(Y = y)
Random Variables
Example

We roll two dice, thus having Ω = {(1, 1), (1, 2), . . . , (6, 6)} as sample space. Consider the random variable X = product of the two outcomes.

We have, e.g., X(2, 3) = 6 and X(3, 6) = 18

We write X = a to refer to the set {ω ∈ Ω : X(ω) = a}

Pr(X = 12) = Pr({(2, 6), (3, 4), (4, 3), (6, 2)}) = 1/9
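The event {X = 12} and its probability can be computed by enumeration; a sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two-dice sample space
X = lambda w: w[0] * w[1]                      # product of the two outcomes

event = {w for w in omega if X(w) == 12}       # the set {ω ∈ Ω : X(ω) = 12}
assert event == {(2, 6), (3, 4), (4, 3), (6, 2)}
assert Fraction(len(event), len(omega)) == Fraction(1, 9)
```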
Probability Distribution Functions

A random variable X is typically defined using some distribution functions:

Discrete random variables
Non-cumulative: probability mass function (or pmf)
fX(a) := Pr(X = a)
Cumulative: cumulative distribution function (or cdf)
FX(a) := Pr(X ≤ a)

Continuous random variables
Non-cumulative: probability density function (or pdf)
∫_a^b fX(t) dt = Pr(a < X ≤ b)
Cumulative: cumulative distribution function (or cdf)
FX(a) := Pr(X ≤ a) = ∫_{−∞}^{a} fX(t) dt
Expectation

The expectation (or expected value) E[X] of a random variable X is

Discrete r.v.
E[X] := Σ_{x∈X} x fX(x)
The sum is done over the image of X: X := {x ∈ R : ∃ω ∈ Ω s.t. X(ω) = x}.

Continuous r.v.
E[X] := ∫_{−∞}^{+∞} t fX(t) dt

Absolute convergence of the series/integral is required

Example
A die is rolled. If the outcome is a prime number, you win 10€, otherwise you lose 4€. What’s the expected value of the game?

E[X] = −4·(1/6) + 10·(1/6) + 10·(1/6) − 4·(1/6) + 10·(1/6) − 4·(1/6) = −4·(1/2) + 10·(1/2) = 3
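The die-game expectation is a direct application of the discrete formula; a sketch:

```python
from fractions import Fraction

# Payoff per outcome: +10 on a prime (2, 3, 5), -4 otherwise (1, 4, 6)
payoff = {1: -4, 2: 10, 3: 10, 4: -4, 5: 10, 6: -4}

# E[X] = sum over outcomes of value * probability, each outcome having mass 1/6
EX = sum(Fraction(1, 6) * v for v in payoff.values())
assert EX == 3
```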
Sum of Random Variables

Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY, and let a, b and c be real parameters. Let

ΩZ = ΩX × ΩY ,   Z : ΩZ → R ,   Z = aX + bY + c .

Then Z is a random variable.

Linearity of expectation
Furthermore, we have

E[Z] = E[aX + bY + c] = a E[X] + b E[Y] + c

Note: this holds even if X and Y are not independent.
Product of Random Variables

Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY.

Definition
Let

ΩZ = ΩX × ΩY ,   Z : ΩZ → R ,   Z = XY .

Then Z is a random variable.

Expectation
In general, we have

E[Z] = E[XY] ≠ E[X] E[Y]

Note: but if X and Y are independent, then E[XY] = E[X] E[Y].
Functions of a Random Variable

Let X be a random variable on the sample space Ω and let g : R → R be a (measurable) function. Then g(X) is also a random variable.

Theorem (Law of the unconscious statistician)

E[g(X)] = Σ_{x∈X} g(x) fX(x)   ( E[g(X)] = ∫_{−∞}^{+∞} g(t) fX(t) dt in the continuous case )

Note
In general, E[g(X)] ≠ g(E[X]).

Jensen’s inequality
If g is a convex function (e.g., g : x ↦ x²)

E[g(X)] ≥ g(E[X])
Conditional Expectation

Let X and Y be two discrete random variables and y ∈ R. We define the conditional expectation of X, given Y = y, as

E[X | Y = y] = Σ_{x∈X} x Pr(X = x | Y = y)

E[X] = Σ_{y∈Y} fY(y) E[X | Y = y]   (total probability)

E[ Σ_{i=1}^{n} Xi | Y = y ] = Σ_{i=1}^{n} E[Xi | Y = y]   (linearity)

Continuous r.v.
The extension to continuous random variables is somewhat more complicated, and since we do not need it we are going to skip it. . .
Examples

We roll 4 dice. What is the expectation of the sum of the outcomes?
Answer:

E[Z] = E[Σ_{i=1}^{4} Xi] = Σ_{i=1}^{4} E[Xi] = 4 · 3.5 = 14

We pay 10€ to play a game: two dice are rolled and we win (in €) the sum of the two outcomes. Additionally, if the two outcomes are equal, we win a further 12€. What’s the expected value of the game?
Answer: Let ω1 and ω2 be the two outcomes. Consider the following random variables:

X(ω1, ω2) = ω1 + ω2      Y(ω1, ω2) = 12 if ω1 = ω2, 0 otherwise

Note that X and Y are not independent. However, because of the linearity of expectation we have

E[−10 + X + Y] = −10 + E[X] + E[Y] = −10 + 7 + 2 = −1
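The game expectation can be verified by enumerating the 36 outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)                                        # each outcome equally likely

EX = sum(p * (w1 + w2) for w1, w2 in omega)                # E[X] = 7
EY = sum(p * (12 if w1 == w2 else 0) for w1, w2 in omega)  # E[Y] = 2
assert EX == 7 and EY == 2

# Expected value of the game: pay 10, win X + Y
assert -10 + EX + EY == -1
```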
Examples
We roll one die. If the outcome is 6, then we also roll a second die. What is the expectation of the sum of the outcomes?
Answer: Let

- $X$ describe the first outcome (i.e., $X(⚀) = 1$, $X(⚁) = 2$, etc.)
- $Y$ describe the second outcome (with $Y = 0$ if the second die is not rolled)

Again, $X$ and $Y$ are not independent. We want to compute $E[X + Y] = E[X] + E[Y]$. $E[X]$ is clearly equal to 3.5; as for $E[Y]$,

$$E[Y] = f_X(6)\, E[Y \mid X = 6] + (1 - f_X(6))\, E[Y \mid X \neq 6]$$

Since $E[Y \mid X = 6] = 3.5$ and $E[Y \mid X \neq 6] = 0$, we finally obtain

$$E[X + Y] = 3.5 + \frac{3.5}{6} = \frac{49}{12} \approx 4.08$$
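As a sanity check (not from the slides), the conditional expectation above can be verified both exactly, by enumerating the outcomes with `fractions.Fraction`, and by a seeded simulation:

```python
from fractions import Fraction
import random

# Exact E[X + Y] by enumeration: the second die is rolled only on a 6
exact = Fraction(0)
for x in range(1, 7):
    if x == 6:
        for y in range(1, 7):
            exact += Fraction(x + y, 36)  # each (6, y) pair has probability 1/36
    else:
        exact += Fraction(x, 6)
print(exact)  # 49/12

# Seeded Monte Carlo cross-check
rng = random.Random(1)
trials = 200_000
total = 0
for _ in range(trials):
    x = rng.randint(1, 6)
    total += x + (rng.randint(1, 6) if x == 6 else 0)
print(total / trials)
```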
![Page 46: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/46.jpg)
Binomial Distribution – Definition
Bernoulli indicator
An experiment has probability $p$ of succeeding and $1 - p$ of failing (e.g., we toss a biased coin). We define the following random variable $X$:

$$X := \begin{cases} 1 & \text{if the experiment succeeds} \\ 0 & \text{otherwise} \end{cases}$$

Binomial distribution
We repeat the Bernoulli experiment $n$ times (independently, and with the same distribution). We now want to count the number of successful experiments, and hence define the new random variable $Y$:

$$Y := \sum_{i=1}^{n} X_i$$
![Page 47: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/47.jpg)
Binomial Distribution – Properties
Expected values

Bernoulli: $\forall i \quad E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p$

Binomial:

$$E[Y] = E\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = np$$

Distribution function of the binomial r.v.

$$f_Y(k) = \Pr(Y = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

Reminder: binomial theorem

$$\forall x \, \forall y \quad (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n - k}$$
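A short Python check (ours, not the lecture's): by the binomial theorem with $x = p$ and $y = 1 - p$ the pmf sums to one, and the mean comes out as $np$:

```python
from math import comb

def binom_pmf(k, n, p):
    """f_Y(k) = C(n, k) * p^k * (1 - p)^(n - k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Binomial theorem with x = p, y = 1 - p: the pmf sums to (p + (1 - p))^n = 1
print(sum(pmf))
# Mean of the binomial: E[Y] = np
print(sum(k * f for k, f in enumerate(pmf)))
```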
![Page 48: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/48.jpg)
Geometric Distribution – Definition
We repeat a Bernoulli trials process until an experiment succeeds. Let $X$ be the random variable which gives the number of trials we have to perform until the experiment finally succeeds.

Distribution function of the geometric r.v.

$$f_X(n) = \Pr(X = n) = (1 - p)^{n - 1} p$$

Why geometric?
Let's prove that $f_X$ is really a distribution (i.e., that it sums to one). Let $q := 1 - p$.

$$\sum_{n=1}^{\infty} f_X(n) = \sum_{n=1}^{\infty} (1 - p)^{n - 1} p = p \sum_{n=0}^{\infty} q^n = \frac{p}{1 - q} = 1$$
![Page 49: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/49.jpg)
Geometric Distribution – Properties
Expectation

$$E[X] = \sum_{n=1}^{\infty} n f_X(n) = \sum_{n=1}^{\infty} n (1 - p)^{n - 1} p = p \sum_{n=1}^{\infty} n q^{n - 1} = p \sum_{n=1}^{\infty} \frac{d(q^n)}{dq} = p \frac{d}{dq} \sum_{n=1}^{\infty} q^n$$

$$= p \frac{d}{dq}\!\left[\frac{1}{1 - q} - 1\right] = p \frac{d}{dq}\!\left[\frac{q}{1 - q}\right] = p \frac{1}{(1 - q)^2} = \frac{1}{p}$$

The geometric r.v. is memoryless

$$\Pr(X = n + k \mid X > k) = \Pr(X = n)$$

This means that if some number has not come out for 20 weeks in the Lotto game, it isn't any more likely to be drawn...
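Both $E[X] = 1/p$ and memorylessness can be observed empirically. A seeded Python sketch (ours; the sampler simply counts Bernoulli trials until the first success):

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(42)
p = 0.25
samples = [geometric(p, rng) for _ in range(200_000)]

# E[X] = 1/p = 4
mean = sum(samples) / len(samples)

# Memorylessness: Pr(X = m + k | X > k) should match Pr(X = m)
k, m = 3, 2
cond = [x for x in samples if x > k]
lhs = sum(x == m + k for x in cond) / len(cond)
rhs = sum(x == m for x in samples) / len(samples)
print(mean, lhs, rhs)  # lhs and rhs both near (1 - p)^(m - 1) * p = 0.1875
```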
![Page 50: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/50.jpg)
Example: Coupon Collector’s Problem
There are $n$ different coupons and we want to collect all of them. Each day we get a new (uniformly random) coupon. How long does it take, on average, to finish the collection?

Answer – 1/2
Consider the following random variables:

- $X_i$: number of coupons we get while we hold exactly $i - 1$ distinct coupons
- $X$: number of coupons we get until we have all the coupons

We have

$$X = \sum_{i=1}^{n} X_i \quad \Rightarrow \quad E[X] = \sum_{i=1}^{n} E[X_i]$$

Each $X_i$ is a geometric r.v., with

$$p_i = 1 - \frac{i - 1}{n} = \frac{n - i + 1}{n}$$
![Page 51: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/51.jpg)
Example: Coupon Collector’s Problem
Answer – 2/2
We finally have

$$E[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$$

$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n$$

If $n = 80$ coupons, then $E[X] \approx 397$ days.

Harmonic numbers

$$H_n := \sum_{i=1}^{n} \frac{1}{i} = \ln n + \gamma + O\!\left(\frac{1}{n}\right)$$

with $\gamma \approx 0.577$ being the Euler–Mascheroni constant.
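The formula $E[X] = n H_n$ is straightforward to confirm by simulation. A Python sketch (ours; function names are hypothetical):

```python
import random

def collect_all(n, rng):
    """Draw uniformly random coupons until all n types have been seen;
    return the total number of draws."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n = 80
exact = sum(n / (n - i + 1) for i in range(1, n + 1))  # n * H_n
rng = random.Random(7)
trials = 2000
avg = sum(collect_all(n, rng) for _ in range(trials)) / trials
print(exact, avg)  # both around 397
```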
![Page 52: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/52.jpg)
Example: Randomized Quicksort
When sorting $n$ elements, mergesort makes, in both the worst and average case, $M(n) = n \log_2(n) + O(n)$ comparisons. We implement quicksort choosing the pivot uniformly at random. What is the average number of comparisons made by quicksort?

Answer – 1/3
Let $[x_1, x_2, \ldots, x_n]$ be the sorted list of elements and $\forall j \, \forall i < j$ let $X_{ij}$ be the following Bernoulli random variable

$$X_{ij} := \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ get compared during the execution} \\ 0 & \text{otherwise} \end{cases}$$

and let $X$ be

$$X := \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$$

We want to compute $E[X]$.
![Page 53: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/53.jpg)
Example: Randomized Quicksort
Answer – 2/3
By linearity of expectation we have

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}],$$

and then we just need to compute the probability that a generic pair $(x_i, x_j)$ gets compared during the execution of the algorithm. Consider the segment $[x_i, x_{i+1}, \ldots, x_j]$; $x_i$ and $x_j$ are compared if and only if one of the two is chosen as pivot before all the intermediate values $x_{i+1}, \ldots, x_{j-1}$ (otherwise they are split into different sublists and never compared against each other). The probability for this to happen is

$$p_{ij} = \frac{2}{j - i + 1} \quad \Rightarrow \quad E[X_{ij}] = \frac{2}{j - i + 1}$$
![Page 54: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/54.jpg)
Example: Randomized Quicksort
Answer – 3/3

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} = \sum_{k=2}^{n} \sum_{i=1}^{n+1-k} \frac{2}{k} = \sum_{k=2}^{n} \frac{2(n + 1 - k)}{k}$$

$$= (n + 1) \left[\sum_{k=2}^{n} \frac{2}{k}\right] - 2(n - 1) = 2(n + 1) \left[\sum_{k=1}^{n} \frac{1}{k}\right] - 4n$$

$$= 2(n + 1) H_n - 4n = 2n \ln(n) + O(n) = \frac{2}{\log_2(e)}\, n \log_2(n) + O(n) \approx 1.39\, M(n)$$

Randomized quicksort makes about 39% more comparisons than mergesort on average.
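The count $2(n+1)H_n - 4n$ can be checked against an instrumented randomized quicksort. A Python sketch (ours; a simple out-of-place variant rather than the usual in-place partitioning, assuming distinct elements):

```python
import random

def quicksort_comparisons(items, rng):
    """Sort a copy of items with uniformly random pivots; return the number
    of element comparisons (each element vs. the chosen pivot)."""
    comps = 0

    def qs(lst):
        nonlocal comps
        if len(lst) <= 1:
            return lst
        pivot = rng.choice(lst)
        comps += len(lst) - 1  # the pivot is compared with every other element
        left = [x for x in lst if x < pivot]
        right = [x for x in lst if x > pivot]
        return qs(left) + [pivot] + qs(right)

    qs(list(items))
    return comps

n = 500
harmonic = sum(1 / k for k in range(1, n + 1))
predicted = 2 * (n + 1) * harmonic - 4 * n  # the exact expectation derived above

rng = random.Random(3)
trials = 100
avg = sum(quicksort_comparisons(range(n), rng) for _ in range(trials)) / trials
print(predicted, avg)
```

For $n = 500$ the predicted average is roughly 4800 comparisons, and the empirical mean should fall within a few percent of it.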
![Page 55: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/55.jpg)
Moments
Let X be a discrete random variable
Raw moments
The $n$-th (raw) moment of $X$ is

$$E[X^n] = \sum_{x \in X} x^n f_X(x)$$

Central moments
The $n$-th central moment of $X$ is

$$E[(X - E[X])^n] = \sum_{x \in X} (x - \mu)^n f_X(x)$$

where $\mu := E[X]$.
![Page 56: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/56.jpg)
Variance
Definition

$$\operatorname{Var}[X] = \sigma^2[X] := E\!\left[(X - E[X])^2\right]$$

Note

$$\operatorname{Var}[X] = E\!\left[(X - E[X])^2\right] = E\!\left[X^2 - 2X\,E[X] + (E[X])^2\right]$$

$$= E\!\left[X^2\right] - 2\,E[X]\,E[X] + (E[X])^2 = E\!\left[X^2\right] - (E[X])^2$$

Sum of independent variables
Let $X_1, X_2, \ldots, X_n$ be mutually independent random variables. Then

$$\operatorname{Var}\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \operatorname{Var}[X_i]$$
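Both the identity $\operatorname{Var}[X] = E[X^2] - (E[X])^2$ and additivity over independent summands can be verified exactly with rational arithmetic. A Python sketch (ours), using two fair dice:

```python
from fractions import Fraction
from itertools import product

# Distribution of one fair die: value -> probability
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var[X] = E[X^2] - (E[X])^2
    mu = expect(dist)
    return sum(x * x * p for x, p in dist.items()) - mu * mu

# Exact distribution of the sum of two independent dice
pair = {}
for (a, pa), (b, pb) in product(die.items(), die.items()):
    pair[a + b] = pair.get(a + b, Fraction(0)) + pa * pb

print(variance(die))   # 35/12
print(variance(pair))  # 35/6, i.e. Var[X1] + Var[X2]
```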
![Page 57: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/57.jpg)
Examples
Bernoulli distribution: $\operatorname{Var}[X] = p(1 - p)$

Binomial distribution: $\operatorname{Var}[X] = np(1 - p)$

Geometric distribution: $\operatorname{Var}[X] = \dfrac{1 - p}{p^2}$
![Page 58: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/58.jpg)
Markov’s Inequality
Theorem
Let $X$ be a non-negative r.v. Then,

$$\forall a \in \mathbb{R},\, a > 0 \quad \Pr(X \geq a) \leq \frac{E[X]}{a}$$

Example
A coin is tossed 100 times. Let $X$ be the number of heads outcomes (binomial variable with $p = \frac{1}{2}$). We have

$$E[X] = 50 \qquad \operatorname{Var}[X] = 100 \cdot \frac{1}{2} \cdot \frac{1}{2} = 25$$

An upper bound on the probability that we have at least 80 heads is given by

$$\Pr(X \geq 80) \leq \frac{E[X]}{80} = \frac{5}{8} = 62.5\%$$
![Page 59: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/59.jpg)
Chebyshev’s Inequality
Theorem
Let $X$ be a r.v. Then,

$$\forall a \in \mathbb{R},\, a > 0 \quad \Pr(|X - E[X]| \geq a) \leq \frac{\operatorname{Var}[X]}{a^2}$$

Example
Again, a coin is tossed 100 times. An upper bound on the probability that we have at least 80 heads is given (using the symmetry of the distribution around its mean) by

$$\Pr(X \geq 80) = \frac{\Pr(|X - E[X]| \geq 30)}{2} \leq \frac{1}{2} \cdot \frac{\operatorname{Var}[X]}{30^2} = \frac{1}{2} \cdot \frac{25}{900} = \frac{1}{72} \approx 1.39\%$$
![Page 60: Introduction to Probability Theory - TU Wien](https://reader031.vdocuments.us/reader031/viewer/2022020620/61e379749f58b64be026638c/html5/thumbnails/60.jpg)
Chernoff Bound – For the Sum of Poisson Trials
Theorem

$$X := \sum_{i=1}^{n} X_i, \quad \text{with } X_i \text{ Bernoulli r.v. with parameter } p_i$$

Let $\mu := E[X]$; then

$$\forall \delta \in \, ]0, 1[ \quad \Pr(|X - \mu| \geq \delta\mu) \leq 2 \exp\!\left(-\frac{\mu\delta^2}{3}\right)$$

Example
Again, a coin is tossed 100 times. Give an upper bound on the probability that we have at least 80 heads. Let $\mu\delta = 30 \Rightarrow \delta = \frac{3}{5}$; then (using the one-sided version of the bound, which drops the factor 2)

$$\Pr(X \geq 80) \leq \exp\!\left(-\frac{\mu\delta^2}{3}\right) = \exp\!\left(-\frac{50 \cdot 9}{3 \cdot 25}\right) = \exp(-6) \approx 0.248\%$$
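The three bounds on $\Pr(X \geq 80)$ from the last slides can be compared, alongside the exact binomial tail, in a few lines of Python (ours, not the lecture's):

```python
from math import comb, exp

n, p, a = 100, 0.5, 80
mu = n * p             # 50
var = n * p * (1 - p)  # 25

# Exact tail Pr(X >= 80) for X ~ Bin(100, 1/2)
exact = sum(comb(n, k) for k in range(a, n + 1)) / 2**n

markov = mu / a                        # 5/8 = 62.5%
chebyshev = var / (a - mu) ** 2 / 2    # 1/72, halved by symmetry
delta = (a - mu) / mu                  # 3/5
chernoff = exp(-mu * delta ** 2 / 3)   # e^-6

print(exact, chernoff, chebyshev, markov)
# The bounds get progressively tighter: Markov > Chebyshev > Chernoff > exact
```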