Radboud University Nijmegen
Probability theory
B. Jacobs
Institute for Computing and Information Sciences – Digital SecurityRadboud University Nijmegen
Version: fall 2014
B. Jacobs Version: fall 2014 Probability theory 1 / 78
Outline
Combinatorics
Probability
Conditional probability and Bayes’ rule
Random variables
Continuous random variables
Final remarks
Historical background
“Probability” is the part of mathematics which looks for laws governing random events. It has its origins in games of chance, i.e. in gambling. The Chevalier de Méré (1607–1684) was a famous gambler and a friend of Blaise Pascal, who started to develop probability theory.
Example (Question about rolling dice)
What is more likely to get:
1. at least one 6 in 4 rolls of one die
2. at least one pair (6,6) in 24 rolls of two dice?
Chevalier expected (2), and lost money as a result.
• p1 = 1 − (5/6)^4 ≈ 0.518 (or 51.8% chance)
• p2 = 1 − (35/36)^24 ≈ 0.491
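The two probabilities can be checked directly; this short Python sketch (not part of the original slides) computes both complements:

```python
# Chevalier de Méré's two bets.
p1 = 1 - (5/6)**4        # at least one 6 in 4 rolls of one die
p2 = 1 - (35/36)**24     # at least one (6,6) in 24 rolls of two dice
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")  # p1 ≈ 0.518, p2 ≈ 0.491
```

So bet (1) is favourable and bet (2) is not, matching the slide.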
Another de Méré-like challenge (from teacherlink.org)
Would you take the following bet, about repeatedly rolling two dice:
“I will get both a sum 8 and a sum 6, before you get two sums of 7.”
If you take it, I win and you lose
Consider all possible sums as outcomes:
 +  |  1   2   3   4   5   6
 1  |  2   3   4   5   6   7
 2  |  3   4   5   6   7   8
 3  |  4   5   6   7   8   9
 4  |  5   6   7   8   9  10
 5  |  6   7   8   9  10  11
 6  |  7   8   9  10  11  12
The catch: the order of the 8 and the 6 is not specified: the probability of getting a 6 and an 8 in either order is higher than the probability of (7,7).
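A quick Monte Carlo check of the bet (an illustrative sketch, not from the slides; the trial count and seed are arbitrary choices):

```python
import random

def bet_wins(rng):
    """Roll two dice repeatedly; return True if both a sum of 8 and a
    sum of 6 appear (in either order) before two sums of 7 appear."""
    need = {6, 8}
    sevens = 0
    while True:
        s = rng.randint(1, 6) + rng.randint(1, 6)
        if s in need:
            need.discard(s)
            if not need:
                return True
        elif s == 7:
            sevens += 1
            if sevens == 2:
                return False

rng = random.Random(0)
trials = 20000
est = sum(bet_wins(rng) for _ in range(trials)) / trials
print(f"estimated win probability: {est:.3f}")
```

The estimate comes out a little above 1/2, so the bet indeed favours the person offering it.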
Combinatorics = smart counting
Combinatorics is a branch of mathematics that studies counting, typically in finite structures, of objects satisfying certain criteria.
Example (Counting permutations)
• A permutation of n objects is a rearrangement of them in some order
• Question: how many different permutations are there of n objects?
• Try to think of the answer for n = 2, 3, 4, ...
• The answer is n! = n · (n − 1) · (n − 2) · · · 2 · 1
  • Pronounce n! as “n factorial”
  • For those who like recursion: n! = n · (n − 1)! and 0! = 1.
• Interestingly, each permutation of n objects corresponds to a particular ordering of the n objects; we will use this later
Fundamental principle of (successive) counting
• Suppose that a task involves a sequence of k successive choices
  • let n1 be the number of options at the first stage;
  • let n2 be the number of options at the second stage, after the first stage has occurred;
  • ...
  • let nk be the number of options at the k-th stage, after the previous k − 1 stages have occurred.
• Then the total number of different ways the task can occur is:
  n1 · n2 · · · nk = ∏_{1≤i≤k} n_i
Simple counting example
A company places a 6-symbol code on each unit of its products, consisting of:
• 4 digits, the first of which is the number 5,
• followed by 2 letters, the first of which is NOT a vowel.
How many different codes are possible?
Using the basic counting principle:
• there are 10 options (digits 0–9) for each of digits 2, 3, 4
• there are 26 letters in the alphabet, so 26 options for letter 2
• 5 of the letters in the alphabet are vowels (a, e, i, o, u), so there are 21 options for letter 1
Altogether there are 10 · 10 · 10 · 21 · 26 = 546,000 different codes.
Samples (grepen)
We will study the following four kinds of samples
Samples              | Ordered | Unordered
With replacement     |   I     |   III
Without replacement  |   II    |   IV
Ad I ordered samples with replacement
Question
• Suppose you have n objects, and you take an ordered sample with replacement of r of them (with r ≤ n)
• This means that the order of the selected r elements matters, and the same element may be selected multiple times
• How many such samples are there?

Example (2-samples out of 3 elements, say 1, 2, 3)
• samples: 11, 12, 13, 21, 22, 23, 31, 32, 33
• number of samples: 9 = 3^2

Lemma
There are n^r ordered samples with replacement.
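The small example can be reproduced with the standard library; this sketch (not in the slides) enumerates the 2-samples out of 3 elements:

```python
from itertools import product

n, r = 3, 2
# ordered samples with replacement = tuples over {1,..,n} of length r
samples = list(product(range(1, n + 1), repeat=r))
print(len(samples))  # 9 = 3**2
```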
Ad II ordered samples without replacement
• With replacement we can reason as follows:
  • for the first item of the sample, there are n options
  • for the second item of the sample, there are still n options
  • etc.
  This gives n^r samples in total
• Without replacement we now reason:
  • for the first item of the sample, there are n options
  • for the second item of the sample, there are only n − 1 options
  • for the third item of the sample, there are only n − 2 options
  • etc.

Lemma
There are n · (n − 1) · (n − 2) · · · (n − r + 1) = n!/(n − r)! ordered samples without replacement.
Ad II Example (ordered, without replacement)
In how many ways can 10 people be seated on a bench with 4 seats?
Answer:
• We have n = 10, from which we take samples of size r = 4
• The order matters, and people who are already seated cannot be seated again: no replacement
• Number of options: 10 · 9 · 8 · 7 = 5040 = 10!/6! = 10!/(10 − 4)!
Ad IV unordered samples without replacement
Recall two things:
• there are r! ways to order/permute r items
• there are n!/(n − r)! ordered samples without replacement

Combining these two yields:

Lemma
There are n!/(r!(n − r)!) unordered samples without replacement.

One writes C(n, r) = n!/(r!(n − r)!). This is called the binomial coefficient.
It is pronounced as “n choose r” or as “n over r”.
An unordered sample is sometimes called a combination.
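The relation between ordered and unordered samples is directly visible in code; a sketch (not in the slides) for the bench numbers n = 10, r = 4:

```python
from math import comb, factorial

n, r = 10, 4
ordered = factorial(n) // factorial(n - r)   # ordered, without replacement
unordered = comb(n, r)                       # "n choose r"
print(ordered, unordered)  # 5040 210
# dividing out the r! orderings of each sample gives the unordered count
assert unordered == ordered // factorial(r)
```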
Ad IV Examples (of unordered samples without replacement)

Example (Lotto with 49 numbered balls)
How many possible outcomes are there if we consecutively take out 6 balls?
Answer: C(49, 6) = 13,983,816

Example
Find the number of ways to form a committee of 5 people from a set of 9.
Answer: C(9, 5) = 126. (What is the difference with the bench example?)

Example
How many symmetric keys are needed so that n people can all communicate directly with each other?
Answer: C(n, 2) = n(n − 1)/2 = (n − 1) + (n − 2) + · · · + 2 + 1
Calculation rules for binomial coefficients
1. C(n, r) = C(n, n − r)
2. Σ_{r=0}^{n} C(n, r) = 2^n
3. C(n, r − 1) + C(n, r) = C(n + 1, r)

Recall also Pascal’s triangle:

                C(0,0)
            C(1,0)  C(1,1)
        C(2,0)  C(2,1)  C(2,2)
          ...     ...     ...
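All three calculation rules are easy to verify mechanically; a quick sketch (not in the slides), here for n = 7:

```python
from math import comb

n = 7
# rule 1: symmetry of the binomial coefficient
assert all(comb(n, r) == comb(n, n - r) for r in range(n + 1))
# rule 2: a full row of Pascal's triangle sums to 2^n
assert sum(comb(n, r) for r in range(n + 1)) == 2**n
# rule 3: Pascal's recurrence
assert all(comb(n, r - 1) + comb(n, r) == comb(n + 1, r) for r in range(1, n + 1))
print("all three rules hold for n =", n)
```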
Binomial expansion of powers of sums
• Recall: (x + y)^2 = x^2 + 2xy + y^2
                    = C(2,0) x^2 y^0 + C(2,1) x^1 y^1 + C(2,2) x^0 y^2
• Similarly:
  (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3
            = C(3,0) x^3 y^0 + C(3,1) x^2 y^1 + C(3,2) x^1 y^2 + C(3,3) x^0 y^3

Lemma
For arbitrary n ∈ N,
  (x + y)^n = Σ_{i=0}^{n} C(n, i) x^{n−i} y^i
Ad III unordered samples with replacement
Now the number of options is: C(n + r − 1, r)

Example (Lotto with 10 numbered balls, pick and replace 2)
• How many outcomes xx? 10
• How many outcomes xy ∼ yx? 45 = (10 · 9)/2 = C(10, 2)

Total: 10 + 45 = 55 = (11 · 10)/2 = C(11, 2) = C(10 + 2 − 1, 2), indeed!

Note that with the earlier calculation rules:
C(11, 2) = C(10, 2) + C(10, 1) = 45 + 10 = 55
Ad III Example (unordered samples with replacement)
Example (Lotto with 10 numbered balls, pick and replace 3)
• How many outcomes xxx? 10
• How many outcomes xyy ∼ yxy ∼ yyx? 10 · 9 = 90
• How many xyz ∼ xzy ∼ yxz ∼ yzx ∼ zxy ∼ zyx? (10 · 9 · 8)/6 = C(10, 3) = 120

Total: 10 + 90 + 120 = 220 = (12 · 11 · 10)/(3 · 2) = C(12, 3) = C(10 + 3 − 1, 3). Indeed!

Again with the earlier calculation rules:
C(12, 3) = C(11, 3) + C(11, 2)
         = C(10, 3) + C(10, 2) + C(10, 2) + C(10, 1)
         = 120 + 45 + 45 + 10.
Birthday paradox
1. What is the probability that at least 2 of r randomly selected people have the same birthday?
2. How large must r be so that the probability is greater than 50%?
Solution, part I
• Assume that no one is born on Feb. 29 and that all birthdays are uniformly distributed.
• n = 365
• We look at samples of size r, which are ordered, with replacement (once a birthday occurs, it is not excluded, since it can occur again)
• n^r = 365^r birthday options for r people
• Look at r birthdays, all on different days
  • number of options: 365 · 364 · · · (366 − r) = 365!/(365 − r)! = C(365, r) · r!
• Take the fraction: the probability that r people have their birthdays on different days is:
  (365!/(365 − r)!) / 365^r = 365! / ((365 − r)! · 365^r)
• Therefore, the probability that at least 2 people out of r have their birthday on the same day is p(r) = 1 − 365! / ((365 − r)! · 365^r)
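The formula for p(r) can be evaluated with a running product, avoiding the huge factorials; a sketch (not in the slides):

```python
def p_same_birthday(r):
    """Probability that at least two of r people share a birthday."""
    q = 1.0  # probability that all r birthdays are different
    for i in range(r):
        q *= (365 - i) / 365
    return 1 - q

for r in (10, 23, 50):
    print(r, round(p_same_birthday(r), 3))
# 23 is the smallest group size with probability above 50%
assert p_same_birthday(22) < 0.5 < p_same_birthday(23)
```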
Solution, part II
Some values for p(r) = 1 − 365!/((365 − r)! · 365^r), depending on r.
r p(r)
10 0.117
20 0.411
23 0.507
30 0.706
50 0.97
57 0.99
Hence for r = 23 the probability of a birthday coincidence is ≥ 50%.
Application: Birthday attacks on hash functions
• SHA-1, with a 160-bit output, requires brute-force work of at most 2^80 operations
• (although because of weaknesses in SHA-1, collisions are found already in around 2^60 steps)
• In general, hash functions used for signature schemes should have the number of output bits n large enough that 2^(n/2) computations are impractical

Note: with an 8M budget an 80-bit key can be retrieved in a year (2011).
Experiments and their sample spaces
• An experiment is called random if the result will vary even if the conditions are the same
• A sample space consists of all possible outcomes of a random experiment, usually denoted by the letter S or Ω

Example (What are the relevant sample spaces?)
1. tossing a coin once: S = {T, H}
2. tossing a coin twice: S = {TT, HT, TH, HH}
3. tossing a die: S = {1, 2, 3, 4, 5, 6}
4. lifetime of a bulb: S = {t | 0 ≤ t ≤ 1 year}

(Oxford dictionary: historically, dice is the plural of die, but in modern standard English dice is both the singular and the plural)
Events
Definition
An event is a subset of outcomes of a random experiment, that is, a subset of the sample space.
We write the powerset P(S) = {A | A ⊆ S} for the set of events.
Example (for sample space S)
• the entire subset S ⊆ S is the “certain” event
• ∅ ⊆ S is the impossible event
• two events A and B are mutually exclusive if A ∩ B = ∅.
Probability measure
Definition
A probability measure P for a sample space S is a function that gives for each event A ⊆ S a probability P(A) ∈ [0, 1], with:
1. Axiom 1: P(S) = 1
2. Axiom 2: P(A ∪ B) = P(A) + P(B) for mutually exclusive events A, B ⊆ S, that is, when A ∩ B = ∅

A probability measure on S is thus a function P : P(S) → [0, 1] satisfying (1) and (2).
It is called discrete if the sample space S is finite; this implies that there are only finitely many events.
(Officially, discrete spaces can also be countable, but we shall not use those here)
Properties of probability measures
Theorem
Let P be a probability measure on a sample space S, and let A, Ai, B be events. Then:
1. A ⊆ B ⇒ P(A) ≤ P(B)
2. P(∅) = 0
3. P(¬A) = 1 − P(A), where ¬A = S − A = {s ∈ S | s ∉ A}
4. For mutually exclusive events A1, A2, ..., An, where Ai ∩ Aj = ∅ for all i ≠ j, one has P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An)
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
6. P(A) = P(A ∩ B) + P(A ∩ ¬B)

The points can all be derived from the axioms (1) and (2) for a probability measure P.
Example: proof of point (1)
Proof.
• Assume A ⊆ B; RTP: P(A) ≤ P(B)
  • RTP = “Required To Prove”
• We can write B as the disjoint union B = A ∪ (B − A), where:
  • B − A = B ∩ ¬A = {s ∈ S | s ∈ B and s ∉ A}
  • A ∩ (B − A) = ∅
• By Axiom 2 we get: P(B) = P(A) + P(B − A)
• Since P(B − A) ∈ [0, 1], by definition, we get P(B) ≥ P(A).
Discrete sample space example
Recall that a sample space S is called discrete if it is finite

Example (One die)
• S = {1, 2, 3, 4, 5, 6}, with events A ⊆ S
• The probability measure P : P(S) → [0, 1] is easy:
  • P({1, 3, 5}) = 1/2
  • P({1, 6}) = 1/3
• We see that P is determined by what it does on singleton events {i} ⊆ S
• This is typical for finite (and countable) sample spaces.
Discrete sample spaces
Let S be a discrete (i.e. finite) sample space, with probability measure P : P(S) → [0, 1].
• An event A ⊆ S is then also finite, say A = {x1, ..., xn}
• Hence we can write it as a disjoint union of singletons: A = {x1} ∪ · · · ∪ {xn}
• Hence P(A) = P({x1}) + · · · + P({xn}), by Axiom 2.
• Thus, P is entirely determined by its values P({x}) on singletons, for x ∈ S.
• The function f : S → [0, 1] with f(x) = P({x}) is called the underlying distribution
• It satisfies Σ_{x∈S} f(x) = 1, since:
  Σ_{x∈S} f(x) = Σ_{x∈S} P({x}) = P(∪_{x∈S} {x}) = P(S) = 1
The uniform distribution
Fix a number n ∈ N and take as sample space S = {1, 2, ..., n}.
• The simplest distribution is the uniform distribution u_n : S → [0, 1], which assigns the same probability to each i ∈ S
• Since the sum of the probabilities must be 1, the only option is: u_n(i) = 1/n
• More generally, on each finite set X we can define u : X → [0, 1] as u(x) = 1/#X, where #X ∈ N is the number of elements of X.
The binomial distribution
Fix n ∈ N with S = {0, 1, ..., n} and p ∈ [0, 1].
• Define the binomial distribution b : S → [0, 1] as:
  b(k) = C(n, k) p^k (1 − p)^{n−k}
• Read b(k) as:
  the probability of exactly k successes after n trials, each with chance p
  Briefly: b(k) = P(k out of n).
• This is a well-defined distribution, by binomial expansion:
  Σ_k b(k) = Σ_k C(n, k) p^k (1 − p)^{n−k} = (p + (1 − p))^n = 1^n = 1
Example binomial expansion
Suppose we have a biased coin, which comes up heads with probability p ∈ [0, 1].

Example (Toss the coin n = 5 times)
What is the probability of getting heads k times (for 0 ≤ k ≤ 5)?
• If k = 0, then: (1 − p)^5
  • via the formula: b(0) = C(5, 0) p^0 (1 − p)^{5−0} = (1 − p)^5
• If k = 1, then: 5p(1 − p)^4
  • b(1) = C(5, 1) p^1 (1 − p)^{5−1} = 5p(1 − p)^4
• In general: b(k) = C(5, k) p^k (1 − p)^{5−k}.

What happens if p = 1/2?
Another binomial distribution example
Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?
• We have n = 6, with recovery probability p = 1/4.
• Hence b(4) = C(6, 4) (1/4)^4 (3/4)^2 ≈ 0.033
• Picture of all (recovery) probabilities in a histogram (source: intmath.com)
Other distributions
There are many other standard distributions, like:
• Normal distribution (see later in the continuous case)
• Hypergeometric distribution
• Poisson distribution
  • for independent occurrences, where some average µ is known
  • then p(k) = e^{−µ} · µ^k / k!, for k ∈ N (the slide showed a histogram for µ = 3)

We will not discuss these distributions here. Look up the details, later in your life, when you need them.
Conditional probability intro
Example (Suppose you throw one die)
• Of course, the probability of 4 is 1/6
• But what is the probability of 4, if you already know that the outcome is even?
• Intuitively it is clear it should be: 1/3.
• We write P(4) = 1/6 and P(4 | even) = 1/3

Conditional probability is about updating probabilities in the light of given (aka prior) information.
Conditional probability example
Assume a group of students for which:
• The probability that a student does mathematics and computer science is 1/10
• The probability that a student does computer science is 3/4.

Question: What is the probability that a student does mathematics, given that we know that (s)he does computer science?

Answer: We have P(M ∩ CS) = 1/10 and P(CS) = 3/4.
We seek the conditional probability P(M | CS) = “M, given CS”
The formula is:
  P(M | CS) = P(M ∩ CS) / P(CS) = (1/10) / (3/4) = 4/30 = 2/15.
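Exact fraction arithmetic keeps the computation honest; an illustrative sketch (not in the slides):

```python
from fractions import Fraction

p_m_and_cs = Fraction(1, 10)
p_cs = Fraction(3, 4)
p_m_given_cs = p_m_and_cs / p_cs   # P(M | CS) = P(M ∩ CS) / P(CS)
print(p_m_given_cs)                # 2/15
```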
Basic definitions
Definition
For two events A, B, the conditional probability P(A | B) = “the probability of A, given B”, is
  P(A | B) = P(A ∩ B) / P(B).
Alternatively, P(A | B) · P(B) = P(A ∩ B).

Definition
Two events A, B are independent if P(A ∩ B) = P(A) · P(B).
Equivalently (when P(B) > 0): P(A | B) = P(A).
Election example
Assume there are three candidates: A, B, C; only one can win
• the probability P(A) that A wins is the same as for B
• P(C) is half of P(A).

Question 1: What are P(A), P(B) and P(C)?
Answer 1: Solving P(A) + P(B) + P(C) = 1, P(A) = P(B) and P(C) = (1/2)P(A) yields: P(A) = P(B) = 2/5, P(C) = 1/5.

Question 2: Assume A withdraws; what are the chances of B, C now?
Answer 2: Think first what they would be intuitively!
  P(B | ¬A) = P(B ∩ ¬A)/P(¬A) = P(B)/(1 − P(A)) = (2/5)/(3/5) = 2/3
  P(C | ¬A) = P(C ∩ ¬A)/P(¬A) = P(C)/(1 − P(A)) = (1/5)/(3/5) = 1/3.
Conditional probability, for multiple events
• Recall P(A1 ∩ A2) = P(A1 | A2) · P(A2)
• Hence
  P(A1 ∩ A2 ∩ A3) = P(A1 | A2 ∩ A3) · P(A2 ∩ A3)
                  = P(A1 | A2 ∩ A3) · P(A2 | A3) · P(A3).
• Alternatively:
  P(A1 | A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) / P(A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) / (P(A2 | A3) · P(A3))
• This can be generalised to A1, ..., An.
Partitions
Definition
A partition of a sample space S is a collection of events A1, ..., An ⊆ S with both:
  A1 ∪ · · · ∪ An = S and Ai ∩ Aj = ∅, for i ≠ j
A binary partition is given by A, ¬A.
Partitions and the total probability lemma
Lemma (Total probability)
For a partition A1, ..., An and an arbitrary event B,
  P(B) = P(B | A1) · P(A1) + · · · + P(B | An) · P(An).

Because:
  P(B | A1) · P(A1) + · · · + P(B | An) · P(An)
    = P(B ∩ A1) + · · · + P(B ∩ An)
    = P((B ∩ A1) ∪ · · · ∪ (B ∩ An))
    = P(B ∩ (A1 ∪ · · · ∪ An))
    = P(B ∩ S)
    = P(B).
Total probability illustration
Example (Two boxes with long & short bolts)
• In box 1, there are 60 short bolts and 40 long bolts. In box 2, there are 10 short bolts and 20 long bolts. Take a box at random, and pick a bolt. What is the probability that you chose a short bolt?
• Write Bi for the event that box i is chosen, for i = 1, 2
• The solution is:
  P(short) = P(short | B1) · P(B1) + P(short | B2) · P(B2)
           = (60/100) · (1/2) + (10/30) · (1/2)
           = 3/10 + 1/6
           = 7/15.
Bayes’ Rule/Theorem
Theorem
For events E, H we have:
  P(H | E) = P(E | H) · P(H) / P(E).

Terminology:
• E = evidence, H = hypothesis
• P(H) = prior probability, P(H | E) = posterior probability

Proof
P(E | H) · P(H) = P(E ∩ H) = P(H ∩ E) = P(H | E) · P(E).
Bayes’ Rule/Theorem for partitions
Theorem
Suppose we have a partition H1, ..., Hn. Then:
  P(Hi | E) = P(E | Hi) · P(Hi) / Σ_j P(E | Hj) · P(Hj).

Proof: since P(E) = Σ_j P(E | Hj) · P(Hj) by the total probability lemma.
Machine example
Setting and question
• There are 3 machines M1, M2, M3 producing items, with defect probabilities 0.01, 0.02, 0.03 respectively.
• 20% of the items come from M1, 30% from M2, 50% from M3
• Find the probability that a defective item comes from M1.

Solution
• We have P(M1) = 0.2, P(M2) = 0.3, P(M3) = 0.5 and P(D | M1) = 0.01, P(D | M2) = 0.02, P(D | M3) = 0.03
• Via the total probability lemma we compute P(D) as:
  P(D | M1) · P(M1) + P(D | M2) · P(M2) + P(D | M3) · P(M3)
    = 0.01 · 0.2 + 0.02 · 0.3 + 0.03 · 0.5 = 0.023
• Then: P(M1 | D) = P(D | M1) · P(M1) / P(D) = (0.01 · 0.2) / 0.023 ≈ 0.087
Rain and umbrella example
Setting
• Prior knowledge P(rain) = 1/5
• P(umbrella | rain) = 7/10 and P(umbrella | ¬rain) = 1/10
• Suppose you see someone with an umbrella. What is the probability that it rains?

Answer
P(rain | umbrella)
  = P(umbrella | rain) · P(rain) / (P(umbrella | rain) · P(rain) + P(umbrella | ¬rain) · P(¬rain))
  = (7/10 · 1/5) / (7/10 · 1/5 + 1/10 · 4/5)
  = (7/50) / (7/50 + 4/50)
  = 7/11 ≈ 0.64.
Inference: learning from iterated observation
• In the previous example we started from P(rain) = 1/5, and computed P(rain | umbrella) = 7/11.
• Thus after observing this umbrella we may update our prior knowledge to P′(rain) = 7/11.
• What if we see another, second umbrella? Surely, the probability of rain is even higher. How to compute it?
• We can play the same game with the updated rain probability P′(rain) = 7/11:
  P(rain | 2 umbrellas)
    = P(umbrella | rain) · P′(rain) / (P(umbrella | rain) · P′(rain) + P(umbrella | ¬rain) · P′(¬rain))
    = (7/10 · 7/11) / (7/10 · 7/11 + 1/10 · 4/11)
    = (49/110) / (49/110 + 4/110)
    = 49/53 ≈ 0.92.
• See courses on AI (esp. Machine Learning) for more information, esp. on Bayesian networks (graphical models)!
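The iteration pattern, posterior becomes the new prior, is easy to capture in a helper; a sketch (not in the slides):

```python
from fractions import Fraction

def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update of P(H) after observing evidence E."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

p_rain = Fraction(1, 5)
p_umb_rain, p_umb_no_rain = Fraction(7, 10), Fraction(1, 10)

p_rain = update(p_rain, p_umb_rain, p_umb_no_rain)   # first umbrella
print(p_rain)   # 7/11
p_rain = update(p_rain, p_umb_rain, p_umb_no_rain)   # second umbrella
print(p_rain)   # 49/53
```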
Random variable
Associating a value with an experiment
• Suppose we throw a coin, so the sample space is S = {H, T}
• If the outcome is H, I get 100€, otherwise I lose 100€
• This situation is described via a random variable X : S → R with X(H) = 100, X(T) = −100

Definition
Let S be a sample space. A random variable is a real-valued function defined on S, of the form X : S → R.

A random variable is also called a stochastic variable, or simply a stochast (in Dutch: kansvariabele)
More about random variables
• A random variable X : S → R that takes on a finite (or a countably infinite) number of values is called discrete
  • this means that the range R(X) ⊆ R is finite (or countable)
  • otherwise we have a non-discrete or continuous random variable
  • Note that if S is discrete, then so is X.
• If we have two random variables, say X, Y : S → R, then we can also define the random variables X + Y, X − Y, rX in the obvious, pointwise manner:
  (X + Y)(s) = X(s) + Y(s)
  (X − Y)(s) = X(s) − Y(s)
  (rX)(s) = r · X(s)
Random variables and events
Definition
Let X : S → R be a random variable, and x ∈ R an outcome.
There is an event (X = x) ⊆ S, understood as “the outcome is x”, namely:
  (X = x) = {s ∈ S | X(s) = x}.
If there is also a probability measure P : P(S) → [0, 1], then we write P(X = x) ∈ [0, 1] for the probability of this event (X = x) ⊆ S.

Lemma
If X is a discrete random variable, say with outcomes x1, ..., xn, then the events (X = x1), ..., (X = xn) form a partition.
Overview: measures / distributions / random variables
Let S be a sample space. Recall:
• a probability measure P is a function from events to probabilities:
  P : P(S) → [0, 1]
• If S is finite, P corresponds to a distribution from samples to probabilities:
  f : S → [0, 1] via f(s) = P({s})
• A random variable is a function from samples to values:
  X : S → R
  It gives rise to events (X = x) ⊆ S, with probability P(X = x) ∈ [0, 1].
Example
A (fair) coin is tossed twice; we count the number of heads.
• the sample space is S = {HH, HT, TH, TT}
• We have a uniform distribution f : S → [0, 1], namely f(s) = 1/4
• There is a “number of heads” random variable X : S → R:
  X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0
  We are in the discrete case, with range R(X) = {0, 1, 2} ⊆ R.
• Example events with probabilities:
  P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4.
Expectation and variance (discrete case)
Definition
Let X : S → R be a discrete random variable, with values (range) {x1, ..., xn} ⊆ R.
• The expectation (or expected value, or weighted mean) E(X) ∈ R is:
  E(X) = P(X = x1) · x1 + · · · + P(X = xn) · xn
• The variance (spreiding) Var(X) ∈ R describes the spread:
  Var(X) = E((X − E(X))^2) = Σ_i P(X = xi) · (xi − E(X))^2
• The standard deviation is: σ_X = √Var(X).
  • an outcome in [E(X) − σ_X, E(X) + σ_X] is considered “normal”
Dice example
We have S = {1, 2, 3, 4, 5, 6} with X : S → R simply X(s) = s
• P(X = i) = 1/6, for i = 1, 2, 3, 4, 5, 6
• E(X) = Σ_i P(X = i) · i = (1/6) · 1 + · · · + (1/6) · 6
       = (1/6) · (1 + · · · + 6) = 21/6 = 7/2 = 3.5
  The expectation is the mean for a uniform distribution
• Var(X) = E((X − E(X))^2) = Σ_i P(X = i) · (i − 7/2)^2
         = (1/6)(1 − 7/2)^2 + · · · + (1/6)(6 − 7/2)^2 = 35/12
• σ_X = √Var(X) = √(35/12) ≈ 1.71.
Lottery example
Setting and question
• In a lottery there are 200 prizes of 5€, 20 of 25€, and 5 of 100€. Assuming that 10,000 tickets will be issued and sold, what is a fair price to pay for a ticket?
• Answer: a price that is just a bit more than the expected amount to be won.
• The sample space has 4 elements; we leave it implicit and only describe the random variable X for the amount to be won.

  x        |  5   |  25   |  100   |  0
  P(X = x) | 0.02 | 0.002 | 0.0005 | 0.9775

• E(X) = Σ_x P(X = x) · x = 0.02 · 5 + 0.002 · 25 + 0.0005 · 100 + 0 = 0.2. So a ticket should cost 20 cents.
Expectation for the binomial distribution
Recall the parameters are n ∈ N and p ∈ [0, 1]. Then:
• S = {0, 1, . . . , n}, to which we add X : S → R with X(i) = i
• b(k) = P(X = k) = C(n, k) · p^k · (1 − p)^(n−k)

Lemma
The binomial distribution for n ∈ N and p ∈ [0, 1] satisfies E(X) = n · p

Recall the hospital example, with n = 6 patients and p = 1/4 recovery probability.
The expected number of survivors is 6 · 1/4 = 3/2.
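The lemma can be checked numerically by summing the definition of E(X) directly (a Python sketch, not from the slides):

```python
from math import comb

def binom_expect(n, p):
    # E(X) = sum over k of  k * C(n, k) * p^k * (1 - p)^(n - k)
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Hospital example: n = 6 patients, recovery probability p = 1/4.
print(binom_expect(6, 0.25))   # 1.5, matching n * p = 6 * 1/4 = 3/2
```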
Expectation for the binomial distribution, proof
E(X) = ∑_{k=0}^{n} P(X = k) · k = ∑_{k=0}^{n} k · C(n, k) · p^k · (1 − p)^(n−k)
     = ∑_{k=1}^{n} n · C(n−1, k−1) · p^k · (1 − p)^(n−k)
     = n·p · ∑_{k=1}^{n} C(n−1, k−1) · p^(k−1) · (1 − p)^((n−1)−(k−1))
     = n·p · ∑_{i=0}^{n−1} C(n−1, i) · p^i · (1 − p)^((n−1)−i)
     = n·p · (p + (1 − p))^(n−1) = n·p · 1^(n−1) = n·p.

(The second step uses k · C(n, k) = n · C(n−1, k−1); the last step is the binomial theorem.)
Events
• For a discrete random variable X we have seen probabilities of the form P(X = x)
• In the continuous case we shall look at P(X ≤ x), or also at P(x ≤ X ≤ y)
• They describe the probability of the events:
  • (X ≤ x) = {s ∈ S | X(s) ≤ x} ⊆ S
  • (x ≤ X ≤ y) = {s ∈ S | x ≤ X(s) ≤ y} ⊆ S
Probability, surface, and integration: discrete case
Recall the hospital records histogram used earlier:
• We can read it as a function f : R → R
• Suppose we are interested in the probability P(1 ≤ X ≤ 3)
• This probability is obtained as P(1 ≤ X ≤ 3) = ∫_1^3 f(x) dx
Probability, surface, and integration: continuous case
This idea can be generalised, as suggested in:
      P(−2 ≤ X ≤ 2) = ∫_{−2}^{2} f(x) dx
• The functions f : R → R used in such a way are called probability density functions (pdf, dichtheidsfunctie)
• They should satisfy f(x) ≥ 0 and:
      ∫_{−∞}^{+∞} f(x) dx = 1.
• This connects the first and second part of this course!
Example pdf
• Often a continuous random variable is defined directly via a probability density function (pdf)
• For instance, consider f : R → R defined by:
      f(x) = x/4   if 1 ≤ x ≤ 3
             0     otherwise.
• Clearly f(x) ≥ 0 and:
      ∫_{−∞}^{+∞} f(x) dx = ∫_1^3 (x/4) dx = [x²/8]_1^3 = (3² − 1²)/8 = 1.
• Hence we can define a continuous random variable X via:
      P(a ≤ X ≤ b) = ∫_a^b f(x) dx
  For instance P(1 ≤ X ≤ 2) = (2² − 1²)/8 = 3/8.
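Such integrals can also be approximated numerically, e.g. with a midpoint rule (a Python sketch, not from the slides):

```python
def f(x):
    # the pdf from this slide: x/4 on [1, 3], zero elsewhere
    return x / 4 if 1 <= x <= 3 else 0.0

def integrate(g, a, b, n=10000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

print(round(integrate(f, 1, 3), 6))   # 1.0 : total probability
print(round(integrate(f, 1, 2), 6))   # 0.375 = 3/8 : P(1 <= X <= 2)
```

The midpoint rule happens to be exact for linear integrands such as this one.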
Cumulative distribution function (verdelingsfunctie)
• In practice, a pdf is often given directly, and the random variable is then defined accordingly
• But one can also obtain the pdf from a random variable X
• Define the cumulative distribution function F : R → [0, 1] as:
      F(x) = P(X ≤ x)
  Then F(−∞) = 0 and F(+∞) = 1.
• The pdf f is then the derivative f = F′ = d/dx P(X ≤ x). Then indeed:
      P(X ≤ b) = F(b) = F(b) − F(−∞) = ∫_{−∞}^{b} f(x) dx
      P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = ∫_a^b f(x) dx
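For the example pdf f(x) = x/4 on [1, 3] from the previous slide, the cdf is F(x) = (x² − 1)/8 on [1, 3], and probabilities become simple differences (a Python sketch, not from the slides):

```python
def F(x):
    # cdf of the example pdf f(x) = x/4 on [1, 3]
    if x < 1:
        return 0.0
    if x > 3:
        return 1.0
    return (x * x - 1) / 8

# P(1 <= X <= 2) = F(2) - F(1)
print(F(2) - F(1))   # 0.375
```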
Uniform probability, in the continuous case
• Consider the interval [a, b] ⊆ R, for given a < b
• Define a density function u_{a,b} : R → R as:
      u_{a,b}(x) = 1/(b − a)   if x ∈ [a, b]
                   0            otherwise.
• Then:
      ∫_{−∞}^{∞} u_{a,b}(x) dx = ∫_a^b 1/(b − a) dx = [x/(b − a)]_a^b = (b − a)/(b − a) = 1.
Expectation, in the continuous case
Definition
Let X be a continuous random variable, given by pdf f. Then:
      E(X) = ∫_{−∞}^{∞} x · f(x) dx
      Var(X) = ∫_{−∞}^{∞} (x − E(X))² · f(x) dx
      σ_X = √Var(X).
Example, in the uniform case
Recall u_{a,b} : R → R with value 1/(b − a) on [a, b]. Then:

E(X) = ∫_{−∞}^{+∞} x · u_{a,b}(x) dx = ∫_a^b x/(b − a) dx
     = [x²/(2(b − a))]_a^b = (b² − a²)/(2(b − a)) = (b − a)(b + a)/(2(b − a)) = (a + b)/2

Var(X) = ∫_{−∞}^{+∞} (x − E(X))² · u_{a,b}(x) dx = ∫_a^b (x − (a + b)/2)²/(b − a) dx
       = 1/(4(b − a)) · ∫_a^b (2x − (a + b))² dx
       = 1/(4(b − a)) · ∫_a^b 4x² − 4(a + b)x + (a + b)² dx
       = 1/(4(b − a)) · [4x³/3 − 2(a + b)x² + (a + b)²x]_a^b
       = · · · = (b − a)²/12.
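The closed forms E(X) = (a + b)/2 and Var(X) = (b − a)²/12 can be sanity-checked numerically (a Python sketch; the interval [2, 10] is an arbitrary choice):

```python
def uniform_moments(a, b, n=100000):
    # midpoint-rule approximations of E(X) and Var(X) for the uniform pdf u_{a,b}
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    E = h * sum(x / (b - a) for x in xs)
    Var = h * sum((x - E) ** 2 / (b - a) for x in xs)
    return E, Var

E, Var = uniform_moments(2, 10)
print(round(E, 4), round(Var, 4))   # 6.0 and 5.3333, i.e. (2+10)/2 and 8**2/12
```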
Properties, also for the discrete case
Lemma
• E(X + Y) = E(X) + E(Y)
• E(r · X) = r · E(X)
• Var(X) = E(X²) − E(X)²
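The third identity, Var(X) = E(X²) − E(X)², can be illustrated on the die from earlier (a Python sketch, not from the slides):

```python
# Fair die: outcomes 1..6, each with probability 1/6.
p = 1 / 6
outcomes = range(1, 7)

E = sum(p * i for i in outcomes)                    # E(X) = 7/2
E2 = sum(p * i * i for i in outcomes)               # E(X^2) = 91/6
var_def = sum(p * (i - E) ** 2 for i in outcomes)   # Var(X) by definition
var_alt = E2 - E ** 2                               # shortcut formula

print(round(var_def, 6), round(var_alt, 6))   # both 35/12, approx. 2.916667
```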
Normal (Gaussian) distribution
An important class of probability density functions is of this form:
• Such bell curves are typical for normal/Gaussian distributions
• The curve is determined by two parameters:
  • the mean, written as µ, which determines the location of the middle of the bell
  • the variance, written as σ², which determines the width
• This distribution is very common in practice, when observations pile up around a particular value
Normal (Gaussian) distribution, definition
Definition
Given parameters µ (the mean) and σ (the standard deviation, so σ² is the variance), consider the pdf
      f(x) = 1/(σ√(2π)) · e^(−½·((x − µ)/σ)²)
The associated random variable X is called normal or Gaussian. Its distribution function is thus:
      P(a ≤ X ≤ b) = 1/(σ√(2π)) · ∫_a^b e^(−½·((x − µ)/σ)²) dx
One writes X ∼ N(µ, σ).

This integral has no closed-form solution, so one uses tables of cumulative probabilities for a special normal distribution to calculate such probabilities.
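In Python, such table lookups are handled by the standard library's `statistics.NormalDist`, whose `cdf` method gives P(X ≤ x) (a sketch; the parameters µ = 100 and σ = 15 are an arbitrary illustration):

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)   # X ~ N(100, 15)

# P(85 <= X <= 115), i.e. P(mu - sigma <= X <= mu + sigma)
p = X.cdf(115) - X.cdf(85)
print(round(p, 4))   # 0.6827: the familiar "within one sigma" probability
```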
Standardising normal distribution
N(0, 1) is often called the standard normal random variable; the pdf involved is (1/√(2π)) · e^(−x²/2); it is centered around 0.

Theorem
If X is a normal random variable with mean µ and standard deviation σ, then Z = (X − µ)/σ is a standard normal random variable. It satisfies E(Z) = 0 and σ_Z = 1.
In a picture:
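The theorem can be checked numerically with `statistics.NormalDist` (a sketch; the parameters µ = 50 and σ = 10 are an arbitrary illustration):

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)   # X ~ N(50, 10)
Z = NormalDist(mu=0, sigma=1)     # standard normal

# P(40 <= X <= 60) equals P(-1 <= Z <= 1) after standardising
left = X.cdf(60) - X.cdf(40)
right = Z.cdf((60 - 50) / 10) - Z.cdf((40 - 50) / 10)
print(round(left, 6), round(right, 6))
```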
Standardisation example
Setting and question
Suppose students can get 100 marks for their exam; the output is:
      marks = [85, 24, 63, 12, 87, 90, 33, 38, 25]
Which ones should “normally” fail?

Via a small Python program we get µ = 50.78 and σ = 28.94.
The list [(i − µ)/σ for i in marks] of standardised marks is:
      [1.18, −0.93, 0.42, −1.34, 1.25, 1.36, −0.61, −0.44, −0.89]
By construction this list has mean 0 and standard deviation 1.
Only one entry lies below −1; this one should “normally” fail; it corresponds to the mark 12 in the original list.
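The “small Python program” may have looked like this sketch, using the standard library's `statistics` module (`pstdev` is the population standard deviation used on the slide):

```python
from statistics import mean, pstdev

marks = [85, 24, 63, 12, 87, 90, 33, 38, 25]

mu = mean(marks)                       # approx. 50.78
sigma = pstdev(marks)                  # population std deviation, approx. 28.94
z = [(m - mu) / sigma for m in marks]  # standardised marks

fails = [m for m, s in zip(marks, z) if s < -1]
print(fails)   # [12]: the only mark more than one sigma below the mean
```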
Overview
In this course we have:
1 seen/recalled the basics of mathematical calculus: differentiation and integration
2 introduced the basics of probability theory
These areas are connected via continuous random variables, where probabilities are computed by integrating a probability density function (pdf).
Preparing for the exam
• The emphasis is on being able to calculate things, not on proving properties
• much like in the exercises
• The exam is “closed” book
• Hence, definitions and results must be known by heart, whenthey are relevant for doing calculations
• A simple calculator will be provided (only +,−, ∗, /)
The role of the exercises
• The exercises form the best preparation for the exam!
• There will be one more exercise, which is a mock exam
  • do it as a test for yourself, without the notes
  • hence you can see what you know and what you don’t
  • an elaborated version will be put online, after the hand-in date
• Of the marks for the exercises of the past weeks we will drop the lowest one
• The remaining average of the exercises will make up half of your final mark
  • the other half comes from the written exam
  • this exam must be ≥ 5 in order to pass
• There will be a second written exam, where the same average of the exercises will be used.
Summary of important points
• Differentiation
  • limits, derivatives and their interpretation as tangent, differentiation rules, special functions, function investigation
• Integration
  • definite integral as area, indefinite integral as inverse to differentiation, special functions, substitution & integration by parts, arc length
• Probability
  • combinatorics (samples with/without order/replacement), binomials, (discrete) probability measure, uniform & binomial distributions, conditional probability, total probability, Bayes’ rule, random variable, expectation, standard deviation, pdf, normal distributions and standardisation
The exam itself
• Date and time: Tuesday, 4 Nov, 8:30 (precise) - 11:30
  • HAL 2, for “regular” students
  • HG00.058, for “extra time” students (until 12:30)
• If you qualify for extra time, let me know in advance, via email.
• Make sure (and double-check) that you are registered
  • do this today!
  • registration cannot be done on the spot; unregistered students will be excluded
Good luck with the preparation, and the exam itself!