Radboud University Nijmegen
Probability theory
B. Jacobs
Institute for Computing and Information Sciences – Digital SecurityRadboud University Nijmegen
Version: fall 2014
B. Jacobs Version: fall 2014 Probability theory 1 / 78
Outline
Combinatorics
Probability
Conditional probability and Bayes’ rule
Random variables
Continuous random variables
Final remarks
Historical background
“Probability” is the part of mathematics which looks for laws governing random events. It has its origins in games of chance, i.e. in gambling. The Chevalier de Méré (1607–1684) was a famous gambler and a friend of Blaise Pascal, who started to develop probability theory.
Example (Question about rolling dice)
What is more likely to get:
1. at least one 6 in 4 rolls of one die
2. at least one pair (6,6) in 24 rolls of two dice?
Chevalier expected (2), and lost money as a result.
• p1 = 1 − (5/6)^4 ≈ 0.518 (or 51.8% chance)
• p2 = 1 − (35/36)^24 ≈ 0.491
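The two probabilities can be checked directly; this short Python sketch (not part of the original slides) computes both complements:

```python
# Chevalier de Méré's two bets.
p1 = 1 - (5/6)**4        # at least one 6 in 4 rolls of one die
p2 = 1 - (35/36)**24     # at least one (6,6) in 24 rolls of two dice
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")  # p1 ≈ 0.518, p2 ≈ 0.491
```

So bet (1) is favourable and bet (2) is not, matching the slide.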
Another de Méré-like challenge (from teacherlink.org)
Would you take the following bet, about repeatedly rolling two dice:
“I will get both a sum 8 and a sum 6, before you get two sums of 7.”
If you take it, I win and you lose
Consider all possible sums as outcomes:
 +  |  1   2   3   4   5   6
 1  |  2   3   4   5   6   7
 2  |  3   4   5   6   7   8
 3  |  4   5   6   7   8   9
 4  |  5   6   7   8   9  10
 5  |  6   7   8   9  10  11
 6  |  7   8   9  10  11  12
The catch: the order of the 8 and the 6 is not specified: the probability of getting a 6 and an 8 in either order is higher than the probability of (7,7).
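A quick Monte Carlo check of the bet (an illustrative sketch, not from the slides; the trial count and seed are arbitrary choices):

```python
import random

def bet_wins(rng):
    """Roll two dice repeatedly; return True if both a sum of 8 and a
    sum of 6 appear (in either order) before two sums of 7 appear."""
    need = {6, 8}
    sevens = 0
    while True:
        s = rng.randint(1, 6) + rng.randint(1, 6)
        if s in need:
            need.discard(s)
            if not need:
                return True
        elif s == 7:
            sevens += 1
            if sevens == 2:
                return False

rng = random.Random(0)
trials = 20000
est = sum(bet_wins(rng) for _ in range(trials)) / trials
print(f"estimated win probability: {est:.3f}")
```

The estimate comes out a little above 1/2, so the bet indeed favours the person offering it.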
Combinatorics = smart counting
Combinatorics is a branch of mathematics that studies counting, typically in finite structures, of objects satisfying certain criteria.
Example (Counting permutations)
• A permutation of n objects is a rearrangement of them in some order
• Question: how many different permutations are there of n objects?
• Try to think of the answer for n = 2, 3, 4, ...
• The answer is n! = n · (n − 1) · (n − 2) · · · 2 · 1
  • Pronounce n! as “n factorial”
  • For those who like recursion: n! = n · (n − 1)! and 0! = 1.
• Interestingly, each permutation of n objects corresponds to a particular ordering of the n objects; we will use this later
Fundamental principle of (successive) counting
• Suppose that a task involves a sequence of k successive choices
  • let n1 be the number of options at the first stage;
  • let n2 be the number of options at the second stage, after the first stage has occurred;
  • ...
  • let nk be the number of options at the k-th stage, after the previous k − 1 stages have occurred.
• Then the total number of different ways the task can occur is:
  n1 · n2 · · · nk = ∏_{1≤i≤k} n_i
Simple counting example
A company places a 6-symbol code on each unit of its products, consisting of:
• 4 digits, the first of which is the number 5,
• followed by 2 letters, the first of which is NOT a vowel.
How many different codes are possible?
Using the basic counting principle:
• there are 10 options (digits 0–9) for each of digits 2, 3, 4
• there are 26 letters in the alphabet, so 26 options for letter 2
• 5 of the letters in the alphabet are vowels (a, e, i, o, u), so there are 21 options for letter 1
Altogether there are 10 · 10 · 10 · 21 · 26 = 546,000 different codes.
Samples (grepen)
We will study the following four kinds of samples
Samples              | Ordered | Unordered
With replacement     |   I     |   III
Without replacement  |   II    |   IV
Ad I ordered samples with replacement
Question
• Suppose you have n objects, and you take an ordered sample with replacement of r of them (with r ≤ n)
• This means that the order of the selected r elements matters, and the same element may be selected multiple times
• How many such samples are there?

Example (2-samples out of 3 elements, say 1, 2, 3)
• samples: 11, 12, 13, 21, 22, 23, 31, 32, 33
• number of samples: 9 = 3^2

Lemma
There are n^r ordered samples with replacement.
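The small example can be reproduced with the standard library; this sketch (not in the slides) enumerates the 2-samples out of 3 elements:

```python
from itertools import product

n, r = 3, 2
# ordered samples with replacement = tuples over {1,..,n} of length r
samples = list(product(range(1, n + 1), repeat=r))
print(len(samples))  # 9 = 3**2
```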
Ad II ordered samples without replacement
• With replacement we can reason as follows:
  • for the first item of the sample, there are n options
  • for the second item of the sample, there are still n options
  • etc.
  This gives n^r samples in total
• Without replacement we now reason:
  • for the first item of the sample, there are n options
  • for the second item of the sample, there are only n − 1 options
  • for the third item of the sample, there are only n − 2 options
  • etc.

Lemma
There are n · (n − 1) · (n − 2) · · · (n − r + 1) = n!/(n − r)! ordered samples without replacement.
Ad II Example (ordered, without replacement)
In how many ways can 10 people be seated on a bench with 4 seats?
Answer:
• We have n = 10, from which we take samples of size r = 4
• The order matters, and people who are already seated cannot be seated again: no replacement
• Number of options: 10 · 9 · 8 · 7 = 5040 = 10!/6! = 10!/(10 − 4)!
Ad IV unordered samples without replacement
Recall two things:
• there are r! ways to order/permute r items
• there are n!/(n − r)! ordered samples without replacement

Combining these two yields:

Lemma
There are n!/(r!(n − r)!) unordered samples without replacement.

One writes C(n, r) = n!/(r!(n − r)!). This is called the binomial coefficient.
It is pronounced as “n choose r” or as “n over r”.
An unordered sample is sometimes called a combination.
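The relation between ordered and unordered samples is directly visible in code; a sketch (not in the slides) for the bench numbers n = 10, r = 4:

```python
from math import comb, factorial

n, r = 10, 4
ordered = factorial(n) // factorial(n - r)   # ordered, without replacement
unordered = comb(n, r)                       # "n choose r"
print(ordered, unordered)  # 5040 210
# dividing out the r! orderings of each sample gives the unordered count
assert unordered == ordered // factorial(r)
```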
Ad IV Examples (of unordered samples without replacement)

Example (Lotto with 49 numbered balls)
How many possible outcomes are there if we consecutively take out 6 balls?
Answer: C(49, 6) = 13,983,816

Example
Find the number of ways to form a committee of 5 people from a set of 9.
Answer: C(9, 5) = 126. (What is the difference with the bench example?)

Example
How many symmetric keys are needed so that n people can all communicate directly with each other?
Answer: C(n, 2) = n(n − 1)/2 = (n − 1) + (n − 2) + · · · + 2 + 1
Calculation rules for binomial coefficients
1. C(n, r) = C(n, n − r)
2. Σ_{r=0}^{n} C(n, r) = 2^n
3. C(n, r − 1) + C(n, r) = C(n + 1, r)

Recall also Pascal’s triangle:

                C(0,0)
            C(1,0)  C(1,1)
        C(2,0)  C(2,1)  C(2,2)
          ...     ...     ...
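All three calculation rules are easy to verify mechanically; a quick sketch (not in the slides), here for n = 7:

```python
from math import comb

n = 7
# rule 1: symmetry of the binomial coefficient
assert all(comb(n, r) == comb(n, n - r) for r in range(n + 1))
# rule 2: a full row of Pascal's triangle sums to 2^n
assert sum(comb(n, r) for r in range(n + 1)) == 2**n
# rule 3: Pascal's recurrence
assert all(comb(n, r - 1) + comb(n, r) == comb(n + 1, r) for r in range(1, n + 1))
print("all three rules hold for n =", n)
```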
Binomial expansion of powers of sums
• Recall: (x + y)^2 = x^2 + 2xy + y^2
                    = C(2,0) x^2 y^0 + C(2,1) x^1 y^1 + C(2,2) x^0 y^2
• Similarly:
  (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3
            = C(3,0) x^3 y^0 + C(3,1) x^2 y^1 + C(3,2) x^1 y^2 + C(3,3) x^0 y^3

Lemma
For arbitrary n ∈ N,
  (x + y)^n = Σ_{i=0}^{n} C(n, i) x^{n−i} y^i
Ad III unordered samples with replacement
Now the number of options is: C(n + r − 1, r)

Example (Lotto with 10 numbered balls, pick and replace 2)
• How many outcomes xx? 10
• How many outcomes xy ∼ yx? 45 = (10 · 9)/2 = C(10, 2)

Total: 10 + 45 = 55 = (11 · 10)/2 = C(11, 2) = C(10 + 2 − 1, 2), indeed!

Note that with the earlier calculation rules:
C(11, 2) = C(10, 2) + C(10, 1) = 45 + 10 = 55
Ad III Example (unordered samples with replacement)
Example (Lotto with 10 numbered balls, pick and replace 3)
• How many outcomes xxx? 10
• How many outcomes xyy ∼ yxy ∼ yyx? 10 · 9 = 90
• How many xyz ∼ xzy ∼ yxz ∼ yzx ∼ zxy ∼ zyx? (10 · 9 · 8)/6 = C(10, 3) = 120

Total: 10 + 90 + 120 = 220 = (12 · 11 · 10)/(3 · 2) = C(12, 3) = C(10 + 3 − 1, 3). Indeed!

Again with the earlier calculation rules:
C(12, 3) = C(11, 3) + C(11, 2)
         = C(10, 3) + C(10, 2) + C(10, 2) + C(10, 1)
         = 120 + 45 + 45 + 10.
Birthday paradox
1. What is the probability that at least 2 of r randomly selected people have the same birthday?
2. How large must r be so that the probability is greater than 50%?
Solution, part I
• Assume that no one is born on Feb. 29 and that all birthdays are uniformly distributed.
• n = 365
• We look at samples of size r, which are ordered, with replacement (once a birthday occurs, it is not excluded, since it can occur again)
• n^r = 365^r birthday options for r people
• Look at r birthdays, all on different days
  • number of options: 365 · 364 · · · (366 − r) = 365!/(365 − r)! = C(365, r) · r!
• Take the fraction: the probability that r people have their birthdays on different days is:
  (365!/(365 − r)!) / 365^r = 365! / ((365 − r)! · 365^r)
• Therefore, the probability that at least 2 people out of r have their birthday on the same day is p(r) = 1 − 365! / ((365 − r)! · 365^r)
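The formula for p(r) can be evaluated with a running product, avoiding the huge factorials; a sketch (not in the slides):

```python
def p_same_birthday(r):
    """Probability that at least two of r people share a birthday."""
    q = 1.0  # probability that all r birthdays are different
    for i in range(r):
        q *= (365 - i) / 365
    return 1 - q

for r in (10, 23, 50):
    print(r, round(p_same_birthday(r), 3))
# 23 is the smallest group size with probability above 50%
assert p_same_birthday(22) < 0.5 < p_same_birthday(23)
```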
Solution, part II
Some values for p(r) = 1 − 365!/((365 − r)! · 365^r), depending on r.
r p(r)
10 0.117
20 0.411
23 0.507
30 0.706
50 0.97
57 0.99
Hence for r = 23 the probability of a birthday coincidence is ≥ 50%.
Application: Birthday attacks on hash functions
• SHA-1, with a 160-bit output, requires brute-force work of at most 2^80 operations
• (although because of weaknesses in SHA-1, collisions are found already in around 2^60 steps)
• In general, hash functions used for signature schemes should have the number of output bits n large enough that 2^(n/2) computations are impractical

Note: with an 8M budget an 80-bit key can be retrieved in a year (2011).
Experiments and their sample spaces
• An experiment is called random if the result will vary even if the conditions are the same
• A sample space consists of all possible outcomes of a random experiment, usually denoted by the letter S or Ω

Example (What are the relevant sample spaces?)
1. tossing a coin once: S = {T, H}
2. tossing a coin twice: S = {TT, HT, TH, HH}
3. tossing a die: S = {1, 2, 3, 4, 5, 6}
4. lifetime of a bulb: S = {t | 0 ≤ t ≤ 1 year}

(Oxford dictionary: historically, dice is the plural of die, but in modern standard English dice is both the singular and the plural)
Events
Definition
An event is a subset of outcomes of a random experiment, that is, a subset of the sample space.
We write the powerset P(S) = {A | A ⊆ S} for the set of events.
Example (for sample space S)
• the entire subset S ⊆ S is the “certain” event
• ∅ ⊆ S is the impossible event
• two events A and B are mutually exclusive if A ∩ B = ∅.
Probability measure
Definition
A probability measure P for a sample space S is a function that gives for each event A ⊆ S a probability P(A) ∈ [0, 1], with:
1. Axiom 1: P(S) = 1
2. Axiom 2: P(A ∪ B) = P(A) + P(B) for mutually exclusive events A, B ⊆ S, that is, when A ∩ B = ∅

A probability measure on S is thus a function P : P(S) → [0, 1] satisfying (1) and (2).
It is called discrete if the sample space S is finite; this implies that there are only finitely many events.
(Officially, discrete spaces can also be countable, but we shall not use those here)
Properties of probability measures
Theorem
Let P be a probability measure on a sample space S, and let A, Ai, B be events. Then:
1. A ⊆ B ⇒ P(A) ≤ P(B)
2. P(∅) = 0
3. P(¬A) = 1 − P(A), where ¬A = S − A = {s ∈ S | s ∉ A}
4. For mutually exclusive events A1, A2, ..., An, where Ai ∩ Aj = ∅ for all i ≠ j, one has P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An)
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
6. P(A) = P(A ∩ B) + P(A ∩ ¬B)

The points can all be derived from the axioms (1) and (2) for a probability measure P.
Example: proof of point (1)
Proof.
• Assume A ⊆ B; RTP: P(A) ≤ P(B)
  • RTP = “Required To Prove”
• We can write B as the disjoint union B = A ∪ (B − A), where:
  • B − A = B ∩ ¬A = {s ∈ S | s ∈ B and s ∉ A}
  • A ∩ (B − A) = ∅
• By Axiom 2 we get: P(B) = P(A) + P(B − A)
• Since P(B − A) ∈ [0, 1], by definition, we get P(B) ≥ P(A).
Discrete sample space example
Recall that a sample space S is called discrete if it is finite

Example (One die)
• S = {1, 2, 3, 4, 5, 6}, with events A ⊆ S
• The probability measure P : P(S) → [0, 1] is easy:
  • P({1, 3, 5}) = 1/2
  • P({1, 6}) = 1/3
• We see that P is determined by what it does on singleton events {i} ⊆ S
• This is typical for finite (and countable) sample spaces.
Discrete sample spaces
Let S be a discrete (i.e. finite) sample space, with probability measure P : P(S) → [0, 1].
• An event A ⊆ S is then also finite, say A = {x1, ..., xn}
• Hence we can write it as a disjoint union of singletons: A = {x1} ∪ · · · ∪ {xn}
• Hence P(A) = P({x1}) + · · · + P({xn}), by Axiom 2.
• Thus, P is entirely determined by its values P({x}) on singletons, for x ∈ S.
• The function f : S → [0, 1] with f(x) = P({x}) is called the underlying distribution
• It satisfies Σ_{x∈S} f(x) = 1, since:
  Σ_{x∈S} f(x) = Σ_{x∈S} P({x}) = P(∪_{x∈S} {x}) = P(S) = 1
The uniform distribution
Fix a number n ∈ N and take as sample space S = {1, 2, ..., n}.
• The simplest distribution is the uniform distribution u_n : S → [0, 1], which assigns the same probability to each i ∈ S
• Since the sum of the probabilities must be 1, the only option is: u_n(i) = 1/n
• More generally, on each finite set X we can define u : X → [0, 1] as u(x) = 1/#X, where #X ∈ N is the number of elements of X.
The binomial distribution
Fix n ∈ N with S = {0, 1, ..., n} and p ∈ [0, 1].
• Define the binomial distribution b : S → [0, 1] as:
  b(k) = C(n, k) p^k (1 − p)^{n−k}
• Read b(k) as:
  the probability of exactly k successes after n trials, each with chance p
  Briefly: b(k) = P(k out of n).
• This is a well-defined distribution, by binomial expansion:
  Σ_k b(k) = Σ_k C(n, k) p^k (1 − p)^{n−k} = (p + (1 − p))^n = 1^n = 1
Example binomial expansion
Suppose we have a biased coin, which comes up heads with probability p ∈ [0, 1].

Example (Toss the coin n = 5 times)
What is the probability of getting heads k times (for 0 ≤ k ≤ 5)?
• If k = 0, then: (1 − p)^5
  • via the formula: b(0) = C(5, 0) p^0 (1 − p)^{5−0} = (1 − p)^5
• If k = 1, then: 5p(1 − p)^4
  • b(1) = C(5, 1) p^1 (1 − p)^{5−1} = 5p(1 − p)^4
• In general: b(k) = C(5, k) p^k (1 − p)^{5−k}.

What happens if p = 1/2?
Another binomial distribution example
Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?
• We have n = 6, with recovery probability p = 1/4.
• Hence b(4) = C(6, 4) (1/4)^4 (3/4)^2 ≈ 0.033
• Picture of all (recovery) probabilities in a histogram (source: intmath.com)
Other distributions
There are many other standard distributions, like:
• Normal distribution (see later in the continuous case)
• Hypergeometric distribution
• Poisson distribution
  • for independent occurrences, where some average µ is known
  • then p(k) = e^{−µ} · µ^k / k!, for k ∈ N (the slide showed a histogram for µ = 3)

We will not discuss these distributions here. Look up the details, later in your life, when you need them.
Conditional probability intro
Example (Suppose you throw one die)
• Of course, the probability of 4 is 1/6
• But what is the probability of 4, if you already know that the outcome is even?
• Intuitively it is clear it should be: 1/3.
• We write P(4) = 1/6 and P(4 | even) = 1/3

Conditional probability is about updating probabilities in the light of given (aka prior) information.
Conditional probability example
Assume a group of students for which:
• The probability that a student does mathematics and computer science is 1/10
• The probability that a student does computer science is 3/4.

Question: What is the probability that a student does mathematics, given that we know that (s)he does computer science?

Answer: We have P(M ∩ CS) = 1/10 and P(CS) = 3/4.
We seek the conditional probability P(M | CS) = “M, given CS”
The formula is:
  P(M | CS) = P(M ∩ CS) / P(CS) = (1/10) / (3/4) = 4/30 = 2/15.
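Exact fraction arithmetic keeps the computation honest; an illustrative sketch (not in the slides):

```python
from fractions import Fraction

p_m_and_cs = Fraction(1, 10)
p_cs = Fraction(3, 4)
p_m_given_cs = p_m_and_cs / p_cs   # P(M | CS) = P(M ∩ CS) / P(CS)
print(p_m_given_cs)                # 2/15
```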
Basic definitions
Definition
For two events A, B, the conditional probability P(A | B) = “the probability of A, given B”, is
  P(A | B) = P(A ∩ B) / P(B).
Alternatively, P(A | B) · P(B) = P(A ∩ B).

Definition
Two events A, B are independent if P(A ∩ B) = P(A) · P(B).
Equivalently (when P(B) > 0): P(A | B) = P(A).
Election example
Assume there are three candidates: A, B, C; only one can win
• the probability P(A) that A wins is the same as for B
• P(C) is half of P(A).

Question 1: What are P(A), P(B) and P(C)?
Answer 1: Solving P(A) + P(B) + P(C) = 1, P(A) = P(B) and P(C) = (1/2)P(A) yields: P(A) = P(B) = 2/5, P(C) = 1/5.

Question 2: Assume A withdraws; what are the chances of B, C now?
Answer 2: Think first what they would be intuitively!
  P(B | ¬A) = P(B ∩ ¬A)/P(¬A) = P(B)/(1 − P(A)) = (2/5)/(3/5) = 2/3
  P(C | ¬A) = P(C ∩ ¬A)/P(¬A) = P(C)/(1 − P(A)) = (1/5)/(3/5) = 1/3.
Conditional probability, for multiple events
• Recall P(A1 ∩ A2) = P(A1 | A2) · P(A2)
• Hence
  P(A1 ∩ A2 ∩ A3) = P(A1 | A2 ∩ A3) · P(A2 ∩ A3)
                  = P(A1 | A2 ∩ A3) · P(A2 | A3) · P(A3).
• Alternatively:
  P(A1 | A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) / P(A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) / (P(A2 | A3) · P(A3))
• This can be generalised to A1, ..., An.
Partitions
Definition
A partition of a sample space S is a collection of events A1, ..., An ⊆ S with both:
  A1 ∪ · · · ∪ An = S and Ai ∩ Aj = ∅, for i ≠ j
A binary partition is given by A, ¬A.
Partitions and the total probability lemma
Lemma (Total probability)
For a partition A1, ..., An and an arbitrary event B,
  P(B) = P(B | A1) · P(A1) + · · · + P(B | An) · P(An).

Because:
  P(B | A1) · P(A1) + · · · + P(B | An) · P(An)
    = P(B ∩ A1) + · · · + P(B ∩ An)
    = P((B ∩ A1) ∪ · · · ∪ (B ∩ An))
    = P(B ∩ (A1 ∪ · · · ∪ An))
    = P(B ∩ S)
    = P(B).
Total probability illustration
Example (Two boxes with long & short bolts)
• In box 1, there are 60 short bolts and 40 long bolts. In box 2, there are 10 short bolts and 20 long bolts. Take a box at random, and pick a bolt. What is the probability that you chose a short bolt?
• Write Bi for the event that box i is chosen, for i = 1, 2
• The solution is:
  P(short) = P(short | B1) · P(B1) + P(short | B2) · P(B2)
           = (60/100) · (1/2) + (10/30) · (1/2)
           = 3/10 + 1/6
           = 7/15.
Bayes’ Rule/Theorem
Theorem
For events E, H we have:
  P(H | E) = P(E | H) · P(H) / P(E).

Terminology:
• E = evidence, H = hypothesis
• P(H) = prior probability, P(H | E) = posterior probability

Proof
P(E | H) · P(H) = P(E ∩ H) = P(H ∩ E) = P(H | E) · P(E).
Bayes’ Rule/Theorem for partitions
Theorem
Suppose we have a partition H1, ..., Hn. Then:
  P(Hi | E) = P(E | Hi) · P(Hi) / Σ_j P(E | Hj) · P(Hj).

Proof: since P(E) = Σ_j P(E | Hj) · P(Hj) by the total probability lemma.
Machine example
Setting and question
• There are 3 machines M1, M2, M3 producing items, with defect probabilities 0.01, 0.02, 0.03 respectively.
• 20% of the items come from M1, 30% from M2, 50% from M3
• Find the probability that a defective item comes from M1.

Solution
• We have P(M1) = 0.2, P(M2) = 0.3, P(M3) = 0.5 and P(D | M1) = 0.01, P(D | M2) = 0.02, P(D | M3) = 0.03
• Via the total probability lemma we compute P(D) as:
  P(D | M1) · P(M1) + P(D | M2) · P(M2) + P(D | M3) · P(M3)
    = 0.01 · 0.2 + 0.02 · 0.3 + 0.03 · 0.5 = 0.023
• Then: P(M1 | D) = P(D | M1) · P(M1) / P(D) = (0.01 · 0.2) / 0.023 ≈ 0.087
Rain and umbrella example
Setting
• Prior knowledge P(rain) = 1/5
• P(umbrella | rain) = 7/10 and P(umbrella | ¬rain) = 1/10
• Suppose you see someone with an umbrella. What is the probability that it rains?

Answer
P(rain | umbrella)
  = P(umbrella | rain) · P(rain) / (P(umbrella | rain) · P(rain) + P(umbrella | ¬rain) · P(¬rain))
  = (7/10 · 1/5) / (7/10 · 1/5 + 1/10 · 4/5)
  = (7/50) / (7/50 + 4/50)
  = 7/11 ≈ 0.64.
Inference: learning from iterated observation
• In the previous example we started from P(rain) = 1/5, and computed P(rain | umbrella) = 7/11.
• Thus after observing this umbrella we may update our prior knowledge to P′(rain) = 7/11.
• What if we see another, second umbrella? Surely, the probability of rain is even higher. How to compute it?
• We can play the same game with the updated rain probability P′(rain) = 7/11:
  P(rain | 2 umbrellas)
    = P(umbrella | rain) · P′(rain) / (P(umbrella | rain) · P′(rain) + P(umbrella | ¬rain) · P′(¬rain))
    = (7/10 · 7/11) / (7/10 · 7/11 + 1/10 · 4/11)
    = (49/110) / (49/110 + 4/110)
    = 49/53 ≈ 0.92.
• See courses on AI (esp. Machine Learning) for more information, esp. on Bayesian networks (graphical models)!
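The iteration pattern, posterior becomes the new prior, is easy to capture in a helper; a sketch (not in the slides):

```python
from fractions import Fraction

def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update of P(H) after observing evidence E."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

p_rain = Fraction(1, 5)
p_umb_rain, p_umb_no_rain = Fraction(7, 10), Fraction(1, 10)

p_rain = update(p_rain, p_umb_rain, p_umb_no_rain)   # first umbrella
print(p_rain)   # 7/11
p_rain = update(p_rain, p_umb_rain, p_umb_no_rain)   # second umbrella
print(p_rain)   # 49/53
```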
Random variable
Associating a value with an experiment
• Suppose we throw a coin, so the sample space is S = {H, T}
• If the outcome is H, I get 100€, otherwise I lose 100€
• This situation is described via a random variable X : S → R with X(H) = 100, X(T) = −100

Definition
Let S be a sample space. A random variable is a real-valued function defined on S, of the form X : S → R.

A random variable is also called a stochastic variable, or simply a stochast (in Dutch: kansvariabele)
More about random variables
• A random variable X : S → R that takes on a finite (or a countably infinite) number of values is called discrete
  • this means that the range R(X) ⊆ R is finite (or countable)
  • otherwise we have a non-discrete or continuous random variable
  • Note that if S is discrete, then so is X.
• If we have two random variables, say X, Y : S → R, then we can also define the random variables X + Y, X − Y, rX in the obvious, pointwise manner:
  (X + Y)(s) = X(s) + Y(s)
  (X − Y)(s) = X(s) − Y(s)
  (rX)(s) = r · X(s)
Random variables and events
Definition
Let X : S → R be a random variable, and x ∈ R an outcome.
There is an event (X = x) ⊆ S, understood as “the outcome is x”, namely:
  (X = x) = {s ∈ S | X(s) = x}.
If there is also a probability measure P : P(S) → [0, 1], then we write P(X = x) ∈ [0, 1] for the probability of this event (X = x) ⊆ S.

Lemma
If X is a discrete random variable, say with outcomes x1, ..., xn, then the events (X = x1), ..., (X = xn) form a partition.
Overview: measures / distributions / random variables
Let S be a sample space. Recall:
• a probability measure P is a function from events to probabilities:
  P : P(S) → [0, 1]
• If S is finite, P corresponds to a distribution from samples to probabilities:
  f : S → [0, 1] via f(s) = P({s})
• A random variable is a function from samples to values:
  X : S → R
  It gives rise to events (X = x) ⊆ S, with probability P(X = x) ∈ [0, 1].
Example
A (fair) coin is tossed twice; we count the number of heads.
• the sample space is S = {HH, HT, TH, TT}
• We have a uniform distribution f : S → [0, 1], namely f(s) = 1/4
• There is a “number of heads” random variable X : S → R:
  X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0
  We are in the discrete case, with range R(X) = {0, 1, 2} ⊆ R.
• Example events with probabilities:
  P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4.
Expectation and variance (discrete case)
Definition
Let X : S → R be a discrete random variable, with values (range) {x1, ..., xn} ⊆ R.
• The expectation (or expected value, or weighted mean) E(X) ∈ R is:
  E(X) = P(X = x1) · x1 + · · · + P(X = xn) · xn
• The variance (spreiding) Var(X) ∈ R describes the spread:
  Var(X) = E((X − E(X))^2) = Σ_i P(X = xi) · (xi − E(X))^2
• The standard deviation is: σ_X = √Var(X).
  • an outcome in [E(X) − σ_X, E(X) + σ_X] is considered “normal”
Dice example
We have S = {1, 2, 3, 4, 5, 6} with X : S → R simply X(s) = s
• P(X = i) = 1/6, for i = 1, 2, 3, 4, 5, 6
• E(X) = Σ_i P(X = i) · i = (1/6) · 1 + · · · + (1/6) · 6
       = (1/6) · (1 + · · · + 6) = 21/6 = 7/2 = 3.5
  The expectation is the mean for a uniform distribution
• Var(X) = E((X − E(X))^2) = Σ_i P(X = i) · (i − 7/2)^2
         = (1/6)(1 − 7/2)^2 + · · · + (1/6)(6 − 7/2)^2 = 35/12
• σ_X = √Var(X) = √(35/12) ≈ 1.71.
Lottery example
Setting and question
• In a lottery there are 200 prizes of 5€, 20 of 25€, and 5 of 100€. Assuming that 10,000 tickets will be issued and sold, what is a fair price to pay for a ticket?
• Answer: a price that is just a bit more than the expected amount to be won.
• The sample space has 4 elements; we leave it implicit and only describe the random variable X for the amount to be won.

  x        |  5   |  25   |  100   |  0
  P(X = x) | 0.02 | 0.002 | 0.0005 | 0.9775

• E(X) = Σ_x P(X = x) · x = 0.02 · 5 + 0.002 · 25 + 0.0005 · 100 + 0 = 0.2. So a ticket should cost 20 cents.
Expectation for the binomial distribution
Recall the parameters are n ∈ N and p ∈ [0, 1]. Then:
• S = {0, 1, . . . , n}, to which we add X : S → R with X(i) = i
• b(k) = P(X = k) = C(n, k) · p^k · (1 − p)^(n−k)

Lemma
The binomial distribution for n ∈ N and p ∈ [0, 1] satisfies E(X) = n · p

Recall the hospital example, with n = 6 patients and p = 1/4 recovery probability.
The expected number of survivors is 6 · 1/4 = 3/2.
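The lemma can be checked numerically by summing the definition of E(X) directly (a Python sketch, not from the slides):

```python
from math import comb

def binom_expect(n, p):
    # E(X) = sum over k of  k * C(n, k) * p^k * (1 - p)^(n - k)
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Hospital example: n = 6 patients, recovery probability p = 1/4.
print(binom_expect(6, 0.25))   # 1.5, matching n * p = 6 * 1/4 = 3/2
```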
Expectation for the binomial distribution, proof
E(X) = ∑_{k=0}^{n} P(X = k) · k = ∑_{k=0}^{n} k · C(n, k) · p^k · (1 − p)^(n−k)
     = ∑_{k=1}^{n} n · C(n−1, k−1) · p^k · (1 − p)^(n−k)
     = n·p · ∑_{k=1}^{n} C(n−1, k−1) · p^(k−1) · (1 − p)^((n−1)−(k−1))
     = n·p · ∑_{i=0}^{n−1} C(n−1, i) · p^i · (1 − p)^((n−1)−i)
     = n·p · (p + (1 − p))^(n−1) = n·p · 1^(n−1) = n·p.

(The second step uses k · C(n, k) = n · C(n−1, k−1); the last step is the binomial theorem.)
Events
• For a discrete random variable X we have seen probabilities of the form P(X = x)
• In the continuous case we shall look at P(X ≤ x), or also at P(x ≤ X ≤ y)
• They describe the probability of the events:
  • (X ≤ x) = {s ∈ S | X(s) ≤ x} ⊆ S
  • (x ≤ X ≤ y) = {s ∈ S | x ≤ X(s) ≤ y} ⊆ S
Probability, surface, and integration: discrete case
Recall the hospital records histogram used earlier:
• We can read it as a function f : R → R
• Suppose we are interested in the probability P(1 ≤ X ≤ 3)
• This probability is obtained as P(1 ≤ X ≤ 3) = ∫_1^3 f(x) dx
Probability, surface, and integration: continuous case
This idea can be generalised, as suggested in:
      P(−2 ≤ X ≤ 2) = ∫_{−2}^{2} f(x) dx
• The functions f : R → R used in such a way are called probability density functions (pdf, dichtheidsfunctie)
• They should satisfy f(x) ≥ 0 and:
      ∫_{−∞}^{+∞} f(x) dx = 1.
• This connects the first and second part of this course!
Example pdf
• Often a continuous random variable is defined directly via a probability density function (pdf)
• For instance, consider f : R → R defined by:
      f(x) = x/4   if 1 ≤ x ≤ 3
             0     otherwise.
• Clearly f(x) ≥ 0 and:
      ∫_{−∞}^{+∞} f(x) dx = ∫_1^3 (x/4) dx = [x²/8]_1^3 = (3² − 1²)/8 = 1.
• Hence we can define a continuous random variable X via:
      P(a ≤ X ≤ b) = ∫_a^b f(x) dx
  For instance P(1 ≤ X ≤ 2) = (2² − 1²)/8 = 3/8.
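Such integrals can also be approximated numerically, e.g. with a midpoint rule (a Python sketch, not from the slides):

```python
def f(x):
    # the pdf from this slide: x/4 on [1, 3], zero elsewhere
    return x / 4 if 1 <= x <= 3 else 0.0

def integrate(g, a, b, n=10000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

print(round(integrate(f, 1, 3), 6))   # 1.0 : total probability
print(round(integrate(f, 1, 2), 6))   # 0.375 = 3/8 : P(1 <= X <= 2)
```

The midpoint rule happens to be exact for linear integrands such as this one.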
Cumulative distribution function (verdelingsfunctie)
• In practice, a pdf is often given directly, and the random variable is then defined accordingly
• But one can also obtain the pdf from a random variable X
• Define the cumulative distribution function F : R → [0, 1] as:
      F(x) = P(X ≤ x)
  Then F(−∞) = 0 and F(+∞) = 1.
• The pdf f is then the derivative f = F′ = d/dx P(X ≤ x). Then indeed:
      P(X ≤ b) = F(b) = F(b) − F(−∞) = ∫_{−∞}^{b} f(x) dx
      P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = ∫_a^b f(x) dx
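For the example pdf f(x) = x/4 on [1, 3] from the previous slide, the cdf is F(x) = (x² − 1)/8 on [1, 3], and probabilities become simple differences (a Python sketch, not from the slides):

```python
def F(x):
    # cdf of the example pdf f(x) = x/4 on [1, 3]
    if x < 1:
        return 0.0
    if x > 3:
        return 1.0
    return (x * x - 1) / 8

# P(1 <= X <= 2) = F(2) - F(1)
print(F(2) - F(1))   # 0.375
```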
Uniform probability, in the continuous case
• Consider the interval [a, b] ⊆ R, for given a < b
• Define a density function u_{a,b} : R → R as:
      u_{a,b}(x) = 1/(b − a)   if x ∈ [a, b]
                   0            otherwise.
• Then:
      ∫_{−∞}^{∞} u_{a,b}(x) dx = ∫_a^b 1/(b − a) dx = [x/(b − a)]_a^b = (b − a)/(b − a) = 1.
Expectation, in the continuous case
Definition
Let X be a continuous random variable, given by pdf f. Then:
      E(X) = ∫_{−∞}^{∞} x · f(x) dx
      Var(X) = ∫_{−∞}^{∞} (x − E(X))² · f(x) dx
      σ_X = √Var(X).
Example, in the uniform case
Recall u_{a,b} : R → R with value 1/(b − a) on [a, b]. Then:

E(X) = ∫_{−∞}^{+∞} x · u_{a,b}(x) dx = ∫_a^b x/(b − a) dx
     = [x²/(2(b − a))]_a^b = (b² − a²)/(2(b − a)) = (b − a)(b + a)/(2(b − a)) = (a + b)/2

Var(X) = ∫_{−∞}^{+∞} (x − E(X))² · u_{a,b}(x) dx = ∫_a^b (x − (a + b)/2)²/(b − a) dx
       = 1/(4(b − a)) · ∫_a^b (2x − (a + b))² dx
       = 1/(4(b − a)) · ∫_a^b 4x² − 4(a + b)x + (a + b)² dx
       = 1/(4(b − a)) · [4x³/3 − 2(a + b)x² + (a + b)²x]_a^b
       = · · · = (b − a)²/12.
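The closed forms E(X) = (a + b)/2 and Var(X) = (b − a)²/12 can be sanity-checked numerically (a Python sketch; the interval [2, 10] is an arbitrary choice):

```python
def uniform_moments(a, b, n=100000):
    # midpoint-rule approximations of E(X) and Var(X) for the uniform pdf u_{a,b}
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    E = h * sum(x / (b - a) for x in xs)
    Var = h * sum((x - E) ** 2 / (b - a) for x in xs)
    return E, Var

E, Var = uniform_moments(2, 10)
print(round(E, 4), round(Var, 4))   # 6.0 and 5.3333, i.e. (2+10)/2 and 8**2/12
```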
Properties, also for the discrete case
Lemma
• E(X + Y) = E(X) + E(Y)
• E(r · X) = r · E(X)
• Var(X) = E(X²) − E(X)²
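The third identity, Var(X) = E(X²) − E(X)², can be illustrated on the die from earlier (a Python sketch, not from the slides):

```python
# Fair die: outcomes 1..6, each with probability 1/6.
p = 1 / 6
outcomes = range(1, 7)

E = sum(p * i for i in outcomes)                    # E(X) = 7/2
E2 = sum(p * i * i for i in outcomes)               # E(X^2) = 91/6
var_def = sum(p * (i - E) ** 2 for i in outcomes)   # Var(X) by definition
var_alt = E2 - E ** 2                               # shortcut formula

print(round(var_def, 6), round(var_alt, 6))   # both 35/12, approx. 2.916667
```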
Normal (Gaussian) distribution
An important class of probability density functions is of this form:
• Such bell curves are typical for normal/Gaussian distributions
• The curve is determined by two parameters:
  • the mean, written as µ, which determines the location of the middle of the bell
  • the variance, written as σ², which determines the width
• This distribution is very common in practice, when observations pile up around a particular value
Normal (Gaussian) distribution, definition
Definition
Given parameters µ (the mean) and σ (the standard deviation, so σ² is the variance), consider the pdf
      f(x) = 1/(σ√(2π)) · e^(−½·((x − µ)/σ)²)
The associated random variable X is called normal or Gaussian. Its distribution function is thus:
      P(a ≤ X ≤ b) = 1/(σ√(2π)) · ∫_a^b e^(−½·((x − µ)/σ)²) dx
One writes X ∼ N(µ, σ).

This integral has no closed-form solution, so one uses tables of cumulative probabilities for a special normal distribution to calculate such probabilities.
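In Python, such table lookups are handled by the standard library's `statistics.NormalDist`, whose `cdf` method gives P(X ≤ x) (a sketch; the parameters µ = 100 and σ = 15 are an arbitrary illustration):

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)   # X ~ N(100, 15)

# P(85 <= X <= 115), i.e. P(mu - sigma <= X <= mu + sigma)
p = X.cdf(115) - X.cdf(85)
print(round(p, 4))   # 0.6827: the familiar "within one sigma" probability
```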
Standardising normal distribution
N(0, 1) is often called the standard normal random variable; the pdf involved is (1/√(2π)) · e^(−x²/2); it is centered around 0.

Theorem
If X is a normal random variable with mean µ and standard deviation σ, then Z = (X − µ)/σ is a standard normal random variable. It satisfies E(Z) = 0 and σ_Z = 1.
In a picture:
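The theorem can be checked numerically with `statistics.NormalDist` (a sketch; the parameters µ = 50 and σ = 10 are an arbitrary illustration):

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)   # X ~ N(50, 10)
Z = NormalDist(mu=0, sigma=1)     # standard normal

# P(40 <= X <= 60) equals P(-1 <= Z <= 1) after standardising
left = X.cdf(60) - X.cdf(40)
right = Z.cdf((60 - 50) / 10) - Z.cdf((40 - 50) / 10)
print(round(left, 6), round(right, 6))
```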
Standardisation example
Setting and question
Suppose students can get 100 marks for their exam; the output is:
      marks = [85, 24, 63, 12, 87, 90, 33, 38, 25]
Which ones should “normally” fail?

Via a small Python program we get µ = 50.78 and σ = 28.94.
The list [(i − µ)/σ for i in marks] of standardised marks is:
      [1.18, −0.93, 0.42, −1.34, 1.25, 1.36, −0.61, −0.44, −0.89]
By construction this list has mean 0 and standard deviation 1.
Only one entry lies below −1; this one should “normally” fail; it corresponds to the mark 12 in the original list.
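The “small Python program” may have looked like this sketch, using the standard library's `statistics` module (`pstdev` is the population standard deviation used on the slide):

```python
from statistics import mean, pstdev

marks = [85, 24, 63, 12, 87, 90, 33, 38, 25]

mu = mean(marks)                       # approx. 50.78
sigma = pstdev(marks)                  # population std deviation, approx. 28.94
z = [(m - mu) / sigma for m in marks]  # standardised marks

fails = [m for m, s in zip(marks, z) if s < -1]
print(fails)   # [12]: the only mark more than one sigma below the mean
```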
Overview
In this course we have:
1 seen/recalled the basics of mathematical calculus: differentiation and integration
2 introduced the basics of probability theory
These areas are connected via continuous random variables, where probabilities are computed by integrating a probability density function (pdf).
Preparing for the exam
• The emphasis is on being able to calculate things, not on proving properties
• much like in the exercises
• The exam is “closed” book
• Hence, definitions and results must be known by heart, whenthey are relevant for doing calculations
• A simple calculator will be provided (only +,−, ∗, /)
The role of the exercises
• The exercises form the best preparation for the exam!
• There will be one more exercise, which is a mock exam
  • do it as a test for yourself, without the notes
  • hence you can see what you know and what you don’t
  • an elaborated version will be put online, after the hand-in date
• Of the marks for the exercises of the past weeks we will drop the lowest one
• The remaining average of the exercises will make up half of your final mark
  • the other half comes from the written exam
  • this exam must be ≥ 5 in order to pass
• There will be a second written exam, where the same average of the exercises will be used.
Summary of important points
• Differentiation
  • limits, derivatives and their interpretation as tangent, differentiation rules, special functions, function investigation
• Integration
  • definite integral as area, indefinite integral as inverse to differentiation, special functions, substitution & integration by parts, arc length
• Probability
  • combinatorics (samples with/without order/replacement), binomials, (discrete) probability measure, uniform & binomial distributions, conditional probability, total probability, Bayes’ rule, random variable, expectation, standard deviation, pdf, normal distributions and standardisation
The exam itself
• Date and time: Tuesday, 4 Nov, 8:30 (precise) - 11:30
  • HAL 2, for “regular” students
  • HG00.058, for “extra time” students (until 12:30)
• If you qualify for extra time, let me know in advance, via email.
• Make sure (and double-check) that you are registered
  • do this today!
  • registration cannot be done on the spot; unregistered students will be excluded
Good luck with the preparation, and the exam itself!