Introduction to Probability Theory
TRANSCRIPT
2017 Michael TUNG
Introduction to Probability Theory
Probability Theory is a branch of Mathematics concerned with Mathematical Structures or Models called Probability Spaces, from which we can define an Abstract Mathematical Object called a Probability Measure, or simply a Probability, which is used to Quantify Uncertainty and so enables us to study Uncertainty. As is common in Mathematics, we will adopt the Axiomatic approach to establish Probability Theory: we start with some primitive concepts and Axioms, and from them develop the discussion of this important Mathematical Theory, which belongs both to Pure Mathematics and to Statistics.
Definition 1 (Statistical Population)
A Population U is a set, class or collection of Objects which is of interest for some Study or Experiment. A Population U can be regarded as the Universal Set in a given context.
Definition 2 (Statistical Object and Attribute)
Let U be a Population. Any $u \in U$ is called an Object.
All Objects of any given Population have some common Attributes or Properties, which are generally Unary Functions and Unary Predicates. The set of all common Attributes of all Objects of a given Population U is denoted by
$\mathbb{A} = \{f_a : U \to V_a \mid a \in A\}$,
where A is a Countable Set of Attribute Identifiers or Headers, which are simply Variable Identifiers, and $V_a$, $a \in A$, is the set of all possible values of the Attribute $f_a : U \to V_a$. Any Attribute $f_a : U \to V_a$, $a \in A$, can be referred to simply as $f_a$.
Example 1
In a study of all Mathematicians working in any World Top 50 university, the Population U would be the set of all current Faculty members of the Mathematics Departments of all World Top 50 universities. Then the set of all common Attributes could be
$\mathbb{A} = \{f_{Age}, f_{Sex}, f_{Nationality}, f_{Height}, f_{Weight}, f_{Paper\,No.}, \ldots\}$,
where $V_{Age} = \{x : 0 < x < 100\}$; $V_{Sex} = \{\text{Male}, \text{Female}\}$; $V_{Height}, V_{Weight} = \{x : x > 0\}$ and $V_{Paper\,No.} = \mathbb{N}$ (the set of all Natural Numbers).
Definition 3 (Random Experiment)
A Random Experiment is any process which:
1. has a well-defined set of possible outcomes (more than one possible outcome);
2. can generate an outcome;
3. can be infinitely repeated;
4. makes it impossible to estimate exactly which outcome will occur.
Definition 4 (Random Trial)
Any actual execution of a Random Experiment is a Random Trial.
Definition 5 (Sample Space and Outcome)
A Sample Space is the set of all possible Outcomes of a Random Experiment. It is commonly denoted by $\Omega$.
Definition 6 (Event)
Any subset E of a given Sample Space $\Omega$ is an Event. That is, an Event E is any set of possible outcomes. Note that any outcome $\omega \in \Omega$ can be regarded as the Event $\{\omega\} \subseteq \Omega$.
An Event is said to occur if and only if any outcome of that Event occurs.
We now consider the process of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from u. If after the observation we put u back into U, then this process can in theory be repeated infinitely many times, and it has a well-defined set of possible outcomes such that the observed value $v \in V_a$ is impossible to estimate exactly, due to Randomness. Therefore this process is a Random Experiment with Sample Space $\Omega = V_a$, $a \in A$.
Definition 7 (Random Experiment of randomly selecting and observing a common Attribute's value from an Object of a Population)
Let U be a Population and $\mathbb{A} = \{f_a : U \to V_a \mid a \in A\}$ be the set of all common Attributes of all Objects of U. Then the Sample Space of the Random Experiment of randomly selecting an Object $u \in U$ and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ is defined as $\Omega = V_a$, $a \in A$.
Definition 8 (Random Variable)
A Random Variable, abbreviated RV and commonly denoted by X, Y or Z, is a Function or Mapping from a Sample Space $\Omega$ to a set of Real Numbers $\mathbb{R}$. Formally, a RV can be expressed as $X : \Omega \to \mathbb{R}$ such that $X(\omega) = x$, where $\omega \in \Omega$ is a possible outcome.
A RV X is a Discrete Random Variable if and only if $X : \Omega \to \mathbb{N}$ or $\mathbb{Z}$.
A RV X is a Continuous Random Variable if and only if $X : \Omega \to \mathbb{R}$.
Therefore, a RV is simply a way to quantify all elements of a Sample Space and all elements of the Power Set of the Sample Space $\{E : E \subseteq \Omega\}$ (the set or class of all possible Events).
Since the Random Experiment of randomly selecting, with replacement, an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from the selected u has Sample Space $\Omega = V_a$, a Random Variable $X : V_a \to \mathbb{R}$ can be defined here, and hence there is a One-to-One Correspondence between a common Attribute $f_a$ and a RV X. Informally, we can regard a common Attribute $f_a$ as its RV X. So we have the following definition.
Definition 9 (Values of a Random Variable as Sample Space)
Let U be a Population. Then the Sample Space of the Random Experiment of randomly selecting an Object $u \in U$, with replacement, and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ is defined as $\Omega = X(V_a)$, where X is the RV of $f_a$, i.e. $X : V_a \to \mathbb{R}$.
Therefore, we can informally regard any common Attribute $f_a$ as its RV X. Whether X is Discrete or Continuous depends on the nature of $V_a$.
Example 2
In Example 1, for the common Attribute $f_{Nationality}$, define the Random Variable X by $X : V_{Nationality} \to \mathbb{R}$ such that X(UK) = 1; X(US) = 2; X(France) = 3; X(Germany) = 4; X(Russia) = 5; X(Japan) = 6; X(India) = 7. If we randomly select a Mathematician from the Population U, with replacement, and observe the value $v_{Nationality} \in V_{Nationality}$ of $f_{Nationality}$ from the selected Mathematician, then the Sample Space will be $\Omega = X(V_{Nationality}) = \{n \in \mathbb{N} : 1 \le n \le 7\}$.
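The Random Variable of Example 2 is nothing more than a mapping from attribute values to numbers, which can be sketched directly; the dictionary encoding below is my own choice, while the nationality codes are those of the example.

```python
# The RV X of Example 2 maps each Nationality (an attribute value in
# V_Nationality) to a Natural Number.
X = {"UK": 1, "US": 2, "France": 3, "Germany": 4,
     "Russia": 5, "Japan": 6, "India": 7}

# The Sample Space of the Random Experiment is the image X(V_Nationality).
sample_space = set(X.values())

print(sample_space)  # {1, 2, 3, 4, 5, 6, 7}
```

Any outcome of the experiment, such as observing a French Mathematician, is then reported as the number `X["France"]`.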
We know that the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ has Sample Space $\Omega = X(V_a)$, where X is the RV of $f_a$, i.e. $X : V_a \to \mathbb{R}$. Since we cannot estimate exactly which outcome $x \in \Omega$ will occur, we need a measure of the likelihood of the occurrence of an Event. This measure is called Probability. There are two main ways of assigning a Probability measure to any given Event of a Random Variable X.
Definition 10 (Theoretical Probability or Classical Probability)
If all possible outcomes of a Sample Space $\Omega$ are Equally Probable, then the Theoretical Probability or Classical Probability of any Event $E \subseteq \Omega$, denoted by $P(E)$, is defined as the ratio of the number of outcomes of E to the number of all possible outcomes. That is,
$P(E) = \dfrac{|E|}{|\Omega|}$, with $0 \le P(E) \le 1$,
where $|S|$ is the Cardinality of the Set S.
In terms of the Random Variable X, we have $P(E) = P(X(E)) = \dfrac{|X(E)|}{|X(\Omega)|}$.
Notice that Definition 10 has a critical drawback: the formula for $P(E)$ breaks down if the Sample Space is an Uncountable Set, because the Cardinality of an Uncountable Set is Mathematically at least equal to that of $\mathbb{R}$. If E is a Countable Set, then $P(E)$ is zero (since $\frac{|E|}{|\Omega|} \to 0$). However, if E is also an Uncountable Set, then we can still work out $P(E)$ accordingly, as shown in the following example.
Example 3 (Geometric Probability)
Consider a Random Experiment of a simple dart game. The dartboard has radius 9 cm, and on it there is a big bullseye with radius 3 cm. Then
$P(\text{A dart hits the bullseye}) = \dfrac{\text{Area of bullseye}}{\text{Area of dartboard}} = \dfrac{9\pi}{81\pi} = \dfrac{1}{9}$.
So Example 3 tells us that even when both the Event and the Sample Space are Uncountable Sets, like all points in a plane region or in a line segment, we can still use Definition 10 to calculate these Theoretical Probabilities, since the ratio of their sizes (areas or lengths) can be finite.
Definition 11 (Experimental Probability or Relative Frequency)
Let $\Omega$ be the Sample Space of a Random Experiment. The Experimental Probability or Relative Frequency of any Event $E \subseteq \Omega$, denoted by $P(E)$, is defined as the ratio of the number of times E occurred to the total number of Trials of the Random Experiment executed. Note that $0 \le P(E) \le 1$.
Example 4
In Example 3, consider throwing darts 1000 times such that each time the landed dart is removed from the dartboard. Assume the dartboard is smooth after each throw and that the dart landed on the bullseye 115 times. Then
$P(\text{A dart hits the bullseye}) = \dfrac{115}{1000}$.
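The two notions of Probability in Examples 3 and 4 can be compared by simulation: throw darts uniformly at random at the 9 cm dartboard and count how often they land in the 3 cm bullseye. This is a hedged sketch; the rejection-sampling method, the number of trials and the seed are my own choices, not from the text.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def throw_dart(radius=9.0):
    """Sample a landing point uniformly from the dartboard disc
    by rejection sampling from its bounding square."""
    while True:
        x = random.uniform(-radius, radius)
        y = random.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return x, y

trials = 100_000
hits = 0
for _ in range(trials):
    x, y = throw_dart()
    if x * x + y * y <= 3.0 * 3.0:       # landed inside the bullseye
        hits += 1

relative_frequency = hits / trials       # Experimental Probability (Definition 11)
theoretical = (3.0 ** 2) / (9.0 ** 2)    # Theoretical Probability 9π / 81π = 1/9

print(relative_frequency, theoretical)
```

With 100 000 trials the Relative Frequency settles very close to the Theoretical value 1/9, foreshadowing the agreement of the two definitions discussed below.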
Even though we now have two seemingly different definitions of the Probability Concept, later on we will be able to show that, for any given Sample Space, these two definitions produce the same Probability for any Event, by considering the Sampling Distribution of the Sample Mean of a Bernoulli Probability Distribution Model and applying the Central Limit Theorem.
Recall that the Sample Space of the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ is $\Omega = X(V_a)$, where X is the RV of $f_a$, i.e. $X : V_a \to \mathbb{R}$. Since the occurrence of any particular value of X is uncertain, the concept of Probability can be applied to define a Function or Mapping which assigns a Probability to each and every value of X. Some properties regarding the concept of Probability also apply to X.
Definition 12 (Population Distribution)
The Population Distribution of a common Attribute $f_a$, $a \in A$, or of its respective RV $X : V_a \to \mathbb{R}$, is defined as a Function or Mapping $p : X(V_a) \to [0, 1]$ such that
$p(x) = P(X = x) = P(E)$, where $E = \{v \in V_a : X(v) = x\}$,
which is the unique Theoretical Probability (i.e. the ratio of $n(E)$ to $n(V_a)$) assigned to $E \subseteq V_a$.
Example 5
Consider the common Attribute $f_{Research}$ in a Population of 1000 Mathematicians. Among these: 150 research on Analysis; 100 research on Algebra; 400 research on Differential Equation (DE); 50 research on Mathematical Logic (ML); and 300 research on Probability and Statistics (PS). Define a RV
$X : V_{Research} \to D$, with Analysis $\mapsto 1$, Algebra $\mapsto 2$, DE $\mapsto 3$, ML $\mapsto 4$, PS $\mapsto 5$, where $D = \{1, 2, 3, 4, 5\}$.
Then the Population Distribution of $f_{Research}$, or of X, is given by the Function $p : D \to [0, 1]$ such that:

Research    | Analysis | Algebra | Differential Equation | Mathematical Logic | Probability and Statistics
Probability | 0.15     | 0.1     | 0.4                   | 0.05               | 0.3
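The Population Distribution of Example 5 is just the Theoretical Probability $n(E)/n(V_a)$ of Definition 12 computed from the field counts; a minimal sketch (the dictionary layout is my own encoding):

```python
# Research-field counts from Example 5 (a Population of 1000 Mathematicians).
counts = {"Analysis": 150, "Algebra": 100, "Differential Equation": 400,
          "Mathematical Logic": 50, "Probability and Statistics": 300}

n = sum(counts.values())  # total number of Objects in the Population

# Population Distribution: each probability is the ratio n(E) / n(V_a)
# of Definition 12.
p = {field: c / n for field, c in counts.items()}

print(p["Analysis"], p["Differential Equation"])  # 0.15 0.4
```

The probabilities necessarily sum to 1, since the field counts partition the Population.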
The Population Distribution is theoretically an Empirical Distribution in which all probabilities are computed by the Theoretical Probability approach. But in practice it is impossible, or very difficult, to obtain the Empirical Population Distribution, since we can only have limited knowledge about the entire Population, which is usually an Infinite Set; so a Hypothetical Distribution is usually suggested instead.
Definition 13 (Probability Distribution of a Discrete Random Variable)
Let the Sample Space of the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a Discrete common Attribute $f_a$, $a \in A$, from $u \in U$ (i.e. $V_a$ is a set of Discrete values) be $\Omega = X(V_a) \subseteq \mathbb{N}$ or $\mathbb{Z}$, where X is a Discrete Random Variable of $f_a$.
Then we define:
1. The Probability Distribution of the Discrete Random Variable X, or the Probability Mass Function (PMF) of X, by $f_X : \Omega \to [0, 1]$ such that
$f_X(x) = P(X = x) = P(E)$, where $E = \{v \in V_a : X(v) = x\}$,
which satisfies:
a. $\forall x \in \Omega$, $0 \le f_X(x) \le 1$;
b. $P(V_a) = P(\Omega) = \sum_{x \in \Omega} f_X(x) = 1$;
c. $\forall E \subseteq V_a$, $P(E) = P(X(E)) = \sum_{x \in X(E)} f_X(x)$.
2. The Cumulative Distribution Function (CDF) of the Discrete Random Variable X, $F_X : \Omega \to [0, 1]$, which is an Increasing Function defined on $\Omega$ by
$F_X(x) = P(X \le x) = P(E) = \sum_{t \le x} f_X(t)$, where $E = \{v \in V_a : X(v) \le x\}$.
Definition 14 (General Probability Distribution of a Discrete Random Variable)
Let $X : \Omega \to \mathbb{R}$ be a Discrete Random Variable, where $\Omega$ is a Sample Space. Then we define:
1. The Probability Distribution of the Discrete Random Variable X, or the Probability Mass Function (PMF) of X, by $f_X : X(\Omega) \to [0, 1]$ such that
$f_X(x) = P(X = x) = P(E)$, where $E = \{\omega \in \Omega : X(\omega) = x\}$,
which satisfies:
a. $\forall x \in X(\Omega)$, $0 \le f_X(x) \le 1$;
b. $P(\Omega) = P(X(\Omega)) = \sum_{x \in X(\Omega)} f_X(x) = 1$;
c. $\forall E \subseteq \Omega$, $P(E) = P(X(E)) = \sum_{x \in X(E)} f_X(x)$.
2. The Cumulative Distribution Function (CDF) of the Discrete Random Variable X, $F_X : X(\Omega) \to [0, 1]$, which is an Increasing Function defined on $X(\Omega)$ by
$F_X(x) = P(X \le x) = P(E) = \sum_{t \le x} f_X(t)$, where $E = \{\omega \in \Omega : X(\omega) \le x\}$.
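The PMF and CDF of Definition 14 can be made concrete for a fair six-sided die; the die is my own illustrative choice, not an example from the text. Exact rational arithmetic keeps the properties (a)-(c) visible without rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: f_X(x) = P(X = x) = 1/6 for x in X(Omega).
outcomes = range(1, 7)
f = {x: Fraction(1, 6) for x in outcomes}

def F(x):
    """CDF of Definition 14: F_X(x) = sum of f_X(t) over all t <= x."""
    return sum(p for t, p in f.items() if t <= x)

assert sum(f.values()) == 1        # property (b): total probability is 1
assert F(3) == Fraction(1, 2)      # P(X <= 3) = 3/6
assert F(6) == 1                   # the CDF reaches 1 at the largest value
```

Property (c) follows the same pattern: the probability of any Event is the sum of `f[x]` over the values x that the Event maps to.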
Definition 15 (Probability Distribution of a Continuous Random Variable)
Let the Sample Space of the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a Continuous common Attribute $f_a$, $a \in A$, from $u \in U$ (i.e. $V_a$ is a set of Continuous values) be $\Omega = X(V_a)$, where X is a Continuous Random Variable of $f_a$.
Then we define:
1. A One-to-One Correspondence $F_X : \Omega \to [0, 1]$, called the Cumulative Distribution Function (CDF) of the Continuous Random Variable X, by
$F_X(x) = P(X \le x) = P(E)$, where $E = \{v \in V_a : X(v) \le x\}$,
such that:
a. $F_X$ is Increasing. That is, $\forall x_1, x_2 \in \Omega$, $x_1 < x_2 \Rightarrow F_X(x_1) < F_X(x_2)$;
b. $\lim_{x \to -\infty} F_X(x) = 0$;
c. $\lim_{x \to +\infty} F_X(x) = 1$.
2. If $F_X$ is Differentiable on $\Omega$ such that $\frac{d}{dx} F_X(x) = f_X(x)$ is Continuous on $\Omega$ and $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$, then $f_X(x) = \frac{d}{dx} F_X(x)$ is defined as the Probability Distribution of the Continuous Random Variable X, or the Probability Density Function (PDF) of X. The PDF $f_X : \Omega \to [0, \infty)$ satisfies
$\forall E \subseteq V_a$, $P(E) = P(X(E)) = \int_{X(E)} f_X(x)\, dx = \int_{X(E)} dF_X(x)$.
Definition 16 (General Probability Distribution of a Continuous RV)
In general, let $X : \Omega \to \mathbb{R}$ be a Continuous Random Variable, where $\Omega$ is a Sample Space. Then we define:
1. A One-to-One Correspondence $F_X : X(\Omega) \to [0, 1]$, called the Cumulative Distribution Function (CDF) of the Continuous Random Variable X, by
$F_X(x) = P(X \le x) = P(E)$, where $E = \{\omega \in \Omega : X(\omega) \le x\}$,
such that:
a. $F_X$ is Increasing. That is, $\forall x_1, x_2 \in X(\Omega)$, $x_1 < x_2 \Rightarrow F_X(x_1) < F_X(x_2)$;
b. $\lim_{x \to -\infty} F_X(x) = 0$;
c. $\lim_{x \to +\infty} F_X(x) = 1$.
2. If $F_X$ is Differentiable on $X(\Omega)$ such that $\frac{d}{dx} F_X(x) = f_X(x)$ is Continuous on $X(\Omega)$ and $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$, then $f_X(x) = \frac{d}{dx} F_X(x)$ is defined as the Probability Distribution of the Continuous Random Variable X, or the Probability Density Function (PDF) of X. The PDF $f_X : X(\Omega) \to [0, \infty)$ satisfies
$\forall E \subseteq \Omega$, $P(E) = P(X(E)) = \int_{X(E)} f_X(x)\, dx = \int_{X(E)} dF_X(x)$.
Theorem 1 (Obvious without Proof)
If the PDF of a Continuous Random Variable X is $f_X : X(\Omega) \to [0, \infty)$, then
$\forall [a, b] \subseteq X(\Omega)$, $P(E) = P(X(E)) = \int_{X(E)} f_X(x)\, dx = \int_{a}^{b} f_X(x)\, dx$, where $E = \{\omega \in \Omega : X(\omega) \in [a, b]\}$.
Theorem 2
If the PDF of a Continuous Random Variable X is $f_X : X(\Omega) \to [0, \infty)$, then for any $x \in X(\Omega)$,
$P(X = x) = P(E) = 0$, where $E = \{\omega \in \Omega : X(\omega) = x\}$.
Proof
Since $X(E) = \{x\}$, we have
$P(X = x) = P(E) = \int_{X(E)} f_X(t)\, dt = \int_{x}^{x} f_X(t)\, dt = 0$.
Theorem 3
If the PDF of a Continuous Random Variable X is $f_X : X(\Omega) \to [0, \infty)$, then
$P(\Omega) = P(X(\Omega)) = \int_{X(\Omega)} f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x)\, dx = 1$.
Proof
Let $b \in \mathbb{R}$, $b > 0$. Since $X(\Omega)$ can be at most $(-\infty, \infty)$, we have:
$P(\Omega) = P(X(\Omega)) = \int_{X(\Omega)} f_X(x)\, dx = \lim_{b \to \infty} \int_{-b}^{b} f_X(x)\, dx = \lim_{b \to \infty} [F_X(b) - F_X(-b)] = \lim_{b \to \infty} F_X(b) - \lim_{b \to \infty} F_X(-b) = 1 - 0 = 1$.
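Theorem 3 can be checked numerically for a concrete PDF. The exponential density $f(x) = e^{-x}$ on $[0, \infty)$ is my own illustrative choice, not from the text; the integral is approximated by a simple Riemann sum, truncated at a large $b$ exactly as in the limit argument of the proof.

```python
import math

# Illustrative PDF of a Continuous RV: f_X(x) = e^{-x} for x >= 0, else 0.
def f(x):
    return math.exp(-x) if x >= 0 else 0.0

# Riemann-sum approximation of the integral over X(Omega) = [0, oo),
# truncated at b = 50 (the tail beyond b is negligibly small).
b, n = 50.0, 500_000
dx = b / n
total = sum(f(i * dx) for i in range(n)) * dx

print(total)  # approximately 1, as Theorem 3 requires
```

Shrinking `dx` (or widening `b`) pushes the sum arbitrarily close to the exact value 1.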
In the most general case, for any Discrete Sample Space $\Omega$, we have the following Mathematical Structure or Model, called a Probability Space, which satisfies all Axioms and Theorems of Probability Theory; it provides the most general concept of a Discrete Probability Distribution or Discrete Probability Measure.
Definition 17 (Probability Space, Model of Probability Theory)
A Two-sorted Mathematical Structure $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$, where:
1. the Domain $\Omega$, called the Sample Space, is a set or class of objects called Outcomes;
2. the Domain $\mathcal{F} \subseteq \{X : X \subseteq \Omega\}$ is a set of Events;
3. $\setminus, \cup, \cap$ are the usual Difference, Union and Intersection Operations of Sets, defined on $\mathcal{F}$;
4. $P : \mathcal{F} \to [0, 1]$ is a Real-valued Function called the Probability Measure,
is an Abstract Mathematical Structure called a Probability Space if and only if it satisfies:
1. $\Omega \in \mathcal{F}$;
2. $\forall E \in \mathcal{F}$, its Complement Event $E^C = \Omega \setminus E \in \mathcal{F}$;
3. $\forall E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, $\bigcup_{i=1}^{\infty} E_i \in \mathcal{F}$;
4. $P(\Omega) = 1$;
5. $\forall E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, with $E_j \cap E_k = \emptyset$ for all $j \neq k$, $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$.
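A tiny finite instance of Definition 17 can be checked programmatically: take $\Omega$ for a fair coin toss, let $\mathcal{F}$ be the full Power Set, and let P be the Classical measure of Definition 10. The `frozenset` encoding is my own assumption; the axioms checked are those of the definition (with axiom 3 reduced to finite unions, since the structure is finite).

```python
from itertools import combinations

# A small finite model of Definition 17: a fair coin toss.
omega = frozenset({"H", "T"})

# Event set F: the full Power Set of omega.
F = set()
for r in range(len(omega) + 1):
    for subset in combinations(omega, r):
        F.add(frozenset(subset))

def P(event):
    """Classical probability measure P(E) = |E| / |Omega|."""
    return len(event) / len(omega)

# Axioms of Definition 17:
assert omega in F                                    # 1. Omega is an Event
assert all(omega - E in F for E in F)                # 2. closed under Complement
assert all(E1 | E2 in F for E1 in F for E2 in F)     # 3. closed under Union (finite case)
assert P(omega) == 1.0                               # 4. P(Omega) = 1
heads, tails = frozenset({"H"}), frozenset({"T"})
assert P(heads | tails) == P(heads) + P(tails)       # 5. additivity for disjoint Events
```

The same pattern scales to any finite Sample Space; only the Uncountable case requires the full machinery of Measure Spaces mentioned earlier.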
Theorem 4
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $P(\emptyset) = 0$.
Proof
$P(\Omega) = P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset) \Rightarrow 1 = 1 + P(\emptyset) \Rightarrow P(\emptyset) = 0$.
Theorem 5
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $\forall E \in \mathcal{F}$, $P(E) \ge 0$.
Proof
This is obvious from the definition of the Probability Measure $P : \mathcal{F} \to [0, 1]$.
Theorem 6
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $\forall E_1, E_2 \in \mathcal{F}$ with $E_1 \subseteq E_2$, we have $P(E_1) \le P(E_2)$.
Proof (Countable Case)
Let $E_1 = \{\omega_1, \omega_2, \ldots, \omega_m\}$ and $E_2 = \{\omega_1, \omega_2, \ldots, \omega_m, \ldots, \omega_n\}$. Then both Events can be expressed as $E_1 = \bigcup_{i=1}^{m} \{\omega_i\}$ and $E_2 = \bigcup_{i=1}^{n} \{\omega_i\}$, where $\omega_i \in \Omega$, $i = 1, \ldots, n$, and $m \le n$. Notice that for these outcomes we have $\{\omega_j\} \cap \{\omega_k\} = \emptyset$ for all $j \neq k$.
So $P(E_1) = \sum_{i=1}^{m} P(\{\omega_i\}) \le \sum_{i=1}^{n} P(\{\omega_i\}) = P(E_2)$.
For the Uncountable Case, we have to invoke the Mathematical Structure of a Measure Space.
Theorem 7
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $\forall E \in \mathcal{F}$, we have $0 \le P(E) \le 1$.
Proof
By Theorem 5, we have $P(E) \ge 0$. Now by Theorem 6, since $E \subseteq \Omega$ and $\Omega \in \mathcal{F}$, we have $P(E) \le P(\Omega) = 1$. Hence $0 \le P(E) \le 1$.
Theorem 8 (General Addition Rule of Probability, Inclusion-Exclusion Principle)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. Then
$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum_{1 \le i < j \le n} P(E_i \cap E_j) + \cdots + (-1)^{n+1} P\left(\bigcap_{i=1}^{n} E_i\right)$.
Proof
Notice that it is not difficult to see that the Countable Union on the Left Hand Side can be recovered from the Sets on the Right Hand Side by Set Operations and a Venn Diagram. So we only need to prove that both sides have the same Cardinalities:
$\left|\bigcup_{i=1}^{n} E_i\right| = \sum_{i=1}^{n} |E_i| - \sum_{1 \le i < j \le n} |E_i \cap E_j| + \cdots + (-1)^{n+1} \left|\bigcap_{i=1}^{n} E_i\right|$.
We are going to prove it by Mathematical Induction.
If $n = 2$, then by drawing a Venn Diagram we have ($\binom{2}{1}$ terms, then the $\binom{2}{2}$ term):
$|E_1 \cup E_2| = |E_1| + |E_2| - |E_1 \cap E_2|$.
If $n = 3$, then by drawing a Venn Diagram again, clearly we have ($\binom{3}{1}$ terms, $\binom{3}{2}$ terms, then the $\binom{3}{3}$ term):
$|E_1 \cup E_2 \cup E_3| = |E_1| + |E_2| + |E_3| - |E_1 \cap E_2| - |E_1 \cap E_3| - |E_2 \cap E_3| + |E_1 \cap E_2 \cap E_3|$.
Suppose the formula is true for $n = m$, $m \ge 3$. This means
$\left|\bigcup_{i=1}^{m} E_i\right| = \sum_{i=1}^{m} |E_i| - \sum_{1 \le i < j \le m} |E_i \cap E_j| + \cdots + (-1)^{m+1} \left|\bigcap_{i=1}^{m} E_i\right|$,
with $\binom{m}{1}, \binom{m}{2}, \ldots, \binom{m}{m}$ terms in the successive sums.
Then for $n = m + 1$, write $\bigcup_{i=1}^{m+1} E_i = \left(\bigcup_{i=1}^{m} E_i\right) \cup E_{m+1}$ and apply the case $n = 2$:
$\left|\bigcup_{i=1}^{m+1} E_i\right| = \left|\bigcup_{i=1}^{m} E_i\right| + |E_{m+1}| - \left|\left(\bigcup_{i=1}^{m} E_i\right) \cap E_{m+1}\right|$.
By the Distributive Law, $\left(\bigcup_{i=1}^{m} E_i\right) \cap E_{m+1} = \bigcup_{i=1}^{m} (E_i \cap E_{m+1})$, so the Induction Hypothesis applies to the first and to the last term. Applying it to the last term gives:
$\left|\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\right| = \sum_{i=1}^{m} |E_i \cap E_{m+1}| - \sum_{1 \le i < j \le m} |E_i \cap E_j \cap E_{m+1}| + \cdots + (-1)^{m+1} \left|\bigcap_{i=1}^{m+1} E_i\right|$.
Substituting both expansions into the equality above and collecting terms by the number of intersected Events, the single-Event terms combine into $\sum_{i=1}^{m+1} |E_i|$ ($\binom{m+1}{1}$ terms), the pairwise terms into $-\sum_{1 \le i < j \le m+1} |E_i \cap E_j|$ ($\binom{m+1}{2}$ terms), and so on, down to the single term $(-1)^{m+2} \left|\bigcap_{i=1}^{m+1} E_i\right|$ ($\binom{m+1}{m+1}$ term). This is exactly the formula for $n = m + 1$.
Therefore, by the Principle of Mathematical Induction, the General Addition Rule of Probability is true for any $n \in \mathbb{N}$.
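Theorem 8 for $n = 3$ can be checked directly on small sets with the Classical measure; the three Events below are arbitrary illustrative choices of my own.

```python
# Three arbitrary Events inside the Sample Space {1, ..., 10},
# with the Classical measure P(E) = |E| / |Omega|.
omega = set(range(1, 11))
E1, E2, E3 = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7, 8}

def P(E):
    return len(E) / len(omega)

# Left-hand side of Theorem 8 (n = 3): the probability of the union.
lhs = P(E1 | E2 | E3)

# Right-hand side: singles minus pairs plus the triple intersection.
rhs = (P(E1) + P(E2) + P(E3)
       - P(E1 & E2) - P(E1 & E3) - P(E2 & E3)
       + P(E1 & E2 & E3))

print(lhs, rhs)  # both sides agree
```

Here the union is $\{1, \ldots, 8\}$, so both sides evaluate to $8/10$; changing the three Events to any other subsets leaves the identity intact.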
Definition 18 (Mutually Exclusive Events)
Let $A, B \subseteq \Omega$, $A, B \in \mathcal{F}$. A and B form a pair of Mutually Exclusive Events if and only if $A \cap B = \emptyset$, or equivalently $P(A \cap B) = 0$.
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. They form a collection of Mutually Exclusive Events if and only if $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$, $E_i \cap E_j = \emptyset$, or equivalently $P(E_i \cap E_j) = 0$.
Theorem 9 (Addition Rule of Mutually Exclusive Events)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Mutually Exclusive Events. Then
$P\left(\bigcup_{i=1}^{k} E_i\right) = \sum_{i=1}^{k} P(E_i)$, $\forall k$, $2 \le k \le n$.
Proof
"Since $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$ we have $E_i \cap E_j = \emptyset$, it follows that every intersection of two or more distinct Events among $E_1, \ldots, E_n$ is $\emptyset$. So by Theorem 8 we have:
$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum_{1 \le i < j \le n} P(E_i \cap E_j) + \cdots = \sum_{i=1}^{n} P(E_i) - 0 + \cdots = \sum_{i=1}^{n} P(E_i)$."
Now $\forall k$, $2 \le k \le n$, set $n = k$ in "..." and repeat the same argument as "...". Hence we obtain the result.
Corollary 1 (Probability of Complement Event) (Obvious without Proof)
$\forall E \subseteq \Omega$, the Probability of the Complement Event $E^C$ is $P(E^C) = 1 - P(E)$.
Definition 19 (Conditional Probability and Multiplication Rule of Probability)
Let $A, B \subseteq \Omega$, $A, B \in \mathcal{F}$. The Probability of A given that B occurred (so B becomes the new Sample Space), called the Conditional Probability of A given B and denoted by $P(A \mid B)$, is defined by
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
In this regard, we have $\forall E \subseteq \Omega$, $P(E) = P(E \mid \Omega) = \dfrac{P(E \cap \Omega)}{P(\Omega)} = \dfrac{P(E)}{1}$.
From this definition, we obtain the Basic Multiplication Rule of Probability, that is, $P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)$, if $A, B \neq \emptyset$.
In Definition 19, if Event B occurred, then the Sample Space would reduce to B, and Event A would also reduce to $A \cap B$ within the new Sample Space B. So $P(A \mid B)$, which is the Probability of A given that B is the new Sample Space, is in fact the Probability of $A \cap B$ with respect to the new Sample Space B, or the Probability of $A \cap B$ when the Probability of B is 1. Hence $P(A \mid B)$ is defined as $\frac{P(A \cap B)}{P(B)}$.
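Definition 19 can be illustrated on a concrete Sample Space: rolling a fair die, with A = "the roll is even" and B = "the roll is greater than 3". The die example is my own, not from the text; exact rational arithmetic keeps the ratios visible.

```python
from fractions import Fraction

# Classical measure on the die's Sample Space.
omega = {1, 2, 3, 4, 5, 6}

def P(E):
    return Fraction(len(E), len(omega))

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is greater than 3"

# Conditional Probability of Definition 19: P(A|B) = P(A ∩ B) / P(B).
P_A_given_B = P(A & B) / P(B)

print(P_A_given_B)  # 2/3: within the new Sample Space B = {4,5,6},
                    # the even outcomes are {4, 6}

# Basic Multiplication Rule: P(A ∩ B) = P(B) P(A|B).
assert P(A & B) == P(B) * P_A_given_B
```

Note how conditioning literally shrinks the Sample Space to B, exactly as the paragraph above describes.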
Theorem 10 (General Multiplication Rule of Probability, Chain Rule)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. Then
$P\left(\bigcap_{i=1}^{n} E_i\right) = P\left(E_n \mid \bigcap_{i=1}^{n-1} E_i\right) \cdots P(E_4 \mid E_1 \cap E_2 \cap E_3)\, P(E_3 \mid E_1 \cap E_2)\, P(E_2 \mid E_1)\, P(E_1)$.
Proof
We are going to prove it by Mathematical Induction.
If $n = 2$, then $P(E_1 \cap E_2) = P(E_2 \mid E_1) P(E_1)$, which is just the definition of Conditional Probability.
If $n = 3$, then
$P(E_1 \cap E_2 \cap E_3) = P(E_3 \mid E_1 \cap E_2)\, P(E_1 \cap E_2) = P(E_3 \mid E_1 \cap E_2)\, P(E_2 \mid E_1)\, P(E_1)$,
which is again by the definition of Conditional Probability.
Suppose the rule is true for $n = k$, $k \ge 3$. This means
$P\left(\bigcap_{i=1}^{k} E_i\right) = P\left(E_k \mid \bigcap_{i=1}^{k-1} E_i\right) \cdots P(E_2 \mid E_1)\, P(E_1)$ (k factors).
Then for $n = k + 1$, by the definition of Conditional Probability and the Induction Hypothesis we have:
$P\left(\bigcap_{i=1}^{k+1} E_i\right) = P\left(E_{k+1} \mid \bigcap_{i=1}^{k} E_i\right) P\left(\bigcap_{i=1}^{k} E_i\right) = P\left(E_{k+1} \mid \bigcap_{i=1}^{k} E_i\right) P\left(E_k \mid \bigcap_{i=1}^{k-1} E_i\right) \cdots P(E_2 \mid E_1)\, P(E_1)$ (k + 1 factors).
Therefore, by the Principle of Mathematical Induction, the General Multiplication Rule of Probability is true for any $n \in \mathbb{N}$.
Definition 20 (Independent or Statistically Independent Events)
Let $A, B \subseteq \Omega$, $A, B \in \mathcal{F}$. A and B form a pair of Independent Events or Statistically Independent Events if and only if $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. They form a collection of Independent Events or Statistically Independent Events if and only if they are both:
1. Pairwise Independent: $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$, $E_i$ and $E_j$ form a pair of Independent Events; and
2. Mutually Independent: $\forall k$ with $1 \le k \le n - 1$ and $\forall i$, $E_i$ and $\bigcap_{j=1}^{k} E_{i_j}$, where $i_1, \ldots, i_k \in \{1, 2, \ldots, n\}$ and $i_j \neq i$, form a pair of Independent Events.
Theorem 11 (Multiplication Rule of Independent Events)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Independent Events. Then
$P\left(\bigcap_{i=1}^{m} E_i\right) = \prod_{i=1}^{m} P(E_i)$, $\forall m$, $2 \le m \le n$.
Proof
"$\forall k$ with $1 \le k \le n - 1$ and $\forall i$, $E_i$ and $\bigcap_{j=1, j \neq i}^{k} E_j$ form a pair of Independent Events, so $P\left(E_i \mid \bigcap_{j=1, j \neq i}^{k} E_j\right) = P(E_i)$. Also, $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$, $P(E_i \mid E_j) = P(E_i)$ and $P(E_j \mid E_i) = P(E_j)$. Therefore by Theorem 10 we have:
$P\left(\bigcap_{i=1}^{n} E_i\right) = P\left(E_n \mid \bigcap_{i=1}^{n-1} E_i\right) \cdots P(E_3 \mid E_1 \cap E_2)\, P(E_2 \mid E_1)\, P(E_1) = P(E_n) \cdots P(E_3)\, P(E_2)\, P(E_1) = \prod_{i=1}^{n} P(E_i)$."
Now $\forall m$, $3 \le m < n$, set $n = m$ in "..." and repeat the same argument as "...". Finally, the result for $m = 2$ is trivial. Hence we obtain the result.
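Theorem 11 can be illustrated for two Independent Events on the Sample Space of two fair coin tosses; the encoding of outcomes as pairs is my own assumption.

```python
from fractions import Fraction
from itertools import product

# Sample Space of two fair coin tosses: {("H","H"), ("H","T"), ...}.
omega = set(product("HT", repeat=2))

def P(E):
    return Fraction(len(E), len(omega))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads

# A and B are Independent, so the Multiplication Rule of
# Theorem 11 gives P(A ∩ B) = P(A) P(B).
assert P(A & B) == P(A) * P(B) == Fraction(1, 4)
```

By contrast, A and its Complement are not Independent: $P(A \cap A^C) = 0 \neq P(A) P(A^C)$, so the product rule genuinely requires Independence.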
Theorem 12 (Pair of Independent Events has pair of Independent Complements)
Let $A, B \subseteq \Omega$. If A and B form a pair of Independent Events, then so do $A^C$ and $B^C$.
Proof
We have to show that $P(A^C \mid B^C) = P(A^C)$ and $P(B^C \mid A^C) = P(B^C)$.
By Theorem 8 (General Addition Rule of Probability), we have:
$P(A^C \mid B^C) = \dfrac{P(A^C \cap B^C)}{P(B^C)} = \dfrac{P((A \cup B)^C)}{P(B^C)} = \dfrac{1 - P(A \cup B)}{1 - P(B)} = \dfrac{1 - P(A) - P(B) + P(A \cap B)}{1 - P(B)}$.
Since $P(A \cap B) = P(B) P(A \mid B) = P(B) P(A)$ by Independence, we have:
$P(A^C \mid B^C) = \dfrac{1 - P(A) - P(B) + P(A) P(B)}{1 - P(B)} = \dfrac{(1 - P(A))(1 - P(B))}{1 - P(B)} = 1 - P(A) = P(A^C)$.
So $P(A^C \mid B^C) = P(A^C)$. By reversing the roles of A and B, we obtain $P(B^C \mid A^C) = P(B^C)$. Hence we have proved the result.
Theorem 13 (Bayes Theorem)
Let $A_i \subseteq \Omega$, $i = 1, \ldots, n$, be a collection of Mutually Exclusive Events with $\bigcup_{i=1}^{n} A_i = \Omega$. Then for any $B \subseteq \Omega$,
$P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i) P(A_i)}$.
Proof
$P(A_i \mid B) = \dfrac{P(A_i \cap B)}{P(B)} = \dfrac{P(A_i \cap B)}{\sum_{i=1}^{n} P(A_i \cap B)} = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i) P(A_i)}$,
where $P(B) = \sum_{i=1}^{n} P(A_i \cap B)$ because the $A_i$ partition $\Omega$, so the Events $A_i \cap B$ are Mutually Exclusive and $\bigcup_{i=1}^{n} (A_i \cap B) = B$.
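Theorem 13 can be computed directly for a standard illustrative scenario (the numbers are my own, not from the text): a diagnostic test with $P(B \mid A_1) = 0.99$ and $P(B \mid A_2) = 0.05$, applied to a condition with prior probability $P(A_1) = 0.01$.

```python
# Partition of Omega: A1 = "has the condition", A2 = "does not".
# B = "the test is positive". All numbers are illustrative.
P_A = [0.01, 0.99]          # P(A1), P(A2): the Mutually Exclusive priors
P_B_given_A = [0.99, 0.05]  # P(B|A1), P(B|A2)

# Bayes Theorem: P(A1|B) = P(B|A1) P(A1) / sum_i P(B|Ai) P(Ai).
numerator = P_B_given_A[0] * P_A[0]
denominator = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
posterior = numerator / denominator

print(round(posterior, 4))  # about 0.1667
```

Despite the apparently accurate test, $P(A_1 \mid B) = 1/6$: the denominator of Bayes Theorem is dominated by the many false positives from the large Event $A_2$, which is exactly the weighting the formula makes explicit.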