Introduction to Probability Theory
TRANSCRIPT
2017 Michael TUNG
Introduction to Probability Theory
Probability Theory is a branch of Mathematics concerned with Mathematical Structures or Models called Probability Spaces, from which we can define an Abstract Mathematical Object called a Probability Measure, or simply a Probability, which is used to Quantify Uncertainty and so enables us to study Uncertainty. As is common in Mathematics, we will adopt the Axiomatic approach to establish Probability Theory: we start with some primitive concepts and Axioms, and from them develop the discussion of this important Mathematical Theory, which belongs both to Pure Mathematics and to Statistics.
Definition 1 (Statistical Population)
A Population U is a set, class or collection of Objects which is of interest for some Study or Experiment. A Population U can be regarded as the Universal Set in a given context.
Definition 2 (Statistical Object and Attribute)
Let U be a Population. Any $u \in U$ is called an Object.
All Objects of any given Population have some common Attributes or Properties, which are generally Unary Functions and Unary Predicates. The set of all common Attributes of all Objects of a given Population U is denoted by
$\mathbb{A} = \{f_a : U \to V_a \mid a \in A\}$,
where A is a Countable Set of Attribute Identifiers or Headers, which are simply Variable Identifiers, and $V_a$, $a \in A$, is the set of all possible values of the Attribute $f_a : U \to V_a$. Any Attribute $f_a : U \to V_a$, $a \in A$, can be referred to simply as $f_a$.
Example 1
In a study of all Mathematicians working in any World Top 50 university, the Population U would be the set of all current Faculty members of the Mathematics Departments of all World Top 50 universities. Then the set of all common Attributes could be
$\mathbb{A} = \{f_{Age}, f_{Sex}, f_{Nationality}, f_{Height}, f_{Weight}, f_{Paper\,No.}, \ldots\}$,
where $V_{Age} = \{x : 0 < x < 100\}$; $V_{Sex} = \{\text{Male}, \text{Female}\}$; $V_{Height}, V_{Weight} = \{x : x > 0\}$ and $V_{Paper\,No.} = \mathbb{N}$ (the set of all Natural Numbers).
Definition 3 (Random Experiment)
A Random Experiment is any process which:
1. has a well-defined set of possible outcomes (more than one possible outcome);
2. can generate an outcome;
3. can be infinitely repeated;
4. makes it impossible to estimate exactly which outcome will occur.
Definition 4 (Random Trial)
Any actual execution of a Random Experiment is a Random Trial.
Definition 5 (Sample Space and Outcome)
A Sample Space is the set of all possible Outcomes of a Random Experiment. It is commonly denoted by $\Omega$.
Definition 6 (Event)
Any subset E of a given Sample Space $\Omega$ is an Event. That is, an Event E is any set of possible outcomes. Note that any outcome $\omega \in \Omega$ can be regarded as the Event $\{\omega\} \subseteq \Omega$.
An Event is said to occur if and only if any outcome of that Event occurs.
We now consider the process of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from u. If after the observation we put u back into U, then this process can in theory be repeated infinitely many times, and it has a well-defined set of possible outcomes such that the observed value $v \in V_a$ is impossible to estimate exactly, due to Randomness. Therefore this process is a Random Experiment with Sample Space $\Omega = V_a$, $a \in A$.
Definition 7 (Random Experiment of randomly selecting and observing a common Attribute's value from an Object of a Population)
Let U be a Population and $\mathbb{A} = \{f_a : U \to V_a \mid a \in A\}$ be the set of all common Attributes of all Objects of U. Then the Sample Space of the Random Experiment of randomly selecting an Object $u \in U$ and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ is defined as $\Omega = V_a$, $a \in A$.
Definition 8 (Random Variable)
A Random Variable, abbreviated RV and commonly denoted by X, Y or Z, is a Function or Mapping from a Sample Space $\Omega$ to a set of Real Numbers $\mathbb{R}$. Formally, a RV can be expressed as $X : \Omega \to \mathbb{R}$ such that $X(\omega) = x$, where $\omega \in \Omega$ is a possible outcome.
A RV X is a Discrete Random Variable if and only if $X : \Omega \to \mathbb{N}$ or $\mathbb{Z}$.
A RV X is a Continuous Random Variable if and only if $X : \Omega \to \mathbb{R}$.
Therefore, a RV is simply a way to quantify all elements of a Sample Space and all elements of the Power Set of the Sample Space $\{E : E \subseteq \Omega\}$ (the set or class of all possible Events).
Since the Random Experiment of randomly selecting, with replacement, an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from the selected u has Sample Space $\Omega = V_a$, a Random Variable $X : V_a \to \mathbb{R}$ can be defined here, and hence there is a One-to-One Correspondence between a common Attribute $f_a$ and a RV X. Informally, we can regard a common Attribute $f_a$ as its RV X. So we have the following definition.
Definition 9 (Values of a Random Variable as Sample Space)
Let U be a Population. Then the Sample Space of the Random Experiment of randomly selecting an Object $u \in U$, with replacement, and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ is defined as $\Omega = X(V_a)$, where X is the RV of $f_a$, i.e. $X : V_a \to \mathbb{R}$.
Therefore, we can informally regard any common Attribute $f_a$ as its RV X. Whether X is Discrete or Continuous depends on the nature of $V_a$.
Example 2
In Example 1, for the common Attribute $f_{Nationality}$, define the Random Variable X by $X : V_{Nationality} \to \mathbb{R}$ such that X(UK) = 1; X(US) = 2; X(France) = 3; X(Germany) = 4; X(Russia) = 5; X(Japan) = 6; X(India) = 7. If we randomly select a Mathematician from the Population U, with replacement, and observe the value $v_{Nationality} \in V_{Nationality}$ of $f_{Nationality}$ from the selected Mathematician, then the Sample Space will be $\Omega = X(V_{Nationality}) = \{n \in \mathbb{N} : 1 \le n \le 7\}$.
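The Random Variable of Example 2 is nothing more than a mapping from attribute values to numbers, which can be sketched directly; the dictionary encoding below is my own choice, while the nationality codes are those of the example.

```python
# The RV X of Example 2 maps each Nationality (an attribute value in
# V_Nationality) to a Natural Number.
X = {"UK": 1, "US": 2, "France": 3, "Germany": 4,
     "Russia": 5, "Japan": 6, "India": 7}

# The Sample Space of the Random Experiment is the image X(V_Nationality).
sample_space = set(X.values())

print(sample_space)  # {1, 2, 3, 4, 5, 6, 7}
```

Any outcome of the experiment, such as observing a French Mathematician, is then reported as the number `X["France"]`.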
We know that the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ has Sample Space $\Omega = X(V_a)$, where X is the RV of $f_a$, i.e. $X : V_a \to \mathbb{R}$. Since we cannot estimate exactly which outcome $x \in \Omega$ will occur, we need a measure of the likelihood of the occurrence of an Event. This measure is called Probability. There are two main ways of assigning a Probability measure to any given Event of a Random Variable X.
Definition 10 (Theoretical Probability or Classical Probability)
If all possible outcomes of a Sample Space $\Omega$ are Equally Probable, then the Theoretical Probability or Classical Probability of any Event $E \subseteq \Omega$, denoted by $P(E)$, is defined as the ratio of the number of outcomes of E to the number of all possible outcomes. That is,
$P(E) = \dfrac{|E|}{|\Omega|}$, with $0 \le P(E) \le 1$,
where $|S|$ is the Cardinality of the Set S.
In terms of the Random Variable X, we have $P(E) = P(X(E)) = \dfrac{|X(E)|}{|X(\Omega)|}$.
Notice that Definition 10 has a critical drawback: the formula for $P(E)$ breaks down if the Sample Space is an Uncountable Set, because the Cardinality of an Uncountable Set is Mathematically at least equal to that of $\mathbb{R}$. If E is a Countable Set, then $P(E)$ is zero (since $\frac{|E|}{|\Omega|} \to 0$). However, if E is also an Uncountable Set, then we can still work out $P(E)$ accordingly, as shown in the following example.
Example 3 (Geometric Probability)
Consider a Random Experiment of a simple dart game. The dartboard has radius 9 cm, and on it there is a big bullseye with radius 3 cm. Then
$P(\text{A dart hits the bullseye}) = \dfrac{\text{Area of bullseye}}{\text{Area of dartboard}} = \dfrac{9\pi}{81\pi} = \dfrac{1}{9}$.
So Example 3 tells us that even when both the Event and the Sample Space are Uncountable Sets, like all points in a plane region or in a line segment, we can still use Definition 10 to calculate these Theoretical Probabilities, since the ratio of their sizes (areas or lengths) can be finite.
Definition 11 (Experimental Probability or Relative Frequency)
Let $\Omega$ be the Sample Space of a Random Experiment. The Experimental Probability or Relative Frequency of any Event $E \subseteq \Omega$, denoted by $P(E)$, is defined as the ratio of the number of times E occurred to the total number of Trials of the Random Experiment executed. Note that $0 \le P(E) \le 1$.
Example 4
In Example 3, consider throwing darts 1000 times such that each time the landed dart is removed from the dartboard. Assume the dartboard is smooth after each throw and that the dart landed on the bullseye 115 times. Then
$P(\text{A dart hits the bullseye}) = \dfrac{115}{1000}$.
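The two notions of Probability in Examples 3 and 4 can be compared by simulation: throw darts uniformly at random at the 9 cm dartboard and count how often they land in the 3 cm bullseye. This is a hedged sketch; the rejection-sampling method, the number of trials and the seed are my own choices, not from the text.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def throw_dart(radius=9.0):
    """Sample a landing point uniformly from the dartboard disc
    by rejection sampling from its bounding square."""
    while True:
        x = random.uniform(-radius, radius)
        y = random.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return x, y

trials = 100_000
hits = 0
for _ in range(trials):
    x, y = throw_dart()
    if x * x + y * y <= 3.0 * 3.0:       # landed inside the bullseye
        hits += 1

relative_frequency = hits / trials       # Experimental Probability (Definition 11)
theoretical = (3.0 ** 2) / (9.0 ** 2)    # Theoretical Probability 9π / 81π = 1/9

print(relative_frequency, theoretical)
```

With 100 000 trials the Relative Frequency settles very close to the Theoretical value 1/9, foreshadowing the agreement of the two definitions discussed below.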
Even though we now have two seemingly different definitions of the Probability Concept, later on we will be able to show that, for any given Sample Space, these two definitions produce the same Probability for any Event, by considering the Sampling Distribution of the Sample Mean of a Bernoulli Probability Distribution Model and applying the Central Limit Theorem.
Recall that the Sample Space of the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a common Attribute $f_a$, $a \in A$, from $u \in U$ is $\Omega = X(V_a)$, where X is the RV of $f_a$, i.e. $X : V_a \to \mathbb{R}$. Since the occurrence of any particular value of X is uncertain, the concept of Probability can be applied to define a Function or Mapping which assigns a Probability to each and every value of X. Some properties regarding the concept of Probability also apply to X.
Definition 12 (Population Distribution)
The Population Distribution of a common Attribute $f_a$, $a \in A$, or of its respective RV $X : V_a \to \mathbb{R}$, is defined as a Function or Mapping $p : X(V_a) \to [0, 1]$ such that
$p(x) = P(X = x) = P(E)$, where $E = \{v \in V_a : X(v) = x\}$,
which is the unique Theoretical Probability (i.e. the ratio of $n(E)$ to $n(V_a)$) assigned to $E \subseteq V_a$.
Example 5
Consider the common Attribute $f_{Research}$ in a Population of 1000 Mathematicians. Among these: 150 research on Analysis; 100 research on Algebra; 400 research on Differential Equation (DE); 50 research on Mathematical Logic (ML); and 300 research on Probability and Statistics (PS). Define a RV
$X : V_{Research} \to D$, with Analysis $\mapsto 1$, Algebra $\mapsto 2$, DE $\mapsto 3$, ML $\mapsto 4$, PS $\mapsto 5$, where $D = \{1, 2, 3, 4, 5\}$.
Then the Population Distribution of $f_{Research}$, or of X, is given by the Function $p : D \to [0, 1]$ such that:

Research    | Analysis | Algebra | Differential Equation | Mathematical Logic | Probability and Statistics
Probability | 0.15     | 0.1     | 0.4                   | 0.05               | 0.3
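The Population Distribution of Example 5 is just the Theoretical Probability $n(E)/n(V_a)$ of Definition 12 computed from the field counts; a minimal sketch (the dictionary layout is my own encoding):

```python
# Research-field counts from Example 5 (a Population of 1000 Mathematicians).
counts = {"Analysis": 150, "Algebra": 100, "Differential Equation": 400,
          "Mathematical Logic": 50, "Probability and Statistics": 300}

n = sum(counts.values())  # total number of Objects in the Population

# Population Distribution: each probability is the ratio n(E) / n(V_a)
# of Definition 12.
p = {field: c / n for field, c in counts.items()}

print(p["Analysis"], p["Differential Equation"])  # 0.15 0.4
```

The probabilities necessarily sum to 1, since the field counts partition the Population.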
The Population Distribution is theoretically an Empirical Distribution in which all probabilities are computed by the Theoretical Probability approach. But in practice it is impossible, or very difficult, to obtain the Empirical Population Distribution, since we can only have limited knowledge about the entire Population, which is usually an Infinite Set; so a Hypothetical Distribution is usually suggested instead.
Definition 13 (Probability Distribution of a Discrete Random Variable)
Let the Sample Space of the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a Discrete common Attribute $f_a$, $a \in A$, from $u \in U$ (i.e. $V_a$ is a set of Discrete values) be $\Omega = X(V_a) \subseteq \mathbb{N}$ or $\mathbb{Z}$, where X is a Discrete Random Variable of $f_a$.
Then we define:
1. The Probability Distribution of the Discrete Random Variable X, or the Probability Mass Function (PMF) of X, by $f_X : \Omega \to [0, 1]$ such that
$f_X(x) = P(X = x) = P(E)$, where $E = \{v \in V_a : X(v) = x\}$,
which satisfies:
a. $\forall x \in \Omega$, $0 \le f_X(x) \le 1$;
b. $P(V_a) = P(\Omega) = \sum_{x \in \Omega} f_X(x) = 1$;
c. $\forall E \subseteq V_a$, $P(E) = P(X(E)) = \sum_{x \in X(E)} f_X(x)$.
2. The Cumulative Distribution Function (CDF) of the Discrete Random Variable X, $F_X : \Omega \to [0, 1]$, which is an Increasing Function defined on $\Omega$ by
$F_X(x) = P(X \le x) = P(E) = \sum_{t \le x} f_X(t)$, where $E = \{v \in V_a : X(v) \le x\}$.
Definition 14 (General Probability Distribution of a Discrete Random Variable)
Let $X : \Omega \to \mathbb{R}$ be a Discrete Random Variable, where $\Omega$ is a Sample Space. Then we define:
1. The Probability Distribution of the Discrete Random Variable X, or the Probability Mass Function (PMF) of X, by $f_X : X(\Omega) \to [0, 1]$ such that
$f_X(x) = P(X = x) = P(E)$, where $E = \{\omega \in \Omega : X(\omega) = x\}$,
which satisfies:
a. $\forall x \in X(\Omega)$, $0 \le f_X(x) \le 1$;
b. $P(\Omega) = P(X(\Omega)) = \sum_{x \in X(\Omega)} f_X(x) = 1$;
c. $\forall E \subseteq \Omega$, $P(E) = P(X(E)) = \sum_{x \in X(E)} f_X(x)$.
2. The Cumulative Distribution Function (CDF) of the Discrete Random Variable X, $F_X : X(\Omega) \to [0, 1]$, which is an Increasing Function defined on $X(\Omega)$ by
$F_X(x) = P(X \le x) = P(E) = \sum_{t \le x} f_X(t)$, where $E = \{\omega \in \Omega : X(\omega) \le x\}$.
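The PMF and CDF of Definition 14 can be made concrete for a fair six-sided die; the die is my own illustrative choice, not an example from the text. Exact rational arithmetic keeps the properties (a)-(c) visible without rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: f_X(x) = P(X = x) = 1/6 for x in X(Omega).
outcomes = range(1, 7)
f = {x: Fraction(1, 6) for x in outcomes}

def F(x):
    """CDF of Definition 14: F_X(x) = sum of f_X(t) over all t <= x."""
    return sum(p for t, p in f.items() if t <= x)

assert sum(f.values()) == 1        # property (b): total probability is 1
assert F(3) == Fraction(1, 2)      # P(X <= 3) = 3/6
assert F(6) == 1                   # the CDF reaches 1 at the largest value
```

Property (c) follows the same pattern: the probability of any Event is the sum of `f[x]` over the values x that the Event maps to.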
Definition 15 (Probability Distribution of a Continuous Random Variable)
Let the Sample Space of the Random Experiment of randomly selecting an Object u from a given Population U and observing the value $v \in V_a$ of a Continuous common Attribute $f_a$, $a \in A$, from $u \in U$ (i.e. $V_a$ is a set of Continuous values) be $\Omega = X(V_a)$, where X is a Continuous Random Variable of $f_a$.
Then we define:
1. A One-to-One Correspondence $F_X : \Omega \to [0, 1]$, called the Cumulative Distribution Function (CDF) of the Continuous Random Variable X, by
$F_X(x) = P(X \le x) = P(E)$, where $E = \{v \in V_a : X(v) \le x\}$,
such that:
a. $F_X$ is Increasing. That is, $\forall x_1, x_2 \in \Omega$, $x_1 < x_2 \Rightarrow F_X(x_1) < F_X(x_2)$;
b. $\lim_{x \to -\infty} F_X(x) = 0$;
c. $\lim_{x \to +\infty} F_X(x) = 1$.
2. If $F_X$ is Differentiable on $\Omega$ such that $\frac{d}{dx} F_X(x) = f_X(x)$ is Continuous on $\Omega$ and $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$, then $f_X(x) = \frac{d}{dx} F_X(x)$ is defined as the Probability Distribution of the Continuous Random Variable X, or the Probability Density Function (PDF) of X. The PDF $f_X : \Omega \to [0, \infty)$ satisfies
$\forall E \subseteq V_a$, $P(E) = P(X(E)) = \int_{X(E)} f_X(x)\, dx = \int_{X(E)} dF_X(x)$.
Definition 16 (General Probability Distribution of a Continuous RV)
In general, let $X : \Omega \to \mathbb{R}$ be a Continuous Random Variable, where $\Omega$ is a Sample Space. Then we define:
1. A One-to-One Correspondence $F_X : X(\Omega) \to [0, 1]$, called the Cumulative Distribution Function (CDF) of the Continuous Random Variable X, by
$F_X(x) = P(X \le x) = P(E)$, where $E = \{\omega \in \Omega : X(\omega) \le x\}$,
such that:
a. $F_X$ is Increasing. That is, $\forall x_1, x_2 \in X(\Omega)$, $x_1 < x_2 \Rightarrow F_X(x_1) < F_X(x_2)$;
b. $\lim_{x \to -\infty} F_X(x) = 0$;
c. $\lim_{x \to +\infty} F_X(x) = 1$.
2. If $F_X$ is Differentiable on $X(\Omega)$ such that $\frac{d}{dx} F_X(x) = f_X(x)$ is Continuous on $X(\Omega)$ and $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$, then $f_X(x) = \frac{d}{dx} F_X(x)$ is defined as the Probability Distribution of the Continuous Random Variable X, or the Probability Density Function (PDF) of X. The PDF $f_X : X(\Omega) \to [0, \infty)$ satisfies
$\forall E \subseteq \Omega$, $P(E) = P(X(E)) = \int_{X(E)} f_X(x)\, dx = \int_{X(E)} dF_X(x)$.
Theorem 1 (Obvious without Proof)
If the PDF of a Continuous Random Variable X is $f_X : X(\Omega) \to [0, \infty)$, then
$\forall [a, b] \subseteq X(\Omega)$, $P(E) = P(X(E)) = \int_{X(E)} f_X(x)\, dx = \int_{a}^{b} f_X(x)\, dx$, where $E = \{\omega \in \Omega : X(\omega) \in [a, b]\}$.
Theorem 2
If the PDF of a Continuous Random Variable X is $f_X : X(\Omega) \to [0, \infty)$, then for any $x \in X(\Omega)$,
$P(X = x) = P(E) = 0$, where $E = \{\omega \in \Omega : X(\omega) = x\}$.
Proof
Since $X(E) = \{x\}$, we have
$P(X = x) = P(E) = \int_{X(E)} f_X(t)\, dt = \int_{x}^{x} f_X(t)\, dt = 0$.
Theorem 3
If the PDF of a Continuous Random Variable X is $f_X : X(\Omega) \to [0, \infty)$, then
$P(\Omega) = P(X(\Omega)) = \int_{X(\Omega)} f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x)\, dx = 1$.
Proof
Let $b \in \mathbb{R}$, $b > 0$. Since $X(\Omega)$ can be at most $(-\infty, \infty)$, we have:
$P(\Omega) = P(X(\Omega)) = \int_{X(\Omega)} f_X(x)\, dx = \lim_{b \to \infty} \int_{-b}^{b} f_X(x)\, dx = \lim_{b \to \infty} [F_X(b) - F_X(-b)] = \lim_{b \to \infty} F_X(b) - \lim_{b \to \infty} F_X(-b) = 1 - 0 = 1$.
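Theorem 3 can be checked numerically for a concrete PDF. The exponential density $f(x) = e^{-x}$ on $[0, \infty)$ is my own illustrative choice, not from the text; the integral is approximated by a simple Riemann sum, truncated at a large $b$ exactly as in the limit argument of the proof.

```python
import math

# Illustrative PDF of a Continuous RV: f_X(x) = e^{-x} for x >= 0, else 0.
def f(x):
    return math.exp(-x) if x >= 0 else 0.0

# Riemann-sum approximation of the integral over X(Omega) = [0, oo),
# truncated at b = 50 (the tail beyond b is negligibly small).
b, n = 50.0, 500_000
dx = b / n
total = sum(f(i * dx) for i in range(n)) * dx

print(total)  # approximately 1, as Theorem 3 requires
```

Shrinking `dx` (or widening `b`) pushes the sum arbitrarily close to the exact value 1.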
In the most general case, for any Discrete Sample Space $\Omega$, we have the following Mathematical Structure or Model, called a Probability Space, which satisfies all Axioms and Theorems of Probability Theory; it provides the most general concept of a Discrete Probability Distribution or Discrete Probability Measure.
Definition 17 (Probability Space, Model of Probability Theory)
A Two-sorted Mathematical Structure $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$, where:
1. the Domain $\Omega$, called the Sample Space, is a set or class of objects called Outcomes;
2. the Domain $\mathcal{F} \subseteq \{X : X \subseteq \Omega\}$ is a set of Events;
3. $\setminus, \cup, \cap$ are the usual Difference, Union and Intersection Operations of Sets, defined on $\mathcal{F}$;
4. $P : \mathcal{F} \to [0, 1]$ is a Real-valued Function called the Probability Measure,
is an Abstract Mathematical Structure called a Probability Space if and only if it satisfies:
1. $\Omega \in \mathcal{F}$;
2. $\forall E \in \mathcal{F}$, its Complement Event $E^C = \Omega \setminus E \in \mathcal{F}$;
3. $\forall E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, $\bigcup_{i=1}^{\infty} E_i \in \mathcal{F}$;
4. $P(\Omega) = 1$;
5. $\forall E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, with $E_j \cap E_k = \emptyset$ for all $j \neq k$, $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$.
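A tiny finite instance of Definition 17 can be checked programmatically: take $\Omega$ for a fair coin toss, let $\mathcal{F}$ be the full Power Set, and let P be the Classical measure of Definition 10. The `frozenset` encoding is my own assumption; the axioms checked are those of the definition (with axiom 3 reduced to finite unions, since the structure is finite).

```python
from itertools import combinations

# A small finite model of Definition 17: a fair coin toss.
omega = frozenset({"H", "T"})

# Event set F: the full Power Set of omega.
F = set()
for r in range(len(omega) + 1):
    for subset in combinations(omega, r):
        F.add(frozenset(subset))

def P(event):
    """Classical probability measure P(E) = |E| / |Omega|."""
    return len(event) / len(omega)

# Axioms of Definition 17:
assert omega in F                                    # 1. Omega is an Event
assert all(omega - E in F for E in F)                # 2. closed under Complement
assert all(E1 | E2 in F for E1 in F for E2 in F)     # 3. closed under Union (finite case)
assert P(omega) == 1.0                               # 4. P(Omega) = 1
heads, tails = frozenset({"H"}), frozenset({"T"})
assert P(heads | tails) == P(heads) + P(tails)       # 5. additivity for disjoint Events
```

The same pattern scales to any finite Sample Space; only the Uncountable case requires the full machinery of Measure Spaces mentioned earlier.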
Theorem 4
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $P(\emptyset) = 0$.
Proof
$P(\Omega) = P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset) \Rightarrow 1 = 1 + P(\emptyset) \Rightarrow P(\emptyset) = 0$.
Theorem 5
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $\forall E \in \mathcal{F}$, $P(E) \ge 0$.
Proof
This is obvious from the definition of the Probability Measure $P : \mathcal{F} \to [0, 1]$.
Theorem 6
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $\forall E_1, E_2 \in \mathcal{F}$ with $E_1 \subseteq E_2$, we have $P(E_1) \le P(E_2)$.
Proof (Countable Case)
Let $E_1 = \{\omega_1, \omega_2, \ldots, \omega_m\}$ and $E_2 = \{\omega_1, \omega_2, \ldots, \omega_m, \ldots, \omega_n\}$. Then both Events can be expressed as $E_1 = \bigcup_{i=1}^{m} \{\omega_i\}$ and $E_2 = \bigcup_{i=1}^{n} \{\omega_i\}$, where $\omega_i \in \Omega$, $i = 1, \ldots, n$, and $m \le n$. Notice that for these outcomes we have $\{\omega_j\} \cap \{\omega_k\} = \emptyset$ for all $j \neq k$.
So $P(E_1) = \sum_{i=1}^{m} P(\{\omega_i\}) \le \sum_{i=1}^{n} P(\{\omega_i\}) = P(E_2)$.
For the Uncountable Case, we have to invoke the Mathematical Structure of a Measure Space.
Theorem 7
Let $M = (\Omega, \mathcal{F}, \setminus, \cup, \cap, P)$ be a Probability Space. Then $\forall E \in \mathcal{F}$, we have $0 \le P(E) \le 1$.
Proof
By Theorem 5, we have $P(E) \ge 0$. Now by Theorem 6, since $E \subseteq \Omega$ and $\Omega \in \mathcal{F}$, we have $P(E) \le P(\Omega) = 1$. Hence $0 \le P(E) \le 1$.
Theorem 8 (General Addition Rule of Probability, Inclusion-Exclusion Principle)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. Then
$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum_{1 \le i < j \le n} P(E_i \cap E_j) + \cdots + (-1)^{n+1} P\left(\bigcap_{i=1}^{n} E_i\right)$.
Proof
Notice that it is not difficult to see that the Countable Union on the Left Hand Side can be recovered from the Sets on the Right Hand Side by Set Operations and a Venn Diagram. So we only need to prove that both sides have the same Cardinalities:
$\left|\bigcup_{i=1}^{n} E_i\right| = \sum_{i=1}^{n} |E_i| - \sum_{1 \le i < j \le n} |E_i \cap E_j| + \cdots + (-1)^{n+1} \left|\bigcap_{i=1}^{n} E_i\right|$.
We are going to prove it by Mathematical Induction.
If $n = 2$, then by drawing a Venn Diagram we have ($\binom{2}{1}$ terms, then the $\binom{2}{2}$ term):
$|E_1 \cup E_2| = |E_1| + |E_2| - |E_1 \cap E_2|$.
If $n = 3$, then by drawing a Venn Diagram again, clearly we have ($\binom{3}{1}$ terms, $\binom{3}{2}$ terms, then the $\binom{3}{3}$ term):
$|E_1 \cup E_2 \cup E_3| = |E_1| + |E_2| + |E_3| - |E_1 \cap E_2| - |E_1 \cap E_3| - |E_2 \cap E_3| + |E_1 \cap E_2 \cap E_3|$.
Suppose the formula is true for $n = m$, $m \ge 3$. This means
$\left|\bigcup_{i=1}^{m} E_i\right| = \sum_{i=1}^{m} |E_i| - \sum_{1 \le i < j \le m} |E_i \cap E_j| + \cdots + (-1)^{m+1} \left|\bigcap_{i=1}^{m} E_i\right|$,
with $\binom{m}{1}, \binom{m}{2}, \ldots, \binom{m}{m}$ terms in the successive sums.
Then for $n = m + 1$, write $\bigcup_{i=1}^{m+1} E_i = \left(\bigcup_{i=1}^{m} E_i\right) \cup E_{m+1}$ and apply the case $n = 2$:
$\left|\bigcup_{i=1}^{m+1} E_i\right| = \left|\bigcup_{i=1}^{m} E_i\right| + |E_{m+1}| - \left|\left(\bigcup_{i=1}^{m} E_i\right) \cap E_{m+1}\right|$.
By the Distributive Law, $\left(\bigcup_{i=1}^{m} E_i\right) \cap E_{m+1} = \bigcup_{i=1}^{m} (E_i \cap E_{m+1})$, so the Induction Hypothesis applies to the first and to the last term. Applying it to the last term gives:
$\left|\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\right| = \sum_{i=1}^{m} |E_i \cap E_{m+1}| - \sum_{1 \le i < j \le m} |E_i \cap E_j \cap E_{m+1}| + \cdots + (-1)^{m+1} \left|\bigcap_{i=1}^{m+1} E_i\right|$.
Substituting both expansions into the equality above and collecting terms by the number of intersected Events, the single-Event terms combine into $\sum_{i=1}^{m+1} |E_i|$ ($\binom{m+1}{1}$ terms), the pairwise terms into $-\sum_{1 \le i < j \le m+1} |E_i \cap E_j|$ ($\binom{m+1}{2}$ terms), and so on, down to the single term $(-1)^{m+2} \left|\bigcap_{i=1}^{m+1} E_i\right|$ ($\binom{m+1}{m+1}$ term). This is exactly the formula for $n = m + 1$.
Therefore, by the Principle of Mathematical Induction, the General Addition Rule of Probability is true for any $n \in \mathbb{N}$.
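Theorem 8 for $n = 3$ can be checked directly on small sets with the Classical measure; the three Events below are arbitrary illustrative choices of my own.

```python
# Three arbitrary Events inside the Sample Space {1, ..., 10},
# with the Classical measure P(E) = |E| / |Omega|.
omega = set(range(1, 11))
E1, E2, E3 = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7, 8}

def P(E):
    return len(E) / len(omega)

# Left-hand side of Theorem 8 (n = 3): the probability of the union.
lhs = P(E1 | E2 | E3)

# Right-hand side: singles minus pairs plus the triple intersection.
rhs = (P(E1) + P(E2) + P(E3)
       - P(E1 & E2) - P(E1 & E3) - P(E2 & E3)
       + P(E1 & E2 & E3))

print(lhs, rhs)  # both sides agree
```

Here the union is $\{1, \ldots, 8\}$, so both sides evaluate to $8/10$; changing the three Events to any other subsets leaves the identity intact.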
Definition 18 (Mutually Exclusive Events)
Let $A, B \subseteq \Omega$, $A, B \in \mathcal{F}$. A and B form a pair of Mutually Exclusive Events if and only if $A \cap B = \emptyset$, or equivalently $P(A \cap B) = 0$.
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. They form a collection of Mutually Exclusive Events if and only if $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$, $E_i \cap E_j = \emptyset$, or equivalently $P(E_i \cap E_j) = 0$.
Theorem 9 (Addition Rule of Mutually Exclusive Events)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Mutually Exclusive Events. Then
$P\left(\bigcup_{i=1}^{k} E_i\right) = \sum_{i=1}^{k} P(E_i)$, $\forall k$, $2 \le k \le n$.
Proof
"Since $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$ we have $E_i \cap E_j = \emptyset$, it follows that every intersection of two or more distinct Events among $E_1, \ldots, E_n$ is $\emptyset$. So by Theorem 8 we have:
$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum_{1 \le i < j \le n} P(E_i \cap E_j) + \cdots = \sum_{i=1}^{n} P(E_i) - 0 + \cdots = \sum_{i=1}^{n} P(E_i)$."
Now $\forall k$, $2 \le k \le n$, set $n = k$ in "..." and repeat the same argument as "...". Hence we obtain the result.
Corollary 1 (Probability of Complement Event) (Obvious without Proof)
$\forall E \subseteq \Omega$, the Probability of the Complement Event $E^C$ is $P(E^C) = 1 - P(E)$.
Definition 19 (Conditional Probability and Multiplication Rule of Probability)
Let $A, B \subseteq \Omega$, $A, B \in \mathcal{F}$. The Probability of A given that B occurred (so B becomes the new Sample Space), called the Conditional Probability of A given B and denoted by $P(A \mid B)$, is defined by
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
In this regard, we have $\forall E \subseteq \Omega$, $P(E) = P(E \mid \Omega) = \dfrac{P(E \cap \Omega)}{P(\Omega)} = \dfrac{P(E)}{1}$.
From this definition, we obtain the Basic Multiplication Rule of Probability, that is, $P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)$, if $A, B \neq \emptyset$.
In Definition 19, if Event B occurred, then the Sample Space would reduce to B, and Event A would also reduce to $A \cap B$ within the new Sample Space B. So $P(A \mid B)$, which is the Probability of A given that B is the new Sample Space, is in fact the Probability of $A \cap B$ with respect to the new Sample Space B, or the Probability of $A \cap B$ when the Probability of B is 1. Hence $P(A \mid B)$ is defined as $\frac{P(A \cap B)}{P(B)}$.
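Definition 19 can be illustrated on a concrete Sample Space: rolling a fair die, with A = "the roll is even" and B = "the roll is greater than 3". The die example is my own, not from the text; exact rational arithmetic keeps the ratios visible.

```python
from fractions import Fraction

# Classical measure on the die's Sample Space.
omega = {1, 2, 3, 4, 5, 6}

def P(E):
    return Fraction(len(E), len(omega))

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is greater than 3"

# Conditional Probability of Definition 19: P(A|B) = P(A ∩ B) / P(B).
P_A_given_B = P(A & B) / P(B)

print(P_A_given_B)  # 2/3: within the new Sample Space B = {4,5,6},
                    # the even outcomes are {4, 6}

# Basic Multiplication Rule: P(A ∩ B) = P(B) P(A|B).
assert P(A & B) == P(B) * P_A_given_B
```

Note how conditioning literally shrinks the Sample Space to B, exactly as the paragraph above describes.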
Theorem 10 (General Multiplication Rule of Probability, Chain Rule)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. Then
$P\left(\bigcap_{i=1}^{n} E_i\right) = P\left(E_n \mid \bigcap_{i=1}^{n-1} E_i\right) \cdots P(E_4 \mid E_1 \cap E_2 \cap E_3)\, P(E_3 \mid E_1 \cap E_2)\, P(E_2 \mid E_1)\, P(E_1)$.
Proof
We are going to prove it by Mathematical Induction.
If $n = 2$, then $P(E_1 \cap E_2) = P(E_2 \mid E_1) P(E_1)$, which is just the definition of Conditional Probability.
If $n = 3$, then
$P(E_1 \cap E_2 \cap E_3) = P(E_3 \mid E_1 \cap E_2)\, P(E_1 \cap E_2) = P(E_3 \mid E_1 \cap E_2)\, P(E_2 \mid E_1)\, P(E_1)$,
which is again by the definition of Conditional Probability.
Suppose the rule is true for $n = k$, $k \ge 3$. This means
$P\left(\bigcap_{i=1}^{k} E_i\right) = P\left(E_k \mid \bigcap_{i=1}^{k-1} E_i\right) \cdots P(E_2 \mid E_1)\, P(E_1)$ (k factors).
Then for $n = k + 1$, by the definition of Conditional Probability and the Induction Hypothesis we have:
$P\left(\bigcap_{i=1}^{k+1} E_i\right) = P\left(E_{k+1} \mid \bigcap_{i=1}^{k} E_i\right) P\left(\bigcap_{i=1}^{k} E_i\right) = P\left(E_{k+1} \mid \bigcap_{i=1}^{k} E_i\right) P\left(E_k \mid \bigcap_{i=1}^{k-1} E_i\right) \cdots P(E_2 \mid E_1)\, P(E_1)$ (k + 1 factors).
Therefore, by the Principle of Mathematical Induction, the General Multiplication Rule of Probability is true for any $n \in \mathbb{N}$.
Definition 20 (Independent or Statistically Independent Events)
Let $A, B \subseteq \Omega$, $A, B \in \mathcal{F}$. A and B form a pair of Independent Events or Statistically Independent Events if and only if $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Events. They form a collection of Independent Events or Statistically Independent Events if and only if they are both:
1. Pairwise Independent: $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$, $E_i$ and $E_j$ form a pair of Independent Events; and
2. Mutually Independent: $\forall k$ with $1 \le k \le n - 1$ and $\forall i$, $E_i$ and $\bigcap_{j=1}^{k} E_{i_j}$, where $i_1, \ldots, i_k \in \{1, 2, \ldots, n\}$ and $i_j \neq i$, form a pair of Independent Events.
Theorem 11 (Multiplication Rule of Independent Events)
Let $E_1, E_2, \ldots, E_n \subseteq \Omega$ be any n Independent Events. Then
$P\left(\bigcap_{i=1}^{m} E_i\right) = \prod_{i=1}^{m} P(E_i)$, $\forall m$, $2 \le m \le n$.
Proof
"$\forall k$ with $1 \le k \le n - 1$ and $\forall i$, $E_i$ and $\bigcap_{j=1, j \neq i}^{k} E_j$ form a pair of Independent Events, so $P\left(E_i \mid \bigcap_{j=1, j \neq i}^{k} E_j\right) = P(E_i)$. Also, $\forall i, j$ with $1 \le i, j \le n$ and $i \neq j$, $P(E_i \mid E_j) = P(E_i)$ and $P(E_j \mid E_i) = P(E_j)$. Therefore by Theorem 10 we have:
$P\left(\bigcap_{i=1}^{n} E_i\right) = P\left(E_n \mid \bigcap_{i=1}^{n-1} E_i\right) \cdots P(E_3 \mid E_1 \cap E_2)\, P(E_2 \mid E_1)\, P(E_1) = P(E_n) \cdots P(E_3)\, P(E_2)\, P(E_1) = \prod_{i=1}^{n} P(E_i)$."
Now $\forall m$, $3 \le m < n$, set $n = m$ in "..." and repeat the same argument as "...". Finally, the result for $m = 2$ is trivial. Hence we obtain the result.
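Theorem 11 can be illustrated for two Independent Events on the Sample Space of two fair coin tosses; the encoding of outcomes as pairs is my own assumption.

```python
from fractions import Fraction
from itertools import product

# Sample Space of two fair coin tosses: {("H","H"), ("H","T"), ...}.
omega = set(product("HT", repeat=2))

def P(E):
    return Fraction(len(E), len(omega))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads

# A and B are Independent, so the Multiplication Rule of
# Theorem 11 gives P(A ∩ B) = P(A) P(B).
assert P(A & B) == P(A) * P(B) == Fraction(1, 4)
```

By contrast, A and its Complement are not Independent: $P(A \cap A^C) = 0 \neq P(A) P(A^C)$, so the product rule genuinely requires Independence.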
Theorem 12 (Pair of Independent Events has pair of Independent Complements)
Let $A, B \subseteq \Omega$. If A and B form a pair of Independent Events, then so do $A^C$ and $B^C$.
Proof
We have to show that $P(A^C \mid B^C) = P(A^C)$ and $P(B^C \mid A^C) = P(B^C)$.
By Theorem 8 (General Addition Rule of Probability), we have:
$P(A^C \mid B^C) = \dfrac{P(A^C \cap B^C)}{P(B^C)} = \dfrac{P((A \cup B)^C)}{P(B^C)} = \dfrac{1 - P(A \cup B)}{1 - P(B)} = \dfrac{1 - P(A) - P(B) + P(A \cap B)}{1 - P(B)}$.
Since $P(A \cap B) = P(B) P(A \mid B) = P(B) P(A)$ by Independence, we have:
$P(A^C \mid B^C) = \dfrac{1 - P(A) - P(B) + P(A) P(B)}{1 - P(B)} = \dfrac{(1 - P(A))(1 - P(B))}{1 - P(B)} = 1 - P(A) = P(A^C)$.
So $P(A^C \mid B^C) = P(A^C)$. By reversing the roles of A and B, we obtain $P(B^C \mid A^C) = P(B^C)$. Hence we have proved the result.
Theorem 13 (Bayes Theorem)
Let $A_i \subseteq \Omega$, $i = 1, \ldots, n$, be a collection of Mutually Exclusive Events with $\bigcup_{i=1}^{n} A_i = \Omega$. Then for any $B \subseteq \Omega$,
$P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i) P(A_i)}$.
Proof
$P(A_i \mid B) = \dfrac{P(A_i \cap B)}{P(B)} = \dfrac{P(A_i \cap B)}{\sum_{i=1}^{n} P(A_i \cap B)} = \dfrac{P(B \mid A_i) P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i) P(A_i)}$,
where $P(B) = \sum_{i=1}^{n} P(A_i \cap B)$ because the $A_i$ partition $\Omega$, so the Events $A_i \cap B$ are Mutually Exclusive and $\bigcup_{i=1}^{n} (A_i \cap B) = B$.
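Theorem 13 can be computed directly for a standard illustrative scenario (the numbers are my own, not from the text): a diagnostic test with $P(B \mid A_1) = 0.99$ and $P(B \mid A_2) = 0.05$, applied to a condition with prior probability $P(A_1) = 0.01$.

```python
# Partition of Omega: A1 = "has the condition", A2 = "does not".
# B = "the test is positive". All numbers are illustrative.
P_A = [0.01, 0.99]          # P(A1), P(A2): the Mutually Exclusive priors
P_B_given_A = [0.99, 0.05]  # P(B|A1), P(B|A2)

# Bayes Theorem: P(A1|B) = P(B|A1) P(A1) / sum_i P(B|Ai) P(Ai).
numerator = P_B_given_A[0] * P_A[0]
denominator = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
posterior = numerator / denominator

print(round(posterior, 4))  # about 0.1667
```

Despite the apparently accurate test, $P(A_1 \mid B) = 1/6$: the denominator of Bayes Theorem is dominated by the many false positives from the large Event $A_2$, which is exactly the weighting the formula makes explicit.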