
1: PROBABILITY REVIEW

Marek Rutkowski
School of Mathematics and Statistics

University of Sydney

Semester 2, 2016


Outline

We will review the following notions:

1. Discrete Random Variables
2. Expectation of a Random Variable
3. Equivalence of Probability Measures
4. Variance of a Random Variable
5. Continuous Random Variables
6. Examples of Probability Distributions
7. Correlation of Random Variables
8. Conditional Distributions and Expectations


PART 1

DISCRETE RANDOM VARIABLES


Sample Space

We collect the possible states of the world and denote this set by Ω. The states are called samples or elementary events.

The sample space Ω is either countable or uncountable.

A toss of a coin: Ω = {Head, Tail} = {H, T}.
The number of successes in a sequence of n identical and independent trials: Ω = {0, 1, . . . , n}.
The moment of occurrence of the first success in an infinite sequence of identical and independent trials: Ω = {1, 2, . . . }.
The lifetime of a light bulb: Ω = {x ∈ R | x ≥ 0}.

The choice of a sample space is arbitrary and thus any set can be taken as a sample space. However, practical considerations justify the choice of the most convenient sample space for the problem at hand. Discrete (finite or infinite, but countable) sample spaces are easier to handle than general sample spaces.


Discrete Random Variables

Examples of random variables:

Prices of listed stocks.
Fluctuations of currency exchange rates.
Payoffs corresponding to portfolios of traded assets.

Definition (Discrete Random Variable)

A real-valued map X : Ω → R on a discrete sample space Ω = (ω_k)_{k∈I}, where the index set I is countable, is called a discrete random variable.

Definition (Probability)

A map P : Ω → [0, 1] is called a probability on a discrete sample space Ω if the following conditions are satisfied:
P.1. P(ω_k) ≥ 0 for all k ∈ I,
P.2. ∑_{k∈I} P(ω_k) = 1.


Probability Measure

Let F = 2^Ω stand for the class of all subsets of the sample space Ω.
The set 2^Ω is called the power set of Ω.
Note that the empty set ∅ also belongs to any power set.
Any set A ∈ F is called an event (or a random event).

Definition (Probability Measure)

A map P : F → [0, 1] is called a probability measure on (Ω, F) if
M.1. For any sequence A_i ∈ F, i = 1, 2, . . . , of events such that A_i ∩ A_j = ∅ for all i ≠ j, we have
P(∪_{i=1}^{∞} A_i) = ∑_{i=1}^{∞} P(A_i).
M.2. P(Ω) = 1.

For any A ∈ F the equality P(Ω \ A) = 1 − P(A) holds.

Probability Measure on a Discrete Sample Space

Note that a probability P : Ω → [0, 1] on a discrete sample space Ω uniquely specifies the probabilities of all elementary events A_k = {ω_k}. It is common to write P({ω_k}) = P(ω_k) = p_k.

The following result shows that any probability on a discrete sample space Ω generates a unique probability measure on (Ω, F).

Theorem

Let P : Ω → [0, 1] be a probability on a discrete sample space Ω. Then the unique probability measure on (Ω, F) generated by P satisfies, for all A ∈ F,
P(A) = ∑_{ω_k∈A} P(ω_k).

The proof of the theorem is easy since Ω is assumed to be discrete.
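To make the construction concrete, here is a minimal Python sketch (added for illustration, not part of the original slides): the measure of an event is simply the sum of the probabilities of the samples it contains. The probabilities are those of the two-coin-toss Example 1.2 below; the helper name prob is my own.

```python
# Probability on a discrete sample space: omega = number of heads in two tosses of a fair coin.
p = {0: 0.25, 1: 0.5, 2: 0.25}

def prob(event, p):
    """Induced measure of an event A (a set of samples): P(A) = sum of P(omega) over omega in A."""
    return sum(p[omega] for omega in event)

assert abs(sum(p.values()) - 1.0) < 1e-12   # condition P.2
print(prob({0, 2}, p))                      # P(even number of heads) = 0.5
print(prob(set(p), p))                      # P(Omega) = 1.0
```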


Example: Coin Flipping

Example (1.2)

Let X be the number of “heads” appearing when a fair coin is tossed twice. We choose the sample space Ω to be
Ω = {0, 1, 2}
where a number k ∈ Ω represents the number of times “head” has occurred.

The probability measure P on Ω is defined as

P(k) =
    0.25, if k = 0 or k = 2,
    0.50, if k = 1.
We recognise here the binomial distribution with n = 2 and p = 0.5. Note that a single flip of a coin is a Bernoulli trial.


Example: Coin Flipping

Example (1.2 Continued)

We now suppose that the coin is not a fair one.

Let the probability of “head” be p for some p ≠ 1/2.
Then the probability measure P is given by
P(k) =
    q², if k = 0,
    2pq, if k = 1,
    p², if k = 2,
where q := 1 − p is the probability of “tail” appearing.
We deal here with the binomial distribution with n = 2 and 0 < p < 1.


PART 2

EXPECTATION OF A RANDOM VARIABLE


Expectation of a Random Variable

Definition (Expectation)

Let X be a random variable on a discrete sample space Ω endowed with a probability measure P. The expectation (expected value or mean value) of X is given by
EP(X) = µ := ∑_{k∈I} X(ω_k) P(ω_k) = ∑_{k∈I} x_k p_k.

EP (·) is called the expectation operator over the probability P.

Note that the expectation of a random variable can be seen as the weighted average of its possible values.

It is impossible to know which event will occur in the future, and thus the expectation is useful for making decisions when the probabilities of future outcomes are known.


Expectation Operator

Any random variable defined on a finite sample space Ω has a well-defined expectation.

When the set Ω is countable (but infinite), we say that X is P-integrable whenever EP(|X|) = ∑_{ω∈Ω} |X(ω)| P(ω) < ∞.

In that case the expectation EP (X) is well defined (and finite).

Theorem (1.1)

Let X and Y be two P-integrable random variables and P be a probability measure on a discrete sample space Ω. Then for all α, β ∈ R

EP (αX + βY ) = αEP (X) + β EP (Y ) .

Hence EP(·) : X → R is a linear operator on the space X of P-integrable random variables.


Expectation Operator

Proof of Theorem 1.1.

We note that

EP(|αX + βY|) ≤ |α| EP(|X|) + |β| EP(|Y|) < ∞
so that the random variable αX + βY belongs to X. The linearity of expectation can be easily deduced from its definition:

EP(αX + βY) = ∑_{ω∈Ω} (αX(ω) + βY(ω)) P(ω)
= ∑_{ω∈Ω} αX(ω) P(ω) + ∑_{ω∈Ω} βY(ω) P(ω)
= α ∑_{ω∈Ω} X(ω) P(ω) + β ∑_{ω∈Ω} Y(ω) P(ω)
= α EP(X) + β EP(Y)

where α and β are arbitrary real numbers.

Expectation: Coin Flipping

Example (1.3)

A fair coin is tossed three times. The player receives one dollar each time “head” appears and loses one dollar when “tail” occurs.
Let the random variable X represent the player’s payoff.
The sample space Ω is defined as Ω = {0, 1, 2, 3}, where k ∈ Ω represents the number of times “head” occurs.
The probability measure is given by
P(k) =
    1/8, if k = 0 or k = 3,
    3/8, if k = 1 or k = 2.
This is the binomial distribution with n = 3 and p = 1/2.


Expectation: Coin Flipping

Example (1.3 Continued)

The random variable X is defined as

X(k) =
    −3, if k = 0,
    −1, if k = 1,
     1, if k = 2,
     3, if k = 3.

Hence the player’s expected payoff equals

EP(X) = ∑_{k=0}^{3} X(k) P(k) = −3/8 − 3/8 + 3/8 + 3/8 = 0.
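A quick numerical check of this example (a Python sketch added here, not from the slides): build the binomial probabilities for three fair tosses and take the weighted average of the payoffs.

```python
from math import comb

n, p = 3, 0.5
payoff = {0: -3, 1: -1, 2: 1, 3: 3}   # X(k): net dollars after k heads in three tosses
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

expected_payoff = sum(payoff[k] * pmf[k] for k in pmf)
print(pmf)               # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(expected_payoff)   # 0.0, as computed on the slide
```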


Expectation of a Function of a Random Variable

Function of a random variable.

Let X be a random variable and P be a probability measure on a discrete sample space Ω.
We define Y = f(X), where f : R → R is an arbitrary function.
Then Y is also a random variable on the sample space Ω endowed with the same probability measure P. Moreover,

EP(Y) = EP(f(X)) = ∑_{ω∈Ω} f(X(ω)) P(ω).

If a random variable X is deterministic then EP(X) = X and EP(f(X)) = f(X).


PART 3

EQUIVALENCE OF PROBABILITY MEASURES


Equivalence of Probability Measures

Let P and Q be two probability measures on a discrete sample space Ω.

Definition (Equivalence of Probability Measures)

We say that the probability measures P and Q are equivalent, and we denote P ∼ Q, whenever for all ω ∈ Ω

P(ω) > 0 ⇔ Q(ω) > 0.

The random variable L : Ω → R_+ given by
L(ω) = Q(ω) / P(ω)

is called the Radon-Nikodym density of Q with respect to P. Note that

EQ(X) = ∑_{ω∈Ω} X(ω) Q(ω) = ∑_{ω∈Ω} X(ω) L(ω) P(ω) = EP(LX).


Example: Equivalent Probability Measures

Example (Equivalent Probability Measures)

The sample space Ω is defined as Ω = {1, 2, 3, 4}.
Let the two probability measures P and Q be given by
P = (1/8, 3/8, 2/8, 2/8) and Q = (4/8, 1/8, 2/8, 1/8).

It is clear that P and Q are equivalent, that is, P ∼ Q.

Moreover, the Radon-Nikodym density L of Q with respect to P can be represented as follows:
L = (4, 1/3, 1, 1/2).

Check that for any random variable X: EQ(X) = EP(LX).
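Here is a small Python check of that claim (my own addition; the particular random variable X below is an arbitrary choice):

```python
# Equivalent measures on Omega = {1, 2, 3, 4}, with the values from this example.
P = {1: 1/8, 2: 3/8, 3: 2/8, 4: 2/8}
Q = {1: 4/8, 2: 1/8, 3: 2/8, 4: 1/8}
L = {w: Q[w] / P[w] for w in P}               # Radon-Nikodym density of Q with respect to P

X = {1: 10.0, 2: -2.0, 3: 0.5, 4: 7.0}        # an arbitrary random variable

E_Q = sum(X[w] * Q[w] for w in Q)             # expectation under Q
E_P_LX = sum(L[w] * X[w] * P[w] for w in P)   # EP(LX)
print(L)                                      # {1: 4.0, 2: 0.333..., 3: 1.0, 4: 0.5}
print(E_Q, E_P_LX)                            # identical values
```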


PART 4

VARIANCE OF A RANDOM VARIABLE


Risky Investments

When deciding whether to invest in a given portfolio, an agent may be concerned with the “risk” of his investment.

Example (1.4)

Consider an investor who is given an opportunity to choose between the following two options:
1. The investor either receives or loses 1,000 dollars with equal probabilities. This random payoff is denoted by X1.
2. The investor either receives or loses 1,000,000 dollars with equal probabilities. We denote this random payoff as X2.

Hence in both scenarios the expected value of the payoff equals 0

EP (X1) = EP (X2) = 0.

The following question arises: which option is preferred?


Variance of a Random Variable

Definition (Variance)

The variance of a random variable X on a discrete sample set Ω isdefined as

Var(X) = σ² := EP[(X − µ)²],

where P is a probability measure on Ω.

Variance is a measure of the spread of a random variable about its mean and also a measure of uncertainty.

In financial applications, it is common to identify the variance of the price of a security with its degree of “risk”.
Note that the variance is non-negative: Var(X) = σ² ≥ 0.

The variance is equal to 0 if and only if X is deterministic.


Variance of a Random Variable

Example (1.4 Continued)

The variance of option 1 equals

Var(X1) = (1000 − 0)²/2 + (−1000 − 0)²/2 = 10⁶.

The variance of option 2 equals

Var(X2) = (10⁶ − 0)²/2 + (−10⁶ − 0)²/2 = 10¹².

Therefore, the option represented by X2 is riskier than the option represented by X1.

A risk-averse agent would prefer the first option over the second.

A risk-loving agent would prefer the second option over the first.


Variance of a Random Variable

Theorem (1.2)

Let X be a random variable and P be a probability measure on a discrete sample space Ω. Then the following equality holds
Var(X) = EP(X²) − µ².

Proof of Theorem 1.2.

Var(X) = EP[(X − µ)²] = EP(X² − 2µX + µ²)
= EP(X²) − 2µ EP(X) + µ²   (by linearity)
= EP(X²) − µ².
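The identity is easy to confirm numerically; the sketch below (my addition) checks both forms of the variance on the two payoffs of Example 1.4:

```python
# Var(X) computed as E[(X - mu)^2] and as E(X^2) - mu^2 for the payoffs of Example 1.4.
def mean(values, probs):
    return sum(x * p for x, p in zip(values, probs))

probs = [0.5, 0.5]
for values in ([1_000, -1_000], [1_000_000, -1_000_000]):
    mu = mean(values, probs)
    var_def = mean([(x - mu) ** 2 for x in values], probs)     # definition of variance
    var_alt = mean([x ** 2 for x in values], probs) - mu ** 2  # Theorem 1.2
    print(values, var_def, var_alt)                            # 1e6 and 1e12, both ways
```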


Independence of Random Variables

Definition (Independence)

Two discrete random variables X and Y are independent if for all x, y ∈ R
P(X = x, Y = y) = P(X = x) P(Y = y)
where P(X = x) is the probability of the event {X = x}.

A useful property of independent random variables X and Y is

EP (XY ) = EP (X)EP (Y ) .

Theorem (1.3)

For independent discrete random variables X,Y and arbitrary α, β ∈ R

Var(αX + βY) = α² Var(X) + β² Var(Y).


Independence of Random Variables

Proof of Theorem 1.3.

Let EP(X) = µ_X and EP(Y) = µ_Y. Theorem 1.1 yields
EP(αX + βY) = αµ_X + βµ_Y.

Using Theorem 1.2, we obtain

Var(αX + βY) = EP[(αX + βY)²] − (αµ_X + βµ_Y)²
= α² EP(X²) + 2αβ EP(XY) + β² EP(Y²) − (αµ_X + βµ_Y)²
= α² (EP(X²) − µ_X²) + β² (EP(Y²) − µ_Y²) + 2αβ (EP(X) EP(Y) − µ_X µ_Y)
= α² Var(X) + β² Var(Y),
since independence gives EP(XY) = EP(X) EP(Y) = µ_X µ_Y, so that the cross term vanishes.
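As a sanity check (a sketch added here, with toy distributions of my own choosing), one can compute the distribution of αX + βY under the product measure and compare the two sides of Theorem 1.3:

```python
from itertools import product

X = {-1: 0.5, 1: 0.5}          # value -> probability (toy example)
Y = {0: 0.25, 2: 0.75}
alpha, beta = 3.0, -2.0

def var(pmf):
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

combo = {}                     # distribution of alpha*X + beta*Y under independence
for (x, px), (y, py) in product(X.items(), Y.items()):
    z = alpha * x + beta * y
    combo[z] = combo.get(z, 0.0) + px * py

print(var(combo))                               # Var(aX + bY) = 12.0
print(alpha**2 * var(X) + beta**2 * var(Y))     # a^2 Var(X) + b^2 Var(Y) = 12.0
```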


PART 5

CONTINUOUS RANDOM VARIABLES


Continuous Random Variables

Definition (Continuous Random Variable)

A random variable X on the sample space Ω is said to have a continuous distribution if there exists a real-valued function f such that
f(x) ≥ 0,   ∫_{−∞}^{∞} f(x) dx = 1,
and for all real numbers a < b
P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

Then f : R → R_+ is called the probability density function (pdf) of a continuous random variable X.


Continuous Random Variables

Assume that X is a continuous random variable.

Note that a probability density function need not satisfy the constraint f(x) ≤ 1.
A function F(x) is called a cumulative distribution function (cdf) of a continuous random variable X if for all x ∈ R
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy.

The relationship between the pdf f and cdf F:
F(x) = ∫_{−∞}^{x} f(y) dy  ⇔  f(x) = dF(x)/dx.


Continuous Random Variables

The expectation and variance of a continuous random variable X are defined as follows:
EP(X) = µ := ∫_{−∞}^{∞} x f(x) dx,
Var(X) = σ² := ∫_{−∞}^{∞} (x − µ)² f(x) dx,
or, equivalently,
Var(X) = ∫_{−∞}^{∞} x² f(x) dx − µ² = EP(X²) − (EP(X))².
The properties of expectations of discrete random variables carry over to continuous random variables, with probability measures being replaced by pdfs and sums by integrals.
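These integral formulas are easy to check numerically. The sketch below (my addition, assuming scipy is available) uses scipy.integrate.quad and the exponential density that appears later in these slides, for which EP(X) = 1/λ and Var(X) = 1/λ²:

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)              # exponential pdf on (0, infinity)

total, _ = quad(f, 0, np.inf)                     # should be 1
mu, _    = quad(lambda x: x * f(x), 0, np.inf)    # E(X)
ex2, _   = quad(lambda x: x**2 * f(x), 0, np.inf) # E(X^2)
var = ex2 - mu**2                                 # E(X^2) - (E(X))^2

print(total, mu, var)                             # approximately 1.0, 0.5, 0.25
```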


PART 6

EXAMPLES OF PROBABILITY DISTRIBUTIONS


Discrete Probability Distributions

Example (Binomial Distribution)

Let Ω = {0, 1, 2, . . . , n} be the sample space and let X be the number of successes in n independent trials, where p is the probability of success in a single Bernoulli trial.

The probability measure P is called the binomial distribution if

P(k) = (n choose k) p^k (1 − p)^(n−k) for k = 0, . . . , n,
where (n choose k) = n! / (k! (n − k)!).
Then EP(X) = np and Var(X) = np(1 − p).
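A short Python sketch (added for illustration) that tabulates this pmf and checks the mean and variance formulas for one choice of n and p:

```python
from math import comb

n, p = 10, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * pmf[k] for k in pmf)
var = sum(k**2 * pmf[k] for k in pmf) - mean**2

print(round(sum(pmf.values()), 12))   # 1.0
print(mean, n * p)                    # both 3.0 (up to rounding)
print(var, n * p * (1 - p))           # both 2.1 (up to rounding)
```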


Discrete Probability Distributions

Example (Poisson Distribution)

Let the sample space be Ω = {0, 1, 2, . . . }. We take an arbitrary number λ > 0.

The probability measure P is called the Poisson distribution if

P(k) = λ^k e^{−λ} / k! for k = 0, 1, 2, . . . .
Then EP(X) = λ = Var(X).
It is known that the Poisson distribution can be obtained as the limit of the binomial distribution when n tends to infinity and the sequence np_n tends to λ > 0.
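The Poisson limit is easy to see numerically; a small sketch (my addition) comparing the Bin(n, λ/n) probability of k successes with the Poisson probability as n grows:

```python
from math import comb, exp, factorial

lam, k = 4.0, 3
poisson = lam**k * exp(-lam) / factorial(k)

for n in (10, 100, 10_000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, poisson)    # the binomial value approaches the Poisson value
```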


Discrete Probability Distributions

Example (Geometric Distribution)

Let Ω = {1, 2, 3, . . . } be the sample space and X be the number of independent trials needed to achieve the first success.

Let p stand for the probability of a success in a single trial.

The probability measure P is called the geometric distribution if

P(k) = (1 − p)^{k−1} p for k = 1, 2, 3, . . . .

Then EP(X) = 1/p and Var(X) = (1 − p)/p².


Continuous Probability Distributions

Example (Uniform Distribution)

We say that X has the uniform distribution on an interval (a, b) if its pdf equals
f(x) =
    1/(b − a), if x ∈ (a, b),
    0, otherwise.

It is clear that ∫_{−∞}^{∞} f(x) dx = ∫_a^b 1/(b − a) dx = 1.

We have EP(X) = (a + b)/2 and Var(X) = (b − a)²/12.


Continuous Probability Distributions

Example (Exponential Distribution)

We say that X has the exponential distribution on (0, ∞) with the parameter λ > 0 if its pdf equals
f(x) =
    λe^{−λx}, if x > 0,
    0, otherwise.

It is easy to check that ∫_{−∞}^{∞} f(x) dx = ∫_0^∞ λe^{−λx} dx = 1.

We have EP(X) = 1/λ and Var(X) = 1/λ².


Continuous Probability Distributions

Example (Gaussian Distribution)

We say that X has the Gaussian (normal) distribution with mean µ ∈ R and variance σ² > 0 if its pdf equals
f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} for x ∈ R.

We write X ∼ N(µ, σ²).

One can show that ∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx = 1.
We have EP(X) = µ and Var(X) = σ².


Continuous Probability Distributions

Example (Standard Normal Distribution)

If we set µ = 0 and σ² = 1 then we obtain the standard normal distribution N(0, 1) with the following pdf
n(x) = (1/√(2π)) e^{−x²/2} for x ∈ R.

The cdf of the probability distribution N(0, 1) equals

N(x) = ∫_{−∞}^{x} n(u) du = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du for x ∈ R.

The values of N(x) can be found in the cumulative standard normal table (also known as the Z table).

If X ∼ N(µ, σ²) then Z := (X − µ)/σ ∼ N(0, 1).
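In code one rarely needs a Z table: the standard normal cdf can be written with the error function, N(x) = ½(1 + erf(x/√2)), and a general normal variable is handled by standardisation. A small sketch (my addition):

```python
from math import erf, sqrt

def std_normal_cdf(x):
    """N(x) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via Z = (X - mu)/sigma."""
    return std_normal_cdf((x - mu) / sigma)

print(std_normal_cdf(0.0))        # 0.5
print(std_normal_cdf(1.96))       # about 0.975, the familiar Z-table entry
print(normal_cdf(110, 100, 10))   # same as std_normal_cdf(1.0), about 0.8413
```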


LLN and CLT

Theorem (Law of Large Numbers)

Assume that X1, X2, . . . are independent and identically distributed random variables with mean µ. Then, with probability one,
(X1 + · · · + Xn)/n → µ as n → ∞.

Theorem (Central Limit Theorem)

Assume that X1, X2, . . . are independent and identically distributed random variables with mean µ and variance σ² > 0. Then for all real x
lim_{n→∞} P( (X1 + · · · + Xn − nµ) / (σ√n) ≤ x ) = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du = N(x).
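Both theorems are easy to visualise by simulation; here is a minimal Monte Carlo sketch (my addition) with i.i.d. Exponential(1) variables, so that µ = σ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 1.0, 1.0, 1_000, 5_000
samples = rng.exponential(scale=1.0, size=(trials, n))

# LLN: each row's sample mean is close to mu = 1.
print(samples.mean(axis=1)[:3])

# CLT: the normalized sums are approximately standard normal.
z = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
print(z.mean(), z.std())       # roughly 0 and 1
print(np.mean(z <= 1.96))      # roughly N(1.96) = 0.975
```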


PART 7

CORRELATION OF RANDOM VARIABLES


Covariance of Random Variables

In the real world, agents can invest in several securities and typically would like to benefit from diversification.
The price of one security may affect the prices of other assets. For example, a falling stock index may lead the price of gold to rise.

To quantify this effect, it is convenient to introduce the notion ofcovariance.

Definition (Covariance)

The covariance of two random variables X1 and X2 is defined as

Cov(X1, X2) = σ12 := EP[(X1 − µ1)(X2 − µ2)]

where µi = EP (Xi) for i = 1, 2.


Correlated and Uncorrelated Random Variables

The covariance of two random variables is a measure of the degree of variation of one variable with respect to the other.
Unlike the variance of a random variable, the covariance of two random variables may take negative values.
σ12 > 0: An increase in one variable tends to coincide with an increase in the other.
σ12 < 0: An increase in one variable tends to coincide with a decrease in the other.
σ12 = 0: The random variables X1 and X2 are said to be uncorrelated. If σ12 ≠ 0 then X1 and X2 are correlated.

Definition (Uncorrelated Random Variables)

We say that the random variables X1 and X2 are uncorrelated whenever

Cov (X1, X2) = σ12 = 0.


Properties of the Covariance

Theorem (1.4)

The following properties are valid:

Cov(X, X) = Var(X) = σ².
Cov(X1, X2) = EP(X1X2) − µ1µ2.
EP(X1X2) = EP(X1) EP(X2) if and only if X1 and X2 are uncorrelated, that is, Cov(X1, X2) = 0.
Var(a1X1 + a2X2) = a1²σ1² + a2²σ2² + 2a1a2σ12.
Var(X1 + X2) = Var(X1) + Var(X2) if and only if X1 and X2 are uncorrelated.
If X1 and X2 are independent then they are uncorrelated.
The converse is not true: it may happen that X1 and X2 are uncorrelated, but they are not independent.
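The last two points are worth a concrete example. The sketch below (my addition) takes X uniform on {−1, 0, 1} and Y = X², which are uncorrelated but clearly not independent:

```python
xs = [-1, 0, 1]
probs = [1/3, 1/3, 1/3]                     # X uniform on {-1, 0, 1}; Y = X**2

E = lambda g: sum(g(x) * p for x, p in zip(xs, probs))

cov = E(lambda x: x * x**2) - E(lambda x: x) * E(lambda x: x**2)
print(cov)                                  # 0.0, so X and Y are uncorrelated

# Not independent: P(X = 1, Y = 0) = 0, while P(X = 1) * P(Y = 0) = 1/9.
print(0.0, (1/3) * (1/3))
```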


Correlation Coefficient

We can normalise the covariance measure. We obtain in this way the so-called correlation coefficient.
Note that the correlation coefficient is only defined when σ1 > 0 and σ2 > 0, that is, when neither of the two random variables is deterministic.

Definition (Correlation Coefficient)

Let X1 and X2 be two random variables with variances σ1² and σ2² and covariance σ12. Then the correlation coefficient

ρ = ρ(X1, X2) = corr (X1, X2)

is defined by

ρ := σ12 / (σ1σ2) = Cov(X1, X2) / √(Var(X1) Var(X2)).


Properties of the Correlation Coefficient

Theorem (1.5)

The correlation coefficient satisfies −1 ≤ ρ ≤ 1. The random variables X1 and X2 are uncorrelated whenever ρ = 0.

Proof.

The result follows from an application of the Cauchy-Schwarz inequality:
(∑_{k=1}^{n} a_k b_k)² ≤ (∑_{k=1}^{n} a_k²)(∑_{k=1}^{n} b_k²).

ρ = 1: If one variable increases, the other will also increase.

0 < ρ < 1: If one increases, the other will tend to increase.

ρ = 0: The two random variables are uncorrelated.

−1 < ρ < 0: If one increases, the other will tend to decrease.

ρ = −1: If one increases, the other will decrease.


Linear Regression in Basic Statistics

If we observe the time series of prices of two securities, the following question arises: how are they related to each other?
Consider two random variables X and Y on the sample space Ω = {ω_1, ω_2, . . . , ω_n}. Denote X(ω_i) = x_i and Y(ω_i) = y_i.

A linear fit is a relation of the following form, for α, β ∈ R,

y_i = α + βx_i + ε_i.
The number ε_i is called the residual of the i-th observation.

In Statistics, the best linear fit to observed data is obtained through the method of least squares: minimise ∑_{i=1}^{n} ε_i². The line y = α + βx is called the least-squares linear regression.
We denote by ε the random variable such that ε(ω_i) = ε_i. This means that we set ε := Y − α − βX.


Linear Regression in Probability Theory

In Probability Theory, if α and β are chosen to minimise

EP[(Y − α − βX)²] = EP(ε²) = ∑_{i=1}^{n} ε²(ω_i) P(ω_i)
then the random variable ℓ_{Y|X} := α + βX is called the linear regression of Y on X.
The next result is valid for both discrete and continuous random variables.

Theorem (1.6)

The minimizers α and β are

α = µ_Y − βµ_X and β = σ_XY / σ_X².

Hence the linear regression of Y on X is given by

ℓ_{Y|X} = (σ_XY / σ_X²)(X − µ_X) + µ_Y.
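These coefficients are exactly what a least-squares routine computes. A sketch (my addition, with simulated data) comparing the formula of Theorem 1.6 with numpy's built-in fit:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 * x + 0.7 + rng.normal(scale=0.3, size=200)   # true slope 1.5, intercept 0.7

beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)       # sigma_XY / sigma_X^2
alpha = y.mean() - beta * x.mean()                     # mu_Y - beta * mu_X

slope, intercept = np.polyfit(x, y, 1)                 # numpy's least-squares line
print(alpha, beta)
print(intercept, slope)                                # same values up to floating point
```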


Covariance Matrix

Recall that the covariance of random variables Xi and Xj equals

Cov(Xi, Xj) = σij := EP[(Xi − µi)(Xj − µj)]

where µi = EP (Xi) and µj = EP (Xj).

Definition (Covariance Matrix)

We consider random variables Xi for i = 1, 2, . . . , n with variances σi² = σii and covariances σij. The matrix S given by
S :=
    [ σ1²   σ12   · · ·   σ1n
      σ21   σ2²   · · ·   σ2n
      ...   ...   · · ·   ...
      σn1   σn2   · · ·   σn² ]

is called the covariance matrix of (X1, . . . , Xn).


Correlation Matrix

Definition (Correlation Matrix)

We consider random variables Xi for i = 1, 2, . . . , n with variances σi² = σii and covariances σij. The matrix P given by
P :=
    [ 1     ρ12   · · ·   ρ1n
      ρ21   1     · · ·   ρ2n
      ...   ...   · · ·   ...
      ρn1   ρn2   · · ·   1   ]

is called the correlation matrix of (X1, . . . , Xn).

S and P are symmetric and positive-definite matrices.
S = DPD, where D² is a diagonal matrix whose main diagonal coincides with the main diagonal of S.
If the Xi are independent (or uncorrelated) random variables then S is a diagonal matrix and P = I is the identity matrix.


Linear Combinations of Random Variables

Theorem (1.7)

Assume that the random variables Y1 and Y2 are linear combinations of the Xi, that is, there exist vectors a^j ∈ R^n for j = 1, 2 such that
Y_j = ∑_{i=1}^{n} a_i^j X_i = (a^j)^T X = ⟨a^j, X⟩,
where X = (X1, X2, . . . , Xn)^T and ⟨·, ·⟩ denotes the inner product in R^n. Then

Cov(Y1, Y2) = (a^1)^T S a^2 = (a^1)^T DPD a^2.
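A quick numerical check of this identity with simulated data (a sketch of my own; the coefficient vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 5_000))        # three random variables, 5000 joint observations
a1 = np.array([1.0, -2.0, 0.5])
a2 = np.array([0.0, 1.0, 3.0])

S = np.cov(X)                          # 3x3 covariance matrix of (X1, X2, X3)
Y1, Y2 = a1 @ X, a2 @ X                # the two linear combinations

print(np.cov(Y1, Y2)[0, 1])            # sample covariance of Y1 and Y2
print(a1 @ S @ a2)                     # (a1)^T S a2 -- the same number
```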


PART 8

CONDITIONAL DISTRIBUTIONS AND EXPECTATIONS


Conditional Distributions and Expectations

Definition (Conditional Probability)

For two random variables X1 and X2 and arbitrary sets A1 and A2 such that P(X2 ∈ A2) ≠ 0, we define the conditional probability
P(X1 ∈ A1 | X2 ∈ A2) := P(X1 ∈ A1, X2 ∈ A2) / P(X2 ∈ A2)
and, for any set A with P(X2 ∈ A) ≠ 0, the conditional expectation
EP(X1 | X2 ∈ A) := EP(X1 1_{X2∈A}) / P(X2 ∈ A)

where 1_{X2∈A} : Ω → {0, 1} is the indicator function of the event {X2 ∈ A}, that is,
1_{X2∈A}(ω) =
    1, if X2(ω) ∈ A,
    0, otherwise.


Discrete Case

Assume that X and Y are discrete random variables with
p_i = P(X = x_i) > 0 for i = 1, 2, . . . and ∑_{i=1}^{∞} p_i = 1,
p_j = P(Y = y_j) > 0 for j = 1, 2, . . . and ∑_{j=1}^{∞} p_j = 1.

Definition (Conditional Distribution and Expectation)

Then the conditional distribution equals

p_{X|Y}(x_i | y_j) = P(X = x_i | Y = y_j) := P(X = x_i, Y = y_j) / P(Y = y_j) = p_{i,j} / p_j

and the conditional expectation EP(X |Y ) is given by

EP(X | Y = y_j) := ∑_{i=1}^{∞} x_i P(X = x_i | Y = y_j) = ∑_{i=1}^{∞} x_i p_{i,j} / p_j.


Discrete Case

It is easy to check that

p_i = P(X = x_i) = ∑_{j=1}^{∞} P(X = x_i | Y = y_j) P(Y = y_j) = ∑_{j=1}^{∞} p_{X|Y}(x_i | y_j) p_j.

The expected value EP(X) satisfies

EP(X) = ∑_{j=1}^{∞} EP(X | Y = y_j) P(Y = y_j).

Definition (Conditional cdf)

The conditional cdf F_{X|Y}(· | y_j) of X given Y is defined for all y_j such that P(Y = y_j) > 0 by
F_{X|Y}(x | y_j) := P(X ≤ x | Y = y_j) = ∑_{x_i ≤ x} p_{X|Y}(x_i | y_j).

Hence EP(X |Y = yj) is the mean of the conditional distribution.
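For a concrete feel, the sketch below (my addition, with a small made-up joint table) computes the conditional expectations EP(X | Y = y_j) and recovers EP(X) by the formula above:

```python
joint = {                        # p_{i,j} = P(X = x_i, Y = y_j), a toy joint pmf
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

pY = {y: sum(joint[(x, y)] for x in xs) for y in ys}                  # marginal of Y
cond_E = {y: sum(x * joint[(x, y)] / pY[y] for x in xs) for y in ys}  # E(X | Y = y)

print(cond_E)                                     # {0: 0.75, 1: 0.666...}
print(sum(cond_E[y] * pY[y] for y in ys))         # E(X) by conditioning on Y: 0.7
print(sum(x * p for (x, _), p in joint.items()))  # E(X) computed directly: 0.7
```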


Continuous Case

Assume that the continuous random variables X and Y have the joint pdf f_{X,Y}(x, y).

Definition (Conditional pdf and cdf)

The conditional pdf of Y given X is defined for all x such that f_X(x) > 0 and equals
f_{Y|X}(y | x) := f_{X,Y}(x, y) / f_X(x) for y ∈ R.

The conditional cdf of Y given X equals

F_{Y|X}(y | x) := P(Y ≤ y | X = x) = ∫_{−∞}^{y} f_{X,Y}(x, u) / f_X(x) du.


Continuous Case

Definition (Conditional Expectation)

The conditional expectation of Y given X is defined for all x such that f_X(x) > 0 by
EP(Y | X = x) := ∫_{−∞}^{∞} y dF_{Y|X}(y | x) = ∫_{−∞}^{∞} y f_{X,Y}(x, y) / f_X(x) dy.

An important property of conditional expectation is that

EP(Y) = EP(EP(Y | X)) = ∫_{−∞}^{∞} EP(Y | X = x) f_X(x) dx.
Hence the expected value EP(Y) can be determined by first conditioning on X (in order to compute EP(Y | X)) and then integrating with respect to the pdf of X.
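A numerical illustration of this tower property (my own toy density, added for illustration): take f_{X,Y}(x, y) = x + y on the unit square and approximate the integrals on a grid.

```python
import numpy as np

n = 1_000
x = np.linspace(0, 1, n)
y = np.linspace(0, 1, n)
dx = dy = 1.0 / (n - 1)
X, Y = np.meshgrid(x, y, indexing="ij")

f = X + Y                                     # joint pdf on [0,1] x [0,1]
fX = f.sum(axis=1) * dy                       # marginal f_X(x) = x + 1/2
cond_mean = (Y * f).sum(axis=1) * dy / fX     # E(Y | X = x)

EY_tower = (cond_mean * fX).sum() * dx        # integrate E(Y | X = x) against f_X
EY_direct = (Y * f).sum() * dx * dy           # integrate y * f(x, y) directly
print(EY_tower, EY_direct)                    # both approximately 7/12 = 0.5833
```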
