1: PROBABILITY REVIEW
Marek Rutkowski
School of Mathematics and Statistics
University of Sydney
Semester 2, 2016
M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56
Outline
We will review the following notions:
1 Discrete Random Variables
2 Expectation of a Random Variable
3 Equivalence of Probability Measures
4 Variance of a Random Variable
5 Continuous Random Variables
6 Examples of Probability Distributions
7 Correlation of Random Variables
8 Conditional Distributions and Expectations
Sample Space
We collect the possible states of the world and denote the set by Ω. The states are called samples or elementary events.
The sample space Ω is either countable or uncountable.
A toss of a coin: Ω = {Head, Tail} = {H, T}.
The number of successes in a sequence of n identical and independent trials: Ω = {0, 1, . . . , n}.
The moment of occurrence of the first success in an infinite sequence of identical and independent trials: Ω = {1, 2, . . . }.
The lifetime of a light bulb: Ω = {x ∈ R | x ≥ 0}.
The choice of a sample space is arbitrary and thus any set can be taken as a sample space. However, practical considerations justify the choice of the most convenient sample space for the problem at hand. Discrete (finite or infinite, but countable) sample spaces are easier to handle than general sample spaces.
Discrete Random Variables
Examples of random variables:
Prices of listed stocks.
Fluctuations of currency exchange rates.
Payoffs corresponding to portfolios of traded assets.
Definition (Discrete Random Variable)
A real-valued map X : Ω → R on a discrete sample space Ω = (ωk)k∈I, where the set I is countable, is called a discrete random variable.
Definition (Probability)
A map P : Ω → [0, 1] is called a probability on a discrete sample space Ω if the following conditions are satisfied:
P.1. P(ωk) ≥ 0 for all k ∈ I,
P.2. ∑_{k∈I} P(ωk) = 1.
Probability Measure
Let F = 2^Ω stand for the class of all subsets of the sample space Ω. The set 2^Ω is called the power set of Ω. Note that the empty set ∅ also belongs to any power set. Any set A ∈ F is called an event (or a random event).
Definition (Probability Measure)
A map P : F → [0, 1] is called a probability measure on (Ω, F) if
M.1. For any sequence of events Ai ∈ F, i = 1, 2, . . . , such that Ai ∩ Aj = ∅ for all i ≠ j, we have

P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai).
M.2. P(Ω) = 1.
For any A ∈ F the equality P(Ω \ A) = 1 − P(A) holds.
Probability Measure on a Discrete Sample Space
Note that a probability P : Ω → [0, 1] on a discrete sample space Ω uniquely specifies the probabilities of all events Ak = {ωk}. It is common to write P(ωk) = P({ωk}) = pk.
The following result shows that any probability on a discrete sample space Ω generates a unique probability measure on (Ω, F).
Theorem
Let P : Ω → [0, 1] be a probability on a discrete sample space Ω. Then the unique probability measure on (Ω, F) generated by P satisfies, for all A ∈ F,

P(A) = ∑_{ωk∈A} P(ωk).
The proof of the theorem is easy since Ω is assumed to be discrete.
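The theorem can be illustrated with a minimal Python sketch (not part of the original slides; the fair-die sample space and its point probabilities are illustrative assumptions): a probability on a discrete Ω extends to a probability measure on events by summing point probabilities.

```python
from fractions import Fraction

# Discrete sample space: one roll of a fair die (illustrative example).
omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}   # P(omega_k) = p_k, summing to 1

def prob(event):
    """P(A) = sum of P(omega_k) over omega_k in A, as in the theorem."""
    return sum(p[w] for w in event)

print(prob({2, 4, 6}))   # 1/2  (the event "an even number")
print(prob(set(omega)))  # 1    (condition M.2)
```

Exact fractions are used so the axioms P.1 and P.2 can be checked without floating-point rounding.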
Example: Coin Flipping
Example (1.2)
Let X be the number of “heads” appearing when a fair coin is tossed twice. We choose the sample space Ω to be

Ω = {0, 1, 2}

where a number k ∈ Ω represents the number of times “head” has occurred.
The probability measure P on Ω is defined as
P(k) = 0.25, if k = 0 or k = 2,
       0.5, otherwise.
We recognise here the binomial distribution with n = 2 and p = 0.5. Note that a single flip of a coin is a Bernoulli trial.
Example: Coin Flipping
Example (1.2 Continued)
We now suppose that the coin is not a fair one.
Let the probability of “head” be p for some p ≠ 1/2.
Then the probability measure P is given by
P(k) = q², if k = 0,
       2pq, if k = 1,
       p², if k = 2,

where q := 1 − p is the probability of “tail” appearing.
We deal here with the binomial distribution with n = 2 and 0 < p < 1.
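A quick numerical check of this pmf (a sketch; the value p = 3/10 is an arbitrary illustrative choice):

```python
from fractions import Fraction

p = Fraction(3, 10)                 # assumed probability of "head" (illustrative)
q = 1 - p                           # probability of "tail"
P = {0: q**2, 1: 2*p*q, 2: p**2}    # binomial pmf with n = 2

print(P[0], P[1], P[2])  # 49/100 21/50 9/100
print(sum(P.values()))   # 1, so condition P.2 holds for any 0 < p < 1
```

The total probability equals (p + q)² = 1, which is why P.2 holds automatically.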
Expectation of a Random Variable
Definition (Expectation)
Let X be a random variable on a discrete sample space Ω endowed with a probability measure P. The expectation (expected value or mean value) of X is given by
EP(X) = µ := ∑_{k∈I} X(ωk) P(ωk) = ∑_{k∈I} xk pk.
EP (·) is called the expectation operator over the probability P.
Note that the expectation of a random variable can be seen as a weighted average.

Since it is impossible to know in advance which event will happen in the future, the expectation is useful for making decisions when the probabilities of future outcomes are known.
Expectation Operator
Any random variable defined on a finite set Ω admits the expectation.
When the set Ω is countable (but infinite), we say that X is P-integrable whenever EP(|X|) = ∑_{ω∈Ω} |X(ω)| P(ω) < ∞.
In that case the expectation EP (X) is well defined (and finite).
Theorem (1.1)
Let X and Y be two P-integrable random variables and P be a probability measure on a discrete sample space Ω. Then for all α, β ∈ R
EP (αX + βY ) = αEP (X) + β EP (Y ) .
Hence EP(·) : X → R is a linear operator on the space X of P-integrable random variables.
Expectation Operator
Proof of Theorem 1.1.
We note that
EP (|αX + βY |) ≤ |α|EP (|X|) + |β|EP (|Y |) <∞
so that the random variable αX + βY belongs to X. The linearity of expectation can be easily deduced from its definition:
EP(αX + βY) = ∑_{ω∈Ω} (αX(ω) + βY(ω)) P(ω)
= ∑_{ω∈Ω} αX(ω) P(ω) + ∑_{ω∈Ω} βY(ω) P(ω)
= α ∑_{ω∈Ω} X(ω) P(ω) + β ∑_{ω∈Ω} Y(ω) P(ω)
= α EP(X) + β EP(Y)
where α and β are arbitrary real numbers.
Expectation: Coin Flipping
Example (1.3)
A fair coin is tossed three times. The player receives one dollar each time “head” appears and loses one dollar when “tail” occurs.
Let the random variable X represent the player’s payoff.
The sample space Ω is defined as Ω = {0, 1, 2, 3} where k ∈ Ω represents the number of times “head” occurs.
The probability measure is given by
P(k) = 1/8, if k = 0 or k = 3,
       3/8, if k = 1 or k = 2.
This is the binomial distribution with n = 3 and p = 1/2.
Expectation: Coin Flipping
Example (1.3 Continued)
The random variable X is defined as
X(k) = −3, if k = 0,
       −1, if k = 1,
        1, if k = 2,
        3, if k = 3.
Hence the player’s expected payoff equals
EP(X) = ∑_{k=0}^{3} X(k) P(k) = −3/8 − 3/8 + 3/8 + 3/8 = 0.
Expectation of a Function of a Random Variable
Function of a random variable.
Let X be a random variable and P be a probability measure on a discrete sample space Ω.
We define Y = f(X) where f : R→ R is an arbitrary function.
Then Y is also a random variable on the sample space Ω endowed with the same probability measure P. Moreover,

EP(Y) = EP(f(X)) = ∑_{ω∈Ω} f(X(ω)) P(ω).
If a random variable X is deterministic then EP(X) = X andEP(f(X)) = f(X).
PART 3
EQUIVALENCE OF PROBABILITY MEASURES
Equivalence of Probability Measures
Let P and Q be two probability measures on a discrete sample space Ω.
Definition (Equivalence of Probability Measures)
We say that the probability measures P and Q are equivalent, and we denote P ∼ Q, whenever for all ω ∈ Ω
P(ω) > 0 ⇔ Q(ω) > 0.
The random variable L : Ω → R+ given as

L(ω) = Q(ω) / P(ω)
is called the Radon-Nikodym density of Q with respect to P. Note that
EQ(X) = ∑_{ω∈Ω} X(ω) Q(ω) = ∑_{ω∈Ω} X(ω) L(ω) P(ω) = EP(LX).
Example: Equivalent Probability Measures
Example (Equivalent Probability Measures)
The sample space Ω is defined as Ω = {1, 2, 3, 4}.
Let the two probability measures P and Q be given by

P = (1/8, 3/8, 2/8, 2/8) and Q = (4/8, 1/8, 2/8, 1/8).
It is clear that P and Q are equivalent, that is, P ∼ Q.
Moreover, the Radon-Nikodym density L of Q with respect to P can be represented as follows:

L = (4, 1/3, 1, 1/2).
Check that for any random variable X: EQ(X) = EP(LX).
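The suggested check can be carried out mechanically; in the sketch below the random variable X is an arbitrary illustrative choice, while P, Q and L come from the example above.

```python
from fractions import Fraction

omega = [1, 2, 3, 4]
P = dict(zip(omega, [Fraction(1, 8), Fraction(3, 8), Fraction(2, 8), Fraction(2, 8)]))
Q = dict(zip(omega, [Fraction(4, 8), Fraction(1, 8), Fraction(2, 8), Fraction(1, 8)]))

# Radon-Nikodym density L(omega) = Q(omega)/P(omega); defined since P ~ Q.
L = {w: Q[w] / P[w] for w in omega}
print(L[1], L[2], L[3], L[4])   # 4 1/3 1 1/2

X = {1: 10, 2: -3, 3: 0, 4: 7}  # an arbitrary random variable (illustrative)
EQ_X = sum(X[w] * Q[w] for w in omega)
EP_LX = sum(L[w] * X[w] * P[w] for w in omega)
print(EQ_X == EP_LX)            # True
```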
Risky Investments
When deciding whether to invest in a given portfolio, an agent may be concerned with the “risk” of his investment.
Example (1.4)
Consider an investor who is given an opportunity to choose between the following two options:

1 The investor either receives or loses 1,000 dollars with equal probabilities. This random payoff is denoted by X1.
2 The investor either receives or loses 1,000,000 dollars with equal probabilities. We denote this random payoff as X2.
Hence in both scenarios the expected value of the payoff equals 0
EP (X1) = EP (X2) = 0.
The following question arises: which option is preferred?
Variance of a Random Variable
Definition (Variance)
The variance of a random variable X on a discrete sample space Ω is defined as

Var(X) = σ² := EP((X − µ)²),
where P is a probability measure on Ω.
Variance is a measure of the spread of a random variable about its mean and also a measure of uncertainty.

In financial applications, it is common to identify the variance of the price of a security with its degree of “risk”.
Note that the variance is non-negative: Var(X) = σ² ≥ 0.
The variance is equal to 0 if and only if X is deterministic.
Variance of a Random Variable
Example (1.4 Continued)
The variance of option 1 equals
V ar (X1) =(1000− 0)2
2+
(−1000− 0)2
2= 106.
The variance of option 2 equals
V ar (X2) =(106 − 0)2
2+
(−106 − 0)2
2= 1012.
Therefore, the option represented by X2 is riskier than the option represented by X1.
A risk-averse agent would prefer the first option over the second.
A risk-loving agent would prefer the second option over the first.
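The two variances in Example 1.4 can be reproduced with a short sketch:

```python
def variance(outcomes):
    """Var(X) = sum of (x - mu)^2 * p over outcomes given as [(x, p), ...]."""
    mu = sum(x * p for x, p in outcomes)
    return sum((x - mu) ** 2 * p for x, p in outcomes)

X1 = [(1_000, 0.5), (-1_000, 0.5)]          # option 1
X2 = [(1_000_000, 0.5), (-1_000_000, 0.5)]  # option 2

print(variance(X1))   # 1000000.0  = 10^6
print(variance(X2))   # 1000000000000.0  = 10^12
```

Both payoffs have mean 0, so the variance is simply the expected squared payoff.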
Variance of a Random Variable
Theorem (1.2)
Let X be a random variable and P be a probability measure on a discrete sample space Ω. Then the following equality holds:
Var(X) = EP(X²) − µ².
Proof of Theorem 1.2.
Var(X) = EP((X − µ)²) = EP(X² − 2µX + µ²) (linearity)
= EP(X²) − 2µ EP(X) + µ² = EP(X²) − µ².
Independence of Random Variables
Definition (Independence)
Two discrete random variables X and Y are independent if for all x, y ∈ R

P(X = x, Y = y) = P(X = x) P(Y = y)

where P(X = x) is the probability of the event {X = x}.
A useful property of independent random variables X and Y is
EP(XY) = EP(X) EP(Y).
Theorem (1.3)
For independent discrete random variables X, Y and arbitrary α, β ∈ R

Var(αX + βY) = α²Var(X) + β²Var(Y).
Independence of Random Variables
Proof of Theorem 1.3.
Let EP (X) = µX and EP (Y ) = µY . Theorem 1.1 yields
EP (αX + βY ) = αµX + βµY .
Using Theorem 1.2, we obtain
Var(αX + βY) = EP((αX + βY)²) − (αµX + βµY)²
= α²EP(X²) + 2αβ EP(XY) + β²EP(Y²) − (αµX + βµY)²
= α²(EP(X²) − µX²) + β²(EP(Y²) − µY²) + 2αβ (EP(XY) − µX µY)
= α²Var(X) + β²Var(Y)

since EP(XY) = EP(X) EP(Y) = µX µY by independence, so the cross term vanishes.
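Theorem 1.3 can be checked on a small example (the two distributions and the coefficients α = 3, β = −2 are arbitrary illustrative choices); independence is built in by taking the joint probability to be the product of the marginals:

```python
from fractions import Fraction
from itertools import product

X_dist = {-1: Fraction(1, 2), 1: Fraction(1, 2)}   # illustrative distribution of X
Y_dist = {0: Fraction(1, 4), 2: Fraction(3, 4)}    # illustrative distribution of Y
alpha, beta = 3, -2

def var(dist):
    """Var over a dict {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    return sum((v - mu) ** 2 * p for v, p in dist.items())

# Independence: P(X = x, Y = y) = P(X = x) * P(Y = y).
combo = {}
for (x, px), (y, py) in product(X_dist.items(), Y_dist.items()):
    v = alpha * x + beta * y
    combo[v] = combo.get(v, 0) + px * py

print(var(combo))                                      # 12
print(alpha**2 * var(X_dist) + beta**2 * var(Y_dist))  # 12
```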
Continuous Random Variables
Definition (Continuous Random Variable)
A random variable X on the sample space Ω is said to have a continuous distribution if there exists a real-valued function f such that

f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1,

and for all real numbers a < b

P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
Then f : R → R+ is called the probability density function (pdf) of a continuous random variable X.
Continuous Random Variables
Assume that X is a continuous random variable.
Note that a probability density function need not satisfy the constraint f(x) ≤ 1.
A function F(x) is called a cumulative distribution function (cdf) of a continuous random variable X if for all x ∈ R

F(x) = P(X ≤ x) = ∫_{−∞}^x f(y) dy.
The relationship between the pdf f and the cdf F:

F(x) = ∫_{−∞}^x f(y) dy ⇔ f(x) = dF(x)/dx.
Continuous Random Variables
The expectation and variance of a continuous random variable X are defined as follows:

EP(X) = µ := ∫_{−∞}^{∞} x f(x) dx,

Var(X) = σ² := ∫_{−∞}^{∞} (x − µ)² f(x) dx,

or, equivalently,

Var(X) = ∫_{−∞}^{∞} x² f(x) dx − µ² = EP(X²) − (EP(X))².
The properties of expectations of discrete random variables carry over to continuous random variables, with probability measures being replaced by pdfs and sums by integrals.
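These integrals can be approximated numerically; below is a sketch using the midpoint rule for the uniform pdf on (2, 5) (the interval is an arbitrary illustrative choice):

```python
a, b = 2.0, 5.0
f = lambda x: 1.0 / (b - a) if a < x < b else 0.0   # uniform pdf on (a, b)

n = 100_000
dx = (b - a) / n
xs = [a + (i + 0.5) * dx for i in range(n)]          # midpoints of subintervals

mu = sum(x * f(x) * dx for x in xs)                  # approximates E_P(X)
var = sum((x - mu) ** 2 * f(x) * dx for x in xs)     # approximates Var(X)
print(mu, var)   # approximately (a + b)/2 = 3.5 and (b - a)^2/12 = 0.75
```

The approximations agree with the closed-form uniform moments stated later in these slides.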
PART 6
EXAMPLES OF PROBABILITY DISTRIBUTIONS
Discrete Probability Distributions
Example (Binomial Distribution)
Let Ω = {0, 1, 2, . . . , n} be the sample space and let X be the number of successes in n independent trials, where p is the probability of success in a single Bernoulli trial.
The probability measure P is called the binomial distribution if

P(k) = C(n, k) p^k (1 − p)^(n−k) for k = 0, . . . , n

where the binomial coefficient equals

C(n, k) = n! / (k!(n − k)!).
Then EP(X) = np and Var(X) = np(1 − p).
Discrete Probability Distributions
Example (Poisson Distribution)
Let the sample space be Ω = {0, 1, 2, . . . }. We take an arbitrary number λ > 0.
The probability measure P is called the Poisson distribution if
P(k) = λ^k e^{−λ} / k! for k = 0, 1, 2, . . . .
Then EP(X) = λ = Var(X).
It is known that the Poisson distribution can be obtained as the limit of the binomial distribution when n tends to infinity and p_n is chosen so that the sequence np_n tends to λ > 0.
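This limit can be observed numerically (λ = 2 and the grid of n values are illustrative choices): with p_n = λ/n, the binomial pmf approaches the Poisson pmf as n grows.

```python
from math import comb, exp, factorial

lam = 2.0

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return lam**k * exp(-lam) / factorial(k)

# Maximal pointwise difference over the first few values of k.
for n in (10, 100, 10_000):
    diff = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k)) for k in range(10))
    print(n, diff)   # the maximal difference shrinks roughly like 1/n
```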
Discrete Probability Distributions
Example (Geometric Distribution)
Let Ω = {1, 2, 3, . . . } be the sample space and X be the number of independent trials needed to achieve the first success.
Let p stand for the probability of a success in a single trial.
The probability measure P is called the geometric distribution if
P(k) = (1 − p)^(k−1) p for k = 1, 2, 3, . . . .
Then

EP(X) = 1/p and Var(X) = (1 − p)/p².
Continuous Probability Distributions
Example (Uniform Distribution)
We say that X has the uniform distribution on an interval (a, b) if its pdf equals

f(x) = 1/(b − a), if x ∈ (a, b),
       0, otherwise.
It is clear that

∫_{−∞}^{∞} f(x) dx = ∫_a^b 1/(b − a) dx = 1.
We have

EP(X) = (a + b)/2 and Var(X) = (b − a)²/12.
Continuous Probability Distributions
Example (Exponential Distribution)
We say that X has the exponential distribution on (0, ∞) with the parameter λ > 0 if its pdf equals

f(x) = λe^{−λx}, if x > 0,
       0, otherwise.
It is easy to check that

∫_{−∞}^{∞} f(x) dx = ∫_0^∞ λe^{−λx} dx = 1.
We have

EP(X) = 1/λ and Var(X) = 1/λ².
Continuous Probability Distributions
Example (Gaussian Distribution)
We say that X has the Gaussian (normal) distribution with mean µ ∈ R and variance σ² > 0 if its pdf equals

f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} for x ∈ R.
We write X ∼ N(µ, σ2).
One can show that

∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx = 1.
We have EP(X) = µ and Var(X) = σ².
Continuous Probability Distributions
Example (Standard Normal Distribution)
If we set µ = 0 and σ² = 1 then we obtain the standard normal distribution N(0, 1) with the following pdf:

n(x) = (1/√(2π)) e^{−x²/2} for x ∈ R.
The cdf of the probability distribution N(0, 1) equals

N(x) = ∫_{−∞}^x n(u) du = ∫_{−∞}^x (1/√(2π)) e^{−u²/2} du for x ∈ R.
The values of N(x) can be found in the cumulative standard normal table (also known as the Z table).
If X ∼ N(µ, σ²) then Z := (X − µ)/σ ∼ N(0, 1).
LLN and CLT
Theorem (Law of Large Numbers)
Assume that X1, X2, . . . are independent and identically distributed random variables with mean µ. Then, with probability one,

(X1 + · · · + Xn)/n → µ as n → ∞.
Theorem (Central Limit Theorem)
Assume that X1, X2, . . . are independent and identically distributed random variables with mean µ and variance σ² > 0. Then for all real x

lim_{n→∞} P((X1 + · · · + Xn − nµ)/(σ√n) ≤ x) = ∫_{−∞}^x (1/√(2π)) e^{−u²/2} du = N(x).
Covariance of Random Variables
In the real world, agents can invest in several securities and typically would like to benefit from diversification.
The price of one security may affect the prices of other assets. For example, a falling stock index may lead the price of gold to rise.
To quantify this effect, it is convenient to introduce the notion of covariance.
Definition (Covariance)
The covariance of two random variables X1 and X2 is defined as
Cov(X1, X2) = σ12 := EP((X1 − µ1)(X2 − µ2))
where µi = EP (Xi) for i = 1, 2.
Correlated and Uncorrelated Random Variables
The covariance of two random variables is a measure of the degree of variation of one variable with respect to the other.

Unlike the variance of a random variable, the covariance of two random variables may take negative values.
σ12 > 0: An increase in one variable tends to coincide with an increase in the other.
σ12 < 0: An increase in one variable tends to coincide with a decrease in the other.
σ12 = 0: Then the random variables X1 and X2 are said to be uncorrelated. If σ12 ≠ 0 then X1 and X2 are correlated.
Definition (Uncorrelated Random Variables)
We say that the random variables X1 and X2 are uncorrelated whenever
Cov (X1, X2) = σ12 = 0.
Properties of the Covariance
Theorem (1.4)
The following properties are valid:
Cov(X, X) = Var(X) = σ².

Cov(X1, X2) = EP(X1X2) − µ1µ2.

EP(X1X2) = EP(X1) EP(X2) if and only if X1 and X2 are uncorrelated, that is, Cov(X1, X2) = 0.

Var(a1X1 + a2X2) = a1²σ1² + a2²σ2² + 2a1a2σ12.

Var(X1 + X2) = Var(X1) + Var(X2) if and only if X1 and X2 are uncorrelated.

If X1 and X2 are independent then they are uncorrelated.

The converse is not true: it may happen that X1 and X2 are uncorrelated, but they are not independent.
Correlation Coefficient
We can normalise the covariance measure. We obtain in this way the so-called correlation coefficient.

Note that the correlation coefficient is only defined when σ1 > 0 and σ2 > 0, that is, neither of the two random variables is deterministic.
Definition (Correlation Coefficient)
Let X1 and X2 be two random variables with variances σ1² and σ2² and covariance σ12. Then the correlation coefficient

ρ = ρ(X1, X2) = corr(X1, X2)

is defined by

ρ := σ12/(σ1σ2) = Cov(X1, X2) / √(Var(X1) Var(X2)).
Properties of the Correlation Coefficient
Theorem (1.5)
The correlation coefficient satisfies −1 ≤ ρ ≤ 1. The random variables X1 and X2 are uncorrelated whenever ρ = 0.
Proof.
The result follows from an application of the Cauchy-Schwarz inequality:

(∑_{k=1}^n ak bk)² ≤ (∑_{k=1}^n ak²)(∑_{k=1}^n bk²).
ρ = 1: If one variable increases, the other will also increase.
0 < ρ < 1: If one increases, the other will tend to increase.
ρ = 0: The two random variables are uncorrelated.
−1 < ρ < 0: If one increases, the other will tend to decrease.
ρ = −1: If one increases, the other will decrease.
Linear Regression in Basic Statistics
If we observe the time series of prices of two securities, the following question arises: how are they related to each other?
Consider two random variables X and Y on the sample space Ω = {ω1, ω2, . . . , ωn}. Denote X(ωi) = xi and Y(ωi) = yi.
A linear fit is a relation of the following form, for α, β ∈ R,
yi = α+ βxi + εi.
The number εi is called the residual of the ith observation.
In Statistics, the best linear fit to observed data is obtained through the method of least squares: minimise ∑_{i=1}^n εi². The line y = α + βx is called the least-squares linear regression.
We denote by ε the random variable such that ε(ωi) = εi. This means that we set ε := Y − α − βX.
Linear Regression in Probability Theory
In Probability Theory, if α and β are chosen to minimise

EP((Y − α − βX)²) = EP(ε²) = ∑_{i=1}^n ε²(ωi) P(ωi)

then the random variable ℓY|X := α + βX is called the linear regression of Y on X.

The next result is valid for both discrete and continuous random variables.
Theorem (1.6)
The minimizers α and β are

α = µY − βµX and β = σXY/σX².

Hence the linear regression of Y on X is given by

ℓY|X = (σXY/σX²)(X − µX) + µY.
Covariance Matrix
Recall that the covariance of random variables Xi and Xj equals

Cov(Xi, Xj) = σij := EP((Xi − µi)(Xj − µj))

where µi = EP(Xi) and µj = EP(Xj).
Definition (Covariance Matrix)
We consider random variables Xi for i = 1, 2, . . . , n with variances σi² = σii and covariances σij. The matrix S given by
S :=
| σ1²  σ12  ...  σ1n |
| σ21  σ2²  ...  σ2n |
|  ⋮    ⋮    ⋱    ⋮  |
| σn1  σn2  ...  σn² |
is called the covariance matrix of (X1, . . . , Xn).
Correlation Matrix
Definition (Correlation Matrix)
We consider random variables Xi for i = 1, 2, . . . , n with variances σi² = σii and covariances σij. The matrix P given by
P :=
| 1    ρ12  ...  ρ1n |
| ρ21  1    ...  ρ2n |
|  ⋮    ⋮    ⋱    ⋮  |
| ρn1  ρn2  ...  1   |
is called the correlation matrix of (X1, . . . , Xn).
S and P are symmetric and positive-definite matrices.
S = DPD, where D² is a diagonal matrix whose main diagonal coincides with the main diagonal in S.
If the Xi are independent (or uncorrelated) random variables then S is a diagonal matrix and P = I is the identity matrix.
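The identity S = DPD can be checked directly on a small example (the 2×2 covariance matrix below is hypothetical; D = diag(σ1, σ2)):

```python
from math import sqrt

S = [[4.0, 1.2],
     [1.2, 9.0]]                     # hypothetical covariance matrix

sig = [sqrt(S[i][i]) for i in range(2)]                       # standard deviations
P = [[S[i][j] / (sig[i] * sig[j]) for j in range(2)]          # correlation matrix
     for i in range(2)]
D = [[sig[0], 0.0], [0.0, sig[1]]]                            # D = diag(sigma_i)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

DPD = matmul(matmul(D, P), D)
print(all(abs(DPD[i][j] - S[i][j]) < 1e-12
          for i in range(2) for j in range(2)))   # True: S = D P D
```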
Linear Combinations of Random Variables
Theorem (1.7)
Assume that the random variables Y1 and Y2 are some linear combinations of the Xi, that is, there exist vectors aj ∈ Rn for j = 1, 2 such that

Yj = ∑_{i=1}^n aji Xi = ⟨aj, X⟩

where ⟨·, ·⟩ denotes the inner product in Rn and X = (X1, . . . , Xn). Then

Cov(Y1, Y2) = (a1)ᵀ S a2 = (a1)ᵀ DPD a2.
PART 8
CONDITIONAL DISTRIBUTIONS AND EXPECTATIONS
Conditional Distributions and Expectations
Definition (Conditional Probability)
For two random variables X1 and X2 and arbitrary sets A1 and A2 such that P(X2 ∈ A2) ≠ 0, we define the conditional probability
P(X1 ∈ A1 | X2 ∈ A2) := P(X1 ∈ A1, X2 ∈ A2) / P(X2 ∈ A2)
and, for any set A with P(X2 ∈ A) ≠ 0, the conditional expectation

EP(X1 | X2 ∈ A) := EP(X1 1_{X2∈A}) / P(X2 ∈ A)
where 1_{X2∈A} : Ω → {0, 1} is the indicator function of the event {X2 ∈ A}, that is,

1_{X2∈A}(ω) = 1, if X2(ω) ∈ A,
              0, otherwise.
Discrete Case
Assume that X and Y are discrete random variables with

pi = P(X = xi) > 0 for i = 1, 2, . . . and ∑_{i=1}^∞ pi = 1,

pj = P(Y = yj) > 0 for j = 1, 2, . . . and ∑_{j=1}^∞ pj = 1.
Definition (Conditional Distribution and Expectation)
Then the conditional distribution equals

pX|Y(xi | yj) = P(X = xi | Y = yj) := P(X = xi, Y = yj) / P(Y = yj) = pi,j / pj
and the conditional expectation EP(X | Y) is given by

EP(X | Y = yj) := ∑_{i=1}^∞ xi P(X = xi | Y = yj) = ∑_{i=1}^∞ xi pi,j / pj.
Discrete Case
It is easy to check that

pi = P(X = xi) = ∑_{j=1}^∞ P(X = xi | Y = yj) P(Y = yj) = ∑_{j=1}^∞ pX|Y(xi | yj) pj.
The expected value EP(X) satisfies

EP(X) = ∑_{j=1}^∞ EP(X | Y = yj) P(Y = yj).
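The last identity (recovering EP(X) by conditioning on Y) in a sketch with a hypothetical joint pmf:

```python
from fractions import Fraction

# Hypothetical joint pmf p_{i,j} = P(X = x_i, Y = y_j) on a small grid.
xs, ys = [0, 1], [10, 20]
p = {(0, 10): Fraction(1, 5), (0, 20): Fraction(1, 5),
     (1, 10): Fraction(1, 10), (1, 20): Fraction(1, 2)}

pY = {y: sum(p[(x, y)] for x in xs) for y in ys}                 # marginal of Y
cond_exp = {y: sum(x * p[(x, y)] for x in xs) / pY[y] for y in ys}  # E_P(X | Y = y)

EX_tower = sum(cond_exp[y] * pY[y] for y in ys)   # conditioning on Y first
EX_direct = sum(x * p[(x, y)] for x in xs for y in ys)
print(EX_tower == EX_direct, EX_direct)           # True 3/5
```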
Definition (Conditional cdf)
The conditional cdf FX|Y(· | yj) of X given Y is defined for all yj such that P(Y = yj) > 0 by

FX|Y(x | yj) := P(X ≤ x | Y = yj) = ∑_{xi≤x} pX|Y(xi | yj).
Hence EP(X |Y = yj) is the mean of the conditional distribution.
Continuous Case
Assume that the continuous random variables X and Y have the joint pdf fX,Y(x, y).
Definition (Conditional pdf and cdf)
The conditional pdf of Y given X is defined for all x such that fX(x) > 0 and equals

fY|X(y | x) := fX,Y(x, y) / fX(x) for y ∈ R.
The conditional cdf of Y given X equals

FY|X(y | x) := P(Y ≤ y | X = x) = ∫_{−∞}^y fX,Y(x, u) / fX(x) du.
Continuous Case
Definition (Conditional Expectation)
The conditional expectation of Y given X is defined for all x such that fX(x) > 0 by

EP(Y | X = x) := ∫_{−∞}^{∞} y dFY|X(y | x) = ∫_{−∞}^{∞} y fX,Y(x, y) / fX(x) dy.
An important property of conditional expectation is that

EP(Y) = EP(EP(Y | X)) = ∫_{−∞}^{∞} EP(Y | X = x) fX(x) dx.
Hence the expected value EP(Y) can be determined by first conditioning on X (in order to compute EP(Y | X)) and then integrating with respect to the pdf of X.