Chapter 4, DeGroot & Schervish
Variance
Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution. A random variable X with mean 2 has the same mean as the constant random variable Y such that Pr(Y = 2) = 1, even if X is not constant! To distinguish the distribution of X from the distribution of Y in this case, it is useful to have some measure of how spread out the distribution of X is. The variance of X is one such measure. The standard deviation of X is the square root of the variance.
Stock Price Changes
Consider the prices A and B of two stocks at a time one month in the future. Assume that A has the uniform distribution on the interval [25, 35] and that B has the uniform distribution on the interval [15, 45]. Both stocks have a mean price of 30, but the distributions are very different.
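As a quick numerical sketch of this comparison, the snippet below uses the standard facts that a Uniform[a, b] distribution has mean (a + b)/2 and variance (b − a)²/12 (these formulas are not derived in the slides):

```python
# Same mean, very different spread, for the two uniform stock-price models.
# Mean of Uniform[a, b] is (a + b)/2; variance is (b - a)**2 / 12.

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_var(a, b):
    return (b - a) ** 2 / 12

# A ~ Uniform[25, 35], B ~ Uniform[15, 45]
print(uniform_mean(25, 35), uniform_mean(15, 45))  # both 30.0
print(uniform_var(25, 35))  # ≈ 8.33
print(uniform_var(15, 45))  # 75.0
```

Although A and B have the same mean, B's variance is nine times larger, which is exactly the kind of difference the variance is designed to capture.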
Stock Price Changes (figure)
Variance/Standard Deviation
Let X be a random variable with finite mean μ = E(X). The variance of X, denoted by Var(X), is defined as follows:
Var(X) = E[(X − μ)²].
The standard deviation of X is the nonnegative square root of Var(X), if the variance exists. When only one random variable is being discussed, it is common to denote its standard deviation by the symbol σ, and the variance by σ².
Stock Price Changes
Return to the two random variables A and B in the example above.
Variance and Standard Deviation of a Discrete Distribution
Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Then
E(X) = (1/5)(−2 + 0 + 1 + 3 + 4) = 1.2.
Let W = (X − μ)², so that Var(X) = E(W):
Var(X) = (1/5)[(−3.2)² + (−1.2)² + (−0.2)² + (1.8)² + (2.8)²] = 4.56.
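The same computation, written as a short script over the five-point distribution:

```python
# E(X) and Var(X) = E(W) with W = (X - mu)**2 for the five-point distribution.
xs = [-2, 0, 1, 3, 4]
prob = 1 / len(xs)  # each value has probability 1/5

mu = sum(prob * x for x in xs)               # E(X)
var = sum(prob * (x - mu) ** 2 for x in xs)  # E(W)

print(mu)   # ≈ 1.2
print(var)  # ≈ 4.56
```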
Properties of the Variance
Theorem: Var(X) = 0 if and only if there exists a constant c such that Pr(X = c) = 1.
Properties of the Variance
Theorem: For constants a and b, if Y = aX + b, then Var(Y) = a² Var(X) and σY = |a| σX.
Calculating the Variance and Standard Deviation of a Linear Function
Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Determine the variance and standard deviation of Y = 4X − 7.
The mean of X is μ = 1.2 and the variance is Var(X) = 4.56. Therefore,
Var(Y) = 16 Var(X) = 72.96.
Also, the standard deviation of Y is
σY = 4σX = 4(4.56)^{1/2} ≈ 8.54.
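A numerical check of Var(Y) = a² Var(X) and σY = |a| σX for this example:

```python
# Check Var(Y) = a**2 Var(X) and sigma_Y = |a| sigma_X for Y = 4X - 7.
xs = [-2, 0, 1, 3, 4]
prob = 1 / len(xs)

def var(vals):
    mu = sum(prob * v for v in vals)
    return sum(prob * (v - mu) ** 2 for v in vals)

ys = [4 * x - 7 for x in xs]
print(var(xs))         # ≈ 4.56
print(var(ys))         # ≈ 72.96 = 16 * 4.56
print(var(ys) ** 0.5)  # ≈ 8.54
```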
For every random variable X, Var(X) = E(X2) − [E(X)]2.
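The shortcut formula can be verified against the definitional form E[(X − μ)²] on a small made-up probability mass function (the pmf below is chosen only for illustration):

```python
# Verify Var(X) = E(X**2) - E(X)**2 against E[(X - mu)**2] on a small pmf.
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

ex = sum(x * p for x, p in pmf.items())        # E(X) = 3.0
ex2 = sum(x * x * p for x, p in pmf.items())   # E(X**2) = 10.0
var_shortcut = ex2 - ex ** 2
var_def = sum((x - ex) ** 2 * p for x, p in pmf.items())

print(var_shortcut, var_def)  # both ≈ 1.0
```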
Theorem
If X1, . . . , Xn are independent random variables with finite means, then
Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).
The Variance of a Binomial Distribution
Suppose that a box contains red balls and blue balls, and that the proportion of red balls is p (0 ≤ p ≤ 1). Suppose that n balls are selected from the box with replacement. For i = 1, . . . , n, let Xi = 1 if the ith ball selected is red, and let Xi = 0 otherwise. If X denotes the total number of red balls in the sample, then X = X1 + . . . + Xn, and X has the binomial distribution with parameters n and p.
Since X1, . . . , Xn are independent, it follows from the preceding theorem that Var(X) = Var(X1) + . . . + Var(Xn). Now E(Xi) = p for i = 1, . . . , n, and since Xi² = Xi for each i, E(Xi²) = E(Xi) = p. Therefore,
Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p),
and hence
Var(X) = np(1 − p).
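The formula np(1 − p) can be confirmed by summing directly over the binomial pmf (n = 10 and p = 0.3 below are arbitrary illustration values):

```python
# Confirm Var(X) = n p (1 - p) by summing over the exact binomial pmf.
from math import comb

def binomial_var(n, p):
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

n, p = 10, 0.3
print(binomial_var(n, p))  # ≈ 2.1
print(n * p * (1 - p))     # ≈ 2.1
```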
Moments
For a random variable X, the means of powers X^k (called moments) for k > 2 have useful theoretical properties, and some of them are used for additional summaries of a distribution. The moment generating function is a related tool.
Existence of Moments
For each random variable X and every positive integer k, the expectation E(X^k) is called the kth moment of X. In particular, in accordance with this terminology, the mean of X is the first moment of X.
Existence of Moments
Suppose that X is a random variable for which E(X) = μ. For every positive integer k, the expectation E[(X − μ)^k] is called the kth central moment of X, or the kth moment of X about the mean. In particular, in accordance with this terminology, the variance of X is the second central moment of X.
Moment Generating Functions
Let X be a random variable. For each real number t, define
ψ(t) = E(e^{tX}).
The function ψ(t) is called the moment generating function (abbreviated m.g.f.) of X.
The m.g.f. of X depends only on the distribution of X: since the m.g.f. is the expected value of a function of X, it must depend only on the distribution of X. If X and Y have the same distribution, they must have the same m.g.f.
Theorem
Let X be a random variable whose m.g.f. ψ(t) is finite for all values of t in some open interval around the point t = 0. Then, for each integer n > 0, the nth moment of X, E(X^n), is finite and equals the nth derivative ψ^{(n)}(t) at t = 0. That is, E(X^n) = ψ^{(n)}(0) for n = 1, 2, . . . .
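As a numerical sketch of this theorem, take X ~ Bernoulli(p), whose m.g.f. is ψ(t) = pe^t + (1 − p) (a standard fact, consistent with the binomial building blocks used later in these slides), and estimate the derivatives at t = 0 by finite differences:

```python
# Numerical sketch of E(X**n) = psi^(n)(0) for X ~ Bernoulli(p).
from math import exp

p = 0.3
def psi(t):
    return p * exp(t) + (1 - p)  # m.g.f. of a Bernoulli(p) variable

h = 1e-4
first = (psi(h) - psi(-h)) / (2 * h)               # ≈ psi'(0) = E(X) = p
second = (psi(h) - 2 * psi(0) + psi(-h)) / h ** 2  # ≈ psi''(0) = E(X**2) = p

print(first, second)  # both ≈ 0.3, since X**2 = X for a 0/1 variable
```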
Properties of Moment Generating Functions
Theorem
Let X be a random variable for which the m.g.f. is ψ1; let Y = aX + b, where a and b are given constants; and let ψ2 denote the m.g.f. of Y. Then for every value of t such that ψ1(at) is finite,
ψ2(t) = e^{bt} ψ1(at).
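A quick numeric check of this identity, again using a Bernoulli(p) variable as X (so ψ1(t) = pe^t + 1 − p, and Y = aX + b takes the value b with probability 1 − p and a + b with probability p; the values of p, a, b are arbitrary):

```python
# Check psi_2(t) = e**(b t) * psi_1(a t) for Y = a X + b, X ~ Bernoulli(p).
from math import exp

p, a, b = 0.3, 4.0, -7.0

def psi1(t):
    return p * exp(t) + (1 - p)

def psi2(t):  # direct E(e**(t Y)) from the two-point distribution of Y
    return (1 - p) * exp(b * t) + p * exp((a + b) * t)

for t in (-0.5, 0.1, 1.0):
    print(abs(psi2(t) - exp(b * t) * psi1(a * t)))  # ≈ 0 for each t
```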
![Page 24: Chapter 4 DeGroot & Schervish. Variance Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649efa5503460f94c0b9f3/html5/thumbnails/24.jpg)
Example
![Page 25: Chapter 4 DeGroot & Schervish. Variance Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649efa5503460f94c0b9f3/html5/thumbnails/25.jpg)
Theorem
Suppose that X1, . . . , Xn are n independent random variables, and for i = 1, . . . , n, let ψi denote the m.g.f. of Xi. Let Y = X1 + . . . + Xn, and let the m.g.f. of Y be denoted by ψ. Then for every value of t such that ψi(t) is finite for i = 1, . . . , n,
ψ(t) = ψ1(t)ψ2(t) · · · ψn(t).
Proof
ψ(t) = E(e^{tY}) = E(e^{t(X1 + . . . + Xn)}) = E(e^{tX1} · · · e^{tXn}) = E(e^{tX1}) · · · E(e^{tXn}) = ψ1(t) · · · ψn(t),
where the expectation of the product factors because X1, . . . , Xn are independent.
The Moment Generating Function for the Binomial Distribution
Suppose that a random variable X has the binomial distribution with parameters n and p. The mean and the variance of X are determined by representing X as the sum of n independent random variables X1, . . . , Xn. The distribution of each variable Xi is as follows:
Pr(Xi = 1) = p and Pr(Xi = 0) = 1 − p.
Now use this representation to determine the m.g.f. of X = X1 + . . . + Xn.
The Moment Generating Function for the Binomial Distribution
The m.g.f. of each Xi is ψi(t) = E(e^{tXi}) = pe^t + (1 − p). Since X1, . . . , Xn are independent, the m.g.f. of their sum is the product of their m.g.f.'s, so
ψ(t) = (pe^t + 1 − p)^n.
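The closed form (pe^t + 1 − p)^n, the standard binomial m.g.f., can be checked against the direct sum E(e^{tX}) over the binomial pmf (n = 8 and p = 0.4 are arbitrary illustration values):

```python
# Check the binomial m.g.f. (p*e**t + 1 - p)**n against the direct sum E(e**(t X)).
from math import comb, exp

def mgf_direct(n, p, t):
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * exp(t * k)
               for k in range(n + 1))

n, p = 8, 0.4
for t in (-1.0, 0.0, 0.5):
    print(abs(mgf_direct(n, p, t) - (p * exp(t) + 1 - p) ** n))  # ≈ 0
```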
Uniqueness of Moment Generating Functions
Theorem
If the m.g.f.'s of two random variables X1 and X2 are finite and identical for all values of t in an open interval around the point t = 0, then the probability distributions of X1 and X2 must be identical.
The Additive Property of the Binomial Distribution
If X1 and X2 are independent random variables, and if Xi has the binomial distribution with parameters ni and p (i = 1, 2), then X1 + X2 has the binomial distribution with parameters n1 + n2 and p.
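The additive property can be checked numerically: convolving the pmfs of two independent binomials with the same p gives the pmf of the sum, which should match Binomial(n1 + n2, p) (the parameter values below are arbitrary):

```python
# Convolve two binomial pmfs with the same p; compare with Binomial(n1+n2, p).
from math import comb

def binom_pmf(n, p):
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(f, g):
    # pmf of the sum of two independent integer-valued variables
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

n1, n2, p = 3, 5, 0.25
conv = convolve(binom_pmf(n1, p), binom_pmf(n2, p))
direct = binom_pmf(n1 + n2, p)
print(max(abs(a - b) for a, b in zip(conv, direct)))  # ≈ 0.0
```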
The Mean and the Median
Although the mean of a distribution is a measure of central location, the median is another measure of central location. Let X be a random variable. Every number m with the following property is called a median of the distribution of X:
Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2.
Indeed, the 1/2 quantile is a median.
Example: The Median of a Discrete Distribution
Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.2, Pr(X = 3) = 0.3, Pr(X = 4) = 0.4.
The value 3 is a median of this distribution because Pr(X ≤ 3) = 0.6, which is greater than 1/2, and Pr(X ≥ 3) = 0.7, which is also greater than 1/2. Furthermore, 3 is the unique median of this distribution.
Example: A Discrete Distribution for Which the Median Is Not Unique
Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.4, Pr(X = 3) = 0.3, Pr(X = 4) = 0.2.
Here Pr(X ≤ 2) = 1/2 and Pr(X ≥ 3) = 1/2. Therefore, every value of m in the closed interval 2 ≤ m ≤ 3 is a median of this distribution. The most popular choice of median for this distribution is the midpoint, 2.5.
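The defining inequalities can be applied mechanically to both of these discrete examples. The helper below searches only over the support of the pmf, so for the second distribution it reports the endpoints 2 and 3 of the interval of medians:

```python
# Find every median m in the support of a pmf, using the definition
# Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2.
def medians(pmf):
    def ok(m):
        le = sum(p for x, p in pmf.items() if x <= m)
        ge = sum(p for x, p in pmf.items() if x >= m)
        return le >= 0.5 and ge >= 0.5
    return [m for m in sorted(pmf) if ok(m)]

print(medians({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}))  # [3]     (unique median)
print(medians({1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}))  # [2, 3]  (endpoints of [2, 3])
```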
Example: The Median of a Continuous Distribution
Suppose that X has a continuous distribution for which the p.d.f. is as follows:
Mean Squared Error (M.S.E.)
Suppose that X is a random variable with mean μ and variance σ². Suppose also that the value of X is to be observed in some experiment, but that this value must be predicted before the observation can be made. One basis for making the prediction is to select some number d for which the expected value of the square of the error X − d will be a minimum. The number E[(X − d)²] is called the mean squared error (M.S.E.) of the prediction d. Since E[(X − d)²] = σ² + (μ − d)², the number d for which the M.S.E. is minimized is d = E(X).
Mean Absolute Error (M.A.E.)
Another possible basis for predicting the value of a random variable X is to choose some number d for which the mean absolute error E(|X − d|) will be a minimum. The M.A.E. is minimized when the chosen value of d is a median of the distribution of X.
Example: Predicting a Discrete Uniform Random Variable
Suppose that the probability is 1/6 that a random variable X will take each of the following six values: 1, 2, 3, 4, 5, 6. Determine the prediction for which the M.S.E. is minimum and the prediction for which the M.A.E. is minimum.
In this example, E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5. Therefore, the M.S.E. is minimized by the unique value d = 3.5. Also, every number m in the closed interval 3 ≤ m ≤ 4 is a median of the given distribution. Therefore, the M.A.E. is minimized by every value of d such that 3 ≤ d ≤ 4. Because the distribution of X is symmetric, the mean of X is also a median of X.
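Both error criteria for this fair-die example can be evaluated directly; the M.S.E. is strictly smallest at d = 3.5, while the M.A.E. is flat across [3, 4]:

```python
# M.S.E. and M.A.E. of a prediction d for a fair die X uniform on {1,...,6}.
xs = range(1, 7)

def mse(d):
    return sum((x - d) ** 2 for x in xs) / 6

def mae(d):
    return sum(abs(x - d) for x in xs) / 6

print(mse(3.5), mse(3.0), mse(4.0))  # 3.5 is strictly best for M.S.E.
print(mae(3.0), mae(3.5), mae(4.0))  # all 1.5: any d in [3, 4] ties for M.A.E.
```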
Covariance and Correlation
When we are interested in the joint distribution of two random variables, it is useful to have a summary of how much the two random variables depend on each other. The covariance and correlation attempt to measure that dependence, but they capture only a particular type of dependence, namely linear dependence.
Covariance
Let X and Y be random variables having finite means, and let E(X) = μX and E(Y) = μY. The covariance of X and Y, denoted by Cov(X, Y), is defined as
Cov(X, Y) = E[(X − μX)(Y − μY)].
Example Let X and Y have the joint p.d.f. f:
Theorem
For all random variables X and Y,
Cov(X, Y) = E(XY) − E(X)E(Y).
Proof
Cov(X, Y) = E(XY − μX Y − μY X + μX μY)
= E(XY) − μX E(Y) − μY E(X) + μX μY
= E(XY) − μX μY.
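The identity Cov(X, Y) = E(XY) − E(X)E(Y) is easy to exercise on a small joint pmf. The table below is made up for illustration (the joint p.d.f. from the example slide is not captured in this transcript):

```python
# Cov(X, Y) = E(XY) - E(X)E(Y) on a small made-up joint pmf.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

ex = sum(x * p for (x, y), p in joint.items())      # E(X)
ey = sum(y * p for (x, y), p in joint.items())      # E(Y)
exy = sum(x * y * p for (x, y), p in joint.items()) # E(XY)

print(exy - ex * ey)  # ≈ 0.1: X and Y are positively associated here
```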
Correlation
Let X and Y be random variables with finite variances σX² and σY², respectively. Then the correlation of X and Y, denoted by ρ(X, Y), is defined as follows:
ρ(X, Y) = Cov(X, Y)/(σX σY).
Theorem (Schwarz Inequality)
For all random variables X and Y, [E(XY)]² ≤ E(X²)E(Y²). In particular, [Cov(X, Y)]² ≤ σX² σY², and hence −1 ≤ ρ(X, Y) ≤ 1.
Properties of Covariance and Correlation
Theorem: If X and Y are independent random variables with finite variances, then
Cov(X, Y) = ρ(X, Y) = 0.
Proof: If X and Y are independent, then E(XY) = E(X)E(Y). Therefore Cov(X, Y) = E(XY) − E(X)E(Y) = 0, and it follows that ρ(X, Y) = 0.
Theorem
Suppose that X is a random variable and Y = aX + b for constants a and b with a ≠ 0. If a > 0, then ρ(X, Y) = 1. If a < 0, then ρ(X, Y) = −1.
Since Cov(X, Y) = a Var(X) and σY = |a|σX, the theorem follows from the correlation equation.
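The ±1 correlation of a linear function can be checked numerically, reusing the five-point distribution from the variance examples (the linear maps 4x − 7 and −2x + 1 are arbitrary choices with a > 0 and a < 0):

```python
# rho(X, Y) for Y = a X + b: +1 when a > 0, -1 when a < 0.
xs = [-2, 0, 1, 3, 4]
prob = 1 / len(xs)

def mean(vals):
    return sum(prob * v for v in vals)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum(prob * (ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def rho(u, v):
    return cov(u, v) / (cov(u, u) ** 0.5 * cov(v, v) ** 0.5)

print(rho(xs, [4 * x - 7 for x in xs]))   # ≈ 1.0
print(rho(xs, [-2 * x + 1 for x in xs]))  # ≈ -1.0
```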
Theorem
If X and Y are random variables, then
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
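This identity can be verified term by term on a small joint pmf; the table below is a made-up example for illustration:

```python
# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on a made-up joint pmf.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def e(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

var_x = e(lambda x, y: x * x) - e(lambda x, y: x) ** 2
var_y = e(lambda x, y: y * y) - e(lambda x, y: y) ** 2
cov_xy = e(lambda x, y: x * y) - e(lambda x, y: x) * e(lambda x, y: y)
var_sum = e(lambda x, y: (x + y) ** 2) - e(lambda x, y: x + y) ** 2

print(var_sum)                     # ≈ 0.69
print(var_x + var_y + 2 * cov_xy)  # ≈ 0.69
```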
Theorem