Chapter 4, DeGroot & Schervish
Variance
Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution. A random variable X with mean 2 has the same mean as the constant random variable Y such that Pr(Y = 2) = 1, even if X is not constant! To distinguish the distribution of X from the distribution of Y in this case, it is useful to have some measure of how spread out the distribution of X is. The variance of X is one such measure. The standard deviation of X is the square root of the variance.
Stock Price Changes
Consider the prices A and B of two stocks at a time one month in the future. Assume that A has the uniform distribution on the interval [25, 35] and that B has the uniform distribution on the interval [15, 45]. Both stocks have a mean price of 30, but the distributions are very different.
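As a quick numerical sketch of this comparison, the snippet below uses the standard facts that a Uniform[a, b] distribution has mean (a + b)/2 and variance (b − a)²/12 (these formulas are not derived in the slides):

```python
# Same mean, very different spread, for the two uniform stock-price models.
# Mean of Uniform[a, b] is (a + b)/2; variance is (b - a)**2 / 12.

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_var(a, b):
    return (b - a) ** 2 / 12

# A ~ Uniform[25, 35], B ~ Uniform[15, 45]
print(uniform_mean(25, 35), uniform_mean(15, 45))  # both 30.0
print(uniform_var(25, 35))  # ≈ 8.33
print(uniform_var(15, 45))  # 75.0
```

Although A and B have the same mean, B's variance is nine times larger, which is exactly the kind of difference the variance is designed to capture.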
Stock Price Changes (figure)
Variance/Standard Deviation
Let X be a random variable with finite mean μ = E(X). The variance of X, denoted by Var(X), is defined as follows:
Var(X) = E[(X − μ)²].
The standard deviation of X is the nonnegative square root of Var(X), if the variance exists. When only one random variable is being discussed, it is common to denote its standard deviation by the symbol σ, and the variance by σ².
Stock Price Changes
Return to the two random variables A and B in the example above.
Variance and Standard Deviation of a Discrete Distribution
Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Then
E(X) = (1/5)(−2 + 0 + 1 + 3 + 4) = 1.2.
Let W = (X − μ)², so that Var(X) = E(W):
Var(X) = (1/5)[(−3.2)² + (−1.2)² + (−0.2)² + (1.8)² + (2.8)²] = 4.56.
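The same computation, written as a short script over the five-point distribution:

```python
# E(X) and Var(X) = E(W) with W = (X - mu)**2 for the five-point distribution.
xs = [-2, 0, 1, 3, 4]
prob = 1 / len(xs)  # each value has probability 1/5

mu = sum(prob * x for x in xs)               # E(X)
var = sum(prob * (x - mu) ** 2 for x in xs)  # E(W)

print(mu)   # ≈ 1.2
print(var)  # ≈ 4.56
```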
Properties of the Variance
Theorem: Var(X) = 0 if and only if there exists a constant c such that Pr(X = c) = 1.
Properties of the Variance
Theorem: For constants a and b, if Y = aX + b, then Var(Y) = a² Var(X) and σY = |a| σX.
Calculating the Variance and Standard Deviation of a Linear Function
Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Determine the variance and standard deviation of Y = 4X − 7.
The mean of X is μ = 1.2 and the variance is Var(X) = 4.56. Therefore,
Var(Y) = 16 Var(X) = 72.96.
Also, the standard deviation of Y is
σY = 4σX = 4(4.56)^{1/2} ≈ 8.54.
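A numerical check of Var(Y) = a² Var(X) and σY = |a| σX for this example:

```python
# Check Var(Y) = a**2 Var(X) and sigma_Y = |a| sigma_X for Y = 4X - 7.
xs = [-2, 0, 1, 3, 4]
prob = 1 / len(xs)

def var(vals):
    mu = sum(prob * v for v in vals)
    return sum(prob * (v - mu) ** 2 for v in vals)

ys = [4 * x - 7 for x in xs]
print(var(xs))         # ≈ 4.56
print(var(ys))         # ≈ 72.96 = 16 * 4.56
print(var(ys) ** 0.5)  # ≈ 8.54
```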
For every random variable X, Var(X) = E(X2) − [E(X)]2.
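The shortcut formula can be verified against the definitional form E[(X − μ)²] on a small made-up probability mass function (the pmf below is chosen only for illustration):

```python
# Verify Var(X) = E(X**2) - E(X)**2 against E[(X - mu)**2] on a small pmf.
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

ex = sum(x * p for x, p in pmf.items())        # E(X) = 3.0
ex2 = sum(x * x * p for x, p in pmf.items())   # E(X**2) = 10.0
var_shortcut = ex2 - ex ** 2
var_def = sum((x - ex) ** 2 * p for x, p in pmf.items())

print(var_shortcut, var_def)  # both ≈ 1.0
```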
Theorem
If X1, . . . , Xn are independent random variables with finite means, then
Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).
The Variance of a Binomial Distribution
Suppose that a box contains red balls and blue balls, and that the proportion of red balls is p (0 ≤ p ≤ 1). Suppose that n balls are selected from the box with replacement. For i = 1, . . . , n, let Xi = 1 if the ith ball selected is red, and let Xi = 0 otherwise. If X denotes the total number of red balls in the sample, then X = X1 + . . . + Xn, and X has the binomial distribution with parameters n and p.
Since X1, . . . , Xn are independent, it follows from the preceding theorem that Var(X) = Var(X1) + . . . + Var(Xn). Now E(Xi) = p for i = 1, . . . , n, and since Xi² = Xi for each i, E(Xi²) = E(Xi) = p. Therefore,
Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p),
and hence
Var(X) = np(1 − p).
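The formula np(1 − p) can be confirmed by summing directly over the binomial pmf (n = 10 and p = 0.3 below are arbitrary illustration values):

```python
# Confirm Var(X) = n p (1 - p) by summing over the exact binomial pmf.
from math import comb

def binomial_var(n, p):
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

n, p = 10, 0.3
print(binomial_var(n, p))  # ≈ 2.1
print(n * p * (1 - p))     # ≈ 2.1
```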
Moments
For a random variable X, the means of powers X^k (called moments) for k > 2 have useful theoretical properties, and some of them are used for additional summaries of a distribution. The moment generating function is a related tool.
Existence of Moments
For each random variable X and every positive integer k, the expectation E(X^k) is called the kth moment of X. In particular, in accordance with this terminology, the mean of X is the first moment of X.
Existence of Moments
Suppose that X is a random variable for which E(X) = μ. For every positive integer k, the expectation E[(X − μ)^k] is called the kth central moment of X, or the kth moment of X about the mean. In particular, in accordance with this terminology, the variance of X is the second central moment of X.
Moment Generating Functions
Let X be a random variable. For each real number t, define
ψ(t) = E(e^{tX}).
The function ψ(t) is called the moment generating function (abbreviated m.g.f.) of X.
The m.g.f. of X depends only on the distribution of X: since the m.g.f. is the expected value of a function of X, it must depend only on the distribution of X. If X and Y have the same distribution, they must have the same m.g.f.
Theorem
Let X be a random variable whose m.g.f. ψ(t) is finite for all values of t in some open interval around the point t = 0. Then, for each integer n > 0, the nth moment of X, E(X^n), is finite and equals the nth derivative ψ^{(n)}(t) at t = 0. That is, E(X^n) = ψ^{(n)}(0) for n = 1, 2, . . . .
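As a numerical sketch of this theorem, take X ~ Bernoulli(p), whose m.g.f. is ψ(t) = pe^t + (1 − p) (a standard fact, consistent with the binomial building blocks used later in these slides), and estimate the derivatives at t = 0 by finite differences:

```python
# Numerical sketch of E(X**n) = psi^(n)(0) for X ~ Bernoulli(p).
from math import exp

p = 0.3
def psi(t):
    return p * exp(t) + (1 - p)  # m.g.f. of a Bernoulli(p) variable

h = 1e-4
first = (psi(h) - psi(-h)) / (2 * h)               # ≈ psi'(0) = E(X) = p
second = (psi(h) - 2 * psi(0) + psi(-h)) / h ** 2  # ≈ psi''(0) = E(X**2) = p

print(first, second)  # both ≈ 0.3, since X**2 = X for a 0/1 variable
```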
Properties of Moment Generating Functions
Theorem
Let X be a random variable for which the m.g.f. is ψ1; let Y = aX + b, where a and b are given constants; and let ψ2 denote the m.g.f. of Y. Then for every value of t such that ψ1(at) is finite,
ψ2(t) = e^{bt} ψ1(at).
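A quick numeric check of this identity, again using a Bernoulli(p) variable as X (so ψ1(t) = pe^t + 1 − p, and Y = aX + b takes the value b with probability 1 − p and a + b with probability p; the values of p, a, b are arbitrary):

```python
# Check psi_2(t) = e**(b t) * psi_1(a t) for Y = a X + b, X ~ Bernoulli(p).
from math import exp

p, a, b = 0.3, 4.0, -7.0

def psi1(t):
    return p * exp(t) + (1 - p)

def psi2(t):  # direct E(e**(t Y)) from the two-point distribution of Y
    return (1 - p) * exp(b * t) + p * exp((a + b) * t)

for t in (-0.5, 0.1, 1.0):
    print(abs(psi2(t) - exp(b * t) * psi1(a * t)))  # ≈ 0 for each t
```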
![Page 24: Chapter 4 DeGroot & Schervish. Variance Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649efa5503460f94c0b9f3/html5/thumbnails/24.jpg)
Example
![Page 25: Chapter 4 DeGroot & Schervish. Variance Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649efa5503460f94c0b9f3/html5/thumbnails/25.jpg)
Theorem
Suppose that X1, . . . , Xn are n independent random variables, and for i = 1, . . . , n, let ψi denote the m.g.f. of Xi. Let Y = X1 + . . . + Xn, and let the m.g.f. of Y be denoted by ψ. Then for every value of t such that ψi(t) is finite for i = 1, . . . , n,
ψ(t) = ψ1(t)ψ2(t) · · · ψn(t).
Proof
ψ(t) = E(e^{tY}) = E(e^{t(X1 + . . . + Xn)}) = E(e^{tX1} · · · e^{tXn}) = E(e^{tX1}) · · · E(e^{tXn}) = ψ1(t) · · · ψn(t),
where the expectation of the product factors because X1, . . . , Xn are independent.
The Moment Generating Function for the Binomial Distribution
Suppose that a random variable X has the binomial distribution with parameters n and p. The mean and the variance of X are determined by representing X as the sum of n independent random variables X1, . . . , Xn. The distribution of each variable Xi is as follows:
Pr(Xi = 1) = p and Pr(Xi = 0) = 1 − p.
Now use this representation to determine the m.g.f. of X = X1 + . . . + Xn.
The Moment Generating Function for the Binomial Distribution
The m.g.f. of each Xi is ψi(t) = E(e^{tXi}) = pe^t + (1 − p). Since X1, . . . , Xn are independent, the m.g.f. of their sum is the product of their m.g.f.'s, so
ψ(t) = (pe^t + 1 − p)^n.
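The closed form (pe^t + 1 − p)^n, the standard binomial m.g.f., can be checked against the direct sum E(e^{tX}) over the binomial pmf (n = 8 and p = 0.4 are arbitrary illustration values):

```python
# Check the binomial m.g.f. (p*e**t + 1 - p)**n against the direct sum E(e**(t X)).
from math import comb, exp

def mgf_direct(n, p, t):
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * exp(t * k)
               for k in range(n + 1))

n, p = 8, 0.4
for t in (-1.0, 0.0, 0.5):
    print(abs(mgf_direct(n, p, t) - (p * exp(t) + 1 - p) ** n))  # ≈ 0
```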
Uniqueness of Moment Generating Functions
Theorem
If the m.g.f.'s of two random variables X1 and X2 are finite and identical for all values of t in an open interval around the point t = 0, then the probability distributions of X1 and X2 must be identical.
The Additive Property of the Binomial Distribution
If X1 and X2 are independent random variables, and if Xi has the binomial distribution with parameters ni and p (i = 1, 2), then X1 + X2 has the binomial distribution with parameters n1 + n2 and p.
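The additive property can be checked numerically: convolving the pmfs of two independent binomials with the same p gives the pmf of the sum, which should match Binomial(n1 + n2, p) (the parameter values below are arbitrary):

```python
# Convolve two binomial pmfs with the same p; compare with Binomial(n1+n2, p).
from math import comb

def binom_pmf(n, p):
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(f, g):
    # pmf of the sum of two independent integer-valued variables
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

n1, n2, p = 3, 5, 0.25
conv = convolve(binom_pmf(n1, p), binom_pmf(n2, p))
direct = binom_pmf(n1 + n2, p)
print(max(abs(a - b) for a, b in zip(conv, direct)))  # ≈ 0.0
```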
The Mean and the Median
Although the mean of a distribution is a measure of central location, the median is another measure of central location. Let X be a random variable. Every number m with the following property is called a median of the distribution of X:
Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2.
Indeed, the 1/2 quantile is a median.
Example: The Median of a Discrete Distribution
Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.2, Pr(X = 3) = 0.3, Pr(X = 4) = 0.4.
The value 3 is a median of this distribution because Pr(X ≤ 3) = 0.6, which is greater than 1/2, and Pr(X ≥ 3) = 0.7, which is also greater than 1/2. Furthermore, 3 is the unique median of this distribution.
Example: A Discrete Distribution for Which the Median Is Not Unique
Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.4, Pr(X = 3) = 0.3, Pr(X = 4) = 0.2.
Here Pr(X ≤ 2) = 1/2 and Pr(X ≥ 3) = 1/2. Therefore, every value of m in the closed interval 2 ≤ m ≤ 3 is a median of this distribution. The most popular choice of median for this distribution is the midpoint, 2.5.
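The defining inequalities can be applied mechanically to both of these discrete examples. The helper below searches only over the support of the pmf, so for the second distribution it reports the endpoints 2 and 3 of the interval of medians:

```python
# Find every median m in the support of a pmf, using the definition
# Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2.
def medians(pmf):
    def ok(m):
        le = sum(p for x, p in pmf.items() if x <= m)
        ge = sum(p for x, p in pmf.items() if x >= m)
        return le >= 0.5 and ge >= 0.5
    return [m for m in sorted(pmf) if ok(m)]

print(medians({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}))  # [3]     (unique median)
print(medians({1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}))  # [2, 3]  (endpoints of [2, 3])
```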
Example: The Median of a Continuous Distribution
Suppose that X has a continuous distribution for which the p.d.f. is as follows:
Mean Squared Error (M.S.E.)
Suppose that X is a random variable with mean μ and variance σ². Suppose also that the value of X is to be observed in some experiment, but that this value must be predicted before the observation can be made. One basis for making the prediction is to select some number d for which the expected value of the square of the error X − d will be a minimum. The number E[(X − d)²] is called the mean squared error (M.S.E.) of the prediction d. Since E[(X − d)²] = σ² + (μ − d)², the number d for which the M.S.E. is minimized is d = E(X).
Mean Absolute Error (M.A.E.)
Another possible basis for predicting the value of a random variable X is to choose some number d for which the mean absolute error E(|X − d|) will be a minimum. The M.A.E. is minimized when the chosen value of d is a median of the distribution of X.
Example: Predicting a Discrete Uniform Random Variable
Suppose that the probability is 1/6 that a random variable X will take each of the following six values: 1, 2, 3, 4, 5, 6. Determine the prediction for which the M.S.E. is minimum and the prediction for which the M.A.E. is minimum.
In this example, E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5. Therefore, the M.S.E. is minimized by the unique value d = 3.5. Also, every number m in the closed interval 3 ≤ m ≤ 4 is a median of the given distribution. Therefore, the M.A.E. is minimized by every value of d such that 3 ≤ d ≤ 4. Because the distribution of X is symmetric, the mean of X is also a median of X.
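Both error criteria for this fair-die example can be evaluated directly; the M.S.E. is strictly smallest at d = 3.5, while the M.A.E. is flat across [3, 4]:

```python
# M.S.E. and M.A.E. of a prediction d for a fair die X uniform on {1,...,6}.
xs = range(1, 7)

def mse(d):
    return sum((x - d) ** 2 for x in xs) / 6

def mae(d):
    return sum(abs(x - d) for x in xs) / 6

print(mse(3.5), mse(3.0), mse(4.0))  # 3.5 is strictly best for M.S.E.
print(mae(3.0), mae(3.5), mae(4.0))  # all 1.5: any d in [3, 4] ties for M.A.E.
```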
Covariance and Correlation
When we are interested in the joint distribution of two random variables, it is useful to have a summary of how much the two random variables depend on each other. The covariance and correlation attempt to measure that dependence, but they capture only a particular type of dependence, namely linear dependence.
Covariance
Let X and Y be random variables having finite means, and let E(X) = μX and E(Y) = μY. The covariance of X and Y, denoted by Cov(X, Y), is defined as
Cov(X, Y) = E[(X − μX)(Y − μY)].
Example Let X and Y have the joint p.d.f. f:
Theorem
For all random variables X and Y,
Cov(X, Y) = E(XY) − E(X)E(Y).
Proof
Cov(X, Y) = E(XY − μX Y − μY X + μX μY)
= E(XY) − μX E(Y) − μY E(X) + μX μY
= E(XY) − μX μY.
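The identity Cov(X, Y) = E(XY) − E(X)E(Y) is easy to exercise on a small joint pmf. The table below is made up for illustration (the joint p.d.f. from the example slide is not captured in this transcript):

```python
# Cov(X, Y) = E(XY) - E(X)E(Y) on a small made-up joint pmf.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

ex = sum(x * p for (x, y), p in joint.items())      # E(X)
ey = sum(y * p for (x, y), p in joint.items())      # E(Y)
exy = sum(x * y * p for (x, y), p in joint.items()) # E(XY)

print(exy - ex * ey)  # ≈ 0.1: X and Y are positively associated here
```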
Correlation
Let X and Y be random variables with finite variances σX² and σY², respectively. Then the correlation of X and Y, denoted by ρ(X, Y), is defined as follows:
ρ(X, Y) = Cov(X, Y)/(σX σY).
Theorem (Schwarz Inequality)
For all random variables X and Y, [E(XY)]² ≤ E(X²)E(Y²). In particular, [Cov(X, Y)]² ≤ σX² σY², and hence −1 ≤ ρ(X, Y) ≤ 1.
Properties of Covariance and Correlation
Theorem: If X and Y are independent random variables with finite variances, then
Cov(X, Y) = ρ(X, Y) = 0.
Proof: If X and Y are independent, then E(XY) = E(X)E(Y). Therefore Cov(X, Y) = E(XY) − E(X)E(Y) = 0, and it follows that ρ(X, Y) = 0.
Theorem
Suppose that X is a random variable and Y = aX + b for constants a and b with a ≠ 0. If a > 0, then ρ(X, Y) = 1. If a < 0, then ρ(X, Y) = −1.
Since Cov(X, Y) = a Var(X) and σY = |a|σX, the theorem follows from the correlation equation.
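The ±1 correlation of a linear function can be checked numerically, reusing the five-point distribution from the variance examples (the linear maps 4x − 7 and −2x + 1 are arbitrary choices with a > 0 and a < 0):

```python
# rho(X, Y) for Y = a X + b: +1 when a > 0, -1 when a < 0.
xs = [-2, 0, 1, 3, 4]
prob = 1 / len(xs)

def mean(vals):
    return sum(prob * v for v in vals)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum(prob * (ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def rho(u, v):
    return cov(u, v) / (cov(u, u) ** 0.5 * cov(v, v) ** 0.5)

print(rho(xs, [4 * x - 7 for x in xs]))   # ≈ 1.0
print(rho(xs, [-2 * x + 1 for x in xs]))  # ≈ -1.0
```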
Theorem
If X and Y are random variables, then
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
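This identity can be verified term by term on a small joint pmf; the table below is a made-up example for illustration:

```python
# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on a made-up joint pmf.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def e(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

var_x = e(lambda x, y: x * x) - e(lambda x, y: x) ** 2
var_y = e(lambda x, y: y * y) - e(lambda x, y: y) ** 2
cov_xy = e(lambda x, y: x * y) - e(lambda x, y: x) * e(lambda x, y: y)
var_sum = e(lambda x, y: (x + y) ** 2) - e(lambda x, y: x + y) ** 2

print(var_sum)                     # ≈ 0.69
print(var_x + var_y + 2 * cov_xy)  # ≈ 0.69
```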
Theorem