7/29/2019 Lecture8 Handout
http://slidepdf.com/reader/full/lecture8-handout 1/51
Lecture 8: Estimation
Matt Golder & Sona Golder
Pennsylvania State University
Introduction
Populations are characterized by numerical descriptive measures called parameters. The parameters that describe a population are fixed constants.
Important population parameters that we might be interested in are the population mean µ and variance σ². A parameter of interest is often called a target parameter.
Methods for making inferences about parameters fall into one of two categories:
1 We will estimate (predict) the value of the target parameter of interest. “What is the value of the population parameter?”
2 We will test a hypothesis about the value of the target parameter. “Is the parameter value equal to this specific value?”
We’ll use Greek letters for population parameters (like θ), and letters with “hats” (θ̂) for specific data-based estimates of those parameters.
Introduction
Suppose we are interested in estimating the mean waiting time µ in a supermarket. We can give our estimate in two forms.
1 Point estimate: A single value or point – say, 3 minutes – that we think is close to the unknown population mean µ.
2 Interval estimate: Two values that correspond to an interval – say 2 and 4 minutes – that is intended to enclose the parameter of interest µ.
Estimation is accomplished by using an estimator for the target parameter.
Introduction
An estimator is a rule, often expressed as a formula, that tells us how to calculate the value of an estimate based on the measurements contained in a sample.
For example, the sample mean
X̄ = (1/N) Σ_{i=1}^N X_i
is one possible point estimator of the population mean µ.
Recall that any estimate we make is itself a random variable.
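To make this concrete, here is a short Python sketch (not part of the handout; the population values and sample sizes are illustrative) showing that the sample mean is a rule applied to sample measurements, and that different samples give different estimates of the same µ:

```python
import random

random.seed(42)
# A large "population" with true mean mu = 10 (values are illustrative)
population = [random.gauss(10, 2) for _ in range(100_000)]

def sample_mean(data, n):
    """Point estimator: the mean of a random sample of size n."""
    draw = random.sample(data, n)
    return sum(draw) / n

# Two different samples yield two different estimates of the same mu
est1 = sample_mean(population, 50)
est2 = sample_mean(population, 50)
print(est1, est2)  # both near 10, but not identical
```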
Random Variables
One way to think of a random variable is that it is made up of two parts: a systematic component and a random part:

X_i = µ + u_i

This implies something about u:

u_i = X_i − µ
Random Variables
What is the expected value of u?
E(u) = E(X − µ)
     = E(X) − E(µ)
     = E(X) − µ
     = µ − µ
     = 0
The mean is a number such that, if it is subtracted from each value of X in the sample, the sum of those differences will be zero.
Random Variables
What about the variance of X and u?
Var(X) = E[(X − µ)²] = E[u²]

Var(u) = E[(u − E(u))²]
       = E[(u − 0)²]
       = E[u²]
If we define a random variable as composed of a fixed part and a random part, then:
The variable will have a population mean (i.e. E(X)) equal to µ, and
The variance of X is equal to the variance of u.
Estimates are Random Variables
Suppose that:
We want to know µ for the population, but
we only have data on a sample of N observations from the population, so
we use these data to estimate the mean.
X̄ = (1/N) Σ_{i=1}^N X_i
Estimates are Random Variables
Recalling that each X_i = µ + u_i, we can write:

X̄ = (1/N) Σ_{i=1}^N (µ + u_i)
   = (1/N) Σ_{i=1}^N µ + (1/N) Σ_{i=1}^N u_i
   = (1/N)(Nµ) + (1/N) Σ_{i=1}^N u_i
   = µ + ū
Estimates are Random Variables
This means that
The estimate of the mean is itself a random variable.
For different samples, we’ll get different values of ū (the sample-based “average” of the stochastic component of X), and correspondingly different estimates of the mean.
All of this (stochastic) variation is due to the random component of X .
We could show in analogous fashion that the usual estimate of the variance σ², namely s², is also a random variable.
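A quick simulation (my own sketch; the parameter values are illustrative) shows that both X̄ and s² bounce around from sample to sample:

```python
import random
import statistics

random.seed(1)
mu, sigma = 5.0, 2.0  # illustrative population parameters

def draw_estimates(n):
    """Return (X-bar, s^2) computed from one random sample of size n."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(sample), statistics.variance(sample)

# Five samples give five different values of each estimator
estimates = [draw_estimates(30) for _ in range(5)]
for xbar, s2 in estimates:
    print(round(xbar, 3), round(s2, 3))
```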
Properties of Estimators
In theory, there are many different estimators. The one(s) we will choose will depend on their properties.
There are two general types of properties of estimators:
Small-Sample Properties
These properties hold irrespective of the size of the sample on which the estimate is based.
In other words, in order for an estimator to have these properties, they must hold for all possible sample sizes.
Properties of Estimators
Large-Sample (Asymptotic) Properties
These are properties which hold only as the sample size increases to infinity.
In practical terms, this means that to receive the benefits of these properties, “more is better” (at least as far as sample size goes).
In what follows, we’ll consider an abstract population parameter θ (a mean, a correlation, etc.). We’ll assume that we estimate it with a sample of N observations. And we’ll call this generic estimator θ̂.
Unbiased Point Estimators
We generally prefer that estimators be “accurate”, i.e., that they reflect the population parameter as closely as possible:

E(θ̂) = θ

If this property holds, then we say an estimator is unbiased.
An unbiased estimator is one for which its expected value is the population parameter.
Unbiased Point Estimators
Definition: Let θ̂ be a point estimator for parameter θ. Then θ̂ is an unbiased estimator if E(θ̂) = θ. If E(θ̂) ≠ θ, then θ̂ is said to be biased. The bias of a point estimator θ̂ is given by B(θ̂) = E(θ̂) − θ.
Figure: Sampling Distribution for an Unbiased and Biased Estimator (θ̂1 is centered on θ, while θ̂2 is centered on E(θ̂2), which differs from θ by the bias)
Unbiased Point Estimators
We’ve already seen that the sample mean is an unbiased estimate of the population mean:

E(X̄) = E(µ + ū)
      = E(µ) + E(ū)
      = µ + 0
      = µ
But only if we have random sampling.
Unbiased Point Estimators and Random Sampling
Example: Suppose each of 200,000 people in a city under study has eaten X number of fast-food meals in the last week. However, a residential phone survey on a weekday afternoon misses those who are working - the very people most likely to eat fast food.
Table: Target Population and Biased Subpopulation

             Whole Target Population        Subpopulation Responding
X = Meals    Frequency    Rel. Frequency    Frequency    Rel. Frequency
0            100,000      0.50              38,000       0.76
1             40,000      0.20               6,000       0.12
2             40,000      0.20               4,000       0.08
3             20,000      0.10               2,000       0.04
Total        200,000      1.00              50,000       1.00
Unbiased Point Estimators and Random Sampling
Population mean: µ = 0(0.5) + 1(0.2) + 2(0.2) + 3(0.1) = 0.9.
Subpopulation mean: µR = 0(0.76) + 1(0.12) + 2(0.08) + 3(0.04) = 0.4.
A random sample of 200 phone calls during the week will bring about 50 responses, whose average R̄ will be used to estimate µ. What is the bias?
The sample mean R̄ has an obvious non-response bias:

Bias = E(R̄) − µ
     = µR − µ
     = 0.4 − 0.9 = −0.5

The bias is large and will lead researchers to underestimate the number of fast-food meals eaten in a week by 0.5.
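The arithmetic can be checked in a few lines of Python (the dictionaries below simply encode the relative frequencies from the table):

```python
# Relative-frequency distributions from the table
target = {0: 0.50, 1: 0.20, 2: 0.20, 3: 0.10}      # whole population
responding = {0: 0.76, 1: 0.12, 2: 0.08, 3: 0.04}  # biased subpopulation

def dist_mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

mu = dist_mean(target)        # population mean, 0.9
mu_r = dist_mean(responding)  # responding subpopulation mean, 0.4
bias = mu_r - mu              # non-response bias, -0.5
print(mu, mu_r, bias)
```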
Unbiased Point Estimators
So, how do we know if an estimator is unbiased?
As the example of the sample mean illustrates, we can sometimes prove it.
Other times, it can be difficult or impossible to show that an estimator is unbiased.
Moreover, there may be many, many unbiased estimators for a particular population parameter.
Unbiased Point Estimators
Example: Consider a sample of two observations X1 and X2, and a generalized estimator for the mean:

Z = λ1X1 + λ2X2

E(Z) = E(λ1X1 + λ2X2)
     = E(λ1X1) + E(λ2X2)
     = λ1E(X1) + λ2E(X2)
     = λ1µ + λ2µ
     = (λ1 + λ2)µ
Unbiased Point Estimators
So long as (λ1 + λ2) = 1.0, then E(Z) = µ and the estimator is unbiased.
This means that there are, in principle, an infinite number of unbiased estimators.
We could extend this to N observations: so long as the “weights” add up to 1.0, the estimate is unbiased.
So how do we choose which estimator to use?
Relative Efficiency of Point Estimators
If θ̂1 and θ̂2 denote two unbiased estimators for the same parameter θ, we prefer to use the estimator with the smaller variance, i.e. the estimator whose sampling distribution is more concentrated around the target parameter. This is the notion of efficiency.
Figure: Comparing the Efficiency of Estimators (the sampling distribution of θ̂1 is more tightly concentrated around θ than that of θ̂2)
Relative Efficiency of Point Estimators
To compare the relative efficiency of two estimators, we examine the ratio of their variances.
Definition: Given two unbiased estimators θ̂1 and θ̂2 of a parameter θ, with variances Var(θ̂1) and Var(θ̂2), respectively, the efficiency of θ̂1 relative to θ̂2, denoted eff(θ̂1, θ̂2), is defined by the ratio

eff(θ̂1, θ̂2) ≡ Var(θ̂2) / Var(θ̂1)

If θ̂1 and θ̂2 are unbiased estimators for θ, eff(θ̂1, θ̂2) is greater than 1 only if Var(θ̂2) > Var(θ̂1). In this case, θ̂1 is a better unbiased estimator than θ̂2.
If eff(θ̂1, θ̂2) < 1, then θ̂2 is preferred to θ̂1.
Relative Efficiency of Point Estimators
Example: We saw a long time ago that when the population being sampled is exactly symmetric, its center can be estimated without bias by the sample mean X̄ and the sample median X_Med. But which is more efficient?
When we sample from a normal population, it can be shown that

Var(X_Med) ≈ 1.57 σ²/N.

And we already know that for a normal population, Var(X̄) = σ²/N.
Relative Efficiency of Point Estimators
This means that:
eff(X̄, X_Med) ≡ Var(X_Med) / Var(X̄) = (1.57 σ²/N) / (σ²/N) = 1.57 = 157%

The sample mean X̄ is 57% more efficient than the sample median X_Med.
The sample median will yield as accurate an estimate as the sample mean only if we take a 57% larger sample.
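A small Monte Carlo sketch (the sample size, replication count, and seed below are my own arbitrary choices, not from the handout) recovers something close to the 1.57 ratio:

```python
import random
import statistics

random.seed(0)
N, reps = 100, 2000

# Simulate the sampling distributions of the mean and the median
# under standard normal data
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

eff = statistics.pvariance(medians) / statistics.pvariance(means)
print(round(eff, 2))  # near the theoretical 1.57
```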
Relative Efficiency of Point Estimators
Example: A distribution with thicker tails than the normal distribution is called the Laplace distribution. What is the efficiency of the sample mean relative to the sample median now?
Figure: Comparing the Standard Normal Distribution and Standard LaplaceDistribution
Relative Efficiency of Point Estimators
In sampling from a Laplace distribution, Var(X_Med) ≈ 0.5 σ²/N.
And so, we have:

eff(X̄, X_Med) ≡ Var(X_Med) / Var(X̄) = (0.5 σ²/N) / (σ²/N) = 0.5 = 50%

The sample mean is less efficient than the sample median in this case.
If a symmetric population has thick tails, so that outlying observations are likely to occur, then the sample mean has a larger variance. This is because it takes into account all observations, even the distant outliers that the sample median ignores.
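The same Monte Carlo check, with an inverse-CDF Laplace sampler in place of the normal draws (the sampler and settings are my own sketch, not from the handout), shows the ratio dropping below 1:

```python
import math
import random
import statistics

random.seed(0)

def laplace_draw():
    """Inverse-CDF sample from a standard Laplace (location 0, scale 1)."""
    u = random.random() - 0.5
    return -math.copysign(math.log(1 - 2 * abs(u)), u)

N, reps = 100, 2000
means, medians = [], []
for _ in range(reps):
    sample = [laplace_draw() for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

eff = statistics.pvariance(medians) / statistics.pvariance(means)
print(round(eff, 2))  # now below 1: the median beats the mean
```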
Relative Efficiency of Point Estimators
Example: Consider our generalized estimator Z = λ1X1 + λ2X2, and consider the variance of Z:

Var(Z) = Var(λ1X1 + λ2X2)
       = (λ1² + λ2²)σ²

We want to know what combination of weights minimizes this variance. Since we know that λ1 + λ2 = 1.0, we can rewrite:

λ1² + λ2² = λ1² + (1 − λ1)²
          = λ1² + (1 − 2λ1 + λ1²)
          = 2λ1² − 2λ1 + 1
Relative Efficiency of Point Estimators
We then minimize this by taking the derivative with respect to λ1 and setting it equal to zero:

4λ1 − 2 = 0
λ1 = 0.5

So the (equally-weighted) sample average has the smallest variance of all the possible unbiased estimators for the population mean.
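A grid search over the weight λ1 (a numerical stand-in for the calculus above; the grid resolution is arbitrary) confirms the minimum at 0.5:

```python
# Var(Z) = (lam^2 + (1 - lam)^2) * sigma^2 for weights lam and 1 - lam;
# scan lam on a grid to locate the minimizing weight (take sigma^2 = 1)
def var_z(lam):
    return lam ** 2 + (1 - lam) ** 2

grid = [i / 1000 for i in range(1001)]
best = min(grid, key=var_z)
print(best, var_z(best))  # 0.5 0.5
```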
Efficient Point Estimators
So far, we have talked about the relative efficiency of point estimators.
However, we sometimes talk about an efficient estimator in absolute terms.
An efficient estimator is an unbiased estimator whose variance is equal to what is known as the Cramer-Rao lower bound, I(θ).
If we can show that an estimator’s variance equals the Cramer-Rao lower bound, then we know that we have the most efficient estimator.
Note that an efficient estimator must be unbiased. It is possible that a biased estimator has a smaller variance than I(θ), but that does not make it efficient.
Efficient Point Estimators
Let X1, X2, . . . , XN denote a random sample from a probability density function f(x), which has unknown parameter θ. If θ̂ is an unbiased estimator of θ, then under very general conditions

Var(θ̂) ≥ I(θ)

where

I(θ) = { N E[ −∂² ln f(x) / ∂θ² ] }^(−1)

This is known as the Cramer-Rao inequality.
If Var(θ̂) = I(θ), then the estimator θ̂ is said to be efficient.
Mean Squared Error
Clearly one would like to have unbiased and efficient estimators. If we had the choice of two unbiased estimators, we would choose the more efficient one.
But what if we are comparing both biased and unbiased estimators? It turns out that it may no longer be appropriate to select the estimator with the least variance or the estimator with the least bias.
Maybe we will want to “trade off” some bias in favor of gains in efficiency, or vice versa.
Mean Squared Error
How do we decide which estimator is closest to the target parameter θ overall?
Figure: Mean Squared Error (three hypothetical estimators θ̂1, θ̂2, θ̂3 with different mixes of bias and variance around θ)
It turns out that we use something called the mean squared error (MSE).
Mean Squared Error
Definition: The mean squared error (MSE) of a point estimator θ̂ is

MSE(θ̂) = E[(θ̂ − θ)²]

The mean squared error is, therefore, the expected squared deviation of the estimator from the target parameter.
The MSE can be re-written as

MSE(θ̂) = Var(θ̂) + [B(θ̂)]²

This shows that the MSE reduces to the variance for unbiased estimators.
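The decomposition can be verified numerically. The sketch below (my own example: a sample mean deliberately shrunk by a factor of 0.9 so that it is biased) checks that the simulated MSE matches variance plus squared bias:

```python
import random
import statistics

random.seed(3)
mu, sigma, N, reps = 2.0, 1.0, 10, 20000

# A deliberately biased estimator: shrink the sample mean toward zero
estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(N)]
    estimates.append(0.9 * statistics.mean(sample))

mse = statistics.fmean((e - mu) ** 2 for e in estimates)
var = statistics.pvariance(estimates)
bias = statistics.fmean(estimates) - mu
print(round(mse, 4), round(var + bias ** 2, 4))  # the two agree
```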
Mean Squared Error
The MSE can be regarded as a general kind of variance that applies to either unbiased or biased estimators.
This leads to the general definition of the relative efficiency of two estimators.
Definition: For any two estimators, whether biased or unbiased,

eff(θ̂1, θ̂2) ≡ MSE(θ̂2) / MSE(θ̂1)

With regard to our three hypothetical estimators shown earlier, θ̂2 has the least mean squared error and is therefore the most “efficient” estimator.
If we choose this estimator, we’d be trading off slightly more bias for greaterefficiency.
Mean Squared Error
Example: We want to estimate µ with a sample of size N. One estimator is the sample mean X̄, which has:
B(X̄) = 0 (because the mean is an unbiased estimator of µ).
Var(X̄) = σ²/N, where σ² is the variance of X.
MSE(X̄) = σ²/N + (0)² = σ²/N.
An alternative estimator, λ, might be:

λ = 6

In other words, this estimator says that its guess of the expectation of X is always equal to six.
Mean Squared Error
The bias of λ, B(λ), is:

B(λ) = E(λ − µ) = E(6) − E(µ) = 6 − µ

The variance of λ is:

Var(λ) = Var(6) = 0

And the MSE of λ is:

MSE(λ) = Var(λ) + [B(λ)]²
       = 0 + (6 − µ)²
       = 36 − 12µ + µ²
Mean Squared Error
Figure: Mean Squared Error
The black line is the MSE of λ as a function of the “true” population mean µ.
The red lines are the MSEs for X̄, under the assumption that σ² = 10 and N = {20, 100, 1000}, respectively.
Mean Squared Error
There are several things to note:
The MSE of λ is quite good if µ ≈ 6. In some circumstances, the MSE for λ will be smaller than that for X̄, even though X̄ is both unbiased and a “better” estimator.
But the MSE of λ gets much worse as µ gets further away from six. Since we don’t know whether µ = 6 or not, this is not a desirable property.
Relatedly, our estimator λ doesn’t “improve” in MSE terms if we add more data to our sample (that is, as N → ∞).
In contrast, the MSE of X̄ drops considerably as N increases, and does so irrespective of the “true” value of µ.
This example illustrates that while MSE can be a good way to choose amongestimators, it shouldn’t be applied uncritically.
Mean Squared Error
Example: Recall the phone survey of 50 responses from 200 calls that had a serious non-response bias. In addition, the average response R̄ has variability too. Calculate the MSE of R̄.
Table: Biased Subpopulation
r    f(r)    r·f(r)    r − µR    (r − µR)²    (r − µR)²·f(r)
0    0.76    0         −0.4      0.16         0.1216
1    0.12    0.12      0.6       0.36         0.0432
2    0.08    0.16      1.6       2.56         0.2048
3    0.04    0.12      2.6       6.76         0.2704
µR = 0.4                                      σ²R = 0.64
Mean Squared Error
We saw earlier that the bias was −0.5.

Var(R̄) = σ²R / N = 0.64 / 50 ≈ 0.013

MSE(R̄) = Var(R̄) + [Bias(R̄)]²
        = 0.013 + 0.25 = 0.263
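These numbers can be reproduced directly from the table’s relative frequencies (a Python sketch, not part of the handout):

```python
# Biased subpopulation distribution of responses (value: probability)
resp = {0: 0.76, 1: 0.12, 2: 0.08, 3: 0.04}
mu, N = 0.9, 50  # true population mean; number of responses

mu_r = sum(x * p for x, p in resp.items())                 # 0.4
var_r = sum((x - mu_r) ** 2 * p for x, p in resp.items())  # 0.64
var_rbar = var_r / N                                       # 0.0128
mse = var_rbar + (mu_r - mu) ** 2                          # ~0.263
print(round(var_rbar, 4), round(mse, 4))
```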
Mean Squared Error
If we increase the sample size fivefold, how much would the MSE be reduced?
Var(R̄) = σ²R / N = 0.64 / (5 × 50) ≈ 0.003

The increase in sample size would not affect the bias, and so

MSE(R̄) = Var(R̄) + [Bias(R̄)]²
        = 0.003 + 0.25 = 0.253

Given that the main term in the MSE is the bias and this has not been reduced, an increase in sample size does not affect the MSE that much.
Mean Squared Error
A second statistician takes a sample survey of only N = 20 phone calls, with persistent follow-up until he gets a response. Let this small but unbiased sample have a sample mean denoted by X̄. What is the MSE?
Table: Whole Population
x    f(x)    x·f(x)    x − µ     (x − µ)²    (x − µ)²·f(x)
0    0.50    0         −0.90     0.81        0.405
1    0.20    0.20      0.10      0.01        0.002
2    0.20    0.40      1.10      1.21        0.242
3    0.10    0.30      2.10      4.41        0.441
µ = 0.9                                      σ² = 1.09
Mean Squared Error
Var(X̄) = σ² / N = 1.09 / 20 ≈ 0.055

MSE(X̄) = Var(X̄) + [Bias(X̄)]² = 0.055 + 0 = 0.055

The variance is larger due to the smaller sample size, but the mean squared error is much smaller.
In publishing his results, the second statistician is criticized for using a sample only 1/10 the size of the first statistician’s. What defense might he offer?
Mean Squared Error
MSE(X̄) = 0.055, whereas MSE(R̄) = 0.253. Despite using a sample only one-tenth the size, the second statistician’s unbiased estimator has a far smaller mean squared error.
Large Sample Properties
Unbiasedness, relative efficiency, and efficiency are small-sample properties of estimators that hold irrespective of sample size.
In contrast, large-sample properties are properties of estimators that hold only as the sample size increases without limit.
Note that this is dependent on sample size, not on the “number” of samples drawn.
Intuitively: what would you expect to happen as sample size gets larger?
The variance around the “true” value decreases (less possibility of drawing a “bad” sample).
Eventually the sample size equals the population size, and the estimate “collapses” on the true value.
Consistent Estimators
In an informal sense, a consistent estimator is one that concentrates in a narrower and narrower band around its target as sample size N increases indefinitely.
Figure: Consistency (the sampling distribution tightens around θ as N goes from 5 to 10 to 50 to 200)

One of the conditions that makes an estimator consistent is if its bias and variance both approach zero as the sample size increases.
Consistent Estimators
Definition: The estimator θ̂N is said to be a consistent estimator of θ if it converges in probability to its population value as N goes to infinity. We write this as:

lim_{N→∞} Pr(|θ̂N − θ| ≤ ε) = 1

or equivalently

lim_{N→∞} Pr(|θ̂N − θ| > ε) = 0

for an arbitrarily small ε > 0.
If we consider an estimator whose properties vary by sample size (say θ̂N), then θ̂N is consistent if its bias and its variance both tend to zero as N → ∞.
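A simulation sketch (the parameter values, replication counts, and seed are arbitrary choices of mine) illustrates consistency for the sample mean: the spread of X̄ around µ shrinks as N grows:

```python
import random
import statistics

random.seed(7)
mu, sigma = 4.0, 3.0  # illustrative population parameters

def spread(n, reps=500):
    """Empirical standard deviation of X-bar at sample size n."""
    ests = [statistics.mean([random.gauss(mu, sigma) for _ in range(n)])
            for _ in range(reps)]
    return statistics.pstdev(ests)

spreads = [spread(n) for n in (5, 50, 500)]
print([round(s, 3) for s in spreads])  # shrinks toward zero as n grows
```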
Consistent Estimators
Example: Is the sample mean X̄ a consistent estimator of the population mean µ?
We know that X̄ is unbiased and that Var(X̄) = σ²/N approaches zero as N increases.
As a result, X̄ is both an unbiased and consistent estimator of µ.
Consistent Estimators
The fact that the sample mean is consistent for the population mean, or converges in probability to the population mean, is sometimes referred to as a law of large numbers.
This provides the theoretical justification for the averaging process that many employ to obtain precision in measurements. For example, an experimenter may take the average of the weights of many animals to obtain a precise estimate of the average weight of animals in a species.
Consistent Estimators
Is P a consistent estimator of π? Is the average response in our fast-food example R̄ a consistent estimator of µ?
Because proportions are just disguised means, it follows that P is also an unbiased and consistent estimator of π.
In terms of our fast-food example, we saw that the estimator R̄ concentrated around µR = 0.40, which is far below the target µ = 0.90. Thus, R̄ is inconsistent.
Asymptotically Unbiased Estimators
An asymptotically unbiased estimator has a bias that tends to zero as sample size N increases. If its variance also tends to zero, then the estimator is consistent.
Although the MSD estimator is a biased estimator of the population variance σ², is it asymptotically unbiased?

Mean Squared Deviation (MSD) = (1/N) Σ_{i=1}^N (X_i − X̄)²

Recall from last time that the sample variance is an unbiased estimator of the population variance:

s² = (1/(N − 1)) Σ_{i=1}^N (X_i − X̄)²
Asymptotically Unbiased Estimators
We can write the MSD in terms of the unbiased s²:

MSD = ((N − 1)/N) s² = (1 − 1/N) s²

E(MSD) = (1 − 1/N) E(s²)
       = (1 − 1/N) σ² = σ² − (1/N)σ²

Since 1/N tends to zero as N increases, the bias tends to zero. As a result, the MSD is biased but asymptotically unbiased.
It can also be shown that the variance of the MSD approaches zero as the sample size increases. As a result, the MSD is a consistent estimator of the population variance.
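Since E(MSD) − σ² = −σ²/N, the bias can be tabulated directly (σ² = 4 is an arbitrary illustrative value):

```python
# E(MSD) = (1 - 1/N) * sigma^2, so the bias of the MSD is -sigma^2 / N
sigma2 = 4.0  # illustrative population variance

biases = [(1 - 1 / N) * sigma2 - sigma2 for N in (10, 100, 1000)]
print(biases)  # approximately -0.4, -0.04, -0.004: vanishing as N grows
```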
Asymptotic Efficiency
Asymptotic efficiency can be thought of as efficiency as N →∞.
It is intuitive to think of this as the “speed” with which θ̂ “collapses” on θ.
All else equal, we prefer an estimator that does so faster (i.e. for smaller sample sizes) rather than more slowly.
General Issues
We prefer estimators that have desirable small-sample properties:
We prefer unbiased to consistent estimators, and
we prefer efficient to asymptotically efficient ones.
but...
We can’t always figure out the small-sample properties of certain estimators, and/or
our estimators with desirable small-sample properties may have other problems (e.g. computational cost).
As a result, we often have to choose among estimators that differ in their degree of desirable properties.
Some Common Unbiased Point Estimators
Table: Expected Values and Standard Errors of Common Point Estimators

Target Parameter θ    Sample Size(s)    Point Estimator θ̂    E(θ̂)       Standard Error σθ̂
µ                     N                 X̄                    µ          σ/√N
π                     N                 P = X/N               π          √(π(1 − π)/N)
µ1 − µ2               N1 and N2         X̄1 − X̄2             µ1 − µ2    √(σ²1/N1 + σ²2/N2)
π1 − π2               N1 and N2         P1 − P2               π1 − π2    √(π1(1 − π1)/N1 + π2(1 − π2)/N2)
The difference in means and difference in proportions assume that the random samples are independent.
All four estimators in the table possess sampling distributions that are approximately normal for large samples.
Interval Estimators and Confidence Intervals
An interval estimator is a rule specifying the method for using the sample measurements to calculate two numbers that form the endpoints of an interval.
Ideally, the resulting interval will have two properties:
1 It should contain the target parameter θ.
2 It should be as narrow as possible.
The length and location of the interval are random variables, and we cannot be certain that a (fixed) target parameter will fall in the interval calculated from a single sample.
We want to find an interval estimator capable of generating narrow intervals that have a high probability of enclosing θ.
Interval Estimators and Confidence Intervals
Interval estimators are more commonly called confidence intervals.
The upper and lower end points of a confidence interval are called the upper and lower confidence limits (bounds).
The probability that a (random) confidence interval will enclose θ (a fixed quantity) is called the confidence coefficient.
The confidence coefficient identifies the fraction of the time, in repeated sampling, that the intervals constructed will contain the target parameter θ.
If the confidence coefficient associated with our estimator is high, then we can be highly confident that any confidence interval, constructed by using the results from a single sample, will enclose θ.
Interval Estimators and Confidence Intervals
Suppose that θ̂L and θ̂U are the (random) lower and upper confidence limits, respectively, for a parameter θ.
Then if

Pr(θ̂L ≤ θ ≤ θ̂U) = 1 − α

the probability (1 − α) is the confidence coefficient (or level of confidence).
The resulting random interval defined by [θ̂L, θ̂U] is called a two-sided confidence interval.
The value of 1 − α is something that is determined by the researcher, and is usually set with an eye to whether she is more concerned with the parameter θ being in the confidence interval, or with the relative precision of the interval estimate.
Interval Estimators and Confidence Intervals
It is also possible to form a lower one-sided confidence interval such that

Pr(θ̂L ≤ θ) = 1 − α

The implied confidence interval here is [θ̂L, ∞).
Similarly, we could have what is called an upper one-sided confidence interval such that

Pr(θ ≤ θ̂U) = 1 − α

The implied confidence interval here is (−∞, θ̂U].
Interval Estimators and Confidence Intervals
One method for finding confidence intervals is called the pivotal method.
To use this method, we must have a pivotal quantity that possesses two characteristics:
1 It is a function of the sample measurements and the unknown parameter θ, where θ is the only unknown quantity.
2 Its probability distribution does not depend on the parameter θ.
If an estimator has these characteristics, then (as we’ll discuss below) we can use simple linear transformations to construct confidence intervals.
Large-Sample Confidence Intervals
As we noted previously, the sampling distribution of a mean (or any sum of a sufficiently large number of independent random variables) follows a normal distribution.
Our typical estimator of µ, denoted X̄, can be thought of as being normally distributed:

X̄ ∼ N(µ, σ²X̄)

where we defined σ²X̄ = σ²/N, and σ² is just the variance of X.
Large-Sample Confidence Intervals
To use the pivotal method, we must have a pivotal quantity that possesses two characteristics:
1 It is a function of the sample measurements and the unknown parameter θ, where θ is the only unknown quantity.
2 Its probability distribution does not depend on the parameter θ.
With respect to these two criteria:
1 The sample mean X̄ depends only on the values of X in the sample, and on the value of µ.
2 The shape of its sampling distribution does not depend on µ, but only on other things (like the size of the sample).
Large-Sample Confidence Intervals
To construct a confidence interval, we can start with the sample statistic X̄.
Since we know that E(X̄) = µ, it makes sense to use the sample value X̄ as the “center” or “pivot” of our confidence interval.
Next, we choose a level of confidence. Tradition suggests that we set 1 − α = 0.95 (a “95 percent level of confidence”), though there’s nothing special about this number.
This means that we want to create a confidence interval such that

Pr(X̄L ≤ µ ≤ X̄U) = 0.95
Large-Sample Confidence Intervals
One way of calculating the bounds of the confidence interval is to choose X̄L and X̄U so that

Pr(µ < X̄L) = ∫_{−∞}^{X̄L} φX̄(u) du = 0.025

and

Pr(µ > X̄U) = ∫_{X̄U}^{∞} φX̄(u) du = 0.025.

Since we know the parameters of φX̄ – that is, the distribution is N(µ, σ²X̄) – calculating values for the upper and lower limits of a confidence interval is straightforward.
Large-Sample Confidence Intervals
More generally, for any sample statistic θ̂ which is an estimator of θ (where θ might be µ, π, µ1 − µ2, or π1 − π2) and whose sampling distribution is normal (in large samples), the statistic

Z = (θ̂ − θ) / σθ̂

is distributed according to a standard normal distribution.
As a result, Z forms (at least approximately) a pivotal quantity: it is a function of the sample measurements θ̂ and a single unknown parameter θ, and the standard normal distribution does not depend on θ.
Large-Sample Confidence Intervals
We can consider two values in the tails of that standard normal distribution, −zα/2 and zα/2, such that

Pr(−zα/2 ≤ Z ≤ zα/2) = 1 − α.

We can rewrite this as

1 − α = Pr( −zα/2 ≤ (θ̂ − θ)/σθ̂ ≤ zα/2 )
      = Pr( −zα/2 σθ̂ ≤ θ̂ − θ ≤ zα/2 σθ̂ )
      = Pr( −θ̂ − zα/2 σθ̂ ≤ −θ ≤ −θ̂ + zα/2 σθ̂ )
      = Pr( θ̂ − zα/2 σθ̂ ≤ θ ≤ θ̂ + zα/2 σθ̂ )
Large-Sample Confidence Intervals
Figure: Location of −zα/2 and zα/2 (α/2 of the standard normal density lies in each tail, with 1 − α in between)

This means that a (1 − α) × 100-percent confidence interval for θ is given by

[θ̂L, θ̂U] = [θ̂ − zα/2 σθ̂, θ̂ + zα/2 σθ̂]
Large-Sample Confidence Intervals
Thus, constructing a confidence interval for a variable whose (asymptotic) sampling distribution is normal consists of five steps:
1 Select your level of confidence 1 − α.
2 Calculate the sample statistic θ̂.
3 Calculate the z-value associated with the 1 − α level of confidence.
4 Multiply that z-value by σθ̂, the standard error of the sampling statistic.
5 Construct the confidence interval according to

[θ̂L, θ̂U] = [θ̂ − zα/2 σθ̂, θ̂ + zα/2 σθ̂].
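The five steps collapse into a couple of lines of code. The sketch below is a generic helper of my own (not from the handout) that assumes the standard error has already been computed; the illustrative inputs are X̄ = 33, s = 16, N = 64 with 90% confidence (z = 1.645):

```python
import math

def normal_ci(theta_hat, se, z):
    """Large-sample confidence interval: theta_hat +/- z * (standard error)."""
    return theta_hat - z * se, theta_hat + z * se

# Illustrative inputs: X-bar = 33, s = 16, N = 64, 90% confidence
lo, hi = normal_ci(33, 16 / math.sqrt(64), 1.645)
print(round(lo, 2), round(hi, 2))  # 29.71 36.29
```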
Large-Sample Confidence Intervals: Mean
Example (Mean): The shopping times of N = 64 randomly selected customers at a supermarket were recorded. The average and variance of the 64 shopping times were 33 and 256, respectively. Estimate µ, the true average shopping time per customer, with a confidence coefficient of 1 − α = 0.90, i.e. a 90% confidence interval.
In this case, we are interested in target parameter θ = µ. Thus, θ̂ = X̄ = 33 and s² = 256 for a sample of N = 64. The population variance σ² is unknown, so we will use s² as its estimated value.
The confidence interval

θ̂ ± zα/2 σθ̂

has the form

X̄ ± zα/2 (σ/√N) ≈ X̄ ± zα/2 (s/√N)
Large-Sample Confidence Intervals: Mean
If we use a standard normal distribution table, we can find that zα/2 = z0.05 = 1.645.
Thus, the confidence limits are

X̄ − zα/2 (s/√N) = 33 − 1.645(16/8) = 29.71
X̄ + zα/2 (s/√N) = 33 + 1.645(16/8) = 36.29

In other words, our confidence interval for µ is [29.71, 36.29].
Interpreting Confidence Intervals
Our confidence interval for µ is [29.71, 36.29]. What does this mean?
It is very important to remember that this 90% confidence interval does NOT mean that there is a 90% chance that the true population mean µ is in this interval.
The population mean is a fixed constant and is either in the confidence interval or it is not.
Interpreting Confidence Intervals
The correct interpretation is that over a large number of repeated samples, approximately 90% of all intervals of the form X̄ ± 1.645(s/√N) will include µ, the true population mean.
Although we do not know whether the particular interval [29.71, 36.29] that we have calculated from our sample contains µ, the procedure that generated it yields intervals that do capture the true mean in approximately 90% of all instances where the procedure is used.
This is why we sometimes say that we are “90% confident” that the interval contains the target parameter.
Large-Sample Confidence Intervals: Mean
What if we wanted a confidence coefficient of 1 − α = 0.95, i.e., a 95% confidence interval?

If we use a standard normal distribution table, we can find that z_{α/2} = z_{0.025} = 1.96.
Thus, the confidence limits are
X̄ − z_{α/2} (s/√N) = 33 − 1.96 (16/8) = 29.08
X̄ + z_{α/2} (s/√N) = 33 + 1.96 (16/8) = 36.92
In other words, our 95% confidence interval for µ is [29.08, 36.92].
Large-Sample Confidence Intervals: Mean
What if we wanted a confidence coefficient of 1 − α = 0.99, i.e., a 99% confidence interval?

If we use a standard normal distribution table, we can find that z_{α/2} = z_{0.005} = 2.58.
Thus, the confidence limits are
X̄ − z_{α/2} (s/√N) = 33 − 2.58 (16/8) = 27.84
X̄ + z_{α/2} (s/√N) = 33 + 2.58 (16/8) = 38.16
In other words, our 99% confidence interval for µ is [27.84, 38.16].
Large-Sample Confidence Intervals: Proportions
As we noted previously, for π sufficiently different from either zero or one, and N sufficiently large, the sampling distribution of P is N(π, σ²_P).

That means that we can calculate confidence intervals for an estimated proportion as
P_L = P − z_{α/2} √(P(1 − P)/N)
and
P_U = P + z_{α/2} √(P(1 − P)/N)
Large-Sample Confidence Intervals: Proportions
Example: Suppose that we have a sample of size 20, and P = 0.390. The lower bound of the associated 95% confidence interval is
π̂_L = 0.390 − 1.96 √(0.39(0.61)/20) = 0.390 − 0.214 = 0.176
while the upper bound is
π̂_U = 0.390 + 1.96 √(0.39(0.61)/20) = 0.390 + 0.214 = 0.604
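For illustration, this proportion interval can be sketched in Python (the handout's own code is Stata; the names below are mine):

```python
import math

# Proportion example: N = 20, sample proportion P = 0.390, 95% confidence
n, p, z = 20, 0.390, 1.96

half_width = z * math.sqrt(p * (1 - p) / n)  # z_{0.025} * sqrt(P(1-P)/N)
lower, upper = p - half_width, p + half_width
print(round(lower, 3), round(upper, 3))  # → 0.176 0.604
```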
Large-Sample Confidence Intervals: Proportions
Figure: Confidence Intervals for π̂ for N = 20 (black dashes), N = 100 (red dashes), and N = 400 (green dashes). (Horizontal axis: π; vertical axis: π̂.)
The confidence interval for a proportion is a straightforward function of two quantities: the estimated proportion P = π̂, and the sample size N.
Large-Sample Confidence Intervals: Proportions

To see an illustration of how confidence intervals work, go to ConfidenceIntervalP under Estimation (click here).
Difference in Proportions
Example (Difference in Proportions): Two brands of refrigerators, A and B, are each guaranteed for 1 year. In a random sample of 50 refrigerators of brand A, 12 were observed to fail before the guarantee period ended. An independent random sample of 60 brand B refrigerators also revealed 12 failures during the guarantee period. Estimate the true difference (π₁ − π₂) between the proportions of failures during the guarantee period with a confidence coefficient of approximately 0.98.
The confidence interval
θ̂ ± z_{α/2} σ_θ̂
has the form
(P₁ − P₂) ± z_{α/2} √(P₁(1 − P₁)/N₁ + P₂(1 − P₂)/N₂)
Difference in Proportions
We have P₁ = 0.24, 1 − P₁ = 0.76, P₂ = 0.20, 1 − P₂ = 0.80, and z_{0.01} = 2.33.
Thus, the desired 98% confidence interval is
(0.24 − 0.20) ± 2.33 √((0.24)(0.76)/50 + (0.20)(0.80)/60)
0.04 ± 0.1851, or [−0.1451, 0.2251]
Difference in Proportions
The 98% confidence interval is [−0.1451, 0.2251].

Notice that the confidence interval contains 0. Thus, a zero value for the difference in proportions is "believable" (at approximately the 98% level) on the basis of the observed data.

But of course the interval also contains the value 0.1, and so 0.1 represents another value for the difference in proportions that is "believable," and so on.
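The refrigerator interval can likewise be sketched in Python (illustrative only; the z value is read from a standard normal table, and rounding differs slightly from the hand calculation above):

```python
import math

# Refrigerator example: 12 failures in N1 = 50 (brand A), 12 in N2 = 60 (brand B)
n1, n2 = 50, 60
p1, p2 = 12 / 50, 12 / 60  # 0.24 and 0.20
z = 2.33                   # z_{0.01} from a standard normal table (98%)

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2
lower, upper = diff - z * se, diff + z * se
print(round(lower, 3), round(upper, 3))  # → -0.145 0.225
```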
Selecting the Sample Size
There are two considerations in choosing the appropriate sample size for estimating µ using a confidence interval.

1 The tolerable error. This establishes the desired width of the confidence interval.

2 The confidence level that should be selected.

A wide confidence interval would not be very informative, but the cost of obtaining a narrow confidence interval could be quite large.

Similarly, too low a confidence level would mean that the stated confidence interval is likely to be in error, but obtaining a higher level of confidence might be quite expensive.
Selecting the Sample Size
Suppose we wish to estimate the average daily yield µ of a chemical and we wish the error of estimation to be less than 5 tons with probability 0.95.

Because approximately 95% of the sample means will lie within 2σ_X̄ (really 1.96σ_X̄) of µ in repeated sampling, we are asking that 2σ_X̄ equal 5 tons.
2σ/√N = 5
N = 4σ²/25
Selecting the Sample Size
We cannot obtain an exact numerical value of N unless the population standard deviation σ is known.

We could use an estimate s obtained from a previous sample. Let's say that s = 21.
N = (4)(21)²/25 = 70.56 ≈ 71

Thus, using a sample size of N = 71, we can be 95% confident that our estimate will lie within 5 tons of the true average daily yield.
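The sample-size arithmetic is short enough to sketch in Python (illustrative; the "2" is the handout's rough two-sigma rule, with 1.96 as the exact value):

```python
import math

# Chemical-yield example: want the estimate within 5 tons with probability 0.95,
# using s = 21 from a previous sample in place of sigma
s, bound = 21.0, 5.0
z = 2.0  # the handout's rough "2 sigma" rule of thumb (1.96 would be exact)

n = math.ceil((z * s / bound) ** 2)  # N = 4*sigma^2/25 = 70.56, rounded up
print(n)  # → 71
```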
Small-Sample Confidence Intervals
The formula for calculating large-sample confidence intervals is
θ̂ ± z_{α/2} σ_θ̂

When θ = µ is the target parameter, then θ̂ = X̄ and σ²_θ̂ = σ²/N, where σ² is the population variance.

If the true value of σ² is known, then this value should be used when calculating the confidence interval.

However, if σ² is unknown (as will almost always be the case) and N is large, then there is no real loss of accuracy if s² is substituted for σ² (recall that s² converges to σ² as N increases).

As a result, we can use the standard normal distribution in these circumstances as well.
Small-Sample Confidence Intervals
Problems only arise if σ² is unknown AND N is small.
In this case, we will need to calculate small-sample confidence intervals.
In effect, using s instead of σ introduces an additional source of unreliability into our calculations and we must, therefore, widen the confidence intervals.
Small-Sample Confidence Intervals: Mean
In terms of the population mean, we have already seen that
Z = (X̄ − µ)/(σ/√N)
possesses approximately a standard normal distribution.

Well, if we substitute s for σ, we have
T = (X̄ − µ)/(s/√N)
which has a t distribution with (N − 1) degrees of freedom.
Small-Sample Confidence Intervals: Mean
The quantity T now serves as a pivotal quantity that we will use to form confidence intervals for µ.

We can use a t distribution table to find values −t_{N−1,α/2} and t_{N−1,α/2} so that
P(−t_{N−1,α/2} ≤ T ≤ t_{N−1,α/2}) = 1 − α

Thus, we will now construct our confidence intervals according to:
[X̄_L, X̄_U] = X̄ ± t_{N−1,α/2} (s/√N)
Student’s t-Distribution
Figure: Standard Normal and Student’s t-Distributions
The confidence intervals constructed using the z distribution and the t distribution are effectively the same when the degrees of freedom (N − 1) are greater than 120; they are also very close once the degrees of freedom (N − 1) exceed 30.
Student’s t-Distribution
To go to Comparison of Student's t and Normal Distributions under Distributions Related to the Normal, click here.
Small-Sample Confidence Intervals: Mean
Technically, the small-sample confidence intervals for the mean are based on the assumption that the sample is randomly drawn from a normal population.

However, experimental evidence has shown that the interval for a single mean is quite robust to moderate departures from normality.
Small-Sample Confidence Intervals: Mean
Example: A manufacturer of gunpowder has developed a new powder, which was tested in eight shells. The resulting muzzle velocities were: 3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905. Find a 95% confidence interval for the true average velocity µ for shells of this type. Assume that muzzle velocities are approximately normally distributed.

The confidence interval for µ is
X̄ ± t_{N−1,α/2} (s/√N)
Small-Sample Confidence Intervals: Mean
For the given data, X̄ = 2959 and s = 39.1.

Using the table for the t distribution, we have t_{7,0.025} = 2.365.

Thus, we have
2959 ± 2.365 (39.1/√8), or 2959 ± 32.7
as the observed confidence interval for µ.
Small-Sample Confidence Intervals: Mean
. sum muzzle_velocity

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
muzzle_vel~y |         8        2959    39.08964       2905       3005

. ci muzzle_velocity, level(95)

    Variable |       Obs        Mean    Std. Err.    [95% Conf. Interval]
-------------+---------------------------------------------------------------
muzzle_vel~y |         8        2959    13.82027      2926.32    2991.68

. ci muzzle_velocity, level(99)

    Variable |       Obs        Mean    Std. Err.    [99% Conf. Interval]
-------------+---------------------------------------------------------------
muzzle_vel~y |         8        2959    13.82027     2910.636   3007.364
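These t-based intervals can also be reproduced in Python; the critical values 2.365 (t_{7,0.025}) and 3.499 (t_{7,0.005}) are read from a t table, so this is just the hand calculation automated:

```python
import math

# Muzzle velocities from the gunpowder example (N = 8)
data = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]

n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std. dev.
se = s / math.sqrt(n)                                        # std. error of mean

# t critical values with 7 d.f., read from a t table
for level, t_crit in [(95, 2.365), (99, 3.499)]:
    lo, hi = mean - t_crit * se, mean + t_crit * se
    print(f"{level}% CI: [{lo:.2f}, {hi:.2f}]")
```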
Small-Sample Confidence Intervals: Mean
Example: From a large class, a random sample of 4 grades was drawn: 64, 66, 89, and 77. Calculate a 95% confidence interval for the whole class mean µ. Assume that the class grades are approximately normally distributed.

Table: Small-Sample Confidence Interval for a Mean

  X     (X − X̄)   (X − X̄)²
  64      -10        100
  66       -8         64
  89       15        225
  77        3          9

X̄ = 296/4 = 74        0       s² = 398/3 = 132.7
Small-Sample Confidence Intervals: Mean
The confidence interval for µ is
X̄ ± t_{N−1,α/2} (s/√N)

For the given data, X̄ = 74 and s = √132.7.

In this example, we have N − 1 = 3 degrees of freedom.

Using the table for the t distribution, we have t_{3,0.025} = 3.18.
Small-Sample Confidence Intervals: Mean
Thus, we have
74 ± 3.18 (√132.7/√4), or 74 ± 18
as the observed confidence interval for µ.

That is, with 95% confidence, we can conclude that the mean grade of the whole class is between 56 and 92.
Difference in Means
Suppose we are interested in comparing the means of two normal populations, one with mean µ₁ and variance σ₁² and the other with mean µ₂ and variance σ₂².

If the samples are independent, then confidence intervals for µ₁ − µ₂ based on a t-distributed random variable can be constructed if we assume that the two populations have a common but unknown variance, σ₁² = σ₂² = σ².

If X̄₁ and X̄₂ are the two sample means, then the large-sample confidence interval for (µ₁ − µ₂) is developed by using
Z = [(X̄₁ − X̄₂) − (µ₁ − µ₂)] / √(σ₁²/N₁ + σ₂²/N₂)
as a pivotal quantity.
Small-Sample Confidence Intervals: Difference in Means
Using the assumption σ₁² = σ₂² = σ²,
Z = [(X̄₁ − X̄₂) − (µ₁ − µ₂)] / [σ √(1/N₁ + 1/N₂)]

Because σ is unknown, though, we need to find an estimator for the common variance σ² so that we can construct a quantity with a t distribution.
Small-Sample Confidence Intervals: Difference in Means
Let X₁₁, X₁₂, . . . , X₁N₁ denote the random sample of size N₁ from the first population and let X₂₁, X₂₂, . . . , X₂N₂ denote an independent random sample of size N₂ from the second population. Then we have
X̄₁ = (1/N₁) Σᵢ X₁ᵢ
and
X̄₂ = (1/N₂) Σᵢ X₂ᵢ
Difference in Means
The usual unbiased estimator of the common variance σ² is obtained by pooling the sample data to obtain the pooled estimator s_p²:
s_p² = [Σᵢ(X₁ᵢ − X̄₁)² + Σᵢ(X₂ᵢ − X̄₂)²] / [(N₁ − 1) + (N₂ − 1)]
     = [(N₁ − 1)s₁² + (N₂ − 1)s₂²] / (N₁ + N₂ − 2)
where sᵢ² is the sample variance from the ith sample, i = 1, 2.

Notice that if N₁ = N₂, then s_p² is just the average of s₁² and s₂².

If N₁ ≠ N₂, then s_p² is the weighted average of s₁² and s₂², with larger weight given to the sample variance associated with the larger sample size.
Difference in Means
From all of this we can calculate the following pivotal quantity:
T = [(X̄₁ − X̄₂) − (µ₁ − µ₂)] / [s_p √(1/N₁ + 1/N₂)]

This quantity has a t distribution with (N₁ + N₂ − 2) degrees of freedom.

If we use the pivotal method, we find that the small-sample confidence interval for (µ₁ − µ₂) is just
(X̄₁ − X̄₂) ± t_{N₁+N₂−2,α/2} × s_p √(1/N₁ + 1/N₂)
Difference in Means
Technically, the small-sample confidence intervals for the difference in two means are based on the assumptions that the samples are randomly drawn from two independent and normal populations with equal variances.

Experimental evidence has shown that these intervals are robust to moderate departures from normality and to the assumption of equal population variances if N₁ ≈ N₂.

As N₁ and N₂ become dissimilar, the assumption of equal population variances becomes more crucial.
Difference in Means
Example: Suppose we want to compare two methods for training people. At the end of the training, two groups of nine employees are timed at some task. The nine people who had the standard training had times of 32, 37, 35, 28, 41, 44, 35, 31, and 34. The nine people who had the new training had times of 35, 31, 29, 25, 34, 40, 27, 32, and 31. Estimate the true mean difference (µ₁ − µ₂) with confidence coefficient 0.95.

Assume that the assembly times are approximately normally distributed, that the variances of the assembly times are approximately equal for the two methods, and that the samples are independent.
Difference in Means
For the standard training method, we have sample mean X̄₁ = 35.22 and sample variance s₁² = Σᵢ(X₁ᵢ − X̄₁)²/(N₁ − 1) = 195.56/8 = 24.445.

For the new training method, we have sample mean X̄₂ = 31.56 and sample variance s₂² = Σᵢ(X₂ᵢ − X̄₂)²/(N₂ − 1) = 160.22/8 = 20.027.

As a result, we have
s_p² = [8(24.445) + 8(20.027)] / (9 + 9 − 2) = (195.56 + 160.22)/16 = 22.236
s_p = 4.716
Difference in Means
Since t_{16,0.025} = 2.120, the observed confidence interval is
(X̄₁ − X̄₂) ± t_{N₁+N₂−2,α/2} s_p √(1/N₁ + 1/N₂)
(35.22 − 31.56) ± (2.120)(4.716) √(1/9 + 1/9)
3.66 ± 4.71

This confidence interval can be written as [−1.05, 8.37].

Since the interval contains both positive and negative numbers, we cannot say that the new training method differs from the other at our given level of confidence.
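The pooled-variance calculation can be sketched in Python (illustrative only; the helper `mean_var` and all names are mine, and t_{16,0.025} is read from a t table):

```python
import math

# Training example: task times under the standard and the new method
standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]

def mean_var(xs):
    """Sample mean and sample variance (N - 1 denominator)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

m1, v1 = mean_var(standard)
m2, v2 = mean_var(new)
n1, n2 = len(standard), len(new)

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_crit = 2.120  # t_{16, 0.025} from a t table
half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
print(round(m1 - m2, 2), "+/-", round(half, 2))
```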
Difference in Means
Example: From a large class, a sample of 4 grades was drawn: 64, 66, 89, and 77. From a second large class, an independent sample of 3 grades was drawn: 56, 71, and 53. Calculate the 95% confidence interval for the difference between the two class means, µ₁ − µ₂. Assume that the grades from both classes are approximately normally distributed and that the variances of the grades are approximately equal for the two classes.

Difference in Means

Table: Difference in Two Means (Independent Samples): Class 1

  X₁    (X₁ − X̄₁)   (X₁ − X̄₁)²
  64       -10          100
  66        -8           64
  89        15          225
  77         3            9

X̄₁ = 296/4 = 74        0       s₁² = 398/3 = 132.7

Table: Difference in Two Means (Independent Samples): Class 2

  X₂    (X₂ − X̄₂)   (X₂ − X̄₂)²
  56        -4           16
  71        11          121
  53        -7           49

X̄₂ = 180/3 = 60        0       s₂² = 186/2 = 93
Difference in Means
Class 1: The sample mean is X̄₁ = 74 and the sample variance is s₁² = Σᵢ(X₁ᵢ − X̄₁)²/(N₁ − 1) = 398/3 = 132.7.

Class 2: The sample mean is X̄₂ = 60 and the sample variance is s₂² = Σᵢ(X₂ᵢ − X̄₂)²/(N₂ − 1) = 186/2 = 93.

s_p² = [3(132.7) + 2(93)] / (4 + 3 − 2) = (398 + 186)/5 = 117
s_p = √117
Difference in Means
We can find that t_{5,0.025} = 2.57. The observed confidence interval is therefore
(X̄₁ − X̄₂) ± t_{N₁+N₂−2,α/2} s_p √(1/N₁ + 1/N₂)
(74 − 60) ± (2.57)(√117) √(1/4 + 1/3)
14 ± 21

This confidence interval can be written as [−7, 35].

Since the interval contains both positive and negative numbers, we cannot say that the mean grades differ from one class to the other at our given level of confidence.
Difference in Means (Dependent or Matched Samples)
We might also want to compare means across dependent samples. Dependent samples are sometimes called matched or paired samples.

Suppose that we want to compare the fall grades and spring grades for the same students.

Table: Difference in Two Means (Dependent Samples)

             Observed Grades       Difference
Name        X₁      X₂      D = X₁ − X₂    D − D̄    (D − D̄)²
Trimble     64      57           7           -4         16
Wilde       66      57           9           -2          4
Giannos     89      73          16            5         25
Ames        77      65          12            1          1
                         D̄ = 44/4 = 11       0     s_D² = 46/3 = 15.3
Difference in Means (Dependent or Matched Samples)
We can use the sample of differences D to construct a confidence interval for the average population difference µ_D.

The confidence interval for µ_D in a matched-pair sample is
D̄ ± t_{N−1,α/2} (s_D/√N)

Suppose we want to construct a 95% confidence interval.
Difference in Means (Dependent or Matched Samples)
D̄ = 11 and s_D = √15.3.

Using the table for the t distribution, we have t_{3,0.025} = 3.18.

Thus, we have
11 ± 3.18 (√15.3/√4), or 11 ± 6
as the observed confidence interval for the average population difference.
Difference in Means (Dependent or Matched Samples)
We are estimating the same parameter (the difference in two population means) with the dependent samples as we did with the independent samples.

The matched-pair approach is much better because it has a smaller confidence interval. Why?

Independent samples confidence interval was [−7, 35].

Dependent samples confidence interval for the same data was just [5, 17].

Essentially, pairing achieves a match that keeps many of the extraneous variables that might affect our results constant.
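The matched-pair interval is short to sketch in Python (illustrative; names are mine, and t_{3,0.025} is read from a t table):

```python
import math

# Fall and spring grades for the same four students (matched pairs)
fall = [64, 66, 89, 77]
spring = [57, 57, 73, 65]

d = [f - s for f, s in zip(fall, spring)]  # differences 7, 9, 16, 12
n = len(d)
dbar = sum(d) / n                          # mean difference = 11
s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
t_crit = 3.18                              # t_{3, 0.025} from a t table
half = t_crit * s_d / math.sqrt(n)
print(round(dbar, 1), "+/-", round(half, 1))
```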
Overview
Example: To measure the effect of a fitness campaign, a ski club randomly sampled five members before the campaign and another five afterwards. The weights were as follows:

Before: JH 168, KL 195, MM 155, TR 183, MT 169
After: LW 183, VG 177, EP 148, JC 162, MW 180

Calculate a 95% confidence interval for (i) the mean weight before the campaign, (ii) the mean weight after the campaign, and (iii) the mean weight loss during the campaign.
Overview
Table: Small-Sample Confidence Interval for Difference in Two Means (Independent Samples)

            Before                               After
  X₁    (X₁ − X̄₁)   (X₁ − X̄₁)²        X₂    (X₂ − X̄₂)   (X₂ − X̄₂)²
  168       -6           36             183       13          169
  195       21          441             177        7           49
  155      -19          361             148      -22          484
  183        9           81             162       -8           64
  169       -5           25             180       10          100

X̄₁ = 870/5 = 174      0       944     X̄₂ = 850/5 = 170     0      866
Overview
µ₁: 174 ± 2.78 (√(944/4)/√5) = 174 ± 19

µ₂: 170 ± 2.78 (√(866/4)/√5) = 170 ± 18

µ₁ − µ₂: (174 − 170) ± 2.31 √[(944 + 866)/(4 + 4)] × √(1/5 + 1/5) = 4 ± 22
Overview
It was then decided that a better sampling design would be to measure the same people after as before.

KL 194, MT 160, TR 177, MM 147, JH 157

Table: Difference in Two Means (Dependent Samples)

            Weights           Difference
Name      X₁      X₂     D = X₁ − X₂    D − D̄    (D − D̄)²
JH        168     157         11           4         16
KL        195     194          1          -6         36
MM        155     147          8           1          1
TR        183     177          6          -1          1
MT        169     160          9           2          4
                       D̄ = 35/5 = 7       0     s_D² = 58/4 = 14.5

7 ± 2.78 (√14.5/√5), or 7 ± 5
Confidence Interval for Population Variance σ²

As we've seen before, s² = Σᵢ(Xᵢ − X̄)²/(N − 1) is an unbiased estimator of σ².

Given that it is distributed according to a gamma distribution, it can be difficult to determine the probability of it lying in a specific interval.

But we can transform it (as we did before) into a quantity that has a χ² distribution with N − 1 degrees of freedom:
χ² = (N − 1)s²/σ²
Confidence Interval for Population Variance σ²

As we saw before, this can be written as:
χ²_{N−1} = (N − 1)s²/σ² = Σᵢ(Xᵢ − X̄)²/σ²

This quantity becomes the pivotal quantity that allows us to calculate confidence intervals for the population variance σ².

In effect, we want to find two numbers χ²_L and χ²_U such that
P(χ²_L ≤ Σᵢ(Xᵢ − X̄)²/σ² ≤ χ²_U) = 1 − α
for any confidence coefficient (1 − α).
Confidence Interval for Population Variance σ²

Figure: χ² Distribution with (N − 1) = 3 Degrees of Freedom

We would like to find the shortest interval that includes σ² with probability (1 − α). This is difficult and requires trial and error.

Typically, we compromise by choosing points that cut off equal tail areas.
Confidence Interval for Population Variance σ²

Given the choice to cut off equal tail areas, we obtain
P(χ²_{N−1,1−(α/2)} ≤ Σᵢ(Xᵢ − X̄)²/σ² ≤ χ²_{N−1,α/2}) = 1 − α

When we reorder the inequality, we get
P((N − 1)s²/χ²_{N−1,α/2} ≤ σ² ≤ (N − 1)s²/χ²_{N−1,1−(α/2)}) = 1 − α

Thus, the 100(1 − α)% confidence interval for σ² is
[(N − 1)s²/χ²_{N−1,α/2}, (N − 1)s²/χ²_{N−1,1−(α/2)}]
Confidence Interval for Population Variance σ²

Technically, the small-sample confidence intervals for the population variance σ² assume that the sampled population is normally distributed.

Unlike the small-sample confidence intervals for the population mean or difference in population means, which are reasonably robust to deviations from normality, experimental evidence suggests that the small-sample confidence intervals for the population variance can be quite misleading if the sampled population is not normally distributed.
Confidence Interval for Population Variance σ²

Example: Suppose we have a sample of observations with values 4.1, 5.2, and 10.2. Estimate σ² with confidence coefficient 0.90. Assume normality.

For the data, we have s² = 10.57.

We can see from the χ² distribution table that χ²_{2,0.95} = 0.103 and χ²_{2,0.05} = 5.991.
Confidence Interval for Population Variance σ²

Thus, the 90% confidence interval for σ² is
[(N − 1)s²/χ²_{2,0.05}, (N − 1)s²/χ²_{2,0.95}]
[(2)(10.57)/5.991, (2)(10.57)/0.103]
[3.53, 205.24]

This confidence interval is very wide. Why?
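The variance interval can be sketched in Python (illustrative; the χ² critical values are the table values quoted above):

```python
# Variance example: data 4.1, 5.2, 10.2; 90% CI for sigma^2
data = [4.1, 5.2, 10.2]
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # ≈ 10.57

# chi-square critical values with 2 d.f., read from a chi-square table:
# chi2_{2,0.05} = 5.991 (upper tail), chi2_{2,0.95} = 0.103 (lower tail)
lower = (n - 1) * s2 / 5.991
upper = (n - 1) * s2 / 0.103
print(round(lower, 2), round(upper, 2))  # → 3.53 205.24
```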
Comparing σ² in Two Populations

What if we want to compare σ² in two populations?

Rather than look at s₁² − s₂², which has a complicated sampling distribution, we look at s₁²/s₂².

Let two independent random samples of sizes N₁ and N₂ be drawn from two normal populations with variances σ₁² and σ₂².

Let the variances of the random samples be s₁² and s₂².
Comparing σ² in Two Populations

Notice that (N₁ − 1)s₁²/σ₁² and (N₂ − 1)s₂²/σ₂² are both independent χ² random variables.

The ratio of two χ² random variables, each divided by its degrees of freedom, is distributed according to an F distribution, i.e.,
F_{N₁−1,N₂−1} = [(N₁ − 1)s₁²/(N₁ − 1)σ₁²] / [(N₂ − 1)s₂²/(N₂ − 1)σ₂²] = s₁²σ₂² / s₂²σ₁²
which has an F distribution with (N₁ − 1) numerator degrees of freedom and (N₂ − 1) denominator degrees of freedom.

This quantity acts as a pivotal quantity.
Comparing σ² in Two Populations

And so, we want to find:
P(F_{N₁−1,N₂−1,α/2} ≤ s₁²σ₂²/s₂²σ₁² ≤ F_{N₁−1,N₂−1,1−(α/2)}) = 1 − α

When we reorder the inequalities, we have
[1/F_{N₁−1,N₂−1,1−(α/2)}] (s₁²/s₂²) ≤ σ₁²/σ₂² ≤ [1/F_{N₁−1,N₂−1,α/2}] (s₁²/s₂²)

Thus, if we were to construct a 90% confidence interval for the ratio of two population variances based on two sample variances where N₁ = 10 and N₂ = 8, we would have
[1/F_{9,7,0.95}] (s₁²/s₂²) ≤ σ₁²/σ₂² ≤ [1/F_{9,7,0.05}] (s₁²/s₂²)
Comparing σ² in Two Populations

But how do you find 1/F_{9,7,0.95} and 1/F_{9,7,0.05}?

Most F distribution tables will only give you information related to the right-hand tail of the distribution.

Thus, it is relatively straightforward to find that F_{9,7,0.95} = 3.68.

But how do we find what 1/F_{9,7,0.05} is?
Comparing σ² in Two Populations

Recall from our discussion of probability distributions that if W₁ and W₂ are independent and ∼ χ²_k and χ²_ℓ, respectively, then
(W₁/k)/(W₂/ℓ) ∼ F_{k,ℓ}

In other words, the ratio of two chi-squared variables (each divided by its degrees of freedom) is distributed as F with d.f. equal to the number of d.f. in the numerator and denominator variables.

We saw that this implied that:
If X ∼ F(k, ℓ), then 1/X ∼ F(ℓ, k) (because 1/X = 1/(W₁/W₂) = W₂/W₁).
Comparing σ² in Two Populations

Well, it follows from this that:
F_{N₂−1,N₁−1,1−(α/2)} = 1/F_{N₁−1,N₂−1,α/2}

Given this, we have
F_{9,7,0.05} = 1/F_{7,9,0.95} = 1/3.29 = 0.3

As a result, we can write the 90% confidence interval as
(1/3.68) (s₁²/s₂²) ≤ σ₁²/σ₂² ≤ (1/0.3) (s₁²/s₂²)
Comparing σ² in Two Populations

Example: Two samples of sizes 16 and 10 are drawn at random from two normal populations. Suppose their sample variances are 25.2 and 20, respectively. Find the (i) 98% and (ii) 90% confidence limits for the ratio of the variances.

We need to calculate 1/F_{15,9,0.99} and 1/F_{15,9,0.01}.

Looking at the back of the book, we find that F_{15,9,0.99} = 4.96.

We also know that F_{15,9,0.01} = 1/F_{9,15,0.99} = 1/3.89.
Comparing σ² in Two Populations

Given this, we find for the 98% confidence interval that we have
(1/4.96) × 25.2/20.0 ≤ σ₁²/σ₂² ≤ 3.89 × 25.2/20.0
0.254 ≤ σ₁²/σ₂² ≤ 4.90
Comparing σ² in Two Populations

We now need to find 1/F_{15,9,0.95} and 1/F_{15,9,0.05}.

Looking at the back of the book, we find that F_{15,9,0.95} = 3.01.

We also know that F_{15,9,0.05} = 1/F_{9,15,0.95} = 1/2.59.

Given this, we find for the 90% confidence interval that we have
(1/3.01) × 25.2/20.0 ≤ σ₁²/σ₂² ≤ 2.59 × 25.2/20.0
0.4186 ≤ σ₁²/σ₂² ≤ 3.263
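The 90% variance-ratio interval can be sketched in Python (illustrative; the F critical values are the table values quoted above):

```python
# Variance-ratio example: s1^2 = 25.2 (N1 = 16) and s2^2 = 20.0 (N2 = 10)
ratio = 25.2 / 20.0  # s1^2 / s2^2 = 1.26

# F critical values read from an F table:
# F_{15,9,0.95} = 3.01 and F_{9,15,0.95} = 2.59 (so 1/F_{15,9,0.05} = 2.59)
lower = ratio / 3.01
upper = ratio * 2.59
print(round(lower, 4), round(upper, 3))  # → 0.4186 3.263
```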
Comparing σ² in Two Populations

Example: Find the 98% and 90% confidence limits for the ratio of the standard deviations in the previous example.

By taking square roots of the inequalities in the previous example, we find that the 98% confidence limits are
√0.254 ≤ σ₁/σ₂ ≤ √4.90
0.50 ≤ σ₁/σ₂ ≤ 2.21

and that the 90% confidence limits are
√0.4186 ≤ σ₁/σ₂ ≤ √3.263
0.65 ≤ σ₁/σ₂ ≤ 1.81
Stata
Let's return to Zorn's Warren & Burger Court data from last time.

One variable – constit – was coded one if the case was decided on constitutional grounds, and zero otherwise.

The true "population" proportion is π = 0.2536; we'll use this as an example of how we can learn about that parameter through the use of confidence intervals.

We'll begin by considering a rather small random sample of cases (N = 20) and calculating the confidence interval for π based on that sample.
Stata
. use WarrenBurger

. su

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          us |         0
          id |      7161        3581    2067.347          1       7161
       amrev |      7161    .4319229    1.342633          0         33
       amaff |      7161    .4099986    1.302139          0         37
       sumam |      7161    .8419215    2.189712          0         39
-------------+--------------------------------------------------------
      fedpet |      7161     .173998    .3791343          0          1
     constit |      7161    .2535959    .4350993          0          1
        sgam |      7161    .0786203     .269164          0          1

. sample 20, count
(7141 observations deleted)
Stata
. ci constit, level(95)

    Variable |       Obs        Mean    Std. Err.    [95% Conf. Interval]
-------------+---------------------------------------------------------------
     constit |        20          .2    .0917663     .0079309   .3920691

The confidence interval for this sample is [0.008, 0.392], which means that in repeated random samples from this population, we would expect the "true" population parameter to be contained in an interval constructed in this way 95% of the time.
Stata
To illustrate the idea of a confidence interval, we can do what we just did, say, 100 times, and then see how many of the resulting confidence intervals contain the true population value 0.2536.

program define CI20, rclass
    version 10
    use WarrenBurger, clear
    sample 20, count
    tempvar z
    gen `z' = constit
    summarize `z'
    return scalar mean=r(mean)
    return scalar ub=r(mean) + 1.96*sqrt((r(mean) * (1-r(mean)))/20)
    return scalar lb=r(mean) - 1.96*sqrt((r(mean) * (1-r(mean)))/20)
end

. set seed 11101968

. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI20, nodots
Stata
. su

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       pihat |       100       .2335    .0901892        .05         .5
          ub |       100    .4125235    .1171615   .1455186   .7191347
          lb |       100    .0544765    .0641941  -.0455186   .2808653

. tab pihat

    r(mean) |      Freq.     Percent        Cum.
------------+-----------------------------------
        .05 |          4        4.00        4.00
         .1 |          7        7.00       11.00
        .15 |         14       14.00       25.00
         .2 |         24       24.00       49.00
        .25 |         21       21.00       70.00
         .3 |         10       10.00       80.00
        .35 |         16       16.00       96.00
         .4 |          3        3.00       99.00
         .5 |          1        1.00      100.00
------------+-----------------------------------
      Total |        100      100.00
Stata
Figure: 100 Confidence Intervals for π̂_constit for N = 20 (Horizontal axis: value of π̂; left vertical axis: CI range; right vertical axis: density of estimates of π̂.)

Note that we have four observations with π̂ = 0.05, seven with π̂ = 0.10, and one with π̂ = 0.50, all of which have calculated confidence intervals that do not include the "true" value π = 0.2536. That's 12/100, or α = 0.12, which is quite different from α = 0.05.
Stata
We can modify the code slightly to do the same thing with 100 samples of N = 100:

program define CI100, rclass
    version 10
    use WarrenBurger, clear
    sample 100, count
    tempvar z
    gen `z' = constit
    summarize `z'
    return scalar mean=r(mean)
    return scalar ub=r(mean) + 1.96*sqrt((r(mean) * (1-r(mean)))/100)
    return scalar lb=r(mean) - 1.96*sqrt((r(mean) * (1-r(mean)))/100)
end

. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI100, nodots
Stata
. tab pihat

    r(mean) |      Freq.     Percent        Cum.
------------+-----------------------------------
        .16 |          3        3.00        3.00
        .17 |          1        1.00        4.00
        .18 |          2        2.00        6.00
        .19 |          3        3.00        9.00
         .2 |          5        5.00       14.00
        .21 |          7        7.00       21.00
        .22 |          6        6.00       27.00
        .23 |         10       10.00       37.00
        .24 |         13       13.00       50.00
        .25 |         14       14.00       64.00
        .26 |          8        8.00       72.00
        .27 |          8        8.00       80.00
        .28 |          4        4.00       84.00
        .29 |          1        1.00       85.00
         .3 |          7        7.00       92.00
        .31 |          3        3.00       95.00
        .32 |          2        2.00       97.00
        .33 |          1        1.00       98.00
        .35 |          1        1.00       99.00
        .37 |          1        1.00      100.00
------------+-----------------------------------
      Total |        100      100.00
Stata
Figure: 100 Confidence Intervals for π̂_constit for N = 100 (Horizontal axis: value of π̂; left vertical axis: CI range; right vertical axis: density of estimates of π̂.)

There are now only six samples (three with π̂ = 0.16, one with π̂ = 0.17, one with π̂ = 0.35, and one with π̂ = 0.37) out of 100 whose confidence intervals do not include π. That's much closer to α = 0.05, as we expect it to be. What we say here is that the "coverage probabilities" are getting better as the size of the sample increases.
Stata
If we do the same for 100 samples each with N = 400, the coverage gets even better:

. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI400, nodots

      command:  CI400, nodots
        pihat:  r(mean)
           ub:  r(ub)
           lb:  r(lb)

Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

. su

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       pihat |       100      .24975    .0213984         .2      .3025
          ub |       100    .2921026    .0226069      .2392   .3475154
          lb |       100    .2073974    .0201902      .1608   .2574846
Stata
Figure: 100 Confidence Intervals for π̂_constit for N = 400 (Horizontal axis: value of π̂; left vertical axis: CI range; right vertical axis: density of estimates of π̂.)

While the individual interval counts are not shown here, the coverage probability is more-or-less perfect (96/100). In each figure, the overlaid density plot shows the distribution of estimated means (π̂s). Note that they look increasingly Normal, and that their range / standard deviation declines, as the sample sizes increase.
Stata
This is the code for the figures.

. twoway (scatter pihat pihat, mcolor(black) msymbol(circle)) ///
         (rcap ub lb pihat, lcolor(black) lwidth(vthin) msize(small)) ///
         (kdensity pihat, yaxis(2) lcolor(gs8) lpattern(dash)), ///
         ytitle(CI Range) yline(.2536, lpattern(longdash) lcolor(cranberry)) ///
         ytitle((Density of Estimates of pi), axis(2)) ///
         xtitle(Value of Pi) legend(off) ///
         aspectratio(1, placement(center))