Estimation From Sample Data
Chapter 08
Chapter 8 - Learning Objectives
• Explain the difference between a point and an interval estimate.
• Construct and interpret confidence intervals: with a z for the population mean or proportion, and with a t for the population mean.
• Determine the appropriate sample size to achieve specified levels of accuracy and confidence.
8.1 Introduction
Statistical inference is the process by which we acquire information about populations from samples.
There are two types of inference:
1. Estimation of a parameter
2. Testing a hypothesis about a parameter
8.2 Concept of Estimation
The main objective of estimation is to determine the value of a population parameter on the basis of a sample statistic.
There are two types of estimators:
1. Point Estimator
2. Interval Estimator
Point Estimator
A point estimator allows us to draw inferences about a population parameter (say the mean or the proportion) by computing a statistic from a sample.
The statistic provides an estimate of the value of the parameter at a single point (value), hence the name point estimate.
Interval Estimator
An interval estimator draws inferences about a population parameter by providing a range (interval) of values within which the unknown population parameter is expected to lie.
[Figure: population distribution with its parameter, sample distribution, and the interval estimator]
Example
You want to know the average weekly summer income of UMD students.
Take a sample of UMD students (say, 600) and compute the average weekly summer income of the students in your sample: \(\bar{x} = 400\).
Point estimate: µ = $400
Interval estimate: µ lies between $380 and $420
Interval estimators are used more frequently than point estimators because (1) a point estimator is more prone to faulty inferences, and (2) with an interval estimator we can specify how confident we are in our estimate.
Characteristics of Estimators
In estimation, you want to select the sample and the sample statistic that allow you to estimate a parameter with the least error.
The selection of the right statistic depends on some important characteristics.
Desirable Characteristics of Estimators
1. Unbiasedness: An unbiased estimator is one whose expected value is equal to the parameter it estimates.
2. Consistency: An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size increases.
3. Relative efficiency: Among two or more unbiased estimators, the one with the smaller variance is said to be relatively efficient.
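The three characteristics can be illustrated with a small simulation. The sketch below is only an illustration under assumed values (a normal population with mean 50 and standard deviation 10, and the helper name `simulate_estimators`, none of which come from the slides). It compares the sample mean and the sample median as estimators of the population mean.

```python
import random
import statistics

def simulate_estimators(n, reps=2000, mu=50.0, sigma=10.0, seed=1):
    """Draw `reps` samples of size n from an assumed Normal(mu, sigma)
    population and return the sample means and sample medians."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

for n in (10, 100):
    means, medians = simulate_estimators(n)
    print(
        f"n={n:4d}  average of sample means={statistics.fmean(means):.2f}  "
        f"var(mean)={statistics.pvariance(means):.3f}  "
        f"var(median)={statistics.pvariance(medians):.3f}"
    )
```

In runs like this, the average of the sample means stays close to the assumed µ (unbiasedness), the variance of the sample mean shrinks as n grows (consistency), and for normal data the sample mean typically shows a smaller variance than the sample median (relative efficiency).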
8.3 Interval Estimation of the Population Mean and Proportion
8.3.1 When the Population Variance is Known
8.3.2 When the Population Variance is Unknown
8.3.1 Estimating the Population Mean when the Population Variance is Known
We are able to provide an interval estimate of a population mean or proportion based on the following characteristics of a sampling distribution.
1. Given the sampling distribution, we can draw a sample of size n from the population and calculate the sample mean or proportion.
2. Given the central limit theorem, we treat the sampling distribution of the sample means or proportions as normal (or approximately normal) and can thus make probability statements about the sample mean or proportion that we estimate.
3. Given the formula for standardizing any random variable, we can relate the standardized value obtained from a normal distribution to the sample mean or proportion we are estimating:
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
Margin of Error and the Interval Estimate
The general form of an interval estimate of a population mean is
\[ \bar{x} \pm \text{Margin of Error} \]
We estimate the range (interval) that contains the value of the unknown population parameter (say the mean) as follows:
\[ \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) \]
where:
\(\bar{x}\) is the sample mean
\(z_{\alpha/2}\) is the standardized value of the random variable that leaves an area of α/2 in one tail of the standard normal probability distribution
σ is the population standard deviation
n is the sample size
1 − α is the confidence coefficient
The Confidence Interval for µ (σ is known)
In its expanded form, the interval can be stated as follows:
\[ P\left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]
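The expanded form follows from the standardized statistic in a few algebraic steps. The derivation below is a sketch, assuming the sampling distribution of the sample mean is normal with mean µ and standard error σ/√n:

```latex
\begin{align*}
P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) &= 1-\alpha \\
P\left(-z_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) &= 1-\alpha \\
P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{x}-\mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha \\
P\left(\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha
\end{align*}
```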
Interpreting the Confidence Interval for µ
Based on the estimate, we can say with (1 − α)100 percent confidence that the interval
\[ \left[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right] \]
contains the true value of the unknown population parameter.
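One way to make this interpretation concrete is to simulate many samples and count how often the interval captures the true mean. The sketch below assumes a hypothetical normal population (µ = 68, σ = 27) and a sample size of 81. These values and the function name `coverage` are illustrative assumptions, not part of the slides.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=68.0, sigma=27.0, n=81, confidence=0.95, reps=2000, seed=7):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / n ** 0.5                        # margin of error
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

print(f"share of 95% intervals containing mu: {coverage():.3f}")
```

The printed share should land near the chosen confidence level. This is what "95 percent confidence" refers to: the long-run proportion of intervals, built this way, that contain the parameter.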
Interval Estimation of a Population Proportion
\[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]
where:
1 − α is the confidence coefficient
\(z_{\alpha/2}\) is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
\(\bar{p}\) is the sample proportion
The Confidence Interval for µ (σ is known)
Commonly used confidence levels and their corresponding z scores
Confidence level   α     z (for α/2)
90%                10%   1.645
95%                5%    1.960
99%                1%    2.575
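The tabulated z scores can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value that leaves an area of alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  alpha={1 - level:.0%}  z={z_critical(level):.3f}")
```

For the 99% level the computed value is about 2.576. Printed tables often show 2.575 or 2.58, and the difference is a matter of rounding.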
Interval Estimate of Population Mean: σ Known: Example
Step-1: Identify α and the confidence coefficient (1 − α) at which the margin of error is to be computed (for example, α = 5%).
Step-2: Compute the corresponding margin of error for the selected confidence coefficient.
Step-3: Establish the interval estimate of µ by subtracting the margin of error from, and adding it to, the sample mean.
The Confidence Interval for µ (σ is known)
Hands-On-Practice Problems
Interval Estimate of Population Mean: σ Known: Example
A random sample of 81 credit card sales in a department store showed an average sale of $68. From past data, it is known that the standard deviation of credit card sales is $27.
8.1) Determine the 90% confidence interval estimate of sales on credit cards.
8.2) Determine the 95% confidence interval estimate of sales on credit cards.
8.3) Determine the 99% confidence interval estimate of sales on credit cards.
Solution to 8.1: n = 81; \(\bar{x}\) = $68; σ = $27.
Step-1: Identify α and the confidence coefficient (1 − α) at which the margin of error is to be computed:
α = 10%; confidence coefficient (1 − α) = 90%
Step-2: Compute the corresponding margin of error for the selected confidence coefficient:
\(z_{\alpha/2} = z_{0.05} = 1.645\)
Standard error: \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 27/\sqrt{81} = 27/9 = 3\)
Margin of error = 1.645 × 3 = 4.935
Step-3: Establish the interval estimate of µ by subtracting the margin of error from, and adding it to, the sample mean:
68 − 4.935 = 63.065; 68 + 4.935 = 72.935; interval: [63.065, 72.935]
We are 90 percent confident that the average credit card sale of the store lies between $63 and $73.
Solution to 8.2: n = 81; \(\bar{x}\) = $68; σ = $27.
Step-1: Identify α and the confidence coefficient (1 − α) at which the margin of error is to be computed:
α = 5%; confidence coefficient (1 − α) = 95%
Step-2: Compute the corresponding margin of error for the selected confidence coefficient:
\(z_{\alpha/2} = z_{0.025} = 1.96\)
Standard error: \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 27/\sqrt{81} = 27/9 = 3\)
Margin of error = 1.96 × 3 = 5.88
Step-3: Establish the interval estimate of µ by subtracting the margin of error from, and adding it to, the sample mean:
68 − 5.88 = 62.12; 68 + 5.88 = 73.88; interval: [62.12, 73.88]
We are 95 percent confident that the average credit card sale of the store lies between $62 and $74.
Solution to 8.3: n = 81; \(\bar{x}\) = $68; σ = $27.
Step-1: Identify α and the confidence coefficient (1 − α) at which the margin of error is to be computed:
α = 1%; confidence coefficient (1 − α) = 99%
Step-2: Compute the corresponding margin of error for the selected confidence coefficient:
\(z_{\alpha/2} = z_{0.005} = 2.575\)
Standard error: \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 27/\sqrt{81} = 27/9 = 3\)
Margin of error = 2.575 × 3 = 7.725
Step-3: Establish the interval estimate of µ by subtracting the margin of error from, and adding it to, the sample mean:
68 − 7.725 = 60.275; 68 + 7.725 = 75.725; interval: [60.275, 75.725]
We are 99 percent confident that the average credit card sale of the store lies between $60 and $76.
Interval Estimate of Population Mean: σ Known: Example
A random sample of 81 credit card sales in a department store showed an average sale of $68. From past data, it is known that the standard deviation of credit card sales is $27.
8.1) The 90% confidence interval estimate of sales on credit cards: [63.065, 72.935]
8.2) The 95% confidence interval estimate of sales on credit cards: [62.12, 73.88]
8.3) The 99% confidence interval estimate of sales on credit cards: [60.275, 75.725]
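The three intervals can be checked with a few lines of Python. This is a sketch of the σ-known formula applied to the credit card data. The function name `z_interval` is an illustrative choice, and the z values are computed rather than read from the table, so the last digits can differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                         # z times standard error
    return xbar - margin, xbar + margin

# Credit card example: n = 81, sample mean = 68, sigma = 27.
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(68, 27, 81, level)
    print(f"{level:.0%}: [{low:.3f}, {high:.3f}]")
```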
Implications…
As we increase the confidence coefficient (say from 90% to 95% or to 99%), the interval that contains the mean of the population widens.
There is a trade-off between the width of the interval and the confidence with which we can make the estimate.
The Confidence Interval for µ (When the Population Standard Deviation Is Unknown)
Recall that when the population variance is known we use the following statistic to provide an interval estimate of a population mean:
\[ \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) \]
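The t-Statistic
However, information about the population variance may not always be available. Provided that the sampled population is normally distributed, even if the population variance is unknown, we can use the variance estimated from the sample and a t statistic (Student's t distribution) to make inferences about the population mean. The population standard deviation σ in the Z statistic is replaced by the sample standard deviation s:
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \quad\longrightarrow\quad t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]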
The t distribution is mound-shaped and symmetrical around zero.
The variance of a t distribution depends on the sample size. It has a higher variance than the standard normal distribution.
t Distribution
[Figure: standard normal distribution compared with t distributions with 20 and 10 degrees of freedom; horizontal axis: z, t]
When the degrees of freedom exceed 100, the standard normal z value provides a good approximation to the t value.
The variance (spread) of a t distribution, compared with that of the normal distribution, is largely determined by the degrees of freedom (n − 1, which depends on the sample size).
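The effect of the degrees of freedom on the spread can be made concrete with the standard result that a t distribution with more than 2 degrees of freedom has variance df / (df − 2), compared with 1 for the standard normal. This formula is not stated on the slides, and the function name `t_variance` is illustrative.

```python
def t_variance(df):
    """Variance of Student's t distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("variance is finite only for more than 2 degrees of freedom")
    return df / (df - 2)

# The variance approaches 1 (the standard normal variance) as df grows.
for df in (10, 20, 100):
    print(f"df={df:3d}  variance={t_variance(df):.4f}")
```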
The interval estimate of the population mean is thus computed as:
\[ \bar{x} \pm t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) \]
where \(t_{\alpha/2}\) is taken at n − 1 degrees of freedom.
The Confidence Interval for µ (σ is unknown)
Example 8.2.1: In a random sample of 100 oil changes, it was found that on average it takes about 22 minutes to change the oil for a car, with a standard deviation of 5 minutes.
Assuming that the amount of time it takes to change oil on a car is normally distributed, provide the 99% confidence interval estimate of the average amount of time it takes to change oil on a typical car.
Example 8.2.2: Using the same information, but assuming a standard deviation of 25 minutes, provide the 99% confidence interval estimate of the population mean (the average amount of time it takes to change oil on a car).
Example 8.2.3: Using the same information (standard deviation = 5, sample mean = 22), but assuming a sample size of 400 oil changes, provide the 99% confidence interval estimate of the population mean (that is, the average amount of time it takes to change oil on a typical car).
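The slides leave these examples as exercises. The sketch below shows one way to work them in Python using only the standard library. Because the standard library has no t distribution, the critical value is approximated numerically (Simpson's rule on the t density plus bisection). This is an illustrative substitute for a t table, and the function names are assumptions.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """t value leaving alpha/2 in the upper tail, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence):
    """Confidence interval for the mean when sigma is unknown (n - 1 df)."""
    margin = t_critical(confidence, n - 1) * s / sqrt(n)
    return xbar - margin, xbar + margin

# Examples 8.2.1 to 8.2.3: (sample mean, s, n), all at 99% confidence.
for label, (xbar, s, n) in {"8.2.1": (22, 5, 100),
                            "8.2.2": (22, 25, 100),
                            "8.2.3": (22, 5, 400)}.items():
    low, high = t_interval(xbar, s, n, 0.99)
    print(f"Example {label}: [{low:.3f}, {high:.3f}]")
```

Under these assumptions the margins come out at approximately 1.31, 6.57, and 0.65 minutes respectively. A larger standard deviation widens the interval and a larger sample narrows it.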
Implications for the Width of the Confidence Interval
The width of the confidence interval is affected by:
1. The confidence level (1 − α): the higher the confidence level, the wider the interval estimate.
2. The population standard deviation (σ): the higher the variance, the wider the interval estimate.
3. The sample size (n): the larger the sample size, the narrower the interval estimate.
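Each of the three effects can be seen by changing one input at a time in the σ-known formula. In this sketch the baseline values (σ = 27, n = 81, 95% confidence) and the function name `interval_width` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Full width of the z-interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

base = interval_width(sigma=27, n=81, confidence=0.95)
print(f"baseline width:         {base:.3f}")
print(f"higher confidence, 99%: {interval_width(27, 81, 0.99):.3f}")
print(f"larger sigma, 54:       {interval_width(54, 81, 0.95):.3f}")
print(f"larger sample, n=324:   {interval_width(27, 324, 0.95):.3f}")
```

Because the width is proportional to 1/√n, quadrupling the sample size halves the width.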
Information and the Width of the Interval
A wide interval estimator provides little information: where is µ?
But we want a narrower confidence interval and a higher confidence level.
The Width of the Confidence Interval
The width of the confidence interval is affected by the confidence level, the variance of the population, and the sample size.
1. We want a higher confidence level and a narrower interval estimate. However, there is a trade-off between the confidence level and the width of the interval estimate.
2. Although a lower variance would give a narrower interval estimate, the variance of the population or sample is beyond our control.
3. Therefore, the only way to establish a narrow (more informative) interval while maintaining a high confidence level is to increase the sample size.
The Sample Size
[Figure label: 90% confidence level]
Determining the proper sample size is thus a critical component of establishing a narrow interval estimate.
8.3 Selecting the Sample Size
From the formula that we used to establish the interval estimate of the population parameter, we can derive a formula that allows us to determine the appropriate sample size.
Two important requirements:
1. At what confidence level do we want to provide the interval estimate?
2. What interval width (W) do we need?
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{w} \right)^2 = \frac{z_{\alpha/2}^2\,\sigma^2}{w^2} \]
where w is the interval width we want to maintain, measured as the allowed distance on either side of the estimate (the ± margin of error). Thus, to compute the sample size, we first need to determine the interval width.
Selecting the Sample Size
Example 10.2: In order to estimate the amount of lumber that can be harvested from a tract of land with 99% confidence, it was indicated that the estimate of the mean diameter of the trees in the tract must be within one inch of the true mean.
Assuming that diameters are normally distributed with a standard deviation of 6 inches, how many trees should be sampled to provide the interval estimate of the mean diameter at the specified confidence level?
Solution: The required accuracy of the estimate is ±1 inch. That is, w = 1.
The 99% confidence level leads to α = .01, thus \(z_{\alpha/2} = z_{.005} = 2.575\).
The standard deviation was given as σ = 6.
Thus, we can compute the required sample size as follows:
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{w} \right)^2 = \left( \frac{2.575 \times 6}{1} \right)^2 = 238.7, \text{ rounded up to } 239 \]
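The sample-size calculation can be sketched in Python as follows. The function name `required_sample_size` is an illustrative choice, and the result is rounded up because a fraction of an observation cannot be sampled.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, w, confidence):
    """Smallest n for which the margin of error z * sigma / sqrt(n) is at most w."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil((z * sigma / w) ** 2)

# Lumber example: sigma = 6 inches, accuracy w = 1 inch, 99% confidence.
print(required_sample_size(sigma=6, w=1, confidence=0.99))
```

Halving w roughly quadruples the required sample size, since n grows with 1/w².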
Computing Interval Estimates: Summary
1. Determine the sample size and the values of the variables of interest (width, spread of the population or sample).
2. Select the confidence level for the interval estimation.
3. Compute the sample mean (the population variance may be known or unknown).
4. Determine the critical value (z from the standard normal table, or t from the t table).
5. Compute the confidence interval.
Summary of Interval Estimation Procedures for a Population Mean
Is the population standard deviation σ known?
Yes (σ known case): use
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
No (σ unknown case): use the sample standard deviation s to estimate σ, then use
\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
Interval Estimation of a Population Proportion
\[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]
where:
1 − α is the confidence coefficient
\(z_{\alpha/2}\) is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
\(\bar{p}\) is the sample proportion
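The proportion formula can be applied in the same way as the formula for the mean. The slides give no worked proportion example, so the data in this sketch (220 successes in a sample of 500) are hypothetical, as is the function name `proportion_interval`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_bar, n, confidence):
    """z-based confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_bar * (1 - p_bar) / n)           # z times standard error
    return p_bar - margin, p_bar + margin

# Hypothetical data (not from the slides): 220 successes out of n = 500.
low, high = proportion_interval(220 / 500, 500, 0.95)
print(f"95% interval for the proportion: [{low:.4f}, {high:.4f}]")
```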