chapter 7. estimates and sample size - el camino college

1

Chapter 7. Estimates and Sample Size

Chapter Problem: How do we interpret a poll about global warming?

Pew Research Center Poll: “From what you’ve read and heard, is there a solid

evidence that the average temperature on earth has been increasing over the past a

few decades, or not?”

• 1501 randomly selected US adults responded

• 70% yes

Important issues related to this poll:

• How can the poll be used to estimate the percentage of US adults who believe

that the earth is getting warmer?

• How accurate is the result of 70% likely to be?

• Is the sample size too small? (1501/225,139,000 = 0.0007% of the population)

• Does the method of selecting the people to be polled have much of an effect on

the results?

2

7.1 Review and Preview

Review

• Descriptive statistics

• Two major applications of inferential statistics

o Estimate the value of a population parameter

o Test some claim (or hypothesis) about a population

This chapter focus on

• Estimating important population parameters:

o Proportion

o Mean

o Variance

• Methods for determining the sample sizes necessary to estimate

the above parameters

3

7.2 Estimating a Population Proportion

1. Introduction

2. Why Do We Need Confidence Interval?

3. Interpreting a Confidence Interval

4. Critical Values

5. Margin of Error

6. Determining Sample Size

7. Better-Performing Confidence Intervals

4


1. Introduction

Definition – a point estimate is a single value (or point) used to

approximate a population parameter.

We will see, the sample proportion is the best point estimate of

the population proportion p.

Proportion, probability, and percent. We focus on proportion. We

can also work with probabilities and percentages.

p̂

5


1. Introduction

e.g.1 Proportion of Adults Believing in Global warming. In the

chapter problem we noted that in a Pew Research Center poll,

70% of 1501 randomly selected adults in the U.S. believe in

global warming, so that sample proportion is = 0.70. Find the

best point estimate of the proportion of all adults in the U.S.

who believe in global warming.

Solution.

p̂

The best point estimate of p is 0.70

6


2. Why Do We Need Confidence Interval?

Definition – a confidence interval (or interval estimate) is a range

(or an interval) of values used to estimate the true value of a

population parameter. A confidence interval is sometimes

abbreviated as CI.

Confidence interval

• Associated with a confidence level

• Confidence level is the probability 1 (or area 1 – )

• is the complement of confidence level

• E.g. for 95% confidence level, = 0.05

7


2. Why Do We Need Confidence Interval

Definition – The confidence level is the probability 1 – that is the

proportion of times that the confidence interval actually does

contains the population parameter, assuming that the estimation

process is repeated a large number of times.

Example Here is an example of CI found later (in e.g.3), which is

based on the sample data of 1501 adults polled, with 70% of them

saying that they believe in global warming:

The 95% confidence interval estimate of the population

proportion p is 0.677 < p < 0.723

8



There are correct interpretation and incorrect creative interpretations

Correct: “we are 95% confident that the interval from 0.677 to 0.723

actually does contain the true value of p.” This means that if we

were to select many different samples of size 1501 and

construct the corresponding CI, 95% of them would actually

contain the value of population proportion p. (success rate)

Wrong: “There is a 95% chance that the true value of p will fall

between 0.677 and 0.723.”

9



– At any specific point of time, the population has a fixed and

constant value p, and a CI constructed from a sample either

contain p or does not.

– A confidence level of 95% tells us that the process that we

are using will, in the long run, result in confidence interval

limits that contain the true population proportion 95% of

time.

p = 0.75

0.70

0.65

0.80

0.85

Figure 7-1 CI’s from 20

Different Samples This confidence interval

Does not contain p = 0.75.

10


4. Critical Values

Notation for Critical Value and

Definition – a critical value is the number on the borderline

separating sample statistics that are likely to occur from those

that are unlikely to occur. The number is a critical value that

is a z score with the property that it separates an area of /2 in

the right tail of the standard normal distribution.

Figure 7-2 Critical Value in

the Standard Normal Distribution 2z

2z

2z

z/2

2

2

z/2

2z

11


4. Critical Values

e.g.2 Find the critical value corresponding to a 95% confidence

level.

By Table A-2, we find that 96.12

z

Figure 7-3 Finding for a 95% CL 2z

The total area to the left

of this boundary is 0.975

z/2

025.02

025.02

z/2

2z

Confidence level 95%

12


4. Critical Values

We have the following brief table:

Confidence Level Critical Value,

90% 0.10 1.646

95% 0.05 1.96

99% 0.01 2.575

2z

13


5. Margin of Error

Definition – The margin of error E, is the maximum likely (with

probability 1 – ) difference between the observed sample

proportion and the true value of the population p.

Formula 7 –1 Margin of error for proportion

p̂

n

qpzE

ˆˆ2/

14


5. Margin of Error

Requirements for this section

1) Simple random sample

2) Conditions for binomial distribution are satisfied.

3) At least 5 success and at least 5 failures. (With p and q unknown, we

estimate their values using sample proportion)

Notation for Proportions

p = population proportion

= sample proportion of x successes in a sample of size n

n = number of sample values

n

xp ˆ

pq ˆ1ˆ

15


5. Margin of Error

Confidence Interval (or Interval Estimate) for the Population

Proportion p

Where

The CI is often expressed in the following equivalent formats:

or

Round-off Rule for CI estimate of p

Round the CI limits for p to three significant digits.

EppEp ˆˆn

qpzE

ˆˆ2/

Ep ˆ

EpEp ˆ,ˆ

16


5. Margin of Error

Procedure for Constructing a Confidence Interval for p.

1) Verify that the requirements are satisfied.

2) Find the critical value that corresponds to the desired CL.

3) Evaluate the margin or error:

4) Obtain the CI:

5) Round the resulting CI limits to three significant digits.

EppEp ˆˆ

n

qpzE

ˆˆ2/

2/z

17


5. Margin of Error

e.g.3 Constructing a Confidence Interval: Poll Results. In the chapter

problem we noted that a Pew Research poll of 1501 randomly

selected U.S. adults showed that 70% of the respondents

believe in global warming. n = 1501 and = 0.70.

a. Find the margin of error corresponding to 95% CL

b. Find the 95% CI estimate of the population proportion p.

c. Based on the results, can we conclude that the majority of adults

believe in global warming?

d. Assuming that you are a newspaper …

Sol. Requirements met.

p̂

18


5. Margin of Error

e.g.3 (Chapter Problem, TI-84 demo).

a. , = 0.70, = 0.30, and n = 1501, so

b. 0.7 – 0.023183 = 0.676817, 0.7 + 0.023183 = 0.723183, so

CI: 0.677 < p < 0.723 or in interval notation CI = (0.677, 0.723)

c. To interpret the results

Based on the CI obtained in part b, it does appears that the proportion of

adults who believe in global warming is greater than 50%, so we can safely

conclude that the majority of adults believe in global warming. Because the

limits of 0.677 and 0.723 are likely to contain the true population, it appears

that the proportion is a value greater than 0.5.

023183.01501

)30.0)(70.0(96.1

ˆˆ2/

n

qpzE

p̂96.12/ z q̂

19


5. Margin of Error

Analyzing Polls e.g.3 Continue. When analyzing results from polls,

we should consider the following:

1) The sample should be simple random sample, not an

inappropriate sample (such as a voluntary response sample)

2) The confidence level should be provided. (It is often 95%, but

media reports often neglect to identify it)

3) The sample size should be provided. (It is usually provided by the

media, but not always)

4) Except for relatively rare cases, the quality of the poll results

depends on the sampling method and the size of the sample, but

the size of the population is usually not a factor.

20



Sample Size for Estimating Proportion p

Requirement: simple random sample.

When an estimate is known: Formula 7-2

When no estimate is known: Formula 7-3

Round-off rule for Determining Sample Size

If the computed sample size n is not a whole number, round it

up to the whole number.

p̂

2

2

2/ˆˆ

E

qpzn

p̂

2

2

2/ 25.0

E

zn

21



e.g.4 How many adults use the Internet? Assume that a manager for E-Bay wants to determine the current percentage of U.S. adults who now use the Internet. How many adults must be surveyed in order to be 95% confident that the sampling percentage is in error by no more than 3 percentage points?

a. Use the result from a Pew Research Center poll: in 2006, 73% of U.S. adults used the Internet

b. Assume that we have no prior information.

a.

b.

= 0.05, ,96.12/ z ,27.0ˆ ,73.0ˆ qp E = 0.03

n = 842 samples

n = 1068 samples

22



Interpretation. To be 95% confident that our sample percentage is

within 3 percentage points of the true percentage for all adults, we

should obtain a simple random sample of 1068 adults. By comparing

this result to the sample size of 842 found in part (a), we can see that

if we have no knowledge of a prior study, a larger sample is required

to achieve the same result as when the value of can be estimated.

p̂

23



Finding the Point Estimate and E from a Confidence Interval.

Point estimate of p:

Margin of Error:

e.g.5 Given that a confidence interval = (0.58, 0.81), Find and E.

2

limit) confidence(lower limit) confidenceupper (ˆ

p

2

limit) confidence(lower limit) confidenceupper ( E

695.02

81.058.0ˆ

p 115.0

2

58.081.0

E

p̂

24


7. Better-Performing Confidence Intervals

• Skip

Using Technology for CI’s

• Statdisk

• TI-84 (1-PropZInt, STAT/TESTS/1-PropZInt)

25

7.3 Estimating a Population Mean: Known

1. Introduction

2. Confidence Interval

3. Determining Sample Size Required to Estimate

4. Using Technology

26


1. Introduction

• Goal: present methods for using sample data to find a point

estimate and CI estimate of population mean.

• Requirements:

– Simple random sample

– The population SD is known

– Normal distribution or n > 30.

• Known population standard deviation

27


1. Introduction

The sample mean is the best point estimate of the population mean.

• Unbiased estimator of population mean

• For many populations, the distribution of sample mean tends

to be more consistent (with less variation) than the distributions

of other sample statistics.

x

x

28



Margin of error for estimating mean ( known) is:

Confidence Interval Estimate for the Population Mean (with

known) is:

Where

or

or

nzE

2/

ExEx n

zE

2/

Ex

) ,( ExEx

29



Procedure for Constructing a Confidence Interval for p.

1) Verify that the requirements are satisfied.

2) Find the critical value that corresponds to the desired CL.

3) Evaluate the margin or error:

4) Obtain the CI:

5) Rounding: next page.

ExEx ˆˆ

nzE

2/

2/z

30



Round-Off Rule for CI Used to Estimate

1. When use the original set of data to construct a confidence

interval, round the CI limits to one more decimal place than is

used for the original set of data.

2. When use the summary statistics (n, , s) to construct CI,

round the CI limits to the same number of decimal places used

for the sample mean.

x

31



Interpreting a Confidence Interval – similar to 7.2 for proportion be

careful to interpret CI correctly. After obtaining a CI estimate of the

population mean , such as 95% CI of 164.49 < < 180.61.

Correct: “we are 95% confident that the interval from 164.49 to 180.61 actually does contain the true value of .” This means that if we were to select many different samples of the same size and construct the corresponding CI, 95% of them would actually contain the value of population mean .

Wrong: “There is a 95% chance that the true value of will fall between 164.49 and 180.61.”

32



e.g.1 Weights of Men. n = 40, = 172.55 lb, and = 26 lb.

Using a 95% CL, find the following (TI-84):

a. The margin of error E, and CI for .

b. What do the results suggest about the mean weight of 166.3 lb

that was used to determine the safe passenger capacity in 1996?

Sol. a.

CI is :

(round to two decimals as in )

x

0574835.840

2696.12/

nzE

ExEx

x

172.55 – 8.0574835 < < 172.55 + 8.0574835

164.49 < < 180.61

33



Interpretation. The confidence interval from part (a) could also be

expressed in 172.55 ± 8.06 or as (164.49, 180.61). Based on the

sample with n = 40 and = 173.55 and assumed to be 26, the

confidence interval for the population means is 164.49 lb < <

180.61 lb and this interval has a 0.95 confidence level. This

means that if we were to select many different simple random

samples of 40 men and construct the confidence intervals as we

did here, 95% of them would contain the value of the

population mean

x

34



Rationale for CI.

By Central Limit Theorem, sample means (of sample size n) are

normally distributed with mean and variance . In the

equation z = , replace with , replace with ,

then solve for to get

n/xxx /)( x x

nzx

n/

35



Sample Size for Estimating mean

Formula 7-4

where = critical z score based on the desired CL

E = desired margin of error

= population standard deviation

Comments

– It does not depends on population size N.

– Round up.

2

2/

E

zn

2/z

36



Dealing with Unknown When Finding Sample Size

– Use range rule of thumb: range / 4

– Start the sample collection process without knowing and, using the

first several values, calculate the sample standard deviation s and use it

in places of . The estimated value of can then be improved as more

sample data are obtained, and the sample size can be refined

accordingly.

– Estimating the value of by using the results of some other study that

was done earlier.

– Use other know results. E.g. IQ tests are typically designed so that the

mean is 100 and SD is 15. For a population of statistics professors, IQ is

more than 100 and SD is less than 15. But we can assume SD is 15 to

play it safe.

– My comments: see next section!!

37



e.g.2 IQ Scores of Statistics students. Want to estimate the mean IQ

score for the population of statistics students.

Q: how many statistics students must be randomly selected for IQ

tests if we want 95% confidence that the sample mean is within

3 IQ points of the population mean?

Sol. = 1.96 (since = 0.05)

E = 3

= 15 (see previous page)

Now

2/z

samples 9704.963

)15(96.122

2/

E

zn

38



Interpretation. Among the thousands of statistics students, we need

to obtain a simple random sample of at least 97 of them. Then we

get their IQ scores. With a simple random sample of only 97

statistics students, we will be 95% confident that the sample mean

is within 3 IQ points of the true population mean . x

39


Using Technology for CI’s

• See e.g.2, TI-84 Demo

• Statdisk (analysis)

40

7.4 Estimating a Population Mean: Not Known

1. Introduction


3. Choosing the Appropriate Distribution

4. Finding Point Estimate and E from a CI

5. Using Technology

41


1. Introduction

• is not known

• Use Student t distribution (instead of normal distribution)

• Method in this section is realistic, practical, and often used

• Requirement

– Simple random sample

– From normally distributed population or n > 30

42


1. Introduction

Just like in section 7.3, we have:

The sample mean is the best point estimate of the

population mean.

x

43



Before finding the CI, first introduce Student t distribution.

Student t Distribution

If a population has normal distribution, then the distribution of

is a Student t Distribution for all sample of size n. A student t distribution

often referred to as a t distribution, is used to find critical values denoted

by .

n

s

xt

2/t

44



Definition

The number of degrees of freedom for a collection of sample

data is the number of sample values that can vary after certain

restrictions have been imposed on all data values.

For application in this section, degree of freedom = n – 1

Example 1. Finding a Critical Value. n = 7 samples is selected from

a normally distributed population. Find with 95% CL

Sol. df = 7 – 1 = 6 Table A-3 6th row for 95% CL, =

0.05, find the column listing values for an area of 0.05 in two

tails = 2.447 (can do TI-84 demo).

2/t

2/t

45



Margin of Error E for the Estimate of (with not known)

Confidence Interval for the Estimate (with not known)

n

stE 2/

ExEx

46



Procedure for Constructing a CI for (with not known)

Step 1. Verify the requirements are met. (simple random sample,

either data is from a normal distribution or n > 30)

Step 2. Given CL, let df = n –1, use A-3 to find the critical value

that corresponds to the desired CL

Step 3. Evaluate the margin of error

Step 4. Find . Then the CI is

Step 5. Original data: add one decimal place; Summary: same.

2/t

n

stE 2/

x ExEx

47



e.g.2 Constructing a Confidence Interval: Garlic for Reducing

Cholesterol. Use summary data, n = 49, , s = 21.0 and

.

The margin of error (95% CL) is:

The CI is 0.4 – 6.027 < < 0.4 + 6.027, or

Interpretation. Because the confidence interval contains the value of

0, it is possible that the mean of the changes in LDL cholesterol

is equal to 0, suggesting that the garlic treatment did not affect

the LDL cholesterol levels. It does not appear that the garlic

treatment is effective in lowering LDL cholesterol.

009.22/ t

4.0x

.027.649

0.21009.2 E

–5.6 < < 6.4

48



Important Properties of the Student t Distribution

1. Different for different sample size (unlike standard normal distribution)

2. Same bell shape as the standard normal distribution. But reflect greater

variability (with wider distributions) that is expected with small samples.

(n = 3, wider, n = 12 narrower)

3. Mean = 0 (just like standard normal distribution)

4. SD varies with sample size. It is always greater than 1 (SD > 1). Unlike

standard normal distribution where SD = 1.

5. As sample size gets larger, Student t distribution gets closer to standard

normal distribution.

49



start

Is

known

?

Use nonparametric

or bootstrapping

methods

yes

no Is the

Population normally

Distributed ?

Is the

Population normally

Distributed ?

Is

n > 30

?

Is

n > 30

?

no

no

no no

Use nonparametric

or bootstrapping

methods t

Use the t

distribution

Z Use the normal

distribution

yes

yes

yes

yes

Figure 7 – 6

Choosing Between z and t

50



Method

Use normal (z) distribution

Use t distribution

Use a nonparametric method or bootstrapping

Conditions

Known and normally distributed population

Or

Known and n > 30

not known and normally distributed

population

Or

not known and n > 30

Population is not normally distribution and n

30

Note:

1. Criteria for deciding whether the population is normally distributed: Population need not be

exactly norm, but is should appear to be somewhat symmetric with one mode no outliers

2. Sample size > 30: This is a commonly used guideline, but sample sizes of 15 to 30 are

adequate if the population appears to have a distribution that is not far from being normal and

there is no outliers. For some population distributions that are extremely far from normal, the

sample size might need be much larger than 30.

51



Example 3. Choosing Distributions. To construct CI for . Use the

given data to determine whether the margin or error E should

be calculated using (from normal distribution), (from

Student t distribution) or neither?

a. n = 9, s = 15, and population has a normal distribution

b. n = 5, s = 2, and population has a very skewed distribution

c. n = 12, = 0.6, and population has a normal distribution

d. n = 75, = 0.6, and distribution is skewed

e. n = 75, s = 0.6, and distribution is extremely skewed

2/z 2/t

,75x

,6.98x

,20x

,6.98x

,6.98x

2/t

Neither

2/z

2/z

2/t

52



e.g.4 CI for Alcohol in Video Games. Twelve different video games

showing substance use were observed. The duration times (in

second) of alcohol use were recorded, with the times listed

below. The design of the study justifies the assumption that the

sample can be treated as a simple random sample.

84, 14, 583, 50, 0, 57, 207, 43, 178, 0, 2, 57

Use this data to construct 95% CI estimate of , the mean

duration time that the video showed the use of alcohol.

Sol. Next page.

53



e.g.4 Continue.

Check requirements: not normal, n > 30 not satisfied.

Requirement are not satisfied.

(TI-84 demo, Tinterval) to get

CI = (1.8, 210.7)

Interpretation. Because the requirements

not satisfied, we don’t have the 95% confidence that (1.8 sec, 210.7

sec) interval contains the true population mean. We should use some

other method.

0 100 200 600

2

4

Fre

qu

ency

5

3

1

54


4. Finding Point Estimate and E from a CI

Given the upper and lower limits of a CI, find the mean and margin

of error:

Point estimate of :

Margin of error:

e.g.5 Weights of Garbage. Data Set 22 in Appendix B. Given 95%

CI (24.28, 30.607) Find the mean and margin of error.

Sol.

2

limit) confidence(lower limit) confidence(upper x

2

limit) confidence(lower limit) confidence(upper E

lb 444.272

282430.607

.x

lb 3.1642

2824607.30

.E

55


5. Using Technology

• TI-84 demo

– summary data

– original data

• Statdisk

56

7.5 Estimating a Population Variance

1. Chi-Square Distribution

2. Estimator of 2

57



• Given a normally distribution population with variance 2,

randomly select a sample of size n

• Compute the sample variance s2,

• The sample statistic 2 = (n – 1)s2/2 has sampling distribution

called chi-square distribution.

Chi-Square Distribution

Formula 7 – 5

Where n = sample size

s2 = sample variance

2 = population variance

2

22 )1(

sn

58



• Denote the chi-square by 2

• Pronounced “kigh square”

• To find the critical value, refer to A-4

• Chi-square distribution depends on degree of freedom

• Degree of freedom df = n – 1

59



Properties of the Distribution of the 2 Statistic

• Not symmetric, as df increases, it becomes more symmetric

• Value of 2 can be positive or zero, but never negative

• The 2 distribution is different for different df. As df increases,

the 2 distribution approaches to normal distribution (just the t

distribution)

60



e.g.1 Finding Critical Values of 2. A Simple random sample of ten

voltage levels is obtained. Construction of a confidence level

for the population standard deviation requires that the left and

right critical values of 2 corresponding to the confidence level

of 95% and a sample size of n = 10. Find the critical value of 2

separating an area of 0.025 in the left tail, and find the critical

value of 2 separating an area of 0.025 in the right tail.

Sol. Df = 10 – 1 = 9. (next page)

61


e.g.1 continue (can also use Statdisk, Excel, and Minitab)

Sol.

62


2. Estimator of 2

Point estimator

• The sample variance s2 is the best point estimate of the

population variance 2

• The sample standard deviation s is commonly used as a point

estimate of (even though it is a biased estimate)

63


2. Estimator of 2

Interval estimator

Requirements:

1) The sample is a simple random sample

2) The population must have normally distributed values

CI for the population variance 2:

CI for the population standard deviation:

2

22

2

2 )1()1(

LR

snsn

2

2

2

2 )1()1(

LR

snsn

64


2. Estimator of 2

Procedure for constructing a confidence interval for 2 or .

Step 1. verify that the requirements are satisfied

Step 2. Using n – 1 degree of freedom, Table A-4 to find critical

values and corresponding to the desired confidence

level.

Step 3. Find the CI:

Step 4. Find CI for . Take the square root on all three places.

Step 5. Rounding. Data: add one decimal place; summary: same.

2

L2

R

2

22

2

2 )1()1(

LR

snsn

65


2. Estimator of 2

e.g.2 Confidence Interval for Home Voltage. Sample of 10:

123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8

Step 1. requirements: Histogram is normal. Simple random sample.

66


2. Estimator of 2

e.g.2 Confidence Interval for Home Voltage. Sample of 10:

123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8

Step 2. For 95% CL, double sided. Found that = 2.700, and

= 19.023.

Step 3.

or, 0.010645 < 2 < 0.075000

Step 4. Take square root: 0.10 volt < < 0.27 volt

Interpretation: Based on this result, the confidence interval is (0.10, 0.27). The

limits of the CI is 0.10 and 0.27. But the format s E cannot be used because the

CI does not have s at its center.

700.2

)15.0)(110(

023.19

)15.0)(110( 22

2

2

L2

R

67


2. Estimator of 2

Rationale for the CI. If we obtain simple random samples of size n

from a population with variance 2, there is a probability of 1 –

that the statistic (n – 1)s2/2 will fall between the critical values of

and , i.e. there is a probability that the following is true:

Combine both in equality into one inequality, we have:

2

2

22

2

2 )1( and

)1(LR

snsn

2

L2

R

2

22

2

2 )1()1(

LR

snsn

68


2. Estimator of 2

Determine the sample size.

• Harder than in the case of mean and proportion.

– Use Table 7-2

– Statdisk/Analysis/Sample size determination/Estimate St Dev

• Excel, Minitab, TI-84 does not provide sample size.

69


2. Estimator of 2

70


2. Estimator of 2

e.g.3. Finding the sample size for estimating . We want to estimate

the standard deviation for all voltage levels in a home. We

want to be 95% confident that our estimate is within 20% of the

true value of . How large should the sample size be? Assume

that the population is normally distributed.

Ans. at least 48 samples

chapter 7. estimates and sample size - el camino college

Documents