SECTION 10.3 Estimating a Population Proportion

Upload: kyle-dillon

Post on 27-Mar-2015

TRANSCRIPT

Page 1:

SECTION 10.3

Estimating a Population Proportion

Page 2:

NOW WHAT

In this section we are interested in the unknown proportion, p, of a population as opposed to the unknown mean of a population.

Keep in mind, the sample proportion $\hat{p}$ will have an approximately Normal distribution, so it is

BACK TO THE WORLD OF z.

Page 3:

Standard Error

We don’t really know p for our standard deviation. So, when we create confidence intervals, $\hat{p}$ will be a decent estimate of p.

The standard deviation of $\hat{p}$ is

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

The standard error of $\hat{p}$ is

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
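As a minimal sketch of the standard error formula, the function below computes it from a count of successes and a sample size. The function name and the sample numbers (60 successes in 100 trials) are illustrative assumptions, not part of the original slides.

```python
import math


def standard_error(successes: int, n: int) -> float:
    """Standard error of the sample proportion p-hat = successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    # Plug p-hat in for the unknown p in sqrt(p(1 - p) / n).
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample: 60 successes in 100 trials.
se = standard_error(60, 100)
print(round(se, 4))  # 0.049
```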

Page 4:

Conditions for Inference about a Population Proportion

1. Random: Data are an SRS from the population or are results from a randomized experiment.

2. Normality: The sample size is large enough to assume the sampling distribution of $\hat{p}$ is approximately Normal. Remember, there is no population distribution for p. For a confidence interval, check:

$$n\hat{p} \ge 10 \quad \text{and} \quad n(1-\hat{p}) \ge 10$$

NOTE: we don’t use p here because we don’t know p.

3. Independence: Either the sample is collected with replacement or the population is at least ten times as large as the sample, so that we can use our formula for standard deviation.
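The two numeric conditions can be checked mechanically; the sketch below is one way to do it. The function name, the dictionary keys, and the sample numbers are assumptions made for illustration. The Random condition depends on how the data were collected, so it cannot be checked from the counts and is left out.

```python
def check_conditions(successes: int, n: int, population_size: float) -> dict:
    """Check the Normality and Independence conditions for a one-proportion interval.

    n * p-hat is the count of successes and n * (1 - p-hat) is the count
    of failures, so the Normality check only needs the two counts.
    """
    failures = n - successes
    return {
        "normality": successes >= 10 and failures >= 10,
        "independence": population_size >= 10 * n,
    }


# Hypothetical numbers: 60 successes in a sample of 100 from a population of 5000.
print(check_conditions(60, 100, 5000))
```

A small sample such as 4 successes in 15 trials would fail the Normality check even if the population is large.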

Page 5:

CAUTION

Be sure to check that the conditions for constructing a confidence interval for the population proportion are satisfied before you perform any calculations.

Page 6:

Confidence Interval

These procedures are very similar to what we did in Section 10.1. However, now we are working with proportions instead of means.

We will interpret our results in a very similar fashion.

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
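The interval formula can be sketched in a few lines of Python. The function name and the sample numbers are illustrative assumptions. The critical value z* comes from the standard library's `statistics.NormalDist` rather than a table, so results may differ from hand calculations with z* = 1.96 in the later decimal places.

```python
import math
from statistics import NormalDist


def one_prop_z_interval(successes: int, n: int, confidence: float = 0.95):
    """Large-sample confidence interval for a population proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    p_hat = successes / n
    # z* leaves (1 - confidence) / 2 in each tail of the standard Normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 60 successes in 100 trials, 95% confidence.
low, high = one_prop_z_interval(60, 100)
print(f"({low:.3f}, {high:.3f})")  # (0.504, 0.696)
```

This sketch does not check the Random, Normality, or Independence conditions; those still need to be verified before the interval is trusted.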

Page 7:

INFERENCE TOOLBOX (p 631)

Steps for constructing a CONFIDENCE INTERVAL: DO YOU REMEMBER WHAT THE STEPS ARE???

1. PARAMETER: Identify the population of interest and the parameter you want to draw a conclusion about.
2. CONDITIONS: Choose the appropriate inference procedure. VERIFY conditions (Random, Normal, Independent) before using it.
3. CALCULATIONS: If the conditions are met, carry out the inference procedure.
4. INTERPRETATION: Interpret your results in the context of the problem. CONCLUSION, CONNECTION, CONTEXT (meaning that our conclusion about the parameter connects to our work in part 3 and includes appropriate context).

Page 8:

Example: Will smoking shorten your life?

Do smokers realize that smoking is bad for their health? Have most smokers tried to quit? The Harris Poll addressed smoking in a sample survey conducted by telephone in January 2000. Because Harris called residential telephone numbers at random, the sample (ignoring practical problems) was an SRS of smokers living in the United States in households with telephone service. The sample size was n = 1010. Here are two findings from this sample survey:

– “Do you believe that smoking will probably shorten your life, or not?” 848 of 1010 said “yes”
– “Have you ever tried to give up smoking?” 707 of 1010 said “yes”

Construct a 95% confidence interval for the proportion of all American smokers who think that smoking will probably shorten their lives.

Page 9:

Example: Will smoking shorten your life? Cont.

--We want to estimate p = the actual proportion of all American smokers who think smoking will shorten their lives, using the sample proportion

$$\hat{p} = \frac{848}{1010} = 0.8396$$

--We will check to see if we can create a one-proportion confidence interval:
– We are told to treat the sample as an SRS.
– Since $n\hat{p} = 848$ and $n(1-\hat{p}) = 162$ are both at least 10, we are safe using the Normal approximation.
– There are at least 10,100 (10 × 1010) American smokers, so random sampling ensures independent responses.

Page 10:

Example: Will smoking shorten your life? Cont.

--About 84% of the smokers in the sample thought that smoking will probably shorten their lives. To extend this result to the population, report the confidence interval:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.8396 \pm 1.96\sqrt{\frac{(0.8396)(0.1604)}{1010}} = 0.8396 \pm 0.0226 = (0.8170, 0.8622)$$

(0.81697, 0.86224) w/ calc
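The margin of error in this example can be checked one step at a time. The intermediate values below are recomputed from the rounded inputs ($\hat{p} = 0.8396$, $z^* = 1.96$, $n = 1010$) and are themselves rounded, so treat them as a sanity check rather than part of the original slide:

$$\frac{(0.8396)(0.1604)}{1010} \approx 0.00013334, \qquad \sqrt{0.00013334} \approx 0.011547, \qquad 1.96 \times 0.011547 \approx 0.0226$$

The calculator interval differs slightly in the later decimal places, presumably because the calculator carries unrounded values of $\hat{p}$ and $z^*$.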

Page 11:

Example: Will smoking shorten your life? Cont.

--We are 95% confident that between 81.7% and 86.2% of all smokers believe that smoking will probably shorten their lives, because the methods we used will yield an interval such that 95% of all such intervals will capture the true proportion.

Page 12:

Example Notes

– The Harris Poll suggests avoiding “margin of error” statements because the general public does not understand the meaning.
– Margin of error includes the range of variation due to the play of chance in choosing a random sample.
– Margin of error DOES NOT INCLUDE variation due to refusal to be interviewed (non-response), question wording, question order, interviewer bias, weighting by demographic control data and screening, etc.

Page 13:

Choosing a Sample Size

To determine the appropriate sample size for a desired margin of error m, use the following formula:

$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$

– Round UP to the nearest whole number.
– p* is a guessed value for the sample proportion based on either a pilot study or previous experience. Using a p* of 0.5 will yield the most conservative (largest) estimate of the necessary sample size. If we have a better guess available, we use it because sampling costs money.
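The sample size formula, including the round-up rule, might be sketched as follows. The function name, its parameters, and the 3% margin of error in the usage lines are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(margin: float, confidence: float = 0.95,
                         p_guess: float = 0.5) -> int:
    """Smallest n whose margin of error is at most `margin`.

    p_guess plays the role of p*; the default 0.5 gives the most
    conservative (largest) sample size.
    """
    if margin <= 0 or not 0 < confidence < 1 or not 0 < p_guess < 1:
        raise ValueError("margin must be positive; confidence and p_guess in (0, 1)")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)  # always round UP


# Hypothetical target: margin of error 0.03 at 95% confidence.
print(required_sample_size(0.03))               # 1068
print(required_sample_size(0.03, p_guess=0.2))  # 683
```

The second call shows why a better guess for p* is worth using when one is available: the required sample is much smaller than the conservative answer.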

Page 14:

CALCULATOR FUNCTIONS

You may be able to find this on your own by now, but just in case, you will be looking for:
– 1-PropZInt

Note: x is your number of successes while n is your total trials.

Page 15:

+4 Confidence Interval for Proportions

THE FOLLOWING INFORMATION IS NOT IN YOUR BOOK (at least not in detail)

BUT WILL BE ADDRESSED ON YOUR QUIZ AND/OR YOUR TEST

Page 16:

+4 Confidence Interval for Proportions

The confidence interval we have used so far for the population proportion p is easy to calculate and easy to understand because it rests directly on the approximately Normal distribution of $\hat{p}$. Unfortunately, this interval is often quite inaccurate unless the sample is very large. The actual confidence level is usually LESS than the confidence level you asked for in choosing the critical value z*. THAT IS BAD! And accuracy does not consistently get better as the sample size n increases.

Fortunately, there is a simple modification that is almost magically effective in improving the accuracy of the confidence interval. We call it the “plus four” method, because all you need to do is add four imaginary observations (2 successes, 2 failures).

Page 17:

+4 Confidence Interval for Proportions

The plus four estimate of p is

$$\tilde{p} = \frac{\text{count of successes in the sample} + 2}{n + 4}$$

The formula for the confidence interval is exactly as before, with the new sample size and count of successes. With the calculator, just enter the new plus four sample size and count of successes into the usual large-sample procedure.

USE THIS interval when the confidence level is at least 90% and the sample size n is at least 10.
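A minimal sketch of the plus four procedure follows: add 2 successes and 2 failures, then apply the usual large-sample formula. The function name and the sample numbers are assumptions for illustration. Raising an error when the guideline (confidence at least 90%, n at least 10) is not met is a design choice of this sketch, not something the slides prescribe.

```python
import math
from statistics import NormalDist


def plus_four_interval(successes: int, n: int, confidence: float = 0.95):
    """Plus four confidence interval for a population proportion."""
    if not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n")
    if n < 10 or not 0.90 <= confidence < 1:
        raise ValueError("guideline: n >= 10 and confidence level >= 90%")
    # Add four imaginary observations: 2 successes and 2 failures.
    new_successes, new_n = successes + 2, n + 4
    p_tilde = new_successes / new_n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_tilde * (1 - p_tilde) / new_n)
    return p_tilde - margin, p_tilde + margin


# Hypothetical sample: 3 successes in 20 trials.
low, high = plus_four_interval(3, 20)
print(f"({low:.4f}, {high:.4f})")
```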

Page 18:

+4 Confidence Interval EXAMPLE

Some shrubs have the useful ability to resprout from their roots after their tops are destroyed. Fire is a particular threat to shrubs in dry climates, as it can injure the roots as well as destroy the aboveground material. One study of resprouting took place in a dry area of Mexico. The investigators clipped the tops of samples of several species of shrubs. In some cases, they also applied a propane torch to the stumps to simulate a fire. Of 12 specimens of the shrub Krameria cytisoides, 5 resprouted after fire.

Give a 95% confidence interval for the proportion of all shrubs of this species that will resprout after fire.

Page 19:

+4 Confidence Interval EXAMPLE cont.

This sample is not large enough to allow use of the traditional large-sample confidence interval. The plus-four confidence interval will be quite accurate even for this small sample.

The usual sample proportion is

$$\hat{p} = \frac{5}{12} = 0.417$$

Note that we don’t have enough successes or failures to use the large-sample confidence interval.

The plus four sample proportion is

$$\tilde{p} = \frac{5 + 2}{12 + 4} = \frac{7}{16} = 0.4375$$

NOTE: The plus four estimate always moves away from 0 or 1 and toward 0.5. The result is not very different from $\hat{p}$ unless $\hat{p}$ is very near 0 or 1. The plus four adjustment is immediately attractive if, for example, 12 of 12 sample shrubs of a species resprout. We don’t really think that p = 1. The plus four estimate of 14/16 = 0.875 seems more plausible. More importantly, the consistent moving toward 0.5 can have a major effect on the coverage probability of the confidence interval, which is often much closer to the desired 0.95 for the plus four interval.

Page 20:

+4 Confidence Interval EXAMPLE cont.

--Here is the plus four interval:

$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} = 0.4375 \pm 1.96\sqrt{\frac{(0.4375)(0.5625)}{16}} = 0.4375 \pm 0.2431 = (0.1944, 0.6806)$$
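The plus four margin of error can be checked step by step. The intermediate values below are recomputed from the numbers in the example (7 successes out of 16 after the adjustment, $z^* = 1.96$) and are rounded, so they are a sanity check rather than part of the original slide:

$$\frac{(0.4375)(0.5625)}{16} \approx 0.015381, \qquad \sqrt{0.015381} \approx 0.12402, \qquad 1.96 \times 0.12402 \approx 0.2431$$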

--We are 95% confident that between 19% and 68% of this species will resprout after being burned, because we used a method that yields intervals such that 95% of all intervals will capture the true proportion for this population of plants.

This interval is so wide because the sample is small.

Page 21:

+4 Confidence Interval for Proportions

The numerical difference between the large-sample and plus four intervals is often small. Remember that the confidence level is the probability that the interval will catch the true population proportion in very many uses. Small differences every time add up to accurate confidence levels from plus four versus inaccurate levels from the large-sample interval.

HOW MUCH MORE ACCURATE?

Computer studies have run numbers to determine needed sample sizes to ensure that the confidence level was accurate. It was found that for a 95% confidence interval to cover the true parameter at least 94% of the time, if p = 0.1 the sample size needs to be 646 for the large-sample interval but only 11 for the plus four interval. The consensus of computational and theoretical studies is that plus four is much better than the large-sample interval for many combinations of n and p.
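The idea of an "actual" confidence level can be made concrete with a small calculation. The sketch below is not the computer study cited above and does not try to reproduce its sample sizes. It sums binomial probabilities over every possible count of successes to get the exact chance that each interval captures a known p. The function names and the setting (n = 30, p = 0.1, z* = 1.96) are assumptions chosen for illustration.

```python
import math


def covers(successes: int, n: int, p: float, z_star: float, plus_four: bool) -> bool:
    """True if the interval built from this sample contains the true p."""
    if plus_four:
        successes, n = successes + 2, n + 4
    estimate = successes / n
    margin = z_star * math.sqrt(estimate * (1 - estimate) / n)
    return estimate - margin <= p <= estimate + margin


def exact_coverage(n: int, p: float, z_star: float = 1.96,
                   plus_four: bool = False) -> float:
    """Probability that the interval captures p when X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p**x * (1 - p) ** (n - x)
        if covers(x, n, p, z_star, plus_four):
            total += prob
    return total


# Hypothetical setting: n = 30 observations, true p = 0.1.
large_sample = exact_coverage(30, 0.1)
plus_four = exact_coverage(30, 0.1, plus_four=True)
print(f"large-sample coverage: {large_sample:.3f}")
print(f"plus four coverage:    {plus_four:.3f}")
```

In this setting the large-sample interval misses p whenever the sample contains 0 or 1 successes, which is why its coverage falls well short of 95%, while the plus four interval still captures p in those cases.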