chap_06

70
360 Prelude Reacting to numbers I van is the product manager in charge of blenders for a manufacturer of household appliances. Last month’s sales report showed a 4% increase in blender sales to retailers, so Ivan treated the sales staff to lunch. This month, blender sales decline by 3%. Ivan is mystified—he can find no reason why sales should fall. Perhaps the sales force is getting lazy. Ivan expresses his disappointment, the sales staff are defensive, morale declines Then Caroline, a student intern who has taken a statistics class, asks about the source of the sales data. They are, she is told, estimates based on a sample of customer orders. Complete sales data aren’t available until orders are filled and paid for. Caroline looks at the month-to-month variation in past estimates and at the relationship of the estimates to actual orders in the same month from all customers. Her report notes many reasons why the estimates will vary, including both normal month-to-month variation in blender orders and the additional variation due to using data on just a sample. Her report shows that changes of 3% or 4% up or down often occur simply because the data look only at a sample of orders. She even describes how large a change in the estimates provides good evidence that actual orders are really different from last month’s total. Ivan took every number as fixed and solid. He failed to grasp a statistical truism: we expect data to vary “just by chance.” Only variation larger than typical chance variation is good evidence of a real change. ...

Upload: api-19746504

Post on 16-Nov-2014

47 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: chap_06

360

Prelude

Reacting to numbers

I van is the product manager in charge of blenders for a manufacturer ofhousehold appliances. Last month’s sales report showed a 4% increase in

blender sales to retailers, so Ivan treated the sales staff to lunch. This month,blender sales decline by 3%. Ivan is mystified—he can find no reason whysales should fall. Perhaps the sales force is getting lazy. Ivan expresses hisdisappointment, the sales staff are defensive, morale declines

Then Caroline, a student intern who has taken a statistics class, asksabout the source of the sales data. They are, she is told, estimates based on asample of customer orders. Complete sales data aren’t available until ordersare filled and paid for. Caroline looks at the month-to-month variation inpast estimates and at the relationship of the estimates to actual orders inthe same month from all customers. Her report notes many reasons whythe estimates will vary, including both normal month-to-month variationin blender orders and the additional variation due to using data on justa sample. Her report shows that changes of 3% or 4% up or down oftenoccur simply because the data look only at a sample of orders. She evendescribes how large a change in the estimates provides good evidence thatactual orders are really different from last month’s total.

Ivan took every number as fixed and solid. He failed to graspa statistical truism: we expect data to vary “just by chance.”

Only variation larger than typical chance variation is goodevidence of a real change.

. . .

Page 2: chap_06

6

Introduction toInference

6.1 Estimating with Confidence

6.2 Tests of Significance

6.3 Using Significance Tests

6.4 Power and Inference asa Decision

Introduction toInference

CH

AP

TER

6

Page 3: chap_06

Introduction to Inference362 CHAPTER 6

What are the probabilities?EXAMPLE 6.1

Introduction

��

The purpose of statistical inference is to draw conclusions from data,conclusions that take into account the natural variability in the data. To dothis, formal inference relies on probability to describe chance variation. Wecan then correct our “eyeball” judgment by calculation.

In this chapter we introduce the two most prominent types of formalstatistical inference. Section 6.1 concerns for estimatingthe value of a population parameter. Section 6.2 presents ,which assess the evidence for a claim. Both types of inference are based onthe sampling distributions of statistics. That is, both report probabilitiesthat state .This kind of probability statement is characteristic of standard statisticalinference. Users of statistics must understand the nature of the reasoningemployed and the meaning of the probability statements that appear, forexample, on computer output for statistical procedures.

Because the methods of formal inference are based on sampling distribu-tions, they require a probability model for the data. Trustworthy probabilitymodels can arise in many ways, but the model is most secure and inferenceis most reliable when the data are produced by a properly randomizeddesign.

If this is not true,your conclusions may be open to challenge. Do not be overly impressedby the complex details of formal inference. This elaborate machinery can-not remedy basic flaws in producing the data such as voluntary responsesamples and uncontrolled experiments. Use the common sense developed inyour study of the first three chapters of this book, and proceed to detailedformal inference only when you are satisfied that the data deserve suchanalysis.

This chapter introduces the reasoning of statistical inference. We willdiscuss only a few specific techniques—for inference about the unknownmean of a population. Moreover, we will temporarily make an unrealisticassumption: that we know the standard deviation of the population. Laterchapters will present inference methods for use in most of the settings wemet in learning to explore data. There are libraries—both of books and ofcomputer software—full of more elaborate statistical techniques. Informeduse of any of these methods requires an understanding of the underlying

confidence intervalstests of significance

what would happen if we used the inference method many times

When you use statistical inference you are acting as if the data comefrom a random sample or a randomized experiment.

Suppose we show a new TV commercial and the present commercial to 20consumers each. Twelve of those who see the new ad declare an interest in buyingthe product, versus only 8 who watch the current version. Is the new commercialmore effective? Perhaps, but a difference this large or larger between the resultsin the two groups would occur about one time in five simply because of chancevariation. An effect that could so easily be just chance is not convincing.

Page 4: chap_06

6.1 Estimating with Confidence 363

6.1 Estimating with Confidence

COMMUNITY BANKS

Statistical confidence

population

� �

� �

� � ��

population

1

reasoning. A computer will do the arithmetic, but must still exercisejudgment based on understanding.

One way to characterize a collection of businesses is to determine the averageof some measure of size. Total assets is one commonly used measure. If thecollection of businesses is large, we generally take a sample and use theinformation gathered to make an inference about the entire collection. Weuse the term to refer to the entire collection of interest.

Community banks are banks with less than a billion dollars of assets. Thereare approximately 7500 such banks in the United States. In many studiesof the industry these banks are considered separately from banks that havemore than a billion dollars of assets. The latter banks are called “largeinstitutions.”

The Community Bankers Council of the American Bankers Association(ABA) conducts an annual survey of community banks. For the 110 banksthat make up the sample in a recent survey, the mean assets are 220(in millions of dollars). What can we say about , the mean assets of allcommunity banks?

The sample mean is the natural estimator of the unknown populationmean . We know that is an unbiased estimator of . More important,the law of large numbers says that the sample mean must approach thepopulation mean as the size of the sample grows. The value 220therefore appears to be a reasonable estimate of the mean assets for allcommunity banks. But how reliable is this estimate? A second sample wouldsurely not give 220 again. Unbiasedness says only that there is no systematictendency to underestimate or overestimate the truth. Could we plausibly geta sample mean of 250 or 200 on repeated samples? An estimate without anindication of its variability is of limited value.

Just as unbiasedness of an estimator concerns the center of its samplingdistribution, questions about variation are answered by looking at thespread. The central limit theorem tells us that if the entire population ofcommunity bank assets has mean and standard deviation , then inrepeated samples of size 110 the sample mean approximately followsthe ( / 110) distribution. Suppose that the true standard deviationis equal to the sample standard deviation 161. This is not realistic,

you

x

xx

x

xN ,

s

CA

SE6.1

Page 5: chap_06

Probability = 0.95

Samplingdistributionof

(unknown)µ + 30µ– 30µ

x

x

Introduction to Inference364 CHAPTER 6

FIGURE 6.1

In 95% of all samples, lies within 30 of . So also lieswithin 30 of in those samples.

xx

� �

� �

x�

� �

� �

� � � �

��

� �

although it will give reasonably accurate results for samples as large as110. In the next chapter we will learn how to proceed when is notknown. But for now, we are more interested in statistical reasoning thanin such details of our methods. In repeated sampling the sample mean isapproximately Normal, centered at the unknown population mean , withstandard deviation

16115 millions of dollars

110

Now we can talk about estimating . Consider this line of thought,which is illustrated by Figure 6.1:

The 68–95–99.7 rule says that the probability is about 0.95 that is within30 (two standard deviations of ) of the population mean assets .

To say that lies within 30 of is the same as saying that is within 30of .

So 95% of all samples will capture the true in the interval from 30to 30.

We have simply restated a fact about the sampling distribution of .

. Oursample gave 220. We say that we are that the unknownmean assets for all community banks lie between

30 220 30 190

and

30 220 30 250

x

xx

xx

xx

x Thelanguage of statistical inference uses this fact about what would happen inthe long run to express our confidence in the results of any one sample

x 95% confident

x

x

Page 6: chap_06

6.1 Estimating with Confidence 365

Confidence intervals

APPLY YOURKNOWLEDGE

margin of error

CONFIDENCE INTERVAL

level confidence interval

confidence level ,

6.1 Company invoices.

6.2

6.3

��

margin of error

Be sure that you understand the grounds for our confidence. There areonly two possibilities:

1. The interval between 190 and 250 contains the true .

2. Our SRS was one of the few samples for which is not within 30 of thetrue . Only 5% of all samples give such inaccurate results.

We cannot know whether our sample is one of the 95% for which theinterval 30 catches or one of the unlucky 5%. The statement thatwe are 95% confident that the unknown lies between 190 and 250 isshorthand for saying, “We arrived at these numbers by a process that givescorrect results 95% of the time.”

The interval of numbers between the values 30 is called afor . Like most confidence intervals we will meet, this one has

the form

estimate margin of error

The estimate ( in this case) is our guess for the value of the unknownparameter. The 30 shows how precise we believe our guessis, based on the variability of the estimate. This is a confidence intervalbecause it catches the unknown in 95% of all possible samples.

A for a parameter has two parts:

An interval calculated from the data, usually of the form

estimate margin of error

A which gives the probability that the intervalwill capture the true parameter value in repeated samples.

x

x

x

x

x

x 95% confidenceinterval

x

95%

The mean amount for all of the invoices for yourcompany last month is not known. Based on your past experience, you arewilling to assume that the standard deviation of invoice amounts is about$200. If you take a random sample of 100 invoices, what is the value of thestandard deviation for ?

In the setting of the previous exercise, the 68–95–99.7 rule says that theprobability is about 0.95 that is within of the population mean

. Fill in the blank.

In the setting of the previous two exercises, about 95% of all samples willcapture the true mean of all of the invoices in the interval plus or minus

. Fill in the blank.

C

C

APPLY YOURKNOWLEDGE

Page 7: chap_06

µ

Density curve of x

Introduction to Inference366 CHAPTER 6

FIGURE 6.2

Twenty-five samples from the same population gave these95% confidence intervals. In the long run, 95% of all samples give aninterval that covers .

APPLY YOURKNOWLEDGE

www.whfreeman.com/pbs

6.4 80% confidence intervals.

��

Figure 6.2 illustrates the behavior of 95% confidence intervals in repeatedsampling. The center of each interval is at and therefore varies from sampleto sample. The sampling distribution of appears at the top of the figure.The 95% confidence intervals 30 from 25 SRSs appear below. Thecenter of each interval is marked by a dot. The arrows on either sideof the dot span the confidence interval. All except one of the 25 intervalscover the true value of . In a very large number of samples, 95% ofthe confidence intervals would contain . You can choose the confidencelevel. Common practice is to choose 95%, but 90% and 99% are alsopopular.

The applet at animatesFigure 6.2 and allows you to choose among several levels of confidence.This interactive applet is an excellent way to grasp the idea of a confidenceinterval.

xx

xx

Confidence Intervals

The idea of an 80% confidence interval is thatthe interval captures the true parameter value in 80% of all samples. That’snot high enough confidence for practical use, but 80% hits and 20% misses

APPLY YOURKNOWLEDGE

Page 8: chap_06

Probability = CProbability = 21 – C

StandardNormal curve

–z* z*0

Probability = 21 – C

6.1 Estimating with Confidence 367

FIGURE 6.3

Confidence interval for a population mean

The area between the critical values and under thestandard Normal curve is .

z zC

�� �

� �

��

� � �

We use the sampling distribution of the sample mean to construct a levelconfidence interval for the mean of a population. We assume that the

data are an SRS of size . The sampling distribution is exactly ( / )when the population has the ( ) distribution. The central limit theoremsays that this same sampling distribution is approximately correct forlarge samples whenever the population mean and standard deviation areand .

Our construction of a 95% confidence interval for the mean assetsof community banks began by noting that any Normal distribution hasprobability about 0.95 within 2 standard deviations of its mean. Toconstruct a level confidence interval, we first catch the central areaunder a Normal curve. Because all Normal distributions are the same in thestandard scale, we can obtain everything we need from the standard Normalcurve. Figure 6.3 shows the relationship between the central area and the

Confidence Intervals

xC

n N , nN ,

C C

C

make it easy to see how a confidence interval behaves in repeated samplesfrom the same population.

(a) Set the confidence level in the applet to 80%. Click“Sample” to choose an SRS and calculate the confidence interval. Dothis 10 times to simulate 10 SRSs with their 10 confidence intervals.How many of the 10 intervals captured the true mean ? How manymissed?

(b) You see that we can’t predict whether the next sample will hit or miss.The confidence level, however, tells us what percent will hit in thelong run. Reset the applet and click “Sample 50” to get the confidenceintervals from 50 SRSs. How many hit? Keep clicking “Sample 50” andrecord the percent of hits among 100, 200, 300, 400, 500, 600, 700,800, and 1000 SRSs. Even 1000 samples is not truly “the long run,” butwe expect the percent of hits in 1000 samples to be fairly close to theconfidence level, 80%.

Page 9: chap_06

Introduction to Inference368 CHAPTER 6 �

critical values

CONFIDENCE INTERVAL FOR A POPULATION MEAN

level confidence interval

critical value

margin of error.

� �

� �� �

� �

�� �

�� �

� �

� �

� �

� �

� �

� �

� � �

�� �

critical value

points that mark off this area. Values of for many choices of appearin the row labeled in Table D at the back of the book. Here are the mostimportant entries from that part of the table:

1.645 1.960 2.576

90% 95% 99%

Values that mark off specified areas are called of thestandard Normal distribution. Notice that for 95% the table gives

1 960. This is slightly more precise than the value 2 based on the68–95–99.7 rule.

Any Normal curve has probability between the point standarddeviations below the mean and the point above the mean, as Fig-ure 6.3 reminds us. The sample mean has the Normal distribution withmean and standard deviation / . So there is probability that liesbetween

and

This is exactly the same as saying that the unknown population mean liesbetween

and

So, the probability that the interval

/

contains is . This is our confidence interval. The estimate of the unknownis , and the margin of error is / .

Choose an SRS of size from a population having unknown meanand known standard deviation . A for is

Here is the with area between and underthe standard Normal curve. The quantity

/

is the The interval is exact when the populationdistribution is Normal and is approximately correct when is largein other cases.

z z Cz

z

C

zC

z . z

C zz

xn C x

z zn n

x z x zn n

x z n

Cx z n

n

x zn

z C z z

z n

n

C

Page 10: chap_06

6.1 Estimating with Confidence 369

Banks’ loan-to-deposit ratioEXAMPLE 6.2

How confidence intervals behave

APPLY YOURKNOWLEDGE

6.5 Bank assets.

6.6

��

��

Case1

� �

� �

� �

In this example we used an estimate for based on a large sample.Sometimes we may know quite accurately from past experience withsimilar data. When we have few data, past experience may be better thanestimation from our sample. In general, however, we can’t act as if we knowthe population standard deviation . The next chapter presents methodsthat don’t assume that we know .

The margin of error / for estimating the mean of a Normal populationillustrates several important properties that are shared by all confidenceintervals in common use. In particular, a higher confidence level increasesand so increases the margin of error for intervals based on the same data.We would like high confidence and a small margin of error, but improvingone degrades the other. If a confidence interval’s margin of error is too large,there are only three ways to reduce it:

Use a lower level of confidence (smaller , hence smaller ).

Reduce .

Increase the sample size (larger ).

x . s .s

z .

.z . .

n

x z . .n

. , .

z n

z

C z

n

The ABA survey of community banks also asked about the loan-to-deposit ratio(LTDR), a bank’s total loans as a percent of its total deposits. The mean LTDR forthe 110 banks in the sample is 76 7 and the standard deviation is 12 3.This sample is sufficiently large for us to use as the population here. Give a95% confidence interval for the mean LTDR for community banks.

For 95% confidence, we see from Table D that 1 96. The margin of error is

12 31 96 2 3

110

A 95% confidence interval for is therefore

76 7 2 3

(74 4 79 0)

We are 95% confident that the mean LTDR is between 74.4% and 79.0%.

The 110 banks in the ABA survey had mean assetsof 220 million dollars. The standard deviation of their assets was161. Assume that the sample standard deviation can be used inplace of the population standard deviation. Give a 99% confidence intervalfor , the mean assets for all community banks.

In the setting of the previous exercise, would the margin of errorfor 90% confidence be larger or smaller? Verify your answer byperforming the calculations.

� �

APPLY YOURKNOWLEDGE

CA

SE6

.1C

ASE

6.1

CA

SE6.1

Page 11: chap_06

n = 25

n = 110

72 74 76 78 80 82

Introduction to Inference370 CHAPTER 6

FIGURE 6.4

How confidence intervals behaveEXAMPLE 6.3

Confidence intervals for 110 and 25, forExamples 6.2 and 6.3.

n n

� �

� �

� �

� �

� �

� �

The common choices of confidence level are 99%, 95%, and 90%. Thecritical values for these levels are 2.576, 1.960, and 1.645, decreasing asthe confidence level drops. Figure 6.3 makes it clear why is smaller forlower confidence (smaller ). If and are unchanged, settling for lowerconfidence will reduce the margin of error.

The standard deviation measures variation in the population. Thinkof the variation among individuals in the population as noise that obscuresthe average value . It is harder to pin down the mean of a highlyvariable population. This is why the margin of error of a confidence intervalincreases with . Sometimes we can reduce by carefully controlling themeasurement process or by restricting our attention to only part of a largepopulation.

Increasing the sample size also reduces the margin of error. Supposewe want to cut the margin of error in half. The square root in the formulaimplies that we must have four times as many observations, not just twiceas many.

x

.z . .

n

x z . .n

. , .

z .

.z .

n

.

zz

C n

n

Suppose there were only 25 banks in the survey of community banks, and thatand are unchanged from Example 6.2. The margin of error increases from 2.3 to

12 31 96 4 8

25

A 95% confidence interval for is therefore

76 7 4 8

(71 9 81 5)

Figure 6.4 illustrates how the margin of error increases for the smaller sample.Suppose next that we demand 99% confidence for the mean LTDR in

Example 6.2, rather than 95%. Table D tells us that for 99% confidence,2 576. The margin of error increases from 2.3 to

12 32 576

110

3 0

� �

CA

SE6.1

Page 12: chap_06

99%confidence

95%confidence

72 74 76 78 80 82

6.1 Estimating with Confidence 371

FIGURE 6.5

Choosing the sample size

Confidence intervals for 95% and 99% confidence from thesame data, for Example 6.3.

SAMPLE SIZE FOR DESIRED MARGIN OF ERROR

� �

� �

2

A wise user of statistics never plans data collection without at the sametime planning the inference. You can arrange to have both high confidenceand a small margin of error. The margin of error of the confidence interval

/ for a Normal mean is

To obtain a desired margin of error , just set this expression equal to ,substitute the critical value for your desired confidence level, and solvefor the sample size . Here is the result.

The confidence interval for a population mean will have a specifiedmargin of error when the sample size is

In practice, observations cost time and money. The sample size youcalculate from this formula may turn out to be too expensive. Note that itis the size and not the population size that determines the marginof error. The size of the (as long as it is much larger than thesample) does not influence the margin of error.

x . .

. , .

. .

x z n

zn

m mz

n

m

zn

m

samplepopulation

The 99% confidence interval is therefore

margin of error 76 7 3 0

(73 7 79 7)

Requiring 99% confidence rather than 95% confidence has increased the margin oferror from 2 3 to 3 0. Figure 6.5 compares the two intervals.

� �

Page 13: chap_06

Introduction to Inference372 CHAPTER 6

How many banks should we survey?EXAMPLE 6.4

Some cautions

APPLY YOURKNOWLEDGE

6.7 Starting salaries.

6.8

� � �

� �� �

� � �

2 2

Always round to the next higher whole number when using theformula for the sample size . In practice we often calculate the margins oferror corresponding to a range of values of . We then decide what marginof error we can afford.

We have already seen that small margins of error and high confidence canrequire large numbers of observations. You should also be keenly awarethat . Ifthe government required statistical procedures to carry warning labels likethose on drugs, most inference methods would have long labels indeed. Ourformula / for estimating a Normal mean comes with the followinglist of warnings for the user:

The data must be an SRS from the population. We are completely safe ifwe actually did a randomization and drew an SRS. The ABA survey thatlies behind Examples 6.1 to 6.4 is based on a random sample of banks.We are not in great danger if the data can plausibly be thought of asindependent observations from a population.

n

z ..

z . .n .

m .

upn

n

any formula for inference is correct only in specific circumstances

x z n

In Example 6.2 we found that the margin of error was 2.3 for estimating the meanLTDR of community banks with 110 and 95% confidence. We are willing tosettle for a margin of error of 3.0 when we do the survey next year. How manybanks should we survey?

For 95% confidence, Table D gives 1 960. We will assume that thestandard deviation next year will be approximately the same as the 12 3 thatwe obtained this year. For margin of error of 3.0 we have

1 96 12 364 6

3 0

Because 64 observations will give a slightly wider interval than desired and 65observations a slightly narrower interval, we will survey 65 banks. With thischoice, our estimate of mean assets will be within 3.0 of the true value with 95%confidence.

You are planning a survey of starting salaries for recentbusiness major graduates from your college. From a pilot study you estimatethat the standard deviation is about $8000. What sample size do you needto have a margin of error equal to $500 with 95% confidence?

Suppose that in the setting of the previous exercise you are willing to settlefor a margin of error of $1000. Will the required sample size be larger orsmaller? Verify your answer by performing the calculations.

APPLY YOURKNOWLEDGE

CA

SE6.1

Page 14: chap_06

6.1 Estimating with Confidence 373

��

��

The formula is not correct for probability sampling designs more complexthan an SRS. Correct methods for other designs are available. We willnot discuss confidence intervals based on multistage or stratified samples.If you plan such samples, be sure that you (or your statistical consultant)know how to carry out the inference you desire.

There is no correct method for inference from data haphazardly collectedwith bias of unknown size. Fancy formulas cannot rescue badly produceddata.

Because is not resistant, outliers can have a large effect on the confidenceinterval. You should search for outliers and try to correct them or justifytheir removal before computing the interval. If the outliers cannot beremoved, ask your statistical consultant about procedures that are notsensitive to outliers.

If the sample size is small and the population is not Normal, the trueconfidence level will be different from the value used in computingthe interval. Examine your data carefully for skewness and other signs ofnon-Normality. The interval relies only on the distribution of , whicheven for quite small sample sizes is much closer to Normal than that ofthe individual observations. When 15, the confidence level is notgreatly disturbed by non-Normal populations unless extreme outliers orquite strong skewness is present. We will discuss this issue in more detailin the next chapter.

You must know the standard deviation of the population. This unreal-istic requirement renders the interval / of little use in statisticalpractice. We will learn in the next chapter what to do when is unknown.If, however, the sample is large, the sample standard deviation will beclose to the unknown . The formula / is then an approximateconfidence interval for .

The most important caution concerning confidence intervals is a conse-quence of the first of these warnings.

The margin of error is ob-tained from the sampling distribution and indicates how much error canbe expected because of chance variation in randomized data production.Practical difficulties such as undercoverage and nonresponse in a samplesurvey cause additional errors. These errors can be larger than the randomsampling error, particularly when the sample size is large. Remember thisunpleasant fact when reading the results of an opinion poll. The practicalconduct of a survey influences the trustworthiness of its results in ways thatare not included in the announced margin of error.

Every inference procedure that we will meet has its own list of warnings.Because many of the warnings are similar to those above, we will not printthe full warning label each time. If we use the mathematics of probability,it is easy to state conditions under which a method of inference is exactly

x

C

x

n

x z n

sx z s n

The margin of error in a confidenceinterval covers only random sampling errors.

Page 15: chap_06

Introduction to Inference374 CHAPTER 6

EYOND THE ASICS: HE OOTSTRAP

APPLY YOURKNOWLEDGE

B B T B

bootstrap

resample.

6.9 Internet users.

� � � �

��

��

bootstrap

resample

3

2

correct. These conditions are fully met in practice. For example, nopopulation is exactly Normal. Deciding when a statistical procedure shouldbe used in practice often requires judgment assisted by exploratory analysisof the data. Mathematical facts are therefore only a part of statistics. Thedifference between statistics and mathematics can be stated thus: mathemat-ical theorems are true; statistical methods are often effective when used withskill.

Finally, you should understand what statistical confidence does notsay. We are 95% confident that the mean assets of community banks inCase 6.1 lie between $190 million and $250 million. This says that thesenumbers were calculated by a method that gives correct results in 95% ofall possible samples. It does say that the probability is 95% that the truemean falls between 190 and 250. No randomness remains after we drawa particular sample and get from it a particular interval. The true meaneither is or is not between 190 and 250. Probability in its interpretationas a description of random behavior makes no sense in this situation. Theprobability calculations of standard statistical inference describe how oftenthe gives correct answers.

Confidence intervals are based on sampling distributions. The intervalfollows from the fact that the sampling distribution of is

( ) when the data are an SRS from a ( ) population. The cen-tral limit theorem says that this sampling distribution is also approximatelyright for large samples from non-Normal populations.

What if the population is clearly non-Normal and we have only a smallsample? Then we do not know what the sampling distribution of lookslike. The is a procedure for approximating sampling distributionswhen theory cannot tell us their shape.

The basic idea is to act as if our sample were the population. We takemany samples from it. Each of these is called a For each resample,we calculate the mean . We get different results from different resamplesbecause we sample . That is, an observation in the originalsample can appear more than once in a resample.

Detailed surveys were sent to more than 13,000 organizations on the Inter-net; 1,468 usable responses were received. According to Mr. Quarterman,the margin of error is 2.8 percent, with a confidence level of 95 percent.

response rate

never

not

method

x z s n xN , n N ,

x

xwith replacement

A survey of users of the Internet found that males outnum-bered females by nearly 2 to 1. This was a surprise, because earlier surveyshad put the ratio of men to women closer to 9 to 1. Later in the article wefind this information:

(a) What was the for this survey? (The response rate is thepercent of the planned sample that responded.)

(b) Do you think that the small margin of error is a good measure of theaccuracy of the survey’s results? Explain your answer.

APPLY YOURKNOWLEDGE

Page 16: chap_06

6.1 Estimating with Confidence 375

Check the interval with the bootstrap

ECTION UMMARY

EXAMPLE 6.5

S 6.1 Sconfidence interval

The community bank assets data described in Case 6.1 are somewhatskewed toward high values. The mean $220 million is larger than themedian, $190 million. There are two large banks with assets of $908and $804 million that could be considered outliers. Because we have noreason to suspect that these banks are qualitatively different from the othercommunity banks in the sample, we are reluctant to discard them. Rather,we will reexamine our inference with a method that is not sensitive to theNormality assumption.

If we had 1000 SRSs from the population of banks, the distributionof the 1000 values of would be close to the sampling distribution of .We have only one sample. The bootstrap idea says: take 1000 resamplesof size 110 (with replacement) from our one sample of 110 banks. Foreach resample, compute the mean. Treat the distribution of ’s from the1000 resamples as if it were the sampling distribution. Because the sampleis like the population, taking resamples from the sample is similar to takingsamples from the population.

The bootstrap is practical only when you can use a computer to take1000 or more resamples quickly. It is an example of how fast and cheapcomputing has changed the way we do statistics.

If the bootstrap method applied to this problem gave very differentresults, we would need to reconsider the analysis. Obtaining similar resultsfrom different methods gives us confidence that the answers we seek arecoming from the data, not from the particular method that we used.

The purpose of a is to estimate an unknownparameter with an indication of how accurate the estimate is and of howconfident we are that the result is correct.

Any confidence interval has two parts: an interval computed from thedata and a confidence level. The interval often has the form

estimate margin of error

,x

,

x x

x

In the discussion of Case 6.1 we found a 95% confidence interval for the meanassets of community banks using the traditional statistical procedure based on themiddle 95% of a Normal distribution. The interval was (190 250). Let’s checkto see if we made a serious error. The middle 95% of the 1000 ’s from 1000resamples runs from 193 to 249. Try it again with 1000 new resamples: we getthe interval (191 252). The two bootstrap estimates are close to each other andalso close to the standard interval that assumes Normality. The standard interval isreasonably accurate for these data.

CA

SE6.1

Page 17: chap_06

Introduction to Inference376 CHAPTER 6

ECTION XERCISES

S 6.1 E

confidence level

critical value

margin of error

6.10 Apartment rental rates.

��

� �

� �

2

The states the probability that the method will give acorrect answer. That is, if you use 95% confidence intervals often, in thelong run 95% of your intervals will contain the true parameter value.When you apply the method once, you do not know if your interval gave acorrect value (this happens 95% of the time) or not (this happens 5% ofthe time).

A level confidence interval for the mean of a Normal populationwith known standard deviation , based on an SRS of size , is given by

Here is a obtained from the bottom row in Table D. Theprobability is that a standard Normal random variable takes a valuebetween and .

The in the confidence interval for is

Other things being equal, the margin of error of a confidence intervaldecreases as

the confidence level decreases,

the population standard deviation decreases, or

the sample size increases.

The sample size required to obtain a confidence interval of specifiedmargin of error for a Normal mean is

where is the critical value for the desired level of confidence.

A specific confidence interval formula is correct only under specificconditions. The most important conditions concern the method usedto produce the data. Other factors such as the form of the populationdistribution may also be important.

Cn

x zn

zC

z z

zn

C

n

m

zn

m

z

You want to rent an unfurnished one-bedroomapartment for next semester. The mean monthly rent for a random sampleof 10 apartments advertised in the local newspaper is $540. Assume thatthe standard deviation is $80. Find a 95% confidence interval for the mean

Page 18: chap_06

6.1 Estimating with Confidence 377

6.11 Study habits.

6.12 Clothing for runners.

6.13 Pounds versus kilograms.

6.14 99% versus 95% confidence interval.

6.15 Hotel managers.

x

� .

x

monthly rent for unfurnished one-bedroom apartments available for rent inthis community.

A questionnaire about study habits was given to a randomsample of students taking a large introductory statistics class. The sample of25 students reported that they spent an average of 110 minutes per weekstudying statistics. Assume that the standard deviation is 40 minutes.

(a) Give a 95% confidence interval for the mean time spent studyingstatistics by students in this class.

(b) Is it true that 95% of the students in the class have weekly study timesthat lie in the interval you found in part (a)? Explain your answer.

Your company sells exercise clothing and equipmenton the Internet. To design the clothing, you collect data on the physicalcharacteristics of your different types of customers. Here are the weights (inkilograms) for a sample of 24 male runners. Assume that these runners can beviewed as a random sample of your potential male customers. Suppose alsothat the standard deviation of the population is known to be 4 5 kg.

67.8 61.9 63.0 53.1 62.3 59.7 55.4 58.9 60.9 69.2 63.7 68.364.7 65.6 56.0 57.8 66.0 62.9 53.6 65.0 55.8 60.4 69.3 61.7

(a) What is , the standard deviation of ?

(b) Give a 95% confidence interval for , the mean of the population fromwhich the sample is drawn.

(c) Will the interval contain the weights of approximately 95% of similarrunners? Explain your answer.

Suppose that the weights of the runners in Exer-cise 6.12 were recorded in pounds rather than kilograms. Use your answersto Exercise 6.12, and the fact that 1 kilogram equals 2.2 pounds, to answerthese questions.

(a) What is the mean weight of these runners?

(b) What is the standard deviation of the mean weight?

(c) Give a 95% confidence interval for the mean weight of the populationof runners that these runners represent.

Find a 99% confidence interval forthe mean weight of the population of male runners in Exercise 6.12. Is the99% confidence interval wider or narrower than the 95% interval found inExercise 6.12? Explain in plain language why this is true.

A study of the career paths of hotel general managers sentquestionnaires to an SRS of 160 hotels belonging to major U.S. hotel chains.There were 114 responses. The average time these 114 general managers hadspent with their current company was 11.8 years. Give a 99% confidenceinterval for the mean number of years general managers of major-chainhotels have spent with their current company. (Take it as known thatthe standard deviation of time with the company for all general managers is3.2 years.)

Page 19: chap_06

Introduction to Inference378 CHAPTER 6 �

6.16 Supermarket shoppers.

6.17 Bank workers.

6.18 Hotel managers.

6.19 Supermarket shoppers.

6.20 Bank employees.

6.21 Business mergers.

4

x

For results based on this sample, one can say with 95 percent confidencethat the maximum error attributable to sampling and other random effectsis plus or minus 3 percentage points. In addition to sampling error, questionwording and practical difficulties in conducting surveys can introduce erroror bias into the findings of public opinion polls.

A marketing consultant observed 50 consecutiveshoppers at a supermarket. One variable of interest was how much eachshopper spent in the store. Here are the data (in dollars), arranged inincreasing order:

3.11 8.88 9.26 10.81 12.69 13.78 15.23 15.6217.00 17.39 18.36 18.43 19.27 19.50 19.54 20.1620.59 22.22 23.04 24.47 24.58 25.13 26.24 26.2627.65 28.06 28.08 28.38 32.03 34.98 36.37 38.6439.16 41.02 42.97 44.08 44.67 45.40 46.69 48.6550.39 52.75 54.80 59.07 61.22 70.32 82.70 85.7686.37 93.34

Assume that the standard deviation is $22.00. Find a 95% confidenceinterval for the mean amount spent by shoppers in similar circumstances.

We examined the wages of a random sample of NationalBank black female hourly workers in Example 1.9 on page 32. Here are thedata (in dollars per year):

16015 17516 17274 16555 2078819312 17124 18405 19090 1264117813 18206 19338 15953 16904

Find a 95% confidence interval for the mean earnings of all black femalehourly workers at National Bank. Use $1900 for the standard deviation.

How large a sample of the hotel managers in Exer-cise 6.15 would be needed to estimate the mean within 1 year with99% confidence?

Suppose that you want to perform a study similar tothe survey of supermarket shoppers described in Exercise 6.16. If you wantthe margin of error to be about $4.00, how many shoppers would you needin your sample? (Use $22.00 for the standard deviation in your calculations.)

In Exercise 6.17 we examined the earnings of a randomsample of 15 black women who were hourly workers at National Bank. Ifwe wanted to estimate the mean with an interval of the form 500, howmany observations are required?

A Gallup Poll asked 1004 adults about mergers betweencompanies. One question was “Do you think the result is usually good forthe economy or bad for the economy?” Forty-three percent of the samplethought mergers were good for the economy. The Gallup press release added:

The Gallup Poll uses a complex multistage sample design, but the samplepercent has approximately a Normal sampling distribution.

Page 20: chap_06

6.1 Estimating with Confidence 379

6.22 Median household income.

6.23 Household income by state.

6.24 More than one confidence interval.

estimate

estimate

estimate

z

z

all

(a) The announced poll result was 43% 3%. Can we be certain that thetrue population percent falls in this interval?

(b) Explain to someone who knows no statistics what the announced result43% 3% means.

(c) This confidence interval has the same form we have met earlier:

estimate

What is the standard deviation of the estimated percent?

(d) Does the announced margin of error include errors due to practicalproblems such as undercoverage and nonresponse?

When the statistic that estimates an unknownparameter has a Normal distribution, a confidence interval for the parameterhas the form

estimate

In a complex sample survey design, the appropriate unbiased estimate of thepopulation mean and the standard deviation of this estimate may requireelaborate computations. But when the estimate is known to have a Normaldistribution and its standard deviation is given, we can calculate a confidenceinterval for from complex sample designs without knowing the formulasthat led to the numbers given.

A report based on the Current Population Survey estimates the 1999median annual earnings of households as $40,816 and also estimates thatthe standard deviation of this estimate is $191. The Current PopulationSurvey uses an elaborate multistage sampling design to select a sample ofabout 50,000 households. The sampling distribution of the estimated medianincome is approximately Normal. Give a 95% confidence interval for the1999 median annual earnings of households.

The previous problem reports data on themedian household income for the entire United States. In a detailed reportbased on the same sample survey, you find that the estimated median incomefor four-person families in Michigan is $65,467. Is the margin of error forthis estimate with 95% confidence greater or less than the margin of errorfor the national median? Why?

As we prepare to take a sample andcompute a 95% confidence interval, we know that the probability that theinterval we compute will cover the parameter is 0.95. That’s the meaning of95% confidence. If we use several such intervals, however, our confidencethat give correct results is less than 95%.

Suppose we are interested in confidence intervals for the median house-hold incomes for three states. We compute a 95% interval for each of thethree, based on independent samples in the three states.

(a) What is the probability that all three intervals cover the true medianincomes? This probability (expressed as a percent) is our overall confi-dence level for the three simultaneous statements.

(b) What is the probability that at least two of the three intervals cover thetrue median incomes?

Page 21: chap_06

Introduction to Inference380 CHAPTER 6 �

6.2 Tests of Significance

The reasoning of significance tests

6.25 An election poll.

6.26 Manager trainee wages.

6.27 Talk show poll.

6.28 An outlier.

Confidence intervals are one of the two most common types of formalstatistical inference. They are appropriate when our goal is to estimate apopulation parameter. The second common type of inference is directed ata different goal: to assess the evidence provided by the data in favor of someclaim about the population.

A significance test is a formal procedure for comparing observed data witha hypothesis whose truth we want to assess. The hypothesis is a statementabout the parameters in a population or model. The results of a testare expressed in terms of a probability that measures how well the data

xs

s

A newspaper headline describing a poll of registered voterstaken two weeks before a recent election reads “Ringel leads with 52%.”The accompanying article says that the margin of error is 3% with 95%confidence.

(a) Explain in plain language to someone who knows no statistics what“95% confidence” means.

(b) The poll shows Ringel leading. But the newspaper article says that theelection was too close to call. Explain why.

A newspaper ad for a manager trainee positioncontained the statement “Our manager trainees have a first-year earningsaverage of $20,000 to $24,000.” Do you think that the ad is describing aconfidence interval? Explain your answer.

A radio talk show invites listeners to enter a dispute abouta proposed pay increase for city council members. “What yearly pay doyou think council members should get? Call us with your number.” In all,958 people call. The mean pay they suggest is $9740 per year, andthe standard deviation of the responses is $1125. For a large samplesuch as this, is very close to the unknown population . The stationcalculates the 95% confidence interval for the mean pay that all citizenswould propose for council members to be $9669 to $9811. Is this resulttrustworthy? Explain your answer.

Exercise 6.12 gives the weights of a sample of 24 male runners.Suppose that the sample actually contained 25 runners. The extra runnerclaimed to weigh 92.3 kg.

(a) Compute the 95% confidence interval for the mean with this observationincluded.

(b) Would you report the interval you calculated in part (a) of this questionor the interval you calculated in Exercise 6.12? Explain the reasons foryour choice.

Page 22: chap_06

6.2 Tests of Significance 381

Banks’ net incomeEXAMPLE 6.6

��

� ��

��

and the hypothesis agree. We use the following example to illustrate theseideas.

What are the key steps in Example 6.6? We want to know if thesample gives good evidence that the mean income for all community bankshas changed in the past year. Suppose that there was no change, that is,that 0. Compare the sample outcome 8 1 with the no-changepopulation mean 0. The comparison takes the form of a probability: If

0, then a sample of size 110 will have a mean at least as far from 0 as8 1 only with probability 0.001.

The fact that the calculated probability is very small leads us to concludethat the average percent change in income is not in fact zero. Here’s why. Ifthe true mean is 0, we would see a sample mean as far away as 8.1%only once per 1000 samples. So there are only two possibilities:

1. 0 and we have observed something very unusual, or

2. is not zero but has some other value that makes the observed datamore probable.

We calculated a probability taking the first of these choices as true. Thatprobability guides our final choice. If the probability is very small, the datadon’t fit the first possibility and we conclude that the mean is not in factzero. Here is an example in which the data lead to a different conclusion.

x .s .

x

sampleall banks

What is the probability of observing a sample mean at least asfar from zero as 8.1%

x .

x .

x .

x

The community bank survey described in Case 6.1 also asked about net incomeand reported the percent change in net income between the first half of lastyear and the first half of this year. As you might expect, there is considerablebank-to-bank variation in this percent. Some banks report an increase whileothers report a decrease. The mean change for the 110 banks in the sample is

8 1%. Because the sample size is large, we are willing to use the samplestandard deviation 26 4% as if it were the population standard deviation .The large sample size also makes it reasonable to assume that is approximatelyNormal.

Is the 8.1% mean increase in a good evidence that the net income forhas changed? After all, the sample result might happen just by chance

even if the true mean change for all banks is 0%. To answer this question, weask another: Suppose that the truth about the population is that 0%. Thisis our hypothesis.

? The answer is 0.001. Because this probability is so small,we see that the sample mean 8 1 is incompatible with a population mean of

0. We conclude that the income of community banks has changed sincelast year.

CA

SE6.1

Page 23: chap_06

–10 –8 –6 –4 –2 0 2 4 6 8 10Percent x = 8.1x = 3.5

= 0µ

Sampling distributionof x when = 0and n = 110

µ

Introduction to Inference382 CHAPTER 6

FIGURE 6.6

Is this percent change different from zero?EXAMPLE 6.7

Stating hypotheses

The mean change in net assets for a sample of 110 bankswill have this sampling distribution if the mean for the population ofall banks is 0. A sample mean 3 5% could easily happenby chance. A sample mean 8 1% is so far out on the curve that itwould rarely happen just by chance.

x .x .

� ��

� � �

�� �

� �

� �

The probabilities in Examples 6.6 and 6.7 measure the compatibility of theoutcomes 8 1 and 3 5 with the hypothesis that 0. Figure 6.6compares the two results graphically. The Normal curve centered at zerois the sampling distribution of when 0. You can see that we arenot particularly surprised to observe 3 5, but 8 1 is an extremeoutcome. That’s the core reasoning of statistical tests:

. Now we proceed to the formal details of tests.

In Example 6.6, we asked whether the bank survey data are plausible if, infact, the true percent change in net income for all banks ( ) is zero. Thatis, we ask if the data provide evidence the claim that the populationmean is zero. The first step in a test of significance is to state a claim that wewill try to find evidence against.

x . .

n.

x .

x . x .

xx . x .

A sample outcome thatwould be extreme if a hypothesis were true is evidence that the hypothesis isnot true

against

Suppose that next year the percent change in net income for a sample of 110 banksis 3 5%. (We assume that the standard deviation is the same, 26 4%.)This sample mean is closer to the value 0 corresponding to no mean change inincome. What is the probability that the mean of a sample of size 110 froma Normal population with mean 0 and standard deviation 26 4 is asfar away or farther away from zero as 3 5? The answer is 0.16. A sampleresult this far from zero would happen just by chance in 16% of samples from apopulation having true mean zero. An outcome that could so easily happen just bychance is not good evidence that the population mean is different from zero.

��

� �

CA

SE6.1

Page 24: chap_06

6.2 Tests of Significance 383

Have we reduced processing time?EXAMPLE 6.8

NULL HYPOTHESIS

null hypothesis.

alternative hypothesis

one-sided two-sided.

a

a

a

a

a

a

a

a

a

a

alternativehypothesis

one-sided andtwo-sided

alternatives

0

0

0

0

0

0

�0

The statement being tested in a test of significance is called theThe test of significance is designed to assess

the strength of the evidence against the null hypothesis. Usually thenull hypothesis is a statement of “no effect” or “no difference.” Weabbreviate “null hypothesis” as .

A null hypothesis is a statement about a population, expressed in termsof some parameter or parameters. The null hypothesis for Example 6.6 is

: 0

It is convenient also to give a name to the statement we hope orsuspect is true instead of . This is called the and isabbreviated as . In Example 6.6, the alternative hypothesis states that thepercent change in net income is not zero. We write this as

: 0

. For this reason, we always state and in terms of populationparameters.

Because expresses the effect that we hope to find evidence , weoften begin with and then set up as the statement that the hoped-foreffect is not present. Stating is not always straightforward. It is notalways clear, in particular, whether should be or

The alternative : 0 in the bank net income example is two-sided.We simply asked if income had changed. In any given year, income mayincrease or decrease, so we include both possibilities in the alternativehypothesis. Here is a setting in which a one-sided alternative is appropriate.

The alternative hypothesis should express the hopes or suspicions wehad in mind when we decided to collect the data. It is cheating to first lookat the data and then frame to fit what the data show. If you do not havea specific direction firmly in mind in advance, use a two-sided alternative.

H .H .

H

H

HH

H

Hypotheses always refer to some population or model, not to a particularoutcome H H

H forH H

HH

H

H

Your company hopes to reduce the mean time required to process customerorders. At present, this mean is 3.8 days. You study the process and eliminate someunnecessary steps. Did you succeed in decreasing the average process time? Youhope to show that the mean is now less than 3.8 days, so the alternative hypothesisis one-sided, : 3 8. The null hypothesis is as usual the “no-change” value,

: 3 8 days.

H

Page 25: chap_06

Introduction to Inference384 CHAPTER 6 �

Test statistics

APPLY YOURKNOWLEDGE

6.29 Customer feedback.

6.30 DXA scanners.

a

a

a

0

0

0

0

0

0

0

In fact, some users of statistics argue that we should work with thetwo-sided alternative.

The choice of the hypotheses in Example 6.8 as

: 3 8

: 3 8

deserves a final comment. We do not expect that elimination of steps in theordering process would actually increase the process time. However, we canallow for an increase by including this case in the null hypothesis. Then wewould write

: 3 8

: 3 8

This statement is logically satisfying because the hypotheses account for allpossible values of . However, only the parameter value in that is closestto influences the form of the test in all common significance-testingsituations. We will therefore take to be the simpler statement that theparameter has a specific value, in this case : 3 8.

We will learn the form of significance tests in a number of commonsituations. Here are some principles that apply to most tests and that help inunderstanding the form of tests:

The test is based on a statistic that estimates the parameter appearingin the hypotheses. Usually this is the same estimate we would use in aconfidence interval for the parameter. When is true, we expect theestimate to take a value near the parameter value specified by .

x

.x

always

H .

H .

H .

H .

HH

HH .

HH

Feedback from your customers shows that many thinkit takes too long to fill out the online order form for your products. Youredesign the form and plan a survey of customers to determine whetheror not they think that the new form is actually an improvement. Sampledcustomers will respond using a five-point scale: 2 if the new form takesmuch less time than the old form; 1 if the new form takes a little lesstime; 0 if the new form takes about the same time; 1 if the new formtakes a little more time; and 2 if the new form takes much more time.The mean response from the sample is , and the mean response for all ofyour customers is . State null and alternative hypotheses that provide aframework for examining whether or not the new form is an improvement.

A dual X-ray absorptiometry (DXA) scanner is used tomeasure bone mineral density for people who may be at risk for osteoporosis.Customers want assurance that your company’s latest-model DXA scanneris accurate. You therefore supply an object called a “phantom” that hasknown mineral density 1 4 grams per square centimeter. A user scansthe phantom 10 times and compares the sample mean reading with thetheoretical mean using a significance test. State the null and alternativehypotheses for this test.

APPLY YOURKNOWLEDGE

Page 26: chap_06

6.2 Tests of Significance 385

Banks’ income: the hypotheses

Banks’ income: the test statistic

EXAMPLE 6.9

EXAMPLE 6.10

test statistic

a

a

� �

test statistic

0

0

0

0

� �

� �

0

0

Values of the estimate far from the parameter value specified bygive evidence against . The alternative hypothesis determines whichdirections count against .

A measures compatibility between the null hypothesis andthe data. Many test statistics can be thought of as a distance between asample estimate of a parameter and the value of the parameter specified bythe null hypothesis.

The income changes for individual banks do not look Normal, but thecentral limit theorem assures us that (and therefore ) is approximatelyNormal because the sample size is large. So if the null hypothesis : 0is true, the test statistic is a random variable that has approximately thestandard Normal distribution (0 1). To move from the test statistic to aprobability, we must do Normal probability calculations. Review Sections1.3 and 4.4 if you need to refresh your skills.

H

H

x Hx

H x .x

xz

n

z

.z .

.

z .

HH

H

x zH

zN , z

For Example 6.6, the hypotheses are stated in terms of the mean change in netincome for all community banks:

: 0

: 0

The estimate of is the sample mean . Because is two-sided, large positiveand negative values of (large increases and decreases of net income in the sample)count as evidence against the null hypothesis.

For Example 6.6, the null hypothesis is : 0, and a sample gave 8 1.The test statistic for this problem is the standardized version of :

/

This statistic is the distance between the sample mean and the hypothesizedpopulation mean in the standard scale of -scores. In this example,

8 1 03 22

26 4/ 110

Even without a formal probability calculation, we know that standard score3 22 lies far out on a Normal curve.

CA

SE6.1

CA

SE6.1

Page 27: chap_06

Introduction to Inference386 CHAPTER 6

Banks’ income: the -valueEXAMPLE 6.11

-valuesP

P

-VALUE

-value

a

a

��

a

� ��

0

0

0

0

0

0

0

� �

0

0

A test of significance assesses the evidence against the null hypothesis andprovides a numerical summary of this evidence in terms of a probability.The idea is that “surprising” outcomes are evidence against . A surprisingoutcome is one that is far from what we would expect if were true.In many cases, including our bank income example, the standard scaleof -scores is one way to measure “far from what we would expect.” InExample 6.10, the standardized distance of the sample outcome from thehypothesized population mean 0 was 3 22. This suggests thatthe sample is not compatible with the null hypothesis : 0. In fact,the Supreme Court of the United States has said that “two or three standarddeviations” ( 2 or 3) is its criterion for rejecting , and this is thecriterion used in most applications involving the law.

Because not all test statistics produce -scores, we translate the values oftest statistics into a common language, the language of probability. A testof significance finds the probability of getting an outcome

. “Extreme” means “farfrom what we would expect if were true.” The direction or directionsthat count as “far from what we would expect” are determined by thealternative hypothesis .

The probability, computed assuming that is true, that the teststatistic would take a value as extreme or more extreme than thatactually observed is called the of the test. The smaller the

-value, the stronger the evidence against provided by the data.

To calculate the -value, we must use the sampling distribution of thetest statistic. In this chapter, we need only the standard Normal distributionfor the test statistic .

n.

x .H H

.z .

.

zz in either direction

H H P

P z . P z .

HH

zx

z .H

z H

z

as extreme ormore extreme than the actually observed outcome

H

H

H

P H

P

z

In Example 6.6 the observations are an SRS of size 110 from a populationof community banks with 26 4. The observed average percent change in netincome is 8 1. In Example 6.10, we found that the test statistic for testing

: 0 versus : 0 is

8 1 03 22

26 4/ 110

If the null hypothesis is true, we expect to take a value not too far from 0.Because the alternative is two-sided, values of far from 0 countas evidence against and in favor of . So the -value is

( 3 22) ( 3 22)

P

P

� � �

� �CA

SE6.1

Page 28: chap_06

P = 0.0012

–3.22 0z

3.22

6.2 Tests of Significance 387

FIGURE 6.7 The -value for Example 6.11. Thetwo-sided -value is the probability (when is true)that takes a value at least as far from 0 as the actuallyobserved value.

0

PP H

x

APPLY YOURKNOWLEDGE

6.31 Spending on housing.

a

� �

� �

0

0

H z N ,P

P Z . . .

P Z . .

P P Z . .

H

H

.

x .z

zP

If is true, then has approximately the standard Normal (0 1) distribution.Figure 6.7 shows the -value as a very small area under the standard Normalcurve.

From Table A, we find

( 3 22) 1 0 9994 0 0006

Similarly, from Table A or using the symmetry of the Normal distribution,

( 3 22) 0 0006

That is,

2 ( 3 22) 0 0012

In Example 6.6 we rounded this value to 0.001.

The Census Bureau reports that households spendan average of 31% of their total spending on housing. A homebuildersassociation in Cleveland wonders if the national finding applies in their area.They interview a sample of 40 households in the Cleveland metropolitanarea to learn what percent of their spending goes toward housing. Taketo be the mean percent of spending devoted to housing among all Clevelandhouseholds. We want to test the hypotheses

: 31%

: 31%

The population standard deviation is 9 6%.

(a) The study finds 28 6% for the 40 households in the sample. Whatis the value of the test statistic ? Sketch a standard Normal curve andmark on the axis. Shade the area under the curve that represents the

-value.

� �

� �

APPLY YOURKNOWLEDGE

Page 29: chap_06

Introduction to Inference388 CHAPTER 6 �

Statistical significance

significancelevel

STATISTICAL SIGNIFICANCE

statistically significant at level .�

6.32

6.33

a

significance level

0

0

0

0

0

0

�0

Statistical software automates the task of calculating the test statistic andits -value. You must still decide which test is appropriate and whether touse a one-sided or two-sided test. You must also decide what conclusionthe computer’s numbers support. We know that smaller -values indicatestronger evidence against the null hypothesis. But how strong is strongenough?

One approach is to announce in advance how much evidence againstwe will require to reject . We compare the -value with a level that says“this evidence is strong enough.” The decisive level is called the

. It is denoted by , the Greek letter alpha. If we choose 0 05, weare requiring that the data give evidence against so strong that it wouldhappen no more than 5% of the time (1 time in 20) when is true. If wechoose 0 01, we are insisting on stronger evidence against , evidenceso strong that it would appear only 1% of the time (1 time in 100) if isin fact true.

If the -value is as small or smaller than , we say that the data are

“Significant” in the statistical sense does not mean “important.” Theoriginal meaning of the word is “signifying something.” In statistics, theterm is used to indicate only that the evidence against the null hypothesisreached the standard set by . Significance at level 0.01 is often expressed by

P

PP

x .

H

H

P

P

HH P

.H

H. H

H

P

(b) Calculate the -value. Are you convinced that Cleveland differs fromthe national average?

In the setting of the previous exercise, suppose that the Cleveland home-builders were convinced, before interviewing their sample, that residents ofCleveland spend less than the national average on housing. Do the inter-views support their conviction? State null and alternative hypotheses. Findthe -value, using the interview results given in the previous problem. Whydo the same data give different -values in these two problems?

The homebuilders wonder if the national finding applies in the Clevelandarea. They have no idea whether Cleveland residents spend more or lessthan the national average. Because their interviews find that 28 6%,less than the national 31%, their analyst tests

: 31%

: 31%

Explain why this is incorrect.

Page 30: chap_06

Area = 0.005 Area = 0.005Area = 0.99

–2.576 2.576z

z notsignificantat = 0.01α

z significantat = 0.01α

z significantat = 0.01α

6.2 Tests of Significance 389

FIGURE 6.8

Significant?EXAMPLE 6.12

Values of in the extreme area 0.01 under a standardNormal curve are significant at 0.01.

z

APPLY YOURKNOWLEDGE

6.34 How to show that you are rich.

� ��

the statement “The results were significant ( 0 01).” Here standsfor the -value. The -value is more informative than a statement ofsignificance because we can then assess significance at any level we choose.For example, a result with 0 03 is significant at the 0 05 level butis not significant at the 0 01 level.

You need not actually find the -value to assess significance at a fixedlevel . You need only compare the observed statistic with athat marks off area in one or both tails of the standard Normal curve.

z ..

zz

z . z. z . z .

z .P .

P . PP P

P . ..

Pz critical value

In Example 6.11, the test statistic took the value 3 22. Is this significant at the0 01 level? Values in the extreme 0.01 area of the standard Normal curve

are significant at the 0.01 level. Because the alternative is two-sided, this area isdivided equally between the two tails of the curve. So is significant if it lies in theextreme area 0.005 at either end. Look at the row at the bottom of Table D.The extreme 0.005 in the right tail starts at 2 576. That is, the -valuesthat are significant at the 0 01 level are 2 576 and 2 576. Figure6.8 shows why. The bank survey outcome 3 22 is statistically significant( 0 01).

Every society has its own marks of wealthand prestige. In ancient China, it appears that owning pigs was such amark. Evidence comes from examining burial sites. If the skulls of sacrificedpigs tend to appear along with expensive ornaments, that suggests thatthe pigs, like the ornaments, signal the wealth and prestige of the personburied. A study of burials from around 3500 B.C. concluded that “thereare striking differences in grave goods between burials with pig skulls and

� �

CA

SE6.1

APPLY YOURKNOWLEDGE

Page 31: chap_06

Introduction to Inference390 CHAPTER 6 �

Tests for a population mean

one-sample statistic

6.35 Significance.

6.36 Significance.

6.37 The Supreme Court speaks.

a

a

� ��

� �

a

. . .

zone-samplestatistic

0

0 0

0

0

5

0

0

There are four steps in carrying out a significance test:

1. State the hypotheses.

2. Calculate the test statistic.

3. Find the -value.

4. State your conclusion in the context of your specific setting.

Once you have stated your hypotheses and identified the proper test, you oryour software can do Steps 2 and 3 by following a recipe. We now presentthe recipe for the test we have used in our examples.

We have an SRS of size drawn from a Normal population withunknown mean . We want to test the hypothesis that has a specifiedvalue. Call the specified value . The null hypothesis is

:

The test is based on the sample mean . Because Normal calculations requirestandardized variables, we will use as our test statistic the standardizedsample mean

/

This has the standard Normal distribution when istrue. The -value of the test is the probability that takes a value at least asextreme as the value for our sample. What counts as extreme is determinedby the alternative hypothesis . Here is a summary.

�H Hz

.

H Hz

.

z zz

z

P

n

H

x

xz

n

HP z

H

burials without them. A test indicates that the two samples of totalartifacts are significantly different at the 0.01 level.” Explain clearly why“significantly different at the 0.01 level” gives good reason to think thatthere really is a systematic difference between burials that contain pig skullsand those that lack them.

You are testing : 0 against : 0 based on an SRSof 20 observations from a Normal population. What values of the statisticare statistically significant at the 0 005 level?

You are testing : 0 against : 0 based on an SRSof 20 observations from a Normal population. What values of the statisticare statistically significant at the 0 005 level?

Court cases in such areas as employmentdiscrimination often involve statistical evidence. The Supreme Court hassaid that -scores beyond 2 or 3 are generally convincing statisticalevidence. For a two-sided test, what significance level corresponds to 2?To 3?

z

� �

� �

Page 32: chap_06

z

z

z

6.2 Tests of Significance 391

Blood pressures of executivesEXAMPLE 6.13

TEST FOR A POPULATION MEAN

one-sample statistic

Step 1: Hypotheses.

a

� �

� �� �

� �

� �

� �

a

a

a � � �

0 0

0

0

0

0

0

�0

To test the hypothesis : based on an SRS of size from apopulation with unknown mean and known standard deviation ,compute the

/

In terms of a variable having the standard Normal distribution,the -value for a test of against

: is ( )

: is ( )

: is 2 ( )

These -values are exact if the population distribution is Normal andare approximately correct for large in other cases.

x .

H

H

H n

xz

n

ZP H

H P Z z

H P Z z

H P Z z

Pn

The medical director of a large company is concerned about the effects of stress onthe company’s younger executives. According to the National Center for HealthStatistics, the mean systolic blood pressure for males 35 to 44 years of age is 128and the standard deviation in this population is 15. The medical director examinesthe records of 72 executives in this age group and finds that their mean systolicblood pressure is 129 93. Is this evidence that the mean blood pressure forall the company’s young male executives is higher than the national average? Asusual in this chapter, we make the unrealistic assumption that the populationstandard deviation is known, in this case that executives have the same 15 asthe general population.

The hypotheses about the unknown mean of the executivepopulation are

: 128

: 128

z

z

Page 33: chap_06

P = 0.1379

z 1.09

Introduction to Inference392 CHAPTER 6

FIGURE 6.9

The -value for the one-sided test in Example 6.13.P

Step 2: Test statistic.

Step 3: -value.

Step 4: Conclusion.

��

P

� �� �

� � �

0

The data in Example 6.13 do establish that the mean blood pressurefor this company’s middle-aged male executives is 128. We sought evidence

that was higher than 128 and failed to find convincing evidence. That isall we can say. No doubt the mean blood pressure of the entire executivepopulation is not exactly equal to 128. A large enough sample would giveevidence of the difference, even if it is very small. Tests of significance assess

z

z

x .z

n

.

PP Z

P P Z . . .

x .

not

The test requires that the 72 executives in the sample arean SRS from the population of the company’s young male executives. We mustask how the data were produced. If records are available only for executives withrecent medical problems, for example, the data are of little value for our purpose.It turns out that all executives are given a free annual medical exam and that themedical director selected 72 exam results at random. The one-sample statistic is

129 93 128

/ 15/ 72

1 09

Draw a picture to help find the -value. Figure 6.9 shows that the-value is the probability that a standard Normal variable takes a value of 1.09

or greater. From Table A we find that this probability is

( 1 09) 1 0 8621 0 1379

About 14% of the time, an SRS of size 72 from the generalmale population would have a mean blood pressure as high as that of the executivesample. The observed 129 93 is not significantly higher than the nationalaverage at any of the usual significance levels.

� �

� �

Page 34: chap_06

6.2 Tests of Significance 393

A company-wide health promotion campaignEXAMPLE 6.14

Step 1: Hypotheses.

Step 2: Test statistic.

Step 3: -value.

Step 4: Conclusion.

a

aP

0

0 0

0

0

� �

� �

� �

� �

0

0

0

the evidence . If the evidence is strong, we can confidently rejectin favor of the alternative. Failing to find evidence against means

only that the data are consistent with , not that we have clear evidencethat is true.

Our conclusion in Example 6.14 is cautious. We would like to concludethat the health campaign the drop in mean blood pressure. Butthe data are not protected from confounding. If, for example, the localtelevision station runs a series on the risk of heart attacks and the valueof better diet and exercise, many employees may be moved to reformtheir health habits even without encouragement from the company. Only arandomized comparative experiment protects against such lurking variables.The medical director preferred to launch a company-wide campaign that

n x

H

H

z

xz

n

.

H zH P

P P Z . .

against HH H

HH

caused

The company medical director institutes a health promotion campaign toencourage employees to exercise more and eat a healthier diet. One measure ofthe effectiveness of such a program is a drop in blood pressure. Choose a randomsample of 50 employees, and compare their blood pressures from annual physicalexaminations given before the campaign and again a year later. The mean changein blood pressure for these 50 employees is 6. We take the populationstandard deviation to be 20.

We want to know if the health campaign reduced averageblood pressure. Taking to be the mean change in blood pressure for allemployees,

: 0

: 0

The one-sample test is appropriate:

6 0

/ 20/ 50

2 12

Because is one-sided on the low side, large negative values ofcount against . See Figure 6.10. From Table A, we find that the -value is

( 2 12) 0 0170

A mean change in blood pressure of 6 or better would occuronly 17 times in 1000 samples if the campaign had no effect on the blood pressuresof the employees. This is convincing evidence that the mean blood pressure in thepopulation of all employees has decreased.

� � �

� �

Page 35: chap_06

P = 0.0170

z = –2.12

Introduction to Inference394 CHAPTER 6

FIGURE 6.10

Two-sided significance tests and confidence intervals

The -value for the one-sided test in Example 6.14.P

APPLY YOURKNOWLEDGE

6.38 Testing a random number generator.

6.39

6.40 A new supplier.

a

a

a

��

� �

� �0 0

0

0

0

appealed to all employees. This is no doubt a sound medical decision, butthe absence of a control group weakens the statistical conclusion.

A 95% confidence interval captures the true value of in 95% of allsamples. If we are 95% confident that the true lies in our interval, we arealso confident that values of that fall outside our interval are incompatiblewith the data. That sounds like the conclusion of a test of significance.In fact, there is an intimate connection between 95% confidence intervalsand significance at the 5% level. The same connection holds between 99%confidence intervals and significance at the 1% level, and so on.

.x . s .

.

H z .

P H

P H

P H

P

Statistical software has a “randomnumber generator” that is supposed to produce numbers uniformly dis-tributed between 0 to 1. If this is true, the numbers generated come froma population with 0 5. A command to generate 100 random numbersgives outcomes with mean 0 532 and 0 316. Because the sam-ple is reasonably large, take the population standard deviation also to be

0 316. Do we have evidence that the mean of all numbers produced bythis software is not 0.5?

A test of the null hypothesis : gives test statistic 1 8.

(a) What is the -value if the alternative is : ?

(b) What is the -value if the alternative is : ?

(c) What is the -value if the alternative is : ?

A new supplier offers a good price on a catalyst used inyour production process. You compare the purity of this catalyst with thatfrom your current supplier. The -value for a test of “no difference” is 0.15.Can you be confident that the purity of the new product is the same as thepurity of the product that you have been using? Discuss.

� �

� �

� �

� �

APPLY YOURKNOWLEDGE

Page 36: chap_06

6.2 Tests of Significance 395

Testing pharmaceutical productsEXAMPLE 6.15

CONFIDENCE INTERVALS AND TWO-SIDED TESTS

a

� ��

� � � �

� �

� ��

� �

� �

� � �

0 0

0

0

�0

A level two-sided significance test rejects a hypothesis :exactly when the value falls outside a level 1 confidenceinterval for .

First, the test. The mean of the three analyses is 0 8404. Theone-sample test statistic is therefore

0 8404 0 864 99

/ 0 0068/ 3

We need not find the exact -value to assess significance at the 0 01level. Look in Table D under tail area 0.005 because the alternative istwo-sided. The -values that are significant at the 1% level are 2 576and 2 576. Our observed 4 99 is significant ( 0 01).

To compute a 99% confidence interval for the mean concentration, find inTableDthecritical value for99%confidence. It is 2 576, the samecriticalvalue that marked off significant ’s in our test. The confidence interval is

0 00680 8404 2 576

3

0 8404 0 0101

(0 8303 0 8505)

.

. . .

H .

H .

H

x .z

x . .z .

n .

P .

z z .z . z . P .

z .z

.x z . .

n

. .

. , .

The Deely Laboratory analyzes pharmaceutical products to verify theconcentration of active ingredients. Such chemical analyses are not perfectlyprecise. Repeated measurements on the same specimen will give slightly differentresults. The results of repeated measurements follow a Normal distribution quiteclosely. The analysis procedure has no bias, so that the mean of the populationof all measurements is the true concentration in the specimen. The standarddeviation of this distribution is a property of the analytical procedure and isknown to be 0 0068 grams per liter. The laboratory analyzes each specimenthree times and reports the mean result.

A client sends a specimen for which the concentration of active ingredient issupposed to be 0.86%. Deely’s three analyses give concentrations

0 8403 0 8363 0 8447

Is there significant evidence at the 1% level that the true concentration is not0.86%? This calls for a test of the hypotheses

: 0 86

: 0 86

We will carry out the test twice, with the usual significance test and then from a99% confidence interval.

Page 37: chap_06

Cannotreject H0:µ = 0.85 µ

Reject H0:µ = 0.86

0.83 0.84 0.85 0.86

µ

Introduction to Inference396 CHAPTER 6

FIGURE 6.11

How significant?EXAMPLE 6.16

-values versus fixed

Values of falling outside a 99% confidence interval can be rejected at the1% significance level. Values falling inside the interval cannot be rejected.

APPLY YOURKNOWLEDGE

P

6.41

6.42

a

��

��

a �

��

0

0

0

0

0

0

The hypothesized value 0 86 falls outside this confidence interval.We are therefore 99% confident that is equal to 0.86, so we can reject

: 0 86

at the 1% significance level. On the other hand, we cannot reject

: 0 85

at the 1% level in favor of the two-sided alternative : 0 85, because0.85 lies inside the 99% confidence interval for . Figure 6.11 illustratesboth cases.

In Example 6.15, we concluded that the test statistic 4 99 is significantat the 1% level. This isn’t the whole story. The observed is far beyond thecritical value for 0 01, and the evidence against is far stronger than1% significance suggests. The -value 0 0000006 (from software) givesa better sense of how strong the evidence is.

Knowing the -value allows us to assesssignificance at any level.

P H

,

H

H

.not

H .

H .

H .

z .z

. HP P .

The P-value is the smallest levelat which the data are significant. P

The -value for a two-sided test of the null hypothesis : 10 is 0.06.

(a) Does the 95% confidence include the value 10? Why?

(b) Does the 90% confidence include the value 10? Why?

A 95% confidence interval for a population mean is (28 35).

(a) Can you reject the null hypothesis that 34 at the 5% significancelevel? Why?

(b) Can you reject the null hypothesis that 36 at the 5% significancelevel? Why?

In Example 6.14, we tested the hypotheses

: 0

: 0�

APPLY YOURKNOWLEDGE

Page 38: chap_06

0 0.01 0.040.030.02 0.05

Not significant at levels α < P P Significant at levels α ≥≥ P

Significance level α

6.2 Tests of Significance 397

FIGURE 6.12

Fill the bottlesEXAMPLE 6.17

An outcome with -value is significant at all levels at orabove and is not significant at smaller levels .

P PP

a

� �

� �

� � �

0

0

A -value is more informative than a reject-or-not finding at a fixedsignificance level. But assessing significance at a fixed level is easier,because no probability calculation is required. You need only look up acritical value in a table. Because the practice of statistics almost alwaysemploys software that calculates -values automatically, tables of criticalvalues are becoming outdated. We include the usual tables of critical values(such as Table D) at the end of the book for learning purposes and to rescuestudents without good computing facilities. The tables can be used directlyto carry out fixed tests. They also allow us to approximate -valuesquickly without a probability calculation. The following example illustratesthe use of Table D to find an approximate -value.

P P . ..

P

. . . . . .

H

H

x . z

x .z .

n

P

P

P

P

to evaluate the effectiveness of a health promotion program on blood pressure.The test had the -value 0 0170. This result is not significant at the 0 01level, because 0.0170 0.01. However, it is significant at the 0 05 level,because the -value is less than 0.05. See Figure 6.12.

Bottles of a popular cola drink are supposed to contain 300 milliliters (ml) ofcola. There is some variation from bottle to bottle because the filling machineryis not perfectly precise. The distribution of the contents is Normal with standarddeviation 3 ml. An inspector who suspects that the bottler is underfillingmeasures the contents of six bottles. The results are

299 4 297 7 301 0 298 9 300 2 297 0

Is this convincing evidence that the mean contents of cola bottles is less than theadvertised 300 ml? The hypotheses are

: 300

: 300

The sample mean contents of the six bottles measured is 299 03 ml. The teststatistic is therefore

299 03 3000 792

/ 3/ 6

� ��

Page 39: chap_06

Observed z

z* for = 0.25z* for = 0.20α

α z

Introduction to Inference398 CHAPTER 6

FIGURE 6.13

ECTION UMMARY

The -value of a statistic can be approximated bynoting which levels from Table D it falls between. Here, lies between0.20 and 0.25.

P zP

APPLY YOURKNOWLEDGE

S 6.2 Stest of significance

null hypothesis alternative hypothesis .

6.43

6.44

a

0

� �

0

0

0

A is intended to assess the evidence provided by dataagainst a and in favor of an

z H HP

P Z .

z .

H . z .

H . z .

z .. .

P

P

P

.

.

P

.

.

Small values of count against because is one-sided on the low side. The-value is

( 0 792)

Rather than compute this probability, compare 0 792 with the critical valuesin Table D. According to this table, we would

reject at level 0 25 if 0 674

reject at level 0 20 if 0 841

As Figure 6.13 illustrates, the observed 0 792 lies between these two criticalvalues. We can reject at 0 25 but not at 0 20. That is the same as sayingthat the -value lies between 0.25 and 0.20. A sample mean at least as small asthat observed will occur in between 20% and 25% of all samples if the populationmean is 300 ml. There is no convincing evidence that the mean is below 300,and there is no need to calculate the -value more exactly.

The -value for a significance test is 0.078.

(a) Do you reject the null hypothesis at level 0 05?

(b) Do you reject the null hypothesis at level 0 01?

(c) Explain your answers.

The -value for a significance test is 0.03.

(a) Do you reject the null hypothesis at level 0 05?

(b) Do you reject the null hypothesis at level 0 01?

(c) Explain your answers.

H H

� �

� �

� �

a

APPLY YOURKNOWLEDGE

Page 40: chap_06

6.2 Tests of Significance 399

ECTION XERCISES

S 6.2 E

one-sided alternativetwo-sided alternative

test statistic. -value

statistically significant

statistic:

standard Normal critical values

6.45 Hypotheses.

6.46 Hypotheses.

a

a

� ��

a

0

0

0

0

0

0 0

0

0

0

It provides a method for ruling out chance as an explanation for data thatdeviate from what we expect under .

The hypotheses are stated in terms of population parameters. Usuallyis a statement that no effect is present, and says that a parameter differsfrom its null value, in a specific direction ( ) or in eitherdirection ( ).

The test is based on a The is the probability,computed assuming that is true, that the test statistic will take a valueat least as extreme as that actually observed. Small -values indicate strongevidence against . Calculating -values requires knowledge of thesampling distribution of the test statistic when is true.

If the -value is as small or smaller than a specified value , the data areat significance level .

Significance tests for the hypothesis : concerning theunknown mean of a population are based on the

/

The test assumes an SRS of size , known population standarddeviation , and either a Normal population or a large sample. -valuesare computed from the Normal distribution (Table A). Fixed tests use thetable of ( row in Table D).

HH

HH

H

HH

HP

H PH

P

H

xz

n

z nP

z

Each of the following situations requires a significance testabout a population mean . State the appropriate null hypothesis andalternative hypothesis in each case.

(a) The mean area of the several thousand apartments in a new developmentis advertised to be 1250 square feet. A tenant group thinks that theapartments are smaller than advertised. They hire an engineer to measurea sample of apartments to test their suspicion.

(b) Larry’s car averages 32 miles per gallon on the highway. He nowswitches to a new motor oil that is advertised as increasing gas mileage.After driving 3000 highway miles with the new oil, he wants to determineif his gas mileage actually has increased.

(c) The diameter of a spindle in a small motor is supposed to be 5millimeters. If the spindle is either too small or too large, the motorwill not perform properly. The manufacturer measures the diameter ina sample of motors to determine whether the mean diameter has movedaway from the target.

In each of the following situations, a significance test for apopulation mean is called for. State the null hypothesis and thealternative hypothesis in each case.

(a) Experiments on learning in animals sometimes measure how long ittakes a mouse to find its way through a maze. The mean time is

P

z

Page 41: chap_06

Introduction to Inference400 CHAPTER 6 �

6.47 Hypotheses.

6.48 Hypotheses.

6.49 Blood pressure and calcium.

a

a

0

0

6

H H

H H

18 seconds for one particular maze. A researcher thinks that a loudnoise will cause the mice to complete the maze faster. She measures howlong each of 10 mice takes with a noise as stimulus.

(b) The examinations in a large accounting class are scaled after grading sothat the mean score is 50. A self-confident teaching assistant thinks thathis students have a higher mean score than the class as a whole. Hisstudents this semester can be considered a sample from the populationof all students he might teach, so he compares their mean score with 50.

(c) A university gives credit in French language courses to students who passa placement test. The language department wants to know if studentswho get credit in this way differ in their understanding of spokenFrench from students who actually take the French courses. Experiencehas shown that the mean score of students in the courses on a standardlistening test is 24. The language department gives the same listeningtest to a sample of 40 students who passed the credit examination to seeif their performance is different.

In each of the following situations, state an appropriate nullhypothesis and alternative hypothesis . Be sure to identify the param-eters that you use to state the hypotheses. (We have not yet learned how totest these hypotheses.)

(a) An economist believes that among employed young adults there isa positive correlation between income and the percent of disposableincome that is saved. To test this, she gathers income and savings datafrom a sample of employed persons in her city aged 25 to 34.

(b) A sociologist asks a large sample of high school students which academicsubject they like best. She suspects that a higher percent of males thanof females will name economics as their favorite subject.

(c) An education researcher randomly divides sixth-grade students into twogroups for physical education class. He teaches both groups basketballskills, using the same methods of instruction in both classes. He en-courages Group A with compliments and other positive behavior butacts cool and neutral toward Group B. He hopes to show that positiveteacher attitudes result in a higher mean score on a test of basketballskills than do neutral attitudes.

Translate each of the following research questions into appro-priate and .

(a) Census Bureau data show that the mean household income in the areaserved by a shopping mall is $62,500 per year. A market research firmquestions shoppers at the mall to find out whether the mean householdincome of mall shoppers is higher than that of the general population.

(b) Last year, your company’s service technicians took an average of 2.6hours to respond to trouble calls from business customers who hadpurchased service contracts. Do this year’s data show a different averageresponse time?

A randomized comparative experiment exam-ined whether a calcium supplement in the diet reduces the blood pressure ofhealthy men. The subjects received either a calcium supplement or a placebofor 12 weeks. The statistical analysis was quite complex, but one conclusion

Page 42: chap_06

6.2 Tests of Significance 401

6.50 California brushfires.

6.51 Exercise and statistics exams.

6.52 Financial aid.

6.53 Who is the author?

2

a

7

2

8

0

P .P

Collectively, since 1910, there has been a highly significant increase (r0.61, P 0.01) in the number of fires per decade.

r

P

P .

P .

..

x .

H .

H .

z P

was that “the calcium group had lower seated systolic blood pressure( 0 008) compared with the placebo group.” Explain this conclusion,especially the -value, as if you were speaking to a doctor who knows nostatistics.

We often see televised reports of brushfires threateninghomes in California. Some people argue that the modern practice of quicklyputting out small fires allows fuel to accumulate and so increases the damagedone by large fires. A detailed study of historical data suggests that this iswrong—the damage has risen simply because there are more houses in riskyareas. As usual, the study report gives statistical information tersely. Hereis the summary of a regression of number of fires per decade (9 data points,for the 1910s to the 1990s):

How would you explain this statement to someone who knows no statistics?Include an explanation of both the description given by and the statisticalsignificance.

A study examined whether exercise affectshow students perform on their final exam in statistics. The -value was givenas 0.87.

(a) State null and alternative hypotheses that could be used for this study.(Note that there is more than one correct answer.)

(b) Do you reject the null hypothesis? State your conclusion in plainlanguage.

(c) What other facts about the study would you like to know for a properinterpretation of the results?

The financial aid office of a university asks a sample ofstudents about their employment and earnings. The report says that “foracademic year earnings, a significant difference ( 0 038) was foundbetween the sexes, with men earning more on the average. No difference( 0 476) was found between the earnings of black and white students.”Explain both of these conclusions, for the effects of sex and of race on meanearnings, in language understandable to someone who knows no statistics.

Statistics can help decide the authorship of literaryworks. Sonnets by a certain Elizabethan poet are known to contain anaverage of 6 9 new words (words not used in the poet’s other works).The standard deviation of the number of new words is 2 7. Now amanuscript with 5 new sonnets has come to light, and scholars are debatingwhether it is the poet’s work. The new sonnets contain an average of

11 2 words not used in the poet’s known works. We expect poems byanother author to contain more new words, so to see if we have evidencethat the new sonnets are not by our poet we test

: 6 9

: 6 9

Give the test statistic and its -value. What do you conclude about theauthorship of the new poems?

��

Page 43: chap_06

Introduction to Inference402 CHAPTER 6 �

6.54 Study habits.

6.55 Corn yield.

6.56 Cigarettes.

6.57 Academic probation and TV watching.

6.58

a

a

a

0

0

0

x .

H

H

P

x .

P

H

H

H H

z .

z

z

z .

The Survey of Study Habits and Attitudes (SSHA) is a psy-chological test that measures the motivation, attitude toward school, andstudy habits of students. Scores range from 0 to 200. The mean score forU.S. college students is about 115, and the standard deviation is about 30. Ateacher who suspects that older students have better attitudes toward schoolgives the SSHA to 20 students who are at least 30 years of age. Their meanscore is 135 2.

(a) Assuming that 30 for the population of older students, carry out atest of

: 115

: 115

Report the -value of your test, and state your conclusion clearly.

(b) Your test in (a) required two important assumptions in addition to theassumption that the value of is known. What are they? Which of theseassumptions is most important to the validity of your conclusion in (a)?

The mean yield of corn in the United States is about 135 bushelsper acre. A survey of 40 farmers this year gives a sample mean yield of

138 8 bushels per acre. We want to know whether this is good evidencethat the national mean this year is not 135 bushels per acre. Assume thatthe farmers surveyed are an SRS from the population of all commercial corngrowers and that the standard deviation of the yield in this population is

10 bushels per acre. Give the -value for the test of

: 135

: 135

Are you convinced that the population mean is not 135 bushels per acre?Is your conclusion correct if the distribution of corn yields is somewhatnon-Normal? Why?

According to data from the Tobacco Institute Testing Laboratory,Camel Lights King Size cigarettes contain an average of 1.4 milligrams ofnicotine. An advocacy group commissions an independent test to see if themean nicotine content is higher than the industry laboratory claims.

(a) What are and ?

(b) Suppose that the test statistic is 2 42. Is this result significant at the5% level?

(c) Is the result significant at the 1% level?

There are other statistics that wehave not yet met. We can use Table D to assess the significance of any

statistic. A study compares the habits of students who are on academicprobation with students whose grades are satisfactory. One variable mea-sured is the hours spent watching television last week. The null hypothesis is“no difference” between the means for the two populations. The alternativehypothesis is two-sided. The value of the test statistic is 1 37.

(a) Is the result significant at the 5% level?

(b) Is the result significant at the 1% level?

Explain in plain language why a significance test that is significant at the 1%level must always be significant at the 5% level.

Page 44: chap_06

6.2 Tests of Significance 403

6.59

6.60

6.61

6.62

6.63 Radon.

6.64 Clothing for runners.

a

a

a

0

0

0

9

0

0

z . P

z . P

H H

z H

H zH

Pz . P

.

H .

H .

H

You have performed a two-sided test of significance and obtained a value of3 3. Use Table D to find the approximate -value for this test.

You have performed a one-sided test of significance and obtained a value of0 215. Use Table D to find the approximate -value for this test.

You will perform a significance test of : 0 versus : 0.

(a) What values of would lead you to reject at the 5% level?

(b) If the alternative hypothesis was : 0, what values of would leadyou to reject at the 5% level?

(c) Explain why your answers to parts (a) and (b) are different.

Between what critical values from Table D does the -value for the outcome1 37 in Exercise 6.57 lie? Calculate the -value using Table A, and

verify that it lies between the values you found from Table D.

Radon is a colorless, odorless gas that is naturally released by rocksand soils and may concentrate in tightly closed houses. Because radon isslightly radioactive, there is some concern that it may be a health hazard.Radon detectors are sold to homeowners worried about this risk, but thedetectors may be inaccurate. University researchers placed 12 detectors ina chamber where they were exposed to 105 picocuries per liter (pCi/l) ofradon over 3 days. Here are the readings given by the detectors:

91.9 97.8 111.4 122.3 105.4 95.0103.8 99.6 96.6 119.3 104.8 101.7

Assume (unrealistically) that you know that the standard deviation ofreadings for all detectors of this type is 9.

(a) Give a 95% confidence interval for the mean reading for this type ofdetector.

(b) Is there significant evidence at the 5% level that the mean reading differsfrom the true value 105? State hypotheses and base a test on yourconfidence interval from (a).

Your company sells exercise clothing and equipmenton the Internet. To design the clothing, you collect data on the physicalcharacteristics of your different types of customers. Here are the weights fora sample of 24 male runners. Assume that these runners can be viewed as arandom sample of your potential customers. The weights are expressed inkilograms.

67.8 61.9 63.0 53.1 62.3 59.7 55.4 58.9 60.9 69.2 63.7 68.364.7 65.6 56.0 57.8 66.0 62.9 53.6 65.0 55.8 60.4 69.3 61.7

Exercise 6.12 (page 377) asks you to find a 95% confidence interval forthe mean weight of the population of all such runners, assuming that thepopulation standard deviation is 4 5 kg.

(a) Give the confidence interval from that exercise, or calculate the intervalif you did not do the exercise.

(b) Based on this confidence interval, does a test of

: 61 3 kg

: 61 3 kg

reject at the 5% significance level?

� �

Page 45: chap_06

Introduction to Inference404 CHAPTER 6 �

6.3 Using Significance Tests

6.65 Cockroaches.

6.66 Market pioneers.

a

0

10

0

11

Significance tests are widely used in reporting the results of research inmany fields of applied science and in industry. New pharmaceutical prod-ucts require significant evidence of effectiveness and safety. Courts inquireabout statistical significance in hearing class-action discrimination cases.Marketers want to know whether a new ad campaign significantly outper-forms the old one, and medical researchers want to know whether a newtherapy performs significantly better. In all these uses, statistical significanceis valued because it points to an effect that is unlikely to occur simply bychance.

Carrying out a test of significance is often quite simple, especially if youget a -value effortlessly from a calculator or computer. Using tests wiselyis not so simple. Here are some points to keep in mind when using orinterpreting significance tests.

H

. .

H H

Can patent protection explain pioneer share advantages? Only 21% of thepioneers claim a significant benefit from either a product patent or a tradesecret. Though their average share is two points higher than that of pioneerswithout this benefit, the increase is not statistically significant (z 1.13).Thus, at least in mature industrial markets, product patents and trade secretshave little connection to pioneer share advantages.

P z

P

(c) Would : 63 be rejected at the 5% level if tested against atwo-sided alternative?

Your company is developing a better means to eliminatecockroaches from buildings. In the process, your R&D team studies the ab-sorption of sugar by these insects. They feed cockroaches a diet containingmeasured amounts of a particular sugar. After 10 hours, the cockroaches arekilled and the concentration of the sugar in various body parts is determinedby a chemical analysis. The paper that reports the research states that a 95%confidence interval for the mean amount (in milligrams) of the sugar in thehindguts of the cockroaches is 4 2 2 3.

(a) Does this paper give evidence that the mean amount of sugar in thehindguts under these conditions is not equal to 7 mg? State andand base a test on the confidence interval.

(b) Would the hypothesis that 5 mg be rejected at the 5% level in favorof a two-sided alternative?

Market pioneers, companies that are among the first todevelop a new product or service, tend to have higher market shares thanlatecomers to the market. What accounts for this advantage? Here is anexcerpt from the conclusions of a study of a sample of 1209 manufacturersof industrial goods:

Find the -value for the given . Then explain to someone who knows nostatistics what “not statistically significant” in the study’s conclusion means.Why does the author conclude that patents and trade secrets don’t help, eventhough they contributed 2 percentage points to average market share?

Page 46: chap_06

6.3 Using Significance Tests 405

How small a is convincing?

Statistical significance and practical significance

APPLY YOURKNOWLEDGE

P

There is no sharp borderbetween “significant” and “insignificant,” only increasingly strong evidenceas the -value decreases.

6.67 Is it significant?

a

a

� �

0 0

0 0

0

The purpose of a test of significance is to describe the degree of evidenceprovided by the sample against the null hypothesis. The -value does this.But how small a -value is convincing evidence against the null hypothesis?This depends mainly on two circumstances:

If represents an assumption that the people youmust convince have believed for years, strong evidence (small ) will beneeded to persuade them.

If rejecting in favor ofmeans making an expensive changeover from one type of product

packaging to another, you need strong evidence that the new packagingwill boost sales.

These criteria are a bit subjective. Different people will often insist ondifferent levels of significance. Giving the -value allows each of us to decideindividually if the evidence is sufficiently strong.

Users of statistics have often emphasized standard levels of significancesuch as 10%, 5%, and 1%. This emphasis reflects the time when tables ofcritical values rather than computer software dominated statistical practice.The 5% level ( 0 05) is particularly common.

There is no practical distinction between the-values 0.049 and 0.051. It makes no sense to treat 0 05 as a universal

rule for what is significant.

When a null hypothesis (“no effect” or “no difference”) can be rejectedat the usual levels, 0 05 or 0 01, there is good evidence thatan effect is present. But that effect may be extremely small. When large

H

H

x .

x .

.

PP

How plausible is H ? HP

What are the consequences of rejecting H ? HH

P

.

P P .

. .

More than 200,000 people worldwide take the GMATexamination each year as they apply for MBA programs. Their scoresvary Normally with mean about 525 and standard deviation about

100. One hundred students go through a rigorous training programdesigned to raise their GMAT scores. Test the hypotheses

: 525

: 525

in each of the following situations:

(a) The students’ average score is 541 4. Is this result significant at the5% level?

(b) The average score is 541 5. Is this result significant at the 5% level?

The difference between the two outcomes in (a) and (b) is of no importance.Beware attempts to treat 0 05 as sacred.

P

��

APPLY YOURKNOWLEDGE

Page 47: chap_06

Introduction to Inference406 CHAPTER 6

It’s significant. So what?EXAMPLE 6.18

APPLY YOURKNOWLEDGE

6.68 How far do rich parents take us?

0

samples are available, even tiny deviations from the null hypothesis will besignificant.

Remember the wise saying:On the other hand, if we fail to reject the null

hypothesis, it may be because is true or because our sample size isinsufficient to detect the alternative. Exercise 6.73 demonstrates in detail theeffect on of increasing the sample size.

The remedy for attaching too much importance to statistical significanceis to pay attention to the actual experimental results as well as to the

-value. Plot your data and examine them carefully. Are there outliers orother deviations from a consistent pattern? A few outlying observationscan produce highly significant results if you blindly apply common tests ofsignificance. Outliers can also destroy the significance of otherwise convinc-ing data. The foolish user of statistics who feeds the data to a computerwithout exploratory analysis will often be embarrassed. Is the effect thatyou are seeking visible in your plots? If not, ask yourself if the effect is largeenough to be of practical importance. It is usually wise to give a confidenceinterval for the parameter in which you are interested. A confidence intervalactually estimates the size of an effect rather than simply asking if it istoo large to reasonably occur by chance alone. Confidence intervals arenot used as often as they should be, while tests of significance are perhapsoverused.

r ..

r .

statistical significance is not the same aspractical significance.

H

P

P

We are testing the hypothesis of no correlation between two variables. With 1000observations, an observed correlation of only 0 08 is significant evidence at the

0 01 level that the correlation in the population is not zero but positive. Thelow significance level does not mean there is a strong association, only that there isstrong evidence of some association. The true population correlation is probablyquite close to the observed sample value, 0 08. We might well conclude thatfor practical purposes we can ignore the association between these variables, eventhough we are confident (at the 1% level) that the correlation is positive.

How much education children get isstrongly associated with the wealth and social status of their parents. Insocial science jargon, this is “socioeconomic status,” or SES. But the SES ofparents has little influence on whether children who have graduated fromcollege go on to yet more education. One study looked at whether collegegraduates took the graduate admissions tests for business, law, and othergraduate programs. The effects of the parents’ SES on taking the LSAT testfor law school were “both statistically insignificant and small.”

(a) What does “statistically insignificant” mean?

(b) Why is it important that the effects were small in size as well asinsignificant?

APPLY YOURKNOWLEDGE

Page 48: chap_06

6.3 Using Significance Tests 407

Cell phones and brain cancerEXAMPLE 6.19

Statistical inference is not valid for all sets of data

Beware of searching for significance

APPLY YOURKNOWLEDGE

6.69

12

We know that badly designed surveys or experiments often produce uselessresults. Formal statistical inference cannot correct basic flaws in the design. Astatistical test is valid only in certain circumstances, with properly produceddata being particularly important. The test, for example, should bearthe same warning label that we attached on page 372 to the confidenceinterval. Similar warnings accompany the other tests that we will learn.

Tests of significance and confidence intervals are based on the laws ofprobability. Randomization in sampling or experimentation ensures thatthese laws apply. But we must often analyze data that do not arise fromrandomized samples or experiments. To apply statistical inference to suchdata, we must have confidence in a probability model for the data. Thediameters of successive holes bored in auto engine blocks during produc-tion, for example, may behave like independent observations on a Normaldistribution. We can check this probability model by examining the data. Ifthe model appears correct, we can apply the recipes of this chapter to doinference about the process mean diameter . Do ask how the data wereproduced, and don’t be too impressed by -values on a printout until youare confident that the data deserve a formal analysis.

Statistical significance ought to mean that you have found an effect that youwere looking for. The reasoning behind statistical significance works well ifyou decide what effect you are seeking, design a study to search for it, anduse a test of significance to weigh the evidence you get. In other settings,significance may have little meaning.

A hospital study that compared brain cancer patients and a similar group withoutbrain cancer found no statistically significant association between cell phone useand a group of brain cancers known as gliomas. But when 20 types of glioma wereconsidered separately, an association was found between phone use and one rareform. Puzzlingly, however, this risk appeared to decrease rather than increase withgreater mobile phone use.

.

zz

P

Give an example of a set of data for which statistical inference is not valid.

Might the radiation from cell phones be harmful to users? Many studies havefound little or no connection between using cell phones and various illnesses. Hereis part of a news account of one study:

Think for a moment: Suppose that the 20 null hypotheses for these 20significance tests are all true. Then each test has a 5% chance of being significant atthe 5% level. That’s what 0 05 means—results this extreme occur only 5%of the time just by chance when the null hypothesis is true. Because 5% is 1/20, weexpect about 1 of 20 tests to give a significant result just by chance. Running 1 test

APPLY YOURKNOWLEDGE

Page 49: chap_06

Introduction to Inference408 CHAPTER 6

ECTION UMMARY

APPLY YOURKNOWLEDGE

S 6.3 S

6.70 Predicting success of trainees.

0

The peril of multiple analyses is increased now that a few simplecommands will set software to work performing all manner of complicatedtests and operations on your data. We will state it as a law that any largeset of data—even several pages of a table of random digits—contains someunusual pattern. Sufficient computer time will discover that pattern, andwhen you test specifically for the pattern that turned up, the result will besignificant. That’s much like testing for 20 kinds of cancer and finding onetest significant. Significance levels assume that you chose one hypothesisbefore searching your data and then tested that one hypothesis.

Searching data for suggestive patterns is certainly legitimate. Exploratorydata analysis is an important part of statistics. But the reasoning of formalinference does not apply when your search for a striking effect in the data issuccessful. The remedy is clear. Once you have a hypothesis, design a studyto search specifically for the effect you now think is there. If the result ofthis study is statistically significant, you have real evidence.

-values are more informative than the reject-or-not result of a fixedlevel test. Beware of placing too much weight on traditional values of ,such as 0 05.

Very small effects can be highly significant (small ), especially whena test is based on a large sample. A statistically significant effect neednot be practically important. Plot the data to display the effect you areseeking, and use confidence intervals to estimate the actual value ofparameters.

On the other hand, lack of significance does not imply that is true,especially when the test is based on a small sample.

.

P

.

P

H

and reaching the 0 05 level is reasonably good evidence that you have foundsomething; running 20 tests and reaching that level only once is not.

What distinguishes managerial trainees whoeventually become executives from those who, after expensive training,don’t succeed and leave the company? We have abundant data on pasttrainees—data on their personalities and goals, their college preparation andperformance, even their family backgrounds and their hobbies. Statisticalsoftware makes it easy to perform dozens of significance tests on thesedozens of variables to see which ones best predict later success. We findthat future executives are significantly more likely than washouts to have anurban or suburban upbringing and an undergraduate degree in a technicalfield.

Explain clearly why using these “significant” variables to select futuretrainees is not wise. Then suggest a follow-up study using this year’s traineesas subjects that should clarify the importance of the variables identified bythe first study.

APPLY YOURKNOWLEDGE

Page 50: chap_06

6.3 Using Significance Tests 409

ECTION XERCISES

S 6.3 E6.71 Significance tests.

6.72 Vitamin C and colds.

6.73 Coaching for the SAT.

6.74 Coaching for the SAT.

6.75 Student loan poll.

a

� �

0

Significance tests are not always valid. Faulty data collection, outliersin the data, and testing a hypothesis on the same data that suggested thehypothesis can invalidate a test. Many tests run at once will probablyproduce some significant results by chance alone, even if all the nullhypotheses are true.

P .

P

H

H

x

x

x

Which of the following questions does a test of significanceanswer?

(a) Is the sample or experiment properly designed?

(b) Is the observed effect due to chance?

(c) Is the observed effect important?

In a study of the suggestion that taking vitamin C willprevent colds, 400 subjects are assigned at random to one of two groups. Theexperimental group takes a vitamin C tablet daily, while the control grouptakes a placebo. At the end of the experiment, the researchers calculate thedifference between the percents of subjects in the two groups who were freeof colds. This difference is statistically significant ( 0 03) in favor of thevitamin C group. Can we conclude that vitamin C has a strong effect inpreventing colds? Explain your answer.

Every user of statistics should understand the distinc-tion between statistical significance and practical importance. A sufficientlylarge sample will declare very small effects statistically significant. Let ussuppose that SAT mathematics (SATM) scores in the absence of coachingvary Normally with mean 515 and 100. Suppose further thatcoaching may change but does not change . An increase in the SATMscore from 515 to 518 is of no importance in seeking admission to college,but this unimportant change can be statistically very significant. To see this,calculate the -value for the test of

: 515

: 515

in each of the following situations:

(a) A coaching service coaches 100 students; their SATM scores average518.

(b) By the next year, the service has coached 1000 students; their SATMscores average 518.

(c) An advertising campaign brings the number of students coached to10,000; their average score is still 518.

Give a 99% confidence interval for the mean SATMscore after coaching in each part of the previous exercise. For largesamples, the confidence interval says, “Yes, the mean score is higher aftercoaching, but only by a small amount.”

A local television station announces a question for a call-in opinion poll on the six o’clock news and then gives the response on theeleven o’clock news. Today’s question concerns a proposed increase in funds

� �� �

Page 51: chap_06

Introduction to Inference410 CHAPTER 6 �

6.4 Power and Inference as a Decision�

6.76 Extrasensory perception.

6.77 More than one test.Bonferroni procedure

6.78 More than one test.

6.79 Many tests.

Bonferroniprocedure

*Although the topics in this section are important in planning and interpreting significance tests,they can be omitted without loss of continuity.

Although we prefer to use -values rather than the reject-or-not view of thefixed significance test, the latter view is important for planning statistical

P .

P

allany

P . .k

k

P

.

. P

X

P

for student loans. Of the 2372 calls received, 1921 oppose the increase. Thestation, following standard statistical practice, makes a confidence statement:“81% of the Channel 13 Pulse Poll sample oppose the increase. We can be95% confident that the proportion of all viewers who oppose the increaseis within 1.6% of the sample result.” Is the station’s conclusion justified?Explain your answer.

A researcher looking for evidence of extrasensoryperception (ESP) tests 500 subjects. Four of these subjects do significantlybetter ( 0 01) than random guessing.

(a) Is it proper to conclude that these four people have ESP? Explain youranswer.

(b) What should the researcher now do to test whether any of these foursubjects have ESP?

A -value based on a single test is misleading if youperform several tests. The gives a significance level forseveral tests together. Level then means that if the null hypotheses aretrue, the probability is that of the tests rejects its null hypothesis.

If you perform 2 tests and want to use the 5% significance level,Bonferroni says to require a -value of 0 05/2 0 025 to declare eitherone of the tests significant. In general, if you perform tests and wantprotection at level , use / as your cutoff for statistical significance for eachtest.

You perform 6 tests and obtain individual -values of 0.476, 0.032,0.241, 0.008, 0.010, and 0.001. Which of these are statistically significantusing the Bonferroni procedure with 0 05?

Refer to the previous problem. A researcher has per-formed 12 tests of significance and wants to apply the Bonferroni procedurewith 0 05. The calculated -values are 0.041, 0.569, 0.050, 0.416,0.001, 0.004, 0.256, 0.041, 0.888, 0.010, 0.002, and 0.433. Which of thesetests reject their null hypotheses with this procedure?

Long ago, a group of psychologists carried out 77 separatesignificance tests and found that 2 were significant at the 5% level. Supposethat these tests are independent of each other. (In fact, they were notindependent, because all involved the same subjects.) If all of the nullhypotheses are true, each test has probability 0.05 of being significant at the5% level.

(a) What is the distribution of the number of tests that are significant?

(b) Find the probability that 2 or more of the tests are significant.

Page 52: chap_06

6.4 Power and Inference as a Decision 411

Does exercise make strong bones?EXAMPLE 6.20

The power of a statistical test

POWER

power

a

0

0

0

0

0

0

0

0

studies. We will first explain why, then discuss different views of statisticaltests.

In examining the usefulness of a confidence interval, we are concerned withboth the level of confidence and the margin of error. The confidence leveltells us how reliable the method is in repeated use. The margin of error tellsus how sensitive the method is, that is, how closely the interval pins downthe parameter being estimated. Fixed level significance tests are closelyrelated to confidence intervals—in fact, we saw that a two-sided test can becarried out directly from a confidence interval. The significance level, like theconfidence level, says how reliable the method is in repeated use. If we use5% significance tests repeatedly when is in fact true, we will be wrong(the test will reject ) 5% of the time and right (the test will fail to reject

) 95% of the time.High confidence is of little value if the interval is so wide that few values

of the parameter are excluded. Similarly, a test with a small level of is oflittle value if it almost never rejects even when the true parameter valueis far from the hypothesized value. We must be concerned with the ability ofa test to detect that is false, just as we are concerned with the marginof error of a confidence interval. This ability is measured by the probabilitythat the test will reject when an alternative is true. The higher thisprobability is, the more sensitive the test is.

The probability that a fixed level significance test will rejectwhen a particular alternative value of the parameter is true is calledthe of the test.

We will answer this question by calculating the power of the significancetest that will be used to evaluate the data to be collected. The calculationconsists of three steps:

1. State , , the particular alternative we want to detect, and thesignificance level .

HH

H

H

H

H

H

H H

Can a 6-month exercise program increase the total body bone mineral content(TBBMC) of young women? A team of researchers is planning a study to examinethis question. Based on the results of a previous study, they are willing to assumethat 2 for the percent change in TBBMC over the 6-month period. A changein TBBMC of 1% would be considered important, and the researchers wouldlike to have a reasonable chance of detecting a change this large or larger. Are 25subjects a large enough sample for this project?

Page 53: chap_06

Introduction to Inference412 CHAPTER 6 �

Step 1

Step 2

Step 3

� ��

� �� �

� �

��

��

��

a

� �

� �� �

� �

��

�� �

� �

� �

0

0

0

0

0

0

0

2. Find the values of that will lead us to reject .

3. Calculate the probability of observing these values of when the alter-native is true.

The null hypothesis is that the exercise program has no effect onTBBMC. In other words, the mean percent change is zero. The alternative isthat exercise is beneficial; that is, the mean change is positive. Formally, wehave

: 0

: 0

The alternative of interest is 1% increase in TBBMC. A 5% test ofsignificance will be used.

The test rejects at the 0 05 level whenever

01 645

/ 2/ 25

Be sure you understand why we use 1.645. Rewrite this in terms of :

2reject when 1 645

25

reject when 0 658

Because the significance level is 0 05, this event has probability 0.05of occurring .

The power for the alternative 1% is the probability thatwill be rejected . We calculate this probability by

standardizing , using the value 1, the population standard deviation2, and the sample size 25. The power is

0 658 1( 0 658 when 1)

/ 2/ 25

( 0 855) 0 80

Figure 6.14 illustrates the power with the sampling distribution ofwhen 1. This significance test rejects the null hypothesis that exercisehas no effect on TBBMC 80% of the time if the true effect of exercise isa 1% increase in TBBMC. If the true effect of exercise is greater than a1% increase, the test will have greater power; it will reject with a higherprobability.

High power is desirable. Along with 95% confidence intervals and5% significance tests, 80% power is becoming a standard. Many U.S.government agencies that provide research funds require that the sample sizefor the funded studies be sufficient to detect important results 80% of thetime using a 5% test of significance.

x H

x

H

H

z H .

x xz .

n

x

H x .

H x .

.when the population mean is 0

H when in fact 1x

n

x .P x . P

n

P Z . .

x

Page 54: chap_06

Fail to reject H0 Reject H0

= 0.05α

–2 –1 0 10.658 2 3

0.658

Fail to reject H0

Reject H0

Power = 0.80

–2 –1 0 1 2 3Increase

Increase

Distributionof x when = 1µ

Distributionof x when = 0µ

6.4 Power and Inference as a Decision 413

FIGURE 6.14

Increasing the power

The sampling distributions of when 0 andwhen 1 with and the power. Power is the probability that thetest rejects when the alternative is true.0

x

H

�� �

� �

��

a

��

0

0

0

��

Suppose you have performed a power calculation and found that the poweris too small. What can you do to increase it? Here are four ways:

Increase . A 5% test of significance will have a greater chance of rejectingthe alternative than a 1% test because the strength of evidence requiredfor rejection is less.

Consider a particular alternative that is farther away from . Values ofthat are in but lie close to the hypothesized value are harder to

detect (lower power) than values of that are far from .

Increase the sample size. More data will provide more information aboutso we have a better chance of distinguishing values of .

Decrease . This has the same effect as increasing the sample size: moreinformation about . Improving the measurement process and restrictingattention to a subpopulation are two common ways to decrease .

Power calculations are important in planning studies. Using a significancetest with low power makes it unlikely that you will find a significant effecteven if the truth is far from the null hypothesis. A null hypothesis that is in factfalse can become widely believed if repeated attempts to find evidence againstit fail because of low power. The following example illustrates this point.

H

x

Page 55: chap_06

Introduction to Inference414 CHAPTER 6

Are stock markets efficient?

Power for a two-sided test

EXAMPLE 6.21

EXAMPLE 6.22

a

a

.

� �

� �

� �

� �

� �

� �

� �

0

0

0

0

13

0

0

Here is another example of a power calculation, this time for a two-sidedtest.

H HH

HH

H .

H .

.H z .

x .z

.

z . x .

z . x .

computedassuming that the alternative 0.845 is true

x . .P x . P

n .

P Z .

x . .P x . P

n .

P Z . .

z

The “efficient market hypothesis” for stock prices says that future stock pricesshow only random variation. No information available now will help us predictstock prices in the future, because the efficient working of the market has alreadyincorporated all available information in the present price. Many studies testedthe claim that one or another kind of information is helpful. In these studies, theefficient market hypothesis is , and the claim that prediction is possible is .Almost all studies failed to find good evidence against . As a result, the efficientmarket theory became quite popular.

An examination of the significance tests employed found that the power wasgenerally low. Failure to reject when using tests of low power is not evidencethat is true. As one expert said, “The widespread impression that there is strongevidence for market efficiency may be due just to a lack of appreciation of the lowpower of many statistical tests.” More careful studies later showed that the sizeof a company and measures of value such as the ratio of stock price to earnings dohelp predict future stock price movements.

Example 6.15 (page 395) presented a test of

: 0 86

: 0 86

at the 1% level of significance. What is the power of this test against the specificalternative 0 845?

The test rejects when 2 576. The test statistic is

0 86

0 0068/ 3

Some arithmetic shows that the test rejects when either of the following is true:

2 576 (in other words, 0 870)

2 576 (in other words, 0 850)

These are disjoint events, so the power is the sum of their probabilities,. We find that

0 87 0 845( 0 87)

/ 0 0068/ 3

( 6 37) 0

0 85 0 845( 0 85)

/ 0 0068/ 3

( 1 27) 0 8980

� �

� � �

� �� �

� �� �

Page 56: chap_06

Reject H0

Reject H0Fail to reject H0

Power = 0.8980

0.845 0.850 0.870(alternative µ)µ

6.4 Power and Inference as a Decision 415

FIGURE 6.15

Inference as decision*

The power for Example 6.22.

Acceptance sampling

a

acceptancesampling

*The purpose of this section is to clarify the reasoning of significance tests by contrast with a relatedtype of reasoning. It can be omitted without loss of continuity.

0

0 0

0

We have presented tests of significance as methods for assessing the strengthof evidence against the null hypothesis. This assessment is made by the

-value, which is a probability computed under the assumption that istrue. The alternative hypothesis (the statement we seek evidence for) entersthe test only to show us what outcomes count against the null hypothesis.Most users of statistics think of tests in this way.

But signs of another way of thinking were present in our discussionof significance tests with fixed level . A level of significance chosen inadvance points to the outcome of the test as a . If the -value is lessthan , we reject in favor of . Otherwise we fail to reject . Thereis a big distinction between measuring the strength of evidence and makinga decision. Many statisticians feel that making decisions is too grand a goalfor statistics alone. Decision makers should take account of the results ofstatistical studies, but their decisions are based on many additional factorsthat are hard to reduce to numbers.

Yet there are circumstances in which a statistical test leads directly toa decision. is one such circumstance. A producer ofbearings and the consumer of the bearings agree that each carload lotshall meet certain quality standards. When a carload arrives, the consumerinspects a random sample of bearings. On the basis of the sample outcome,the consumer either accepts or rejects the carload. Some statisticians argue

H

P H

decision PH H H

Figure 6.15 illustrates this calculation. Because the power is about 0.9, we arequite confident that the test will reject when this alternative is true.

Page 57: chap_06

Type Ierror

Correctdecision

Correctdecision

Type IIerror

Truth aboutthe population

H0 true Ha true

Decisionbased on sample

Reject H0

Accept H0

Introduction to Inference416 CHAPTER 6

FIGURE 6.16

Two types of error

The two types of error in testing hypotheses.

TYPE I AND TYPE II ERRORS

Type Ierror.

Type IIerror.

a

a

a

a a

a

0

0

0 0

0

0 0

0

0

0

that if “decision” is given a broad meaning, almost all problems of statisticalinference can be posed as problems of making decisions in the presenceof uncertainty. We will not venture further into the arguments over howwe ought to think about inference. We do want to show how a differentconcept—inference as decision—changes the reasoning used in tests ofsignificance.

Tests of significance concentrate on , the null hypothesis. If a decision iscalled for, however, there is no reason to single out . There are simply twohypotheses, and we must accept one and reject the other. It is convenientto call the two hypotheses and , but no longer has the specialstatus (the statement we try to find evidence against) that it had in tests ofsignificance. In the acceptance sampling problem, we must decide between

: the lot of bearings meets standards

: the lot does not meet standards

on the basis of a sample of bearings.We hope that our decision will be correct, but sometimes it will be

wrong. There are two types of incorrect decisions. We can accept a bad lotof bearings, or we can reject a good lot. Accepting a bad lot injures theconsumer, while rejecting a good lot hurts the producer. To help distinguishthese two types of error, we give them specific names.

If we reject (accept ) when in fact is true, this is a

If we accept (reject ) when in fact is true, this is a

The possibilities are summed up in Figure 6.16. If is true, our decisioneither is correct (if we accept ) or is a Type I error. If is true, our

HH

H H H

H

H

H H H

H H H

HH H

Page 58: chap_06

Truth about the lot

Decisionbased on sample

Rejectthe lot

Acceptthe lot

Type Ierror

Correctdecision

Does meetstandards

Does not meet standards

Correctdecision

Type IIerror

6.4 Power and Inference as a Decision 417

FIGURE 6.17

Are the bearings acceptable?EXAMPLE 6.23

Error probabilities

The two types of error in the accep-tance sampling setting.

a

0

0 0

0

0

0

decision either is correct or is a Type II error. Only one error is possibleat one time. Figure 6.17 applies these ideas to the acceptance samplingexample.

We can assess any rule for making decisions in terms of the probabilitiesof the two types of error. This is in keeping with the idea that statisticalinference is based on probability. We cannot (short of inspecting the wholelot) guarantee that good lots of bearings will never be rejected and bad lotswill never be accepted. But by random sampling and the laws of probability,we can say what the probabilities of both kinds of error are.

Significance tests with fixed level give a rule for making decisions,because the test either rejects or fails to reject it. If we adopt the decision-making way of thought, failing to reject means deciding that is true.We can then describe the performance of a test by the probabilities of TypeI and Type II errors.

.

H

H

z

xz

.

H

z . z .

H

HH H

The mean diameter of a type of bearing is supposed to be 2.000 centimeters (cm).The bearing diameters vary Normally with standard deviation 0 010 cm.When a lot of the bearings arrives, the consumer takes an SRS of 5 bearings fromthe lot and measures their diameters. The consumer rejects the bearings if thesample mean diameter is significantly different from 2 at the 5% significance level.

This is a test of the hypotheses

: 2

: 2

To carry out the test, the consumer computes the statistic:

2

0 01/ 5

and rejects if

1 96 or 1 96

A Type I error is to reject when in fact 2.

� �

Page 59: chap_06

Reject H0

Accept H0 Reject H0

= 2(H0)µ = 2.015

(Ha)µCritical

value of xCriticalvalue of x

Introduction to Inference418 CHAPTER 6

FIGURE 6.18

The two error probabilities for Example 6.23. Theprobability of a Type I error ( ) is the probability of rejecting

: 2 when in fact 2. The probability of a Type II error () is the probability of accepting when in fact 2.015.

0

0

light shaded areaH darkshaded area H

SIGNIFICANCE AND TYPE I ERROR

a

� �

� ��

0

0

0 0

0

� ��

� � �

0

0

0 0

The probability of a Type I error is the probability of rejecting whenit is really true. In Example 6.23, this is the probability that 1 96when 2. But this is exactly the significance level of the test. The criticalvalue 1.96 was chosen to make this probability 0.05, so we do not have tocompute it again. The definition of “significant at level 0.05” is that sampleoutcomes this extreme will occur with probability 0.05 when is true.

The significance level of any fixed level test is the probability of aType I error. That is, is the probability that the test will reject thenull hypothesis when is in fact true.

The probability of a Type II error for the particular alternative 2 015in Example 6.23 is the probability that the test will fail to reject when

has this alternative value. The of the test against the alternative

H

H .

x . HH . H

Hz .

H

H H

.H

power

What about Type II errors? Because there are many values of in , wewill concentrate on one value. The producer and the consumer agree that a lotof bearings with mean 0.015 cm away from the desired mean 2.000 should berejected. So a particular Type II error is to accept when in fact 2 015.

Figure 6.18 shows how the two probabilities of error are obtained from thetwo sampling distributions of , for 2 and for 2 015. When 2,is true and to reject is a Type I error. When 2 015, accepting is a TypeII error. We will now calculate these error probabilities.

� � ��

Page 60: chap_06

6.4 Power and Inference as a Decision 419

The common practice of testing hypotheses

POWER AND TYPE II ERROR

a

� 0

0

0

0

0 0

0

0

2 015 is just the probability that the test reject . By followingthe method of Example 6.22, we can calculate that the power is about 0.92.The probability of a Type II error is therefore 1 0 92, or 0.08.

The power of a fixed level test against a particular alternative is 1minus the probability of a Type II error for that alternative.

The two types of error and their probabilities give another interpretationof the significance level and power of a test. The distinction between testsof significance and tests as rules for deciding between two hypothesesdoes not lie in the calculations but in the reasoning that motivates thecalculations. In a test of significance we focus on a single hypothesis ( )and a single probability (the -value). The goal is to measure the strengthof the sample evidence against . Calculations of power are done to checkthe sensitivity of the test. If we cannot reject , we conclude only thatthere is not sufficient evidence against , not that is actually true. Ifthe same inference problem is thought of as a decision problem, we focuson two hypotheses and give a rule for deciding between them based on thesample evidence. We therefore must focus equally on two probabilities, theprobabilities of the two types of error. We must choose one or the otherhypothesis and cannot abstain on grounds of insufficient evidence.

Such a clear distinction between the two ways of thinking is helpful forunderstanding. In practice, the two approaches often merge. We continuedto call one of the hypotheses in a decision problem . The common practiceof mixes the reasoning of significance tests and decisionrules as follows:

1. State and just as in a test of significance.

2. Think of the problem as a decision problem, so that the probabilities ofType I and Type II errors are relevant.

3. Because of step 1, Type I errors are more serious. So choose an(significance level) and consider only tests with probability of Type Ierror no greater than .

4. Among these tests, select one that makes the probability of a Type II erroras small as possible (that is, power as large as possible). If this probabilityis too large, you will have to take a larger sample to reduce the chance ofan error.

Testing hypotheses may seem to be a hybrid approach. It was, historically,the effective beginning of decision-oriented ideas in statistics. An impressivemathematical theory of hypothesis testing was developed between 1928 and

. does H

.

HPH

HH H

Htesting hypotheses

H H

Page 61: chap_06

Introduction to Inference420 CHAPTER 6

ECTION UMMARY

ECTION XERCISES

S

S

6.4 S

6.4 E

power

Type I errorType II error

6.80 Net income of banks.

6.81 Fill the bottles.

a

a

a

a

0

0

0 0

0

0

0

0

1938 by Jerzy Neyman and Egon Pearson. The decision-making approachcame later (1940s). Because decision theory in its pure form leaves you withtwo error probabilities and no simple rule on how to balance them, it hasbeen used less often than either tests of significance or tests of hypotheses.

The of a significance test measures its ability to detect analternative hypothesis. Power against a specific alternative is calculatedas the probability that the test will reject when that alternative is true.This calculation requires knowledge of the sampling distribution of the teststatistic under the alternative hypothesis. Increasing the size of the sampleincreases the power when the significance level remains fixed.

In the case of testing versus , decision analysis chooses a decisionrule on the basis of the probabilities of two types of error. Aoccurs if is rejected when it is in fact true. A occurs ifis accepted when in fact is true.

In a fixed level significance test, the significance level is theprobability of a Type I error, and the power against a specific alternative is1 minus the probability of a Type II error for that alternative.

�H H

. Hz . z .

xz

.

HH

n

xz

H z .

H

H H

H HH

In Example 6.11, we ask if the net income ofcommunity banks has changed in the last year. If is the mean percentchange for all such banks, the hypotheses are : 0 and : 0. Thedata come from an SRS of 110 banks and we assume that the populationstandard deviation is 26 4. The test rejects at the 1% level ofsignificance when 2 576 or 2 576, where

0

26 4/ 110

Is this test sufficiently sensitive to usually detect a percent change in netincome of 5%? Answer this question by calculating the power of the testagainst the alternative 5.

Example 6.17 discusses a test of : 300 against: 300, where is the mean content of cola bottles, in milliliters (ml).

The sample size is 6, and the population is assumed to have a Normaldistribution with 3. The test statistic is therefore

300

3/ 6

The test rejects at the 5% level of significance if 1 645. Powercalculations help us see how large a shortfall in bottle content the test canbe expected to detect.

(a) Find the power of this test against the alternative 298 ml.

(b) Is the power against 295 ml higher or lower than the value youfound in (a)? Why?

� � �

� �

� �

� �

Page 62: chap_06

6.4 Power and Inference as a Decision 421

6.82 Mail-order catalog sales.

6.83 Fill the bottles.

6.84 Net income of banks.

6.85 Fill the bottles.

6.86 Decide.

a

a

� �

� �

0

0 0

0

0

0

0 1

0

1

0 0

1

0

0

n

H

H

H x H

H

H

H z . z

xz

X p px X

x

p

p

X

H p

H p

H X XH

You want to see if a redesign of the cover ofa mail-order catalog will increase sales. A very large number of customerswill receive the original catalog, and a random sample of customers willreceive the one with the new cover. For planning purposes, you are willingto assume that the mean sales from the new catalog will be approximatelyNormal with 50 dollars and that the mean for the original catalog willbe 25 dollars. You decide to use a sample size of 900. You wishto test

: 25

: 25

You decide to reject if 26 and to accept otherwise.

(a) Find the probability of a Type I error, that is, the probability that yourtest rejects when in fact 25 dollars.

(b) Find the probability of a Type II error when 28 dollars. This is theprobability that your test accepts when in fact 28.

(c) Find the probability of a Type II error when 30.

(d) The distribution of sales is not Normal because many customers buynothing. Why is it nonetheless reasonable in this circumstance to assumethat the mean will be approximately Normal?

Increasing the sample size increases the power of a test whenthe level is unchanged. Suppose that in Exercise 6.81 a sample of 20bottles rather than 6 bottles had been measured. The significance test stillrejects when 1 645, but the statistic is now

300

3/ 20

Find the power of this test against the alternative 298. Compare yourresult with the power from Exercise 6.81.

Use the result of Exercise 6.80 to give the probabilitiesof Type I and Type II errors for the test discussed there.

Use the result of Exercise 6.81(a) to give the probability of aType I error and the probability of a Type II error for the test in that exercise.

You must decide which of two discrete distributions a randomvariable has. We will call the distributions and . Here are theprobabilities that they assign to the values of :

0 1 2 3 4 5 6

0.1 0.1 0.1 0.1 0.2 0.1 0.3

0.2 0.1 0.1 0.2 0.2 0.1 0.1

You have a single observation on and wish to test

: is correct

: is correct

One possible decision procedure is to accept if 4 or 6 andreject otherwise.

� �

Page 63: chap_06

Introduction to Inference422 CHAPTER 6

TATISTICS IN UMMARY

S S

6.87 A Web-based business.

6.88 (Optional) Acceptance sampling.

0 0

Statistical inference draws conclusions about a population on the basis ofsample data and uses probability to indicate how reliable the conclusionsare. A confidence interval estimates an unknown parameter. A significancetest shows how strong the evidence is for some claim about a parameter.

The probabilities in both confidence intervals and tests tell us whatwould happen if we used the recipe for the interval or test very many times.A confidence level is the probability that the recipe for a confidence intervalactually produces an interval that contains the unknown parameter. A 95%confidence interval gives a correct result 95% of the time when we use itrepeatedly. A -value is the probability that the sample would produce aresult at least as extreme as the observed result if the null hypothesis reallywere true. That is, a -value tells us how surprising the observed outcomeis. Very surprising outcomes (small -values) are good evidence that the nullhypothesis is not true.

H p

P

PP

(a) Find the probability of a Type I error, that is, the probability that youreject when is the correct distribution.

(b) Find the probability of a Type II error.

You are in charge of marketing for a Web site thatoffers automated medical diagnoses. The program will scan the results ofroutine medical tests (pulse rate, blood pressure, urinalysis, etc.) and eitherclear the patient or refer the case to a doctor. You are marketing the programfor use as part of a preventive-medicine system to screen many thousands ofpersons who do not have specific medical complaints. The program makes adecision about each patient.

(a) What are the two hypotheses and the two types of error that the programcan make? Describe the two types of error in terms of “false-positive”and “false-negative” test results.

(b) The program can be adjusted to decrease one error probability at the costof an increase in the other error probability. Which error probabilitywould you choose to make smaller, and why? (This is a matter ofjudgment. There is no single correct answer.)

The acceptance sampling test in Exam-ple 6.23 has probability 0.05 of rejecting a good lot of bearings andprobability 0.08 of accepting a bad lot. The consumer of the bearings mayimagine that acceptance sampling guarantees that most accepted lots aregood. Alas, it is not so. Suppose that 90% of all lots shipped by the producerare bad.

(a) Draw a tree diagram for shipping a lot (the branches are “bad” and“good”) and then inspecting it (the branches at this stage are “accept”and “reject”).

(b) Write the appropriate probabilities on the branches, and find the prob-ability that a lot shipped is accepted.

(c) Use the definition of conditional probability or Bayes’s formula (page349) to find the probability that a lot is bad, given that the lot isaccepted. This is the proportion of bad lots among the lots that thesampling plan accepts.

Page 64: chap_06

Chapter 6 Review Exercises 423

HAPTER EVIEW XERCISESC 6 R E

A. CONFIDENCE INTERVALS

B. SIGNIFICANCE TESTS

6.89 Company cash flow and investment.

�� �

��

These ideas are the foundation for the rest of this book. We will havemuch to say about many statistical methods and their use in practice. Inevery case, the basic reasoning of confidence intervals and significance testsremains the same. Here are the most important things you should be able todo after studying this chapter.

1. State in nontechnical language what is meant by “95% confidence”or other statements of confidence in statistical reports.

2. Calculate a confidence interval for the mean of a Normal populationwith known standard deviation , using the recipe / .

3. Recognize when you can safely use this confidence interval recipe andwhen the data collection design or a small sample from a skewedpopulation makes it inaccurate. Understand that the margin of errordoes not include the effects of undercoverage, nonresponse, or otherpractical difficulties.

4. Understand how the margin of error of a confidence interval changeswith the sample size and the level of confidence .

5. Find the sample size required to obtain a confidence interval ofspecified margin of error when the confidence level and otherinformation are given.

1. State the null and alternative hypotheses in a testing situation whenthe parameter in question is a population mean .

2. Explain in nontechnical language the meaning of the -value whenyou are given the numerical value of for a test.

3. Calculate the one-sample statistic and the -value for both one-sidedand two-sided tests about the mean of a Normal population.

4. Assess statistical significance at standard levels , either by comparingto or by comparing to standard Normal critical values .

5. Recognize that significance testing does not measure the size or im-portance of an effect.

6. Recognize when you can use the test and when the data collec-tion design or a small sample from a skewed population makes itinappropriate.

x z n

C

m

PP

z P

P z z

z

How much a company invests in itsbusiness depends on how much cash flow it has. What factors influence therelationship between cash flow and investment? Here’s a clever suggestion: Ifan industry has an active market in used equipment, investment will be lesssensitive to cash flow (“lower elasticity” in economic jargon). Companiesin these industries can borrow easily because lenders know they can sellthe company’s equipment if it defaults. A study of 270 manufacturing

Page 65: chap_06

Introduction to Inference424 CHAPTER 6 �

6.90 Foreign investment and exchange rates.

6.91 Wine.

6.92 Annual household income.

14

15

,

industries measured SHRUSED, the proportion of secondhand equipmentin the industry’s total investment. The study found that “industries withSHRUSED values above the median have smaller cash flow elasticitiesthan those with lower SHRUSED values; the difference is significant atthe 5% level in the full sample.” Explain to someone who knows nostatistics why this study gives good reason to think that an active used-equipment market really does change the relationship between cash flow andinvestment.

We might suspect that foreigndirect investment (FDI), in which U.S. companies buy or build facilitiesoverseas, depends on the rate at which the dollar can be exchanged withforeign currencies. A study of 3036 FDI transactions found that “there is nostatistically significant relationship between the level of the exchange rateand foreign investment relative to domestic investment.”

(a) Explain this conclusion to someone who knows no statistics.

(b) We are reasonably confident that if there were a relationship betweenFDI and exchange rate that was large enough to be of interest, this studywould have found it. Why?

Many food products contain small quantities of substances that wouldgive an undesirable taste or smell if they were present in large amounts. Anexample is the “off-odors” caused by sulfur compounds in wine. Oenologists(wine experts) have determined the odor threshold, the lowest concentrationof a compound that the human nose can detect. For example, the odorthreshold for dimethyl sulfide (DMS) is given in the oenology literature as25 micrograms per liter of wine ( g/l). Untrained noses may be less sensitive,however. Here are the DMS odor thresholds for 10 beginning students ofoenology:

31 31 43 36 23 34 32 30 20 24

Assume (this is not realistic) that the standard deviation of the odor thresholdfor untrained noses is known to be 7 g/l.

(a) Make a stemplot to verify that the distribution is roughly symmetricwith no outliers. (A Normal quantile plot confirms that there are nosystematic departures from Normality.)

(b) Give a 95% confidence interval for the mean DMS odor thresholdamong all beginning oenology students.

(c) Are you convinced that the mean odor threshold for beginning studentsis higher than the published threshold, 25 g/l? Carry out a significancetest to justify your answer.

A government report gives a 90% confidenceinterval for the 1999 median annual household income as $40 816 $314.This result was calculated by advanced methods from the Current PopulationSurvey, a multistage random sample of about 50,000 households.

(a) Would a 95% confidence interval be wider or narrower? Explain youranswer.

(b) Would the null hypothesis that the 1999 median household incomewas $40,000 be rejected at the 10% significance level in favor of thetwo-sided alternative?

� �

Page 66: chap_06

Chapter 6 Review Exercises 425

6.93 Annual household income.

6.94 Too much cellulose to be profitable?

6.95 Where do you buy?

Store type

a

x s

0

16

weekly

x

H H

s s

Refer to the previous problem. Give a 90%confidence interval for the 1999 median household income. Use52.14 as the number of weeks in a year.

Excess cellulose in alfalfa reduces the“relative feed value” of the product that will be fed to dairy cows. Ifthe cellulose content is too high, the price will be lower and the producerwill have less profit. An agronomist examines the cellulose content of onetype of alfalfa hay. Suppose that the cellulose content in the populationhas standard deviation 8 milligrams per gram (mg/g). A sample of 15cuttings has mean cellulose content 145 mg/g.

(a) Give a 90% confidence interval for the mean cellulose content in thepopulation.

(b) A previous study claimed that the mean cellulose content was140 mg/g, but the agronomist believes that the mean is higher than thatfigure. State and and carry out a significance test to see if the newdata support this belief.

(c) The statistical procedures used in (a) and (b) are valid when severalassumptions are met. What are these assumptions?

Consumers can purchase nonprescription medicationsat food stores, mass merchandise stores such as Kmart and Wal-Mart, orpharmacies. About 45% of consumers make such purchases at pharmacies.What accounts for the popularity of pharmacies, which often charge higherprices?

A study examined consumers’ perceptions of overall performance of thethree types of stores, using a long questionnaire that asked about such thingsas “neat and attractive store,” “knowledgeable staff,” and “assistance inchoosing among various types of nonprescription medication.” A perfor-mance score was based on 27 such questions. The subjects were 201 peoplechosen at random from the Indianapolis telephone directory. Here are themeans and standard deviations of the performance scores for the sample:

Food stores 18.67 24.95Mass merchandisers 32.38 33.37Pharmacies 48.60 35.62

We do not know the population standard deviations, but a sample standarddeviation from so large a sample is usually close to . Use in place of theunknown in this exercise.

(a) What population do you think that the authors of the study want todraw conclusions about? What population are you certain that they candraw conclusions about?

(b) Give 95% confidence intervals for the mean performance for each typeof store.

(c) Based on these confidence intervals, are you convinced that consumersthink that pharmacies offer higher performance than the other types ofstores? Note that in Chapter 12 we will study a statistical method forcomparing means of several groups.

Page 67: chap_06

Introduction to Inference426 CHAPTER 6 �

6.96 CEO pay.

6.97 Large samples.

6.98 Roulette.

6.99 Significant.

6.100 Significant.

6.101 Welfare reform.

a

a

0

0

0

0

x .s

H

H

s

x HP

x .

P

.

C

P Hn

H H

.

P .

A study of the pay of corporate chief executive officers (CEOs)examined the increase in cash compensation of the CEOs of 104 companies,adjusted for inflation, in a recent year. The mean increase in real com-pensation was 6 9%, and the standard deviation of the increases was

55%. Is this good evidence that the mean real compensation of allCEOs increased that year? The hypotheses are

: 0 (no increase)

: 0 (an increase)

Because the sample size is large, the sample is close to the population , sotake 55%.

(a) Sketch the Normal curve for the sampling distribution of whenis true. Shade the area that represents the -value for the observedoutcome 6 9%.

(b) Calculate the -value.

(c) Is the result significant at the 0 05 level? Do you think the studygives strong evidence that the mean compensation of all CEOs went up?

Statisticians prefer large samples. Describe briefly the effectof increasing the size of a sample (or the number of subjects in an experiment)on each of the following:

(a) The width of a level confidence interval.

(b) The -value of a test, when is false and all facts about the populationremain unchanged as increases.

(c) The power of a fixed level test, when , the alternative hypothesis,and all facts about the population remain unchanged.

A roulette wheel has 18 red slots among its 38 slots. You observemany spins and record the number of times that red occurs. Now you wantto use these data to test whether the probability of a red has the value thatis correct for a fair roulette wheel. State the hypotheses and that youwill test. (We will describe the test for this situation in Chapter 8.)

When asked to explain the meaning of “statistically significantat the 0 05 level,” a student says, “This means there is only probability0.05 that the null hypothesis is true.” Is this a correct explanation ofstatistical significance? Explain your answer.

Another student, when asked why statistical significance appearsso often in research reports, says, “Because saying that results are significanttells us that they cannot easily be explained by chance variation alone.” Doyou think that this statement is correct? Explain your answer.

A study compares two groups of mothers with youngchildren who were on welfare two years ago. One group attended a voluntarytraining program offered free of charge at a local vocational school andadvertised in the local news media. The other group did not choose to attendthe training program. The study finds a significant difference ( 0 01)between the proportions of the mothers in the two groups who are still onwelfare. The difference is not only significant but quite large. The reportsays that with 95% confidence the percent of the nonattending group still

Page 68: chap_06

Chapter 6 Review Exercises 427

6.102 Simulation.

6.103 Simulation.

6.104 Simulation.

6.105 Simulation.

6.106 Simulation.

a

� �

0

0

P .

nN ,

nN ,

H H .

H

.

n

x m m

x

on welfare is 21% 4% higher than that of the group who attended theprogram. You are on the staff of a member of Congress who is interested inthe plight of welfare mothers and who asks you about the report.

(a) Explain briefly and in nontechnical language what “a significant differ-ence ( 0 01)” means.

(b) Explain clearly and briefly what “95% confidence” means.

(c) Is this study good evidence that requiring job training of all welfaremothers would greatly reduce the percent who remain on welfare forseveral years?

Use a computer to generate 5 observations from the Nor-mal distribution (20 5) with mean 20 and standard deviation 5. Find the95% confidence interval for . Repeat this process 100 times and then countthe number of times that the confidence interval includes the value 20.Explain your results.

Use a computer to generate 5 observations from the Nor-mal distribution (20 5) with mean 20 and standard deviation 5. Test

: 20 versus : 20 at the 0 05 significance level. Repeatthis process 100 times and then count the number of times that you reject

. Explain your results.

Use the same procedure for generating data as in the previousexercise. Now test the null hypothesis that 22 5. Explain your results.

Figure 6.2 (page 366) demonstrates the behavior of a confidenceinterval in repeated sampling by showing the results of 25 samples fromthe same population. Now you will do a similar demonstration, though inan artificial setting. Suppose that the net assets of a large population ofcommunity banks follow the Uniform distribution between 20 and 500 (inmillions of dollars). Then the mean assets of these banks are 240 andthe standard deviation is 150.

(a) Simulate the drawing of 25 SRSs of size 100 from this population.

(b) For calculating the confidence intervals, you are willing to assume thatthe sample means are approximately Normal. Explain why this is areasonable assumption.

(c) The 50% confidence interval for the population mean has the form. What is the margin of error ? (Use 150 for the standard

deviation.)

(d) Use your software to calculate the 50% confidence interval for foreach of your 25 samples. Verify the computer’s calculations by checkingthe interval given for the first sample against your result in (b). Use the

reported by the software.

(e) How many of the 25 confidence intervals contain the true mean240? If you repeated the simulation, would you expect exactly the samenumber of intervals to contain ? In a very large number of samples,what percent of the confidence intervals would contain ?

In the previous exercise you simulated the assets of 25 SRSs of100 community banks. Now use these samples to demonstrate the behaviorof a significance test. We know that the population standard deviationis 150 and we are willing to assume that the sample means areapproximately Normal.

��

� �

Page 69: chap_06

Introduction to Inference428 CHAPTER 6

HAPTER ASE TUDY XERCISES

C 6 C S E

CASE STUDY 6.1: Older customers in restaurants.

Question 50–64 65–79

Ambience

Menu design

Service

a

0

0

17

H

H

Px

.P

H

(a) Use your software to carry out a test of

: 240

: 240

for each of the 25 samples.

(b) Verify the computer’s calculations by using Table A to find the -valueof the test for the first of your samples. Use the reported by yoursoftware.

(c) How many of your 25 tests reject the null hypothesis at the 0 05significance level? (That is, how many have -value 0.05 or smaller?)

(d) Because the simulation was done with 240, samples that lead torejecting produce the wrong conclusion. In a very large number ofsamples, what percent would falsely reject the hypothesis?

Persons aged 55 and overrepresented 21.3% of the U.S. population in the year 2000. This group is expectedto increase to 30.5% by 2025. In terms of actual numbers of people, the increaseis from 58.6 million to 101.4 million. Restaurateurs have found this market to beimportant and would like to make their businesses attractive to older customers.One study used a questionnaire to collect data from people aged 50 and over. Forone part of the analysis, individuals were classified into two age groups: 50–64 and65–79. There were 267 people in the first group and 263 in the second. One setof items concerned ambience, menu design, and service. A series of questions wasrated on a 1 to 5 scale with 1 representing “strongly disagree” and 5 representing“strongly agree.” In some cases the wording of questions has been shortened in thetable below. Here are the means:

Most restaurants are too dark 2.75 2.93Most restaurants are too noisy 3.33 3.43Background music is often too loud 3.27 3.55Restaurants are too smoky 3.17 3.12Tables are too small 3.00 3.19Tables are too close together 3.79 3.81

Print size is not large enough 3.68 3.77Glare makes menus difficult to read 2.81 3.01Colors of menus make them difficult to read 2.53 2.72

It is difficult to hear the service staff 2.65 3.00I would rather be served than serve myself 4.23 4.14I would rather pay the server than a cashier 3.88 3.48Service is too slow 3.13 3.10

Page 70: chap_06

Chapter 6 Case Study Exercises 429

CASE STUDY 6.2: Accessibility concerns.

Question 50–64 65–79

Accessibility and comfort inside

Outside accessibility

z P

z

First examine the means of the people who are 50 to 64. Order the statementsaccording to the means and describe the results. Then do the same for the oldergroup. For each question compute the statistic and the associated -value for thecomparison between the two groups. For these calculations you can assume thatthe denominator in the test statistic is 0.08, so is simply the difference in the meansdivided by 0.08. Note that you are performing 13 significance tests in this exercise.Keep this in mind when you interpret your results. Write a report summarizing yourwork.

Refer to the previous question. Thequestionnaire also asked about accessibility both inside the restaurant and outside.Analyze these data and write a report.

Most chairs are too small 2.49 2.56Bench seats are usually too narrow 3.03 3.25Salad bars and buffets are difficult to reach 3.04 3.09Serving myself from salad bars and buffets is difficult 2.58 2.75Floors around salad bars and buffets are often slippery 2.84 3.01Aisles are too narrow 3.04 3.20Bathroom stalls are too narrow 2.82 3.10

Doors are too heavy 2.51 3.01Curbs near entrance are difficult 2.54 3.07Parking spaces are too narrow 2.83 3.16Distance from parking lot is too far 2.33 2.64Parking lots are too dark at night 2.84 3.26