
*Whenever the first word of a problem is preceded by an asterisk, the problem is for enrichment purposes only.

Chapter 1 Lecture Examples: FALL 2011

1. A CM has a sample space that consists of four elements, denoted: a, b, c and d. Assuming the ELC, find the probabilities of each of the following events.

(a) A = {a}

(b) B = {a, b}

(c) C = {b, c, d}

2. Refer to the previous problem. Now, instead of the ELC, assume that the probabilities of a, b, c and d follow the ratio 9:3:3:1. (Note: If interested, see Mendelian inheritance in Wikipedia for a discussion of the 9:3:3:1 ratio, as well as the 1:2:1 and the 3:1 ratios.)

(a) Determine the probabilities of the individual outcomes a, b, c and d.

(b) Calculate the probabilities of the events A, B and C given in the previous problem.
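For readers who want to check problems 1 and 2 by machine, here is a minimal Python sketch (not part of the course materials). It assumes that an outcome's probability is its share of the stated ratio (1:1:1:1 under the ELC, 9:3:3:1 otherwise) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

def outcome_probs(weights):
    """Turn a ratio such as {'a': 9, 'b': 3, 'c': 3, 'd': 1} into probabilities."""
    total = sum(weights.values())
    return {k: Fraction(w, total) for k, w in weights.items()}

def event_prob(event, probs):
    """P(event) is the sum of the probabilities of the outcomes in the event."""
    return sum(probs[o] for o in event)

events = {"A": {"a"}, "B": {"a", "b"}, "C": {"b", "c", "d"}}
elc = outcome_probs({"a": 1, "b": 1, "c": 1, "d": 1})
mendel = outcome_probs({"a": 9, "b": 3, "c": 3, "d": 1})

for name, ev in events.items():
    print(name, event_prob(ev, elc), event_prob(ev, mendel))
```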

3. You are given the following information: the events A and B are disjoint; P(A) = 0.40; and P(B) = 0.25. Calculate the following probabilities.

(a) P(A or B).

(b) P(A^c).

(c) P(B^c).

4. You are given the following information: P(A) = 0.25; P(B) = 0.45; P(AB) = 0.20. Calculate P(A or B).
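The rules used in problems 3 and 4 can be written as two one-line Python functions. This is only a sketch: the general addition rule reduces to the simple one when P(AB) = 0, as it does for disjoint events.

```python
def p_union(p_a, p_b, p_ab=0.0):
    """General addition rule: P(A or B) = P(A) + P(B) - P(AB).
    For disjoint events P(AB) = 0, which gives the simple addition rule."""
    return p_a + p_b - p_ab

def p_complement(p):
    """Complement rule: P(A^c) = 1 - P(A)."""
    return 1 - p

# Problem 3 (A and B disjoint)
print(round(p_union(0.40, 0.25), 2), round(p_complement(0.40), 2), round(p_complement(0.25), 2))
# Problem 4
print(round(p_union(0.25, 0.45, 0.20), 2))
```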

5. What is wrong with each of the following?

(a) P(A) = 0.20; P(B) = 0.55; and P(AB) = 0.25.

(b) P(A) = 0.60; P(B) = 0.55; and A and B are disjoint.

6. Consider a sample space with three members: 1, 2 and 3. Assume the ELC and i.i.d. trials. The following table helps to visualize the results of the first two trials:

              X2
  X1      1       2       3
   1    (1,1)   (1,2)   (1,3)
   2    (2,1)   (2,2)   (2,3)
   3    (3,1)   (3,2)   (3,3)

The nine entries in this table are equally likely.

Define X = X1 + X2, the total of the numbers obtained in the first two trials. Find the sampling distribution of X.

7. Consider a sample space with five members: 0, 1, 2, 3 and 4. Assume the ELC and i.i.d. trials. The following table helps to visualize the results of the first two trials:

                      X2
  X1      0       1       2       3       4
   0    (0,0)   (0,1)   (0,2)   (0,3)   (0,4)
   1    (1,0)   (1,1)   (1,2)   (1,3)   (1,4)
   2    (2,0)   (2,1)   (2,2)   (2,3)   (2,4)
   3    (3,0)   (3,1)   (3,2)   (3,3)   (3,4)
   4    (4,0)   (4,1)   (4,2)   (4,3)   (4,4)

The 25 entries in this table are equally likely.

Define X = X1X2, the product of the numbers obtained in the first two trials. Find the sampling distribution of X.
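Problems 6 and 7 both amount to listing the equally likely cells of the table and counting. A short Python sketch of that enumeration follows; the helper name `two_trial_distribution` is ours, not the course's.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_trial_distribution(values, combine):
    """Sampling distribution of combine(X1, X2) for two i.i.d. trials under the ELC.
    Every ordered pair (x1, x2) has probability 1 / len(values)**2."""
    pairs = list(product(values, repeat=2))
    counts = Counter(combine(x1, x2) for x1, x2 in pairs)
    return {x: Fraction(c, len(pairs)) for x, c in sorted(counts.items())}

sum_dist = two_trial_distribution([1, 2, 3], lambda a, b: a + b)          # problem 6
prod_dist = two_trial_distribution([0, 1, 2, 3, 4], lambda a, b: a * b)  # problem 7
print(sum_dist)
print(prod_dist)
```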


8. Consider a sample space with three members: 1, 2 and 3. Do not assume the ELC. Instead assume the following:

P(1) = 0.2, P(2) = 0.1 and P(3) = 0.7.

Assume i.i.d. trials. The following table helps to visualize the results of the first two trials:

              X2
  X1      1       2       3
   1    (1,1)   (1,2)   (1,3)
   2    (2,1)   (2,2)   (2,3)
   3    (3,1)   (3,2)   (3,3)

Note that these nine entries are not equally likely.

Define X = X1 + X2, the total of the numbers obtained in the first two trials. Find the sampling distribution of X.

9. Refer to the previous question. Let X = X1 + X2 + X3, the total of the numbers obtained in the first three trials. Find the sampling distribution of X.
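When the cells are not equally likely, each cell's probability is the product of the two trial probabilities (independence), and probabilities of cells with the same total are added. The Python sketch below does exactly this and then repeats it once more for problem 9; exact fractions are an implementation choice.

```python
from fractions import Fraction

def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent random variables."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return dict(sorted(out.items()))

# Problem 8: P(1) = 0.2, P(2) = 0.1, P(3) = 0.7
one_trial = {1: Fraction(2, 10), 2: Fraction(1, 10), 3: Fraction(7, 10)}
two_trials = convolve(one_trial, one_trial)       # X = X1 + X2
three_trials = convolve(two_trials, one_trial)    # problem 9: X = X1 + X2 + X3
print({x: float(p) for x, p in two_trials.items()})
print({x: float(p) for x, p in three_trials.items()})
```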

10. Consider the CM: Cast a balanced die five times and compute the sum, X, of the five numbers obtained. Assume independence of casts. The table below presents a huge amount of information: the exact probabilities for X; the computer simulation approximations based on m = 100,000 runs; and the nearly certain interval for each P(X = x). Note that every nearly certain interval contains the exact probability; i.e., every one is correct.

 x     Exact       Simulation    Nearly Certain Interval
       P(X = x)    r.f.(X = x)   Lower      Upper
 5     0.00013     0.00014       -          -
 6     0.00064     0.00067       0.00042    0.00092
 7     0.00193     0.00176       0.00136    0.00216
 8     0.00450     0.00445       0.00382    0.00508
 9     0.00900     0.00877       0.00789    0.00965
10     0.01620     0.01606       0.01487    0.01725
11     0.02636     0.02637       0.02485    0.02789
12     0.03922     0.03977       0.03792    0.04162
13     0.05401     0.05482       0.05266    0.05698
14     0.06944     0.06955       0.06714    0.07196
15     0.08372     0.08492       0.08228    0.08756
16     0.09452     0.09628       0.09348    0.09908
17     0.10031     0.09777       0.09495    0.10059
18     0.10031     0.09965       0.09681    0.10249
19     0.09452     0.09498       0.09220    0.09776
20     0.08372     0.08297       0.08035    0.08559
21     0.06944     0.06882       0.06642    0.07122
22     0.05401     0.05407       0.05192    0.05622
23     0.03922     0.03813       0.03631    0.03995
24     0.02636     0.02646       0.02494    0.02798
25     0.01620     0.01691       0.01569    0.01813
26     0.00900     0.00954       0.00862    0.01046
27     0.00450     0.00447       0.00384    0.00510
28     0.00193     0.00182       0.00142    0.00222
29     0.00064     0.00072       0.00047    0.00097
30     0.00013     0.00013       -          -
Total  0.99996     1.00000

You will be asked to use this table in Chapter 1 homework.
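The exact column can be rebuilt by adding one die at a time, and the interval columns appear to be r.f. ± 3√(r.f.(1 − r.f.)/m); that formula is our reading of the table, checked here against the row x = 17. A Python sketch:

```python
import math
from fractions import Fraction

def dice_sum_distribution(n_dice):
    """Exact distribution of the sum of n balanced dice, built one die at a time."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        new = {}
        for s, p in dist.items():
            for face in range(1, 7):
                new[s + face] = new.get(s + face, 0) + p * Fraction(1, 6)
        dist = new
    return dict(sorted(dist.items()))

def nearly_certain_interval(rf, m):
    """Assumed form: r.f. +/- 3 * sqrt(r.f. * (1 - r.f.) / m)."""
    half = 3 * math.sqrt(rf * (1 - rf) / m)
    return rf - half, rf + half

exact = dice_sum_distribution(5)
print(round(float(exact[17]), 5))                  # exact P(X = 17)
lo, hi = nearly_certain_interval(0.09777, 100_000)
print(round(lo, 5), round(hi, 5))                  # compare with the x = 17 row
```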



1. Anna likes to play basketball. Assume that Anna’s free throw attempts are BT with p = 0.65.

(a) Anna will shoot four free throws. Calculate the probability that she obtains S, S, F, S, in that order.

(b) Anna will shoot five free throws. Calculate the probability that she obtains a total of exactly four successes.

(c) Next week, Anna will shoot five free throws on each of four days: Monday through Thursday. If she makes exactly four free throws on a particular day, we say that the event ‘Brad’ has occurred.

i. Calculate the probability that Brad will occur on Monday and Tuesday and not occur on Wednesday and Thursday. (I am asking for one answer.)

ii. Calculate the probability that Brad will occur a total of exactly two times next week.

(d) Next week Anna will shoot four free throws on Friday. The number of free throws she shoots on Saturday will equal the number that she makes on Friday.

Let Y denote the total number of free throws that Anna makes on Friday and Saturday combined.

i. Calculate P(Y = 2).

ii. Calculate P(Y = 6).
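One way to organize the calculations in problem 1 is sketched below in Python. Part (c) treats ‘Brad’ as a success on each day, with probability equal to the answer to (b); part (d) conditions on Friday's number of makes f, so Saturday's count is Bin(f, p).

```python
from math import comb

p = 0.65
q = 1 - p

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# (a) A specific sequence S, S, F, S: multiply trial probabilities (independence).
p_a = p * p * q * p
# (b) Exactly four successes in five BT.
p_b = binom_pmf(4, 5, p)
# (c) 'Brad' occurs on a day with probability p_b; days are independent.
p_c1 = p_b**2 * (1 - p_b)**2       # Brad on Mon and Tue, not on Wed and Thu
p_c2 = binom_pmf(2, 4, p_b)        # Brad on exactly two of the four days
# (d) Friday makes F ~ Bin(4, p); given F = f, Saturday makes ~ Bin(f, p).
def p_y(y):
    return sum(binom_pmf(f, 4, p) * binom_pmf(y - f, f, p)
               for f in range(5) if 0 <= y - f <= f)

print(round(p_a, 4), round(p_b, 4), round(p_c1, 4), round(p_c2, 4),
      round(p_y(2), 4), round(p_y(6), 4))
```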

2. Refer to the previous problem. Anna will shoot nine free throws on Sunday. Define the following events.

• A: She makes her first two free throws.

• B: She misses her last free throw.

• C: She makes exactly four of her first five free throws.

• D: She makes exactly six of her nine free throws.

Calculate the following probabilities.

(a) P(A).

(b) P(B).

(c) P(AB).

(d) P(C).

(e) P(D).

(f) P(AC).

(g) P(CD).

(h) P(ABCD).
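A brute-force check of problem 2 is easy because there are only 2^9 = 512 sequences of nine free throws. The Python sketch below sums the probabilities of the sequences in each event; it is a check on hand calculations, not a replacement for them.

```python
from itertools import product
from math import prod

p = 0.65

def prob(event):
    """P(event), summing over all 2**9 S/F sequences of nine independent BT."""
    total = 0.0
    for seq in product("SF", repeat=9):
        if event(seq):
            total += prod(p if t == "S" else 1 - p for t in seq)
    return total

A = lambda s: s[0] == "S" and s[1] == "S"   # makes her first two
B = lambda s: s[8] == "F"                   # misses her last
C = lambda s: s[:5].count("S") == 4         # exactly four of her first five
D = lambda s: s.count("S") == 6             # exactly six of nine

answers = {
    "P(A)": prob(A), "P(B)": prob(B), "P(AB)": prob(lambda s: A(s) and B(s)),
    "P(C)": prob(C), "P(D)": prob(D), "P(AC)": prob(lambda s: A(s) and C(s)),
    "P(CD)": prob(lambda s: C(s) and D(s)),
    "P(ABCD)": prob(lambda s: A(s) and B(s) and C(s) and D(s)),
}
for name, value in answers.items():
    print(name, round(value, 4))
```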

3. Let X ∼ Bin(256, 0.50). Calculate the mean, variance and standard deviation of X. Write down the equation of Z, the standardized version of X.

4. Let X ∼ Bin(625, 0.20). For each of the following, use the appropriate website to obtain the exact probability and use a different appropriate website to obtain the normal curve approximation.

(a) P(X ≥ 130).

(b) P(X = 125).

(c) P(X ≤ 140).

(d) P(115 ≤ X ≤ 140).

(e) P(118 ≤ X < 137).
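As an offline cross-check of problems 3 and 4, here is a Python sketch that uses only the standard library. It assumes the normal-curve website applies the usual ±0.5 continuity correction; if it does not, the approximations will differ slightly. Note that X < 137 is the same event as X ≤ 136.

```python
import math
from statistics import NormalDist

def binom_summary(n, p):
    """Mean np, variance npq and standard deviation sqrt(npq)."""
    mean, var = n * p, n * p * (1 - p)
    return mean, var, math.sqrt(var)

# Problem 3: X ~ Bin(256, 0.50); Z = (X - mean) / sd
print(binom_summary(256, 0.50))

# Problem 4: X ~ Bin(625, 0.20)
n, p = 625, 0.20
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mu, _, sigma = binom_summary(n, p)
normal = NormalDist(mu, sigma)

def exact(a, b):
    """P(a <= X <= b) from the binomial pmf."""
    return sum(pmf[a:b + 1])

def approx(a, b):
    """Normal curve approximation to P(a <= X <= b), with continuity correction."""
    return normal.cdf(b + 0.5) - normal.cdf(a - 0.5)

cases = {"X >= 130": (130, n), "X = 125": (125, 125), "X <= 140": (0, 140),
         "115 <= X <= 140": (115, 140), "118 <= X < 137": (118, 136)}
for label, (a, b) in cases.items():
    print(label, round(exact(a, b), 4), round(approx(a, b), 4))
```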



5. *Assume that n = 10 members will be selected at random with (without) replacement from a finite population with N = 200 (N = 400) and p = 0.40. Below is the exact sampling distribution for X, the total number of successes that will be obtained in the sample, both for sampling with and without replacement. Comment.

Each entry is P(X = x).

          N = 200              N = 400
 x     With    Without      With    Without
 0    0.0060   0.0052      0.0060   0.0056
 1    0.0403   0.0373      0.0403   0.0388
 2    0.1209   0.1183      0.1209   0.1196
 3    0.2150   0.2177      0.2150   0.2163
 4    0.2508   0.2573      0.2508   0.2540
 5    0.2007   0.2041      0.2007   0.2024
 6    0.1115   0.1099      0.1115   0.1108
 7    0.0425   0.0397      0.0425   0.0411
 8    0.0106   0.0092      0.0106   0.0099
 9    0.0016   0.0012      0.0016   0.0014
10    0.0001   0.0001      0.0001   0.0001
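The "with" columns are binomial and the "without" columns are hypergeometric (the population has N·p successes). A Python sketch that reproduces the table, so you can see the two columns converge as N grows:

```python
from math import comb

def binom_pmf(x, n, p):
    """Sampling with replacement: X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def hypergeom_pmf(x, n, N, p):
    """Sampling without replacement from N members, N*p of which are successes."""
    successes = round(N * p)
    return comb(successes, x) * comb(N - successes, n - x) / comb(N, n)

n, p = 10, 0.40
for N in (200, 400):
    print(f"N = {N}")
    for x in range(n + 1):
        print(x, round(binom_pmf(x, n, p), 4), round(hypergeom_pmf(x, n, N, p), 4))
```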

6. In the 2008 presidential election in Montana, Barack Obama received 232,159 votes and John McCain received 243,882 votes. In this example, I will ignore votes cast for any other candidates. The finite population size is N = 232,159 + 243,882 = 476,041. I will designate a vote for McCain as a success, giving p = 0.512 and q = 0.488.

Imagine a lazy pollster named Larry. Larry plans to select n = 5 persons at random with replacement from the population. He counts the number of successes in his sample and calls it X. He decides that if X ≥ 3, then he will declare McCain to be the winner. If X ≤ 2, then he will declare Obama the winner.

Calculate by hand the probability that Larry will correctly predict the winner.

7. Refer to the previous exercise. Larry decides that the answer we obtained is too small. So he repeats the above with n = 501 instead of n = 5. He will declare McCain the winner if, and only if, X ≥ 251.

What is the exact probability that Larry will correctly predict the winner?

Next, use the normal curve to obtain the approximate probability that Larry will correctly predict the winner.
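Larry predicts correctly exactly when X ≥ cutoff, because McCain actually won. The Python sketch below computes that binomial tail for n = 5 and n = 501, using p = 0.512 as in the text, plus a normal approximation that assumes the usual continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.512   # proportion of McCain votes, as rounded in the text

def prob_correct(n, cutoff):
    """P(X >= cutoff) for X ~ Bin(n, p): Larry correctly names McCain."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(cutoff, n + 1))

print(round(prob_correct(5, 3), 4))                  # problem 6, n = 5
exact_501 = prob_correct(501, 251)                   # problem 7, exact
mu, sigma = 501 * p, sqrt(501 * p * (1 - p))
approx_501 = 1 - NormalDist(mu, sigma).cdf(250.5)    # continuity correction
print(round(exact_501, 4), round(approx_501, 4))
```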



1. Use the website for exact binomial probabilities to complete the following table. Each entry is the probability that the fixed-width CI p̂ ± 0.04 is correct for the given combination of n and p.

              p
   n     0.10   0.50   0.70
  100
  200
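The CI p̂ ± 0.04 is correct exactly when n(p − 0.04) ≤ X ≤ n(p + 0.04), so each cell is a binomial sum over that range. The Python sketch below computes it, using exact fractions for the boundary counts so that floating-point error cannot move an endpoint.

```python
from fractions import Fraction
from math import ceil, comb, floor

def coverage_fixed_width(n, p, width=Fraction(4, 100)):
    """P(|X/n - p| <= width) for X ~ Bin(n, p): the chance that p-hat +/- width
    contains p. Fractions keep the boundary counts n(p - width), n(p + width) exact."""
    p = Fraction(p)
    lo = max(0, ceil(n * (p - width)))
    hi = min(n, floor(n * (p + width)))
    pf = float(p)
    return sum(comb(n, k) * pf**k * (1 - pf)**(n - k) for k in range(lo, hi + 1))

for n in (100, 200):
    print(n, [round(coverage_fixed_width(n, p), 4) for p in ("0.10", "0.50", "0.70")])
```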

2. Nancy observes n = 300 BTs and obtains a total of 72 successes. Use the snc approximation to obtain the 80%, 95% and 99% two-sided confidence intervals for p. Compare your answers.

3. Refer to the previous question. Suppose that Nancy observes n = 1200 BTs and obtains a total of 288 successes. Use the snc approximation to obtain the two-sided 95% confidence interval for p. Compare this current answer to the answer in the previous question.
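The snc interval is p̂ ± z√(p̂q̂/n). The Python sketch below takes z from the standard normal quantile. With p̂ = 0.24 in both problems, quadrupling n halves the half-width.

```python
from math import sqrt
from statistics import NormalDist

def snc_ci(x, n, conf):
    """Approximate two-sided CI for p: p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    phat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

for conf in (0.80, 0.95, 0.99):                         # problem 2
    print(conf, [round(b, 4) for b in snc_ci(72, 300, conf)])
print([round(b, 4) for b in snc_ci(288, 1200, 0.95)])   # problem 3
```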

4. Use the website to obtain the exact two-sided CI for p for the following situations.

(a) n = 20; x = 5; 95%.

(b) n = 24; x = 11; 95%.

(c) n = 28; x = 12; 90%.

5. Use the website to obtain the exact one-sided upper CI for p for the following situations.

(a) n = 20; x = 2; 95%.

(b) n = 33; x = 1; 95%.

(c) n = 38; x = 3; 90%.
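If we assume the website's "exact" interval is the Clopper–Pearson interval, it can be found by solving binomial tail equations for p. The Python sketch below does this by bisection. The upper bound solves P(X ≤ x; p) = α, and the lower bound solves P(X ≥ x; p) = α; for a two-sided interval, α/2 goes in each tail.

```python
from math import comb

def binom_cdf(x, n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=60):
    """Bisection for the p in [lo, hi] at which the decreasing function f equals target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_upper(x, n, alpha):
    """Upper bound: the p with P(X <= x; p) = alpha (1 when x = n)."""
    return 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), alpha)

def exact_lower(x, n, alpha):
    """Lower bound: the p with P(X >= x; p) = alpha (0 when x = 0)."""
    return 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha)

def exact_two_sided(x, n, conf):
    a = (1 - conf) / 2
    return exact_lower(x, n, a), exact_upper(x, n, a)

for x, n, conf in [(5, 20, 0.95), (11, 24, 0.95), (12, 28, 0.90)]:     # problem 4
    print("two-sided", x, n, [round(b, 4) for b in exact_two_sided(x, n, conf)])
for x, n, conf in [(2, 20, 0.95), (1, 33, 0.95), (3, 38, 0.90)]:       # problem 5
    print("upper", x, n, round(exact_upper(x, n, 1 - conf), 4))
```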

6. Tom wants to kill mosquitoes. He exposes 100 mosquitoes to a certain poison and 72 of them die. How would you analyze these data?

7. Molly examines 200 plants of a certain type. She notes that four of the plants are diseased. How would you analyze these data?

8. Discuss the meaning of 95% confidence.

9. Using the snc approximation, Bob constructs a 90% CI for p and gets 0.2000 ± 0.0294. Obtain the 98% CI for p for Bob’s data.

10. Bert observes n BT and calculates the approximate 95% CI for p. Bert obtains the interval p̂ ± 0.0661. Diana observes 4n BT and her value of p̂ is the same as Bert’s.

(a) Calculate Diana’s 95% CI for p.

(b) Combine Bert’s and Diana’s data to obtain a sample of size 5n. Calculate the 95% CI for p for these 5n trials.

(c) Given n = 200 and p̂ > 0.50, calculate the value of p̂.
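Both problems use the same idea. In z√(p̂q̂/n), changing the confidence level changes only z, and changing n (with p̂ fixed) scales the half-width by 1/√n. Part 10(c) is then a quadratic in p̂. The Python sketch below illustrates this.

```python
from math import sqrt
from statistics import NormalDist

def z(conf):
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Problem 9: rescale the half-width by the ratio of z values.
bob_half_98 = 0.0294 * z(0.98) / z(0.90)
print(f"0.2000 +/- {bob_half_98:.4f}")

# Problem 10: with p-hat fixed, the half-width scales like 1 / sqrt(sample size).
bert_half = 0.0661
diana_half = bert_half / sqrt(4)      # 4n trials
combined_half = bert_half / sqrt(5)   # 5n trials
# (c) Solve 1.96 * sqrt(p-hat (1 - p-hat) / 200) = 0.0661 for p-hat > 0.50.
pq = 200 * (bert_half / 1.96) ** 2    # equals p-hat * q-hat
phat = (1 + sqrt(1 - 4 * pq)) / 2     # larger root of t^2 - t + pq = 0
print(round(diana_half, 4), round(combined_half, 4), round(phat, 3))
```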


11. In each of the following situations, use the website to obtain the exact 95% upper bound for p. Do you see a pattern in your answers?

(a) x = 0 successes in n = 10 trials.

(b) x = 0 successes in n = 100 trials.

(c) x = 0 successes in n = 1000 trials.

(d) x = 0 successes in n = 10000 trials.
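With x = 0, the exact one-sided bound has a closed form, assuming it is defined by P(X ≤ 0; p) = (1 − p)^n = 0.05. The Python sketch below shows that n times the bound settles near −ln(0.05) ≈ 3, which is the pattern the problem is after.

```python
def upper_bound_zero_successes(n, alpha=0.05):
    """Exact one-sided upper bound when x = 0: solve (1 - p)**n = alpha for p."""
    return 1 - alpha ** (1 / n)

for n in (10, 100, 1000, 10000):
    u = upper_bound_zero_successes(n)
    print(n, round(u, 6), round(n * u, 3))   # n * u approaches 3
```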



12. *There are three possibilities for a CI for p:

• It can be incorrect by being too small: the upper bound u is smaller than p.

• It can be incorrect by being too large: the lower bound l is larger than p.

• It can be correct: l ≤ p ≤ u.

I want to examine the performance of the approximate two-sided CI for p when n = 200 and p = 0.15. Here’s how I proceed.

The CI is

p̂ ± 1.96√(p̂q̂/200).

A particular CI is too small if, and only if,

u = p̂ + 1.96√(p̂q̂/200) < 0.15.

Now, we could proceed with algebra, but I prefer using a ‘spreadsheet’ approach. Because n = 200, the possible values of x are 0, 1, 2, ..., 200. Putting these 201 values in the spreadsheet, I evaluate u for each value of x. I find that this expression is less than 0.15 if, and only if, x ≤ 21. (You can verify that x = 21 gives p̂ = 21/200 = 0.105 and

u = 0.105 + 1.96√(0.105(0.895)/200) = 0.105 + 0.042 = 0.147

and that x = 22 gives p̂ = 22/200 = 0.110 and

u = 0.110 + 1.96√(0.110(0.890)/200) = 0.110 + 0.043 = 0.153.)

Similarly, a particular CI is too large if, and only if,

l = p̂ − 1.96√(p̂q̂/200) > 0.15.

Again, using my spreadsheet approach, we see that this happens if, and only if, x ≥ 42.

Putting these results together, we see that the CI is correct if, and only if, 22 ≤ X ≤ 41.

Now, we go to the binomial website, enter p = 0.15 and n = 200, and enter first x = 41 and then x = 21. We obtain

P(22 ≤ X ≤ 41) = P(X ≤ 41) − P(X ≤ 21) = 0.9860 − 0.0415 = 0.9445.

This is not the desired 95%, but it is close.

By contrast, the exact 95% CI is correct if, and only if, 20 ≤ X ≤ 40. (I obtained this by trial and error.) For this interval, the probability it is correct for p = 0.15 is P(X ≤ 40) − P(X ≤ 19) = 0.9780 − 0.0149 = 0.9531, which is ≥ 0.95, as promised.
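The ‘spreadsheet’ approach translates directly into a loop over x = 0, 1, ..., 200. Here is a Python sketch that classifies each x and then adds up the binomial probabilities of the correct ones, in place of the two website look-ups.

```python
from math import comb, sqrt

n, p, z = 200, 0.15, 1.96

def bounds(x):
    """Approximate CI p-hat +/- 1.96 sqrt(p-hat q-hat / n) for x successes."""
    phat = x / n
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

# The 'spreadsheet': classify every possible x.
too_small = [x for x in range(n + 1) if bounds(x)[1] < p]
too_large = [x for x in range(n + 1) if bounds(x)[0] > p]
correct = [x for x in range(n + 1) if x not in too_small and x not in too_large]

pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
coverage = sum(pmf[x] for x in correct)
print(max(too_small), min(too_large), min(correct), max(correct))
print(round(coverage, 4))   # P(22 <= X <= 41)
```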



1. Suppose that X ∼ Poisson(16). Calculate the mean, variance and standard deviation of X.

2. Suppose that X ∼ Poisson(16). Use the website to calculate the following probabilities.

(a) P(X = 15).

(b) P(X ≤ 15).

(c) P(X > 15).

(d) P(16 ≤ X ≤ 24).

3. Suppose that X ∼ Poisson(100). Use the normal curve website to approximate the following probabilities. Also, use the Poisson website to obtain the exact probabilities.

(a) P(X > 105).

(b) P(90 < X < 116).
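As a stand-in for the Poisson and normal websites, here is a standard-library Python sketch for problems 1 to 3. It computes the Poisson pmf on the log scale so that λ = 100 does not overflow. The approximation assumes the usual continuity correction, and strict inequalities are converted to integer ranges (X > 105 means X ≥ 106).

```python
import math
from statistics import NormalDist

def pois_pmf(k, lam):
    """Poisson pmf, computed on the log scale to avoid overflow for large k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def pois_cdf(k, lam):
    return sum(pois_pmf(j, lam) for j in range(k + 1))

# Problems 1-2: X ~ Poisson(16); mean = variance = 16, sd = 4.
lam = 16
print(lam, lam, math.sqrt(lam))
print(round(pois_pmf(15, lam), 4), round(pois_cdf(15, lam), 4),
      round(1 - pois_cdf(15, lam), 4), round(pois_cdf(24, lam) - pois_cdf(15, lam), 4))

# Problem 3: X ~ Poisson(100); normal curve with mean 100 and sd 10.
normal = NormalDist(100, 10)
exact_a = 1 - pois_cdf(105, 100)                    # P(X >= 106)
approx_a = 1 - normal.cdf(105.5)
exact_b = pois_cdf(115, 100) - pois_cdf(90, 100)    # P(91 <= X <= 115)
approx_b = normal.cdf(115.5) - normal.cdf(90.5)
print(round(exact_a, 4), round(approx_a, 4), round(exact_b, 4), round(approx_b, 4))
```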

4. Suppose that X ∼ Poisson(θ), with θ unknown. Given that X = 80,

(a) Use the snc to obtain the approximate 90% CI for θ.

(b) Use the website to obtain the exact 90% CI for θ.

5. Tammy is observing a Poisson Process with a rate of 3 occurrences per hour. Let X denote the number of successes she will observe in a 2.5 hour period. Use the website to find P(X = 9).

6. Refer to the previous question. After Tammy has finished her data collection, Ralph observes the same process for 90 minutes. Let Y denote the total number of successes observed by Tammy and Ralph. Use the website to find P(Y ≤ 15).
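For a Poisson Process, the count in a period of length t has a Poisson distribution with mean rate × t. Tammy's 2.5 hours and Ralph's 1.5 hours do not overlap, so Y covers 4 hours. A Python sketch:

```python
import math

def pois_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

rate = 3                       # occurrences per hour
lam_x = rate * 2.5             # Tammy: 2.5 hours, so X ~ Poisson(7.5)
lam_y = rate * (2.5 + 1.5)     # Tammy then Ralph: 4 hours, so Y ~ Poisson(12)
p_x9 = pois_pmf(9, lam_x)
p_y15 = sum(pois_pmf(k, lam_y) for k in range(16))
print(round(p_x9, 4), round(p_y15, 4))
```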

7. Refer to the previous question. Suppose that Tammy and Ralph observe a total of 17 successes. Now, pretending that we don’t know that the rate is 3 per hour, use the website to calculate the 95% CI for the rate. Is your CI correct?

8. I observe a Poisson Process for 10 hours and count 255 successes.

(a) Use the snc approximation to obtain the 95% confidence interval for the rate per hour.

(b) Use the exact website to obtain the 95% confidence interval for the rate per hour.

(c) Compare your two CI’s. Comment.
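Problems 4, 7 and 8 all need a CI for a Poisson mean θ. To get a CI for a rate, divide by the observation time. The Python sketch below gives the snc interval x ± z√x and an "exact" interval found by inverting Poisson tail probabilities. We assume this is what the website computes, but we have not checked it.

```python
import math
from statistics import NormalDist

def pois_cdf(k, lam):
    return sum(math.exp(j * math.log(lam) - lam - math.lgamma(j + 1)) for j in range(k + 1))

def bisect_decreasing(f, target, lo, hi, iters=100):
    """Find the theta in [lo, hi] where the decreasing function f(theta) equals target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > target else (lo, mid)
    return (lo + hi) / 2

def exact_ci(x, conf):
    """Two-sided CI for theta: P(X >= x; lower) = a and P(X <= x; upper) = a."""
    a = (1 - conf) / 2
    top = x + 20 * math.sqrt(x + 1) + 20          # generous search range
    lower = 0.0 if x == 0 else bisect_decreasing(lambda t: pois_cdf(x - 1, t), 1 - a, 1e-9, top)
    upper = bisect_decreasing(lambda t: pois_cdf(x, t), a, 1e-9, top)
    return lower, upper

def snc_ci(x, conf):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return x - z * math.sqrt(x), x + z * math.sqrt(x)

print(snc_ci(80, 0.90), exact_ci(80, 0.90))             # problem 4
lo, hi = exact_ci(17, 0.95)                              # problem 7: 4 hours in all
print(lo / 4, hi / 4)                                    # rate per hour
for ci in (snc_ci(255, 0.95), exact_ci(255, 0.95)):      # problem 8: 10 hours
    print([round(b / 10, 2) for b in ci])
```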

9. *Suppose that Alana observes a Poisson Process during the time interval [0, 2]. Suppose that Bill observes the same Poisson Process during the time interval [1, 4]. Let X (Y) denote the number of successes that Alana (Bill) observes. Let Z = X + Y.

Discuss how you would calculate P(Z = 4).
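One possible approach splits [0, 4] into pieces that do not overlap. Successes in [1, 2] are seen by both observers, so they are counted twice: Z = W + 2B, where B is the count in [1, 2] and W is the count in the rest of [0, 4]. The problem gives no rate, so the Python sketch below uses a hypothetical rate of 1 per unit time purely for illustration.

```python
import math

def pois_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def p_z(z, rate):
    """P(Z = z) for Z = X + Y, with Alana on [0, 2] and Bill on [1, 4].
    Z = W + 2B with independent B ~ Poisson(rate) (successes in [1, 2])
    and W ~ Poisson(3 * rate) (successes in [0, 1) and (2, 4])."""
    return sum(pois_pmf(b, rate) * pois_pmf(z - 2 * b, 3 * rate) for b in range(z // 2 + 1))

rate = 1.0   # hypothetical rate; the problem leaves it unspecified
print(round(p_z(4, rate), 4))
```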

10. *Below is a mathematically valid way to assign probabilities to the nonnegative integers. It is called the geometric distribution with parameter λ, with 0 < λ < 1.

P(X = x) = λ(1 − λ)^x, x = 0, 1, 2, ....

It can be shown that if one observes X1, X2, X3, ..., Xn that are i.i.d. geometric, then the best (maximum likelihood) estimate of λ is 1/(1 + X̄), where X̄ is the mean (arithmetic average) of X1, X2, X3, ..., Xn.



Three researchers A, B and C want to estimate P(X = 0).

Researcher A believes the data come from a geometric distribution. Hence, to researcher A, the answer is λ and he/she wants to estimate it.

Researcher B believes the data come from a Poisson distribution. Hence, to researcher B, the answer is e^(−θ) and he/she wants to estimate it.

Researcher C is not willing to make any assumptions and simply wants to estimate P(X = 0).

Suppose, for concreteness, that n = 50. Then the various estimates will be: 1/(1 + x̄) for A; e^(−x̄) for B; and y/50 for C, where y is the observed number of 0’s in the data set.

Now, suppose that, in fact, the true distribution is geometric with λ = 0.5. Remember that only Nature would know this. How do the various researchers do?

I answer this question by performing a simulation study with 1000 runs. To make this concrete, below are the results of my first run.

I generate 50 i.i.d. random variables from the geometric distribution with λ = 0.5 and obtain the following values.

x:       0   1   2   3   4   5   6   7
Freq.:  27   9   7   2   2   1   1   1

You can verify that x̄ = 55/50 = 1.1. Thus, the estimates are 1/(1 + 1.1) = 0.4762 for A; e^(−1.1) = 0.3329 for B; and y/50 = 27/50 = 0.54 for C. Because the correct answer is 0.5, we see that A did better than C and B did very poorly indeed.

But as I said in the Course Notes, one sample is not conclusive. Thus, I performed 999 more runs. I will summarize my results below.

Each researcher has 1000 values of his/her estimate. The mean of these 1000 estimates is: 0.5003 for A; 0.3683 for B; and 0.4948 for C. Remembering that the correct answer is 0.5, we see that, on average, the estimates of A and C are acceptable, but the estimate of B is horrible. This generally happens: If you assume the wrong model you get very bad answers. To assume no model, as C does, while usually inefficient, is not misleading.

Next, following the example in the Course Notes, the mean absolute deviation is: 0.039815 for A; and 0.055760 for C. The MAD is 28.6% smaller for A than for C.

I repeated the above simulation study with n = 100 and still 1000 runs. For this new simulation study, the MAD for C is 0.038970, which is very close to the MAD of 0.039815 for A when n = 50. Thus, we could say that the effect of not knowing to use the geometric model is that we would need to collect twice as much data to get the same quality of an answer.
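Here is a Python sketch of a simulation study along these lines. It first checks the three estimates on the first-run frequency table, then repeats 1000 runs with n = 50. A geometric value is drawn as the number of failures before the first success. The seed is arbitrary, so the means and MADs will be close to, but not identical with, the numbers reported above.

```python
import math
import random

def geometric(lam, rng):
    """One draw with P(X = x) = lam * (1 - lam)**x: failures before the first success."""
    x = 0
    while rng.random() >= lam:
        x += 1
    return x

def estimates(data):
    """Estimates of P(X = 0) by researchers A (geometric), B (Poisson) and C (no model)."""
    xbar = sum(data) / len(data)
    return 1 / (1 + xbar), math.exp(-xbar), data.count(0) / len(data)

# First run from the text: frequencies of 0, 1, ..., 7 among 50 observations.
first_run = [x for x, f in enumerate([27, 9, 7, 2, 2, 1, 1, 1]) for _ in range(f)]
print([round(e, 4) for e in estimates(first_run)])

def simulate(n, runs=1000, lam=0.5, seed=1):
    rng = random.Random(seed)
    results = [estimates([geometric(lam, rng) for _ in range(n)]) for _ in range(runs)]
    means = [sum(r[i] for r in results) / runs for i in range(3)]
    mads = [sum(abs(r[i] - lam) for r in results) / runs for i in range(3)]
    return means, mads

means, mads = simulate(50)
print("means:", [round(m, 4) for m in means], "MAD:", [round(m, 6) for m in mads])
```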



1. Describe how to find the area under the χ²(6) curve to the right of 8.04.

2. Describe how to find χ²_0.01(6). The website gives me 16.81; explain what this means.
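For an even number of degrees of freedom, the right-tail area of the chi-squared curve has the closed form e^(−x/2) Σ_{k<df/2} (x/2)^k/k!. The Python sketch below uses it for df = 6, and finds χ²_0.01(6) by bisection as the point with area 0.01 to its right. The even-df restriction is a simplification that covers only these two problems.

```python
import math

def chi2_sf_even(x, df):
    """Area to the right of x under the chi-squared(df) curve, for even df."""
    assert df % 2 == 0 and df > 0
    h = x / 2
    return math.exp(-h) * sum(h**k / math.factorial(k) for k in range(df // 2))

def chi2_critical(alpha, df, lo=0.0, hi=1000.0):
    """chi2_alpha(df): the point with area alpha to its right (bisection)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_sf_even(mid, df) > alpha else (lo, mid)
    return (lo + hi) / 2

print(round(chi2_sf_even(8.04, 6), 4))    # problem 1
print(round(chi2_critical(0.01, 6), 2))   # problem 2
```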

3. Suppose that a male human with AO blood impregnates a female human with BO blood.

(a) What are the possible blood types for the child?

(b) Do you have hypothesized probabilities for each of these blood types?

4. Discuss Type 1 and Type 2 errors in the context of the snapdragon example in the Course Notes.

5. (Hypothetical data.) A cross between white and yellow summer squash gave the following colors.

Color     White   Yellow   Green
Number    244     50       26

Are these data consistent with the 12:3:1 ratio predicted by a genetic model? Use α = 0.05. Also, calculate the P-value.

6. (Hypothetical data.) In a breeding experiment, white chickens with small combs were mated and produced 240 offspring. Are these data consistent with the 9:3:3:1 ratio predicted by a genetic model? Use α = 0.10. Also, calculate the P-value.

Type      White feathers,   White feathers,   Dark feathers,   Dark feathers,
          small comb        large comb        small comb       large comb
Number    129               47                47               17
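Here is a Python sketch of the Goodness of Fit computation for problems 5 and 6. Expected counts are n times the hypothesized ratio shares, df is the number of categories minus one, and the P-value comes from a series for the chi-squared upper tail. The series routine is our choice of method, standing in for the website.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared area via the series for the regularized lower gamma."""
    s, h = df / 2, x / 2
    if h <= 0:
        return 1.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= h / (s + k)
        total += term
    return max(0.0, 1.0 - math.exp(-h + s * math.log(h) - math.lgamma(s + 1)) * total)

def goodness_of_fit(observed, ratio):
    """Chi-squared statistic, df and P-value for a hypothesized ratio such as 12:3:1."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

print(goodness_of_fit([244, 50, 26], [12, 3, 1]))         # problem 5, squash
print(goodness_of_fit([129, 47, 47, 17], [9, 3, 3, 1]))   # problem 6, chickens
```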

7. *Bob shoots m = 3 free throws every day for n = 160 days. Each day he counts the number of successes he obtains. His data are below.

Number of S’s     0    1    2    3
Number of days   44   40   34   42

Are these data consistent with Bob’s shots being BT with p unknown? Use α = 0.01. Also, calculate the P-value.
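Because p is unknown here, one approach estimates it from all 480 shots and then compares the day counts to Bin(3, p̂) expectations. The Python sketch below also assumes the common rule that one extra degree of freedom is lost for the estimated parameter, giving df = 4 − 1 − 1 = 2. Please check this against the rule in the Course Notes.

```python
import math
from math import comb

def chi2_sf(x, df):
    """Upper-tail chi-squared area via the series for the regularized lower gamma."""
    s, h = df / 2, x / 2
    if h <= 0:
        return 1.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= h / (s + k)
        total += term
    return max(0.0, 1.0 - math.exp(-h + s * math.log(h) - math.lgamma(s + 1)) * total)

counts = [44, 40, 34, 42]                 # days with 0, 1, 2, 3 successes
m, days = 3, sum(counts)
phat = sum(k * c for k, c in enumerate(counts)) / (m * days)   # 234 / 480
expected = [days * comb(m, k) * phat**k * (1 - phat)**(m - k) for k in range(m + 1)]
stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
df = len(counts) - 1 - 1                  # assumed: one more df lost for estimating p
p_value = chi2_sf(stat, df)
print(round(phat, 4), [round(e, 2) for e in expected], round(stat, 2), df, p_value)
```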

8. It is possible to perform a Goodness of Fit test without actually determining the value of χ²_α(df). Simply calculate the P-value and reject the null hypothesis if, and only if, the P-value is ≤ α. This example will give you practice with this idea.

For each of the situations below, first determine the P-value and then determine whether or not one should reject the null for a Goodness of Fit test with α equal to 0.10, to 0.05 and to 0.01.

(a) df = 3 and χ² = 6.01.

(b) df = 4 and χ² = 7.91.

(c) df = 5 and χ² = 11.47.

(d) df = 6 and χ² = 18.62.
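The P-value rule takes only a couple of lines once an upper-tail area function is available. The Python sketch below computes that area with a series, in place of the website, and prints the decision at each α.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared area via the series for the regularized lower gamma."""
    s, h = df / 2, x / 2
    if h <= 0:
        return 1.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= h / (s + k)
        total += term
    return max(0.0, 1.0 - math.exp(-h + s * math.log(h) - math.lgamma(s + 1)) * total)

cases = {"a": (3, 6.01), "b": (4, 7.91), "c": (5, 11.47), "d": (6, 18.62)}
for label, (df, stat) in cases.items():
    p = chi2_sf(stat, df)
    decisions = {alpha: ("reject" if p <= alpha else "do not reject") for alpha in (0.10, 0.05, 0.01)}
    print(label, round(p, 4), decisions)
```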



1. If we divide Katie’s data (Model Project 5) into four equal-sized segments of 25 trials, her numbers of successes per segment are: 6, 8, 5 and 10.

Analyze these data descriptively and also with a Chi-Squared test.
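One way to set up the test in problem 1 is as a 4 × 2 table (segment by S/F), with expected counts computed from the pooled success proportion and df = 4 − 1 = 3. Here is a Python sketch; the P-value routine is a series computation that stands in for the website.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared area via the series for the regularized lower gamma."""
    s, h = df / 2, x / 2
    if h <= 0:
        return 1.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= h / (s + k)
        total += term
    return max(0.0, 1.0 - math.exp(-h + s * math.log(h) - math.lgamma(s + 1)) * total)

successes = [6, 8, 5, 10]
trials = 25
failures = [trials - s for s in successes]
p_pool = sum(successes) / (trials * len(successes))    # 29 / 100
stat = sum((s - trials * p_pool) ** 2 / (trials * p_pool)
           + (f - trials * (1 - p_pool)) ** 2 / (trials * (1 - p_pool))
           for s, f in zip(successes, failures))
df = len(successes) - 1
print([s / trials for s in successes], round(stat, 3), df, round(chi2_sf(stat, df), 3))
```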

2. In her report on her statistics project, Kristen Joiner wrote:

I chose to test Muffin’s preference for balls. She has two balls we often throw for her to fetch. One is small and blue, the other is slightly larger, heavier and red. My father rolled both balls for her with one hand at the same time; I recorded which ball she chose to chase first. Arbitrarily, the blue ball was labeled a success.

Here is a listing of Kristen’s 96 trials, in order from left to right.

S S S F F S S S S F F S

S F S F F S F S S S S F

S S S F F F S S S S F F

F S F F S F S F S F S F

F F F F F S F S F F S S

S S F S S F S F S S S S

S F F S S F F S S S S S

S S F S S S S S S F F S

(a) Investigate the second assumption of BT using four segments of size 24.

(b) Investigate the second assumption of BT using two segments of size 48.

(c) Investigate the issue of independence of trials.

3. Below is a listing of Boone’s data, in order from left to right. Use these data to check for memory.

S F F S S S F S F S

S F S S S F S S S S

F S S F S S S F S S

F F S S S S S S S F

F S F S S F S F F S

S F S F S F S S S F

S F S S F S S S S S

S S S S S S S S S F

S F S S S S F S F F

S S S S S S F S S S
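The tallies behind problems 2 and 3 can be done by machine. For a constant p, count S's in segments. For memory, build the 2 × 2 table of (previous trial, current trial) over consecutive pairs and compare the proportion of S after an S with the proportion of S after an F. A Python sketch, using the listings above:

```python
KRISTEN = """
S S S F F S S S S F F S
S F S F F S F S S S S F
S S S F F F S S S S F F
F S F F S F S F S F S F
F F F F F S F S F F S S
S S F S S F S F S S S S
S F F S S F F S S S S S
S S F S S S S S S F F S
""".split()

BOONE = """
S F F S S S F S F S
S F S S S F S S S S
F S S F S S S F S S
F F S S S S S S S F
F S F S S F S F F S
S F S F S F S S S F
S F S S F S S S S S
S S S S S S S S S F
S F S S S S F S F F
S S S S S S F S S S
""".split()

def segment_successes(seq, size):
    """Number of S's in consecutive segments of the given size."""
    return [seq[i:i + size].count("S") for i in range(0, len(seq), size)]

def memory_table(seq):
    """Counts of (previous, current) pairs and the two conditional proportions of S."""
    counts = {(a, b): 0 for a in "SF" for b in "SF"}
    for prev, cur in zip(seq, seq[1:]):
        counts[(prev, cur)] += 1
    after_s = counts[("S", "S")] / (counts[("S", "S")] + counts[("S", "F")])
    after_f = counts[("F", "S")] / (counts[("F", "S")] + counts[("F", "F")])
    return counts, after_s, after_f

print("Kristen, segments of 24:", segment_successes(KRISTEN, 24))
print("Kristen, segments of 48:", segment_successes(KRISTEN, 48))
for name, seq in (("Kristen", KRISTEN), ("Boone", BOONE)):
    counts, after_s, after_f = memory_table(seq)
    print(name, counts, round(after_s, 3), round(after_f, 3))
```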

4. (a) Below are tables (1–3) from three different studies. Match each table to the correct statement below.

A. This table provides evidence that p increased over the course of the study.

B. This table provides evidence that p decreased over the course of the study.

C. This table provides no evidence that p changed over the course of the study.



Table 1
Half    S    F    Total
1st     54   89   143
2nd     65   78   143



5. Below are tables (4, 5 and 6) from three different studies. Note that at least one of these tables is the answer to more than one of the following six questions.

(a) Which table is for the study that had an F on its first trial and an S on its last trial?

(b) Which table is for the study that had an F on its first and last trials?

(c) Which table is for the study that had an S on its first trial and an F on its last trial?

(d) Which table is for the study in which the majority of trials yielded an S?

(e) Which table is for the study in which the proportion of successes after an S was smaller than the proportion of successes after an F?

(f) Which table is for the study in which the proportion of successes after an S was equal to the proportion of successes after an F?

Table 4
              Current
Prev.     S      F      Total
S         96     144    240
F         144    216    360
Total     240    360    600

Table 5
              Current
Prev.     S      F      Total
S         77     78     155
F         77     122    199
Total     154    200    354

Table 6
              Current
Prev.     S      F      Total
S         82     74     156
F         75     36     111
Total     157    110    267
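The key observation for problem 5 is the following. The row total for S counts S's among trials 1 to n − 1, and the column total for S counts S's among trials 2 to n. Their difference is therefore [trial 1 is S] − [trial n is S]. The Python sketch below applies this bookkeeping to Tables 4 to 6.

```python
tables = {  # (previous, current) counts: row S = (SS, SF), row F = (FS, FF)
    4: ((96, 144), (144, 216)),
    5: ((77, 78), (77, 122)),
    6: ((82, 74), (75, 36)),
}

def describe(table):
    """Facts recoverable from a (previous, current) table of n - 1 consecutive pairs."""
    (ss, sf), (fs, ff) = table
    row_s, col_s = ss + sf, ss + fs       # S's among trials 1..n-1 and among trials 2..n
    n = ss + sf + fs + ff + 1             # n - 1 pairs come from n trials
    diff = row_s - col_s                  # [trial 1 is S] - [trial n is S]
    s_min = col_s + (1 if diff == 1 else 0)
    s_max = col_s + (1 if diff >= 0 else 0)   # diff == 0: both ends S or both F
    return {"n": n, "diff": diff, "S count range": (s_min, s_max),
            "S after S": ss / row_s, "S after F": fs / (fs + ff)}

for k, t in tables.items():
    print(k, describe(t))
```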



1. Alma plans to observe Y ∼ Bin(200, 0.40).

(a) Calculate the point prediction ŷ.

(b) Use the binomial calculator to determine the probability that the point prediction will be correct.

(c) Use the snc approximation to obtain the 90% PI for Y.

(d) Use the binomial calculator to determine the probability that the PI will be correct. Should it be changed?

(e) After the trials are observed, it is discovered that y = 95 successes were obtained. Comment on your answers in (a) and (c).
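Here is a Python sketch of problem 1 with p known. The point prediction is np, the snc PI is np ± z√(npq), and the probability that the PI is correct sums the binomial pmf over the integers inside it. We use only the standard library in place of the binomial calculator.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.40
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

y_hat = n * p                                  # point prediction
p_point = pmf[round(y_hat)]                    # P(Y = y-hat)
z = NormalDist().inv_cdf(0.95)                 # 90% two-sided: about 1.645
half = z * sqrt(n * p * (1 - p))
lo, hi = y_hat - half, y_hat + half            # snc 90% PI
p_pi = sum(pmf[k] for k in range(n + 1) if lo <= k <= hi)
print(y_hat, round(p_point, 4), (round(lo, 2), round(hi, 2)), round(p_pi, 4))
```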

2. Beth plans to observe m = 300 BT. She does not know the value of p. Beth has previous data which consists of x = 62 successes in n = 200 BT.


(b) Obtain the 90% PI for Y.

(c) After the trials are observed, it is discovered that y = 100 successes were obtained. Comment on your answers in (a) and (b).
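When p must be estimated, the prediction error also carries the uncertainty in p̂. The Python sketch below assumes the standard form ŷ = mp̂ with PI ŷ ± z√(mp̂q̂(1 + m/n)), which follows from Var(Y − mp̂) = mpq(1 + m/n). Please check it against the formula in the Course Notes before relying on it.

```python
from math import sqrt
from statistics import NormalDist

x, n, m = 62, 200, 300
phat = x / n
y_hat = m * phat
# Assumed form: the variance of Y - m * phat is m * p * q * (1 + m / n), estimated with phat.
half = NormalDist().inv_cdf(0.95) * sqrt(m * phat * (1 - phat) * (1 + m / n))
print(y_hat, (round(y_hat - half, 1), round(y_hat + half, 1)), y_hat - half <= 100 <= y_hat + half)
```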

3. *Carly plans to observe Y ∼ Poisson(169).


(b) Use the Poisson calculator to determine the probability that the point prediction will be correct.

(c) Use the snc approximation to obtain the 80% PI for Y.

(d) Use the Poisson calculator to determine the probability that the PI will be correct. Should it be changed?

(e) After the trials are observed, it is discovered that y = 150 successes were obtained. Comment on your answers in (a) and (c).

4. *Dave plans to observe a Poisson Process with unknown rate λ per minute. He plans to observe the PP for two hours and uses Y to denote the total number of successes that he will observe. Dave has previous data on the PP for which 80 minutes yielded 294 successes.


(b) Use the snc approximation to obtain the 80% PI for Y.

(c) After the trials are observed, it is discovered that y = 470 successes were obtained. Comment on your answers in (a) and (b).
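Dave's problem has the same structure as Beth's, with a Poisson Process in place of BT. The Python sketch below assumes the analogous standard form: ŷ = tλ̂ and PI ŷ ± z√(ŷ(1 + t/s)), where t = 120 minutes will be observed and s = 80 minutes were observed before. This is our assumption, not a formula quoted from the Course Notes.

```python
from math import sqrt
from statistics import NormalDist

past_minutes, past_count = 80, 294
future_minutes = 120
rate_hat = past_count / past_minutes           # estimated successes per minute
y_hat = future_minutes * rate_hat              # point prediction
# Assumed form: Var(Y - t * rate_hat) = t * rate * (1 + t / s), estimated with rate_hat.
half = NormalDist().inv_cdf(0.90) * sqrt(y_hat * (1 + future_minutes / past_minutes))
print(y_hat, (round(y_hat - half, 1), round(y_hat + half, 1)), y_hat - half <= 470 <= y_hat + half)
```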


Chapter 9 Lecture Examples: FALL 2011

In 1984, the State of Wisconsin Department of Transportation conducted a large survey of its licensed drivers. For the sake of problems 1 and 2 below, we will assume that it is valid to pretend that the drivers who were actually surveyed were a random sample from the population of all licensed drivers in Wisconsin in 1984.

Each driver was asked to self-report his/her frequency of drinking alcohol with four options: Several times per week; Several times per month; Less often; and Not at all. Some people, of course, refused to answer and they are excluded from the analyses below.

Each person was asked the following: Which do you feel would be most effective in keeping people from driving after too much to drink? This question was followed by five possible options: Strong enforcement; Revoking driver license; More education; Mandatory jail terms; and Higher fines.

For each of the five options the subject could rate the proposal as: Most important; Fairly important; Somewhat important; or Not very important. To avoid making this long example even longer, I will collapse this response into a dichotomy: Most important (Success); or Other (Failure).

Thus, we have four populations, determined by self-reported drinking frequency, and we can compare these four populations on five responses, one for each legal option. But, for brevity, we will consider two responses only. See Bob Wardrop for more information if you are interested.

1. Strong enforcement. Below is a table of counts generated by the data.

Drinking      S      F      Total   p̂
Sev/Week      176    147    323     0.545
Sev/Month     347    243    590     0.588
Less often    623    294    917     0.679
Not at all    343    124    467     0.734
Total         1489   808    2297

Before comparing these populations two-at-a-time, we will conduct a chi-squared test as presented in Chapter 6. With the help of our online test site, we find χ² = 43.609 with df = 3, which gives (details not shown) an approximate P-value of 1.8 in a billion. Thus, it seems quite clear that we should reject the (chi-squared) null hypothesis that the four populations have the same proportion of successes.

(a) Perform six Fisher’s Tests to compare the four populations in pairs. Report your six (two-sided) P-values and comment.

(b) Compare the four populations by calculating six pairwise confidence intervals. Define your populations so that p̂1 − p̂2 is always positive and use the 95% confidence level for each interval.



2. Higher fines. Below is a table of counts generated by the data.

Drinking      S      F      Total   p̂
Sev/Week      115    204    319     0.360
Sev/Month     218    360    578     0.377
Less often    453    442    895     0.506
Not at all    258    191    449     0.575
Total         1044   1197   2241

Before comparing these populations two-at-a-time, we will conduct a chi-squared test as presented in Chapter 6. With the help of our online test site, we find χ² = 59.682 with df = 3, which gives (details not shown) an approximate P-value of 7 in a trillion. Thus, it seems quite clear that we should reject the (chi-squared) null hypothesis that the four populations have the same proportion of successes.

(a) Perform six Fisher’s Tests to compare the four populations in pairs. Report your six (two-sided) P-values and comment.

(b) Compare the four populations by calculating six pairwise confidence intervals. Define your populations so that p̂1 − p̂2 is always positive and use the 95% confidence level for each interval.
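Here is a Python sketch of the six pairwise comparisons for both tables. The two-sided Fisher P-value adds the hypergeometric probabilities of all tables (with the same margins) that are no more likely than the observed one. The website may use a different two-sided convention, such as doubling the smaller tail, so small differences are possible. The CIs use p̂1 − p̂2 ± 1.96 times the unpooled standard error.

```python
import math
from itertools import combinations
from statistics import NormalDist

def log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_two_sided(s1, n1, s2, n2):
    """Sum hypergeometric probabilities of tables no more likely than the observed one."""
    m, N = s1 + s2, n1 + n2
    def pmf(k):
        return math.exp(log_comb(m, k) + log_comb(N - m, n1 - k) - log_comb(N, n1))
    observed = pmf(s1)
    ks = range(max(0, n1 - (N - m)), min(n1, m) + 1)
    return min(1.0, sum(pmf(k) for k in ks if pmf(k) <= observed * (1 + 1e-7)))

def diff_ci(s1, n1, s2, n2, conf=0.95):
    p1, p2 = s1 / n1, s2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half, p1 - p2 + half

studies = {
    "Strong enforcement": {"Sev/Week": (176, 323), "Sev/Month": (347, 590),
                           "Less often": (623, 917), "Not at all": (343, 467)},
    "Higher fines": {"Sev/Week": (115, 319), "Sev/Month": (218, 578),
                     "Less often": (453, 895), "Not at all": (258, 449)},
}
for title, groups in studies.items():
    print(title)
    for a, b in combinations(groups, 2):
        (s1, n1), (s2, n2) = groups[a], groups[b]
        if s1 / n1 < s2 / n2:                     # make p-hat1 - p-hat2 positive
            a, b, (s1, n1), (s2, n2) = b, a, (s2, n2), (s1, n1)
        lo, hi = diff_ci(s1, n1, s2, n2)
        print(f"  {a} vs {b}: P = {fisher_two_sided(s1, n1, s2, n2):.4g}, CI = [{lo:.3f}, {hi:.3f}]")
```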

3. For the 2008–2009 men’s basketball team, Joe Krabbenhoft made 66 free throws in 78 attempts. Marcus Landry made 57 free throws in 91 attempts.

Use these data to decide which man was the better free throw shooter. Use both a confidence interval and a test of hypotheses. For the test use the two-sided alternative and α = 0.05. (Take Joe’s shots to be population 1.)
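Here is a Python sketch of one way to carry out problem 3: an unpooled 95% CI for p1 − p2 and an approximate z test of p1 = p2 that uses the pooled proportion. Fisher's test, as in the previous problems, would be a reasonable alternative for the test.

```python
from math import sqrt
from statistics import NormalDist

s1, n1 = 66, 78   # Joe Krabbenhoft (population 1)
s2, n2 = 57, 91   # Marcus Landry
p1, p2 = s1 / n1, s2 / n2
z95 = NormalDist().inv_cdf(0.975)

# 95% CI for p1 - p2 (unpooled standard error)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (p1 - p2 - z95 * se, p1 - p2 + z95 * se)

# Test of H0: p1 = p2 versus p1 != p2, using the pooled standard error
pooled = (s1 + s2) / (n1 + n2)
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(p1 - p2, 4), [round(b, 4) for b in ci], round(z, 2), round(p_value, 4), p_value <= 0.05)
```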

4. An observational study yields the following “collapsed table.”

Group    S     F     Total
1        72    228   300
2        88    212   300
Total    160   440   600

Below are two component tables for these data. Complete these tables so that Simpson’s Paradox is occurring or explain why Simpson’s Paradox cannot occur for these data. For the latter, you must provide computations that justify your answer.

Subgroup A                     Subgroup B
Gp     S     F     Tot         Gp     S     F     Tot
1      30    30    60          1      42    198   240
2                  120         2                  180
Tot                180         Tot                420

5. Refer to the previous question. Suppose that the collapsed table is changed slightly to:

Group    S     F     Total
1        72    228   300
2        92    208   300
Total    164   436   600

With the same given component tables, complete the tables so that Simpson’s Paradox is occurring or explain why Simpson’s Paradox cannot occur for these data. For the latter, you must provide computations that justify your answer.
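In both collapsed tables, group 2 has the higher success rate. So Simpson's Paradox requires group 1 to do better in both subgroups: a/120 < 30/60 and b/180 < 42/240, where a + b equals group 2's collapsed successes. The Python sketch below simply tries every integer split.

```python
from fractions import Fraction

def simpson_completions(group2_successes):
    """All integer splits (a, b) of group 2's successes between subgroups A and B
    for which group 1 has the higher success rate in BOTH subgroups."""
    rate1_a = Fraction(30, 60)       # group 1, subgroup A
    rate1_b = Fraction(42, 240)      # group 1, subgroup B
    n2_a, n2_b = 120, 180            # group 2 totals in subgroups A and B
    found = []
    for a in range(n2_a + 1):
        b = group2_successes - a
        if 0 <= b <= n2_b and Fraction(a, n2_a) < rate1_a and Fraction(b, n2_b) < rate1_b:
            found.append((a, b))
    return found

print(simpson_completions(88))   # problem 4
print(simpson_completions(92))   # problem 5
```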



1. I have drawn a histogram for 400 observations. One of the rectangles has endpoints of 0.50 and 0.60, and a height of 0.25. For each of the three situations below, determine how many observations are in this class interval [0.50, 0.60), with the usual endpoint convention.

(a) If it is a frequency histogram.

(b) If it is a relative frequency histogram.

(c) If it is a density scale histogram.
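Each histogram scale defines the height differently: count, count/n, or (count/n)/width. Inverting the definition recovers the count. In the sketch below, notice that a frequency-scale height of 0.25 is not a whole number, which is worth commenting on in part (a).

```python
def observations_in_class(height, kind, n=400, width=0.10):
    """Number of the n observations in a class interval, recovered from a rectangle's height."""
    if kind == "frequency":        # height = count
        return height
    if kind == "relative":         # height = count / n
        return height * n
    if kind == "density":          # height = (count / n) / width
        return height * width * n
    raise ValueError(kind)

for kind in ("frequency", "relative", "density"):
    print(kind, round(observations_in_class(0.25, kind), 6))
```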

2. Below are 100 sorted observations.

0.01 0.02 0.02 0.05 0.06
0.09 0.14 0.14 0.15 0.15
0.15 0.16 0.20 0.21 0.24
0.26 0.31 0.32 0.34 0.38
0.39 0.40 0.47 0.47 0.49
0.49 0.51 0.53 0.53 0.55
0.59 0.61 0.62 0.63 0.64
0.78 0.82 0.86 0.93 0.94
0.97 0.99 1.03 1.04 1.05
1.18 1.19 1.29 1.35 1.36
1.37 1.39 1.46 1.48 1.51
1.52 1.59 1.61 1.70 1.70
1.75 1.79 1.79 1.79 1.81
1.86 1.97 2.09 2.11 2.13
2.16 2.23 2.38 2.39 2.49
2.52 2.56 2.71 2.88 3.00
3.13 3.37 3.49 3.56 3.58
3.61 3.85 4.05 4.20 4.22
4.40 4.52 4.74 5.16 5.27
5.37 5.48 5.90 6.75 6.82

(a) Calculate the median of these data.

(b) Draw a relative frequency histogram of these data. Use five intervals of equal width, beginning at 0.00 and ending at 7.50. Clearly label the height and endpoints of each of the five rectangles.

(c) Draw a density scale histogram of these data. Use 0.00–0.50, 0.50–1.00, 1.00–2.00, 2.00–4.00 and 4.00–7.00 as your class intervals. Clearly label the height and endpoints of each of the five rectangles.

3. Refer to the data in example 2 above.

(a) Calculate the first and third quartiles of these data.

(b) Given that the mean and standard deviation of these data are 1.78 and 1.65, respectively, determine the proportion of observations that are within one standard deviation of the mean. How does your proportion compare to the value predicted by the empirical rule? Are you surprised by the agreement/disagreement? Comment.
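Here is a Python sketch that checks the median, the class counts for both histograms (left endpoint included, right excluded), and the proportion within one standard deviation of the mean, using the given mean 1.78 and SD 1.65. Quartiles are left out because textbooks and software use different quartile conventions.

```python
import statistics

data = [
    0.01, 0.02, 0.02, 0.05, 0.06, 0.09, 0.14, 0.14, 0.15, 0.15,
    0.15, 0.16, 0.20, 0.21, 0.24, 0.26, 0.31, 0.32, 0.34, 0.38,
    0.39, 0.40, 0.47, 0.47, 0.49, 0.49, 0.51, 0.53, 0.53, 0.55,
    0.59, 0.61, 0.62, 0.63, 0.64, 0.78, 0.82, 0.86, 0.93, 0.94,
    0.97, 0.99, 1.03, 1.04, 1.05, 1.18, 1.19, 1.29, 1.35, 1.36,
    1.37, 1.39, 1.46, 1.48, 1.51, 1.52, 1.59, 1.61, 1.70, 1.70,
    1.75, 1.79, 1.79, 1.79, 1.81, 1.86, 1.97, 2.09, 2.11, 2.13,
    2.16, 2.23, 2.38, 2.39, 2.49, 2.52, 2.56, 2.71, 2.88, 3.00,
    3.13, 3.37, 3.49, 3.56, 3.58, 3.61, 3.85, 4.05, 4.20, 4.22,
    4.40, 4.52, 4.74, 5.16, 5.27, 5.37, 5.48, 5.90, 6.75, 6.82,
]

def class_counts(values, edges):
    """Counts per class [edges[i], edges[i+1]) (left endpoint included)."""
    return [sum(lo <= v < hi for v in values) for lo, hi in zip(edges, edges[1:])]

median = statistics.median(data)
equal_edges = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]
rel_heights = [c / len(data) for c in class_counts(data, equal_edges)]
unequal_edges = [0.0, 0.5, 1.0, 2.0, 4.0, 7.0]
dens_heights = [c / len(data) / (hi - lo)
                for c, lo, hi in zip(class_counts(data, unequal_edges), unequal_edges, unequal_edges[1:])]
mean, sd = 1.78, 1.65                          # given in problem 3(b)
within = sum(mean - sd <= v <= mean + sd for v in data) / len(data)
print(median, rel_heights, [round(h, 4) for h in dens_heights], within)
```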

4. A sample of size n = 28 yields the following sorted data. Note that the largest number in the list has been replaced by the letter ‘y.’

384 420 422 456 462 466 486
492 494 518 520 572 576 580
594 618 622 630 642 644 646
650 686 712 754 764 790 y

Hint: The mean of these data is 586.0.

(a) Calculate the median and the first quartile of these data.

(b) Suppose we discover that the observation 384 is an error. Recalculate the median, the first quartile and the mean after deleting the observation 384.
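The hint recovers y, since the sum of all 28 values must equal 28 × 586.0. The Python sketch below computes y and the medians and mean before and after deleting 384. Quartiles are again omitted because conventions differ.

```python
import statistics

known = [384, 420, 422, 456, 462, 466, 486, 492, 494, 518, 520, 572, 576, 580,
         594, 618, 622, 630, 642, 644, 646, 650, 686, 712, 754, 764, 790]
n, mean = 28, 586.0
y = n * mean - sum(known)          # the total must equal n times the mean
data = sorted(known + [y])
print(y, statistics.median(data))

trimmed = data[1:]                  # delete the erroneous 384
print(statistics.median(trimmed), round(statistics.mean(trimmed), 2))
```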



1. The cat population has the following probability distribution.

x           0     1     2     3
P(X = x)   0.1   0.5   0.3   0.1

It can be shown that μ = 1.4 and σ = 0.8. For n = 25, using Formula (11.2) we get that x̄ ± 1.96(0.8/√25) = x̄ ± 0.3136 is an approximate 95% CI for μ, when σ is known. I performed a 10,000 run simulation study on this CI.

(a) Explain, in detail, the steps in each run.

(b) Here are the results of the simulation study: 268 simulated CI’s were too small (this means that the upper bound of the CI was smaller than μ = 1.4); and 340 simulated CI’s were too large (this means that the lower bound of the CI was larger than μ = 1.4). Discuss these findings.

(c) I repeated the above simulation study with 10,000 runs with the change that now n = 100. The results were: 235 simulated CI’s were too small; 279 simulated CI’s were too large. Discuss these findings.
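Here is a Python sketch of what one run might look like for part (a), assuming σ = 0.8 is known. Draw n values from the cat population, form x̄ ± 1.96σ/√n, and record whether the interval is too small, too large or correct. The seed is arbitrary, so the counts will differ somewhat from 268 and 340.

```python
import random
from math import sqrt

values, probs = [0, 1, 2, 3], [0.1, 0.5, 0.3, 0.1]
mu = sum(v * p for v, p in zip(values, probs))                       # 1.4
sigma = sqrt(sum((v - mu) ** 2 * p for v, p in zip(values, probs)))  # 0.8

def simulate(n, runs=10_000, z=1.96, seed=2011):
    """Each run: draw n values, form x-bar +/- z * sigma / sqrt(n) (sigma known),
    and record whether the CI is too small, too large or correct."""
    rng = random.Random(seed)
    half = z * sigma / sqrt(n)
    too_small = too_large = 0
    for _ in range(runs):
        xbar = sum(rng.choices(values, weights=probs, k=n)) / n
        if xbar + half < mu:
            too_small += 1
        elif xbar - half > mu:
            too_large += 1
    return too_small, too_large

print(simulate(25))
print(simulate(100))
```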

2. A researcher obtained the weights, in grams, of n = 13 web weaver spiders. We have the following summary statistics: x̄ = 0.3582 and s = 0.1140.

(a) Calculate Gosset’s 95% CI for the population mean weight of web weaver spiders.

(b) How would you obtain a random sample of web weaver spiders?

(c) Test the null hypothesis that the population mean equals 0.5 grams versus the two-sided alternative. Find the P-value and use α = 0.05.

(d) Here are the sorted weights of these spiders:

0.106 0.234 0.287 0.324 0.325
0.325 0.357 0.387 0.404 0.422
0.439 0.506 0.540

Use these data to calculate a CI for the population median, ν. Use a confidence level that is close to 95%.
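Here is a Python sketch of parts (a), (c) and (d). The t value 2.179 is t_0.025(12) taken from a table, since the standard library has no t quantile function. The P-value for (c) would still come from the website or a table. The median CI uses order statistics [x_(k), x_(n+1−k)] with confidence 1 − 2P(Bin(n, 0.5) ≤ k − 1).

```python
from math import comb, sqrt

weights = [0.106, 0.234, 0.287, 0.324, 0.325, 0.325, 0.357,
           0.387, 0.404, 0.422, 0.439, 0.506, 0.540]
n, xbar, s = 13, 0.3582, 0.1140
t_crit = 2.179                       # t_0.025(12), from a t table (not computed here)
se = s / sqrt(n)
ci = (xbar - t_crit * se, xbar + t_crit * se)          # Gosset 95% CI
t_stat = (xbar - 0.5) / se                              # test statistic for H0: mu = 0.5

def median_ci(sorted_data, k):
    """CI [x_(k), x_(n+1-k)] for the median; confidence = 1 - 2 P(Bin(n, 0.5) <= k - 1)."""
    m = len(sorted_data)
    tail = sum(comb(m, j) for j in range(k)) / 2**m
    return sorted_data[k - 1], sorted_data[m - k], 1 - 2 * tail

options = [median_ci(weights, k) for k in range(1, 7)]
best = min(options, key=lambda o: abs(o[2] - 0.95))     # level closest to 95%
print([round(b, 4) for b in ci], round(t_stat, 2), best)
```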

3. Refer to problem 1.

(a) I performed a 10,000 run simulation study with 95% confidence level, n = 10 and both the Slutsky (z = 1.96) and Gosset (t = 2.262) CI’s for the cat population. Here is what I found:

• For Slutsky, 451 CI’s were too small and 371 were too large.

• For Gosset, 362 CI’s were too small and 148 were too large.

Discuss these findings.

(b) I performed a 10,000 run simulationstudy with 95% confidence level,n =20 and both the Slutsky (z = 1.96)and Gosset (t = 2.093) CI’s for thecat population. Here is what I found:


• For Gosset, 332 CI’s were too small and 208 were too large.


(c) I performed a 10,000-run simulation study with 95% confidence level, $n = 40$ and both the Slutsky ($z = 1.96$) and Gosset ($t = 2.023$) CI’s for the cat population. Here is what I found:


Chapter 11 Lecture Examples: Cont.


• For Gosset, 320 CI’s were too small and 209 were too large.
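The Slutsky and Gosset intervals replace the known $\sigma$ by the sample standard deviation $s$, and Gosset also replaces 1.96 by a $t$ value. A Python sketch of such a study, assuming standard-library sampling and using the $t$ multipliers quoted in parts (a) to (c); the function name `simulate` is hypothetical.

```python
import math
import random

VALUES = [0, 1, 2, 3]            # cat population from problem 1
PROBS = [0.1, 0.5, 0.3, 0.1]
MU = 1.4


def simulate(n, multiplier, runs=10_000, seed=None):
    """Count misses of xbar +/- multiplier * s / sqrt(n); s is the sample sd.

    multiplier = 1.96 gives Slutsky's CI; a t value gives Gosset's CI.
    Returns (too_small, too_large)."""
    rng = random.Random(seed)
    too_small = too_large = 0
    for _ in range(runs):
        sample = rng.choices(VALUES, weights=PROBS, k=n)
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half_width = multiplier * s / math.sqrt(n)
        if xbar + half_width < MU:       # whole CI below mu
            too_small += 1
        elif xbar - half_width > MU:     # whole CI above mu
            too_large += 1
    return too_small, too_large


for n, t in ((10, 2.262), (20, 2.093), (40, 2.023)):
    # Using the same seed gives both methods identical samples.
    print(f"n = {n}: Slutsky {simulate(n, 1.96, seed=n)}, Gosset {simulate(n, t, seed=n)}")
```

With a shared seed, each Gosset interval contains the Slutsky interval built from the same sample, so Gosset can never record more misses in total.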


4. *On page 526 of my textbook is a picture of the lognormal pdf with parameters 5 and 1. (This means that if we begin with $Y$, which has a $N(5, 1)$ pdf, then $X = e^Y$ has this lognormal pdf.) It suffices to note that $X$ must be positive and its pdf is strongly skewed to the right, as evidenced by $\mu = 244.7$ being about 65% larger than $\nu = 148.4$. Another indication of skewness is that, even though $X$ cannot be negative, $\sigma = 320.8$ is larger than $\mu = 244.7$.

When I was writing my text, I performed several 5,000-run simulation studies to see how Gosset’s CI worked for this lognormal and various choices of $n$. My results are summarized in the table below.

n      Proportion of correct ‘95%’ Gosset CI’s
10     0.8410
20     0.8764
30     0.8858
100    0.9092
150    0.9254
200    0.9306

Briefly discuss what this table reveals.
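The values of $\mu$, $\nu$ and $\sigma$ quoted in problem 4 follow from the standard lognormal formulas: if $Y \sim N(m, v)$ and $X = e^Y$, then the mean of $X$ is $e^{m + v/2}$, its median is $e^m$ and its variance is $(e^v - 1)e^{2m + v}$. A quick check in Python:

```python
import math

# Y ~ N(5, 1) and X = exp(Y): the lognormal with parameters 5 and 1.
m, v = 5.0, 1.0                      # mean and variance of Y

mu = math.exp(m + v / 2)             # mean of X
nu = math.exp(m)                     # median of X
sigma = math.sqrt((math.exp(v) - 1) * math.exp(2 * m + v))  # sd of X

print(f"mu = {mu:.1f}, nu = {nu:.1f}, sigma = {sigma:.1f}")
print(f"mu / nu = {mu / nu:.3f}")    # equals exp(v / 2)
```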

5. Four researchers study the same population and each one calculates an 80% CI for $\mu$. Their intervals are:

[15, 38], [44, 57], [25, 52] and [39, 63].

(a) Nature announces, “Two of the CI’s are correct; one is too small; and one is too large.” Given this information, determine all possible values of $\mu$.

(b) Nature announces, “Exactly three of the CI’s are correct.” Given this information, determine all possible values of $\mu$.
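Each count can change only at an interval endpoint, so both parts can be settled by checking every endpoint and one point inside each gap between endpoints. A Python sketch of that sweep, treating each CI as a closed interval, with "too small" meaning the upper bound is below $\mu$ and "too large" meaning the lower bound is above $\mu$; the helper names `classify` and `solve` are hypothetical.

```python
import math

INTERVALS = [(15, 38), (44, 57), (25, 52), (39, 63)]   # the four 80% CIs


def classify(mu, intervals):
    """Return (correct, too_small, too_large) if the true mean were mu."""
    too_small = sum(1 for lo, hi in intervals if hi < mu)   # CI lies below mu
    too_large = sum(1 for lo, hi in intervals if lo > mu)   # CI lies above mu
    return len(intervals) - too_small - too_large, too_small, too_large


def solve(intervals, predicate):
    """Return all mu with predicate(correct, too_small, too_large) true.

    Result: merged pieces (left, right, left_closed, right_closed)."""
    pts = sorted({e for iv in intervals for e in iv})
    pieces = [(-math.inf, pts[0], False, False, pts[0] - 1)]
    for i, p in enumerate(pts):
        pieces.append((p, p, True, True, p))                  # the endpoint itself
        nxt = pts[i + 1] if i + 1 < len(pts) else math.inf
        rep = (p + nxt) / 2 if nxt < math.inf else p + 1      # a point inside the gap
        pieces.append((p, nxt, False, False, rep))
    result, prev_ok = [], False
    for left, right, lc, rc, rep in pieces:
        ok = predicate(*classify(rep, intervals))
        if ok and prev_ok:                                    # extend the previous piece
            l0, _, lc0, _ = result[-1]
            result[-1] = (l0, right, lc0, rc)
        elif ok:
            result.append((left, right, lc, rc))
        prev_ok = ok
    return result


print(solve(INTERVALS, lambda correct, small, large: (correct, small, large) == (2, 1, 1)))  # (a)
print(solve(INTERVALS, lambda correct, small, large: correct == 3))                         # (b)
```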

6. Twenty researchers select random samples from the same population. Each researcher computes a confidence interval for the mean of the population. Thus, each researcher is estimating the same number. Unfortunately, the lower and upper bounds of the 20 confidence intervals became disconnected. The 20 lower bounds, sorted, are below.

220 318 332 340 351
364 371 378 383 424
436 453 464 475 480
487 489 492 511 519

The 20 upper bounds, sorted, are below.

496 507 528 534 569
573 595 604 618 644
674 686 696 719 725
744 751 837 941 964

(a) Given that $\mu = 497$: How many of the confidence intervals are too small? How many are too large? How many are correct?

(b) Suppose we are told that exactly three of the intervals are too large. Determine all possible values of $\mu$.

(c) Suppose we are told that at most four of the intervals are too small. Determine all possible values of $\mu$.

(d) Nature announces that at least 17 of the CI’s are correct. Determine all possible values of $\mu$ that yield this number of correct intervals.
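The lost pairing does not matter: a CI is too small exactly when its upper bound is below $\mu$, and too large exactly when its lower bound is above $\mu$, so each count needs only one of the two sorted lists. A Python sketch using the standard-library `bisect` module; the helper names `counts` and `runs` are hypothetical.

```python
from bisect import bisect_left, bisect_right

LOWER = [220, 318, 332, 340, 351, 364, 371, 378, 383, 424,
         436, 453, 464, 475, 480, 487, 489, 492, 511, 519]
UPPER = [496, 507, 528, 534, 569, 573, 595, 604, 618, 644,
         674, 686, 696, 719, 725, 744, 751, 837, 941, 964]


def counts(mu):
    """Return (correct, too_small, too_large) if the population mean is mu."""
    too_small = bisect_left(UPPER, mu)                # upper bounds < mu
    too_large = len(LOWER) - bisect_right(LOWER, mu)  # lower bounds > mu
    return len(LOWER) - too_small - too_large, too_small, too_large


def runs(points, step=0.5):
    """Group sorted grid points into maximal runs of consecutive points."""
    out = []
    for p in points:
        if out and p - out[-1][1] == step:
            out[-1][1] = p
        else:
            out.append([p, p])
    return out


print(counts(497))                                        # part (a): (17, 1, 2)

# Every bound is an integer, so integers and half-integers cover all cases.
grid = [k / 2 for k in range(400, 2001)]                  # mu from 200 to 1000
print(runs([mu for mu in grid if counts(mu)[2] == 3]))    # part (b)
print(runs([mu for mu in grid if counts(mu)[1] <= 4]))    # part (c)
print(runs([mu for mu in grid if counts(mu)[0] >= 17]))   # part (d)
```

A run whose endpoints are integers means the condition holds on the closed interval between them; a run ending at a half-integer $k + 0.5$ means the set stops just short of $k + 1$. The lower end printed for (c) is only the start of the grid, since every smaller $\mu$ also qualifies.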

Chapter 12 Lecture Examples

1. A 1991 study compared a treatment and a placebo in postmenopausal women with osteoporosis. (See the references to Chapter 16 of my text for more information.) Forty subjects were available for study and they were divided into two equal-sized groups by randomization. The response was the percentage change in ‘bone mineral content’ from baseline to the end of the study. In the first group, calcium supplement, the sample mean was 5.3 percent with a standard deviation of 7.05 percent. In the second group, placebo, the sample mean was −2.7 percent with a standard deviation of 9.83 percent.

(a) Use Case 3 to obtain the 95% CI for $\mu_1 - \mu_2$.

(b) Use Case 1 to obtain the 95% CI for $\mu_1 - \mu_2$.

(c) Compare your answers in (a) and (b).

(d) Find the P-value for testing $\mu_1 = \mu_2$ versus $\mu_1 > \mu_2$. Use Case 3.

(e) Repeat (d) for Case 1.

(f) Compare your answers in (d) and (e).

2. Calculate $s_p$ in each of the following cases.

(a) $n_1 = 8$, $n_2 = 14$, $s_1 = 5.00$ and $s_2 = 8.50$.

(b) $n_1 = 12$, $n_2 = 15$, $s_1 = 9.00$ and $s_2 = 11.50$.
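The pooled variance $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$ is a degrees-of-freedom-weighted average of the two sample variances, and $s_p$ is its square root. A minimal Python version (the function name `pooled_sd` is hypothetical):

```python
import math


def pooled_sd(n1, n2, s1, s2):
    """Pooled standard deviation s_p used in the Case 1 (equal-variance) analysis."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(sp2)


print(f"(a) s_p = {pooled_sd(8, 14, 5.00, 8.50):.4f}")
print(f"(b) s_p = {pooled_sd(12, 15, 9.00, 11.50):.4f}")
```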

3. For this problem, refer to the t-curve calculator linked to our course website. Remember that there are three boxes and a choice. The top box is for the df. The left box has a choice, either ‘Area left of’ or ‘Area right of.’

For each of the situations below, first tell me what to enter in each box and which choice to select to obtain the desired answer. Next, do it and tell me the answer.

(a) $n_1 = 8$, $n_2 = 14$ and we want the $t$ needed for the 90% CI for $\mu_1 - \mu_2$.

(b) $n_1 = 12$, $n_2 = 15$ and we want the $t$ needed for the 98% CI for $\mu_1 - \mu_2$.

(c) $n_1 = 8$, $n_2 = 14$, $t = 2.734$ and we want the P-value for the alternative $>$.

(d) $n_1 = 8$, $n_2 = 14$, $t = 2.734$ and we want the P-value for the alternative $\neq$.

(e) $n_1 = 12$, $n_2 = 15$, $t = -1.342$ and we want the P-value for the alternative $<$.
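The course calculator's internals are not described, so here is a hypothetical stand-in: it computes areas under a $t$ curve by numerically integrating the $t$ density (Simpson's rule) and inverts an area by bisection. It assumes the Case 1 degrees of freedom $n_1 + n_2 - 2$; the function names are invented for illustration.

```python
import math


def area_left_of(t, df, steps=4000):
    """P(T <= t) for a t curve with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def f(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = f(0) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)   # Simpson weights 4, 2, 4, ...
    half = total * h / 3                          # area between 0 and |t|
    return 0.5 + half if t >= 0 else 0.5 - half


def area_right_of(t, df):
    return 1 - area_left_of(t, df)


def t_with_area_right(p, df):
    """The t with area p to its right (0 < p < 0.5), found by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if area_right_of(mid, df) > p:
            lo = mid          # too much area to the right: move t up
        else:
            hi = mid
    return (lo + hi) / 2


# Case 1 degrees of freedom n1 + n2 - 2 are assumed throughout.
print("(a)", t_with_area_right(0.05, 8 + 14 - 2))    # 90% CI: 5% in each tail
print("(b)", t_with_area_right(0.01, 12 + 15 - 2))   # 98% CI: 1% in each tail
print("(c)", area_right_of(2.734, 20))                # alternative >
print("(d)", 2 * area_right_of(2.734, 20))            # alternative !=
print("(e)", area_left_of(-1.342, 25))                # alternative <
```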

4. Wrigley Field, the home of the Chicago Cubs, is widely believed to be a great hitter’s park, especially for power hitters. This exercise will attempt to investigate this issue empirically. In 1987, there were 204 home runs hit in Wrigley Field, compared to a mean of 147.3 at the eleven other National League ball parks. This, however, is not a good way to compare stadiums. The owners of the Cubs built their team to take advantage of their home park. For example, they signed Andre Dawson as a free agent before the 1987 season, and Dawson led the league that season with 49 home runs. A better method is to compare the number of home runs hit by both teams in the Cubs’ home games to the number hit by both teams in the Cubs’ road games. Explain why this is an improvement.

Below are the numbers of home runs hit, by both teams, in all Cubs’ home and away games during the 20 seasons, 1967–1987. (The 1981 season is omitted because it was shortened substantially by a players’ strike.)


Chap. 12 Lecture Examples: Continued

Assume the 20 differences are the observed values of i.i.d. random variables. The mean and standard deviation of the differences are equal to 46.2 and 24.42 home runs.

(a) Construct a 95% confidence interval for the mean difference, $\mu_d$.

(b) Find the P-value for testing $\mu_d = 25$ versus $\mu_d > 25$.

Season   Home   Away   H−A
1967      160    110    50
1968      166    102    64
1969      148    112    36
1970      201    121    80
1971      144    116    28
1972      146     99    47
1973      138    107    31
1974      139     93    46
1975      125    100    25
1976      155     73    82
1977      151     88    63
1978      117     80    37
1979      151    111    40
1980      116    100    16
1982      115    112     3
1983      140    117    23
1984      156     79    77
1985      202    104    98
1986      168    130    38
1987      204    164    40
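A Python sketch of the paired analysis in parts (a) and (b). It uses the $t$ value 2.093 for 19 degrees of freedom, the value quoted for $n = 20$ in Chapter 11, problem 3. The differences are computed from the Home and Away columns rather than entered directly, which guards against transcription slips in the H−A column.

```python
import math
import statistics

HOME = [160, 166, 148, 201, 144, 146, 138, 139, 125, 155,
        151, 117, 151, 116, 115, 140, 156, 202, 168, 204]
AWAY = [110, 102, 112, 121, 116, 99, 107, 93, 100, 73,
        88, 80, 111, 100, 112, 117, 79, 104, 130, 164]
T_19 = 2.093   # t for 19 df, 95% two-sided

diffs = [h - a for h, a in zip(HOME, AWAY)]   # home minus away, one per season
d_bar = statistics.fmean(diffs)
s_d = statistics.stdev(diffs)                 # divisor n - 1
se = s_d / math.sqrt(len(diffs))

ci = (d_bar - T_19 * se, d_bar + T_19 * se)   # (a) 95% CI for mu_d
t_stat = (d_bar - 25) / se                    # (b) refer to t with 19 df, right tail
print(f"mean {d_bar:.1f}, sd {s_d:.2f}, CI ({ci[0]:.2f}, {ci[1]:.2f}), t = {t_stat:.3f}")
```

The P-value for (b) is the area to the right of `t_stat` under the $t$ curve with 19 degrees of freedom.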

5. Refer to the data in the previous problem. Pretend that the data are actually two independent sequences of i.i.d. trials, with $n_1 = n_2 = 20$. With this scenario, we get the following summary statistics:

$\bar{x} = 152.10$, $\bar{y} = 105.90$, $s_x = 26.55$ and $s_y = 20.04$.

Construct a 95% confidence interval for the difference of the means. Compare your answer to the more valid answer in the previous problem and comment.
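Under the independent-samples pretense, a pooled (Case 1) interval can be sketched as follows. The choice of Case 1 and the table value $t = 2.024$ for 38 degrees of freedom are assumptions, since the problem does not name a case.

```python
import math

n1 = n2 = 20
xbar, ybar, sx, sy = 152.10, 105.90, 26.55, 20.04
T_38 = 2.024   # t-table value, 38 df, 95% two-sided (supplied for this sketch)

sp = math.sqrt(((n1 - 1) * sx ** 2 + (n2 - 1) * sy ** 2) / (n1 + n2 - 2))  # pooled sd
se = sp * math.sqrt(1 / n1 + 1 / n2)
half_width = T_38 * se
ci = (xbar - ybar - half_width, xbar - ybar + half_width)
print(f"95% CI for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f}); half-width {half_width:.2f}")
```

With $n_1 = n_2$, the pooled standard error equals the unpooled $\sqrt{s_x^2/n_1 + s_y^2/n_2}$, so a pooled and an unpooled analysis differ here only in the degrees of freedom used for $t$.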

6. I have a computer program with the following property. I input: $\bar{x}$, $s_1$, $n_1$, $\bar{y}$, $s_2$, $n_2$ and the confidence level I want. For output, it gives me the Case 1 confidence interval for $\mu_1 - \mu_2$.

Yesterday, I input these seven numbers, including $n_1 = 7$, $n_2 = 9$ and 90% level. The 90% confidence interval obtained by the program was: $15.00 \pm 10.00$.

Today, I discovered that there were two errors in my data entry yesterday: the actual value of $n_1$ is 13 and the actual value of $n_2$ is 17.

(a) Explain why the value of $s_p^2$ for the correct sample sizes is the same as it was for the incorrect sample sizes.

(b) For the corrected values of the sample sizes, obtain the 95% Case 1 confidence interval for $\mu_1 - \mu_2$.

7. Refer to the previous question.

Yesterday, I input the seven numbers, including $n_1 = n_2 = 8$ and 90% level. The 90% confidence interval obtained by the program was: $20.00 \pm 12.00$.

Today, I discovered that there was an error (two errors?) in my data entry yesterday: the actual value of $n_1 = n_2$ is 12.

(a) Explain why the value of $s_p^2$ for the correct sample sizes is the same as it was for the incorrect sample sizes.

(b) For the corrected values of the sample sizes, obtain the 95% Case 1 confidence interval for $\mu_1 - \mu_2$.
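Two facts make part (b) of problems 6 and 7 mechanical. The point estimate $\bar{x} - \bar{y}$ does not involve the sample sizes, and $s_p^2$ is unchanged: $(6s_1^2 + 8s_2^2)/14 = (12s_1^2 + 16s_2^2)/28$ in problem 6, and $s_p^2 = (s_1^2 + s_2^2)/2$ whenever $n_1 = n_2$, as in problem 7. So $s_p$ can be backed out of the reported half-width and reused. The $t$ values are standard table entries supplied for this sketch, and the helper names are hypothetical.

```python
import math

# t-table values (supplied): 90% two-sided with 14 df; 95% with 28 df and 22 df.
T90_14, T95_28, T95_22 = 1.761, 2.048, 2.074


def recover_sp(half_width, t, n1, n2):
    """Back out s_p from a reported Case 1 half-width t * s_p * sqrt(1/n1 + 1/n2)."""
    return half_width / (t * math.sqrt(1 / n1 + 1 / n2))


def half_width(sp, t, n1, n2):
    return t * sp * math.sqrt(1 / n1 + 1 / n2)


# Problem 6: reported 15.00 +/- 10.00 with n1 = 7, n2 = 9 (df = 14).
sp6 = recover_sp(10.00, T90_14, 7, 9)
hw6 = half_width(sp6, T95_28, 13, 17)            # df = 13 + 17 - 2 = 28
# Problem 7: reported 20.00 +/- 12.00 with n1 = n2 = 8 (df = 14).
sp7 = recover_sp(12.00, T90_14, 8, 8)
hw7 = half_width(sp7, T95_22, 12, 12)            # df = 12 + 12 - 2 = 22
print(f"Problem 6: 15.00 +/- {hw6:.2f}")
print(f"Problem 7: 20.00 +/- {hw7:.2f}")
```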

Chapter 13 Lecture Examples

1. Below is a table of population counts.

        B     Bᶜ    Total
A       60    40    100
Aᶜ      140   260   400
Total   200   300   500

(a) Create the table of population proportions, a.k.a. probabilities.

(b) Create the table of the conditional probabilities of the $A$’s given the $B$’s.

(c) Create the table of the conditional probabilities of the $B$’s given the $A$’s.

(d) Suppose that this table is for a screening test for a disease. Describe, in words, each of the eight conditional probabilities.
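A short Python sketch of parts (a) to (c), using exact fractions from the standard-library `fractions` module; the dictionary layout is just one convenient choice.

```python
from fractions import Fraction

# Population counts: rows A, Ac; columns B, Bc.
COUNTS = {("A", "B"): 60, ("A", "Bc"): 40, ("Ac", "B"): 140, ("Ac", "Bc"): 260}
ROWS, COLS = ("A", "Ac"), ("B", "Bc")
total = sum(COUNTS.values())

# (a) joint probabilities
joint = {k: Fraction(v, total) for k, v in COUNTS.items()}
col_tot = {c: sum(COUNTS[r, c] for r in ROWS) for c in COLS}
row_tot = {r: sum(COUNTS[r, c] for c in COLS) for r in ROWS}
# (b) P(row | column), e.g. P(A | B)
a_given_b = {(r, c): Fraction(COUNTS[r, c], col_tot[c]) for r in ROWS for c in COLS}
# (c) P(column | row), e.g. P(B | A)
b_given_a = {(r, c): Fraction(COUNTS[r, c], row_tot[r]) for r in ROWS for c in COLS}

for name, table in (("joint", joint), ("A given B", a_given_b), ("B given A", b_given_a)):
    print(name, {k: round(float(v), 4) for k, v in table.items()})
```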

2. My neighbor has trained her dog to bark whenever she (my neighbor, not the dog) makes a basket. From my perch on my porch, I can see my neighbor shooting free throws, but I cannot see the basket she is shooting at. Also, whether she makes the shot or not, she gives no reaction that I can see. Finally, she claims all of the following to be true:

• Her free throws are BT’s with $p = 0.75$.

• Given that she makes (misses) a free throw, there is an 80% (28%) chance that her dog will bark.

Assume that all of my neighbor’s probabilities are correct.

(a) Define events $A$, $A^c$, $B$ and $B^c$.

(b) Create the table of probabilities.

(c) Calculate the probability that my neighbor’s dog will bark on any given shot.

(d) On a given shot, given that the dog barks, what is the probability my neighbor made the shot?

(e) On a given shot, given that the dog does not bark, what is the probability my neighbor missed the shot?
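Taking $A$ = "she makes the shot" and $B$ = "the dog barks" (one natural choice for part (a), assumed here), parts (b) to (e) follow from the multiplication rule, the rule of total probability and Bayes' rule. A sketch with exact fractions:

```python
from fractions import Fraction

# Assumed event labels: A = she makes the shot, B = the dog barks.
p_A = Fraction(3, 4)                 # free throws are BT's with p = 0.75
p_B_given_A = Fraction(80, 100)      # bark probability after a make
p_B_given_Ac = Fraction(28, 100)     # bark probability after a miss

# (b) the table of joint probabilities
p_AB = p_A * p_B_given_A
p_ABc = p_A * (1 - p_B_given_A)
p_AcB = (1 - p_A) * p_B_given_Ac
p_AcBc = (1 - p_A) * (1 - p_B_given_Ac)

p_B = p_AB + p_AcB                   # (c) probability of a bark on a given shot
p_A_given_B = p_AB / p_B             # (d) Bayes' rule
p_Ac_given_Bc = p_AcBc / (1 - p_B)   # (e)

print(float(p_B), float(p_A_given_B), float(p_Ac_given_Bc))
```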

3. This example is about relative risks and odds ratios. Below is a table of hypothetical population counts, in thousands.

Group   B    Bᶜ    Total
A       9    91    100
Aᶜ      27   873   900
Total   36   964   1000

A case-control study with 1000 subjects from this population yielded the data below.

Group   B     Bᶜ    Total
A       126   46    172
Aᶜ      374   454   828
Total   500   500   1000

(a) Calculate the relative risk and odds ratio for the population; comment.

(b) Calculate the point estimate of the population odds ratio.

(c) Obtain the 95% CI for the population odds ratio.
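A Python sketch for all three parts. The interval in (c) is the common large-sample interval built on the log odds ratio, $\ln(\widehat{OR}) \pm 1.96\sqrt{1/a + 1/b + 1/c + 1/d}$; treating it as the intended method is an assumption. The relative risk is computed only for the population, because the case-control design fixes the $B$ and $B^c$ totals at 500 each, while the odds ratio is unaffected by that kind of sampling.

```python
import math

# Population counts (thousands), layout: A-row (B, Bc), then Ac-row (B, Bc).
a, b, c, d = 9, 91, 27, 873
rr = (a / (a + b)) / (c / (c + d))      # relative risk: P(B | A) / P(B | Ac)
or_pop = (a * d) / (b * c)              # population odds ratio

# Case-control sample counts in the same layout.
a2, b2, c2, d2 = 126, 46, 374, 454
or_hat = (a2 * d2) / (b2 * c2)          # (b) point estimate
se_log = math.sqrt(1 / a2 + 1 / b2 + 1 / c2 + 1 / d2)
lo = math.exp(math.log(or_hat) - 1.96 * se_log)   # (c) 95% CI, lower bound
hi = math.exp(math.log(or_hat) + 1.96 * se_log)   # (c) 95% CI, upper bound

print(f"RR = {rr:.3f}, OR = {or_pop:.4f}, estimate = {or_hat:.4f}, CI = ({lo:.3f}, {hi:.3f})")
```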


Chap. 13 Lecture Examples: Continued

4. Below are data that you have seen previously in homework. They are the data on Rick Robey shooting free throws.

                  Second Shot
First Shot        Made (B)   Missed (Bᶜ)   Total
Made (A)          54         37            91
Missed (Aᶜ)       49         31            80
Total             103        68            171

(a) Find the P-values for the test of $p_A = p_B$ versus each of the three possible alternatives.

(b) Obtain the 80% CI for $p_A - p_B$.
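A Python sketch of one standard large-sample approach for paired proportions: a McNemar-type $z$ statistic built from the discordant pairs, and the usual large-sample standard error of $\hat{p}_A - \hat{p}_B$. Whether this matches the text's own formulas is an assumption; the value 1.282 is the standard normal point for an 80% interval.

```python
import math

# Counts for Rick Robey's 171 pairs of free throws (first shot, second shot).
n_AB, n_ABc, n_AcB, n_AcBc = 54, 37, 49, 31
n = n_AB + n_ABc + n_AcB + n_AcBc

p_A = (n_AB + n_ABc) / n            # proportion of first shots made
p_B = (n_AB + n_AcB) / n            # proportion of second shots made
diff = p_A - p_B                    # equals (n_ABc - n_AcB) / n


def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Only the discordant pairs (made-missed, missed-made) carry information about pA - pB.
z = (n_ABc - n_AcB) / math.sqrt(n_ABc + n_AcB)
p_greater = 1 - phi(z)              # alternative pA > pB
p_less = phi(z)                     # alternative pA < pB
p_two_sided = 2 * min(p_greater, p_less)

# 80% CI for pA - pB with the usual large-sample SE for paired proportions.
se = math.sqrt((n_ABc + n_AcB) - (n_ABc - n_AcB) ** 2 / n) / n
ci = (diff - 1.282 * se, diff + 1.282 * se)
print(f"z = {z:.3f}; P-values (>, <, !=): {p_greater:.4f}, {p_less:.4f}, {p_two_sided:.4f}")
print(f"80% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```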

5. Linda performed a randomized pairs design with 50 trials to investigate whether Maisie is better at catching a 28-inch frisbee (the first treatment) or a 36-inch frisbee in her mouth. Each trial was a toss of a frisbee, and Maisie was credited with a success if she caught the frisbee before it hit the ground. Here are Maisie’s results:

• For six pairs of trials she obtained a success with each frisbee.

• For four pairs of trials she obtained a failure with each frisbee.

• For six pairs of trials she obtained a success with the smaller frisbee but a failure with the larger frisbee.

Answer questions (a) and (b) from problem 4.

Chapter 14 Lecture Examples

Use the following computer output to answer questions 1–11.

Regression Analysis: Runs versus OPS

The regression equation is
Runs = - 912 + 2205 OPS

Predictor     Coef  SE Coef      T      P
Constant    -912.4    120.5  -7.57  0.000
OPS         2205.4    163.0  13.53  0.000

S = 16.91   R-Sq = 92.9%   R-Sq(adj) = 92.4%

Analysis of Variance

Source          DF     SS     MS       F      P
Regression       1  52368  52368  183.12  0.000
Residual Error  14   4004    286
Total           15  56372

Obs    OPS    Runs     Fit  SE Fit  Residual  St Resid
  1  0.781  820.00  810.05    8.04      9.95      0.67
  2  0.784  804.00  816.67    8.46    -12.67     -0.87
  3  0.767  785.00  779.18    6.21      5.82      0.37
  4  0.758  780.00  759.33    5.23     20.67      1.29
  5  0.756  772.00  754.92    5.05     17.08      1.06
  6  0.744  735.00  728.45    4.30      6.55      0.40
  7  0.747  730.00  735.07    4.42     -5.07     -0.31
  8  0.742  720.00  724.04    4.25     -4.04     -0.25
  9  0.743  710.00  726.25    4.28    -16.25     -0.99
 10  0.738  707.00  715.22    4.23     -8.22     -0.50
 11  0.712  673.00  657.88    6.11     15.12      0.96
 12  0.729  671.00  695.37    4.53    -24.37     -1.50
 13  0.699  657.00  629.21    7.78     27.79      1.85
 14  0.719  643.00  673.32    5.34    -30.32     -1.89
 15  0.701  638.00  633.62    7.51      4.38      0.29
 16  0.705  636.00  642.44    6.98     -6.44     -0.42


Chap. 14 Lecture Examples: Cont.

1. What is the point estimate of the slope of the regression line? Interpret this number.

2. Calculate the 99% CI for the slope of the regression line.

3. Find the P-value for testing the null hypothesis that $\beta_1 = 2000$ versus the alternative $\beta_1 > 2000$.

4. Calculate the 98% CI for the mean of $Y$ given that $X = 0.781$.

5. Calculate the 98% CI for the mean of $Y$ given that $X = 0.738$.

6. Calculate the 90% prediction interval for the value of $Y$ given that $X = 0.767$.

7. Calculate the 90% prediction interval for the value of $Y$ given that $X = 0.701$.

8. The output tells you that $s = 16.91$. Calculate this value by using the ANOVA table output.

9. The output tells you that $R^2 = 0.929$. Calculate this value by using the ANOVA table output.

10. Fill in the blanks in the following.

According to the empirical rule, about 68% of the residuals will fall between ________ and ________.

Determine, by counting and then dividing by 16, the actual percentage of residuals that fall between your two numbers above.
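A Python sketch for questions 2 to 10 using standard simple-linear-regression formulas: $s = \sqrt{MSE}$, $R^2 = SS(\text{Regression})/SS(\text{Total})$, a CI for the mean response of Fit $\pm t \cdot$ SE Fit, and a prediction interval of Fit $\pm t\sqrt{s^2 + \text{SE Fit}^2}$, reading Minitab's SE Fit as the estimated standard error of the fitted mean. The $t$ values for 14 degrees of freedom are table entries supplied for this sketch, and using the band from $-s$ to $s$ in question 10 is an assumed reading of the empirical rule for residuals.

```python
import math

# Values read from the Minitab output (Runs regressed on OPS, 16 teams).
B1, SE_B1 = 2205.4, 163.0              # OPS coefficient and its SE
SS_REG, SS_RES, SS_TOTAL, DF_RES = 52368, 4004, 56372, 14
RESIDUALS = [9.95, -12.67, 5.82, 20.67, 17.08, 6.55, -5.07, -4.04,
             -16.25, -8.22, 15.12, -24.37, 27.79, -30.32, 4.38, -6.44]
# t-table values for 14 df (supplied): 99%, 98%, 90% two-sided.
T99, T98, T90 = 2.977, 2.624, 1.761

s = math.sqrt(SS_RES / DF_RES)         # Q8: s = sqrt(MS for residual error)
r_sq = SS_REG / SS_TOTAL               # Q9: R^2 = SS(Regression) / SS(Total)

slope_ci = (B1 - T99 * SE_B1, B1 + T99 * SE_B1)   # Q2
t_stat = (B1 - 2000) / SE_B1                       # Q3: refer to t with 14 df


def mean_ci(fit, se_fit, t):
    """CI for the mean of Y at a given X: Fit +/- t * SE Fit."""
    return fit - t * se_fit, fit + t * se_fit


def prediction_interval(fit, se_fit, t):
    """PI for one new Y at a given X: Fit +/- t * sqrt(s^2 + SE Fit^2)."""
    half = t * math.sqrt(s ** 2 + se_fit ** 2)
    return fit - half, fit + half


print("Q2", slope_ci, "Q3 t =", t_stat)
print("Q4", mean_ci(810.05, 8.04, T98))              # X = 0.781 (Obs 1)
print("Q5", mean_ci(715.22, 4.23, T98))              # X = 0.738 (Obs 10)
print("Q6", prediction_interval(779.18, 6.21, T90))  # X = 0.767 (Obs 3)
print("Q7", prediction_interval(633.62, 7.51, T90))  # X = 0.701 (Obs 15)

# Q10: least-squares residuals average 0, so the band used is -s to s.
inside = sum(1 for r in RESIDUALS if -s < r < s)
print(f"s = {s:.2f}, R^2 = {r_sq:.3f}, {inside}/16 = {inside / 16:.2%} of residuals in (-s, s)")
```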

