Copyright © 2011 Pearson Education, Inc. Alternative Approaches to Inference, Chapter 17


Upload: melissa-williams

Post on 19-Jan-2016



Alternative Approaches to Inference

Chapter 17


17.1 A Confidence Interval for the Median

An auto insurance company is thinking about compensating agents by comparing the number of claims they produce to a standard. Annual claims average near $3,200 with a median claim of $2,000.

Claims are highly skewed.

Use nonparametric methods that don’t rely on a normal sampling distribution.


17.1 A Confidence Interval for the Median

Distribution of Sample of Claims (n = 42)

For this sample, the average claim is $3,632 with s = $4,254. The median claim is $2,456.


17.1 A Confidence Interval for the Median

Is Sample Mean Compatible with µ=$3,200?

To answer this question, construct a 95% confidence interval for µ

This interval is $3,632 ± 2.02 × $4,254 / √42 = [$2,306 to $4,958]

17.1 A Confidence Interval for the Median

Is Sample Mean Compatible with µ=$3,200?

The national average of $3,200 lies within the 95% confidence t-interval for the mean.

BUT…the sample does not satisfy the sample size condition necessary to use the t-interval.

The t-interval is unreliable with unknown coverage when the conditions are not met.
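As a check on the arithmetic, the t-interval quoted for the claims sample can be recomputed from the summary statistics. This is a minimal sketch: the critical value 2.02 is taken from the slides rather than computed, and the function name `t_interval` is illustrative.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (low, high) for mean ± t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Summary statistics of the claims sample (n = 42) as given on the slides.
# t_crit = 2.02 is the slides' value for 41 degrees of freedom.
low, high = t_interval(mean=3632, sd=4254, n=42, t_crit=2.02)
print(f"95% t-interval: ${low:,.0f} to ${high:,.0f}")
print("Contains $3,200:", low <= 3200 <= high)
```

Reproducing the numbers does not make the interval trustworthy; the sample size condition still fails for data this skewed.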


17.1 A Confidence Interval for the Median

Nonparametric Statistics

Avoid making assumptions about the shape of the population.

Often rely on sorting the data.

Suited to parameters such as the population median θ (theta).


17.1 A Confidence Interval for the Median

Nonparametric Statistics

For the claims data that are highly skewed to the right, θ < µ.

If the population distribution is symmetric, then θ = µ.


17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

First step in finding a confidence interval for θ is to sort the observed data in ascending order (known as order statistics).

Order statistics are denoted as X(1) < X(2) < … < X(n)


17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

If data are an SRS from a population with median θ, then we know

1. The probability that a random draw from the population is less than or equal to θ is ½.

2. The observations in the random sample are independent.


17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

Determine the probabilities that the population median lies between ordered observations using the binomial distribution.

To form the confidence interval for θ combine several segments to achieve desired coverage.


17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

In general, we can’t construct a confidence interval for θ whose coverage is exactly 0.95.

The 94.6% confidence interval for the median claim is [$1,217 to $3,168].
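The binomial reasoning behind such an interval can be sketched in a few lines. The transcript does not say which order statistics produce the 94.6% interval, so the sketch below only tabulates the coverage of symmetric pairs X(i), X(n+1−i) for n = 42 and is not claimed to reproduce that figure. It assumes a continuous population (no ties); `median_ci_coverage` is an illustrative name.

```python
from math import comb

def median_ci_coverage(n, i):
    """Coverage of [X(i), X(n+1-i)] as a confidence interval for the median.

    B = number of observations at or below the median is Binomial(n, 1/2);
    the interval covers the median exactly when i <= B <= n - i.
    """
    return sum(comb(n, k) for k in range(i, n - i + 1)) / 2 ** n

n = 42
for i in range(13, 18):
    print(f"X({i}) to X({n + 1 - i}): coverage {median_ci_coverage(n, i):.3f}")
```

Because the coverage jumps from one value to the next as i changes, only a discrete set of confidence levels is available, which is why an exact 95% interval is generally out of reach.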


17.1 A Confidence Interval for the Median

Parametric versus Nonparametric

Limitations of nonparametric methods

1. Coverage is limited to certain values determined by sums of binomial probabilities (difficult to obtain exactly 95% coverage).

2. Median is not equal to the mean if the population distribution is skewed. This prohibits obtaining estimates for the total (total = nµ).


17.2 Transformations

Transform Data into Symmetric Distributions

Taking base 10 logs of the claims data results in a more symmetric distribution.


17.2 Transformations

Transform Data into Symmetric Distributions

Taking base 10 logs of the claims data results in data that could be from a normal distribution.


17.2 Transformations

Transform Data into Symmetric Distributions

If y = log10 x, then ȳ = 3.312 with s_y = 0.493.

The 95% confidence t-interval for µy is

[3.16 to 3.47].

If we convert back to the original scale of dollars, this interval resembles that for the median rather than that for the mean.
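The conversion back to dollars can be sketched as follows. The summary statistics are those on the slide; the critical value 2.02 is the one the slides use for n = 42 and is assumed here rather than computed.

```python
import math

n, y_bar, s_y = 42, 3.312, 0.493   # log10-scale summary statistics from the slide
t_crit = 2.02                      # slides' critical value for 41 degrees of freedom

margin = t_crit * s_y / math.sqrt(n)
low_log, high_log = y_bar - margin, y_bar + margin

# Undo the base-10 log to return to dollars.
low_usd, high_usd = 10 ** low_log, 10 ** high_log
print(f"log10 scale: {low_log:.2f} to {high_log:.2f}")
print(f"dollars:     ${low_usd:,.0f} to ${high_usd:,.0f}")
```

The back-transformed endpoints bracket the sample median of $2,456 and fall below the sample mean of $3,632, which is the sense in which the interval resembles the one for the median.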


17.3 Prediction Intervals

Prediction Interval: an interval that holds a future draw from the population with chosen probability.

For the auto insurance example, a prediction interval anticipates the size of the next claim, allowing for the random variation associated with an individual.


17.3 Prediction Intervals

For a Normal Population

The 100(1 – α)% prediction interval for an independent draw from a normal population is

\[ \bar{x} \pm t_{\alpha/2,\,n-1}\, s \sqrt{1 + \frac{1}{n}} \]

where x̄ and s estimate µ and σ.
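A minimal sketch of the normal-theory prediction interval, x̄ ± t · s · √(1 + 1/n). As an illustration only (the slides do not carry out this calculation), it is applied to the log10 claims from Section 17.2, where normality is more plausible. The critical value 2.02 is the one the slides use for n = 42, and `normal_prediction_interval` is an illustrative name.

```python
import math

def normal_prediction_interval(mean, sd, n, t_crit):
    """Return (low, high) for mean ± t_crit * sd * sqrt(1 + 1/n)."""
    margin = t_crit * sd * math.sqrt(1 + 1 / n)
    return mean - margin, mean + margin

# Log10-scale summary statistics of the claims sample (assumed roughly normal).
low_log, high_log = normal_prediction_interval(mean=3.312, sd=0.493, n=42, t_crit=2.02)
print(f"log10 scale: {low_log:.2f} to {high_log:.2f}")
print(f"dollars:     ${10 ** low_log:,.0f} to ${10 ** high_log:,.0f}")
```

The extra 1 under the square root accounts for the variation of a single new observation, so the prediction interval is much wider than the confidence interval for the mean.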


17.3 Prediction Intervals

Nonparametric Prediction Interval

Relies on the properties of order statistics:

P(X(i) ≤ X ≤ X(i+1)) = 1/(n + 1)

P(X ≤ X(1)) = 1/(n + 1)

P(X(n) ≤ X) = 1/(n + 1)


17.3 Prediction Intervals

Nonparametric Prediction Interval

Combine segments to get desired coverage.

P(X(2) ≤ X ≤ X(41)) = P($255 ≤ X ≤ $17,305)

= (41 – 2)/43 ≈ 0.91

There is a 91% chance that the next claim is between $255 and $17,305.
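The segment-counting rule generalizes to any pair of order statistics. The helper below is a sketch with an illustrative name, `order_stat_coverage`; it assumes the new draw is independent of the sample and comes from the same continuous population.

```python
def order_stat_coverage(n, i, j):
    """P(X(i) <= X <= X(j)) for a new draw X, given a sample of size n.

    Each of the n + 1 gaps defined by the order statistics has
    probability 1 / (n + 1), and the interval spans j - i of them.
    """
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    return (j - i) / (n + 1)

# Claims sample: n = 42, interval from X(2) to X(41), i.e. 39/43.
print(f"{order_stat_coverage(n=42, i=2, j=41):.3f}")
```

Widening the interval to X(1) through X(42) raises the coverage only to 41/43, so a sample of 42 cannot support a 99% nonparametric prediction interval.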


4M Example 17.1: EXECUTIVE SALARIES

Motivation

Fees earned by an executive placement service are 5% of the starting annual total compensation package. How much can the firm expect to earn by placing a current client as a CEO in the telecom industry?


4M Example 17.1: EXECUTIVE SALARIES

Method

Obtain data (n = 23 CEOs from telecom industry).


4M Example 17.1: EXECUTIVE SALARIES

Method

The distribution of total compensation for CEOs in the telecom industry is not normal. Construct a nonparametric prediction interval for the client’s anticipated total compensation package.


4M Example 17.1: EXECUTIVE SALARIES

Mechanics

Sort the data:


4M Example 17.1: EXECUTIVE SALARIES

Mechanics

The interval x(3) to x(21) is

$743,801 to $29,863,393

and is a 75% prediction interval.


4M Example 17.1: EXECUTIVE SALARIES

Message

The compensation package of three out of four placements in this industry is predicted to be in the range from about $750,000 to $30,000,000. The implied fee ranges from $37,500 to $1,500,000.
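The coverage and the implied fees in this example follow from a few lines of arithmetic. The sketch below uses only figures stated on the slides; the variable names are illustrative.

```python
FEE_RATE = 0.05  # placement fee: 5% of total compensation (Motivation slide)

n = 23                                   # CEOs in the sample
low_rank, high_rank = 3, 21              # interval runs from x(3) to x(21)
low_pay, high_pay = 743_801, 29_863_393  # x(3) and x(21) from the Mechanics slide

# Nonparametric prediction coverage: (j - i) / (n + 1).
coverage = (high_rank - low_rank) / (n + 1)
low_fee, high_fee = FEE_RATE * low_pay, FEE_RATE * high_pay

print(f"coverage: {coverage:.0%}")
print(f"fee range: ${low_fee:,.0f} to ${high_fee:,.0f}")
```

Using the unrounded endpoints gives fees slightly below the rounded $37,500 and $1,500,000 quoted in the message.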


17.4 Proportions Based on Small Samples

Wilson’s Interval for a Proportion

An adjustment that moves the sampling distribution of p̂ closer to ½ and away from the troublesome boundaries at 0 and 1.

Add four artificial cases (2 successes and 2 failures) to create an adjusted proportion p̃.


17.4 Proportions Based on Small Samples

Wilson’s Interval for a Proportion

Add 2 successes and 2 failures to the data and define p̃ = (# of successes + 2)/(n + 4) (ñ = n + 4).

The z-interval is

\[ \tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}\,(1 - \tilde{p})}{\tilde{n}}} \]

4M Example 17.2: DRUG TESTING

Motivation

A company is developing a drug to prolong time before a relapse of cancer. The drug must cut the rate of relapse in half. To test this drug, the company first needs to know the current time to relapse.


4M Example 17.2: DRUG TESTING

Method

Data are collected for 19 patients who were observed for 24 months. Doctors found a relapse in 9 of the 19 patients. While the SRS condition is satisfied, the sample size condition is not. Use Wilson’s interval for a proportion.


4M Example 17.2: DRUG TESTING

Mechanics

By adding two successes and two failures, we have

p̃ = (9 + 2)/(19 + 4) = 0.478

The interval is

0.478 ± 1.96 √(0.478(1 – 0.478)/(19 + 4)) = [0.27 to 0.68]

4M Example 17.2: DRUG TESTING

Message

We are 95% confident that the proportion of patients with this cancer that relapse within 24 months is between 27% and 68%. In order to cut this proportion in half, the drug will have to reduce this rate to somewhere between 13% and 34%.
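The adjusted interval and the halved target can be reproduced with the standard library alone. This is a sketch of the add-2-successes, add-2-failures adjustment that the slides call Wilson's interval (elsewhere it is often described as a "plus four" adjustment); the function name `plus_four_interval` is illustrative, and the endpoints are not clipped to the range 0 to 1.

```python
import math
from statistics import NormalDist

def plus_four_interval(successes, n, confidence=0.95):
    """Adjusted z-interval: add 2 successes and 2 failures, then use p~ and n~."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj, p_adj - margin, p_adj + margin

# Drug-testing example: 9 relapses among 19 patients.
p_adj, low, high = plus_four_interval(successes=9, n=19)
print(f"adjusted proportion: {p_adj:.3f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
print(f"target after halving: {low / 2:.2f} to {high / 2:.2f}")
```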


Best Practices

Check the assumptions carefully when dealing with small samples.

Consider a nonparametric alternative if you suspect non-normal data.

Use the adjustment procedure for proportions from small samples.

Verify that your data are an SRS.


Pitfalls

Avoid assuming that populations are normally distributed in order to use a t-interval for the mean.

Do not use confidence intervals based on normality just because they are narrower than a nonparametric interval.

Do not think that you can prove normality using a normal quantile plot.


Pitfalls (Continued)

Do not rely on software to know which procedure to use.

Do not use a confidence interval when you need a prediction interval.
