copyright © 2011 pearson education, inc. alternative approaches to inference chapter 17

Copyright © 2011 Pearson Education, Inc.

Alternative Approaches to Inference

Chapter 17

17.1 A Confidence Interval for the Median

An auto insurance company is thinking about compensating agents by comparing the size of the claims they produce to a standard. Annual claims average near $3,200, with a median claim of $2,000.

Claims are highly skewed, so use nonparametric methods that do not rely on a normal sampling distribution.




Distribution of Sample of Claims (n = 42)

For this sample, the average claim is $3,632 with s = $4,254. The median claim is $2,456.




Is Sample Mean Compatible with µ=$3,200?

To answer this question, construct a 95% confidence interval for µ.

This interval is $3,632 ± 2.02 × $4,254/√42 = [$2,306 to $4,958].
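The arithmetic behind this interval can be checked with a few lines of Python. This is a minimal sketch: the critical value 2.02 is copied from the slide rather than computed, because Python's standard library has no t quantile function, and the helper name t_interval is hypothetical.

```python
import math

def t_interval(mean: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return the t-interval mean ± t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics of the n = 42 claims; 2.02 is the t critical value used on the slide.
lo, hi = t_interval(3632, 4254, 42, 2.02)
print(f"[${lo:,.0f} to ${hi:,.0f}]")  # [$2,306 to $4,958]
```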


Is Sample Mean Compatible with µ=$3,200?

The national average of $3,200 lies within the 95% confidence t-interval for the mean.

However, the sample does not satisfy the sample size condition needed to use the t-interval.

When the conditions are not met, the t-interval is unreliable: its actual coverage is unknown.




Nonparametric Statistics

Avoid making assumptions about the shape of the population.

Often rely on sorting the data.

Suited to parameters such as the population median θ (theta).




Nonparametric Statistics

For the claims data, which are highly skewed to the right, θ < µ.

If the population distribution is symmetric, then θ = µ.




Nonparametric Confidence Interval

The first step in finding a confidence interval for θ is to sort the observed data into ascending order; the sorted values are known as order statistics.

Order statistics are denoted X(1) < X(2) < … < X(n).





If the data are an SRS from a population with median θ, then we know that

1. The probability that a random draw from the population is less than or equal to θ is ½,

2. The observations in the random sample are independent.





Use the binomial distribution to determine the probability that the population median lies between each pair of adjacent order statistics.

To form a confidence interval for θ, combine several adjacent segments to achieve the desired coverage.
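Under the two facts above, θ lies between X(i) and X(i+1) exactly when i of the n observations fall below θ, and that count is Binomial(n, ½). A minimal Python sketch of the segment probabilities and of combining the segments i = k, …, n − k into the coverage of [X(k), X(n+1−k)]; the symmetric choice of endpoints and the function names are assumptions made for illustration.

```python
from math import comb

def segment_prob(n: int, i: int) -> float:
    """P(X(i) < θ < X(i+1)) for an SRS of size n, i = 0..n.

    X(0) and X(n+1) stand for -infinity and +infinity. θ lands in this
    segment exactly when i of the n observations fall below θ, and the
    count of observations below θ is Binomial(n, 1/2).
    """
    return comb(n, i) / 2**n

def median_ci_coverage(n: int, k: int) -> float:
    """Coverage of [X(k), X(n+1-k)]: add up the segments i = k, ..., n-k."""
    return sum(segment_prob(n, i) for i in range(k, n - k + 1))

n = 42
print(sum(segment_prob(n, i) for i in range(n + 1)))  # segments cover the line: 1.0
print(round(median_ci_coverage(n, 15), 3))            # coverage of [X(15), X(28)]
```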





In general, we cannot construct a confidence interval for θ whose coverage is exactly 0.95.

The 94.6% confidence interval for the median claim is [$1,217 to $3,168].
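Because the available coverages jump in steps, one common convention (assumed here, not stated on the slides) is to take the narrowest symmetric interval [X(k), X(n+1−k)] whose coverage is at least the target. The sketch below applies that rule to a list of values; the helper names and the toy data are hypothetical, and the sketch does not attempt to reproduce which order statistics the slide used for its 94.6% interval.

```python
from math import comb

def coverage(n: int, k: int) -> float:
    """P(X(k) <= θ <= X(n+1-k)) for an SRS of size n from a continuous population."""
    return sum(comb(n, i) for i in range(k, n - k + 1)) / 2**n

def median_ci(data: list[float], target: float = 0.95) -> tuple[float, float, float]:
    """Narrowest symmetric order-statistic interval with coverage >= target."""
    xs = sorted(data)                        # order statistics X(1) <= ... <= X(n)
    n = len(xs)
    best = None
    for k in range(1, n // 2 + 1):           # larger k gives a narrower interval
        cov = coverage(n, k)
        if cov < target:
            break
        best = (xs[k - 1], xs[n - k], cov)   # X(k) and X(n+1-k) with 0-based indexing
    if best is None:
        raise ValueError("sample too small to reach the target coverage")
    return best

# Coverages available for n = 42 near 95%: none of them is exactly 0.95.
for k in (14, 15, 16):
    print(k, round(coverage(42, k), 4))

# Hypothetical toy data 1, 2, ..., 42 so the chosen order statistics are easy to read.
print(median_ci(list(range(1, 43))))
```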




Parametric versus Nonparametric

Limitations of nonparametric methods

1. Coverage is limited to certain values determined by sums of binomial probabilities, so it is difficult to obtain exactly 95% coverage.

2. The median does not equal the mean if the population distribution is skewed, so an interval for the median cannot be converted into an estimate of a total (total = nµ).



17.2 Transformations

Transform Data into Symmetric Distributions

Taking base 10 logs of the claims data results in a more symmetric distribution.





Taking base 10 logs of the claims data produces values that could plausibly have come from a normal distribution.





If y = log10 x, then ȳ = 3.312 with s_y = 0.493.

The 95% confidence t-interval for µ_y is

[3.16 to 3.47].

If we convert the endpoints back to the original scale of dollars (10^3.16 ≈ $1,445 and 10^3.47 ≈ $2,951), this interval resembles the confidence interval for the median rather than the one for the mean.
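A minimal Python sketch of the log-scale calculation, assuming the same t critical value 2.02 used for the raw claims (both samples have n = 42). The interval is built for µ_y and then each endpoint is raised to the power 10; with unrounded endpoints the dollar interval comes out at roughly $1,440 to $2,920. The helper name t_interval is hypothetical.

```python
import math

def t_interval(mean: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return the t-interval mean ± t_crit * s / sqrt(n)."""
    half = t_crit * s / math.sqrt(n)
    return mean - half, mean + half

# Summary statistics of y = log10(claim) from the slide; 2.02 is the t critical value for 41 df.
lo_y, hi_y = t_interval(3.312, 0.493, 42, 2.02)
print(round(lo_y, 2), round(hi_y, 2))   # 3.16 3.47

# Back-transform the endpoints to dollars.
lo_usd, hi_usd = 10 ** lo_y, 10 ** hi_y
print(f"${lo_usd:,.0f} to ${hi_usd:,.0f}")
```

Back-transforming the endpoints, rather than the summary statistics, keeps the stated coverage, because y lies in [a, b] exactly when 10^y lies in [10^a, 10^b].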

17.3 Prediction Intervals

Prediction Interval: an interval that holds a future draw from the population with a chosen probability.

For the auto insurance example, a prediction interval anticipates the size of the next claim, allowing for the random variation associated with an individual.




For a Normal Population

The 100(1 – α)% prediction interval for an independent draw from a normal population is

x̄ ± t_{α/2, n–1} · s · √(1 + 1/n)

where x̄ and s estimate µ and σ.
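The factor √(1 + 1/n) combines the variance of the new draw (σ²) with the variance of x̄ (σ²/n), which is why a prediction interval is much wider than a confidence interval for µ. A minimal Python sketch follows; the function name is hypothetical and the critical value is again passed in rather than computed. Applied to the raw, right-skewed claims, the lower endpoint is negative, which no claim can be: an illustration of why the normality condition matters, not an interval these slides recommend.

```python
import math

def normal_prediction_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """x̄ ± t_crit * s * sqrt(1 + 1/n): interval for one new independent draw."""
    half = t_crit * s * math.sqrt(1 + 1 / n)
    return xbar - half, xbar + half

# Raw claims summary (n = 42) with t_crit = 2.02, as on the earlier slide.
lo, hi = normal_prediction_interval(3632, 4254, 42, 2.02)
print(f"[{lo:,.0f} to {hi:,.0f}]")   # the lower endpoint is below zero
```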


Nonparametric Prediction Interval

Relies on the properties of order statistics:

P(X(i) ≤ X ≤ X(i+1)) = 1/(n + 1)

P(X ≤ X(1)) = 1/(n + 1)

P(X(n) ≤ X) = 1/(n + 1)
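These properties hold for any continuous population: among n + 1 independent draws, the new draw is equally likely to take any of the n + 1 ranks. A small simulation sketch can make this concrete; the exponential population, the sample size n = 4 and the function name are arbitrary choices for illustration.

```python
import bisect
import random

def segment_frequencies(n: int, trials: int, seed: int = 1) -> list[float]:
    """Estimate how often a new draw X lands in each of the n + 1 segments
    cut by the order statistics of an independent sample of size n."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))  # skewed population
        x = rng.expovariate(1.0)                                   # the "next" draw
        counts[bisect.bisect_left(sample, x)] += 1                 # segment index 0..n
    return [c / trials for c in counts]

freqs = segment_frequencies(n=4, trials=100_000)
print([round(f, 3) for f in freqs])   # each entry should be close to 1/(n + 1) = 0.2
```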




Nonparametric Prediction Interval

Combine segments to get the desired coverage.

P(X(2) ≤ X ≤ X(41)) = P($255 ≤ X ≤ $17,305) = (41 – 2)/43 ≈ 0.91

There is a 91% chance that the next claim is between $255 and $17,305.
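Since each of the n + 1 segments has probability 1/(n + 1), the coverage of [X(i), X(j)] is simply (j − i)/(n + 1). A minimal Python sketch of that count; the function name is hypothetical.

```python
def prediction_coverage(n: int, i: int, j: int) -> float:
    """P(X(i) <= X <= X(j)) for a new draw X: (j - i) of the n + 1 equally likely segments."""
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    return (j - i) / (n + 1)

# Symmetric choices for the n = 42 claims: [X(1), X(42)] and [X(2), X(41)].
print(round(prediction_coverage(42, 1, 42), 3))
print(round(prediction_coverage(42, 2, 41), 3))   # 0.907
```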

4M Example 17.1: EXECUTIVE SALARIES

Motivation

Fees earned by an executive placement service are 5% of the starting annual total compensation package. How much can the firm expect to earn by placing a current client as a CEO in the telecom industry?




Method

Obtain data on the total compensation of n = 23 CEOs in the telecom industry.




Method

The distribution of total compensation for CEOs in the telecom industry is not normal. Construct a nonparametric prediction interval for the client’s anticipated total compensation package.




Mechanics

Sort the data:




Mechanics

The interval from X(3) to X(21) is

$743,801 to $29,863,393

and is a 75% prediction interval, since (21 – 3)/(23 + 1) = 18/24 = 0.75.
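The sorted salaries are not reproduced in this text, so the sketch below uses only the two endpoints reported above. It checks the 75% coverage and applies the 5% placement fee to each endpoint; the message rounds the endpoints to $750,000 and $30,000,000 before taking 5%, which gives its $37,500 to $1,500,000. The function name is hypothetical.

```python
def prediction_coverage(n: int, i: int, j: int) -> float:
    """P(X(i) <= X <= X(j)) for a new draw: (j - i)/(n + 1)."""
    return (j - i) / (n + 1)

n = 23
cov = prediction_coverage(n, 3, 21)        # 18/24
low, high = 743_801, 29_863_393            # X(3) and X(21) from the sorted compensation data
fee_rate = 0.05                            # placement fee: 5% of total compensation
print(cov)                                 # 0.75
print(f"fees: ${fee_rate * low:,.0f} to ${fee_rate * high:,.0f}")
```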


Message

The compensation package of three out of four placements in this industry is predicted to be in the range from about $750,000 to $30,000,000. The implied fee ranges from $37,500 to $1,500,000.



17.4 Proportions Based on Small Samples

Wilson’s Interval for a Proportion

An adjustment that moves the sampling distribution of p̂ closer to ½ and away from the troublesome boundaries at 0 and 1.

Add four artificial cases (2 successes and 2 failures) to create an adjusted proportion p̃.


Wilson’s Interval for a Proportion

Add 2 successes and 2 failures to the data and define p̃ = (number of successes + 2)/(n + 4) and ñ = n + 4.

The z-interval is

p̃ ± z_{α/2} √(p̃(1 – p̃)/ñ)
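To see why the adjustment helps near the boundaries, the sketch below compares the usual z-interval p̂ ± z√(p̂(1 − p̂)/n) with the adjusted interval for a hypothetical sample with 0 successes in 10 trials. The unadjusted interval collapses to a single point, while the adjusted one has positive width. The same "add two successes and two failures" adjustment is often called the plus-four interval. Clipping the result to [0, 1] is an added convention, not part of the slide's formula, and the function names are hypothetical.

```python
import math

def unadjusted_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Usual z-interval p̂ ± z * sqrt(p̂(1 - p̂)/n)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def adjusted_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Add 2 successes and 2 failures, then use the z-interval with ñ = n + 4."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clipping to [0, 1] is an assumption added here, not part of the slide's formula.
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

print(unadjusted_interval(0, 10))   # (0.0, 0.0): zero width
print(adjusted_interval(0, 10))
```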

4M Example 17.2: DRUG TESTING

Motivation

A company is developing a drug to prolong the time before a relapse of cancer. The drug must cut the rate of relapse in half. To test this drug, the company first needs to know the current rate of relapse.




Method

Data are collected for 19 patients who were observed for 24 months. Doctors found a relapse in 9 of the 19 patients. Although the SRS condition is satisfied, the sample size condition is not, so use Wilson’s interval for a proportion.




Mechanics

By adding two successes and two failures, we have p̃ = (9 + 2)/(19 + 4) = 0.478.

The interval is

0.478 ± 1.96 √(0.478(1 – 0.478)/(19 + 4)) = [0.27 to 0.68]
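The drug-testing arithmetic can be reproduced directly. This sketch also checks the sample size condition, assuming the usual rule that both the number of successes and the number of failures must be at least 10; with 9 relapses the rule fails, which is why the adjusted interval is used.

```python
import math

successes, n, z = 9, 19, 1.96                   # 9 relapses among 19 patients
print(successes >= 10 and n - successes >= 10)  # False: sample size condition not met

p_adj = (successes + 2) / (n + 4)               # 11/23
se = math.sqrt(p_adj * (1 - p_adj) / (n + 4))
lo, hi = p_adj - z * se, p_adj + z * se
print(round(p_adj, 3))                          # 0.478
print(round(lo, 2), round(hi, 2))               # 0.27 0.68
```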


Message

We are 95% confident that the proportion of patients with this cancer that relapse within 24 months is between 27% and 68%. In order to cut this proportion in half, the drug will have to reduce this rate to somewhere between 13% and 34%.



Best Practices

Check the assumptions carefully when dealing with small samples.

Consider a nonparametric alternative if you suspect non-normal data.

Use the adjustment procedure for proportions from small samples.

Verify that your data are an SRS.



Pitfalls

Avoid assuming that populations are normally distributed just so that you can use a t-interval for the mean.

Do not use confidence intervals based on normality just because they are narrower than a nonparametric interval.

Do not think that you can prove normality using a normal quantile plot.



Pitfalls (Continued)

Do not rely on software to know which procedure to use.

Do not use a confidence interval when you need a prediction interval.

