confidence intervals: bootstrap distribution 2/6/12 more on confidence intervals bootstrap...

44
Confidence Intervals: Bootstrap Distribution 2/6/12 • More on confidence intervals • Bootstrap distribution • Other levels of confidence Section 3.2, 3.3 Professor Kari Lock Morgan Duke University

Upload: hilary-wilkerson

Post on 11-Jan-2016

224 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Confidence Intervals:Bootstrap Distribution

2/6/12

• More on confidence intervals• Bootstrap distribution• Other levels of confidence

Section 3.2, 3.3 Professor Kari Lock MorganDuke University

Page 2: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

FEMMES

Females Excelling More in Math, Engineering, and Science

http://www.duke.edu/web/FEMMES/

•Program for 4-6th grade girls•Saturday, Feb 18th, morning

•I’ll be leading a statistics activity. Any girls interested in helping out? Let me know after class or email me.

Page 3: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Confidence Intervals

The National Football Conference (NFC) has won 25 super bowls, while the American Football Conference (AFC) has won 21. You want to estimate the proportion of super bowls won by the NFC.

Would a confidence interval be appropriate?

a) Yesb) No

Page 4: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Confidence IntervalsBased on Nielson Media Research, last night’s super bowl received a 47.8 rating and 71 share.

“The rating is the percentage of television households tuned to a broadcast, and the share is the percentage of homes watching among those with TVs on at the time”

Nielson’s numbers are based on “a cross-section of representative homes throughout the U.S”.

You want to estimate the true rating and share. Would a confidence interval be appropriate?

a) Yesb) No

 http://www.foxnews.com/sports/2012/02/06/overnight-rating-for-super-bowl-just-shy-mark/#ixzz1lcjldi37

Page 5: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Confidence IntervalsIf context were added, which of the following would be an appropriate interpretation for a 95% confidence interval:

a) “we are 95% sure the interval contains the parameter” b)“there is a 95% chance the interval contains the parameter”c)Both (a) and (b)d)Neither (a) or (b)

Page 6: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Summary from Last Class• To create a confidence interval:

1. Take many random samples from the population, and compute the sample statistic for each sample

2. Compute the standard error as the standard deviation of all these statistics

3. Use statistic 2SE

• One small problem…

Page 7: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Reality

… WE ONLY HAVE ONE SAMPLE!!!!

• How do we know how much sample statistics vary, if we only have one sample?!?

BOOTSTRAP!

Page 8: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Sample: 52/100 orange

Where might the “true” p be?

ONE Reese’s Pieces Sample

ˆ 0.52p

Page 9: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

• Imagine the “population” is many, many copies of the original sample

• (What do you have to assume?)

“Population”

Page 10: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Reese’s Pieces “Population”

Sample repeatedly from this “population”

Page 11: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

• To simulate a sampling distribution, we can just take repeated random samples from this “population” made up of many copies of the sample

• In practice, we do this by sampling with replacement from the sample we have (each unit can be selected more than once)

Sampling with Replacement

Page 12: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

• How would you take a bootstrap sample from your sample of Reese’s Pieces?

Reese’s Pieces

Page 13: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Reese’s Pieces “Population”

Page 14: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

• A bootstrap sample is a random sample taken with replacement from the original sample, of the same size as the original sample

• A bootstrap statistic is the statistic computed on the bootstrap sample

• A bootstrap distribution is the distribution of many bootstrap statistics

Bootstrap

Page 15: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Original Sample

BootstrapSample

BootstrapSample

BootstrapSample

.

.

.

Bootstrap Statistic

Sample Statistic

Bootstrap Statistic

Bootstrap Statistic

.

.

.

Bootstrap Distribution

Page 16: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

“Pull yourself up by your bootstraps”

Why “bootstrap”?

• Lift yourself in the air simply by pulling up on the laces of your boots

• Metaphor for accomplishing an “impossible” task without any outside help

Page 18: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Bootstrap statistics are to the original sample statistic

as

the original sample statistic is to the population parameter

Golden Rule of Bootstrapping

Page 19: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Standard Error

•The variability of the bootstrap statistics is similar to the variability of the sample statistics

• The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

Page 20: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Based on this sample, give a 95% confidence interval for the true proportion of Reese’s Pieces that are orange.

a) (0.47, 0.57)b) (0.42, 0.62)c) (0.41, 0.51)d) (0.36, 0.56)e) I have no idea

Reese’s Pieces

0.52 2 × 0.05

Page 21: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples.

What is the sample size of each bootstrap sample?

(a) 50(b) 1000

Bootstrap Distribution

Page 22: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples.

How many bootstrap statistics will you have?

(a) 50(b) 1000

Bootstrap Distribution

Page 23: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples.

How many dots will be in a dotplot of the bootstrap distribution?

(a) 50(b) 1000

Bootstrap Distribution

Page 24: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Atlanta Commutes

What’s the mean commute time for workers in metropolitan Atlanta?

Data: The American Housing Survey (AHS) collected data from Atlanta in 2004

Page 25: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Where might the “true” μ be?

Time20 40 60 80 100 120 140 160 180

CommuteAtlanta Dot Plot

 

Random Sample of 500 Commutes

WE CAN BOOTSTRAP TO FIND OUT!!!

Page 26: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Original Sample

Page 27: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Sample from this “population”

“Population” = many copies of sample

Page 28: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Bootstrap Distributionlock5stat.com/statkey/

95% confidence interval for the average commute time for Atlantans:(a) (28.2, 30.0) (b) (27.3, 30.9) (c) 26.6, 31.8 (d) No idea

Page 29: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

• We can use bootstrapping to assess the uncertainty surrounding ANY sample statistic!

• If we have sample data, we can use bootstrapping to create a 95% confidence interval for any parameter!

(well, almost…)

The Magic of Bootstrapping

Page 30: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

What percentage of Americans believe in global warming?

A survey on 2,251 randomly selected individuals conducted in October 2010 found that 1328 answered “Yes” to the question

“Is there solid evidence of global warming?”

Global Warming

Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/10. http://pewresearch.org/pubs/1780/poll-global-warming-scientists-energy-policies-offshore-drilling-tea-party

www.lock5stat.com/statkey

Page 31: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Global Warmingwww.lock5stat.com/statkey

2 0.01

0.59 0.02

0

0.5

.57, .61

9

0

We are 95% sure that the true percentage of all Americans that believe there is solid evidence of global warming is between 57% and 61%

Page 32: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Does belief in global warming differ by political party?

“Is there solid evidence of global warming?”

The sample proportion answering “yes” was 79% among Democrats and 38% among Republicans.(exact numbers for each party not given, but assume n=1000 for each group)

Give a 95% CI for the difference in proportions.

Global Warming

Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/10. http://pewresearch.org/pubs/1780/poll-global-warming-scientists-energy-policies-offshore-drilling-tea-party

www.lock5stat.com/statkey

Page 33: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Global Warmingwww.lock5stat.com/statkey

2 0.02

0.41 0.04

0

0.4

.37, .45

1

0

We are 95% sure that the difference in the proportion of Democrats and Republicans who believe in global warming is between 0.37 and 0.45.

Page 34: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Global WarmingBased on the data just analyzed, can you conclude that the proportion of people believing in global warming differs by political party?

(a) Yes(b) No

Page 35: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Body TemperatureWhat is the average body temperature of humans?

www.lock5stat.com/statkey

98.26 2 0.105

98.26 0.21

98.05,98.47

We are 95% sure that the average body temperature for humans is between 98.05 and 98.4798.6???

Shoemaker, What's Normal: Temperature, Gender and Heartrate, Journal of Statistics Education, Vol. 4, No. 2 (1996)

Page 36: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Other Levels of Confidence

•What if we want to be more than 95% confident?

• How might you produce a 99% confidence interval for the average body temperature?

Page 37: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Percentile Method•For a P% confidence interval, keep the middle P% of bootstrap statistics

•For a 99% confidence interval, keep the middle 99%, leaving 0.5% in each tail.

• The 99% confidence interval would be

(0.5th percentile, 99.5th percentile)

where the percentiles refer to the bootstrap distribution.

Page 38: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

www.lock5stat.com/statkey

Body Temperature

We are 99% sure that the average body temperature is between 98.00 and 98.58

Middle 99% of bootstrap statistics

Page 39: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Level of Confidence

Which is wider, a 90% confidence interval or a 95% confidence interval?

(a) 90% CI(b) 95% CI

Page 40: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Two Methods for 95%

:

98.26 2 0.105 98.05,98.47

statistic ± 2×SE

98.05,98.47

Percentile Method :

Page 41: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Two Methods

•For a symmetric, bell-shaped bootstrap distribution, using either the standard error method or the percentile method will given similar 95% confidence intervals

• If the bootstrap distribution is not bell-shaped or if a level of confidence other than 95% is desired, use the percentile method

Page 42: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

www.lock5stat.com/statkey

Mercury and pH in Lakes: Correlation

We are 90% confident that the true correlation between average mercury level and pH of Florida lakes is between -0.702 and -0.433.Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)

Page 43: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

Summary• The standard error of a statistic is the standard

deviation of the sample statistic, which can be estimated from a bootstrap distribution

• Confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution

• Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous

Page 44: Confidence Intervals: Bootstrap Distribution 2/6/12 More on confidence intervals Bootstrap distribution Other levels of confidence Section 3.2, 3.3 Professor

To Do

• Homework 3 (due Monday)

• Research question and data for Project 1(proposal due 2/15)

• As soon as you come up with a question and find data, you already know how to do the first half of Project 1!

• If you haven’t yet registered your clicker, email me with the number on the back