introduction to inference - mr. song's statistics


Post on 18-Dec-2021


TRANSCRIPT

Page 1: Introduction to Inference - Mr. Song's Statistics

Sampling Distributions

Introduction to Inference

Page 2: Introduction to Inference - Mr. Song's Statistics

Parameter

• A parameter is a number that describes the population.

– A parameter always exists, but in practice we rarely know its value because we cannot examine the entire population.

– We use Greek letters to describe them ($\mu$ or $\sigma$). If we are talking about a population proportion, we use rho ($\rho$).

Page 3: Introduction to Inference - Mr. Song's Statistics

Statistic

• A statistic is a number that describes a sample.

– The value of a statistic can be found when we sample.

– A statistic can change from sample to sample. (Sampling variability)

– Statistics use symbols like $\bar{x}$, $s$, and $\hat{\rho}$.

– Ex: I take a random sample of 500 American males and find their IQs. We find that $\bar{x} = 103.2$.

– Ex: I take a random sample of 200 women and find that 40 like broccoli. Then $\hat{\rho} = 40/200 = 0.2$.

Page 4: Introduction to Inference - Mr. Song's Statistics

Exercises

• For each of the following, use appropriate notation to describe each number.

– 9.1 Making Ball Bearings: A lot of ball bearings has mean diameter 2.5003 cm. An inspector chooses 100 bearings from the lot; they have a mean diameter of 2.5009 cm.

$\mu = 2.5003$ is a parameter; $\bar{x} = 2.5009$ is a statistic.

– 9.2 Unemployment: The Bureau of Labor Statistics last month interviewed 60,000 members of the U.S. labor force, of whom 7.2% were unemployed.

$\hat{\rho} = 7.2\%$ is a statistic.

Page 5: Introduction to Inference - Mr. Song's Statistics

– 9.3 Telemarketing A telemarketing firm in LA uses a device that dials residential telephone numbers in that city at random. Of the first 100 numbers dialed, 48% are unlisted. This is not surprising because 52% of all LA residential phones are unlisted.

$\hat{\rho} = 48\%$ is a statistic; $\rho = 52\%$ is a parameter.

– 9.4 Well-fed Rats: A researcher carries out a randomized comparative experiment with young rats to investigate the effects of a toxic compound in food. She feeds the control group a normal diet. The experimental group receives a diet with 2500 parts per million of the toxic material. After 8 weeks, the mean weight gain is 335 g for the control group and 289 g for the experimental group.

Both $\bar{x}_1 = 335$ and $\bar{x}_2 = 289$ are statistics.

Page 6: Introduction to Inference - Mr. Song's Statistics

Describing Sampling Distribution

• Television executives and companies who advertise on TV are interested in how many viewers watch particular television shows. According to 2001 Nielsen ratings, Survivor II was one of the most-watched television shows in the U.S. during every week that it aired. Suppose that the true proportion of U.S. adults who watched Survivor II is 𝜌 = 0.37. Figure 9.5 shows the results of drawing 1000 SRSs of size n=100 from a population with 𝜌 = 0.37.

Page 7: Introduction to Inference - Mr. Song's Statistics

• The overall shape of the distribution is symmetric and approximately normal.

• The center of the distribution is very close to the true value 𝜌 = 0.37.

• The values of $\hat{\rho}$ have a large spread: they range from 0.22 to 0.54. Because the distribution is close to normal, we can use the standard deviation to describe its spread. The standard deviation is about 0.05.

• There are no outliers or other important deviations from the overall pattern.
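The description above can be checked with a small simulation. The sketch below is not the source of Figure 9.5; it assumes the population is large enough that each sampled adult can be treated as an independent draw with success probability 0.37, and the function name and seed are illustrative choices.

```python
import random
import statistics

def simulate_sample_proportions(rho, n, num_samples, seed=0):
    """Return the sample proportion from each of num_samples samples of size n."""
    rng = random.Random(seed)
    props = []
    for _ in range(num_samples):
        # Assumption: each individual is a "success" with probability rho,
        # independently of the others (population much larger than the sample).
        successes = sum(1 for _ in range(n) if rng.random() < rho)
        props.append(successes / n)
    return props

props = simulate_sample_proportions(rho=0.37, n=100, num_samples=1000)
print("center (mean of the sample proportions):", round(statistics.mean(props), 3))
print("spread (standard deviation):", round(statistics.stdev(props), 3))
print("smallest and largest values:", min(props), max(props))
```

The exact numbers depend on the seed, but the center should land near 0.37 and the standard deviation near 0.05.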

Page 8: Introduction to Inference - Mr. Song's Statistics

Bias

• A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. The statistic is called an unbiased estimator of the parameter.

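• A statistic is biased if the mean of its sampling distribution differs from the true value of the parameter, so that it systematically overestimates or underestimates the parameter.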

Page 9: Introduction to Inference - Mr. Song's Statistics

• An unbiased statistic will sometimes fall above the true value of the parameter and sometimes below if we take many samples. Because its sampling distribution is centered at the true value, however, there is no systematic tendency to overestimate or underestimate the parameter.

Page 10: Introduction to Inference - Mr. Song's Statistics

The approximate sampling distributions for sample proportions $\hat{\rho}$ for SRSs of two sizes drawn from a population with $\rho = 0.37$.

(a) Sample size 100. (b) Sample size 1000.

Page 11: Introduction to Inference - Mr. Song's Statistics

• The approximate sampling distribution of $\hat{\rho}$ for samples of size 100, shown in (a), is close to the normal distribution with mean 0.37 and standard deviation 0.05. So about 95% of the values of $\hat{\rho}$ will fall within two standard deviations of the mean, $\rho = 0.37$. If in fact 37% of U.S. adults have seen Survivor II, the estimates from repeated SRSs of size 100 will usually fall between 27% and 47%. That’s not very satisfactory.

• For sample size 1000, shown in (b), the standard deviation is only about 0.015. So about 95% of these samples will give an estimate within about 0.03 of the true parameter, that is, between 0.34 and 0.40. An SRS of size 1000 can be trusted to give sample estimates that are very close to the truth about the entire population.
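The effect of sample size on spread can also be computed directly. The sketch below assumes the standard deviation formula for a sample proportion, the square root of ρ(1 − ρ)/n, which is stated later in these notes; the function name is illustrative.

```python
import math

def proportion_sd(rho, n):
    """Standard deviation of a sample proportion (large-population formula)."""
    return math.sqrt(rho * (1 - rho) / n)

rho = 0.37
for n in (100, 1000):
    sd = proportion_sd(rho, n)
    low, high = rho - 2 * sd, rho + 2 * sd
    # About 95% of sample proportions fall within two standard deviations.
    print(f"n = {n}: sd = {sd:.4f}, about 95% of estimates between {low:.3f} and {high:.3f}")
```

Multiplying the sample size by 10 shrinks the spread by a factor of √10, roughly 3.2, not by a factor of 10.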

Page 12: Introduction to Inference - Mr. Song's Statistics

Variability

• The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design and the size of the sample. Larger samples give smaller spread.

• As long as the population is much larger than the sample (at least 10 times as large), the spread of the sampling distribution is approximately the same for any population size.

Page 13: Introduction to Inference - Mr. Song's Statistics

• The size of the population has little influence on the behavior of statistics from random samples.

• A statistic from an SRS of size 2500 from the more than 300 million residents of the U.S. is just as precise as one from an SRS of size 2500 from the 775,000 inhabitants of San Francisco.

Page 14: Introduction to Inference - Mr. Song's Statistics

• Why does the size of the population have little influence on the behavior of statistics from random samples?

• Imagine sampling harvested corn by thrusting a scoop into a lot of corn kernels. The scoop doesn’t know whether it is surrounded by a bag of corn or by an entire truckload. As long as the corn is well mixed (so that the scoop selects a random sample), the variability of the result depends only on the size of the scoop.
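The scoop argument can be made numerical. The sketch below uses the finite population correction factor, √((N − n)/(N − 1)), a standard result for sampling without replacement that does not appear in these slides. The proportion 0.5 is a hypothetical value chosen only for illustration, and 300 million stands in for "more than 300 million".

```python
import math

def proportion_sd_finite(rho, n, population_size):
    """SD of a sample proportion for an SRS (no replacement) from a finite population."""
    # Finite population correction: close to 1 when the population dwarfs the sample.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(rho * (1 - rho) / n) * fpc

rho = 0.5   # hypothetical population proportion
n = 2500    # sample size from the slide
sd_sf = proportion_sd_finite(rho, n, 775_000)
sd_us = proportion_sd_finite(rho, n, 300_000_000)
print(f"San Francisco: sd = {sd_sf:.5f}")
print(f"United States: sd = {sd_us:.5f}")
```

Both standard deviations come out within a fraction of a percent of 0.01, which is the value the large-population formula gives for n = 2500.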

Page 15: Introduction to Inference - Mr. Song's Statistics

Bias and Variability

• We can think of the true value of the population parameter as the bull’s-eye on a target and of the sample statistic as an arrow fired at the target. Both bias and variability describe what happens when we take many shots at the target.

Page 16: Introduction to Inference - Mr. Song's Statistics

• Bias means that our aim is off and we consistently miss the bull’s-eye in the same direction. Our sample values do not center on the population value.

• High variability means that repeated shots are widely scattered on the target. Repeated samples do not give very similar results.

Page 17: Introduction to Inference - Mr. Song's Statistics
Page 18: Introduction to Inference - Mr. Song's Statistics

The Sampling Distribution

• The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

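• When we sample, we treat the observations as independent, as if we sampled with replacement. An SRS is drawn without replacement, so this is a good approximation only when the population is much larger than the sample.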

• A sampling distribution is a sample space – it describes everything that can happen when we sample.

Page 19: Introduction to Inference - Mr. Song's Statistics

Central Limit Theorem

• As the sample size n gets larger, the sampling distribution of the sample mean $\bar{x}$ gets closer and closer to a normal curve centered at the true population mean, no matter what the shape of the parent population.

• The sampling distribution of means has a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$.

Page 20: Introduction to Inference - Mr. Song's Statistics
Page 21: Introduction to Inference - Mr. Song's Statistics

CLT Summary

• The mean of the population (what we want to find) will be the same as the mean of all your many sample means.

• The standard deviation of all your many sample means will be the population standard deviation divided by $\sqrt{n}$, where n is your sample size.

• The histogram of the sample means will appear approximately normal.

• The larger the sample size (n), the smaller the standard deviation will be and the more constricted the graph will be.
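A simulation can illustrate these points with a parent population that is far from normal. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a choice made here, not taken from the slides) and samples of size 30; the function name and seed are illustrative.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=1):
    """Return the mean of each of num_samples samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Assumed parent population: exponential, strongly right-skewed,
        # with mu = 1 and sigma = 1.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

mu, sigma, n = 1.0, 1.0, 30
means = simulate_sample_means(n, num_samples=2000)
expected_sd = sigma / math.sqrt(n)

# Share of sample means within two standard deviations of mu;
# close to 95% if the sampling distribution is roughly normal.
within_two_sd = sum(abs(m - mu) <= 2 * expected_sd for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("sd of sample means:", round(statistics.stdev(means), 3), "vs sigma/sqrt(n) =", round(expected_sd, 3))
print("share within two sd of mu:", round(within_two_sd, 3))
```

Raising n in this sketch narrows the histogram of sample means further, as the last bullet describes.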

Page 22: Introduction to Inference - Mr. Song's Statistics

Example

• The true average study time for a final exam in history is found to be 6 hours and 25 minutes with a standard deviation of 1 hour and 45 minutes. Assume the distribution is normal: N(6.417, 1.75), in hours.

– What is the probability that a student chosen at random spends more than 7 hours studying? normalcdf(7, 100, 6.417, 1.75) ≈ 37%

– What is the probability that an SRS of 4 students will average more than 7 hours of studying? normalcdf(7, 100, 6.417, 1.75/√4) ≈ 25.3%

– Why did the probability go down? Averages vary less than individual observations. It is not very probable for one student to study more than 7 hours; it is even less probable for a group of 4 to average more than 7 hours.
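The two calculator results can be reproduced without a graphing calculator. This sketch uses `NormalDist` from Python's standard `statistics` module in place of normalcdf and takes the upper tail directly instead of using 100 as a stand-in for infinity; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma = 6.417, 1.75   # hours, from the example

# One student: individual study times follow N(mu, sigma).
p_one_student = 1 - NormalDist(mu, sigma).cdf(7)

# Mean of an SRS of 4 students: N(mu, sigma / sqrt(4)).
p_mean_of_four = 1 - NormalDist(mu, sigma / math.sqrt(4)).cdf(7)

print(f"P(one student studies more than 7 hours) = {p_one_student:.3f}")
print(f"P(mean of 4 students exceeds 7 hours)    = {p_mean_of_four:.3f}")
```

The only change between the two lines is the standard deviation, which drops from 1.75 to 0.875 when averaging 4 students.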

Page 23: Introduction to Inference - Mr. Song's Statistics

Example 2

• The length of pregnancy from conception to birth varies normally with a mean of 266 days and a standard deviation of 16 days.

– What is the probability that a woman chosen at random has a pregnancy lasting more than 270 days? 40.1%

– What is the probability that an SRS of 16 women have pregnancies averaging more than 270 days? 15.9%

– What are the mean and standard deviation of my sampling distribution?

$\mu_{\bar{X}} = \mu = 266$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 16/\sqrt{16} = 4$
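The two probabilities in this example can be traced through z-scores. The worked steps below are a sketch using the standard normal table values P(Z > 0.25) ≈ 0.401 and P(Z > 1) ≈ 0.159.

```latex
\begin{align*}
\text{One woman:}\quad z &= \frac{270 - 266}{16} = 0.25,
  & P(Z > 0.25) &\approx 0.401 \\
\text{Mean of 16 women:}\quad z &= \frac{270 - 266}{16/\sqrt{16}} = \frac{4}{4} = 1,
  & P(Z > 1) &\approx 0.159
\end{align*}
```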

Page 24: Introduction to Inference - Mr. Song's Statistics

What if we’re talking about proportions?

$\hat{\rho} = \dfrac{\text{count of successes in sample}}{\text{size of sample}} = \dfrac{X}{n}$

Provided that the population is much larger than the sample, the count X will follow a binomial distribution.

$\mu_X = n\rho$ and $\sigma_X = \sqrt{n\rho(1-\rho)}$

Page 25: Introduction to Inference - Mr. Song's Statistics

• Choose an SRS of size n from a large population with population proportion $\rho$ having some characteristic of interest. Let $\hat{\rho}$ be the proportion of the sample having that characteristic. Then:

• The mean of the sampling distribution of $\hat{\rho}$ is exactly $\rho$.

• The standard deviation of the sampling distribution of $\hat{\rho}$ is $\sqrt{\rho(1-\rho)/n}$.
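Both results follow from the binomial mean and standard deviation of the count X. The derivation below is a short sketch, assuming X is (approximately) binomial and using the fact that dividing a random variable by the constant n divides both its mean and its standard deviation by n.

```latex
\begin{align*}
\mu_{\hat{\rho}} &= \frac{\mu_X}{n} = \frac{n\rho}{n} = \rho \\
\sigma_{\hat{\rho}} &= \frac{\sigma_X}{n} = \frac{\sqrt{n\rho(1-\rho)}}{n}
  = \sqrt{\frac{\rho(1-\rho)}{n}}
\end{align*}
```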

Page 26: Introduction to Inference - Mr. Song's Statistics

Rule of Thumb

1. Use $\sigma/\sqrt{n}$ for $\bar{x}$ or $\sqrt{\rho(1-\rho)/n}$ for $\hat{\rho}$ only when the population is at least 10 times as large as the sample.

2. We will use the normal approximation to the sampling distribution of $\hat{\rho}$ for values of n and $\rho$ that satisfy $n\rho \ge 10$ and $n(1-\rho) \ge 10$.
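The two rules can be written as a quick check. This is a minimal sketch: the function name is hypothetical, and the population sizes in the examples are made-up values used only to show one passing and one failing case.

```python
def check_rules_of_thumb(n, rho, population_size):
    """Return (rule 1 holds, rule 2 holds) for a sample proportion."""
    # Rule 1: population at least 10 times as large as the sample.
    sd_formula_ok = population_size >= 10 * n
    # Rule 2: expected successes and failures are both at least 10.
    normal_ok = n * rho >= 10 and n * (1 - rho) >= 10
    return sd_formula_ok, normal_ok

# Hypothetical large population: both rules hold.
large_case = check_rules_of_thumb(n=100, rho=0.37, population_size=200_000_000)
# Hypothetical small population and small sample: both rules fail.
small_case = check_rules_of_thumb(n=20, rho=0.10, population_size=150)

print("n = 100, rho = 0.37:", large_case)
print("n = 20,  rho = 0.10:", small_case)
```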

Page 27: Introduction to Inference - Mr. Song's Statistics

Exercise 9.19 Do you drink the cereal milk? A USA Today poll asked a random sample of 1012 U.S. adults what they do with the milk in the bowl after they have eaten the cereal. Of the respondents, 67% said that they drink it. Suppose that 70% of U.S. adults actually drink the cereal milk.

(a) Find the mean and standard deviation of the proportion $\hat{\rho}$ of the sample that say they drink the cereal milk.

(b) Explain why you can use the formula for the standard deviation of $\hat{\rho}$ in this setting (rule of thumb 1).

(c) Check that you can use the normal approximation of the distribution of $\hat{\rho}$ (rule of thumb 2).

Page 28: Introduction to Inference - Mr. Song's Statistics

(d) Find the probability of obtaining a sample of 1012 adults in which 67% or fewer say they drink the cereal milk. Do you have any doubts about the result of this poll?

(e) What sample size would be required to reduce the standard deviation of the sample proportion to half the value you found in (a)?
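One way to check answers to Exercise 9.19 is sketched below. It is not an official solution. It uses `NormalDist` from Python's standard library for the normal approximation, and the U.S. adult population figure is a rough assumed value that only needs to be far larger than 10 × 1012.

```python
import math
from statistics import NormalDist

n = 1012          # sample size from the poll
rho = 0.70        # supposed true proportion
rho_hat = 0.67    # proportion observed in the poll

# (a) Mean and standard deviation of the sample proportion.
mean = rho
sd = math.sqrt(rho * (1 - rho) / n)

# (b) Rule of thumb 1, with an assumed (rough) U.S. adult population.
adult_population = 200_000_000
rule1_ok = adult_population >= 10 * n

# (c) Rule of thumb 2.
rule2_ok = n * rho >= 10 and n * (1 - rho) >= 10

# (d) Normal approximation to P(sample proportion <= 0.67).
prob = NormalDist(mean, sd).cdf(rho_hat)

# (e) The sd is proportional to 1/sqrt(n), so halving it needs 4 times the sample.
n_needed = 4 * n

print(f"(a) mean = {mean}, sd = {sd:.4f}")
print(f"(b) rule 1 holds: {rule1_ok}   (c) rule 2 holds: {rule2_ok}")
print(f"(d) P(sample proportion <= 0.67) = {prob:.4f}")
print(f"(e) sample size needed = {n_needed}")
```

A result of 67% sits about two standard deviations below 70%, so a sample like this would be fairly unusual if the true proportion really were 70%.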