Sampling Distribution Models
Copyright © 2009 Pearson Education, Inc.
Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples.
The histogram we’d get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.
It turns out that the histogram is unimodal, symmetric, and centered at p.
More specifically, it’s an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.
Model how sample proportions vary from sample to sample.
A sampling distribution model for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we’d observe a sample proportion in any particular interval.
When working with proportions,
Mean = $p$
Standard deviation = $\sqrt{\dfrac{pq}{n}}$
So, the distribution of the sample proportions is modeled with a probability model that is $N\!\left(p, \sqrt{\dfrac{pq}{n}}\right)$, where $q = 1 - p$.
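As a rough check on this model, the following Python sketch simulates many samples and compares the mean and standard deviation of the simulated sample proportions with p and sqrt(pq/n). The values p = 0.3 and n = 200, the seed, and the function name are illustrative assumptions, not taken from the slides.

```python
import random
import statistics
from math import sqrt

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

p, n = 0.3, 200  # hypothetical population proportion and sample size
props = simulate_sample_proportions(p, n, num_samples=5000)

sim_mean = statistics.fmean(props)
sim_sd = statistics.pstdev(props)
model_sd = sqrt(p * (1 - p) / n)  # sqrt(pq/n) from the Normal model

print(f"simulated mean = {sim_mean:.4f}, model mean = {p}")
print(f"simulated SD   = {sim_sd:.4f}, model SD   = {model_sd:.4f}")
```

If the model is reasonable, the two means and the two standard deviations should agree closely.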
A picture of what we just discussed is as follows:
The Normal model says that about 95% of values are within two standard deviations of the mean.
So about 95% of polls give results that fall within two standard deviations of the mean, some above it and some below.
This is what we mean by sampling error. It’s not really an error at all, but just variability you’d expect to see from one sample to another.
The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger.
There are two assumptions in the case of the model for the distribution of sample proportions:
1. The Independence Assumption: The sampled values must be independent of each other.
2. The Sample Size Assumption: The sample size, n, must be large enough.
1. Randomization Condition: The sample should be a simple random sample of the population.
2. 10% Condition: If the sample is drawn without replacement, the sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
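The two numeric conditions can be checked mechanically. The sketch below is one possible helper; its name, its return format, and the choice to treat an unknown population size as passing the 10% Condition are assumptions made for illustration. The Randomization Condition depends on how the data were collected and cannot be verified from the numbers alone.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the 10% and Success/Failure Conditions for a sample proportion.

    population_size=None means the population size is unknown; it is
    treated here (an assumption) as large enough to pass the 10% Condition.
    """
    q = 1 - p
    return {
        "ten_percent": population_size is None or n <= 0.10 * population_size,
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Hypothetical usage: 500 loans with p = 0.12 gives np = 60 and nq = 440.
print(check_proportion_conditions(500, 0.12))
# A sample of 100 from a population of 500 is more than 10% of the population.
print(check_proportion_conditions(100, 0.5, population_size=500))
```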
Sampling distribution models are important because
◦ they act as a bridge from the real world of data to the imaginary model of the statistic and
◦ enable us to say something about the population when all we have is data from the real world.
Proportions summarize categorical variables.
The Normal sampling distribution model looks like it will be very useful.
Can we do something similar with quantitative data?
We can indeed. Even more remarkable, not only can we use all of the same concepts, but almost the same model.
A sample mean also has a sampling distribution.
Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:
Looking at the average of two dice after a simulation of 10,000 tosses:
The average of three dice after a simulation of 10,000 tosses looks like:
The average of 5 dice after a simulation of 10,000 tosses looks like:
The average of 20 dice after a simulation of 10,000 tosses looks like:
As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean.
The sampling distribution of the mean also becomes more nearly Normal.
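The dice simulations can be reproduced with a short Python sketch. The function name and the seed are illustrative assumptions; the numbers of dice (1, 2, 3, 5, 20) and the 10,000 tosses follow the slides. For a fair die the population mean is 3.5 and the population SD is sqrt(35/12), about 1.71.

```python
import random
import statistics

def simulate_dice_means(num_dice, num_trials=10_000, seed=1):
    """Return the average of num_dice fair dice for each of num_trials tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]

results = {k: simulate_dice_means(k) for k in (1, 2, 3, 5, 20)}

for k, means in results.items():
    # The mean of the averages stays near 3.5 while their spread shrinks with k.
    print(f"{k:>2} dice: mean = {statistics.fmean(means):.3f}, "
          f"SD = {statistics.pstdev(means):.3f}")
```

Plotting a histogram of each list in `results` would show the shape moving from flat toward a bell curve as the number of dice grows.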
The sampling distribution of any mean becomes more nearly Normal as the sample size grows.
◦ All we need is for the observations to be independent and collected with randomization.
◦ We don’t even care about the shape of the population distribution!
The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.
The CLT requires essentially the same assumptions we saw for modeling proportions:
Independence Assumption: The sampled values must be independent of each other.
Sample Size Assumption: The sample size must be sufficiently large.
The Normal model for the sampling distribution of the mean has a mean equal to the population mean:
$E(\bar{y}) = \mu$
And a standard deviation equal to
$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
where σ is the population standard deviation.
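One way to see where the mean μ and the standard deviation σ/√n come from is the short derivation below. It assumes the n observations, written here as $y_1, \dots, y_n$ (notation introduced for illustration), are independent and each has mean μ and standard deviation σ.

```latex
\begin{aligned}
E(\bar{y}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
            = \frac{1}{n}\cdot n\mu = \mu,\\
\operatorname{Var}(\bar{y}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(y_i)
            = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n},\\
SD(\bar{y}) &= \sqrt{\operatorname{Var}(\bar{y})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

The second line is where independence is used: variances of independent quantities add.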
Both of the sampling distributions we’ve looked at are Normal.
◦ For proportions: $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$
◦ For means: $SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$
When we don’t know p or σ, we will use sample statistics to estimate these population parameters.
Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.
For a sample proportion, the standard error is
$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
For the sample mean, the standard error is
$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
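A minimal Python sketch of the two standard errors, computed from sample statistics only. The function names and the sample data are hypothetical; `statistics.stdev` is the sample standard deviation s (with the n − 1 divisor).

```python
import statistics
from math import sqrt

def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)

def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / sqrt(len(values))

# Hypothetical data, not from the slides.
print(f"SE(p_hat) = {se_proportion(60, 500):.4f}")
print(f"SE(y_bar) = {se_mean([2, 4, 4, 4, 5, 5, 7, 9]):.4f}")
```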
Be careful! Now we have two distributions to deal with.
The first is the real world distribution of the sample, which we might display with a histogram.
The second is the math world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem.
Don’t confuse the two!
There are two basic truths about sampling distributions:
1. Sampling distributions arise because samples vary. Each random sample will have different cases and so, a different value of the statistic.
2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.
Don’t confuse the sampling distribution with the distribution of the sample.
◦ When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
◦ The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn’t get.
Watch out for small samples from skewed populations.
◦ The more skewed the distribution, the larger the sample size we need for the CLT to work.
Based on past experience, a bank believes that 12% of people who receive loans will not make payments on time. The bank has recently approved 500 loans.
a) What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?
μ = p = 0.12
SD = sqrt(pq/n) = sqrt(0.12 × 0.88 / 500) ≈ 0.015
b) What is the probability that over 14% of these clients will not make payments on time?
P(p̂ > 0.14) = 1 – Normdist(0.14,0.12,0.015,1) ≈ 0.091
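The same calculation can be sketched in Python, where `statistics.NormalDist(mean, sd).cdf(x)` plays the role of the spreadsheet call Normdist(x, mean, sd, 1). The variable names are illustrative. The upper-tail probability comes out near 0.09: about 0.084 with the unrounded SD and about 0.091 when the SD is first rounded to 0.015.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.12, 500
sd = sqrt(p * (1 - p) / n)  # sqrt(pq/n), about 0.0145 before rounding

# Upper-tail probability P(p_hat > 0.14) under the Normal model.
prob_unrounded = 1 - NormalDist(p, sd).cdf(0.14)
prob_rounded_sd = 1 - NormalDist(p, 0.015).cdf(0.14)  # SD rounded to 0.015

print(f"SD = {sd:.4f}")
print(f"P(p_hat > 0.14) = {prob_unrounded:.3f} (unrounded SD)")
print(f"P(p_hat > 0.14) = {prob_rounded_sd:.3f} (SD rounded to 0.015)")
```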
Just before a referendum on a school budget, a local newspaper polls 435 voters to predict whether the budget will pass. Suppose the budget has the support of 54% of the voters. What is the probability that the newspaper’s sample will lead it to predict defeat?
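Mean and standard deviation of the sample proportion:
μ = p = 0.54
SD = sqrt(pq/n) = sqrt(0.54 × 0.46 / 435) ≈ 0.024
The newspaper predicts defeat if fewer than half of the sampled voters support the budget:
P(p̂ < 0.5) = Normdist(0.5,0.54,0.024,1) ≈ 0.048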
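To get a feel for how good the Normal approximation is here, the sketch below compares it with the exact binomial probability that fewer than half of the 435 sampled voters support the budget. Treating the number of supporters as Binomial(435, 0.54) is an assumption that matches the Independence Assumption; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.54, 435

# Normal model: P(p_hat < 0.5) with SD = sqrt(pq/n).
normal_approx = NormalDist(p, sqrt(p * (1 - p) / n)).cdf(0.5)

# Exact binomial: p_hat < 0.5 means at most 217 of the 435 voters say yes.
max_yes = (n - 1) // 2
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max_yes + 1))

print(f"Normal approximation: {normal_approx:.4f}")
print(f"Exact binomial:       {exact:.4f}")
```

With a sample this large, the two values should be close.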
When a truckload of apples arrives at a packing plant, a random sample of 125 is selected and examined for bruises, discoloration, and other defects. The whole truckload is rejected if more than 10% of the sample is unsatisfactory. Suppose in fact that 12% of the apples on the truck do not meet the desired standard. What is the probability that the shipment will be accepted anyway?
Mean and standard deviation of the sample proportion:
μ = p = 0.12
SD = sqrt(pq/n) = sqrt(0.12 × 0.88 / 125) ≈ 0.029
The shipment is accepted if no more than 10% of the sample is unsatisfactory:
P(p̂ < 0.10) = Normdist(0.1,0.12,0.029,1) ≈ 0.245
A new restaurant with 119 seats is being planned. Studies show that 63% of the customers demand a smoke-free area. How many seats should be in the non-smoking area in order to be very sure (μ + 3σ) of having enough seating there?
Mean and standard deviation of the proportion:
μ = p = 0.63
SD = sqrt(pq/n) = sqrt(0.63 × 0.37 / 119) ≈ 0.0443
μ + 3σ = 0.63 + 3 × 0.0443 ≈ 0.763
0.763 × 119 ≈ 90.8, so about 91 seats (rounding up so that there is enough seating)
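A small Python sketch of the seating calculation. Rounding up with `ceil` is a choice made here so that the non-smoking area is not one seat short; the variable names are illustrative.

```python
from math import ceil, sqrt

p, n = 0.63, 119
sd = sqrt(p * (1 - p) / n)       # SD of the sample proportion
upper_proportion = p + 3 * sd    # "very sure" bound: mean + 3 SD

seats_needed = ceil(upper_proportion * n)  # round up to a whole seat

print(f"mean + 3 SD = {upper_proportion:.3f}")
print(f"seats needed = {seats_needed}")
```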
Assume that the duration of human pregnancies can be described by a normal model with mean 268 days and standard deviation 16 days.
a) What percentage of pregnancies should last between 255 and 270 days?
P(255 < x < 270) = normdist(270,268,16,1) – normdist(255,268,16,1) = .341 = 34.1%
b) At least how many days should the longest 30% of all pregnancies last?
P(x > ?) = 0.3
norminv(0.7,268,16) = 276.4
c) Suppose a certain obstetrician is currently providing prenatal care to 40 pregnant women. According to the CLT, what are the mean and standard deviation of the model for the mean duration of their pregnancies?
Mean = 268
SD = σ/sqrt(n) = 16/sqrt(40) = 2.53
d) What is the probability that the mean duration of these patients’ pregnancies will be less than 274 days?
P(ȳ < 274) = normdist(274,268,2.53,1) = .991
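The four pregnancy calculations can be sketched in Python with `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods correspond to the spreadsheet functions normdist(..., 1) and norminv. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pregnancy = NormalDist(mu=268, sigma=16)  # model for one pregnancy

# a) Share of pregnancies lasting between 255 and 270 days.
between = pregnancy.cdf(270) - pregnancy.cdf(255)

# b) Cutoff for the longest 30%: the 70th percentile.
cutoff = pregnancy.inv_cdf(0.70)

# c) Sampling distribution of the mean of n = 40 pregnancies.
n = 40
mean_of_40 = NormalDist(mu=268, sigma=16 / sqrt(n))

# d) Probability that the mean of the 40 durations is below 274 days.
below_274 = mean_of_40.cdf(274)

print(f"a) {between:.3f}")
print(f"b) {cutoff:.1f} days")
print(f"c) mean = {mean_of_40.mean}, SD = {mean_of_40.stdev:.2f}")
print(f"d) {below_274:.3f}")
```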
The score distribution shown in the table is for all students who took a yearly AP statistics exam. An AP statistics teacher had 46 students preparing to take the AP exam. He considered his students to be “typical” of all the national students.
Score Percent of students
5 13.9
4 22.5
3 25.3
2 17.2
1 21.1
What is the probability that his students will achieve an average score of at least 3?
1. Find mean and standard deviation of the population.
μ = E(X) = Σ x · P(x) = 2.909
σ = sqrt(Σ (x – μ)² · P(x)) = 1.337
2. Find the mean and standard deviation of the sampling distribution of the sample mean.
Mean = 2.909
SD = σ/sqrt(n) = 1.337/sqrt(46) = .197
3. Find probability:
P(ȳ > 3) = 1 – normdist(3,2.909,.197,1) = .322
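The three steps can be sketched in Python directly from the score table. The dictionary layout and variable names are illustrative; the percentages are the ones in the table, written as proportions. Computed this way, the SD of the sample mean is 1.337/sqrt(46), about 0.197.

```python
from math import sqrt
from statistics import NormalDist

# Score -> proportion of students, from the table.
scores = {5: 0.139, 4: 0.225, 3: 0.253, 2: 0.172, 1: 0.211}

# Step 1: population mean and standard deviation of a discrete distribution.
mu = sum(x * prob for x, prob in scores.items())
sigma = sqrt(sum((x - mu) ** 2 * prob for x, prob in scores.items()))

# Step 2: sampling distribution of the mean for n = 46 students.
n = 46
sd_mean = sigma / sqrt(n)

# Step 3: probability that the class average is at least 3.
prob_at_least_3 = 1 - NormalDist(mu, sd_mean).cdf(3)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"SD of the sample mean = {sd_mean:.3f}")
print(f"P(mean >= 3) = {prob_at_least_3:.3f}")
```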
The weight of potato chips in a large-size bag is stated to be 16 ounces. The amount that the packaging machine puts in these bags is believed to have a normal model with a mean of 16.3 ounces and a standard deviation of .21 ounces.
a) What fraction of all bags sold are underweight?
P(x < 16) = normdist(16,16.3,0.21,1) = .0766
b) Some of the chips are sold in bargain packs of 5 bags. What is the probability that none of the 5 is underweight?
P(X = 0) = p^0 q^5 = (1 – .0766)^5 = .6715
c) What is the probability that the mean weight of the 5 bags is below the stated amount?
P(ȳ < 16) = normdist(16,16.3,0.21/sqrt(5),1) = .0007
d) What is the probability that the mean weight of a 30-bag case of potato chips is below 16 ounces?
P(ȳ < 16) = normdist(16,16.3,0.21/sqrt(30),1) ≈ 0
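A Python sketch of the four potato chip calculations. It assumes the bag weights are independent of one another, which part b) needs in order to multiply probabilities; the helper name `prob_mean_below` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, STATED = 16.3, 0.21, 16.0

# a) Fraction of single bags that are underweight.
p_under = NormalDist(MU, SIGMA).cdf(STATED)

# b) None of 5 independent bags is underweight.
p_none_of_5 = (1 - p_under) ** 5

def prob_mean_below(n, limit=STATED):
    """P(mean weight of n bags < limit), using SD = sigma / sqrt(n)."""
    return NormalDist(MU, SIGMA / sqrt(n)).cdf(limit)

# c) and d) Mean weight of 5 bags and of a 30-bag case below 16 ounces.
p_mean_5 = prob_mean_below(5)
p_mean_30 = prob_mean_below(30)

print(f"a) {p_under:.4f}")
print(f"b) {p_none_of_5:.4f}")
print(f"c) {p_mean_5:.4f}")
print(f"d) {p_mean_30:.2e}")
```

The mean of 30 bags sits almost 8 standard deviations above 16 ounces, which is why the last probability is essentially zero.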
Suppose that the IQs of university A’s students can be described by a normal model with mean 130 and standard deviation 7 points. Also suppose that IQs of students from university B can be described by a normal model with mean 110 and standard deviation 12.
a) Select a student at random from university A. Find the probability that the student’s IQ is at least 125 points.
P(x > 125) = 1 – normdist(125,130,7,1) = .762
b) Select a student at random from each school. Find the probability that the university A student’s IQ is at least 5 points higher than the university B student’s IQ.
Define Z = A – B
μ = 130 – 110 = 20
σ = sqrt(7² + 12²) = 13.89
P(Z > 5) = 1 – normdist(5,20,13.89,1) = 0.860
c) Select 3 university B students at random. Find the probability that this group’s average IQ is at least 115 points.
P(ȳ > 115) = 1 – normdist(115,110,12/sqrt(3),1) = .235
d) Also select 3 university A students at random. What is the probability that their average IQ is at least 5 points higher than the average for the 3 university B students?
Define Z = A – B
μ = 130 – 110 = 20
σ = sqrt(7²/3 + 12²/3) = 8.02
P(Z > 5) = 1 – normdist(5,20,8.02,1) = 0.969
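The IQ calculations, including the differences, can be sketched with `statistics.NormalDist`. Subtracting two `NormalDist` objects gives the distribution of the difference of two independent Normal variables, so the variances add just as in the formulas sqrt(7² + 12²) and sqrt(7²/3 + 12²/3). Independence of the two universities' students is assumed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

iq_a = NormalDist(130, 7)    # one university A student
iq_b = NormalDist(110, 12)   # one university B student

# a) A single A student scores at least 125.
p_a = 1 - iq_a.cdf(125)

# b) A's student beats B's student by at least 5 points (independent draws).
diff_single = iq_a - iq_b
p_b = 1 - diff_single.cdf(5)

# c) Average of 3 B students is at least 115: SD shrinks to 12 / sqrt(3).
n = 3
mean_a = NormalDist(130, 7 / sqrt(n))
mean_b = NormalDist(110, 12 / sqrt(n))
p_c = 1 - mean_b.cdf(115)

# d) Average of 3 A students beats the average of 3 B students by at least 5.
diff_means = mean_a - mean_b
p_d = 1 - diff_means.cdf(5)

print(f"a) {p_a:.3f}")
print(f"b) SD = {diff_single.stdev:.2f}, P = {p_b:.3f}")
print(f"c) {p_c:.3f}")
print(f"d) SD = {diff_means.stdev:.2f}, P = {p_d:.3f}")
```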