Download - Sampling Distribution Models - Anne Gloag's Math Page …annegloag.weebly.com/uploads/2/2/9/9/22998796/chapter18... · 2019-12-01 · The score distribution shown in the table is

Sampling Distribution Models

Copyright © 2009 Pearson Education, Inc.

Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples.

The histogram we’d get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.


It turns out that the histogram is unimodal, symmetric, and centered at p.

More specifically, it’s an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.


Model how sample proportions vary from sample to sample.

A sampling distribution model for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we’d observe a sample proportion in any particular interval.


When working with proportions,

Mean = $p$

Standard deviation = $\sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$

So, the distribution of the sample proportions is modeled with a probability model that is $N\left(p, \sqrt{\dfrac{pq}{n}}\right)$.
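A quick simulation makes this model concrete. The Python sketch below draws many samples of size n from a population in which a fraction p are "successes" and compares the mean and SD of the resulting p̂ values with p and √(pq/n). The values p = 0.12 and n = 500 (borrowed from the bank example later in these slides), the 5,000 repetitions, the seed, and the helper name `simulate_phats` are our own illustrative choices.

```python
import math
import random
import statistics

def simulate_phats(p, n, reps, seed=0):
    """Illustrative helper (our own name, not a library function).

    Returns `reps` sample proportions, each from n independent draws with P(success) = p.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.12, 500
phats = simulate_phats(p, n, reps=5_000)

model_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)
print(f"mean of p-hats: {statistics.fmean(phats):.4f}   model mean: {p}")
print(f"SD of p-hats:   {statistics.stdev(phats):.4f}   model SD:   {model_sd:.4f}")
```

The simulated and model values should agree closely; a histogram of `phats` would show the unimodal, symmetric shape centered at p.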

A picture of what we just discussed is as follows:


The Normal model says that about 95% of values lie within two standard deviations of the mean.

So we would expect about 95% of polls like these to give results near the mean, varying above or below it by no more than two standard deviations.

This is what we mean by sampling error. It’s not really an error at all, but just variability you’d expect to see from one sample to another.
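The "95% within two standard deviations" figure can be checked against the standard Normal model with Python's `statistics.NormalDist`. This small sketch shows that the exact share within two SDs is slightly above 95%, and that the exact multiplier for 95% is about 1.96.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal model: mean 0, SD 1

within_two_sd = z.cdf(2) - z.cdf(-2)
multiplier_95 = z.inv_cdf(0.975)  # leaves 2.5% in each tail

print(f"P(-2 < Z < 2) = {within_two_sd:.4f}")   # 0.9545
print(f"95% multiplier = {multiplier_95:.2f}")  # 1.96
```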


The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger.


There are two assumptions in the case of the model for the distribution of sample proportions:

1. The Independence Assumption: The sampled values must be independent of each other.

2. The Sample Size Assumption: The sample size, n, must be large enough.


1. Randomization Condition: The sample should be a simple random sample of the population.

2. 10% Condition: If sampling is done without replacement, then the sample size, n, must be no larger than 10% of the population.

3. Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
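The two numeric conditions can be checked mechanically. The helper below is a hypothetical sketch (its name and arguments are ours, not from any library). The Randomization Condition is deliberately left out, because it depends on how the data were collected, not on the numbers.

```python
def check_proportion_conditions(n, p, population_size=None, with_replacement=False):
    """Hypothetical helper: check the 10% and Success/Failure conditions.

    Returns a dict of booleans. "ten_percent" is None when it cannot be judged
    (population size unknown) and True when sampling is with replacement.
    """
    q = 1 - p
    if with_replacement:
        ten_percent = True
    elif population_size is None:
        ten_percent = None
    else:
        ten_percent = n <= 0.10 * population_size
    return {
        "success_failure": n * p >= 10 and n * q >= 10,
        "ten_percent": ten_percent,
    }

# Apple example from these slides: 125 apples, p = 0.12, so np = 15 and nq = 110.
print(check_proportion_conditions(125, 0.12))
# A smaller hypothetical sample: np = 6 < 10, so Success/Failure fails.
print(check_proportion_conditions(50, 0.12, population_size=10_000))
```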


Sampling distribution models are important because they

◦ act as a bridge from the real world of data to the imaginary model of the statistic, and

◦ enable us to say something about the population when all we have is data from the real world.


Proportions summarize categorical variables.

The Normal sampling distribution model looks like it will be very useful.

Can we do something similar with quantitative data?

We can indeed. Even more remarkably, we can use not only all of the same concepts but almost the same model.


A sample mean also has a sampling distribution.

Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:


The average of two dice after a simulation of 10,000 tosses looks like:

The average of three dice after a simulation of 10,000 tosses looks like:


The average of 5 dice after a simulation of 10,000 tosses looks like:

The average of 20 dice after a simulation of 10,000 tosses looks like:


As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean.

The sampling distribution of a mean becomes more nearly Normal.
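The dice experiment is easy to reproduce. This Python sketch averages k fair dice 10,000 times for the same values of k used above and compares the spread of the averages with σ/√k, where σ ≈ 1.708 is the SD of a single die. The helper name `dice_averages` and the seed are our own illustrative choices.

```python
import random
import statistics

def dice_averages(k, trials=10_000, seed=1):
    """Illustrative helper (our own name): average of k fair dice, repeated `trials` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.randint(1, 6) for _ in range(k)) for _ in range(trials)]

sigma_die = statistics.pstdev(range(1, 7))  # SD of one die, sqrt(35/12), about 1.708

for k in (1, 2, 3, 5, 20):
    avgs = dice_averages(k)
    print(f"{k:>2} dice: mean {statistics.fmean(avgs):.3f}  "
          f"SD {statistics.stdev(avgs):.3f}  sigma/sqrt(k) {sigma_die / k ** 0.5:.3f}")
```

The means stay near 3.5 while the SDs shrink roughly like 1/√k; a histogram of each `avgs` list would show the shape moving from flat toward the Normal bell.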


The sampling distribution of any mean becomes more nearly Normal as the sample size grows.

◦ All we need is for the observations to be independent and collected with randomization.

◦ We don’t even care about the shape of the population distribution!

This result, the Fundamental Theorem of Statistics, is called the Central Limit Theorem (CLT).


The Central Limit Theorem (CLT)

The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.


The CLT requires essentially the same assumptions we saw for modeling proportions:

1. Independence Assumption: The sampled values must be independent of each other.

2. Sample Size Assumption: The sample size must be sufficiently large.


The Normal model for the sampling distribution of the mean has a mean equal to the population mean:

$\mu_{\bar y} = \mu$

and a standard deviation equal to

$SD(\bar y) = \dfrac{\sigma}{\sqrt{n}}$

where σ is the population standard deviation.

Both of the sampling distributions we’ve looked at are Normal.

◦ For proportions: $SD(\hat p) = \sqrt{\dfrac{pq}{n}}$

◦ For means: $SD(\bar y) = \dfrac{\sigma}{\sqrt{n}}$

When we don’t know p or σ, we will use sample statistics to estimate these population parameters.

Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.


For a sample proportion, the standard error is $SE(\hat p) = \sqrt{\dfrac{\hat p \hat q}{n}}$

For the sample mean, the standard error is $SE(\bar y) = \dfrac{s}{\sqrt{n}}$, where s is the sample standard deviation.
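In practice the standard errors are computed from the data. A minimal Python sketch, using made-up numbers purely for illustration (54 successes in 100 trials, and six hypothetical bag weights); the helper names `se_phat` and `se_ybar` are our own.

```python
import math
import statistics

def se_phat(successes, n):
    """Illustrative helper: standard error of a sample proportion, sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_ybar(sample):
    """Illustrative helper: standard error of a sample mean, s / sqrt(n) (s uses an n - 1 divisor)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical data, invented for this example.
print(round(se_phat(54, 100), 4))   # 0.0498
weights = [16.1, 16.4, 16.3, 16.0, 16.5, 16.2]
print(round(se_ybar(weights), 4))   # 0.0764
```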

Be careful! Now we have two distributions to deal with.


The first is the real-world distribution of the sample, which we might display with a histogram.

The second is the math-world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem.

Don’t confuse the two!

There are two basic truths about sampling distributions:

1. Sampling distributions arise because samples vary. Each random sample contains different cases and so gives a different value of the statistic.

2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.


Don’t confuse the sampling distribution with the distribution of the sample.

◦ When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.

◦ The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn’t get.

Watch out for small samples from skewed populations.

◦ The more skewed the distribution, the larger the sample size we need for the CLT to work.
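One way to see this warning in action is to draw samples from a strongly right-skewed population and measure how skewed the resulting sample means are. The sketch below assumes an Exponential(1) population (our choice, not from the slides), whose skewness is 2; for means of n draws the skewness should fall roughly like 2/√n. The helper names are our own.

```python
import random
import statistics

def skewness(values):
    """Illustrative helper: moment-based skewness, the average cubed z-score (0 if symmetric)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def sample_means_exponential(n, reps=10_000, seed=2):
    """Illustrative helper: means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (2, 10, 50):
    print(f"n = {n:>2}: skewness of sample means {skewness(sample_means_exponential(n)):.2f}"
          f"  (roughly 2/sqrt(n) = {2 / n ** 0.5:.2f})")
```

With n = 2 the sample means are still clearly skewed; only with larger n does their distribution approach the symmetric Normal shape.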


Based on past experience, a bank believes that 12% of people who receive loans will not make payments on time. The bank has recently approved 500 loans.

a) What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?

μ = p = 0.12

$SD(\hat p) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.12)(0.88)}{500}} \approx 0.015$

b) What is the probability that over 14% of these clients will not make payments on time?

P(p̂ > 0.14) = 1 – Normdist(0.14,0.12,0.015,1) = 0.091
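The same calculation in Python, as a sketch with `statistics.NormalDist` playing the role of the slides' spreadsheet-style Normdist. With the SD rounded to 0.015 the upper-tail probability is about 0.091; with the unrounded SD (about 0.0145) it is closer to 0.084.

```python
import math
from statistics import NormalDist

p, n = 0.12, 500
sd = math.sqrt(p * (1 - p) / n)  # about 0.0145

# P(p-hat > 0.14) = 1 - P(p-hat <= 0.14)
rounded = 1 - NormalDist(p, round(sd, 3)).cdf(0.14)  # SD rounded to 0.015
unrounded = 1 - NormalDist(p, sd).cdf(0.14)

print(f"SD = {sd:.4f}")
print(f"P(p-hat > 0.14): {rounded:.3f} (rounded SD), {unrounded:.3f} (unrounded SD)")
```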


Just before a referendum on a school budget, a local newspaper polls 435 voters to predict whether the budget will pass. Suppose the budget has the support of 54% of the voters. What is the probability that the newspaper’s sample will lead it to predict defeat?

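Mean and standard deviation of the proportion:

μ = p = 0.54

$SD(\hat p) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.54)(0.46)}{435}} \approx 0.024$

The paper predicts defeat if less than 50% of its sample supports the budget:

P(p̂ < 0.5) = Normdist(0.5,0.54,0.024,1) = 0.048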
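A Python check of the referendum example. Because the newspaper predicts defeat when its sample shows less than 50% support, the answer is the lower-tail area below 0.50 (no "1 −"). Using the unrounded SD gives about 0.047; the rounded SD of 0.024 gives about 0.048.

```python
import math
from statistics import NormalDist

p, n = 0.54, 435
sd = math.sqrt(p * (1 - p) / n)  # about 0.0239

# Predict defeat when p-hat < 0.50: a lower-tail probability.
prob_defeat = NormalDist(p, sd).cdf(0.50)
prob_defeat_rounded_sd = NormalDist(p, 0.024).cdf(0.50)

print(f"P(p-hat < 0.50) = {prob_defeat:.3f} (rounded SD: {prob_defeat_rounded_sd:.3f})")
```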


When a truckload of apples arrives at a packing plant, a random sample of 125 is selected and examined for bruises, discoloration, and other defects. The whole truckload is rejected if more than 10% of the sample is unsatisfactory. Suppose in fact that 12% of the apples on the truck do not meet the desired standard. What is the probability that the shipment will be accepted anyway?

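Mean and standard deviation of the proportion:

μ = p = 0.12

$SD(\hat p) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.12)(0.88)}{125}} \approx 0.029$

The shipment is accepted if at most 10% of the sample is unsatisfactory:

P(p̂ ≤ 0.10) = Normdist(0.1,0.12,0.029,1) = 0.245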
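A Python version of the apple calculation, written as a sketch. The shipment is accepted when p̂ is at most 0.10, so the answer is a lower-tail area; the code also checks the Success/Failure Condition (np = 15, nq = 110). With the unrounded SD the result is about 0.246, essentially the 0.245 obtained with the SD rounded to 0.029.

```python
import math
from statistics import NormalDist

p, n = 0.12, 125
assert n * p >= 10 and n * (1 - p) >= 10  # Success/Failure Condition holds

sd = math.sqrt(p * (1 - p) / n)            # about 0.029
prob_accept = NormalDist(p, sd).cdf(0.10)  # P(p-hat <= 0.10)
print(f"SD = {sd:.3f}, P(accept) = {prob_accept:.3f}")
```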


A new restaurant with 119 seats is being planned. Studies show that 63% of the customers demand a smoke-free area. How many seats should be in the non-smoking area in order to be very sure (μ + 3σ) of having enough seating there?

Mean and standard deviation of the proportion: μ = p = 0.63

$SD(\hat p) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.63)(0.37)}{119}} \approx 0.0443$

μ + 3σ = 0.63 + 3(0.0443) ≈ 0.763

0.763 × 119 ≈ 90.8, so the non-smoking area needs 91 seats.
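The seating calculation in Python, as a sketch. Rounding the seat count up with `math.ceil` is our choice: a fractional seat cannot be built, and rounding down would leave the area slightly short of the μ + 3σ target.

```python
import math

p, n = 0.63, 119
sd = math.sqrt(p * (1 - p) / n)  # about 0.0443
upper = p + 3 * sd                # "very sure" proportion: mean + 3 SD, about 0.763
seats = math.ceil(upper * n)      # round up to whole seats

print(f"SD = {sd:.4f}, upper proportion = {upper:.3f}, seats = {seats}")  # seats = 91
```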


Assume that the duration of human pregnancies can be described by a normal model with mean 268 days and standard deviation 16 days.

a) What percentage of pregnancies should last between 255 and 270 days?

P(255 < x < 270) = normdist(270,268,16,1) - normdist(255,268,16,1) = .341 = 34.1%


b) At least how many days should the longest 30% of all pregnancies last?

P(x > ?) = 0.3

norminv(0.7,268,16) = 276.4
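Parts a) and b) can be checked in Python, as a sketch in which `NormalDist.cdf` plays the role of normdist and `NormalDist.inv_cdf` the role of norminv:

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=268, sigma=16)  # days

# a) share of pregnancies lasting between 255 and 270 days
share = pregnancy.cdf(270) - pregnancy.cdf(255)

# b) the longest 30% begin at the 70th percentile
cutoff = pregnancy.inv_cdf(0.70)

print(f"a) {share:.3f}")          # 0.341
print(f"b) {cutoff:.1f} days")    # 276.4
```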


c) Suppose a certain obstetrician is currently providing prenatal care to 40 pregnant women. According to the CLT, what are the mean and standard deviation of the Normal model for the mean pregnancy duration of these 40 patients?

Mean = 268

$SD(\bar y) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{16}{\sqrt{40}} \approx 2.53$

d) What is the probability that the mean duration of these patients’ pregnancies will be less than 274 days?

P(ȳ < 274) = normdist(274,268,2.53,1) = .991
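Parts c) and d) in Python, as a sketch: the CLT model for the mean of 40 pregnancies is Normal with mean 268 and SD 16/√40.

```python
import math
from statistics import NormalDist

mu, sigma, n = 268, 16, 40
sd_mean = sigma / math.sqrt(n)        # c) about 2.53 days
mean_model = NormalDist(mu, sd_mean)

prob_below_274 = mean_model.cdf(274)  # d) P(y-bar < 274)
print(f"SD(y-bar) = {sd_mean:.2f}, P(y-bar < 274) = {prob_below_274:.3f}")
```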


The score distribution shown in the table is for all students who took a yearly AP statistics exam. An AP statistics teacher had 46 students preparing to take the AP exam. He considered his students to be “typical” of all the national students.


Score   Percent of students
5       13.9
4       22.5
3       25.3
2       17.2
1       21.1


What is the probability that his students will achieve an average score of at least 3?

1. Find the mean and standard deviation of the population.

$\mu = E(X) = \sum x\,P(x) = 2.909$

$\sigma = \sqrt{\sum (x - \mu)^2\,P(x)} = 1.337$


2. Find the mean and standard deviation of the sampling distribution of the mean.

Mean = 2.909

SD = σ/sqrt(n) = 1.337/sqrt(46) ≈ 0.197

3. Find the probability:

P(ȳ > 3) = 1 – normdist(3,2.909,0.197,1) ≈ 0.322
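The three steps can be checked in Python, as a sketch in which `NormalDist.cdf` stands in for normdist and the table's percents are converted to proportions (they sum to 1). It gives μ ≈ 2.909 and σ ≈ 1.337, and for n = 46 an SD of about 0.197 for the mean and a probability of about 0.322.

```python
import math
from statistics import NormalDist

# National score distribution from the table, percents converted to proportions.
score_probs = {5: 0.139, 4: 0.225, 3: 0.253, 2: 0.172, 1: 0.211}

# Step 1: population mean and SD of a single score.
mu = sum(x * p for x, p in score_probs.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in score_probs.items()))

# Step 2: CLT model for the mean of n = 46 scores.
n = 46
sd_mean = sigma / math.sqrt(n)

# Step 3: P(y-bar >= 3).
prob_at_least_3 = 1 - NormalDist(mu, sd_mean).cdf(3)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"SD(y-bar) = {sd_mean:.3f}, P(y-bar >= 3) = {prob_at_least_3:.3f}")
```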


The weight of potato chips in a large-size bag is stated to be 16 ounces. The amount that the packaging machine puts in these bags is believed to have a normal model with a mean of 16.3 ounces and a standard deviation of 0.21 ounces.

a) What fraction of all bags sold are underweight?

P(x < 16) = normdist(16,16.3,0.21,1) = .0766

b) Some of the chips are sold in bargain packs of 5 bags. What is the probability that none of the 5 is underweight?

Assuming the 5 bags’ weights are independent, the number of underweight bags X is Binomial(5, 0.0766):

P(X = 0) = $p^0 q^5 = (1 - 0.0766)^5 = 0.6715$
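Parts a) and b) in Python, as a sketch. Part b) treats the 5 bags as independent, so the chance that none is underweight is the single-bag probability of weighing at least 16 ounces raised to the fifth power.

```python
from statistics import NormalDist

bag = NormalDist(mu=16.3, sigma=0.21)  # ounces per bag

p_under = bag.cdf(16)             # a) P(one bag < 16 oz)
p_none_of_5 = (1 - p_under) ** 5  # b) all 5 independent bags weigh at least 16 oz

print(f"a) {p_under:.4f}")  # 0.0766
print(f"b) {p_none_of_5:.4f}")
```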


c) What is the probability that the mean weight of the 5 bags is below the stated amount?

P(ȳ < 16) = normdist(16,16.3,0.21/sqrt(5),1) = .0007

d) What is the probability that the mean weight of a 30-bag case of potato chips is below 16 ounces?

P(ȳ < 16) = normdist(16,16.3,0.21/sqrt(30),1) ≈ .0000
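Parts c) and d) in Python, as a sketch. Because single-bag weights are modeled as Normal, the mean of n bags is Normal with SD 0.21/√n even for n = 5.

```python
import math
from statistics import NormalDist

mu, sigma, target = 16.3, 0.21, 16  # ounces

# Model for the mean weight of n bags: Normal(mu, sigma / sqrt(n)).
probs = {n: NormalDist(mu, sigma / math.sqrt(n)).cdf(target) for n in (5, 30)}

for n, prob in probs.items():
    print(f"n = {n:>2}: P(mean weight < {target} oz) = {prob:.2e}")
```

The 30-bag probability is so small (on the order of 10^-15) that it rounds to zero at any practical precision.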


Suppose that the IQs of university A’s students can be described by a normal model with mean 130 and standard deviation 7 points. Also suppose that IQs of students from university B can be described by a normal model with mean 110 and standard deviation 12.

a) Select a student at random from university A. Find the probability that the student’s IQ is at least 125 points.

P(x > 125) = 1 - normdist(125,130,7,1) = .762

b) Select a student at random from each school. Find the probability that the university A student’s IQ is at least 5 points higher than the university B student’s IQ.

Define Z = A – B. Because the two students are selected independently, the means subtract and the variances add:

μ = 130 – 110 = 20

$\sigma = \sqrt{7^2 + 12^2} \approx 13.89$

P(Z > 5) = 1 – normdist(5,20,13.89,1) = 0.860
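Parts a) and b) in Python, as a sketch. `NormalDist` supports subtracting two independent Normal models directly (means subtract, variances add), which matches the reasoning in b).

```python
from statistics import NormalDist

iq_a = NormalDist(130, 7)
iq_b = NormalDist(110, 12)

# a) P(IQ >= 125) for one university A student
p_a = 1 - iq_a.cdf(125)

# b) Z = A - B for independently chosen students: Normal(20, sqrt(7**2 + 12**2))
diff = iq_a - iq_b
p_b = 1 - diff.cdf(5)

print(f"a) {p_a:.3f}")
print(f"b) SD of Z = {diff.stdev:.2f}, P(Z > 5) = {p_b:.3f}")
```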


c) Select 3 university B students at random. Find the probability that this group’s average IQ is at least 115 points.

P(ȳ > 115) = 1 - normdist(115,110,12/sqrt(3),1) = .235

d) Also select 3 university A students at random. What is the probability that their average IQ is at least 5 points higher than the average for the 3 university B students?

Define $Z = \bar y_A - \bar y_B$

μ = 130 – 110 = 20

$\sigma = \sqrt{\dfrac{7^2}{3} + \dfrac{12^2}{3}} \approx 8.02$

P(Z > 5) = 1 – normdist(5,20,8.02,1) = 0.969
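Parts c) and d) in Python, as a sketch. Because both IQ populations are modeled as Normal, the mean of just 3 students is Normal too, so the small sample size is not a problem here; the two groups are assumed independent.

```python
import math
from statistics import NormalDist

n = 3
mean_a = NormalDist(130, 7 / math.sqrt(n))   # model for the mean IQ of 3 A students
mean_b = NormalDist(110, 12 / math.sqrt(n))  # model for the mean IQ of 3 B students

p_c = 1 - mean_b.cdf(115)  # c) P(mean of 3 B students >= 115)
diff = mean_a - mean_b     # d) independent groups: means subtract, variances add
p_d = 1 - diff.cdf(5)

print(f"c) {p_c:.3f}")
print(f"d) SD of Z = {diff.stdev:.2f}, P(Z > 5) = {p_d:.3f}")
```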
