introduction to sampling distributions

Post on 24-Feb-2016

DESCRIPTION

Chapter 7. Introduction to Sampling Distributions. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania. Terms, Statistics & Parameters. Terms : Population, Sample, Parameter, Statistics. Why Sample?. - PowerPoint PPT Presentation

TRANSCRIPT

1

Section 7.1-2

Sampling Distributions and the Central Limit Theorem

© 2012 Pearson Education, Inc. All rights reserved.

2

Section 7.1 Objectives

• Describe sampling distributions and verify their properties

• Explain the Central Limit Theorem

• Apply the Central Limit Theorem

3

Terms, Statistics, and Parameters

4

Why Sample?

At times, we’d like to know something about the population, but because our time, resources, and efforts are limited, we can take a sample to learn about the population.

5

Sampling Demonstration (size n=2)

• Write each of the following numbers on a card: 1, 3, 5, 7

• Repeat the following experiment:

1. Randomly select card #1 and replace it

2. Randomly select card #2 and replace it
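The card experiment above can be imitated in software. The following is a minimal sketch, not part of the original slides; the function name `draw_sample_mean`, the seed, and the choice of ten repetitions are illustrative assumptions.

```python
import random

def draw_sample_mean(rng, cards=(1, 3, 5, 7), n=2):
    """Draw n cards with replacement and return the mean of the draws."""
    draws = [rng.choice(cards) for _ in range(n)]
    return sum(draws) / n

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
means = [draw_sample_mean(rng) for _ in range(10)]  # ten repetitions (assumed)
print(means)
```

Each printed value is one sample mean; repeating the experiment many times builds up the sampling distribution discussed in the following slides.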

6

Types of Inference

• Estimation: We estimate the value of a population parameter

• Testing: We formulate a decision about a population parameter

• Regression: We make predictions or forecasts about the value of a statistical variable

7

Sampling Distributions

• The distribution of values taken on by the statistic

• It is based on all possible samples of the same size from a given population

• We sample with replacement, so the same value can be used over again

• A sampling distribution is a sample space

• The distribution describes everything that can happen when we sample

8

Sampling Distributions

• To evaluate the reliability of our inference, we need to know about the probability distribution of the statistic we are using.

• Typically, we are interested in the sampling distributions for sample means and sample proportions.

9

Sampling Distributions

Sampling distribution

• The probability distribution of a sample statistic

• Formed when samples of size n are repeatedly taken from a population

• e.g., the sampling distribution of sample means

10

Sampling Distribution of Sample Means

[Diagram: Samples 1 through 5, with sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5$, drawn from a population with mean μ and standard deviation σ]

The sampling distribution consists of the values of the sample means, $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots$

11

Checkpoint (contd.)

4. What is the sample characteristic corresponding to each sample?

The sample statistic is the sample mean length $\bar{x}$.

5. What is the sampling distribution?

The sampling distribution is the probability distribution of all possible values of $\bar{x}$.

6. To which population parameter does this sampling distribution correspond?

This sampling distribution relates to the population mean length µ of all the trout in the tank.

12

Exercise 1: Sampling Distribution of Sample Means

The population values {1, 3, 5, 7} are written on slips of paper and put in a box. Two slips of paper are randomly selected, with replacement.

a. Find the mean, variance, and standard deviation of the population.

Solution:

Mean: $\mu = \frac{\sum x}{N} = 4$

Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = 5$

Standard deviation: $\sigma = \sqrt{5} \approx 2.236$
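As a quick check of part a, the population parameters can be computed directly. This is a small sketch added for illustration; the variable names are assumptions, and the variance divides by N because {1, 3, 5, 7} is treated as the whole population.

```python
import math

population = [1, 3, 5, 7]
N = len(population)

mu = sum(population) / N                                # population mean
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # divide by N, not N - 1
sigma = math.sqrt(sigma_sq)                             # population standard deviation

print(mu, sigma_sq, round(sigma, 3))  # 4.0 5.0 2.236
```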

13

Exercise 1 : Sampling Distribution of Sample Means

b. Graph the probability histogram for the population values {1, 3, 5, 7}.

Solution:

[Figure: Probability Histogram of Population of x. Horizontal axis: population values x = 1, 3, 5, 7. Vertical axis: probability P(x), with every bar at 0.25.]

All values have the same probability of being selected (uniform distribution).

14

Exercise 1 Sampling Distribution of Sample Means

c. List all the possible samples of size n = 2 and calculate the mean of each sample.

Solution:

Sample   Mean      Sample   Mean
1, 1     1         5, 1     3
1, 3     2         5, 3     4
1, 5     3         5, 5     5
1, 7     4         5, 7     6
3, 1     2         7, 1     4
3, 3     3         7, 3     5
3, 5     4         7, 5     6
3, 7     5         7, 7     7

These means form the sampling distribution of sample means.

15

Exercise 1 Sampling Distribution of Sample Means

d. Construct the probability distribution of the sample means.

Sample mean   f   Probability
1             1   0.0625
2             2   0.1250
3             3   0.1875
4             4   0.2500
5             3   0.1875
6             2   0.1250
7             1   0.0625
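Parts c and d can be reproduced by enumerating every ordered sample of size 2. The sketch below is an illustration rather than the slides' own method (the slides use a hand-built table); it uses exact fractions so the probabilities come out as sixteenths.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 7]

# All ordered samples of size 2, drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(population, repeat=2))
sample_means = [Fraction(a + b, 2) for a, b in samples]

# Frequency f of each sample mean, then probability f / 16.
counts = Counter(sample_means)
distribution = {m: Fraction(f, len(samples)) for m, f in sorted(counts.items())}

for m, prob in distribution.items():
    print(m, counts[m], float(prob))
```

Each printed row gives a sample mean, its frequency f, and its probability, in the same layout as the table in part d.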

16

Exercise 1 : Sampling Distribution of Sample Means

e. Use your calculator to find the mean, variance, and standard deviation of the sampling distribution of the sample means.

Solution:

The mean, variance, and standard deviation of the 16 sample means are:

$\mu_{\bar{x}} = 4$, $\sigma^2_{\bar{x}} = \frac{5}{2} = 2.5$, $\sigma_{\bar{x}} = \sqrt{2.5} \approx 1.581$

These results satisfy the properties of sampling distributions of sample means:

$\mu_{\bar{x}} = \mu = 4$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} \approx \frac{2.236}{\sqrt{2}} \approx 1.581$
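The same check can be done without a calculator. This is a minimal sketch under the exercise's setup (population {1, 3, 5, 7}, n = 2, sampling with replacement); the variable names are illustrative.

```python
import math
from itertools import product

population = [1, 3, 5, 7]
n = 2

mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Means of all 16 equally likely samples of size 2.
means = [sum(s) / n for s in product(population, repeat=n)]

mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)
sd_xbar = math.sqrt(var_xbar)

print(mu_xbar, var_xbar, round(sd_xbar, 3))
print(round(sigma / math.sqrt(n), 3))  # should agree with sd_xbar
```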

17

Exercise 1 : Sampling Distribution of Sample Means

f. Graph the probability histogram for the sampling distribution of the sample means.

Solution:

[Figure: Probability Histogram of Sampling Distribution of $\bar{x}$. Horizontal axis: sample mean $\bar{x}$. Vertical axis: probability $P(\bar{x})$, from 0.05 to 0.25.]

The shape of the graph is symmetric and bell shaped. It approximates a normal distribution.

18

Exercise 2 : Sampling Distribution of Sample Means

Class Exercise: Each group of three students should repeat the following experiment five times.

• Generate random number pairs using the command RandInt(1,7,2). Reject any pair that contains an even number. Complete the following table and report your five mean values to the class recorder.

Trial #   1   2   3   4   5

19

Properties of Sampling Distributions of Sample Means

1. The mean of the sample means, $\mu_{\bar{x}}$, is equal to the population mean μ.

$\mu_{\bar{x}} = \mu$

2. The standard deviation of the sample means, $\sigma_{\bar{x}}$, is equal to the population standard deviation, σ, divided by the square root of the sample size, n.

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

• Called the standard error of the mean.

20

The Central Limit Theorem 7.1

If the population itself is normally distributed, then the sampling distribution of sample means is normally distributed for any sample size n.

[Diagram: dot plot of sample means $\bar{x}$]

21

The Central Limit Theorem 7.2

If samples of size n ≥ 30 are drawn from any population with mean µ and standard deviation σ, then the sampling distribution of sample means approximates a normal distribution. The greater the sample size, the better the approximation.

[Diagram: dot plot of sample means $\bar{x}$]
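A short simulation can make the theorem concrete. The sketch below is an illustration, not part of the original deck: it assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1, a sample size of n = 30, 5000 repeated samples, and an arbitrary seed.

```python
import math
import random
import statistics

rng = random.Random(42)   # arbitrary seed for repeatability
n = 30                    # sample size (the slide's n >= 30 threshold)
num_samples = 5000        # number of repeated samples (assumed)

# Population: exponential with rate 1, so mu = 1 and sigma = 1 (an assumption
# chosen because it is clearly non-normal).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print("mean of sample means:", mean_of_means)       # expected to be near mu = 1
print("sd of sample means:  ", sd_of_means)         # expected to be near sigma / sqrt(n)
print("sigma / sqrt(n):     ", 1 / math.sqrt(n))
```

A histogram of `sample_means` would look roughly bell shaped even though the population itself is skewed.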

22

The Central Limit Theorem

• In either case, the sampling distribution of sample means has a mean equal to the population mean.

Mean: $\mu_{\bar{x}} = \mu$

• The sampling distribution of sample means has a variance equal to 1/n times the variance of the population and a standard deviation equal to the population standard deviation divided by the square root of n.

Variance: $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$

Standard deviation (standard error of the mean): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

23

The Central Limit Theorem

Case: Any Population Distribution. Distribution of Sample Means, n ≥ 30

Case: Normal Population Distribution. Distribution of Sample Means (any n)

24

Exercise 3: Interpreting the Central Limit Theorem

Cellular phone bills for residents of a city have a mean of $63 and a standard deviation of $11. Random samples of 100 cellular phone bills are drawn from this population and the mean of each sample is determined. Find the mean and standard error of the mean of the sampling distribution. Then sketch a graph of the sampling distribution of sample means.

25

Solution: Interpreting the Central Limit Theorem

• The mean of the sampling distribution is equal to the population mean.

$\mu_{\bar{x}} = \mu = 63$

• The standard error of the mean is equal to the population standard deviation divided by the square root of n.

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{11}{\sqrt{100}} = 1.1$

26

Solution: Interpreting the Central Limit Theorem

• Since the sample size is greater than 30, the sampling distribution can be approximated by a normal distribution with mean $\mu_{\bar{x}} = 63$ and standard error $\sigma_{\bar{x}} = 1.10$ (both in dollars).

27

Exercise 4: Interpreting the Central Limit Theorem

Suppose the training heart rates of all 20-year-old athletes are normally distributed, with a mean of 135 beats per minute and standard deviation of 18 beats per minute. Random samples of size 4 are drawn from this population, and the mean of each sample is determined. Find the mean and standard error of the mean of the sampling distribution. Then sketch a graph of the sampling distribution of sample means.

28

Solution: Interpreting the Central Limit Theorem

• The mean of the sampling distribution is equal to the population mean.

$\mu_{\bar{x}} = \mu = 135$

• The standard error of the mean is equal to the population standard deviation divided by the square root of n.

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}} = 9$

29

Solution: Interpreting the Central Limit Theorem

• Since the population is normally distributed, the sampling distribution of the sample means is also normally distributed, with $\mu_{\bar{x}} = 135$ and $\sigma_{\bar{x}} = 9$.
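The arithmetic in Exercises 3 and 4 is the same one-line formula applied twice. The helper below is a hypothetical convenience function written for this illustration; only the numbers come from the exercises.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population sd divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)

# Exercise 3: phone bills, sigma = 11 dollars, n = 100
se_bills = standard_error(11, 100)

# Exercise 4: heart rates, sigma = 18 beats per minute, n = 4
se_heart = standard_error(18, 4)

print(se_bills, se_heart)
```

In both cases the mean of the sampling distribution is simply the population mean (63 dollars and 135 beats per minute); only the spread shrinks with n.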

Exercise 5: Probabilities for Sampling Distributions

The graph shows the length of time people spend driving each day (the graph, not reproduced here, gives a mean of 25 minutes for drivers age 15 to 19). You randomly select 50 drivers age 15 to 19. What is the probability that the mean time they spend driving each day is between 24.7 and 25.5 minutes? Assume that σ = 1.5 minutes.

Larson/Farber 4th ed 30

Solution: Probabilities for Sampling Distributions

From the Central Limit Theorem (sample size is greater than 30), the sampling distribution of sample means is approximately normal with

$\mu_{\bar{x}} = \mu = 25$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} \approx 0.21213$

31

Solution: Probabilities for Sampling Distributions

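$z_1 = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{24.7 - 25}{1.5 / \sqrt{50}} \approx -1.41$

$z_2 = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{25.5 - 25}{1.5 / \sqrt{50}} \approx 2.36$

[Figure: normal distribution with μ = 25 and σ = 0.21213, with the area for $P(24.7 < \bar{x} < 25.5)$ shaded; standard normal distribution with μ = 0 and σ = 1, with the area for $P(-1.41 < z < 2.36)$ shaded. Cumulative areas: 0.0793 at z = -1.41 and 0.9909 at z = 2.36.]

32

$P(24.7 < \bar{x} < 25.5) = P(-1.41 < z < 2.36) = \text{normalcdf}(-1.41, 2.36) = 0.9116$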
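For readers without a calculator that has normalcdf, the same probability can be sketched with Python's standard library. This is an illustration under the exercise's stated values (μ = 25, σ = 1.5, n = 50); because it does not round the z-scores to two decimals, the result can differ from the slide's 0.9116 in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 25, 1.5, 50
se = sigma / math.sqrt(n)              # standard error of the mean

z_low = (24.7 - mu) / se               # about -1.41
z_high = (25.5 - mu) / se              # about 2.36

standard_normal = NormalDist()         # mean 0, standard deviation 1
prob = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), prob)
```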


34

Exercise 6: Probabilities for $x$ and $\bar{x}$

Suppose a team of biologists has been studying the Pinedale Children’s fishing pond. Let x represent the length of a single trout taken at random from the pond. Assume x has a normal distribution with μ = 10.2 inches and standard deviation σ = 1.4 inches.

a) What is the probability that a single trout taken at random from the pond is between 8 and 12 inches?

b) What is the probability that the mean length of 5 trout taken at random is between 8 and 12 inches?

c) Explain the difference between parts a) and b).

35

Exercise 6: Probabilities for $x$ and $\bar{x}$

a) What is the probability that a single trout taken at random from the pond is between 8 and 12 inches?

b) What is the probability that the mean length of 5 trout taken at random is between 8 and 12 inches?

$\mu_{\bar{x}} = \mu = 10.2$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{5}} \approx 0.63$

36

Exercise 6: Probabilities for $x$ and $\bar{x}$

c) Explain the difference between parts a) and b).

In part a, we are computing the probability that a single trout will be between 8 and 12 inches in length.

In part b, we are computing the probability that the average length for a sample of size 5 will be between 8 and 12 inches.
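The difference between parts a and b shows up clearly when both probabilities are computed side by side. The following sketch assumes the exercise's values (μ = 10.2, σ = 1.4, n = 5) and uses `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10.2, 1.4, 5

single_trout = NormalDist(mu, sigma)                  # distribution of x
mean_of_five = NormalDist(mu, sigma / math.sqrt(n))   # distribution of the sample mean

p_single = single_trout.cdf(12) - single_trout.cdf(8)
p_mean = mean_of_five.cdf(12) - mean_of_five.cdf(8)

print("P(8 < x < 12)     =", p_single)
print("P(8 < x-bar < 12) =", p_mean)
```

The second probability comes out much closer to 1 because the sample mean has the smaller standard deviation, so more of its distribution sits inside the interval from 8 to 12.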

37

Exercise 6: Probabilities for $x$ and $\bar{x}$

NOTES:

1. Both curves use the same scale on the horizontal axis.

2. The means are the same.

3. The shaded area is above the interval from 8 to 12 on each graph.

4. The smaller standard deviation of the $\bar{x}$ distribution has the effect of gathering together much more of the total probability into the region over its mean.

38

Exercise 7: Probabilities for $x$ and $\bar{x}$

An education finance corporation claims that the average credit card debts carried by undergraduates are normally distributed, with a mean of $3173 and a standard deviation of $1120. (Adapted from Sallie Mae)

a) What is the probability that a randomly selected undergraduate, who is a credit card holder, has a credit card balance less than $2700?

Solution: You are asked to find the probability associated with a certain value of the random variable x.

39

Solution: Probabilities for $x$ and $\bar{x}$

$z = \frac{x - \mu}{\sigma} = \frac{2700 - 3173}{1120} \approx -0.42$

[Figure: normal distribution with μ = 3173 and σ = 1120, with the area for P(x < 2700) shaded; standard normal distribution with μ = 0 and σ = 1, with the area for P(z < -0.42) = 0.3372 shaded.]

$P(x < 2700) = P(z < -0.42) = 0.3372$

40

Example: Probabilities for $x$ and $\bar{x}$

b) You randomly select 25 undergraduates who are credit card holders. What is the probability that their mean credit card balance is less than $2700?

Solution: You are asked to find the probability associated with a sample mean $\bar{x}$.

$\mu_{\bar{x}} = \mu = 3173$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1120}{\sqrt{25}} = 224$

41

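Solution: Probabilities for $x$ and $\bar{x}$

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{2700 - 3173}{1120 / \sqrt{25}} = \frac{-473}{224} \approx -2.11$

[Figure: normal distribution of sample means with mean 3173 and standard error 224, with the area for $P(\bar{x} < 2700)$ shaded; standard normal distribution with μ = 0 and σ = 1, with the area for P(z < -2.11) = 0.0174 shaded.]

$P(\bar{x} < 2700) = P(z < -2.11) = 0.0174$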

42

Solution: Probabilities for $x$ and $\bar{x}$

c) Write interpretive statements for the two calculations above.

1. There is about a 34% chance that an undergraduate will have a balance less than $2700.

2. There is only about a 2% chance that the mean balance of a sample of 25 will be less than $2700. If the mean balance of a sample of 25 actually was less than $2700, we would consider this to be an unusual event.
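Both parts of Exercise 7 can be checked with a few lines of code. This is a sketch using the exercise's stated mean and standard deviation; since it does not round z to two decimals, the values may differ from the slides' 0.3372 and 0.0174 in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma = 3173, 1120

# Part a: one randomly selected undergraduate
p_single = NormalDist(mu, sigma).cdf(2700)

# Part b: mean balance of a random sample of 25 undergraduates
n = 25
p_mean = NormalDist(mu, sigma / math.sqrt(n)).cdf(2700)

print("P(x < 2700)     =", p_single)
print("P(x-bar < 2700) =", p_mean)
```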

43

Population Variability vs. Standard Error

Variability – The spread of the sampling distribution indicates the variability of the statistic

Example 1: Americans’ incomes are quite widely distributed, from $0 to Bill Gates’

Large population variability → the sample statistic will be quite variable (large standard error)

44

Population Variability vs. Standard Error

Variability – The spread of the sampling distribution indicates the variability of the statistic

Example 2: Americans’ car values are less widely distributed, from about $500 to about $50K

Smaller population variability → the sample statistic will be less variable (smaller standard error)

45

Section 7.2 Summary

• Found sampling distributions and verified their properties

• Interpreted the Central Limit Theorem

• Applied the Central Limit Theorem to find the probability of a sample mean

46

Section 7.3 Objectives

• Compute the mean and standard deviation for the sample proportion

• Use the normal approximation to compute probabilities for proportions

47

Sampling Distribution for the Proportion

48

Sampling Distribution for the Proportion

• The standard error for the $\hat{p}$ distribution is the standard deviation of the $\hat{p}$ sampling distribution

• We consider the sampling distribution for $\hat{p} = r/n$, where r is the number of successes in the binomial distribution

• The $\hat{p}$ distribution is discrete, while the normal variable x is continuous

• To adjust for this, we will need to apply an appropriate continuity correction

49

Sampling Distribution for the Proportion

50

Exercise 1: Sampling distribution of $\hat{p}$

The annual crime rate in the Capital Hill neighborhood of Denver is 111 victims per 1000 residents. This means that 111 out of 1000 residents have been the victim of at least one crime. These crimes range from relatively minor crimes (stolen hubcaps or purse snatching) to major crimes (murder). The Arms is an apartment building in Capital Hill. It has 50 year-round residents. Suppose we view each of the n residents as a binomial trial. The random variable r (which takes on values 0, 1, 2, . . . , 50) represents the number of victims of at least one crime in the next year.

51

Exercise 1: Sampling distribution of $\hat{p}$

a) What is the population probability p that a resident in the Capital Hill neighborhood will be the victim of a crime next year? What is the probability q that a resident will not be a victim?

Solution:

Probability = relative frequency, so p = 111/1000 = 0.111 and q = 1 – p = 0.889

52

Exercise 1: Sampling distribution of $\hat{p}$

b) Can we approximate the $\hat{p}$ distribution with a normal distribution? Explain.

Solution:

Consider the random variable $\hat{p} = r/n$.

np = 50(0.111) = 5.55

nq = 50(0.889) = 44.45

Since both np and nq are greater than 5, we can approximate the $\hat{p}$ distribution with a normal distribution.

53

Exercise 1: Sampling distribution of $\hat{p}$

[Figure: Sampling Distribution for p-hat (n=50, p=0.111). Horizontal axis: phat, from 0 to 3/10 in steps of 1/50. Vertical axis: P(phat), from 0.000 to 0.200.]

54

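Exercise 1: Sampling distribution of $\hat{p}$

c) What are the mean and standard deviation for the $\hat{p}$ distribution?

Solution:

$\mu_{\hat{p}} = p = 0.111$

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.111)(0.889)}{50}} \approx 0.044$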

55

Exercise 1: Sampling distribution of $\hat{p}$

d) What is the probability that between 10% and 20% of the Arms residents will be victims of a crime next year? Interpret the results.

Continuity correction: $\frac{0.5}{n} = \frac{0.5}{50} = 0.01$

56

Exercise 1: Sampling distribution of $\hat{p}$

Interpretive Statement: There is about a 67% chance that between 10% and 20% of the Arms residents will be crime victims next year.
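The steps of this exercise (normality check, mean and standard deviation of the sample proportion, continuity correction, probability) can be strung together in one short script. This is a sketch under the exercise's values n = 50 and p = 0.111; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 50, 0.111
q = 1 - p

# Part b: the normal approximation is considered adequate when np > 5 and nq > 5.
normal_ok = n * p > 5 and n * q > 5

# Part c: mean and standard deviation (standard error) of the sample proportion.
mu_phat = p
sigma_phat = math.sqrt(p * q / n)

# Part d: widen the interval [0.10, 0.20] by the continuity correction 0.5 / n.
cc = 0.5 / n
approx = NormalDist(mu_phat, sigma_phat)
prob = approx.cdf(0.20 + cc) - approx.cdf(0.10 - cc)

print(normal_ok, round(sigma_phat, 4), cc, prob)
```

The computed probability should land near the 67% quoted in the interpretive statement.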

57

Exercise 2: Sampling distribution of $\hat{p}$

Consider tossing a fair coin 5 times. Calculate the proportion $\hat{p}$ of the 5 tosses that result in heads. Calculate the sampling distribution of $\hat{p}$.

a) Compute the possible values of $\hat{p}$.

r    $\hat{p}$ = r/n
0    0
1    1/5
2    2/5
3    3/5
4    4/5
5    1

58

Exercise 2: Sampling distribution of $\hat{p}$

b) Compute the probability of each possible value of $\hat{p}$.

r    $\hat{p}$    P($\hat{p}$) = binompdf(5,0.5,r)
0    0      0.031
1    1/5    0.156
2    2/5    0.313
3    3/5    0.313
4    4/5    0.156
5    1      0.031
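The binompdf values in the table come from the binomial probability formula, which can be evaluated exactly. The sketch below is an illustration using Python's `math.comb` and exact fractions; it is an equivalent calculation, not the calculator command itself.

```python
from fractions import Fraction
from math import comb

n = 5
p = Fraction(1, 2)  # fair coin

# Map each possible proportion r/n to its binomial probability C(n, r) p^r (1-p)^(n-r).
dist = {
    Fraction(r, n): comb(n, r) * p**r * (1 - p) ** (n - r)
    for r in range(n + 1)
}

for p_hat, prob in dist.items():
    print(p_hat, prob, float(prob))
```

The exact values are 1/32, 5/32, and 10/32, which the slide's table rounds to three decimals.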

59

Exercise 2: Sampling distribution of $\hat{p}$

c) Graph the sampling distribution.

[Figure: Sampling Distribution for p-hat (n=5, p=0.5). Horizontal axis: p-hat values 0, 1/5, 2/5, 3/5, 4/5, 1. Vertical axis: probability, from 0.000 to 0.350.]

60

Exercise 3: Sampling distribution of $\hat{p}$

According to a study by the U.S. Department of Transportation, 44% of college students drive while distracted. Professor Baker surveyed 244 students at her college and 36% of them admitted to driving while distracted in the past week.

Do these results seem reasonable? Compute the probability that in a sample of 244 students, 36% or less have engaged in distracted driving.

61

Solution: Sampling distribution of $\hat{p}$

n = 244, p = 0.44, q = 0.56

np = 244(0.44) = 107.36

nq = 244(0.56) = 136.64

Since both np and nq are greater than 5, we can approximate the $\hat{p}$ distribution with a normal distribution.

62

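Solution: Sampling distribution of $\hat{p}$

Continuity correction: $\frac{0.5}{n} = \frac{0.5}{244} \approx 0.002$

$\mu_{\hat{p}} = p = 0.44$, $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.44)(0.56)}{244}} \approx 0.032$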

63

Solution: Sampling distribution of $\hat{p}$

Interpretive Statement: Theoretically, there is only about a 0.73% chance that in a sample of 244 students, 36% or less will have engaged in distracted driving. It does not appear that Professor Baker’s students are being very honest in response to this survey!
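The probability quoted above can be reproduced approximately in code. This sketch assumes the exercise's values (n = 244, p = 0.44) and the continuity correction 0.5/n; it carries full precision through the calculation, so its result may differ slightly from the slide's 0.73%, presumably because the slide rounds intermediate values.

```python
import math
from statistics import NormalDist

n, p = 244, 0.44
q = 1 - p

sigma_phat = math.sqrt(p * q / n)   # standard error of the sample proportion
cc = 0.5 / n                        # continuity correction, about 0.002

# P(p-hat <= 0.36): extend the right endpoint by the continuity correction.
prob = NormalDist(p, sigma_phat).cdf(0.36 + cc)

print(round(sigma_phat, 4), round(cc, 4), prob)
```

Either way the probability is well under 1%, which is what supports the slide's conclusion that the survey result is surprising.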

Bias in Sampling

Definition: A sample statistic is unbiased if the mean of its sampling distribution equals the value of the parameter being estimated.

• The sample mean $\bar{x}$ is an unbiased estimator of the population mean µ (this holds for any sample size n)

• The sample proportion $\hat{p}$ is an unbiased estimator of the population proportion of successes p in binomial experiments (this holds for any number of trials n)

• Sample standard deviation is a biased estimator of population standard deviation (bias is introduced by the non-linear square root function).

Bias in Sampling

Definition: A sample statistic is unbiased if the mean of its sampling distribution equals the value of the parameter being estimated.

• The sample variance $s^2$ is an unbiased estimator of population variance $\sigma^2$

• Sample standard deviation s is a biased estimator of population standard deviation σ (bias is introduced by the non-linear square root function).
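The claims about bias can be verified exactly on a small case. The sketch below reuses the population {1, 3, 5, 7} and all 16 samples of size 2 from the earlier exercise (that reuse is a choice made for this illustration) and averages the sample variance and sample standard deviation over every possible sample.

```python
import math
import statistics
from itertools import product

population = [1, 3, 5, 7]
sigma_sq = statistics.pvariance(population)   # population variance (divides by N)
sigma = math.sqrt(sigma_sq)                   # population standard deviation

# Every equally likely sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Averages over the whole sampling distribution (sample statistics divide by n - 1).
mean_s2 = statistics.fmean(statistics.variance(s) for s in samples)
mean_s = statistics.fmean(statistics.stdev(s) for s in samples)

print("sigma^2 =", sigma_sq, " mean of s^2 =", mean_s2)
print("sigma   =", sigma, " mean of s   =", mean_s)
```

The average of the sample variances matches the population variance, while the average of the sample standard deviations falls below the population standard deviation, which is the bias the slide describes.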

https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/sample-standard-deviation-and-bias

Variability of Distribution

• Spread of sampling distribution is an indication of the variability of the statistic

• Spread is affected by sample size

The variability of $\bar{x}$ decreases as sample size increases

The variability of $\hat{p}$ decreases as sample size increases

Population Parameter as a Target

Both bias and variability describe what happens when we take many shots at the target.

Bias means that our aim is off and we consistently miss the bulls-eye in the same direction.

Our sample values do not center on the population value.

Population Parameter as a Target

High variability means that repeated shots are widely scattered on the target.

Repeated samples do not give very similar results.

For best results, choose a sample statistic with

• Low bias

• Minimum variability

Spread: Low Variability is Better!

Larger samples are more likely to produce an estimate close to the true value of the parameter.

Sample size: larger n → smaller standard error

71

Section 7.3 Summary

• Computed the mean and standard deviation for the sample proportion

• Used normal approximation to compute probabilities for proportions

• Discussed the concepts of statistical bias and variability
