
Upload: robert-porter

Post on 30-Dec-2015


1

Sampling Distributions

Lecture 9

2

Background

We want to learn about a feature of a population (a parameter). In many situations it is impossible to examine all elements of a population, because the elements are physically inaccessible, examining them is too costly, or the examination may destroy the item.

A sample is a relatively small subset of the total population. We study a random sample to draw conclusions about the population; this is where statistics come into the picture. Statistics such as the sample mean and sample variance are computed from sample measurements and vary from sample to sample. Therefore, they are random variables.

The probability distribution of a statistic is called a sampling distribution.

3

Sampling Distributions

- Sampling Distribution of the Mean
- Sampling Distribution of the Proportion

A sampling distribution is the distribution of all possible values of a statistic for samples of a given size selected from a population.

4

Developing a Sampling Distribution

Assume there is a population …

Population size N = 4 (individuals A, B, C, D)

Random variable, X, is the age of individuals

Values of X: 18, 20, 22, 24 (years)

5

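Developing a Sampling Distribution (continued)

[Figure: population distribution of X, with P(x) = 0.25 at each of x = 18, 20, 22, 24 (individuals A, B, C, D)]

Summary Measures for the Population Distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$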
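As a quick check of these summary measures, here is a minimal Python sketch using the standard `statistics` module. Note that `pstdev` divides by N, matching the population formula above; `stdev` would divide by N − 1 and give a different number.

```python
import statistics

# Ages of individuals A, B, C, D (the whole population, N = 4)
ages = [18, 20, 22, 24]

mu = statistics.fmean(ages)      # population mean: sum / N
sigma = statistics.pstdev(ages)  # population SD: divides by N, not N - 1

print(mu, round(sigma, 3))  # 21.0 2.236
```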

6

Sampling with replacement (all 16 ordered samples of size n = 2)

Samples | Ages | Sample mean
--- | --- | ---
A, A | 18, 18 | 18
A, B | 18, 20 | 19
A, C | 18, 22 | 20
A, D | 18, 24 | 21
B, A | 20, 18 | 19
B, B | 20, 20 | 20
B, C | 20, 22 | 21
B, D | 20, 24 | 22
C, A | 22, 18 | 20
C, B | 22, 20 | 21
C, C | 22, 22 | 22
C, D | 22, 24 | 23
D, A | 24, 18 | 21
D, B | 24, 20 | 22
D, C | 24, 22 | 23
D, D | 24, 24 | 24

7

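Developing a Sampling Distribution (continued)

16 Sample Means (rows: 1st observation; columns: 2nd observation)

1st \ 2nd Obs | 18 | 20 | 22 | 24
--- | --- | --- | --- | ---
18 | 18 | 19 | 20 | 21
20 | 19 | 20 | 21 | 22
22 | 20 | 21 | 22 | 23
24 | 21 | 22 | 23 | 24

[Figure: Sampling Distribution of All Sample Means, P(X̄) against X̄ = 18, 19, 20, 21, 22, 23, 24, with probabilities 1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16]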
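The table of 16 sample means and its probability distribution can be generated by brute force. A minimal sketch, assuming ordered samples drawn with replacement (so (A, B) and (B, A) count separately, as in the table):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ages = [18, 20, 22, 24]
n = 2

# All ordered samples of size n drawn with replacement: 4**2 = 16 samples
samples = list(product(ages, repeat=n))
means = [sum(s) / n for s in samples]

# Tally each distinct sample mean and turn the counts into probabilities
counts = Counter(means)
dist = {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}

for m, p in dist.items():
    print(m, p)
```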

8

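Developing a Sampling Distribution (continued)

Summary Measures of this Sampling Distribution (note that N = 16 for the population of sample means):

$$\mu_{\bar X} = \frac{\sum \bar X_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar X} = \sqrt{\frac{\sum (\bar X_i - \mu_{\bar X})^2}{N}} = \sqrt{\frac{(18-21)^2 + (19-21)^2 + \cdots + (24-21)^2}{16}} = 1.58$$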
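The two summary measures above can be reproduced by treating the 16 sample means as a population of size N = 16. A minimal sketch:

```python
import math
from itertools import product

ages = [18, 20, 22, 24]
n = 2

# The "population" of 16 sample means (sampling with replacement)
means = [sum(s) / n for s in product(ages, repeat=n)]
N = len(means)

mu_xbar = sum(means) / N
# Divide by N: the 16 means are the entire population of sample means
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / N)

print(mu_xbar, round(sigma_xbar, 2))  # 21.0 1.58
```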

9

Comparing the Population with its Sampling Distribution (with replacement)

[Figure: left, the population distribution (N = 4), P(X) = 0.25 at X = 18, 20, 22, 24, with μ = 21 and σ = 2.236; right, the distribution of sample means (n = 2) over X̄ = 18, 19, …, 24, with $\mu_{\bar X} = 21$ and $\sigma_{\bar X} = 1.58$]

10

Mean and Standard Error of the Sample Mean (sampling with replacement)

The mean of the distribution of the sample mean equals the population mean:

$$\mu_{\bar X} = \mu$$

A measure of the variability in the mean from sample to sample is given by the standard error of the mean (this assumes that sampling is with replacement, or without replacement from an infinite population):

$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$$

Note that the standard error of the mean decreases as the sample size increases. For the age example, σ/√n = 2.236/√2 = 1.58, the same value obtained from the 16 sample means.
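One way to see that σ/√n is not a coincidence of n = 2 is to enumerate every possible with-replacement sample for several sample sizes and compare the standard deviation of all the sample means with the formula. This sketch reuses the four ages from the running example; the range n = 1..4 is an arbitrary choice.

```python
import math
import statistics
from itertools import product

ages = [18, 20, 22, 24]
sigma = statistics.pstdev(ages)  # population SD, sqrt(5), about 2.236

results = []
for n in range(1, 5):
    # Every ordered sample of size n drawn with replacement: 4**n samples
    means = [sum(s) / n for s in product(ages, repeat=n)]
    brute = statistics.pstdev(means)  # SD over all possible sample means
    formula = sigma / math.sqrt(n)    # standard error sigma / sqrt(n)
    results.append((n, brute, formula))
    print(n, round(brute, 4), round(formula, 4))
```

The two columns agree for every n, and both shrink as n grows.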

11

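If the Population is Normal

If a population is normal with mean μ and standard deviation σ, the sampling distribution of X̄ is also normally distributed with

$$\mu_{\bar X} = \mu \quad\text{and}\quad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}, \qquad \text{where } \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Or, equivalently, the sampling distribution of ΣXᵢ is normally distributed with

$$\mu_{\sum X_i} = n\mu \quad\text{and}\quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma$$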

12

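Sampling Distribution Properties (continued)

As n increases, $\sigma_{\bar X}$ decreases.

[Figure: sampling distributions of X̄, both centered at μ, for a smaller and a larger sample size; the larger sample size gives the narrower curve]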

13

If the Population is not Normal

The central limit theorem states that when the number of observations in each sample (called the sample size) gets large enough, the sampling distribution of X̄ is approximately normally distributed with

$$\mu_{\bar X} = \mu \quad\text{and}\quad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$$

Or, equivalently, the sampling distribution of ΣXᵢ is also approximately normally distributed with

$$\mu_{\sum X_i} = n\mu \quad\text{and}\quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma$$
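A small simulation can illustrate the central limit theorem for a population that is clearly not normal. This is a sketch under assumptions not in the slides: the population is taken to be exponential with rate 1 (so μ = σ = 1), and the sample size n = 40 and the number of simulated samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Assumed non-normal population: exponential with rate 1, so mu = sigma = 1
mu, sigma = 1.0, 1.0
n = 40          # observations per sample (n > 30)
reps = 20_000   # number of simulated samples

# One sample mean per simulated sample of size n
xbars = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

se = sigma / math.sqrt(n)  # CLT standard error, about 0.158

# Share of sample means within 1.96 standard errors of mu;
# close to 0.95 if the sampling distribution is nearly normal
coverage = sum(abs(x - mu) < 1.96 * se for x in xbars) / reps

print(round(statistics.fmean(xbars), 3), round(statistics.stdev(xbars), 3),
      round(se, 3), round(coverage, 3))
```

The simulated means center on μ, their spread matches σ/√n, and about 95% fall within 1.96 standard errors, even though individual observations are strongly skewed.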

14

Z Value for Means

Standardize the sample mean:

$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}}$$

15

Visualizing the Central Limit Theorem

[Figure: a non-normal population distribution and the sampling distribution of X̄, which becomes normal as n increases; the larger sample size gives a narrower curve around μ]

Sampling distribution properties:

Central tendency: $\mu_{\bar X} = \mu$

Variation: $\sigma_{\bar X} = \sigma/\sqrt{n}$

16

How Large is Large Enough?

For most distributions, n > 30 will give a sampling distribution that is nearly normal

For fairly symmetric distributions, n > 15

Recall that, for normal population distributions, the sampling distribution of the mean is always normally distributed regardless of sample size n

17

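Calculating Probabilities

Suppose we want to find P(a < X̄ < b).

If the population is normal, then regardless of the value of n:

$$P(a < \bar X < b) = P\left(\frac{a-\mu}{\sigma/\sqrt{n}} < Z < \frac{b-\mu}{\sigma/\sqrt{n}}\right)$$

If the population is not normal, then, when n is large enough (n > 30):

$$P(a < \bar X < b) \approx P\left(\frac{a-\mu}{\sigma/\sqrt{n}} < Z < \frac{b-\mu}{\sigma/\sqrt{n}}\right)$$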
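The rule above can be wrapped in a small helper. This is a minimal sketch: the function name `prob_mean_between` is hypothetical, and it uses `statistics.NormalDist` for the standard normal CDF in place of a Z table.

```python
import math
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """Return P(a < X-bar < b) for samples of size n.

    Exact when the population is normal; a CLT approximation otherwise
    (reasonable for n > 30).
    """
    se = sigma / math.sqrt(n)  # standard error of the mean
    z = NormalDist()           # standard normal distribution
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)

# Hypothetical values: mu = 50, sigma = 10, n = 25, so se = 2
print(round(prob_mean_between(48, 52, 50, 10, 25), 4))  # 0.6827
```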

18

Example

Suppose a population has mean μ = 10 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.

What is the probability that the sample mean is between 9.7 and 10.3?

19

Example (continued)

Solution:

Even if the population is not normally distributed, the central limit theorem can be used (n > 30)

… so the sampling distribution of X̄ is approximately normal

… with mean $\mu_{\bar X} = 10$

… and standard deviation

$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$

20

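Example (continued)

Solution (continued):

$$P(9.7 < \bar X < 10.3) = P\left(\frac{9.7 - 10}{3/\sqrt{36}} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < \frac{10.3 - 10}{3/\sqrt{36}}\right) = P(-0.6 < Z < 0.6) = 0.4514$$

[Figure: the population distribution and the sampling distribution of X̄, both centered at μ = 10, with the region between 9.7 and 10.3 marked on the sampling distribution]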
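The same calculation in Python, as a sketch that replaces the Z-table lookup with `statistics.NormalDist`. Because it uses the exact normal CDF rather than a four-digit table, it prints 0.4515, which differs from the table answer 0.4514 only by rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10, 3, 36
se = sigma / math.sqrt(n)      # 3 / 6 = 0.5

z_lo = (9.7 - mu) / se         # about -0.6
z_hi = (10.3 - mu) / se        # about 0.6
Z = NormalDist()               # standard normal
p = Z.cdf(z_hi) - Z.cdf(z_lo)  # P(-0.6 < Z < 0.6)

print(se, round(z_lo, 2), round(z_hi, 2), round(p, 4))
```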

21

One more example

Time spent using e-mail per session is normally distributed with μ = 8 minutes and σ = 2 minutes.

1. If a random sample of 25 sessions were selected, what proportion of sample means would be between 7.8 and 8.2 minutes?

22

Example (Cont’d)

2. If a random sample of 100 sessions were selected, what proportion of sample means would be between 7.8 and 8.2 minutes?

3. What sample size would you suggest if it is desired to have at least 0.90 probability that the sample mean is within 0.2 of the population mean?
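The three questions can be worked with the same approach. A sketch, assuming (as stated) that session times are normal, so the results are exact rather than CLT approximations; part 3 solves 0.2/(σ/√n) ≥ z₀.₉₅ for n and rounds up. The helper `prob_within` is a hypothetical name.

```python
import math
from statistics import NormalDist

Z = NormalDist()
mu, sigma = 8, 2

def prob_within(n, half_width=0.2):
    """P(mu - half_width < X-bar < mu + half_width) for samples of size n."""
    se = sigma / math.sqrt(n)
    z = half_width / se
    return Z.cdf(z) - Z.cdf(-z)

p25 = prob_within(25)    # se = 0.4, so P(-0.5 < Z < 0.5)
p100 = prob_within(100)  # se = 0.2, so P(-1 < Z < 1)

# Part 3: smallest n with P(|X-bar - mu| < 0.2) >= 0.90.
# Need 0.2 / (sigma / sqrt(n)) >= z_0.95, i.e. n >= (z_0.95 * sigma / 0.2) ** 2
z95 = Z.inv_cdf(0.95)    # about 1.645
n_needed = math.ceil((z95 * sigma / 0.2) ** 2)

print(round(p25, 4), round(p100, 4), n_needed)
```

Quadrupling the sample size from 25 to 100 halves the standard error, which is why the probability rises so sharply between parts 1 and 2.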

23

Sampling Distribution of the Proportion


24

Population Proportions

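In Bernoulli trials, let

π = the proportion of successes

Recall that Y = the number of successes in n Bernoulli trials follows Bin(n, π).

For the ith Bernoulli trial, define

$$X_i = \begin{cases} 1 & \text{if the } i\text{th outcome is a "success"} \\ 0 & \text{if the } i\text{th outcome is a "failure"} \end{cases}$$

Then it follows directly that

$$E(X_i) = \pi \quad\text{and}\quad \operatorname{Var}(X_i) = \pi(1-\pi)$$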

25

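Population Proportions (Cont’d)

For large n, apply the CLT to the sample mean and the sum:

$$p = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \ \text{ is approximately distributed as } N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$$

Or

$$Y = \sum_{i=1}^{n} X_i \ \text{ is approximately distributed as } N\big(n\pi,\ n\pi(1-\pi)\big)$$

How large is large? The approximation is adequate when

$$n\pi \ge 5 \quad\text{and}\quad n(1-\pi) \ge 5$$

or, in terms of the sample proportion,

$$np \ge 5 \quad\text{and}\quad n(1-p) \ge 5$$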
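A sketch that checks the rule of thumb and confirms that the exact binomial mean and variance match nπ and nπ(1 − π). The function names are hypothetical, and n = 200, π = 0.4 are illustrative values.

```python
import math

def normal_approx_ok(n, pi):
    """Rule of thumb: both n*pi and n*(1 - pi) at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5

def binomial_moments(n, pi):
    """Exact mean and variance of Y ~ Bin(n, pi), computed from the pmf."""
    pmf = [math.comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

n, pi = 200, 0.4  # illustrative values
mean_y, var_y = binomial_moments(n, pi)
print(normal_approx_ok(n, pi), round(mean_y, 6), round(var_y, 6))

# The sample proportion p = Y / n then has mean pi and variance pi(1 - pi)/n
print(round(mean_y / n, 6), round(var_y / n**2, 6))
```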

26

Z-Value for Proportions

Standardize p to a Z value with the formula:

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$$

27

Example

If the true proportion of voters who support Proposition A is π = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?

i.e.: if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

28

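Example (continued)

Find $\sigma_p$:

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.4(1-0.4)}{200}} = 0.03464$$

Convert to standard normal:

$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40-0.40}{0.03464} \le Z \le \frac{0.45-0.40}{0.03464}\right) = P(0 \le Z \le 1.44) = 0.4251$$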
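The voter example as a Python sketch, using `statistics.NormalDist` instead of a table. With the unrounded z (about 1.443) the probability is roughly 0.4255; the slide's 0.4251 comes from rounding z to 1.44 before looking it up. `z_for_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

pi, n = 0.4, 200
sigma_p = math.sqrt(pi * (1 - pi) / n)  # standard error of p, about 0.03464

def z_for_proportion(p):
    """Standardize a sample proportion p: Z = (p - pi) / sigma_p."""
    return (p - pi) / sigma_p

Z = NormalDist()
prob = Z.cdf(z_for_proportion(0.45)) - Z.cdf(z_for_proportion(0.40))
print(round(sigma_p, 5), round(z_for_proportion(0.45), 2), round(prob, 4))
```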

29

Review Example

The number of claims received by an automobile insurance company on collision insurance on one day follows the probability distribution below.

x | 0 | 1 | 2 | 3 | 4
--- | --- | --- | --- | --- | ---
p(x) | 0.65 | 0.2 | 0.1 | 0.03 | 0.02

with

$$\mu = \sum_{x=0}^{4} x\,p(x) = 0.57 \qquad \sigma = \sqrt{\sum_{x=0}^{4} (x-\mu)^2\,p(x)} = 0.93$$

Suppose the numbers of claims received are independent from day to day.
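The mean and standard deviation of the claims distribution can be checked directly from the table; a minimal sketch:

```python
import math

# Probability distribution of daily collision claims (from the table)
pmf = {0: 0.65, 1: 0.20, 2: 0.10, 3: 0.03, 4: 0.02}

mu = sum(x * p for x, p in pmf.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

print(round(mu, 2), round(sigma, 2))  # 0.57 0.93
```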

30

Review Example (cont’d)

For a 50-day period, find the probability of the following events:

1) The total number of claims exceeds 20.

2) On more than 20 days, at least one claim is received.
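One possible way to work both parts, sketched under stated assumptions: the CLT normal approximation is used with no continuity correction (the slides do not use one; using 20.5 instead of 20 would give somewhat smaller probabilities), and part 2 treats "at least one claim on a day" as a Bernoulli trial with π = 1 − p(0) = 0.35, so the number of such days is Bin(50, 0.35).

```python
import math
from statistics import NormalDist

pmf = {0: 0.65, 1: 0.20, 2: 0.10, 3: 0.03, 4: 0.02}
days = 50
Z = NormalDist()

# 1) Total claims T = X_1 + ... + X_50 is approx N(n*mu, n*sigma^2) by the CLT
mu = sum(x * p for x, p in pmf.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
z_total = (20 - days * mu) / (math.sqrt(days) * sigma)
p_total = 1 - Z.cdf(z_total)  # P(T > 20), no continuity correction

# 2) Days with at least one claim: Y ~ Bin(50, pi) with pi = 1 - p(0) = 0.35,
#    approximated by N(n*pi, n*pi*(1 - pi)); n*pi and n*(1 - pi) both exceed 5
pi = 1 - pmf[0]
z_days = (20 - days * pi) / math.sqrt(days * pi * (1 - pi))
p_days = 1 - Z.cdf(z_days)    # P(Y > 20), no continuity correction

print(round(z_total, 2), round(p_total, 4))
print(round(z_days, 2), round(p_days, 4))
```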

31

Sampling Distribution of the Difference of Two Independent Populations

An important estimation problem involves comparing the means (or proportions) of two populations. For example, you may want to make comparisons like these:

- The average GRE scores of students who majored in mathematics versus chemistry
- The average income of male and female college graduates
- The proportion of patients receiving different medications who recovered from a certain disease

32

Sampling Distribution of the Difference of Two Independent Sample Means

Suppose there are two populations:

Population | Mean | S.d.
--- | --- | ---
I | μ₁ | σ₁
II | μ₂ | σ₂

Independent random samples of n₁ and n₂ observations have been selected from the two populations, with sample means X̄₁ and X̄₂, respectively.

Recall that when n₁ and n₂ are large, X̄₁ and X̄₂ are approximately normally distributed with

$$E(\bar X_1) = \mu_1,\quad \sigma_{\bar X_1} = \frac{\sigma_1}{\sqrt{n_1}} \qquad\text{and}\qquad E(\bar X_2) = \mu_2,\quad \sigma_{\bar X_2} = \frac{\sigma_2}{\sqrt{n_2}}$$

33

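Since the two samples are independent,

$$\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2 \qquad \sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Standardize:

$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$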

34

Example

A light bulb factory operates two different types of machines. The mean life expectancy is 385 hours from machine I and 365 hours from machine II. The process standard deviation of life expectancy of machine I is 110 hours and of machine II is 120 hours.

What is the probability that the average life expectancy of a random sample of 100 light bulbs from Machine I is shorter than the average life expectancy of 100 light bulbs from Machine II?

35

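Example (Cont’d)

Note that μ₁ = 385, μ₂ = 365, σ₁ = 110, σ₂ = 120, and n₁ = n₂ = 100.

Therefore

$$P(\bar X_1 - \bar X_2 < 0) = P\left(Z < \frac{0 - (385 - 365)}{\sqrt{\frac{110^2}{100} + \frac{120^2}{100}}}\right) = P\left(Z < \frac{-20}{16.28}\right) = P(Z < -1.23) = 0.1093$$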
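The light bulb calculation as a Python sketch; with the unrounded z (about −1.229) the result is about 0.1096, versus 0.1093 from the table with z rounded to −1.23.

```python
import math
from statistics import NormalDist

mu1, sigma1, n1 = 385, 110, 100  # machine I
mu2, sigma2, n2 = 365, 120, 100  # machine II

# Sampling distribution of X1bar - X2bar for independent samples
mean_diff = mu1 - mu2                                 # 20
sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # about 16.28

# P(X1bar < X2bar) = P(X1bar - X2bar < 0)
z = (0 - mean_diff) / sd_diff
prob = NormalDist().cdf(z)
print(round(sd_diff, 2), round(z, 2), round(prob, 4))
```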

36

Sampling Distribution of the Difference of Two Independent Sample Proportions

Assume that independent random samples of n₁ and n₂ observations have been selected from binomial populations with parameters π₁ and π₂, respectively.

The sampling distribution of the difference in sample proportions (p₁ − p₂) can be approximated by a normal distribution with mean and standard deviation

$$\mu_{p_1 - p_2} = \pi_1 - \pi_2 \qquad \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$

The Z statistic is

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}}$$

37

Example

From a study by the Charles Schwab Corporation, 74% of African Americans and 84% of Whites with an annual income above $50,000 owned stocks.

For a random sample of 500 African Americans and a random sample of 500 Whites with income above $50,000, what is the probability that the sample proportion of Whites who own stocks is higher than that of African Americans?

38

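Example (Cont’d)

Summary data (population 1: African Americans; population 2: Whites):

π₁ = 0.74, π₂ = 0.84, n₁ = n₂ = 500

It follows that

$$P(p_2 - p_1 > 0) = P\left(Z > \frac{0 - (0.84 - 0.74)}{\sqrt{\frac{0.84 \times 0.16}{500} + \frac{0.74 \times 0.26}{500}}}\right) = P(Z > -3.91) = 0.99995$$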
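The stock-ownership calculation as a sketch, labeling African Americans as population 1 and Whites as population 2, as in the summary data.

```python
import math
from statistics import NormalDist

pi1, n1 = 0.74, 500  # African Americans
pi2, n2 = 0.84, 500  # Whites

# Standard deviation of the sampling distribution of p2 - p1
sd_diff = math.sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)

# P(p2 - p1 > 0), standardizing with mean pi2 - pi1
z = (0 - (pi2 - pi1)) / sd_diff
prob = 1 - NormalDist().cdf(z)
print(round(z, 2), round(prob, 5))
```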

39

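Important Summary of Sampling Distributions

Param. | Point estimate | Sampling distribution | Standardized Z
--- | --- | --- | ---
$\mu$ | $\bar X$ | $N\left(\mu, \frac{\sigma^2}{n}\right)$ | $Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}}$
$\pi$ | $p$ | $N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$ | $Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$
$\mu_1 - \mu_2$ | $\bar X_1 - \bar X_2$ | $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ | $Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
$\pi_1 - \pi_2$ | $p_1 - p_2$ | $N\left(\pi_1 - \pi_2, \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right)$ | $Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_2}}$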

40

Sampling Methods

- Simple random samples
- Stratified samples

41

Simple Random Samples

Every individual or item from the frame has an equal chance of being selected

Selection may be with replacement or without replacement

Samples obtained from table of random numbers or computer random number generators

Simple to use, but may not be a good representation of the population’s underlying characteristics
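In Python's standard library, `random.sample` draws without replacement and `random.choices` draws with replacement. A sketch with a hypothetical frame of 100 numbered items:

```python
import random

random.seed(42)  # reproducible illustration

frame = list(range(1, 101))  # hypothetical frame: items numbered 1..100

# Without replacement: each item can be selected at most once
srs_without = random.sample(frame, k=10)

# With replacement: the same item may be selected more than once
srs_with = random.choices(frame, k=10)

print(srs_without)
print(srs_with)
```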

42

Stratified Samples

Divide the population into two or more subgroups (called strata) according to some common characteristic

A simple random sample is selected from each subgroup, with sample sizes proportional to the strata sizes

Samples from the subgroups are combined into one

Ensures representation of individuals across the entire population

[Figure: a population divided into 4 strata, with a sample drawn from each stratum]
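A sketch of proportional allocation, assuming a hypothetical population split into four strata of sizes 40, 30, 20 and 10; the function name `stratified_sample` is also hypothetical. Each stratum's sample size is rounded, so for awkward sizes the total can differ from the target by one or two; real survey designs handle that remainder more carefully.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional-allocation stratified sample.

    strata: dict mapping stratum name -> list of members.
    Each stratum contributes round(total_n * size / population_size)
    members, drawn by simple random sampling without replacement.
    """
    rng = random.Random(seed)
    pop_size = sum(len(m) for m in strata.values())
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / pop_size)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical strata of sizes 40, 30, 20, 10 (population of 100)
strata = {
    "A": [f"A{i}" for i in range(40)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(20)],
    "D": [f"D{i}" for i in range(10)],
}
s = stratified_sample(strata, total_n=10, seed=7)
print(s)
```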

43

Types of Survey Errors

- Coverage error: excluded from frame
- Non response error: follow up on nonresponses
- Sampling error: random differences from sample to sample
- Measurement error: bad or leading question