ch7. sampling and sampling distributions - sogangocw.sogang.ac.kr/rfile/2014/business...

CH7. Sampling and Sampling Distributions

Sampling variation

• (Sample) statistic – a random variable whose value depends on which population items happen to be included in the random sample (observations are independent and identically distributed).

• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.

• Ex) Consider eight random samples of size n = 5 from a large population of

GMAT scores for MBA applicants.
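The sampling variation in this example can be sketched with a short simulation. The population below is hypothetical (normal scores with mean 520 and standard deviation 86 are assumed purely for illustration, since the actual GMAT data are not given here); the point is only that eight samples of size 5 give eight different sample means.

```python
import random
import statistics

random.seed(7)

# Hypothetical population: 10,000 GMAT-like scores (assumed normal, mean 520, sd 86).
population = [random.gauss(520, 86) for _ in range(10_000)]

# Eight random samples of size n = 5, as in the example above.
sample_means = [statistics.mean(random.sample(population, 5)) for _ in range(8)]

for k, m in enumerate(sample_means, start=1):
    print(f"sample {k}: mean = {m:.1f}")

print(f"population mean = {statistics.mean(population):.1f}")
print(f"range of sample means = {max(sample_means) - min(sample_means):.1f}")
```

With such small samples the eight means typically spread over tens of points around the population mean.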

Point Estimator

• Estimator – a statistic (\(\hat{\theta}\)) derived from a sample to infer the value of a population parameter (\(\theta\)).

• Estimate – the specific value of the estimator in a particular sample.

• Population parameters are represented by Greek letters and the

corresponding statistic by Roman letters.

Ex) sample mean, sample proportion, sample standard deviation

Sampling distribution

• The sampling distribution of an estimator is the probability distribution of all

possible values the statistic may assume when a random sample of size n is

taken.

• An estimator is a random variable since samples vary.

• Sampling error = \(\hat{\theta} - \theta\)

• We want an estimator that minimizes the sampling error.

Unbiased

• Bias is the difference between the expected value of the estimator and the true parameter.

• Bias = \(E(\hat{\theta}) - \theta\)

• An estimator is unbiased if \(E(\hat{\theta}) = \theta\).

• On average, an unbiased estimator neither overstates nor understates the true parameter.

Efficiency (under unbiased)

• Efficiency refers to the variance of the estimator’s sampling distribution.

• A more efficient estimator has smaller variance.

• If \(\hat{\theta}_1\) and \(\hat{\theta}_2\) are unbiased estimators with \(V(\hat{\theta}_1) < V(\hat{\theta}_2)\), then \(\hat{\theta}_1\) is a more efficient estimator than \(\hat{\theta}_2\).

Better

• If \(\hat{\theta}_1\) and \(\hat{\theta}_2\) are estimators with \(MSE(\hat{\theta}_1) < MSE(\hat{\theta}_2)\), then \(\hat{\theta}_1\) is a better estimator than \(\hat{\theta}_2\).

Consistency

• A consistent estimator converges toward the parameter being estimated as the sample size increases (as \(n \to \infty\), \(\hat{\theta} \to \theta\)).
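The properties above (bias, efficiency, MSE, consistency) can be checked numerically. The sketch below is an illustration, not part of the original notes: it assumes a normal population with mean 50 and standard deviation 10, and compares the sample mean with the sample median as estimators of the population mean. The helper name `simulate` is hypothetical.

```python
import random
import statistics

random.seed(1)
MU, SIGMA = 50.0, 10.0   # assumed population parameters (illustrative only)

def simulate(estimator, n, reps=2000):
    """Return (bias, variance, MSE) of an estimator of MU over repeated samples."""
    estimates = [estimator([random.gauss(MU, SIGMA) for _ in range(n)])
                 for _ in range(reps)]
    bias = statistics.mean(estimates) - MU
    var = statistics.pvariance(estimates)
    mse = statistics.mean([(e - MU) ** 2 for e in estimates])
    return bias, var, mse

mean_small = simulate(statistics.mean, n=10)
median_small = simulate(statistics.median, n=10)
mean_large = simulate(statistics.mean, n=100)

for label, (bias, var, mse) in [("mean, n=10", mean_small),
                                ("median, n=10", median_small),
                                ("mean, n=100", mean_large)]:
    print(f"{label:13s} bias={bias:+.3f} variance={var:.3f} MSE={mse:.3f}")
```

Under these assumptions both estimators show bias near zero, the mean has the smaller variance (it is more efficient), and its variance shrinks as n grows, which is what consistency requires.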

Central limit theorem

• Central limit theorem for a mean: If a random sample of size n is drawn from a population with mean \(\mu\) and standard deviation \(\sigma\), the distribution of the sample mean \(\bar{x}\) approaches a normal distribution with mean \(\mu\) and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) as the sample size increases.

• The central limit theorem does not require the population itself to follow a normal distribution.
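A minimal simulation sketch of the theorem, assuming a clearly non-normal population (exponential with mean 1, so \(\mu = 1\) and \(\sigma = 1\)); the population and the values of n and reps are illustrative choices, not taken from the notes.

```python
import math
import random
import statistics

random.seed(42)

# Assumed skewed population: exponential with mean 1 (so mu = 1 and sigma = 1).
MU, SIGMA = 1.0, 1.0
n, reps = 40, 5000

# Draw many samples of size n and record each sample mean.
means = [statistics.mean([random.expovariate(1.0) for _ in range(n)])
         for _ in range(reps)]

observed_mean = statistics.mean(means)
observed_sd = statistics.stdev(means)
predicted_sd = SIGMA / math.sqrt(n)

print(f"mean of sample means = {observed_mean:.3f} (mu = {MU})")
print(f"sd of sample means   = {observed_sd:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
```

The sample means center on \(\mu\) with spread close to \(\sigma/\sqrt{n}\), even though the population is skewed.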

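Sample mean (\(\bar{X}\))

• The sample mean is an unbiased estimator of \(\mu\); therefore,

\[ E(\bar{X}) = E(X) = \mu \]

• The standard deviation (standard error) of the sample mean is:

\[ \sigma_{\bar{x}} = \sqrt{V(\bar{X})} = \sigma/\sqrt{n} \]

• If \(\sigma\) is unknown, use the following:

\[ S_{\bar{x}} = S/\sqrt{n} \]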

• The sample mean is a consistent estimator.

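Note (decomposition of the MSE):

\[
\begin{aligned}
MSE(\hat{\theta}) &= E(\hat{\theta} - \theta)^2 = E[\{\hat{\theta} - E(\hat{\theta})\} + \{E(\hat{\theta}) - \theta\}]^2 \\
&= E[\{\hat{\theta} - E(\hat{\theta})\}^2 + 2\{\hat{\theta} - E(\hat{\theta})\}\{E(\hat{\theta}) - \theta\} + \{E(\hat{\theta}) - \theta\}^2] \\
&= E\{\hat{\theta} - E(\hat{\theta})\}^2 + \{E(\hat{\theta}) - \theta\}^2 = V(\hat{\theta}) + bias(\hat{\theta})^2
\end{aligned}
\]

The cross term vanishes because \(E\{\hat{\theta} - E(\hat{\theta})\} = 0\).

Sample variance (\(S^2\))

• Is \(E(S^2) = \sigma^2\)? The sample variance is

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

and the sum of squares can be rewritten around \(\mu\):

\[
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} \{(X_i - \mu) - (\bar{X} - \mu)\}^2 \\
&= \sum_{i=1}^{n} [(X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2] \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
\end{aligned}
\]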

Sampling distribution of \(\bar{x}\)

• If we use a large (n > 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of \(\bar{x}\) can be approximated by a normal distribution.

• When the simple random sample is small (n < 30), the sampling distribution of \(\bar{x}\) can be considered normal only if we assume the population has a normal distribution.

• Ex) The average price, \(\mu\), of a 5 GB MP3 player is $80.00 with a standard deviation, \(\sigma\), equal to $10.00. We have a sample of 20 players.

\[ E(\bar{X}) = E(X) = \mu = \$80.00 \]

\[ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{20} = 2.236 \]

1) If the distribution of prices for these players is a normal distribution, then the sampling distribution of \(\bar{x}\) is \(N(80.00, 2.236^2)\), i.e., normal with mean 80.00 and standard deviation 2.236.

2) Compute \(P(80 < \bar{X} < 83)\) under the normal distribution assumption.
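Part 2) can be worked out with the standard library's `statistics.NormalDist`. This is a sketch of the arithmetic using the values from the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 80.0, 10.0, 20

# Standard error of the sample mean: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# Sampling distribution of the sample mean under the normality assumption
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# P(80 < X-bar < 83) as a difference of two cumulative probabilities
prob = sampling_dist.cdf(83) - sampling_dist.cdf(80)

print(f"standard error = {standard_error:.3f}")
print(f"P(80 < X-bar < 83) = {prob:.4f}")
```

Equivalently, standardize: z = (83 − 80)/2.236 ≈ 1.34, and the area between 0 and 1.34 under the standard normal curve is about 0.41.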

Sample proportion (p)

• X ~ Binomial(n, π), where X is the number of successes

• X = Y1 + Y2 + … + Yn, Yi ~ Bernoulli(π)

• p = X/n (a proportion is the mean of data whose only values are 0 and 1).

• E(p) = π and V(p) = π(1 − π)/n

Sampling distribution of p

• The sampling distribution of p can be approximated by a normal distribution

whenever the sample size is large.

p~ N(π, π(1- π)/n)

• (Rule of Thumb) The sample size is considered large whenever these

conditions are satisfied:

n π > 5 and n (1- π) > 5
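A small sketch of these facts, under assumed values π = 0.3 and n = 50 (chosen only for illustration): it checks the rule of thumb and compares the simulated mean and variance of p with π and π(1 − π)/n. The function name `large_enough` is hypothetical.

```python
import random
import statistics

random.seed(11)
PI, n, reps = 0.3, 50, 5000   # assumed values for illustration

def large_enough(n, pi):
    """Rule of thumb from the text: n*pi > 5 and n*(1 - pi) > 5."""
    return n * pi > 5 and n * (1 - pi) > 5

# Each sample proportion is the mean of n Bernoulli(pi) outcomes (0 or 1).
props = [sum(random.random() < PI for _ in range(n)) / n for _ in range(reps)]

sim_mean = statistics.mean(props)
sim_var = statistics.pvariance(props)
theory_var = PI * (1 - PI) / n

print(f"rule of thumb satisfied: {large_enough(n, PI)}")
print(f"mean of p = {sim_mean:.4f} (pi = {PI})")
print(f"variance of p = {sim_var:.5f} (pi(1-pi)/n = {theory_var:.5f})")
```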

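Expected value of the sample variance: since \(E(X_i - \mu)^2 = \sigma^2\) and \(E(\bar{X} - \mu)^2 = V(\bar{X}) = \sigma^2/n\),

\[
\begin{aligned}
E(S^2) &= E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n-1} E\left[\sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right] \\
&= \frac{1}{n-1}\left[n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right] = \sigma^2
\end{aligned}
\]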
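The result \(E(S^2) = \sigma^2\) can be checked by simulation. The sketch below assumes a normal population with \(\sigma^2 = 4\) and samples of size 5 (illustrative values); it compares the divisor n − 1 (`statistics.variance`) with the divisor n (`statistics.pvariance`).

```python
import random
import statistics

random.seed(3)
SIGMA2 = 4.0          # assumed true population variance (sigma = 2)
n, reps = 5, 10000

div_n_minus_1 = []    # S^2 with divisor n - 1
div_n = []            # alternative estimator with divisor n
for _ in range(reps):
    x = [random.gauss(0.0, 2.0) for _ in range(n)]
    div_n_minus_1.append(statistics.variance(x))   # divisor n - 1
    div_n.append(statistics.pvariance(x))          # divisor n

avg_unbiased = statistics.mean(div_n_minus_1)
avg_biased = statistics.mean(div_n)

print(f"average with divisor n-1: {avg_unbiased:.3f} (sigma^2 = {SIGMA2})")
print(f"average with divisor n:   {avg_biased:.3f} (expected (n-1)/n * sigma^2 = {SIGMA2 * (n - 1) / n})")
```

The divisor n − 1 averages close to \(\sigma^2\), while the divisor n underestimates it by the factor (n − 1)/n.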

CH8. Interval Estimation

Confidence level

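• Assume that \(X \sim N(\mu, \sigma^2)\) and \(\sigma^2\) is known. Then \(\bar{X} \sim N(\mu, \sigma^2/n)\) and

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \]

• The confidence level (usually expressed as \(100(1-\alpha)\)%) is the area under the curve of the sampling distribution between the critical values \(\pm Z_{\alpha/2}\).

\[
\begin{aligned}
1 - \alpha &= P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) \\
&= P(-Z_{\alpha/2} < Z < Z_{\alpha/2})
\end{aligned}
\]

We can rewrite this in terms of \(\mu\): if \(\alpha = 0.05\) (or \(1 - \alpha = 0.95\)), then

\[ \bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n} \]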
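The value 1.96 is the critical value \(Z_{\alpha/2}\) for \(\alpha = 0.05\). A short sketch of how such critical values can be obtained from the standard normal distribution (the helper name `z_critical` is illustrative):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Z_{alpha/2} cuts off an upper-tail area of alpha/2 under N(0, 1).
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

This reproduces the familiar table values 1.645, 1.960 and 2.576.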

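Confidence interval for population mean with known \(\sigma\)

• Assume that \(X \sim N(\mu, \sigma^2)\) and \(\sigma^2\) is known. Then \(\bar{X} \sim N(\mu, \sigma^2/n)\).

• A sample mean \(\bar{x}\) is a point estimate of the population mean \(\mu\).

• A confidence interval for the population mean \(\mu\) is a range

\[ \bar{x} - z_0\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_0\frac{\sigma}{\sqrt{n}} \]

where \(z_0 = Z_{\alpha/2}\).

• Other expressions are \(\left(\bar{x} - z_0\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_0\frac{\sigma}{\sqrt{n}}\right)\) or \(\bar{x} \pm z_0\frac{\sigma}{\sqrt{n}}\).

Interpretation

• A confidence interval either does or does not contain \(\mu\).

• Out of 100 confidence intervals (each a 95% C.I. for \(\mu\)), approximately 95% would contain \(\mu\), while approximately 5% would not contain \(\mu\).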
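The interpretation of the confidence level as a long-run coverage rate can be illustrated by simulation. The sketch assumes a normal population with mean 100 and known standard deviation 15, and samples of size 25; these values are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(5)
MU, SIGMA, n, reps = 100.0, 15.0, 25, 2000   # assumed values for illustration

z0 = NormalDist().inv_cdf(0.975)             # critical value for 95% confidence
half_width = z0 * SIGMA / math.sqrt(n)       # z0 * sigma / sqrt(n)

hits = 0
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
    # Each interval either contains MU or it does not.
    if xbar - half_width < MU < xbar + half_width:
        hits += 1

coverage = hits / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction comes out close to 0.95: the confidence level describes the procedure over many samples, not any single interval.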

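Confidence interval for population mean with unknown \(\sigma\)

• Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation is unknown.

• The confidence interval for \(\mu\) (unknown \(\sigma\)) is

\[ \bar{x} - t_0\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_0\frac{s}{\sqrt{n}} \]

(or \(\left(\bar{x} - t_0\frac{s}{\sqrt{n}},\ \bar{x} + t_0\frac{s}{\sqrt{n}}\right)\), or \(\bar{x} \pm t_0\frac{s}{\sqrt{n}}\)).

Here \(t_0 = t_{n-1}(\alpha/2)\).

Note: If \(X \sim N(\mu, \sigma^2)\), then

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1} \]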

Degrees of freedom in the Student’s t distribution

• Degrees of Freedom (d.f.) is a parameter based on the sample size that is used to

determine the value of the t statistic.

• Degrees of freedom tell how many observations are used to calculate s, less the number

of intermediate estimates used in the calculation. d.f. = n - 1

• As n increases, the t distribution approaches the shape of the normal distribution.

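Ex) Construct a 90% C.I. for the mean GMAT score of all MBA applicants.

\(\bar{x} = 510\), \(s = 73.77\), \(n = 20\)

Since \(\sigma\) is unknown, use the Student’s t for the C.I. with d.f. = n – 1 = 20 – 1 = 19.

First find \(t_0 = t_{19}(0.05) = 1.729\) from the Student’s t distribution.

The 90% confidence interval is:

\[ \bar{x} - 1.729\frac{s}{\sqrt{n}} < \mu < \bar{x} + 1.729\frac{s}{\sqrt{n}} \]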
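The arithmetic for this interval can be sketched as follows. The critical value 1.729 is taken from the example (t table, 19 d.f.); it is entered by hand because the Python standard library has no t quantile function.

```python
import math

x_bar, s, n = 510.0, 73.77, 20
t0 = 1.729  # t critical value for 19 d.f. and 90% confidence, from the t table

# Margin of error: t0 * s / sqrt(n)
margin = t0 * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin of error = {margin:.2f}")
print(f"90% C.I.: {lower:.2f} < mu < {upper:.2f}")
```

The interval works out to roughly 481.5 < \(\mu\) < 538.5.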

Confidence Interval for a population proportion (\(\pi\))

• The confidence interval for \(\pi\) might be

\[ p \pm z_0\sqrt{\frac{\pi(1-\pi)}{n}} \]

• Since \(\pi\) is unknown, the confidence interval based on p = x/n (assuming a large sample) is

\[ p \pm z_0\sqrt{\frac{p(1-p)}{n}} \]

Ex) A sample of 75 retail in-store purchases showed that 24 were paid in cash.

What is p? p = x/n = 24/75 = .32

Is p normally distributed? np = (75)(.32) = 24 and n(1-p) = (75)(.68) = 51 both exceed 5, so the normal approximation is reasonable.

The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:

\[ p \pm z_0\sqrt{\frac{p(1-p)}{n}} = .32 \pm 1.96\sqrt{\frac{.32(1-.32)}{75}} \]

Therefore, \(0.214 < \pi < 0.426\).
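A sketch of the same calculation in code; the function name `proportion_ci` is illustrative, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a proportion: p +/- z0*sqrt(p(1-p)/n)."""
    p = x / n
    z0 = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z0 * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

lower, upper = proportion_ci(24, 75)
print(f"95% C.I.: {lower:.3f} < pi < {upper:.3f}")
```

For x = 24 and n = 75 this gives the interval stated in the example.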

Confidence interval width

• Example: \(\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\) ⇒ width = \(2 Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\)

• Confidence interval width reflects

- the sample size,

- the confidence level and

- the standard deviation.

• To obtain a narrower interval and more precision

- increase the sample size

- lower the confidence level (e.g., from 90% to 80% confidence)?
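The effect of each factor on the width \(2 Z_{\alpha/2}\,\sigma/\sqrt{n}\) can be seen numerically. The values of sigma and n below are assumed for illustration, and `ci_width` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Width of the known-sigma interval: 2 * Z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

# Larger sample, same confidence level: narrower interval.
print(f"n=25,  95%: width = {ci_width(10, 25, 0.95):.2f}")
print(f"n=100, 95%: width = {ci_width(10, 100, 0.95):.2f}")

# Lower confidence level, same sample size: narrower interval.
print(f"n=25,  90%: width = {ci_width(10, 25, 0.90):.2f}")
print(f"n=25,  80%: width = {ci_width(10, 25, 0.80):.2f}")
```

Because the width is proportional to \(1/\sqrt{n}\), quadrupling the sample size halves the width.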

Sample size determination for a mean

• To estimate a population mean with a precision of ± E (allowable error), you would need a sample of size

\[ n = \left(\frac{z_0\,\sigma}{E}\right)^2 \]

Sample size determination for a proportion

• To estimate a population proportion with a precision of ± E (allowable error), you would need a sample of size

\[ n = \left(\frac{z_0}{E}\right)^2 \pi(1-\pi) \]

• How to estimate \(\pi\)?

Method 1: Take a Preliminary Sample

Take a small preliminary sample and use the sample proportion (p) in place of \(\pi\) in the sample size formula.

Method 2: Use a Prior Sample or Historical Data

How often are such samples available? \(\pi\) might be different enough to make it a questionable assumption.

Method 3: Assume that \(\pi\) = .50

This conservative method ensures the desired precision. However, the

sample may end up being larger than necessary.
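The two sample size formulas can be sketched as small functions. The inputs in the usage lines (sigma = 10, E = 2, E = 0.03) are hypothetical, and the result is rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, E, confidence=0.95):
    """Sample size for a mean: n = (z0 * sigma / E)^2, rounded up."""
    return math.ceil((z_for(confidence) * sigma / E) ** 2)

def n_for_proportion(E, pi=0.5, confidence=0.95):
    """Sample size for a proportion: n = (z0 / E)^2 * pi * (1 - pi), rounded up.

    The default pi = 0.5 is the conservative choice (Method 3).
    """
    return math.ceil((z_for(confidence) / E) ** 2 * pi * (1 - pi))

print(n_for_mean(sigma=10, E=2))        # mean within +/- 2 at 95% confidence
print(n_for_proportion(E=0.03))         # proportion within +/- 0.03, pi = .50
print(n_for_proportion(E=0.03, pi=0.2)) # smaller n if pi is believed to be 0.2
```

Since π(1 − π) is largest at π = .50, Method 3 never gives a sample that is too small, which is why it can be larger than necessary.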

Chi-square distribution

• If the population is normal, then the scaled sample variance \((n-1)s^2/\sigma^2\) follows the chi-square distribution (\(\chi^2\)) with degrees of freedom \(\nu = n - 1\).

• Lower (left) (\(\chi^2_L\)) and upper (right) (\(\chi^2_U\)) tail percentiles for the chi-square distribution can be found using the chi-square distribution table.

Confidence interval for a population variance (\(\sigma^2\))

• Using the sample variance \(s^2\), the confidence interval is

\[ \frac{(n-1)s^2}{\chi^2_U} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L} \]
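A sketch of the calculation, reusing s = 73.77 and n = 20 from the GMAT example purely as an illustration (the notes do not work this case). The chi-square percentiles for 19 d.f. and 90% confidence, 10.117 and 30.144, are entered by hand from a standard table because the Python standard library has no chi-square quantile function.

```python
def variance_ci(s, n, chi2_lower, chi2_upper):
    """Interval (n-1)s^2/chi2_U < sigma^2 < (n-1)s^2/chi2_L for a normal population."""
    ss = (n - 1) * s ** 2
    # Dividing by the larger (upper) percentile gives the lower limit.
    return ss / chi2_upper, ss / chi2_lower

# Table values for 19 d.f.: lower 5% point 10.117, upper 5% point 30.144.
lower, upper = variance_ci(73.77, 20, chi2_lower=10.117, chi2_upper=30.144)

print(f"s^2 = {73.77 ** 2:.1f}")
print(f"90% C.I. for sigma^2: {lower:.1f} < sigma^2 < {upper:.1f}")
print(f"90% C.I. for sigma:   {lower ** 0.5:.1f} < sigma < {upper ** 0.5:.1f}")
```

Unlike the intervals for a mean or a proportion, this interval is not symmetric around \(s^2\), because the chi-square distribution is skewed.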

Summary

\((1-\alpha)100\%\) Confidence Interval Formulas

A. One population case:

1. For \(\mu\) (population mean):

1) Population is a normal distribution. Population variance \(\sigma^2\) is known.

\[ \bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

2) Population is a normal distribution. Population variance \(\sigma^2\) is unknown.

\[ \bar{x} \pm t_{(n-1),\,\alpha/2}\frac{s}{\sqrt{n}}, \]

where ‘s’ is the sample standard deviation (\(s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2/(n-1)\)).

3) Population distribution is not known, but the sample is large. (Approximate version)

(i) Population variance \(\sigma^2\) is known.

\[ \bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

(ii) Population variance \(\sigma^2\) is unknown.

\[ \bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}} \]

2. For \(\pi\) (population proportion):

Under normal approximation of binomial distribution;

\[ p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \]

where p is the sample proportion.

3. For \(\sigma^2\) (population variance):

Population is a normal distribution.

\[ \frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}} \]

<Chi-square critical values>
