Week 6 October 6-10 Four Mini-Lectures QMM 510 Fall 2014

Upload: gil

Post on 22-Feb-2016



Page 1: Week  6  October  6-10

Week 6 October 6-10

Four Mini-Lectures QMM 510 Fall 2014

Page 2: Week  6  October  6-10


Sampling Distributions ML 6.1

Chapter Contents
8.1 Sampling Variation
8.2 Estimators and Sampling Errors
8.3 Sample Mean and the Central Limit Theorem
8.4 Confidence Interval for a Mean (μ) with Known σ
8.5 Confidence Interval for a Mean (μ) with Unknown σ
8.6 Confidence Interval for a Proportion (π)
8.7 Estimating from Finite Populations
8.8 Sample Size Determination for a Mean
8.9 Sample Size Determination for a Proportion
8.10 Confidence Interval for a Population Variance, σ² (Optional)

Chapter 8

So many topics, so little time …

Page 3: Week  6  October  6-10

Learning Objectives

LO8-1: Define sampling error, parameter, and estimator.

LO8-2: Explain the desirable properties of estimators.

LO8-3: State the Central Limit Theorem for a mean.

LO8-4: Explain how sample size affects the standard error.


Sampling Distributions

Page 4: Week  6  October  6-10


• Sample statistic – a random variable whose value depends on which population items are included in the random sample.

• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.

• This sampling variation can be illustrated. Here are 100 individual items drawn from a population. When n = 1, the histogram of the sampled items resembles the population, but not exactly.


Sampling Variation

Page 5: Week  6  October  6-10

• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.

• The sample items vary, but the means tend to be close to the population mean (μ = 520.78).

Sampling Variation

Example: GMAT Scores

Page 6: Week  6  October  6-10


• Sample dot plots show that the sample means have much less variation than the individual sample items.


Sampling Variation

Example: GMAT Scores

Page 7: Week  6  October  6-10

• Estimator – a statistic derived from a sample to infer the value of a population parameter.

• Estimate – the value of the estimator in a particular sample.

• A population parameter is usually represented by a Greek letter and the corresponding statistic by a Roman letter.

Some Terminology


Estimators and Sampling Distributions

Page 8: Week  6  October  6-10

Examples of Estimators

Sampling Distributions

The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.

Estimators and Sampling Distributions

Note: An estimator is a random variable since samples vary.

Page 9: Week  6  October  6-10

• Sampling error is the difference between an estimate and the corresponding population parameter. For example, if we use the sample mean as an estimate for the population mean, then the sampling error is $\bar{x} - \mu$.

• Bias is the difference between the expected value of the estimator and the true parameter. For example, for the mean, $\text{Bias} = E(\bar{X}) - \mu$.

• An estimator is unbiased if its expected value is the parameter being estimated. The sample mean is an unbiased estimator of the population mean since $E(\bar{X}) = \mu$.

• On average, an unbiased estimator neither overstates nor understates the true parameter.

Estimators and Sampling Distributions

Page 10: Week  6  October  6-10


Estimators and Sampling Distributions

A desirable property for an estimator is for it to be unbiased.

Unbiased

Page 11: Week  6  October  6-10

• Efficiency refers to the variance of the estimator’s sampling distribution.

• A more efficient estimator has smaller variance.

Efficiency

Figure 8.6


Estimators and Sampling Distributions
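A minimal simulation sketch of efficiency, assuming a normal population with hypothetical parameters (mean 50, standard deviation 10) and comparing two estimators of its center: the sample mean and the sample median. The population, sample size, and choice of the median as the competing estimator are illustrative assumptions, not taken from the slides.

```python
import random
import statistics

random.seed(510)  # arbitrary seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population parameters
N, REPS = 25, 2000       # sample size and number of repeated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Variance of each estimator's simulated sampling distribution
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"variance of sample means:   {var_mean:.3f}")
print(f"variance of sample medians: {var_median:.3f}")
```

For a normal population the sample mean should show the smaller variance, which is what "more efficient" means here.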

Page 12: Week  6  October  6-10

Consistency

A consistent estimator converges toward the parameter being estimated as the sample size increases.

Figure 8.6


Estimators and Sampling Distributions

Page 13: Week  6  October  6-10

Central Limit Theorem

The Central Limit Theorem is a powerful result that allows us to approximate the shape of the sampling distribution of the sample mean even when we don’t know what the population looks like.

Page 14: Week  6  October  6-10


If the population is exactly normal, then the sample mean follows a normal distribution.


As the sample size n increases, the distribution of sample means narrows in on the population mean µ.

Central Limit Theorem

Page 15: Week  6  October  6-10


If the sample is large enough, the sample means will have approximately a normal distribution even if your population is not normal.


Central Limit Theorem

Page 16: Week  6  October  6-10

Illustrations of Central Limit Theorem

Note: Using the uniform and a right-skewed distribution.

Central Limit Theorem

Page 17: Week  6  October  6-10


The Central Limit Theorem permits us to define an interval within which the sample means are expected to fall. As long as the sample size n is large enough, we can use the normal distribution regardless of the population shape (or any n if the population is normal to begin with).

Applying The Central Limit Theorem


Central Limit Theorem

Page 18: Week  6  October  6-10

Sample Size and Standard Error

The sample means tend to fall within a narrower interval as n increases. The key is the standard error:

$\sigma_{\bar{x}} = \sigma / \sqrt{n}$

For example, when n = 4 the standard error is halved. To halve it again requires n = 16, and to halve it again requires n = 64. To halve the standard error, you must quadruple the sample size (the law of diminishing returns).

Central Limit Theorem
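A minimal sketch of the quadrupling rule, assuming the usual formula σ/√n for the standard error of the mean and a hypothetical population standard deviation of 1. The function name `standard_error` is illustrative.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.0  # hypothetical population standard deviation
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}  standard error = {standard_error(sigma, n):.4f}")
```

Each fourfold increase in n cuts the printed standard error in half: 1.0000, 0.5000, 0.2500, 0.1250.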

Page 19: Week  6  October  6-10

• Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.

• The population parameters are: μ = 1.5, σ = 1.118.

Illustration: All Possible Samples from a Uniform Population


Central Limit Theorem

Page 20: Week  6  October  6-10


• The population is uniform, yet the distribution of all possible sample means of size 2 has a peaked triangular shape.

Illustration: All Possible Samples from a Uniform Population


Central Limit Theorem
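The triangular shape can be checked by brute force. The sketch below enumerates every sample of size 2 from {0, 1, 2, 3}, assuming ordered samples drawn with replacement (16 samples in all); that sampling scheme is an assumption, since the slide does not state it.

```python
from collections import Counter
from itertools import product
import math

population = [0, 1, 2, 3]
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# All 16 ordered samples of size 2, drawn with replacement (assumed).
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]
counts = Counter(sample_means)

# Text histogram of the sampling distribution of the mean
for value in sorted(counts):
    print(f"mean = {value:.1f}  " + "*" * counts[value])

mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)
print(f"mu = {mu}, sigma = {sigma:.3f}")
print(f"mean of sample means = {mean_of_means}, sd = {sd_of_means:.3f}")
```

The frequencies run 1, 2, 3, 4, 3, 2, 1, the mean of the sample means equals μ = 1.5, and their standard deviation equals σ/√2, about 0.791.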

Page 21: Week  6  October  6-10


The population is uniform, yet the histogram of sample means has a peaked triangular shape starting with n = 2. By n = 8, the histogram appears normal.

Illustration: 100 Samples from a Uniform Population


Central Limit Theorem

Page 22: Week  6  October  6-10


The population is skewed, yet the histogram of sample means starts to have a normal shape starting with n = 4. By n = 16, the histogram appears arguably normal.

Illustration: 100 Samples from a Skewed Population


Central Limit Theorem
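A rough simulation sketch of the same idea, assuming an exponential population as the right-skewed example (the slides do not say which skewed distribution was used). It draws 5,000 samples per sample size rather than the 100 on the slide, only to make the skewness estimates steadier.

```python
import random
import statistics

random.seed(8)  # arbitrary seed so the run is repeatable

def skewness(values):
    """Skewness coefficient: the mean of the cubed z-scores."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps):
    """Means of `reps` samples of size n from an exponential population."""
    return [
        statistics.mean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

REPS = 5000  # more than the 100 samples on the slide (assumption)
skew_by_n = {n: skewness(sample_means(n, REPS)) for n in (1, 4, 16)}

for n, sk in skew_by_n.items():
    print(f"n = {n:2d}  skewness of sample means = {sk:.2f}")
```

The skewness of the sample means should shrink toward 0 (the value for a normal distribution) as n grows.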

Page 23: Week  6  October  6-10

What Is a Confidence Interval?

Confidence Interval for a Mean (μ) with Known σ ML 6.2

Page 24: Week  6  October  6-10

What is a Confidence Interval?

• The confidence interval for μ with known σ is: $\bar{x} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$

Confidence Interval for a Mean (μ) with Known σ

z-values for commonly-used confidence levels
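The z-values can also be computed rather than looked up. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names `z_critical` and `z_interval`, and the sample figures in the usage lines (mean 100, σ = 15, n = 36), are hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed z-value for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / n ** 0.5
    return xbar - margin, xbar + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")

# Hypothetical sample: the interval widens as the confidence level rises.
for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(100.0, 15.0, 36, level)
    print(f"{level:.0%} interval: [{lo:.2f}, {hi:.2f}]")
```

The first loop gives z = 1.645, 1.960, and 2.576 for 90%, 95%, and 99% confidence.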

Page 25: Week  6  October  6-10


Example: Bottle Fill

Confidence Interval for a Mean (μ) with Known σ

… but usually we do NOT know σ

Page 26: Week  6  October  6-10


• A higher confidence level leads to a wider confidence interval.

Choosing a Confidence Level

• Greater confidence implies loss of precision (i.e. greater margin of error).

• 95% confidence is most often used.

Confidence Intervals for Example 8.2

Confidence Interval for a Mean (μ) with Known σ

Page 27: Week  6  October  6-10

• A confidence interval either does or does not contain μ.

• The confidence level quantifies the risk.

• When constructing 95% confidence intervals, about 95 out of 100 intervals are expected to contain μ, while about 5 are not (for example, sample 14 below).
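A small simulation sketch of this interpretation, assuming a normal population with hypothetical parameters (mean 520, σ = 80) and samples of size 25. It builds 100 intervals at 95% confidence with σ treated as known and counts how many contain the true mean.

```python
import random
from statistics import NormalDist, mean

random.seed(14)  # arbitrary seed so the run is repeatable

MU, SIGMA, N = 520.0, 80.0, 25   # hypothetical population and sample size
Z = NormalDist().inv_cdf(0.975)  # z-value for 95% confidence
TRIALS = 100

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = mean(sample)
    margin = Z * SIGMA / N ** 0.5
    # Each individual interval either contains MU or it does not.
    if xbar - margin <= MU <= xbar + margin:
        covered += 1

print(f"{covered} of {TRIALS} intervals contain mu")
```

The count varies from run to run (change the seed to see this) but should stay near 95.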

Interpretation

Confidence Interval for a Mean (μ) with Known σ

Page 28: Week  6  October  6-10

• If σ is known and the population is normal, then we can safely use the formula to compute the confidence interval.

• If σ is known and we do not know whether the population is normal, a common rule of thumb is that n ≥ 30 is sufficient to use the formula as long as the distribution is approximately symmetric with no outliers.

Confidence Interval for a Mean (μ) with Known σ

When Can We Assume Normality?

• Larger n may be needed to assume normality if you are sampling from a strongly skewed population or one with outliers.

Page 29: Week  6  October  6-10

Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.

Student’s t Distribution

Confidence Interval for a Mean (μ) with Unknown σ ML 6.3

… and usually we do NOT know σ …

Page 30: Week  6  October  6-10


Student’s t Distribution

Confidence Interval for a Mean (μ) with Unknown σ

Page 31: Week  6  October  6-10

Student’s t Distribution

• t distributions are symmetric and shaped like the standard normal distribution.

• The t distribution is dependent on the size of the sample.

Figure 8.11

Comparison of Normal and Student’s t

Confidence Interval for a Mean (μ) with Unknown σ

Page 32: Week  6  October  6-10

Degrees of Freedom

• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the t distribution.

• The d.f. for the t distribution in this case is given by d.f. = n – 1.


• As n increases, the t distribution approaches the shape of the normal distribution.

• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.

Confidence Interval for a Mean (μ) with Unknown σ

Comparison of Normal and Student’s t

Page 33: Week  6  October  6-10

Comparison of z and t

• For very small samples, t-values differ substantially from the normal.

• As degrees of freedom increase, the t-values approach the normal z-values.

• For example, for n = 31, the degrees of freedom would be d.f. = 31 – 1 = 30.

• So for a 90 percent confidence interval, we would use t = 1.697, which is slightly larger than z = 1.645.

Note: the z and t distributions are almost the same for d.f. = 30.

Confidence Interval for a Mean (μ) with Unknown σ
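The convergence of t toward z can be checked numerically. Python's standard library has no t quantile function, so this sketch integrates the t density with Simpson's rule and finds the critical value by bisection; it is a rough illustration, not how a statistics package would do it. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-tailed critical value by bisection (assumes it lies below 100)."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.95)  # z-value for 90% confidence
for df in (5, 10, 30, 100):
    print(f"d.f. = {df:3d}  t = {t_critical(0.90, df):.3f}  z = {z:.3f}")
```

For d.f. = 30 this reproduces the t = 1.697 quoted above, against z = 1.645.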

Page 34: Week  6  October  6-10

Figure 8.13

Confidence Interval for a Mean (μ) with Unknown σ

Example: GMAT Scores Again

Page 35: Week  6  October  6-10

• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.

$\bar{x} = 510$, $s = 73.77$

• Since σ is unknown, use the Student’s t for the confidence interval with d.f. = 20 – 1 = 19.

• Find $t_{\alpha/2} = t_{.05} = 1.729$ from Appendix D.

Confidence Interval for a Mean (μ) with Unknown σ

Example: GMAT Scores Again

Page 36: Week  6  October  6-10


• For a 90% confidence interval, use Appendix D to find t0.05 = 1.729 with d.f. = 19.

Note: We could also use Excel, MINITAB, etc. to obtain t.05 values as well as to construct confidence intervals.

We are 90 percent confident that the true mean GMAT score is within the interval [481.48, 538.52].

Confidence Interval for a Mean (μ) with Unknown σ

Example: GMAT Scores Again

=T.INV.2T(0.1,19) = 1.729
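The interval can be reproduced from the figures on the slides. This sketch assumes the standard t interval, sample mean ± t · s/√n, and takes the critical value 1.729 as given rather than computing it.

```python
import math

n = 20          # sample size implied by d.f. = 20 - 1 = 19
xbar = 510.0    # sample mean from the slide
s = 73.77       # sample standard deviation from the slide
t_crit = 1.729  # t.05 with 19 d.f., as read from Appendix D

margin = t_crit * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"margin of error = {margin:.2f}")
print(f"90% CI: [{lower:.2f}, {upper:.2f}]")  # [481.48, 538.52]
```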

Page 37: Week  6  October  6-10

Confidence Interval Width

• Confidence interval width reflects
- the sample size,
- the confidence level, and
- the standard deviation.

• To obtain a narrower interval and more precision
- increase the sample size, or
- lower the confidence level (e.g., from 90% to 80% confidence).

Confidence Interval for a Mean (μ) with Unknown σ

There is no free lunch!

Page 38: Week  6  October  6-10


Using Appendix D

• Beyond d.f. = 50, Appendix D shows d.f. in steps of 5 or 10.

• If the table does not give the exact degrees of freedom, use the t-value for the next lower degrees of freedom.

• This is a conservative procedure since it causes the interval to be slightly wider.

• A conservative statistician may use the t distribution for confidence intervals when σ is unknown because using z would underestimate the margin of error.

Confidence Interval for a Mean (μ) with Unknown σ

Page 39: Week  6  October  6-10

• If the population is normal, then $(n-1)s^2/\sigma^2$, a scaled version of the sample variance s², follows the chi-square distribution (χ²) with degrees of freedom d.f. = n – 1.

• Lower ($\chi^2_L$) and upper ($\chi^2_U$) tail percentiles for the chi-square distribution can be found using Appendix E.

Chi-Square Distribution

Confidence Interval for a Population Variance, σ² ML 6.4

Note: The chi-square distribution is skewed right, but less so for larger d.f.

Page 40: Week  6  October  6-10

• Using the sample variance s², the confidence interval is $\frac{(n-1)s^2}{\chi^2_U} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_L}$

Confidence Interval

• To obtain a confidence interval for the standard deviation σ, just take the square root of the interval bounds.

Confidence Interval for a Population Variance, σ²

Page 41: Week  6  October  6-10

• You can use Appendix E to find critical chi-square values.

or from Excel:
=CHISQ.INV(0.025,39) = 23.65
=CHISQ.INV(0.975,39) = 58.12

Confidence Interval for a Population Variance, σ²
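A minimal sketch of the variance interval using the two Excel critical values above, which correspond to 95% confidence with 39 degrees of freedom (n = 40). The sample standard deviation s = 10 is hypothetical, since the transcript does not give the sample data, and the helper name `variance_interval` is illustrative.

```python
import math

n = 40              # sample size implied by 39 degrees of freedom
chi2_lower = 23.65  # CHISQ.INV(0.025, 39), from the slide
chi2_upper = 58.12  # CHISQ.INV(0.975, 39), from the slide
s = 10.0            # hypothetical sample standard deviation

def variance_interval(s, n, chi2_lower, chi2_upper):
    """Interval for sigma^2: (n-1)s^2/chi2_upper to (n-1)s^2/chi2_lower."""
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower

var_lo, var_hi = variance_interval(s, n, chi2_lower, chi2_upper)
# Square roots of the bounds give the interval for sigma itself.
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(f"95% CI for variance:      [{var_lo:.2f}, {var_hi:.2f}]")
print(f"95% CI for std deviation: [{sd_lo:.2f}, {sd_hi:.2f}]")
```

The interval is not symmetric around s² = 100, which reflects the right skew of the chi-square distribution.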

Page 42: Week  6  October  6-10


• Estimating a variance is easy.

• But you don’t see it very often.

• Maybe because the chi-square distribution is less familiar?

• Maybe because we usually care more about the mean?

Bottom Line:

Confidence Interval for a Population Variance, σ²