
Chapter 7 Inferences concerning a mean

7.1 Point Estimation

A single number (single value) computed from sample data (e.g. mean, variance, percentile) is referred to as a statistic. Point estimation involves the use of sample data to calculate a statistic which is to serve as the "best guess" or "best estimate" of an unknown population parameter. A point estimator is a method (almost always a formula) for computing the statistic from the sample data. Thus point estimation is the application of a point estimator to the data.

e.g. desired population parameter: mean $\mu$
Data: random sample $X_1, X_2, \dots, X_n$
Point estimator: $\bar{X}$ (recall Theorem 6.1)

As with any estimation, we want to have some idea of how well the estimator is doing. Thus we want a means to assess the possible error of the point estimator.


The accuracy of a point estimate can be expressed as two components:

1) the bias, which is the expected value of the estimator minus the value of the parameter it is estimating. e.g. for the estimator $\bar{X}$, the bias is $E[\bar{X} - \mu]$. From a single sample, the bias can be estimated by $\bar{x} - \mu$. The problem with using the bias is that it requires knowledge of the unknown population mean $\mu$.

2) the standard deviation (variance) of the estimator. e.g. for the estimator $\bar{X}$, the variance is $\sigma^2/n$. This could be approximated by the sample variance over n, i.e. $s^2/n$.

e.g. desired population parameter: mean $\mu$

Data: random sample $X_1, X_2, \dots, X_n$

Point estimator: $\bar{X}$

Bias of point estimator: $E[\bar{X}] - \mu = 0$ (!! Theorem 6.1)

Estimate of error of point estimator: sample standard error $S/\sqrt{n}$


e.g. Consider the random sample 2.4 2.9 2.7 2.6 2.9 2.0 2.8 2.2 2.4 2.4 2.0 2.5 of 12 measurements of lead concentration [μg/l] from 12 environmental samples. Each sample had 1.25 μg/l of lead added to it just before measurement. From the random sample, the point estimator gives

$\bar{x} = 2.483$, with estimated bias $= 2.483 - 1.25 = 1.233$, and $s/\sqrt{n} = 0.090$

Since $\bar{X}$ is an unbiased estimator, and the measured sample bias is so much larger than the standard error, we suspect that the environmental samples already had lead in them before each sample had 1.25 μg/l of lead added. (In fact, we suspect the environment has a lead concentration around 1.2 μg/l.)
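As a quick check of these numbers, here is a minimal Python sketch (using only numpy, with the sample values typed in from above) that computes the point estimate, the estimated bias, and the standard error:

```python
import numpy as np

# the 12 lead-concentration measurements [ug/l]
x = np.array([2.4, 2.9, 2.7, 2.6, 2.9, 2.0, 2.8, 2.2, 2.4, 2.4, 2.0, 2.5])
mu_added = 1.25                            # lead added just before measurement

xbar = x.mean()                            # point estimate of the mean
bias_est = xbar - mu_added                 # estimated bias
std_err = x.std(ddof=1) / np.sqrt(len(x))  # s / sqrt(n)

print(f"xbar = {xbar:.3f}, est. bias = {bias_est:.3f}, s/sqrt(n) = {std_err:.3f}")
# xbar = 2.483, est. bias = 1.233, s/sqrt(n) = 0.090
```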

Let $\theta$ denote the population parameter of interest (e.g. mean, variance, standard deviation, etc.). Let $\hat{\theta}$ denote a statistic (a single value computed from a random sample to estimate $\theta$).

A statistic $\hat{\theta}$ is said to be an unbiased estimate (its estimator is said to be an unbiased estimator) if and only if the mean value of the sampling distribution of the estimator equals $\theta$ for all possible values that the parameter $\theta$ can have. (A statistic is unbiased if, on average, its value equals the value of the population parameter it estimates.) Thus we see (from Theorem 6.1) that the point estimator $\bar{X}$ is an unbiased estimator of the mean value of a population.


Unbiased point estimators are highly desirable. There may be more than one unbiased point estimator for the same parameter.

e.g. for samples of size 2, both
$$\frac{X_1 + X_2}{2} \quad\text{and}\quad \frac{aX_1 + bX_2}{a + b}, \quad\text{where } a > 0,\; b > 0,$$
are unbiased estimators for the mean of the population.

How to choose among unbiased estimators? Consider a normal population. We know (by the symmetry of the distribution) that the mean $\mu$ and the median $P_{0.5}$ are equal for a normal population. We know that the point estimator $\bar{X}$ is an unbiased estimator for the mean, and it can be shown that our method of getting the sample median (Ch. 2.6) is an unbiased estimator for the median. Thus we have two unbiased estimators for the mean of a normal population. We know that the variance of the point estimator $\bar{X}$ is $\sigma^2/n$. It can be shown that the variance of the point estimator for the median is $\frac{\pi}{2} \cdot \frac{\sigma^2}{n}$. Thus the variance of the median point estimator is larger than the variance of the mean point estimator for random samples of the same size. Thus, if the goal is to determine the mean (or median) of a normal population, it is more efficient to use the point estimator $\bar{X}$.

A statistic $\hat{\theta}_1$ is said to be a more efficient unbiased estimator of $\theta$ than the statistic $\hat{\theta}_2$ if

1. $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators for $\theta$, and

2. the variance of the sampling distribution of $\hat{\theta}_1$ is never larger than that of $\hat{\theta}_2$, and is smaller than that of $\hat{\theta}_2$ for at least one value of $\theta$.


Recall that, for large n,
$$Z \equiv \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is a RV having a distribution that is approximately standard normal. Therefore, for any value $0 \le \alpha \le 1$, the probability of getting a value of Z that lies in the range
$$-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}, \quad\text{i.e.}\quad \frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le z_{\alpha/2},$$
is $1 - \alpha$. Let
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Then the probability of getting a value of $\bar{X}$ such that
$$|\bar{X} - \mu| \le E$$
is $1 - \alpha$. Stated another way, we can state, with probability $1 - \alpha$, that the error $|\bar{x} - \mu|$ in a single measurement will be at most E (where $E = z_{\alpha/2} \cdot \sigma/\sqrt{n}$). The standard values used for $\alpha$ are 0.05 and 0.01, having respective values $z_{0.025} = 1.96$ and $z_{0.005} = 2.575$ (Table 3).


e.g. A sample of size n = 150 is used to measure the mean of a population whose standard deviation is (assumed to be) $\sigma = 6.2$. 99% of such measurements will generate a sample mean that differs from the true mean by at most what?

99% corresponds to $1 - \alpha = 0.99$, i.e. $\alpha/2 = 0.005$. The largest deviation from the mean corresponding to this value of $\alpha$ is
$$|\bar{X} - \mu| \le E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 2.575 \cdot \frac{6.2}{\sqrt{150}} = 1.3$$
Thus 99% of the time, the sample mean should differ from the true mean by at most 1.3.
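A hedged numerical check in Python: scipy's `norm.ppf` supplies the z-value in place of Table 3, and the helper name `max_error` is my own:

```python
from math import sqrt
from scipy.stats import norm

def max_error(sigma: float, n: int, alpha: float) -> float:
    """Error bound E = z_{alpha/2} * sigma / sqrt(n)."""
    z = norm.ppf(1 - alpha / 2)   # z_{alpha/2}, e.g. ~2.576 for alpha = 0.01
    return z * sigma / sqrt(n)

print(round(max_error(sigma=6.2, n=150, alpha=0.01), 1))   # -> 1.3
```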


Suppose a single sample is measured and the value obtained for $\bar{x}$ is 69.5. We do not know for sure whether this value differs from the true mean by at most 1.3 but, based on the above calculation (i.e. the above prediction), we are 99% confident that the measurement differs from the true mean by at most 1.3.

In general, we make probability statements about future (predicted) values of RVs, and confidence statements about data values that have already been obtained.


How large should a sample be so that we can be $(1-\alpha)100\%$ confident that the error of a single measurement of the sample mean will be at most E? Solving
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
for n, we have
$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$$
Thus, if we make the sample size this big, we can be $(1-\alpha)100\%$ confident that our measured sample mean will differ from the true mean by at most E.


e.g. A population standard deviation is 1.6. How large must a sample be so that we are 95% confident that the error in the measured sample mean is no larger than 0.5?

Analysis: what is given in this problem, and what is the question being asked? 1.6 is the population standard deviation $\sigma$; 0.5 is the desired error bound E. So $\sigma = 1.6$, $|\bar{X} - \mu| \le E = 0.5$, $\alpha = 1 - 0.95 = 0.05$, $\alpha/2 = 0.025$.

Find from Table 3: $z_{0.025} = 1.96$, so
$$n \ge \left( \frac{z_{0.025}\,\sigma}{E} \right)^2 = \left( \frac{1.96 \cdot 1.6}{0.5} \right)^2 = 39.3$$
i.e. n = 40. Thus a sample size of 40 is sufficient to be 95% confident in the desired error.
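The same calculation as a small Python sketch (the helper `sample_size` is an invented name; it just inverts the error-bound formula and rounds up):

```python
from math import ceil
from scipy.stats import norm

def sample_size(sigma: float, E: float, alpha: float) -> int:
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= E."""
    z = norm.ppf(1 - alpha / 2)
    return ceil((z * sigma / E) ** 2)

print(sample_size(sigma=1.6, E=0.5, alpha=0.05))   # -> 40
```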


The major problem with the preceding analysis for the error bound E is that it assumes knowledge of the population standard deviation σ. If σ is unknown and the sample size is large, we may have to further approximate E by
$$E = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
If σ is unknown, and the population under consideration is normal, we can use the sample standard deviation s as long as we consider
$$t \equiv \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
which is a RV having a t-distribution with $n - 1$ df. In this case we can assert with $(1-\alpha)100\%$ confidence that the error made in using $\bar{x}$ to estimate $\mu$ is at most
$$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
for samples of size n, with $t_{\alpha/2}$ computed for the t-distribution with $n - 1$ df.


e.g. A random sample of 6 measurements has sample mean 232.26 and sample standard deviation 0.14. With 98% confidence, the measured mean differs from the true mean by at most what?

For $n = 6$ and $\alpha/2 = 0.01$, the t value is $t_{0.01} = 3.365$ ($\nu = 5$). Thus
$$E = 3.365 \cdot \frac{0.14}{\sqrt{6}} = 0.19$$
With 98% confidence, the true mean satisfies $-0.19 \le 232.26 - \mu \le 0.19$, i.e.
$$232.07 \le \mu \le 232.45$$
Analysis: unlike the previous problem, here 0.14 is the sample standard deviation, not the standard deviation of the population!
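In Python the corresponding t-based bound might look like the following sketch (scipy's `t.ppf` replaces the t-table; `max_error_t` is an assumed helper name):

```python
from math import sqrt
from scipy.stats import t

def max_error_t(s: float, n: int, alpha: float) -> float:
    """Error bound E = t_{alpha/2, n-1} * s / sqrt(n)."""
    t_val = t.ppf(1 - alpha / 2, df=n - 1)   # t_{0.01} = 3.365 for n = 6
    return t_val * s / sqrt(n)

E = max_error_t(s=0.14, n=6, alpha=0.02)
print(round(E, 2), (round(232.26 - E, 2), round(232.26 + E, 2)))
# -> 0.19 (232.07, 232.45)
```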


7.2 Interval estimation

In 7.1 we wrote that, with probability $1 - \alpha$, we have
$$-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}$$
and thereby developed a way to estimate errors over intervals. Rewriting the above, we have that, if $\bar{x}$ is the mean of a sample of size n taken from a population with standard deviation $\sigma$, then, with $(1-\alpha)100\%$ confidence, the mean value $\mu$ lies in the interval
$$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
i.e.
$$\bar{x} - E \le \mu \le \bar{x} + E$$
The interval $(\bar{x} - E,\ \bar{x} + E)$ is known as the confidence interval for $\mu$ having the degree of confidence $(1-\alpha)100\%$. The endpoints of the confidence interval are known as the confidence limits.

Remember, since this confidence interval uses z-scores, it is applicable for normal populations, or for large sample sizes (where the normal distribution is a good approximation).


e.g. A sample mean of 21.6 is observed for a random sample of size 100 taken from a population having standard deviation 5.1. With 95% confidence, over what interval does the population mean lie?

$n = 100$, $\bar{x} = 21.6$, $\sigma = 5.1$, $z_{0.025} = 1.96$. Thus the 95% confidence interval is
$$\left( 21.6 - 1.96 \cdot \frac{5.1}{\sqrt{100}},\ \ 21.6 + 1.96 \cdot \frac{5.1}{\sqrt{100}} \right) = (20.6,\ 22.6)$$

If the population is normal but the population standard deviation is unknown, the confidence interval (confidence limits) can be established using t-values:
$$\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
If the population standard deviation is unknown but the sample size is large, the confidence interval (confidence limits) may be approximated using the sample standard deviation:
$$\bar{x} - z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$


e.g. A sample mean of 3.42 and standard deviation of 0.68 are observed for a random sample of size 16 taken from a normal population. With 99% confidence, over what interval does the population mean lie?

$n = 16$, $\bar{x} = 3.42$, $s = 0.68$, $t_{0.005} = 2.947$ ($\nu = 15$). Thus the 99% confidence interval is
$$\left( 3.42 - 2.947 \cdot \frac{0.68}{\sqrt{16}},\ \ 3.42 + 2.947 \cdot \frac{0.68}{\sqrt{16}} \right) = (2.92,\ 3.92)$$
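Both interval types can be wrapped in one small function; a sketch under the assumption that scipy is available (`conf_interval` is an invented name, and `use_t=True` selects the t-based interval for a normal population with unknown σ):

```python
from math import sqrt
from scipy.stats import norm, t

def conf_interval(xbar, spread, n, alpha, use_t=False):
    """(xbar - q*spread/sqrt(n), xbar + q*spread/sqrt(n)),
    with q = t_{alpha/2, n-1} if use_t else z_{alpha/2}."""
    q = t.ppf(1 - alpha / 2, df=n - 1) if use_t else norm.ppf(1 - alpha / 2)
    E = q * spread / sqrt(n)
    return xbar - E, xbar + E

print(conf_interval(21.6, 5.1, 100, 0.05))              # z-based -> (20.6, 22.6)
print(conf_interval(3.42, 0.68, 16, 0.01, use_t=True))  # t-based -> (2.92, 3.92)
```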


It is important that the following distinctions be understood for confidence intervals. Consider, e.g., $(1-\alpha)100\% = 95\%$. Before any observations are made, $\bar{X}$ and S are RVs, and the interval
$$\left( \bar{X} - t_{0.025} \cdot \frac{S}{\sqrt{n}},\ \ \bar{X} + t_{0.025} \cdot \frac{S}{\sqrt{n}} \right)$$
is a random interval centered at $\bar{X}$ with length $2\, t_{0.025} \cdot S/\sqrt{n}$ (i.e. length proportional to S). 95% of the time, this interval will cover the true (fixed) population mean $\mu$. Once a sample observation has been made (giving the values $\bar{x}$ and s), the calculated interval
$$\left( \bar{x} - t_{0.025} \cdot \frac{s}{\sqrt{n}},\ \ \bar{x} + t_{0.025} \cdot \frac{s}{\sqrt{n}} \right)$$
is fixed. We do not know if $\mu$ lies in this interval; it either does or it doesn't. However, we are 95% confident that it does: specifically, if we ran a large number of samples of size n (each sample giving a different value for $\bar{x}$ and s), then in 95% of those samples $\mu$ would lie in the measured interval for that sample. (See Fig. 7.2)
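The coverage interpretation is easy to check by simulation. A minimal sketch, assuming a normal population with μ = 10 and σ = 2 (arbitrary choices), that counts how often the t-based 95% interval covers the true mean:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
mu, sigma, n, trials = 10.0, 2.0, 16, 10_000
t_val = t.ppf(0.975, df=n - 1)          # t_{0.025} for nu = 15

covered = 0
for _ in range(trials):
    x = rng.normal(mu, sigma, size=n)   # one random sample of size n
    E = t_val * x.std(ddof=1) / np.sqrt(n)
    if x.mean() - E <= mu <= x.mean() + E:
        covered += 1

print(covered / trials)                 # close to 0.95
```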


7.3 Maximum Likelihood

We have a RV that we believe is governed by a binomial distribution
$$b(x;\, n, p) = \binom{n}{x} p^x (1-p)^{n-x}$$
with $n = 4$, but we do not know the value of p. (The RV under consideration is the number of successes x observed in a sequence of 4 Bernoulli trials.) We take a single sample and observe the value x = 1. (We run one sequence of length 4 and observe a single success out of the 4 trials.) What information does this give us regarding p? One way of thinking is that the single value of x that we see is the most likely value of x that would be produced by the distribution. From a comparison of 5 binomial distributions (different p values for $n = 4$), we see that, of those, the distribution with p = 0.3 produces the value x = 1 with the greatest probability.


Thus we suspect that the most likely value of p is $p \approx 0.3$. In fact, we can do better than this: we can compute the value of p that gives the maximum probability at $x = 1$.

The binomial probability for $n = 4$, $x = 1$ is
$$b(1;\, 4, p) = \binom{4}{1} p^1 (1-p)^{4-1} = 4p(1-p)^3$$
From Calc 1, we know that this function attains its maximum value either 1) at the endpoints, or 2) where $\frac{d}{dp}\, 4p(1-p)^3 = 0$.

1) Checking the endpoints $p = 0, 1$ we find $b(1;\, 4, 0) = b(1;\, 4, 1) = 0$.

2) Checking the vanishing derivative we find
$$4(1-p)^3 - 4p \cdot 3(1-p)^2 = 0 \;\Rightarrow\; 1 - p = 3p \;\Rightarrow\; p = 0.25,$$
and $b(1;\, 4, 0.25) = 0.4219$.

Thus the binomial distribution with $p = 0.25$ assigns the maximum probability to $x = 1$. Therefore, observing one sample producing $x = 1$ leads us to conclude that the most likely value of p is $p = 0.25$.

Note: in this case the maximum probability occurred where the derivative vanished. If we had instead observed $x = 4$, the maximum probability would occur at the endpoint value $p = 1$ and not where the derivative vanishes. Therefore both steps 1 and 2 must be checked.
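A quick numerical confirmation in Python (a brute-force grid search over p rather than calculus; purely illustrative):

```python
import numpy as np

p = np.linspace(0, 1, 100_001)          # candidate p values, endpoints included
likelihood = 4 * p * (1 - p) ** 3       # b(x=1; n=4, p)

best = p[np.argmax(likelihood)]
print(best, likelihood.max())           # -> 0.25 0.421875
```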


Observing more values of x (more samples) should provide more information about the most likely value of p. Suppose we take a random sample $X_1, X_2$ of size 2 (two observations of 4 Bernoulli trials each) and get $x_1 = 1$, $x_2 = 2$ (one success in the first sample, two successes in the second). From the observation $x_1 = 1$ we concluded the most likely value for p was $p = 0.25$. By similar reasoning, seeing $x_2 = 2$ alone would imply the most likely value is $p = 0.5$. Thus seeing $x_1 = 1$, $x_2 = 2$ implies the most likely value for p should be somewhere between 0.25 and 0.5. We can quantify this by maximizing the joint probability density. Since, by the definition of a random sample, the two values come from independent RVs, their joint probability density is the product of their individual probability densities:
$$f(x_1, x_2; p) = b(1;\, 4, p) \cdot b(2;\, 4, p) = 4p(1-p)^3 \cdot 6p^2(1-p)^2 = 24\, p^3 (1-p)^5$$
Maximizing $f(x_1, x_2; p)$ requires checking endpoint values as well as checking for a vanishing derivative:

1) checking the endpoints $p = 0, 1$ we find $f(x_1, x_2;\, 0) = f(x_1, x_2;\, 1) = 0$

2) checking the vanishing derivative $\frac{df(x_1, x_2)}{dp} = 0$ we find
$$24 \cdot 3p^2(1-p)^5 - 24 \cdot 5p^3(1-p)^4 = 0 \;\Rightarrow\; 3(1-p) = 5p \;\Rightarrow\; p = 0.375,$$
with $f(x_1, x_2;\, 0.375) = 0.12$.

Thus our observation of $x_1 = 1$, $x_2 = 2$ leads us to infer that p is most likely 0.375.


We can generalize this procedure to random samples of size n, and to any probability distribution. Assume a population is governed by a probability density $f(x; \theta)$, where $\theta$ is a parameter in the density. Take a random sample of size n giving the observations $x_1, x_2, \dots, x_n$. Form the joint probability density for $X_1, X_2, \dots, X_n$:
$$L(\theta \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$
In this context, the joint probability density when $x_1, x_2, \dots, x_n$ have specific measured values, $L(\theta \mid x_1, x_2, \dots, x_n)$, is referred to as the likelihood function for $\theta$. Maximizing the likelihood function for the particular observations $x_1, x_2, \dots, x_n$ gives a maximum likelihood value $\hat{\theta}$ for the parameter $\theta$.

Note: maximizing the likelihood function requires checking endpoint values of $\theta$ as well as checking for $\frac{dL}{d\theta} = 0$.

A statistic $\hat{\theta}(X_1, \dots, X_n)$ is a maximum likelihood estimator of $\theta$ if, for each sample $x_1, \dots, x_n$, $\hat{\theta}(x_1, \dots, x_n)$ is a value for the parameter $\theta$ that maximizes the likelihood function $L(\theta \mid x_1, x_2, \dots, x_n)$.
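For distributions where the calculus is messy, the same maximization can be carried out numerically. A sketch using scipy.optimize on the two binomial observations from the previous example (minimizing the negative log-likelihood is the standard trick; purely illustrative):

```python
from scipy.optimize import minimize_scalar
from scipy.stats import binom

x_obs = [1, 2]          # the two binomial(n=4) observations from above

def neg_log_likelihood(p):
    return -sum(binom.logpmf(x, 4, p) for x in x_obs)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1 - 1e-9),
                      method="bounded")
print(res.x)            # -> 0.375, matching the calculus result
```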


Ex. 1 A characteristic (e.g. a genetic trait) occurs in a population with frequency p. We measure a random sample $X_1, \dots, X_n$ of size n, getting the observations $x_1, \dots, x_n$, where each $x_i$ is either 0 (doesn't have the trait) or 1 (has the trait). The probability density for each $X_i$ is
$$f(x_i; p) = p^{x_i} (1-p)^{1-x_i}$$
(check: $x_i = 0 \Rightarrow f(0; p) = 1-p$, and $x_i = 1 \Rightarrow f(1; p) = p$ ✓). The likelihood function is
$$L(p \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^{x_1 + x_2 + \dots + x_n}\, (1-p)^{(1-x_1) + (1-x_2) + \dots + (1-x_n)} = p^{\sum_{i=1}^{n} x_i}\, (1-p)^{\,n - \sum_{i=1}^{n} x_i}$$

1) checking endpoint values: $L(0 \mid x_1, x_2, \dots, x_n) = L(1 \mid x_1, x_2, \dots, x_n) = 0$, as long as the $x_i$ are not all 0 and not all 1. For the special cases:
$$\sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; L(0 \mid 0, 0, \dots, 0) = (1-0)^n = 1 \qquad\text{and}\qquad \sum_{i=1}^{n} x_i = n \;\Rightarrow\; L(1 \mid 1, 1, \dots, 1) = 1^n = 1$$


2) checking for a vanishing derivative. Let $a = \sum_{i=1}^{n} x_i$. Then
$$0 = \frac{dL(p)}{dp} = \frac{d}{dp}\, p^a (1-p)^{n-a} = a p^{a-1}(1-p)^{n-a} - p^a (n-a)(1-p)^{n-a-1}$$
$$\Rightarrow\; a(1-p) = (n-a)\, p \;\Rightarrow\; p = a/n, \quad\text{i.e.}\quad p = \frac{\sum_{i=1}^{n} x_i}{n}$$
with
$$L\!\left(\frac{a}{n}\right) = \left(\frac{a}{n}\right)^{a} \left(1 - \frac{a}{n}\right)^{n-a}$$
From 1) and 2), we see that the maximum likelihood estimator $\hat{p}$ is
$$\hat{p} = \begin{cases} 0 & \text{if } \sum_{i=1}^{n} x_i = 0 \\ 1 & \text{if } \sum_{i=1}^{n} x_i = n \\ \frac{1}{n}\sum_{i=1}^{n} x_i & \text{otherwise} \end{cases}$$
In fact, we note that the third case for the estimator $\hat{p}$ automatically includes the first two. We therefore conclude that
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$$
is the maximum likelihood estimator.


Note: in step 2, the textbook instead checks the derivative $\frac{d}{dp} \ln L(p)$. The textbook is relying on the following result from Calc 1:

Theorem: Let $g(f)$ be a function that is monotonically increasing ($\frac{dg}{df} > 0$ for all values of f) or monotonically decreasing ($\frac{dg}{df} < 0$ for all values of f). Then the functions $f(x)$ and $g(f(x))$ have derivatives that vanish at the same values of x.

Proof: by the chain rule,
$$\frac{d\, g(f(x))}{dx} = \frac{dg}{df} \cdot \frac{df}{dx}$$
Since $\frac{dg}{df}$ is never 0, we see that $\frac{d\, g(f(x))}{dx} = 0$ if and only if $\frac{df}{dx} = 0$.


Ex. 2 Let $X_1, \dots, X_n$ be a random sample of size n from the Poisson distribution
$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad \lambda \in [0, \infty)$$
Obtain the maximum likelihood estimator $\hat{\lambda}$.

The likelihood function is
$$L(\lambda \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = e^{-n\lambda}\, \frac{\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$

1) endpoint $\lambda = 0$: as long as at least one $x_i$ is non-zero, $L(0 \mid x_1, x_2, \dots, x_n) = 0$. If $\sum_{i=1}^{n} x_i = 0$ (all $x_i = 0$), then $L(0 \mid 0, 0, \dots, 0) = 1$.

2) derivative: let $a = \sum_{i=1}^{n} x_i$ and $b = \prod_{i=1}^{n} x_i!$. Then
$$\frac{dL(\lambda)}{d\lambda} = \frac{-n e^{-n\lambda} \lambda^a + e^{-n\lambda}\, a \lambda^{a-1}}{b} = 0 \;\Rightarrow\; -n\lambda + a = 0 \;\Rightarrow\; \lambda = a/n$$
with
$$L\!\left(\frac{a}{n} \,\Big|\, x_1, x_2, \dots, x_n\right) = \frac{e^{-a}}{b} \left(\frac{a}{n}\right)^{a}$$

Combining 1) and 2), we have
$$\hat{\lambda} = \begin{cases} 0 & \text{if } \sum_{i=1}^{n} x_i = 0 \\ \frac{1}{n}\sum_{i=1}^{n} x_i & \text{otherwise} \end{cases}$$
Again we see that the second case also includes the first. We can therefore encapsulate the maximum likelihood estimator as
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$$


Ex. 3 Let 𝑋1, … , 𝑋𝑛 be a random sample of size n from a normal population with standard deviation 𝜎 and unknown mean. Find the maximum likelihood estimator of 𝜇.

$$L(\mu \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i - \mu)^2 / 2\sigma^2}$$

1) there are no endpoints; $\mu$ can have any value

2) the derivative can be simplified by writing $x_i - \mu$ as $(x_i - \bar{x}) + (\bar{x} - \mu)$, where $\bar{x}$ is the sample average. Completing the square in the exponent, $L(\mu)$ becomes
$$L(\mu \mid x_1, x_2, \dots, x_n) = \frac{e^{-n(\bar{x} - \mu)^2 / 2\sigma^2}}{(2\pi\sigma^2)^{n/2}} \prod_{i=1}^{n} e^{-(x_i - \bar{x})^2 / 2\sigma^2} \cdot e^{-\sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - \mu)/\sigma^2}$$
Noting that $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, the last exponential on the RHS above is $e^0 = 1$. Thus $L(\mu)$ becomes
$$L(\mu \mid x_1, x_2, \dots, x_n) = \frac{e^{-n(\bar{x} - \mu)^2 / 2\sigma^2}}{(2\pi\sigma^2)^{n/2}} \prod_{i=1}^{n} e^{-(x_i - \bar{x})^2 / 2\sigma^2} = a\, e^{-n(\bar{x} - \mu)^2 / 2\sigma^2}$$
where a does not depend on $\mu$. Then
$$\frac{dL(\mu)}{d\mu} = a \cdot \frac{-n}{2\sigma^2} \cdot \left( -2(\bar{x} - \mu) \right) e^{-n(\bar{x} - \mu)^2 / 2\sigma^2} = 0 \;\Rightarrow\; \hat{\mu} = \bar{x}$$


Ex. 4 Let $X_1, \dots, X_n$ be a random sample of size n from a normal population with mean $\mu$ and unknown variance. Find the maximum likelihood estimator of $\sigma^2$.

$$L(\sigma^2 \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i - \mu)^2 / 2\sigma^2} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}$$

1) endpoints: as $\sigma^2 \to 0$, the Gaussian becomes a delta function of infinite height and no width at $\mu$; as long as the $x_i$ are not all equal to $\mu$, the likelihood tends to 0 there

2) derivative: let $a = \frac{1}{2}\sum_{i=1}^{n} (x_i - \mu)^2$, $b = \frac{1}{(2\pi)^{n/2}}$, and $y = \sigma^2$. Then
$$\frac{dL(y)}{dy} = -\frac{n}{2}\, b\, y^{-\frac{n+2}{2}}\, e^{-a/y} + b\, y^{-\frac{n}{2}}\, a\, y^{-2}\, e^{-a/y} = 0 \;\Rightarrow\; y = \frac{2a}{n}$$
Thus from 1) and 2) we see that the maximum likelihood estimator for $\sigma^2$ is
$$\widehat{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
(as long as $\widehat{\sigma^2}$ is not zero).

A similar computation shows that the maximum likelihood estimator for the standard deviation $\sigma$ is
$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$


Theorem: Let $\hat{\theta}$ be a maximum likelihood estimator for $\theta$. If $g(\theta)$ is a continuous, one-to-one function of $\theta$, then $g(\hat{\theta})$ is a maximum likelihood estimator for $g(\theta)$, i.e. $\widehat{g(\theta)} = g(\hat{\theta})$.

e.g. The number of defective hard drives produced per day is a RV described by a Poisson distribution. The counts for 10 days are 7 3 1 2 4 1 2 3 1 2. Obtain a maximum likelihood estimate for the mean number of defective drives per day.

The maximum likelihood estimator for $\lambda$ gives
$$\hat{\lambda} = \bar{x} = \frac{26}{10} = 2.6$$
What is the maximum likelihood estimate of the probability of getting a) 0 defective drives per day, b) 1 defective drive per day, c) 0 or 1 defective drives per day?

a) $f(0; \lambda)$ gives the probability of getting 0 defective drives per day. Therefore
$$f(0; \hat{\lambda}) = \frac{\hat{\lambda}^0 e^{-\hat{\lambda}}}{0!} = e^{-2.6}$$
is the maximum likelihood estimate of getting 0 defective drives per day.

b) $f(1; \hat{\lambda}) = \frac{\hat{\lambda}^1 e^{-\hat{\lambda}}}{1!} = 2.6\, e^{-2.6}$

c) $e^{-2.6} + 2.6\, e^{-2.6} = 3.6\, e^{-2.6}$


Ex. 5 In example 3 we found the MLE $\hat{\mu}$ for a normal population for which $\sigma^2$ is known. In example 4 we found the MLE $\widehat{\sigma^2}$ for a normal population for which $\mu$ is known. What are the MLEs for $\mu$, $\sigma^2$ (and $\sigma$) for a normal population in which neither is known? We form the likelihood function
$$L(\mu, \sigma^2 \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i - \mu)^2 / 2\sigma^2}$$
which is now a function of two unknown parameters. To maximize over both $\mu$ and $\sigma^2$ we require the solutions $\hat{\mu}$ and $\widehat{\sigma^2}$ which simultaneously cause the two partial derivatives to vanish, i.e.
$$\frac{\partial L(\mu, \sigma^2)}{\partial \mu} = 0 \quad\text{and}\quad \frac{\partial L(\mu, \sigma^2)}{\partial \sigma^2} = 0$$
The computations proceed similarly to those in examples 3 and 4, giving the results
$$\hat{\mu} = \bar{x}, \qquad \widehat{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}, \qquad \hat{\sigma} = \sqrt{\widehat{\sigma^2}}$$
(The result for $\hat{\mu}$ is unchanged from example 3, but now $\widehat{\sigma^2}$ depends on $\bar{x}$ and not on $\mu$.)


e.g. The following 15 values came from a random sample of a normal population: 5.57 5.76 4.18 4.64 7.02 6.62 6.33 7.24 5.57 7.89 4.67 7.24 6.43 5.59 5.39. a) Find the maximum likelihood estimate (MLE) for the mean. b) Find the MLE for the variance. c) Find the MLE for the coefficient of variation $\sigma/\mu$.

a) $\hat{\mu} = \bar{x} = \frac{90.14}{15} = 6.009$

b) $\widehat{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{16.2631}{15} = 1.084$

c) $\widehat{\left(\frac{\sigma}{\mu}\right)} = \frac{\hat{\sigma}}{\hat{\mu}} = \frac{\sqrt{1.084}}{6.009} = 0.173$
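A sketch of the same computation in Python; note that `ddof=0` in numpy's `var`/`std` gives the divide-by-n MLE forms rather than the divide-by-(n−1) sample variance:

```python
import numpy as np

x = np.array([5.57, 5.76, 4.18, 4.64, 7.02, 6.62, 6.33, 7.24,
              5.57, 7.89, 4.67, 7.24, 6.43, 5.59, 5.39])

mu_hat = x.mean()                 # MLE of the mean      -> 6.009
var_hat = x.var(ddof=0)           # MLE of the variance  -> 1.084
cv_hat = x.std(ddof=0) / mu_hat   # MLE of sigma/mu      -> 0.173
print(mu_hat, var_hat, cv_hat)
```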


7.4 Hypothesis Testing

In 7.2 we learned how to establish, based upon a random sample, a confidence interval concerning where the true value of a parameter lies. We worked mostly with the mean $\mu$ and its unbiased estimator $\bar{X}$:
$$\bar{x} - E_{\alpha/2} \le \mu \le \bar{x} + E_{\alpha/2}$$
In 7.3 we learned how to make a maximum likelihood estimate of the value of a parameter based upon a random sample: maximize, over all values of $\theta$, the likelihood function
$$L(\theta \mid x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$
In this section we develop hypothesis testing, which begins with a hypothesis statement regarding the value of a parameter $\theta$. We will learn to state our hypothesis in terms of a so-called "null hypothesis" which is either true or false.

We compute a value for the parameter estimator $\hat{\theta}$ using a random sample. Based upon the observed value of $\hat{\theta}$, we will accept the null hypothesis if the value $\hat{\theta}$ lies within a "certain distance" of the hypothesized value $\theta$; otherwise we reject the null hypothesis. This "distance" is computed relative to a previously chosen probability $\alpha$.


e.g. A manufacturer claims that the drying time of a "fast drying paint" is 20 minutes on average (this is the hypothesis on the mean value $\mu$). Assume we know that the standard deviation $\sigma$ of the drying times for this new paint is 2.4 minutes. We test the claim by testing the drying time of paint from 36 different 1-gallon cans of the manufacturer's paint (a random sample of $n = 36$). We will reject the manufacturer's claim if the mean estimator $\bar{X}$ from the random sample gives a value $\bar{x} > 20.75$ minutes (the "certain distance" is 0.75 minutes greater than the hypothesized mean). Since the random sample is "large", we know that the z-score
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is a RV governed (to good approximation) by the standard normal distribution.


[Figure: standard normal distribution with $z = 0$ at $\mu = 20$, $z = 1.875$ at $\bar{x} = 20.75$, and right-tail area $\alpha = 0.0304$]

Suppose the manufacturer's claim (the hypothesis) is true. Then the rejection distance (20.75 minutes) that we have developed corresponds to a z-score of
$$z = \frac{20.75 - 20}{2.4/\sqrt{36}} = 1.875$$
From Table 3, we see that the probability of getting a value of $\bar{X}$ greater than 20.75 is 3%. If the hypothesis is true, we have a 3% probability of getting a sample mean larger than 20.75, which means that we will make the correct decision regarding the hypothesis (correctly accept it as true) in 97% of our random samples, but 3% of the time we will make an incorrect conclusion (incorrectly reject it as false).


Suppose the manufacturer's claim (the hypothesis) is false. Now, if it is false, we have no idea what the true mean is. Suppose, just for example, that the true mean is $\mu = 21$ (an alternate hypothesis). Based on this, our "certain distance", 20.75 minutes, really corresponds to a z-score of
$$z = \frac{20.75 - 21}{2.4/\sqrt{36}} = -0.625$$
[Figure: standard normal distribution with $z = 0$ at $\mu = 21$, $z = -0.625$ at $\bar{x} = 20.75$, and left-tail area $\beta = 0.266$]

From Table 3, we see that the probability of getting a value of $\bar{X}$ smaller than 20.75 is 26.6%. If the hypothesis is false, and the mean is really 21, we have a 26.6% probability of getting a sample mean smaller than 20.75. This means that we will make the correct decision regarding the hypothesis (correctly reject it as false) in 73.4% of our random samples, but 26.6% of the time we will make an incorrect conclusion (incorrectly accept it as true).
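Both tail probabilities in this example, α under the null mean of 20 and β under the alternate mean of 21, can be reproduced with scipy; a minimal sketch:

```python
from math import sqrt
from scipy.stats import norm

sigma_xbar = 2.4 / sqrt(36)               # std dev of X-bar for n = 36

alpha = 1 - norm.cdf(20.75, loc=20, scale=sigma_xbar)   # P(X-bar > 20.75 | mu = 20)
beta = norm.cdf(20.75, loc=21, scale=sigma_xbar)        # P(X-bar < 20.75 | mu = 21)
print(round(alpha, 4), round(beta, 3))    # -> 0.0304 0.266
```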


Refer to the hypothesis as H. In our hypothesis test, we face the following 4 possibilities, diagrammed in the table below:

1a. H is true; our random sample leads us to conclude H is true (accept H)
1b. H is true; our random sample leads us to conclude H is false (reject H)
2a. H is false; our random sample leads us to conclude H is true (accept H)
2b. H is false; our random sample leads us to conclude H is false (reject H)

                      Conclusion from random sample
"Truth" about H    Accept H                        Reject H
H is true          Correct decision                Type I error, probability α
H is false         Type II error, probability β    Correct decision

Case 1b is known as a Type I error, which occurs with some probability $\alpha$. (Thus case 1a, correctly determining that the hypothesis is true, occurs with probability $1 - \alpha$.)

Case 2a is known as a Type II error, which occurs with some probability $\beta$. (Thus case 2b, correctly determining that the hypothesis is false, occurs with probability $1 - \beta$.)


In hypothesis testing, we have the ability to set $\alpha$ by careful statement of the hypothesis. As we saw in our example, we can also specify $\beta$ if we are willing to specify an appropriately stated alternate hypothesis.

7.5 The Null Hypothesis

To set $\alpha$, we need to be able to formulate the hypothesis H in such a way that we can measure a "distance" away from it and determine a z-score (t-score, F-value, $\chi^2$-value) corresponding to $\alpha$. In the example in 7.4, H was "$\mu = 20$"; thus 20.75 had a z-score of 1.875, which corresponded to $\alpha = 0.03$. If the hypothesis H had been "$\mu < 20$", then 20.75 would not correspond to a fixed z-score relative to (the unknown) $\mu$. It is important to state H in such a way that we can determine a z-score (t-score, F-value, $\chi^2$-value) corresponding to $\alpha$ relative to H.


One can often take a problematic hypothesis and restate it in terms of the opposite of what we desire to prove. e.g. we want to show method A is more efficient than method B: we instead hypothesize that they are equally efficient and test that. e.g. we want to show that method A is more expensive than method B: we instead hypothesize they are equally expensive and test that. e.g. we want to show that method A leads to better yield strengths than method B: we instead hypothesize that the yield strengths are equal and test that.


In the examples above, we derived the hypothesis to test as a statement of no difference between the two methods. "No difference" hypotheses are referred to as null hypotheses and are usually denoted by $H_0$. It has become usual practice to refer to the hypothesis that is being directly tested as the null hypothesis.

Aside: The textbook notes that the criminal court of many countries approaches a trial with the null hypothesis that a defendant is innocent. Proving guilt requires that a condition of “beyond reasonable doubt” be met, which is akin to setting a small value for 𝛼 so that the probability of a type I error (innocent yet found guilty) is small. Failure to show guilt beyond reasonable doubt in a trial (= a single random sample) leads to an acceptance of the null hypothesis (innocent). Showing guilt beyond reasonable doubt in a trial (= a single random sample) leads to a rejection of the null hypothesis (innocent) and a declaration of guilty.

- Criminal procedure requires that the outcome from only a single random sample (a single trial) be used to test the null hypothesis.

Note, though, that the criminal code has a process by which a random sample (a trial) can be declared invalid (a “mistrial”) – equivalent to determining that the sample was, in fact, biased.


Hypothesis testing requires the following 5 steps.

1. Formulate a null hypothesis $H_0$, and formulate an appropriate alternative hypothesis to be accepted when the null hypothesis is rejected.

In our drying time example, $H_0$ was $\mu = 20$. The alternate hypothesis, which we accept if $H_0$ is rejected, is $\mu > 20$. This is an example of a one-sided alternative (a.k.a. a one-tailed test), where the alternate to "is equal to" is "is greater than". The other example of a one-sided alternative is when the alternate to "is equal to" is "is less than".

[Figure: one-sided alternative testing; a rejection region of probability $\alpha$ in one tail ("reject $H_0$, accept $H_1$"), with the rest of the distribution marked "accept $H_0$"]


We may also choose to employ a two-sided alternative (a.k.a. a two-tailed test), where the alternate to "is equal to" is "is less than or greater than".

[Figure: two-sided alternative testing; rejection regions of probability $\alpha/2$ in each tail ("reject $H_0$, accept $H_1$"), with the center marked "accept $H_0$"]

To reject $H_0$ with probability $\alpha$ in favor of a two-sided alternative, it is necessary to set the rejection range to correspond to area $\alpha/2$ in each tail.


The choice of alternative hypothesis depends on what we wish to show.

2. Specify the probability $\alpha$ of a Type I error. If possible, specify the probability $\beta$ of the Type II error for the alternate hypothesis.

The probability $\alpha$ of a Type I error is referred to as the level of significance of the test. Common values for $\alpha$ are 0.05 and 0.01. (Note: there is a "coupling" between Type I and Type II errors; setting $\alpha$ too small may increase the probability of making Type II errors.)

How to test $H_0$: $\mu \le 20$? Note that as $\mu$ decreases, the level of significance corresponding to our cut-off of 20.75 decreases (see the figures below). Therefore the test on $\mu = 20$ represents the worst-case (largest value) level of significance. Thus, to test an inequality for $H_0$, we instead test the worst-case equality (the maximum probability of a Type I error).

[Figures: standard normal tail areas for the cut-off $\bar{x} = 20.75$: with $\mu = 20$, $\alpha = 0.0304$; with $\mu = 19.8$, $\alpha = 0.0088$]


3. Based upon the sampling distribution of the appropriate statistic, construct a criterion (e.g. choose a z-score, t-value, F-value, or $\chi^2$-value) for testing the null hypothesis against the specified alternative.

4. Calculate the value of the statistic from the random sample (the data).

5. Decide whether to accept or reject the null hypothesis.

Note: this objective approach incorporates some level of subjective decision making: 1. the sample size n; 2. the choice of the level of significance $\alpha$ and the choice of $\beta$. The choices of $\alpha$ and $\beta$ are particularly important since they directly control the probability of committing an error in the test. The acceptable level of error is a subjective decision reflecting "a comfort level in exposure to risk". For companies, this risk exposure often has financial consequences; the level of significance may be set according to the amount of financial loss to which the company is willing to be exposed in case of Type I or Type II errors. In many areas of public health and safety, $\alpha$ and $\beta$ are determined by government regulation. e.g. potable water: a contaminant is allowed only below a specified concentration set by a regulatory agency.


Because the probability of falsely rejecting the null hypothesis is controlled (you get to choose $\alpha$), the null hypothesis is retained unless the data strongly contradict it. Therefore, if we want to show that the data strongly support our claim, we should show that the data strongly contradict the opposite of our claim. Consequently:

When the goal of an experiment is to establish an assertion, the negation of the assertion should be taken as the null hypothesis, and the assertion itself becomes the alternative hypothesis. e.g. if we want to show two things are different, let the null hypothesis assume they are the same. Then, if the data strongly contradict the assumption that they are the same, you have great confidence that they are in fact different.

By similar reasoning, if causing a certain error is critical (e.g. may lead to death), the null hypothesis should be chosen such that the critical error is assigned to the Type I error category.

Usual notation:
$H_0$: the null hypothesis (often the negation of the claim to be established)
$H_1$: the alternative hypothesis (often the claim to be established)
Type I error: rejection of $H_0$ when $H_0$ is true
Type II error: acceptance of $H_0$ when $H_1$ is true
$\alpha$: probability of making a Type I error (the level of significance)
$\beta$: probability of making a Type II error


7.6 Hypotheses concerning one mean

e.g. Desired goal: establish, at a level of significance of 0.05, that the mean of the thermal conductivity (a RV with $\sigma = 0.01$) of cement brick is not 0.34 (as claimed), using a random sample ($n = 35$) of bricks.

1. Null hypothesis: $\mu = 0.34$. Alternate hypothesis: $\mu \ne 0.34$ (two-tailed test).

2. Level of significance: 0.05. Since this is a 2-tailed test, we want half the probability ($\alpha/2 = 0.025$) to appear in each tail.

3. Test on the statistic $\bar{X}$ in its standardized representation
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
A value of $\alpha/2 = 0.025$ corresponds to the z-value $z_{0.025} = 1.96$. Thus if the value $\bar{x}$ obtained from the random sample gives a z-score $z < -1.96$ or $z > 1.96$, then we can reject the null hypothesis at the significance level 0.05 and declare in favor of the alternate hypothesis.

Note: if the sample size is large and $\sigma$ unknown, we can approximate the standardized representation for the statistic $\bar{X}$ by
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$


4. Calculate $\bar{x}$ from the random sample. Assume we get $\bar{x} = 0.343$. This corresponds to a z-value
$$z = \frac{0.343 - 0.34}{0.01/\sqrt{35}} = 1.77$$
5. As the positive z-value does NOT exceed $z_{0.025} = 1.96$, we must accept the null hypothesis at the level of significance of 0.05. Thus, at level of significance 0.05, we cannot claim the manufacturer's claim is wrong.

P-value: from the measured z-value 1.77, we can compute the tail probability: the probability of getting a difference between $\bar{x}$ and $\mu$ greater than that observed in the random sample. As this was a 2-tailed test, this is the area under the standard normal distribution below $-1.77$ plus the area above 1.77. From Table 3, this area is $2\,P(Z < -1.77) = 0.0768$. This P-value of 7.68%, of course, exceeds the 5% significance level, but gives some indication of "how close" we came to rejecting the null hypothesis.
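This whole two-tailed z-test fits in a few lines of Python; a sketch (the helper `ztest_two_tailed` is an assumed name, not a library routine):

```python
from math import sqrt
from scipy.stats import norm

def ztest_two_tailed(xbar, mu0, sigma, n, alpha=0.05):
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * norm.cdf(-abs(z))          # both tails
    return z, p_value, p_value < alpha       # True means "reject H0"

print(ztest_two_tailed(0.343, 0.34, 0.01, 35))
# -> (1.7748..., 0.0759..., False): accept H0 at the 0.05 level
```

(Table 3 rounds z to 1.77, which is why the notes quote 0.0768 rather than the unrounded 0.0759.)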


e.g. Surface hardness of vinyl flooring is a RV with mean 4.5 and standard deviation 1.5. An evening shift is added to the production line, and a random sample of hardness measurements of 25 specimens produced by the second shift is taken to determine whether the mean surface hardness is being maintained on the second shift.

1. Null hypothesis: $\mu = 4.5$. Alternate hypothesis: $\mu \ne 4.5$ (two-tailed test).

2. Level of significance: 0.05. Since this is a 2-tailed test, we want half the probability ($\alpha/2$) to appear in each tail.

3. Test on the statistic $\bar{X}$ in its standardized representation
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
A value of $\alpha/2 = 0.025$ corresponds to the z-value $z_{0.025} = 1.96$. Thus if the value $\bar{x}$ obtained from the random sample gives a z-score $z < -1.96$ or $z > 1.96$, then we can reject the null hypothesis at the significance level 0.05 and declare in favor of the alternate hypothesis.

4. Calculate z from the random sample measurement of $\bar{x} = 3.9$. This corresponds to a z-value
$$z = \frac{3.9 - 4.5}{1.5/\sqrt{25}} = -2.0$$


5. As the observed z-value is less than $-z_{0.025} = -1.96$, we must reject the null hypothesis at the level of significance of 0.05. Thus, at level of significance 0.05, the second shift is not producing flooring at the desired average surface hardness.

P-value: the tail area (2-tailed test) corresponding to the observed $z = -2.0$ is $2\,P(Z < -2.0) = 2 \cdot 0.0228 = 0.0456$, less than the 0.05 level of significance.

The P-value can be defined for any test statistic in any hypothesis testing problem:

The P-value is the probability of obtaining a value for the test statistic that is as extreme or more extreme than the value actually observed. The probability is calculated under the null hypothesis.


e.g. The average lifetime of a model of tire is stated to be at least 28,000 miles. A random sample of 40 tires is tested, producing a sample mean lifetime of 27,463 miles with a sample standard deviation of 1,348 miles. What conclusion can be reached if the probability of a Type I error is to be less than 0.01?

1. Null hypothesis: $\mu \ge 28{,}000$; we test on the worst case $\mu = 28{,}000$. Alternate hypothesis: $\mu < 28{,}000$ (one-tailed test).

2. Level of significance: 0.01. Since this is a 1-tailed test, we assign all the probability $\alpha$ to the left-hand tail.

3. Test on the statistic $\bar{X}$ in its approximate standardized representation
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
A value of $\alpha = 0.01$ corresponds to the z-value $-z_{0.01} = -2.33$. Thus if the value $\bar{x}$ obtained from the random sample gives a value $z < -2.33$, then we can reject the null hypothesis at the significance level 0.01 and declare in favor of the alternate hypothesis.

4. Calculating z from the random sample measurements of $\bar{x}$ and s gives
$$z = \frac{27{,}463 - 28{,}000}{1{,}348/\sqrt{40}} = -2.52 \qquad (P\text{-value: } 0.0059)$$
5. As the observed z-value is less than $-z_{0.01}$, we must reject the null hypothesis at the level of significance of 0.01. Thus, at level of significance 0.01, the tire lifetime is less than 28,000 miles.


If the sample size is small and $\sigma$ is unknown, the Z representation for $\bar{X}$ cannot be used. If the sample comes from an (approximately) normal population, we can use the t representation for $\bar{X}$,
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
which we know is governed by the t-distribution. In this case we need to compare against $t_\alpha$ (one-tailed) or $t_{\alpha/2}$ (two-tailed) values based upon $n - 1$ df.

e.g. Specifications for rope call for an average breaking strength of 180 pounds. A random sample of 5 pieces is tested, yielding a mean breaking strength of 169.5 pounds with a standard deviation of 5.7 pounds. Test the specification at the level of significance of 0.01.

1. Null hypothesis: $\mu = 180$. Alternate hypothesis: $\mu < 180$ (one-tailed test! We don't care if the mean breaking strength exceeds 180.)

2. Level of significance: 0.01 (assigned to the left-hand tail).

3. Test on the t-value for df = 4. A value of $\alpha = 0.01$ corresponds to $-t_{0.01} = -3.747$.

4. Calculating t from the random sample measurements of $\bar{x}$ and s gives
$$t = \frac{169.5 - 180}{5.7/\sqrt{5}} = -4.12$$
5. As the observed t-value is less than $-t_{0.01}$, we must reject the null hypothesis at the level of significance of 0.01. Thus, at level of significance 0.01, the breaking strength is below specification.
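scipy's `ttest_1samp` wants the raw data, which we don't have here, so a sketch that computes the t statistic and one-tailed P-value from the summary statistics directly:

```python
from math import sqrt
from scipy.stats import t

xbar, mu0, s, n = 169.5, 180.0, 5.7, 5
t_stat = (xbar - mu0) / (s / sqrt(n))     # -> -4.12
p_one_tailed = t.cdf(t_stat, df=n - 1)    # left-tail P-value
crit = t.ppf(0.01, df=n - 1)              # -t_{0.01} = -3.747

print(round(t_stat, 2), round(p_one_tailed, 4), t_stat < crit)
# -> -4.12 0.0073 True: reject H0 at the 0.01 level
```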


7.7 Relation between 2-Tailed Hypothesis Tests and Confidence Intervals

Consider the level of significance $\alpha$ in a two-tailed test of the null hypothesis $H_0$: $\mu = \mu_0$ vs $H_1$: $\mu \ne \mu_0$, using the t-value for $\bar{X}$. The critical region (values of t for which we reject the null hypothesis) is
$$\left| \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \right| = |t| \ge t_{\alpha/2}$$
Stated equivalently, the acceptance region (values of t for which we accept the null hypothesis) is
$$\left| \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \right| < t_{\alpha/2}$$
i.e.
$$\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} < \mu_0 < \bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
This is exactly the $(1-\alpha)100\%$ confidence interval placed on $\mu_0$ by the random sample measurement! Thus the null hypothesis will not be rejected at level of significance $\alpha$ (in a two-tailed test) if $\mu_0$ lies within the $(1-\alpha)100\%$ confidence interval established by the random measurement.


e.g. The mean weight loss of a random sample of $n = 16$ grinding balls (used in slurries) is 3.42 grams, with a standard deviation of 0.68 grams. Construct a 95% confidence interval, assuming the underlying population is normal, and test the hypotheses
$$H_0: \mu = 3.7 \text{ vs } H_1: \mu \ne 3.7 \qquad\text{and}\qquad H_0: \mu = 3.0 \text{ vs } H_1: \mu \ne 3.0$$
The 95% confidence interval is
$$\bar{x} - t_{0.025} \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{0.025} \cdot \frac{s}{\sqrt{n}}$$
$$3.42 - 2.131 \cdot \frac{0.68}{\sqrt{16}} \le \mu \le 3.42 + 2.131 \cdot \frac{0.68}{\sqrt{16}}$$
$$3.06 < \mu < 3.78$$
As $\mu = 3.7$ lies within the 95% confidence limits, we must accept the first null hypothesis $H_0$: $\mu = 3.7$ at the level of significance of 0.05. As $\mu = 3.0$ lies outside the 95% confidence limits, we must reject the second null hypothesis $H_0$: $\mu = 3.0$ at the level of significance of 0.05.
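The duality is easy to see in code; a sketch reusing the t-based interval (the helper name `t_interval` is my own):

```python
from math import sqrt
from scipy.stats import t

def t_interval(xbar, s, n, alpha):
    E = t.ppf(1 - alpha / 2, df=n - 1) * s / sqrt(n)
    return xbar - E, xbar + E

lo, hi = t_interval(3.42, 0.68, 16, 0.05)    # -> (3.06, 3.78)
for mu0 in (3.7, 3.0):
    verdict = "accept H0" if lo <= mu0 <= hi else "reject H0"
    print(mu0, verdict)   # 3.7 accept H0; 3.0 reject H0
```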
