chapter 7site.iugaza.edu.ps/nbarakat/files/2010/02/ch7.pdf · 1 chapter 7 inference for...


Post on 08-Oct-2020


Chapter 7: Inference for Distributions

§7.1: Inference for the Mean of a Population

Statistical inference involves using data collected in a sample to make statements about some population parameter.

Involves:

1. Estimation (confidence intervals)

2. Tests of significance

Assumptions

1. Simple random sample

2. Either (1) the population is normal, or (2) the sample size is large enough for the central limit theorem to apply

3. The population standard deviation σ is known.

Then the sampling distribution of X̄ is

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

and the Z-score transformation gives

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$


General Concepts

Now suppose that the population standard deviation σ is unknown.

We still assume that we have a simple random sample and that the population is (at least approximately) normally distributed.

When σ is unknown, the Z procedures discussed in Chapter 6 are not valid.

If the population standard deviation σ is unknown, then instead of using the standard normal distribution Z, we use a similar distribution called the Student’s t-distribution.

t-Distribution

Assumptions for t-distribution:

1. Simple random sample

2. Near normal population

i. If the sample size is very small (less than 15), then the distribution must be very close to normal.

ii. If the sample size is at least 15, the t procedures can be used except in cases in which the distribution is heavily skewed.

iii. For large samples (n > 40) the t-distribution is valid even for heavily skewed distributions (central limit theorem)

Properties of t-Distribution

• The t distribution is a family of similar probability distributions.

• A specific t distribution depends on a parameter known as the degrees of freedom.

• A t distribution with more degrees of freedom has less dispersion.

• The mean of the t distribution is zero.

• As the number of degrees of freedom increases, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.


t-Distribution

The t-statistic is

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

The sampling distribution of this statistic is called the Student’s t-distribution with n − 1 degrees of freedom.

The difference between the Z statistic and the t statistic is knowledge of the population standard deviation σ. If σ is known, then so too is the standard deviation of X̄, σ/√n, and hence use of the Z statistic is possible.

If σ is unknown, then so is the standard deviation of X̄. Hence this standard deviation must be estimated.

When the standard deviation of a statistic is estimated from sample data, the estimate is called the standard error of the statistic. In this case the standard error is

$$SE(\bar{X}) = \frac{s}{\sqrt{n}}$$

Example

a. n = 12 p = .05

df = n - 1 = 12 - 1 = 11

t11 = 1.796

b. n = 28 p = .01 t = ???

df = n - 1 = 28 - 1 = 27

t27 = 2.473

c. n = 68 p = .05 t= ???

df = n - 1 = 68 - 1 = 67 (Use df = 60 or df = 80)

t60 = 1.671 t80 = 1.664


Confidence Interval for µ

Situation: The population standard deviation σ is unknown

Assumptions:

1. Simple random sample

2. The population is near normal or the sample size is large enough for the central limit theorem to apply

A 100C% confidence interval is given by

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where t* = the t value providing an area of (1 − C)/2 in the upper tail of a t distribution with n − 1 degrees of freedom

s = the sample standard deviation

Example: Apartment Rents

A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 10 one-bedroom units within a half-mile of campus resulted in a sample mean of $550 per month and a sample standard deviation of $60.

Let us provide a 95% confidence interval estimate of the mean rent per month for the population of one-bedroom units within a half-mile of campus. We’ll assume this population to be normally distributed.

t value: df = n − 1 = 10 − 1 = 9, t* = 2.262.

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 550 \pm 2.262 \frac{60}{\sqrt{10}} = 550 \pm 42.92$$

or $507.08 to $592.92.

We are 95% confident that the mean rent per month for the population of one-bedroom units within a half-mile of campus is between $507.08 and $592.92.
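The interval arithmetic for the apartment rents example can be checked with a short script. This is only a sketch: the helper name `t_confidence_interval` is made up for illustration, and the critical value t* = 2.262 is taken from the t table as in the example rather than computed.

```python
import math

def t_confidence_interval(xbar, s, n, t_star):
    """Return (lower, upper) for xbar ± t* · s/√n.

    t_star must be looked up in a t table for df = n - 1;
    this hypothetical helper does not compute it.
    """
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Apartment rents: n = 10, sample mean 550, s = 60, t* = 2.262 (df = 9, 95%)
lower, upper = t_confidence_interval(550, 60, 10, 2.262)
margin = (upper - lower) / 2
print(f"margin of error = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The same helper applies to the exercises that follow, once the appropriate t* has been read from the table.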


Exercises

1) Nine workers, chosen at random from a work force in a factory, have a mean wage of $125 a week with a standard deviation of $12. Assuming that the distribution of all workers' wages is normal, find a 98% confidence interval for the mean wages of all workers.

2) The monthly incomes (in $1,000) from a random sample of faculty at a university are shown below.

3.0 4.0 6.0 3.0 5.0 5.0 6.0 8.0

Compute a 90% confidence interval for the mean of the population. Give your answer in dollars.

Test for µ

Situation: The population standard deviation σ is unknown

1. Hypotheses

Null hypothesis: H0: µ = µ0

Alternative hypothesis: Ha: µ > µ0, or Ha: µ < µ0, or Ha: µ ≠ µ0

2. Test Statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

which follows a t-distribution with n − 1 degrees of freedom.

3. P-Value

Upper one-sided test: Ha: µ > µ0, p-value = P(T > t)

Lower one-sided test: Ha: µ < µ0, p-value = P(T < t)

Two-sided test: Ha: µ ≠ µ0, p-value = 2P(T > |t|)

4. Conclusion

Reject H0 if the P-value < α


Example

a. Upper one-sided test, df = 8, t = 2.000

p-value = P( T8 > 2.000)

For p = .05, t8 = 1.860

For p = .025, t8 = 2.306

So .025 < p-value < .05

NOTE: Writing .05 < p-value < .025 is WRONG

Or, tcdf(2, 1E99, 8)=0.0403

Example

b. Lower one-sided test, df = 38, t = −1.53

df = 38 is not in the table, so use df = 40

p-value = P(T40 < −1.53) = P(T40 > 1.53)

For p = .10, t40 = 1.303

For p = .05, t40 = 1.684

So .05 < p-value < .10

Or, tcdf(1.53, 1E99, 40) = 0.0669

Example

c. Two-sided test, df = 15, t = −2.680

p-value = 2P(T15 > |−2.680|) = 2P(T15 > 2.680)

For p = .01, t15 = 2.602

For p = .005, t15 = 2.947

So .005 < P(T15 > 2.680) < .01

So .01 < p-value < .02

Or, 2*tcdf(2.680, 1E99, 15) = 0.0171
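When no calculator with tcdf is at hand, the tail areas in examples a to c can be approximated by integrating the t density numerically. The sketch below uses only the Python standard library; the function names `t_pdf` and `t_upper_tail` are illustrative, and Simpson's rule is one possible integration method, not the one a calculator necessarily uses.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0.

    Integrates the density on [0, t] with Simpson's rule (steps must be
    even) and subtracts the result from 0.5, using symmetry about zero.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_a = t_upper_tail(2.000, 8)        # a. upper one-sided, df = 8
p_b = t_upper_tail(1.53, 40)        # b. lower one-sided, by symmetry, df = 40
p_c = 2 * t_upper_tail(2.680, 15)   # c. two-sided, df = 15

print(f"a: {p_a:.4f}  b: {p_b:.4f}  c: {p_c:.4f}")
```

The printed values can be compared with the tcdf results quoted in the three examples.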


Example: Metro EMS

A major west coast city provides one of the most comprehensive emergency medical services in the world. Operating in a multiple hospital system with approximately 20 mobile medical units, the service goal is to respond to medical emergencies with a mean time of less than 4.8 minutes.

The director of medical services wants to test whether or not the service goal of less than 4.8 minutes is being achieved. He selected a sample of 20 emergency response times and found that the mean time is 4.6 minutes and the standard deviation 1.2 minutes. Test the claim at 0.05 level.

Example

Parameter of interest: µ = mean time before ambulance arrives

1. Hypotheses: H0: µ = 4.8 minutes, Ha: µ < 4.8 minutes

2. Test Statistic: n = 20, x̄ = 4.6, s = 1.2

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4.6 - 4.8}{1.2/\sqrt{20}} = -0.745$$

3. P-value: P(T19 < −0.745) = P(T19 > 0.745)

.20 < p-value < .25 (Exact = 0.2327)

4. Decision: Since p-value > .05, we fail to reject H0

There is insufficient evidence to conclude that the mean time before an ambulance arrives is less than 4.8 minutes.
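The four steps of the Metro EMS test can be traced in code. This is a rough sketch: `one_sample_t` is a hypothetical helper, and the p-value bracket (.20, .25) is copied from the t table reading in the example, not computed.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """t statistic for H0: mu = mu0 when sigma is unknown."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Metro EMS: n = 20, sample mean 4.6, s = 1.2, H0: mu = 4.8, Ha: mu < 4.8
t = one_sample_t(4.6, 4.8, 1.2, 20)
df = 20 - 1

# Bracket for P(T19 > 0.745) read from the t table (taken from the example)
p_low, p_high = 0.20, 0.25
alpha = 0.05

# The whole bracket lies above alpha, so H0 is not rejected
reject_h0 = p_high < alpha

print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```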

Example: Banana Prices

The average retail price for bananas in 1998 was 51¢ per pound, as reported by the U.S. Department of Agriculture in Food Cost Review. Recently, a random sample of 15 markets gave the following prices for bananas in cents per pound.

56 53 55 53 50 57 58 54 48 47 57 57 51 55 50

At the 0.05 level, can you conclude that the current mean retail price for bananas is different from the 1998 mean of 51¢ per pound?


Example

Parameter of interest: µ = mean retail price for bananas

1. Hypotheses: H0: µ = 51¢, Ha: µ ≠ 51¢

2. Test Statistic: n = 15, x̄ = 53.4, s = 3.5, df = 14

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{53.4 - 51}{3.5/\sqrt{15}} = 2.66$$

3. P-value = 2P(T14 > 2.66)

0.005 < P(T14 > 2.66) < 0.01, so 0.01 < p-value < 0.02 (Exact = 0.0187)

4. Decision: Since p-value < .05, we reject H0

There is sufficient evidence to conclude that the mean retail price for bananas is not 51¢.
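The summary statistics in the banana example come straight from the 15 listed prices, so they can be recomputed from the raw data. A minimal sketch with the standard `statistics` module follows; the variable names are chosen here for illustration.

```python
import math
import statistics

# Banana prices in cents per pound, from the 15 sampled markets
prices = [56, 53, 55, 53, 50, 57, 58, 54, 48, 47, 57, 57, 51, 55, 50]
mu0 = 51  # 1998 mean price under H0

n = len(prices)
xbar = statistics.mean(prices)
s = statistics.stdev(prices)            # sample standard deviation (n - 1 divisor)
t = (xbar - mu0) / (s / math.sqrt(n))   # one-sample t statistic
df = n - 1

print(f"n = {n}, mean = {xbar:.1f}, s = {s:.2f}, t = {t:.2f}, df = {df}")
```

Because the script keeps s unrounded, its t value can differ from the hand calculation in the last displayed digit.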

Exercises

1. An inspector from the Department of Weights and Measures weighs 10 one-pound samples of peanut butter; he finds their mean weight is 15.8 oz. with a standard deviation of 0.4 oz. Do the weights of packages of peanut butter sold by the shop from which these samples were taken differ from the announced weight?

2. In the past the average age of employees of a large corporation has been 40 years. Recently, the company has been hiring older individuals. In order to determine whether there has been an increase in the average age of all the employees, a sample of 25 employees was selected. The average age in the sample was 45 years with a standard deviation of 5 years. Assume the distribution of the population is normal. At α = .05, test to determine whether or not the mean age of all employees is significantly more than 40 years.


Matched Pairs t Procedures

• With a matched-sample design each sampled item provides a pair of data values.

• The goal is to compare the responses to the two treatments in a matched pairs design.

• The parameter µ is the mean difference in the two responses.

• The one-sample t procedure is applied to the observed differences.

Example: Express Deliveries

A Chicago-based firm has documents that must be quickly distributed to district offices throughout the U.S. The firm must decide between two delivery services, UPX (United Parcel Express) and INTEX (International Express), to transport its documents. In testing the delivery times of the two services, the firm sent two reports to a random sample of ten district offices with one report carried by UPX and the other report carried by INTEX.

Do the data that follow indicate a difference in mean delivery times for the two services?

Delivery Time (Hours)

District Office   UPX   INTEX   Difference
Seattle            32      25            7
Los Angeles        30      24            6
Boston             19      15            4
Cleveland          16      15            1
New York           15      13            2
Houston            18      15            3
Atlanta            14      15           -1
St. Louis          10       8            2
Milwaukee           7       9           -2
Denver             16      11            5


Let µ = the mean of the difference values for the two delivery services for the population of district offices.

1. Hypotheses: H0: µ = 0, Ha: µ ≠ 0

2. Test Statistic: x̄ = 2.7, s = 2.9, n = 10, df = 9

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.7 - 0}{2.9/\sqrt{10}} = 2.94$$

3. P-value = 2P(T9 > 2.94)

0.005 < P(T9 > 2.94) < 0.01, so 0.01 < P-value < 0.02 (Exact = 0.0165)

4. Conclusion

P-value < 0.05, so reject H0.

There is a significant difference between the mean delivery times for the two services.
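The matched pairs procedure reduces to a one-sample t test on the differences, which is easy to see in code. The sketch below recomputes the Express Deliveries statistics from the delivery-time table; the list names are illustrative only.

```python
import math
import statistics

# Delivery times in hours for the ten district offices (same order in both lists)
upx   = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
intex = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

# Step 1: reduce each matched pair to a single difference
diffs = [u - i for u, i in zip(upx, intex)]

# Step 2: apply the one-sample t procedure to the differences (H0: mu = 0)
n = len(diffs)
dbar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t = (dbar - 0) / (s_d / math.sqrt(n))
df = n - 1

print(f"differences = {diffs}")
print(f"mean = {dbar:.1f}, s = {s_d:.2f}, t = {t:.2f}, df = {df}")
```

The hand calculation rounds s to 2.9 before dividing, so the script's t agrees with 2.94 only to about two decimal places.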

Exercise

The following data present the number of computer units sold per day by a sample of 6 salespersons before and after a bonus plan was implemented. At the 0.05 level of significance, test to see if the bonus plan was effective. That is, did the bonus plan actually increase sales?

Salesperson   1   2   3   4   5    6
Before        3   7   6   8   7    9
After         4   8   5   7   8   12

t procedures

Robustness: A confidence interval or hypothesis test is said to be robust if the confidence level or P-value does not change very much when the assumptions of the procedure are violated.

Using the t Procedures:

• Except in small samples, the assumption of an SRS from the population is more important than the assumption that the population distribution is normal.

• Sample size is less than 15: use t procedures if the data are close to normal.

• Sample size is at least 15: the t procedures can be used except in the presence of strong outliers or strong skewness.

• Sample size is large (n > 40): t procedures can be used even for clearly skewed distributions.


§7.2: Comparing Two Means

Now suppose we have two independent populations, and of interest is to make statistical inferences about the difference between the two population means: µ1 − µ2

Example: Suppose one population consists of all male students, and the second population consists of all female students.

We could be interested in making inferences about the difference between the mean IQ of male students and the mean IQ of female students.

Point Estimate

We take a simple random sample of n1 subjects from the first population (in our example a sample of n1 males) and an independent simple random sample of n2 subjects from the second population (in our example a sample of n2 females).

A point estimate of µ1 − µ2 is the difference in the sample means:

$$\bar{x}_1 - \bar{x}_2$$

Sampling Distribution

Assumptions:

(1) Suppose we have two independent simple random samples.

(2) (i) Either both populations are normally distributed: X1 ~ N(µ1, σ1) and X2 ~ N(µ2, σ2)

or (ii) The populations are possibly nonnormal but both sample sizes are large enough such that the central limit theorem applies


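Sampling Distribution

This assumes σ1 and σ2 are both known.

The sampling distribution for the difference in the sample means, x̄1 − x̄2, is approximately normal with mean µ1 − µ2 and standard deviation

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

that is,

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

Z-Score Transformation

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$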

Confidence Interval for µ1 − µ2

Suppose we want to estimate the difference between the population means: µ1 − µ2

If the assumptions are satisfied, then a 100C% confidence interval for µ1 − µ2 is given by

$$(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Example

If a random sample of 50 non-smokers has a mean life of 76 years and a random sample of 65 smokers has a mean life of 68 years, find a 95% confidence interval for the difference in mean lifetime between non-smokers and smokers. Assume that the standard deviations for non-smokers and smokers are 8 and 9 years, respectively.

µ1 = mean lifetime of non-smokers and µ2 = mean lifetime of smokers

n1 = 50, x̄1 = 76, σ1 = 8

n2 = 65, x̄2 = 68, σ2 = 9

$$(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (76 - 68) \pm 1.96 \sqrt{\frac{8^2}{50} + \frac{9^2}{65}} = 8 \pm 3.12$$

(4.88, 11.12) years
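As a check on the smokers example, the two-sample z interval can be evaluated directly. This is a sketch under the example's assumptions (known σ1 and σ2, z* = 1.96 for 95% confidence); the function name `two_sample_z_interval` is hypothetical.

```python
import math

def two_sample_z_interval(x1, x2, sigma1, sigma2, n1, n2, z_star):
    """CI for mu1 - mu2 when both population standard deviations are known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # std. dev. of x1bar - x2bar
    margin = z_star * se
    diff = x1 - x2
    return diff - margin, diff + margin

# Non-smokers: n1 = 50, mean 76, sigma1 = 8; smokers: n2 = 65, mean 68, sigma2 = 9
lower, upper = two_sample_z_interval(76, 68, 8, 9, 50, 65, 1.96)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f}) years")
```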


Test for µ1 − µ2

We hypothesize that the difference between the population means equals some specified value µ0 and we use data from our samples to test whether this value is reasonable or whether the mean difference is actually greater than µ0, less than µ0, or not equal to µ0.

1. Hypotheses

Null Hypothesis: H0: µ1 − µ2 = µ0

Alternative Hypothesis: Ha: µ1 − µ2 > µ0, or Ha: µ1 − µ2 < µ0, or Ha: µ1 − µ2 ≠ µ0

In many problems µ0 = 0, and hence the hypotheses are often written as follows:

H0: µ1 − µ2 = 0, or equivalently H0: µ1 = µ2

Ha: µ1 − µ2 > 0, or equivalently Ha: µ1 > µ2

Ha: µ1 − µ2 < 0, or equivalently Ha: µ1 < µ2

Ha: µ1 − µ2 ≠ 0, or equivalently Ha: µ1 ≠ µ2

Assumptions:

(1) Suppose we have two independent simple random samples.

(2) (i) Either both populations are normally distributed: X1 ~ N(µ1, σ1) and X2 ~ N(µ2, σ2)

or (ii) The populations are possibly nonnormal but both sample sizes are large enough such that the central limit theorem applies

(3) The population standard deviations σ1 and σ2 are known.


2. Test Statistic

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

(written for the usual case µ0 = 0)

3. p-value

The p-value is calculated as we’ve seen in Ch 6

4. Conclusion

Reject H0 if p-value < α

Example

The U.S. National Center for Health Statistics compiles data on the length of stay by patients in short-term hospitals and publishes its findings in Vital and Health Statistics. Independent samples of 39 male patients and 35 female patients gave sample means of 7.9 and 7.11 days, respectively. At the 5% significance level, do the data provide sufficient evidence to conclude that, on average, the lengths of stay in short-term hospitals by males and females differ? Assume that σ1 = 5.4 and σ2 = 4.6 days.

µ1 = mean length of stay in short-term hospitals by males
µ2 = mean length of stay in short-term hospitals by females

x̄1 = 7.90, n1 = 39, σ1 = 5.4

x̄2 = 7.11, n2 = 35, σ2 = 4.6

(1) H0: µ1 = µ2 (H0: µ1 − µ2 = 0), Ha: µ1 ≠ µ2 (Ha: µ1 − µ2 ≠ 0)

(2)

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{7.9 - 7.11}{\sqrt{\dfrac{5.4^2}{39} + \dfrac{4.6^2}{35}}} = 0.68$$

(3) P-value = 2P(Z > |z|) = 2P(Z > 0.68) = 2(0.2483) = 0.4966

(4) Conclusion: Since the p-value > .05, we do not reject H0.

There is insufficient evidence to conclude that the mean lengths of stay in short-term hospitals by males and females differ.
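Since the normal distribution is available in Python's standard library, the hospital stay test can be reproduced without tables. The following is a minimal sketch; `two_sample_z_test` is an illustrative name, and it assumes the two-sided alternative with µ0 = 0 used in the example.

```python
import math
from statistics import NormalDist

def two_sample_z_test(x1, x2, sigma1, sigma2, n1, n2):
    """Two-sided z test of H0: mu1 = mu2 with known sigmas."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (x1 - x2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # 2 P(Z > |z|)
    return z, p_value

# Males: mean 7.90, n1 = 39, sigma1 = 5.4; females: mean 7.11, n2 = 35, sigma2 = 4.6
z, p_value = two_sample_z_test(7.90, 7.11, 5.4, 4.6, 39, 35)
alpha = 0.05
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The script does not round z before looking up the tail area, so its p-value differs slightly from the table-based 0.4966.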


Exercise

The registrar at AU is comparing the GPA of married and unmarried students. He finds that 100 married students have a mean GPA of 2.85, while a random sample of 100 unmarried students has a mean GPA of 2.73. At the 0.01 level of significance, do married students have a higher GPA? Assume that σmarried = 0.4 and σunmarried = 0.3.

Comparing Two Means: Small Samples and σ’s Unknown

Suppose we have two independent populations with population means µ1 and µ2, respectively. We are interested in making statistical inferences (confidence interval and significance tests) about the difference in the population means: µ1 − µ2.

Earlier we assumed that the population standard deviations σ1 and σ2 were known. Now we will discuss what to do when the population standard deviations are unknown.

If the population standard deviations are unknown, we calculate the sample standard deviations and use a t distribution instead of Z.

Confidence Interval for µ1 − µ2

When σ1 and σ2 are unknown, the Z statistic that results from the sampling distribution of x̄1 − x̄2,

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is not appropriate.


Instead we must use a t-statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

which follows a t distribution with degrees of freedom equal to the smaller of n1 − 1 and n2 − 1 (or approximated using the formula on page 536).

If the assumptions are satisfied, then a 100C% confidence interval for µ1 − µ2 can be determined using:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where t* is the value for the t(k) density curve with area C between −t* and t*. The value of the degrees of freedom k is approximated by software or taken as the smaller of n1 − 1 and n2 − 1.

Example

In a sampling study conducted by the Clearview National Bank, two independent samples of checking account balances for customers at two Clearview branch banks yielded the following results:

Bank Branch     Number of Checking Accounts   Sample Mean   Sample Standard Deviation
Cherry Grove                             12         $1000                         $15
Beechmont                                10          $920                         $12

Develop a 90% confidence interval for the difference between the mean checking account balances at the two branch banks.


Example

Parameter of interest: µ1 − µ2

µ1 = average account balance at Cherry Grove branch
µ2 = average account balance at Beechmont branch

n1 = 12, x̄1 = 1000, s1 = 15

n2 = 10, x̄2 = 920, s2 = 12

df = min(12 − 1, 10 − 1) = min(11, 9) = 9, so t* = 1.833

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (1000 - 920) \pm 1.833 \sqrt{\frac{15^2}{12} + \frac{12^2}{10}} = 80 \pm 10.55$$

($69.45, $90.55)
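To see how the conservative degrees of freedom and the unpooled standard error fit together, the Clearview calculation can be sketched as below. The function name is invented for this illustration, and t* = 1.833 is the table value for df = 9 quoted in the example.

```python
import math

def unpooled_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """CI for mu1 - mu2 with sigmas unknown (no equal-variance assumption).

    t_star should come from a t table with df = min(n1 - 1, n2 - 1),
    the conservative choice used in the text.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_star * se
    diff = x1 - x2
    return diff - margin, diff + margin

# Cherry Grove: n1 = 12, mean 1000, s1 = 15; Beechmont: n2 = 10, mean 920, s2 = 12
df = min(12 - 1, 10 - 1)
lower, upper = unpooled_t_interval(1000, 15, 12, 920, 12, 10, 1.833)
print(f"df = {df}, 90% CI: (${lower:.2f}, ${upper:.2f})")
```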

Exercise

In a packing plant, a machine packs cartons with jars. Supposedly, a new machine will pack faster on the average than the machine currently used. The times it takes each machine to pack 10 cartons are recorded. The results in seconds are shown below:

Machine   Sample size   Sample mean   Sample Standard deviation
New                10         42.13                       0.685
Present            10         43.23                       0.750

Construct a 90% confidence interval for the difference between the mean time it takes the new machine to pack 10 cartons and the mean time it takes the present machine to pack 10 cartons.

Sampling Distribution

When σ1 and σ2 are unknown and assumed equal, the statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

follows a t distribution with degrees of freedom (df) equal to n1 + n2 − 2, where the pooled variance sp² is given by

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$


Confidence Interval for µ1 − µ2: Pooled Procedure

The confidence interval with σ’s unknown and equal is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where t* is the value for the t(n1 + n2 − 2) density curve with area C between −t* and t*, and

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Example: Specific Motors

Specific Motors of Detroit has developed a new automobile known as the M car. 12 M cars and 8 J cars (from Japan) were road tested to compare miles-per-gallon (mpg) performance. The sample statistics are:

                 Sample #1         Sample #2
                 M Cars            J Cars
Sample Size      n1 = 12 cars      n2 = 8 cars
Mean             x̄1 = 29.8 mpg     x̄2 = 27.3 mpg
Standard Dev.    s1 = 2.56 mpg     s2 = 1.81 mpg

Construct a 95% confidence interval for the difference in miles-per-gallon (mpg) performance assuming that the distributions of the populations are normal with equal variances.

Let µ1 = mean MPG for the population of M cars
µ2 = mean MPG for the population of J cars

Point estimate of µ1 − µ2 = x̄1 − x̄2 = 29.8 − 27.3 = 2.5 mpg.

We will make the following assumptions:

– The miles per gallon rating must be normally distributed for both the M car and the J car.

– The variance in the miles per gallon rating must be the same for both the M car and the J car.

Using the t distribution with n1 + n2 − 2 = 18 degrees of freedom, the appropriate t value is t* = 2.101.


• 95% Confidence Interval for the Difference Between Two Population Means:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11(2.56)^2 + 7(1.81)^2}{12 + 8 - 2} = 5.28$$

so sp = 2.298, and

$$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 2.5 \pm 2.101 (2.298) \sqrt{\frac{1}{12} + \frac{1}{8}} = 2.5 \pm 2.2$$

or 0.3 to 4.7 miles per gallon.

We are 95% confident that the difference between the mean mpg ratings of the two car types is from 0.3 to 4.7 mpg (with the M car having the higher mpg).
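The pooled interval for the Specific Motors data takes two steps: pool the variances, then build the interval. A short sketch follows, with hypothetical helper names and t* = 2.101 (df = 18) supplied from the table as in the example.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled variance sp^2, valid under the equal-variance assumption."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """CI for mu1 - mu2 using the pooled standard deviation."""
    sp = math.sqrt(pooled_variance(s1, n1, s2, n2))
    margin = t_star * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# M cars: n1 = 12, mean 29.8, s1 = 2.56; J cars: n2 = 8, mean 27.3, s2 = 1.81
sp2 = pooled_variance(2.56, 12, 1.81, 8)
lower, upper = pooled_t_interval(29.8, 2.56, 12, 27.3, 1.81, 8, 2.101)
print(f"sp^2 = {sp2:.2f}, 95% CI: ({lower:.1f}, {upper:.1f}) mpg")
```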

Significance Test on µ1 − µ2

Assumptions:

(1) Suppose we have two independent simple random samples.

(2) (i) Either both populations are normally distributed: X1 ~ N(µ1, σ1) and X2 ~ N(µ2, σ2)

or (ii) The populations are possibly nonnormal but both sample sizes are large enough such that the central limit theorem applies

(3) The population standard deviations σ1 and σ2 are unknown.

1. Hypotheses

Suppose the population standard deviations σ1 and σ2 are unknown.

We hypothesize that the difference between the population means equals some specified value µ0 (= 0 in most cases) and want to test whether this value is reasonable or whether the mean difference is actually greater than µ0, less than µ0, or not equal to µ0.

Null Hypothesis: H0: µ1 − µ2 = µ0 (H0: µ1 = µ2)

Alternative Hypothesis: Ha: µ1 − µ2 > µ0 (Ha: µ1 > µ2), or Ha: µ1 − µ2 < µ0 (Ha: µ1 < µ2), or Ha: µ1 − µ2 ≠ µ0 (Ha: µ1 ≠ µ2)

The forms in parentheses apply when µ0 = 0.


2. Test Statistic

If the assumptions are satisfied then the test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

which follows a t distribution with df = min(n1 − 1, n2 − 1).

3. P-value

The p-value of the test depends on the alternative hypothesis and is based on the t distribution.

4. Conclusion

The decision and conclusion are determined the same as for all other tests.

Significance Test on µ1 − µ2: Pooled Procedure

If the population standard deviations σ1 and σ2 are unknown and assumed equal (σ1 = σ2), then the test statistic will be

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

which follows a t distribution with df = n1 + n2 − 2, where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Example: Faculty Salaries

Independent random samples of 25 faculty members in public institutions and 30 faculty members in private institutions yielded the statistics in the following table.

           Sample size   Sample Mean   Standard deviation
Public              25          57.5                   20
Private             30          66.4                   18

At the 5% significance level, do the data provide sufficient evidence to conclude that mean salaries for faculty in public and private institutions differ?


Example

µ1 = mean salaries for faculty in public institutions
µ2 = mean salaries for faculty in private institutions

(1) H0: µ1 = µ2 (H0: µ1 − µ2 = 0), Ha: µ1 ≠ µ2 (Ha: µ1 − µ2 ≠ 0)

(2) n1 = 25, x̄1 = 57.5, s1 = 20

n2 = 30, x̄2 = 66.4, s2 = 18

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{57.5 - 66.4}{\sqrt{\dfrac{20^2}{25} + \dfrac{18^2}{30}}} = -1.719$$

df = min(25 − 1, 30 − 1) = min(24, 29) = 24

(3) P-value = 2P(T24 > |−1.719|) = 2P(T24 > 1.719)

0.025 < P(T24 > 1.719) < 0.05

0.05 < P-value < 0.1 (Exact = 0.0985)

(4) Conclusion: Since the p-value > .05, we do not reject H0.

There is insufficient evidence to conclude that mean salaries for faculty in public and private institutions differ.
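A quick way to confirm the faculty salaries test statistic is to compute it from the summary table. The sketch below assumes the unpooled procedure with the conservative df = min(n1 − 1, n2 − 1); `unpooled_t_statistic` is a name introduced here, not part of any library.

```python
import math

def unpooled_t_statistic(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic for H0: mu1 = mu2, plus the conservative df."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = (x1 - x2) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

# Public: n1 = 25, mean 57.5, s1 = 20; private: n2 = 30, mean 66.4, s2 = 18
t, df = unpooled_t_statistic(57.5, 20, 25, 66.4, 18, 30)
print(f"t = {t:.3f}, df = {df}")
```

With |t| and df in hand, the two-sided p-value bracket is then read from the t table as in step (3).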

Example

A researcher was interested in comparing the amount of time spent watching television by women and by men. Independent samples of 14 women and 17 men were selected and each person was asked how many hours he or she had watched television during the previous week. The summary statistics are as follows.

         Sample size   Sample Mean   Standard deviation
Women             14          11.3                  4.4
Men               17          16.9                  4.7

Do the data provide sufficient evidence to conclude that mean time for women is less than mean time for men? Perform a pooled t-test at the 5% significance level.


Example

µ1 = mean time spent watching television by women
µ2 = mean time spent watching television by men

(1) H0: µ1 = µ2 (H0: µ1 − µ2 = 0), Ha: µ1 < µ2 (Ha: µ1 − µ2 < 0)

(2) n1 = 14, x̄1 = 11.3, s1 = 4.4

n2 = 17, x̄2 = 16.9, s2 = 4.7

$$s_p^2 = \frac{(14 - 1)(4.4)^2 + (17 - 1)(4.7)^2}{14 + 17 - 2} = 20.8662 \Rightarrow s_p = 4.568$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{11.3 - 16.9}{4.568 \sqrt{\dfrac{1}{14} + \dfrac{1}{17}}} = -3.3968$$

df = 14 + 17 − 2 = 29

(3) P-value = P(T29 < −3.3968) = P(T29 > 3.3968)

0.0005 < P-value < 0.001

(4) Conclusion: Since the p-value < .05, we reject H0.

There is sufficient evidence to conclude that mean time for women is less than mean time for men (on average, women spend less time watching television than men).
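For the television example, the pooled test statistic can be verified numerically. This sketch assumes equal population variances, as the example does; the function name `pooled_t_statistic` is hypothetical.

```python
import math

def pooled_t_statistic(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t statistic for H0: mu1 = mu2.

    Returns (sp2, t, df), where sp2 is the pooled variance and
    df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (x1 - x2) / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))
    return sp2, t, df

# Women: n1 = 14, mean 11.3, s1 = 4.4; men: n2 = 17, mean 16.9, s2 = 4.7
sp2, t, df = pooled_t_statistic(11.3, 4.4, 14, 16.9, 4.7, 17)
print(f"sp^2 = {sp2:.4f}, t = {t:.4f}, df = {df}")
```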

Exercises

1. A random sample of 17 third graders who read poorly has a mean IQ of 98 with a standard deviation of 10; a random sample of 10 third graders who read well has a mean IQ of 101 with a standard deviation of 9. At the 0.05 level of significance, is the mean IQ of good readers higher than the mean IQ of poor readers? Assume that the IQ scores for both groups are normally distributed with equal variances.


2. A high school teachers’ group is investigating summer work patterns. It finds that the mean monthly income of 20 randomly selected teachers who teach in the summer is $600 with a standard deviation of $100, while the mean monthly income of a random sample of 10 teachers who sell real estate during the summer is $700 with a standard deviation of $50. The teachers’ group believes the pay for both kinds of work is normally distributed with different variances. At the 0.05 level of significance, is there any difference in the earnings of the two groups?

3. Recently, a local newspaper reported that part-time students are older than full-time students. In order to test the validity of its statement, two independent samples of students were selected. The following shows the ages of the students in the two samples. Using the following data, test to determine whether or not the average age of part-time students is significantly more than that of full-time students. Use an alpha of 0.05. Assume the populations are normally distributed and have equal variances.

Part-time   21   17   25   19   20   18
Full-time   19   18   17   22   18   19   20