probability and statistics the tests we have worked with before is that chi square tests are used...

Week 5 Hypothesis Testing (two-mean groups & categorical variables)


Post on 18-Apr-2020



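3. Two Samples: Tests on Two Means (unpaired samples):

If $\sigma_1^2$ and $\sigma_2^2$ are known (and $n_1, n_2 > 30$), then we have:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

If $\sigma_1^2$ and $\sigma_2^2$ are unknown but $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then we have:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$$

where the pooled estimate of $\sigma^2$ is

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$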


The degrees of freedom of $S_p^2$ is $\nu = n_1 + n_2 - 2$.

Now, suppose we need to test the null hypothesis

$H_0: \mu_1 = \mu_2 \iff H_0: \mu_1 - \mu_2 = 0$

Generally, suppose we need to test

$H_0: \mu_1 - \mu_2 = d$ (for some specific value $d$)

against one of the following alternative hypotheses:

$H_1: \mu_1 - \mu_2 \neq d$

$H_1: \mu_1 - \mu_2 > d$

$H_1: \mu_1 - \mu_2 < d$


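Hypotheses:

$H_0: \mu_1 - \mu_2 = d$ vs. $H_1: \mu_1 - \mu_2 \neq d$

$H_0: \mu_1 - \mu_2 = d$ vs. $H_1: \mu_1 - \mu_2 > d$

$H_0: \mu_1 - \mu_2 = d$ vs. $H_1: \mu_1 - \mu_2 < d$

Test Statistic (T.S.):

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - d}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1) \quad \text{\{if } \sigma_1^2 \text{ and } \sigma_2^2 \text{ are known\}}$$

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - d}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1 + n_2 - 2) \quad \text{\{if } \sigma_1^2 = \sigma_2^2 = \sigma^2 \text{ is unknown\}}$$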


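Rejection region (R.R.) and acceptance region (A.R.) of Ho

Decision: Reject Ho (and accept H1) at the significance level $\alpha$ if T.S. $\in$ R.R.:

• Two-Sided Test ($H_1: \mu_1 - \mu_2 \neq d$): $Z > Z_{\alpha/2}$ or $Z < -Z_{\alpha/2}$ (or $T > t_{\alpha/2}$ or $T < -t_{\alpha/2}$)

• One-Sided Test ($H_1: \mu_1 - \mu_2 > d$): $Z > Z_{\alpha}$ (or $T > t_{\alpha}$)

• One-Sided Test ($H_1: \mu_1 - \mu_2 < d$): $Z < -Z_{\alpha}$ (or $T < -t_{\alpha}$)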
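
As a rough illustration, the critical values that bound these rejection regions can be obtained in R with `qnorm` and `qt`. The value `df = 20` is an assumption chosen only for illustration (it corresponds to n1 = 12 and n2 = 10); it is not fixed by the decision rule itself.

```r
# Critical values for two-sided and one-sided tests at significance level alpha.
alpha <- 0.05
df <- 20  # illustrative assumption: n1 + n2 - 2 with n1 = 12, n2 = 10

# Known variances: standard normal critical values
z_two_sided <- qnorm(1 - alpha / 2)  # reject H0 if |Z| > z_two_sided
z_one_sided <- qnorm(1 - alpha)      # reject H0 if Z > z_one_sided (or Z < -z_one_sided)

# Unknown but equal variances: Student t critical values
t_two_sided <- qt(1 - alpha / 2, df) # reject H0 if |T| > t_two_sided
t_one_sided <- qt(1 - alpha, df)     # reject H0 if T > t_one_sided (or T < -t_one_sided)

print(round(c(z_two_sided, z_one_sided, t_two_sided, t_one_sided), 3))
```

For a two-sided test the significance level is split between the two tails, which is why the two-sided critical values are larger than the one-sided ones.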


Example

An experiment was performed to compare the wear of two different materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average wear of 81 and a sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the mean wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances.


Solution:

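Material 1            Material 2
$n_1 = 12$            $n_2 = 10$
$\bar{X}_1 = 85$      $\bar{X}_2 = 81$
$S_1 = 4$             $S_2 = 5$

Hypotheses:

$H_0: \mu_1 = \mu_2 + 2$ ($d = 2$)

$H_1: \mu_1 > \mu_2 + 2$

Or equivalently,

$H_0: \mu_1 - \mu_2 = 2$ ($d = 2$)

$H_1: \mu_1 - \mu_2 > 2$

Calculation:

$\alpha = 0.05$

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{(12 - 1)(4)^2 + (10 - 1)(5)^2}{12 + 10 - 2} = 20.05$$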


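$S_p = \sqrt{20.05} = 4.478$

$\nu = n_1 + n_2 - 2 = 12 + 10 - 2 = 20$

$t_{0.05} = 1.725$

T.S.:

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - d}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(85 - 81) - 2}{(4.478) \sqrt{\dfrac{1}{12} + \dfrac{1}{10}}} = 1.04$$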

Decision:

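Since $T = 1.04 \in$ A.R. ($T = 1.04 < t_{0.05} = 1.725$), we do not reject Ho at $\alpha = 0.05$. The data do not provide sufficient evidence for H1: $\mu_1 - \mu_2 > 2$, that is, that the mean wear of material 1 exceeds that of material 2 by more than 2 units.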
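
The calculation above can be reproduced from the summary statistics alone. The following is a minimal sketch in R; the variable names are illustrative, and the p-value line is an addition that the worked solution does not compute.

```r
# Pooled two-sample t-test from summary statistics (wear example).
n1 <- 12; xbar1 <- 85; s1 <- 4   # material 1
n2 <- 10; xbar2 <- 81; s2 <- 5   # material 2
d <- 2                           # hypothesized difference under H0
alpha <- 0.05

# Pooled variance and standard deviation
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
sp <- sqrt(sp2)

# Test statistic and degrees of freedom
t_stat <- ((xbar1 - xbar2) - d) / (sp * sqrt(1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# Right-tailed critical value and p-value (H1: mu1 - mu2 > d)
t_crit <- qt(1 - alpha, df)
p_value <- pt(t_stat, df, lower.tail = FALSE)

reject_h0 <- t_stat > t_crit
print(c(sp2 = sp2, sp = sp, t_stat = t_stat, t_crit = t_crit))
print(reject_h0)
```

Comparing `t_stat` with `t_crit` and comparing `p_value` with `alpha` lead to the same decision; the p-value form is what `t.test` reports when the raw data are available.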


R Examples

One Sample Tests

# Sample is assumed to be a numeric vector of observations
t.test(Sample, mu=40)

t.test(Sample, alternative="less", mu=40)

t.test(Sample, alternative="greater", mu=36)

t.test(Sample, mu=36, conf.level=0.99)

Two Samples Tests

By default H0: mu1-mu2=0

H1: mu1 not equal to mu2, or mu1-mu2<0, or mu1-mu2>0

Default: unequal variances (Welch's t-test)

For equal variances, use var.equal=TRUE

Control = c(91, 87, 99, 77, 88, 91)

Treat = c(101, 110, 103, 93, 99, 104)

t.test(Control, Treat, alternative="less", var.equal=TRUE)

Paired Tests (same sample, but before and after a treatment or intervention)

Before = c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)

After = c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)

t.test(Before, After, alternative="greater", paired=TRUE)
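
To make the paired test less of a black box, the sketch below checks that it matches a one-sample t-test on the differences `Before - After`. The names `diffs`, `paired_res` and `one_sample_res` are illustrative. Note that with `alternative="greater"` the alternative is that the mean of `Before - After` is positive; in this data most `After` values are higher, so testing for an increase after treatment would use `alternative="less"` instead.

```r
# A paired t-test is a one-sample t-test on the within-pair differences.
Before <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
After  <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)

paired_res <- t.test(Before, After, alternative = "greater", paired = TRUE)

# Same test written as a one-sample test on the differences
diffs <- Before - After
one_sample_res <- t.test(diffs, alternative = "greater", mu = 0)

print(mean(diffs))                 # average within-pair difference
print(paired_res$statistic)        # t statistic from the paired call
print(one_sample_res$statistic)    # same t statistic from the one-sample call
```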


Hypothesis Testing on Categorical Variables (Chi-square)


Objectives

By the end of this topic, you should be able to:

• Apply hypothesis testing in making inferences about the population’s parameters when we have categorical variables.

• Apply the chi-square test as a goodness-of-fit test.


Introduction

• The primary difference between a chi-square test and the tests we have worked with before is that chi-square tests are used for categorical data.

• The chi-square test can be used to
  • estimate how closely the distribution of a categorical variable matches an expected distribution (the goodness-of-fit test),
  • estimate whether two categorical variables are independent of one another (the test of independence).


When to use test for goodness-of-fit?

• When you collect survey data, for example, and find that the ratio of males to females who are in favor of or against a certain design is 30:70, how would you test whether the true population ratio is also 3:7?

• The observed frequencies (30, 70) will almost always differ from the expected frequencies due to sampling error.

• Question: Are these differences significant, or are they due to chance?

• The chi-square goodness-of-fit test will enable one to answer this question.

• The null and alternative hypotheses reflect this focus:
  • H0: The population distribution of the variable is the same as the proposed distribution.
  • H1: The distributions are different.


Test for Goodness of Fit - Example

• Suppose you conducted a survey to see whether consumers have any preference among five different designs of a new product. A sample of 100 people provided the following data. Test whether this indicates that consumers prefer some designs over others, or whether the observed differences could be due to sampling error alone.

Design 1   Design 2   Design 3   Design 4   Design 5
32         28         16         14         10


• If there were no preferences, one would expect that each design would be selected with equal frequency.

• In this case, the equal frequency is 100/5 = 20.

• That is, approximately 20 people would select each design.


• The frequencies obtained from the sample are called observed frequencies.

• The frequencies obtained from calculations are called expected frequencies.

• Table for the test is shown next.


Freq.      Design 1   Design 2   Design 3   Design 4   Design 5
Observed   32         28         16         14         10
Expected   20         20         20         20         20


• The appropriate hypotheses for this example are:

• H0: Consumers show no preference for the design of the product.

• H1: Consumers show a preference.

• The degrees of freedom (df) for this test is equal to the number of categories minus 1.


$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

d.f. = number of categories $- 1$

O = observed frequency

E = expected frequency


[Table: chi-square critical values]


• Is there enough evidence to reject the claim that there is no preference in the selection of different designs? Let $\alpha$ = 0.05.

• Step 1: State the hypotheses and identify the claim.

• H0: Consumers show no preference (claim).

• H1: Consumers show a preference.

• Step 2: Identify an appropriate test and significance level.

• A Chi-Square goodness of fit test is appropriate for answering this question. We use the significance level $\alpha$ = 0.05 stated in the problem.


• Step 3: Analyze the sample data.
  • Find the critical value. The d.f. are 5 - 1 = 4 and $\alpha$ = 0.05. Hence, the critical value is $\chi^2_{\alpha,\, df=4}$ = 9.488.
  • Compute the test value. $\chi^2 = \frac{(32 - 20)^2}{20} + \frac{(28 - 20)^2}{20} + \cdots + \frac{(10 - 20)^2}{20} = 18.0$.

• Step 4: Make the decision. The decision is to reject the null hypothesis, since 18.0 > 9.488.

• Step 5: Write conclusions. There is enough evidence to reject the claim that consumers show no preference for the designs.
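
The five steps for the design-preference example can be checked in R. The sketch below computes the statistic by hand and then with `chisq.test`, which assumes equal category probabilities when `p` is not given. The category names are illustrative labels.

```r
# Chi-square goodness-of-fit test for the five-design survey (n = 100).
observed <- c(Design1 = 32, Design2 = 28, Design3 = 16, Design4 = 14, Design5 = 10)

# Under H0 (no preference) every design is expected equally often
expected <- rep(sum(observed) / length(observed), length(observed))

# Test statistic by hand: sum of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Critical value at alpha = 0.05 with (categories - 1) degrees of freedom
df <- length(observed) - 1
crit <- qchisq(1 - 0.05, df)

# Same test using the built-in function (equal probabilities by default)
res <- chisq.test(observed)

print(c(chi_sq = chi_sq, crit = crit))
print(res)
```

Because the statistic exceeds the critical value, the p-value reported by `chisq.test` falls below 0.05, which is the same decision expressed in p-value form.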


Test for Goodness of Fit – Example 2

• An insurance company needs to investigate the claim of one of its employees that female drivers get in fewer accidents than male drivers. Specifically, he says that male drivers are held responsible in 65% of accidents involving drivers under 23. Another survey is done to investigate this claim. In the results, 46 out of the 85 accidents considered involve male drivers. Does this data support or refute the initial hypothesis?


• Step 1: Clearly state the null and alternative hypotheses.
  • H0: In the population of all drivers, male drivers are responsible for 65% of accidents and female drivers are responsible for 35%.
  • H1: The data do not match the proposed model.

• Step 2: Identify an appropriate test and significance level.
  • A Chi-Square goodness of fit test is appropriate for answering this question. In the absence of a stated significance level in the problem, we assume the default 0.05.

• Step 3: Analyze sample data.
  • Create a table to organize the data, compare the observed data to the expected data, and find the T.S. value:


           Male Drivers          Female Drivers        Total
Observed   46                    39                    85
Expected   0.65 × 85 = 55.25     0.35 × 85 = 29.75     85


• Using the table, with df = 1, we find the critical value to be 3.84.

• The critical value indicates that, if the null hypothesis is true, only 5% of $\chi^2$ values would be 3.84 or higher. If the $\chi^2$ of our data is greater than 3.84, then we would expect to get that result fewer than 5 times out of 100 if the null hypothesis is true.

• Step 4: Make the decision. The test statistic is $\chi^2 = \frac{(46 - 55.25)^2}{55.25} + \frac{(39 - 29.75)^2}{29.75} \approx 4.42$. The decision is to reject the null hypothesis, since 4.42 > 3.84.

• Step 5: Write conclusion. There is enough evidence to reject the claim that male drivers are responsible for 65% of the accidents.
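
The driver example can be worked through in R as well. This is a sketch using the observed counts and proposed proportions from the example; the names `observed` and `p0` are illustrative.

```r
# Chi-square goodness-of-fit test for the accident data (46 male, 39 female).
observed <- c(male = 46, female = 39)
p0 <- c(0.65, 0.35)            # proportions proposed under H0

# Expected counts under H0
expected <- sum(observed) * p0

# Test statistic by hand and critical value for df = 1, alpha = 0.05
chi_sq <- sum((observed - expected)^2 / expected)
crit <- qchisq(0.95, df = 1)

# Same test with the built-in function
res <- chisq.test(x = observed, p = p0)

print(expected)
print(c(chi_sq = chi_sq, crit = crit))
print(res$p.value)
```

The argument `p` must sum to 1; if the proposed distribution is given as counts or ratios instead, `rescale.p = TRUE` can be passed so that `chisq.test` normalizes it.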


R Functions

Table2 = table(BikeData$gender)
Table2
 F  M
31 90

prop.table(Table2)
        F         M
0.2561983 0.7438017

Table1 = table(BikeData$cyc_freq, BikeData$gender)
Table1
                           F  M
  Daily                    9 38
  Less than once a month   2  0
  Several times per month  5  9
  Several times per week  15 43

barplot(Table1)

barplot(Table1, beside=TRUE)
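
The `BikeData` data frame is not included here, so as a stand-in the sketch below rebuilds the two-way table from the printed counts. The `matrix` construction and the `dimnames` labels are assumptions made for illustration; with the real data frame, `table()` would produce the same counts directly.

```r
# Rebuild the cycling-frequency by gender table from the printed counts.
Table1 <- matrix(
  c( 9, 38,
     2,  0,
     5,  9,
    15, 43),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    cyc_freq = c("Daily", "Less than once a month",
                 "Several times per month", "Several times per week"),
    gender = c("F", "M")
  )
)

# Column totals recover the gender counts (F = 31, M = 90)
gender_totals <- colSums(Table1)
print(gender_totals)
print(prop.table(gender_totals))

# Proportions within each gender (each column sums to 1)
col_props <- prop.table(Table1, margin = 2)
print(round(col_props, 3))

# Grouped bar chart, one group of bars per gender
barplot(Table1, beside = TRUE, legend.text = TRUE)
```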


R Functions

Note: The expected value for every category must be greater than 5 for the Chi-square test results to be valid.

Conclusion: We do not reject the hypothesis that the true ratio of female to male cyclists in the population is 1:2.
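
The R call behind this conclusion is not preserved in the transcript. One plausible reconstruction, assuming the gender counts 31 (F) and 90 (M) shown earlier and a hypothesized 1:2 ratio, is sketched below; it also checks the expected-count condition from the note.

```r
# Goodness-of-fit test of a 1:2 female-to-male ratio (assumed reconstruction).
observed <- c(F = 31, M = 90)
p0 <- c(1, 2) / 3              # hypothesized proportions for a 1:2 ratio

res <- chisq.test(x = observed, p = p0)

# Validity check: expected counts in every category
print(res$expected)
expected_ok <- all(res$expected > 5)
print(expected_ok)

# Test statistic and p-value
print(res$statistic)
print(res$p.value)
```

The p-value is above 0.05, so the null hypothesis of a 1:2 ratio is not rejected at that level.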


R Functions

• If you don’t have the data frame, just numbers:

observed = c(772, 1611, 737)
expected = c(0.25, 0.50, 0.25)
chisq.test(x = observed, p = expected)

X-squared = 4.1199, df = 2, p-value = 0.1275

Decision: Since the p-value (0.1275) is greater than 0.05, there is not enough evidence that the population distribution differs from the expected distribution. Hence we do not reject the null hypothesis that the population distribution matches the expected distribution.