
Upload: dustin-simon-gaines

Post on 02-Jan-2016


CHAPTER 5
5.1 INTRODUCTORY CHI-SQUARE TEST

Objectives: This chapter is concerned with methods for analyzing categorical data.

The chi-square test covers three methods of analysis:

Goodness-of-fit test: to test the assumption that a variable follows a certain distribution.

Independence test: to test whether two variables are dependent on one another.

Homogeneity test: to test whether several populations are homogeneous, i.e. have the same distribution of a categorical variable.


Goodness-of-fit test:

In the goodness-of-fit test, chi-square analysis is applied
i. to examine whether sample data could have been drawn from a population having a specific probability distribution, and
ii. to compare an observed distribution with an expected distribution.


The goodness-of-fit test procedure is appropriate when the following conditions are met:
i. The sampling method is simple random sampling.
ii. The population is at least 10 times as large as the sample.
iii. The variable under study is categorical.
iv. The expected frequency for each level of the variable is at least 5.


Test procedure to run the goodness-of-fit test:

1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

2. Determine:
i. the level of significance, $\alpha$;
ii. the degrees of freedom,
$$df = k - 1 - p$$
where $k$ is the number of levels of the categorical variable and $p$ is the number of unknown parameters that must be estimated from the data.

If there is no unknown parameter ($p = 0$), the number of degrees of freedom is $k - 1$.
If there is one unknown parameter ($p = 1$), the number of degrees of freedom is $k - 2$.


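3. Find the value of $\chi^2_{\alpha,\,k-1-p}$ from the table of the chi-square distribution.

4. Calculate the value of $\chi^2_{\text{calculated}}$ using the formula below:
$$\chi^2_{\text{calculated}} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
where
$o_i$ = observed frequency of the $i$-th category,
$e_i$ = expected frequency of the $i$-th category, with $e_i = nP(X = i)$ and $n = o_1 + o_2 + \dots + o_k$.

Category    1       2       …   k
Frequency   $o_1$   $o_2$   …   $o_k$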
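As a cross-check of step 4, the statistic can be computed in a few lines of Python. This is a minimal sketch, assuming the hypothesized probabilities $P(X = i)$ are supplied directly so that $e_i = nP(X = i)$; the function name `gof_chi_square` is hypothetical. The usage lines apply it to the road-accident counts of Example 5.1 below.

```python
def gof_chi_square(observed, probs):
    """Goodness-of-fit statistic: sum of (o_i - e_i)^2 / e_i with e_i = n * P(X = i).

    observed: observed frequencies o_1, ..., o_k
    probs:    hypothesized probabilities P(X = i), assumed to sum to 1
    Returns (statistic, list of expected frequencies).
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)                      # n = o_1 + o_2 + ... + o_k
    expected = [n * p for p in probs]      # e_i = n * P(X = i)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Example 5.1 data: categories A, M, I, F
stat, expected = gof_chi_square([130, 35, 30, 5], [0.60, 0.20, 0.15, 0.05])
print([round(e, 2) for e in expected])  # [120.0, 40.0, 30.0, 10.0]
print(round(stat, 3))                   # 3.958
```

If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` carries out the same calculation and also returns a p-value.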


5. Determine the rejection region:
i. critical value approach: reject $H_0$ if $\chi^2_{\text{calculated}} > \chi^2_{\alpha,\,df}$;
ii. p-value approach: reject $H_0$ if p-value $< \alpha$.

6. Make a decision.
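Both rejection rules can be checked numerically. Statistical software normally supplies the p-value (for example SciPy's `scipy.stats.chi2.sf(x, df)`). The sketch below instead computes the chi-square upper-tail probability in plain Python, using closed forms of the regularized upper incomplete gamma function that hold for integer degrees of freedom. The names `chi2_sf` and `decide` are hypothetical, and the critical value passed to `decide` is still assumed to come from the chi-square table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1.

    Equals Q(df/2, x/2), the regularized upper incomplete gamma function, built up with
    the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        # add y**a * exp(-y) / Gamma(a + 1), computed on the log scale for stability
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def decide(chi2_calculated, df, alpha, critical):
    """Return (p-value, reject by critical value, reject by p-value)."""
    p_value = chi2_sf(chi2_calculated, df)
    return p_value, chi2_calculated > critical, p_value < alpha


print(round(chi2_sf(7.815, 3), 3))  # 0.05, consistent with the table value chi2_{0.05,3} = 7.815
```

Because `critical` is the tabulated $\chi^2_{\alpha,\,df}$, the two rules in `decide` should agree except for statistics lying extremely close to the critical value.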


Example 5.1: The authority claims that the proportions of road accidents occurring in this country according to the categories User Attitude (A), Mechanical Fault (M), Insufficient Sign Board (I) and Fate (F) are 60%, 20%, 15% and 5% respectively. A study by an independent body shows the following data:

Category    A     M    I    F   Total
Frequency   130   35   30   5   200

Can we accept the claim at significance level $\alpha = 0.05$?

Solution:

1. $H_0: P(A) = 0.6,\ P(M) = 0.2,\ P(I) = 0.15,\ P(F) = 0.05$
$H_1$: at least one $P(i)$ differs from its claimed value, for $i = A, M, I, F$.
Sample size: $n = 200$.


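2. $\alpha = 0.05$, $df = k - 1 = 4 - 1 = 3$.

3. From the chi-square distribution table, $\chi^2_{0.05,\,3} = 7.815$, so we reject $H_0$ if $\chi^2_{\text{calculated}} > 7.815$.

4. The expected frequencies are $e_i = nP(X = i)$:
$e_A = 0.6 \times 200 = 120$, $e_M = 0.2 \times 200 = 40$, $e_I = 0.15 \times 200 = 30$, $e_F = 0.05 \times 200 = 10$.

i    $o_i$   $e_i$   $(o_i - e_i)^2 / e_i$
A    130     120     0.833
M    35      40      0.625
I    30      30      0.000
F    5       10      2.500

$$\chi^2_{\text{calculated}} = \frac{(130-120)^2}{120} + \frac{(35-40)^2}{40} + \frac{(30-30)^2}{30} + \frac{(5-10)^2}{10} = 0.833 + 0.625 + 0.000 + 2.500 = 3.958$$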


5. Rejection region: reject $H_0$ if $\chi^2_{\text{calculated}} > \chi^2_{0.05,\,3} = 7.815$.

6. Since $\chi^2_{\text{calculated}} = 3.958 < \chi^2_{0.05,\,3} = 7.815$, we do not reject $H_0$ and conclude that there is no evidence to reject the claim.


Exercise 5.1: The number of students playing truancy in a school over 200 school days is shown below.

No. of truants   0    1    2    3    4    ≥ 5
No. of days      12   32   45   50   35   26

If $X$ is a random variable representing the number of students playing truancy per day, test the hypothesis that $X$ follows the Poisson distribution with mean 3 per day at $\alpha = 0.01$.
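For this exercise the expected frequencies come from $e_i = nP(X = i)$ with $X \sim \text{Poisson}(3)$. The sketch below assumes that the last column of the table is the open-ended category "5 or more", so its probability is taken as $1 - P(X \le 4)$; the function name `poisson_expected` is hypothetical. The remaining steps (degrees of freedom, statistic and decision) are left to the exercise.

```python
import math


def poisson_expected(n, mean, top):
    """Expected frequencies n * P(X = x) for x = 0, ..., top - 1, then n * P(X >= top).

    The last cell lumps the whole upper tail so that the probabilities sum to 1.
    """
    probs = [math.exp(-mean) * mean ** x / math.factorial(x) for x in range(top)]
    probs.append(1.0 - sum(probs))  # P(X >= top)
    return [n * p for p in probs]


expected = poisson_expected(n=200, mean=3, top=5)
for label, e in zip(["0", "1", "2", "3", "4", ">=5"], expected):
    print(label, round(e, 2))
```

Lumping the tail into one cell keeps the total expected frequency equal to $n$, which the formula $\sum (o_i - e_i)^2 / e_i$ implicitly requires.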


Exercise 5.2: The probabilities of blood phenotypes A, B, AB and O in the population of all Caucasians in the US are 0.41, 0.10, 0.04 and 0.45 respectively. To determine whether or not the actual population proportions fit this set of reported probabilities, a random sample of 200 Americans was selected and their phenotypes were recorded. The observed cell counts are shown below. Test the goodness of fit of these blood phenotype probabilities at $\alpha = 0.10$.

Blood phenotype   A    B    AB   O
Observed          89   18   12   81


The Chi-Square Test for Homogeneity

The homogeneity test is used to determine whether several populations are similar (homogeneous) in some characteristic.

This test is applied to a single categorical variable observed in two or more different populations.


The test procedure is appropriate when the following conditions are satisfied:
i. For each population, the sampling method is simple random sampling.
ii. Each population is at least 10 times as large as its sample.
iii. The variable under study is categorical.
iv. If the sample data are displayed in a contingency table (populations × category levels), the expected frequency for each cell of the table is at least 5.


Two-dimensional contingency table layout:

                               Column variable
Row variable     Category B1     Category B2     …   Category Bc     Total
Category A1      $o_{11}$        $o_{12}$        …   $o_{1c}$        $n_{1\cdot}$
Category A2      $o_{21}$        $o_{22}$        …   $o_{2c}$        $n_{2\cdot}$
…                …               …               …   …               …
Category Ar      $o_{r1}$        $o_{r2}$        …   $o_{rc}$        $n_{r\cdot}$
Total            $n_{\cdot 1}$   $n_{\cdot 2}$   …   $n_{\cdot c}$   $n$

The above is an $r \times c$ contingency table, where $r$ is the number of categories of the row variable and $c$ is the number of categories of the column variable. In the table,
$o_{ij}$ is the observed frequency in cell $(i, j)$,
$n_{i\cdot}$ is the total frequency for row category $i$,
$n_{\cdot j}$ is the total frequency for column category $j$,
$n$ is the grand total frequency over all cells $(i, j)$,
and the expected frequency of cell $(i, j)$ is
$$e_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{grand total}} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$
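The expected-frequency formula $e_{ij} = n_{i\cdot} n_{\cdot j} / n$ translates directly into code. The sketch below (the function name `expected_frequencies` is hypothetical) stores an $r \times c$ table as a list of rows in plain Python, is illustrated on the pin counts of Example 5.2, and also checks condition iv, that every expected count is at least 5.

```python
def expected_frequencies(observed):
    """e_ij = (i-th row total) * (j-th column total) / grand total for an r x c table."""
    row_totals = [sum(row) for row in observed]          # n_i.
    col_totals = [sum(col) for col in zip(*observed)]    # n_.j
    n = sum(row_totals)                                  # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


# Example 5.2 counts (rows: machines 1-4; columns: too thin, OK, too thick)
pins = [[10, 102, 8], [34, 161, 5], [12, 79, 9], [10, 60, 10]]
expected = expected_frequencies(pins)
print([[round(e, 2) for e in row] for row in expected])
# [[15.84, 96.48, 7.68], [26.4, 160.8, 12.8], [13.2, 80.4, 6.4], [10.56, 64.32, 5.12]]

# Condition iv: every expected count should be at least 5
print(min(min(row) for row in expected) >= 5)  # True
```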


Test procedure to run the chi-square test for homogeneity:

1. State the null hypothesis and alternative hypothesis.
E.g.:
$H_0$: The proportions in the categories of the COLUMN variable are the SAME for every category of the ROW variable.
$H_1$: The proportions in the categories of the COLUMN variable are NOT the SAME for all categories of the ROW variable.

2. Determine:
i. the level of significance, $\alpha$;
ii. the degrees of freedom, $df = (r - 1)(c - 1)$, where $r$ = number of rows and $c$ = number of columns.

3. Find the value of $\chi^2_{\alpha,\,df}$ from the table of the chi-square distribution and determine the rejection region:
i. critical value approach: reject $H_0$ if $\chi^2_{\text{calculated}} > \chi^2_{\alpha,\,df}$;
ii. p-value approach: reject $H_0$ if p-value $< \alpha$.


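4. Calculate the value of $\chi^2_{\text{calculated}}$ using the formula below:
$$\chi^2_{\text{calculated}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where
$o_{ij}$ = observed frequency of the $i$-th row and $j$-th column,
$e_{ij}$ = expected frequency of the $i$-th row and $j$-th column.

5. Make a decision.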


Example 5.2: Four machines manufacture cylindrical steel pins. The pins are subjected to a diameter specification: a pin may meet the specification, or it may be too thin or too thick. Pins are sampled from each machine and the number of pins in each category is counted. The table below presents the results. Test at $\alpha = 0.01$ whether the proportions of pins in the three categories are the same for all machines.

             Too thin   OK    Too thick
Machine 1    10         102   8
Machine 2    34         161   5
Machine 3    12         79    9
Machine 4    10         60    10


Solution: Construct a contingency table:

             Too thin   OK    Too thick   Total
Machine 1    10         102   8           120
Machine 2    34         161   5           200
Machine 3    12         79    9           100
Machine 4    10         60    10          80
Total        66         402   32          500

Calculation of the expected frequency:
$$e_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{grand total}} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$


Testing procedure:

1. $H_0$: The proportions of pins that are too thin, OK, or too thick are the same for all machines.
$H_1$: The proportions of pins that are too thin, OK, or too thick are not the same for all machines.

2. $\alpha = 0.01$; $r = 4$, $c = 3$, so $df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$.

3. From the table of chi-square: $\chi^2_{0.01,\,6} = 16.812$, and we reject $H_0$ if $\chi^2_{\text{calculated}} > 16.812$.


4. Using the observed and expected frequencies in the contingency table, we calculate $\chi^2_c$ using the formula given:
$$\chi^2_c = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \qquad e_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{grand total}}$$

Cell    $o_{ij}$   $e_{ij}$                   $(o_{ij} - e_{ij})^2 / e_{ij}$
(1,1)   10         120 × 66 / 500 = 15.84     $(10 - 15.84)^2 / 15.84 = 2.1531$
(1,2)   102        120 × 402 / 500 = 96.48    $(102 - 96.48)^2 / 96.48 = 0.3158$
(1,3)   8          120 × 32 / 500 = 7.68      $(8 - 7.68)^2 / 7.68 = 0.0133$
(2,1)   34         200 × 66 / 500 = 26.40     $(34 - 26.40)^2 / 26.40 = 2.1879$
(2,2)   161        200 × 402 / 500 = 160.80   $(161 - 160.80)^2 / 160.80 = 0.0002$
(2,3)   5          200 × 32 / 500 = 12.80     $(5 - 12.80)^2 / 12.80 = 4.7531$
(3,1)   12         100 × 66 / 500 = 13.20     $(12 - 13.20)^2 / 13.20 = 0.1091$
(3,2)   79         100 × 402 / 500 = 80.40    $(79 - 80.40)^2 / 80.40 = 0.0244$
(3,3)   9          100 × 32 / 500 = 6.40      $(9 - 6.40)^2 / 6.40 = 1.0563$
(4,1)   10         80 × 66 / 500 = 10.56      $(10 - 10.56)^2 / 10.56 = 0.0297$
(4,2)   60         80 × 402 / 500 = 64.32     $(60 - 64.32)^2 / 64.32 = 0.2901$
(4,3)   10         80 × 32 / 500 = 5.12       $(10 - 5.12)^2 / 5.12 = 4.6513$

$$\chi^2_c = 15.5844$$

5. Since $\chi^2_c = 15.5844 < \chi^2_{0.01,\,6} = 16.812$, we fail to reject $H_0$ and conclude that there is no significant evidence that the proportions of pins that are too thin, OK, or too thick differ among the machines.
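The hand calculation above can be reproduced as a short sketch. The hypothetical helper `chi_square_contingency` recomputes each $e_{ij}$ from the row and column totals, sums the cell contributions and returns $df = (r-1)(c-1)$. The critical value 16.812 is the tabulated $\chi^2_{0.01,\,6}$ used in step 3, not something the code derives.

```python
def chi_square_contingency(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n    # expected frequency e_ij
            stat += (o - e) ** 2 / e                 # cell contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


pins = [[10, 102, 8], [34, 161, 5], [12, 79, 9], [10, 60, 10]]
stat, df = chi_square_contingency(pins)
critical = 16.812                       # chi2_{0.01,6} read from the table
print(round(stat, 4), df)               # 15.5844 6
print("reject H0" if stat > critical else "do not reject H0")  # do not reject H0
```

With SciPy installed, `scipy.stats.chi2_contingency` performs the same computation and additionally reports a p-value.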


Exercise 5.3: 200 female owners and 200 male owners of Proton cars were selected at random and the colours of their cars were noted. The following data show the results:

                  Car colour
Gender            Black   Dull   Bright
Male              40      110    50
Female            20      80     100

Use a 1% significance level to test whether the proportions of colour preference are the same for females and males.


Chi-Square Test for Independence

This test is applied to a single population on which two categorical variables are observed.

It determines whether there is a significant association between the two variables.

E.g.: In an election survey, voters might be classified by gender (female and male) and voting preference (Democrat, Republican or Independent). This test is used to determine whether gender is related to voting preference.


The test is appropriate if the following conditions are met:
i. The sampling method is simple random sampling.
ii. The population is at least 10 times as large as the sample.
iii. The variables under study are categorical.
iv. If the sample data are displayed in a contingency table (row categories × column categories), the expected frequency for each cell of the table is at least 5.


Note: The procedure for the chi-square test for independence is the same as for the chi-square test for homogeneity.

The only difference between the two tests is in the statement of the null and alternative hypotheses; the rest of the procedure is the same for both tests.

The test for independence is used to test the following hypotheses:
$H_0$: The ROW and COLUMN variables are INDEPENDENT.
$H_1$: The ROW and COLUMN variables are NOT INDEPENDENT.


Example 5.3: Insomnia is a disease in which a person finds it hard to sleep at night. A study is conducted to determine whether the two attributes, smoking habit and insomnia, are dependent. The following data set was obtained.

                     Insomnia
Habit                Yes   No
Non-smokers          10    70
Ex-smokers           8     32
Smokers              22    38

Use a 5% significance level to conduct the study.


Solution:

                     Insomnia
Habit                Yes   No    Total
Non-smokers          10    70    80
Ex-smokers           8     32    40
Smokers              22    38    60
Total                40    140   180

1. $H_0$: Smoking habit and insomnia are independent.
$H_1$: Smoking habit and insomnia are not independent.

2. $\alpha = 0.05$; $r = 3$, $c = 2$, so $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$.

3. From the table of chi-square: $\chi^2_{0.05,\,2} = 5.991$, and we reject $H_0$ if $\chi^2_{\text{calculated}} > 5.991$.


4. Using the observed and expected frequencies in the contingency table, we calculate $\chi^2_c$ using the formula given:
$$\chi^2_c = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

Cell    $o_{ij}$   $e_{ij}$                 $(o_{ij} - e_{ij})^2 / e_{ij}$
(1,1)   10         80 × 40 / 180 = 17.78    $(10 - 17.78)^2 / 17.78 = 3.40$
(1,2)   70         80 × 140 / 180 = 62.22   $(70 - 62.22)^2 / 62.22 = 0.97$
(2,1)   8          40 × 40 / 180 = 8.89     $(8 - 8.89)^2 / 8.89 = 0.09$
(2,2)   32         40 × 140 / 180 = 31.11   $(32 - 31.11)^2 / 31.11 = 0.03$
(3,1)   22         60 × 40 / 180 = 13.33    $(22 - 13.33)^2 / 13.33 = 5.64$
(3,2)   38         60 × 140 / 180 = 46.67   $(38 - 46.67)^2 / 46.67 = 1.61$

$$\chi^2_c = 3.40 + 0.97 + 0.09 + 0.03 + 5.64 + 1.61 = 11.74$$

5. Since $\chi^2_c = 11.74 > \chi^2_{0.05,\,2} = 5.991$, we reject $H_0$ and conclude that smoking habit and insomnia are not independent.
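The same computation can be sketched for Example 5.3. The hypothetical helper `chi_square_contingency` is defined again here so the block stands alone. Because $df = 2$, the upper-tail probability has the closed form $P(\chi^2_2 > x) = e^{-x/2}$, so the p-value approach can also be applied without a table. Exact arithmetic gives about 11.73; small differences from a hand calculation come from rounding the expected frequencies to two decimals.

```python
import math


def chi_square_contingency(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)


# rows: non-smokers, ex-smokers, smokers; columns: insomnia yes, no
habit = [[10, 70], [8, 32], [22, 38]]
stat, df = chi_square_contingency(habit)
print(round(stat, 2), df)               # 11.73 2

# For df = 2 the chi-square upper-tail probability is exp(-x / 2)
p_value = math.exp(-stat / 2)
print(stat > 5.991, p_value < 0.05)     # True True
```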


Exercise 5.4: A study is conducted to determine whether students' academic performance is independent of their level of involvement in co-curricular activities. The following data set was obtained:

                            Academic performance
Co-curricular activities    Low   Fair   Good
Inactive                    40    80     60
Active                      30    90     60

Use a 5% significance level to conduct the study.


Exercise 5.5:

A total of n = 309 furniture defects were recorded and the defects were classified into four types: A, B, C, D. At the same time, each piece of furniture was identified by the production shift (1, 2 or 3) in which it was manufactured. These counts are presented in the table below. Test at the 5% significance level whether the type of defect and the production shift are independent.

                  Shift
Type of defect    1    2    3
A                 15   26   33
B                 21   31   17
C                 45   34   49
D                 13   5    20