6. categorical data analysis - chi-square & fisher exact test


Upload: razif-shahril

Post on 13-Jan-2017


Page 1: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

KNOWLEDGE FOR THE BENEFIT OF HUMANITY

BIOSTATISTICS (HFS3283)

CATEGORICAL DATA (CHI-SQUARE & FISHER EXACT TEST)

Dr. Mohd Razif Shahril

School of Nutrition & Dietetics

Faculty of Health Sciences

Universiti Sultan Zainal Abidin


Page 2: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN

Topic Learning Outcomes

At the end of this lecture, students should be able to:

• identify the types of categorical data analysis and their uses

• explain the assumptions to be met when using the chi-square and Fisher exact tests

• perform the chi-square and Fisher exact tests using SPSS

• explain how to interpret the SPSS outputs from the chi-square and Fisher exact tests

Page 3: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


What is categorical data analysis?

• Independent (Explanatory) Variable is Categorical (Nominal or Ordinal)

• Dependent (Response) Variable is Categorical (Nominal or Ordinal)

• Most common:

– 2x2 (Each variable has 2 levels)

– Nominal/Nominal

– Nominal/Ordinal

– Ordinal/Ordinal

CONTINGENCY TABLE

Page 4: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


Contingency Table

• Tables representing all combinations of the levels of the explanatory and response variables

• Numbers in the table represent counts of the number of cases in each cell

• Row and column totals are called marginal counts

Page 5: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


Example of Contingency Table

• Response Variable – Cognitive Level (Low, High)

• Explanatory Variable – BMI (Underweight, Normal, Overweight, Obese)

BMI            Cognitive: Low   Cognitive: High   Total
Underweight     59              232                291
Normal          54              367                421
Overweight     114              101                215
Obese          173               54                227
Total          400              754               1154

The Total row and the Total column hold the marginal counts; the remaining cells hold the counts.
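
As a minimal sketch of how the marginal counts follow from the cell counts, the example table can be summed by row and by column. The variable names (`counts`, `row_totals`, `col_totals`) are illustrative choices, not part of the lecture, and only the Python standard library is assumed.

```python
# Cell counts from the example table: BMI group -> (Low, High) cognitive level.
counts = {
    "Underweight": (59, 232),
    "Normal": (54, 367),
    "Overweight": (114, 101),
    "Obese": (173, 54),
}

# Row marginal counts: total number of cases in each BMI group.
row_totals = {bmi: sum(cells) for bmi, cells in counts.items()}

# Column marginal counts: total number of Low and of High cases.
col_totals = tuple(sum(column) for column in zip(*counts.values()))

# Grand total: every case in the table.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The totals computed this way should match the Total row and Total column of the table.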

Page 6: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


2 x 2 Contingency Table

• Each variable has 2 levels

– Explanatory Variable – Groups (Typically based on demographics, exposure, or treatment)

– Response Variable – Outcome (Typically presence or absence of a characteristic)

BMI       Cognitive: Low   Cognitive: High   Total
≤ 24.9    113              599                712
> 24.9    287              155                442
Total     400              754               1154
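
The counts in this 2 x 2 table are consistent with merging the four BMI categories of the earlier example into two groups. The sketch below makes that step explicit. The grouping (Underweight + Normal as ≤ 24.9, Overweight + Obese as > 24.9) is an assumption inferred from the numbers, since the slide does not state it, and the names `groups` and `table_2x2` are illustrative.

```python
# Counts from the 4 x 2 example: BMI group -> (Low, High) cognitive level.
counts = {
    "Underweight": (59, 232),
    "Normal": (54, 367),
    "Overweight": (114, 101),
    "Obese": (173, 54),
}

# Assumed grouping; the slide does not say which categories form each row.
groups = {
    "<= 24.9": ("Underweight", "Normal"),
    "> 24.9": ("Overweight", "Obese"),
}

# Sum the Low and High counts over the categories merged into each group.
table_2x2 = {
    label: tuple(sum(counts[bmi][j] for bmi in members) for j in range(2))
    for label, members in groups.items()
}

# Report each group with its proportion of Low cognitive level.
for label, (low, high) in table_2x2.items():
    total = low + high
    print(f"{label}: Low={low}, High={high}, Total={total}, "
          f"proportion Low={low / total:.3f}")
```

Comparing the proportion of Low cognitive level between the two groups is the kind of comparison a 2 x 2 table is set up for.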

Page 7: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


Chi-Square Test (X²)

• Hypothesis:

– Comparing two or more proportions

– H0: P1 = P2

• Assumptions:

– Random samples (based on study design and method)

– Observations are independent (based on study design and method)

– The number of cells with an Expected Count (EC) of less than 5 must be less than 20% of the total number of cells

– The smallest EC must be at least 2

The expected count is calculated for each cell (SPSS will do it).

The chi-square test for independence, also called Pearson's chi-square test or the chi-square test of association, is used to discover if there is a relationship between two categorical variables.
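
SPSS reports the expected counts, but the two expected-count rules can also be checked by hand. The following is a rough sketch, assuming the usual definition of the expected count (row total × column total ÷ grand total). The function names `expected_counts` and `check_ec_assumptions` are hypothetical helpers, and the data are the BMI by cognitive level counts from the earlier example.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_ec_assumptions(table):
    """Apply the two expected-count (EC) rules stated in the lecture."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    smallest_ec = min(cells)
    return {
        "share_below_5": share_below_5,  # must be less than 0.20
        "smallest_ec": smallest_ec,      # must be at least 2
        "met": share_below_5 < 0.20 and smallest_ec >= 2,
    }


# Rows: Underweight, Normal, Overweight, Obese; columns: Low, High.
bmi_by_cognitive = [[59, 232], [54, 367], [114, 101], [173, 54]]
result = check_ec_assumptions(bmi_by_cognitive)
print(result)
```

With counts this large, no expected count comes close to 5, so both rules are satisfied. The rules matter mainly for small samples or sparse categories.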

Page 8: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


Example Chi-Square Test (X²) – (1)

• Hypothesis:

– Association between gender and Knowledge on Nutrition (KoN)

– Comparing the proportion of Low KoN between genders

– H0: P(KoN)male = P(KoN)female

• Assumptions:

– Random samples [ √ ]

– Observations are independent [ √ ]

– The number of cells with an Expected Count (EC) of less than 5 must be less than 20% of the total number of cells (calculated by SPSS)

– The smallest EC must be at least 2 (calculated by SPSS)

Page 9: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

Chi-square using SPSS - procedure:

[Screenshots of the SPSS procedure, steps 1 to 3]

Page 10: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

Chi-square using SPSS - procedure:

[Screenshots of the SPSS procedure, steps 4 to 9]

Page 11: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

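Chi-square using SPSS - Output:

[Screenshot of the SPSS output, annotated as follows]

• Descriptive statistics for each group

• Chi-square statistic = 0.417; df = 1; P-value = 0.518

• Both EC assumptions are met: the percentage of cells with an EC of less than 5 must be < 20%, and the smallest EC must be ≥ 2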

Page 12: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

Chi-square using SPSS – Table and Interpretation:

Variable           n    Low KoN      High KoN     X² statistic a   P-value
                        Freq (%)     Freq (%)     (df)
Gender
  Male             39   19 (48.7)    20 (51.3)    0.417 (1)        0.518
  Female           34   14 (41.2)    20 (58.8)
Ethnicity
  Malay
  Others
Education Level
  Low
  High

Table 1: Factors (categorical variable) associated with Knowledge on Nutrition

a Chi-square test for independence

The prevalence (proportion) of Low Knowledge on Nutrition is not significantly different between males and females (P = 0.518). Therefore, there is no significant association between gender and Knowledge on Nutrition.
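
The reported statistic can be reproduced from the four gender counts in Table 1. This is a sketch under stated assumptions rather than the SPSS procedure itself: it computes the Pearson chi-square without continuity correction, and it uses the identity that, for df = 1, the upper-tail chi-square probability equals erfc(√(x/2)). The helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: Male, Female; columns: Low KoN, High KoN (counts from Table 1).
gender_by_kon = [[19, 20], [14, 20]]
statistic, p_value = chi_square_2x2(gender_by_kon)
print(f"chi-square = {statistic:.3f}, df = 1, P = {p_value:.3f}")
```

The printed values should agree with the 0.417 and 0.518 shown in the table. Because the P-value is above 0.05, the null hypothesis of equal proportions is not rejected.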

Page 13: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


What if assumptions were not met?

• Combine adjacent columns and/or rows to increase the EC, if possible.

• If the expected count assumptions are still not met, Fisher's exact (FE) test can be applied (only for 2 x 2 tables in SPSS).

Page 14: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


Example Chi-Square Test (X²) – (2)

• Hypothesis:

– Association between ethnicity and Knowledge on Nutrition (KoN)

– Comparing the proportion of Low KoN between ethnic groups

– H0: P(KoN)malay = P(KoN)chinese = P(KoN)indian = P(KoN)others

• Assumptions:

– Random samples [ √ ]

– Observations are independent [ √ ]

– The number of cells with an Expected Count (EC) of less than 5 must be less than 20% of the total number of cells (calculated by SPSS)

– The smallest EC must be at least 2 (calculated by SPSS)

Page 15: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

Chi-square using SPSS - Output:

[Screenshot of the SPSS output, annotated as follows]

• Descriptive statistics for each group

• The EC assumptions are not met: the percentage of cells with an EC of less than 5 must be < 20%, and the smallest EC must be ≥ 2. Here, 4 cells (50%) have an EC of less than 5, and the smallest EC is 1.36.

• One remedy may be to combine Indian and others (or even to combine 3 levels) and call the result "others". (The combination should be interpretable/meaningful.)

Page 16: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

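Chi-square using SPSS - Output:

[Screenshot of the SPSS output after combining categories, annotated as follows]

• Descriptive statistics for each group

• Both EC assumptions are met: the percentage of cells with an EC of less than 5 must be < 20%, and the smallest EC must be ≥ 2

• Chi-square statistic = 0.072; df = 1; P-value = 0.788

• If the EC assumptions are still not met, Fisher's exact test can be applied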

Page 17: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

Chi-square using SPSS – Table and Interpretation:

Variable           n    Low KoN      High KoN     X² statistic a   P-value
                        Freq (%)     Freq (%)     (df)
Gender
  Male             39   19 (48.7)    20 (51.3)    0.417 (1)        0.518
  Female           34   14 (41.2)    20 (58.8)
Ethnicity
  Malay            43   20 (46.5)    23 (53.5)    0.072 (1)        0.788
  Others           30   13 (43.3)    17 (56.7)
Education Level
  Low
  High

Table 1: Factors (categorical variable) associated with Knowledge on Nutrition

a Chi-square test for independence

The prevalence (proportion) of Low Knowledge on Nutrition is not significantly different between Malay and other ethnicities (P = 0.788). Therefore, there is no significant association between ethnicity and Knowledge on Nutrition.
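
The "Freq (%)" columns of Table 1 are row percentages: each count divided by its row's n. A small sketch of that arithmetic follows. The helper `freq_percent` and the `report` dictionary are illustrative names, and the counts are the ones shown in the table.

```python
# (Low KoN, High KoN) counts for the rows filled in so far in Table 1.
rows = {
    "Male": (19, 20),
    "Female": (14, 20),
    "Malay": (20, 23),
    "Others": (13, 17),
}


def freq_percent(count, n):
    """Format a count with its row percentage, e.g. '19 (48.7)'."""
    return f"{count} ({100 * count / n:.1f})"


report = {}
for label, (low, high) in rows.items():
    n = low + high  # row total, shown in the n column
    report[label] = (n, freq_percent(low, n), freq_percent(high, n))
    print(label, *report[label])
```

Using row percentages keeps the comparison on the proportion of Low KoN within each group, which is what the hypothesis is about.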

Page 18: 6. Categorical data analysis - Chi-Square & Fisher Exact Test


Fisher Exact Test

• Fisher's Exact Test is a test for independence in a 2 x 2 table.

• It is most useful when the total sample size and the expected values are small.

– Useful when E(cell counts) < 5.

• The output consists of more than one p-value:

– Choose Exact Sig. (2-sided)
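
To show what an exact two-sided p-value involves, here is a minimal sketch of Fisher's exact test for a 2 x 2 table using only the standard library. It assumes the common convention of summing the hypergeometric probabilities of all tables, with the same margins, that are no more likely than the observed one. That this matches the "Exact Sig. (2-sided)" value in SPSS is an assumption here, not something the lecture states. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    observed = prob(a)
    # Sum over tables no more likely than the observed one; the small
    # tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Illustration only: gender by KoN counts from Table 1. The EC assumptions
# were met for this table, so the chi-square test was the reported choice.
p_gender = fisher_exact_two_sided([[19, 20], [14, 20]])
print(f"Fisher exact two-sided P = {p_gender:.3f}")
```

Because the calculation enumerates every possible table with the observed margins, it does not rely on the large-sample approximation behind the chi-square test, which is why it suits small expected counts.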

Page 19: 6. Categorical data analysis - Chi-Square & Fisher Exact Test

Thank You
