6. Categorical Data Analysis - Chi-Square & Fisher Exact Test

Post on 13-Jan-2017

KNOWLEDGE FOR THE BENEFIT OF HUMANITY

BIOSTATISTICS (HFS3283)

CATEGORICAL DATA (CHI-SQUARE & FISHER EXACT TEST)

Dr. Mohd Razif Shahril

School of Nutrition & Dietetics

Faculty of Health Sciences

Universiti Sultan Zainal Abidin

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN

Topic Learning Outcomes

At the end of this lecture, students should be able to:

• identify the types of categorical data analysis and their uses

• explain the assumptions that must be met when using the chi-square and Fisher's exact tests

• perform the chi-square and Fisher's exact tests using SPSS

• explain how to interpret the SPSS outputs from the chi-square and Fisher's exact tests

What is categorical data analysis?

• The independent (explanatory) variable is categorical (nominal or ordinal)

• The dependent (response) variable is categorical (nominal or ordinal)

• Most common combinations:

– 2 x 2 (each variable has 2 levels)

– Nominal/Nominal

– Nominal/Ordinal

– Ordinal/Ordinal

Data of this kind are summarised in a contingency table.

Contingency Table

• A contingency table shows all combinations of the levels of the explanatory and response variables

• The numbers in the table are counts of the cases in each cell

• The row and column totals are called marginal counts

Example of Contingency Table

• Response variable: Cognitive Level (Low, High)

• Explanatory variable: BMI (Underweight, Normal, Overweight, Obese)

BMI             Cognitive Level     Total
                Low       High
Underweight      59        232        291
Normal           54        367        421
Overweight      114        101        215
Obese           173         54        227
Total           400        754       1154

The cell entries are counts. The Total row and the Total column hold the marginal counts.
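The marginal counts in this example can be recomputed from the cell counts. The lecture itself uses SPSS, so the following Python sketch is only an illustration, and its variable names are not from the slides.

```python
# Cell counts from the example: rows are BMI groups, columns are (Low, High).
counts = {
    "Underweight": [59, 232],
    "Normal": [54, 367],
    "Overweight": [114, 101],
    "Obese": [173, 54],
}

# Row marginal counts: total number of cases in each BMI group.
row_totals = {group: sum(cells) for group, cells in counts.items()}

# Column marginal counts: total number of cases at each cognitive level.
col_totals = [sum(col) for col in zip(*counts.values())]

# Grand total: every case is counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals, column totals and grand total printed here should match the Total row and Total column of the table.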

2 x 2 Contingency Table

• Each variable has 2 levels

– Explanatory variable: groups (typically based on demographics, exposure, or treatment)

– Response variable: outcome (typically the presence or absence of a characteristic)

BMI        Cognitive Level     Total
           Low       High
≤ 24.9     113        599        712
> 24.9     287        155        442
Total      400        754       1154
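The counts in this 2 x 2 table are consistent with merging the four BMI groups of the previous example into two bands. The Python sketch below performs that merge. It assumes that Underweight and Normal form the ≤ 24.9 band and that Overweight and Obese form the > 24.9 band. The slides do not state this mapping, and the function name is illustrative.

```python
# Counts (Low, High cognitive level) for the four BMI groups.
four_groups = {
    "Underweight": (59, 232),
    "Normal": (54, 367),
    "Overweight": (114, 101),
    "Obese": (173, 54),
}

# Assumed mapping of the original groups onto the two BMI bands.
bands = {
    "<= 24.9": ["Underweight", "Normal"],
    "> 24.9": ["Overweight", "Obese"],
}


def merge_rows(table, mapping):
    """Add up the cell counts of the rows that fall in the same band."""
    merged = {}
    for band, members in mapping.items():
        rows = [table[name] for name in members]
        merged[band] = tuple(sum(col) for col in zip(*rows))
    return merged


two_by_two = merge_rows(four_groups, bands)
print(two_by_two)
```

The same kind of merge is what "combining adjacent rows or columns" means when expected counts are too small.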

Chi-Square Test (χ²)

• Hypothesis:

– Comparing two or more proportions

– H0: P1 = P2

• Assumptions:

– Random samples (based on study design and method)

– Observations are independent (based on study design and method)

– The number of cells with an expected count (EC) of less than 5 must be less than 20% of the total number of cells (the expected count is calculated for each cell; SPSS will do it)

– The smallest EC must be at least 2 (SPSS will calculate it)

The chi-square test for independence, also called Pearson's chi-square test or the chi-square test of association, is used to discover whether there is a relationship between two categorical variables.
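For reference, the expected counts and the test statistic follow the standard Pearson definitions. The notation here is not from the slides: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, n is the grand total, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)
```

For a 2 x 2 table this gives df = 1.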

Example Chi-Square Test (χ²) – (1)

• Hypothesis:

– Association between gender and Knowledge on Nutrition (KoN)

– Comparing the proportion of Low KoN between genders

– H0: P(KoN)male = P(KoN)female

• Assumptions:

– Random samples [ √ ]

– Observations are independent [ √ ]

– The number of cells with an expected count (EC) of less than 5 must be less than 20% of the total number of cells (calculated by SPSS)

– The smallest EC must be at least 2 (calculated by SPSS)

Chi-square using SPSS - procedure:

[Figure: SPSS screenshots showing steps 1 to 9 of the procedure]

Chi-square using SPSS - Output:

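[Figure: SPSS output tables]

• Descriptive statistics for each group

• Chi-square statistic = 0.417; df = 1; P-value = 0.518

• EC assumptions: the percentage of cells with an EC of less than 5 must be < 20%, and the smallest EC must be ≥ 2

• Both EC assumptions are met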

Chi-square using SPSS – Table and Interpretation:

Variable            n     Low KoN      High KoN     χ² statistic a (df)   P-value
                          Freq (%)     Freq (%)
Gender
  Male              39    19 (48.7)    20 (51.3)    0.417 (1)             0.518
  Female            34    14 (41.2)    20 (58.8)
Ethnicity
  Malay
  Others
Education Level
  Low
  High

Table 1: Factors (categorical variables) associated with Knowledge on Nutrition

a Chi-square test for independence

The prevalence (proportion) of Low Knowledge on Nutrition does not differ significantly between males and females (P = 0.518). Therefore, there is no significant association between gender and Knowledge on Nutrition.
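The reported statistic can be checked by hand from the gender counts in Table 1. The sketch below is a plain-Python illustration rather than the SPSS procedure, and the function names are made up for this example. It computes the Pearson chi-square statistic without continuity correction, and the P-value from a closed form that holds only for df = 1.

```python
import math

# Observed counts from Table 1: rows are (Male, Female),
# columns are (Low KoN, High KoN).
observed = [[19, 20], [14, 20]]


def expected_counts(table):
    """Expected count of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic, without continuity correction."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with df = 1.

    Valid for a 2 x 2 table only: P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


statistic = chi_square_statistic(observed)
p_value = p_value_df1(statistic)
print(f"chi-square = {statistic:.3f}, df = 1, P = {p_value:.3f}")
```

Rounded to three decimals, the result agrees with the values in the table (0.417 and 0.518).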

What if assumptions were not met?

• Combine adjacent columns and/or rows to increase the EC, if possible.

• If the expected count assumptions are still not met, Fisher's exact (FE) test can be applied (only for 2 x 2 tables in SPSS).
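The rule above, together with the two EC assumptions, can be sketched as a small Python function. This is one possible reading of the slides, not an SPSS feature. The function names are illustrative, and the second table in the usage lines is made-up data chosen only to trigger the small-count branch.

```python
def expected_counts(table):
    """Expected count of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_ec_assumptions(table):
    """Return (share of cells with EC < 5, smallest EC, both rules satisfied)."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    smallest = min(cells)
    return share_small, smallest, (share_small < 0.20 and smallest >= 2)


def suggest_test(table):
    """Apply the lecture's decision rule to a table of observed counts."""
    _, _, assumptions_met = check_ec_assumptions(table)
    if assumptions_met:
        return "chi-square test"
    if len(table) == 2 and len(table[0]) == 2:
        # A 2 x 2 table cannot be collapsed any further.
        return "Fisher's exact test"
    return "combine adjacent rows or columns, then re-check"


# Gender by KoN counts from Table 1: both EC assumptions hold.
print(suggest_test([[19, 20], [14, 20]]))

# Hypothetical small 2 x 2 table (made-up counts): every EC is below 5.
print(suggest_test([[3, 1], [1, 3]]))
```

Any merge of categories suggested by the last branch still has to be interpretable and meaningful for the study.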

Example Chi-Square Test (χ²) – (2)

• Hypothesis:

– Association between ethnicity and Knowledge on Nutrition (KoN)

– Comparing the proportion of Low KoN between ethnic groups

– H0: P(KoN)Malay = P(KoN)Chinese = P(KoN)Indian = P(KoN)others

• Assumptions:

– Random samples [ √ ]

– Observations are independent [ √ ]

– The number of cells with an expected count (EC) of less than 5 must be less than 20% of the total number of cells (calculated by SPSS)

– The smallest EC must be at least 2 (calculated by SPSS)

Chi-square using SPSS - Output:

[Figure: SPSS output tables]

• Descriptive statistics for each group

• EC assumptions: the percentage of cells with an EC of less than 5 must be < 20%, and the smallest EC must be ≥ 2

• 4 (50%) cells have an EC of less than 5 and the smallest EC is 1.36, so neither EC assumption is met

• One remedy may be to combine Indian and Others (or even to combine 3 levels) and call the result "Others". The combination should be interpretable and meaningful.

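Chi-square using SPSS - Output (after combining categories):

[Figure: SPSS output tables]

• Descriptive statistics for each group

• EC assumptions: the percentage of cells with an EC of less than 5 must be < 20%, and the smallest EC must be ≥ 2

• Both EC assumptions are met

• Chi-square statistic = 0.072; df = 1; P-value = 0.788

• If the EC assumptions are still not met, report Fisher's exact test instead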

Chi-square using SPSS – Table and Interpretation:

Variable            n     Low KoN      High KoN     χ² statistic a (df)   P-value
                          Freq (%)     Freq (%)
Gender
  Male              39    19 (48.7)    20 (51.3)    0.417 (1)             0.518
  Female            34    14 (41.2)    20 (58.8)
Ethnicity
  Malay             43    20 (46.5)    23 (53.5)    0.072 (1)             0.788
  Others            30    13 (43.3)    17 (56.7)
Education Level
  Low
  High

Table 1: Factors (categorical variables) associated with Knowledge on Nutrition

a Chi-square test for independence

The prevalence (proportion) of Low Knowledge on Nutrition does not differ significantly between Malays and other ethnic groups (P = 0.788). Therefore, there is no significant association between ethnicity and Knowledge on Nutrition.

Fisher Exact Test

• Fisher's exact test is a test for independence in a 2 x 2 table.

• It is most useful when the total sample size and the expected values are small.

– Useful when E(cell counts) < 5.

• The output contains more than one p-value:

– Choose Exact Sig. (2-sided)
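To show what an exact two-sided p-value involves, here is a minimal Python sketch using the hypergeometric distribution. It uses one common definition of the two-sided p-value: the sum of the probabilities of all tables that are no more probable than the observed one. This is assumed, not confirmed, to correspond to SPSS's Exact Sig. (2-sided). The function name is illustrative. The gender counts from Table 1 are reused purely as input, since the lecture does not report a Fisher's exact result for them.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left count follows a hypergeometric
    distribution. The p-value sums the probabilities of every feasible
    table that is no more probable than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)  # smallest feasible top-left count
    highest = min(row1, col1)     # largest feasible top-left count
    p_observed = prob(a)
    tolerance = 1 + 1e-9          # guards against floating-point ties
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, total)


# Gender by Knowledge on Nutrition counts from Table 1 (illustration only).
p_gender = fisher_exact_two_sided([[19, 20], [14, 20]])
print(f"Exact two-sided P = {p_gender:.3f}")
```

Because the calculation enumerates every table with the same margins, it does not rely on the expected-count assumptions of the chi-square test.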

Thank You
