Inferential Statistics 3: The Chi Square Test
Advanced Higher Geography
Statistics
Introduction (1)
We often need to compare two characteristics of something to see whether they are linked or related to each other.
One way to do this is to work out what we would expect to find if there were no relationship between them (the usual null hypothesis), and compare it with what we actually observe.
Introduction (2)
The test we use to measure the difference between what is observed and what is expected under an assumed hypothesis is called the chi-square test.
For Example
Some null hypotheses may be:
– ‘There is no relationship between the height of the land and the vegetation cover.’
– ‘There is no difference in the location of superstores and small grocers' shops.’
– ‘There is no connection between the size of farm and the type of farm.’
Important
The chi square test can only be used on data that has the following characteristics:
– The data must be in the form of frequencies.
– The frequency data must have a precise numerical value and must be organised into categories or groups.
– The total number of observations must be greater than 20.
– The expected frequency in any one cell of the table must be greater than 5.
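The conditions above can be checked before the test is run. This is a minimal Python sketch, not part of the original method: the function names `expected_table` and `check_conditions` are hypothetical, and "precise numerical value" is interpreted here (an assumption) as non-negative whole-number counts. The first example uses the Leicester observed frequencies from the worked example that follows.

```python
def expected_table(observed):
    """Expected frequency for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def check_conditions(observed):
    """Return the conditions the table fails (an empty list means all pass)."""
    problems = []
    # Assumption: frequencies are non-negative whole numbers (counts).
    if not all(isinstance(x, int) and x >= 0 for row in observed for x in row):
        problems.append("data are not whole-number frequencies")
    total = sum(sum(row) for row in observed)
    # The total number of observations must be greater than 20.
    if total <= 20:
        problems.append("total number of observations is not greater than 20")
    # Only the EXPECTED frequencies must be greater than 5 (observed ones may be smaller).
    if total > 0 and any(e <= 5 for row in expected_table(observed) for e in row):
        problems.append("an expected frequency is not greater than 5")
    return problems


print(check_conditions([[9, 13, 10, 10, 8], [4, 3, 5, 9, 21]]))  # []
print(check_conditions([[2, 3], [4, 5]]))  # fails the total and expected-frequency checks
```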
Formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
χ² = the value of chi square
O = the observed value
E = the expected value
∑ = the sum over all cells: for each cell, (O – E) is squared and divided by E, and all of these values are then added together
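The formula translates directly into a short Python function. This is an illustrative sketch only: `chi_square` is a hypothetical name, and the two-group numbers in the example are invented purely to show the arithmetic.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over every pair of observed and expected values."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical illustration: 30 observations expected to split evenly (15 and 15)
# between two groups, but observed as 10 and 20.
print(chi_square([10, 20], [15, 15]))  # about 3.33
```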
Write down the NULL HYPOTHESIS and ALTERNATIVE HYPOTHESIS, and set the LEVEL OF SIGNIFICANCE.
NH: ‘There is no difference in the distribution of old established industries and food processing industries in the postal districts of Leicester.’
AH: ‘There is a difference in the distribution of old established industries and food processing industries in the postal districts of Leicester.’
We will set the level of significance at 0.05.
Construct a table with the information you have observed or obtained.
Observed Frequencies (O)
Post codes    | LE1 | LE2 | LE3 | LE4 | LE5 & LE6 | Row total
Old industry  |   9 |  13 |  10 |  10 |         8 |        50
Food industry |   4 |   3 |   5 |   9 |        21 |        42
Column total  |  13 |  16 |  15 |  19 |        29 |        92
(Note: although 3 cells in the table are not greater than 5, these are observed frequencies. Only the expected frequencies have to be greater than 5.)
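The observed table can be stored row by row so that the row totals, column totals and grand total are calculated rather than added by hand. This is a sketch; the variable names `POSTCODES` and `observed` are illustrative choices, not part of the original.

```python
# Observed frequencies from the Leicester table, one list per type of industry.
POSTCODES = ["LE1", "LE2", "LE3", "LE4", "LE5 & LE6"]
observed = {
    "Old industry": [9, 13, 10, 10, 8],
    "Food industry": [4, 3, 5, 9, 21],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
column_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

print(row_totals)     # {'Old industry': 50, 'Food industry': 42}
print(column_totals)  # [13, 16, 15, 19, 29]
print(grand_total)    # 92
```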
Work out the expected frequency for each cell:
$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
E.g. expected frequency for old industry in LE1 = (50 × 13) / 92 = 7.07
Expected Frequencies (E)
Post codes    |  LE1 |  LE2 |  LE3 |   LE4 | LE5 & LE6 | Row total
Old industry  | 7.07 | 8.70 | 8.15 | 10.33 |     15.76 |        50
Food industry | 5.93 | 7.30 | 6.85 |  8.67 |     13.24 |        42
Column total  |   13 |   16 |   15 |    19 |        29 |        92
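The expected-frequency rule can be applied to every cell at once. This Python sketch (the function name `expected_frequencies` is hypothetical) reproduces the expected table above when the results are rounded to two decimal places.

```python
def expected_frequencies(observed):
    """E = row total x column total / grand total, for every cell of the table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


observed = [[9, 13, 10, 10, 8],   # old industry: LE1, LE2, LE3, LE4, LE5 & LE6
            [4, 3, 5, 9, 21]]     # food industry
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [7.07, 8.7, 8.15, 10.33, 15.76]
# [5.93, 7.3, 6.85, 8.67, 13.24]
```

Each row of expected frequencies still adds up to the observed row total (50 and 42), which is a quick check on the arithmetic.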
For each of the cells, calculate:
$$\frac{(O - E)^2}{E}$$
E.g. old industry in LE1: (9 – 7.07)² / 7.07 = 0.53
(O – E)² / E
Post codes    |  LE1 |  LE2 |  LE3 |  LE4 | LE5 & LE6
Old industry  | 0.53 | 2.13 | 0.42 | 0.01 |      3.82
Food industry | 0.63 | 2.54 | 0.50 | 0.01 |      4.55
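Each cell's contribution (O – E)² / E can be calculated in the same table layout. This is a sketch with hypothetical names (`component`, `components`); it recalculates the expected frequencies from the observed table so that it runs on its own.

```python
def component(o, e):
    """Contribution of one cell to chi square: (O - E)^2 / E."""
    return (o - e) ** 2 / e


observed = [[9, 13, 10, 10, 8],   # old industry: LE1, LE2, LE3, LE4, LE5 & LE6
            [4, 3, 5, 9, 21]]     # food industry
row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E for each cell is row total x column total / grand total.
components = [
    [component(o, r * c / grand_total) for o, c in zip(o_row, column_totals)]
    for o_row, r in zip(observed, row_totals)
]
for row in components:
    print([round(x, 2) for x in row])
# [0.53, 2.13, 0.42, 0.01, 3.82]
# [0.63, 2.54, 0.5, 0.01, 4.55]
```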
Add up all of the above numbers to obtain the value for chi square: χ² = 15.14.
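The whole calculation, from observed frequencies to the final value of chi square, fits in one function. This is a sketch (`chi_square_statistic` is a hypothetical name). It works with unrounded expected values, which gives about 15.139; this rounds to the same 15.14 obtained by adding the rounded cell values.

```python
def chi_square_statistic(observed):
    """Chi square for a table of observed frequencies (rows x columns)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    total = 0.0
    for o_row, r in zip(observed, row_totals):
        for o, c in zip(o_row, column_totals):
            e = r * c / grand_total       # expected frequency for this cell
            total += (o - e) ** 2 / e     # add this cell's (O - E)^2 / E
    return total


chi_sq = chi_square_statistic([[9, 13, 10, 10, 8], [4, 3, 5, 9, 21]])
print(round(chi_sq, 2))  # 15.14
```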
Look up the significance tables. These will tell you whether to accept the null hypothesis or reject it.
The number of degrees of freedom to use is the number of rows in the table minus 1, multiplied by the number of columns minus 1 (counting only the category rows and columns, not the totals). This is (2 – 1) × (5 – 1) = 1 × 4 = 4 degrees of freedom.
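The degrees-of-freedom rule is a one-line calculation. In this sketch (`degrees_of_freedom` is a hypothetical name) the table is passed without its total row and total column, so its size matches the category counts used above.

```python
def degrees_of_freedom(n_rows, n_columns):
    """(rows - 1) x (columns - 1), counting category rows/columns only, not totals."""
    return (n_rows - 1) * (n_columns - 1)


observed = [[9, 13, 10, 10, 8], [4, 3, 5, 9, 21]]  # 2 rows x 5 columns, no totals
print(degrees_of_freedom(len(observed), len(observed[0])))  # 4
```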
We find that our answer of 15.14 is greater than the critical value of 9.49 (for 4 degrees of freedom and a significance level of 0.05) and so we reject the null hypothesis.
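The decision rule compares chi square with the critical value for the chosen degrees of freedom and significance level. This is a sketch: the dictionary `CRITICAL_05` holds standard 0.05 critical values for 1 to 5 degrees of freedom, rounded to two decimal places, and `decide` is a hypothetical helper name.

```python
# Critical values of chi square at the 0.05 significance level, as printed in
# standard significance tables (only 1-5 degrees of freedom are included here).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def decide(chi_sq, dof, critical_values=CRITICAL_05):
    """Reject the null hypothesis if chi square is greater than the critical value."""
    critical = critical_values[dof]
    if chi_sq > critical:
        return f"reject the null hypothesis ({chi_sq:.2f} > {critical})"
    return f"accept the null hypothesis ({chi_sq:.2f} <= {critical})"


print(decide(15.14, 4))  # reject the null hypothesis (15.14 > 9.49)
```

If SciPy is installed, `scipy.stats.chi2.ppf(0.95, 4)` returns approximately 9.49, the same critical value that is read from the table.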
‘The distribution of old established industry and food processing industries in Leicester is significantly different.’
Now you have to look for geographical factors to explain your findings.