creating a geodemographic system and more: oac and the way forward dr dan vickers social and spatial...
TRANSCRIPT
Creating a geodemographic system and more: OAC and the way forwardDr Dan VickersSocial and Spatial Inequalities Group,Department of Geography, University of Sheffield
Centre for Advanced Spatial Analysis, UCL, 10th December 2008
18/04/23 © The University of Sheffield
What are you in for?
• Creating the Output Area Classification (OAC)
• Geodemographics over time
• Measuring Segregation with geodemographics
18/04/23 © The University of Sheffield
Why classify areas?
“All the real knowledge which we possess depends on methods by which we
distinguish the similar from the dissimilar. The greater number of natural distinctions
this method comprehends the clearer becomes our idea of things. The more
numerous the objects which employ our attention the more difficult it becomes to
form such a method and the more necessary.” (Linnæus 1737)
The myths of geodemographics• The more data the better?
• The smaller the areas the more meaningful?
• These are distinct groups?
• A true depiction of reality?
The Objective in Creating OAC• To provide a publicly accessible classification,
free at the point of use for researchers, policy makers and the public, avoiding the expense of commercial classifications.
• To provide a transparent and replicable methodology with the input data as well as the classification output.
• To stimulate value added research either using the classification or the raw data.
18/04/23 © The University of Sheffield
The seven steps of cluster analysis1. Clustering elements (objects to cluster, also known as
“operational taxonomic units”)
2. Clustering variables (attributes of objects to be used)
3. Variable standardisation
4. Measure of association (proximity measure)
5. Clustering method
6. Number of clusters
7. Interpretation, testing and replication
• adapted from Milligan, G. W. (1996) Clustering validation: Results and implications for applied analyses. in Arabie, P., Hubert, L. J. and De Soete, G. Eds., Clustering and Classification. Singapore: World Scientific.
What went in • Output Areas are the smallest area for general census output.
• 223,060 in the UK
• England & WalesNumber of OAs: 174,434, Min size: 40 households, 100
people, Mean size: 124 households, 297 people
• ScotlandNumber of OAs: 42,604, Min size: 20 households, 50 people,
Mean size: 52 households, 119 people
• Northern Ireland Number of OAs: 5,022, Min size: 40 households, 100 people,
Mean size: 125 households, 336 people
What went in• 41 Census Variables covering:
• Demographic attributes: Including - age, ethnicity, country of birth and population density
• Household composition: Including - living arrangements, family type and family size.
• Housing characteristics: Including - tenure , type & size, and quality/overcrowding
• Socio-economic traits: Including - education, socio-economic class, car ownership & commuting and health & care.
• Employment attributes: Including - level of economic activity and employment class type.
Standardising the variables• Log Transformation
• Reduce the effect of extreme values
• Range Standardisation (0-1)• Problems will occur if there are differing scales
or magnitudes among the variables. In general, variables with larger values and greater variation will have more impact on the final similarity measure. It is necessary therefore to make each variable equally represented in the distance measure by standardising the data.
Clustering the data• K-means clustering• K-means is an iterative relocation algorithm based on an
error sum of squares measure. The basic operation of the algorithm is to move a case from one cluster to another to see if the move would improve the sum of squared deviations within each cluster (Aldenderfer and Blashfield, 1984).
• The case will then be assigned/re-allocated to the cluster to which it brings the greatest improvement. The next iteration occurs when all the cases have been processed. A stable classification is therefore reached when no moves occur during a complete iteration of the data. After clustering is complete, it is then possible to examine the means of each cluster for each dimension (variable) in order to assess the distinctiveness of the clusters (Everitt et al., 2001).
Issues of cluster number selection• Issue 1: Analysis of average distance from cluster centres for
each cluster number option. The ideal solution would be the number of clusters which gives smallest average distance from the cluster centre across all clusters.
• Issue 2: Analysis of cluster size homogeneity for each cluster number option. It would be useful, where possible, to have clusters of as similar size as possible in terms of the number of members within each.
• Issue 3: The number of clusters produced should be as close to the perceived ideal as possible. This means that the number of clusters needs to be of a size that is useful for further analysis.
Issues of cluster number selection
• First Level target 6, 7 selected based on analysis of, average distance from cluster centre and size of each cluster.
• Second Level target 20, 21 selected based on analysis of, average distance from cluster centre and size of each cluster.
• Third Level target 50, 52 selected based on size of each cluster. Split into either 2 or 3 groups
Average Distance from Cluster Centre
0.79
0.81
0.83
0.85
0.87
0.89
0.91
0.93
0.95
0.97
23456789101112
K
Ave
erag
e D
ista
nce
from
Clu
ster
C
entr
e
Creating a hierarchy UK Database 223,060 OAs
K-means algorithm SPSS, 7 Super Groups
1 2 3 4 5 6 7K-means algorithm SPSS, 3 Groups
5a 5b 5cK-means algorithm SPSS, 3 Sub Groups
5c1 5c2 5c3
1: BLUE COLLAR COMMUNITIES
1a: Terraced Blue Collar
1b: Younger Blue Collar
1c: Older Blue Collar
2: CITY LIVING2a: Transient Communities
2b: Settled in the City
3: COUNTRYSIDE3a: Village Life
3b: Agricultural
3c: Accessible Countryside
4: PROSPERING SUBURBS
4a: Prospering Younger Families
4b: Prospering Older Families
4c: Prospering Semis
4d: Thriving Suburbs
5: CONSTRAINED BY CIRCUMSTANCES5a: Senior Communities
5b: Older Workers
5c: Public Housing
6: TYPICAL TRAITS
6a: Settled Households
6b: Least Divergent
6c: Young Families in Terraced Homes
6d: Aspiring Households
7: MULTICULTURAL7a: Asian Communities
7b: Afro-Caribbean Communities
1: Blue Collar Communities
2: City Living
3: Countryside
4: Prospering Suburbs
5: Constrained by Circumstance
6: Typical Traits
7: Multicultural
Alternative Profiles
Mapping the Classification
http://www.maptube.org/map.aspx?mapid=1
Geodemographics over time
Variables• V01: Age 0-4• V02: Age 5-14• V03: Age 25-44• V04: Age 45-64• V05: Age 65+• V06: Indian, Pakistani &
Bangladeshi• V07: Black African, Black
Caribbean & Black Other• V08: Born Outside the UK• V09: Unemployed• V10: Working part-time• V13: Economically inactive
looking after family
• V14: No central heating• V16: Rent (private)• V17: Rent (public)• V18: 2+ Car Households• V20: Flats• V21: Detached• V22: Terraced• V23: Lone parent household• V24: Single pensioner household• V25: Single person (not
pensioner) household• V26: Population Density
18/04/23 © The University of Sheffield
Methodology
• England only
• 1991 data assigned to 2001 Output areas Simpson (2002), Norman et al. (2003)
• Clustered based on the 2001 data
• 1991 data assigned to centroids created from 2001 data
18/04/23 © The University of Sheffield
VariableCluster
1 2 3 4 5 6 7
Age 0-4 .17 .24 .18 .31 .19 .24 .26
Age 5-14 .13 .23 .23 .29 .22 .20 .29
Age 25-44 .47 .41 .30 .38 .34 .36 .33
Age 45-64 .31 .34 .50 .28 .43 .31 .33
Age 65+ .15 .13 .18 .10 .18 .21 .18
Indian, Pakistani & Bangladeshi .05 .03 .01 .44 .02 .06 .02
Black African/ Caribbean / Other .07 .02 .00 .14 .01 .16 .02
Born Outside the UK .23 .08 .06 .39 .06 .21 .05
Unemployed .14 .12 .06 .20 .08 .24 .18
Working part-time .18 .30 .32 .22 .33 .21 .30
Looking after family .15 .18 .19 .34 .16 .25 .29
No central heating .13 .15 .04 .19 .07 .08 .12
Rent (private) .30 .13 .06 .17 .06 .08 .04
Rent (public) .14 .11 .03 .18 .08 .67 .52
2+ Car Households .17 .25 .57 .18 .38 .08 .16
Flats .68 .10 .03 .17 .08 .77 .17
Detached .05 .06 .67 .05 .21 .02 .06
Terraced .17 .66 .06 .57 .15 .12 .36
Lone parent household .11 .18 .08 .21 .12 .23 .28
Single pensioner household .16 .13 .12 .10 .16 .25 .19
Single person (not pensioner) hhold .33 .19 .09 .16 .12 .28 .15
Population Density .17 .14 .04 .21 .08 .23 .10
The clusters were named as follows:
1. Urban Melting Pot2. Mixed Communities3. Out in the Sticks4. Asian Influence5. Middle Class Achievers6. Down and Out7. Working Class Endeavour
Changing prevalence of types
1991 Frequency
1991 Percent
2001 Frequency
2001 Percent
Change Freq
Change %
1. Urban Melting Pot 11,350 6.9 15,392 9.3 4,042 2.4
2. Mixed Communities 21,949 13.2 25,910 15.6 3,961 2.4
3. Out in the Sticks 32,915 19.9 35,704 21.6 2,789 1.7
4. Asian Influence 5,039 3.0 4,930 3.0 -109 0
5. Middle Class Achievers 50,578 30.5 46,791 28.2 -3,787 -2.3
6. Down and Out 8,257 5.0 10,878 6.6 2,621 1.6
7. Working Class Endeavour 35,577 21.5 26,060 15.7 -9,517 -5.8
Total 165,665 100.0 165,665 100.0 0.0 0.0
Cluster results
1991
1 2 3 4 5 6 7 Total
2001 1: Urban Melting Pot 9,200 1,318 195 467 2,245 1,116 851 15,392
2: Mixed Communities 779 15,991 237 432 3,445 76 4,950 25,910
3: Out in the Sticks 23 206 26,561 6 8,444 8 456 35,704
4: Asian Influence 153 519 3 3,699 117 129 310 4,930
5: Middle Class Achievers 482 2,628 5,658 103 33,044 27 4,849 46,791
6: Down and Out 618 126 29 204 505 6,401 2,995 10,878
7: Working Class Endeavour 95 1,161 232 128 2,778 500 21,166 26,060
Total 11,350 21,949 32,915 5,039 50,578 8,257 35,577 165,665
010002000300040005000600070008000
1: U
rban
Mel
ting
Pot
2: M
ixed
Com
mun
ities
3: O
ut in
the
Stic
ks
4: A
sian
Infl
uenc
e
5: M
iddl
eC
lass
Ach
ieve
rs
6: D
own
and
Out
7: W
orki
ngC
lass
End
eavo
ur
1991
2001
London
0200400600800
100012001400
1: U
rban
Mel
ting
Pot
2: M
ixed
Com
mun
ities
3: O
ut in
the
Stic
ks
4: A
sian
Infl
uenc
e
5: M
iddl
eC
lass
Ach
ieve
rs
6: D
own
and
Out
7: W
orki
ngC
lass
End
eavo
ur
1991
2001
Sheffield
0500
100015002000250030003500
1: U
rban
Mel
ting
Pot
2: M
ixed
Com
mun
ities
3: O
ut in
the
Stic
ks
4: A
sian
Infl
uenc
e
5: M
iddl
eC
lass
Ach
ieve
rs
6: D
own
and
Out
7: W
orki
ngC
lass
End
eavo
ur
1991
2001
Tyneside
Surry
0
500
1000
1500
20001:
Urb
anM
elti
ng P
ot
2: M
ixed
Com
mun
itie
s
3: O
ut in
the
Stic
ks
4: A
sian
Infl
uenc
e
5: M
iddl
eC
lass
Ach
ieve
rs
6: D
own
and
Out
7: W
orki
ngC
lass
End
eavo
ur
1991
2001
Measuring Segregation with geodemographics
Segregation Index
• Calculates the number of output areas that would need to be moved between Local Authority Districts to have an equal distribution of OAC types across the country
Segregation index• The index of segregation, as
used here, is the index of dissimilarity between a group and the whole of the population. This is the index we use to measure the degree to which a group is spatially polarised. The index of dissimilarity between two groups is half the sum over all areas of the absolute differences between each group divided by its respective total population. It ranges between 0 and near 100%.
18/04/23 © The University of Sheffield
where Da is the index between groups a and b (b in our case is the population as a whole), Pi is the population of area i of group a, * represents all areas, and N is the number of areas being considered
Most Segregated types• Blue Collar Communities 26.5%
• City Living 48.8%
• Countryside 53.3%
• Prospering Suburbs 21.4%
• Constrained by Circumstances 28.8%
• Typical Traits 22.7%
• Multicultural 67.7%
18/04/23 © The University of Sheffield
Where is most geodemographicly segregated
Segregated Places
1 Newham 89.3
2 City of London 87.0
3 Hackney 82.7
4 Westminster 82.7
5 Tower Hamlets 82.6
6 Lambeth 82.5
7 Kensington and Chelsea 82.2
8 Camden 82.0
9Hammersmith and Fulham 82.0
10 Islington 81.7
425 Dacorum 16.3
426 Wellingborough 15.0
427 Welwyn Hatfield 14.8
428 Gravesham 14.5
429 Cardiff 14.3
430 Sheffield 13.8
431 Leeds 11.6
432 Peterborough 11.2
433 North Hertfordshire 10.5
434 Epping Forest 9.5
Most Least
18/04/23 © The University of Sheffield
Thank you
• [email protected]•Areaclassification.org.uk•Sheffield.ac.uk/SASI