
Creating a geodemographic system and more: OAC and the way forward

Dr Dan Vickers

Social and Spatial Inequalities Group, Department of Geography, University of Sheffield

Centre for Advanced Spatial Analysis, UCL, 10th December 2008

18/04/23 © The University of Sheffield

What are you in for?

• Creating the Output Area Classification (OAC)

• Geodemographics over time

• Measuring Segregation with geodemographics


Why classify areas?

“All the real knowledge which we possess depends on methods by which we distinguish the similar from the dissimilar. The greater number of natural distinctions this method comprehends the clearer becomes our idea of things. The more numerous the objects which employ our attention the more difficult it becomes to form such a method and the more necessary.” (Linnæus 1737)

The myths of geodemographics

• The more data the better?

• The smaller the areas the more meaningful?

• These are distinct groups?

• A true depiction of reality?

The Objective in Creating OAC

• To provide a publicly accessible classification, free at the point of use for researchers, policy makers and the public, avoiding the expense of commercial classifications.

• To provide a transparent and replicable methodology with the input data as well as the classification output.

• To stimulate value added research either using the classification or the raw data.


The seven steps of cluster analysis

1. Clustering elements (objects to cluster, also known as “operational taxonomic units”)

2. Clustering variables (attributes of objects to be used)

3. Variable standardisation

4. Measure of association (proximity measure)

5. Clustering method

6. Number of clusters

7. Interpretation, testing and replication

• adapted from Milligan, G. W. (1996) Clustering validation: Results and implications for applied analyses. in Arabie, P., Hubert, L. J. and De Soete, G. Eds., Clustering and Classification. Singapore: World Scientific.

What went in

• Output Areas (OAs) are the smallest areas for general census output.

• 223,060 in the UK

• England & Wales: Number of OAs: 175,434; Min size: 40 households, 100 people; Mean size: 124 households, 297 people

• Scotland: Number of OAs: 42,604; Min size: 20 households, 50 people; Mean size: 52 households, 119 people

• Northern Ireland: Number of OAs: 5,022; Min size: 40 households, 100 people; Mean size: 125 households, 336 people

What went in

• 41 Census Variables covering:

• Demographic attributes: Including - age, ethnicity, country of birth and population density

• Household composition: Including - living arrangements, family type and family size.

• Housing characteristics: Including - tenure, type & size, and quality/overcrowding.

• Socio-economic traits: Including - education, socio-economic class, car ownership & commuting and health & care.

• Employment attributes: Including - level of economic activity and employment class type.

Standardising the variables

• Log Transformation

• Reduce the effect of extreme values

• Range Standardisation (0-1)

• Problems will occur if there are differing scales or magnitudes among the variables. In general, variables with larger values and greater variation will have more impact on the final similarity measure. It is necessary therefore to make each variable equally represented in the distance measure by standardising the data.
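The two standardisation steps can be sketched in a few lines of Python. This is only an illustration: the log(x + 1) offset and the example percentages are assumptions, not the values or exact transformation used for OAC.

```python
import math

def log_transform(values):
    """Apply log(x + 1); the +1 offset (an assumption) keeps zero values defined."""
    return [math.log(v + 1.0) for v in values]

def range_standardise(values):
    """Rescale values to the 0-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical percentages for one census variable across five areas
pct_flats = [0.0, 2.5, 10.0, 40.0, 95.0]
scaled = range_standardise(log_transform(pct_flats))
print([round(s, 3) for s in scaled])
```

After both steps every variable lies between 0 and 1, so no single variable dominates the distance measure simply because of its units.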

Clustering the data

• K-means clustering

• K-means is an iterative relocation algorithm based on an error sum of squares measure. The basic operation of the algorithm is to move a case from one cluster to another to see if the move would improve the sum of squared deviations within each cluster (Aldenderfer and Blashfield, 1984).

• The case will then be assigned/re-allocated to the cluster to which it brings the greatest improvement. The next iteration occurs when all the cases have been processed. A stable classification is therefore reached when no moves occur during a complete iteration of the data. After clustering is complete, it is then possible to examine the means of each cluster for each dimension (variable) in order to assess the distinctiveness of the clusters (Everitt et al., 2001).
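A minimal sketch of the idea in Python follows. It uses the common batch variant (reassign every case to its nearest centre, then recompute the centres) rather than the case-by-case relocation described above, and it is not the SPSS routine. Seeding the centres with the first k cases is an assumption made for simplicity.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def k_means(data, k, max_iter=100):
    centres = [list(p) for p in data[:k]]  # assumption: seed with the first k cases
    labels = [None] * len(data)
    for _ in range(max_iter):
        # Assign each case to the centre with the smallest squared distance
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in data
        ]
        if new_labels == labels:
            break  # stable: no case moved during a complete pass
        labels = new_labels
        # Recompute each centre as the mean of its current members
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centres[c] = mean_point(members)
    return labels, centres

# Hypothetical standardised data: two obvious groups
data = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.9], [0.05, 0.05], [0.95, 0.95]]
labels, centres = k_means(data, 2)
print(labels, centres)
```

The returned centres are the cluster means for each variable, which is what is inspected afterwards to judge how distinctive the clusters are.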

Issues of cluster number selection

• Issue 1: Analysis of average distance from cluster centres for each cluster number option. The ideal solution would be the number of clusters which gives smallest average distance from the cluster centre across all clusters.

• Issue 2: Analysis of cluster size homogeneity for each cluster number option. It would be useful, where possible, to have clusters of as similar size as possible in terms of the number of members within each.

• Issue 3: The number of clusters produced should be as close to the perceived ideal as possible. This means that the number of clusters needs to be of a size that is useful for further analysis.
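The first two criteria can be computed directly from a clustering result. The sketch below assumes Euclidean distance and uses the gap between the largest and smallest cluster as a simple, hypothetical measure of size homogeneity; the data are made up.

```python
import math
from collections import Counter

def average_distance_to_centre(data, labels, centres):
    """Mean Euclidean distance from each case to its own cluster centre."""
    total = sum(math.dist(p, centres[lab]) for p, lab in zip(data, labels))
    return total / len(data)

def size_spread(labels):
    """Difference between the largest and smallest cluster sizes (0 = equal sizes)."""
    sizes = Counter(labels)
    return max(sizes.values()) - min(sizes.values())

# Hypothetical data clustered with k = 1 and with k = 2
data = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
avg_k1 = average_distance_to_centre(data, [0, 0, 0, 0], [[2.0, 1.0]])
avg_k2 = average_distance_to_centre(data, [0, 0, 1, 1], [[0.0, 1.0], [4.0, 1.0]])
print(avg_k1, avg_k2, size_spread([0, 0, 1, 1]))
```

Average distance tends to fall as more clusters are added, which is why it is weighed against cluster size and the target number of clusters rather than minimised on its own.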

Issues of cluster number selection

• First level: target 6; 7 selected, based on analysis of average distance from cluster centre and the size of each cluster.

• Second level: target 20; 21 selected, based on analysis of average distance from cluster centre and the size of each cluster.

• Third level: target 50; 52 selected, based on the size of each cluster. Each second-level group was split into either 2 or 3 groups.

[Figure: Average distance from cluster centre (vertical axis, 0.79 to 0.97) against number of clusters K (horizontal axis, 2 to 12)]

Creating a hierarchy

UK database: 223,060 OAs

K-means algorithm (SPSS): 7 Super Groups (1, 2, 3, 4, 5, 6, 7)

K-means algorithm (SPSS) within Super Group 5: 3 Groups (5a, 5b, 5c)

K-means algorithm (SPSS) within Group 5c: 3 Sub Groups (5c1, 5c2, 5c3)
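The top-down structure can be sketched as clustering the members of each parent cluster separately and then composing the labels. In this Python sketch the function names are hypothetical, and `split_at_mean` is only a stand-in for a real k-means run on the members of one Super Group.

```python
from collections import defaultdict

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def split_within_parents(values, parent_labels, cluster_fn, k):
    """Cluster the members of each parent separately; return (parent, child) pairs."""
    members = defaultdict(list)
    for idx, parent in enumerate(parent_labels):
        members[parent].append(idx)
    nested = [None] * len(values)
    for parent, idxs in members.items():
        child_labels = cluster_fn([values[i] for i in idxs], k)
        for i, child in zip(idxs, child_labels):
            nested[i] = (parent, child)
    return nested

def group_code(parent, child):
    """Format (5, 2) as '5c', mirroring labels such as 1a, 1b, 5c."""
    return f"{parent}{LETTERS[child]}"

def split_at_mean(points, k):
    """Hypothetical stand-in for k-means on one-dimensional data; supports k == 2 only."""
    if k != 2:
        raise ValueError("this stand-in only splits into two groups")
    centre = sum(points) / len(points)
    return [0 if p < centre else 1 for p in points]

# Hypothetical one-variable data already assigned to Super Groups 1 and 2
values = [1.0, 2.0, 10.0, 11.0, 3.0, 4.0]
super_groups = [1, 1, 2, 2, 1, 1]
nested = split_within_parents(values, super_groups, split_at_mean, 2)
codes = [group_code(parent, child) for parent, child in nested]
print(codes)
```

Because each split only sees the members of one parent, a Group can never straddle two Super Groups, which is what makes the classification a strict hierarchy.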

1: BLUE COLLAR COMMUNITIES

1a: Terraced Blue Collar

1b: Younger Blue Collar

1c: Older Blue Collar

2: CITY LIVING

2a: Transient Communities

2b: Settled in the City

3: COUNTRYSIDE

3a: Village Life

3b: Agricultural

3c: Accessible Countryside

4: PROSPERING SUBURBS

4a: Prospering Younger Families

4b: Prospering Older Families

4c: Prospering Semis

4d: Thriving Suburbs

5: CONSTRAINED BY CIRCUMSTANCES

5a: Senior Communities

5b: Older Workers

5c: Public Housing

6: TYPICAL TRAITS

6a: Settled Households

6b: Least Divergent

6c: Young Families in Terraced Homes

6d: Aspiring Households

7: MULTICULTURAL

7a: Asian Communities

7b: Afro-Caribbean Communities

1: Blue Collar Communities

2: City Living

3: Countryside

4: Prospering Suburbs

5: Constrained by Circumstances

6: Typical Traits

7: Multicultural

Alternative Profiles

Mapping the Classification

http://www.maptube.org/map.aspx?mapid=1

http://www.casa.ucl.ac.uk/googlemaps/OAC-super-EngScotWales.html


Geodemographics over time

Variables

• V01: Age 0-4

• V02: Age 5-14

• V03: Age 25-44

• V04: Age 45-64

• V05: Age 65+

• V06: Indian, Pakistani & Bangladeshi

• V07: Black African, Black Caribbean & Black Other

• V08: Born Outside the UK

• V09: Unemployed

• V10: Working part-time

• V13: Economically inactive looking after family

• V14: No central heating

• V16: Rent (private)

• V17: Rent (public)

• V18: 2+ Car Households

• V20: Flats

• V21: Detached

• V22: Terraced

• V23: Lone parent household

• V24: Single pensioner household

• V25: Single person (not pensioner) household

• V26: Population Density


Methodology

• England only

• 1991 data assigned to 2001 Output Areas (Simpson, 2002; Norman et al., 2003)

• Clustered based on the 2001 data

• 1991 data assigned to centroids created from 2001 data
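Assigning the 1991 data to centroids created from the 2001 data amounts to a nearest-centroid lookup. The sketch below assumes squared Euclidean distance and that the 1991 values have been standardised on the same scale as the 2001 values; the two-variable centroids and areas are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_nearest(rows, centroids):
    """Return, for each row, the index of the closest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))
        for row in rows
    ]

# Illustrative centroids on two standardised variables (e.g. flats, detached)
centroids_2001 = [[0.68, 0.05], [0.03, 0.67]]
# Hypothetical 1991 values for three Output Areas on the same two variables
areas_1991 = [[0.70, 0.10], [0.10, 0.60], [0.40, 0.30]]
labels_1991 = assign_to_nearest(areas_1991, centroids_2001)
print(labels_1991)
```

Keeping the centroids fixed means that any change in an area's type between the two dates reflects a change in its data rather than a change in the definition of the clusters.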


Variable                              Cluster
                                        1    2    3    4    5    6    7
Age 0-4                               .17  .24  .18  .31  .19  .24  .26
Age 5-14                              .13  .23  .23  .29  .22  .20  .29
Age 25-44                             .47  .41  .30  .38  .34  .36  .33
Age 45-64                             .31  .34  .50  .28  .43  .31  .33
Age 65+                               .15  .13  .18  .10  .18  .21  .18
Indian, Pakistani & Bangladeshi       .05  .03  .01  .44  .02  .06  .02
Black African / Caribbean / Other     .07  .02  .00  .14  .01  .16  .02
Born Outside the UK                   .23  .08  .06  .39  .06  .21  .05
Unemployed                            .14  .12  .06  .20  .08  .24  .18
Working part-time                     .18  .30  .32  .22  .33  .21  .30
Looking after family                  .15  .18  .19  .34  .16  .25  .29
No central heating                    .13  .15  .04  .19  .07  .08  .12
Rent (private)                        .30  .13  .06  .17  .06  .08  .04
Rent (public)                         .14  .11  .03  .18  .08  .67  .52
2+ Car Households                     .17  .25  .57  .18  .38  .08  .16
Flats                                 .68  .10  .03  .17  .08  .77  .17
Detached                              .05  .06  .67  .05  .21  .02  .06
Terraced                              .17  .66  .06  .57  .15  .12  .36
Lone parent household                 .11  .18  .08  .21  .12  .23  .28
Single pensioner household            .16  .13  .12  .10  .16  .25  .19
Single person (not pensioner) hhold   .33  .19  .09  .16  .12  .28  .15
Population Density                    .17  .14  .04  .21  .08  .23  .10

The clusters were named as follows:

1. Urban Melting Pot

2. Mixed Communities

3. Out in the Sticks

4. Asian Influence

5. Middle Class Achievers

6. Down and Out

7. Working Class Endeavour

Changing prevalence of types

Type                          1991 Frequency  1991 Percent  2001 Frequency  2001 Percent  Change Freq  Change %
1. Urban Melting Pot                  11,350           6.9          15,392           9.3        4,042       2.4
2. Mixed Communities                  21,949          13.2          25,910          15.6        3,961       2.4
3. Out in the Sticks                  32,915          19.9          35,704          21.6        2,789       1.7
4. Asian Influence                     5,039           3.0           4,930           3.0         -109         0
5. Middle Class Achievers             50,578          30.5          46,791          28.2       -3,787      -2.3
6. Down and Out                        8,257           5.0          10,878           6.6        2,621       1.6
7. Working Class Endeavour            35,577          21.5          26,060          15.7       -9,517      -5.8
Total                                165,665         100.0         165,665         100.0          0.0       0.0

Cluster results

Rows: 2001 cluster. Columns: 1991 cluster.

2001 \ 1991                        1       2       3       4       5       6       7    Total
1: Urban Melting Pot           9,200   1,318     195     467   2,245   1,116     851   15,392
2: Mixed Communities             779  15,991     237     432   3,445      76   4,950   25,910
3: Out in the Sticks              23     206  26,561       6   8,444       8     456   35,704
4: Asian Influence               153     519       3   3,699     117     129     310    4,930
5: Middle Class Achievers        482   2,628   5,658     103  33,044      27   4,849   46,791
6: Down and Out                  618     126      29     204     505   6,401   2,995   10,878
7: Working Class Endeavour        95   1,161     232     128   2,778     500  21,166   26,060
Total                         11,350  21,949  32,915   5,039  50,578   8,257  35,577  165,665
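The cross-tabulation can be checked and summarised with a short script. The counts below are copied from the table; the summary measures (share of areas on the diagonal, share of each 1991 type retained) are one possible reading of it, not figures quoted in the talk.

```python
# Rows: 2001 cluster; columns: 1991 cluster (counts copied from the table)
transitions = [
    [9200, 1318, 195, 467, 2245, 1116, 851],
    [779, 15991, 237, 432, 3445, 76, 4950],
    [23, 206, 26561, 6, 8444, 8, 456],
    [153, 519, 3, 3699, 117, 129, 310],
    [482, 2628, 5658, 103, 33044, 27, 4849],
    [618, 126, 29, 204, 505, 6401, 2995],
    [95, 1161, 232, 128, 2778, 500, 21166],
]

totals_2001 = [sum(row) for row in transitions]        # row totals
totals_1991 = [sum(col) for col in zip(*transitions)]  # column totals
all_areas = sum(totals_2001)

# Areas on the diagonal have the same type at both dates
unchanged = sum(transitions[i][i] for i in range(len(transitions)))
share_unchanged = unchanged / all_areas

# Share of each 1991 type that is still the same type in 2001
retained = [transitions[i][i] / totals_1991[i] for i in range(len(transitions))]

print(unchanged, all_areas, round(share_unchanged, 3))
print([round(r, 2) for r in retained])
```

On these counts, 116,062 of the 165,665 Output Areas (about 70%) fall on the diagonal, meaning they are assigned the same type in 1991 and 2001.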

[Figure: London. Bar chart of cluster types 1 (Urban Melting Pot) to 7 (Working Class Endeavour), 1991 and 2001; axis 0 to 8,000]

[Figure: Sheffield. Bar chart of cluster types 1 (Urban Melting Pot) to 7 (Working Class Endeavour), 1991 and 2001; axis 0 to 1,400]

[Figure: Tyneside. Bar chart of cluster types 1 (Urban Melting Pot) to 7 (Working Class Endeavour), 1991 and 2001; axis 0 to 3,500]

[Figure: Surrey. Bar chart of cluster types 1 (Urban Melting Pot) to 7 (Working Class Endeavour), 1991 and 2001; axis 0 to 2,000]

Measuring Segregation with geodemographics

Segregation Index

• Calculates the number of output areas that would need to be moved between Local Authority Districts to have an equal distribution of OAC types across the country

Segregation index

• The index of segregation, as used here, is the index of dissimilarity between a group and the whole of the population. This is the index we use to measure the degree to which a group is spatially polarised. The index of dissimilarity between two groups is half the sum over all areas of the absolute differences between each group divided by its respective total population. It ranges between 0 and near 100%.


D_ab = 1/2 × Σ (i = 1 to N) | P_ia / P_*a − P_ib / P_*b |

where D_ab is the index between groups a and b (b in our case is the population as a whole), P_ia is the population of group a in area i, * represents all areas, and N is the number of areas being considered.
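The verbal definition (half the sum, over all areas, of the absolute difference between each group's share of its own total) translates directly into code. The sketch below is a plain reading of that definition; the counts are hypothetical, standing in for the number of Output Areas of one OAC type, and of all types, in each of three districts.

```python
def index_of_dissimilarity(group_counts, reference_counts):
    """Half the sum over areas of |share of group a - share of group b|."""
    total_a = sum(group_counts)
    total_b = sum(reference_counts)
    return 0.5 * sum(
        abs(a / total_a - b / total_b)
        for a, b in zip(group_counts, reference_counts)
    )

# Hypothetical counts per district
all_types = [10, 10, 10]
concentrated_type = [10, 0, 0]  # found in one district only
even_type = [5, 5, 5]           # spread exactly like the whole population

d_concentrated = index_of_dissimilarity(concentrated_type, all_types)
d_even = index_of_dissimilarity(even_type, all_types)
print(round(100 * d_concentrated, 1), round(100 * d_even, 1))
```

Because the reference group is the whole population and so includes the group itself, even a fully concentrated group does not reach 100%, which matches the "near 100%" upper bound noted above.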

Most Segregated types

• Blue Collar Communities 26.5%

• City Living 48.8%

• Countryside 53.3%

• Prospering Suburbs 21.4%

• Constrained by Circumstances 28.8%

• Typical Traits 22.7%

• Multicultural 67.7%


Where is most geodemographically segregated?

Segregated Places

Most                                    Least
 1  Newham                  89.3        425  Dacorum               16.3
 2  City of London          87.0        426  Wellingborough        15.0
 3  Hackney                 82.7        427  Welwyn Hatfield       14.8
 4  Westminster             82.7        428  Gravesham             14.5
 5  Tower Hamlets           82.6        429  Cardiff               14.3
 6  Lambeth                 82.5        430  Sheffield             13.8
 7  Kensington and Chelsea  82.2        431  Leeds                 11.6
 8  Camden                  82.0        432  Peterborough          11.2
 9  Hammersmith and Fulham  82.0        433  North Hertfordshire   10.5
10  Islington               81.7        434  Epping Forest          9.5


Thank you

• [email protected]

• Areaclassification.org.uk

• Sheffield.ac.uk/SASI
