Richard Zijdeman [richard.zijdeman at iisg.nl], Kathrin Dentler, Rinke Hoekstra, Albert Meroño-Peñuela
Advancing the comparability of occupational data through Linked Open Data
HISCO workshop, Historical Population Database of Transylvania
Cluj, Romania, June 18, 2016
... it is market position, and especially position in the occupational division of labour, which is fundamental to the generation of structured inequalities. The life chances of individuals and families are largely determined by their position in the market and occupation is taken to be its central indicator ... .
(Rose and Harrison, 2010)
Occupations are important as dependent variables (occupational attainment studies) and independent variables (occupation stratification studies) in educational (and occupational) status attainment, health, voting, consumption, marriage etc.
(Ganzeboom, 2008)
Occupations are one of the few indicators of social position that are available:
• in large quantities
• for different time periods
• for various societies
• at the individual level (the smallest level of detail)
Lack of comparability
• Many different occupational classifications
• Differences in mobility studies could result from different classification methods (Kaelble 1985)
Charles Booth (1886-1903)
HISCO
• Historical International Standard Classification of Occupations
• Put together by a large number of institutes
• Based on ILO’s ISCO ’68
• Occupations retrieved from registers
• 1675 occupational codes
Current solution: a 2-step procedure
• Step 1: classify occupations into the concept (HISCO)
• Step 2: link the measure of stratification to the concept (e.g. SOCPO, HISCAM)
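The two steps can be sketched as a pair of lookups. This is a minimal illustration only: `TITLE_TO_HISCO`, `HISCO_TO_HISCAM` and `hiscam_for_title` are hypothetical names, and the codes and scores in the tables are toy values, not the real HISCO scheme or HISCAM scale.

```python
from typing import Optional

# Hypothetical toy lookup tables for illustration only; they are not the real
# HISCO coding scheme or the published HISCAM scale.

# Step 1: occupational title -> concept (HISCO code)
TITLE_TO_HISCO = {"miner": "71120"}

# Step 2: concept (HISCO code) -> measure of stratification (HISCAM score)
HISCO_TO_HISCAM = {"71120": 50.56}


def hiscam_for_title(title: str) -> Optional[float]:
    """Return a HISCAM score via the title's HISCO code, or None if either step fails."""
    hisco = TITLE_TO_HISCO.get(title.strip().lower())  # step 1: classify
    if hisco is None:
        return None  # title has not been coded into the concept
    return HISCO_TO_HISCAM.get(hisco)  # step 2: None if the code has no score


print(hiscam_for_title("Miner"))   # 50.56
print(hiscam_for_title("weaver"))  # None
```

A title can drop out at either step: when it was never coded into the concept, or when its code has no score in the measure.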
New problems
1. What concept?
• Historical International Standard Classification of Occupations (HISCO)
• OCCHISCO
• PST
2. Not all measures link to all concepts
• E.g. there is no link between OCCHISCO and HISCAM
3. Adaptability of concepts (new versions)
Is this a substantive problem?
Illustrative example:
• A subset of the same occupational titles from NAPP and HISCO
• Link these occupations to HISCAM
• For HISCO: directly, using the scores provided by the HISCAM developers
• For OCCHISCO: indirectly, through a mapping
[Diagram: occupations → OCCHISCO → cross-walk → HISCO → HISCAM]
E.g. necessary for a comparison between Norway and the Netherlands
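The difference between the direct and the indirect route can be sketched as follows. Everything here is an assumption for illustration: the codes, scores and the cross-walk are toy values, and averaging the scores when one OCCHISCO code maps to several HISCO codes is just one possible rule, not necessarily the one used in the study.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical toy tables: HISCAM scores exist per HISCO code, not per OCCHISCO code.
HISCO_TO_HISCAM: Dict[str, float] = {"71105": 52.0, "71120": 50.0}

# Hypothetical cross-walk: one OCCHISCO code may correspond to several HISCO codes.
OCCHISCO_TO_HISCO: Dict[str, List[str]] = {"71100": ["71105", "71120"]}


def hiscam_direct(hisco: str) -> Optional[float]:
    """HISCO-coded data: look the score up directly."""
    return HISCO_TO_HISCAM.get(hisco)


def hiscam_via_crosswalk(occhisco: str) -> Optional[float]:
    """OCCHISCO-coded data: map to HISCO first, then average the scores found.

    Averaging over one-to-many mappings is an assumed rule; other rules
    (e.g. picking a single 'best' HISCO code) would give other scores.
    """
    hisco_codes = OCCHISCO_TO_HISCO.get(occhisco, [])
    scores = [HISCO_TO_HISCAM[h] for h in hisco_codes if h in HISCO_TO_HISCAM]
    return mean(scores) if scores else None


print(hiscam_direct("71120"))         # 50.0
print(hiscam_via_crosswalk("71100"))  # 51.0
```

The same occupation can thus receive a different score depending on the route, which is the kind of discrepancy the illustrative example sets out to measure.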
So yes, this is problematic
• ‘Lost’ 41% of explained variance
• Cf. regression models: explained variance is usually not above 30%
• HISCAM is often used as both a dependent and an independent variable
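One way to read the ‘lost’ 41% is as 1 − R² between the two HISCAM assignments for the same titles; that reading is our interpretation, not stated on the slide. A minimal sketch of the arithmetic, using the fact that for a single predictor R² equals the squared Pearson correlation; the function name and the score vectors are hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Explained variance (R^2) of a simple linear regression of y on x.

    For one predictor, R^2 equals the squared Pearson correlation:
    R^2 = Sxy^2 / (Sxx * Syy), using sums of centred (cross-)products.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when a variable has no variance")
    return sxy * sxy / (sxx * syy)


# Hypothetical HISCAM scores for the same titles, obtained via the two routes.
direct = [45.0, 50.0, 58.0, 66.0, 72.0]
via_crosswalk = [50.0, 48.0, 60.0, 57.0, 70.0]

r2 = r_squared(direct, via_crosswalk)
print(f"explained: {r2:.2f}, 'lost': {1 - r2:.2f}")
```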
Towards a solution
• Linked Data (Berners-Lee, 2006)
• Identify resources (books, respondents, etc.) with URIs
• Publish these URIs as URLs, so that they can be looked up
• Describe resources using so-called ‘triples’
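The three principles above can be sketched in a few lines. The base namespace `https://example.org/` and the helper `mint_uri` are hypothetical; a real project would use an HTTP domain it controls so that each URI also resolves as a URL.

```python
from typing import NamedTuple

# Hypothetical base namespace. In practice one would use an HTTP domain the project
# controls, so that each URI can also be looked up as a URL.
BASE = "https://example.org/"


def mint_uri(kind: str, local_id: str) -> str:
    """Give a resource (a respondent, a book, an occupation, ...) an HTTP URI."""
    return f"{BASE}{kind}/{local_id}"


class Triple(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property (a URI)
    obj: str        # the value (a URI or a literal)


margaret = mint_uri("person", "margaret")
works_as = mint_uri("vocab", "worksAs")
miner = mint_uri("occupation", "miner")

statement = Triple(margaret, works_as, miner)
print(statement.subject)  # https://example.org/person/margaret
```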
An example of a triple
• Margaret (resource) works as (property) Miner (value)
• Miner (resource) is of type (property) occupation (value)
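A minimal in-memory sketch of these two triples, with plain strings standing in for URIs. The `match` helper is a hypothetical stand-in for a triple-pattern query (None plays the role of a SPARQL variable), not an existing library API.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (resource, property, value)

# The two triples from the slide; plain strings stand in for URIs.
graph: Set[Triple] = {
    ("Margaret", "works as", "Miner"),
    ("Miner", "is of type", "occupation"),
}


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Chain the triples: the value of the first is the resource of the second.
jobs = [o for _, _, o in match("Margaret", "works as", None)]
job_types = [o for job in jobs for _, _, o in match(job, "is of type", None)]
print(jobs, job_types)  # ['Miner'] ['occupation']
```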
Provenance
[Diagram: the occupational title “miner”, which wasDerivedFrom a source, linked via has occhisco, has hisco and has hiscam to codes (PST: 123; OCCHISCO: 123; HISCO: 12345, 54321, 71105, 71120; HISCAM: 50.56, 88), each annotated with who or what produced it (codedByLeigh, codedByEvan, codedByChris, codedByRichard, codedByMappingFile)]
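Recording who coded what makes disagreements queryable. The sketch below is illustrative only: the coder labels, source labels and the pairing of codes with coders are hypothetical, and `disagreements` is an invented helper, not part of any vocabulary.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Coding(NamedTuple):
    title: str         # occupational title as found in the source
    scheme: str        # classification, e.g. "HISCO", "OCCHISCO", "PST"
    code: str
    coded_by: str      # provenance: who (or which mapping file) made the assignment
    derived_from: str  # provenance: the source the title was taken from


# Hypothetical assertions: titles, codes and coder labels are illustrative only.
codings: List[Coding] = [
    Coding("miner", "HISCO", "71105", "coder A", "source 1"),
    Coding("miner", "HISCO", "71120", "coder B", "source 1"),
    Coding("miner", "OCCHISCO", "71100", "coder C", "source 2"),
]


def disagreements(rows: List[Coding]) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Group codings by (title, scheme); keep groups where coders assigned different codes."""
    by_key: Dict[Tuple[str, str], Set[Tuple[str, str]]] = defaultdict(set)
    for c in rows:
        by_key[(c.title, c.scheme)].add((c.code, c.coded_by))
    return {k: sorted(v) for k, v in by_key.items() if len({code for code, _ in v}) > 1}


print(disagreements(codings))
# {('miner', 'HISCO'): [('71105', 'coder A'), ('71120', 'coder B')]}
```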
HISCO vocabulary
• hisco:entry for ‘occupational titles’
• transitivity between occupational categories, unit groups, minor groups and major groups
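The transitivity can be illustrated with a small sketch. It assumes, as a simplification, that the hierarchy can be read off a 5-digit HISCO code by prefix (unit group = first three digits, minor group = first two, major group = first one); check this against the HISCO documentation for exceptions. The function names are hypothetical, and the actual vocabulary may express these links differently, for example with a SKOS-style broader relation.

```python
from typing import List, Optional

# Assumed code lengths per level, from most specific to most general.
LEVEL_LENGTHS = [5, 3, 2, 1]  # category, unit group, minor group, major group


def broader(code: str) -> Optional[str]:
    """One step up the hierarchy (the direct 'broader' link), or None at the top."""
    i = LEVEL_LENGTHS.index(len(code))  # ValueError for codes of unexpected length
    if i + 1 < len(LEVEL_LENGTHS):
        return code[: LEVEL_LENGTHS[i + 1]]
    return None


def broader_transitive(code: str) -> List[str]:
    """Follow 'broader' links to the top: the transitive closure for one code."""
    chain: List[str] = []
    current = broader(code)
    while current is not None:
        chain.append(current)
        current = broader(current)
    return chain


print(broader_transitive("71120"))  # ['711', '71', '7']
```

Because the closure is computed rather than stored, a dataset coded at the category level can be aggregated to any higher group without extra coding work.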
Case study: DBpedia
- Structured data behind Wikipedia
- Information on all kinds of topics, also occupations
- Add HISCO codes to DBpedia occupations
- Let’s try and do this live: http://yasgui.org/short/VJfZvnx6x
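The live demo runs a SPARQL query at the URL above. As an offline sketch of the idea only, the code below assumes a few DBpedia occupation resources and their English labels have already been retrieved, and matches the labels against a hypothetical title-to-HISCO table. Exact label matching is a deliberate simplification, and the property URI `HAS_HISCO` is invented for this example; it is not a DBpedia property.

```python
from typing import Dict, List, Tuple

# Hypothetical input: DBpedia occupation resources with English labels,
# assumed to have been fetched already; no network access happens here.
dbpedia_occupations: Dict[str, str] = {
    "http://dbpedia.org/resource/Miner": "Miner",
    "http://dbpedia.org/resource/Blacksmith": "Blacksmith",
}

# Hypothetical title-to-HISCO lookup (illustrative code only).
title_to_hisco: Dict[str, str] = {"miner": "71120"}

# Hypothetical property URI for the new link.
HAS_HISCO = "https://example.org/vocab/hasHisco"


def link_to_hisco(resources: Dict[str, str], lookup: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Emit (resource, hasHisco, code) triples for labels that match a coded title."""
    triples = []
    for uri, label in sorted(resources.items()):
        code = lookup.get(label.strip().lower())
        if code is not None:  # unmatched labels are skipped, not guessed
            triples.append((uri, HAS_HISCO, code))
    return triples


for t in link_to_hisco(dbpedia_occupations, title_to_hisco):
    print(t)
```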
Caveats
• We have not yet tested the approach on a really large scale (e.g. NAPP data)
• Sharing code remains a collective action problem (but less of a coordination problem)
Conclusions
Linked Data
• Enhances comparative occupational research
• Makes heterogeneity in coding practices visible
Outlook
• Linkage to texts (occupations in newspapers)
• Linkage to public resources: Wikipedia
• Combine Machine Learning and Linked Data for automated occupational coding