Advancing the comparability of occupational data through Linked Open Data

Richard Zijdeman [richard.zijdeman at iisg.nl], Kathrin Dentler, Rinke Hoekstra, Albert Meroño-Peñuela. Advancing the comparability of occupational data through Linked Open Data. HISCO workshop, Historical Population Database of Transylvania, Cluj, Romania, June 18, 2016.

Upload: richard-zijdeman

Post on 16-Apr-2017

Category:

Science


TRANSCRIPT

Page 1: Advancing the comparability of occupational data through Linked Open Data

Richard Zijdeman [richard.zijdeman at iisg.nl]
Kathrin Dentler
Rinke Hoekstra
Albert Meroño-Peñuela

Advancing the comparability of occupational data through Linked Open Data

HISCO workshop
Historical Population Database of Transylvania
Cluj, Romania
June 18, 2016

Page 2: Advancing the comparability of occupational data through Linked Open Data

... it is market position, and especially position in the occupational division of labour, which is fundamental to the generation of structured inequalities. The life chances of individuals and families are largely determined by their position in the market and occupation is taken to be its central indicator ... .

(Rose and Harrison, 2010)

Page 3: Advancing the comparability of occupational data through Linked Open Data

Occupations are important both as dependent variables (occupational attainment studies) and as independent variables (occupational stratification studies) in research on educational (and occupational) status attainment, health, voting, consumption, marriage, etc.

(Ganzeboom, 2008)

Page 4: Advancing the comparability of occupational data through Linked Open Data

Occupations are one of the few indicators of social position that are available in:

• large quantities
• different time periods
• various societies
• at the individual level (smallest level of detail)

Page 5: Advancing the comparability of occupational data through Linked Open Data

Lack of comparability

• Many different occupational classifications

• Differences in mobility studies could result from different classification methods (Kaelble 1985)

Charles Booth (1886-1903)

Page 6: Advancing the comparability of occupational data through Linked Open Data

HISCO

• Historical International Standard Classification of Occupations

• Put together by a large number of institutes

• Based on ILO’s ISCO ’68

• Occupations retrieved from registers

• 1675 occupational codes

Page 7: Advancing the comparability of occupational data through Linked Open Data

Current solution: 2-step procedure

Code into the concept, first:
• Classify into the concept (HISCO)
• Link the measure of stratification to the concept (e.g. SOCPO, HISCAM)
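
The 2-step procedure can be sketched as two lookups. This is a minimal illustration, not the authors' tooling: the dictionaries are hypothetical stand-ins for real coding files, and the values 71105 and 50.56 are the ones shown for "miner" on page 19 (reading 71105 as the HISCO code and 50.56 as the HISCAM score is an assumption).

```python
# Step 1 lookup: occupational title -> HISCO code (the "concept").
# Hypothetical one-entry table; 71105 is assumed to be the HISCO code for "miner".
TITLE_TO_HISCO = {"miner": "71105"}

# Step 2 lookup: HISCO code -> stratification measure (here HISCAM).
# 50.56 is assumed to be the HISCAM score belonging to that code.
HISCO_TO_HISCAM = {"71105": 50.56}


def code_title(title):
    """Step 1: classify a raw title into HISCO, or None if it is unknown."""
    return TITLE_TO_HISCO.get(title.strip().lower())


def measure_for(title):
    """Step 2: link the HISCO code to a measure, or None if either step fails."""
    hisco = code_title(title)
    if hisco is None:
        return None
    return HISCO_TO_HISCAM.get(hisco)


print(measure_for(" Miner "))  # 50.56
print(measure_for("weaver"))   # None (title not in the toy table)
```

Keeping the two steps separate means a new measure only needs a new step-2 table; the coded titles from step 1 are reused as they are.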

Page 8: Advancing the comparability of occupational data through Linked Open Data

New problems

1. What concept?
• Historical International Standard Classification of Occupations (HISCO)
• OCCHISCO
• PST

2. Not all measures link to all concepts
• E.g. no link between OCCHISCO and HISCAM

3. Adaptability of concepts (new versions)

Page 9: Advancing the comparability of occupational data through Linked Open Data

Is this a substantive problem?

Illustrative example:
• Subset of the SAME occupational titles from NAPP and HISCO
• Link these occupations to HISCAM
• For HISCO: directly, as provided by the HISCAM people
• For OCCHISCO: indirectly, through a mapping

Page 10: Advancing the comparability of occupational data through Linked Open Data

[Diagram. Nodes: occupations, OCCHISCO, HISCO, HISCAM. Edge label: Cross-walk]

E.g.: necessary for a comparison between Norway and the Netherlands
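
A rough sketch of what an indirect link through a cross-walk involves. It assumes the cross-walk maps OCCHISCO codes to HISCO codes, which then link to HISCAM; the slides do not spell out the direction. The tables are hypothetical, and "99999" is a made-up code standing for anything the cross-walk does not cover.

```python
# Hypothetical cross-walk: OCCHISCO code -> HISCO code.
OCCHISCO_TO_HISCO = {"71120": "71105"}

# Direct link: HISCO code -> HISCAM score (toy table).
HISCO_TO_HISCAM = {"71105": 50.56}


def hiscam_via_crosswalk(occhisco):
    """Chain OCCHISCO -> HISCO -> HISCAM; None if any hop is missing."""
    hisco = OCCHISCO_TO_HISCO.get(occhisco)
    if hisco is None:
        return None
    return HISCO_TO_HISCAM.get(hisco)


def coverage(codes):
    """Share of OCCHISCO codes that end up with a HISCAM score."""
    if not codes:
        return 0.0
    hits = sum(1 for code in codes if hiscam_via_crosswalk(code) is not None)
    return hits / len(codes)


print(hiscam_via_crosswalk("71120"))  # 50.56
print(coverage(["71120", "99999"]))   # 0.5
```

Every extra hop is another place where a code can drop out or be mapped to a coarser category, which is why the indirect route can differ from the direct one.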

Page 11: Advancing the comparability of occupational data through Linked Open Data

Page 12: Advancing the comparability of occupational data through Linked Open Data

Page 13: Advancing the comparability of occupational data through Linked Open Data

So yes, this is problematic

• ‘Lost’ 41% explained variance
• Cf. regression models: explained variance is usually not above 30%
• HISCAM is often used both as dependent and as independent variable
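
One way to read "lost explained variance" is as 1 minus the squared correlation between directly assigned and cross-walked HISCAM scores for the same titles. The slides do not state how the 41% was computed, so this reading is an assumption, and the numbers below are invented toy values that do not reproduce the reported figure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented scores for four titles: direct (via HISCO) vs indirect (via a mapping).
direct = [40.0, 50.0, 60.0, 70.0]
indirect = [45.0, 48.0, 66.0, 61.0]

r2 = pearson_r(direct, indirect) ** 2  # variance shared by the two codings
lost = 1 - r2                          # variance not reproduced by the indirect route

print(f"explained: {r2:.2f}, lost: {lost:.2f}")
```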

Page 14: Advancing the comparability of occupational data through Linked Open Data

New problems

1. What concept?
• Historical International Standard Classification of Occupations (HISCO)
• OCCHISCO
• PST

2. Not all measures link to all concepts
• E.g. no link between OCCHISCO and HISCAM

3. Adaptability of concepts (new versions)

Page 15: Advancing the comparability of occupational data through Linked Open Data

Towards a solution

• Linked Data (Berners-Lee, 2006)

• Define Resources (books, respondents, etc.) with a URI

• Present URIs as URLs

• Describe Resources using so-called ‘triples’

Page 16: Advancing the comparability of occupational data through Linked Open Data

An example of a triple

Resource: Margaret
Property: works as
Value: Miner
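
As a minimal sketch, a triple can be held as a three-field record whose parts are URIs. The `http://example.org/` namespace and the `worksAs` property name are hypothetical placeholders, not identifiers from the talk; the output format is the plain N-Triples line syntax for URI-only triples.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace, for illustration only


class Triple(NamedTuple):
    resource: str  # what the statement is about
    prop: str      # the property being stated
    value: str     # the value of that property


def to_ntriples(triple):
    """Serialise a URI-only triple as one N-Triples line."""
    return f"<{triple.resource}> <{triple.prop}> <{triple.value}> ."


margaret = Triple(
    resource=EX + "person/Margaret",
    prop=EX + "property/worksAs",
    value=EX + "occupation/Miner",
)

print(to_ntriples(margaret))
```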

Page 17: Advancing the comparability of occupational data through Linked Open Data

Resource: Miner
Property: is of type
Value: occupation

Page 18: Advancing the comparability of occupational data through Linked Open Data

Margaret → works as → Miner
Miner → is of type → occupation
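
Because the value of one triple can be the resource of another, triples chain into a graph. The sketch below stores the two statements from this slide as plain tuples and follows the chain with a small pattern matcher; a real setup would use a triple store and SPARQL, and the helper names here are made up for illustration.

```python
TRIPLES = [
    ("Margaret", "works as", "Miner"),
    ("Miner", "is of type", "occupation"),
]


def match(triples, resource=None, prop=None, value=None):
    """Return triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (resource is None or t[0] == resource)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


def works_in_an_occupation(triples, person):
    """Follow 'works as', then check that the target 'is of type' occupation."""
    for _, _, job in match(triples, resource=person, prop="works as"):
        if match(triples, resource=job, prop="is of type", value="occupation"):
            return True
    return False


print(match(TRIPLES, prop="works as"))              # [('Margaret', 'works as', 'Miner')]
print(works_in_an_occupation(TRIPLES, "Margaret"))  # True
```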

Page 19: Advancing the comparability of occupational data through Linked Open Data

miner → has occhisco → 71120
miner → has hisco → 71105
miner → has hiscam → 50.56
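
The point of this slide is that one occupational title can carry codes from several schemes at once. A small sketch of that idea follows; which value belongs to which property is an assumption read off the slide layout, and `describe` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed pairing of the three values on the slide with the three properties.
TRIPLES = [
    ("miner", "has occhisco", "71120"),
    ("miner", "has hisco", "71105"),
    ("miner", "has hiscam", "50.56"),
]


def describe(triples, resource):
    """Collect everything stated about one resource, grouped by property."""
    description = defaultdict(list)
    for subject, prop, value in triples:
        if subject == resource:
            description[prop].append(value)
    return dict(description)


print(describe(TRIPLES, "miner"))
```

Adding a further scheme is then one more triple on the same resource; the existing codes stay untouched.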

Page 20: Advancing the comparability of occupational data through Linked Open Data

Provenance

[Diagram. An occupational title, derived from a source (WasDerivedFrom), carries the codes PST: 123, OCCHISCO: 123, HISCO: 12345, HISCO: 54321 and HISCAM: 88. Edge labels: codedByLeigh, codedByEvan, codedByChris, codedByRichard, codedByMappingFile]
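
A sketch of how recording who assigned each code makes diverging codings visible. The codes and coder names come from the slide, but which coder produced which code is not recoverable from the transcript, so the pairing below is hypothetical, as are the record layout and the function name.

```python
from collections import defaultdict
from typing import NamedTuple


class Coding(NamedTuple):
    title: str
    scheme: str
    code: str
    coded_by: str


# Hypothetical pairing of coders and codes, for illustration only.
CODINGS = [
    Coding("occupational title", "HISCO", "12345", "Leigh"),
    Coding("occupational title", "HISCO", "54321", "Evan"),
    Coding("occupational title", "OCCHISCO", "123", "Chris"),
    Coding("occupational title", "PST", "123", "Richard"),
    Coding("occupational title", "HISCAM", "88", "MappingFile"),
]


def disagreements(codings):
    """Map (title, scheme) to {code: [coders]} wherever coders chose different codes."""
    by_key = defaultdict(lambda: defaultdict(list))
    for c in codings:
        by_key[(c.title, c.scheme)][c.code].append(c.coded_by)
    return {
        key: dict(codes)
        for key, codes in by_key.items()
        if len(codes) > 1
    }


print(disagreements(CODINGS))
```

With provenance kept next to each code, conflicting codings are kept side by side and can be inspected, instead of one silently overwriting the other.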

Page 21: Advancing the comparability of occupational data through Linked Open Data

HISCO vocabulary

Page 22: Advancing the comparability of occupational data through Linked Open Data

• hisco:entry for ‘occupational titles’

• transitivity between category, unit, minor and major group
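
Transitivity here means that an occupation in a category also belongs to the unit, minor and major group above it. The sketch assumes the levels can be read from code prefixes (5 digits for a category, first 3 for the unit group, first 2 for the minor group, first digit for the major group). That layout is an assumption for illustration and ignores special cases such as major groups spanning several leading digits.

```python
# Assumed code lengths per level: category (5) -> unit (3) -> minor (2) -> major (1).
PARENT_LENGTH = {5: 3, 3: 2, 2: 1}


def broader(code):
    """Immediate parent of a code under the assumed prefix layout, or None at the top."""
    parent_length = PARENT_LENGTH.get(len(code))
    return code[:parent_length] if parent_length else None


def broader_transitive(code):
    """All ancestors of a code, nearest first."""
    chain = []
    parent = broader(code)
    while parent is not None:
        chain.append(parent)
        parent = broader(parent)
    return chain


print(broader_transitive("71105"))  # ['711', '71', '7']
```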

Page 23: Advancing the comparability of occupational data through Linked Open Data

Case study: DBpedia

- Structured data behind Wikipedia

- Information on all kinds of topics, also occupations

- Add HISCO codes to DBpedia occupations

- Let’s try and do this live: http://yasgui.org/short/VJfZvnx6x
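
The live demo behind the link is not reproduced here. As a rough offline sketch of the idea, the snippet below matches DBpedia resource names against a title-to-HISCO table and emits linking triples. The `example.org` URIs, the property name and the one-entry lookup table are hypothetical; the DBpedia resource URIs are used only as sample input.

```python
HISCO_BASE = "http://example.org/hisco/"            # hypothetical namespace
HAS_HISCO = "http://example.org/property/hasHisco"  # hypothetical property

# Toy lookup; 71105 is assumed to be the HISCO code for "miner".
TITLE_TO_HISCO = {"miner": "71105"}

DBPEDIA_OCCUPATIONS = [
    "http://dbpedia.org/resource/Miner",
    "http://dbpedia.org/resource/Astronaut",
]


def label_of(uri):
    """Derive a lower-case label from the last path segment of a URI."""
    return uri.rsplit("/", 1)[-1].replace("_", " ").lower()


def hisco_links(resources):
    """Emit (resource, hasHisco, hisco URI) triples for every matched label."""
    links = []
    for uri in resources:
        code = TITLE_TO_HISCO.get(label_of(uri))
        if code is not None:
            links.append((uri, HAS_HISCO, HISCO_BASE + code))
    return links


for link in hisco_links(DBPEDIA_OCCUPATIONS):
    print(link)
```

Exact label matching is the simplest possible strategy; titles without a match (here "Astronaut") get no link and would need manual or fuzzier coding.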

Page 24: Advancing the comparability of occupational data through Linked Open Data

Caveats

• We have not yet tested the technique at a really large scale (e.g. NAPP data)

• Sharing code remains a collective action problem (but less of a coordination problem)

Page 25: Advancing the comparability of occupational data through Linked Open Data

Conclusions

Linked Data

• Enhances comparative occupational research

• Makes heterogeneity in coding practices visible

Page 26: Advancing the comparability of occupational data through Linked Open Data

Outlook

• Linkage to texts (occupations in newspapers)

• Linkage to public resources: Wikipedia

• Combine Machine Learning and Linked Data for automated occupational coding

Page 27: Advancing the comparability of occupational data through Linked Open Data

Thank you
