1

E-science resources for handling data on occupations, educational qualifications and ethnicity – the DAMES and GEODE projects

Paul Lambert University of Stirling

Vernon Gayle University of Stirling & ISER



2

Structure of this talk

• What is e-Science

• What is the Grid

• What can e-Social Science do for survey research?

• Grid Enabled Specialist Data Environments

• Specialist files and resources

• A little on occupations

• Something on education

• Almost nothing on ethnicity (it is only 30 mins)

3

What is e-Science

• Originally experiments to connect together a few powerful computers

• The ability to connect high powered computers to undertake enormous calculations often on huge datasets

• “The Grid” = the co-ordination of geographically dispersed computing and data resources

4

What is e-Science

• “What is exciting about the Grid is the combination of extensive connectivity, massive computer power and vast quantities of digitised data – all three of which are still rapidly expanding – making possible new applications that are orders of magnitude more potent than even a few years ago”

• “The term 'e-research' is sometimes used instead of 'e-science', with the advantage that gives more emphasis to the end result of better, richer, faster or new research results, rather than the technologies used to get them” (http://www.ncess.ac.uk/)

5

The Grid

• Grid computing (or the use of a computational grid) is the application of several computers to a single problem at the same time

– usually to a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data

• According to John Patrick, IBM’s vice president for Internet strategies, “the next big thing will be grid computing”

6

E-Social Science in the UK

• ‘e-Science’ is nowadays used as a broader term, covering the use of technologies associated with the Grid and other collaborations between computing and software resources

• NCeSS: UK programme of projects looking at e-Science applications in social science projects (e.g. distributed computing; access and analysis of complex data; secure access to sensitive data)

7

Less obviously, there are other activities where e-Science could potentially help the research process

Data Preparation & Management

• Manipulating data

Recoding categories / ‘operationalising’ variables

These are the focus of the DAMES research Node, www.dames.org.uk

8

The Orthodox Survey Research Process

Data Collection (survey data agencies, academics etc.)

Data Storage & Curation (data archives etc.)

Data Management and Analysis – “Lone Researcher”

Stand alone computer (usually a PC)

Statistical Software (e.g. SPSS)

9

Statistical Analysis Process

• Awesome increases in desktop computing power (and storage capacity)

• Almost instant data download (from archives etc)

• The time ratio of data preparation to statistical modelling is probably about 10:1

10

e-Social Science Possibilities

secure access

computing networks

Software (e.g. Sabre R)

Data Linking

Harmonisation

11

e-Social Science Possibilities

secure access

computing networks

Software (e.g. Sabre R)

Data Linking

Harmonisation

Specialist Files & Resources

12

Grid Enabled Specialist Data Environments (‘GE*DE’)

• Programme of activities within the DAMES research Node

• Coordinating access to and exploitation of specialist information resources in the fields of

– Occupations

– Educational qualifications

– Ethnicity and migration

13

An Example: Specialist Files & Resources

• A researcher has a survey with occupational information and wants to construct an occupation based social class measure

14

An Example: Specialist Files & Resources

• Historically, the ‘information’ to construct the measure will be in the following forms

– books (or paper files)

– www files (e.g. national statistical agencies)

– computer files (e.g. Stata .do files)

15

An Example: Specialist Files & Resources

Clear problems...

• Access to the information

– are the files publicly (or easily) available?

– a “.do file” on a single researcher’s hard drive

• Unnecessary re-working

– certain sources (e.g. paper) need lots of working to produce properly coded survey data

16

• Replicability

– Is there clear information that allows a secondary researcher to use the resource?

– e.g. clear documentation

– what the information science community call metadata i.e. “data about other data”

– See Dale (2006) for a discussion

17

Motivation for our projects?

• We currently observe inadequate practices in survey data analysis

• Substandard practices in data management are observed in the following areas

• Not keeping adequate records

• Not linking relevant data

• Not trying out relevant variable operationalisations

18

Key Variables

• We concentrate (so far) on “key” variables

These are variables that are central to, and commonly found in, survey data analysis

They include... occupation, education, ethnicity, gender, age, income

(some survey variables are easier to deal with than others)

19

An Example: Occupational Social Class

• As far back as the late 60s, Frank Bechhofer recommended that researchers should use established (and therefore replicable) social class schemes

• Return to the researcher who has a survey with occupational information (e.g. SOC 90) and employment status information, and who wants to construct a social class measure

20

GEODE

• Portal to log into

• Searchable – can find resources

• e.g. SPSS file that allows linkage of their survey data to an occupation-based social class scheme

• Further examples are in our working paper (2008-1)
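The kind of linkage such a resource supports can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the occupational unit codes, employment status values and class labels below are invented placeholders (not real SOC 90 codes or any published class scheme), and the function and field names are hypothetical.

```python
# Hypothetical lookup: (occupational unit code, employment status) -> class label.
# All codes and labels are invented for illustration only.
LOOKUP = {
    ("0001", "employee"): "class A",
    ("0001", "self-employed"): "class B",
    ("0002", "employee"): "class C",
}


def attach_social_class(records, lookup):
    """Return copies of the survey records with a 'social_class' field added.

    Records whose occupation/status pair is not in the lookup get None,
    so unmatched cases stay visible instead of being silently dropped.
    """
    linked = []
    for rec in records:
        key = (rec["occ_code"], rec["emp_status"])
        linked.append({**rec, "social_class": lookup.get(key)})
    return linked


survey = [
    {"id": 1, "occ_code": "0001", "emp_status": "employee"},
    {"id": 2, "occ_code": "0002", "emp_status": "employee"},
    {"id": 3, "occ_code": "9999", "emp_status": "employee"},  # not in lookup
]

linked = attach_social_class(survey, LOOKUP)

# Keep a record of which cases could not be linked.
unmatched = [rec["id"] for rec in linked if rec["social_class"] is None]
```

Listing the unmatched cases explicitly is a deliberate choice here: it keeps a record of what the lookup file did not cover, which is part of the replicability argument.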

21

22

23

Education:

• Education is a key social science measure that is included in an extremely wide variety of substantive analyses

• Education as an explanatory (X) variable:

Education is frequently used in statistical analyses as a key explanatory variable (usually with a number of other explanatory variables)

This is usual in areas such as sociology, social policy and economics

24

Education:

• Education as an outcome (Y) variable:

In more specialist studies an education measure is itself of interest as an outcome (for example gaining a specific qualification or level of attainment)

This is common in educational studies and within the sociology of education

25

Education:

“the question of how to measure education and qualifications – or indeed what ‘measure’ means – raises interesting issues…Since there is no agreed standard way of categorising educational qualifications”

(Prandy, Unt & Lambert 2004)

26

Comparing Education with

Occupational information

• Survey starts with textual description

• Translated into Occupational Unit Group

• Agreed standards of data collection & classification

OUG Scheme; Industrial sector; employment status

• No similar consensus with educational data

27

Obvious issues with Educational variables

• Many measures (not just qualifications)

• Organisation and structure changes

• Changes in distributions over time

• We can learn from international comparisons

28

Many Measures

Some Examples of the 41 Categories

Highest Qualification (General Household Survey 2003)

highest qualification                   |      Freq.
----------------------------------------+-----------
 1. higher degree                       |        669
 2. nvq level 5                         |         20
 3. first degree                        |      1,416
 4. other degree                        |        278
 5. nvq level 4                         |         71
 6. diploma in higher education         |        282
 7. hnc/hnd btec higher etc             |        551
 9. teaching - secondary education      |         55
10. teaching - primary education        |         69
12. nursing etc                         |        267
14. other higher education below degree |        151
21. scotish 6th year certificate/csys   |         24
28. city & guilds craft/part 2          |        306
29. btec/scotvec first or gen diploma e |         42
30. o level, gcse grase a*-c or equival |      2,460
31. nvq level 1 or equivalent           |        102
33. gse below grade 1, gcse below grade |        693
41. dont know                           |         79
----------------------------------------+-----------
Total                                   |     24,489

29

Many Measures

Highest Academic Qualification (British Household Panel Survey 1991 – Wave A)

highest academic            |
qualification               |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
-9. missing                 |         19        0.19        0.19
-7. proxy respondent        |        352        3.43        3.61
 1. higher degree           |        122        1.19        4.80
 2. 1st degree              |        598        5.83       10.63
 3. hnd,hnc,teaching        |        496        4.83       15.46
 4. a level                 |      1,362       13.27       28.73
 5. o level                 |      2,510       24.45       53.19
 6. cse                     |        529        5.15       58.34
 7. none of these           |      4,276       41.66      100.00
----------------------------+-----------------------------------
Total                       |     10,264      100.00
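As a small data-discipline check, the Percent and Cum. columns of the BHPS table above can be recomputed from the frequencies alone. The sketch below assumes nothing beyond the nine frequencies shown in the table; the variable names are ours.

```python
# Frequencies copied from the BHPS Wave A table, in row order
# (-9 missing, -7 proxy respondent, then categories 1 to 7).
freqs = [19, 352, 122, 598, 496, 1362, 2510, 529, 4276]

total = sum(freqs)

rows = []
running = 0
for n in freqs:
    running += n
    percent = round(100 * n / total, 2)           # share of all cases
    cumulative = round(100 * running / total, 2)  # running share
    rows.append((n, percent, cumulative))
```

Computing the cumulative column from the running count, rather than by adding up rounded percentages, avoids accumulating rounding error.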

30

Organisational Changes

Type of School Attended by Birth Cohorts

British Household Panel Survey 1991 – Wave A (Extract column percentages)

type of school        |             cohorts             |
attended              |     Prewar   1944 Act  Crossland |     Total
----------------------+---------------------------------+----------
comprehensive sch     |          -      10.47      53.25 |     25.92
grammar not fee pa    |       9.58      19.14       8.06 |     12.10
grammar fee-paying    |       4.55       1.93       0.97 |      2.25
public & private      |       5.52       5.63       4.68 |      5.22
elementary            |      35.20       2.45          - |     10.35
secondary modern      |          -      52.11      24.01 |     33.64
technical             |          -       3.49       0.80 |      2.15
----------------------+---------------------------------+----------

1. Suspect errors – potentially misleading measure

31

Changes in Qualification (titles & levels)

GHS 1983 (O’Levels)

educational level        |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
-9. does not apply       |      3,529       17.59       17.59
 1. higher degree        |         99        0.49       18.09
 2. first degree         |        790        3.94       22.03
 3. teaching qual        |        279        1.39       23.42
 4. other higher qual    |        651        3.25       26.66
 5. nursing qual         |        283        1.41       28.07
 6. gce a level 2+       |        385        1.92       29.99
 7. gce a level 1        |        688        3.43       33.42
 8. gce o level 5+       |      1,439        7.17       40.60
 9. gce o lev1-4 & cq    |        418        2.08       42.68
10. gce o lev1-4 no cq   |      1,053        5.25       47.93
11. com qual no o levels |        704        3.51       51.44
12. cse grades 2-5       |        595        2.97       54.41
13. apprenticeship       |        907        4.52       58.93
14. foreign quals        |        154        0.77       59.70
15. other quals          |        251        1.25       60.95
16. no quals             |      7,734       38.56       99.51
17. no answer            |         29        0.14       99.66
18                       |          8        0.04       99.70
20. never went to school |         61        0.30      100.00
-------------------------+-----------------------------------
Total                    |     20,057      100.00

GHS 2003 (GCSE)

education level -                       |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
-9. never attended school               |         21        0.09        0.09
-8. na                                  |          4        0.02        0.10
-6. child/out age/no int                |      9,694       39.59       39.69
 1. higher degree                       |        689        2.81       42.50
 2. first degree                        |      1,765        7.21       49.71
 3. teaching qualification              |        213        0.87       50.58
 4. other higher qualification          |        979        4.00       54.58
 5. nursing qualification               |        259        1.06       55.63
 6. gce a level in two or more subjects |      1,752        7.15       62.79
 7. gce a level in one subject          |        486        1.98       64.77
 8. gcse/olevel, standard grades, 5+    |      1,915        7.82       72.59
 9. gcse/olevel 1-4                     |      1,257        5.13       77.72
10. cse below grade 1, gcse below grade |      1,373        5.61       83.33
11. apprenticeship                      |        144        0.59       83.92
12. other qualification                 |        654        2.67       86.59
13. no qualification                    |      3,284       13.41      100.00
----------------------------------------+-----------------------------------
Total                                   |     24,489      100.00
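One practical consequence of the two GHS tables above is that the same substantive category can carry different codes in different years (for example, "no quals" is code 16 in 1983 but "no qualification" is code 13 in 2003). A minimal sketch of a documented, year-specific recode follows. The two-category target scheme and all names are assumptions made for illustration, not a recommended harmonisation, and only a few codes are covered.

```python
# Year-specific recode tables: source code -> harmonised label.
# The target labels are an assumption made for this sketch.
RECODES = {
    "GHS 1983": {1: "degree", 2: "degree", 16: "no qualification"},
    "GHS 2003": {1: "degree", 2: "degree", 13: "no qualification"},
}

# Frequencies copied from the two GHS tables (selected codes only).
FREQS = {
    "GHS 1983": {1: 99, 2: 790, 16: 7734},
    "GHS 2003": {1: 689, 2: 1765, 13: 3284},
}


def harmonise(freqs, recode):
    """Sum source-code frequencies into the harmonised categories."""
    out = {}
    for code, n in freqs.items():
        label = recode[code]  # a KeyError flags a code with no documented recode
        out[label] = out.get(label, 0) + n
    return out


harmonised = {year: harmonise(FREQS[year], RECODES[year]) for year in FREQS}
```

Keeping the recode tables as data, separate from the function that applies them, means the mapping itself can be stored, shared and cited alongside the survey files.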

32

Changes in Distributions

[Figure: bar chart of counts (vertical axis 0 to 8,000) by education level (no quals, some quals, f.e., h.e.), with separate bars for “count of own” and “count of father”]

British Household Panel Survey (Wave M): Respondent’s Education Level and Father’s Education Level

33

We can learn from international comparisons

CASMIN: Brynin (2003) example of BHPS & GSOEP

34

Can e-Social science help us?

• Data discipline

• Data matching / merging

• Data access (confidential records)

(future changes in access agreements)

35

What should we do in DAMES?

• Database of typologies of qualifications linking to broader educational measures

– Listings / taxonomies of educational titles

• e.g. based on what major social surveys have used

• Enhanced access to specialist data on educational qualifications

• Same model as GEODE?

• User friendly prescriptions for best practice in using educational data

• User friendly support for distributing data (and metadata) on education

36

Ethnicity and the DAMES project

• Tricky topic to collate information on

– Few recognisable ‘ethnic unit groups’

– Limited previous ‘data management’ reflection

– Very few published databases on ethnicity

– Important question of sparse distributions

– Dynamic, & rapidly expanding (or contracting)

• Likely role is to give guidance on existing data / taxonomies and routines to allow their analysis

– category recodings / scaling of categories

– support for analysis in context of age / gender / region

– {GEODE model with far fewer ‘Ethnicity unit groups’}
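One simple form of category recoding for sparse distributions is to merge small categories into a residual group. The sketch below is illustrative only: the group labels and counts are invented, the threshold of 50 cases is arbitrary, and the function name is hypothetical. Whether collapsing is appropriate at all is a substantive decision that the code does not settle.

```python
def collapse_sparse(counts, min_n, other_label="other"):
    """Merge categories with fewer than min_n cases into one residual category."""
    collapsed = {}
    for label, n in counts.items():
        target = label if n >= min_n else other_label
        collapsed[target] = collapsed.get(target, 0) + n
    return collapsed


# Invented counts for four placeholder groups (not real survey data).
counts = {"group 1": 900, "group 2": 60, "group 3": 12, "group 4": 5}

collapsed = collapse_sparse(counts, min_n=50)
```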

37

Conclusions

• e-Social Science resources can help improve survey research

– assist with access to disparate resources

– help with data management (especially key variables)

– help with data standards and best practice

– help with replicability (and improve incremental science)

38

Brynin, M. (2003). Using CASMIN: the effect of education on wages in Britain and Germany, in Hoffmeyer-Zlotnik, J. and Wolf, C., Advances in Cross-National Comparison: A European Working Book for Demographic and Socio-Economic Variables, Kluwer: Amsterdam, 327-44.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Lambert, P. S., Tan, K. L. L., Turner, K. J., Gayle, V., Prandy, K., & Sinnott, R. O. (2007). Data Curation Standards and Social Science Occupational Information Resources. International Journal of Digital Curation, 2(1), 73-91.

Lambert, P. S., Gayle, V., Tan, L., Blum, J., Bowes, A., Jones, S., Turner, K., Warner, G., Sinnott, R., & Bihagen, E. (2008). Grid Enabled Specialist Data Environments: Forward Planning for GE*DE Services for Specialist Data Occupations, Educational Qualifications, and Ethnicity, Dames Project Technical Paper 2008-1

Prandy, K., Unt, M., & Lambert, P. S. (2004). Not by degrees: Education and social reproduction in twentieth-century Britain. Paper presented at the ISA RC28 Research Committee on Social Stratification and Mobility.