Cross-national data in DAMES and GE*DE

Paul Lambert, University of Stirling

Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009

This talk presents materials from the DAMES Node, an ESRC-funded research Node of the National Centre for e-Social Science www.dames.org.uk




2

Some recent history – Atkinson (1996: 47)


3

Stewart et al. (2009: 5)


4

Today’s workshop: ‘Where next?’

Problems / challenges with cross-national survey analysis:
• Quantity of data (and metadata)
• Debates on harmonisation, equivalence, data quality
• Access to data

The contribution of e-social science


5

Why is e-Science relevant?

e-Science models cover distributed computing and the enabling of collaborations [e.g. Foster et al., 2001]

e-Social Science is directed towards research infrastructures for collaboration and for supporting the lifecycle of data-oriented research [e.g. Halfpenny & Procter, 2009]

Cross-national survey projects include complex distributed data & a clear need for collaborations…

Hitherto, cross-national survey projects have not generally made use of e-Science initiatives


6

Part 1: What is e-Social Science doing for cross-national survey research?

Projects on the research lifecycle:
• data collection
• data management [DAMES]
• data analysis

Projects on a national scale

Projects on data, but not necessarily survey data [e.g. digital records; aggregate data; metadata]


7

The example of DAMES and GE*DE

www.dames.org.uk

1.1) Grid Enabled Specialist Data Environments (‘GE*DE’)
1.2) Data resources for micro-simulation on social care data
1.3) Linking e-Health and social science databases
1.4) Training and interfaces for management of complex survey data

2.1) Description, discovery & service use through metadata and data abstraction
2.2) Techniques to handle data from multiple sources
2.3) Workflow modelling for social science
2.4) Security driven data management


8

‘Data management’ means… ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […DAMES Node..]

Usually performed by social scientists themselves
Most overt in quantitative survey data analysis
• Preparing or ‘enabling’ survey analysis
• Usually a substantial component of the work process
• But not explicitly rewarded (and sometimes penalised)

Here we differentiate data management from archiving / controlling data itself
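The following minimal Python sketch illustrates the ‘linking related data resources’ part of this definition. It assumes that survey records carry a country label and an occupational unit-group code, and that a specialist occupational information file maps each (country, code) pair to a score. The lookup values, the field names and the `link_scores` function are hypothetical. They are not a DAMES or GE*DE interface.

```python
from typing import Optional

# Hypothetical occupational information file: (country, occupation code) -> score.
# The codes and scores are invented for illustration only.
OCC_LOOKUP: dict[tuple[str, str], float] = {
    ("UK", "2211"): 72.5,
    ("UK", "9132"): 31.0,
    ("DE", "2211"): 70.1,
}


def link_scores(
    records: list[dict], lookup: dict[tuple[str, str], float]
) -> tuple[list[dict], set[tuple[str, str]]]:
    """Attach an occupational score to each record; also return keys with no match."""
    linked: list[dict] = []
    unmatched: set[tuple[str, str]] = set()
    for rec in records:
        key = (rec["country"], rec["occ_code"])
        score: Optional[float] = lookup.get(key)
        if score is None:
            unmatched.add(key)
        # Build a new dict so the caller's records are left unchanged
        linked.append({**rec, "occ_score": score})
    return linked, unmatched


survey = [
    {"id": 1, "country": "UK", "occ_code": "2211"},
    {"id": 2, "country": "DE", "occ_code": "9132"},
]
linked, unmatched = link_scores(survey, OCC_LOOKUP)
```

The function returns the unmatched keys instead of silently dropping those records. This keeps gaps in the lookup visible, which matters when the same linkage has to be repeated consistently across countries.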


9

‘The significance of data management for social survey research’

(see http://www.esds.ac.uk/news/eventdetail.asp?id=2151)

The data management studied across the DAMES Node is a major component of the social survey research workload

Pre-release manipulations performed by distributors / archivists
• Coding measures into standard categories
• Dealing with missing records

Post-release manipulations performed by researchers
• Re-coding measures into simple categories

We do have existing tools, facilities and expert experience to help us… but we don’t use them efficiently or consistently

So the ‘significance’ of data management (DM) is about how much better research might be if we did things more effectively…
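The sketch below shows the post-release task of re-coding measures into simple categories, together with the related handling of missing codes. The missing-value codes, the detailed qualification codes and the three-way scheme are all invented for illustration. Real surveys document their own codes.

```python
from typing import Optional

# Hypothetical missing-value codes (e.g. "don't know", "refused", "not applicable").
MISSING_CODES = {-9, -8, -1}

# Hypothetical mapping from detailed qualification codes to a simple three-way scheme.
SIMPLE_EDUCATION = {
    1: "low", 2: "low",
    3: "medium", 4: "medium",
    5: "high", 6: "high",
}


def recode(value: int, mapping: dict[int, str], missing: set[int]) -> Optional[str]:
    """Recode one value: missing codes become None, unmapped codes raise an error."""
    if value in missing:
        return None
    if value not in mapping:
        # Failing loudly keeps the recode consistent instead of hiding unexpected codes
        raise ValueError(f"unmapped code: {value}")
    return mapping[value]


simple = [recode(v, SIMPLE_EDUCATION, MISSING_CODES) for v in [1, 4, -9, 6]]
# simple == ["low", "medium", None, "high"]
```

Raising an error on an unexpected code is a deliberate choice in this sketch. A silent default would hide exactly the kind of inconsistency that makes recodes hard to reproduce.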


10

In GE*DE, we’re developing

Services for accessing and depositing specialist data
• Occupations, educational qualifications, ethnicity
• UK administrative data (with ADLS)

Materials specifically oriented to comparative analytical approaches
• Data resources often from major cross-national studies
• Producing new cross-national data resources
• (see also talk on standardization of categorical data in session 4a)


11

GEODE v1: Organising and distributing specialist data resources (on occupations)


12

Cross-national data in DAMES and GE*DE

1. New specialist data on occupations, education and ethnicity

a. Curation and re-release of existing data

b. Generation of new data (and/or metadata), with a focus on standardisation / harmonisation

2. Conduit to existing resources

3. Generic resources for workflow documentation and replication


13

E.g. (1a) Occupations [cf. Leiulfsrud et al. 2005]


14

E.g. (1b) Ethnicity / Migration

Figure (panels: Canada 2001, Mexico 2000, USA 2000): mean occupational advantage score by ethnic group. Source: IPUMS International (Minnesota Population Center, 2009). Points show the mean occupational advantage score for employed adults, using the US 2000 CAMSIS scale, for ethnic groups with at least 1,000 census responses.
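The statistic in this figure is a group mean, reported only for groups above a minimum size. The sketch below computes it under three assumptions:
• records are already restricted to employed adults;
• each record already carries a CAMSIS-style score;
• the 1,000-case threshold applies to records with a valid score.

The figure states its threshold in terms of census responses, so the exact counting rule here is itself an assumption. The field names are hypothetical.

```python
from collections import defaultdict


def group_means(rows, group_key, value_key, min_n=1000):
    """Mean of value_key per group, keeping only groups with at least min_n valid values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row.get(value_key)
        if value is None:
            continue  # records without a score do not count towards the threshold
        group = row[group_key]
        sums[group] += value
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums if counts[g] >= min_n}
```

Lowering `min_n` shows smaller groups as well, at the cost of noisier means.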


15

E.g. (2): Occupations


16

E.g. (3): Workflow documentation
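One lightweight way to document a data-management workflow for replication is to store each step together with a human-readable description. The same sequence can then be re-run on the raw data and printed as a log. The `Workflow` class below is a hypothetical sketch of that idea, assuming steps are functions over lists of records. It is not the GE*DE workflow tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]
Step = Callable[[list[Record]], list[Record]]


@dataclass
class Workflow:
    """Records named data-management steps so an analysis file can be rebuilt from raw data."""
    steps: list[tuple[str, Step]] = field(default_factory=list)

    def add(self, description: str, step: Step) -> "Workflow":
        self.steps.append((description, step))
        return self  # allow chained calls

    def run(self, data: list[Record]) -> list[Record]:
        # Apply every step in order to shallow copies of the input records
        out = [dict(r) for r in data]
        for _, step in self.steps:
            out = step(out)
        return out

    def document(self) -> str:
        # Numbered plain-text log of the steps, suitable for sharing alongside results
        return "\n".join(f"{i}. {desc}" for i, (desc, _) in enumerate(self.steps, start=1))


wf = (
    Workflow()
    .add("Drop records with missing age", lambda rs: [r for r in rs if r["age"] is not None])
    .add("Keep respondents aged 16 and over", lambda rs: [r for r in rs if r["age"] >= 16])
)
raw = [{"age": 30}, {"age": None}, {"age": 12}]
result = wf.run(raw)  # [{"age": 30}]
```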


17

Part 2: The contribution of e-Science

The contribution should concern:
• Navigating complex data
• Security
• Workflows

Compare with current issues for cross-national surveys:
• Quantity of data (and metadata)
• Debates on harmonisation, equivalence, data quality
• Access to data


18

(a) Quantity of data (& metadata)

…current trends

Moving beyond macro-data analysis* to exploiting large-scale micro-data

*Country-level analysis, e.g. Fuchs (2009)

Interest in / access to secure micro-data

Exploitation of complex micro-data
o Longitudinal data and the life-course [Mayer, 2005]
o Micro-data and links with macro-data
o Metadata about the quality of the micro-data
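A common way to link micro-data with macro-data is to derive a country-level aggregate from the individual records and attach it to each record. The individual's deviation from that aggregate (group-mean centring) is attached alongside it, as is often done before fitting multilevel models. The following is a minimal sketch, assuming complete numeric values and hypothetical field names.

```python
from collections import defaultdict


def add_country_context(records, value_key, country_key="country"):
    """Attach each country's mean of value_key and the record's deviation from that mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r[country_key]] += r[value_key]
        counts[r[country_key]] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    return [
        {
            **r,
            f"{value_key}_country_mean": means[r[country_key]],
            f"{value_key}_within": r[value_key] - means[r[country_key]],
        }
        for r in records
    ]


people = [
    {"country": "A", "income": 10.0},
    {"country": "A", "income": 30.0},
    {"country": "B", "income": 5.0},
]
with_context = add_country_context(people, "income")
```

Externally sourced macro indicators (for example, country-level statistics from another data set) could be attached in the same way, by keying on the country label.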


19

(a) … can be helped by…

Interest in / access to secure micro-data
• e-Science projects building portals for secure access to data (e.g. Sinnott 2008)

Exploitation of complex micro-data
• Services for organising complex data (e.g. GE*DE)
• Metadata provision on data resources (e.g. PolicyGrid)
• Comparative standardisations (e.g. GE*DE)
• Tools for complex analysis (e.g. e-Stat)
• Tools for simulation (e.g. NeISS)
• Tools for visualisation of complex data (e.g. Maptube)
• Tools for workflow records for the research lifecycle (cf. MyExperiment)


20

(b) Harmonisation, equivalence and data quality

Variable manipulations require standardization through measurement or meaning equivalence, and adequate documentation / justification for those manipulations

e-Science resources support:
• Documenting / replicating ex post harmonisations, e.g. syntax databases at GE*DE
• Furnishing new scaling tools (meaning equivalence), e.g. scales of educational qualifications at GE*DE
• Facilitating manipulations and standardizations, e.g. user-friendly services on variables at GE*DE to enable a plurality of alternative measures

? Pluralistic / open-source approaches vs. quality control
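A syntax database documents ex post harmonisations. The sketch below illustrates the idea by storing each harmonisation rule with a written justification and reporting which rules were actually applied. The `Rule` structure, the example national codes and their target categories are illustrative assumptions. They are not entries from GE*DE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One documented ex post harmonisation rule (hypothetical structure)."""
    country: str
    national_code: str
    harmonised: str
    note: str  # justification, kept so others can inspect and replicate the decision


def harmonise(records, rules):
    """Add a harmonised education category; return the records and the rules actually used."""
    index = {(r.country, r.national_code): r for r in rules}
    used, out = [], []
    for rec in records:
        rule = index.get((rec["country"], rec["edu"]))
        if rule is not None and rule not in used:
            used.append(rule)
        # Codes without a rule stay unharmonised (None) rather than being guessed
        out.append({**rec, "edu_h": rule.harmonised if rule is not None else None})
    return out, used


def document(used):
    """Plain-text log of the harmonisation decisions that were applied."""
    return "\n".join(
        f"{r.country}: {r.national_code} -> {r.harmonised} ({r.note})" for r in used
    )


# Illustrative rules and records only
RULES = [
    Rule("DE", "Abitur", "upper secondary", "university entrance qualification"),
    Rule("UK", "Degree", "tertiary", "first degree or above"),
]
records = [
    {"country": "UK", "edu": "Degree"},
    {"country": "DE", "edu": "Abitur"},
    {"country": "UK", "edu": "Other"},
]
out, used = harmonise(records, RULES)
print(document(used))
```

Keeping the justification next to each mapping is what allows a different analyst to inspect, replicate, or deliberately vary a harmonisation. This is relevant to the pluralism versus quality-control question above.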


21

More on GE*DE and issues of data quality

GE*DE covers Occupations; Educational qualifications; Ethnicity and migration

These are ‘key variables’ in social science research

• Regularly measured
• Link to concepts of central interest
• Multivariate context (critical relations with gender, age cohort, etc.)


22

Key variables: concepts and measures

Variable | Concept | Measure (e.g.) | Something useful
Occupation | Class; stratification; unemployment | Occupation-based social classification | www.geode.stir.ac.uk
Education | Credentials; ability; merit | Qualification-based educational level | www.equalsoc.org/8 [Schneider, 2008]
Ethnic group | Ethnicity; religion; race; national origins | Minority ethnic group indicators | [Bosveld et al. 2006]
Age | Age; life course stage; cohort | Polynomial age function | [Abbott 2006]
Gender | Gender; household / family context | | www.genet.ac.uk
Income | Income; wealth; poverty | Monthly income; income groups; … | www.data-archive.ac.uk [SN 3909]
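The ‘polynomial age function’ listed for age usually means entering age together with its square (and sometimes higher powers) as covariates. The following is a minimal sketch that centres age before the powers are taken. The centre of 40 is an arbitrary illustrative choice. Centring can reduce the correlation between the linear and squared terms.

```python
def age_polynomial(age: float, centre: float = 40.0, degree: int = 2) -> list[float]:
    """Return [a, a**2, ..., a**degree] where a is age centred at `centre`."""
    a = age - centre
    return [a ** k for k in range(1, degree + 1)]


# Covariates for a 50-year-old: centred age and its square
terms = age_polynomial(50)  # [10.0, 100.0]
```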


23

(c) Access to data

…need for:
• Facilities for granting access to data, including new [potentially secure] data
• Distribution of suitably detailed metadata [cf. highly selective approach of existing projects, and benefits of pre-harmonisation accordingly]

e-Social Science contributions:
• Security infrastructures (e.g. portal frameworks) offer much stronger models for secure access to data
• Services for organising / distributing metadata


24

The contribution of e-Science - reflections

The contribution should concern:
• Navigating complex data
• Security
• Workflows

But, generally, it isn’t taken up

(cf. existing networks, e.g. LIS, IPUMS, ESS, etc)


25

Possible explanations

e-Science tools and services are too heavyweight compared to ad hoc sharing solutions
• Overheads in adopting e-Science tools (cf. existing working models)
• e-Science tools are unduly generic (cf. ongoing focussed projects and related resources)

Working habits: experts and software
• Major cross-national projects pre-date e-Science initiatives
• Key role of project-specific experts
• Many projects are ‘small N’ and don’t seem to require heavyweight inputs
• Survey researchers collaborate through proprietary software (e.g. Stata, SPSS)


26

Conclusions – will things change?

Overheads of e-Science engagement might decline
• GE*DE aims: user-friendly services, service delivery emphasis, training workshops, mainstream software

Existing ad hoc practices could become insufficient
• Data of greater scale and complexity
• Data with security limits
• Need for integrated access and complex analysis
• Need for plurality in analyses of multiple measures (even in ‘small N’ comparisons)
• Need for documentation for replication


27

References cited

Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford: Stanford University Press.

Atkinson, A. B. (1996). Seeking to explain the distribution of income. In J. Hills (Ed.), New Inequalities: The changing distribution of income and wealth in the United Kingdom. Cambridge: Cambridge University Press.

Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.

Foster, I., Kesselman, C., & Tuecke, S. (2001). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 200-222.

Fuchs, C. (2009). The Role of Income Inequality in a Multivariate Cross-National Analysis of the Digital Divide. Social Science Computer Review, 27(1), 41-58.

Halfpenny, P., & Procter, R. (2009). Guest editorial: Special issue on e-Social Science. Social Science Computer Review, 27(4).

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.

Mayer, K. U. (2005). Life courses and life chances in a comparative perspective. In S. Svallfors (Ed.), Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.

Minnesota Population Center. (2009). Integrated Public Use Microdata Series - International: Version 5.0. Minneapolis: University of Minnesota.

Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.

Sinnott, R. O. (2008). Grid Security. In L. Wang, W. Jie & J. Chen (Eds.), Grid Computing: Technology, Service and Applications. London: CRC Press.

Stewart, K., Sefton, T., & Hills, J. (2009). Introduction. In J. Hills, T. Sefton & K. Stewart (Eds.), Towards a more equal society? Poverty, inequality and policy since 1997. Bristol: The Policy Press.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.