Attribution and impact for social science data
ODIN conference, Cologne
October 2013
Louise Corti
Collections Development and Producer Support
Overview
• Introducing the UK Data Service
• Our data portfolio and users
• Citation, impact measurement and DOIs
• Challenges for social science citation
The UK Data Archive
• Based at the University of Essex, since 1967
• 45 years of selecting, ingesting, curating and providing access to social science data
• Designated as a Place of Deposit by The National Archives
• Data and data support services for higher and further education for research, teaching and learning
• Recently attained the highest information security standard, ISO 27001
SISTER DATA ARCHIVES
Council of European Social Science Data Archives (CESSDA)
ADA Australian Social Science Data Archive
ICPSR (USA) Inter-University Consortium for Political and Social Research
What is the UK Data Service?
• Comprehensive data resource funded by the UK Economic and Social Research Council
• Single virtual point of access to a wide range of secondary data for social science research (Directed from Essex)
• Offer promotion, support, training and guidance
What does the UK Data Service do?
• Put together a collection of the most valuable data
• Preserve data for the long term for future research purposes
• Make the data and documentation available for reuse
• Provide data management advice for data creators
• Provide training and support for users of the service
• Bring together owners, producers and users
• Demonstrate impact through evidence of usage
• Easy access through website - ukdataservice.ac.uk
Who is our service for?
• Data for secondary analysis, research, policy making
• Teaching and learning
• Academic researchers and students
• Government analysts
• Charities and foundations
• Business consultants
• Independent research centres
• Think tanks
Our data portfolio
• Over 6,000 datasets in the collection
• 230 new datasets added each year
• Official agencies - mainly central government
• International statistical time series
• Individual academics’ research grants
• Market research agencies
• Public records/historical sources
• Access to international data via links with other data archives worldwide
UK survey series
• High quality repeated cross-sectional surveys
• Individual or household level data
• Cover many topics including health, work, crime, social attitudes, family expenditure, living costs, housing etc.
• Labour Force Survey
• British Crime Survey
• Health Survey for England
• British Social Attitudes
• Annual Population Survey
….
Cross-national surveys and macro databanks
• Eurobarometers
• European Social Survey
• European Values Survey
• International Social Survey Programme
• Time series data aggregated to country/region
• International governmental organisations (IMF, OECD, IEA, World Bank)
Longitudinal studies
• British Household Panel Survey and Understanding Society
• Understanding Society (2009-)
• English Longitudinal Study of Ageing
• Families and Children Study
• Growing Up in Scotland
• Longitudinal Study of Young People in England
UK census data
• 1971-2011 census data
• Baseline for other statistics
• Detailed combinations of characteristics
• Small geographies
• Census outputs
• Aggregate data
• Boundary data
• Flow data
• Microdata
Business data
• Collected through a wide range of surveys and administrative sources:
• productivity, innovation, workforce skills, earnings
• international trade, foreign direct investment
• research and development
• business demography
• industrial relations
Qualitative data
• Interviews, focus groups
• Essays, diaries, open-ended survey questions
• Observations, case notes etc.
• Family Life and Work Experience before 1918, Middle and Upper Class Families in the Early 20th Century, 1870-1977
• Gender Difference, Anxiety and the Fear of Crime, 1995
• Mothers Alone: Poverty and the Fatherless Family, 1955-1966
Usage of data
• Operate a spectrum of access
• Web download under End User Licence
• Permission only via Special Licence access
• ‘Approved researcher’ access via remote secure access
• End user licence includes:
• Appropriate data usage
• Full citation of data and informing us of re-use
• Have always provided a citation format
• Over 22,000 registered users
• Approximately 60,000 downloads worldwide p.a.
• 3,000+ user support queries
Evidence of access and re-use
User access information
• Collect user information and ‘projects’ upon registration
• Collate data and documentation download statistics
• Users can share project information for others to see
• Report data access stats on demand
Usage information
• Email all users every 6 months after registration about activity
• Manually add references to all reported research outputs to the data record
• Reporting rate of publications is poor!
• Prior to DOIs, we scanned the citation literature for dataset mentions – very manual and unreliable, and the data were poorly cited
Impactful case studies of use
• Identify and seek out case studies of re-use: research or teaching.
• Very successful!
• 125 case studies in our database
• Can help provide impact stories for data owners/producers and users
• And can inspire others!
• Some are harvested by ESRC for their website
• Often include ongoing work – no need to wait for publications
Our persistent identifiers approach
• Our data collections are not digital objects
• Need to capture changes made to data
• Versioning data in a commonly understood manner
• Needed rule-based definition of a ‘significant’ change
• Integrate processes with digital preservation activities & workflows
• In 2011 we assigned DataCite DOIs to all of our collections
• Mint and update DOIs with our metadata management infrastructure
Recording significant change
• Approx. 15% of UKDA data collections are altered within the first year after first publication
• We have distinguished between major (high impact) and minor (low impact) changes to a data collection
• DOI allocated to a metadata instance of a data collection
• DOIs resolve to jump page pointing to all external instances
• New DOI = High Impact change, with explicit logging
• Provided access only to most up-to-date version of data
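The versioning behaviour on this slide can be sketched roughly as follows: every change is logged, but only a high-impact change moves the collection to a new DOI, and a jump page lists every DOI issued so far. This is a minimal illustration, not the Archive's actual infrastructure. The class names, the `doi_stem` field and the "stem plus version number" DOI pattern are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    description: str
    high_impact: bool


@dataclass
class Collection:
    # Hypothetical DOI stem; the real naming scheme is not given in the slides.
    doi_stem: str
    version: int = 1
    log: List[str] = field(default_factory=list)

    @property
    def doi(self) -> str:
        # Assumed pattern: stem followed by a version counter.
        return f"{self.doi_stem}-{self.version}"

    def apply(self, change: Change) -> str:
        """Log every change; only a high-impact change yields a new DOI."""
        if change.high_impact:
            self.version += 1
        self.log.append(f"{self.doi}: {change.description}")
        return self.doi

    def jump_page(self) -> List[str]:
        """All DOIs issued so far, as a jump page might list them."""
        return [f"{self.doi_stem}-{v}" for v in range(1, self.version + 1)]
```

In this sketch a low-impact change leaves `doi` unchanged, so existing citations keep pointing at the same metadata instance.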
Major changes – high impact
• New variable added
• New labels/value codes added
• Weighting variables reconstructed
• Wrong data supplied (e.g., March not April)
• Mis-coded data (e.g., Don’t know/Refused confused)
• Change in format (file migration)
• Significant changes in documentation
• Change in access conditions
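A rule-based definition of a high-impact change could be encoded along these lines. The eight high-impact categories are taken from the list above. The `OTHER` category and the decision to treat everything outside the list as low impact are assumptions for illustration, as are all the identifier names.

```python
from enum import Enum


class ChangeType(Enum):
    NEW_VARIABLE = "new variable added"
    NEW_LABELS = "new labels/value codes added"
    WEIGHTS_RECONSTRUCTED = "weighting variables reconstructed"
    WRONG_DATA = "wrong data supplied"
    MISCODED_DATA = "mis-coded data"
    FORMAT_CHANGE = "change in format (file migration)"
    DOCUMENTATION_SIGNIFICANT = "significant changes in documentation"
    ACCESS_CONDITIONS = "change in access conditions"
    OTHER = "anything else"  # assumption: treated as low impact


# Every listed category except the catch-all counts as high impact.
HIGH_IMPACT = frozenset(ChangeType) - {ChangeType.OTHER}


def needs_new_doi(changes):
    """Return True if any change in the batch is high impact."""
    return any(change in HIGH_IMPACT for change in changes)
```

Keeping the rules in one explicit set means a batch of edits can be checked in a single call, and the set itself documents what counts as significant.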
Raising awareness in the social sciences
• ESRC funding for short-term project on citation
• Advocacy for best practice in citing research data
• Audiences
• Professional organisations
• Academic publishers and journal editors
• Researchers and postgraduates
• Key activities
• Data citation principles for social sciences
• Personal communications
• Events with BL DataCite, JISC and wider PI community
• Outreach through Doctoral Training Centres
Demonstrating impact with citation
• Assuming better use of DOIs…
• Starting to search for use of our DOIs – Google
• Automate this process and compile reports; promote
• Gather data citation statistics from the Thomson Reuters Data Citation Index. We are one of the 20 early feeder repositories, but our own access is limited!
• Work with BL DataCite and ODIN to gain connectivity between identifiers & outputs – early adopters
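Automating the search for DOI mentions might start with something like the sketch below, which scans a set of texts for DOIs under a given prefix and tallies them for a report. The function name and the example prefix `10.1234` are hypothetical, and where the texts come from (search results, publisher feeds) is left open. DOI names are case-insensitive, so matches are normalised to upper case before counting.

```python
import re
from collections import Counter


def count_doi_mentions(texts, prefix):
    """Count mentions of DOIs that start with the given registrant prefix."""
    # A DOI is prefix + "/" + suffix; stop at whitespace, quotes and angle brackets.
    pattern = re.compile(re.escape(prefix) + r'/[^\s"<>]+', re.IGNORECASE)
    counts = Counter()
    for text in texts:
        for match in pattern.findall(text):
            # Drop trailing sentence punctuation and normalise case.
            counts[match.rstrip(".,;)").upper()] += 1
    return counts
```

A call such as `count_doi_mentions(articles, "10.1234")` returns a `Counter` keyed by DOI, which can be sorted with `most_common()` to compile a usage report.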
CHALLENGES FOR THE FUTURE
• Citing parts (fragments) of data collections
• single files
• subsets of quantitative data
• extracts of textual data
• ESRC project Digital Futures will enable extract level citation within a web-based browsing system
• Using rich highly structured XML metadata
• GUIDs for everything
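One way to picture extract-level identifiers is to derive a stable GUID for each paragraph of a textual collection from the collection DOI and the paragraph's position. This is only a sketch of the general idea, not a description of the Digital Futures system. The function name, the `#para-N` naming convention and the use of name-based (version 5) UUIDs are assumptions.

```python
import uuid


def fragment_ids(collection_doi, paragraphs):
    """Give each text extract a stable GUID derived from DOI and position."""
    ids = []
    for index, _ in enumerate(paragraphs, start=1):
        # Hypothetical naming convention for a fragment within a collection.
        name = f"https://doi.org/{collection_doi}#para-{index}"
        # uuid5 is deterministic: the same name always yields the same GUID.
        ids.append(str(uuid.uuid5(uuid.NAMESPACE_URL, name)))
    return ids
```

Because the identifiers are derived rather than random, re-running the process on the same version of a collection reproduces the same GUIDs, while a new DOI for a changed collection produces a fresh set.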
CONTACT
UK Data Service
University of Essex
Wivenhoe Park
Colchester
Essex CO4 3SQ
T +44 (0)1206 872001