Supporting the development of a national Research Data Discovery Service – a Pilot Project Stuart Macdonald EDINA & Data Library University of Edinburgh [email protected]


Post on 13-Jan-2017


Page 1: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Supporting the development of a national Research Data Discovery Service – a Pilot Project

Stuart Macdonald

EDINA & Data Library

University of Edinburgh

[email protected]

Page 2: Supporting the development of a national Research Data Discovery Service – a Pilot Project

• University of Edinburgh

• Background and context

• UK Research Data Discovery Service

• PhD Interns

• Observations

• Closing remarks

Page 3: Supporting the development of a national Research Data Discovery Service – a Pilot Project

University of Edinburgh

• Founded in 1582 - 6th oldest university in the English-speaking world and one of Scotland's 4 ancient universities.

• 3 Colleges (MVM, CSE, CHSS), 22 Schools

• Over 60 disciplinary/cross-disciplinary Institutes and Research Centres

• 34000 students, 4500 researchers, 6000 research students

Page 4: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Background

• EDINA and Data Library are a division within Information Services (IS) of the University of Edinburgh.

• EDINA is a Jisc-funded centre for digital expertise providing national online resources for education and research.

• Data Library & Consultancy assists Edinburgh University users in the discovery, access, use and management of research datasets.

• The Data Library is part of the new Research Data Service – the culmination of a 36-month RDM Roadmap (phases 1 and 2) to implement the University’s RDM Policy and develop a suite of RDM Services that map onto the research lifecycle.

Data Library Services: http://www.ed.ac.uk/is/data-library

EDINA: http://edina.ac.uk/

Page 5: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Context

• In order to be reused, research data must be discoverable.

• The EPSRC Research Data Expectations* require research organisations to maintain a data catalogue to record metadata about research data generated by EPSRC-funded research projects.

• Universities are increasingly making research data assets available through repositories or other data portals.

• The requirement for a UK research data discovery service has grown as universities become more involved in RDM and capacity develops.

* https://www.epsrc.ac.uk/about/standards/researchdata/expectations/

Page 6: Supporting the development of a national Research Data Discovery Service – a Pilot Project

UK Research Data Discovery Service (RDDS)

In 2013, the Digital Curation Centre (DCC) and the UK Data Service piloted a registry service to aggregate metadata for research data held within a sample of UK universities and national, discipline-specific data centres.

This 6-month pilot tested an existing data registry architecture developed by the Australian National Data Service (ANDS).

This was followed up with Phase 2 funding from Jisc to evaluate technical solutions and further develop a national Research Data Discovery Service:

• https://www.jisc.ac.uk/rd/projects/uk-research-data-discovery

• http://ckan.data.alpha.jisc.ac.uk/

Page 7: Supporting the development of a national Research Data Discovery Service – a Pilot Project

As part of Phase 2, the University of Edinburgh received funding from Jisc to support the development of UKRDDS.

PhD interns from the 3 Colleges were hired through a ‘streamlined’ e-recruitment* process, part of IS’s plan to recruit 500 PhD interns per academic year, complete with formal eligibility-to-work checks, inductions, probation reports, and end of employment / continuation of employment processes!

Their role was to engage with local researchers in order to make metadata and full data sets available for harvest into the pilot service for discovery and potential reuse.

This work was co-ordinated jointly by EDINA & Data Library and Library & University Collections.

Progress was reported back to Jisc via monthly UKRDDS meetings and face-to-face workshops, as well as through representation on the UKRDDS Technical and Metadata Advisory Groups.

Page 8: Supporting the development of a national Research Data Discovery Service – a Pilot Project

PhD Interns: responsibilities

To develop plans for getting researchers in schools engaged with recording or sharing their data

To work closely with researchers and School administrators to assist in the description and upload of research data into:

• PURE, the University’s proprietary Current Research Information System, used as a data catalogue where descriptive metadata about datasets can be added to link to related research outputs, publications or projects.

• Work needed to convert PURE ver. 5 API into OAI-PMH end-point
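The slides do not describe how the PURE API would be exposed over OAI-PMH. As a rough illustration of the kind of conversion involved, the sketch below maps one dataset record, represented as a plain Python dict, onto an `oai_dc` (Dublin Core) metadata element of the sort an OAI-PMH end-point returns. The field names in `FIELD_MAP`, the `to_oai_dc` helper and the example record are all assumptions made for illustration; they are not the real PURE schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH and Dublin Core specifications.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical CRIS field names mapped to Dublin Core elements.
# These are illustrative and are NOT the real PURE API schema.
FIELD_MAP = {
    "title": "title",
    "description": "description",
    "publisher": "publisher",
    "date_available": "date",
    "doi": "identifier",
}


def to_oai_dc(record):
    """Convert one dataset record (a dict) into an oai_dc:dc element."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, element in FIELD_MAP.items():
        value = record.get(field)
        if value:  # skip absent or empty fields
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    for name in record.get("creators", []):
        ET.SubElement(root, f"{{{DC}}}creator").text = name
    return root


# Made-up example record, for illustration only.
example = {
    "title": "Example interview transcripts",
    "creators": ["Researcher, A.", "Researcher, B."],
    "publisher": "University of Edinburgh",
    "date_available": "2016-06-01",
}
xml_text = ET.tostring(to_oai_dc(example), encoding="unicode")
print(xml_text)
```

A full end-point would also need to wrap each element in OAI-PMH `record` and `header` elements and answer the protocol's verbs; this sketch covers only the metadata mapping step.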

 

Page 9: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Edinburgh DataShare - the University’s OA multi-disciplinary data repository hosted by the Data Library

• It allows University researchers to upload, share, and license their data resources for online discovery and re-use by others.

• OAI-PMH compliant

• Built on the DSpace platform

• http://datashare.is.ed.ac.uk
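Because the repository is OAI-PMH compliant, an aggregator can harvest its metadata with standard protocol requests. The sketch below shows, under stated assumptions, how a harvester might build a `ListRecords` request and read Dublin Core fields from the response. The base URL `https://example.org/oai/request` and the sample response are placeholders, not DataShare's actual end-point or data; only the verb, the `metadataPrefix` parameter and the XML namespaces come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in record.iterfind(".//dc:creator", NS)
                         if c.text],
        })
    return records


# Placeholder response with a made-up record, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1234/1</identifier>
        <datestamp>2016-06-01T00:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey dataset</dc:title>
          <dc:creator>Researcher, A.</dc:creator>
          <dc:creator>Researcher, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://example.org/oai/request"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL over HTTP and follow `resumptionToken` elements to page through large result sets; both are left out here to keep the sketch runnable offline.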

Page 10: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Other responsibilities:

To validate and quality control metadata records ingested into both PURE and DataShare for the purpose of being harvested by UKRDDS

To develop or enhance the quality of metadata records to the standard set for UKRDDS

To assist in the identification and deposit of research datasets deemed suitable or appropriate for open publication and long-term preservation into DataShare

To record their own observations and provide periodic reports on data sharing and cataloguing practices within their respective Schools.
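Part of the quality control described above can be pictured as a completeness check run over metadata records before harvest. The sketch below is one possible form of such a check. The list of required fields is a hypothetical minimum chosen for illustration; the actual standard set for UKRDDS is not given in these slides, and the example records are made up.

```python
# Hypothetical minimum field set; the actual UKRDDS metadata
# standard is not given in these slides.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "description",
    "publisher",
    "date_available",
    "identifier",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def quality_report(records):
    """Map each incomplete record's identifier (or position) to its gaps."""
    report = {}
    for position, record in enumerate(records):
        gaps = missing_fields(record)
        if gaps:
            key = record.get("identifier") or f"record #{position}"
            report[key] = gaps
    return report


# Made-up records, for illustration only.
records = [
    {
        "title": "Complete example",
        "creators": ["Researcher, A."],
        "description": "A fully described dataset.",
        "publisher": "University of Edinburgh",
        "date_available": "2016-06-01",
        "identifier": "id1",
    },
    {"title": "Incomplete example", "creators": [], "identifier": "id2"},
]
print(quality_report(records))
```

A check like this only flags empty fields; judging whether a description is good enough for discovery and reuse still needs a person, which is where the interns came in.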

Page 11: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Observations

1st tranche of PhD interns (Dec. 15 – April 16)

• School of Literatures, Languages and Cultures

• Roslin Institute

• School of Social and Political Science

2nd tranche (Mar. 15 – Sept. 16)

• Division of Infection and Pathway Medicine, School of Medicine

• School of Literatures, Languages and Cultures (2nd intern)

3rd tranche (June 16 – Sept. 16)

• School of Divinity

• School of Engineering

Page 12: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Literatures, Languages, and Cultures (LLC)

• 3 datasets described in PURE, 2 datasets deposited into DataShare and described in PURE

• 14 researchers interviewed for LLC (+ 7 researchers for Philosophy, Psychology and Language Science)

• LLC has dedicated RDM webpages

• Communications with researchers within the two Schools were conducted via Research Administrators

• Research Administrators and researchers are happy to talk once the intern is not seen as an ‘enforcing figure’

Page 13: Supporting the development of a national Research Data Discovery Service – a Pilot Project

• Researchers expressed discomfort or unfamiliarity concerning online distribution of data, and unease about upsetting publishers by making their data available online

• Due to the nature of humanities research, where interpretation of existing artefacts (books, historic texts, manuscripts) is itself the research output, researchers did not tend to regard this as data

• Copyright was seen as one of the main issues hindering dataset deposit – a limiting factor when researchers’ data is based on texts and other archival material.

• Also, some documents no longer under copyright are restricted from imaging due to preservation efforts

• When texts themselves are a researcher’s own ‘data’ (as is often the case in the Humanities) there is still a reluctance to share

Page 14: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Roslin Institute

67 researchers interviewed belonging to 4 divisions (70% of total)

  • Infection and Immunity
  • Genetics and genomics
  • Neurobiology and Developmental Biology
  • Clinical researchers from Veterinary School

0 datasets deposited in DataShare. Linking data in e.g. NCBI to PURE unrealistic (see next slide)

• PhD interns worked closely with a dedicated Data Manager, PURE Administrator and Research Administrator.

• Roslin have dedicated RDM webpages

• c. 60% of researchers kept their research outputs up-to-date in PURE, though very few had updated research data metadata or were aware that they could.

• c. 90% of researchers submitted data to journals and open access domain repositories, e.g. 50% submitted to NCBI, 20% submitted to EBI

Page 15: Supporting the development of a national Research Data Discovery Service – a Pilot Project

The number of datasets deposited into NCBI from the Roslin Institute is large (e.g. over 55,000 expressed sequence tags, over 73,000 protein sequences, over 132,000 genome survey sequences)

Unwieldy proposition to record metadata from NCBI into PURE

Currently no automated processes in place.

Page 16: Supporting the development of a national Research Data Discovery Service – a Pilot Project

• The main reasons stated for using these repositories were:

  • Funder requirement
  • Default repository within their discipline
  • Recommendation by peers

• c. 40% of researchers were confident about the safety of their data and the long-term guarantees provided by the domain repositories, whereas c. 60% did not know or were not sure

• Researchers working with industry partners indicated that, due to the confidential nature of the data, they do not upload data to open access repositories

• Only one third of researchers had heard about DataShare (with only one researcher who had used it). Two thirds hadn’t heard of it.

• In general there was no interest in using DataShare due to well established domain repositories

Page 17: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Social and Political Science

• 19 datasets held in the UK Data Archive described in PURE, 0 datasets deposited into DataShare

• 15 researchers identified as having made data available via the UK Data Archive were sent a questionnaire – only 2 knew about DataShare

• 12 ESRC funded PhD students interviewed (about making their data available in UKDA / DataShare) - No Data Management Plans written by ESRC funded PhDs at start of research (this is now mandatory)

• 10 researchers interviewed (different from those that answered the questionnaire)

Page 18: Supporting the development of a national Research Data Discovery Service – a Pilot Project

• Research Assistants are regularly employed to manage, clean and publish datasets. The temporary nature of contracts often means that the knowledge and practice of curating datasets is not retained within the School

• Among the challenges researchers cited in making datasets available, whether quantitative or qualitative, the most common was ethics and anonymisation

• Of c. 300 researchers in the School between 2008 and 2016, only 19 had deposited data in the UK Data Archive

• This confirmed (in the eyes of the PhD intern) that making datasets available in open access or domain repositories is not necessarily a widespread practice nor of primary importance

Page 19: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Closing remarks

• Internships instrumental in starting RDM conversations within Schools

• Mixed economy of research culture, practice and behaviour

• Speed and process of data generation, description and deposit varies

• Are we surprised? Old habits die hard.

• Build it and they will come!

• From a service provision perspective there is no one-size-fits-all solution

• With more emphasis placed on ‘as required’ service solutions

Page 20: Supporting the development of a national Research Data Discovery Service – a Pilot Project

• Greater understanding needed of disciplinary and sub-disciplinary practice

• Rethink outreach, formal and informal training strategies

• Targeted approach, local data managers, 6 FTEs

• OA has taken c. 10 years to become embedded as common practice within the scholarly communication process

• Arguably it is early days for RDM

• We’ll await observations from other Schools with interest!

Page 21: Supporting the development of a national Research Data Discovery Service – a Pilot Project

Questions!

Special thanks to:

Rodrigo Bacigalupe
Cleo Davies
James Jafali
Natalie Lankester-Carthy
Bridget Moynihan