![Page 1: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/1.jpg)
Federation
eCrystals Federation: Open Repositories for Open Science
Dr Liz Lyon, UKOLN, University of Bath, UK
Dr Simon Coles, University of Southampton, UK
Dr Manjula Patel, UKOLN, University of Bath, UK
CNI Taskforce Meeting, Washington DC, December 2007
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 3.0
http://creativecommons.org/licenses/by-sa/3.0/
![Page 2: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/2.jpg)
Overview
1. Chemistry and Open Science: context and practice
2. Lessons learnt from eBank Phase 3
3. Data curation and preservation issues
4. Setting up the Federation: Challenges ahead?
![Page 3: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/3.jpg)
Chemistry and Open Science: context and practice
![Page 4: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/4.jpg)
Social networks for chemists….
New postgraduate cohorts : millennials / Google generation : new behaviours
![Page 5: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/5.jpg)
Community content for chemists : rich media
video + paper = Pubcast
>8000 views
![Page 6: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/6.jpg)
At the coalface: tagging & sharing workflows
Astronomy, Bioinformatics, Chemistry, Social Science pilots
Universities of Manchester & Southampton
![Page 7: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/7.jpg)
“Small science” : sharing in the lab
![Page 8: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/8.jpg)
Open Wetware Laboratory wikis
![Page 9: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/9.jpg)
Transforming practice?
2006
Open Notebook Science (ONS)
26 September: first use of the term blogged by Jean-Claude Bradley, Drexel University
![Page 10: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/10.jpg)
2007
27 March: ONS at American Chemical Society Symposium
7 August: ONS Poster in Second Life on Nature island
24 September: ONS Case Studies in Second Life
4 October: > 43,000 hits in Google for term ONS
![Page 11: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/11.jpg)
10 & 15 October: Policy lists, DabbleDB membership database created (US)
11 October: ONS experiment starts in Cambridge, UK
7 November: Cameron Neylon (Univ Southampton / STFC, UK) posts “Sourceforge for Science” concept
![Page 12: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/12.jpg)
10 November: Open Data for common molecules - Wikichemicals? Peter Murray-Rust’s blog at Univ. Cambridge, UK
27 November: Research Network proposal submitted to UK research council
Yesterday: about 2,400,000 Google hits for Open Notebook Science
New ideas are surfacing very fast, with instant development, testing and take-up…
![Page 13: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/13.jpg)
eBank Project – building the eCrystals Data Repository
Institutional Repository exemplar
http://ecrystals.chem.soton.ac.uk
![Page 14: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/14.jpg)
Metadata Publication
• Using simple Dublin Core
– Crystal structure
– Title (Systematic IUPAC Name)
– Authors
– Affiliation
– Creation Date
• Additional chemical information through Qualified Dublin Core
– Empirical formula
– International Chemical Identifier (InChI)
– Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
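As a rough illustration of the simple Dublin Core layer listed above, the following Python sketch serialises a handful of elements into an `oai_dc` record. The two namespace URIs are the standard Dublin Core and OAI ones; the title, author and date values are placeholders invented for the example, and the function name `build_dc_record` is hypothetical. It is not the eBank application profile itself, which adds Qualified Dublin Core terms not shown here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(fields):
    """Serialise (element, value) pairs as a simple Dublin Core record."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record_xml = build_dc_record([
    ("title", "example systematic IUPAC name"),   # placeholder value
    ("creator", "A. Crystallographer"),           # placeholder value
    ("date", "2007-12-01"),                       # placeholder value
    # The DOI and rights URL are the ones quoted on the slide.
    ("identifier", "http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145"),
    ("rights", "http://ecrystals.chem.soton.ac.uk/rights.html"),
])
print(record_xml)
```

A harvester that understands plain Dublin Core could read such a record without knowing anything about crystallography, which is the point of keeping the base layer simple.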
![Page 15: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/15.jpg)
blogs / wikis
Harvest / Publish
![Page 16: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/16.jpg)
Lessons learnt from eBank Phase 3
![Page 17: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/17.jpg)
Study Aims and Approach
• Scoping the eCrystals Federation of crystallography data repositories
• Questionnaire and interview-based
• Joint Consultation Workshop (eBank, R4L, SPECTRa) & Report
• Engage whole data lifecycle community: crystallographers, central facilities, publishers, data centres, and chemical information specialists
• Mixed project team: Chemists, Digital Library researchers & Computer Scientists
![Page 18: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/18.jpg)
Lessons: Policy and practice
• Must be considered at level of the Institution and the practising Laboratory
• Mixed lab practice: central service facility versus single “staff crystallographer” in department
• “Repository Lite” for smaller lab operations?
• Established data ‘publication’ practice + domain subject repository: Cambridge Crystallographic Data Centre (CCDC)
• Institutional policy buy-in is essential
• Demonstrate benefits and added value to senior managers
• Implications for information services structure
![Page 19: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/19.jpg)
Interoperability & Standards
• Instrument manufacturers' proprietary formats
• Technical software platform
• Metadata schema: Application profiles
• Standards and identifiers: International Chemical Identifier (InChI), DOI, CIF, CML, de facto software
• Semantic interoperability
X-ray diffractometers
![Page 20: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/20.jpg)
Subject Repositories, Publishing and IPR
• Established subject repository at CCDC (40 years old!) : repository interactions?
• The “embargo problem” : prior dissemination affecting publication of journal article
• Cultural issues related to chemists: “it's my data” (journal article will always be sacred)
• Mechanisms for sharing with collaborators and referees prior to publication?
![Page 21: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/21.jpg)
Advocacy
• The most important issue?!?
![Page 22: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/22.jpg)
Data curation and preservation issues
![Page 23: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/23.jpg)
Digital Curation Centre http://www.dcc.ac.uk/
• Community Development work
• Led by UKOLN
• eBank/eCrystals partner
![Page 24: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/24.jpg)
eBank-UK Phase 3 Curation & Preservation Study
http://www.ukoln.ac.uk/projects/ebank-uk/curation/
Examined four main areas:
1. Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
2. The Open Archival Information System (OAIS) and Representation Information (RI)
3. eBank-UK application profile and preservation metadata
4. ePrints.org repository platform
![Page 25: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/25.jpg)
Observations & Recommendations 1
• Self-assessment using DRAMBORA toolkit
• Engage DCC audit & certification team
• Formulation of long-term objectives and policy
– Deposit agreements
– Services
• Aim for community-supported sustainability plan
• Implement regular audits: annual
• Produce evidence of compliance
– Documentation
– Transparency
– Adequacy
– Measurability
• Federation context
![Page 26: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/26.jpg)
Observations & Recommendations 2
• Maintenance and open access of critical file formats and software
– Work-up software e.g. XPREP
– Export raw data from instrumentation as imgCIF
• Consider Representation Information (RI) in context of whole crystallography landscape (CCDC, IUCr etc.)
• Develop a preservation and curation strategy and formal policies to indicate levels of service
– Deposit, ingest, validation, dissemination
• Consider services to be developed over the DCC Registry/Repository of Representation Information (RRoRI)
![Page 27: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/27.jpg)
Observations & Recommendations 3
• Develop preservation strategy & plan for the specific content
• Capture preservation metadata, including versioning and provenance information
• PREMIS Data Dictionary
– Semantic Units (e.g. file format, significant properties, provenance, fixity info)
– Extend eBank metadata application profile (AP)?
• Obtain consensus on AP
• Seek to automate metadata generation, extraction, maintenance
• ePrints.org support for information packages
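One way to picture automated capture of fixity and provenance information is the Python sketch below. It is only an illustration: the dictionary keys are loosely named after the semantic units mentioned above and are not the PREMIS schema, the function names are hypothetical, and the file names and byte contents are invented stand-ins for real instrument output.

```python
import hashlib
from datetime import datetime, timezone


def fixity_record(name, payload, file_format, derived_from=None):
    """Build a minimal preservation record for one file's bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    return {
        "object": name,
        "format": file_format,
        "fixity": {"algorithm": "SHA-256", "digest": digest},
        "derived_from": derived_from,  # simple provenance pointer
        "recorded": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(record, payload):
    """Return True if the bytes still match the recorded digest."""
    return hashlib.sha256(payload).hexdigest() == record["fixity"]["digest"]


# Invented stand-ins for a raw data file and the structure derived from it.
raw = b"hypothetical raw diffraction frames"
cif = b"data_example\n_cell_length_a 10.0\n"

raw_rec = fixity_record("frames.img", raw, "imgCIF")
cif_rec = fixity_record("structure.cif", cif, "CIF", derived_from="frames.img")

print(verify_fixity(cif_rec, cif))         # unchanged bytes
print(verify_fixity(cif_rec, cif + b" "))  # altered bytes
```

Generating such records at deposit time, rather than by hand, is the kind of automation the last bullets call for; a regular audit could then re-run `verify_fixity` over the stored files.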
![Page 28: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/28.jpg)
Setting up the Federation: Challenges ahead?
![Page 29: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/29.jpg)
eCrystals Federation Data Deposit Model
[Diagram. Activities: Create, Deposit, Link; Curate, Preserve, Standards; Collaborate, Share; Discover, Re-use; Harvest; Policy, Advocacy, Training; Advisory. Actors: Scientist, Funder, User, IR Federation, Publishers, Data centres / aggregator services.]
![Page 30: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/30.jpg)
Repository deployment & support
• Roll-out in 2 phases
– Universities Sydney, Glasgow, Newcastle with eprints.org platform
– Universities Cambridge, STFC, ReciprocalNet, ARCHER with other platforms
• Information Environment Service Registry (IESR) listing Federation Collections
![Page 31: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/31.jpg)
Laboratory Workflow & Provenance
• Achieving end-to-end workflows: avoiding fragmentation of data, results and interpretations
• Account for differing laboratory practice
[Diagram labels: Raw Data; Public domain material; RAW DATA, DERIVED DATA, RESULTS DATA]
![Page 32: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/32.jpg)
Repository interoperability & linking services
• Establish core Federation application profile and mappings
• Bi-directional links with derived articles in “publisher repositories”, IUCr, Royal Society of Chemistry (RSC), Chemistry Central
• Test linking options: StORe middleware and CLADDIER (JISC-funded projects)
• OAI-ORE Pathways Project developments
![Page 33: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/33.jpg)
Interoperability testbed
• Experimental data sets + metadata as compound objects
• Dublin Core and METS not sufficient
• OAI-ORE (base: Atom Publishing Protocol) testbed
• Enable 3rd party services e.g. data / text mining
eChemistry project
![Page 34: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/34.jpg)
Enabling data discovery
• Royal Society of Chemistry Project Prospect tagging & semantic linking
![Page 35: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/35.jpg)
Preservation & Sustainability
• DRAMBORA Assessment: use DRAMBORA Interactive
• Enhance Application Profile with PREMIS preservation metadata
• Populate RRoRI with crystallography representation information
• Examine repository platform conformance to OAIS Ref Model
• Survey partner institutional preservation policies
![Page 36: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/36.jpg)
Embedding into current publishing practice
• Chemists still want to publish scholarly articles
• Blogs and repositories are a new form of rapid communication, but there are prior publication concerns
• Timing of release of data into public domain and formal publication will be crucial
– Repository must provide control over timing of public visibility
– EPrints3 version of eCrystals has ‘embargo tokens’
• Validation and quality in an ‘Open’ world
– Quality indicators?
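The requirement that a repository control the timing of public visibility can be pictured with a small Python sketch. This is a hypothetical model written for illustration only: it does not describe how the EPrints3 ‘embargo tokens’ are actually implemented, and the class name, field names and dates are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    """A deposited dataset with an optional embargo date (hypothetical model)."""
    identifier: str
    embargo_until: Optional[date] = None  # None means no embargo

    def is_public(self, today: date) -> bool:
        """Public once the embargo date is reached, or if none was set."""
        return self.embargo_until is None or today >= self.embargo_until

    def release(self) -> None:
        """Lift the embargo early, e.g. when the journal article appears."""
        self.embargo_until = None


entry = Entry("example-entry", embargo_until=date(2008, 6, 1))  # invented values
before = entry.is_public(date(2008, 1, 15))  # still under embargo
after = entry.is_public(date(2008, 6, 1))    # embargo date reached
entry.release()
released = entry.is_public(date(2008, 1, 15))
print(before, after, released)
```

Keeping the embargo as data on the entry, rather than as a manual step, lets depositors share with collaborators and referees first and defer public release until formal publication.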
![Page 37: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/37.jpg)
Advocacy
• Chemists still wary of ‘Open Access’
• eCrystals Roadshow Workshops engaging both crystallographers and their service ‘users’ in the workplace
• Open forum at International Union of Crystallography world congress (Aug 2008)
• Publishers Workshop to demonstrate co-existence of open data models & traditional scholarly articles
![Page 38: Federation](https://reader035.vdocuments.us/reader035/viewer/2022062423/56814d27550346895dba5a38/html5/thumbnails/38.jpg)
Questions?
Slides will be available at :
http://wiki.ecrystals.chem.soton.ac.uk/index.php