© S.J. Coles 2006
Digital Repositories as a Mechanism for the Capture, Management and
Dissemination of Chemical Data
Simon Coles
School of Chemistry,
University of Southampton, U.K.
A Data-Rich Subject – the Crystallography Problem
[Figure: chemical structure diagrams; only the atom labels (Cl, O, N, N+) survive]
30,000,000
1,500,000
450,000
Funding Body Viewpoint
Open Access as the Answer?
Separating Data from Interpretations
Underlying data
Intellect & Interpretation
Research & e-Science workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Searching, harvesting, embedding
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
The scholarly knowledge cycle.
Liz Lyon, eBankUK article. Ariadne, July 2003.
Workflow Capture and Analysis
RAW DATA → DERIVED DATA → RESULTS DATA
The eCrystals Data Archive
http://ecrystals.chem.soton.ac.uk
Access to the underlying data
Metadata Publication
• Using simple Dublin Core
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date
• Additional chemical information through Qualified Dublin Core
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• DOI
• Rights
• Citation
http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
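The metadata fields listed on this slide could be serialised roughly as follows. This is a minimal sketch only, not the eBank-UK schema itself: the mapping of each field onto a Dublin Core element is an assumption, the `chem` namespace and the `build_record` helper are hypothetical, and the sample entry (benzene, with placeholder authors and DOI) is illustrative rather than a real eCrystals record.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
CHEM = "http://example.org/chem-sketch/"  # hypothetical namespace, NOT the eBank schema

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("chem", CHEM)

# Illustrative placeholder entry; none of these values come from eCrystals.
SAMPLE_ENTRY = {
    "title": "benzene",
    "authors": ["A. Researcher", "B. Researcher"],
    "affiliation": "Example University",
    "created": "2006-01-01",
    "formula": "C6H6",
    "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    "keywords": ["aromatic hydrocarbon"],
    "datasets": ["raw", "derived", "results"],
    "doi": "10.0000/placeholder",
    "rights": "Placeholder rights statement",
    "citation": "Placeholder citation",
}


def build_record(entry: dict) -> ET.Element:
    """Map one repository entry onto Dublin Core style elements (assumed mapping)."""
    record = ET.Element("record")

    def add(namespace: str, tag: str, text: str) -> None:
        ET.SubElement(record, f"{{{namespace}}}{tag}").text = text

    # Simple Dublin Core fields
    add(DC, "type", "Crystal structure")
    add(DC, "title", entry["title"])
    for author in entry["authors"]:
        add(DC, "creator", author)
    add(DC, "publisher", entry["affiliation"])  # assumed home for "Affiliation"
    add(DC, "date", entry["created"])

    # Additional chemical information (element names here are hypothetical)
    add(CHEM, "empiricalFormula", entry["formula"])
    add(CHEM, "inchi", entry["inchi"])
    for keyword in entry["keywords"]:
        add(DC, "subject", keyword)

    # Which datasets are present in the entry
    for dataset in entry["datasets"]:
        add(CHEM, "dataset", dataset)

    add(DC, "identifier", entry["doi"])
    add(DC, "rights", entry["rights"])
    add(DCTERMS, "bibliographicCitation", entry["citation"])
    return record
```

Calling `ET.tostring(build_record(SAMPLE_ENTRY), encoding="unicode")` gives a single XML string that a harvester could consume; the real element names should be taken from the schemas at the URL above.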
Metadata and Data Quality Control
Data manipulation toolbox
Associated Metadata
Value added
Format conversion
Harvesting & Aggregating: Google
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k
Harvesting: OAIster
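Aggregators such as OAIster collect metadata over the OAI-PMH protocol. As a rough sketch of what a harvest request looks like, assuming the repository exposes an OAI-PMH endpoint: the base URL below is a placeholder and `list_records_url` is a hypothetical helper, while the `verb`, `metadataPrefix`, `from` and `set` parameters are standard OAI-PMH request arguments.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint; the real address of a repository's OAI-PMH interface will differ.
EXAMPLE_BASE_URL = "https://repository.example.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for a metadata harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec    # restrict the harvest to one set, if the repository defines sets
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url(EXAMPLE_BASE_URL, from_date="2006-01-01")` returns `https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2006-01-01`.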
Linking and aggregating
Embedded in a science portal
eBank/eCrystals Future
• Full embedding in daily laboratory practice
• Roll out to other institutions
• Full support from host institution
• Community acceptance
• Federation of repositories
• Specialised aggregator services (Crystallography)
• Generic aggregator services (Chemistry / Science)
The Information Environment
Institutional Data Sources
Data and Information Loss
Repositories Supporting Laboratory Working Practice
• eBank-UK concentrates on dissemination of data compiled once a study is complete
• To fully assure the quality and accuracy of metadata, it is essential to capture it as it is generated
• Repository architecture has the potential to store data and metadata as they are generated
• Repository also has capability to manage data and provide report generation and analysis tools
Laboratory Repositories
Workflow Analysis
Researcher, Compound, Experiment type, Timestamp
Sample preparation
Data acquisition
Deposit current dataset
Analyse: Refine experiment?
Complete experiment deposit
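The workflow steps above can be read as a loop in which every acquired dataset is deposited before the decision to refine is taken. The following is a minimal sketch under that reading: the `Experiment` class, the `run_workflow` function and its callbacks are hypothetical names, and the assumption that "Refine experiment?" loops back to data acquisition (rather than to sample preparation) is not confirmed by the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Experiment:
    # Metadata recorded at the very start: researcher, compound, experiment type, timestamp
    researcher: str
    compound: str
    experiment_type: str
    started: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    deposits: List[str] = field(default_factory=list)
    complete: bool = False


def run_workflow(
    experiment: Experiment,
    acquire: Callable[[int], str],
    needs_refinement: Callable[[str], bool],
    max_rounds: int = 5,
) -> Experiment:
    """Acquire, deposit and analyse until no refinement is needed (or max_rounds is hit)."""
    for round_no in range(1, max_rounds + 1):
        dataset = acquire(round_no)           # data acquisition
        experiment.deposits.append(dataset)   # deposit current dataset
        if not needs_refinement(dataset):     # analyse: refine experiment?
            break
    experiment.complete = True                # complete experiment deposit
    return experiment
```

With `acquire=lambda n: f"dataset-{n}"` and a `needs_refinement` callback that is satisfied by the third dataset, the experiment finishes with three intermediate deposits, so the record of failed or superseded attempts is kept rather than lost.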
The R4L Repository
Deposit
Search
R4L Essentials
• Continual deposition and metadata capture from the very start of the experiment
• Prior Assertion Service – a legally sound protection of IPR
• Laboratory data management and analysis of heterogeneous datasets
• Production of reports – Individual experiment
• Production of reports – Study involving several experiments
• Panel of publishers to direct requirements for data publication
Something to take home!
• Open access to data does not harm or hinder publication of ideas and interpretation in a conventional fashion
• Open access to data, when linked to a publication containing interpretations, enhances the value of the publication
• Open access to ALL data underpinning a publication enables efficient assessment and reuse of that data
• Essential to embed repository deposition into ALL aspects of (laboratory) working procedures