animl: a new analytical data standard
AnIML: A New Analytical Data Standard
Stuart J. Chalk, Department of Chemistry, University of North
ACS Meeting Boston 2015
Data Formats
Goals for Data Handling
Introduction to AnIML
Sections of an AnIML file
AnIML Schemas and Files
AnIML Technique Definitions
Publishing Instrument Data
Referencing Data Elements
Calculations on Data
Future Developments
Conclusion
Overview
Native Data Formats
Proprietary formats
"Metadata" separated from result data
Metadata and data in multiple files
Metadata not available electronically
No way to link metadata with result data
Interchange Data Formats
Available for only a few techniques
ANDI: GC, LC, MS
JCAMP-DX: UV/Vis, IR, NMR, IMS
Fixed order, fixed syntax, immutable formats
Content limitations
Inconsistent implementations
Current Data Formats
Extensible
Easy to add new elements without breaking existing applications
Flexible
Useful for diverse needs: Interchange, Interconversion, Archiving...
Useable & Maintainable
Easy to create, use, adapt, maintain...
Readily available tools
Acceptable
Use standard mechanisms accepted by mainstream computing
Human readable
eXtensible Markup Language
Goals for Data Handling
Extensible Markup Language (XML) specification
Development under ASTM E13.15 ‘AnIML Task Group’
Data standard to:
“Develop an analytical data standard that can be used to store data from any analytical instrument”
Introduction to AnIML
http://animl.sourceforge.net
JCAMP-DX http://www.jcamp-dx.org/
ANDI (netCDF)
ThermoML (NIST)
SpectroML
Nguyen, A. D. T., Arslan, A., Travis, J., Smith, M., Schafer, R., & Kramer, G. W. (2004) ‘Molecular Spectrometry Data Interchange Applications for NIST's SpectroML’, JALA 9 (6), 346-354. doi:10.1016/j.jala.2004.09.001
Generalized Analytical Markup Language (GAML) http://www.gaml.org/
First official meeting March 23, 2003 @ ASTM
Brief History of AnIML
Broad scope
Different types of data
Size of data sets
Everyone calls ‘widget’ something different
Need for metadata dictionaries
One size does not fit all
Getting broad community involvement
Domain experts
User communities
What format?
Challenges for AnIML
AnIML XML elements are ‘pigeon holes’ for metadata
Minimal ‘required’ information
If it’s not required you don’t have to include the element
Extensible
Store raw data not processed data (except for FT techniques)
Support for legacy data
Record of changes
Validatable
Signable (digital sense)
AnIML Design Philosophy
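The "pigeon hole" idea, where optional elements are simply left out, can be sketched with Python's standard library. This is a simplified illustration, not a schema-valid AnIML file: ExperimentStepSet, ExperimentStep, Method, Author and Name appear in the XPath example later in these slides, while SampleSet, the `name` attributes and the helper `build_minimal_document` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_minimal_document(sample_name, step_name, author=None):
    """Build a small AnIML-like tree; optional metadata is omitted when absent."""
    root = ET.Element("AnIML")

    # One sample "pigeon hole" (SampleSet/Sample naming is an assumption).
    sample_set = ET.SubElement(root, "SampleSet")
    ET.SubElement(sample_set, "Sample", {"name": sample_name})

    # One experiment step holding the measurement.
    step_set = ET.SubElement(root, "ExperimentStepSet")
    step = ET.SubElement(step_set, "ExperimentStep", {"name": step_name})

    # Method/Author/Name is treated as optional: written only if supplied.
    if author is not None:
        method = ET.SubElement(step, "Method")
        author_el = ET.SubElement(method, "Author")
        ET.SubElement(author_el, "Name").text = author

    # Empty Result placeholder where the raw data would go.
    ET.SubElement(step, "Result", {"name": "raw data"})
    return root


minimal = build_minimal_document("Blank", "UV/Vis scan")
with_author = build_minimal_document("Blank", "UV/Vis scan", author="A. Analyst")
print(ET.tostring(minimal, encoding="unicode"))
```

In the first call no Method element is written at all, which is the point of the design rule: a consumer checks whether an element is present instead of expecting an empty placeholder.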
AnIML Schemas and Files
Sections of an AnIML File
AnIML Technique Definitions
AnIML - Sample
AnIML - Experiment
AnIML - Result
Data storage format
Not just for spectral data
Access Data Metadata
Manipulate using XSLT
Validate Signable
AnIML in an ELN
AnIML Viewer -> Jmol/JSpecView (http://jmol.sourceforge.net)
Publish Supplementary Data
Conversion of AnIML data to SVG using XSLT
Convert to Image File for Publication
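The slides perform this conversion with XSLT, and the stylesheet itself is not reproduced in this transcript. As a rough stand-in, the sketch below does the same kind of mapping in plain Python rather than XSLT: it scales a list of (x, y) points into an SVG polyline. The function name `series_to_svg`, the canvas size and the sample wavelength/absorbance values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def series_to_svg(points, width=400, height=200):
    """Return an SVG string with one polyline scaled to fit the canvas."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against zero ranges (a single point or a flat line).
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    coords = []
    for x, y in points:
        px = (x - x_min) / x_span * width
        # SVG's y axis points down, so flip the data axis.
        py = height - (y - y_min) / y_span * height
        coords.append(f"{px:.1f},{py:.1f}")

    svg = ET.Element(
        "svg",
        {"xmlns": "http://www.w3.org/2000/svg",
         "width": str(width), "height": str(height)},
    )
    ET.SubElement(
        svg, "polyline",
        {"points": " ".join(coords), "fill": "none", "stroke": "black"},
    )
    return ET.tostring(svg, encoding="unicode")


# Hypothetical wavelength (nm) / absorbance pairs.
spectrum = [(400, 0.10), (450, 0.35), (500, 0.80), (550, 0.30), (600, 0.05)]
svg_text = series_to_svg(spectrum)
print(svg_text)
```

Because SVG is itself XML, an XSLT stylesheet can produce the same output directly from the AnIML file; the Python version only makes the scaling step explicit.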
Expose an AnIML file at a URL
Optional: Define a DOI for that URL
Use XPath to reference a specific data point in an AnIML file
//ExperimentStepSet[1]/ExperimentStep[1]/Method[1]/Author[1]/Name[1]
Encode the XPath expression so it can be part of the URL
Open Instrument Data
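The referencing steps above can be sketched with Python's standard library. The base URL `https://example.org/data/spectrum.animl` and the query parameter name `xpath` are hypothetical, since the slides do not say how the encoded expression is attached to the URL. The small XML document is a simplified stand-in for an AnIML file, and because `xml.etree.ElementTree` supports only a subset of XPath, the expression is evaluated relative to the root with a leading `.`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

# XPath expression taken from the slide.
XPATH = "//ExperimentStepSet[1]/ExperimentStep[1]/Method[1]/Author[1]/Name[1]"

# Hypothetical location of a published AnIML file.
BASE_URL = "https://example.org/data/spectrum.animl"

# Simplified stand-in for an AnIML document (not schema-valid).
DOC = """
<AnIML>
  <ExperimentStepSet>
    <ExperimentStep name="UV/Vis scan">
      <Method>
        <Author><Name>A. Analyst</Name></Author>
      </Method>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
"""


def make_reference(base_url, xpath):
    """Percent-encode the XPath so it can travel inside a URL."""
    return f"{base_url}?xpath={quote(xpath, safe='')}"


def resolve(xml_text, xpath):
    """Return the text of the first element matched by the XPath."""
    root = ET.fromstring(xml_text)
    # ElementTree needs a relative path, so prefix '//' with '.'.
    match = root.find("." + xpath)
    return None if match is None else match.text


reference = make_reference(BASE_URL, XPATH)
decoded = unquote(reference.split("?xpath=", 1)[1])
print(reference)
print(resolve(DOC, decoded))
```

Passing `safe=''` to `quote` encodes the slashes and square brackets as well, so the expression cannot be mistaken for part of the URL path.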
Part of a Data Management Plan
Federal agencies are mandating data be made available
Long term archive format for research data
Referenceable if available online
Searchable with XQuery
Publish data processing algorithms (XSLT)
Future proof data -> conversion to future data formats
The Healthcare and Life Science (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
Data Descriptions: HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/
AnIML 1.0 Deliverables
Core Schema - Fundamental framework for AnIML documents
Technique Schema - Fundamental framework for technique definition and extension documents
AnIML Technique Definition Documents (ATDD) - Rules for content of specific technique file
AnIML Naming and Design Rules - Specifies rules about data element structure for interoperability
Standard Practice for AnIML Files - Describes how the specification is supposed to work
How to Create a Technique Definition Document - Guidelines for creating new technique definition documents
Other documents
Draft Requirements Specification for AnIML Version 1.0
Requirements and Goals of the Analytical Information Markup Language
AnIML Specification
http://animl.sourceforge.net
Documentation
Core specification
Technique and extension specification
Naming and design rules
Annotated technique definitions (UV/Vis, IR, 1D NMR, MS, Chromatography)
Balloting through ASTM (end of 2015)
Vendor, User, Developer extensions
Semantic extension of AnIML metadata items
Future Developments
Conclusion
AnIML is a great solution for storing instrument data
Human readable (UTF-8)
Platform neutral
Archivable
Validatable
AnIML leverages the extensive XML ecosystem of tools
Software engineers know XML
Phone: 904-620-1938
Skype: stuartchalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?