An On-line Collaborative Data Management System
DESCRIPTION
A presentation I prepared that was presented by Rob Simmonds at the Gateway Computing Environments 2010 Workshop in New Orleans on November 14, 2010. It provides an overview of a data management system that was developed for GeoChronos - an on-line collaborative platform for Earth observation scientists.
TRANSCRIPT
![Page 1: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/1.jpg)
An On-line Collaborative Data Management System
Roger Curry (1), Cameron Kiddle (1), Rob Simmonds (1) and Gilberto Z. Pastorello Jr. (2)
(1) Grid Research Centre, University of Calgary
(2) Centre for Earth Observation Science, University of Alberta
![Page 2: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/2.jpg)
Outline
- Data Challenges
- Related Work
- Data Management System
- Use Case: GeoChronos
- Summary and Future Work
GCE 2010 Nov. 14, 2010 2
![Page 3: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/3.jpg)
Data Challenges - I
- Data Acquisition
  - Much scientific data stored on off-line media
    - Cumbersome and time-consuming to access
  - Making data available on-line difficult
    - Insufficient storage and bandwidth
- Sharing of Data
  - Lack of willingness to share data
  - Proprietary data: need for controlled access
![Page 4: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/4.jpg)
Data Challenges - II
- Usability of Data
  - Insufficient metadata to describe data
  - Some domains have various metadata standards, but many lack them, so many scientists use their own metadata format
- Finding Data
  - Difficult to find the data that you need
  - Different data organized / stored differently
  - Tools to browse, search, visualize data often lacking
![Page 5: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/5.jpg)
Related Work - I
- Content Management Systems
  - e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
  - Offer a rich set of features but do not provide:
    - Meaningful support for specific data formats
    - Efficient association of metadata and ancillary files with data sets
    - Access to a variety of data processing tools
    - Uniform handling of outputs from processing tools
- Spectral Libraries
  - e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
  - Are available on-line but lack:
    - Ability to dynamically restructure metadata for browsing
    - Collaboration features enabled by social networking
![Page 6: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/6.jpg)
Related Work - II
- Spectral Library Tools
  - e.g., DLR-DFD Spectral Archive, SPECCHIO
  - Flexible in creating / handling metadata but:
    - Have a fixed metadata schema, so do not support new metadata needs
- Data repositories for other domains
  - e.g., Astrophysics Data System, FLUXNET, European Bioinformatics Institute (EBI) Databases
  - Offer a wide range of functionality but:
    - Primarily focus on data that is already validated and structured
    - Do not handle preliminary, intermediate, untested data (i.e., research in progress)
- Digital Libraries
  - e.g., Planetary Data Systems, NCore, SciPort
  - Have flexible functionality but:
    - Most focus on well-defined digital artefacts
    - Limited in handling collaboration on evolving data, metadata and schemas
![Page 7: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/7.jpg)
Data Management System - Overview
- Supports the following functionality:
  - On-line access to data
  - Enables scientists to share data while maintaining control of who sees it
  - Ability to add and edit metadata while working with multiple schemas
  - Collaboratively create new schemas to facilitate consistent/accurate recording of metadata
  - Dynamically restructure the way data is browsed
![Page 8: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/8.jpg)
Data Management System - Framework
- User & Data:
  - User acquires data from sensor and uploads to portal
  - Direct acquisition of data also possible
- Elgg Portal:
  - Built on top of Elgg, an open source social networking platform
  - Fine grained access control
  - Flexible data model
- Data Storage:
  - Currently local NFS storage
  - Working on distributed iRODS based system
- Data Ingestion Service:
  - Creates records, parses metadata, establishes ancillary relationships
  - Deployed on cloud-based Condor pool
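The three ingestion steps named above (create a record, parse metadata, establish ancillary relationships) might be sketched roughly as follows. This is an illustration only, not the GeoChronos service: the `key: value` header format, the `parse_header` and `ingest` functions, and the rule that ancillary files share the data file's stem are all assumptions.

```python
from pathlib import PurePath


def parse_header(text):
    """Parse leading 'key: value' lines into a metadata dict (assumed format)."""
    metadata = {}
    for line in text.splitlines():
        if ":" not in line:
            break  # the first non-header line ends the metadata block
        key, value = line.split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


def ingest(filename, text, other_files):
    """Create a record dict for one uploaded file (hypothetical structure)."""
    stem = PurePath(filename).stem
    # Assumption: ancillary files share the data file's stem (e.g. a photo).
    ancillary = [f for f in other_files
                 if f != filename and PurePath(f).stem == stem]
    return {"file": filename,
            "metadata": parse_header(text),
            "ancillary": ancillary}


record = ingest(
    "leaf01.txt",
    "species: Populus tremuloides\nsensor: UNISPEC\n400 0.05\n401 0.06",
    ["leaf01.txt", "leaf01.jpg", "leaf02.txt"],
)
print(record)
```

Because each call to `ingest` depends only on its own inputs, files could be processed independently, which is one reason a job pool such as Condor is a natural fit for this kind of work.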
![Page 9: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/9.jpg)
Data Management System – Data Model
(Source: http://docs.Elgg.org/wiki/File:Elgg_data_model.png)
- Arbitrary metadata can be assigned to any entity
- Annotations allow users to comment on entities not owned by them
- Data management system adds three new types of ElggObjects:
  - Schema
  - Collection
  - Record
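The relationships described above can be pictured with a small Python sketch. It is not the Elgg API (Elgg is a PHP platform with its own classes); the `Entity` class, its fields, and the `annotate` method are hypothetical names chosen to mirror the slide: arbitrary metadata on any entity, annotations from non-owners, and three added object types.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Illustrative stand-in for an Elgg entity; not the real Elgg API."""
    owner: str
    metadata: dict = field(default_factory=dict)      # arbitrary key/value pairs
    annotations: list = field(default_factory=list)   # (user, comment) pairs

    def annotate(self, user, comment):
        # Any user may annotate, including users who do not own the entity.
        self.annotations.append((user, comment))


# The three object types the data management system adds.
@dataclass
class Schema(Entity):
    namespace: str = ""


@dataclass
class Collection(Entity):
    name: str = ""


@dataclass
class Record(Entity):
    filename: str = ""


record = Record(owner="alice", filename="leaf01.txt")
record.metadata["species"] = "Populus tremuloides"
record.annotate("bob", "Was this measured in full sun?")
print(record)
```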
![Page 10: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/10.jpg)
Data Management System - Schemas
- Create schemas
  - Custom or standards-based (e.g., Dublin Core)
  - Individually or as a collaborative team
- Schemas consist of
  - Namespace
  - Description
  - Read/write access permissions
  - Series of metadata keys
- Metadata keys consist of
  - Name
  - Description
  - Type (text, latlong, ancillary)
  - Optionality: required, recommended, optional
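As a rough illustration of how a schema with per-key optionality could be used to check a record's metadata, consider the sketch below. The class names, the `check` method, the example keys, and the `namespace:name` form for qualified keys are assumptions made for the example; access permissions are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class MetadataKey:
    name: str
    description: str
    type: str          # "text", "latlong" or "ancillary"
    optionality: str   # "required", "recommended" or "optional"


@dataclass
class Schema:
    namespace: str
    description: str
    keys: list

    def check(self, metadata):
        """Return (missing required keys, missing recommended keys)."""
        missing = {"required": [], "recommended": []}
        for key in self.keys:
            qualified = f"{self.namespace}:{key.name}"
            # Optional keys are never reported as missing.
            if qualified not in metadata and key.optionality in missing:
                missing[key.optionality].append(qualified)
        return missing["required"], missing["recommended"]


# Hypothetical schema; the key names are examples, not GeoChronos definitions.
field_schema = Schema(
    namespace="field",
    description="Field spectrometer measurements",
    keys=[
        MetadataKey("species", "Sampled species", "text", "required"),
        MetadataKey("location", "Sample site", "latlong", "recommended"),
        MetadataKey("photo", "Photo of the sample", "ancillary", "optional"),
    ],
)

required, recommended = field_schema.check({"field:species": "Medicago sativa"})
print(required, recommended)
```

Qualifying keys by namespace is one simple way to let a record carry metadata from several schemas at once without name clashes.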
![Page 11: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/11.jpg)
Data Management System - Collections
- Group of related data
  - e.g., spectral library, set of satellite data
- Collection consists of
  - Name, description, read/write access permissions, metadata, records
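A collection with read/write permissions could be modelled along these lines. This is a hedged sketch: the `Collection` class, its `readers`/`writers` sets, and the rule that write access implies read access are assumptions for illustration, not a description of how the Elgg-based portal stores permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    description: str
    owner: str
    readers: set = field(default_factory=set)   # users granted read access
    writers: set = field(default_factory=set)   # users granted write access
    metadata: dict = field(default_factory=dict)
    records: list = field(default_factory=list)

    def can_read(self, user):
        # Assumption: write access implies read access; the owner has both.
        return user == self.owner or user in self.readers or user in self.writers

    def can_write(self, user):
        return user == self.owner or user in self.writers

    def add_record(self, user, record):
        if not self.can_write(user):
            raise PermissionError(f"{user} may not write to {self.name}")
        self.records.append(record)


library = Collection("Lichen samples", "Field spectra of lichens",
                     owner="alice", readers={"bob"})
library.add_record("alice", "lichen01.txt")
print(library.can_read("bob"), library.can_write("bob"))
```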
![Page 12: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/12.jpg)
Data Management System - Records
- Atomic unit of data management system
  - Usually represents a single file, but does not need to be associated with a file
- Tabbed interface for viewing:
  - Spectral plot, metadata, ancillary data, map, comments
  - Custom tabs based on data type
![Page 13: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/13.jpg)
Data Management System – Virtual Directory Structure
- Dynamic restructuring of data for browsing purposes
- Folders based on metadata keys/values
- User can customize the metadata keys used to establish the directory hierarchy
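The idea of a virtual directory structure can be sketched as grouping records by an ordered list of metadata keys, one folder level per key. The `build_tree` function, the record layout, and the `(unspecified)` placeholder folder are assumptions made for this example rather than details taken from the system.

```python
def build_tree(records, keys):
    """Group records into nested folders, one folder level per metadata key."""
    if not keys:
        return [r["name"] for r in records]
    first, rest = keys[0], keys[1:]
    folders = {}
    for r in records:
        # Records lacking the key go into a placeholder folder (an assumption).
        value = r["metadata"].get(first, "(unspecified)")
        folders.setdefault(value, []).append(r)
    return {value: build_tree(group, rest) for value, group in folders.items()}


records = [
    {"name": "a.txt", "metadata": {"species": "alfalfa", "year": "2009"}},
    {"name": "b.txt", "metadata": {"species": "barley", "year": "2009"}},
    {"name": "c.txt", "metadata": {"species": "alfalfa", "year": "2010"}},
]

# The same records, browsed two different ways.
by_species = build_tree(records, ["species", "year"])
by_year = build_tree(records, ["year", "species"])
print(by_species)
print(by_year)
```

Nothing is moved or copied when the key order changes; the hierarchy is recomputed from metadata, which is what makes the restructuring dynamic.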
![Page 14: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/14.jpg)
Use Case - GeoChronos
(http://geochronos.org/)
![Page 15: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/15.jpg)
GeoChronos - Overview
- An on-line platform
- For:
  - Earth Observation Scientists
- Facilitating:
  - Collaboration between scientists
  - Data access, management and sharing
  - Application access, management and sharing
- Leveraging:
  - Web 2.0 and social networking technologies
  - Cloud computing technologies
- Funded by:
  - CANARIE - Network Enabled Platform (NEP-1) program
  - Cybera
![Page 16: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/16.jpg)
GeoChronos - Project Team
Principal Investigators
- Dr. Arturo Sanchez-Azofeifa, University of Alberta
- Dr. John Gamon, University of Alberta
- Dr. Benoit Rivard, University of Alberta
- Dr. Rob Simmonds, University of Calgary

Project Coordination, Platform Development, Domain Scientists
![Page 17: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/17.jpg)
GeoChronos - Virtual Organization
![Page 18: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/18.jpg)
GeoChronos - Spectral Libraries
- Libraries created
  - Ingested some existing on-line libraries
    - USGS, ASTER, Vegetation Spectral Library (VSL)
    - Many enhanced features as part of GeoChronos Spectral Library module: improved browsing, dynamic plotting, mapping, annotations, ...
  - Domain scientists have contributed libraries
    - Rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples
- Data formats / parsers supported
  - ENVI, UNISPEC, ASD, several ASCII formats
- Schemas incorporated
  - Library specific: USGS, ASTER, VSL, ...
  - Sensor/Format specific: UNISPEC, ENVI, ...
  - Other standards: Dublin Core
- Currently hosting (including MODIS data)
  - 10+ schemas, 20+ collections (libraries), 20,000+ records
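Supporting several data formats through per-format parsers is often done with a small registry that maps a format name to a parsing function. The sketch below shows that pattern under stated assumptions: the `PARSERS` registry, the `parser` decorator, the `ascii-two-column` format name, and its "wavelength reflectance" line layout are all hypothetical, and no real ENVI, UNISPEC or ASD parsing is attempted.

```python
PARSERS = {}


def parser(format_name):
    """Decorator that registers a parsing function under a format name."""
    def register(func):
        PARSERS[format_name] = func
        return func
    return register


@parser("ascii-two-column")
def parse_two_column(text):
    # Assumed layout: one "wavelength reflectance" pair per line.
    pairs = []
    for line in text.splitlines():
        if line.strip():
            wavelength, reflectance = line.split()
            pairs.append((float(wavelength), float(reflectance)))
    return pairs


def parse(format_name, text):
    """Dispatch to the parser registered for the given format."""
    if format_name not in PARSERS:
        raise ValueError(f"no parser registered for {format_name!r}")
    return PARSERS[format_name](text)


spectrum = parse("ascii-two-column", "400 0.05\n401 0.06\n")
print(spectrum)
```

With this shape, adding support for a new sensor format means registering one more function, leaving the rest of the ingestion path unchanged.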
![Page 19: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/19.jpg)
GeoChronos - MODIS Satellite Data
- Developed automated workflow service for mosaicing, subsetting, reprojecting and masking MODIS satellite data
- Significantly reduces time that scientists have spent manually doing such workflows
- Data management system used to store raw MODIS satellite data and data products derived from the workflow
- Parsers/schemas specific to MODIS data have been added to system
- User provided with same powerful interface as Spectral Libraries for browsing, accessing and viewing data
![Page 20: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/20.jpg)
GeoChronos - User Feedback
- Have developed data management system in an interactive, iterative fashion
- Domain scientists on project have provided much guidance, testing and feedback
- Have customized, enhanced the data management system based on feedback received
![Page 21: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/21.jpg)
Summary
- Identified data related challenges facing scientists
- Discussed some related efforts and shortcomings of these approaches
- Presented an on-line collaborative data management system addressing many data challenges
- Showed example usage of the data management system by GeoChronos
![Page 22: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/22.jpg)
Next Steps
- Currently have a single local data repository
  - Working on extending data management system to work with distributed data repositories using iRODS
- Currently have powerful browsing functionality
  - Need to add search functionality across collections and based on metadata values
- Currently support custom metadata schemas
  - Plan to make use of Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
- Currently work with spectral and MODIS satellite data
  - Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data, phenology tower data
![Page 23: An On-line Collaborative Data Management System](https://reader035.vdocuments.us/reader035/viewer/2022062513/5578fbb2d8b42a675b8b4b4c/html5/thumbnails/23.jpg)
Contact Information
http://geochronos.org/
[email protected]
http://grid.ucalgary.ca/
http://ceos.ualberta.ca/
http://www.cybera.ca/