Post on 30-Dec-2015
Cyberinfrastructure Overview
Core Cyberinfrastructure Team
Matthew B. Jones
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California, Santa Barbara
DataONE Kick-off Meeting October 20-22, 2009
Cyberinfrastructure Objectives
Support synthesis in earth observation sciences
Support full lifecycle of scientific process
  Data acquisition and management
  Data preservation
  Data discovery and access
  Data integration
  Data analysis and visualization
  Process management and preservation
Evolve to accommodate technology change
Design goals
Distributed management at Member Nodes
Replication and caching for preservation and performance
Software must provide benefits for scientists today
Evolution of software and standards
Support and adapt existing community software efforts
Emphasize Free and Open Source Software
What data are in scope?
Biological, e.g., Gene, Organism, Population, Species, Community, Biome, Ecosystem
Environmental, e.g., Atmospheric, Chemical, Ecological, Hydrological, Oceanographic, Physical
Social, e.g., Land use, human population
Economic, e.g., trade, ecosystem services, resource extraction
Who are the providers and consumers?
Providers
  Academic and Agency Scientists
  Research networks
  Environmental observatories
  Citizen groups
  Students
Consumers
  Academic and Agency Scientists
  Research networks
  Environmental observatories
  Citizen groups
  Students
Same people, different roles
driving needs
Metadata and data integration
Every community has
  multiple metadata schemas: Biological Data Profile, Darwin Core, Dublin Core, Ecological Metadata Language, Open GIS schemas
  multiple data formats: ASCII, NetCDF, HDF, GeoTiff, ...
Some communities have general and domain-specific ontologies
Addressing this heterogeneity is critical
Integrated analysis of datasets requires
  Syntax mapping
  Semantics mapping
  Sophisticated integration tools that do not exist
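To make "syntax mapping" concrete, here is a minimal Python sketch of a field-name crosswalk between two of the schemas named on this slide. The field names (creator, date, recordedBy, eventDate) are Dublin Core and Darwin Core terms. The common keys, the CROSSWALKS table, and the to_common helper are assumptions made up for this illustration and are not part of any DataONE specification. The sketch only renames fields. Deciding whether two fields actually mean the same thing is the semantics-mapping problem, which it does not attempt.

```python
# Hypothetical crosswalk: schema-specific field name -> common key.
# The common keys ("creator", "date") are invented for this sketch.
CROSSWALKS = {
    "dublin_core": {"creator": "creator", "date": "date"},
    "darwin_core": {"recordedBy": "creator", "eventDate": "date"},
}


def to_common(record, schema):
    """Rename the fields of `record` using the crosswalk for `schema`.

    Fields without a mapping are kept under "unmapped" so that nothing
    is silently dropped.
    """
    mapping = CROSSWALKS[schema]
    common = {"unmapped": {}}
    for field_name, value in record.items():
        if field_name in mapping:
            common[mapping[field_name]] = value
        else:
            common["unmapped"][field_name] = value
    return common


# Two records describing the same observation in different schemas.
dc = to_common(
    {"creator": "A. Researcher", "date": "2009-10-20", "title": "Kelp survey"},
    "dublin_core",
)
dwc = to_common(
    {"recordedBy": "A. Researcher", "eventDate": "2009-10-20",
     "scientificName": "Macrocystis pyrifera"},
    "darwin_core",
)
```

After mapping, both records can be compared on the shared keys, while schema-specific fields such as scientificName remain available under "unmapped".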
Overview of Components
Member Nodes
  Earth observing institutions, projects, and networks
  Provide resources for their own data and replicated data
  Focused on serving their constituencies
Coordinating Nodes
  Provide network-wide services to Member Nodes
  Geographically replicated services
Investigator Toolkit
  Tools for researchers to access DataNetONE
  General purpose and discipline-specific tools
  Adapt existing tools where possible
Node Design
Member nodes
  Geographically distributed nodes
  Authoritative repository for many datasets
  Diversity tolerant (less tightly coordinated)
  Freedom to try new tools, methods, and leapfrog forward
  Partial replication
Coordinating nodes
  Completely replicated
  Complete metadata catalogue
  Data subset (initially a large fraction)
  Tightly coordinated, stable service platform
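The split between member nodes (which hold data, partially replicated) and coordinating nodes (which hold a complete catalogue) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the catalog layout, and the replicate and under_replicated helpers are hypothetical and do not describe the actual DataONE design.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Holds data objects; the authoritative copy for some of them."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> data bytes


@dataclass
class CoordinatingNode:
    """Holds the complete catalogue of which member nodes hold each object."""
    catalog: dict = field(default_factory=dict)  # identifier -> set of node names

    def register(self, node, identifier):
        self.catalog.setdefault(identifier, set()).add(node.name)

    def under_replicated(self, min_copies):
        """Identifiers held by fewer than `min_copies` member nodes."""
        return sorted(
            ident for ident, holders in self.catalog.items()
            if len(holders) < min_copies
        )


def replicate(coordinator, source, target, identifier):
    """Copy one object to another member node and record the new holder."""
    target.objects[identifier] = source.objects[identifier]
    coordinator.register(target, identifier)


# Hypothetical node names and identifiers.
node_a = MemberNode("node-a", {"obj-1": b"x", "obj-2": b"y"})
node_b = MemberNode("node-b")
cn = CoordinatingNode()
for ident in node_a.objects:
    cn.register(node_a, ident)

before = cn.under_replicated(min_copies=2)
replicate(cn, node_a, node_b, "obj-1")
after = cn.under_replicated(min_copies=2)
```

Here node-b ends up with a partial replica (one of two objects), while the coordinating node's catalogue still describes every object.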
DataONE Service Interface
Federated Identity and Authorization Services
Object Management Services
Discovery and Usage Services
Preservation Services
Network Services
Service Interface for Interoperability
Create common access methods for different clients
Create a mechanism to map heterogeneous services
Provide an interface between nodes and service requests
Simplicity of construction
  Lightweight
  Ease of implementation
  Implementations are opaque to service consumers
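The idea of a common interface with opaque implementations can be sketched with an abstract base class in Python. Everything named here is an assumption for illustration: ObjectService, its get and search methods, and the two node classes are hypothetical and are not the DataONE Service Interface itself.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ObjectService(ABC):
    """Hypothetical common access interface; method names are illustrative."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        """Return the bytes of the object with this identifier."""

    @abstractmethod
    def search(self, term: str) -> List[str]:
        """Return identifiers that contain `term`."""


class InMemoryNode(ObjectService):
    """A node that stores objects directly under their identifiers."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = dict(objects)

    def get(self, identifier):
        return self._objects[identifier]

    def search(self, term):
        return sorted(k for k in self._objects if term in k)


class PrefixedNode(ObjectService):
    """Stand-in for a heterogeneous service whose native keys carry a prefix."""

    def __init__(self, store: Dict[str, bytes], prefix: str):
        self._store = dict(store)
        self._prefix = prefix

    def get(self, identifier):
        return self._store[self._prefix + identifier]

    def search(self, term):
        n = len(self._prefix)
        return sorted(
            k[n:] for k in self._store
            if k.startswith(self._prefix) and term in k[n:]
        )


def fetch_all(service: ObjectService, term: str) -> Dict[str, bytes]:
    """Client code: depends only on the interface, not on the node type."""
    return {ident: service.get(ident) for ident in service.search(term)}


node_a = InMemoryNode({"kelp.csv": b"1,2"})
node_b = PrefixedNode({"legacy:kelp.csv": b"1,2"}, prefix="legacy:")
results_a = fetch_all(node_a, "kelp")
results_b = fetch_all(node_b, "kelp")
```

The client function fetch_all gets the same result from both nodes even though they store objects differently, which is the sense in which implementations stay opaque to service consumers.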
What is the Investigator Toolkit?
Suite of software tools for researchers
  Emphasize Free and Open Source, but support commercial
  General analysis frameworks (e.g., R, MATLAB)
  Domain-specific tools (e.g., GARP, Phylocom)
  Organized using scientific workflows
Supports the scientific lifecycle
  Data management and preservation
  Data query and access
  Data analysis and visualization
  Process management and preservation
Communication via the Service Interface
Toolkit Functions
Supports the scientific lifecycle
Data management and preservation
Data query and access
Data analysis and visualization
Process management and preservation
Portal software
Who will build the Toolkit?
Many open source efforts already exist
  Data management: MATT, uDig, Specify
  Analysis and modeling: R, Octave
  Workflow systems: Kepler, Taverna, Triana, Pegasus
  Grid systems: Condor, Globus, BOINC
  Data and workflow portals: VegBank, myExperiment
Commercial tools are important too
  MATLAB, SAS, ArcGIS
DataONE: help communities build their own tools
  Integrate, interoperate, stabilize
  Create libraries for the DataONE Service Interface
Data Management and Preservation
Data management functions
  Data creation, input, editing, versioning
  Metadata creation, editing, annotation
  Local data storage, indexing, searching
Example applications
  Morpho metadata editor
  Mercury metadata editor
  MATT metadata editor
  ESRI ArcCatalog
  Metacat Data Server: lab group data management
Data Analysis and Visualization
Need community-standard analysis frameworks
  R, Octave, GRASS
  S-PLUS, MATLAB, ArcGIS
Thousands of domain-specific analytical tools exist
  GARP: Genetic Algorithm for Rule-set Production
  BLAST search
  ClustalW
  Phylocom
  Mesquite
Workflow system capabilities
Workflow systems:
  Enable communication
  Support preservation of scientific processes
  Enable component re-use
  Allow integration across many software frameworks
Example workflow engines
  Kepler, Taverna, Pegasus, Triana
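Two of the capabilities listed above, component re-use and preservation of the process, can be illustrated with a toy workflow in Python. This is a sketch under stated assumptions and not how Kepler, Taverna, Pegasus, or Triana are implemented. The Workflow class and the step functions are hypothetical.

```python
import statistics


class Workflow:
    """Toy workflow: an ordered list of named steps plus a log of each run."""

    def __init__(self, name):
        self.name = name
        self.steps = []  # list of (step name, callable)
        self.log = []    # one entry per executed step, rebuilt on each run

    def add(self, step_name, func):
        self.steps.append((step_name, func))
        return self  # returning self allows chained add() calls

    def run(self, data):
        self.log = []
        for step_name, func in self.steps:
            data = func(data)
            # Recording each intermediate result preserves the process,
            # not just the final answer.
            self.log.append({"step": step_name, "output": repr(data)})
        return data


def drop_missing(values):
    """Reusable component: remove missing (None) observations."""
    return [v for v in values if v is not None]


# The same drop_missing component is reused in two different workflows.
mean_wf = Workflow("mean").add("drop_missing", drop_missing).add("mean", statistics.mean)
max_wf = Workflow("max").add("drop_missing", drop_missing).add("max", max)

mean_result = mean_wf.run([10.0, None, 14.0])
max_result = max_wf.run([10.0, None, 14.0])
```

After a run, mean_wf.log holds one record per step, which is the kind of process record a workflow system can archive alongside the data.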
Community tools have been successful
Investigator Toolkit will build upon these successes
  Adapt tools to work together with Service Interface
  Support Free and Open Source Software
Supported tools will build over time
DataONE discovery portals
Data discovery portal at Coordinating Nodes
Workflow discovery portal at Coordinating Nodes
Other portals as needed
Outstanding issues
Data Discovery, Access, and Availability
Federated Identity, Authentication, and Access Control
Metadata and data standards
Evolution of specifications
Data Integration and Interoperability
Data and Metadata preservation, longevity, and migration
Versioning and identifiers
Scalability
NIH Syndrome
Lots of:
  metadata catalogs and specifications
  data standards
  service definitions
  architectures and protocols
Many communities of practice
  GEOSS, KNB, CUAHSI, NBII, GBIF, TDWG, AmeriFlux, EOS, OGC, W3C, LTER, NEON, OOI, and on and on and on...
DataONE cannot just be Community n+1
  Easy to get entrained in the details
  Have to save people work
  Have to engage groups early and earnestly