Integrating CUAHSI HIS Cyberinfrastructure with Open Source DataTurbine Streaming Data Middleware
Thomas Whitenack, Sameer Tilak, Ilya Zaslavsky, Tony Fountain
San Diego Supercomputer Center, UCSD, San Diego, CA
Abstract
The CUAHSI HIS (Hydrologic Information System) project has focused on consistent management of observations data available from government agencies as well as data published by academic investigators. Management of real-time data is an important component of the CUAHSI HIS project. The Open Source DataTurbine Initiative is an NSF-supported effort that focuses on providing open source streaming data middleware to multiple environmental observation projects. The Open Source DataTurbine team has been collaborating with the CUAHSI HIS team in the development of applications that demonstrate the utility of DataTurbine for managing streaming data from hydrologic stations.
In this project, DataTurbine is used to acquire and stream data from a range of sensors connected to a Campbell Datalogger into a database system. The schema of this database follows the CUAHSI Observations Data Model (ODM). A Java application (a DataTurbine sink program) was developed to retrieve real-time data from the DataTurbine server and populate the ODM's DataValues table. The streaming data were configured for access via the CUAHSI HIS GetValues service, making them available for querying from many CUAHSI HIS client applications.
The DataTurbine middleware is capable of connecting with multiple types of sensors from different vendors and exposing streaming data in a uniform way. The ability to efficiently explore and integrate data streams from different distributed real-time and archival sources, to create a comprehensive dynamic portrait of the state of the environment in a given area, is an important component of the vision for environmental observatories.
About CUAHSI HIS
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities in the US and abroad. As part of its mission, CUAHSI supports the development of cyberinfrastructure for the hydrologic sciences. The CUAHSI HIS (Hydrologic Information System) project is a multi-year, multi-institution effort focused on consistent management of observations data available from several federal agencies (USGS, EPA, USDA, NOAA, etc.) as well as data published by individual investigators.
CUAHSI HIS develops a service-oriented architecture for hydrologic research and education, to enable publication, discovery, retrieval, analysis and integration of hydrologic data. The project team has defined a common information model for organizing hydrologic observation data, designed a common exchange protocol (Water Markup Language) and developed a collection of SOAP web services (WaterOneFlow services) that provide uniform access to different federal, state and local hydrologic data repositories.
This system is now implemented as a collection of Hydrologic Information Servers deployed at NSF-supported Hydrologic Observatory test beds.
DataTurbine Capabilities
Conclusion
The Open Source DataTurbine can be seamlessly integrated in CUAHSI HIS cyberinfrastructure, providing an efficient, scalable and fault-tolerant solution for streaming observations data to HIS components. Further work will include streaming large volumes of observations data from real-time stations maintained by government agencies, managing multimedia streams, and integration of the Real-time Data Viewer with CUAHSI HIS online clients such as Hydroseek and DASH.
DataTurbine in Environmental Observing Projects
CUAHSI HIS Service Oriented Architecture: General Outline
What is DataTurbine
• Solution for accessing both streaming and static data, from different vendor systems, via a common interface
• Released under the Apache 2.0 Open Source License
• Provides high-performance data streaming: 10+ MB/sec, 1000 frames/sec on PCs
• Supported by NASA SBIR, 15 years in development
• NSF invested in supporting open-source development of the DataTurbine: SDCI project, 2007-09
• Additional support from the Moore Foundation, and from multiple observatory projects
• It is one of just a handful of comprehensive solutions for managing streaming data
Links
CUAHSI HIS: http://www.cuahsi.org/his/
HIS Wiki @ SDSC: http://river.sdsc.edu/wiki
Open Source DataTurbine: http://www.dataturbine.org
Getting RBNB DataTurbine Software: http://code.google.com/p/dataturbine
Real-time Data Viewer (RDV): http://code.google.com/p/rdv/
Integrating DataTurbine in HIS
Existing Deployments
[Figure: CUAHSI HIS service-oriented architecture diagram. Labeled components: Test bed HIS Servers; Central HIS servers; ArcGIS; Matlab; IDL; MapWindow; Excel; Programming (Fortran, C, VB); Desktop clients; Customizable web interface (DASH); HTML - XML; WSDL - SOAP; Modeling (OpenMI); Global search (Hydroseek); WaterOneFlow Web Services, WaterML; HIS Lite Servers; External data providers; Deployment to test beds; Other popular online clients; ODM DataLoader; Streaming Data Loading; Ontology tagging (Hydrotagger); WSDL and ODM registration; Data publishing; ODMTools; Server config tools]
• Can be configured to feed data to several applications, including remote servers
• Supports multiple types of streams: real-time monitoring, video and multimedia, telemetry, instant messages, etc.
• Can be accessed via URLs (e.g. can stream to browser); one can also write to the server via browser
• Can be mapped as a network drive (e.g. as a “Web Folder” opened in IE), built-in support in Windows, Mac OS X, Linux, several other systems
• Has a programmer API and a developer community. .NET support is available (though Java is used more often)
• Has direct connection with Matlab; M-files are provided with standard distribution
• Has several standard applications: rbnbAdmin, rbnbSource (signal generator), rbnbPlot, rbnbChat
• Scalable: DataTurbine servers can be interconnected to handle large streams
• Can manipulate the streams: fast forward or slow motion playback (TiVo-like)
• Secure access to DataTurbine Server, based on user credentials (under development)
DataTurbine server system requirements
• A computer running Linux, Windows, Unix, OS X or similar, with a working JVM version 1.1 or later. Different brands of JVMs should be fine (e.g. Sun, IBM, JRockit, etc.)
• Enough memory to hold the data you want, and enough disk to contain the archive you want
• A network connection that's fast enough and reliable enough for your needs
• Apple minis have been tested as minimal servers. With 2 GB of memory, they're fast and cheap. However, since all that matters is the JVM, you can use whatever you prefer.
• In general, more memory is good. A 32-bit JVM can use up to 3.5 GB, and with a 64-bit JVM you can address as much as you can afford.
• If you have extreme needs, consider a 64-bit Sun box. Good results have been obtained with their Niagara-architecture T2000.
• Note: 3 DataTurbine servers are available for public use, each with 4 GB RAM and shared 7.5 TB archive space on RAID5 – see www.dataturbine.org
Step 1: Download and install a DataTurbine server
• Download and double-click the jar file to install, or use a public DataTurbine server
Step 2: Connect the DataTurbine with sensors
• Configure the sensors
• Following the RBNB Simple API (SAPI), create a “source” program to insert stream data into the DataTurbine server:
  - SAPI documentation and examples of source programs are included in the download
  - for many sensors and vendors, source programs already exist (e.g. for LoggerNet)
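As an illustration of what a source program does before it hands data to the server, here is a minimal Java sketch that turns one line of datalogger output into a timestamp plus one value per channel. The record layout (a quoted timestamp followed by comma-separated numbers), the channel names, and the class `LoggerRecordParser` are assumptions made for this example, not part of the DataTurbine distribution. The actual hand-off to the server through SAPI is deliberately left out; the SAPI documentation included in the download describes it.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pre-processing step of a DataTurbine "source" program. */
final class LoggerRecordParser {

    // Assumed timestamp layout of the logger output; adjust to the real file format.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** One parsed record: seconds since the Unix epoch plus one value per channel. */
    static final class Sample {
        final double epochSeconds;
        final Map<String, Double> values;

        Sample(double epochSeconds, Map<String, Double> values) {
            this.epochSeconds = epochSeconds;
            this.values = values;
        }
    }

    /**
     * Parses a line such as "2009-06-01 12:00:00",18.4,61.0 into a Sample.
     * channelNames lists the channels in the order the values appear;
     * loggerOffset is the fixed UTC offset of the logger clock.
     */
    static Sample parse(String line, String[] channelNames, ZoneOffset loggerOffset) {
        String[] fields = line.split(",");
        if (fields.length != channelNames.length + 1) {
            throw new IllegalArgumentException(
                    "Expected " + (channelNames.length + 1) + " fields, got " + fields.length);
        }
        String stamp = fields[0].trim().replace("\"", "");
        double epochSeconds = LocalDateTime.parse(stamp, STAMP).toEpochSecond(loggerOffset);

        // Keep channels in declaration order so they map predictably to server channels.
        Map<String, Double> values = new LinkedHashMap<>();
        for (int i = 0; i < channelNames.length; i++) {
            values.put(channelNames[i], Double.parseDouble(fields[i + 1].trim()));
        }
        return new Sample(epochSeconds, values);
    }

    public static void main(String[] args) {
        String[] channels = {"AirTemp_C", "RelHumidity_pct"}; // hypothetical channel names
        Sample s = parse("\"2009-06-01 12:00:00\",18.4,61.0", channels, ZoneOffset.ofHours(-8));
        System.out.println(s.epochSeconds + " " + s.values);
    }
}
```

In a real source program each parsed sample would then be written to the server, with the channel names registered once at start-up.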
Step 3: Make sure the DataTurbine server receives the streams
• Install Real-time Data Viewer (RDV), a common DataTurbine client
• Write a small JNLP (Java Network Launch Protocol) file defining command line parameters and JVM options for launching RDV. Examples: http://it.nees.org/software/rdv/RDV.jnlp, http://geo.sdsc.edu/jnlp/RDV.jnlp
• Launch RDV and verify the data streams
Step 4: Configure RBNB output to CUAHSI ODM
• Set up an ODM database instance and populate it with metadata (using any of the data loading tools developed in HIS: ODM Data Loader, SDL, SSIS scripts)
• Configure a Java program ‘stream2db’ (an RBNB DataTurbine sink program) to automatically insert data values into the ODM database when new data arrives in the RBNB server (mapping sensor channels into table and column names in ODM)
• Open the ODM instance in SQL Server Management Studio and verify that the DataValues table has been populated with values from the sensors
• Configure ODM web services over the ODM instance, and register them in Central HIS or in a regional HIS Server (and visualize in DASH, the Data Access System for Hydrology)
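The channel-to-ODM mapping in Step 4 can be sketched as follows. This is not the ‘stream2db’ program itself, whose code is not shown here; the class `Stream2DbSketch`, the `Target` type, the channel name, and the ID values are hypothetical. The column names follow the ODM DataValues table as I understand it and should be checked against the schema of the actual ODM instance; the censor code "nc" (not censored) is likewise an assumption. The database connection is omitted so that the sketch runs on its own.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of mapping a DataTurbine channel to an ODM DataValues row. */
final class Stream2DbSketch {

    /** ODM metadata IDs that one channel maps to; they must already exist in the ODM instance. */
    static final class Target {
        final int siteId, variableId, methodId, sourceId, qualityControlLevelId;

        Target(int siteId, int variableId, int methodId, int sourceId, int qualityControlLevelId) {
            this.siteId = siteId;
            this.variableId = variableId;
            this.methodId = methodId;
            this.sourceId = sourceId;
            this.qualityControlLevelId = qualityControlLevelId;
        }
    }

    // Parameterized insert; column names are assumed, verify them against the real schema.
    static final String INSERT_SQL =
            "INSERT INTO DataValues (DataValue, LocalDateTime, UTCOffset, DateTimeUTC, "
            + "SiteID, VariableID, CensorCode, MethodID, SourceID, QualityControlLevelID) "
            + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";

    /** Builds the parameter values for INSERT_SQL, in column order. */
    static Object[] toParameters(Target t, double epochSeconds, double value, int utcOffsetHours) {
        LocalDateTime utc = LocalDateTime.ofEpochSecond((long) epochSeconds, 0, ZoneOffset.UTC);
        LocalDateTime local = utc.plusHours(utcOffsetHours);
        return new Object[] {
            value, local, (double) utcOffsetHours, utc,
            t.siteId, t.variableId, "nc", // "nc": assumed code for "not censored"
            t.methodId, t.sourceId, t.qualityControlLevelId
        };
    }

    public static void main(String[] args) {
        // Hypothetical mapping from channel name to ODM metadata IDs.
        Map<String, Target> mapping = new HashMap<>();
        mapping.put("AirTemp_C", new Target(1, 2, 3, 4, 0));

        Object[] params = toParameters(mapping.get("AirTemp_C"), 1243886400.0, 18.4, -8);
        System.out.println(INSERT_SQL);
        for (Object p : params) {
            System.out.println("  " + p);
        }
    }
}
```

With a JDBC connection to the ODM database, each element of the returned array could be bound to a `PreparedStatement` for `INSERT_SQL` with `setObject`, provided the driver accepts `java.time.LocalDateTime` values.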
[Figure: RBNB data displayed in the CUAHSI DASH application (simulated air temperature stream)]
[Figure: Live data and video from Santa Margarita Ecological Reserve (NEON)]
[Figure: Live video from Kenting coral reef, off of SE Taiwan (CREON project)]
Integration of Heterogeneous Devices
Project | NI cRIO | Campbell CR510 | Apprise Templine | Davis weather station | Vaisala WXT510 | Vaisala PTB210 | Axis 241 (video) | Greenspan Dissolved Oxygen Sensor
GLEON X X X X X
CREON X X X
NEON X X X
NEES X X
PRAGMA X X X X X
[Figure: NASA data, integration with Google Earth and DataTurbine]
[Figure: NEES data with a user-authored wireframe data viewer added to RDV]
NEON – Ecology http://neoninc.org
GLEON – Hydroecology http://gleon.org/
CREON – Coral reefs http://www.coralreefeon.org/
MoveBank – Animal tracking http://www.princeton.edu/~wikelski/research/index.htm
Bridges and Civil Infrastructure – Engineering http://healthmonitoring.ucsd.edu/
NEES – Earthquake Engineering http://it.nees.org/
PRAGMA – Pacific Rim Applications and Grid Middleware Assembly http://pragma-grid.net