CZO Integrated Data Management
Data Model and Metadata
David Tarboton
Based on CUAHSI HIS
[Figure: CUAHSI HIS architecture diagram. Labels: Data Discovery and Integration; Data Publication; Data Synthesis and Research; HIS Central; HydroDesktop; HydroServer; Metadata; Data; WaterML; GML; OGC Services; ODM; Analysis; Geo Data]
Internet based system to support the sharing of hydrologic data comprised of hydrologic databases and servers connected through web services and software for data publication, discovery and access.
Support: EAR 0622374
CUAHSI HIS: Sharing hydrologic data
Data System Overview
[Figure: data system diagram. Labels: CZO Servers (Boulder, Shale, Sierra, Luquillo, Jemez, Christina); Standardized web based display; ASCII text; Harvester; CZO Central; Data Store; WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues); WaterML; CZO Desktop]
Requirements
• Sufficient metadata for published CZO data to be unambiguously interpreted and used
• Each CZO operates its own local data management system
• Format used to present data and metadata should be identical across CZOs and should support heterogeneous local systems
• Local systems are autonomous with local control on the release and publication of data
Access
• Users required to agree to CZO data use policies
• Same data use agreement for all CZOs
• Data accessible freely to registered users who have agreed to policy
Information Hierarchy
• National CZO
• Experimental Watershed
• Sites
• Variables
• Series
• Data values
Abstract data model
• (where) location, object or platform identifier
• (when) date and time
• (what) attribute (or identifier of attribute)
• THE VALUE
• (how) method (or identifier of method)
• (who) creator (or identifier of creator or data source)
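The abstract data model can be sketched as a small record type. This is only an illustration, not the ODM schema: the class name `DataValue` and its field names are assumptions, and the timestamp, value and creator below are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataValue:
    """One observation together with its where/when/what/how/who context."""
    location: str        # (where) location, object or platform identifier
    timestamp: datetime  # (when) date and time
    attribute: str       # (what) attribute or its identifier
    value: float         # THE VALUE
    method: str          # (how) method or its identifier
    creator: str         # (who) creator or data source identifier


# Illustrative record; the numbers and the creator name are made up.
obs = DataValue(
    location="GREEN LAKE 4",
    timestamp=datetime(2010, 5, 1, 12, 0),
    attribute="StreamFlow",
    value=1.7,
    method="method1",
    creator="example-czo",
)
```

Keeping every field on the value itself is the simplest reading of the model; a relational design would normally replace the repeated strings with identifiers.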
Data series
• used as an organizing construct
• logical grouping of data values (usually from a column in a table)
• commonly, but not limited to time series (e.g. type series with depth)
• Properties we control become identifying series-level attributes
• Properties we measure become variables or variable level attributes
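A minimal sketch of the grouping idea, assuming that location, variable and method are the controlled properties that identify a series. The records are hypothetical and form a depth profile rather than a time series, to show that a series need not be indexed by time.

```python
from collections import defaultdict

# Hypothetical records: (location, variable, method, depth_m, value)
records = [
    ("LakeA", "Temperature", "probe", 0.5, 14.2),
    ("LakeA", "Temperature", "probe", 1.0, 13.1),
    ("LakeA", "pH", "probe", 0.5, 7.9),
]


def group_into_series(rows):
    """Group rows by the properties we control (the series-level key)."""
    series = defaultdict(list)
    for location, variable, method, depth, value in rows:
        # The measured pair (depth, value) stays at the value level.
        series[(location, variable, method)].append((depth, value))
    return dict(series)


series = group_into_series(records)
```

Here the three records fall into two series, one per variable at the same location and method.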
Why an Observations Data Model
• Syntactic consistency (File types and formats)
• Semantic consistency
– Language for observation attributes (structural)
– Language to encode observation attribute values (contextual)
• Publishing and sharing research data
• Metadata to facilitate unambiguous interpretation
• Enhance analysis capability
What are the basic attributes to be associated with each single data value and how can these best be organized?
Community Design Requirements (from comments of 22 reviewers)
• Incorporate sufficient metadata to identify provenance and give exact definition of data for unambiguous interpretation
• Spatial location of measurements
• Scale of measurements (support, spacing, extent)
• Depth/Offset Information
• Censored data
• Classification of data type to guide appropriate interpretation
– Continuous
– Indication of gaps
• Indicate data quality
http://www.neng.usu.edu/cee/faculty/dtarb/HydroObsDataModelReview.pdf
Observations Data Model
[Figure: data types feeding the model. Labels: Soil moisture data; Streamflow; Flux tower data; Precipitation & Climate; Groundwater levels; Water Quality]
• A relational database at the single observation level
• Common persistence model for observations data
• Metadata for unambiguous interpretation
• Traceable heritage from raw measurements to usable information
• Promote syntactic and semantic consistency
• Cross dimension retrieval and analysis
Horsburgh et al., 2008, WRR 44: W05406
Horsburgh, J. S., D. G. Tarboton, D. R. Maidment and I. Zaslavsky, (2008), A Relational Model for Environmental and Water Resources Data, Water Resour. Res., 44: W05406, doi:10.1029/2007WR006392.
CUAHSI Observations Data Model http://his.cuahsi.org/odmdatabases.html
Stage and Streamflow Example
Water Chemistry from a profile in a lake
Water Chemistry from Laboratory Sample
CUAHSI Observations Data Model http://www.cuahsi.org/his/odm.html
Work from Out to In
[Figure: ODM schema diagram annotated with steps 1 to 7, followed by the callouts "At last …" and "And don’t forget …"]
[Figure: HydroServer architecture diagram. Labels: ODM Databases and WaterOneFlow Web Services; ArcGIS Server Spatial Data Services; Spatial Services; WaterOneFlow Services; HydroServer Database; Configuration Tool; HydroServer Capabilities Web Service; HydroServer Website; Map Server; Time Series Analyst]
HydroServer - A Platform for Managing and Publishing Experimental Watershed Data
http://hydroserver.codeplex.com/
Dynamic shared vocabulary moderation system
[Figure: shared vocabulary moderation workflow. Labels: Local Server; Local ODM Database; ODM Tools; ODM Data Manager; XML; ODM Shared Vocabulary Web Services; Master ODM Shared Vocabulary; ODM Shared Vocabulary Moderator; ODM Website]
http://his.cuahsi.org/mastercvreg.html From Jeff Horsburgh
CUAHSI HIS – looking ahead
• A “data sharing/social networking” site for hydrologic data (and possibly models)
• Simple and easy to use
• Find, create, share, connect, integrate, work together online. Collaborate
• Hydro value added
CZO web based file format
• Time series display files
– The data – time series in columns
• Methods files
– A single file listing the methods used by the CZO
• Measurement location files (the term agreed for what used to be called a site. Other names considered were station, node, monitoring point, platform)
– A single file listing the measurement locations at which measurements are made by the CZO
– Need a concept of spatial grouping for locations
– Identify the groups that locations belong to – implies a need for a location groups file. (Measurement groups)
The slides from this one onward contain edits made during the presentation, e.g. the change from “site” to “measurement location”. As a result they may not be entirely consistent, but were as we left things at the end of the meeting.
Time series display file
• Header
– Doc group
– Default parameter group
– Column header group
• Data
– Columns of data
Doc group
Doc Attributes | Description
Title | A title for the set of data series in the file
Abstract | Description of the data
Investigator contact Information | Name and contact information for investigator responsible for the data
Keywords | Keywords useful for discovery of the data series
Variable names | Names for variables for the data series
Citation | Text string that gives the citation to be used when the data are referenced.
Publications | Publications related to this data
Comments | Additional comments related to interpretation and use of this data
Default parameters pertain to all data in file except when overridden by a specific column header (to encourage specification only once)
Examples
DEFAULT_PARAMETER. site ="GREEN LAKE 4"
DEFAULT_PARAMETER. offset_value ="2", offsetUnits = "meters", offset_description= "this is vertical offset from the ground level down"
DEFAULT_PARAMETER. quality_control_level ="0"
DEFAULT_PARAMETER. missing_value_indicator ="-9999"
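One way a reader might collect the default parameter group is sketched below. The file format was still a draft at this point, so the parsing rule is an assumption (every default is written as key = "value" on a line that starts with DEFAULT_PARAMETER.), and the function name `parse_default_parameters` is hypothetical.

```python
import re

# Matches key = "value" pairs, tolerating spaces around the equals sign.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_default_parameters(lines):
    """Collect key="value" pairs from lines starting with DEFAULT_PARAMETER."""
    defaults = {}
    for line in lines:
        if line.strip().startswith("DEFAULT_PARAMETER."):
            for key, value in PAIR.findall(line):
                defaults[key] = value
    return defaults


header = [
    'DEFAULT_PARAMETER. site ="GREEN LAKE 4"',
    'DEFAULT_PARAMETER. offset_value ="2", offsetUnits = "meters"',
    'DEFAULT_PARAMETER. missing_value_indicator ="-9999"',
]
defaults = parse_default_parameters(header)
```

All values are kept as strings here; converting "2" or "-9999" to numbers would be a separate step once the attribute types are agreed.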
Column headers
Examples
COL1. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm"
COL2. label=VariableName, value=StreamFlow, units=m3/s, TimeSupport= 3, TimeSupportUnits=hr, NoDataValue=-9999, SampleMedium=water, method=method1, Offsetvalue = 3, OffsetValueUnits=m , offsetDescription = "Depth below surface"
COL3. label=VariableName, value=pH, units=pH units, missing value indicator=-9999
COL4. label=VariableName, value=conductance, units=uS/cm @ 25 degrees C
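The column header examples, and the rule that a column header overrides a file-level default, could be handled roughly as follows. This is a sketch under assumptions: attributes are comma separated, values may be wrapped in double quotes, and the helper names and the default `method0` are hypothetical.

```python
import re


def parse_column_header(line):
    """Split 'COLn. key=value, key=value' into (column_number, attributes)."""
    match = re.match(r"\s*COL(\d+)\.\s*(.*)", line)
    if match is None:
        raise ValueError(f"not a column header: {line!r}")
    number, rest = int(match.group(1)), match.group(2)
    attrs = {}
    # Split on commas that are not inside double quotes.
    for part in re.split(r',(?=(?:[^"]*"[^"]*")*[^"]*$)', rest):
        key, _, value = part.partition("=")
        attrs[key.strip()] = value.strip().strip('"')
    return number, attrs


def effective_attributes(defaults, column_attrs):
    """Column-level settings override file-level defaults."""
    return {**defaults, **column_attrs}


defaults = {"NoDataValue": "-9999", "method": "method0"}
number, attrs = parse_column_header(
    'COL2. label=VariableName, value=StreamFlow, units=m3/s, '
    'method=method1, offsetDescription = "Depth below surface"'
)
merged = effective_attributes(defaults, attrs)
```

The override only works when both levels spell a key the same way. The slide examples mix spellings such as missing_value_indicator and NoDataValue, so a real reader would need an agreed naming scheme or a normalization step first.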
Series level attributes
• Required metadata for each data value in a CZO time series display file
SiteCode, Units, Method, OffsetValue, OffsetDescription, SampleType, VariableName, SampleMedium, ValueType, TimeSupport
TimeSupportUnits, DataType, DataLevel, NoDataValue, UTCOffset, TimeZone, OffsetValue, OffsetDescription, OffSetUnits, CensorCode
Series level attribute definitions 1
Attributes | Description
LocationCode | Code used to identify the Measurement Location (refers to Measurement locations file)
Units | The units associated with a data value
Method | Identifier to point to a record in the methods file
OffsetValue | The value of a measurement offset if constant. (Optional)
OffsetDescription | Full text description of the offset value. (Optional, but required if OffsetValue is given)
VariableName | Name of the variable from the variables preferred value table.
SampleMedium | The medium of the sample or where the measurement is made. This should be from the SampleMediumPV preferred vocabulary table.
ValueType | Text value indicating what type of data value is being recorded. This should be from the ValueTypeCV controlled vocabulary table. (e.g. Field measurement, modeled, derived)
TimeSupport | Numerical value that indicates the temporal footprint of the data values. 0 is used to indicate data values that are instantaneous. Other values indicate the time over which the data values are implicitly or explicitly averaged or aggregated.
Series level attribute definitions 2
Attributes | Description
TimeSupportUnits | Units of time support value from Units PV table.
DataType | Text value that identifies the data as one of several types (e.g. min, max, average). PV
DataLevel | Level used to identify the level of quality control to which data values have been subjected. Ameriflux is the starting point. Quality control and processing.
Version | DOI. A version is associated with a publication or specific release for a specific analysis purpose.
NoDataValue | The value used to encode no data
UTCOffset | Offset in hours from UTC time of the corresponding LocalDateTime value.
TimeZone | Time zone where observation site is located (e.g. Mountain time)
OffSetUnits | Units with which the offset value is measured (Units PV)
CensorCode | Text indication of whether the data value is censored from the CensorCodeCV controlled vocabulary. See USGS document that Anthony knows about
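Two of these attributes, UTCOffset and NoDataValue, change how a consumer reads each row. The sketch below shows one plausible use of them, assuming the "YYYYMMDD hh:mm" timestamp format from the column header example; the function names and the sample values are illustrative.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text, utc_offset_hours, fmt="%Y%m%d %H:%M"):
    """Interpret a local timestamp using UTCOffset and convert it to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_text, fmt).replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc)


def mask_no_data(values, no_data_value):
    """Replace the NoDataValue sentinel with None so it is never averaged."""
    return [None if v == no_data_value else v for v in values]


when = to_utc("20100501 06:30", utc_offset_hours=-7)
clean = mask_no_data([1.2, -9999, 1.4], no_data_value=-9999)
```

With UTCOffset=-7, a local reading at 06:30 corresponds to 13:30 UTC on the same day.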
Value level attributes
Attributes | Description
DateTime | The date and time at which the value was observed
OffsetValue | The value of a measurement offset. (Optional). [Note that OffsetValue may be either a series level, or value level attribute for any data series, depending upon whether it is a controlled or measured property.]
SampleNumber (then put sample attributes in a separate file associated with sample numbers a cross reference to SESAR) | Type of sample, e.g. grab, from groundwater, from leaf. From sample type preferred value table. Collection method. Need a more general concept of sample attributes. Also need sample number.
Spatial Support Horizontal | Optional
Spatial Support Vertical | Optional
ValueAccuracy | Specify as absolute
Any value level attribute that is the same for an entire series may be promoted to series level attribute and go in column header
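The promotion rule can be sketched as a small helper. The function name, the `keep` parameter and the sample rows are assumptions made for illustration; the rule itself (an attribute that is constant over the whole series moves to the column header) is the one stated above.

```python
def promote_constant_attributes(rows, keep=("DateTime", "Value")):
    """Split rows (dicts) into series-level attributes and per-value rows.

    An attribute is promoted when every row carries the same value for it.
    Names listed in `keep` always stay at the value level.
    """
    if not rows:
        return {}, []
    # Only attributes present in every row are candidates.
    shared = set(rows[0])
    for row in rows[1:]:
        shared &= set(row)
    promoted = {
        key: rows[0][key]
        for key in shared
        if key not in keep and all(row[key] == rows[0][key] for row in rows)
    }
    remaining = [
        {k: v for k, v in row.items() if k not in promoted} for row in rows
    ]
    return promoted, remaining


rows = [
    {"DateTime": "20100501 06:00", "OffsetValue": 3, "Value": 1.2},
    {"DateTime": "20100501 07:00", "OffsetValue": 3, "Value": 1.4},
]
series_attrs, value_rows = promote_constant_attributes(rows)
```

In this example OffsetValue is the same in both rows, so it becomes a series-level attribute and each remaining row holds only DateTime and Value.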
Measurement Locations file
Measurement Location File Attribute labels | Description
SiteCode | Code used by organization that collects the data to identify the site
SiteName | Full name of the sampling site.
Latitude | Latitude in decimal degrees.
Longitude | Longitude in decimal degrees. East positive, West negative.
LatLongDatum | The Spatial Reference System of the latitude and longitude coordinates in the SpatialReferences table.
Elevation | Elevation of site (in m – or do we want a separate item to give units).
VerticalDatum | Vertical datum of the elevation. Controlled Vocabulary from VerticalDatumCV.
LocalX | Local Projection X coordinate. (Optional)
LocalY | Local Projection Y Coordinate. (Optional)
Local Z | Local elevation coordinate
LocalProjection | Identifier that references the Spatial Reference System of the local coordinates. (Optional) X, Y and Z
PosAccuracy | Value giving the accuracy with which the positional information is specified. (Optional)
Comments | Comments related to the site. (Optional)
Sampling feature refers to feature of interest.
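A record from the measurement locations file might be represented and sanity checked as below. This is a sketch only: the class name, the subset of fields, and the site code, name and coordinates are all invented for illustration, and the range checks are simply what decimal degrees with east positive implies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MeasurementLocation:
    site_code: str
    site_name: str
    latitude: float             # decimal degrees
    longitude: float            # decimal degrees, east positive
    lat_long_datum: str
    elevation_m: Optional[float] = None
    comments: Optional[str] = None

    def __post_init__(self):
        # Reject coordinates outside the valid decimal-degree ranges.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


# Hypothetical location; a western-hemisphere site has a negative longitude.
loc = MeasurementLocation("SITE1", "Example site", 40.0, -105.5, "WGS84")
```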
Methods file
Attributes | Description | Link
Method | Description of each method. | Hyperlink to external reference on the method (Optional)
Is further subdivision needed to elicit specific method elements?
Shared vocabularies
• Variable names (grouped into categories with a keyword list associated with each name. Need a field for keywords and categories to be added to present CUAHSI HIS system) (e.g. Precipitation, Streamflow, Nitrogen, Soil moisture)
• Units (extended from CUAHSI HIS) (e.g. m, g/L)
• Value type (from CUAHSI HIS) (e.g. Field observation, derived value, model output)
• Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
• Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
• Data level (based on Ameriflux list) (e.g. level 0=raw data, level 4 = fully infilled and quality controlled)
• Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
• Censor code (from CUAHSI HIS) (e.g. less than, not-censored, non detect)
• Qualifier code (in CUAHSI HIS qualifiers are not a PV. A CZO specific set of qualifiers will need to be developed)
• Vertical datum (from CUAHSI HIS) (e.g. Mean Sea Level, NGVD29)
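A shared vocabulary is useful mainly as a check applied when a file is published. The sketch below assumes the vocabularies are available as simple sets; the terms are copied from the examples above and are not the full lists, and the function name `check_terms` is hypothetical.

```python
# Terms copied from the slide examples; a real list would come from the
# shared vocabulary service, which is not modeled here.
VOCABULARIES = {
    "CensorCode": {"less than", "not-censored", "non detect"},
    "VerticalDatum": {"Mean Sea Level", "NGVD29"},
}


def check_terms(attributes, vocabularies=VOCABULARIES):
    """Return (attribute, term) pairs whose term is not in its vocabulary."""
    problems = []
    for name, term in attributes.items():
        allowed = vocabularies.get(name)
        # Attributes without a vocabulary are not checked.
        if allowed is not None and term not in allowed:
            problems.append((name, term))
    return problems


problems = check_terms(
    {"CensorCode": "lt", "VerticalDatum": "NGVD29", "Units": "m"}
)
```

Here only the abbreviated censor code is reported, since "lt" is not one of the listed terms.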
Ilya’s Unresolved issues
• Policies and best practices for generating display files and setting up data folders, and how we detect what is new
• Update frequency
• Semantic tagging (how automated)
• How shall we handle situations when data are removed/overwritten?
• Need more examples and test cases
• What information in log files is needed
• How to present data use agreements in services
• How to deal with different types of data
Other issues
• Other data types
– Maps, GIS data (OGC web services?)
– Geophysical data, images, geochemistry data, geological data, soil profile data
• Simple capability to store and share arbitrary digital objects with metadata using e.g. Catalog Services for the web
• LIDAR data (just use SDSC Open Topography or NCALM)
• Archiving
• Questions, additional needs (wishes)