Towards validating observation data in WaterML 2.0
WATER FOR A HEALTHY COUNTRY
Jonathan Yu | Software Engineer | 16th July 2012 | Hydroinformatics 2012
An architecture for validating structure and content of WaterML 2.0 documents
Towards validating observation data in WaterML 2.0 | Jonathan Yu
Outline:
1. Overview of WaterML 2.0
2. Validation problem
3. Proposed validation approach
4. Discussion and challenges
5. Conclusion
Need for water data standards
A data standard is vital
WaterML 2.0
• International standard XML encoding for the transfer of water information
• Result of harmonizing a number of identified exchange formats
• Ongoing effort by the WMO/OGC Hydrology Domain Working Group (Hydro DWG)
  • headed up by Peter Taylor from CSIRO
• Enhances the reusability of standards where possible
What does WaterML 2.0 enable?
• delivery and consumption of water observations data
  • via any Sensor Observation Service (SOS) implementation
• integration of water observations data with data from closely related domains in environmental sciences
  • such as geology and meteorology, where OGC-conformant systems are being deployed
• applications such as
  – groundwater interoperability
  – climate monitoring
Validation problem
• The current implementation target of WaterML 2.0 is XML
• Common practice is to use XML Schema to describe grammar/structure
• Can we adequately validate WaterML 2.0 using XML Schema?
  • XML Schema validation alone is inadequate
XML Schema validation is inadequate
[Figure: the WaterML 2.0 information model, its co-constraints, and XML Schema]
Excerpt of Timeseries – defaultTimeValuePair

    <wml2:Timeseries gml:id="time_series_1">
      …
      <wml2:defaultTimeValuePair>
        <wml2:TimeValuePair>
          <!-- Unit of measure must use the UCUM code -->
          <wml2:unitOfMeasure xlink:href="m"/>
          <wml2:quality xlink:href="http://www.opengis.net/WaterML2.0/def/quality/unchecked"
              xlink:title="unchecked data"/>
          <!-- Codes for data types defined in specification. -->
          <wml2:dataType xlink:href="http://www.opengis.net/WaterML2.0/def/timeseriesType/AveragePrec"
              xlink:title="Average in preceding interval"/>
          <wml2:processing xlink:href="http://www.opengis.net/WaterML2.0/def/processing/raw"
              xlink:title="As measured data"/>
        </wml2:TimeValuePair>
      </wml2:defaultTimeValuePair>
      …
    </wml2:Timeseries>
Content validation may be required
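To make "content validation" concrete: XML Schema only sees xlink:href as a URI, so a misspelt quality code still validates. The sketch below is a minimal content check over a simplified copy of the excerpt's TimeValuePair, using Python's standard xml.etree.ElementTree. The namespace URIs and the hard-coded ALLOWED sets (standing in for a vocabulary service) are assumptions for illustration, not part of the WaterML 2.0 tooling described here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch.
WML2 = "http://www.opengis.net/waterml/2.0"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical stand-in for a vocabulary service: allowed URIs per element name.
ALLOWED = {
    "quality": {"http://www.opengis.net/WaterML2.0/def/quality/unchecked"},
    "dataType": {"http://www.opengis.net/WaterML2.0/def/timeseriesType/AveragePrec"},
    "processing": {"http://www.opengis.net/WaterML2.0/def/processing/raw"},
}

# Simplified copy of the excerpt (titles and comments omitted).
FRAGMENT = """<wml2:TimeValuePair
    xmlns:wml2="http://www.opengis.net/waterml/2.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <wml2:unitOfMeasure xlink:href="m"/>
  <wml2:quality xlink:href="http://www.opengis.net/WaterML2.0/def/quality/unchecked"/>
  <wml2:dataType xlink:href="http://www.opengis.net/WaterML2.0/def/timeseriesType/AveragePrec"/>
  <wml2:processing xlink:href="http://www.opengis.net/WaterML2.0/def/processing/raw"/>
</wml2:TimeValuePair>"""


def check_vocab_terms(xml_text):
    """Return (element, href) pairs whose xlink:href is not in the allowed vocabulary."""
    root = ET.fromstring(xml_text)
    problems = []
    for local_name, allowed in ALLOWED.items():
        # iter() visits the root and every descendant with this qualified name.
        for el in root.iter("{%s}%s" % (WML2, local_name)):
            href = el.get(XLINK_HREF)
            if href not in allowed:
                problems.append((local_name, href))
    return problems
```

Calling check_vocab_terms(FRAGMENT) returns an empty list, while a copy with "quality/unchecked" misspelt returns one problem, even though both versions would pass schema validation.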
How do we go about enhancing XML Schema validation?
Option 1: Overload the XML Schema, e.g. ship vocabulary definitions as static enumerations in the schema.
Option 2: Write custom code that parses the XML and checks the co-constraints
  • Opaque and non-standard; the reporting format is also non-standard
Option 3: Use another standards-based constraint-checking technology
  • i.e. Schematron
Schematron
• Schematron is an ISO standard
  • ISO/IEC 19757-3:2006 Information technology -- Document Schema Definition Languages (DSDL) -- Part 3: Rule-based validation -- Schematron
• It has a defined reporting language: the Schematron Validation Report Language (SVRL)
• Standard transformations (e.g. XSLT) can be applied to SVRL output to process it further, or to convert the report into human-readable formats (HTML) or other machine-readable formats
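Because SVRL is plain XML, post-processing it is straightforward. In practice this would usually be an XSLT stylesheet. The sketch below does the equivalent in Python: it pulls svrl:failed-assert and svrl:successful-report entries out of a report and renders a minimal HTML list. The sample report, its rule text and the HTML layout are invented for illustration; only the SVRL namespace and element names come from the Schematron standard.

```python
import html
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Invented SVRL output for illustration.
SAMPLE_SVRL = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="wml2:TimeValuePair"/>
  <svrl:failed-assert test="starts-with(wml2:quality/@xlink:href, 'http://')"
      location="/wml2:Timeseries[1]/wml2:defaultTimeValuePair[1]">
    <svrl:text>Quality code must be a URI from the quality vocabulary.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""


def summarise_svrl(svrl_text):
    """Collect failed assertions and successful reports from an SVRL document."""
    root = ET.fromstring(svrl_text)
    findings = []
    for kind in ("failed-assert", "successful-report"):
        for el in root.iter("{%s}%s" % (SVRL_NS, kind)):
            text_el = el.find("{%s}text" % SVRL_NS)
            message = text_el.text.strip() if text_el is not None and text_el.text else ""
            findings.append({
                "kind": kind,
                "location": el.get("location", ""),
                "message": message,
            })
    return findings


def to_html(findings):
    """Render findings as a minimal HTML list (an illustrative format, not a standard)."""
    if not findings:
        return "<p>No problems reported.</p>"
    items = "".join(
        "<li>%s at %s: %s</li>" % (html.escape(f["kind"]),
                                   html.escape(f["location"]),
                                   html.escape(f["message"]))
        for f in findings
    )
    return "<ul>%s</ul>" % items
```

The same findings list could just as easily be serialized as JSON for a machine-readable report.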
Proposed validation service architecture
[Figure: components are a user interface, an HTTP REST interface, a validation service (XSD validation and Schematron rules) and a vocabulary service (SKOS/RDF vocabularies in an RDF triple store, accessed through interfaces and SPARQL queries); the input is a WaterML 2.0 document and the output is a conformance certificate report.]
1. First pass: XML Schema validation
2. Second pass: Schematron validation, which involves vocabulary checking
3. A report is generated and returned
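The two-pass flow can be sketched as a small pipeline in which each pass is a pluggable function returning error messages. This is a minimal sketch under stated assumptions. The ValidationReport fields and the decision to skip content rules when the schema pass fails are illustrative choices, not the service's actual design. The placeholder passes only check well-formedness and apply no rules. A real deployment would plug in an XSD validator and a Schematron engine, such as lxml's XMLSchema and isoschematron classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List

# A pass takes the document text and returns a list of error messages.
Check = Callable[[str], List[str]]


@dataclass
class ValidationReport:
    schema_errors: List[str] = field(default_factory=list)
    content_errors: List[str] = field(default_factory=list)
    content_checked: bool = False

    @property
    def conformant(self) -> bool:
        return self.content_checked and not self.schema_errors and not self.content_errors


def validate(document: str, schema_pass: Check, content_pass: Check) -> ValidationReport:
    report = ValidationReport()
    # First pass: structural validation (XML Schema in the proposed architecture).
    report.schema_errors = schema_pass(document)
    if report.schema_errors:
        # Design choice: content rules assume a structurally valid document, so stop here.
        return report
    # Second pass: content rules (Schematron plus vocabulary checks in the proposal).
    report.content_errors = content_pass(document)
    report.content_checked = True
    return report


def well_formed_only(document: str) -> List[str]:
    """Placeholder for XSD validation: only checks that the XML is well-formed."""
    try:
        ET.fromstring(document)
        return []
    except ET.ParseError as exc:
        return [str(exc)]


def no_content_rules(document: str) -> List[str]:
    """Placeholder second pass that applies no rules."""
    return []
```

For example, validate("<a/>", well_formed_only, no_content_rules).conformant is True, while a truncated document stops after the first pass with one schema error.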
Structuring content validation rules
[Figure: the requirement class "measurement time series exchange" (Req 1, Req 2) maps to the conformance class "measurement time series exchange" (Conf Test(s) 1, Conf Test(s) 2); exchanging water observation data yields a conformance certification report.]
Use the OGC modular specification approach to define the WaterML 2.0 requirements classes and their associated conformance classes, which hold the validation rules.
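One way to picture the requirements-class/conformance-class pairing is as data. Each conformance class groups tests, and each test is tied to one requirement, so a report can state which requirements a document meets. The sketch below assumes a document has already been reduced to a dictionary of extracted values. The requirement identifiers "req-1" and "req-2" and their checks are hypothetical, not the actual WaterML 2.0 requirements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Values extracted from a document, e.g. {"unitOfMeasure": "m", ...} (hypothetical shape).
Extracted = Dict[str, str]


@dataclass
class ConformanceTest:
    requirement: str                    # hypothetical requirement identifier
    check: Callable[[Extracted], bool]


@dataclass
class ConformanceClass:
    name: str
    tests: List[ConformanceTest]


def certify(doc: Extracted, classes: List[ConformanceClass]) -> Dict[str, Dict[str, bool]]:
    """Run every conformance test and record the outcome per requirement."""
    return {cc.name: {t.requirement: t.check(doc) for t in cc.tests} for cc in classes}


def summary(results: Dict[str, Dict[str, bool]]) -> Dict[str, bool]:
    """A conformance class passes only if all of its tests pass."""
    return {name: all(outcomes.values()) for name, outcomes in results.items()}


MEASUREMENT_TS = ConformanceClass(
    name="measurement time series exchange",
    tests=[
        # Hypothetical checks standing in for real requirement tests.
        ConformanceTest("req-1", lambda d: bool(d.get("unitOfMeasure"))),
        ConformanceTest("req-2", lambda d: d.get("quality", "").startswith("http://")),
    ],
)
```

Keeping per-requirement outcomes, rather than a single pass/fail flag, is what would let a report describe the level of conformance.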
Wider implications: decoupled architecture
Decoupling the vocabulary services allows:
• distributed vocabulary services
• reference vocabularies to emerge
• vocabulary services to be highly reusable for other purposes
  - inclusion in the validation of other encoding formats (e.g. WaterML 2.0 Part 2: ratings and gauges?)
  - documentation generation, user interface elements
Decoupling also allows the validation service to be generic
• it can be adapted to validate other XML-based markup languages
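Because the vocabulary service is reached over standard interfaces, the validator only needs to know how to ask whether a term belongs to a vocabulary. The sketch below builds a SPARQL ASK query that tests whether a URI is a skos:Concept in a given concept scheme, encodes it as a SPARQL 1.1 Protocol GET request, and parses the standard JSON boolean result. The endpoint and vocabulary URIs are hypothetical, and the query shape is one plausible choice rather than the service's actual query.

```python
import json
from urllib.parse import urlencode

SKOS = "http://www.w3.org/2004/02/skos/core#"


def _iri(value):
    """Wrap a URI in <...>, rejecting characters that would break the IRI reference."""
    if any(c in value for c in '<>"{}|^`\\ '):
        raise ValueError("not a safe IRI: %r" % value)
    return "<%s>" % value


def ask_in_scheme_query(term, scheme):
    """SPARQL ASK: is `term` a SKOS concept in the concept scheme `scheme`?"""
    return ("PREFIX skos: <%s>\n"
            "ASK { %s a skos:Concept ; skos:inScheme %s . }" % (SKOS, _iri(term), _iri(scheme)))


def query_url(endpoint, query):
    """SPARQL 1.1 Protocol query via GET: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def parse_ask_result(body):
    """SPARQL 1.1 JSON results for ASK look like {"head": {}, "boolean": true}."""
    return bool(json.loads(body)["boolean"])
```

Any triple store exposing a SPARQL endpoint could answer this, which is what keeps the validation service independent of where the vocabularies live.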
Potential scenario
[Figure: WDTF validation and WaterML 2.0 validation services drawing on shared vocabulary services: SI Units, International Authority, Australian Authority and BOM Authority.]
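In this scenario, several validators share one set of vocabulary services published by different authorities. A minimal sketch, assuming each service can answer "do you know this term?": a resolver asks the services in a configured order and reports which authority recognised the term. The in-memory services, their contents and the lookup order are hypothetical stand-ins for remote services.

```python
from typing import Iterable, Optional, Protocol, Sequence, Set


class VocabularyService(Protocol):
    name: str

    def contains(self, term: str) -> bool: ...


class InMemoryVocabulary:
    """Hypothetical stand-in for a remote vocabulary service."""

    def __init__(self, name: str, terms: Iterable[str]):
        self.name = name
        self._terms: Set[str] = set(terms)

    def contains(self, term: str) -> bool:
        return term in self._terms


def resolve(term: str, services: Sequence[VocabularyService]) -> Optional[str]:
    """Return the name of the first service that recognises the term, else None."""
    for service in services:
        if service.contains(term):
            return service.name
    return None


# Both validators could share this list; the order and contents are illustrative.
SHARED_SERVICES = [
    InMemoryVocabulary("BOM Authority", {"http://example.org/bom/quality/qc-passed"}),
    InMemoryVocabulary("SI Units", {"m", "s", "m3/s"}),
]
```

Here resolve("m", SHARED_SERVICES) returns "SI Units" and an unknown unit returns None. Swapping the order would let a more specific authority take precedence.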
‘Goldilocks’ of content rule definition
There is a tension in deciding which content rules to provide out of the box.
Too constrained:
• trades off the flexibility of the format
• can restrict its usage and be more prescriptive than required
• users are not able to express what they want
Not constrained enough:
• greater flexibility
• may yield 'conformant' documents that still have problems
We are working on getting the balance right…
Conclusion and future work
WaterML 2.0 and the validation service:
• the need for standards and an appropriate validation mechanism
• a proposed validation service for schema-level and semantic validation, enhanced with vocabulary checking
• the importance of decoupling validation from vocabularies
Future work:
• balance of content rules: flexible but prescriptive enough
• develop a set of reference vocabularies for time series
• reporting output that outlines the level of conformance to the WaterML 2.0 specification
• finding a home for the validation service
Land and Water
Jonathan Yu, Software Engineer
t +61 3 9252 6440
www.csiro.au/clw

ICT Centre
Peter Taylor, Software Engineer
t +61 3 6232 5530
www.csiro.au/ict
Thank you
Information models reference controlled vocabularies
[Figure: information models (the WaterML 2.0 information model, Observations and Measurements) reference controlled vocabularies (unit of measure vocabularies, interpolation type vocabularies).]
Validating WaterML 2.0
We propose a 2-pass method of validation:
• Syntactic level: XML Schemas
• Content level: business/logic rules
1. XML Schema validation can verify data types and basic patterns.
2. Content-level validation involves checking:
   • valid identifiers, e.g. verifying that a URI exists
   • co-constraints and vocabularies, e.g. that the unit of measure is suitable for the property (expressed as a URI)
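The unit-of-measure co-constraint is a good example of what a single schema type cannot express: whether "m" is acceptable depends on which property is being observed. The sketch below checks a UCUM unit code against a per-property list. The property URIs and their allowed units are hypothetical, and in the proposed architecture both lists would come from vocabulary services rather than a hard-coded table.

```python
# Hypothetical property URIs mapped to allowed UCUM unit codes.
# In the proposed architecture these would be looked up in vocabulary services.
ALLOWED_UNITS = {
    "http://example.org/property/WaterLevel": {"m", "cm"},
    "http://example.org/property/Discharge": {"m3/s", "L/s"},
    "http://example.org/property/WaterTemperature": {"Cel"},
}


def check_uom_for_property(property_uri, uom):
    """Return error messages if `uom` is not a suitable unit for the observed property."""
    allowed = ALLOWED_UNITS.get(property_uri)
    if allowed is None:
        return ["unknown observed property: %s" % property_uri]
    if uom not in allowed:
        return ["unit %r is not suitable for %s (expected one of %s)"
                % (uom, property_uri, sorted(allowed))]
    return []
```

A water level in "m" passes, whereas a discharge reported in "m" is flagged, even though "m" is itself a valid UCUM code.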
WaterML 2.0 XML fragment:

    <wml2:Timeseries gml:id="time_series_1">
      <wml2:defaultTimeValuePair>
        <wml2:TimeValuePair>
          <wml2:unitOfMeasure xlink:href="m"/>
          <wml2:quality xlink:href="http://www.opengis.net/WaterML2.0/def/quality/unchecked"
              xlink:title="unchecked data"/>
          <wml2:dataType xlink:href="http://www.opengis.net/WaterML2.0/def/timeseriesType/AveragePrec"
              xlink:title="Average in preceding interval"/>
          <wml2:processing xlink:href="http://www.opengis.net/WaterML2.0/def/processing/raw"
              xlink:title="As measured data"/>
        </wml2:TimeValuePair>
      </wml2:defaultTimeValuePair>
      …
    </wml2:Timeseries>