1 nesc workshop grid and geospatial standards 7-sep-2005 data integration with the climate science...
TRANSCRIPT
1
Data integration with theClimate Science Modelling Language
Andrew Woolf1, Bryan Lawrence2, Roy Lowry3, Kerstin Kleese van Dam1, Ray Cramer3, Marta Gutierrez2, Siva
Kondapalli3, Susan Latham2, Dominic Lowe2, Kevin O’Neill1, Ag Stephens2
1CCLRC e-Science Centre2British Atmospheric Data Centre
3British Oceanographic Data Centre
2
Outline
Background
Standards – a framework for interoperability
Climate Science Modelling Language (CSML)
Using CSML
3
Background
Data integration requirements:• scalability across providers• warehousing not an option• enhance access and use, ‘outwards-facing’ (e.g. impacts
community, policymakers)• storage heterogeneity
Semantics as integration ‘key’• common language across providers (and users)• supports wrapper/mediator architecture
4
Standards
Emerging ISO standards• TC211 – around 40 standards for geographic information• Cover activity spectrum: discovery access use
ISO 19101Domain
Reference Model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.
5
Standards
Geographic ‘features’• “abstraction of real world
phenomena” [ISO 19101]• Type or instance• Encapsulate important
semantics in universe of discourse
Application schema• Defines semantic content and
logical structure of datasets• ISO standards provide toolkit:
• spatial/temporal referencing
• geometry (1-, 2-, 3-D)• topology• dictionaries (phenomena,
units, etc.)• GML – canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]
6
Standards
The importance of governance• Information community defined by shared semantics
• Need community process to manage those semantics (definitions, models, vocabularies, taxonomies, etc.)
• e.g. CF conventions for netCDF files
• Role of Feature Type Catalogues [ISO 19110] and registers [ISO 19135]
Governance as driver for granularity• Remit / interest determines appropriate granularity
• e.g. IOC, IHO, WMO
abstract generic highly specialised
feature types spectrum
<temperatureProfile/><measurement type=“Radiosonde” measurand=“temperature”/> <Sonde parameter=“temperature”/>
7
Climate ScienceModelling Language
Aims:• provide semantic integration mechanism for NDG data• explore new standards-based interoperability framework• emphasise content, not container
Design principles:• offload semantics onto parameter type (‘phenomenon’, observable,
measurand)• e.g. wind-profiler, balloon temperature sounding
• offload semantics onto CRS• e.g. scanning radar, sounding radar
• ‘sensible plotting’ as discriminant• ‘in-principle’ unsupervised portrayal
• explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
8
Climate ScienceModelling Language
CSML feature types• defined on basis of geometric and topologic structure
CSML feature type Description Examples
TrajectoryFeature Discrete path in time and space of a platform or instrument. ship’s cruise track, aircraft’s flight path
PointFeature Single point measurement. raingauge measurement
ProfileFeature Single ‘profile’ of some parameter along a directed line in space.
wind sounding, XBT, CTD, radiosonde
GridFeature Single time-snapshot of a gridded field. gridded analysis field
PointSeriesFeature Series of single datum measurements. tidegauge, rainfall timeseries
ProfileSeriesFeature Series of profile-type measurements.
vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature Timeseries of gridded parameter fields.numerical weather prediction model, ocean general circulation model
9
Climate ScienceModelling Language
CSML feature types• examples...
ProfileSeriesFeature
ProfileFeature
GridFeature
10
Climate ScienceModelling Language
Application schema• logical structure and semantic content of NDG ‘Dataset’• Based on GML 3.1
«Type»GML::AbstractGMLType
«Type»Dataset
«Type»UnitDefinitions
«Type»ReferenceSystemDefinitions
«Type»PhenomenonDefinitions
«Type»AbstractArrayDescriptor
«Type»GML::FeatureCollection
**
*
*
*
*
*
*
*
*
11
Climate ScienceModelling Language
Integration approaches: wrapper/mediator
BADCBODC
wrapper wrapper
mediatorServicedataInterface
mediatorServicedataInterface
12
Climate ScienceModelling Language
Numerical array descriptors• provides ‘wrapper’ architecture
for legacy data files• ‘Connected’ to data model
numerical content through ‘xlink:href’
Three subtypes:• InlineArray• ArrayGenerator• FileExtract (NASAAmes,
NetCDF, GRIB)
Composite design pattern for aggregation
+arraySize[1]+uom[0..1]+numericType[0..1]+numericTransform[0..1]+regExpTransform[0..1]
«Type»AbstractArrayDescriptor
+aggType[1]+aggIndex[1]
«Type»AggregatedArray
1
+component
*
+values[*]
«Type»InlineArray
+expression[1]
«Type»ArrayGenerator
+fileName[1]
«Type»AbstractFileExtract
+variableName[1]+index[0..1]
«Type»NASAAmesExtract
+variableName[1]
«Type»NetCDFExtract
+parameterCode[1]+recordNumber[0..1]+fileOffset[0..1]
«Type»GRIBExtract
+id+metaDataProperty+description+name
«Type»GML::AbstractGMLType
13
Climate ScienceModelling Language
Inline array
Array generator
<NDGInlineArray><arraySize>5 2</arraySize><uom>udunits.xml#degreeC</uom><numericType>float</numericType><regExpTransform>s/10/9/ge</regExpTransform><numericTransform>+5</numericTransform><values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>
<NDGArrayGenerator><arraySize>10001</arraySize><uom>udunits.xml#minute</uom><numericType>float</numericType><expression>0:5:50000</expression>
</NDGArrayGenerator>
14
Climate ScienceModelling Language
File extract
<NDGNASAAmesExtract><arraySize>526</arraySize><numericType>double</numericType><fileName>/data/BADC/macehead/mh960606.cf1</fileName><variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth"><arraySize>10000</arraySize><fileName>radar_data.nc</fileName><variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract><arraySize>320 160</arraySize><numericType>double</numericType><fileName>/e40/ggas1992010100rsn.grb</fileName><parameterCode>203</parameterCode><recordNumber>5</ recordNumber><fileOffset>289412</fileOffset>
</NDGGRIBExtract>
15
Climate ScienceModelling Language
Aggregated array• arrays may be aggregated along an ‘existing’ or ‘new’ dimension
<NDGAggregatedArray gml:id="feat05cruisetrack"><arraySize>2 50</arraySize><aggType>new</aggType><aggIndex>1</aggIndex><component>
<NDGNetCDFExtract><arraySize>50</arraySize><fileName>cruisetrack.nc</fileName><variableName>alat</variableName>
</NDGNetCDFExtract></component><component>
<NDGNetCDFExtract><arraySize>50</arraySize><fileName>cruisetrack.nc</fileName><variableName>alon</variableName>
</NDGNetCDFExtract></component>
</NDGAggregatedArray>
16
Climate ScienceModelling Language
Provides semantic abstraction layer
instantiateNetCDF(DatasetID, FeatureID)
<CSML>+writeNetCDF()
CSMLAbstractFeature(SAX) demarshalling
+read()
AbstractFileExtract
<CSML>
filestore
<CSML>
NetCDF
WCS
WFS
OPeNDAP
....
<CSML>
17
Climate ScienceModelling Language
Status:• Initial feature types defined
• First draft application schema complete
• Trial software tooling being coded (parser, netCDF instantiation)
• Initial deployment trial across BODC, BADC datasets
Future:• Separate out wrapper implementation (array descriptors)
• Disallow ‘internal’ dictionaries
• More strongly-typed features?
• Follow (and pursue!) GML evolution, enhance compliance
• Expand tooling
Related work• WMO, IOC, IHO
• MarineXML
• MOTIIVE (INSPIRE)
18
“MarineXML is an initiative of the IOC/IODE of UNESCO to improve marine data exchange within the marine community. The European Commission has provided a funding contribution to this initiative as part of its 5th Framework Programme to undertake a ‘pre-standardisation’ task of identifying the approaches the marine community should adopt regarding XML technology to achieve improved data exchange.”
EU project – MarineXML
<gml:definitionMember> <om:Phenomenon gml:id="taxon"> <gml:description>The taxon name</gml:description> <gml:name codeSpace="http://www.vliz.be">taxon</gml:name> </om:Phenomenon> </gml:definitionMember> </NDGPhenomenonDefinitions> <!--===================================================================--> <gml:FeatureCollection> <!-- ============================================================== --> <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> 'ANTHOZOA',63.1,missing 'Scoloplos armiger',66.1,missing 'Spio filicornis',10,missing 'Spiophanes bombyx',60.3,missing 'Capitellidae',131.8,missing 'Pholoe',10,missing 'Owenia fusiformis',23.4,missing 'Hypereteone lactea',6.8,missing 'Anaitides groenlandica',13.2,missing 'Anaitides mucosa',6.8,missing
“... there is a momentum from organisations such as IHO and WMO to adopt consistent approaches for the vocabulary of their data along the reference implementation of ISO Standards prescribed by the [Open Geospatial Consortium]...”
“The NDG format proved a robust recipient for the data from each community. It produced economical files with few redundant elements, striking about the right balance between weak and strong typing.”
Using CSML
19
Using CSML
Class1
Class2
-End1
1
-End2
*
«datatype»DataType1
conceptual model
GML app schema <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> 'ANTHOZOA',63.1,missing 'Scoloplos armiger',66.1,missing 'Spio filicornis',10,missing 'Spiophanes bombyx',60.3,missing 'Capitellidae',131.8,missing
GML dataset
UGAS
parser
Managing semantics
20
Using CSML
Stack of Builders (for UML meta-model)
• current class, object, attribute• specialised for particular
UMLXML mapping
Builder receives:• filtered SAX events• built object
Builder returns:• built object• new object class• new Builder (for inheritance
through substitutionGroups)
SAX2::DefaultHandler
+notationDecl()+unparsedEntity()
«interface»SAX2::DTDHandler
+setDocumentLocator()+startDocument()+endDocument()+startElement()+endElement()+characters()+processingInstruction()+ignorableWhitespace()+startPrefixMapping()+endPrefixMapping()+skippedEntity()
«interface»SAX2::ContentHandler
+error()+fatalError()+warning()
«interface»SAX2::ErrorHandler
+resolveEntity()
«interface»SAX2::EntityResolver
-process()
ObjectParser
+xmlElement() : BuilderUnion+xmlCharacters() : BuilderUnion+xmlAttribute() : BuilderUnion+processObject() : BuilderUnion
-currentClass-currentObject-currentAttribute
ObjectBuilderbuilderStack
*
ObjectPropertyBuilder ISO19118Builder
AbstractClassAbstractObjectAbstractBuilder
«union»BuilderUnion