Geospatial ETL with Stetl

Geospatial ETL with Stetl - “Taming Your Rich GML”. Just van den Broecke, OSGeo Bolsena Codesprint 2013, Bolsena, Italy, June 4, 2013, www.justobjects.nl

Upload: just-van-den-broecke

Post on 11-May-2015


DESCRIPTION

Stetl (Streaming ETL) is a toolkit for the transformation (ETL) of geospatial data. Stetl builds on existing ETL tools such as GDAL/OGR and XSLT, and its processing is driven from a configuration (.ini) file. Stetl is written in Python and is particularly suited to processing GML. Several INSPIRE transformations have been performed successfully with Stetl. This is an introductory presentation given at the OSGeo Bolsena Codesprint on June 4, 2013. Find more info, downloads and documentation on Stetl at http://stetl.org

TRANSCRIPT

Page 1: Geospatial ETL with Stetl

Geospatial ETL with Stetl - “Taming Your Rich GML”

Just van den Broecke
OSGeo Bolsena Codesprint 2013, Bolsena, Italy
June 4, 2013
www.justobjects.nl

Page 2: Geospatial ETL with Stetl

About Me

Independent Open Source Geospatial Professional
Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep

Just van den Broecke [email protected] www.justobjects.nl

Page 3: Geospatial ETL with Stetl
Page 4: Geospatial ETL with Stetl

OSGeo - Bolsena - 2010

Page 5: Geospatial ETL with Stetl

BOLSENA2012

Page 6: Geospatial ETL with Stetl

ALL OVER? (“ALLES VORBEI?”)

BOLSENA2012

Page 7: Geospatial ETL with Stetl

BOLSENA2012

Page 8: Geospatial ETL with Stetl

We have a Problem

Page 9: Geospatial ETL with Stetl

The Rich GML Problem

Page 10: Geospatial ETL with Stetl

Rich GML = Complex Mess

Page 11: Geospatial ETL with Stetl

INSPIRE
Dutch National DSs
AFIS-ALKIS-ATKIS

Page 12: Geospatial ETL with Stetl

“Semi GML” e.g. Dutch Addresses & Buildings (BAG)

Page 13: Geospatial ETL with Stetl

The Streetname!

Application Schema GML e.g. INSPIRE Addresses

Page 14: Geospatial ETL with Stetl

Complex Model

Transformations

Page 15: Geospatial ETL with Stetl

100+ MB GML Files

Page 16: Geospatial ETL with Stetl
Page 17: Geospatial ETL with Stetl

Millions of Objects

Page 18: Geospatial ETL with Stetl

10s of Millions of <Elements>

Page 19: Geospatial ETL with Stetl

Multiple Transformation Steps

Page 20: Geospatial ETL with Stetl

Solution is Spatial ETL

Page 21: Geospatial ETL with Stetl

A.K.A.

Page 22: Geospatial ETL with Stetl

Thank You for your

Attention!

Page 23: Geospatial ETL with Stetl

But what about... FOSS? ... Stetl?

Page 24: Geospatial ETL with Stetl

FOSS ETL - Lower Level

Each Powerful by Itself

ogr2ogr

Page 25: Geospatial ETL with Stetl

FOSS ETL - High Level

Page 26: Geospatial ETL with Stetl

FOSS ETL - DIY ? (No!)

Page 27: Geospatial ETL with Stetl

FOSS ETL - How to Combine?

[Figure: logos of lower-level tools, including ogr2ogr, joined as a sum: tool + tool + tool = ?]

Page 28: Geospatial ETL with Stetl

Example - 2011 INSPIRE-FOSS

http://inspire.kademo.nl/doc/design-etl.html

Good ideas but hard to scale and reuse. Need Framework

Page 29: Geospatial ETL with Stetl

FOSS ETL - Add Python to Equation

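[Figure: logos of lower-level tools, including ogr2ogr, wrapped in parentheses with Python added: (tool + tool + tool) = ?]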

Page 30: Geospatial ETL with Stetl

[Figure: logos of lower-level tools, including ogr2ogr, wrapped in parentheses: (tool + tool + tool) = Stetl]

Page 31: Geospatial ETL with Stetl

Stetl = Simple, Streaming, Spatial, Speedy ETL

Page 32: Geospatial ETL with Stetl

Process Chain

[Diagram: Input, Filter, Filter, Output; data: gml]

Stetl concepts
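
The process chain idea (one Input, any number of Filters, one Output) can be sketched in a few lines of plain Python. This is a toy illustration only, not how Stetl itself is implemented; the names `read_lines`, `upper_filter`, `tag_filter` and `run_chain` are hypothetical.

```python
# Toy process chain: Input -> Filter(s) -> Output.
# All names here are hypothetical and are not part of Stetl.

def read_lines(text):
    """Input: yield the non-empty lines of a text blob, one at a time."""
    for line in text.splitlines():
        if line.strip():
            yield line.strip()

def upper_filter(item):
    """Filter: transform one item and hand it on."""
    return item.upper()

def tag_filter(item):
    """Filter: wrap one item in an XML-like element."""
    return "<name>%s</name>" % item

def run_chain(source, filters, sink):
    """Push every item from the source through all filters, then into the sink."""
    for item in source:
        for apply_filter in filters:
            item = apply_filter(item)
        sink(item)

collected = []
run_chain(read_lines("bolsena\n\nrome\n"), [upper_filter, tag_filter], collected.append)
print(collected)  # ['<name>BOLSENA</name>', '<name>ROME</name>']
```

Each component only knows about the item it receives, which is what makes filters interchangeable and chains reconfigurable.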

Page 33: Geospatial ETL with Stetl

Speed: Streaming

Input Filter Output

gml

Stetl concepts
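
To illustrate why streaming helps with very large GML files, here is a minimal sketch using only the Python standard library (`xml.etree.ElementTree.iterparse`). It is an assumption about the general technique, not Stetl's own code; the function name `stream_feature_members` and the sample document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

def stream_feature_members(source):
    """Yield one gml:featureMember element at a time from a file-like object."""
    tag = "{%s}featureMember" % GML_NS
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Drop the element's children once the consumer is done with it,
            # so memory use does not grow with the number of features.
            elem.clear()

# Made-up sample data for illustration.
SAMPLE = b"""<?xml version="1.0"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><city>Amsterdam</city></gml:featureMember>
  <gml:featureMember><city>Bolsena</city></gml:featureMember>
</gml:FeatureCollection>"""

names = [fm.find("city").text for fm in stream_feature_members(io.BytesIO(SAMPLE))]
print(names)  # ['Amsterdam', 'Bolsena']
```

The consumer sees one feature at a time, so a file with millions of objects never has to be held in memory as a whole.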

Page 34: Geospatial ETL with Stetl

Speed: Going Native

[Diagram: Input, Filter, Output; data: gml. sETL components make calls to native C libs/progs such as ogr2ogr]

Stetl concepts
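
"Going native" means handing the heavy lifting to compiled tools. A minimal sketch of calling ogr2ogr from Python with `subprocess` might look as follows. The helper names, the file path `input/cities.gml` and the connection string are assumptions for illustration; only the ogr2ogr options `-f` and `-overwrite` and the destination-then-source argument order are taken from the ogr2ogr command line itself.

```python
import shutil
import subprocess

def build_ogr2ogr_cmd(gml_path, pg_conn, overwrite=True):
    """Build an ogr2ogr argument list that loads a GML file into PostGIS."""
    cmd = ["ogr2ogr"]
    if overwrite:
        cmd.append("-overwrite")
    # ogr2ogr expects the destination datasource first, then the source.
    cmd += ["-f", "PostgreSQL", "PG:" + pg_conn, gml_path]
    return cmd

def run_ogr2ogr(cmd):
    """Run the command; raises if ogr2ogr is missing or exits with an error."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    subprocess.run(cmd, check=True)

# Hypothetical path and connection parameters.
cmd = build_ogr2ogr_cmd("input/cities.gml", "dbname=gis host=localhost")
print(" ".join(cmd))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with the spaces inside the `PG:` connection string.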

Page 35: Geospatial ETL with Stetl

Example: GML to PostGIS

[Diagram: Reader, XML Splitter, ogr2ogr; data: gml]

Stetl concepts

Page 36: Geospatial ETL with Stetl

Example: INSPIRE Model Transform

[Diagram: ogr2ogr, XSLT, Writer; data: gml]

Stetl concepts

Page 37: Geospatial ETL with Stetl

Example: deegree Store

[Diagram: ogr2ogr, XSLT, deegree Writer]

Stetl concepts

Page 38: Geospatial ETL with Stetl

Process Chain - How?

Input Filters Output

Stetl concepts

Page 39: Geospatial ETL with Stetl

Example: XML to Shape

The Source

Page 40: Geospatial ETL with Stetl

Example: XML to Shape

The XSLT Script

Page 41: Geospatial ETL with Stetl

Example: XML to Shape

XSLT Transform to GML

Page 42: Geospatial ETL with Stetl

Example: XML to Shape

[Diagram: XML Input, XSLT Filter, ogr2ogr Output]

Page 43: Geospatial ETL with Stetl

Example: XML to Shape

The SETL Chain Config File

[Config screenshot labels: Process Chain, Reader, XSLT, ogr2ogr]
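
The config file itself is only shown as an image on this slide. The following sketch shows how such an .ini file could be read with Python's standard `configparser` to recover the chain and the class behind each section. The `chains` syntax, the `class` and `script` keys and `inputs.fileinput.XmlFileInput` appear elsewhere in these slides; the other section names, class paths and the XSLT file name are assumptions, and `parse_chains` is a hypothetical helper, not Stetl's parser.

```python
import configparser

CONFIG = """
[etl]
chains = input_xml_file|transformer_xslt|output_ogr2ogr_shape

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
# Assumed class path and script name, for illustration only
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_ogr2ogr_shape]
# Assumed class path, for illustration only
class = outputs.ogroutput.Ogr2OgrOutput
"""

def parse_chains(config_text):
    """Return a list of chains; each chain is a list of (section, class) pairs."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    chains = []
    # Chains are comma-separated; components within a chain are joined by '|'.
    for chain in parser.get("etl", "chains").split(","):
        sections = [name.strip() for name in chain.split("|")]
        chains.append([(name, parser.get(name, "class")) for name in sections])
    return chains

for section, class_path in parse_chains(CONFIG)[0]:
    print(section, "->", class_path)
```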

Page 44: Geospatial ETL with Stetl

Example: XsltFilter Python

    from util import Util, etree
    from filter import Filter
    from packet import FORMAT

    log = Util.get_log("xsltfilter")

    class XsltFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

            self.xslt_file_path = self.cfg.get('script')
            self.xslt_file = open(self.xslt_file_path, 'r')
            # Parse XSLT file only once
            self.xslt_doc = etree.parse(self.xslt_file)
            self.xslt_obj = etree.XSLT(self.xslt_doc)
            self.xslt_file.close()

        def invoke(self, packet):
            if packet.data is None:
                return packet
            return self.transform(packet)

        def transform(self, packet):
            packet.data = self.xslt_obj(packet.data)
            log.info("XSLT Transform OK")
            return packet

Page 45: Geospatial ETL with Stetl

Example Components

Input        Filters            Output
XMLFile      XSLT               GMLFile
ogr2gml      GMLSplitter        gml2ogr
LineStream   XMLValidator       WFS-T
deegree*     FeatureExtractor   deegree*
YourInput    YourFilter         YourOutput

Stetl concepts

Page 46: Geospatial ETL with Stetl

Your Own Components

Stetl concepts

Step 1 - Define Class

    class MyFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

        def invoke(self, packet):
            log.info("CALLING MyFilter OK!!!!")
            return packet

Step 2 - Config Class

    [etl]
    chains = input_xml_file|my_filter|output_std

    [input_xml_file]
    class = inputs.fileinput.XmlFileInput
    file_path = input/cities.xml

    # My custom component
    [my_filter]
    class = my.myfilter.MyFilter

    [output_std]
    class = outputs.standardoutput.StandardXmlOutput
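
A config entry such as `class = my.myfilter.MyFilter` implies that a dotted string is turned into a Python class at run time. One common way to do that is with `importlib`; the sketch below shows the general technique and is not taken from Stetl's source. The helper name `load_class` is hypothetical, and a standard-library class stands in for `my.myfilter.MyFilter` so the example runs anywhere.

```python
import importlib

def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Stand-in for a custom component class named in a config file.
cls = load_class("collections.OrderedDict")
instance = cls(a=1)
print(cls.__name__, dict(instance))  # OrderedDict {'a': 1}
```

With this pattern, adding a component needs no change to the framework: the class only has to be importable and named in the config.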

Page 47: Geospatial ETL with Stetl

Data Structures

Stetl concepts

✴ Components exchange Packets
✴ Packet contains data and status
✴ Data formats:
    xml_line_stream
    etree_doc
    etree_feature_array
    xml_doc_as_string
    any
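
The packet and format concepts can be sketched as a small data class plus a check that neighbouring components agree on a format. The format names come from the slide; the `Packet` fields, the `end_of_stream` status flag and the functions `compatible` and `check_chain` are assumptions made for this illustration, not Stetl's actual classes.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class Packet:
    data: Any = None             # payload, e.g. a parsed document or a line of XML
    format: str = "any"          # one of the data format names listed above
    end_of_stream: bool = False  # assumed example of a status flag

def compatible(produces: str, consumes: str) -> bool:
    """'any' matches everything; otherwise the formats must be equal."""
    return "any" in (produces, consumes) or produces == consumes

def check_chain(components: List[Tuple[str, str, str]]) -> None:
    """components: (name, consumes, produces) tuples in chain order."""
    for (up, _, produces), (down, consumes, _) in zip(components, components[1:]):
        if not compatible(produces, consumes):
            raise ValueError("%s produces %s but %s consumes %s"
                             % (up, produces, down, consumes))

chain = [
    ("input_xml_file", "any", "etree_doc"),
    ("my_filter", "etree_doc", "etree_doc"),
    ("output_std", "etree_doc", "any"),
]
check_chain(chain)  # passes silently: every hand-over matches
packet = Packet(data="<city/>", format="xml_doc_as_string")
```

Declaring what each component consumes and produces lets a mismatched chain fail at start-up instead of halfway through a large file.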

Page 48: Geospatial ETL with Stetl

deegree Integration

Stetl concepts

✴ Input
    DeegreeBlobstoreInput
✴ Output
    DeegreeBlobstoreOutput
    DeegreeFSLoaderOutput
    WFSTOutput

Page 49: Geospatial ETL with Stetl

Cases

✴ INSPIRE Download Services
    publish to deegree store (WFS)
    GML files (for Atom Feed)
✴ National GML Datasets
    GML to PostGIS (Top10NL, BGT)

Page 50: Geospatial ETL with Stetl

Top10NL Extract

    [etl]
    chains = input_sql_pre|schema_name_filter|output_postgres,
             input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
             input_sql_post|schema_name_filter|output_postgres

    # Pre SQL file inputs to be executed
    [input_sql_pre]
    class = inputs.fileinput.StringFileInput
    file_path = sql/drop-tables.sql,sql/create-schema.sql

    # Post SQL file inputs to be executed
    [input_sql_post]
    class = inputs.fileinput.StringFileInput
    file_path = sql/delete-duplicates.sql

    # Generic filter to substitute Python-format string values like {schema} in string
    [schema_name_filter]
    class = filters.stringfilter.StringSubstitutionFilter
    # format args {schema} is schema name
    format_args = schema:{schema}

    [output_postgres]
    class = outputs.dboutput.PostgresDbOutput
    database = {database}
    host = {host}
    port = {port}
    user = {user}
    password = {password}
    schema = {schema}

    # The source input file(s) from dir and produce gml:featureMember elements
    [input_big_gml_files]
    class = inputs.fileinput.XmlElementStreamerFileInput
    file_path = {gml_files}
    element_tags = featureMember
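
The `schema_name_filter` comment mentions substituting Python-format string values such as `{schema}`. A minimal sketch of that idea with `str.format` is shown below. The helper names, the schema name `top10nl`, the SQL statement and the whitespace-separated `key:value` syntax for multiple arguments are assumptions for illustration, not Stetl's documented behaviour.

```python
def parse_format_args(spec):
    """Turn 'schema:top10nl host:localhost' into {'schema': 'top10nl', ...}."""
    return dict(pair.split(":", 1) for pair in spec.split())

def substitute(text, format_args):
    """Fill {name} placeholders in text using Python's str.format."""
    return text.format(**format_args)

# Hypothetical schema name and SQL statement.
args = parse_format_args("schema:top10nl")
sql = substitute("DROP SCHEMA IF EXISTS {schema} CASCADE;", args)
print(sql)  # DROP SCHEMA IF EXISTS top10nl CASCADE;
```

One caveat of `str.format` is that literal braces in the SQL text would have to be doubled (`{{` and `}}`) to survive substitution.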

Page 51: Geospatial ETL with Stetl

Case: INSPIRE DL Services - Dutch Addresses

[Diagram labels: Source <GML> (Dutch Addresses + Buildings); NLExtract; Stetl; deegree blobstore; deegree WFS; Stetl; INSPIRE <GML>; Atom Feed; INSPIRE Addresses]