Page 1: Geospatial ETL with Stetl

Geospatial ETL with Stetl

“Taming Your Rich GML”

Just van den Broecke
OSGeo Bolsena Codesprint 2013, Bolsena, Italy

June 4, 2012
www.justobjects.nl

Page 2: Geospatial ETL with Stetl

About Me

Independent Open Source Geospatial Professional

Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep

Just van den Broecke
www.justobjects.nl

Page 3: Geospatial ETL with Stetl
Page 4: Geospatial ETL with Stetl

OSGeo - Bolsena - 2010

Page 5: Geospatial ETL with Stetl

BOLSENA 2012

Page 6: Geospatial ETL with Stetl

ALL OVER? (German: “ALLES VORBEI?”)

BOLSENA 2012

Page 7: Geospatial ETL with Stetl

BOLSENA 2012

Page 8: Geospatial ETL with Stetl

We have a Problem

Page 9: Geospatial ETL with Stetl

The Rich GML Problem

Page 10: Geospatial ETL with Stetl

Rich GML = Complex Mess

Page 11: Geospatial ETL with Stetl

INSPIRE
Dutch National Datasets
AFIS-ALKIS-ATKIS

Page 12: Geospatial ETL with Stetl

“Semi GML”, e.g. Dutch Addresses & Buildings (BAG)

Page 13: Geospatial ETL with Stetl

The Streetname!

Application Schema GML, e.g. INSPIRE Addresses

Page 14: Geospatial ETL with Stetl

Complex Model

Transformations

Page 15: Geospatial ETL with Stetl

100+ MB GML Files

Page 16: Geospatial ETL with Stetl
Page 17: Geospatial ETL with Stetl

Millions of Objects

Page 18: Geospatial ETL with Stetl

10s of Millions of <Elements>

Page 19: Geospatial ETL with Stetl

Multiple Transformation Steps

Page 20: Geospatial ETL with Stetl

Solution is Spatial ETL

Page 21: Geospatial ETL with Stetl

A.K.A.

Page 22: Geospatial ETL with Stetl

Thank You for your Attention!

Page 23: Geospatial ETL with Stetl

But what about... FOSS? ... Stetl?

Page 24: Geospatial ETL with Stetl

FOSS ETL - Lower Level

Each Powerful by Itself

ogr2ogr

Page 25: Geospatial ETL with Stetl

FOSS ETL - High Level

Page 26: Geospatial ETL with Stetl

FOSS ETL - DIY ? (No!)

Page 27: Geospatial ETL with Stetl

FOSS ETL - How to Combine?

[Figure: ogr2ogr + (other tool logos) = ?]

Page 28: Geospatial ETL with Stetl

Example - 2011 INSPIRE-FOSS

http://inspire.kademo.nl/doc/design-etl.html

Good ideas, but hard to scale and reuse. Need a Framework.

Page 29: Geospatial ETL with Stetl

FOSS ETL - Add Python to Equation

[Figure: ( ogr2ogr + (other tool logos) ) = ?]

Page 30: Geospatial ETL with Stetl

[Figure: ( ogr2ogr + (other tool logos) ) = Stetl]

Page 31: Geospatial ETL with Stetl

Stetl = Simple Streaming Spatial Speedy ETL

Page 32: Geospatial ETL with Stetl

Process Chain

[Diagram: Input → Filter → Filter → Output, processing gml]

Stetl concepts
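The process chain idea can be illustrated in a few lines of plain Python: each component receives data, does one job, and hands its result to the next. This is a minimal sketch, not Stetl's actual classes; `Component`, `ListInput`, `UpperCaseFilter`, `CollectOutput` and `run_chain` are hypothetical names chosen for illustration.

```python
from typing import Any, List


class Component:
    """Hypothetical base class: each component receives data and returns data."""

    def invoke(self, data: Any) -> Any:
        raise NotImplementedError


class ListInput(Component):
    """Input: ignores incoming data and produces its own records."""

    def __init__(self, records: List[str]):
        self.records = records

    def invoke(self, data: Any) -> Any:
        return list(self.records)


class UpperCaseFilter(Component):
    """Filter: transforms every record it receives."""

    def invoke(self, data: Any) -> Any:
        return [record.upper() for record in data]


class CollectOutput(Component):
    """Output: keeps the final result in memory instead of writing a file or database."""

    def __init__(self) -> None:
        self.result: Any = None

    def invoke(self, data: Any) -> Any:
        self.result = data
        return data


def run_chain(components: List[Component]) -> Any:
    """Feed each component's result into the next one: Input -> Filter(s) -> Output."""
    data: Any = None
    for component in components:
        data = component.invoke(data)
    return data


output = CollectOutput()
run_chain([ListInput(["amsterdam", "bolsena"]), UpperCaseFilter(), output])
print(output.result)  # ['AMSTERDAM', 'BOLSENA']
```

Keeping components this small is what makes them reusable: a new chain is just a different list of the same parts.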

Page 33: Geospatial ETL with Stetl

Speed: Streaming

[Diagram: gml streaming through Input → Filter → Output]
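Streaming means a component handles one piece of a large document at a time instead of loading the whole file into memory. A minimal standard-library sketch of that idea uses `xml.etree.ElementTree.iterparse`; it assumes features are wrapped in `gml:featureMember` elements, and the helper `stream_features` and the sample `app:City` data are hypothetical, not Stetl code.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def stream_features(source, local_name: str = "featureMember") -> Iterator[ET.Element]:
    """Yield matching elements one at a time; `source` is a filename or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Compare only the local part of a namespaced tag like {http://www.opengis.net/gml}featureMember
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            yield elem
            # Once the consumer has processed the element, drop its children to save memory
            elem.clear()


GML = b"""<?xml version="1.0"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:app="http://example.org/app">
  <gml:featureMember><app:City><app:name>Bolsena</app:name></app:City></gml:featureMember>
  <gml:featureMember><app:City><app:name>Amsterdam</app:name></app:City></gml:featureMember>
</gml:FeatureCollection>"""

names = []
for member in stream_features(io.BytesIO(GML)):
    # Process each feature while it is still populated
    names.append(member.find(".//{http://example.org/app}name").text)
print(names)  # ['Bolsena', 'Amsterdam']
```

Because elements are cleared after use, the consumer must extract what it needs inside the loop rather than collecting the elements themselves.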

Page 34: Geospatial ETL with Stetl

Speed: Going Native

[Diagram: Input → Filter → Output (gml); the sETL components make calls to native C libs/programs such as ogr2ogr]
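In the "Going Native" pattern the Python component only orchestrates, while a native program such as ogr2ogr does the heavy work. A sketch of that pattern with Python's `subprocess` module follows; the helper names and file paths are hypothetical, and only the standard ogr2ogr options `-f` (output format) and `-nln` (new layer name) are used.

```python
import shutil
import subprocess
from typing import List


def ogr2ogr_command(src: str, dest: str, fmt: str = "ESRI Shapefile", layer_name: str = "") -> List[str]:
    """Build an argument list of the form: ogr2ogr -f <format> [-nln <name>] <dest> <src>."""
    cmd = ["ogr2ogr", "-f", fmt]
    if layer_name:
        cmd += ["-nln", layer_name]  # name of the layer to create in the destination
    cmd += [dest, src]  # ogr2ogr takes the destination before the source
    return cmd


def run_ogr2ogr(cmd: List[str]) -> None:
    """Delegate the heavy lifting to the native GDAL/OGR program."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    # check=True raises CalledProcessError if ogr2ogr exits with a non-zero status
    subprocess.run(cmd, check=True)


cmd = ogr2ogr_command("cities.gml", "cities.shp", layer_name="cities")
print(cmd)
# run_ogr2ogr(cmd)  # only works where GDAL is installed
```

For a PostGIS target the format would be `PostgreSQL` with a `PG:"dbname=..."` connection string as the destination.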

Page 35: Geospatial ETL with Stetl

Example: GML to PostGIS

[Diagram: gml → Reader → XML Splitter → ogr2ogr]
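The splitter step keeps each ogr2ogr run manageable by cutting one huge GML stream into smaller documents. One way to do this, sketched here with hypothetical names and a fixed batch size, is to group `featureMember` elements into small `gml:FeatureCollection` documents:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List

GML_NS = "http://www.opengis.net/gml"


def _wrap(batch: List[ET.Element]) -> ET.Element:
    """Put a batch of featureMember elements inside a new FeatureCollection root."""
    collection = ET.Element(f"{{{GML_NS}}}FeatureCollection")
    collection.extend(batch)
    return collection


def split_into_collections(members: Iterable[ET.Element], max_features: int) -> Iterator[ET.Element]:
    """Group featureMember elements into FeatureCollection documents of at most max_features each."""
    batch: List[ET.Element] = []
    for member in members:
        batch.append(member)
        if len(batch) == max_features:
            yield _wrap(batch)
            batch = []
    if batch:  # emit the remaining, smaller batch
        yield _wrap(batch)


members = [ET.Element(f"{{{GML_NS}}}featureMember") for _ in range(5)]
docs = list(split_into_collections(members, max_features=2))
print([len(doc) for doc in docs])  # [2, 2, 1]
```

In a real chain each batch would typically be serialized (for example with `ET.tostring`) before being handed to ogr2ogr.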

Page 36: Geospatial ETL with Stetl

Example: INSPIRE Model Transform

[Diagram: ogr2ogr → XSLT → Writer, producing gml]

Page 37: Geospatial ETL with Stetl

Example: deegree Store

[Diagram: ogr2ogr → XSLT → deegree Writer]

Page 38: Geospatial ETL with Stetl

Process Chain - How?

Input → Filters → Output

Page 39: Geospatial ETL with Stetl

Example: XML to Shape

The Source

Page 40: Geospatial ETL with Stetl

Example: XML to Shape

The XSLT Script

Page 41: Geospatial ETL with Stetl

Example: XML to Shape

XSLT Transform to GML

Page 42: Geospatial ETL with Stetl

Example: XML to Shape

XML Input → XSLT Filter → ogr2ogr Output

Page 43: Geospatial ETL with Stetl

Example: XML to Shape

The SETL Chain Config File

[Figure: chain config file, annotated with its Process Chain, Reader, XSLT and ogr2ogr sections]
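The chain config is an INI-style file whose `[etl]` section lists each chain as pipe-separated section names, and each section names a component `class`. As a hedged sketch, Python's standard `configparser` can read such a file and resolve every chain; the XSLT script name, the output class path and the `dest_data_source` option below are hypothetical placeholders, not the contents of the slide's file.

```python
import configparser
from typing import List, Tuple

CONFIG = """
[etl]
chains = input_xml_file|transformer_xslt|output_ogr2ogr

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

# Hypothetical filter and output class paths
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_ogr2ogr]
class = outputs.ogroutput.Ogr2OgrOutput
dest_data_source = output/cities.shp
"""


def parse_chains(cfg: configparser.ConfigParser) -> List[List[Tuple[str, str]]]:
    """Turn 'a|b|c, d|e' into [[(section, class), ...], ...]."""
    chains = []
    for chain_str in cfg.get("etl", "chains").split(","):
        names = [name.strip() for name in chain_str.split("|")]
        chains.append([(name, cfg.get(name, "class")) for name in names])
    return chains


cfg = configparser.ConfigParser()
cfg.read_string(CONFIG)
for chain in parse_chains(cfg):
    for name, cls in chain:
        print(f"{name:18} -> {cls}")
```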

Page 44: Geospatial ETL with Stetl

Example: XsltFilter Python

```python
from util import Util, etree
from filter import Filter
from packet import FORMAT

log = Util.get_log("xsltfilter")


class XsltFilter(Filter):
    """Applies the XSLT script named by the 'script' config option to each etree document."""

    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

        self.xslt_file_path = self.cfg.get('script')
        # Parse XSLT file only once, when the chain is built
        self.xslt_doc = etree.parse(self.xslt_file_path)
        self.xslt_obj = etree.XSLT(self.xslt_doc)

    def invoke(self, packet):
        # No data to transform: pass the packet on unchanged
        if packet.data is None:
            return packet
        return self.transform(packet)

    def transform(self, packet):
        # Apply the compiled stylesheet; the result replaces the packet data
        packet.data = self.xslt_obj(packet.data)
        log.info("XSLT Transform OK")
        return packet
```

Page 45: Geospatial ETL with Stetl

Example Components

| Input      | Filters          | Output     |
|------------|------------------|------------|
| XMLFile    | XSLT             | GMLFile    |
| ogr2gml    | GMLSplitter      | gml2ogr    |
| LineStream | XMLValidator     | WFS-T      |
| deegree*   | FeatureExtractor | deegree*   |
| YourInput  | YourFilter       | YourOutput |

Page 46: Geospatial ETL with Stetl

Your Own Components

Step 1 - Define Class

```python
# my/myfilter.py (module path referenced as my.myfilter.MyFilter in the config)
from util import Util
from filter import Filter
from packet import FORMAT

log = Util.get_log("myfilter")


class MyFilter(Filter):
    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

    def invoke(self, packet):
        # A real filter would inspect or modify packet.data here
        log.info("CALLING MyFilter OK!!!!")
        return packet
```

Step 2 - Config Class

```ini
[etl]
chains = input_xml_file|my_filter|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

# My custom component
[my_filter]
class = my.myfilter.MyFilter

[output_std]
class = outputs.standardoutput.StandardXmlOutput
```
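A config entry such as `class = my.myfilter.MyFilter` names a Python class by its dotted module path. A plausible way for a framework to turn such a string into a class is `importlib`, sketched below; `load_class` is a hypothetical helper, not Stetl's own loader, and the demo resolves a standard-library class because `my.myfilter` is only an example module.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve a 'package.module.ClassName' string to the class object it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a dotted class path: {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

The loaded class would then be instantiated with the constructor arguments shown in Step 1, i.e. `(configdict, section)`.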

Page 47: Geospatial ETL with Stetl

Data Structures

✴ Components exchange Packets
✴ Packet contains data and status
✴ Data formats:
   xml_line_stream, etree_doc, etree_feature_array, xml_doc_as_string, any
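To make the packet idea concrete, here is a minimal hypothetical model: a packet carries data plus a format tag and a status flag, and two components fit together when the producer's format matches what the consumer accepts, or when either side declares `any`. The format names come from the list above; the `Packet` fields (`format`, `end_of_stream`) and `can_connect` are assumptions for illustration, not Stetl's actual API.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Format names from the slide; defining them as plain strings is a sketch assumption
XML_LINE_STREAM = "xml_line_stream"
ETREE_DOC = "etree_doc"
ETREE_FEATURE_ARRAY = "etree_feature_array"
XML_DOC_AS_STRING = "xml_doc_as_string"
ANY = "any"


@dataclass
class Packet:
    """Carries data between components together with simple status information."""
    data: Any = None
    format: Optional[str] = None
    end_of_stream: bool = False  # hypothetical status flag


def can_connect(produces: str, consumes: str) -> bool:
    """Producer and consumer fit if formats match or either side accepts 'any'."""
    return ANY in (produces, consumes) or produces == consumes


packet = Packet(data="<cities/>", format=XML_DOC_AS_STRING)
print(can_connect(ETREE_DOC, ETREE_DOC))          # True
print(can_connect(XML_DOC_AS_STRING, ETREE_DOC))  # False
print(can_connect(ETREE_DOC, ANY))                # True
```

Checking formats when a chain is built catches wiring mistakes before any data is processed.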

Page 48: Geospatial ETL with Stetl

deegree Integration

✴ Input: DeegreeBlobstoreInput
✴ Output: DeegreeBlobstoreOutput, DeegreeFSLoaderOutput, WFSTOutput

Page 49: Geospatial ETL with Stetl

Cases

✴ INSPIRE Download Services: publish to deegree store (WFS) and to GML files (for Atom Feed)
✴ National GML Datasets: GML to PostGIS (Top10NL, BGT)

Page 50: Geospatial ETL with Stetl

```ini
[etl]
chains = input_sql_pre|schema_name_filter|output_postgres,
         input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
         input_sql_post|schema_name_filter|output_postgres

# Pre SQL file inputs to be executed
[input_sql_pre]
class = inputs.fileinput.StringFileInput
file_path = sql/drop-tables.sql,sql/create-schema.sql

# Post SQL file inputs to be executed
[input_sql_post]
class = inputs.fileinput.StringFileInput
file_path = sql/delete-duplicates.sql

# Generic filter to substitute Python-format string values like {schema} in a string
[schema_name_filter]
class = filters.stringfilter.StringSubstitutionFilter
# format_args: {schema} is the schema name
format_args = schema:{schema}

[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
port = {port}
user = {user}
password = {password}
schema = {schema}

# Read the source input file(s) from dir and produce gml:featureMember elements
[input_big_gml_files]
class = inputs.fileinput.XmlElementStreamerFileInput
file_path = {gml_files}
element_tags = featureMember
```

Top10NL Extract
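The `schema_name_filter` above relies on Python format strings: SQL text containing `{schema}` is filled in with the value given by `format_args`. A hedged sketch of that substitution follows; the space-separated `key:value` parsing in `parse_format_args` is an assumption extrapolated from the single `schema:{schema}` entry, and the schema name `top10nl` is a hypothetical example value.

```python
from typing import Dict


def parse_format_args(spec: str) -> Dict[str, str]:
    """Parse 'key1:value1 key2:value2' into a dict (assumed syntax)."""
    args = {}
    for pair in spec.split():
        key, _, value = pair.partition(":")
        args[key] = value
    return args


def substitute(sql: str, format_args: Dict[str, str]) -> str:
    """Fill Python-format placeholders such as {schema} in a SQL string."""
    return sql.format(**format_args)


sql = "DROP SCHEMA IF EXISTS {schema} CASCADE;\nCREATE SCHEMA {schema};"
args = parse_format_args("schema:top10nl")
print(substitute(sql, args))
```

One caveat of `str.format` is that any literal braces in the SQL (for example in array or JSON literals) must be doubled as `{{` and `}}`, or the substitution fails.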

Page 51: Geospatial ETL with Stetl

Case: INSPIRE DL Services - Dutch Addresses

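[Diagram: Dutch Addresses + Buildings source <GML> is processed with NLExtract and Stetl into INSPIRE Addresses, stored in a deegree blobstore and served via deegree WFS, and also written by Stetl as INSPIRE <GML> files for an Atom Feed]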

Stetl

