Taming Rich GML with Stetl - A lightweight Python Framework for Geospatial ETL
Just van den Broecke, FOSS4G Nottingham 2013, Sept 21, 2013, www.justobjects.nl

Upload: just-van-den-broecke

Post on 11-May-2015


DESCRIPTION

Presentation given on Sept 21, 2013 at FOSS4G 2013 in Nottingham (UK). Stetl (Streaming ETL) is a lightweight geospatial ETL framework written in Python that integrates transformation tools such as GDAL/OGR, XSLT and PostGIS. Stetl targets ETL cases that involve XML and GML data, such as INSPIRE data harmonization, but other transformations, even non-geospatial ones, are also possible. Stetl is configured declaratively: a configuration file specifies an ETL chain of input, filter and output modules. For speed, Stetl makes native calls to C-level libraries such as libxml2 (via lxml). See more at http://stetl.org. A video recording of this presentation is available on FOSSLC: http://www.fosslc.org/drupal/content/taming-rich-gml-stetl-lightweight-python-framework-geospatial-etl

TRANSCRIPT

Page 1: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Taming Rich GML with Stetl -

A lightweight Python Framework for Geospatial ETL

Just van den Broecke
FOSS4G Nottingham 2013

Sept 21, 2013
www.justobjects.nl

Page 2: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

About Me

Independent Open Source Geospatial Professional

Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep

Just van den Broecke
www.justobjects.nl

Page 3: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

We have a Problem


Page 4: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

The Rich GML Problem


Page 5: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Rich GML = Complex Mess


Page 6: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

INSPIRE
Dutch National Datasets
Germany: AFIS-ALKIS-ATKIS
UK: OS MasterMap

Page 7: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

“Semi GML” e.g. Dutch Addresses & Buildings (BAG)

Arbitrary Nesting

Page 8: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

The Street Name!

A Street Element in an INSPIRE Annex I Address

Page 9: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Complex Model Transformations

Page 10: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

100+ MB GML Files

Page 11: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham


Page 12: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Millions of Objects

Page 13: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

10s of Millions of <Elements>

Page 14: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Multiple Transformation Steps

Page 15: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Solution is Spatial ETL


Page 16: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

But How?

Page 17: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

FOSS ETL - DIY? Maybe

Page 18: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

FOSS ETL - High Level


Page 19: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

FOSS ETL - Lower Level

Each powerful individually but cannot do the entire ETL

ogr2ogr


Page 20: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

FOSS ETL - How to Combine?

[Figure: ogr2ogr + (other tool logos) = ?]

Page 21: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example - 2011 INSPIRE-FOSS

http://inspire.kademo.nl/doc/design-etl.html

Good ideas, but hard to scale and reuse. Need a framework.

Page 22: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

FOSS ETL - Add Python to Equation

[Figure: ( ogr2ogr + other tool logos ) = ?]

Page 23: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

[Figure: ( ogr2ogr + other tool logos ) = Stetl]

Page 24: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Stetl =

Simple, Streaming, Spatial, Speedy

ETL

Page 25: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

GML1

GML2

Stetl

From Barrels of GML to Maps


Page 26: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham


Page 27: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Stetl Concepts

Page 28: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Process Chain

Source -> Input -> Filter -> Filter -> Output -> Target

Stetl concepts

Page 29: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Process Chain

Input -> Filter -> Filter -> Output

gml

Stetl concepts

Page 30: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: GML to PostGIS

gml -> Reader -> ogr2ogr

Stetl concepts

Page 31: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: INSPIRE Model Transform

ogr2ogr -> XSLT -> Writer -> gml

Stetl concepts

Simple Features -> Complex Features

Page 32: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: deegree Store

ogr2ogr -> XSLT -> deegreeWriter

Stetl concepts

Or via WFS-T

Page 33: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Process Chain - How?

Input Filters Output

Stetl concepts

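The chain idea on this slide can be sketched in a few lines of Python. This is only an illustration of the Input -> Filters -> Output pattern, not Stetl's actual code: the class names `Component`, `ListInput`, `UpperFilter`, `CollectOutput` and the function `run_chain` are hypothetical.

```python
class Packet:
    """Carries data from one component to the next (hypothetical helper)."""

    def __init__(self, data=None):
        self.data = data


class Component:
    """Base class: every component takes a packet and returns a packet."""

    def invoke(self, packet):
        return packet


class ListInput(Component):
    """Input: fills the packet with a list of names."""

    def __init__(self, names):
        self.names = names

    def invoke(self, packet):
        packet.data = list(self.names)
        return packet


class UpperFilter(Component):
    """Filter: transforms the packet data in place."""

    def invoke(self, packet):
        packet.data = [name.upper() for name in packet.data]
        return packet


class CollectOutput(Component):
    """Output: stores whatever arrives at the end of the chain."""

    def __init__(self):
        self.result = None

    def invoke(self, packet):
        self.result = packet.data
        return packet


def run_chain(components):
    """Pass one packet through all components, in order."""
    packet = Packet()
    for component in components:
        packet = component.invoke(packet)
    return packet


output = CollectOutput()
run_chain([ListInput(["Amsterdam", "Utrecht"]), UpperFilter(), output])
print(output.result)
```

Because every component shares the same `invoke` signature, filters can be added, removed or reordered without touching the input or output.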

Page 34: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

XML Input

XSLT Filter

ogr2ogr Output

Page 35: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

The Source


Page 36: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

XML Input

Page 37: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

XML Input

XSLT Filter

Page 38: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

Prepare XSLT Script


Page 39: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

XSLT GML Output

Page 40: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

XML Input

XSLT Filter

ogr2ogr Output

Page 41: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XML to Shape

The Stetl Config File

Process Chain

XML Input

XSLT Filter

ogr2ogr Output
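The config file on this slide was an image and is not in the transcript. As a rough sketch of how such a file ties a chain to its components, the Python below reads an INI text with the standard `configparser` module and splits the `chains` value on `|` (and on `,` for multiple chains), the syntax used in the config examples later in this deck. The function `parse_chains` and the script file name `cities2gml.xsl` are assumptions for illustration; this is not Stetl's own parser.

```python
import configparser

# Illustrative config; section names follow those used elsewhere in the deck.
CONFIG_TEXT = """
[etl]
chains = input_xml_file|transformer_xslt|output_ogr2ogr

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
# 'script' is the option read by the XsltFilter; the file name is hypothetical
script = cities2gml.xsl

[output_ogr2ogr]
# component options omitted in this sketch
"""


def parse_chains(config_text):
    """Return a list of chains, each a list of component section names."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    chains = []
    for chain in parser.get("etl", "chains").split(","):
        names = [name.strip() for name in chain.split("|")]
        for name in names:
            # Every name in a chain must have its own config section.
            if not parser.has_section(name):
                raise ValueError("no config section for component: " + name)
        chains.append(names)
    return chains


print(parse_chains(CONFIG_TEXT))
```

Checking that each chain entry has a matching section catches typos in the `chains` line before any data is read.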

Page 42: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Running Stetl

stetl -c etl.cfg


Page 43: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Result Shapefile viewed in QGIS


Page 44: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Installing Stetl

via PyPI

sudo pip install stetl

Deps
• GDAL + Python bindings
• lxml (XML processing)
• psycopg2 (Postgres)

Page 45: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Speed: Streaming

Input Filter Output

gml

Stetl concepts

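Streaming means that features are handed on one at a time instead of loading a whole 100+ MB GML file into memory. The sketch below shows the general technique with the standard library's `xml.etree.ElementTree.iterparse`; Stetl itself builds on lxml, so this is a stand-in for illustration, and the function name `stream_feature_members` and the sample data are assumptions.

```python
import io
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Tiny stand-in for a large GML file.
SAMPLE = b"""<?xml version="1.0"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><city><name>Amsterdam</name></city></gml:featureMember>
  <gml:featureMember><city><name>Utrecht</name></city></gml:featureMember>
</gml:FeatureCollection>
"""


def stream_feature_members(source):
    """Yield each gml:featureMember element once it has been fully parsed."""
    tag = "{%s}featureMember" % GML_NS
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # The consumer is done with this element: drop its children
            # so memory use does not grow with the file size.
            elem.clear()


names = [fm.find("city/name").text
         for fm in stream_feature_members(io.BytesIO(SAMPLE))]
print(names)
```

Each element must be used before the next one is requested, since it is cleared as soon as the generator resumes.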

Page 46: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Speed: Going Native

Input -> Filter -> Output

gml

[Diagram: components labelled ogr2ogr, Stetl, Stetl; calls go to native C libs/progs]

Stetl concepts

Page 47: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example Components

Stetl concepts

Input        Filters            Output
XMLFile      XSLT               GMLFile
ogr2ogr      XMLAssembler       ogr2ogr
LineStream   XMLValidator       WFS-T
deegree*     FeatureExtractor   deegree*
YourInput    YourFilter         YourOutput

Page 48: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Example: XsltFilter (Python)

    from util import Util, etree
    from filter import Filter
    from packet import FORMAT

    log = Util.get_log("xsltfilter")


    class XsltFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section,
                            consumes=FORMAT.etree_doc,
                            produces=FORMAT.etree_doc)

            self.xslt_file_path = self.cfg.get('script')
            self.xslt_file = open(self.xslt_file_path, 'r')
            # Parse XSLT file only once
            self.xslt_doc = etree.parse(self.xslt_file)
            self.xslt_obj = etree.XSLT(self.xslt_doc)
            self.xslt_file.close()

        def invoke(self, packet):
            # Nothing to transform: pass the packet on unchanged
            if packet.data is None:
                return packet
            return self.transform(packet)

        def transform(self, packet):
            # Apply the pre-parsed XSLT to the incoming document
            packet.data = self.xslt_obj(packet.data)
            log.info("XSLT Transform OK")
            return packet

Page 49: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Your Own Components

Stetl concepts

Step 1 - Define Class

    class MyFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section,
                            consumes=FORMAT.etree_doc,
                            produces=FORMAT.etree_doc)

        def invoke(self, packet):
            log.info("CALLING MyFilter OK!!!!")
            return packet

Step 2 - Config Class

    [etl]
    chains = input_xml_file|my_filter|output_std

    [input_xml_file]
    class = inputs.fileinput.XmlFileInput
    file_path = input/cities.xml

    # My custom component
    [my_filter]
    class = my.myfilter.MyFilter

    [output_std]
    class = outputs.standardoutput.StandardXmlOutput
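The `class = my.myfilter.MyFilter` line names a component by its dotted Python path. One common way a framework can turn such a string into a class is `importlib`; the sketch below shows that general technique. The helper `load_class` is hypothetical and is not taken from Stetl's source; a standard-library class is used in the demo so the example runs on its own.

```python
import importlib


def load_class(dotted_path):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got: " + dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demo with a standard-library class instead of a real component.
cls = load_class("collections.OrderedDict")
print(cls.__name__)
```

With this approach a custom component only needs to be importable; no registration step is required beyond the config entry.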

Page 50: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Data Structures

Stetl concepts

• Components exchange Packets
• Packet contains data and status
• Data formats, e.g.:
    xml_line_stream
    etree_doc
    etree_element (feature)
    etree_element_array
    string
    any
    ..
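Since each component declares the format it consumes and the format it produces, neighbouring components in a chain can be checked for compatibility. The following is a minimal sketch of that idea using the format names listed above; the `Component` tuple and the function `first_mismatch` are hypothetical, and treating `any` as a wildcard is an assumption.

```python
from collections import namedtuple

# Hypothetical description of a component: its name and data formats.
Component = namedtuple("Component", "name consumes produces")


def first_mismatch(chain):
    """Return the first (upstream, downstream) name pair whose formats
    do not fit together, or None if the whole chain is consistent."""
    for upstream, downstream in zip(chain, chain[1:]):
        # Assumption: a consumer declaring 'any' accepts every format.
        if downstream.consumes not in ("any", upstream.produces):
            return (upstream.name, downstream.name)
    return None


good_chain = [
    Component("xml_input", None, "etree_doc"),
    Component("xslt_filter", "etree_doc", "etree_doc"),
    Component("std_output", "any", None),
]
bad_chain = [
    Component("line_input", None, "xml_line_stream"),
    Component("xslt_filter", "etree_doc", "etree_doc"),
]

print(first_mismatch(good_chain))
print(first_mismatch(bad_chain))
```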

Page 51: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

deegree Integration

Stetl concepts

• Input
    DeegreeBlobstoreInput
• Output
    DeegreeBlobstoreOutput
    DeegreeFSLoaderOutput
    WFSTOutput

Page 52: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Cases - The Netherlands

• INSPIRE Download Services
    publish to deegree store (WFS)
    generate GML files (for Atom Feed)

• National GML Datasets
    GML to PostGIS (Top10NL, BGT)

Page 53: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Top10NL Extract

    [etl]
    chains = input_sql_pre|schema_name_filter|output_postgres,
             input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
             input_sql_post|schema_name_filter|output_postgres

    # Pre SQL file inputs to be executed
    [input_sql_pre]
    class = inputs.fileinput.StringFileInput
    file_path = sql/drop-tables.sql,sql/create-schema.sql

    # Post SQL file inputs to be executed
    [input_sql_post]
    class = inputs.fileinput.StringFileInput
    file_path = sql/delete-duplicates.sql

    # Generic filter to substitute Python-format string values like {schema} in string
    [schema_name_filter]
    class = filters.stringfilter.StringSubstitutionFilter
    # format args {schema} is schema name
    format_args = schema:{schema}

    [output_postgres]
    class = outputs.dboutput.PostgresDbOutput
    database = {database}
    host = {host}
    port = {port}
    user = {user}
    password = {password}
    schema = {schema}

    # The source input file(s) from dir and produce gml:featureMember elements
    [input_big_gml_files]
    class = inputs.fileinput.XmlElementStreamerFileInput
    file_path = {gml_files}
    element_tags = featureMember

Parameter Substitution
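The `{schema}`-style placeholders use Python's string format syntax, so the substitution step can be pictured with plain `str.format`. This is a minimal sketch only: the function `substitute`, the SQL statement and the value `top10nl` are assumptions for illustration, not content of the actual Top10NL Extract files.

```python
def substitute(text, **params):
    """Replace {name} placeholders in text with the given values."""
    return text.format(**params)


# Hypothetical SQL line as it might appear in a pre/post SQL file.
sql_template = "DROP SCHEMA IF EXISTS {schema} CASCADE;"
print(substitute(sql_template, schema="top10nl"))

# The same mechanism applied to a config value.
print(substitute("schema = {schema}", schema="top10nl"))
```

A placeholder without a matching value raises `KeyError`, which makes a forgotten parameter visible immediately rather than producing broken SQL.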

Page 54: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Top10NL+BAG (Dutch Topo + Buildings)


Page 55: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

BGT - Dutch Large Scale Topo


Page 56: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Case: INSPIRE DL Services - Dutch Addresses

[Diagram: Source <GML> (Dutch Addresses + Buildings) -> NLExtract -> Stetl -> deegree blobstore -> deegree WFS (INSPIRE Addresses); Stetl -> INSPIRE <GML> -> Atom Feed]

Page 57: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Project Status - Sept 21, 2013

• v1.0.4 installable via PyPI
• Documentation on www.stetl.org
• Real world transforms done
• Seeking feedback, support and contributors

Page 58: Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Rich GML Problem Solved?
