postgis right size your spatial data - postgres conf

68
Right-size your PostGIS data Ryan Lambert [email protected]

Upload: others

Post on 16-Oct-2021

23 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PostGIS Right Size your Spatial Data - Postgres Conf

Right-size your PostGIS data

Ryan [email protected]

Page 2: PostGIS Right Size your Spatial Data - Postgres Conf

Owner / CEO @ RustProof Labs

Publishes on blog.rustprooflabs.comDB Developer / Analyst

@ Front Range CC

SELECT *

FROM pgconf.presenter

WHERE id = ‘ryanlambert’

Page 3: PostGIS Right Size your Spatial Data - Postgres Conf

MySQL, mid-2000s

MS SQL Server, 2009PostgreSQL, 2011OpenStreetMap && PostGIS

SELECT *

FROM pgconf.presenter

WHERE id = ‘ryanlambert’

Page 4: PostGIS Right Size your Spatial Data - Postgres Conf

Agenda

Spatial data overviewGIS tasks: Analysis vs. ThematicSimplification strategiesPolygonsLines

Page 5: PostGIS Right Size your Spatial Data - Postgres Conf

GIS Data used…

© OpenStreetMap Contributors

Thank you!

https://www.openstreetmap.org/user/RustProof%20Labs

Page 6: PostGIS Right Size your Spatial Data - Postgres Conf

What is spatial data?

Page 7: PostGIS Right Size your Spatial Data - Postgres Conf

What is spatial data?

Data about the world around us.

“GIS data is still just data!”… we have tools for that

Page 8: PostGIS Right Size your Spatial Data - Postgres Conf

SELECT ST_SetSRID(

ST_Point(-73.9815, 40.7625), 4326);

Page 9: PostGIS Right Size your Spatial Data - Postgres Conf

Spatial data types

POINT (Node)

LINE

POLYGON

Page 10: PostGIS Right Size your Spatial Data - Postgres Conf

Spatial data size

Lines and Polygons can bloat quickly

Similar toJSON

BYTEA

Page 11: PostGIS Right Size your Spatial Data - Postgres Conf

Puts on GIS analyst hat…

Page 12: PostGIS Right Size your Spatial Data - Postgres Conf

Tables in PostGIS

Layers in QGIS

Network Activity

Page 13: PostGIS Right Size your Spatial Data - Postgres Conf
Page 14: PostGIS Right Size your Spatial Data - Postgres Conf
Page 15: PostGIS Right Size your Spatial Data - Postgres Conf

Extra challenge

Feels slow

DB doesn’t always register performance issues

Page 16: PostGIS Right Size your Spatial Data - Postgres Conf

GIS Analyst Tasks

Analysis vs. Thematic

Page 17: PostGIS Right Size your Spatial Data - Postgres Conf

Using spatial data: Analysis

I need coffee, quick, where’s the nearest location?

Distance from buildings to fire hydrants?

Is my house in a flood plain?

Page 18: PostGIS Right Size your Spatial Data - Postgres Conf

Using spatial data: Thematic

Density of drivers around the northeast United States

Regional crime rates

Regional weather maps

Page 19: PostGIS Right Size your Spatial Data - Postgres Conf

Using Spatial Data: Thematic

Visualize trends over an area

Page 20: PostGIS Right Size your Spatial Data - Postgres Conf

Transactional vs. Reporting

OLTP OLAP

ETL

Page 21: PostGIS Right Size your Spatial Data - Postgres Conf

All data is dirty!

Page 22: PostGIS Right Size your Spatial Data - Postgres Conf

OpenStreetMap Data

Data entrySome professionals

Other unpaid, untrained volunteers

Everything between

Variable quality / formatting

Page 23: PostGIS Right Size your Spatial Data - Postgres Conf

Thematic GIS

Page 24: PostGIS Right Size your Spatial Data - Postgres Conf

Large-area polygons

Counties

Zip Codes

States

Countries

Lakes

Page 25: PostGIS Right Size your Spatial Data - Postgres Conf

Counties of Colorado (64)

Page 26: PostGIS Right Size your Spatial Data - Postgres Conf

Large Polygon

Jefferson County, CO

ST_Npoints(way) = 780

ST_MemSize(way) = 12.3 kb

Page 27: PostGIS Right Size your Spatial Data - Postgres Conf
Page 28: PostGIS Right Size your Spatial Data - Postgres Conf

Counties in the western US

SELECT COUNT(*) AS county_count,

AVG(ST_NPoints(way)) AS points_avg, SUM(ST_NPoints(way)) AS points_total, AVG(ST_MemSize(way)) / 1024 AS kb_avg,SUM(ST_MemSize(way)) / 1024 AS kb_total

FROM osm.boundary_polygonWHERE admin_level = '6'

Page 29: PostGIS Right Size your Spatial Data - Postgres Conf

Counties in the western US

# of Counties: 460

Total # of Nodes: 623k

Page 30: PostGIS Right Size your Spatial Data - Postgres Conf

Counties in the western US

# of Counties: 460

Total # of Nodes: 623kAverage # of Nodes: 1,355

Max # of Nodes: 13,832

Page 31: PostGIS Right Size your Spatial Data - Postgres Conf

Counties in the western US

# of Counties: 460

Total # of Nodes: 623kAverage # of Nodes: 1,355

Max # of Nodes: 13,832

Average size per polygon: 21.2 kB

Max size of polygon: 216 kB

Page 32: PostGIS Right Size your Spatial Data - Postgres Conf

Size Matters

Average of 16 bytes / node

Page 33: PostGIS Right Size your Spatial Data - Postgres Conf

What can we do?

Page 34: PostGIS Right Size your Spatial Data - Postgres Conf

PostGIS: ST_Simplify()

“Returns a "simplified" version of the given geometry...”

ST_Simplify(geometry, tolerance)

Higher tolerance == More simplification

Page 35: PostGIS Right Size your Spatial Data - Postgres Conf

In non-spatial terms…

3.14159 ≈ 3.14

Page 36: PostGIS Right Size your Spatial Data - Postgres Conf

PostGIS: ST_Simplify()

SRID 900913

tolerance=10

Page 37: PostGIS Right Size your Spatial Data - Postgres Conf

Polygon simplification

Page 38: PostGIS Right Size your Spatial Data - Postgres Conf

Benefits of Simplified Polygons

Reduced # of nodes by 45-50%

Reduced size on disk by 45-50%

Improved query performance by ~ 40%

Page 39: PostGIS Right Size your Spatial Data - Postgres Conf

Side effects

Reduced accuracy

Potential errorsBefore/AfterError rate in testing: < 0.5%

Page 40: PostGIS Right Size your Spatial Data - Postgres Conf

Too much simplification?

Page 41: PostGIS Right Size your Spatial Data - Postgres Conf

Questions so far?

Page 42: PostGIS Right Size your Spatial Data - Postgres Conf

Large number of small lines

Page 43: PostGIS Right Size your Spatial Data - Postgres Conf

Large number of small lines

OpenStreetMap roads

Interstate

Major highway

Minor highway

Residential roads

Sidewalks

Parking aisles

Hiking trails

Page 44: PostGIS Right Size your Spatial Data - Postgres Conf

I-70 in Colorado

Page 45: PostGIS Right Size your Spatial Data - Postgres Conf

I-70 in Colorado

450 miles

1,647 rows of data

19,281 nodesMIN(ST_NPoints(way)): 2

MAX(ST_NPoints(way)): 217

Page 46: PostGIS Right Size your Spatial Data - Postgres Conf

Aggregate and Simplify

ST_Collect()ST_Simplify()

Page 47: PostGIS Right Size your Spatial Data - Postgres Conf

ST_Collect()

Aggregate functionThink SUM() for spatialGROUP BYrefnamelevelcity

Page 48: PostGIS Right Size your Spatial Data - Postgres Conf

Demo (video)

Render roads and waterways in QGIS

Two windowsRaw data: Upper-left

Thematic: Lower-right

Page 49: PostGIS Right Size your Spatial Data - Postgres Conf
Page 50: PostGIS Right Size your Spatial Data - Postgres Conf

PostGIS to QGIS Rendered

https://blog.rustprooflabs.com/2018/12/postgis-tame-your-data-2

Page 51: PostGIS Right Size your Spatial Data - Postgres Conf

PostGIS to QGIS Rendered

40% faster query in Postgres

80-95% faster in QGIS

QGIS pulls 2k rows at a time

QGIS has to load, process, apply rules, and render

Page 52: PostGIS Right Size your Spatial Data - Postgres Conf

PostGIS to Rendered

80 - 95% improvement!

Page 53: PostGIS Right Size your Spatial Data - Postgres Conf

Faster in-DB spatial operations

Page 54: PostGIS Right Size your Spatial Data - Postgres Conf

Trees (Point) per county (Polygon)

EXPLAIN (ANALYZE, BUFFERS, COSTS) SELECT c.osm_id, c.name, c.way,

COUNT(n.osm_id) AS treesFROM osm.county_polygon cINNER JOIN osm.natural_point n

ON ST_Contains(c.way, n.way)WHERE n."natural" = 'tree’AND c.name = 'Jefferson County’

GROUP BY c.osm_id, c.name, c.way;

Page 55: PostGIS Right Size your Spatial Data - Postgres Conf

Trees per county

Raw

274 MB

Page 56: PostGIS Right Size your Spatial Data - Postgres Conf

Trees per county

Raw

Thematic

274 MB

134 MB (-51% diff in spillage)

Page 57: PostGIS Right Size your Spatial Data - Postgres Conf

Trees per county

Raw

Thematic

Page 58: PostGIS Right Size your Spatial Data - Postgres Conf

Latencies at Human Scale

System EventActual

LatencyScaled

LatencyOne CPU cycle 0.4 ns 1 sLevel 1 cache access 0.9 ns 2 sLevel 2 cache access 2.8 ns 7 sLevel 3 cache access 28 ns 1 minMain memory access (DDR DIMM) ~100 ns 4 minSSD I/O 50–150 μs 1.5–4 daysRotational disk I/O 1–10 ms 1–9 monthsInternet call: San Francisco to New York City 65 ms[3] 5 yearsInternet call: San Francisco to Hong Kong 141 ms3 11 years

https://www.prowesscorp.com/computer-latency-at-a-human-scale/

Page 59: PostGIS Right Size your Spatial Data - Postgres Conf

When to optimize?

ETL

Views / Materialized views

Ad-hoc queries

Page 60: PostGIS Right Size your Spatial Data - Postgres Conf

ETL: PgOSM Project

Started in 2015Transforms osm2pgsql structure to “Layers”

MIT License

https://github.com/rustprooflabs/pgosm

Page 61: PostGIS Right Size your Spatial Data - Postgres Conf

Final Thoughts

Page 62: PostGIS Right Size your Spatial Data - Postgres Conf

Postgres v11…

Covering indexes!

Page 63: PostGIS Right Size your Spatial Data - Postgres Conf

Coming in Postgres v12

Covering GIST indexes

CREATE INDEX gix_road_lineON osm.road_lineUSING GIST (way)INCLUDE (highway, ref);

https://commitfest.postgresql.org/21/1615/

https://github.com/postgres/postgres/commit/f2e403803fe6deb8cff59ea09dff42c6163b2110

Page 64: PostGIS Right Size your Spatial Data - Postgres Conf

Resources

Page 65: PostGIS Right Size your Spatial Data - Postgres Conf

PostGIS Docs

https://postgis.net/docs/reference.htmlhttps://postgis.net/workshops/postgis-intro/

Page 66: PostGIS Right Size your Spatial Data - Postgres Conf

RustProof Labs Blog

PostGIS: Tame your spatial data (Part 1)PostGIS: Tame your spatial data (Part 2)Load OpenStreetMap data to PostGISosm2pgsql on a Raspberry PiPgOSM: Transform OpenStreetMap data in

PostGISPgOSM Transformations explained

Page 67: PostGIS Right Size your Spatial Data - Postgres Conf

Versions used

SELECT version(); PostgreSQL 11.1 (Ubuntu 11.1-1.pgdg16.04+1) on x86_64-pc-linux-gnu,

compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

SELECT PostGIS_Full_version(); POSTGIS="2.5.1 r17027" [EXTENSION] PGSQL="95" (procs need upgrade for

use with "110") GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.2, 08 September 2015" GDAL="GDAL 1.11.3, released 2015/09/16" LIBXML="2.9.3" LIBJSON="0.11.99" LIBPROTOBUF="1.2.1" RASTER

Page 68: PostGIS Right Size your Spatial Data - Postgres Conf

Thank you!

Questions?