open spatial processing
Open Spatial Data
Progress towards a reusable gazetteer
Open Data Group – 16th April 2012
@ianibbo
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Overview
Original Problem
How to transition a central gov't funded aggregation of childcare and positive activities with a budget of >2m / year to an open data* model running on £60/month hardware
Retaining security (of a certain level)
Retaining functionality
(See http://www.madwdata.org.uk/blog/id/394)
2 Major Costs To Mitigate
Large cluster of proprietary OS hosts, ~12 front end web servers, hot backup sql server
Migrated to 1*Pound Host server ~£60/month, server has 2 hard drives, hot backup, off site rsync
Data costs – BPH Address-Point data – Used for geocoding incoming records and lookups on search terms. OS Boundary Line
???
Some Noise
Open Spatial Data Consultation......
Open Spatial Data
Ordnance Survey Open Data
http://www.ordnancesurvey.co.uk/oswebsite/products/os-locator/index.html
Code Point Open
Postcodes to Northing/Easting
OS Locator
Gazetteer of road names (And other features)
Obtained by registering on website, requesting, getting email, following link, …..
The reality of CodePoint Open
The core data is “Open”
Missing the one vital link between CodePoint Open and OS Locator – PostCode → Road Names / Identifiers.
If you're happy to display Postcodes without road names, it's ideal.
Last Mile Problem.
Finding an automated way to link the 2 is hard!
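One naive way to attempt the link is to pick the road whose centre point lies closest to the postcode's easting/northing. The sketch below is only an illustration of that heuristic, not the approach used in the project: the records, the field names and the 250 m cut-off are all made up, and real Code-Point Open and OS Locator files would first need parsing into this shape. It also shows why the problem is hard: a long road's centre point can be far from a postcode that sits on it.

```python
import math

# Hypothetical sample data. Postcodes map to (easting, northing) in metres;
# roads carry a single centre point. All values are invented for illustration.
postcodes = {
    "ZZ1 1AA": (1000, 1000),
    "ZZ9 9ZZ": (5000, 5000),
}
roads = [
    {"name": "HIGH STREET", "easting": 1040, "northing": 1030},
    {"name": "STATION ROAD", "easting": 1300, "northing": 1400},
]


def nearest_road(easting, northing, roads, max_distance_m=250):
    """Return the road whose centre point is closest, or None if none is in range."""
    best = None
    best_dist = max_distance_m
    for road in roads:
        # Grid coordinates are in metres, so plain Euclidean distance is enough.
        dist = math.hypot(road["easting"] - easting, road["northing"] - northing)
        if dist <= best_dist:
            best, best_dist = road, dist
    return best


def guess_road_name(postcode):
    """Look up a postcode and guess a road name; None means no confident match."""
    if postcode not in postcodes:
        return None
    easting, northing = postcodes[postcode]
    road = nearest_road(easting, northing, roads)
    return road["name"] if road else None
```

A result of None is as important as a match here: it marks the postcodes that would still need manual reconciliation or another data source.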
Licensed data is now open, but out of date
Address Point
Still Licensed
Expensive
Probably not that useful anyway for most projects
Problem with focus on “Open Data”
Everyone ends up implementing their own gazetteer
Large scale providers have rate limits and introduce external dependencies / Speed issues
People want local geo-coding (for lots of different reasons).
Having rolled your own gazetteer, you discover you need to handle updates (Full replacements)
It's not an end in itself
Vision
A stand-alone gazetteer web app designed for local network use with features for importing updates from OS, reconciling multiple data sources and performing geo-coding lookups.
Available Tools
Apache SOLR
Long-Standing stalwart of the open data and search community
Schemas slightly clunky
Several spatial options, all with different strengths / weaknesses. Multiple points a problem in some.
ElasticSearch
Schema Free, Apparently Solid Spatial, Multi Points
Good integration with Mongo via Rivers
Problems / Issues
ES Spatial search hard to do directly via a COOL URL
Spatial query syntax is expressive, but complex and needs JSON sub-documents
Need service wrappers
But that's easily done
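A service wrapper of this kind might look like the sketch below: it takes a simple URL with lat, lon and distance parameters and builds the JSON sub-document for a geo_distance search. The URL pattern, the function name and the "location" field name are assumptions made for illustration, and the outer envelope of the query (here a bool/filter) differs between ElasticSearch versions, so treat the shape as indicative only.

```python
from urllib.parse import urlparse, parse_qs


def geo_query_from_url(url, field="location"):
    """Translate ?lat=..&lon=..&distance=.. into a geo_distance query body.

    `field` is the name of the geo_point field in the index (assumed here).
    """
    params = parse_qs(urlparse(url).query)
    lat = float(params["lat"][0])
    lon = float(params["lon"][0])
    # Fall back to a 1km radius when the caller gives no distance.
    distance = params.get("distance", ["1km"])[0]
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


# Hypothetical friendly URL exposed by the wrapper.
body = geo_query_from_url("http://localhost/gaz/near?lat=53.38&lon=-1.47&distance=5km")
```

The returned dictionary can be serialised with json.dumps and posted to the search endpoint, so callers only ever see the short URL.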
Updates!
Missed Level of Abstraction (Common to many open data sets?)
Source
LocalCopy
Compare
Processing
NoSQL like Mongo is ideal for this
ES Ideal for this
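The Source / LocalCopy / Compare / Processing steps can be sketched as a diff between a full replacement file and the copy already held locally. This is a minimal illustration under stated assumptions: records are plain dictionaries keyed by some stable identifier, and the function names and sample records are hypothetical rather than taken from the project code.

```python
def compare(local_copy, source):
    """Diff a new full replacement (source) against the local copy.

    Both arguments map a record id to a record dict.
    Returns (added, changed, removed) as three dicts.
    """
    added = {k: v for k, v in source.items() if k not in local_copy}
    changed = {
        k: v for k, v in source.items()
        if k in local_copy and local_copy[k] != v
    }
    removed = {k: v for k, v in local_copy.items() if k not in source}
    return added, changed, removed


def process(local_copy, source):
    """Apply the diff to the local copy in place and report what happened."""
    added, changed, removed = compare(local_copy, source)
    local_copy.update(added)
    local_copy.update(changed)
    for key in removed:
        del local_copy[key]
    return {"added": len(added), "changed": len(changed), "removed": len(removed)}


# Invented sample records for illustration only.
local = {
    "r1": {"name": "HIGH STREET", "easting": 1040},
    "r2": {"name": "STATION ROAD", "easting": 1300},
}
new_release = {
    "r1": {"name": "HIGH STREET", "easting": 1045},  # moved slightly
    "r3": {"name": "MILL LANE", "easting": 2000},    # new in this release
}
summary = process(local, new_release)
```

Only the added, changed and removed records then need pushing on to the search index, rather than reloading the whole data set on every release.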
Progress
Starting to extract code from existing services into a generic spatial app
https://github.com/ianibo/AnOpenGazetteerFramework/
Work progressing under aegis of GIST Mobile group / Open Data group
Workable Gaz now, but command line interface for importing.
Questions / Comments?
Some supporting info
Original Project – FOI request to DfE
[Chart: Total costs - First 3 years. X axis: 2008-09, 2009-10, 2010-11. Y axis: 0 to 7,000,000. Series: Local Authority Revenue, Local Authority Capital, Central Office of Information, Qi Consulting, Redhouse, DfE Staff Costs, Consultation seminars, Methods Consulting, Engine Group, Digital Public, Tribal Education]
[Chart: First 3 years - Non LA costs. X axis: 2008-09, 2009-10, 2010-11. Y axis: 0 to 2,500,000. Series: Central Office of Information, Qi Consulting, Redhouse, DfE Staff Costs, Consultation seminars, Methods Consulting, Engine Group, Digital Public, Tribal Education]