a fast, offline reverse geocoder in python

A FAST OFFLINE REVERSE GEOCODER IN PYTHON
Ajay Thampi
Data Scientist, OpenSignal

Upload: ajay-thampi

Post on 30-Jul-2015


TRANSCRIPT


OUTLINE

OpenSignal

Motivation

The Library

Demo

Performance Results

Applications

Contributions from the Community

OPENSIGNAL

http://opensignal.com | http://wifimapper.com

• Cellular data points: 41 billion
• WiFi data points: 50 billion
• Speed tests: 93 million

MOTIVATION

• Reverse geocode terabytes of data (~50M coordinates / day)
• Options:
  • Online web services (Google Maps, OpenStreetMap)
    • Restrictive
    • Slow
  • Offline (PostGIS, Python libraries)
    • Complex
    • Slow

THE LIBRARY

• Improves on an existing library by Richard Penman
• Supports Python 2 and 3
• Geocodes a lot more information
• High performance

Open Source (LGPL license)

Statistics: (since 27/03/2015)

• Downloads: 2,649
• Commits: 41
• Committers: 5
• Stars: 1,089
• Forks: 40

#notsohumblebrag

• Place name
• Country Code (ISO-3166)
• Admin region 1
• Admin region 2
• Coordinates

IMPLEMENTATION

• Two modes:
  • Mode 1: Single-process
  • Mode 2 (Default): Multi-process
• Source of data: GeoNames
  • Places with a population > 1000 (Total = 144,859)
• GPS coordinates of cities loaded into a K-D Tree
• Nearest neighbour (NN) algorithm
  • Mode 1: cKDTree class in scipy
  • Mode 2: Parallelised cKDTree
• Dependencies:
  • numpy
  • scipy
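The single-process mode can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the three records in PLACES are hand-typed stand-ins for the GeoNames data, and the names PLACES and reverse_geocode are hypothetical. Note that treating latitude and longitude as planar coordinates is an approximation.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical, hand-typed sample records; the real data set comes from GeoNames.
PLACES = [
    {"name": "London", "cc": "GB", "admin1": "England",
     "admin2": "Greater London", "lat": 51.5085, "lon": -0.1257},
    {"name": "Paris", "cc": "FR", "admin1": "Ile-de-France",
     "admin2": "Paris", "lat": 48.8534, "lon": 2.3488},
    {"name": "New York City", "cc": "US", "admin1": "New York",
     "admin2": "", "lat": 40.7143, "lon": -74.0060},
]

# Load the (lat, lon) pairs of all known places into a K-D tree once.
tree = cKDTree(np.array([(p["lat"], p["lon"]) for p in PLACES]))


def reverse_geocode(coords):
    """Return the nearest known place for each (lat, lon) pair."""
    # atleast_2d lets callers pass a single pair or a list of pairs.
    queries = np.atleast_2d(np.asarray(coords, dtype=float))
    _, indices = tree.query(queries, k=1)  # k=1: single nearest neighbour
    return [PLACES[i] for i in indices]
```

Calling reverse_geocode([(51.6, -0.2), (48.9, 2.3)]) returns the London and Paris records from the sample above.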

PARALLELISED K-D TREE

• Uses the multiprocessing module
• Pros over threading:
  • Exploits multiple CPUs and cores
  • No GIL limitation
• Cons over threading:
  • Separate memory space => IPC or Shared Memory
• Static Scheduling
• K-D Tree Settings:
  • Euclidean distance (Minkowski p-norm where p = 2)
  • Distance upper bound: Inf
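To make the two K-D tree settings concrete, here is a small sketch using scipy's cKDTree.query on made-up points. The sample points and variable names are illustrative only. It also shows what a finite distance upper bound would do, which is why an infinite bound guarantees that every query gets a match.

```python
import numpy as np
from scipy.spatial import cKDTree

# Two made-up points, chosen so the distances are easy to check by hand.
points = np.array([[0.0, 0.0], [3.0, 4.0]])
tree = cKDTree(points)

# p=2 selects the Euclidean norm; an infinite upper bound never rejects a match.
dist, idx = tree.query([[3.0, 3.0]], k=1, p=2, distance_upper_bound=np.inf)

# With a finite bound, a query with no point in range is reported with
# distance inf and index tree.n (one past the last valid index).
far_dist, far_idx = tree.query([[100.0, 100.0]], k=1, p=2,
                               distance_upper_bound=1.0)
```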

Refer to the multiprocessing tutorial by Sturla Molden, University of Oslo
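The combination of multiprocessing and static scheduling might look something like the sketch below. It is an assumption-laden illustration rather than the library's implementation: static_chunks, parallel_nearest and the worker helpers are hypothetical names, and each worker here rebuilds its own tree from a pickled copy of the points (the IPC route) rather than using shared memory.

```python
import multiprocessing as mp

import numpy as np
from scipy.spatial import cKDTree

_tree = None  # per-process K-D tree, set by _init_worker


def _init_worker(points):
    """Build the K-D tree once in each worker (separate memory space)."""
    global _tree
    _tree = cKDTree(points)


def _query_chunk(chunk):
    """Return nearest-neighbour indices for one chunk of queries."""
    _, idx = _tree.query(chunk, k=1, p=2, distance_upper_bound=np.inf)
    return idx


def static_chunks(queries, n_workers):
    """Static scheduling: fixed, near-equal chunks decided up front."""
    return [c for c in np.array_split(queries, n_workers) if len(c)]


def parallel_nearest(points, queries, n_workers=2):
    """Index of the nearest point for every query row."""
    points = np.asarray(points, dtype=float)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    if n_workers <= 1:
        # Single-process fallback: no pool, no inter-process copying.
        _init_worker(points)
        return _query_chunk(queries)
    chunks = static_chunks(queries, n_workers)
    with mp.Pool(len(chunks), initializer=_init_worker,
                 initargs=(points,)) as pool:
        results = pool.map(_query_chunk, chunks)  # preserves chunk order
    return np.concatenate(results)
```

When calling parallel_nearest with more than one worker from a script, place the call under an `if __name__ == "__main__":` guard so that platforms which spawn fresh interpreters do not re-run it in the workers.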

DEMO

PERFORMANCE RESULTS

APPLICATIONS (1/2)

• Top 20 regions in the UK where OpenSignal users run speed tests

Data from Sep-Dec 2014

APPLICATIONS (2/2)

• Speed test data points from the Greater London region

Data from Sep-Dec 2014
Visualisation using Google Fusion Tables

HAT TIP

Python 3 Support (Brandon Liu and David J. Felix)

C++ Wrapper (Mehdi Lauters)

@thampiman

Thank You
Q & A