State of the Art Web Mapping with Open Source
DESCRIPTION
OSCON 2012 workshop given by Dane Springmeyer
TRANSCRIPT
State of the Art Web Mapping with Open Source
OSCON 2012 | Dane Springmeyer
@springmeyer (github / twitter)
see also: Justin Miller
Building a mobile, offline mapping stack using open tools & data
5pm Wednesday, F150
Background
Engineer @ MapBox
Building TileMill and Mapnik
Web performance / rendering
We provide services & open source tools
open source tools to cover
CartoDB
TileMill
maps are simple (a primer)
geodata as just another data field / type
cartography as just a sexy form of data visualization
location: lat/lon, x/y
attributes: name, type, value
styles separate from data, akin to CSS/HTML
CartoCSS

@motorway: #90BFE0;

.highway[TYPE='motorway'] {
  .line[zoom>=0] {
    line-color: spin(darken(@motorway, 36), -10);
    line-cap: round;
    line-join: round;
  }
  .fill[zoom>=10] {
    line-color: @motorway;
    line-cap: round;
    line-join: round;
  }
}
point:   •
line:    -----------
polygon: |_ _ _ _ _|
spatial types
Multi* types: many to one (many geometries per feature)
tabular geo-csv

latitude, longitude, name
45.5, -122.6, PDX
tabular geo-csv (multipoint)

WKT                                             | Name
MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) | Group of Cities
{ "type": "FeatureCollection", "features": [ { "type": "Feature", "properties": { "name": "PDX" }, "geometry" : { "type": "Point", "coordinates": [ -122.6, 45.5 ] } }]}
geojson
Works everywhere: e.g. QGIS, TileMill, web clients
postgis

postgis=# select 'POINT(-122.6 45.5)'::geometry as geom, 'PDX'::text as name;

                    geom                    | name
--------------------------------------------+------
 01010000006666666666A65EC00000000000C04640 | PDX
(1 row)

WKB (Well-Known Binary)
postgis

postgis=# select ST_Distance('POINT(-122.6 45.5)'::geography, 'POINT(-122.3 47.6)'::geography)/1609.344 as dist_in_miles_from_pdx_to_sea;

 dist_in_miles_from_pdx_to_sea
-------------------------------
              145.755555956692
(1 row)
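The same distance can be sanity-checked in plain JavaScript. A sketch of the haversine great-circle formula follows; it uses a spherical approximation, so it will differ slightly from PostGIS's geodesic result.

// Haversine great-circle distance (spherical approximation).
function haversineMiles(lat1, lon1, lat2, lon2) {
  var R = 3958.8; // mean earth radius in miles
  var toRad = Math.PI / 180;
  var dLat = (lat2 - lat1) * toRad;
  var dLon = (lon2 - lon1) * toRad;
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(lat1 * toRad) * Math.cos(lat2 * toRad) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * R * Math.asin(Math.sqrt(a));
}

console.log(haversineMiles(45.5, -122.6, 47.6, -122.3)); // ~145.8, close to the PostGIS answer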
minimal code for simple maps, both server and client
Mapnik
var mapnik = require('mapnik');
var map = new mapnik.Map(256, 256);  // 256x256 pixel map
map.loadSync('map.xml');             // Mapnik XML stylesheet
map.zoomAll();                       // zoom to the combined extent of all layers
map.renderFileSync('map.png');       // write a PNG to disk
Leaflet

<html>
<head>
  <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.3.1/leaflet.css" />
  <script src="http://cdn.leafletjs.com/leaflet-0.3.1/leaflet.js"></script>
</head>
<body>
  <div id="map" style="width: 100%; height: 100%"></div>
  <script>
    var map = new L.Map('map');
    // OpenStreetMap raster tiles as the base layer
    var osm = new L.TileLayer('http://tile.osm.org/{z}/{x}/{y}.png');
    map.setView(new L.LatLng(45.5, -122.65), 12).addLayer(osm);
    // a single GeoJSON point for Portland
    var pdx = {
      "type": "FeatureCollection",
      "features": [{
        "type": "Feature",
        "properties": { "name": "PDX" },
        "geometry": { "type": "Point", "coordinates": [ -122.65, 45.5 ] }
      }]
    };
    map.addLayer(new L.GeoJSON(pdx));
  </script>
</body>
</html>
but maps are hard
geodata can be messy and multi-resolution
geodata can be huge
geodata can be dynamic
data story takes too long
maps as the single lock-in point (Google)
or point of failure (slow WMS, IE support, clashing design)
how modern web maps work
or: how to tell stories with maps quickly, ensure they are fast under load, and work in IE
open data
  osm.org
  naturalearthdata.com
  US Census (geo/www/tiger)
  local government portals
server-side pre-processing
gradually work client-side
tile renderers: Mapnik / MapServer
fast app servers: Node.js / Python / C++ (see the tile-server sketch below)
pre-processed, pre-rendered, cacheable
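To make "fast app servers" and "cacheable" concrete, here is a minimal sketch of a z/x/y tile endpoint in Node.js using node-mapnik, the same module as the render example earlier. The stylesheet name map.xml, the port, and the assumption that the stylesheet is in spherical mercator are all mine; a real server would add caching, error handling, and would not share one Map object across concurrent renders.

// Sketch only: a minimal z/x/y PNG tile server with node-mapnik.
var mapnik = require('mapnik');
var http = require('http');

var TILE_SIZE = 256;
var HALF_EARTH = 20037508.34; // spherical mercator half-circumference, in meters

// Convert z/x/y tile coordinates to a spherical mercator bounding box.
function tileToBBox(z, x, y) {
  var size = (2 * HALF_EARTH) / Math.pow(2, z); // tile edge length in meters
  var minx = -HALF_EARTH + x * size;
  var maxy = HALF_EARTH - y * size; // y counts down from the north edge
  return [minx, maxy - size, minx + size, maxy];
}

var map = new mapnik.Map(TILE_SIZE, TILE_SIZE);
map.loadSync('map.xml'); // assumed stylesheet, as in the earlier example

http.createServer(function(req, res) {
  // Expect URLs like /12/653/1465.png
  var match = req.url.match(/^\/(\d+)\/(\d+)\/(\d+)\.png$/);
  if (!match) { res.writeHead(404); return res.end(); }
  map.extent = tileToBBox(+match[1], +match[2], +match[3]);
  var im = new mapnik.Image(TILE_SIZE, TILE_SIZE);
  map.render(im, function(err, im) {
    if (err) { res.writeHead(500); return res.end(err.message); }
    im.encode('png', function(err, buffer) {
      if (err) { res.writeHead(500); return res.end(err.message); }
      res.writeHead(200, {'Content-Type': 'image/png'});
      res.end(buffer); // in practice, cache this tile
    });
  });
}).listen(8000);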
beautiful graphics: Anti-Grain Geometry, Cairo Graphics
standard formats: GeoJSON, WKT, CSV, shapefile, PostGIS, GeoTIFF
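Tying two of these formats together: a small sketch that converts the tabular geo-csv from the earlier slide into a GeoJSON FeatureCollection. The latitude/longitude/name column names are assumed to match that slide; a real converter would handle quoting and validation.

// Sketch: convert a lat/lon CSV (as in the earlier slide) into GeoJSON.
function csvToGeoJSON(csv) {
  var lines = csv.trim().split('\n');
  var header = lines[0].split(',');
  var features = lines.slice(1).map(function(line) {
    var row = {};
    line.split(',').forEach(function(val, i) { row[header[i]] = val.trim(); });
    return {
      type: 'Feature',
      properties: { name: row.name },
      // GeoJSON coordinate order is [longitude, latitude]
      geometry: { type: 'Point', coordinates: [+row.longitude, +row.latitude] }
    };
  });
  return { type: 'FeatureCollection', features: features };
}

var csv = 'latitude,longitude,name\n45.5,-122.6,PDX';
console.log(JSON.stringify(csvToGeoJSON(csv)));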
tiled data
bake big data into bitmaps
pre-render where possible, but beware: the world is big
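The arithmetic behind "the world is big": the standard web mercator scheme has 4^z tiles at zoom level z, so blanket pre-rendering stops being feasible around the mid zoom levels. A sketch of the standard lon/lat-to-tile math:

// Standard web mercator tile math (XYZ scheme).
function lonLatToTile(lon, lat, z) {
  var n = Math.pow(2, z);
  var x = Math.floor((lon + 180) / 360 * n);
  var latRad = lat * Math.PI / 180;
  var y = Math.floor((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n);
  return [x, y];
}

console.log(lonLatToTile(-122.6, 45.5, 12)); // Portland's tile at z12: [653, 1465]
for (var z = 0; z <= 18; z += 6) {
  // 4^z tiles cover the world: 1 at z0, 4096 at z6, ~16.8m at z12, ~68.7bn at z18
  console.log('z' + z + ':', Math.pow(4, z), 'tiles');
}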
coming: optimized tiled formats like msgpack, protobuf (not just bitmaps)
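As a rough illustration of why optimized vector formats beat shipping raw JSON, here is a sketch of delta plus zigzag encoding of tile coordinates, the kind of trick compact binary formats lean on. The scheme shown is illustrative only, not any particular format's spec.

// Illustrative only: delta + zigzag encoding of a coordinate ring.
// Deltas between neighboring vertices are small, and zigzag maps signed
// integers to small unsigned ones that pack tightly in varint-style formats.
function zigzag(n) { return (n << 1) ^ (n >> 31); }

function encodeRing(coords) {
  var out = [], px = 0, py = 0;
  coords.forEach(function(pt) {
    out.push(zigzag(pt[0] - px), zigzag(pt[1] - py));
    px = pt[0]; py = pt[1];
  });
  return out;
}

var ring = [[100, 100], [105, 98], [111, 102]];
console.log(encodeRing(ring)); // small numbers: [200, 200, 10, 3, 12, 8]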
coming: optimized tiled storage and indexing
coming: more robust and configurable client-side renderers
Storage: Database / Flat file (Spatially Indexed)
Data Tiles: Optimized vectors
Image Tiles: Software Vector Renderer
Display: <img> tiles
Render: Hardware Vector Renderer
UI: Browser JS client / Mobile Native client
Data Processing: GIS Apps, Scripting
Storage: Postgres-PostGIS / CSV / GeoJSON
Data Tiles: TileStache / Kothic.js / TileMill?
Image Tiles: Mapnik via TileMill or TileStache, plus UTFGrid interactivity (see the UTFGrid sketch after this list)
Display: supported in all browsers
Render: Kothic.js / Vecnik (Canvas/WebGL in some browsers)
UI: Leaflet / ModestMaps / RouteMe
Data Processing: QGIS, R stats, Python, SQL
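The UTFGrid interactivity mentioned in the list works by shipping a JSON grid alongside each image tile; the client resolves a hovered pixel to a character code and then to feature attributes. A minimal lookup sketch following the published UTFGrid encoding (the 4-pixel cell size is the spec's default):

// Sketch of UTFGrid lookup: map a pixel in a 256px tile to feature data.
// grid is the parsed utfgrid JSON: { grid: [...], keys: [...], data: {...} }
function utfgridLookup(grid, x, y) {
  var resolution = 4; // default cell size in the UTFGrid spec
  var row = grid.grid[Math.floor(y / resolution)];
  var code = row.charCodeAt(Math.floor(x / resolution));
  // Undo the spec's encoding offsets (codes 34 and 92 are skipped).
  if (code >= 93) code--;
  if (code >= 35) code--;
  var key = grid.keys[code - 32];
  return key ? grid.data[key] : null; // null where there is no feature
}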
Installing TileMill
TileMill: Ubuntu
sudo apt-add-repository ppa:developmentseed/mapbox
sudo apt-get -y update
sudo apt-get -y install tilemill
sudo start tilemill
TileMill: Mac
TileMill: Basics
Cross platform - Linux, Win, OS X
Same code both desktop & web
Outputs PNG, MBTiles, Mapnik XML (see the MBTiles sketch after this list)
Written in JavaScript (Node.js) and C++ (Mapnik)
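MBTiles, one of the export formats above, is just a SQLite database with a tiles table. A sketch of pulling one tile out with the node sqlite3 module; the file name export.mbtiles is an assumption, and note that MBTiles stores rows in TMS order, so y is flipped relative to the XYZ scheme:

// Sketch: read a single tile out of an MBTiles export.
// Assumes the node 'sqlite3' module and a file named export.mbtiles.
var sqlite3 = require('sqlite3');
var db = new sqlite3.Database('export.mbtiles');

function getTile(z, x, y, callback) {
  var tmsY = Math.pow(2, z) - 1 - y; // MBTiles uses TMS row order
  db.get(
    'SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?',
    [z, x, tmsY],
    function(err, row) {
      callback(err, row && row.tile_data); // PNG buffer, or undefined if missing
    }
  );
}

getTile(12, 653, 1465, function(err, png) {
  if (png) console.log('got', png.length, 'bytes');
});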
Art of the possible
http://project.wnyc.org/stop-frisk-guns/
foursquare.com
Millions of points without sacrificing speed
TileMill: Live
http://bit.ly/MFjLnG
http://bit.ly/SFeBfJ
EC2 machines only available on July 17, 2012
set one up yourself: http://mapbox.com/tilemill/docs/guides/ubuntu-service/
Demos...
• TileMill: layer ordering, fonts, labeling, plugins, mbtiles export, mapnik xml export, svg/marker-transforms
• TileMill: reinforce basics through demos: arc.js geojson, cartodb csv, etherpad csv
• OSM-bright setup
Thanks!
@springmeyer (github / twitter)
Do not miss Stamen and Vizzuality (CartoDB)