State of the Art Web Mapping with Open Source
DESCRIPTION
OSCON 2012 workshop given by Dane Springmeyer
TRANSCRIPT
State of the Art Web Mapping with Open Source
OSCON 2012 | Dane Springmeyer
@springmeyer (github / twitter)
see also: Justin Miller
Building a mobile, offline mapping stack using open tools & data
5pm Wednesday, F150
Background
Engineer @ MapBox
Building TileMill and Mapnik
Web performance / rendering
We provide services & open source tools
open source tools to cover
CartoDB
TileMill
maps are simple (a primer)
geodata as just another data field / type
cartography as just a sexy form of data visualization
location: lat/lon, x/y
attributes: name, type, value
styles separate from data, akin to CSS/HTML
CartoCSS

@motorway: #90BFE0;

.highway[TYPE='motorway'] {
  .line[zoom>=0] {
    line-color: spin(darken(@motorway, 36), -10);
    line-cap: round;
    line-join: round;
  }
  .fill[zoom>=10] {
    line-color: @motorway;
    line-cap: round;
    line-join: round;
  }
}
point:   •
line:    -----------
polygon: |_ _ _ _ _|
spatial types
Multi* types: many to one (many geometries per feature)
tabular geo-csv

latitude, longitude, name
45.5, -122.6, PDX
tabular geo-csv (multipoint)

WKT                                             | Name
MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) | Group of Cities
{ "type": "FeatureCollection", "features": [ { "type": "Feature", "properties": { "name": "PDX" }, "geometry" : { "type": "Point", "coordinates": [ -122.6, 45.5 ] } }]}
geojson
Works everywhere: e.g. QGIS, TileMill, web clients
postgis

postgis=# select 'POINT(-122.6 45.5)'::geometry as geom, 'PDX'::text as name;

                    geom                    | name
--------------------------------------------+------
 01010000006666666666A65EC00000000000C04640 | PDX
(1 row)

WKB (Well-Known Binary)
postgis

postgis=# select ST_Distance('POINT(-122.6 45.5)'::geography, 'POINT(-122.3 47.6)'::geography)/1609.344 as dist_in_miles_from_pdx_to_sea;

 dist_in_miles_from_pdx_to_sea
-------------------------------
              145.755555956692
(1 row)
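The same distance can be sanity-checked in plain JavaScript. A sketch of the haversine great-circle formula follows; it uses a spherical approximation, so it will differ slightly from PostGIS's geodesic result.

// Haversine great-circle distance (spherical approximation).
function haversineMiles(lat1, lon1, lat2, lon2) {
  var R = 3958.8; // mean earth radius in miles
  var toRad = Math.PI / 180;
  var dLat = (lat2 - lat1) * toRad;
  var dLon = (lon2 - lon1) * toRad;
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(lat1 * toRad) * Math.cos(lat2 * toRad) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * R * Math.asin(Math.sqrt(a));
}

console.log(haversineMiles(45.5, -122.6, 47.6, -122.3)); // ~145.8, close to the PostGIS answer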
minimal code for simple maps, both server and client
Mapnik
var mapnik = require('mapnik');
var map = new mapnik.Map(256, 256);  // 256x256 pixel map
map.loadSync('map.xml');             // Mapnik XML stylesheet
map.zoomAll();                       // zoom to the combined extent of all layers
map.renderFileSync('map.png');       // write a PNG to disk
Leaflet

<html>
<head>
  <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.3.1/leaflet.css" />
  <script src="http://cdn.leafletjs.com/leaflet-0.3.1/leaflet.js"></script>
</head>
<body>
  <div id="map" style="width: 100%; height: 100%"></div>
  <script>
    var map = new L.Map('map');
    // OpenStreetMap raster tiles as the base layer
    var osm = new L.TileLayer('http://tile.osm.org/{z}/{x}/{y}.png');
    map.setView(new L.LatLng(45.5, -122.65), 12).addLayer(osm);
    // a single GeoJSON point for Portland
    var pdx = {
      "type": "FeatureCollection",
      "features": [{
        "type": "Feature",
        "properties": { "name": "PDX" },
        "geometry": { "type": "Point", "coordinates": [ -122.65, 45.5 ] }
      }]
    };
    map.addLayer(new L.GeoJSON(pdx));
  </script>
</body>
</html>
but maps are hard
geodata can be messy and multi-resolution
geodata can be huge
geodata can be dynamic
data story takes too long
maps as the single lock-in point (Google)
or point of failure (slow WMS, IE support, clashing design)
how modern web maps work
or: how to tell stories with maps quickly, ensure they are fast under load, and work in IE
open data
  osm.org
  naturalearthdata.com
  US Census (geo/www/tiger)
  local government portals
server-side pre-processing
gradually work client-side
tile renderers: Mapnik / MapServer
fast app servers: Node.js / Python / C++ (see the tile-server sketch below)
pre-processed, pre-rendered, cacheable
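To make "fast app servers" and "cacheable" concrete, here is a minimal sketch of a z/x/y tile endpoint in Node.js using node-mapnik, the same module as the render example earlier. The stylesheet name map.xml, the port, and the assumption that the stylesheet is in spherical mercator are all mine; a real server would add caching, error handling, and would not share one Map object across concurrent renders.

// Sketch only: a minimal z/x/y PNG tile server with node-mapnik.
var mapnik = require('mapnik');
var http = require('http');

var TILE_SIZE = 256;
var HALF_EARTH = 20037508.34; // spherical mercator half-circumference, in meters

// Convert z/x/y tile coordinates to a spherical mercator bounding box.
function tileToBBox(z, x, y) {
  var size = (2 * HALF_EARTH) / Math.pow(2, z); // tile edge length in meters
  var minx = -HALF_EARTH + x * size;
  var maxy = HALF_EARTH - y * size; // y counts down from the north edge
  return [minx, maxy - size, minx + size, maxy];
}

var map = new mapnik.Map(TILE_SIZE, TILE_SIZE);
map.loadSync('map.xml'); // assumed stylesheet, as in the earlier example

http.createServer(function(req, res) {
  // Expect URLs like /12/653/1465.png
  var match = req.url.match(/^\/(\d+)\/(\d+)\/(\d+)\.png$/);
  if (!match) { res.writeHead(404); return res.end(); }
  map.extent = tileToBBox(+match[1], +match[2], +match[3]);
  var im = new mapnik.Image(TILE_SIZE, TILE_SIZE);
  map.render(im, function(err, im) {
    if (err) { res.writeHead(500); return res.end(err.message); }
    im.encode('png', function(err, buffer) {
      if (err) { res.writeHead(500); return res.end(err.message); }
      res.writeHead(200, {'Content-Type': 'image/png'});
      res.end(buffer); // in practice, cache this tile
    });
  });
}).listen(8000);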
beautiful graphics: Anti-Grain Geometry, Cairo Graphics
standard formats: GeoJSON, WKT, CSV, shapefile, PostGIS, GeoTIFF
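Tying two of these formats together: a small sketch that converts the tabular geo-csv from the earlier slide into a GeoJSON FeatureCollection. The latitude/longitude/name column names are assumed to match that slide; a real converter would handle quoting and validation.

// Sketch: convert a lat/lon CSV (as in the earlier slide) into GeoJSON.
function csvToGeoJSON(csv) {
  var lines = csv.trim().split('\n');
  var header = lines[0].split(',');
  var features = lines.slice(1).map(function(line) {
    var row = {};
    line.split(',').forEach(function(val, i) { row[header[i]] = val.trim(); });
    return {
      type: 'Feature',
      properties: { name: row.name },
      // GeoJSON coordinate order is [longitude, latitude]
      geometry: { type: 'Point', coordinates: [+row.longitude, +row.latitude] }
    };
  });
  return { type: 'FeatureCollection', features: features };
}

var csv = 'latitude,longitude,name\n45.5,-122.6,PDX';
console.log(JSON.stringify(csvToGeoJSON(csv)));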
tiled data
bake big data into bitmaps
pre-render where possible, but beware: the world is big
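The arithmetic behind "the world is big": the standard web mercator scheme has 4^z tiles at zoom level z, so blanket pre-rendering stops being feasible around the mid zoom levels. A sketch of the standard lon/lat-to-tile math:

// Standard web mercator tile math (XYZ scheme).
function lonLatToTile(lon, lat, z) {
  var n = Math.pow(2, z);
  var x = Math.floor((lon + 180) / 360 * n);
  var latRad = lat * Math.PI / 180;
  var y = Math.floor((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n);
  return [x, y];
}

console.log(lonLatToTile(-122.6, 45.5, 12)); // Portland's tile at z12: [653, 1465]
for (var z = 0; z <= 18; z += 6) {
  // 4^z tiles cover the world: 1 at z0, 4096 at z6, ~16.8m at z12, ~68.7bn at z18
  console.log('z' + z + ':', Math.pow(4, z), 'tiles');
}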
coming: optimized tiled formats like msgpack, protobuf (not just bitmaps)
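As a rough illustration of why optimized vector formats beat shipping raw JSON, here is a sketch of delta plus zigzag encoding of tile coordinates, the kind of trick compact binary formats lean on. The scheme shown is illustrative only, not any particular format's spec.

// Illustrative only: delta + zigzag encoding of a coordinate ring.
// Deltas between neighboring vertices are small, and zigzag maps signed
// integers to small unsigned ones that pack tightly in varint-style formats.
function zigzag(n) { return (n << 1) ^ (n >> 31); }

function encodeRing(coords) {
  var out = [], px = 0, py = 0;
  coords.forEach(function(pt) {
    out.push(zigzag(pt[0] - px), zigzag(pt[1] - py));
    px = pt[0]; py = pt[1];
  });
  return out;
}

var ring = [[100, 100], [105, 98], [111, 102]];
console.log(encodeRing(ring)); // small numbers: [200, 200, 10, 3, 12, 8]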
coming: optimized tiled storage and indexing
coming: more robust and configurable client-side renderers
Storage: Database / Flat file (Spatially Indexed)
Data Tiles: Optimized vectors
Image Tiles: Software Vector Renderer
Display: <img> tiles
Render: Hardware Vector Renderer
UI: Browser JS client / Mobile Native client
Data Processing: GIS Apps, Scripting
Storage: Postgres-PostGIS / CSV / GeoJSON
Data Tiles: TileStache / Kothic.js / TileMill?
Image Tiles: Mapnik via TileMill or TileStache, plus UTFGrid interactivity (see the UTFGrid sketch after this list)
Display: supported in all browsers
Render: Kothic.js / Vecnik (Canvas/WebGL in some browsers)
UI: Leaflet / ModestMaps / RouteMe
Data Processing: QGIS, R stats, Python, SQL
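The UTFGrid interactivity mentioned in the list works by shipping a JSON grid alongside each image tile; the client resolves a hovered pixel to a character code and then to feature attributes. A minimal lookup sketch following the published UTFGrid encoding (the 4-pixel cell size is the spec's default):

// Sketch of UTFGrid lookup: map a pixel in a 256px tile to feature data.
// grid is the parsed utfgrid JSON: { grid: [...], keys: [...], data: {...} }
function utfgridLookup(grid, x, y) {
  var resolution = 4; // default cell size in the UTFGrid spec
  var row = grid.grid[Math.floor(y / resolution)];
  var code = row.charCodeAt(Math.floor(x / resolution));
  // Undo the spec's encoding offsets (codes 34 and 92 are skipped).
  if (code >= 93) code--;
  if (code >= 35) code--;
  var key = grid.keys[code - 32];
  return key ? grid.data[key] : null; // null where there is no feature
}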
Installing TileMill
TileMill: Ubuntu
sudo apt-add-repository ppa:developmentseed/mapbox
sudo apt-get -y update
sudo apt-get -y install tilemill
sudo start tilemill
TileMill: Mac
TileMill: Basics
Cross platform - Linux, Win, OS X
Same code both desktop & web
Outputs PNG, MBTiles, Mapnik XML (see the MBTiles sketch after this list)
Written in JavaScript (Node.js) and C++ (Mapnik)
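MBTiles, one of the export formats above, is just a SQLite database with a tiles table. A sketch of pulling one tile out with the node sqlite3 module; the file name export.mbtiles is an assumption, and note that MBTiles stores rows in TMS order, so y is flipped relative to the XYZ scheme:

// Sketch: read a single tile out of an MBTiles export.
// Assumes the node 'sqlite3' module and a file named export.mbtiles.
var sqlite3 = require('sqlite3');
var db = new sqlite3.Database('export.mbtiles');

function getTile(z, x, y, callback) {
  var tmsY = Math.pow(2, z) - 1 - y; // MBTiles uses TMS row order
  db.get(
    'SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?',
    [z, x, tmsY],
    function(err, row) {
      callback(err, row && row.tile_data); // PNG buffer, or undefined if missing
    }
  );
}

getTile(12, 653, 1465, function(err, png) {
  if (png) console.log('got', png.length, 'bytes');
});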
Art of the possible
http://project.wnyc.org/stop-frisk-guns/
foursquare.com
Millions of points without sacrificing speed
TileMill: Live
http://bit.ly/MFjLnG
http://bit.ly/SFeBfJ
EC2 machines only available on July 17, 2012
set one up yourself: http://mapbox.com/tilemill/docs/guides/ubuntu-service/
Demos...
• TileMill: layer ordering, fonts, labeling, plugins, mbtiles export, mapnik xml export, svg/marker-transforms
• TileMill: reinforce basics through demos: arc.js geojson, cartodb csv, etherpad csv
• OSM-bright setup
Thanks!
@springmeyer (github / twitter)
Do not miss Stamen and Vizzuality (CartoDB)