unstats.un.org/unsd/accsub/2011docs-18th/presentation-google.pdf
Public Data: Enhancing Data Discovery and Exploration
Benjamin Yolken
September 2011
Overview
Disseminating public statistics
Google tools
● Public Data Explorer
● Fusion Tables
● Refine
Conclusion
Disseminating Statistics
Objective
Make public statistics accessible, useful, and well-organized.
Accessible...
(1) Access: Data need to be online and findable
● Provider web sites
● Third-party aggregators
● Search engines
(2) Understanding: Statisticians aren't the only users
● Lay users: teachers, students, journalists, policy makers
● Computers: search engines
● If not accessible to non-experts, data can become unused or, worse, misused
Useful...
There are a lot of distractions today: tables and simple plots are not enough
Need to engage not just users' eyes, but also their brains
Well-organized...
Go beyond flat lists of data; organize by:
● Topics
● Time periods
● Geographic regions
● Formats
● Languages, etc.
Ultimately, this depends on having good metadata
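To make the metadata point concrete, here is a minimal Python sketch of a catalog organized by metadata rather than kept as a flat list. The `DatasetRecord` fields, the helper names `index_by` and `covering_year`, and the catalog entries are all hypothetical, chosen only to mirror the dimensions listed above; this is not a Google API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record describing one published dataset."""
    title: str
    topics: list[str]
    regions: list[str]
    years: tuple[int, int]  # first and last year covered, inclusive
    formats: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)


def index_by(records, attribute):
    """Group dataset titles under each value of a list-valued metadata field."""
    index = defaultdict(list)
    for record in records:
        for value in getattr(record, attribute):
            index[value].append(record.title)
    return dict(index)


def covering_year(records, year):
    """Titles of datasets whose time span includes the given year."""
    return [r.title for r in records if r.years[0] <= year <= r.years[1]]


# Illustrative catalog; the entries are made up.
catalog = [
    DatasetRecord("Unemployment rate", ["labour"], ["Europe"], (2000, 2010),
                  ["csv"], ["en", "fr"]),
    DatasetRecord("Population by age", ["demography"], ["Europe", "Asia"],
                  (1990, 2011), ["csv", "xml"], ["en"]),
]

print(index_by(catalog, "regions"))
# {'Europe': ['Unemployment rate', 'Population by age'], 'Asia': ['Population by age']}
print(covering_year(catalog, 1995))
# ['Population by age']
```

The same records can be browsed by topic, region, or time period without duplicating data; the quality of that browsing depends entirely on how complete the metadata fields are.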
Google Tools
Public Data Explorer (PDE)
What it is:
● Stand-alone product for interactively exploring and visualizing rich datasets
● Visualizations can be shared or embedded on third-party sites
What it's good for:
● Reaching out to non-expert users
● Getting traffic to your site
● Categorical, aggregated, time-series data
Caveats:
● Datasets must be in Dataset Publishing Language (DSPL) format
  ○ We have some tools to help
  ○ We're working on converters from other formats, such as SDMX
PDE: Demo
Fusion Tables
What it is:
● Product for creating, editing, and sharing tabular data
What it's good for:
● Table edits and transformations: joining, filtering, aggregating, etc.
● Static visualizations, particularly maps
● Exposing data to users via APIs
Caveats:
● Not connected to PDE (yet)
● Not as useful for time-series exploration
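The join, filter, and aggregate operations listed for Fusion Tables can be illustrated with a small plain-Python sketch over lists of row dictionaries. It does not use the Fusion Tables API; the function names and the example tables are hypothetical and only show what each operation does to tabular data.

```python
from collections import defaultdict


def join(left, right, key):
    """Inner join: merge each left row with every right row sharing its key value."""
    lookup = defaultdict(list)
    for row in right:
        lookup[row[key]].append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in lookup.get(l_row[key], [])]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def aggregate_sum(rows, group_key, value_key):
    """Sum value_key within each distinct value of group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Made-up example tables: a statistic per country, and a country-to-region lookup.
values = [
    {"country": "AA", "year": 2010, "value": 10.0},
    {"country": "BB", "year": 2010, "value": 20.0},
    {"country": "CC", "year": 2010, "value": 5.0},
    {"country": "AA", "year": 2000, "value": 9.0},
]
regions = [
    {"country": "AA", "region": "North"},
    {"country": "BB", "region": "North"},
    {"country": "CC", "region": "South"},
]

joined = join(values, regions, "country")
latest = filter_rows(joined, lambda row: row["year"] == 2010)
print(aggregate_sum(latest, "region", "value"))  # {'North': 30.0, 'South': 5.0}
```

Chaining the three steps (join a lookup table, filter to one year, then aggregate by the joined column) is the typical pattern for turning per-country figures into regional summaries.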
Google Refine
What it is:
● Desktop-based tool for cleaning up and transforming tabular data
What it's good for:
● Bulk data transformations
● Faceted data browsing
● Outlier detection and cleanup
Caveats:
● No collaboration features (yet)
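A rough sketch, in plain Python, of the kind of cleanup, faceting, and outlier checks described for Refine. Refine itself is an interactive tool; the functions here are hypothetical helpers, and the outlier rule (Tukey's interquartile-range fences) is a standard technique swapped in for illustration, not necessarily what Refine does.

```python
import re
import statistics
from collections import Counter


def clean_label(value):
    """Trim, collapse runs of whitespace, and title-case a categorical label."""
    # Note: str.title() also capitalizes after apostrophes, so review the results.
    return re.sub(r"\s+", " ", value).strip().title()


def text_facet(values):
    """Count each distinct value, most frequent first (a simple text facet)."""
    return Counter(values).most_common()


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


raw = ["  france", "France ", "FRANCE", "germany", "Germany  "]
cleaned = [clean_label(v) for v in raw]
print(text_facet(cleaned))  # [('France', 3), ('Germany', 2)]

readings = [10.2, 10.5, 9.8, 10.1, 10.4, 98.0, 10.0, 9.9]
print(iqr_outliers(readings))  # [98.0]
```

Faceting before and after cleanup is a quick way to see whether inconsistent spellings have been merged; flagged outliers still need a human decision, since an extreme value may be a typo or a genuine observation.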
Conclusion
Need to make statistics accessible, useful, and well-organized
Google has tools that can help
Key advice: think about the users and their needs
A really exciting area; we've only scratched the surface of what's possible
Thank you!
Questions?
Appendix
PDE Intro Video
PDE: Metadata
Dataset Publishing Language (DSPL)
● Designed for interactive exploration and visualization
● Released under the BSD open-source license
● Combines data tables (CSV) with metadata (XML)
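The pairing of CSV data tables with XML metadata can be sketched in a few lines of Python that write both files for one table. This is a simplified, DSPL-style illustration rather than a valid DSPL dataset: the namespace URI and element names (`info`, `tables`, `table`, `column`, `data`, `file`) are assumptions based on the published DSPL format and should be checked against the specification, and a real dataset also declares concepts and slices, which are omitted here. The function name `write_dataset` is hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed DSPL namespace URI; verify against the DSPL specification.
DSPL_NS = "http://schemas.google.com/dspl/2010"


def _q(tag):
    """Qualify a tag name with the (assumed) DSPL namespace."""
    return f"{{{DSPL_NS}}}{tag}"


def write_dataset(out_dir, name, table_id, columns, rows):
    """Write rows to a CSV file plus a simplified DSPL-style XML file pointing at it.

    columns: list of (column_id, type) pairs; rows: list of tuples in column order.
    Returns the path of the XML metadata file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_name = f"{table_id}.csv"

    # Data: a plain CSV table whose header row holds the column ids.
    with open(out_dir / csv_name, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col_id for col_id, _ in columns])
        writer.writerows(rows)

    # Metadata: XML naming the dataset and describing the table's columns.
    ET.register_namespace("", DSPL_NS)
    root = ET.Element(_q("dspl"))
    info = ET.SubElement(root, _q("info"))
    ET.SubElement(ET.SubElement(info, _q("name")), _q("value")).text = name
    tables = ET.SubElement(root, _q("tables"))
    table = ET.SubElement(tables, _q("table"), id=table_id)
    for col_id, col_type in columns:
        ET.SubElement(table, _q("column"), id=col_id, type=col_type)
    data = ET.SubElement(table, _q("data"))
    ET.SubElement(data, _q("file"), format="csv", encoding="utf-8").text = csv_name

    xml_path = out_dir / "dataset.xml"
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return xml_path
```

For example, `write_dataset("out", "Example population", "population", [("country", "string"), ("year", "date"), ("population", "integer")], rows)` would produce `out/population.csv` and `out/dataset.xml`. Keeping the values in CSV and the descriptions in XML means the data can be regenerated or updated without touching the metadata.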
PDE: Dataset Creation and Upload