google public data - ecbunstats.un.org/unsd/accsub/2011docs-18th/presentation-google.pdffusion...


Public Data

Enhancing Data Discovery and Exploration

Benjamin Yolken (yolken@google.com)

September 2011

Overview

Disseminating public statistics

Google tools
● Public Data Explorer
● Fusion Tables
● Refine

Conclusion

Disseminating Statistics

Objective

Make public statistics accessible, useful, and well-organized.

Public statistics (2)

Public statistics (4)

Accessible...

(1) Access: Data need to be online and findable
● Provider web sites
● Third-party aggregators
● Search engines

(2) Understanding: Statisticians aren't the only users
● Lay users: Teachers, students, journalists, policy makers
● Computers: Search engines
● If not accessible to non-experts, data can become unused or, worse, misused

Useful...

There are a lot of distractions today: tables and simple plots are not enough

Need to engage not just with users' eyes, but also their brains

Well-organized...

Go beyond flat lists of data...
● Topics
● Time periods
● Geographic regions
● Formats
● Languages, etc.

Ultimately, depends on having good metadata

Google Tools

Public Data Explorer (PDE) [Link]

What it is:
● Stand-alone product for interactively exploring and visualizing rich datasets
● Visualizations can be shared or embedded on 3rd party sites

What it's good for:
● Reaching out to non-expert users
● Getting traffic to your site
● Categorical, aggregated, time-series data

Caveats:
● Datasets must be in Dataset Publishing Language (DSPL) format
○ Have some tools to help
○ Working on converters from other formats like SDMX

PDE: Embed

Demo link

Fusion Tables [Link]

What it is:
● Product for creating, editing, and sharing tabular data

What it's good for:
● Table edits and transformations: joining, filtering, aggregating, etc.
● Static visualizations, particularly maps
● Exposing data to users via APIs

Caveats:
● Not connected to PDE (yet)
● Not as useful for time series exploration
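
The table transformations named above (joining, filtering, aggregating) can be sketched in plain Python. This is only an illustration of the operations themselves and does not use the Fusion Tables product or its API; the tables, column names, values, and helper functions (`join`, `filter_rows`, `aggregate_sum`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample tables (made-up values, for illustration only).
population = [
    {"country": "A", "year": 2010, "population": 100},
    {"country": "B", "year": 2010, "population": 250},
    {"country": "C", "year": 2010, "population": 50},
]
regions = [
    {"country": "A", "region": "North"},
    {"country": "B", "region": "North"},
    {"country": "C", "region": "South"},
]


def join(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def aggregate_sum(rows, group_by, value):
    """Sum the `value` column for each distinct `group_by` value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value]
    return dict(totals)


joined = join(population, regions, "country")
large = filter_rows(joined, lambda row: row["population"] >= 100)
totals = aggregate_sum(joined, "region", "population")
print(totals)
```

Each helper returns a new list or dict rather than modifying its input, so the steps can be chained in any order.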

Google Refine

What it is:
● Desktop-based tool for cleaning up and transforming tabular data

What it's good for:
● Bulk data transformations
● Faceted data browsing
● Outlier-detection and cleanup

Caveats:
● No collaboration features (yet)
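
To make the ideas of faceting, bulk cleanup, and outlier detection concrete, here is a minimal plain-Python sketch. It does not use Google Refine and is not how Refine implements these features; the sample rows, the helper names (`facet`, `normalize`, `outliers`), and the median-based outlier rule with its threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical messy table (made-up values, for illustration only).
rows = [
    {"country": "France", "value": 10.0},
    {"country": "france ", "value": 11.0},
    {"country": "Germany", "value": 9.5},
    {"country": "Germany", "value": 10.5},
    {"country": "Germany", "value": 1000.0},
]


def facet(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)


def normalize(rows, column):
    """Bulk transformation: trim whitespace and title-case a text column."""
    return [{**row, column: row[column].strip().title()} for row in rows]


def outliers(rows, column, threshold=5.0):
    """Flag rows far from the median, measured in median absolute deviations."""
    values = [row[column] for row in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [row for row in rows if abs(row[column] - med) / mad > threshold]


before = facet(rows, "country")   # inconsistent spellings show up as separate facets
cleaned = normalize(rows, "country")
after = facet(cleaned, "country")
suspect = outliers(cleaned, "value")
print(before, after, suspect)
```

Comparing the facet counts before and after cleanup shows how faceting exposes inconsistent spellings that a bulk transformation can then merge.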


Conclusion

Need to make statistics accessible, useful, organized

Google has tools that can help

Key advice: Think about the users, their needs

Really exciting area, only scratched the surface in terms of what's possible

Thank you!

Questions?

Appendix

PDE: Metadata

Dataset Publishing Language (DSPL)
● Designed for interactive exploration and visualization
● Released under the BSD open source license
● Combines data tables (CSV) with metadata (XML)
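
The general idea of pairing a CSV data table with XML metadata can be sketched as follows. The XML layout here is a simplified, hypothetical one invented for this example; it is not the actual DSPL schema, and the element names, attributes, values, and the `load` helper are all assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Data table: plain CSV (made-up values, for illustration only).
CSV_TEXT = "country,year,population\nA,2010,100\nB,2010,250\n"

# Metadata: a simplified, hypothetical XML layout. This is NOT the real DSPL schema.
XML_TEXT = """
<dataset name="Example population">
  <column id="country" type="string" label="Country"/>
  <column id="year" type="integer" label="Year"/>
  <column id="population" type="integer" label="Population"/>
</dataset>
"""

# Map the metadata's declared types to Python conversion functions.
CASTS = {"string": str, "integer": int, "float": float}


def load(csv_text, xml_text):
    """Read CSV rows and convert each cell using the types declared in the XML."""
    root = ET.fromstring(xml_text.strip())
    columns = {col.get("id"): col.attrib for col in root.findall("column")}
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({key: CASTS[columns[key]["type"]](cell) for key, cell in raw.items()})
    labels = {key: meta["label"] for key, meta in columns.items()}
    return root.get("name"), labels, rows


name, labels, rows = load(CSV_TEXT, XML_TEXT)
print(name, labels, rows)
```

Keeping the metadata separate from the CSV means the data file stays simple, while labels and types can be read by tools that visualize or index it.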

PDE: Dataset Creation and Upload
