Open DATA METI: All Content As Big Data

Dr. Brand Niemann, Director and Senior Enterprise Architect – Data Scientist, Semantic Community (http://semanticommunity.info/); AOL Government Blogger (http://gov.aol.com/bloggers/brand-niemann/); March 15, 2013; http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI


TRANSCRIPT


Open DATA METI: All Content As Big Data

Dr. Brand Niemann, Director and Senior Enterprise Architect – Data Scientist

Semantic Community: http://semanticommunity.info/

AOL Government Blogger: http://gov.aol.com/bloggers/brand-niemann/

March 15, 2013

http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI


Preface

• Question from Brand Niemann:

– Does this deal with the data elements themselves in the data sets, so you can search for data elements that you want to integrate with other data elements and find their definitions (metadata) to know if they are the same or similar enough to be semantically integrated?

• Answer from John Erickson, Director, Web Science Operations, Tetherless World Constellation (RPI):

– No. DCAT deals with the initial problems of where dataset catalogs and datasets themselves are from and what they contain. Loosely speaking, it does for catalogs and datasets what Dublin Core did for publications: it provides a succinct vocabulary that providers can rely on for describing their datasets, and consumers can rely on for finding. DCAT has already been used as the basis for the schema.org "datasets" extension as a way to make discovery of datasets easier using popular search engines.

– Articulating the actual vocabularies used in published datasets is waaaay beyond the scope of DCAT, in part because DCAT is not restricted to datasets published as linked data. Some work including http://healthdata.tw.rpi.edu are looking at ways to communicate standard vocabularies used in published linked data...

All the work with Data Catalogs does not really help with data integration.
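To make the distinction concrete, here is a minimal sketch of a DCAT-style description of one METI dataset, written as JSON-LD from Python. It rests on assumptions: the property names are standard DCAT and Dublin Core terms, the URLs are ones that appear in this deck, the keywords are hypothetical, and it is not confirmed that DATA METI publishes exactly this record. The point is what the record lacks: nothing in it names or defines the columns inside the spreadsheet, which is the data-element level the question asks about.

```python
import json

# JSON-LD context mapping prefixes to the W3C DCAT and Dublin Core namespaces.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

def dcat_dataset(landing_page, title, download_url, media_type, keywords):
    """Catalog-level record: what the dataset is and where to get it (no column definitions)."""
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": keywords,
        "dcat:landingPage": {"@id": landing_page},
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": media_type,
        }],
    }

record = dcat_dataset(
    landing_page="http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011",
    title="General Energy Statistics (FY 2011)",
    download_url="http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls",
    media_type="application/vnd.ms-excel",
    keywords=["energy", "statistics"],  # hypothetical keywords for illustration
)

print(json.dumps(record, indent=2))
```

A consumer can find and download 2011fysokuhou.xls from this record, but must still open the file to learn what its columns mean before integrating it with anything else.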


Preface

http://www.computerweekly.com/news/2240179544/Big-data-spells-new-architectures

“The data warehouse does what it does well and is not going to go anywhere. But it is not architected very well for the future. Our job, as IT, revolves entirely around one thing: data integration.”

Big Data Spells New Architecture


Preface

http://radar.oreilly.com/2007/12/google-admits-data-is-the-inte.html

http://www.forbes.com/sites/jonbruner/2012/04/04/tim-oreilly-on-the-future-of-location-the-guy-with-the-most-data-wins/

‘Big Data is the new software’


Preface

• Dominic Sale:

– Introduced as OMB Chief of Data Analytics & Reporting at the Big Data Technology Symposium, March 13, 2013.

– Said: “New Digital Government Strategy is treating all content as data.”

– Dominic Sale joined OMB’s Office of E-Government and Information Technology in 2008 as a portfolio manager for several government-wide IT initiatives. At OMB, Dominic played a lead role in implementing and operating major initiatives such as the IT Dashboard, and he is currently heavily involved in implementing the Federal CIO’s 25-Point IT Management Reforms. Prior to arriving at OMB, Dominic began his Federal career as a program analyst in the OCIO at the Department of Transportation. In his prior life as a contractor at both BAE Systems and BearingPoint, Dominic managed EA, capital planning and security initiatives at DOL, NLRB, FDA, and Census. He has also worked on a variety of federal programs, at agencies such as the IRS, US Postal Service, US Mint, US Patent and Trademark Office, and the National Park Service.

http://semanticommunity.info/Big_Data_Symposia#Speaker_Bio_for_Dominic_Sale

“New Digital Government Strategy is treating all content as data.”


My Process

• Open DATA METI Web Site to MindTouch Knowledge Base to an Excel Spreadsheet

• Open DATA METI Data Set List by File Type to an Excel Spreadsheet

• Open DATA METI Data Sets by Metadata to an Excel Spreadsheet

• Import the Three Spreadsheets Above and Selected Open DATA METI Data Sets into Spotfire

• Obtain Visualizations and the Beginning of a Unified Big Data Architecture and Ecosystem for Big Data Integration
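The first three steps each turn web content into rows of a spreadsheet. A minimal sketch of that flattening step follows, assuming each catalog entry has already been captured as a Python dict; the field names and the second entry are hypothetical, and the first entry only reuses identifiers visible in the DATA METI URLs later in this deck. It writes CSV, which both Excel and Spotfire can import.

```python
import csv
import io

# Hypothetical catalog entries, as they might be captured from the data set list.
entries = [
    {"id": "statistics_sougouenergy_2011", "group": "statistics_sougouenergy",
     "formats": ["XLS", "PDF"], "title": "General Energy Statistics (FY 2011)"},
    {"id": "example_dataset", "group": "example_group",
     "formats": ["CSV"], "title": "Example data set"},
]

FIELDS = ["id", "title", "group", "formats"]

def entries_to_csv(rows):
    """Flatten catalog entries to one CSV row each; list fields become 'A;B'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        flat = {k: ";".join(v) if isinstance(v, list) else v for k, v in row.items()}
        writer.writerow({k: flat.get(k, "") for k in FIELDS})
    return buf.getvalue()

print(entries_to_csv(entries))
```

When saving the result to a file for Excel, opening it with `encoding="utf-8-sig"` helps Excel recognize UTF-8, which matters once Japanese titles are included.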


Open DATA METI: WordPress & CKAN

http://datameti.go.jp/

About DATA METI: Home, Terms of use, Privacy Policy, Notation of credit, Partners leverage DATA METI, Inquiry, API, API Documentation

Section: Tag, Statistics, Revision, Site administrator


Open DATA METI: MindTouch

http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI

Knowledge Base with Well-Defined URLs


Open DATA METI: Excel Spreadsheet 1 (Knowledge Base)

http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx


Open DATA METI: Data Set List

http://datameti.go.jp/data/

Drill Down on These 19


Open DATA METI: Excel Spreadsheet 2 (Data Set List)

http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx


Open DATA METI: Comprehensive Energy Statistics

http://datameti.go.jp/data/group/statistics_sougouenergy


Open DATA METI: General Energy Statistics (FY 2011)

http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011

Some Have Lots of Files

Source of Data


Open DATA METI: Source

http://www.enecho.meti.go.jp/info/statistics/jukyu/index.htm


Open DATA METI: Link to Excel Spreadsheet

http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011/resource/b707e1d2-bd3d-483a-ab83-65e081c6daab

Link to Spreadsheet

My Comment: It takes too many clicks to get to the actual data!
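Since DATA METI runs on CKAN and lists an API, one way around the clicking is to ask the catalog for a dataset's resources and read the download links directly. The sketch below assumes the standard CKAN Action API (`package_show`) is exposed under a base URL of `http://datameti.go.jp/data`; both the base URL and the API version are assumptions to check against the site's API documentation. The sample response is a trimmed, illustrative stand-in that pairs the resource id and spreadsheet URL shown in this deck.

```python
import json
from urllib.parse import urlencode

# Assumption: the catalog exposes the standard CKAN Action API under this base URL.
CKAN_BASE = "http://datameti.go.jp/data"  # hypothetical; verify against the API documentation

def package_show_url(dataset_id, base=CKAN_BASE):
    """URL of the CKAN action that returns one dataset together with all its resources."""
    return f"{base}/api/3/action/package_show?{urlencode({'id': dataset_id})}"

def download_links(response_text):
    """Extract (format, url) pairs from a package_show JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [(r.get("format", ""), r["url"]) for r in payload["result"].get("resources", [])]

# Illustrative, trimmed response in CKAN's envelope shape (values taken from this deck).
sample = json.dumps({
    "success": True,
    "result": {
        "name": "statistics_sougouenergy_2011",
        "resources": [{
            "id": "b707e1d2-bd3d-483a-ab83-65e081c6daab",
            "format": "XLS",
            "url": "http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls",
        }],
    },
})

print(package_show_url("statistics_sougouenergy_2011"))
print(download_links(sample))
```

Against a live server, the response text would come from `urllib.request.urlopen(package_show_url(...)).read().decode("utf-8")`, replacing several page visits with one request per dataset.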


Open DATA METI: Excel Spreadsheet

http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls


Open DATA METI: Excel Spreadsheet in Spotfire

Needs reformatting and language translation.

Beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp
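Part of the translation work can be done before import by mapping Japanese column headers to English through a maintained glossary. The sketch below is hypothetical: the glossary entries are common energy-statistics terms, not the actual headers of 2011fysokuhou.xls, and unknown headers are kept and reported rather than guessed.

```python
# Hypothetical glossary: Japanese column headers -> English.
GLOSSARY = {
    "石炭": "Coal",
    "原油": "Crude oil",
    "天然ガス": "Natural gas",
    "電力": "Electricity",
}

def translate_headers(headers, glossary=GLOSSARY):
    """Replace known Japanese headers; keep unknown ones and report them for review."""
    translated, unknown = [], []
    for header in headers:
        key = header.strip()  # spreadsheet headers often carry stray spaces
        if key in glossary:
            translated.append(glossary[key])
        else:
            translated.append(key)
            unknown.append(key)
    return translated, unknown

headers, todo = translate_headers(["石炭", " 原油", "合計"])
print(headers)  # ['Coal', 'Crude oil', '合計']
print(todo)     # ['合計']
```

Keeping untranslated headers visible, instead of dropping them, leaves a review list that grows the glossary over time.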


Open DATA METI: Excel Spreadsheet 3 (Data Sets Metadata)

http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx


Open DATA METI: Excel Spreadsheets 1-3 in Spotfire

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp


Open DATA METI: Excel Spreadsheet 4 (Merged Data Sets)

http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx
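Merging the three spreadsheets amounts to joining their rows on a shared data set identifier. A minimal sketch, assuming each spreadsheet has been read into a list of dicts with an `id` column; the column names and the second data set are hypothetical, and the values reuse identifiers and URLs from this deck.

```python
from collections import defaultdict

def merge_on_key(key, *tables):
    """Outer-join lists of dict rows on a shared key; on conflicting columns, later tables win."""
    merged = defaultdict(dict)
    for table in tables:
        for row in table:
            merged[row[key]].update(row)
    # Sort by key so the merged sheet has a stable row order.
    return [merged[k] for k in sorted(merged)]

# Hypothetical extracts of Spreadsheets 1-3.
kb_pages = [{"id": "statistics_sougouenergy_2011", "group": "statistics_sougouenergy"}]
data_set_list = [
    {"id": "statistics_sougouenergy_2011", "file_type": "XLS"},
    {"id": "example_dataset", "file_type": "CSV"},
]
metadata = [{"id": "statistics_sougouenergy_2011",
             "source": "http://www.enecho.meti.go.jp/info/statistics/jukyu/index.htm"}]

merged = merge_on_key("id", kb_pages, data_set_list, metadata)
for row in merged:
    print(row)
```

An outer join keeps data sets that appear in only one spreadsheet, so gaps in the catalog show up as empty cells in Spotfire instead of silently disappearing.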


Open DATA METI: Merged Data Sets in Spotfire

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp


Summary

• Preface:

– All the work with Data Catalogs does not really help with data integration.

– Big Data Spells New Architecture.

– Big Data is the new software.

– New Digital Government Strategy is treating all content as data.

• The Open DATA METI Data Catalog has been turned into data in spreadsheets and statistical visualizations in Spotfire.

• This simplifies the complex WordPress & CKAN interface, which requires many extra mouse clicks and provides no faceted search.

• Google Chrome provides Japanese-to-English translation of the metadata, but not of the data columns in the spreadsheets.

• This process provides the beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.
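Faceted search, which the summary notes the WordPress & CKAN interface lacks, comes down to counting the values of a few metadata fields and filtering on the values a user selects. A minimal in-memory sketch under that reading follows; the records and field names are hypothetical, not DATA METI's actual schema.

```python
from collections import Counter

def facet_counts(records, facets):
    """Count values per facet; list-valued fields count once per value."""
    counts = {f: Counter() for f in facets}
    for rec in records:
        for f in facets:
            value = rec.get(f)
            values = value if isinstance(value, list) else [value]
            counts[f].update(v for v in values if v is not None)
    return counts

def facet_filter(records, **selected):
    """Keep records that match every selected facet value."""
    def matches(rec, field, wanted):
        value = rec.get(field)
        return wanted in value if isinstance(value, list) else value == wanted
    return [r for r in records if all(matches(r, f, v) for f, v in selected.items())]

# Hypothetical catalog records.
records = [
    {"id": "a", "group": "energy", "formats": ["XLS", "PDF"]},
    {"id": "b", "group": "energy", "formats": ["CSV"]},
    {"id": "c", "group": "trade", "formats": ["XLS"]},
]

xls = facet_filter(records, formats="XLS")
print([r["id"] for r in xls])                      # ['a', 'c']
print(dict(facet_counts(xls, ["group"])["group"]))  # {'energy': 1, 'trade': 1}
```

Recomputing the counts on the filtered set is what lets each click narrow the remaining choices, which is the behavior a faceted interface gives in one step.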