Data Identification

Upload: olisa

Post on 19-Jan-2016

Open Data Around the World

Before

What data do you have?

Had to ask for data through a FOIA request

Wasn't always in a digital format

Took a very long time to get and make use of

Why Should I Care?

Health (hospital scores, diet/food)

Economics (unemployment, CPI)

Crime (rates, geo/temporal)

Environment (air quality, weather)

Education (rates, school districts)

And so much more...

Data.gov

Raw Data

Data.gov Dataset Page

Other Raw Datasets

Challenges

Machine-readability

Metadata

Provenance

Discovery

Mashing/linking

Linked Data

decentralized - sources may be spread out and referenced across the Web

modular - linked without advance planning or coordination

scalable - once stored in place, it’s easy to extend

These advantages hold even when the definitions and structure of the data change over time.
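A minimal sketch of the "modular" and "scalable" points, assuming the data is held as (subject, predicate, object) triples. The `ex:` names and the `ex:elevation` property are hypothetical and only illustrate the idea: two independently published sources merge by plain set union, and a property added later does not disturb what was already there.

```python
# Hypothetical triples from two sources that were published separately.
source_a = {
    ("ex:site123", "ex:cost", "11.3"),
}
source_b = {
    ("ex:site123", "ex:latitude", "43.993"),
    ("ex:site123", "ex:longitude", "-70.326"),
}


def merge(*graphs):
    """Combine graphs by set union; no shared schema is needed."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return {predicate: object} for one subject (single-valued here)."""
    return {p: o for s, p, o in graph if s == subject}


graph = merge(source_a, source_b)

# Extending the data later is just adding more triples (hypothetical value).
extended = merge(graph, {("ex:site123", "ex:elevation", "120")})

print(describe(graph, "ex:site123"))
print(describe(extended, "ex:site123"))
```

Both sources only need to agree on the identifier `ex:site123`; everything else can evolve independently.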

Linked Open Data Cloud

Linking Open Government Data

Catalog

Dataset Page

Data Understanding

Conversion: From Raw Tabular Data to RDF
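One possible way to sketch this conversion, using only the Python standard library: each row becomes a subject, each column a predicate, and each non-empty cell a literal, written out as N-Triples lines. The `http://example.org/dataset/` namespace, the `row/N` and `vocab/column` naming scheme, and the function name are assumptions made for illustration; the sample rows reuse values from the deck's example table.

```python
import csv
import io

BASE = "http://example.org/dataset/"  # hypothetical namespace


def rows_to_ntriples(csv_text, base=BASE):
    """Turn every non-empty cell of a CSV table into one N-Triples line."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}row/{index}>"
        for column, value in row.items():
            if column is None or value is None or value.strip() == "":
                continue  # empty or unnamed cells produce no triple
            predicate = f"<{base}vocab/{column.strip()}>"
            # Escape backslashes and quotes so the literal stays valid.
            literal = value.strip().replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


raw = (
    "year,PHSY_ST,site-id,cost\n"
    "1998,,,10.0\n"
    "1999,,site123,11.3\n"
    "2000,NY,,8.3\n"
    "2001,,,20\n"
)

for line in rows_to_ntriples(raw):
    print(line)
```

All values are emitted as plain string literals here; typing them (for example, years as integers) would be a later enhancement step.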

Enhancement: Linking Open Government Data

Raw dataset (- marks a cell with no value shown on the slide):

ID   year   PHSY_ST   site-id   cost
-    1998   -         -         10.0
-    1999   -         site123   11.3
-    2000   NY        -         8.3
-    2001   -         -         20

Related datasets (labeled "Correlated dataset" and "Complement dataset" on the slide):

site-id   Latitude   longitude
site123   43.993     -70.326

Year   claims
2000   382

Metadata (field definition) and Metadata (value definition):

PHSY_ST: state abbreviation
ID: unique id
cost: unit is million US dollars
year: 1975-2008

owl:sameAs

DS123:NY
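The enhancement shown on this slide can be sketched as two lookups: the site table is joined on `site-id` and the claims table on the year. This is only an illustration under stated assumptions: the function and variable names are hypothetical, `None` stands for cells that are empty on the slide, the ID column is omitted because its values are not shown, and the `owl:sameAs` link is left out because its target is not given.

```python
# Rows copied from the slide; None marks a cell that is empty there.
raw_rows = [
    {"year": 1998, "PHSY_ST": None, "site-id": None, "cost": 10.0},
    {"year": 1999, "PHSY_ST": None, "site-id": "site123", "cost": 11.3},
    {"year": 2000, "PHSY_ST": "NY", "site-id": None, "cost": 8.3},
    {"year": 2001, "PHSY_ST": None, "site-id": None, "cost": 20.0},
]

# The two related datasets, keyed by the column they share with raw_rows.
sites = {"site123": {"Latitude": 43.993, "longitude": -70.326}}
claims_by_year = {2000: 382}


def enhance(rows, sites, claims_by_year):
    """Return copies of rows with site coordinates and yearly claims added."""
    enhanced = []
    for row in rows:
        out = dict(row)  # leave the raw row untouched
        out.update(sites.get(row["site-id"], {}))
        if row["year"] in claims_by_year:
            out["claims"] = claims_by_year[row["year"]]
        enhanced.append(out)
    return enhanced


for row in enhance(raw_rows, sites, claims_by_year):
    print(row)
```

Rows without a matching site or year pass through unchanged, so the raw data is never lost during enhancement.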