LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Albert Meroño-Peñuela (@albertmeronyo), WAI meeting, 06-10-2014

Upload: albert-merono-penuela

Post on 04-Jul-2015

DESCRIPTION

Governments, public agencies and institutions, and companies produce a great amount of statistical data every year. Many of these data are released as Open Data and published on the Web, although usually as documents, not as Linked Data. In this talk I'll introduce RDF Data Cube (QB), a W3C standard for publishing multidimensional data, such as statistics, on the Web in such a way that they can be linked to other datasets and concepts. However, QB leaves it largely open how users should model variables and values (dimensions and codes in QB jargon), which hampers the reuse of existing ones. To address this, I'll show you LSD Dimensions, a web-based application that monitors the usage of dimensions and codes across more than five hundred public SPARQL endpoints.

TRANSCRIPT

Page 1: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

LSD Dimensions

Use and Reuse of Linked Statistical Data as RDF Data Cube

Albert Meroño-Peñuela

@albertmeronyo

WAI meeting 06-10-2014

Page 2: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Statistics!

Page 3: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Data integration – 220 years ago

Page 4: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Data integration - nowadays

Page 5: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Data integration - nowadays

Page 6: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Towards 5-star Linked Statistical Data

Page 7: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Towards 5-star Linked Statistical Data

Page 8: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Towards 5-star Linked Statistical Data

DFT

Page 9: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Towards 5-star Linked Statistical Data

DFT

Eurostat TSV

Page 10: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

RDF Data Cube

• 4-star LSD: use URIs to denote (statistical) things

• 5-star LSD: link own (statistical) things to other (statistical) things

“There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts.”

Page 11: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
Page 12: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
Page 13: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

RDF Data Cube vocabulary (QB)

• SDMX compatible

• Defines cubes as a set of observations that consist of dimensions, measures and attributes

• Dimensions: time period, region, sex (qb:DimensionProperty)

• Measure: population life expectancy (qb:MeasureProperty)

• Attribute: unit of measure = years, metadata status = measured (qb:AttributeProperty)

Observation: “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years”
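
To make the cube structure concrete, the observation above can be written down as a handful of triples. This is only a minimal sketch in plain Python: the `qb:` and `rdf:` IRIs are the real vocabulary namespaces, but everything under `http://example.org/` (the property names `refPeriod`, `refArea`, `sex`, `lifeExpectancy`, `unitMeasure`, `obsStatus` and the code values) is invented for illustration and is not taken from the talk or from any published dataset.

```python
# Real namespaces: RDF and the RDF Data Cube vocabulary (QB).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
# Hypothetical namespace: every name under EX is invented for this sketch.
EX = "http://example.org/"


def uri(value):
    """Format an absolute IRI as an N-Triples term."""
    return f"<{value}>"


def build_observation(obs_id, dimensions, measure, attributes):
    """Return (subject, predicate, object) triples for one qb:Observation."""
    subject = uri(EX + obs_id)
    triples = [(subject, uri(RDF_TYPE), uri(QB + "Observation"))]
    # Dimensions and attributes point at coded values, modelled here as IRIs.
    for prop, code in {**dimensions, **attributes}.items():
        triples.append((subject, uri(EX + prop), uri(EX + code)))
    # The measure carries the observed number as a typed literal.
    prop, value = measure
    triples.append((subject, uri(EX + prop), f'"{value}"^^{uri(XSD_DECIMAL)}'))
    return triples


triples = build_observation(
    "obs1",
    dimensions={"refPeriod": "2004-2006", "refArea": "Newport", "sex": "male"},
    measure=("lifeExpectancy", "76.7"),
    attributes={"unitMeasure": "years", "obsStatus": "measured"},
)

# Print the observation in an N-Triples-like form, one statement per line.
for triple in triples:
    print(" ".join(triple), ".")
```

In a real cube each of these properties would additionally be declared as a qb:DimensionProperty, qb:MeasureProperty or qb:AttributeProperty; that declaration step is left out here to keep the sketch short.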

Page 14: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

5-star LSD: 270a.info

Sarven Capadisli, Sören Auer, Reinhard Riedl. “Linked Statistical Data Analysis”. 1st Int. Workshop on Semantic Statistics (SemStats) ISWC 2013.

Page 15: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Are we done?

• P1: Comparability? Can we arbitrarily combine any pair of these datasets/dimensions?

• P2: Reusability? How often are dimensions reused? Can we reuse dimensions created by others?

• P3: Discoverability? How to discover dimensions created by others?

• P4: Relevance? What’s the size of LSD?

Page 16: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

P1: Comparability of LSD: SSCLSDA

Sarven Capadisli, Albert Meroño-Peñuela, Sören Auer, Reinhard Riedl. “Semantic Similarity and Correlation of Linked Statistical Data Analysis”. 2nd Int. Workshop on Semantic Statistics (SemStats) ISWC 2014.

Page 17: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

P2+P3+P4: LSD Dimensions

Need for an intelligent system that helps us (1) discover, (2) reuse and (3) analyze dimensions in LSD
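
As a rough idea of what discovering dimensions could involve, the sketch below pairs a SPARQL query for resources typed as qb:DimensionProperty with a parser for the standard SPARQL JSON results format. It is an assumption about how such a crawler might work, not the query LSD Dimensions actually runs; the sample response and its `http://example.org/` IRIs are made up, and no endpoint is contacted.

```python
import json

# Query listing every declared dimension with its label, if it has one.
DIMENSIONS_QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?dimension ?label WHERE {
  ?dimension a qb:DimensionProperty .
  OPTIONAL { ?dimension rdfs:label ?label }
}
"""


def parse_dimensions(sparql_json):
    """Extract (dimension IRI, label or None) pairs from a SPARQL JSON result."""
    document = json.loads(sparql_json)
    rows = []
    for binding in document["results"]["bindings"]:
        dimension = binding["dimension"]["value"]
        # ?label is OPTIONAL, so it may be absent from a binding.
        label = binding.get("label", {}).get("value")
        rows.append((dimension, label))
    return rows


# Made-up response, shaped like what an endpoint would return for the query.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["dimension", "label"]},
    "results": {"bindings": [
        {"dimension": {"type": "uri", "value": "http://example.org/refArea"},
         "label": {"type": "literal", "value": "Reference area"}},
        {"dimension": {"type": "uri", "value": "http://example.org/refPeriod"}},
    ]},
})

dimensions = parse_dimensions(SAMPLE_RESPONSE)
for dimension, label in dimensions:
    print(dimension, "-", label or "(no label)")
```

Sending `DIMENSIONS_QUERY` to each public endpoint and merging the parsed rows would give a first, naive index of the dimensions in use.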

Page 18: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

http://lsd-dimensions.org/

Page 19: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

http://lsd-dimensions.org/

Page 20: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
Page 21: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
Page 22: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube
Page 23: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Are we done?

• P1: Comparability? Can we arbitrarily combine any pair of these datasets/dimensions? Unclear

• P2: Reusability? How often are dimensions reused? Can we reuse dimensions created by others? Logarithmic law / Probably yes

• P3: Discoverability? How to discover dimensions created by others? LSD Dimensions

• P4: Relevance? What’s the size of LSD? ~8.5% of the LOD cloud
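
The reuse question in P2 comes down to counting, for each dimension, how many endpoints use it. The sketch below shows that counting step on toy data; the endpoint URLs and `ex:` dimension names are hypothetical and the numbers say nothing about the real distribution reported in the talk.

```python
from collections import Counter


def reuse_counts(dimensions_by_endpoint):
    """Rank dimensions by the number of endpoints in which they appear."""
    counts = Counter()
    for dimensions in dimensions_by_endpoint.values():
        # Count each dimension once per endpoint, however often it is used there.
        counts.update(set(dimensions))
    # Most reused first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy crawl result (hypothetical endpoints and dimensions).
TOY_CRAWL = {
    "http://example.org/endpoint-a/sparql": ["ex:refArea", "ex:refPeriod", "ex:sex"],
    "http://example.org/endpoint-b/sparql": ["ex:refArea", "ex:refPeriod"],
    "http://example.org/endpoint-c/sparql": ["ex:refArea", "ex:age"],
}

ranking = reuse_counts(TOY_CRAWL)
for dimension, endpoints in ranking:
    print(f"{dimension}: used in {endpoints} endpoint(s)")
```

Plotting such a ranking for a real crawl is one way to inspect the shape of the reuse distribution.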

Page 24: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Future Work

• Monitor additional metadata (rdfs:subPropertyOf, rdfs:range)

• Generate PROV during crawling

• Modeling of formulas in RDF Data Cube

• Plug to LOD Laundromat

• Crawl dimensions and codes from qb:Observation

• SPARQL endpoint and API

– Suggest dimensions and codes to users

Page 25: LSD Dimensions: Use and Reuse of Linked Statistical Data as RDF Data Cube

Thank you

Questions, suggestions, comments most welcome

@albertmeronyo

http://lsd-dimensions.org/

https://github.com/albertmeronyo/LSD-Dimensions

https://github.com/csarven/sense-of-lsd-analysis

http://www.cedar-project.nl