LSD Dimensions
Use and Reuse of Linked Statistical Data as RDF Data Cube
Albert Meroño-Peñuela, @albertmeronyo
WAI meeting 06-10-2014
Statistics!
Data integration – 220 years ago
Data integration - nowadays
Towards 5-star Linked Statistical Data
DFT
Eurostat TSV
RDF Data Cube
• 4-star LSD: use URIs to denote (statistical) things
• 5-star LSD: link own (statistical) things to other (statistical) things
“There are many situations where it would be useful to be able to publish multi-dimensional data, such as
statistics, on the web in such a way that they can be linked to related data sets and concepts.”
RDF Data Cube vocabulary (QB)
• SDMX compatible
• Defines cubes as a set of observations that consist of dimensions, measures and attributes
• Dimensions: time period, region, sex (qb:DimensionProperty)
• Measure: population life expectancy (qb:MeasureProperty)
• Attribute: unit of measure = years, metadata status = measured (qb:AttributeProperty)
Observation: “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years”
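The observation above can be sketched as a qb:Observation in Turtle. This is a minimal illustration, not the official encoding: qb:Observation and qb:dataSet are real QB terms, while everything in the ex: namespace (property names, codes, the dataset identifier) is a hypothetical placeholder chosen for this example.

```python
# qb is the RDF Data Cube namespace; ex is a hypothetical example namespace.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "ex": "http://example.org/ns#",
}


def observation_to_turtle(obs_id, dataset, dimensions, measure, attributes):
    """Serialize one observation as a Turtle snippet (strings are used as-is)."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    # Predicate/object pairs in the order: type, dataset, dimensions, measure, attributes.
    pairs = [("a", "qb:Observation"), ("qb:dataSet", dataset)]
    pairs += list(dimensions.items())
    pairs.append(measure)
    pairs += list(attributes.items())
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    lines.append(f"{obs_id}\n{body} .")
    return "\n".join(lines)


# All ex: terms below are assumptions made for illustration.
turtle = observation_to_turtle(
    "ex:o1",
    "ex:dataset-le1",
    dimensions={
        "ex:refPeriod": '"2004-2006"',
        "ex:refArea": "ex:Newport",
        "ex:sex": "ex:male",
    },
    measure=("ex:lifeExpectancy", "76.7"),
    attributes={"ex:unitMeasure": "ex:years", "ex:status": "ex:measured"},
)
print(turtle)
```

In a real cube the ex: properties would be declared as qb:DimensionProperty, qb:MeasureProperty and qb:AttributeProperty, ideally reusing existing URIs rather than minting new ones.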
5-star LSD: 270a.info
Sarven Capadisli, Sören Auer, Reinhard Riedl. “Linked Statistical Data Analysis”. 1st Int. Workshop on Semantic Statistics (SemStats) ISWC 2013.
Are we done?
• P1: Comparability? Can we arbitrarily combine any pair of these datasets/dimensions?
• P2: Reusability? How often are dimensions reused? Can we reuse dimensions created by others?
• P3: Discoverability? How to discover dimensions created by others?
• P4: Relevance? What’s the size of LSD?
P1: Comparability of LSD: SSCLSDA
Sarven Capadisli, Albert Meroño-Peñuela, Sören Auer, Reinhard Riedl. “Semantic Similarity and Correlation of Linked Statistical Data Analysis”. 2nd Int. Workshop on Semantic Statistics (SemStats) ISWC 2014.
P2+P3+P4: LSD Dimensions
Need for an intelligent system that helps us with (1) discovering, (2) reusing and (3) analyzing dimensions in LSD
http://lsd-dimensions.org/
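One way to discover dimensions in a given SPARQL endpoint is to ask which properties are attached to data structure definitions through qb:dimension. The sketch below only builds the query and the request URL; it does not send anything. The endpoint address is a placeholder, and the query is an assumption about how such a lookup could be written, not necessarily the one LSD Dimensions uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; replace with a real SPARQL endpoint before use.
ENDPOINT = "http://example.org/sparql"

# Counts, per dimension, the number of data structure definitions that use it.
QUERY = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?dimension (COUNT(DISTINCT ?dsd) AS ?uses)
WHERE {
  ?dsd a qb:DataStructureDefinition ;
       qb:component ?component .
  ?component qb:dimension ?dimension .
}
GROUP BY ?dimension
ORDER BY DESC(?uses)
"""


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_request_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```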
Are we done?
• P1: Comparability? Can we arbitrarily combine any pair of these datasets/dimensions? Answer: unclear
• P2: Reusability? How often are dimensions reused? Answer: reuse follows a logarithmic law. Can we reuse dimensions created by others? Answer: probably yes
• P3: Discoverability? How to discover dimensions created by others? Answer: LSD Dimensions
• P4: Relevance? What’s the size of LSD? Answer: ~8.5% of the LOD cloud
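To make the reuse question (P2) concrete, the following sketch counts in how many datasets each dimension appears and ranks dimensions by that count. The input mapping is invented toy data with hypothetical ex: names; it says nothing about the actual distribution reported above.

```python
from collections import Counter

# Toy input, invented for illustration: dataset -> dimensions it uses.
usage = {
    "ex:dataset-a": ["ex:refArea", "ex:refPeriod", "ex:sex"],
    "ex:dataset-b": ["ex:refArea", "ex:refPeriod"],
    "ex:dataset-c": ["ex:refArea", "ex:age"],
}


def reuse_counts(usage):
    """Count, for each dimension, the number of datasets that use it."""
    counts = Counter()
    for dims in usage.values():
        counts.update(set(dims))  # set() so a dataset counts at most once
    return counts


counts = reuse_counts(usage)
ranked = counts.most_common()  # most reused dimensions first
reused = sorted(dim for dim, n in ranked if n > 1)
print(ranked)
print(reused)
```

Plotting the ranked counts for a real crawl would show whether reuse is concentrated in a few dimensions or spread evenly.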
Future Work
• Monitor additional metadata (rdfs:subPropertyOf, rdfs:range)
• Generate PROV during crawling
• Modeling of formulas in RDF Data Cube
• Plug to LOD Laundromat
• Crawl dimensions and codes from qb:Observation
• SPARQL endpoint and API
  – Suggest dimensions and codes to users
Thank you
Questions, suggestions, comments most welcome
@albertmeronyo
http://lsd-dimensions.org/
https://github.com/albertmeronyo/LSD-Dimensions
https://github.com/csarven/sense-of-lsd-analysis
http://www.cedar-project.nl