![Page 1: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/1.jpg)
Statistical Data in RDF
Knowledge Engineering Group Seminar, November 4th 2010
Jindřich Mynarz (@jindrichmynarz)
![Page 2: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/2.jpg)
Scope of the talk
• not microdata (e.g., survey data)
• but aggregated data (e.g., averages)
• only RDF
• overview of existing statistical datasets
![Page 3: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/3.jpg)
RDF
• separation of content and layout
o in tabular data, table layout defines the way of interpretation
• flexible, schema-less data format
o not overly inclusive, nor overly exclusive
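The first point can be illustrated with a minimal Python sketch: in a table, a cell only has meaning through its row and column headers, while each RDF triple carries that context itself. The `http://example.org/` namespace, the column names, and the values are invented for illustration and do not come from any of the datasets mentioned in the talk.

```python
# Hypothetical table: the header row and the first column supply the
# meaning that the layout would otherwise carry implicitly.
header = ["area", "population", "area_km2"]
rows = [
    ["A", "100", "10"],  # dummy values, for illustration only
    ["B", "200", "20"],
]

BASE = "http://example.org/"  # assumed namespace, not from the talk


def table_to_triples(header, rows):
    """Turn every non-key cell into one (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = BASE + "area/" + row[0]
        for column, value in zip(header[1:], row[1:]):
            triples.append((subject, BASE + "property/" + column, value))
    return triples


triples = table_to_triples(header, rows)
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```

Each printed statement can be read on its own, independently of where the cell sat in the original table.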
![Page 4: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/4.jpg)
Existing statistics in RDF
• CIA World Factbook
• U.S. Census 2000 dataset
• LOIUS - Italian linked university statistics
• Linked Environment Data
• EnAKTing datasets
• data.gov.uk datasets
![Page 5: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/5.jpg)
Eurostat data
• Freie Universität Berlin - D2R Server
• riese (RDFizing and Interlinking the EuroStat Data Set Effort)
• OntologyCentral - real-time wrapper
• Eurostat's own RDF datasets
![Page 6: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/6.jpg)
Governmental statistics
• data.gov
• data.gov.uk
o EnAKTing mashups and data visualizations
o population, crime, CO2 emissions, transport, agriculture, education...
![Page 7: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/7.jpg)
Data modelling
• what is being modelled?
o the real world
o a part of the real world
o statistics
• two parts of modelling
o structural semantics
o domain semantics
![Page 8: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/8.jpg)
Structural semantics
• means of expression for the cube's structure
• groups, slices, time series
• addressed in the Data Cube vocabulary
![Page 9: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/9.jpg)
Domain semantics
• how a dataset refers to the things that it is about
• connecting statistical observations to the model of the domain described by them
• domain is a set of non-information resources
![Page 10: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/10.jpg)
Vocabularies
• number of ad hoc vocabularies
• riese
• SCOVO
• SCOVOLink
• Data Cube
• SDMX/RDF
![Page 11: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/11.jpg)
SCOVO
• The Statistical Core Vocabulary
• inspired by riese vocabulary
• modelling of dimensions and observations as separate resources
• lightweight, easy to adopt
• SCOVOLink addresses domain semantics
![Page 12: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/12.jpg)
Data Cube
• inspired by SCOVO
o added expressive power
• generalization from SDMX/RDF
• re-use of SKOS for codelists
![Page 13: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/13.jpg)
Data Cube
![Page 14: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/14.jpg)
Data Cube
• dimensions (rdf:Property)
• coded values (skos:Concept)
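A rough sketch of how these two building blocks fit together in a single observation, written as plain Python tuples rather than with an RDF library. `qb:Observation`, `qb:dataSet`, `qb:DimensionProperty`, and `skos:Concept` are terms of the Data Cube and SKOS vocabularies; everything under `http://example.org/`, and the value `100`, is a hypothetical placeholder.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# A dimension is a property (qb:DimensionProperty is a kind of rdf:Property);
# the values it can take are coded as SKOS concepts.
ref_area = EX + "dimension/refArea"
area_a = EX + "code/area/A"

schema = [
    (ref_area, RDF_TYPE, QB + "DimensionProperty"),
    (area_a, RDF_TYPE, SKOS + "Concept"),
]


def make_observation(uri, dataset, dimensions, measure, value):
    """Describe one cell of the cube as a list of triples."""
    triples = [
        (uri, RDF_TYPE, QB + "Observation"),
        (uri, QB + "dataSet", dataset),
    ]
    # One triple per dimension: dimension property -> coded value.
    triples += [(uri, dim, code) for dim, code in dimensions.items()]
    triples.append((uri, measure, value))
    return triples


obs = make_observation(
    EX + "observation/1",
    EX + "dataset/population",
    {ref_area: area_a},
    EX + "measure/population",
    100,  # dummy value
)
```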
![Page 15: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/15.jpg)
Data Cube
![Page 16: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/16.jpg)
SDMX/RDF
• Statistical Data and Metadata eXchange reformulated in RDF
• built on top of Data Cube
• contains:
o sdmx
o sdmx-attribute
o sdmx-code
o sdmx-concept
o sdmx-dimension
o sdmx-measure
o sdmx-metadata
o sdmx-subject
![Page 17: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/17.jpg)
Important parts of modelling
• re-use
• units
• time
• identifiers
• URI patterns
![Page 18: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/18.jpg)
Re-use oriented design
• re-purposing parts of the existing datasets
• re-using shared vocabularies
• vocabulary hi-jacking and extension
![Page 19: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/19.jpg)
Units of measurement
• implicit
o “78693011 m^2”, “117 b”
o eurostat:total_area_km2
• explicit
o :unit, sdmx-attribute:unitMeasure
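One way to move from the implicit to the explicit form is to split the literal into a number and a unit and attach the unit as its own statement. The sketch below assumes the value and unit are separated by whitespace; the `UNIT_URIS` lookup table and the `http://example.org/` URIs are hypothetical, while `sdmx-attribute:unitMeasure` and `sdmx-measure:obsValue` are properties from the SDMX/RDF vocabularies.

```python
UNIT_MEASURE = "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"

# Hypothetical mapping from unit strings to unit resources.
UNIT_URIS = {"m^2": "http://example.org/unit/square-metre"}


def split_value_and_unit(literal):
    """Split a literal such as '78693011 m^2' into (number, unit string)."""
    parts = literal.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"no unit found in {literal!r}")
    return float(parts[0]), parts[1].strip()


def explicit_unit_triples(observation_uri, literal, unit_uris=UNIT_URIS):
    """Replace one value-with-unit literal by a value triple and a unit triple."""
    value, unit = split_value_and_unit(literal)
    return [
        (observation_uri, OBS_VALUE, value),
        (observation_uri, UNIT_MEASURE, unit_uris[unit]),
    ]


triples = explicit_unit_triples("http://example.org/observation/1", "78693011 m^2")
```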
![Page 20: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/20.jpg)
Modelling of time
• exclusion of the dimension of time (D2R Eurostat, U.S. Census 2000)
• time dimension (riese, SDMX/RDF)
o dimension:Time, sdmx:TimeRole
o time series
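When time is modelled as a dimension, a time series is what remains after fixing every other dimension and letting only time vary. A small sketch of that grouping, with invented keys (`area`, `year`, `value`) and dummy numbers:

```python
from collections import defaultdict

# Hypothetical observations; "year" plays the role of the time dimension.
observations = [
    {"area": "A", "year": 2009, "value": 12},
    {"area": "B", "year": 2008, "value": 20},
    {"area": "A", "year": 2008, "value": 10},
]


def time_series(observations, time_key="year", value_key="value"):
    """Group observations that agree on every dimension except time."""
    series = defaultdict(list)
    for obs in observations:
        fixed = tuple(sorted(
            (k, v) for k, v in obs.items() if k not in (time_key, value_key)
        ))
        series[fixed].append((obs[time_key], obs[value_key]))
    # Order the points of each series chronologically.
    return {key: sorted(points) for key, points in series.items()}


series = time_series(observations)
```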
![Page 21: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/21.jpg)
Identifiers
• blank nodes
• URIs
• HTTP URIs
![Page 22: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/22.jpg)
URI design patterns
• on the Web
o http://
• human-readable
o what/is/this/about
• clustered by resource type
o type/unique-id
• standardized
o {provider 1}/path/to/an/observation
o {provider 2}/path/to/an/observation
• hierarchical
o {broader}/{narrower}
• reflecting the location of an observation in a data cube
o {dimension 1}/{dimension 2}
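Several of these patterns can be combined in one helper that clusters by resource type and then appends the dimension values in a fixed order. This is only a sketch: the base URI, the `observation` path segment, and the dataset name are assumptions, not a scheme used by any of the datasets above.

```python
from urllib.parse import quote


def observation_uri(base, dataset, dimension_values):
    """Build {base}/observation/{dataset}/{dimension 1}/{dimension 2}/..."""
    # Percent-encode each segment so that a value cannot introduce a slash.
    segments = [quote(str(value), safe="") for value in dimension_values]
    return "/".join(
        [base.rstrip("/"), "observation", quote(dataset, safe="")] + segments
    )


uri = observation_uri("http://example.org", "population", ["A", 2009])
print(uri)
```

Keeping the order of dimensions fixed matters: the same observation should always map to the same URI.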
![Page 23: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/23.jpg)
Following steps
• data conversion
• interlinking dataset's resources
• linking external datasets
• publishing
![Page 24: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/24.jpg)
Legacy datasets
• statistics-specific data formats
• implicit context of interpretation
• parsing, cleaning
• conversion mechanisms
o SQL DB wrappers (e.g., D2R Server)
o real-time exporters (e.g., OntologyCentral)
o RDFizers (e.g., RDF123)
o custom-built scripts
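A custom-built script of the kind listed last might look roughly like the sketch below, which parses a delimited file, cleans the cells, skips missing values, and writes N-Triples. The input layout, the `MISSING` markers, and the `http://example.org/` URIs are all assumptions made for the example.

```python
import csv
import io

EX = "http://example.org/"  # hypothetical namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
MISSING = {"", ":", "n/a"}  # assumed markers for unavailable values

# Hypothetical legacy export with stray whitespace and a missing value.
RAW = """area;year;value
A; 2009 ;100
B;2009;:
"""


def convert(raw):
    """Parse, clean, and convert the rows to N-Triples lines."""
    lines = []
    reader = csv.DictReader(io.StringIO(raw), delimiter=";")
    for row in reader:
        # Cleaning step: trim whitespace around every cell.
        row = {key: cell.strip() for key, cell in row.items()}
        value = row["value"]
        if value in MISSING:
            continue  # no triple for a missing observation
        subject = EX + "observation/" + row["area"] + "/" + row["year"]
        predicate = EX + "property/value"
        lines.append(f'<{subject}> <{predicate}> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


ntriples = convert(RAW)
print("\n".join(ntriples))
```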
![Page 25: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/25.jpg)
Linking
• re-use by reference
• lightweight integration
• linkable data
• linking properties
o e.g., owl:sameAs, skos:closeMatch
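A naive illustration of generating such links: when two datasets identify the same things with a shared key, a link can be emitted for every key they have in common. The two lookup tables and their URIs are hypothetical, and real interlinking usually needs more careful matching than an exact key comparison.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical resources in a local and an external dataset, keyed by code.
local = {
    "A": "http://example.org/area/A",
    "B": "http://example.org/area/B",
}
external = {
    "A": "http://example.net/region/a",
    "C": "http://example.net/region/c",
}


def link_by_key(local, external, link_property=OWL_SAME_AS):
    """Emit one link triple for every key present in both datasets."""
    shared = sorted(local.keys() & external.keys())
    return [(local[key], link_property, external[key]) for key in shared]


links = link_by_key(local, external)
```

Passing `SKOS_CLOSE_MATCH` as `link_property` yields the weaker statement when strict identity would claim too much.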
![Page 26: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/26.jpg)
Publishing data
• new dissemination standards
• exchanging data with the Web
• RDF dumps
• linked data distribution
• SPARQL
• RDFa
![Page 27: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/27.jpg)
Linked open data cloud
![Page 28: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/28.jpg)
Benefits
• data can be integrated
• open data
• re-usable data
• data available for applications
![Page 29: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/29.jpg)
Integration
• combining and merging with other datasets
• re-use oriented design
![Page 30: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/30.jpg)
Open data
• freedom of information for public sector information
• open licences
o Creative Commons, Open Government Licence...
• public domain
![Page 31: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/31.jpg)
Anyone can solve the cube
• data is available for individual analysis
• offices for national statistics still have the monopoly on data collection, but no longer on interpretation of that data
• data-driven journalism
![Page 32: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/32.jpg)
Building on top of statistical data
• once the data is available, useful applications can be built on top of it
• data visualizations
• data analysis tools
![Page 33: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/33.jpg)
Questions!
![Page 34: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/34.jpg)
Thank you for your attention!
![Page 35: Statistical data in RDF](https://reader034.vdocuments.us/reader034/viewer/2022042813/540587488d7f729b768b4d8f/html5/thumbnails/35.jpg)
Image credits
Semantic Web Rubik's Cube. http://www.flickr.com/photos/dullhunk/3448804778/
Rubik's Cube. http://www.flickr.com/photos/bramus/3249196137/
Hypercube. http://commons.wikimedia.org/wiki/File:Hypercube.png
PICOL: Pictorial communication language. http://picol.org/
Dictionary. http://www.flickr.com/photos/horiavarlan/4268897748/
Oops! http://www.flickr.com/photos/rore/299375688/
Tape Measure. http://www.flickr.com/photos/wwarby/4915969081/
Rubik's Cube 1. http://www.flickr.com/photos/lifeontheedge/374960949/
Detroit's Skyline. http://www.flickr.com/photos/showmeone/4154861617/
Linked Open Data Cloud. http://richard.cyganiak.de/2007/10/lod/
Cube. http://followtherhythm.deviantart.com/art/cube-128329792
Data Cube diagram. http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/qb-fig1.png