![Page 1: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/1.jpg)
Finding Data Sets
Anja Jentzsch, Freie Universität Berlin
17 April 2012
Tutorial: Practical Cross-Dataset Queries on the Web of Data
WWW2012, Lyon, France
![Page 2: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/2.jpg)
Different motivations
• Finding data sets
  • Look for resources to link a data set to
  • Find a data set with relevant data to consume / integrate
• Finding vocabularies
  • Find vocabularies to use to model data sets
  • Find vocabularies to map your existing schema to
![Page 3: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/3.jpg)
Different tool types
• Search engines
  • find data sets based on keywords
• Data catalogs / directories
  • explore data sets through browsing and faceted search
• Data marketplaces
  • explore and consume data sets
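Faceted search, as offered by data catalogs, can be illustrated with a minimal sketch. The catalog entries, facet names and helper functions below are hypothetical and only stand in for the metadata a real catalog would hold.

```python
from collections import Counter

# Hypothetical catalog entries; a real catalog stores far richer metadata.
CATALOG = [
    {"name": "dataset-a", "format": "RDF", "license": "CC-BY", "topic": "geography"},
    {"name": "dataset-b", "format": "CSV", "license": "CC0", "topic": "government"},
    {"name": "dataset-c", "format": "RDF", "license": "CC0", "topic": "government"},
]


def facet_counts(entries, facet):
    """Count how many entries fall under each value of one facet."""
    return Counter(entry[facet] for entry in entries)


def filter_by(entries, **selected):
    """Keep only the entries that match every selected facet value."""
    return [
        entry for entry in entries
        if all(entry.get(facet) == value for facet, value in selected.items())
    ]


# Show the available formats, then narrow down step by step.
formats = facet_counts(CATALOG, "format")
rdf_government = filter_by(CATALOG, format="RDF", topic="government")
```

Each facet count tells the user how many data sets remain if that value is selected, which is what makes exploration possible without knowing the right keywords in advance.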
![Page 4: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/4.jpg)
Linked Data Search Engines
• Descriptions of resources are published as RDF documents
• RDF search engines index these RDF documents
• The process is similar to that of search engines for HTML documents
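The indexing step can be sketched as a small inverted index over RDF documents. This is a simplified illustration, not how Sindice or any other engine is actually implemented; the documents, URIs and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical RDF documents, keyed by the URL they were fetched from.
# Each document is a list of (subject, predicate, object) triples.
DOCUMENTS = {
    "http://example.org/doc/berlin": [
        ("http://example.org/resource/Berlin",
         "http://www.w3.org/2000/01/rdf-schema#label",
         "Berlin"),
        ("http://example.org/resource/Berlin",
         "http://example.org/ontology/country",
         "http://example.org/resource/Germany"),
    ],
    "http://example.org/doc/lyon": [
        ("http://example.org/resource/Lyon",
         "http://www.w3.org/2000/01/rdf-schema#label",
         "Lyon"),
        ("http://example.org/resource/Lyon",
         "http://example.org/ontology/country",
         "http://example.org/resource/France"),
    ],
}


def keywords(term):
    """Extract lowercase keywords from the local name of a URI or from a literal."""
    local_name = re.split(r"[/#]", term)[-1]
    return {token.lower() for token in re.findall(r"[A-Za-z0-9]+", local_name)}


def build_index(documents):
    """Map every keyword to the set of document URLs in which it occurs."""
    index = defaultdict(set)
    for url, triples in documents.items():
        for triple in triples:
            for term in triple:
                for word in keywords(term):
                    index[word].add(url)
    return index


def search(index, query):
    """Return the documents that contain every keyword of the query."""
    words = [word.lower() for word in query.split()]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(DOCUMENTS)
hits = search(index, "germany label")
```

As with an HTML search engine, a keyword query returns documents rather than individual facts; the difference is that the indexed terms come from triples instead of page text.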
![Page 5: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/5.jpg)
http://sindice.com
![Page 6: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/6.jpg)
http://sindice.com
![Page 7: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/7.jpg)
http://sig.ma
![Page 8: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/8.jpg)
http://sig.ma
![Page 9: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/9.jpg)
http://swoogle.umbc.edu
![Page 10: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/10.jpg)
http://kmi-web05.open.ac.uk/WatsonWUI/
![Page 11: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/11.jpg)
http://factforge.net
![Page 12: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/12.jpg)
http://factforge.net
![Page 13: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/13.jpg)
Suitability
• Look for resources to link a data set to
  • Good
• Find a data set with relevant data to consume
  • Maybe good: depends on how the query is expressed
• Find vocabularies to use to model data sets
  • Not good: everything is indexed, too much noise
![Page 14: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/14.jpg)
Data catalogs
• Several governments and institutions are opening their catalogs
• http://datacatalogs.org provides a manually curated index of 226 data catalogs
![Page 15: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/15.jpg)
http://datacatalogs.org
![Page 16: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/16.jpg)
![Page 17: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/17.jpg)
The Data Hub
• Manually curated list of more than 3,500 data sets, at least 326 of which are Linked Data sets
• Various metadata for each data set
• Other views over (part of) its content
  • Semantic CKAN (http://semantic.ckan.net)
  • LATC Data Source Inventory
  • LOD Cloud
  • State of the LOD Cloud
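Because the catalog runs on CKAN, its metadata can in principle be queried programmatically. The sketch below assumes a CKAN instance that exposes the version 3 action API with its `package_search` action; whether the Data Hub deployment of the time offered exactly this API version is an assumption, and the sample response contains made-up data set names.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN package_search URL (assumes action API version 3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract the data set names from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [package["name"] for package in payload["result"]["results"]]


# Hypothetical response with the general shape CKAN returns for package_search.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-dataset-1"},
            {"name": "example-dataset-2"},
        ],
    },
})

url = package_search_url("http://thedatahub.org", "lod")
names = dataset_names(SAMPLE_RESPONSE)
```

In practice the URL would be fetched with an HTTP client and the response body passed to `dataset_names`; the sample response is used here so that the sketch runs without network access.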
![Page 18: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/18.jpg)
http://thedatahub.org
![Page 19: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/19.jpg)
![Page 20: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/20.jpg)
http://dsi.lod-cloud.net
![Page 21: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/21.jpg)
http://lod-cloud.net
![Page 22: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/22.jpg)
http://lod-cloud.net/state/
![Page 23: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/23.jpg)
http://lod-cloud.net/state
![Page 24: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/24.jpg)
Data Marketplaces
• “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” (http://datamarket.com)
![Page 25: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/25.jpg)
Kasabi
• Data domain
  • All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
• Data population
  • Public datasets
  • User submitted datasets
• Data size
  • 186 data sets
• Data model
  • RDF
![Page 26: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/26.jpg)
http://kasabi.com
![Page 27: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/27.jpg)
Freebase
• Metaweb (USA), now Google
• Free for 100K read API calls per day (10K write), paid for higher volumes
• Data access
  • REST API
  • Linked Data endpoint (http://rdf.freebase.com)
  • Triple uploader / RDF dumps
• Data tools
  • Web based – schema editor, review queue, viewers, …
  • GridWorks (Google Refine)
    • Exploring, data cleaning, transformation of tabular data
    • Map data to Freebase schema & RDF export (3rd party extension)
![Page 28: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/28.jpg)
http://www.freebase.com
![Page 29: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/29.jpg)
![Page 30: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/30.jpg)
Linked Open Vocabularies (LOV)
• Initiative similar to the LOD Cloud but focused on vocabularies
• 250+ vocabularies
![Page 31: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/31.jpg)
http://labs.mondeca.com/dataset/lov/
![Page 32: Finding Data Sets](https://reader034.vdocuments.us/reader034/viewer/2022042714/54bd4d754a79595c058b45c4/html5/thumbnails/32.jpg)