exploration, visualization and querying of linked open data sources

Exploration, Visualization and Querying of Linked Open Data sources – 2nd Keystone Training School - Keyword Search in Big Linked Data, Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS), University of Santiago de Compostela (USC), Spain. Laura Po, Department of Engineering «Enzo Ferrari», University of Modena and Reggio Emilia, Italy

Upload: laura-po

Post on 21-Jan-2017


TRANSCRIPT

Page 1: Exploration, visualization and querying of linked open data sources

Exploration, Visualization and Querying of

Linked Open Data sources

2nd Keystone Training School - Keyword Search in Big Linked Data

Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS), University of Santiago de Compostela (USC), Spain.

Laura Po

Department of Engineering «Enzo Ferrari»

University of Modena and Reggio Emilia

Italy

Page 2

MODENA

Page 3

Outline

• Introduction to Linked Open Data

• Searching for LOD datasets

• Exploring a dataset

• Visualization tools

• Querying a SPARQL Endpoint

MORNING SESSION

AFTERNOON HANDS-ON SESSION

Page 4
Page 5

Searching for LOD datasets

• Portals that collect datasets
– Datahub: a portal that collects datasets
– DataPortals.org: a portal that maintains a list of open data portals around the world

• International or national open data portals
– European Data Portal: the single point of access to a wide range of data held by EU public administrations at all levels of government, agencies and other bodies, available in all 24 official EU languages
– European Union Open Data Portal: the Open Data portal for the European Commission and other institutions of the European Union

• Popular datasets
– Wikidata: a collaboratively created linked dataset that acts as central storage for the structured data of its Wikimedia sister projects
– DBpedia: a dataset containing data extracted from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages
– GeoNames: provides RDF descriptions of more than 7,500,000 geographical features worldwide
– FOAF: a dataset describing persons, their properties and relationships

Page 6

DataHub

• Datahub collects more than 10,000 datasets

• It is a data management platform from the Open Knowledge Foundation, based on the CKAN data management system.

• CKAN is a tool for managing and publishing collections of data. It is used by national and local governments, research institutions, and other organisations which collect a lot of data.

Page 7

Search on Datahub

• To find datasets, type any combination of search words (e.g. “health”, “transport”, etc.) in the search box on any page. CKAN displays the first page of results for your search. You can:
– view more pages of results
– repeat the search, altering some terms
– restrict the search to datasets with particular tags, data formats, etc. using the filters in the left-hand column

• If datasets are tagged by geographical area, it is also possible to run CKAN with an extension which allows searching and filtering of datasets by selecting an area on a map.
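The same search can also be run programmatically. CKAN exposes it through its Action API as package_search: q carries the search words, fq carries filters, and rows/start handle paging. The sketch below only builds the request URL. The base URL is a placeholder, not Datahub's real address. The filter fields tags and res_format assume a standard CKAN search index.

```python
from urllib.parse import urlencode


def ckan_search_url(base_url, query, tags=(), res_format=None, rows=20, start=0):
    """Build (but do not send) a CKAN Action API package_search URL."""
    # fq takes Solr-style filter expressions; tags and res_format are CKAN search fields.
    filters = [f'tags:"{tag}"' for tag in tags]
    if res_format:
        filters.append(f'res_format:"{res_format}"')
    params = {"q": query, "rows": rows, "start": start}
    if filters:
        params["fq"] = " AND ".join(filters)
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# Hypothetical CKAN instance: replace it with the portal you are searching.
print(ckan_search_url("https://ckan.example.org", "health", tags=["transport"], res_format="CSV"))
```

A real CKAN instance answers with JSON that holds the matches under result (count and results). This mirrors what the left-hand filters do in the web interface.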

Page 8

Exploring datasets

• When you have found a dataset you are interested in and selected it, CKAN displays the dataset page. This includes:
– the name, description, and other information about the dataset
– links to and brief descriptions of each of the resources

• The resource descriptions link to a dedicated page for each resource. This resource page includes information about the resource, and enables it to be downloaded.

• Many types of resource can also be previewed directly on the resource page. .CSV and .XLS spreadsheets are previewed in a grid view, with map and graph views also available if the data is suitable. The resource page will also preview resources if they are common image types, PDF, or HTML.

• The dataset page also has two other tabs:
– Activity stream: see the history of recent changes to the dataset
– Related items: see any links to web pages related to this dataset, or add your own links
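The resource list on the dataset page is also available as JSON from CKAN's package_show action. The response nests the dataset under result and the resources under result["resources"]. This sketch counts the formats and collects the access URLs of a dataset, which is the kind of overview Exercise 1 asks for. The sample response is hypothetical and heavily trimmed; real responses carry many more fields.

```python
from collections import Counter


def summarize_resources(package_show_response):
    """Count resource formats and collect access URLs from a CKAN package_show response."""
    if not package_show_response.get("success"):
        raise ValueError("CKAN reported an unsuccessful call")
    resources = package_show_response["result"].get("resources", [])
    # Formats are free text in CKAN, so normalise case and mark empty ones as unknown.
    formats = Counter((r.get("format") or "unknown").upper() for r in resources)
    urls = [r["url"] for r in resources if r.get("url")]
    return formats, urls


# Hypothetical, trimmed response: the names and URLs are made up.
SAMPLE = {
    "success": True,
    "result": {
        "name": "example-dataset",
        "resources": [
            {"format": "SPARQL", "url": "https://example.org/sparql"},
            {"format": "rdf", "url": "https://example.org/dump.rdf"},
            {"format": "RDF", "url": "https://example.org/dump2.rdf"},
            {"format": "", "url": "https://example.org/docs"},
        ],
    },
}

formats, urls = summarize_resources(SAMPLE)
print(formats, len(urls))
```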

Page 9

Exercise 1 – Datahub

• Find the DBpedia dataset in Datahub

• Look at the possible ways the dataset can be accessed

• Find datasets about Santiago

• Can you find any interesting data sources?

• How much information is given about these data?

• How many formats and access points to the datasets are available?

Page 10

International or national open data portals

• The European Data Portal harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included.

• Public sector information is information held by the public sector. The Directive on the re-use of public sector information provides a common legal framework for a European market for government-held data.

Page 11

Improving the Accessibility and Value of OGD

• The strategic objective of the European Data Portal is to improve accessibility and increase the value of Open Government Data (OGD):

• Accessibility: How to access this information? Where to find it? How to make it available in the first place? In domains, across domains, across countries? In what language?

• Value: For what purpose and what economic gain? Societal gain? Democratic gain? In what format? What is the critical mass?

• The European Data Portal addresses the whole data value chain: from data publishing to data re-use.

Page 12

A checklist for using Open Data

Having access to data is a first step. Data is not an end in itself. Data can be used in different ways and for different purposes. Data can also be available with different licences, formats and quality.

• Define your purpose: You might specify a topic or a service or an application of interest

• Identify data labels: Filter the data labels and metadata

• Check Openness: Take a look at the licence information. Make sure a licence is available which allows you to make use of the data in the way that you intend (e.g. that commercial re-use is allowed if you develop a commercial application).

After you have decided that a specific dataset is exactly what you are looking for:

• Select a suitable file format – datasets can often be downloaded in several file formats; depending on your computer skills, choose the file type that is most appropriate. Most datasets are available in an open file format.

• Check the data quality – check the date the file was last modified, and whether information about the time period covered is provided.

Page 13

A checklist for using Open Data


• Form
– How has the data been processed?
– Is it in raw or summary form?
– How will its form affect your analysis/product/application?
– What syntactic (language) and semantic (meaning) transformations will you need to make?
– Is this compatible with other datasets you have?

• Quality
– How current is the data?
– How regularly is it updated?
– Do you understand all the fields and their context?
– For how long will it be published? What is the commitment by the publisher?
– What do you know about the accuracy of the data?
– How are missing data handled?

Page 14

Exercise 2 – EDP

• Choose one category on the EDP and find datasets describing one specific topic (for example in the category of transport the topic could be cycling routes)

General analysis

• How many datasets are available?

• What are the main datasets?

Local analysis

• How much information about your country is available?

• How many formats and access points to the datasets are available?

Page 15

National open data portals

• Notable examples of Open Data portals maintained by public administrations in Europe are:

• France
– opendata.paris.fr
– www.data.gouv.fr

• Italy
– www.dati.piemonte.it
– www.dati.gov.it

• Netherlands
– www.data.overheid.nl

• UK
– data.gov.uk

Page 16

Open data websites in Europe

International

publicdata.eu

data.un.org

data.worldbank.org

EU Member States

data.gov.be, opendata.government.bg, opendata.cz, portal.opendata.dk, govdata.de, opendata.ee, data.gov.ie, data.gov.gr, datos.gob.es, data.gouv.fr, data.gov.hr, dati.gov.it, data.gov.cy, opendata.gov.lt, data.public.lu, data.gov.mt, data.overheid.nl, data.gv.at, danepubliczne.gov.pl, dados.gov.pt, data.gov.ro, nio.gov.si/nio/, data.gov.sk, avoindata.fi, oppnadata.se, data.gov.uk

Page 17

Exercise 3 – Find national Open Data portals

• Find the government open data portal from your member state

…some suggestions:
– search in the list of EU member states' open data websites
– search in DataCatalogs
– search in Google

• How many portals that collect open data are available in your member state?

• How many datasets are collected in the portals?

• Have you already used some of these data?

Page 18

Ranking of the national open data datasets

Global Open Data Index – an annual report that measures the state of open government data around the world. The goal is to provide a civil society audit of how governments actually publish data, with input and review from citizens and organisations around the world.

• Topical experts review datasets from different countries, establish a baseline and track changes and trends in the open data world over time as the field evolves.

Open Data Barometer - aims to uncover the impact of open data initiatives around the world. It analyses global trends, and provides comparative data on countries and regions via an in-depth methodology combining contextual data, technical assessments and secondary indicators to explore multiple dimensions of open data readiness, implementation and impact.

• This is the second edition of the Open Data Barometer. The Open Data Barometer forms part of the World Wide Web Foundation’s work on common assessment methods for open data.

Page 19

Exercise 4 – Find the ranking

• By using the information on Open Data Barometer and Global Open Data index, find the ranking for the Open Data Portals/Initiatives of your member state

Some questions

• Do the National Maps have an open licence?

• Is the Government Budget publicly available?

• How are the Government policies of your country compared to the mean of Europe and Central Asia?

• How high/low is the impact of open data in your country?

Page 20
Page 21

Exploring the Web of Data

• Linked Data browsers – generic Linked Data browsers that allow users to start browsing in one data source and then navigate along links into related data sources
– OpenLink Data Explorer: a Web browser extension and a server-side component of the OpenLink Ajax Toolkit
– Marbles: a tabular Linked Data browser supporting Fresnel
– Sigma: live views on the Web of Data
– Quick & Dirty RDF Browser: a simple RDF browser, useful for checking that RDF or RDFa says what you intended
– Graphity Client: a generic Linked Data browser and platform for building declarative, SPARQL triplestore-backed Web applications (Apache license)

• Linked Data mashups
– Revyu by Tom Heath: uses Linked Data from DBpedia to augment reviews, for instance with information about the director of a film
– DBpedia Mobile by Christian Becker and Chris Bizer: combines Linked Data from DBpedia, the flickr wrapper, and Revyu
– Music Mashup by Yves Raimond: combines Linked Data from various music-related data sources

• Linked Data search engines – crawl the Web of Data by following links between data sources and provide expressive query capabilities over aggregated data

Page 22

Linked Data Browsers

http://marbles.sourceforge.net

Marbles

Page 23

Linked Data Mashup

http://revyu.com

Revyu.com

Page 24

Linked Data Mashup

DBPediaMobile Pictures from revyu.com

http://wiki.dbpedia.org/DBPediaMobile

Page 25

Linked Data Mashup

http://sig.ma

SIGMA

Page 26

Linked Data Search Engines

http://data.nytimes.com/schools/schools.html

NYTimes

Page 27

Some Application Scenarios

BBC Music

Page 28

Some Application Scenarios

BBC Music

Page 29

Some Application Scenarios

LinkedGeoData.org

LinkedGeoData adds a spatial dimension to the Web of Data / Semantic Web. LinkedGeoData uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. It interlinks this data with other knowledge bases in the Linking Open Data initiative.

Page 30

Exercise 5 – OpenLink exploration - Facebook to Linked Data Transformation Examples

• Install the OpenLink Data Explorer (ODE) extension for your browser (currently available for Firefox, Safari, Chrome, Opera, and Internet Explorer)

• This extension allows you to explore the raw data and entity relationships that underlie the Web resources it processes.

• Select your Facebook profile page (or another person's Facebook profile page)

• Right-click (or Ctrl-click on Mac) on the page and then click on "View Page Description" to obtain a description of the resources available on the linked page

• A description of the resource metadata available on the page is displayed.

• More examples: if you want to try other explorations, look at the example page

Page 31
Page 32

Visualization of Linked Data

• Why is it important?

• Currently, the consumption of LOD is restricted to the Semantic Web community

• Visual tools that provide a coherent and legible picture of the data also allow a non-technical audience to:
– obtain a good understanding of the data structure
– compose queries
– identify links between resources
– intuitively discover new pieces of information

Page 33

What is visualization

• The visualization of information

• Goals:
– effective communication of information
– clarity
– integrity (all the information)
– stimulating viewer engagement

• Focus on effectiveness

Page 34

Why is visualization important?

• With large datasets we need an efficient way to understand a vast amount of data

• The human visual system is the highest-bandwidth channel to the human brain

Page 35

Why visualize data instead of providing statistical analysis?

http://en.wikipedia.org/wiki/Anscombe's_quartet

• Anscombe's quartet: four datasets with nearly identical statistical properties that appear very different when plotted
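This sketch makes the point concrete. It uses the quartet's published values and Python's statistics module to compute the usual summary statistics for each of the four datasets. All four rows come out almost the same: mean of x about 9, mean of y about 7.5, correlation about 0.82, and a fitted line of about y = 3 + 0.5x. Plotted, however, the four datasets look very different.

```python
from statistics import mean, variance

# Anscombe's quartet (published values, as listed in the article linked above).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def describe(xs, ys):
    """Summary statistics plus the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope, "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    print(name, {k: round(v, 2) for k, v in describe(xs, ys).items()})
```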

Page 36

Example of the Linked Data visualization process

Page 37

Heatmap visualization of Beatles releases

Page 38

Visualization, exploration and query tools

Page 39

LOD live

LodLive project provides a demonstration of the use of Linked Data standards (RDF, SPARQL) to browse RDF resources. The application aims to spread linked data principles using a simple and friendly interface with reusable techniques.

http://en.lodlive.it/

http://en.lodlive.it/?http://dbpedia.org/resource/Jules_Verne
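The second link above shows how LodLive is started on a given resource: the resource URI is appended as the query string. This sketch builds such links for DBpedia resources, which is handy for Exercise 6. It assumes, hypothetically, that a DBpedia resource name is the Wikipedia page title with spaces replaced by underscores. Names with special characters may need percent-encoding.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
LODLIVE = "http://en.lodlive.it/?"


def dbpedia_resource_uri(label):
    # Assumption: resource name = Wikipedia page title with spaces as underscores.
    return DBPEDIA_RESOURCE + label.strip().replace(" ", "_")


def lodlive_url(resource_uri):
    # Pattern taken from the example link: the starting URI is the query string.
    return LODLIVE + resource_uri


print(lodlive_url(dbpedia_resource_uri("Serena Williams")))
```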

Page 40
Page 41

Exercise 6 - LODLive

By using LodLive online (http://en.lodlive.it/) to explore DBpedia resources, search for Serena Williams:
- Who is she?
- Where does she live?
- Where is she currently listed as a champion (find and explore the "currentChampion" relation)?
- Find the statistics and records associated with her, navigate to the Wikipedia page, and discover Serena's total win rate in singles.

Page 42

Visualbox

Visualbox allows you to create visualizations based on Linked Open Data. The goal of Visualbox is to facilitate the creation of visualizations without the need to learn JavaScript libraries. You do need to know a bit of SPARQL and some notions of HTML, though.

Visualbox is a simplified version of LODSPeaKr, a framework to create Linked Data-based applications.

http://orion.tw.rpi.edu/~agraves/mozfest/index.html

Page 43

VisualBox – some examples

http://orion.tw.rpi.edu/~agraves/mozfest/action

http://orion.tw.rpi.edu/~agraves/mozfest/firesock_test

Page 44

LODeX

LODeX is a tool for producing a representative summary of a Linked Open Data (LOD) source starting from scratch, thus supporting users in exploring and understanding the contents of a dataset.

LODeX queries the SPARQL endpoint of a LOD source to extract statistical indexes, which it then uses to build the representative summary.

Two online versions:
• LODeX 2.0 (http://www.dbgroup.unimo.it/lodex2) includes the possibility to compose visual queries by selecting objects from the representative summary of a LOD source
• LODeX Cluster (http://www.dbgroup.unimo.it/lodex2/testCluster) provides a more concise schema for huge datasets
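The slides do not list the SPARQL queries LODeX actually issues. The following is only an illustration of the kind of SPARQL 1.1 aggregate queries that can collect class and property statistics from an endpoint. The query names are hypothetical. On very large endpoints such queries may time out, and a real extractor may need a different strategy.

```python
# Full IRI of rdf:type, so that the queries need no PREFIX declarations.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def aggregate_query(select, where, group_by):
    """Assemble a SPARQL 1.1 aggregate query from its three parts."""
    return f"SELECT {select} WHERE {{ {where} }} GROUP BY {group_by}"


STATISTICS_QUERIES = {
    # number of distinct instances per class
    "class_instances": aggregate_query("?c (COUNT(DISTINCT ?s) AS ?n)", "?s a ?c", "?c"),
    # properties with non-literal values, grouped by the class of the subject
    "subject_class": aggregate_query(
        "?c ?p (COUNT(*) AS ?n)",
        f"?s a ?c ; ?p ?o . FILTER(!isLiteral(?o) && ?p != {RDF_TYPE})",
        "?c ?p",
    ),
    # properties with literal values, grouped by the class of the subject
    "subject_class_literal": aggregate_query(
        "?c ?p (COUNT(*) AS ?n)", "?s a ?c ; ?p ?o . FILTER(isLiteral(?o))", "?c ?p"
    ),
    # properties grouped by the class of the object
    "object_class": aggregate_query(
        "?c ?p (COUNT(*) AS ?n)",
        f"?s ?p ?o . ?o a ?c . FILTER(?p != {RDF_TYPE})",
        "?c ?p",
    ),
}

for name, query in STATISTICS_QUERIES.items():
    print(f"# {name}\n{query}\n")
```

Each query can be sent to an endpoint with any SPARQL client. The three grouped queries roughly correspond to the SC, SCl and OC indexes defined later in this section.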

Page 45

LODeX Architecture

Two main modules:

• Extraction & Summarization
– Index Extraction (IE)
– Post Processing (PP)

• Visualization & Querying
– Schema Summary Visualization
– Query Orchestrator

[Figure: LODeX architecture diagram. Components shown: LOD Cloud, Endpoint URLs, LODeX Indexes Extraction, SPARQL Queries, Statistical Indexes, LODeX Post-processing, NoSQL storage, Schema Summary, Schema Summary Visualization, Query Orchestrator, Basic Query Results.]

Page 46

The Schema Summary

The Schema Summary is a pseudograph composed of:
• C – classes (nodes)
• P – properties (edges)

and of the following additional elements and functions:
• A – attributes associated with each class; each attribute represents the existence of a datatype property on the instances of the class
• σ𝒍 – labels; l – labeling function
• count – count function

The Schema Summary is inferred from the distribution of the instances of a dataset.

Page 47

A running example

[Figure: running example RDF graph. Intensional Knowledge: the classes ex:Sector and foaf:Organization (rdf:type owl:Class) and the property ex:sector (rdf:type rdf:Property and owl:ObjectProperty, rdfs:label “sector”, with an rdfs:domain). Extensional Knowledge: the instances sector1 (ex:Sector), organization1 and organization2 (foaf:Organization) and person1 (foaf:Person), linked by ex:sector and ex:ceo and described by dc:title “Energy”, ex:activity “Village electrification in the Pacific”, dbpedia:fax “+41331231”, foaf:firstName “Paolo” and foaf:lastName “Rossi”. The classes that have instances are marked as Extensional Classes.]

The information contained in the Intensional Knowledge can be incomplete or absent.

Page 48

Indexes needed to generate a Schema Summary

These indexes belong to the extensional group of the Statistical Indexes [2]:
• SC (Subject Class) contains the pairs (p, c) where p is an object property and c is its domain class.
• SCl (Subject Class to literal) contains the pairs (p, c) where p is a datatype property and c is its domain class.
• OC (Object Class) contains the pairs (p, c) where p is an object property and c is its range class.

[Figure: the extensional part of the running example: the instances sector1, organization1, organization2 and person1, their classes ex:Sector, foaf:Organization and foaf:Person, and the properties ex:sector, ex:ceo, dc:title, ex:activity, dbpedia:fax, foaf:firstName and foaf:lastName.]
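The three index definitions can be computed directly from a set of triples. This sketch, an illustration rather than LODeX's code, works in three steps. It first collects the classes of every resource from its rdf:type triples. Every other triple whose subject has a class then contributes to SC (non-literal object) or to SCl (literal object). Its object, when typed, contributes to OC. The triples are hypothetical and only loosely follow the running example. Which organization carries ex:activity and which carries dbpedia:fax is assumed here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks a literal object, to tell datatype properties from object properties."""
    value: str


RDF_TYPE = "rdf:type"

# Hypothetical triples loosely modelled on the running example.
TRIPLES = [
    ("sector1", RDF_TYPE, "ex:Sector"),
    ("organization1", RDF_TYPE, "foaf:Organization"),
    ("organization2", RDF_TYPE, "foaf:Organization"),
    ("person1", RDF_TYPE, "foaf:Person"),
    ("organization1", "ex:sector", "sector1"),
    ("organization2", "ex:sector", "sector1"),
    ("organization1", "ex:ceo", "person1"),
    ("sector1", "dc:title", Literal("Energy")),
    ("organization1", "ex:activity", Literal("Village electrification in the Pacific")),
    ("organization2", "dbpedia:fax", Literal("+41331231")),
    ("person1", "foaf:firstName", Literal("Paolo")),
    ("person1", "foaf:lastName", Literal("Rossi")),
]


def extensional_indexes(triples):
    """Return (SC, SCl, OC) as Counters keyed by (class, property)."""
    classes = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes.setdefault(s, set()).add(o)
    sc, scl, oc = Counter(), Counter(), Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        is_literal = isinstance(o, Literal)
        for c in classes.get(s, ()):  # class of the subject: domain side
            (scl if is_literal else sc)[(c, p)] += 1
        if not is_literal:
            for c in classes.get(o, ()):  # class of the object: range side
                oc[(c, p)] += 1
    return sc, scl, oc


for name, index in zip(("SC", "SCl", "OC"), extensional_indexes(TRIPLES)):
    print(name, dict(index))
```

This sketch counts matching triples. Its counts therefore need not coincide with the ones on the next slide, whose counting rule is not stated.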



Page 51

Schema Summary generation

We use an algorithm that combines these indexes to produce a Schema Summary. Index values, as (class, property, count):

SC: (foaf:Organization, ex:ceo, 1), (foaf:Organization, ex:sector, 2)

SCl: (foaf:Person, foaf:firstName, 1), (foaf:Person, foaf:lastName, 1), (ex:Sector, dc:title, 1), (foaf:Organization, ex:activity, 1), (foaf:Organization, dbpedia:fax, 1)

OC: (ex:Sector, ex:sector, 1), (foaf:Person, ex:ceo, 1)

Page 52

Schema Summary generation

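[Figure: the resulting Schema Summary. Nodes with instance counts: foaf:Organization (2), ex:Sector (1), foaf:Person (1). Edges: ex:sector (2) from foaf:Organization to ex:Sector, ex:ceo (1) from foaf:Organization to foaf:Person. Attributes: dc:title (1) on ex:Sector; foaf:firstName (1) and foaf:lastName (1) on foaf:Person; ex:activity (1) and dbpedia:fax (1) on foaf:Organization.]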

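The slides do not spell out the combination algorithm. The sketch below is a simplified stand-in, an assumption and not LODeX's method. It pairs each SC entry with the OC entries for the same property to form (domain, property, range) edges, labels each edge with the SC count (an arbitrary choice), and groups SCl entries into per-class attributes. Pairing by property alone can create spurious edges when a property links several domain and range classes.

```python
from collections import defaultdict

# Index values from the running example as (class, property, count) tuples;
# the OC entry for ex:ceo uses foaf:Person, the class of person1 in the example.
SC = [("foaf:Organization", "ex:ceo", 1), ("foaf:Organization", "ex:sector", 2)]
SCL = [("foaf:Person", "foaf:firstName", 1), ("foaf:Person", "foaf:lastName", 1),
       ("ex:Sector", "dc:title", 1), ("foaf:Organization", "ex:activity", 1),
       ("foaf:Organization", "dbpedia:fax", 1)]
OC = [("ex:Sector", "ex:sector", 1), ("foaf:Person", "ex:ceo", 1)]


def combine_indexes(sc, scl, oc):
    """Build a simplified summary: {(domain, property, range): count} and per-class attributes."""
    ranges = defaultdict(set)
    for cls, prop, _ in oc:
        ranges[prop].add(cls)
    edges = {}
    for domain, prop, count in sc:
        # Pair every domain class with every range class of the same property.
        for rng in ranges.get(prop, ()):
            edges[(domain, prop, rng)] = count
    attributes = defaultdict(dict)
    for cls, prop, count in scl:
        attributes[cls][prop] = count
    return edges, dict(attributes)


edges, attributes = combine_indexes(SC, SCL, OC)
print(edges)
print(attributes)
```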

Page 53

Visualization & Querying

Schema Summary Visualization

Front end of the Web application, composed of three panels:
• list of datasets indexed in LODeX
• Schema Summary and query building panel
• Refinement panel

Query Orchestrator

It manages the interaction between the user and the GUI. It contains a SPARQL compiler able to compile the visual query into a SPARQL query.
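Here is a rough idea of what compiling a visual query involves. The visual query is a hypothetical data structure: a start class, the object properties leading from it to other classes, and the datatype properties chosen as columns. The sketch turns it into a SPARQL SELECT. It is not LODeX's compiler. It handles only star-shaped queries, and it makes selected attributes mandatory, where a real compiler might wrap them in OPTIONAL.

```python
def compile_visual_query(root_class, links=(), attributes=(), prefixes=None, limit=100):
    """Compile a star-shaped visual query into a SPARQL SELECT string.

    root_class -- class selected first (variable ?n0)
    links      -- (object property, target class) pairs leaving the root; the i-th target is ?n{i}
    attributes -- (node index, datatype property) pairs shown as extra columns
    prefixes   -- optional {prefix: namespace IRI} mapping emitted as PREFIX lines
    """
    prologue = "".join(f"PREFIX {p}: <{iri}>\n" for p, iri in (prefixes or {}).items())
    patterns = [f"?n0 a {root_class} ."]
    selected = ["?n0"]
    for i, (prop, target) in enumerate(links, start=1):
        patterns.append(f"?n0 {prop} ?n{i} .")
        patterns.append(f"?n{i} a {target} .")
        selected.append(f"?n{i}")
    for j, (node, prop) in enumerate(attributes):
        patterns.append(f"?n{node} {prop} ?a{j} .")
        selected.append(f"?a{j}")
    body = "\n".join(f"  {t}" for t in patterns)
    return f"{prologue}SELECT {' '.join(selected)}\nWHERE {{\n{body}\n}}\nLIMIT {limit}"


# Hypothetical "dataset and its creator" query; the vocabulary choice is an assumption.
print(compile_visual_query(
    "dcat:Dataset",
    links=[("dct:creator", "foaf:Agent")],
    attributes=[(1, "foaf:name")],
    prefixes={"dcat": "http://www.w3.org/ns/dcat#",
              "dct": "http://purl.org/dc/terms/",
              "foaf": "http://xmlns.com/foaf/0.1/"},
))
```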

Page 54

Schema Summary – Building a Visual Query

Page 55

Refinement Panel

Page 56

Exercise 7 - LODeX

By using LODeX (http://www.dbgroup.unimore.it/lodex2/), find the dataset about World War 1.

• What is the name of the dataset?

• How many classes does it have? How many properties does it have?

Visualize and explore the LODeX schema summary of this dataset.

• How many instances does the class Water have?

• What are the incoming properties of the class Municipality?

Define a visual query that selects a Dataset and its creator.

• What is the SPARQL query?

Page 57

Exercise 8 – Linked Clean Energy Data

• Search for the Linked Clean Energy Data and navigate its schema summary (http://www.dbgroup.unimore.it/lodex2/ok#!/schemaSummary/157)

• Create a visual query that selects a Document and the associated Project Output.

• For the Project Output, show the title and reference number.

• Run the query and look at the results and the SPARQL query.

• Try to perform the same query on the SPARQL endpoint of the Linked Clean Energy Data listed in Datahub (http://sparql.reeep.org/)

Page 58
Page 59

Querying LOD datasets

• SPARQL query
– on a SPARQL endpoint
– on a dataset dump

• Visual tools

Page 60

Introduction to SPARQL

• SPARQL Query
– Declarative query language for RDF data
– http://www.w3.org/TR/rdf-sparql-query/

• SPARQL Algebra
– Formal semantics of SPARQL: queries are translated into algebra expressions, which are then evaluated
– http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html

• SPARQL Update
– Declarative manipulation language for RDF data
– http://www.w3.org/TR/sparql11-update/

• SPARQL Protocol
– Standard for communication between SPARQL services and clients
– http://www.w3.org/TR/sparql11-protocol/
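With the SPARQL Protocol, a client sends a query over HTTP, for example as a GET request with the query in the query parameter. The optional default-graph-uri and named-graph-uri parameters play the role of FROM and FROM NAMED. This sketch only prepares such a request with Python's standard library. The DBpedia endpoint URL is used as an example, and it asks for results in the SPARQL JSON results format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint, query, default_graph_uris=(), named_graph_uris=()):
    """Prepare a SPARQL 1.1 Protocol query-via-GET request (nothing is sent here)."""
    params = [("query", query)]
    # Protocol-level counterparts of FROM and FROM NAMED in the query text.
    params += [("default-graph-uri", g) for g in default_graph_uris]
    params += [("named-graph-uri", g) for g in named_graph_uris]
    return Request(
        f"{endpoint}?{urlencode(params)}",
        headers={"Accept": "application/sparql-results+json"},
    )


req = sparql_get_request("https://dbpedia.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(req.get_method(), req.full_url)
# Sending it requires network access, for example:
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       results = json.load(resp)
```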

Page 61

SPARQL Basics

• RDF triple: the basic building block, of the form subject, predicate, object. Example:
dbpedia:The_Beatles foaf:name "The Beatles" .

• RDF triple pattern: contains one or more variables. Examples:
dbpedia:The_Beatles foaf:made ?album .
?album mo:track ?track .
?album ?p ?o .

• RDF quad pattern: contains a graph name (a URI or a variable). Examples:
GRAPH <:g> {:s :p :o .}
GRAPH ?g {dbpedia:The_Beatles foaf:name ?o.}
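A triple pattern matches a triple when every constant equals the corresponding term and every variable can be bound consistently. This sketch shows that matching step on triples written as tuples of prefixed names. Following SPARQL, ?x and $x name the same variable. The identifiers ex:album1 and ex:track1 are hypothetical.

```python
def is_variable(term):
    # In SPARQL, ?x and $x denote the same variable.
    return term.startswith(("?", "$"))


def match_triple_pattern(pattern, triple, binding=None):
    """Return the binding extended by matching pattern against triple, or None."""
    binding = dict(binding or {})
    for pattern_term, data_term in zip(pattern, triple):
        if is_variable(pattern_term):
            name = pattern_term[1:]
            # setdefault binds a fresh variable, or returns the value it already has
            if binding.setdefault(name, data_term) != data_term:
                return None
        elif pattern_term != data_term:
            return None
    return binding


# Hypothetical data: ex:album1 and ex:track1 are made-up identifiers.
DATA = [
    ("dbpedia:The_Beatles", "foaf:name", '"The Beatles"'),
    ("dbpedia:The_Beatles", "foaf:made", "ex:album1"),
    ("ex:album1", "mo:track", "ex:track1"),
]

pattern = ("dbpedia:The_Beatles", "foaf:made", "?album")
matches = [match_triple_pattern(pattern, t) for t in DATA]
print([m for m in matches if m is not None])
```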

Page 62

SPARQL Basics

• RDF graph: a set of RDF triples (assertions), manipulated as a labeled directed graph.

• RDF dataset: a collection of RDF graphs, comprising:
– one default graph
– zero or more named graphs

• SPARQL protocol client: an HTTP client that sends requests for SPARQL Protocol operations (queries or updates)

• SPARQL protocol service: an HTTP server that services requests for SPARQL Protocol operations

• SPARQL endpoint: the URI at which a SPARQL Protocol service listens for requests from SPARQL clients

Page 63

Querying Linked Data with SPARQL

Page 64

SPARQL Query

Main idea: pattern matching

• Queries describe sub-graphs of the queried graph

• Graph patterns are RDF graphs specified in Turtle syntax, which contain variables (prefixed by either “?” or “$”)

• Sub-graphs that match the graph patterns yield a result

[Figure: the graph pattern dbpedia:The_Beatles foaf:made ?album]

Page 65

SPARQL Query

[Figure: pattern matching example.
Data: dbpedia:The_Beatles foaf:made three records <http://musicbrainz.org/record/...>, whose dc:title values are "Help!", "Let It Be" and "Abbey Road".
Graph pattern: dbpedia:The_Beatles foaf:made ?album .
Results (?album): three bindings, one <http://musicbrainz.org...> URI per record.]

Page 66

SPARQL Query

[Figure: pattern matching with a second triple pattern.
Data: dbpedia:The_Beatles foaf:made three records <http://musicbrainz.org/record/...>, whose dc:title values are "Help!", "Let It Be" and "Abbey Road".
Graph pattern: dbpedia:The_Beatles foaf:made ?album . ?album dc:title ?title .
Results:
?album          ?title
<http://...>    "Help!"
<http://...>    "Abbey Road"
<http://...>    "Let It Be"]

Page 67

SPARQL Query

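[Figure: pattern matching with a type constraint.
Data: dbpedia:The_Beatles foaf:made both a record <http://musicbrainz.org/record/...> (a mo:Record, dc:title "Help!") and a track <http://musicbrainz.org/track/...> (a mo:Track, dc:title "Help!"); the record is linked to the track by mo:track.
Graph pattern: dbpedia:The_Beatles foaf:made ?album, with ?album constrained to be a mo:Record.
Results (?album): a single <http://musicbrainz.org...> binding, the record.]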
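The three figures follow one idea: a basic graph pattern is evaluated by matching its triple patterns one at a time and keeping only the bindings that stay consistent. This naive sketch reproduces the examples on made-up data, with ex:record1..3 and ex:track1 standing in for the elided MusicBrainz URIs. Without a type constraint, the track titled "Help!" is returned too. Adding ?album a mo:Record filters it out.

```python
def is_variable(term):
    return term.startswith(("?", "$"))


def extend(binding, pattern, triple):
    """Extend binding so that pattern matches triple; None if incompatible."""
    result = dict(binding)
    for pattern_term, data_term in zip(pattern, triple):
        if is_variable(pattern_term):
            if result.setdefault(pattern_term[1:], data_term) != data_term:
                return None
        elif pattern_term != data_term:
            return None
    return result


def evaluate_bgp(patterns, triples):
    """Naive basic graph pattern evaluation: join the patterns one at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical identifiers; "a" abbreviates rdf:type as in Turtle/SPARQL.
DATA = [
    ("dbpedia:The_Beatles", "foaf:made", "ex:record1"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:record2"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:record3"),
    ("dbpedia:The_Beatles", "foaf:made", "ex:track1"),
    ("ex:record1", "a", "mo:Record"), ("ex:record1", "dc:title", "Help!"),
    ("ex:record2", "a", "mo:Record"), ("ex:record2", "dc:title", "Abbey Road"),
    ("ex:record3", "a", "mo:Record"), ("ex:record3", "dc:title", "Let It Be"),
    ("ex:record1", "mo:track", "ex:track1"),
    ("ex:track1", "a", "mo:Track"), ("ex:track1", "dc:title", "Help!"),
]

TITLES = [("dbpedia:The_Beatles", "foaf:made", "?album"), ("?album", "dc:title", "?title")]
RECORDS = TITLES + [("?album", "a", "mo:Record")]

print(len(evaluate_bgp(TITLES, DATA)))
print(sorted(s["title"] for s in evaluate_bgp(RECORDS, DATA)))
```

Real engines reorder patterns and use indexes instead of scanning every triple. The semantics, however, are those of this nested join.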

Page 68

SPARQL Query: Components

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?album
FROM <http://musicbrainz.org/20130302>
WHERE {
  dbpedia:The_Beatles foaf:made ?album .
  ?album a mo:Record ; dc:title ?title
}
ORDER BY ?title

Prologue:

• Prefix definitions

• Subtly different from Turtle syntax - the final period is not used
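The prologue maps each prefix to a namespace IRI, and a prefixed name such as dbpedia:The_Beatles is shorthand for that IRI followed by the local part. This sketch reads the PREFIX lines of the example query and expands prefixed names. It also prints the equivalent Turtle directives to show the difference noted above: Turtle's @prefix ends with a period, while SPARQL's PREFIX does not. The regular expression is a simplification of the full grammar for prefix names.

```python
import re

PROLOGUE = """PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>"""


def parse_prologue(text):
    """Map prefix -> namespace IRI (SPARQL keywords are case-insensitive)."""
    return dict(re.findall(r"PREFIX\s+([\w.-]*):\s*<([^>]*)>", text, flags=re.IGNORECASE))


def expand(prefixed_name, prefixes):
    """Expand a prefixed name such as dbpedia:The_Beatles into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


def to_turtle_prefixes(prefixes):
    # Turtle's @prefix directive ends with a period; SPARQL's PREFIX does not.
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items())


prefixes = parse_prologue(PROLOGUE)
print(expand("dbpedia:The_Beatles", prefixes))
print(to_turtle_prefixes(prefixes))
```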


SPARQL Query: Components

Query form:

• ASK, SELECT, DESCRIBE or CONSTRUCT

• SELECT retrieves variables and their bindings as a table



SPARQL Query: Components

Data set specification:

• This clause is optional

• FROM (adds a graph to the default graph) or FROM NAMED (adds a named graph)

• Indicates the sources for the data against which to find matches

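Under the SPARQL 1.1 dataset rules, the graphs named in FROM clauses are merged into the default graph, while each FROM NAMED graph stays separate and is reachable through GRAPH in the query pattern; if only FROM NAMED is given, the default graph is empty. The sketch below models a store as a dictionary of graphs; the graph IRIs and triples are hypothetical, and the case with no dataset clause (where the endpoint chooses its own default dataset) is not modelled.

```python
# Sketch of the dataset clause: FROM graphs are merged into the default graph,
# FROM NAMED graphs are kept separately under their IRIs.
# All graph IRIs and triples are hypothetical.
Graph = set[tuple[str, str, str]]

STORE: dict[str, Graph] = {
    "http://example.org/g1": {("ex:a", "ex:p", "ex:b")},
    "http://example.org/g2": {("ex:c", "ex:p", "ex:d")},
}


def build_dataset(store, from_iris, from_named_iris):
    default_graph: Graph = set()
    for iri in from_iris:
        # Plain set union is an RDF merge only because there are no blank nodes here.
        default_graph |= store[iri]
    named = {iri: store[iri] for iri in from_named_iris}
    return default_graph, named


default, named = build_dataset(
    STORE, ["http://example.org/g1", "http://example.org/g2"], []
)
print(len(default), len(named))
```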


SPARQL Query: Components

Query pattern:

• Defines patterns to match against the data

• Generalises Turtle with variables and keywords – N.B. final period optional



SPARQL Query: Components

Solution modifier:

• Modifies the result set

• ORDER BY, LIMIT and OFFSET reorganise rows

• GROUP BY combines rows into groups

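Solution modifiers operate on the sequence of solutions produced by pattern matching. A minimal sketch of ORDER BY followed by OFFSET and LIMIT over solutions represented as dictionaries; the IRIs are hypothetical, the titles come from the slides, and Python string ordering stands in for SPARQL's ordering rules, which also cover numbers, dates and unbound values.

```python
# Sketch of solution modifiers: ORDER BY sorts the solutions,
# then OFFSET skips rows and LIMIT caps how many rows are kept.
# The ?album IRIs are hypothetical; only plain strings are compared.
solutions = [
    {"?album": "ex:record1", "?title": "Help!"},
    {"?album": "ex:record2", "?title": "Let It Be"},
    {"?album": "ex:record3", "?title": "Abbey Road"},
]


def modify(rows, order_by=None, descending=False, offset=0, limit=None):
    if order_by is not None:
        rows = sorted(rows, key=lambda r: r[order_by], reverse=descending)
    rows = rows[offset:]  # OFFSET
    if limit is not None:
        rows = rows[:limit]  # LIMIT
    return rows


# ORDER BY ?title
print([r["?title"] for r in modify(solutions, order_by="?title")])
# ORDER BY ?title OFFSET 1 LIMIT 1
print([r["?title"] for r in modify(solutions, order_by="?title", offset=1, limit=1)])
```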


Query Forms

SPARQL supports different query forms:

• ASK tests whether or not a query pattern has a solution; it returns true or false

• SELECT returns variables and their bindings directly

• CONSTRUCT returns a single RDF graph specified by a graph template

• DESCRIBE returns a single RDF graph containing RDF data about resources
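A CONSTRUCT query instantiates its graph template once per solution and returns the union of the resulting triples; per the SPARQL 1.1 specification, an instantiated triple that still contains an unbound variable is left out. The sketch below uses a hypothetical predicate ex:hasTitle and hypothetical record IRIs.

```python
# Sketch of CONSTRUCT: instantiate the template for every solution and
# collect the triples into one set (one RDF graph).
# ex:hasTitle and the ex:recordN IRIs are hypothetical.
template = [("?album", "ex:hasTitle", "?title")]

solutions = [
    {"?album": "ex:record1", "?title": "Help!"},
    {"?album": "ex:record2", "?title": "Let It Be"},
    {"?album": "ex:record3", "?title": "Abbey Road"},
]


def construct(template, solutions):
    graph = set()
    for sol in solutions:
        for pattern in template:
            # Variables are replaced by their bindings (None if unbound).
            terms = [sol.get(t) if t.startswith("?") else t for t in pattern]
            if None not in terms:  # triples with an unbound variable are skipped
                graph.add(tuple(terms))
    return graph


for triple in sorted(construct(template, solutions)):
    print(triple)
```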


Query Form: ASK

• Namespaces are added with the ‘PREFIX’ directive

• Statement patterns that make up the graph are specified between curly braces (“{}”)

Query: Is Paul McCartney a member of ‘The Beatles’?

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-ont: <http://dbpedia.org/ontology/>
PREFIX mo: <http://purl.org/ontology/mo/>
ASK WHERE { dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney . }

Results: true

Query: Is Elvis Presley a member of ‘The Beatles’?

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-ont: <http://dbpedia.org/ontology/>
PREFIX mo: <http://purl.org/ontology/mo/>
ASK WHERE { dbpedia:The_Beatles mo:member dbpedia:Elvis_Presley . }

Results: false
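ASK evaluates the pattern like any other query but only reports whether at least one solution exists. The sketch uses toy data containing a single mo:member triple, chosen only so that the two ASK queries on the slide give the results shown; it is not the content of any real dataset.

```python
# Sketch of ASK: true as soon as one triple matches the pattern;
# no bindings are returned. DATA is toy data, not a real dataset.
DATA = {
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Paul_McCartney"),
}


def matches(pattern, triple):
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # a repeated variable must bind to the same term
            if binding.setdefault(p, t) != t:
                return False
        elif p != t:
            return False
    return True


def ask(pattern, triples):
    return any(matches(pattern, triple) for triple in triples)


print(ask(("dbpedia:The_Beatles", "mo:member", "dbpedia:Paul_McCartney"), DATA))
print(ask(("dbpedia:The_Beatles", "mo:member", "dbpedia:Elvis_Presley"), DATA))
```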


Query Form: SELECT

• The projection solution modifier (the variables listed after SELECT) nominates which components of the matches should be returned

• “*” means all components should be returned

Query: What albums and tracks did ‘The Beatles’ make?

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?album_name ?track_title
WHERE {
  dbpedia:The_Beatles foaf:made ?album .
  ?album dc:title ?album_name ;
         mo:track ?track .
  ?track dc:title ?track_title .
}
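Projection keeps only the listed variables of each solution, while `*` keeps all of them. The solution row below is hypothetical; an unbound variable is simply absent from the output row, mirroring an empty cell in a SPARQL results table.

```python
# Sketch of projection: SELECT ?v1 ?v2 keeps only those variables,
# SELECT * keeps every variable of the solution. Row values are hypothetical.
solutions = [
    {
        "?album": "ex:record1",
        "?album_name": "Help!",
        "?track": "ex:track1",
        "?track_title": "Help!",
    },
]


def project(rows, variables):
    if variables == ["*"]:
        return [dict(r) for r in rows]
    # a variable that is unbound in a row is left out of that row
    return [{v: r[v] for v in variables if v in r} for r in rows]


print(project(solutions, ["?album_name", "?track_title"]))
```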


Query Form: SELECT (2)

Filter expressions

• Different types of filters and functions may be used

Filter: comparison and logical operators

Query: Retrieve the albums and tracks recorded by ‘The Beatles’ where the track lasts more than 300 seconds and less than 400 seconds (mo:duration is given in milliseconds).

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?album_name ?track_title ?duration
WHERE {
  dbpedia:The_Beatles foaf:made ?album .
  ?album dc:title ?album_name ;
         mo:track ?track .
  ?track dc:title ?track_title ;
         mo:duration ?duration .
  FILTER (?duration > 300000 && ?duration < 400000)
}
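A FILTER is a boolean expression evaluated for every solution; solutions for which it is false, or for which evaluation raises an error (for example because ?duration is unbound), are discarded. The sketch mirrors the duration FILTER on hypothetical track titles and durations in milliseconds; 400000 itself is excluded because the comparison is strict.

```python
# Sketch of FILTER (?duration > 300000 && ?duration < 400000).
# Track titles and durations (ms) are hypothetical.
solutions = [
    {"?track_title": "Track A", "?duration": 150000},
    {"?track_title": "Track B", "?duration": 320000},
    {"?track_title": "Track C", "?duration": 400000},
]


def duration_filter(row):
    d = row.get("?duration")
    if d is None:
        # unbound variable: the FILTER raises an error, so the row is dropped
        return False
    return d > 300000 and d < 400000  # comparison and logical AND


kept = [r for r in solutions if duration_filter(r)]
print([r["?track_title"] for r in kept])
```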


Query Form: SELECT (3)

Aggregates

• Calculate aggregate values: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT and SAMPLE

• Built around the GROUP BY operator

• Prune at group level (cf. FILTER) using HAVING

Query: Retrieve the duration of the albums recorded by ‘The Beatles’ that last more than one hour (3,600,000 ms).

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?album (SUM(?track_duration) AS ?album_duration)
WHERE {
  dbpedia:The_Beatles foaf:made ?album .
  ?album mo:track ?track .
  ?track mo:duration ?track_duration .
}
GROUP BY ?album
HAVING (SUM(?track_duration) > 3600000)
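The aggregate query can be sketched as three steps: group the solutions by ?album, SUM the track durations in each group, then let HAVING drop whole groups. The album IRIs and durations below are hypothetical values in milliseconds.

```python
# Sketch of GROUP BY ?album + SUM(?track_duration) + HAVING.
# (?album, ?track_duration) pairs are hypothetical; durations in ms.
from collections import defaultdict

rows = [
    ("ex:album1", 1800000),
    ("ex:album1", 2000000),
    ("ex:album2", 1500000),
    ("ex:album2", 1200000),
]


def sum_by_album(rows, min_total):
    totals = defaultdict(int)
    for album, duration in rows:  # GROUP BY ?album, SUM(?track_duration)
        totals[album] += duration
    # HAVING (SUM(?track_duration) > min_total) prunes whole groups
    return {album: total for album, total in totals.items() if total > min_total}


print(sum_by_album(rows, 3600000))
```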


Exercise 9 – British Museum

• Find the British Museum Collection dataset.

• Find the related SPARQL endpoint

• Look at information about "The Rosetta Stone"

• http://collection.britishmuseum.org/sparql
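For the exercise, a SPARQL endpoint is queried over HTTP by sending the query text in the `query` parameter, as defined by the SPARQL 1.1 Protocol. The sketch only builds the GET request with the standard library and does not send it; the generic query is a placeholder rather than a query for the Rosetta Stone, and the endpoint given on the slide may no longer be online.

```python
# Sketch: build (but do not send) a SPARQL Protocol GET request.
# QUERY is a generic placeholder, not a British Museum-specific query.
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://collection.britishmuseum.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_request(endpoint, query):
    url = endpoint + "?" + urlencode({"query": query})
    # ask for results in the SPARQL JSON results format
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)
# Round trip: decoding the URL gives back the original query text.
print(parse_qs(urlparse(req.full_url).query)["query"][0])
# To actually send it: urllib.request.urlopen(req), if the endpoint is reachable.
```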


Linked Data Publishing Platforms/Frameworks

• D2R Server: a tool for publishing relational databases as Linked Data

• Talis Platform: provides Linked Data-compliant hosting for content and RDF data

• Pubby: a Linked Data frontend for SPARQL Endpoints

• Paget: a framework for building Linked Data applications

• Linked Media Framework: a Linked Data server with updates and semantic search

• PublishMyData: a Linked Data publishing platform run by Swirrl, offering RDF data hosting, a Linked Data API, a SPARQL endpoint and customisable visualisations.