linked data

52
Linked Data - Evolving the Web into a Global Dataspace Anja Jentzsch - @anjeve Hasso Plattner Institute, Potsdam, Germany Open Data Lecture, HTW Berlin 2015/01/12

Upload: anja-jentzsch

Post on 15-Jul-2015

90 views

Category:

Science


2 download

TRANSCRIPT

Linked Data -

Evolving the Web into a Global Dataspace

Anja Jentzsch - @anjeve Hasso Plattner Institute, Potsdam, Germany

!!!

Open Data Lecture, HTW Berlin 2015/01/12

Architecture of the classic Web

B C

HTMLHTML

Web Browsers

Search Engines

hyper- links

A

HTML

• Single global document space • Small set of simple standards • HTML as document format • HTTP URLs as globally unique IDs !

• Retrieval mechanism: Hyperlinks to connect everything

Web 2.0 APIs and Mashups

No single global dataspace !Shortcomings 1. APIs have proprietary interfaces 2. Mashups are based on a fixed set of

data sources 3. No hyperlinks between data items

within different APIsWeb API

A

Mashup

Web API

B

Web API

C

Web API

D

Web APIs slice the Web into Walled Gardens

Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY

Extend the Web with a single global dataspace 1. by using RDF to publish structured data on the Web 2. by setting links between data items within different data sources

Linked Data

B C

RDF

RDF Links

A D E

RDF Links

RDF Links

RDF Links

RDF

RDF

RDF

RDF

RDF RDF

RDF

RDF

RDF

Linked Data Principles

Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.

1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful RDF information. 4. Include RDF statements that link to other URIs so that they can discover

related things. Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006

The RDF Data Model

Anja Jentzsch

dbpedia:Berlin

foaf:name

foaf:based_near

foaf:Personrdf:typens:anja

Data Items are identified with HTTP URIs

ns:anja = http://www.anjeve.de#anjadbpedia:Berlin = http://dbpedia.org/resource/Berlin

foaf:name

foaf:based_near

foaf:Personrdf:typens:anja

dbpedia:Berlin

Anja Jentzsch

Resolving URIs over the Web

dp:Cities_in_Germany

3.499.879dp:population

skos:subjectdbpedia:Berlin

foaf:name

foaf:based_near

foaf:Personrdf:typens:anja

Anja Jentzsch

Dereferencing URIs over the Web

dbpedia:Hamburg

dbpedia:Muenchen

skos:subject

skos:subjectdp:Cities_in_Germany

3.499.879dp:population

skos:subjectdbpedia:Berlin

foaf:name

foaf:based_near

foaf:Personrdf:typens:anja

Anja Jentzsch

RDF Representation Formats

• RDF/XML <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/"> <foaf:Person rdf:about="http://anjeve.de#anja"> <foaf:name>Anja Jentzsch</foaf:name> </foaf:Person>

!• RDF N-Triples <http://anjeve.de#anja> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <http://anjeve.de#anja> <http://xmlns.com/foaf/0.1/name> „Anja Jentzsch“ .

RDF Representation Formats

!!!<http://anjeve.de#anja> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>. !

<Subject> <Predicate> <Object> .

!In the end it‘s all triples!

foaf:Personrdf:typens:anja

Properties of the Web of Linked Data

• Global, distributed dataspace build on a simple set of standards • RDF, URIs, HTTP

• Entities are connected by links • creating a global data graph that spans data sets and • enables the discovery of new data sources

• Provides for data-coexistence • Everyone can publish data to the Web of Linked Data • Everyone can express their personal view on things • Everybody can use the vocabularies/schemas that they like

W3C Linking Open Data Project

• Grassroots community effort to • publish existing open license datasets as Linked Data on the Web • interlink things between different data sets

LOD Data Sets on the Web: May 2007

• 12 data sets • 500+ million RDF triples • 120,000+ RDF links between data sets

LOD Data Sets on the Web: November 2007

• 28 data sets

LOD Data Sets on the Web: September 2008

• 45 data sets • 2+ billion RDF triples

LOD Data Sets on the Web: July 2009

• 95 data sets • 6.5+ billion RDF triples

LOD Data Sets on the Web: September 2010

• 203 data sets • Over 24,7 billion RDF triples • Over 436 million RDF links between data sets

LOD Data Sets on the Web: September 2011

• 295 data sets • 31+ billion RDF triples • 504+ million RDF links between data sets http://lod-cloud.net

LOD Data Sets on the Web: August 2014

http://lod-cloud.net

• 1,019 data sets • 84+ billion RDF triples • 808+ million RDF links between data sets

LOD Data Set statistics as of 08/2014

LOD Cloud Data Catalog on the Data Hub • http://datahub.io/group/lodcloud More statistics • http://lod-cloud.net/state/

Heterogeneity on the Web of Data

• The Web of Data is heterogeneous • Many different vocabularies are in use (469 as of January 2015) • Different data formats • Many different ways to represent the same information

DBpedia – The Hub on the Web of Data

• DBpedia is a joint project with the following goals • extracting structured information from Wikipedia • publish this information under an open license on the Web • setting links to other data sources

!• Partners

• Universität Mannheim (Germany) • Universität Leipzig (Germany) • OpenLink Software (UK)

Extracting structured data from Wikipedia

Extracting structured data from Wikipediadbpedia:Berlin rdf:type dbpedia-owl:City ,

dbpedia-owl:PopulatedPlace , dbpedia-owl:Place ; rdfs:label "Berlin"@en , "Berlino"@it ; dbpedia-owl:population 3499879 ; wgs84:lat 52.500557 ; wgs84:long 13.398889 .

! dbpedia:SoundCloud dbpedia-owl:location dbpedia:Berlin .

• access to DBpedia data: • dumps • SPARQL endpoint • Linked Data interface

The DBpedia Data Set

• Information on more than 4.58 million “things” • 1,445,000 persons • 241,000 organisations • 735,000 places • 123,000 music albums • 87,000 movies • 251,000 species

• overall more than 3 billion RDF triples • title and abstract in 125 different languages • 25,200,000 links to images • 29,800,000 links to external web pages • 50,000,000 links to other Linked Data sets

DBpedia Mappings

• since March 2010 collaborative editing of • DBpedia ontology • mappings from Wikipedia infoboxes and tables to DBpedia ontology

• curated in a public wiki with instant validation methods • http://mappings.dbpedia.org

• multilingual mappings to the DBpedia ontology: • ar, be, bg, bn, ca, cs, cy, de, el, en, eo, es, et, eu, fr, ga, hi, hr, hu, id, it, ja, ko, nl,

pl, pt, ru, sk, sl, sr, tr, ur, zh !

• allows for a significant increase of the extracted data’s quality • each domain has its experts

DBpedia Use Cases

1. Hub for the growing Web of Data 2. Improvement of Wikipedia search 3. Data source for applications and mashups 4. Text analysis and annotation

DBpedia Mobile

• displays Wikipedia data on a map • aggregates different data sources

Faceted Wikipedia Search

• faceted browsing and free text search

http://spotlight.dbpedia.org

Uptake in the Government Domain

• The EU is pushing Linked Data (LOD2, LATC, EuroStat) • W3C eGovernment Interest Group

Uptake in the Library Community• Libraries publishing Linked Data

• Library of Congress (subject headings) • German National Library (PND dataset and subject headings) • Swedish National Library (Libris - catalog) • Hungarian National Library (OPAC and Digital Library) • Europeana

• W3C Library Linked Data Incubator Group • Goals:

• Integrate library catalogs on global scale • Interconnect resources between repositories (by topic, by location, by

historical period, by ...)

Uptake in Life Sciences

• W3C Linking Open Drug Data Effort • Bio2RDF Project • Allen Brain Atlas !!!!!!

• Goal: Smoothly integrate internal and external data in a pay-as-you-go-fashion.

Uptake in the Media Industry• Publish data as Linked Data or RDFa • Goal: Drive traffic to websites via

search engines

Linked Data Applications

Linked Data Browsers

• Tabulator Browser (MIT, USA) • Marbles (MES / Uni Mannheim, DE) • OpenLink RDF Browser (OpenLink, UK) • Zitgist RDF Browser (Zitgist, USA) • Humboldt (HP Labs, UK) • Disco Hyperdata Browser (Uni Mannheim, DE) • Fenfire (DERI, Irland)

Web of Data Search Engines

• Sig.ma (DERI, Ireland) • Falcons (IWS, China) • Swoogle (UMBC, USA) • VisiNav (DERI, Ireland) • Watson (Open University, UK)

44

Finding Linked Data sets

• Search engines • find data sets based on keywords !

• Data catalogs / directories • explore data sets and faceted search !

• Data Marketplaces • explore and consume data sets

45

Is your data 5 star?

Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010

Make your data available on the Web (in whatever format) under an open license. Make it available as structured data (e.g., Excel instead of image scan of a table) so that it can be reused. Use non-proprietary, open formats (e.g., CSV instead of Excel). !Use URIs to identify things, so that people can point at your stuff and serve RDF from it. Link your data to other data to provide context.

★ ★

★ ★ ★

★ ★ ★ ★

★ ★ ★ ★ ★

Economic Opportunities

• Huge reservoir of free content which can be used to provide background facts about

• places,

• people,

• topics

• …

Background Facts

schema.org

• jointly proposed vocabularies for embedding data into HTML pages (Microdata) • available since June 2011

Options for Search Companies

• New vertical search engines • Sig.ma, FalconS, …

• Google Knowledge Graph • 1.6 billion facts (2014)

• Yahoo • Improving text search with background knowledge

Lower Data Integration Costs

Overall data integration effort is split between: !

• Data Publisher – publishes data as RDF – sets identity links – reuses terms or publishes mappings

• Third Parties – set identity links pointing at your data – publish mappings to the Web

• Data Consumer – has to do the rest – using record linkage and schema matching

techniques

Conclusion

• The Web of Data is growing fast • Linked Data provides a standardized data access interface • Allows for easy data integration, enhancement and browsing • Web search is evolving into query answering

• Search engines will increasingly rely on structured data from the Web • Next step: Linked Data within enterprises

• Alternative to data warehouses and EAI middleware • Advantages: schema-less data model, pay-as-you go data integration

Thanks!

References: • Tom Heath, Christian Bizer : Linked Data: Evolving the Web into a Global Data Space

http://linkeddatabook.com/ • Linking Open Data Project Wiki

http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Email: [email protected] Twitter : @anjeve