Linked Data: Evolving the Web into a Global Data Space
Anja Jentzsch, Freie Universität Berlin
05 October 2011
EuropeanaTech 2011, Vienna
1
Architecture of the classic Web
[Diagram: HTML documents A, B, and C connected by hyperlinks and consumed by Web browsers and search engines]
Single global document space
Small set of simple standards
1. HTML as document format
2. HTTP URLs as
• globally unique IDs
• retrieval mechanism
3. Hyperlinks to connect everything
2
Web 2.0 APIs and Mashups
No single global data space
Shortcomings
1. APIs have proprietary interfaces
2. Mashups are based on a fixed set of data sources
3. No hyperlinks between data items within different APIs
[Diagram: a mashup drawing on four separate Web APIs (A, B, C, and D)]
3
Web APIs slice the Web into Walled Gardens
Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY
4
Extend the Web with a single global data space
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources
Linked Data
[Diagram: data sources A, B, C, D, and E, each publishing RDF and connected to one another by RDF links]
5
Linked Data Principles
Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
6
The RDF Data Model
pd:chris rdf:type foaf:Person .
pd:chris foaf:name "Chris Bizer" .
pd:chris foaf:based_near dbpedia:Berlin .
7
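A minimal Python sketch of the graph on the slide above, assuming each statement is stored as a (subject, predicate, object) tuple. The helper name objects is hypothetical and only illustrates how such a graph can be queried.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
# Prefixed names are kept as plain strings for readability.
graph = {
    ("pd:chris", "rdf:type", "foaf:Person"),
    ("pd:chris", "foaf:name", "Chris Bizer"),
    ("pd:chris", "foaf:based_near", "dbpedia:Berlin"),
}


def objects(graph, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(graph, "pd:chris", "foaf:name"))
print(objects(graph, "pd:chris", "foaf:based_near"))
```

Storing the graph as a set means that stating the same triple twice has no effect, which matches the RDF model.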
Data Items are identified with HTTP URIs
pd:chris = http://www.bizer.de#chris
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
pd:chris rdf:type foaf:Person .
pd:chris foaf:name "Chris Bizer" .
pd:chris foaf:based_near dbpedia:Berlin .
8
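The prefixed names on the slide above are shorthand for full HTTP URIs. A small sketch of the expansion, assuming a hand-written prefix table: the pd and dbpedia namespaces come from the slide, the foaf and rdf namespaces are the standard ones, and the function name expand is hypothetical.

```python
# Prefix table: pd and dbpedia are taken from the slide,
# foaf and rdf are the standard namespace URIs.
PREFIXES = {
    "pd": "http://www.bizer.de#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name):
    """Expand a prefixed name such as 'dbpedia:Berlin' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {name!r}")
    return PREFIXES[prefix] + local


print(expand("pd:chris"))
print(expand("dbpedia:Berlin"))
```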
Resolving URIs over the Web
pd:chris rdf:type foaf:Person .
pd:chris foaf:name "Chris Bizer" .
pd:chris foaf:based_near dbpedia:Berlin .
dbpedia:Berlin dp:population 3,450,889 .
dbpedia:Berlin skos:subject dp:Cities_in_Germany .
9
Dereferencing URIs over the Web
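pd:chris rdf:type foaf:Person .
pd:chris foaf:name "Chris Bizer" .
pd:chris foaf:based_near dbpedia:Berlin .
dbpedia:Berlin dp:population 3,450,889 .
dbpedia:Berlin skos:subject dp:Cities_in_Germany .
dbpedia:Hamburg skos:subject dp:Cities_in_Germany .
dbpedia:Muenchen skos:subject dp:Cities_in_Germany .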
10
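The step-by-step growth of the graph on the last two slides can be sketched as a small crawler. This is an illustration only: the WEB dictionary is a hypothetical in-memory stand-in for real HTTP lookups, and the names dereference, is_uri, and crawl are assumptions, not part of any library.

```python
from collections import deque

# Hypothetical stand-in for the Web: each name maps to the triples a server
# might return when that URI is dereferenced.
WEB = {
    "pd:chris": [
        ("pd:chris", "foaf:name", "Chris Bizer"),
        ("pd:chris", "foaf:based_near", "dbpedia:Berlin"),
    ],
    "dbpedia:Berlin": [
        ("dbpedia:Berlin", "dp:population", 3450889),
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ],
    "dp:Cities_in_Germany": [
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def dereference(uri):
    """Simulate an HTTP lookup; unknown URIs return no data."""
    return WEB.get(uri, [])


def is_uri(term):
    """Crude test that separates prefixed names from literal values."""
    return isinstance(term, str) and ":" in term and " " not in term


def crawl(start, max_lookups=10):
    """Follow links breadth-first and collect every triple found on the way."""
    seen, queue, triples = set(), deque([start]), set()
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in dereference(uri):
            triples.add((s, p, o))
            # Subjects and objects that look like URIs are looked up next.
            for term in (s, o):
                if is_uri(term) and term not in seen:
                    queue.append(term)
    return triples


for triple in sorted(crawl("pd:chris"), key=str):
    print(triple)
```

Starting from pd:chris alone, the crawler reaches Berlin, then the category of German cities, then Hamburg and Muenchen, without knowing any of those sources in advance.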
Properties of the Web of Linked Data
• Global, distributed data space built on a simple set of standards
• RDF, URIs, HTTP
• Entities are connected by links
• creating a global data graph that spans data sources and
• enables the discovery of new data sources
• Provides for data-coexistence
• Everyone can publish data to the Web of Linked Data
• Everyone can express their personal view on things
• Everybody can use the vocabularies/schema that they like
11
W3C Linking Open Data Project
• Grassroots community effort to
• publish existing open license datasets as Linked Data on the Web
• interlink things between different data sources
12
LOD Data Sets on the Web: May 2007
• 12 data sets
• Over 500 million RDF triples
• Around 120,000 RDF links between data sources
13
LOD Data Sets on the Web: November 2007
• 28 data sets
14
LOD Data Sets on the Web: September 2008
• 45 data sets
• Over 2 billion RDF triples
15
LOD Data Sets on the Web: July 2009
• 95 data sets
• Over 6.5 billion RDF triples
16
LOD Data Sets on the Web: September 2010
• 203 data sets
• Over 24.7 billion RDF triples
• Over 436 million RDF links between data sources
17
LOD Data Sets on the Web: September 2011
• 295 data sets
• Over 31 billion RDF triples
• Over 504 million RDF links between data sources
18
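The growth shown on the preceding slides can be summarised with a little arithmetic. A rough sketch, using only the figures from the slides; since these are stated as "over" values, the resulting factors are approximate. The function name growth_factor is hypothetical.

```python
# (date, data sets, RDF triples in billions; None where the slides give no figure)
snapshots = [
    ("May 2007", 12, 0.5),
    ("November 2007", 28, None),
    ("September 2008", 45, 2.0),
    ("July 2009", 95, 6.5),
    ("September 2010", 203, 24.7),
    ("September 2011", 295, 31.0),
]


def growth_factor(first, last):
    """Return how many times larger the last value is than the first."""
    return last / first


first, last = snapshots[0], snapshots[-1]
print(f"data sets: x{growth_factor(first[1], last[1]):.1f}")
print(f"triples:   x{growth_factor(first[2], last[2]):.0f}")
```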
LOD Data Set statistics as of 09/2011
LOD Cloud Data Catalog on CKAN
• http://www.ckan.net/group/lodcloud
More statistics
• http://lod-cloud.net/state/
19
DBpedia – The Hub on the Web of Data
• DBpedia is a joint project with the following goals
• extracting structured information from Wikipedia
• publishing this information under an open license on the Web
• setting links to other data sources
• Partners
• Freie Universität Berlin (Germany)
• Universität Leipzig (Germany)
• OpenLink Software (UK)
• neofonie (Germany)
Extracting structured data from Wikipedia
dbpedia:Berlin rdf:type dbpedia-owl:City ,
        dbpedia-owl:PopulatedPlace ,
        dbpedia-owl:Place ;
    rdfs:label "Berlin"@en ,
        "Berlino"@it ;
    dbpedia-owl:population 3450889 ;
    wgs84:lat 52.500557 ;
    wgs84:long 13.398889 .
dbpedia:SoundCloud dbpedia-owl:location dbpedia:Berlin .
• access to DBpedia data:
• dumps
• SPARQL endpoint
• Linked Data interface
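As an illustration of the SPARQL endpoint option, the sketch below builds a query URL for the public DBpedia endpoint without sending it. The property name dbpedia-owl:population is copied from the slide and may differ on the live endpoint; the function name sparql_url is hypothetical. The query parameter name follows the SPARQL Protocol.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint

# Ask for Berlin's population, using the property name shown on the slide.
QUERY = """
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?population WHERE {
  <http://dbpedia.org/resource/Berlin> dbpedia-owl:population ?population .
}
"""


def sparql_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL (SPARQL Protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Sending this URL with an HTTP client and a suitable Accept header would return the result set; the request itself is left out here so the sketch runs offline.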
The DBpedia Data Set
• Information on more than 3.64 million “things”
• 416,000 persons
• 169,000 organisations
• 526,000 places
• 106,000 music albums
• 60,000 movies
• 183,000 species
• 24,000 books
• overall more than 1 billion RDF triples
• title and abstract in 97 different languages
• 2,724,000 links to images
• 6,300,000 links to external web pages
• 6,200,000 links to other Linked Data sets
DBpedia Mappings
• since March 2010 collaborative editing of
• DBpedia ontology
• mappings from Wikipedia infoboxes and tables to DBpedia ontology
• curated in a public wiki with instant validation methods
• http://mappings.dbpedia.org
• multilingual mappings to the DBpedia ontology:
• ca, de, el, en, es, fr, ga, hr, hu, it, ko, nl, pl, pt, ru, sl, tr
• allows for a significant increase of the extracted data’s quality
• each domain has its experts
DBpedia Mobile
• displays Wikipedia data on a map
• aggregates different data sources
Faceted Wikipedia Search
http://dbpedia.neofonie.de
Uptake in the Government Domain
• The EU is pushing Linked Data (LOD2, LATC, Eurostat)
• W3C Government Linked Data (GLD) Working Group
Uptake in the Libraries Community
• Institutions publishing Linked Data
• Library of Congress (subject headings)
• German National Library (PND dataset and subject headings)
• Swedish National Library (Libris - catalog)
• Hungarian National Library (OPAC and Digital Library)
• British National Library
• Europeana project
30
Uptake in the Libraries Community
• W3C Library Linked Data Incubator Group (2010)
• OKFN Working Group on Bibliographic Data (2010)
• Goals:
• Integrate library catalogs on a global scale
• Interconnect resources between repositories (by topic, by location, by historical period, by ...)
31
Uptake in the Media Industry
• Publish data as RDF/XML or RDFa
• Goal: Drive traffic to websites via search engines
32
Connecting the classic Web with the Web of Data
• Annotate Web documents with Linked Data URIs
• Goals
• Connect everything
• Display Web of Data content as info boxes next to news or blog posts
• Improve search by using Linked Data as background knowledge
• (Semi-) Automated Annotation Services using Named Entity Recognition
• Open Calais (Thomson Reuters) for news
• Zemanta (startup) for blog posts
• DBpedia Spotlight
<http://data.semanticweb.org/conference/eswc/2007/paper-69> dc:subject <http://dbpedia.org/resource/Machine_learning> .
33
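The annotation shown on the slide above is a single RDF statement. A minimal sketch of how an annotation service might emit it as an N-Triples line, assuming the dc prefix stands for the Dublin Core elements namespace; the function name annotate is hypothetical.

```python
# Assumption: dc: on the slide refers to the Dublin Core elements namespace.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"


def annotate(document_uri, entity_uri):
    """Return one N-Triples line stating that the document is about the entity."""
    return f"<{document_uri}> <{DC_SUBJECT}> <{entity_uri}> ."


print(annotate(
    "http://data.semanticweb.org/conference/eswc/2007/paper-69",
    "http://dbpedia.org/resource/Machine_learning",
))
```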
Linked Data Applications
[Diagram: Linked Data browsers, Linked Data mashups, and search engines operating on top of things in data sources A, B, C, D, and E, which are connected by typed links]
34
Lower Data Integration Costs
• Data Publisher
• publishes data as RDF
• sets identity links
• reuses terms or publishes mappings
• Third Parties
• set identity links pointing at your data
• publish mappings to the Web
• Data Consumer
• has to do the rest
• using record linkage and schema matching techniques
The overall data integration effort is split between the data publisher, the data consumer and third parties.
38
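Once identity links exist, a data consumer can merge descriptions of the same thing from different sources. A rough sketch of that step using a union-find structure; the example.org URIs, the property names, and the function name merge_by_identity are all hypothetical.

```python
def merge_by_identity(descriptions, same_as):
    """Merge property maps of URIs connected by identity links.

    descriptions: {uri: {property: value}}
    same_as: iterable of (uri_a, uri_b) identity links
    """
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as:
        parent[find(a)] = find(b)

    merged = {}
    for uri, props in descriptions.items():
        # Later values overwrite earlier ones; real systems need a conflict policy.
        merged.setdefault(find(uri), {}).update(props)
    return merged


descriptions = {
    "http://example.org/source-a/Berlin": {"population": 3450889},
    "http://example.org/source-b/Berlin": {"label": "Berlin"},
    "http://example.org/source-b/Hamburg": {"label": "Hamburg"},
}
links = [("http://example.org/source-a/Berlin", "http://example.org/source-b/Berlin")]

for uri, props in merge_by_identity(descriptions, links).items():
    print(uri, props)
```

The identity links can come from the publisher or from third parties; the merge itself does not care who set them.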
How to publish Linked Data
Tasks:
1. Make data available as RDF via HTTP
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive
4. Reuse common vocabularies
Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
http://linkeddatabook.com/
39
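One common way to approach the first task is HTTP content negotiation: the same URI returns HTML to browsers and RDF to data clients. The slides do not prescribe this, so the sketch below is an assumption; the function name choose_format is hypothetical and the Accept parsing is deliberately simplified.

```python
def choose_format(accept_header, html="text/html", rdf="application/rdf+xml"):
    """Pick the representation to serve for a given HTTP Accept header.

    Falls back to HTML when neither supported type is requested explicitly.
    """
    best, best_q = html, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed quality value as "not wanted"
        if media_type in (html, rdf) and q > best_q:
            best, best_q = media_type, q
    return best


print(choose_format("application/rdf+xml"))
print(choose_format("text/html;q=0.5, application/rdf+xml"))
print(choose_format("*/*"))
```

A server would call such a function per request and then either render a page or serialise the RDF description of the requested resource.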
Is your data 5 star?
★ Make your stuff available on the Web (whatever format) under an open license.
★ ★ Make it available as structured data (e.g., Excel instead of image scan of a table) so that it can be reused.
★ ★ ★ Use non-proprietary, open formats (e.g., CSV instead of Excel).
★ ★ ★ ★ Use URIs to identify things, so that people can point at your stuff and serve RDF from it.
★ ★ ★ ★ ★ Link your data to other data to provide context.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010
40
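The five-star scheme on the slide above is cumulative: each star presupposes the ones before it. A small sketch of that rule, with a hypothetical function name and parameter names.

```python
def stars(on_web_open_license, structured, open_format, uses_uris, linked):
    """Count stars; counting stops at the first criterion that is not met."""
    criteria = [on_web_open_license, structured, open_format, uses_uris, linked]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


# A CSV file under an open license, without URIs or links to other data.
print(stars(True, True, True, False, False))
```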
Conclusion
• Linked Data provides a standardized data access interface
• Linked Data allows for the development of a variety of tools to integrate, enhance and view the data
• The Web of Data is growing rapidly
• There are active deployment communities in different domains
• Web search is evolving into query answering
• Search engines will increasingly rely on structured data from the Web
41
Thanks
Questions?
References
• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
http://linkeddatabook.com/
• Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far
http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
• Linking Open Data Project Wiki
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
42
Email: [email protected]
Twitter: @anjeve