DBpedia and the Web of Data
Prof. Dr. Chris Bizer
Freie Universität Berlin
Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)
Berlin, November 28, 2008
Hello
Name: Chris Bizer
Job: Junior-Professor at Freie Universität Berlin
Projects
- RAP - RDF API for PHP (together with Universität Leipzig)
- D2RQ and D2R Server (together with HP Labs Bristol)
- Named Graphs and NG4J (together with HP Labs Bristol)
- Fresnel Display Vocabulary (together with MIT and INRIA)
- DBpedia (together with Universität Leipzig and OpenLink)
- Linking Open Data (community project sponsored by W3C)
Outline
1. The DBpedia Project
2. The Web of Data
3. Linked Data Deployment on the Web
4. Linked Data Applications
5. What is next?
DBpedia
DBpedia is a community effort to
- extract structured information from Wikipedia,
- make this information available on the Web under an open license,
- interlink the DBpedia dataset with other open datasets on the Web.
Contributors
- Freie Universität Berlin (Germany)
- Universität Leipzig (Germany)
- OpenLink Software (UK)
Structured Information in Wikipedia
Wikipedia consists of 11.7 million articles (2.62 million in English) in 264 languages.
Monthly growth rate: 4%
Wikipedia articles contain structured information:
- infoboxes, which use a template mechanism
- categorization of the article
- images depicting the article’s topic
- links to external webpages
- intra-wiki links to other articles
- inter-language links to articles about the same topic in different languages
Extracting Structured Information from Wikipedia
http://en.wikipedia.org/wiki/Calgary
<http://dbpedia.org/resource/Calgary>
dbpedia:native_name "Calgary" ;
dbpedia:elevation "1048" ;
dbpedia:population_city "988193" ;
dbpedia:population_metro "1079310" ;
dbpedia:mayor_name dbpedia:Dave_Bronconnier ;
dbpedia:governing_body dbpedia:Calgary_City_Council ;
...
using a PHP extraction framework
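The slide shows only the input article and the resulting triples. A minimal Python sketch of the core step, assuming a simplified infobox in which each field sits on its own `| key = value` line, could look as follows. It is not the PHP framework the slide mentions, and `SNIPPET`, `FIELD`, `WIKI_LINK` and `infobox_to_triples` are hypothetical names. Values that consist of a single wiki link become references to other DBpedia resources; everything else becomes a string literal. Nested templates, references and units, which real infoboxes contain, are ignored.

```python
import re

# Hypothetical, simplified infobox snippet reusing values from the Calgary example.
SNIPPET = """{{Infobox settlement
| native_name = Calgary
| elevation = 1048
| population_city = 988193
| mayor_name = [[Dave Bronconnier]]
}}"""

# One "| key = value" line of an infobox template.
FIELD = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*)$")
# A value consisting solely of a wiki link such as [[Target]] or [[Target|label]].
WIKI_LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def infobox_to_triples(article: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Turn each non-empty infobox field into a (subject, predicate, object) triple."""
    subject = f"<http://dbpedia.org/resource/{article}>"
    triples = []
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if not match or not match.group(2).strip():
            continue  # not a field line, or the field is empty
        key = match.group(1).replace(" ", "_")
        value = match.group(2).strip()
        link = WIKI_LINK.match(value)
        if link:
            # Intra-wiki links point to other articles, i.e. to other DBpedia resources.
            obj = "dbpedia:" + link.group(1).strip().replace(" ", "_")
        else:
            # Everything else is kept as a plain string literal (quotes escaped).
            obj = '"' + value.replace('"', '\\"') + '"'
        triples.append((subject, "dbpedia:" + key, obj))
    return triples


if __name__ == "__main__":
    for s, p, o in infobox_to_triples("Calgary", SNIPPET):
        print(s, p, o, ".")
```

Running the file prints one triple per non-empty infobox field.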
The DBpedia Dataset
Data about 2.6 million “things”, including at least
- 213,000 persons
- 328,000 places
- 57,000 music albums
- 36,000 films
- 20,000 companies
- 80,000 species
Altogether 247 million pieces of information (RDF triples)
- 29 million triples originate from infoboxes
- 588,000 links to pictures
- 3,150,000 links to relevant external web pages
- 4,878,000 RDF links into other databases
- 414,000 Wikipedia categories
- 75,000 YAGO categories
Multi-Lingual Abstracts
The dataset contains a short and a long abstract for each concept.
Short abstracts
- English: 2,600,000
- German: 391,000
- French: 383,000
- Dutch: 284,000
- Polish: 256,000
- Italian: 286,000
- Spanish: 226,000
- Japanese: 199,000
- Portuguese: 246,000
- Swedish: 144,000
- Chinese: 101,000
Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. Data Dumps for Download
The DBpedia SPARQL Endpoint
http://dbpedia.org/sparql
hosted on an OpenLink Virtuoso server
can answer SPARQL queries like
- Give me all sitcoms that are set in NYC.
- All German musicians that were born in Berlin in the 19th century?
- All tennis players from Moscow?
- All films by Quentin Tarantino?
- All soccer players with shirt number 11 who play for a club with a stadium of over 40,000 seats and were born in a country with over 10 million inhabitants?
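A query such as "All films by Quentin Tarantino?" reaches the endpoint as an ordinary HTTP request. The sketch below follows the SPARQL Protocol (query text in the `query` parameter of a GET request, results requested as SPARQL JSON via the `Accept` header). The property `dbo:director` comes from today's DBpedia ontology and is an assumption: the 2008 dataset used other, infobox-derived property names, and the endpoint's current URL scheme and availability are not guaranteed. `build_request` and `run_query` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# "All films by Quentin Tarantino?" Assumption: dbo:director from today's DBpedia
# ontology; the 2008 dataset used different, infobox-derived property names.
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE { ?film dbo:director dbr:Quentin_Tarantino . }"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """HTTP GET as in the SPARQL Protocol: the query goes into the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str) -> list[str]:
    """Send the query (needs network access) and return the values bound to the first variable."""
    with urlopen(build_request(query), timeout=30) as response:
        results = json.load(response)
    var = results["head"]["vars"][0]
    return [row[var]["value"] for row in results["results"]["bindings"]]
```

With network access, `run_query(QUERY)` returns the URIs of the matching films; `build_request` can be inspected offline.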
The Web of Documents
The Web is a single information space built on open standards and hyperlinks.
[Figure: Web browsers and search engines access HTML documents on sites A, B, C and D over HTTP; the documents are connected by hyperlinks.]
Linked Data
Use RDF and HTTP to
1. publish structured data on the Web,
2. set links between data from one data source to data within other data sources.
[Figure: things in data sources A to E, connected across sources by data links.]
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
Tim Berners-Lee 2007
http://www.w3.org/DesignIssues/LinkedData.html
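Principles 2 and 3 meet in HTTP content negotiation. DBpedia, for example, names each thing with a `/resource/` URI and describes it in an HTML page under `/page/` and in RDF under `/data/`. The server-side sketch below is a simplification of the "303 See Other" redirect pattern; the function `negotiate`, the list `RDF_TYPES` and the crude substring matching of the `Accept` header are hypothetical, not DBpedia's actual server code.

```python
# Media types treated as requests for RDF; a simplification of real Accept parsing,
# which would also weigh q-values such as "text/html;q=0.9".
RDF_TYPES = ("application/rdf+xml", "text/turtle", "text/n3")


def negotiate(resource_uri: str, accept: str) -> tuple[int, str]:
    """Answer a lookup of a /resource/ URI with a 303 redirect to a describing document."""
    if "/resource/" not in resource_uri:
        raise ValueError(f"not a /resource/ URI: {resource_uri}")
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    # The URI names a thing (e.g. the city of Berlin), not a document, so the server
    # answers "303 See Other" and points to RDF data or to an HTML page about it.
    target = "/data/" if wants_rdf else "/page/"
    return 303, resource_uri.replace("/resource/", target, 1)
```

On the client side, Python's `urllib.request` follows 303 redirects automatically, so requesting a `/resource/` URI with `Accept: application/rdf+xml` ends at the RDF document.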
The RDF Data Model
[Figure: RDF graph describing pd:cygri]
pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .
Data objects are identified with HTTP URIs
[Figure: the same RDF graph, highlighting that pd:cygri and dbpedia:Berlin are HTTP URIs]
pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
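Written out in full, this graph is three triples whose subjects, predicates and non-literal objects are HTTP URIs. A minimal sketch that expands the prefixed names and prints N-Triples, one of the standard RDF serializations, could look like this. The `pd` prefix follows the mapping on the slide; the other three are the standard RDF, FOAF and DBpedia resource namespaces. `expand`, `term`, `to_ntriples` and `Literal` are hypothetical helper names.

```python
from dataclasses import dataclass

# Prefix table; "pd" follows the slide (pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri),
# the others are the standard RDF, FOAF and DBpedia resource namespaces.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/resource/",
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
}


@dataclass(frozen=True)
class Literal:
    """A plain string value such as a name, as opposed to a URI."""
    value: str


def expand(curie: str) -> str:
    """Turn a prefixed name like foaf:name into a full URI; leave unknown prefixes alone."""
    prefix, sep, local = curie.partition(":")
    return PREFIXES[prefix] + local if sep and prefix in PREFIXES else curie


def term(t) -> str:
    """Render one node: URIs in angle brackets, literals in escaped double quotes."""
    if isinstance(t, Literal):
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{expand(t)}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one N-Triples statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


GRAPH = [
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", Literal("Richard Cyganiak")),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
]
```

`print(to_ntriples(GRAPH))` shows that nothing in the data depends on a local identifier: every name can be looked up on the Web.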
Dereferencing URIs over the Web
[Figure: looking up dbpedia:Berlin adds further data to the graph]
pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .
dbpedia:Berlin dp:population "3405259" .
dbpedia:Berlin skos:subject dp:Cities_in_Germany .
Dereferencing URIs over the Web
[Figure: the same graph, extended with further cities found by looking up dp:Cities_in_Germany]
dbpedia:Berlin skos:subject dp:Cities_in_Germany .
dbpedia:Hamburg skos:subject dp:Cities_in_Germany .
dbpedia:Muenchen skos:subject dp:Cities_in_Germany .
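The two dereferencing slides describe "follow-your-nose" discovery: look up a URI, keep the returned triples, then look up the URIs they mention. The sketch simulates the Web with a dictionary so that it runs offline; `WEB`, `follow_your_nose` and the `dereference` callback are hypothetical names. In a real client, `dereference` would issue an HTTP GET with an RDF `Accept` header and parse the response. The `max_lookups` budget matters there, because the Web of Data has no natural end.

```python
from collections import deque
from typing import Callable

Triple = tuple[str, str, str]

# Hypothetical stand-in for the Web: what looking up each (abbreviated) URI returns.
# Looking up the category also returns the triples that point to it.
WEB: dict[str, list[Triple]] = {
    "pd:cygri": [("pd:cygri", "foaf:based_near", "dbpedia:Berlin")],
    "dbpedia:Berlin": [("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany")],
    "dp:Cities_in_Germany": [
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def follow_your_nose(start: str, dereference: Callable[[str], list[Triple]], max_lookups: int = 10) -> set[Triple]:
    """Breadth-first crawl: look up a URI, keep its triples, queue the subjects and objects they mention."""
    graph: set[Triple] = set()
    seen, queue = {start}, deque([start])
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in dereference(uri):
            graph.add((s, p, o))
            for node in (s, o):
                # Literals (written here with quotes) cannot be looked up.
                if node not in seen and not node.startswith('"'):
                    seen.add(node)
                    queue.append(node)
    return graph


graph = follow_your_nose("pd:cygri", lambda uri: WEB.get(uri, []))
```

Starting from pd:cygri alone, the crawl reaches Hamburg and München via Berlin and the shared category.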
3. Linked Data Deployment on the Web
Is this real?
[Figure: things in data sources A to E, connected by typed links.]
W3C Linking Open Data Project
Community effort to
- publish existing open-license datasets as Linked Data on the Web,
- interlink things between different data sources.
LOD Datasets on the Web: May 2007
Over 500 million RDF triples
Around 120,000 RDF links between data sources
LOD Datasets on the Web: August 2007
LOD Datasets on the Web: February 2008
LOD Datasets on the Web: September 2008
Triple Count
More than 2 billion RDF triples
More than 5 million links between datasets
(rough estimates)
Organizations publishing Linked Data
Universities and Research Institutes
- Massachusetts Institute of Technology (USA)
- University of Southampton (UK)
- Freie Universität Berlin (DE)
- DERI (IRE)
- KMi, Open University (UK)
- University of London (UK)
- Universität Hannover (DE)
- University of Pennsylvania (USA)
- Universität Leipzig (DE)
- Universität Karlsruhe (DE)
- Joanneum (AT)
- University of Toronto (CA)
Companies
- BBC (UK)
- OpenLink (UK)
- Zitgist (USA)
- Talis (UK)
- Garlik (UK)
- Mondeca (FR)
- Cyc Foundation (USA)
The Bio2RDF Project
Goals
1. Make bioinformatics data available in RDF format on the Web.
2. Promote the linked data vision within the bioinformatics community.
3. Answer questions which were not possible or practical to ask before.
Participants
- Université Laval, Canada
- Queensland University of Technology, Australia
The Bio2RDF Cloud
27 data sources
260 million records
2.7 billion RDF triples
4. Applications
What can I do with this?
[Figure: Linked Data browsers, Linked Data mashups and search engines built on top of things in data sources A to E, connected by typed links.]
Linked Data Browsers
Tabulator Browser (MIT, USA)
Disco Hyperdata Browser (FU Berlin, DE)
OpenLink RDF Browser (OpenLink, UK)
Zitgist RDF Browser (Zitgist, USA)
Humboldt (HP Labs, UK)
Fenfire (DERI, Ireland)
Marbles (FU Berlin, DE)
Linked Data Mashups
Domain-specific applications using Linked Data from the Web
DBtune Slashfacet
Visualizes music-related Linked Data
Uses LastFM, MySpace, and BBC data
DBpedia Mobile
Geospatial entry point into the Web of Data
Starts with DBpedia, Revyu and Flickr data
Web of Data Search Engines
Falcons (IWS, China)
Sindice (DERI, Ireland)
MicroSearch (Yahoo, Spain)
Watson (Open University, UK)
SWSE (DERI, Ireland)
Swoogle (UMBC, USA)
Better Interfaces for Common Wikipedia Users
Direction: free-text search + facet-browsing
Live Update
Current situation
- DBpedia update cycle: 3 months
- The Wikimedia Foundation provides us with a live update stream.
Opportunity
- Increase the currency of the DBpedia dataset using this update stream.
Result
- DBpedia in synchronization with Wikipedia.
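The deck does not describe the format of the Wikimedia update stream. Assuming that each event names one changed article and that the article is re-extracted when the event arrives, staying in sync reduces to replacing that article's old facts with the freshly extracted ones. A minimal sketch with a hypothetical in-memory store; `Store`, `apply_update` and `process_stream` are illustrative names, not DBpedia components.

```python
# Hypothetical in-memory store: resource -> set of (property, value) pairs.
Store = dict[str, set[tuple[str, str]]]


def apply_update(store: Store, resource: str, extracted: set[tuple[str, str]]):
    """Replace the facts for one edited article and report what changed."""
    old = store.get(resource, set())
    removed, added = old - extracted, extracted - old
    store[resource] = set(extracted)
    return removed, added


def process_stream(store: Store, events) -> None:
    """Apply (resource, re-extracted facts) events in the order they arrive."""
    for resource, extracted in events:
        apply_update(store, resource, extracted)
```

Replacing an article's facts wholesale, rather than patching individual triples, keeps the store consistent even when an edit renames or deletes infobox fields; returning the removed and added facts keeps each change inspectable.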
Cross-Language Data Fusion
Opportunity
- There are 264 Wikipedia editions in different languages.
- There are cross-language links.
- The Italian Wikipedia knows more about Italian villages than the English one.
- The German Wikipedia contains more person infoboxes than the English one.
Idea
- Augment the infobox dataset with facts from other Wikipedia editions.
Result
- A much richer DBpedia dataset.
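One simple way to realize this idea, sketched here as an assumption rather than DBpedia's actual method: for each property, take the value from the first language edition in a priority list that has one, and record which edition it came from. The sketch presumes that property names have already been aligned across editions, which in practice requires mapping each edition's templates. `fuse`, the priority order and the sample village data are hypothetical.

```python
def fuse(
    infoboxes: dict[str, dict[str, str]],
    priority: tuple[str, ...] = ("en", "de", "it"),
) -> dict[str, tuple[str, str]]:
    """Merge per-language infobox facts about one entity.

    Each property takes its value from the first edition in `priority` that has it,
    together with that edition's language code as provenance.
    Editions missing from `priority` are ignored.
    """
    fused: dict[str, tuple[str, str]] = {}
    for lang in priority:
        for prop, value in infoboxes.get(lang, {}).items():
            fused.setdefault(prop, (value, lang))  # first (highest-priority) edition wins
    return fused
```

For a hypothetical village described as `{"en": {"population": "1200"}, "it": {"population": "1187", "altitude": "350"}}`, `fuse` keeps the English population and fills in the altitude from the Italian edition. A first-wins rule is the simplest conflict policy; it silently hides disagreements, which is why checking consistency across editions is a separate task.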
Augment DBpedia with Data from External Sources
Opportunity
- The Linking Open Data cloud provides lots of useful data which is not contained in Wikipedia yet. For instance:
  - EuroStat provides additional statistical information about countries.
  - Musicbrainz contains additional information about other bands.
  - Geonames provides additional information about locations.
Idea
- Augment DBpedia with additional data from external sources.
Result
- A much richer DBpedia dataset.
Contribute back to the Wikipedia Community
Opportunity
- Augmentation with data from the LOD cloud makes the DBpedia dataset richer than Wikipedia itself.
- Infobox data is extracted from Wikipedia editions in various languages.
Idea
- Extend the Wikipedia authoring environment with
  - suggestions for infobox values,
  - cross-language consistency checking for infoboxes.
Initialize Wikipedia Clean-Up Cycles
- Data-driven search interfaces expose the weaknesses of Wikipedia's template system.
- Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.
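Cross-language consistency checking can start from aligned infobox data: collect each property's value per language edition and flag the properties on which editions disagree. A minimal sketch, assuming property names are already aligned across editions; `find_conflicts` is a hypothetical helper. A real check would first normalize number formats and units, since `3.405.259` and `3,405,259` denote the same number.

```python
def find_conflicts(infoboxes: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, for every property with disagreeing values, the value per language edition."""
    by_property: dict[str, dict[str, str]] = {}
    for lang, box in infoboxes.items():
        for prop, value in box.items():
            # Only surrounding whitespace is normalized here; see the caveat above.
            by_property.setdefault(prop, {})[lang] = value.strip()
    return {
        prop: values
        for prop, values in by_property.items()
        if len(set(values.values())) > 1  # more than one distinct value = conflict
    }
```

In an authoring environment, each flagged property could be shown to the editor together with the values from the other editions.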
What is next for the Web of Data?
Linking
1. Increase the number of links between datasets.
2. Increase the quality of these links.
Today: simple pattern- and graph-matching techniques are used for automated interlinking.
There is a lot of existing work on identity resolution in the database and knowledge representation communities that could be used.
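The slide names simple pattern-based interlinking without details. One simple instance is label matching: normalize names, compare them with a string-similarity measure and emit an `owl:sameAs` link above a threshold. The measure used here (Python's `difflib.SequenceMatcher` ratio) is a choice made for this sketch, not a technique taken from the deck; the `ex:` URIs and the 0.9 threshold are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase, turn underscores into spaces and drop accents ("München" -> "munchen")."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.replace("_", " ").lower().strip()


def link_by_label(
    source: dict[str, str], target: dict[str, str], threshold: float = 0.9
) -> list[tuple[str, str, str]]:
    """For each source URI, emit owl:sameAs to the most similar target label at or above `threshold`."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, "owl:sameAs", best_uri))
    return links
```

Label matching alone produces false links for namesakes (several places are called Berlin, for example), which is why the slide asks for better-quality links and points to identity-resolution research.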
Data Fusion
Users want an integrated view on all data that is available about an object!
Raises well-known but unsolved problems:
- Schema mapping
- Inconsistency resolution
- Trust / information quality
[Figure: an application uses an integrated view that combines data objects 1 to 6 from data sources A, B and C, which are connected by owl:sameAs links.]
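The first step toward such an integrated view is working out which URIs denote the same object. Because `owl:sameAs` is symmetric and transitive, the linked URIs fall into clusters, which the standard union-find technique computes (a choice made for this sketch, not a method from the deck). The sketch then unions the descriptions within one cluster. The URIs `A:obj1` to `C:obj3` are hypothetical stand-ins for the slide's data objects; schema mapping and trust are not addressed.

```python
def same_as_clusters(links: list[tuple[str, str]]) -> list[set[str]]:
    """Group URIs connected by owl:sameAs (a symmetric, transitive relation) using union-find."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


def integrated_view(cluster: set[str], descriptions: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Union the property values of all URIs in one cluster.

    Values are kept as sets, so conflicting values stay visible instead of being
    resolved; inconsistency resolution remains the open problem named on the slide.
    """
    view: dict[str, set[str]] = {}
    for uri in sorted(cluster):
        for prop, value in descriptions.get(uri, {}).items():
            view.setdefault(prop, set()).add(value)
    return view
```

For links A:obj1 to B:obj2 and B:obj2 to C:obj3, all three URIs land in one cluster even though A:obj1 and C:obj3 are never linked directly.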
Thanks!
References
- DBpedia: http://dbpedia.org/About
- W3C Linking Open Data Project: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Tutorial "How to Publish Linked Data on the Web": http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/