DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mappings Wiki. Anja Jentzsch - @anjeve, Hasso-Plattner-Institute, Potsdam, Germany. SMWCon Fall 2013, 2013/10/30

Upload: anja-jentzsch

Post on 06-May-2015

TRANSCRIPT

Page 1: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mappings Wiki

Anja Jentzsch - @anjeve Hasso-Plattner-Institute, Potsdam, Germany

SMWCon Fall 2013

2013/10/30

Page 2: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Linked Data Principles

Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.

Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
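
Principles 2 and 3 can be illustrated with a minimal Python sketch that prepares (but does not send) an HTTP lookup asking for RDF through content negotiation. The resource URI http://dbpedia.org/resource/Berlin and the text/turtle media type are assumptions chosen for illustration; neither is stated on the slide.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks the server for an RDF representation.

    The request is only constructed here, not sent.
    """
    # Principle 2: names should be HTTP URIs so that they can be looked up.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    # Principle 3: ask for RDF via the Accept header (content negotiation).
    return Request(uri, headers={"Accept": media_type})


# Assumed example URI, not taken from the slides.
req = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; what comes back depends on the server.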

Page 3: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Properties of the Web of Linked Data

• Global, distributed dataspace built on a simple set of standards
  • RDF, URIs, HTTP

• Entities are connected by links
  • creating a global data graph that spans data sources
  • enabling the discovery of new data sources

• Provides for data coexistence
  • Everyone can publish data to the Web of Linked Data
  • Everyone can express their personal view on things
  • Everyone can use the vocabularies/schemas that they like

Page 4: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

W3C Linking Open Data Project [2007]

• Grassroots community effort to
  • publish existing open license datasets as Linked Data on the Web
  • interlink things between different data sources

Page 5: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

LOD Data Sets on the Web: September 2011

• 295 data sets
• Over 31 billion RDF triples
• Over 504 million RDF links between data sources

http://lod-cloud.net

Page 6: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

LOD Data Set statistics

LOD Cloud Data Catalog on the Data Hub
• http://datahub.io/group/lodcloud

More statistics
• http://lod-cloud.net/state/

Page 7: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia [2007]

• DBpedia is a joint project with the following goals
  • extracting structured information from Wikipedia
  • publishing this information under an open license on the Web
  • setting links to other data sources

• Partners
  • Universität Mannheim (Germany)
  • Universität Leipzig (Germany)
  • OpenLink Software (UK)

Page 8: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Extracting structured data from Wikipedia

Page 9: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Extracting structured data from Wikipedia

    dbpedia:Berlin rdf:type dbpedia-owl:City ,
                            dbpedia-owl:PopulatedPlace ,
                            dbpedia-owl:Place ;
        rdfs:label "Berlin"@en , "Berlino"@it ;
        dbpedia-owl:population 3499879 ;
        wgs84:lat 52.500557 ;
        wgs84:long 13.398889 .

    dbpedia:SoundCloud dbpedia-owl:location dbpedia:Berlin .

• Access to DBpedia data:
  • Dumps
  • SPARQL endpoint
  • Linked Data interface
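
As a rough illustration of the SPARQL access path, the following Python sketch builds a query URL asking for things located in Berlin, mirroring the dbpedia:SoundCloud triple above. The endpoint address http://dbpedia.org/sparql and the two prefix expansions are assumptions that do not appear on the slide, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Assumption: public endpoint address, not named on the slide.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumption: prefix expansions for the abbreviations used on the slide.
QUERY = """
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?thing WHERE {
  ?thing dbpedia-owl:location dbpedia:Berlin .
}
LIMIT 10
"""


def sparql_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query into a GET URL using the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_url(ENDPOINT, QUERY)
print(url)
```

Fetching this URL with an Accept header such as application/sparql-results+json would be the usual next step; which result formats are offered depends on the endpoint.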

Page 10: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

The DBpedia Data Set

• Information on more than 4 million “things”
  • 832,000 persons
  • 209,000 organisations
  • 639,000 places
  • 116,000 music albums
  • 78,000 movies
  • 226,000 species

• overall more than 2.4 billion RDF triples
• localised versions in 119 languages
• 24.6 million links to images
• 27.6 million links to external web pages
• 45 million links to other Linked Data sets

Page 11: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Use Cases

1. Hub for the growing Web of Data
2. Data source for applications and mashups
3. Improvement of Wikipedia search
4. Text analysis and annotation

Page 12: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 13: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mobile

• displays Wikipedia data on a map
• aggregates different data sources

Page 14: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Faceted Wikipedia Search

• faceted browsing and free text search

Page 15: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 16: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

http://spotlight.dbpedia.org

Page 17: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Information Extraction Framework (DIEF)

• Open source: http://github.com/dbpedia
• More than 30 developers
• Written in Scala & Java
• Can be adapted to other MediaWikis
  • adaptation to Wiktionary: http://wiktionary.dbpedia.org

Page 18: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DIEF Architecture

Page 19: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DIEF

• Simple approach, huge generality
• Inconsistency in property naming
  • Different infobox properties can have different names for the same meaning (e.g. born vs birth_date vs birthDate)
• Inconsistency in property data types
  • Data types are determined by resource with a simple greedy algorithm
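
The data type inconsistency can be pictured with a small Python sketch. It is not the DIEF algorithm, whose details the slide does not give. It is a hypothetical greedy guesser that takes the first type a raw value happens to match, so the same infobox property can end up typed differently on different resources. The resource Example_Town and its value are invented for illustration.

```python
import re
from datetime import datetime


def guess_datatype(raw: str) -> str:
    """Greedily return the first datatype the raw infobox value matches."""
    value = raw.strip()
    if re.fullmatch(r"[+-]?\d+", value):
        return "xsd:integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "xsd:double"
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "xsd:date"
    except ValueError:
        # Nothing more specific matched, so fall back to a plain string.
        return "xsd:string"


# The same "population" property on two resources.
# "Example_Town" and its value are hypothetical.
population_values = {
    "Berlin": "3499879",
    "Example_Town": "about 12,000",
}
types = {name: guess_datatype(value) for name, value in population_values.items()}
print(types)
```

Because each value is typed on its own, nothing forces the two population values to share a datatype.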

Page 20: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Mapping-Based Infobox Extraction

• Correct semantics
  • Combine what belongs together (birth_place, Geburtsort)
  • Divide what is different (born, Geburtsort)
• Huge impact on precision & recall
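
A minimal Python sketch of the idea, assuming a lookup table keyed by language and infobox property name: birth_place and Geburtsort are combined into one ontology property, while born is kept separate. The table contents are illustrative; in particular, mapping born to dbpedia-owl:birthDate is an assumption, and the sample infobox values are made up.

```python
# Hypothetical mapping table: (language, infobox property) -> ontology property.
MAPPINGS = {
    ("en", "birth_place"): "dbpedia-owl:birthPlace",
    ("de", "Geburtsort"): "dbpedia-owl:birthPlace",  # combined with birth_place
    ("en", "born"): "dbpedia-owl:birthDate",         # assumption: kept separate
}


def map_infobox(lang: str, infobox: dict) -> dict:
    """Translate raw infobox properties into ontology properties.

    Properties without a mapping are skipped rather than guessed.
    """
    mapped = {}
    for template_property, value in infobox.items():
        ontology_property = MAPPINGS.get((lang, template_property))
        if ontology_property is not None:
            mapped[ontology_property] = value
    return mapped


# Made-up sample values for illustration.
english = map_infobox("en", {"birth_place": "Berlin", "nickname": "n/a"})
german = map_infobox("de", {"Geburtsort": "Berlin"})
print(english, german)
```

Both language editions now yield the same property, which is what makes the extracted data comparable across infobox templates.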

Page 21: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mappings Wiki

• since March 2010 collaborative editing of
  • DBpedia ontology
  • mappings from Wikipedia infoboxes and tables to DBpedia ontology

• curated in a public wiki with instant validation methods
  • http://mappings.dbpedia.org

• multilingual mappings to the DBpedia ontology:
  • ar, bg, bn, ca, cs, de, el, en, es, et, eu, fr, ga, hi, hr, hu, it, ja, ko, nl, pl, pt, ru, sl, tr

• allows for a significant increase of the extracted data’s quality
  • each domain has its experts

• ~ 170 active editors

Page 22: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mappings Wiki Details

• MediaWiki plus
  • Extensions for
    • validating mappings
    • storing and validating the ontology
  • Templates for
    • ontology definition
    • mapping infoboxes to the ontology
    • custom templates: date intervals, conditions, geo coordinates etc.

• DBpedia Server
  • Ontology storage
  • Mapping validation
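
To give a feel for what a template-based mapping looks like, the Python sketch below renders wikitext for one infobox mapping. The template and parameter names (TemplateMapping, PropertyMapping, mapToClass, templateProperty, ontologyProperty) are reproduced from memory of the wiki's conventions and are not shown in this transcript, so they should be checked against http://mappings.dbpedia.org. The class and property names passed in are illustrative.

```python
def render_template_mapping(ontology_class: str, property_map: dict) -> str:
    """Render mapping wikitext for one infobox template.

    Template and parameter names are assumptions about the wiki's syntax.
    """
    lines = [
        "{{TemplateMapping",
        f"| mapToClass = {ontology_class}",
        "| mappings =",
    ]
    for template_property, ontology_property in property_map.items():
        lines.append(
            "    {{PropertyMapping"
            f" | templateProperty = {template_property}"
            f" | ontologyProperty = {ontology_property} }}}}"
        )
    lines.append("}}")
    return "\n".join(lines)


# Illustrative class and property names.
wikitext = render_template_mapping(
    "City",
    {"population": "populationTotal", "name": "foaf:name"},
)
print(wikitext)
```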

Page 23: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 24: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 25: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 26: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 27: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Classes and Properties

Page 28: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 29: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Page 30: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Test Mappings

Page 31: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Validate Mappings

Page 32: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia 3.9 Mapping Statistics

• 3,177 template mappings
• 529 classes
• 927 object properties
• 1,290 datatype properties
• 116 specialized datatype properties
• 46 owl:equivalentClass and 31 owl:equivalentProperty mappings to http://schema.org

Page 33: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mapping Edits

Page 34: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

DBpedia Mapping Coverage

Page 35: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Google Summer of Code [2013]

• Mapping from DBpedia to Wikidata properties
• Dump from Wikidata facts with mapped properties and datatypes

• http://wiki.dbpedia.org/gsoc2013/ideas/WikidataMappings

Page 36: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Ongoing & Future Work

• Multilingual data integration and fusion
• Community-driven data quality improvement
• Inline extraction
• DBpedia and NLP
  • structured background knowledge for e.g. named entity recognition and disambiguation
• Collaboration between Wikidata and DBpedia

Page 37: DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin

Thanks!

References:
• DBpedia: http://dbpedia.org
• DBpedia Mappings Wiki: http://mappings.dbpedia.org
• LOD Cloud: http://lod-cloud.net
• LOD Data Set Catalogue: http://www.datahub.io/group/lodcloud

Email: [email protected]
Twitter: @anjeve