Posted on 29-Aug-2014
Fusing OpenStreetMap with Wikipedia
Ulmon GmbH
08/05/2014 Linuxwochen Wien
Hello from
Ulmon’s recipe for a travel guide
Fuse data sources to create a whole that is more valuable than its parts
Wikipedia and OSM in CityMaps2Go
What about unmatchable Wikipedia articles?
Wikipedia tag in OpenStreetMap
http://taginfo.openstreetmap.org
Wikipedia tag statistics
Tag name        Number of values
wikipedia            339,148
wikipedia:ru          30,457
wikipedia:en          16,432
wikipedia:de          13,923
wikipedia:es           4,706
Total                404,666

Total Wikipedia entries with location: 1,621,704 in 15 languages (798,965 in English)
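The per-key counts add up to the 404,666 total. A short check in Python, which also compares that total with the number of located Wikipedia entries; the ratio is rough only, since one article can be linked from several OSM objects or under several keys:

```python
# Per-key counts from the taginfo snapshot shown in the talk (2014).
wikipedia_tag_counts = {
    "wikipedia": 339_148,
    "wikipedia:ru": 30_457,
    "wikipedia:en": 16_432,
    "wikipedia:de": 13_923,
    "wikipedia:es": 4_706,
}
located_entries = 1_621_704  # Wikipedia entries with a location, 15 languages

total_tagged = sum(wikipedia_tag_counts.values())
# Rough ratio only: one article can be linked from several objects or keys.
ratio = total_tagged / located_entries

print(f"Total tag values: {total_tagged:,}")
print(f"Tag values per located entry: {ratio:.1%}")
```

Even as an upper bound, existing tags cover only about a quarter of located entries, which is why automatic matching is needed.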
The Confusion of Tongues
Multiple OSM candidates for one Wiki
Multiple fitting Wiki entries
Wiki articles with no OSM object
What data to include?
… for an offline guide
178MB!
Ulmon’s matching algorithm…
Candidates: Stephansdom, Ströck, Stephansplatz, Stephansplatz (U3 station), Stock-im-Eisen-Platz, Café Weinwurm, DO&CO am Stephansplatz, Haas-Haus, Aida…
Distance: 0.9
Name: 1.0
Type: 0.0
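The slide scores each candidate on distance, name and type but does not show how the distance score is computed. A minimal sketch, assuming great-circle (haversine) distance with a linear fall-off to zero at a hypothetical 500 m radius; `distance_score` and the radius are illustrative, not Ulmon's actual function:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_score(wiki_pos, osm_pos, max_radius_m: float = 500.0) -> float:
    """1.0 at the same point, falling linearly to 0.0 at max_radius_m and beyond.
    The linear fall-off and the 500 m default are assumptions for illustration."""
    d = haversine_m(wiki_pos[0], wiki_pos[1], osm_pos[0], osm_pos[1])
    return max(0.0, 1.0 - d / max_radius_m)


# Usage with approximate, illustrative coordinates near Stephansplatz, Vienna.
article_pos = (48.2085, 16.3731)
candidate_pos = (48.2084, 16.3726)
# One feature vector per (article, candidate) pair; name and type values copied from the slide.
features = {"distance": distance_score(article_pos, candidate_pos), "name": 1.0, "type": 0.0}
print(features)
```

Keeping the three scores as separate features, rather than summing them, leaves the weighting to the classifier that makes the final match decision.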
Comparing Names
• Edit distance (Levenshtein distance)
• Soundex
• Dice coefficient
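The slide names three measures without saying how they are combined. The sketch below implements each in Python: Levenshtein via the standard dynamic-programming recurrence, American Soundex, and the Dice coefficient over character bigrams. `name_features` is a hypothetical helper that returns all three as separate features:

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution, free if equal
        prev = cur
    return prev[-1]


_SOUNDEX = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
            **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"), "R": "6"}


def soundex(name: str) -> str:
    """American Soundex code. Letters outside A-Z (e.g. umlauts) are simply dropped here."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out, last = [letters[0]], _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX.get(c, "")  # vowels and Y have no code and reset `last`
        if code and code != last:
            out.append(code)
        last = code
    return ("".join(out) + "000")[:4]


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (case-insensitive), in [0, 1]."""
    def bigrams(s: str) -> Counter:
        s = s.lower()
        return Counter(s[i:i + 2] for i in range(len(s) - 1))
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * sum((ba & bb).values()) / total


def name_features(wiki_title: str, osm_name: str) -> dict:
    """Hypothetical combination: one feature per measure, for a classifier to weigh."""
    longest = max(len(wiki_title), len(osm_name), 1)
    return {
        "levenshtein_sim": 1 - levenshtein(wiki_title.lower(), osm_name.lower()) / longest,
        "soundex_equal": float(soundex(wiki_title) == soundex(osm_name)),
        "dice": dice(wiki_title, osm_name),
    }
```

The measures fail in different ways: Levenshtein penalises every typo equally, Soundex tolerates spelling variants but was designed for English names, and Dice is robust to reordered words.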
Type score
• Compare OSM tags with DBpedia types
  – Manual rules
  – Word similarity
  – Future: synonym analysis based on WordNet
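A minimal sketch of a type score along these lines. The rule table is hypothetical (a few real OSM tags mapped to DBpedia ontology class names), and the word-similarity fallback is swapped for plain word overlap (Jaccard), a crude stand-in rather than the method used in the talk:

```python
import re

# Hypothetical manual rules: OSM (key, value) -> DBpedia classes considered compatible.
MANUAL_RULES = {
    ("tourism", "museum"): {"Museum"},
    ("amenity", "restaurant"): {"Restaurant"},
    ("amenity", "cafe"): {"Restaurant"},
    ("railway", "station"): {"Station", "RailwayStation"},
    ("amenity", "place_of_worship"): {"ReligiousBuilding", "Church"},
}


def _words(text: str) -> set:
    """Split CamelCase and snake_case identifiers into lowercase words."""
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+", text.replace("_", " "))}


def type_score(osm_tags: dict, dbpedia_types: set) -> float:
    """1.0 if a manual rule links the tags to one of the types; otherwise the
    Jaccard overlap between tag-value words and type-name words."""
    for key, value in osm_tags.items():
        if MANUAL_RULES.get((key, value), set()) & dbpedia_types:
            return 1.0
    # Name tags describe the object, not its type, so they are ignored here.
    tag_words = set().union(*(_words(v) for k, v in osm_tags.items() if not k.startswith("name")))
    type_words = set().union(*(_words(t) for t in dbpedia_types))
    if not tag_words or not type_words:
        return 0.0
    return len(tag_words & type_words) / len(tag_words | type_words)


print(type_score({"railway": "station", "name": "Stephansplatz"}, {"Station"}))
print(type_score({"historic": "castle"}, {"Castle"}))
```

Manual rules are precise but only cover tags someone has thought of; the fallback catches cases like `historic=castle` vs. `Castle` where the words happen to coincide.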
Decision tree
• Generated using the J48 algorithm of the Weka toolkit
• How to get training data?
  – Manual creation
  – Parsing wikipedia tags from OSM
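OSM objects that already carry a `wikipedia` tag (conventionally `wikipedia=lang:Title`, plus language-specific keys such as `wikipedia:ru` from the taginfo table) give ready-made positive examples. A sketch of parsing them; the `{"id": ..., "tags": {...}}` object format is hypothetical, URL-style values are skipped, and how negative examples are chosen (for instance, other nearby candidates for the same article) is an assumption not covered by the slide:

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Language prefixes such as "en", "de", "zh-yue"; a simplification of real wiki codes.
_LANG = r"[a-z]{2,3}(?:-[a-z]+)*"
_PREFIXED = re.compile(rf"^({_LANG}):(.+)$")
_LANG_ONLY = re.compile(rf"^{_LANG}$")


def parse_wikipedia_tag(key: str, value: str) -> Optional[Tuple[str, str]]:
    """Return (language, title) for an OSM Wikipedia tag, or None if it cannot be parsed.
    Handles `wikipedia=lang:Title` and `wikipedia:lang=Title`."""
    value = value.strip()
    if key == "wikipedia":
        m = _PREFIXED.match(value)
        if not m:
            return None  # no language prefix, or a URL
        lang, title = m.group(1), m.group(2)
    elif key.startswith("wikipedia:"):
        lang, title = key.split(":", 1)[1], value
        if not _LANG_ONLY.match(lang):
            return None
    else:
        return None
    title = title.replace("_", " ").strip()  # MediaWiki treats "_" and " " alike
    return (lang, title) if title else None


def training_pairs(osm_objects: Iterable[dict]) -> Iterator[Tuple[str, str, str]]:
    """Yield (osm_id, language, title) positive examples from already-tagged objects."""
    for obj in osm_objects:
        for key, value in obj["tags"].items():
            parsed = parse_wikipedia_tag(key, value)
            if parsed:
                yield (obj["id"], *parsed)


sample = [{"id": "way/1", "tags": {"name": "Stephansdom", "wikipedia": "de:Stephansdom"}}]
print(list(training_pairs(sample)))
```

Each parsed pair can then be scored with the same distance, name and type features as unmatched candidates, so the decision tree learns from exactly the inputs it will later see.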
Ulmon’s matching performance
• Current
  – Total Wikipedia entries: 810K (674K English)
  – Matched entries: 429K
• Future
  – Total Wikipedia entries: 1.6M
  – Matched entries (extrapolated): 850K
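The future figure appears to be a straight extrapolation of the current match rate. Under that assumption the arithmetic is 429K / 810K ≈ 53%, and 53% of 1.6M ≈ 847K, in line with the slide's 850K:

```python
# Figures from the slide (rounded to thousands).
current_total = 810_000
current_matched = 429_000
future_total = 1_600_000

# Assumption: the future estimate applies today's match rate unchanged.
match_rate = current_matched / current_total
projected_matches = match_rate * future_total

print(f"Current match rate: {match_rate:.1%}")
print(f"Projected matches: {projected_matches:,.0f}")
```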
Multiple OSM candidates for one Wiki
Multiple fitting Wiki entries
Open questions
• Reduce false positives
  – Current: 10%, desired: < 3%
• Get more matches
• Reduce the amount of data
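The slide does not say how the 10% figure was measured. One common approach, assumed here, is to have people check a random sample of accepted matches and report the share that are wrong together with a Wilson score interval, which shows how far the sample size limits the estimate. `false_positive_share` is a hypothetical helper:

```python
import math


def false_positive_share(wrong: int, checked: int, z: float = 1.96):
    """Point estimate and Wilson score interval (about 95% for z = 1.96) for the
    share of accepted matches that a manual check found to be wrong."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    p = wrong / checked
    z2 = z * z
    denom = 1 + z2 / checked
    centre = (p + z2 / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z2 / (4 * checked * checked)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


p, low, high = false_positive_share(wrong=20, checked=200)
print(f"{p:.1%} (95% interval {low:.1%} to {high:.1%})")
```

For example, 20 wrong matches in a sample of 200 gives 10% with an interval of roughly 6.6% to 14.9%; demonstrating a rate below 3% would need a sample large enough for the upper bound itself to drop below 0.03.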
Thank you for your attention!
Come visit us at www.ulmon.com