DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
DESCRIPTION
Bernhard Haslhofer and Niko Popitsch, University of Vienna. Web Semantic Workshop, DEXA 2009, Linz, 2 September 2009.
TRANSCRIPT
![Page 1: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/1.jpg)
Bernhard Haslhofer, Niko Popitsch
DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
WebS ’09 @ DEXA 2009
Linz, 02/09/2009
Bernhard Haslhofer and Niko Popitsch
![Page 2: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/2.jpg)
Summary
![Page 3: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/3.jpg)
![Page 4: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/4.jpg)
![Page 5: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/5.jpg)
![Page 6: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/6.jpg)
```xml
<mo:MusicGroup rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist">
  <foaf:name>Green Day</foaf:name>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" />
  <mo:image rdf:resource="/music/images/artists/7col_in/084308bd-1654-436f-ba03-df6697104e19.jpg" />
  <foaf:page rdf:resource="/music/artists/084308bd-1654-436f-ba03-df6697104e19.html" />
  <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/084308bd-1654-436f-ba03-df6697104e19.html" />
  <mo:homepage rdf:resource="http://www.greenday.com/" />
  <mo:fanpage rdf:resource="http://www.greendayvideos.com/" />
  <mo:fanpage rdf:resource="http://www.greenday.net" />
  <mo:imdb rdf:resource="http://www.imdb.com/name/nm1554564/" />
  <mo:myspace rdf:resource="http://www.myspace.com/greenday" />
  ...
</mo:MusicGroup>
```
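Every `rdf:resource` in a description like this one is an outgoing link. The `owl:sameAs` link to DBpedia is exactly the kind of link that breaks when its target moves. The following is a minimal sketch, using Python's standard `xml.etree.ElementTree`, of how an application might list these links. It makes several assumptions. It wraps a trimmed copy of the fragment in `rdf:RDF` and declares the standard namespace URIs for `rdf`, `owl`, `foaf`, and the Music Ontology (`mo`). It also handles only property elements that carry `rdf:resource`, not the full RDF/XML syntax. The function name `outgoing_links` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Standard RDF namespace; the other prefixes are declared in DOC below.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = "{" + RDF + "}resource"

# Trimmed copy of the description above, wrapped in rdf:RDF (illustrative only).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:mo="http://purl.org/ontology/mo/">
  <mo:MusicGroup rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist">
    <foaf:name>Green Day</foaf:name>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" />
    <mo:homepage rdf:resource="http://www.greenday.com/" />
    <mo:myspace rdf:resource="http://www.myspace.com/greenday" />
  </mo:MusicGroup>
</rdf:RDF>"""


def outgoing_links(rdf_xml: str) -> List[Tuple[str, str, str]]:
    """Return (subject, property, target) for each property element with rdf:resource."""
    root = ET.fromstring(rdf_xml)
    links = []
    for description in root:                      # top-level resource descriptions
        subject = description.get("{" + RDF + "}about", "")
        for prop in description:                  # property elements of that resource
            target = prop.get(RDF_RESOURCE)
            if target is not None:                # literal values such as foaf:name are skipped
                links.append((subject, prop.tag, target))
    return links


for subject, prop, target in outgoing_links(DOC):
    print(prop, "->", target)
```

Property names come back in ElementTree's `{namespace}localname` form, for example `{http://www.w3.org/2002/07/owl#}sameAs`.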
![Page 7: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/7.jpg)
```xml
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
  <dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Green Day
is an American rock trio formed in 1987. The band has consisted of Billie Joe Armstrong
(vocals, guitar), Mike Dirnt, and Tré Cool for the majority of its existence...
  </dbpprop:abstract>
</rdf:Description>
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
  <!-- English translation of the German abstract: "Green Day is an American punk rock band
       with which the punk revival began in the early 1990s. The band was ... in 1987 by
       Billie Joe Armstrong and Mike Dirnt, together with drummer John Kiffmeyer alias
       Al Sobrante, as The Sweet Children..." -->
  <dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="de">Green Day
[gɹiːn deɪ] ist eine US-amerikanische Punk-Rock-Band, mit der Anfang der 1990er das Punk-Revival
begann. Die Band wurde 1987 von Billie Joe Armstrong und Mike Dirnt zusammen
mit dem Schlagzeuger John Kiffmeyer alias Al Sobrante als The Sweet Children...
  </dbpprop:abstract>
</rdf:Description>
...
```
![Page 8: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/8.jpg)
...but...
![Page 9: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/9.jpg)
Some numbers...
• Events between DBpedia 3.2 (10/2008) and 3.3 (05/2009)
• # resources created: 29449
• # resources removed: 4789
• # resources moved: 729
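Counts like these come from comparing two releases of a data set. Purely as an illustration, and not necessarily how the figures above were computed, a naive diff could be sketched as follows. The sketch assumes that each snapshot maps a resource URI to a set of descriptive features. It treats a URI that vanished plus a URI that appeared with an identical feature set as a single moved resource. The snapshot format and the name `diff_snapshots` are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical snapshot format: resource URI -> set of descriptive features.
Snapshot = Dict[str, FrozenSet[str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[Tuple[str, str]]]:
    """Classify changes between two releases as created, removed, or moved.

    Naive assumption: a vanished URI and an appeared URI with identical
    feature sets are the same resource under a new name."""
    vanished = set(old) - set(new)
    appeared = set(new) - set(old)
    moved: Set[Tuple[str, str]] = set()
    for old_uri in sorted(vanished):
        for new_uri in sorted(appeared):         # iterate over a sorted copy
            if old[old_uri] == new[new_uri]:
                moved.add((old_uri, new_uri))
                appeared.discard(new_uri)        # each new URI explains at most one move
                break
    removed = vanished - {o for o, _ in moved}
    created = appeared
    return created, removed, moved


old = {"A": frozenset({"x"}), "B": frozenset({"y"}), "C": frozenset({"z"})}
new = {"A": frozenset({"x"}), "B2": frozenset({"y"}), "D": frozenset({"w"})}
created, removed, moved = diff_snapshots(old, new)
print(len(created), len(removed), len(moved))  # 1 1 1
```

Exact feature equality is too strict for real data, where a renamed resource usually changes some of its description as well. That is why a similarity-based approach is needed.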
![Page 10: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/10.jpg)
![Page 11: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/11.jpg)
Link Integrity...
• is a qualitative property that holds when all links within and between a set of data sources are valid and deliver the result intended by the link creator
• cf. referential integrity in RDBMS
• demands a solution that
  • detects broken links between resources
  • provides support for fixing broken links
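The RDBMS analogy can be made concrete with a foreign-key-style check: report every link whose target is not among the resources that exist. The following is a minimal sketch. `existing` stands in for "URIs that currently resolve in the target source". A real checker would have to dereference them over HTTP, which is an assumption left out here. The subject labels such as `bbc:green_day` and the URI `Some_Renamed_Band` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def dangling_links(triples: Iterable[Triple], existing: Set[str], link_predicates: Set[str]) -> List[Triple]:
    """Foreign-key-style check: link triples whose target is not an existing resource."""
    return [t for t in triples if t[1] in link_predicates and t[2] not in existing]


triples = [
    ("bbc:green_day", "owl:sameAs", "http://dbpedia.org/resource/Green_Day"),
    ("bbc:other_band", "owl:sameAs", "http://dbpedia.org/resource/Some_Renamed_Band"),
    ("bbc:green_day", "foaf:name", "Green Day"),   # literal, not a link predicate
]
existing = {"http://dbpedia.org/resource/Green_Day"}
print(dangling_links(triples, existing, {"owl:sameAs"}))
```

This check covers only the "detects" half of the requirement, and only for missing targets. It cannot tell whether a target that still resolves delivers the result the link creator intended.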
![Page 12: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/12.jpg)
Types of broken links
• Removed link targets
  • e.g., resource deleted, server no longer available, etc.
• Moved link targets
  • available at another Web location
  • e.g., reorganization of Web resources
• Modified link targets
  • e.g., the target still exists, but its content has changed so that the link no longer delivers the result intended by its creator
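One way to read the three types is as outcomes of revisiting a link target. The following is a minimal sketch built on hypothetical inputs: an HTTP status code, an optional redirect target, and content digests from two visits. It is a naive heuristic, not DSNotify's method. A moved resource need not leave a redirect behind, which is the gap the feature-vector approach addresses. An unreachable server, which returns no status at all, is also left out.

```python
from enum import Enum
from typing import Optional


class LinkStatus(Enum):
    VALID = "valid"
    REMOVED = "removed"
    MOVED = "moved"
    MODIFIED = "modified"


def classify_link(status: int, location: Optional[str] = None,
                  old_digest: Optional[str] = None, new_digest: Optional[str] = None) -> LinkStatus:
    """Map one observation of a link target to a broken-link type (hypothetical heuristic)."""
    if status in (404, 410):
        return LinkStatus.REMOVED                  # target deleted
    if status in (301, 308) and location:
        return LinkStatus.MOVED                    # permanent redirect to a new location
    if status == 200 and old_digest is not None and new_digest != old_digest:
        return LinkStatus.MODIFIED                 # same location, different content
    # 303 See Other is normal Linked Data behaviour, so it is not treated as a move.
    return LinkStatus.VALID


print(classify_link(410), classify_link(301, "http://example.org/new"))
```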
![Page 13: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/13.jpg)
The DSNotify Approach
• periodically monitor items (resources) in a specific Linked Data source
• extract a descriptive feature vector for each item
• store each item and its feature vector in an index
• use the feature vectors to detect whether items have been removed or moved to another location
• if moved, add a relationship between the “old” and “new” item
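The slide does not say which features DSNotify extracts or how it compares them. The following is a minimal sketch under two assumptions. First, an item's feature vector is the set of (predicate, object) pairs describing it. Second, similarity is the Jaccard overlap of two such sets. Both choices, the helper names, and the small data set are hypothetical. The two URIs are taken from the Index Interaction slide.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Features = FrozenSet[Tuple[str, str]]


def feature_vector(triples: Iterable[Triple], subject: str) -> Features:
    """Hypothetical feature choice: all (predicate, object) pairs describing `subject`."""
    return frozenset((p, o) for s, p, o in triples if s == subject)


def jaccard(a: Features, b: Features) -> float:
    """Set-overlap similarity in [0, 1]; one possible heuristic, not necessarily DSNotify's."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(missing: Features, candidates: Dict[str, Features]) -> Optional[Tuple[str, float]]:
    """Return the candidate URI most similar to a missing item, with its score."""
    scored = [(uri, jaccard(missing, f)) for uri, f in candidates.items()]
    return max(scored, key=lambda x: x[1], default=None)


OLD = "http://dbpedia.org/resource/band/Green_Day"
NEW = "http://dbpedia.org/resource/band/Alternative/Green_Day"
t1 = [(OLD, "foaf:name", "Green Day"), (OLD, "rdfs:label", "Green Day"),
      (OLD, "foaf:homepage", "http://www.greenday.com/")]
t2 = [(NEW, p, o) for _, p, o in t1] + [(NEW, "rdfs:comment", "American rock band")]
print(best_match(feature_vector(t1, OLD), {NEW: feature_vector(t2, NEW)}))
```

In this example the renamed item keeps three of its four features, so it scores 0.75 against the missing one.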
![Page 14: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/14.jpg)
Architecture
[Figure: DSNotify architecture. Labeled components: Monitor (feature extraction), Event Log, Indices (II, RII, AII), Move Detector (heuristic), Decider (decision making, involving the user), and LOD source updater. DSNotify monitors a LOD source that other LOD sources link to via owl:sameAs, and exchanges notifications and queries with LOD “consuming” applications.]
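The architecture includes a "LOD source updater" that acts on DSNotify's decisions. The following is a minimal sketch of what such an updater might do with a move or remove event in a list of triples. The `Event` class and its fields are a hypothetical notification format, not DSNotify's actual one. The subject label `bbc:green_day` is also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Event:
    """Hypothetical notification format: kind is "move" or "remove"."""
    kind: str
    old_uri: str
    new_uri: Optional[str] = None


def apply_event(triples: List[Triple], event: Event) -> List[Triple]:
    """Repair links whose target is affected by a DSNotify-style event."""
    fixed: List[Triple] = []
    for s, p, o in triples:
        if o != event.old_uri:
            fixed.append((s, p, o))              # link not affected
        elif event.kind == "move" and event.new_uri:
            fixed.append((s, p, event.new_uri))  # re-point the link to the new location
        # otherwise (e.g. a remove event) the broken link is dropped
    return fixed


OLD = "http://dbpedia.org/resource/band/Green_Day"
NEW = "http://dbpedia.org/resource/band/Alternative/Green_Day"
links = [("bbc:green_day", "owl:sameAs", OLD), ("bbc:green_day", "foaf:name", "Green Day")]
print(apply_event(links, Event("move", OLD, NEW)))
```

Dropping links on a remove event is only one policy. An updater could instead flag such links for review.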
![Page 15: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/15.jpg)
Index Interaction
[Figure: Placement of example items in the Item Index (II), the Archived Item Index (AII), and the Removed Item Index (RII) over time (t1 to t4). Example item URIs: http://dbpedia.org/resource/Green_Day (band), http://dbpedia.org/resource/band/Green_Day, http://dbpedia.org/resource/band/Alternative/Green_Day.]
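The figure shows only where items sit at each point in time, so the following is a hedged reading of it. The sketch assumes three roles. II holds items currently present in the source. RII holds items that disappeared and still await a decision. AII keeps archived items with a pointer to their new location, or `None` if they were removed. For brevity, exact feature equality stands in for DSNotify's similarity heuristic. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

Features = FrozenSet[str]


@dataclass
class Indices:
    ii: Dict[str, Features] = field(default_factory=dict)        # Item Index: items currently in the source
    rii: Dict[str, Features] = field(default_factory=dict)       # Removed Item Index: missing, fate undecided
    aii: Dict[str, Optional[str]] = field(default_factory=dict)  # Archived Item Index: old URI -> new URI, None = removed

    def update(self, snapshot: Dict[str, Features]) -> None:
        """Process one monitoring run over the source."""
        for uri in [u for u in self.ii if u not in snapshot]:
            self.rii[uri] = self.ii.pop(uri)               # disappeared: II -> RII
        for uri, features in snapshot.items():
            if uri not in self.ii:
                # New item: an exact feature match with an RII entry counts as a move.
                old = next((o for o, f in self.rii.items() if f == features), None)
                if old is not None:
                    del self.rii[old]
                    self.aii[old] = uri                    # archive the old item with a pointer
            self.ii[uri] = features

    def expire(self, uri: str) -> None:
        """Give up waiting for an RII entry and archive it as removed."""
        del self.rii[uri]
        self.aii[uri] = None

    def resolve(self, uri: str) -> Optional[str]:
        """Follow archived moves to the item's current URI; None if removed or pending."""
        seen = set()
        while uri in self.aii and uri not in seen:
            seen.add(uri)
            nxt = self.aii[uri]
            if nxt is None:
                return None
            uri = nxt
        return uri if uri in self.ii else None


A = "http://dbpedia.org/resource/Green_Day (band)"          # labels taken from the figure
B = "http://dbpedia.org/resource/band/Green_Day"
C = "http://dbpedia.org/resource/band/Alternative/Green_Day"
f = frozenset({"foaf:name=Green Day"})
idx = Indices()
for snapshot in ({A: f}, {B: f}, {C: f}):
    idx.update(snapshot)
print(idx.resolve(A))  # http://dbpedia.org/resource/band/Alternative/Green_Day
```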
![Page 16: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/16.jpg)
Move Detection
• is a semi-automatic process
• calculate similarity between items based on their feature vectors using domain-specific heuristics
• probability > given threshold: automatic decision
• probability < given threshold: ask expert user
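The threshold rule above can be written down directly. The following is a minimal sketch for a single missing item. It assumes the similarity scores of its candidate successors have already been computed. A score exactly equal to the threshold, which the slide leaves open, is sent to the expert here. The function name and return format are hypothetical.

```python
from typing import Dict, List, Tuple


def decide_move(candidates: Dict[str, float], threshold: float) -> Tuple[str, List[str]]:
    """Decide what to do for one missing item.

    `candidates` maps each newly appeared URI to its similarity with the
    missing item. Returns ("auto", [best]) when the best score exceeds the
    threshold, otherwise ("ask_user", all candidates ranked best-first)."""
    ranked = sorted(candidates, key=lambda uri: candidates[uri], reverse=True)
    if ranked and candidates[ranked[0]] > threshold:
        return "auto", ranked[:1]
    # Scores at or below the threshold (or no candidates at all) go to an expert.
    return "ask_user", ranked


print(decide_move({"a": 0.9, "b": 0.4}, threshold=0.8))  # ('auto', ['a'])
```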
![Page 17: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/17.jpg)
DSNotify HTTP Interface
• GET http://<server>:<port>/<dsnotify>/item/<uri>
• find out what happened with an item
• GET http://<server>:<port>/<dsnotify>/eventChoice
• retrieve pending event choices (move / remove)
• ...
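The slide gives only the URL patterns. The following is a minimal client-side sketch that builds those URLs with Python's standard `urllib.parse.quote`. Several details are assumptions. The values `localhost`, `8080`, and `dsnotify` are placeholders for `<server>`, `<port>`, and `<dsnotify>`. Whether the item URI must be percent-encoded as a single path segment is also an assumption. The response format is unspecified, so the sketch stops at building URLs, which a client could then fetch with `urllib.request.urlopen`.

```python
from urllib.parse import quote


def item_url(server: str, port: int, base: str, item_uri: str) -> str:
    """URL for 'find out what happened with an item'.

    Assumption: the item URI is percent-encoded as a single path segment."""
    return f"http://{server}:{port}/{base}/item/{quote(item_uri, safe='')}"


def event_choice_url(server: str, port: int, base: str) -> str:
    """URL for retrieving pending move/remove event choices."""
    return f"http://{server}:{port}/{base}/eventChoice"


print(item_url("localhost", 8080, "dsnotify", "http://dbpedia.org/resource/Green_Day"))
```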
![Page 18: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/18.jpg)
Evaluation Plan
[Figure: Timeline of DBpedia releases t-n ... t-2, t-1, t0 (DBpedia 2.0 ... 3.0, 3.1, 3.2). Each pair of consecutive releases is diffed, and the changes are manually classified as moved (mv) or removed (rm).]
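The diff step in this plan can be sketched over a sequence of releases. The sketch assumes that each release is reduced to its set of resource URIs, ordered from oldest to newest. In this reading, the vanished URIs are the candidates a human annotator then labels as mv or rm. The function name and the toy URI sets are hypothetical.

```python
from typing import List, Set, Tuple


def consecutive_diffs(releases: List[Tuple[str, Set[str]]]) -> List[Tuple[str, str, Set[str], Set[str]]]:
    """For each pair of consecutive releases (oldest first), return
    (old_name, new_name, vanished URIs, appeared URIs)."""
    out = []
    for (old_name, old_uris), (new_name, new_uris) in zip(releases, releases[1:]):
        out.append((old_name, new_name, old_uris - new_uris, new_uris - old_uris))
    return out


releases = [("DBpedia 3.0", {"a", "b", "c"}),
            ("DBpedia 3.1", {"a", "b2", "c"}),
            ("DBpedia 3.2", {"a", "b2", "d"})]
for old_name, new_name, vanished, appeared in consecutive_diffs(releases):
    print(old_name, "->", new_name, sorted(vanished), sorted(appeared))
```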
![Page 19: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/19.jpg)
Status / Future Work
• 1st prototype (infrastructure) ready
• annotated test data set based on DBpedia available
• Currently working on:
  • a system for simulating past modifications in DBpedia
  • the DSNotify evaluation
![Page 20: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/20.jpg)
Fixing Your Web since 2009
![Page 21: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/21.jpg)
Backup
![Page 22: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/22.jpg)
Evaluation Plan
• Monitor simulated DBpedia evolution (t-n to t0)
• Precision / recall of automatic move detection
  • with different similarity thresholds
  • with different heuristics and feature vectors
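Precision and recall at a given similarity threshold can be computed against the manually classified moves. The following is a minimal sketch. It assumes that scores are keyed by (old URI, new URI) pairs and that a pair is predicted as a move when its score exceeds the threshold. It also assumes one convention for empty sets: precision is 1.0 when nothing is predicted. The toy scores and gold set are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (old URI, new URI)


def precision_recall(scores: Dict[Pair, float], gold: Set[Pair], threshold: float) -> Tuple[float, float]:
    """Evaluate automatic move decisions at one threshold against manually classified moves."""
    predicted = {pair for pair, score in scores.items() if score > threshold}
    true_positives = len(predicted & gold)
    # Convention (assumption): precision is 1.0 when nothing is predicted, recall 1.0 when gold is empty.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


scores = {("a", "a2"): 0.9, ("b", "b2"): 0.7, ("c", "x"): 0.85}
gold = {("a", "a2"), ("b", "b2")}
for t in (0.6, 0.8, 0.95):
    print(t, precision_recall(scores, gold, t))
```

Sweeping the threshold like this exposes the trade-off the evaluation aims to measure. In the toy data, a lower threshold raises recall at the cost of precision.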
![Page 23: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets](https://reader033.vdocuments.us/reader033/viewer/2022052823/555069c1b4c90524138b4663/html5/thumbnails/23.jpg)
Linked Data / Web of Data
• Data management paradigm on the basis of Web technologies
• HTTP, URI, and RDF/S are the key technologies
• Applications (not Web browsers) are data consumers
• Links between resources play a major role
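As a data consumer, an application dereferences a resource URI over HTTP and asks for RDF rather than HTML. The following is a minimal sketch using Python's standard `urllib.request`. It builds the request but does not send it. The Accept value `application/rdf+xml` is one common choice, not something the slide prescribes.

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET asking a Linked Data server for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


req = rdf_request("http://dbpedia.org/resource/Green_Day")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# urllib.request.urlopen(req) would send it; Linked Data servers typically answer
# a request for a resource URI with a 303 redirect to its RDF description.
```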