Using Hyperlinks to Enrich Message Board Content with Linked Data
DESCRIPTION
Presentation from I-SEMANTICS 2010, Graz, Austria. Based on the paper "Using Hyperlinks to Enrich Message Board Content with Linked Data" by Sheila Kinsella, Alexandre Passant, and John G. Breslin.
TRANSCRIPT
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Using Hyperlinks to Enrich Message Board Content with Linked Data
Sheila Kinsella, Alexandre Passant, John G. Breslin
Introduction
Hyperlinks are an important part of online conversation and often represent identifiable concepts
More and more often, these hyperlinks have corresponding structured data sources
Our aims in this study:
1) Study the growth of structured, user-generated data and links in social media over 10 years
2) Investigate how we can use this data for enhanced analysis of online conversation
Example post
imdb:tt0211915 foaf:topic dbpedia:Amélie .
dbpedia:Amélie dc:title "Amélie" .
dbpedia:Amélie dc:date "2001" .
dbpedia:Amélie dbpprop:starring dbpedia:Audrey_Tautou .
dbpedia:Amélie dbpprop:director dbpedia:Jean-Pierre_Jeunet .
http://www.imdb.com/title/tt0211915/ = Identifier we can use to query LinkedMDB/DBpedia/Freebase…
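A minimal sketch of how the identifier might be pulled out of a posted hyperlink and turned into the foaf:topic statement from the example post. The names `imdb_id` and `topic_triple` and the URL pattern are illustrative assumptions, not the authors' implementation, and the DBpedia resource is supplied by hand rather than looked up.

```python
import re
from typing import Optional

# Assumed URL pattern: IMDb title pages look like .../title/tt<digits>/
IMDB_TITLE = re.compile(r"imdb\.com/title/(tt\d+)")

def imdb_id(url: str) -> Optional[str]:
    """Return the IMDb title identifier found in a URL, or None if absent."""
    match = IMDB_TITLE.search(url)
    return match.group(1) if match else None

def topic_triple(url: str, dbpedia_resource: str) -> Optional[str]:
    """Build a foaf:topic statement in the prefixed form used on the slide."""
    ident = imdb_id(url)
    if ident is None:
        return None
    return f"imdb:{ident} foaf:topic dbpedia:{dbpedia_resource} ."

print(topic_triple("http://www.imdb.com/title/tt0211915/", "Amélie"))
```

In practice the resource name would come from querying a source such as LinkedMDB or DBpedia with the extracted identifier; that lookup is left out here.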
Dataset enrichment
Boards.ie SIOC Data Competition
2008 competition to do something interesting with message board data
February 1998 – February 2008
SIOC, FOAF, DC
~130k users
> 7m posts
Change in type of websites linked to
2002/2003
Domain            Main Content Type
bbc.co.uk         news media
komplett.ie       shop
ireland.com       news media
eircom.net        Web hosting
yahoo.com         news/discussion
rte.ie            news media
google.com        Web search
geocities.com     Web hosting
iol.ie            Web hosting
microsoft.com     technical support

2007/2008
Domain            Main Content Type
youtube.com       UGC: video-sharing
wikipedia.org     UGC: encyclopedia
komplett.ie       shop
myspace.com       UGC: SNS/music
flickr.com        UGC: photo-sharing
bbc.co.uk         news media
rte.ie            news media
carzone.ie        shop
photobucket.com   UGC: media hosting
ebay.ie           shop
Identification of external data sources
1. youtube.com
2. wikipedia.org (dbpedia)
3. komplett.ie
4. bbc.co.uk
5. myspace.com (dbtunes)
6. rte.ie
7. carzone.ie
8. google.com
9. photobucket.com
10. flickr.com
11. microsoft.com
12. eircom.net
13. ebay.ie
14. imageshack.us
15. imdb.com (linkedmdb)
16. ebay.co.uk
17. yahoo.com
18. amazon.co.uk
19. google.ie
20. blogspot.com
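A ranking of linked-to sites like the one on this slide could be produced roughly as follows. This is a sketch under stated assumptions: `site_of` and `top_sites` are hypothetical helper names, the sample links are made up, and stripping only a leading "www." is a simplification (a host such as en.wikipedia.org would need extra handling to be grouped under wikipedia.org).

```python
from collections import Counter
from urllib.parse import urlparse

def site_of(url: str) -> str:
    """Return the host of a URL without a leading 'www.' (a simplification)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def top_sites(urls, n=20):
    """Rank sites by how many posted links point to them."""
    return Counter(site_of(u) for u in urls).most_common(n)

# Made-up sample links, for illustration only
sample = [
    "http://www.youtube.com/watch?v=abc",
    "http://youtube.com/watch?v=def",
    "http://www.imdb.com/title/tt0211915/",
]
print(top_sites(sample))
```

Once the most frequent sites are known, each can be checked by hand for a corresponding structured data source.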
Data sources
Structured Data
RDF Data
  Rich descriptions of resources using common ontologies
  Linked Data, RDFa, SPARQL endpoint, RDF dumps
  E.g. <X> <foaf:topic> <dbpedia:Education> .
  We store this data and merge equivalent URIs if required
API Data (fixed values)
  Less rich, heterogeneous, lacking common semantics
  Often available as JSON/XML, easily converted to RDF
  E.g. <category term='Education'/>
  We manually mapped these to URIs
API Data (tags)
  Plain text annotations, meaning can be ambiguous
  E.g. “education”
  We performed a naïve mapping of tags to URIs
[Side label: EXPRESSIVENESS]
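The two mapping steps for API data can be sketched as below. The slide does not say how the naïve tag mapping worked, so the rule in `map_tag` (capitalise the first letter, join words with underscores, prepend the DBpedia resource namespace) is an assumption chosen for illustration. The lookup table and function names are likewise hypothetical.

```python
from typing import Optional

DBPEDIA = "http://dbpedia.org/resource/"

# Hand-written table for fixed API values (the single entry is illustrative)
CATEGORY_TO_URI = {"Education": DBPEDIA + "Education"}

def map_category(term: str) -> Optional[str]:
    """Manual mapping: only values listed in the table are mapped."""
    return CATEGORY_TO_URI.get(term)

def map_tag(tag: str) -> str:
    """Naive mapping (assumed rule): capitalise first letter, underscores for spaces."""
    label = "_".join(tag.strip().split())
    return DBPEDIA + label[:1].upper() + label[1:]

print(map_category("Education"))
print(map_tag("education"))
```

A naive rule like this makes no attempt to disambiguate, which is why tags sit at the least expressive end of the three kinds of data.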
Analysis of external links
For 2007/2008, we could access structured data for over 9% of all posted links
[Chart: x-axis in yearly periods from 98/99 to 07/08]
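A coverage figure of this kind is the share of posted links whose target site has an accessible structured data source. The sketch below shows the arithmetic only; the site list, the sample links, and the names `has_structured_data` and `coverage_percent` are assumptions for illustration, not the study's data or code.

```python
from urllib.parse import urlparse

# Assumed list of sites with an accessible structured data source (illustrative)
STRUCTURED_SITES = {"youtube.com", "wikipedia.org", "imdb.com"}

def has_structured_data(url: str) -> bool:
    """True if the link's host is a listed site or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return any(host == s or host.endswith("." + s) for s in STRUCTURED_SITES)

def coverage_percent(urls) -> float:
    """Percentage of links that point to a site with structured data."""
    urls = list(urls)
    if not urls:
        return 0.0
    return 100.0 * sum(has_structured_data(u) for u in urls) / len(urls)

# Made-up sample links, for illustration only
links = [
    "http://en.wikipedia.org/wiki/Education",
    "http://www.bbc.co.uk/news",
    "http://www.rte.ie/",
    "http://www.komplett.ie/",
]
print(coverage_percent(links))
```

Computing this per yearly period would give one point per period on a chart like the one on this slide.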
The enriched dataset
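[Diagram: SIOC data connected through Linked Data / Web APIs (30k links, 21k unique) to external RDF sources and to DBpedia concepts. Counts shown in the diagram, labels not recoverable: 3,000; 500; 1,500; 8,000; 6,000; 2,000; 6,000; 23,000; 24,000]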
Analysis example: Post content
[Bar chart. y-axis: % of posts containing name/title (0 to 70). Groups: Book Title, Book Author, Movie Title, Movie Director. Series: Post Text, Anchor Text, Post Title]
Analysis example: Content sharing
[Chart. y-axis: % of content (0 to 80). x-axis (age): before release; 0 - 1 days; 1 day - 1 week; 1 week - 1 month; 1 month - 1 year; 1 - 10 years; 10 - 100 years. Series: Flickr (photos), YouTube (videos), Amazon (books), IMDB (movies)]
Analysis example: User profiling
Conclusions
Many links posted on social media sites correspond to a structured data source: in 2007/2008, already more than 9% of posted links
This data can enable us to carry out new analysis and get new insight into online communities
Also potential for new applications e.g. content recommendation, enhanced cross-site browsing
Current work: using external structured data for improving topic identification in online communities