Using Hyperlinks to Enrich Message Board Content with Linked Data


Uploaded by sheila-kinsella on 07-May-2015


DESCRIPTION

Presentation from I-SEMANTICS 2010, Graz, Austria. Based on the paper "Using Hyperlinks to Enrich Message Board Content with Linked Data" by Sheila Kinsella, Alexandre Passant, and John G. Breslin.

TRANSCRIPT

Page 1: Using Hyperlinks to Enrich Message Board Content with Linked Data

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie


Using Hyperlinks to Enrich Message Board Content with Linked Data

Sheila Kinsella, Alexandre Passant, John G. Breslin

Page 2

Introduction

Hyperlinks are an important part of online conversation and often represent identifiable concepts

Increasingly, these hyperlinks have corresponding structured data sources

Our aims in this study:
1) Study the growth of structured, user-generated data and links in social media over 10 years
2) Investigate how we can use this data for enhanced analysis of online conversation

Page 3

Example post

imdb:tt0211915 foaf:topic dbpedia:Amélie .
dbpedia:Amélie dc:title "Amélie" .
dbpedia:Amélie dc:date "2001" .
dbpedia:Amélie dbpprop:starring dbpedia:Audrey_Tautou .
dbpedia:Amélie dbpprop:director dbpedia:Jean-Pierre_Jeunet .

http://www.imdb.com/title/tt0211915/ = an identifier we can use to query LinkedMDB/DBpedia/Freebase…
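The posted IMDb URL already carries a stable identifier. The following is a minimal sketch of extracting it. It assumes that posted links follow the http://www.imdb.com/title/ttNNNNNNN/ pattern. The function names are hypothetical, and the actual lookup against LinkedMDB, DBpedia or Freebase is not shown.

```python
import re
from typing import Optional

# Hypothetical pattern for IMDb title URLs such as
# http://www.imdb.com/title/tt0211915/ (the "tt" ID is captured).
IMDB_TITLE = re.compile(r"^https?://(?:www\.)?imdb\.com/title/(tt\d+)")


def imdb_id(url: str) -> Optional[str]:
    """Return the IMDb title ID in a posted link, or None for other links."""
    match = IMDB_TITLE.match(url)
    return match.group(1) if match else None


def imdb_curie(url: str) -> Optional[str]:
    """Write the ID as an imdb: prefixed name, matching the example triples."""
    title_id = imdb_id(url)
    return f"imdb:{title_id}" if title_id else None


print(imdb_curie("http://www.imdb.com/title/tt0211915/"))  # imdb:tt0211915
```

Any link that is not an IMDb title page simply yields None, so the helper can be applied to every link in a post.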

Page 4

Dataset enrichment


Page 5

Boards.ie SIOC Data Competition

A 2008 competition to do something interesting with Boards.ie message board data

Data from February 1998 to February 2008, described with SIOC, FOAF and DC: ~130k users, >7m posts
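The dataset is expressed with SIOC, FOAF and DC vocabularies. As an illustration only, the sketch below writes a single post and its hyperlinks as N-Triples, using the SIOC terms sioc:Post, sioc:has_creator, sioc:has_container, sioc:content and sioc:links_to plus dcterms:title. The example.org URIs and the helper names are hypothetical, and the actual Boards.ie export may be structured differently.

```python
from typing import List

SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape the characters N-Triples requires inside a quoted literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def post_triples(post_uri: str, creator_uri: str, forum_uri: str,
                 title: str, content: str, links: List[str]) -> List[str]:
    """Return N-Triples lines describing one post and the links it contains."""
    s = iri(post_uri)
    lines = [
        f"{s} {iri(RDF_TYPE)} {iri(SIOC + 'Post')} .",
        f"{s} {iri(SIOC + 'has_creator')} {iri(creator_uri)} .",
        f"{s} {iri(SIOC + 'has_container')} {iri(forum_uri)} .",
        f"{s} {iri(DCTERMS + 'title')} {literal(title)} .",
        f"{s} {iri(SIOC + 'content')} {literal(content)} .",
    ]
    # One sioc:links_to triple per hyperlink found in the post body.
    lines += [f"{s} {iri(SIOC + 'links_to')} {iri(link)} ." for link in links]
    return lines


for line in post_triples("http://example.org/post/1",
                         "http://example.org/user/42",
                         "http://example.org/forum/films",
                         "Amélie", "Has anyone seen this?",
                         ["http://www.imdb.com/title/tt0211915/"]):
    print(line)
```

The sioc:links_to triples are the hooks that the later enrichment steps would follow to external data sources.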

Page 6

Change in type of websites linked to

2002/2003
Domain            Main content type
bbc.co.uk         news media
komplett.ie       shop
ireland.com       news media
eircom.net        Web hosting
yahoo.com         news/discussion
rte.ie            news media
google.com        Web search
geocities.com     Web hosting
iol.ie            Web hosting
microsoft.com     technical support

2007/2008
Domain            Main content type
youtube.com       UGC: video-sharing
wikipedia.org     UGC: encyclopedia
komplett.ie       shop
myspace.com       UGC: SNS/music
flickr.com        UGC: photo-sharing
bbc.co.uk         news media
rte.ie            news media
carzone.ie        shop
photobucket.com   UGC: media hosting
ebay.ie           shop

(UGC = user-generated content)
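A ranking like the one above can be produced by counting the domains of all links posted in a period. The following is a minimal sketch, assuming the links for one period are available as a list of URL strings. As a simplification it only strips a leading "www." and does not group other subdomains under their registered domain.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Reduce a URL to its lower-cased host name without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(urls: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequently linked domains with their link counts."""
    counts = Counter(d for d in map(domain_of, urls) if d)
    return counts.most_common(n)


links_2007 = [
    "http://www.youtube.com/watch?v=a",
    "http://youtube.com/watch?v=b",
    "http://en.wikipedia.org/wiki/Dublin",
    "http://www.bbc.co.uk/news/",
]
print(top_domains(links_2007, 1))  # [('youtube.com', 2)]
```

Running the same function separately on the 2002/2003 and 2007/2008 links gives the two columns to compare.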

Page 7

Identification of external data sources

1. youtube.com
2. wikipedia.org (DBpedia)
3. komplett.ie
4. bbc.co.uk
5. myspace.com (DBTune)
6. rte.ie
7. carzone.ie
8. google.com
9. photobucket.com
10. flickr.com
11. microsoft.com
12. eircom.net
13. ebay.ie
14. imageshack.us
15. imdb.com (LinkedMDB)
16. ebay.co.uk
17. yahoo.com
18. amazon.co.uk
19. google.ie
20. blogspot.com
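Once a domain has been paired with a data source, each posted link can be routed to that source. The sketch below is hypothetical and includes only the three pairings named in the list (wikipedia.org to DBpedia, myspace.com to DBTune, imdb.com to LinkedMDB). It matches subdomains too, so en.wikipedia.org resolves to the wikipedia.org entry.

```python
from typing import Optional
from urllib.parse import urlsplit

# Only the pairings named in the list; any further entries would be assumptions.
DATA_SOURCES = {
    "wikipedia.org": "DBpedia",
    "myspace.com": "DBTune",
    "imdb.com": "LinkedMDB",
}


def data_source_for(url: str) -> Optional[str]:
    """Return the structured data source for a link's domain, if one is known.

    A host matches when it equals the domain or is a subdomain of it.
    """
    host = (urlsplit(url).hostname or "").lower()
    for domain, source in DATA_SOURCES.items():
        if host == domain or host.endswith("." + domain):
            return source
    return None


print(data_source_for("http://www.imdb.com/title/tt0211915/"))  # LinkedMDB
```

Checking for a leading dot before the domain avoids false matches such as a host named "notimdb.com".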

Page 8

Data sources

Page 9

Structured Data

RDF Data: rich descriptions of resources using common ontologies, available as Linked Data, RDFa, SPARQL endpoints or RDF dumps. E.g. <X> foaf:topic dbpedia:Education . We store this data and merge equivalent URIs if required.

API Data (fixed values): less rich, heterogeneous and lacking common semantics; often available as JSON/XML and easily converted to RDF. E.g. <category term='Education'/>. We manually mapped these values to URIs.

API Data (tags): plain-text annotations whose meaning can be ambiguous. E.g. "education". We performed a naïve mapping of tags to URIs.

[Figure: arrow labelled "Expressiveness" alongside the three kinds of data]
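The three treatments described on this slide are merging equivalent URIs, manually mapping fixed API values, and naïvely mapping tags. Below is a hypothetical sketch of all three, not the authors' code. The category table holds a single illustrative entry. The tag mapping follows DBpedia's resource-naming style (first letter capitalised, spaces replaced by underscores) with no disambiguation. The URI merge uses a union-find over equivalence pairs such as owl:sameAs links; union-find is a technique chosen here for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

DBPEDIA = "http://dbpedia.org/resource/"

# Manual mapping for fixed API values, e.g. <category term='Education'/>.
# The single entry is illustrative; a real table would be curated per API.
CATEGORY_TO_URI: Dict[str, str] = {"Education": DBPEDIA + "Education"}


def map_category(term: str) -> Optional[str]:
    return CATEGORY_TO_URI.get(term)


def naive_tag_uri(tag: str) -> str:
    """Naïve tag mapping: capitalise the first letter and join words with '_'."""
    name = "_".join(tag.split())
    return DBPEDIA + name[:1].upper() + name[1:]


def merge_equivalent(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every URI to one canonical URI of its equivalence class."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller URI as the representative.
            low, high = sorted((root_a, root_b))
            parent[high] = low
    return {uri: find(uri) for uri in list(parent)}


print(naive_tag_uri("education"))  # http://dbpedia.org/resource/Education
```

The naïve tag mapping will mis-assign ambiguous tags, which is why tags sit at the low-expressiveness end of the figure.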

Page 10

Analysis of external links

For 2007/2008, we could access structured data for over 9% of all posted links

[Figure: chart by year, from 1998/99 to 2007/08]
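A per-period figure like the 9% above is the share of posted links whose domain has an accessible structured data source. The following is a minimal sketch, assuming links are given as (period, URL) pairs together with a set of covered domains. The function name is hypothetical, and the sample data is a toy input, not the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import urlsplit


def coverage_by_period(links: Iterable[Tuple[str, str]],
                       covered: Set[str]) -> Dict[str, float]:
    """Percentage of links per period whose domain has structured data.

    links: (period, url) pairs such as ("2007/2008", "http://www.imdb.com/").
    covered: domains assumed to have a structured data source.
    """
    total: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for period, url in links:
        host = (urlsplit(url).hostname or "").lower()
        total[period] += 1
        if any(host == d or host.endswith("." + d) for d in covered):
            hits[period] += 1
    return {p: 100.0 * hits[p] / total[p] for p in total}


toy = [("2007/2008", "http://www.imdb.com/title/tt0211915/"),
       ("2007/2008", "http://www.bbc.co.uk/news/"),
       ("2007/2008", "http://en.wikipedia.org/wiki/Dublin"),
       ("2007/2008", "http://www.rte.ie/"),
       ("2002/2003", "http://www.microsoft.com/")]
print(coverage_by_period(toy, {"imdb.com", "wikipedia.org"}))
# {'2007/2008': 50.0, '2002/2003': 0.0}
```

Each link counts once toward its period's total, so the result is a share of all posted links rather than of unique links.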

Page 11

The enriched dataset

[Figure: the enriched dataset, linking SIOC data with DBpedia and other Linked Data / Web API sources (30k links, 21k unique), all represented as RDF]

Page 12

Analysis example: Post content

[Figure: bar chart of the % of posts containing the name/title (0 to 70%) for Book Title, Book Author, Movie Title and Movie Director, with separate bars for Post Text, Anchor Text and Post Title]
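The chart compares where a linked item's title or creator, taken from its structured data (e.g. dc:title or dbpprop:director in the earlier example), appears in the post. The slide does not say how the matching was done. The sketch below assumes simple case-insensitive substring matching, and the post dictionary keys are hypothetical.

```python
from typing import Dict, List


def mentions(field_text: str, value: str) -> bool:
    """Case-insensitive check whether a field contains a title or name."""
    if not value:
        return False
    return value.casefold() in field_text.casefold()


def where_mentioned(post: Dict[str, str], value: str) -> Dict[str, bool]:
    """Report which parts of a post mention the value.

    The post is assumed to have 'text', 'anchor_text' and 'title' keys.
    """
    return {
        "Post Text": mentions(post.get("text", ""), value),
        "Anchor Text": mentions(post.get("anchor_text", ""), value),
        "Post Title": mentions(post.get("title", ""), value),
    }


def percent_mentioning(posts: List[Dict[str, str]], values: List[str],
                       field: str) -> float:
    """Percentage of posts whose given field mentions the paired value."""
    pairs = list(zip(posts, values))
    if not pairs:
        return 0.0
    hits = sum(where_mentioned(p, v)[field] for p, v in pairs)
    return 100.0 * hits / len(pairs)


post = {"text": "Just watched AMÉLIE again", "anchor_text": "imdb link",
        "title": "Film night"}
print(where_mentioned(post, "Amélie"))
```

Plain substring matching misses variant spellings and abbreviations, so it likely gives a lower bound on real mentions.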

Page 13

Analysis example: Content sharing

[Figure: % of content (0 to 80%) by age, in bins before release, 0 - 1 days, 1 day - 1 week, 1 week - 1 month, 1 month - 1 year, 1 - 10 years and 10 - 100 years, for Flickr (photos), YouTube (videos), Amazon (books) and IMDB (movies)]
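The age bins in this chart can be reproduced by comparing when an item was created or released with when it was linked. The following is a minimal sketch, assuming both dates are available from the item's data source and the post. Treating a month as 30 days and a year as 365 days, and making each upper bound inclusive, are assumptions.

```python
from datetime import date

# Inclusive upper bounds in days for each bin (month = 30 d, year = 365 d).
BINS = [
    (1, "0 - 1 days"),
    (7, "1 day - 1 week"),
    (30, "1 week - 1 month"),
    (365, "1 month - 1 year"),
    (10 * 365, "1 - 10 years"),
    (100 * 365, "10 - 100 years"),
]


def age_bin(created: date, posted: date) -> str:
    """Label the age of an item at the time it was linked in a post."""
    age_days = (posted - created).days
    if age_days < 0:
        return "before release"
    for upper, label in BINS:
        if age_days <= upper:
            return label
    return "over 100 years"


print(age_bin(date(2001, 4, 25), date(2007, 6, 1)))  # 1 - 10 years
```

Counting these labels per source (Flickr, YouTube, Amazon, IMDB) and dividing by each source's total gives the percentages plotted.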

Page 14

Analysis example: User profiling

Page 15

Analysis example: User profiling

Page 16

Conclusions

Many links posted on social media sites correspond to a structured data source: in 2007/2008, already more than 9%

This data can enable us to carry out new analyses and gain new insight into online communities

There is also potential for new applications, e.g. content recommendation and enhanced cross-site browsing

Current work: using external structured data to improve topic identification in online communities