Analyzing Social Bookmarking Systems: A del.icio.us Cookbook

Robert Wetzker, Carsten Zimmermann, Christian Bauckhage. Workshop on Mining Social Data, ECAI 2008. 20 July, 2008. Dipl.-Ing. Robert Wetzker | [email protected]

Page 1: Analyzing Social Bookmarking Systems: A del.icio.us Cookbook

Robert Wetzker, Carsten Zimmermann, Christian Bauckhage

Workshop on Mining Social Data, ECAI 2008

20 July, 2008

Dipl.-Ing. Robert Wetzker | [email protected]

Page 3: Why this paper?

Why social bookmarking?

Provides a vast amount of user-generated annotations for web content.

Reflects the interests of millions of users.

Wisdom-of-crowds.

Research areas:

(Web-) Search

(Web-) Content classification

Ontology building

Trend detection

Recommendation

Page 4: Outline

1. The del.icio.us bookmarking service

2. Bookmarking patterns

3. Tagging patterns

4. Social bookmarking and spam

5. Conclusions and future work

Page 6: The del.icio.us bookmarking service

Page 7: The growth of del.icio.us

Page 10: The dataset

We recursively crawled del.icio.us tag-wise, starting with the tag “web2.0” (Oct.-Dec. 2007). From the retrieved corpus of 45 million bookmarks we extracted the 1 million most frequent users and downloaded the bookmarks of these users (Dec. 2007 - Apr. 2008). For the analysis presented here, we considered only the 142 million bookmarks obtained from the user-wise crawl.

Corpus details

> 80% of del.icio.us
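
The two-stage crawl described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' crawler: the slides do not say how the recursion moved from tag to tag, so the sketch assumes that every tag seen on a retrieved bookmark is queued next. `TOY_SERVICE`, `fetch_bookmarks_for_tag`, `crawl_tagwise` and `most_frequent_users` are hypothetical names, and the data is invented.

```python
from collections import Counter, deque

# Invented stand-in for the bookmarking service: tag -> bookmarks carrying it.
# Each bookmark is a (user, url, tags) tuple.
TOY_SERVICE = {
    "web2.0": [("alice", "http://a.example", ("web2.0", "ajax")),
               ("bob", "http://b.example", ("web2.0", "design"))],
    "ajax": [("alice", "http://a.example", ("web2.0", "ajax")),
             ("alice", "http://d.example", ("ajax",)),
             ("carol", "http://c.example", ("ajax",))],
    "design": [("bob", "http://b.example", ("web2.0", "design"))],
}


def fetch_bookmarks_for_tag(tag):
    """Hypothetical fetcher; a real crawler would issue HTTP requests here."""
    return TOY_SERVICE.get(tag, [])


def crawl_tagwise(seed_tag, fetch=fetch_bookmarks_for_tag):
    """Breadth-first crawl: visit a tag, then every tag seen on its bookmarks."""
    seen_tags = {seed_tag}
    queue = deque([seed_tag])
    bookmarks = set()
    while queue:
        tag = queue.popleft()
        for user, url, tags in fetch(tag):
            bookmarks.add((user, url, tags))
            for other in tags:
                if other not in seen_tags:
                    seen_tags.add(other)
                    queue.append(other)
    return bookmarks


def most_frequent_users(bookmarks, n):
    """Rank users by how many crawled bookmarks they own and keep the top n."""
    counts = Counter(user for user, _url, _tags in bookmarks)
    return [user for user, _count in counts.most_common(n)]


corpus = crawl_tagwise("web2.0")
seed_users = most_frequent_users(corpus, 1)
```

In the second stage, the complete bookmark lists of the selected users would be downloaded; only that user-wise corpus is analyzed in the remaining slides.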

Page 12: Bookmarking patterns

Top 10 most frequent URLs in the corpus

The del.icio.us community is biased toward content related to web communities and web technology.

Page 13: Bookmarking patterns

Top 10 most frequent domains in the corpus

The del.icio.us community is biased toward content related to web communities and web technology.

Page 14: Bookmarking patterns

The top 1% of users contribute 22% of all bookmarks. 39% of all bookmarks link to just 1% of all URLs.
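
Concentration figures of this kind can be computed along the following lines. This is a sketch under assumptions: bookmarks are taken to be `(user, url)` pairs, and `top_share` and `concentration` are hypothetical helper names. The rounding rule for "top 1%" (round down, but keep at least one item) is also an assumption, since the slides do not state one.

```python
from collections import Counter


def top_share(counts, fraction=0.01):
    """Share of the total held by the top `fraction` of items (at least one item)."""
    values = sorted(counts.values(), reverse=True)
    k = max(1, int(len(values) * fraction))
    return sum(values[:k]) / sum(values)


def concentration(bookmarks):
    """Return (share of bookmarks from top 1% of users, share pointing to top 1% of URLs)."""
    per_user = Counter(user for user, _url in bookmarks)
    per_url = Counter(url for _user, url in bookmarks)
    return top_share(per_user), top_share(per_url)


# Invented toy data: with fewer than 100 users or URLs, the "top 1%" is one item.
toy = [("u0", "x"), ("u0", "y"), ("u1", "x"), ("u2", "x")]
user_share, url_share = concentration(toy)
```

On the toy data the single most active user holds half of the bookmarks and the single most popular URL receives three quarters of them.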

Page 15: Bookmarking patterns

The del.icio.us community pays attention to new content only for a very short period of time.

Page 17: Tagging patterns

Each bookmark is labeled with 3.16 tags on average. About 7% of all bookmarks are not tagged at all.

Top 20 most frequent tags in the corpus
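
A minimal sketch of how these three statistics could be derived, assuming each bookmark is available as a list of its tags. `tagging_summary` and `top_tags` are hypothetical names, and the sketch assumes that untagged bookmarks count toward the average, which the slide does not specify.

```python
from collections import Counter


def tagging_summary(bookmarks):
    """Return (mean tags per bookmark, share of bookmarks without any tag)."""
    tag_counts = [len(tags) for tags in bookmarks]
    total = len(tag_counts)
    mean_tags = sum(tag_counts) / total
    untagged_share = sum(1 for n in tag_counts if n == 0) / total
    return mean_tags, untagged_share


def top_tags(bookmarks, n=20):
    """Return the n most frequent tags with their assignment counts."""
    counts = Counter(tag for tags in bookmarks for tag in tags)
    return counts.most_common(n)


# Invented toy data: one tag list per bookmark.
toy = [["web", "design"], [], ["web"], ["web", "design", "css"]]
mean_tags, untagged_share = tagging_summary(toy)
ranking = top_tags(toy, n=2)
```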

Page 18: Tagging patterns

700 of 7,000,000 tags account for 50% of all labels. 55% of all tags appear only once.
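
Both numbers describe a long-tailed tag distribution. As a sketch, assuming a `Counter` that maps each tag to its number of assignments, they could be obtained as follows; `tags_needed_for_share` and `singleton_share` are hypothetical names.

```python
from collections import Counter


def tags_needed_for_share(tag_counts, share=0.5):
    """Smallest number of most-frequent tags covering `share` of all assignments."""
    total = sum(tag_counts.values())
    covered = 0
    for rank, (_tag, count) in enumerate(tag_counts.most_common(), start=1):
        covered += count
        if covered >= share * total:
            return rank
    return len(tag_counts)


def singleton_share(tag_counts):
    """Share of distinct tags that were assigned exactly once."""
    return sum(1 for c in tag_counts.values() if c == 1) / len(tag_counts)


# Invented toy distribution: 10 assignments spread over 4 distinct tags.
toy_counts = Counter({"web": 5, "design": 3, "tpyo": 1, "misc2007": 1})
head_size = tags_needed_for_share(toy_counts)
rare_share = singleton_share(toy_counts)
```

In the toy distribution one tag already covers half of all assignments, while half of the distinct tags occur only once.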

Page 19: Tagging patterns

Tendencies in the del.icio.us tag distribution strongly correlate with upcoming and periodic external events.

Occurrence of 5 sample tags in 2007.

Page 22: Social bookmarking and spam

Del.icio.us is highly vulnerable to spam.

19 of the top 20 users appear to be non-human; together they account for 1.3 million bookmarks, around 1% of the corpus.

We find that spammers exhibit one or more of the following characteristics:

very high activity

bookmarking only a few domains

high tagging rate

very low tagging rate

bulk posts

a combination of the above
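
The characteristics listed above translate naturally into per-user features. The following is a rough sketch, not the authors' detection method: it assumes bookmarks come as `(user, url, tags, timestamp)` tuples, it treats "bulk posts" as many bookmarks sharing one timestamp, and every threshold passed to the hypothetical `matched_characteristics` helper is invented for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def user_features(bookmarks):
    """Aggregate per-user statistics from (user, url, tags, timestamp) tuples."""
    posts_by_user = defaultdict(list)
    for user, url, tags, timestamp in bookmarks:
        posts_by_user[user].append((url, tags, timestamp))

    features = {}
    for user, posts in posts_by_user.items():
        domains = {urlparse(url).netloc for url, _tags, _ts in posts}
        batch_sizes = Counter(ts for _url, _tags, ts in posts)
        features[user] = {
            "bookmarks": len(posts),
            "domains": len(domains),
            "avg_tags": sum(len(tags) for _url, tags, _ts in posts) / len(posts),
            # Assumption: a "bulk post" is a group of bookmarks with one timestamp.
            "largest_batch": max(batch_sizes.values()),
        }
    return features


def matched_characteristics(f, *, min_bookmarks, max_domains,
                            low_tags, high_tags, min_batch):
    """Name the listed characteristics a user matches; thresholds are hypothetical."""
    hits = []
    if f["bookmarks"] >= min_bookmarks:
        hits.append("very high activity")
    if f["domains"] <= max_domains:
        hits.append("few domains")
    if f["avg_tags"] >= high_tags:
        hits.append("high tagging rate")
    if f["avg_tags"] <= low_tags:
        hits.append("very low tagging rate")
    if f["largest_batch"] >= min_batch:
        hits.append("bulk posts")
    return hits


# Invented toy data: one automated-looking account and one ordinary user.
toy = [("bot", f"http://spam.example/p{i}", (), "t0") for i in range(4)]
toy += [("human", "http://a.example/x", ("web", "css"), "t1"),
        ("human", "http://b.example/y", ("news",), "t2")]
feats = user_features(toy)
limits = dict(min_bookmarks=4, max_domains=1, low_tags=0.0, high_tags=10, min_batch=3)
bot_hits = matched_characteristics(feats["bot"], **limits)
human_hits = matched_characteristics(feats["human"], **limits)
```

Suitable thresholds would have to be chosen from the real distributions; the values above only make the toy example separate the two accounts.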

Page 23: Social bookmarking and spam

The number of bookmarks and the number of users linking to a domain.

Page 24: Social bookmarking and spam

The number of user bookmarks and the average number of tags per bookmark.

Page 27: The diffusion of attention

In some cases spam detection may prove computationally expensive or ambiguous.

The diffusion of attention concept reduces the effect of spam on the tag distribution without requiring explicit spam detection.

We define the attention given to a tag as the number of users using the tag.

The diffusion of attention for a tag is then the number of users who assign that tag for the first time in a given period.
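
The definition above can be sketched in a few lines. The example assumes tag assignments are available as `(user, tag, period)` tuples whose period labels sort chronologically (for instance "2007-02"); `tag_occurrence` and `diffusion_of_attention` are hypothetical names, and the data is invented.

```python
from collections import Counter, defaultdict


def tag_occurrence(assignments):
    """Raw count of assignments per (tag, period), for comparison."""
    return Counter((tag, period) for _user, tag, period in assignments)


def diffusion_of_attention(assignments):
    """Per tag and period, count the users who assign the tag for the first time."""
    first_use = {}
    for user, tag, period in assignments:
        key = (user, tag)
        if key not in first_use or period < first_use[key]:
            first_use[key] = period

    diffusion = defaultdict(lambda: defaultdict(int))
    for (_user, tag), period in first_use.items():
        diffusion[tag][period] += 1
    return {tag: dict(periods) for tag, periods in diffusion.items()}


# Invented toy data: one account floods a tag, two users adopt another tag.
toy = [("bot", "promo", "2007-02")] * 50
toy += [("alice", "python", "2007-01"),
        ("bob", "python", "2007-02"),
        ("alice", "python", "2007-02")]
occurrence = tag_occurrence(toy)
diffusion = diffusion_of_attention(toy)
```

The flooding account inflates the raw occurrence of its tag to 50 but contributes only 1 to the diffusion of attention, because each user is counted once per tag. This is how the measure dampens spam without identifying spammers.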

Page 29: The diffusion of attention

Tagging trends by tag occurrence.

Tagging trends by diffusion of attention.

Page 30: Future work

Provide automatic and scalable spam detection methods.

Topic-aware detection of trends.

Follow-up paper:

Detecting Trends in Social Bookmarking Systems using a Probabilistic Generative Model and Smoothing, R. Wetzker, T. Plumbaum, A. Korth, C. Bauckhage, T. Alpcan, F. Metze, International Conference on Pattern Recognition (ICPR), 2008, Tampa, USA (to appear)

Page 31: Questions?

Thank you.

Page 32: Social bookmarking and spam

The number of bookmarks and the number of users linking to a domain.

http://d.hatena.ne.jp