Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. Robert Wetzker, Carsten Zimmermann, Christian Bauckhage. Workshop on Mining Social Data, ECAI 2008. Dipl.-Ing. Robert Wetzker, robert.wetzker@dai-labor.de.

Analyzing Social Bookmarking Systems: A del.icio.us Cookbook, 20 July, 2008

Analyzing Social Bookmarking Systems: A del.icio.us Cookbook

Robert Wetzker, Carsten Zimmermann, Christian Bauckhage

Workshop on Mining Social Data, ECAI 2008

Dipl.-Ing. Robert Wetzker | robert.wetzker@dai-labor.de

Why this paper?

Why social bookmarking?

Provides a vast amount of user-generated annotations for web content.

Reflects the interests of millions of users.

Wisdom-of-crowds.

Research areas:

(Web-) Search

(Web-) Content classification

Ontology building

Trend detection

Recommendation

Outline

1. The del.icio.us bookmarking service

2. Bookmarking patterns

3. Tagging patterns

4. Social bookmarking and spam

5. Conclusions and future work

The del.icio.us bookmarking service

The growth of del.icio.us

The dataset

We recursively crawled del.icio.us tag-wise, starting with the tag “web2.0” (Oct.-Dec. 2007). From the retrieved corpus of 45 million bookmarks we extracted the 1 million most frequent users and downloaded the bookmarks of these users (Dec. 2007 – Apr. 2008). For the analysis presented here, we only considered the 142 million bookmarks obtained from the user-wise crawling.
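The two crawl stages can be sketched roughly as follows. This is a minimal illustration, not the crawler used for the talk: `fetch_bookmarks` is a hypothetical callable that returns `(user, url, tags)` tuples for a tag, and `max_tags` is an assumed safety limit.

```python
from collections import Counter, deque


def crawl_tagwise(seed_tag, fetch_bookmarks, max_tags=1000):
    """Breadth-first, tag-wise crawl starting from seed_tag.

    fetch_bookmarks(tag) is a hypothetical stand-in for the real crawler;
    it must return an iterable of (user, url, tags) tuples.
    """
    seen = {seed_tag}
    queue = deque([seed_tag])
    bookmarks = set()
    while queue:
        tag = queue.popleft()
        for user, url, tags in fetch_bookmarks(tag):
            bookmarks.add((user, url, tuple(tags)))
            # Every tag found on a bookmark becomes a new crawl target.
            for t in tags:
                if t not in seen and len(seen) < max_tags:
                    seen.add(t)
                    queue.append(t)
    return bookmarks


def most_frequent_users(bookmarks, n):
    """Return the n users with the most bookmarks in the crawled corpus."""
    counts = Counter(user for user, _, _ in bookmarks)
    return [user for user, _ in counts.most_common(n)]
```

The user-wise stage would then download the full bookmark list of every user returned by `most_frequent_users`.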

Corpus details

> 80% of del.icio.us

Bookmarking patterns

Top 10 most frequent URLs in the corpus

The del.icio.us community is biased toward web community and web technology related content.

Top 10 most frequent domains in the corpus

The top 1% of users contribute 22% of all bookmarks. 39% of all bookmarks link to just 1% of all URLs.
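Concentration figures of this kind can be computed from per-user (or per-URL) bookmark counts. The helper below is a sketch; the function name and the rounding of the top fraction are assumptions.

```python
def top_share(counts, fraction):
    """Share of the total contributed by the top `fraction` of items.

    counts: bookmark counts, one entry per user (or per URL).
    fraction: e.g. 0.01 for the top 1%.
    """
    ordered = sorted(counts, reverse=True)
    # Always include at least one item, even for very small inputs.
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)
```

Calling `top_share(bookmarks_per_user, 0.01)` gives the share of bookmarks posted by the top 1% of users; the same call on bookmarks per URL gives the share of bookmarks that point to the top 1% of URLs.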

The del.icio.us community pays attention to new content only for a very short period of time.

Tagging patterns

Each bookmark is labeled with 3.16 tags on average. About 7% of all bookmarks are not tagged at all.
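Both numbers follow directly from the tag list of each bookmark. A minimal sketch, assuming the corpus is available as one list of tags per bookmark (the function name is illustrative):

```python
def tagging_stats(bookmarks):
    """Return (mean tags per bookmark, fraction of untagged bookmarks).

    bookmarks: non-empty list of tag lists, one list per bookmark.
    """
    n = len(bookmarks)
    mean_tags = sum(len(tags) for tags in bookmarks) / n
    untagged = sum(1 for tags in bookmarks if not tags) / n
    return mean_tags, untagged
```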

Top 20 most frequent tags in the corpus

700 of 7,000,000 tags account for 50% of all labels. 55% of all tags appear only once.
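The head and the tail of such a tag distribution can be measured from a tag frequency table. The sketch below assumes the frequencies are held in a `collections.Counter`; both function names are illustrative.

```python
from collections import Counter


def tags_covering(tag_counts, share=0.5):
    """Smallest number of most frequent tags that cover `share` of all labels."""
    total = sum(tag_counts.values())
    running = 0
    for rank, (_, count) in enumerate(tag_counts.most_common(), start=1):
        running += count
        if running >= share * total:
            return rank
    return len(tag_counts)


def singleton_fraction(tag_counts):
    """Fraction of distinct tags that were assigned exactly once."""
    return sum(1 for c in tag_counts.values() if c == 1) / len(tag_counts)
```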

Tendencies in the del.icio.us tag distribution strongly correlate with upcoming and periodic external events.

Occurrence of 5 sample tags in 2007.

Social bookmarking and spam

Del.icio.us is highly vulnerable to spam.

19 of the top 20 users are apparently of non-human origin, accounting for 1.3 million bookmarks, around 1% of the corpus.

We find spammers to exhibit one or more of the following characteristics:

very high activity

bookmarking only a few domains

high tagging rate

very low tagging rate

bulk posts

a combination of the above
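The characteristics listed above could be turned into simple per-user checks along these lines. This is only a sketch: the `UserStats` fields and every threshold are illustrative assumptions, not values reported in the talk.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    bookmarks: int      # total number of bookmarks posted
    domains: int        # number of distinct domains bookmarked
    mean_tags: float    # average number of tags per bookmark
    largest_batch: int  # most bookmarks posted in one batch


def spam_signals(u, high_activity=10_000, few_domains=3, min_bookmarks=100,
                 high_tags=20.0, low_tags=0.1, bulk=50):
    """Return the listed characteristics that a user exhibits.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    signals = []
    if u.bookmarks >= high_activity:
        signals.append("very high activity")
    # Only meaningful for accounts with a reasonable number of bookmarks.
    if u.bookmarks >= min_bookmarks and u.domains <= few_domains:
        signals.append("bookmarking only a few domains")
    if u.mean_tags >= high_tags:
        signals.append("high tagging rate")
    if u.mean_tags <= low_tags:
        signals.append("very low tagging rate")
    if u.largest_batch >= bulk:
        signals.append("bulk posts")
    return signals
```

A user that returns several signals at once corresponds to the last item of the list, a combination of the above.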

The number of bookmarks and the number of users linking to a domain.
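The two quantities compared here can be derived per domain as sketched below, assuming the corpus is available as `(user, url)` pairs. Presumably a domain with many bookmarks but very few distinct users is a candidate for spam; that reading is an interpretation, not a rule stated on the slide.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_stats(bookmarks):
    """Map each domain to (number of bookmarks, number of distinct users).

    bookmarks: iterable of (user, url) pairs.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for user, url in bookmarks:
        domain = urlparse(url).netloc.lower()
        counts[domain] += 1
        users[domain].add(user)
    return {d: (counts[d], len(users[d])) for d in counts}
```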

The number of user bookmarks and the average number of tags per bookmark.

The diffusion of attention

In some cases spam detection may prove computationally expensive or ambiguous.

The diffusion of attention concept reduces the effect of spam on the tag distribution without the actual need of spam detection.

We define the attention given to a tag as the number of users using the tag.

The diffusion of attention for a tag is then given by the number of users that assign a tag for the first time in a given period.
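The definition above can be sketched as follows, next to plain tag occurrence for comparison. The input format, `(period, user, tag)` triples with comparable period values such as week numbers, is an assumption made for the example.

```python
from collections import Counter


def tag_occurrence(assignments):
    """Count every tag assignment per (period, tag)."""
    return Counter((period, tag) for period, _, tag in assignments)


def diffusion_of_attention(assignments):
    """Count, per (period, tag), the users who use the tag for the first time.

    assignments: iterable of (period, user, tag) triples in any order.
    """
    first_use = {}
    for period, user, tag in assignments:
        key = (user, tag)
        # Remember only the earliest period in which this user used this tag.
        if key not in first_use or period < first_use[key]:
            first_use[key] = period
    return Counter((period, tag) for (_, tag), period in first_use.items())
```

Each user counts at most once per tag, so a single account that posts the same tag thousands of times adds 1 to the diffusion count while dominating the occurrence count.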

Tagging trends by tag occurrence. Tagging trends by diffusion of attention.

Future work

Provide automatic and scalable spam detection methods.

Topic-aware detection of trends.

Follow up paper:

R. Wetzker, T. Plumbaum, A. Korth, C. Bauckhage, T. Alpcan, F. Metze: Detecting Trends in Social Bookmarking Systems using a Probabilistic Generative Model and Smoothing. International Conference on Pattern Recognition (ICPR), 2008, Tampa, USA (to appear).

Questions?

Thank you.

Social bookmarking and spam

The number of bookmarks and the number of users linking to a domain.

http://d.hatena.ne.jp
