Yahoo! Social Search Tag-based Social Interest Discovery Xin Li, Lei Guo, Eric Zhao Yahoo! International Social Search


Yahoo! Social Search

Tag-based Social Interest Discovery

Xin Li, Lei Guo, Eric Zhao

Yahoo! International Social Search

Internet Social Networks Are Emerging!

• Internet social networks are self-organized by online users
  – Del.icio.us, Facebook, Flickr, MySpace, YouTube

• Users are driven by their interests
  – Fetch and bookmark contents
  – Create new contents
  – Share contents

• Interest discovery is crucial to a social network
  – Discover interests of users in different contents
  – Locate users with similar interests
  – Link people with similar interests to form communities

Important Features of Social Networks

• Organize users and contents
  – Cluster users into communities
  – Categorize contents into interesting topics

• Provide search functions
  – Given a topic, locate all matching contents and all users that are interested in the topic
  – Given a user, locate all his fetched/created contents and the topics of his interests
  – Given a user, locate all other users that have similar interests

The Problem: Social Interest Discovery

• Questions to answer

– How to discover a user’s interests based on his fetched/created contents?

– How to use individual users’ interests to find interesting topics shared by users?

– How to use the topics to create interest-based user communities?

Existing Solutions and Limitations

• User-centric
  – Use the social network graph to discover users with common interests
  – Problem: online/offline user connections are hard to identify

• Object-centric
  – Detect common interests based on the common objects fetched by users
  – Problem: discovered interests are object-based, non-descriptive, and implicit

• Predefined categorization
  – Not flexible; cannot catch the most recent popular or hot user interests
  – Cannot reflect various user interest groups, which may keep changing over time

Our approach

• Leverage user-generated tags

• Compute frequent co-occurrences of tag patterns

• Use the tag patterns as topics of interests

• Cluster users and content around the topics to build communities

Overview

• Motivation and Problem

• Analysis of tags in a social network

• ISID system design

• Evaluation

• Conclusion

Tags in Social Networks

• User-generated labels for annotating the contents
  – Descriptive, summarizing, reflecting human judgment
  – Metadata between users and contents

• Widely used in social networks
  – Del.icio.us: http://del.icio.us/help/tags
  – YouTube: http://www.google.com/support/youtube/bin/answer.py?hl=en&answer=55769
  – Facebook: http://www.facebook.com/help.php?hq=tag

del.icio.us Social Network

• A pioneering social bookmarking system

– http://del.icio.us/

• Our Data Set

– Dump for a limited period of time

– 4.3 M public, tagged bookmarks, 0.2 M users, 1.4 M bookmarked URLs

URL Popularity Follows Power Law

The distribution of URL bookmarking frequency. Most URLs are unpopular.

[Figure: log-log plot. x-axis: number of occurrences (log); y-axis: number of URLs (log)]

User Activity Follows Heavy-tail

The distribution of user bookmarking frequency. Most users are less active.

[Figure: log-log plot. x-axis: number of occurrences (log); y-axis: number of users (log)]

Tags vs. Keywords

URL: http://ka1fsb.home.att.net/resolve.html

Top tf keywords: domain, name, file, resolver, server, conf, network, nameserver, ip, org, ampr

Top tf-idf keywords: ampr, domain, jnos, nameserver, conf, ka1fsb, resolver, ip, file, name, server

All tags: linux, howto, network, sysadmin, dns
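The keyword rows above are rankings by term frequency (tf) and by tf-idf. The following is a minimal sketch of how such rankings could be computed, assuming whitespace-tokenized text and the common `tf * log(N / df)` weighting; the slides do not state which tokenizer or idf variant was used, and the toy `corpus` is invented for illustration.

```python
import math
from collections import Counter

def top_tf(doc_tokens, k):
    """Return the k most frequent terms of one document."""
    return [term for term, _ in Counter(doc_tokens).most_common(k)]

def top_tfidf(doc_tokens, corpus, k):
    """Return the k terms of a document with the highest tf * log(N / df).

    The document itself must be part of `corpus`, so every term has df >= 1.
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each term once per document
    tf = Counter(doc_tokens)
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus (not from the del.icio.us data set).
corpus = [
    "dns resolver dns server config".split(),
    "server config howto".split(),
    "photo sharing server".split(),
]
print(top_tf(corpus[0], 1))
print(top_tfidf(corpus[0], corpus, 2))
```

Terms that occur in every document (here `server`) get an idf of zero, which is why tf-idf rankings push rare, page-specific words such as `ampr` or `ka1fsb` to the top.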

Tag Vocabulary

[Figure, two panels: "Tag coverage for tf keywords" and "Tag coverage for tf-idf keywords". x-axis: fraction of keywords (TF or TFIDF) missed by tags; y-axis: CDF over all URLs; curves: Top 10, Top 20, Top 40]

User tags miss ≤ 20% of tf keywords for ≥ 98% of documents, and ≤ 10% of tf-idf keywords for ≥ 90% of documents.

Tags cover the most important keywords, yet the total number of unique tags is ~10x smaller than the number of unique keywords.
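A coverage measurement of this kind could be sketched as follows. This assumes that a keyword counts as "missed" when it does not occur anywhere in the tag vocabulary of the whole data set; the slide title suggests that reading, but the slides do not define it. The function names are hypothetical.

```python
def fraction_missed(top_keywords, tag_vocabulary):
    """Fraction of a URL's top keywords that are absent from the tag vocabulary."""
    if not top_keywords:
        return 0.0
    vocabulary = {t.lower() for t in tag_vocabulary}
    missed = [kw for kw in top_keywords if kw.lower() not in vocabulary]
    return len(missed) / len(top_keywords)

def cdf_at(values, threshold):
    """Empirical CDF: share of values that are <= threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Hypothetical example: two of four keywords are not in the vocabulary.
print(fraction_missed(["dns", "ampr", "linux", "jnos"], {"dns", "linux", "howto"}))
```

Computing `fraction_missed` once per URL and feeding the results to `cdf_at` gives points on a CDF curve like those described above.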

Tag Convergence

The total number of distinct tags that users apply to a given URL is limited, no matter how popular the URL is.

[Figure: x-axis: # of saves of URLs; y-axis: # of tags; series: Tag 0, Tag -1, Tag -2]

Tags Capture Concepts of Contents

Tag match ratio:

$$e(T, U) = \frac{\sum_{k \mid t_k \in U} w(t_k)}{\sum_{i} w(t_i)}$$

• Nearly 50% of all URLs have a tag match ratio of 1

• 70% of all URLs have a tag match ratio > 0.5

• Only 10% of the URLs have no matched tags

[Figure: x-axis: URL ids, normalized and ranked; y-axis: tag match ratio]
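A tag match ratio of this kind can be read as the weighted share of a URL's tags that also occur in some reference set. The sketch below is hedged accordingly: it assumes that T is the set of tags of one URL, that w(t) is a per-tag weight such as the number of users who applied the tag, and that U is the set T is matched against (for example, the keywords extracted from the page). The slides do not spell out these definitions.

```python
def tag_match_ratio(tag_weights, match_set):
    """Weight of the tags that also occur in match_set, over the total tag weight.

    tag_weights: {tag: weight} for one URL (assumed meaning of T and w).
    match_set:   set of terms the tags are compared against (assumed meaning of U).
    """
    total = sum(tag_weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for tag, w in tag_weights.items() if tag in match_set)
    return matched / total

# Hypothetical example: only "dns" (weight 1 of 4) is matched.
print(tag_match_ratio({"linux": 3, "dns": 1}, {"dns", "server"}))
```

A ratio of 1 means every tag of the URL is matched; a ratio of 0 means none is.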

From Tags to User Interests

• Bookmarks reflect user interests

• Tags summarize/describe bookmarked contents

– Metadata between users and contents

– Connect users and bookmarked contents

• Frequently used tag patterns reflect user interests

– The key is the co-occurrences of tags

Overview

• Motivation and Problem

• Analysis of tags in a social network

• ISID system design

• Evaluation

• Conclusion

System Design

• Find topics of interests
  – For a given set of tagged bookmarks, find all topics of interests, i.e., frequent co-occurrences of tags

• Clustering
  – For each topic, find all the URLs and the users such that those users have labeled each of the URLs with all the tags in the topic

• Indexing
  – Import the topics and their user and URL clusters into an indexing system for application queries

ISID Architecture

[Diagram: Data Source → (posts) → Topic Discovery → (topics, posts) → Clustering → (topics, clusters) → Indexing]

Topic Discovery

• Use association rule algorithms to discover co-occurring tag patterns
  – Originally invented for identifying items frequently bought together in supermarkets
    • E.g., bread and milk
  – Use a support number to define the frequency threshold
  – Efficient at finding frequent patterns in a large set of transactions for a given support number (threshold)
  – The rule-building part is not used

• One more step: remove pattern A if A is a sub-pattern of some other pattern B, and both A and B have the same support number
  – To remove duplicate clusters
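The two steps above (frequent-pattern search with a support threshold, then removal of sub-patterns with identical support) can be sketched as follows. This is a plain level-wise, Apriori-style search written for illustration; it is not necessarily the association rule implementation the authors used, and the function names and toy posts are hypothetical.

```python
from itertools import combinations

def frequent_patterns(posts, min_support):
    """Find tag sets that occur together in at least min_support posts."""
    posts = [frozenset(p) for p in posts]

    def support(pattern):
        return sum(pattern <= p for p in posts)

    # Level 1: frequent single tags.
    all_tags = {t for p in posts for t in p}
    level = {frozenset([t]) for t in all_tags
             if support(frozenset([t])) >= min_support}
    result = {}
    while level:
        for pattern in level:
            result[pattern] = support(pattern)
        # Candidates one tag larger, built by joining frequent patterns.
        size = len(next(iter(level))) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every sub-pattern must already be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in result for s in combinations(c, size - 1))
            and support(c) >= min_support
        }
    return result

def drop_redundant(patterns):
    """Remove pattern A if a strict super-pattern B has the same support."""
    return {
        a: s for a, s in patterns.items()
        if not any(a < b and s == sb for b, sb in patterns.items())
    }

# Hypothetical toy posts; each post is the tag set of one bookmark.
posts = [{"linux", "howto"}, {"linux", "howto", "dns"},
         {"linux", "howto"}, {"python"}]
topics = drop_redundant(frequent_patterns(posts, min_support=2))
print(topics)
```

In the toy data, `linux` and `howto` each occur in exactly the same three posts as the pair `{linux, howto}`, so the single-tag patterns are dropped and only the pair survives as a topic.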

Clustering

for all topic T do
    T.user ← ∅
    T.url ← ∅
for all post P do
    for all topic T of P do
        T.user ← T.user ∪ {P.user}
        T.url ← T.url ∪ {P.url}
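The clustering step could look like this in Python. It is a minimal sketch that assumes each post is a dict with `user`, `url`, and `tags` keys (a hypothetical layout) and that a topic belongs to a post when the post carries every tag of the topic.

```python
def cluster_posts(posts, topics):
    """Collect, for every topic, the users and URLs of posts carrying all its tags."""
    topics = [frozenset(t) for t in topics]
    # Start every topic with empty user and URL clusters.
    users = {t: set() for t in topics}
    urls = {t: set() for t in topics}
    for post in posts:
        tags = frozenset(post["tags"])
        for topic in topics:
            if topic <= tags:  # the post is labeled with every tag of the topic
                users[topic].add(post["user"])
                urls[topic].add(post["url"])
    return users, urls

# Hypothetical toy posts.
posts = [
    {"user": "u1", "url": "a", "tags": ["linux", "howto"]},
    {"user": "u2", "url": "b", "tags": ["linux", "howto", "dns"]},
    {"user": "u3", "url": "c", "tags": ["python"]},
]
users, urls = cluster_posts(posts, [{"linux", "howto"}])
print(users, urls)
```

This version scans every topic for every post, which is fine for a toy example; with many topics, an index from tag to topics would avoid the full scan.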

Indexing

• Find all URLs that contain a topic, i.e., URLs tagged with all the tags in the topic

• Find all users interested in a topic

• Find all topics containing a tag

• Find all topics for a user

• Find all topics for a URL

• Combination of the above
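The queries listed above map naturally onto a handful of inverted indexes. The following in-memory sketch is a toy stand-in for whatever indexing system ISID actually uses, which the slides do not name; the class and attribute names are hypothetical.

```python
from collections import defaultdict

class TopicIndex:
    """Toy in-memory inverted index over topic clusters."""

    def __init__(self):
        self.urls_of = defaultdict(set)          # topic -> URLs
        self.users_of = defaultdict(set)         # topic -> users
        self.topics_with_tag = defaultdict(set)  # tag   -> topics
        self.topics_of_user = defaultdict(set)   # user  -> topics
        self.topics_of_url = defaultdict(set)    # URL   -> topics

    def add(self, topic, users, urls):
        """Register one topic together with its user and URL clusters."""
        topic = frozenset(topic)
        self.users_of[topic] |= set(users)
        self.urls_of[topic] |= set(urls)
        for tag in topic:
            self.topics_with_tag[tag].add(topic)
        for user in users:
            self.topics_of_user[user].add(topic)
        for url in urls:
            self.topics_of_url[url].add(topic)

index = TopicIndex()
index.add({"linux", "howto"}, users={"u1", "u2"}, urls={"a", "b"})
index.add({"python", "web"}, users={"u2"}, urls={"c"})

# Combined query: topics of user u2 that contain the tag "linux".
print(index.topics_of_user["u2"] & index.topics_with_tag["linux"])
```

Combined queries reduce to set intersections over the individual lookups, as in the last line.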

Overview

• Motivation and Problem

• Analysis of tags in a social network

• ISID system design

• Evaluation

• Conclusion

Content Similarity of Topic Clusters

• Similarity of two documents
  – Inner product of tf-idf document vectors
    • Keyword-based vector
    • Tag-based vector (comparison)

• Intra-topic similarity
  – Average cosine similarity of every document pair

• Inter-topic similarity
  – Similarity of two topics
  – Average similarity of one topic to all other topics
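These similarity measures can be sketched as follows, assuming documents are sparse tf-idf vectors stored as `{term: weight}` dicts. Cosine similarity equals the inner product once the vectors are length-normalized. The cross-pair definition of the similarity between two topics is an assumption, since the slides do not give the exact formula.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def intra_topic_similarity(docs):
    """Average cosine similarity over every pair of documents in one cluster."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def inter_topic_similarity(docs_a, docs_b):
    """Average cosine similarity over all cross pairs of two clusters (assumed)."""
    if not docs_a or not docs_b:
        return 0.0
    total = sum(cosine(a, b) for a in docs_a for b in docs_b)
    return total / (len(docs_a) * len(docs_b))

# Hypothetical toy vectors.
linux_docs = [{"dns": 1.0}, {"dns": 3.0}]
photo_docs = [{"photo": 1.0}]
print(intra_topic_similarity(linux_docs))
print(inter_topic_similarity(linux_docs, photo_docs))
```

The same functions work for keyword-based and tag-based vectors; only the dict contents change.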

Inter- and Intra- Topic Similarity

• Intra-topic similarity is significantly higher than inter-topic similarity

• Tag co-occurrence clusters similar content well

• Tag-based similarity is quite close to keyword-based similarity

[Figure, two panels: "Keyword based (tf-idf)" and "Tag based (tf-idf)". x-axis: topic rank; y-axis: average cosine similarity; curves: intra-topic, inter-topic]

Inter-topic Similarity

[Figure, two panels: "Tag-based (tf-idf)" and "Keyword-based (tf-idf)". x-axis: number of overlapped tags; y-axis: average cosine similarity. Caption: Similarity of two topics with different number of overlapped tags]

Inter-topic similarity increases with number of co-occurring tags. Tag co-occurrences capture similar contents.

User Interest Coverage

• 90% of users have ≥ 90% of their top 5 tags covered

• 87% of users have ≥ 90% of their top 10 tags covered

• 90% of users have ≥ 80% of all their tags covered

[Figure: x-axis: fraction of top tags covered by topics; y-axis: CDF of fraction of users; curves: of top 5 tags, of top 10 tags, of all tags]

The topics discovered by ISID capture the interests of users.

Human Reviews

[Figure: x-axis: # of interest topic; y-axis: average score]

According to human reviewers, ISID does group related URLs into the cluster of each topic defined by user tags.

Scores:

1, Highly unrelated

2, Unrelated

3, Not sure

4, Related

5, Highly related

Cluster Properties

Cluster size follows a power law. User interests follow a power law. There exist really hot topics!

[Figure: log-log plot. x-axis: number of posts (log); y-axis: number of clusters (log)]

Cluster Properties

Most topics have fewer than 6 tags. Beyond 6, the number of clusters drops quickly.

[Figure: x-axis: topic size (number of tags); y-axis: number of clusters (log)]

Overview

• Motivation and Problem

• Data and Their Properties

• ISID system

• Evaluation

• Conclusion

Conclusion

• Tags reflect human judgments on contents

• Co-occurring tags are effective at representing user interests
  – Reflect human understanding of different but similar web contents
  – Consensus of judgments among users

• ISID system
  – Topic discovery, clustering, indexing
  – Evaluation results are promising