
Personalized Social Search Based on the User’s Social Network

David Carmel et al., IBM Research Lab in Haifa, Israel
CIKM’09

16 February 2011
Presentation @ IDB Lab Seminar

IDB Tagging Team, School of CSE, SNU
Presented by Kangpyo Lee


Outline
Introduction
Related Work
User Profiles
Evaluation
Summary & Discussion


Introduction (1)

- Personalized Search

Personalizing the search process
– Done by considering the searcher’s personal attributes & preferences while evaluating a query
– A great challenge that has been extensively studied in the IR community
– Of great interest since user queries are, in general, very short and provide an incomplete specification of individual users’ information needs


Introduction (2)

- Personalized Search

Personalization
– Requires the capability of modeling the users’ preferences & interests
– Usually done by tracking and aggregating users’ interactions with the system
  E.g., users’ previous queries, click-through analysis, and even eye tracking
– Users’ interactions are structured into a user profile
– A user profile is usually employed in two main scenarios
  Personalized query expansion
  Re-ranking & filtering

Difficulties of aggregating user interactions
– 1. Many users consider user profiling an activity that may violate their privacy
– 2. Previous user interactions do not always provide a good indication of current needs
– 3. Personalized search results make it more difficult to justify the relevance of a specific result for a given query


Introduction (3)

- Personalized Social Search

Social search
– There are several alternative definitions of the concept of social search
– In this work, it means the search process over “social” data gathered from Web 2.0 applications
  E.g., social bookmarking systems, wikis, blogs, forums, SNSs, etc.
– Provides an ideal testbed for personalization due to the explicit user interactions through Web 2.0 tools
  1. A user profile derived from user feedback (bookmarking, rating, commenting, blogging, etc.) provides a very good indication of the user’s interests
  2. When the user’s social network (SN) is available, the preferences of the user’s related people can be utilized to assist in obtaining the user’s preferences
     – Assuming closely related people have similar interests
     – Collaborative filtering (CF)


Introduction (4)

- SN-Based Personalized Social Search

In this work we study personalized social search in the enterprise based on the social relations of the searcher

We focus on re-ranking of search results
– By considering their relationships to users who belong to the searcher’s SN

Personalized re-ranking
– Given a list of (non-personalized) results retrieved for the user’s query and a list of users related to him/her through his/her SN
– Search results are re-ranked by considering their relationship strength with those users
  Documents that are strongly related to the user’s related people are boosted accordingly
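A minimal sketch of the re-ranking step described above, assuming a simple linear interpolation: each document’s non-personalized score is mixed with a social score equal to the sum, over related users v in N(u), of w(u, v) · w(v, d). The function name `rerank`, the mixing weight `beta`, and the dictionary inputs are hypothetical; the slides do not give the paper’s exact scoring formula or normalization.

```python
from typing import Dict, List, Tuple


def rerank(
    results: List[Tuple[str, float]],        # (doc_id, non-personalized score)
    related_users: Dict[str, float],         # N(u): related user v -> strength w(u, v)
    user_doc: Dict[str, Dict[str, float]],   # user v -> {doc_id: strength w(v, d)}
    beta: float = 0.5,                       # hypothetical mixing weight
) -> List[Tuple[str, float]]:
    """Boost documents that are strongly related to the searcher's related people."""
    reranked = []
    for doc, base_score in results:
        # Social score: how strongly the searcher's related people relate to this document.
        social = sum(
            w_uv * user_doc.get(v, {}).get(doc, 0.0)
            for v, w_uv in related_users.items()
        )
        reranked.append((doc, beta * base_score + (1 - beta) * social))
    # Highest combined score first; Python's sort is stable, so ties keep the original order.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

For example, with results d1 (0.9), d2 (0.8), d3 (0.7), N(u) = {alice: 1.0, bob: 0.5}, alice related to d3 with strength 0.8 and bob related to d2 with strength 0.4, the new order is d3, d2, d1: d3 jumps to the top because the searcher’s closest related person is strongly tied to it.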


Introduction (5)


SaND (SociAl Network & Discovery)
– An enterprise social search system used in IBM
– Used to retrieve the user’s social network and the user-document relationship matrix
– Provides for each user a ranked list of related people, who relate to the user either
  through explicit familiarity connections (e.g., co-authorship of a wiki page or a connection within an SNS)
  or by some kind of similarity as reflected by their social activity (e.g., usage of the same tags or commenting on the same blog entry)
– Provides for each user all related documents (e.g., web pages, blog entries), each associated with a relationship strength to the user
– The relative strength of each relationship type is determined by an appropriate weight
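A minimal sketch of how per-type weights could combine several relationship types into a single user-entity strength. The relation-type names and weight values below are hypothetical placeholders; the slides state only that each type’s relative strength is set by an appropriate weight.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-type weights; SaND's actual values are not given in these slides.
TYPE_WEIGHTS: Dict[str, float] = {
    "bookmarked": 1.0,
    "commented": 0.5,
    "co_authored": 0.8,
}


def relationship_strength(
    relations: Iterable[Tuple[str, str, str]],  # (user, entity, relation type)
    weights: Dict[str, float] = TYPE_WEIGHTS,
) -> Dict[Tuple[str, str], float]:
    """Sum the weights of all typed relations observed between each (user, entity) pair."""
    strength: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, entity, rel_type in relations:
        # Relation types without a weight contribute nothing.
        strength[(user, entity)] += weights.get(rel_type, 0.0)
    return dict(strength)
```

A user who both bookmarked and commented on a document ends up more strongly related to it (1.0 + 0.5 = 1.5 under these placeholder weights) than one who only commented.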


Introduction (6)


SN-based personalization considering three social network types
– (1) familiarity-based network
– (2) similarity-based network
– (3) overall network

Additionally, topic-based personalization
– Considers the relevance of the search results to the user’s topics of interest
– These topics are approximated by a set of terms
  Tags used by the user to bookmark documents
  Tags used by others to tag that user
– Promotes search results that were tagged with these user terms
– Used as a comparative baseline for the SN-based personalized approach to social search
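A minimal sketch of topic-based promotion. It assumes that a document’s topic score is the summed weight of the user’s profile terms T(u) found among the document’s tags, and that this score is interpolated with the original score in the same way as in SN-based re-ranking. The function name, the `beta` weight, and the scoring rule are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def topic_rerank(
    results: List[Tuple[str, float]],   # (doc_id, non-personalized score)
    user_terms: Dict[str, float],       # T(u): term -> strength of its relation to the user
    doc_tags: Dict[str, Set[str]],      # doc_id -> set of tags assigned to the document
    beta: float = 0.5,                  # hypothetical mixing weight
) -> List[Tuple[str, float]]:
    """Promote results that are tagged with terms from the user's topic-based profile."""
    reranked = []
    for doc, base_score in results:
        tags = doc_tags.get(doc, set())
        # Topic score: total weight of the user's terms that also tag this document.
        topic = sum(w for term, w in user_terms.items() if term in tags)
        reranked.append((doc, beta * base_score + (1 - beta) * topic))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

No social network data is involved here. This independence from SN data is why the slides use topic-based personalization as the baseline.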


Introduction (7)


Evaluation by off-line study
– Given a user u who bookmarked a document d with the tag t, we assume that if u searches for t, he will consider d relevant for t
– Any triplet (u, d, t) can be used as a personalized query for evaluation
– The higher the rank of documents tagged by u with t, the better the personalization method
– The main drawback is that documents that were not tagged by u are considered irrelevant

Evaluation by on-line study
– A survey of 240 randomly chosen IBM employees


Related Work

Personalized search
– Many researchers utilize query logs & click-through analysis for web search personalization
– In addition to regular web log data, several works consider using desktop data & external resources
– New approaches for adaptive personalization focus on the user’s task & current activity context

Social search
– The amount of social data is growing rapidly, and this data has become a main focus of research on social search
– Tags & other conceptual structures emerging in social systems are typically modeled as graphs

Personalized social search
– Directly or indirectly employing users’ social relations for personalization


User Profiles (1)

- System Description

IBM Lotus Connections (LC)
– A social software application suite for organizations
– Five social SW applications
  Profiles (of all employees)
  A social bookmarking system, Dogear (743,239 bookmarks, 1,943,464 tags, 17,390 users)
  A blogging service, Blog Central (16,337 blogs, 144,263 blog entries, 69,947 users)
  A communities service (2,100 online communities, 50,000 members)
  Activities

SaND is used as an aggregation tool for information discovery & analysis over the social data gathered from all LC’s applications


User Profiles (2)

- System Description

Entity-Entity relationship strength
– Direct relations
– Indirect relations
  Two entities are indirectly related if both are directly related to the same entity (level two)
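A minimal sketch of level-two (indirect) relations, in which two entities become related when both are directly related to a shared entity. The slides do not specify how SaND scores these relations. Here the indirect strength is assumed to be the sum, over shared entities, of the product of the two direct strengths, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Tuple


def indirect_relations(
    direct: Dict[str, Dict[str, float]],  # entity -> {directly related entity: strength}
) -> Dict[Tuple[str, str], float]:
    """Derive level-two relations between entities that share a direct neighbor."""
    indirect: Dict[Tuple[str, str], float] = defaultdict(float)
    for shared, neighbors in direct.items():
        # Every pair linked to the same `shared` entity is indirectly related;
        # pairs are keyed in sorted order so (a, b) and (b, a) accumulate together.
        for (a, w_a), (b, w_b) in combinations(sorted(neighbors.items()), 2):
            indirect[(a, b)] += w_a * w_b
    return dict(indirect)
```

For example, two co-authors of the same wiki page become indirectly related through that page, and each further shared entity strengthens the relation.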


User Profiles (3)

- User Profile Types

Familiarity SN
– Direct familiarity relation if
  Both persons are marked as friends
  One is the direct manager of the other
  A person is familiar with those s/he tagged, but not vice versa
– Indirect familiarity relation when
  The two persons are both authors of the same paper, patent, or wiki page
  Both have a common manager (team members)

Similarity SN
– Similarity between two individuals according to common activity
  Co-usage of the same tag
  Co-tagging of the same document
  Co-membership of the same community
  Co-commenting on the same blog entry
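A minimal sketch of one similarity signal listed above, co-usage of the same tag. It measures the Jaccard overlap of two users’ tag sets and ranks the other users by that overlap. Jaccard is a stand-in chosen for illustration, because the slides do not say how SaND scores common activity. Both function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tag_similarity(tags_a: Set[str], tags_b: Set[str]) -> float:
    """Jaccard overlap of two users' tag vocabularies (0.0 when both are empty)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def most_similar(user: str, user_tags: Dict[str, Set[str]], k: int) -> List[Tuple[str, float]]:
    """Rank other users by tag overlap with `user`, keeping the top k with nonzero similarity."""
    mine = user_tags.get(user, set())
    scored = [
        (other, tag_similarity(mine, tags))
        for other, tags in user_tags.items()
        if other != user
    ]
    scored = [pair for pair in scored if pair[1] > 0.0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The other activity signals (co-tagging the same document, co-membership, co-commenting) could be scored the same way over sets of documents, communities, or blog entries.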


User Profiles (4)

- User Profile Types

Overall SN
– Contains all related persons according to the full relationship model

Topic-based
– Directly related terms
  Tags used by the user to tag documents and other people
  Tags used by others to tag that user
– Indirectly related terms
  Terms related to the user through other entities (e.g., all tags of a document bookmarked by the user)
– The user’s top related terms serve as the user’s Topic-based profile


User Profiles (5)

- Personalizing the Search

A user profile is constructed on the fly when a person logs into the system

For a user u, two lists are retrieved
– N(u): the ranked list of users related to u
– T(u): the ranked list of terms related to u

Given the user profile P(u) = (N(u), T(u)), the search results are re-ranked
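A minimal sketch of assembling P(u) = (N(u), T(u)) on the fly from precomputed relationship strengths. Each list is truncated to its top-k entries, and the two k values control the profile size examined in the evaluation. The `UserProfile` class, the input dictionaries, and the default k = 5 are hypothetical.

```python
from dataclasses import dataclass
from heapq import nlargest
from typing import Dict, List, Tuple


@dataclass
class UserProfile:
    """P(u) = (N(u), T(u)): ranked related users and ranked related terms."""
    related_users: List[Tuple[str, float]]  # N(u)
    related_terms: List[Tuple[str, float]]  # T(u)


def build_profile(
    user: str,
    user_user: Dict[str, Dict[str, float]],  # user -> {related user: strength}
    user_term: Dict[str, Dict[str, float]],  # user -> {related term: strength}
    k_users: int = 5,                        # hypothetical default profile sizes
    k_terms: int = 5,
) -> UserProfile:
    """Keep only the k strongest related users and terms for `user`, strongest first."""
    people = user_user.get(user, {})
    terms = user_term.get(user, {})
    return UserProfile(
        related_users=nlargest(k_users, people.items(), key=lambda kv: kv[1]),
        related_terms=nlargest(k_terms, terms.items(), key=lambda kv: kv[1]),
    )
```

`heapq.nlargest` avoids fully sorting long relationship lists when only a handful of entries are kept.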


Evaluation (1)

- Methodologies

Evaluating personalized search is a great challenge
– Relevance judgments can only be assessed by the searchers themselves
– Existing evaluation approaches are often based on a user study
  Participants are asked to judge the search results for their personal queries in a personal manner
  Very expensive
– Users’ implicit feedback, such as clicking on a specific result, can be interpreted as a personal relevance judgment

Given the bookmark (u, d, t), a personalized search system is evaluated according to its ability to rank the corresponding document d highly when u searches for t


Evaluation (2)

- Methodologies

A delicate issue
– The search system is already “aware of” the association between d & t, as realized by u
– Over-tuning

How to eliminate the dependency between personalization & evaluation
– Mask u’s bookmarking of d
  For each personal query (u, d, t), we first “hide” that bookmark from the search system before handling the query (u, t)
  The system is instructed to act as if this specific bookmarking never happened
  – d’s content is not enriched by the tag t
  – t is taken out of the user profile
  – u’s relations with other entities that are based on this bookmark are modified accordingly
– This masking guarantees that personalization is evaluated without any prior knowledge of u’s relations with d and t
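A minimal sketch of the masking step. It assumes bookmarks are stored as (user, document, tag) triplets, that only the one queried triplet is removed, and that the user’s terms and the document’s tags are recomputed from the masked copy. The function names are hypothetical. In the actual system, the affected relations are “modified accordingly” and are not necessarily rebuilt from scratch.

```python
from typing import List, Set, Tuple

Bookmark = Tuple[str, str, str]  # (user u, document d, tag t)


def mask_bookmark(bookmarks: List[Bookmark], query: Bookmark) -> List[Bookmark]:
    """Return the bookmark data as if the bookmark (u, d, t) had never been made."""
    return [b for b in bookmarks if b != query]


def user_terms(bookmarks: List[Bookmark], user: str) -> Set[str]:
    """Tags the user has applied, recomputed from the (masked) data."""
    return {t for u, _, t in bookmarks if u == user}


def doc_tags(bookmarks: List[Bookmark], doc: str) -> Set[str]:
    """Tags that enrich a document's content, recomputed from the (masked) data."""
    return {t for _, d, t in bookmarks if d == doc}
```

Once the queried bookmark is masked, the personalized query (u, t) is run against profiles and document content built from the masked data only.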


Evaluation (3)

- Methodologies

Still suffers from the incompleteness problem
– Not all documents tagged by u with t are relevant for u searching for t
– Documents not tagged by u with t are not necessarily irrelevant

We confirm our findings with an extensive user survey
– 240 participants
– 577 personal queries


Evaluation (4)

- Off-line Study

Process
– We randomly selected 2,000 bookmarks (u, d, t)
– Then t was submitted as a query and 1,000 documents were retrieved
– The search results were re-ranked using u’s profile
– Evaluated by measuring mean average precision (MAP) and mean reciprocal rank (MRR)

Main results
– α = β = 0.5, top-5 related people, top-5 related terms
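A minimal sketch of the two measures under the off-line assumption that the relevant documents for a query (u, t) are the documents u tagged with t. Average precision is computed over the retrieved list and divided by the number of relevant documents, so relevant documents that were never retrieved count as zero. The function names are illustrative and do not reproduce any particular evaluation tool.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which relevant documents appear."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0


def mean_scores(runs: List[Sequence[str]], qrels: List[Set[str]]) -> Tuple[float, float]:
    """Return (MAP, MRR) over paired re-ranked lists and relevance sets."""
    n = len(runs)
    map_score = sum(average_precision(r, q) for r, q in zip(runs, qrels)) / n
    mrr_score = sum(reciprocal_rank(r, q) for r, q in zip(runs, qrels)) / n
    return map_score, mrr_score
```

If a query has exactly one relevant document, AP and RR coincide. MAP and MRR diverge only when u tagged several documents with the same t.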


Evaluation (5)

- Off-line Study

Interesting insights
– 1. All personalized methods significantly outperform the non-personalized one
– 2. The Similarity-SN significantly outperforms the Familiarity-SN & Overall-SN
  Similarity relations predict users’ preferences better than familiarity relations
  (+_+) We do not have a good explanation for the inferiority of the Overall-SN
– 3. Topic-based personalization with no SN data improves the search significantly, even outperforming the Familiarity-SN & Overall-SN


Evaluation (6)

- Off-line Study

User profile size
– The size of the user profile is determined by the lengths of the lists N(u) & T(u)
– Adding too many people or terms to the user profile risks over-personalizing the results
– Finding an “optimal” user profile size is therefore important

An optimal user profile should be based on a few similar people & a few related terms


Evaluation (7)

- User Survey


Summary & Discussion

We investigated personalized social search based on the user’s social relations

We studied the effectiveness of several social network types for personalization

Our results showed that
– According to both evaluations, SN-based personalization significantly outperforms non-personalized social search
– As reflected in our user survey, all three SN-based strategies significantly outperform the Topic-based strategy, which improves only slightly over non-personalized results

Thank You! Any Questions or Comments?
