Personalized Social Search Based on the User’s Social Network
David Carmel et al., IBM Research Lab in Haifa, Israel, CIKM’09
Presentation @ IDB Lab Seminar, 16 February 2011
Presented by Kangpyo Lee, IDB Tagging Team, School of CSE, SNU

Outline
Introduction
Related Work
User Profiles
Evaluation
Summary & Discussion

Introduction (1)
- Personalized Search
Personalizing the search process
– Done by considering the searcher’s personal attributes & preferences while evaluating a query
– A great challenge that has been extensively studied in the IR community
– Of great interest, since user queries are in general very short and provide an incomplete specification of individual users’ information needs

Introduction (2)
- Personalized Search
Personalization
– Requires the capability of modeling the users’ preferences & interests
– Usually done by tracking and aggregating users’ interactions with the system
  E.g., users’ previous queries, click-through analysis, and even eye tracking
– Users’ interactions are structured into a user profile
– A user profile is usually employed in two main scenarios
  Personalized query expansion
  Re-ranking & filtering
Difficulties of aggregating user interactions
– 1. Many users consider user profiling an activity that may violate their privacy
– 2. Previous user interactions do not always provide a good indication of current needs
– 3. Personalized search results make it more difficult to justify the relevance of a specific result for a given query

Introduction (3)
- Personalized Social Search
Social search
– There are several alternative definitions of the concept of social search
– In this work, it means the search process over “social” data gathered from Web 2.0 applications
  E.g., social bookmarking systems, wikis, blogs, forums, SNSs, etc.
– Provides an ideal testbed for personalization due to the explicit user interactions through Web 2.0 tools
  1. A user profile derived from user feedback (bookmarking, rating, commenting, blogging, etc.) provides a very good indication of the user’s interests
  2. When the user’s social network (SN) is available, the preferences of the user’s related people can be utilized to assist in obtaining the user’s preferences
    – Assuming closely related people have similar interests
    – Collaborative filtering (CF)

Introduction (4)
- SN-Based Personalized Social Search
In this work we study personalized social search in the enterprise based on the social relations of the searcher
We focus on re-ranking of search results
– By considering their relationships to users that belong to the searcher’s SN
Personalized re-ranking
– Given a list of (non-personalized) results retrieved for the user’s query and a list of related users from his/her SN
– Search results are re-ranked by considering their relationship strength with those users
  Documents that are strongly related to the user’s related people are boosted accordingly

Introduction (5)
- SN-Based Personalized Social Search
SaND (SociAl Network & Discovery)
– An enterprise social search system used in IBM
– Used to retrieve the user’s social network and the user-document relationship matrix
– Provides, for each user, related people
  A ranked list of people who relate to the user either
  – through explicit familiarity connections (e.g., co-authorship of a wiki page or a connection within an SNS)
  – or by some kind of similarity as reflected by their social activity (e.g., usage of the same tags or commenting on the same blog entry)
– Provides, for each user, all related documents (e.g., web pages, blog entries), each associated with a relationship strength to the user
– The relative strength of each relationship type is determined by an appropriate weight
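As a rough illustration of how per-type weights could turn observed relations into a ranked list of related people, here is a minimal sketch. The relation type names, the weight values, and the function `relationship_strengths` are all hypothetical; the slides do not state SaND's actual weights or aggregation rule, so a simple weighted sum is assumed.

```python
from collections import defaultdict

# Hypothetical weights per relationship type (not taken from the slides).
RELATION_WEIGHTS = {
    "friend": 1.0,
    "co_author": 0.8,
    "same_tag": 0.4,
    "same_blog_comment": 0.3,
}


def relationship_strengths(relations, weights=RELATION_WEIGHTS):
    """Sum weighted relation evidence per related entity.

    relations: iterable of (entity, relation_type) pairs observed for one user.
    Returns (entity, strength) pairs sorted from strongest to weakest;
    unknown relation types contribute nothing.
    """
    strengths = defaultdict(float)
    for entity, relation_type in relations:
        strengths[entity] += weights.get(relation_type, 0.0)
    # Sort by descending strength, breaking ties alphabetically.
    return sorted(strengths.items(), key=lambda item: (-item[1], item[0]))


observed = [
    ("alice", "friend"),
    ("alice", "same_tag"),
    ("bob", "same_tag"),
    ("carol", "co_author"),
]
ranked_people = relationship_strengths(observed)
print([name for name, _ in ranked_people])  # alice, then carol, then bob
```

With these toy weights, a person connected through several relation types accumulates more strength than one connected through a single weak type.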

Introduction (6)
- SN-Based Personalized Social Search
SN-based personalization considering three social network types
– (1) familiarity-based network
– (2) similarity-based network
– (3) overall network
Additionally, topic-based personalization
– Considers the relevance of the search results to the user’s topics of interest
– These topics are approximated by a set of terms
  Tags used by the user to bookmark documents
  Tags used by others to tag that user
– Promotes search results that were tagged with the user’s terms
– Used as a comparative baseline for the SN-based personalized approach to social search

Introduction (7)
- SN-Based Personalized Social Search
Evaluation by off-line study
– Given a user u who bookmarked a document d with the tag t, we assume that if u searches for t, u will consider d relevant for t
– Any triplet (u, d, t) can be used as a personalized query for evaluation
– The higher the rank of documents tagged by u with t, the better the personalization method
– The main drawback is that documents that were not tagged by u are considered irrelevant
Evaluation by on-line study
– A survey of 240 randomly chosen IBM employees

Related Work
Personalized search
– Many researchers utilize query logs & click-through analysis for web search personalization
– In addition to regular web log data, several works consider using desktop data & external resources
– New approaches for adaptive personalization focus on the user task & the current activity context
Social search
– The amount of social data is rapidly growing and has become a main focus of research on social search
– Tags & other conceptual structures emerging in social systems are typically modeled as graphs
Personalized social search
– Directly or indirectly employing users’ social relations for personalization

User Profiles (1)
- System Description
IBM Lotus Connections (LC)
– A social software application suite for organizations
– Five social SW applications
  Profiles (of all employees)
  A social bookmarking system, Dogear (743,239 bookmarks, 1,943,464 tags, 17,390 users)
  A blogging service, Blog Central (16,337 blogs, 144,263 blog entries, 69,947 users)
  A communities service (2,100 online communities, 50,000 members)
  Activities
SaND is used as an aggregation tool for information discovery & analysis over the social data gathered from all of LC’s applications

User Profiles (2)
- System Description
Entity-Entity relationship strength
– Direct relations
– Indirect relations
  Two entities are indirectly related if both are directly related to the same entity
  Level two
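The definition of an indirect relation can be sketched as a two-hop walk over the direct-relation graph. The slides do not say how indirect strengths are aggregated, so the product-then-sum rule below is an assumption, and the function name `indirect_strengths` and the toy graph are hypothetical.

```python
def indirect_strengths(direct, source):
    """Score entities that share a directly related entity with `source`.

    direct: dict mapping entity -> {neighbor: direct strength}.
    Assumption: the strength of a two-hop path is the product of its two
    direct strengths, and parallel paths are summed.
    """
    scores = {}
    for middle, first_hop in direct.get(source, {}).items():
        for target, second_hop in direct.get(middle, {}).items():
            if target == source:
                continue  # a path back to the source is not a relation
            scores[target] = scores.get(target, 0.0) + first_hop * second_hop
    return scores


# Toy graph: user u bookmarked doc1 and doc2; users v and w touched them too.
direct = {
    "u": {"doc1": 1.0, "doc2": 0.5},
    "doc1": {"u": 1.0, "v": 0.5},
    "doc2": {"u": 0.5, "v": 1.0, "w": 0.2},
}
print(indirect_strengths(direct, "u"))
```

In this toy graph, v is reached through both documents and therefore ends up with a higher indirect strength than w, which shares only one weakly related document with u.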

User Profiles (3)
- User Profile Types
Familiarity SN
– Direct familiarity relation if
  Both persons are marked as friends
  One is the direct manager/employer of the other
  A person is familiar with those s/he tagged, but not vice versa
– Indirect familiarity relation when
  The two persons are both authors of the same paper, patent, or wiki page
  Both have a common manager (team members)
Similarity SN
– Similarity between two individuals according to common activity
  Co-usage of the same tag
  Co-tagging of the same document
  Co-membership of the same community
  Co-commenting on the same blog entry

User Profiles (4)
- User Profile Types
Overall SN
– Contains all related persons according to the full relationship model
Topic-based
– Directly related terms
  Tags used by the user to tag documents and other people
  Tags used by others to tag that user
– Indirectly related terms
  Those that are related to the user through other entities (e.g., all tags of a document bookmarked by the user)
– The user’s top related terms serve as the user’s Topic-based profile
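A minimal sketch of assembling a topic-based profile from the three tag sources listed above. The counting scheme, the discount applied to indirectly related tags (`indirect_weight`), and the name `topic_profile` are assumptions for illustration; the slides only say that the top related terms form the profile.

```python
from collections import Counter


def topic_profile(own_tags, tags_from_others, indirect_tags=(),
                  top_k=5, indirect_weight=0.5):
    """Return the top_k (term, score) pairs for one user.

    own_tags: tags the user applied to documents or people.
    tags_from_others: tags other people applied to the user.
    indirect_tags: tags reached through other entities, e.g. all tags of a
        document the user bookmarked; discounted by indirect_weight
        (a hypothetical parameter).
    """
    scores = Counter()
    for tag in own_tags:
        scores[tag] += 1.0
    for tag in tags_from_others:
        scores[tag] += 1.0
    for tag in indirect_tags:
        scores[tag] += indirect_weight
    # Highest score first, ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


profile = topic_profile(
    own_tags=["search", "search", "java"],
    tags_from_others=["search", "lucene"],
    indirect_tags=["java", "ir"],
    top_k=3,
)
print(profile)
```

Keeping `top_k` small mirrors the idea that only the user's strongest terms should enter the profile.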

User Profiles (5)
- Personalizing the Search
A user profile is constructed on the fly when a person logs into the system
For a user u, two lists are retrieved
– N(u): the ranked list of users related to u
– T(u): the ranked list of terms related to u
Given the user profile, P(u) = (N(u), T(u)), the search results are re-ranked
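One way to picture re-ranking with P(u) = (N(u), T(u)) is a linear interpolation between the non-personalized score and a profile-based score. The exact scoring formula is not shown in this transcript, so the form below, including the roles given to `alpha` and `beta`, is an assumption; all names and data structures are hypothetical.

```python
def rerank(results, doc_people, doc_terms, people, terms, alpha=0.5, beta=0.5):
    """Re-rank search results with a user profile.

    results: list of (doc_id, non_personalized_score).
    people: dict user -> strength, standing in for N(u).
    terms: dict term -> strength, standing in for T(u).
    doc_people: dict doc_id -> {user: strength of the user-document relation}.
    doc_terms: dict doc_id -> {term: strength of the term-document relation}.
    Assumed score:
        alpha * base + (1 - alpha) * (beta * sn_part + (1 - beta) * topic_part)
    """
    rescored = []
    for doc_id, base in results:
        related_people = doc_people.get(doc_id, {})
        related_terms = doc_terms.get(doc_id, {})
        sn_part = sum(w * related_people.get(v, 0.0) for v, w in people.items())
        topic_part = sum(w * related_terms.get(t, 0.0) for t, w in terms.items())
        personal = beta * sn_part + (1 - beta) * topic_part
        rescored.append((doc_id, alpha * base + (1 - alpha) * personal))
    # Highest personalized score first, ties broken by document id.
    return sorted(rescored, key=lambda item: (-item[1], item[0]))


results = [("d1", 0.9), ("d2", 0.8)]
people = {"alice": 1.0}      # toy N(u)
terms = {"search": 1.0}      # toy T(u)
doc_people = {"d2": {"alice": 1.0}}
doc_terms = {"d2": {"search": 0.5}}
reranked = rerank(results, doc_people, doc_terms, people, terms)
print([doc for doc, _ in reranked])  # d2 now ranks ahead of d1
```

Setting `beta` to 1 uses only the social network part of the profile and setting it to 0 uses only the topic part, which is one way to compare the two kinds of profile under the same code path.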

Evaluation (1)
- Methodologies
Evaluating personalized search is a great challenge
– Relevance judgments can only be assessed by the searchers themselves
– Existing evaluation approaches are often based on a user study
  Participants are asked to judge the search results for their personal queries in a personal manner
  Very expensive
– Users’ implicit feedback, such as clicking on a specific result, can be interpreted as a personal relevance judgment
Given the bookmark (u, d, t), a personalized search system is evaluated according to its ability to rank the corresponding documents highly

Evaluation (2)
- Methodologies
A delicate issue
– The search system is already “aware of” the association between d & t, as realized by u
– Over-tuning
How to eliminate the dependency between personalization & evaluation
– Mask u’s bookmarking of d
  For each personal query (u, d, t), we first “hide” that bookmark from the search system before handling the query (u, t)
  The system is instructed to behave as if this specific bookmarking had never happened
  – d’s content is not enriched by the tag t
  – t is taken out of the user profile
  – u’s relations with other entities that are based on this bookmark are modified accordingly
– This masking guarantees that personalization is evaluated without any prior knowledge of u’s relations with d and t
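The masking step can be sketched over a plain set of (user, document, tag) triplets. This is a simplified stand-in for what happens inside the search system: the helper names are hypothetical, and a real system would also need to update its index and relationship graph rather than a Python set.

```python
def mask_bookmark(bookmarks, query):
    """Return a copy of the bookmark set without the evaluated triplet."""
    return {bookmark for bookmark in bookmarks if bookmark != query}


def tags_on_document(bookmarks, doc):
    """Tags that enrich a document, derived from the visible bookmarks."""
    return {tag for (_, d, tag) in bookmarks if d == doc}


def tags_of_user(bookmarks, user):
    """Tags in a user's profile, derived from the visible bookmarks."""
    return {tag for (u, _, tag) in bookmarks if u == user}


bookmarks = {
    ("u1", "d1", "java"),
    ("u2", "d1", "java"),
    ("u1", "d2", "search"),
}
query = ("u1", "d1", "java")  # evaluate the personal query (u1, "java")
visible = mask_bookmark(bookmarks, query)

print(tags_of_user(visible, "u1"))      # "java" no longer comes from u1
print(tags_on_document(visible, "d1"))  # still tagged "java", but only via u2
```

Deriving the document tags and the user profile from the masked set, rather than patching them separately, keeps the three consequences listed above consistent with each other.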

Evaluation (3)
- Methodologies
Still suffers from the incompleteness problem
– Not all documents tagged by u with t are relevant for u searching for t
– Not all documents not tagged by u with t are necessarily irrelevant
Confirm our findings with an extensive user survey
– 240 participants
– 577 personal queries

Evaluation (4)
- Off-line Study
Process
– We randomly selected 2000 bookmarks (u, d, t)
– Then t was submitted as a query and 1000 documents were retrieved
– The search results were re-ranked using u’s profile
– Evaluated by measuring mean average precision (MAP) and mean reciprocal rank (MRR)
Main results
– α = β = 0.5, top-5 related people, top-5 related terms
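For reference, the two measures named in the process can be computed as follows. This is a standard textbook formulation of MAP and MRR, not code from the study; the function names and toy rankings are illustrative only.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def evaluate(runs):
    """runs: list of (ranking, relevant_set) pairs. Returns (MAP, MRR)."""
    if not runs:
        return 0.0, 0.0
    count = len(runs)
    map_score = sum(average_precision(r, rel) for r, rel in runs) / count
    mrr_score = sum(reciprocal_rank(r, rel) for r, rel in runs) / count
    return map_score, mrr_score


# Two toy personal queries: the bookmarked document is at rank 2, then rank 1.
runs = [
    (["d3", "d1", "d2"], {"d1"}),
    (["d5", "d4"], {"d5"}),
]
print(evaluate(runs))  # MAP = 0.75, MRR = 0.75
```

In the off-line setting described above, each run would pair the re-ranked list for a query t with the documents that u tagged with t.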

Evaluation (5)
- Off-line Study
Interesting insights
– 1. All personalized methods significantly outperform the non-personalized one
– 2. The Similarity-SN significantly outperforms the Familiarity & Overall-SN
  Similarity relations better predict the users’ preferences than familiarity relations
  (+_+) We do not have a good explanation for the inferiority of the Overall-SN
– 3. Topic-based personalization with no SN data improves the search significantly, even outperforming the Familiarity & the Overall-SN

Evaluation (6)
- Off-line Study
User profile size
– The size of the user profile is determined by the lists N(u) & T(u)
– There is a risk that adding too many people or terms to the user profile may personalize too much
– Finding an “optimal” user profile size is an important factor
An optimal user profile should be based on a few similar people & a few related terms

Evaluation (7)
- User Survey

Summary & Discussion
We investigated personalized social search based on the user’s social relations
We studied the effectiveness of several social network types for personalization
Our results showed that
– According to both evaluations, social network based personalization significantly outperforms non-personalized social search
– As reflected in our user survey, all three SN-based strategies significantly outperform the Topic-based strategy, which improves only slightly over non-personalized results

Thank You! Any Questions or Comments?