fresh and diverse social signals: any impacts on search?
TRANSCRIPT
Fresh and Diverse Social Signals: Any Impacts on Search?
Ismaïl BADACHE*, Mohand BOUGHANEM**
*LSIS Lab, Aix-Marseille University, France
**IRIT Lab, Toulouse University, France
{badache, boughanem}@irit.fr
Presentation Plan
1. Introduction
2. Related Work
3. IR Model Based on Social Signals
4. Temporality of Social Signals
5. Diversity of Social Signals
6. Conclusion
1.1 Emergence of social Web
Source: blogdumoderateur.com
% Sharing in SN
More than 3.5 billion Internet users
74% of them are registered on at least one social network (SN)
Social content generated per minute:
70,000 posts
2.3 million « Like »
~410 GB of data
1. Introduction | 2. Related Work | 3. IR Model Based on Signals | 4. Temporality of Signals | 5. Diversity of Signals | 6. Conclusion
[Diagram: web resources (video, image, web page, ...) receive interactions on social networks (bookmark, comment, share/recommend, rating/vote, like/+1). These interactions are the social signals (user-generated content).]
[Diagram, continued: the IR model takes these social signals into account to produce the query results.]
[Diagram, continued: properties of a social signal: nature, origin, signification, temporality, diversity. Example shown: a rating on a 5-4-3-2-1 scale.]
1.2 Research Issues
1. How can social signals and their temporality (e.g., the age of each rating action) be taken into account to estimate the importance of a resource?
2. What is the impact of signal diversity on the IR process?
3. Which theoretical model can combine the a priori relevance of a resource with its topical relevance?
2.1 Related Work
For each approach: sources of evidence (social features), properties, models, authors.
Time-Independent Social Signals Approaches
• Sources of evidence: number of clicks, votes, records and recommendations, likes, shares, +1s, tweets, etc.
  Properties: popularity, reputation. Models: LM and linear combination.
  Authors: (Karweg et al., 2011), (Badache et al., 2015)
• Sources of evidence: number of likes, dislikes, and comments on YouTube; playcount (number of times a user listens to a track on lastfm).
  Properties: importance. Models: machine learning and linear combination.
  Authors: (Chelaru et al., 2012), (Khodaei et al., 2012), (Buijs and Spruit, 2014)
• Sources of evidence: number of retweets.
  Properties: popularity. Models: machine learning.
  Authors: (Yang et al., 2012), (Hong et al., 2011)
Time-Dependent Social Signals Approaches
• Sources of evidence: analysis of social signals to classify user interests and interactions into 5 classes: recent, ongoing, seasonal, past, and random.
  Properties: temporal interests. Models: statistical study.
  Authors: (Khodaei and Alonso, 2012)
• Sources of evidence: click times (ClickBuzz), used to measure the interest in a document over time.
  Properties: buzz over time. Models: machine learning.
  Authors: (Inagaki et al., 2010)
2.2 Positioning
1. Evaluating the impact of signal freshness on search performance, using the creation date of each signal.
2. Normalizing the distribution of signals on a resource by the age of the resource.
3. Considering the diversity of signals as an additional factor in the estimation of resource relevance.
Part I IR Model Based on Social Signals
3.1 « Social » Representation of a document
Resource (Document)
- Like
- +1
- Share
- Rating
- ….
Mono-valued Signal
e.g. Like (number of likes)
Multi-valued Signal
e.g. Rating (number of ratings and the value of each rating)
Textual Representation
• Keywords: $D_w = \{w_1, w_2, \dots, w_z\}$
Social Representation
• Social Signals: $D_a = \{a_1, a_2, \dots, a_m\}$
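The two representations of a document can be pictured as a small data structure. This is only an illustrative sketch: the class name, field names, and sample values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SocialDocument:
    """A document with a textual and a social representation (hypothetical layout)."""
    doc_id: str
    # Textual representation D_w: the keywords of the document.
    keywords: List[str] = field(default_factory=list)
    # Mono-valued signals in D_a: one count per signal type (e.g. like, share, +1).
    signal_counts: Dict[str, int] = field(default_factory=dict)
    # Multi-valued signal in D_a: the individual rating values.
    ratings: List[int] = field(default_factory=list)


# Made-up example values, for illustration only.
doc = SocialDocument(
    doc_id="doc-1",
    keywords=["space", "odyssey"],
    signal_counts={"like": 120, "share": 30, "+1": 4},
    ratings=[5, 4, 4, 3],
)
```

A mono-valued signal is fully described by its count, whereas a rating needs both the number of ratings and each value, which is why it is stored as a list.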
3.2 A priori Importance of Document
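$$P(D \mid Q) \stackrel{rank}{=} P(D) \cdot P(Q \mid D)$$
• $P(D)$: a priori probability of document D
• $P(Q \mid D)$: textual model (query/content)
• Signals taken individually: $P(D) = P(a_i)$
• Signals grouped: $P(D) = \prod_{a_i \in A} P(a_i)$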
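As a rough illustration of the ranking rule, the prior and the query likelihood can be combined in log space, which avoids underflow when several small probabilities are multiplied. The function names and the toy numbers are assumptions made for this sketch. It also assumes every P(a_i) is strictly positive, for example after smoothing.

```python
import math
from typing import Dict


def log_prior_grouped(signal_probs: Dict[str, float]) -> float:
    """log P(D) for grouped signals: sum of log P(a_i) over the signals in A."""
    return sum(math.log(p) for p in signal_probs.values())


def rank_score(log_query_likelihood: float, signal_probs: Dict[str, float]) -> float:
    """Rank-equivalent log score: log P(D) + log P(Q|D)."""
    return log_prior_grouped(signal_probs) + log_query_likelihood


# Toy example: (log P(Q|D), {signal: P(a_i)}) per document, values made up.
docs = {
    "d1": (-12.0, {"like": 0.30, "share": 0.20}),
    "d2": (-11.5, {"like": 0.05, "share": 0.02}),
}
ranking = sorted(docs, key=lambda d: rank_score(*docs[d]), reverse=True)
```

In this toy case d2 matches the query slightly better, but d1 has a much stronger social prior and ends up ranked first.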
3.3 Estimating Priors
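Mono-valued signals: $P(a_i)$ is estimated by maximum likelihood, with Dirichlet smoothing.
$$P(a_i) = \frac{Count(a_i, D)}{Count(a_\bullet, D)}$$
Multi-valued signal (rating): Bayesian average (BA), where avg denotes the mean and $|r|$ the number of ratings.
$$BA(D) = \frac{\mathrm{avg}(r) \cdot |r| + \sum_{D' \in R} \mathrm{avg}(r') \cdot |r'|}{|r| + \sum_{D' \in R} |r'|}$$
$$P(a_i = Rating) = \frac{1 + \log(1 + BA(D))}{1 + \log\left(1 + \sum_{D' \in R} BA(D')\right)}$$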
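The estimators on this slide could be sketched as follows. The slides do not give the smoothed form, so the standard Dirichlet formula is assumed here, with `mu` and the collection-level probability as hypothetical parameters. The natural logarithm is also an assumption, since the base is not stated.

```python
import math
from typing import List, Sequence


def prior_mono(count_ai: int, count_all: int, p_collection: float, mu: float = 100.0) -> float:
    """Dirichlet-smoothed P(a_i) (assumed form):
    (Count(a_i, D) + mu * P(a_i | C)) / (Count(a_., D) + mu)."""
    return (count_ai + mu * p_collection) / (count_all + mu)


def bayesian_average(ratings: Sequence[float], other_ratings: Sequence[Sequence[float]]) -> float:
    """BA(D) as written on the slide. avg(r) * |r| is simply the sum of the ratings."""
    total = sum(ratings) + sum(sum(r) for r in other_ratings)
    n = len(ratings) + sum(len(r) for r in other_ratings)
    return total / n if n else 0.0


def prior_rating(ba_d: float, ba_collection: List[float]) -> float:
    """P(a_i = Rating) = (1 + log(1 + BA(D))) / (1 + log(1 + sum of BA(D')))."""
    return (1 + math.log(1 + ba_d)) / (1 + math.log(1 + sum(ba_collection)))


# Toy usage with made-up ratings.
ba = bayesian_average([5, 5], [[3, 3], [4]])
p_rating = prior_rating(ba, [ba, 3.0])
```

With `mu = 0` the mono-valued estimate falls back to the plain maximum-likelihood ratio shown on the slide.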
Part II Temporality of Social Signals
4.1 Hypothesis (1) : Freshness of Signal
[Figure: timeline t1 ... t5 for a resource R1. With simple counting, each action adds +1 to the count; with counting weighted by action date, each action adds a weight that depends on its date.]
Boosting resources associated with fresh signals
4.2 Freshness of Signal
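• Signal biased by its creation date
$$Count_{t_a}(t_{j,a_i}, D) = \sum_{j=1}^{k} f(t_{j,a_i}, D)$$
$$f(t_{j,a_i}, D) = \exp\left(-\frac{\lVert t_{actual} - t_{j,a_i} \rVert^2}{2\sigma^2}\right)$$
• Rating biased by its creation date
$$BA_t(D) = \frac{\mathrm{avg}(r_t) \cdot |r_t| + \sum_{D' \in R} \mathrm{avg}(r'_t) \cdot |r'_t|}{|r_t| + \sum_{D' \in R} |r'_t|}$$
$$r_t = r \cdot \exp\left(-\frac{\lVert t_{actual} - t_r \rVert^2}{2\sigma^2}\right)$$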
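A minimal sketch of the freshness weighting, assuming timestamps are plain numbers in a common unit (days here) and that `sigma` is a tuning parameter. Neither the unit nor a value for sigma is given on the slides.

```python
import math
from typing import List, Sequence, Tuple


def freshness(t_now: float, t_action: float, sigma: float) -> float:
    """Gaussian kernel f = exp(-(t_now - t_action)^2 / (2 * sigma^2))."""
    return math.exp(-((t_now - t_action) ** 2) / (2 * sigma ** 2))


def fresh_count(action_times: Sequence[float], t_now: float, sigma: float) -> float:
    """Time-weighted count of one signal: sum of the kernel over its k actions."""
    return sum(freshness(t_now, t, sigma) for t in action_times)


def fresh_ratings(ratings: Sequence[Tuple[float, float]], t_now: float, sigma: float) -> List[float]:
    """r_t = r * kernel, for each (rating value, rating time) pair."""
    return [r * freshness(t_now, t, sigma) for r, t in ratings]


# Three likes, two of them recent (times in days, values made up).
weighted = fresh_count([1.0, 28.0, 29.0], t_now=30.0, sigma=7.0)
```

An action performed at the current time counts as 1, and older actions count progressively less, so a resource with recent activity outranks one with the same number of old actions.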
4.3 Hypothesis (2) : Resource Age
[Figure: signal counts for two resources. Resource R1 (age = 1 month): 188, 52, and 12 (+1). Resource R2 (age = 1 day): 2, 1, and 0 (+1).]
"Old" resources are more likely to have received more signals than "recent" ones.
Normalization of signals by the age of the resource
4.4 Signals Normalization with Resource Age
$$Count_{t_D}(a_i, D) = Count(a_i, D) \cdot A(D)$$
$$A(D) = \exp\left(-\frac{\lVert t_{actual} - t_D \rVert^2}{2\sigma^2}\right)$$
• Normalize the distribution of signals by the publication date of the
resource (age of the resource).
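The normalization can be tried on the two resources from the resource-age hypothesis slide. In this sketch, "1 month" is taken as 30 days and `sigma` is set arbitrarily to 30 days. Both are assumptions for illustration, not values from the slides.

```python
import math


def age_factor(t_now: float, t_published: float, sigma: float) -> float:
    """A(D) = exp(-(t_now - t_D)^2 / (2 * sigma^2))."""
    return math.exp(-((t_now - t_published) ** 2) / (2 * sigma ** 2))


def age_normalized_count(count: float, t_now: float, t_published: float, sigma: float) -> float:
    """Count_tD(a_i, D) = Count(a_i, D) * A(D)."""
    return count * age_factor(t_now, t_published, sigma)


T_NOW = 30.0   # current time, in days (assumed unit)
SIGMA = 30.0   # hypothetical kernel width

# R1: published 30 days ago with 188 signals; R2: published 1 day ago with 2 signals.
r1 = age_normalized_count(188, T_NOW, t_published=0.0, sigma=SIGMA)
r2 = age_normalized_count(2, T_NOW, t_published=29.0, sigma=SIGMA)
```

The older resource keeps only a fraction of its raw count while the one-day-old resource keeps almost all of it. How strongly this narrows the gap depends entirely on the choice of sigma.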
4.5 Evaluation
• Evaluation framework:
- Two INEX datasets: IMDb and SBS.
- Collection of social signals for each document (IMDb and SBS).
Dataset                        Documents      Topics
IMDb 2011                      1.5 million    30
SBS 2015 (Suggestion Track)    2.8 million    208
• Objectives
1) Evaluate the impact of signals on the performance of IR systems,
2) Evaluate the impact of signal freshness and resource age.
4.5 Evaluation : IMDb Dataset
Textual Content: INEX IMDb 2011
4.5 Evaluation : SBS Dataset
Textual Content: INEX SBS 2015
[Table of document fields (Field / Status): every listed field has the status "Indexed".]
4.6 Evaluation : Social Signals
Social Content: 9 signals are collected.
Source       Signals                 Dataset
Delicious    Bookmark                IMDb
Twitter      Tweet                   IMDb
Google+      +1                      IMDb
LinkedIn     Share                   IMDb
Amazon       Tag, Rating             SBS
Facebook     Like, Share, Comment    SBS, IMDb
4.7 Evaluation : Social Signals
Example: [figure not reproduced in the transcript]
4.8 Evaluation : Results on IMDb
Baseline (LM.Hiemstra): P@20 = 0.3403, nDCG = 0.4325

Signal         Without considering time    With considering resource age
               P@20      nDCG              P@20      nDCG
Like           0.362     0.513             0.362     0.5308
Share          0.3649    0.5262            0.3721    0.5544
Comment        0.3551    0.5121            0.3683    0.5285
Tweet          0.3512    0.4769            0.3579    0.4903
+1             0.3468    0.5017            0.3511    0.5246
Bookmark       0.3414    0.4621            0.3427    0.4671
Share (LIn)    0.3432    0.4566            0.3449    0.4606

[Bar charts; the percentage labels on the bars give relative gains, ranging from +5% to +30%.]
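The results are reported as P@20 and nDCG. As a reminder of what these measures capture, here is a minimal sketch of both. nDCG has several variants, and the one below uses linear gain with a log2 discount, which is not necessarily the exact variant used in the evaluation. It also assumes the list passed in contains the relevance grades of all judged documents, in ranked order.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[float], k: int) -> float:
    """Fraction of the top-k results that are relevant (grade > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: grade / log2(rank + 1), ranks starting at 1."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of a ranked result list, made up for illustration.
run = [3, 0, 2, 0, 1]
p_at_5 = precision_at_k(run, 5)
ndcg_at_5 = ndcg_at_k(run, 5)
```

P@20 only counts how many of the first 20 results are relevant, while nDCG also rewards placing the most relevant results nearest the top.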
4.8 Evaluation : Results on SBS
Baseline (LM.Hiemstra): P@20 = 0.05, nDCG = 0.162

Signal     Without considering time    Considering resource age
           P@20      nDCG              P@20      nDCG
Like       0.0689    0.1864            0.0708    0.19
Share      0.0711    0.19              0.0796    0.2001
Comment    0.0678    0.1807            0.0711    0.1882
Rating     0.0559    0.1748            0.0695    0.1855
Tag        0.0531    0.1742            0.058     0.1771

Date of signal (Rating only): P@20 = 0.0732, nDCG = 0.1904

[Bar charts; the percentage labels on the bars give relative gains, ranging from +5% to +39%.]
4.9 Evaluation : Feature Selection Algorithms (SBS)
[Figure: signals selected by the feature selection algorithms, with a legend distinguishing highly selected, moderately selected, and less selected signals.]
Part III Diversity of Social Signals
5.1 Hypothesis
• Diversity of signals on a resource is a clue that may indicate an interest beyond a single social network or community, i.e., a resource dominated by a single signal should be disadvantaged relative to a resource with an even distribution of signals.
• Diversity is considered in terms of:
- Nature
- Origin
5.2 Estimating Diversity of Signals
• Estimation of diversity using the Shannon diversity index:
$$Diversity(D) = -\sum_{i=1}^{m} P(a_i) \cdot \log P(a_i)$$
• The interest (importance) of a resource:
$$P(D) = \prod_{a_i \in A} P(a_i) \cdot Diversity(D)$$
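The diversity index can be sketched as below. Here P(a_i) is taken to be the share of signal a_i among all signals of the resource, and the diversity term is assumed to multiply the product of priors once. Both readings are assumptions about the slide's notation, and the sample counts are made up.

```python
import math
from typing import Dict


def shannon_diversity(counts: Dict[str, int]) -> float:
    """-sum P(a_i) * log P(a_i), with P(a_i) = share of signal a_i on the resource.
    Signals with a zero count contribute nothing (0 * log 0 is taken as 0)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log(p) for p in probs)


def prior_with_diversity(signal_probs: Dict[str, float], counts: Dict[str, int]) -> float:
    """P(D) = (product of P(a_i)) * Diversity(D), under the assumption stated above."""
    return math.prod(signal_probs.values()) * shannon_diversity(counts)


balanced = shannon_diversity({"like": 10, "share": 10, "comment": 10})
skewed = shannon_diversity({"like": 28, "share": 1, "comment": 1})
```

A resource whose signals are spread evenly across types gets the highest diversity, while a resource carrying a single type of signal gets zero, which matches the hypothesis that single-signal resources should be disadvantaged.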
5.3 Evaluation : Results on IMDb & SBS
IMDb
Setting                      TotalFacebook           All Criteria
                             P@20      nDCG          P@20      nDCG
Without diversity            0.4102    0.5681        0.4262    0.5974
With diversity               0.4187    0.5713        0.4318    0.6174
Diversity & resource age     0.4289    0.5966        0.4334    0.6311

SBS
Setting                      TotalFacebook           All Criteria
                             P@20      nDCG          P@20      nDCG
Without diversity            0.081     0.1937        0.0787    0.1981
With diversity               0.084     0.1945        0.0915    0.2031
Diversity & resource age     0.0868    0.197         0.0988    0.2095

[Bar charts; the percentage labels on the bars give relative gains, ranging from +1% to +26%.]
6. Conclusion and Perspectives
Contributions
• IR model based on social signals.
• Taking into account the temporal aspect.
- « Freshness » of signal,
- Normalization of signal by resource age.
• Diversity of signals in the resource.
- Diversity in terms of nature and origin.
• Enrichment of test datasets (IMDb, SBS).
Lessons
• Social signals are fruitful for IR.
- Facebook signals and ratings are the most relevant features.
- Bookmark is the weakest signal in terms of relevance.
• Temporality and diversity have a positive impact on results.
Perspectives
• Taking into account the weight of the signals,
• Studying the importance of social networks and of the actors behind these signals, and their impact on relevance,
• Studying the personalization of search results according to the user's social interactions and their temporality (profile),
• Studying the polarity of the text content of signals,
• Leveraging signals in other frameworks, such as analyzing the behavior of social network users during a disaster, monitoring patients (suicide attempts), and recommendation.
http://www.irit.fr/~Ismail.Badache/
https://www.linkedin.com/in/ismailbadache/