Fresh and Diverse Social Signals: Any Impacts on Search?

Upload: ismail-badache

Post on 11-Apr-2017

TRANSCRIPT

Page 1: Fresh and Diverse Social Signals: Any Impacts on Search?

Fresh and Diverse Social Signals: Any Impacts on Search?

Ismaïl BADACHE*, Mohand BOUGHANEM**

*LSIS Lab, Aix-Marseille University, France

**IRIT Lab, Toulouse University, France

{badache, boughanem}@irit.fr

Page 2: Fresh and Diverse Social Signals: Any Impacts on Search?

Presentation Plan

1. Introduction

2. Related Work

3. IR Model Based on Social Signals

4. Temporality of Social Signals

5. Diversity of Social Signals

6. Conclusion

Page 3: Fresh and Diverse Social Signals: Any Impacts on Search?

1.1 Emergence of the Social Web

More than 3.5 billion Internet users

74% of them are registered on at least one social network (SN)

[Chart: % sharing in SN]

Social content per minute

70,000 posts

2.3 million « Like »

~410 GB of data

Facebook

Source: blogdumoderateur.com

1. Introduction  2. Related Work  3. IR Model Based on Signals  4. Temporality of Signals  5. Diversity of Signals  6. Conclusion

Page 4: Fresh and Diverse Social Signals: Any Impacts on Search?

Web Resources: Video, Image, Web Page, ...

Social Networks

Interactions: Bookmark, Comment, Share/Recommend, Rating/Vote, Like/+1

Social Signals (User Generated Content)

Page 5: Fresh and Diverse Social Signals: Any Impacts on Search?

Web Resources, Interactions on Social Networks, Social Signals (User Generated Content)

IR Model (taking social signals into account): Query, Results

Page 6: Fresh and Diverse Social Signals: Any Impacts on Search?

Social Signals (User Generated Content) can be described by their:

Nature, Origin, Signification, Temporality, Diversity

Example: Rating (5, 4, 3, 2, 1)

Page 7: Fresh and Diverse Social Signals: Any Impacts on Search?

1.2 Research Issues

1. How can social signals and their temporality (e.g. the age of each rating action) be taken into account to estimate the importance of a resource?

2. What is the impact of signal diversity on the IR process?

3. Which theoretical model can combine the a priori relevance of a resource with its topical relevance?

Page 8: Fresh and Diverse Social Signals: Any Impacts on Search?

2.1 Related Work

Each entry lists: sources of evidence (social features), properties, models, authors.

Time-Independent Social Signals Approaches

• Number of clicks, votes, records and recommendations, likes, shares, +1, tweets, etc.
  Properties: Popularity, Reputation
  Models: LM and Linear Combination
  Authors: (Karweg et al., 2011), (Badache et al., 2015)

• Number of likes, dislikes and comments on YouTube.
• The playcount (number of times a user listens to a track on Last.fm).
  Properties: Importance
  Models: Machine Learning and Linear Combination
  Authors: (Chelaru et al., 2012), (Khodaei et al., 2012), (Buijs and Spruit, 2014)

• Number of retweets.
  Properties: Popularity
  Models: Machine Learning
  Authors: (Yang et al., 2012), (Hong et al., 2011)

Time-Dependent Social Signals Approaches

• Analysis of social signals to classify user interests and interactions into 5 classes: recent, ongoing, seasonal, past and random.
  Properties: Temporal interests
  Models: Statistical Study
  Authors: (Khodaei and Alonso, 2012)

• Exploiting click time, called ClickBuzz, to measure the interest of a document over time.
  Properties: Buzz over time
  Models: Machine Learning
  Authors: (Inagaki et al., 2010)

Page 9: Fresh and Diverse Social Signals: Any Impacts on Search?

2.2 Positioning

1. Evaluating the impact of the freshness of signals on search performance, using their creation date.

2. Normalizing the distribution of signals on a resource using the age of the resource.

3. Considering the diversity of signals as an additional factor in estimating the relevance of a resource.

Page 10: Fresh and Diverse Social Signals: Any Impacts on Search?

Part I: IR Model Based on Social Signals

Page 11: Fresh and Diverse Social Signals: Any Impacts on Search?

3.1 « Social » Representation of a Document

Resource (Document)

Textual Representation

• Keywords: $D_w = \{w_1, w_2, \ldots, w_z\}$

Social Representation

• Social Signals: $D_a = \{a_1, a_2, \ldots, a_m\}$ (Like, +1, Share, Rating, ...)

Mono-valued signal, e.g. Like (number of likes)

Multi-valued signal, e.g. Rating (number of ratings and the value of each rating)
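The two representations of a resource can be pictured as a small record. This is only an illustrative sketch: the class name `Resource`, its field names, and the sample values are assumptions made for the example, not something the slides define.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """A document with a textual and a social representation (hypothetical layout)."""
    doc_id: str
    # Textual representation D_w = {w_1, ..., w_z}
    keywords: List[str] = field(default_factory=list)
    # Mono-valued signals of D_a: one counter per signal type (e.g. number of likes)
    signal_counts: Dict[str, int] = field(default_factory=dict)
    # Multi-valued signal of D_a: every rating keeps its own value
    ratings: List[int] = field(default_factory=list)


# Invented sample values, for illustration only.
movie = Resource(
    doc_id="hypothetical-doc-1",
    keywords=["space", "adventure"],
    signal_counts={"like": 120, "share": 30, "+1": 8},
    ratings=[5, 4, 4, 3],
)
total_actions = sum(movie.signal_counts.values())
print(total_actions, len(movie.ratings))
```

A mono-valued signal needs only a counter, whereas a rating needs both the number of ratings and each value, which is why the two are stored differently here.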

Page 12: Fresh and Diverse Social Signals: Any Impacts on Search?

3.2 A Priori Importance of a Document

$P(D \mid Q) \stackrel{rank}{=} P(D) \cdot P(Q \mid D)$

• $P(D)$: a priori probability of document $D$

• $P(Q \mid D)$: textual model (query/content)

The prior $P(D)$ is estimated from the signals, taken either

• individually: $P(a_i)$

• grouped: $\prod_{a_i \in A} P(a_i)$
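A minimal sketch of how the prior and the textual model could be combined at ranking time, assuming both probabilities are already available and strictly positive. The function names and the candidate values are hypothetical; working in log space is a common implementation choice, not something stated on the slide.

```python
import math


def grouped_prior(signal_priors):
    """P(D) as the product of the individual signal priors P(a_i), a_i in A."""
    return math.prod(signal_priors)


def rank_score(prior, query_likelihood):
    """log P(D) + log P(Q|D), which orders documents like P(D) * P(Q|D)."""
    return math.log(prior) + math.log(query_likelihood)


# Hypothetical candidates: (signal priors P(a_i), P(Q|D) from the textual model).
candidates = {
    "d1": ([0.2, 0.5], 0.01),
    "d2": ([0.1, 0.1], 0.02),
}
scores = {
    doc: rank_score(grouped_prior(priors), p_q_d)
    for doc, (priors, p_q_d) in candidates.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy case d2 matches the query better, but the stronger social prior of d1 outweighs that difference.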

Page 13: Fresh and Diverse Social Signals: Any Impacts on Search?

3.3 Estimating Priors

Mono-valued signals: $P(a_i)$ is estimated by maximum likelihood, with Dirichlet smoothing.

$P(a_i) = \frac{Count(a_i, D)}{Count(a_{\cdot}, D)}$

Multi-valued signal (Rating): Bayesian Average (BA)

$BA(D) = \frac{avg(r) \cdot |r| + \sum_{D' \in R} avg(r') \cdot |r'|}{|r| + \sum_{D' \in R} |r'|}$

$P(a_i = Rating) = \frac{1 + \log(1 + BA(D))}{1 + \log(1 + \sum_{D' \in R} BA(D'))}$

where $avg(r)$ and $|r|$ are the mean and the number of the ratings $r$.
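The estimators above can be sketched as follows. The Bayesian average and the rating prior follow the slide formulas term by term; the Dirichlet-smoothed form `(count + mu * p_collection) / (total + mu)` is the standard language-modelling expression and is an assumption here, since the slide only shows the maximum-likelihood ratio. All function and parameter names are illustrative.

```python
import math


def signal_prior(count_ai, count_all, mu=0.0, p_collection=0.0):
    """Count(a_i, D) / Count(a., D); with mu > 0, Dirichlet-smoothed (assumed form)."""
    return (count_ai + mu * p_collection) / (count_all + mu)


def bayesian_average(ratings, all_ratings):
    """BA(D): ratings of D pooled with the ratings of every D' in R.

    avg(r) * |r| equals sum(r), so sums are used directly.
    """
    numerator = sum(ratings) + sum(sum(r) for r in all_ratings)
    denominator = len(ratings) + sum(len(r) for r in all_ratings)
    return numerator / denominator


def rating_prior(ba_d, ba_all):
    """P(a_i = Rating) = (1 + log(1 + BA(D))) / (1 + log(1 + sum of BA(D')))."""
    return (1 + math.log(1 + ba_d)) / (1 + math.log(1 + sum(ba_all)))


# Hypothetical collection R with two documents.
collection = [[5, 5], [1, 1, 1, 1]]
ba = bayesian_average([5, 5], collection)
print(signal_prior(30, 120), ba)
```

Pooling with the collection pulls a document with few ratings towards the collection mean, so two 5-star ratings alone do not yield the maximum score.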

Page 14: Fresh and Diverse Social Signals: Any Impacts on Search?

Part II Temporality of Social Signals

Page 15: Fresh and Diverse Social Signals: Any Impacts on Search?

4.1 Hypothesis (1): Freshness of Signal

Boosting resources associated with fresh signals

[Figure: actions on resource R1 along a timeline t1 ... t5. Simple counting: each action counts as 1. Weighted counting by action date: each action is weighted according to its date.]

Page 16: Fresh and Diverse Social Signals: Any Impacts on Search?

4.2 Freshness of Signal

• Signal biased by its creation date

$Count_{t_a}(t_{j,a_i}, D) = \sum_{j=1}^{k} f(t_{j,a_i}, D)$

$f(t_{j,a_i}, D) = \exp\left(-\frac{\| t_{actual} - t_{j,a_i} \|^2}{2\sigma^2}\right)$

• Rating biased by its creation date

$BA_t(D) = \frac{avg(r_t) \cdot |r_t| + \sum_{D' \in R} avg(r'_t) \cdot |r'_t|}{|r_t| + \sum_{D' \in R} |r'_t|}$

$r_t = r \cdot \exp\left(-\frac{\| t_{actual} - t_r \|^2}{2\sigma^2}\right)$
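A small sketch of the date-biased counting, assuming timestamps are plain numbers (days, for instance) and that `sigma` is a free bandwidth parameter; the slides do not give its value, so the one used below is arbitrary.

```python
import math


def freshness_weight(t_now, t_action, sigma):
    """Gaussian kernel exp(-(t_now - t_action)^2 / (2 * sigma^2))."""
    return math.exp(-((t_now - t_action) ** 2) / (2 * sigma ** 2))


def time_weighted_count(action_times, t_now, sigma):
    """Sum of the freshness weights of the k actions of one signal type."""
    return sum(freshness_weight(t_now, t, sigma) for t in action_times)


def time_weighted_rating(rating, t_now, t_rating, sigma):
    """r_t = r * freshness weight of the rating."""
    return rating * freshness_weight(t_now, t_rating, sigma)


# Hypothetical timelines (in days): three recent actions versus three old ones.
t_now, sigma = 100.0, 30.0
recent = time_weighted_count([99.0, 98.0, 97.0], t_now, sigma)
old = time_weighted_count([10.0, 5.0, 1.0], t_now, sigma)
print(recent > old)
```

With simple counting both resources would score 3; with the kernel, the resource whose actions are recent keeps almost all of its weight while the old actions contribute very little.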

Page 17: Fresh and Diverse Social Signals: Any Impacts on Search?

4.3 Hypothesis (2): Resource Age

« Old » resources are more likely to have accumulated more signals than « recent » ones.

[Figure: Resource R1 (age = 1 month) with signal counts 188, 52 and 12 (the last labelled +1); Resource R2 (age = 1 day) with signal counts 2, 1 and 0 (the last labelled +1).]

Normalization of signals by the age of the resource

Page 18: Fresh and Diverse Social Signals: Any Impacts on Search?

4.4 Signals Normalization with Resource Age

• Normalize the distribution of signals by the publication date of the resource (age of the resource).

$Count_{t_D}(a_i, D) = Count(a_i, D) \cdot A(D)$

$A(D) = \exp\left(-\frac{\| t_{actual} - t_D \|^2}{2\sigma^2}\right)$
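The age factor can be sketched in the same way. Ages are assumed to be expressed in days and `sigma = 10` is an arbitrary choice for illustration; the counts 188 and 2 are borrowed from the resource-age example slide.

```python
import math


def age_factor(t_now, t_published, sigma):
    """A(D) = exp(-(t_now - t_published)^2 / (2 * sigma^2))."""
    return math.exp(-((t_now - t_published) ** 2) / (2 * sigma ** 2))


def age_normalized_count(count, t_now, t_published, sigma):
    """Count(a_i, D) * A(D)."""
    return count * age_factor(t_now, t_published, sigma)


# R1 is one month old with 188 actions, R2 is one day old with 2 actions.
t_now, sigma = 30.0, 10.0
r1 = age_normalized_count(188, t_now, t_published=0.0, sigma=sigma)
r2 = age_normalized_count(2, t_now, t_published=29.0, sigma=sigma)
print(r1, r2)
```

How strongly old resources are discounted depends entirely on `sigma`: a small value lets a day-old resource with two actions compete with a month-old resource with many, a large value leaves raw counts almost unchanged.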

Page 19: Fresh and Diverse Social Signals: Any Impacts on Search?

4.5 Evaluation

• Evaluation Framework:

- Two INEX datasets: IMDb and SBS.

- Collection of social signals for each document (IMDb and SBS).

Dataset                       Documents      Topics

IMDb 2011                     1.5 million    30

SBS 2015: Suggestion Track    2.8 million    208

• Objectives

1) Evaluate the impact of signals on the performance of IR systems,

2) Evaluate the impact of signal freshness and resource age.

Page 20: Fresh and Diverse Social Signals: Any Impacts on Search?

4.5 Evaluation: IMDb Dataset

Textual Content: INEX IMDb 2011

Page 21: Fresh and Diverse Social Signals: Any Impacts on Search?

4.5 Evaluation: SBS Dataset

Textual Content: INEX SBS 2015

[Table: document fields and their status (columns Field / Status); every status entry reads « Indexed ».]

Page 22: Fresh and Diverse Social Signals: Any Impacts on Search?

4.6 Evaluation: Social Signals

Social Content: 9 signals are collected.

Social network    Signals                 Dataset

Delicious         Bookmark                IMDb

Twitter           Tweet                   IMDb

Google+           +1                      IMDb

LinkedIn          Share                   IMDb

Amazon            Tag, Rating             SBS

Facebook          Like, Share, Comment    SBS, IMDb

Page 23: Fresh and Diverse Social Signals: Any Impacts on Search?

4.7 Evaluation: Social Signals

Example: [figure]

Page 24: Fresh and Diverse Social Signals: Any Impacts on Search?

4.8 Evaluation: Results on IMDb

Baseline (LM.Hiemstra): P@20 = 0.3403, nDCG = 0.4325

Without considering Time

Signal                P@20      nDCG

Like (Facebook)       0.362     0.513

Share (Facebook)      0.3649    0.5262

Comment (Facebook)    0.3551    0.5121

Tweet                 0.3512    0.4769

+1                    0.3468    0.5017

Bookmark              0.3414    0.4621

Share (LIn)           0.3432    0.4566

Considering Resource Age

Signal                P@20      nDCG

Like (Facebook)       0.362     0.5308

Share (Facebook)      0.3721    0.5544

Comment (Facebook)    0.3683    0.5285

Tweet                 0.3579    0.4903

+1                    0.3511    0.5246

Bookmark              0.3427    0.4671

Share (LIn)           0.3449    0.4606

[Charts: the bars are labelled with relative gains ranging from +5% to +30%.]
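The results are reported with P@20 and nDCG. As a reminder of what these measures compute, here is a minimal sketch using their usual textbook definitions; the exact nDCG variant used in the experiments is not stated on the slides, so the log2 discount is an assumption, and the ideal ranking is built only from the retrieved list, which is a simplification.

```python
import math


def precision_at_k(relevances, k=20):
    """Fraction of the top-k results whose relevance grade is above zero."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same grades sorted in decreasing order."""
    best = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / best if best > 0 else 0.0


# Hypothetical graded judgements of a ranked result list (rank 1 first).
run = [3, 0, 2, 0]
print(precision_at_k(run, k=4), ndcg(run))
```

P@k only looks at how many relevant items reach the top k, while nDCG also rewards placing the most relevant items first.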

Page 25: Fresh and Diverse Social Signals: Any Impacts on Search?

4.8 Evaluation: Results on SBS

Baseline (LM.Hiemstra): P@20 = 0.05, nDCG = 0.162

Without considering Time

Signal                P@20      nDCG

Like (Facebook)       0.0689    0.1864

Share (Facebook)      0.0711    0.19

Comment (Facebook)    0.0678    0.1807

Rating                0.0559    0.1748

Tag                   0.0531    0.1742

Considering Resource Age

Signal                P@20      nDCG

Like (Facebook)       0.0708    0.19

Share (Facebook)      0.0796    0.2001

Comment (Facebook)    0.0711    0.1882

Rating                0.0695    0.1855

Tag                   0.058     0.1771

Considering Date of Signal

Signal                P@20      nDCG

Rating                0.0732    0.1904

[Charts: the bars are labelled with relative gains ranging from +5% to +39%.]

Page 26: Fresh and Diverse Social Signals: Any Impacts on Search?

4.9 Evaluation: Feature Selection Algorithms (SBS)

[Figure: features marked as highly selected, moderately selected, or less selected.]

Page 27: Fresh and Diverse Social Signals: Any Impacts on Search?

Part III Diversity of Social Signals

Page 28: Fresh and Diverse Social Signals: Any Impacts on Search?

5.1 Hypothesis

• The diversity of signals on a resource is a clue that may indicate interest beyond a single social network or community, i.e., a resource dominated by a single signal should be disadvantaged against a resource with an even distribution of signals.

Diversity is considered in terms of:

• Nature

• Origin

Page 29: Fresh and Diverse Social Signals: Any Impacts on Search?

5.2 Estimating Diversity of Signals

• Estimation of diversity using the Shannon diversity index.

$Diversity(D) = -\sum_{i=1}^{m} P(a_i) \cdot \log P(a_i)$

• The interest (importance) of a resource.

$P(D) = \prod_{a_i \in A} P(a_i) \cdot Diversity(D)$
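A short sketch of the diversity factor, assuming $P(a_i)$ is the share of signal $a_i$ among all signals of the resource. Signals with a zero count are simply skipped here, a simplification that stands in for the smoothing a real system would need; the counts are invented.

```python
import math


def signal_shares(counts):
    """P(a_i) as Count(a_i, D) / Count(a., D), ignoring zero counts."""
    total = sum(counts.values())
    return [c / total for c in counts.values() if c > 0]


def shannon_diversity(counts):
    """Diversity(D) = - sum_i P(a_i) * log P(a_i)."""
    return -sum(p * math.log(p) for p in signal_shares(counts))


def resource_prior(counts):
    """P(D) = product of P(a_i), multiplied by Diversity(D)."""
    return math.prod(signal_shares(counts)) * shannon_diversity(counts)


# Hypothetical resources with the same number of actions.
balanced = {"like": 10, "share": 10}
dominated = {"like": 19, "share": 1}
print(shannon_diversity(balanced), shannon_diversity(dominated))
```

The index is highest when the actions are spread evenly over the signal types and falls to zero when one signal accounts for everything, which is the behaviour the hypothesis asks for.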

Page 30: Fresh and Diverse Social Signals: Any Impacts on Search?

5.3 Evaluation: Results on IMDb & SBS

IMDb

Configuration               TotalFacebook         All Criteria
                            P@20      nDCG        P@20      nDCG

Without Diversity           0.4102    0.5681      0.4262    0.5974

With Diversity              0.4187    0.5713      0.4318    0.6174

Diversity & Resource Age    0.4289    0.5966      0.4334    0.6311

SBS

Configuration               TotalFacebook         All Criteria
                            P@20      nDCG        P@20      nDCG

Without Diversity           0.081     0.1937      0.0787    0.1981

With Diversity              0.084     0.1945      0.0915    0.2031

Diversity & Resource Age    0.0868    0.197       0.0988    0.2095

[Charts: the bars are labelled with gains over the configuration without diversity, ranging from +1% to +26%.]
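The percentage labels on the charts are relative gains. A minimal sketch of that arithmetic follows; pairing the two P@20 values 0.0787 and 0.0915 (SBS, All Criteria) as "without" and "with" diversity is a reading of the chart layout, not something the transcript states explicitly.

```python
def relative_gain(reference, value):
    """Relative improvement of `value` over `reference`, in percent."""
    return (value - reference) / reference * 100.0


# SBS, All Criteria, P@20: assumed pairing of two values shown on the slide.
gain = relative_gain(0.0787, 0.0915)
print(round(gain))
```

The rounded result agrees with the +16% label that appears among the chart annotations.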

Page 31: Fresh and Diverse Social Signals: Any Impacts on Search?

Conclusion and Perspectives

Page 32: Fresh and Diverse Social Signals: Any Impacts on Search?

6. Conclusion and Perspectives

Contributions

• IR model based on social signals.

• Taking into account the temporal aspect:

- « Freshness » of signals,

- Normalization of signals by resource age.

• Diversity of signals on the resource:

- Diversity in terms of nature and origin.

• Enrichment of test datasets (IMDb, SBS).

Lessons

• Social signals are fruitful for IR:

- Facebook signals and Ratings are the most relevant features.

- Bookmark is the weakest signal in terms of relevance.

• Temporality and diversity have a positive impact on results.

Perspectives

• Taking into account the weight of the signals,

• Studying the importance of social networks and of the actors behind these signals, and their impact on relevance,

• Studying the personalization of search results according to the user's social interactions and their temporality (Profile),

• Studying the polarity of the text content of signals,

• Leveraging signals in other frameworks, such as analyzing the behavior of social network users in a disaster, monitoring patients (suicide attempt), and recommendation.

Page 33: Fresh and Diverse Social Signals: Any Impacts on Search?

http://www.irit.fr/~Ismail.Badache/

https://www.linkedin.com/in/ismailbadache/