
DESCRIPTION

IIR 10 presentation - Padova - "An IR-based approach to tag recommendation"

TRANSCRIPT

Page 1: An IR-based approach to tag recommendation

An IR-based approach to tag recommendation

C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro

IIR 2010 - First Italian Information Retrieval Workshop, Padova, 28 Jan 2010

Semantic Web Access and Personalization Research Group, http://www.di.uniba.it/~swap

Page 2: An IR-based approach to tag recommendation

outline

• Background

• Web 2.0 and User-Generated Content

• Collaborative Tagging Systems

• Tag Recommendation

• STaR: Social Tag Recommender System

• Basic assumptions

• Architecture

• Experimental Evaluation

• Conclusions and future work

C Musto, F Narducci, M de Gemmis, P Lops, G Semeraro - An IR-based approach to tag recommendation - IIR 2010 - Padova 2

Page 3: An IR-based approach to tag recommendation

background

•What is a tag?

•Where do we use tags?

•Why do we use tags?

•Why do we need a tag recommender?

•How does a tag recommender work?


Page 4: An IR-based approach to tag recommendation

web 2.0

• Nowadays websites tend to be more and more social

• Web 2.0 platforms let users publish self-produced content

• users can post photos, videos

• users can express opinions (e.g. reviews)

• users can annotate resources


Page 5: An IR-based approach to tag recommendation

social tagging

•Users annotate resources of interest with free keywords, called tags

• The act of collaboratively annotating resources with tags produces a lexical structure called a folksonomy


Page 6: An IR-based approach to tag recommendation

folksonomies

• The act of collaboratively annotating resources with tags produces a lexical structure called a folksonomy

• A folksonomy is a set of tags

• Usually represented with a Tag Cloud

• The more a tag is used by the community to describe a resource, the higher the likelihood that it faithfully describes the information conveyed by the resource


Page 7: An IR-based approach to tag recommendation

social tagging systems

•Advantages

• Information organized in a way that closely follows the user's mental model

•Effective retrieval, serendipitous browsing

•Disadvantages

•Tag space usually very noisy

•Polysemy, synonymy, level variation


Page 8: An IR-based approach to tag recommendation

social tagging systems

• These problems hinder the full exploitation of the expressive power of folksonomies

• e.g. searching for the resources annotated with the tag “Macbook” will exclude the resources annotated with the tag “MacBookPro”

• Folksonomies can’t be exploited to retrieve and filter resources effectively

• Tag recommenders are increasingly needed
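
The “Macbook” / “MacBookPro” problem can be reproduced in a few lines. This is only an illustrative sketch: the resource ids, the extra tags and the `search_by_tag` helper are hypothetical and do not come from the slides.

```python
# Hypothetical toy folksonomy: resource id -> set of free-text tags.
annotations = {
    "r1": {"Macbook", "apple"},
    "r2": {"MacBookPro", "laptop"},
    "r3": {"linux"},
}


def search_by_tag(annotations, tag):
    """Return the ids of resources annotated with exactly this tag."""
    return sorted(rid for rid, tags in annotations.items() if tag in tags)


# Exact matching finds r1 only; r2 is missed although it is clearly related.
print(search_by_tag(annotations, "Macbook"))
```

A recommender that steers users towards tags already in use reduces this kind of variation at annotation time.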


Page 9: An IR-based approach to tag recommendation

tag recommenders: how do they work?

•A user posts a new resource on a platform

•e.g. a new bookmark on bibsonomy.org

•The resource is analyzed

•A set of (hopefully) relevant tags is produced and filtered

•The user freely chooses the most appropriate tags to annotate the resource


Page 10: An IR-based approach to tag recommendation

STaR: Social Tag Recommender System

•Basic assumptions

• Resources with similar content should be annotated with similar tags

•Improved retrieval techniques

• The user's previous tagging activity should be taken into account

•Increasing the weight of tags already used to annotate similar resources


Page 11: An IR-based approach to tag recommendation

STaR Architecture

Page 12: An IR-based approach to tag recommendation

STaR: indexing strategy

•Based on the Apache Lucene engine

•A Personal Index for each user

•Information on her previously tagged resources

•A Social Index for the whole community

•Information about all the resources previously tagged by the community
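
The two-index layout can be pictured with plain Python containers. This is a sketch only: STaR builds its indexes with Apache Lucene, whereas the `TagIndexes` class, its field names and the sample posts below are assumptions made for illustration.

```python
from collections import defaultdict


class TagIndexes:
    """Toy stand-in for the two indexes; STaR itself relies on Apache Lucene."""

    def __init__(self):
        # Personal Index: one list of tagged resources per user.
        self.personal = defaultdict(list)
        # Social Index: every tagged resource from the whole community.
        self.social = []

    def add_post(self, user, text, tags):
        """Store a tagged resource in the user's index and in the shared one."""
        post = {"user": user, "text": text, "tags": sorted(set(tags))}
        self.personal[user].append(post)
        self.social.append(post)
        return post


indexes = TagIndexes()
indexes.add_post("alice", "okapi bm25 ranking function", ["ir", "ranking"])
indexes.add_post("bob", "apache lucene tutorial", ["lucene", "search"])
print(len(indexes.personal["alice"]), len(indexes.social))
```

Keeping the personal posts separate makes it possible to weight a user's own tagging history differently from the community's.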


Page 13: An IR-based approach to tag recommendation

STaR Architecture

Page 14: An IR-based approach to tag recommendation

STaR: retrieval of similar resources

•Given a resource to be tagged

•Both the Personal Index and the Social Index are queried

•Lucene's scoring function is replaced with an Okapi BM25 implementation

•State-of-the-art retrieval model

•Resources whose similarity exceeds a certain threshold are retrieved
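
The retrieval step can be sketched with a small, self-contained Okapi BM25 scorer. This is not STaR's Lucene-based implementation: the parameter values `k1=1.2` and `b=0.75`, the non-negative IDF variant, the sample documents and the threshold of 1.0 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_tokens`."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_similar(query_tokens, docs, threshold):
    """Keep only (index, score) pairs whose score exceeds the threshold."""
    scores = bm25_scores(query_tokens, docs)
    return [(i, s) for i, s in enumerate(scores) if s > threshold]


docs = [
    ["tag", "recommendation", "lucene"],
    ["cooking", "pasta"],
    ["tag", "cloud"],
]
query = ["tag", "recommendation"]
print(retrieve_similar(query, docs, threshold=1.0))
```

In STaR the same query would be issued against both the Personal Index and the Social Index; here a single document list stands in for either one.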


Page 15: An IR-based approach to tag recommendation

STaR Retrieval of Similar Resources


Page 16: An IR-based approach to tag recommendation

STaR Architecture

Page 17: An IR-based approach to tag recommendation

STaR: extraction of candidate tags

• Extraction of tags from the most similar resources retrieved in the previous step

• Building a set of candidate tags

• Each tag is assigned a score by weighting its normalized occurrence with the similarity score returned by Lucene

• Different weights can be given to resources retrieved from the Personal Index and from the Social Index
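
One possible reading of this scoring step is sketched below. The slides do not give the exact formula, so the combination used here (source weight times similarity, summed over the resources carrying the tag and divided by the number of retrieved resources) is an assumption, as are the function name and the sample data. The default weights 0.3 and 0.7 are borrowed from the hybrid_3 configuration of Experiment 2.

```python
from collections import defaultdict


def score_candidate_tags(retrieved, social_weight=0.3, personal_weight=0.7):
    """Rank candidate tags.

    `retrieved` is a list of (source, similarity, tags) tuples, where
    source is "personal" or "social" (hypothetical input format).
    """
    weights = {"social": social_weight, "personal": personal_weight}
    totals = defaultdict(float)
    for source, similarity, tags in retrieved:
        for tag in set(tags):
            totals[tag] += weights[source] * similarity
    n = len(retrieved)
    # Normalize by the number of retrieved resources (assumed normalization).
    scores = {tag: total / n for tag, total in totals.items()} if n else {}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


retrieved = [
    ("personal", 0.9, ["ir", "lucene"]),
    ("social", 0.8, ["ir", "web"]),
    ("social", 0.4, ["web"]),
]
for tag, score in score_candidate_tags(retrieved):
    print(tag, round(score, 2))
```

Setting one of the two weights to 0.0 reduces the sketch to a purely community-based or purely user-based recommender.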


Page 18: An IR-based approach to tag recommendation

STaR Tag Extraction Process


Page 19: An IR-based approach to tag recommendation

experimental evaluation

• Goal

• To evaluate the accuracy of STaR using different Lucene scoring functions (Experiment 1)

• Original vs. BM25

• To evaluate the best combination of weights for resources retrieved from Personal Index and Social Index (Experiment 2)

• Dataset

• Gathered from Bibsonomy

• 263,004 bookmark posts, 158,924 BibTeX entries, 3,617 different users


Page 20: An IR-based approach to tag recommendation

results of experiment 1

scoring    resource   precision  recall  f1
original   bookmark   25.26      29.67   27.29
bm25       bookmark   25.62      36.62   30.15
original   BibTeX     14.06      21.45   16.99
bm25       BibTeX     13.72      22.91   17.16
original   overall    16.43      23.58   19.37
bm25       overall    16.45      26.46   20.29
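
The f1 column is consistent with the usual harmonic mean of precision and recall, which can be checked directly. The `f1` helper below is a minimal sketch; only the precision and recall figures are taken from the slide.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs from Experiment 1, bookmark rows.
original_bookmark = (25.26, 29.67)
bm25_bookmark = (25.62, 36.62)

print(round(f1(*original_bookmark), 2))
print(round(f1(*bm25_bookmark), 2))
```

Since precision barely changes between the two scoring functions, the F1 gain of BM25 comes almost entirely from the higher recall.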


Page 21: An IR-based approach to tag recommendation

results of experiment 2

approach         social tag weight  personal tag weight  precision  recall  f1
community-based  1.0                0.0                  34.44      35.89   35.15
user-based       0.0                1.0                  44.73      40.53   42.53
hybrid_1         0.7                0.3                  32.31      38.57   35.16
hybrid_2         0.5                0.5                  32.36      37.55   34.76
hybrid_3         0.3                0.7                  35.47      39.68   37.46
baseline         -                  -                    42.03      13.23   20.13


Page 22: An IR-based approach to tag recommendation

ECML/PKDD Discovery Challenge 2009

•STaR participated in the ECML/PKDD 2009 Discovery Challenge

•The only Italian team

•Sixth place in the task of content-based tag recommendation (more than 20 participants)

We are there


Page 23: An IR-based approach to tag recommendation

conclusions

• Users tend to reuse their own tags to annotate similar resources

• The integration of a more effective scoring function (BM25) improves the recommender's accuracy

• Robust recommendation model

• Participation in the Discovery Challenge @ECML-PKDD 09

• Future Work

• Tag extraction from textual content of resources

• Work in progress: 3% improvement in F1-measure on the ECML/PKDD 09 dataset

• Word Sense Disambiguation algorithms for tackling tag synonymy and polysemy


Page 24: An IR-based approach to tag recommendation

http://www.di.uniba.it/~swap/

Cataldo Musto, Ph.D. Student

University of Bari - “Aldo Moro”, Italy

[email protected]

Thanks for your attention