Tag Sources for Recommendation in Collaborative Tagging Systems Faculty of Computer Science, Dalhousie University (Canada) Marek Lipczak Yeming Hu Yael Kollet Evangelos Milios


Post on 25-Jun-2020


Tag Sources for Recommendation in Collaborative Tagging Systems

Faculty of Computer Science, Dalhousie University (Canada)

Marek Lipczak, Yeming Hu, Yael Kollet, Evangelos Milios

Content-based recommendation task: results 2009

Content-based recommendation task: results 2008

Tag recommendation system

The URL recommender is skipped for simplicity

Tag recommendation system

Example: a user is posting a web page:

YouTube - Web 2.0 ... The Machine is Us/ing Us
http://www.youtube.com/watch?v=6gmP4nk0EOE

Content-based tags

Extracts tags from the resource title (and from the URL, skipped here)

Extraction of title-based tags

youtube  web  20  the  machine  is  using  us

Each title word becomes a tag
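The title-to-words step can be sketched as follows. This is a minimal illustration, not the authors' tokenizer: the normalization rule (lowercase, keep only letters and digits, drop empty tokens) is an assumption chosen because it reproduces the eight words shown for the example title, and the function name `title_to_tags` is hypothetical.

```python
import re


def title_to_tags(title: str) -> list[str]:
    """Turn each whitespace-separated title token into a candidate tag.

    Assumed normalization: lowercase, then strip everything except
    ASCII letters and digits ("2.0" -> "20", "Us/ing" -> "using").
    """
    tags = []
    for token in title.lower().split():
        cleaned = re.sub(r"[^a-z0-9]", "", token)
        if cleaned:  # tokens such as "-" or "..." become empty and are dropped
            tags.append(cleaned)
    return tags


title = "YouTube - Web 2.0 ... The Machine is Us/ing Us"
print(title_to_tags(title))
# ['youtube', 'web', '20', 'the', 'machine', 'is', 'using', 'us']
```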

Extraction of title-based tags

youtube  web    20     the      machine  is       using    us
0.097    0.191  0.092  < 0.001  0.075    < 0.001  < 0.001  < 0.001

Score pre-calculated for each word over the entire corpus
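The slides do not say how the per-word score is defined. One plausible definition, assumed here purely for illustration, is the fraction of training posts containing the word in the title in which the same word was also used as a tag. The function name and the toy corpus below are hypothetical.

```python
from collections import Counter


def title_word_scores(posts):
    """posts: iterable of (title_words, tags) pairs from a training corpus.

    Returns word -> score in [0, 1], where the score is (assumption)
    the share of posts with the word in the title that also carry the
    word as a tag.
    """
    in_title = Counter()
    in_title_and_tags = Counter()
    for title_words, tags in posts:
        tag_set = set(tags)
        for word in set(title_words):
            in_title[word] += 1
            if word in tag_set:
                in_title_and_tags[word] += 1
    return {word: in_title_and_tags[word] / n for word, n in in_title.items()}


# Toy corpus (made up): "the" appears in every title but is never a tag.
toy_posts = [
    (["web", "design", "the", "basics"], ["web", "design"]),
    (["the", "semantic", "web"], ["semanticweb"]),
    (["the", "machine"], ["machine", "learning"]),
]
scores = title_word_scores(toy_posts)
print(scores["web"], scores["the"], scores["machine"])
```

Under this definition stop words such as "the" naturally receive scores near zero, which matches the pattern of the example scores.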

Extraction of title-based tags

youtube  web    20     the      machine  is       using    us
0.097    0.191  0.092  < 0.001  0.075    < 0.001  < 0.001  < 0.001

Removal of low-quality tags (score < 0.05)
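The filtering step amounts to a threshold on the pre-calculated scores. A minimal sketch using the scores from the example; the values shown as "< 0.001" on the slide are represented by 0.001 as a stand-in, which is an assumption that does not affect the outcome.

```python
THRESHOLD = 0.05  # cut-off stated on the slide

title_scores = {
    "youtube": 0.097, "web": 0.191, "20": 0.092, "machine": 0.075,
    # shown only as "< 0.001" on the slide; 0.001 is a stand-in value
    "the": 0.001, "is": 0.001, "using": 0.001, "us": 0.001,
}


def remove_low_quality(scores, threshold=THRESHOLD):
    """Keep only tags whose score reaches the threshold."""
    return {tag: s for tag, s in scores.items() if s >= threshold}


content_tags = remove_low_quality(title_scores)
print(content_tags)
# {'youtube': 0.097, 'web': 0.191, '20': 0.092, 'machine': 0.075}
```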

Content-based tags – result

youtube  web    20     machine
0.097    0.191  0.092  0.075

Retrieval of tags based on TitleToTag graph

Exploits co-occurrence between words from the title and tags

TitleToTag graph

Relations model created based on training data
Co-occurrence score similar to confidence in association rule mining
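In association rule mining, the confidence of a rule word -> tag is the number of posts containing both divided by the number of posts containing the word. The slide only says the co-occurrence score is "similar to" confidence, so the sketch below uses plain confidence as an approximation; the function name and the toy training posts are hypothetical.

```python
from collections import Counter, defaultdict


def build_title_to_tag_graph(posts):
    """posts: iterable of (title_words, tags) pairs.

    Returns graph[word][tag] = confidence(word -> tag)
                             = #posts(word in title and tag used) / #posts(word in title)
    """
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for title_words, tags in posts:
        for word in set(title_words):
            word_count[word] += 1
            for tag in set(tags):
                pair_count[word][tag] += 1
    return {
        word: {tag: c / word_count[word] for tag, c in tag_counts.items()}
        for word, tag_counts in pair_count.items()
    }


toy_posts = [
    (["youtube", "video"], ["video", "music"]),
    (["youtube"], ["video"]),
    (["web"], ["web20"]),
]
graph = build_title_to_tag_graph(toy_posts)
print(graph["youtube"])  # {'video': 1.0, 'music': 0.5} (key order may vary)
```

The same construction with tags on both sides of the rule would give a tag-to-tag graph.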

Retrieval of tags based on TitleToTag graph

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TitleToTag graph, co-occurrence scores for each title word:
youtube: video 0.60, youtube 0.07, web20 0.03, music 0.02, ...
web: web 0.15, semanticweb 0.13, web20 0.12, semantic 0.05, ...
20: web20 0.39, web 0.12, 20 0.08, Blog 0.05, ...
machine: machinelearning 0.06, machine 0.05, learning 0.05, Juergen 0.05, ...

Retrieve tags that frequently co-occur with title words

Retrieval of tags based on TitleToTag graph

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TitleToTag graph, weighted co-occurrence scores:
youtube: video 0.097*0.60, youtube 0.097*0.07, web20 0.097*0.03, music 0.097*0.02, ...
web: web 0.191*0.15, semanticweb 0.191*0.13, web20 0.191*0.12, semantic 0.191*0.05, ...
20: web20 0.092*0.39, web 0.092*0.12, 20 0.092*0.08, Blog 0.092*0.05, ...
machine: machinelearning 0.075*0.06, machine 0.075*0.05, learning 0.075*0.05, Juergen 0.075*0.05, ...

Multiply the co-occurrence scores by the score of the tag from the title recommender

TitleToTag recommender – result

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TitleToTag recommender – result:
web20  video  web    sema[..]web  semantic  20     youtube  ...
0.061  0.058  0.039  0.025        0.010     0.007  0.007

Combine scores of duplicates in a probabilistic way:

l1 join l2 = l1 + l2 - l1*l2
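The whole TitleToTag step (weight each co-occurrence score by the title word's score, then merge duplicates with the probabilistic join) can be sketched with the numbers visible on the slides. Only the four graph rows shown per title word are included, and the function names are illustrative rather than taken from the authors' system.

```python
def join(l1, l2):
    """Probabilistic combination of two scores: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def retrieve(source_scores, graph):
    """Weight graph neighbours by the source score and join duplicates."""
    result = {}
    for source, source_score in source_scores.items():
        for tag, cooccurrence in graph.get(source, {}).items():
            result[tag] = join(result.get(tag, 0.0), source_score * cooccurrence)
    return result


content_tags = {"youtube": 0.097, "web": 0.191, "20": 0.092, "machine": 0.075}

# Visible rows of the TitleToTag graph from the example slides.
title_to_tag = {
    "youtube": {"video": 0.60, "youtube": 0.07, "web20": 0.03, "music": 0.02},
    "web": {"web": 0.15, "semanticweb": 0.13, "web20": 0.12, "semantic": 0.05},
    "20": {"web20": 0.39, "web": 0.12, "20": 0.08, "Blog": 0.05},
    "machine": {"machinelearning": 0.06, "machine": 0.05,
                "learning": 0.05, "Juergen": 0.05},
}

title_to_tag_result = retrieve(content_tags, title_to_tag)
for tag, score in sorted(title_to_tag_result.items(), key=lambda kv: -kv[1])[:7]:
    print(f"{tag:12s} {score:.3f}")
```

With these inputs the seven highest scores agree, to within rounding in the third decimal place, with the result list on the slide (web20 0.061, video 0.058, web 0.039, and so on).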

Retrieval of tags based on TagToTag graph

Exploits co-occurrence between tags
Assumes that the content-based tags are correct

TagToTag graph

Relations model created based on training data
Co-occurrence score similar to confidence in association rule mining

Retrieval of tags based on TagToTag graph

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TagToTag graph, co-occurrence scores for each content-based tag:
youtube: youtube 1.00, video 0.46, web20 0.16, music 0.11, ...
web: web 1.00, web20 0.16, semantic 0.15, tools 0.13, ...
20: 20 1.00, web 0.81, web20 0.23, research 0.19, ...
machine: machine 1.00, learning 0.71, ml 0.20, diplomarbeit 0.15, ...

Retrieve tags that frequently co-occur with content-based tags

Retrieval of tags based on TagToTag graph

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TagToTag graph, weighted co-occurrence scores:
youtube: youtube 0.097*1.00, video 0.097*0.46, web20 0.097*0.16, music 0.097*0.11, ...
web: web 0.191*1.00, web20 0.191*0.16, semantic 0.191*0.15, tools 0.191*0.13, ...
20: 20 0.092*1.00, web 0.092*0.81, web20 0.092*0.23, research 0.092*0.19, ...
machine: machine 0.075*1.00, learning 0.075*0.71, ml 0.075*0.20, diplomarbeit 0.075*0.15, ...

Multiply the co-occurrence scores by the score of the tag from the content recommender

TagToTag recommender – result

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TagToTag recommender – result:
web    youtube  20     machine  web20  learning  video  ...
0.252  0.097    0.092  0.075    0.066  0.053     0.045

Combine scores of duplicates in a probabilistic way:

l1 join l2 = l1 + l2 - l1*l2
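The TagToTag result can be checked the same way, using only the graph rows visible on the slides. This is a sketch with illustrative function names. From the visible rows "web" comes out at about 0.251 rather than the 0.252 on the slide; the small gap may come from rounding of the displayed inputs or from rows hidden behind the "...".

```python
def join(l1, l2):
    """Probabilistic combination of two scores: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def retrieve(source_scores, graph):
    """Weight graph neighbours by the source score and join duplicates."""
    result = {}
    for source, source_score in source_scores.items():
        for tag, cooccurrence in graph.get(source, {}).items():
            result[tag] = join(result.get(tag, 0.0), source_score * cooccurrence)
    return result


content_tags = {"youtube": 0.097, "web": 0.191, "20": 0.092, "machine": 0.075}

# Visible rows of the TagToTag graph; every tag co-occurs with itself at 1.00.
tag_to_tag = {
    "youtube": {"youtube": 1.00, "video": 0.46, "web20": 0.16, "music": 0.11},
    "web": {"web": 1.00, "web20": 0.16, "semantic": 0.15, "tools": 0.13},
    "20": {"20": 1.00, "web": 0.81, "web20": 0.23, "research": 0.19},
    "machine": {"machine": 1.00, "learning": 0.71, "ml": 0.20,
                "diplomarbeit": 0.15},
}

tag_to_tag_result = retrieve(content_tags, tag_to_tag)
for tag, score in sorted(tag_to_tag_result.items(), key=lambda kv: -kv[1])[:7]:
    print(f"{tag:10s} {score:.3f}")
```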

Resource profile recommender

Recommends tags used for the same resource by other users
Tags scored by frequency
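A minimal sketch of a frequency-based resource profile. The slide says only that tags are "scored by frequency"; dividing by the number of posts for the resource is an assumption made here so that scores fall in [0, 1]. The function name and toy data are hypothetical.

```python
from collections import Counter


def resource_profile(posts_for_resource):
    """posts_for_resource: list of tag lists, one per earlier post of the resource.

    Score (assumption) = share of those posts that used the tag.
    """
    n = len(posts_for_resource)
    counts = Counter(tag for tags in posts_for_resource for tag in set(tags))
    return {tag: c / n for tag, c in counts.items()}


# Toy data: four users posted the same resource.
earlier_posts = [
    ["video", "web20"],
    ["video", "youtube"],
    ["video", "web20", "society"],
    ["web20"],
]
profile = resource_profile(earlier_posts)
print(profile)  # video 0.75, web20 0.75, youtube 0.25, society 0.25
```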

Resource related tags

Large but imprecise set of tags related to the resource

Resource related tags

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

TitleToTag recommender – result:
web20  video  web    sema[..]web  semantic  20     youtube  ...
0.061  0.058  0.039  0.025        0.010     0.007  0.007

TagToTag recommender – result:
web    youtube  20     machine  web20  learning  video  ...
0.252  0.097    0.092  0.075    0.066  0.053     0.045

Resource profile recommender – result:
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext  ...
0.857  0.785  0.214         0.214    0.214      0.142    0.142

Combine scores of duplicates in a probabilistic way:

l1 join l2 = l1 + l2 - l1*l2
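Merging the three recommenders into the resource related set uses the same join. The sketch below feeds in a few of the scores displayed on the slides. The pairing of "youtube" with 0.214 in the resource profile is inferred from the slide layout, and the function name `merge` is illustrative.

```python
def join(l1, l2):
    """Probabilistic combination of two scores: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def merge(*recommendations):
    """Union of several tag -> score dicts; duplicate tags are joined."""
    merged = {}
    for recommendation in recommendations:
        for tag, score in recommendation.items():
            merged[tag] = join(merged.get(tag, 0.0), score)
    return merged


# Subsets of the three result lists, as displayed on the slides.
title_to_tag = {"web20": 0.061, "video": 0.058, "20": 0.007, "youtube": 0.007}
tag_to_tag = {"youtube": 0.097, "20": 0.092, "machine": 0.075,
              "web20": 0.066, "learning": 0.053, "video": 0.045}
resource_profile = {"video": 0.857, "web20": 0.785, "youtube": 0.214}

resource_related = merge(title_to_tag, tag_to_tag, resource_profile)
for tag in ("video", "web20", "youtube", "20"):
    print(f"{tag:8s} {resource_related[tag]:.3f}")
```

For these four tags the output agrees with the resource related scores shown in the example (0.871, 0.811, 0.295, 0.098).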

Resource related tags – result

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

Resource related tags – result:
video  web20  web    youtube  soc[..]ware  viamwesch  society  hypertext
0.871  0.811  0.383  0.295    0.214        0.214      0.142    0.142
identity  people  20     sema[..]web  sema[..]web  machine  learning  ...
0.142     0.142   0.098  0.094        0.044        0.075    0.053

Large but imprecise set of tags related to the resource

Resource related tags

Returns tags previously used by the user (day-based frequency)
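"Day-based frequency" is not defined further on the slide. One reading, assumed here for illustration only, is the share of the user's active days on which the tag was used at least once, so that many posts on a single day count once. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def user_profile(posts):
    """posts: list of (day, tags) pairs for one user.

    Score (assumption) = #days on which the tag was used / #days with any post.
    """
    days_with_tag = defaultdict(set)
    active_days = set()
    for day, tags in posts:
        active_days.add(day)
        for tag in tags:
            days_with_tag[tag].add(day)
    return {tag: len(days) / len(active_days)
            for tag, days in days_with_tag.items()}


toy_posts = [
    (date(2009, 3, 1), ["latex", "java"]),
    (date(2009, 3, 1), ["latex"]),          # same day: "latex" still counts once
    (date(2009, 3, 2), ["java", "music"]),
]
profile = user_profile(toy_posts)
print(profile)  # {'latex': 0.5, 'java': 1.0, 'music': 0.5}
```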

Resource and user related tags

Set of tag recommendations related both to the resource and to the user

Resource and user related tags

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

Resource related tags – result:
video  web20  web    youtube  soc[..]ware  viamwesch  society  hypertext
0.871  0.811  0.383  0.295    0.214        0.214      0.142    0.142
identity  people  20     sema[..]web  machine  social  learning  ...
0.142     0.142   0.098  0.094        0.075    0.071   0.053

User profile tags (tags and scores as listed on the slide):
fire[...]marks, 6606, latex, books, music, frombrowser, java, msc
0.123 0.077 0.036 0.036 0.334 0.027 0.027 0.024
design, visualization, dictionary, social, web20, humor, api, …
0.024 0.024 0.020 0.017 0.017 0.017 0.013

Intersection of tags related to resource and user

Resource and user related tags – result

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

Resource related tags – result:
video  web20  web    youtube  soc[..]ware  viamwesch  society  hypertext
0.871  0.811  0.383  0.295    0.214        0.214      0.142    0.142
identity  people  20     sema[..]web  machine  social  learning  ...
0.142     0.142   0.098  0.094        0.075    0.071   0.053

User profile tags (tags and scores as listed on the slide):
fire[...]marks, 6606, latex, books, music, frombrowser, java, msc
0.123 0.077 0.036 0.036 0.334 0.027 0.027 0.024
design, visualization, dictionary, social, web20, humor, api, …
0.024 0.024 0.020 0.017 0.017 0.017 0.013

Intersection of tags related to resource and user – result:
web20        social
0.811*0.013  0.071*0.017

Multiply the scores of tags
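The intersection keeps only tags present in both sets and multiplies their scores. A minimal sketch with values from the example; the user profile dict contains only the two entries whose scores the slide pairs explicitly, and the function name is illustrative.

```python
def intersect(resource_related, user_profile):
    """Tags present in both sets, scored by the product of the two scores."""
    shared = resource_related.keys() & user_profile.keys()
    return {tag: resource_related[tag] * user_profile[tag] for tag in shared}


resource_related = {"video": 0.871, "web20": 0.811, "machine": 0.075,
                    "social": 0.071}
user_profile = {"social": 0.017, "web20": 0.013}

user_x_resource = intersect(resource_related, user_profile)
for tag, score in sorted(user_x_resource.items(), key=lambda kv: -kv[1]):
    print(f"{tag:8s} {score:.3f}")
# web20    0.011
# social   0.001
```

The slides show these two products as 0.010 and 0.001, which suggests the displayed values were truncated rather than rounded.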

Final recommendation

Union of the results of the three most precise basic recommenders

Final result = title + resource profile + intersection of resource related and user profile

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

Resource profile tags – result:
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext  ...
0.857  0.785  0.214         0.214    0.214      0.142    0.142

Intersection of tags related to resource and user – result:
web20        social
0.811*0.013  0.071*0.017

Results of the three recommenders, sorted by score:
web    youtube  20     machine
0.191  0.097    0.092  0.075
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext
0.857  0.785  0.214         0.214    0.214      0.142    0.142
web20  social
0.010  0.001

Scores of results of different recommenders are not comparable – rescoring step:
title tags: x 0.3 / 0.191
resource profile tags: x 0.3 / 0.857
intersection tags: x 0.45 / 0.010
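The rescoring step appears to scale each recommender's list so that its top tag receives a fixed score (0.3 for title and resource profile, 0.45 for the intersection). The sketch below assumes exactly that linear scaling, which reproduces the rescored values in the example; the function name `rescore` is illustrative.

```python
def rescore(recommendation, top_score):
    """Scale all scores linearly so the best tag gets top_score."""
    if not recommendation:
        return {}
    best = max(recommendation.values())
    return {tag: score * top_score / best
            for tag, score in recommendation.items()}


title = {"web": 0.191, "youtube": 0.097, "20": 0.092, "machine": 0.075}
resource_profile = {"video": 0.857, "web20": 0.785, "youtube": 0.214}
intersection = {"web20": 0.010, "social": 0.001}  # values as displayed

title_rescored = rescore(title, 0.3)
profile_rescored = rescore(resource_profile, 0.3)
intersection_rescored = rescore(intersection, 0.45)

for name, rec in [("title", title_rescored),
                  ("resource profile", profile_rescored),
                  ("intersection", intersection_rescored)]:
    print(name, {tag: round(score, 3) for tag, score in rec.items()})
```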

Final result = title + resource profile + intersection of resource related and user profile

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

Resource profile tags – result:
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext  ...
0.857  0.785  0.214         0.214    0.214      0.142    0.142

Intersection of tags related to resource and user – result:
web20        social
0.811*0.013  0.071*0.017

Scores of results of different recommenders are not comparable – rescoring step

Rescored results:
web    youtube  20     machine
0.300  0.152    0.144  0.118
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext
0.300  0.275  0.075         0.075    0.075      0.050    0.050
web20  social
0.450  0.045

Final result = title + resource profile + intersection of resource related and user profile

Content-based tags – result:
youtube  web    20     machine
0.097    0.191  0.092  0.075

Resource profile tags – result:
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext  ...
0.857  0.785  0.214         0.214    0.214      0.142    0.142

Intersection of tags related to resource and user – result:
web20        social
0.811*0.013  0.071*0.017

Rescored results:
web    youtube  20     machine
0.300  0.152    0.144  0.118
video  web20  soc[...]ware  youtube  viamwesch  society  hypertext
0.300  0.275  0.075         0.075    0.075      0.050    0.050
web20  social
0.450  0.045

Final result:
web20  web    youtube  20     machine
0.601  0.300  0.216    0.144  0.118
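The final union can be sketched by joining the three rescored lists with the same probabilistic formula used earlier (l1 + l2 - l1*l2). The inputs are the rescored values from the example; the resource profile dict lists only three of its entries, and the function names are illustrative.

```python
def join(l1, l2):
    """Probabilistic combination of two scores: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def merge(*recommendations):
    """Union of several tag -> score dicts; duplicate tags are joined."""
    merged = {}
    for recommendation in recommendations:
        for tag, score in recommendation.items():
            merged[tag] = join(merged.get(tag, 0.0), score)
    return merged


title = {"web": 0.300, "youtube": 0.152, "20": 0.144, "machine": 0.118}
resource_profile = {"video": 0.300, "web20": 0.275, "youtube": 0.075}
intersection = {"web20": 0.450, "social": 0.045}

final = merge(title, resource_profile, intersection)
for tag, score in sorted(final.items(), key=lambda kv: -kv[1]):
    print(f"{tag:8s} {score:.3f}")
```

The slide lists web20, web, youtube, 20 and machine in the final result; the scores computed here for those five tags (0.601, 0.300, 0.216, 0.144, 0.118) match the displayed ones.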

System evaluation – Tasks

Content-based recommendation task (Task 1)
  Our main focus
  98.2% of test posts
  Easier to understand

Graph-based recommendation task (Task 2)
  Question of practicality
  1.8% of test posts
  Frequent tags only
  Harder to draw conclusions

Results – Content-based recommendation

The overall result of the system is defined by tags extracted from the resource title (and URL)

Recommender  F1 at 5
title        0.17230
resource     0.03252
user         0.05581
userXres     0.07093
final        0.18740

[Chart panels: BibTeX, bookmark]
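"F1 at 5" is the harmonic mean of precision and recall computed over the top five recommended tags. The sketch below computes it for a single post. How the challenge aggregates over posts (for example, averaging precision and recall first) is not stated on the slides, and the ground-truth tags in the usage example are made up.

```python
def f1_at_k(recommended, true_tags, k=5):
    """F1 of the top-k recommended tags against the user's actual tags."""
    top = recommended[:k]
    truth = set(true_tags)
    hits = len(set(top) & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


recommended = ["web20", "web", "youtube", "20", "machine"]
true_tags = ["web20", "video", "youtube"]  # hypothetical ground truth
print(f1_at_k(recommended, true_tags))
# precision = 2/5, recall = 2/3, so F1 is 0.5 (up to floating-point rounding)
```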

System evaluation – decisions

Title and URL / Title only
  For bookmark posts, title tags were combined with URL tags
  Alternative: title as the only source of content tags for both types of posts

Clean title / Use low-quality tags
  Title tags with a low score (< 0.05) were removed from the title recommendation set
  Alternative: all title tags are used in the recommendation process

Separate models / Common models
  Two separate sets of models were built for BibTeX and bookmark posts
  Alternative: common models built based on all posts

Decision – Title and URL / Title only

WRONG – the improvement in recall does not make up for the drop in precision
Augmenting tags from a precise source is hard

                         F1 at 5
content based
  title and URL          0.17230
  title only             0.17743
final recommendation
  final (title and URL)  0.18740
  final (title only)     0.19032

Decision – Clean title / Use low-quality tags

RIGHT – to maximize F1, precision and recall should be equalized

                         F1 at 5
content based
  title only             0.17743
  with low q. tags       0.16754
final recommendation
  final (title only)     0.19032
  final (low q. tags)    0.18425

Decision – Separate models / Common models

WRONG – separate models are slightly less accurate (counter-intuitive)

                         F1 at 5
content based
  title only             0.17743
  title (common m.)      0.17829
final recommendation
  final (title only)     0.19032
  final (common m.)      0.19122

Results – Graph-based recommendation

Recommender  F1 at 5
title        0.23484
resource     0.30706
user         0.12753
userXres     0.22642
final        0.32461

[Chart panels: BibTeX, bookmark]

Final recommendation is mostly defined by resource profile tags
Intersection of user and resource related tags is a worse source of tags than resource profile (and title) – problem of imported posts?

Conclusions

Only one step ahead of baseline recommenders
  Title for content-based recommendation
  Resource profile for graph-based recommendation

Potential of user-based recommendation still undefined
  Slight improvement for both tasks
  Noise caused by imported posts

Are the two proposed evaluation methods representative?

Future work

Exploitation of user-specific patterns
  User-specific tags (e.g., name of the author for each BibTeX publication)
  Handling of multi-word concepts (“information”, “retrieval” or “information_retrieval”?)
  Short temporal patterns (sequence of posts addressing the same problem)

Solution for imported posts noise


Thank you!