Tag Sources for Recommendation in Collaborative Tagging Systems

Faculty of Computer Science, Dalhousie University (Canada)

Marek Lipczak, Yeming Hu, Yael Kollet, Evangelos Milios

Content-based recommendation task – results 2009

Content-based recommendation task – results 2008

Tag recommendation system

URL recommender skipped for simplicity

Example: A user is posting a web page:

YouTube - Web 2.0 ... The Machine is Us/ing Us
http://www.youtube.com/watch?v=6gmP4nk0EOE

Content-based tags

Extracts tags from the resource title (and from the URL, skipped here)

Extraction of title based tags

youtube  web  20  the  machine  is  using  us

Each title word becomes a tag


youtube  web    20     the      machine  is       using    us
0.097    0.191  0.092  < 0.001  0.075    < 0.001  < 0.001  < 0.001

Score pre-calculated for each word over the entire corpus

Removal of low-quality tags (score < 0.05)

Content-based tags – result

youtube  web    20     machine
0.097    0.191  0.092  0.075
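
The title step can be sketched in a few lines of Python. This is only an illustration under assumptions: the tokenization rule (lowercase, strip non-alphanumeric characters) is inferred from the example title, `WORD_SCORE` is a hypothetical lookup table holding the slide values, and 0.0005 is a placeholder for the entries shown only as "< 0.001".

```python
import re

# Hypothetical pre-computed word scores. The values mirror the slide;
# 0.0005 is a placeholder for the entries shown only as "< 0.001".
WORD_SCORE = {
    "youtube": 0.097, "web": 0.191, "20": 0.092, "the": 0.0005,
    "machine": 0.075, "is": 0.0005, "using": 0.0005, "us": 0.0005,
}


def title_tags(title, word_score, threshold=0.05):
    """Turn each title word into a candidate tag and drop low-scored ones."""
    tags = {}
    for token in title.lower().split():
        # Assumed normalization: "2.0" -> "20", "us/ing" -> "using".
        word = re.sub(r"[^a-z0-9]", "", token)
        if not word:
            continue  # token was pure punctuation, e.g. "-" or "..."
        score = word_score.get(word, 0.0)
        if score >= threshold:
            tags[word] = score
    return tags


result = title_tags("YouTube - Web 2.0 ... The Machine is Us/ing Us", WORD_SCORE)
print(result)
```

With the slide values, only youtube, web, 20 and machine pass the 0.05 threshold.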

Retrieval of tags based on TitleToTag graph

Exploits co-occurrence between words from the title and tags

TitleToTag graph

Relations model created based on training data

Co-occurrence score similar to confidence in association rule mining
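
One way to read "similar to confidence" is the plain association-rule confidence: the share of training posts with a given title word that also carry a given tag. The sketch below assumes exactly that definition; the function name `build_title_to_tag` and the toy posts are invented for illustration, and the actual score used in the system may differ.

```python
from collections import Counter, defaultdict


def build_title_to_tag(posts):
    """posts: iterable of (title_words, tags).

    Returns graph[word][tag] = confidence(word -> tag), i.e.
    (#posts with word in title and tag assigned) / (#posts with word in title).
    """
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for title_words, tags in posts:
        for word in set(title_words):
            word_count[word] += 1
            for tag in set(tags):
                pair_count[word][tag] += 1
    return {
        word: {tag: n / word_count[word] for tag, n in tag_counts.items()}
        for word, tag_counts in pair_count.items()
    }


# Toy training posts, invented for illustration only.
toy_posts = [
    (["youtube", "web"], ["video", "web20"]),
    (["youtube", "music"], ["video", "music"]),
    (["web", "design"], ["web20", "design"]),
    (["youtube"], ["fun"]),
]
graph = build_title_to_tag(toy_posts)
print(graph["youtube"])
```

In the toy data, "youtube" appears in three titles and two of those posts are tagged "video", so the edge youtube -> video gets a score of 2/3.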


Content-based tags – result

youtube  web    20     machine
0.097    0.191  0.092  0.075

TitleToTag graph

youtube       web               20          machine
video 0.60    web 0.15          web20 0.39  machinelearning 0.06
youtube 0.07  semanticweb 0.13  web 0.12    machine 0.05
web20 0.03    web20 0.12        20 0.08     learning 0.05
music 0.02    semantic 0.05     Blog 0.05   Juergen 0.05
...           ...               ...         ...

Retrieve tags that frequently co-occur with title words


youtube             web                     20                machine
video 0.097*0.60    web 0.191*0.15          web20 0.092*0.39  machinelearning 0.075*0.06
youtube 0.097*0.07  semanticweb 0.191*0.13  web 0.092*0.12    machine 0.075*0.05
web20 0.097*0.03    web20 0.191*0.12        20 0.092*0.08     learning 0.075*0.05
music 0.097*0.02    semantic 0.191*0.05     Blog 0.092*0.05   Juergen 0.075*0.05
...                 ...                     ...               ...

Multiply the co-occurrence scores by the score of the tag from the title recommender


TitleToTag recommender – result

web20  video  web    sema[..]web  semantic  20     youtube  ...
0.061  0.058  0.039  0.025        0.010     0.007  0.007

Combine scores of duplicates in a probabilistic way:

l1 join l2 = l1 + l2 - l1*l2
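
The multiply-then-join step can be replayed with the numbers from the slides. A minimal sketch, assuming only the graph edges visible on the slide (the rows elided as "..." are left out); `spread` is a hypothetical helper name.

```python
def join(l1, l2):
    """Probabilistic join from the slide: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def spread(source_scores, graph):
    """Weight each graph edge by its source score and join duplicate tags."""
    result = {}
    for source, source_score in source_scores.items():
        for tag, cooccurrence in graph.get(source, {}).items():
            contribution = source_score * cooccurrence
            # join(0.0, x) == x, so the first contribution is stored as-is.
            result[tag] = join(result.get(tag, 0.0), contribution)
    return result


content_tags = {"youtube": 0.097, "web": 0.191, "20": 0.092, "machine": 0.075}

# Only the edges shown on the slide; the real graph has more.
title_to_tag = {
    "youtube": {"video": 0.60, "youtube": 0.07, "web20": 0.03, "music": 0.02},
    "web": {"web": 0.15, "semanticweb": 0.13, "web20": 0.12, "semantic": 0.05},
    "20": {"web20": 0.39, "web": 0.12, "20": 0.08, "Blog": 0.05},
    "machine": {"machinelearning": 0.06, "machine": 0.05,
                "learning": 0.05, "Juergen": 0.05},
}

scores = spread(content_tags, title_to_tag)
top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:4]
for tag, score in top:
    print(f"{tag:12s} {score:.3f}")
```

The tag web20 is reached from three title words (youtube, web, 20), which is why it ends up ahead of video even though no single edge to it is as strong as youtube -> video.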


Retrieval of tags based on TagToTag graph

Exploits co-occurrence between tags

Assumes that content-based tags are correct

TagToTag graph

Relations model created based on training data

Co-occurrence score similar to confidence in association rule mining


Content-based tags – result

youtube  web    20     machine
0.097    0.191  0.092  0.075

TagToTag graph

youtube       web            20             machine
youtube 1.00  web 1.00       20 1.00        machine 1.00
video 0.46    web20 0.16     web 0.81       learning 0.71
web20 0.16    semantic 0.15  web20 0.23     ml 0.20
music 0.11    tools 0.13     research 0.19  diplomarbeit 0.15
...           ...            ...            ...

Retrieve tags that frequently co-occur with content-based tags



youtube             web                  20                   machine
youtube 0.097*1.00  web 0.191*1.00       20 0.092*1.00        machine 0.075*1.00
video 0.097*0.46    web20 0.191*0.16     web 0.092*0.81       learning 0.075*0.71
web20 0.097*0.16    semantic 0.191*0.15  web20 0.092*0.23     ml 0.075*0.20
music 0.097*0.11    tools 0.191*0.13     research 0.092*0.19  diplomarbeit 0.075*0.15
...                 ...                  ...                  ...

Multiply the co-occurrence scores by the score of the tag from the content recommender


TagToTag recommender – result

web    youtube  20     machine  web20  learning  video  ...
0.252  0.097    0.092  0.075    0.066  0.053     0.045

l1 join l2 = l1 + l2 - l1*l2

Resource profile recommender

Recommends tags used for the same resource by other users

Tags scored by frequency
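
A minimal sketch of such a resource profile, assuming that "frequency" means the share of earlier posts of the resource that used the tag. The slide does not state the normalization, so that choice, the function name and the toy posts are all assumptions.

```python
from collections import Counter


def resource_profile(posts_for_resource):
    """posts_for_resource: list of tag lists, one per earlier post of the resource.

    Assumed scoring: fraction of those posts that contain the tag.
    """
    if not posts_for_resource:
        return {}
    counts = Counter(tag for tags in posts_for_resource for tag in set(tags))
    total = len(posts_for_resource)
    return {tag: n / total for tag, n in counts.items()}


# Toy posts by other users for one resource, invented for illustration.
toy = [["video", "web20"], ["video"], ["video", "web20", "society"], ["web20"]]
profile = resource_profile(toy)
print(profile)
```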

Resource related tags

Large but imprecise set of tags related to resource


TitleToTag recommender – result

web20  video  web    sema[..]web  semantic  20     youtube  ...
0.061  0.058  0.039  0.025        0.010     0.007  0.007

TagToTag recommender – result

web    youtube  20     machine  web20  learning  video  ...
0.252  0.097    0.092  0.075    0.066  0.053     0.045

Resource profile recommender – result

video  web20  soc[...]ware  youtube  viamwesch  society  hypertext  ...
0.857  0.785  0.214         0.214    0.214      0.142    0.142

l1 join l2 = l1 + l2 - l1*l2

Resource related tags – result

video  web20  web    youtube  soc[..]ware  viamwesch  society  hypertext
0.871  0.811  0.383  0.295    0.214        0.214      0.142    0.142

identity  people  20     sema[..]web  sema[..]web  machine  learning  ...
0.142     0.142   0.098  0.094        0.044        0.075    0.053

Large but imprecise set of tags related to the resource
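
The same join extends to three lists by folding it over the recommenders. The sketch below uses only the video and web20 scores that appear explicitly on the slides; `join_all` is a hypothetical helper name.

```python
def join(l1, l2):
    """Probabilistic join from the slides: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def join_all(*score_lists):
    """Merge several tag -> score dicts, joining the scores of duplicates."""
    merged = {}
    for scores in score_lists:
        for tag, score in scores.items():
            merged[tag] = join(merged.get(tag, 0.0), score)
    return merged


# Scores for two tags, taken from the three result slides.
title_to_tag = {"web20": 0.061, "video": 0.058}
tag_to_tag = {"web20": 0.066, "video": 0.045}
resource_profile = {"video": 0.857, "web20": 0.785}

related = join_all(title_to_tag, tag_to_tag, resource_profile)
for tag, score in sorted(related.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag:6s} {score:.3f}")
```

Because the join is equivalent to 1 - (1 - l1)(1 - l2), the order in which the lists are merged does not matter, and a tag's score can only grow as more sources support it.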



Returns tags previously used by the user (day-based frequency)
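
The slide does not define "day-based frequency". One plausible reading, assumed here, is the share of the user's active days on which the tag was used, so that many posts on a single day (for example an import) count only once. The function name and the toy posts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def user_profile(posts):
    """posts: list of (day, tags) for one user.

    Assumed scoring: (#distinct days the tag was used) / (#distinct days with posts).
    """
    days_with_tag = defaultdict(set)
    active_days = set()
    for day, tags in posts:
        active_days.add(day)
        for tag in tags:
            days_with_tag[tag].add(day)
    return {tag: len(days) / len(active_days)
            for tag, days in days_with_tag.items()}


# Toy posting history, invented for illustration only.
toy = [
    (date(2009, 1, 5), ["latex", "books"]),
    (date(2009, 1, 5), ["latex"]),  # same day: "latex" still counts once
    (date(2009, 1, 6), ["music"]),
    (date(2009, 1, 9), ["latex", "music"]),
    (date(2009, 1, 12), ["design"]),
]
profile = user_profile(toy)
print(profile)
```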

Resource and user related tags

Set of tag recommendations related both to resource and user

Resource and user related tags

Resource related tags

video  web20  web    youtube  soc[..]ware  viamwesch  society  hypertext
0.871  0.811  0.383  0.295    0.214        0.214      0.142    0.142

identity  people  20     sema[..]web  machine  social  learning  ...
0.142     0.142   0.098  0.094        0.075    0.071   0.053

User profile tags

fire[...]marks 6606 latex books music
frombrowser java msc
0.123 0.077 0.036 0.036 0.334 0.027 0.027 0.024

design visualization dictionary social web20 …
humor api
0.024 0.024 0.020 0.017 0.017 0.017 0.013

Intersection of tags related to resource and user

Resource and user related tags – result

web20        social
0.811*0.013  0.071*0.017

Multiply the scores of tags
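
The intersection step is a product over the tags the two lists share. A minimal sketch with the scores that can be read off the slides (only social and web20 are given for the user profile, since those are the two tags whose user scores appear in the product); `intersect` is a hypothetical helper name.

```python
def intersect(resource_related, user_profile):
    """Keep tags present in both lists and multiply their scores."""
    shared = resource_related.keys() & user_profile.keys()
    return {tag: resource_related[tag] * user_profile[tag] for tag in shared}


resource_related = {"video": 0.871, "web20": 0.811, "web": 0.383, "social": 0.071}
user_profile = {"social": 0.017, "web20": 0.013}

both = intersect(resource_related, user_profile)
for tag, score in sorted(both.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag:6s} {score:.4f}")
```

The products come out small (about 0.01 for web20 and 0.001 for social), which is one reason these scores need rescoring before they are merged with the other recommenders.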


Final recommendation

Union of the results of the three most precise basic recommenders

Final result = title + resource profile + intersection of resource related and user profile

Content-based tags – result

web    youtube  20     machine
0.191  0.097    0.092  0.075

Resource profile tags – result

video  web20  soc[...]ware  youtube  viamwesch  society  hypertext  ...
0.857  0.785  0.214         0.214    0.214      0.142    0.142

Intersection of tags related to resource and user – result

web20        social
0.811*0.013  0.071*0.017
0.010        0.001

Scores of results of different recommenders are not comparable – rescoring step

content-based tags                   x 0.3/0.191
resource profile tags                x 0.3/0.857
intersection of resource and user    x 0.45/0.010

Content-based tags after rescoring

web    youtube  20     machine
0.300  0.152    0.144  0.118

Intersection after rescoring

web20  social
0.450  0.045

Union of the rescored results

web20  web    youtube  20     machine
0.601  0.300  0.216    0.144  0.118
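
The rescoring can be read as scaling each list so that its top score hits a fixed target (0.3, 0.3 and 0.45 on the slides), followed by the same probabilistic join as before. The sketch below assumes that reading; `rescale` is a hypothetical helper, and the resource profile is cut down to the two tags whose scores are explicit on the slides.

```python
def join(l1, l2):
    """Probabilistic join from the slides: l1 + l2 - l1*l2."""
    return l1 + l2 - l1 * l2


def rescale(scores, new_top):
    """Scale all scores so that the highest one becomes new_top."""
    if not scores:
        return {}
    top = max(scores.values())
    return {tag: score * new_top / top for tag, score in scores.items()}


title = {"web": 0.191, "youtube": 0.097, "20": 0.092, "machine": 0.075}
resource = {"video": 0.857, "web20": 0.785}            # truncated list
user_x_resource = {"web20": 0.010, "social": 0.001}    # rounded slide values

rescaled = [
    rescale(title, 0.3),
    rescale(resource, 0.3),
    rescale(user_x_resource, 0.45),
]

final = {}
for scores in rescaled:
    for tag, score in scores.items():
        final[tag] = join(final.get(tag, 0.0), score)

print(f"web20 {final['web20']:.3f}")
```

For web20 this reproduces the top entry of the final list: the rescored resource profile score (about 0.275) joined with the rescored intersection score (0.450) gives about 0.601. Since the resource profile here is truncated, the rest of the ranking is not expected to match the slide.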



System evaluation – Tasks

Content-based recommendation task (Task 1)
  Our main focus
  98.2% of test posts
  Easier to understand

Graph-based recommendation task (Task 2)
  Question of practicality
  1.8% of test posts
  Frequent tags only
  Harder to draw conclusions

Results – Content-based recommendation

The overall result of the system is defined by tags extracted from resource title (and URL)

            F1 at 5
title       0.17230
resource    0.03252
user        0.05581
userXres    0.07093
final       0.18740

[Chart legend: BibTeX, bookmark]
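
For reference, a per-post F1 at 5 can be computed as below. This is a sketch of the standard definition only; how precision and recall are aggregated over all test posts in the challenge is not stated on the slides, and the true tags in the example are hypothetical.

```python
def f1_at_k(recommended, true_tags, k=5):
    """F1 of the top-k recommended tags against the user's actual tags."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(true_tags))
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(set(true_tags))
    return 2 * precision * recall / (precision + recall)


recommended = ["web20", "web", "youtube", "20", "machine"]
true_tags = ["web20", "video", "youtube"]  # hypothetical ground truth
score = f1_at_k(recommended, true_tags)
print(f"{score:.3f}")
```

Here two of the five recommendations are correct (precision 2/5) and two of the three true tags are found (recall 2/3).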

System evaluation – decisions

Title and URL/Title only
  For bookmark posts title tags were combined with URL tags
  Alternative: title as the only source of content tags for both types of posts

Clean title/Use low quality tags
  Title tags with low score (<0.05) were removed from the title recommendation set
  Alternative: all title tags are used in the recommendation process

Separate models/Common models
  Two separate sets of models were built for BibTeX and bookmark posts
  Alternative: common models built based on all posts

Decision – Title and URL/Title only

WRONG – the improvement in recall does not compensate for the drop in precision

Augmenting tags from a precise source is hard

                          F1 at 5
content based
  title and URL           0.17230
  title only              0.17743
final recommendation
  final (title and URL)   0.18740
  final (title only)      0.19032

Decision – Clean title/Use low quality tags

RIGHT – to maximize F1, precision and recall should be balanced

                          F1 at 5
content based
  title only              0.17743
  with low q. tags        0.16754
final recommendation
  final (title only)      0.19032
  final (low q. tags)     0.18425

Decision – Separate models/Common models

WRONG – Separate models are slightly less accurate (counter-intuitive)

                          F1 at 5
content based
  title only              0.17743
  title (common m.)       0.17829
final recommendation
  final (title only)      0.19032
  final (common m.)       0.19122

Results – Graph-based recommendation

            F1 at 5
title       0.23484
resource    0.30706
user        0.12753
userXres    0.22642
final       0.32461

[Chart legend: BibTeX, bookmark]

Final recommendation is mostly defined by resource profile tags

Intersection of user and resource related tags is a worse source of tags than resource profile (and title) – problem of imported posts?

Conclusions

Only one step ahead of baseline recommenders
  Title for content-based recommendation
  Resource profile for graph-based recommendation

Potential of user-based recommendation still undefined
  Slight improvement for both tasks
  Noise caused by imported posts

Are the two proposed evaluation methods representative?

Future work

Exploitation of user specific patterns
  User specific tags (e.g., name of the author for each BibTeX publication)
  Handling of multi-word concepts (“information”, “retrieval” or “information_retrieval”?)
  Short temporal patterns (sequence of posts addressing the same problem)

Solution for imported posts noise

Thank you!
