Tag Sources for Recommendation in Collaborative Tagging Systems
Faculty of Computer Science, Dalhousie University (Canada)
Marek Lipczak, Yeming Hu, Yael Kollet, Evangelos Milios
Content-based recommendation task – results 2009
Content-based recommendation task – results 2008
Tag recommendation system
URL recommender skipped for simplicity
Tag recommendation system
Example: A user is posting a web page:
YouTube - Web 2.0 ... The Machine is Us/ing Us http://www.youtube.com/watch?v=6gmP4nk0EOE
Content-based tags
Extracts tags from resource title (and URL - skipped)
Extraction of title-based tags
youtube  web  20  the  machine  is  using  us
Each title word becomes a tag
Extraction of title-based tags
youtube  web    20     the      machine  is       using    us
0.097    0.191  0.092  < 0.001  0.075    < 0.001  < 0.001  < 0.001
Score pre-calculated for each word over the entire corpus
Extraction of title-based tags
youtube  web    20     the      machine  is       using    us
0.097    0.191  0.092  < 0.001  0.075    < 0.001  < 0.001  < 0.001
Removal of low-quality tags (score < 0.05)
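The title step can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the tokenisation rule, the `WORD_SCORE` table, and the `DEFAULT_SCORE` stand-in for the "< 0.001" entries are assumptions; only the four scores and the 0.05 threshold come from the slides.

```python
import re

# Hypothetical pre-calculated word scores; the four values are copied from the slide.
WORD_SCORE = {"youtube": 0.097, "web": 0.191, "20": 0.092, "machine": 0.075}
DEFAULT_SCORE = 0.0005  # assumed stand-in for the "< 0.001" entries
THRESHOLD = 0.05        # low-quality cut-off stated on the slide


def title_tags(title, word_score=WORD_SCORE, threshold=THRESHOLD):
    """Return {word: score} for title words whose score reaches the threshold."""
    # Assumed tokenisation: lower-case, strip punctuation, split on whitespace.
    words = re.sub(r"[^\w\s]", "", title.lower()).split()
    scored = {w: word_score.get(w, DEFAULT_SCORE) for w in words}
    return {w: s for w, s in scored.items() if s >= threshold}


tags = title_tags("YouTube - Web 2.0 ... The Machine is Us/ing Us")
print(tags)
```

With the example title, only youtube, web, 20 and machine survive the cut-off.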
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
Retrieval of tags based on TitleToTag graph
Exploits co-occurrence between words from the title and tags
TitleToTag graph
Relations model created based on training data
Co-occurrence score similar to confidence in association rule mining
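One way to read "similar to confidence" is the fraction of training posts containing a title word that also carry a given tag. The sketch below builds such a graph under that assumption; the function name, the post format, and the toy posts are hypothetical, and the authors' exact scoring may differ.

```python
from collections import Counter, defaultdict


def build_title_to_tag(posts):
    """posts: iterable of (title_words, tags). Returns {word: {tag: score}}.

    score(word -> tag) = posts with word in title and tag assigned
                         / posts with word in title
    (the confidence of the rule word => tag; an assumed reading of the slide).
    """
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for title_words, tags in posts:
        for word in set(title_words):
            word_count[word] += 1
            for tag in set(tags):
                pair_count[word][tag] += 1
    return {
        word: {tag: c / word_count[word] for tag, c in tag_counts.items()}
        for word, tag_counts in pair_count.items()
    }


# Toy training data, invented for illustration only.
posts = [
    (["youtube", "cats"], ["video", "funny"]),
    (["youtube", "lecture"], ["video"]),
    (["web", "design"], ["web", "css"]),
]
graph = build_title_to_tag(posts)
print(graph["youtube"])
```

The same construction with tags in place of title words would give a TagToTag graph.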
Retrieval of tags based on TitleToTag graph
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TitleToTag graph entries for each content-based tag
youtube         web                 20            machine
video    0.60   web          0.15   web20  0.39   machinelearning  0.06
youtube  0.07   semanticweb  0.13   web    0.12   machine          0.05
web20    0.03   web20        0.12   20     0.08   learning         0.05
music    0.02   semantic     0.05   Blog   0.05   Juergen          0.05
...             ...                 ...           ...
Retrieve tags that frequently co-occur with title words
Retrieval of tags based on TitleToTag graph
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TitleToTag graph entries for each content-based tag
youtube               web                       20                   machine
video    0.097*0.60   web          0.191*0.15   web20  0.092*0.39    machinelearning  0.075*0.06
youtube  0.097*0.07   semanticweb  0.191*0.13   web    0.092*0.12    machine          0.075*0.05
web20    0.097*0.03   web20        0.191*0.12   20     0.092*0.08    learning         0.075*0.05
music    0.097*0.02   semantic     0.191*0.05   Blog   0.092*0.05    Juergen          0.075*0.05
...                   ...                       ...                  ...
Multiply the co-occurrence scores by the score of the tag from the title recommender
TitleToTag recommender – result
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TitleToTag recommender – result
web20  video  web    sema[..]web  semantic  20     youtube  ...
0.061  0.058  0.039  0.025        0.010     0.007  0.007
Combine scores of duplicates in a probabilistic way:
l1 join l2 = l1 + l2 - l1*l2
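The multiply-then-join step can be reproduced with the numbers shown on the slides. This is a sketch under assumptions: `graph_recommend` and `prob_join` are hypothetical names, and only the four graph rows visible on the slide are included, so tags from the elided rows are missing.

```python
def prob_join(a, b):
    """Probabilistic join of two scores: a + b - a*b."""
    return a + b - a * b


def graph_recommend(source_tags, graph):
    """Weight each graph neighbour by its source tag's score and merge duplicates."""
    result = {}
    for source, source_score in source_tags.items():
        for tag, cooccurrence in graph.get(source, {}).items():
            result[tag] = prob_join(result.get(tag, 0.0), source_score * cooccurrence)
    return result


# Content-based tags and the visible TitleToTag rows, copied from the slides.
content_tags = {"youtube": 0.097, "web": 0.191, "20": 0.092, "machine": 0.075}
title_to_tag = {
    "youtube": {"video": 0.60, "youtube": 0.07, "web20": 0.03, "music": 0.02},
    "web": {"web": 0.15, "semanticweb": 0.13, "web20": 0.12, "semantic": 0.05},
    "20": {"web20": 0.39, "web": 0.12, "20": 0.08, "Blog": 0.05},
    "machine": {"machinelearning": 0.06, "machine": 0.05, "learning": 0.05, "Juergen": 0.05},
}

scores = graph_recommend(content_tags, title_to_tag)
for tag, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{tag}: {score:.3f}")
```

Rounded to three decimals, the top three scores agree with the slide: web20 0.061, video 0.058, web 0.039.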
Retrieval of tags based on TagToTag graph
Exploits co-occurrence between tags
Assumes that content-based tags are correct
TagToTag graph
Relations model created based on training data
Co-occurrence score similar to confidence in association rule mining
Retrieval of tags based on TagToTag graph
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TagToTag graph entries for each content-based tag
youtube         web              20               machine
youtube  1.00   web       1.00   20        1.00   machine       1.00
video    0.46   web20     0.16   web       0.81   learning      0.71
web20    0.16   semantic  0.15   web20     0.23   ml            0.20
music    0.11   tools     0.13   research  0.19   diplomarbeit  0.15
...             ...              ...              ...
Retrieve tags that frequently co-occur with content-based tags
Retrieval of tags based on TagToTag graph
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TagToTag graph entries for each content-based tag
youtube               web                    20                     machine
youtube  0.097*1.00   web       0.191*1.00   20        0.092*1.00   machine       0.075*1.00
video    0.097*0.46   web20     0.191*0.16   web       0.092*0.81   learning      0.075*0.71
web20    0.097*0.16   semantic  0.191*0.15   web20     0.092*0.23   ml            0.075*0.20
music    0.097*0.11   tools     0.191*0.13   research  0.092*0.19   diplomarbeit  0.075*0.15
...                   ...                    ...                    ...
Multiply the co-occurrence scores by the score of the tag from the content recommender
TagToTag recommender – result
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TagToTag recommender – result
web    youtube  20     machine  web20  learning  video  ...
0.252  0.097    0.092  0.075    0.066  0.053     0.045
Combine scores of duplicates in a probabilistic way:
l1 join l2 = l1 + l2 - l1*l2
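The pairwise join is the complement of a product of complements, so it extends to any number of duplicate scores and does not depend on the order in which they are merged. The n-ary form below is a standard identity rather than something stated on the slides; the worked line checks it against web20 in the TagToTag example, which is reached from youtube, web and 20.

```latex
l_1 \oplus l_2 = l_1 + l_2 - l_1 l_2 = 1 - (1 - l_1)(1 - l_2),
\qquad
\bigoplus_{i=1}^{n} l_i = 1 - \prod_{i=1}^{n} (1 - l_i)

% web20 in the TagToTag example:
1 - (1 - 0.097 \cdot 0.16)(1 - 0.191 \cdot 0.16)(1 - 0.092 \cdot 0.23) \approx 0.066
```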
Resource profile recommender
Recommends tags used for the same resource by other users
Tags scored by frequency
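A resource profile of this kind can be sketched as a simple count over earlier posts of the same resource. The slides say only that tags are "scored by frequency"; dividing by the number of earlier posts is an assumption, as are the function name and the toy data.

```python
from collections import Counter


def resource_profile(earlier_posts):
    """earlier_posts: list of tag lists, one per earlier post of the same resource.

    Returns {tag: fraction of earlier posts using the tag}. Normalising by the
    number of posts is an assumed choice, not confirmed by the slides.
    """
    n = len(earlier_posts)
    if n == 0:
        return {}
    counts = Counter(tag for tags in earlier_posts for tag in set(tags))
    return {tag: count / n for tag, count in counts.items()}


# Toy data, invented for illustration only.
profile = resource_profile([
    ["video", "web20"],
    ["video"],
    ["web20", "society"],
    ["video", "web20"],
])
print(profile)
```

A resource that nobody has posted before yields an empty profile, which is why this source contributes little when most test resources are new.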
Resource related tags
Large but imprecise set of tags related to resource
Resource related tags
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
TitleToTag recommender – result
web20  video  web    sema[..]web  semantic  20     youtube  ...
0.061  0.058  0.039  0.025        0.010     0.007  0.007
TagToTag recommender – result
web    youtube  20     machine  web20  learning  video  ...
0.252  0.097    0.092  0.075    0.066  0.053     0.045
Resource profile recommender – result
tags: video, web20, society, hypertext, soc[...]ware, youtube, viamwesch, ...
scores, highest first: 0.857, 0.785, 0.214, 0.214, 0.214, 0.142, 0.142
Combine scores of duplicates in a probabilistic way:
l1 join l2 = l1 + l2 - l1*l2
Resource related tags – result
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
Resource related tags – result
tags: video, web20, web, youtube, society, hypertext, soc[..]ware, viamwesch
scores, highest first: 0.871, 0.811, 0.383, 0.295, 0.214, 0.214, 0.142, 0.142
tags: identity, people, 20, sema[..]web, sema[..]web, machine, learning, ...
scores: 0.142, 0.142, 0.098, 0.094, 0.044, 0.075, 0.053
Large but imprecise set of tags related to the resource
Resource related tags
Returns tags previously used by the user (day-based frequency)
Resource and user related tags
Set of tag recommendations related both to resource and user
Resource and user related tags
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
Resource related tags – result
tags: video, web20, web, youtube, society, hypertext, soc[..]ware, viamwesch
scores, highest first: 0.871, 0.811, 0.383, 0.295, 0.214, 0.214, 0.142, 0.142
tags: identity, people, 20, sema[..]web, machine, social, learning, ...
scores, highest first: 0.142, 0.142, 0.098, 0.094, 0.075, 0.071, 0.053
User profile tags
tags: fire[...]marks, 6606, latex, books, music, frombrowser, java, msc
scores: 0.123, 0.077, 0.036, 0.036, 0.334, 0.027, 0.027, 0.024
tags: design, visualization, dictionary, social, web20, humor, api, …
scores: 0.024, 0.024, 0.020, 0.017, 0.017, 0.017, 0.013
Intersection of tags related to resource and user
Resource and user related tags – result
Content-based tags – result
youtube  web    20     machine
0.097    0.191  0.092  0.075
Resource related tags – result
tags: video, web20, web, youtube, society, hypertext, soc[..]ware, viamwesch
scores, highest first: 0.871, 0.811, 0.383, 0.295, 0.214, 0.214, 0.142, 0.142
tags: identity, people, 20, sema[..]web, machine, social, learning, ...
scores, highest first: 0.142, 0.142, 0.098, 0.094, 0.075, 0.071, 0.053
User profile tags
tags: fire[...]marks, 6606, latex, books, music, frombrowser, java, msc
scores: 0.123, 0.077, 0.036, 0.036, 0.334, 0.027, 0.027, 0.024
tags: design, visualization, dictionary, social, web20, humor, api, …
scores: 0.024, 0.024, 0.020, 0.017, 0.017, 0.017, 0.013
Intersection of tags related to resource and user – result
web20        social
0.811*0.013  0.071*0.017
Multiply the scores of tags
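The intersection step keeps only tags present in both sets and multiplies their scores. The sketch below uses the two overlapping tags from the slides; `intersect_scores` is a hypothetical name and `unrelated_tag` is an invented entry that only demonstrates the filtering.

```python
def intersect_scores(resource_related, user_profile):
    """Keep tags found in both dicts; the new score is the product of the two."""
    shared = resource_related.keys() & user_profile.keys()
    return {tag: resource_related[tag] * user_profile[tag] for tag in shared}


# Scores for video, web20 and social are copied from the slides.
resource_related = {"video": 0.871, "web20": 0.811, "social": 0.071}
user_profile = {"web20": 0.013, "social": 0.017, "unrelated_tag": 0.5}  # last entry is invented

both = intersect_scores(resource_related, user_profile)
print(both)
```

Because both inputs are small numbers, the products are tiny (roughly 0.01 and 0.001 here), which is one reason the final step cannot compare raw scores across recommenders.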
Final recommendation
Union of the results of three most precise basic recommenders
Final result = title + resource profile + intersection of resource related and user profile
Content-based tags – result
web    youtube  20     machine
0.191  0.097    0.092  0.075
Resource profile tags – result
tags: video, web20, society, hypertext, soc[...]ware, youtube, viamwesch, ...
scores, highest first: 0.857, 0.785, 0.214, 0.214, 0.214, 0.142, 0.142
Intersection of tags related to resource and user – result
web20        social
0.811*0.013  0.071*0.017
0.010        0.001
Scores of results of different recommenders are not comparable – rescoring step
Content-based tags: x 0.3 / 0.191
Resource profile tags: x 0.3 / 0.857
Intersection of tags related to resource and user: x 0.45 / 0.010
Final result = title + resource profile + intersection of resource related and user profile
Scores of results of different recommenders are not comparable – rescoring step
Content-based tags – rescored
web    youtube  20     machine
0.300  0.152    0.144  0.118
Resource profile tags – rescored
tags: video, web20, society, hypertext, soc[...]ware, youtube, viamwesch, ...
scores, highest first: 0.300, 0.275, 0.075, 0.075, 0.075, 0.050, 0.050
Intersection of tags related to resource and user – rescored
web20  social
0.450  0.045
Final result = title + resource profile + intersection of resource related and user profile
Content-based tags – rescored
web    youtube  20     machine
0.300  0.152    0.144  0.118
Resource profile tags – rescored
tags: video, web20, society, hypertext, soc[...]ware, youtube, viamwesch, ...
scores, highest first: 0.300, 0.275, 0.075, 0.075, 0.075, 0.050, 0.050
Intersection of tags related to resource and user – rescored
web20  social
0.450  0.045
Final result
web20  web    youtube  20     machine
0.601  0.300  0.216    0.144  0.118
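The rescoring and union can be checked numerically. In this sketch each recommender's scores are scaled so that its top score equals that recommender's weight (0.3, 0.3 and 0.45 on the slides), then duplicates are merged with the probabilistic join. The function names are hypothetical, only a subset of the resource profile tags is included, and youtube's resource profile score of 0.214 is inferred from the slide totals rather than read off directly.

```python
def rescore(tags, weight):
    """Scale scores so that the recommender's top score becomes `weight`."""
    top = max(tags.values())
    return {tag: score / top * weight for tag, score in tags.items()}


def union(*recommendations):
    """Merge recommendations; duplicates are combined as a + b - a*b."""
    merged = {}
    for recommendation in recommendations:
        for tag, score in recommendation.items():
            previous = merged.get(tag, 0.0)
            merged[tag] = previous + score - previous * score
    return merged


title = {"web": 0.191, "youtube": 0.097, "20": 0.092, "machine": 0.075}
# Subset of the resource profile; the youtube value is inferred, see the note above.
resource = {"video": 0.857, "web20": 0.785, "youtube": 0.214}
user_x_resource = {"web20": 0.010, "social": 0.001}

final = union(rescore(title, 0.3), rescore(resource, 0.3), rescore(user_x_resource, 0.45))
for tag in ("web20", "web", "youtube", "20", "machine"):
    print(f"{tag}: {final[tag]:.3f}")
```

For these five tags the values round to the final row on the slide (0.601, 0.300, 0.216, 0.144, 0.118).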
System evaluation – Tasks
Content-based recommendation task (Task 1)
    Our main focus
    98.2% of test posts
    Easier to understand
Graph-based recommendation task (Task 2)
    Question of practicality
    1.8% of test posts
    Frequent tags only
    Harder to draw conclusions
Results – Content-based recommendation
The overall result of the system is defined by tags extracted from resource title (and URL)
Recommender  F1 at 5
title        0.17230
resource     0.03252
user         0.05581
userXres     0.07093
final        0.18740
Chart series: BibTeX, bookmark
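For reference, "F1 at 5" for a single post can be computed as below. This is a per-post sketch only: how the evaluation aggregates precision and recall across posts is not stated on the slides and may differ, and the true tags in the usage example are invented.

```python
def f1_at_k(recommended, true_tags, k=5):
    """F1 of the top-k recommended tags against the tags the user actually assigned."""
    top = recommended[:k]
    hits = len(set(top) & set(true_tags))
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(set(true_tags))
    return 2 * precision * recall / (precision + recall)


# The recommended list is the final result of the example; the true tags are hypothetical.
recommended = ["web20", "web", "youtube", "20", "machine"]
true_tags = ["web20", "youtube", "video"]
print(f1_at_k(recommended, true_tags))
```

Here two of five recommendations are correct (precision 0.4) and two of three true tags are found (recall 2/3), giving an F1 of 0.5.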
System evaluation – decisions
Title and URL / Title only
    For bookmark posts title tags were combined with URL tags
    Alternative: title as the only source of content tags for both types of posts
Clean title / Use low quality tags
    Title tags with low score (<0.05) were removed from the title recommendation set
    Alternative: all title tags are used in the recommendation process
Separate models / Common models
    Two separate sets of models were built for BibTeX and bookmark posts
    Alternative: common models built based on all posts
Decision – Title and URL/Title only
WRONG – The improvement in recall does not make up for the drop in precision
Augmenting tags from a precise source is hard
F1 at 5         content based  final recommendation
title and URL   0.17230        0.18740
title only      0.17743        0.19032
Decision – Clean title/Use low quality tags
RIGHT – To maximize F1, precision and recall should be equalized
F1 at 5            content based  final recommendation
title only         0.17743        0.19032
with low q. tags   0.16754        0.18425
Decision – Separate models/Common models
WRONG – Separate models are slightly less accurate (counter-intuitive)
F1 at 5      content based  final recommendation
title only   0.17743        0.19032
common m.    0.17829        0.19122
Results – Graph-based recommendation
Recommender  F1 at 5
title        0.23484
resource     0.30706
user         0.12753
userXres     0.22642
final        0.32461
Chart series: BibTeX, bookmark
Final recommendation is mostly defined by resource profile tags
Intersection of user and resource related tags is a worse source of tags than resource profile (and title) – problem of imported posts?
Conclusions
Only one step ahead of baseline recommenders
    Title for content-based recommendation
    Resource profile for graph-based recommendation
Potential of user-based recommendation still undefined
    Slight improvement for both tasks
    Noise caused by imported posts
Are the two proposed evaluation methods representative?
Future work
Exploitation of user specific patterns
    User specific tags (e.g., name of the author for each BibTeX publication)
    Handling of multi-word concepts (“information”, “retrieval” or “information_retrieval”?)
    Short temporal patterns (sequence of posts addressing the same problem)
Solution for imported posts noise
Thank you!