Milan Vojnović, MSRC, Systems and Networking
Tagging done by YOU
Thanks
• TagBooster project: Dinan Gunawardena, James Cruise (U Cambridge), Peter Marbach (U Toronto), Fabian Suchanek (MPI)
• Product groups: O14 Sharepoint Communities, Tagspace, Officelabs
• MSR Tagging Summit
• TagBooster User Study: Nick Duffield, John Mulgrew, Andy Slowey
• This talk: Abi, Alex, Chris, Peter, Stephen
Social tagging in web2.0
Why tag?
• Tagging: what and why
• Tag suggestions
• Conclusion
In this talk, we’ll find relations among the following:
[Figure: diagram relating f, g, x and y]
Discover, filter, share
Faceted browsing
[Figure: faceted browsing example with items BBC news, Michael Palin, BBC radio, BBC shop and tags bbc, palin]
Tagging vs. traditional classification
• Traditional classification
– Pre-defined vocabulary
– Structured
– Done by authors/librarians
– Non-trivial task
• Social tagging
– Use any words
– No structure
– Done by anyone
– Easy
Systems with controlled vocabulary
Social tagging challenges
• Vocabulary evolution
– Filtering tags, tag suggestions, tagging metaphors
– Uncontrolled vocabulary: scalable, mitigate vocabulary problem, but tag noise
• User interface design
– Tagcloud, tag clustering
• Cold start
– Lack of prior knowledge about tags for an object
– Participation incentive
• Scale
– More tagging events, easier filtering
• Making use of tags
– Related tags for navigation, expertise tracking, tag meta-data for search, scoped rankings of items, faceted browsing
TagBooster User Study
Sept-Oct 07
• 4000+ participants
• Tagging web pages
• Questionnaire
Tagging done by YOU
Analogous to voting
[Figure: a user enters the tags "music soul london"; tags shown for the item: music, soul, jazz, london, black, artist, british, singer]
Feedback!
Positive / Negative
Why suggest tags?
• Hiding users’ true preference over tags
• I picked a suggested tag that now I can’t remember
• I tend to overuse the same tags all over again: “exploit” vs. “explore”
• Less effort (cognitive, typing)
• Encourage users to use tags (cold start)
• Conformance in vocabulary
Top Popular: classical suggestion method
# sel.  Tag
174     music
110     radio
96      internet radio
77      online radio
49      last.fm
40      online music
34      fm
33      streaming music
31      streaming
28      last fm
22      web radio
19      scrobbling
18      lastfm
12      listen
12      new music
10      mp3
10      stream
9       streaming radio
Suggested tags: radio, music, online radio, internet radio
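The Top Popular rule on this slide can be sketched in a few lines. This is a minimal sketch, assuming the suggestion set is simply the k tags with the highest selection counts; the function name `top_popular` and the default k = 4 (read off the slide's example) are illustrative choices, not part of the talk.

```python
from collections import Counter


def top_popular(selection_counts, k=4):
    """Return the k tags with the highest selection counts (hypothetical helper)."""
    # Counter.most_common sorts by count, highest first.
    return [tag for tag, _ in Counter(selection_counts).most_common(k)]


# A few rows of the selection-count table from the slide.
counts = {
    "music": 174,
    "radio": 110,
    "internet radio": 96,
    "online radio": 77,
    "last.fm": 49,
    "online music": 40,
}

print(top_popular(counts))
```

With these counts the sketch suggests the same four tags as the slide, though not necessarily in the same display order.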
Users’ generation of tags
[Figure: the set of all tags (singer, music, jazz, black, british, rehab, london, soul, artist) and the suggested tags (music, jazz, british, singer, soul, artist)]
Simple user model
[Figure: the set of all tags and the suggested tags. With probability p (imitation) the user selects from the suggested tags; with probability 1 − p (non imitation) the user selects from the set of all tags; tag i is drawn with weight $r_i$]
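The simple user model can be illustrated with a small simulation. This is a sketch under stated assumptions: with probability p the user imitates and picks among the suggested tags, otherwise picks among all tags, and in both cases tag i is weighted by its true rank score r_i (the weighting inside the suggestion set is an assumption here). The function names and the example scores are hypothetical.

```python
import random


def selection_probabilities(r, suggested, p):
    """Probability of selecting each tag under the assumed user model.

    r: dict mapping tag -> true rank score (scores sum to 1)
    suggested: set of suggested tags
    p: imitation probability
    """
    mass_in_s = sum(r[tag] for tag in suggested)
    probs = {}
    for tag, score in r.items():
        prob = (1 - p) * score              # non-imitation branch
        if tag in suggested:
            prob += p * score / mass_in_s   # imitation branch, restricted to S
        probs[tag] = prob
    return probs


def sample_selection(r, suggested, p, rng=random):
    """Draw one tag selection from the model."""
    probs = selection_probabilities(r, suggested, p)
    tags = list(probs)
    return rng.choices(tags, weights=[probs[t] for t in tags], k=1)[0]


# Made-up rank scores, for illustration only.
r = {"music": 0.4, "soul": 0.3, "jazz": 0.2, "london": 0.1}
print(selection_probabilities(r, {"music", "jazz"}, p=0.3))
print(sample_selection(r, {"music", "jazz"}, p=0.3))
```

Suggested tags gain probability at the expense of the others, which is the bias the later slides try to correct for.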
Users’ tag selection affected by tag suggestions
[Figure: frequency of tag selection (axis 0 to 0.5) for the tag "apollo", conditional on the tag having been suggested vs. unconditional]
The imitation rate
Boes’ estimate: $\hat{p} = \frac{g - h}{g}$
– $g$: portion of tag selections not in S; suggestions not made
– $h$: portion of tag selections not in the suggestion set S
$\hat{p}$: 0.32, 0.31, 0.4, 0.34
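One way to read the imitation-rate estimate, assuming the simple user model: without suggestions, a fraction g of selections falls outside S; with suggestions, only non-imitating users can land outside S, so that fraction shrinks to h = (1 − p) g, which gives p = (g − h) / g. The sketch below computes this; the function name and the input numbers are made up for illustration and are not data from the study.

```python
def estimate_imitation_rate(g, h):
    """Estimate the imitation probability p as (g - h) / g.

    g: portion of tag selections outside S when no suggestions are made
    h: portion of tag selections outside S when S is suggested
    """
    if not 0 < g <= 1:
        raise ValueError("g must be in (0, 1]")
    if not 0 <= h <= g:
        raise ValueError("h must be in [0, g]")
    return (g - h) / g


# Hypothetical inputs: 60% of selections fall outside S without suggestions,
# 45% with suggestions.
print(round(estimate_imitation_rate(0.6, 0.45), 2))
```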
Sel.  Tag
174   music
110   radio
96    internet radio
77    online radio
49    last.fm
40    online music
34    fm
33    streaming music
31    streaming
28    last fm
22    web radio
19    scrobbling
18    lastfm
12    listen
12    new music
10    mp3
10    stream
9     streaming radio
Move-to-Set: simple randomised rule
Suggested tags: last.fm, music, online radio, web radio
$f_i(S) = (1 - p)\, r_i + p\, \frac{r_i}{\sum_{j \in S} r_j}\, 1_{\{i \in S\}}$
$f_i \ge f_j \iff r_i \ge r_j$ ?
Sufficient: $A(i) \ge A(j)$ for $r_i \ge r_j$
Under the user model, for any imitation probability p < 1, the long run frequency of tag selections induces the true popularity ranking
Correctness of popularity order
Simple update rule
• Converges to sampling the suggestion set proportional to the product of true rank scores
[Figure: the user selects "radio"; the suggested tags change from "last.fm, music, online radio, web radio" to "last.fm, music, radio, web radio"]
• Same as “show most recent item” for suggestion set size 1
Analogous to exclusion process
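The update rule on these slides can be sketched as follows. This is a minimal sketch, assuming that a selected tag that is not currently suggested replaces one suggested tag chosen uniformly at random, and that nothing changes when the selected tag is already suggested; the function name `move_to_set` is hypothetical.

```python
import random


def move_to_set(suggested, selected, rng=random):
    """Return the suggestion list after one user selection (assumed rule)."""
    updated = list(suggested)
    if selected in updated:
        return updated  # selected tag already suggested: no change
    # Evict a uniformly random suggested tag and put the selected tag in its slot.
    updated[rng.randrange(len(updated))] = selected
    return updated


suggested = ["last.fm", "music", "online radio", "web radio"]
print(move_to_set(suggested, "radio"))
```

With a suggestion set of size 1 the single slot always holds the last selected tag, which matches the "show most recent item" remark.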
[Figure: exclusion-process diagram with labels i, j and $r_j$]
Frequency Move-to-Set
Rank  Tag
174   music
110   radio
96    internet radio
77    online radio
49    last.fm
40    online music
34    fm
33    streaming music
31    streaming
28    last fm
22    web radio
19    scrobbling
18    lastfm
12    listen
12    new music
10    mp3
10    stream
9     streaming radio
Suggested tags: radio, music, online radio, internet radio
Selecting "radio": Rank(radio) remains unchanged (“radio” suggested)
Selecting "last.fm": Rank(last.fm) ++ (“last.fm” NOT suggested)
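The two counter updates above suggest the following reading of Frequency Move-to-Set. This is a sketch under assumptions: a tag's counter is incremented only when the tag is selected while not suggested, and the suggestion set is the top `size` tags by counter. The class name and the tie-breaking (insertion order, as done by `Counter.most_common`) are illustrative choices.

```python
from collections import Counter


class FrequencyMoveToSet:
    """Hypothetical sketch of the Frequency Move-to-Set counters."""

    def __init__(self, size):
        self.size = size
        self.counts = Counter()

    def suggestions(self):
        """Top `size` tags by counter value."""
        return [tag for tag, _ in self.counts.most_common(self.size)]

    def record(self, selected):
        """Count a selection only if the tag was not suggested."""
        if selected not in self.suggestions():
            self.counts[selected] += 1


fmts = FrequencyMoveToSet(size=4)
# Start from a few of the counts shown on the slide.
fmts.counts.update({"music": 174, "radio": 110, "internet radio": 96,
                    "online radio": 77, "last.fm": 49})
fmts.record("radio")     # suggested, so its counter stays at 110
fmts.record("last.fm")   # not suggested, so its counter goes to 50
print(fmts.counts["radio"], fmts.counts["last.fm"])
```

Ignoring selections of suggested tags is what keeps imitation from inflating the counters of tags that are already displayed.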
Only sufficiently popular tags eventually suggested
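$s_i = 1 - \left(1 - \frac{s}{|C|}\right) \frac{h_{|C|}(r)}{r_i}$ for $i \in C$, and $s_i = 0$ o.w.
– $s_i$: frequency of suggesting tag i
– $C$: competing set
– $s$: suggestion set size
– $h_{|C|}(r)$: harmonic mean of $r_1, \ldots, r_{|C|}$
Tag i in the competing set iff: $r_i > \left(1 - \frac{s}{i}\right) h_i(r)$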
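The competing-set condition can be checked numerically. This is a sketch under assumptions: tags are indexed by decreasing true rank score, tag i is in the competing set C iff r_i > (1 − s/i) h_i(r), where h_i(r) is the harmonic mean of r_1, ..., r_i, and the suggestion frequency of tag i in C is 1 − (1 − s/|C|) h_|C|(r) / r_i. The strict inequality, the function name and the example scores are assumptions made for illustration.

```python
def suggestion_frequencies(r, s):
    """Suggestion frequency of each tag under the assumed competing-set formula.

    r: true rank scores, sorted in decreasing order, all positive
    s: suggestion set size (1 <= s <= len(r))
    """
    if list(r) != sorted(r, reverse=True):
        raise ValueError("r must be sorted in decreasing order")
    if not 1 <= s <= len(r):
        raise ValueError("s must be between 1 and len(r)")

    c = 0            # size of the competing set
    inv_sum_c = 0.0  # sum of 1/r_j over the competing set
    running = 0.0    # sum of 1/r_j for j <= i
    for i, score in enumerate(r, start=1):
        running += 1.0 / score
        harmonic_mean = i / running
        if score > (1 - s / i) * harmonic_mean:
            c, inv_sum_c = i, running
        else:
            break  # the condition cannot hold again for less popular tags

    h_c = c / inv_sum_c
    return [1 - (1 - s / c) * h_c / score if i <= c else 0.0
            for i, score in enumerate(r, start=1)]


# Made-up rank scores: only the two most popular tags compete for one slot.
print(suggestion_frequencies([0.5, 0.3, 0.15, 0.05], s=1))
```

The frequencies over the competing set add up to s, since exactly s tags are suggested at any time.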
Suggestion methods in action
[Figure: frequency of tag suggestion vs. tag rank i, for TOP, FMTS, MTS, NONE]
Suggestion methods in action (cont’d)
[Figure: normalised frequency of tag selection vs. tag rank i, for TOP, FMTS, MTS, NONE]
How did users appreciate the suggested tags?
Web page  Method  They were confusing  They were OK, but not very relevant  They were generally helpful
engadget  TOP     35.00%               15.00%                               50.00%
engadget  FMTS    25.93%               25.93%                               48.15%
engadget  MTS     22.22%               25.93%                               51.85%
lastfm    TOP     22.22%               55.56%                               22.22%
lastfm    FMTS    25.00%               32.14%                               42.86%
lastfm    MTS     27.59%               24.14%                               48.28%
startup   TOP     39.13%               21.74%                               39.13%
startup   FMTS    50.00%               29.17%                               20.83%
startup   MTS     30.30%               24.24%                               45.46%
mit       TOP     21.74%               30.44%                               47.83%
mit       FMTS    23.08%               23.08%                               53.85%
mit       MTS     24.24%               18.18%                               57.68%
Why did I select these tags?
Tags: gadgets, technology, engadget, blog
• I thought these are keywords that I would likely use later to find this item
• I thought these are categories that best describe the object
• else
Why did I select these tags? (cont’d)
YOU find, search, describe, categorise, identify, remember, organise, classify
Wikipedia definition of tag (meta data): describing the item, keyword-based classification, search
Why did I select these tags (cont’d)?
Semantic analysis of tags, search and content keywords – May 2007 popular Web searches + delicious tags
Tags similar to categories
Small overlap with search keywords
Summary
• Social tagging poses interesting research challenges
– Space for innovation
• A mix of control theory, user behaviour, information retrieval, interface design
• Aim at best design of tagging systems to support particular users’ tasks
Sample of research challenges
User model?
Rate of convergence
Asymptotically accurate algorithms
Select from the list only (e.g. remote controller/mobile device)
What does it mean for a tag to be relevant?
Make suggestions to improve users’ task (e.g. search, faceted browsing)?
Beyond popularity ranking:
Ranking across multiple lists
Faces project
ongoing work
Tag to attract
Familiarity with tagging
Email domain Users
microsoft.com 44%
hotmail.com 15%
gmail.com 11%
other 30%
Tagging frequency Users
daily 15%
weekly 25%
monthly 17%
less frequently 40%
still used infrequently by many