Aris Gionis, “Methods for analyzing user behavior and their application to web search and recommendation”
usage mining techniques with applications to web search and content recommendation
Aristides Gionis
Yahoo! Research, Barcelona
yandex aug 31, 2012
yahoo! research, barcelona
web mining
social media and multimedia
large-scale distributed systems
user engagement
semantic web
web mining in yahoo! research
themes
usage mining and query-log mining
social network analysis and graph mining
influence propagation
other data mining problems
data sources
- query logs (search) and toolbar (browsing)
- social networks (flickr, messenger, email, ...)
- question-answering (answers)
- micro-blogging (twitter)
overview of the talk
query-log mining
- query graphs
- query recommendations
yahoo! tips
news recommendations using real-time web
query-log mining
query-log mining
search engines collect a large amount of query logs
lots of interesting information
- analyzing users’ behavior
- creating user profiles and personalization
- creating knowledge bases and folksonomies
- finding similar concepts
- building systems for query recommendations
- using statistics for improving systems’ performance
- ...
the click graph
[Craswell and Szummer, 2007]
applications of the click graph
[Craswell and Szummer, 2007]
query-to-document search
query-to-query suggestion
document-to-query annotation
document-to-document relevance feedback
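The click graph is a bipartite graph between queries and the documents clicked for them. The query-to-query suggestion application can be illustrated with a minimal sketch that assumes raw click counts as edge weights and a single two-step walk (query, then clicked document, then another query). This is a simplification for illustration, not the exact random-walk model of Craswell and Szummer; the function names and toy clicks are hypothetical.

```python
from collections import defaultdict

def build_click_graph(clicks):
    """clicks: iterable of (query, document, click_count) triples."""
    q2d = defaultdict(dict)  # query -> {document: clicks}
    d2q = defaultdict(dict)  # document -> {query: clicks}
    for q, d, c in clicks:
        q2d[q][d] = q2d[q].get(d, 0) + c
        d2q[d][q] = d2q[d].get(q, 0) + c
    return q2d, d2q

def _normalize(row):
    total = sum(row.values())
    return {k: v / total for k, v in row.items()}

def suggest_queries(q, q2d, d2q, k=5):
    """Two-step walk: query -> clicked document -> other query."""
    if q not in q2d:
        return []
    scores = defaultdict(float)
    for d, p_qd in _normalize(q2d[q]).items():
        for q2, p_dq in _normalize(d2q[d]).items():
            if q2 != q:
                scores[q2] += p_qd * p_dq
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

clicks = [
    ("jaguar car", "jaguar.com", 8), ("jaguar car", "cars.example/jaguar", 2),
    ("jaguar price", "jaguar.com", 3),
    ("jaguar animal", "zoo.example/jaguar", 4), ("big cats", "zoo.example/jaguar", 5),
]
q2d, d2q = build_click_graph(clicks)
print(suggest_queries("jaguar car", q2d, d2q))
```

Queries that share clicked documents end up close to each other, which is why the same structure also supports the other three applications listed above.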
the query-flow graph
[Boldi et al., 2008]
take into account temporal information
captures the “flow” of how users submit queries
definition:
- nodes V = Q ∪ {s, t}: the distinct set of queries Q, plus a starting state s and a terminal state t
- edges E ⊆ V × V
- weights w(q, q′) representing the probability that q and q′ are part of the same chain
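A minimal sketch of this structure, assuming each session is a time-ordered list of query strings and using the hypothetical labels `<s>` and `<t>` for the start and terminal states. The learned chaining probability w(q, q′) is replaced here by the empirical transition frequency, a simplification for illustration and not the weighting used by Boldi et al.

```python
from collections import Counter, defaultdict

START, TERMINAL = "<s>", "<t>"  # hypothetical labels for the states s and t

def build_query_flow_graph(sessions):
    """sessions: lists of queries in submission order.
    Returns {q: {q2: weight}}, with each node's outgoing weights summing to 1."""
    counts = defaultdict(Counter)
    for session in sessions:
        path = [START, *session, TERMINAL]
        for q1, q2 in zip(path, path[1:]):  # consecutive queries give an edge
            counts[q1][q2] += 1
    return {q1: {q2: c / sum(row.values()) for q2, c in row.items()}
            for q1, row in counts.items()}

sessions = [
    ["barcelona fc", "barcelona fc website"],
    ["barcelona fc", "barcelona fc fixtures"],
    ["barcelona fc", "barcelona fc website"],
    ["barcelona weather"],
]
qfg = build_query_flow_graph(sessions)
print(qfg["barcelona fc"])
```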
building the query-flow graph
an edge (q, q′) if q and q′ are consecutive in at least one session
weights w(q, q′) learned by machine learning
features used
- textual features: cosine similarity, Jaccard coefficient, size of intersection, etc.
- session features: the number of sessions, the average session length, the average number of clicks in the sessions, the average position of the queries in the sessions, etc.
- time-related features: average time difference, etc.
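The textual and time-related features are cheap to compute per candidate edge. A minimal sketch, assuming whitespace tokenization, term-frequency vectors for the cosine similarity, and timestamps in seconds; the function names are hypothetical, and the learned model that maps the features to w(q, q′) is not shown.

```python
import math
from collections import Counter

def textual_features(q1, q2):
    """Cosine similarity, Jaccard coefficient and intersection size of two queries."""
    tf1, tf2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    common = set(tf1) & set(tf2)
    union = set(tf1) | set(tf2)
    dot = sum(tf1[w] * tf2[w] for w in common)
    norm = math.sqrt(sum(v * v for v in tf1.values())) * math.sqrt(sum(v * v for v in tf2.values()))
    return {
        "cosine": dot / norm if norm else 0.0,
        "jaccard": len(common) / len(union) if union else 0.0,
        "intersection": len(common),
    }

def avg_time_difference(timestamp_pairs):
    """Mean delay in seconds between q and q' over the sessions where q' follows q."""
    return sum(t2 - t1 for t1, t2 in timestamp_pairs) / len(timestamp_pairs)

print(textual_features("barcelona hotels", "cheap barcelona hotels"))
print(avg_time_difference([(0, 30), (100, 190)]))  # 60.0
```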
query-flow graph
[figure: query-flow graph around the query “barcelona fc”, with neighboring queries such as “barcelona fc website”, “barcelona fc fixtures”, “real madrid”, “barcelona weather”, “barcelona weather online”, “barcelona hotels”, “cheap barcelona hotels”, “luxury barcelona hotels”, and the terminal state <T>; edges are labeled with transition weights]
query-flow graph
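[figure: query-flow graph over queries such as “dog”, “cat”, “funny cat”, “picture of a cat”, “cat and dog”, “breed of dog”, “dog for sale”, “picture of a dog”, and “funny dog”, with a start state (^) and an end state ($)]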
query recommendations
the general theme:
given an input query q
identify similar queries q′
rank them and present them to the user
most query graphs can be used for both tasks: similarity and ranking
recommendations using the query-flow graph
[Boldi et al., 2008]
perform a random walk on the query-flow graph
teleportation to the submitted query
teleportation to previous queries, to take into account the user history
normalize the PageRank score to remove the bias toward very popular queries
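A minimal sketch of this ranking step, assuming the graph is a dict of row-normalized out-edges without the special start and terminal states, a damping factor of 0.85, and a teleportation vector that puts weight `alpha` on the submitted query and spreads the rest evenly over previous queries. Dividing by the unpersonalized PageRank is one simple way to un-bias popular queries; the parameter values and the exact normalization are assumptions, not necessarily those of Boldi et al.

```python
def pagerank(graph, teleport, damping=0.85, iters=100):
    """graph: {u: {v: p(u -> v)}} with rows summing to 1; teleport: {v: prob} summing to 1."""
    nodes = set(graph) | {v for row in graph.values() for v in row} | set(teleport)
    rank = {v: teleport.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) * teleport.get(v, 0.0) for v in nodes}
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        for u, row in graph.items():
            for v, p in row.items():
                nxt[v] += damping * rank[u] * p
        for v, t in teleport.items():  # mass of dead ends returns to the teleport set
            nxt[v] += damping * dangling * t
        rank = nxt
    return rank

def recommend(graph, query, history=(), k=5, alpha=0.5):
    nodes = set(graph) | {v for row in graph.values() for v in row}
    base = pagerank(graph, {v: 1 / len(nodes) for v in nodes})  # global popularity
    teleport = {query: alpha if history else 1.0}
    for h in history:
        teleport[h] = teleport.get(h, 0.0) + (1 - alpha) / len(history)
    personal = pagerank(graph, teleport)
    scores = {v: personal[v] / base[v] for v in nodes if v != query and v not in history}
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

g = {
    "apple": {"apple ipod": 0.6, "apple fruit": 0.4},
    "apple ipod": {"itunes": 1.0}, "itunes": {"apple ipod": 1.0},
    "apple fruit": {"banana": 1.0}, "banana": {"apple fruit": 1.0},
}
print(recommend(g, "apple"))
print(recommend(g, "apple", history=["banana"]))
```

On this toy graph the top suggestion for “apple” alone is “apple ipod”, while adding “banana” to the history moves “apple fruit” to the top, in the spirit of the banana → apple example.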
example: apple

| Max. weight | s_q | s_q | s_q |
|---|---|---|---|
| t | t | apple | apple |
| apple ipod | apple | apple fruit | apple ipod |
| apple store | apple ipod | apple ipod | apple trailers |
| apple trailers | apple store | apple belgium | apple store |
| amazon | apple trailers | eating apple | apple mac |
| apple mac | google | apple.nl | apple fruit |
| itunes | amazon | apple monitor | apple usa |
| pc world | argos | apple usa | apple ipod nano |
| argos | itunes | apple jobs | apple.com/ipod |
| ... | ... | ... | ... |
example: banana → apple
banana → apple banana
banana bananaapple eating bugsusb no banana holidaybanana cs opening a bananagiant chocolate bar banana shoewhere is the seed inanut
fruit banana
banana shoe recipe 22 feb 08fruit banana banana jules oliverbanana cloths banana cseating bugs banana cloths
example: beatles → apple
beatles → apple beatles
beatles beatlesapple scarringapple ipod paul mcartneyscarring yarns from irelandsrg peppers artwork statutory instrument
A55ill get you silver beatles tribute
bandbashles beatles mp3dundee folk songs GHOST’Sthe beatles love album ill get youplace lyrics beatles fugees triger finger
remix
recommendations as shortcuts added to the query-flow graph (qfg)
[Anagnostopoulos et al., 2010]
the query-recommendation problem
the recommendation problem
model user behavior as a random walk on qfg
a user starts at query $q_0$ and follows a path p of reformulations on the qfg before terminating
consider a reward function w(q) on the nodes of qfg
goal: “nudge” users in order to maximize their reward
objectives:
1. collect a large reward along the way
2. end the session at a high-reward node
applications: a general problem formulation for suggesting shortcuts (web graph, social networks, etc.)
probabilistic model
we can only make suggestions; we cannot force the user to follow them
we do not know how the user will act
random walk on qfg is modeled by stochastic matrix P
recommendations R modify P to P′ = P + R
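For P′ = P + R to remain a valid stochastic matrix, every row of R must sum to zero and P + R must stay non-negative. A minimal sketch, assuming a hypothetical acceptance model in which recommending q′ at q diverts a fraction `beta` of the user's outgoing probability mass to q′; it illustrates the constraint and is not the exact model of Anagnostopoulos et al.

```python
def recommendation_matrix(P, recs, beta=0.2):
    """P: row-stochastic matrix as a list of rows. recs: {i: j}, recommend state j at state i.
    beta: hypothetical probability that the user follows the recommendation."""
    n = len(P)
    R = [[0.0] * n for _ in range(n)]
    for i, j in recs.items():
        for k in range(n):
            R[i][k] -= beta * P[i][k]  # scale down the user's usual transitions
        R[i][j] += beta                # and move that mass to the recommended query
    return R

def add(P, R):
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(P, R)]

def is_stochastic(M, tol=1e-9):
    return all(min(row) >= -tol and abs(sum(row) - 1.0) <= tol for row in M)

P = [[0.0, 0.7, 0.3], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]  # states: q0, q1, T
R = recommendation_matrix(P, {0: 1}, beta=0.5)
P_new = add(P, R)
print(P_new[0])
```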
utility functions
reward function w(q) on queries
- quality of search results, user satisfaction, dwell time, monetization, etc.
utility function U(p) on paths $p = \langle q_0, \ldots, q_{k-1}, T \rangle$:
- $U(p) = \sum_{q \in p} w(q)$ (Cavafy: “the road to Ithaca”)
- $U(p) = w(q_{k-1})$ (Machiavelli: “the end justifies the means”)
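Both utilities can be evaluated in expectation over the random walk. A minimal sketch, assuming the last state (index n−1) is T and that every walk eventually reaches it: the expected cumulative reward satisfies V(i) = w(i) + Σ_{j≠T} P[i][j] V(j), and the expected reward of the final query satisfies F(i) = P[i][T] w(i) + Σ_{j≠T} P[i][j] F(j). Both are solved here by fixed-point iteration; the rewards and matrices are made up.

```python
def expected_utilities(P, w, iters=1000):
    """P: (n x n) row-stochastic matrix; states 0..n-2 are queries, state n-1 is T.
    w: rewards of the n-1 queries. Returns (V, F) indexed by the start query."""
    T = len(P) - 1
    V = [0.0] * T  # expected sum of rewards along the path ("road to Ithaca")
    F = [0.0] * T  # expected reward of the last query before T ("the end justifies the means")
    for _ in range(iters):  # fixed-point iteration; converges when the walk surely ends at T
        V = [w[i] + sum(P[i][j] * V[j] for j in range(T)) for i in range(T)]
        F = [P[i][T] * w[i] + sum(P[i][j] * F[j] for j in range(T)) for i in range(T)]
    return V, F

w = [1.0, 4.0]  # made-up rewards of queries 0 and 1
P = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
P_rec = [[0.0, 0.8, 0.2], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(expected_utilities(P, w))
print(expected_utilities(P_rec, w))
```

Raising P[0][1] from 0.5 to 0.8, as a recommendation of query 1 at query 0 might, increases both utilities for a walk starting at query 0: from 3.0 to 4.2 for the sum, and from 2.5 to 3.4 for the final reward.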
utility
[figure: utility, measured as the sum of expected values (axis from 0.0 to 1.2), for the strategies w, ρ, ρw, and the 1-step heuristic]
qfg projections for diverse recommendations
[Bordino et al., 2010]
diverse recommendations
[Bordino et al., 2010]
we want recommendations that are not only relevant and of high quality, but also diverse
we want recommendations that lead in different “directions” in the qfg
we need a notion of distance between queries in the qfg
use spectral embeddings
project the graph into a low-dimensional space so that the embedding minimizes the total edge distortion
finding diverse recommendations then reduces to a geometric problem
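A minimal sketch of the geometric reduction, assuming a connected graph given as a symmetric weighted adjacency matrix, an embedding formed by the eigenvectors of the normalized Laplacian with the smallest non-trivial eigenvalues, and cosine similarity in that space. The greedy farthest-first selection is a common stand-in for the diversification step and is not necessarily the procedure of Bordino et al.; the toy graph is made up.

```python
import numpy as np

def spectral_embedding(A, dim=2):
    """Embed the nodes of a connected graph with symmetric weights A into `dim` dimensions."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes no isolated nodes
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]    # drop the trivial eigenvector

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def diverse_top_k(candidates, X, k):
    """Greedy farthest-first: keep the most relevant candidate, then repeatedly add
    the candidate whose highest similarity to the chosen set is lowest."""
    chosen = [candidates[0]]
    while len(chosen) < min(k, len(candidates)):
        rest = [c for c in candidates if c not in chosen]
        chosen.append(min(rest, key=lambda c: max(cosine(X[c], X[s]) for s in chosen)))
    return chosen

# Toy graph: three tight groups of queries joined by weak edges.
A = np.zeros((7, 7))
for i, j in [(0, 1), (2, 3), (2, 4), (3, 4), (5, 6)]:
    A[i, j] = A[j, i] = 1.0
for i, j in [(0, 2), (2, 5), (0, 5)]:
    A[i, j] = A[j, i] = 0.05
X = spectral_embedding(A)
print(diverse_top_k([2, 3, 4, 0, 5], X, 3))
```

Queries in the same group get cosine similarity close to 1, as in the “time zone” / “world time” entries of the example below, so the greedy step picks one query per group.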
example: time
Spectral projection on the 2-hop neighborhood of “time”

| | time magazine | new york times | time zone | world time | what time is it | time warner | time warner cable |
|---|---|---|---|---|---|---|---|
| time magazine | | 0.9953 | 0.0162 | 0.1422 | 0.1049 | -0.6071 | -0.6056 |
| new york times | 0.9953 | | -0.0051 | 0.1248 | 0.0893 | -0.6478 | -0.6462 |
| time zone | 0.0162 | -0.0051 | | 0.9903 | 0.9891 | -0.5234 | -0.5254 |
| world time | 0.1422 | 0.1248 | 0.9903 | | 0.9970 | -0.6263 | -0.6282 |
| what time is it | 0.1049 | 0.0893 | 0.9891 | 0.9970 | | -0.6244 | -0.6263 |
| time warner | -0.6071 | -0.6478 | -0.5234 | -0.6263 | -0.6244 | | 0.9999 |
| time warner cable | -0.6056 | -0.6462 | -0.5254 | -0.6282 | -0.6263 | 0.9999 | |
improving recommendation for long-tail queries via templates
[Szpektor et al., 2011]
motivation
goal: improve coverage of query-recommendation systems
observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]
most query-recommendation systems are based on finding queries that co-occur frequently
relying on co-occurrences is an inherent limitation
we need methods that can reason about rare, and even previously unseen, queries
overview of the approach
1 generate candidate query-templates for each query
Paris hotels → <city> hotels
Paris hotels → <district> hotels
Moscow hotels → <city> hotels
2 infer transitions between templates
<city> hotels → <city> restaurants
3 infer recommendations for rare queries
Yancheng hotels → Yancheng restaurants
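The three steps can be sketched end to end on toy data. The sketch assumes a hypothetical hand-written dictionary `TYPES` mapping single tokens to entity types, and counts a template transition only when both queries generalize the same token to the same type; the real system uses a large taxonomy and weighted scores.

```python
from collections import Counter

# Hypothetical entity dictionary standing in for the real taxonomy: token -> types.
TYPES = {"paris": ["<city>"], "moscow": ["<city>"], "yancheng": ["<city>"]}

def templates(query):
    """Step 1: yield (template, token, type) by generalizing one typed token."""
    tokens = query.lower().split()
    for i, tok in enumerate(tokens):
        for typ in TYPES.get(tok, []):
            yield " ".join(tokens[:i] + [typ] + tokens[i + 1:]), tok, typ

def template_transitions(query_pairs):
    """Step 2: count template pairs that generalize the same token in (q1, q2)."""
    counts = Counter()
    for q1, q2 in query_pairs:
        for t1, tok1, typ1 in templates(q1):
            for t2, tok2, typ2 in templates(q2):
                if (tok1, typ1) == (tok2, typ2) and t1 != t2:
                    counts[(t1, t2)] += 1
    return counts

def recommend_rare(query, counts):
    """Step 3: instantiate target templates with the rare query's own token."""
    recs = Counter()
    for t1, tok, typ in templates(query):
        for (src, dst), c in counts.items():
            if src == t1:
                recs[dst.replace(typ, tok, 1)] += c
    return [q for q, _ in recs.most_common()]

observed = [("Paris hotels", "Paris restaurants"), ("Moscow hotels", "Moscow restaurants")]
counts = template_transitions(observed)
print(recommend_rare("Yancheng hotels", counts))
```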
query templates
defined over a hierarchy of entity types
define a global set of templates over the whole query log
not restricted to specific domains (such as travel, weather, or movies)
examples:
jaguar spare parts → <car> spare parts
name for salt → name for <compound>
a thousand miles notes → <song> notes
candidate templates – example
[figure: n-grams of the query (“chocolate”, “chocolate cookie”, “recipe”) linked to entity types in the hierarchy: food, dessert, drink, recipe, instruction, substance]
query: chocolate cookie recipe
candidate templates: <food> cookie recipe
<drink> cookie recipe
<food> recipe
<substance> recipe
chocolate cookie <instruction> . . .
ranking candidate templates
ambiguity
Jaguar spare parts → <car> spare parts
Jaguar spare parts → <animal> spare parts
focus
name for salt → name for <compound>
name for salt → <description> for salt
right generalization level
Paris hotels → <capital> hotels
Paris hotels → <city> hotels
Paris hotels → <location> hotels
construction of query templates – details
hierarchy used: WordNet 3.0 hierarchy and Wikipedia category hierarchy, connected via the yago mapping
queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy
enriched with heuristic generalizations for <email>, <url>, numbers, and noun phrases not in the taxonomy
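A minimal sketch of the lookup step, assuming a hypothetical in-memory `TAXONOMY` dict from n-grams to entity types (standing in for the WordNet/Wikipedia hierarchy), n-grams of up to three tokens, and simple regular expressions for the <email>, <url>, and number heuristics; the noun-phrase heuristic and the hierarchy distances are omitted.

```python
import re

# Hypothetical taxonomy fragment: n-gram -> entity types.
TAXONOMY = {"chocolate": ["<food>", "<drink>"], "chocolate cookie": ["<food>"],
            "recipe": ["<instruction>"]}

HEURISTICS = [
    (re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"), "<email>"),
    (re.compile(r"^(https?://|www\.)\S+$"), "<url>"),
    (re.compile(r"^\d+(\.\d+)?$"), "<number>"),
]

def lookup_ngrams(query, max_n=3):
    """Return (start, end, type) spans whose n-gram maps to an entity type."""
    tokens = query.lower().split()
    spans = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            for typ in TAXONOMY.get(gram, []):
                spans.append((i, i + n, typ))
            if n == 1:  # heuristic generalizations apply to single tokens here
                for pattern, typ in HEURISTICS:
                    if pattern.match(gram):
                        spans.append((i, i + 1, typ))
    return spans

def candidate_templates(query, max_n=3):
    tokens = query.lower().split()
    return sorted({" ".join(tokens[:i] + [typ] + tokens[j:])
                   for i, j, typ in lookup_ngrams(query, max_n)})

print(candidate_templates("chocolate cookie recipe"))
```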
query-to-template edges
mapping from a query q to its set of templates T(q), viewed as query-to-template edges
associated edge scores
$s_{qt}(q, t) = \alpha^{d}$
when t is obtained by generalizing q at distance d in the hierarchy H
parameter α set experimentally to 0.9
set $s_{qt}(q, q') = 1$ if (q, q′) is an edge in the query-flow graph
normalize so that all $s_{qt}(q, \cdot)$ sum to 1
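A minimal sketch of these edge scores, assuming the templates of q are already known together with their generalization distance d, and that the query-flow successors of q are given as a set; α = 0.9 as stated above, while the input format is hypothetical.

```python
def query_edge_scores(template_distances, qfg_successors, alpha=0.9):
    """template_distances: {template: d}, the generalization distance of each template of q.
    qfg_successors: queries q' such that (q, q') is an edge of the query-flow graph.
    Returns s_qt(q, .) over templates and successor queries, normalized to sum to 1."""
    scores = {t: alpha ** d for t, d in template_distances.items()}
    for q2 in qfg_successors:
        scores[q2] = 1.0
    total = sum(scores.values())
    return {x: s / total for x, s in scores.items()} if total else {}

s = query_edge_scores({"<city> hotels": 1, "<location> hotels": 3}, {"paris restaurants"})
print(s)
```

More specific templates (smaller d) keep more of the mass, which is one way to prefer the right generalization level.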
template-to-templates edges
reasoning about transitions between templates
<food> recipe → healthy <food> recipe
for templates (t1, t2), define the support set Sup(t1, t2) of query pairs {(q1, q2)} such that
- t1 ∈ T(q1) and t2 ∈ T(q2)
- t1 and t2 substitute the same token in q1 and q2 (e.g., dosa recipe and healthy dosa recipe)
define the template-to-template edge score as
$$s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in \mathrm{Sup}(t_1, t_2)} s_{qq}(q_1, q_2)$$
normalize so that all $s_{tt}(t, \cdot)$ sum to 1
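Given the query-to-template mapping and the query-flow weights s_qq, the template-to-template scores follow directly from the definition. A minimal sketch, assuming a hypothetical layout in which T(q) is stored as {query: [(template, substituted_token)]} and the query-flow weights as {(q1, q2): s_qq}; the weights are made up.

```python
from collections import defaultdict

def template_edge_scores(qfg_weights, T):
    """qfg_weights: {(q1, q2): s_qq}. T: {query: [(template, substituted_token)]}.
    Returns {t1: {t2: s_tt}} with each row s_tt(t1, .) normalized to sum to 1."""
    raw = defaultdict(lambda: defaultdict(float))
    for (q1, q2), s in qfg_weights.items():
        for t1, tok1 in T.get(q1, []):
            for t2, tok2 in T.get(q2, []):
                if tok1 == tok2:  # (q1, q2) belongs to Sup(t1, t2)
                    raw[t1][t2] += s
    result = {}
    for t1, row in raw.items():
        total = sum(row.values())
        result[t1] = {t2: v / total for t2, v in row.items()}
    return result

qfg_weights = {("bmw transmission", "bmw spare parts"): 0.5,
               ("audi transmission", "audi spare parts"): 0.4}
T = {
    "bmw transmission": [("<car> transmission", "bmw")],
    "bmw spare parts": [("<car> spare parts", "bmw")],
    "audi transmission": [("<car> transmission", "audi")],
    "audi spare parts": [("<car> spare parts", "audi")],
}
stt = template_edge_scores(qfg_weights, T)
print(stt)
```

A template transition that no observed query pair supports, such as <animal> transmission → <animal> spare parts, simply never receives a score.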
example – ambiguity
consider the query transition: jaguar transmission → jaguar spare parts
template transition: <car> transmission → <car> spare parts
supported by:
bmw transmission → bmw spare parts
audi transmission → audi spare parts
. . .
template transition: <animal> transmission → <animal> spare parts
will not be supported by:
lion transmission → lion spare parts
tiger transmission → tiger spare parts
. . .
the query-template flow graph
extension of the query-flow graph
superposition of all the concepts we have seen so far:
set of nodes consists of queries and templates
set of edges consists of
- query-to-query edges
- query-to-template edges
- template-to-template edges
with associated weights
generating recommendations
[figure: query q connected to query q′ through templates t1, t2, t3, t4; edges are labeled with scores s1 to s7]
$r(q, q') = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7$
interpretation: the probability of a feasible path from q to q′
the dashed edges do not exist explicitly in the graph; they are discovered on the fly
queries q and q′ may not have been seen before
transitions present in the query-flow graph are ranked first
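One reading of r(q, q′) is a sum over two-edge paths: from q to one of its templates t1 (score s_qt), then from t1 to a template t2 (score s_tt), where t2 instantiated with the token that t1 generalized yields q′. A minimal sketch under that assumption; the input layout is hypothetical, the scores are made up, and direct query-flow successors of q are placed first, as stated above.

```python
from collections import defaultdict

def template_recommendations(q, s_qt, s_tt, qfg_successors=()):
    """s_qt: {(template, token, type): score} for the templates of q.
    s_tt: {t1: {t2: score}} template-to-template scores.
    qfg_successors: queries that follow q directly in the query-flow graph.
    Returns (ranking, r) with qfg successors first, then the rest by r(q, q')."""
    r = defaultdict(float)
    for (t1, token, typ), s1 in s_qt.items():
        for t2, s2 in s_tt.get(t1, {}).items():
            q2 = t2.replace(typ, token, 1)  # instantiate t2 on the fly
            if q2 != q:
                r[q2] += s1 * s2            # one product term per path q -> t1 -> t2
    known = [x for x in qfg_successors if x != q]
    rest = sorted((x for x in r if x not in known), key=lambda x: (-r[x], x))
    return known + rest, dict(r)

s_qt = {("<city> hotels", "yancheng", "<city>"): 0.6,
        ("<location> hotels", "yancheng", "<location>"): 0.4}
s_tt = {"<city> hotels": {"<city> restaurants": 0.7, "<city> map": 0.3},
        "<location> hotels": {"<location> restaurants": 1.0}}
ranking, r = template_recommendations("yancheng hotels", s_qt, s_tt)
print(ranking)
```

Here “yancheng restaurants” is reached by two paths, so its score is 0.6 · 0.7 + 0.4 · 1.0 = 0.82, even though the query itself never appears in the log.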
methodology
methods:
query-template flow graph
query-flow graph
evaluation:
inspection of a sample of the results
editorial evaluation
automated evaluation
training dataset
| | queries | templates |
|---|---|---|
| # nodes | 95,279,132 | 5,382,051,983 |
| # edges | 83,513,590 | 4,345,497,267 |
| avg degree | 0.88 | 0.81 |
| max out-degree | 14,145 (craigslist) | 34,249 (<album>) |
| max in-degree | 14,317 (youtube) | 133,874 (<institution>) |
anecdotal evidence
{“guangzhou flights”, “guangzhou map”}: <capital> flights → <capital> map
{“a thousand miles notes”, “a thousand miles piano notes”}: <single> notes → <single> piano notes
{“8 week old weimaraner”, “8 week old weimaraner puppy”}: 8 week old <breed> → 8 week old <breed> puppy
{“aaa office twin falls idaho”, “aaa twin falls idaho”}: aaa office <city> → aaa <city>
{“air force titles”, “air force ranks”}: <military service> titles → <military service> ranks
{“name for salt”, “chemical name for salt”}: name for <compound> → chemical name for <compound>
editorial evaluation
set-A: 300 pairs from each configuration, with the recommendation in the top 10
set-B: 100 pairs with the same queries in each configuration, at the same position
set-C: 100 pairs for which the query-flow graph has no recommendation
editors labeled query-recommendation pairs as: relevant, not relevant, or cannot tell
two editors, 100 common queries, kappa statistic 0.37

| | qfg | qtfg |
|---|---|---|
| set-A | 98.48% | 97.84% |
| set-B | 97.65% | 98.86% |
| set-C | n/a | 94.38% |
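The agreement figure is a kappa statistic over the two editors' labels. A minimal sketch, assuming Cohen's kappa (the slide does not name the variant): κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each editor's label frequencies. The label lists are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two editors beyond chance; assumes chance agreement < 1."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

editor_1 = ["relevant", "relevant", "not relevant", "not relevant"]
editor_2 = ["relevant", "not relevant", "not relevant", "not relevant"]
print(cohens_kappa(editor_1, editor_2))  # 0.5
```

A value of 0.37 therefore indicates agreement clearly above chance but far from perfect.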
automated evaluation – guiding principle
extract query pairs {$q_i$, $q_{i+1}$} from a test dataset such that the user submitted $q_{i+1}$ after $q_i$ in the same session
measure whether $q_{i+1}$ is predicted by our methods, and at which position
assumption: $q_{i+1}$ should be relevant and useful for $q_i$
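The protocol can be sketched as a small evaluation loop over test pairs. A minimal sketch, assuming a hypothetical `recommend(q)` that returns a ranked list; coverage is taken here to mean that the method returns at least one recommendation for q_i (one plausible reading), and with a single relevant query per pair, average precision reduces to 1/rank (0 when q_{i+1} is not retrieved).

```python
def evaluate(pairs, recommend, cutoffs=(1, 10, 100)):
    """pairs: (q_i, q_next) test pairs; recommend: q -> ranked list of queries."""
    n = len(pairs)
    covered, ap_sum, positions = 0, 0.0, []
    hits = {k: 0 for k in cutoffs}
    for q, target in pairs:
        ranked = recommend(q)
        if ranked:
            covered += 1                    # the method has some recommendation for q
        if target in ranked:
            pos = ranked.index(target) + 1  # 1-based position of the true next query
            positions.append(pos)
            ap_sum += 1.0 / pos             # average precision with one relevant item
            for k in cutoffs:
                if pos <= k:
                    hits[k] += 1
    metrics = {"coverage": covered / n, "MAP": ap_sum / n}
    metrics.update({f"top-{k}": hits[k] / n for k in cutoffs})
    metrics["avg_position"] = sum(positions) / len(positions) if positions else None
    return metrics

recs = {"a": ["b", "c"], "x": ["y"]}
test_pairs = [("a", "c"), ("x", "y"), ("m", "n"), ("a", "z")]
print(evaluate(test_pairs, lambda q: recs.get(q, [])))
```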
results
pair occurrences

| | qfg | qtfg | relative increase |
|---|---|---|---|
| total pairs | 3134388 | 3134388 | |
| coverage | 22.65% | 28.17% | 24.37% |
| # in top-100 | 16.97% | 25.49% | 50.23% |
| # in top-10 | 9.49% | 20.74% | 118.49% |
| # in top-1 | 2.86% | 10.01% | 249.5% |
| MAP | 0.050 | 0.137 | |
| avg. position | 18.35 | 8.3 | |

unique pairs

| | qfg | qtfg | relative increase |
|---|---|---|---|
| total pairs | 2755922 | 2755922 | |
| coverage | 13.28% | 19.38% | 45.87% |
| # in top-100 | 12.06% | 17.25% | 42.96% |
| # in top-10 | 8.41% | 13.52% | 60.68% |
| # in top-1 | 2.86% | 6.5% | 127.32% |
| MAP | 0.047 | 0.089 | |
| avg. position | 12.33 | 9.43 | |
results
[figure: percentage of test pairs with the target in the top 10 (y-axis) versus query length in words (x-axis), for QFG and QTFG]
conclusions
improve coverage of query recommendation systems
recommendations for rare or previously unseen queries
well suited for tail queries
complements rather than replaces existing methods
future work: improve quality of extracted templates
![Page 64: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/64.jpg)
yahoo! tips
[Weber et al., 2011]
![Page 65: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/65.jpg)
motivation
provide answers, not links
identify “how to” queries and provide tips
tip: piece of advice that is
1. short
2. concrete
3. self-contained
4. non-obvious
![Page 66: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/66.jpg)
yahoo! tips
![Page 67: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/67.jpg)
yahoo! tips
![Page 68: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/68.jpg)
yahoo! tips
![Page 69: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/69.jpg)
yahoo! tips
![Page 70: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/70.jpg)
extract tips from yahoo! answers
tip: To tell if your eggs are fresh : place eggs in a bowl/glass of water.....if it floats it’s bad. if it sinks it’s good.
![Page 71: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/71.jpg)
system diagram
example query: zest lime without zester
- offline: rule-based extraction from yahoo! answers → 250k candidate tips → quality labels for 20k candidate tips obtained using CrowdFlower → machine learning → 22k high-quality tips
- online, step 1: does the query have how-to intent? no → show normal search results; yes → step 2
- online, step 2: are there relevant high-quality tips? no → show normal search results; yes → rank the matching tips and display the highest-ranking one
- output: TIP: To zest a lime if you don’t have a zester : use a cheese grater
![Page 72: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/72.jpg)
mining tips from yahoo! answers
consider tips of a specific structure: “X : Y ”
X : goal of the tip
Y : action of the tip
examples
- To get the mildew smell out of your towels : try soaking it in a salt water solution, then washing with soap and cold water, that tends to get rid of smells
- To style your hair without heat, gel or straighteners : try coconut oil mark k
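A minimal sketch of splitting a candidate tip into its goal X and action Y, assuming tips look like the examples above: a goal starting with “To”, separated from the action by “ : ”. The regular expression and the name `split_tip` are illustrative, not the system's actual extraction rules.

```python
import re

# Assumed tip shape: a goal starting with "To", then " : ", then the action.
TIP_PATTERN = re.compile(r"^\s*(?P<goal>To\s.+?)\s+:\s+(?P<action>.+)$", re.DOTALL)


def split_tip(text):
    """Return (goal, action) for an "X : Y" tip, or None if the text does not match."""
    match = TIP_PATTERN.match(text)
    if match is None:
        return None
    return match.group("goal").strip(), match.group("action").strip()


print(split_tip("To zest a lime if you don't have a zester : use a cheese grater"))
```

The non-greedy goal group stops at the first “ : ”, so a colon inside the action (as in a URL) stays in the action.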
![Page 73: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/73.jpg)
mining tips from yahoo! answers
english
only literal “how to” queries
answer should start with a verb
consider only best answers
replace I, my, me, myself, etc. with you, your, you, yourself, etc.
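Two of the rules above can be sketched in a few lines. Pronoun replacement uses only the four words listed on the slide (the “etc.” is not expanded). Checking that an answer starts with a verb would normally need a part-of-speech tagger, so the sketch substitutes a hypothetical `COMMON_VERBS` list.

```python
import re

# First-person -> second-person mapping taken from the slide (the "etc." is not expanded).
PRONOUNS = {"i": "you", "my": "your", "me": "you", "myself": "yourself"}
PRONOUN_RE = re.compile(r"\b(" + "|".join(PRONOUNS) + r")\b", re.IGNORECASE)

# Hypothetical stand-in for a part-of-speech tagger: a few imperative verbs.
COMMON_VERBS = {"use", "try", "place", "put", "add", "mix", "soak", "wash", "rub"}


def to_second_person(text):
    """Rewrite I/my/me/myself as you/your/you/yourself."""
    def replace(match):
        word = match.group(0)
        new = PRONOUNS[word.lower()]
        # Capitalize at the start of the text, or when the original word was
        # capitalized ("My" -> "Your"); a mid-sentence "I" becomes "you".
        if match.start() == 0 or (word[0].isupper() and word != "I"):
            return new.capitalize()
        return new
    return PRONOUN_RE.sub(replace, text)


def action_starts_with_verb(action):
    """Crude check that the action part of a tip starts with a verb."""
    words = action.split()
    return bool(words) and words[0].lower().strip(",.!") in COMMON_VERBS
```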
![Page 74: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/74.jpg)
quality filtering
generated 249,675 tips
manually labeled 20,000 using CrowdFlower
classes: very good (25%), ok (48%), bad (27%)
algorithms
- svm (rbf kernel)
- decision trees
- k-nn (Euclidean, k = 21, ..., 50)
feature families:
- 18 handcrafted features: e.g., style (Flesch-Kincaid reading level), sentiment, # urls, emoticons, etc.
- content: SVD on the tip×term matrix
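One way to compute a few of the handcrafted features listed above. The Flesch-Kincaid grade level is 0.39 · (words / sentences) + 11.8 · (syllables / words) − 15.59; syllables are estimated with a rough vowel-group heuristic, and the URL and emoticon patterns are illustrative assumptions, not the 18 features actually used.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOTICON_RE = re.compile(r"[:;]-?[()DPp]")  # e.g. ":)", ";-P"; illustrative only


def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*W/S + 11.8*Syl/W - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)


def handcrafted_features(tip):
    """A few of the slide's handcrafted features (sentiment is omitted)."""
    return {
        "fk_grade": flesch_kincaid_grade(tip),
        "num_urls": len(URL_RE.findall(tip)),
        "num_emoticons": len(EMOTICON_RE.findall(tip)),
    }
```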
![Page 77: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/77.jpg)
quality filtering — machine learning results
| | method | handcrafted features | content features | both |
|---|---|---|---|---|
| hard | SVM | 0.63/0.13 | 0.60/0.09 | 0.63/0.16 |
| hard | decision tree | 0.67/0.07 | 0.61/0.06 | 0.66/0.13 |
| hard | k-NN | 0.62/0.23 | 0.56/0.11 | 0.63/0.11 |
| soft | SVM | 0.95/0.11 | 0.93/0.05 | 0.95/0.08 |
| soft | decision tree | 0.95/0.03 | 0.92/0.03 | 0.94/0.06 |
| soft | k-NN | 0.94/0.11 | 0.91/0.05 | 0.94/0.05 |
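A sketch of the content-feature pipeline (tip×term counts, SVD projection, Euclidean k-NN) using only NumPy. The whitespace tokenizer, raw term counts, and simple majority vote are assumptions; the slides only state that SVD is applied to the tip×term matrix and that k-NN used k between 21 and 50.

```python
from collections import Counter

import numpy as np


def build_tip_term_matrix(tips):
    """Raw term counts; rows are tips, columns are vocabulary terms."""
    vocab = sorted({w for tip in tips for w in tip.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(tips), len(vocab)))
    for row, tip in enumerate(tips):
        for w in tip.lower().split():
            X[row, index[w]] += 1.0
    return X, index


def svd_features(X, rank):
    """Coordinates of each tip in the top-`rank` singular directions (X V_r = U_r S_r)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]


def knn_predict(train_feats, train_labels, query_feats, k):
    """Majority label among the k nearest training tips (Euclidean distance)."""
    predictions = []
    for q in query_feats:
        dists = np.linalg.norm(train_feats - q, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

To featurize tips that were not part of the SVD, one would project their term vectors onto the same right singular vectors instead of recomputing the decomposition.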
![Page 78: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/78.jpg)
quality filtering — machine learning results
| category | P, R | VG | size |
|---|---|---|---|
| Beauty & Style | 0.53, 0.08 | 0.16 | 0.08 |
| Business & Finance | 0.57, 0.20 | 0.20 | 0.03 |
| Cars & Transportation | 0.64, 0.12 | 0.23 | 0.03 |
| Computers & Internet | 0.69, 0.33 | 0.45 | 0.15 |
| Consumer Electronics | 0.70, 0.23 | 0.38 | 0.06 |
| Entertainment & Music | 0.60, 0.39 | 0.15 | 0.05 |
| Family & Relationships | 0.35, 0.05 | 0.06 | 0.14 |
| Games & Recreation | 0.61, 0.31 | 0.24 | 0.04 |
| Health | 0.62, 0.07 | 0.15 | 0.09 |
| Home & Garden | 0.43, 0.06 | 0.27 | 0.04 |
| Society & Culture | 0.50, 0.19 | 0.09 | 0.03 |
| Sports | 0.68, 0.24 | 0.19 | 0.03 |
| Yahoo! Products | 0.73, 0.43 | 0.45 | 0.07 |
![Page 79: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/79.jpg)
detecting “how to” queries
how many? 2-3% of volume, 3-4% of distinct queries
- start with “how to”, “how do i”, or “how can i”
  example: how do you fix keys on a laptop (P: 96-99%, cover: 1.0%)
- queries start with an action verb
  example: play my music on tool bar raido (P: 7-14%, cover: 3.2%)
- if “how to X” exists, then “X”
  example: craft ideas for boys (P: 87-94%, cover: 1.1%)
- incoming queries to “how to” web sites
  example: fixing a wet cell phone (P: 61-75%, cover: 0.08%)
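The first and third rules above are easy to express in code. A minimal sketch, assuming whitespace-normalized lowercase queries and a query log given as a list of strings; the prefix list holds only the three phrases on the slide, so the slide's own example (“how do you ...”) would need further variants, and the action-verb and incoming-query rules are not modeled.

```python
# The three prefixes listed on the slide; variants such as "how do you"
# would have to be added for the slide's own example to match.
HOWTO_PREFIXES = ("how to ", "how do i ", "how can i ")


def normalize(query):
    return " ".join(query.lower().split())


def howto_targets(query_log):
    """Rule 3: collect every X for which the log contains the query "how to X"."""
    targets = set()
    for q in query_log:
        q = normalize(q)
        if q.startswith("how to "):
            targets.add(q[len("how to "):])
    return targets


def is_howto(query, targets):
    """Rule 1 (literal how-to prefix) or rule 3 (bare X whose "how to X" was seen)."""
    q = normalize(query)
    return q.startswith(HOWTO_PREFIXES) or q in targets
```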
![Page 84: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/84.jpg)
matching queries to tips
precision–recall trade-off
- index only the “goal”, or also the “action”
- use AND or OR mode for the query
- require a minimum “span” for the goal
ranking
rank by number of query tokens in goal, then tf·idf
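A toy version of the matcher, to make these knobs concrete. The slide does not define “span”; here it is assumed to be the fraction of the tip's goal tokens covered by the query. The tf·idf weighting (raw term frequency times log(n / df) + 1) is one common choice, not necessarily the one used.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TipIndex:
    """Toy matcher over (goal, action) tips; parameter semantics are assumptions."""

    def __init__(self, tips, index_action=False):
        self.tips = tips
        # Indexed text: the goal only, or goal plus action.
        self.docs = [tokenize(goal + " " + action if index_action else goal)
                     for goal, action in tips]
        self.goals = [set(tokenize(goal)) for goal, _ in tips]
        df = Counter(term for doc in self.docs for term in set(doc))
        n = len(self.docs)
        self.idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}

    def search(self, query, mode="AND", min_span=0.5):
        q = set(tokenize(query))
        scored = []
        for i, doc in enumerate(self.docs):
            matched = q & set(doc)
            if (mode == "AND" and matched != q) or not matched:
                continue  # AND needs every query token, OR needs at least one
            goal = self.goals[i]
            in_goal = len(q & goal)
            # Assumed meaning of "span": share of goal tokens covered by the query.
            if not goal or in_goal / len(goal) < min_span:
                continue
            tf = Counter(doc)
            tfidf = sum(tf[t] * self.idf[t] for t in matched)
            scored.append((in_goal, tfidf, i))
        # Rank by query tokens found in the goal, then by tf-idf.
        scored.sort(key=lambda item: (-item[0], -item[1]))
        return [self.tips[i] for _, _, i in scored]
```

Switching from AND to OR, or lowering `min_span`, admits more tips per query at the cost of precision, which is the trade-off the evaluation table explores.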
![Page 85: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/85.jpg)
matching queries to tips — evaluation
| mode | min span | vol. | dist. | P@1 | median |
|---|---|---|---|---|---|
| AND | .50 | 8.7% | 2.7% | .428/.680 | 1 |
| AND | .66 | 6.8% | 1.8% | .557/.770 | 1 |
| AND | 1.0 | 4.4% | 0.8% | .625/.835 | 1 |
| OR | .50 | 87.4% | 88.4% | .048/.110 | 18 |
| OR | .66 | 36.8% | 36.3% | .092/.200 | 2 |
| OR | 1.0 | 13.5% | 10.3% | .160/.300 | 1 |
![Page 86: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/86.jpg)
future work
mine tips from other resources
- twitter
- wikitravel
improve quality of the existing system
- incorporating more features
- improving rule extraction
- classification
![Page 87: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/87.jpg)
information dissemination in social networks
![Page 88: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/88.jpg)
the information dissemination spectrum
- news sites, content-provider sites: editorially curated; users browse; no specific info need
- web search (url, images, music, ...): clear intent
- social media (twitter, facebook); recommendations (content-, context-, or geo-aware); user-generated content (blogs, images, q/a)
![Page 91: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/91.jpg)
social media
![Page 92: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/92.jpg)
the information overload problem
![Page 93: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/93.jpg)
social media and user-generated content
paradigm shift from a broadcast one-to-many mechanism to a many-to-many model
users in the role of information producers
![Page 94: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/94.jpg)
benefits and opportunities
wealth of information of extreme volume and diversity
wisdom-of-the-crowd phenomena
accurate profiling and personalization (toolbar, search, clicks)
content and context information available
social and geo information available
![Page 95: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/95.jpg)
challenges
heterogeneous sources
high variability in quality
needle-in-the-haystack problems
we want to:
help users seek, filter, and disseminate information
build efficient platforms that support social-media functionalities
![Page 97: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/97.jpg)
personalized news recommendations by harnessing the real-time web
[De Francisci Morales et al., 2012]
![Page 98: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/98.jpg)
overview
a news recommendation system based on the real-time web, e.g., twitter
suggest news articles to twitter users
infer user preferences from twitter activity
![Page 99: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/99.jpg)
yahoo! news
![Page 100: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/100.jpg)
yahoo! news
![Page 101: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/101.jpg)
yahoo! news
![Page 102: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/102.jpg)
source characteristics
news stream
+ high coverage
− sparse and noisy data for user profiling
− latency on collecting user feedback
twitter stream
+ much more accurate personalization
+ news spread very fast
![Page 103: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/103.jpg)
From Chatter to Headlines: Harnessing the Real-Time Web for Personalized News Recommendation
poster sections: Overview, Motivation, Problem, Model, Method, Results
[diagram: the user's tweets, the followees' tweets, the twitter stream, and news articles, linked through entities, feed the T.Rex user model, which outputs a personalized ranked list of news articles]
Table 5.2: MRR, precision and coverage.

| Algorithm | MRR | P@1 | P@5 | P@10 | Coverage |
|---|---|---|---|---|---|
| RECENCY | 0.020 | 0.002 | 0.018 | 0.036 | 1.000 |
| CLICKCOUNT | 0.059 | 0.024 | 0.086 | 0.135 | 1.000 |
| SOCIAL | 0.017 | 0.002 | 0.018 | 0.036 | 0.606 |
| CONTENT | 0.107 | 0.029 | 0.171 | 0.286 | 0.158 |
| POPULARITY | 0.008 | 0.003 | 0.005 | 0.012 | 1.000 |
| T.REX | 0.107 | 0.073 | 0.130 | 0.168 | 1.000 |
| T.REX+ | 0.109 | 0.062 | 0.146 | 0.189 | 1.000 |

- RECENCY: ranks news articles by time of publication (most recent first);
- CLICKCOUNT: ranks news articles by click count (highest count first);
- SOCIAL: ranks news articles by using T.REX with β = γ = 0;
- CONTENT: ranks news articles by using T.REX with α = γ = 0;
- POPULARITY: ranks news articles by using T.REX with α = β = 0.
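The metrics in Table 5.2 can be computed from per-click test cases as sketched below. We assume each case pairs a strategy's ranked list (or `None` when the strategy cannot produce a recommendation, which lowers coverage) with the clicked article. P@k is read as the share of cases whose clicked article appears in the top k, a reading suggested by the remark in the results that T.REX+ ranked the clicked news in the top 10 in almost 20% of cases, matching its P@10 of 0.189. Averaging MRR over all cases is also an assumption.

```python
def evaluate_rankings(cases, ks=(1, 5, 10)):
    """MRR, P@k and coverage over (ranking, clicked_item) test cases.

    `ranking` is a list of article ids, or None when the strategy produced no
    recommendation for that case.
    """
    n = len(cases)
    reciprocal_sum = 0.0
    covered = 0
    hits = {k: 0 for k in ks}
    for ranking, clicked in cases:
        if ranking is None:
            continue
        covered += 1
        if clicked in ranking:
            rank = ranking.index(clicked) + 1
            reciprocal_sum += 1.0 / rank
            for k in ks:
                if rank <= k:
                    hits[k] += 1
    metrics = {"MRR": reciprocal_sum / n, "coverage": covered / n}
    for k in ks:
        metrics[f"P@{k}"] = hits[k] / n  # share of cases with the click in the top k
    return metrics
```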
5.6.5 Results
We report MRR, precision and coverage results in Table 5.2. The two variants of our system, T.REX and T.REX+, have the best results overall.
T.REX+ has the highest MRR of all the alternatives, which means that our model performs well overall across the dataset. CONTENT also has a very high MRR. Unfortunately, the coverage achieved by the CONTENT strategy is very low. This issue is mainly caused by the sparsity of the user profiles: it is well known that most twitter users belong to the “silent majority” and do not tweet very much.
The SOCIAL strategy is affected by the same problem, albeit to a much lesser extent. The reason for this difference is that SOCIAL draws from a large social neighborhood of user profiles, instead of just one, so it has more chances to provide a recommendation. The quality of the recommendation is, however, quite low, probably because the social-based profile alone is not able to capture the specific user interests.
It is worth noting that in almost 20% of the cases T.REX+ was able to rank the clicked news in the top 10 results. Ranking by the CLICKCOUNT
Mean Reciprocal Rank, Precision and Coverage.
[plot: average DCG (0 to 14) vs. rank (1 to 20); series: T.Rex+, T.Rex, Popularity, Content, Social, Recency, Click count]
Average Discounted Cumulative Gain.
T.Rex builds user profiles from twitter. Parameters learned from click data in the Yahoo! toolbar logs. Uses support vector machines and learns a ranking function. Heuristically identified a group of twitter users in the toolbar and used their clicks to train and test the system.
What: T.Rex is a new methodology for recommending interesting news to users by exploiting the information in their twitter persona.
Content Model Γ: Γ(i, j) is the content relevance of news n_j for user u_i.
Social Model Σ: Σ(i, j) is the social relevance of news n_j for user u_i.
Popularity Model Π: Π(j) is the popularity of news article n_j.
in updating the popularity counts is to take into account recency: new entities of interest should dominate the popularity counts of older entities. In this work, we choose to update the popularity counts using an exponential decay rule. We discuss the details in Section 5.3.1. However, note that the popularity update is independent of our recommendation model, and any other decaying function can be used.
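A minimal sketch of popularity counts with exponential decay. The actual rule is given in Section 5.3.1, which is not part of this excerpt; here, as an assumption, each count is multiplied by exp(−λ·Δt) with λ = ln 2 / half_life, so a count halves every `half_life` time units, and decay is applied lazily when a count is read or updated.

```python
import math


class DecayingPopularity:
    """Entity popularity counts that decay exponentially over time.

    `half_life` is a hypothetical parameter: a count halves every `half_life`
    time units. Decay is applied lazily, when a count is read or updated.
    """

    def __init__(self, half_life):
        self.rate = math.log(2) / half_life
        self.counts = {}
        self.last_seen = {}

    def value(self, entity, t):
        if entity not in self.counts:
            return 0.0
        elapsed = t - self.last_seen[entity]
        return self.counts[entity] * math.exp(-self.rate * elapsed)

    def add_mention(self, entity, t, weight=1.0):
        self.counts[entity] = self.value(entity, t) + weight
        self.last_seen[entity] = t
```

With this rule, a single fresh mention of a new entity can outweigh several old mentions of an entity that has gone quiet, which is the recency behavior the paragraph asks for.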
Finally, we propose a ranking function for recommending news articles to users. The ranking function is a linear combination of the scoring components described above. We plan to investigate the effect of non-linear combinations in the future.
Definition 10 (Recommendation ranking R_τ(u, n)). Given the components Σ_τ, Γ_τ and Π_τ, resulting from a stream of news N and a stream of tweets T authored by users U up to time τ, the recommendation score of a news article n ∈ N for a user u ∈ U at time τ is defined as
$$R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n),$$
where α, β, γ are coefficients that specify the relative weight of the components.
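Definition 10 translates into a few lines of NumPy. The sketch assumes Σ and Γ are given as |U| × |N| score matrices over the candidate news and Π as a length-|N| vector; `gamma_w` stands for the coefficient γ, renamed only to avoid a clash with the Γ matrix. Setting two of the three weights to zero reproduces the SOCIAL, CONTENT, and POPULARITY baselines.

```python
import numpy as np


def rank_news(sigma, gamma, pi, alpha, beta, gamma_w, top_k):
    """Top-k candidate news per user under R = alpha*Sigma + beta*Gamma + gamma_w*Pi.

    sigma, gamma: |U| x |N| arrays; pi: length-|N| array (same for every user).
    """
    scores = alpha * sigma + beta * gamma + gamma_w * pi[np.newaxis, :]
    # Sort each user's row by decreasing score; ties keep the candidate order.
    return np.argsort(-scores, axis=1, kind="stable")[:, :top_k]
```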
At any given time, the recommender system produces a set of news recommendations by ranking a set of candidate news, e.g., the most recent ones, according to the ranking function R. To motivate the proposed ranking function, we note similarities with popular recommendation techniques. When β = γ = 0, the ranking function R resembles collaborative filtering, where user similarity is computed on the basis of their social circles. When α = γ = 0, the function R implements a content-based recommender system, where a user is profiled by the bag-of-entities occurring in the tweets of the user. Finally, when α = β = 0, the most popular items are recommended, regardless of the user profile.
Note that Σ, Γ, Π and R are all time dependent. At any given time τ the social network and the set of authored tweets vary, thus affecting Σ and Γ. More importantly, some entities may abruptly become popular, hence of interest to many users. This dependency is captured by Π. While the changes in Σ and Γ derive directly from the tweet stream T and the social network S, the update of Π is non-trivial, and plays a fundamental role in the recommendation system that we describe in the next section.
Recommendation Model R
T.Rex+: system trained with additional features:
- entity hotness (raw number of mentions in news and twitter)
- news click count
- news article age
Given: N = news stream, T = tweet stream, U = set of users
Find the top-k most relevant news for user u at time τ.
Why Twitter? Timeliness and personalization. News become stale very fast and spread faster on twitter. Twitter is a good predictor of interest.
How: T.Rex uses a mix of signals to model the relevance of news articles for users: the profile of the social neighborhood of the users, the content of their tweet stream, and topic popularity in the news and across twitter.
Results: T.Rex is able to predict with good accuracy the news articles clicked by the users and rank them higher than other news articles.
Data: News: articles from Yahoo! news. Twitter: 1 month of crawled tweets. Clicks: users of twitter in Yahoo! toolbar logs.
Evaluation: We evaluate T.Rex as a click-prediction system. We train our model using a learning-to-rank approach and support vector machines. The train and test set are drawn from click logs.
Claudio Lucchese, Gianmarco De Francisci Morales, Aristides Gionis
Overwhelmed by information overload! Find interesting stories in an ocean of online news articles.
[plot: news-click delay distribution; number of occurrences vs. news-click delay in minutes (1 to 10000, log scale)]
[plot: Osama bin Laden trends; normalized number of occurrences in news, twitter, and clicks, May-01 h20 to May-03 h08]
[plot: Joplin Tornado trends; normalized number of occurrences in news, twitter, and clicks, May-22 h00 to May-26 h00]
A(i, j) = 1 if u_i is the author of tweet t_j; A = authorship matrix (|U| × |T|)
S(i, j) = 1 if u_i is interested in the content produced by u_j; S = social matrix (|U| × |U|)
in N according to a user-dependent relevance criterion. We also aim at incorporating time recency into our model, so that our recommendations favor the most recently published news articles.
We now proceed to model the factors that affect the relevance of news for a given user. We first model the social-network aspect. In our case, the social component is induced by the twitter following relationship. We define S to be the social network adjacency matrix, where S(i, j) is equal to 1 divided by the number of users followed by user u_i if u_i follows u_j, and 0 otherwise. We also adopt a functional ranking (Baeza-Yates et al., 2006) that spreads the interests of a user among its neighbors recursively. By limiting the maximum hop distance d, we define the social influence in a network as follows.
Definition 4 (Social influence S∗). Given a set of users U = {u_0, u_1, ...}, organized in a social network where each user may express an interest in the content published by another user, we define the social influence model S∗ as the |U| × |U| matrix where S∗(i, j) measures the interest of user u_i in the content generated by user u_j, and it is computed as
$$S^* = \sum_{i=1}^{d} \sigma^i S^i,$$
where S is the row-normalized adjacency matrix of the social network, d is the maximum hop distance up to which users may influence their neighbors, and σ is a damping factor.
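Definition 4 can be computed by accumulating matrix powers. A sketch with NumPy, assuming the follow graph is given as a dict from a user index to the list of users that user follows; it builds the row-normalized S described above and sums σ^i S^i for i = 1, ..., d.

```python
import numpy as np


def row_normalized_adjacency(follows, n_users):
    """S(i, j) = 1 / (number of users followed by u_i) if u_i follows u_j, else 0."""
    S = np.zeros((n_users, n_users))
    for i, followees in follows.items():
        for j in followees:
            S[i, j] = 1.0 / len(followees)
    return S


def social_influence(S, sigma, d):
    """S* = sum_{i=1}^{d} sigma^i S^i."""
    result = np.zeros_like(S)
    power = np.eye(S.shape[0])
    for i in range(1, d + 1):
        power = power @ S  # now equals S^i
        result += sigma ** i * power
    return result
```

Dense matrices are only practical for small graphs; for a real follow graph the same loop would run on sparse matrices, and a small d keeps the hop-limited powers cheap.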
Next we model the profile of a user based on the content that the user has generated. We first define a binary authorship matrix A to capture the relationship between users and the tweets they produce.
Definition 5 (Tweet authorship A). Let A be a |U| × |T| matrix where A(i, j) is 1 if u_i is the author of t_j, and 0 otherwise.
The matrix A can be extended to deal with different types of relationships between users and posts, e.g., to weigh re-tweets or likes differently. In this work, we limit the concept of authorship to the posts actually written by the user.
social interest: S∗(i, j) = level of interest of u_i in the content produced by u_j.
Z: entity space onto which T and N are mapped. We use Wikipedia pages as our entity space.
entity popularity: popularity of entity z_i, collected over Z in a popularity vector; updated by tracking mentions in news and twitter with exponential decay.
tweet-to-news matrix (|T| × |N|): relatedness of tweet t_i to news n_j.
tweet matrix (|T| × |Z|): relatedness of tweet t_i to entity z_j.
news matrix (|Z| × |N|): relatedness of entity z_i to news n_j.
yandex aug 31, 2012
![Page 104: Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации](https://reader033.vdocuments.us/reader033/viewer/2022052621/557fb646d8b42a40118b47f7/html5/thumbnails/104.jpg)
Entities
News
Tweets
From Chatter to Headlines:Harnessing the Real-Time Web
for Personalized News Recommendation
Overview Motivation Problem
Model Method Results
tweetsUser
tweetsFollowee
tweetsFollowee
tweetsFollowee
tweetstwitter
articlesnews
T.Rex
User Model
!
"
#
Personalized ranked list of news articles
Table 5.2: MRR, precision and coverage.
Algorithm MRR P@1 P@5 P@10 CoverageRECENCY 0.020 0.002 0.018 0.036 1.000CLICKCOUNT 0.059 0.024 0.086 0.135 1.000SOCIAL 0.017 0.002 0.018 0.036 0.606CONTENT 0.107 0.029 0.171 0.286 0.158POPULARITY 0.008 0.003 0.005 0.012 1.000T.REX 0.107 0.073 0.130 0.168 1.000T.REX+ 0.109 0.062 0.146 0.189 1.000
RECENCY: it ranks news articles by time of publication (most recent first);CLICKCOUNT: it ranks news articles by click count (highest count first);SOCIAL: it ranks news articles by using T.REX with β = γ = 0;CONTENT: it ranks news articles by using T.REX with α = γ = 0;POPULARITY: it ranks news articles by using T.REX with α = β = 0.
5.6.5 Results
We report MRR, precision and coverage results in Table 5.6.3. The twovariants of our system, T.REX and T.REX+, have the best results overall.
T.REX+ has the highest MRR of all the alternatives. This result meansthat our model has a good overall performance across the dataset. CON-TENT has also a very high MRR. Unfortunately, the coverage level achievedby the CONTENT strategy is very low. This issue is mainly caused by thesparsity of the user profiles. It is well know that most of twitter usersbelong to the “silent majority,” and do not tweet very much.
The SOCIAL strategy is affected by the same problem, albeit to a muchlesser extent. The reason for this difference is that SOCIAL draws froma large social neighborhood of user profiles, instead of just one. So ithas more chances to provide a recommendation. The quality of the rec-ommendation is however quite low, probably because the social-basedprofile only is not able to catch the specific user interests.
It is worth noting that in almost 20% of the cases T.REX+ was able torank the clicked news in the top 10 results. Ranking by the CLICKCOUNT
124
!"#$%&"'()*+'#,%&#$-.%/*"'(0(+$%#$1%2+3"*#4"5
0
2
4
6
8
10
12
14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Ave
rag
e D
CG
Rank
T.Rex+T.Rex
PopularityContent
SocialRecency
Click count
63"*#4"%7(0'+8$9"1%28:8,#9(3"%;#($5
T.Rex!"#$%%<8(,10%80"*%)*+=,"0%>*+:%9?(99"*5/#*#:"9"*0%,"#*$"1%>*+:%',('-%1#9#%($%9@"%A#@++B%9++,<#*%,+45C0"0%08))+*9%3"'9+*%:#'@($"0%#$1%,"#*$0%#%*#$-($4%>8$'9(+$5D"8*(09('#,,E%(1"$9(="1%#%4*+8)%+>%FGHI%9?(99"*%80"*0%($%9@"%9++,<#*%#$1%80"1%9@"(*%',('-0%9+%9*#($%#$1%9"09%9@"%0E09":5
What!"#$%%(0%#%$"?%:"9@+1+,+4E%>+*%*"'+::"$1($4%($9"*"09($4%$"?0%9+%80"*0%<E%"J),+(9($4%9@"%($>+*:#9(+$%($%9@"(*%9?(99"*%)"*0+$#5
Content Model Γ&'(')'*'+%?@"*"%&,-./0%(0%9@"%'+$9"$9%*","3#$'"%+>%$"?0%1/'>+*%80"*%2-5
Social Model Σ!3'('45'*')'*'+%?@"*"%3,-./0%(0%9@"%0+'(#,%*","3#$'"%+>%$"?0%1/'>+*%80"*%2-5
Popularity Model Π6'('7'*'8%?@"*"'6,/0%(0%9@"%)+)8,#*(9E%+>%$"?0%#*9(',"%1/5
in updating the popularity counts is to take into account recency: newentities of interest should dominate the popularity counts of older enti-ties. In this work, we choose to update the popularity counts using anexponential decay rule. We discuss the details in Section 5.3.1. However,note that the popularity update is independent of our recommendationmodel, and any other decaying function can be used.
Finally, we propose a ranking function for recommending news arti-cles to users. The ranking function is linear combination of the scoringcomponents described above. We plan to investigate the effect of non-linear combinations in the future.
Definition 10 (Recommendation ranking Rτ (u, n)). Given the componentsΣτ , Γτ and Πτ , resulting form a stream of news N and a stream of tweets Tauthored by users U up to time τ , the recommendation score of a news articlen ∈ N for a user u ∈ U at time τ is defined as
Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n),
where α, β, γ are coefficients that specify the relative weight of the components.
At any given time, the recommender system produces a set of news recommendations by ranking a set of candidate news articles, e.g., the most recent ones, according to the ranking function R. To motivate the proposed ranking function, we note its similarities with popular recommendation techniques. When β = γ = 0, the ranking function R resembles collaborative filtering, where user similarity is computed on the basis of the users' social circles. When α = γ = 0, R implements a content-based recommender system, where a user is profiled by the bag of entities occurring in the user's tweets. Finally, when α = β = 0, the most popular items are recommended, regardless of the user profile.
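As a minimal sketch of Definition 10 and its special cases, the Python below ranks candidate articles for one user at time τ from precomputed component values. The candidate ids, the component values and the helper names are hypothetical; in T.Rex the three components come from Σ, Γ and Π.

```python
from typing import Dict, List, Tuple

# (Sigma_tau(u, n), Gamma_tau(u, n), Pi_tau(n)) for one user u and one candidate n.
Components = Tuple[float, float, float]


def recommendation_score(c: Components, alpha: float, beta: float, gamma: float) -> float:
    """R_tau(u, n) = alpha * Sigma + beta * Gamma + gamma * Pi."""
    social, content, popularity = c
    return alpha * social + beta * content + gamma * popularity


def rank_news(candidates: Dict[str, Components], alpha: float, beta: float,
              gamma: float, k: int = 10) -> List[str]:
    """Return the top-k candidate ids, highest score first (ties broken by id)."""
    scores = {n: recommendation_score(c, alpha, beta, gamma) for n, c in candidates.items()}
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical candidates for a single user at time tau.
candidates = {
    "n1": (0.1, 0.5, 0.2),
    "n2": (0.4, 0.0, 0.1),
    "n3": (0.0, 0.2, 0.9),
}
print(rank_news(candidates, 1, 0, 0))  # social only:     ['n2', 'n1', 'n3']
print(rank_news(candidates, 0, 1, 0))  # content only:    ['n1', 'n3', 'n2']
print(rank_news(candidates, 0, 0, 1))  # popularity only: ['n3', 'n1', 'n2']
print(rank_news(candidates, 1, 1, 1))  # equal weights:   ['n3', 'n1', 'n2']
```

Zeroing two of the coefficients gives the three special cases listed above; with nonzero weights the linear combination trades them off.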
Note that Σ, Γ, Π and R are all time dependent. At any given time τ, the social network and the set of authored tweets vary, thus affecting Σ and Γ. More importantly, some entities may abruptly become popular, and hence of interest to many users. This dependency is captured by Π. While the changes in Σ and Γ derive directly from the tweet stream T and the social network S, the update of Π is non-trivial and plays a fundamental role in the recommendation system that we describe in the next section.
Recommendation Model R
T.Rex+: system trained with additional features:
• entity hotness (raw number of mentions in news and twitter)
• news click count
• news article age
Given: N = news stream, T = tweet stream, U = set of users
Find the top-k most relevant news for user u at time τ.
Why Twitter? Timeliness and personalization. News become stale very fast and spread faster on twitter. Twitter is a good predictor of interest.
How: T.Rex uses a mix of signals to model the relevance of news articles for users: the profile of the social neighborhood of the users, the content of their tweet stream, and topic popularity in the news and across twitter.
Results: T.Rex is able to predict with good accuracy the news articles clicked by the users and rank them higher than other news articles.
Data: News: […]k articles from Yahoo! news. Twitter: 1 month of crawled tweets. Clicks: users of twitter in Yahoo! toolbar logs.
Evaluation: We evaluate T.Rex as a click prediction system. We train our model using a learning-to-rank approach and support vector machines. The train and test sets are drawn from click logs.
Claudio Lucchese
Gianmarco De Francisci Morales
Aristides Gionis
Overwhelmed by information overload! Find interesting stories in an ocean of online news articles.
[Figure: News-click delay distribution. Number of occurrences versus news-click delay in minutes (log scale, 1 to 10,000).]
[Figure: Osama bin Laden trends. Normalized number of occurrences in news, twitter and clicks, May-01 h20 to May-03 h08.]
[Figure: Joplin tornado trends. Normalized number of occurrences in news, twitter and clicks, May-22 h00 to May-26 h00.]
A(i, j) = 1 if u_i is the author of tweet t_j (A: |U| × |T| authorship matrix)
S(i, j) = 1 if u_i is interested in the content produced by u_j (S: |U| × |U| social matrix)
in N according to a user-dependent relevance criterion. We also aim at incorporating time recency into our model, so that our recommendations favor the most recently published news articles.
We now proceed to model the factors that affect the relevance of news for a given user. We first model the social-network aspect. In our case, the social component is induced by the twitter following relationship. We define S to be the social network adjacency matrix, where S(i, j) is equal to 1 divided by the number of users followed by user u_i if u_i follows u_j, and 0 otherwise. We also adopt a functional ranking (Baeza-Yates et al., 2006) that spreads the interests of a user among its neighbors recursively. By limiting the maximum hop distance d, we define the social influence in a network as follows.
Definition 4 (Social influence S∗). Given a set of users U = {u0, u1, . . .}, organized in a social network where each user may express an interest in the content published by another user, we define the social influence model S∗ as the |U| × |U| matrix where S∗(i, j) measures the interest of user u_i in the content generated by user u_j, computed as
$$S^{*} = \sum_{i=1}^{d} \sigma^{i} S^{i},$$
where S is the row-normalized adjacency matrix of the social network, d is the maximum hop distance up to which users may influence their neighbors, and σ is a damping factor.
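A minimal NumPy sketch of S∗ under the definitions above: S is built from a list of (follower, followee) index pairs, row-normalized by the number of followees, and the damped powers are summed up to hop distance d. The input format and function names are our own assumptions; users who follow nobody simply get an all-zero row.

```python
import numpy as np


def follow_matrix(follows, num_users):
    """S(i, j) = 1 / (number of users followed by u_i) if u_i follows u_j, else 0."""
    S = np.zeros((num_users, num_users))
    for i, j in follows:
        S[i, j] = 1.0
    out_degree = S.sum(axis=1, keepdims=True)
    # Avoid dividing by zero for users who follow nobody: their row stays zero.
    return np.divide(S, out_degree, out=np.zeros_like(S), where=out_degree > 0)


def social_influence(S, sigma, d):
    """S* = sum_{i=1..d} sigma^i S^i (influence limited to d hops, damped by sigma)."""
    S_star = np.zeros_like(S)
    S_power = np.eye(S.shape[0])
    for i in range(1, d + 1):
        S_power = S_power @ S          # S^i
        S_star += sigma ** i * S_power
    return S_star


# u0 follows u1 and u2; u1 follows u2; u2 follows nobody.
S = follow_matrix([(0, 1), (0, 2), (1, 2)], num_users=3)
print(social_influence(S, sigma=0.5, d=2))
```

With σ = 0.5 and d = 2, u0's row becomes (0, 0.25, 0.375): 0.25 of direct interest in u1, and 0.375 in u2, of which 0.25 is direct and 0.125 arrives through u1.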
Next, we model the profile of a user based on the content that the user has generated. We first define a binary authorship matrix A to capture the relationship between users and the tweets they produce.
Definition 5 (Tweet authorship A). Let A be a |U| × |T| matrix where A(i, j) is 1 if u_i is the author of t_j, and 0 otherwise.
The matrix A can be extended to deal with different types of relationships between users and posts, e.g., to weigh re-tweets or likes differently. In this work, we limit the concept of authorship to the posts actually written by the user.
Social interest
S*(i, j) = level of interest of u_i in the content produced by u_j.
Z = entity space onto which T and N are mapped. We use Wikipedia pages as our entity space.
z(i) = popularity of entity z_i (z: popularity vector over Z), updated by tracking mentions in news and twitter with exponential decay.
M(i, j) = relatedness of tweet t_i to news n_j (M: |T| × |N| tweet-to-news matrix)
M = T · N
T(i, j) = relatedness of tweet t_i to entity z_j (T: |T| × |Z| tweet matrix)
N(i, j) = relatedness of entity z_i to news n_j (N: |Z| × |N| news matrix)
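To make the matrix shapes concrete, here is a toy NumPy sketch of the components as we read them from the poster: M = T · N, Γ = A · M, Σ = S* · A · M and Π = z · N. All sizes and values are hypothetical; in T.Rex, T and N come from mapping tweets and news onto Wikipedia entities, and S* from the social influence model.

```python
import numpy as np

# Hypothetical toy instance: 2 users, 3 tweets, 2 entities, 2 news articles.
A = np.array([[1, 0, 0],          # A(i, j) = 1 if u_i authored t_j          (|U| x |T|)
              [0, 1, 1]], dtype=float)
T = np.array([[1, 0],             # T(i, j): tweet t_i related to entity z_j (|T| x |Z|)
              [0, 1],
              [1, 1]], dtype=float)
N = np.array([[1, 0],             # N(i, j): entity z_i related to news n_j  (|Z| x |N|)
              [0, 1]], dtype=float)
S_star = np.array([[0.0, 1.0],    # S*(i, j): interest of u_i in content of u_j (|U| x |U|)
                   [0.5, 0.0]])
z = np.array([2.0, 1.0])          # popularity of each entity (|Z|)

M = T @ N                         # tweet-to-news relatedness   (|T| x |N|)
Gamma = A @ M                     # content relevance           (|U| x |N|)
Sigma = S_star @ A @ M            # social relevance            (|U| x |N|)
Pi = z @ N                        # popularity of each article  (|N|)

print(Gamma)   # rows: users, columns: news articles
print(Sigma)
print(Pi)
```

Here u0's own tweet concerns only the first article, so Γ gives u0 the scores (1, 0); because S* makes u0 interested in u1's content, and both of u1's tweets relate to the second article, Σ gives u0 the scores (1, 2).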
challenges
scale to large volumes of news and tweets
high dynamicity of news and tweets
news have short life-cycle
twitter users use jargon language
find the right degree of personalization
cope with inactive twitter users
relate users, tweets, and news articles
T.rex architecture
From Chatter to Headlines: Harnessing the Real-Time Web for Personalized News Recommendation
[Figure: T.Rex architecture. Inputs: user tweets, followee tweets, twitter tweets, news articles and entities; the T.Rex user model produces a personalized ranked list of news articles.]
Table 5.2: MRR, precision and coverage.

| Algorithm | MRR | P@1 | P@5 | P@10 | Coverage |
|---|---|---|---|---|---|
| RECENCY | 0.020 | 0.002 | 0.018 | 0.036 | 1.000 |
| CLICKCOUNT | 0.059 | 0.024 | 0.086 | 0.135 | 1.000 |
| SOCIAL | 0.017 | 0.002 | 0.018 | 0.036 | 0.606 |
| CONTENT | 0.107 | 0.029 | 0.171 | 0.286 | 0.158 |
| POPULARITY | 0.008 | 0.003 | 0.005 | 0.012 | 1.000 |
| T.REX | 0.107 | 0.073 | 0.130 | 0.168 | 1.000 |
| T.REX+ | 0.109 | 0.062 | 0.146 | 0.189 | 1.000 |
RECENCY: ranks news articles by time of publication (most recent first).
CLICKCOUNT: ranks news articles by click count (highest count first).
SOCIAL: ranks news articles using T.REX with β = γ = 0.
CONTENT: ranks news articles using T.REX with α = γ = 0.
POPULARITY: ranks news articles using T.REX with α = β = 0.
5.6.5 Results
We report MRR, precision and coverage results in Table 5.2. The two variants of our system, T.REX and T.REX+, have the best results overall.
T.REX+ has the highest MRR of all the alternatives, which indicates good overall performance across the dataset. CONTENT also has a very high MRR. Unfortunately, the coverage achieved by the CONTENT strategy is very low (0.158). This issue is mainly caused by the sparsity of the user profiles: it is well known that most twitter users belong to the “silent majority” and do not tweet very much.
The SOCIAL strategy is affected by the same problem, albeit to a much lesser extent (coverage 0.606). The reason for this difference is that SOCIAL draws from a large social neighborhood of user profiles instead of just one, so it has more chances to provide a recommendation. However, the quality of its recommendations is quite low, probably because a profile based only on the social neighborhood cannot capture the specific interests of the user.
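The metrics in Table 5.2 can be sketched as follows, assuming each test case is one click with the 1-based rank of the clicked article, or None when the strategy produces no recommendation. Averaging MRR and P@k over covered cases only, and reading P@k as the fraction of cases with the clicked article in the top k, are our assumptions about the protocol; they are at least consistent with CONTENT's P@10 (0.286) exceeding its coverage (0.158).

```python
from typing import Dict, List, Optional, Sequence


def evaluate(ranks: List[Optional[int]], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """ranks[c] is the 1-based rank of the clicked article in test case c, or None if uncovered."""
    covered = [r for r in ranks if r is not None]
    results = {"Coverage": len(covered) / len(ranks) if ranks else 0.0}
    n = len(covered)
    # Reciprocal rank and top-k hits are averaged over covered cases only (assumption).
    results["MRR"] = sum(1.0 / r for r in covered) / n if n else 0.0
    for k in ks:
        results[f"P@{k}"] = sum(1 for r in covered if r <= k) / n if n else 0.0
    return results


print(evaluate([1, 4, None, 12, 2]))
```

For these five hypothetical cases the sketch reports a coverage of 0.8, an MRR of about 0.458, P@1 = 0.25 and P@5 = P@10 = 0.75.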
recommendation model
Rτ(u, n) = α · Στ(u, n) + β · Γτ(u, n) + γ · Πτ(n)
social model Σ(i, j): social relevance of news j to user i
content model Γ(i, j): content relevance of news j to user i
popularity model Π(j): popularity of news article j
popularity update rule
news become stale after two days
track mentions in news and tweets with exponential decay
$$Z_{\tau} = \lambda Z_{\tau-1} + w_T H_T + w_N H_N$$
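A minimal sketch of one step of this update, assuming H_T and H_N are the per-entity mention counts observed in tweets and news during the latest time step. The decay λ and the weights w_T, w_N below are hypothetical values, not the ones used in T.Rex.

```python
from typing import List, Sequence


def update_popularity(z_prev: Sequence[float], h_tweets: Sequence[float],
                      h_news: Sequence[float], lam: float, w_t: float,
                      w_n: float) -> List[float]:
    """Z_tau = lam * Z_{tau-1} + w_T * H_T + w_N * H_N, applied entity by entity."""
    return [lam * z + w_t * ht + w_n * hn
            for z, ht, hn in zip(z_prev, h_tweets, h_news)]


# Three entities; parameter values are illustrative only.
z = [0.0, 0.0, 0.0]
z = update_popularity(z, h_tweets=[4, 0, 0], h_news=[1, 0, 0], lam=0.5, w_t=1.0, w_n=2.0)
print(z)  # [6.0, 0.0, 0.0]
z = update_popularity(z, h_tweets=[0, 0, 6], h_news=[0, 0, 2], lam=0.5, w_t=1.0, w_n=2.0)
print(z)  # [3.0, 0.0, 10.0]
```

The entity that was hot in the first step keeps only half of its score, while the newly mentioned entity overtakes it, which is the recency behaviour the decay is meant to provide.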
model learning and evaluation
Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)
Yahoo! toolbar data
the recommendation model should rank highly the news articles that users click
learn the model using SVM
use clicks and twitter profiles of 3K users to train and test the system
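The slides only state that the model is learned with SVMs from toolbar clicks. As a hedged illustration, the sketch below learns (α, β, γ) with a pairwise, RankSVM-style hinge loss: each clicked article is paired with the non-clicked candidates of the same case, and every article is assumed to be described by the features (Σ, Γ, Π). The optimizer (plain subgradient descent), the hyperparameters and the data are our own choices, not the authors' setup.

```python
import numpy as np


def pairwise_differences(cases):
    """cases: list of (clicked_features, [non_clicked_features, ...]).

    Each feature vector is (Sigma(u, n), Gamma(u, n), Pi(n)); the clicked article
    should outrank every non-clicked candidate shown in the same case.
    """
    return np.array([np.subtract(pos, neg) for pos, negs in cases for neg in negs],
                    dtype=float)


def train_rank_svm(diffs, reg=0.01, lr=0.1, epochs=200):
    """Minimize reg/2 * ||w||^2 + mean(max(0, 1 - w . d)) by subgradient descent."""
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        violated = diffs @ w < 1.0                  # pairs inside the margin
        grad = reg * w - diffs[violated].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w                                        # (alpha, beta, gamma), up to scale


# Hypothetical clicks in which the clicked article always has the higher content score.
cases = [
    ((0.1, 0.9, 0.2), [(0.3, 0.1, 0.2), (0.0, 0.2, 0.5)]),
    ((0.2, 0.7, 0.1), [(0.2, 0.1, 0.3), (0.5, 0.3, 0.1)]),
]
diffs = pairwise_differences(cases)
w = train_rank_svm(diffs)
print(w, (diffs @ w > 0).all())
```

On this toy data every clicked article ends up scored above its non-clicked competitors, and the learned content weight β dominates; the extra T.Rex+ features would simply extend each feature vector.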
systems evaluated
T.rex: basic model using only user profiles
Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)
T.rex+: additional features
entity hotness
news click count
news article age
results
Entities
News
Tweets
From Chatter to Headlines:Harnessing the Real-Time Web
for Personalized News Recommendation
Overview Motivation Problem
Model Method Results
tweetsUser
tweetsFollowee
tweetsFollowee
tweetsFollowee
tweetstwitter
articlesnews
T.Rex
User Model
!
"
#
Personalized ranked list of news articles
Table 5.2: MRR, precision and coverage.
Algorithm MRR P@1 P@5 P@10 CoverageRECENCY 0.020 0.002 0.018 0.036 1.000CLICKCOUNT 0.059 0.024 0.086 0.135 1.000SOCIAL 0.017 0.002 0.018 0.036 0.606CONTENT 0.107 0.029 0.171 0.286 0.158POPULARITY 0.008 0.003 0.005 0.012 1.000T.REX 0.107 0.073 0.130 0.168 1.000T.REX+ 0.109 0.062 0.146 0.189 1.000
RECENCY: it ranks news articles by time of publication (most recent first);CLICKCOUNT: it ranks news articles by click count (highest count first);SOCIAL: it ranks news articles by using T.REX with β = γ = 0;CONTENT: it ranks news articles by using T.REX with α = γ = 0;POPULARITY: it ranks news articles by using T.REX with α = β = 0.
5.6.5 Results
We report MRR, precision and coverage results in Table 5.6.3. The twovariants of our system, T.REX and T.REX+, have the best results overall.
T.REX+ has the highest MRR of all the alternatives. This result meansthat our model has a good overall performance across the dataset. CON-TENT has also a very high MRR. Unfortunately, the coverage level achievedby the CONTENT strategy is very low. This issue is mainly caused by thesparsity of the user profiles. It is well know that most of twitter usersbelong to the “silent majority,” and do not tweet very much.
The SOCIAL strategy is affected by the same problem, albeit to a muchlesser extent. The reason for this difference is that SOCIAL draws froma large social neighborhood of user profiles, instead of just one. So ithas more chances to provide a recommendation. The quality of the rec-ommendation is however quite low, probably because the social-basedprofile only is not able to catch the specific user interests.
It is worth noting that in almost 20% of the cases T.REX+ was able torank the clicked news in the top 10 results. Ranking by the CLICKCOUNT
124
!"#$%&"'()*+'#,%&#$-.%/*"'(0(+$%#$1%2+3"*#4"5
0
2
4
6
8
10
12
14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Ave
rage D
CG
Rank
T.Rex+T.Rex
PopularityContent
SocialRecency
Click count
63"*#4"%7(0'+8$9"1%28:8,#9(3"%;#($5
T.Rex: T.Rex builds user profiles from twitter. Parameters are learned from click data in the Yahoo! toolbar log. It uses support vector machines and learns a ranking function. We heuristically identified a group of […] twitter users in the toolbar and used their clicks to train and test the system.
What: T.Rex is a new methodology for recommending interesting news to users by exploiting the information in their twitter persona.
Content Model Γ: Γ = A · (tweet-to-news matrix), where Γ(i, j) is the content relevance of news n_j for user u_i.
Social Model Σ: Σ = S* · A · (tweet-to-news matrix), where Σ(i, j) is the social relevance of news n_j for user u_i.
Popularity Model Π: Π = (entity popularity vector) · (news matrix), where Π(j) is the popularity of news article n_j.
in updating the popularity counts is to take recency into account: new entities of interest should dominate the popularity counts of older entities. In this work, we choose to update the popularity counts using an exponential decay rule. We discuss the details in Section 5.3.1. Note, however, that the popularity update is independent of our recommendation model, and any other decay function can be used.
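Section 5.3.1 is not part of this excerpt, so the rule below is not the thesis's own; it is a hedged illustration of one standard exponential-decay update, in which every count is multiplied by exp(-λ·Δt) before newly observed mentions are added. The class name `DecayingCounter`, the rate `lam` and the entity keys are hypothetical.

```python
import math
from collections import defaultdict


class DecayingCounter:
    """Entity popularity counts that decay exponentially with elapsed time.

    Illustrative only: the decay rate `lam` and the "decay, then add" order
    are assumptions, not the rule described in Section 5.3.1.
    """

    def __init__(self, lam: float):
        self.lam = lam                    # decay rate per time unit
        self.counts = defaultdict(float)  # entity -> decayed mention count
        self.last_t = None                # time of the previous update

    def update(self, t: float, mentions: dict) -> None:
        """Decay all counts from the previous update to time t, then add new mentions."""
        if self.last_t is not None:
            factor = math.exp(-self.lam * (t - self.last_t))
            for entity in self.counts:
                self.counts[entity] *= factor
        for entity, n in mentions.items():
            self.counts[entity] += n
        self.last_t = t

    def popularity(self, entity: str) -> float:
        return self.counts.get(entity, 0.0)


# Hypothetical stream: with lam = ln 2, counts halve every time unit.
pop = DecayingCounter(lam=math.log(2))
pop.update(0, {"joplin": 8, "bin_laden": 8})
pop.update(1, {"joplin": 4})  # joplin: 8 * 0.5 + 4, bin_laden: 8 * 0.5
```

Because older mentions shrink geometrically, an entity that suddenly receives many mentions overtakes entities that were popular earlier, which is the recency behavior the paragraph asks for.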
Finally, we propose a ranking function for recommending news articles to users. The ranking function is a linear combination of the scoring components described above. We plan to investigate the effect of non-linear combinations in the future.
Definition 10 (Recommendation ranking R_τ(u, n)). Given the components Σ_τ, Γ_τ and Π_τ, resulting from a stream of news N and a stream of tweets T authored by users U up to time τ, the recommendation score of a news article n ∈ N for a user u ∈ U at time τ is defined as
$$R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n),$$
where α, β, γ are coefficients that specify the relative weight of the components.
At any given time, the recommender system produces a set of news recommendations by ranking a set of candidate news, e.g., the most recent ones, according to the ranking function R. To motivate the proposed ranking function, we note its similarities with popular recommendation techniques. When β = γ = 0, the ranking function R resembles collaborative filtering, where user similarity is computed on the basis of the users' social circles. When α = γ = 0, R implements a content-based recommender system, where a user is profiled by the bag of entities occurring in the user's tweets. Finally, when α = β = 0, the most popular items are recommended, regardless of the user profile.
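A minimal NumPy sketch of Definition 10 and of the special cases just listed, assuming the three components have already been computed for the current time τ as dense arrays: `social` (Σ) and `content` (Γ) of shape |U| × |N|, and `popularity` (Π) of length |N|, which is broadcast across users because Π does not depend on the user. The function names and the toy numbers are illustrative.

```python
import numpy as np


def recommendation_scores(social, content, popularity, alpha, beta, gamma):
    """R(u, n) = alpha * Sigma(u, n) + beta * Gamma(u, n) + gamma * Pi(n)."""
    social = np.asarray(social, dtype=float)          # Sigma, |U| x |N|
    content = np.asarray(content, dtype=float)        # Gamma, |U| x |N|
    popularity = np.asarray(popularity, dtype=float)  # Pi, |N|, same for every user
    return alpha * social + beta * content + gamma * popularity[np.newaxis, :]


def rank_news(scores, user, top_k):
    """Indices of the top_k candidate news for `user`, highest score first."""
    return np.argsort(-scores[user], kind="stable")[:top_k].tolist()


# Toy example: 2 users, 3 candidate news articles.
social = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.5]]
content = [[0.0, 0.8, 0.1], [0.3, 0.0, 0.0]]
popularity = [0.1, 0.2, 0.9]
content_only = recommendation_scores(social, content, popularity, 0, 1, 0)
print(rank_news(content_only, user=0, top_k=2))  # [1, 2]
```

Setting two of the three weights to zero reproduces the SOCIAL, CONTENT and POPULARITY baselines of Table 5.2; with α = β = 0 every user receives the same ranking.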
Note that Σ, Γ, Π and R are all time dependent. At any given time τ, the social network and the set of authored tweets vary, thus affecting Σ and Γ. More importantly, some entities may abruptly become popular, and hence of interest to many users. This dependency is captured by Π. While the changes in Σ and Γ derive directly from the tweet stream T and the social network S, the update of Π is non-trivial and plays a fundamental role in the recommendation system that we describe in the next section.
Recommendation Model R
T.Rex+: system trained with additional features:
- entity hotness (raw number of mentions in news and twitter)
- news click count
- news article age
Given: N = news stream, T = tweet stream, U = set of users.
Find: the top-k most relevant news for a user at time τ.
Why Twitter? Timeliness and personalization. News becomes stale very fast and spreads faster on twitter. Twitter is a good predictor of interest.
How: T.Rex uses a mix of signals to model the relevance of news articles for users: the profile of the social neighborhood of the users, the content of their tweet stream, and topic popularity in the news and across twitter.
Results: T.Rex is able to predict with good accuracy the news articles clicked by the users, and to rank them higher than other news articles.
Data:
- News: […]k articles from Yahoo! News.
- Twitter: 1 month of crawled tweets.
- Clicks: users of twitter in Yahoo! toolbar logs.
Evaluation: We evaluate T.Rex as a click-prediction system. We train our model using a learning-to-rank approach and support vector machines. The train and test sets are drawn from click logs.
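The poster states only that the model is trained "using a learning-to-rank approach and support vector machines". One common way to do this, assumed here rather than confirmed by the source, is the pairwise (RankSVM-style) transform: for each click, form difference vectors between the clicked article's features and those of the non-clicked candidates, then fit a linear SVM that separates them. The sketch uses a plain Pegasos-style subgradient solver instead of an SVM library; the per-article feature vector (for example Σ, Γ, Π, plus entity hotness, click count and article age for T.Rex+) is taken as given, and the learned weights play the role of α, β, γ. All names are illustrative.

```python
import numpy as np


def pairwise_differences(clicked_feats, other_feats):
    """RankSVM-style transform: one difference vector per (clicked, other) pair.

    clicked_feats: (d,) features of the clicked article.
    other_feats:   (m, d) features of non-clicked candidates shown at that time.
    Both signs are emitted so the two classes are balanced.
    """
    diffs = np.asarray(clicked_feats, float) - np.asarray(other_feats, float)
    X = np.vstack([diffs, -diffs])
    y = np.concatenate([np.ones(len(diffs)), -np.ones(len(diffs))])
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Minimize lam/2 * ||w||^2 + mean hinge loss with Pegasos-style SGD.

    No bias term: the pairwise data is symmetric around the origin.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            violated = y[i] * (X[i] @ w) < 1.0    # margin check before the update
            w *= 1.0 - eta * lam                  # shrink: regularizer gradient
            if violated:
                w += eta * y[i] * X[i]            # hinge-loss subgradient step
    return w
```

At ranking time, a candidate's score is simply its feature vector dotted with the learned `w`, so the result plugs directly into a linear ranking function like R.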
Claudio Lucchese
Gianmarco De Francisci Morales
Aristides Gionis
Overwhelmed by information overload! Find interesting stories in an ocean of online news articles.
[Figure: News-click delay distribution. Number of occurrences versus news-click delay in minutes, log scale from 1 to 10000.]
[Figure: Osama bin Laden trends. Normalized number of occurrences in news, twitter and clicks, from May-01 h20 to May-03 h08.]
[Figure: Joplin tornado trends. Normalized number of occurrences in news, twitter and clicks, from May-22 h00 to May-26 h00.]
A (|U| × |T|), authorship matrix: A(i, j) = 1 if u_i is the author of tweet t_j.
S (|U| × |U|), social matrix: S(i, j) = 1 if u_i is interested in the content produced by u_j.
in N according to a user-dependent relevance criterion. We also aim at incorporating time recency into our model, so that our recommendations favor the most recently published news articles.
We now proceed to model the factors that affect the relevance of news for a given user. We first model the social-network aspect. In our case, the social component is induced by the twitter following relationship. We define S to be the social network adjacency matrix, where S(i, j) is equal to 1 divided by the number of users followed by user u_i if u_i follows u_j, and 0 otherwise. We also adopt a functional ranking (Baeza-Yates et al., 2006) that spreads the interests of a user among its neighbors recursively. By limiting the maximum hop distance d, we define the social influence in a network as follows.
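A small sketch of the matrix S as just defined, assuming the follow relation is given as (follower, followee) index pairs: each followed column in row i receives 1 / (number of users u_i follows), so every non-empty row sums to 1. Users who follow nobody get an all-zero row; the text does not say how such users are handled, so that choice is an assumption. `follow_matrix` is a hypothetical helper name.

```python
import numpy as np


def follow_matrix(n_users, follows):
    """S(i, j) = 1 / (number of users u_i follows) if u_i follows u_j, else 0.

    follows: iterable of (i, j) pairs meaning "u_i follows u_j".
    Duplicate pairs are counted once.
    """
    S = np.zeros((n_users, n_users))
    for i, j in set(follows):
        S[i, j] = 1.0
    outdeg = S.sum(axis=1, keepdims=True)
    # Row-normalize; rows of users who follow nobody stay zero (no division by 0).
    return np.divide(S, outdeg, out=np.zeros_like(S), where=outdeg > 0)


print(follow_matrix(3, [(0, 1), (0, 2), (1, 2)]))
```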
Definition 4 (Social influence S∗). Given a set of users U = {u_0, u_1, ...}, organized in a social network where each user may express an interest in the content published by another user, we define the social influence model S∗ as the |U| × |U| matrix where S∗(i, j) measures the interest of user u_i in the content generated by user u_j, computed as
$$S^* = \left( \sum_{i=1}^{d} \sigma^i S^i \right),$$
where S is the row-normalized adjacency matrix of the social network, d is the maximum hop distance up to which users may influence their neighbors, and σ is a damping factor.
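Definition 4 can be computed with one matrix product per hop by keeping the running term σ^i S^i instead of recomputing each power. A minimal dense NumPy sketch, assuming S is small enough to store densely (a real follow graph would call for sparse matrices); `social_influence` is an illustrative name.

```python
import numpy as np


def social_influence(S, sigma, d):
    """S* = sum_{i=1}^{d} sigma^i S^i for a row-normalized adjacency matrix S."""
    S = np.asarray(S, dtype=float)
    term = np.eye(S.shape[0])      # sigma^0 S^0 = identity
    total = np.zeros_like(S)
    for _ in range(d):
        term = sigma * (term @ S)  # advance from sigma^(i-1) S^(i-1) to sigma^i S^i
        total += term
    return total


S = np.array([[0.0, 1.0], [1.0, 0.0]])      # two users following each other
print(social_influence(S, sigma=0.5, d=2))  # equals 0.5*S + 0.25*I, since S^2 = I
```

With σ < 1, each additional hop contributes geometrically less, which is what makes the damping factor limit the influence of distant users.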
Next, we model the profile of a user based on the content that the user has generated. We first define a binary authorship matrix A to capture the relationship between users and the tweets they produce.
Definition 5 (Tweet authorship A). Let A be a |U| × |T| matrix where A(i, j) is 1 if u_i is the author of t_j, and 0 otherwise.
The matrix A can be extended to deal with different types of relationships between users and posts, e.g., to weigh retweets or likes differently. In this work, we limit the concept of authorship to the posts actually written by the user.
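Definition 5 and the extension mentioned above can be sketched as follows, assuming each (user, tweet) relation is tagged with a kind. Only `"author"` with weight 1 corresponds to the matrix used in this work; the weights for `"retweet"` and `"like"` are made-up values that merely illustrate the possible extension, not parameters from the thesis. `authorship_matrix` is a hypothetical name.

```python
import numpy as np

# Only "author" (weight 1) is used in this work; the extended weights are hypothetical.
DEFAULT_WEIGHTS = {"author": 1.0}
EXTENDED_WEIGHTS = {"author": 1.0, "retweet": 0.5, "like": 0.2}


def authorship_matrix(n_users, n_tweets, relations, weights=DEFAULT_WEIGHTS):
    """A(i, j) = weight of the strongest relation between u_i and t_j (0 if none).

    relations: iterable of (user_index, tweet_index, kind) triples.
    Relation kinds missing from `weights` are ignored.
    """
    A = np.zeros((n_users, n_tweets))
    for i, j, kind in relations:
        w = weights.get(kind)
        if w is not None:
            A[i, j] = max(A[i, j], w)
    return A


relations = [(0, 0, "author"), (0, 1, "like"), (1, 1, "author"), (1, 0, "retweet")]
print(authorship_matrix(2, 2, relations))  # binary A: likes and retweets ignored
```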
Social interest: S*(i, j) = level of interest of u_i in the content produced by u_j.
Z = entity space, onto which T and N are mapped. We use Wikipedia pages as our entity space.
Entity popularity vector (|Z|): popularity of entity z_i, updated by tracking mentions in news and twitter with exponential decay.
Tweet-to-news matrix (|T| × |N|): relatedness of tweet t_i to news n_j; it is the product of the tweet matrix and the news matrix.
Tweet matrix (|T| × |Z|): relatedness of tweet t_i to entity z_j.
News matrix (|Z| × |N|): relatedness of entity z_i to news n_j.
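Read together, the poster's formulas and matrix shapes fix how the components compose: the tweet-to-news matrix is the tweet matrix times the news matrix, content relevance Γ multiplies the authorship matrix A by it, social relevance Σ additionally multiplies by S*, and news popularity Π projects the entity popularity vector onto the news matrix. A minimal dense NumPy sketch under that reading; all argument names are illustrative, and a real system would use sparse matrices.

```python
import numpy as np


def component_scores(A, S_star, tweet_entity, entity_news, entity_pop):
    """Return (Gamma, Sigma, Pi) from the poster's matrices.

    A            |U| x |T|  authorship
    S_star       |U| x |U|  social influence
    tweet_entity |T| x |Z|  relatedness of tweets to entities (tweet matrix)
    entity_news  |Z| x |N|  relatedness of entities to news (news matrix)
    entity_pop   |Z|        entity popularity vector
    """
    M = tweet_entity @ entity_news   # |T| x |N| tweet-to-news relatedness
    gamma = A @ M                    # content relevance Gamma, |U| x |N|
    sigma = S_star @ gamma           # social relevance Sigma = S* . A . M
    pi = entity_pop @ entity_news    # popularity Pi of each news article, |N|
    return gamma, sigma, pi


# Toy example: user 0 follows the interests of user 1; each user wrote one tweet.
A = np.eye(2)
S_star = np.array([[0.0, 1.0], [0.0, 0.0]])
gamma, sigma, pi = component_scores(A, S_star, np.eye(2),
                                    np.array([[0.5, 0.0], [0.0, 1.0]]),
                                    np.array([2.0, 1.0]))
```

The three outputs have exactly the shapes that the ranking function R combines: two user-by-news matrices and one news vector.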
conclusions
real-time web information can be leveraged to deliver relevant information
future directions
LSI analysis on entities
models for different user clusters
geographic information
summary
review concepts on query-log mining
answering queries directly with useful tips
challenges and opportunities in information dissemination
news recommendations using real-time web
many nice problems and research opportunities
thank you!
references I
Anagnostopoulos, A., Becchetti, L., Castillo, C., and Gionis, A. (2010).
An optimization framework for query recommendation.
In WSDM.
Baeza-Yates, R. A., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., and Silvestri, F. (2007).
The impact of caching on search engines.
In SIGIR.
Boldi, P., Bonchi, F., Castillo, C., Donato, D., Gionis, A., and Vigna, S. (2008).
The query-flow graph: model and applications.
In CIKM.
references II
Bordino, I., Castillo, C., Donato, D., and Gionis, A. (2010).
Query similarity by projecting the query-flow graph.
In SIGIR.
Craswell, N. and Szummer, M. (2007).
Random walks on the click graph.
In SIGIR.
De Francisci Morales, G., Gionis, A., and Lucchese, C. (2012).
From chatter to headlines: Harnessing the real-time web for personalized news recommendation.
In WSDM.
Szpektor, I., Gionis, A., and Maarek, Y. (2011).
Improving recommendation for long-tail queries via templates.
In WWW.
references III
Weber, I., Ukkonen, A., and Gionis, A. (2011).
Answers, not links: Extracting tips from Yahoo! Answers to address how-to web queries.
In CIKM.