university of fribourg, switzerlandmedo/physics/talks/info_filtering.pdf · matúš medo university...
TRANSCRIPT
Complexity insights into information filtering
Matúš Medo
University of Fribourg, Switzerland
ICT Applications to Non-Equilibrium Social Sciences11-12 June, 2013, Lisbon
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 1 / 25
Outline
1 Growth of information networks
2 Physics-motivated approach to recommendation
3 Crowd-avoidance in recommendation
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 2 / 25
Part 1
Growth of information networks
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 3 / 25
Preferential attachment
A classical network model
Yule (1925), Simon (1955), Price (1976), Barabási & Albert (1999)
Growth of cities, citations of scientific papers, WWW,. . .
Nodes and links are added with time
Probability that a node acquires a new linkproportional to its current degree
P(i , t) ∼ ki(t)
1
2
3
4
Pros: simple, produces a power-law degree distribution
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 4 / 25
Preferential attachment
A classical network model
Yule (1925), Simon (1955), Price (1976), Barabási & Albert (1999)
Growth of cities, citations of scientific papers, WWW,. . .
Nodes and links are added with time
Probability that a node acquires a new linkproportional to its current degree
P(i , t) ∼ ki(t)
1
2
3
4
Pros: simple, produces a power-law degree distribution
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 4 / 25
Preferential attachment
A classical network model
Yule (1925), Simon (1955), Price (1976), Barabási & Albert (1999)
Growth of cities, citations of scientific papers, WWW,. . .
Nodes and links are added with time
Probability that a node acquires a new linkproportional to its current degree
P(i , t) ∼ ki(t)
1
2
3
4
Pros: simple, produces a power-law degree distribution
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 4 / 25
PA in scientific citation data
Journals of the American Physical Society from 1893 to 2009:
0 50 100 150 200 250 300degree k
0
1
2
3
4
5d
egre
e in
crem
ent
∆k
empirical datapreferential attachment
See also Adamic & Huberman (2000), Redner (2005), Newman (2009),. . .
0 50 100 150 200 250 300degree k
0
1
2
3
4
5
deg
ree
incr
emen
t ∆
k
1 year
10 years
30 years
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 5 / 25
PA in scientific citation data
Journals of the American Physical Society from 1893 to 2009:
0 50 100 150 200 250 300degree k
0
1
2
3
4
5
deg
ree
incr
emen
t ∆
k
empirical datapreferential attachment
See also Adamic & Huberman (2000), Redner (2005), Newman (2009),. . .
0 50 100 150 200 250 300degree k
0
1
2
3
4
5d
egre
e in
crem
ent
∆k
1 year
10 years
30 years
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 5 / 25
Time decay is fundamental
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 6 / 25
The model (PRL 107, 238701, 2011)
Probability that node i attracts a new link
P(i , t) ∼ ki(t)︸︷︷︸degree
× Ri(t)︸ ︷︷ ︸relevance
Relevance of every node decays with timeWhen Ri(t)→ 0, the popularity of nodes eventually saturates
Good news:Produces various realistic degree distributions (power-law, etc.)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 7 / 25
The model (PRL 107, 238701, 2011)
Probability that node i attracts a new link
P(i , t) ∼ ki(t)︸︷︷︸degree
× Ri(t)︸ ︷︷ ︸relevance
Relevance of every node decays with timeWhen Ri(t)→ 0, the popularity of nodes eventually saturates
Good news:Produces various realistic degree distributions (power-law, etc.)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 7 / 25
Datasets
1 Citations among papers published by the APS
2 Citations among the US patents
3 User collections of web bookmarks
4 Paper downloads from the Econophysics Forum
data description label nodes links span/resolution ∆t
APS citations APS 450k 4.7M 117 years/daily 91 daysU.S. patents PAT 3.2M 24M 31 years/yearly 1 year
web bookmarks WEB 2.3M 4.2M 4 years/daily 10 dayspaper downloads EF 600 16k 23 months/daily 10 days
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 8 / 25
Decay of relevance
0 10 20 30 40 50 6010-2
10-1
100
101
102
0 5 10 15 20 25 3010-1
100
101
0 100 200 300 400 500 60010-2
10-1
100
101
102
103
0 300 600 900 120010-1
100
101
102
APS
EF
PAT
DEL
aver
age
empir
ical
rel
evan
ce X
(t)
age t in years (top) or days (bottom)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 9 / 25
Future challenges
Other properties of the model networks:clustering, community structure,. . .
Improved statistical validationTroubles with high-dimensional statisticsEconophysics Forum data: new model is much more likely(10410-times) than the second-best model (Eom-Fortunato, 2011)
Knowledge of the dynamics can help select nodesthat are most relevant now =⇒ useful tools?
Theory says: Total relevance matters =⇒ useful tools?
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 10 / 25
Part 2Physics-motivated approach to recommendation
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 11 / 25
Recommendation challenge
There is often too much information availableToo many books to read, movies to watch, . . .How to choose?
Recommender systems analyze data on past user preferences topredict possible future likes and interests
Many approaches exist:Collaborative filteringContent-based analysisLatent semantic modelsSpectral methods. . .
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 12 / 25
Recommendation challenge
There is often too much information availableToo many books to read, movies to watch, . . .How to choose?
Recommender systems analyze data on past user preferences topredict possible future likes and interests
Many approaches exist:Collaborative filteringContent-based analysisLatent semantic modelsSpectral methods. . .
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 12 / 25
Recommendation by random walk
Two-step random walk:
0
1
0
0
1user 1
user 2
user 3
user 4
START
−→
0
1/3
5/6
5/6
STEP 1
−→
0
5/8
3/8
5/24
19/24STEP 2
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 13 / 25
Recommendation by random walk
Two-step random walk:
0
1
0
0
1user 1
user 2
user 3
user 4
START
−→
0
1/3
5/6
5/6
STEP 1
−→
0
5/8
3/8
5/24
19/24STEP 2
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 13 / 25
Recommendation by random walk
Two-step random walk:
0
1
0
0
1user 1
user 2
user 3
user 4
START
−→
0
1/3
5/6
5/6
STEP 1
−→
0
5/8
3/8
5/24
19/24STEP 2
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 13 / 25
Key insights
Random walk favors high-degree nodesThey have more ways to receive resources
By contrast, heat diffusion favors low-degree nodesIf you touch many places, your temperature is likely to be averageIf you touch a few places, you may get “lucky” and end hot
Interestingly, the two processes are matematically closely relatedTheir matrices are transpose of each other: M and MT
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 14 / 25
Key insights
Random walk favors high-degree nodesThey have more ways to receive resources
By contrast, heat diffusion favors low-degree nodesIf you touch many places, your temperature is likely to be averageIf you touch a few places, you may get “lucky” and end hot
Interestingly, the two processes are matematically closely relatedTheir matrices are transpose of each other: M and MT
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 14 / 25
Hybridization (PNAS 107, 4511, 2010)
HEATDIFF
MT
RANDWALK
M
H(λ)
0
25
50
75
accu
racy
λ
0.6
0.7
0.8
dive
rsity
RANDWALK
HEATDIFF
Result: simultaneous improvement of accuracy and diversity
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 15 / 25
Hybridization (PNAS 107, 4511, 2010)
HEATDIFF
MT
RANDWALK
M
H(λ)
0
25
50
75
accu
racy
λ
0.6
0.7
0.8
dive
rsity
RANDWALK
HEATDIFF
Result: simultaneous improvement of accuracy and diversity
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 15 / 25
Part 3
Crowd-avoidance in recommendation
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 16 / 25
The challenge
Traditional recommender systems do not care to how many usersan item gets recommended
Since they often have bias toward popularity, a small number ofwinners often emerges
This may be harmful:Bar recommended to many people becomes overcrowdedIn the long term, our information horizons shrink
bosons fermions
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 17 / 25
The challenge
Traditional recommender systems do not care to how many usersan item gets recommended
Since they often have bias toward popularity, a small number ofwinners often emerges
This may be harmful:Bar recommended to many people becomes overcrowdedIn the long term, our information horizons shrink
bosons fermions
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 17 / 25
Crowd-avoidance in recommendation(EPL 101, 20008, 2013)
Easy to achieve ex post:1 A recommendation algorithm produces a ranked list of items for
each user
2 We impose maximal occupancy m for every item
An example with m = 1
user 1
user 2 user 3
item 7X
item 3X item 7
item 3
item 4 item 2X
Two ways to enforce occupation:User-by-user (local optimization)
Minimize the total rank of chosen items (global optimization)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 18 / 25
Crowd-avoidance in recommendation(EPL 101, 20008, 2013)
Easy to achieve ex post:1 A recommendation algorithm produces a ranked list of items for
each user
2 We impose maximal occupancy m for every item
An example with m = 1
user 1 user 2
user 3
item 7X item 3X
item 7
item 3 item 4
item 2X
Two ways to enforce occupation:User-by-user (local optimization)
Minimize the total rank of chosen items (global optimization)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 18 / 25
Crowd-avoidance in recommendation(EPL 101, 20008, 2013)
Easy to achieve ex post:1 A recommendation algorithm produces a ranked list of items for
each user
2 We impose maximal occupancy m for every item
An example with m = 1
user 1 user 2 user 3item 7X item 3X item 7item 3 item 4 item 2X
Two ways to enforce occupation:User-by-user (local optimization)
Minimize the total rank of chosen items (global optimization)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 18 / 25
Crowd-avoidance in recommendation(EPL 101, 20008, 2013)
Easy to achieve ex post:1 A recommendation algorithm produces a ranked list of items for
each user
2 We impose maximal occupancy m for every item
An example with m = 1
user 1 user 2 user 3item 7X item 3X item 7item 3 item 4 item 2X
Two ways to enforce occupation:User-by-user (local optimization)
Minimize the total rank of chosen items (global optimization)
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 18 / 25
Test case
Netflix subset with 2000 users
One object recommended to each user (m = 1)
There is no reason for real crowd avoidance in DVD rentals
Country-production data would be a better test candidate but. . .
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 19 / 25
Test case
Netflix subset with 2000 users
One object recommended to each user (m = 1)
There is no reason for real crowd avoidance in DVD rentals
Country-production data would be a better test candidate but. . .
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 19 / 25
Crowd-avoidance enhances diversity
neff =(∑
α
kα)2/∑
α
k2α
1 10 100 1000maximal occupancy m
101
102
103
effe
ctiv
e num
ber
of
item
s n
eff
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 20 / 25
Crowd-avoidance enhances accuracy
100
101
102
103
maximal occupancy m
0.00
0.10
0.20
0.30pre
cisi
on P
(m)
local optimizationglobal optimization
+34%
+11%
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 21 / 25
Is it because our recommendations are wrong?
1 10 100 1000rank r
0.00
0.05
0.10
0.15p
reci
sio
n P
(r)
1 10 100 1000rank r
0.00
0.05
0.10
0.15
pre
cisi
on
P(r
)
No
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 22 / 25
Is it because our recommendations are wrong?
1 10 100 1000rank r
0.00
0.05
0.10
0.15
pre
cisi
on
P(r
)
1 10 100 1000rank r
0.00
0.05
0.10
0.15p
reci
sio
n P
(r)
No
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 22 / 25
The (probably) correct explanation
Recommendation algorithms are often biasedTypically towards popular items but less tangible reasons exist too
Hypothesis: crowd-avoidance suppresses these biases
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without bias
ARTIFICIAL DATA
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without biaswith bias
ARTIFICIAL DATA
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 23 / 25
The (probably) correct explanation
Recommendation algorithms are often biasedTypically towards popular items but less tangible reasons exist too
Hypothesis: crowd-avoidance suppresses these biases
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without bias
ARTIFICIAL DATA
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without biaswith bias
ARTIFICIAL DATA
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 23 / 25
The (probably) correct explanation
Recommendation algorithms are often biasedTypically towards popular items but less tangible reasons exist too
Hypothesis: crowd-avoidance suppresses these biases
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without bias
ARTIFICIAL DATA
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without biaswith bias
ARTIFICIAL DATA
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 23 / 25
The (probably) correct explanation
Recommendation algorithms are often biasedTypically towards popular items but less tangible reasons exist too
Hypothesis: crowd-avoidance suppresses these biases
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without bias
ARTIFICIAL DATA
0 2 4 6 8 10maximal occupancy m
0.22
0.24
0.26
0.28
pre
cisi
on
P(m
)
without biaswith bias
ARTIFICIAL DATA
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 23 / 25
Crowd-avoidance: Summary
Crowd-avoidance can improve both accuracy and diversity ofrecommendation
A rare case where constraints improve the outcome
To do
How best to introduce occupancy constraints?Constraints heterogeneous over objectsApproaches to global optimization
Study real data with crowd avoidanceCountry-production data
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 24 / 25
Crowd-avoidance: Summary
Crowd-avoidance can improve both accuracy and diversity ofrecommendation
A rare case where constraints improve the outcome
To do
How best to introduce occupancy constraints?Constraints heterogeneous over objectsApproaches to global optimization
Study real data with crowd avoidanceCountry-production data
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 24 / 25
Thank you for your attention!
Questions?
Matúš Medo (Uni Fribourg) Complexity insights into information filtering 25 / 25