PageRank Increase under Different Collusion Topologies (AIRWEB 2005)
Pagerank Increase under Collusion
R. Baeza-Yates, C. Castillo and V. Lopez
Outline
Introduction
Collusion and Pagerank
Experiments in a synthetic graph
Experiments in a real Web graph
Conclusions
Pagerank Increase under Different Collusion Topologies
Ricardo Baeza-Yates, Carlos Castillo and Vicente Lopez
ICREA Professor / Dept. of Technology / Catedra Telefonica, Universitat Pompeu Fabra – Barcelona, Spain
May 10th, 2005
1 Introduction
2 Collusion and Pagerank
3 Experiments in a synthetic graph
4 Experiments in a real Web graph
5 Conclusions
Goal
Study collusion
Nepotistic linking in a Web graph
This can be done by bad sites (spam) but also good sites
Colluding groups could use different topologies
Colluding groups could have different original rankings
How much would their ranking increase if ... ?
Framework
We use Pagerank as the ranking function [Page et al., 1998]
Pagerank
Let $L$ be the $N \times N$ row-wise normalized link matrix
Let $U$ be a matrix such that $U_{i,j} = 1/N$
Let $P = (1 - \varepsilon) L + \varepsilon U$
Pagerank scores are given by $v$ such that $P^T v = v$
Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a “reset” probability of $\varepsilon \approx 0.15$.
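As a minimal sketch of the definition above (not the authors' implementation), the following power iteration looks for v with P^T v = v on a tiny graph. The function name `pagerank`, the adjacency-list input, the toy graph, and the treatment of pages without out-links (assumed to link to every page) are all assumptions made for illustration.

```python
def pagerank(out_links, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for P = (1 - eps) L + eps U (hypothetical helper).

    out_links[i] lists the pages that page i links to. Pages with no
    out-links are treated as linking to every page (an assumption).
    """
    n = len(out_links)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        # Random-jump term: eps * U^T v equals eps / n because v sums to 1.
        new_v = [eps / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                # Link-following term: page i splits (1 - eps) * v[i] evenly.
                share = (1.0 - eps) * v[i] / len(targets)
                for j in targets:
                    new_v[j] += share
            else:
                for j in range(n):
                    new_v[j] += (1.0 - eps) * v[i] / n
        if sum(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

# Toy graph: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1
scores = pagerank([[1], [2], [0, 1]])
print(scores)
```

The scores sum to 1, so they can be read directly as the visiting probabilities of the random browsing process.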
Gain from collusion
Maximum gain [Zhang et al., 2004]:
$\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\varepsilon}$
As ε ≈ 0.15, maximum gain ≈ 7.
First task: improve this bound.
Impact of collusion in Pagerank
[Figure: the Web graph G with N pages, split into the colluding set G' of M pages and the remaining N − M pages.]
Grouping nodes for Pagerank calculation
Links for the Pagerank calculation can be “lumped” together [Clausen, 2004]:
[Figure: the M pages and the N − M pages shown as two groups, with random jumps.]
Links for Pagerank calculation
$\text{Pagerank}_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$
[Figure: M nodes with total Pagerank x and N − M nodes with total Pagerank 1 − x; flows labeled P_in, P_jump and P_self.]
Pagerank calculation: random jumps
There are N nodes in total, M in the colluding set:
$P_{jump} = \varepsilon (M/N)$
Pagerank calculation: incoming links
$$P_{in} = \sum_{(a,b) \in E_{in}} \frac{\text{Pagerank}(a)}{\deg(a)} = \sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)$$
Where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes.
$$p = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{\sum_{a \in G - G'} \text{Pagerank}(a)} = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{1 - x}$$
$p$ is a weighted average of $p(a)$; it reflects how “important” pages in the colluding set are.
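The weighted average p can be illustrated on a toy graph. This is only a sketch: the helper name `weighted_in_fraction`, the four-page graph, and the score values (made-up numbers that sum to 1, not computed Pagerank) are assumptions chosen to keep the arithmetic easy to follow.

```python
def weighted_in_fraction(out_links, scores, colluding):
    """Pagerank-weighted average of p(a) over pages a outside the colluding set."""
    colluding = set(colluding)
    numerator = 0.0    # sum of Pagerank(a) * p(a) over outside pages
    denominator = 0.0  # sum of Pagerank(a) over outside pages, i.e. 1 - x
    for a, targets in enumerate(out_links):
        if a in colluding:
            continue
        denominator += scores[a]
        if targets:
            # p(a): fraction of a's links that point into the colluding set
            p_a = sum(1 for b in targets if b in colluding) / len(targets)
            numerator += scores[a] * p_a
    return numerator / denominator

# Hypothetical graph: page 0 is the colluding set.
out_links = [[1, 2], [2], [0], [0, 1]]
scores = [0.4, 0.3, 0.2, 0.1]   # illustrative values, not real Pagerank
p = weighted_in_fraction(out_links, scores, colluding={0})
print(p)
```

Here p(1) = 0, p(2) = 1 and p(3) = 1/2, so the numerator is 0.2 + 0.05 = 0.25 and the denominator is 1 − x = 0.6.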
Pagerank calculation: incoming and self links
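Pagerank flows along links with probability $1 - \varepsilon$, so $P_{in}$ can be rewritten as:
$P_{in} = (1 - \varepsilon)(1 - x)\,p$
Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links (links that stay inside the colluding set) of each page in the colluding set, and write:
$P_{self} = (1 - \varepsilon)\,x\,s$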
Pagerank calculation summary
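[Figure: M nodes with Pagerank x and N − M nodes with Pagerank 1 − x, with the flows:]
$P_{in} = (1 - \varepsilon)(1 - x)\,p$
$P_{jump} = \varepsilon (M/N)$
$P_{self} = (1 - \varepsilon)\,x\,s$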
Solving
Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields:
$$x_{normal} = \frac{\varepsilon \frac{M}{N} + (1 - \varepsilon)\,p}{(p - s)(1 - \varepsilon) + 1}$$
What happens when colluding?
Colluding means pointing more links to the inside
This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$
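For readers who want the intermediate algebra, the stationary solution can be worked out step by step from the three flow terms defined on the earlier slides. This is a sketch of the derivation that uses only those symbols.

```latex
\begin{align*}
x &= \varepsilon \frac{M}{N} + (1-\varepsilon)(1-x)\,p + (1-\varepsilon)\,x\,s \\
x + (1-\varepsilon)\,x\,p - (1-\varepsilon)\,x\,s &= \varepsilon \frac{M}{N} + (1-\varepsilon)\,p \\
x \left[ (p - s)(1-\varepsilon) + 1 \right] &= \varepsilon \frac{M}{N} + (1-\varepsilon)\,p \\
x &= \frac{\varepsilon \frac{M}{N} + (1-\varepsilon)\,p}{(p - s)(1-\varepsilon) + 1}
\end{align*}
```

Replacing s by s' in the last line gives the colluding value of x; only the denominator changes.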
Pagerank increase due to collusion
Making $s' = 1$, all links from the colluding set go inside now:
$$\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\varepsilon}{1 - \varepsilon}}$$
Making $s = p$, originally the set was not colluding:
$$\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1 - \varepsilon) + \varepsilon}$$
This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set.
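The two ratio formulas can be checked numerically against the stationary solution. The sketch below is illustrative only: the function names and the value M/N = 0.001 are assumptions, and the loop simply evaluates the formulas for a few values of p.

```python
def stationary_x(p, s, m_over_n, eps=0.15):
    """Stationary Pagerank x of the colluding set, from the solved equation."""
    return (eps * m_over_n + (1 - eps) * p) / ((p - s) * (1 - eps) + 1)

def gain_full_collusion(p, s, eps=0.15):
    """x_colluding / x_normal when s' = 1 (all links of the set go inside)."""
    return 1 + (1 - s) / (p + eps / (1 - eps))

eps = 0.15
m_over_n = 0.001  # hypothetical relative size of the colluding set
for p in (0.001, 0.01, 0.1, 1.0):
    # Case s = p: the set was not colluding originally.
    direct = stationary_x(p, 1.0, m_over_n, eps) / stationary_x(p, p, m_over_n, eps)
    closed = gain_full_collusion(p, p, eps)
    print(f"p={p:<6} gain={closed:.3f} (direct ratio {direct:.3f})")
```

The ratio does not depend on M/N because the numerator of the stationary solution is the same before and after collusion. As p approaches 0, the gain approaches 1/ε.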
Expected Pagerank change
$x_{colluding}/x_{normal}$ as a function of $p$
[Figure: maximum Pagerank change (y axis, 1 to 7) against the weighted average of the fraction of links to colluding nodes (x axis, log scale from 10^-3 to 10^0), with a reference line at 1/ε.]
Experiments in a synthetic graph
Created using the generative model [Kumar et al., 2000]
Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]
100,000-node scale-free graph
Sampling by Pagerank
Divided in deciles, each decile has 1/10th of the Pagerank
Picked a group of 100 nodes at random from each decile
Group 1 are low-ranked nodes, group 10 are high-ranked nodes
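One way to read the sampling step is sketched below: nodes are sorted by Pagerank and cut into ten groups that each hold about one tenth of the total Pagerank mass, then up to 100 nodes are drawn from each group. The function names, the boundary rule for assigning a node to a group, the seed, and the synthetic scores are assumptions; the slides do not give these details.

```python
import random

def pagerank_deciles(scores, groups=10):
    """Split node ids into groups holding roughly equal shares of total Pagerank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # low to high
    total = sum(scores)
    buckets = [[] for _ in range(groups)]
    cumulative = 0.0
    for node in order:
        # Group index from the Pagerank mass accumulated before this node.
        idx = min(int(groups * cumulative / total), groups - 1)
        buckets[idx].append(node)
        cumulative += scores[node]
    return buckets

def sample_groups(buckets, size=100, seed=0):
    """Draw up to `size` nodes uniformly at random from each group."""
    rng = random.Random(seed)
    return [rng.sample(b, min(size, len(b))) for b in buckets]

# Illustrative heavy-tailed scores, not the output of a real Pagerank run.
scores = [(i + 1) ** -0.5 for i in range(10000)]
buckets = pagerank_deciles(scores)
groups = sample_groups(buckets)
print([len(b) for b in buckets])
```

Because the scores are heavy-tailed, the low-ranked groups contain many more nodes than the high-ranked ones even though each group carries a similar share of Pagerank.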
Original Pagerank of the nodes
These are the original Pagerank values for each group
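[Figure: Pagerank values (y axis, log scale from 10^-6 to 10^-2) for groups 1 to 10 (x axis), showing the average per group; the low groups are labeled "Originally very bad" and the high groups "Originally very good".]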
Modified Pagerank of the nodes
These are the modified Pagerank values when colluding.
[Figure: Pagerank values (y axis, log scale from 10^-6 to 10^-2) for groups 1 to 10 (x axis); series: Original, Clique.]
Distribution of Pagerank
But Pagerank values follow a power law distribution ...
[Figure: frequency (y axis, log scale from 10^-5 to 10^0) against Pagerank value (x axis, log scale from 10^-6 to 10^-2), with a reference line $x^{-2.1}$.]
Modified Pagerank position of the nodes
These are the modified Pagerank positions (rankings) when colluding.
[Figure: Pagerank ranking (y axis, 0.0 to 1.0) for groups 1 to 10 (x axis); series: Original, Clique.]
Variation of Pagerank when colluding
These are the ratios $x_{colluding}/x_{original}$
[Figure: new value / original value (y axis, 0 to 7) for groups 1 to 10 (x axis), with a reference line at 1/ε; series: Change in Pagerank value, Change in ranking.]
It is not necessary to create a clique
Spammers can use a fraction of the links to try to avoid detection
[Figure: new Pagerank / original Pagerank (y axis, 1 to 7) for groups 1 to 10 (x axis), with a reference line at 1/ε; series: full clique, then 95% down to 05% in steps of 5%.]
In the paper, other topologies: star and ring
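A partial clique can be sketched as a random subset of the links of the full clique. How the percentages in the plot were actually realized is not stated on the slide, so the uniform random choice, the function name `partial_clique_edges`, and the seed are assumptions.

```python
import random
from itertools import permutations

def partial_clique_edges(group, fraction, seed=0):
    """Pick a random fraction of the directed links of a full clique on `group`."""
    all_edges = list(permutations(group, 2))  # every ordered pair (a, b), a != b
    k = round(fraction * len(all_edges))
    return random.Random(seed).sample(all_edges, k)

# Hypothetical group of 10 nodes using half of the possible clique links.
edges = partial_clique_edges(range(10), 0.5)
print(len(edges))
```

With `fraction=1.0` the function returns every link of the full clique, in random order.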
Experiments in a real Web graph
Hostgraph of 310,486 Websites from Spain
[Figure: frequency (y axis, log scale from 10^-6 to 10^0) against Pagerank value (x axis, log scale from 10^-6 to 10^-2), with a reference line $x^{-2.1}$.]
Experiments in a real Web graph
Some of the nodes are already colluding [Fetterly et al., 2004]
We study a set of Web sites
We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links.
Disconnect the group
Create a ring
Add a central page linking to all of them
Add a central page linking to and from all of them (star)
Fully connect the group (clique)
Now we measure the new ranking
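The link modifications listed above can be sketched as edge-list builders. The function names and the `"hub"` identifier are hypothetical, and whether the experiment added these links to the existing ones or replaced them is not stated on the slide, so the sketch only generates the links of each topology.

```python
def disconnect(edges, group):
    """Drop every link whose source and target are both in the group."""
    members = set(group)
    return [(a, b) for a, b in edges if not (a in members and b in members)]

def ring_edges(group):
    """Each node links to the next one, and the last links back to the first."""
    g = list(group)
    return [(g[i], g[(i + 1) % len(g)]) for i in range(len(g))]

def central_page_edges(group, hub):
    """A central page linking to all nodes of the group."""
    return [(hub, a) for a in group]

def star_edges(group, hub):
    """A central page linking to and from all nodes of the group."""
    return [(hub, a) for a in group] + [(a, hub) for a in group]

def clique_edges(group):
    """Every node links to every other node of the group."""
    g = list(group)
    return [(a, b) for a in g for b in g if a != b]

group = list(range(242))   # 242 sites, as in the experiment
hub = "hub"                # hypothetical identifier for the added central page
for name, edges in [("ring", ring_edges(group)),
                    ("central page", central_page_edges(group, hub)),
                    ("star", star_edges(group, hub)),
                    ("clique", clique_edges(group))]:
    print(name, len(edges))
```

The clique is by far the most link-intensive option: it needs M(M − 1) links, while the ring needs M and the star 2M.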
![Page 46: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)](https://reader033.vdocuments.us/reader033/viewer/2022052823/555092acb4c9051e5b8b5233/html5/thumbnails/46.jpg)
PagerankIncrease under
Collusion
R. Baeza-Yates,C. Castillo and
V. Lopez
Outline
Introduction
Collusion andPagerank
Experiments in asynthetic graph
Experiments in areal Web graph
Conclusions
We study a set of Web sites
We picked a set of 242 reputable sites with rank ≈ 0.75, andwe modify their links.
Disconnect the group
Create a ring
Add a central page linking to all of them
Add a central page linking to and from all of them (star)
Fully connect the group (clique)
Now we measure the new ranking
![Page 47: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)](https://reader033.vdocuments.us/reader033/viewer/2022052823/555092acb4c9051e5b8b5233/html5/thumbnails/47.jpg)
PagerankIncrease under
Collusion
R. Baeza-Yates,C. Castillo and
V. Lopez
Outline
Introduction
Collusion andPagerank
Experiments in asynthetic graph
Experiments in areal Web graph
Conclusions
We study a set of Web sites
We picked a set of 242 reputable sites with rank ≈ 0.75, andwe modify their links.
Disconnect the group
Create a ring
Add a central page linking to all of them
Add a central page linking to and from all of them (star)
Fully connect the group (clique)
Now we measure the new ranking
New rankings under graph modifications
[Figure: rankings (y-axis, 0 to 1) for each strategy (x-axis): Disconnected, Normal, Central, Ring, Inv. Ring, Star, Clique]
Adding 5%-50% of complete subgraph
[Figure: average ranking (y-axis "Rankings", 0.980 to 1.000) against the percent of links of a complete subgraph (x-axis, 0% to 55%)]
The best sites also increase their ranking
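Adding only part of a complete subgraph can be sketched as follows. This is a hedged illustration, not the procedure used for the plot: the function name `add_clique_fraction`, the uniform random choice of links, and the fixed seed are assumptions.

```python
import itertools
import random


def add_clique_fraction(out_links, group, fraction, seed=0):
    """Add a random `fraction` of the links of a complete subgraph on `group`.

    `out_links` is a dict {node: set of successors}; a modified copy is returned.
    """
    g = {v: set(ws) for v, ws in out_links.items()}
    # A complete directed subgraph on k nodes has k * (k - 1) links.
    all_pairs = list(itertools.permutations(group, 2))
    rng = random.Random(seed)
    chosen = rng.sample(all_pairs, round(fraction * len(all_pairs)))
    for src, dst in chosen:
        # Links that already exist are left unchanged by the set insert.
        g[src].add(dst)
    return g


# Hypothetical toy data: 10 isolated nodes, the first 5 collude.
empty = {v: set() for v in range(10)}
colluding = list(range(5))
for pct in range(5, 55, 5):
    g = add_clique_fraction(empty, colluding, pct / 100)
    inside = sum(len(g[v]) for v in colluding)
    print(f"{pct}% -> {inside} links inside the group")
```

Sampling links uniformly produces a randomized structure with no regular pattern such as a ring or star.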
Conclusions
✓ Any group of nodes can increase its Pagerank
✓ Nodes with high Pagerank gain less by colluding
Ideas for link spam detection
✗ Only detecting regularities can fail to detect randomized structures
✗ Only detecting nepotistic links can give false positives
✓ Use evidence from multiple sources
Thank you
Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia.
Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The PageRank citation ranking: bringing order to the Web. Technical report, Stanford Digital Library Technologies Project.
Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–339, Singapore. Springer.
Zhang, H., Goel, A., Govindan, R., Mason, K., and Van Roy, B. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.