![Page 1: Poster session - Stanford Universitysnap.stanford.edu/na09/19-pagerank-annot.pdfPoster session: Thursday December 10 3‐6 pm Gates Atrium We will provide poster boards 30% of project](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f2005da06a75f1eb121a1d1/html5/thumbnails/1.jpg)
Poster session:
- Thursday, December 10, 3‐6 pm, Gates Atrium
- We will provide poster boards
- 30% of project grade
Project writeup:
- Due Friday, December 11
- PDF by email to the course staff list
- Max 6, min 4 pages in ACM format
- More info on the website
- 70% of project grade
12/1/2009 Jure Leskovec, Stanford CS322: Network Analysis 1
Received 15 entries
Top score:
- RPL: 351,944
- GNP: 1,150,563 (5 got the OPT)
Top 5:

| Name | Score |
|------|-------|
| Shayan_Oveis_Gharan | 1,502,507 |
| Farnaz_Ronaghi_Khameneh | ‐2 |
| Ying_Wang | ‐11 |
| Abhijeet_Mohapatra | ‐92 |
| Nipun_Dave | ‐162 |
Idea: combine
- Min‐cut on positive edges: 2nd smallest eigenvector x of the Laplacian
- Max‐cut on negative edges: largest eigenvector y of the normalized Laplacian
So each node gets 2 scores (positions): a min‐cut score and a max‐cut score
Now simply partition the nodes:
- GNP (6 edges from best solution): 1,150,557
- RPL: 342,021 (351,939 after local updates)
CS 322: (Social and Information) Network Analysis
Jure Leskovec
Stanford University
Many, many documents
How to organize/navigate them?
First try: Web directories
- Yahoo, DMOZ, LookSmart
Second try: Search
Started in 1960s
Find relevant items in a repository, often a small and trusted set:
- Newspaper articles
- Patents, etc.
Two traditional problems:
- Synonymy: buy and purchase, sick and ill
- Polysemy: Jaguar
Does a bigger index mean better results?
What is the “best” answer to the query “Stanford”?
- Anchor text: I go to Stanford where I study
What about the query “newspaper”?
- Not a single right answer
Scarcity (IR) vs. abundance (Web)
- Many sources of info: who to “trust”?
Trick: pages that actually know about newspapers might all be pointing to many newspapers
Ranking!
Goal (back to the newspaper example):
- Don’t just find newspapers but also find “experts”: people who link in a coordinated way to many good newspapers
Idea: link voting
- Quality as an expert (hub): total sum of votes of the pages pointed to
- Quality as content (authority): total sum of votes of the experts
[Figure: example vote counts. NYT: 10, Ebay: 3, Yahoo: 3, CNN: 8, WSJ: 9]
Principle of repeated improvement
Each page $i$ has 2 kinds of scores:
- Hub score: $h_i$
- Authority score: $a_i$
Algorithm:
- Initialize: $a_i = h_i = 1$
- Then keep iterating:
  - Authority: $a_j = \sum_{i \to j} h_i$
  - Hub: $h_i = \sum_{i \to j} a_j$
  - Normalize: $\sum_i a_i = 1$, $\sum_i h_i = 1$
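The hub/authority updates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference code: the function name `hits`, the edge-list input format, and the toy graph are all made up for the example. Each score vector is rescaled to sum to 1, as in the normalization step.

```python
def hits(edges, n, iterations=50):
    """Iterate hub/authority updates on a directed graph with nodes 0..n-1."""
    a = [1.0] * n  # authority scores
    h = [1.0] * n  # hub scores
    for _ in range(iterations):
        # Authority of j: sum of hub scores of the pages i that point to j.
        new_a = [0.0] * n
        for i, j in edges:
            new_a[j] += h[i]
        # Hub of i: sum of the (new) authority scores of the pages j that i points to.
        new_h = [0.0] * n
        for i, j in edges:
            new_h[i] += new_a[j]
        # Normalize so that each score vector sums to 1.
        total_a, total_h = sum(new_a), sum(new_h)
        a = [x / total_a for x in new_a]
        h = [x / total_h for x in new_h]
    return a, h


# Hypothetical toy graph: pages 0 and 1 act as "experts" linking to pages 2 and 3;
# page 1 additionally links to page 4.
toy_edges = [(0, 2), (0, 3), (1, 2), (1, 3), (1, 4)]
authority, hub = hits(toy_edges, n=5)
```

In this toy graph, pages 2 and 3 end up with equal authority (both experts vote for them), page 4 gets less, and page 1 is the better hub because it points to everything page 0 does and more.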
This will converge to a single stable point
Slightly change the notation:
- Vectors $a = (a_1, \dots, a_n)$, $h = (h_1, \dots, h_n)$
- Adjacency matrix ($n \times n$): $M_{ij} = 1$ if $i \to j$
Then: $h_i = \sum_{i \to j} a_j = \sum_j M_{ij} a_j$
So: $h = M a$
And likewise: $a = M^T h$
Algorithm in new notation:
- Set: $a = h = 1^n$ (all-ones vector of length $n$)
- Repeat: $h = M a$, $a = M^T h$, then normalize
Then: $a = M^T (M a)$, where the inner $M a$ is the new $h$ and the result is the new $a$
- $a$ is updated (in 2 steps): $M^T (M a) = (M^T M) a$
- $h$ is updated (in 2 steps): $M (M^T h) = (M M^T) h$
Thus, in $2k$ steps:
- $a = (M^T M)^k a$
- $h = (M M^T)^k h$
Repeated matrix powering
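A quick numerical check of the two-step identity, as a sketch: the 3-node adjacency matrix below is hypothetical, and the helpers `matvec`, `transpose` and `matmul` are written out only to keep the example dependency-free.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = transpose(B)
    return [[sum(p * q for p, q in zip(row, col)) for col in Bt] for row in A]


# Hypothetical adjacency matrix: M[i][j] = 1 if node i links to node j.
M = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
Mt = transpose(M)

a = [1, 1, 1]
two_step = matvec(Mt, matvec(M, a))   # h = M a, then a = M^T h
one_shot = matvec(matmul(Mt, M), a)   # (M^T M) a in a single multiplication
```

Both routes give the same vector, and `matmul(Mt, M)` equals its own transpose, which is the symmetry the eigenvector argument relies on.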
Definition:
- Let $A x = \lambda x$ for some scalar $\lambda$, vector $x$ and matrix $A$
- Then $x$ is an eigenvector, and $\lambda$ is its eigenvalue
Fact:
- If $A$ is symmetric ($A_{ij} = A_{ji}$) (note: in our case $M^T M$ and $M M^T$ are symmetric)
- Then $A$ has $n$ orthogonal unit eigenvectors $w_1, \dots, w_n$ that form a basis (coordinate system), with eigenvalues $\lambda_1, \dots, \lambda_n$ ($|\lambda_i| \ge |\lambda_{i+1}|$)
Write $x$ in the coordinate system $w_1, \dots, w_n$: $x = \sum_i \alpha_i w_i$
- $x$ has coordinates $(\alpha_1, \dots, \alpha_n)$
Suppose: $\lambda_1, \dots, \lambda_n$ with $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$
- $A^k x$ has coordinates $(\lambda_1^k \alpha_1, \lambda_2^k \alpha_2, \dots, \lambda_n^k \alpha_n)$, i.e., $A^k x = \sum_i \lambda_i^k \alpha_i w_i$
As $k \to \infty$, if we normalize, $A^k x$ aligns with $w_1$: dividing by $\lambda_1^k$ gives $A^k x / \lambda_1^k \to \alpha_1 w_1$ (all other coordinates $\to 0$, provided $\alpha_1 \ne 0$)
So authority $a$ is the eigenvector of $M^T M$ associated with the largest eigenvalue $\lambda_1$ (need $|\lambda_1| > |\lambda_2|$)
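The convergence argument can be tried on a tiny symmetric matrix. This is a sketch only: the matrix, the starting vector, and the function name `power_iteration` are assumptions chosen so that the answer is known in advance (eigenvalues 3 and 1, with unit eigenvector $(1, 1)/\sqrt{2}$ for the eigenvalue 3).

```python
import math


def power_iteration(A, x, steps=100):
    """Repeatedly apply A and rescale the result to unit length."""
    for _ in range(steps):
        y = [sum(a * v for a, v in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


# Symmetric example with eigenvalues 3 and 1.
A = [[2.0, 1.0],
     [1.0, 2.0]]
w = power_iteration(A, [1.0, 0.0])

# Since w has unit length, w^T A w estimates the largest eigenvalue.
Aw = [sum(a * v for a, v in zip(row, w)) for row in A]
eigenvalue = sum(p * q for p, q in zip(w, Aw))
```

The starting vector $(1, 0)$ has a nonzero coordinate along $w_1$, and the gap between 3 and 1 means the second coordinate shrinks by a factor of 3 per step.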
A vote from an important page is worth more
A page is important if it is pointed to by other important pages
Define a “rank” $r_j$ for node $j$
- $r_j$ should be proportional to: $\sum_{i \to j} \frac{r_i}{\text{outdegree of } i}$
$r_j$ … probability I’m currently at $j$ in a random walk: $r_j = \sum_i \Pr[\text{at } i] \cdot \Pr[i \to j]$
But $r_j = \sum_{i \to j} r_i / (\text{out‐degree of } i)$ is the prob. of being at $j$ after one step of a random walk
Define: $N_{ij} = M_{ij} / d_i$ (so $N_{ij} = 1/d_i$ when $i$ links to $j$)
- $M_{ij} = 1$ if node $i$ links to $j$
- out‐degree of $i$ is $d_i$
$N_{ij}$ is the prob. we will be at $j$ if we are currently at $i$
Then in the limit: $r = N^T r$, i.e., $r$ is the principal eigenvector of $N^T$
Power iteration:
- Set $r_i = 1$
- $r_j = \sum_{i \to j} r_i / d_i$
- And iterate
[Figure: example graph on nodes Y!, A, MS]

|    | Y! | A | MS |
|----|----|---|----|
| Y! | ½ | ½ | 0 |
| A  | ½ | 0 | 1 |
| MS | 0 | ½ | 0 |

Example:
y = 1, 1, 5/4, 9/8, …, 6/5
a = 1, 3/2, 1, 11/8, …, 6/5
m = 1, 1/2, 3/4, 1/2, …, 3/5
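The example numbers can be replayed exactly with rational arithmetic. A minimal sketch, assuming the table is read with column $i$ holding the out-link probabilities of page $i$ (the reading that reproduces the listed values); the names `P`, `step`, `history` and `limit` are made up for the example.

```python
from fractions import Fraction as F

# Rows and columns in the order Y!, A, MS.
# Column i holds the out-link probabilities of page i.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(1)],
     [F(0),    F(1, 2), F(0)]]


def step(P, r):
    """One power-iteration step: r_j = sum_i P[j][i] * r_i."""
    return [sum(p * x for p, x in zip(row, r)) for row in P]


r = [F(1), F(1), F(1)]
history = [r]
for _ in range(3):
    r = step(P, r)
    history.append(r)

# Keep iterating to approach the limiting values.
for _ in range(200):
    r = step(P, r)
limit = [float(x) for x in r]
```

`history` holds the first four columns of the example, and `limit` approaches (6/5, 6/5, 3/5).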
Some pages are “dead ends” (have no out‐links)
- Such pages cause importance to leak out
Spider traps (all out‐links are within the group)
- Eventually spider traps absorb all importance
Power iteration:
- Set $r_i = 1$
- $r_j = \sum_{i \to j} r_i / d_i$
- And iterate
[Figure: example graph on nodes Y!, A, MS, where MS is a dead end]

|    | Y! | A | MS |
|----|----|---|----|
| Y! | ½ | ½ | 0 |
| A  | ½ | 0 | 0 |
| MS | 0 | ½ | 0 |

Example:
y = 1, 1, 3/4, 5/8, …, 0
a = 1, 1/2, 1/2, 3/8, …, 0
m = 1, 1/2, 1/4, 1/4, …, 0
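The leak caused by the dead end shows up in the total importance after each step. A small sketch with exact fractions; the variable names are assumptions, and the matrix is the dead-end table read with column $i$ as the out-links of page $i$.

```python
from fractions import Fraction as F

# Order Y!, A, MS. The MS column is all zeros: MS has no out-links.
P_dead_end = [[F(1, 2), F(1, 2), F(0)],
              [F(1, 2), F(0),    F(0)],
              [F(0),    F(1, 2), F(0)]]

r = [F(1), F(1), F(1)]
totals = [sum(r)]  # total importance before any step
for _ in range(3):
    r = [sum(p * x for p, x in zip(row, r)) for row in P_dead_end]
    totals.append(sum(r))
```

Each step discards whatever rank was sitting on MS, so the totals shrink (3, 2, 3/2, 5/4, …) and every score drifts toward 0.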
Power iteration:
- Set $r_i = 1$
- $r_j = \sum_{i \to j} r_i / d_i$
- And iterate
[Figure: example graph on nodes Y!, A, MS, where MS is a spider trap]

|    | Y! | A | MS |
|----|----|---|----|
| Y! | ½ | ½ | 0 |
| A  | ½ | 0 | 0 |
| MS | 0 | ½ | 1 |

Example:
y = 1, 1, 3/4, 5/8, …, 0
a = 1, 1/2, 1/2, 3/8, …, 0
m = 1, 3/2, 7/4, 2, …, 3
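The spider-trap behaviour can be replayed the same way. Again a sketch with made-up names (`P_trap`, `step`, `ms_share`), reading column $i$ of the table as the out-links of page $i$.

```python
from fractions import Fraction as F

# Order Y!, A, MS. MS links only to itself, so it traps the walker.
P_trap = [[F(1, 2), F(1, 2), F(0)],
          [F(1, 2), F(0),    F(0)],
          [F(0),    F(1, 2), F(1)]]


def step(P, r):
    """One power-iteration step: r_j = sum_i P[j][i] * r_i."""
    return [sum(p * x for p, x in zip(row, r)) for row in P]


r = [F(1), F(1), F(1)]
for _ in range(3):
    r = step(P_trap, r)
after_three = r  # fourth column of the example

for _ in range(200):
    r = step(P_trap, r)
ms_share = float(r[2] / sum(r))  # fraction of all importance held by MS
```

Unlike the dead-end case, the total stays at 3, but essentially all of it ends up on MS.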
“Tax” each page by $\beta$ at each iteration
Add a fixed constant to all pages
Models a random walk with a fixed probability $\beta$ of jumping to a random page
We really want: $r_j = (1-\beta) \sum_{i \to j} r_i / d_i + \beta$
- $d_i$ … out‐degree of node $i$
Random walk that follows a link with prob. $1-\beta$ and randomly jumps with prob. $\beta$
PageRank as a principal eigenvector:
- $r = N^T r$, i.e., $r_j = \sum_{i \to j} r_i / d_i$
- $d_i$ … out‐degree of node $i$
But we really want:
- $r_j = (1-\beta) \sum_{i \to j} r_i / d_i + \frac{\beta}{n} \sum_i r_i$ (the last term equals $\beta$ when the ranks sum to $n$, as with $r_i = 1$ initially)
Define: $N'_{ij} = (1-\beta) N_{ij} + \beta \cdot 1/n$
Then: $r = N'^T r$
What is $\beta$? In practice $\beta = 0.15$ (about 5 links, then a jump)
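A sketch of the taxed iteration, applied to a spider-trap graph (Y! links to Y! and A, A links to Y! and MS, MS links only to itself). The symbol `beta` stands for the jump probability, and the function name `pagerank`, the matrix layout (column $i$ = out-links of page $i$) and the fixed number of steps are assumptions made for the illustration.

```python
def pagerank(P, beta=0.15, steps=200):
    """Iterate r_j = (1 - beta) * sum_i P[j][i] * r_i + (beta / n) * sum_i r_i."""
    n = len(P)
    r = [1.0] * n
    for _ in range(steps):
        total = sum(r)
        # Follow a link with prob. 1 - beta, jump to a uniformly random page with prob. beta.
        r = [(1 - beta) * sum(p * x for p, x in zip(row, r)) + beta * total / n
             for row in P]
    return r


# Spider-trap graph, order Y!, A, MS: MS links only to itself.
P_trap = [[0.5, 0.5, 0.0],
          [0.5, 0.0, 0.0],
          [0.0, 0.5, 1.0]]
r = pagerank(P_trap)
```

With the jump term, MS still collects the largest score, but Y! and A keep a strictly positive share instead of decaying to 0.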
Topic‐specific PageRank
Goal: evaluate pages not just by popularity but by how close they are to the topic
Walker has a small teleporting probability $\beta$
Teleporting can go to:
- Any page with equal probability (we used this so far)
- A topic‐specific set of “relevant” pages
Topic‐specific (personalized) PageRank:
- $N'_{ij} = (1-\beta) N_{ij} + \beta c_j$ (where $c$ is a vector)
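A sketch of the personalized variant, where the jump lands on page $j$ with probability $c_j$ instead of $1/n$. The function name `personalized_pagerank`, the choice of topic set (only MS), and the graph (Y! links to Y! and A, A links to Y! and MS, MS links to A) are assumptions made for the illustration; `c` is taken to sum to 1.

```python
def personalized_pagerank(P, c, beta=0.15, steps=200):
    """Iterate r_j = (1 - beta) * sum_i P[j][i] * r_i + beta * c[j] * sum_i r_i."""
    r = list(c)
    for _ in range(steps):
        total = sum(r)
        # With prob. beta the walker teleports, landing on page j with prob. c[j].
        r = [(1 - beta) * sum(p * x for p, x in zip(row, r)) + beta * c_j * total
             for row, c_j in zip(P, c)]
    return r


# Order Y!, A, MS; column i holds the out-link probabilities of page i.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]

uniform = personalized_pagerank(P, [1 / 3, 1 / 3, 1 / 3])   # teleport anywhere
topic_ms = personalized_pagerank(P, [0.0, 0.0, 1.0])        # teleport only to MS
```

Restricting the teleport set to MS raises the score of MS and lowers that of Y!, which is farther from the topic set.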
Link farms: networks of millions of pages designed to focus PageRank on a few undeserving webpages
To minimize their influence, use a teleport set of trusted webpages
- E.g., homepages of universities
Rich get richer