reduce and aggregate: similarity ranking in multi ...millions of advertisers billions of queries eds...
TRANSCRIPT
![Page 1: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/1.jpg)
Reduce and Aggregate: Similarity Ranking in Multi-Categorical
Bipartite Graphs
J. Feldman*, S. Lattanzi*, S. Leonardi°, V. Mirrokni*. *Google Research °Sapienza U. Rome
Alessandro Epasto
![Page 2: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/2.jpg)
Motivation
● Recommendation Systems: ● Bipartite graphs with Users and Items. ● Identify similar users and suggest relevant
items. ● Concrete example: The AdWords case.
● Two key observations: ● Items belong to different categories. ● Graphs are often lopsided.
![Page 3: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/3.jpg)
Modeling the Data as a Bipartite Graph
Millions of Advertisers Billions of Queries
Hundreds of Labels
Nike Store New York
Soccer Shoes
Soccer Ball
2$
3$
4$
1$
5$
2$
Retailers
Apparel
Sport Equipment
![Page 4: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/4.jpg)
Personalized PageRank
v u
The stationary distribution assigns a similarity score to each node in the graph w.r.t. node v.
For a node v (the seed) and a probability alpha
![Page 5: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/5.jpg)
The Problem
Millions of Advertisers Billions of Queries
Hundreds of Labels
Nike Store New York
Soccer Shoes
Soccer Ball
2$
3$
4$
1$
5$
2$
Retailers
Apparel
Sport Equipment
![Page 6: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/6.jpg)
Other Applications
● General approach applicable to several contexts: ●User, Movies, Genres: find similar users
and suggest movies. ● Authors, Papers, Conferences: find
related authors and suggest papers to read.
![Page 7: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/7.jpg)
Semi-Formal Problem Definition
Advertisers
Queries
![Page 8: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/8.jpg)
Semi-Formal Problem Definition
A
Advertisers
Queries
![Page 9: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/9.jpg)
Semi-Formal Problem Definition
A
Advertisers
Queries
Labels:
![Page 10: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/10.jpg)
Semi-Formal Problem Definition
A
Advertisers
Queries
Labels:Goal:
Find the nodes most “similar” to A.
![Page 11: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/11.jpg)
How to Define Similarity?
● We address the computation of several node similarity measures: ● Neighborhood based: Common neighbors,
Jaccard Coefficient, Adamic-Adar. ● Paths based: Katz. ● Random Walk based: Personalized PageRank.
● Experimental question: which measure is useful? ● Algorithmic questions: ● Can it scale to huge graphs? ● Can we compute it in real-time?
![Page 12: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/12.jpg)
Our Contribution
● Reduce and Aggregate: general approach to induce real-time similarity rankings in multi-categorical bipartite graphs, that we apply to several similarity measures.
● Theoretical guarantees for the precision of the algorithms.
● Experimental evaluation with real world data.
![Page 13: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/13.jpg)
Personalized PageRank
v u
The stationary distribution assigns a similarity score to each node in the graph w.r.t. node v.
For a node v (the seed) and a probability alpha
![Page 14: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/14.jpg)
Challenges
● Our graphs are too big (billions of nodes) even for very large-scale MapReduce systems.
● MapReduce is not real-time.
● We cannot pre-compute the rankings for each subset of labels.
![Page 15: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/15.jpg)
Reduce and Aggregate
Reduce: Given the bipartite and a category construct a graph with only A nodes that preserves the ranking on the entire graph.
Aggregate: Given a node v in A and the reduced graphs of the subset of categories interested determine the ranking for v.
a
b
c
a b
c
c
a
b
a
c
1)
b
2)
3)
![Page 16: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/16.jpg)
Reduce (Precomputation)
Advertisers
Queries
![Page 17: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/17.jpg)
Reduce (Precomputation)
Advertisers
Queries
Precomputed Rankings
![Page 18: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/18.jpg)
Reduce (Precomputation)
Advertisers
Queries
Precomputed Rankings
Precomputed Rankings
![Page 19: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/19.jpg)
Reduce (Precomputation)
Advertisers
Queries
Precomputed Rankings
Precomputed Rankings
Precomputed Rankings
![Page 20: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/20.jpg)
Aggregate (Run Time)
Precomputed Rankings
Precomputed Rankings
Ranking of Red + Yellow
A
![Page 21: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/21.jpg)
Reduce for Personalized PageRank
●Markov Chain state aggregation theory (Simon and Ado, ’61; Meyer ’89, etc.).
● 750x reduction in the number of node while preserving correctly the PPR distribution on the entire graph.
XSide A
Side BSide A
Y X
Y
![Page 22: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/22.jpg)
Run-time Aggregation
![Page 23: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/23.jpg)
Koury et al. Aggregation-Disaggregation Algorithm
Step 1: Partition the Markov chain into DISJOINT subsets
A B
![Page 24: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/24.jpg)
Koury et al. Aggregation-Disaggregation Algorithm
Step 2: Approximate the stationary distribution on each subset independently.
⇡A⇡B
A B
![Page 25: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/25.jpg)
Koury et al. Aggregation-Disaggregation Algorithm
Step 3: Consider the transition between subsets.
⇡A
PAB
PBA
PBB
PAAA B
⇡B
![Page 26: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/26.jpg)
Koury et al. Aggregation-Disaggregation Algorithm
Step 4: Aggregate the distributions. Repeat until convergence.
PAB
PBA
PBB
PAAA B
⇡0B⇡0
A
![Page 27: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/27.jpg)
Aggregation in PPR
X Y
Precompute the stationary distributions individually
⇡A
A
![Page 28: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/28.jpg)
Aggregation in PPR
X Y
Precompute the stationary distributions individually
⇡B
B
![Page 29: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/29.jpg)
Aggregation in PPR
The two subsets are not disjoint!
A B
![Page 30: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/30.jpg)
Our Approach
X Y X Y
● The algorithm is based only on the reduced graphs with Advertiser-Side nodes.
● The aggregation algorithm is scalable and converges to the correct distribution.
⇡A ⇡B
![Page 31: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/31.jpg)
Experimental Evaluation
● We experimented with publicly available and proprietary datasets:
● Query-Ads graph from Google AdWords > 1.5 billions nodes, > 5 billions edges.
● DBLP Author-Papers and Patent Inventor-Inventions graphs.
● Ground-Truth clusters of competitors in Google AdWords.
![Page 32: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/32.jpg)
Patent Graph
Recall
Prec
isio
n
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Pre
cis
ion
Recall
Precision vs Recall
InterJaccard
Adamic-AdarKatzPPR
![Page 33: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/33.jpg)
Google AdWords
Recall
Prec
isio
n
![Page 34: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/34.jpg)
Conclusions and Future Work
● It is possible to compute several similarity scores on very large bipartite graphs in real-time with good accuracy.
●Future work could focus on the case where categories are not disjoint is relevant.
![Page 35: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/35.jpg)
Thank you for your attention
![Page 36: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/36.jpg)
Reduction to the Query Side
X Y
⇡A ⇡B
![Page 37: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/37.jpg)
Reduction to the Query Side
X Y
This is the larger side of the graph.
⇡A ⇡B
![Page 38: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/38.jpg)
Convergence after One Iteration
0
0.2
0.4
0.6
0.8
1
10 20 30 40 50 All
Kend
all-T
au
Position (k)
Kendall-Tau Correlation
DBLPPatent
Query-Ads (cost)
![Page 39: Reduce and Aggregate: Similarity Ranking in Multi ...Millions of Advertisers Billions of Queries eds Labels Nike Store New York Soccer Shoes Soccer Ball 2$ 3$ 4$ 1$ 5$ 2$ Retailers](https://reader034.vdocuments.us/reader034/viewer/2022042417/5f3334062cfe806c5d0c6871/html5/thumbnails/39.jpg)
Convergence
Iterations
1-Co
sine
Sim
ilari
ty
1e-06
1e-05
0.0001
0.001
0 2 4 6 8 10 12 14 16 18 20
1-C
osi
ne
Iterations
Approximation Error vs # Iterations
DBLP (1 - Cosine)Patent (1 - Cosine)