Graphs, Algorithms and Big Data: the Google AdWords Case study

GDG DevFest Central Italy 2013. Alessandro Epasto.

Upload: marged

Post on 23-Feb-2016

DESCRIPTION

Graphs, Algorithms and Big Data: the Google AdWords Case study. GDG DevFest Central Italy 2013. Alessandro Epasto. Joint work with J. Feldman, S. Lattanzi, V. Mirrokni (Google Research), S. Leonardi (Sapienza U. Rome), H. Lynch (Google) and the AdWords team.

TRANSCRIPT

Page 1: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Graphs, Algorithms and Big Data: the Google AdWords Case study

GDG DevFest Central Italy 2013

Alessandro Epasto

Page 2: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Joint work with J. Feldman, S. Lattanzi, V. Mirrokni (Google Research), S. Leonardi (Sapienza U. Rome), H. Lynch (Google) and the AdWords team.

Page 3: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

The AdWords Problem

Page 4: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

The AdWords Problem

?

Page 5: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

The AdWords Problem

?

Page 6: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

The AdWords Problem

Soccer Shoes

Page 7: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

The AdWords Problem

Soccer Shoes

Page 8: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Google Advertisement in Numbers

Over a billion queries a day. A very large number of advertisers.

www.google.com/competition/howgooglesearchworks.html

Page 9: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Challenges

Several scientific and technological challenges:

How to find the best ads in real time?

How to price each ad?

How to suggest new queries to advertisers?

The solution to these problems involves some fundamental scientific results (e.g. a Nobel Prize-winning auction mechanism).
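The slide does not name the auction mechanism. One plausible reading is the sealed-bid second-price (Vickrey) auction, in which the highest bidder wins but pays the second-highest bid. The sketch below illustrates only that textbook rule; the bidder names and bids are hypothetical, and it is not a description of the production AdWords auction.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids: dict mapping a bidder name to its bid (hypothetical data).
    The highest bidder wins but pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


winner, price = second_price_auction(
    {"shop_a": 2.50, "shop_b": 1.75, "shop_c": 0.90}
)
print(winner, price)
```

Because the price paid does not depend on the winner's own bid, bidding one's true value is a reasonable strategy under this rule.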

Page 10: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Google Advertisement in Numbers

2012 revenues: 46 billion USD. 95% from advertising: 43 billion USD.

http://investor.google.com/financial/tables.html

Page 11: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Goals of the Project

Mine AdWords data to automatically identify each advertiser's main competitors and to suggest relevant queries to each advertiser.

Goals:

Useful business information.

Improve advertisement.

More relevant performance benchmarks.

Page 12: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Information Deluge

Large advertisers (e.g. Amazon, Ask.com) compete in several market segments with very different advertisers.

Query | Information
Nike store New York | Market Segment: Retailer, Geo: NY (USA), Stats: 10 clicks
Soccer shoes | Market Segment: Apparel, Geo: London, UK, Stats: 4 clicks
Soccer ball | Market Segment: Equipment, Geo: San Francisco, CA, Stats: 5 clicks

… millions of other queries …

Page 13: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Representing the data

How to represent the salient features of the data?

Relationships between advertisers and queries.

Statistics: clicks, costs, etc.

Take into account the categories.

Efficient algorithms.

Page 14: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Graphs: the lingua franca of Big Data

Mathematical objects studied well before the advent of computers.

Königsberg’s bridges problem. Euler, 1735.

Page 15: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Graphs: the lingua franca of Big Data

Graphs are everywhere!

Social Networks Technological Networks

Natural Networks

Page 16: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Graphs: the lingua franca of Big Data

Formal definition: a set of nodes (A, B, C, D in the figure).

Page 17: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Graphs: the lingua franca of Big Data

Formal definition: a set of nodes (A, B, C, D in the figure) and a set of edges.

Page 18: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Graphs: the lingua franca of Big Data

Formal definition: a set of nodes (A, B, C, D in the figure) and a set of edges. The edges might have a weight (1, 2, 3, 4 in the figure).
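A minimal sketch of this definition in Python: a set of nodes plus a dictionary from edges to weights. The transcript does not say which edge carries which weight, so the edge list here is a hypothetical assignment chosen only for illustration.

```python
# Hypothetical example: the slide shows nodes A-D and weights 1-4, but the
# transcript does not say which edge carries which weight.
nodes = {"A", "B", "C", "D"}
weighted_edges = {
    ("A", "B"): 1,
    ("A", "C"): 4,
    ("B", "D"): 2,
    ("C", "D"): 3,
}


def neighbors(node):
    """Return {neighbor: weight} for an undirected weighted graph."""
    result = {}
    for (u, v), w in weighted_edges.items():
        if u == node:
            result[v] = w
        elif v == node:
            result[u] = w
    return result


print(neighbors("A"))
```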

Page 19: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

AdWords data as a (Bipartite) Graph

A lot of Advertisers

Billions of Queries

Hundreds of Labels
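One way to hold such a bipartite graph in memory is sketched below: advertisers on one side, queries on the other, click counts as edge weights, and one category label per query. The record layout, the advertiser names and the `restrict_to_labels` helper are assumptions made for illustration, not the representation used in the project.

```python
from collections import defaultdict

# Hypothetical records: (advertiser, query, label, clicks).
records = [
    ("adv_A", "soccer shoes", "Apparel", 4),
    ("adv_A", "soccer ball", "Equipment", 5),
    ("adv_B", "soccer shoes", "Apparel", 7),
]

adv_to_queries = defaultdict(dict)   # advertiser -> {query: clicks}
query_to_advs = defaultdict(dict)    # query -> {advertiser: clicks}
query_label = {}                     # query -> category label

for advertiser, query, label, clicks in records:
    adv_to_queries[advertiser][query] = clicks
    query_to_advs[query][advertiser] = clicks
    query_label[query] = label


def restrict_to_labels(labels):
    """Keep only the edges whose query carries one of the given labels."""
    return {
        adv: {q: c for q, c in queries.items() if query_label[q] in labels}
        for adv, queries in adv_to_queries.items()
    }


print(restrict_to_labels({"Apparel"}))
```

Keeping both directions of the adjacency makes it cheap to step from an advertiser to its queries and back, which is what a random walk on the graph needs.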

Page 20: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Semi-Formal Problem Definition

Advertisers

Queries

Page 21: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Semi-Formal Problem Definition

A

Advertisers

Queries

Page 22: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Semi-Formal Problem Definition

A

Advertisers

Queries

Labels:

Page 23: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Semi-Formal Problem Definition

A

Advertisers

Queries

Labels:

Page 24: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Semi-Formal Problem Definition

A

Advertisers

Queries

Labels:

Goal: find the nodes most “similar” to A.

Page 25: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

How to Define Similarity?

The literature offers several node similarity measures, based on the graph structure, random walks, etc.

What is the accuracy?

Can it scale to graphs with billions of nodes?

Can it be computed in real time?
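As a concrete instance of a similarity measure based purely on the graph structure, the sketch below scores two advertisers by the Jaccard overlap of the queries they appear on. It is a simple baseline for illustration, not the measure adopted in the talk, and the advertiser and query names are hypothetical.

```python
def jaccard(queries_a, queries_b):
    """Fraction of queries shared by two advertisers (0.0 if both are empty)."""
    union = queries_a | queries_b
    if not union:
        return 0.0
    return len(queries_a & queries_b) / len(union)


# Hypothetical advertiser -> set of queries it appears on.
advertiser_queries = {
    "A": {"soccer shoes", "soccer ball", "running shoes"},
    "B": {"soccer shoes", "soccer ball"},
    "C": {"laptop"},
}


def most_similar(target):
    """Rank the other advertisers by Jaccard similarity to `target`."""
    scores = {
        other: jaccard(advertiser_queries[target], queries)
        for other, queries in advertiser_queries.items()
        if other != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(most_similar("A"))
```

A measure like this only sees direct neighbors; random-walk measures also pick up advertisers that are related through longer paths.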

Page 26: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

The three ingredients of Big Data

A lot of data…

A sophisticated infrastructure: MapReduce

Efficient algorithms: Graph mining

Page 27: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

MapReduce

Page 28: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

MapReduce

The work is spread in parallel across several machines connected by fast links.
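The programming model can be sketched on a single machine: a map function emits key/value pairs, a shuffle groups the values by key, and a reduce function combines each group. The sketch below counts clicks per advertiser; the function names and the log records are hypothetical, and a real MapReduce system would run the map and reduce calls on many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs; here one pair per log record."""
    advertiser, query, clicks = record
    yield advertiser, clicks


def reduce_phase(key, values):
    """Combine all values that share a key."""
    return key, sum(values)


def run_mapreduce(records):
    # Shuffle: group mapped values by key. In a real cluster each group
    # would be routed to the machine responsible for that key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())


logs = [("A", "soccer shoes", 4), ("B", "soccer shoes", 7), ("A", "soccer ball", 5)]
print(run_mapreduce(logs))
```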

Page 29: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Algorithms

Personalized PageRank:

Random walks on the graph.

Closely related to the celebrated Google PageRank™.

Page 30: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Personalized PageRank

Page 43: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Personalized PageRank

Idea: perform a very long random walk starting from v, jumping back to v with a fixed probability at each step.

Ranking nodes by their probability of being visited assigns each node a similarity score with respect to node v.

Strong community bias (this can be formalized).
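A minimal sketch of this idea, assuming the standard formulation in which the walk restarts at the source with probability `alpha` at each step. The scores are computed by power iteration on a small hypothetical graph made of two triangles joined by one edge; the value of `alpha`, the graph and the function name are illustrative assumptions.

```python
def personalized_pagerank(graph, source, alpha=0.15, iterations=100):
    """Approximate Personalized PageRank by power iteration.

    graph: dict node -> list of neighbors (every node needs at least one).
    alpha: probability of jumping back to `source` at each step (assumed).
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[source] += alpha  # restart mass; the scores always sum to 1
        for node, mass in scores.items():
            share = (1.0 - alpha) * mass / len(graph[node])
            for neighbor in graph[node]:
                nxt[neighbor] += share
        scores = nxt
    return scores


# Hypothetical graph: triangle A-B-C and triangle D-E-F joined by edge C-D.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = personalized_pagerank(graph, "A")
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Starting from A, the nodes in A's own triangle end up with higher scores than the nodes in the other triangle, which is the community bias mentioned on the slide.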

Page 44: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Personalized PageRank

Exact computation is infeasible (O(n^3)), but it can be approximated very well.

Very efficient MapReduce algorithm scaling to large graphs (hundreds of millions of nodes).
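The slides do not say which approximation is used. One standard technique is Monte Carlo sampling: run many short random walks from the source and use the fraction that end at each node as that node's score. The sketch below is a single-machine illustration of that technique under assumed parameters (`alpha`, `walks`, `seed`), not the MapReduce algorithm from the talk.

```python
import random


def monte_carlo_ppr(graph, source, alpha=0.15, walks=20000, seed=0):
    """Estimate Personalized PageRank by sampling random walks.

    Each walk starts at `source`; at every step it stops with probability
    alpha, otherwise it moves to a uniformly random neighbor. The fraction
    of walks ending at a node estimates that node's score.
    """
    rng = random.Random(seed)
    ends = {node: 0 for node in graph}
    for _ in range(walks):
        node = source
        while rng.random() >= alpha:
            node = rng.choice(graph[node])
        ends[node] += 1
    return {node: count / walks for node, count in ends.items()}


# Hypothetical graph: triangle A-B-C and triangle D-E-F joined by edge C-D.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
estimate = monte_carlo_ppr(graph, "A")
print(estimate)
```

Each walk is independent of the others, so the sampling parallelizes naturally across machines.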

However…

Page 45: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Algorithmic Bottleneck

Our graphs are simply too big (billions of nodes) even for large-scale systems.

MapReduce is not real-time.

We cannot precompute the results for all subsets of categories (exponential time!).
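A quick calculation shows why precomputing one ranking per subset of categories is out of reach: k labels have 2^k - 1 non-empty subsets. The sketch below enumerates them for three labels taken from the earlier example and counts them for 100 labels; the helper name is hypothetical.

```python
from itertools import combinations


def count_label_subsets(num_labels):
    """Number of non-empty subsets of `num_labels` category labels."""
    return 2 ** num_labels - 1


labels = ["Retailer", "Apparel", "Equipment"]
subsets = [c for r in range(1, len(labels) + 1) for c in combinations(labels, r)]
print(len(subsets))              # 7 non-empty subsets for 3 labels
print(count_label_subsets(100))  # far too many to precompute one ranking each
```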

Page 46: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

1st idea: Tackling Real Graph Structure

Data size is the main bottleneck. Compressing the graph would speed up the computation.

Page 47: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

1st idea: Tackling Real Graph Structure

[Figure: step 1 compresses the graph of advertisers (A, B) and queries (a to g) into a graph containing only advertisers.]

Page 48: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

1st idea: Tackling Real Graph Structure

[Figure: step 1 compresses the graph of advertisers (A, B) and queries (a to g) into a graph containing only advertisers; step 2 derives the ranking of the entire graph (a, b, c, d, e, f, g, A, B) from it.]

Page 49: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

1st idea: Tackling Real Graph Structure

Theorem: the ranking computed is the correct Personalized PageRank on the entire graph.

Based on results from the mathematical theory of Markov chain state aggregation (Simon and Ando, ’61; Meyer, ’89, etc.).
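The compression step can be illustrated by collapsing each two-step walk advertiser -> query -> advertiser into a single transition between advertisers. The sketch below shows only that step on a hypothetical weighted graph; it does not reproduce the theorem, the handling of restarts, or the reconstruction of the query rankings, none of which are detailed in the slides.

```python
from collections import defaultdict

# Hypothetical bipartite click graph: advertiser -> {query: weight}.
adv_to_queries = {
    "A": {"a": 1.0, "b": 1.0},
    "B": {"b": 1.0, "c": 2.0},
}


def collapse_to_advertisers(adv_to_queries):
    """Return advertiser -> {advertiser: probability} for a two-step walk
    advertiser -> query -> advertiser, with steps proportional to weights."""
    query_to_advs = defaultdict(dict)
    for adv, queries in adv_to_queries.items():
        for query, weight in queries.items():
            query_to_advs[query][adv] = weight

    transitions = {}
    for adv, queries in adv_to_queries.items():
        out_total = sum(queries.values())
        row = defaultdict(float)
        for query, weight in queries.items():
            back_total = sum(query_to_advs[query].values())
            for other, back_weight in query_to_advs[query].items():
                row[other] += (weight / out_total) * (back_weight / back_total)
        transitions[adv] = dict(row)
    return transitions


transitions = collapse_to_advertisers(adv_to_queries)
print(transitions)
```

The collapsed graph has one node per advertiser, so it is far smaller than the original graph whenever queries greatly outnumber advertisers.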

Page 50: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Algorithmic Bottleneck

Our graphs are too big (billions of nodes) even for large-scale systems.

MapReduce is not real-time.

We cannot precompute the results for all subsets of categories (exponential time!).

Page 51: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Two-stage Approach

First stage: Large-scale (but feasible) MapReduce pre-computation.

Second stage: Fast iterative algorithm.

Page 52: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

First Stage: Individual Category Rankings

Advertisers

Queries

Page 53: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

First Stage: Individual Category Rankings

Advertisers

Queries

Precomputed Rankings

Page 54: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

First Stage: Individual Category Rankings

Advertisers

Queries

Precomputed Rankings

Precomputed Rankings

Page 55: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

First Stage: Individual Category Rankings

Advertisers

Queries

Precomputed Rankings

Precomputed Rankings

Precomputed Rankings

Page 56: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Second Stage: Rank aggregation

Precomputed Rankings

Precomputed Rankings

Ranking of Red + Yellow

A real-time iterative algorithm aggregates the rankings of a given node for a subset of the categories.
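The slides do not describe the aggregation algorithm itself. The sketch below only illustrates the shape of the two-stage design: per-category score tables are computed offline, and the query-time step reads and combines a few small tables. The averaging rule is a placeholder, not the talk's iterative algorithm, and all names and numbers are hypothetical.

```python
# Stage 1 output (hypothetical numbers): for a fixed node A, one precomputed
# score table per category, as an offline job might have produced.
precomputed = {
    "red":    {"B": 0.6, "C": 0.3, "D": 0.1},
    "yellow": {"B": 0.4, "C": 0.1, "D": 0.5},
    "blue":   {"B": 0.1, "C": 0.8, "D": 0.1},
}


def aggregate(categories):
    """Placeholder aggregation: average the per-category scores.

    NOT the talk's iterative algorithm, which the slides do not describe;
    it only shows that the query-time step reads small precomputed tables.
    """
    combined = {}
    for category in categories:
        for node, score in precomputed[category].items():
            combined[node] = combined.get(node, 0.0) + score / len(categories)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


print(aggregate(["red", "yellow"]))
```

The point of the design is that stage 1 stores one table per category, linear in the number of labels, while any of the exponentially many category subsets can still be answered at query time.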

Page 57: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Algorithmic Bottleneck

Our graphs are too big (billions of nodes) even for large-scale systems.

MapReduce is not real-time.

We cannot precompute the results for all subsets of categories (exponential time!).

Page 58: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Conclusions

Experimental evaluation shows the accuracy of the results.

Fully implemented and currently under evaluation for integration in production systems.

Ongoing research project for future scientific publications.

Page 59: Graphs, Algorithms and Big Data: the Google  AdWords  Case study

Thank you for your attention