CUbRIK research at SIGMOD 2012


Uploaded by: cubrik-project

Posted on 19-Jun-2015


DESCRIPTION

Presentation of the "Top-k Bounded Diversification" research paper.

TRANSCRIPT

Page 1: CUbRIK research at SIGMOD 2012

Top-k bounded diversification

Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi
Politecnico di Milano, Italy

Scottsdale, AZ, USA - May 24, 2012

Page 2: Motivation

Diversification is useful in application domains where objects can be described by a score and a 2- or 3-dimensional feature vector.

Many examples come from search (real estate, image search, …):
- Apartments distributed over a map: score (e.g., price) + 2D feature vector (geo-localization)
- Evolution in time of the price of apartments over a map: score (e.g., price) + 3D feature vector (geo-localization + time)
- Properties of images (e.g., HSI color features): score (e.g., relevance to a given keyword) + 3D feature vector (e.g., average HSI components in the image)

Page 3: Diversified result set

Looking for good restaurants in Milan

Page 4: Diversified result set

Map: top 15

Page 5: Diversified result set

Map: top 15 vs. top 15 diversified over the region

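Page 6: Diversification

We are given a set O of N objects. For each object o, we know its vector-space representation (a point in the feature space) and its relevance score.

Diversification problem: find the best diversified set of K objects, i.e., the set that maximizes an objective function combining relevance to the query (as score) and diversity (as distance).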

Page 7: Diversification

Formula annotations: best diversified set of K objects; set of objects; objective function; relevance to query (as score); diversity (as distance).
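One compact way to write the problem of Pages 6 and 7, as a sketch only: x_o is the feature vector of o, s(o) its relevance score, δ a distance (e.g., Euclidean) and λ ∈ [0, 1] a relevance/diversity balance. These symbols and the particular form of F below are illustrative assumptions (a common relevance/diversity trade-off, with the min term taken as 0 when S has a single element), not necessarily the exact objective shown on the slide.

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq O,\ |S| = K} F(S),
\qquad
F(S) = (1-\lambda) \sum_{o \in S} s(o)
     + \lambda \sum_{o \in S} \min_{o' \in S \setminus \{o\}} \delta(\mathbf{x}_o, \mathbf{x}_{o'})
```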

Page 8: Greedy approach to diversification

Diversification problems are NP-hard, so approximate greedy algorithms are needed.

MMR (Maximum Marginal Relevance) is a well-known greedy algorithm with good quality of result (i.e., value of the objective function):
- Find K objects that are both relevant and diverse
- At each step, pick the object with the largest diversity-weighted score
- K steps in total
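A minimal Python sketch of the greedy MMR loop. It assumes the diversity-weighted score is (1 - λ)·score + λ·(distance to the closest already-selected object), with Euclidean distance; this is a common MMR formulation, and the names `dws`, `mmr` and the `lam` parameter are hypothetical, not taken from the paper. Note that every step scans all remaining objects, which is why plain MMR needs all objects up front.

```python
import math
from typing import List, Sequence, Tuple

Obj = Tuple[float, Sequence[float]]  # (relevance score, feature vector)


def dws(obj: Obj, selected: List[Obj], lam: float) -> float:
    """Diversity-weighted score: (1 - lam) * score + lam * distance to the
    closest already-selected object (the diversity term is 0 at step 1)."""
    score, point = obj
    if not selected:
        return (1 - lam) * score
    closest = min(math.dist(point, p) for _, p in selected)
    return (1 - lam) * score + lam * closest


def mmr(objects: List[Obj], k: int, lam: float = 0.5) -> List[int]:
    """Greedy MMR: k steps, each adding the unselected object with the
    largest diversity-weighted score. Returns indices into `objects`."""
    chosen: List[int] = []
    remaining = list(range(len(objects)))
    for _ in range(min(k, len(objects))):
        selected = [objects[i] for i in chosen]
        best = max(remaining, key=lambda i: dws(objects[i], selected, lam))
        chosen.append(best)
        remaining.remove(best)
    return chosen


restaurants = [(1.0, (0.0, 0.0)), (0.9, (0.1, 0.0)), (0.5, (1.0, 1.0))]
print(mmr(restaurants, k=2, lam=0.0))  # [0, 1]: relevance only
print(mmr(restaurants, k=2, lam=0.5))  # [0, 2]: the distant object wins
```

With λ = 0 the loop degenerates to plain top-k by score; raising λ lets a lower-scored but distant object displace a near-duplicate of the first pick.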

Page 9: Greedy approach to diversification

Formula annotations: diversity-weighted score; relevance; diversity; balance between relevance and diversity.

Page 10: Greedy approach to diversification

MMR also has a corresponding objective function, whose value measures the quality of its result.

Page 11: Greedy approach to diversification

Main disadvantage of MMR: all objects must be available from the beginning.

Page 12: Bounded diversification

Objects are embedded in a bounded region of space (e.g., a bounding rectangle).

Accessing objects is costly:
- Objects are progressively accessed (they are not all available at time 0)
- The number of accessed objects (sumDepths) should be minimized

Indexes for sorted access to objects are available:
- Access by score (in descending order)
- Access by distance from a given point (in ascending order)

Both are very common in services on the Web (e.g., apartment search).
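To make the cost model concrete, the sketch below simulates the two sorted-access indexes over an in-memory list and counts how many objects each access actually reads; the total stands in for sumDepths. The class and method names are hypothetical, and a real Web service would stream results from its own index rather than sort them locally.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Obj = Tuple[float, Sequence[float]]  # (score, point)


class SortedAccess:
    """Simulated sorted-access indexes over an in-memory list of objects.

    Each access reads objects lazily and counts them; the total number of
    objects read across all accesses plays the role of sumDepths."""

    def __init__(self, objects: List[Obj]):
        self.objects = objects
        self.depths: Dict[str, int] = {}  # access name -> objects read

    def _counted(self, name: str, order: List[int]) -> Iterator[int]:
        for i in order:
            self.depths[name] = self.depths.get(name, 0) + 1
            yield i

    def by_score(self) -> Iterator[int]:
        """Object indices in descending order of score."""
        order = sorted(range(len(self.objects)), key=lambda i: -self.objects[i][0])
        return self._counted("score", order)

    def by_distance(self, q: Sequence[float]) -> Iterator[int]:
        """Object indices in ascending order of distance from point q."""
        order = sorted(range(len(self.objects)),
                       key=lambda i: math.dist(self.objects[i][1], q))
        return self._counted(f"distance from {tuple(q)}", order)

    def sum_depths(self) -> int:
        return sum(self.depths.values())


acc = SortedAccess([(0.9, (0.0, 0.0)), (0.4, (1.0, 0.0)), (0.7, (5.0, 5.0))])
by_score = acc.by_score()
near_q = acc.by_distance((1.0, 0.1))
print(next(by_score), next(by_score))  # 0 2
print(next(near_q))                     # 1
print(acc.sum_depths())                 # 3
```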

Page 13: Distance-based access

Restaurants by distance from a given point q

Size of icon proportional to score

Page 14: Score-based access

Restaurants by score

Size of icon proportional to score

Page 15: Attacking bounded diversification

Goal: achieve the same quality of result as MMR, but minimize the number of accessed objects.

The Pull-Bound MMR (PBMMR) template runs K iterations; within each of them, it repeats the following as long as needed:
- Pulling strategy: choose an access method (by score or by distance); if by distance, choose from which point (probing location)
- Bounding scheme: compute an upper bound on the diversity-weighted score that can be achieved by unseen objects
- If a seen object exceeds the bound, select it and move on to the next iteration

Credits to [Schnaitter&Polyzotis 2008] for their Pull-Bound Rank Join template.
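A simplified Python sketch of the pull/bound loop, under stated assumptions: only score-based access is used (no probing locations), the diversity-weighted score is (1 - λ)·score + λ·(distance to the closest selected object), and the bound on unseen objects replaces the maximal minimal distance with a constant `diameter` that is at least the largest distance inside the bounding region. That bound is valid, so the selection matches greedy MMR (up to ties), but it is much looser than the tight bound of Pages 30-32. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Obj = Tuple[float, Sequence[float]]  # (score, point)


def pull_bound_mmr(objects: List[Obj], k: int, lam: float,
                   diameter: float) -> Tuple[List[int], int]:
    """Pull/bound loop with score-based access only (a simplification).

    Assumed bound for every unseen object:
        (1 - lam) * last_score_seen + lam * diameter
    Returns (selected indices, number of objects accessed)."""
    order = sorted(range(len(objects)), key=lambda i: -objects[i][0])
    seen: List[int] = []    # accessed but not yet selected
    chosen: List[int] = []  # current selection
    depth = 0               # objects pulled so far (sumDepths)

    def dws(i: int) -> float:
        """Diversity-weighted score w.r.t. the current selection."""
        score, point = objects[i]
        if not chosen:
            return (1 - lam) * score
        closest = min(math.dist(point, objects[j][1]) for j in chosen)
        return (1 - lam) * score + lam * closest

    def bound() -> float:
        """Upper bound on the diversity-weighted score of unseen objects."""
        if depth == len(order):
            return -math.inf  # every object has been accessed
        if depth == 0:
            return math.inf   # nothing accessed yet
        last = objects[order[depth - 1]][0]
        return (1 - lam) * last + (lam * diameter if chosen else 0.0)

    for _ in range(min(k, len(objects))):
        # Pulling: read the next object by score until the best seen
        # object reaches the bound on what is still unseen.
        while not seen or max(dws(i) for i in seen) < bound():
            seen.append(order[depth])
            depth += 1
        best = max(seen, key=dws)
        seen.remove(best)
        chosen.append(best)
    return chosen, depth


objs = [(1.0, (0.0, 0.0)), (0.95, (1.0, 1.0)), (0.3, (0.0, 1.0)),
        (0.2, (1.0, 0.0)), (0.1, (0.5, 0.5))]
print(pull_bound_mmr(objs, k=2, lam=0.2, diameter=1.5))  # ([0, 1], 3)
```

Here the second object is selected after reading only 3 of the 5 objects: once the best seen object reaches the bound, no unseen object can beat it.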

Page 16: Choosing probing locations

Goal of distance-based access: explore the region of space in which the object with the best diversity-weighted score is most likely to be found.

At each of the K iterations, we fix the probing locations at the most promising points of the unexplored space: the vertices of the bounded Voronoi diagram of the points selected at the previous iterations. Of these, the most promising ones are as far as possible from all the objects of the current selection.
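A brute-force Python sketch for computing probing locations in 2D, assuming a rectangular bounding region and Euclidean distance. It enumerates candidate points (rectangle corners, crossings of perpendicular bisectors with the boundary, and circumcentres of triples of selected objects) and keeps those whose defining circle has no strictly closer selected object, which is the standard empty-circle characterization of Voronoi vertices. It runs in O(n^4) time, acceptable for the few objects selected after a handful of iterations; the function names are hypothetical, and a production version would use a proper Voronoi construction instead.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
EPS = 1e-9


def min_dist(p: Point, sites: Sequence[Point]) -> float:
    """Distance from p to its closest selected object."""
    return min(math.dist(p, s) for s in sites)


def bounded_voronoi_vertices(sites: Sequence[Point], box: Box) -> List[Point]:
    """Vertices of the Voronoi diagram of `sites` clipped to `box`
    (brute force, O(n^4)); assumes at least one site."""
    x0, y0, x1, y1 = box
    # Candidates are (point, radius): one is a vertex if it lies in the
    # box and no site is strictly closer than `radius`. Corners always are.
    cands = [((x, y), -math.inf) for x in (x0, x1) for y in (y0, y1)]
    for a, b in itertools.combinations(sites, 2):
        mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        dx, dy = b[0] - a[0], b[1] - a[1]
        # Bisector of a and b: (x - mx) * dx + (y - my) * dy = 0.
        if abs(dy) > EPS:  # crossings with the vertical sides
            for xe in (x0, x1):
                p = (xe, my - (xe - mx) * dx / dy)
                cands.append((p, math.dist(p, a)))
        if abs(dx) > EPS:  # crossings with the horizontal sides
            for ye in (y0, y1):
                p = (mx - (ye - my) * dy / dx, ye)
                cands.append((p, math.dist(p, a)))
    for a, b, c in itertools.combinations(sites, 3):
        d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
        if abs(d) < EPS:
            continue  # collinear triple: no circumcentre
        sa, sb, sc = (a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2,
                      c[0] ** 2 + c[1] ** 2)
        p = ((sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d,
             (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d)
        cands.append((p, math.dist(p, a)))
    verts: List[Point] = []
    for p, r in cands:
        inside = x0 - EPS <= p[0] <= x1 + EPS and y0 - EPS <= p[1] <= y1 + EPS
        if not inside or min_dist(p, sites) < r - EPS:
            continue  # outside the region, or a closer site exists
        if all(math.dist(p, v) > 1e-7 for v in verts):  # skip duplicates
            verts.append(p)
    return verts


def best_probing_location(sites: Sequence[Point], box: Box) -> Tuple[Point, float]:
    """Vertex as far as possible from the current selection."""
    best = max(bounded_voronoi_vertices(sites, box), key=lambda v: min_dist(v, sites))
    return best, min_dist(best, sites)


unit = (0.0, 0.0, 1.0, 1.0)
print(len(bounded_voronoi_vertices([(0.25, 0.5), (0.75, 0.5)], unit)))  # 6
print(best_probing_location([(0.0, 0.0), (1.0, 1.0)], unit))           # ((0.0, 1.0), 1.0)
```

With two selected objects inside the region, their bisector crosses the boundary exactly twice, so the four corners plus two crossings give six probing locations.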

Page 17: Example

4 objects x1, …, x4 selected during the first 4 iterations

Bounding region is a square

Voronoi diagram of selected objects

Page 18: Example

Probing locations marked on the Voronoi diagram of selected objects

Page 19: Example

A new object is selected (Voronoi diagram of selected objects)

Page 20: Example: bounded Voronoi diagram of selected objects

Probing locations: v1, …, v4 (vertices of the bounding region)

Shading: distance from the closest selected point (brightest at the vertices)

Page 21: Example: bounded Voronoi diagram of selected objects

Probing locations: v1, …, v6 (vertices of the bounded Voronoi diagram)

The local maxima of the function "distance from the closer of x1 and x2" are among v1, …, v6

Page 22: Example: bounded Voronoi diagram of selected objects

Probing locations: v1, …, v8

The local maxima of the function "distance from the closest point among x1, …, x3" are among v1, …, v8

Page 23: Example: bounded Voronoi diagram of selected objects

Probing locations: v1, …, v10

The local maxima of the function "distance from the closest point among x1, …, x4" are among v1, …, v10

Page 24: Example: bounded Voronoi diagram of selected objects

Probing locations: v1, …, v12 (no other intersection in the region)

The local maxima of the function "distance from the closest point among x1, …, x5" are among v1, …, v12

Page 25: Example: a running state

Inside red circles: explored region

Pink discs: objects retrieved by distance-based access

Pages 26-29: Example (further steps of the same running state)

Page 30: Bounding scheme: computing a tight upper bound

A bound is tight if it can be achieved in some hypothetical continuation of the instance being explored.

A tight upper bound can be computed from the current state of the exploration, as described next.

Page 31: Bounding scheme: computing a tight upper bound

Formula annotations: the bound combines the highest score possible for an unseen object (the last score seen by score-based access) with the maximal minimal distance from the set of selected objects, taken over the unexplored region of space.

Page 32: Bounding scheme: computing a tight upper bound

Theorem: the point x* that maximizes the minimal distance from all the selected objects is a vertex of the convex hull of the unexplored part of a cell of the bounded Voronoi diagram.

Theorem: the bound obtained in this way is tight.
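The bounding scheme of Pages 30-32 can be sketched in Python under assumptions that should be read as such: the bound is taken as (1 - λ)·(last score seen by score-based access) + λ·(maximal minimal distance over the unexplored region), and the explored region is the union of discs centred at the probing locations, each with radius equal to the distance of the last object retrieved from that location. Instead of the exact vertex enumeration given by the theorem, this sketch swaps in a grid over-approximation: since the distance to the closest selected object changes by at most the distance moved, adding half a cell diagonal keeps the result an upper bound. The result is valid but not tight, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Disc = Tuple[Point, float]  # (probing location, explored radius)
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def max_min_distance_ub(selected: Sequence[Point], explored: Sequence[Disc],
                        box: Box, n: int = 100) -> float:
    """Upper bound on max over unexplored x in box of the distance from x
    to its closest selected object (assumes `selected` is non-empty).

    Scans an n x n grid of cell centres g. Every unexplored x is within h
    (half a cell diagonal) of some g that is not skipped below, and the
    distance to the closest selected object changes by at most h between
    x and g; hence max(min_dist(g) + h) is a valid upper bound."""
    x0, y0, x1, y1 = box
    sx, sy = (x1 - x0) / n, (y1 - y0) / n
    h = math.hypot(sx, sy) / 2
    best = -math.inf  # stays -inf if the whole box is explored
    for i in range(n):
        for j in range(n):
            g = (x0 + (i + 0.5) * sx, y0 + (j + 0.5) * sy)
            if any(math.dist(g, c) + h < r for c, r in explored):
                continue  # the whole cell lies inside an explored disc
            d = min(math.dist(g, s) for s in selected)
            best = max(best, d + h)
    return best


def upper_bound(last_score: float, selected: Sequence[Point],
                explored: Sequence[Disc], box: Box, lam: float) -> float:
    """Assumed form: (1 - lam) * highest possible unseen score
    + lam * maximal minimal distance over the unexplored region."""
    return (1 - lam) * last_score + lam * max_min_distance_ub(selected, explored, box)


unit = (0.0, 0.0, 1.0, 1.0)
corners = [((0.0, 0.0), 0.3), ((0.0, 1.0), 0.3), ((1.0, 0.0), 0.3), ((1.0, 1.0), 0.3)]
print(max_min_distance_ub([(0.5, 0.5)], [], unit))       # about 0.7071 (true: sqrt(0.5))
print(max_min_distance_ub([(0.5, 0.5)], corners, unit))  # a little above sqrt(0.29) ≈ 0.539
```

Exploring discs around the four corners lowers the distance term, and therefore the bound, from about 0.707 to about 0.54: this is the mechanism by which distance-based access lets the loop stop early.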

Page 33: Selecting the next probing location

In 2D, the point maximizing the minimal distance can only be:
- a vertex of the bounded Voronoi diagram
- an intersection between an edge and a circumference
- an intersection between two circumferences

The corresponding vertex is selected as the next probing location.
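Enumerating the second and third kinds of candidate points requires two small geometric primitives, sketched below in Python with the standard formulas: the intersections of a segment (e.g., an edge of the bounded Voronoi diagram) with a circumference, and the intersections of two circumferences (e.g., the boundaries of two explored discs). The function names are hypothetical; tangencies return a single point.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def circle_circle(c1: Point, r1: float, c2: Point, r2: float) -> List[Point]:
    """Intersection points of two circumferences (0, 1 or 2 points)."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # concentric, separate, or one inside the other
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)  # distance from c1 to the chord
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))    # half chord length
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    mx, my = c1[0] + a * ux, c1[1] + a * uy     # chord midpoint
    pts = [(mx - h * uy, my + h * ux), (mx + h * uy, my - h * ux)]
    return pts[:1] if h == 0 else pts


def segment_circle(p: Point, q: Point, c: Point, r: float) -> List[Point]:
    """Intersection points of segment pq with the circumference of centre c
    and radius r: solve |p + t (q - p) - c|^2 = r^2 for t in [0, 1]."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - c[0], p[1] - c[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    k = fx * fx + fy * fy - r * r
    disc = b * b - 4 * a * k
    if a == 0 or disc < 0:
        return []  # degenerate segment, or the line misses the circle
    root = math.sqrt(disc)
    out: List[Point] = []
    for t in sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)}):
        if 0 <= t <= 1:
            out.append((p[0] + t * dx, p[1] + t * dy))
    return out


print(segment_circle((0.0, 0.0), (2.0, 0.0), (1.0, 0.0), 0.5))  # [(0.5, 0.0), (1.5, 0.0)]
print(circle_circle((0.0, 0.0), 1.0, (2.0, 0.0), 1.0))          # [(1.0, 0.0)]
```

Each returned point is then a candidate; the one in the unexplored region with the largest distance to its closest selected object is the maximizer described above.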

Page 34: Selecting the next probing location

Figure labels: point maximizing the minimal distance; vertex selected as the next probing location

Page 35: Selecting the next probing location

Page 36: Pulling strategy

Round robin: select each probing location in turn. Some loose form of instance optimality can already be achieved with a tight bounding scheme and round robin.

Potential adaptive: choose the probing location that is most likely to reduce the upper bound. Potential adaptive is never worse than round robin.

Choice between access by score or by distance: look at how each one reduces the upper bound with respect to the number of accessed objects.
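The two pulling strategies, plus the score-versus-distance choice, can be sketched as small Python helpers. Round robin is unambiguous; for potential adaptive, the sketch assumes each probing location carries a local upper bound and picks the location with the largest one (if the overall bound is the maximum of the local ones, only accessing objects there can lower it), and for the access choice it compares the observed bound reduction per accessed object. These heuristics and all names are assumptions, not the paper's exact definitions.

```python
import itertools
from typing import Dict, Hashable, Iterator, List


def round_robin(locations: List[Hashable]) -> Iterator[Hashable]:
    """Cycle over the probing locations, one access each in turn."""
    return itertools.cycle(locations)


def potential_adaptive(local_bounds: Dict[Hashable, float]) -> Hashable:
    """Pick the location whose local upper bound is currently largest
    (hypothetical heuristic)."""
    return max(local_bounds, key=local_bounds.get)


def choose_access(gain_score: float, cost_score: int,
                  gain_dist: float, cost_dist: int) -> str:
    """Pick the access kind with the larger observed bound reduction per
    accessed object (hypothetical statistics kept by the caller)."""
    rate_s = gain_score / max(cost_score, 1)
    rate_d = gain_dist / max(cost_dist, 1)
    return "score" if rate_s >= rate_d else "distance"


rr = round_robin(["v1", "v2", "v3"])
print([next(rr) for _ in range(4)])                          # ['v1', 'v2', 'v3', 'v1']
print(potential_adaptive({"v1": 0.4, "v2": 0.9, "v3": 0.7}))  # v2
print(choose_access(0.10, 5, 0.30, 5))                       # distance
```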

Page 37: Batched access

In the model so far, objects are accessed one by one, which is not practical for many scenarios. "Batched access" modes are available in many practical systems: give a point and a radius, and receive all objects that fall within that radius.

Strategy with batched access: perform exactly one request per probing location, with an optimal choice of the radius. This amounts to solving an optimization problem that:
- minimizes the threshold by appropriately choosing the radii
- is subject to a budget constraint (how many objects we are willing to retrieve)
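A minimal Python sketch of the radius-selection problem, under a deliberately simple hypothetical model: location i has a local bound d[i], a request of radius r lowers it to d[i] - r, and the request returns about density·π·r² objects (uniform density, with overlaps and the region boundary ignored). Minimizing the largest remaining bound under the budget then reduces to finding the smallest threshold t whose radii max(d[i] - t, 0) fit the budget, which bisection finds because the cost decreases as t grows. The model and names are assumptions; the optimization in the paper may differ.

```python
import math
from typing import List, Tuple


def choose_radii(d: List[float], density: float, budget: float,
                 iters: int = 100) -> Tuple[float, List[float]]:
    """Minimise the threshold t = max_i (d[i] - r[i]) subject to
    sum_i density * pi * r[i]**2 <= budget (one request per location)."""
    def cost(t: float) -> float:
        # Objects retrieved when every location is explored down to t.
        return sum(density * math.pi * max(di - t, 0.0) ** 2 for di in d)

    if cost(0.0) <= budget:
        return 0.0, list(d)  # the budget covers every location entirely
    lo, hi = 0.0, max(d)      # cost(hi) == 0, so hi is always feasible
    for _ in range(iters):    # cost decreases in t: bisect on feasibility
        mid = (lo + hi) / 2
        if cost(mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi, [max(di - hi, 0.0) for di in d]


t, radii = choose_radii([0.5, 0.3], density=100.0, budget=10.0)
print(round(t, 3), [round(r, 3) for r in radii])  # 0.322 [0.178, 0.0]
```

In this example the whole budget goes to the location with the larger local bound, since spending any of it on the other one would not lower the maximum.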

Page 38: Experiments: synthetic data, uniform distribution

Page 39: Experiments: synthetic data, exponential distribution

Page 40: Experiments: real data

Page 41: Conclusion

Diversification revisited:
- Sorted access modes to avoid accessing all objects
- Same quality as MMR
- A structured template with bounding scheme and pulling strategy

Optimality guarantees with one-by-one access to objects:
- Tight bound
- Instance optimality (in a loose sense)

Extreme practical efficiency with batched access mode

Future work: adaptation to other diversification algorithms

Page 42: Acknowledgments: CUbRIK Project

CUbRIK is a research project financed by the European Union.

Goals:
- Advance the architecture of multimedia search
- Exploit the human contribution in multimedia search
- Use open-source components provided by the community
- Start up a search business ecosystem

http://www.cubrikproject.eu/