A Probabilistic Optimization Framework for the Empty-Answer Problem

Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis

Talk by Davide Mottin at Yahoo! Research Barcelona


Post on 22-Feb-2016


Page 1: A  Probabilistic Optimization Framework for the Empty-Answer Problem


Page 2: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Who am I?
• Born in Marostica
• Live in Trento
• Member of the dbTrento group at the University of Trento
• Advisors: Themis Palpanas and Yannis Velegrakis

Page 3: A  Probabilistic Optimization Framework for the Empty-Answer Problem


Empty-Answer Problem

Page 4: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Empty-Answer Problem

[Figure: the query {Alarm, DSL, Manual} posed to the car database CARDB returns the empty set {}: no answer.]

Page 5: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Issues
• Users need a product matching their preferences
• It is difficult to propose an approximate answer close to the user's needs
• The system does not provide sufficient help

Page 6: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Outline
• Background
  • Query relaxation
  • Interactive query relaxation
    • User Model
    • Query Relaxation Tree
• Problem definition
• Solutions
  • Exact Solutions
  • Approximate Solution
• Experimental Results
• Conclusions

Page 7: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Existing Solutions
• Ranking function
  • Propose ranked results that are close to the user preferences
  • Both IR [Baeza11] and database solutions [Chaudhuri04]
• Query relaxation
  • Remove or change one of the conditions in the user query [Mishra09]

Page 8: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Query Relaxation

[Figure: the query {Alarm, DSL, Manual} on CARDB returns {}; the relaxed query {DSL, Manual} drops the Alarm condition.]

Page 9: A  Probabilistic Optimization Framework for the Empty-Answer Problem

How Many Relaxations?

Exponential in the size of the query
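To make the growth concrete, here is a minimal sketch that assumes a query is a set of conditions and a relaxation drops one or more of them. The function name `relaxations` is hypothetical, and counting the fully relaxed (empty) query as a relaxation is an assumption.

```python
from itertools import combinations

def relaxations(query):
    """Yield every relaxed query obtained by dropping one or more conditions."""
    conditions = sorted(query)
    # Keep k conditions for k = n-1 down to 0 (the original query is excluded).
    for kept in range(len(conditions) - 1, -1, -1):
        for subset in combinations(conditions, kept):
            yield frozenset(subset)

query = {"Alarm", "DSL", "Manual"}
relaxed = list(relaxations(query))
print(len(relaxed))  # 2**3 - 1 = 7
```

Under these assumptions a query with n conditions has 2^n - 1 relaxations, so 10 conditions already give 1023 candidates.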

Page 10: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Challenges
• Too many relaxations proposed
  • Lack of a principled method to propose the next relaxation
• Exploring all the relaxations is impractical
• Limited user interaction with the system
  • Lack of a user-centric model and motivation for a refinement

Solution? Interactive Query Relaxation

Page 11: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Interactive Query Relaxation

[Figure: interactive dialogue over CARDB. The query {Alarm, DSL, Manual} returns {}. The system asks "Remove DSL?" and the user answers NO; it then asks "Remove Alarm?" and the user answers YES. Result: {Askari, A10, …}]

Page 12: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Applications
• Small hand-held mobile devices
• Interaction with an agent via telephone
• Reservations in restaurants
• …

Page 13: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Outline
• Background
  • Query relaxation
  • Interactive query relaxation
    • User Model
    • Query Relaxation Tree
• Problem definition
• Solutions
  • Exact Solutions
  • Approximate Solution
• Experimental Results
• Conclusions

Page 14: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Query Relaxation Tree

[Figure: Query Relaxation Tree for the example query, with choice nodes "Relax DSL?" and "Relax Alarm?"; after a refusal, DSL is not relaxable anymore.]

Problem: find the optimal path that maximizes or minimizes some user-centric quantity

Page 15: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Query Relaxation Tree
• Nodes represent
  • the next relaxation proposed (relaxation nodes)
  • yes/no user choices (choice nodes)
• Leaves represent
  • a non-relaxable query
  • a non-empty query
• Choice branches carry the probabilities relpref_yes and relpref_no
• A refused relaxation cannot be relaxed further (hard constraint)
• For each node we compute a cost that depends on the optimization adopted (Dynamic, Semi-Dynamic, Static)

Page 16: A  Probabilistic Optimization Framework for the Empty-Answer Problem

User-Centric Model
• Prior(t, Q, Q’): the user's knowledge about the existence of a tuple t satisfying the relaxed query Q’
• Pref(t, Q’) [preference function]: the user's preference for a tuple t given the query Q’

[Figure: the database DB and the relaxed query Q’, with tuples t and an uncertain tuple t?]

Page 17: A  Probabilistic Optimization Framework for the Empty-Answer Problem

User-Centric Model

Q: What is the probability that the user says NO to a relaxation?
A: The probability that the user doesn’t like any of the tuples that satisfy the relaxed query Q’
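One way to turn this answer into a number is sketched below. It rests on two assumptions that the slides do not state: the tuples are judged independently, and the probability that the user likes a tuple t is Prior(t, Q, Q’) times Pref(t, Q’). The function name `relpref_no` and the input lists are hypothetical.

```python
import math

def relpref_no(priors, prefs):
    """Probability that the user likes none of the tuples of the relaxed query.

    priors[i] and prefs[i] play the role of Prior(t, Q, Q') and Pref(t, Q')
    for the i-th tuple; combining them by multiplication is an assumption.
    """
    return math.prod(1.0 - prior * pref for prior, pref in zip(priors, prefs))

p_no = relpref_no([1.0, 0.5], [0.5, 0.4])   # (1 - 0.5) * (1 - 0.2)
p_yes = 1.0 - p_no
print(round(p_no, 3), round(p_yes, 3))
```

With this reading, a relaxed query whose tuples are all unattractive to the user gets a NO probability close to 1.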

Page 18: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Problem Definition

Given a query Q and a database D, find the sequence of relaxations in the Query Relaxation Tree that achieves one of three separate goals:
(a) Minimize the number of relaxations (Dynamic)
(b) Maximize the user satisfaction (Semi-Dynamic)
(c) Maximize some profit/benefit (Static)

Page 19: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Cost function

[Equation: cost of a node in the Query Relaxation Tree]

where optimize = min if the goal is Dynamic (minimum number of steps) and max otherwise

Cost of a leaf:
• 0 for Dynamic (minimize the number of steps)
• Maximum preference (using pref) of the tuples for Semi-Dynamic
• Maximum value (e.g. price, revenue) of the tuples for Static
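The formula itself is not preserved in the transcript. A reading consistent with the later slides (a sum at choice nodes, a min or max at relaxation nodes, and the term relpref * (1 + cost(n'))) is sketched below. The step cost c is an assumption: c = 1 matches the Dynamic term on the later slide, while its value for the other two goals is not given in the slides.

```latex
\begin{align*}
\mathrm{Cost}(n_{\mathrm{choice}})
  &= \sum_{b \in \{\mathrm{yes},\,\mathrm{no}\}}
     \mathrm{relpref}_b \cdot \bigl(c + \mathrm{Cost}(n'_b)\bigr) \\
\mathrm{Cost}(n_{\mathrm{relax}})
  &= \operatorname*{optimize}_{n_{\mathrm{choice}} \,\in\, \mathrm{children}(n_{\mathrm{relax}})}
     \mathrm{Cost}(n_{\mathrm{choice}})
\end{align*}
```

Here n'_b denotes the child reached by answer b, and the recursion bottoms out at the leaf costs listed above.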

Page 20: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Outline
• Background
  • Query relaxation
  • Interactive query relaxation
    • User Model
    • Query Relaxation Tree
• Problem definition
• Solutions
  • Exact Solutions (FullTree and FastOpt)
  • Approximate Solution
• Experimental Results
• Conclusions

Page 21: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Exact Solution (FullTree)

Input: query Q, database D
Output: optimal cost
1. Construct the Query Relaxation Tree
2. Compute the cost of each node (bottom-up)
3. Return the cost of the root
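The three steps can be sketched for the Dynamic goal as a recursion that builds and evaluates the tree in one pass. This is an illustration under stated assumptions, not the authors' implementation: queries and tuples are modelled as sets of attribute names, `relpref_no` is a caller-supplied function (here a hypothetical constant 0.3), and each step costs 1.

```python
def answers(query, db):
    """Tuples of `db` that contain every attribute required by `query`."""
    return [t for t in db if query <= t]

def full_tree_cost(query, db, relpref_no, hard=frozenset()):
    """Expected number of relaxation steps (Dynamic goal) for this node."""
    relaxable = query - hard
    # Leaves: non-empty query, or nothing left to relax; both cost 0.
    if answers(query, db) or not relaxable:
        return 0.0
    best = float("inf")
    for attr in sorted(relaxable):            # one choice node per relaxation
        p_no = relpref_no(query, attr)
        yes_cost = full_tree_cost(query - {attr}, db, relpref_no, hard)
        # A refused relaxation becomes a hard constraint.
        no_cost = full_tree_cost(query, db, relpref_no, hard | {attr})
        choice = (1 - p_no) * (1 + yes_cost) + p_no * (1 + no_cost)
        best = min(best, choice)              # relaxation node: min over choices
    return best

db = [frozenset({"DSL", "Manual"}), frozenset({"Alarm"})]
query = frozenset({"Alarm", "DSL", "Manual"})
cost = full_tree_cost(query, db, lambda q, a: 0.3)
print(round(cost, 3))  # 1.6
```

In this toy database the best first proposal is to relax Alarm, since accepting it immediately yields a non-empty answer. The recursion visits every node, which is why the exact approach is exponential in the query size.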

Page 22: A  Probabilistic Optimization Framework for the Empty-Answer Problem

FullTree Algorithm (Dynamic)

[Figure: example Query Relaxation Tree with costs computed bottom-up; leaves have cost 0 and the choice branches carry probabilities 0.3 and 0.7.]

Page 23: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Fast Solution (FastOpt)

Idea: prune the unpromising branches in advance and expand only the good ones
• Associate an upper and a lower bound with each node
• The upper bound is the cost of the node when the probability is 1 on the no nodes
• The lower bound is the opposite: the cost of the node when the probability is 1 on the yes nodes
• Remove a node if its lower bound is greater than the upper bound of some sibling node
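The pruning rule in the last bullet can be sketched as follows for the Dynamic (minimization) goal. The function name `prune_siblings`, the dict layout and the bound values are hypothetical; how the bounds are computed is left out.

```python
def prune_siblings(bounds):
    """Keep only siblings whose lower bound does not exceed the best upper bound.

    `bounds` maps a node label to a (lower, upper) pair of cost bounds.
    """
    best_upper = min(upper for _, upper in bounds.values())
    return {node: b for node, b in bounds.items() if b[0] <= best_upper}

siblings = {
    "relax DSL": (1.0, 3.0),
    "relax Alarm": (1.0, 1.5),
    "relax Manual": (2.0, 4.0),   # lower bound 2.0 > 1.5, so it is pruned
}
kept = prune_siblings(siblings)
print(sorted(kept))
```

A node whose best case is still worse than a sibling's worst case can never be the minimum, so removing it does not change the optimal cost.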

Page 24: A  Probabilistic Optimization Framework for the Empty-Answer Problem

FastOpt Algorithm (Dynamic)

[Figure: example tree in which an unpromising branch is pruned.]

Page 25: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Outline
• Background
  • Query relaxation
  • Interactive query relaxation
    • User Model
    • Query Relaxation Tree
• Problem definition
• Solutions
  • Exact Solutions (FullTree and FastOpt)
  • Approximate Solution
• Experimental Results
• Conclusions

Page 26: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Approximate Solution (CDR)

Idea: compute the cost distribution of each node and expand only the node that maximizes the probability that its cost is less than the cost of all its siblings.
1. Associate a b-size histogram with each node
2. Construct the tree for the first L levels
   1. Assign a uniform probability to nodes
   2. Use convolution to find the sum (choice nodes) / min (relaxation nodes) distributions for costs
   3. Expand the branch that has the highest probability of having the lower cost

Page 27: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Compute Cost Distributions
• Query size 5
• Choice node at level 2, cost uniformly distributed in [1,3]
• Compute relpref * (1 + cost(n')) (remember the cost formula)

Page 28: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Compute Cost Distributions

Sum the distributions of the yes child and the no child using sum-convolution

[Figure: probability distributions of the child nodes (panels labelled "Probability distribution of nyes")]
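A minimal sketch of this step, assuming each cost distribution is stored as a dict from cost value to probability (a simplification of the b-size histograms) and that the two children are independent. The helper names, the discrete uniform distribution on {1, 2, 3} and the relpref values of 0.5 are all hypothetical.

```python
from collections import defaultdict

def weighted(dist, relpref, step=1.0):
    """Distribution of relpref * (step + cost) for a child cost distribution."""
    out = defaultdict(float)
    for cost, p in dist.items():
        out[round(relpref * (step + cost), 9)] += p
    return dict(out)

def sum_convolution(d1, d2):
    """Distribution of X + Y for independent X ~ d1 and Y ~ d2."""
    out = defaultdict(float)
    for x, px in d1.items():
        for y, py in d2.items():
            out[round(x + y, 9)] += px * py
    return dict(out)

uniform = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}    # child cost uniform on {1, 2, 3}
choice = sum_convolution(weighted(uniform, 0.5), weighted(uniform, 0.5))
print(sorted(choice.items()))
```

The keys are rounded so that sums that differ only by floating-point noise fall into the same bucket, which is the role the fixed histogram bins play in the slides.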

Page 29: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Compute Cost Distributions

Compute the min-convolution of the children of the relaxation node
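For a relaxation node under the Dynamic goal, the cost is the minimum over its children, so the distribution of that minimum is needed. The sketch below assumes independent children and dict-based distributions (cost value to probability); `min_convolution` is a hypothetical name and the input numbers are made up.

```python
from collections import defaultdict

def min_convolution(d1, d2):
    """Distribution of min(X, Y) for independent X ~ d1 and Y ~ d2."""
    out = defaultdict(float)
    for x, px in d1.items():
        for y, py in d2.items():
            out[min(x, y)] += px * py
    return dict(out)

child_a = {1: 0.5, 3: 0.5}
child_b = {2: 1.0}
relax_node = min_convolution(child_a, child_b)
print(sorted(relax_node.items()))  # [(1, 0.5), (2, 0.5)]
```

With more than two children the function can be applied repeatedly, since the minimum is associative.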

Page 30: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Choose the Branch to Expand

Idea: for each child of the root, compute the probability that its cost is smaller than that of its siblings, and choose the child with the highest probability

[Figure: two children n1 and n2 with Pr(n1<n2) = 0.6 and Pr(n2<n1) = 0.4; the child with the higher probability is marked "Expand this!"]
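The comparison between two siblings can be sketched as below, again assuming independent costs stored as dicts from cost value to probability. The function name `prob_less` is hypothetical, and the two distributions are invented so that they reproduce the probabilities 0.6 and 0.4 shown on the slide.

```python
def prob_less(d1, d2):
    """Pr(X < Y) for independent X ~ d1 and Y ~ d2."""
    return sum(px * py
               for x, px in d1.items()
               for y, py in d2.items()
               if x < y)

n1 = {1: 0.6, 3: 0.4}
n2 = {2: 1.0}
p12 = prob_less(n1, n2)
p21 = prob_less(n2, n1)
expand = "n1" if p12 >= p21 else "n2"
print(p12, p21, expand)
```

Ties (equal costs) count for neither side here; how ties are handled in the original method is not stated in the slides.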

Page 31: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Outline
• Background
  • Query relaxation
  • Interactive query relaxation
    • User Model
    • Query Relaxation Tree
• Problem definition
• Solutions
  • Exact Solutions (FullTree and FastOpt)
  • Approximate Solution
• Experimental Results
• Conclusions

Page 32: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Experimental Setup
• Datasets:
  • US Home dataset: 38k tuples, 18 attributes
  • Car dataset: 100k tuples, 31 attributes
  • Synthetic datasets: 20k to 500k tuples
• Baseline algorithms:
  • Query refinement algorithm [Mishra09] (QueryRef)
  • Random relaxation
  • Greedy: choose the first non-empty relaxation, otherwise a random one

Page 33: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Experimental Setup
• Effectiveness:
  • Query time
  • Size of the tree (number of nodes)
  • Cost of the root (expected number of steps)
• CDR calibration:
  • Impact of L and of the number of buckets
• User study:
  • 125 users with Mechanical Turk
  • Random queries with 4-8 attributes
  • Evaluation of the usefulness of the system

Page 34: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Root Cost
• CDR is close to optimal
• QueryRef is 30% worse on average
• Random is 150% worse than FullTree

Page 35: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Goal comparison
• All the objective functions correctly optimize their goals
• Dynamic and Semi-Dynamic are very similar in performance

Page 36: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Query Time
• Exponential behaviour
• Efficient for small queries
• 1.4 sec for query size 10

Page 37: A  Probabilistic Optimization Framework for the Empty-Answer Problem

User Study
• Q1 - Rate the suggested refinements
• Q2 - Did you like the system guiding you?
• Q3 - Did the system help you arrive at the results quickly?
• Q4 - Did you prefer using the help of this system to relaxing the query by yourself?

Page 38: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Conclusions

We propose
• a novel, principled, user-centric and interactive approach for the empty-answer problem
• two exact algorithms and an approximate algorithm

We show that
• the framework can deal with the combinatorial explosion
• the user effort is minimized
• the user is generally satisfied by the system

Page 39: A  Probabilistic Optimization Framework for the Empty-Answer Problem


Page 40: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Bibliography

[Mishra09] C. Mishra and N. Koudas, “Interactive query refinement,” in EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, “Minimum-effort driven dynamic faceted search in structured databases,” in CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic ranking of database query results,” in VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.

Page 41: A  Probabilistic Optimization Framework for the Empty-Answer Problem

Cost Probability

Probability that the cost of n1 is less than the cost of n2

[Figure: relaxation of the root]