A Probabilistic Optimization Framework for the Empty-Answer Problem
Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis
Talk by Davide Mottin at Yahoo! Research Barcelona
Who am I?
• Born in Marostica
• Live in Trento
• I’m a member of the dbTrento group at the University of Trento
• Advisors: Themis Palpanas and Yannis Velegrakis
Empty-Answer Problem
Figure: the query {Alarm, DSL, Manual} is issued against CARDB and returns {} (no answer).
Issues
• Users need a product matching their preferences
• It is difficult to propose an approximate answer close to the user's needs
• The system does not provide sufficient help
Outline
• Background
  • Query relaxation
  • Interactive query relaxation
• User Model
• Query Relaxation Tree
• Problem definition
• Solutions
  • Exact Solutions
  • Approximate Solution
• Experimental Results
• Conclusions
Existing Solutions
• Ranking function: propose ranked results that are close to the user preferences; both IR [Baeza11] and database solutions [Chaudhuri04]
• Query relaxation: remove or change one of the conditions in the user query [Mishra09]
Query Relaxation
Figure: the query {Alarm, DSL, Manual} on CARDB returns {}; the relaxed query is {DSL, Manual}.
How Many Relaxations?
Exponential in the size of the query
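To make the exponential growth concrete, here is a minimal sketch that enumerates the candidate relaxations of a query. It assumes that a relaxation simply drops a non-empty subset of the query's conditions; the function name `relaxations` is hypothetical and not taken from the talk.

```python
from itertools import combinations

def relaxations(conditions):
    """Yield every relaxed query obtained by dropping a non-empty subset of conditions."""
    conditions = list(conditions)
    for k in range(1, len(conditions) + 1):
        for dropped in combinations(conditions, k):
            yield tuple(c for c in conditions if c not in dropped)

query = ["Alarm", "DSL", "Manual"]
all_relaxed = list(relaxations(query))
# Under this assumption a query with n conditions has 2**n - 1 relaxations.
print(len(all_relaxed))
```

With 10 conditions the same enumeration already yields 1023 candidates, which is why proposing all of them to the user is impractical.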
Challenges
• Too many relaxations proposed
• Lack of a principled method to propose the next relaxation
• Exploring all the relaxations is impractical
• Limited user interaction with the system
• Lack of a user-centric model and motivation for a refinement
Solution: Interactive Query Relaxation
Interactive Query Relaxation
Figure: the query {Alarm, DSL, Manual} on CARDB returns {}. The system asks "Remove DSL?" and the user answers NO. The system then asks "Remove Alarm?" and the user answers YES. Result: {Askari, A10, …}
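The dialogue above can be sketched as a simple loop. This is only an illustration under stated assumptions: the database is a list of rows with boolean attributes, the rows are toy data, the next proposal is picked naively (the talk is about choosing it optimally), and the names `answers`, `interactive_relaxation` and `ask` are hypothetical.

```python
def answers(db, query):
    """Return the rows that satisfy every condition (attribute must be True)."""
    return [row for row in db if all(row[attr] for attr in query)]

def interactive_relaxation(db, query, ask):
    """Propose dropping one condition at a time until the answer is non-empty.

    ask(attr) returns True if the user accepts removing attr.
    A refused condition becomes a hard constraint and is never proposed again.
    """
    query = list(query)
    hard = set()
    while not answers(db, query):
        candidates = [a for a in query if a not in hard]
        if not candidates:
            return query, []          # nothing left to relax
        attr = candidates[0]          # naive choice, not the optimized one
        if ask(attr):
            query.remove(attr)
        else:
            hard.add(attr)
    return query, answers(db, query)

# Toy data (hypothetical): no car has Alarm, DSL and Manual together.
db = [
    {"name": "car_a", "Alarm": False, "DSL": True, "Manual": True},
    {"name": "car_b", "Alarm": True, "DSL": False, "Manual": True},
]
scripted = {"DSL": False, "Alarm": True}   # user refuses DSL, accepts Alarm
final_query, result = interactive_relaxation(
    db, ["DSL", "Alarm", "Manual"], lambda attr: scripted[attr]
)
```

Here the scripted user keeps DSL as a hard constraint and gives up Alarm, so the loop stops at the relaxed query {DSL, Manual}.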
Applications
• Small mobile hand-held devices
• Interaction with an agent via telephone
• Reservations in restaurants …
Query Relaxation Tree
Figure: a tree whose nodes ask "Relax DSL?" and "Relax Alarm?"; on one branch DSL is not relaxable anymore.
Problem: find the optimal path that maximizes or minimizes some user-centric quantity
Query Relaxation Tree
Nodes represent
• the next relaxation proposed (relaxation nodes)
• the yes/no user choices (choice nodes)
Leaves represent
• a non-relaxable query
• a non-empty query
Choice branches have relpref_yes and relpref_no probabilities
A refused relaxation cannot be relaxed further (hard constraint)
For each node we compute a cost that depends on the optimization adopted (Dynamic, Semi-Dynamic, Static)
User-Centric Model
Prior(t, Q, Q’)
• User knowledge about the existence of a tuple t satisfying the relaxed query Q’
Pref(t, Q’) [preference function]
• User preference about a tuple given the query Q’
Figure: database DB, relaxed query Q’, and candidate tuples t.
User-Centric Model
Q: What is the probability that the user says NO to a relaxation?
A: It is the probability that the user doesn’t like any of the tuples that satisfy the relaxed query Q’
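As a rough illustration of this answer, the sketch below computes the probability of a NO as the product of the per-tuple probabilities of not liking a tuple. It is a simplification, not the talk's exact model: it assumes the user judges tuples independently and that each tuple already comes with a single like-probability (in the talk this would be derived from Prior and Pref). The names `prob_no` and `prob_yes` are hypothetical.

```python
import math

def prob_no(like_probs):
    """Probability that the user likes none of the tuples of the relaxed query."""
    return math.prod(1.0 - p for p in like_probs)

def prob_yes(like_probs):
    """Probability that the user likes at least one tuple."""
    return 1.0 - prob_no(like_probs)

# Toy like-probabilities (hypothetical) for the two tuples returned by Q'.
like_probs = [0.5, 0.2]
p_no = prob_no(like_probs)    # (1 - 0.5) * (1 - 0.2)
p_yes = prob_yes(like_probs)
```

Under these assumptions a relaxation that returns many well-liked tuples is unlikely to be refused, while one that returns no tuples is refused with probability 1.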
Problem Definition
Given a query Q and a database D, find the sequence of relaxations in the Query Relaxation Tree that achieves one of three separate goals:
(a) Minimize the number of relaxations (Dynamic)
(b) Maximize the user satisfaction (Semi-Dynamic)
(c) Maximize some profit/benefit (Static)
Cost function
[Formula: cost of a node, defined in terms of optimize]
where optimize = Min if the goal is Dynamic (minimum number of steps) and Max otherwise
Cost of a leaf:
• 0 for Dynamic (minimize the number of steps)
• Max preference (using pref) of the tuples for Semi-Dynamic
• Max value (e.g. price, revenue) of the tuples for Static
Exact Solution (FullTree)
Input: query Q, database D
Output: optimal cost
1. Construct the Query Relaxation Tree
2. Compute the cost of each node (bottom-up)
3. Return the cost of the root
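The three steps can be sketched as a recursion that builds the tree implicitly and returns the root cost for the Dynamic goal. This is a minimal sketch under assumptions, not the authors' implementation: the cost of a choice node is taken to be the probability-weighted sum of (1 + cost of the child), following the "relpref * (1 + cost(n'))" expression used later in the talk; leaves cost 0; the yes-probability is supplied by a hypothetical function `p_yes`; the database is toy data.

```python
def answers(db, query):
    """Rows that satisfy every condition of the query."""
    return [row for row in db if all(row[a] for a in query)]

def full_tree_cost(db, query, p_yes, hard=frozenset()):
    """Expected number of interaction steps (Dynamic goal) under the best strategy.

    query : frozenset of conditions still in the query
    hard  : conditions the user refused to relax (hard constraints)
    p_yes : function (db, query, attr) -> probability the user accepts dropping attr
    """
    candidates = query - hard
    if answers(db, query) or not candidates:
        return 0.0                      # leaf: non-empty or non-relaxable query
    best = float("inf")
    for attr in candidates:             # relaxation node: try each proposal
        p = p_yes(db, query, attr)
        yes = full_tree_cost(db, query - {attr}, p_yes, hard)
        no = full_tree_cost(db, query, p_yes, hard | {attr})
        # choice node: each answer costs one step plus the cost of its subtree
        cost = p * (1 + yes) + (1 - p) * (1 + no)
        best = min(best, cost)          # Dynamic goal: minimize
    return best

# Toy data (hypothetical) and a constant yes-probability of 0.5.
db = [
    {"Alarm": False, "DSL": True, "Manual": True},
    {"Alarm": True, "DSL": False, "Manual": True},
]
root_cost = full_tree_cost(
    db, frozenset({"Alarm", "DSL", "Manual"}), lambda db, q, a: 0.5
)
```

The recursion visits every relaxation and every yes/no outcome, so its running time grows exponentially with the query size, which is what motivates the pruning and approximation techniques that follow.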
FullTree Algorithm (Dynamic)
Figure: example tree annotated with the cost of each node, computed bottom-up from leaves of cost 0, with branch probabilities 0.3 and 0.7.
Fast Solution (FastOpt)
Idea: prune the unpromising branches in advance and expand only the good ones
• Associate an upper and a lower bound with each node
• The upper bound is the cost of the node when the probability is 1 on the no nodes
• The lower bound is the opposite case of the upper bound
• Remove a node if its lower bound is greater than the upper bound of some sibling node
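The pruning rule in the last bullet can be sketched as follows for the Dynamic (minimization) goal. The bound values are made up for illustration and the function name `prune` is hypothetical; how the bounds themselves are computed is not shown here.

```python
def prune(siblings):
    """Keep only the sibling nodes that may still be optimal when minimizing cost.

    siblings: dict name -> (lower_bound, upper_bound)
    A node is dropped if its lower bound exceeds the upper bound of some sibling.
    """
    best_upper = min(ub for _, ub in siblings.values())
    return {
        name: (lb, ub)
        for name, (lb, ub) in siblings.items()
        if lb <= best_upper
    }

# Hypothetical bounds for three candidate relaxations of the same node.
bounds = {
    "relax DSL": (1.0, 2.0),
    "relax Alarm": (2.5, 4.0),
    "relax Manual": (1.5, 3.0),
}
kept = prune(bounds)
```

In this toy case "relax Alarm" is discarded because even its best case (2.5) is worse than the worst case of "relax DSL" (2.0), so its subtree never needs to be expanded.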
FastOpt Algorithm (Dynamic)
Figure: example tree in which an unpromising branch is marked "Prune!!!"
Approximate Solution (CDR)
Idea: compute the cost distribution of each node and expand only the node that maximizes the probability that its cost is less than the cost of all its siblings.
1. Associate a b-size histogram with each node
2. Construct the tree for the first L levels
   1. Assign a uniform probability to nodes
   2. Use convolution to find the sum (choice nodes) / min (relaxation nodes) distributions for costs
3. Expand the branch that has the highest probability of having the lowest cost
Compute Cost Distributions
• Query size 5
• Choice node at level 2: cost uniformly distributed in [1,3]
• Compute relpref * (1 + cost(n')) (remember the cost formula)
Compute Cost Distributions
Sum the distributions of the yes child and the no child using sum-convolution
Figure: probability distribution of n_yes
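A sum-convolution of two cost histograms can be sketched as below. The sketch assumes the two costs are independent and represents each histogram as a dictionary from cost value to probability; the bucketing into b buckets and the scaling by relpref are left out, and the name `sum_convolution` and the input values are hypothetical.

```python
from collections import defaultdict

def sum_convolution(dist_a, dist_b):
    """Distribution of A + B for independent discrete costs A and B."""
    out = defaultdict(float)
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] += pa * pb
    return dict(out)

# Toy histograms (hypothetical) for the yes child and the no child.
n_yes = {1: 0.5, 2: 0.5}
n_no = {1: 0.5, 3: 0.5}
total = sum_convolution(n_yes, n_no)
```

Each pair of child costs contributes its joint probability to the bucket of their sum, so the result is again a probability distribution.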
Compute Cost Distributions
Compute the min-convolution of the children of the relaxation node
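The min-convolution can be sketched in the same style: it gives the distribution of the minimum of two child costs. As before this assumes independent costs stored as cost-to-probability dictionaries, and the name `min_convolution` and the input values are hypothetical.

```python
def min_convolution(dist_a, dist_b):
    """Distribution of min(A, B) for independent discrete costs A and B."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            m = min(a, b)
            out[m] = out.get(m, 0.0) + pa * pb
    return out

# Toy histograms (hypothetical) for two children of a relaxation node.
child_1 = {1: 0.5, 3: 0.5}
child_2 = {2: 1.0}
best_child = min_convolution(child_1, child_2)
```

For a relaxation node with more than two children the function can be applied repeatedly, folding one child at a time into the running result.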
Choose the Branch to Expand
Idea: for each child of the root, compute the probability that its cost is smaller than that of its siblings, and choose the child with the highest probability
Figure: siblings n1 and n2 with Pr(n1<n2) = 0.6 and Pr(n2<n1) = 0.4; n1 has the higher probability, so it is the branch to expand.
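The comparison between two siblings can be sketched as follows, again assuming independent costs stored as cost-to-probability dictionaries. The two toy distributions are hypothetical and were chosen only so that the result matches the 0.6 / 0.4 values on the slide; the name `prob_less` is also hypothetical.

```python
def prob_less(dist_a, dist_b):
    """Pr(A < B) for independent discrete costs A and B."""
    return sum(
        pa * pb
        for a, pa in dist_a.items()
        for b, pb in dist_b.items()
        if a < b
    )

# Toy cost distributions (hypothetical) for the two children of the root.
n1 = {1: 0.6, 3: 0.4}
n2 = {2: 1.0}
scores = {"n1": prob_less(n1, n2), "n2": prob_less(n2, n1)}
to_expand = max(scores, key=scores.get)   # child most likely to be cheapest
```

Ties between costs are counted for neither side in this sketch; with more than two siblings the score of a child would have to compare it against all the others.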
Experimental Setup
Datasets:
• US Home dataset: 38k tuples, 18 attributes
• Car dataset: 100k tuples, 31 attributes
• Synthetic datasets: 20k to 500k tuples
Baseline algorithms:
• Query refinement algorithm [Mishra09] (QueryRef)
• Random relaxation
• Greedy: choose the first non-empty relaxation, otherwise random
Experimental Setup
Effectiveness:
• Query time
• Size of the tree (number of nodes)
• Cost of the root (expected number of steps)
CDR calibration:
• Impact of L and number of buckets
User study:
• 125 users with Mechanical Turk
• Random queries with 4-8 attributes
• Evaluation of the usefulness of the system
Root Cost
• CDR is close to optimal
• QueryRef is 30% worse on average
• Random is 150% worse than FullTree
Goal Comparison
• All the objective functions correctly optimize their goals
• Dynamic and Semi-Dynamic are very similar in performance
Query Time
• Exponential behaviour
• Efficient for small queries
• 1.4 sec for query size 10!
User Study
Q1 - Rate the suggested refinements
Q2 - Did you like the system guiding you?
Q3 - Did the system help you arrive at the results fast?
Q4 - Did you prefer using the help of this system to relaxing the query by yourself?
Conclusions
We propose
• a novel principled, user-centric and interactive approach for the empty-answer problem
• two exact algorithms and an approximate algorithm
We show that
• the framework can deal with the combinatorial explosion
• the user effort is minimized
• the user is generally satisfied by the system
Bibliography
[Mishra09] C. Mishra and N. Koudas, “Interactive query refinement,” in EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, “Minimum-effort driven dynamic faceted search in structured databases,” in CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic ranking of database query results,” in VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.
Cost Probability
Probability that the cost of n1 is less than the cost of n2
Relaxation of the root