1 ranking inexact answers. 2 ranking issues when inexact querying is allowed, there may be many...


Upload: april-cobb

Post on 30-Dec-2015


Ranking Inexact Answers

Ranking Issues

• When inexact querying is allowed, there may be MANY answers
  – Different answers have different levels of incompleteness

• Ranking the answers allows the user to quickly see the (hopefully) most relevant answers

• Preference: Create answers in ranking order
  – Why is this important?

• We will consider several different approaches to this problem

Tree Pattern Relaxation

Amer-Yahia, Cho, Srivastava

EDBT 2002

Tree Patterns

• Queries are tree patterns, as considered in previous lessons

[Figure: a tree pattern rooted at Book with children Collection and Editor. Editor has the child Name and, through a double-line edge, the descendant Address. A double line indicates a descendant edge.]
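The pattern on this slide can be written down as a small data structure. This is a minimal sketch, not the representation used in the paper: the dictionary encoding, the names `QUERY`, `children` and `leaves`, and the assumption that labels are unique within a pattern are all illustrative.

```python
# Hypothetical encoding of the pattern on this slide: each node label maps to
# None (the root) or to (parent_label, edge_kind), where "c" is a parent-child
# edge and "d" is an ancestor-descendant edge (the double line).
QUERY = {
    "Book": None,
    "Collection": ("Book", "c"),
    "Editor": ("Book", "c"),
    "Name": ("Editor", "c"),
    "Address": ("Editor", "d"),
}


def children(pattern, label):
    """Labels attached directly below `label` in the pattern, sorted by name."""
    return sorted(n for n, e in pattern.items() if e is not None and e[0] == label)


def leaves(pattern):
    """Pattern nodes with nothing attached below them, sorted by name."""
    return sorted(n for n in pattern if not children(pattern, n))


print(children(QUERY, "Book"))
print(leaves(QUERY))
```

Keeping the edge kind next to the parent makes it easy to express the relaxations that follow as small edits to this dictionary.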

Relaxed Queries

• Four types of “relaxations” are allowed on the trees

• Node Generalization: Assume that we know a relationship of types/super-types among labels. Allow a label to be changed to its super-type

[Figure: the pattern before and after node generalization. The root label Book is replaced by its super-type Document; Collection, Editor, Name and Address are unchanged.]
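Node generalization can be sketched as a relabeling step. The dictionary encoding of the pattern, the `SUPERTYPE` map and the function name `generalize_node` are assumptions made for this example; only the Book to Document step comes from the slide.

```python
# Assumed type hierarchy: only Book -> Document is taken from the slide.
SUPERTYPE = {"Book": "Document"}

# Hypothetical encoding: label -> None (root) or (parent_label, edge_kind),
# with "c" = parent-child and "d" = ancestor-descendant.
QUERY = {
    "Book": None,
    "Collection": ("Book", "c"),
    "Editor": ("Book", "c"),
    "Name": ("Editor", "c"),
    "Address": ("Editor", "d"),
}


def generalize_node(pattern, label, supertype):
    """Return a copy of `pattern` in which `label` is renamed to its super-type."""
    if label not in supertype:
        raise ValueError(f"no super-type known for {label!r}")
    new_label = supertype[label]
    relaxed = {}
    for node, edge in pattern.items():
        # Re-point edges whose parent is the node being renamed.
        if edge is not None and edge[0] == label:
            edge = (new_label, edge[1])
        relaxed[new_label if node == label else node] = edge
    return relaxed


relaxed = generalize_node(QUERY, "Book", SUPERTYPE)
print(relaxed)
```

The original pattern is left untouched, so several relaxations can be derived from the same starting query.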

Relaxed Queries

• Leaf Node Deletion: Delete a leaf node (and its incoming edge) from the tree

[Figure: the pattern before and after leaf node deletion. The leaf Collection and its incoming edge are removed; Book, Editor, Name and Address remain.]

Relaxed Queries

• Edge Generalization: Change a parent-child edge to an ancestor-descendant edge

[Figure: the pattern before and after edge generalization. The parent-child edge from Book to Collection becomes an ancestor-descendant edge.]

Relaxed Queries

• Subtree Promotion: A query subtree can be promoted so that it is directly connected to its former grandparent by an ancestor-descendant edge

[Figure: the pattern before and after subtree promotion. The subtree rooted at Name is detached from Editor and connected to Book, its former grandparent, by an ancestor-descendant edge.]

Composing Relaxations

• Relaxations can be composed. Are the following relaxations of Q?

[Figure: the query Q (Book with children Collection and Editor; Editor with child Name and descendant Address) and three candidate patterns. The candidates use the labels Book and Collection; Book, Collection, Address and Name; and Document and Address.]

Approximate Answers and Ranking

• An approximate answer to Q is an exact answer to a relaxed query derived from Q

• In order to give different answers different rankings, tree patterns are weighted

• Each node and edge has 2 weights: the value when exactly satisfied, and the value when satisfied by a relaxation

[Figure: the query pattern with a weight pair on each of its five nodes and four edges. The pairs shown are (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0) and (3, 0).]

A fragment of a document that exactly satisfies the query will have a score of 45, the sum of the nine exact weights
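The score of 45 can be checked by adding up the first component of every weight pair. The sketch below lists the pairs in the order they appear on the slide; which pair belongs to which node or edge is not recoverable from the transcript, so the list order, the name `score`, and the rule that a component which is not exactly satisfied contributes its second weight are assumptions.

```python
# (exact, relaxed) weight pairs as listed on the slide. The mapping of pairs
# to specific nodes and edges is unknown here; only the totals are checked.
WEIGHTS = [(7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0)]


def score(weights, exactly_satisfied):
    """Sum the exact weight where the flag is True, else the relaxed weight."""
    return sum(exact if flag else relaxed
               for (exact, relaxed), flag in zip(weights, exactly_satisfied))


all_exact = score(WEIGHTS, [True] * len(WEIGHTS))
one_relaxed = score(WEIGHTS, [False] + [True] * (len(WEIGHTS) - 1))
print(all_exact, one_relaxed)
```

With every component exactly satisfied the total is 7 + 4 + 2 + 6 + 5 + 8 + 6 + 4 + 3 = 45; relaxing only the first listed component replaces 7 by 1 and gives 39.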

Example Ranking

[Figure: the weighted query pattern from the previous slide, next to a document fragment containing the nodes Book, Person, Details, Name and Address, with the values Sam and NY.]

How much would this answer score?


Problem Definition

Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t

• Naive strategy to solve the problem:
  – Find all relaxations of Q
  – For each relaxation, compute all exact answers
  – Remove answers with score below t

• Is this a good strategy?

Problem Definition

Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t

• A better strategy to compute an answer to a relaxation of a query:
  – Intuition: Compute the query as a series of joins
  – Can use stack-merge algorithms (studied before) for computing joins
  – Filter out intermediate results whose scores are too low

The Query Plan

• We now show how to derive a plan for evaluating queries in this setting

• First, we show how an exact plan is derived

• Then, we consider how each individual relaxation can be added in

• Finally, we show the complete relaxed plan

Query Plan: Exact Answers

[Figure: the weighted query pattern and its exact evaluation plan. The leaves of the plan are the node lists Book, Collection, Editor, Name and Address, combined by the joins below.]

c(Book, Collection)
c(Book, Editor)
c(Editor, Name)
d(Editor, Address)

c(x,y) = y is child of x
d(x,y) = y is descendant of x

Query Plan: Exact Answers

[Figure: the same weighted query pattern and exact plan as on the previous slide, with the joins c(Book, Collection), c(Book, Editor), c(Editor, Name) and d(Editor, Address).]

Remember, to compute a join, e.g., of Book and Collection, we actually find the list of Books and the list of Collections (from the index) and run the stack-merge algorithm
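The slides do not spell out the stack-merge algorithm, so the following is only a sketch of a stack-based structural join in that spirit. It assumes each element is stored with a region encoding (start and end positions plus depth) and that both input lists are sorted by start position; the `Node` type, the function name and the sample positions are hypothetical.

```python
from typing import NamedTuple


class Node(NamedTuple):
    start: int   # position where the element opens
    end: int     # position where the element closes
    level: int   # depth in the document tree
    label: str


def structural_join(ancestors, descendants, child_only=False):
    """Join two lists sorted by `start`; return (ancestor, descendant) pairs."""
    pairs, stack = [], []
    a = d = 0
    while d < len(descendants):
        take_anc = a < len(ancestors) and ancestors[a].start < descendants[d].start
        nxt = ancestors[a] if take_anc else descendants[d]
        # Drop stacked ancestors that closed before the next element opens.
        while stack and stack[-1].end < nxt.start:
            stack.pop()
        if take_anc:
            stack.append(nxt)
            a += 1
        else:
            # Every element still on the stack encloses `nxt`.
            for anc in stack:
                if not child_only or anc.level + 1 == nxt.level:
                    pairs.append((anc, nxt))
            d += 1
    return pairs


# Hypothetical document: Book > Editor > {Name, Details > Address}
editors = [Node(4, 11, 1, "Editor")]
names = [Node(5, 6, 2, "Name")]
addresses = [Node(8, 9, 3, "Address")]

name_pairs = structural_join(editors, names, child_only=True)   # c(Editor, Name)
address_pairs = structural_join(editors, addresses)             # d(Editor, Address)
print(name_pairs, address_pairs)
```

Each input list is scanned once, and the stack holds only the currently open ancestors, which is what makes this kind of join attractive as a building block for the plan.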

Adding Relaxations into Plan

• Node generalization: Book relaxed to Document

[Figure: the exact plan with the Book list replaced by Document. The joins c(Book, Collection) and c(Book, Editor) become c(Document, Collection) and c(Document, Editor); c(Editor, Name) and d(Editor, Address) are unchanged.]

Adding Relaxations into Plan

• Edge generalization: Relax Editor-Name Edge

[Figure: the exact plan, with the join predicate c(Editor, Name) replaced as shown below.]

c(Editor, Name) or (Not exists c(Editor, Name) and d(Editor, Name))

Written in short as: c(Editor, Name) or d(Editor, Name)

We only allow relaxations when a direct child does not exist
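The rule "relax only when a direct child does not exist" can be sketched as a two-step lookup: try the child edge first and fall back to the descendant edge only if nothing matched. The parent-pointer document, the node identifiers and the function name are all assumptions made for this example.

```python
def match_child_or_descendant(anchor, candidates, parent):
    """Evaluate c(anchor, x) or (not exists c(anchor, x) and d(anchor, x)).

    Returns (matches, exact): `exact` is True when the child edge was
    satisfied and False when the descendant relaxation had to be used.
    """
    def is_descendant(node):
        while node in parent:
            node = parent[node]
            if node == anchor:
                return True
        return False

    direct = [c for c in candidates if parent.get(c) == anchor]
    if direct:
        return direct, True
    return [c for c in candidates if is_descendant(c)], False


# Hypothetical document: editor1 has a Name only below Details,
# editor2 has a Name as a direct child.
parent = {
    "editor1": "book1", "details1": "editor1", "name1": "details1",
    "editor2": "book1", "name2": "editor2",
}
names = ["name1", "name2"]

print(match_child_or_descendant("editor1", names, parent))
print(match_child_or_descendant("editor2", names, parent))
```

Returning the `exact` flag alongside the matches lets a caller pick the first or the second weight of the corresponding edge when scoring.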

Adding Relaxations into Plan

• Subtree Promotion: Promote tree rooted at Name

[Figure: the exact plan, with the join predicate c(Editor, Name) replaced as shown below.]

c(Editor, Name) or (Not exists c(Editor, Name) and d(Book, Name))

Written in short as: c(Editor, Name) or d(Book, Name)

Adding Relaxations into Plan

• Leaf Node Deletion: Make Address Optional

[Figure: the exact plan, with the join that brings in Address, d(Editor, Address), marked as an outer join.]

Outer Join Operator: join when possible, but do not discard values that cannot be joined
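The outer join behaviour can be illustrated with a nested-loop left outer join. This is a sketch of the operator's semantics only, not of how the plan evaluates it; the function name, the `None` placeholder and the sample data are assumptions.

```python
def left_outer_join(left, right, matches):
    """Pair each left item with its matching right items, or with None."""
    result = []
    for l in left:
        partners = [r for r in right if matches(l, r)]
        if partners:
            result.extend((l, r) for r in partners)
        else:
            # Keep the left item even though nothing on the right joins with it.
            result.append((l, None))
    return result


# Hypothetical data: only editor e1 has an Address below it.
editors = ["e1", "e2"]
addresses = ["a1"]
address_of = {"a1": "e1"}

joined = left_outer_join(editors, addresses, lambda e, a: address_of[a] == e)
print(joined)
```

Editor e2 survives with an empty Address slot, which is what making Address optional means for the answer.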

Combining All Possible Relaxations

• All approximate answers can be derived from the following query plan

[Figure: the weighted query pattern and the combined plan over the lists Document, Collection, Editor, Name and Address, with the join predicates below.]

c(Book, Collection) OR d(Document, Collection)
c(Document, Editor) OR d(Document, Editor)
c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
d(Editor, Address) OR d(Document, Address)

Creating “Best Answers”

• Want to find answers whose ranking is over the threshold t

• Naive solution: Create all answers. Delete answers with low ranking

• Algorithm Thres: Goal of the algorithm is to prune intermediate answers that cannot possibly meet the specified threshold

Associating Nodes with Maximal Weight

• The maximal weight of a node in the evaluation plan is the largest value by which the score of an intermediate answer computed for that node can grow

[Figure: the combined relaxed plan, with the join predicates below.]

c(Book, Collection) OR d(Document, Collection)
c(Document, Editor) OR d(Document, Editor)
c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
d(Editor, Address) OR d(Document, Address)

[Figure: the weighted query pattern and the combined relaxed plan, with each plan node annotated by its maximal weight. The annotations shown are (38), (39), (30), (40), (39), (41), (21), (7) and (0).]

Algorithm Thres

• Relaxed query evaluation plan is computed bottom-up
  – Note that the joins are computed for all matching intermediate results at the same time

• At each step, intermediate results are computed, along with their scores

• If the sum of an intermediate result score with the maximal weight of the current node is less than the threshold, prune the intermediate result
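The pruning test in the last bullet can be sketched as a filter applied after each plan step. The function name and the sample numbers are illustrative; in particular, pairing a score of 27 with a maximal weight of 7 is an assumption, not a reading of the example slide.

```python
def thres_prune(intermediate, max_weight, threshold):
    """Keep the (answer, score) pairs that can still reach `threshold`.

    `max_weight` is the most a score can still grow from the current plan
    node, so score + max_weight is an upper bound on the final score.
    """
    return [(answer, score) for answer, score in intermediate
            if score + max_weight >= threshold]


# Illustrative numbers only: with threshold 35 and a maximal weight of 7,
# a score of 27 can reach at most 34 and is pruned; 30 can reach 37.
kept = thres_prune([("r1", 27), ("r2", 30)], max_weight=7, threshold=35)
print(kept)
```

Because the bound is an upper bound, this filter never discards an intermediate result that could have ended at or above the threshold.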

Example: Threshold = 35

[Figure: a document fragment with the nodes Book, Editor, Details, Name and Address and the values Sam and NY; the weighted query pattern; and the combined relaxed plan annotated with the maximal weights (38), (39), (30), (40), (39), (41), (21), (7) and (0). The numbers 7, 7, 16 and 27 are shown alongside the plan.]

When will the answer be pruned?

Test Yourself

Example Ranking

[Figure: the weighted query pattern with the weight pairs (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0) and (3, 0), next to a document fragment containing the nodes Document, Collection, Name and Address, with the values Sam and NY.]

How much would this answer score?

Query Plan

[Figure: a weighted tree pattern over the nodes Book, Collection, Editor, Name, FName and LName. The weight pairs shown are (8, 5), (7, 1), (4, 3), (2, 1), (5, 0), (6, 0), (6, 0), (2, 1), (2, 1), (2, 0) and (1, 0).]

1. What will the exact plan look like?

2. What will the plan look like if all possible relaxations are added?

3. What is the maximal weight by which the score of an intermediate answer can grow, for each node?