Post on 21-Jan-2016

Adaptive Processing of Top-k Queries in XML

Amelie Marian, Sihem Amer-Yahia

Nick Koudas, Divesh Srivastava

Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)

XML

<book>
  <title>wodehouse</title>
  <info>
    <publisher>
      <name>psmith</name>
      <location>london</location>
    </publisher>
    <isbn>1234</isbn>
  </info>
  <price>48.95</price>
</book>

<book>
  <title>wodehouse</title>
  <publish>
    <name>psmith</name>
    <location>london</location>
  </publish>
  <info>
    <isbn>1234</isbn>
  </info>
</book>

XML XPath

pc: parent-child

ad: ancestor-descendant
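The two axes can be checked directly on the book fragments shown earlier. The sketch below is a minimal illustration using Python's standard `xml.etree.ElementTree`; the helper names `has_child` and `has_descendant` are hypothetical and are not part of the slides.

```python
import xml.etree.ElementTree as ET

# The two book fragments from the XML slide, written compactly.
BOOK_A = (
    "<book><title>wodehouse</title><info><publisher><name>psmith</name>"
    "<location>london</location></publisher><isbn>1234</isbn></info>"
    "<price>48.95</price></book>"
)
BOOK_B = (
    "<book><title>wodehouse</title><publish><name>psmith</name>"
    "<location>london</location></publish><info><isbn>1234</isbn></info></book>"
)


def has_child(node, tag):
    """pc axis: True if some direct child of `node` has the given tag."""
    return any(child.tag == tag for child in node)


def has_descendant(node, tag):
    """ad axis: True if some proper descendant of `node` has the given tag."""
    return any(d is not node for d in node.iter(tag))


book_a = ET.fromstring(BOOK_A)
book_b = ET.fromstring(BOOK_B)

# isbn is a grandchild of book in both fragments: ad holds, pc does not.
print(has_child(book_a, "isbn"), has_descendant(book_a, "isbn"))
# price is a child of book only in the first fragment.
print(has_child(book_a, "price"), has_child(book_b, "price"))
```

Every pc match is also an ad match, so replacing a pc predicate with ad can only add answers.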

Scoring Function

The traditional tf*idf function is defined in IR.

tf: term frequency: quantifies the relative importance of a keyword in an individual document.

idf: inverse document frequency: quantifies the relative importance of an individual keyword in the collection of documents.
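As a reminder of the IR baseline, here is a minimal sketch of one common tf*idf variant (relative term frequency times `log(N / df)`). Many weighting variants exist; this particular choice, the function names, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Relative frequency of `term` within one tokenized document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """log(N / df), or 0.0 when no document contains the term."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc_tokens, corpus):
    """Score of `term` for one document of the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus of three tokenized documents.
CORPUS = [
    ["wodehouse", "psmith", "london"],
    ["wodehouse", "jeeves"],
    ["london", "london", "fog"],
]

for doc in CORPUS:
    print(doc, round(tf_idf("london", doc, CORPUS), 4))
```

A keyword that occurs in every document gets idf 0 under this variant, so it never distinguishes one document from another.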

XML unlike traditional IR

An answer to an XPath query need not be an entire document, but can be any node in a document.

An XPath query consists of several predicates linking the returned node to other query nodes, instead of simply “keyword containment in the document” (as in IR).

Scoring Function: XPath Component Predicates

XPath query Q

q0: query answer node

qi, 1 <= i <= l: other query nodes

p(q0, qi): XPath axis between query nodes q0 and qi, i >= 1

PQ (component predicates of Q): the set of predicates {p(q0, qi)}, 1 <= i <= l
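The definitions above can be mirrored in a small data structure. The sketch below is an illustration only: the class names `Predicate` and `Query`, the restriction to the pc and ad axes, and the sample query are all assumptions, not notation from the slides.

```python
from dataclasses import dataclass
from typing import Tuple
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Predicate:
    """Component predicate p(q0, qi): an axis from the answer node to qi."""
    axis: str  # "pc" or "ad"
    tag: str   # tag of the query node qi


@dataclass(frozen=True)
class Query:
    answer_tag: str                     # tag of the answer node q0
    predicates: Tuple[Predicate, ...]   # PQ, one predicate per qi


def matches(pred, node):
    """Nodes with tag qi that stand in axis `pred.axis` to `node`."""
    if pred.axis == "pc":
        return [c for c in node if c.tag == pred.tag]
    if pred.axis == "ad":
        return [d for d in node.iter(pred.tag) if d is not node]
    raise ValueError(f"unsupported axis: {pred.axis}")


def satisfied(query, node):
    """For each predicate in PQ, whether `node` satisfies it."""
    return [bool(matches(p, node)) for p in query.predicates]


# Hypothetical query: a book with an info child and publisher, isbn descendants.
Q = Query("book", (Predicate("pc", "info"),
                   Predicate("ad", "publisher"),
                   Predicate("ad", "isbn")))

book = ET.fromstring(
    "<book><title>wodehouse</title><publish><name>psmith</name></publish>"
    "<info><isbn>1234</isbn></info></book>"
)
print(satisfied(Q, book))
```

Keeping PQ as a flat set of predicates anchored at q0 means each predicate can be checked, and scored, independently of the others.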

XML idf

XML tf

XML tf*idf Score
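The formulas on these slides did not survive in the transcript. The sketch below therefore assumes one plausible XML analogue of tf*idf, stated per component predicate: tf counts the nodes related to a candidate answer through the predicate, idf is `log(1 + N / N_sat)` where N is the number of candidate answer nodes and N_sat the number of them satisfying the predicate, and the score sums idf * tf over PQ. These definitions and all function names are assumptions for illustration and may differ from the slides.

```python
import math
import xml.etree.ElementTree as ET


def related(node, axis, tag):
    """Nodes with the given tag reachable from `node` via the pc or ad axis."""
    if axis == "pc":
        return [c for c in node if c.tag == tag]
    if axis == "ad":
        return [d for d in node.iter(tag) if d is not node]
    raise ValueError(f"unsupported axis: {axis}")


def xml_tf(node, axis, tag):
    """Assumed tf: how many ways `node` satisfies the predicate."""
    return len(related(node, axis, tag))


def xml_idf(candidates, axis, tag):
    """Assumed idf: log(1 + N / N_sat); 0.0 if no candidate satisfies it."""
    n_sat = sum(1 for n in candidates if related(n, axis, tag))
    return math.log(1 + len(candidates) / n_sat) if n_sat else 0.0


def score(node, candidates, predicates):
    """Assumed score: sum of idf * tf over the component predicates."""
    return sum(xml_idf(candidates, a, t) * xml_tf(node, a, t)
               for a, t in predicates)


book_a = ET.fromstring(
    "<book><title>wodehouse</title><info><publisher><name>psmith</name>"
    "</publisher><isbn>1234</isbn></info><price>48.95</price></book>"
)
book_b = ET.fromstring(
    "<book><title>wodehouse</title><publish><name>psmith</name></publish>"
    "<info><isbn>1234</isbn></info></book>"
)
BOOKS = [book_a, book_b]
PQ = [("pc", "info"), ("ad", "publisher"), ("pc", "price")]

for b in BOOKS:
    print(round(score(b, BOOKS, PQ), 4))
```

Under these assumptions a predicate that few candidates satisfy carries more weight, so the first book, which satisfies all three predicates, outranks the second, which satisfies only the common one.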

Whirlpool Architecture

Servers and Server Queues

Top-k Set

Router and Router Queue

Server Predicates Generation

Whirlpool

Scheduling between components

Single-threaded

Multi-threaded
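To make the roles of the components listed above concrete, here is a minimal single-threaded sketch: one server per component predicate, a router queue of partial matches, and a top-k set used for pruning. It is not the authors' implementation. The routing rule (send each match to the unvisited server with the largest maximum contribution), the queue priority (highest possible final score first), and every name in the code are assumptions made for illustration.

```python
import heapq


def whirlpool_top_k(candidates, servers, k):
    """Return (top-k list of (score, candidate), number of server calls).

    candidates: unique, comparable ids of candidate answer nodes.
    servers: dict name -> (max_score, score_fn), where score_fn(candidate)
             returns that server's contribution, between 0 and max_score.
    """
    max_total = sum(m for m, _ in servers.values())
    top_k = []  # min-heap of completed matches, smallest score on top
    # Router queue: max-heap on the best score a partial match can still reach.
    queue = [(-max_total, c, 0.0, ()) for c in candidates]
    heapq.heapify(queue)
    server_calls = 0
    while queue:
        neg_upper, cand, score, visited = heapq.heappop(queue)
        # Prune: the match cannot beat the current k-th best answer.
        if len(top_k) == k and -neg_upper <= top_k[0][0]:
            continue
        remaining = [s for s in servers if s not in visited]
        if not remaining:  # all predicates evaluated: a complete match
            heapq.heappush(top_k, (score, cand))
            if len(top_k) > k:
                heapq.heappop(top_k)
            continue
        # Route to the unvisited server with the largest possible contribution.
        nxt = max(remaining, key=lambda s: servers[s][0])
        score += servers[nxt][1](cand)
        server_calls += 1
        upper = score + sum(servers[s][0] for s in remaining if s != nxt)
        heapq.heappush(queue, (-upper, cand, score, visited + (nxt,)))
    return sorted(top_k, reverse=True), server_calls


# Hypothetical per-predicate contributions for three candidate answers.
CONTRIB = {
    "a": {"info": 1.0, "publisher": 2.0, "price": 2.0},
    "b": {"info": 1.0, "publisher": 0.0, "price": 0.0},
    "c": {"info": 1.0, "publisher": 2.0, "price": 0.0},
}
SERVERS = {
    name: (top, lambda cand, name=name: CONTRIB[cand][name])
    for name, top in [("info", 1.0), ("publisher", 2.0), ("price", 2.0)]
}

answers, calls = whirlpool_top_k(["a", "b", "c"], SERVERS, k=2)
print(answers, calls)
```

Evaluating every predicate for every candidate would take nine server calls here; pruning against the top-k set lets the loop stop earlier, and that saving is what an adaptive routing choice tries to increase.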

Experimental

Conclusion

Whirlpool is an adaptive evaluation strategy for computing exact and approximate top-k answers to XPath queries.

We are investigating new directions such as increasing the number of threads per server for maximal parallelism.
