Adaptive Processing of Top-k Queries in XML
Amelie Marian, Sihem Amer-Yahia
Nick Koudas, Divesh Srivastava
Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)
XML
<book>
  <title>wodehouse</title>
  <info>
    <publisher>
      <name>psmith</name>
      <location>london</location>
    </publisher>
    <isbn>1234</isbn>
  </info>
  <price>48.95</price>
</book>
<book>
  <title>wodehouse</title>
  <publish>
    <name>psmith</name>
    <location>london</location>
  </publish>
  <info>
    <isbn>1234</isbn>
  </info>
</book>
XML XPath
pc: parent-child
ad: ancestor-descendant
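The difference between the two axes can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` module, which supports a limited XPath subset; the variable names are illustrative and not taken from the slides.

```python
import xml.etree.ElementTree as ET

# First <book> fragment from the XML slide.
BOOK = """<book>
  <title>wodehouse</title>
  <info>
    <publisher><name>psmith</name><location>london</location></publisher>
    <isbn>1234</isbn>
  </info>
  <price>48.95</price>
</book>"""

book = ET.fromstring(BOOK)

# pc (parent-child): only direct children of <book> qualify.
pc_titles = book.findall("./title")
pc_names = book.findall("./name")    # <name> is nested deeper, so no match

# ad (ancestor-descendant): a match at any depth below <book> qualifies.
ad_names = book.findall(".//name")

print(len(pc_titles), len(pc_names), len(ad_names))
```

Here `<name>` fails the pc predicate against `<book>` but satisfies the ad predicate, because it sits three levels below the root.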
Scoring Function
The traditional tf*idf function is defined in IR.
tf (term frequency): quantifies the relative importance of a keyword in an individual document.
idf (inverse document frequency): quantifies the relative importance of an individual keyword in the collection of documents.
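As a small illustration of the IR definition, the sketch below uses one common tf*idf variant (length-normalized term frequency, and idf as the log of the ratio of collection size to document frequency). The slide does not say which variant is meant, so the exact formulas and the toy documents are assumptions.

```python
import math
from collections import Counter

# Toy collection; each document is a list of keywords (hypothetical data).
docs = [
    "wodehouse wrote psmith".split(),
    "psmith lives in london".split(),
    "london is a city".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's tokens equal to `term`."""
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "wodehouse" occurs in one document, "psmith" in two, so in the first
# document "wodehouse" receives the higher weight.
print(tf_idf("wodehouse", docs[0], docs) > tf_idf("psmith", docs[0], docs))
```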
Scoring Function
How XML differs from traditional IR:
An answer to an XPath query need not be an entire document, but can be any node in a document.
An XPath query consists of several predicates linking the returned node to other query nodes, instead of simply “keyword containment in the document” (as in IR).
Scoring Function: XPath Component Predicates
XPath query Q
q0 : query answer node
qi , 1 <= i <= l : other query nodes
p( q0 , qi ) : XPath axis between query nodes q0 and qi , i>=1
PQ (component predicates of Q): set of predicates {p(q0,qi)}, 1<= i <= l
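The definitions above can be sketched as a small decomposition routine. The query tree, the dictionary encoding, and the rule that a multi-step path from q0 collapses to an ad predicate are all simplifying assumptions made for illustration; the slide does not spell out how the axis between q0 and a distant qi is derived.

```python
# Hypothetical query rooted at q0 = "book":
#   book[./title and ./info/isbn and .//name]
# Encoding: query node -> (parent query node, axis to that parent).
# Assumes each tag appears at most once in the query.
QUERY_EDGES = {
    "title": ("book", "pc"),
    "info": ("book", "pc"),
    "isbn": ("info", "pc"),
    "name": ("book", "ad"),
}

def component_predicates(q0, edges):
    """Return {qi: axis} for each p(q0, qi), i.e. the set PQ."""
    preds = {}
    for qi, (parent, axis) in edges.items():
        hops = 1
        while parent != q0:          # walk up until the answer node is reached
            parent, _ = edges[parent]
            hops += 1
        # One pc/ad step keeps its axis; longer paths are treated as ad
        # (an assumption: a weaker predicate that still holds).
        preds[qi] = axis if hops == 1 else "ad"
    return preds

PQ = component_predicates("book", QUERY_EDGES)
print(PQ)
```

Each entry of `PQ` relates the answer node directly to one other query node, which is what lets every predicate be scored independently.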
Scoring Function
XML idf
Scoring Function
XML tf
Scoring Function
XML tf*idf Score
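The formulas from these three slides did not survive in the transcript. The sketch below shows one plausible way to carry tf*idf over to component predicates, and every definition in it is an assumption: tf counts the nodes that satisfy p(q0, qi) for a given answer node, idf is the log of one plus the ratio of q0-tagged nodes to those satisfying the predicate, and the score sums idf times tf over all predicates.

```python
import math
import xml.etree.ElementTree as ET

# The two <book> fragments from the XML slide, used as the collection D.
DOCS = [ET.fromstring(s) for s in (
    "<book><title>wodehouse</title><info><publisher><name>psmith</name>"
    "<location>london</location></publisher><isbn>1234</isbn></info>"
    "<price>48.95</price></book>",
    "<book><title>wodehouse</title><publish><name>psmith</name>"
    "<location>london</location></publish><info><isbn>1234</isbn></info></book>",
)]

# Hypothetical component predicates p(book, qi): qi -> axis.
PREDICATES = {"title": "pc", "name": "ad", "publisher": "ad"}

def matches(n, qi, axis):
    """Nodes tagged qi that stand in the given axis relation to n."""
    return n.findall("./" + qi if axis == "pc" else ".//" + qi)

def candidates(docs, q0):
    return [n for d in docs for n in d.iter(q0)]

def tf(n, qi, axis):
    return len(matches(n, qi, axis))

def idf(docs, q0, qi, axis):
    nodes = candidates(docs, q0)
    satisfying = [n for n in nodes if matches(n, qi, axis)]
    if not satisfying:               # predicate never satisfied: no weight
        return 0.0
    return math.log(1 + len(nodes) / len(satisfying))

def score(n, docs, q0, predicates):
    return sum(idf(docs, q0, qi, ax) * tf(n, qi, ax)
               for qi, ax in predicates.items())

scores = [score(n, DOCS, "book", PREDICATES) for n in candidates(DOCS, "book")]
print(scores)
```

Under these assumptions the first book scores higher than the second, because only it contains a `<publisher>` descendant, and that predicate is the rarest in the collection.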
Whirlpool Architecture
Servers and Server Queues
Top-k Set
Router and Router Queue
Server Predicates Generation
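The slide names the components without describing how they interact, so the following is only a single-threaded toy sketch of how servers, a router queue, and a top-k set could fit together. The score table, the routing heuristic (send each partial match to the unvisited server with the largest possible gain), the queue ordering (best possible final score first), and the pruning rule are all assumptions for illustration, not the paper's algorithm.

```python
import heapq

# Hypothetical per-server score contributions for each candidate answer.
SERVER_SCORES = {
    "title": {"book1": 0.5, "book2": 0.5, "book3": 0.0},
    "name": {"book1": 0.5, "book2": 0.5, "book3": 0.5},
    "publisher": {"book1": 1.0, "book2": 0.0, "book3": 0.0},
}
# Largest score each server can add; used to bound a partial match's final score.
MAX_GAIN = {s: max(v.values()) for s, v in SERVER_SCORES.items()}

def top_k(candidates, k):
    total = sum(MAX_GAIN.values())
    # Router queue: partial matches ordered by best possible final score.
    router_queue = [(-total, c, 0.0, ()) for c in candidates]
    heapq.heapify(router_queue)
    topk, pruned = [], []            # topk is a min-heap of (score, candidate)
    while router_queue:
        neg_upper, cand, score, visited = heapq.heappop(router_queue)
        # Prune a partial match that cannot beat the current k-th best score.
        if len(topk) == k and -neg_upper < topk[0][0]:
            pruned.append(cand)
            continue
        # Router decision: the unvisited server with the largest possible gain.
        remaining = [s for s in SERVER_SCORES if s not in visited]
        server = max(remaining, key=MAX_GAIN.get)
        score += SERVER_SCORES[server][cand]
        visited += (server,)
        if len(visited) == len(SERVER_SCORES):      # complete match
            heapq.heappush(topk, (score, cand))
            if len(topk) > k:
                heapq.heappop(topk)
        else:
            upper = score + sum(MAX_GAIN[s] for s in SERVER_SCORES
                                if s not in visited)
            heapq.heappush(router_queue, (-upper, cand, score, visited))
    return sorted(topk, reverse=True), pruned

top, pruned = top_k(["book1", "book2", "book3"], k=1)
print(top, pruned)
```

In this run `book2` and `book3` are dropped after visiting a single server, since their best possible final score already falls below the score of the completed `book1` match.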
Whirlpool
Scheduling between components
Single-threaded
Multi-threaded
Experimental
Conclusion
We presented Whirlpool, an adaptive evaluation strategy for computing exact and approximate top-k answers to XPath queries.
We are investigating new directions such as increasing the number of threads per server for maximal parallelism.