Adaptive Processing of Top-k Queries in XML. Amelie Marian, Sihem Amer-Yahia, Nick Koudas, Divesh Srivastava. Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)

XML

<book>
  <title>wodehouse</title>
  <info>
    <publisher>
      <name>psmith</name>
      <location>london</location>
    </publisher>
    <isbn>1234</isbn>
  </info>
  <price>48.95</price>
</book>

<book>
  <title>wodehouse</title>
  <publish>
    <name>psmith</name>
    <location>london</location>
  </publish>
  <info>
    <isbn>1234</isbn>
  </info>
</book>

XML XPath

pc: parent-child

ad: ancestor-descendant
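
The difference between the two axes can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` and the first book document shown earlier; the variable names are assumptions made for this illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# First example document from the slides (hypothetical variable name).
BOOK = """
<book>
  <title>wodehouse</title>
  <info>
    <publisher>
      <name>psmith</name>
      <location>london</location>
    </publisher>
    <isbn>1234</isbn>
  </info>
  <price>48.95</price>
</book>
"""

book = ET.fromstring(BOOK)

# pc (parent-child): only direct children of <book> are considered.
pc_title = book.findall("./title")
pc_name = book.findall("./name")

# ad (ancestor-descendant): any depth below <book> is considered.
ad_name = book.findall(".//name")

print(len(pc_title), len(pc_name), len(ad_name))
```

Here `name` is nested under `info/publisher`, so it matches `book` through the ad axis but not through the pc axis.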

Scoring Function

The traditional tf*idf function is defined in information retrieval (IR).

tf (term frequency): quantifies the relative importance of a keyword in an individual document.

idf (inverse document frequency): quantifies the relative importance of an individual keyword in the collection of documents.
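
As a point of reference, the classic IR measure can be sketched as follows. This is one common textbook variant (relative term frequency and a plain logarithmic idf), chosen here as an assumption; the slides do not say which variant they have in mind, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Relative frequency of `term` within a single document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """log(N / df): rarer terms across the collection weigh more."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: an unseen term contributes nothing
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)


corpus = [
    ["psmith", "london"],
    ["wodehouse", "london"],
    ["psmith", "psmith", "journalist"],
]
print(tf_idf("psmith", corpus[2], corpus))
```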

Scoring Function

XML, unlike traditional IR:

An answer to an XPath query need not be an entire document, but can be any node in a document.

An XPath query consists of several predicates linking the returned node to other query nodes, instead of simply “keyword containment in the document” (as in IR).

Scoring Function

XPath Component Predicates

XPath query Q

q0: query answer node

qi, 1 <= i <= l: other query nodes

p(q0, qi): XPath axis between query nodes q0 and qi, i >= 1

PQ (component predicates of Q): the set of predicates {p(q0, qi)}, 1 <= i <= l
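
The decomposition of a query into component predicates can be sketched as below. The `QueryNode` class, the `component_predicates` function, and the axis-composition rule are assumptions made for illustration: the sketch treats p(q0, qi) as pc only when qi is a direct pc child of q0, and as ad in every other case, which is a simplification rather than the definition from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryNode:
    tag: str
    axis: Optional[str] = None          # "pc" or "ad" edge from the parent; None for q0
    parent: Optional["QueryNode"] = None


def component_predicates(q0, others):
    """Return one (q0 tag, axis, qi tag) triple per query node qi."""
    predicates = []
    for qi in others:
        depth, only_pc, node = 0, True, qi
        while node is not q0:
            if node is None:
                raise ValueError(f"{qi.tag} is not connected to {q0.tag}")
            only_pc = only_pc and node.axis == "pc"
            depth += 1
            node = node.parent
        # Simplifying assumption: anything but a single pc edge becomes ad.
        axis = "pc" if depth == 1 and only_pc else "ad"
        predicates.append((q0.tag, axis, qi.tag))
    return predicates


# Hypothetical query: book with title, info, info/publisher, info/publisher/name.
book = QueryNode("book")
title = QueryNode("title", "pc", book)
info = QueryNode("info", "pc", book)
publisher = QueryNode("publisher", "pc", info)
name = QueryNode("name", "pc", publisher)

PQ = component_predicates(book, [title, info, publisher, name])
print(PQ)
```

Each triple relates the answer node directly to one other query node, so every predicate can be evaluated and scored independently of the others.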

Scoring Function

XML idf

Scoring Function

XML tf

Scoring Function

XML tf*idf Score
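
The formulas on the idf, tf, and score slides did not survive in this transcript. The following is a minimal sketch of one plausible reading, built by analogy with classic tf*idf and applied per component predicate. It assumes that idf compares all nodes tagged q0 against those that satisfy the predicate, that tf counts the ways a given node satisfies it, and that the score sums idf * tf over the predicates. All function names and the exact formulas are assumptions, not the authors' definitions.

```python
import math
import xml.etree.ElementTree as ET


def matches(node, axis, tag):
    """Nodes tagged `tag` related to `node` by the pc or ad axis."""
    path = "./" + tag if axis == "pc" else ".//" + tag
    return node.findall(path)


def xml_idf(candidates, axis, tag):
    """Assumed form: log(1 + #q0 nodes / #q0 nodes satisfying the predicate)."""
    satisfying = sum(1 for n in candidates if matches(n, axis, tag))
    if satisfying == 0:
        return 0.0  # assumption: a predicate nobody satisfies adds nothing
    return math.log(1 + len(candidates) / satisfying)


def xml_tf(node, axis, tag):
    """Assumed form: number of ways `node` satisfies the predicate."""
    return len(matches(node, axis, tag))


def xml_score(node, candidates, predicates):
    return sum(
        xml_idf(candidates, axis, tag) * xml_tf(node, axis, tag)
        for axis, tag in predicates
    )


BOOK_A = ("<book><title>wodehouse</title><info><publisher><name>psmith</name>"
          "<location>london</location></publisher><isbn>1234</isbn></info>"
          "<price>48.95</price></book>")
BOOK_B = ("<book><title>wodehouse</title><publish><name>psmith</name>"
          "<location>london</location></publish><info><isbn>1234</isbn></info></book>")

candidates = [ET.fromstring(BOOK_A), ET.fromstring(BOOK_B)]
predicates = [("pc", "title"), ("ad", "name"), ("ad", "publisher")]

score_a = xml_score(candidates[0], candidates, predicates)
score_b = xml_score(candidates[1], candidates, predicates)
print(score_a, score_b)
```

Under these assumptions the first book scores higher, because it is the only one that satisfies the rarer `publisher` predicate.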

Whirlpool Architecture

Servers and Server Queues

Top-k Set

Router and Router Queue
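
How these components might interact can be sketched with a small single-threaded simulation. Everything here is an assumption made for illustration: each server contributes a score for one predicate and advertises a maximum possible contribution, the router sends a partial match to the unvisited server with the largest maximum, the top-k set holds only complete matches, and a partial match is pruned when its best possible final score cannot beat the current k-th score. The actual Whirlpool routing and pruning policies may differ.

```python
import heapq
from collections import deque


def whirlpool_top_k(candidates, servers, k):
    """servers maps a name to (max_gain, gain_fn); returns the k best (score, node)."""
    router_queue = deque((node, 0.0, frozenset()) for node in candidates)
    server_queues = {name: deque() for name in servers}
    top_k = []  # min-heap of (score, node) for complete matches

    def threshold():
        return top_k[0][0] if len(top_k) == k else float("-inf")

    while router_queue or any(server_queues.values()):
        # Router: finish, prune, or forward every waiting partial match.
        while router_queue:
            node, score, done = router_queue.popleft()
            remaining = [s for s in servers if s not in done]
            if not remaining:
                if len(top_k) < k:
                    heapq.heappush(top_k, (score, node))
                elif score > top_k[0][0]:
                    heapq.heapreplace(top_k, (score, node))
                continue
            upper_bound = score + sum(servers[s][0] for s in remaining)
            if upper_bound <= threshold():
                continue  # cannot enter the top-k set: prune
            target = max(remaining, key=lambda s: servers[s][0])
            server_queues[target].append((node, score, done))
        # Servers: each one processes a single match per round.
        for name, queue in server_queues.items():
            if queue:
                node, score, done = queue.popleft()
                gain = servers[name][1](node)
                router_queue.append((node, score + gain, done | {name}))
    return sorted(top_k, reverse=True)


# Hypothetical per-predicate scores for three candidate nodes.
title_scores = {"a": 1.0, "b": 1.0, "c": 0.0}
publisher_scores = {"a": 2.0, "b": 0.0, "c": 0.0}
servers = {
    "title": (1.0, title_scores.get),
    "publisher": (2.0, publisher_scores.get),
}
print(whirlpool_top_k(["a", "b", "c"], servers, k=1))
```

In this run, candidates `b` and `c` never reach the `title` server: once `a` is complete, their upper bounds can no longer exceed the top-1 score, so the router drops them.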

Server Predicates Generation

Whirlpool

Scheduling between components

Single-threaded

Multi-threaded

Experimental

Conclusion

Whirlpool is an adaptive evaluation strategy for computing exact and approximate top-k answers to XPath queries.

We are investigating new directions such as increasing the number of threads per server for maximal parallelism.