Streaming XPath Engine
Oleg Slezberg
Amruta Joshi
Overview
• Motivation
– Querying Streaming XML
– XPath Challenges (predicates, //, nesting…)
• Basic Objective
– Comparative Analysis of Algorithms
• Implementation
– Implemented engine in Java using JDK 1.4.2
– Apache Xerces 2.6.2 for parsing (both XML and XPath)
– Used existing XSQ Java implementation
– Benchmark for evaluation: XPathMark
XStream
• Builds parse tree for input query
• Maintains an event stack
• Keeps matching the streaming input document against each query node
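As a rough illustration of the first bullet, the sketch below turns a simple path query into a chain of steps, the degenerate case of a parse tree. It is not the XStream parser: the slides say XPath parsing was done with Xerces, whereas this is a hand-rolled tokenizer. It assumes queries made only of `/name` and `//name` steps, with no predicates. The names `QueryStepsSketch`, `Step` and `Axis` are hypothetical, and the code uses current Java rather than JDK 1.4.2.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryStepsSketch {

    /** Relationship between a step and the step before it. */
    enum Axis { CHILD, DESCENDANT }

    /** One node of the (linear) query tree: an axis plus an element name. */
    static final class Step {
        final Axis axis;
        final String name;

        Step(Axis axis, String name) {
            this.axis = axis;
            this.name = name;
        }

        @Override
        public String toString() {
            return (axis == Axis.DESCENDANT ? "//" : "/") + name;
        }
    }

    /** Parses queries such as "//a/b"; predicates and wildcards are NOT handled. */
    static List<Step> parse(String query) {
        List<Step> steps = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            if (query.charAt(i) != '/') {
                throw new IllegalArgumentException("expected '/' at index " + i);
            }
            i++;
            Axis axis = Axis.CHILD;
            if (i < query.length() && query.charAt(i) == '/') {
                axis = Axis.DESCENDANT; // "//" means any depth below the previous step
                i++;
            }
            int start = i;
            while (i < query.length() && query.charAt(i) != '/') {
                i++;
            }
            if (start == i) {
                throw new IllegalArgumentException("empty step at index " + start);
            }
            steps.add(new Step(axis, query.substring(start, i)));
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(parse("//a/b"));
    }
}
```

A streaming matcher would then walk this chain while SAX events arrive, pushing one stack entry per open element.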
Our Contributions
• Correction –
• Verification –
• Performance Figures –
• Recursive Query Handling –
• Query Evaluation Support –
Performance
• Benchmark: XPathMark, set of 23 queries (mostly predicate queries)
• Criteria: Queries Per Second Rate
• Test Setup: Run on elaine2, 900 MHz 2-CPU processor
• Results:
– XSQ QPS: 4.39 Coverage: 17%
– TurboXPath QPS: 5.75 Coverage: 21%+
• Time = XML Parsing + Processing
• QPS: XStream 30% faster + better coverage on given benchmark
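The arithmetic behind the QPS criterion can be sketched as follows. The helper names are hypothetical. Comparing the two listed QPS figures (4.39 and 5.75) against the "30% faster" claim is an assumption about how the slide's numbers relate; the slide itself does not say which row the claim refers to.

```java
final class QpsSketch {

    /** Queries per second, with time split as on the slide: XML parsing plus processing. */
    static double qps(int queries, double parseSeconds, double processSeconds) {
        return queries / (parseSeconds + processSeconds);
    }

    /** How much faster `other` is than `baseline`, as a percentage of the baseline rate. */
    static double speedupPercent(double baselineQps, double otherQps) {
        return (otherQps / baselineQps - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the Results bullets; pairing them is an assumption.
        double speedup = speedupPercent(4.39, 5.75);
        System.out.println("relative speedup in percent: " + speedup);
    }
}
```

Under that assumption the ratio 5.75 / 4.39 comes out at roughly 1.31, which is in line with the "30% faster" figure.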
Recursive Query Handling
• For query node n and elements e1, e2 in d
– Both e1 and e2 match n
– e1 contains e2
• Example:
– Document <a><a><b/></a><b></b></a>
– Query //a/b
• FA-based algorithms
– Exponential number of states
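The example above can be traced with a small stack-based SAX handler. This is a minimal sketch hard-wired to the query //a/b, not the XStream algorithm. The class name and the counters `matches` and `maxOpenA` are hypothetical. It uses the JDK's built-in SAX parser and current Java.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class RecursiveMatchSketch {

    /** Counts matches of //a/b and records how deeply <a> elements nest inside each other. */
    static final class Handler extends DefaultHandler {
        private final Deque<String> open = new ArrayDeque<>(); // names of currently open elements
        private int openA = 0;  // number of <a> elements open right now
        int maxOpenA = 0;       // 2 or more means the document is recursive with respect to node a
        int matches = 0;        // <b> elements whose parent is an <a>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // peek() returns the parent's name, or null at the document root
            if ("b".equals(qName) && "a".equals(open.peek())) {
                matches++;
            }
            if ("a".equals(qName)) {
                openA++;
                maxOpenA = Math.max(maxOpenA, openA);
            }
            open.push(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();
            if ("a".equals(qName)) {
                openA--;
            }
        }
    }

    static Handler run(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Handler h = run("<a><a><b/></a><b></b></a>");
        System.out.println("matches=" + h.matches + " maxOpenA=" + h.maxOpenA);
        // prints "matches=2 maxOpenA=2"
    }
}
```

Both b elements match: the first through the inner a, the second through the outer a. While the inner a is open, two elements match query node a at once, which is the situation the slide calls recursive.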
Query Evaluation Support
• 2 Questions:
– Filtering
  • Does this document match the query?
  • F1: XML => boolean
– Evaluation
  • What parts of the document match the query?
  • F2: XML => XML
• Modifications:
– Output buffers for predicate owner
– Predicate node buffers
– Predicate evaluation
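The difference between filtering and evaluation, and the need for an output buffer on the predicate owner, can be sketched for one hard-wired query, //a[c]/b. This is an illustration under stated assumptions, not the engine's code. The class and field names are hypothetical. Results are reported as the `id` attribute of each matching b rather than as XML fragments. Results are emitted in the order predicates resolve, which may differ from document order when a elements nest.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PredicateBufferSketch {

    /** State for one open element; only frames for <a> ever use the predicate fields. */
    static final class Frame {
        final String name;
        boolean sawC = false;                            // has the predicate [c] been satisfied?
        final List<String> pending = new ArrayList<>();  // ids of <b> children waiting on [c]

        Frame(String name) {
            this.name = name;
        }
    }

    static final class Handler extends DefaultHandler {
        private final Deque<Frame> open = new ArrayDeque<>();
        final List<String> results = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Frame parent = open.peek();
            if (parent != null && "a".equals(parent.name)) {
                if ("c".equals(qName)) {
                    // Predicate resolved: release everything buffered for this owner.
                    parent.sawC = true;
                    results.addAll(parent.pending);
                    parent.pending.clear();
                } else if ("b".equals(qName)) {
                    String id = atts.getValue("id");
                    if (parent.sawC) {
                        results.add(id);          // predicate already known to hold
                    } else {
                        parent.pending.add(id);   // outcome unknown: buffer on the owner
                    }
                }
            }
            open.push(new Frame(qName));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop(); // ids still pending on an <a> that never saw <c> are dropped with its frame
        }
    }

    /** F2-style evaluation: which parts match (here reduced to a list of ids). */
    static List<String> evaluate(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    /** F1-style filtering: does anything match at all? */
    static boolean filter(String xml) {
        return !evaluate(xml).isEmpty();
    }

    public static void main(String[] args) {
        String xml = "<r><a><b id=\"1\"/><c/><b id=\"2\"/></a><a><b id=\"3\"/></a></r>";
        System.out.println(evaluate(xml) + " " + filter(xml));
    }
}
```

For brevity `filter` is built on `evaluate` here. A dedicated filter needs no output buffers and could stop at the first confirmed match, which is why the slide lists buffering among the modifications needed for evaluation.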
Multiple Simultaneous Queries
• Combine the queries by OR-ing them together:
• q = (q1) | (q2) | … | (qn);
• Resulting query has multiple output nodes
• Associate a query-id with output node
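The combination step can be sketched as plain string assembly plus a table from query-id to output node. The names below are hypothetical. Taking the output node to be the last step of each path is a simplifying assumption that only holds for predicate-free path queries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CombinedQuerySketch {

    /** Builds q = (q1) | (q2) | ... | (qn), keeping the map's iteration order. */
    static String combine(Map<String, String> queriesById) {
        StringJoiner joiner = new StringJoiner(" | ");
        for (String query : queriesById.values()) {
            joiner.add("(" + query + ")");
        }
        return joiner.toString();
    }

    /** Maps each query-id to the name of its output node (assumed to be the last path step). */
    static Map<String, String> outputNodesById(Map<String, String> queriesById) {
        Map<String, String> outputNodes = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : queriesById.entrySet()) {
            String query = entry.getValue();
            outputNodes.put(entry.getKey(), query.substring(query.lastIndexOf('/') + 1));
        }
        return outputNodes;
    }

    public static void main(String[] args) {
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("q1", "//a/b");
        queries.put("q2", "/r/c");
        System.out.println(combine(queries));          // prints "(//a/b) | (/r/c)"
        System.out.println(outputNodesById(queries));  // prints "{q1=b, q2=c}"
    }
}
```

When the combined query reports a match at an output node, the associated query-id tells the caller which of the original queries produced it.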
Conclusion
• Streaming XPath Engine
– All Objectives met! (XPath Stream Evaluator implemented, Performance Analysis)
– Algorithm correction and enhancements
• Future Directions
– Backward Axis Support
– Function Support – reuse predicate evaluation model
– Extended expression type support
– Predicate Pipelining