Xpath Query Evaluation

Upload: duaa

Post on 12-Jan-2016

Goal

• Evaluating an Xpath query against a given document

– To find all matches

• We will also consider the use of types

• Complexity is important

– Huge documents

Data complexity vs. Combined Complexity

• Two inputs to the query evaluation problem

– Data (XML document) of size |D|

– Query (Xpath expression) of size |Q|

– Usually |Q| << |D|

• Polynomial data complexity

– Complexity that is polynomial in |D|, possibly exponential in |Q|

• Polynomial combined complexity

– Complexity that is polynomial in both |D| and |Q|

• Fixed-parameter tractable complexity

– Complexity Poly(|D|)*f(|Q|), where f may be exponential but the polynomial in |D| does not depend on |Q|

Xpath Query Evaluation

• Input: XML Document D, Xpath query Q

• Output: A subset of the nodes of D, as defined by Q

• We follow "Efficient Algorithms for Processing XPath Queries" by Gottlob, Koch and Pichler (ACM TODS, 2005)

Simple algorithm

process-location-step(n, Q) {
    S := apply Q.first to n;
    if |Q| > 1 then
        for each node n' in S do process-location-step(n', Q.next);
    else
        report the nodes in S as matches;
}
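A minimal Python sketch of the procedure above, assuming a toy document model (a hypothetical `Node` class with a tag, ordered children and a parent pointer) and only three axes: child, descendant and following. Location steps are `(axis, node test)` pairs with `*` as a wildcard; all names here are illustrative, not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

class Node:
    """Toy XML element (hypothetical): a tag, ordered children, a parent link."""
    def __init__(self, tag: str, children: Optional[List["Node"]] = None):
        self.tag = tag
        self.children = children or []
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self

def preorder(n: Node) -> List[Node]:
    """Nodes of the subtree rooted at n, in document order."""
    out = [n]
    for child in n.children:
        out.extend(preorder(child))
    return out

def axis(name: str, n: Node) -> List[Node]:
    if name == "child":
        return list(n.children)
    if name == "descendant":
        return preorder(n)[1:]
    if name == "following":
        root = n
        while root.parent is not None:
            root = root.parent
        order = preorder(root)
        # Nodes after n's own subtree in document order (ancestors all precede n).
        return order[order.index(n) + len(preorder(n)):]
    raise ValueError(f"unsupported axis: {name}")

Step = Tuple[str, str]  # (axis, node test); "*" matches any tag

def process_location_step(n: Node, q: List[Step], matches: Set[Node]) -> None:
    ax, test = q[0]
    s = [m for m in axis(ax, n) if test == "*" or m.tag == test]  # apply Q.first to n
    if len(q) > 1:
        for m in s:                      # recurse separately from every node in S
            process_location_step(m, q[1:], matches)
    else:
        matches.update(s)                # last step: S holds matches

doc = Node("a", [Node("b", [Node("c")]), Node("d", [Node("b")]), Node("e")])
found: Set[Node] = set()
process_location_step(doc, [("descendant", "b"), ("following", "*")], found)
print(sorted(m.tag for m in found))  # ['b', 'd', 'e']
```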

Complexity

• Worst case: in each step of Q the axis is “following”

• So each step can return O(|D|) nodes, and the rest of the query is evaluated separately from each of them

• And we get Time(|Q|) = |D|*Time(|Q|-1)

• I.e. the complexity is O(|D|^|Q|), exponential in the query size
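The recurrence can be observed on a toy model. The sketch below assumes a flat document of d sibling nodes numbered 0…d−1 in document order, so that following::* of node i is {i+1, …, d−1}, and counts axis applications for a query made of k `following::*` steps starting at node 0. For contrast it also runs a set-at-a-time evaluation, a different technique that is not the one on these slides, which applies each step once per distinct node.

```python
from typing import Set, Tuple

def naive(d: int, k: int) -> Tuple[Set[int], int]:
    """Naive evaluation of k 'following::*' steps from node 0 on d flat siblings.
    Returns (matches, number of axis applications)."""
    matches: Set[int] = set()
    calls = 0

    def step(i: int, remaining: int) -> None:
        nonlocal calls
        calls += 1
        s = range(i + 1, d)              # following::* of sibling i
        if remaining > 1:
            for j in s:                  # one recursive call per node in S
                step(j, remaining - 1)
        else:
            matches.update(s)

    step(0, k)
    return matches, calls

def set_at_a_time(d: int, k: int) -> Tuple[Set[int], int]:
    """Same query, but each step is applied once to each distinct node."""
    current: Set[int] = {0}
    calls = 0
    for _ in range(k):
        nxt: Set[int] = set()
        for i in current:
            calls += 1
            nxt.update(range(i + 1, d))
        current = nxt
    return current, calls

m1, c1 = naive(20, 10)
m2, c2 = set_at_a_time(20, 10)
print(m1 == m2, c1, c2)  # True 262144 136
```

Both give the same matches; the naive count is the sum of C(19, m) for m = 0…9, which grows exponentially with the number of steps, while the set-at-a-time count stays below k·d.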

Early Systems Performance

Figure taken from Gottlob, Koch, Pichler ‘05

Internet Explorer 6

Figure taken from Gottlob, Koch, Pichler ‘05

IE6 – performance as a function of document size

Figure taken from Gottlob, Koch, Pichler ‘05

Polynomial data complexity

• Polynomial data complexity is sometimes considered good, even if it is exponential in the query size (since usually |Q| << |D|)

• But can we have polynomial combined complexity for Xpath query evaluation?

• Yes!

Two main principles

• Query parse trees: the query is divided into parts according to its structure (not to be confused with the XML tree structure)

• Context-value tables: for every expression e occurring in the parse tree, compute a table of all valid combinations of context c and value v such that e evaluates to v in c.

Xpath query parse tree

descendant::b/following-sibling::* [position() != last()]
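One possible encoding of this query's parse tree as a data structure, assuming hypothetical node classes (`Path`, `Step`, `BinOp`, `Call`); the predicate hangs below the step it filters, and the printout shows the nesting. The exact node kinds in Gottlob, Koch and Pichler's formalism may differ.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Call:
    name: str                 # e.g. "position" or "last"

@dataclass
class BinOp:
    op: str                   # e.g. "!="
    left: "Expr"
    right: "Expr"

@dataclass
class Step:
    axis: str
    test: str
    predicates: List["Expr"] = field(default_factory=list)

@dataclass
class Path:
    steps: List[Step]

Expr = Union[Path, Step, BinOp, Call]

def show(e: Expr, indent: int = 0) -> List[str]:
    """Render the parse tree one node per line, children indented below parents."""
    pad = "  " * indent
    if isinstance(e, Path):
        return [pad + "path"] + [line for s in e.steps for line in show(s, indent + 1)]
    if isinstance(e, Step):
        return [f"{pad}step {e.axis}::{e.test}"] + [
            line for p in e.predicates for line in show(p, indent + 1)]
    if isinstance(e, BinOp):
        return [pad + e.op] + show(e.left, indent + 1) + show(e.right, indent + 1)
    return [f"{pad}{e.name}()"]

q = Path([
    Step("descendant", "b"),
    Step("following-sibling", "*", [BinOp("!=", Call("position"), Call("last"))]),
])
print("\n".join(show(q)))
```

The leaves `position()` and `last()` are where a bottom-up evaluation starts; the `path` node is the root.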

Bottom-up vs. Top-down evaluation

• We will discuss two kinds of query evaluation algorithms:

– Bottom-up means that the query parse tree is processed from the leaves up to the root

– Top-down means that the parse tree is processed from the root to the leaves

• When processing we will fill in the context-value table

Bottom-up evaluation

• Main idea: compute the value for each leaf for every possible context

• Propagate upwards until the root

• Dynamic programming algorithm: a sub-expression is never re-evaluated in the same context

Operational semantics

• Needed as a first step for evaluation algorithms

• Similar ideas are used in compiler design

• Here the semantics is based on the notion of contexts

Contexts

• The domain of contexts is $C = \mathrm{dom} \times \{\langle k, n \rangle \mid 1 \le k \le n \le |\mathrm{dom}|\}$

• A context is $c = \langle x, k, n \rangle$, where x is the context node, k is the context position and n is the context size
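Enumerating C for a small domain makes its size concrete: |dom| · |dom|(|dom|+1)/2 contexts, i.e. O(|D|^3), which is the cubic factor that shows up in the space bound for context-value tables. A sketch, assuming nodes are represented by plain integers.

```python
from typing import List, Tuple

Context = Tuple[int, int, int]  # <x, k, n>: context node, position, size

def all_contexts(dom: List[int]) -> List[Context]:
    """Enumerate C = dom x {<k, n> | 1 <= k <= n <= |dom|}."""
    size = len(dom)
    return [(x, k, n)
            for x in dom
            for n in range(1, size + 1)
            for k in range(1, n + 1)]

ctxs = all_contexts([0, 1, 2, 3])
print(len(ctxs))  # 4 * (4 * 5 // 2) = 40
```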

Semantics for Xpath expressions

• The semantics of an expression is a set of 4-tuples: the first 3 elements form a context, and the fourth is the value obtained by evaluating the expression in that context

Some notations

• T(t): all nodes satisfying a node test t

• E(e): all nodes satisfying a regular expression e (applied with respect to a given axis)

• Idx_χ(x, S) is the index of a node x in the set S with respect to axis χ: document order for forward axes, reverse document order for reverse axes
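A sketch of the index function from the last bullet, assuming nodes are identified by integers equal to their position in document order and that the axis name alone decides the counting direction. The function name `idx` and this encoding are illustrative.

```python
from typing import Iterable

# Reverse axes as listed in the XPath 1.0 specification; all other axes are forward.
REVERSE_AXES = {"ancestor", "ancestor-or-self", "preceding", "preceding-sibling"}

def idx(axis: str, x: int, s: Iterable[int]) -> int:
    """1-based index of node x in node set s. Nodes are identified by their
    position in document order; reverse axes count backwards."""
    ordered = sorted(s, reverse=axis in REVERSE_AXES)
    return ordered.index(x) + 1

s = {2, 5, 7}
print(idx("following-sibling", 5, s), idx("ancestor", 7, s), idx("ancestor", 2, s))  # 2 1 3
```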

Context-value Table

• Given a query sub-expression e, the context-value table of e specifies all combinations of context c and value v, such that computing e on the context c results in v

• The bottom-up algorithm follows: compute the context-value tables in a bottom-up fashion over the query parse tree
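A minimal sketch of the bottom-up idea for the predicate `position() != last()` from the example query: the tables of the two leaves are computed for every context, then combined context by context into the table of their parent. It assumes nodes are integers, encodes the parse tree as nested tuples (a hypothetical encoding), and stores each table as a dict from context ⟨x, k, n⟩ to value. This only covers the three expression kinds shown, not full XPath.

```python
from typing import Dict, List, Tuple, Union

Context = Tuple[int, int, int]           # <x, k, n>
Table = Dict[Context, Union[int, bool]]  # context-value table: context -> value

def contexts(size: int) -> List[Context]:
    """All contexts over a domain of `size` integer nodes."""
    return [(x, k, n) for x in range(size)
            for n in range(1, size + 1) for k in range(1, n + 1)]

def bottom_up(e: tuple, ctxs: List[Context]) -> Table:
    """Context-value table of e; children's tables are built before the parent's.
    e is ("position",), ("last",) or ("!=", e1, e2)."""
    op = e[0]
    if op == "position":
        return {c: c[1] for c in ctxs}       # value is the context position k
    if op == "last":
        return {c: c[2] for c in ctxs}       # value is the context size n
    if op == "!=":
        left = bottom_up(e[1], ctxs)
        right = bottom_up(e[2], ctxs)
        return {c: left[c] != right[c] for c in ctxs}  # combine on the same context
    raise ValueError(f"unsupported expression: {op}")

table = bottom_up(("!=", ("position",), ("last",)), contexts(3))
print(len(table), sum(table.values()))  # 18 9
```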

Bottom-up algorithm

Example

Complexity

• O(|D|^3*|Q|) space, ignoring strings and numbers

– O(|Q|) tables, each with 3 columns whose values range over 1…|D|, thus O(|D|^3*|Q|)

– An extra O(|D|*|Q|) multiplicative factor for strings and numbers

• O(|D|^5*|Q|) time, ignoring strings and numbers

– It can take O(|D|^2) to combine two node sets

– Extra O(|Q|) factor in case of strings and numbers

Optimization

• Represent contexts as pairs of current and previous node

• This brings the time complexity down to O(|D|^4*|Q|^2)

• Further optimizations bring the space complexity down to O(|D|^2*|Q|^2)

Top-down evaluation

• Similar idea

• But computes values only for the contexts that are actually needed

• Same worst-case bounds
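A minimal sketch of the top-down idea for `following-sibling::*[position() != last()]`, assuming a flat list of m siblings numbered 0…m−1 in document order: the predicate's table is filled only for contexts ⟨y, k, n⟩ that actually arise from the given start nodes, rather than for every context in C. The function name and encoding are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Context = Tuple[int, int, int]  # <y, k, n>

def top_down(starts: List[int], m: int) -> Tuple[List[int], Dict[Context, bool]]:
    """Evaluate following-sibling::*[position() != last()] from each start node,
    on m siblings numbered 0..m-1 in document order. The predicate's table is
    filled only for contexts that actually arise during evaluation."""
    pred_table: Dict[Context, bool] = {}
    result: Set[int] = set()
    for x in starts:
        s = list(range(x + 1, m))           # following-sibling::* of x
        n = len(s)                          # context size for the predicate
        for k, y in enumerate(s, start=1):  # forward axis: count in document order
            c = (y, k, n)
            if c not in pred_table:
                pred_table[c] = k != n      # position() != last()
            if pred_table[c]:
                result.add(y)
    return sorted(result), pred_table

res, table = top_down([0, 2], 5)
print(res, len(table))  # [1, 2, 3] 6
```

Only 6 table entries are created here, whereas a domain of 5 nodes has 5 · 15 = 75 contexts in total.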

Top-down or bottom-up?

• General question in processing XML trees

• The tradeoff:

– Usually easier to combine results computed in children to obtain the result at the parent

• So bottom-up traversal is usually easier to design

– On the other hand, some of the computation is redundant since we don’t know if it will become relevant

• So top-down traversal may be more efficient

Linear-time fragment

• Core Xpath includes only navigation

– e.g. / and //

• Core Xpath can be evaluated in O(|D|*|Q|)

• Observation: there is no need to consider the entire context triple, only the current context node

• Top-down or bottom-up evaluation with essentially the same algorithm

• But smaller tables are maintained: for every query node, the value of the evaluation at every document node
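A minimal sketch of why navigation-only queries are cheap: each step maps a whole node set to a node set in a single O(|D|) pass, so a path of |Q| steps costs O(|D|*|Q|). It assumes nodes numbered in document order (preorder, root = 0) with a `parent` array, supports only predicate-free paths over the child and descendant axes, and is not the paper's full Core Xpath algorithm, which also handles predicates.

```python
from typing import Callable, Dict, List, Set, Tuple

def child_axis(parent: List[int], s: Set[int]) -> Set[int]:
    """Children of nodes in s: one pass over all nodes."""
    return {v for v in range(len(parent)) if parent[v] in s}

def descendant_axis(parent: List[int], s: Set[int]) -> Set[int]:
    """Proper descendants of nodes in s: one preorder pass (parents precede children)."""
    below = [False] * len(parent)
    for v in range(1, len(parent)):          # node 0 is the root
        p = parent[v]
        below[v] = p in s or below[p]
    return {v for v in range(len(parent)) if below[v]}

AXES: Dict[str, Callable[[List[int], Set[int]], Set[int]]] = {
    "child": child_axis,
    "descendant": descendant_axis,
}

def eval_path(parent: List[int], tags: List[str],
              steps: List[Tuple[str, str]], start: Set[int]) -> Set[int]:
    current = set(start)
    for axis, test in steps:                 # |Q| steps ...
        nodes = AXES[axis](parent, current)  # ... each O(|D|)
        current = {v for v in nodes if test == "*" or tags[v] == test}
    return current

# Preorder: 0:a, 1:b, 2:c, 3:b, 4:d, 5:b  (-1 marks the root's missing parent)
parent = [-1, 0, 1, 1, 0, 4]
tags = ["a", "b", "c", "b", "d", "b"]
print(sorted(eval_path(parent, tags, [("descendant", "b"), ("child", "*")], {0})))  # [2, 3]
```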

Types are helpful

• Can direct the search

– In some parts of the tree there is no hope of matching a given sub-expression of the query

– As a result we may have tables with fewer entries

• Whiteboard discussion
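A minimal sketch of type-directed pruning for `descendant::t`, assuming a hypothetical DTD-like type that lists, for each tag, the tags allowed as its children. A fixpoint computes which tags can ever occur below each tag, and the search skips subtrees that cannot contain t. Documents are nested `(tag, children)` tuples; all names are illustrative.

```python
from typing import Dict, Optional, Set, Tuple

Schema = Dict[str, Set[str]]  # tag -> tags allowed as its children

def reachable_below(schema: Schema) -> Dict[str, Set[str]]:
    """For each tag, the tags that may occur anywhere strictly below it (fixpoint)."""
    below = {t: set(children) for t, children in schema.items()}
    changed = True
    while changed:
        changed = False
        for t in below:
            extra: Set[str] = set()
            for c in below[t]:
                extra |= below.get(c, set())
            if not extra <= below[t]:
                below[t] |= extra
                changed = True
    return below

def descendants_with_tag(node: tuple, target: str,
                         below: Optional[Dict[str, Set[str]]] = None) -> Tuple[int, int]:
    """Return (matching descendants, nodes visited). With a type summary `below`,
    subtrees whose root tag cannot contain `target` are not entered."""
    matches, visited = 0, 0
    stack = list(node[1])
    while stack:
        tag, children = stack.pop()
        visited += 1
        if tag == target:
            matches += 1
        if below is None or target in below.get(tag, set()):
            stack.extend(children)
    return matches, visited

schema: Schema = {
    "lib": {"book", "journal"}, "book": {"title", "chapter"},
    "chapter": {"section"}, "section": {"para"}, "journal": {"title", "issue"},
    "issue": {"article"}, "article": {"title"}, "title": set(), "para": set(),
}
below = reachable_below(schema)
doc = ("lib", [
    ("book", [("title", []), ("chapter", [("section", [("para", []), ("para", [])])])]),
    ("journal", [("title", []),
                 ("issue", [("article", [("title", [])]), ("article", [("title", [])])])]),
])
print(descendants_with_tag(doc, "para"), descendants_with_tag(doc, "para", below))  # (2, 13) (2, 7)
```

Because a journal can never contain a para, its whole subtree is skipped, and the same matches are found after visiting 7 nodes instead of 13.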

Type Checking and Inference

• Type checking a single document: straightforward

– Polynomial combined complexity if the automaton representing the type is deterministic; otherwise polynomial in the document size but possibly exponential in the automaton size

• Type checking the results of an (Xpath) query

• Inferring the type of the results of a query
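A minimal sketch of type checking a single document, assuming the type is given as a hypothetical DTD-like map from each tag to a regular expression over its children's tags (DTDs correspond to a restricted kind of tree automaton). One recursive pass checks every node's child sequence. It uses Python's `re` module, which is a backtracking engine, so this illustrates the idea rather than guaranteeing the polynomial bounds; a deterministic automaton per content model would give linear-time matching at each node.

```python
import re
from typing import Dict, List, Tuple

Tree = Tuple[str, List["Tree"]]

# Hypothetical type: in each pattern, every child tag is written followed by one space.
DTD: Dict[str, str] = {
    "book": r"title (chapter )+",
    "chapter": r"title (para )*",
    "title": r"",
    "para": r"",
}

def valid(tree: Tree, dtd: Dict[str, str]) -> bool:
    """True if every node's sequence of child tags matches its content model."""
    tag, children = tree
    if tag not in dtd:
        return False
    word = "".join(child_tag + " " for child_tag, _ in children)
    if re.fullmatch(dtd[tag], word) is None:
        return False
    return all(valid(child, dtd) for child in children)

good = ("book", [("title", []), ("chapter", [("title", []), ("para", [])])])
bad = ("book", [("chapter", [("title", [])])])  # the book's title is missing
print(valid(good, DTD), valid(bad, DTD))  # True False
```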

Type Inference

• An (incomplete) algorithm for type inference can work its way up the query parse tree, inferring a type in a bottom-up fashion

– Start by inferring a type for the leaves (simple queries), then use it for their parents

• Type inference is inherently incomplete

• It can be performed for some languages that are "regular" in a sense

Restricted language allowing for type inference

• Axes: child, descendant, parent, ancestor, following-sibling, etc.

• Variables can be bound to nodes in the input tree, then passed as parameters

• An equality test can be performed between node IDs, but not between node values

Type Checking

• In addition to inferring a type we need to verify containment in another type.

• Type Inference can be used as a tool for Type Checking.

• Type Checking was shown to be decidable for the same language fragment, but with high complexity.

Intuitive connection to text

• Queries => regular expressions

• Types (tree automata) => context-free languages

• Type inference => intersection of context-free and regular languages, resulting in a context-free one

• Type checking => type inference + inclusion of context-free languages (with some restrictions to guarantee decidability)