Indexing and Querying XML Data for Regular Path Expressions
Quanzhong Li and Bongki Moon, Dept. of Computer Science, University of Arizona
VLDB 2001

Querying XML

• XML has a tree-structured data model.
• Queries navigate the data using regular path expressions (e.g., XPath), such as /chapter/_*/figure[@caption=“Tree Frogs”].
• Evaluating such queries requires accessing all elements with the same name string and determining the ancestor-descendant relationship between elements.

Contribution

• A new system for indexing XML data.
• Querying XML data based on a numbering scheme for elements.
• Join algorithms for processing complex regular path expressions.

Outline

• Numbering scheme
• Index structure
• Join algorithms
• Experimental results

Path expression evaluation

Previous approaches
• Conventional tree traversals.
• Disadvantage: traversal overhead for long or unknown path lengths.

New approach
• Indexing for efficient element access.
• A numbering scheme for the ancestor-descendant relationship.

Dietz’s Numbering Scheme

For two given nodes x and y of a tree T, x is an ancestor of y if and only if x occurs before y in the preorder traversal of T and after y in the postorder traversal.

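Figure: example tree with each node labelled (preorder, postorder): the root (1,7); its children (2,4) and (6,6); the children of (2,4) are (3,1), (4,2) and (5,3); the child of (6,6) is (7,5).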
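A minimal Python sketch of Dietz's test, assuming a tree written as hypothetical (label, children) tuples: it assigns 1-based preorder and postorder ranks in one traversal and then checks ancestry by comparing the two ranks. The tree below is built to reproduce the (preorder, postorder) pairs on this slide; the labels are made up for illustration.

```python
def dietz_numbers(tree):
    """Return {label: (preorder, postorder)} for a tree of (label, [children]) tuples."""
    numbers = {}
    pre = post = 0

    def visit(node):
        nonlocal pre, post
        label, children = node
        pre += 1
        my_pre = pre          # preorder rank: assigned when the node is first reached
        for child in children:
            visit(child)
        post += 1             # postorder rank: assigned after all descendants are done
        numbers[label] = (my_pre, post)

    visit(tree)
    return numbers


def is_ancestor(numbers, x, y):
    """x is an ancestor of y iff x precedes y in preorder and follows y in postorder."""
    (pre_x, post_x), (pre_y, post_y) = numbers[x], numbers[y]
    return pre_x < pre_y and post_x > post_y


# Hypothetical labels for the tree on this slide.
tree = ("r", [("a", [("a1", []), ("a2", []), ("a3", [])]),
              ("b", [("b1", [])])])
nums = dietz_numbers(tree)
print(nums["r"], nums["a"], nums["b1"])                              # (1, 7) (2, 4) (7, 5)
print(is_ancestor(nums, "a", "a3"), is_ancestor(nums, "a", "b1"))    # True False
```

With these dense ranks, inserting a node changes the ranks of every node that follows it in the traversal.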

Proposed numbering scheme

This scheme associates with each node a pair of numbers <order, size> as follows:

• For a tree node y and its parent x: $order(x) < order(y)$ and $order(y) + size(y) \le order(x) + size(x)$.
• For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then $order(x) + size(x) < order(y)$.

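Figure: example tree with each node labelled <order, size>: the root (1,100); its children (10,30) and (41,10); the children of (10,30) are (11,5), (17,5) and (25,5); the child of (41,10) is (45,5).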
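To make the two rules concrete, here is a small Python check, assuming the example tree on this slide is written as a hypothetical dictionary from each node's (order, size) pair to its children in document order. It tests the parent-child rule for every edge and the sibling rule for every pair of consecutive siblings.

```python
# <order, size> values from the example tree: parent -> children in document order.
tree = {
    (1, 100): [(10, 30), (41, 10)],
    (10, 30): [(11, 5), (17, 5), (25, 5)],
    (41, 10): [(45, 5)],
    (11, 5): [], (17, 5): [], (25, 5): [], (45, 5): [],
}


def satisfies_scheme(tree):
    """Check the parent-child and sibling rules of the <order, size> scheme."""
    for (p_order, p_size), children in tree.items():
        # Parent x, child y: order(x) < order(y) and order(y) + size(y) <= order(x) + size(x).
        for c_order, c_size in children:
            if not (p_order < c_order and c_order + c_size <= p_order + p_size):
                return False
        # Consecutive siblings x, y: order(x) + size(x) < order(y).
        for (x_order, x_size), (y_order, _) in zip(children, children[1:]):
            if not (x_order + x_size < y_order):
                return False
    return True


print(satisfies_scheme(tree))  # True
```

Note the gaps the figure leaves: (10,30) spans orders up to 40, but its last child ends at 30, and the root spans up to 101 while its last child ends at 51.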

Advantages

Efficient updates
• Extra space can be reserved to accommodate future insertions.
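A sketch of how reserved space absorbs an insertion, under a simplifying assumption: the new node is appended as the last child of an existing parent. The hypothetical helper places it in the unused range after the last child, so no existing node is renumbered; it reports `None` when the reserved space has run out.

```python
def insert_last_child(parent, children, size):
    """Try to place a new last child of `parent` inside the parent's reserved range.

    parent is an (order, size) pair; children holds the (order, size) pairs of its
    existing children in document order. Returns the new child's order, or None if
    the reserved space is exhausted (renumbering would then be needed).
    """
    p_order, p_size = parent
    if children:
        last_order, last_size = children[-1]
        start = last_order + last_size + 1    # sibling rule: order(x) + size(x) < order(y)
    else:
        start = p_order + 1                   # parent rule: order(x) < order(y)
    if start + size > p_order + p_size:       # parent rule: order(y) + size(y) <= order(x) + size(x)
        return None
    children.append((start, size))
    return start


kids = [(11, 5), (17, 5), (25, 5)]            # children of <10, 30> in the example tree
print(insert_last_child((10, 30), kids, 5))   # 31: fits in the slack up to 40
print(insert_last_child((10, 30), kids, 5))   # None: 37 + 5 > 40
```

How much slack to reserve per node is a design choice this sketch leaves to the caller.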

Ancestor–descendant relationship

For two given nodes x and y of a tree T, x is an ancestor of y if and only if $order(x) < order(y) \le order(x) + size(x)$.
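The test needs only the two numbers of each node, with no tree traversal. The Python sketch below applies it to the example tree from the proposed-numbering slide and, as a sanity check, compares it against ancestry found by walking hypothetical parent links for every pair of nodes.

```python
def is_ancestor(x, y):
    """x, y are (order, size) pairs of nodes in the same document."""
    x_order, x_size = x
    return x_order < y[0] <= x_order + x_size


# Parent links of the example tree (<order, size> values from the figure).
parent = {
    (10, 30): (1, 100), (41, 10): (1, 100),
    (11, 5): (10, 30), (17, 5): (10, 30), (25, 5): (10, 30),
    (45, 5): (41, 10),
}
nodes = [(1, 100), *parent]


def ancestors_by_walking(y):
    """Ancestors found the slow way, by following parent links up to the root."""
    found = set()
    while y in parent:
        y = parent[y]
        found.add(y)
    return found


# The constant-time test agrees with the parent-link walk for every pair of nodes.
print(all(is_ancestor(x, y) == (x in ancestors_by_walking(y))
          for x in nodes for y in nodes))  # True
```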

Index and Data Organization

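Figure: XISS index and data organization. Components shown: an XML document (raw data), a loader, the element index, attribute index, structure index, name index and value table, a paged file, and a query processor that takes a query and returns a result.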

Element Index

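Figure: element index. An element name identifier (nid) is looked up in a B+-tree to obtain a document ID list; for each document, the elements with that name form an element list stored in a B+-tree keyed on <order, size>, and each element record holds <order, size>, depth and parent ID.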
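A hedged in-memory sketch of the element index in Python: nested dictionaries stand in for the nid and document-ID lookups, and a list kept sorted with `bisect` stands in for the per-document B+-tree keyed on order. The class and method names are hypothetical, and string names stand in for integer nids. The range query shows why ordering by `order` is useful: all same-name descendants of a node are one contiguous slice.

```python
import bisect


class ElementIndex:
    """nid -> did -> element records sorted by order (stand-ins for the B+-trees)."""

    def __init__(self):
        self._index = {}

    def add(self, nid, did, order, size, depth, parent_order):
        records = self._index.setdefault(nid, {}).setdefault(did, [])
        bisect.insort(records, (order, size, depth, parent_order))

    def documents(self, nid):
        """Document IDs that contain at least one element with this name."""
        return sorted(self._index.get(nid, {}))

    def descendants_in_range(self, nid, did, order, size):
        """Elements named nid in document did whose order lies in (order, order + size]."""
        records = self._index.get(nid, {}).get(did, [])
        orders = [r[0] for r in records]   # a real B+-tree would search its keys directly
        lo = bisect.bisect_right(orders, order)
        hi = bisect.bisect_right(orders, order + size)
        return records[lo:hi]


idx = ElementIndex()
for o, s, d, p in [(11, 5, 2, 10), (17, 5, 2, 10), (25, 5, 2, 10), (45, 5, 2, 41)]:
    idx.add("figure", 1, o, s, d, p)
# "figure" elements inside the node <10, 30> of document 1
print([r[0] for r in idx.descendants_in_range("figure", 1, 10, 30)])  # [11, 17, 25]
```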

Structure Index

Figure: structure index. A B+-tree keyed on document ID (did) leads to an array of all elements and attributes in the same document; each entry holds nid, <order, size>, parent order, child order, sibling order and attribute order.
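A Python sketch of one document's structure-index array, assuming "child order" means the order of the first child and "sibling order" the order of the next sibling (the slide does not spell this out). The attribute-order field is omitted, and the element names are hypothetical. Navigating to a node's children then means following one child link and a chain of sibling links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    nid: str
    order: int
    size: int
    parent: Optional[int]    # order of the parent
    child: Optional[int]     # order of the first child (assumed meaning)
    sibling: Optional[int]   # order of the next sibling (assumed meaning)


# Hypothetical array for one document: the tree from the <order, size> example.
doc = [
    Entry("book", 1, 100, None, 10, None),
    Entry("chapter", 10, 30, 1, 11, 41),
    Entry("figure", 11, 5, 10, None, 17),
    Entry("figure", 17, 5, 10, None, 25),
    Entry("figure", 25, 5, 10, None, None),
    Entry("appendix", 41, 10, 1, 45, None),
    Entry("figure", 45, 5, 41, None, None),
]
by_order = {e.order: e for e in doc}   # stands in for a lookup in the sorted array


def children(order):
    """Follow the first-child link, then the chain of next-sibling links."""
    result = []
    current = by_order[order].child
    while current is not None:
        result.append(current)
        current = by_order[current].sibling
    return result


print(children(10), children(1), children(45))  # [11, 17, 25] [10, 41] []
```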

Regular Path Expressions

Complex regular path expressions, e.g., /chapter/_*/figure[@caption=“Tree Frogs”], use the following symbols:

Symbol   Function
_        Any single node
|        Union of nodes
*        Zero or more occurrences of a node
@        Denotes an attribute

Regular Path Expression Decomposition

A regular path expression can be decomposed into a combination of the following basic subexpressions:

1. A subexpression with a single element or a single attribute,

2. A subexpression with an element and an attribute (e.g., figure[@caption = “Tree Frogs”]),

3. A subexpression with two elements (e.g., chapter/figure or chapter/_*/figure),

4. A subexpression with a Kleene closure (+,*) of another subexpression, and

5. A subexpression that is a union of two other subexpressions.

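Example: (E1/E2)*/E3/((E4[@A=v]) | (E5/_*/E6))

Figure: decomposition tree of the expression. E1/E2 is evaluated by an EE-Join and its closure (E1/E2)* by a KC-Join; the step to E3 is an EE-Join; E4[@A=v] is an EA-Join and E5/_*/E6 an EE-Join; the two alternatives are combined by a Union, which is joined to the left part by a final EE-Join.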
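The mapping from subexpressions to joins can be sketched as a post-order walk over a parsed expression. The tuple encoding below is hypothetical (the slides do not define a parse format), and both "/" and "/_*/" steps map to EE-Join as in the figure; the walk emits one operator per inner node, children before parents.

```python
# Hypothetical tuple encoding of a parsed expression:
#   ("elem", name), ("attr", predicate), ("child", l, r) for l/r,
#   ("desc", l, r) for l/_*/r, ("pred", element, attr) for e[@...],
#   ("star", sub) for sub*, ("union", l, r) for l | r.
OPERATOR = {"child": "EE-Join", "desc": "EE-Join", "pred": "EA-Join",
            "star": "KC-Join", "union": "Union"}


def plan(node, steps=None):
    """List the join operators bottom-up (post-order), one per inner node."""
    if steps is None:
        steps = []
    kind = node[0]
    if kind in ("elem", "attr"):
        return steps              # leaves: assumed to be fetched directly from the indexes
    for sub in node[1:]:
        plan(sub, steps)
    steps.append(OPERATOR[kind])
    return steps


# (E1/E2)*/E3/((E4[@A=v]) | (E5/_*/E6))
expr = ("child",
        ("child", ("star", ("child", ("elem", "E1"), ("elem", "E2"))), ("elem", "E3")),
        ("union",
         ("pred", ("elem", "E4"), ("attr", "@A=v")),
         ("desc", ("elem", "E5"), ("elem", "E6"))))
print(plan(expr))
# ['EE-Join', 'KC-Join', 'EE-Join', 'EA-Join', 'EE-Join', 'Union', 'EE-Join']
```

The result contains four EE-Joins, one KC-Join, one EA-Join and one Union, the same operators that label the decomposition tree.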

Join algorithms

• Element-Attribute join (EA-Join)
• Element-Element join (EE-Join)
• Kleene-Closure join (KC-Join)

EA-Join Algorithm

Input: {E1..Em}, where each Ei is a set of elements sharing a common document identifier; {A1..An}, where each Aj is a set of attributes sharing a common document identifier.
Output: A set of (e, a) pairs such that element e is the parent of attribute a.

  // Sort-merge {Ei} and {Aj} by document identifier.
  For each Ei and Aj with the same did do
    // Sort-merge Ei and Aj by the PARENT-CHILD relationship.
    For each e in Ei and a in Aj do
      If (e is the parent of a) then output (e, a);
    End
  End.
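A runnable Python sketch of the EA-Join, assuming each attribute record carries the order of its parent element (the structure index stores a "parent order" field). Under that assumption "e is the parent of a" becomes an equality on (did, order) versus (did, parent order), so the two sort-merges collapse into one merge over both keys. The data is the second numbering on the attribute-element position slide: chapter <1,3>, name <2,0>, chapter <3,1>, name <4,0>.

```python
from collections import namedtuple

Element = namedtuple("Element", "did order size")
# Assumption: each attribute record carries its parent's order ("parent order").
Attribute = namedtuple("Attribute", "did order parent_order")


def ea_join(elements, attributes):
    """Sort-merge elements and attributes on (did, order) = (did, parent order)."""
    es = sorted(elements, key=lambda e: (e.did, e.order))
    attrs = sorted(attributes, key=lambda a: (a.did, a.parent_order))
    out, i, j = [], 0, 0
    while i < len(es) and j < len(attrs):
        e_key = (es[i].did, es[i].order)
        a_key = (attrs[j].did, attrs[j].parent_order)
        if e_key < a_key:
            i += 1
        elif e_key > a_key:
            j += 1
        else:
            out.append((es[i], attrs[j]))   # e is the parent of a
            j += 1                          # an element may own several attributes
    return out


chapters = [Element(1, 1, 3), Element(1, 3, 1)]
names = [Attribute(1, 4, 3), Attribute(1, 2, 1)]
for e, a in ea_join(chapters, names):
    print(e.order, a.order)   # prints "1 2" then "3 4"
```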

Example

Figure: example document tree with a book root, chapter and appendix elements, and figure elements.

Attribute-element position

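The same structure, a chapter with a name attribute containing a nested chapter with its own name attribute, numbered in two ways:

(a) The outer chapter's attribute is numbered after its child element
chapter <1,3>
  chapter <2,1>
    name <3,0>
  name <4,0>

(b) Each element's attribute is numbered immediately after the element, before its children
chapter <1,3>
  name <2,0>
  chapter <3,1>
    name <4,0>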

EE-Join Algorithm

Input: {E1..Em} and {F1..Fn}, where each Ei and each Fj is a set of elements sharing a common document identifier.
Output: A set of (e, f) pairs such that element e is an ancestor of element f.

  // Sort-merge {Ei} and {Fj} by document identifier.
  For each Ei and Fj with the same did do
    // Sort-merge Ei and Fj by the ANCESTOR-DESCENDANT relationship.
    For each e in Ei and f in Fj do
      If (e is an ancestor of f) then output (e, f);
    End
  End.
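One way to realize the ancestor-descendant sort-merge in a single pass is a stack-based merge, sketched below in Python. This is a swapped-in technique for illustration, not necessarily the merge the authors use. Both inputs are sorted by (did, order); the stack always holds a chain of nested candidate ancestors, so every element left on it when f arrives contains f.

```python
from collections import namedtuple

Node = namedtuple("Node", "did order size")


def contains(x, y):
    """x is an ancestor of y: same document and order(x) < order(y) <= order(x) + size(x)."""
    return x.did == y.did and x.order < y.order <= x.order + x.size


def ee_join(E, F):
    """All (e, f) with e in E an ancestor of f in F, via one merge over (did, order)."""
    es = sorted(E, key=lambda n: (n.did, n.order))
    fs = sorted(F, key=lambda n: (n.did, n.order))
    out, stack, i = [], [], 0
    for f in fs:
        # Push every e that starts before f, keeping the stack a chain of nested nodes.
        while i < len(es) and (es[i].did, es[i].order) < (f.did, f.order):
            while stack and not contains(stack[-1], es[i]):
                stack.pop()           # ends before es[i], so it cannot contain any later f
            stack.append(es[i])
            i += 1
        # Drop entries that end before f; everything still on the stack contains f.
        while stack and not contains(stack[-1], f):
            stack.pop()
        out.extend((e, f) for e in stack)
    return out


chapters = [Node(1, 1, 90), Node(1, 2, 80), Node(1, 8, 20), Node(1, 9, 10), Node(2, 1, 5)]
figures = [Node(1, 10, 0), Node(1, 11, 0), Node(1, 19, 0), Node(2, 3, 0)]
print(len(ee_join(chapters, figures)))   # 13
```

The first four chapters and three figures are the values from the extreme-case slide; the document-2 pair is a hypothetical addition showing that stack entries from an earlier document are discarded.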

Extreme case of EE-Join

chapter <1,90>

chapter <2,80>

chapter <8,20>

chapter <9,10>

figure <10,0>

figure <11,0>

figure <19,0>
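Checking the slide's values with the ancestor test suggests what makes this case extreme: the four chapters are nested inside one another and every figure lies inside all of them, so all 4 × 3 = 12 pairs belong in the result. A merge that advanced past a chapter after its first match would therefore miss output. A minimal Python check:

```python
chapters = [(1, 90), (2, 80), (8, 20), (9, 10)]   # <order, size> values from the slide
figures = [(10, 0), (11, 0), (19, 0)]


def is_ancestor(x, y):
    return x[0] < y[0] <= x[0] + x[1]


pairs = [(c, f) for c in chapters for f in figures if is_ancestor(c, f)]
nested = all(is_ancestor(a, b) for a, b in zip(chapters, chapters[1:]))
print(len(pairs), nested)   # 12 True
```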

KC-Join Algorithm

Input: {E1..Em}, where each Ei is a group of elements from an XML document.
Output: The Kleene closure of {E1..Em}.

  // Apply the EE-Join algorithm repeatedly.
  Set i = 1;
  Set K1 = {E1..Em};
  Repeat
    Set i = i + 1;
    Set Ki = EE-Join(Ki-1, K1);
  Until (Ki is empty);
  Output the union of K1, K2, ..., Ki-1.
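A Python sketch of the KC-Join loop under stated assumptions: intermediate results are kept as hypothetical (start, end) chains so that the union over iterations stays meaningful, each step uses ancestor-descendant semantics, and a nested loop stands in for the sort-merge EE-Join. The loop stops once no chain can be extended, which must happen because each step requires strictly deeper nesting.

```python
def is_ancestor(x, y):
    """x, y are (did, order, size) triples."""
    return x[0] == y[0] and x[1] < y[1] <= x[1] + x[2]


def ee_join(chains, elements):
    """Extend each chain (start, end) by an element nested below its end.
    Nested loops stand in for the sort-merge EE-Join."""
    return {(start, f) for (start, end) in chains for f in elements
            if is_ancestor(end, f)}


def kc_join(elements):
    k = {(e, e) for e in elements}     # K1: chains of length one
    closure = set()
    while k:                           # Until (Ki is empty)
        closure |= k                   # accumulate the union K1, K2, ...
        k = ee_join(k, elements)       # Ki = EE-Join(Ki-1, K1), joining on K1's elements
    return closure


chapters = [(1, 1, 90), (1, 2, 80), (1, 8, 20), (1, 9, 10)]
print(len(kc_join(chapters)))  # 10: 4 single chapters + 6 ancestor-descendant pairs
```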

Experimental Results

• Comparison with top-down and bottom-up evaluation methods for:
  - EE-Join (E1/_*/E2)
  - EA-Join (E[@A])
• Scalability test

EE-Join performance

EA-Join performance

Results

• The EE-Join algorithm outperformed bottom-up evaluation.
• The EA-Join algorithm was comparable to top-down evaluation and outperformed bottom-up evaluation.
• Both algorithms scale linearly.