
Post on 30-Dec-2015


Page 1: Storing and Querying Ordered XML Using Relational Database System

Storing and Querying Ordered XML Using Relational Database System

Swapna Dhayagude


Agenda

Ordered XML Data Model

Order Encoding Methods

Shredding Ordered XML into Relations

Translating XML queries to SQL

Performance Evaluation

Ordered XML Data Model

XML document as a tree structure

- Relation as the ‘root’

- Nodes represent elements

- Leaf nodes hold data values

Document Type Definition (DTD)

- schema information about the XML document

Order - a salient feature of an XML document

Significance of order in XML

Order is important for the reconstruction of XML documents

- To ensure a lossless mapping from XML to RDB

Order raises performance issues

- Choice of order encoding dramatically affects performance

- A suitable encoding enables efficient translation of XML queries into SQL

Order based functionality of XPath and XQuery

- XPath: a simple ‘path based’ query language

- XQuery: a complex query language based on XPath

Three dimensions of XML order

Evaluation of order based axes

- XPath expressions requiring document order, e.g. the axes

1. preceding

2. following

Inter Element Order

- Elements in the result set must appear in document order

Intra Element Order

- For reconstruction, the document order of an element's content is important


How is order encoded?

Order is preserved using a simple numbering scheme

Each node is represented using a node_id

The node_id is stored as a data value within the relation

Numbering schemes capture enough information to reconstruct XML documents

Order Based Functionality of XPath

XPath follows a step-by-step sequential evaluation

- Each step is applied to a single node (the context node)

- The result of each step is a set of nodes {node1, node2, ..., nodeN}

XPath syntax

Path ::= /Step1/Step2/…/StepN

where each XPath step is defined as follows:

Step ::= Axis::Node-test Predicate*

Axis selects a direction of navigation

e.g. child::title would select all children that are ‘title’ elements
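The step grammar can be illustrated with a small parser. This is only a sketch: the function name `parse_path` and the regular expression are made up for this example, and it handles just the unabbreviated form shown on the slide (no `//` shorthand).

```python
import re

# One step: optional "axis::", a node test, then zero or more [predicate] parts.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[\w*]+)(?P<preds>(?:\[[^\]]*\])*)$"
)

def parse_path(path):
    """Split an absolute path into (axis, node_test, predicates) steps."""
    steps = []
    for raw in path.strip("/").split("/"):
        match = STEP.match(raw)
        if match is None:
            raise ValueError(f"cannot parse step: {raw!r}")
        axis = match.group("axis") or "child"  # child is XPath's default axis
        predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
        steps.append((axis, match.group("test"), predicates))
    return steps

steps = parse_path("/child::play/child::act[2]/following::speech")
```

Each tuple corresponds to one Step of the grammar, so a translator could process the list left to right, feeding the node set produced by one step into the next.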

Order Based Functionality of XPath

Axes specify the direction of navigation in an XML document

Up: parent, ancestor

Down: child, descendant

Left: preceding, preceding-sibling

Right: following, following-sibling

Order Based Functionality of XQuery

BEFORE operator

- Returns nodes from the first sequence that are before some node in the second sequence

AFTER operator

- Returns nodes from the first sequence that are after some node in the second sequence

XQuery supports range predicates

- allow selection of a range of elements from a sequence

e.g. /play/act[2 TO 4]

will return act #2, act #3, and act #4 in document order.

Global Order Encoding Methods

Global Order Encoding

- Absolute positioning of nodes

- Best performance on queries: query evaluation requires only a simple comparison between node positions

- Worst performance on updates, especially inserts

play (1)
  title (2)
    text# (3)
  act (4)
    title (5)
      text# (6)
    scene (7)
  act (8)
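The numbering in the figure is a preorder traversal of the tree. A minimal sketch, assuming a hypothetical in-memory tree of `(tag, children)` tuples rather than any particular parser:

```python
# Hypothetical in-memory tree: (tag, [children]), mirroring the play example.
play = ("play", [
    ("title", [("text#", [])]),
    ("act", [("title", [("text#", [])]), ("scene", [])]),
    ("act", []),
])

def global_order(tree):
    """Number nodes 1, 2, 3, ... in document (preorder) order."""
    numbered = []

    def visit(node):
        tag, children = node
        numbered.append((len(numbered) + 1, tag))  # parent before its children
        for child in children:
            visit(child)

    visit(tree)
    return numbered

def precedes(id_a, id_b):
    """With global order, a document-order test is one integer comparison."""
    return id_a < id_b

nodes = global_order(play)
```

Because every node carries its absolute position, ordering two arbitrary nodes never requires looking at their ancestors, which is why this encoding favours queries.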

Global Order Encoding (contd)

Initially, sparse numbering is used for Global Order Encoding

- Sparse numbering brings down the cost of renumbering (on inserts/updates)

- Sparse numbering results in better performance on updates

Makes intra-element and inter-element ordering easy (since global document order is easily available)

Drawback - performs poorly on inserts (Local Order offers better performance for inserts/updates)
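One way to picture sparse numbering is to leave gaps between consecutive ids and renumber only when a gap is used up. The sketch below is an illustration under assumptions: the gap size of 100 and the midpoint rule are choices made for this example, not values taken from the slides.

```python
GAP = 100  # assumed spacing between consecutive nodes

def sparse_ids(n, gap=GAP):
    """Initial sparse global ids for n nodes in document order."""
    return [gap * (i + 1) for i in range(n)]

def insert_after(ids, index, gap=GAP):
    """Insert a node after ids[index]; return (new_ids, renumbered_count)."""
    left = ids[index]
    right = ids[index + 1] if index + 1 < len(ids) else left + 2 * gap
    if right - left > 1:
        # A free number exists in the gap: no existing node changes.
        middle = (left + right) // 2
        return ids[:index + 1] + [middle] + ids[index + 1:], 0
    # Gap exhausted: every following node is renumbered (the costly case).
    new_id = left + gap
    tail = [new_id + gap * (k + 1) for k in range(len(ids) - index - 1)]
    return ids[:index + 1] + [new_id] + tail, len(tail)

sparse, cost_sparse = insert_after(sparse_ids(3), 0)
dense, cost_dense = insert_after([1, 2, 3], 0, gap=1)
```

With gaps the insert touches no existing node, while the dense case renumbers everything after the insertion point.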

Global Order Renumbering Scenario

Inserting a new element in an existing document causes many nodes to be renumbered

In the adjoining figure, the highlighted nodes need to be renumbered (maximum in the global ordering scheme)

[Figure: the global-order play tree (play, title, text#, act, scene) with a new element inserted; the nodes to be renumbered are highlighted]

Local Order Encoding Methods

Local Order Encoding

1. Relative positioning of nodes

2. Best performance on updates

3. Worst performance on queries

play (1)
  title (1)
    text (1)
  act (2)
    title (1)
      text (1)
    scene (2)
  act (3)

Local Order Encoding (continued)

How does Local Order encoding reconstruct the absolute path?

The relative position of a node is combined with the relative positions of its parent and further ancestors

The combination yields a vector that uniquely identifies the absolute position within the document

(relative position of node) + (relative positions of ancestors) = (absolute position of node in the document)
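The combination of relative positions can be sketched as a walk up the parent links. The row layout below is a hypothetical stand-in for a local-order table of the play example; the ids are arbitrary surrogate keys and only `sIndex` carries order.

```python
# Hypothetical rows: id -> (parent_id, sIndex, tag).
edge = {
    10: (None, 1, "play"),
    11: (10, 1, "title"),
    12: (11, 1, "text"),
    13: (10, 2, "act"),
    14: (13, 1, "title"),
    15: (14, 1, "text"),
    16: (13, 2, "scene"),
    17: (10, 3, "act"),
}

def position_vector(node_id):
    """Collect sibling indexes from the node up to the root, root first."""
    vector = []
    while node_id is not None:
        parent_id, s_index, _tag = edge[node_id]
        vector.append(s_index)
        node_id = parent_id
    return tuple(reversed(vector))

# Comparing the vectors component by component gives document order.
document_order = sorted(edge, key=position_vector)
```

The loop visits one row per ancestor, which hints at why order-sensitive queries are expensive under local order: the absolute position is computed, not stored.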

Local Order Renumbering Scenario

As opposed to Global Order Encoding, Local Order requires a minimum number of nodes to be renumbered

This is a major advantage, since it dramatically reduces the cost of inserts

[Figure: the local-order play tree (play, title, text#, act, scene) with a new element inserted; the nodes to be renumbered are highlighted]

Local Order Encoding (continued)

Incurs low overhead on updates

Only following siblings may require renumbering

Drawback - the lack of global order information results in complex evaluation of the following and preceding axes

Dewey Order Encoding Methods

Dewey Order Encoding

1. Strikes a balance between Global and Local

2. Reasonable performance on updates and queries

play (1)
  title (1.1)
    text (1.1.1)
  act (1.2)
    title (1.2.1)
      text (1.2.1.1)
    scene (1.2.2)
  act (1.3)

Dewey Order Encoding

Each path uniquely identifies the absolute position of a node in a document

Query processing is similar to that of Global Order

Only following siblings (and their descendants, whose paths contain the sibling's position) may require renumbering

Drawback - extra space is required to store the path from the root to each node
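A Dewey path is the parent's path with the node's local position appended. The sketch below assumes a hypothetical `(tag, children)` tuple tree and uses dotted strings for readability; the helper names are illustrative.

```python
# Hypothetical in-memory tree: (tag, [children]), mirroring the play example.
play = ("play", [
    ("title", [("text", [])]),
    ("act", [("title", [("text", [])]), ("scene", [])]),
    ("act", []),
])

def dewey_paths(tree, prefix=(1,)):
    """Yield (dewey, tag) pairs in document order."""
    tag, children = tree
    yield ".".join(map(str, prefix)), tag
    for position, child in enumerate(children, start=1):
        yield from dewey_paths(child, prefix + (position,))

def sort_key(dewey):
    """Compare components as integers, so 1.10 sorts after 1.9."""
    return tuple(int(part) for part in dewey.split("."))

def is_ancestor(a, b):
    """a is an ancestor of b when a's components are a proper prefix of b's."""
    key_a, key_b = sort_key(a), sort_key(b)
    return len(key_a) < len(key_b) and key_b[:len(key_a)] == key_a

paths = list(dewey_paths(play))
```

Both document order and ancestry follow from the path alone, without joining back to parent rows.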

Dewey Order Renumbering Scenario

Renumbering required is more than that for Local Encoding, however much less than that for Global Encoding

[Figure: the play tree (play, title, text#, act, scene) with a new element inserted under Dewey order]


Shredding XML into Relations

Schema-less Case

- Schema of the input XML documents is unknown

- Edge Approach: each document is stored as a single table

Schema-aware Case

- Schema of the input XML documents is available

- Inlining:

  Single occurrence of child - store within the parent relation

  Multiple occurrences - store as a new relation

Inlining

Inlining is an effective way of storing and querying XML, provided the document schema is available

Inlining adapts to Global, Local and Dewey Orders.

Every relation requires an additional column to encode document order

Storing order information of ‘inlined’ elements is unnecessary (element position is determined from the position of the parent and from the document schema)

Storing Order Information - Schema-less case

The Edge Approach

- Each document is stored as a single table

- Each tuple within the table represents a node

Edge (id, parent_id, name, value)

id: acts as the primary key

parent_id: acts as a foreign key, provides the link to the node’s parent

name: stores the tag name of the element

value: stores the text value
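A minimal sketch of the Edge table, using Python's built-in `sqlite3` and `xml.etree.ElementTree` modules. The sample document and the `shred` helper are made up for illustration, and text is stored in the `value` column of its element rather than as a separate text node.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
XML = "<play><title>Hamlet</title><act><title>Act I</title><scene/></act><act/></play>"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Edge ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES Edge(id),"
    " name TEXT,"
    " value TEXT)"
)

def shred(element, parent_id=None):
    """Insert one row per element, visiting the tree in document order."""
    text = (element.text or "").strip() or None
    cursor = conn.execute(
        "INSERT INTO Edge (parent_id, name, value) VALUES (?, ?, ?)",
        (parent_id, element.tag, text),
    )
    node_id = cursor.lastrowid  # id assigned to this element's row
    for child in element:
        shred(child, node_id)

shred(ET.fromstring(XML))
rows = conn.execute(
    "SELECT id, parent_id, name, value FROM Edge ORDER BY id"
).fetchall()
```

Because rows are inserted in document order, the automatically assigned ids here happen to form a dense global order; that is a property of this sketch, not something the base Edge schema guarantees.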

Storing Order Information - Schema-less case

Edge approach adapts differently to Global, Local and Dewey

Global Order: Edge (id, parent_id, end_desc_id, path_id, value)

- end_desc_id: id of the last descendant of a node

Local Order: Edge (id, parent_id, sIndex, path_id, value)

- sIndex: sibling index of a node

Dewey Order: Edge (dewey, path_id, value)

- dewey: represents both order and ancestor information


Query Translation for Global Order

Edge (id, parent_id, end_desc_id, path_id, value)

Translation of following/preceding

- following: select nodes from the Edge table whose id > end_desc_id of the context node

- preceding: select nodes from the Edge table whose end_desc_id < id of the context node

Translation of following-sibling/preceding-sibling

- following-sibling: select nodes with (id > id of context node) AND (parent_id = parent_id of context node)

- preceding-sibling: select nodes with (id < id of context node) AND (parent_id = parent_id of context node)

Note: the above expressions are NOT actual SQL statements
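As a rough illustration of what such translations might look like in actual SQL, the sketch below runs them against an in-memory SQLite table. The table is a simplified Edge relation (a `name` column stands in for `path_id` and `value`), the rows encode the eight-node play example, and the query strings are one possible formulation rather than the presenter's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Edge (id INTEGER PRIMARY KEY, parent_id INTEGER,"
    " end_desc_id INTEGER, name TEXT)"
)
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?)", [
    (1, None, 8, "play"), (2, 1, 3, "title"), (3, 2, 3, "text#"),
    (4, 1, 7, "act"), (5, 4, 6, "title"), (6, 5, 6, "text#"),
    (7, 4, 7, "scene"), (8, 1, 8, "act"),
])

# n ranges over candidate nodes, c is the context node.
FOLLOWING = """SELECT n.id FROM Edge n, Edge c
               WHERE c.id = ? AND n.id > c.end_desc_id ORDER BY n.id"""
PRECEDING = """SELECT n.id FROM Edge n, Edge c
               WHERE c.id = ? AND n.end_desc_id < c.id ORDER BY n.id"""
FOLLOWING_SIBLING = """SELECT n.id FROM Edge n, Edge c
               WHERE c.id = ? AND n.parent_id = c.parent_id
                 AND n.id > c.id ORDER BY n.id"""

def axis(sql, context_id):
    """Return the ids selected by one axis query for the given context node."""
    return [row[0] for row in conn.execute(sql, (context_id,))]
```

Each axis is a single self-join with integer comparisons, and `ORDER BY n.id` returns the result in document order at no extra cost.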

Query Translation for Local Order

Edge (id, parent_id, sIndex, path_id, value)

Translation of following-sibling/preceding-sibling (similar to Global and Dewey Order)

Translation of following/preceding (a complex task)

1. Compute all ancestors of the context node - {anc}

2. Compute the following siblings of the context node and of each node in {anc} - {anc_sib}

3. Compute the descendants of {anc_sib}; together with {anc_sib} they form the result

Challenges: without knowledge of the XML schema, retrieving ancestors/descendants is a complex task that involves recursion
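The three steps for the following axis can be sketched over in-memory rows. Everything here is illustrative: the dictionary stands in for a local-order Edge table of the play example, the ids are arbitrary keys, and a real translation would express the loops as recursive SQL.

```python
# Hypothetical local-order rows: id -> (parent_id, sIndex).
edge = {
    10: (None, 1),  # play
    11: (10, 1),    # title
    12: (11, 1),    # text
    13: (10, 2),    # act
    14: (13, 1),    # title
    15: (14, 1),    # text
    16: (13, 2),    # scene
    17: (10, 3),    # act
}

def ancestors_or_self(node_id):
    """Step 1: the context node followed by its ancestors up to the root."""
    chain = []
    while node_id is not None:
        chain.append(node_id)
        node_id = edge[node_id][0]
    return chain

def following_siblings(node_id):
    """Step 2: nodes with the same parent and a larger sibling index."""
    parent_id, s_index = edge[node_id]
    return [n for n, (p, s) in edge.items() if p == parent_id and s > s_index]

def descendants_or_self(node_id):
    """Step 3: a node plus everything below it (recursive)."""
    found = [node_id]
    for n, (p, _s) in edge.items():
        if p == node_id:
            found.extend(descendants_or_self(n))
    return found

def following(node_id):
    """Combine the three steps into the following axis."""
    result = set()
    for anc in ancestors_or_self(node_id):
        for sibling in following_siblings(anc):
            result.update(descendants_or_self(sibling))
    return result
```

Two of the three helpers loop or recurse over the table, which mirrors the recursion the slide names as the main challenge.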

Query Translation for Dewey Order

Edge (dewey, path_id, value)

dewey column

- stored as a variable length byte string

- replaces parent_id and end_desc_id of the Global Edge table

- encodes parent and descendant information within the Dewey path, so neither parent_id nor end_desc_id needs to be stored

Drawback:

Storage overhead due to the large number of bytes allocated to each component.
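To make the storage trade-off concrete, here is one possible byte-string encoding. It is an assumption made for this sketch, not necessarily the encoding used in the experiments: every component is packed as a fixed-width big-endian integer, so the string grows with depth and a plain byte comparison matches document order.

```python
import struct

FORMATS = {1: "B", 2: "H", 4: "I"}  # struct codes for 1, 2 and 4 byte components

def encode(dewey, width=2):
    """Pack each Dewey component as a big-endian unsigned integer."""
    parts = [int(part) for part in dewey.split(".")]
    return struct.pack(f">{len(parts)}{FORMATS[width]}", *parts)

paths = ["1", "1.1", "1.1.1", "1.2", "1.2.1", "1.2.1.1", "1.2.2", "1.3"]
by_bytes = sorted(reversed(paths), key=encode)
```

An ancestor's encoding is a prefix of each descendant's, so ancestry becomes a prefix test on the byte strings. The cost is visible too: with two bytes per component, a node four levels deep needs eight bytes even when every component is a small number.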

Query Translation in Inlining

Essentially uses the same algorithm as the Edge approach, but with 2 differences

First, XML data can be spread across several tables, so evaluating axes requires access to multiple tables as opposed to just one Edge table

Second, the translation algorithm does not use recursion (since the schema contains sufficient information about the depth and position of nodes)

Drawback:

Data is partitioned across many tables, which makes for too many tables to handle


Storage Requirements

Table 1: Storage requirements of the Global, Local and Dewey encoding methods

Order Scheme   Edge                       Inlining
               Table Size   Index Size    Table Size   Index Size
Global         52.1 MB      57.9 MB       44.1 MB      28.9 MB
Local          52.1 MB      87.9 MB       47.7 MB      36.8 MB
Dewey          48.9 MB      38.7 MB       44.5 MB      15.8 MB

Performance

All experiments are based on the Shakespeare’s Plays dataset.

Table 2: Test Queries

Query   Query Definition
Q1      /play
Q2      /play/act//speech
Q3      /play/act/scene/speech
Q4      /play/act/scene/speech[2]
Q5      /play/act/scene/*[2]
Q6      /play/act/scene/speech[1 TO 3]
Q7      /play/act[2]/following::speech
Q8      /play/act/scene/speech/speaker/following-sibling::line[2]
Q9      //act/scene/speech BEFORE /play/act[2]

Select and Reconstruct Modes

XPath queries essentially run in 2 different modes

Select Mode: the result set contains only the IDs of the nodes satisfying the XPath expression

Reconstruct Mode: entire XML fragments are extracted from the database in document order
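The difference between the two modes can be sketched with a handful of rows. The data and helper names are hypothetical; the rows use a global-order id, so sorting by id restores document order before the fragment is rebuilt.

```python
import xml.etree.ElementTree as ET

# Hypothetical global-order rows: (id, parent_id, name, value).
ROWS = [
    (1, None, "play", None), (2, 1, "title", "Hamlet"), (3, 1, "act", None),
    (4, 3, "title", "Act I"), (5, 3, "scene", None), (6, 1, "act", None),
]

def select_mode(name):
    """Select mode: return only the ids of the matching nodes."""
    return [node_id for node_id, _parent, tag, _value in ROWS if tag == name]

def reconstruct_mode(root_id):
    """Reconstruct mode: rebuild the XML fragment rooted at root_id."""
    built = {}
    for node_id, parent_id, tag, value in sorted(ROWS):  # document order
        if node_id == root_id:
            built[node_id] = ET.Element(tag)
        elif parent_id in built:
            built[node_id] = ET.SubElement(built[parent_id], tag)
        else:
            continue  # node lies outside the requested fragment
        built[node_id].text = value
    return ET.tostring(built[root_id], encoding="unicode")

act_ids = select_mode("act")
fragment = reconstruct_mode(3)
```

Select mode touches only the matching rows, while reconstruct mode has to fetch and order every row of the fragment, which is consistent with reconstruction being the more expensive mode.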

Ordered Selection Edge Results

[Bar chart: time in seconds (y axis, scale 0 to 18) for queries Q1 to Q9 (x axis), with one bar each for Global, Local and Dewey]

Inlining Results

[Bar chart: queries Q1 to Q9 on the x axis, scale 0 to 8, with one bar each for Global, Local and Dewey]

Reconstruction

In reconstruct mode, XML documents need to be extracted from the DB in document order

The optimizer's inability to pick the best plan produced poor results

On the other hand, using ‘tuned’ SQL queries yielded better results

Note: Queries Q3, Q4, Q5 and Q9 performed very poorly (far beyond the indicated scale)

[Bar chart: Initial vs. Tuned SQL for queries Q1 to Q9, scale 0 to 10]

Performance

Results based on experiments

Global Order is the most efficient order encoding method

Dewey Order follows with the second best performance

Local Order uses sorting very often, which degrades overall performance

Typically Inlining performs better than Edge

In general the XML document parsing overhead was more than that of XPath processing

Performance

Conclusions based on results

RDBMS efficiently supports ordered XML

Global Order is the best for query workloads

Dewey Order is slightly less efficient than Global Order

- Best for a mix of queries and updates

Schema information makes Local Order a viable alternative

Relational optimizers do not understand the hierarchical structure of XML


Acknowledgements…

Prof. Elke Rundensteiner

Thank You …