
Post on 22-Dec-2015


1

Introduction to XML Algebra

Based on talk prepared for CS561 by Wan Liu and Bintou Kane

2

Data Model

Data model ~ core data structures and data types supported by the DBMS

A relational database is a table (set-oriented) data model

XML format is a tree-structured hierarchical model

3

Why XML Algebra?

It is common to translate a query language into an algebra.

First, the algebra is used to give a semantics for the query language.

Second, the algebra is used to support query optimization.

5

NIAGARA

Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier

Univ. of Wisconsin

6

Outline

Concepts of Niagara Algebra

Operations

Optimization

7

Goals of Niagara Algebra

Be independent of schema information

Query on both structure and content

Generate simple, flexible, yet powerful algebraic expressions

Allow re-use of traditional optimization techniques

8

Example: XML Source Documents

Invoice.xml

<Invoice_Document>

<invoice No="1">

<account_number>2 </account_number>

<carrier>AT&amp;T</carrier>

<total>$0.25</total>

</invoice>

<invoice>

<account_number>1 </account_number>

<carrier>Sprint</carrier>

<total>$1.20</total>

</invoice>

<invoice>

<account_number>1 </account_number>

<carrier>AT&amp;T</carrier>

<total>$0.75</total>

</invoice>

</Invoice_Document>

Customer.xml

<Customer_Document>

<customer>

<account>1 </account>

<name>Tom </name>

</customer >

<customer>

<account>2 </account>

<name>George </name>

</customer >

</Customer_Document>

9

XML Data Model and Tree Graph

Example: Invoice_Document

[Figure: ordered tree graph. The root Invoice_Document has invoice children; each invoice has number, carrier, and total children. Leaf values shown: (2, AT&T, $0.25) and (1, Sprint, $1.20).]

<Invoice_Document>

<invoice>

<number>2</number>

<carrier>AT&amp;T</carrier>

<total>$0.25</total>

</invoice>

<invoice>

<number>1</number>

<carrier>Sprint</carrier>

<total>$1.20</total>

</invoice>

</Invoice_Document>

Ordered tree graph, semi-structured data

10

XML Data Model [GVDNM01]

Collection of bags of vertices.

Vertices in a bag have no order.

Example: a bag with three vertices

Root invoice.xml

invoice: <invoice> invoice-element-content </invoice>

invoice.account_number: <account_number> element-content </account_number>

[Root "invoice.xml", invoice, invoice.account_number]

11

Data Model

Bag elements are reachable by path expressions.

A path expression consists of two parts:

An entry point

A relative forward part

Example: account_number:invoice (forward part account_number, entry point invoice)
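As a rough illustration of the bag and path-expression ideas, a bag can be modeled as a mapping from entry-point names to vertices, and a path expression as a forward part plus an entry point. The dict representation and the helper name `parse_path` are assumptions made for this sketch; they are not part of the Niagara algebra itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: a bag maps an entry-point label to an XML vertex.
# A dict is used because vertices in a bag carry no order.
def parse_path(expr):
    """Split 'forward:entry' into (forward_part, entry_point)."""
    forward, entry = expr.split(":")
    return forward, entry

invoice = ET.fromstring(
    "<invoice><account_number>2</account_number></invoice>"
)
bag = {"invoice": invoice}

forward, entry = parse_path("account_number:invoice")
# Start at the entry point, then walk the relative forward part.
reached = bag[entry].findall(forward)
values = [v.text for v in reached]
```

Here `values` holds the text of every vertex reached from the `invoice` entry point.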

12

Operators

Source (S), Follow (Φ), Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product

13

Source Operator S

Input: a list of documents

Output: a collection of singleton bags

Examples:

S(*): all known XML documents

S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"

S(*, schema.dtd): all known XML documents that conform to schema.dtd
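A minimal sketch of the Source operator, assuming the "known documents" are held in an in-memory dictionary and that filename patterns follow shell-style globbing. The names `KNOWN_DOCS` and `source` are hypothetical, and DTD conformance checking is left out.

```python
from fnmatch import fnmatch

# Hypothetical in-memory "known documents"; a real system would read files.
KNOWN_DOCS = {
    "invoice1.xml": "<Invoice_Document/>",
    "invoice2.xml": "<Invoice_Document/>",
    "customer.xml": "<Customer_Document/>",
}

def source(pattern="*"):
    """Return a collection of singleton bags, one per matching document."""
    return [
        {f"Root {name}": text}
        for name, text in sorted(KNOWN_DOCS.items())
        if fnmatch(name, pattern)
    ]

all_docs = source("*")               # S(*)
invoices = source("invoice*.xml")    # S(invoice*.xml)
```

Each output bag contains exactly one vertex, the document root, which is what makes it a singleton bag.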

14

Follow operator

Input: a path expression in entry point notation

Functionality: extracts vertices reachable by the path expression

Output: a new bag that consists of the extracted vertex + all contents of the original bag (in case of unnesting follow)

15

Follow operator (Example*)

Φμ(carrier:invoice)   (* unnesting follow)

Input: {[Root invoice.xml, invoice]}

invoice: <invoice> invoice-element-content </invoice>

Output: {[Root invoice.xml, invoice, invoice.carrier]}

invoice: <invoice> invoice-element-content </invoice>

invoice.carrier: <carrier> carrier-element-content </carrier>
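The unnesting follow can be sketched as below, assuming a bag is a dict from entry-point label to element and that an input bag with no reachable vertex produces no output bag. The function name `follow_unnest` and the sample data are illustrative only.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>
  <invoice><account_number>1</account_number></invoice>
</Invoice_Document>"""

def follow_unnest(bags, path):
    """Unnesting follow: one output bag per vertex reached from each input bag."""
    forward, entry = path.split(":")
    out = []
    for bag in bags:
        for vertex in bag[entry].findall(forward):
            new_bag = dict(bag)                      # keep the original contents
            new_bag[f"{entry}.{forward}"] = vertex   # add the extracted vertex
            out.append(new_bag)
    return out

root = ET.fromstring(INVOICES)
invoice_bags = [{"Root invoice.xml": root, "invoice": inv}
                for inv in root.findall("invoice")]
with_carrier = follow_unnest(invoice_bags, "carrier:invoice")
carriers = [b["invoice.carrier"].text for b in with_carrier]
```

Under this assumption the third invoice, which has no carrier, does not appear in `with_carrier`.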

16

Select operator

Input: a set of bags

Functionality: filters the bags of a collection using a predicate

Output: a set of bags that conform to the predicate

Predicate: logical operators (and, or, not) or simple qualifications (=, ≠, <, ≤, >, ≥)

17

Select operator (Example)

Select: invoice.carrier = Sprint

Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}

Each invoice vertex: <invoice> invoice-element-content </invoice>

Output: {[Root invoice.xml, invoice], …}
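A small sketch of Select as a filter over bags. It assumes predicates are ordinary Python callables and that a missing carrier simply fails the comparison; `select` and `carrier_is` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def select(bags, predicate):
    """Keep only the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]

def carrier_is(name):
    # Predicate for invoice.carrier = name; a missing carrier never matches.
    return lambda bag: bag["invoice"].findtext("carrier") == name

invoices = [ET.fromstring(x) for x in (
    "<invoice><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>",
    "<invoice><carrier>Sprint</carrier><total>$1.20</total></invoice>",
    "<invoice><total>$0.75</total></invoice>",
)]
bags = [{"invoice": inv} for inv in invoices]
sprint = select(bags, carrier_is("Sprint"))
totals = [b["invoice"].findtext("total") for b in sprint]
```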

18

Join operator

Input: two collections of bags

Functionality: joins the two collections based on a predicate

Output: the concatenation of pairs of bags that satisfy the predicate

19

Join operator (Example)

Join predicate: account_number:invoice = account:customer

Input: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}

invoice: <invoice> invoice-element-content </invoice>

customer: <customer> customer-element-content </customer>

Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
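A sketch of Join as a nested loop over two collections, assuming dict-based bags whose labels do not collide and a predicate that compares whitespace-trimmed text. The names `join` and `same_account` are hypothetical.

```python
import xml.etree.ElementTree as ET

def join(left, right, predicate):
    """Concatenate each pair of bags (one per side) that satisfies the predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

def same_account(l, r):
    # invoice.account_number = customer.account, ignoring surrounding spaces.
    a = l["invoice"].findtext("account_number", "").strip()
    b = r["customer"].findtext("account", "").strip()
    return a != "" and a == b

invoices = [{"invoice": ET.fromstring(x)} for x in (
    "<invoice><account_number>2 </account_number><total>$0.25</total></invoice>",
    "<invoice><account_number>1 </account_number><total>$1.20</total></invoice>",
)]
customers = [{"customer": ET.fromstring(x)} for x in (
    "<customer><account>1 </account><name>Tom </name></customer>",
    "<customer><account>2 </account><name>George </name></customer>",
)]
joined = join(invoices, customers, same_account)
pairs = [(b["invoice"].findtext("total"), b["customer"].findtext("name").strip())
         for b in joined]
```

A nested loop is the simplest choice here; the slides do not say which join algorithm the real system uses.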

20

Expose operator

Input: a list of path expressions of vertices to be exposed

Output: a set of bags that contain the vertices named in the parameter list, in the same order as the list

21

Expose operator (Example)

Expose: (bill_period, carrier)

Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}

invoice: <invoice> invoice-element-content </invoice>

invoice.carrier: <carrier> carrier-element-content </carrier>

invoice.bill_period: <bill_period> bill_period-element-content </bill_period>

Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}

invoice.bill_period: <bill_period> bill_period-element-content </bill_period>

invoice.carrier: <carrier> carrier-element-content </carrier>
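Expose can be sketched as a projection that also fixes an order. This assumes the exposed result is represented as an ordered list of (label, vertex) pairs; the element contents are placeholders and `expose` is a hypothetical name.

```python
def expose(bags, labels):
    """Keep only the listed vertices, in the order given by the parameter list."""
    return [[(label, bag[label]) for label in labels] for bag in bags]

# Placeholder contents; only the labels matter for this illustration.
bag = {
    "invoice": "<invoice>...</invoice>",
    "invoice.carrier": "<carrier>...</carrier>",
    "invoice.bill_period": "<bill_period>...</bill_period>",
}
exposed = expose([bag], ["invoice.bill_period", "invoice.carrier"])
order = [label for label, _ in exposed[0]]
```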

22

Vertex operator

Creates the actual XML vertex that will encompass everything created by an expose operator

Example :

(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
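One possible reading of the Vertex example, sketched with `xml.etree.ElementTree`: a new `Customer_invoice` element is built whose `account` and `inv_total` children take their content from vertices in the bag. Whether the operator copies text or nests the original elements is not stated in the slides, so copying text is an assumption here, as are the helper names `vertex` and `wrap`.

```python
import xml.etree.ElementTree as ET

def wrap(tag, source):
    """Create element `tag` holding the text content of an existing vertex."""
    elem = ET.Element(tag)
    elem.text = source.text
    return elem

def vertex(tag, children):
    """Create a new XML element `tag` that encloses the given children."""
    elem = ET.Element(tag)
    elem.extend(children)
    return elem

bag = {
    "invoice.account_number": ET.fromstring("<account_number>1</account_number>"),
    "invoice.total": ET.fromstring("<total>$1.20</total>"),
}
result = vertex("Customer_invoice", [
    wrap("account", bag["invoice.account_number"]),
    wrap("inv_total", bag["invoice.total"]),
])
xml_text = ET.tostring(result, encoding="unicode")
```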

23

Other operators

Group: used for arbitrary grouping of elements based on their values

Aggregate functions can be used with the group operator (e.g. average)

Rename: changes the entry point annotation of elements of a bag

Example: (invoice.bill_period, date)
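Group with an aggregate, and Rename, might look like the following. The sketch assumes bags are dicts, that totals have already been converted to numbers, and that averaging is the aggregate of interest; `group_avg` and `rename` are hypothetical names.

```python
from collections import defaultdict

def rename(bags, old, new):
    """Change the entry-point label `old` to `new` in every bag."""
    return [{(new if k == old else k): v for k, v in bag.items()} for bag in bags]

def group_avg(bags, key, value):
    """Group bags by `key` and average the numeric `value` in each group."""
    groups = defaultdict(list)
    for bag in bags:
        groups[bag[key]].append(bag[value])
    return {k: sum(v) / len(v) for k, v in groups.items()}

bags = [
    {"invoice.account_number": "2", "invoice.total": 0.25},
    {"invoice.account_number": "1", "invoice.total": 1.20},
    {"invoice.account_number": "1", "invoice.total": 0.75},
]
averages = group_avg(bags, "invoice.account_number", "invoice.total")
renamed = rename([{"invoice.bill_period": "..."}], "invoice.bill_period", "date")
```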

24

Example: XML Source Documents

Invoice.xml

<Invoice_Document>

<invoice>

<account_number>2 </account_number>

<carrier>AT&amp;T</carrier>

<total>$0.25</total>

</invoice>

<invoice>

<account_number>1 </account_number>

<carrier>Sprint</carrier>

<total>$1.20</total>

</invoice>

<invoice>

<account_number>1 </account_number>

<total>$0.75</total>

</invoice>

<auditor> maria </auditor>

</Invoice_Document>

Customer.xml

<Customer_Document>

<customer>

<account>1 </account>

<name>Tom </name>

</customer >

<customer>

<account>2 </account>

<name>George </name>

</customer >

</Customer_Document>

25

XQuery Example

List account number, customer name, and invoice total for all invoices that have carrier = "Sprint".

FOR $i in (invoices.xml)//invoice,

$c in (customers.xml)//customer

WHERE $i/carrier = "Sprint" and

$i/account_number= $c/account

RETURN

<Sprint_invoices>

$i/account_number,

$c/name,

$i/total

</Sprint_invoices>

26

Example: XQuery output

<Sprint_Invoice>

<account_number>1 </account_number>

<name>Tom </name>

<total>$1.20</total>

</Sprint_Invoice >

27

Algebra Tree Execution

Source (invoices.xml), Source (customers.xml)

Follow (*.invoice), Follow (*.customer)

Result: invoice (1), invoice (2), invoice (3); customer (1), customer (2)

Select (carrier = "Sprint")

Result: invoice (2)

Join (*.invoice.account_number = *.customer.account)

Result: invoice (2), customer (1)

Expose (*.account_number, *.name, *.total)

Result: account_number, name, total
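The whole plan can be traced end to end in a few lines of Python. This is only a sketch of the operator sequence under simplifying assumptions: documents are inline strings, bags are dicts, and values are compared after trimming whitespace. The helper `text` is hypothetical.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
<invoice><account_number>2 </account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
<invoice><account_number>1 </account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
<invoice><account_number>1 </account_number><total>$0.75</total></invoice>
<auditor> maria </auditor>
</Invoice_Document>"""

CUSTOMERS = """<Customer_Document>
<customer><account>1 </account><name>Tom </name></customer>
<customer><account>2 </account><name>George </name></customer>
</Customer_Document>"""

def text(elem, tag):
    """Whitespace-trimmed text of child `tag`, or None if the child is absent."""
    value = elem.findtext(tag)
    return value.strip() if value is not None else None

# Source + Follow: one bag per invoice / customer element.
invoices = [{"invoice": e} for e in ET.fromstring(INVOICES).iter("invoice")]
customers = [{"customer": e} for e in ET.fromstring(CUSTOMERS).iter("customer")]

# Select: carrier = "Sprint".
sprint = [b for b in invoices if text(b["invoice"], "carrier") == "Sprint"]

# Join: invoice.account_number = customer.account.
joined = [{**i, **c} for i in sprint for c in customers
          if text(i["invoice"], "account_number") == text(c["customer"], "account")]

# Expose: account_number, name, total (in that order).
rows = [(text(b["invoice"], "account_number"),
         text(b["customer"], "name"),
         text(b["invoice"], "total")) for b in joined]
```

Applying Select before Join keeps the join input small: only one invoice bag reaches it.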

28

Optimization with Niagara

Optimizer based on Niagara algebra:

Use the operations more efficiently

Produce simpler expressions by combining operations

29

Language Convention

A and B are path expressions

A < B --- path expression A is a prefix of B

AnB --- common prefix of paths A and B

AńB --- greatest common prefix of paths A and B

┴ --- null path expression

30

Heuristics using Rewrite Rules

Allow optimization based on path selectivity

When applying the unnesting follow operation Φμ

31

Interchangeability of Follow operation

Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]

TRUE when there exists C such that C < A and C < B and C = AńB,

or when AnB = ┴
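The side condition of the rule can be checked mechanically. The sketch below assumes a path is encoded as a list of steps with the entry point first, that "<" means proper prefix, and that an empty list stands for the null path; these encodings, and the example path `name:invoice.carrier`, are assumptions for illustration.

```python
def to_steps(path):
    """Turn 'forward:entry' into a list of steps, entry point first."""
    forward, entry = path.split(":")
    return entry.split(".") + forward.split(".")

def common_prefix(a, b):
    """Greatest common prefix of two step lists; [] plays the null path."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix

def is_proper_prefix(c, a):
    return len(c) < len(a) and a[:len(c)] == c

def interchangeable(path_a, path_b):
    """True when the two unnesting follows may be applied in either order."""
    a, b = to_steps(path_a), to_steps(path_b)
    c = common_prefix(a, b)
    if not c:                       # no common prefix: the null-path case
        return True
    return is_proper_prefix(c, a) and is_proper_prefix(c, b)

same_parent = interchangeable("acc_Num:invoice", "carrier:invoice")
disjoint = interchangeable("acc_Num:invoice", "acc_Num:customer")
nested = interchangeable("carrier:invoice", "name:invoice.carrier")
```

Under these assumptions the check fails only when one path is a prefix of the other, that is, when the second follow depends on the vertex produced by the first.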

32

Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

=?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

33

Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]

Equivalent because both share the common prefix “invoice”.

Case AńB = invoice

34

Benefit of Rule Application

NOTE: let us assume that acc_Num is required for each invoice element, while carrier is not required for an invoice element.

THEN:

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

Then which algebra tree do we prefer?

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] makes more sense than **. Why?

35

Discussion

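Reduction of input size by the first sub-operation:

Φμ(carrier:invoice)

Because carrier is optional, this follow drops every invoice that has no carrier, so the next follow, Φμ(acc_Num:invoice), receives fewer bags.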
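The size argument can be made concrete by counting intermediate bags for both orders. The sketch assumes a dict-based bag model and an unnesting follow that emits nothing for an input bag with no matching child; the three-invoice document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<Invoice_Document>
<invoice><acc_Num>2</acc_Num><carrier>AT&amp;T</carrier></invoice>
<invoice><acc_Num>1</acc_Num><carrier>Sprint</carrier></invoice>
<invoice><acc_Num>1</acc_Num></invoice>
</Invoice_Document>"""

def follow_unnest(bags, path):
    """One output bag per vertex reached; bags with no match are dropped."""
    forward, entry = path.split(":")
    return [{**bag, f"{entry}.{forward}": v}
            for bag in bags for v in bag[entry].findall(forward)]

start = [{"invoice": e} for e in ET.fromstring(DOC).findall("invoice")]

# Required element first: every invoice survives the first follow.
step1_a = follow_unnest(start, "acc_Num:invoice")
final_a = follow_unnest(step1_a, "carrier:invoice")

# Optional element first: fewer bags reach the second follow.
step1_b = follow_unnest(start, "carrier:invoice")
final_b = follow_unnest(step1_b, "acc_Num:invoice")

sizes = (len(step1_a), len(step1_b), len(final_a), len(final_b))
```

Both orders end with the same two bags, but following the optional element first passes two bags instead of three to the second follow.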

36

Should we/can we apply the rule below?

Φμ(acc_Num:invoice)[Φμ(acc_Num:customer)]

37

"acc_Num:invoice" and "acc_Num:customer" are two totally different paths

Case is: AnB = ┴

So yes, rule is valid.
