xquery to sql by xat
DESCRIPTION
XQuery to SQL by XAT. Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke. Outline. Merged algebra proposed based on Niagara XPERANTO One thorough example of XQuery SQL. Data Model. An Ordered Table in two dimensions Tuple order Column order. Every cell has its own domain - PowerPoint PPT PresentationTRANSCRIPT
1
XQuery to SQL by XAT
Xin ZhangThanks:
Brian, Mukesh, Maged, Lily, Elke
2
Outline Merged algebra proposed based on
Niagara XPERANTO
One thorough example of XQuery SQL
3
Data Model An Ordered Table in two dimensions
Tuple order Column order.
Every cell has its own domain Every column binds to one variable. The domain can be:
SQL domains. XML Fragment.
Can be a list of XML elements. Comparison are done by values
4
Data Model Examples Table of XML Fragments. Explicit Naming
E.g. variable bindings Implicit Naming
E.g. XPath notations. Reduce complexity of many
internal variables.
<carrier> $carrier</carrier
invoice_id
carrier
carrier_entry
carriers <carrier> $carrier</carrier<carrier> $carrier</carrier>……….
/ /invoice /invoice/account_number
$rate
5
Naming of Columns Implicit:
SQL operators Navigate
Explicit ( “name”): Variable binding:
Holding a set of values. Variable name ($name) is name of a column
Rename Distinguish in one operator where,
same “names” from different sources Abbreviate a very long “name”.
Create a new name for creation operators Need to used with those operators. E.g. Tagger
6
Operators SQL like (9):
Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp.
XML like (4): Tagger, Navigate, is(Element, Text),
Aggregate. Special:
SQL, Function, Source, Name, FOR
7
SQL like Operators (9)NiagaraNiagara XPERANTOXPERANTO
Project Expose Project
Select Select Select
Theta Join Join Theta Join
Outer Join N/A Outer Join
Semi Join N/A N/A
Groupby Group Groupby
Orderby N/A Orderby
Union Union Union
Outer Union
Union Outer Union
COp N/A Correlated Join
8
XML like OperatorsNiagarNiagaraa
XPERANTOXPERANTO
Tagger*(pattern)
Vertex Project:cr8(Elem, AttList, Att, XMLFragList),
Navigate(from, path)
Follow Project:get(TagName, Attributes, Contents, AttName, AttValue), Unnest
Is N/A Select:is(Element, Text),
Aggregate Group AggXMLFrags
9
Special Operators
NiagarNiagaraa
XPERANTXPERANTOO
DescriptionDescription
SQL N/A Input Denote a SQL query.
Function
N/A Function Used to represent recursive query
Source Source Table, View
Identify a data source.
Name Rename
N/A Naming of columns.
FOR N/A N/A FOR iteration.
10
Operator Specification Description
Input Specification. Output Specification. Logic description.
Illustrative Example
11
Naming Operator Syntax:
Name(“from_name”, “to_name”) Simplified Syntax:
to_name := from_name
12
Steps in Translation XQuery XML Algebra Tree User View XML Algebra Tree View Composition Computation Pushdown Optimization
13
<?xml version=”1.0” encoding=”US-ASCII” ?> <!DOCTYPE invoice [<!ELEMENT invoice (account_number,
bill_period, carrier+, itemized_call*, total)>
<!ELEMENT account_number (#PCDATA)><!ELEMENT bill_period (#PCDATA)><!ELEMENT carrier (#PCDATA)><!ELEMENT itemized_call EMPTY><!ATTLIST itemized_call
no ID #REQUIREDdate CDATA #REQUIREDnumber_called CDATA #REQUIREDtime CDATA #REQUIREDrate (NIGHT|DAY) #REQUIREDmin CDATA #REQUIREDamount CDATA #REQUIRED>
<!ELEMENT total (#PCDATA)>]>
<invoice>
<account_number>555 777-3158 573 234 3</account_number>
<bill_period>Jun 9 - Jul 8, 2000</bill_period>
<carrier>Sprint</carrier>
<itemized_call no=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” />
<itemized_call no=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” />
<itemized_call no=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” />
<total>$0.35</total>
</invoice>
Example of Telephone Bill
14
Example XQueryUser XQuery: <summary>{
FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate)
LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
WHERE $itemized_call/@number_called LIKE ‘973%’
RETURN<rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>
}</summary>
Count number of itemized_calls in calling area 973 grouped by the calling rate.
15
XQuery XML Algebra Tree Divide into query blocks Convert each query block into XML
Algebra Tree (XAT). Identify Correlated Operators Combine into one XML Algebra
Tree. Query decorrelation
16
Query BlocksUser XQuery: <summary>{
FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate)
LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]
WHERE $itemized_call/@number_called LIKE ‘973%’
RETURN<rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>
}</summary>
B1: Construct summary from the result from B2B2: Get all the distinct rate and iterate through it.
B1
B2
B3
B3: Count itemized call for a given rate.
The block identification is
arbitrary (wrong).
17
XAT of B1
B1B1<summary>
</summary>
B2B2
XAT:
Tagger(<summary>[V1] </summary>)
B2
[V2] it is a name instead of a part of pattern.
Name(“Tagger(<summary>[V1]</summary>)”, “V2”)
18
XAT of B2
B3B3{FOR
$rate IN distinct(document(“invoice”)/invoice/itemized_call@rate)
}
B3B3
XAT:
Select(distinct(“invoice/itemized_call/@rate:/”))
B3
Source(“invoice.xml”)
Navigate(“/”, invoice/itemized_call/@rate)
Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”)
FOR($rate)
Aggregate
19
XAT of B3
B4B4LET
$itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate]
WHERE $itemized_call /@number_called LIKE ‘973%’
RETURN<rate>$rate</rate><number_of_calls> count($itemized_call) </number_of_calls>
XAT:
Navigate(“$itemized_call”, @rate)
Source(“invoice.xml”)
Navigate(“/”, invoice/itemized_call)
B2
Select(“@rate:$itemized_call” = “$rate”)
Name(“invoice/itemized_call:/”, “$itemized_call”)
20
XAT of B3 (Cont.)
B4B4LET
$itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate]
WHERE $itemized_call /@number_called LIKE ‘973%’
RETURN<rate>$rate</rate><number_of_calls> count($itemized_call) </number_of_calls>
XAT:
Navigate(“$itemized_call”, @number_called)
Select(“@number_called:$itemized_call” like ‘973%’)
21
XAT of B3 (Cont.)
B4B4LET
$itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate]
WHERE $itemized_call /@number_called LIKE ‘973%’
RETURN<rate>$rate</rate><number_of_calls> count($itemized_call) </number_of_calls>
XAT:
Tagger(<rate>[$rate]</rate><number_of_calls>
[count($itemized_call)]</number_of_calls>)
Select(count(“$itemized_call”))B2
Name(“Tagger(<rate>[$rate]</rate> <number_of_calls>[count($itemized_call)]</number_of_calls>)”, “V1”)
22
Put it Together
Select(count(“$itemized_call”))
Navigate(“$itemized_call”, @number_called)
Select(“@number_called:$itemized_call” like ‘973%’)
Source(“invoice.xml”)
Navigate(“/”, invoice/itemized_call)
Select(“@rate:$itemized_call” = “$rate”)
Name(“Tagger(<summary>[V1]</summary>)”, “V2”)
Select(distinct(“invoice/itemized_call/@rate:/”))
Source(“invoice.xml”)
Navigate(“/”, invoice/itemized_call/@rate)
Name(“invoice/itemized_call:/”, “$itemized_call”)
Navigate(“$itemized_call”, @rate)
B1
B2
B3
FOR($rate)
Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”)
Aggregate()Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)
Name(“Tagger(<rate>[$rate]</rate> <number_of_calls>[count($itemized_call)]</number_of_calls>)”, “V1”)
Tagger(<summary>[V1] </summary>)
23
Syntax Suger
Select(count(“$itemized_call”))
Navigate(“$itemized_call”, @number_called)
Select(“@number_called:$itemized_call” like ‘973%’)
Source(“invoice.xml”)
$itemized_call := Navigate(“/”, invoice/itemized_call)
Select(“@rate:$itemized_call” = “$rate”)
V2 := Tagger(<summary>[V1] </summary>)
$rate := Select(distinct(“invoice/itemized_call/@rate:/”))
Source(“invoice.xml”)
Navigate(“/”, invoice/itemized_call/@rate)Navigate(“$itemized_call”, @rate)
B1
B2
B3
FOR($rate)
Aggregate()
V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)
24
Query Decorrelation for COp Top-down approach over XAT Tree. Approach:
Correlated Binding (CB) Op1[COp(CB, Op2)[Op3[Correlated
Operator[A],B]]] Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]]
For example: Correlated Join Outer Join with Groupby
with Cartesian
25
Query Decorrelation for FOR Top-down approach over XAT Tree. Approach:
Correlated Binding (CB) Op1[FOR(CB)[Op2[Correlated
Operator[A],B]]] Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]]
Differences: SQL Decorrelation: Return Outer Query XQuery Decorrelation: Return Inner Query CO: Return both Outer/Inner Query
26
FOR Decorrelation Example
Source(“invoice.xml”)
Select(“@rate:$itemized_call” = “$rate”)
…1
Source(“invoice.xml”)
…3
B2
B3
FOR($rate)
…2
Source(“invoice.xml”)Select(“@rate:$itemized_call” = “$rate”)
Groupby(“$ratel”, )
Cartesian
Source(“invoice.xml”)
…3
B1 B1B2
B3
AggregateAggregate
…2
…1
27
Default XML View<invoice>
<row><id> 1 </id><account_number>555 777-3158 573 234
3</account_number><bill_period> Jun 9 – Jun 8, 2000 </bill_period><total>$0.35</total>
</row>
</invoice><carrier>
<row><invoice_id> 1 </invoice_id><carrier>Sprint</carrier>
</row>
</carrier>...
id account_number bill_period total
1 555 777-3158 573 234 3
Jun 9 – Jun 8, 2000
$0.35
invoice
invoice_id carrier
1 Sprint
carrier
invoice_id
no
date number_called
time rate min
amount
1 1 JUN 10
973 555-8888
10:17pm
NIGHT
1 0.05
1 2 JUN 13
973 650-2222
10:19am
DAY 1 0.15
1 3 JUN 15
206 365-9999
10:25pm
NIGHT
3 0.15
itemized_call
28
User Defined XML View<invoice>
<account_number>555 777-3158 573 234 3</account_number>
<bill_period>Jun 9 - Jul 8, 2000</bill_period>
<carrier>Sprint</carrier>
<itemized_call no=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” />
<itemized_call no=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” />
<itemized_call no=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” />
<total>$0.35</total>
</invoice>
<invoice><row>
<id> 1 </id><account_number>555 777-3158 573 234
3</account_number><bill_period> Jun 9 – Jun 8, 2000 </bill_period><total>$0.35</total>
</row>
</invoice><carrier>
<row><invoice_id> 1 </invoice_id><carrier>Sprint</carrier>
</row>
</carrier><itemized_call>
<row><invoice_id>1</invoice_id><no>1</no>…
</row>…
</itemized_call>
29
User Defined XML View Cont.
Create view invoice as (FOR
$invoice IN view(“default”)/invoice/row
RETURN<invoice>
<account_number>$invoice/account_number/text()</account_number><bill_period>$invoice/bill_period/text()</bill_period>FOR
$carrier in view(“default”)/carrier/rowWHERE
$carrier/invoice_id = $invoice/idRETURN
<carrier>$carrier/carrier/text()</carrier>FOR
$itemized_call in view(“default”)/itemized_call/rowWHERE
$itemized_call/invoice_id = $invoice/idRETURN
<itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()>
SORTBY (@no)<total>$invoice/total/text()</total>
</invoice>
)
30
User Defined XML View Block
Create view invoice as (FOR
$invoice IN view(“default”)/invoice/row
RETURN<invoice>
<account_number>$invoice/account_number/text()</account_number><bill_period>$invoice/bill_period/text()</bill_period>FOR
$carrier in view(“default”)/carrier/rowWHERE
$carrier/invoice_id = $invoice/idRETURN
<carrier>$carrier/carrier/text()</carrier>FOR
$itemized_call in view(“default”)/itemized_call/rowWHERE
$itemized_call/invoice_id = $invoice/idRETURN
<itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()>
SORTBY (@no)<total>$invoice/total/text()</total>
</invoice>
)
B4
B5
B6
31
XML View XAT
V4 := Tagger(<invoice><account_number>[$invoice/account_number/text()]</account_number><bill_period>[$invoice/bill_period/text()</bill_period>…[V3]<total>[$invoice/total/text()]</total></invoice>)
V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/>
Aggregate()Aggregate()
Source(“default..xml”)
$invoice := Navigate(“/”,invoice/row )
FOR($invoice/id)
Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
Navigate($itemized_call, no/text())
Navigate($itemized_call, invoice_id)
Select(“$itemized_call/invoice_id”=“$invoice/id”)
Navigate(“$invoice”, id)
B5
Navigate($itemized_call, amount/text())
…
32
3-Way Correlation
…2
Source(“invoice.xml”)
B4
FOR($invoice/id)
…1
B5B6
33
3-Way Decorrelation
…2
Source(“default.xml”)
B4
JOIN($invoice/id)
…1
B5 with CartesianB6 with Cartesian
GB($invoice/id, …) GB($invoice/id, …)
…2
Source(“default.xml”)
34
View XAT After Decorrelation
V4 := Tagger(<invoice><account_number>[$invoice/account_number/text()]</account_number><bill_period>[$invoice/bill_period/text()</bill_period>…[V3]<total>[$invoice/total/text()]</total></invoice>)
V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/>
Aggregate()Groupby($invoice/id, Aggregate())
Source(“default..xml”)
$invoice := Navigate(“/”,invoice/row )
Join($invoice/id)
Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
Navigate($itemized_call, no/text())
Navigate($itemized_call, invoice_id)
Join(“$itemized_call/invoice_id”=“$invoice/id”)
Navigate(“$invoice”, id) B5
Navigate($itemized_call, amount/text())
…
Groupby($invoice/id…)
Source(“default..xml”)
$invoice := Navigate(“/”,invoice/row )
Navigate(“$invoice”, id)
35
View Composition Input:
User Query XAT + User View XAT Output:
Simplified composite XAT Approach:
XAT Cutting: Remove un-referenced columns and operators.
Pushdown Navigation By using the commutative rules
Cancel out the navigation operators By using the composition rules
36
XAT Cutting Cut Query Blocks
User query only require itemized_call. B5 is cut, Invoice is cut
B4 is simplified. B6 is simplified.
Cut Columns User query only used
itemized_call@rate.
37
View XAT After B5 is Cut.
V4 := Tagger(<invoice><account_number>[$invoice/account_number/text()]</account_number><bill_period>[$invoice/bill_period/text()</bill_period[V3]<total>[$invoice/total/text()]</total></invoice>)
V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/>
Aggregate()Groupby($invoice/id, Aggregate())
Source(“default..xml”)
$invoice := Navigate(“/”,invoice/row )
Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
Navigate($itemized_call, no/text())
Navigate($itemized_call, invoice_id)
Join(“$itemized_call/invoice_id”=“$invoice/id”)
Navigate(“$invoice”, id)
Navigate($itemized_call, amount/text())
…
38
View After Columns are Cut.
V4 := Tagger(<invoice>[V3] </invoice>)
V3 := Tagger(<itemized_call number_called=[] rate=[] />
Aggregate()
Groupby($invoice/id, Aggregate())
Source(“default..xml”)
$invoice := Navigate(“/”,invoice/row )
Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
Navigate($itemized_call, number_called/text())
Navigate($itemized_call, invoice_id)
Join(“$itemized_call/invoice_id”=“$invoice/id”)
Navigate(“$invoice”, id)
Navigate($itemized_call, rate/text())
39
Navigation Cancel Out Navigation Pushdown
Based on some transformation rules. E.g. commutative of navigation and
other operators. Navigation + Tagger Cancel Out
Composition Rules. The cancellation result is “renaming”
40
Query XAT Navi. Pushdown
Select(count(“$itemized_call”))
Navigate(“$itemized_call”, @number_called)
Select(“@number_called:$itemized_call” like ‘973%’)
Source(“invoice.xml”)
$itemized_call := Navigate(“/”, invoice/itemized_call)
Select(“@rate:$itemized_call” = “$rate”)
Navigate(“$itemized_call”, @rate)
B3V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)
Select(count(“$itemized_call”))
Navigate(“$itemized_call”, @number_called)
Select(“@number_called:$itemized_call” like ‘973%’)
Source(“invoice.xml”)
$itemized_call := Navigate(“/”, invoice/itemized_call)
Select(“@rate:$itemized_call” = “$rate”)
Navigate(“$itemized_call”, @rate)
V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)
41
Navi. Tagger Cancel Out
Navigate(“$itemized_call”, @number_called)
Source(“invoice.xml”)
$itemized_call := Navigate(“/”, invoice/itemized_call)
Navigate(“$itemized_call”, @rate)
B3…1
V4 := Tagger(<invoice>[V3] </invoice>)
V3 := Tagger(<itemized_call number_called=[] rate=[] />
Aggregate()
Groupby($invoice/id, Aggregate())
Navigate($itemized_call, number_called/text())
Navigate($itemized_call, rate/text())
…2
42
The Result of Cancel Out
…1
$itemized_call@rate := Navigate($itemized_call, rate/text())
$Itemized_call@number_called := Navigate($itemized_call, number_called/text())
…2
43
Computation Pushdown Goal: XAT SQL operators + XML
operators Step 0: Navigation Pushdown. Step 1: XML Default View SQL
Operators Renaming columns
Step 2: SQL Computation Pushdown. By commutative and composition rules. E.g: predicates pushdown.
44
Navigation Pushdown.
Source(“default..xml”)
$invoice := Navigate(“/”,invoice/row )Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
Navigate($itemized_call, invoice_id)
Join(“$itemized_call/invoice_id”=“$invoice/id”)
Navigate(“$invoice”, id)
$itemized_call@rate := Navigate($itemized_call, rate/text())
$Itemized_call@number_called := Navigate($itemized_call, number_called/text())
Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
$itemized_call@rate := Navigate($itemized_call, rate/text())
$Itemized_call@number_called := Navigate($itemized_call, number_called/text())
Join(“$itemized_call/invoice_id”=“$invoice/id”)
45
XML Default View SQL
Source(“default.xml”)
$itemized_call := Navigate(“/”, itemized_call/row)
$itemized_call@rate := Navigate($itemized_call, rate/text())
$Itemized_call@number_called := Navigate($itemized_call, number_called/text())
Source(“itemized_call”)
Project(rate, number_called)
$itemized_call@rate := rate
$Itemized_call@number_called := number_called
…
…
46
Computation Pushdown
A SQL Block
Select(count(“$itemized_call”))
Select(“@number_called:$itemized_call” like ‘973%’)
Select(“@rate:$itemized_call” = “$rate”)
V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)
B3
Select(“@number_called:$itemized_call” like ‘973%’)
Select(count(“$itemized_call”))
Select(“@rate:$itemized_call” = “$rate”)
V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)
A SQL Block
47
Result of the Transformation
Tagger(<summary>[V1]</summary>)
V1 := Aggregate
Tagger(<rate>[rate]</rate><number_of_calls>[count(*)]</number_of_calls>)
SQL:SELECT rate, count(*)FROM itemized_call, invoiceWHERE number_called LIKE ‘973%’ AND invoice.id = itemized_call.invoice_idGROUPBY rate
48
Optimization Efficient Publishing XML Views
Sorted Outer Union. Special Tagger implementation
A lot More!
49
Summary XQuery XAT
Query Block Identification Query Decorrelation
View Composition XAT Cutting Navigation Pushdown Navigation Cancel Out
Computation Pushdown Navigation Pushdown XML Default View SQL Operators Computation Pushdown
Optimization