Order-sensitive XML Query Processing over Relational Sources: An Algebraic Approach
Authors: Ling Wang, Song Wang, Brian Murphy and Elke A. Rundensteiner
Institute: Database Systems Research Group, Worcester Polytechnic Institute (WPI)
IDEAS’2005IDEAS’2005
IDEAS’052
Order in XMLOrder in XML
Order is important to XML Document order XML view can be ordered (OrderBy) … User query can be order-sensitive (OrderBy, position(), range()…)
<RESULT> <SONG>
SXE Revenge Shutdown
</SONG></RESULT>
<RESULT>FOR $play in document(“record.xml")/PLAYOrderBy $play/bandRETURN <SONG>
$play[3]/SONG[rang 1 to 2]/text() </SONG></RESULT>
1. Sort PLAY by its band’s name2. Find third PLAY3. Extract its first and second SONG
<RECORDLIST> <PLAY> <BAND> Misfits </BAND>
<SONG> She </SONG> </PLAY> <PLAY> <BAND> Back Street Boy </BAND> <SONG> Bullet </SONG>
<SONG> We Are 138 </SONG> </PLAY> <PLAY> <BAND> Project X </BAND> <SONG> SXE Revenge </SONG> <SONG> Shutdown </SONG> </PLAY></RECORDLIST>
IDEAS’053
Why XML-to-SQL?Why XML-to-SQL?
XML is stored in relational database to … provide reliable persistent storage exploit mature technologies
XML-to-SQL Systems SilkRoute (AT&T), XPERANTO (IBM), RAINBOW (WPI), Rolex
(BellLab), Agora, MARS Oracle XML DB, Microsoft SQL Server 2000 SQLXML, IBM DB2 XML
Extender
IDEAS’054
XML Views support XML view mechanism for XML data publishing support queries (updates) over XML views
XML publishing scenario Relational model is not order-sensitive Order in XML views over RDB has no meaning
XML storage scenario Order is essential !!! Order-preserving loading
– XML document Relational database– Implicit order in XML document explicit order code in RDB
Order-restoring in extraction views– Explicit order code in RDB implicit order in XML view
through view query
XML ViewsXML Views
XML
RDB
XML View
User query
Order encoding
IDEAS’055
Order-specific loadingOrder-specific loading
Order-specific loading: Loading strategies: Inline, edge, … Order encoding methods: Global, local, dewey …
IDEAS’056
ExampleExample
<xs:schema xmlns:xsd="http://www.w3.org/XMLSchema"> <xs:element name="RECORDLIST"> <xs:complexType> <xs:element name="PLAY"
minoccurs="1" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="BAND" type="xs:string"/> <xs:element name="SONG" type="xs:string"
minoccurs="1"/> </xs:sequence> </xs:complexType> </xs:element> </xs:complexType> </xs:element></xs:schema>
<RECORDLIST> <PLAY> <BAND> Misfits </BAND>
<SONG> She </SONG> </PLAY> <PLAY> <BAND> Back Street Boy </BAND> <SONG> Bullet </SONG>
<SONG> We Are 138 </SONG> </PLAY> <PLAY> <BAND> Project X </BAND> <SONG> SXE Revenge </SONG> <SONG> Shutdown </SONG> </PLAY></RECORDLIST>
XML schema XML document
IDEAS’057
IID PID POSITION
1 0 1
RECORDLIST
IID PID POSITION BAND_PCDATA
2 1 1 Misfits
3 1 2 Back Street Boy
4 1 3 Project X
PLAY
IID PID POSITION SONG_PCDATA
5 2 1 She
6 3 1 Bullet
7 3 2 We Are 138
8 4 1 SXE Revenge
9 4 2 Shutdown
SONG
Relational Database Inline loading + local order encoding
ExampleExample
<RECORDLIST>FOR $play IN document("dxv.xml")/PLAY/ROWORDER BY $play/POSITION/text()RETURN <PLAY>
<BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY></RECORDLIST>
View query
IDEAS’058
MotivationMotivation
Many loading + Encoding combinations are possible … {inline, edge, …} * {local, global, dewey…}
Hybrid of multiple loading and encoding may occur: Loading:
– Schema is available --- inline– Schema is not available --- edge
Order-encoding– Heavy update workload --- dewey– Query workload --- global
Multiple XML documents are loaded into RDB
Other loading and encoding methods may emerge in future
Conclude: Need general approach for XQuery-to-SQL translation
IDEAS’059
XSOTXSOT
XML-to-SQL Order-sensitive Translation (XSOT):
Step1: Encode XML document with explicit order code(order-exposing)
Step 2: Load XML to relational database (order-preserving)
Step 2: Extract XML view from relational database (order-restoring)
Step 3: Query via XML view with order predicates(order-sensitive)
IDEAS’0510
XQuery Parser
Default XML Schema
Default XML View
Web/Intranet Application User
Sub- System
Process
Data
Query flow
Data flow
Legend
XAT Generator
User XAT
View Composer
XAT Optimizer
View XAT
SQL
XML Result
Ordered Tuple Streams
XML Schema
XML Data
View Query
XML Generator
XAT
RDBMS
SQL GeneratorMapping Manager
XQuery Engine
DB2Oracle SQL Server
Loading XQuery
Schema generation
Data Loading
Order EncodingXQuery
Data Extracting
XML Source Wrapper
Default XML View
Order-Sensitive User Query
Composed XAT
Optimized XAT
Order CodeComparison
Function
Sybase
XSOT FrameworkXSOT Framework
IDEAS’0511
IID PID POSITION
1 0 1
RECORDLIST
IID PID POSITION BAND_PCDATA
2 1 1 Misfits
3 1 2 Back Street Boy
4 1 3 Project X
PLAY
IID PID POSITION SONG_PCDATA
5 2 1 She
6 3 1 Bullet
7 3 2 We Are 138
8 4 1 SXE Revenge
9 4 2 Shutdown
SONG
<RESULT> <SONG>
We are 138Shutdown
</SONG></RESULT>
<RESULT>FOR $record in document(“record.xml")RETURN <SONG>
$record/PLAY/SONG[2]/text() </SONG></RESULT>
Find second song of each play
<RECORDLIST>FOR $play IN document("dxv.xml")/PLAY/ROWORDER BY $play/POSITION/text()RETURN <PLAY>
<BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY></RECORDLIST>
View queryRelational Database Inline loading + local order encoding
Running ExampleRunning Example
IDEAS’0512
Order-sensitive XML Algebra TreeOrder-sensitive XML Algebra Tree
XSOT methodology: An algebraic approach
XML Algebra Tree (XAT) XAT operators
– Select, CartesianProduct, ThetaJoin, LeftOuterJoin, Distinct, GroupBy, OrderBy
– Source, Navigate, Combine, Tagger
XAT Order Extension– Position()
– Range()
Composition of the view and user XAT
IDEAS’0513
View Query XATView Query XAT
<RECORDLIST>FOR $play IN document("dxv.xml")/PLAY/ROWORDER BY $play/POSITION/text()RETURN <PLAY>
<BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY></RECORDLIST>
Combine $dataPlayTag
Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag
GroupBy $play
Combine $dataSongTag
Navigate $song, SONG_PCDATA/text()$sData
Tagger <SONG>$sData</SONG>$dataSongTag
Navigate $song, POSITION/text()$sPos OrderBy $sPos
GroupBy $play
Source “dxv.xml” $S
Navigate $S, SONG/ROW$song
Navigate $song, PID/text()$sPID
ThetaJoin $pIID=$sPID
Source “dxv.xml” $P
Navigate $P, PLAY/ROW$play
Navigate $play, POSITION/text()$pPos
OrderBy $pPos
Tagger <RECORDLIST> $dataPlayTag</ RECORDLIST >$record
Navigate $play, IID/text()$pIID
1
2
3
4
5
6
7
8
9
1110
12
13
14 15
16
17
18
19
IDEAS’0514
User Query XATUser Query XAT
<RESULT>FOR $record in document(“record.xml")RETURN <SONG>
$record/PLAY/SONG[2]/text() </SONG></RESULT>
GroupBy $record, $uPlay
Combine $uDataSongTag
Tagger <RESULT> $uDataSongTag</RESULT>$result
Tagger <SONG> $uSongData</SONG>$uDataSongTag
Navigate $uRecord, PLAY$uPlay
Navigate $uPlay, SONG$uSong
Navigate $uSong, text()$uSongData
Select $uNumPos=2
Source “record.xml” $P
20
21
22
23
24
25
26
27
28
POS $uSong $uNumPos
IDEAS’0515
GroupBy $record, $uPlay
Combine $uDataSongTag
Tagger <RESULT> $uDataSongTag</RESULT>$result
Tagger <SONG> $uSongData</SONG>$uDataSongTag
Navigate $uRecord, PLAY$uPlay
Navigate $uPlay, SONG$uSong
Navigate $uSong, text()$uSongData
Select $uNumPos=2
Source “record.xml” $P
20
21
22
23
24
25
26
27
28
POS $uSong $uNumPos
User XAT $P=$record
Composed XATComposed XAT
Combine $dataPlayTag
Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag
GroupBy $play
Combine $dataSongTag
Navigate $song, SONG_PCDATA/text()$sData
Tagger <SONG>$sData</SONG>$dataSongTag
Navigate $song, POSITION/text()$sPos OrderBy $sPos
GroupBy $play
Source “dxv.xml” $S
Navigate $S, SONG/ROW$song
Navigate $song, PID/text()$sPID
ThetaJoin $pIID=$sPID
Source “dxv.xml” $P
Navigate $P, PLAY/ROW$play
Navigate $play, POSITION/text()$pPos
OrderBy $pPos
Tagger <RECORDLIST> $dataPlayTag</ RECORDLIST >$record
Navigate $play, IID/text()$pIID
1
2
3
4
5
6
7
8
9
1110
12
13
14 15
16
17
18
19
View XAT
IDEAS’0516
XAT Optimization – Order ExplicitXAT Optimization – Order Explicit
Why? Order in user XAT depends on the implicit order in the view It blocks further optimization: Computation push down
IDEAS’0517
XAT Optimization – Order ExplicitXAT Optimization – Order Explicit
Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag
GroupBy $play
Combine $dataSongTagTagger <SONG>$sData</SONG>
$dataSongTag
View XAT
construct SONG
construct PLAY
GroupBy $record, $uPlay
Select $uNumPos=2
POS $uSong $uNumPos
For each PLAY
Sort SONGs
Pick second song
User XAT
Depend onCannot push down! Cannot translated into SQL!
IDEAS’0518
XAT Optimization – Order ExplicitXAT Optimization – Order Explicit
Goal: Convert user query order FROM: implicit order in the XML view TO: Explicit order-code column in relational encoding
POS$uSong = POS$sPos
POS $uSong $uNumPos
GroupBy $record, $uPlay22
23
View Portion XAT
Navigate $song, POSITION/text()$sPos OrderBy $sPos
GroupBy $play
10 11
12
POS $sPos $uNumPos
GroupBy $play22
23
User Portion XAT
View Portion XAT
Navigate $song, POSITION/text()$sPos OrderBy $sPos
GroupBy $play
10 11
12
User Portion XAT
IDEAS’0519
SQL-oriented XAT optimizationSQL-oriented XAT optimization
Goal: Optimize XAT for efficient order-sensitive SQL generation
Rules: Computation push-down
– Push as much as possible to RDB
Order pull-up – Sort as late as possible
– Avoid re-sorting !!!
Order-step rewrite – Match RDB order template
IDEAS’0520
Optimized XATOptimized XAT
Navigate $song, POSITION/text()$sPos
OrderBy $sPos
GroupBy $play
Source “dxv.xml” $S
Navigate $S, SONG/ROW$song
Navigate $song, PID/text()$sPID
ThetaJoin $pIID=$sPID
Source “dxv.xml” $P
Navigate $P, PLAY/ROW$play
Navigate $play, POSITION/text()$pPos
Navigate $play, IID/text()$pIID
1
2
3
5
6
7
8
9 11
10
12
OrderBy $sPos, $pPos4
GroupBy $pPos
Combine $uDataSongTag
Tagger <RESULT> $uDataSongTag</RESULT>$result
Tagger <SONG> $sData</SONG>$uDataSongTag
Select $uNumPos=2
22
23
24
26
27
28
POS $sPos $uNumPos
Navigate $song, SONG_PCDATA/text()$sData
13Computation push down
Order pull up
OrderStep rewriteOrderStep [$pPos], [$pPos, $sPos]
$uNumPos
IDEAS’0521
Navigate $song, POSITION/text()$sPos
OrderStep [$pPos], [$pPos, $sPos] $uNumPos
Source “dxv.xml” $S
Navigate $S, SONG/ROW$song
Navigate $song, PID/text()$sPID
ThetaJoin $pIID=$sPID
Source “dxv.xml” $P
Navigate $P, PLAY/ROW$play
Navigate $play, POSITION/text()$pPos
Navigate $play, IID/text()$pIID
1
2
3
5
6
7
8
9
10
29
OrderBy $sPos, $pPos4
Combine $uDataSongTag
Tagger <RESULT> $uDataSongTag</RESULT>$result
Tagger <SONG> $sData</SONG>$uDataSongTag
Select $uNumPos=224
26
27
28
Navigate $song, SONG_PCDATA/text()$sData
13
Optimized XATOptimized XAT
IDEAS’0522
TEMPLATE:SELECT row_number() over(<PARTITION>?<ORDERBY>) $pos_func_bindingFROM <TABLE>+
PARTITION: partition by <ELEMENT>
ORDERBY:order by <TONUMBER>|<ELEMENT>
TONUMBER: to_number(<ELEMENT>)
ELEMENT:element name
TABLE:table name | TEMPLATE
Order TemplateOrder Template
SQL-99 standard Oracle, DB2 …
Order Template
IDEAS’0523
Order-sensitive SQL generationOrder-sensitive SQL generation
About push-down strategies In general ----
push as much computation as possible into relational engine.
In order scenario --- tradeoff Deep push:
– Push OrderStep into Relational Engine
– Relational engine has to support order template (SQL99)
Q5 = SELECT Q2.sData FROM
(SELECT Q1.pPos, Q1.sPos, Q1.sData,row_number() OVER (PARTITION BY Q1.pPos ORDER BY Q1.sPos) uNumPos
FROM(SELECT P.POSITION AS pPos, S.POSITION AS sPos,
S.SONG_PCDATA AS sData FROM PLAY P, SONG SWHERE P.IID = S.PID) Q1
) Q2WHERE Q2.uNumPos = 2ORDER BY Q2.pPos, Q2.sPos
SQL Q5$sData
32
Combine $uDataSongTag
Tagger <RESULT> $uDataSongTag</RESULT>$result
Tagger <SONG> $sData</SONG>$uDataSongTag
26
27
28
IDEAS’0524
Order-sensitive SQL generationOrder-sensitive SQL generation
Shallow push (otherwise)– leave OrderStep outside RDB
– No requirement for Relational engine
for supporting order template (SQL99)
SELECT P.POSITION AS pPos, S.POSITION AS sPos, S.SONG_PCDATA AS sData
FROM PLAY P, SONG SWHERE P.IID = S.PID
OrderStep [$pPos], [$pPos, $sPos] $uNumPos
29
OrderBy $sPos, $pPos4
Combine $uDataSongTag
Tagger <RESULT> $uDataSongTag</RESULT>$result
Tagger <SONG> $sData</SONG>$uDataSongTag
Select $uNumPos=224
26
27
28
SQL Q1$sData
IDEAS’0525
Deep Push vs. Shallow PushDeep Push vs. Shallow Push
Low selectivitysimilar
High selectivity Shallow push is better Repeated sorting in deep
push is expensive!
IDEAS’0526
Experimental StudyExperimental Study
SQL Execution time --- Global vs. Local order encoding
IDEAS’0527
Discussion: Further SQL optimizationDiscussion: Further SQL optimization
General SQL optimization can be applied… Cost-based SQL translation (SilkRoute) Any other SQL optimization…
When order encoding is assumed… SQL statements can be simplified by avoiding re-ordering
When relational database schema is aware … Schema specific SQL optimization [KKN2002]
IDEAS’0528
Related WorkRelated Work
XQuery-to-SQL translation systems: XPERANTO, SilkRoute, …
[TVB2002] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In SIGMOD, 2002. Three order encoding methods are utilized Algorithms of translating ordered XPath expressions into SQL
But …
[KKN2002] R. Krishnamurthy, R. Kaushik, and J. F. Naughton. Optimizing Fixed-Schema XML to SQL Query Translation. In VLDB, 2002.
IDEAS’0529
ConclusionConclusion
Propose a general framework for order-sensitive XQuery-to-SQL translation (XSOT)
Propose order-sensitive XML algebra Tree (XAT)
SQL-oriented order-sensitive XAT optimization
Efficient order SQL statements generation and optimization techniques
Implementation using Rainbow query engine
Experiments to verify the generality and SQL performance