![Page 1: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/1.jpg)
Efficiently Publishing Relational Data as XML Documents
Jayavel Shanmugasundaram et al.
Proceedings -VLDB 2000, Cairo.
![Page 2: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/2.jpg)
What drove them? No…it wasn’t the chaffeur… XML rapidly emerging as a global
standard Large amount of data stored in
RDBMS and needs to be exchanged
![Page 3: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/3.jpg)
Primary Issues Language specification Implementation – what method
works best? Adding TAG and STRUCTURE -
when do you do these operations?
![Page 4: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/4.jpg)
Roadmap Language specification Implementation
Early tagging, structuring Late tagging, structuring Early structure, late tagging
Performance Evaluation
![Page 5: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/5.jpg)
Our little sample…<customer id=C1> <name> Dilys Thomas </name> <accounts> <account id=A1>34552211233</account> <account id=A2>98653412223</account> <! Dilys – we know where your money is…> </accounts> <porders> <porder id=PO1 acct=A1> <items> <item id=I1>Gift for Consulate VISA Woman</item> <item id=I1>Traveller’s cheques</item> </items> <payments> <payment id=P1> due Feb 12 </payment> <!that’s today!!> </payments> </porder> </porders></customer>
Note the•Elements•Names/Tags•ID Refs
![Page 6: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/6.jpg)
Underlying tablesCustomer(id int, name varchar)
Account(id varchar, custID int, acctnum int)
Item(id int, poID int, desc varchar)
PurchOrder(id int, custID int, acctID varchar, date varchar)
Payment(id int, poID int, desc varchar)
![Page 7: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/7.jpg)
SQL-based language spec.
Sqlfunctions: Define XMLConstruct ITEM(id int, desc varchar) AS {
<item id=$id>$desc </item>
}
Sqlaggregates: Select XMLAGG(ITEM(id, desc))
From Item
// returns an XML aggregation of items
![Page 8: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/8.jpg)
Sample querySelect cust.name, CUST(cust.id, cust.name, (Select XMLAGG(ACCT(acct.id, acct.acctnum)) From Account acct Where acct.custId=cust.id), (Select XMLAGG(PORDER(porder.id, porder.acct, porder.date, (Select XMLAGG(ITEM(item.id, item.desc)) From Item item Where item.poid=porder.id) (Select XMLAGG(PAYMENT(pay.id,pay.desc)) From Payment pay, Where pay.poid=porder.id))) From PurchOrder porder Where porder.custID=cust.id))From Customer cust
Constructs XML from the relational tables.
![Page 9: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/9.jpg)
Roadmap Language specification Implementation
Early tagging, structuring Late tagging, structuring Early structure, late tagging
Performance Evaluation
![Page 10: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/10.jpg)
Implementation alternatives
Late/early tagging Late/early structuring (No late structuring+early tagging)
DB Result
TAG STRUCTURE
![Page 11: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/11.jpg)
Early tagging and structuring Stored Procedure
Explicitly issue nested queries Get corresponding nested data using
other queries Done outside relational engine. Tag/str
as soon as results are available. Too many queries per tuple. Fixed order (nested loop join)
![Page 12: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/12.jpg)
Contd… Correlated CLOB
Push queries into the engine Plug in XMLAGG, XMLCONSTRUCT support into
engine Have to handle huge CLOBS in the engine Fixed join order
Decorrelated CLOB Decorrelate and use Outer Joins – no longer
fixed order Still carry around CLOBs (due to early
tagging!)
![Page 13: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/13.jpg)
Roadmap Language specification Implementation
Early tagging, structuring Late tagging, structuring Early structure, late tagging
Performance Evaluation
![Page 14: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/14.jpg)
Late tagging and Structuring2 phases -> Content creation +
Tagging/Structuring Redundant Relation:
Blindly join all constituent tables ‘Parent’ data repeated
Unsorted Outer Union: Decorrelate query, compute common
subexpressions and use Outer Joins Take an Outer Union of result tables Columns grow with width/depth of XML
doc. - Path Outer Union
![Page 15: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/15.jpg)
Contd… Alternatively, don’t repeat Parent
node at every child. Feed parent into Outer Union and
only keep parent Ids with children. – Node Outer Union
Greatly increases no of tuples generated
![Page 16: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/16.jpg)
Outer UnionNote: Outer Joins to retain parents.
OU• Separate column in result for each column of input• Unused cols set to NULL• Type column added for each row.
![Page 17: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/17.jpg)
Contd…2 phases -> Content creation +
Tagging/Structuring Inside the Engine
XMLAGG, XMLCons support required Final step after content generated CLOBs not carried around
Outside the Engine GROUP Siblings Eliminate Duplicates Extract info and TAG
![Page 18: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/18.jpg)
Grouping data
HASH!Every row in the final table has a column with name of
element with all parents (a.b.c.d.e) Check if <a.b.c.d> hashes true, TAG accordingly and
add as another child at that level Else check if <a.b.c> hashes true, add <a.b.c.d> and
then <a.b.c.d.e> And so on…till you either find hash or hit root element. Tuples can come in any order. Sufficient mem
required!
![Page 19: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/19.jpg)
Roadmap Language specification Implementation
Early tagging, structuring Late tagging, structuring Early structure, late tagging
Performance Evaluation
![Page 20: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/20.jpg)
Late tagging, Early Structuring As before…only, now SORT the
outer union Ensure
Parent info comes before child Info about node and desc. completed
before any other node info starts Ordering follows user-def condns
![Page 21: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/21.jpg)
Sort and TagSort on Pkeys Define an order on Pkeys (CustID, AcId, POId,
ItemId, PaymentID) – based on structure of XML Doc.
Parent tuples will have filled values for first few cols and null for the later ones
Nulls sort lowTag in constant memory Maximum amount of info to be stored is
proportional only to the depth of the XML Doc.
![Page 22: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/22.jpg)
Roadmap Language specification Implementation
Early tagging, structuring Late tagging, structuring Early structure, late tagging
Performance Evaluation
![Page 23: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/23.jpg)
Modeling transformations Query fanout Query nesting depth
# Root nodes (tuples in the root table)
# Leaf tuples (tuples corresp to all leaf nodes)
Structure
Result size
![Page 24: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/24.jpg)
Results (graph time!!)
• “Inside the engine” versions are about 3 times as fast as “outside” counterparts
![Page 25: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/25.jpg)
Inside –vs- Outside the engine
• Query Execution• Bind out • Tag/Structure• Write XML to file
Bind-out time not required for Inside Engine approaches
![Page 26: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/26.jpg)
Query Fan out Increasing QFO -> Greater Joins ->
More time CorrCLOB has to use Nested Loop Join
Order – bad performance Unsorted OU better than Sorted OU.
Sorting cost > Cost of complex tagging DecorrCLOB – optimized by DB2
engine. CLOBs retained in memory (low fanout)
![Page 27: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/27.jpg)
Query DepthDecorrCLOB – huge increase!• Complexity of queries increases. • Engine makes bad choices (sorting after XMLAGG etc)
![Page 28: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/28.jpg)
Number of Roots• Outer Union approaches not affected• CorrCLOB at #root=1 equiv to just 2 queries!
![Page 29: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/29.jpg)
Number of tuples, Memory If sufficient memory, no great
changes! If not, Unsorted OU which requires
large space for tagging, fails. Overflow!
Sorted OU – based on scalable sorting. Adapt to large size and less mem better.
![Page 30: Efficiently Publishing Relational Data as XML Documents](https://reader033.vdocuments.us/reader033/viewer/2022051018/568144c8550346895db18f5d/html5/thumbnails/30.jpg)
Roadmap Language specification Implementation
Early tagging, structuring Late tagging, structuring Early structure, late tagging
Performance Evaluation Quo vadis?