adt 2010 xquery updates in monetdb/xquery other approaches...
[email protected] MonetDB/XQuery: Updates ADT 2010
ADT 2010
XQuery Updates in MonetDB/XQuery
&
Other Approaches to XQuery Processing
Stefan [email protected]
http://www.cwi.nl/~manegold/
Schedule
• 09.11.2010:
  • RDBMS back-end support for XML/XQuery (1/2):
    • Document Representation (XPath Accelerator, Pre/Post plane)
• 16.11.2010:
  • XPath navigation (Staircase Join)
  • XQuery to Relational Algebra Compiler:
    • Item- & Sequence-Representation
    • Efficient FLWoR Evaluation (Loop-Lifting)
    • Optimization
• 23.11.2010:
  • RDBMS back-end support for XML/XQuery (2/2):
    • Updateable Document Representation
  • Other (DB-) approaches to XML/XQuery processing
XML/XQuery Updates
XQuery Update Facility 1.0, W3C Candidate Recommendation: http://www.w3.org/TR/xquery-update-10/
• Updates fall into two categories:
  • Value updates
  • Structural updates
(MonetDB/XQuery does not yet support the latest syntax changes made by W3C; for details see
http://monetdb.cwi.nl/XQuery/Documentation/XQuery-Updates.html)
Value Updates
do replace value of fn:doc("bib.xml")/books/book[1]/price
  with fn:doc("bib.xml")/books/book[1]/price * 1.1
do replace value of fn:doc("bib.xml")/books/book[2]/@isbn
  with "90-6196-517-9"
do rename fn:doc("bib.xml")/books/book[3]/author[1]
  into "primary-author"
do rename fn:doc("bib.xml")/journals/journal[9]/@isbn
  into "issn"
=> These map directly to simple value updates in relational storage.
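A minimal sketch of why value updates are the easy case, assuming a hypothetical one-row-per-node table `node(pre, name, value)` and assuming the target node's pre rank has already been found by the path expression. The schema and sample rows are illustrative and not MonetDB/XQuery's actual storage layout.

```python
import sqlite3

# Hypothetical schema: one row per node, keyed by preorder rank.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (pre INTEGER PRIMARY KEY, name TEXT, value TEXT)")
con.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(0, "books", None), (1, "book", None), (2, "price", "10.00"), (3, "author", "Kermit")],
)

# "do replace value of .../price with .../price * 1.1": one value changes, no rank moves.
con.execute("UPDATE node SET value = printf('%.2f', value * 1.1) WHERE pre = 2")
# "do rename .../author[1] into 'primary-author'": one name changes, no rank moves.
con.execute("UPDATE node SET name = 'primary-author' WHERE pre = 3")

rows = con.execute("SELECT pre, name, value FROM node ORDER BY pre").fetchall()
```

Neither statement touches `pre`, so the tree encoding of all other nodes stays valid.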
Structural Updates
do insert attribute isbn {"90-6196-517"}
  into fn:doc("bib.xml")/books/book[17]
do delete fn:doc("bib.xml")/books/book[2]/@wrong
do insert <author>Stefan Manegold</author>
  after fn:doc("bib.xml")/books/book[33]/author[last()]
do replace fn:doc("bib.xml")/books/book[44]/author[1]
  with fn:doc("bib.xml")/books/book[33]/author[last()]
do delete fn:doc("bib.xml")/books/book[author = "Kermit"]
=> How to implement on pre-/post-encoding?
XML/XQuery Updates
do insert <k><l/><m/></k> as first into /a/f/g
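To illustrate why such an insert is costly on a dense pre/size/level table, here is a minimal sketch. It assumes a small ten-node tree in which /a/f/g has pre rank 6; the tree, the row layout, and the helper `insert_first` are illustrative assumptions, not MonetDB/XQuery code.

```python
# Dense pre/size/level table: the list index is the pre rank.
# Rows are [name, size, level]; the tree itself is an illustrative assumption.
doc = [["a", 9, 0], ["b", 3, 1], ["c", 2, 2], ["d", 0, 3], ["e", 0, 3],
       ["f", 4, 1], ["g", 0, 2], ["h", 2, 2], ["i", 0, 3], ["j", 0, 3]]

def insert_first(table, parent_pre, subtree):
    """Insert `subtree` (levels relative to its root) as first child of parent_pre.

    Returns the new table and the number of existing rows whose pre rank changed.
    """
    parent_level = table[parent_pre][2]
    new_rows = [[name, size, parent_level + 1 + level] for name, size, level in subtree]
    out = [row[:] for row in table]
    # Every ancestor-or-self of the target grows by the size of the inserted subtree.
    for pre, row in enumerate(out):
        if pre <= parent_pre <= pre + row[1]:
            row[1] += len(new_rows)
    shifted = len(out) - (parent_pre + 1)  # all rows after the parent move down
    out[parent_pre + 1:parent_pre + 1] = new_rows
    return out, shifted

# do insert <k><l/><m/></k> as first into /a/f/g
new_doc, shifted = insert_first(doc, 6, [["k", 2, 0], ["l", 0, 1], ["m", 0, 1]])
```

In this small tree only three rows shift, but in general every node following the insertion point gets a new pre rank, which is what makes the naive approach O(N).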
XML/XQuery Updates
Staircase Join
XML Storage Revisited

post = pre + size - level

node  pre  post
a     0    9
b     1    3
c     2    2
d     3    0
e     4    1
f     5    8
g     6    4
h     7    7
i     8    5
j     9    6

pre  size  level
0    9     0
1    3     1
2    2     2
3    0     3
4    0     3
5    4     1
6    0     2
7    2     2
8    0     3
9    0     3

Allow holes; define logical pages:

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9
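The relation post = pre + size - level can be checked on a ten-node example tree. In the sketch below, the node names a to j and their pre/size/level values are assumed to match the slide's dense table as far as it can be read; `post_rank` and `is_descendant` are illustrative helper names.

```python
# (name, pre, size, level) rows of the example document, in document order.
nodes = [("a", 0, 9, 0), ("b", 1, 3, 1), ("c", 2, 2, 2), ("d", 3, 0, 3), ("e", 4, 0, 3),
         ("f", 5, 4, 1), ("g", 6, 0, 2), ("h", 7, 2, 2), ("i", 8, 0, 3), ("j", 9, 0, 3)]

def post_rank(pre, size, level):
    """Postorder rank derived from the stored columns."""
    return pre + size - level

posts = {name: post_rank(pre, size, level) for name, pre, size, level in nodes}

def is_descendant(v, c):
    """v is a descendant of c iff v.pre lies in the range (c.pre, c.pre + c.size]."""
    return c[1] < v[1] <= c[1] + c[2]
```

Storing size and level instead of post keeps the descendant test a simple range check on pre.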
XML Storage Revisited

post = pre + size - level
Allow holes; define logical pages
rid = pre.swizzle( )

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

page  map
0     0
1     2
2     1

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5
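A minimal sketch of the swizzle step, assuming a page size of 4 tuples and a page map that swaps the second and third page (both read off the small example; real pages are far larger). The function names `swizzle` and `insert_page` are illustrative, not MonetDB identifiers.

```python
PAGE_SIZE = 4            # tuples per logical page (assumed for the 12-tuple example)
page_map = [0, 2, 1]     # logical page number -> physical page number

def swizzle(pre):
    """Translate a logical pre rank into a physical rid via the page map."""
    page, offset = divmod(pre, PAGE_SIZE)
    return page_map[page] * PAGE_SIZE + offset

def insert_page(logical_pos, physical_page):
    """Splice a newly appended physical page into the logical order."""
    page_map.insert(logical_pos, physical_page)
```

With this indirection, making room for new nodes means appending a physical page and editing the page map, rather than rewriting every following tuple.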
XML Storage Revisited
Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column
MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join
Opportunity currently not exploited by other RDBMS
Occurs widely in our XQuery translation.
MonetDB/XQuery
Our own XML DBMS with (almost) full XQuery support.
• Built purely on an RDBMS, namely MonetDB
Pathfinder compiler & “staircase join”:
– Universität Tübingen (Torsten Grust, et al.)
– Technical University Twente (Maurice van Keulen, et al.)
MonetDB High-Performance DBMS:
– CWI Amsterdam (Peter Boncz, Stefan Manegold, ...)
Useful for:
• Large XML databases!
• Querying XML annotations (multimedia, forensic NFI)
• XML information retrieval
• ...
[Figure: XQuery -> Pathfinder Compiler -> Relational Algebra -> RDBMS (MonetDB)]
Research Projects & Extensions
• Value indices
• Runtime optimization
  • SIGMOD'09 [Abdel Kader, Boncz, v. Keulen, Manegold]
• Algebraic Query Optimization
  • Grust, Rittinger, et al. (Universität Tübingen)
• Distributed XQuery, P2P XQuery
  • SOAP group communication, XQuery RPC
  • VLDB'07 [Zhang, Boncz]
• Benchmarking beyond XMark
  • ExpDB'06 Workshop [Manegold]
• Support for XML Interval Annotations
  • XIME-P'06 Workshop [Alink et al.]
• XQuery + Information Retrieval: PF/Tijah
Conclusions
• Relational approach can be scalable & fast
• MonetDB/XQuery compares favorably with all other available systems
• Techniques that made it work
  • Property-driven peephole optimization
    Order & other properties
  • Loop-lifted XPath steps
    Evaluate sets of context nodes in a single pass
  • Support for dense (autoincrement) keys
    Positional lookup
• Background Information & Literature
  http://monetdb-xquery.org
  http://pathfinder-xquery.org
[email protected] Other Xquery Processing Approaches ADT 2010
Topics
Other approaches & techniques (selection, far from complete!)
Document storage / tree encoding:
  ORDPATH
  DataGuides
XPath processing:
  Tree patterns, holistic twig joins
Fixed-Width Tree Encodings & Updates
Fixed-width tree encodings (like XPath Accelerator) are
Good for read(-only) processing
small footprint, positional lookup, staircase join
But inherently static
Milo et al., PODS 2002:
“There is a sequence of updates (subtree insertions) for any persistent tree encoding scheme E (where each node keeps its initial encoding label even under updates), such that E needs labels of length Ω(N) to encode the resulting tree of N nodes.”
XML/XQuery Updates
do insert <k><l/><m/></k> as first into /a/f/g
MonetDB/XQuery hack: exploit paging & mmap trick
but: updating pg|off is still O(N)
Fixed-Width Tree Encodings & Updates
Fixed-width tree encodings (like XPath Accelerator) are
Good for read(-only) processing
small footprint, positional lookup, staircase join
But inherently static
Non-solutions:
Gaps in the encoding (never large enough)
Encoding based on decimal fractions (limited precision)
Possible solution:
Variable-width tree encodings:
Cheaper updates
At the expense of more expensive read(-only) processing
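Why gaps are "never large enough" can be seen with a small worst-case sketch: every new node is inserted at the same position and takes the integer midpoint of the remaining gap. The function name and the midpoint policy are assumptions made for illustration.

```python
def inserts_until_full(left, right):
    """Count how many nodes fit when each new node is inserted just after `left`,
    taking the integer midpoint of the remaining gap as its label."""
    count = 0
    while right - left > 1:          # a free integer label still exists
        right = (left + right) // 2  # the new node becomes the right neighbour
        count += 1
    return count
```

Under this insertion pattern a gap of 2^k labels is used up after only k inserts, so even a gap of 2^32 absorbs just 32 of them before relabeling is needed.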
A Variable-Width Tree Encoding: ORDPATH
O'Neil et al., SIGMOD 2004.
ORDPATH Encoding: Example
ORDPATH: Insertion Between Siblings
Is ORDPATH suitable for XQuery?
• Mapping core operations of the XQuery processing model to operations on ORDPATH labels:
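As a hedged illustration of such a mapping (not the slide's original table), the sketch below models ORDPATH labels as tuples of integers and shows document order, the ancestor test, level, and the simplest insertion between two adjacent siblings. The function names are illustrative; the bit-level encoding of real ORDPATH labels is not modeled.

```python
from typing import Tuple

Label = Tuple[int, ...]  # e.g. (1, 3, 5) stands for the label 1.3.5

def doc_order_before(x: Label, y: Label) -> bool:
    """Document order is the lexicographic order of the components."""
    return x < y

def is_ancestor(x: Label, y: Label) -> bool:
    """x is an ancestor of y iff x is a proper prefix of y."""
    return len(x) < len(y) and y[:len(x)] == x

def level(x: Label) -> int:
    """Only odd components count as levels; even ones are 'carets' from insertions."""
    return sum(1 for c in x if c % 2 == 1)

def between_siblings(left: Label, right: Label) -> Label:
    """New sibling between two adjacent odd-numbered siblings (simplest case only)."""
    assert left[:-1] == right[:-1] and right[-1] - left[-1] == 2
    return left[:-1] + (left[-1] + 1, 1)
```

For example, inserting between 1.3 and 1.5 yields 1.4.1, which sorts between its neighbours and stays on their level without relabeling any existing node.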
ORDPATH: Variable-Length Node Encoding
• For a 10 MB XML sample document, the authors of ORDPATH observed label lengths between 6 and 12 bytes.
• ORDPATH labels encode root-to-node paths => common prefixes.
  => Label comparisons often need to inspect encoding bits at the far right.
• MS SQL Server employs further path encodings organized in reverse (node-to-root) order.
• Note:
  - Preorder ranks fit into CPU registers.
  - 4-byte pre's are sufficient for 2^32 = 4G nodes (11 GB XMark fits easily).
  - 8-byte pre's are sufficient for 2^64 nodes, i.e., “the universe” ...
DataGuides
XPath Accelerator, ORDPATH & similar encoding schemes
  encode the document's tree structure in the node ranks/labels they assign
DataGuides
  Developed in the context of the Lore project (DBMS for semi-structured data)
  Stanford University, Goldman & Widom, VLDB 1997
  encode the document's tree structure in relation names
Observation:
  Each node is uniquely identified by its path from the root
  Paths of siblings with equal tag names can be unified,
  provided we keep their relative order (rank) explicitly
[email protected] Other Xquery Processing Approaches ADT 2010
DataGuidesDataGuides
Definition
given a semistructured data instance DB, a DataGuide for DB is a graph G s.t.:
- every path in DB also occurs in G
- every path in G occurs in DB
- every path in G is unique
DataGuides
■ Multiple DataGuides for the same data:
[email protected] Other Xquery Processing Approaches ADT 2010
DefinitionLet p, p’ be two path expressions and G a graph; we define
p ≡ G p’ if p(G) = p’(G)
i.e., p and p' are indistinguishable on G.
DefinitionG is a strong dataguide for a database DB if ≡ G is the same as ≡ DB
Example:- G1 is a strong dataguide- G2 is not strong
person.project !≡ DB dept.project
person.project !≡ G1 dept.project
person.project ≡ G2 dept.project
DataGuidesDataGuides
DataGuides
■ Constructing the strong DataGuide G:
Nodes(G) = {{root}}
Edges(G) = ∅
while changes do
  choose s in Nodes(G), a in Labels
  add s' = {y | x in s, (x -a-> y) in Edges(DB)} to Nodes(G)
  add (s -a-> s') to Edges(G)
• Use hash table for Nodes(G)
• This is precisely the powerset automaton construction.
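The construction above can be sketched as a worklist version of the subset construction. The representation (edge triples, frozensets as DataGuide nodes) and the function name are assumptions for illustration; empty target sets are skipped.

```python
from collections import defaultdict

def strong_dataguide(edges, root):
    """Subset construction: each DataGuide node is the set of DB nodes reached by one label path.

    edges: iterable of (source, label, target) triples of the data graph.
    Returns (nodes, guide_edges) with guide nodes represented as frozensets.
    """
    out = defaultdict(lambda: defaultdict(set))   # source -> label -> targets
    for x, a, y in edges:
        out[x][a].add(y)

    start = frozenset([root])
    nodes, guide_edges = {start}, set()
    todo = [start]
    while todo:                                   # corresponds to "while changes do"
        s = todo.pop()
        by_label = defaultdict(set)
        for x in s:
            for a, targets in out[x].items():
                by_label[a] |= targets
        for a, targets in by_label.items():
            t = frozenset(targets)
            guide_edges.add((s, a, t))
            if t not in nodes:                    # hashed lookup of already known sets
                nodes.add(t)
                todo.append(t)
    return nodes, guide_edges

# Illustrative data: person.project and dept.project reach different node sets.
db = [("r", "person", "p1"), ("r", "person", "p2"), ("r", "dept", "d1"),
      ("p1", "project", "x"), ("p2", "project", "y"),
      ("d1", "project", "x"), ("d1", "project", "z")]
guide_nodes, guide_edges = strong_dataguide(db, "r")
```

Because `person.project` and `dept.project` reach different sets of data nodes, they end up as two distinct DataGuide nodes, which is the property that makes the guide strong.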
Monet XML approach
Early attempt to store and query XML data in MonetDB
By Albrecht Schmidt
Not related to Pathfinder & MonetDB/XQuery
No XQuery compiler
XMark queries are hand-crafted and -optimized in MIL
Child, Descendant, Parent & Ancestor steps become regular expressions on the relation names (i.e., catalog)
Open: preceding & following steps?
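A minimal sketch of steps as regular expressions over relation names, assuming a hypothetical catalog with one relation per distinct root-to-node path. The catalog entries and function names are invented for illustration, and the sketch only selects relations; it does not pair up individual nodes.

```python
import re

# Hypothetical catalog: one relation per distinct root-to-node path.
catalog = ["/site", "/site/people", "/site/people/person", "/site/people/person/name",
           "/site/regions/africa/item", "/site/regions/africa/item/name"]

def child(path, tag):
    """Relations reached by child::tag from the relation `path`."""
    pattern = re.compile(re.escape(path) + "/" + re.escape(tag) + "$")
    return [r for r in catalog if pattern.match(r)]

def descendant(path, tag):
    """Relations reached by descendant::tag from the relation `path`."""
    pattern = re.compile(re.escape(path) + "(/[^/]+)*/" + re.escape(tag) + "$")
    return [r for r in catalog if pattern.match(r)]

def parent(path):
    """Relation reached by parent::* (strip the last path component)."""
    return re.sub(r"/[^/]+$", "", path)
```

Relation names carry no document order between different paths, which is consistent with preceding and following steps being listed as open.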
Twig Join Algorithms
So far: interpreted XPath expressions in an imperative manner
  Evaluated XPath expressions step-by-step, as stated in the query
  Given /axis_1::test_1/axis_2::test_2/.../axis_n::test_n,
  we first evaluated /, then XPath step axis_1::test_1, then step axis_2::test_2, ...
This may not always be the best choice:
  Intermediate results can get very large, even if the final result is small
Database context => think in a declarative manner
  DBMS optimizer / engine can evaluate query in “best” order
Tree Patterns
In fact, XPath is a declarative language.
/descendant::timeline/child::event
“Find all nodes v1, v2, and v3, such that
  v1 is a document root,
  v2 is a descendant element of v1 and is named timeline, and
  v3 is a child element of v2 and is named event.
All nodes of type v3 form the query result.”
Observe the combination of
  (a) predicates on single nodes, and
  (b) structural conditions between these nodes.
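The declarative reading can be sketched directly as a search for bindings (v1, v2, v3), here over a small hypothetical document in pre/size/level encoding. The document content and helper names are assumptions; a real engine would pick a join order instead of enumerating all combinations.

```python
# Hypothetical document in pre/size/level encoding: (pre, size, level, name).
doc = [(0, 6, 0, "/"), (1, 5, 1, "history"), (2, 2, 2, "timeline"),
       (3, 0, 3, "event"), (4, 0, 3, "note"), (5, 1, 2, "timeline"), (6, 0, 3, "event")]

def is_descendant(v, c):
    """Structural condition: v lies in the pre range below c."""
    return c[0] < v[0] <= c[0] + c[1]

def is_child(v, c):
    """Structural condition: descendant exactly one level below c."""
    return is_descendant(v, c) and v[2] == c[2] + 1

# All bindings (v1, v2, v3) that satisfy node predicates and structural conditions.
matches = [(v1, v2, v3)
           for v1 in doc if v1[3] == "/"
           for v2 in doc if v2[3] == "timeline" and is_descendant(v2, v1)
           for v3 in doc if v3[3] == "event" and is_child(v3, v2)]
result = sorted({v3[0] for _, _, v3 in matches})
```

Nothing in this formulation fixes the evaluation order, which is the freedom a DBMS optimizer can exploit.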
Tree Patterns
Structural conditions: intuitively expressed as tree patterns:
  Nodes labeled with node predicates
  Structural conditions:
    Double line: ancestor/descendant relationships
    Single line: parent/child relationships
Arbitrary predicates are allowed, but typical are predicates on tag names:
  Nodes labeled with requested tag name
  Document root: label /
  If no /-node is specified: search for pattern anywhere in the document
[Figure: tree pattern with nodes timeline and event]
Tree Patterns
Not limited to path patterns
May also be twig patterns
Mapping between tree patterns and XPath is in general not trivial
Examples:
[Figure: two example tree patterns, one over nodes a, b, c, d, e and one over nodes f, g, h, i, j]
PathStack Algorithm
PathStack Algorithm: Path Patterns
[Figure: stacks for a path pattern with nodes timeline and event; snapshots after the first timeline node, the second timeline node, and the first event node have been visited]
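The stack idea can be sketched for path patterns whose edges are all ancestor/descendant relationships. This is a simplified sketch in the spirit of PathStack, not the algorithm as shown on the slides: it scans a single document-order list of (pre, size, name) nodes instead of per-tag streams, and the function name and data layout are assumptions.

```python
def path_stack(pattern, nodes):
    """Match the path pattern pattern[0]//pattern[1]//... against `nodes`.

    nodes: (pre, size, name) tuples in document order.
    Returns tuples of pre ranks, one tuple per match.
    """
    stacks = [[] for _ in pattern]            # entries: (pre, end, parent_top_index)
    results = []

    def expand(level, limit, suffix):
        # Entries at positions <= limit of stacks[level] are ancestors of suffix[0].
        if level < 0:
            results.append(tuple(suffix))
            return
        for pos in range(limit + 1):
            pre, _, ptr = stacks[level][pos]
            expand(level - 1, ptr, [pre] + suffix)

    for pre, size, name in nodes:
        end = pre + size
        for stack in stacks:                  # pop nodes that can no longer be ancestors
            while stack and stack[-1][1] < pre:
                stack.pop()
        for q in range(len(pattern) - 1, -1, -1):
            if pattern[q] != name:
                continue
            if q > 0 and not stacks[q - 1]:
                continue                      # no potential ancestor: skip this node
            ptr = len(stacks[q - 1]) - 1 if q > 0 else -1
            stacks[q].append((pre, end, ptr))
            if q == len(pattern) - 1:         # leaf of the pattern: emit matches
                expand(q - 1, ptr, [pre])
                stacks[q].pop()
    return results

# timeline(0) [ timeline(1) [ event(2), note(3) [ event(4) ] ], event(5) ]
sample = [(0, 5, "timeline"), (1, 3, "timeline"), (2, 0, "event"),
          (3, 1, "note"), (4, 0, "event"), (5, 0, "event")]
pairs = path_stack(["timeline", "event"], sample)
```

The stacks compactly encode all partial matches: each event is combined with every timeline still on the stack, and no intermediate result is materialized for nodes that cannot contribute.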
PathStack Algorithm: Twig Patterns
So far we only considered path patterns
Can we extend our ideas for efficient twig pattern evaluation?
Idea:
Decompose twig patterns into multiple path patterns.
All path patterns start from the same root.
Use PathStack for each of them and merge their results.
PathStack Algorithm: Twig Patterns
Example: Decompose twig pattern into path patterns
[Figure: twig pattern with root a, child b, and two branches below b (c, and d with child e), decomposed into the path patterns a-b-c and a-b-d-e]
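The decomposition step can be sketched as enumerating the root-to-leaf paths of the twig. The nested-tuple representation, the assumed shape of the example twig (a above b, with branches c and d/e), and the naive two-path `merge` are illustrative assumptions, not the merge procedure of any particular twig join algorithm.

```python
# A twig pattern as nested (label, children) tuples; assumed shape a(b(c, d(e))).
twig = ("a", [("b", [("c", []), ("d", [("e", [])])])])

def root_to_leaf_paths(node, prefix=()):
    """Enumerate the path patterns of a twig: one per leaf, all sharing the root."""
    label, children = node
    path = prefix + (label,)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths

def merge(first, second, shared_len):
    """Join the solutions of two path patterns that agree on the shared prefix."""
    return [(p, q) for p in first for q in second if p[:shared_len] == q[:shared_len]]

paths = root_to_leaf_paths(twig)
```

The merge step is where this approach can waste work: a path solution may be produced in full and then find no partner that agrees on the shared prefix.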
Summary (1/5)
XML
Document markup
Data exchange
Semi-structured
Tree model
DTDs
XML Schema
XPath
Navigation, location steps, axes, node tests, predicates, functions
XQuery
Sequences & Iterations (FLWoR expressions)
Summary (2/5)
XML Data Management
XML file processors
XML databases
XML integration platforms
RDBMS with XML functionality, SQL/XML
Relational XML storage: schema-based vs. schema-oblivious
Summary (3/5)
Purely Relational XML/XQuery processing: MonetDB/XQuery
Document encoding: XPath Accelerator (pre/post plane)
XPath navigation: Staircase Join
XQuery to Relational Algebra translation
Item- & Sequence-representation
Iterations: Loop-lifting
Loop-lifted staircase join
Peephole Optimization
Order-awareness, sort avoidance
XML/XQuery Update Support
Summary (4/5)
Other approaches & techniques
Document storage/encoding:
ORDPATH
DataGuides
XPath processing:
Tree patterns, holistic twig joins
Summary (5/5)
Literature
Slides
Literature references in slides
Literature references on website:
http://www.cwi.nl/~manegold/teaching/adt/html/xquery.html
• Exam:
Tuesday, December 21, 2010
09:00 – 11:00
Room: A1.14
Projects: Join the MonetDB Team!
• Own ideas, suggestions, initiative welcome!
• Master Student Projects (6 Months)
  • Various projects, each consisting of both research & implementation
  • See monetdb.cwi.nl/Development/Research/Projects/ for a sample list
  • Feel free to come with your own idea(s)!
• Implementation Projects
  • Both short-term & long-term
  • E.g. open feature requests: sf.net/tracker/?group_id=56967
  • Become owner/maintainer of some (new) part of MonetDB
• We are (desperately) looking for Windows SW-development & system experts!
We Offer...
• 24x7x365 support & advice
• Membership in a kind & friendly Family-Team of Experts
• Chance to participate in & contribute to a large & successful open-source research project
• Lots of experience, exciting research & fun
• Desk & workstation at CWI
  Fridge, microwave, free coffee, free soup, free cake (occasionally)
  Master Students only (possibly part-time)
  Limited availability => FCFS!
• Some pocket money (internship allowance)
  Master Students only
  Limited availability => FCFS!
• ...