monetdb/xquery infomgmt 2009 peter boncz [email protected] (cwi amsterdam) querying xml data sources...
Post on 19-Dec-2015
222 views
TRANSCRIPT
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peter Boncz
[email protected] (CWI Amsterdam)
Querying XML Data Sourcesusing
MonetDB/XQuery
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Extensions:• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Extensions• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
What is XML?
• The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
• Base specifications:• XML 1.0, W3C Recommendation Feb '98
• Namespaces, W3C Recommendation Jan '99
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
A Little Bit Of History
• Database world• 1980 relational databases• 1990 nested relational model and object oriented databases• 2000 semi-structured databases
• Documents world• 1974 SGML (Structured Generalized Markup Language)• 1990 HTML (Hypertext Markup Language)• 1992 URL (Universal Resource Locator)
Data + documents = information1996 XML (Extended Markup Language)URI (Universal Resource Identifier)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML : A First Look• XML document describing catalog of books
<?xml version="1.0" encoding="ISO-8859-1" ?><catalog> <book isbn="ISBN 1565114302"> <title>No Such Thing as a Bad Day</title> <author>Hamilton Jordan</author> <publisher>Longstreet Press, Inc.</publisher> <price currency="USD">17.60</price> <review> <reviewer>Publisher</reviewer>: This book is the moving account of one man's
successful battles against three cancers ... <title>No Such Thing as a Bad Day</title> is warmly recommended.
</review> </book>
<!-- more books and specifications -->
</catalog>
Mixed content
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML Semantics: a Tree !
<catalog> <book isbn="ISBN 1565114302"> <title>No Such Thing as a Bad Day</title> <author>Hamilton Jordan</author> <publisher>Longstreet Press, Inc.</publisher> <price currency="USD">17.60</price> <review> <reviewer>Publisher</reviewer>: This book is the moving account of one man's successful battles against three cancers ... <title>No Such Thing as a Bad Day</title> is warmly
recommended. </review> </book> <!-- more books and specifications --></catalog>
<catalog> <book isbn="ISBN 1565114302"> <title>No Such Thing as a Bad Day</title> <author>Hamilton Jordan</author> <publisher>Longstreet Press, Inc.</publisher> <price currency="USD">17.60</price> <review> <reviewer>Publisher</reviewer>: This book is the moving account of one man's successful battles against three cancers ... <title>No Such Thing as a Bad Day</title> is warmly
recommended. </review> </book> <!-- more books and specifications --></catalog>
catalog
No Such Thing As A Bad Day
book
title review
reviewertitle
isbn
ISBN1565114302
Elementnode
Textnode
Attributenode
author publisher price
Hamilton Jordan
17.60
Longstreet
Press Inc.
currency
USDTextnodeTextnode
Elementnode
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML Food Chain
Data Exchange FormatXML
Schema, Formatting, Querying, XML Schema, Xpath, XSLT, XQuery, …
Web Services Infrastructure
Business Process FrameworksBEA, IBM WebSphere, WebMethods…
BPEL, BPML, ebXML, RosettaNet, CommerceXML
Enabling Technology:Does NOT solve hard problems Removes excuses for ignoring them
SOAP, WSDL, UDDI
W3C Recommendations
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Killer XML advantages
• Code/schema/data independence
• Covers the continuous spectrum from
totally structured data to documents
from data management to information management
• Unique model for representing data, metadata and code
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XQuery 1.0 • Functional, strongly-typed query language
• XQuery 1.0 = XPath 2.0 for navigation, selection, extraction
+ A few more expressions For-Let-Where-Order By-Return (FLWOR)
XML construction
Operators on types
+ User-defined functions & modules
Modularize large queries
Process recursive data
+ Strong typing
Checks values of required type (operator, function)
Guarantees result value instance of output type
Enforced statically or dynamically
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
FLWR (“Flower”) Expressions
FOR ... sequence expression
LET... variable definition
WHERE... condition
RETURN... result expression
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
FLWR (“Flower”) Expressions
FOR ... sequence expression
LET... variable definition
WHERE... condition
RETURN... result expression
ORDERBY
SELECT… result expressions
FROM... tables
WHERE... condition
GROUP BY...
ORDER BY...
SQL
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Navigation, Selection, Extraction
• Titles of all books published by Longstreet Press $cat/catalog/book[publisher=“Longstreet Press”]/title <title>No Such Thing As A Bad Day</title>
• Publications with Don Chamberlin as author or editor • $cat//*[(author|editor) = “Don Chamberlin”]
<book><title>XQuery from the Experts</title>…</book>
<spec><title>XQuery Formal Semantics</title>…</spec>
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Transformation & Construction• First author & title of books published by A/W
for $b in $cat//book[publisher = “Addison Wesley”]
return <awbook> { $b/author[1], $b/title } </awbook>
<awbook>
<author>Don Chamberlin</author>
<title>XQuery from the Experts</title>
</awbook>
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
TeXQuery : Full-text extensions• Text search & querying of structured content
• Limited support in XQuery 1.0
• String operators with collation sequences
$cat//book[contains(review/text(), “two thumbs up”)]
• Stop words, proximity searching, ranking
Ex: “Tony Blair” within two words of “George Bush”
• Phrases that span tags and annotations
Ex: Match “Mr. English sponsored the bill” in <sponsor> Mr. English </sponsor> <footnote> for himself and <co-
sponsor> Mr.Coyne </co-sponsor> </footnote> sponsored the bill in the <committee-name> Committee for Financial Services </committee-name>
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML Schema Languages• Many variants…
• DTDs, XML Schema, RELAX-N/G, XDuce
• … with similar goals to define• Types of literal (terminal) data
• Names of elements & attribute
• “Vertical” & “horizontal” structure of documents
• XQuery designed to support (all of) XML Schema• Structural & name constraints over types
• Regular tree expressions over elements, attributes, atomic types
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XQuery Update Facility (XUF)
V1.0 W3C Recommendation
Internal primitives:upd:insertBeforeupd:insertAfterupd:insertIntoupd:insertIntoAsLastupd:insertAttributesupd:deleteupd:replaceValueupd:rename
Pending update list concept
upd:applyUpdates
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
insert
<item id="{id}">
<location>Brazil</location>
<quantity>200</quantity>
<name>XML in a nutshell</name>
<payment>Credit Card, Personal check</payment>
<shipping>Will ship internationally</shipping>
<incategory category="category1"/>
</item>
as last into
fn:doc("xmark.xml")/site/regions/samerica
Example
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XSLT vs. XQuery• XSLT 1.0: XML XML, HTML, Text
• Loosely-typed scripting language• Format XML in HTML for display in browser• Must be highly tolerant of variability/errors in data
• XQuery 1.0: XML XML• Strongly-typed query language• Large-scale database access• Must guarantee safety/correctness of operations on data
• Over time, XSLT & XQuery may both serve needs of many application domains
• XQuery will become a hidden, commodity language
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XSLT vs. XQuery Updates• XSLT
• Transformations, complexity at least linear• Large cost when making a small update to a huge document• Effective when full document is transformed
• XQuery Update Facility• Concurrent access, transactions (“ACID” properties)• Systems optimized for small non-conflicting transactions• Systems break down when update volume is large
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB
open-source Mozilla license download at monetdb-xquery.org
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery open-source Mozilla license download at monetdb-xquery.org
Pathfinder Project
Torsten Grust, Jens Teubner, Jan Rittinger
Maurice van Keulen
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XPath Accellerator [SIGMOD02]
pre posta 0 9b 1 3c 2 2d 3 0e 4 1f 5 8
g 6 4h 7 7i 8 5j 9 6
Node-based relational encoding of XQuery's data model
<a> <b> <c> <d/> <e/> </c> </b> <f> <g/> <h> <i/> <j/> </h> </f></a>
descendant
ancestor following
preceding
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XPath evaluation (SQL)
Example query: /descendant::open_auction[./bidder]/annotation
SELECT DISTINCT a.pre FROM doc r, doc oa, doc b, doc a WHERE r.pre=0 AND oa.pre > r.pre AND oa.post < r.post AND oa.name = “open_auction” AND oa.kind = “elem” AND b.pre > oa.pre AND b.post < oa.post AND b.level = oa.level + 1 AND b.name = “bidder” AND b.kind < “elem” AND a.pre > oa.pre AND a.post < oa.post AND a.level = oa.level + 1 AND a.name = “annotation” AND a.kind < “elem” ORDER BY a.pre
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Loop-lifted Staircase Join [SIGMOD06]
document
List of context nodes
seek
pre|post are not random numbers: exploit the tree properties encoded in them
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Staircase Join [VLDB03]
document
List of context nodes
scan
pre|post are not random numbers: exploit the tree properties encoded in them
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
document
List of context nodes
skip
pre|post are not random numbers: exploit the tree properties encoded in them
Loop-lifted Staircase Join [SIGMOD06]
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
pre|post are not random numbers: exploit the tree properties encoded in them
document
List of context nodes
seek
seek scan skip seek scan skip ...
Loop-lifted Staircase Join [SIGMOD06]
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
• skipping: avoid touching node ranges that cannot contain results
Generate a duplicate-free result in document order • pruning: reduce the context set a-priori• partitioning: single sequential pass over the document
document
List of context nodes
seek
seek scan skip seek scan skip ...
Loop-lifted Staircase Join [SIGMOD06]
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Relational Algebra
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Sequence Representation
• sequence = table of items• add pos column for maintaining order• ignore polymorphism for the moment
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
For-loops: the iter column
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Loop-lifting
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Full Example
join calc project
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XQuery On SQL Hosts [VLDB04]
XQuery Construct
sequence construction
if-then-else
for-loops
calculations
list functions, e.g. fn:first()
element construction
XPath steps
Relational Mapping
A union B
select(cond=true,B) union select(A=false,C)
cartesian product
project(A,x=expr(Y1,..Yn)
select(A,pos=1)
updates in temporary tables
staircase-join
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XMark Query 8let $auction := doc("auctions.xml") return for $p in $auction/site/people/person let $a := for $t in $auction/site/ closed_auctions/closed_auction where $t/buyer/@person = $p/@id return $treturn <item person="{$p/name/text()}"> {count($a)} </item>
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peephole Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peephole Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peephole Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peephole Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peephole Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Peephole Optimization• Plan simplification
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Performance Evaluation
Extensive performance Evaluation on XMark• data sizes 110KB, 1MB, 11MB, 110MB, 1.1GB, 11GB• MonetDB/XQuery, Galax0.6, X-Hive 6.0, Berkeley DB XML 2.0, eXisT• 8GB RAM
Extensive XMark performance Literature Overview• IPSI-XQ v1.1.1b , Dynamic Interval Encoding , Kweelt , QuiP, FluX , TurboXPath, Timber, Qizx/Open (Version 0.4/p1), Saxon (Version 8.0), BEA/XQRL, VX• Crude comparison (normalized by CPU SPECint)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XMark Benchmark (sigmod 2006)
1MB XML 1GB XML
Galax
X-Hive
BerkeleyDB XML
eXisT
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics:• Keyword Search in XML (PF/Tijah)
• StandOff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics:• Keyword Search in XML (PF/Tijah)
• StandOff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• Standoff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML Annotation Use Cases
• multimedia objects (MPEG-7, SMIL)
• which text is spoken in which video shot?
• hard-drive dumps (forensic data analysis)
• which telephone numbers occur in emails from X at date Y?
• DNA sequences (bioinformatics)• who annotated the same DNA region?
• text (Natural Language Processing)• return phrases that have “Eiffel Tower” as subject
and in the object contain “meter”
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
StandOff Joins
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
StandOff Joins
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Loop-lifted StandOff Join
5 10 20 30 40 50 60 70 80
r1
c1
c2
r2
c3
r3
c4
r4
iter1 r1
result
iter1 r4
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Evaluation
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [ <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#"> ]>
<rdf:RDF
xmlns:H3K4Me3="ENCODEChIPchipDataModel.owl#"
xmlns:TFBS="TFBSdataModel.owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<H3K4Me3:Datafile rdf:about="H3K4me3.txt">
<H3K4Me3:contains>
<H3K4Me3:Datafile_row rdf:about="H3K4me3.xml/row1“>
<H3K4Me3:bin rdf:datatype="&xsd;integer">1713</H3K4Me3:bin>
<H3K4Me3:chrom>chr1</H3K4Me3:chrom>
<H3K4Me3:chromStart rdf:datatype="&xsd;integer">147971248</H3K4Me3:chromStart>
<H3K4Me3:chromEnd rdf:datatype="&xsd;integer">147972628</H3K4Me3:chromEnd>
<H3K4Me3:score rdf:datatype="&xsd;decimal">0.73</H3K4Me3:score>
</H3K4Me3:Datafile_row>
</tH3K4Me3:contains>
......
BioInformatics RDF
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotations
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
70
Goals of XRPC
XQuery
XQuery
XQuery
XQuery
XQuery
XQuery
Distributed XQuery
(or even P2P)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
71
Design Goals• Orthogonal
• All XQuery semantics, typing and validation
• also including XQuery Update Facility
• Interoperable• Network-layer specification!
• SOAP-based
• Potential for efficiency• Protocol exposes set-at-a-time opportunities
• Keep it simple• A minimal extension
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
72
XRPC Syntax Extension
execute at { Expr } { FunApp(ParamList) }
XRPC: XQuery + RPC
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
73
Distributed XQuery with XRPC: Possibilities
Peer1
doc1.xml
Peer2
doc2.xml
Peer3
for each person from Vienna in doc1.xmlwho has bought something (item)from doc2.xmlreturn ... something ...
<auctions>... FULL DOC ...</auctions>
<auctions>... FULL DOC ...</auctions>
doc(“doc1.xml”)
//person[@city=“Vienna”]
doc(“doc2.xml”)
//item
person/@id buyer/@person
Data Shipping Strategy
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
74
Distributed XQuery with XRPC: Possibilities
Peer1
doc1.xml
Peer2
doc2.xml
Peer3
for $person in execute at {Peer1} getViennaPersons()return execute at {Peer2} getPersonItems($person/@id)
Function Shipping using XRPCDistributed Semi-Join!
getViennaPersons():
(“doc1.xml”)//person[@city=“Vienna”]
getPersonItems($pid):
(“doc2.xml”)//item[./buyer/@pid = $pid]
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
75
Distributed XQuery with XRPC: Possibilities
Peer1
doc1.xml
Peer2
doc2.xml
Peer3
for $person in execute at {Peer1} getViennaPersons()return execute at {Peer2} getPersonItems($person/@id)
getViennaPersons()at {Peer1} <person id=1>
<person id=2>...
getPersonItems(@id=1)at {Peer2}getPersonItems(@id=2)at {Peer2}...
<item>...<item>......
Distributed Semi-Join➠ Push loop dependent parameters
Selection Pushdown
Bulk RPC
SelectionSemi-Join Result
Semi-Join
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
76
Remote Function Calls• Functions: unit of distributed execution in XRPC
• Defined in an XQuery module, identified by a URL
• XQuery: a compositional, functional language• each sub expression = function of its free variables
• each function can be executed with XRPC on any peer
• Automatic Query decomposition• described in ICDE 2009
• “Efficient Decomposition of Full-Fledged XQuery”, Zhang, Tang, Boncz
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics:• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics:• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML node values (W3C spec)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML node values (W3C spec)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML node values (W3C spec)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
XML node values (W3C spec)
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Indexes for XML: a challenge• Building an index on all nodes that have a values of a
certain type is easy Eg a B-tree containing all node-ids that have a numeric value
• However, how to maintain the index if the node is changed?
node change affects the value of all ancestors root node always changes! Locking hotspot. computing the string value of the root is ... very expensive
Not well thought over by W3C committee
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Overview• Crash course XML and XQuery + XQ Update Facility
• XML Databases
• MonetDB/XQuery• Open-source XML database by me and my group at CWI
• Start downloading now monetdb.cwi.nl
• Additional Topics• Keyword Search in XML (PF/Tijah)
• StandOff Annotation
• Distributed XML (XRPC)
• More Internals• XML Indexing and why it is hard
• Runtime Query Optimization
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
MonetDB/XQuery http://monetdb.cwi.nl BioWise InfoMgmt 2009
Questions?[sigmod 2009]
R. Abdel Kader, P. A. Boncz, S. Manegold, M. van Keulen. ROX: Run-time Optimization of XQueries
[dataX 2009] L. Sidirourgos, P. A. Boncz. Generic and Updatable XMLValue Indices Covering Equality and Range Lookups
[icde 2009]Y. Zhang, N. Tang, P. A. Boncz. Efficient Distribution of Full
Fledged XQuery[vldb 2007]
Y. Zhang, P. A. Boncz. XRPC: Interoperable and Efficient Distributed XQuery
[digital forensics 2007]W. Alink, R. Bhoedjang, A. P. de Vries, P. A. Boncz. XIRAF:Ultimate Forensic Querying
[sigmod 2006]P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, J. Teubner. MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine.