![Page 1: A Unified Model for XQuery Evaluation over XML Data Streams Jinhui Jian Hong Su Elke A. Rundensteiner Worcester Polytechnic Institute ER 2003](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d7a5503460f94a5daf8/html5/thumbnails/1.jpg)
A Unified Model for XQuery Evaluation over XML Data Streams
Jinhui Jian
Hong Su
Elke A. Rundensteiner
Worcester Polytechnic Institute
ER 2003
Need for Stream Processing
New environment:
  Data sources are everywhere
  Data requests are everywhere
New applications:
  Sensor networks
  Analysis of XML web logs
  Selective dissemination of XML information (e.g., news)
Specific Challenges for XML Streams
<biditems>
<book year="2001">
<title>Dream Catcher</title>
<author><last>King</last><first>S.</first></author>
<publisher>Bt Bound </publisher>
<price> 20 </price>
</book>
…
Token-by-token access
Timeline of arriving tokens: <biditems> <book> <title> Dream Catcher </title> …
A token is not a direct counterpart of a tuple
Queries require pattern retrieval plus filtering and restructuring
FOR $b in doc (biditems.xml) //book
LET $p := $b/price/text()
    $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> $t </Inexpensive>
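To make the token-by-token access pattern concrete, the following Python sketch splits a fragment of the example document into tokens. It is an illustration only: the regex-based `tokenize` helper is hypothetical, handles just simple well-formed XML (no comments, CDATA, or self-closing tags), and is not how any particular stream engine parses its input.

```python
import re

def tokenize(xml_text):
    """Yield start tags, end tags, and text as separate tokens (simplified)."""
    for match in re.finditer(r"<[^>]+>|[^<]+", xml_text):
        token = match.group().strip()
        if token:  # skip whitespace between tags
            yield token

fragment = (
    '<biditems> <book year="2001"> <title>Dream Catcher</title> '
    "<price> 20 </price> </book>"
)
tokens = list(tokenize(fragment))

# A complete book "tuple" is only available once its closing tag has arrived,
# so the WHERE clause cannot be decided on any single token.
tokens_until_book_complete = tokens.index("</book>") + 1

print(tokens)
print(tokens_until_book_complete)
```

The point of the sketch is the mismatch the slide names: the query is phrased over book elements, but the input delivers many tokens before one such element is complete.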
Two Computation Paradigms
Automata-based [yfilter02, x-scan01, xsm02, xsq03, xpush03…]
Algebraic [niagara00, …]
This project intends to integrate both paradigms into one
Automata Paradigm:
FOR $b in stream(biditems.xml) //book
LET $p = $b/price/text(), $t = $b/title
WHERE $p < 30
RETURN <Inexpensive>$t</Inexpensive>
[Figure: automaton with states 1 to 5; transitions labeled *, book, title, price, and text()]
Auxiliary structures for:
1. Buffering data
2. Evaluating predicates
3. Restructuring buffered data
…
//book
//book/title
//book/price/text()
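The automata paradigm can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the construction used by any of the cited systems: a stack of open element names stands in for the automaton state, the auxiliary buffers hold only the text directly inside each open element, and the hypothetical `match_paths` helper supports only patterns of the form `//a/b`. The `text()` step of `//book/price/text()` is treated as the text content of `price`.

```python
import re

def tokenize(xml_text):
    """Split simple XML text into start-tag, end-tag, and text tokens."""
    for match in re.finditer(r"<[^>]+>|[^<]+", xml_text):
        token = match.group().strip()
        if token:
            yield token

def match_paths(xml_text, patterns):
    """Yield (label, text) for each element whose path ends with a pattern.

    patterns maps a label such as "//book/title" to its list of steps.
    """
    stack = []    # names of the currently open elements (automaton state)
    buffers = []  # auxiliary structure: one text buffer per open element
    for token in tokenize(xml_text):
        if token.startswith("</"):
            text = "".join(buffers.pop())
            for label, steps in patterns.items():
                if stack[-len(steps):] == steps:
                    yield (label, text)
            stack.pop()
        elif token.startswith("<"):
            stack.append(token[1:-1].split()[0])  # drop attributes
            buffers.append([])
        else:
            buffers[-1].append(token)

stream = (
    '<biditems><book year="2001"><title>Dream Catcher</title>'
    "<price> 20 </price></book></biditems>"
)
patterns = {
    "//book/title": ["book", "title"],
    "//book/price/text()": ["book", "price"],
}
matches = list(match_paths(stream, patterns))
print(matches)
```

Note that the pattern matching itself is compact, while predicate evaluation and restructuring would have to be bolted on through further auxiliary structures, which is the concern the slide raises.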
Algebraic Computation
[Figure: XML tree of book elements with children title, author (last, first), publisher, and price, each ending in text nodes]
[Figure: two algebraic plans over the same operators (Navigate //book, price; Navigate //book, title; Select price < 30; Tagger)]
Selection push-down enabled
FOR $b in doc (biditems.xml) //book
LET $p = $b/price/text(), $t = $b/title
WHERE $p < 30
RETURN <Inexpensive>$t</Inexpensive>
Navigate //book, /title
  Input: <book year="2001"> … </book> <book> … </book>
  Output: <title> … </title>
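The tuple-based plan with selection pushed down can be sketched in Python as a chain of generators. This assumes whole book elements are already available as tuples, which is exactly what a token stream does not provide for free. The operator functions `navigate`, `select`, and `tagger` are hypothetical stand-ins named after the slide's operators, and the second book in the sample data is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<biditems>
<book year="2001"><title>Dream Catcher</title><price>20</price></book>
<book year="2002"><title>Hypothetical Pricey Book</title><price>45</price></book>
</biditems>"""  # the second book is made-up sample data

def navigate(tuples, src, path, dst):
    """Bind dst to the child reached from tuple[src] via path."""
    for t in tuples:
        target = t[src].find(path)
        if target is not None:
            yield {**t, dst: target}

def select(tuples, pred):
    """Keep only the tuples that satisfy pred."""
    for t in tuples:
        if pred(t):
            yield t

def tagger(tuples, tag, var):
    """Wrap the element bound to var in a new tag."""
    for t in tuples:
        inner = ET.tostring(t[var], encoding="unicode").strip()
        yield f"<{tag}>{inner}</{tag}>"

root = ET.fromstring(DOC)
books = ({"b": b} for b in root.iter("book"))        # Navigate //book
plan = navigate(books, "b", "price", "p")            # Navigate //book, price
plan = select(plan, lambda t: float(t["p"].text) < 30)  # pushed-down Select
plan = navigate(plan, "b", "title", "t")             # Navigate //book, title
result = list(tagger(plan, "Inexpensive", "t"))
print(result)
```

Because the selection runs before the second navigation, titles of books that fail the price predicate are never looked up.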
Observations
Automata paradigm:
  Good and long studied for pattern retrieval on tokens
  Patches needed for complex filtering and restructuring
Algebraic paradigm:
  Good and long studied for expressing and optimizing query plans on sets of tuples
  Tokenized inputs not accommodated yet
Each paradigm has deficiencies
The two paradigms complement each other
Research Challenges
How to integrate the two models?
How to optimize a query within the integrated query model?
Raindrop Approach: Uniform Modeling in an Algebraic Framework
Uniform Algebraic Plan
XML data stream
Query answer
Algebraic Plan
Uniform Algebraic Plan
Token-based plan (automata plan)
Tuple-based plan
Tuple stream
XML data stream
Query answer
Modeling the Automata in an Algebraic Plan: Black Box [xscan] vs. White Box
Black box: a single XScan operator computes all bindings
  $b := //book
  $p := $b/price
  $t := $b/title
White box: the automaton is exposed as separate algebraic operators
  SJoin //book
  Extract //book/price
  Extract //book/title
FOR $b in stream(biditems.xml) //book
LET $p = $b/price/text(), $t = $b/title
WHERE $p < 30
RETURN <Inexpensive>$t</Inexpensive>
A Unified Process at the Logical View
FOR $b in doc (biditems.xml) //book
LET $p := $b/price/text()
    $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> $t </Inexpensive>
Token-based plan (automata plan)
Tuple-based plan
A Unified Process at the Logical View
FOR $b in doc (biditems.xml) //book
LET $p := $b/price/text()
    $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> $t </Inexpensive>
Tuple-based plan
SJoin //book
Extract $p, //book/price
Extract $t, //book/title
A Unified Process at the Logical View
FOR $b in doc (biditems.xml) //book
LET $p := $b/price/text()
    $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> $t </Inexpensive>
SJoin //book
Extract //book/price
Extract //book/title
Select //book/price < 30
Navigate //book, //book/title
The Algebra Core
The slide groups the operators into relational-like and XML-specific operators.

| Op | Symbol parameters | Semantics |
| --- | --- | --- |
| Selection | pred | Filter tuples based on the predicate pred |
| Projection | v | Filter columns in the input tuples based on the variable list v |
| Join | pred | Join input tuples based on the predicate pred |
| Aggregate | f | Aggregate over input tuples with the aggregate function f, e.g., sum and average |
| Tagger | T, pt | Format outputs based on the pattern pt, i.e., reconstruct XML tags |
| Navigate | p1, p2 | Take input elements of path p1 and output ancestor elements of path p2 |
| Extract | p | Identify elements of path p from the input stream |
| Structural Join | SJ, p | Join input tuples on their structural relationship, e.g., the common parent relationship p |
Extract Operator
[Figure: automaton with transitions labeled *, book, and title]
Extract //book/title
Input stream: <bib> <book> <title> Dream Catcher </title> … </book> …
Output: <title> Dream Catcher </title>
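A minimal Python sketch of an Extract-style operator follows. It is written under simplifying assumptions rather than as the Raindrop implementation: the hypothetical `extract` generator matches only patterns of the form `//a/b`, tracks open elements on a stack in place of automaton states, and does not handle a matching element nested inside another match.

```python
import re

def tokenize(xml_text):
    """Split simple XML text into start-tag, end-tag, and text tokens."""
    for match in re.finditer(r"<[^>]+>|[^<]+", xml_text):
        token = match.group().strip()
        if token:
            yield token

def extract(token_stream, steps):
    """Yield each element whose path ends with steps, rebuilt from its tokens."""
    stack = []             # names of open elements
    collected = []         # tokens of the element currently being extracted
    depth_at_match = None  # stack depth at which the current match started
    for token in token_stream:
        is_end = token.startswith("</")
        is_start = token.startswith("<") and not is_end
        if is_start:
            stack.append(token[1:-1].split()[0])
            if depth_at_match is None and stack[-len(steps):] == steps:
                depth_at_match = len(stack)
        if depth_at_match is not None:
            collected.append(token)
        if is_end:
            if depth_at_match == len(stack):
                yield "".join(collected)  # the element is complete
                collected, depth_at_match = [], None
            stack.pop()

stream = "<bib> <book> <title> Dream Catcher </title> <price> 20 </price> </book> </bib>"
titles = list(extract(tokenize(stream), ["book", "title"]))
print(titles)
```

The operator consumes tokens and produces whole elements, which is what lets the rest of the plan work on tuples.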
Structural Join Operator
[Figure: automaton with states 1 to 4; transitions labeled *, book, title, and price]
Extract //book/title
Extract //book/price
SJoin //book
<title>…</title> <price>…</price>
<biditems> <book> <title> Dream Catcher </title> … </book>…
<price>…</price><title>…</title>
FOR $b in doc (biditems.xml) //book
LET $p := $b/price/text()
    $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> $t </Inexpensive>
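The structural join can be illustrated with a short Python sketch. The assumption here is that the join is triggered when the end tag of the common ancestor (`book`) arrives, at which point everything buffered by the two extractions belongs to that one book. The helper `sjoin_on_book` is hypothetical, fuses the two Extract steps and the join into one loop for brevity, ignores nested `book` elements, and uses an invented second book as sample data.

```python
import re

def tokenize(xml_text):
    """Split simple XML text into start-tag, end-tag, and text tokens."""
    for match in re.finditer(r"<[^>]+>|[^<]+", xml_text):
        token = match.group().strip()
        if token:
            yield token

def sjoin_on_book(xml_text):
    """Yield one (title, price) tuple per book when the book's end tag arrives."""
    stack, titles, prices = [], [], []
    for token in tokenize(xml_text):
        if token.startswith("</"):
            if stack.pop() == "book":
                # Everything buffered so far shares this book as its ancestor.
                for title in titles:
                    for price in prices:
                        yield (title, price)
                titles, prices = [], []
        elif token.startswith("<"):
            stack.append(token[1:-1].split()[0])
        elif stack[-2:] == ["book", "title"]:   # Extract //book/title
            titles.append(token)
        elif stack[-2:] == ["book", "price"]:   # Extract //book/price
            prices.append(token)

stream = (
    "<biditems>"
    "<book><title>Dream Catcher</title><price>20</price></book>"
    "<book><title>Hypothetical Pricey Book</title><price>45</price></book>"
    "</biditems>"  # the second book is made-up sample data
)
joined = list(sjoin_on_book(stream))
print(joined)
```

Clearing the buffers at each book boundary keeps titles and prices from different books from being paired.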
Optimization via Query Rewriting
In or Out?
Token-based plan (automata plan)
Tuple-based Plan
Tuple stream
XML data stream
Query answer
Pattern retrieval
Plan Alternatives
The pull-out plan:
  Extract //book
  Navigate /price
  Select price < 30
  Navigate book/title
  Tagger
The push-in plan:
  Extract //book/price
  Extract //book/title
  SJoin //book
  Select price < 30
  Tagger
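The two alternatives can be compared side by side in a Python sketch. Both functions below are hypothetical simplifications, not the plans as implemented in Raindrop: `pull_out_plan` lets the token-level part recognize only whole `book` elements and navigates inside them afterwards, while `push_in_plan` recognizes `title` and `price` at the token level and joins them per book. The second book in the sample stream is invented.

```python
import re
import xml.etree.ElementTree as ET

def tokenize(xml_text):
    """Split simple XML text into start-tag, end-tag, and text tokens."""
    for match in re.finditer(r"<[^>]+>|[^<]+", xml_text):
        token = match.group().strip()
        if token:
            yield token

def tag_name(token):
    """Element name of a start tag, without attributes."""
    return token[1:-1].split()[0]

def wrap(title):
    return f"<Inexpensive><title>{title}</title></Inexpensive>"

def pull_out_plan(xml_text, limit):
    """Extract //book at token level; Navigate and Select work on tuples."""
    buffered, inside, out = [], False, []
    for token in tokenize(xml_text):
        is_start = token.startswith("<") and not token.startswith("</")
        if is_start and tag_name(token) == "book":
            inside = True
        if inside:
            buffered.append(token)
        if token == "</book>":
            book = ET.fromstring("".join(buffered))        # Extract //book
            price = book.findtext("price")                 # Navigate /price
            if price is not None and float(price) < limit:  # Select
                out.append(wrap(book.findtext("title")))   # Navigate, Tagger
            buffered, inside = [], False
    return out

def push_in_plan(xml_text, limit):
    """Extract title and price at token level, then SJoin on book and Select."""
    stack, titles, prices, out = [], [], [], []
    for token in tokenize(xml_text):
        if token.startswith("</"):
            if stack.pop() == "book":                      # SJoin //book
                out += [wrap(t) for t in titles for p in prices
                        if float(p) < limit]               # Select, Tagger
                titles, prices = [], []
        elif token.startswith("<"):
            stack.append(tag_name(token))
        elif stack[-2:] == ["book", "title"]:
            titles.append(token)
        elif stack[-2:] == ["book", "price"]:
            prices.append(token)
    return out

stream = (
    '<biditems><book year="2001"><title>Dream Catcher</title>'
    "<price>20</price></book>"
    "<book><title>Hypothetical Pricey Book</title><price>45</price></book>"
    "</biditems>"  # the second book is made-up sample data
)
pulled = pull_out_plan(stream, 30)
pushed = push_in_plan(stream, 30)
print(pulled == pushed, pulled)
```

The two functions return the same answer but differ in how much is buffered and where the work happens, which is the trade-off a rewriting optimizer would weigh.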
Pattern Retrieval Alternatives
Example input: <book year="2001"> <title>Dream Catcher</title> <author> <last> King </last> <first> S. </first> </author> <publisher> Bt Bound </publisher> <price> 20 </price> </book>
In Automata (/title, /price)
[Figure: automaton with states 1 to 4 and transitions labeled *, book, title, and price, shown with <title>…</title> and <price>…</price> elements]
Out of Automata (/title, /price)
[Figure: automaton with states 1 and 2 and transitions labeled * and book, shown with whole <book>… …</book> elements and the <title>…</title> and <price>…</price> elements retrieved from them; further labels in the figure: t2, t10, SJ]
Experiment
[Figure: two plots, one for selectivity = 5% and one for selectivity = 90%]
Related Work
Camp 1: Complete Automata Model [XSQ, XSM, XPush]
for $X in $R/a return
  for $Y in $X/b return
    <res>$Y, $X</res>
[Figure: automaton for the query above. States are labeled with triples from 0,0,0 to 2,2,2, and each transition has the form condition|actions, e.g. *r=<b>|w(x,<b>),r++]
Camp 1: Complete Automata Model [XSQ, XSM, XPush]
All details are presented on the same level (and a low level!)
  Hard to understand
  Not suitable for optimizing at different levels
Little has been studied on using automata as a query processing paradigm
Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]
Fixed interface for automata computation (all pattern retrieval pushed down)
  No opportunity to push computation into, or pull it out of, the automata
Bloated, black-box operator
  Algebraic rewriting impossible for internal optimization
[Figure: an automata plan computing the bindings $b := //book, $p := //book/price, $t := //book/title and outputting $b, $p, $t]
Contributions
Combining automata and algebra leads to a powerful query processing model
Modeling:
  Uniform, simple logical view: better understandability
Optimization:
  Uniform rewriting: more optimization opportunities (e.g., push-in/pull-out)
  The need for optimization is verified by experiments
Email: [email protected]
Experiment 2
[Figure: two plots, one for number of patterns = 2 and one for number of patterns = 20]