A Unified Model for XQuery Evaluation over XML Data Streams

Jinhui Jian

Hong Su

Elke A. Rundensteiner

Worcester Polytechnic Institute

ER 2003

Need for Stream Processing

New environment: data sources are everywhere, and data requests are everywhere.

New applications: sensor networks, analysis of XML web logs, and selective dissemination of XML information (e.g., news).

Specific Challenges for XML Streams

<biditems>

<book year="2001">

<title>Dream Catcher</title>

<author><last>King</last><first>S.</first></author>

<publisher>Bt Bound </publisher>

<price> 20 </price>

</book>

Token-by-Token access manner

timeline

<biditems> <book> <title> Dream Catcher </title> …

Token: not a direct counterpart of a tuple

Pattern retrieval + Filtering/Restructuring

FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>
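To make the token-by-token access pattern concrete, here is a minimal Python sketch that feeds a compact copy of the sample document to the standard library's `XMLPullParser` in small chunks and yields tokens as they become available. The helper name `tokens`, the chunk size, and the `(kind, value)` token shape are illustrative assumptions, not part of the slides.

```python
from xml.etree.ElementTree import XMLPullParser

# Compact copy of the sample document from the slide.
SAMPLE = (
    '<biditems><book year="2001"><title>Dream Catcher</title>'
    '<author><last>King</last><first>S.</first></author>'
    '<publisher>Bt Bound</publisher><price>20</price></book></biditems>'
)

def tokens(xml_text, chunk_size=16):
    """Yield (kind, value) tokens while the document arrives chunk by chunk."""
    parser = XMLPullParser(events=("start", "end"))

    def drain():
        for event, elem in parser.read_events():
            if event == "start":
                yield ("start", elem.tag)
            else:
                # Simplification: leaf text is emitted just before its end tag.
                text = (elem.text or "").strip()
                if text:
                    yield ("text", text)
                yield ("end", elem.tag)

    for i in range(0, len(xml_text), chunk_size):
        parser.feed(xml_text[i:i + chunk_size])
        yield from drain()
    parser.close()          # flush anything the parser still holds back
    yield from drain()

toks = list(tokens(SAMPLE))
for tok in toks[:5]:
    print(tok)
```

No tuple exists at this level: a consumer sees only a flat sequence of start, text, and end tokens and has to assemble larger units itself.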

Two Computation Paradigms

Automata-based [yfilter02, x-scan01, xsm02, xsq03, xpush03…]

Algebraic [niagara00, …]

This project intends to integrate both paradigms into one

Automata Paradigm:

FOR $b in stream("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>

[Figure: automaton with states 1 to 5 and transitions labeled *, book, title, price, and text()]

Auxiliary structures for:

1. Buffering data

2. Evaluating predicates

3. Restructuring buffered data

//book

//book/title

//book/price/text()
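The pattern retrieval part of the automata paradigm can be sketched as a stack-based automaton run over tokens. The state numbering and the transition table below are an assumed reading of the figure (state 1 loops on any element, book leads to 2, title to 4, price to 3, text() to 5), and the token list is a hand-written stand-in for the stream. The sketch covers pattern recognition only, not the auxiliary structures for buffering, predicates, or restructuring.

```python
# Assumed transition table: state -> {label: next state}.
TRANSITIONS = {
    1: {"book": 2},
    2: {"title": 4, "price": 3},
    3: {"text()": 5},
}
SELF_LOOP = {1}  # state 1 stays active on any element name ("*")
ACCEPTING = {2: "//book", 4: "//book/title", 5: "//book/price/text()"}

TOKENS = [
    ("start", "biditems"), ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"), ("end", "biditems"),
]

def run_automaton(tokens):
    """Return (pattern, token value) pairs in the order they are recognized."""
    stack = [{1}]            # one set of active states per open element
    matches = []
    for kind, value in tokens:
        if kind == "start":
            nxt = set()
            for state in stack[-1]:
                if state in SELF_LOOP:
                    nxt.add(state)
                target = TRANSITIONS.get(state, {}).get(value)
                if target is not None:
                    nxt.add(target)
            stack.append(nxt)
            for state in sorted(nxt):
                if state in ACCEPTING:
                    matches.append((ACCEPTING[state], value))
        elif kind == "end":
            stack.pop()      # return to the states of the parent element
        elif kind == "text":
            for state in sorted(stack[-1]):
                target = TRANSITIONS.get(state, {}).get("text()")
                if target in ACCEPTING:
                    matches.append((ACCEPTING[target], value))
    return matches

matches = run_automaton(TOKENS)
print(matches)
```

On this input the run recognizes `//book`, then `//book/title`, then `//book/price/text()` with the value `20`. Everything beyond recognition (comparing the price with 30, wrapping the title) would have to be bolted on as auxiliary structures.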

Algebraic Computation

[Figure: XML tree with three book elements; a book has title, author (with last and first), publisher, and price children, each ending in text nodes]

[Figure: two algebraic plans built from the same operators: Navigate //book, price; Select price < 30; Navigate //book, title; Tagger]

Selection push-down enabled

FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>

[Figure annotations: sample data items (<book year="2001">…</book>, <book>…</book>, <title>…</title>) shown next to the operator Navigate //book, /title]
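As a rough illustration of the algebraic paradigm, the sketch below runs the query as a pipeline of tuple operators over an already parsed document, with the selection placed before the title navigation. The operator signatures, the dictionary-based tuples, and the second book ("Pricey Tome", 45) are made up for this example and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Parsed input; the second book is invented sample data.
DOC = ET.fromstring(
    "<biditems>"
    "<book><title>Dream Catcher</title><price>20</price></book>"
    "<book><title>Pricey Tome</title><price>45</price></book>"
    "</biditems>"
)

def source(root, path):
    """Bind $b: one tuple per element matching the path."""
    for elem in root.iterfind(path):
        yield {"b": elem}

def navigate(tuples, src, path, dst):
    """Navigate: add column dst holding the child of column src at path."""
    for t in tuples:
        yield {**t, dst: t[src].find(path)}

def select(tuples, pred):
    """Select: keep tuples that satisfy the predicate."""
    return (t for t in tuples if pred(t))

def tagger(tuples, tag, col):
    """Tagger: wrap the element in column col in a new tag."""
    for t in tuples:
        inner = ET.tostring(t[col], encoding="unicode")
        yield f"<{tag}>{inner}</{tag}>"

# Selection is pushed below the title navigation, so titles are
# only looked up for books that pass the price test.
plan = tagger(
    navigate(
        select(
            navigate(source(DOC, ".//book"), "b", "price", "p"),
            lambda t: float(t["p"].text) < 30,
        ),
        "b", "title", "t",
    ),
    "Inexpensive", "t",
)
result = list(plan)
print(result)
```

Because each operator consumes and produces tuples, reordering them (for example, moving `select` above or below a `navigate`) is a local rewrite. The catch is that this sketch needs fully parsed elements as input, which is exactly what a token stream does not provide.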

Observations

Automata paradigm: good and long studied for pattern retrieval on tokens, but patches are needed for complex filtering and restructuring.

Algebraic paradigm: good and long studied for expressing and optimizing query plans on sets of tuples, but tokenized inputs are not accommodated yet.

Each paradigm has deficiencies.

The two paradigms complement each other.

Research Challenges

How to integrate the two models?

How to optimize a query within the integrated query model?

Raindrop Approach: Uniform Modeling in an Algebraic Framework

Uniform Algebraic Plan

[Figure: XML data stream -> Algebraic Plan -> Query answer]

Uniform Algebraic Plan

[Figure: XML data stream -> Token-based plan (automata plan) -> Tuple stream -> Tuple-based plan -> Query answer]

Modeling the Automata in Algebraic Plan: Black Box [xscan] vs. White Box

Black Box (Xscan):

$b := //book
$p := $b/price
$t := $b/title

White Box:

SJoin //book
Extract //book/price
Extract //book/title

FOR $b in stream("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>

A Unified Process at the Logical View

FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>

Token-based plan (automata plan)

Tuple-based plan

A Unified Process at the Logical View

FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>

Tuple-based plan

SJoin //book

Extract $p, //book/price

Extract $t, //book/title

A Unified Process at the Logical View

FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>

SJoin //book

Extract //book/price

Extract //book/title

Select //book/price < 30

Navigate //book, //book/title

The Algebra Core

Group | Op | Symbol / parameter | Semantics
Relational-like | Selection | pred | Filter tuples based on the predicate pred
Relational-like | Projection | v | Filter columns in the input tuples based on the variable list v
Relational-like | Join | pred | Join input tuples based on the predicate pred
Relational-like | Aggregate | f | Aggregate over input tuples with the aggregate function f, e.g., sum and average
XML-Specific | Tagger | T_pt | Format outputs based on the pattern pt, i.e., reconstruct XML tags
XML-Specific | Navigate | p1, p2 | Take input elements of path p1 and output ancestor elements of path p2
XML-Specific | Extract | p | Identify elements of path p from the input stream
XML-Specific | Structural Join | SJ_p | Join input tuples on their structural relationship, e.g., the common parent relationship p

Extract Operator

[Figure: automaton with transitions labeled *, book, and title, implementing Extract //book/title]

Input stream: <bib> <book> <title> Dream Catcher </title> … </book> …

Output: <title> Dream Catcher </title>
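A minimal sketch of what an Extract operator might do, assuming tokens arrive as `(kind, value)` pairs: it tracks the open elements, starts buffering when the open-element chain ends with the requested path, and emits the assembled element when the matching end tag arrives. The function name, the list-based path, and the second book are hypothetical; nested matches are deliberately not handled.

```python
TOKENS = [
    ("start", "bib"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),                      # invented second book
    ("start", "title"), ("text", "Pricey Tome"), ("end", "title"),
    ("start", "price"), ("text", "45"), ("end", "price"),
    ("end", "book"),
    ("end", "bib"),
]

def extract(tokens, path):
    """Yield serialized elements whose ancestor chain ends with `path` (//path)."""
    open_tags = []
    buffer = None            # active only while inside a matching element
    depth_at_match = None
    for kind, value in tokens:
        if kind == "start":
            open_tags.append(value)
            if buffer is None and open_tags[-len(path):] == path:
                buffer = []
                depth_at_match = len(open_tags)
            if buffer is not None:
                buffer.append(f"<{value}>")
        elif kind == "text":
            if buffer is not None:
                buffer.append(value)
        elif kind == "end":
            if buffer is not None:
                buffer.append(f"</{value}>")
                if len(open_tags) == depth_at_match:
                    yield "".join(buffer)   # element is complete
                    buffer = None
            open_tags.pop()

titles = list(extract(TOKENS, ["book", "title"]))
prices = list(extract(TOKENS, ["book", "price"]))
print(titles, prices)
```

The operator turns a run of tokens into one self-contained element, which is the point where token-level processing hands over to tuple-level processing.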

Structural Join Operator

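[Figure: an automaton with states 1 to 4 and transitions labeled *, book, title, and price drives Extract //book/title and Extract //book/price, whose outputs feed SJoin //book. The input stream is <biditems> <book> <title> Dream Catcher </title> … </book> …; the extracted <title>…</title> and <price>…</price> elements are combined per book.]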

FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN <Inexpensive>{ $t }</Inexpensive>
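One way to picture the structural join is a join that fires when the common parent closes. The sketch below is an assumption about how such an operator could behave, not the Raindrop implementation: it buffers title and price values found under the current book and, on `</book>`, emits their combinations as tuples. Token shape, function name, and the second book are illustrative.

```python
TOKENS = [
    ("start", "biditems"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),                      # invented second book
    ("start", "price"), ("text", "45"), ("end", "price"),
    ("start", "title"), ("text", "Pricey Tome"), ("end", "title"),
    ("end", "book"),
    ("end", "biditems"),
]

def sjoin_on_book(tokens):
    """Emit {t, p} tuples per book, joined when the book element ends."""
    titles, prices = [], []  # buffers filled by the two extraction branches
    open_tags = []
    for kind, value in tokens:
        if kind == "start":
            open_tags.append(value)
        elif kind == "text":
            if open_tags[-2:] == ["book", "title"]:
                titles.append(value)
            elif open_tags[-2:] == ["book", "price"]:
                prices.append(value)
        elif kind == "end":
            open_tags.pop()
            if value == "book":
                # Everything buffered shares this book as parent, so the
                # structural predicate holds for every combination.
                for t in titles:
                    for p in prices:
                        yield {"t": t, "p": p}
                titles.clear()
                prices.clear()

joined = list(sjoin_on_book(TOKENS))
print(joined)
```

The child order inside a book does not matter (the second book lists price before title), and no value comparison is needed: the end tag of the parent tells the operator that both buffers are complete. Books nested inside books are not handled by this sketch.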

Optimization via Query Rewriting

In or Out?

[Figure: XML data stream -> Token-based plan (automata plan) -> Tuple stream -> Tuple-based plan -> Query answer. Pattern retrieval can be placed in either plan.]

Plan Alternatives

The pull-out plan: Extract //book -> Navigate /price -> Select price < 30 -> Navigate book/title -> Tagger

The push-in plan: Extract //book/price and Extract //book/title -> SJoin //book -> Select price < 30 -> Tagger
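The two alternatives can be compared with a small Python sketch. It uses the standard library's `iterparse` as a stand-in for the automaton, which is a simplification: in the pull-out version only whole book elements are picked up during the scan and price and title are navigated afterwards, while in the push-in version price and title are picked up during the scan itself and combined when the book ends. Function names and the second book are hypothetical, and the push-in version matches title and price by tag name only rather than by the full path.

```python
import io
import xml.etree.ElementTree as ET

XML = (
    "<biditems>"
    "<book><title>Dream Catcher</title><price>20</price></book>"
    "<book><title>Pricey Tome</title><price>45</price></book>"  # invented
    "</biditems>"
)

def tag_title(title_text):
    """Tagger: wrap a title in the <Inexpensive> result element."""
    return f"<Inexpensive><title>{title_text}</title></Inexpensive>"

def pull_out_plan(xml_text):
    """The scan only extracts //book; the rest runs on whole elements."""
    out = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":                        # Extract //book
            price = elem.find("price")                # Navigate /price
            if float(price.text) < 30:                # Select price < 30
                title = elem.find("title")            # Navigate book/title
                out.append(tag_title(title.text))     # Tagger
    return out

def push_in_plan(xml_text):
    """Title and price are recognized during the scan, joined at </book>."""
    out, title, price = [], None, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":                       # Extract //book/title
            title = elem.text
        elif elem.tag == "price":                     # Extract //book/price
            price = elem.text
        elif elem.tag == "book":                      # SJoin //book
            if float(price) < 30:                     # Select price < 30
                out.append(tag_title(title))          # Tagger
            title = price = None
    return out

pulled = pull_out_plan(XML)
pushed = push_in_plan(XML)
print(pulled == pushed, pulled)
```

Both functions return the same answer; what differs is where the work happens. Which placement is cheaper presumably depends on factors such as how many books survive the selection, which is the kind of trade-off a rewriting-based optimizer can explore.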

Pattern Retrieval Alternatives

[Figure: two alternatives for retrieving /title and /price from the sample element <book year="2001"> <title>Dream Catcher</title> <author> <last> King </last> <first> S. </first> </author> <publisher> Bt Bound </publisher> <price> 20 </price> </book>]

In Automata (/title, /price): automaton with states 1 to 4 and transitions labeled *, book, title, and price; it outputs <title>…</title> and <price>…</price> elements.

Out of Automata (/title, /price): automaton with states 1 and 2 and transitions labeled * and book; it outputs <book>…</book> elements, and <title>…</title> and <price>…</price> are obtained from them outside the automaton.

Other labels in the figure: t2, t10, t2t10, SJ.

Experiment:

[Figure: two result charts, one for selectivity = 5% and one for selectivity = 90%]

Related Work

Camp 1: Complete Automata Model [XSQ, XSM, XPush]

FOR $x in $R/a RETURN
  FOR $y in $x/b RETURN
    <res>{ $y, $x }</res>

[Figure: the complete automaton for this query, with states labeled by triples from 0,0,0 to 2,2,2 and transitions annotated with low-level buffer operations, e.g., *r=<a>|w(x,sx),w(x,<a>),r++,x''++ and *r=</b>|w(x,</b>),w(o,</b>),r++,x''++]

Camp 1: Complete Automata Model [XSQ, XSM, XPush]

All details are presented on the same level (and a low level!)

Hard to understand

Not suitable for optimizing at different levels

Little has been studied on using automata as a query processing paradigm

Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]

Fixed interface for automata computation (all pattern retrieval pushed down)

No opportunity for pushing computation into or pulling it out of the automata

Bloated, black box operator

Algebraic rewriting impossible for internal optimization

[Figure: a single AutomataPlan operator computes $b := //book, $p := //book/price, and $t := //book/title and outputs the columns $b, $p, $t]

Contributions

Combining automata and algebra leads to a powerful query processing model

Modeling: a uniform, simple logical view improves understandability

Optimization: uniform rewriting opens more optimization opportunities (e.g., push-in/pull-out)

Optimization necessity is verified by experiments


Experiment 2

[Figure: two result charts, one for number of patterns = 2 and one for number of patterns = 20]