a uniform and layered algebraic framework for xqueries on xml streams hong su jinhui jian elke a....

30
A Uniform and Layered Al gebraic Framework for XQ ueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov 5, 2003

Post on 21-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

A Uniform and Layered Algebraic Framework for XQueries on XML Streams

Hong Su

Jinhui Jian

Elke A. Rundensteiner

Worcester Polytechnic Institute

CIKM, Nov 5, 2003

Page 2: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

2

Need for Stream Processing

New computing environment Data sources can be anywhere/anytime

On-line arriving data Data requests can be anywhere/anytime

Real-time response requirement New applications

Relational Sensor networks

XML Analysis of XML web logs Selective dissemination of XML information (e.g., news)

Page 3: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

3

What’s Special for XML Stream Processing<Biditems>

<book year=“2001">

<title>Dream Catcher</title>

<author><last>King</last><first>S.</first></author>

<publisher>Bt Bound </publisher>

<price> 30 </initial>

</book>

<biditems> <book> <title> Dream Catcher </title> …

Token-by-Token access manner

timeline

Pattern retrieval + Filtering + Restructuring

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 20Return <Inexpensive> $t </Inexpensive>

Token: not a direct counterpart of a tuple

30Bt BoundS.KingDream2001

pricepublisherfirstlasttitleyear

Pattern Retrieval on Token Streams

Page 4: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

4

Two Computation Paradigms

Automata-based [yfilter02, xscan01, xsm02, xsq03, xpush03…]

Algebraic [niagara00, …]

This Raindrop framework intends to integrate both paradigms into one

Page 5: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

5

Automata-Based Paradigm

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 20Return <Inexpensive> $t </Inexpensive>

1book*

2

4title

price

Auxiliary structures for:

1. Buffering data

2. Filtering

3. Restructuring

//book

//book/title

//book/price3

Page 6: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

6

Algebraic Computation

book bookbook

title author

last first

publisher price

Text

Text Text

Text Text

<book> …</book>

<title>… </title>

<book>… </book>

Navigate$b, /title -> $t

Navigate $b, /price->$p

Navigate $b, /title->

$t

Tagger

Select $p < 30

Logic Plan

Navigate //$b, /title->$t

Rewrite by “pushing down selection”

Navigate $b,/price->$p

Select $p < 30

Tagger

Rewritten Logic Plan

Navigate-Index $b, /price -> $p

Select $p < 30

Tagger

Navigate-Scan $b, /title -> $t

Physical Plan

Choose low-level implementation alternatives

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 20Return <Inexpensive> $t </Inexpensive>

$b

$b $t

Page 7: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

7

Observations

Either paradigm has deficiencies

Both paradigms complement each other

Automata Paradigm Algebra Paradigm

Good for pattern retrieval on tokens Does not support token inputs

Need patches for filtering and restructuring

Good for filtering and restructuring

Present all details on same low level Support multiple descriptive levels (declarative->procedural)

Little studied as query processing paradigm

Well studied as query process paradigm

Page 8: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

8

How to Integrate Two Paradigms

Page 9: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

9

How to Integrate Two Models? Design choices

Extend algebraic paradigm to support automata? Extend automata paradigm to support algebra? Come up with completely new paradigm?

Extend algebraic paradigm to support automata Practical

Reuse & extend existing algebraic query processing engines Natural

Present details of automata computation at low level Present semantics of automata computation (target patterns)

at high level

Page 10: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

10

Raindrop: Four-Level Framework

Semantics-focused PlanSemantics-focused Plan

Stream Logic PlanStream Logic Plan

Stream Physical PlanStream Physical Plan

Stream Execution PlanStream Execution Plan

Abstraction Level

High (Declarative)

Low (Procedural)

Page 11: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

11

Level I: Semantics-focused Plan [Rainbow-ZPR02] Express query semantics regardless of store

d or stream input sources Reuse existing techniques for stored XML pr

ocessing Query parser Initial plan constructor Rewriting optimization

Decorrelation Selection push down …

Page 12: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

12

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 20Return <Inexpensive> $t </Inexpensive>

<Biditems>

<book year=“2001">

<title>Dream Catcher</title>

<author><last>King</last><first>S.</first></author>

<publisher>Bt Bound </publisher>

<price> 30 </initial>

</book>

$S1<Biditems> … </Biditems>

$S1<Biditems>… </Biditems>

$b<book> … </book>

<Biditems> … </Biditems>

<book> … </book>

$S1<Biditems>… </Biditems>

$b<book>… </book>

$p

30

<Biditems> … </Biditems>

<book>… </book>

$S1<Biditems>…

</Biditems>

$b<book>… </book>

$p 30

$t Dream

Catcher

<Biditems>… </Biditems>

<book>. .. </book>

… …

NavUnnest $S1, //book ->$b

NavNest $b, /price/text() ->$p

NavNest $b, /title ->$t

Select$p<30

Tagger “Inexpensive”, $t->$r

Example Semantics-focused Plan

Page 13: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

13

Level II: Stream Logical Plan

Extend semantics-focused plan to accommodate tokenized stream inputs New input data format:

contextualized tokens New operators:

StreamSource, Nav, ExtractUnnest, ExtractNest, StructuralJoin

New rewrite rules: Push-into-Automata

Page 14: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

14

One Uniform Algebraic View

Token-based plan (automata plan)

Tuple-based plan

Tuple stream

XML data stream

Query answer

Algebraic Stream Logical Plan

Page 15: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

15

Modeling the Automata in Algebraic Plan:Black Box[XScan01] vs. White Box

$b := //book$p := $b/price$t := $b/title

Black Box

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 20Return <Inexpensive> $t </Inexpensive>

XScan

StructuralJoin$b

ExtractNest $b, $p

ExtractNest $b, $t

White Box

Navigate $b, /price->$p

Navigate $b, /title->$t

Navigate $S1, //book ->$b

Page 16: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

16

Example Uniform Algebraic Plan

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 30Return <Inexpensive> $t </Inexpensive>

Tuple-based plan

Token-based plan (automata plan)

Page 17: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

17

Example Uniform Algebraic Plan

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 30Return <Inexpensive> $t </Inexpensive>

StructuralJoin$b

ExtractNest $b, $p

ExtractNest $b, $t

Navigate $b, /price->$p

Navigate $b, /title->$t

Navigate $S1, //book ->$b

Tuple-based plan

Page 18: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

18

Example Uniform Algebraic Plan

FOR $b in stream(biditems.xml) //bookLET $p := $b/price $t := $b/titleWHERE $p < 30Return <Inexpensive> $t </Inexpensive>

StructuralJoin$b

ExtractNest $b, $p

ExtractNest $b, $t

Navigate $b, /price->$p

Navigate $b, /title->$t

Navigate $S1, //book ->$b

Select$p<30

Tagger “Inexpensive”, $t->$r

Page 19: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

19

From Semantics-focused Plan to Stream Logical Plan

StructuralJoin$b

ExtractNest $b, $p

ExtractNest $b, $t

Nav $b, /price/text()->$p

Nav $b, /title->$t

Nav $S1, //book ->$b

Select$p<30

Tagger “Inexpensive”, $t->$r

NavUnnest $S1, //book ->$b

NavNest $b, /price/text() ->$p

NavNest $b, /title ->$t

Select$p<30

Tagger “Inexpensive”, $t->$r

Apply “push into automata”

Apply “push into automata”

Page 20: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

20

Level III: Stream Physical Plan

For each stream logical operator, define how to generate outputs when given some inputs Multiple physical implementations may be provided fo

r a single logical operator Automata details of some physical implementation are

exposed at this level Nav, ExtractNest, ExtractUnnest, Structural Join

Page 21: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

21

One Implementation of Extract/Structural Join

1book

title*

4price

3

2

ExtractNest$b, $t

ExtractNest/$b, $p

SJoin//book

<title>…</title> <price>…</price>

<biditems> <book> <title> Dream Catcher </title> … </book>…

<price>…</price><title>…</title>

Nav $b, /price->$p

Nav $b, /title->$t

Nav ., //book ->$b

Page 22: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

22

Level IV: Stream Execution Plan

Describe coordination between operators regarding when to fetch the inputs When input operator generates one output tuple When input operator generates a batch When a time period has elapsed …

Potentially unstable data arrival rate in stream makes fixed scheduling strategy unsuitable Delayed data under scheduling may stall engine Bursty data not under scheduling may cause overflow

Page 23: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

23

Raindrop: Four-Level Framework (Recap)

Semantics-focused PlanSemantics-focused Plan

Stream Logic PlanStream Logic Plan

Stream Physical PlanStream Physical Plan

Stream Execution PlanStream Execution Plan

Express the semantics of query regardless of

input sources

Accommodate tokenized

input streams

Describe how operators manipulate

given data

Decides the Coordination

among operators

Page 24: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

24

Optimization Opportunities

Page 25: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

25

Optimization Opportunities

Semantics-focused PlanSemantics-focused Plan

Stream Logic PlanStream Logic Plan

Stream Physical PlanStream Physical Plan

Stream Execution PlanStream Execution Plan

General rewriting (e.g., selection push down)

General rewriting (e.g., selection push down)

Break-linear-navigation rewriting

Break-linear-navigation rewriting

Physical implementations choosing

Physical implementations choosing

Execution strategy choosing

Execution strategy choosing

Page 26: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

26

From Semantics-focused to Stream Logical Plan: In or Out?

Token-based plan (automata plan)

Tuple-based Plan

Tuple stream

XML data stream

Query answer

Pattern retrieval in Semantics-focused plan

Apply “push into automata”

Page 27: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

27

Plan Alternatives

Nav $b, /price->$p

ExtractNest $b, $p

ExtractNest $b, $t

SJoin //book

Select price < 30

Tagger

Nav $b, /title->$t

Nav $S1, //book->$b

ExtractNest $S1, $b

Navigate /price

Select price<30

Navigate book/title

Tagger

Nav $S1, //book->$b

NavUnnest $S1, //book ->$b

NavNest $b, /price ->$p

NavNest $b, /title ->$t

Select$p<30

Tagger “Inexpensive”, $t->$r

Out In

Page 28: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

28

Experimentation Results

Execution Time on 85M XML Stream Under Various Selectivity

25000

30000

35000

40000

45000

50000

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Selectivity of Selection

Exe

cutio

n Ti

me

(ms)

1 Nav

2 Navs

3 Navs

4 Navs

5 Navs

Page 29: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

29

Contributions

Combined automata and algebra based paradigms into one uniform algebraic paradigm

Provided four layers in algebraic paradigm Query semantics expressed at high layer Automata computation on streams hidden at low layer

Supported optimization at an iterative manner (from high abstraction level to low abstraction level)

Illustrated enriched optimization opportunities by experiments

Page 30: A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov

30

Email: [email protected]