Holistic Twig Joins: Optimal XML Pattern Matching Nicholas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 02 Presented by: Li Wei, Dragomir Yankov


Page 1:

Holistic Twig Joins: Optimal XML Pattern Matching

Nicholas Bruno, Nick Koudas, Divesh Srivastava

ACM SIGMOD 02

Presented by: Li Wei, Dragomir Yankov

Page 2:

Outline
• Problem Statement
• PathStack Algorithm
• TwigStack Algorithm
• Experimental Results

Page 3:

Problem Statement
• Given a query twig pattern Q and an XML database D, compute ALL the answers to Q in D.
• Example:

Query
author
  fn
    jane
  ln
    doe

XML document (nodes are labeled (DocId, LeftPos:RightPos, LevelNum); text nodes carry a single position)
book (1, 1:150, 1)
  title (1, 2:4, 2)
    XML (1, 3, 3)
  allauthors (1, 5:60, 2)
    author (1, 6:20, 3)
      fn (1, 7:9, 4)
        jane (1, 8, 5)
      ln
        poe (1, 11, 5)
    author
      fn
        john
      ln
        doe (1, 26, 5)
    author
      fn
        jane (1, 43, 5)
      ln
        doe (1, 46, 5)
  year (1, 61:63, 2)
    2000 (1, 62, 3)
  chapter (1, 64:93, 2)
    title (1, 65:67, 3)
      XML (1, 66, 4)
    section (1, 68:78, 3)
      head (1, 69:71, 4)
        Origins (1, 70, 5)

Page 4:

Binary Structural Joins
• The approach
– Decompose the twig pattern into binary structural relationships
– Use structural join algorithms to match the binary relationships against the XML database
– Stitch together the basic matches
• The problem
– The intermediate result sizes can get large, even when the input and output sizes are more manageable.
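The decompose, join, and stitch steps can be sketched in Python on the allauthors subtree of the example document. This is a minimal illustration, not the structural join algorithm the slides refer to: it uses a naive nested loop, it treats every query edge as parent-child, and the positions marked "assumed" are not shown on the slides and were chosen only to be consistent with the labeled nodes.

```python
from collections import namedtuple

# Region-encoded node: tag, left position, right position, level.
Node = namedtuple("Node", "tag left right level")

# Positions marked "assumed" are hypothetical; the rest come from the slide.
NODES = [
    Node("author", 6, 20, 3),
    Node("fn", 7, 9, 4), Node("jane", 8, 8, 5),
    Node("ln", 10, 12, 4),        # assumed
    Node("poe", 11, 11, 5),
    Node("author", 21, 40, 3),    # assumed
    Node("fn", 22, 24, 4),        # assumed
    Node("john", 23, 23, 5),      # assumed
    Node("ln", 25, 27, 4),        # assumed
    Node("doe", 26, 26, 5),
    Node("author", 41, 60, 3),    # assumed
    Node("fn", 42, 44, 4),        # assumed
    Node("jane", 43, 43, 5),
    Node("ln", 45, 47, 4),        # assumed
    Node("doe", 46, 46, 5),
]

def is_parent(p, c):
    """p's interval contains c and c sits exactly one level deeper."""
    return p.left < c.left and c.right < p.right and c.level == p.level + 1

def structural_join(parent_tag, child_tag, nodes=NODES):
    """Naive nested-loop binary join; returns all (parent, child) pairs."""
    parents = [n for n in nodes if n.tag == parent_tag]
    children = [n for n in nodes if n.tag == child_tag]
    return [(p, c) for p in parents for c in children if is_parent(p, c)]

# Decomposition of the twig into binary relationships.
EDGES = [("author", "fn"), ("author", "ln"), ("fn", "jane"), ("ln", "doe")]
counts = {edge: len(structural_join(*edge)) for edge in EDGES}

def stitch():
    """Stitch the four binary results into full twig matches."""
    a_fn, a_ln, fn_j, ln_d = (structural_join(*e) for e in EDGES)
    return [(a, fn, ln, j, d)
            for a, fn in a_fn
            for a2, ln in a_ln if a2 == a
            for fn2, j in fn_j if fn2 == fn
            for ln2, d in ln_d if ln2 == ln]

print(counts)
print(len(stitch()))
```

Under these assumptions the four binary joins produce ten pairs in total, while only one full match (rooted at the third author) survives the stitching step.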

Page 5:

Example

[Figure: query twig and XML document tree, as on Page 3]

Page 6:

Example

[Figure: query twig and XML document tree, as on Page 3]

Decomposition
author – fn
author – ln
fn – jane
ln – doe

Page 7:

Example

[Figure: query twig and XML document tree, as on Page 3]

Decomposition      Number of Intermediate Results
author – fn        3
author – ln
fn – jane
ln – doe

Page 8:

Example

[Figure: query twig and XML document tree, as on Page 3]

Decomposition      Number of Intermediate Results
author – fn        3
author – ln        3
fn – jane
ln – doe

Page 9:

Example

[Figure: query twig and XML document tree, as on Page 3]

Decomposition      Number of Intermediate Results
author – fn        3
author – ln        3
fn – jane          2
ln – doe

Page 10:

Example

[Figure: query twig and XML document tree, as on Page 3]

Decomposition      Number of Intermediate Results
author – fn        3
author – ln        3
fn – jane          2
ln – doe           2

Page 11:

Example

[Figure: query twig and XML document tree, as on Page 3]

Decomposition      Number of Intermediate Results
author – fn        3
author – ln        3
fn – jane          2
ln – doe           2

Output             1

Page 12:

Holistic Twig Joins
• The approach
– Uses linked stacks to compactly represent partial results to query paths
– Merges results to query paths to obtain matches for the twig pattern
• The advantage
– It ensures that no intermediate solution is larger than the final answer to the query.

Page 13:

Example

[Figure: query twig and XML document tree as on Page 3, with the document nodes under allauthors now subscripted: author1 (1, 6:20, 3), author2, author3; fn1 (1, 7:9, 4), fn2, fn3; ln1, ln2, ln3; jane1 (1, 8, 5), jane2 (1, 43, 5); doe1 (1, 26, 5), doe2 (1, 46, 5)]

Page 14:

Example

[Figure: query twig and XML document tree with subscripted nodes, as on Page 13]

Decomposition          Intermediate Results       Number of Intermediate Results
author – fn – jane     author3 – fn3 – jane2      1
author – ln – doe      author3 – ln3 – doe2       1

Output                                            1

Page 15:

Notation

[Figure: query twig and XML document tree with subscripted nodes, as on Page 13]

Query
isLeaf(author) = false
isRoot(author) = true
parent(fn) = author
children(author) = {fn, ln}
subtreeNodes(author) = {fn, ln, jane, doe}

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2
eof(Ta) = false
advance(Ta) => Ta: a1, a2, a3 (the cursor moves to the next element)
next(Ta) = a1
nextL(Ta) = 6
nextR(Ta) = 20

Stacks
Sa: a3
Sfn: fn3
Sln, Sj, Sd: empty
empty(Sa) = false
pop(Sfn)
push(Sln, ln3, pointer to a3)
topL(Sa) = LeftPos of a3
topR(Sa) = RightPos of a3
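The stream and stack operations listed above could be modeled in Python roughly as follows. This is a sketch under assumptions: the class names `Stream` and `LinkedStack` are hypothetical, elements are plain `(name, left, right)` tuples, the "pointer" into the parent stack is represented as a list index, and the positions of a2, a3, and ln3 are assumed because the slides only label a1.

```python
class Stream:
    """Elements of one tag as (name, left, right), sorted by left, plus a cursor."""

    def __init__(self, elements):
        self.elements = list(elements)
        self.cursor = 0

    def eof(self):
        return self.cursor >= len(self.elements)

    def advance(self):
        self.cursor += 1

    def next(self):
        return self.elements[self.cursor]

    def next_l(self):
        return self.next()[1]

    def next_r(self):
        return self.next()[2]


class LinkedStack:
    """Stack whose entries remember the top of the parent stack at push time."""

    def __init__(self):
        self.entries = []  # (element, index into the parent stack, or None)

    def empty(self):
        return not self.entries

    def push(self, element, parent_index):
        self.entries.append((element, parent_index))

    def pop(self):
        return self.entries.pop()

    def top_l(self):
        return self.entries[-1][0][1]

    def top_r(self):
        return self.entries[-1][0][2]


# Positions of a2, a3 and ln3 are assumed; only a1 (6:20) is labeled on the slide.
t_a = Stream([("a1", 6, 20), ("a2", 21, 40), ("a3", 41, 60)])
first = (t_a.eof(), t_a.next()[0], t_a.next_l(), t_a.next_r())
t_a.advance()

s_a, s_ln = LinkedStack(), LinkedStack()
s_a.push(("a3", 41, 60), None)
s_ln.push(("ln3", 45, 47), 0)  # 0 is the index of a3 in s_a ("pointer to a3")
print(first, t_a.next()[0], s_a.top_l(), s_a.top_r())
```

Storing an index rather than an object reference keeps the "everything at or below the pointer is an ancestor" property easy to express as a range over the parent stack.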

Page 16:

Algorithm: PathStack

[Figure: data path A1, B1, A2, B2, C1 from root to leaf, and query path A, B, C]

Stacks
SA: A1, A2
SB: B1, B2
SC: C1

Solutions
A1B1C1
A1B2C1
A2B2C1

Intuition:
While the streams of the leaves are not empty (i.e. a solution could be found) do:
- select the node with minimal LeftPos value and push it onto its stack
- if it is a leaf, print the solution
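The intuition above can be turned into a compact Python sketch for path queries with ancestor-descendant edges. It follows the loop described on the slide (pick the stream with the smallest LeftPos, clean the stacks, push, and expand solutions when a leaf is pushed), but the function name `path_stack`, the tuple layout, and the positions given to A1 through C1 are assumptions made for illustration. Solutions are returned root-to-leaf rather than in the leaf-to-root order that showSolutions prints.

```python
def path_stack(streams):
    """streams: one list of (name, left, right) per query node, root first,
    each sorted by left. Returns solutions as root-to-leaf tuples of names."""
    n = len(streams)
    cursors = [0] * n
    stacks = [[] for _ in range(n)]  # entries: (element, index into parent stack)
    solutions = []

    def expand(level, idx, suffix):
        element, ptr = stacks[level][idx]
        path = (element[0],) + suffix
        if level == 0:
            solutions.append(path)
            return
        # Every parent-stack entry at or below the pointer is an ancestor.
        for i in range(ptr + 1):
            expand(level - 1, i, path)

    while cursors[-1] < len(streams[-1]):  # leaf stream not exhausted
        live = [q for q in range(n) if cursors[q] < len(streams[q])]
        qmin = min(live, key=lambda q: streams[q][cursors[q]][1])
        element = streams[qmin][cursors[qmin]]
        for stack in stacks:  # clean: pop entries that end before this element starts
            while stack and stack[-1][0][2] < element[1]:
                stack.pop()
        parent_top = len(stacks[qmin - 1]) - 1 if qmin > 0 else -1
        stacks[qmin].append((element, parent_top))
        cursors[qmin] += 1
        if qmin == n - 1:  # leaf: output its solutions, then drop it
            expand(qmin, len(stacks[qmin]) - 1, ())
            stacks[qmin].pop()
    return solutions


# Assumed positions: each node is nested inside the previous one.
t_a = [("A1", 1, 10), ("A2", 3, 8)]
t_b = [("B1", 2, 9), ("B2", 4, 7)]
t_c = [("C1", 5, 6)]
result = path_stack([t_a, t_b, t_c])
print(result)
```

With these inputs the nodes are pushed in the order A1, B1, A2, B2, C1, and the three solutions are produced from the stack contents without materializing any partial path that fails to reach the leaf.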

Page 17:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: empty
SB: empty
SC: empty

Comments
qmin = A
06) moveStreamToStack(TA, SA, null)

Page 18:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: A1
SB: empty
SC: empty

Comments
qmin = B
06) moveStreamToStack(TB, SB, A1)

Page 19:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: A1
SB: B1
SC: empty

Comments
qmin = A
06) moveStreamToStack(TA, SA, null)

Page 20:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: A1, A2
SB: B1
SC: empty

Comments
qmin = B
06) moveStreamToStack(TB, SB, A2)

Page 21:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: A1, A2
SB: B1, B2
SC: empty

Comments
qmin = C
06) moveStreamToStack(TC, SC, B2)

Page 22:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: A1, A2
SB: B1, B2
SC: C1

Comments
07) isLeaf(C) = true
08) showSolutions(SC, 1)
09) pop(SC)

Page 23:

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Streams
TA: A1, A2
TB: B1, B2
TC: C1

Stacks
SA: A1, A2
SB: B1, B2
SC: empty

Comments
01) end(q) = true
Algorithm ends.

Page 24:

Procedure: showSolutions

[Figure: data path A1, B1, A2, B2, C1 and query path A, B, C, as on Page 16]

Stacks
SA: A1, A2
SB: B1, B2
SC: C1

Intuition:
- stacks have the compact encodings of the answers
- output is in leaf-to-root order

Output
C1B1A1
C1B2A1
C1B2A2

Page 25:

Analysis: PathStack
• Correctness
– (Theorem 3.1) Given a query path pattern Q and an XML database D, Algorithm PathStack correctly returns all answers for Q on D.
• Optimality
– (Theorem 3.2) Algorithm PathStack has worst-case I/O and CPU time complexities linear in the sum of sizes of the input lists and the output list.

Page 26:

PathMPMJ
• A naïve extension of MPMGJN could be to backtrack all possible solutions: PathMPMJNaive
• A much faster approach is to keep “k” pointers on the streams and prune part of the solutions: PathMPMJ

[Figure: query path A, B, C]

TA = A1, A2, A3…
TB = B1, B2 … BK…
TC = C1, C2, C3 …

Page 27:

PathStack Limitations
• Merging the path queries for twig joins is not optimal

Example:

[Figure: query twig (author with children fn and ln; fn has child jane, ln has child doe) and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 13]

Query (decomposed into two root-to-leaf paths):
author – fn – jane
author – ln – doe

Path solutions:
(a1, fn1, j1)
(a3, fn3, j2)
(a2, ln2, d1)
(a3, ln3, d2)

Query result:
(a3, fn3, ln3, j2, d2)
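The cost of merging independently computed path solutions can be illustrated with a short Python sketch. It joins the two lists of path solutions on their shared root using a hash lookup, which is a simplification and not the merge step described in the paper; the function name `merge_path_solutions` is hypothetical, and the element subscripts follow the document figure (jane1, jane2, doe1, doe2).

```python
def merge_path_solutions(left_paths, right_paths):
    """Join two lists of root-to-leaf path solutions on their shared root."""
    by_root = {}
    for path in right_paths:
        by_root.setdefault(path[0], []).append(path)
    # Keep the left path and append the non-root part of each matching right path.
    return [lp + rp[1:] for lp in left_paths for rp in by_root.get(lp[0], [])]


fn_paths = [("a1", "fn1", "j1"), ("a3", "fn3", "j2")]  # author - fn - jane
ln_paths = [("a2", "ln2", "d1"), ("a3", "ln3", "d2")]  # author - ln - doe

twig = merge_path_solutions(fn_paths, ln_paths)
# Path solutions that never contribute to a twig answer.
wasted = len(fn_paths) + len(ln_paths) - 2 * len(twig)
print(twig, wasted)
```

Two of the four path solutions are discarded by the merge, which is the overhead the slide attributes to evaluating each path in isolation. The merged tuple lists the fn branch before the ln branch, so its column order differs from the result shown on the slide.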

Page 28:

TwigStack

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Intuition:
While the streams of the leaves are not empty (i.e. a solution could be found) do:
- select a node that could be expanded to a solution
- if it is a leaf, print the solution
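What "could be expanded to a solution" means can be shown with a brute-force Python check. This is not TwigStack's getNext, which establishes the same property in a single coordinated pass over the streams; it is only a recursive test that an element has, for every query child, a descendant that itself satisfies the rest of the twig. The names `QUERY`, `STREAMS`, and `has_extension` are hypothetical, and all positions except those of a1, fn1, and the text nodes are assumed.

```python
# Query twig: each node maps to its children.
QUERY = {"author": ["fn", "ln"], "fn": ["jane"], "ln": ["doe"],
         "jane": [], "doe": []}

# Streams of (name, left, right); unlabeled positions on the slide are assumed.
STREAMS = {
    "author": [("a1", 6, 20), ("a2", 21, 40), ("a3", 41, 60)],
    "fn": [("fn1", 7, 9), ("fn2", 22, 24), ("fn3", 42, 44)],
    "ln": [("ln1", 10, 12), ("ln2", 25, 27), ("ln3", 45, 47)],
    "jane": [("j1", 8, 8), ("j2", 43, 43)],
    "doe": [("d1", 26, 26), ("d2", 46, 46)],
}


def has_extension(tag, element):
    """True if `element` has, for every query child of `tag`, a descendant
    in that child's stream which itself has an extension."""
    _, left, right = element
    return all(
        any(left < c[1] and c[2] < right and has_extension(child, c)
            for c in STREAMS[child])
        for child in QUERY[tag]
    )


expandable = [a[0] for a in STREAMS["author"] if has_extension("author", a)]
print(expandable)
```

Under these assumptions only a3 qualifies: a1 fails because ln1 has no doe below it, and a2 fails because fn2 has no jane below it.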

Page 29:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa, Sfn, Sln, Sj, Sd: empty

Comments: Phase 1
01: while (notEmpty(Tj) || notEmpty(Td)) do:

Page 30:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa, Sfn, Sln, Sj, Sd: empty

Comments: iteration 1
qact = getNext(a) returns fn
- getNext(fn) returns fn
  - getNext(j) returns j
  - nmin = nmax = 8 (j1)
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 26 (d1)
  - advance(ln)
- nmin = 7 (fn1), nmax = ln2
- advance(Ta)
advance(Tfn)

Page 31:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa, Sfn, Sln, Sj, Sd: empty

Comments: iteration 2
qact = getNext(a) returns j
- getNext(fn) returns j
  - getNext(j) returns j
  - nmin = nmax = 8 (j1)
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 26 (d1)
- nmin = 8 (j1), nmax = ln2
advance(Tj)

Page 32:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa, Sfn, Sln, Sj, Sd: empty

Comments: iteration 3
qact = getNext(a) returns ln
- getNext(fn) returns fn
  - getNext(j) returns j
  - nmin = nmax = 43 (j2)
  - advance(fn)
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 26 (d1)
- nmin = ln2, nmax = fn3
- advance(Ta)
advance(Tln)

Page 33:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa, Sfn, Sln, Sj, Sd: empty

Comments: iteration 4
qact = getNext(a) returns d
- getNext(fn) returns fn
  - getNext(j) returns j
  - nmin = nmax = 43 (j2)
- getNext(ln) returns d
  - getNext(d) returns d
  - nmin = nmax = 26 (d1)
- nmin = 26 (d1), nmax = fn3
advance(Td)

Page 34:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa: a3
Sfn, Sln, Sj, Sd: empty

Comments: iteration 5
qact = getNext(a) returns a
- getNext(fn) returns fn
  - getNext(j) returns j
  - nmin = nmax = 43 (j2)
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 46 (d2)
- nmin = fn3, nmax = ln3
moveStreamToStack(Ta)
advance(Ta)

Page 35:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa: a3
Sfn: fn3
Sln, Sj, Sd: empty

Comments: iteration 6
qact = getNext(a) returns fn
- getNext(fn) returns fn
  - getNext(j) returns j
  - nmin = nmax = 43 (j2)
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 46 (d2)
- nmin = fn3, nmax = ln3
moveStreamToStack(Tfn)
advance(Tfn)

Page 36:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa: a3
Sfn: fn3
Sj: j2
Sln, Sd: empty

Comments: iteration 7
qact = getNext(a) returns j
- getNext(fn) returns j
  - getNext(j) returns j
  - nmin = nmax = 43 (j2)
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 46 (d2)
- nmin = 43 (j2), nmax = ln3
moveStreamToStack(Tj)
advance(Tj)
showSolutionsWithBlocking(j)
pop(Sj)

“Merge-joinable” root-to-leaf path: (j2, fn3, a3)

Page 37:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa: a3
Sfn: fn3
Sln: ln3
Sj, Sd: empty

Comments: iteration 8
qact = getNext(a) returns ln (ln3)
- getNext(fn) returns nil
  - getNext(j) returns nil
  - nmin = nmax = nil
- getNext(ln) returns ln
  - getNext(d) returns d
  - nmin = nmax = 46 (d2)
- nmin = ln3, nmax = ln3
moveStreamToStack(Tln)
advance(Tln)

“Merge-joinable” root-to-leaf path: (j2, fn3, a3)

Page 38:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa: a3
Sfn: fn3
Sln: ln3
Sd: d2
Sj: empty

Comments: iteration 9
qact = getNext(a) returns d
- getNext(fn) returns nil
  - getNext(j) returns nil
  - nmin = nmax = nil
- getNext(ln) returns d
  - getNext(d) returns d
  - nmin = nmax = 46 (d2)
- nmin = d, nmax = d
moveStreamToStack(Td)
advance(Td)
showSolutionsWithBlocking(d)
pop(Sd)

“Merge-joinable” root-to-leaf paths:
(j2, fn3, a3)
(d2, ln3, a3)

Page 39:

TwigStack: Example

[Figure: query twig and the allauthors (1, 5:60, 2) subtree of the XML document with subscripted nodes, as on Page 27]

Streams
Ta: a1, a2, a3
Tfn: fn1, fn2, fn3
Tln: ln1, ln2, ln3
Tj: j1, j2
Td: d1, d2

Stacks
Sa: a3
Sfn: fn3
Sln: ln3
Sj, Sd: empty

Comments: Phase 2
12: MergeAllPathSolutions()

TwigStack solution:
(j2, fn3, d2, ln3, a3)

Page 40:

Analysis of TwigStack
• Let getNext(q) = qN
– qN has a minimal descendant extension
– for all qi ∈ subtreeNodes(qN), next(Tqi) = hqi
– Either q = qN or parent(qN) has no minimal right extension
• Any ancestor of qN whose extension uses hqN is returned by getNext before qN => correctness (TwigStack finds all solutions to q)
• TwigStack is time and space optimal for ancestor-descendant edges

Page 41:

Suboptimality for parent-child edges

Example

[Figure: data tree over the nodes A1, A2, B1, B2, C1, C2, and query twig A with children B and C connected by parent-child edges]

TS Phase 1 solutions:
(A1, B2, C2)
(A2, B1, C1)
(A1, B1, C1)
(A1, B1, C2)

Only some of the Phase 1 solutions are final solutions. The Phase 1 output would be optimal for the same twig (A with children B and C) with ancestor-descendant edges.

Page 42:

TwigStack and XB-Trees

[Figure: XB-tree over the elements listed below; internal nodes hold [L:R] intervals that cover the intervals of their child nodes]

Elements (LeftPos:RightPos)
a1 (2:95), a2 (3:50), a3 (6:48), a4 (10:45), a5 (20:30), a6 (55:58), a7 (60:94), a8 (62:75), a9 (70:72), a10 (80:88), a11 (80:88)

• XB-Trees: B+ trees with some additional features (see note 1)
- Internal nodes have the form [L:R], sorted on L
- Parent node interval includes child node intervals
- Each page P has pointer P.parent
• TwigStackXB: same as TwigStack with the following modifications
- Tq for a query node with an index is now the XB tree rather than a stream
- The advance operation is modified according to the pointer act = (actPage, actIndex)
- The drilldown operation is introduced

1. “An Evaluation of XML indexes for Structural Join” demonstrates that, while B+, XR and XB trees all build the same tree structure, XB trees outperform the other two for “highly recursive” XML.

Page 43:

Experimental Results

[Figures: PS vs TS for binary twig query; PS vs TS for parent-child query]

Page 44:

Questions?