Optimizing Cursor Movement in Holistic Twig Joins
Marcus Fontoura, Vanja Josifovski, Eugene Shekita (IBM Almaden Research Center)
Beverly Yang (Stanford)
CIKM’2005





Motivation

for $a in //article[year = "2005" or keyword = "XML"]
for $s in $a/section
return $s/title

In an index-based method, 7 tags and text elements need to be verified to process this query.

Running time is dominated by the I/O for manipulating these cursors.

Twig join algorithms are not optimized for I/O and do not exploit the query’s extraction points.

Figure: query tree. article has an AND node with two branches: an OR node over year = “2005” and keyword = “XML”, and section, whose child is title.


Our Contributions

1. TwigOptimal, a new holistic twig join algorithm that supports a large fraction of XQuery (including AND/OR branches)

2. Description of how extraction points improve query performance

3. Experimental evaluation that shows how TwigOptimal outperforms current algorithms


Agenda

Background
TwigOptimal algorithm
Experimental results
Conclusions


XML Indexing

Begin/End/Level (BEL) encoding:
Begin: preorder position of the tag/text
End: preorder position of its last descendant
Level: depth

Containment: X contains Y iff X.begin < Y.begin <= X.end (assuming well-formed XML)

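Figure: example tree with (begin, end, level) values R (0,7,0), A1 (1,5,1), B1 (2,2,2), B2 (3,5,2), C1 (4,4,3), D1 (5,5,3), B3 (6,7,1), C2 (7,7,2). R has children A1 and B3; A1 has children B1 and B2; B2 has children C1 and D1; B3 has child C2.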
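As a minimal sketch of the encoding, the Python below assigns (begin, end, level) in a single preorder walk and implements the containment test. The tree shape (R over A1 and B3, A1 over B1 and B2, B2 over C1 and D1, B3 over C2) is read off the BEL values on this slide; the names Node, bel_encode, and contains are our own, for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def bel_encode(root):
    """Return {name: (begin, end, level)} using one preorder walk."""
    codes = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        begin = counter          # preorder position of this node
        counter += 1
        for child in node.children:
            visit(child, level + 1)
        # counter - 1 is the last position handed out inside this subtree,
        # i.e. the preorder position of the last descendant (or the node itself).
        codes[node.name] = (begin, counter - 1, level)

    visit(root, 0)
    return codes


def contains(x, y):
    """X contains Y iff X.begin < Y.begin <= X.end."""
    return x[0] < y[0] <= x[1]


tree = Node("R", [
    Node("A1", [Node("B1"), Node("B2", [Node("C1"), Node("D1")])]),
    Node("B3", [Node("C2")]),
])
codes = bel_encode(tree)
print(codes["B2"])                         # (3, 5, 2)
print(contains(codes["A1"], codes["D1"]))  # True
print(contains(codes["B3"], codes["C1"]))  # False
```

Assigning end only after the children are visited is what makes the test work: every descendant receives a preorder position in the half-open range (begin, end].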


Basic Access Path

Inverted lists
Posting: <Token, Location>
Token = <term/tag>
Location = <DocumentID, Position>

Supported cursor method: CB.forwardTo(Position p)

Figure: the example tree (R, A1, B1, B2, B3, C1, C2, D1) with the inverted lists for tag B (B1, B2, B3) and tag C (C1, C2).
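The access path can be sketched as a cursor over one tag's inverted list. Assumptions: postings are held in memory as (doc_id, begin, end, level) tuples sorted by Location, and forward_to (a Python rendering of CB.forwardTo) moves to the first posting at or after the given location and never moves backward. PostingCursor and its counters are hypothetical names.

```python
import bisect


class PostingCursor:
    """Hypothetical cursor over one token's inverted list.

    Postings are (doc_id, begin, end, level) tuples sorted by (doc_id, begin),
    i.e. by Location = <DocumentID, Position>.
    """

    def __init__(self, postings):
        self.postings = sorted(postings)
        self.keys = [(d, b) for d, b, _, _ in self.postings]
        self.idx = 0
        self.reads = 0  # number of forward_to calls (physical cursor moves)

    def current(self):
        return self.postings[self.idx] if self.idx < len(self.postings) else None

    def forward_to(self, doc_id, begin):
        """Move to the first posting at or after (doc_id, begin); never move back."""
        self.reads += 1
        self.idx = bisect.bisect_left(self.keys, (doc_id, begin), lo=self.idx)
        return self.current()


# Inverted lists for tags B and C in document 0 (BEL values of the example tree).
lists = {
    "B": PostingCursor([(0, 2, 2, 2), (0, 3, 5, 2), (0, 6, 7, 1)]),
    "C": PostingCursor([(0, 4, 4, 3), (0, 7, 7, 2)]),
}
cb = lists["B"]
print(cb.forward_to(0, 3))   # (0, 3, 5, 2), i.e. B2
print(cb.forward_to(0, 4))   # (0, 6, 7, 1), i.e. B3
```

Passing lo=self.idx to bisect_left is what enforces forward-only movement: the search never looks at postings the cursor has already passed.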


Joins in XML

Structural (containment) joins: one ancestor-descendant edge at a time, e.g. A||B

Twig joins: a whole tree pattern at once, e.g. A||B with B||C and B||D, or the path A||B||C
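To make the distinction concrete, here is a naive nested-loop structural join (a sketch, not an efficient algorithm from the literature) applied edge by edge to the twig A||B, B||C, B||D over the example tree's BEL values, followed by stitching the edge results into twig matches. The helper name structural_join and the in-memory (begin, end) lists are assumptions for illustration.

```python
def structural_join(ancestors, descendants):
    """Naive ancestor-descendant join over (begin, end) pairs sorted by begin.

    Returns every (a, d) with a.begin < d.begin <= a.end.
    """
    result = []
    for a in ancestors:
        for d in descendants:
            if d[0] > a[1]:
                break  # sorted by begin: no later d can lie inside a
            if a[0] < d[0]:
                result.append((a, d))
    return result


A = [(1, 5)]
B = [(2, 2), (3, 5), (6, 7)]
C = [(4, 4), (7, 7)]
D = [(5, 5)]

# Binary structural joins, one edge at a time.
ab = structural_join(A, B)   # [((1, 5), (2, 2)), ((1, 5), (3, 5))]
bc = structural_join(B, C)   # [((3, 5), (4, 4)), ((6, 7), (7, 7))]
bd = structural_join(B, D)   # [((3, 5), (5, 5))]

# Stitch the edge results together on the shared B element.
twig = [(a, b, c, d)
        for (a, b) in ab
        for (b2, c) in bc if b2 == b
        for (b3, d) in bd if b3 == b]
print(twig)   # [((1, 5), (3, 5), (4, 4), (5, 5))]
```

The B||C pair ((6, 7), (7, 7)) is produced by the binary join but never reaches the final answer; avoiding such intermediate results is the point of evaluating the whole twig at once.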


LocateExtension

“Extension” (w.r.t. query node q) – a solution for the subquery rooted at q

Input: q
Result: the cursors of all descendants of q point to an extension for q

Figure: query A||B with B||C and B||D, over data containing A, B1, B3, C1, C2, D1, D2, X1, X2.


LocateExtension

while (not end(q) && not hasExtension(q)) {
    (p, c) = PickBrokenEdge(q);
    ZigZagJoin(p, c);
}

Figure: the same query (A||B with B||C and B||D) and data as in the LocateExtension figure above.
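The loop above can be sketched in Python for ancestor-descendant edges. This is a simplified stand-in, not the authors' code: pick_broken_edge (hypothetical) returns the first misaligned (parent, child) edge, zigzag_step (hypothetical) advances one cursor by a single posting instead of seeking with forwardTo, and end(q) is taken to mean that some cursor in q's subtree is exhausted. The data are the A, B, C, D lists of the example tree.

```python
from dataclasses import dataclass, field


@dataclass
class QNode:
    """A query node with a cursor over its tag's (begin, end) postings."""
    tag: str
    postings: list
    children: list = field(default_factory=list)
    pos: int = 0

    def at_end(self):
        return self.pos >= len(self.postings)

    def cur(self):
        return self.postings[self.pos]


def subtree(q):
    yield q
    for c in q.children:
        yield from subtree(c)


def edges(q):
    for c in q.children:
        yield q, c
        yield from edges(c)


def contains(x, y):
    return x[0] < y[0] <= x[1]


def pick_broken_edge(q):
    """First (parent, child) edge whose cursors are not aligned, or None."""
    for p, c in edges(q):
        if not contains(p.cur(), c.cur()):
            return p, c
    return None


def zigzag_step(p, c):
    """Advance whichever cursor cannot take part in a match with the other."""
    if c.cur()[0] <= p.cur()[0]:
        c.pos += 1   # c starts before p, so no current-or-later p contains it
    else:
        p.pos += 1   # c starts after p ends, so p contains no current-or-later c


def locate_extension(q):
    while not any(n.at_end() for n in subtree(q)):
        edge = pick_broken_edge(q)
        if edge is None:
            return True          # every edge aligned: an extension for q
        zigzag_step(*edge)
    return False


c = QNode("C", [(4, 4), (7, 7)])
d = QNode("D", [(5, 5)])
b = QNode("B", [(2, 2), (3, 5), (6, 7)], [c, d])
a = QNode("A", [(1, 5)], [b])
print(locate_extension(a), [n.cur() for n in subtree(a)])
# True [(1, 5), (3, 5), (4, 4), (5, 5)]
```

Each advance discards only a posting that cannot belong to any match with the current or later postings of the other cursor, so the loop stops at the first aligned set of cursors if one exists.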


TwigOptimal Algorithm

Tests whether the cursor with the minimal location has an extension.
If not, tries to move cursors virtually until they form an extension.
Moves cursors physically only when no further virtual move is possible.

A virtual move just sets the cursor's begin value, so no I/O is involved:
    Cq.begin = new begin value for Cq;
    Cq.virtual = true;   // indicates that the cursor is virtual
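The virtual/physical distinction can be sketched as a cursor that records a virtual begin without reading its list and pays for a seek only when the move is materialized. Cursor, virtual_move, forward_to, and materialize are hypothetical names; treating the virtual begin as a lower bound for the next forwardTo is our assumption.

```python
import bisect


class Cursor:
    """Hypothetical cursor that separates virtual moves from physical ones."""

    def __init__(self, begins):
        self.begins = sorted(begins)   # posting positions, in list order
        self.idx = 0
        self.begin = self.begins[0] if self.begins else None
        self.virtual = False
        self.physical_moves = 0

    def virtual_move(self, new_begin):
        # Cq.begin = new begin value; Cq.virtual = true. Nothing is read.
        # Assumption: new_begin is a lower bound on the next useful posting.
        if self.begin is not None and new_begin > self.begin:
            self.begin = new_begin
            self.virtual = True

    def forward_to(self, p):
        # Physical move: seek to the first posting with begin >= p (one I/O).
        self.idx = bisect.bisect_left(self.begins, p, lo=self.idx)
        self.begin = self.begins[self.idx] if self.idx < len(self.begins) else None
        self.virtual = False
        self.physical_moves += 1

    def materialize(self):
        # Called only when no further virtual move is possible.
        if self.virtual:
            self.forward_to(self.begin)


cur = Cursor([2, 3, 6])
cur.virtual_move(4)
cur.virtual_move(5)
print(cur.begin, cur.virtual, cur.physical_moves)   # 5 True 0
cur.materialize()
print(cur.begin, cur.virtual, cur.physical_moves)   # 6 False 1
```

Two virtual moves collapse into one physical move here, which is where the I/O savings would come from.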


Checking Extension

We have an extension for cursor q if:
all cursors underneath q are properly aligned, and
all cursors underneath q have physical locations.

Figure: cursors for the query A||B (B||C, B||D) over the example data; the check returns false.


Figure: a second cursor configuration for the same query A||B (B||C, B||D); the check returns true.
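A sketch of the extension check, assuming each cursor exposes begin, end, and a virtual flag (QCursor and has_extension are illustrative names). Whether q's own cursor must also be physical is our assumption; the sketch requires it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QCursor:
    tag: str
    begin: int
    end: Optional[int] = None      # unknown while the cursor is virtual
    virtual: bool = False
    children: list = field(default_factory=list)


def aligned(parent, child):
    # Containment on BEL values: parent.begin < child.begin <= parent.end
    return parent.begin < child.begin <= parent.end


def has_extension(q):
    """True iff every cursor in q's subtree is physical and every edge is aligned."""
    if q.virtual:
        return False               # a virtual cursor has no real location yet
    return all(aligned(q, c) and has_extension(c) for c in q.children)


c = QCursor("C", 4, 4)
d = QCursor("D", 5, 5)
b = QCursor("B", 3, 5, children=[c, d])
a = QCursor("A", 1, 5, children=[b])
print(has_extension(a))   # True

d.virtual, d.end = True, None
print(has_extension(a))   # False: D is only virtually positioned
```

The virtual flag is tested before a cursor's end is ever read, so a virtual cursor with an unknown end never reaches the containment test as a parent.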


Moving Cursors

Two passes over the query tree:
Bottom-up: move each parent cursor forward so that it contains its children's cursors.
Top-down: move each child cursor forward so that it is contained by its parent's cursor.
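The two passes can be sketched as follows. This is our own simplification under stated assumptions, not the paper's procedure: the bottom-up pass moves a parent physically (it needs the parent's end value) whenever the parent ends before a child starts, the top-down pass virtually moves a child to one past its parent's begin, and virtual cursors are made physical only after both passes. QNode, bottom_up, top_down, and move_cursors are hypothetical names, and the data for //x[.//y and .//z] are made up.

```python
import bisect


class QNode:
    """Hypothetical query node with a cursor over sorted (begin, end) postings."""

    def __init__(self, tag, postings, children=()):
        self.tag = tag
        self.postings = sorted(postings)
        self.begins = [b for b, _ in self.postings]
        self.children = list(children)
        self.idx = 0
        self.begin = self.begins[0] if self.begins else None  # None = exhausted
        self.virtual = False
        self.physical_moves = 0

    @property
    def end(self):
        return self.postings[self.idx][1]   # valid only for a physical cursor

    def forward_to(self, p):
        """Physical move: first posting with begin >= p (counted as one I/O)."""
        self.idx = bisect.bisect_left(self.begins, p, lo=self.idx)
        self.begin = self.begins[self.idx] if self.idx < len(self.begins) else None
        self.virtual = False
        self.physical_moves += 1


def nodes(q):
    yield q
    for c in q.children:
        yield from nodes(c)


def edges(q):
    for c in q.children:
        yield q, c
        yield from edges(c)


def bottom_up(q):
    """Move each parent forward until it no longer ends before a child starts."""
    for c in q.children:
        bottom_up(c)
    while q.begin is not None and any(
            c.begin is not None and q.end < c.begin for c in q.children):
        q.forward_to(q.begin + 1)   # this element cannot contain that child


def top_down(q):
    """Virtually move each child so that it starts after its parent (no I/O)."""
    for c in q.children:
        if q.begin is not None and c.begin is not None and c.begin <= q.begin:
            c.begin = q.begin + 1   # lower bound on the child's next useful posting
            c.virtual = True
        top_down(c)


def move_cursors(q):
    """Repeat both passes until all edges are aligned or a list runs out."""
    while True:
        if any(n.begin is None for n in nodes(q)):
            return False
        if all(p.begin < c.begin <= p.end for p, c in edges(q)):
            return True
        bottom_up(q)
        top_down(q)
        for n in nodes(q):          # no more virtual moves: go physical
            if n.virtual:
                n.forward_to(n.begin)


# Hypothetical data for //x[.//y and .//z]
y = QNode("y", [(1, 1), (11, 11)])
z = QNode("z", [(12, 12)])
x = QNode("x", [(0, 3), (10, 15)], [y, z])
print(move_cursors(x), [(n.tag, n.begin, n.physical_moves) for n in nodes(x)])
# True [('x', 10, 1), ('y', 11, 1), ('z', 12, 0)]
```

In this run, x needs one physical move because z lies beyond its first element, and y's virtual move to position 11 is materialized with a single seek.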


Move Cursors Example

Figure: cursor moves, numbered 1 to 7, for Query = //x[.//y and .//z] over data nodes x1, x2, y1 to y5, z1, z2; the legend distinguishes virtual moves from physical moves.


Comparing with TSGeneric+

Figure: cursor positions and moves for Query = //w//x//y//z over data nodes w1, w2, x1 to x50, y1 to y100, z1, z2; the legend marks the current cursor position, virtual moves, and physical moves.


Comparing with TSGeneric+

Figure: cursor positions and moves for the same query //w//x//y//z and data; the legend marks the current cursor position and physical moves only.


Extraction Points Optimization

If neither q nor any of its descendants in the query is an extraction point, we can move these cursors virtually within q’s parent.

Figure: query A with || children B and C, over data nodes A1, A2, B1, B2, B3, and C elements including C1, C99, and C100.
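The extraction-point test and its effect can be sketched like this. QNode, existence_only_children, and matches_visited are hypothetical; making B the only extraction point is an assumption, and the data are made up (loosely shaped like the figure, with many C elements under one A). The counts only illustrate why one C match per A element suffices when C's matches are not returned.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class QNode:
    tag: str
    extraction: bool = False    # True if the query must return this node's matches
    children: list = field(default_factory=list)


def has_extraction_point(q):
    return q.extraction or any(has_extraction_point(c) for c in q.children)


def existence_only_children(q):
    """Children whose subtrees hold no extraction point: one match per parent suffices."""
    return [c.tag for c in q.children if not has_extraction_point(c)]


def matches_visited(parents, child_begins, existence_only):
    """Count child postings visited inside each parent element.

    parents: (begin, end) pairs; child_begins: sorted child positions.
    With existence_only, the first match is enough and the cursor can then
    skip past the parent's end in a single move.
    """
    visited = 0
    for p_begin, p_end in parents:
        lo = bisect.bisect_right(child_begins, p_begin)   # first child after p.begin
        hi = bisect.bisect_right(child_begins, p_end)     # first child after p.end
        inside = hi - lo
        visited += min(inside, 1) if existence_only else inside
    return visited


query = QNode("A", children=[QNode("B", extraction=True), QNode("C")])
print(existence_only_children(query))          # ['C']

# Hypothetical data: A1 holds 99 C elements, A2 holds one.
a_elems = [(0, 200), (201, 210)]
c_begins = list(range(1, 100)) + [205]
print(matches_visited(a_elems, c_begins, existence_only=False))  # 100
print(matches_visited(a_elems, c_begins, existence_only=True))   # 2
```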


Prototype

Implemented over inverted lists stored in a Berkeley DB B-tree
Posting: <Token, Location>
Token = <term/tag>
Location = <DocumentID, Position>
Position is the BEL (Begin/End/Level) encoding
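One plausible key layout for storing the inverted lists in a B-tree is sketched below. It is an assumption, not the prototype's actual schema, and it does not use the Berkeley DB API: a sorted Python list searched with bisect stands in for the B-tree's ordered keys, so forward_to becomes a single key seek. posting_key, posting_value, and forward_to are hypothetical helpers.

```python
import bisect
import struct


def posting_key(token, doc_id, begin):
    """Hypothetical B-tree key: token, a 0x00 separator, big-endian DocumentID and begin."""
    return token.encode() + b"\x00" + struct.pack(">II", doc_id, begin)


def posting_value(end, level):
    return struct.pack(">IH", end, level)


# A sorted key list stands in for the B-tree's ordered key space.
entries = sorted([
    (posting_key("B", 0, 2), posting_value(2, 2)),
    (posting_key("B", 0, 3), posting_value(5, 2)),
    (posting_key("B", 0, 6), posting_value(7, 1)),
    (posting_key("C", 0, 4), posting_value(4, 3)),
    (posting_key("C", 0, 7), posting_value(7, 2)),
])
keys = [k for k, _ in entries]


def forward_to(token, doc_id, begin):
    """Seek to the first posting of `token` at or after (doc_id, begin)."""
    i = bisect.bisect_left(keys, posting_key(token, doc_id, begin))
    if i == len(keys) or not keys[i].startswith(token.encode() + b"\x00"):
        return None                      # past the end of this token's list
    doc, b = struct.unpack(">II", keys[i][-8:])
    end, level = struct.unpack(">IH", entries[i][1])
    return doc, b, end, level


print(forward_to("B", 0, 4))   # (0, 6, 7, 1)
print(forward_to("C", 0, 8))   # None
```

Fixed-width big-endian integers make byte-wise key comparison match numeric order, and the 0x00 separator keeps one tag's keys from interleaving with those of a longer tag that shares its prefix.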


Data Sets

XMark: 10 documents of about 100 MB each

Synthetic: 4 tags (W, X, Y, Z); uncorrelated, no self-nesting, all with the same frequency


Experimental Results

Chart: physical cursor moves, TSGeneric+ vs. TwigOptimal, for //w[.//x], //w[.//x//z], and //w[.//x//y//z].


Experimental Results

Chart: running time (ms), TSGeneric+ vs. TwigOptimal, for //w[.//x], //w[.//x//z], and //w[.//x//y//z].


Experimental Results

Chart: physical cursor moves, TSGeneric+ vs. TwigOptimal, for a small XMark query (4 nodes) and a large XMark query (10 nodes).


Experimental Results

Chart: physical cursor moves for //w//x//y//z, //w//x//y[.//z], //w//x[.//y//z], and //w[.//x//y//z].


Experimental Results

Chart: running time (ms) for //w//x//y//z, //w//x//y[.//z], //w//x[.//y//z], and //w[.//x//y//z].


Conclusion

The TwigOptimal algorithm outperforms existing twig join algorithms by more than 40%, especially for larger queries.
It is optimized for I/O, which is the performance bottleneck.
The extraction points optimization improves performance.