Querying Streaming XML Data

Upload: sagira

Post on 08-Jan-2016

Page 1: Querying Streaming XML Data

Querying Streaming XML Data

Page 2: Querying Streaming XML Data

Layout of the presentation

- Introduction
- Common problems faced
- Solution proposed
- Basic building blocks of the solution
- How to build up a solution to a given query
- Features of the system

Page 3: Querying Streaming XML Data

Streaming XML

- XML is a standard for information exchange.
- Some XML documents are only available in streaming format.
- Streaming is like reading data from a tape drive.
- Used in stock market, news, and network statistics feeds.
- Predecessor systems were used to filter documents.

Page 4: Querying Streaming XML Data

Structure of an XPath Query

A query consists of a location path and an output expression (name).

The location path consists of a closure axis (//), a node test (book) and a predicate (year>2000).

e.g. //book[year>2000]/name

Page 5: Querying Streaming XML Data

Features of our Approach

- Efficient
- Easy-to-understand design
- Designing the BPDT (basic pushdown transducer) is tricky

Page 12: Querying Streaming XML Data

Common Problems faced

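 1. <root>
 2.   <pub>
 3.     <book id="1">
 4.       <price> 12.00 </price>
 5.       <name> First </name>
 6.       <author> A </author>
 7.       <price type="discount"> 10.00 </price>
 8.     </book>
 9.     <book id="2">
10.       <price> 14.00 </price>
11.       <name> Second </name>
12.       <author> A </author>
13.       <author> B </author>
14.       <price type="discount"> 12.00 </price>
15.     </book>
16.     <year> 2002 </year>
17.   </pub>
18. </root>

Query: /pub[year=2002]/book[price<11]/author

- Line 4: price 12.00 does not satisfy price<11. Failure?? Not yet, because a later price in the same book may still pass.
- Line 6: the author element satisfies the path, but the predicates are still undecided, so A has to be buffered.
- Line 7: price 10.00 satisfies price<11. Test passed. But is year=2002? That is still unknown.
- Lines 12-13: buffer both A & B.
- Line 15: book 2 ends having failed price<11. Remove A and B from the buffer.
- Line 16: year=2002. Test passed. Output the buffered author A of book 1.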
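The walk-through above can be sketched in code. The following is a hand-written evaluator for this one query, not the transducer the slides go on to build: the class name `AuthorQuery`, the two buffers and the helper `run_query` are illustrative assumptions. It also assumes that the query is evaluated beneath the `<root>` wrapper and, as a simplification, it emits results only when `</pub>` is reached rather than as soon as both predicates are known.

```python
import xml.sax

DOC = """<root><pub>
<book id="1"><price> 12.00 </price><name> First </name><author> A </author>
<price type="discount"> 10.00 </price></book>
<book id="2"><price> 14.00 </price><name> Second </name><author> A </author>
<author> B </author><price type="discount"> 12.00 </price></book>
<year> 2002 </year></pub></root>"""


class AuthorQuery(xml.sax.ContentHandler):
    """Hand-coded evaluator for /pub[year=2002]/book[price<11]/author."""

    def __init__(self):
        super().__init__()
        self.path = []         # names of the open elements, outermost first
        self.text = []         # character data of the current element
        self.book_buffer = []  # authors whose book has not yet passed price<11
        self.pub_buffer = []   # authors whose book passed; year=2002 undecided
        self.price_ok = False
        self.year_ok = False
        self.results = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []
        if self.path == ["root", "pub", "book"]:
            self.book_buffer, self.price_ok = [], False

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if self.path == ["root", "pub", "book", "author"]:
            self.book_buffer.append(value)            # buffer for the future
        elif self.path == ["root", "pub", "book", "price"]:
            self.price_ok = self.price_ok or float(value) < 11
        elif self.path == ["root", "pub", "year"]:
            self.year_ok = self.year_ok or value == "2002"
        elif self.path == ["root", "pub", "book"]:
            if self.price_ok:                         # hand authors to the pub level
                self.pub_buffer.extend(self.book_buffer)
            self.book_buffer = []                     # otherwise they are dropped
        elif self.path == ["root", "pub"]:
            if self.year_ok:                          # all predicates true: output
                self.results.extend(self.pub_buffer)
            self.pub_buffer, self.year_ok = [], False
        self.path.pop()
        self.text = []


def run_query(document):
    handler = AuthorQuery()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.results
```

Calling `run_query(DOC)` returns only the author of book 1, because the authors of book 2 are discarded when that book ends without a price below 11.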

Page 15: Querying Streaming XML Data

Problems caused by closure axis

 1. <root>
 2.   <pub>
 3.     <book>
 4.       <name> X </name>
 5.       <author> A </author>
 6.     </book>
 7.     <book>
 8.       <name> Y </name>
 9.       <pub>
10.         <book>
11.           <name> Z </name>
12.           <author> B </author>
13.         </book>
14.         <year> 1999 </year>
15.       </pub>
16.     </book>
17.     <year> 2002 </year>
18.   </pub>
19. </root>

Query: //pub[year=2002]//book[author]//name

pub [year=2002]    book [author]
Line 2   True      Line 7    False
Line 2   True      Line 10   True
Line 9   False     Line 10   True

Fails year=2002 (the pub on line 9).

Passes year=2002 (the pub on line 2).

Page 16: Querying Streaming XML Data

Problems caused by closure axis

 1. <root>
 2.   <pub>
 3.     <book>
 4.       <name> X </name>
 5.       <author> A </author>
 6.     </book>
 7.     <book>
 8.       <name> Y </name>
 9.       <author> B </author>
10.       <pub>
11.         <book>
12.           <name> Z </name>
13.           <author> B </author>
14.         </book>
15.         <year> 1999 </year>
16.       </pub>
17.     </book>
18.     <year> 2002 </year>
19.   </pub>
20. </root>

Query: //pub[year=2002]//book[author]//name

pub [year=2002]    book [author]
Line 2   True      Line 7    False
Line 2   True      Line 10   True
Line 9   False     Line 10   True

(The table keeps the line numbers of the document before the author on line 9 was added.)

Fails year=2002

Passes year=2002

Let's add an author (line 9). Result?

Page 17: Querying Streaming XML Data

Handling XML Stream

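- Input: a well-formed XML stream.
- The SAX API is used to parse the XML.
- Each event belongs to one of three sets, where a is the element tag, attrs its attributes and d its depth:

Begin = {(a, attrs, d)}
End = {(/a, d)}
Text = {(a, text(), d)}

XML stream: {e1, e2, …, ei, …} | ei ∈ Begin ∪ End ∪ Text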
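As a rough illustration of this event model, the sketch below uses Python's standard `xml.sax` module to turn a document into Begin, End and Text events that carry a depth. The leading `"begin"`/`"end"`/`"text"` label on each tuple, the class name `EventCollector`, the helper `collect_events` and the choice of depth 1 for the document element are assumptions made for the example.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Collects ("begin", a, attrs, d), ("end", "/a", d) and ("text", a, text, d)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.open_tags = []  # stack of open element names; its length is the depth
        self.pending = []    # character chunks not yet emitted as a Text event

    def _emit_text(self):
        text = "".join(self.pending).strip()
        self.pending = []
        if text:  # whitespace-only text is ignored in this sketch
            self.events.append(("text", self.open_tags[-1], text, len(self.open_tags)))

    def startElement(self, name, attrs):
        self._emit_text()
        self.open_tags.append(name)
        self.events.append(("begin", name, dict(attrs.items()), len(self.open_tags)))

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so collect them first.
        self.pending.append(content)

    def endElement(self, name):
        self._emit_text()
        self.events.append(("end", "/" + name, len(self.open_tags)))
        self.open_tags.pop()


def collect_events(document):
    handler = EventCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.events
```

For example, `collect_events('<pub><book id="1"><name> First </name></book></pub>')` yields a begin event for `pub` at depth 1, one for `book` with `{"id": "1"}` at depth 2, and a text event `("text", "name", "First", 3)` between the begin and end events of `name`.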

Page 18: Querying Streaming XML Data

Grammar for XPath Queries

Q  → N+ [/O]
N  → [/ | //] tag [F]
F  → [FO [OP constant]]
FO → @attribute | tag [@attribute] | text()
O  → @attribute | text()
OP → > | ≥ | = | < | ≤ | ≠ | contains

An XPath query has the form N1N2…Nn/O.

Cannot handle reverse axes or positional functions.
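To make the grammar concrete, here is a minimal regex-based sketch that splits a query into its location steps N1…Nn and the optional output expression O. It is only one reading of the grammar: the function names, the dictionary layout, and the acceptance of the ASCII spellings `>=`, `<=` and `!=` next to ≥, ≤ and ≠ are assumptions, and a predicate without an operator is treated as an existence test.

```python
import re

# One location step: axis, tag, optional bracketed predicate.
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*)(?:\[([^\]]*)\])?")
# Optional output expression at the very end of the query.
OUTPUT = re.compile(r"/(@[A-Za-z_][\w.-]*|text\(\))$")
# Predicate of the form FO OP constant.
PREDICATE = re.compile(r"\s*(.+?)\s*(>=|<=|!=|≥|≤|≠|=|>|<|\bcontains\b)\s*(.+?)\s*$")


def parse_predicate(source):
    """Return (FO, OP, constant); OP and constant are None for an existence test."""
    if source is None:
        return None
    match = PREDICATE.match(source)
    if match:
        return match.groups()
    return (source.strip(), None, None)


def parse_query(query):
    """Split a query into a list of location steps and an output expression."""
    output = None
    match = OUTPUT.search(query)
    if match:
        output = match.group(1)
        query = query[: match.start()]
    steps, pos = [], 0
    while pos < len(query):
        match = STEP.match(query, pos)
        if not match:
            raise ValueError(f"cannot parse location step at offset {pos}: {query[pos:]!r}")
        axis, tag, predicate = match.groups()
        steps.append({"axis": axis, "tag": tag, "predicate": parse_predicate(predicate)})
        pos = match.end()
    if not steps:
        raise ValueError("a query needs at least one location step")
    return steps, output
```

With this sketch, `parse_query("//pub[year>2000]//book[author]//name/text()")` gives three steps on the closure axis and the output expression `text()`; reverse axes and positional functions are rejected simply because the step pattern does not match them.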

Page 19: Querying Streaming XML Data

Solution to Query

Query: /pub[year=2002]/book[price<11]/author

PDA PDT

Page 20: Querying Streaming XML Data

Basic PushDown Transducer (BPDT)

- Similar to a pushdown automaton
- Actions are defined on the transition arcs
- Finite set of states
  - A start state
  - A set of final states
- Set of input symbols
- Set of stack symbols

Page 21: Querying Streaming XML Data

Building a BPDT

Query: /pub[year>2000]/book[author]/name/text()

Consider location step: /book[author]

Book – Author:
- Buffer for future: begin event of author.
- Remove from buffer: end event of book.
- Output result if predicates true: begin event of author.

Page 22: Querying Streaming XML Data

Basic Building Blocks

XPath Expression: /tag[child]
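The diagram for this template is not part of the transcript, so the following is only a guess at its behaviour, written as a small state machine: a START state, an NA state while the predicate is undecided, and a TRUE state once a child has been seen. The function name, the `(kind, name, value)` event triples and the use of a `result_tag` whose text is the candidate output are all assumptions; nested occurrences of the same tag are not handled.

```python
def run_tag_child_bpdt(events, tag="book", child="author", result_tag="name"):
    """Evaluate /tag[child] over (kind, name, value) events and return the outputs."""
    state = "START"
    queue = []   # candidates buffered while the predicate is undecided
    output = []
    for kind, name, value in events:
        if state == "START":
            if kind == "begin" and name == tag:
                state = "NA"
        elif kind == "end" and name == tag:
            queue.clear()             # predicate never held: drop the candidates
            state = "START"
        elif kind == "begin" and name == child and state == "NA":
            state = "TRUE"
            output.extend(queue)      # predicate holds: flush the buffer
            queue.clear()
        elif kind == "text" and name == result_tag:
            if state == "TRUE":
                output.append(value)  # predicate already known: output directly
            else:
                queue.append(value)   # predicate unknown: buffer for the future
    return output


EVENTS = [
    ("begin", "book", None), ("begin", "name", None), ("text", "name", "X"),
    ("end", "name", None), ("begin", "author", None), ("end", "author", None),
    ("end", "book", None),
    ("begin", "book", None), ("begin", "name", None), ("text", "name", "Y"),
    ("end", "name", None), ("end", "book", None),
    ("begin", "book", None), ("begin", "author", None), ("end", "author", None),
    ("begin", "name", None), ("text", "name", "Z"), ("end", "name", None),
    ("end", "book", None),
]
```

Running `run_tag_child_bpdt(EVENTS)` keeps X (buffered, then flushed when the author begins) and Z (output directly), and drops Y when its book ends without an author.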

Page 23: Querying Streaming XML Data

Buffer Operations needed

- Enqueue(x): adds x to the end of the queue.
- Clear(): removes all items from the queue.
- Flush(): outputs all items in the queue in FIFO order.
- Upload(): moves all items to the end of the queue of a parent BPDT.

No Dequeue operation is needed.
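A minimal sketch of such a buffer, assuming each BPDT owns one queue and knows its parent's queue, could look as follows. The class name `BpdtBuffer` and the `sink` list that stands in for the output stream are illustrative choices, not names taken from the slides.

```python
from collections import deque


class BpdtBuffer:
    """Queue with the four operations listed above and no public dequeue."""

    def __init__(self, parent=None, sink=None):
        self.items = deque()
        self.parent = parent                           # buffer of the parent BPDT, if any
        self.sink = sink if sink is not None else []   # stand-in for the output stream

    def enqueue(self, x):
        """Add x to the end of the queue."""
        self.items.append(x)

    def clear(self):
        """Remove all items without outputting them."""
        self.items.clear()

    def flush(self):
        """Output all items in FIFO order and empty the queue."""
        self.sink.extend(self.items)
        self.items.clear()

    def upload(self):
        """Move all items to the end of the parent's queue."""
        if self.parent is None:
            raise ValueError("upload() needs a parent buffer")
        self.parent.items.extend(self.items)
        self.items.clear()
```

A child buffer that holds A and B can call `upload()` once its own predicate is true; the parent then decides between `flush()` and `clear()` when its predicate is resolved.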

Page 24: Querying Streaming XML Data

Basic Building Blocks

XPath Expression: /tag[@attr=val]

Page 25: Querying Streaming XML Data

Basic Building Blocks

XPath Expression: /tag[text()=val]

Page 26: Querying Streaming XML Data

Basic Building Blocks

XPath Expression: /tag[child@attr=val]

Page 27: Querying Streaming XML Data

Basic Building Blocks

XPath Expression: /tag[child=val]

Page 28: Querying Streaming XML Data

A sample BPDT

Query: /pub[year>2000]

Page 29: Querying Streaming XML Data

Building a solution

HPDT (hierarchical pushdown transducer) for query:

//pub[year>2000]//book[author]//name/text()

Page 30: Querying Streaming XML Data

HPDT Structure

Each BPDT in the HPDT has:

- Position
  - BPDT position (l, k): l = depth of the BPDT in the HPDT, k = sequence number from right to left.
  - The BPDT at position (i-1, k) has a right child BPDT at position (i, 2k), connected to its NA state.
  - The BPDT at position (i-1, k) has a left child BPDT at position (i, 2k+1), connected to its True state.
  - BPDT position (i, 2^i − 1) means that the predicates in all higher-level BPDTs evaluate to true.
- Buffer: potential results
- Stack: stack of element (SAX) events
- Depth vector
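The numbering rule can be checked with a few lines of arithmetic. The sketch below assumes that the root BPDT sits at position (0, 0) and reads the last position rule as (i, 2^i − 1); the function names are made up for the example. Under those assumptions, always following the left (True) child from the root lands exactly on (i, 2^i − 1), and the bits of k record which state each ancestor was left through.

```python
ROOT = (0, 0)  # assumed position of the root BPDT


def right_child(position):
    """Child connected to the NA state: (i-1, k) -> (i, 2k)."""
    level, k = position
    return (level + 1, 2 * k)


def left_child(position):
    """Child connected to the True state: (i-1, k) -> (i, 2k+1)."""
    level, k = position
    return (level + 1, 2 * k + 1)


def all_true_position(level):
    """Position reached by following only True-state (left) children."""
    return (level, 2 ** level - 1)


def states_on_path(position):
    """State through which each ancestor was left, from the root downwards."""
    level, k = position
    bits = [(k >> (level - 1 - j)) & 1 for j in range(level)]
    return ["True" if bit else "NA" for bit in bits]
```

For instance, `states_on_path((2, 2))` is `["True", "NA"]`: the BPDT was reached through the True state of the root and then the NA state of its child.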

Page 31: Querying Streaming XML Data

Example Query

 1. <root>
 2.   <pub>
 3.     <book>
 4.       <name> X </name>
 5.       <author> A </author>
 6.     </book>
 7.     <book>
 8.       <name> Y </name>
 9.       <pub>
10.         <book>
11.           <name> Z </name>
12.           <author> B </author>
13.         </book>
14.         <year> 1999 </year>
15.       </pub>
16.     </book>
17.     <year> 2002 </year>
18.   </pub>
19. </root>

Query: //pub[year=2002]//book[author]//name

root  pub  book  name
1     2    7     11
1     2    10    11
1     9    10    11

3 paths from $1 to $14
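The three rows can be reproduced by enumerating every way the closure steps //pub//book//name can be matched against the ancestors of the name element on line 11. The sketch below is a brute-force enumeration for illustration only, not how the HPDT keeps track of these matches; `closure_matches` and the `(line, tag)` pairs are assumed names and representations.

```python
from itertools import combinations


def closure_matches(path, tags):
    """List all matches of //t1//t2//...//tn against a root-to-element path.

    path is a list of (line, tag) pairs from the outermost ancestor down to the
    element itself; each match is returned as a list of line numbers.
    """
    *ancestors, target = path
    if target[1] != tags[-1]:
        return []
    matches = []
    # combinations() keeps document order, so every pick is a valid nesting.
    for combo in combinations(ancestors, len(tags) - 1):
        if all(node[1] == tag for node, tag in zip(combo, tags)):
            matches.append([line for line, _ in combo] + [target[0]])
    return matches


# Ancestors of <name> Z </name> on line 11, outermost first.
PATH_TO_Z = [(1, "root"), (2, "pub"), (7, "book"), (9, "pub"), (10, "book"), (11, "name")]
```

`closure_matches(PATH_TO_Z, ["pub", "book", "name"])` returns the pub, book and name lines of the three rows above; the slide additionally lists the root on line 1 in front of each row.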

Page 32: Querying Streaming XML Data

System Features

Name | Support | Streaming | Multiple Predicates | Closure | Buffered Predicate Evaluation

XSQ-F XPath X X X X

XSQ-NC XPath X X X

XMLTK XPath X X

XQEngine XQuery X X

Galax XQuery X X

Joost STX X X

Page 33: Querying Streaming XML Data

Reference

Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

Page 34: Querying Streaming XML Data

Thank You

???