The Connection Factory

Jeroen van Rotterdam, CTO

May 19th, WWW9


Contents

- Xhive setup

- XPath

- XPath performance issues within XML collections

Xhive

- OO-XML database
- Highly scalable
- High granularity
- W3C DOM Level 2 compliant
- XPath 1.0 compliant

Architecture

[Architecture diagram. Components shown: Xhive Core, OODB, XPath, DOM Core L2, Extended DOM, DOM Traversal, DB Administrator, RMI Layer (EJB / CORBA / SOAP), Client, Schema, SQL loader.]

Why XPath

Competing solutions:

- XML-QL: Where-In constructs
- XQL: limited
- SQL: no alternative

XPath: a complete pattern-matching language.

XPath

Advantages:

- fairly complete
- multiple axes
- supported by the W3C
- basis for XPointer and XLink
- basis for the XML Query WG
- user-defined functions

Disadvantages:

- document-oriented
- slightly different tree model from the DOM
- no updates

Extending DOM

Collection setup:

Every document is a “Bastard Node”

[Diagram: a Library Node and its Document Nodes, linked by getFirstChild(), getLastChild() and getParentNode(); a null value is also shown.]
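A minimal sketch of the library-node idea, assuming a simplified in-memory tree: Node, LibraryNode, DocumentNode and buildDemo are hypothetical names, not Xhive's API. In the standard DOM a Document's getParentNode() is always null; here a document gets the library as its parent, while the library itself (the top of this sketch's tree) still has none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Node, LibraryNode and DocumentNode are illustrative
// classes, not Xhive's API. Only the navigation method names come from the slide.
public class LibrarySketch {

    static class Node {
        final String name;
        private Node parent;
        private final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node getParentNode() { return parent; }

        Node getFirstChild() { return children.isEmpty() ? null : children.get(0); }

        Node getLastChild() { return children.isEmpty() ? null : children.get(children.size() - 1); }

        void appendChild(Node child) {
            child.parent = this; // a document gets a real parent: the library
            children.add(child);
        }
    }

    /** Groups documents; it is the top of this tree, so its parent stays null. */
    static class LibraryNode extends Node {
        LibraryNode(String name) { super(name); }
    }

    /** In plain DOM a Document never has a parent; here it can hang under a library. */
    static class DocumentNode extends Node {
        DocumentNode(String name) { super(name); }
    }

    static LibraryNode buildDemo() {
        LibraryNode library = new LibraryNode("library");
        library.appendChild(new DocumentNode("a.xml"));
        library.appendChild(new DocumentNode("b.xml"));
        return library;
    }

    public static void main(String[] args) {
        LibraryNode library = buildDemo();
        System.out.println(library.getFirstChild().name);                        // a.xml
        System.out.println(library.getLastChild().name);                         // b.xml
        System.out.println(library.getFirstChild().getParentNode() == library);  // true
        System.out.println(library.getParentNode());                             // null
    }
}
```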

Library Node

Advantages

- Natural extension of DOM
- Extensible
- Closely related to directory structures
- Searchable with XPath
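Because libraries and documents form one tree, a single XPath query can span a whole collection. The standard JDK APIs have no library node, so this sketch approximates one by importing each document's root element under a synthetic `<library>` element (a hypothetical name) and querying it with javax.xml.xpath. Unlike this copy, the slides describe documents that remain separate documents under the library node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical approximation with standard JDK APIs, not Xhive's library node.
public class LibrarySearch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Copies each document's root element under one synthetic <library> element. */
    static Document buildLibrary(String... docs) throws Exception {
        Document library = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = library.createElement("library");
        library.appendChild(root);
        for (String xml : docs) {
            Element docRoot = parse(xml).getDocumentElement();
            root.appendChild(library.importNode(docRoot, true)); // deep copy into the library
        }
        return library;
    }

    /** Runs one XPath query over the whole collection and counts the hits. */
    static int countMatches(Document library, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expr, library, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document lib = buildLibrary(
                "<book><title>XML</title><chapter/><chapter/></book>",
                "<book><title>DOM</title><chapter/></book>");
        System.out.println(countMatches(lib, "/library/book/chapter"));      // 3
        System.out.println(countMatches(lib, "/library/book[title='DOM']")); // 1
    }
}
```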

Library Node

Disadvantages

- potential bottleneck

XPath

- XPath in a large PDOM (persistent DOM) collection environment:

1. Address memory issues
2. Solve differences in specs
3. Address performance issues

Memory issues

- Avoid recursion
- Make subresults persistence-capable
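One way to avoid recursion when walking a large persistent tree, sketched with the standard org.w3c.dom interfaces: a document-order (preorder) traversal that moves only through first-child, next-sibling and parent links, so it needs neither a call stack nor an explicit stack of pending nodes. The class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper names; only the DOM interfaces are standard.
public class IterativeDescendants {

    /**
     * Preorder (document-order) walk of the subtree below root that uses only
     * parent/sibling pointers: no recursion and no explicit stack.
     */
    static List<Element> descendantsNamed(Node root, String name) {
        List<Element> result = new ArrayList<>();
        Node n = root.getFirstChild();
        while (n != null) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                result.add((Element) n);
            }
            if (n.getFirstChild() != null) {               // descend
                n = n.getFirstChild();
                continue;
            }
            while (n != root && n.getNextSibling() == null) { // climb back up
                n = n.getParentNode();
            }
            n = (n == root) ? null : n.getNextSibling();      // move right, or stop
        }
        return result;
    }

    static List<String> idsOf(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> ids = new ArrayList<>();
        for (Element e : descendantsNamed(doc, name)) {
            ids.add(e.getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><chapter id=\"1\"><p/><chapter id=\"2\"/></chapter><chapter id=\"3\"/></book>";
        System.out.println(idsOf(xml, "chapter")); // [1, 2, 3]
    }
}
```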

Solve differences

Differences between the specs include, for instance:

- the parent of an attribute: XPath's parent axis vs. the DOM's ownerElement
- namespace nodes
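The attribute difference can be seen with the JDK's built-in DOM and XPath implementations. In the DOM, Attr.getParentNode() is null and the element is reachable only through getOwnerElement(); in the XPath data model the element is the attribute's parent (although the attribute is not one of its children). An XPath layer over a DOM therefore has to map the parent axis for attributes onto ownerElement, as the hypothetical xpathParent helper below does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper names; DOM and javax.xml.xpath calls are standard JDK APIs.
public class AttrParent {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** XPath's parent axis expressed on top of the DOM. */
    static Node xpathParent(Node n) {
        if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
            // DOM: Attr.getParentNode() is always null; the element is the owner instead.
            return ((Attr) n).getOwnerElement();
        }
        return n.getParentNode();
    }

    /** The JDK's own XPath engine, which follows the XPath data model. */
    static Node parentOfIdViaXPath(Document doc) throws Exception {
        return (Node) XPathFactory.newInstance().newXPath()
                .evaluate("//@id/..", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<chapter id=\"c1\"/>");
        Attr id = doc.getDocumentElement().getAttributeNode("id");
        System.out.println(id.getParentNode());                    // null
        System.out.println(xpathParent(id).getNodeName());         // chapter
        System.out.println(parentOfIdViaXPath(doc).getNodeName()); // chapter
    }
}
```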

Performance

Increase XPath performance:

- Query analysis
- Avoid reparsing
- Lazy evaluation
- Index structures
- Cache strategy
- DTD analysis
- Statistical data

Performance

1. Query analysis:

a. Can the query be simplified?

e.g. /child::chapter[5+5] is equivalent to /child::chapter[10]
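Query simplification can be illustrated with constant folding over a tiny, hypothetical expression tree (Expr, Num, Add and Position are illustrative, not Xhive's query model). A numeric predicate [n] means [position() = n], so folding 5+5 to 10 once turns the predicate into a plain positional test instead of recomputing the sum for every candidate node; an expression involving position() depends on the context and is left alone.

```java
// Hypothetical mini-AST for XPath predicate expressions; not Xhive's query model.
public class ConstantFolding {

    sealed interface Expr permits Num, Add, Position {}

    /** A numeric literal. */
    record Num(double value) implements Expr {}

    /** left + right */
    record Add(Expr left, Expr right) implements Expr {}

    /** position(): depends on the context node, so it can never be folded. */
    record Position() implements Expr {}

    /** Replaces every sub-expression built only from literals by its value. */
    static Expr fold(Expr e) {
        if (e instanceof Add add) {
            Expr left = fold(add.left());
            Expr right = fold(add.right());
            if (left instanceof Num l && right instanceof Num r) {
                return new Num(l.value() + r.value());
            }
            return new Add(left, right);
        }
        return e; // Num and Position are already as simple as they get
    }

    public static void main(String[] args) {
        // child::chapter[5+5]  ->  child::chapter[10]
        System.out.println(fold(new Add(new Num(5), new Num(5)))); // Num[value=10.0]
        // child::chapter[position()+1] cannot be folded
        System.out.println(fold(new Add(new Position(), new Num(1))));
    }
}
```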

Performance

1. Query analysis:

b. Does the query depend on the context node?

Absolute queries are context independent:

“Give me all chapters where the title is the same as the book title”

//chapter[title=string(/book/title)]

Evaluate string(/book/title) only once.
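A sketch of hoisting the context-independent part of //chapter[title=string(/book/title)] out of the predicate, using the JDK's javax.xml.xpath and DOM APIs (class and method names are hypothetical). asWritten hands the whole query to the JDK engine as a reference result; hoisted evaluates string(/book/title) once and then compares each chapter's title children against that constant.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class and method names; only the JDK APIs are standard.
public class HoistAbsolute {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Reference result: the query as written, left entirely to the JDK XPath engine. */
    static int asWritten(Document doc) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xp.evaluate(
                "//chapter[title=string(/book/title)]", doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    /** Hoisted: the context-independent part is evaluated exactly once. */
    static int hoisted(Document doc) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        String bookTitle = xp.evaluate("string(/book/title)", doc); // once, not once per chapter
        NodeList chapters = (NodeList) xp.evaluate("//chapter", doc, XPathConstants.NODESET);
        int count = 0;
        for (int i = 0; i < chapters.getLength(); i++) {
            // title = "..." is true if ANY title child has that string value
            for (Node c = chapters.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals("title")
                        && c.getTextContent().equals(bookTitle)) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><title>XML</title>"
                + "<chapter><title>XML</title></chapter>"
                + "<chapter><title>DOM</title></chapter>"
                + "<chapter><title>XML</title></chapter></book>");
        System.out.println(asWritten(doc) + " " + hoisted(doc)); // 2 2
    }
}
```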

Performance

2. Storing parsed queries:

“Compile” and optimize each query only once
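With the standard javax.xml.xpath API, XPath.compile returns a reusable XPathExpression, so a simple way to parse each query only once is to cache compiled expressions by their source text. A minimal sketch, assuming a single-threaded caller (JAXP XPath and XPathExpression objects are documented as not thread-safe); QueryCache is a hypothetical name.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical cache: compiles each distinct query string once. Not thread-safe. */
public class QueryCache {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> compiled = new HashMap<>();
    private int compilations = 0;

    XPathExpression get(String query) throws XPathExpressionException {
        XPathExpression expr = compiled.get(query);
        if (expr == null) {
            expr = xpath.compile(query); // parsing happens here, once per query string
            compiled.put(query, expr);
            compilations++;
        }
        return expr;
    }

    int compilations() { return compilations; }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        QueryCache cache = new QueryCache();
        Document doc = parse("<book><chapter/><chapter/></book>");
        for (int i = 0; i < 3; i++) {
            Double n = (Double) cache.get("count(//chapter)").evaluate(doc, XPathConstants.NUMBER);
            System.out.println(n);                 // 2.0
        }
        System.out.println(cache.compilations());  // 1
    }
}
```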

Performance

3. Lazy evaluation:

e.g. converting node-sets:

- to boolean: stop after the first node found
- to string: only the first node in document order is needed
- to number: convert that string to a number

Example: “give me all chapters which have paragraphs”

/chapter[paragraph]

Finding one paragraph is enough.
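The lazy conversions can be sketched over plain DOM children (the helpers are hypothetical; the examined counter is instrumentation only). Converting a node-set to a boolean only needs to know whether one node exists, and string() of a node-set is the string-value of its first node in document order, so both can stop at the first match instead of materializing the whole node-set.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helpers illustrating eager vs. lazy node-set conversions.
public class LazyExists {

    /** Instrumentation only: how many child nodes the last call looked at. */
    static int examined;

    static boolean isElementNamed(Node n, String name) {
        return n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name);
    }

    /** Eager: materialize the whole node-set, then convert it to a boolean. */
    static boolean hasChildEager(Element parent, String name) {
        examined = 0;
        List<Node> all = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            examined++;
            if (isElementNamed(c, name)) all.add(c);
        }
        return !all.isEmpty();
    }

    /** Lazy: boolean(node-set) is true as soon as one node is found. */
    static boolean hasChildLazy(Element parent, String name) {
        examined = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            examined++;
            if (isElementNamed(c, name)) return true;
        }
        return false;
    }

    /** Lazy: string(node-set) only needs the first node in document order. */
    static String firstStringValue(Element parent, String name) {
        examined = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            examined++;
            if (isElementNamed(c, name)) return c.getTextContent();
        }
        return ""; // string() of an empty node-set is the empty string
    }

    /** Builds <chapter> with n <paragraph> children containing p0, p1, ... */
    static Element chapterWithParagraphs(int n) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element chapter = doc.createElement("chapter");
        doc.appendChild(chapter);
        for (int i = 0; i < n; i++) {
            Element p = doc.createElement("paragraph");
            p.setTextContent("p" + i);
            chapter.appendChild(p);
        }
        return chapter;
    }

    public static void main(String[] args) throws Exception {
        Element chapter = chapterWithParagraphs(100);
        hasChildEager(chapter, "paragraph");
        System.out.println(examined); // 100
        hasChildLazy(chapter, "paragraph");
        System.out.println(examined); // 1
        System.out.println(firstStringValue(chapter, "paragraph") + " " + examined); // p0 1
    }
}
```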

Performance

4. Indexing:

- getFirstChildElementByName(String name)
- getNextSiblingElementBySameName()
- getFirstChildByType(short type)
- getNextSiblingByType(short type)
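A minimal in-memory sketch of what an index behind getFirstChildElementByName and getNextSiblingElementBySameName might look like. The method names come from the slide; the ElementNode class, the two hash maps and the same-name sibling link are assumptions, and only appends are supported. Following the chain visits only the children with the requested name instead of scanning every child. The type-based methods could be indexed the same way, keyed on node type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch: method names follow the slide, but the index
// layout (two hash maps plus a same-name sibling link) is an assumption.
public class ChildIndex {

    static final class ElementNode {
        final String name;
        private ElementNode nextSameName;                        // next sibling with the same name
        private final Map<String, ElementNode> firstByName = new HashMap<>();
        private final Map<String, ElementNode> lastByName = new HashMap<>();

        ElementNode(String name) { this.name = name; }

        /** Appends a child and keeps the name index up to date (append-only for brevity). */
        void appendChild(ElementNode child) {
            ElementNode previous = lastByName.put(child.name, child);
            if (previous == null) {
                firstByName.put(child.name, child);
            } else {
                previous.nextSameName = child;
            }
        }

        ElementNode getFirstChildElementByName(String name) { return firstByName.get(name); }

        ElementNode getNextSiblingElementBySameName() { return nextSameName; }
    }

    /** Counts children with the given name without scanning the other children. */
    static int countChildren(ElementNode parent, String name) {
        int count = 0;
        for (ElementNode c = parent.getFirstChildElementByName(name); c != null;
                c = c.getNextSiblingElementBySameName()) {
            count++;
        }
        return count;
    }

    static ElementNode sampleBook() {
        ElementNode book = new ElementNode("book");
        for (String child : new String[] {"title", "chapter", "appendix", "chapter", "chapter"}) {
            book.appendChild(new ElementNode(child));
        }
        return book;
    }

    public static void main(String[] args) {
        ElementNode book = sampleBook();
        System.out.println(countChildren(book, "chapter"));           // 3
        System.out.println(book.getFirstChildElementByName("index")); // null
    }
}
```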

Performance

5. Caching strategy:

Top-level paging/cluster strategy

[Diagram: Library Node, then Document Nodes ..., then Root elements ...]
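One possible reading of a top-level cluster strategy, shown as a small page-layout simulation (all names and sizes are hypothetical, not Xhive's storage format): the library node, the document nodes and the root elements are stored together on their own pages, instead of each root element sharing a page with its document's content. Scanning the root elements of many documents then touches far fewer pages.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical page-layout simulation; not Xhive's storage format.
public class TopLevelClustering {

    enum Kind { LIBRARY, DOCUMENT, ROOT_ELEMENT, CONTENT }

    record StoredNode(Kind kind, int doc, int seq) {}

    /** Plain document order: each document's content follows its own document and root nodes. */
    static List<StoredNode> documentOrder(int docs, int contentPerDoc) {
        List<StoredNode> out = new ArrayList<>();
        out.add(new StoredNode(Kind.LIBRARY, -1, 0));
        for (int d = 0; d < docs; d++) {
            out.add(new StoredNode(Kind.DOCUMENT, d, 0));
            out.add(new StoredNode(Kind.ROOT_ELEMENT, d, 0));
            for (int c = 0; c < contentPerDoc; c++) {
                out.add(new StoredNode(Kind.CONTENT, d, c));
            }
        }
        return out;
    }

    /** Top-level clustering: library, document nodes and root elements first, all content after. */
    static List<StoredNode> clusteredOrder(int docs, int contentPerDoc) {
        List<StoredNode> all = documentOrder(docs, contentPerDoc);
        List<StoredNode> out = new ArrayList<>();
        for (StoredNode n : all) {
            if (n.kind() != Kind.CONTENT) out.add(n);
        }
        for (StoredNode n : all) {
            if (n.kind() == Kind.CONTENT) out.add(n);
        }
        return out;
    }

    /** Distinct pages read when visiting every root element, with nodes stored in the given order. */
    static int pagesForRootScan(List<StoredNode> storageOrder, int nodesPerPage) {
        Set<Integer> pages = new HashSet<>();
        for (int i = 0; i < storageOrder.size(); i++) {
            if (storageOrder.get(i).kind() == Kind.ROOT_ELEMENT) {
                pages.add(i / nodesPerPage); // node i lives on page i / nodesPerPage
            }
        }
        return pages.size();
    }

    public static void main(String[] args) {
        System.out.println(pagesForRootScan(documentOrder(1000, 98), 100));  // 1000
        System.out.println(pagesForRootScan(clusteredOrder(1000, 98), 100)); // 21
    }
}
```

With 100 nodes per page and 98 content nodes per document, document order needs one page per document (1000 pages), while the clustered layout packs all 2001 top-level nodes into 21 pages.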

Performance

6. Use DTD information:

e.g. /child::chapter/child::book[4]

If the DTDs in use show that a chapter never contains a book, this query is known to return nothing (null) without being evaluated.
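A sketch of DTD-based pruning, assuming the content model has already been extracted from the DTD into a map of permitted child elements (the map, its entries and mayMatch are hypothetical). If any child:: step names an element its parent can never contain, the query is known to return an empty result before any data is read. Wildcards and other axes are not handled here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical content model, as if read from a DTD such as:
//   <!ELEMENT book (title, chapter+)>
//   <!ELEMENT chapter (title, paragraph*)>
public class DtdPruning {

    static final Map<String, Set<String>> ALLOWED_CHILDREN = Map.of(
            "book", Set.of("title", "chapter"),
            "chapter", Set.of("title", "paragraph"),
            "title", Set.of(),
            "paragraph", Set.of());

    /**
     * False if a path of child:: steps can never match any valid document, so the
     * query can return an empty result without reading data. Predicates such as [4]
     * are ignored: they can only remove nodes from a step's result, never add them.
     */
    static boolean mayMatch(List<String> childSteps) {
        for (int i = 1; i < childSteps.size(); i++) {
            Set<String> allowed = ALLOWED_CHILDREN.getOrDefault(childSteps.get(i - 1), Set.of());
            if (!allowed.contains(childSteps.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // /child::chapter/child::book[4]
        System.out.println(mayMatch(List.of("chapter", "book")));              // false
        // /child::book/child::chapter/child::paragraph
        System.out.println(mayMatch(List.of("book", "chapter", "paragraph"))); // true
    }
}
```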

Performance

7. Gather statistical info:

DTDs or XML Schemas specify the structures that may occur, not what is actually in your collection.
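Statistics complement the DTD because they record what actually occurs. A minimal sketch, assuming the collection is available as DOM Documents (CollectionStats and its methods are hypothetical): for every parent/child name pair it keeps the largest number of such children seen under one parent, so a positional step such as book/chapter[4] can be ruled out when no book in the collection has four chapters. The statistics would have to be maintained as documents are added or changed.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical statistics collector; only the DOM and parser APIs are standard.
public class CollectionStats {

    /** parent name -> (child name -> most such children seen under a single parent) */
    private final Map<String, Map<String, Integer>> maxChildren = new HashMap<>();

    void add(Document doc) {
        NodeList elements = doc.getElementsByTagName("*"); // every element, no recursion needed
        for (int i = 0; i < elements.getLength(); i++) {
            Element parent = (Element) elements.item(i);
            Map<String, Integer> counts = new HashMap<>();
            for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    counts.merge(c.getNodeName(), 1, Integer::sum);
                }
            }
            Map<String, Integer> seen =
                    maxChildren.computeIfAbsent(parent.getNodeName(), k -> new HashMap<>());
            counts.forEach((child, n) -> seen.merge(child, n, Math::max));
        }
    }

    /** Can parent/child[position] match anything in the documents added so far? */
    boolean mayMatch(String parent, String child, int position) {
        return maxChildren.getOrDefault(parent, Map.of()).getOrDefault(child, 0) >= position;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        CollectionStats stats = new CollectionStats();
        stats.add(parse("<book><chapter/><chapter/></book>"));
        stats.add(parse("<book><chapter/><chapter/><chapter/></book>"));
        System.out.println(stats.mayMatch("book", "chapter", 3)); // true
        System.out.println(stats.mayMatch("book", "chapter", 4)); // false
    }
}
```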

Conclusion

- DOM within database environments
- XPath on top of a PDOM
- XPath is fairly complete
- Focus on performance

WWW9

Beta testers and developers wanted.

Email: [email protected]

Have fun…