Native XML Databases - Hong Kong Polytechnic University (cstyng/webdb.07/lectures/lesson4.pdf)


Upload: hoanghanh

Post on 06-Mar-2018


Native XML Databases

Lesson 4

XML Data

• XML adds a new data model to the world
  – In addition to relational, hierarchical, OO, ...
• The “XML” data model is
  – A tree of ordered nodes
  – Nodes have different types (element, attribute, etc.)
  – Some nodes are labeled (Date, Quantity, Price, etc.)
  – Data stored in leaf nodes
• Modeling language is an XML schema language
  – DTD, XML Schemas, etc.

Storing XML in a DB

• Build a model
  – Model the data in the XML document, or...
  – Model the XML document itself
• Map the model to the database
• Transfer data according to the model

Sample XML document

<Order>
  <Number>1234</Number>
  <Customer>Gallagher Co.</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</Order>

Modeling data

• Objects in model are specific to XML schema

object Order {
  number=1234;
  customer="Gallagher Co.";
  date=29.10.00;
  items={ptrs to Items};
}

object Item {
  number=1;
  part="A-10";
  quantity=12;
  price=10.95;
}

object Item {
  number=2;
  part="B-43";
  quantity=600;
  price=3.99;
}

Storing data

• Database schema specific to XML schema

Orders
Number  Customer       Date
1234    Gallagher Co.  291000
...     ...            ...
...     ...            ...

Items
SONum  Item  Part  Qty  Price
1234   1     A-10  12   10.95
1234   2     B-43  600  3.99
...    ...   ...   ...  ...
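The schema-specific mapping above can be sketched in a few lines of Python. This is only an illustration under assumptions: SQLite stands in for the relational database, the table and column names follow the slide, and the helper name store_order is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER_XML = """
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Co.</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price>
  </Item>
</Order>
"""


def store_order(conn, xml_text):
    """Shred one Order document into the schema-specific Orders and Items tables."""
    order = ET.fromstring(xml_text)
    number = int(order.findtext("Number"))
    conn.execute(
        "INSERT INTO Orders (Number, Customer, Date) VALUES (?, ?, ?)",
        (number, order.findtext("Customer"), order.findtext("Date")),
    )
    for item in order.findall("Item"):
        conn.execute(
            "INSERT INTO Items (SONum, Item, Part, Qty, Price) VALUES (?, ?, ?, ?, ?)",
            (
                number,
                int(item.get("Number")),
                item.findtext("Part"),
                int(item.findtext("Quantity")),
                float(item.findtext("Price")),
            ),
        )


# The table layout mirrors the slide; it only fits documents shaped like Order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)")
conn.execute("CREATE TABLE Items (SONum INTEGER, Item INTEGER, Part TEXT, Qty INTEGER, Price REAL)")
store_order(conn, ORDER_XML)
order_rows = conn.execute("SELECT Number, Customer, Date FROM Orders").fetchall()
item_rows = conn.execute("SELECT Item, Part, Qty, Price FROM Items ORDER BY Item").fetchall()
```

A document with a different structure would need different tables and different mapping code, which is the sense in which this database schema is specific to the XML schema.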

Modeling documents

• Objects in model independent of XML schema

Element (Order)
  Element (Number)
    Text: 1234
  Element (Customer)
    Text: Gallagher Co.
  Element (Date)
    Text: 29.10.00
  Element (Item)
    Attr (Number)
      Text: 1
    Element (Part)
      Text: A-10
    Element (Quantity)
      Text: 12
    Element (Price)
      Text: 10.95
  Element (Item)
    ...

Storing documents

• Database schema independent of XML schema (order columns not shown)

Elements
ID  Name      Parent
1   Order     --
2   Number    1
3   Customer  1
4   Date      1
5   Item      1
6   Item      1
7   Part      5
8   Quantity  5
9   Price     5
10  Part      6
11  Quantity  6
12  Price     6

Attributes
ID  Name    Parent
13  Number  5
14  Number  6

Text
Parent  Value
2       1234
3       Gallagher Co.
4       29.10.00
7       A-10
8       12
9       10.95
10      B-43
11      600
12      3.99
13      1
14      2
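A rough Python sketch of this schema-independent mapping follows. It is an assumption-laden illustration: the function name shred is hypothetical, identifiers are handed out in depth-first order (so they differ from the numbers in the tables above), and the order columns and mixed content are ignored.

```python
import xml.etree.ElementTree as ET

DOC = '<Order><Number>1234</Number><Item Number="1"><Part>A-10</Part></Item></Order>'


def shred(xml_text):
    """Return (elements, attributes, text) rows for an arbitrary XML document."""
    elements, attributes, text = [], [], []
    next_id = 0

    def new_id():
        nonlocal next_id
        next_id += 1
        return next_id

    def visit(elem, parent_id):
        elem_id = new_id()
        elements.append((elem_id, elem.tag, parent_id))
        # Attributes get their own ids; their values go into the Text table.
        for name, value in elem.attrib.items():
            attr_id = new_id()
            attributes.append((attr_id, name, elem_id))
            text.append((attr_id, value))
        # Only leaf text is kept; whitespace-only text is skipped.
        if elem.text and elem.text.strip():
            text.append((elem_id, elem.text.strip()))
        for child in elem:
            visit(child, elem_id)

    visit(ET.fromstring(xml_text), None)
    return elements, attributes, text


elements, attributes, text = shred(DOC)
```

The same three tables accept any well-formed document, at the cost of more joins when a query needs to reassemble one order.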

Native XML Database

  – Defines an XML data model
  – Uses a document as its fundamental unit of (logical) storage
  – Can have any physical storage

XML data model

• XML 1.0 did not define a model
• Native XML databases define their own model
  – Model must include elements, attributes, text, and document order
  – Examples are XPath, Info Set, DOM, and SAX
  – XQuery model will be de facto standard in future?
• Data transferred according to the model

Fundamental unit of storage

• Document is fundamental unit of logical storage
  – Equivalent structure in RDBMS is a row
• Document usually contains single set of data

Text-based storage

• Stores documents as text
• Uses file system, CLOB, etc.
  – Includes XML-aware text in RDBMS
• May need to parse documents at run time
• Uses indexes to avoid extra parsing, increase search speed

Text-based storage

<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>

[Figure: the same Address document stored four times as plain text]

Text-based databases

• Indexed files
  – TextML
• CLOBs
  – Oracle 9i release 2, DB2

Model-based storage

• Stores documents in “object” form
• Documents parsed when inserted
• For example, store DOM objects in OODBMS
• Underlying storage can be relational, object-oriented, hierarchical, proprietary
• Uses indexes to speed searches

Model-based storage

<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>

[Figure: the document as a tree, one Element node with five child Element nodes, each holding one Text node]

Model-based databases

• Proprietary
  – Tamino, Xindice, Neocore, Ipedo, XStream DB, XYZFind, Infonyte, Virtuoso, Coherity, Luci, TeraText, Sekaiju, Cerisent, DOM-Safe, XDBM, ...
• Relational
  – Xfinity, eXist, Sybase, DBDOM
• Object-oriented
  – eXcelon, X-Hive, Ozone/Prowler, 4Suite, Birdstep

Whole documents and fragments (Text-based databases)

• Should be very fast
  – Data is contiguous on disk
  – Retrieval requires index lookup and single disk read

1. Index lookup
2. Position disk head
3. Read to the end of the document or fragment

Whole documents and fragments (Model-based databases)

• Databases with proprietary stores should be fast
  – Can use physical pointers between nodes
• Databases built on other DBs may be fast or slow
  – Depends on underlying database and implementation

[Figure: a chain of Node boxes linked by pointers]

1. Index lookup
2. Position disk head
3. Follow pointers to end

Unindexed data

• Slow for model-based databases
  – Must read many elements, not just particular type
  – Comparisons may be slower due to converting text
• Very slow for text-based databases
  – Must parse document as well as comparing values

Task: Find date 29.10.00

Relational database:
1. Search this column (the date column of Orders)

Orders
...   ...       ...
1234  29.10.00  Gallagher Co.
...   ...       ...
...   ...       ...

Model-based native XML database:
1. Search all elements for Date elements
2. Search the text of all Date elements

[Figure: document tree of Element, Text and Attr nodes]

Unindexed data: Optimization

• /Order[Date="29.10.00"]
  – Only need to search children of Order
  – Schema may help locate Date element
• //Order[Date="29.10.00"]
  – Schema may help locate Order and Date elements
  – Without schema, must search entire document
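To make the difference concrete, here is a small Python sketch that evaluates both expressions by hand over the sample order and counts how many elements the traversal visits. The function names and the visit counters are illustrative assumptions, not how any particular database evaluates XPath.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Co.</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price>
  </Item>
</Order>
"""


def match_root_order(root, date):
    """/Order[Date=date]: inspect the root and, at most, its direct children."""
    visited = 1
    if root.tag != "Order":
        return [], visited
    for child in root:
        visited += 1
        if child.tag == "Date" and child.text == date:
            return [root], visited
    return [], visited


def match_any_order(root, date):
    """//Order[Date=date]: without a schema, walk every element in the document."""
    visited = 0
    hits = []
    for elem in root.iter():
        visited += 1
        if elem.tag == "Order" and any(
            c.tag == "Date" and c.text == date for c in elem
        ):
            hits.append(elem)
    return hits, visited


root = ET.fromstring(ORDER_XML)
hits_child, visited_child = match_root_order(root, "29.10.00")
hits_desc, visited_desc = match_any_order(root, "29.10.00")
```

On this document the child-only search stops after 4 elements, while the descendant search has to touch all 12.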

The eXist project

• eXist is the leading open source implementation of XML:DB
• It acts as a repository for indexing and retrieval of XML and RDF documents
• Uses a native Java backend data store
  – Which splits out elements, attributes and entities into columns and associates tree path information
  – This allows for very fast responses to search queries
• The eXist project goes beyond simple XPath queries by adding functionality like NEAR-based queries and document grouping (collections)

eXist DB

• Schema-less data storage
• Collections
• Index-based query processing
• XQuery/XPath extension for performing full text search
• XUpdate support also
• exist-db.org

eXist Architecture

• eXist is split into three distinct areas
  – Brokers, for accessing the data held in either the native Java data store or a relational DB like MySQL or Oracle
  – The engine itself, used to rebuild the documents and query the data store
  – Interfaces to the engine, either XML-RPC, the XML:DB API (Java) or SOAP for use within a Web Services framework

Building products using eXist

• There are three main options for building applications over eXist
  – Using WebDAV access
  – Java applications, using the XML:DB API
  – Web Service applications (built using Java, Python, Perl)
• Web Services can then be used from any SOAP-aware programming language
• XML-RPC interfaces are available through most modern programming languages
• A natural fit for XML applications built on eXist is using Apache’s Cocoon as the presentation layer of your application

xml.apache.org

• The Apache XML Project has activities focused on different aspects of XML
  – Xerces - XML parsers in Java, C++ (with Perl and COM bindings)
  – Xalan - XSLT stylesheet processors, in Java and C++
  – Cocoon - XML-based web publishing, in Java
  – FOP - XSL formatting objects, in Java
  – Xang - Rapid development of dynamic server pages, in Java
  – Xindice - Native XML database


Building products using eXist

XPath Extension

• document()
  – Selection of a document, a set of documents, or all
• collection()
  – Specification of a collection of documents to be included in query evaluation
  – collection('/db/vincent')//scene[speech[speaker="David"]]/title

XPath Extension

• Querying text
  – XPath has only a few functions for text searching
  – contains(), near(), &=, |=, match-all()
• //chapter[contains(., 'XML') and contains(., 'database')]
  – Find chapters whose content contains the words 'XML' and 'database'
• //section[near(., 'XML database', 50)]
  – Find sections containing both keywords in the correct order and with fewer than 50 words between them
• //scene[speech[&= 'witch game' and line |= 'fenny snake']]
  – Two additional operators for simple keyword queries: &= and |=
  – &= : selects context nodes containing ALL of the keywords in the right-hand argument, in any order
  – |= : selects context nodes containing ANY of the keywords in the right-hand argument
• //speech[match-all(line, 'li[fv]e[s]')]
  – Tries to match the regular expression string
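The intended meaning of &=, |= and near() can be sketched in plain Python over a node's text. The function names, the tokenizer and the exact distance rule below are assumptions made for illustration; eXist's own tokenization and index-based evaluation are not modelled.

```python
import re


def words(text):
    """Lower-cased word tokens; a deliberately naive tokenizer."""
    return re.findall(r"\w+", text.lower())


def contains_all(text, keywords):
    """Sketch of '&=': every keyword occurs somewhere, in any order."""
    present = set(words(text))
    return all(k in present for k in words(keywords))


def contains_any(text, keywords):
    """Sketch of '|=': at least one keyword occurs."""
    present = set(words(text))
    return any(k in present for k in words(keywords))


def near(text, keywords, max_distance):
    """Sketch of near(): keywords occur in order, with fewer than
    max_distance words between consecutive keywords."""
    tokens = words(text)
    keys = words(keywords)
    if not keys:
        return False

    def search(start, key_index):
        if key_index == len(keys):
            return True
        # The first keyword may sit anywhere; later ones must fall in the window.
        end = len(tokens) if key_index == 0 else min(len(tokens), start + max_distance)
        for pos in range(start, end):
            if tokens[pos] == keys[key_index] and search(pos + 1, key_index + 1):
                return True
        return False

    return search(0, 0)


line = "Fillet of a fenny snake, in the cauldron boil and bake"
all_hit = contains_all(line, "snake fenny")
any_hit = contains_any(line, "newt snake")
near_hit = near("an XML aware database", "XML database", 2)
```

A real implementation would answer these predicates from a word index rather than by scanning text, which is what the full text extensions rely on.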

Query Execution

• Basic approach
  – Top-down/bottom-up traversal for XPath expression
  – Very inefficient
    • /book//section[contains(title, 'XML')]
    • Follow every child path beginning at BOOK to check for potential SECTION descendants
• Indexing structure
  – Efficient processing of regular path expressions on large, unconstrained document collections

Indexing

[Figure: document tree padded with virtual nodes to form a complete k-ary tree]

Indexing

• Identifiers grow too large => document size limitation => drop the completeness constraint
  – For 2 nodes x and y of a tree, size(x) = size(y) if level(x) = level(y), where size(n) is the number of children of a node n and level(m) is the length of the path from the root node of the tree to m
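The slides do not spell out the numbering formulas, so the following Python sketch assumes the usual level-order numbering of a complete k-ary tree with the root numbered 1. The function names are hypothetical; the point is that parent and child identifiers are computed arithmetically, and that the largest identifier grows exponentially with depth.

```python
def parent_id(node_id, k):
    """Identifier of the parent in a complete k-ary tree (root is 1)."""
    if node_id <= 1:
        return None
    return (node_id - 2) // k + 1


def child_id(node_id, k, j):
    """Identifier of the j-th child (1-based) of node_id."""
    if not 1 <= j <= k:
        raise ValueError("child position out of range")
    return k * (node_id - 1) + j + 1


def max_id(k, levels):
    """Largest identifier needed for a complete k-ary tree with this many levels.

    Virtual nodes are counted too, which is why wide, deep documents
    exhaust the identifier space quickly."""
    return sum(k ** level for level in range(levels))


first_child = child_id(2, 3, 1)      # first child of node 2 when k = 3
back_to_parent = parent_id(first_child, 3)
ids_needed = max_id(10, 10)          # fan-out 10, 10 levels
```

Relaxing completeness, as the bullet above describes, lets each level use its own fan-out instead of one global k, which reduces the number of wasted virtual identifiers.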

Storage Implementation

• Storage backend
  – dom.dbx collects DOM nodes in a paged file and associates node identifiers to the actual nodes
  – collections.dbx manages the collection hierarchy
  – element.dbx indexes elements and attributes
  – words.dbx keeps track of word occurrences and is used by the full text search extensions

Storage Implementation

• All indexes are based on B+-trees

[Figure: a multiroot B+-tree mapping node-id to address, over data pages holding the DOM nodes (node n1, node n2, …) of Document d1 and Document d2]

Query Processing

• Decompose a path into a chain of basic steps
  – /PLAY//SPEECH[SPEAKER='HAMLET']
• Load the root element (PLAY) for all documents in the input document set
• The set of SPEECH elements is retrieved for the input documents via an index lookup from file element.dbx
• Use an ancestor-descendant path join algorithm to join the two sets
• Evaluate the predicate
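The join step can be sketched as follows, assuming level-order identifiers in a complete k-ary tree with the root numbered 1 (an assumption, since the slides give no formulas). The function names are hypothetical and this is not eXist's actual join algorithm; it only shows that the ancestor test needs the identifiers alone, not the stored nodes.

```python
def parent_id(node_id, k):
    """Parent identifier in a complete k-ary tree whose root is 1."""
    if node_id <= 1:
        return None
    return (node_id - 2) // k + 1


def ancestor_descendant_join(ancestors, descendants, k):
    """Pair each candidate descendant with every candidate ancestor above it."""
    ancestor_set = set(ancestors)
    pairs = []
    for node in descendants:
        current = parent_id(node, k)
        # Climb towards the root using arithmetic on identifiers only.
        while current is not None:
            if current in ancestor_set:
                pairs.append((current, node))
            current = parent_id(current, k)
    return pairs


# Toy document with k = 2: node 1 is PLAY, nodes 2 and 3 are ACT,
# node 4 (under 2) and node 7 (under 3) are SPEECH elements.
play_ids = [1]
speech_ids = [4, 7]
joined = ancestor_descendant_join(play_ids, speech_ids, 2)
```

The predicate on SPEAKER would then be evaluated only for the SPEECH nodes that survive the join.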


Query Screen

Advantages of eXist

• Advantages of eXist as a Native XML DB
  – eXist provides a scalable, reliable XML database implementation royalty free
  – By including multiple different interfaces eXist makes application integration simple
  – With the addition of other open source platforms (like Cocoon for presentation and Axis for Web Services) eXist can be extended easily to fit many application needs
  – eXist is currently being used in many live implementations

Drawbacks

• Drawbacks of eXist as a Native XML DB
  – As with much open source software, documentation is scarce; you will have to rely on mailing lists for the more difficult problems
  – No warranty is included with the source, although one can be provided by a third party
  – Although stable, eXist is still considered beta software. Current version is 1.1.1 (Feb 2007)

Sedna – Another Native XML DB

• Full-featured database system (external and main memory management, query and update facilities, concurrency etc.)
• Native XML database
• Based on the XQuery language and the XQuery/XPath data model
• XUpdate language
• Implemented in Scheme and C/C++
• Supported platforms are Windows and Linux
• http://modis.ispras.ru/Development/sedna.htm

Data Organization

• A descriptive schema driven storage strategy is used: the nodes of an XML document are clustered according to their positions in the descriptive schema
• Direct pointers are used to represent relations between nodes of an XML document such as parent, child and sibling relationships

Descriptive Schema (Data Guide)

<library>
  <book>
    <title>Foundation on databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  . . .
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue>
      <publisher>Addison-Wesley</publisher>
      <year>2004</year>
    </issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
  . . .
  <paper>
    <title>The Complexity of Relational Query Languages</title>
    <author>Codd</author>
  </paper>
</library>

Descriptive schema of this document:

library
  book
    title
    author
    issue
      publisher
      year
  paper
    title
    author

The path /library/book/title corresponds to the schema nodes library, book, title.
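A descriptive schema of this kind can be derived by collecting every distinct root-to-element label path. The Python sketch below does that for the library document (with the ". . ." gaps left out); the function name and the per-path node counts are illustrative assumptions, not Sedna's data structures.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library>
  <book>
    <title>Foundation on databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
  </book>
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue><publisher>Addison-Wesley</publisher><year>2004</year></issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
  <paper>
    <title>The Complexity of Relational Query Languages</title>
    <author>Codd</author>
  </paper>
</library>
"""


def descriptive_schema(xml_text):
    """Map each distinct label path to the number of nodes that share it."""
    counts = {}

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        counts[path] = counts.get(path, 0) + 1
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), "")
    return counts


schema = descriptive_schema(LIBRARY_XML)
title_nodes = schema["/library/book/title"]
```

Every path appears exactly once in the schema no matter how many document nodes share it, so the schema stays small even for large documents, and nodes that share a path are the natural unit for clustering in blocks.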

Data Structures (Node descriptor)

[Figure: block of node descriptors for the title schema node. Labels on the descriptor:]
  – Label (numbering scheme)
  – node handle
  – Indirection table
  – parent
  – left-sibling
  – right-sibling
  – prev-in-block
  – next-in-block
  – children “by descriptive schema”

Structural query efficiency

When we answer structural queries like /library/book/title, we
• Read only the blocks containing necessary information and do not read other blocks
• Every block that is read contains only nodes that belong to the answer

Node updates efficiency

• Node descriptors have fixed size within a block
• Node descriptors are partly ordered
• Immutable numbering scheme
• Indirection table for parents

[Figure: a node with left-sibling, right-sibling and child pointers; the parent is reached through the indirection table]

Memory Management

• Pointers are used to represent relationships between nodes, and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation should be efficient
• Database address space should be big enough to represent large volumes of data

OS memory management restrictions
• Restriction on the size of address space caused by the 32-bit architecture that prevails nowadays
• Cannot control the page replacement (swapping) procedure

Layered Address Space (LAS)

[Figure: labels recoverable from the diagram:]
  – Layered Address Space: address is a pair (layer, addr)
  – OS Virtual Process Address Space of the transaction process: addr
  – Buffer Manager and Buffer Memory: MapViewOfFile (Windows) / mmap (Linux); VirtualLock (Windows) / mlock (Linux)
  – External Memory (Disk): layer * LAYER_SIZE + addr
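The address arithmetic on this slide can be written out directly. In the Python sketch below the value of LAYER_SIZE and the function names are assumptions chosen for illustration; the mapping of pages into the process through mmap or MapViewOfFile is not modelled.

```python
LAYER_SIZE = 1 << 30  # hypothetical layer size (1 GiB); the slide gives no value


def to_flat(layer, addr, layer_size=LAYER_SIZE):
    """Turn a (layer, addr) pair into a flat offset: layer * LAYER_SIZE + addr."""
    if layer < 0 or not 0 <= addr < layer_size:
        raise ValueError("address outside the layer")
    return layer * layer_size + addr


def from_flat(offset, layer_size=LAYER_SIZE):
    """Recover the (layer, addr) pair from a flat offset."""
    return divmod(offset, layer_size)


offset = to_flat(3, 4096)
pair = from_flat(offset)
```

Because addr always fits inside one layer, the within-layer part can be as narrow as a 32-bit process address while the layer number extends the database address space beyond that limit.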

Query Evaluation Aspects

• Suspended element constructors
• Different strategies for XPath queries evaluation
• Combining Lazy and Strict Semantics

Element Constructors

• XML element construction requires deep copy of its content
  – so, the operation is heavy
• Suspended element constructors
  – Do not perform the deep copy
  – Store a pointer instead
  – The copy is performed on demand, when some operation descends into the constructed element
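The idea of deferring the deep copy can be sketched with a small Python class. The class and method names are hypothetical, and nested Python dictionaries stand in for XML content; Sedna's actual implementation is not shown in the slides.

```python
import copy


class SuspendedElement:
    """Constructed element that keeps a pointer to its content and copies it
    only when an operation first looks inside."""

    def __init__(self, name, content):
        self.name = name
        self._source = content  # pointer to the existing content, no copy yet
        self._copy = None

    @property
    def materialized(self):
        """True once the deep copy has actually been made."""
        return self._copy is not None

    def content(self):
        """Return the element's own content, deep-copying it on first use."""
        if self._copy is None:
            self._copy = copy.deepcopy(self._source)
        return self._copy


stored = {"title": ["An Introduction to Database Systems"]}
result = SuspendedElement("result", stored)
before = result.materialized      # nothing copied so far
own = result.content()            # first access triggers the deep copy
own["title"].append("changed")    # does not affect the stored content
```

If the constructed element is only passed along or serialized as a whole, the copy may never be needed at all.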

Different strategies for XPath queries evaluation

[Figure: descriptive schema tree (library; book and paper; title, author, issue; publisher, year) with the book and year nodes marked]

• /library/book[issue/year=2004]
• /library/book/issue/year[.=2004]/../../title
• Top-down: descriptive schema
• Two steps

Combining Lazy and Strict Semantics

• Iterative result computation (open; next; close)
• Iterative result computation combined with a functional programming language gives lazy evaluation
• On the other hand, strict semantics is more efficient than lazy semantics
• So, strict and lazy semantics are combined for XQuery

Combining Lazy and Strict Semantics

• Query evaluation starts in lazy mode
• Every function call is a reason to switch to strict mode if the sizes of its arguments are relatively small
• A large input sequence for any physical operation in strict mode is a reason to switch to lazy mode
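One way to picture the switch is an operator that peeks at the size of its input and then chooses between building the whole result at once (strict) and returning an iterator (lazy). The Python sketch below is only an analogy under assumptions: the threshold value and the function name are hypothetical, and Sedna's executor is not described at this level in the slides.

```python
from itertools import chain, islice

THRESHOLD = 4  # hypothetical cut-off between "small" and "large" inputs


def apply_op(op, items, threshold=THRESHOLD):
    """Apply op to every item, strictly for small inputs and lazily for large ones."""
    iterator = iter(items)
    # Peek at no more than threshold + 1 items to classify the input.
    head = list(islice(iterator, threshold + 1))
    if len(head) <= threshold:
        # Small input: compute the whole result immediately.
        return "strict", [op(x) for x in head]
    # Large input: hand back a generator that computes results on demand.
    return "lazy", (op(x) for x in chain(head, iterator))


small_mode, small_result = apply_op(lambda x: x * 2, [1, 2, 3])
large_mode, large_result = apply_op(lambda x: x + 1, range(10**9))
first_large = next(large_result)
```

The large case never materializes its billion results; only the items actually requested are computed.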

Summary

• Efficient evaluation of structured XPath queries
• Local node-level updates
• Effective processing of XML data in main memory, comparable to a general-purpose programming language

Tamino XML Server

• Supports Internet and W3C standards
• Stores/retrieves native XML documents and non-XML
• Offers full text search on XML documents
• Accesses RDBMS databases
• Integrates with other business applications
• Supports any programming language
• Works with major Web servers and application servers

Tamino XML Server Architecture

[Figure: architecture diagram. Labels recoverable from the slide:]
  – Server components: Core Services, Security Service, Administration Services, Query / Text-Retrieval Service, XML Parser + Query Interpreter, Object Processor & Object Composer, X-Tension Service, Tamino Manager, Data Map, XML Schema Service, Native XML Data Store, Tokenizer (opt.; Chinese, Japanese, Korean)
  – Service layers: Enabling Services, Interactive Services, Application Programming, Software AG Integration Services, Schema Services, Enterprise Edition Services, External DB Services (opt.), UDDI and Web Services, More ...
  – XML Business Integration Solutions + Customer Solutions: Single Customer View, Self Service Portals, Supply Chain Integration, Business Reporting
  – Front Office / Clients: Mobile Phone, PDA, Printer, Browser, CD, Internet File System
  – Back Office / Back End: Databases, Applications (incl. Adabas)
  – Interfaces and formats: Internet (HTTP, WebDAV, SOAP); XQuery, XPath; XML, WML, HTML; ODBC, Adabas; COM; Data / Metadata / non-XML

Tamino Features

• Integration of data in existing, external data sources
  – Access to and modification of data from diverse systems (relational DBs, object DBs, office systems...)
• Database queries with ‘XQuery’
  – XPath-based, regarding document structure and content
• Simple administration
  – Browser-based control via any PC having Internet access
• Simple connection to Internet via standard Web server

Tamino Features

• Security
  – Group / user authorization rights down to XML element level
• Read-only databases supported
• Dynamic style sheet processing supported
  – e.g. HTML formatted output of XML documents

The Communication Interface

[Figure: labels recoverable from the diagram:]
  – Application and Web Browser connect over HTTP to the Web Server (Apache, MS IIS, IBM WebSphere, Sun iPlanet)
  – X-Port on the Web Server
  – Tamino Server, Port 3204, database "xml"
  – Tamino Server, Port 3207, database "mydata"
  – Tamino Server, Port 3210, database "test1"
  – Example URL: http://www.ebiz1.com/tamino/xml

Native XML Store

• Connecting XML-based businesses to Web
• Highest performance with unlimited native XML data storage
• “Valid” and “well-formed” XML accepted
• XML database structure easily changeable
• Full-text search (indexing for full text and standard search)


Tamino XML Data Store


Tamino Data Map

Tamino XML Schema

• Contains all the information needed for storage, indexing and processing of XML objects, especially for
  – Storage of XML structures and properties
  – Integration of external databases / applications
  – Construction of standard and text indices

Schema - Collections

Collection "sailing"
  Doctype "cruise"
  Doctype "yacht"
  Doctype "contract"

Collection "shop"
  Doctype "order"
  Doctype "product"
  Doctype "contract"

XML Schema Support

• Complete Support for XML Schema 1.0 Specification
• Industry Schema Support:
  – Docbook 4.4
  – FpML 4.1
  – METS 1.0
  – NewsML 1.2
  – SVG 1.0
  – UBL
  – VoiceML
  – Word 2003
  – XBRL 2.1
• Full DTD Support

Page 64: Native XML Databases - Hong Kong Polytechnic Universitycstyng/webdb.07/lectures/lesson4.pdf · Native XML Databases 2 ... – XQuery model will be de facto standard in future?

Native XML Databases 64

Storing - customer.xml

<CUSTOMER>
  <CUSTOMERID>1044</CUSTOMERID>
  <FIRSTNAME>Paul</FIRSTNAME>
  <LASTNAME>Astoria</LASTNAME>
  <HOMEADDRESS>
    <STREET>123 Cherry Lane</STREET>
    <CITY>Best</CITY>
    <STATE>CA</STATE>
    <ZIP>94132</ZIP>
  </HOMEADDRESS>
</CUSTOMER>

Either:
• Store the XML document into the schema using a URL with the command _Process in the default collection
• Ready!

OR USE a DTD / Tamino XML Schema:
• Read the DTD or XML Schema into the Tamino Schema Editor
• Mark check boxes for (text) indexing
• Save the XML Schema with a collection name
• Store the XML document using a URL with the command _Process in the named collection
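
Both routes above end with an HTTP request that carries the document to the server. The following Python sketch only illustrates the idea and does not send anything. The host, the database name `mydb`, the collection name `shop`, the URL layout and the use of `_process` as a form field are all assumptions made for illustration, not a verified description of the Tamino HTTP interface.

```python
import urllib.parse
import urllib.request
import xml.dom.minidom

CUSTOMER_XML = (
    "<CUSTOMER><CUSTOMERID>1044</CUSTOMERID>"
    "<FIRSTNAME>Paul</FIRSTNAME><LASTNAME>Astoria</LASTNAME>"
    "<HOMEADDRESS><STREET>123 Cherry Lane</STREET><CITY>Best</CITY>"
    "<STATE>CA</STATE><ZIP>94132</ZIP></HOMEADDRESS></CUSTOMER>"
)


def build_store_request(base_url, database, collection, xml_text):
    """Build (but do not send) a POST request that stores one XML document.

    The URL layout base/database/collection and the "_process" form field
    are assumptions for this sketch.
    """
    # Raises an exception if the document is not well-formed XML.
    xml.dom.minidom.parseString(xml_text)
    url = "{}/{}/{}".format(
        base_url,
        urllib.parse.quote(database),
        urllib.parse.quote(collection),
    )
    body = urllib.parse.urlencode({"_process": xml_text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical server, database and collection names.
request = build_store_request(
    "http://localhost/tamino", "mydb", "shop", CUSTOMER_XML
)
```

Checking well-formedness on the client side before building the request keeps obviously broken documents away from the server; validation against a schema would still happen on the server.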


Tamino Search and Retrieval
• W3C XQuery Support
– User-defined functions
– If-Then-Else
– Node-level update
• XPath Support
– Extended with text search

[Figure labels: Application, API, XQuery, XML Web Server, XML Schema, Data Map, Tamino X-Tension, custom application, existing DBMS]

for $b in input()/bib/book
let $a := $b/author
where $b/price lt 200
return ($b/title, $a)
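
To make the evaluation order of the query above concrete, here is a rough Python equivalent that runs over an in-memory document. It is only a sketch: the sample `<bib>` data and the function name are invented for illustration, and the standard-library `xml.etree.ElementTree` module stands in for the XQuery engine.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the slides do not show the contents of /bib.
BIB_XML = """
<bib>
  <book><title>Cheap Guide</title><author>Lee</author><price>150</price></book>
  <book><title>Costly Atlas</title><author>Wong</author><price>450</price></book>
  <book><title>Pocket Notes</title><author>Chan</author><author>Ho</author><price>80</price></book>
</bib>
"""


def titles_and_authors_under(bib_root, max_price):
    """Return (title, [authors]) for every book cheaper than max_price."""
    results = []
    for book in bib_root.findall("book"):        # for $b in input()/bib/book
        authors = book.findall("author")         # let $a := $b/author
        price = float(book.findtext("price"))
        if price < max_price:                    # where $b/price lt 200
            title = book.findtext("title")
            results.append((title, [a.text for a in authors]))  # return ($b/title, $a)
    return results


matches = titles_and_authors_under(ET.fromstring(BIB_XML), 200)
```

Each clause of the FLWOR expression maps to one line of the loop, which is why the `let` binding is evaluated once per book rather than once for the whole query.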


Tamino Indexing and Retrieval
• Standard
– Classical database indexes
– Index any combination of elements and attributes
– Supports relational operators, exact comparisons, sorting
• Text
– Use in conjunction with text retrieval functions
– Supports wildcard searches
• Structure
– Index declared on the document
– Registers instances of undeclared nodes


Tamino Indexing and Retrieval
• Reference
– Indexes specific sub-trees of a document (e.g. /doc/a/b)
– Useful for documents of high complexity (multiplicity of sub-trees)
• Multipath
– Index any element or attribute that matches an XPath expression
• Compound
– Index a combination of two elements (e.g. lastname and firstname)
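
The idea behind a compound index can be sketched with an ordinary dictionary keyed by a pair of values. This is a conceptual illustration only and says nothing about how Tamino implements its indexes; the function name and the sample customers are hypothetical.

```python
from collections import defaultdict


def build_compound_index(records, first_key, second_key):
    """Map each (first, second) value pair to the ids of matching documents."""
    index = defaultdict(list)
    for doc_id, record in records.items():
        index[(record[first_key], record[second_key])].append(doc_id)
    return index


# Hypothetical documents, reduced to the two indexed elements.
customers = {
    1044: {"lastname": "Astoria", "firstname": "Paul"},
    1045: {"lastname": "Astoria", "firstname": "Mary"},
    1046: {"lastname": "Lee", "firstname": "Paul"},
}

name_index = build_compound_index(customers, "lastname", "firstname")
paul_astoria = name_index.get(("Astoria", "Paul"), [])
```

A lookup on both values is a single dictionary access, whereas two separate single-element indexes would require intersecting two result lists.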


• Easy integration with existing DBMS
• “One Server View” on integrated heterogeneous databases
• Non-native objects stored in other database types (SQL, Adabas, ...)
• Connection to existing data storage in other DB types (e.g. RDBMS, Excel, ...)

[Figure labels: DB Connector, X-Node, Open API, Schema, Tamino Kernel, RDBMS (three instances), Adabas]


X-Tension
• XML-based application logic on the Tamino server
• Event functions, query functions, content-dependent mapping, ...
• Message forwarding on events
• User-programmable server functionality, customization
• Allows integration with external applications
• Technology: COM Objects (C++) & Java

[Figure labels: Server Extension, Financial System, Spreadsheet, ...]


X-Application Architecture

[Figure: X-Application architecture with presentation, application and data access layers. Labels: Browser, HTML Pages, Internet, Web Server, HTML, JSP Tag Library, XApp Generator (Search, Display, Modify), Business Modules, Plugins, Serverside, SOAP, Web Service, Tamino API, Java API, Tamino]


Tamino WebDAV Server
• WebDAV = Web-based Distributed Authoring and Versioning
• WebDAV is a standard
• Future: framework for CMS (with check-in/-out, versioning, query)
• Community downloads

[Figure labels: MS Office 2000, Internet Explorer, XML-Spy, Adobe Acrobat, Python, WebDAV Explorer, Dreamweaver, other applications ...; Tamino WebDAV Server; Tamino Server]


Tamino APIs
• Various APIs to Tamino Server
• Released with Tamino 3.1
– Java APIs
– EJB-API (Application Server support: BEA, IBM, HP, ...)
– ActiveX

[Figure labels: Tamino XML Server, EJB API, C-API, Java API, .NET, ActiveX, WebServerless APIs, ...]


Tamino DOM APIs

[Figure labels: Java Application, COM Application, HTTP C, Web browser/ASP, HTTP, Java Interface, JScript Interface, HTTP C Interface, COM Interface, Tamino server]


Cons of Native XML DB

• Products are immature
• Many standards are still in development
• Techniques are unfamiliar to people
• Not good at transaction processing
• Tool support is minimal
• Some expected database practices are still unsupported
• Interoperability between products is minimal


Pros of Native XML DB

• Good way to store XML
• Can store document or data style XML
• Tremendous flexibility
• Applications can be loosely coupled
• Data modeling is simple and flexible
• Complement RDBMS with XML mapping solutions
• Performance can be very good


HKCAN
• Hong Kong Chinese Authority (Name)
• A collaborative project since 1999
• 7 Hong Kong university libraries
– Chinese University of Hong Kong
– City University of Hong Kong
– Hong Kong Baptist University
– Hong Kong Institute of Education
– Hong Kong Polytechnic University
– Lingnan University
– University of Hong Kong


Aims
• To build up a Chinese name authority file with CJK (Chinese, Japanese, Korean) scripts that meets the needs of the bilingual community
• To improve and streamline authority-control operations by standardizing name headings and setting principles for authority record selection, so that the work becomes “Better”, “Faster” and “Cheaper”
• To participate in regional and global cooperative activities on authority work


Record model - HKCAN record

008 941020nc acannaabn |a aaa |||
010 $anr 94034993
035 $a(DLC#)nr 94034993a
040 $aDLC-R$beng$cDLC-R$dOCoLC$dHkCU$dHkCAN
066 $c$1
100 1 $aZhou, Ying,$d17th cent.
400 1 $wnne$aChou, Ying,$d17th cent.
400 1 $aZhou, Fangshu,$d17th cent.
400 1 $a周方叔,$d17th cent.
400 1 $aChou, Fang-shu,$d17th cent.
670 $aChih lin (卮林), 1992:$bt.p. (Chou Ying)
670 $aChung wen ta tz{176}u tien (中文大詞典):$bv. 6, p. 290 (Chou Ying; of Ming; native of P{176}u-t{176}ien; t. Fang-shu; author of Chih lin; lived around the mid of Emperor Ch{176}ung-chen reign)
670 $aHis卮林 : 10卷, 附補遺1卷, [1963]:$bt.p. (周嬰)
670 $a中國人名大辭典, 1934:$bp. 545 (周嬰, 明莆田人, 字方叔, 崇禎中以貢生知上猶縣, 所著卮林, 体近類書)
700 1 $a周嬰,$d17th cent


Record model – in library system


Document Type Definition

• HKCAN DTD (Document Type Definition): specifies the structure of each XML authority record

• With this DTD, records can be converted to XML Schema or other related schema languages if needed

• The DTD has so far provided all the functionality needed by the present XML platform


DTD

<?xml version="1.0" encoding="UTF-8"?>
<!--DTD generated by XMLSPY v2004 rel. 3 U (http://www.xmlspy.com)-->
<!ELEMENT Leader (#PCDATA)>
<!ELEMENT Name (Leader, (Tag* | tag_type00* | tag_type10* | tag_type11* | tag_type30* | tag_1xx* | tag_4xx* | tag_5xx* | tag_7xx* | tag_670*)*)>
<!ATTLIST Name
    tag001 CDATA #IMPLIED
    record_type CDATA #IMPLIED
>
<!ELEMENT Subfield (#PCDATA)>
<!ATTLIST Subfield
    subfield_code CDATA #IMPLIED
>
<!ELEMENT Tag (#PCDATA | Subfield)*>
<!ATTLIST Tag
    tagcode CDATA #IMPLIED
    record_type CDATA #IMPLIED
    ind1 CDATA #IMPLIED
    ind2 CDATA #IMPLIED
>
<!ELEMENT tag_1xx (#PCDATA)>
<!ELEMENT tag_4xx (#PCDATA)>
<!ELEMENT tag_5xx (#PCDATA)>
<!ELEMENT tag_670 (#PCDATA)>
<!ELEMENT tag_7xx (#PCDATA)>
<!ELEMENT tag_type00 (#PCDATA)>
<!ELEMENT tag_type10 (#PCDATA)>
<!ELEMENT tag_type11 (#PCDATA)>
<!ELEMENT tag_type30 (#PCDATA)>


HKCAN XML platform

[Figure: record flow through the HKCAN XML platform]
• MARC source: records in Communication MARC format with EACC encoding
• Program to convert records from Communication MARC format to XML format, giving records in XML format with EACC encoding
• Program to convert the records from CCCII encoding to UTF-8 encoding, giving records in XML format with UTF-8 encoding
• HKCAN XML full text search server (Tamino): full text search, records display & download
• Import the records to a relational database for index search: HKCAN index search server (SQL Anywhere 8.0)
• Web interface: full text search and index search; the full record is retrieved from the HKCAN XML server


Record conversion

00681cz 2200193n 4504001001000000003000600010005001700016008004100033010001600074035002300090040003000113066000700143100003000150670002600180670018700206670005700393678000900450700002800459 000000001 HkCAN 19960504052613.5 800523n| acannabb| |n aaa an 50026575 a(DLC#)n 50026575a aDLC cDLC dCU dDLC-R dHkCU c$1 1 aNakayama, Shigeru, d1928- aHis Senseijutsu, 1963 aKagaku gijutsu to ekoroj{229}i, 1995: bt.p. (Nakayama Shigeru) colophon (r; b. 1928; Ph.D. (from Harvard Univ.); prof., Kanagawa Daigaku; former asst. prof., T{229}oky{229}o Daigaku) a�$1!Bs!Ci!O(':`!5=�(B, 1999: bp. 2 (�$1!04!;e!Th�(B) aSc.D 1 a�$1!04!;e!Th�(B, d1928-

From MARC record with EACC encoding
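
A conversion program first has to locate the fields inside the raw record. In the MARC exchange format (ISO 2709) the record starts with a 24-character leader, followed by a directory of 12-character entries: a 3-digit tag, a 4-digit field length and a 5-digit start offset. The Python sketch below decodes the directory of the sample record above. The directory string is copied from that record; the class and function names are hypothetical, and this is not the HKCAN conversion program itself.

```python
from typing import List, NamedTuple


class DirectoryEntry(NamedTuple):
    tag: str      # 3-character field tag, e.g. "670"
    length: int   # field length in bytes, including its terminator
    start: int    # offset of the field from the base address of data


# Directory portion of the sample record, split into 12-character entries.
DIRECTORY = (
    "001001000000" "003000600010" "005001700016" "008004100033"
    "010001600074" "035002300090" "040003000113" "066000700143"
    "100003000150" "670002600180" "670018700206" "670005700393"
    "678000900450" "700002800459"
)


def parse_directory(directory: str) -> List[DirectoryEntry]:
    """Split a MARC directory into (tag, length, start) entries."""
    if len(directory) % 12 != 0:
        raise ValueError("directory length must be a multiple of 12")
    entries = []
    for offset in range(0, len(directory), 12):
        chunk = directory[offset:offset + 12]
        entries.append(
            DirectoryEntry(chunk[0:3], int(chunk[3:7]), int(chunk[7:12]))
        )
    return entries


entries = parse_directory(DIRECTORY)
```

The decoded values agree with the leader: 24 leader characters, 14 × 12 directory characters and one terminator give the base address 193, and the last field ends at offset 487, which with the record terminator gives the record length 681.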


Record conversion

To XML record with EACC encoding

<Name tag001 = "000000001" record_type = "00">
<Leader>00681cz 2200193n 4504</Leader>
<Tag tagcode = "003" record_type = "" ind1 = "" ind2 = "">HkCAN</Tag>
<Tag tagcode = "005" record_type = "" ind1 = "" ind2 = "">19960504052613.5</Tag>
<Tag tagcode = "008" record_type = "" ind1="" ind2="">800523n| acannabb| |n aaa </Tag>
<Tag tagcode = "010" record_type = "" ind1=" " ind2=""><Subfield subfield_code = "a">n 50026575 </Subfield></Tag>
…
<Tag tagcode = "670" record_type = "" ind1=" " ind2=""><Subfield subfield_code = "a"> $1!Bs!Ci!O(':`!5= (B, 1999: </Subfield><Subfield subfield_code = "b">p. 2 ( $1!04!;e!Th (B) </Subfield></Tag>
…
<tag_type00> $1!04!;e!Th (B, | 1928- | </tag_type00>
<tag_7xx> $1!04!;e!Th (B, | 1928- | </tag_7xx>
</Name>
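
The element layout of the record above (Name, Leader, Tag, Subfield) can be produced from already parsed fields with a few lines of Python. This is a minimal sketch using the standard-library `xml.etree.ElementTree`; the function name and the dictionary shape used for the input fields are assumptions, and the actual HKCAN conversion program is not shown in the slides.

```python
import xml.etree.ElementTree as ET


def build_name_record(tag001, record_type, leader, fields):
    """Build a <Name> element from a list of parsed MARC fields.

    Each field is a dict with "tagcode" and either "value" (control field)
    or "subfields" (list of (code, value) pairs); "ind1"/"ind2" are optional.
    """
    name = ET.Element("Name", {"tag001": tag001, "record_type": record_type})
    ET.SubElement(name, "Leader").text = leader
    for field in fields:
        tag = ET.SubElement(name, "Tag", {
            "tagcode": field["tagcode"],
            "record_type": "",
            "ind1": field.get("ind1", ""),
            "ind2": field.get("ind2", ""),
        })
        if "subfields" in field:
            for code, value in field["subfields"]:
                ET.SubElement(tag, "Subfield", {"subfield_code": code}).text = value
        else:
            tag.text = field["value"]
    return name


# Values taken from the sample record; only two fields are shown.
record = build_name_record(
    "000000001", "00", "00681cz 2200193n 4504",
    [
        {"tagcode": "003", "value": "HkCAN"},
        {"tagcode": "010", "ind1": " ", "subfields": [("a", "n 50026575 ")]},
    ],
)
xml_text = ET.tostring(record, encoding="unicode")
```

Building the tree through an XML library rather than by string concatenation ensures that characters such as `&` or `<` inside field values are escaped correctly.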


Record search
• Index search (browse search)
• Full text search (phrase/keyword search)
• Tamino server: Full text search, record display, record download
• SQL Anywhere server: Index search


One stop search
• Inspired by the VIAF (Virtual International Authority File) and LEAF (Linking and Exploring Authority Files) projects
• Search across multiple authority files concurrently
• HKCAN, Chinese Authority Name Database (Taiwan), LC Authority File, National Library of China
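
Searching several authority files concurrently can be sketched with a thread pool that sends the same term to every source and collects the answers. In the Python sketch below the four sources are small in-memory lists standing in for the remote services; the headings, the function names and the matching rule (case-insensitive substring) are illustrative assumptions, not a description of the HKCAN implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the remote authority files (illustrative data).
AUTHORITY_FILES = {
    "HKCAN": ["Zhou, Ying", "Nakayama, Shigeru"],
    "Chinese Authority Name Database (Taiwan)": ["Chou, Ying"],
    "LC Authority File": ["Zhou, Ying", "Nakayama, Shigeru"],
    "National Library of China": ["Zhou, Fangshu"],
}


def search_one(source, headings, term):
    """Search a single authority file; a real version would call a server."""
    needle = term.lower()
    return source, [h for h in headings if needle in h.lower()]


def one_stop_search(term, sources=AUTHORITY_FILES):
    """Query every source in parallel and return {source: matching headings}."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [
            pool.submit(search_one, name, headings, term)
            for name, headings in sources.items()
        ]
        return dict(future.result() for future in futures)


hits = one_stop_search("zhou")
```

Threads suit this task because each real query would spend most of its time waiting on the network, so the total response time approaches that of the slowest source rather than the sum of all of them.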