Natix, done by Asmaa Hassanain, CSC 5370, Dr. Hachim Haddoutti, 12/8/2003

Page 1: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix

Done by Asmaa Hassanain

CSC 5370

Dr. Hachim Haddoutti

12/8/2003

Page 2: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

CSC 5370 XML and Data Management

Contents

XML data management Techniques
What is Natix
Natix Architecture
Storage Layer: Logical Data Model
Mapping between XML and the Logical Model
XML page Interpreter Storage Formatter
XML segment mapping for large trees
Index Structures
Natix Physical Algebra
Example Plans
To do...

Page 3: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

XML data management Techniques

Map data to relational database
  But: Unnormalized relations
  Data-centric view: large number of tables
  Document-centric view: all information in a single data item (e.g. CLOB)

Store data as a plain text file
  But: Need to parse the entire file for processing every query

Store data as objects
  But: OOD systems are not developed enough to provide efficient querying capabilities

Designing Native XML database systems from scratch

Page 4: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix

Page 5: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

What is Natix?

Natix is a native XML repository
Proposed by Kanne and Moerkotte at the University of Mannheim (Germany)
Natix requires Linux to run (kernel 2.2.16 or later, or 2.4.*), with CODA support enabled in the kernel.
Still under development

Page 6: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Page 7: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Binding Layer: maps between the Natix Engine Interface and different application interfaces

Page 8: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

e.g. NatixFS:
  File system interface: Natix can be mounted like an ordinary file system
  Allows viewing an XML tree as a file system tree
  Importing a document: just copy it to a directory, e.g. cp bib.xml /natix
  Exporting a document: just open it, e.g. more /natix/bib.xml
  Removing a document: just delete a file, e.g. rm /natix/bib.xml
  XPath expressions: just use one as a file name, e.g. more /natix/{%%title}
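As a rough illustration of the "XML tree as a file system tree" idea, the sketch below lists one path per element of a small document. It is not NatixFS code: the function name `xml_as_paths`, the default mount path, and the `tag[n]` naming of repeated siblings are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET


def xml_as_paths(xml_text, mount="/natix/bib.xml"):
    """Return one file-system-like path per element, in document order.

    Hypothetical helper: the path layout is an assumption, not NatixFS's.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, path):
        paths.append(path)
        counts = {}
        for child in node:
            # Number siblings that share a tag so every path stays unique.
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"{mount}/{root.tag}")
    return paths


if __name__ == "__main__":
    sample = "<bib><book><title>A</title></book><book><title>B</title></book></bib>"
    for p in xml_as_paths(sample):
        print(p)
```

Each element becomes a directory-like entry, which is the view that lets ordinary tools such as cp, more and rm address a stored document.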

Page 9: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Service Layer: provides all DBMS functionality required in addition to simple storage and retrieval
  Natix Engine Interface
  Query execution engine
  Query compiler
  Transaction manager
  Object manager

Page 10: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Natix Engine Interface:
  The interface through which the database services communicate with each other and with applications
  Provides a unified facade to specify requests to the database system.

Page 11: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Query compiler: translates queries expressed in XML query languages into optimized query execution plans

Page 12: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Query execution engine: evaluates queries
  Interprets the plan passed by the query compiler
  Able to execute all queries expressible in a typical XML query language like XQuery

Page 13: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Transaction management: contains classes that provide ACID-style transactions, plus components for recovery
  Adapts the ARIES protocol for recovery
  For synchronization, an S2PL-based scheduler is introduced
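To make the S2PL (strict two-phase locking) idea concrete, here is a minimal single-threaded sketch of a lock table in which a transaction's locks are released only at commit or abort. It is a toy under stated assumptions, not the Natix scheduler: the class `LockManager`, its method names, and the choice to refuse a conflicting request instead of blocking are all hypothetical.

```python
class LockManager:
    """Toy strict 2PL lock table: locks are held until commit or abort."""

    def __init__(self):
        # item -> [mode, set of transaction ids holding the lock]
        self.table = {}

    def acquire(self, txn, item, mode):
        """Try to lock `item` in mode 'S' (shared) or 'X' (exclusive).

        Returns True if the lock is granted, False on a conflict.
        """
        entry = self.table.get(item)
        if entry is None:
            self.table[item] = [mode, {txn}]
            return True
        held_mode, holders = entry
        if holders == {txn}:
            # The sole holder may keep its lock or upgrade it to exclusive.
            if mode == "X":
                entry[0] = "X"
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False  # a real scheduler would block or abort here

    def release_all(self, txn):
        """Release every lock of `txn`; called only at commit or abort."""
        for item in list(self.table):
            holders = self.table[item][1]
            holders.discard(txn)
            if not holders:
                del self.table[item]
```

Because nothing is released before `release_all`, no other transaction can read or overwrite data touched by an uncommitted one, which is the property that distinguishes strict 2PL from plain 2PL.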

Page 14: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Architecture

Storage Layer: manages all persistent data structures and their transfer between main and secondary memory.
  Contains classes for efficient XML storage, indexes and metadata storage.
  Manages the storage of the recovery log and controls the transfer of data between main and secondary storage.
  Accesses raw disks or file system files and provides a memory space divided into segments, which are linear collections of equal-sized pages.
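The "segment as a linear collection of equal-sized pages" description implies simple address arithmetic: page number times page size gives the byte offset. The sketch below shows this with an in-memory buffer. The `Segment` class, its methods, and the 8192-byte default page size are assumptions for illustration; the slides do not state the page size Natix uses.

```python
PAGE_SIZE = 8192  # assumed page size in bytes, not taken from Natix


class Segment:
    """A linear run of equal-sized pages backed by an in-memory buffer."""

    def __init__(self, num_pages, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.num_pages = num_pages
        self.data = bytearray(num_pages * page_size)

    def _offset(self, page_no):
        # Equal-sized pages make the byte offset a single multiplication.
        if not 0 <= page_no < self.num_pages:
            raise IndexError(f"page {page_no} out of range")
        return page_no * self.page_size

    def read_page(self, page_no):
        start = self._offset(page_no)
        return bytes(self.data[start:start + self.page_size])

    def write_page(self, page_no, payload):
        if len(payload) > self.page_size:
            raise ValueError("payload larger than a page")
        start = self._offset(page_no)
        # Overwrite in place; the slice has the same length as the payload.
        self.data[start:start + len(payload)] = payload
```

A real storage layer would back the buffer with a raw disk or a file and move pages through a buffer manager; the arithmetic stays the same.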

Page 15: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Storage Layer: Logical Data Model

Logical Data Model: logical tree
  New nodes can be inserted as children or siblings of existing nodes
  Any node can be removed
  Individual documents are represented as ordered trees

Page 16: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Mapping between XML and the Logical Model

A small wrapper class is used to map the XML model with its node types and attributes to a simple tree model and vice versa:
  Elements are mapped one to one to tree nodes of the Logical Data Model
  Attributes are mapped to child nodes of an additional attribute container child node
  The names of referenced entities are retained in special internal nodes
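The element and attribute rules above can be sketched as follows. This is an illustrative mapping, not the Natix wrapper class: the `Node` type, the `to_logical` function, and the `@attributes` container label are hypothetical names, and entity references are not modeled because the standard-library parser expands them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

ATTR_CONTAINER = "@attributes"  # hypothetical label for the container node


@dataclass
class Node:
    """A node of the simple ordered logical tree."""
    label: str
    text: str = ""
    children: list = field(default_factory=list)


def to_logical(elem):
    """Map an XML element to a logical tree node (elements map one to one)."""
    node = Node(elem.tag, (elem.text or "").strip())
    if elem.attrib:
        # All attributes hang below one extra container child.
        container = Node(ATTR_CONTAINER)
        for name, value in elem.attrib.items():
            container.children.append(Node(name, value))
        node.children.append(container)
    for child in elem:
        node.children.append(to_logical(child))
    return node


if __name__ == "__main__":
    tree = to_logical(ET.fromstring('<book year="1999"><title>XML</title></book>'))
    print(tree)
```

Keeping attributes under a single container child means the logical model needs only one kind of node, at the cost of one extra level in the tree.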

Page 17: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

XML page Interpreter Storage Formatter

The logical data tree is partitioned into subtrees
Each subtree is stored in a single record of variable length
Each record contains a pointer to the record containing the parent node and the document identifier

Page 18: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

XML page Interpreter Storage Formatter

Subtrees of the original XML document are stored together in a single physical record
Clusters connected subtrees of the document tree into large records and represents intra-record references differently from inter-record references
The inner structure of the subtrees is retained

Page 19: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

XML segment mapping for large trees

Proxy nodes refer to connected subtrees not stored in the same record
Helper aggregate nodes group together a subset of children of a node
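The role of proxy nodes can be sketched with a simple greedy partitioning: whenever a subtree exceeds a record's capacity, its largest child subtree is moved to a record of its own and replaced by a proxy. This is an illustration under assumptions, not the Natix split algorithm: capacity is counted in nodes, the names `Node`, `Proxy`, `size` and `partition` are hypothetical, and helper aggregate nodes are not implemented (the sketch raises an error where they would be needed).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


@dataclass
class Proxy:
    record_id: int  # record that holds the subtree this proxy stands for


def size(n):
    """Number of nodes stored inside one record (a proxy counts as one)."""
    return 1 if isinstance(n, Proxy) else 1 + sum(size(c) for c in n.children)


def partition(root, capacity):
    """Split a tree into records of at most `capacity` nodes each.

    Returns (records, root_record_id). The input tree is modified in place.
    """
    records = {}

    def store(subtree):
        rid = len(records)
        records[rid] = subtree
        return rid

    def shrink(node):
        # Bottom-up: make every child subtree fit before looking at the parent.
        for child in node.children:
            shrink(child)
        while size(node) > capacity:
            candidates = [i for i, c in enumerate(node.children)
                          if not isinstance(c, Proxy)]
            if not candidates:
                # Too many children for one record: this is the case that
                # helper aggregate nodes address; not handled in this sketch.
                raise ValueError("fan-out exceeds record capacity")
            i = max(candidates, key=lambda k: size(node.children[k]))
            node.children[i] = Proxy(store(node.children[i]))

    shrink(root)
    return records, store(root)
```

Inside a record the subtree keeps its structure; only the edge that crosses a record boundary is represented by a `Proxy`, mirroring the different treatment of intra-record and inter-record references.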

Page 20: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Index Structures

Natix uses two index structures:

Full text index framework (inverted files): stores lists of document references to indicate in which documents search terms appear
  Index: maps search terms to list identifiers and stores these mappings persistently; provides the main interface for the user to work with inverted files
  List Manager: maps the list identifiers to the actual lists (managing the directory of the inverted file)
  FragmentedList: lists are divided into fragments that fit on a page, are linked together, and can be traversed sequentially; it manages all the fragments of one list and controls insertions and deletions on this list
  ContextDescription: establishes the actual representation in which data is stored in a list

eXtended Access Support Relation (XASR): preserves the parent/child, ancestor/descendant, and preceding/following relationships between nodes
  The XASR combined with a full text index provides a powerful method to search on the contents of nodes
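How a structural relation and an inverted file can work together is sketched below. Each node gets two numbers from a depth-first traversal, one on entry (`dmin`) and one on exit (`dmax`), so that ancestor tests become interval comparisons; an inverted list maps each term to the nodes containing it. The row layout, the numbering scheme as coded here, and the names `build`, `is_ancestor` and `search` are assumptions for illustration and should not be read as the exact Natix schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build(xml_text):
    """Return (xasr_rows, inverted_lists) for one document."""
    root = ET.fromstring(xml_text)
    xasr = []                     # rows: dmin, dmax, parent dmin, element name
    inverted = defaultdict(list)  # term -> dmin values of nodes containing it
    counter = 0

    def visit(elem, parent_dmin):
        nonlocal counter
        counter += 1  # number assigned when the node is entered
        row = {"dmin": counter, "dmax": None,
               "parent": parent_dmin, "name": elem.tag}
        xasr.append(row)
        for term in (elem.text or "").lower().split():
            inverted[term].append(row["dmin"])
        for child in elem:
            visit(child, row["dmin"])
        counter += 1  # number assigned when the node is left
        row["dmax"] = counter

    visit(root, None)
    return xasr, inverted


def is_ancestor(a, d):
    """a is an ancestor of d iff d's interval lies strictly inside a's."""
    return a["dmin"] < d["dmin"] and d["dmax"] < a["dmax"]


def search(xasr, inverted, ancestor_name, term):
    """dmin of nodes that contain `term` and lie below an `ancestor_name` element."""
    by_dmin = {r["dmin"]: r for r in xasr}
    hits = [by_dmin[d] for d in inverted.get(term, [])]
    ancestors = [r for r in xasr if r["name"] == ancestor_name]
    return [h["dmin"] for h in hits
            if any(is_ancestor(a, h) for a in ancestors)]
```

The same rows answer parent/child queries through the `parent` column and preceding/following queries by comparing one node's `dmax` with another's `dmin`.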

Page 21: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Physical Algebra

‘Let’, ‘for’, ‘where’ and ‘return’ in XQuery are supported

‘Select’, ‘map’, ‘join’, ‘grouping’ and ‘sort’ operations are performed by standard algebraic operators borrowed from the relational context

‘D-join’ and ‘unary and binary grouping’ are borrowed from the object-oriented context

Page 22: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Natix Physical Algebra

Scan operations: e.g. ExpressionScan
  ExpressionScan: generates a tuple containing the root of the document identified by its name by evaluating a given expression

UnnestMap is used to generate variable bindings for XPath expressions
  e.g. /a//b/c
  UnnestMap$4=child($3,c)(UnnestMap$3=desc($2,b)(UnnestMap$2=child($1,a)([$1])))

‘BA-Map’, ‘FL-Map’, ‘Groupify-GroupApply’ and ‘NGroupify-NGroupApply’ are used to construct the XML result
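The nested UnnestMap plan for /a//b/c can be mimicked with generators: each operator takes a stream of tuples, evaluates one location step on an existing attribute, and emits one extended tuple per resulting node. This is a sketch of the idea only; the function names `child`, `desc` and `unnest_map`, the dictionary representation of tuples, and the synthetic `#document` node are assumptions, not the Natix operator interface.

```python
import xml.etree.ElementTree as ET


def child(name):
    """Location step: children of a node with the given tag."""
    return lambda node: [c for c in node if c.tag == name]


def desc(name):
    """Location step: proper descendants of a node with the given tag."""
    # Element.iter() can yield the node itself, so it is filtered out.
    return lambda node: [d for d in node.iter(name) if d is not node]


def unnest_map(new_attr, step, src_attr, tuples):
    """For each input tuple, bind new_attr to every node the step returns."""
    for t in tuples:
        for node in step(t[src_attr]):
            yield {**t, new_attr: node}


def plan_a_desc_b_c(xml_text):
    """Evaluate /a//b/c as three nested UnnestMap operators."""
    doc = ET.Element("#document")  # stands in for the document node $1
    doc.append(ET.fromstring(xml_text))
    start = [{"$1": doc}]
    step2 = unnest_map("$2", child("a"), "$1", start)
    step3 = unnest_map("$3", desc("b"), "$2", step2)
    return unnest_map("$4", child("c"), "$3", step3)
```

Each output tuple still carries `$1` through `$3`, so later operators in a plan can refer to any of the intermediate bindings.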

Page 23: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Example Plans (1):

This query retrieves the title and the year for all recent books

Page 24: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Example Plans (2):

Page 25: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

To do...

Support for functions inside XPath expressions
Cannot import DTDs as of now
Support for different character encodings
Support for XML namespaces

Preparing for the launch of the first full commercial end-user release of Natix that may support all these features

Page 26: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Questions?

Page 27: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

References

Natix: A Technology Overview: http://pi3.informatik.uni-mannheim.de/publications.html#79
Efficient storage of XML data: http://pi3.informatik.uni-mannheim.de/publications.html#79
Anatomy of a Native XML Base Management System: http://pi3.informatik.uni-mannheim.de/publications.html#79
Algebraic XML Construction and its Optimization in Natix: http://pi3.informatik.uni-mannheim.de/publications.html#79
Data ex machina: www.dataexmachina.de/natix.html

Page 28: Natix Done by Asmaa Hassanain CSC 5370 Dr. Hachim Haddoutti 12/8/2003

Thank You