xml programming with sql/xml and xqueryquery language, xquery. in this paper, we describe an...

XML programmingwith SQL/XMLand XQuery

by J. E. FunderburkS. MalaikaB. Reinwald

Most business data are stored in relationaldatabase systems, and SQL (StructuredQuery Language) is used for data retrievaland manipulation. With XML (ExtensibleMarkup Language) rapidly becoming the defacto standard for retrieving and exchangingdata, new functionality is expected fromtraditional databases. Existing SQLapplications will evolve to retrieve relationaldata as XML data using database or SQLextensions for XML. New XML data will bestored, searched, and manipulated in thedatabase as a “first class” citizen along withexisting relational data. Furthermore, newapplications will emerge that solely operate interms of XML. These new XML applicationsoperate on the same database using an XMLquery language, XQuery. In this paper, wedescribe an integrated database architecturethat enables SQL applications with XMLextensions as well as XQuery applications tooperate on the same data. The architectureallows for a seamless flow from relationaldata to XML and back.

The recent expansion of the Internet and inventionof the XML (Extensible Markup Language) data for-mat has created both the opportunity and the needfor businesses to exchange information, and tointeroperate in a uniform way that has not beenachieved nor been possible before except in isolatedsegments within the business community. Today’seconomic practices often create businesses and bus-iness processes formed from many incompatible sys-tems. The need to integrate and exchange data within

a company is as profound as it is between compa-nies. Much of the data being exchanged are oper-ational: data that enable transactions, determine thecourse of business processes, and in the aggregate,become business intelligence data that affect deci-sions of business leaders.

The XML data format provides a way of regularizingthe storage of semi-structured data, historical data,and other information requiring content manage-ment. XML can be used to store the content itselfand data mined from the content. Data mined fromthe content can be used to form catalogs, similar inconcept to card catalogs in libraries, which containexistence and location information and possibly otherinteresting summary information. Information minedfrom content can be stored and used in business orscientific intelligence queries.

Information that is low in quantity and importancecan be stored using a variety of simple techniques.However, business-critical data and data in largequantities require a data storage system that canproperly manage the data. Relational database man-agers fulfill vital responsibilities in complex informa-tion systems by consolidating storage and distribu-tion of data. They provide a uniform, high-functioninterface and support other features such as secur-ity, data consistency, control, and the regularizationand automation of backup procedures. These re-

�Copyright 2002 by International Business Machines Corpora-tion. Copying in printed form for private use is permitted with-out payment of royalty provided that (1) each reproduction is donewithout alteration and (2) the Journal reference and IBM copy-right notice are included on the first page. The title and abstract,but no other portions, of this paper may be copied or distributedroyalty free without further permission by computer-based andother information-service systems. Permission to republish anyother portion of this paper must be obtained from the Editor.

FUNDERBURK, MALAIKA, AND REINWALD 0018-8670/02/$5.00 © 2002 IBM IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002642

sponsibilities have been established and relied uponfor many years; most systems are already in placeand are entrenched. It is vital that existing data storesand business processes be extended to support XMLtechnologies in a variety of ways. Systems integra-tors need to interact with existing data, implement-ing new business processes using XML without theneed to write and continually rewrite low-level codeto interact between XML data and the interface tech-nology supported by databases.

The most basic requirement is that data alreadystored in relational database systems must be pub-lishable as XML. Subsets of the data (or, in general,the results of queries) must be able to be formed intoXML documents and sent to applications or consum-ers elsewhere. Usually, XML documents must matcha particular XML format or schema. Such formats arespecified by DTD (Document Type Definition), XMLSchema, or by documentation. Existing data mustbe transformed into forms that comply with the stan-dard of interchange. These forms may or may notbe under the system integrator’s control and oftenevolve over time. The ability to adapt easily is re-quired.

XML programming model evolution

SAX (the simple API [application programming in-terface] for XML)1 was the first popular interface forXML programming. A SAX application was a set ofevent handlers, each called when the parser encoun-tered an element or some text in the document. With-out any control over its execution, one had to buildXML applications as state machines in order to doany nontrivial work. SAX became a de facto standard,with the Java** language being the official bindingof the API. Non-Java language bindings cropped up,but they were particular to a specific implementa-tion of a SAX parser.

DOM (Document Object Model)2 followed and pro-vided a navigational interface that standard program-ming languages could use. A DOM application wasin control of its execution flow. This made the ap-plication a normal program that could be organizedusing familiar approaches. Navigation is performedwith individual steps using DOM function calls (suchas getFirstChild� and getNextSibling�). Processing a doc-ument while validating its semantics or syntax (sup-plementing a DTD) often required a rather verboseand somewhat fragile program. Evolution of thestructure of the input document, although a prob-lem under any circumstances, required a detailed ad-

justment of individual navigation steps and loopswritten in a mix of programming language statementsand DOM interface calls. Without standard languagebindings, DOM applications were, by default, trivi-ally dependent on a particular parser. Traditionally,DOM-capable XML parsers have to read the entireXML document into memory.

XSLT (Extensible Stylesheet Language Transforma-tions)3 1.0 arrived, was the first recognized XML pro-gramming language, and was substantially complete.An XSLT program transforms its input document intoan output document using a programming constructcalled templates. The input document is read intomemory, and traversed. At each point, template-matching patterns are tested. When a match is found,the template body is executed and a portion of theoutput document is formed. A sublanguage calledXPath is used to form template-matching patternsand to navigate within the input document tree, se-lecting the desired element and attribute data to usein expressions to form output or conditional logic.XPath allows several navigations to be performedin succession and various predicates can be appliedat each step. XSLT processes the input documentwithout type information and therefore processesdata in the document as text. It is also possible toexplicitly form expressions that perform basic oper-ations on double floating-point numbers.

Because XSLT uses optimistic recursion when tryingto find matching templates, it is the responsibility ofthe programmer to make sure that templates will onlymatch where they are intended. Longer patterns aremore restrictive than shorter ones, but programs areoften written with short patterns. XSLT determineswhich template matches based on whether the pat-tern matches, the selectivity of the pattern, the modeof the template, and the set of nodes that were se-lected for matching. Templates can be imported fromvarious sources. Adding templates to an import filecould affect the output of programs. In addition, im-ports can be nested, and imported templates havea precedence scheme. Programmers must expend ef-fort to maintain control over the power of templatesin nontrivial programs. Even so, XSLT processors giveXML programmers freedom from the drudgery ofSAX and DOM programming. XSLT programs workvery well with document-oriented XML documentssuch as XHTML,4 but for program-to-program, data-oriented operations, the potential for error and lackof data typing is a concern.

IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002 FUNDERBURK, MALAIKA, AND REINWALD 643

XML Schema5 was developed to add data types toXML and to provide better document validation thanDTDs.6 XML Schema is also important because DTDsdo not support XML Namespaces.7 Because XMLNamespaces are an integral part of the XSLT pro-cessing model, XSLT processors typically did not val-

idate their input documents. XML Namespaces nowplay a part in any new XML effort. XML Schema ora suitable replacement is a necessity.

SOAP (Simple Object Access Protocol)8 is an XMLmessaging format. It defines a simple standard en-velope for messages, which can be used in a widevariety of protocols. When combined with HTTP (Hy-perText Transfer Protocol), it is lightweight and ac-cessible, eliminating the difficulties that must be facedwhen using other protocols.

At this time, we have XSLT processors running XSLstylesheets, and handwritten DOM and SAX applica-tions. The overall processing model is one where onestep processes a whole document and then dumpsthe output document out to a file or sends it overthe wire to another step that processes the documentas input. Integrators and Webmasters use a varietyof small, single-function processors and APIs for in-teracting between XML and databases, messaging sys-tems, and Web servers.

What is the next advance in XML programming? Itis language. Language has been a factor in almostevery advance in software and data management.Language attracts and draws in. The SQL (StructuredQuery Language) is largely responsible for the re-placement of hierarchical and navigational databasesby relational ones. Even in the face of greater ef-ficiency and performance, on average, successful lan-guages win over nonlanguage approaches. However,language is only useful within the context of its envi-ronment. Because of this, the next advance in XMLprogramming will be XML-related languages com-bined with XML databases/data stores. Whereassingle-document, file-oriented, processors are use-ful tools, an XML language is much more useful when

combined with a database—just as SQL is. Even at apragmatic level, combining the two affords better func-tion, support, and performance. An XML-capable da-tabase manager can become a central highway in thenext generation of XML programming. Capability willbe measured in terms of language.

Using relational data in a world ofhierarchical messages

The use or transmission of data in a system requiresrepresentation of the data values and the semanticrelationships among the values. The modeling of se-mantic relationships varies most among systems. Inaddition to defining a way to model semantic rela-tionships, most systems also require abiding by thecardinality rules for storing data and maintaining asemantic relationship.

In relational systems, data are separated accordingto cardinalities and according to logical dependen-cies (normalization). Separate tables are requiredto store a single, but nontrivial, piece of knowledge.Data in each table form regular records, called tuples,that contain the same number of fields. Data in eachof the separate tables are correlated by column val-ues that serve as keys. Key values may represent ac-tual external data or be manufactured values solelyfor the purpose of correlating tuples. The organi-zation of related data into tables is dependent oncardinalities in the following way.

1. Data that are one-to-one with respect to otherdata can be stored in the same tuple.

2. Data that are one-to-zero with respect to otherdata can be stored in the same tuple, with nullrepresenting not-present. The data can be storedin a separate table as well.

3. Data that are one-to-many with respect to otherdata must be stored in a separate table.

4. Data that are many-to-many with respect to otherdata must be stored in another table, and a cor-respondence table must be created to record thecorrespondence between the two tables by stor-ing key pairs from the respective tables.

Simple relationships are maintained by storing datain the same tuple. Complex relationships are main-tained by using separate relations and keys. All thedata of a category (by cardinality or logical depen-dence) are stored in the same table. This places in-formation together that is unrelated, but which hasthe same type and structure properties.

What is thenext advance

in XML programming?It is language.

FUNDERBURK, MALAIKA, AND REINWALD IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002644

In the relational world, it is as easy to query all theauthors a book has as it is to query all the books anauthor has written. The data objects and relation-ships between data are in a separate domain of meta-data, and such objects and relationships must be un-derstood and materialized by an application. Theapplication does this by utilizing the correct tablenames and by creating the appropriate “join” expres-sions. Utilizing a set of tables at a server requirestoo much involvement with the names and data de-sign for a particular server. A table or set of tablesdoes not make a very appealing message format, be-cause the data require too much correlation and in-terpretation by the recipient.

In the XML-centric world, the parent/child elementand the element/attribute relationships are used toencode semantic relationships. A specific documentmust choose a specific hierarchy that models the re-lationships that are meaningful in the specific com-munication. Data cardinality also influences XMLschema design. Here are the most common gener-alities for XML.

1. Data that are one-to-one with respect to otherdata can be stored in attributes of an element ordescendant elements of the same element—ei-ther as element content or in attributes.

2. Data that are one-to-zero with respect to otherdata can be stored in the same way as the one-to-one case, but the element or attribute housingthe data can be absent or the contents of the nodecan be empty.

3. Data that are one-to-many with respect to otherdata can be stored in repeating child elements ofthe element representing the other data.

4. Data that are one-to-many with respect to otherdata can be stored in child elements where dataare duplicated. Data can also be stored in sep-arate hierarchies that are cross-linked by identi-fiers, or some other application convention.

In the XML world, relationships are encoded withinthe document itself by using parent/child relation-ships and element/attribute relationships. Data thatare related are adjacent. Data appear when the dataare needed, and the structure varies accordingly. Thismakes an XML document more self-contained andrequires less interpretation. The XML format doeshave its downside. An XML document originates incharacter form, is organized in a particular hierar-chy that favors particular relationships, and may du-plicate various facts. Because of this, the XML data

format is not desirable as a method of storing datathat need to be correlated with many other collec-tions of data, in many relationships, or for storingdata that are updated.

Any system that provides many views of data in var-ious relationships must be able to efficiently mate-rialize results at run time. Typical queries form joinsand other various predicates and projections thatcombine and form subsets from the data. Stored datathat support these kinds of queries must be readilyaccessible, in a digested, typed format ready for com-parison or computation. Such queries need the as-sistance of indexes to perform well.

Thus, often the right way to store data is relationalbut it is not the right way to encode messages. Thedata manager must enable the creation of any ar-bitrary number of specific mappings between rela-tional data and XML, and make these mappings ortransformations invokeable by applications. Figure1 depicts the structure of a database manager thatis designed to support both XML and relational re-quests. SQL or XQuery requests are accepted, andspecific APIs can be chosen by the application to pro-cess the result.

APPLICATION INTERFACE

SOAPDOMSAXJDBC STATIC SQLODBC

RELATIONAL STORAGE WITH XML COLUMN SUPPORT

SQL AND SQL/XML LANGUAGE

XQUERY LANGUAGE

DATABASE META-DATA,

XML META-DATA,

DATABASE CONFIGURATION

HTTPOR

MSGQUEUE

QUERY / TRANSFORMATON PROCESSOR

TRANSACTION SUPPORT, DATA MANAGEMENT

Figure 1 XML/relational database manager


SQL extensions for XML

In this section, we describe how traditional SQL ap-plications can evolve to deal with XML data. Build-ing on the relational data model with SQL and ex-isting data access protocols, we introduce SQLextensions for XML to construct XML data from re-lational data, as well as store, query, and retrieve XMLdata. Some of the SQL extensions described here areprovided by DB2* (Database 2*) XML Extender, someare proposed for the ANSI/ISO (American NationalStandards Institute/International Organization forStandardization) SQL standard (SQL/XML),9,10 andothers are discussed in workgroups or exist in pro-totype implementations. ANSI/ISO approved a proj-ect for a new part of SQL:200n, part 14, XML-RelatedSpecifications (SQL/XML).

An informal group of companies, called SQLX (http://www.sqlx.org) including IBM, Microsoft, Oracle, andSybase began to define XML extensions for SQL inearly 2000. The group focuses on SQL capabilitiesand consciously avoids vendor extensions while en-couraging state-of-the-art and projected future de-velopments. SQLX forwards proposals to the INCITS(InterNational Committee for Information Technol-ogy Standards) H2 Database Committee for approval.

IBM’s vendor extensions were incorporated in the lat-est version of DB2 XML Extender. XML Extender addsXML functionality to IBM DB2 databases through aset of user-defined types, user-defined functions, andstored procedures. This section is not intended tocompare SQL/XML with XML Extender. A compar-ison would not be useful, because the currentSQL/XML proposal only contains the first phase of ex-tensions, and more extensions are in preparation.Both technologies are presented as partly overlap-ping approaches with similar objectives. The line be-tween SQL/XML and XML Extender is kept fuzzyintentionally, with the projection that these technol-ogies will merge in the course of time.

XML publishing functions. Relational data are theuniversal backbone of any business. With XML as auniversal data exchange format, the capability of con-structing XML data from existing relational data,while preserving the power of SQL, tremendously sim-plifies business-to-business (B2B) application devel-opment. In this subsection, we introduce a list of sca-lar and aggregate functions as SQL extensions:

● XMLElement and XMLAttributes: construct XMLelements with attributes

● XMLForest: constructs a sequence of XML ele-ments

● XMLConcat: concatenates XML elements● XMLAgg: aggregates XML elements● XMLGen: constructs XML according to an XML el-

ement constructor specification

XMLElement constructs a new XML element withattributes and content. The attribute names and val-ues are specified in XMLAttributes through columnnames or aliases for value expressions. Element con-tent is constructed from a variable list of value ex-pressions. The result of the value expressions ismapped from SQL to XML according to the mappingrules specified in SQL/XML.9 SQL/XML defines map-ping rules to map SQL identifiers and XML identi-fiers, SQL data types and XML schema types, and SQLdata and XML data on a value, table, schema, andcatalog level. An example for mapping an SQL tableto XML is discussed in the subsection, “ExampleXQuery view.”

In a supply chain sample scenario, a company storesorders and orderItems in relational tables. A businesspartner posts through a secure Internet connectiona query to retrieve all open orders:

SELECT XMLELEMENT (NAME “order”,XMLATTRIBUTES (o.oid AS “id”),XMLELEMENT (

NAME “signdate”,o.contractdate

),XMLELEMENT (

NAME “amount”,(SELECT SUM(orderitem)FROM orderItems AS oiWHERE i.oid � o.oid)

))

FROM orders AS oWHERE status � ‘open’;

The query returns a result set with two rows, witheach row containing an XML value:

�order id�“4711”�

�signdate�2002-03-18�/signdate�

�amount�24000�/amount��/order�





XMLForest simplifies queries as it constructs se-quences of XML elements from SQL value expressionsin the order of the expressions. XMLForest is a short-hand for a sequence of XMLElement invocations.XMLForest takes a variable list of SQL value expres-sions as input, and produces for each expression anXML Element with the column name or alias of theexpression as the tag name, and the value of theexpression as the element content.

SELECT XMLELEMENT (NAME “order”,XMLFOREST (

o.oid AS “id”,o.name AS “name”,o.city AS “city”

))

FROM orders AS oWHERE status � ‘open’;

�order��id�4711�/id�

�name�steel company��/name�

�city�Hamburg�/city�

�/order�

�order��id�4712�/id�

�name�beer company�/name�

�city�Munich�/city�

�/order�

XMLConcat takes a variable number of XML valueexpressions and constructs a single XML value as asequence of XML values. This function is used to con-struct an XML element from pieces of independentlyconstructed XML. XMLConcat is a scalar function.The following example produces the same output asthe previous example.

SELECT XMLELEMENT (NAME “order”,XMLCONCAT (

XMLELEMENT(NAME “id”,o.oid

),XMLELEMENT(

NAME “name”,o.name

),

XMLELEMENT(NAME “city”,o.city

),)

)FROM orders AS oWHERE status � ‘open’;

XMLAgg is an aggregate function, which constructsan XML value from a collection of XML value expres-sions. XMLAgg resolves the 1:n relationships in XML.The following example retrieves information aboutan order including its orderItems.

SELECT XMLELEMENT(NAME “order”,XMLATTRIBUTES(o.oid AS “id”),XMLAGG(

XMLELEMENT(NAME “item”,XMLATTRIBUTES(

oi.listnbr AS “listnbr”),XMLFOREST(

oi.name AS “name”,oi.quantity AS “quantity”

))ORDER BY oi.listnbr

))

FROM orders AS o, orderItems AS oiWHERE o.oid � oi.oidGROUP BY o.oid;

This query returns the result:


�item listnbr�“1”�

�name�bike�/name�

�quantity�10�/quantity�

�/item�

�item listnbr�“2”�

�name�racket�/name�

�quantity�5�/quantity�

�/item�

�/order�

Mapping relational data to XML in DAD description.An alternative method for publishing XML docu-ments from relational tables is through the DB2 XMLExtender Document Access Definition (DAD), whichitself is an XML document. The DB2 XML Extender11


is a component of DB2 that provides various formsof XML support.

Two distinct notations can be used in DADs to de-scribe how to map from relational tables to XML.

● SQL Composition: A notation that incorporates anSQL SELECT statement, followed by instructions onhow the resulting rows should be tagged as XML

● RDB (relational database) Node: A notation thatincludes a list of the tables whose contents are tobe tagged as XML, together with the primary-for-eign key relationships between the tables. As inSQL Composition, the list of tables is followed byinstructions on how the contents (or more typicallya subset of the contents) should be tagged as XML.

There are seven steps to consider when generatinga DAD for publishing.

1. Scoping the document content2. Shaping the document structure3. Mapping the relational content to the document4. Controlling the number of documents generated5. Outputting document header information6. Validating the generated documents7. Transforming the generated documents (e.g., to

HTML, HyperText Markup Language)

We describe in detail the steps required to producethe following XML documents from relational data:



�amount�24000�/amount��/order��order id�“4712”�



Scoping the content of the generated documents. Thefirst part of the SQL Composition DAD contains anSQL SELECT statement that retrieves all the rowswhose content is required in the generated docu-ment. The SQL statement can include join operations,subselects and SQL functions. For example,

�SQL_stmt�SELECT

o.oid AS id,o.contractdate AS cdate,SUM(oi.orderitem) AS total,

FROM orders AS o, orderItems AS oi

WHERE oi.oid � o.oid AND status � ‘open’ORDER BY id;

�/SQL_stmt�

The first part of the RDB_node contains a list of ta-bles whose rows form the content of the generateddocument. The key relationships between the tablesare listed in the scoping portion of the DAD. Forexample,

�RDB_node�

�table name�“orders” key�“oid”/��table name�“orderItems” key�“oid”/��condition�

orders.oid�orderItems.oid�/condition�

�/RDB_node�

Another condition can be added to restrict the doc-uments generated to those that have an ‘open’ sta-tus by inserting the following check into the DAD:�condition�status�‘open’�/condition�.

Shaping the structure of the generated documents. Forboth the SQL Composition DAD and RDB Node DAD,the shape of the output document is governed by thestructural tag layout in the second part of the DAD.Multiple hierarchies can be generated in a single doc-ument, and the way elements repeat can be con-trolled. Following is an example for SQL Composi-tion:

�root_node�

�element_node name�“order”�

�attribute_node name�“id”�

�column name�“id”/��/attribute_node�

�element_node name�“signdate”�

�text_node�

�column name�“cdate”/��/text_node�

�/element_node�

�element_node name�“amount”�

�text_node�

�column name�“total”/��/text_node�

�/element_node�

�/element_node�

�/root_node�

Mapping the relational content to XML. For both theSQL Composition DAD and RDB Node DAD, the map-ping is governed by instructions that appear along-side the structural tags in the second part of the DAD.


The tag layout shown in the subsection on shaping,above, represents the mapping information for theSQL Composition example.

Controlling the number of documents generated. ForSQL Composition, the number of documents pro-duced can be controlled by the SQL statement in theDAD. The number of documents produced is equalto the number of rows grouped by the first groupingexpression. For RDB_node, the number of documentsproduced can be controlled by the options suppliedon the root element in the DAD. Suppose we wantedto produce a single document as follows:

�orders�







�/orders�

For RDB Node notation, we would use the multi_oc-currence option in the DAD to cause a single doc-ument to be produced instead of two. Following isan example of how the option is specified:

�element_node name�“orders”�

�element_node name�“order”multi_occurrence�“yes”�


�RDB_node�

�table name�“orders”/��column name�“oid”type�“varchar(20)”/�

�/RDB_node�

�/attribute_node�

�/element_node�

�/element_node�

Outputting document header information. Header in-formation such as XML declarations, DTD references,and processing instructions can be generated throughthe statements in the DAD. Following are some ex-amples:

�prolog�?xml version�“1.0”?�/prolog�

�doctype�

!DOCTYPE Order SYSTEM “orders.dtd”�/doctype�

Validating the generated documents. For both SQLComposition and RDB Node, it is possible to vali-date the generated documents against an XMLschema or a DTD. For XML document validation thereare three options.

1. Use the validation option in the DAD as follows:

�dtdid�order.dtd�/dtdid�

�validation�yes�/validation�

DB2 XML extender will perform the validationagainst a DTD stored in the file system, or storedin a special table called the DTD_REF table.

2. Use the dvalidate UDF (user-defined function) asfollows:

db2xml.dvalidate( doc, dtd )

3. For schema validation, the svalidate UDF is avail-able as follows:

db2xml.svalidate( doc, xmlschema )

Transforming the generated documents. It is possibleto apply further transformations to the generateddocuments, for example to convert them to HTML.There are a number of ways to transform.

1. Place an XSL processing instruction in the headerinformation.

2. Use the XML Extender-supplied XSLT UDF, forexample XSLTransformToClob( xmldoc, stylesheet,parameters, validate ). A CLOB is a character largeobject.

In this subsection, we have seen DAD fragments only.Following is a complete SQL Composition DAD:

�?xml version�“1.0”?�

�!DOCTYPE DAD SYSTEM “dad.dtd”�

�DAD�

�validation�no�/validation�

�Xcollection�

�SQL_stmt�SELECT

o.oid AS id,o.contractdate AS cdate,SUM(oi.orderitem) AS total,

FROM orders AS o, orderItems AS oiWHERE oi.oid � o.oid AND status � ‘open’ORDER BY id;


�/SQL_stmt��prolog�?xml version�“1.0”?�/prolog�

�doctype�

!DOCTYPE Order SYSTEM “orders.dtd”�/doctype�

�root_node�

�element_node name�“order”�


�column name�“id”/��/attribute_node�

�element_node name�“signdate”�

�text_node�

�column name�“cdate”/��/text_node�

�/element_node�

�element_node name�“amount”�

�text_node�

�column name�“total”/��/text_node�

�/element_node�

�/element_node�

�/root_node�

�/Xcollection�

�/DAD�

Publishing XML documents from a DAD. Having cre-ated a DAD, using a text or XML editor, or throughthe WebSphere* Studio application developmentfamily of tools,12 it is necessary to invoke aDB2-supplied stored procedure to produce one ormore XML documents. The stored procedures, whichare a component of the DB2 XML Extender,11 makeit possible to select whether:

● The generated documents are placed in permanenttables, temporary tables, WebSphere MQ queues,or in memory.

● The generated documents should be validated.● A maximum limit should be set on the number of

documents produced.● The content of the DAD should be overridden.

Following is an example of a stored procedure in-vocation to generate XML documents from a DAD:

EXEC SQL CALL dxxGenXML(:orderdad:orderdad_ind;:result_tab:rtab_ind,:result_colname:rescol_ind,:valid_colname:val_ind,:overrideType:ovtype_ind,:override:ov_ind,:max_row:maxrow_ind,:num_row:numrow_ind,

:returnCode:returnCode_ind,:returnMsg:returnMsg_ind

);

where the input parameters are:

● orderdad: contains the DAD● result_tab: contains the name of a table where the

documents to be published (in result_colname) areplaced together with an indication (in valid_colname)for each document whether it is valid or not (if val-idation was requested in the supplied DAD)

● overrideType: contains an indication of the type ofscoping override to be supplied in this request.There are three possible values: (1) SQL override:to override the SQL statement in a SQL composi-tion DAD, (2) location path restrictions (an XPathsubset): to override the values included inRDB_node-generated documents, and (3) no over-ride.

● override: contains either one SQL statement or a se-ries of location path restrictions

● max_row: the maximum number of rows to be re-turned.

The main output parameters are the actual numberof rows returned, and the result table populated withdocuments and validation indicators.

DB2 supplies a number of stored procedures for doc-ument generation. Table 1 shows the major storedprocedures.

Storing XML in the database. SQL provides exten-sibility features such as UDTs (user-defined types) andUDFs to extend the system. DB2 XML Extender pro-vides a character-based UDT for XML, and a collec-tion of UDFs to operate on XML. SQL extensions areproposed for SQL/XML with a built-in XML SQL datatype.9 The SQL/XML data type is more generic (flex-ible) and provides XML-specific functionality, ratherthan the XML extender data types that are built us-ing the SQL UDT and character data type function-ality.

For example, an application can create a table “or-ders” with a column “purchaseOrder” of data typeXML with the request:

create table orders ( oid integer,customer varchar(20), purchaseOrder XML );

An SQL INSERT request is used to store an XML pur-chase order document in the table. XMLParse is used


to “cast” an XML character string to an instance ofthe XML data type. An explicit XMLParse is intro-duced as opposed to using cast syntax, as additionalparameters for whitespace handling or other optionsmight be provided. Furthermore, XMLParse per-forms much more than just switching a type indica-tor. It might be an expensive operation to check forwell-formedness of the document.

INSERT INTO ordersvalues (1000, ‘Steel Inc.’, XMLParse(‘�?xml

version��purchaseOrder� . . . �/purchaseOrder�’);

An additional function, XMLValidate, is being de-fined to perform XML Schema validation on an XMLinstance.

Retrieving an XML instance requires serialization ofthe XML instance into a particular format and en-coding. XMLSerialize provides options to deal withdata type issues, encoding, and so on.

SELECT XMLSerialize(purchaseOrder)FROM ordersWHERE oid � 1000;

XMLSerialize produces a character string (CHAR,VARCHAR, or CLOB) on the database server side. Be-sides XMLSerialize, host language mapping rules willbe defined to retrieve XML values into a client ap-plication as serialized XML or even a DOM document.

The DB2 XML Extender11 provides two ways of stor-ing XML data in DB2:

● XML Column: where the XML data are stored in-tact with optional hierarchical indexes for speedysearch

● XML Collection: where the data are shredded intorelational form and the tags are removed. XML Col-lections are sets of relational columns (whose datacontain no XML tags) within one or more relationaltables that can be composed into XML documentsor processed in a routine way by regular relationaltools and applications.

In the previous subsection, a DAD was used to mapfrom relational data to XML data when publishingXML documents from traditional relational databasecontent. In this subsection, we illustrate that the DADalso maps from XML data to relational data whenstoring XML data in DB2.

In the case of XML Column, the DAD defines the in-dexes (known as side tables) that DB2 builds andmaintains, as documents are inserted and modifiedby DB2 XML Extender. Location path notation, a re-stricted form of XPath, is used in the DAD to specifythe portions of the document that are to be indexed.

In the case of XML Collection, the DAD defines themapping between XML content (elements and at-tribute values) and relational columns across manytables. As documents are shredded or are generated,DB2 consults the relevant DADs to determine how toproceed.

XML column storage. Intact XML documents, incor-porating all the tags, can be stored in three user-de-fined types in relational tables:

1. XMLVarchar: documents stored in DB2 and up to3K in length

2. XMLCLOB: stored in DB2 and up to 32K in length3. XMLFILE: documents stored in the local file sys-

tem

Table 1 DB2 stored procedures for document generation

Publishing StoredProc Name

Features

dxxGenXML� Documents are placed in a permanent or temporary table.dxxRetrieveXML� The name associated with the DAD (known as a collection name) is

supplied as input instead of the DAD itself.

dxxmqGen� Documents are placed in or retrieved from a WebSphere MQ queue.dxxmqRetrieve�

dxxGenXMLClob� One document is placed in or retrieved from an output parameter clob.dxxRetrieveXMLClob�

dxxmqGenClob� One document is placed in or retrieved from a WebSphere MQ queue.dxxmqRetrieveClob�


In all three cases, as XML documents are insertedthrough SQL INSERT requests, indexing tables knownas side tables are built in accordance with an XMLColumn DAD.

To index the “id” attribute in the following docu-ment:


�SIGNDATE�2002-03-18�/SIGNDATE�

�AMOUNT�24000�/AMOUNT�

�/order�

The following XML column DAD could be used:

�?xml version�“1.0”?�

�!DOCTYPE DAD SYSTEM “dad.dtd”�

�DAD�

�dtdid�order.dtd�/dtdid�

�validation�yes�/validation�

�Xcolumn�

�table name�“order_side_tab”�

�column name�“order_key”type�“integer”path�“/Order/@id”multi_occurrence�“no”/�

�/table�

�/Xcolumn�

�/DAD�

The path�“/Order/@id” option indicates the name ofthe attribute to be indexed in a side table. The op-tion multi_occurrence�“no” indicates that the order, andhence its attribute called id, will appear at most oncein each document.

Multiple single occurrence element or attribute val-ues can be placed in one side table. Each time anintact XML order document is inserted into anXML Column, DB2 inserts one row into the tableorder_side_tab. The row in order_side_tab represents theorder. For speedy access, the row in order_side_tab in-cludes the key of the row that contains the intact XMLorder. DB2 creates a default relational view that in-corporates the side tables and the table containingthe XML column. The view provides a simple inter-face to retrieve the intact documents programmat-ically by content, in an efficient and simple way, with-out requiring XML parsing.

Consider storing the following document, that con-tains multiple orders, in an XML column:

�orders�


�signdate�

2002-03-18�/signdate�

�amount�24000

�/amount��/order��order id�“4712”�

�signdate�

2002-03-19�/signdate�

�amount�44000

�/amount��/order�

�/orders�

We would adapt the DAD as follows:

�Xcolumn�

�table name�“order_side_multitab”�

�column name�“order_key”type�“integer”path�“orders/order/@id”multi_occurrence�“yes”/�

�/table�

�/Xcolumn�

Each time an orders document is inserted, DB2 insertsmultiple rows into the table order_side_multitab, eachof which includes the key of the row containing theintact XML orders document. Each row in the sidetable represents a single order.

When XML documents are inserted into XML col-umns, they can be validated against XML schemas orDTDs through the svalidate and dvalidate functions men-tioned in the previous section. Alternatively, they canbe validated on insertion by specifying validation yesin the XML Column DAD, as illustrated previously.

XML Collection storage. The following SQL requestshreds XML data into relational form:

EXEC SQL CALLDB2XML.dxxShredXML( :dad:dad_ind;

:xmlDoc:xmlDoc_ind,:returnCode:returnCode_ind,:returnMsg:returnMsg_ind);

Stored procedures for shredding are described in Ta-ble 2.


XML storage guidelines. With the DB2 XML Extender,where data content is updated often and speedy per-formance of updates is very important, we recom-mend XML Collection as the storage method. XMLCollection also makes it possible to apply analyticaltools for relational data to the XML content. Whereit is desired to view documents precisely as they wereon input to the system, and document updates arenot frequent, we recommend XML Column, whichalso provides fast search capability. Often, docu-ments are stored in XML Column for nonrepudia-tion purposes.

Some applications use a combination of XML Col-lection and XML Column. For example, an insuranceclaims system may store and index the claims in theform in which they were input to the system, for easysubsequent access. The claims may also be shred-ded into XML Collections to drive a claims process-ing system.

Working with XML data in SQL. Using XML data inSQL requires the ability to search, update, extract,and shred XML data. SQL by itself is not sufficient toperform these operations, as it does not provide lan-guage to navigate and traverse XML data. Operatingon XML values requires an XML query language suchas XPath and/or XQuery. In this subsection we de-scribe SQL extensions for XML to provide means tosearch, update, extract, and shred XML in the con-text of SQL. Some functions are currently discussedin SQLX as a potential proposal for SQL/XML, somefunctions are implemented in DB2 XML Extender, andothers are implemented in various prototypes. Thefunctions are discussed here from the perspective of

application requirements without addressing theircurrent status.

XMLExists is a Boolean function, which evaluatesan XPath expression on an XML value. If XPath re-turns a nonempty sequence of nodes, then XMLEx-ists is true, otherwise it is false. XMLExtract returnsthe result of the XPath query as an XML instance.The following sample query returns all customers anddates of orders that include a shoe item:

SELECT customer,XMLExtract(

purchaseOrder,‘/purchaseOrder/@orderdate’)

FROM ordersWHERE

XMLExists(purchaseOrder,‘/purchaseOrder[list/item/desc/text��“Shoes”]’

)�1;

XMLUpdate modifies fragments in an XML value.It takes three arguments as input. XMLUpdate op-erates on an XML value (first argument), locates XMLfragments in the XML value using an XPath expres-sion (second argument), replaces the identified XMLfragments with an updated XML fragment (third ar-gument), and finally returns the new XML value. Thefollowing example updates the customer name in or-der XML documents to ‘IBM’, where the salespersonis ‘John Doe’.

update sales_tabset order � XMLUpdate(order,

Table 2 Stored procedures for shredding

ShreddingStored Proc

Name

Features

dxxShredXML� A document is shredded into many rows in many tables.dxxInsertXML� The name associated with the DAD (known as a collection name)

is supplied as input instead of the DAD itself.

dxxmqShred( )dxxmqShredall( )dxxmqInsert( )dxxmqInsertall( )

These are similar to the stored procedures in the row above,except that they shred documents held in WebSphere MQ queuesinto DB2 tables. dxxmqShredall( ) and dxxmqInsertall shred allthe documents in a queue whereas dxxmqShred and dxxmqInsertshred the first document only.

dxxmqShredCLOB()dxxmqShredAllCLOB()dxxmqInsertCLOB()

These are similar to the stored procedures in the row above,except that they shred large documents held in WebSphere MQqueues into DB2 tables.


‘/order/customer/name’,XMLParse(‘�Name�IBM�/Name�’))

where sales_person � ‘John Doe’

Shredding of XML takes an XML value as input andreturns a derived table with columns containing ex-tracted values. The extracted values are specifiedthrough XPath expressions. A context XPath expres-sion provides the set of values. For example, an XMLvalue may contain a list of items. The context XPathexpression navigates to each item, produces a row,and additional XPath expressions operate on eachitem to retrieve column values for the rows. A ge-neric function XMLTable, takes as input an XMLvalue, an XPath expression for the context, and NXPath expressions to return N column values. Thefollowing example query takes an XML value with anitem list as input. The context XPath expression‘/items’ produces a row per item in this list, and thecolumn XPath expressions ‘./item/@id’ and ‘./item/@desc’return column values for item id and item descrip-tion.

select *from table ( XMLTable( :item list, ‘/items’,

‘./item/@id’, ‘./item/@desc’))As T (id integer , desc varchar(10) )

The resulting table is:

ID DESC

23 Shoes25 Bungee Ropes

Manipulating XML column content through the DB2XML extender. DB2 XML Extender provides over onehundred SQL functions that can be used to manip-ulate XML stored in tables, in memory, in the file sys-tem, in WebSphere MQ, or generated through doc-ument publishing via the DAD. We highlight justsome of the functions and their features. Of course,all of these functions can be used in conjunction withsophisticated SQL requests. Table 3 describes thefamilies of SQL functions available in DB2 XML Ex-tender.

Other XML Extender features. The DB2 XML Ex-tender provides optional validation support againstDTDs and XML schemas. The validation can takeplace prior to storing or shredding the XML docu-ment into DB2, or after retrieving, extracting, or gen-erating XML documents from DB2. In addition, it ispossible to store and manage DADs and DTDs in DB2tables. In the future, integration with an XML repos-itory will be provided, so that XML meta-data can bemanaged in a very general way.

The DB2 XML Extender helps integrate data XML datastored in the file system and in WebSphere MQ withrelational data. For example, it is possible to removea number of items from a WebSphere MQ queue andshred them into relational data, all through a singleSQL request.

Figure 2 depicts the components used in implement-ing the XML data integration functions we have pre-sented for DB2 XML Extender.

Table 3 DB2 XML Extender SQL function descriptions

XML Extender SQL FunctionFamily

Features

Validation Enable the validation of XML documents against schemas and DTDs

Transformation Enable the transformation of XML documents through XSLtransformations

Import Enable importing XML documents from a file system into DB2

Export Enable exporting XML documents from a file system into DB2

Extract document fragment Enable the extraction of one or more well formed documentfragments from a document by specifying a location path

Extract element & attribute values Enable the extraction of one or more well element or attribute valuesfrom a document by specifying a location path

Update Enable the modification of element or attribute values in a documentby specifying a location path


XQuery

XQuery is a functional expression language that canbe used to query or process XML data or any datathat can be represented within the same model asXML. Being purely an expression language, XQueryprograms are easier to understand and maintain thanXSLT, because they do not include the complexitiesor management of templates (rule-based system).This is especially true for highly structured data, andfor longer programs. XQuery will still be able to ef-fectively process semi-structured data. The query lan-guage is small and powerful. It has both an easy, hu-man-readable form and an XML representation. TheXQuery language is an activity of the W3C (WorldWide Web Consortium). Several publicly availableworking drafts have been published.13

XQuery as the next XML programming language.XQuery can become the next XML programming lan-guage. XQuery provides needed concepts, upgradedfunctionality, and new ideas that will fundamentallychange the way XML applications are designed andimplemented.

XQuery supports XML data typing using XML Schemaas a base, but will also be able to process documentswithout type information. XQuery includes an up-

graded version of XPath whose semantics mesh bet-ter with typed data. XQuery provides a powerfulFLWR (for, let, where, return) statement that allowsjoins to be expressed. XQuery also allows the userto construct sequences of items in a given order andto perform arbitrary sorts on any generated se-quence. New elements, sequences, and XML datafragments can be formed. The result of expressionscan form the output of queries, or as a subexpres-sion, form a temporary data structure that can itselfbe queried. Variables can be bound to the result ofan expression and utilized in multiple places. Var-iables are not declared with a specific type, but in-stead take on the type of the expression they arebound to. XQuery includes a type-switch expressionthat allows programs to test an expression for beinga particular type and to form a result based on thistest. XQuery has the usual programming constructs,such as if-then-else and arithmetic, Boolean, andcomparison operations.

The XQuery language allows one to define functionsand invoke them. These functions can be local func-tions to the query, allowing one to organize one’squery in smaller, more understandable fragments.Local functions do not have to be defined to the da-tabase system. Some XQuery processors may allow

DB2 XMLCOLLECTION

DB2 XMLCOLUMN

XML DOCUMENTSIN THE FILE SYSTEM

XML DOCUMENTSIN MQ SERIES

XML DOCUMENTSIN MEMORY

XML DOCUMENT DECOMPOSITION

XML DOCUMENT COMPOSITION

EXTRACT XMLDOCUMENT FRAGMENTS

EXTRACT XML STRINGS

UPDATE DOCUMENT CONTENT

IMPORT AND EXPORTDOCUMENTS

SEARCH FOR DOCUMENTS

DB2 XMLAPPLICATION

Figure 2 XML integration with DB2


function libraries to be defined and then accessedin queries. The XQuery language defines an exten-sive built-in function library. Besides mathematicalfunctions, operations on nodes such as deep-equal,filter, and various functions on sequences will giveprogrammers a wide and detailed level of controlover expressions. These features allow XQuery pro-grams to be longer and more understandable thanprograms in most query languages.

The effective XML programming paradigm involvesdata transformation, transactions, work flows andmessaging. The XQuery 1.0 draft definition coversthe core topic of data transformation.

The data manager as the XQuery view processor.The most fundamental requirement for a databasemanager that supports XML applications is that itmust be able to form an XML result. The most nat-ural approach to do this is for the database managerto support XQuery. This makes human or program-matic interaction with data consistent with the XMLmodel and eliminates the need to straddle multipleparadigms such as SQL, XPath, and potentiallySAX/DOM/XSLT in the same application (along withtheir individual programming interfaces and proto-cols).

XQuery operates on an XML data model, so in or-der for XQuery to be able to process relational data,stored tables, and columns, an extension mechanismis needed. The extension mechanism can simply pro-vide a simplistic view of a table as XML (actually aninstance of the XQuery data model). In XQuery, theway to accomplish this is to invoke a special func-tion that returns an XML document with a formatsuch as the following:

�table-name�

�row�

�column1-name� data �/column1-name�



�/row�

�row�




�/row�. . .

�/table-name�

The concept of the default view can be applied toany table, view, database or even arbitrary SQL que-ries.

The view provides the XQuery programmer the rawmaterial to form hierarchic results. There are sev-eral proposals for simplistic (sometimes called “de-fault”) XML representations of relational tables.These views can span the entire database, whereasothers represent only a particular table. The processof standardizing formats is a work in progress, butemerging XQuery processors can easily provide areasonable extension function providing the requiredfunctionality until such standards are in place.SQL/XML is an example of a standardization effortthat defines XML views of relational data.

An application could simply request the entire de-fault view and send it to an outside processor suchas XSLT to form the desired result. The use of XSLTwould be a needed step because the default view isalmost never the desired input or output of an ap-plication. Using XSLT would eliminate the need forthe database to support XQuery, but this approachis wrong for obvious reasons: XML views of entiretables must be exported to the application. XSLT mustaccept the transformation request, then receive po-tentially multiple default view documents, and thenperform the needed joins. XPath does not naturallyexpress joins, and processors like XSLT are likely tobe highly inefficient at processing them.

XQuery should be implemented at the database sothat data can be selected (predicated), projected, andjoined efficiently within the database process. Thedatabase manager also provides highly optimized fa-cilities such as sorts, and resources such as charac-ter collation sequences to support the XQuery re-quests. Since XQuery has the ability to construct newresult node hierarchies based on the input data, thedesired output format, matching the desired XMLschema, can be generated. The database can utilizeinformation about the data and various indexes toefficiently process the query and send a single resultdocument back as a result.

All applications could access default views and formoutput documents, but this approach would still leadto difficulties. Applications would have to becomeinvolved with table and column names and the re-lationships between the tables in order to form thejoin conditions needed in forming the required out-put hierarchies. These expressions often becomecomplex for nontrivial documents. A standard B2B


purchase order would be an example. In order to iso-late applications and other requestors, the XML da-tabase administrator should be able to create XMLviews of relational data and define them usingXQuery. Applications and other requestors wouldonly need to issue an XQuery request in relation toa provided XML view.

One fundamental way to think about XQuery viewsis a publishing paradigm. XML database administra-tors talk with application developers and content pro-viders about the required schemas and hierarchies.The database administrator creates XML views thatsupport the required XML hierarchies including hi-erarchies required by direct user interaction andthose required by applications. The database admin-istrator is free to adjust table and column definitionsand storage structures while providing the same view.The schema of the view can also be published, andapplications, users, and tools can use that informa-tion to formulate queries against the view itself.

Because the database query processor referencesdata in aggregate, queries against views should becomposed internally during optimization of the queryso the total cost of performing the query can be min-imized, just as SQL databases optimize queries withreference to views. This prevents the internal ma-terialization of the view. Applications requiring onlysmall results from large data stores are sent onlysmall amounts of data from an optimized query thatis executed at the server. Client, network, and serverresources are conserved. Applications requiring largeresults benefit from the optimized queries and fromdisk buffering and streaming technologies that thedata manager can provide.

By using the XQuery expression language, request-ors can provide all the required predicates, joins,unions, and other expressions needed to form thedesired result. This is much more flexible than, say,a set of remote procedures each with a fixed set ofparameters that produce particular XML results. Insuch a system, a procedure has to be developed foreach output requirement, or the requestor must ac-cept intermediate results and then use another pro-cessor (XSLT) against potentially many streams to for-mulate the desired result.

The idea is to publish a relatively small set of usefulviews, such as a list of books organized by authorand books organized by subject, or purchase ordersand requisitions. A user chooses the most applica-ble view and operates XQuery against it, providing

the additional constraints to formulate the desiredresult. As a result, the XML database administratoris relieved from the details of every application/user,and the application/user does not need to know thekeys and join conditions to formulate results fromthe schema and correlate them hierarchically.

The use of views also isolates the user from the ex-tension mechanism used to access relational data.One does not need to know the various default viewformats, how they might evolve into standardizedforms, or any special language extensions that applyto the database. The user sees only a pure XML viewof hierarchical data.

The XQuery view mechanism also avoids separateand proprietary mapping languages and files. Be-cause the view mechanism uses XQuery itself, there

is a seamless integration and a lack of limitations as-sociated with creating mappings, querying mappings,or using multiple mappings in the same query. Dif-ferent technologies do not need to be learned. Viewscan be prototyped as queries. With sufficient author-ity to access tables, any query can be executed with-out the administrator forming a mapping.

Views have been used for years in relational systems,but XML views have greater applicability because itmakes sense to encode a great deal more informa-tion in a hierarchy than in a single row. This is be-cause all the required data and the relationshipswithin the data are self-contained as a single unit,rather than spread across several tables or views.

All XML technologies used by applications, includ-ing XML languages such as XPath, XSLT, and XMLapplication interfaces, are best at processing datawhose hierarchical structure provides useful and se-mantic relationships in the context of the specific ap-plication. XML documents and messages will be de-signed with this in mind. Therefore, data stores havethe need to materialize hierarchical information invarious formats using the same base data. The useof XQuery views provides a clean, simple, and pow-erful way to solve this problem. Applications are

XQuery is the wayto process and formulate

XML data models.


made simpler by performing a single XQuery requestover these views and receiving satisfactory results inone step.

Highly operational data are normalized and arestored in most systems using the relational model.Selecting, forming, and correlating data hierarchiesefficiently is best performed by the data store, be-cause it has the resources available to perform therequests and will incur the least cost in processingthe requests.

Example XQuery view. Table 4 depicts an examplescenario in which four database tables are used torepresent simplistic purchase orders. A simple queryover the default view for customers would be:

table( “Customer”)

The query would return a document with the fol-lowing structure:

�Customer��row�

�customer_id�777�/customer_id�

�name�William P Barnes�/name�

�address�104 West Avery Lane, CA�/address�

�/row�

�row�

�customer_id�888�/customer_id�

�name�Shirley Jackson�/name�

�address�1344 Pennsylvania Ave, OH�/address�

�/row�

�/Customer�

XQuery can transform the default view into a morelogical document:

�customerList�{for $c in table(“Customer”)/Customer/rowreturn�customer id�“{ $c/customer_id }”�

�name�{ data( $c/name ) }�/name�

�address�{ data( $c/address) }�/address�

�/customer�}�/customerList�

The output no longer simply exposes underlying ta-ble and column names but instead structures the datawith the desired names and hierarchy. Both subtleand dramatic changes are easily accomplished.

�customerList��customer id�“777”�

�name�William P Barnes�/name�

�address�

104 West Avery Lane, CA�/address�

Table 4 Example of purchase order tables

Order Tableorder_id customer_id ship_method date status

100 777 UPS 1999-10-23 shipped101 777 USPS 2002-01-25 accepted102 888 UPS 2002-02-05 shipped

Order_Items Tableorder_id item_number product_id quantity price

100 1 1001 1 108.25100 2 1002 2 17.42101 1 1002 5 17.42102 1 1003 1 104.10

Product Tableproduct_id description stock price

1001 Sound Blaster Audigy MP3� 1 108.251002 Travel Alarm With Radio 1 17.421003 5.1 Surround Sound Speaker System 1 104.10

Customer Tablecustomer_id name address

777 William P Barnes 104 West Avery Lane, CA888 Shirley Jackson 1344 Pennsylvania Ave, OH


�/customer��customer id�“888”�

�name�Shirley Jackson�/name�

�address�

1344 Pennsylvania Ave, OH�/address�

�/customer��/customerList�

Now, an XQuery view to materialize the entire pur-chase order is seen in Figure 3.

The “orders” XML view constructs an “orders” tagthat contains all the orders in the system. The or-ders tag contains the order id, date and shipping in-formation, a customer information hierarchy, andthe items in the purchase order. Here we can seemore name changes.

The following query over the orders view returns asingle order:

view( “orders” )/orders/order[ order_id � 100 ]

SCHEME 7

<orders>{

for $o in table(“Order”)/Order/row return <order>{

<order_id>{ data( $o/order_id ) }</order_id>, <date>{ data( $o/date ) }</date>, <ship_by>{ data( $o/ship_method ) }</ship_by>, <status>{ data( $o/status ) }</status>

for $c in table(“Customer”)/Customer/row where $c/customer_id=$o/customer_id return <customer id =“{ data( $c/customer_id ) }”> <name>{ data( $c/name ) }</name> <address>{ data( $c/address ) }</address> </customer>,

<items>{ for $i in table (“Order_Items”)/Order_Items/row, $p in table(“Product”)/Product/row where $i/order_id = $o/order_id and $p/product_id = $i/product_id return <item> <item_no>{ data( $i/item_number)}</item_no> <item_desc>{ data( $p/description ) }</item_desc> <item_qty>{ data( $i/quantity ) }</item_qty> <item_price>{ data( $i/price ) }</item_price> </item> sortby ( item_no ) }</items>

}</order>

}</orders>

Figure 3 XQuery view for purchase order example

create view orders as


The document returned by this example is:

�order��order_id�100�/order_id�

�date�1999-10-23�/date�

�ship_by�UPS�/ship_by�

�status�shipped�/status�

�customer id�‘777’��name�William P Barnes�/name�

�address�104 West Avery Lane, CA�/address�

�/customer��items�

�item�

�item_no�1�/item_no�

�item_desc�Sound Blaster Audigy MP3�

�/item_desc�

�item_qty�1�/item_qty�

�item_price�108.25�/item_price�

�/item�

�item�

�item_no�2�/item_no�

�item_desc�Travel Alarm With Radio�/item_desc�

�item_qty�1�/item_qty�

�item_price�17.42�/item_price�

�/item�

�/items�

�/order�

An application query over the view produces:

�summary�{view (“order_summary”)/ order_summary/order

[ @custld � 777]}�/summary�

�summary�

�order custld � “777”�

�order_id�100�/order_id�

�date�1999-10-23�/date�

�status�shipped�/status�

�/order��order custld � “777”�

�order_id�101�/order_id�

�date�2002-01-25�/date�

�status�accepted�/status�

�/order��/summary�

When views are accessed, they can be composed withviews they reference, making the previous view equiv-alent to:

�order_summary�{for $o in table(“Order”)/Order/row return�order custld � “{$o/customer_id}”�{

�order_id�

{ data ( $o/order_id) }�/order_id�

�date�

{ data( $o/date ) }�/date�

�status�

{data( $o/ship_method ) }�/status�

}�/order�}�/order_summary�

Composed views can be cached, allowing views tobe created against other views with minimal impact.Views do not need to be formed against the defaultview for efficiency.

Integration with novel storage structures and fed-eration. Highly operational data are data that areused as search and join conditions, in calculations,are updated, or control the behavior of programs.This kind of data is probably best stored using ma-ture relational mechanisms. However, not all dataare highly operational and not all data fit well in therelational model. Data can be historical in nature,such as billing records. Some data may be read butalmost never written. Some documents are usuallyretrieved as a complete unit. In this arena, search-ing, retrieval, and preservation of the original doc-ument structure are the significant operations. Cat-aloging and attaching external information to intactdocuments is often a requirement. Versioning maybe more important than updating. Other kinds ofdata may involve documents whose structure evolvesover time, such as survey forms. Data may also besparse, such as a catalog of diverse products eachwith different sets of attributes that can be queried.

Because the relational model is often out of sync withthe practicalities of nontraditional data, databasesmay be enhanced with novel data storage structures.Some of these can be: the ability to store documentsin full, the ability to form collections of documents,specialized indexes on XML documents, or XML-aware text search indexes capable of handling semi-structured sections of an XML document. The data-base may also support storage structures that main-tain user-defined data cataloging functions.

One problem with storing operational data in XMLdocuments is that XML must form trees. Data lower


in the hierarchy can often be duplicated, and thispresents a problem for efficient storage, buffering ofdata used to increase performance, and maintenanceof consistency when updating. To solve this prob-lem, databases may allow optimized storage of XMLfragments. Such fragments can be normalized andindexed. The fragments would be able to be accessedand joined efficiently with other fragments to formcomplete documents. Here, the ability to create viewsis important.

As the market demand grows, these new features willemerge from database providers. Specialized serv-ers and data storage techniques are applied to ad-dress the critical aspects, the heavy, day-to-day pro-cesses of the data. However, it is important to realizethat data are never utilized for a single purpose.Eventually we need all data to be integrated. Datamust be cataloged and summarized for intelligencereasons. In large corporate data environments, datain one system are often the source of messages orupdates that need to be applied to other systems.

When differing storage and indexing schemes exist,there is a need to tie them together. The commonground is XML and its flexible data model. XQueryis the way to process and formulate XML data mod-els. Using the database as the processor will makeit possible to efficiently integrate a variety of storagestructures and processing models. It is also possible forthe data manager to tie in existing federated supportin order to further increase the ability to integrateand manipulate various kinds and sources of data.

B2B or scientific intelligence applications may requireaggregations of data to be readily available. The da-tabase can form these results and they can be stored,or these computations can be specified in views. Thedata manager can cache view results according tospecified policy. In addition, functions to help gen-erate these kinds of queries can be added to theXQuery language by use of extension function librar-ies.

Special features for XQuery. A number of specialfeatures can be added to the database XQuery pro-cessor that will make it different than other querylanguages.

A default view can be generated across the entiredatabase. If this is supported, seamless queriesagainst meta-data and data will be possible. For ex-ample, one can ask for all the tables that have a col-umn named salary and have a value larger than

10000. XML query languages naturally query acrossmeta-data (tags) and data (node values). Exposingany XML view affords this ability. As more data areplaced in the view, the queries can become morepowerful and abstract. For example, a view could alsoexpose type, ownership, and data cataloging infor-mation as well as data values.

The XQuery processor can be enhanced with higherorder operators. An operator called ExecXQuerycould be added to allow a query to form queries un-der program control, execute them, and process theresults.14 Other, more specialized, higher-order op-

erators could be developed to help access variousdata sources or to implement customized pointer-to-data operations.

A database manager should be able to publish theXML schema of its views. An integrated way of pub-lishing human readable documentation of the fieldsof the view should be provided.

Data manager/development environment toolsshould be able to read existing XML schemas, sug-gest relational table formats that can retain the in-formation, allow the designer to interactively imple-ment storage structures, and form appropriate XMLviews upon the storage structures that implement de-sired schema types.14 The same technology can beextended to allow users to easily form new XML viewsfrom existing storage structures and to interact withXML shredders or automatic transaction generators.

When performance-critical applications are writtenusing SAX, DOM, or XSLT, specialized versions ofthese processors could be provided that supportstreaming forms of XML input and/or compressed orbinary formats of XML. As the XML programmingmodel evolves, the data manager can provide func-tionality that matches the needs of business appli-cations.

A user chooses the mostapplicable view and operates

XQuery against it, providingthe additional constraints to

formulate the desired result.


The database should provide a mechanism to directlyissue XQueries using the XML programming model,rather than traditional database application inter-faces or protocols. One such way is for databases torespond directly to XQueries via SOAP requests.

These additional features can further increase thepower of XML data stores and will allow users, pro-grammers, and applications to interact with dataservers in new ways and with greater ease.

Integrated database architecture

The integrated data manager will span relational andXML technologies. Figure 4 depicts an integrated re-lational and XQuery data manager (optional com-ponents are shown with dashed lines):

The integrated data manager can respond to requestspresented by new XML interfaces such as XQuery toSOAP, XQuery to SAX or DOM, and to traditional SQL

Figure 4 Integrated database architecture

HTTP

APPLICATION INTERFACE

SOAPDOMSAXJDBC STATIC SQLODBC

QUERYMODEL

COMPOSEDVIEW CACHE

HIGHER-ORDEROPERATORPROCESSOR

XQUERYFUNCTIONSANDOPERATORS

SQL ANDSQL/XMLFUNCTIONSANDOPERATORS

XML OUTPUTPLAN REWRITE

INDEXINTERFACE

PERMANENTSTORAGEMANAGER

TEMP TUPLEDATA MODELMANAGER

XML DATAMODELMANAGER

RELATIONAL

XML

(APPLICATION INTERFACE)

VIEWS

SQL AND SQL/XML PARSER

XQUERY PARSER

QUERYREWRITE

RUN TIME

OPTIMIZATION

TUPLEEXTERNALIZER

XMLSERIALIZER


interfaces. An integrated XQuery SOAP interface al-lows the database manager to respond to requestsover the network without having to install and con-figure a separate Web server. This approach also sim-plifies designs including Web servers because it elim-inates the need to use an SQL-based interface orprotocol in the design.

Data requests may be made using SQL/XML orXQuery. The data manager’s application interfacedetermines the type of query and sends it to the ap-propriate parser. The data manager may allowXQuery and SQL queries to be intermixed either di-rectly or through views. This requires a query modeland type system that accurately models both SQL andXQuery semantics.

Query rewrites are applied against the user’s query.Views are included and the combined query is op-timized by the rewrites. For XQuery, views areparsed, optimized, and cached before merging themwith the requesting query. This is important becauseXQuery views normally contain many XPath que-ries against the input views, either default views orother user-defined views that can be collapsed intomore direct accesses. Caching composed views al-lows later queries against the view to utilize an al-ready simplified representation of the view. Alter-natively, views can be simplified when they are storedby the database. It is likely that XML queries will betreated as dynamic queries (in contrast to traditionalstatic SQL queries), so compiling and optimizationtime is a part of the total cost of each invocation un-less the optimized views/queries are reused. Tech-niques similar to SQLJ’s (SQL interface for Java) cus-tomization feature could be developed for XQuery,but even so, not all applications will be customized.

If the database supports higher-order operators, por-tions of the query below the higher-order operatorare executed and the results of the subqueries formnew subqueries that are parsed and grafted into theuser query.14

If XML hierarchies will be output, query rewrites willbe performed against the simplified query to imple-ment specific plans that are needed to help the run-time component output correlated, hierarchical, in-formation in an efficient way. For example, therewrite would apply the Sorted Outer Union ap-proach.15 With new run-time operators, plans thatare even more efficient could be generated.

The query is then optimized to create the final run-time plan. If the database manager supports mul-tiple storage and indexing technologies, the optimizeracts against an abstract interface allowing many suchtechnologies to be plugged into the system. Tradi-tional features of optimizers such as plan caching andplan storage for static SQL continue to be applica-ble.

The run-time component executes the optimizedplan. The run-time component can execute any SQLor XQuery semantic contained in the query model.This allows query rewriting and optimization to becost-based and discretionary. In order to accomplishthese goals, the run-time component must imple-ment the set of both SQL and XQuery functions andoperators, because each language specifies differentsemantics. The run-time component must be ableto support XQuery-only operations such as naviga-tions using XPath. The reference-based XQuery datamodel must be added as a part of the run-time com-ponent. The run-time component uses an integrateddata model that allows tuples to contain instancesof the XQuery data model. Temporary storage ofintermediate results includes temporary instances ofthe XQuery data model held by a reference, and tem-porary tuples that may contain reference instancesof the XQuery data model. If the data manager sup-ports permanent storage in XML format, temporaryreferences may refer to permanent instances or tem-porary instances.

Finally, XML results are serialized, and converted tocharacter format, if required. XML results will ap-pear either by themselves or be bound out as col-umns of a tuple. The details are handled by the ap-plication interface.

The XTABLES16 project prototypes the XQuery re-write portions of the integrated data manager archi-tecture. XQueries are accepted by a SOAP-enabledWeb server (Lunar Eclipse) and are passed to an SQLstored procedure for execution at the data server.

The XTABLES stored procedure implements theXQuery parser and query rewriting system. Manysimple XQueries against relational data can be fullycomposed into SQL-compatible queries. Given anumber of limitations such as the fact that data inan SQL database are not ordered unless this is ex-plicitly requested, data in SQL databases can be suc-cessfully queried and published as XQuery views.


Full support of XQuery requires preserving docu-ment order, supporting duplicate node removal, andexact semantics of operations and functions. Imple-mentation of the integrated database architecturewill overcome these limitations.

Conclusion and future directions

The future of XML programming will be driven byfunctionality provided by standards-based languagesimplemented by data management systems. Data arestored in the relational model and for many reasonswill continue to be stored in this way, but XML is thecorrect way to present messages and to communi-cate with other systems. Data managers are the startand/or end point of many if not most data flows, andas such should perform the role of query processingand data transformation. SQL/XML is a new part ofthe SQL standard that allows users to form and queryXML data within the relational model. XQuery is anemerging XML-based language that can effectivelyquery and transform both relational and hierarchi-cal data. Nonrelational forms of data are becomingincreasingly important, and databases may be en-hanced with XML-specific storage structures. XQueryand SQL/XML are critical in supporting and integrat-ing new hierarchical storage structures within the da-tabase. Federated databases and mediators will alsobenefit from the XQuery data model and languagesthat can integrate nontraditional data sources withexisting relational data. Databases with SQL/XML andXQuery functionality will eliminate the need to pro-duce ad hoc, customized XML applications. Imple-menting standards-based XML languages at the dataserver will eliminate specialized skills needed for ap-plication development, long development cycles, andinefficient requests against data servers generated bynaive client-based solutions. XML-enabled databaseswill provide easier, faster, and more efficient systemsintegration efforts.

*Trademark or registered trademark of International BusinessMachines Corporation.

**Trademark or registered trademark of Sun Microsystems, Inc.

Cited references

1. D. Megginson, et al., SAX: The Simple API for XML. See http://www.saxproject.org., Version 1.0, May 1988.

2. World Wide Web Consortium, Document Object Model(DOM) Level 1 Specification, Version 1.0, W3C Recommen-dation 1 October, 1998. See http://www.w3c.org/TR/1998/REC-DOM-Level-1-19981001 and see http://www.w3c.org/DOM/DOMTR.

3. World Wide Web Consortium, XSL Transformations (XSLT),

Version 1.0, W3C Recommendation 16 November, 1999. Seehttp://www.w3c.org/TR/1999/REC-xslt-19991116.

4. World Wide Web Consortium, XHTMLTM 1.0: The Extensi-ble HyperText Markup Language: A Reformulation of HTML4 in XML 1.0, W3C Recommendation 26 January, 2000. Seehttp://www.w3.org/TR/xhtml1/ and see http://www.w3c.org/MarkUp/.

5. World Wide Web Consortium, XML Schema Part 1: Struc-tures, W3C Recommendation 2 May 2001 and XML SchemaPart 2: Datatypes, W3C Recommendation 02 May, 2001. Seehttp://www.w3c.org/TR/xmlschema-1/ and http://www.w3c.org/TR/xmlschema-2/.

6. World Wide Web Consortium, Extensible Markup Language(XML) 1.0 (Second Edition), W3C Recommendation 6 October,2000. See http://www.w3c.org/TR/2000/REC-xml-20001006.

7. World Wide Web Consortium, Namespaces in XML, 14-Jan-uary-1999. See http://www.w3c.org/TR/1999/REC-xml-names-19990114/.

8. World Wide Web Consortium, Simple Object Access Proto-col (SOAP) 1.1, W3C Note 08 May, 2000. See http://www.w3.org/TR/SOAP/. See http://www.w3c.org/2000/xp/Group/ for work in progress.

9. “(ISO-ANSI Working Draft) XML-Related Specifications(SQL/XML),” J. Melton, Editor, for FCD Ballot, WG3:ICN-011, H2-2002-063 (March 2002).

10. A. Eisenberg and J. Melton, “SQL/XML and the SQLX In-formal Group of Companies,” ACM SIGMOD Record 30, No.3 (September 2001).

11. DB2 Universal Database Extenders: XML Extender Adminis-tration and Programming, SC27-1234, IBM Corporation(2002).

12. WebSphere Studio Application Developer Programming Guide,SG24-6585-00, IBM Corporation.

13. World Wide Web Consortium, XQuery 1.0: An EML QueryLanguage, W3C Working Draft, 16 August 2002. See http://www.w3.org/TR/xquery.

14. J. E. Funderburk, G. Kiernan, J. Shanmugasundaram, E. She-kita, and C. Wei, “XTABLES: Bridging Relational Technol-ogy and XML,” IBM Systems Journal 41, No. 4, 616–641(2002, this issue).

15. J. Shanmugasundaram, et al., “Efficiently Publishing Rela-tional Data as XML Documents,” Proceedings of the VLDBConference, Cairo, Egypt, September 2000.

16. IBM, Xperanto Technology Demo, based on Xperanto/XTABLES prototype technology. See http://www.ibm.com/software/data/developer/demos/xperanto/.

Accepted for publication August 8, 2002.

John E. Funderburk IBM Software Group, Silicon Valley Lab-oratory, 555 Bailey Avenue, San Jose, California 95141 (electronicmail: [email protected]). Mr. Funderburk is a software devel-oper at IBM’s Silicon Valley Lab. He previously worked on DB2’sXML Extender and is currently working on XTABLES.

Susan Malaika IBM Software Group, Silicon Valley Laboratory,555 Bailey Avenue, San Jose, California 95141 (electronic mail:[email protected]). Ms. Malaika is a senior software engineerwith IBM’s Silicon Valley DB2 development group. She worksin the area of XML, DB2, and the Web.

Berthold Reinwald IBM Research Division, Almaden ResearchCenter, 650 Harry Road, San Jose, California 95120 (electronic mail:[email protected]). Dr. Reinwald joined the IBM Al-


maden Research Center in 1993, after finishing his Ph.D. degreein computer science from the University of Erlangen-Nuernberg.His Ph.D. thesis on workflow management received the “bestPh.D. thesis” award from the university and was published as abook. At IBM Research, Dr. Reinwald contributed to SMRC(shared memory-resident cache) in DB2 Common Server, queryexplain tools, workflow management with Lotus Notes�, Flow-Mark�, and MQSeries, researched and delivered in DB2 Uni-versal Database� support for OLE/COM, OLEDB, XML, andmost recently Web services. Dr. Reinwald is active in the design,architecture, and implementation of SQL extensions for XML.