xml and databases chap. 12. databases today data today: structured - info in databases – data...

Post on 13-Jan-2016

230 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

TRANSCRIPT

XML and databasesChap. 12

Databases Today

• Data today:• Structured - Info in databases

– Data organized into chunks, similar entities groups together– Descriptions for entities in groups – same format, length, etc.

• Semi-structured – data has certain structure, but not all items identical– Schema info may be mixed in with data values– Similar entities grouped together – may have different attributes– Self-describing data– May be displayed as a graph

• Unstructured data – Data can be of any type, may have no format or sequence

• Web pages in HTML• Video, sound, images

HTML <html>

<head> <title> Web page EXAMPLE</title>

</head> <body>

<P>This is an example of a HTML web page.</P> <B><P>This part is in bold font.</P></B> <I><P>This part is in italic font.</P></I>

</body> </html>

Figure APPH.1 HTML Code Example The file HTMLExample.html containing the HTML code shown in Figure APPH.1 will be retrieved by a web browser as the web page shown in Figure APPH.2.

Why XML not HTML?• HTML not suitable for specifying structured

data from databases - – Does not contain schema information

(unstructured • only how to display information

• Want: Data source – database with Web interface

• Specify content and format of Web pages with HTML– Use HTML tags (predefined) for formatting Web

documents

Types of XML documents

1. Data-centric – documents have small data items following a specific structure

2. Document-centric – documents with large amount of text (little structured data)

3. Hybrid – documents with structured and unstructured data

XML documents and DBMS

Options:• Use a DBMS to store XML documents as text

– If DBMS has module for document processing• Use DBMS to store XML document contents as data elements

– Works if all documents have same structure– map XML schema to DB schema

• Design special DBMS for storing XML data– New type of XML DBMS designed, e.g. based on

hierarchical model • Create XML documents from preexisting RDBS and store into

DB

XML

• XML – standard for structuring and exchanging data over Web

• Basic object is XML document• Structuring concepts:

– Elements (tags)– Attributes

• XML attributes – describe properties and characteristics of elements (tags)

– Also• Entities, identifiers, references

Elements

• Elements identified by:– Start tag < …>– End tag </…>

• Simple elements – data values• Complex elements – constructed from other

elements hierarchically– XML called a tree or hierarchical model– No limit on number of nesting elements

Well-formed

• XML document is well formed if:– Starts with XML declaration to indicate version

and other relevant attributes– Single root element– Element matching pair of start/end tags within

parent element– Syntactically correct– Can be processed to create internal tree

• Semi-structured can be schemaless or have a schema

Semi-structured Schemaless• If semistructured data – schemaless XML document• Want to use info in XML document to determine

schema of database• To do this, parse document to create tree structure

of data• schemaless XML document is standalone

<?xml version =“1.0” standalone=“yes”?> (no corresponding file, specifying schema)

Slide 27- 13

• Whole document must be parsed beforehand to generate tree

• Each time access data, must parse document to create tree structure of data

• Parsing SLOWS down the process• Set of API functions to manipulate tree and parsing models:

– DOM (Document Object Model) - uses main memory to parse entire document

– SAX – allows processing XML documents on the fly (also good for streaming XML documents)

Semi-structured schemaless --Parsing

Semi-structured not Schemaless

• To specify structure of data of semi-structured data if not schemaless– DTD (Document Type Definition), XML Schema

• Once parsed, allows validation of XML documents against:– DTD file or– XML schema file

• Valid means elements must follow structure and types specified in the separate schemas

XML DTD• First specify root tag• Parenthesis following element can be

– Type – names of other elements (children)– (#PCDATA) means element is a leaf node (parsed character data –

string)– | indicates either– * - element can be repeated 0 or more times– + - element can be repeated 1 or more times– ? – element can be repeated zero or one times– If no symbol, element must appear exactly once– Parenthesis can be nested

• Fig. 12.4

Slide 27- 18

DTD cont’d

• To check for conformance to DTD add to XML document:

<?xml version=“1.0” standalone=“no”><!DOCTYPE project SYSTEM “proj.dtd”>

• Could also include DTD doc at beginning of XML doc• Problems with DTD

– datatypes not general– Special syntax requires specialized processors– Elements must follow ordering of document

XML Schema

• Alternative to (evolution from) DTD• Standard for specifying structure of XML

documents• xsd – XML schema definition

XML Schema

• Same syntax rules, so same processors on both

• Could display the entire Company database as a single document

• Could store DB in XML format instead of relational DB

• Fig. 12.5

Slide 27- 22

Slide 27- 23

Slide 27- 24

Features of XML Schema

1) To identify XML schema language elements used, specify a file at a Web site location

• Each such definition is XML namespace<xsd:schema xmlns:xsd=“http://www.w3.org/2001/XMLSchema”>

• File name assigned to xsd, and this variable used as prefix to all XML schema commands

Features cont’d

2) Annotation, documents and language– Used for providing comments and other

descriptions, e.g. “en” means english

3) Elements and typesxsd:element - specifies element namexsd:ComplexType – if elements and childrenxsd:sequence –ordered set of element types,

e.g. dept, employee, etc.,

Features cont’d

4) First-level elementsspecified in element tags

5) Element type, minimum and maximum occurrences

MinOccurs, etc.

6) also xsd:key - PK

xsd:unique tag, but must give constraint a namexsd:keyref – foreign keys

Features cont’d

7) Structures of complex elements - complex types

8) Composite (compound) attributes – complex types

XSD 1.0XSD 2.0XSD 3.0XSD 4.0

To Query

• XPointer – Specify position in XML document so other

documents can link to it

• XPath– Query Language for selecting nodes from an XML document– Addresses parts of an XML document – Language mainly consists of location paths and expressions

semantics for functionality– facilities for manipulation of strings, numbers and booleans

To Query:

Use XPath (XML Path Language) to retrieve data• XPath – expression language, based on tree

representation of XML document• Small query language• Provides ability to navigate around the tree• Addresses specific parts of XML document• Provides a common syntax and shared model between

XPointer and XSLT (Extensible Stylesheet Language Transformations)

XPath location path and expressions

• A location path is e.g. child::para[position=(1)] • XPath expressions

– Returns collection of element nodes that satisfy patterns specified in expression

– Name with qualifier conditions– Separators:

/ means tag must appear as child of previous parent tag// means tag can appear as descendant of previous tag at any level

XPath• To access whole XML document:

Doc(www.company.com/info.xml)/company)• /company/department• //employee [employeeSalary gt 70000]/employeeName• /company/employee [employeeSalary gt 70000]/employeeName• /company/project/projectWorker [hours ge 20.0]

XPath Language• Provide a common syntax and shared model

between XPointer and XSLT – XSLT – functional language - no states, domain specific– XSLT - Language to create new document to transform the

format of XML data into data of other formats (human readable) - eg. XML data into HTML, plain text, PDF

– XSLT - Describes how files encoded in XML are to be formatted or transformed

• With XSLT can transform a document from XML to XML, from XML to HTML, etc.– Xpath is used to query elements

XSLT LanguageUses Xpath to query elements (select= )XSLT to specify results of the transformation (xsl: )

XSLT template:<xsl:template match="rss"> <html> <head><title><xsl:value-of select="channel/title"/></title></head> <xsl:apply-templates/> </html></xsl:template>

<?xml version="1.0" ?> <?xml-stylesheet type="text/xsl" href="insptoweb1.xsl"?> <buildinginspectors> <inspector> <insid>I11</insid> <insname>Jane</insname> </inspector> <inspector> <insid>I22</insid> <insname>Niko</insname> </inspector> <inspector> <insid>I33</insid> <insname>Mick</insname> </inspector> </buildinginspectors>

Figure APPH.5 XML File inspectors.xml (to be displayed as a web page) XSL file insptoweb1.xsl shown in Figure APPH.6 describes how the file inspector.xml will be displayed as a webpage. <?xml version="1.0" ?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head> <title> Web page XML EXAMPLE1</title> </head> <body> <B>Building Inspectors</B> <xsl:for-each select="buildinginspectors/inspector"> <P> --- </P> <P>Inspector ID: <xsl:value-of select="insid" /></P> <P>Inspector Name: <xsl:value-of select="insname" /></P> </xsl:for-each> </body> </html> </xsl:template> </xsl:stylesheet>

Figure APPH.6 XSL File insptoweb1.xls (formats inspectors.xml as a web page)

Figure APPH.7 XML File inspectors.xml (displayed as a web page)

<?xml version="1.0" ?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head> <title> Web page XML EXAMPLE2</title> </head> <body> <B>Building Inspectors</B> <table border="1"> <tr bgcolor="#BBFFFF"> <th>Inspector ID</th> <th>Inspector Name</th> </tr> <xsl:for-each select="buildinginspectors/inspector"> <tr> <td><xsl:value-of select="insid" /></td> <td><xsl:value-of select="insname" /></td> </tr> </xsl:for-each> </table> </body> </html> </xsl:template> </xsl:stylesheet>

Figure APPH.8 XSL File insptoweb2.xls (formats inspectors.xml as a web page)

Figure APPH.6 XML File inspectors.xml (displayed as a web page)

XQuery

• No other query languages besides XPath and XSLT until XQuery– XQuery (like SQL) to query data using XPath

expressions• Based on SQL-like FLWOR for joins• For, Let, Where, Orderby Return

XQuery – querying in XML

For <variable bindings to individual nodes>LET <variable binding to collections of nodes>WHERE <qualifer conditions>ORDER BY RETURN <query result specification>

FLWOR

For, Let can appear any number of times or in any order

Where, order by are optionalReturn – always needed

Can be nestedCan be argument to function (e.g. count, max())

XQUERY examples (Fig. 12.7)FOR $x IN

doc(www.company.com/info.xml)//employee [employeeSalary gt 70000]/employeeNameRETURN <res> $x/firstName, $x/lastName </res>

FOR $x INdoc(www.company.com/info.xml)WHERE $x/employeeSalary gt 70000 RETURN <res> $x/employeeName/firstName,

$x/employeeName/lastName </res>

XQUERY examples

FOR $x IN doc(www.company.com/info.xml)/company/

project[projectNumber=5]/projectWorker, $y IN doc(www.company.com/info.xml)/company/employee WHERE $x/hours gt 20.0 AND $y.ssn = $x.ssn RETURN <res> $y/employeeName/firstName, $y/employeeName/lastName, $x/hours </res>

What does this do?What do you think of XQuery?

Another Example

• let $maxCredit := 3000let $overdrawnCustomers := //customer[overdraft > $maxCredit]return count($overdrawnCustomers)

• for $v in //videofor $a in //actorwhere $v/actorRef = $a/@idorder by $a, $v/yearreturn concat($a, ":", $v/title)

SQL/XML

• How are the giants of the DB market incorporating XML?– Oracle is supporting SQL/XML (also IBM)

• These are extensions of SQL

– SQL has a proprietary SQLXML for SQL Server• Not the same as SQL/XML

SQL/XML

• XML Publishing Functions• The XML Datatype• Mapping Rules

XML Publishing Functions

xmlelement() Creates an XML element, allowing the name to be specified.

xmlattributes()Creates XML attributes from columns, using the name of each

column as the name of the corresponding attribute.

xmlroot() Creates the root node of an XML document.

xmlcomment() Creates an XML comment.

xmlpi() Creates an XML processing instruction.

xmlparse() Parses a string as XML and returns the resulting XML structure.

xmlforest()Creates XML elements from columns, using the name of each

column as the name of the corresponding element.

xmlconcat()Combines a list of individual XML values to create a

single value containing an XML forest.

xmlagg()Combines a collection of rows, each containing a single XML value,

to create a single value containing an XML forest.

Another XML Example<?xml version="1.0" ?> <HAFH> <Building> <BuildingID>B1</BuildingID> <BNoOfFloors>5</BNoOfFloors> <Apartment> <AptNo>21</AptNo> <ANoOfBedrooms>1</ANoOfBedrooms> </Apartment> <Apartment> <AptNo>41</AptNo> <ANoOfBedrooms>1</ANoOfBedrooms> <CorporateClient> <CCID>C111</CCID> <CCName>BlingNotes</CCName> </CorporateClient> </Apartment> </Building> <Building> <BuildingID>B2</BuildingID> <BNoOfFloors>6</BNoOfFloors> ... ... </HAFH>

Create table Building ( BuildingID char(2), BNofOfFloors int);

Create table Apartment( AptNo int, ANoOfBedrooms int, Bnum char(2) CCID char(4));

Create table CorporateClient ( CCID char(4), CCName char(25));

SQL/XML

SQL Query A SELECT i.insid, i.insname FROM inspector i;

SQL query QueryA results in the following output:

SQL Query A Result insid insname I11 Jane I22 Niko I33 MIck

<inspector> <insid>I11</insid> <insname>Jane</insname></inspector><inspector> <insid>I22</insid> <insname>Niko</insname></inspector>

<inspector> <insid>I33</insid> <insname>Mick</insname></inspector>

SQL/XML query QueryAX utilizes SQL/XML function xmlelement() for creating an XML element. SQL/XML Query AX SELECT xmlelement(name "inspector", xmlelement(name "insid", i.insid),

xmlelement(name "insname",i.insname)) FROM inspector i;

SQL query QueryA results in the following output:

Let's compare a traditional SQL query with one that uses an XML publishing function. Here is a traditional SQL

query that shows customers and their associated projects:

Here is is an excerpt of the result:

Now let's wrap the result in XML elements using xmlelement(), one of the publishing functions:

Each row in the result contains one Customer element. A Customer element looks like this:

xmlforest() is an XML publishing function that creates elements from a list of columns, using the name of the

column as the name of the element. Using xmlforest() simplifies many queries significantly. For instance, the

following query is equivalent to the previous one:

Now suppose we want to show customers and the projects associated with them. This is easily done with the

following SQL query:

However, the result of this query is that shown in the CustomerProject table in the previous section, with one row

for each Customer/Project pair. I f a customer is associated with more than one project, there will be a row for that

customer for each project. Here is a SQL/XML query that creates the XML equivalent to that table:

Here are the results of this query:

XML Datatype

• The XML Type also plays a second important role

• Relational databases now routinely store XML in individual columns

• The XML Type provides a standard type for such columns, which is useful both in SQL and in JDBC.

XML Datatype

Suppose the above query is in a string called sqlxmlString. Then the following J ava code can be used to execute

the query and retrieve values.

SQL/XML Mapping Rules• Mapping SQL character sets to Unicode.• Mapping SQL <identifier>s to XML Names.• Mapping SQL data types (as used in SQL-schemas to define SQL-schema

objects such as columns) to XML Schema data types.• Mapping values of SQL data types to values of XML Schema data types.• Mapping an SQL table to an XML document and an XML Schema

document.• Mapping an SQL schema to an XML document and an XML Schema

document.• Mapping an SQL catalog to an XML document and an XML Schema

document.• Mapping Unicode to SQL character sets.• Mapping XML Names to SQL <identifier>s.

XQuery vs. SQL/XML

Oracle XML DB• Documentation for Oracle XML DB• Some of the features are:

– XML DB is not a separate server, but group of technologies– Can utilize unstructured/structured data– Features:

• XMLType• DOM fidelity• XML schema• XPath search• XML indexes

– Can even generate XML from Oracle DB

Oracle XML

• Creating a Table of XMLType• CREATE TABLE XMLTABLE OF XMLType;

• Creating a Table with an XMLType Column• CREATE TABLE Example1 ( KEYVALUE varchar2(10) primary key,

XMLCOLUMN xmltype );

• existsNode() to find a Node to Match the XPath Expression

• extractValue() is the same as extract() except it returns value without the XML element tags, must be a single element

Oracle and XML

• Schema is created for XML when using SQL*Plus• //A DTD is not needed, but you can register one,

whereby XSD is a meta generic DTD (info)

Create table Company of XMLType;

The rest of the definitionThe same fields do not have to be specified in every

elementUse: set long 500

to show all values in table

SELECT extractValue(OBJECT_VALUE, '/Company/Employee/Fname') FNAME From Company

SELECT extractValue(OBJECT_VALUE, '/Company/Employee/Fname') FNAME FROM Company WHERE existsNode(OBJECT_VALUE, '/Company/Employee/Salary <"40000"') = 1;

Examples in this section are based on the following PurchaseOrder XML document:<PurchaseOrder

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.oracle.com/xdb/po.xsd"> <Reference>ADAMS-20011127121040988PST</Reference> <Actions> <Action>

<User>SCOTT</User> <Date>2002-03-31</Date>

</Action> </Actions> <Reject/> <Requestor>Julie P. Adams</Requestor> <User>ADAMS</User> <CostCenter>R20</CostCenter> <ShippingInstructions>

<name>Julie P. Adams</name> <address>Redwood Shores, CA 94065</address> <telephone>650 506 7300</telephone>

</ShippingInstructions> <SpecialInstructions>Ground</SpecialInstructions>

<LineItems> <LineItem ItemNumber="1">

<Description>The Ruling Class</Description> <Part Id="715515012423" UnitPrice="39.95" Quantity="2"/>

</LineItem> <LineItem ItemNumber="2">

<Description>Diabolique</Description> <Part Id="037429135020" UnitPrice="29.95" Quantity="3"/>

</LineItem> <LineItem ItemNumber="3">

<Description>8 1/2</Description> <Part Id="037429135624" UnitPrice="39.95" Quantity="4"/>

</LineItem> </LineItems>

</PurchaseOrder>

existNode()• existsNode() to find a Node to Match the XPath Expression• Given this sample XML document, the following existsNode() operators

return true (1).SELECT existsNode(value(X),'/PurchaseOrder/Reference') FROM XMLTABLE X; SELECT existsNode(value(X), '/PurchaseOrder[Reference="ADAMS-

20011127121040988PST"]') FROM XMLTABLE X;

extractValue()• Example 3-7 Valid Uses of extractValue()• SELECT extractValue(value(x),'/PurchaseOrder/Reference') FROM

XMLTABLE X;

• Returns the following:• EXTRACTVALUE(VALUE(X),'/PURCHASEORDER/REFERENCE')

--------------------------------ADAMS-20011127121040988PST

extractValue()• Non-Valid Uses of extractValue()• SELECT extractValue(value(X),

'/PurchaseOrder/LineItems/LineItem/Description') FROM XMLTABLE X; -- FROM XMLTABLE X; -- * -- ERROR at line 3: -- ORA-19025: EXTRACTVALUE

returns value of only one node

extract()• Using extract() to Return an XML Fragment• The following extract() statement returns an XMLType that contains an XML

document fragment containing occurrences of the Description node. These match the specified XPath expression shown.

• Note: In this case the XML is not well formed as it contains more than one root node.

set long 20000 SELECT extract(value(X), '/PurchaseOrder/LineItems/LineItem/Description') FROM XMLTABLE X;

• -- This returns: -- EXTRACT(VALUE(X),'/PURCHASEORDER/LINEITEMS/LINEITEM/DESCRIPTION') -- ---- <Description>The Ruling Class</Description> <Description>Diabolique</Description> <Description>8 1/2</Description>

updateXML()• Using updateXML() to Update a Text Node Value Identified by an XPath

Expression• This example uses updateXML() to update the value of the text node

identified by the XPath expression `/PurchaseOrder/Reference':

UPDATE XMLTABLE t SET value(t) = updateXML(value(t), '/PurchaseOrder/Reference/text()', 'MILLER-200203311200000000PST') WHERE existsNode(value(t), '/PurchaseOrder[Reference="ADAMS-20011127121040988PST"]') = 1;

• This returns:• 1 row updated.

Oracle XML

• Oracle XML DB complies with the W3C XSL/XSLT recommendation by supporting XSLT transformations in the database.

• In Oracle XML DB, XSLT transformations can be performed using either of the following:– XMLTransform() function – XMLType datatype's transform() method

transform()• Using transform() to Transform an XSL• The following example shows how transform() can apply XSLT to an XSL stylesheet,

PurchaseOrder.xsl, to transform the PurchaseOrder.xml document:SELECT value(t).transform(xmltype(getDocument('purchaseOrder.xsl'))) from XMLTABLE t where existsNode(value(t), '/PurchaseOrder[Reference="MILLER-

200203311200000000PST"]' ) = 1;

• This returns:• VALUE(T).TRANSFORM(XMLTYPE(GETDOCUMENT('PURCHASEORDER.XSL'))) ---------

<html> <head/> <body bgcolor="#003333" text="#FFFFCC" link="#FFCC00" vlink="#66CC99" alink="# 669999">

<FONT FACE="Arial, Helvetica, sans-serif"> <center> ... </FONT> </body> </html>

Since the transformed document using XSLT is expected as an instance of XMLType, the source could easily be a database table.

Research Topics

• XML to relational data mapping• Updating XML in RDBMS• XML and access control• XML Parsing

– reference on this topic:• A Brief History of XML

top related