the xml standard overview of our xml standards motivation: html vs xml xml 101: syntax, elements,...

85
The XML Standard

Upload: carmella-horn

Post on 21-Dec-2015

312 views

Category:

Documents


0 download

TRANSCRIPT

The XML Standard

Overview of our XML Standards

• Motivation: HTML vs XML • XML 101: syntax, elements, attributes, DTDs,

…• XML 201: XML Schema, Namespaces • XSLT: Transforming and Rendering XML• XQuery: Search, Transform & Integrate

So what is XML (all about)?

Executive Summary:• XML = HTML – idiosyncrasies (simplified syntax) + user-definable ("semantic") tags• Separation of data and its presentation

=> simple, very flexible data exchange format: semistructured data model => new applications:

• Information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ...

• Web site mangement (XML+XSL stylesheets), ...

What’s Wrong with HTML?

<DT><IMG SRC="greenball.gif" >&nbsp;<A NAME="object-fusion"></A>Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina. <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"ObjectFusion in Mediator Systems".</A> In <I>VLDB 96.</I></DT>

Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina.“Object Fusion in Mediator Systems”. In VLDB 96.

HTML confuses presentationwith content

...What’s Wrong with HTML...

<DT><IMG SRC= "greenball.gif" >&nbsp;<A NAME="object-fusion"></A>Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina. <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"ObjectFusion in Mediator Systems".</A> In <I>VLDB 96.</I></DT>

No Explicit Structure, Semantics, or Object-Orientation

Author

ConferenceTitle

... And Some Repercussions

• Lack of schema/semantics when querying the Web (HTML):– "find documents (books, papers, ...) where

author = Michael Jackson" (... and learn how software engineering meets the moon walker ...)

– "create a list of M. Jackson's books and (if available) their prices"

=> HTML is inappropriate for data exchange automation of information management

(retrieval, manipulation, integration)

XML is Based on Markup

<bibliography>

<paper ID= "object-fusion"> <authors> <author>Y.Papakonstantinou</author> <author>S. Abiteboul</author> <author>H. Garcia-Molina</author> </authors> <fullPaper source="fusion"/> <title>Object Fusion in Mediator Systems</title> <booktitle>VLDB 96</booktitle> </paper>

</bibliography>

Markup indicates

structure and semantics

Decoupled from presentation

Elements and their Content

element

element name

Character content

ElementContent

EmptyElement

<bibliography>

<paper ID="object-fusion"> <authors> <author>Y.Papakonstantinou</author> <author>S. Abiteboul</author> <author>H. Garcia-Molina</author> </authors> <fullPaper source="fusion"/> <title>Object Fusion in Mediator Systems</title> <booktitle>VLDB 96</booktitle> </paper>

</bibliography>

Element Attributes

<bibliography>

<paper ID="object-fusion"> <authors> <author>Y.Papakonstantinou</author> <author>S. Abiteboul</author> <author>H. Garcia-Molina</author> </authors> <fullPaper source="fusion"/> <title>Object Fusion in Mediator Systems</title> <booktitle>VLDB 96</booktitle> </paper>

</bibliography>

Attribute name

Attribute Value

XML = Labeled Ordered Trees

<bibliography> <paper id=23...> <authors> <author>Yannis</author> <author>Serge</author> ... </authors> <title>Object Fusion</title> ... </paper></bibliography>

bibliography

paper

authors

author author

...

titlefullpaper

Yannis Serge

Object Fusion...

paper

semistructured data labeled trees/graphs

can also represent• relational and • object-oriented data

@id

23

How do I sharestructure and

metadata/semantics with

my community?

In Search of the Lost Structure & Semantics

How to make all this automatable?

How do I learn and use the element structure

of a document?

Adding Structure and Semantics

• XML Document Type Definitions (DTDs):• define the structure of "allowed" documents (i.e.,

valid wrt. a DTD)

database schema => improve query formulation, execution, ...

• XML Schema – defines structure and data types

• XML Namespaces– identify your vocabulary

• Resource Description Framework (RDF) – simple metadata model

XML DTDs as Extended CFGs

<!element bibliography paper*><!element paper (authors,fullPaper?,title,booktitle)><!element authors author+>

bibliography paper* paper authors fullPaper? title booktitle authors author+

lhs = element (name)rhs = regular expression over elements + strings (PCDATA)

XML DTD

Grammar

<!element bibliography paper*><!element paper (authors, fullPaper?, title, booktitle)><!element authors author+><!element author (#PCDATA)><!element fullPaper EMPTY><!element title (#PCDATA)><!element booktitle (#PCDATA)><!attlist fullPaper source ENTITY #REQUIRED> <!attlist paper ID ID>

Document Type Definitions (DTDs)

Define and Constrain Element Names & Structure

Element TypeDeclaration

Attribute ListDeclaration

Element Declarations

<!element bibliography paper*><!element paper (authors, fullPaper?, title, booktitle)><!element authors author+><!element author (#PCDATA)>

<!element fullPaper EMPTY><!element title (#PCDATA)><!element booktitle (#PCDATA)><!attlist fullPaper source ENTITY #REQUIRED> <!attlist paper ID ID>

Character content

Authors followed byoptional fullpaper,followed by title,

followed by booktitle

Sequence of 1 ormore author

Sequence of 0 ormore paper

Element Content Declarations

Declaration Meaning element name Exactly one instance of element R? Zero or one instances of R R* Zero or more instances of R R+ One or more instances of R R1|R2|…|Rn One instance of R1 or R2 or … Rn #PCDATA Character content EMPTY Empty element (#PCDATA e*)* Mixed Content ANY Anything goes

Attributes

<bibliography>

<paper ID="object-fusion" ROLE="publication">

<authors> <author authorRef="yannis">

Y.Papakonstantinou</author></authors>

<fullPaper source="fusion"/> <title>Object Fusion in Mediator Systems</title> <related papers= "semistructured-data" "mediators"/> </paper>

</bibliography>

Object Identity Attribute

CDATA (character data)

<person ID="yannis"> Yannis’ info </person>

IDREFintradocument

reference

Reference toexternal ENTITY

Attribute Types

Type Meaning ID Token unique within the document IDREF Reference to an ID token IDREFS Reference to multiple ID tokens ENTITY External entity (image, video, …) ENTITIES External entities CDATA Character data NMTOKEN Enumerated token NMTOKENS Enumerated tokens More toappear?

More types (eg, DATE) may soon bepart of the standard

Uses of XML Entities

• Physical partition – size, reuse, "modularity", … (both XML docs &

DTDs)

• Non-XML data– unparsed entities binary data

• Non-standard characters– character entities

• Shorthand for phrases & markup

Types of Entities

• Internal (to a doc) vs. External ( use URI)

• General (in XML doc) vs. Parameter (in DTD)

• Parsed (XML) vs. Unparsed (non-XML)

Internal Text Entities

<!ENTITY WWW "World Wide Web">

<p>We all use the &WWW;.</p>

Internal Text Entity Declaration

Entity Reference

<p>We all use the World Wide Web.</p>

Logically equivalent to actually appearing

Unparsed (& "Binary") Entities

<!ENTITY fusion SYSTEM "fusion.ps" NDATA ps>

... and unparsed entity

<fullPaper source="fusion"/>

<!attlist fullPaper source ENTITY #REQUIRED>

Element with ENTITY attribute

Declare attribute type to be entity

<!NOTATION ps SYSTEM "ghostview.exe">

NOTATION declaration (helper app)

Declare external...

From Docs to Data: XML Schema

• XML DTDs (part of the XML spec.)– flexible, semistructured data model (nesting,

ANY, ?, *, |, ...) – but document-oriented (SGML heritage)

• XML Schema (W3C working draft)– schema definition language in XML– data-oriented: data types– extends capabilities of DTD

Sample Data for Introduction to XML Schema

<?xml version="1.0" encoding="utf-8"?> <book isbn="0836217462"> <title>Being a Dog Is a Full-Time Job</title> <author>Charles M. Schulz</author> <character> <name>Snoopy</name> <friend-of>Peppermint Patty</friend-of> <since>1950-10-04</since> <qualification> extroverted beagle </qualification> </character> <character> <name>Peppermint Patty</name> <since>1966-08-22</since> <qualification>bold, brash and tomboyish</qualification> </character></book>

The Simple “Russian Doll” Approach to XML Schema

<?xml version="1.0" encoding="utf-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="book"> <xsd:complexType> <xsd:sequence> <xsd:element name="title" type="xsd:string"/> <xsd:element name="author" type="xsd:string"/> <xsd:element name="character“

minOccurs="0" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="friend-of" type="xsd:string“

minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="since" type="xsd:date"/> <xsd:element name="qualification" type="xsd:string"/> </xsd:sequence> … <xsd:attribute name="isbn" type="xsd:string"/>

Optional Namespace Definition

Sequence Compositor Simple TypeContent fortitle andauthor

Complex TypeContent for book

Character may appear any number of times

Basic Type of XML Schema

The Catalog Approach to XML Schema: Stand-Alone Declarations & References

<xsd:element name="title" type="xsd:string"/><xsd:element name="author" type="xsd:string"/><xsd:element name="name" type="xsd:string"/>…<xsd:attribute name="isbn" type="xsd:string"/>

<xsd:element name="character"> <xsd:complexType> <xsd:sequence> <xsd:element ref="name"/> <xsd:element ref="friend-of” minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="since"/> <xsd:element ref="qualification"/> </xsd:sequence> </xsd:complexType></xsd:element>

Simple TypeElements

Attributes

Complex TypeElement character

Reference

Catalog Approach Cont’d

<xsd:element name="book"> <xsd:complexType> <xsd:sequence> <xsd:element ref="title"/> <xsd:element ref="author"/> <xsd:element ref="character“ minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> <xsd:attribute ref="isbn"/> </xsd:complexType></xsd:element>

Named Types

• Write stand-alone named complex type or simple type declarations

• Primitive form of inheritance (called derivation) allows– Restriction– Extension

<xsd:simpleType name="nameType"> <xsd:restriction base="xsd:string"> <xsd:maxLength value="32"/> </xsd:restriction></xsd:simpleType>

<xsd:complexType name="characterType"> <xsd:sequence> <xsd:element name="name“ type="nameType"/> <xsd:element name="friend-of“ type="nameType” minOccurs="0“ maxOccurs="unbounded"/> <xsd:element name="since" type="sinceType"/> <xsd:element name="qualification"

type="descType"/> </xsd:sequence> </xsd:complexType>

nameType derived from xsd:string by havingthe xsd:maxLength facet restrict string to a

Maximum of to 32 characters

nameType used in the declaration of characterType

Groups: Named containers of sets of Elements or Attributes

<xsd:group name="mainBookElements"> <xsd:sequence> <xsd:element name="title" type="nameType"/> <xsd:element name="author" type="nameType"/> </xsd:sequence></xsd:group>

<xsd:complexType name="bookType"> <xsd:sequence> <xsd:group ref="mainBookElements"/> <xsd:element name="character" type="characterType“ minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType>

Compositors: Sequence, Choice, All

<xsd:group name="nameTypes"> <xsd:choice> <xsd:element name="name" type="xsd:string"/> <xsd:sequence> <xsd:element name="firstName" type="xsd:string"/> <xsd:element name="middleName" type="xsd:string“

minOccurs="0"/> <xsd:element name="lastName" type="xsd:string"/> </xsd:sequence> </xsd:choice></xsd:group>

So far we have seensequences

The group nameTypes consists of one of • the element “name”• the sequence containing firstName,

middlename, lastName

Compositors (cont’d)

<xsd:complexType name="characterType"> <xsd:all> <xsd:element name="name“ type="nameType"/> <xsd:element name="friend-of“ type="nameType” minOccurs="0“ maxOccurs="unbounded"/> <xsd:element name="since" type="sinceType"/> <xsd:element name="qualification" type="descType"/> </xsd:all> </xsd:complexType>

The characterType consists of name, a list of friend-of,since, and qualification particles in no particular order.(Compare with the sequence compositor.)

Derivation of Simple Types:Unions and Lists So far we

have seenrestriction

s and facets

<xsd:simpleType name="isbnType"> <xsd:union> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:pattern value="[0-9]{10}"/> </xsd:restriction> </xsd:simpleType>

<xsd:simpleType> <xsd:restriction base="xsd:NMTOKEN"> <xsd:enumeration value="TBD"/> <xsd:enumeration value="NA"/> </xsd:restriction> </xsd:simpleType> </xsd:union></xsd:simpleType>

The simple type isbnType will be either • a 10-digit string (notice the pattern)• the token "TBD“ or the token "NA"

Constraints: Uniqueness

<xsd:element name="book"> … <xsd:unique name="charNameMustBeUnique"> <xsd:selector xpath="character"/> <xsd:field xpath="name"/> </xsd:unique> …</xsd:element>

By inserting xsd:unique in the book element declarationwe enforce that the character name’s in each book are unique

Namespaces

<xsd:schema xmlns:xsd=http://www.w3.org/2000/10/XMLSchema xmlns=http://example.org/ns/books/

targetNamespace=http://example.org/ns/books/ elementFormDefault="qualified“

attributeFormDefault="unqualified" >

Including Unknown Elements

<xsd:complexType name="descType" mixed="true"> <xsd:sequence> <xsd:any namespace=http://www.w3.org/1999/xhtml minOccurs="0" maxOccurs="unbounded“ processContents="skip"/> </xsd:sequence></xsd:complexType>

Presenting XML: XSLT

• Why Stylesheets?

– separation of content (XML) from presentation (XSL)

• Why not just CSS for XML?

– XSL is far more powerful:

• selecting elements

• transforming the XML tree

• content based display (result may depend on data)

XSLT Overview

• XSLT stylesheets are denoted in XML syntax

• XSL components:

1. a language for transforming XML documents (XSLT: integral part of the XSL specification)

2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)

XSLT Processing Model

XML source tree XML,HTML,… result tree

XSL stylesheet

Transformation

XSLT Processing Model

• XSL stylesheet: collection of template rules• template rule: (pattern template)• main steps:

– match pattern against source tree– instantiate template (replace current node “.” by the

template in the result tree)– select further nodes for processing

• control can be– program-driven ("pull": <xsl:foreach> ...)– data/event-driven ("push": <xsl:apply-templates> ...)

<xsl:template match="product"> <table> <xsl:apply-templates select="sales/domestic"/> </table> <table> <xsl:apply-templates select="sales/foreign"/> </table> </xsl:template>

Template Rule: Example

(i) match pattern: process <product> elements(ii) instantiate template: replace each a product with two HTML tables(iii) select the <product> grandchildren (“sales/domestic”, “sales/foreign”) for further processing

pattern

template

Match/Select Patterns

• match patterns select patterns = defined in http://w3.org/TR/xpath

• Examples: – /mybook/chapter[2]/section/*– chapter|appendix– chapter//para– div[@class="appendix" and position() mod 2 = 1]//para

– ../@lang

Creating the Result Tree...• Literal result elements: non-XSL elements (e.g.,

HTML) appear “literally” in the result tree• Constructing elements:

(similar for xsl:attribute, xsl:text, xsl:comment,…)

• Generating text:

<xsl:element name = "…"> attribute & children definition</xsl:element>

<xsl:template match="person"> <p> <xsl:value-of select="@first-name"/> <xsl:text> </xsl:text> <xsl:value-of select="@surname"/> </p></xsl:template>

Example of Turning XML into HTML

<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="FitnessCenter.xsl"?>

<FitnessCenter> <Member level="platinum"> <Name>Jeff</Name> <Phone type="home">555-1234</Phone> <Phone type="work">555-4321</Phone> <FavoriteColor>lightgrey</FavoriteColor> </Member></FitnessCenter>

HTML Document in an XSL Template

<?xml version="1.0"?><xsl:output method="html"/><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="/"> <HTML> <HEAD> <TITLE>Welcome</TITLE> </HEAD> <BODY> Welcome! </BODY> </HTML> </xsl:template>

</xsl:stylesheet>

Extracting the Member Name

<?xml version="1.0"?><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method="html"/>

<xsl:template match="/"> <HTML> <HEAD> <TITLE>Welcome</TITLE> </HEAD> <BODY> Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>! </BODY> </HTML> </xsl:template>

</xsl:stylesheet>

Extracting a Value from an XML Document,

Navigating the XML Document

• Extracting values:– use the <xsl:value-of select="…"/> XSL

element• Navigating:

– The slash ("/") indicates parent/child relationship

– A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document

/FitnessCenter/Member/Name

"Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."

Document/

PI<?xml version=“1.0”?>

ElementFitnessCenter

ElementMember

ElementName

ElementPhone

ElementPhone

ElementFavoriteColor

TextJeff

Text555-1234

Text555-4321

Textlightgrey

Extract the FavoriteColor and use it as the bgcolor

<?xml version="1.0"?><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method="html"/>

<xsl:template match="/"> <HTML> <HEAD> <TITLE>Welcome</TITLE> </HEAD> <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}"> Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>! </BODY> </HTML> </xsl:template>

</xsl:stylesheet>

(see html-example03)

Note

Attribute values cannot contain "<" nor ">" - Consequently, the following is NOT valid: <Body bgcolor="<xsl:value-of select='/FitnessCenter/Member/FavoriteColor'/>">

To extract the value of an XML element and use it as an attributevalue you must use curly braces: <Body bgcolor="{/FitnessCenter/Member/FavoriteColor}">

Evaluate the expression within the curly braces. Assign the valueto the attribute.

Extract the Home Phone Number<?xml version="1.0"?><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method="html"/>

<xsl:template match="/"> <HTML> <HEAD> <TITLE>Welcome</TITLE> </HEAD> <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}"> Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>! <BR/> Your home phone number is: <xsl:value-of select="/FitnessCenter/Member/Phone[@type='home']"/> </BODY> </HTML> </xsl:template>

</xsl:stylesheet>

Creating the Result Tree...

• Further XSL elements for ...– Numbering

• <xsl:number value="position()" format="1 ">

– Conditions• <xsl:if test="position() mod 2 = 0">

– Repetition...

Creating the Result Tree: Repetition

<xsl:template match="/"> <html> <head> <title>customers</title> </head> <body> <table>

<tbody> <xsl:for-each select="customers/customer"> <tr> <th>

<xsl:apply-templates select="name"/> </th> <xsl:for-each select="order">

<td> <xsl:apply-templates/></td>

... </html> </xsl:template>

Creating the Result Tree: Sorting

<xsl:template match="employees"> <ul> <xsl:apply-templates select="employee">

<xsl:sort select="name/last"/> <xsl:sort select="name/first"/> </xsl:apply-templates> </ul> </xsl:template>

<xsl:template match="employee"> <li> <xsl:value-of select="name/first"/> <xsl:text> </xsl:text> <xsl:value-of select="name/last"/> </li> </xsl:template>

More on XSL

• XSL(T):– Conflict resolution for multiple applicable rules – Modularization <xsl:include> <xsl:import>

– …

• XSL Formatting Objects– a la CSS

• XPath (navigation syntax + functions) = XSLT XPointer• ...

XQuery: Querying XML Sources

• Functional Query Language– Operates on the Xpath/XQuery data model– List of ordered trees– A document is list of size 1

• XQuery expressions are composed of– Path expressions– Element constructors– FLWR expressions– … and more …

chapter

Path Expressions

doc(“zoo.xml”)//chapter[2]//figure[caption=“Tree Frogs”]

In the second chapter of the document zoo.xml find the figures with caption “Tree Frogs”

book

chapter chapter appendixpart

section

paragraph

figure

caption

“Tree Frogs”

chapter

chapter

paragraph

figure

caption

“Just Frogs”

part

More Path Expressions

Find the first immediate chapter subelements of immediate part subelements of the document zoo.xml and retrieve

figures that have …

doc(“zoo.xml”)/part/chapter[1]//figure[caption=“Tree Frogs”]

chapter

book

chapter chapter appendixpart

section

paragraph

figure

caption

“Tree Frogs”

chapter

chapter

paragraph

figure

caption

“Just Frogs”

part

Element Construction

<result> doc(“zoo.xml”)//chapter[2]//figure[caption=“Tree Frogs”]</result>

In the second chapter of the document zoo.xml find the figures with caption “Tree Frogs” and place them into

an element called result

figure

caption

“Tree Frogs”

result

Bibliography Example Data Set<bib> <book> <author> Aho </author> <author> Hopcroft </author> <author> Ullman </author> <title> Automata Theory </title> <publisher> Morgan Kaufmann </publisher> <year> 1998 >/year> </book> <book> <author> Ullman </author> <title> Database Systems </title> <publisher> Morgan Kaufmann </publisher> <year> 1998 >/year> </book> <book> <author> Abiteboul </author> <author> Buneman </author> <author> Suciu </author> <title> Automata Theory </title> <publisher> Prentice Hall </publisher> <year> 1998 >/year> </book></bib>

Reviews Example Data Set<reviews> <review> <title> Automata Theory </title> <comment> It’s the best in automata theory </comment> <comment> A definitive textbook </comment> </review> …</reviews>

For-Let-Where-Return (FLWR)

FOR $b in doc(“bib.xml”)//bookWHERE $b/publisher = “Morgan Kaufmann”RETURN $b/title

List the titles of books published by “Morgan Kaufmann”

year

bib

book book book

publisher

MorganKaufmann

yearpublisher

MorganKaufmann

1998 1998

book

yearpublisher

PrenticeHall

1998

title title title

Think (tuples of) variable bindings

FOR/LET

WHERE

RETURN

Ordered lists of tuplesof variable bindings

Tuples of thatsatisfy the conditions

List of trees

$bbookbookbook

$bbookbook

title title

year

year

bib

book book

publisher

MorganKaufmann

yearpublisher

MorganKaufmann

1998 1998

book

yearpublisher

PrenticeHall

1998

title title title

FOR $b in doc(“bib.xml”)//book WHERE $b/year > 1990 RETURN $b/author

Return the list of authors who

published after 1990

Tuples

FOR $p in distinct(doc(“bib.xml”)//publisher)LET $b := document(“bib.xml”)//book[publisher = $p]WHERE count($b) > 1RETURN $p

List publishers who have publishedmore than 1 book

Tuples ($p, $b) are formulated

Boolean Expressions in WHERE

FOR $b in doc(“bib.xml”)//bookWHERE $b/publisher = “Morgan Kaufmann” AND $b/year = “1998”RETURN $b/title

List the titles of books published by “Morgan Kaufmann” in 1998

JoinsFOR $b in doc(“bib.xml”)/book, $r in doc(“review.xml”)/reviewWHERE $b/title = $r/titleRETURN <book_with_review> {$b/@*} {$b/*} {$r/comment} </book_with_review>

For every book with a matching review output a book_with_review

that contains all the attributes and subelements of book

and the comment subelements of review

<book_with_review> <author> Aho </author> <author> Hopcroft </author> <author> Ullman </author> <title> Automata Theory </title> <publisher> Morgan Kaufmann </publisher> <year> 1998 >/year> <comment> It’s the best in automata theory </comment> <comment> A definitive textbook </comment> </book_with_review>

Relax Order Conditions

FOR $b in unordered(doc(“bib.xml”)//book)WHERE $b/publisher = “Morgan Kaufmann” AND $b/year = “1998”RETURN $b/title

List the titles of books published by “Morgan Kaufmann” in 1998

Very important feature in dealing withrelational sources and other set-oriented sources.

SELECT titleFROM bibWHERE publisher = “Morgan Kaufmann” AND year =1998

Depending on the indices and access methods used, the SQL query processor may deliver the

tuples in different order

Nested queries

FOR $a IN distinct(document(“bib.xml”)//author/text())RETURN <author> <name> $a </name> { FOR $b IN document(“bib.xml”)//book[author=$a] RETURN $b/title } </author>

Invert the structure of the input document so that there is a list of author elements containing the

name of the author and the list of books he wrote

Conditionals

FOR $b IN doc(“bib.xml”)/bookRETURN <short> {$b/title} <author> {IF count($b/author) < 3 {$b/author} ELSE {$b/author[1], <author>and others</author> </short>

Existential and Universal Quantification

FOR $b in doc(“bib.xml”)/bookWHERE $b/author = “Ullman”RETURN $b

FOR $b in doc(“bib.xml”)/bookWHERE EVERY $author IN $b/author SATISFIES $author= “Ullman”RETURN $b

Return books where at least one of the authors is

“Ullman”

Return books where all authors are “Ullman”

Functions

DEFINE FUNCTION depth($e) RETURNS xsd:integer{ IF (empty($e/*) THEN 1 ELSE max(depth($e/*) + 1}

FOR $b in doc(“bib.xml”)/book RETURN depth($b)

Applicability of XML Query Languages (Xquery)

• XQuery standard does NOT elaborate on the physical aspects of the XML sources

• Custom functions can provide access and reference to the source(s) – document(“test.xml”), source(“view1”)

• Question: as we go down the list of uses of XQuery compare with XSL

XQuery on files, DOM objects, event streams, messages

• Usage scenarios– Transformation and processing of messages

• Significant (but not “killer”) advantages over XSL– Minor performance optimization superiority– Better streaming, pipelining– Cleaner extensible language

• Many academic and industrial prototypes of XQuery on files

XMLFile

XQuery Processor XQuery

DOMObject

SAXStream

Typical Scenario: XML Messaging

Wrapper

RDBMSRDBMS

Wrapper

SAP ERPSAP ERP

ApplicationRequests innative language

or specialwrapper API

SELECT *FROM Customer, OrderWHERE customer.name=“Joe” AND order.name=“Joe”

Cdom = Sap(conn1, “joe”)

SOAP serviceSOAP serviceMessage

Transformer

Summary of Steps

Developer’sProgramIssues

SQL Query

Wrapper returns

SQL result wrapped as

XML message

Developer’sXQuery

transformsXML message to

XML formatneeded by

app

Typical Scenario: XML Messaging

Wrapper

RDBMSRDBMS

Wrapper

SAP ERPSAP ERP

Application

SOAP serviceSOAP serviceMessage

Transformer

<query_result> <customer> <name> Joe </name> <balance>100M</balance> <order> fish… </order> </customer> <customer> <name> Joe </name> <balance>100M</balance> <order> meat… </order> </customer></query_result>

<customer> <name> Joe </name> <due> 780M </due> <orders> <order>fish</order> <order>meat</order> </orders></customer>

FOR $cn IN distinct(msg(123)/customer/name)RETURN <customer> $cn <due> 7.8 * msg(123)/customer[name=$cn]/balance </due> <orders> FOR $c IN msg(123)/customer WHERE $c/name = $cn RETURN {$c/order} </orders></customer>

Direct XQuery on Databases

Xquery Processor

RDBMSRDBMS

XML View of Relational DB

tuple

reldb

orders customers

name

tuple

balance

Joe 100M

XQuery

SQL (oneor more)

tuples

XMLresult

Let’s write a Russian

Doll schema

XQuery on Relational Databases FOR $c IN db(1)/customers/tupleWHERE $c/name = “Joe”RETURN <customer> $c/name <due> 7.8 * $c/balance </due> <orders> FOR $o IN db(1)/orders/tuple WHERE $c/name = $o/name RETURN $o </orders></customer> Xquery Processor

RDBMSRDBMS

XML View of Relational DB

SELECT * FROM customersWHERE name = “Joe”

For each customer #c SELECT * FROM orders WHERE orders.name = #c.name

Merge results

<customer> <name> Joe </name> <due> 780M </due> <orders> <order>fish</order> <order>meat</order> </orders></customer>

Summary of Steps

Developer’sProgramIssues

Xquery onXML view of

SQL DB

XqueryProcessor automatically

sends SQL queriesto DB and structures

XML result

XQuery on Relational Databases

• Single language for accessing database and structuring XML result

• Avoids deficiencies of SQL in dealing with nested structures, optional elements, etc

• …

XQuery on Distributed Sources

Xquery Processor (Mediator)

RDBMSRDBMS

XML View of All Sources

RDBMSRDBMS

XQuery XMLresult

XMLFile

Example: Access to Two Relational Databases

Xquery Processor (Mediator)

RDBMS(orders)

RDBMS(orders)

XML View of All Relational DBs

RDBMS(customers)

RDBMS(customers)

XQuery XMLresult

FOR $c IN db(1)/customers/tupleWHERE $c/name = “Joe”RETURN <customer> $c/name <due> 7.8 * $c/balance </due> <orders> FOR $o IN db(2)/orders/tuple WHERE $c/name = $o/name RETURN $o </orders></customer>

XQuery on Integrated Views

Xquery Processor (Mediator)

RDBMSRDBMS

Virtual Integrated XML View

RDBMSRDBMS

XQuery XMLresult

XMLFile

customer

view

customers

name

customer

balance

Joe 100M

orders

order order order

Let’s write the

“Joe” query again

and using XQuery to build the view

Xquery Processor (Mediator)

RDBMSRDBMS

Virtual Integrated XML View

RDBMSRDBMS

XQuery XMLresult

XMLFile

XQuery asView

Definition

View = Query

FOR $c IN db(1)/customers/tupleRETURN <customer> $c/name <due> 7.8 * $c/balance </due> <orders> FOR $o IN db(2)/orders/tuple WHERE $c/name = $o/name RETURN $o </orders></customer>