XML, XML Schema, XPath and XQuery Query Languages
CS561
Slides collated from several sources, including D. Suciu at Univ. of Washington
XML Data
CS561 - Spring 2007. 3
XML
W3C standard to complement HTML
• origins: structured text SGML
• motivation:
  – HTML describes presentation
  – XML describes content
• HTML is an application of SGML; XML is a simplified subset of SGML
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element
• well-formed XML document: every start tag has a matching end tag and elements are properly nested
XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
<children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
So Far
• Differences between “XML data” and “relational data”?
  – Data model?
  – Typed?
  – Homogeneity?
  – Correctness?
  – Usage/purpose?
“XML Data Model”
Numerous competing models:
• Document Object Model (DOM):
  – class hierarchy (node, element, attribute, …)
  – defines API to inspect/modify the document
• XML query data model (formal)
XML Namespaces
• http://www.w3.org/TR/REC-xml-names
• name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
<title> … </title>
<number> 15 </number>
<isbn:number> …. </isbn:number>
</book>
XML Namespaces
• syntactic: <number>, <isbn:number>
• semantic: provide URL for “shared” schema
<tag xmlns:mystyle="http://…">
…
<mystyle:title> … </mystyle:title>
<mystyle:number> …
</tag>
The mystyle prefix is defined here, by the xmlns:mystyle attribute on <tag>.
So Far
• What are “namespaces” good for ?
• Are they typically available for relational databases?
Schemas for XML
DTD - Element Type Definitions
<!ELEMENT paper (title,author*, year, (journal|conference) )>
XML Schemas
• generalize DTDs (an SGML derivative)
• use XML syntax instead of a separate DTD syntax
• the specification has two main parts: structures and data types
• XML Schema is more powerful but more complex
XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title, author*, year, (journal|conference))>
So Far
• Differences between “xml schema” versus “relational schema” ?
  – Purpose? Do we need it?
  – Definition time?
  – Strictness of typing?
  – Underlying model?
Elements versus Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT person (name, address)>
Elements versus Types in XML Schema
• Types:
  – Simple types (integers, strings, ...)
  – Complex types (regular expressions, like in DTDs)
• Element-type-element alternation:
  – Root element has a complex type
  – Complex type is a regular expression of elements
  – Those elements have their complex types ...
  – ...
  – Leaves have simple types
Local and Global Types in XML Schema
• Local type:
<xsd:element name="person">
  [define locally the person’s type]
</xsd:element>
• Global type:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  [define here the type ttt]
</xsd:complexType>
Global types: can be reused in other elements
Local vs. Global Elements in XML Schema
• Local element:
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="address" type="..."/>
    ...
  </xsd:sequence>
</xsd:complexType>
• Global element:
<xsd:element name="address" type="..."/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element ref="address"/>
    ...
  </xsd:sequence>
</xsd:complexType>
Global elements: like in DTDs
Regular Expressions in XML Schema
Recall the element-type-element alternation: <xsd:complexType name=“....”>
[regular expression on elements] </xsd:complexType>
Regular expressions:
• <xsd:sequence> A B C </...> = A B C
• <xsd:choice> A B C </...> = A | B | C
• <xsd:group> A B C </...> = (A B C)
• <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...> = (...)*
• <xsd:... minOccurs="0" maxOccurs="1"> .. </...> = (...)?
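The correspondence between occurrence bounds and regular-expression operators listed above can be sketched in a few lines of Python. The helper name occurs_to_quantifier is hypothetical (it is not part of any XML Schema library); it only illustrates how minOccurs/maxOccurs pairs map to ?, * and +.

```python
def occurs_to_quantifier(min_occurs=1, max_occurs="1"):
    """Map XML Schema occurrence bounds to a regex-style quantifier.

    Hypothetical helper for illustration; max_occurs is a string because
    XML Schema allows the literal value "unbounded".
    """
    unbounded = max_occurs == "unbounded"
    upper = None if unbounded else int(max_occurs)
    if min_occurs == 1 and upper == 1:
        return ""          # exactly once: no operator needed
    if min_occurs == 0 and upper == 1:
        return "?"         # optional
    if min_occurs == 0 and unbounded:
        return "*"         # zero or more
    if min_occurs == 1 and unbounded:
        return "+"         # one or more
    # Any other pair becomes an explicit {m,n} range.
    return "{%d,%s}" % (min_occurs, "" if unbounded else upper)


star = occurs_to_quantifier(0, "unbounded")
optional = occurs_to_quantifier(0, "1")
```

Note that XML Schema defaults both bounds to 1, which is why an element such as author needs maxOccurs="unbounded" to match the DTD's author*.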
Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>
<complexType name="USAddress">
<complexContent>
<extension base= "ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
Corresponds to inheritance
Key Constraints in XML
Keys in XML Schema
<purchaseReport>
<regions>
<zip code="95819">
<part number="872-AA" quantity="1"/>
<part number="926-AA" quantity="1"/>
<part number="833-AA" quantity="1"/>
<part number="455-BX" quantity="1"/>
</zip>
<zip code="63143">
<part number="455-BX" quantity="4"/>
</zip>
</regions>
<parts>
<part number="872-AA">Lawnmower</part>
<part number="926-AA">Baby Monitor</part>
<part number="833-AA">Lapis Necklace</part>
<part number="455-BX">Sturdy Shelves</part>
</parts>
</purchaseReport>
XML Schema for the key:
<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>
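What the NumKey constraint demands of a document can be sketched in Python with the standard xml.etree.ElementTree module. This is not a schema validator: key_violations is a hypothetical helper that only mimics the selector/field idea (select nodes, then require the field to exist and be unique), and REPORT is a trimmed copy of the slide's document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the purchaseReport document from the slide.
REPORT = """<purchaseReport>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>"""


def key_violations(root, selector, field_attr):
    """List problems that would violate a key with this selector and @field."""
    seen, problems = set(), []
    for node in root.findall(selector):      # selector, relative to root
        value = node.get(field_attr)         # field: an attribute value
        if value is None:
            problems.append("missing " + field_attr)   # key requires existence
        elif value in seen:
            problems.append("duplicate " + value)      # key requires uniqueness
        else:
            seen.add(value)
    return problems


violations = key_violations(ET.fromstring(REPORT), "parts/part", "number")
```

For the slide's document the list of violations is empty, since every part under parts carries a distinct number.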
Keys in XML Schema
• In general, the syntax is:
<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>
Notes:
• All XPath expressions “start” at the element currently being defined.
• Each field must identify a single “node”.
Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
  – /a/b | /a/c OK for selector
  – //a/b/*/c OK for field
• Note: better than DTD’s ID mechanism
Examples of Keys in XML Schema
• Examples
<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="firstname"/>
  <field xpath="surname"/>
</key>
<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>
Note: each person must have a single firstname and a single surname
Foreign Keys in XML Schema
• Example
<keyref name="personRef" refer="fullName">
<selector xpath=".//personPointer"/>
<field xpath="@first"/>
<field xpath="@last"/>
</keyref>
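The referential check that personRef expresses can be sketched in Python. The sample document and the helper dangling_refs are assumptions made for illustration (the slides do not show an instance document); the idea is only that every (first, last) pair on a personPointer must match the (firstname, surname) key of some person.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; element names follow the slide's XPaths.
PEOPLE = """<root>
  <person><firstname>Jane</firstname><surname>Doe</surname></person>
  <person><firstname>John</firstname><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Mary" last="Poe"/>
</root>"""


def dangling_refs(root):
    """Return (first, last) references that match no fullName key."""
    # Key values: one (firstname, surname) tuple per person, as in fullName.
    keys = {(p.findtext("firstname"), p.findtext("surname"))
            for p in root.iter("person")}
    # Foreign-key values: the two attributes named in the keyref fields.
    refs = [(r.get("first"), r.get("last")) for r in root.iter("personPointer")]
    return [ref for ref in refs if ref not in keys]


dangling = dangling_refs(ET.fromstring(PEOPLE))
```

In this sample only the pointer to Mary Poe is dangling, which is the kind of document a keyref would reject.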
So Far
• Differences between “keys/foreign-keys” in XML versus the relational model?
  – Purpose?
  – Underlying model?
XPath
“The Basic Building Block”
XPath
• Goal: select (access) some nodes of a document
• XPath main construct: axis navigation
• Navigation step: axis + node-test + predicates
• Examples
  – descendant::node()
  – child::author
  – attribute::booktitle = "XML"
XPath
• An XPath path consists of one or more navigation steps, separated by “/”
• Navigation step: axis + node-test + predicates
• Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent::node()/attribute::booktitle = "XML"][2]
• XPath offers shortcuts:
  – no axis means child
  – // means /descendant-or-self::node()/
XPath - Child Axis Navigation
• author is shorthand for child::author
• Examples:
  – aaa -- all the children nodes labeled aaa
  – aaa/bbb -- all the bbb grandchildren reached through aaa children
  – */bbb -- all the bbb grandchildren of any child
• Notes:
  – . -- the context node
  – / -- the root node
[Figure: example tree whose nodes are labeled aaa, bbb and ccc and numbered 1 to 7, with the context node marked]
XPath- Child Axis Navigation
  – /doc -- all doc children of the root
  – ./aaa -- all aaa children of the context node (equivalent to aaa)
  – text() -- all text children of the context node
  – node() -- all children of the context node (includes text nodes; attribute nodes are not children and are reached with @)
  – .. -- parent of the context node
  – .// -- the context node and all its descendants
  – // -- the root node and all its descendants
  – //text() -- all the text nodes in the document
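Several of these child-axis paths can be tried with Python's standard xml.etree.ElementTree, which supports a limited XPath subset (tag names, *, ., .., // and simple predicates). The sample document below is made up for illustration, and the root element plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Made-up document; <doc> acts as the context node.
context = ET.fromstring(
    "<doc>"
    "<aaa><bbb>x</bbb><ccc/></aaa>"
    "<aaa><bbb>y</bbb></aaa>"
    "<ccc><bbb>z</bbb></ccc>"
    "</doc>"
)

aaa_children = context.findall("aaa")                      # aaa
bbb_under_aaa = [e.text for e in context.findall("aaa/bbb")]  # aaa/bbb
bbb_under_any = [e.text for e in context.findall("*/bbb")]    # */bbb
all_bbb = [e.text for e in context.findall(".//bbb")]         # .//bbb
parent_of_bbb = context.find("aaa/bbb/..")                 # .. steps back up
all_text = list(context.itertext())                        # roughly .//text()
```

ElementTree has no text() or node() tests, so itertext() stands in for collecting text nodes here.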
Predicates
  – [2] -- the second child node of the context node
  – chapter[5] -- the fifth chapter child of the context node
  – [last()] -- the last child node of the context node
  – chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is “introduction” (the string-value is the concatenation of all text in descendant text nodes)
  – person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value “joe”
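A sketch of these predicates using Python's xml.etree.ElementTree, on a made-up document. ElementTree supports positional predicates, last() and child-value tests, but not a descendant test inside a predicate, so the last case is written as an explicit Python filter instead.

```python
import xml.etree.ElementTree as ET

# Made-up document for illustration; <book> is the context node.
book = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>methods</title></chapter>"
    "<chapter><title>results</title></chapter>"
    "<person><name><firstname>joe</firstname></name></person>"
    "<person><name><firstname>ann</firstname></name></person>"
    "</book>"
)

second = book.find("chapter[2]")                         # positional predicate
last = book.find("chapter[last()]")                      # last()
intro = book.findall("chapter[title='introduction']")    # child value test

# person[.//firstname = "joe"]: emulate the descendant test in Python.
joes = [p for p in book.findall("person")
        if any(f.text == "joe" for f in p.iter("firstname"))]
```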
Axis navigation
• So far, our expressions have moved us down by moving to children nodes.
• Exceptions are:
  – . stay where you are
  – / go to the root
  – // all descendants of the root
  – .// all descendants of the context node
Axis navigation
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
• Some of these describe single nodes:
  – self, parent
• Some describe sequences of nodes:
  – all others
XPath Navigation Axes
[Figure: tree diagram marking the regions reached from the context node by each axis: ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace, self]
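Some of the axes can be computed by hand in Python to make their meaning concrete. ElementTree elements do not store parent pointers, so this sketch first builds a child-to-parent map; the tree and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up tree:  a -> (b -> (d, e), c -> (f))
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# ElementTree has no parent links, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out


def descendants(node):
    """descendant axis: everything below node, in document order."""
    return [n for n in node.iter() if n is not node]


def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


d = root.find("b/d")
ancestor_tags = [n.tag for n in ancestors(d)]
sibling_tags = [n.tag for n in following_siblings(d)]
descendant_tags = [n.tag for n in descendants(root)]
```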
XPath Abbreviated Syntax
(nothing)   child::
@           attribute::
//          /descendant-or-self::node()/
.           self::node()
.//         descendant-or-self::node()/
..          parent::node()
/           (document root)
So Far
Differences between SQL and XPATH?
• What are similar query capabilities?
• What features does SQL have, but not XPath?
• What features does XPath support, but not SQL?
• Is XPath a full-fledged query language?
Query Languages - XQuery
Summary of XQuery
• FLWR expressions
• FOR and LET expressions
• Collections and sorting
Resources
• XQuery: A Query Language for XML. Chamberlin, Florescu, et al.
• W3C recommendation: www.w3.org/TR/xquery/
XQuery
• Designed based on Quilt (which is based on XML-QL)
• http://www.w3.org/TR/xquery/ (2/2001)
• XML Query data model (ordered)
FLWR (“Flower”) Expressions
FOR ... LET... FOR... LET...
WHERE...
RETURN...
XQuery
Find the titles of all books published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
What does the result look like?
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
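For readers more used to a general-purpose language, the FOR / WHERE / RETURN query above can be mirrored in Python with xml.etree.ElementTree. This is only an analogy, not XQuery: the inline BIB document and the function titles_after are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for bib.xml.
BIB = """<bib>
  <book><title>abc</title><year>1997</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>1999</year></book>
</bib>"""


def titles_after(bib_xml, year):
    """FOR each /bib/book, WHERE year > given year, RETURN its title text."""
    root = ET.fromstring(bib_xml)             # root is the <bib> element
    return [book.findtext("title")            # RETURN $x/title
            for book in root.findall("book")  # FOR $x IN /bib/book
            if int(book.findtext("year")) > year]   # WHERE $x/year > 1995


recent = titles_after(BIB, 1995)
```

The list comprehension has the same three parts as the FLWR expression: an iteration, a filter and a result constructor.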
XQuery Example
FOR $a IN (document("bib.xml") /bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
$a,
FOR $t IN /bib/book[author=$a]/title
RETURN $t
</result>
For each author of a book published by Morgan Kaufmann, list all the books she published.
What is the query result?
XQuery
Result:
<result> <author>Jones</author> <title> abc </title> <title> def </title> </result>
<result> <author>Jones</author> <title> abc </title> <title> def </title> </result>
<result> <author> Smith </author> <title> ghi </title> </result>
XQuery Example: Duplicates
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml") /bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
$a,
FOR $t IN /bib/book[author=$a]/title
RETURN $t
</result>
distinct = a function that eliminates duplicates
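The effect of distinct on the nested query can be sketched in Python. The BIB data and the function titles_by_author are assumptions made for illustration; dict.fromkeys is used as an order-preserving way to drop duplicates, which is a choice of this sketch rather than something XQuery specifies.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for bib.xml; Jones wrote two Morgan Kaufmann books.
BIB = """<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>xyz</title><author>Lee</author><publisher>Other Press</publisher></book>
</bib>"""


def titles_by_author(bib_xml, publisher, distinct=True):
    """One (author, titles) pair per author binding of the outer FOR."""
    books = ET.fromstring(bib_xml).findall("book")
    # Outer FOR: authors of books by the given publisher (with repeats).
    authors = [a.text for b in books
               if b.findtext("publisher") == publisher
               for a in b.findall("author")]
    if distinct:
        authors = list(dict.fromkeys(authors))   # drop repeated authors
    # Inner FOR: all titles of books that list this author.
    return [(name, [b.findtext("title") for b in books
                    if name in [a.text for a in b.findall("author")]])
            for name in authors]


with_distinct = titles_by_author(BIB, "Morgan Kaufmann")
without_distinct = titles_by_author(BIB, "Morgan Kaufmann", distinct=False)
```

Without duplicate elimination the Jones result appears once per Morgan Kaufmann book by Jones; with it, once per author.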
Example XQuery Result
Result:
<result> <author>Jones</author> <title> abc </title> <title> def </title> </result>
<result> <author> Smith </author> <title> ghi </title> </result>
XQuery
• FOR $x IN expr
  – binds $x to each element in the list expr
  – useful for iteration over some input list
• LET $x := expr
  – binds $x to the entire list expr
  – useful for common subexpressions and for grouping and aggregations
XQuery with LET Clause
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
count = an aggregate function that returns the number of elements
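The grouping role of LET in this query can be mirrored in Python: for each distinct publisher, bind the whole list of its books and then filter on the size of that list. The sample data, the threshold of 1 and the function big_publishers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for bib.xml.
BIB = """<bib>
  <book><title>a</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>b</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>c</title><publisher>Addison Wesley</publisher></book>
</bib>"""


def big_publishers(bib_xml, threshold):
    """Publishers with more than `threshold` books."""
    root = ET.fromstring(bib_xml)
    # FOR $p IN distinct(//publisher): unique names, first-seen order.
    publishers = dict.fromkeys(p.text for p in root.iter("publisher"))
    result = []
    for name in publishers:
        # LET $b := //book[publisher = $p]: the whole group at once.
        group = [b for b in root.iter("book")
                 if b.findtext("publisher") == name]
        if len(group) > threshold:        # WHERE count($b) > threshold
            result.append(name)           # RETURN $p
    return result


big = big_publishers(BIB, 1)
```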
XQuery
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
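The same two-step shape (compute one aggregate with LET, then iterate with FOR) looks like this in Python. The BIB data and the function above_average are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Made-up stand-in for bib.xml; price is an attribute, as in the query.
BIB = """<bib>
  <book price="10"><title>a</title></book>
  <book price="20"><title>b</title></book>
  <book price="60"><title>c</title></book>
</bib>"""


def above_average(bib_xml):
    """Titles of books whose @price exceeds the average @price."""
    books = ET.fromstring(bib_xml).findall("book")
    # LET $a := avg(/bib/book/@price): a single value, computed once.
    average = mean(float(b.get("price")) for b in books)
    # FOR $b ... WHERE $b/@price > $a RETURN $b (titles only, for brevity).
    return [b.findtext("title") for b in books
            if float(b.get("price")) > average]


expensive = above_average(BIB)
```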
FOR versus LET
FOR
• Binds node variables: iterates, one binding per node
LET
• Binds collection variables: one value (the whole collection)
FOR v.s. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
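The difference between the two bindings can be reproduced in Python: iterating builds one result element per book, while binding the whole list builds a single result element holding all of them. The helpers wrap_for and wrap_let and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the /bib/book sequence.
books = ET.fromstring(
    "<bib><book>1</book><book>2</book><book>3</book></bib>"
).findall("book")


def wrap_for(items):
    """FOR $x IN items RETURN <result> $x </result>: one result per item."""
    results = []
    for item in items:
        result = ET.Element("result")
        result.append(item)
        results.append(result)
    return results


def wrap_let(items):
    """LET $x := items RETURN <result> $x </result>: one result in total."""
    result = ET.Element("result")
    result.extend(items)
    return [result]


for_results = wrap_for(books)
let_results = wrap_let(books)
```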
Collections in XQuery
• Ordered and unordered collections
  – /bib/book/author = an ordered collection
  – distinct(/bib/book/author) = an unordered collection
• LET $a := /bib/book: $a is a collection
• $b/author: a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author> <author>...</author> <author>...</author> ... </result>
XQuery Summary
FOR-LET-WHERE-RETURN = FLWR
FOR/LET clauses
  ↓ list of tuples
WHERE clause
  ↓ list of tuples
RETURN clause
  ↓ instances of XQuery data model
XQuery
Some more query features
Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book>
                    $b/title ,
                    $b/@price
                  </book> SORTBY (price DESCENDING)
         </publisher> SORTBY (name)
</publisher_list>
Sorting in XQuery
• Sorting arguments: refer to name space of RETURN clause, not of FOR clause
• TIP: To sort on an element you don’t want to display, first return it, then remove it with an additional query.
If-Then-Else
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
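The SOME and EVERY queries map naturally onto Python's any() and all(). The sketch below uses a made-up document and hypothetical helper names; note that all() over an empty sequence is true, just as EVERY is satisfied by a book with no para elements.

```python
import xml.etree.ElementTree as ET

# Made-up document: book A, book B, and book C (which has no <para>).
DOC = ET.fromstring(
    "<lib>"
    "<book><title>A</title><para>sailing and windsurfing</para>"
    "<para>sailing only</para></book>"
    "<book><title>B</title><para>windsurfing</para><para>sailing</para></book>"
    "<book><title>C</title></book>"
    "</lib>"
)


def text_of(element):
    """Concatenated text of an element and its descendants (string-value)."""
    return "".join(element.itertext())


def some_para_has_both(book):
    # SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
    return any("sailing" in text_of(p) and "windsurfing" in text_of(p)
               for p in book.iter("para"))


def every_para_has_sailing(book):
    # EVERY $p IN $b//para SATISFIES contains($p, "sailing")
    return all("sailing" in text_of(p) for p in book.iter("para"))


some_titles = [b.findtext("title") for b in DOC.iter("book")
               if some_para_has_both(b)]
every_titles = [b.findtext("title") for b in DOC.iter("book")
                if every_para_has_sailing(b)]
```

Book B shows why the existential query needs both words in the same paragraph: each word occurs, but never together in one para.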
So Far
• Similarities between SQL and XQuery?
• Differences between SQL and XQuery?
XML, XML Data Model
XML Schema, XPath XQuery