xml
DESCRIPTION
XML. May 6th, 2002. Instructor. AnHai Doan Brief bio high school in Vietnam & undergrad in Hungary M.S. at Wisconsin Ph.D. at Washington under Alon & Pedro Will move to Illinois in Aug Interests: databases, AI, Web. Agenda. XQuery: misc. stuff Processing XML data. XQuery. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/1.jpg)
XML
May 6th, 2002
![Page 2: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/2.jpg)
Instructor• AnHai Doan
• Brief bio– high school in Vietnam & undergrad in Hungary– M.S. at Wisconsin– Ph.D. at Washington under Alon & Pedro– Will move to Illinois in Aug– Interests: databases, AI, Web
![Page 3: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/3.jpg)
Agenda
• XQuery: misc. stuff
• Processing XML data
![Page 4: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/4.jpg)
XQuery
Summary:
• FOR-LET-WHERE-RETURN = FLWR
FOR/LET Clauses
WHERE Clause
RETURN Clause
List of tuples
List of tuples
Instance of Xquery data model
![Page 5: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/5.jpg)
FOR v.s. LET
FOR
• Binds node variables iteration
LET
• Binds collection variables one value
![Page 6: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/6.jpg)
FOR v.s. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...
LET $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
LET $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns: <result> <book>...</book> <book>...</book> <book>...</book> ...</result>
![Page 7: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/7.jpg)
Collections in XQuery
• Ordered and unordered collections– /bib/book/author = an ordered collection
– Distinct(/bib/book/author) = an unordered collection
• LET $a = /bib/book $a is a collection• $b/author a collection (several authors...)
RETURN <result> $b/author </result>RETURN <result> $b/author </result>Returns: <result> <author>...</author> <author>...</author> <author>...</author> ...</result>
![Page 8: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/8.jpg)
Collections in XQuery
What about collections in expressions ?
• $b/price list of n prices
• $b/price * 0.7 list of n numbers
• $b/price * $b/quantity list of n x m numbers ??
• $b/price * ($b/quant1 + $b/quant2) $b/price * $b/quant1 + $b/price * $b/quant2 !!
![Page 9: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/9.jpg)
Sorting in XQuery
<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/price </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>
<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/price </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>
![Page 10: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/10.jpg)
Sorting in XQuery
• Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause
![Page 11: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/11.jpg)
If-Then-Else
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
![Page 12: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/12.jpg)
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
![Page 13: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/13.jpg)
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
![Page 14: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/14.jpg)
Other Stuff in XQuery
• BEFORE and AFTER– for dealing with order in the input
• FILTER– deletes some edges in the result tree
• Recursive functions– Currently: arbitrary recursion– Perhaps more restrictions in the future ?
![Page 15: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/15.jpg)
Processing XML Data
• Do we really need to process XML data? What are we processing XML for?
• How are we going to do it? Use existing technology?
• Are there other processing paradigms that we need to consider?
![Page 16: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/16.jpg)
Query Processing For XML
• Approach 1: store XML in a relational database. Translate an XQuery query into a set of SQL queries.– Leverage 20 years of research & development.
• Approach 2: store XML in an object-oriented database system.– OO model is closest to XML, but systems do not
perform well and are not well accepted.
• Approach 3: build a native XML query processing engine. – Still in the research phase
![Page 17: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/17.jpg)
Relational Approach
• Step 1: given a DTD, create a relational schema.
• Step 2: map the XML document into tuples in the relational database.
• Step 3: given a query Q in Xquery, translate it to a set of queries P over the relational database.
• Step 4: translate the tuples returned from the relational database into XML elements.
![Page 18: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/18.jpg)
Which Relational Schema?• The key question! Affects performance.
• No magic solution.
• Some options:– The EDGE table: put everything in one table– The Attribute tables: create a table for every tag
name.– The inlining method: inline as much data into
the tables.
![Page 19: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/19.jpg)
An Example DTD<!DOCTYPE db [ <!ELEMENT db ((book|publisher)*)> <!ELEMENT book (title,author*,year?)> <!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT year (#PCDATA)> <!ELEMENT publisher (name, state)> <!ELEMENT name (#PCDATA)> <!ELEMENT state (#PCDATA)> <!ATTLIST book pub IDREF #IMPLIED>]>
![Page 20: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/20.jpg)
Recall: The XML Tree
db
book book publisher
title author title author author name state“CompleteGuideto DB2”
“Chamberlin” “TransactionProcessing”
“Bernstein” “Newcomer”“MorganKaufman”
“CA”
Tags on nodesData values on leaves
![Page 21: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/21.jpg)
The Edge Approach
sourceID tag destID destValue
- Don’t need a DTD.- Very simple to implement.
![Page 22: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/22.jpg)
The Attribute Approach
rootID bookId
bookID title
rootID pubID
pubID pubName
bookID author
Book
Title
Author
Publisher
pubID state
PubName
PubState
![Page 23: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/23.jpg)
The In-lining Approach
bookID title pubName pubState
bookID authorBookAuthor
Book
sourceID tag destID destValuePublisher
![Page 24: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/24.jpg)
Let the Querying Begin!
• Matching data using elements patterns.FOR $t IN
document(bib.xml)/book/[author=“bernstein”]/author/title
RETURN
<bernsteinBook> $t </bernsteinBook>
![Page 25: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/25.jpg)
The Edge Approach
SELECT e3.destValueFROM E as e1, E as e2, E as e3WHERE e1.tag = “book” and e1.destID=e2.sourceID and e2.tag=“title” and e1.destID=e3.sourceID and e3.tag=“author” and e2.author=“Bernstein”
![Page 26: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/26.jpg)
The Attribute Approach
SELECT Title.titleFROM Book, Title, Author WHERE Book.bookID = Author.bookID and Book.bookID = Title.bookID and Author.author = “Bernstein”
![Page 27: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/27.jpg)
The In-lining Approach
SELECT Book.titleFROM Book, BookAuthor WHERE Book.bookID =BookAuthor.bookID and BookAuthor.author = “Bernstein”
![Page 28: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/28.jpg)
A Challenge: Reconstructing Elements
• Matching data using elements patterns.FOR $b IN document(bib.xml)/book/[author=“bernstein”]
RETURN
<bernsteinBook> $b </bernsteinBook>
![Page 29: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/29.jpg)
Reconstructing XML Elements
• Matching data using elements patterns.WHERE <book>
<author> Bernstein </author>
<title> $t </title>
</book> ELEMENT-AS $e
IN “www.a.b.c/bib.xml”
CONSTRUCT
$e
![Page 30: XML](https://reader036.vdocuments.us/reader036/viewer/2022070404/56813c20550346895da59888/html5/thumbnails/30.jpg)
Some Open Questions
• Native query processing for XML• To order or not to order?• Combining IR-style keyword queries with
DB-style structured queries• Updates• Automatic selection of a relational schema• How should we extend relational engines to
better support XML storage and querying?