web data management
DESCRIPTION
Web Data Management. XQuery. In this lecture. Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting Resources XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/. - PowerPoint PPT PresentationTRANSCRIPT
Web Data Management
XQuery
1
In this lecture
• Summary of XQuery• FLWOR expressions – For, Let, Where, Order by, Return• FOR and LET expressions• Collections and sorting
ResourcesXQuery: A Query Language for XML Chamberlin, Florescu, et al.W3C recommendation: www.w3.org/TR/xquery/
2
XQuery
• Based on Quilt (which is based on XML-QL)
• Uses XPath to express more complex queries• http://www.w3.org/TR/xquery• XML Query data model (of course )
– Ordered !
3
4
Sample Data for Queries
<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>
</bib>
<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>
</bib>
5
Data Model for XPath
bib
book book
publisher author . . . .
Addison-Wesley Serge Abiteboul
The root
The root element
FLWOR (“Flower”) Expressions• FLWOR expression supports iteration and binding of
variables to intermediate results – useful for computing joins between two or more documents and
for restructuring data FOR ... LET... WHERE...ORDER BY…RETURN...• The for and let clauses generate an ordered sequence of
tuples of bound variables, called the tuple stream• The optional where clause serves to filter the tuple stream
• The optional order by clause can be used to reorder the
tuple stream• The return clause constructs the result of the FLWOR
expression
6
FOR-WHERE-RETURN
Find all book titles published after 1995:
7
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year/text() > 1995
RETURN $x/title
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year/text() > 1995
RETURN $x/title
Result: <title> abc </title> <title> def </title> <title> ghi </title>
8
FOR-WHERE-RETURN
Equivalently (perhaps more geekish)
FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title
RETURN $x
FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title
RETURN $x
And even shorter:
document("bib.xml")/bib/book[year/text() > 1995] /title document("bib.xml")/bib/book[year/text() > 1995] /title
9
FOR-WHERE-RETURN
• Find all book titles and the year when they were published:
FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>
FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>
Result: <answer> <title> abc </title> <year> 1995 </ year > </answer> <answer> <title> def </title> < year > 2002 </ year > </answer> <answer> <title> ghk </title> < year > 1980 </ year > </answer>
10
FOR-WHERE-RETURN
• Notice the use of “{“ and “}”• What is the result without them ?
FOR $x IN document("bib.xml")/bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
FOR $x IN document("bib.xml")/bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
11
Nesting
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/authorRETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result>
FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/authorRETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result>
In the RETURN clause comma concatenates XML fragments
12
Result
<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>
<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>
13
Aggregates
Find all books with more than 3 authors:
count = a function that countsavg = computes the averagesum = computes the sumdistinct-values = eliminates duplicates
FOR $x IN document("bib.xml")/bib/bookWHERE count($x/author)>3 RETURN $x
FOR $x IN document("bib.xml")/bib/bookWHERE count($x/author)>3 RETURN $x
14
Aggregates
Same thing:
FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x
FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x
15
Aggregates
Print all authors who published more than 3 books – be aware of duplicates !
FOR $b IN document("bib.xml")/bib, $a IN distinct-values($b/book/author/text())WHERE count($b/book[author/text()=$a])>3 RETURN <author> { $a } </author>
FOR $b IN document("bib.xml")/bib, $a IN distinct-values($b/book/author/text())WHERE count($b/book[author/text()=$a])>3 RETURN <author> { $a } </author>
Aggregates
16
count = a (aggregate) function that returns the number of elms
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
Aggregates
Find books whose price is larger than average:
17
LET $a=avg(document("bib.xml")/bib/book/@price)
FOR $b in document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
LET $a=avg(document("bib.xml")/bib/book/@price)
FOR $b in document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
18
Flattening
• “Flatten” the authors, i.e. return a list of (author, title) pairs
FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer>
FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer>
Result:<answer> <title> abc </title> <author> efg </author></answer><answer> <title> abc </title> <author> hkj </author></answer>
19
Re-grouping
• For each author, return all titles of her/his books
FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>
FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>
What aboutduplicateauthors ?
Result:<answer> <author> efg </author> <title> abc </title> <title> klm </title> . . . .</answer>
20
Re-grouping
• Same, but eliminate duplicate authors:
FOR $b IN document("bib.xml")/bibLET $a := distinct-values($b/book/author/text())FOR $x IN $aRETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>
FOR $b IN document("bib.xml")/bibLET $a := distinct-values($b/book/author/text())FOR $x IN $aRETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>
21
Re-grouping
• Same thing:
FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text())RETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>
FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text())RETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>
22
Another Example
Find book titles by the coauthors of “Database Theory”:
FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”], $y IN $b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer>
FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”], $y IN $b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer>
Result: <answer> abc </ answer > < answer > def </ answer > < answer > abc </ answer > < answer > ghk </ answer >
Question:Why do we get duplicates ?
23
Distinct-values
Same as before, but eliminate duplicates:
Result: <answer> abc </ answer > < answer > def </ answer > < answer > ghk </ answer >
distinct-values = a function that eliminates duplicates
Need to apply to a collectionof text values, not of elements – note how query has changed
FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”]/author/text(), $y IN distinct-values($b/book[author/text() = $x] /title/text())
RETURN <answer> { $y } </answer>
FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”]/author/text(), $y IN distinct-values($b/book[author/text() = $x] /title/text())
RETURN <answer> { $y } </answer>
24
SQL and XQuery Side-by-side
Product(pid, name, maker, price) Find all product names, prices,sort by price
SELECT x.name, x.priceFROM Product xORDER BY x.price
SELECT x.name, x.priceFROM Product xORDER BY x.price
SQL
FOR $x in document(“db.xml”)/db/Product/rowORDER BY $x/price/text()RETURN <answer> { $x/name, $x/price } </answer>
FOR $x in document(“db.xml”)/db/Product/rowORDER BY $x/price/text()RETURN <answer> { $x/name, $x/price } </answer>
XQuery
25
<answer> <name> abc </name> <price> 7 </price></answer> <answer> <name> def </name> <price> 23 </price></answer> . . . .
Xquery’s Answer
Notice: this is NOT awell-formed document !(WHY ???)
26
Producing a Well-Formed Answer
<myQuery> { FOR $x in document(“db.xml”)/db/Product/row ORDER BY $x/price/text() RETURN <answer> { $x/name, $x/price } </answer> }</myQuery>
<myQuery> { FOR $x in document(“db.xml”)/db/Product/row ORDER BY $x/price/text() RETURN <answer> { $x/name, $x/price } </answer> }</myQuery>
27
<myQuery> <answer> <name> abc </name> <price> 7 </price> </answer> <answer> <name> def </name> <price> 23 </price> </answer> . . . .</myQuery>
Xquery’s Answer
Now it is well-formed !
28
SQL and XQuery Side-by-sideProduct(pid, name, maker, price)Company(cid, name, city, revenues) Find all products made in Seattle
SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle”
SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle”
SQL
FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/rowWHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name }
FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/rowWHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name }
XQuery
FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name }
FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name }
CoolXQuery
29
<product> <row> <pid> 123 </pid> <name> abc </name> <maker> efg </maker> </row> <row> …. </row> …</product><product> . . .</product>. . . .
30
SQL and XQuery Side-by-sideFor each company with revenues < 1M count the products over $100
SELECT y.name, count(*)FROM Product x, Company yWHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000GROUP BY y.cid, y.name
SELECT y.name, count(*)FROM Product x, Company yWHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000GROUP BY y.cid, y.name
FOR $r in document(“db.xml”)/db, $y in $r/Company/row[revenue/text()<1000000]RETURN <proudCompany> <companyName> { $y/name/text() } </companyName> <numberOfExpensiveProducts> { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } </numberOfExpensiveProducts> </proudCompany>
FOR $r in document(“db.xml”)/db, $y in $r/Company/row[revenue/text()<1000000]RETURN <proudCompany> <companyName> { $y/name/text() } </companyName> <numberOfExpensiveProducts> { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } </numberOfExpensiveProducts> </proudCompany>
31
SQL and XQuery Side-by-sideFind companies with at least 30 products, and their average price
SELECT y.name, avg(x.price)FROM Product x, Company yWHERE x.maker=y.cidGROUP BY y.cid, y.nameHAVING count(*) > 30
SELECT y.name, avg(x.price)FROM Product x, Company yWHERE x.maker=y.cidGROUP BY y.cid, y.nameHAVING count(*) > 30
FOR $r in document(“db.xml”)/db, $y in $r/Company/rowLET $p := $r/Product/row[maker/text()=$y/cid/text()]WHERE count($p) > 30RETURN <theCompany> <companyName> { $y/name/text() } </companyName> <avgPrice> avg($p/price/text()) </avgPrice> </theCompany>
FOR $r in document(“db.xml”)/db, $y in $r/Company/rowLET $p := $r/Product/row[maker/text()=$y/cid/text()]WHERE count($p) > 30RETURN <theCompany> <companyName> { $y/name/text() } </companyName> <avgPrice> avg($p/price/text()) </avgPrice> </theCompany>
A collection
An element
32
Sorting in XQuery
<publisher_list> FOR $b IN document("bib.xml")//book[publisher = ‘M.K’] ORDER BY $b/price/text() RETURN <book> { $b/title , $b/price } </book></publisher_list>
<publisher_list> FOR $b IN document("bib.xml")//book[publisher = ‘M.K’] ORDER BY $b/price/text() RETURN <book> { $b/title , $b/price } </book></publisher_list>
If-Then-Else
33
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
Existential Quantifiers
34
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
35
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
36
Duplicate Elimination
• distinct-values(list-of-text-values)• How do we eliminate duplicate “tuples” ?
<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row><row> <a>3</a> <b>100</b> </row><row> <a>3</a> <b>200</b> </row><row> <a>8</a> <b>500</b> </row>
<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row><row> <a>3</a> <b>100</b> </row><row> <a>3</a> <b>200</b> </row><row> <a>8</a> <b>500</b> </row>
<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row>
<row> <a>3</a> <b>200</b> </row>
<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row>
<row> <a>3</a> <b>200</b> </row>
FOR v.s. LET
• FOR $x in expr -- binds $x to each element in the list expr
• LET $x = expr -- binds $x to the entire list expr– Useful for common subexpressions and for
aggregations
37
FOR v.s. LET
Summary:• FOR-LET-WHERE-RETURN = FLWR
38
FOR/LET Clauses
WHERE Clause
RETURN Clause
List of tuples
List of tuples
Instance of Xquery data model
FOR v.s. LET
FOR• Binds node variables iteration
LET• Binds collection variables one value
39
FOR v.s. LET
40
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns: <result> <book>...</book> <book>...</book> <book>...</book> ...</result>
Collections in XQuery
• Ordered and unordered collections– /bib/book/author = an ordered collection– Distinct(/bib/book/author) = an unordered collection
• LET $a = /bib/book $a is a collection• $b/author a collection (several authors...)
41
RETURN <result> $b/author </result>RETURN <result> $b/author </result>Returns: <result> <author>...</author> <author>...</author> <author>...</author> ...</result>
Collections in XQuery
What about collections in expressions ?
• $b/@price list of n prices
• $b/@price * 0.7 list of n numbers
• $b/@price * $b/@quantity list of n x m numbers ??
• $b/@price * ($b/@quant1 + $b/@quant2) $b/@price * $b/@quant1 + $b/@price * $b/@quant2 !!
42
Sorting in XQuery
43
<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/@price </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>
<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/@price </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>
Sorting in XQuery
• Sorting arugments: refer to the name space of the RETURN clause, not the FOR clause
• To sort on an element you don’t want to display, first return it, then remove it with an additional query.
44
Other Stuff in XQuery
• BEFORE and AFTER– for dealing with order in the input
• FILTER– deletes some edges in the result tree
• Recursive functions– Currently: arbitrary recursion– Perhaps more restrictions in the future ?
45
Group-By in Xquery ??
• No GROUPBY currently in XQuery• A recent proposal (next)
– What do YOU think ?
46
Group-By in Xquery ??
47
FOR $b IN document("http://www.bn.com")/bib/book,
$y IN $b/@year
WHERE $b/publisher="Morgan Kaufmann"
RETURN GROUPBY $y
WHERE count($b) > 10
IN <year> $y </year>
FOR $b IN document("http://www.bn.com")/bib/book,
$y IN $b/@year
WHERE $b/publisher="Morgan Kaufmann"
RETURN GROUPBY $y
WHERE count($b) > 10
IN <year> $y </year>
SELECT year
FROM Bib
WHERE Bib.publisher="Morgan Kaufmann"
GROUPBY year
HAVING count(*) > 10
SELECT year
FROM Bib
WHERE Bib.publisher="Morgan Kaufmann"
GROUPBY year
HAVING count(*) > 10
with GROUPBY
Equivalent SQL
Group-By in Xquery ??
48
FOR $b IN document("http://www.bn.com")/bib/book, $a IN $b/author, $y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
FOR $b IN document("http://www.bn.com")/bib/book, $a IN $b/author, $y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib, $a IN $b/author, $y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>), $a IN $Tup/a/node(), $y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib, $a IN $b/author, $y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>), $a IN $Tup/a/node(), $y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
with GROUPBY
Without GROUPBY
Group-By in Xquery ??
49
FOR $b IN document("http://www.bn.com")/bib/book, $a IN $b/author, $y IN $b/@year, $t IN $b/title, $p IN $b/publisher RETURN GROUPBY $p, $y IN <result> $p, <year> $y </year>, GROUPBY $a IN <authorEntry> $a, GROUPBY $t IN $t <authorEntry> </result>
Nested GROUPBY’s