web data management

49
Web Data Management XQuery 1

Upload: adamma

Post on 19-Jan-2016

40 views

Category:

Documents


0 download

DESCRIPTION

Web Data Management. XQuery. In this lecture. Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting Resources XQuery: A Query Language for XML Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Web Data Management

Web Data Management

XQuery

1

Page 2: Web Data Management

In this lecture

• Summary of XQuery• FLWOR expressions – For, Let, Where, Order by, Return• FOR and LET expressions• Collections and sorting

ResourcesXQuery: A Query Language for XML Chamberlin, Florescu, et al.W3C recommendation: www.w3.org/TR/xquery/

2

Page 3: Web Data Management

XQuery

• Based on Quilt (which is based on XML-QL)

• Uses XPath to express more complex queries• http://www.w3.org/TR/xquery• XML Query data model (of course )

– Ordered !

3

Page 4: Web Data Management

4

Sample Data for Queries

<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>

</bib>

<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>

</bib>

Page 5: Web Data Management

5

Data Model for XPath

bib

book book

publisher author . . . .

Addison-Wesley Serge Abiteboul

The root

The root element

Page 6: Web Data Management

FLWOR (“Flower”) Expressions• FLWOR expression supports iteration and binding of

variables to intermediate results – useful for computing joins between two or more documents and

for restructuring data FOR ... LET... WHERE...ORDER BY…RETURN...• The for and let clauses generate an ordered sequence of

tuples of bound variables, called the tuple stream• The optional where clause serves to filter the tuple stream

• The optional order by clause can be used to reorder the

tuple stream• The return clause constructs the result of the FLWOR

expression

6

Page 7: Web Data Management

FOR-WHERE-RETURN

Find all book titles published after 1995:

7

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year/text() > 1995

RETURN $x/title

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year/text() > 1995

RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>

Page 8: Web Data Management

8

FOR-WHERE-RETURN

Equivalently (perhaps more geekish)

FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title

RETURN $x

FOR $x IN document("bib.xml")/bib/book[year/text() > 1995] /title

RETURN $x

And even shorter:

document("bib.xml")/bib/book[year/text() > 1995] /title document("bib.xml")/bib/book[year/text() > 1995] /title

Page 9: Web Data Management

9

FOR-WHERE-RETURN

• Find all book titles and the year when they were published:

FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>

FOR $x IN document("bib.xml")/ bib/bookRETURN <answer> <title>{ $x/title/text() } </title> <year>{ $x/year/text() } </year> </answer>

Result: <answer> <title> abc </title> <year> 1995 </ year > </answer> <answer> <title> def </title> < year > 2002 </ year > </answer> <answer> <title> ghk </title> < year > 1980 </ year > </answer>

Page 10: Web Data Management

10

FOR-WHERE-RETURN

• Notice the use of “{“ and “}”• What is the result without them ?

FOR $x IN document("bib.xml")/bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

FOR $x IN document("bib.xml")/bib/bookRETURN <answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>

Page 11: Web Data Management

11

Nesting

For each author of a book by Morgan Kaufmann, list all books she published:

FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/authorRETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result>

FOR $b IN document(“bib.xml”)/bib, $a IN $b/book[publisher /text()=“Morgan Kaufmann”]/authorRETURN <result> { $a, FOR $t IN $b/book[author/text()=$a/text()]/title RETURN $t } </result>

In the RETURN clause comma concatenates XML fragments

Page 12: Web Data Management

12

Result

<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

Page 13: Web Data Management

13

Aggregates

Find all books with more than 3 authors:

count = a function that countsavg = computes the averagesum = computes the sumdistinct-values = eliminates duplicates

FOR $x IN document("bib.xml")/bib/bookWHERE count($x/author)>3 RETURN $x

FOR $x IN document("bib.xml")/bib/bookWHERE count($x/author)>3 RETURN $x

Page 14: Web Data Management

14

Aggregates

Same thing:

FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x

FOR $x IN document("bib.xml")/bib/book[count(author)>3] RETURN $x

Page 15: Web Data Management

15

Aggregates

Print all authors who published more than 3 books – be aware of duplicates !

FOR $b IN document("bib.xml")/bib, $a IN distinct-values($b/book/author/text())WHERE count($b/book[author/text()=$a])>3 RETURN <author> { $a } </author>

FOR $b IN document("bib.xml")/bib, $a IN distinct-values($b/book/author/text())WHERE count($b/book[author/text()=$a])>3 RETURN <author> { $a } </author>

Page 16: Web Data Management

Aggregates

16

count = a (aggregate) function that returns the number of elms

<big_publishers>

FOR $p IN distinct(document("bib.xml")//publisher)

LET $b := document("bib.xml")/book[publisher = $p]

WHERE count($b) > 100

RETURN $p

</big_publishers>

<big_publishers>

FOR $p IN distinct(document("bib.xml")//publisher)

LET $b := document("bib.xml")/book[publisher = $p]

WHERE count($b) > 100

RETURN $p

</big_publishers>

Page 17: Web Data Management

Aggregates

Find books whose price is larger than average:

17

LET $a=avg(document("bib.xml")/bib/book/@price)

FOR $b in document("bib.xml")/bib/book

WHERE $b/@price > $a

RETURN $b

LET $a=avg(document("bib.xml")/bib/book/@price)

FOR $b in document("bib.xml")/bib/book

WHERE $b/@price > $a

RETURN $b

Page 18: Web Data Management

18

Flattening

• “Flatten” the authors, i.e. return a list of (author, title) pairs

FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer>

FOR $b IN document("bib.xml")/bib/book, $x IN $b/title/text(), $y IN $b/author/text()RETURN <answer> <title> { $x } </title> <author> { $y } </author> </answer>

Result:<answer> <title> abc </title> <author> efg </author></answer><answer> <title> abc </title> <author> hkj </author></answer>

Page 19: Web Data Management

19

Re-grouping

• For each author, return all titles of her/his books

FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

FOR $b IN document("bib.xml")/bib, $x IN $b/book/author/text()RETURN <answer> <author> { $x } </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

What aboutduplicateauthors ?

Result:<answer> <author> efg </author> <title> abc </title> <title> klm </title> . . . .</answer>

Page 20: Web Data Management

20

Re-grouping

• Same, but eliminate duplicate authors:

FOR $b IN document("bib.xml")/bibLET $a := distinct-values($b/book/author/text())FOR $x IN $aRETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

FOR $b IN document("bib.xml")/bibLET $a := distinct-values($b/book/author/text())FOR $x IN $aRETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

Page 21: Web Data Management

21

Re-grouping

• Same thing:

FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text())RETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

FOR $b IN document("bib.xml")/bib, $x IN distinct-values($b/book/author/text())RETURN <answer> <author> {$x} </author> { FOR $y IN $b/book[author/text()=$x]/title RETURN $y } </answer>

Page 22: Web Data Management

22

Another Example

Find book titles by the coauthors of “Database Theory”:

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”], $y IN $b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer>

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”], $y IN $b/book[author/text() = $x/author/text()]RETURN <answer> { $y/title/text() } </answer>

Result: <answer> abc </ answer > < answer > def </ answer > < answer > abc </ answer > < answer > ghk </ answer >

Question:Why do we get duplicates ?

Page 23: Web Data Management

23

Distinct-values

Same as before, but eliminate duplicates:

Result: <answer> abc </ answer > < answer > def </ answer > < answer > ghk </ answer >

distinct-values = a function that eliminates duplicates

Need to apply to a collectionof text values, not of elements – note how query has changed

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”]/author/text(), $y IN distinct-values($b/book[author/text() = $x] /title/text())

RETURN <answer> { $y } </answer>

FOR $b IN document("bib.xml")/bib, $x IN $b/book[title/text() = “Database Theory”]/author/text(), $y IN distinct-values($b/book[author/text() = $x] /title/text())

RETURN <answer> { $y } </answer>

Page 24: Web Data Management

24

SQL and XQuery Side-by-side

Product(pid, name, maker, price) Find all product names, prices,sort by price

SELECT x.name, x.priceFROM Product xORDER BY x.price

SELECT x.name, x.priceFROM Product xORDER BY x.price

SQL

FOR $x in document(“db.xml”)/db/Product/rowORDER BY $x/price/text()RETURN <answer> { $x/name, $x/price } </answer>

FOR $x in document(“db.xml”)/db/Product/rowORDER BY $x/price/text()RETURN <answer> { $x/name, $x/price } </answer>

XQuery

Page 25: Web Data Management

25

<answer> <name> abc </name> <price> 7 </price></answer> <answer> <name> def </name> <price> 23 </price></answer> . . . .

Xquery’s Answer

Notice: this is NOT awell-formed document !(WHY ???)

Page 26: Web Data Management

26

Producing a Well-Formed Answer

<myQuery> { FOR $x in document(“db.xml”)/db/Product/row ORDER BY $x/price/text() RETURN <answer> { $x/name, $x/price } </answer> }</myQuery>

<myQuery> { FOR $x in document(“db.xml”)/db/Product/row ORDER BY $x/price/text() RETURN <answer> { $x/name, $x/price } </answer> }</myQuery>

Page 27: Web Data Management

27

<myQuery> <answer> <name> abc </name> <price> 7 </price> </answer> <answer> <name> def </name> <price> 23 </price> </answer> . . . .</myQuery>

Xquery’s Answer

Now it is well-formed !

Page 28: Web Data Management

28

SQL and XQuery Side-by-sideProduct(pid, name, maker, price)Company(cid, name, city, revenues) Find all products made in Seattle

SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle”

SELECT x.nameFROM Product x, Company yWHERE x.maker=y.cid and y.city=“Seattle”

SQL

FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/rowWHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name }

FOR $r in document(“db.xml”)/db, $x in $r/Product/row, $y in $r/Company/rowWHERE $x/maker/text()=$y/cid/text() and $y/city/text() = “Seattle”RETURN { $x/name }

XQuery

FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name }

FOR $y in /db/Company/row[city/text()=“Seattle”], $x in /db/Product/row[maker/text()=$y/cid/text()]RETURN { $x/name }

CoolXQuery

Page 29: Web Data Management

29

<product> <row> <pid> 123 </pid> <name> abc </name> <maker> efg </maker> </row> <row> …. </row> …</product><product> . . .</product>. . . .

Page 30: Web Data Management

30

SQL and XQuery Side-by-sideFor each company with revenues < 1M count the products over $100

SELECT y.name, count(*)FROM Product x, Company yWHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000GROUP BY y.cid, y.name

SELECT y.name, count(*)FROM Product x, Company yWHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000GROUP BY y.cid, y.name

FOR $r in document(“db.xml”)/db, $y in $r/Company/row[revenue/text()<1000000]RETURN <proudCompany> <companyName> { $y/name/text() } </companyName> <numberOfExpensiveProducts> { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } </numberOfExpensiveProducts> </proudCompany>

FOR $r in document(“db.xml”)/db, $y in $r/Company/row[revenue/text()<1000000]RETURN <proudCompany> <companyName> { $y/name/text() } </companyName> <numberOfExpensiveProducts> { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) } </numberOfExpensiveProducts> </proudCompany>

Page 31: Web Data Management

31

SQL and XQuery Side-by-sideFind companies with at least 30 products, and their average price

SELECT y.name, avg(x.price)FROM Product x, Company yWHERE x.maker=y.cidGROUP BY y.cid, y.nameHAVING count(*) > 30

SELECT y.name, avg(x.price)FROM Product x, Company yWHERE x.maker=y.cidGROUP BY y.cid, y.nameHAVING count(*) > 30

FOR $r in document(“db.xml”)/db, $y in $r/Company/rowLET $p := $r/Product/row[maker/text()=$y/cid/text()]WHERE count($p) > 30RETURN <theCompany> <companyName> { $y/name/text() } </companyName> <avgPrice> avg($p/price/text()) </avgPrice> </theCompany>

FOR $r in document(“db.xml”)/db, $y in $r/Company/rowLET $p := $r/Product/row[maker/text()=$y/cid/text()]WHERE count($p) > 30RETURN <theCompany> <companyName> { $y/name/text() } </companyName> <avgPrice> avg($p/price/text()) </avgPrice> </theCompany>

A collection

An element

Page 32: Web Data Management

32

Sorting in XQuery

<publisher_list> FOR $b IN document("bib.xml")//book[publisher = ‘M.K’] ORDER BY $b/price/text() RETURN <book> { $b/title , $b/price } </book></publisher_list>

<publisher_list> FOR $b IN document("bib.xml")//book[publisher = ‘M.K’] ORDER BY $b/price/text() RETURN <book> { $b/title , $b/price } </book></publisher_list>

Page 33: Web Data Management

If-Then-Else

33

FOR $h IN //holding

RETURN <holding>

$h/title,

IF $h/@type = "Journal"

THEN $h/editor

ELSE $h/author

</holding> SORTBY (title)

FOR $h IN //holding

RETURN <holding>

$h/title,

IF $h/@type = "Journal"

THEN $h/editor

ELSE $h/author

</holding> SORTBY (title)

Page 34: Web Data Management

Existential Quantifiers

34

FOR $b IN //book

WHERE SOME $p IN $b//para SATISFIES

contains($p, "sailing")

AND contains($p, "windsurfing")

RETURN $b/title

FOR $b IN //book

WHERE SOME $p IN $b//para SATISFIES

contains($p, "sailing")

AND contains($p, "windsurfing")

RETURN $b/title

Page 35: Web Data Management

Universal Quantifiers

35

FOR $b IN //book

WHERE EVERY $p IN $b//para SATISFIES

contains($p, "sailing")

RETURN $b/title

FOR $b IN //book

WHERE EVERY $p IN $b//para SATISFIES

contains($p, "sailing")

RETURN $b/title

Page 36: Web Data Management

36

Duplicate Elimination

• distinct-values(list-of-text-values)• How do we eliminate duplicate “tuples” ?

<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row><row> <a>3</a> <b>100</b> </row><row> <a>3</a> <b>200</b> </row><row> <a>8</a> <b>500</b> </row>

<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row><row> <a>3</a> <b>100</b> </row><row> <a>3</a> <b>200</b> </row><row> <a>8</a> <b>500</b> </row>

<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row>

<row> <a>3</a> <b>200</b> </row>

<row> <a>3</a> <b>100</b> </row><row> <a>8</a> <b>500</b> </row>

<row> <a>3</a> <b>200</b> </row>

Page 37: Web Data Management

FOR v.s. LET

• FOR $x in expr -- binds $x to each element in the list expr

• LET $x = expr -- binds $x to the entire list expr– Useful for common subexpressions and for

aggregations

37

Page 38: Web Data Management

FOR v.s. LET

Summary:• FOR-LET-WHERE-RETURN = FLWR

38

FOR/LET Clauses

WHERE Clause

RETURN Clause

List of tuples

List of tuples

Instance of Xquery data model

Page 39: Web Data Management

FOR v.s. LET

FOR• Binds node variables iteration

LET• Binds collection variables one value

39

Page 40: Web Data Management

FOR v.s. LET

40

FOR $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

FOR $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...

LET $x := document("bib.xml")/bib/book

RETURN <result> $x </result>

LET $x := document("bib.xml")/bib/book

RETURN <result> $x </result>

Returns: <result> <book>...</book> <book>...</book> <book>...</book> ...</result>

Page 41: Web Data Management

Collections in XQuery

• Ordered and unordered collections– /bib/book/author = an ordered collection– Distinct(/bib/book/author) = an unordered collection

• LET $a = /bib/book $a is a collection• $b/author a collection (several authors...)

41

RETURN <result> $b/author </result>RETURN <result> $b/author </result>Returns: <result> <author>...</author> <author>...</author> <author>...</author> ...</result>

Page 42: Web Data Management

Collections in XQuery

What about collections in expressions ?

• $b/@price list of n prices

• $b/@price * 0.7 list of n numbers

• $b/@price * $b/@quantity list of n x m numbers ??

• $b/@price * ($b/@quant1 + $b/@quant2) $b/@price * $b/@quant1 + $b/@price * $b/@quant2 !!

42

Page 43: Web Data Management

Sorting in XQuery

43

<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/@price </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>

<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> $b/title , $b/@price </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>

Page 44: Web Data Management

Sorting in XQuery

• Sorting arugments: refer to the name space of the RETURN clause, not the FOR clause

• To sort on an element you don’t want to display, first return it, then remove it with an additional query.

44

Page 45: Web Data Management

Other Stuff in XQuery

• BEFORE and AFTER– for dealing with order in the input

• FILTER– deletes some edges in the result tree

• Recursive functions– Currently: arbitrary recursion– Perhaps more restrictions in the future ?

45

Page 46: Web Data Management

Group-By in Xquery ??

• No GROUPBY currently in XQuery• A recent proposal (next)

– What do YOU think ?

46

Page 47: Web Data Management

Group-By in Xquery ??

47

FOR $b IN document("http://www.bn.com")/bib/book,

$y IN $b/@year

WHERE $b/publisher="Morgan Kaufmann"

RETURN GROUPBY $y

WHERE count($b) > 10

IN <year> $y </year>

FOR $b IN document("http://www.bn.com")/bib/book,

$y IN $b/@year

WHERE $b/publisher="Morgan Kaufmann"

RETURN GROUPBY $y

WHERE count($b) > 10

IN <year> $y </year>

SELECT year

FROM Bib

WHERE Bib.publisher="Morgan Kaufmann"

GROUPBY year

HAVING count(*) > 10

SELECT year

FROM Bib

WHERE Bib.publisher="Morgan Kaufmann"

GROUPBY year

HAVING count(*) > 10

with GROUPBY

Equivalent SQL

Page 48: Web Data Management

Group-By in Xquery ??

48

FOR $b IN document("http://www.bn.com")/bib/book, $a IN $b/author, $y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>

FOR $b IN document("http://www.bn.com")/bib/book, $a IN $b/author, $y IN $b/@yearRETURN GROUPBY $a, $y IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>

FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib, $a IN $b/author, $y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>), $a IN $Tup/a/node(), $y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result>

FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib, $a IN $b/author, $y IN $b/@year RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>), $a IN $Tup/a/node(), $y IN $Tup/y/node() LET $b = document("http://www.bn.com")/bib/book[author=$a,@year=$y] RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result>

with GROUPBY

Without GROUPBY

Page 49: Web Data Management

Group-By in Xquery ??

49

FOR $b IN document("http://www.bn.com")/bib/book, $a IN $b/author, $y IN $b/@year, $t IN $b/title, $p IN $b/publisher RETURN GROUPBY $p, $y IN <result> $p, <year> $y </year>, GROUPBY $a IN <authorEntry> $a, GROUPBY $t IN $t <authorEntry> </result>

Nested GROUPBY’s