Introduction to XQuery

Bob DuCharme
www.snee.com/bob
bob@snee.com
these slides: www.snee.com/xml

What is XQuery?

“A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.”

“XQuery 1.0: An XML Query Language” W3C Working Draft

History

• February 1998: XML (Rec)
• November 1999: XSLT 1.0, XPath 1.0 (Recs)
• (as of 8 June 2005): XPath 2.0, XSLT 2.0, XQuery 1.0 in “last call Working Draft” status
• Steps for a W3C “standard”:
  – Working Draft
  – Last Call Working Draft
  – Candidate Recommendation
  – Proposed Recommendation
  – Recommendation

input1.xml sample document

<doc>
  <p>This is a sample file.</p>
  <p>This line <emph>really</emph> has an inline element.</p>
  <p>This line doesn't.</p>
  <p>Do <emph>you</emph> like inline elements?</p>
</doc>

Our first query

Querying from the command line:

java net.sf.saxon.Query "{doc('input1.xml')//p[emph]}"

Result:

<?xml version="1.0" encoding="UTF-8"?>
<p>This line <emph>really</emph> has an inline element.</p>
<p>Do <emph>you</emph> like inline elements?</p>

Query stored in a file

• xq1.xqy:

  (: Here is an XQuery comment. :)
  doc('data1.xml')//p[emph]

• Executing it:

  java net.sf.saxon.Query xq1.xqy

Simplifying the command line

• Linux shell script xquery:

java net.sf.saxon.Query $1 $2 $3 $4 $5 $6

• Windows batch file xquery.bat:

java net.sf.saxon.Query %1 %2 %3 %4 %5 %6

(assuming saxon8.jar is in the classpath)

• Executing either: xquery xq1.xqy

Data for more serious examples

• RecipeML: DTD and documentation http://www.formatdata.com/recipeml

• Squirrel's RecipeML Archive http://dsquirrel.tripod.com/recipeml/indexrecipes2.html

• My sample: 294 files

RecipeML: typical structure

<recipeml version="0.5">
  <recipe>
    <head>
      <title>Walnut Vinaigrette</title>
      <categories><cat>Dressings</cat></categories>
      <yield>1</yield>
    </head>
    <ingredients>
      <ing>
        <amt><qty>1</qty><unit>cup</unit></amt>
        <item>Canned No Salt Chicken</item>
      </ing>
      <!-- more ing elements -->
    </ingredients>
    <directions>
      <step>Bring chicken broth to a boil.</step>
      <!-- more step elements -->
    </directions>
  </recipe>
</recipeml>

Saxon and collection() function

• The argument to the function names a document in this format:

<collection>
  <doc href="_Band__Sloppy_Joes.xml"/>
  <doc href="_Cheese__Fricadelle.xml"/>
  <!-- more doc elements... -->
  <doc href="Walton_Mountain_Coffee_Cake.xml"/>
  <doc href="Walty's_Dressing.xml"/>
  <doc href="Wan_Tan_(Wonton).xml"/>
</collection>

Looking for some sugar

collection('recipeml/docs.xml')/recipeml/recipe/head/title
  [//ingredients/ing/item[contains(.,'sugar')]]

A more SQL-like approach

for $ingredient in
  collection('recipeml/docs.xml')//ingredients/ing/item[contains(.,'sugar')]
return $ingredient/../../../head/title

Outputting well-formed XML

<sweets>
{
  let $target := 'sugar'
  for $ingredient in
    collection('recipeml/docs.xml')//ingredients/ing/item[contains(., $target)]
  return $ingredient/../../../head/title
}
</sweets>

FLWOR expressions

• for
• let
• where
• order by
• return

"a FLWOR expression ... supports iteration and binding of variables to intermediate results. This kind of expression is often useful for computing joins between two or more documents and for restructuring data."
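
To see the five clauses working together, here is a minimal sketch. It assumes a small inline $recipes element with made-up titles and yields (not taken from the recipe archive), so it runs without any input files:

```xquery
(: Hypothetical inline data standing in for a recipe collection. :)
let $recipes :=
  <recipes>
    <recipe><title>Punch</title><yield>30</yield></recipe>
    <recipe><title>Vinaigrette</title><yield>1</yield></recipe>
    <recipe><title>Chili</title><yield>24</yield></recipe>
  </recipes>
(: for: iterate over each recipe :)
for $r in $recipes/recipe
(: let: bind an intermediate result, here the yield as an integer :)
let $servings := xs:integer($r/yield)
(: where: keep only recipes that serve more than 20 :)
where $servings > 20
(: order by: largest yield first :)
order by $servings descending
(: return: build one new element per remaining recipe :)
return <big servings="{$servings}">{string($r/title)}</big>
```

With this data the query should return the Punch entry (30 servings) followed by Chili (24); the Vinaigrette recipe is filtered out by the where clause.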

Extracting subsets: XPath vs. FLWOR approach

• Get the title element for each recipe whose yield is greater than 20:

  collection('recipeml/docs.xml')/recipeml/recipe/head/title[../yield > 20]

• Go through all the documents in the collection, and for any with a yield of more than 20, get the title:

  for $doc in collection('recipeml/docs.xml')/recipeml
  where $doc/recipe/head/yield > 20
  return $doc/recipe/head/title

Doing more with the for clause variable

(: Create an HTML page linking to recipes that serve more than 20 people. :)

<html><head><title>Food for a Crowd</title></head>
<body>
<h1>Food for a Crowd</h1>
{
  for $doc in collection('recipeml/docs.xml')
  where $doc/recipeml/recipe/head/yield > 20
  return
    <p><a href="{document-uri($doc)}">
      {$doc/recipeml/recipe/head/title/text()}
    </a></p>
}
</body></html>

Calling functions from a let clause

(: Which recipe(s) serves the most people? :)

let $maxYield :=
  max(collection('recipeml/docs.xml')/recipeml/recipe/head/yield)
return
  collection('recipeml/docs.xml')/recipeml/recipe[head/yield = $maxYield]
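
One thing to watch for: max() casts untyped values to xs:double, so a yield that is not a number would raise an error. A minimal sketch of one way to guard against that, assuming some yield elements might hold text such as "2 dozen" (the inline data is hypothetical, not from the recipe archive):

```xquery
(: Hypothetical yield values; the middle one is not numeric. :)
let $yields := (<yield>30</yield>, <yield>2 dozen</yield>, <yield>24</yield>)
(: Keep only the values that can be cast to a number. :)
let $numeric := $yields[. castable as xs:double]
return max($numeric)
```

Without the castable filter, the "2 dozen" value would make max() fail; with it, the query returns the largest numeric yield.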

distinct-values and order by

(: A sorted list of all unique ingredients in the recipe collection, with URLs to link to the recipes. :)

<ingredients>
{
  for $ingr in distinct-values(
    collection('recipeml/docs.xml')/recipeml/recipe/ingredients/ing/item)
  order by $ingr
  return
    <item name="{$ingr}">
    {
      for $doc in collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/ingredients/ing/item = $ingr

distinct-values and order by, continued

      return
        <title url="{document-uri($doc)}">
          {$doc/recipeml/recipe/head/title/text()}
        </title>
    }
    </item>
}
</ingredients>
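
distinct-values() compares strings exactly, so differences in case or spacing produce separate entries. A small sketch with made-up ingredient names (not taken from the recipe archive):

```xquery
(: Hypothetical ingredient names; two differ only in the case of one letter. :)
for $ingr in distinct-values(
  ("Baking soda", "Baking Soda", "Baking soda", "Tomato paste"))
order by $ingr
return <item name="{$ingr}"/>
```

This should return three item elements rather than two, because "Baking soda" and "Baking Soda" count as different values.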

Excerpt from output

<ingredients>
  <!-- some item elements removed -->
  <item name=" (12-oz) tomato paste ">
    <title url="file:/C:/dat/recipeml/_Best_Ever__Pizza_Sauce.xml">"Best Ever" Pizza Sauce</title>
  </item>
  <item name=" Baking Powder">
    <title url="file:/c:/dat/recipeml/_Blondie__Brownies.xml">"Blondie" Brownies</title>
    <title url="file:/c:/dat/recipeml/Walnut_Pound_Cake.xml">Walnut Pound Cake</title>
  </item>
  <item name=" Baking Soda ">
    <title url="file:/c:/dat/recipeml/_Faux__Sourdough.xml">"Faux" Sourdough</title>
  </item>
  <item name=" Baking potatoes ">
    <title url="file:/c:/dat/recipeml/_Indian_Chili_.xml">"Indian Chili"</title>
  </item>
  <item name=" Baking powder ">
    <title url="file:/c:/dat/recipeml/_Best__Apple_Nut_Pudding.xml">"Best" Apple Nut Pudding</title>
    <title url="file:/c:/dat/recipeml/_Gold_Room__Scones.xml">"Gold Room" Scones</title>
    <title url="file:/c:/dat/recipeml/_Outrageous_Chocolate_Chipper.xml">"Outrageous" Chocolate-Oatmeal Chipper (Cooki</title>
  </item>
  <item name="Baking soda">
    <title url="file:/c:/dat/recipeml/_First__Ginger_Cookies.xml">"First" Ginger Molasses Cookies</title>
    <title url="file:/c:/dat/recipeml/_Foot_in_the_Cake.xml">"Foot in the Fire" Chocolate Cake</title>
  </item>
  <item name="Tomato paste">
    <title url="file:/C:/dat/recipeml/Crawfish_Etouff'ee.xml">"Frank's Place" Crawfish Etouff'ee</title>
    <title url="file:/C:/dat/recipeml/Hamburger____Ground_Meat_Balti.xml">"Hamburger" / Ground Meat Balti</title>
    <title url="file:/C:/dat/recipeml/Indian_Chili_.xml">"Indian Chili"</title>
  </item>
  <!-- some item elements removed -->
</ingredients>

RecipeML: varying markup richness

• One way to do it:

  <ing><item> (12-oz) tomato paste </item></ing>

• Another way:

  <ing>
    <amt> <qty>12</qty> <unit>oz</unit> </amt>
    <item>tomato paste</item>
  </ing>

Normalizing data with declared functions

(: A sorted list of all unique ingredients in the recipe collection, with URLs to link to them. Ingredient names get normalized by functions declared in the query prolog. :)

declare namespace sn = "http://www.snee.com/ns/misc/";

declare function sn:normIngName($ingName) as xs:string {
  (: Normalize ingredient name. :)
  (: remove parenthesized expression that may begin string,
     e.g. in "(10 ozs) Rotel diced tomatoes" :)
  let $normedName := replace($ingName,"^\(.*?\)\s*","")
  (: convert to all lower-case :)
  let $normedName := lower-case($normedName)
  (: replace multiple spaces with a single one :)
  let $normedName := normalize-space($normedName)
  return $normedName
};
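
The pattern "^\(" only matches when the parenthesis is the very first character, so a name with leading spaces, such as " (12-oz) tomato paste ", would keep its parenthesized prefix. A hedged sketch of one possible variant follows. The function name sn:normIngNameTrimmed and the sample strings are made up for illustration, and moving normalize-space() to the front is an assumption about the data, not the original slide's code:

```xquery
declare namespace sn = "http://www.snee.com/ns/misc/";

(: Hypothetical variant: trim and collapse spaces first, so that the
   parenthesized prefix really is at the start of the string. :)
declare function sn:normIngNameTrimmed($ingName) as xs:string {
  let $trimmed := normalize-space($ingName)
  (: remove a leading parenthesized expression, if any :)
  let $noParens := replace($trimmed, "^\(.*?\)\s*", "")
  (: convert to all lower-case :)
  return lower-case($noParens)
};

(: Sample names, invented for this sketch. :)
for $name in (" (12-oz) tomato paste ", "Tomato paste", " Baking  Powder")
return <item name="{sn:normIngNameTrimmed($name)}"/>
```

The first two sample names should both normalize to "tomato paste", and the third to "baking powder".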

Normalizing data with functions, part 2 of 3

declare function sn:normIngList($ingList) as item()* {
  (: Normalize a list of ingredient names. :)
  for $ingName in $ingList
  return sn:normIngName($ingName)
};

<ingredients>
{
  let $normIngNames :=
    sn:normIngList(collection('recipeml/docs.xml')//ing/item)

Normalizing data with functions, part 3 of 3

  for $ingr in distinct-values($normIngNames)
  order by $ingr
  return
    <item name="{$ingr}">
    {
      for $doc in collection('recipeml/docs.xml'),
          $i in $doc/recipeml/recipe/ingredients/ing/item
      where sn:normIngName($i) = $ingr
      return
        <title url="{document-uri($doc)}">
          {$doc/recipeml/recipe/head/title/text()}
        </title>
    }
    </item>
}
</ingredients>

Specs at http://www.w3.org/TR

• XQuery 1.0: An XML Query Language
• XQuery 1.0 and XPath 2.0 Formal Semantics
• XQuery 1.0 and XPath 2.0 Data Model
• XSLT 2.0 and XQuery 1.0 Serialization
• XQuery 1.0 and XPath 2.0 Functions and Operators
• XML Query Use Cases

Other resources

• eXist: http://www.exist-db.org
• MarkLogic: http://www.marklogic.com
• Mike Kay, “Comparing XSLT and XQuery”: http://idealliance.org/proceedings/xtech05/papers/02-03-01/
• http://www.w3.org/TR:
  – XQuery Update Requirements
  – XQuery 1.0 and XPath 2.0 Full-Text