1
INLS 760 – Web Databases
Lecture 12 – XML, XPATH, XSLT
Robert Capra
Spring 2013
Note: These lecture notes are based on the tutorials on XML, XPath, and
XSLT at W3Schools: http://www.w3schools.com/ and from the book, “XML
in a Nutshell, 3rd Edition”, by Harold and Means.
2
HTML
<html>
<head>
<title>Chaucer DOM Example</title>
</head>
<body>
<h1>The Canterbury Tales</h1>
<h2>by Geoffrey Chaucer</h2>
<table>
<tr>
<td>Whan that Aprill</td>
<td>with his shoures soote</td>
</tr>
<tr>
<td>The droghte of March</td>
<td>hath perced to the roote</td>
</tr>
</table>
</body>
</html>
DOM tree:
HTML
  HEAD
    TITLE
      text: Chaucer DOM Example
  BODY
    H1
      text: The Canterbury Tales
    H2
      text: by Geoffrey Chaucer
    TABLE
      TBODY
        TR
          TD
          TD
        TR
          TD
          TD
1. Hierarchical structure
2. Common understanding of tags
3
EXtensible Markup Language (XML)
<?xml version="1.0" encoding="ISO-8859-1"?>
<release>
<title>Punch the Clock</title>
<artist>Elvis Costello and
The Attractions</artist>
<date><day>5</day>
<month>August</month>
<year>1983</year>
</date>
<label>F-Beat/Columbia</label>
</release>
1. Data description language
2. Hierarchical structure
3. Tags and meanings are NOT pre-defined
http://www.w3schools.com/xml/xml_whatis.asp
http://www.w3schools.com/xml/xml_syntax.asp
lect10/music1.xml
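A minimal sketch of reading the release document above with Python's standard xml.etree.ElementTree module. The helper name outline is an assumption made for illustration, and the XML declaration is left out of the string for brevity. It shows that the parser needs no prior knowledge of the tag names and simply follows the hierarchy.

```python
import xml.etree.ElementTree as ET

# The release document from the slide (XML declaration omitted for brevity).
RELEASE_XML = """
<release>
  <title>Punch the Clock</title>
  <artist>Elvis Costello and The Attractions</artist>
  <date><day>5</day><month>August</month><year>1983</year></date>
  <label>F-Beat/Columbia</label>
</release>
"""

release = ET.fromstring(RELEASE_XML)


def outline(element, depth=0):
    """Return one indented line per element, showing the hierarchy."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(release)
title = release.findtext("title")      # text of a direct child
year = release.findtext("date/year")   # text of a grandchild, by path

print("\n".join(tree_lines))
print(title, year)
```

The tags (release, title, date, ...) mean something only to the people and programs that agree on them; the parser treats them as plain names.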
4
Viewing XML documents
• Current versions of Firefox and Internet
Explorer will display XML documents
• Documents are parsed and may generate
errors
– Ex: lect10/music0.xml
5
XHTML
<HTML>
<body>
<h1>Title</H1>
<p>
This is a paragraph
<ul>
<li>Item 1
<li>Item 2
</ul>
<hr>
<b><u>Bold underline</b></u>
regular text
</html>
<html>
<body>
<h1>Title</h1>
<p>
This is a paragraph
</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
<hr />
<b><u>Bold underline</u></b>
regular text
</body>
</html>
HTML (first listing) vs. XHTML (second listing)
lect10/bad.html
6
Valid XML Documents
• XML documents are “stricter” than HTML
– Balanced tags/closing tags required
– Case-sensitive
– Proper nesting
• DTDs and XML Schemas
– Valid XML documents must also conform to a DTD or XML Schema; a document that only meets the rules above is well-formed
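The strictness rules can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings (adapted from the HTML/XHTML slide) are assumptions for illustration. Note that this parser checks well-formedness only and does not validate against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "unclosed tag": "<ul><li>Item 1<li>Item 2</ul>",
    "case mismatch": "<h1>Title</H1>",
    "improper nesting": "<p><b><u>Bold underline</b></u></p>",
    "well-formed": "<p><b><u>Bold underline</u></b></p>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "accepted" if ok else "rejected")
```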
7
XML Attributes
HTML:
<a href="lect10.pdf">Lect 10</a>
<form name="fred">
XML:
<fruit type="tropical">
<name>Papaya</name>
</fruit>
Attributes are used less often in XML than in HTML:
<fruit>
<type>tropical</type>
<name>Papaya</name>
</fruit>
However, id attributes are good for having an easy way to access an element (i.e. using getElementById)
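The two ways of recording the fruit's type are read differently by a program. A small sketch with Python's standard xml.etree.ElementTree (the variable names are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Same information, once as an attribute and once as a child element.
with_attribute = ET.fromstring('<fruit type="tropical"><name>Papaya</name></fruit>')
with_child = ET.fromstring("<fruit><type>tropical</type><name>Papaya</name></fruit>")

attr_value = with_attribute.get("type")      # attribute lookup on the element
child_value = with_child.findtext("type")    # text of the <type> child element

print(attr_value, child_value)
```

A child element can later grow its own substructure, while an attribute value stays a single string, which is one reason XML designs often prefer elements.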
8
Child Nodes and Rich Structure
<article>
<title></title>
<authors>
<author>
<firstname></firstname>
<lastname></lastname>
</author>
<author>
<firstname></firstname>
<lastname></lastname>
</author>
</authors>
<abstract></abstract>
<body>
<section>
<subsect></subsect>
</section>
<section></section>
</body>
<references>
<reference>
<authors></authors>
<title></title>
<publication>
</publication>
<year></year>
</reference>
</references>
</article>
9
XML Namespaces
<apple>
<color>Red</color>
<name>Macintosh</name>
<size>medium</size>
<country>USA</country>
</apple>
<apple>
<color>Red</color>
<name>Macintosh</name>
<size>medium</size>
<country>USA</country>
<harvested>May</harvested>
<orchard>Apple Farms</orchard>
</apple>
<apple>
<color>Red</color>
<name>Macintosh</name>
<size>medium</size>
<country>USA</country>
</apple>
<apple>
<color>Red</color>
<name>Macintosh</name>
<size>medium</size>
<country>USA</country>
<processor>Intel</processor>
<screen>17in</screen>
</apple>
What if we tried to combine these in the same XML document?
10
Namespace syntax
General form:
xmlns:prefix="URI"
Example:
xmlns:fred="http://foo.org/fred"
The URI can reference a location where information
about the namespace (such as a DTD or XML
Schema) is located, but XML does not rely on that
being the case.
11
Namespaces
<frt:apple xmlns:frt="http://foo.org/fruit">
<frt:color>Red</frt:color>
<frt:name>Macintosh</frt:name>
<frt:size>medium</frt:size>
<frt:country>USA</frt:country>
<frt:harvested>May</frt:harvested>
<frt:orchard>Apple Farms</frt:orchard>
</frt:apple>
<cmp:apple xmlns:cmp="http://foo.org/computer">
<cmp:color>Red</cmp:color>
<cmp:name>Macintosh</cmp:name>
<cmp:size>medium</cmp:size>
<cmp:country>USA</cmp:country>
<cmp:processor>Intel</cmp:processor>
<cmp:screen>17in</cmp:screen>
</cmp:apple>
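A sketch of how a parser keeps the two kinds of apple apart once they carry namespace prefixes. It uses Python's standard xml.etree.ElementTree; the wrapping basket element is a hypothetical root added only so both apples can sit in one document, and the apples are shortened.

```python
import xml.etree.ElementTree as ET

COMBINED = """
<basket>
  <frt:apple xmlns:frt="http://foo.org/fruit">
    <frt:name>Macintosh</frt:name>
    <frt:orchard>Apple Farms</frt:orchard>
  </frt:apple>
  <cmp:apple xmlns:cmp="http://foo.org/computer">
    <cmp:name>Macintosh</cmp:name>
    <cmp:processor>Intel</cmp:processor>
  </cmp:apple>
</basket>
"""

root = ET.fromstring(COMBINED)

# Prefix-to-URI map used only for the searches below; the URI is what matters.
ns = {"frt": "http://foo.org/fruit", "cmp": "http://foo.org/computer"}

fruit_apples = root.findall("frt:apple", ns)
computer_apples = root.findall("cmp:apple", ns)
processor = computer_apples[0].findtext("cmp:processor", namespaces=ns)

# ElementTree stores each tag as {namespace-URI}local-name.
expanded_tags = [child.tag for child in root]
print(expanded_tags, processor)
```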
12
Default Namespaces
<apple xmlns="http://foo.org/fruit">
<color>Red</color>
<name>Macintosh</name>
<size>medium</size>
<country>USA</country>
<harvested>May</harvested>
<orchard>Apple Farms</orchard>
</apple>
<apple xmlns="http://foo.org/computer">
<color>Red</color>
<name>Macintosh</name>
<size>medium</size>
<country>USA</country>
<processor>Intel</processor>
<screen>17in</screen>
</apple>
Using the default namespace syntax shown here, you do NOT need to include the prefix in the child elements
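A short check, using Python's standard xml.etree.ElementTree, that unprefixed children really do inherit a default namespace. The apple is shortened, and the prefix f in the lookup is an arbitrary name chosen for this sketch.

```python
import xml.etree.ElementTree as ET

apple = ET.fromstring(
    '<apple xmlns="http://foo.org/fruit">'
    "<color>Red</color><orchard>Apple Farms</orchard>"
    "</apple>"
)

# Every unprefixed element is placed in the default namespace.
root_tag = apple.tag
child_tags = [child.tag for child in apple]

# Searching therefore needs the namespace, even though the source has no prefixes.
orchard = apple.findtext("f:orchard", namespaces={"f": "http://foo.org/fruit"})
plain_lookup = apple.find("orchard")   # no namespace given, so nothing matches

print(root_tag, child_tags, orchard, plain_lookup)
```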
13
XML-based Languages
• VoiceXML
• WML
• RSS
• MathML
• SVG
• Many many others
14
XML Examples
• SVG – http://www.w3schools.com/svg/svg_example.asp
– http://www.w3schools.com/svg/svg_inhtml.asp
• RDF – http://en.wikipedia.org/wiki/Resource_Description_Framework
• RSS – http://en.wikipedia.org/wiki/RSS_%28file_format%29
• VoiceXML
– http://www.w3.org/TR/voicexml20/#dml2.2
15
Is XML a Database?
• Pros
• Cons
16
Storing XML Documents
• From Elmasri, p. 948-949:
• Using a DBMS to store documents as text.
– then use keyword indexing/processing
• Using a DBMS to store the document contents as data
elements.
– requires a schema and mapping
• Designing a specialized system for storing native XML
data
• Creating or publishing customized XML documents from preexisting relational databases
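The second option, storing document contents as data elements, needs a schema plus a mapping from elements to columns. The sketch below is one possible mapping, not the approach from Elmasri: it uses Python's standard sqlite3 and xml.etree.ElementTree modules, a hypothetical table named releases, and a shortened form of the Joe Jackson collection that appears in the XPath slides of these notes.

```python
import sqlite3
import xml.etree.ElementTree as ET

COLLECTION_XML = """
<collection>
  <release country="USA">
    <title>Look Sharp!</title><artist>Joe Jackson</artist><date>1979</date>
  </release>
  <release>
    <title>Night and Day</title><artist>Joe Jackson</artist><date>1982</date>
  </release>
</collection>
"""

conn = sqlite3.connect(":memory:")
# Schema step: decide which elements and attributes become columns.
conn.execute(
    "CREATE TABLE releases (title TEXT, artist TEXT, year INTEGER, country TEXT)"
)

# Mapping step: each <release> element becomes one row.
for rel in ET.fromstring(COLLECTION_XML).findall("release"):
    conn.execute(
        "INSERT INTO releases VALUES (?, ?, ?, ?)",
        (
            rel.findtext("title"),
            rel.findtext("artist"),
            int(rel.findtext("date")),
            rel.get("country"),   # None (SQL NULL) when the attribute is absent
        ),
    )

rows = conn.execute(
    "SELECT title, year FROM releases WHERE year > 1980"
).fetchall()
print(rows)
```

Once the data is in rows, ordinary SQL replaces keyword search over the document text, at the cost of maintaining the mapping.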
MySQL XML Support
• MySQL dump XML output option
– http://dev.mysql.com/doc/refman/6.0/en/mysqldump.html#option_mysqldump_xml
• Load XML (in MySQL 6)
– http://dev.mysql.com/doc/refman/6.0/en/load-xml.html
17
18
XML Tutorials
• W3Schools
– Intro
– Tree
– Validation
– Viewing
– http://www.w3schools.com/xml/default.asp
19
XPath
• XPath is a way to specify a particular node
in an XML document
Copyright 2007, Robert G. Capra III
20
XPath Resources
• Chapter 9, XML in a Nutshell, 3rd ed. by E.R. Harold, W.S. Means
– UNC Library Electronic Access
• XPath Tutorial, W3Schools
– http://www.w3schools.com/xpath/default.asp
• RIT IT722
– http://www.it.rit.edu/~jxs/classes/2007_Spring/XSLT/02_week/
• Zvon XPath Tutorial
– http://www.zvon.org/xxl/XPathTutorial/General/examples.html
• TopXML
– http://www.topxml.com/xsl/tutorials/intro/default.asp
21
XPath
<?xml version="1.0" encoding="ISO-8859-1"?>
<collection>
<release country="USA">
<title>Look Sharp!</title>
<artist>Joe Jackson</artist>
<date>1979</date>
<label>A&amp;M</label>
</release>
<release>
<title>Night and Day</title>
<artist>Joe Jackson</artist>
<date>1982</date>
<label>A&amp;M</label>
</release>
</collection>
XPath Node Types
1. Element
2. Attribute
3. Text
4. Namespace
5. Processing-Instruction
6. Comment
7. Document (root)
Node Relationships
1. Parent
2. Children
3. Siblings
4. Ancestors
5. Descendants
22
XPath
<?xml version="1.0" encoding="ISO-8859-1"?>
<collection>
<release country="USA">
<title>Look Sharp!</title>
<artist>Joe Jackson</artist>
<date>1979</date>
<label>A&amp;M</label>
</release>
<release>
<title>Night and Day</title>
<artist>Joe Jackson</artist>
<date>1982</date>
<label>A&amp;M</label>
</release>
</collection>
Document node: the document as a whole
Element node: e.g. <release>
Attribute node: e.g. country="USA"
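The node kinds can be inspected programmatically. The sketch below uses Python's standard xml.dom.minidom, which implements the DOM rather than XPath itself; the assumption here is only that the DOM's document, element, attribute, and text nodes correspond closely to the XPath node types of the same names. The document is shortened and written without whitespace so that firstChild is always an element or text node.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<collection><release country="USA">'
    "<title>Look Sharp!</title>"
    "</release></collection>"
)

collection = doc.documentElement                  # root element
release = collection.firstChild                   # child of collection
country = release.getAttributeNode("country")     # attribute node
title_text = release.firstChild.firstChild        # text node inside <title>

kinds = {
    "document": doc.nodeType == Node.DOCUMENT_NODE,
    "element": collection.nodeType == Node.ELEMENT_NODE,
    "attribute": country.nodeType == Node.ATTRIBUTE_NODE,
    "text": title_text.nodeType == Node.TEXT_NODE,
}

# Relationships: parent of <release>, and the text carried by the text node.
parent_name = release.parentNode.tagName
print(kinds, parent_name, title_text.data)
```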
23
XPath Tree
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="music1.xsl"?>
<collection>
<release country="USA">
<title>Look Sharp!</title>
<artist>Joe Jackson</artist>
<date><month>April</month><year>1979</year></date>
<label>A&amp;M</label>
</release>
<release>
<title>Night and Day</title>
<artist>Joe Jackson</artist>
<date><year>1982</year></date>
</release>
</collection>
Root node: /
  <?xml-stylesheet…?>
  <collection> (root element)
    <release>
      <title>
      <artist>
      <date>
        <month>
        <year>
      <label>
    <release>
      <title>
      <artist>
      <date>
        <year>
24
XPath Selection
http://www.zvon.org/xxl/XPathTutorial/
General/examples.html
25
XPath – Location Path Basics
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="music1.xsl"?>
<collection>
<release country="USA">
<title>Look Sharp!</title>
<artist>Joe Jackson</artist>
<date><month>April</month><year>1979</year></date>
<label>A&amp;M</label>
</release>
<release>
<title>Night and Day</title>
<artist>Joe Jackson</artist>
<date><year>1982</year></date>
</release>
</collection>
Root node
  <?xml-stylesheet…?>
  <collection>
    <release>
      <title>
      <artist>
      <date>
      <label>
    <release>
      <title>
      <artist>
      <date>
/ – selects the root node, regardless of context node
element_node – selects child elements of the current node that have that name
//foo – selects all foo nodes in the document, regardless of context node
. – current (context) node
.. – parent of current node
@foo – selects all attribute nodes named foo in the current context
[foo] – select using predicate foo
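The location paths above can be tried out with Python's standard xml.etree.ElementTree, with one caveat: it supports only a subset of XPath 1.0. In particular, paths are written relative to an element (".//year" rather than "//year"), and attributes are read with get() instead of being selected with @foo. The variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<collection>
  <release country="USA">
    <title>Look Sharp!</title>
    <artist>Joe Jackson</artist>
    <date><month>April</month><year>1979</year></date>
    <label>A&amp;M</label>
  </release>
  <release>
    <title>Night and Day</title>
    <artist>Joe Jackson</artist>
    <date><year>1982</year></date>
  </release>
</collection>
""")

# element name: child elements of the context node with that name
child_titles = [r.findtext("title") for r in root.findall("release")]

# //foo: all foo elements below the context node, at any depth
all_years = [y.text for y in root.findall(".//year")]

# ..: parent of the current node
year_parents = [p.tag for p in root.findall(".//year/..")]

# [foo] predicates: on an attribute value, and on the presence of a child
usa_titles = [r.findtext("title") for r in root.findall("release[@country='USA']")]
with_label = [r.findtext("title") for r in root.findall("release[label]")]

# @foo: ElementTree reads an attribute from the element instead
country = root.find("release").get("country")

print(child_titles, all_years, year_parents, usa_titles, with_label, country)
```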
26
Extensible Stylesheet Language (XSL)
• XSL does for XML documents what CSS does for HTML
QTP: HTML tags are predefined; XML tags are created by users. How do you create a stylesheet for XML?
27
XSL Transformations (XSLT)
• Very powerful
• XSLT provides a way to
– transform one XML document into another XML document
– or into some other format (such as HTML)
• Allows manipulation of XML elements
• Uses XPath
• Supported by Firefox, IE, other browsers
28
XSLT Resources
• Chapter 8, XML in a Nutshell, 3rd ed. by E.R. Harold, W.S. Means
– UNC Library Electronic Access
• XSLT Tutorial, W3Schools
– http://www.w3schools.com/xsl/default.asp
• Zvon XSL Tutorial
– http://www.zvon.org/xxl/XSLTutorial/Output/index.html
• TopXML
– http://www.topxml.com/xsl/tutorials/intro/default.asp
29
XSLT Examples – XML file
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="music1.xsl"?>
<collection>
<release country="USA">
<title>Look Sharp!</title>
<artist>Joe Jackson</artist>
<date>
<month>April</month>
<year>1979</year>
</date>
<label>A&amp;M</label>
</release>
<release>
<title>Night and Day</title>
<artist>Joe Jackson</artist>
<date>
<year>1982</year>
</date>
</release>
</collection>
lect10/music1.xml
stylesheet link
30
XSLT Example 1
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="release">
<text>Fred</text>
</xsl:template>
</xsl:stylesheet>
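lect10/music1.xsl
Simple stylesheet
Traverses the XML document
Looks at every node
If a node matches the template rule, then process and output the template
Template rule: the xsl:template element with its match pattern
Template: the content of the rule, which is output for each matching node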
31
XSLT Example 2
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="collection">
<b>Fred</b>
</xsl:template>
<xsl:template match="artist">
<b>Ethel</b>
</xsl:template>
</xsl:stylesheet>
lect10/music2.xsl
Match “eats” the node it matches
i.e. child nodes do not get templates applied to them
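The "eating" behaviour can be imitated in a few lines of Python. This is not an XSLT processor: it is a toy sketch, using the standard xml.etree.ElementTree module, that handles only rules keyed by element name with literal output, and that assumes the usual built-in behaviour for unmatched nodes (text is copied through, child elements keep being processed). The function name apply_templates and the rules dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, rules):
    """Process node the way a stylesheet with element-name rules would."""
    rule = rules.get(node.tag)
    if rule is not None:
        # A matching rule supplies the output for the whole subtree:
        # the children are never visited, so the match "eats" them.
        return rule
    # No rule matched: copy text through and keep processing the children.
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, rules))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(
    "<collection><release>"
    "<title>Look Sharp!</title><artist>Joe Jackson</artist>"
    "</release></collection>"
)

# Only an artist rule: the title text falls through, the artist is replaced.
only_artist = apply_templates(root, {"artist": "<b>Ethel</b>"})

# Adding a collection rule: the artist rule is never reached.
both_rules = apply_templates(
    root, {"collection": "<b>Fred</b>", "artist": "<b>Ethel</b>"}
)

print(only_artist)
print(both_rules)
```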
32
XSLT Example 1 – Revisited
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="release">
<b>Fred</b>
</xsl:template>
</xsl:stylesheet>
lect10/music1.xsl
Differences in XSLT processors (e.g. IE & Firefox)
text/xsl vs. application/xslt+xml
Default processing rules
Try: match on collection, release, artist
in both IE and Firefox
33
XSLT Example 3 – value-of
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="date">
<b>
<xsl:value-of select="."/>
</b>
</xsl:template>
</xsl:stylesheet>
lect10/music3.xsl
value-of outputs the string value of the node specified by the select; for an element, that is the text of all its descendants joined together
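What value-of produces for a date element can be approximated in Python with the standard xml.etree.ElementTree module: joining every piece of text inside the element. The date is written here without whitespace; with the indented file from the slides, the line breaks and spaces between the tags would be part of the result as well.

```python
import xml.etree.ElementTree as ET

date = ET.fromstring("<date><month>April</month><year>1979</year></date>")

# Concatenate all text found inside the element, in document order.
string_value = "".join(date.itertext())

print(string_value)
```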
34
XSLT Example 4 – Order of Eval
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="text()">
<b>
<xsl:value-of select="."/>
</b>
</xsl:template>
</xsl:stylesheet>
lect10/music4.xsl
By default, XSLT looks for matches using a preorder traversal of the XML tree
text() matches any text node
QTP: What if match="date/text()" ?
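Preorder means a node is visited before its children, which is the same as the order in which the start tags appear in the file. A quick way to see that order, using Python's standard xml.etree.ElementTree on a shortened, whitespace-free version of the collection:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<collection><release>"
    "<title>Look Sharp!</title>"
    "<artist>Joe Jackson</artist>"
    "<date><month>April</month><year>1979</year></date>"
    "</release></collection>"
)

# iter() walks the tree in document (preorder) order.
visit_order = [element.tag for element in root.iter()]

# itertext() yields the text nodes in that same order.
text_order = list(root.itertext())

print(visit_order)
print(text_order)
```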
35
XSLT Ex. 5 – Change Order
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="date">
<i> <xsl:value-of select="month"/> </i>
<b> <xsl:value-of select="year"/> </b>
</xsl:template>
<xsl:template match="release">
<xsl:apply-templates select="date"/>
</xsl:template>
</xsl:stylesheet>
lect10/music5.xsl
Use apply-templates to change the evaluation order
apply-templates applies the matching templates to the children of the current node; the optional select attribute restricts which nodes are processed
36
Ex. 6 – More Apply Templates
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*|/">
<p> <b> <xsl:value-of select="."/> </b> </p>
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
lect10/music6.xsl
Use apply-templates to change the evaluation order
apply-templates applies the matching templates to the children of the current node; the optional select attribute restricts which nodes are processed
37
XSLT Ex. 7 – Create HTML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="collection">
<html>
<h1>My CD Collection</h1>
<table border="1">
<xsl:apply-templates/>
</table>
</html>
</xsl:template>
<xsl:template match="release">
<tr>
<td> <xsl:value-of select="title"/> </td>
<td> <xsl:value-of select="artist"/> </td>
</tr>
</xsl:template>
</xsl:stylesheet>
lect10/music7.xsl
Use apply-templates to control the evaluation order and placement in a template
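For comparison, the same page can be produced procedurally. The sketch below is plain Python with the standard xml.etree.ElementTree module, not XSLT; the two function names are hypothetical and each stands in for one of the two template rules. It assumes the titles and artist names contain no characters that would need HTML escaping.

```python
import xml.etree.ElementTree as ET

COLLECTION_XML = (
    "<collection>"
    "<release><title>Look Sharp!</title><artist>Joe Jackson</artist></release>"
    "<release><title>Night and Day</title><artist>Joe Jackson</artist></release>"
    "</collection>"
)


def release_row(release):
    """Counterpart of the rule that matches release: one table row."""
    return "<tr><td>{}</td><td>{}</td></tr>".format(
        release.findtext("title"), release.findtext("artist")
    )


def collection_page(collection):
    """Counterpart of the rule that matches collection: the page skeleton."""
    # The join below sits where <xsl:apply-templates/> sits in the stylesheet.
    rows = "".join(release_row(r) for r in collection.findall("release"))
    return '<html><h1>My CD Collection</h1><table border="1">' + rows + "</table></html>"


page = collection_page(ET.fromstring(COLLECTION_XML))
print(page)
```

The position of the call that processes the children decides where their output lands, which is the point the slide makes about apply-templates.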
38
XSLT Ex. 8 – for-each
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<xsl:for-each select="//date">
<p><b> <xsl:value-of select="year"/>
</b></p>
</xsl:for-each>
</html>
</xsl:template>
</xsl:stylesheet>
lect10/music8.xsl
for-each iterates through a set of nodes
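The for-each loop reads much like a loop in a general-purpose language. A rough Python counterpart, using the standard xml.etree.ElementTree module on a shortened, whitespace-free collection (the variable names are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<collection>"
    "<release><date><month>April</month><year>1979</year></date></release>"
    "<release><date><year>1982</year></date></release>"
    "</collection>"
)

paragraphs = []
for date in root.iter("date"):        # plays the role of select="//date"
    year = date.findtext("year")      # plays the role of value-of select="year"
    paragraphs.append("<p><b>" + year + "</b></p>")

html = "<html>" + "".join(paragraphs) + "</html>"
print(html)
```

Inside the loop the date element is the context node, which is why the inner path is just year.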
Resources
• Default rules
– XML in a Nutshell, 3rd ed., section 8.7
• Harold and Means, O'Reilly, 2004.
– http://www.zvon.org/xxl/XSLTutorial/Output/example73_ch2.html
• MIME type for stylesheet
– XML in a Nutshell, 3rd ed., section 8.3
• Harold and Means, O'Reilly, 2004.
39
40
XSLT Examples – XML file
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="music1.xsl"?>
<collection>
<release country="USA">
<title>Look Sharp!</title>
<artist>Joe Jackson</artist>
<date>
<month>April</month>
<year>1979</year>
</date>
<label>A&amp;M</label>
</release>
<release>
<title>Night and Day</title>
<artist>Joe Jackson</artist>
<date>
<year>1982</year>
</date>
</release>
</collection>
lect10/music1.xml
type is important
IE: "text/xsl"
Firefox: "application/xml"
Eventually: "application/xslt+xml"
This affects the application of the DEFAULT RULES
41
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="text()|@*">
<text><xsl:value-of select="."/></text>
</xsl:template>
<xsl:template match="artist">
<b>FOO</b>
</xsl:template>
</xsl:stylesheet>
lect12/foo.xsl
Default rules
Two important ones:
1. For text nodes: copy the text to the output
2. For root/element nodes: apply templates to the child nodes
See XML in a Nutshell, section 8.7