

Page 1: INLS 760 – Web Databases · 2013. 4. 9. · EXtensible Markup Language (XML)   ... XPath – Location Path Basics

1

INLS 760 – Web Databases

Lecture 12 – XML, XPATH, XSLT

Robert Capra

Spring 2013

Note: These lecture notes are based on the tutorials on XML, XPath, and XSLT at W3Schools: http://www.w3schools.com/ and on the book “XML in a Nutshell, 3rd Edition”, by Harold and Means.


HTML

<html>

<head>

<title>Chaucer DOM Example</title>

</head>

<body>

<h1>The Canterbury Tales</h1>

<h2>by Geoffrey Chaucer</h2>

<table>

<tr>

<td>Whan that Aprill</td>

<td>with his shoures soote</td>

</tr>

<tr>

<td>The droghte of March</td>

<td>hath perced to the roote</td>

</tr>

</table>

</body>

</html>

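[Figure: DOM tree of the page above. HTML has children HEAD and BODY. HEAD contains TITLE (text: "Chaucer DOM Example"). BODY contains H1 (text: "The Canterbury Tales"), H2 (text: "By Geoffrey Chaucer"), and TABLE. TABLE contains TBODY, which holds two TR nodes with two TD nodes each.]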

1. Hierarchical structure

2. Common understanding of tags


EXtensible Markup Language (XML)

<?xml version="1.0" encoding="ISO-8859-1"?>

<release>

<title>Punch the Clock</title>

<artist>Elvis Costello and The Attractions</artist>

<date><day>5</day>

<month>August</month>

<year>1983</year>

</date>

<label>F-Beat/Columbia</label>

</release>

1. Data description language

2. Hierarchical structure

3. Tags and meanings are NOT pre-defined

http://www.w3schools.com/xml/xml_whatis.asp

http://www.w3schools.com/xml/xml_syntax.asp

lect10/music1.xml
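A minimal sketch of how a program might read the release document above, using Python's standard xml.etree.ElementTree module. The document is inlined rather than read from lect10/music1.xml, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The release document from the slide, inlined so the example is self-contained.
# The encoding declaration is left out because the text is already a Python str.
RELEASE_XML = """<release>
  <title>Punch the Clock</title>
  <artist>Elvis Costello and The Attractions</artist>
  <date><day>5</day><month>August</month><year>1983</year></date>
  <label>F-Beat/Columbia</label>
</release>"""

root = ET.fromstring(RELEASE_XML)

# The tag names carry the meaning; XML itself predefines none of them.
title = root.findtext("title")
year = int(root.findtext("date/year"))      # follow the hierarchy: date, then year
child_tags = [child.tag for child in root]  # children in document order

print(title, year, child_tags)
```

The hierarchy is navigated by name (date, then year), which only works because the author and the reader of the document agree on what those names mean.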


Viewing XML documents

• Current versions of Firefox and Internet Explorer will display XML documents

• Documents are parsed first, so a malformed document produces an error instead of being displayed

– Ex: lect10/music0.xml


XHTML

<HTML>

<body>

<h1>Title</H1>

<p>

This is a paragraph

<ul>

<li>Item 1

<li>Item 2

</ul>

<hr>

<b><u>Bold underline</b></u>

regular text

</html>

<html>

<body>

<h1>Title</h1>

<p>

This is a paragraph

</p>

<ul>

<li>Item 1</li>

<li>Item 2</li>

</ul>

<hr />

<b><u>Bold underline</u></b>

regular text

</body>

</html>

Above: HTML (first listing) and XHTML (second listing)

lect10/bad.html


Valid XML Documents

• XML documents are “stricter” than HTML; every XML document must be well-formed:

– Balanced tags/closing tags required

– Case-sensitive

– Proper nesting

• DTDs and XML Schemas

– To be valid, an XML document must also conform to a DTD or XML Schema
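The strictness rules above can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which tests well-formedness only and does not validate against a DTD or XML Schema. The helper name is_well_formed and the sample fragments are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "balanced": "<p>This is a paragraph</p>",
    "missing close tag": "<ul><li>Item 1<li>Item 2</ul>",
    "case mismatch": "<h1>Title</H1>",
    "bad nesting": "<b><u>Bold underline</b></u>",
}

# Only the first sample satisfies all three rules.
results = {name: is_well_formed(text) for name, text in samples.items()}
print(results)
```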


XML Attributes

HTML:

<a href="lect10.pdf">Lect 10</a>

<form name="fred">

XML:

<fruit type="tropical">

<name>Papaya</name>

</fruit>

Attributes are used less often in XML than in HTML:

<fruit>

<type>tropical</type>

<name>Papaya</name>

</fruit>

However, id attributes are a good way to give easy access to an element (e.g. using getElementById)
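A small sketch of the two styles side by side, again with Python's xml.etree.ElementTree. The basket wrapper element, the id values, and the second fruit are hypothetical, added only so there is something to look up. ElementTree has no getElementById, so an attribute predicate stands in for it here.

```python
import xml.etree.ElementTree as ET

# Hypothetical <basket> wrapper, id values, and second fruit, for illustration only.
doc = ET.fromstring("""<basket>
  <fruit id="f1" type="tropical"><name>Papaya</name></fruit>
  <fruit id="f2"><type>temperate</type><name>Apple</name></fruit>
</basket>""")

first, second = doc.findall("fruit")
type_from_attribute = first.get("type")    # value stored as an attribute
type_from_child = second.findtext("type")  # value stored as a child element

# Look an element up by its id attribute with a predicate.
by_id = doc.find(".//fruit[@id='f2']")
name_by_id = by_id.findtext("name")

print(type_from_attribute, type_from_child, name_by_id)
```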


Child Nodes and Rich Structure

<article>

<title></title>

<authors>

<author>

<firstname></firstname>

<lastname></lastname>

</author>

<author>

<firstname></firstname>

<lastname></lastname>

</author>

</authors>

<abstract></abstract>

<body>

<section>

<subsect></subsect>

</section>

<section></section>

</body>

<references>

<reference>

<authors></authors>

<title></title>

<publication>

</publication>

<year></year>

</reference>

</references>

</article>


XML Namespaces

<apple>
  <color>Red</color>
  <name>Macintosh</name>
  <size>medium</size>
  <country>USA</country>
</apple>

<apple>
  <color>Red</color>
  <name>Macintosh</name>
  <size>medium</size>
  <country>USA</country>
  <harvested>May</harvested>
  <orchard>Apple Farms</orchard>
</apple>

<apple>
  <color>Red</color>
  <name>Macintosh</name>
  <size>medium</size>
  <country>USA</country>
  <processor>Intel</processor>
  <screen>17in</screen>
</apple>

What if we tried to combine these in the same XML document?


Namespace syntax

General form:

xmlns:prefix="URI"

Example:

xmlns:fred="http://foo.org/fred"

The URI can point to a location that holds information about the namespace (such as a DTD or XML Schema), but XML does not rely on that being the case; the URI only has to be a unique name.


Namespaces

<frt:apple xmlns:frt="http://foo.org/fruit">

<frt:color>Red</frt:color>

<frt:name>Macintosh</frt:name>

<frt:size>medium</frt:size>

<frt:country>USA</frt:country>

<frt:harvested>May</frt:harvested>

<frt:orchard>Apple Farms</frt:orchard>

</frt:apple>

<cmp:apple xmlns:cmp="http://foo.org/computer">

<cmp:color>Red</cmp:color>

<cmp:name>Macintosh</cmp:name>

<cmp:size>medium</cmp:size>

<cmp:country>USA</cmp:country>

<cmp:processor>Intel</cmp:processor>

<cmp:screen>17in</cmp:screen>

</cmp:apple>
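As a rough illustration of why the prefixes help, the sketch below puts both kinds of apple into one document and queries them separately with Python's xml.etree.ElementTree. The inventory wrapper element and the shortened element lists are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

# A hypothetical <inventory> wrapper holds both kinds of apple in one document.
doc = ET.fromstring("""<inventory xmlns:frt="http://foo.org/fruit"
                                  xmlns:cmp="http://foo.org/computer">
  <frt:apple><frt:name>Macintosh</frt:name><frt:harvested>May</frt:harvested></frt:apple>
  <cmp:apple><cmp:name>Macintosh</cmp:name><cmp:processor>Intel</cmp:processor></cmp:apple>
</inventory>""")

# Query prefixes are mapped to URIs here; they need not match the document's prefixes.
ns = {"f": "http://foo.org/fruit", "c": "http://foo.org/computer"}

fruit_apples = doc.findall("f:apple", ns)
computer_apples = doc.findall("c:apple", ns)
harvested = doc.findtext("f:apple/f:harvested", namespaces=ns)
processor = doc.findtext("c:apple/c:processor", namespaces=ns)

# Internally the parser stores each name together with its namespace URI.
expanded_tag = fruit_apples[0].tag

print(harvested, processor, expanded_tag)
```

The match is made on the URI, not on the prefix, which is why the query can use f and c while the document uses frt and cmp.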


Default Namespaces

<apple xmlns="http://foo.org/fruit">

<color>Red</color>

<name>Macintosh</name>

<size>medium</size>

<country>USA</country>

<harvested>May</harvested>

<orchard>Apple Farms</orchard>

</apple>

<apple xmlns="http://foo.org/computer">

<color>Red</color>

<name>Macintosh</name>

<size>medium</size>

<country>USA</country>

<processor>Intel</processor>

<screen>17in</screen>

</apple>

Using the default namespace syntax shown here, you do NOT need to include the prefix in the child elements
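A short sketch of what the default namespace means to a parser, using Python's xml.etree.ElementTree on a shortened version of the fruit apple. Although the child elements carry no prefix in the markup, they still belong to the namespace, so a query has to use the qualified name.

```python
import xml.etree.ElementTree as ET

apple = ET.fromstring("""<apple xmlns="http://foo.org/fruit">
  <color>Red</color>
  <name>Macintosh</name>
</apple>""")

# Every element inherits the default namespace, prefix or not.
tags = [apple.tag] + [child.tag for child in apple]

unqualified = apple.find("color")                          # no match: wrong name
qualified = apple.find("{http://foo.org/fruit}color")      # match: URI plus local name

print(tags, unqualified, qualified.text)
```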


XML-based Languages

• VoiceXML

• WML

• RSS

• MathML

• SVG

• Many many others


XML Examples

• SVG – http://www.w3schools.com/svg/svg_example.asp

– http://www.w3schools.com/svg/svg_inhtml.asp

• RDF – http://en.wikipedia.org/wiki/Resource_Description_Framework

• RSS – http://en.wikipedia.org/wiki/RSS_%28file_format%29

• VoiceXML

– http://www.w3.org/TR/voicexml20/#dml2.2


Is XML a Database?

• Pros

• Cons


Storing XML Documents

• From Elmasri, p. 948-949:

• Using a DBMS to store documents as text

– then use keyword indexing/processing

• Using a DBMS to store the document contents as data elements

– requires a schema and mapping

• Designing a specialized system for storing native XML data

• Creating or publishing customized XML documents from preexisting relational databases
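The first two approaches in the list can be sketched with Python's standard sqlite3 module. This is only an illustration under simplifying assumptions: SQLite stands in for the DBMS, the table and column names are invented, and the release document is flattened so that year is a direct child.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified, hypothetical release document (year is flattened for brevity).
RELEASE_XML = ("<release><title>Look Sharp!</title>"
               "<artist>Joe Jackson</artist><year>1979</year></release>")

db = sqlite3.connect(":memory:")

# Approach 1: keep the whole document as text; queries fall back on text matching.
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO docs (body) VALUES (?)", (RELEASE_XML,))
text_hits = db.execute(
    "SELECT COUNT(*) FROM docs WHERE body LIKE ?", ("%Joe Jackson%",)
).fetchone()[0]

# Approach 2: map elements onto columns; this needs a schema and a mapping step.
db.execute("CREATE TABLE releases (title TEXT, artist TEXT, year INTEGER)")
rel = ET.fromstring(RELEASE_XML)
db.execute(
    "INSERT INTO releases VALUES (?, ?, ?)",
    (rel.findtext("title"), rel.findtext("artist"), int(rel.findtext("year"))),
)
titles_1979 = [row[0] for row in
               db.execute("SELECT title FROM releases WHERE year = 1979")]

print(text_hits, titles_1979)
```

The second approach supports typed queries such as the year comparison, at the cost of deciding in advance how elements map to columns.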


MySQL XML Support

• MySQL dump XML output option

– http://dev.mysql.com/doc/refman/6.0/en/mysqldump.html#option_mysqldump_xml

• Load XML (in MySQL 6)

– http://dev.mysql.com/doc/refman/6.0/en/load-xml.html



XML Tutorials

• W3Schools

– Intro

– Tree

– Validation

– Viewing

– http://www.w3schools.com/xml/default.asp


XPath

• XPath is a language for specifying a particular node, or set of nodes, in an XML document


Copyright 2007, Robert G. Capra III

XPath Resources

• Chapter 9, XML in a Nutshell, 3rd ed. by E.R. Harold, W.S. Means

– UNC Library Electronic Access

• XPath Tutorial, W3Schools

– http://www.w3schools.com/xpath/default.asp

• RIT IT722

– http://www.it.rit.edu/~jxs/classes/2007_Spring/XSLT/02_week/

• Zvon XpathTutorial

– http://www.zvon.org/xxl/XPathTutorial/General/examples.html

• TopXML

– http://www.topxml.com/xsl/tutorials/intro/default.asp


XPath

<?xml version="1.0" encoding="ISO-8859-1"?>

<collection>

<release country="USA">

<title>Look Sharp!</title>

<artist>Joe Jackson</artist>

<date>1979</date>

<label>A&amp;M</label>

</release>

<release>

<title>Night and Day</title>

<artist>Joe Jackson</artist>

<date>1982</date>

<label>A&amp;M</label>

</release>

</collection>

XPath Node Types

1. Element

2. Attribute

3. Text

4. Namespace

5. Processing-Instruction

6. Comment

7. Document (root)

Node Relationships

1. Parent

2. Children

3. Siblings

4. Ancestors

5. Descendants


XPath

<?xml version="1.0" encoding="ISO-8859-1"?>

<collection>

<release country="USA">

<title>Look Sharp!</title>

<artist>Joe Jackson</artist>

<date>1979</date>

<label>A&amp;M</label>

</release>

<release>

<title>Night and Day</title>

<artist>Joe Jackson</artist>

<date>1982</date>

<label>A&amp;M</label>

</release>

</collection>

Document node

Element node

Attribute node


XPath Tree

<?xml version="1.0" encoding="ISO-8859-1"?>

<?xml-stylesheet type="text/xsl" href="music1.xsl"?>

<collection>

<release country="USA">

<title>Look Sharp!</title>

<artist>Joe Jackson</artist>

<date><month>April</month><year>1979</year></date>

<label>A&amp;M</label>

</release>

<release>

<title>Night and Day</title>

<artist>Joe Jackson</artist>

<date><year>1982</year></date>

</release>

</collection>

[Figure: tree of the document above]

Root node: /
  <?xml-stylesheet…?>
  <collection> (root element)
    <release>
      <title>
      <artist>
      <date>
        <month>
        <year>
      <label>
    <release>
      <title>
      <artist>
      <date>
        <year>


XPath Selection

http://www.zvon.org/xxl/XPathTutorial/General/examples.html


XPath – Location Path Basics

<?xml version="1.0" encoding="ISO-8859-1"?>

<?xml-stylesheet type="text/xsl" href="music1.xsl"?>

<collection>

<release country="USA">

<title>Look Sharp!</title>

<artist>Joe Jackson</artist>

<date><month>April</month><year>1979</year></date>

<label>A&amp;M</label>

</release>

<release>

<title>Night and Day</title>

<artist>Joe Jackson</artist>

<date><year>1982</year></date>

</release>

</collection>

[Figure: tree of the document above. The root node has children <?xml-stylesheet…?> and <collection>; <collection> has two <release> children; each <release> has <title>, <artist>, and <date>, and the first also has <label>.]

/ – selects the root node, regardless of the context node

element_name – selects the child elements of the current node that have that name

//foo – selects all foo elements in the document, regardless of the context node

. – the current (context) node

.. – the parent of the current node

@foo – selects the attribute named foo on the current node

[foo] – filters the selected nodes using the predicate foo
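Several of these location paths can be tried out with Python's xml.etree.ElementTree, which implements only a subset of XPath. In this sketch the context node is the collection element; absolute paths starting with / and attribute steps such as @foo on their own are not supported by this module, so attributes appear only inside predicates. The document is a trimmed copy of the one above.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<collection>
  <release country="USA">
    <title>Look Sharp!</title>
    <date><month>April</month><year>1979</year></date>
  </release>
  <release>
    <title>Night and Day</title>
    <date><year>1982</year></date>
  </release>
</collection>""")  # root is the <collection> element, used as the context node

child_releases = root.findall("release")               # child elements by name
all_years = [y.text for y in root.findall(".//year")]  # year elements at any depth
with_country = root.findall("release[@country]")       # predicate on an attribute
country = with_country[0].get("country")               # read the attribute value
dates_with_month = root.findall(".//date[month]")      # predicate on a child element
parents_of_month = root.findall(".//month/..")         # .. steps up to the parent

print(len(child_releases), all_years, country,
      [e.tag for e in parents_of_month])
```

A full XPath 1.0 engine would also accept expressions such as /collection/release/@country directly; the subset here is enough to see how names, //, .., and predicates combine.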


Extensible Stylesheet Language (XSL)

• Like CSS, but for XML documents

QTP: HTML tags are predefined; XML tags are created by users. How do you create a stylesheet for XML?


XSL Transformations (XSLT)

• Very powerful

• XSLT provides a way to

– transform one XML document into another XML document

– or into some other format (such as HTML)

• Allows manipulation of XML elements

• Uses XPath

• Supported by Firefox, IE, other browsers


XSLT Resources

• Chapter 8, XML in a Nutshell, 3rd ed. by E.R. Harold, W.S. Means

– UNC Library Electronic Access

• XSLT Tutorial, W3Schools

– http://www.w3schools.com/xsl/default.asp

• Zvon XSLT Tutorial

– http://www.zvon.org/xxl/XSLTutorial/Output/index.html

• TopXML

– http://www.topxml.com/xsl/tutorials/intro/default.asp


XSLT Examples – XML file

<?xml version="1.0" encoding="ISO-8859-1"?>

<?xml-stylesheet type="text/xsl" href="music1.xsl"?>

<collection>

<release country="USA">

<title>Look Sharp!</title>

<artist>Joe Jackson</artist>

<date>

<month>April</month>

<year>1979</year>

</date>

<label>A&amp;M</label>

</release>

<release>

<title>Night and Day</title>

<artist>Joe Jackson</artist>

<date>

<year>1982</year>

</date>

</release>

</collection>

lect10/music1.xml

stylesheet link


XSLT Example 1

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="release">

<text>Fred</text>

</xsl:template>

</xsl:stylesheet>

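lect10/music1.xsl

Simple stylesheet:

• The processor traverses the XML document and looks at every node

• If a node matches a template rule, the processor processes and outputs that rule's template

Callouts: the xsl:template element is the template rule; its content (<text>Fred</text>) is the template.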


XSLT Example 2

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="collection">

<b>Fred</b>

</xsl:template>

<xsl:template match="artist">

<b>Ethel</b>

</xsl:template>

</xsl:stylesheet>

lect10/music2.xsl

Match “eats” the node it matches: unless the template calls apply-templates, child nodes do not get templates applied to them


XSLT Example 1 – Revisited

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="release">

<b>Fred</b>

</xsl:template>

</xsl:stylesheet>

lect10/music1.xsl

Differences in XSLT processors (e.g. IE & Firefox)

text/xsl vs. application/xslt+xml

Default processing rules

Try: match on collection, release, artist in both IE and Firefox


XSLT Example 3 – value-of

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="date">

<b>

<xsl:value-of select="."/>

</b>

</xsl:template>

</xsl:stylesheet>

lect10/music3.xsl

value-of returns the string value of the node specified by the select; for an element, that is the text of the element and all of its descendants
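The string value that value-of produces for an element can be approximated in Python by joining all descendant text. This is not an XSLT processor, only a sketch of the same idea with xml.etree.ElementTree on a date element written without whitespace between its children.

```python
import xml.etree.ElementTree as ET

date = ET.fromstring("<date><month>April</month><year>1979</year></date>")

# All descendant text, concatenated in document order.
string_value = "".join(date.itertext())

print(string_value)
```

If the source document has line breaks or indentation between month and year, that whitespace is part of the string value too.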


XSLT Example 4 – Order of Eval

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="text()">

<b>

<xsl:value-of select="."/>

</b>

</xsl:template>

</xsl:stylesheet>

lect10/music4.xsl

By default, XSLT looks for matches using a preorder traversal of the XML tree

text() matches any text node

QTP: What if match="date/text()" ?
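Preorder means a node is visited before its children, which is the same as document order. The sketch below shows that order with Python's xml.etree.ElementTree on a compact copy of the music document; it illustrates the traversal only and does not run any stylesheet.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<collection><release><title>Look Sharp!</title>"
    "<date><month>April</month><year>1979</year></date></release>"
    "<release><title>Night and Day</title>"
    "<date><year>1982</year></date></release></collection>"
)

# Element.iter() walks depth-first, each parent before its children (preorder).
visit_order = [elem.tag for elem in root.iter()]

# itertext() yields the text content in the same document order.
text_order = list(root.itertext())

print(visit_order)
print(text_order)
```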


XSLT Ex. 5 – Change Order

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="date">

<i> <xsl:value-of select="month"/> </i>

<b> <xsl:value-of select="year"/> </b>

</xsl:template>

<xsl:template match="release">

<xsl:apply-templates select="date"/>

</xsl:template>

</xsl:stylesheet>

lect10/music5.xsl

Use apply-templates to change the evaluation order

apply-templates applies the matching templates to the children of the current node; the optional select attribute filters which nodes are processed


Ex. 6 – More Apply Templates

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="*|/">

<p> <b> <xsl:value-of select="."/> </b> </p>

<xsl:apply-templates/>

</xsl:template>

</xsl:stylesheet>

lect10/music6.xsl

Use apply-templates to change the evaluation order

apply-templates applies the matching templates to the children of the current node; the optional select attribute filters which nodes are processed


XSLT Ex. 7 – Create HTML

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="collection">

<html>

<h1>My CD Collection</h1>

<table border="1">

<xsl:apply-templates/>

</table>

</html>

</xsl:template>

<xsl:template match="release">

<tr>

<td> <xsl:value-of select="title"/> </td>

<td> <xsl:value-of select="artist"/> </td>

</tr>

</xsl:template>

</xsl:stylesheet>

lect10/music7.xsl

Use apply-templates to control the evaluation order and placement in a template
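For readers more familiar with a general-purpose language, here is a rough Python analogue of what this stylesheet does. It is not XSLT and not how a browser runs music7.xsl; the function names release_row and collection_page are hypothetical stand-ins for the two templates, and the input is a compact copy of the music document.

```python
import xml.etree.ElementTree as ET

collection = ET.fromstring(
    "<collection>"
    "<release><title>Look Sharp!</title><artist>Joe Jackson</artist></release>"
    "<release><title>Night and Day</title><artist>Joe Jackson</artist></release>"
    "</collection>"
)

def release_row(release):
    """Plays the role of the template that matches release."""
    tr = ET.Element("tr")
    ET.SubElement(tr, "td").text = release.findtext("title")
    ET.SubElement(tr, "td").text = release.findtext("artist")
    return tr

def collection_page(coll):
    """Plays the role of the template that matches collection."""
    html = ET.Element("html")
    ET.SubElement(html, "h1").text = "My CD Collection"
    table = ET.SubElement(html, "table", border="1")
    # Rows are placed here, where xsl:apply-templates sits in the stylesheet.
    for release in coll.findall("release"):
        table.append(release_row(release))
    return html

page = ET.tostring(collection_page(collection), encoding="unicode")
print(page)
```

The position of the loop inside collection_page mirrors the position of apply-templates inside the table element: it decides where the output of the other template lands.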


XSLT Ex. 8 – for-each

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">

<html>

<xsl:for-each select="//date">

<p><b> <xsl:value-of select="year"/>

</b></p>

</xsl:for-each>

</html>

</xsl:template>

</xsl:stylesheet>

lect10/music8.xsl

for-each iterates through a set of nodes


Resources

• Default rules

– XML in a Nutshell, 3rd ed., section 8.7

• Harold and Means, O'Reilly, 2004.

– http://www.zvon.org/xxl/XSLTutorial/Output/example73_ch2.html

• MIME type for stylesheet

– XML in a Nutshell, 3rd ed., section 8.3

• Harold and Means, O'Reilly, 2004.



XSLT Examples – XML file

<?xml version="1.0" encoding="ISO-8859-1"?>

<?xml-stylesheet type="text/xsl" href="music1.xsl"?>

<collection>

<release country="USA">

<title>Look Sharp!</title>

<artist>Joe Jackson</artist>

<date>

<month>April</month>

<year>1979</year>

</date>

<label>A&amp;M</label>

</release>

<release>

<title>Night and Day</title>

<artist>Joe Jackson</artist>

<date>

<year>1982</year>

</date>

</release>

</collection>

lect10/music1.xml

The type in the stylesheet link is important:

IE: "text/xsl"

Firefox: "application/xml"

Eventually: "application/xslt+xml"

This affects the application of the DEFAULT RULES


XSLT

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="text()|@*">

<text><xsl:value-of select="."/></text>

</xsl:template>

<xsl:template match="artist">

<b>FOO</b>

</xsl:template>

</xsl:stylesheet>

lect12/foo.xsl

Default rules

Two important ones:

1. For text nodes

2. For root/element nodes

See XML in a Nutshell, section 8.7
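The effect of the two default rules can be modelled in a few lines of Python. This is a toy model under stated assumptions, not an XSLT processor and not a rendering of foo.xsl: it has one explicit template (for artist) and otherwise falls back to the built-in behavior, which copies text nodes through and recurses into the children of root and element nodes. The function name apply_templates and the trimmed input document are illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<collection><release><title>Look Sharp!</title>"
    "<artist>Joe Jackson</artist></release></collection>"
)

def apply_templates(node, out):
    """Toy model of XSLT processing with a single user-written template."""
    if node.nodeType == node.ELEMENT_NODE and node.tagName == "artist":
        out.append("<b>FOO</b>")       # explicit template: children are not visited
    elif node.nodeType == node.TEXT_NODE:
        out.append(node.data)          # default rule 1: copy text through
    else:
        for child in node.childNodes:  # default rule 2: process the children
            apply_templates(child, out)

out = []
apply_templates(doc, out)
result = "".join(out)
print(result)
```

The title text appears in the output even though no template mentions title, because the default rules reach it; the artist text does not, because the explicit template replaces it.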