XML and XSL Overview
by Alex Chaffee, http://www.purpletech.com/
Purple Technology: Open source development
jGuru: Java online resource
FAQs and News and other cool stuff
XML
• eXtensible Markup Language
• Replacement for HTML
• Metalanguage - used to create other languages
• Has become a universal data-exchange format
Advantages of XML
• Human-readable
• Machine-readable (easy to parse)
• Standard format for data interchange
• Possible to validate
• Extensible
  – can represent any data
  – can add new tags for new data formats
• Hierarchical structure (nesting)
Why not HTML?
• Browsers are too lenient
• Led to sloppy HTML code all over the Web
  – <imG src="foo.gif> is "legal" HTML
• Told HTML, "go to your room and don't come out until it's clean"
  – Out came XML
XML Searching and Agents
• An early motivation for XML
• Allows detailed queries of disparate data sources
  – Find best price for certain product
  – Search for properties with different real estate brokers
• HTML insufficient
  – Good for humans, bad for computers
  – Doesn't scale
XML Example
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
</menu>
XML Languages
• MML - musical scores
• CML - chemicals
• HRMML - Human Resource Management (???)
• MathML - equations
• RSS - web syndication
Tag vs. Element
• A tag is a name, enclosed by angle brackets, with optional attributes
  – <foo id="123">
• An element is a tree, containing an open tag, contents, and a close tag
  – <foo id="123">This is <bar>an element</bar></foo>
XML Syntax
• Tags properly nested
• Tag names case-sensitive
• All tags must be closed
  – or self-closing
  – <foo/> is the same as <foo></foo>
• Attributes enclosed in quotes
• Document consists of a single (root) element
• A few other details
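The syntax rules above are exactly what a conforming parser enforces. As a rough sketch, Python's standard `xml.etree.ElementTree` can act as a well-formedness checker; the helper name `is_well_formed` and the sample strings are illustrative assumptions, not part of the talk.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule on the slide
samples = {
    "nested and closed": "<foo><bar/></foo>",
    "improperly nested": "<foo><bar></foo></bar>",
    "case mismatch": "<foo></Foo>",
    "unclosed tag": "<foo><bar></foo>",
    "unquoted attribute": "<foo id=123/>",
    "two root elements": "<foo/><foo/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample should pass. Note that this checks well-formedness only; ElementTree does not validate against a DTD.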
Well-Formed vs. Valid
• Well-Formed:
  – Structure follows XML syntax rules
• Valid:
  – Well-formed, and structure also conforms to a DTD
DTD
• Document Type Definition
• A grammar for XML documents
• Defines
  – which elements can contain which other elements
  – which attributes are allowed or required on which elements
DTD and Data Exchange
• Both sides must agree on DTD ahead of time
• DTD can be part of document or stored separately
DTD Example
<?xml encoding="US-ASCII"?>
<!ELEMENT menu (meal)*>
<!-- #IMPLIED marks an optional attribute -->
<!ATTLIST menu name CDATA #IMPLIED>
<!ELEMENT meal (food|drink)*>
<!ATTLIST meal name CDATA #REQUIRED>
<!ELEMENT food (#PCDATA)*>
<!ELEMENT drink (#PCDATA)*>
Why isn't a DTD in XML?
• It will be someday: XSchema
XML Namespaces
• A single document can use multiple DTDs
• But! Two DTDs can use the same element name with different rules
• Solution: Namespaces
• Must prefix tag name with a namespace prefix (bound to a namespace URI)
  – e.g. <xsl:apply-templates select="."/>
Entities
• Macros / constants
• Values defined once, used in document
<!DOCTYPE foo SYSTEM "foo.dtd" [
  <!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">
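To see the substitution happen, here is a small sketch using Python's standard `xml.etree.ElementTree`. It declares the entity in an internal DTD subset only (no external `foo.dtd`), and the closed `BODY` element with its text content is an assumption added so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the entity once; the parser substitutes
# its value wherever &background; appears (attribute values and text alike).
doc = """<!DOCTYPE BODY [
  <!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">Page color is &background;</BODY>"""

body = ET.fromstring(doc)
print(body.get("BGCOLOR"))
print(body.text)
```

Changing the one `<!ENTITY ...>` declaration changes every place the entity is used.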
SML / Minimal XML
• Simplified Markup Language
• Subset of XML, but stripped down
• Easier to understand, parse
• No
  – DTDs
  – Attributes
  – Processing instructions
  – etc.
XSL: XML Transformation
XSL
• The eXtensible Stylesheet Language
• Transforms XML into HTML
• Actually, transforms XML into a tree, then turns that tree into another tree, then outputs that tree as XML
XSL Architecture
[Diagram: XML Source and XSL Stylesheet feed the XSL Processor, which produces HTML Output]
XML is a Tree
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
<meal name="breakfast">
<food>Scrambled Eggs</food>
<food>Hash Browns</food>
<drink>Orange Juice</drink>
</meal>
<meal name="snack">
<food>Chips</food>
</meal>
</menu>
[Tree diagram]
menu
  meal
    name = "breakfast"
    food: "Scrambled Eggs"
    food: "Hash Browns"
    drink: "Orange Juice"
  meal
XML Is A Tree
• Nodes
  – Branch nodes contain children
  – Leaf nodes contain content
• Attributes, Values, Entities, etc.
• DOM provides API-based access to tree models
• XSL turns one tree into a different tree
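The tree view can be made concrete with a few lines of code. The sketch below uses Python's standard `xml.etree.ElementTree` (a DOM-like tree API, though not the W3C DOM itself) to walk the menu document; the `outline` helper is a hypothetical name chosen for illustration, and the DOCTYPE line is omitted so the snippet needs no `menu.dtd`.

```python
import xml.etree.ElementTree as ET

MENU = """<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>"""


def outline(node, depth=0):
    """Return one indented line per element: tag, attributes, leaf text."""
    text = (node.text or "").strip()
    label = node.tag
    if node.attrib:
        label += " " + str(dict(node.attrib))
    if text:
        label += ": " + text
    lines = ["  " * depth + label]
    for child in node:  # branch nodes recurse into their children
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(MENU))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the source document.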
Command Line Invocation
• Apache Xalan
  java org.apache.xalan.xslt.Process -IN faq.xml -XSL faq.xsl -OUT faq.html
• IBM LotusXSL
  java com.lotus.xsl.xml4j.ProcessXSL -in servletfaq.xml -xsl faq.xsl -out faq.html
• And so on…
Formatting Objects
• Forget about it for now
XSLT
• The meat of XSL
• Syntax for making XSL template files
• Pattern matching
• Output formatting
• Rules-based (like Prolog)
XPath
• The stuff inside the quotes in XSL patterns
  – "/person/name/firstname"
• A sensible way to locate content in an XML document
• More straightforward than walking a DOM tree or waiting for a SAX callback
XPath Syntax
• book/title
  – title child of book child of current node
• /book/title
  – title child of book child of document root
• @language
  – language attribute of current node
• chapter/@language
  – language attribute of chapter child of current node
XPath Syntax (cont.)
• chapter[3]/para
  – all the para children of the third chapter
• book/*/title
  – all title children of all children of book (but not of their children)
• chapter//para
  – all para descendants of chapter, at any depth
• ../../title
  – title child of parent of parent
  – parent::node()/parent::node()/child::title
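A few of these paths can be tried without an XSL processor. The sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath (it selects elements, not attributes, and cannot take an absolute path on an element). The sample `BOOK` document is an assumption made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
BOOK = """<book>
  <title>XML Basics</title>
  <chapter language="en"><title>One</title><para>a</para></chapter>
  <chapter language="en"><title>Two</title>
    <section><para>b</para></section>
  </chapter>
  <chapter language="fr"><title>Trois</title><para>c</para><para>d</para></chapter>
</book>"""

book = ET.fromstring(BOOK)  # the current node is <book>

# chapter[3]/para: para children of the third chapter
third = [p.text for p in book.findall("chapter[3]/para")]

# */title: title children of every child of the current node
# (the slide's book/*/title, evaluated with <book> as the current node)
chapter_titles = [t.text for t in book.findall("*/title")]

# chapter//para: para descendants of chapter at any depth
all_paras = [p.text for p in book.findall("chapter//para")]

# chapter/@language: ElementTree paths select elements only,
# so the attribute is read with .get() on each selected element
languages = [c.get("language") for c in book.findall("chapter")]

print(third, chapter_titles, all_paras, languages)
```

Note that `*/title` skips the book's own title, because that title is a child of the current node rather than a grandchild.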
XPath Abbreviations
.    self::node()
..   parent::node()
//   /descendant-or-self::node()/
@    attribute::
XPath Functions
• para[1] or para[position()=1]
  – the first para child of the current node
• para[last()]
  – the last para child of the current node
• para[count(child::note)>0]
  – all paragraphs with one or more notes
• id("abstract")
  – selects the element whose ID-typed attribute is "abstract", e.g. <para id="abstract">
• para[@type='secret'] or para[attribute::type='secret']
  – selects all child nodes like <para type="secret">
XPath Functions (cont.)
• para[not(title)]
  – selects all child paragraphs with no title elements
• para[position() >= 2 and position() < last()]
  – selects all but the first and last paragraphs
• para[lang("en")]
  – matches <para xml:lang="en-uk">…</para>
• note[contains(., "alex")]
  – . means "test children's content too, recursively" in this context
• note[starts-with(., "hello")]
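Some of these predicates can be tried with Python's standard `xml.etree.ElementTree`, which supports positional, `last()`, attribute, and child-existence predicates but not functions such as `not()`, `count()`, or `contains()`. In this sketch the unsupported ones are approximated with plain Python filters; the sample `DOC` is a made-up assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
DOC = """<doc>
  <para type="secret"><title>T1</title>first</para>
  <para><note>n</note>second</para>
  <para>third says hello alex</para>
  <para>last</para>
</doc>"""

doc = ET.fromstring(DOC)
paras = doc.findall("para")

# Predicates ElementTree understands directly
first = doc.find("para[1]")
last = doc.find("para[last()]")
secret = doc.findall("para[@type='secret']")
with_note = doc.findall("para[note]")  # same result as para[count(child::note)>0]

# Predicates it does not understand, approximated in Python
# para[not(title)]
no_title = [p for p in paras if p.find("title") is None]
# para[position() >= 2 and position() < last()]
middle = paras[1:-1]
# para[contains(., "alex")]: "." covers the text of all descendants
mentions_alex = [p for p in paras if "alex" in "".join(p.itertext())]

print(len(secret), len(with_note), len(no_title), len(middle), len(mentions_alex))
```

A full XPath 1.0 engine would evaluate all of the slide's expressions as written; the Python filters here only mirror their meaning.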
XPath Disadvantages
• Not XML
  – Not hierarchical
  – New syntax rules
  – Weird mix of /,[],(),*,:,::,.,..,@
• New function set
  – Not Java
• Concepts like "axis" not always clear
XSLT Syntax
XSL Rules
• XSL is a series of rules or templates
• Each template matches an element
• Templates can contain XML commands
XSL Commands: apply-templates
• Main rule: apply-templates
  – looks for a template match
  – applies it
• Usually the template calls apply-templates recursively on its children
• If not, then processing stops at that node (but continues for its other siblings that matched this template)
Default Rule
• For a leaf node, output its contents
• For a branch node, apply templates (recursively) (including default rule)
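The interplay between apply-templates and the default rule may be easier to see as ordinary recursion. The following is a toy model in Python, not an XSLT processor: templates are plain functions keyed by element name, and every name in it (`apply_templates`, `default_rule`, `meal`, `item`) is a hypothetical helper invented for this sketch.

```python
import xml.etree.ElementTree as ET

MENU = (
    '<menu><meal name="breakfast">'
    "<food>Scrambled Eggs</food><drink>Orange Juice</drink>"
    "</meal></menu>"
)


def apply_templates(node, templates):
    """Process each child with its matching template, else the default rule."""
    out = node.text or ""
    for child in node:
        rule = templates.get(child.tag, default_rule)
        out += rule(child, templates)
        out += child.tail or ""
    return out


def default_rule(node, templates):
    """Leaf: emit its text. Branch: keep applying templates to the children."""
    return apply_templates(node, templates)


def meal(node, templates):
    return f"<H2>{node.get('name')}</H2><UL>{apply_templates(node, templates)}</UL>"


def item(node, templates):
    return f"<LI>{apply_templates(node, templates)}</LI>"


# No rule is registered for <menu> or <drink>, so both fall to the default rule
templates = {"meal": meal, "food": item}
root = ET.fromstring(MENU)
html = templates.get(root.tag, default_rule)(root, templates)
print(html)
```

Because `drink` has no template of its own, its text comes through bare, without an `<LI>` wrapper, which is the default rule at work.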
Some XSL Commands
• value-of
  – grabs raw value, good for text elements and attributes
• if
  – executes conditionally
• number
  – counts position of element in group
  – good for ordered list numbering, table of contents, etc.
XSL Example
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY background "#99FFFF">
]>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Example (cont.)
<xsl:template match="menu">
  <HTML>
    <HEAD>
      <TITLE>Menu: <xsl:value-of select="@name"/></TITLE>
    </HEAD>
    <BODY BGCOLOR="&background;">
      <H1>Menu <xsl:value-of select="@name"/></H1>
[Note: Can reuse contents, unlike CSS]
Example (cont.)
      <xsl:apply-templates />
    </BODY>
  </HTML>
</xsl:template>
Example (cont.)
<xsl:template match="meal">
  <H2><xsl:value-of select="@name"/></H2>
  <UL>
    <xsl:apply-templates/>
  </UL>
</xsl:template>
Example (cont.)
<xsl:template match="food">
  <LI><xsl:apply-templates/></LI>
</xsl:template>
<xsl:template match="drink">
  <LI><xsl:apply-templates/></LI>
</xsl:template>
</xsl:stylesheet>
Outputting Attributes
• From This:
  – <link>
      <name>Stinky</name>
      <url>http://www.stinky.com/</url>
    </link>
• We Want This:
  – <A href="http://www.stinky.com/">Stinky</A>
Outputting Attributes
• The Hard Way:
  – <xsl:element name="A">
      <xsl:attribute name="href">
        <xsl:value-of select="url" />
      </xsl:attribute>
      <xsl:value-of select="name" />
    </xsl:element>
• The Easy Way:
  – <A href="{url}">
      <xsl:value-of select="name"/>
    </A>
Copying Subtrees
• <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>
• No, I don't understand it either
• Default (built-in) rule strips all tags/attributes, leaving only the text
• Also copy-of
XSL conditionals: if
• <xsl:if test="author"> by <xsl:apply-templates select="author" /></xsl:if>
• Note: no else (?!?)
XSL Conditionals: choose
• Case 1
  – <link>
      <name>Stinky</name>
      <url>http://www.stinky.com/</url>
    </link>
  – <a href="http://www.stinky.com/">Stinky</a>
• Case 2
  – <link>
      <url>http://www.stinky.com/</url>
    </link>
  – <a href="http://www.stinky.com/">http://www.stinky.com/</a>
• Case 3
  – <link>
      <name>Stinky</name>
    </link>
  – Stinky
XSL Conditionals: choose
• <xsl:choose>
    <xsl:when test="url">
      <A href="{url}">
        <xsl:choose>
          <xsl:when test="name">
            <xsl:value-of select="name" />
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="url" />
          </xsl:otherwise>
        </xsl:choose>
      </A>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="name" />
    </xsl:otherwise>
  </xsl:choose>
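For readers more at home with if/else, the same decision logic can be sketched in Python. This mirrors the nested choose above against the three cases from the previous slide; `render_link` is a hypothetical helper name, and the code is an illustration of the logic rather than a replacement for the stylesheet.

```python
import xml.etree.ElementTree as ET


def render_link(link):
    """Mirror the nested choose: anchor if there is a url, plain name otherwise."""
    url = link.findtext("url")
    name = link.findtext("name")
    if url is not None:  # outer <xsl:when test="url">
        # inner choose: prefer the name, fall back to the url as link text
        label = name if name is not None else url
        return f'<A href="{url}">{label}</A>'
    return name or ""  # outer <xsl:otherwise>


cases = [
    "<link><name>Stinky</name><url>http://www.stinky.com/</url></link>",
    "<link><url>http://www.stinky.com/</url></link>",
    "<link><name>Stinky</name></link>",
]
results = [render_link(ET.fromstring(case)) for case in cases]
for line in results:
    print(line)
```

The outer test decides whether an anchor is produced at all; the inner test only picks its label.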
XSL Looping: for-each
• <xsl:for-each select="chapter">
    <h2><xsl:value-of select="@title"/></h2>
  </xsl:for-each>
• Functional overlap with apply-templates
  – Difference in programming style
  – Use it inside a given template rule
Template Modes
• Same element name, different context -> different template, different output
• Can invoke apply-templates with a mode, matches corresponding moded template
• <h1>Table of Contents</h1>
  <ol><xsl:apply-templates select="chapter" mode="toc"/></ol>
• <xsl:template match="chapter" mode="toc">
    <li><xsl:value-of select="@title"/></li>
  </xsl:template>
• <xsl:template match="chapter">
    <h1><xsl:value-of select="@title"/></h1>
    <xsl:apply-templates/>
  </xsl:template>
XSL vs. CSS
• Similar problem, different solutions
• CSS takes HTML and applies fonts, styles, positions
• XSL takes any XML and turns it into anything else
• XSL more powerful than CSS
  – e.g. can use same content in multiple places in result document
XSL Disadvantages
• Confusing syntax and semantics
  – Like Prolog+C+XML
  – It's really a programming language, but using markup language syntax. Yuck!
• Hard to debug
  – XSL Trace helps a little
• Don't have full power of, say, Java inside templates
  – No database access, hashtables, methods, objects, etc.
• Still need separate .xsl file for each client device
Other XSL-Based Products
• LotusXSL
• Resin by Caucho
• Cocoon
• IBM XSL Trace
• Xalan (Apache)
• XT
• Lots more
Links: XML
• XML Spec
  – http://www.w3.org/TR/REC-xml
• XML FAQ
  – http://www.ucc.ie/xml/
• Café con Leche
  – http://metalab.unc.edu/xml/
• XML.com
  – http://www.xml.com/
• Servlet FAQ in XSL
  – http://www.purpletech.com/servlet-faq/
References
• McLaughlin, "Java and XML", O'Reilly
• Eckstein, "XML Pocket Reference", O'Reilly
• Harold, "XML Bible"
• Bradley, "The XML Companion", Addison-Wesley
Q&A