The Semantic Blessings of XSLT
Diederik Gerth van Wijk [email protected]
XML Holland 2008
Planetarium Gaasperplas, Amsterdam, 20 November
DOXATRIX
Diederik Gerth van Wijk Semantic Blessings of XSLT 2
Intended audience
Understands English
Knows what XML is about
Cares about meaning, processing and validation
Does not need to know about XSLT
Does not need to be a programmer
But might be aware that computers need to be programmed
Semantic? Blessings? XSLT?
XML is about the structure of a document
Semantics are about “meaning”
A schema can say that a document should have a title (structure)
The documentation might add that a title is used for identification (unique within a set of documents), and gives a clue about what the document is about (semantics)
The words used in the title are really semantics
Blessings are good, helpful, you want them
What is XSLT?
How can XSLT help you in adding, verifying and using semantic markup?
Why bother marking up explicitly?
NLP is good, Explicit Markup is better
“Plein 26 Den Haag”=<street>Plein</street><nr>26</nr><city>Den Haag</city>
“Plein 1813 Den Haag”=<street>Plein 1813</street><city>Den Haag</city>
XML is about tagging structure
A schema adds semantics
<name>Quattro Stagioni</name>: Pizza by Mario or piece by Vivaldi?
I don’t care (in this presentation)
eXtensible Stylesheet Language - Transformations
XSL: the eXtensible Stylesheet Language
Family of three W3C recommendations for transformation and presentation
XML Path Language (XPath)
XSL Transformations (XSLT)
XSL Formatting Objects (XSL-FO)
[Diagram, labels only: XML source document(s); XSLT stylesheet 1; XSLT stylesheet 2; XSLT processor; HTML pages; XSL-FO document; XSL-FO processor]
XSLT characteristics
An XSLT style sheet is an XML document
Input is one or more XML documents
Output is one or more XML (XSLT!), HTML, XSL-FO or plain text (CSS!) documents
Style sheet can look like template of the result document (data pull)
Or be event driven (data push)
Elements and attributes are “events”
Functional programming language
Rule based
Declarative
No side effects
Statements can be executed in any order
Embeds XPath
XSLT 2.0 and XPath 2.0 know XML Schema types
XSLT 2.0 can compute from implicit structure
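The difference between data pull and data push may be easier to see side by side. The following is only a sketch: the input vocabulary (library, book, title, author) and the mode name "push" are hypothetical, chosen for illustration. The first half fetches the data it wants from the source; the second half hands the source nodes to whichever rules match them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed (hypothetical) input:
     <library>
       <book><title>Emma</title><author>Jane Austen</author></book>
     </library> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- Data pull: the template looks like the result document
             and pulls in the values it needs, by name. -->
        <h1>Pull</h1>
        <ul>
          <xsl:for-each select="library/book">
            <li>
              <xsl:value-of select="title"/>
              <xsl:text> by </xsl:text>
              <xsl:value-of select="author"/>
            </li>
          </xsl:for-each>
        </ul>
        <!-- Data push: the source nodes are pushed to the matching rules. -->
        <h1>Push</h1>
        <ul>
          <xsl:apply-templates select="library/book" mode="push"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Rule that fires on the "event" of meeting a book element. -->
  <xsl:template match="book" mode="push">
    <li>
      <xsl:apply-templates select="title | author" mode="push"/>
    </li>
  </xsl:template>

  <xsl:template match="title" mode="push">
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="author" mode="push">
    <xsl:text> by </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Note that the push rules follow the order of the source: title | author selects the nodes in document order, so the two lists are identical only when title precedes author in the input.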
XSLT engines
stand alone:
Saxon (open source, Michael Kay)
Altova (free, XML Spy)
MSXML
on server:
Saxon + .NET
Altova + .NET
MSXML + ASP
built in browser:
IE6 and higher
FF1 and higher
Opera9 and higher
What’s the competition?
CSS (Cascading Style Sheets)
Easier, simpler
Don’t transform
Perl, Python, Java, JavaScript, C(++), (V)Basic
Generic programming or scripting languages
No built in knowledge of XML, but lots of libraries for DOM or SAX
JSP, ASP, PHP
Server side processing
Not really XML aware
Little or no transformation
IS-10179 DSSSL: Document Style Semantics and Specification Language
SGML based
Rarely used
XSLT and semantics...
XML elements describe what the content is (semantics)
XSLT stylesheets describe what to do with them (processing)
How can a processing stylesheet be a semantic blessing?
Blessing 3: XSLT 2.0 may be schema aware
A schema defines the semantics of a document type
XSLT 2.0 is based on XPath 2.0
XSLT 2.0 may use schemas
Then, XPath 2.0 can use the declared type of elements or attributes
So it can know whether to treat an attribute as string or as integer ("12" < "3" if type is string, "12" > "3" if type is integer)
But will it sort correctly:
<song title="50 ways to leave your lover" performer="Paul Simon" />
<song title="1919 rag" performer="Kid Ory" />
or
<king name="Henry VIII" born="1491-06-28" died="1547-01-28" />
<king name="Henry IX" born="1725-03-11" died="1807-07-13" />
(yes, if the Roman numerals were coded as Ⅷ and Ⅸ)
With the “instance of” operator you can use information that is not in the document, but is in the schema
Therefore, XSLT 2.0 discourages stand alone processing
From a semantic point of view, that’s a blessing
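The string versus integer comparison above can be tried out even without a schema. The following is a minimal sketch for a basic (not schema-aware) XSLT 2.0 processor, so the types are supplied by explicit casts; the input (items, item, attribute n) is hypothetical. With a schema-aware processor and a schema that declares n as xs:integer, the casts should not be needed, because the type would come from the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed (hypothetical) input:
     <items><item n="12"/><item n="3"/></items> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Compared as strings, "1" comes before "3": this writes true. -->
    <xsl:text>as string, 12 lt 3: </xsl:text>
    <xsl:value-of select="'12' lt '3'"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Compared as integers, 12 is greater than 3: this writes true. -->
    <xsl:text>as integer, 12 gt 3: </xsl:text>
    <xsl:value-of select="xs:integer('12') gt xs:integer('3')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- The sort key is an xs:integer, so the sort is numeric:
         3 is written before 12. -->
    <xsl:for-each select="items/item">
      <xsl:sort select="xs:integer(@n)"/>
      <xsl:value-of select="@n"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Replacing the sort key by string(@n) would put 12 before 3 again, which is the string ordering from the slide.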
Blessing 4: Schema independent processing (1)
In a sequence group, the order contains no information:
(title, abbreviated-title?) (1)
is equivalent to
(abbreviated-title?, title) (2)
Suppose you want to print the abbreviated title if one is coded, and otherwise the full title
In stream processing, the quick and dirty solution might be as simple as:
temp=getNextElement; if existsNextElement then write(getNextElement) else write(temp); (1)
or
write(getNextElement); (2)
But what if you decide to change from order (1) to (2)? Or add an optional element toc-title?
(title, abbreviated-title?, toc-title?) (1)
(toc-title?, abbreviated-title?, title) (2)
The simple program breaks
Blessing 4: Schema independent processing (2)
In XSLT, you have access to the elements by name, in arbitrary order
The style sheet fragment looks like
<xsl:choose>
  <xsl:when test="./abbreviated-title">
    <xsl:value-of select="abbreviated-title"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="title"/>
  </xsl:otherwise>
</xsl:choose>
If the schema (and documents) change order, the stylesheet remains the same
If an optional toc-title is added, the stylesheet remains the same
Verbosity turns out to be simpler, in the long run
By the way, if sequence matters in the document, it shouldn’t in the schema
Reasons to prescribe sequence:
to ease input
to enforce cardinality
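The selection by name shown in the fragment above can be wrapped in a complete stylesheet. The following is a sketch under assumptions: the wrapper elements docs and doc are hypothetical, and the shorter XPath 2.0 expression is an alternative to the xsl:choose on the slide, not the form the presentation uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed (hypothetical) input, children in any order:
     <docs>
       <doc>
         <title>Extensible Stylesheet Language</title>
         <abbreviated-title>XSL</abbreviated-title>
       </doc>
       <doc>
         <toc-title>Intro</toc-title>
         <title>Introduction</title>
       </doc>
     </docs> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="docs/doc"/>
  </xsl:template>

  <xsl:template match="doc">
    <!-- The comma builds a sequence in the order written here, not in
         document order, so [1] is the abbreviated title when one is
         coded and the full title otherwise. -->
    <xsl:value-of select="(abbreviated-title, title)[1]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input this writes XSL for the first doc and Introduction for the second, whatever the order of the child elements and whether or not a toc-title is present.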
Blessing 5: functional programming
No variables (in the usual sense: a value cannot be reassigned)
Suppose you want to sort items alphabetically and act on each new letter
First idea:
<xsl:variable name="PrevLetter" select="' '" />
<xsl:for-each select="book">
  <xsl:sort select="title" data-type="text" order="ascending"/>
  <xsl:variable name="ThisLetter" select="substring(title/.[1],1,1)" />
  <xsl:if test="$PrevLetter!=$ThisLetter">
    <H2><xsl:value-of select="$ThisLetter"/></H2>
  </xsl:if>
  <xsl:variable name="PrevLetter" select="$ThisLetter" />
  <H3><xsl:value-of select="title"/></H3>
</xsl:for-each>
No good: the xsl:variable inside the loop does not update PrevLetter, it declares a new one that is gone at the end of the iteration, so the test sees the initial value in every iteration of the for-each loop
Would this work?
<xsl:for-each select="book">
  <xsl:sort select="title" data-type="text" order="ascending"/>
  <xsl:variable name="PrevLetter" select="substring(preceding-sibling::book[1]/title/.[1],1,1)" />
  <xsl:variable name="ThisLetter" select="substring(title/.[1],1,1)" />
  <xsl:if test="$PrevLetter!=$ThisLetter">
    <H2><xsl:value-of select="$ThisLetter"/></H2>
  </xsl:if>
  <H3><xsl:value-of select="title"/></H3>
</xsl:for-each>
Better, but the preceding-sibling axis operates on the original document order, not on the sorted...
Is that a bug or a feature?
It’s a blessing!
The solution
<xsl:for-each-group select="book" group-by="substring(title/.[1],1,1)">
  <!-- sort the groups too, otherwise the letters appear in order of first occurrence -->
  <xsl:sort select="current-grouping-key()"/>
  <H2><xsl:value-of select="current-grouping-key()"/></H2>
  <xsl:for-each select="current-group()">
    <xsl:sort select="title" data-type="text" order="ascending"/>
    <H3><xsl:value-of select="title"/></H3>
  </xsl:for-each>
</xsl:for-each-group>
Think XML
Think in terms of creating hierarchies: groups of titles starting with the same letter
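To make the idea of creating hierarchies concrete, the grouping can also write the groups out as nested XML instead of flat headings. This is only a sketch: the input root library and the output names index, letter and name are hypothetical, and each book is assumed to have exactly one title.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumed (hypothetical) input:
     <library>
       <book><title>Ulysses</title></book>
       <book><title>Emma</title></book>
       <book><title>Utopia</title></book>
     </library> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/library">
    <index>
      <xsl:for-each-group select="book" group-by="substring(title, 1, 1)">
        <!-- Order the groups themselves by their letter. -->
        <xsl:sort select="current-grouping-key()"/>
        <!-- One element per letter turns the implicit grouping
             into explicit structure. -->
        <letter name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <xsl:sort select="title"/>
            <title><xsl:value-of select="title"/></title>
          </xsl:for-each>
        </letter>
      </xsl:for-each-group>
    </index>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input the result has a letter element named E containing Emma, followed by one named U containing Ulysses and Utopia. A later stylesheet can then address a group directly, without recomputing it from the titles.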
The ultimate semantic normalisation
“PCDATA considered harmful” (Han Nonnekes, Shell Oil)
Text is the outer structure in a specific language of a deeper meaning
You should encode a text as that deeper tree
With references to abstract words (concepts)
For each language (“English, upper class, around 1850”) give dictionary and transformation rules
Then generate the text
Questions?
Ask me now
Ask me during lunch or tea break
Ask me during buffet
Mail [email protected]
Presentation can be downloaded from
www.xmlholland2008.nl
www.doxatrix.nl/dg