
Upload: elijah-ward

Post on 02-Jan-2016


Processing of structured documents

Transforming XML

Extensible Stylesheet Language (XSL)

a language for transforming XML documents: XSLT

an XML vocabulary for specifying the formatting of XML documents

XSLT

specifies the conversion of a document from one format to another

an XSLT transformation (stylesheet) is itself a well-formed XML document

based on a hierarchical tree structure

a transformation describes rules for transforming a source tree into a result tree

a rule: a template with a pattern

a pattern is matched against elements in the source tree

a template is instantiated to create part of the result tree

Processing model

A list of source nodes is processed to create a result tree fragment

the result tree is constructed by processing a list containing just the root node

a list of source nodes is processed by appending the result tree structure created by processing each of the members of the list in order

Processing model

A node is processed by finding all the template rules with patterns that match the node, and choosing the best amongst them

the chosen rule’s template is then instantiated with the node as the current node and with the list of source nodes as the current node list

A template typically contains instructions that select an additional list of source nodes (e.g. children) for processing

processing continues until no new source nodes are selected

An XSL stylesheet is an XML document

it must be well-formed

it should begin with an XML declaration (XML 1.0 recommends one but does not strictly require it)

it must declare all the namespaces it uses

the XSL namespace (prefix xsl:) defines the elements that are needed for performing transformations

Skeleton XSL stylesheet

<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  ...
</xsl:stylesheet>

Printing all the text data:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>

Template rules

A template rule is specified with the xsl:template element

attribute match: a pattern that identifies the source node or nodes to which the rule applies

the content is the template that is instantiated when the template rule is applied

<xsl:template match="[XPath expression]">
  <!-- content -->
</xsl:template>

Example

In XML document:

This is an <emph>important</emph> point.

The following template rule matches emph elements and produces a fo:inline-sequence formatting object with a font-weight property of bold.

<xsl:template match="emph">
  <fo:inline-sequence font-weight="bold">
    <xsl:apply-templates/>
  </fo:inline-sequence>
</xsl:template>

Applying template rules

recursively processing the children of the current source element

element: xsl:apply-templates

attribute: select

in the absence of select attribute, the xsl:apply-templates instruction processes all of the children of the current node, including text nodes

a select attribute is used to process nodes selected by an expression (that returns a node set)

the selected set of nodes is processed in document order

Examples

<xsl:template match="author-group">
  <fo:inline-sequence>
    <xsl:apply-templates select="author"/>
  </fo:inline-sequence>
</xsl:template>

<xsl:template match="author-group">
  <fo:inline-sequence>
    <xsl:apply-templates select="author/given-name"/>
  </fo:inline-sequence>
</xsl:template>

Example

Processing of all of the heading descendant elements of the book element:

<xsl:template match="book">
  <fo:block>
    <xsl:apply-templates select=".//heading"/>
  </fo:block>
</xsl:template>

Example

Assume: a department element has a dname child and employee descendants

the rule finds an employee’s department and then processes the dname child of the department

<xsl:template match="employee">
  <fo:block>
    Employee <xsl:apply-templates select="name"/> belongs to
    department <xsl:apply-templates select="ancestor::department/dname"/>.
  </fo:block>
</xsl:template>

Built-in template rules

There are built-in template rules to allow recursive processing to continue in the absence of a successful pattern match by an explicit template rule in the stylesheet

<xsl:template match="* | /">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>

Named templates

Templates can be invoked by name

an xsl:template element with a name attribute specifies a named template

if an xsl:template element has a name attribute, it may also have a match attribute

an xsl:call-template element invokes a template by name (using the name attribute)

xsl:call-template does not change the current node or the current node list (unlike xsl:apply-templates)
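A minimal sketch of a named template follows. The source element note and the template name boxed are hypothetical, chosen only for illustration; they do not come from the slides.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Named template: invoked by name, never selected by pattern matching -->
  <xsl:template name="boxed">
    <div class="box">
      <!-- The current node is still the node that was current at the call site -->
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Hypothetical "note" element delegates its rendering to the named template -->
  <xsl:template match="note">
    <xsl:call-template name="boxed"/>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:call-template leaves the current node unchanged, the xsl:apply-templates inside boxed processes the children of the note element that triggered the call.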

Creating content

Literal result elements

creating elements with xsl:element

creating attributes with xsl:attribute and named attribute sets with xsl:attribute-set

creating text, PIs and comments

copying

computing generated text

numbering

Literal result elements

In a template, an element that does not belong to the XSLT namespace (that is, one that is not an XSL instruction) is instantiated to create an element node with the same name

the content of the element is a template, which is instantiated to give the content of the created element node

the created element node will have the attribute nodes that were present on the element node in the stylesheet tree

Example: Generating HTML

<xsl:template match="book">
  <html>
    <head>
      <title>Here is my HTML page!</title>
    </head>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>

Creating elements with xsl:element

The xsl:element element allows an element to be created with a computed name

the name of the element to be created is specified by a required name attribute and an optional namespace attribute

the content of the xsl:element element is a template for the attributes and children of the created element

the name attribute is interpreted as an attribute value template
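As a sketch of a computed element name, assume a hypothetical source vocabulary in which each field element carries the desired element name in a name attribute (for instance a field named price). Neither the element nor the attribute appears in the slides.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input: <field name="price">12</field> -->
  <xsl:template match="field">
    <!-- The braces mark an attribute value template:
         the result element is named after the value of @name -->
    <xsl:element name="{@name}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, a field named price would become a price element with the same text content. The value of @name has to be a legal XML name for this to work.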

Creating attributes with xsl:attribute

The xsl:attribute element can be used to add attributes to result elements whether created by literal result elements in the stylesheet or by instructions such as xsl:element

the name of the attribute to be created is specified by a required name attribute and an optional namespace attribute

instantiating an xsl:attribute element adds an attribute node to the containing result element node

Creating attributes with xsl:attribute

the content of the xsl:attribute element is a template for the value of the created attribute

the name attribute is interpreted as an attribute value template

adding an attribute to an element replaces any existing attribute of that element with the same name

an attribute has to be added to an element before any children are added to it

Example

<xsl:element name="myElement">
  <xsl:attribute name="myAttribute">XML</xsl:attribute>
  <xsl:text>is great!</xsl:text>
</xsl:element>

Produces: <myElement myAttribute="XML">is great!</myElement>

Named attribute sets

The xsl:attribute-set element defines a named set of attributes

the name attribute specifies the name of the attribute set

the content of the xsl:attribute-set element consists of zero or more xsl:attribute elements that specify the attributes in the set

attribute sets are used by specifying a use-attribute-sets attribute on xsl:element, xsl:copy or xsl:attribute-set elements

Named attribute sets

The value of the use-attribute-sets attribute is a whitespace-separated list of names of attribute sets

attribute sets can also be used by specifying an xsl:use-attribute-sets attribute on a literal result element

order of adding attributes: 1. attribute sets, 2. attributes specified on the literal result element, 3. any attributes specified by xsl:attribute elements

later ones override the earlier ones
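The ordering rule can be illustrated with a small sketch. The attribute set name title-style and the source element heading are assumptions made for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical attribute set with two attributes -->
  <xsl:attribute-set name="title-style">
    <xsl:attribute name="class">title</xsl:attribute>
    <xsl:attribute name="align">left</xsl:attribute>
  </xsl:attribute-set>

  <xsl:template match="heading">
    <!-- On a literal result element the attribute must be in the XSLT namespace -->
    <h1 xsl:use-attribute-sets="title-style" align="center">
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

</xsl:stylesheet>
```

Following the order given above, the attributes from title-style are added first, and the literal align attribute then replaces the align value from the set, so the h1 should end up with class title and align center.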

Creating text

a template can also contain text nodes

each text node in a template will create a text node with the same string-value in the result tree

adjacent text nodes are automatically merged

literal data characters may also be wrapped in an xsl:text element (may change whitespace handling)

Creating PIs

The xsl:processing-instruction element is instantiated to create a processing instruction node

the name attribute specifies the name of the processing instruction node

<xsl:processing-instruction name="xml-stylesheet">href="book.css" type="text/css"</xsl:processing-instruction>

creates: <?xml-stylesheet href="book.css" type="text/css"?>

Creating comments

The xsl:comment element is instantiated to create a comment node in the result tree

<xsl:comment>This file is automatically generated. Do not edit!</xsl:comment>

creates:

<!--This file is automatically generated. Do not edit!-->

Copying

The xsl:copy element provides an easy way of copying the current node

attributes and children are not automatically copied

the content of the xsl:copy element is a template for the attributes and children of the created node

Example

Copying the language attributes for each element

use (instead of <xsl:apply-templates/>): <xsl:call-template name="apply-templates-copy-lang"/>

<xsl:template name="apply-templates-copy-lang">
  <xsl:for-each select="@xml:lang">
    <xsl:copy/>
  </xsl:for-each>
  <xsl:apply-templates/>
</xsl:template>
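A widely used application of xsl:copy is the identity transformation, sketched below. It is not part of the slides; it is included because it shows how the content of xsl:copy acts as the template for the attributes and children of the copied node.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Matches every attribute and every child node type -->
  <xsl:template match="@* | node()">
    <!-- xsl:copy creates a shallow copy of the current node -->
    <xsl:copy>
      <!-- Recurse explicitly, since attributes and children are not copied automatically -->
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Adding further, more specific template rules to such a stylesheet is a common way to change only selected parts of a document while copying the rest through.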

xsl:copy-of

The xsl:copy-of element can be used to insert a result tree fragment into the result tree, without first converting it to a string (as xsl:value-of does)

the required select attribute contains an expression

when the result of evaluating the expression is a result tree fragment, the complete fragment is copied into the result tree

when the result is a node-set, all the nodes are copied (with their children)

Copying parts without transforming

sometimes a part should be passed as such, without any transformation

assume: copyright contains some HTML formatting:

<xsl:template match="copyright">
  <xsl:copy-of select="*"/>
</xsl:template>

Computing generated text

Within a template, the xsl:value-of element can be used to compute generated text

e.g. by extracting text from the source tree or by inserting the value of a variable

the xsl:value-of element is instantiated to create a text node in the result tree

the required select attribute is an expression

the expression is evaluated and the resulting object is converted to a string

Example

Assume: a person element with given-name and family-name attributes

create an HTML paragraph containing the value of the given-name attribute, a space, and the value of the family-name attribute (for the current node)

<xsl:template match="person">
  <p>
    <xsl:value-of select="@given-name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="@family-name"/>
  </p>
</xsl:template>

Examples

<xsl:value-of select="."/>
output the string-value of the current node

<xsl:value-of select="title"/>
output the string-value of the first child title element of the current node

<xsl:value-of select="sum(@*)"/>
output the sum of the values of the attributes of the current node, converted to a string

<xsl:value-of select="$x"/>
output the value of the variable $x, converted to a string

Attribute value templates

In an attribute value that is interpreted as an attribute value template, such as an attribute of a literal result element, an expression can be used by surrounding the expression with curly braces ({})

Example

<xsl:variable name="img-dir">/images</xsl:variable>

<xsl:template match="photograph">
  <img src="{$img-dir}/{href}" width="{size/@width}"/>
</xsl:template>

XML document:

<photograph>
  <href>headquarters.jpg</href>
  <size width="300"/>
</photograph>

result: <img src="/images/headquarters.jpg" width="300"/>

Numbering

The xsl:number element is used to insert a formatted number into the result tree

the number to be inserted may be specified by an expression

the value attribute contains an expression

the expression is evaluated and the resulting object is converted to a number

the number is rounded and converted to a string

if no value attribute is specified, a number based on the position of the current node in the source tree is inserted

Example: numbering a sorted list

<xsl:template match="items">
  <xsl:for-each select="item">
    <xsl:sort select="."/>
    <p>
      <xsl:number value="position()" format="1. "/>
      <xsl:value-of select="."/>
    </p>
  </xsl:for-each>
</xsl:template>

Numbering by position

The xsl:number element has the following attributes

level: specifies what levels of the source tree should be considered; has values single, multiple, or any

count: is a pattern that specifies what nodes should be counted at those levels

if not specified, it defaults to the pattern that matches any node with the same node type as the current node, and if the current node has a name, with the same name as the current node

from: is a pattern that specifies where counting starts

Example

Assume:

a document contains a sequence of chapters followed by a sequence of appendixes

both chapters and appendixes contain sections, which in turn contain subsections

the following rules would number title elements

numbering:

chapters: 1, 2, 3, …

appendixes: A, B, C, …

sections in chapters: 1.1, 1.2, 1.3, …

sections in appendixes: A.1, A.2, A.3, …

Example

<xsl:template match="title">
  <fo:block>
    <xsl:number level="multiple" count="chapter|section|subsection" format="1.1 "/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

Example

<xsl:template match="appendix//title" priority="1">
  <fo:block>
    <xsl:number level="multiple" count="appendix|section|subsection" format="A.1 "/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

Number to string conversion attributes

format: tokens with separators

the default value is 1

any token where the last character has a decimal digit value of 1:

1: 1 2 3 …

01: 01 02 03 … 09 10 11

A: A B C … Z AA AB AC …

a: a b c … z aa ab ac …

i: i ii iii iv v vi …

I: I II III IV V VI …

separators: e.g. A.1

if there are more numbers than format tokens, the last format token is used to format the remaining numbers

Number to string conversion attributes

grouping-separator: the grouping (e.g. thousands) separator in decimal numbering sequences

grouping-size: the size (normally 3) of the grouping

e.g. grouping-separator="," and grouping-size="3" give numbers of the form 1,000,000
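A short sketch of the grouping attributes, assuming a hypothetical population element whose content is a plain integer such as 1000000:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input: <population>1000000</population> -->
  <xsl:template match="population">
    <p>
      <!-- value="." converts the element's string-value to a number;
           the grouping attributes then insert a comma every three digits -->
      <xsl:number value="." grouping-separator="," grouping-size="3"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

With that input, the paragraph should contain the number in the grouped form described above.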

Repetition

When the result has a known regular structure, it is useful to be able to specify directly the template for selected nodes

the xsl:for-each instruction contains a template, which is instantiated for each node selected by the expression specified by the select attribute

the expression must evaluate to a node-set

the template is instantiated with the selected node as the current node, and with the list of all of the selected nodes as the current node list

Example: XML document

<customers>
  <customer>
    <name>…</name>
    <order>…</order>
    <order>…</order>
  </customer>
  <customer>
    <name>…</name>
    <order>…</order>
    <order>…</order>
  </customer>
</customers>

Create an HTML document containing a table with a row for each customer element

<xsl:template match="/">
  <html>
    <head><title>Customers</title></head>
    <body>
      <table>
        <tbody>
          <xsl:for-each select="customers/customer">
            <tr>
              <th><xsl:apply-templates select="name"/></th>
              <xsl:for-each select="order">
                <td><xsl:apply-templates/></td>
              </xsl:for-each>
            </tr>
          </xsl:for-each>
        </tbody>
      </table>
    </body>
  </html>
</xsl:template>

Conditional processing

Two instructions support conditional processing

xsl:if (if-then conditionality)

xsl:choose (choice from several alternatives)

xsl:if has a test attribute, which specifies an expression

example: a comma follows each name unless it is the last in the list

<xsl:template match="namelist/name">
  <xsl:apply-templates/>
  <xsl:if test="not(position()=last())">, </xsl:if>
</xsl:template>

Example

The following colors every other table row yellow:

<xsl:template match="item">
  <tr>
    <xsl:if test="position() mod 2 = 0">
      <xsl:attribute name="bgcolor">yellow</xsl:attribute>
    </xsl:if>
    <xsl:apply-templates/>
  </tr>
</xsl:template>

Conditional processing

the xsl:choose element selects one among a number of possible alternatives

it consists of a sequence of xsl:when elements followed by an optional xsl:otherwise element

each xsl:when element has a single attribute, test, which specifies an expression

each of the xsl:when elements is tested in turn

the content of the first, and only the first, xsl:when element whose test is true is instantiated

if no test is true, the content of xsl:otherwise is instantiated

Example

<xsl:for-each select="chapter">
  <xsl:choose>
    <xsl:when test="@focus='Java'">
      <li><xsl:value-of select="title"/> (Java Focus)</li>
    </xsl:when>
    <xsl:when test="@focus='JavaScript'">
      <li><xsl:value-of select="title"/> (JavaScript Focus)</li>
    </xsl:when>
    <xsl:otherwise>
      <li><xsl:value-of select="title"/> (XML Focus)</li>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each>

Sorting

Sorting is specified by adding xsl:sort elements as children of an xsl:apply-templates or xsl:for-each element

the first xsl:sort child specifies the primary sort key, the second xsl:sort child specifies the second sort key, and so on

nodes are sorted according to the sort keys, and then processed in sorted order

xsl:sort has a select attribute

the default is . (the string-value of the current node is used as the key)

Sorting

xsl:sort has optional attributes

order: ascending (default) or descending

lang: language of the sort keys

data-type: the data type of the strings

text: sort keys should be sorted lexicographically

number: sort keys should be converted to numbers and then sorted according to the numeric value

other values may be provided later (from XML Schemas)

case-order: upper-first or lower-first; the default is language dependent
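The optional attributes can be combined as in the following sketch. The products, product, name and price names are hypothetical and serve only to show a numeric, descending sort.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical input: <products><product price="9.5"><name>…</name></product>…</products> -->
  <xsl:template match="products">
    <ul>
      <xsl:for-each select="product">
        <!-- Without data-type="number" the prices would be compared as text,
             so that "10" would sort before "9" -->
        <xsl:sort select="@price" data-type="number" order="descending"/>
        <li><xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```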

Example

<employees>
  <employee>
    <name>
      <given>James</given>
      <family>Clark</family>
    </name>
    …
  </employee>
</employees>

Example: list of employees sorted by name

<xsl:template match="employees">
  <ul>
    <xsl:apply-templates select="employee">
      <xsl:sort select="name/family"/>
      <xsl:sort select="name/given"/>
    </xsl:apply-templates>
  </ul>
</xsl:template>

<xsl:template match="employee">
  <li>
    <xsl:value-of select="name/given"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/family"/>
  </li>
</xsl:template>

Variables and parameters

a variable is a name that may be bound to a value

the value of the variable can be an object of any of the types that can be returned by expressions

two elements: xsl:variable and xsl:param

xsl:param: the value specified on the xsl:param variable is only a default value for the binding

when the template or the stylesheet within which the xsl:param element occurs is invoked, parameters may be passed that are used instead of the defaults

Variables and parameters

Both xsl:variable and xsl:param have a required name attribute: name of the variable

for any use of xsl:variable and xsl:param, there is a region of the stylesheet tree within which the binding is visible

within this region, any binding of the variable that was visible on the variable-binding element itself is hidden, so only the innermost binding is visible

Values of variables and parameters

A variable-binding element can specify the value of the variable in three alternative ways

if the element has a select attribute:

the value of the select attribute must be an expression

the value of the variable is the object resulting from evaluation of the expression

the content of the variable-binding element has to be empty

if the element does not have a select attribute and the content is non-empty:

the content of the element specifies the value

the content is a template, which is instantiated to give the value

the value is a result tree fragment

otherwise: the value is an empty string
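The three alternatives can be put side by side in one template. This is only a sketch; the item elements and the variable names are assumptions made for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <!-- 1. select attribute: the value is the object returned by the expression (here a number) -->
    <xsl:variable name="count" select="count(//item)"/>

    <!-- 2. no select, non-empty content: the value is a result tree fragment -->
    <xsl:variable name="label"><b>Items</b></xsl:variable>

    <!-- 3. no select, empty content: the value is an empty string -->
    <xsl:variable name="empty"/>

    <p>
      <!-- copy-of keeps the markup of the fragment; value-of would flatten it to text -->
      <xsl:copy-of select="$label"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="$count"/>
      <xsl:value-of select="$empty"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```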

Top-level variables and parameters

Both xsl:variable and xsl:param are allowed as top-level elements

a top-level variable-binding element declares a global variable that is visible everywhere

a top-level xsl:param declares a parameter to the stylesheet

XSLT does not specify how the parameters are passed to the stylesheet

context for expressions for specifying the value: the root node
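A sketch of a stylesheet parameter next to a global variable follows. The parameter name lang and the book/title path are hypothetical; how a caller supplies a different value for lang depends on the processor being used.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Stylesheet parameter: 'en' is only the default value -->
  <xsl:param name="lang" select="'en'"/>

  <!-- Global variable: the expression is evaluated with the root node as context -->
  <xsl:variable name="doc-title" select="/book/title"/>

  <xsl:template match="/">
    <html lang="{$lang}">
      <head>
        <title><xsl:value-of select="$doc-title"/></title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```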

Variables within templates

Both xsl:variable and xsl:param are allowed in templates

xsl:variable is allowed anywhere that an instruction (xsl:…) is allowed

the binding is visible for all following siblings and their descendants

the binding is not visible for the xsl:variable element itself

xsl:param is allowed at the beginning of an xsl:template element

visibility as with xsl:variable

Passing parameters to templates

Parameters are passed to templates using the xsl:with-param element

the required name attribute specifies the name of the parameter

xsl:with-param is allowed within xsl:call-template and xsl:apply-templates

the value is specified as for xsl:variable and xsl:param

Example

<xsl:template name="numbered-block">
  <xsl:param name="format">1. </xsl:param>
  <fo:block>
    <xsl:number format="{$format}"/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template match="ol//ol/li">
  <xsl:call-template name="numbered-block">
    <xsl:with-param name="format">a. </xsl:with-param>
  </xsl:call-template>
</xsl:template>

Output

The xsl:output element allows stylesheet authors to specify how they wish the result tree to be output

xsl:output is a top-level element

the method attribute identifies the method that should be used for outputting the result tree

the value can be html, xml, text, or some other name (behavior not specified by XSLT)

Output

Default of the method attribute

the default is html, if

the root node of the result tree has an element child

the name of the first element child is html

any text nodes preceding the first element child contain whitespace characters only

otherwise the default is xml

XML output method

Outputs the result tree as a well-formed external parsed entity

if the root node of the result tree has a single element node child and no text node children, then the entity should be a well-formed XML document entity

attributes (among others):

version: the XML version (default 1.0)

encoding: the preferred character encoding

omit-xml-declaration (yes or no)

HTML output method

Outputs the result tree as HTML

attributes:

version: the version of HTML (default is 4.0)

un-prefixed elements are interpreted as HTML, others as XML

empty elements: <br></br> and <br/> -> <br>

HTML names should be recognized regardless of case

Text output method

Outputs the result tree by outputting the string-value of every text node in the result tree in document order without any escaping (= character references are expanded)
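As a sketch of how the output method is chosen explicitly, the stylesheet below requests XML output using the attributes listed above. The encoding value is an assumption made for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Top-level element: serialize as XML and keep the XML declaration -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="no"/>

  <xsl:template match="/">
    <result>
      <xsl:apply-templates/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Changing method to text would instead write only the string-values of the text nodes in the result tree, as described for the text output method.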

Combining stylesheets

Two mechanisms to combine stylesheets

inclusion: allows stylesheets to be combined without changing the semantics of the stylesheets being combined

import: allows stylesheets to override each other

Stylesheet inclusion

An XSLT stylesheet may include another XSLT stylesheet using an xsl:include element

the element has an href attribute whose value is a URI reference identifying the stylesheet to be included

xsl:include is a top-level element

the resource located by the href attribute is parsed as an XML document and the children of the xsl:stylesheet element in this document replace the xsl:include element in the including document

Stylesheet import

An XSLT stylesheet may import another XSLT stylesheet using an xsl:import element

importing a stylesheet is the same as including it, except that definitions and template rules in the importing stylesheet take precedence over template rules and definitions in the imported stylesheet

xsl:import is a top-level element

it has an href attribute

Stylesheet import

The xsl:import elements must precede all the other element children of an xsl:stylesheet element, including any xsl:include elements

when xsl:include is used to include a stylesheet, any xsl:import elements in the included document are moved up in the including document to after any existing xsl:import elements in the including document

The import tree

The xsl:stylesheet elements encountered during processing of a stylesheet that contains xsl:import elements are treated as forming an import tree

in the import tree, each xsl:stylesheet element has one import child for each xsl:import element that it contains

any xsl:include elements are resolved before constructing the import tree

import precedence is defined based on a post-order traversal (before = lower, after = higher)

The import tree

Assume:

stylesheet A imports stylesheets B and C in that order

stylesheet B imports stylesheet D

stylesheet C imports stylesheet E

the order of import precedence (lowest first): D, B, E, C, A

a definition or template rule with higher precedence takes precedence over a definition or template rule with lower import precedence
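Stylesheet A from this scenario might look like the sketch below. The file names B.xsl and C.xsl and the title rule are hypothetical; the point is only that the xsl:import elements come first and that rules in A outrank rules for the same nodes in the imported stylesheets.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Imports must precede all other top-level elements -->
  <xsl:import href="B.xsl"/>
  <xsl:import href="C.xsl"/>

  <!-- A has the highest import precedence, so this rule is preferred over
       any rule matching title in B, C, D or E -->
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

</xsl:stylesheet>
```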

Conflict resolution for template rules

It is possible for a source node to match more than one template rule

the template rule to be used is determined as follows

1. All matching template rules that have lower import precedence than the matching rules with the highest import precedence are eliminated from consideration

2. All matching template rules that have lower priority than the matching rules with the highest priority are eliminated from consideration

Conflict resolution for template rules: priority

The priority of a template rule is specified by the priority attribute on the template rule

if the pattern contains multiple alternatives separated by | , then it is treated equivalently to a set of template rules, one for each alternative

explicit priority is specified using a numeric value for the priority attribute

implicit priority is assumed based on pattern specificity (see the default priorities below)

The default priority is defined as follows:

-0.5 for patterns comprised of only a wildcard or node type

wildcards: "*" and "@*"

node types: node(), comment(), processing-instruction() or text()

-0.25 for patterns comprised of a namespace prefix and a wildcard

"prefix:*"

0 for patterns comprised of only a node's name

un-prefixed or child:: axis for elements

prefixed with "@" or attribute:: axis for an attribute

with or without a namespace prefix

0.5 for all other patterns

Example

The element figure specifies a figure reference

the element para specifies a paragraph of content

the element margin specifies a marginalia construct

Example

The rendering of figures differs based on context

when found outside a paragraph

when found inside a paragraph

when found inside a paragraph that is marginalia

without priority the following rules would conflict when processing a figure in a paragraph in marginalia

<xsl:template match="margin/para/figure" priority="2">
<xsl:template match="para/figure">  <!-- priority="0.5" -->
<xsl:template match="figure">       <!-- priority="0" -->
<xsl:template match="*">            <!-- priority="-0.5" -->

Overriding template rules

A template rule that is being used to override a template rule in an imported stylesheet can use the xsl:apply-imports element to invoke the overridden template rule

Overriding template rules

In doc.xsl:

<xsl:template match="example">
  <pre><xsl:apply-templates/></pre>
</xsl:template>

In the importing stylesheet:

<xsl:import href="doc.xsl"/>

<xsl:template match="example">
  <div style="border: solid red">
    <xsl:apply-imports/>
  </div>
</xsl:template>

effect: <div style="border: solid red"><pre>…</pre></div>

Modes

Modes allow an element to be processed multiple times, each time producing a different result

both xsl:template and xsl:apply-templates have an optional mode attribute

if xsl:template does not have a match attribute, it must not have a mode attribute

if an xsl:apply-templates element has a mode attribute, then it applies only to those template rules from xsl:template elements that have a mode attribute with the same value

"no mode" applies to "no mode": an xsl:apply-templates element without a mode attribute applies only to template rules without a mode attribute

Example

<xsl:template match="heading-1" mode="table-of-contents">…</xsl:template>

<xsl:apply-templates select="heading-1" mode="table-of-contents"/>
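The fragments above can be placed in a complete stylesheet to show the same heading-1 elements being processed twice. The doc root element and the HTML output are assumptions made for this sketch.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical root element "doc" with heading-1 children -->
  <xsl:template match="/doc">
    <html>
      <body>
        <!-- First pass: headings rendered as table-of-contents entries -->
        <ul>
          <xsl:apply-templates select="heading-1" mode="table-of-contents"/>
        </ul>
        <!-- Second pass: no mode, so only rules without a mode attribute apply -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="heading-1" mode="table-of-contents">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <xsl:template match="heading-1">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

</xsl:stylesheet>
```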