An Introduction to XML and Web Technologies
Summary
Hans Fredrik Nordhaug
Based on slides by Anders Møller & Michael I. Schwartzbach
2006 Addison-Wesley
Summary of An Introduction to XML and Web Technologies, updated November 2012.
Things you need to know
▪ The XML (meta) language
  • HTML (SGML) vs XML
▪ XPath for navigating XML trees
▪ Schema languages (to describe a markup language)
  • DTD, XML Schema, ...
▪ XSLT for transforming XML documents
▪ XQuery for querying XML documents
▪ XML programming – DOM and SAX
▪ Web Services
  • SOAP, WSDL, UDDI
HTML and Web Pages
Markup Languages
▪ Notation for adding formal structure to text
▪ Charles Goldfarb, the INLINE system (1970)
▪ Standard Generalized Markup Language, SGML (1986)
▪ DTD, element, attribute, tag, entity:
<!DOCTYPE greeting [
<!ELEMENT greeting (#PCDATA)>
<!ATTLIST greeting style (big|small) "small">
<!ENTITY hi "Hello">
]>
<greeting style="big"> &hi; world! </greeting>
The Origins of the WWW
▪ WWW was invented by Tim Berners-Lee at CERN (1989)
▪ Hypertext across the Internet (replacing FTP)
▪ Three constituents: HTML + URL + HTTP
▪ HTML is an SGML language for hypertext
▪ URL is a notation for locating files on servers
▪ HTTP is a high-level protocol for file transfers
The Design of HTML
▪ Simple, purist design principles
▪ HTML describes the logical structure of a document
▪ Browsers are free to interpret tags differently
▪ HTML has a formal syntax specification with 800 lines of DTD notation
▪ HTML is a lightweight file format
▪ Size of a file containing just "Hello World!":
  PostScript  11,274 bytes
  PDF          4,915 bytes
  MS Word     19,456 bytes
  HTML            28 bytes
Problems with Invalid HTML
▪ Most HTML documents on the Web are invalid
▪ There are several different browsers
▪ Each browser has many different implementations
▪ Each implementation must interpret invalid HTML
▪ There are many arbitrary choices to make
▪ The HTML standard has been undermined
▪ HTML renders differently for most clients
Bytes vs. Characters
▪ HTML files are represented as text files
▪ A text file is logically a sequence of characters
▪ But physically a sequence of bytes
▪ Several mappings exist:
  • ASCII
  • ISO-8859-1
  • UTF-8 (Unicode)
▪ Unicode aims to cover all characters in all past or present written languages
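A small Python sketch of the character-to-byte mappings listed above. The sample string is chosen only for illustration; the encoding names are the standard Python codec names.

```python
# The same characters map to different byte sequences under different encodings.
text = "Møller"  # six characters, one of them outside ASCII

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # the character 'ø' takes two bytes in UTF-8

print(len(text), len(latin1_bytes), len(utf8_bytes))

# ASCII has no code for 'ø', so this mapping fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding bytes with the wrong mapping silently yields different characters.
misread = utf8_bytes.decode("iso-8859-1")
```

This is why an XML or HTML file should declare its encoding: the bytes alone do not say which mapping was used.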
World Wide Web Consortium (W3C)
▪ Develops HTML, CSS, and most Web technology
▪ Founded in 1994
▪ Has 380 companies and organizations as members
▪ Is directed by Tim Berners-Lee
▪ Located at MIT (US), Inria (France), Keio (Japan)
XML Documents
What is XML?
▪ XML: Extensible Markup Language
▪ A framework for defining markup languages
▪ Each language is targeted at its own application domain with its own markup tags
▪ There is a common set of generic tools for processing XML documents
▪ XHTML: an XML variant of HTML
▪ Inherently internationalized and platform independent (Unicode)
▪ Developed by W3C, standardized in 1998
XML Trees
▪ Conceptually, an XML document is a tree structure
  • node, edge
  • root, leaf
  • child, parent
  • sibling (ordered), ancestor, descendant
Nodes in XML Trees
▪ Text nodes: carry the actual contents, leaf nodes
▪ Element nodes: define hierarchical logical groupings of contents, each has a name
▪ Attribute nodes: unordered, each associated with an element node, has a name and a value
▪ Comment nodes: ignorable meta-information
▪ Processing instructions: instructions to specific processors, each has a target and a value
▪ Root nodes: every XML tree has one root node that represents the entire tree
Textual Representation
▪ Text nodes: written as the text they carry
▪ Element nodes: start-end tags
  • <bla ...> ... </bla>
  • short-hand notation for empty elements: <bla/>
▪ Attribute nodes: name="value" in start tags
▪ Comment nodes: <!-- bla -->
▪ Processing instructions: <?target value?>
▪ Root nodes: implicit
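The node kinds and their textual forms can be inspected with Python's standard xml.dom.minidom module. This is a minimal sketch; the sample document is made up from the notation on the slide.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# One element containing a text node, a comment, a processing instruction,
# and an empty element, plus one attribute on the start tag.
doc = parseString('<bla name="value">text<!-- bla --><?target value?><empty/></bla>')
root = doc.documentElement

kind_names = {
    Node.TEXT_NODE: "text",
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}
kinds = [kind_names[child.nodeType] for child in root.childNodes]
print(kinds)

# The root node is implicit in the text but explicit in the tree.
is_root_node = doc.nodeType == Node.DOCUMENT_NODE
attribute_value = root.getAttribute("name")
```

Note that the attribute does not appear among the children of the element; it is reached through the element it belongs to.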
More Constructs
▪ XML declaration
▪ Character references
▪ CDATA sections
▪ Document type declarations and entity references: explained later...
▪ Whitespace?
Well-formedness
▪ Every XML document must be well-formed
  • start and end tags must match and nest properly
    • <x><y></y></x> (well-formed)
    • </z><x><y></x></y> (not well-formed)
  • exactly one root element
  • case-sensitive
  • …
▪ in other words, it defines a proper tree structure
▪ XML parser: given the textual XML document, constructs its tree representation
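A minimal sketch of a well-formedness check, using the parser in Python's standard xml.etree.ElementTree module. The helper name is_well_formed is hypothetical; the first two test strings are the ones from the slide.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser can build a tree from the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<x><y></y></x>"))       # tags match and nest properly
print(is_well_formed("</z><x><y></x></y>"))   # stray end tag, bad nesting
print(is_well_formed("<a/><b/>"))             # more than one root element
print(is_well_formed("<X></x>"))              # names are case-sensitive
```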
Applications
Rough classification:
▪ Data-oriented languages
▪ Document-oriented languages
▪ Protocols and programming languages
▪ Hybrids
XML Namespaces
▪ When combining languages, element names may become ambiguous!
▪ Common problems call for common solutions
<widget type="gadget">
<head size="medium"/>
<big><subwidget ref="gizmo"/></big>
<info>
<head>
<title>Description of gadget</title>
</head>
<body>
<h1>Gadget</h1>
A gadget contains a big gizmo
</body>
</info>
</widget>
The Idea
▪ Assign a URI to every (sub-)language
  e.g. http://www.w3.org/1999/xhtml for XHTML 1.0
▪ Qualify element names with URIs:
  {http://www.w3.org/1999/xhtml}head
The Actual Solution
▪ Namespace declarations bind URIs to prefixes
<... xmlns:foo="http://www.w3.org/TR/xhtml1">
  ...
  <foo:head>...</foo:head>
  ...
</...>
▪ Lexical scope
▪ Default namespace (no prefix) declared with xmlns="..."
▪ Attribute names can also be prefixed
Widgets with Namespaces
Namespace map: for each element, maps prefixes to URIs
<widget type="gadget" xmlns="http://www.widget.inc">
<head size="medium"/>
<big><subwidget ref="gizmo"/></big>
<info xmlns:xhtml="http://www.w3.org/TR/xhtml1">
<xhtml:head>
<xhtml:title>Description of gadget</xhtml:title>
</xhtml:head>
<xhtml:body>
<xhtml:h1>Gadget</xhtml:h1>
A gadget contains a big gizmo
</xhtml:body>
</info>
</widget>
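Python's standard xml.etree.ElementTree parser resolves prefixes and reports every element name in the {URI}local form shown under "The Idea". The sketch below parses a shortened copy of the widget document and shows that the two head elements are no longer ambiguous.

```python
import xml.etree.ElementTree as ET

widget_xml = """
<widget type="gadget" xmlns="http://www.widget.inc">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info xmlns:xhtml="http://www.w3.org/TR/xhtml1">
    <xhtml:head>
      <xhtml:title>Description of gadget</xhtml:title>
    </xhtml:head>
  </info>
</widget>
"""

root = ET.fromstring(widget_xml)

# Collect the qualified names of all elements whose local name is "head".
qualified_heads = [el.tag for el in root.iter() if el.tag.endswith("}head")]
for name in qualified_heads:
    print(name)

# The default namespace applies to elements, not to unprefixed attributes.
print(root.attrib)
```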
Summary
▪ XML: a notation for hierarchically structured text
▪ Conceptual tree model vs. concrete textual representation
▪ Well-formedness
▪ Namespaces
The XPath Language
XPath topics
▪ Location steps and paths
▪ Typical location paths
▪ Abbreviations
▪ General expressions
XPath Expressions
▪ Flexible notation for navigating around trees
▪ A basic technology that is widely used
  • uniqueness and scope in XML Schema
  • pattern matching and selection in XSLT
  • relations in XLink and XPointer
  • computations on values in XSLT and XQuery
Location Paths
▪ A location path evaluates to a sequence of nodes
▪ The sequence is sorted in document order
▪ The sequence will never contain duplicates
Location Steps
▪ The location path is a sequence of steps
▪ A location step consists of
  • an axis
  • a nodetest
  • some predicates
axis :: nodetest [Exp1] [Exp2] …
Evaluating a Location Path
▪ A step maps a context node into a sequence
▪ This also maps sequences to sequences
  • each node is used as context node
  • and is replaced with the result of applying the step
▪ The path then applies each step in turn
Contexts
▪ The context of an XPath evaluation consists of
  • a context node (a node in an XML tree)
  • a context position and size (two nonnegative integers)
  • a set of variable bindings
  • a function library
  • a set of namespace declarations
▪ The application determines the initial context
▪ If the path starts with '/' then
  • the initial context node is the root
  • the initial position and size are 1
Axes
▪ An axis is a sequence of nodes
▪ The axis is evaluated relative to the context node
▪ XPath supports 12 different axes
  • child
  • descendant
  • parent
  • ancestor
  • following-sibling
  • preceding-sibling
  • attribute
  • following
  • preceding
  • self
  • descendant-or-self
  • ancestor-or-self
Axis Directions
▪ Each axis has a direction
▪ Forwards means document order:
  • child, descendant, following-sibling, following, self, descendant-or-self
▪ Backwards means reverse document order:
  • parent, ancestor, ancestor-or-self, preceding-sibling, preceding
▪ Stable but depends on the implementation:
  • attribute
Node Tests
▪ text()
▪ comment()
▪ processing-instruction()
▪ node()
▪ *
▪ QName
▪ *:NCName
▪ NCName:*
Predicates
▪ General XPath expressions
▪ Evaluated with the current node as context
▪ Result is coerced into a boolean
  • a number yields true if it equals the context position
  • a string yields true if it is not empty
  • a sequence yields true if it is not empty
Abbreviations
/child::rcp:collection/child::rcp:recipe/child::rcp:ingredient
  abbreviates to: /rcp:collection/rcp:recipe/rcp:ingredient
/child::rcp:collection/child::rcp:recipe/child::rcp:ingredient/attribute::amount
  abbreviates to: /rcp:collection/rcp:recipe/rcp:ingredient/@amount
Full form                        Abbreviation
/descendant-or-self::node()/     //
self::node()                     .
parent::node()                   ..
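Abbreviated location paths can be tried out with Python's standard xml.etree.ElementTree module, which supports only a small subset of XPath (child steps, //, ., .., and simple attribute predicates). In this sketch the namespace URI and the recipe data are hypothetical, and the final attribute step is done with get() because this module does not support a trailing @amount step.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and data, for illustration only.
ns = {"rcp": "http://example.org/recipes"}
collection = ET.fromstring("""
<rcp:collection xmlns:rcp="http://example.org/recipes">
  <rcp:recipe>
    <rcp:ingredient name="flour" amount="3" unit="cup"/>
    <rcp:ingredient name="salt" amount="1" unit="teaspoon"/>
  </rcp:recipe>
  <rcp:recipe>
    <rcp:ingredient name="water" amount="2" unit="cup"/>
  </rcp:recipe>
</rcp:collection>
""")

# The context node is the rcp:collection element, so the path is relative to it.
ingredients = collection.findall("rcp:recipe/rcp:ingredient", ns)
amounts = [ingredient.get("amount") for ingredient in ingredients]
print(amounts)

# The // abbreviation combined with a predicate on an attribute.
cups = collection.findall(".//rcp:ingredient[@unit='cup']", ns)
cup_names = [ingredient.get("name") for ingredient in cups]
print(cup_names)
```

The results come back in document order, matching the rule stated for location paths.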
General Expressions
▪ Every expression evaluates to a sequence of
  • atomic values
  • nodes
▪ Atomic values may be
  • numbers
  • booleans
  • Unicode strings
  • datatypes defined in XML Schema
▪ Nodes have identity
Atomization
▪ A sequence may be atomized
▪ This results in a sequence of atomic values
▪ For element nodes this is the concatenation of all descendant text nodes
▪ For other nodes this is the obvious string
Filter Expressions
▪ Predicates generalized to arbitrary sequences
▪ The expression '.' is the context item
▪ The expression:
  (10 to 40)[. mod 5 = 0 and position()>20]
  has the result:
  30, 35, 40
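The filter expression above can be mirrored in plain Python to see why only three values survive. This is an analogy, not an XPath evaluator: the context item becomes the loop value and position() becomes a 1-based index.

```python
# (10 to 40) is the inclusive integer sequence 10, 11, ..., 40.
sequence = range(10, 41)

# position() counts from 1, so the value 30 sits at position 21.
result = [
    item
    for position, item in enumerate(sequence, start=1)
    if item % 5 == 0 and position > 20
]
print(result)
```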
Comparisons
▪ Value comparison (used on atomic values):
  • Operators: eq, ne, lt, le, gt, ge
▪ General comparison (used on general values):
  • Operators: =, !=, <, <=, >, >=
▪ Node comparison
  • Operators: is, <<, >>
  • Used to compare nodes on identity and order
▪ XPath violates most algebraic axioms
Functions
▪ XPath has an extensive function library
▪ Default namespace for functions: http://www.w3.org/2005/xpath-functions
▪ 106 functions are required
  • Arithmetic and aggregate
  • Boolean
  • String and regexp
  • Cardinality and sequence
  • Node
  • Coercion
For and Conditional Expressions
fn:avg(
for $r in //rcp:ingredient return
if ( $r/@unit = "cup" )
then xs:double($r/@amount) * 237
else if ( $r/@unit = "teaspoon" )
then xs:double($r/@amount) * 5
else if ( $r/@unit = "tablespoon" )
then xs:double($r/@amount) * 15
else ()
)
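The same computation written in Python, as a rough analogy to the for and conditional expressions above. The helper name average_ml and the sample ingredients are hypothetical; units without a conversion contribute nothing, like the empty sequence () in the XPath version.

```python
# Millilitres per unit, taken from the constants in the XPath expression.
ML_PER_UNIT = {"cup": 237, "teaspoon": 5, "tablespoon": 15}

def average_ml(ingredients):
    """Average volume in millilitres over the ingredients with a known unit."""
    volumes = [
        amount * ML_PER_UNIT[unit]
        for amount, unit in ingredients
        if unit in ML_PER_UNIT
    ]
    # fn:avg of an empty sequence is the empty sequence; None stands in for it here.
    return sum(volumes) / len(volumes) if volumes else None

sample = [(1.0, "cup"), (3.0, "teaspoon"), (2.0, "tablespoon"), (1.0, "pinch")]
print(average_ml(sample))
```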
Schema Languages
Topics
▪ The purpose of using schemas
▪ The schema languages DTD and XML Schema (and DSD2 and RELAX NG)
▪ Regular expressions – a commonly used formalism in schema languages
XML Languages
▪ XML language: a set of XML documents with some semantics
▪ schema: a formal definition of the syntax of an XML language
▪ schema language: a notation for writing schemas
Validation
[Figure: an instance document and a schema are given to a schema processor. If the document is valid, the result is a normalized instance document; if it is invalid, the result is an error message.]
Why use Schemas?
▪ Formal but human-readable descriptions
▪ Data validation can be performed with existing schema processors
▪ General requirements
  • Expressiveness
  • Efficiency
  • Comprehensibility
Regular Expressions
▪ Commonly used in schema languages to describe sequences of characters or elements
▪ Σ: an alphabet (typically Unicode characters or element names)
▪ σ∈Σ matches the string σ
▪ α? matches zero or one α
▪ α* matches zero or more α's
▪ α+ matches one or more α's
▪ α β matches any concatenation of an α and a β
▪ α | β matches the union of α and β
DTD – Document Type Definition
▪ Defined as a subset of the DTD formalism from SGML
▪ Specified as an integral part of XML 1.0
▪ A starting point for development of more expressive schema languages
▪ Considers elements, attributes, and character data – processing instructions and comments are mostly ignored
Document Type Declarations
▪ Associates a DTD schema with the instance document
▪ <?xml version="1.1"?>
  <!DOCTYPE collection SYSTEM "http://www.brics.dk/ixwt/recipes.dtd">
  <collection>
  ...
  </collection>
▪ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
▪ <!DOCTYPE collection [ ... ]>
Element Declarations
<!ELEMENT element-name content-model >
Content models:
▪ EMPTY
▪ ANY
▪ mixed content: (#PCDATA|e1|e2|...|en)*
▪ element content: regular expression over element names (concatenation is written with ",")
Example:
<!ELEMENT table
  (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+)) >
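To see that an element content model really is a regular expression over element names, the table model above can be translated by hand into a Python re pattern. This is only a sketch of the idea, not how a DTD processor is implemented; the encoding (each child name followed by one space) and the helper name matches_table_model are assumptions made for the example.

```python
import re

# Each child element name is followed by one space, so "col " cannot be
# confused with the start of "colgroup ". DTD "," becomes plain concatenation.
TABLE_MODEL = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)

def matches_table_model(child_names):
    """Check a sequence of child element names against the content model."""
    encoded = "".join(name + " " for name in child_names)
    return TABLE_MODEL.fullmatch(encoded) is not None

print(matches_table_model(["caption", "col", "col", "tbody"]))
print(matches_table_model(["colgroup", "tr", "tr"]))
print(matches_table_model(["tbody", "caption"]))  # caption must come first
print(matches_table_model([]))                    # at least one tbody or tr
```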
Attribute-List Declarations
<!ATTLIST element-name attribute-definitions >
Each attribute definition consists of
▪ an attribute name
▪ an attribute type
▪ a default declaration
Example:
<!ATTLIST input maxlength CDATA #IMPLIED
                tabindex CDATA #IMPLIED>
Attribute Types
▪ CDATA: any value
▪ enumeration: (s1|s2|...|sn)
▪ ID: must have unique value
▪ IDREF (/ IDREFS): must match some ID attribute(s)
▪ ...
Examples:
<!ATTLIST p align (left|center|right|justify) #IMPLIED>
<!ATTLIST recipe id ID #IMPLIED>
<!ATTLIST related ref IDREF #IMPLIED>
Attribute Default Declarations
▪ #REQUIRED
▪ #IMPLIED (= optional)
▪ "value" (= optional, but default provided)
▪ #FIXED "value" (= optional, but must have this value if present)
Examples:
<!ATTLIST form
action CDATA #REQUIRED
onsubmit CDATA #IMPLIED
method (get|post) "get"
enctype CDATA "application/x-www-form-urlencoded" >
<!ATTLIST html
xmlns CDATA #FIXED "http://www.w3.org/1999/xhtml">
Entity Declarations
▪ Internal entity declarations – a simple macro mechanism
▪ Internal parameter entity declarations – apply to the DTD, not the instance document
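The macro behaviour of an internal entity can be observed with Python's standard xml.etree.ElementTree parser, which expands entities declared in the internal DTD subset but does not validate. The sketch reuses the greeting document from the Markup Languages slide.

```python
import xml.etree.ElementTree as ET

greeting = ET.fromstring("""<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ATTLIST greeting style (big|small) "small">
  <!ENTITY hi "Hello">
]>
<greeting style="big"> &hi; world! </greeting>""")

# The reference &hi; has been replaced by its declared text.
text = greeting.text
print(repr(text))
print(greeting.get("style"))
```

The tree that the application sees contains only the expanded text; the entity reference itself is gone.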
Checking Validity with DTD
A DTD processor (also called a validating XML parser)
▪ parses the input document (includes checking well-formedness)
▪ checks the root element name
▪ for each element, checks its contents and attributes
▪ checks uniqueness and referential constraints (ID/IDREF(S) attributes)
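The last step, checking uniqueness and referential constraints, can be sketched in a few lines of Python. This is a simplified illustration, not a DTD processor: it assumes that every attribute named id has type ID and every attribute named ref has type IDREF (as in the recipe and related declarations earlier), whereas a real validator reads the attribute types from the DTD. The function name check_ids and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="id", ref_attr="ref"):
    """Return error messages for duplicate IDs and dangling references."""
    errors = []
    seen = set()
    # First pass: every ID value must be unique in the document.
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                errors.append(f"duplicate ID: {value}")
            seen.add(value)
    # Second pass: every IDREF value must match some ID.
    for element in root.iter():
        ref = element.get(ref_attr)
        if ref is not None and ref not in seen:
            errors.append(f"dangling IDREF: {ref}")
    return errors

good = ET.fromstring(
    '<collection><recipe id="r1"/><recipe id="r2"><related ref="r1"/></recipe></collection>'
)
bad = ET.fromstring(
    '<collection><recipe id="r1"/><recipe id="r1"><related ref="r9"/></recipe></collection>'
)
print(check_ids(good))
print(check_ids(bad))
```

Two passes are used because a reference may point forward to an ID that appears later in the document.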
Limitations of DTD
1. Cannot constrain character data
2. Specification of attribute values is too limited
3. Element and attribute declarations are context insensitive
4. Character data cannot be combined with the regular expression content model
5. The content models lack an "interleaving" operator
6. The support for modularity, reuse, and evolution is too primitive
7. The normalization features lack content defaults and proper whitespace control
8. Structured embedded self-documentation is not possible
9. The ID/IDREF mechanism is too simple
10. It does not itself use an XML syntax
11. No support for namespaces
Requirements for XML Schema
- W3C's proposal for replacing DTD
Design principles:
▪ More expressive than DTD
▪ Use XML notation
▪ Self-describing
▪ Simplicity
Technical requirements:
▪ Namespace support
▪ User-defined datatypes
▪ Inheritance (OO-like)
▪ Evolution
▪ Embedded documentation
▪ ...
Types and Declarations
▪ Simple type definition: defines a family of Unicode text strings
▪ Complex type definition: defines a content and attribute model
▪ Element declaration: associates an element name with a simple or complex type
▪ Attribute declaration: associates an attribute name with a simple type
Connecting Schemas and Instances
<b:card xmlns:b="http://businesscard.org"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://businesscard.org
                            business_card.xsd">
<b:name>John Doe</b:name>
<b:title>CEO, Widget Inc.</b:title>
<b:email>[email protected]</b:email>
<b:phone>(202) 555-1414</b:phone>
<b:logo b:uri="widget.gif"/>
</b:card>
Element and Attribute Declarations
Examples:
  • <element name="serialnumber" type="nonNegativeInteger"/>
  • <attribute name="alcohol" type="r:percentage"/>
Derivation of Simple Types
▪ Restriction
▪ Union
▪ List (not much used)
▪ Several built-in derived simple types exist.
Complex Types with Complex Contents
▪ Content models as regular expressions:
  • Element reference: <element ref="name"/>
  • Concatenation: <sequence> ... </sequence>
  • Union: <choice> ... </choice>
  • All: <all> ... </all>
  • Element wildcard: <any namespace="..." processContents="..."/>
▪ Attribute reference: <attribute ref="..."/>
▪ Attribute wildcard: <anyAttribute namespace="..." processContents="..."/>
Cardinalities: minOccurs, maxOccurs, use
Mixed content: mixed="true"
Derived Complex Types with
▪ Simple Content
▪ Complex Content
  • Extension
  • Restriction
Global vs. Local Descriptions
Global (toplevel) style:
<element name="card"
         type="b:card_type"/>
<element name="name"
         type="string"/>
<complexType name="card_type">
  <sequence>
    <element ref="b:name"/>
    ...
  </sequence>
</complexType>
Local (inlined) style:
<element name="card">
  <complexType>
    <sequence>
      <element name="name"
               type="string"/>
      ...
    </sequence>
  </complexType>
</element>
Global vs. Local Descriptions
▪ Local type definitions are anonymous
▪ Local element/attribute declarations can be overloaded – a simple form of context sensitivity
▪ Only globally declared elements can be starting points for validation (e.g. roots)
▪ Local definitions permit an alternative namespace semantics – use elementFormDefault="qualified" to force namespace qualification of locally declared elements
Uniqueness, Keys, References
<element name="w:widget" xmlns:w="http://www.widget.org">
  <complexType>
    ...
  </complexType>
  <key name="my_widget_key">
    <selector xpath="w:components/w:part"/>
    <field xpath="@manufacturer"/>
    <field xpath="w:info/@productid"/>
  </key>
  <keyref name="annotation_references" refer="w:my_widget_key">
    <selector xpath=".//w:annotation"/>
    <field xpath="@manu"/>
    <field xpath="@prod"/>
  </keyref>
</element>
▪ key: in every widget, each part must have unique (manufacturer, productid)
▪ keyref: in every widget, for each annotation, (manu, prod) must match a my_widget_key
▪ unique: as key, but fields may be absent
▪ only a "downward" subset of XPath is used
Limitations of XML Schema
1. The details are extremely complicated (and the spec is unreadable)
2. Declarations are (mostly) context insensitive
3. It is impossible to write an XML Schema description of XML Schema
4. With mixed content, character data cannot be constrained
5. Unqualified local elements are bad practice
6. Cannot require specific root element
7. Element defaults cannot contain markup
8. The type system is overly complicated
9. xsi:type is problematic
10. Simple type definitions are inflexible
Strengths of XML Schema
▪ Namespace support
▪ Data types (built-in and derivation)
▪ Modularization
▪ Type derivation mechanism
Other Schema Languages
▪ RELAX NG
  • OASIS + ISO competitor to XML Schema
  • Designed for simplicity and expressiveness, solid mathematical foundation
  • Pattern based
  • More powerful mixed content (interleave)
  • Has alternative compact non-XML syntax
▪ DSD2
  • University of Århus and AT&T Labs Research
  • Rule based (declare and require)
  • Supports context (in require section)
Transforming XML Documents with XSLT
Topics
▪ How XML documents may be rendered in browsers
▪ How the XSLT language transforms XML documents
▪ How XPath is used in XSLT
XSLT Stylesheets
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  ...
</xsl:stylesheet>
▪ An XSLT stylesheet contains template rules
▪ The processor finds the most specific rule for the document root
▪ It then executes the template body
Use of XPath in XSLT
▪ Specifying patterns for template rules
▪ Selecting nodes for processing
▪ Computing boolean conditions
▪ Generating text contents for the output document
The XSLT Context
▪ A context item (a node in the source tree or an atomic value)
▪ A context position and size
▪ A set of variable bindings (mapping variable names to values)
▪ A function library (including those from XPath)
▪ A set of namespace declarations
The Initial Context
▪ The context item is the document root
▪ The context position and size both have value 1
▪ The set of variable bindings contains only global parameters
▪ The function library is the default one
▪ The namespace declarations are those defined in the root element of the stylesheet
Patterns and Matching
▪ A pattern is a restricted XPath expression
  • it is a union of path expressions
  • each path expression contains a number of steps separated by / or //
  • each step may only use the child or attribute axis
▪ A pattern matches a node if
  • starting from some node in the tree:
  • the given node is contained in the resulting sequence
rcp:recipe/rcp:ingredient//rcp:preparation
Literal Constructors
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:template match="/">
<html>
<head>
<title>Hello World</title>
</head>
<body bgcolor="green">
<b>Hello World</b>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Explicit Constructors
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">
  <xsl:template match="/">
    <xsl:element name="html">
      <xsl:element name="head">
        <xsl:element name="title">
          Hello World
        </xsl:element>
      </xsl:element>
      <xsl:element name="body">
        <xsl:attribute name="bgcolor" select="'green'"/>
        <xsl:element name="b">
          Hello World
        </xsl:element>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Recursive Application
▪ The apply-templates element
  • finds some nodes using the select attribute
  • applies the entire stylesheet to those nodes
  • concatenates the resulting sequences
▪ The default select value is child::node()
Programming
▪ Repetitions
  • <xsl:for-each select="..."> ... </xsl:for-each>
▪ Conditionals
  • <xsl:if test="...">...</xsl:if>
  • <xsl:choose><xsl:when test="...">...</xsl:when></xsl:choose>
▪ Template invocation using
  • <xsl:call-template name="...">
▪ Variables, parameters and functions
Built-In Template Rules
▪ What happens if no template matches a node?
▪ XSLT applies a default template rule
  • text is copied to the output
  • nodes apply the stylesheet recursively to the children
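The interplay between template rules and the built-in defaults can be imitated in Python. This is a toy model, not an XSLT processor: rules are looked up by element name only (real XSLT matches patterns and picks the most specific rule), output is a plain string, and the names apply_templates, apply_children, and the rules dictionary are all hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def apply_children(node, rules):
    """Apply the rule set to every child and concatenate the results."""
    return "".join(apply_templates(child, rules) for child in node.childNodes)

def apply_templates(node, rules):
    """Use a matching rule if there is one, otherwise a built-in default."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data                      # default: text is copied
    if node.nodeType in (Node.ELEMENT_NODE, Node.DOCUMENT_NODE):
        rule = rules.get(node.nodeName)
        if rule is not None:
            return rule(node, rules)          # an explicit template body
        return apply_children(node, rules)    # default: recurse into children
    return ""                                 # comments and the like produce nothing

doc = parseString("<a>x<b>y</b>z</a>")

# With no rules at all, only the text of the document survives.
plain = apply_templates(doc, {})

# One rule for <b> that wraps the result of processing its children.
rules = {"b": lambda node, rules: "[" + apply_children(node, rules) + "]"}
wrapped = apply_templates(doc, rules)
print(plain, wrapped)
```

This also shows why a stylesheet with no template rules still produces output: the defaults walk the whole tree and copy every text node.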
XSL-FO
▪ XSLT was originally designed to target XSL-FO
▪ XSL-FO (Formatting Objects) is an XML language for describing physical layout of texts
▪ Widely used in the graphics industry
▪ Not supported by any browsers yet
XSL-FO for Business Cards
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:b="http://businesscard.org"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:template match="b:card">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simple"
                               page-height="5.5cm"
                               page-width="8.6cm"
                               margin-top="0.4cm"
                               margin-bottom="0.4cm"
                               margin-left="0.4cm"
                               margin-right="0.4cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simple">
        <fo:flow flow-name="xsl-region-body">
          <fo:table>
            <fo:table-column column-width="5cm"/>
            <fo:table-column column-width="0.3cm"/>
            <fo:table-column column-width="2.5cm"/>
            <fo:table-body>
              <fo:table-row>
                <fo:table-cell>
                  <fo:block font-size="18pt"
                            font-family="sans-serif"
                            line-height="20pt"
                            background-color="#A0D0FF"
                            padding-top="3pt">
                    <xsl:value-of select="b:name"/>
                  </fo:block>
                </fo:table-cell>
                <fo:table-cell/>
                <fo:table-cell>
                  <xsl:if test="b:logo">
                    <fo:block>
                      <fo:external-graphic src="url({b:logo/@uri})"
                                           content-width="2.5cm"/>
                    </fo:block>
                  </xsl:if>
                </fo:table-cell>
              </fo:table-row>
            </fo:table-body>
          </fo:table>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
Querying XML Documents with XQuery
Topics
▪ How XML generalizes relational databases
▪ The XQuery language
▪ How XML may be supported in databases
XQuery 1.0
▪ XML documents naturally generalize database relations
▪ XQuery is the corresponding generalization of SQL
From Relations to Trees
Only Some Trees are Relations
• They have height two
• The root has an unbounded number of children
• All nodes in the second layer (records) have a fixed number of child nodes (fields)
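The three conditions can be checked mechanically. The following is a minimal sketch in Python, assuming (for illustration only) that a table is encoded as a root element whose children are records and whose grandchildren are fields. The function name `is_relation` and both sample documents are hypothetical and do not come from the slides.

```python
import xml.etree.ElementTree as ET

def is_relation(root):
    """Return True if the element tree has the shape of a relational table."""
    records = list(root)          # the root may have any number of children
    if not records:
        return True               # an empty table is still a relation
    # Every record must carry the same sequence of field names...
    field_names = [field.tag for field in records[0]]
    for record in records:
        if [field.tag for field in record] != field_names:
            return False
        # ...and fields must be leaves, so the tree has height two.
        if any(len(field) > 0 for field in record):
            return False
    return True

table = ET.fromstring(
    "<students>"
    "<student><id>1</id><name>Ann</name></student>"
    "<student><id>2</id><name>Bob</name></student>"
    "</students>"
)
nested = ET.fromstring(
    "<students>"
    "<student><id>1</id><name><first>Ann</first></name></student>"
    "</students>"
)
print(is_relation(table))   # True
print(is_relation(nested))  # False
```

The second document fails because one field has a child of its own, which is exactly the kind of tree that XML allows and a relation does not.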
XQuery Design Requirements
• Must have at least one XML syntax and at least one human-readable syntax
• Must be declarative
• Must be namespace aware
• Must coordinate with XML Schema
• Must support simple and complex datatypes
• Must combine information from multiple documents
• Must be able to transform and create XML trees
Relationship to XPath
• XQuery 1.0 is a strict superset of XPath 2.0
• Every XPath 2.0 expression is directly an XQuery 1.0 expression (a query)
• The extra expressive power is the ability to
  - join information from different sources and
  - generate new XML fragments
Relationship to XSLT
• XQuery and XSLT are both domain-specific languages for combining and transforming XML data from multiple sources
• They are vastly different in design, partly for historical reasons
• XQuery is designed from scratch, XSLT is an intellectual descendant of CSS
• Technically, they may emulate each other
XPath Expressions
• XPath expressions are also XQuery expressions
• The XQuery prolog gives the required static context
• The initial context node, position, and size are undefined
XML Expressions
• XQuery expressions may compute new XML nodes
• Expressions may denote element, character data, comment, and processing instruction nodes
• Each node is created with a unique node identity
• Constructors may be either direct or computed
FLWOR Expressions
• Used for general queries:
<doubles>
  { for $s in fn:doc("students.xml")//student
    let $m := $s/major
    where fn:count($m) ge 2
    order by $s/@id
    return <double>
             { $s/name/text() }
           </double>
  }
</doubles>
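To make the role of each FLWOR clause concrete, here is a rough Python analogue of the query above. It is only a sketch: the sample student data and the function name `doubles` are invented for illustration, the document is parsed from a string instead of `students.xml`, and ordering is done on the `id` attribute as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for students.xml.
STUDENTS_XML = """
<students>
  <student id="200"><name>Carol</name><major>Math</major><major>Physics</major></student>
  <student id="100"><name>Alice</name><major>CS</major><major>Math</major></student>
  <student id="150"><name>Bob</name><major>CS</major></student>
</students>
"""

def doubles(xml_text):
    root = ET.fromstring(xml_text)
    result = ET.Element("doubles")
    selected = []
    for s in root.iter("student"):            # for $s in ...//student
        m = s.findall("major")                # let $m := $s/major
        if len(m) >= 2:                       # where fn:count($m) ge 2
            selected.append(s)
    selected.sort(key=lambda s: s.get("id"))  # order by $s/@id
    for s in selected:                        # return <double>{ ... }</double>
        double = ET.SubElement(result, "double")
        double.text = s.findtext("name")
    return result

print(ET.tostring(doubles(STUDENTS_XML), encoding="unicode"))
# <doubles><double>Alice</double><double>Carol</double></doubles>
```

The imperative version has to spell out iteration, filtering, and sorting as separate steps, whereas the FLWOR expression states them declaratively in one place.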
Runtime Type Checks
• XQuery supports typed functions
• Type annotations are checked at runtime
• A runtime type error is provoked when
  - an actual argument value does not match the declared type
  - a function result value does not match the declared type
  - a value assigned to a variable does not match the declared type
XQueryX
for $t in fn:doc("recipes.xml")/rcp:collection/rcp:recipe/rcp:title
return $t
<xqx:module
    xmlns:xqx="http://www.w3.org/2003/12/XQueryX"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/12/XQueryX xqueryx.xsd">
  <xqx:mainModule>
    <xqx:queryBody>
      <xqx:expr xsi:type="xqx:flwrExpr">
        <xqx:forClause>
          <xqx:forClauseItem>
            <xqx:typedVariableBinding>
              <xqx:varName>t</xqx:varName>
            </xqx:typedVariableBinding>
            <xqx:forExpr>
              <xqx:expr xsi:type="xqx:pathExpr">
                <xqx:expr xsi:type="xqx:functionCallExpr">
                  <xqx:functionName>doc</xqx:functionName>
                  <xqx:parameters>
                    <xqx:expr xsi:type="xqx:stringConstantExpr">
                      <xqx:value>recipes.xml</xqx:value>
                    </xqx:expr>
                  </xqx:parameters>
                </xqx:expr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>child</xqx:xpathAxis>
                  <xqx:elementTest>
                    <xqx:nodeName><xqx:QName>rcp:collection</xqx:QName></xqx:nodeName>
                  </xqx:elementTest>
                </xqx:stepExpr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>child</xqx:xpathAxis>
                  <xqx:elementTest>
                    <xqx:nodeName><xqx:QName>rcp:recipe</xqx:QName></xqx:nodeName>
                  </xqx:elementTest>
                </xqx:stepExpr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>child</xqx:xpathAxis>
                  <xqx:elementTest>
                    <xqx:nodeName><xqx:QName>rcp:title</xqx:QName></xqx:nodeName>
                  </xqx:elementTest>
                </xqx:stepExpr>
              </xqx:expr>
            </xqx:forExpr>
          </xqx:forClauseItem>
        </xqx:forClause>
        <xqx:returnClause>
          <xqx:expr xsi:type="xqx:variable"><xqx:name>t</xqx:name></xqx:expr>
        </xqx:returnClause>
      </xqx:expr>
    </xqx:queryBody>
  </xqx:mainModule>
</xqx:module>
XML Databases
• How can XML and databases be merged?
• Several different approaches:
  - extract XML views of relations
  - use SQL to generate XML
  - shred XML into relational databases
XML Shredding
• Each element type is represented by a relation
• Each element node is assigned a unique key in document order
• Each element node contains the key of its parent
• The possible attributes are represented as fields, where absent attributes have the null value
• Content consisting of a single character data node is inlined as a field
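The shredding rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's implementation: relations are modelled as lists of dictionaries, `None` stands in for the SQL null value, the field names `key`, `parent`, and `text` are invented here, and the tiny recipe document is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input document.
RECIPE_XML = """
<recipe>
  <title>Toast</title>
  <ingredient name="bread" amount="2"/>
  <ingredient name="butter"/>
</recipe>
"""

def shred(xml_text):
    """Shred a document into one list of rows (a relation) per element type."""
    root = ET.fromstring(xml_text)
    # First pass: collect the possible attributes of each element type.
    attributes = defaultdict(set)
    for elem in root.iter():
        attributes[elem.tag].update(elem.attrib)
    relations = defaultdict(list)
    next_key = 0

    def visit(elem, parent_key):
        nonlocal next_key
        key = next_key                      # keys follow document order
        next_key += 1
        row = {"key": key, "parent": parent_key}
        for name in sorted(attributes[elem.tag]):
            row[name] = elem.get(name)      # absent attribute -> None (null)
        if len(elem) == 0 and elem.text and elem.text.strip():
            row["text"] = elem.text         # inline a lone character data node
        relations[elem.tag].append(row)
        for child in elem:
            visit(child, key)

    visit(root, None)
    return dict(relations)

tables = shred(RECIPE_XML)
for name, rows in tables.items():
    print(name, rows)
```

Here the second `ingredient` row gets `amount` set to `None`, and the `title` row carries its character data inline in the `text` field.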
From XQuery to SQL
• Any XML document can be faithfully represented
• This takes advantage of the existing database implementation
• Queries must now be phrased in ordinary SQL rather than XQuery
• But an automatic translation is possible
//rcp:ingredient[@name="butter"]/@amount
select ingredient.amount
from ingredient
where ingredient.name="butter"
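As a small check that the path expression and the SQL query above select the same values, the following sketch runs both against the same data. It makes several simplifying assumptions that are not in the slides: the sample document is invented, the `rcp` namespace is omitted, only the `ingredient` element type is shredded, the `id` column is a made-up key, and the SQL string literal uses single quotes, which SQLite handles more predictably.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data, without the rcp namespace.
RECIPES_XML = """
<collection>
  <recipe>
    <ingredient name="butter" amount="250"/>
    <ingredient name="flour" amount="500"/>
  </recipe>
  <recipe>
    <ingredient name="butter" amount="100"/>
  </recipe>
</collection>
"""

root = ET.fromstring(RECIPES_XML)

# Path query on the tree, mirroring //ingredient[@name="butter"]/@amount.
from_tree = [e.get("amount") for e in root.findall(".//ingredient[@name='butter']")]

# Shred the ingredient elements into a relation, keyed in document order.
db = sqlite3.connect(":memory:")
db.execute("create table ingredient (id integer primary key, name text, amount text)")
db.executemany(
    "insert into ingredient (name, amount) values (?, ?)",
    [(e.get("name"), e.get("amount")) for e in root.iter("ingredient")],
)

# The translated query, ordered by key to recover document order.
from_sql = [row[0] for row in db.execute(
    "select ingredient.amount from ingredient "
    "where ingredient.name = 'butter' order by ingredient.id"
)]

print(from_tree)  # ['250', '100']
print(from_sql)   # ['250', '100']
```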
Summary
• XML trees generalize relational tables
• XQuery similarly generalizes SQL
• XQuery and XSLT have roughly the same expressive power
• But they are suited for different application domains: data-centric vs. document-centric
An Introduction to XML and Web Technologies
XML Programming
Anders Møller & Michael I. Schwartzbach 2006 Addison-Wesley
Objectives
• How XML may be manipulated from general-purpose programming languages
• How streaming may be useful for handling large documents
General Purpose XML Programming
• Needed for:
  - domain-specific applications
  - implementing new generic tools
• Important constituents:
  - parsing XML documents into XML trees
  - navigating through XML trees
  - manipulating XML trees
  - serializing XML trees as XML documents
The JDOM Framework
• An implementation of generic XML trees in Java
• Nodes are represented as classes and interfaces
• DOM is a language-independent alternative
JDOM Classes and Interfaces
• The abstract class Content has subclasses:
  - Comment
  - DocType
  - Element
  - EntityRef
  - ProcessingInstruction
  - Text
• Other classes are Attribute and Document
• The Parent interface describes Document and Element
XML Data Binding
• XML data binding provides tools to:
  - map schemas to class declarations
  - automatically generate unmarshalling code
  - automatically generate marshalling code
  - automatically generate validation code
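To show what such generated code amounts to, here is a hand-written Python sketch of the kind of class a binding compiler might produce. Everything in it is an assumption made for illustration: the `card` element with `name` and `email` children, the class name `Card`, and the method names `unmarshal` and `marshal`. A real binding tool such as JAXB derives the equivalent code from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Card:
    """Hypothetical class bound to a <card> element with <name> and <email>."""
    name: str
    email: str

    @classmethod
    def unmarshal(cls, xml_text):
        """Turn an XML document into an object, with minimal validation."""
        root = ET.fromstring(xml_text)
        if root.tag != "card":
            raise ValueError("expected a <card> root element")
        name = root.findtext("name")
        email = root.findtext("email")
        if name is None or email is None:
            raise ValueError("a card needs both <name> and <email>")
        return cls(name=name, email=email)

    def marshal(self):
        """Turn the object back into an XML document."""
        root = ET.Element("card")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "email").text = self.email
        return ET.tostring(root, encoding="unicode")

card = Card.unmarshal("<card><name>John Doe</name><email>john@example.org</email></card>")
print(card.name)       # John Doe
print(card.marshal())
```

The application then works with `card.name` and `card.email` as ordinary typed fields instead of navigating a generic tree.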
Binding Compilers
• Which schemas are supported?
• Fixed or customizable binding?
• Does roundtripping preserve information?
• What is the support for validation?
• Are the generated classes implemented by some generic framework?
Streaming XML
• JDOM and JAXB keep the entire XML tree in memory
• Huge documents can only be streamed:
  - movies on the Internet
  - Unix file commands using pipes
• What is streaming for XML documents?
• The SAX framework has the answer...
Parsing Events
• View the XML document as a stream of events:
  - the document starts
  - a start tag is encountered
  - an end tag is encountered
  - a namespace declaration is seen
  - some whitespace is seen
  - character data is encountered
  - the document ends
• The SAX tool observes these events
• It reacts by calling corresponding methods specified by the programmer
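The event model can be tried out with the SAX implementation in Python's standard library (`xml.sax`), used here instead of the Java SAX API the slides refer to. The handler class `EventLogger` and the sample document are hypothetical. The sketch only records the events it receives, and it skips whitespace-only character data.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each parsing event as a line of text."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("start document")

    def startElement(self, name, attrs):
        self.events.append("start tag " + name)

    def endElement(self, name):
        self.events.append("end tag " + name)

    def characters(self, content):
        if content.strip():                  # ignore whitespace-only text
            self.events.append("character data " + repr(content))

    def endDocument(self):
        self.events.append("end document")

handler = EventLogger()
xml.sax.parseString(b"<greeting><to>world</to></greeting>", handler)
for event in handler.events:
    print(event)
```

The parser never builds a tree: once an event has been handled, the corresponding part of the input can be forgotten.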
SAX Filters
• A SAX application may be turned into a filter
• Filters may be composed (as with pipes)
• A filter is an event handler that may pass events along in the chain
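A filter chain can be sketched with plain event handlers that forward to a downstream handler. This Python example hand-rolls the chain on top of `xml.sax.ContentHandler` instead of using SAX's own filter classes, and all class names (`ForwardingFilter`, `UpperCaseText`, `DropElement`, `Collector`) are invented for illustration.

```python
import xml.sax

class ForwardingFilter(xml.sax.ContentHandler):
    """Passes every event on to the next handler in the chain."""
    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def characters(self, content):
        self.downstream.characters(content)

class UpperCaseText(ForwardingFilter):
    """Rewrites character data before passing it along."""
    def characters(self, content):
        super().characters(content.upper())

class DropElement(ForwardingFilter):
    """Suppresses one element type together with its whole subtree."""
    def __init__(self, downstream, tag):
        super().__init__(downstream)
        self.tag = tag
        self.depth = 0                      # > 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self.depth or name == self.tag:
            self.depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self.depth:
            super().characters(content)

class Collector(xml.sax.ContentHandler):
    """End of the chain: serializes the events it receives (attributes ignored)."""
    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        self.out.append("<" + name + ">")

    def endElement(self, name):
        self.out.append("</" + name + ">")

    def characters(self, content):
        self.out.append(content)

sink = Collector()
chain = DropElement(UpperCaseText(sink), "secret")
xml.sax.parseString(b"<card><name>John</name><secret>pin</secret></card>", chain)
print("".join(sink.out))  # <card><name>JOHN</name></card>
```

Each filter sees only the events its predecessor chose to pass along, which is what makes the composition behave like a Unix pipe.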
Streaming Transformations
• SAX allows the programming of streaming applications "by hand"
• XSLT allows high-level programming of applications
• A broad spectrum of these could be streamed
• But XSLT does not allow streaming...
• Solution: use a domain-specific language for streaming transformations
STX
• STX is a variation of XSLT suitable for streaming
  - some features are not allowed
  - but every STX application can be streamed
• The differences reflect necessary limitations in the control flow
Differences with XSLT
• STX has most XSLT functions
• apply-templates is the main problem:
  - allows processing to continue anywhere in the tree
  - requires moving back and forth in the input file
  - or storing the whole document
• STX has mutable variables to accumulate information
STXPath
• A subset of XPath 2.0 used by STX
• STXPath expressions:
  - look like restricted XPath 2.0 expressions
  - evaluate to sequences of nodes and atomic values
  - but they have a different semantics
STXPath Syntax
• Must use abbreviated XPath 2.0 syntax
• The axes following and preceding are not available
• Extra node tests: cdata() and doctype()
Transformation Sheets
• STX uses transform instead of stylesheet
• apply-templates is not allowed
• Processing is defined by:
  - process-children
  - process-siblings
  - process-self
• Only a single occurrence of process-children is allowed (to enable streaming)
XML in Programming Languages
• SAX: programmers react to parsing events
• JDOM: a general data structure for XML trees
• JAXB: a specific data structure for XML trees
• These approaches are convenient
• But no compile-time guarantees:
  - about validity of the constructed XML (JDOM, JAXB)
  - about well-formedness of the constructed XML (SAX)
An Introduction to XML and Web Technologies
Web Services
Anders Møller & Michael I. Schwartzbach 2006 Addison-Wesley
Topics
• SOAP – exchanging XML messages on a network
• WSDL – describing interfaces of Web services
• UDDI – managing registries of Web services
What is a Web Service?
• Web Service: “software that makes services available on a network using technologies such as XML and HTTP”
• Service-Oriented Architecture (SOA): “development of applications from distributed collections of smaller loosely coupled service providers”
Why a New Framework?
• CORBA, DCOM, Java/RMI, ... already exist
• XML+HTTP: platform neutral, widely accepted and utilized
What do We Need?
• We already know how to
  - represent information with XML
  - communicate with HTTP
• Fault tolerance
• Intermediaries
• RPC
• Interface descriptions
• Locating services
• ...
ad hoc solutions vs. use of standards?
XML-RPC
• A (too) simple RPC protocol based on XML and HTTP
• Used a lot for blogs – trackbacks and desktop authoring
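To see how little there is to the XML side of the protocol, Python's standard `xmlrpc.client` module can encode a call without contacting any server. The method name `shop.buy` and its arguments are hypothetical. In actual use the payload would be sent to a server in an HTTP POST request.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "shop.buy".
request = xmlrpc.client.dumps(("light gadget", 430), methodname="shop.buy")
print(request)

# The receiving side decodes the same payload back into values.
params, method = xmlrpc.client.loads(request)
print(method, params)  # shop.buy ('light gadget', 430)
```

The request is a small `methodCall` document containing the method name and one `param` per argument. There are no headers, intermediaries, or interface descriptions, which is the simplicity the slide alludes to.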
Web Service Standards
• SOAP
• WSDL
• UDDI
• WS-*
  - WS-Addressing
  - WS-ReliableMessaging
  - WS-Security, WS-Policy
  - WS-Resource
  - WS-Choreography (WS-CDL)
  - WS-BPEL (aka. BPEL4WS)
  - WS-Coordination, WS-AtomicTransaction, WS-CAF
  - ...
UNDER DEVELOPMENT!
[Figure: the SERVICE PROVIDER publishes to the SERVICE REGISTRY, the SERVICE USER finds services in the registry, and user and provider exchange messages]
SOAP
• Used to be “Simple Object Access Protocol”, but no longer an acronym...
• Processing Model
• Data Representation and RPC
• Binding to transport protocols (e.g. HTTP)
The SOAP Processing Model
[Figure: a message travels from the INITIAL SENDER through a chain of INTERMEDIARY nodes to the ULTIMATE RECEIVER]
SOAP Envelope:
<Envelope xmlns="http://www.w3.org/2003/05/soap-envelope">
  <Header>...</Header>
  <Body>...</Body>
</Envelope>
Envelope Headers
• Encryption information
• Access control
• Routing
• Auditing
• Data extensions
• ...
A SOAP Message
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
    xmlns:w="http://www.widget.inc/shop"
    xmlns:n="http://notaries.example.org">
  <env:Header>
    <w:ticket>54B42CF401A</w:ticket>
    <n:token>
      <n:value>32158546</n:value>
      <n:issuer>http://notarypublic.example.com</n:issuer>
    </n:token>
  </env:Header>
  <env:Body>
    <w:buy>
      <w:product>light gadget</w:product>
      <w:amount>430</w:amount>
    </w:buy>
  </env:Body>
</env:Envelope>
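A receiver has to separate header blocks from the body and resolve the namespaces. The following Python sketch parses the message above with the standard library's ElementTree. The variable names and the prefix map `NS` are chosen for illustration, and a real SOAP stack would also handle faults, header processing rules, and transport.

```python
import xml.etree.ElementTree as ET

MESSAGE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
    xmlns:w="http://www.widget.inc/shop"
    xmlns:n="http://notaries.example.org">
  <env:Header>
    <w:ticket>54B42CF401A</w:ticket>
    <n:token>
      <n:value>32158546</n:value>
      <n:issuer>http://notarypublic.example.com</n:issuer>
    </n:token>
  </env:Header>
  <env:Body>
    <w:buy>
      <w:product>light gadget</w:product>
      <w:amount>430</w:amount>
    </w:buy>
  </env:Body>
</env:Envelope>"""

# Prefixes used in the path expressions below; they need not match
# the prefixes in the message, only the namespace URIs matter.
NS = {
    "env": "http://www.w3.org/2003/05/soap-envelope",
    "w": "http://www.widget.inc/shop",
    "n": "http://notaries.example.org",
}

envelope = ET.fromstring(MESSAGE)
ticket = envelope.findtext("env:Header/w:ticket", namespaces=NS)
issuer = envelope.findtext("env:Header/n:token/n:issuer", namespaces=NS)
product = envelope.findtext("env:Body/w:buy/w:product", namespaces=NS)
amount = int(envelope.findtext("env:Body/w:buy/w:amount", namespaces=NS))

print(ticket, product, amount)  # 54B42CF401A light gadget 430
```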
Summary of SOAP
• A transport neutral protocol for XML data interchange (but focusing on HTTP)
• Processing model (envelopes, intermediaries, ...)
• SOAP Encoding
• SOAP RPC
• Protocol Bindings
• Foundation of WS-*
WSDL
• Web Services Description Language
• Functionality? (operations, types of arguments)
• Access? (data encoding, communication protocols)
• Location?
– Necessary information for writing clients
– Automatic generation of stubs and skeletons
Structure of a WSDL Description
<description xmlns="http://www.w3.org/2004/08/wsdl"
    targetNamespace="..." ...>
  <types>
    <!-- XML Schema description of types being used in messages -->
    ...
  </types>
  <interface name="...">
    <!-- list of operations and their input and output -->
    ...
  </interface>
  <binding name="..." interface="..." type="...">
    <!-- message encodings and communication protocols -->
    ...
  </binding>
  <service name="..." interface="...">
    <!-- combination of an interface, a binding, and a service location -->
    ...
  </service>
</description>
Interface Descriptions
• In-Only
• Robust In-Only
• In-Out
• In-Optional-Out
• Out-Only
• Robust Out-Only
• Out-In
• Out-Optional-In
<operation name="getRecipesOperation"
    pattern="http://www.w3.org/2004/03/wsdl/in-out">
  <input messageLabel="In" element="t:getRecipes"/>
  <output messageLabel="Out" element="t:collection"/>
</operation>
Binding Descriptions
• Encodings and protocols for an interface
• Predefined:
  - SOAP binding (often using SOAP’s HTTP binding)
  - HTTP binding (“raw HTTP”)
Summary of WSDL
Description of interfaces of Web services:
  - message types
  - operations
  - encodings and communication protocols
  - location
UDDI
• Universal Description, Discovery, and Integration
• static / dynamic discovery
• public / private registries
[Figure: the SERVICE PROVIDER publishes to the SERVICE REGISTRY, the SERVICE USER finds services in the registry, and user and provider exchange messages]
THE END
Or just one more
Things you now should know
• The XML (meta) language
  - HTML (SGML) vs XML
• XPath for navigating XML trees
• Schema languages (to describe a markup language)
  - DTD, XML Schema, ...
• XSLT for transforming XML documents
• XQuery for querying XML documents
• XML programming – DOM and SAX
• Web Services
  - SOAP, WSDL, UDDI