An Introduction to XML
Patrice Bonhomme & Laurent Romary
Lucid-IT LORIA
[email protected]@loria.fr
eXtensible Markup Language version 1.0
Recommendation, February 1998
Objectives
Understanding the basic concepts of XML:
  Elements, attributes and content
  DTDs (and Schemas)
  Namespaces
An overview of the main associated recommendations:
  XML Path Language (XPath)
  XML pointers and links (XPointer and XLink)
  The transformation language of XSL (eXtensible Stylesheet Language)
XML in the document chain
[Diagram: the document chain. Conception (DTD/Schema; structures), Edition (XML; data), Transformation (XML, XSL/XSLT; data processing), Consultation (HTML, XHTML; user perspective).]
A quick historical overview
1986 SGML (Standard Generalized Markup Language), ISO standard ISO 8879:1986
1987 TEI (Text Encoding Initiative)
1990 HTML 1.0 (HyperText Markup Language)
1997/1998 XML 1.0 (eXtensible Markup Language)
What XML is:
XML: eXtensible Markup Language
A W3C (World Wide Web Consortium) Recommendation
A meta-language: it allows one to define one's own markup language
A simplification of the SGML standard
SGML was intended to represent the “logical” structure of a document
HTML was conceived as an application of SGML
A simplified SGML
An XML document is an SGML document
  With some slight (but essential) differences...
XML has the expressive power of SGML without its complexity
Opens the door to the transmission of structured documents on the web
  Databases also entered the game...
What can we do with it?
Data modeling (as a complement to UML, for instance)
Publication of structured data on the web
Separation of the logical structure of a document from its actual presentation
Distributed applications (cf. well-formed vs. valid documents)
Integrating data from heterogeneous sources
Why can’t we avoid it?
Simplicity, which makes it easy to integrate into any kind of application
  XML specification = 36 pages
  SGML standard, ISO 8879 = 250 pages
Wide variety of applications already implemented
  Industry: publishing, databases, cataloguing, e-business, etc.
  Science, research: genomics, astronomy, maths, etc.
Consequence: a lot of software is available
  Editors, parsers, bridges from and to existing editing environments or DBMSs
From HTML to XML - 1
A simple HTML document:
  <B> Patrice Bonhomme </B>
  <P>[email protected] <BR>
  tél : 03 83 59 30 52 <BR>
  fax : 03 83 41 30 79 <BR>
  équipe : Langue et Dialogue (<I>LORIA</I>)<BR>
From HTML to XML - 2
The XML way:
  <?xml version="1.0" encoding="iso-8859-1"?>
  <!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
  <!-- A member of LORIA -->
  <MEMBRE TYPE="IE" ID="M28">
    <NOM> BONHOMME </NOM>
    <PRENOM> Patrice </PRENOM>
    <MEL> [email protected] </MEL>
    <TEL> 03 83 59 30 52 </TEL>
    <FAX> 03 83 41 30 79 </FAX>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
Some properties of XML
Emphasis should be put on the “semantics” of a document
Underlying model: tree structure
Possibility to imagine a script language to access any part of an XML document
  e.g.: DB/MEMBRE[28]/MEL/text()
XML supports Unicode character encodings
Elements and their content
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    <NOM> BONHOMME </NOM>
    <PRENOM> Patrice </PRENOM>
    <MEL> [email protected] </MEL>
    <TEL> 03 83 59 30 52 </TEL>
    <FAX> 03 83 41 30 79 </FAX>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
Opening tag: <NOM>
Closing tag: </NOM>
Textual content: BONHOMME
Empty element: <LOGIN ID="bonhomme"/>
Element: <NOM> BONHOMME </NOM>
Elements and their attributes
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    <NOM> BONHOMME </NOM>
    <PRENOM> Patrice </PRENOM>
    <MEL> [email protected] </MEL>
    <TEL> 03 83 59 30 52 </TEL>
    <FAX> 03 83 41 30 79 </FAX>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
Attribute name: TYPE
Attribute value: "IE"
Other features
XML declaration
  <?xml version="1.0"?>
  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Comments
  <!-- this is a comment -->
CDATA section
  <![CDATA[Langue & Dialogue]]>
Processing instruction (application specific)
  <?edit line="wrap"?>
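To make the CDATA feature concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The EQUIPE wrapper element and the helper names are assumptions added for illustration; only the CDATA content comes from the slide.

```python
import xml.etree.ElementTree as ET

def element_text(xml_source):
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_source).text

def parses(xml_source):
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False

# Inside a CDATA section the ampersand is taken literally.
with_cdata = element_text("<EQUIPE><![CDATA[Langue & Dialogue]]></EQUIPE>")
# Outside CDATA the same character has to be written as an entity reference.
with_entity = element_text("<EQUIPE>Langue &amp; Dialogue</EQUIPE>")
# A bare ampersand outside CDATA is rejected by the parser.
bare_ampersand_ok = parses("<EQUIPE>Langue & Dialogue</EQUIPE>")
```

Both spellings give the application the same character data; CDATA only changes how the text is written in the file.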
From one document to a class…
How do I know the structure of my document?
How may I share this structure with others?
Document Type Definition
Expresses constraints on:
  Allowed element and attribute names
  Possible content of a given element (“content model”)
  To which elements a given attribute can be attached
Similar to the traditional SGML approach, but:
  Simplified syntax
  The DTD is optional for a document
Example
  <!ELEMENT MEMBRE (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)>
  <!ELEMENT LOGIN EMPTY>
  <!ATTLIST LOGIN ID ID #REQUIRED>
  <!ELEMENT NOM (#PCDATA)>
  ...
  <!ENTITY W3C "World Wide Web Consortium">
  <!ENTITY chap1 SYSTEM "http://…/chapitre-1.xml">
  <!ENTITY img2 SYSTEM "image2.gif" NDATA gif>
  ...
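The content model of MEMBRE reads like a regular expression over child element names: "?" means optional, "+" one or more, "*" zero or more. The sketch below takes that analogy literally in Python. It is not a DTD validator (the standard library parser does not validate); the function name and the sample documents are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE) rewritten as a regular
# expression over the space-terminated sequence of child element names.
MEMBRE_MODEL = re.compile(r"LOGIN (NOM )?(PRENOM )?MEL (TEL )+(FAX )*EQUIPE ")

def matches_membre_model(xml_source):
    """Check only the order and count of MEMBRE's direct children."""
    root = ET.fromstring(xml_source)
    child_names = "".join(child.tag + " " for child in root)
    return MEMBRE_MODEL.fullmatch(child_names) is not None

# Hypothetical instances: the first follows the model, the second lacks TEL.
GOOD = ("<MEMBRE><LOGIN ID='bonhomme'/><NOM>BONHOMME</NOM><MEL>m</MEL>"
        "<TEL>1</TEL><TEL>2</TEL><EQUIPE>Langue et Dialogue</EQUIPE></MEMBRE>")
BAD = "<MEMBRE><LOGIN ID='bonhomme'/><MEL>m</MEL><EQUIPE>E</EQUIPE></MEMBRE>"
```

A validating parser performs this kind of check for every element, together with the attribute and entity declarations, which this sketch ignores.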
Using a DTD
External DTD, referenced by URL:
  <!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
  <MEMBRE TYPE="IE" ID="M28"> … </MEMBRE>
Internal DTD, declared inside the document:
  <!DOCTYPE MEMBRE [
    <!ELEMENT MEMBRE … >
    …
  ]>
  <MEMBRE TYPE="IE" ID="M28"> … </MEMBRE>
Valid vs. Well-formed
Well-formed documents
  Syntactic bracketing is preserved, without a DTD
  Empty element: <toto></toto> = <toto/>
Valid documents
  With a DTD (à la SGML)
Essential difference with SGML
  Extracting and re-using document fragments
  One usually produces valid documents and distributes well-formed ones
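Well-formedness can be tested without any DTD: a non-validating parser either accepts the bracketing or raises an error. A minimal sketch with Python's standard xml.etree.ElementTree follows; the helper names and the broken sample are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_source):
    """Return True if the string parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError:
        return False
    return True

def reserialized(xml_source):
    """Parse and write back, to compare two spellings of one document."""
    return ET.tostring(ET.fromstring(xml_source))

# The two spellings of an empty element are the same document.
same_empty_element = reserialized("<toto></toto>") == reserialized("<toto/>")
# HTML-style overlapping tags break the bracketing.
overlapping_ok = is_well_formed("<B>bold<P>paragraph</B>")
```

Checking validity would additionally require a validating parser and the DTD itself, which this sketch does not attempt.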
XML namespaces
Objectives: avoid conflicts between element and attribute names coming from various sources
  Composite documents
  XSLT instructions, Schema declarations
Declaration:
  <DOC xmlns:mml="http://www.w3.org/Math/MathML/"
       xmlns="http://www.ua99.net/DOC/1.0">
    <P>blah blah : <mml:fn mml:definitionURL="mydef.xml"> … </mml:fn> re blah blah</P>
  </DOC>
Reserved namespaces
  The xml: prefix is reserved by the W3C for specific attributes:
  <title xml:space="default">...</title>
  <p xml:lang="FR">…</p>
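What an application sees after namespace processing is the pair (namespace URI, local name), not the prefix. The sketch below parses the declaration example with Python's standard xml.etree.ElementTree, which writes that pair as {URI}name. The "f" content of mml:fn stands in for the elided "…" and the "doc" prefix in the lookup table is an arbitrary local choice; both are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<DOC xmlns:mml="http://www.w3.org/Math/MathML/"
     xmlns="http://www.ua99.net/DOC/1.0">
  <P>blah blah : <mml:fn mml:definitionURL="mydef.xml">f</mml:fn> re blah blah</P>
</DOC>"""

# Prefixes used for lookup are local to this script; only the URIs matter.
NS = {
    "doc": "http://www.ua99.net/DOC/1.0",
    "mml": "http://www.w3.org/Math/MathML/",
}

root = ET.fromstring(DOC)
# The default namespace applies to the unprefixed DOC and P elements.
root_name = root.tag
fn = root.find("doc:P/mml:fn", NS)
fn_name = fn.tag
# A prefixed attribute is looked up by its expanded name as well.
definition_url = fn.get("{http://www.w3.org/Math/MathML/}definitionURL")
```

Two documents that bind different prefixes to the same URIs therefore yield identical element names.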
XPath
XML Path Language 1.0 (REC 29012000)
General-purpose syntax for addressing sub-parts of an XML document
Joint specification used by XML Pointers (XPointer recommendation) and the XSLT transformation language
Allows one to access, select and filter XML fragments (cf. tree representation of an XML document)
Addressing nodes in XPath
Absolute addressing
  Given: a URL
  id(M28), root()
Relative addressing along axes
  Given: a node
  ancestor, child, descendant, psibling, fsibling (written preceding-sibling and following-sibling in XPath 1.0)
An XML document represents a hierarchical structure
[Tree diagram]
  MEMBRE (TYPE="IE", ID="M28")
    LOGIN (ID="bonhomme")
    NOM
      BONHOMME
    ...
    EQUIPE (LAB="LORIA")
      Langue et Dialogue
The only view you should ever, ever have of an XML document
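The tree view can be recovered from the text form by any parser. As a small sketch, the function below walks a parsed document with Python's standard xml.etree.ElementTree and prints one indented line per element. The function name and output layout are assumptions, and the sample is an abridged copy of the MEMBRE example.

```python
import xml.etree.ElementTree as ET

MEMBRE = """<MEMBRE TYPE="IE" ID="M28">
  <LOGIN ID="bonhomme"/>
  <NOM> BONHOMME </NOM>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>"""

def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    attrs = " ".join(f'{name}="{value}"' for name, value in element.attrib.items())
    label = element.tag + (f" [{attrs}]" if attrs else "")
    text = (element.text or "").strip()
    if text:
        label += f": {text}"
    lines = ["  " * depth + label]
    for child in element:  # children come back in document order
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(MEMBRE))
print("\n".join(tree_lines))
```

Whitespace-only text between elements is dropped here for readability; a parser keeps it unless told otherwise.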
XPath - Examples
  <DB>
    <MEMBRE TYPE="IE" ID="M28">
      <LOGIN ID="bonhomme"/>
      ...
      <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
    </MEMBRE>
    <MEMBRE TYPE="CR" ID="M14">
      <LOGIN ID="romary"/>
      ...
    </MEMBRE>
  </DB>
/DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
/DB/MEMBRE[2]
/DB/MEMBRE/LOGIN[@ID='romary']/../@ID
/ or /DB
/DB/MEMBRE
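These expressions can be tried out with Python's standard xml.etree.ElementTree, which implements only a subset of XPath. In this sketch the paths are written relative to the DB root element, and text() and @ID are replaced by the .text property and the get() method because the subset does not support them; those rewrites are adaptations, not XPath itself. The elided "..." parts of the sample are simply left out.

```python
import xml.etree.ElementTree as ET

DB = """<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
  <MEMBRE TYPE="CR" ID="M14">
    <LOGIN ID="romary"/>
  </MEMBRE>
</DB>"""

root = ET.fromstring(DB)  # root is the DB element

# /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
team = root.find("MEMBRE[@ID='M28']/EQUIPE[1]").text

# /DB/MEMBRE[2]  (positions are counted from 1)
second_member_id = root.find("MEMBRE[2]").get("ID")

# /DB/MEMBRE/LOGIN[@ID='romary']/../@ID  (".." steps back to the parent)
owner_id = root.find("MEMBRE/LOGIN[@ID='romary']/..").get("ID")

# /DB/MEMBRE selects every MEMBRE child, in document order
all_ids = [member.get("ID") for member in root.findall("MEMBRE")]
```

A full XPath 1.0 engine would accept the original expressions unchanged.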
XPointer
In HTML, anchors are needed in the target document:
  <A NAME="TOTO">
  http://www.titi.fr/index.html#toto
In XML, pointers can directly address a document component:
  http://…/doc.xml#xptr(id(M28))
  http://…/doc.xml#xptr(/DB/MEMBRE[28]/MEL)
Advantage: no need to modify the target document (notion of primary source)
XLink
In HTML: the elements which may carry links are known:
  <A>, <IMG>, ...
In XML: any element may carry a simple or complex link
  This is done by using pre-defined attributes:
  <a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>
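Because the link lives in attributes rather than in the element name, an application can find links on any element by looking for the XLink attributes. A minimal sketch with Python's standard xml.etree.ElementTree follows. The DOC wrapper, the NOM element and the function name are assumptions added for illustration, and the sketch binds the xlink prefix to the XLink namespace URI, which the slide leaves implicit.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink namespace URI

SAMPLE = (
    '<DOC xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>'
    '<NOM>no link here</NOM>'
    '</DOC>'
)

def simple_links(xml_source):
    """Return (element name, target) for every element carrying a simple link."""
    root = ET.fromstring(xml_source)
    links = []
    for element in root.iter():
        # The element name is irrelevant; only the xlink:type attribute counts.
        if element.get(f"{{{XLINK}}}type") == "simple":
            links.append((element.tag, element.get(f"{{{XLINK}}}href")))
    return links

found = simple_links(SAMPLE)
```

Renaming the a element to anything else leaves the result unchanged apart from the reported name.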
Visualizing XML documents
Basically, an XML document does not provide any information about its presentation
Visualizing a document may depend on the target audience, device etc.
Stylesheets:
  Cascading Style Sheets (CSS 1 and 2)
  Extensible Stylesheet Language (XSL) >> XSLT
eXtensible Stylesheet Language
  Describes the way a document will be shown, printed or verbalized…
  [Diagram: XML + XSL]
XSL: a two-fold proposal
XSL = Transformations + Visualizing properties
XSLT: transformation of XML documents
  Allows one to transform an XML document into another XML document
  Use this to produce well-formed (!) HTML documents
XSL FO: formatting XML data
  FO = Formatting Objects
  Is supposed to be application independent (Word/RTF, PS, PDF, MIF, …)
  Not a recommendation yet :-(
General structure of an XSL document
  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    …
    <xsl:template match="/"> … </xsl:template>
    <xsl:template match="NOM"> … </xsl:template>
  </xsl:stylesheet>
Declarative approach
Sequence of rules (templates) specifying:
  The pattern (XPath) of nodes to which the rule can be applied
  Actions to be undertaken:
    Elements to be generated in the target document
    Selection of the elements to be further explored in the source document
Additional functionalities: testing, sorting, etc.
A simple rule
  <xsl:template match='/DB/MEMBRE/NOM'>
    <B> <xsl:apply-templates/> </B>
  </xsl:template>
match='/DB/MEMBRE/NOM': pattern (XPath)
<B>: HTML element to be produced
<xsl:apply-templates/>: the content of <B> will be the one produced by this instruction
Creating an HTML core document
  <xsl:template match="/">
    <HTML>
      <HEAD> <TITLE>My directory</TITLE> </HEAD>
      <BODY> <xsl:apply-templates/> </BODY>
    </HTML>
  </xsl:template>
Selecting the nodes to be explored
  <xsl:template match="MEMBRE">
    <P>
      <xsl:apply-templates select="NOM"/>
      <xsl:text> - </xsl:text>
      <xsl:apply-templates select="EQUIPE"/>
    </P>
  </xsl:template>
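The rule-based processing model can be imitated in a few lines of Python. The sketch below is not XSLT and does not run the stylesheets above: it is a hand-written dispatcher, with hypothetical names (template, apply_templates, RULES), that mirrors how each template matches a node, emits some output and chooses which children to explore next. Patterns are reduced to plain element names, and text is trimmed, which a real XSLT processor would not do by default.

```python
import html
import xml.etree.ElementTree as ET

RULES = {}  # element name -> function producing output for that element

def template(match):
    """Register a rule for one element name (a stand-in for an XSLT pattern)."""
    def register(rule):
        RULES[match] = rule
        return rule
    return register

def apply_templates(nodes):
    """Apply the matching rule to each node; fall back to copying its text."""
    parts = []
    for node in nodes:
        rule = RULES.get(node.tag)
        if rule is not None:
            parts.append(rule(node))
        else:
            parts.append(html.escape("".join(node.itertext()).strip()))
    return "".join(parts)

@template("DB")
def db_rule(node):
    return "<HTML><BODY>" + apply_templates(list(node)) + "</BODY></HTML>"

@template("MEMBRE")
def membre_rule(node):
    # Only NOM and EQUIPE are explored, as in the select attributes above.
    return ("<P>" + apply_templates(node.findall("NOM")) + " - "
            + apply_templates(node.findall("EQUIPE")) + "</P>")

@template("NOM")
def nom_rule(node):
    return "<B>" + html.escape((node.text or "").strip()) + "</B>"

SOURCE = ("<DB><MEMBRE TYPE='IE' ID='M28'><NOM> BONHOMME </NOM>"
          "<TEL> 03 83 59 30 52 </TEL>"
          "<EQUIPE LAB='LORIA'>Langue et Dialogue</EQUIPE></MEMBRE></DB>")

page = apply_templates([ET.fromstring(SOURCE)])
```

The TEL element never reaches the output because no rule selects it, which is the same effect the select attribute has in the MEMBRE template.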
Conclusion
XML - a practical format (protocol)
Next steps:
  Sharing DTDs, resources, tools
  Generic mechanisms for handling families of documents (cf. Nancy’s presentation)
References
www.oasis-open.org/cover/
www.w3.org/XML/
www.w3.org/TR
www.w3.org/TR/REC-xml
babel.alis.com/web_ml/xml/REC-xml.fr.html
www.xml.com
www.xmlinfo.com
xml.apache.org