Introduction to XML
John Arnett, MSc
Standards Modeller
Information and Statistics Division
NHSScotland
Tel: 0131 551 8073 (x2073)
mailto:[email protected]://isdscotland.org/xml
Contents
• What is XML?
• Anatomy of an XML Document
• Conformance and Validation
• Summary
• Find Out More
What is XML?
• XML is not…
– a programming language
– a software panacea
– an object-oriented technology
– HTML with funny tags
– a replacement for HTML
… but it is re-shaping publishing on the web
What is XML?
• Stands for Extensible Markup Language
– Meta-markup language derived from SGML (Standard Generalised Markup Language)
– Open Standard, currently XML 1.0 2nd edition (W3C Recommendation 6 October 2000)
What is XML?
• W3C says
– XML is the universal format for structured documents and data on the Web
– A data object is an XML document if it is well-formed, as defined in [the W3C] specification (more on this later)
What is XML?
• Data Content and Presentation
Sample dataset (flat file, database, spreadsheet, etc.)
ID      SURNAME   FORENAME  SEX  DOB
147678  Jackson   Sarah     1    15061976
111672  Martin    Lesley    0    12111979
198457  McKenzie  Alison    1    23081983
134376  Jones     Ian       0    06011971
• Record – data oriented structure
111672 Martin Lesley 0 12111979
What is XML?
Structured Searchable Easy to understand Portable
What is XML?
• HTML – document oriented structure
<h1>Record Id: <font color="red">11672</font></h1>
<table>
  <colgroup><col align="left"></colgroup>
  <tr><th>Surname:</th><td>Martin</td></tr>
  <tr><th>Given Name:</th><td>Lesley</td></tr>
  <tr><th>Sex:</th><td>Male</td></tr>
  <tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>
Rendered in a browser:
Record Id: 11672
Surname: Martin
Given Name: Lesley
Sex: Male
Date of Birth: 12 November 1979
Easy to understand Portable Structured Searchable
What is XML?
• XML to the rescue!
<Record recordId="11672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day>
    <Month>11</Month>
    <Year>1979</Year>
  </DateOfBirth>
</Record>
Easy to understand Portable Structured Searchable
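As a rough illustration of "structured" and "searchable", the record above can be read back by element name rather than by column position. This is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The record from the slide, with straight quotes around the attribute value.
record_xml = """
<Record recordId="11672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
"""

record = ET.fromstring(record_xml)

# Fields are addressed by name, not by their position in a line of text.
record_id = record.get("recordId")
surname = record.findtext("Surname")
year = int(record.findtext("DateOfBirth/Year"))

print(record_id, surname, year)
```

Adding a new child element to the record would not disturb any of these lookups, which is not true of a fixed-position flat file.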
What is XML?
• But XML also…
– Structured
– Separates data from presentation
– Self-describing
– Searchable
– Extensible
  • i.e. any number of tags allowed
Anatomy of an XML Document
• XML documents consist of text
– character data
  • tab, carriage return and line feed
  • Unicode characters
– markup
Anatomy of an XML Document
• Markup
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– start-, end- and empty element tags
  • tag names are case sensitive!
– entity and character references
– comments
Anatomy of an XML Document
• Character data
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Reserved characters: &, <, >, ' and "
Anatomy of an XML Document
• Declaration
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Optional first line of markup (but W3C recommended)
– Used to match documents to parsers
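One practical effect of the declaration is that it tells the parser how the bytes are encoded. The sketch below assumes Python's standard `xml.etree.ElementTree`; the ISO-8859-1 sample document is made up for illustration and does not come from the slides.

```python
import xml.etree.ElementTree as ET

# "café" stored as ISO-8859-1 bytes; the declaration names that encoding.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<MessageBody>caf\u00e9</MessageBody>"
).encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
body = ET.fromstring(latin1_doc)
print(body.text)

# The same bytes without the declaration: the parser assumes UTF-8,
# and the single 0xE9 byte is not a valid UTF-8 sequence.
undeclared = latin1_doc.split(b"?>", 1)[1]
try:
    ET.fromstring(undeclared)
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False

print(parsed_without_declaration)
```

A document that is plain UTF-8 parses fine without a declaration, which is why the declaration is optional but recommended.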
Anatomy of an XML Document
• Root Element
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Uniquely named element
– Contains all the data and links to other documents
Anatomy of an XML Document
• Elements
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
– Define the content of the XML document
– May contain other elements, character data or can be empty
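The three kinds of content listed above (child elements, character data, nothing at all) can be seen by walking the Book example. A minimal sketch with Python's standard `xml.etree.ElementTree`; the document is written on one logical line so that no extra whitespace ends up in the character data.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<Book>XML Bible"
    "<Price>24.99</Price>"
    '<img src="book.gif"/>'
    "<Author>E.R. Harold</Author>"
    "<Publisher>J. Forbes</Publisher>"
    "</Book>"
)

# Character data directly inside <Book>, before its first child element.
title = book.text

# Child elements, in document order.
child_names = [child.tag for child in book]

# An empty element has no character data and no children.
img = book.find("img")
img_is_empty = img.text is None and len(img) == 0

print(title, child_names, img_is_empty)
```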
Anatomy of an XML Document
• Attributes
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
– Add data about the elements
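Reading attributes from the catalogue above might look like the following sketch, again assuming Python's standard `xml.etree.ElementTree`. The `prices` and `cheapest` names are illustrative only.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
""")

subject = catalog.get("Subject")

# Attribute values always arrive as strings; convert where a number is needed.
prices = {
    book.get("Title"): float(book.get("Price"))
    for book in catalog.findall("Book")
}
cheapest = min(prices, key=prices.get)

print(subject, cheapest)
```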
Anatomy of an XML Document
• Handling reserved characters
– Built-in entities: &amp; = &   &quot; = "   &lt; = <   &gt; = >   &apos; = '
– CDATA Sections
<CodeSnippet>
  <![CDATA[if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
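Both techniques should give a parser the same character data. The sketch below compares them using `escape` from Python's standard `xml.sax.saxutils` and `xml.etree.ElementTree`; the `code` string is a shortened, hypothetical variant of the snippet on the slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical code fragment containing the reserved characters & and <.
code = 'if (x < 5 && y >= 10) cerr << "out of range";'

# Option 1: replace reserved characters with entity references.
escaped_doc = "<CodeSnippet>" + escape(code) + "</CodeSnippet>"

# Option 2: wrap the text in a CDATA section and leave it untouched.
cdata_doc = "<CodeSnippet><![CDATA[" + code + "]]></CodeSnippet>"

# Once parsed, both documents carry the same character data.
from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text

print(escaped_doc)
print(from_escaped == from_cdata == code)
```

CDATA is the more readable choice for long runs of code, but the text inside it must not contain the sequence `]]>`.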
Anatomy of an XML Document
• Namespaces
– Preventing naming collisions
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
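The two title elements in the order above can be told apart because each belongs to a different namespace URI. A minimal sketch with Python's standard `xml.etree.ElementTree`; the short prefixes in the `ns` mapping are chosen here for illustration and need not match the prefixes in the document.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring("""
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
""")

# Query prefixes are local to this mapping; only the URIs must match.
ns = {
    "c": "http://www.example.com/custDetails",
    "b": "http://www.example.com/bookDetails",
    "o": "http://www.example.com/order",
}

customer_title = order.findtext("c:title", namespaces=ns)
book_title = order.findtext("b:title", namespaces=ns)
# Unprefixed elements belong to the default namespace declared with xmlns=.
order_number = order.findtext("o:orderNumber", namespaces=ns)

print(customer_title, book_title, order_number)
```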
Conformance and Validation
• All XML processors must check well-formedness constraints
– One root element
– Start and end tags match <Tag>content</Tag>
– Empty elements are terminated as <Tag/>
– Tags are correctly nested <Parent><Child></Child></Parent>
– All attribute values are enclosed in quotes
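Each of these constraints can be tried against a real parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports a well-formedness failure as `ParseError`; `is_well_formed` and the sample strings are hypothetical helpers written for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "matching tags": "<Tag>content</Tag>",
    "empty element": "<Tag/>",
    "correct nesting": "<Parent><Child></Child></Parent>",
    "two root elements": "<A/><B/>",
    "tag case mismatch": "<Tag>content</tag>",
    "bad nesting": "<Parent><Child></Parent></Child>",
    "unquoted attribute": "<Tag id=1/>",
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

The first three samples are accepted and the last four are rejected, one for each rule in the list.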
Conformance and Validation
• Validating XML processors check against validity constraints
– specified in Document Type Definitions (DTDs) or Schemas
– a valid XML document must be well-formed
– a well-formed document need not necessarily be valid
Document Type Definitions
• DTD syntax able to specify
– Structure and order of child elements
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
– Element attributes
  • limited number of data types
  • default and fixed attribute values
<!ATTLIST Product EffDate CDATA #IMPLIED>
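To make the difference between well-formed and valid concrete, the sketch below spells out by hand what the Product declarations require. It is not a validating parser: Python's standard `xml.etree.ElementTree` does not validate against DTDs, so `product_is_valid` is a hypothetical stand-in that hard-codes the rules, and it ignores details such as stray text between child elements.

```python
import xml.etree.ElementTree as ET

def product_is_valid(xml_text):
    """Hand-written check mirroring the Product DTD (illustrative only)."""
    product = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if product.tag != "Product":
        return False
    # <!ELEMENT Product (Name, Size?)>: Name first, then at most one Size.
    if [child.tag for child in product] not in (["Name"], ["Name", "Size"]):
        return False
    # <!ELEMENT Name (#PCDATA)> and Size likewise: no nested elements.
    if any(len(child) for child in product):
        return False
    # <!ATTLIST Product EffDate CDATA #IMPLIED>: optional, and the only attribute.
    return set(product.attrib) <= {"EffDate"}

full = '<Product EffDate="2001-04-02"><Name>Widget</Name><Size>10</Size></Product>'
minimal = "<Product><Name>Widget</Name></Product>"
wrong_order = "<Product><Size>10</Size><Name>Widget</Name></Product>"
extra_child = "<Product><Name>Widget</Name><Colour>red</Colour></Product>"

for doc in (full, minimal, wrong_order, extra_child):
    print(product_is_valid(doc), doc)
```

The last two documents parse without error, so they are well-formed, but they break the content model and would be rejected by a validating processor.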
Document Type Definitions
• DTDs
– Easy to understand and implement
– Lightweight alternative to schemas
– But…
  • use non-XML syntax
  • only limited support for data typing and namespaces
  • difficult to extend
Schemas
• W3C Schema
– Uses XML syntax
– Provides built-in and supports user-defined data types
– Supports namespaces
– Provides several extensibility mechanisms
Schemas
• Schemas therefore more flexible…
<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>
• but harder to understand than DTDs
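The data types are where the schema goes beyond the DTD: `xs:positiveInteger` and `xs:date` constrain the values themselves, whereas `#PCDATA` and `CDATA` accept any text. The sketch below approximates those two types by hand in Python; `check_types` is a hypothetical helper, not a schema validator, and it simplifies `xs:date` to the plain YYYY-MM-DD form without a time zone.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

def check_types(xml_text):
    """Return the names of values that break the schema's data types."""
    product = ET.fromstring(xml_text)
    problems = []

    # xs:positiveInteger: digits with an optional leading +, greater than zero.
    size = product.findtext("Size")
    if size is not None:
        size = size.strip()
        if not (re.fullmatch(r"\+?[0-9]+", size) and int(size) > 0):
            problems.append("Size")

    # xs:date, simplified here to a calendar date in YYYY-MM-DD form.
    eff_date = product.get("EffDate")
    if eff_date is not None:
        try:
            date.fromisoformat(eff_date)
        except ValueError:
            problems.append("EffDate")

    return problems

good = '<Product EffDate="2001-04-02"><Name>Widget</Name><Size>10</Size></Product>'
bad = '<Product EffDate="12 November 1979"><Name>Widget</Name><Size>large</Size></Product>'

print(check_types(good))
print(check_types(bad))
```

Both sample documents satisfy the DTD on the slide, since it only sees text; only the schema's types distinguish them.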
In Summary…
• A language for describing markup languages
• Extensible, i.e. define own tags
• Readable, structured and self-describing
• Documents must be well-formed
• Documents may be validated using DTDs and/or Schemas
Find Out More
• World Wide Web Consortium
– www.w3.org
• W3C XML v1.0 Specification
– http://www.w3.org/TR/REC-xml
Find Out More
• The XML Industry Portal
– www.xml.org
• O’Reilly XML site
– www.xml.com
• XML Cover Pages
– www.oasis-open.org/cover/
• Café Con Leche
– www.ibiblio.org/xml/
Find Out More
• Scottish Health and Community Care XML Steering Group
– www.isdscotland.org/xml
XML Tools
• XSV - Open Source XML Schema Validator
– www.ltg.ed.ac.uk/~ht/xsv-status.html
• MSXML 4.0
– www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42
XML Tools
• XML Spy 2004 IDE
– www.altova.com/products_ide.html
• Free XML Tools and Software
– www.garshol.priv.no/download/xmltools/
Printed Sources
• Numerous printed sources – for more information visit
– Charles F. Goldfarb's www.xmlbooks.com
– www.amazon.com