Introduction to XML
John Arnett, MSc
Standards Modeller
Information and Statistics Division
NHSScotland
Tel: 0131 551 8073 (x2073)
mailto:[email protected]
http://isdscotland.org/xml


Post on 18-Dec-2015


Contents

• What is XML?
• Anatomy of an XML Document
• Conformance and Validation
• Summary
• Find Out More

What is XML?

• XML is not…
  – a programming language
  – a software panacea
  – an object-oriented technology
  – HTML with funny tags
  – a replacement for HTML
• … but it is re-shaping publishing on the web

What is XML?

• Stands for Extensible Markup Language
  – Meta-markup language derived from SGML (Standard Generalised Markup Language)
  – Open Standard, currently XML 1.0 2nd edition (W3C Recommendation 6 October 2000)

What is XML?

• W3C says
  – XML is the universal format for structured documents and data on the Web
  – A data object is an XML document if it is well-formed, as defined in [the W3C] specification (more on this later)

What is XML?

• Data Content and Presentation

Sample dataset (flat file, database, spreadsheet, etc)

ID      SURNAME   FORENAME  SEX  DOB
147678  Jackson   Sarah     1    15061976
111672  Martin    Lesley    0    12111979
198457  McKenzie  Alison    1    23081983
134376  Jones     Ian       0    06011971

• Record – data oriented structure

111672 Martin Lesley 0 12111979

What is XML?

Structured Searchable Easy to understand Portable

What is XML?

• HTML – document oriented structure

<h1>Record Id: <font color="red">11672</font></h1>
<table>
  <colgroup><col align="left"></colgroup>
  <tr><th>Surname:</th><td>Martin</td></tr>
  <tr><th>Given Name:</th><td>Lesley</td></tr>
  <tr><th>Sex:</th><td>Male</td></tr>
  <tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>

Rendered as:

Record Id: 11672
Surname: Martin
Given Name: Lesley
Sex: Male
Date of Birth: 12 November 1979

Easy to understand Portable Structured Searchable

What is XML?

• XML to the rescue!

<Record recordId="11672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day>
    <Month>11</Month>
    <Year>1979</Year>
  </DateOfBirth>
</Record>

Easy to understand Portable Structured Searchable
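
As a minimal sketch of what "structured" and "searchable" can mean in practice, the XML record above can be loaded with Python's standard xml.etree.ElementTree module and queried by element name. The function name summarise and the dictionary keys are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The record from the slide, as a string.
RECORD_XML = """\
<Record recordId="11672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day>
    <Month>11</Month>
    <Year>1979</Year>
  </DateOfBirth>
</Record>
"""


def summarise(record_xml: str) -> dict:
    """Pull a few named fields out of a <Record> document (hypothetical helper)."""
    record = ET.fromstring(record_xml)
    return {
        "id": record.get("recordId"),                 # attribute lookup
        "surname": record.findtext("Surname"),        # child element text
        "given_name": record.findtext("GivenName"),
        # Nested elements are addressed with a path, not a character offset.
        "year_of_birth": int(record.findtext("DateOfBirth/Year")),
    }


summary = summarise(RECORD_XML)
print(summary)
```

Fields are addressed by name rather than by column position, so reordering the child elements of the record would not change the result.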

What is XML?

• HTML and XML are…
  – Text based
  – Open standards
  – Widely used

What is XML?

• But XML also…
  – Structured
  – Separates data from presentation
  – Self-describing
  – Searchable
  – Extensible
    • i.e. any number of tags allowed

Anatomy of an XML Document

• XML documents consist of text
  – character data
    • tab, carriage return and line feed
    • Unicode characters
  – markup

Anatomy of an XML Document

• Markup

<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>

  – start-, end- and empty element tags
    • tag names are case sensitive!
  – entity and character references
  – comments

Anatomy of an XML Document

• Character data

<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>

  – Reserved characters: &, <, >, ' and "

Anatomy of an XML Document

• Declaration

<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>

  – Optional first line of markup (but W3C recommended)
  – Used to match documents to parsers
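
One concrete way the declaration helps a parser is the encoding attribute, which tells it how to turn the document's bytes into characters. The sketch below assumes Python's standard xml.etree.ElementTree module; the sample text "Café" and the variable names are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# The same document text, serialised to bytes in two different encodings.
# Each declaration names the encoding that was actually used.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><Message>Caf\u00e9</Message>'
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><Message>Caf\u00e9</Message>'
).encode("iso-8859-1")

# The byte sequences differ, but the parser reads the declaration and
# decodes each one back to the same character data.
utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text

print(utf8_doc == latin1_doc)      # the raw bytes are not the same
print(utf8_text == latin1_text)    # the parsed text is
```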

Anatomy of an XML Document

• Root Element

<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>

  – Uniquely named element
  – Contains all the data and links to other documents

Anatomy of an XML Document

• Elements

<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>

  – Define the content of the XML document
  – May contain other elements, character data or can be empty
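
The three kinds of element content listed above (child elements, character data, empty) can be seen by parsing the Book example. This is a sketch using Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """\
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
"""

book = ET.fromstring(BOOK_XML)

# Character data that sits directly inside <Book>, before its first child.
title = book.text.strip()

# Child elements, in document order.
child_tags = [child.tag for child in book]

# <img .../> is an empty element: no character data and no children.
img = book.find("img")
img_is_empty = img.text is None and len(img) == 0

print(title, child_tags, img_is_empty)
```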

Anatomy of an XML Document

• Attributes

<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>

  – Add data about the elements
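
A short sketch of reading the attributes in the catalogue above, again assuming Python's standard xml.etree.ElementTree. Attribute values always arrive as strings, so the prices are converted explicitly; the names prices and cheapest are illustrative.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """\
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
"""

catalog = ET.fromstring(CATALOG_XML)
subject = catalog.get("Subject")

# Map each title to its price; the parser does not infer numeric types.
prices = {
    book.get("Title"): float(book.get("Price"))
    for book in catalog.findall("Book")
}
cheapest = min(prices, key=prices.get)

print(subject, cheapest, prices[cheapest])
```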

Anatomy of an XML Document

• Handling reserved characters
  – Built-in entities
      &  =  &amp;
      <  =  &lt;
      >  =  &gt;
      "  =  &quot;
      '  =  &apos;
  – CDATA Sections

<CodeSnippet>
  <![CDATA[if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
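
Both approaches should give a parser the same character data. The sketch below checks that with Python's standard library: xml.sax.saxutils.escape produces the entity form and xml.etree.ElementTree parses both variants. The sample string raw is an illustrative assumption rather than the snippet from the slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (x < 5 && y >= 10) cerr << "out of range";'

# escape() handles &, < and > by default; the extra mapping adds the
# two quote characters so all five built-in entities are covered.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Option 1: entity references in ordinary character data.
via_entities = ET.fromstring(f"<CodeSnippet>{escaped}</CodeSnippet>").text

# Option 2: the raw text wrapped in a CDATA section.
# (This would break if raw itself contained the sequence "]]>".)
via_cdata = ET.fromstring(
    f"<CodeSnippet><![CDATA[{raw}]]></CodeSnippet>"
).text

print(escaped)
print(via_entities == raw, via_cdata == raw)
```

CDATA keeps embedded code readable in the source document, while entity references work anywhere character data is allowed, including attribute values.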

Anatomy of an XML Document

• Namespaces
  – Preventing naming collisions

<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
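
The order above has two different title elements, and the namespace is what tells them apart. A minimal sketch with Python's standard xml.etree.ElementTree follows; the short prefixes c, b and o in the NS mapping are local choices for the query and need not match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """\
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
"""

# Prefix-to-URI mapping used only by the queries below.
NS = {
    "c": "http://www.example.com/custDetails",
    "b": "http://www.example.com/bookDetails",
    "o": "http://www.example.com/order",
}

order = ET.fromstring(ORDER_XML)

# Same local name "title", different namespaces, different values.
customer_title = order.findtext("c:title", namespaces=NS)
book_title = order.findtext("b:title", namespaces=NS)

# Unprefixed elements belong to the default namespace declared with xmlns=.
order_number = order.findtext("o:orderNumber", namespaces=NS)

print(customer_title, book_title, order_number)
```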

Conformance and Validation

• All XML processors must check well-formedness constraints
  – One root element
  – Start and end tags match <Tag>content</Tag>
  – Empty elements are terminated as <Tag/>
  – Tags are correctly nested <Parent><Child></Child></Parent>
  – All attribute values enclosed in "quotes"
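
The constraints above can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError for input that is not well-formed; the helper name is_well_formed and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Each sample is paired with the outcome the rules on the slide predict.
samples = {
    "<Tag>content</Tag>": True,                  # start and end tags match
    "<Tag/>": True,                              # empty element
    "<Parent><Child></Child></Parent>": True,    # correct nesting
    '<Tag attr="value"/>': True,                 # quoted attribute value
    "<Tag>content</tag>": False,                 # tag names are case sensitive
    "<Parent><Child></Parent></Child>": False,   # overlapping tags
    "<Tag attr=value/>": False,                  # unquoted attribute value
    "<A/><B/>": False,                           # more than one root element
}

results = {text: is_well_formed(text) for text in samples}
for text, ok in results.items():
    print(f"{text!r}: {'well-formed' if ok else 'rejected'}")
```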

Conformance and Validation

• Validating XML processors check against validity constraints
  – specified in Document Type Definitions (DTDs) or Schemas
  – a valid XML document must be well-formed
  – a well-formed document need not necessarily be valid

Document Type Definitions

• DTD syntax able to specify
  – Structure and order of child elements

<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>

  – Element attributes
    • limited number of data types
    • default and fixed attribute values

<!ATTLIST Product EffDate CDATA #IMPLIED>

Document Type Definitions

• DTDs
  – Easy to understand and implement
  – Lightweight alternative to schemas
  – But…
    • use non-XML syntax
    • only limited support for data typing and namespaces
    • difficult to extend

Schemas

• W3C Schema
  – Uses XML syntax
  – Provides built-in and supports user-defined data types
  – Supports namespaces
  – Provides several extensibility mechanisms

Schemas

• Schemas therefore more flexible…

<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>

<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>

• but harder to understand than DTDs
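
To make the difference concrete, the sketch below hand-codes the constraints that the Product declarations express. It is not a schema or DTD validator: Python's standard library has no validating parser, so check_product is a hypothetical helper that only approximates the rules, and the date check is looser than xs:date on some Python versions. A real project would use a validating processor instead.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def check_product(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    product = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if product.tag != "Product":
        return [f"unexpected root element <{product.tag}>"]
    problems = []

    # Structure and order, which both the DTD and the schema can express:
    # (Name, Size?) means one Name followed by at most one Size.
    child_tags = [child.tag for child in product]
    if child_tags not in (["Name"], ["Name", "Size"]):
        problems.append(f"children {child_tags} do not match (Name, Size?)")

    # Data typing, which only the schema expresses: xs:positiveInteger.
    size = product.findtext("Size")
    if size is not None:
        size = size.strip()
        if not (re.fullmatch(r"\+?[0-9]+", size) and int(size) > 0):
            problems.append(f"Size {size!r} is not a positive integer")

    # xs:date on the optional EffDate attribute (approximated as ISO 8601).
    eff_date = product.get("EffDate")
    if eff_date is not None:
        try:
            date.fromisoformat(eff_date)
        except ValueError:
            problems.append(f"EffDate {eff_date!r} is not a date")
    return problems


good = '<Product EffDate="2004-01-31"><Name>Widget</Name><Size>10</Size></Product>'
bad = '<Product EffDate="yesterday"><Name>Widget</Name><Size>big</Size></Product>'
print(check_product(good))
print(check_product(bad))
```

The second document has the right structure, so the DTD version (where Size is #PCDATA and EffDate is CDATA) has nothing to object to; only the typed schema version rules it out.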

In Summary…

• A language for describing markup languages
• Extensible, i.e. define own tags
• Readable, structured and self-describing
• Documents must be well-formed
• Documents may be validated using DTDs and/or Schemas

Find Out More

• World Wide Web Consortium
  – www.w3.org
• W3C XML v1.0 Specification
  – http://www.w3.org/TR/REC-xml

Find Out More

• The XML Industry Portal
  – www.xml.org
• O’Reilly XML site
  – www.xml.com
• XML Cover Pages
  – www.oasis-open.org/cover/
• Café Con Leche
  – www.ibiblio.org/xml/

Find Out More

• Scottish Health and Community Care XML Steering Group
  – www.isdscotland.org/xml

XML Tools

• XSV - Open Source XML Schema Validator
  – www.ltg.ed.ac.uk/~ht/xsv-status.html
• MSXML 4.0
  – www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42

XML Tools

• XML Spy 2004 IDE
  – www.altova.com/products_ide.html
• Free XML Tools and Software
  – www.garshol.priv.no/download/xmltools/

Printed Sources

• Numerous printed sources – for more information visit
  – Charles F. Goldfarb's www.xmlbooks.com
  – www.amazon.com