XML 1
Introduction, Syntax, DTDs and XSDs

Dr Kevin McManus
http://staffweb.cms.gre.ac.uk/~mk05/web/XML/1/

© 2015 the University of Greenwich

Post on 21-Dec-2015



XML Basics

This lecture aims to cover:
  What is XML and why it is significant
  Content versus presentation
  Displaying XML documents
  What XML is actually used for
  Well-formed XML documents
  Further XML syntax
  Valid XML documents
  Introduction to Schemas, DTD and XSD
  Namespaces
  Technologies related to XML

What is XML?

1. eXtensible Markup Language

HTML tags and attributes are restricted to those that the browser has been coded to recognise

XML is extensible because tags and attributes can be invented to suit any application e.g.

<book>
  <ISBN>1-34565-79-8</ISBN>
  <date>2011-07-03</date>
  <title>Hamsters and other Furry Rodents</title>
</book>

What is XML?

2. A simplified version of SGML (Standard Generalized Markup Language)
  a language for defining mark-up languages
  XML and HTML are related via SGML, hence the family likeness

[Diagram: XML is a subset of SGML. HTML and other SGML languages are defined using SGML. XHTML and other XML languages are defined using XML.]

What is XML?

SGML is too complex for
  the average human to cope with
  easy automatic processing
Generic tools for manipulating SGML documents are expensive, large and complex

XML is designed for
  ease of use
  easy automatic processing
Generic tools for manipulating XML documents are relatively cheap and efficient

What is XML?

3. A W3C standard
  http://www.w3.org/XML/
  the core specification is XML 1.0
4. A pervasive technology
  but pervasive things can be a bit difficult to get a handle on
5. More than just hype
  although it has been heavily hyped

W3C Design Goals of XML

1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

http://www.w3.org/TR/REC-xml/#sec-origin-goals

Why XML?

HTML tags and attributes are pre-defined in the HTML (XHTML) standard and describe presentation
XML tags and attributes are defined to describe content and structure
  XML is used to model data
XML separates content from presentation

Separation of Content and Presentation

XML:
<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>
  content and meaning is clear
  presentation ?????

HTML:
<tr>
  <td>1-56543-87-9</td>
  <td>1998-03-07</td>
  <td>Frogs and Toads of the British Isles</td>
</tr>
  content and meaning ?????
  presentation in a web browser is defined

Separation of Content and Presentation

Presentation can be rendered differently for different devices and needs
  web browser on a PC
  app
  printed paper
  mobile phone
  audio
  assistive technology

<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>

Separation of Content and Presentation

Enables meaningful searches

[Diagram: an XML search engine runs a query over many documents of the following form]

<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>

query: FIND book WHERE ISBN=
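A query of this kind can be sketched with Python's standard xml.etree.ElementTree module, which supports a small subset of XPath. This is only an illustration: the <booklist> wrapper, the CATALOGUE data and the find_by_isbn helper are assumptions made for the example, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: a <booklist> wrapper is assumed around the <book> elements.
CATALOGUE = """
<booklist>
  <book>
    <ISBN>1-34565-79-8</ISBN>
    <date>2001-07-03</date>
    <title>Hamsters and other Furry Rodents</title>
  </book>
  <book>
    <ISBN>1-56543-87-9</ISBN>
    <date>1998-03-07</date>
    <title>Frogs and Toads of the British Isles</title>
  </book>
</booklist>
"""

def find_by_isbn(xml_text, isbn):
    """Return the titles of all <book> elements whose <ISBN> child equals isbn."""
    root = ET.fromstring(xml_text)
    # The [ISBN='...'] predicate selects books by the text of their ISBN child.
    books = root.findall(f"./book[ISBN='{isbn}']")
    return [book.findtext("title").strip() for book in books]

print(find_by_isbn(CATALOGUE, "1-56543-87-9"))
```

Because the tags name the content, the search can target the ISBN field directly instead of scanning every table cell for a matching string.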

Separation of Content and Presentation

A format for data exchange and communication

[Diagram: a book publisher and a book retailer, one running SQL Server on Windoze and the other Oracle on UNIX, exchange data as XML]

Separation of Content and Presentation

Data storage

An alternative to Database technology?
  Not really, XML is not a replacement for an RDBMS but may be used in places where a full RDBMS may be overkill
XML schemas are well established but the development of XML ontologies continues
  e.g. OWL, DAML, OIL

An ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that exist in a particular domain of discourse
Source: Wikipedia

Displaying XML documents

XML documents define content but not presentation
Some browsers can conveniently display XML documents as a hierarchical structure

Displaying XML documents

So how do you tell browsers (or other presentation software) how to display documents that use XML defined tags?
  using style sheets of course
There are two main style sheet languages
  CSS – Cascading Style Sheets
  XSL – eXtensible Stylesheet Language
XSL is much more complex and powerful
  XSL-FO and XSLT
For now we'll just use CSS to explore some possibilities
  we will look at XSLT later

XML document + style sheet => presentable document

Displaying XML documents

books-css.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="books.css"?>
<booklist>
  <book>
    <ISBN>1-34565-79-8</ISBN>
    <date>2001-07-03</date>
    <title>Hamsters and other Furry Rodents</title>
  </book>
  <book>
    <ISBN>1-56543-87-9</ISBN>
    <date>1998-03-07</date>
    <title>Frogs and Toads of the British Isles</title>
  </book>
</booklist>

books.css

book { display:block }

ISBN { display:inline; font-family:arial; color:blue; font-size:10pt; font-weight:bold }

title { display:inline; font-family:arial; }

date { display:none }

Displaying XML documents

This example mixes three XML languages that the browser understands:
XHTML + SVG + MathML

note filename is .xml


Displaying XML documents

note filename is .html

Well Formed and Valid XML Documents

An XML document that conforms to the strict syntax rules in the XML 1.0 specification can be considered to be well-formed
  makes life easy for an XML parser
In addition, an XML document can be considered to be valid if it conforms to a set of language rules defined in a schema, either
  a Document Type Definition (DTD) or
  an XML Schema (XSD)
XML documents don't need to have an associated DTD or XSD
  in which case they can only be checked for being well formed but not for validity
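A well-formedness check can be sketched in Python using the standard library parser, which rejects any document that breaks the XML syntax rules. The is_well_formed helper is a name invented for this example. Note that xml.etree.ElementTree is a non-validating parser, so it says nothing about validity against a DTD or XSD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>Frogs</title></book>"))   # properly nested
print(is_well_formed("<book><title>Frogs</book></title>"))   # overlapping tags
```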

XML Syntax Rules

1. Document has a single root element
2. Tags must be properly nested
  - no overlapping tag pairs
3. All tags must have a closing tag
  - or be self closing
4. Tag names are case sensitive
5. Tag attributes are in the opening tag
  - unique attribute name
  - attribute value must be quoted

XML Syntax Rules

1. Only one root element is allowed in a document
  This is called the document element

not well formed:
<head>
  <title>Some HTML doc</title>
</head>
<body>
  A bit of text
</body>

well formed:
<html>
  <head>
    <title>Some HTML doc</title>
  </head>
  <body>
    A bit of text
  </body>
</html>

To be well-formed an XML document must have a document element that encloses all the other elements

XML Syntax Rules

2. All elements must be "properly nested"

Any element contained inside another element has to be completely contained within it
  you can't have one element partly within another

The following may work as HTML but it is not well formed XML

<b>bold text <i>bold italic text</b> italic text</i>

Whereas this is well formed XML (XHTML)

<b>bold text <i>bold italic text</i></b><i> italic text</i>

Both are intended to display as: bold text bold italic text italic text

XML Syntax Rules

Rules 1 and 2 combined mean that it is always possible to represent an XML document as a simple hierarchical tree

<html>
  <head><title>Some XHTML</title></head>
  <body>
    <h1>Some XHTML</h1>
    <p>A bit of text</p>
  </body>
</html>

html
  head
    title  Some XHTML
  body
    h1  Some XHTML
    p  A bit of text
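The tree view can be reproduced with a short recursive walk. The sketch below uses Python's standard xml.etree.ElementTree; the outline function and DOC constant are names made up for this example.

```python
import xml.etree.ElementTree as ET

DOC = ("<html><head><title>Some XHTML</title></head>"
       "<body><h1>Some XHTML</h1><p>A bit of text</p></body></html>")

def outline(element, depth=0):
    """Return one indented line per element, followed by its text if it has any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:          # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(DOC))))
```

The recursion only works because every element is completely contained in exactly one parent, which is what rules 1 and 2 guarantee.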

XML Syntax Rules

3. All elements must have a closing tag

The following may be acceptable as HTML but is not well-formed XHTML

<p>first paragraph
<p>second paragraph

Whereas this is

<p>first paragraph</p>
<p>second paragraph</p>

If the tag is truly empty (i.e. it has no content) then the empty tag notation may be used so…

<hr></hr>

may be rewritten as

<hr/>

XML Syntax Rules

4. Tag names are case sensitive

<title> is different to <Title> is different to <TITLE>

closing tags must match case of course ...

<title>Hamsters and other Furry Rodents</TITLE>

...would clearly be wrong

XML Syntax Rules

5. Some rules concerning attributes

Start tags and empty tags but not end tags can contain attributes
Attributes always exist as name="value" pairs
The attribute value must always be quoted with " or '
The attribute name must be unique within the tag
Some bad attribute examples...

<film rating=PG>Snow White</film>

<car colour='silver trim' colour="red body">Ford Ka</car>

<transaction>credit</transaction id="12543">

<transaction synchronised>close account</transaction>
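Each of the bad examples above should be rejected by any conforming parser. The following sketch feeds them to Python's standard xml.etree.ElementTree and reports the error raised; parse_error is a helper name invented here, and the exact wording of the messages depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

BAD_EXAMPLES = [
    '<film rating=PG>Snow White</film>',                              # unquoted value
    """<car colour='silver trim' colour="red body">Ford Ka</car>""",  # duplicate name
    '<transaction>credit</transaction id="12543">',                   # attribute in end tag
    '<transaction synchronised>close account</transaction>',          # name without value
]

def parse_error(xml_text):
    """Return the parser's error message, or None if the text is well formed."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

for example in BAD_EXAMPLES:
    print(example, "->", parse_error(example))
```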

Some More XML Syntax

Knowing about elements (i.e. tags), attributes and well-formed documents allows you to create basic XML documents
Other aspects of XML syntax include
  XML declaration
  processing instructions
  comments
  character references and Entities
  special symbols
  CDATA sections

XML Declaration

Ideally all XML documents should start with an XML declaration (SGML processing instruction)
  <?xml version="1.0" encoding="UTF-8"?>
If included the declaration must:
  be the very first thing in the document, with nothing (not even whitespace) before it
  begin with <?xml and end with ?> (conventionally on a single line)
  include version= to indicate the version of XML
    in practice this is almost always "1.0"
The declaration may optionally include:
  encoding= indicates the encoding used to store the file
    typically this is "UTF-8" (8 bit Unicode)
  standalone="[yes|no]" does the document depend on external markup declarations?

Processing Instructions

Instructions intended for an application processing the XML document
Processing instructions have the form
  <?target instruction ?>
  target identifies the program that the instruction is intended for
  instruction is the instruction to the target program
A very common PI is
  <?xml-stylesheet href="mystyle.css" type="text/css" ?>
  here the target is xml-stylesheet and the instruction is href="mystyle.css" type="text/css"
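To see how a parser separates the two parts, the sketch below reads the processing instruction with Python's standard xml.dom.minidom. The one-element <booklist/> document is an assumption made to keep the example short.

```python
from xml.dom import Node, minidom

DOC = '<?xml-stylesheet href="mystyle.css" type="text/css" ?><booklist/>'

dom = minidom.parseString(DOC)
# A processing instruction placed before the root element is a child of the document node.
pi = dom.childNodes[0]

print(pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE)
print(pi.target)        # the program the instruction is meant for
print(pi.data.strip())  # the instruction itself, passed through uninterpreted
```

The parser does not interpret the instruction text; the pseudo-attributes href and type are only meaningful to the application named by the target.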

Character References

As in HTML these can be used to include non-standard characters in the document
  i.e. things that can be displayed but not easily entered from a standard keyboard
Format is:
  &#DDD; or &#xHHH;
  DDD is the decimal number or HHH is the hex number representing the character in the character set

<test>it&#39;s Greek to me &#934; &#916; &#x394;</test>

is displayed as

it's Greek to me Φ Δ Δ
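A parser resolves character references before the application sees the text. This small sketch, using Python's standard xml.etree.ElementTree, parses the example above and shows that decimal 916 and hexadecimal 394 name the same character.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<test>it&#39;s Greek to me &#934; &#916; &#x394;</test>")

# The references have already been replaced by the characters they stand for.
print(element.text)

# Decimal 916 and hex 394 are the same code point, so both produce a capital delta.
print(916 == 0x394, chr(916) == chr(0x394))
```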

Entities

Some symbols have a special meaning in XML and can be entered as entities (or character references)
Standard symbols
  less than symbol (<) - &lt;
  greater than symbol (>) - &gt;
  quotation mark (") - &quot;
  apostrophe (') - &apos;
  ampersand (&) - &amp;
Note that copyright (©) - &copy; is predefined in HTML but not in XML, where it must be declared (e.g. in a DTD) or written as the character reference &#169;
Customised ones
  e.g. &copyw; to insert a predefined (e.g. in a DTD) copyright statement
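When generating XML from a program the special symbols have to be replaced by their entities. A minimal sketch using Python's standard xml.sax.saxutils is shown below; the sample string is made up for the example.

```python
from xml.sax.saxutils import escape, unescape

raw = 'Fish & Chips <"hot">'

# escape() always replaces &, < and >; quotes need an explicit extra mapping.
encoded = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(encoded)

# unescape() reverses the process when given the inverse mapping.
decoded = unescape(encoded, {"&quot;": '"', "&apos;": "'"})
print(decoded == raw)
```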


Character References and Entities

CDATA Sections

A way of including data that you don't want interpreted as XML
  character data not to be parsed
Form is
  <![CDATA[the data not to be interpreted as XML]]>
Why would you do this?
  to hide executable JavaScript in an XML document
  perhaps to include examples of badly formed XML in an XML document e.g.
    <![CDATA[ <wrong xml attr=val />]]>

Comments like HTML use <!-- and -->
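The effect of a CDATA section can be checked with Python's standard xml.etree.ElementTree. In this sketch the <example> wrapper element is an assumption added so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

DOC = "<example><![CDATA[ <wrong xml attr=val />]]></example>"
element = ET.fromstring(DOC)

# The badly formed markup arrives as plain text, not as a child element.
print(repr(element.text))
print(len(element))  # number of child elements
```

Without the CDATA wrapper the same content would stop the parser with an error, because attr=val is an unquoted attribute value.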

XML Applications

Used by current generation user agents
  eXtensible Hypertext Markup Language XHTML
  Scalable Vector Graphics SVG
  Mathematical Markup Language MathML
Other human-facing client software
  Synchronised Multimedia Integration Language SMIL
    only supported by the Real browser
  Voice Extensible Markup Language VoiceXML (VXML)
    specialised industry and commerce applications

XML Applications

Standard vocabularies for representing and exchanging specialist data
  e.g. legal, scientific, medical, mathematical vocabularies

<molecule convention="MDLMol" id="dopamine" title="DOPAMINE">
  <date day="22" month="11" year="1995">
  </date>
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">0.0222</float>
      <float builtin="y2">0.8115</float>
    </atom>

XML Applications

Meta data (data about data) to describe resources e.g.
  Resource Description Framework RDF
  DARPA Agent Markup Language DAML
  Ontology Integration Language OIL
  Web Ontology Language OWL

<rdf:Description about="http://www.gre.ac.uk/examregs.html">
  <cd:Creator>Fred Bloggs</cd:Creator>
  <cd:Date>20021212</cd:Date>
</rdf:Description>

XML Applications

Buried deep in application communications
  SOAP, XML-RPC, WSDL, UDDI

  <SOAP-ENV:Body>
    <proc:GetCurrentPrice xmlns:proc="proc-URI"/>

Business to business (B2B) data exchange
  ebXML

  <BusinessPartnerRole name="Buyer">
    <Performs initiatingRole="Buyer"/>

Probably of more value to B2B than a B2C website focussed e-commerce
  competes with JSON in B2C applications

XML Technologies

Applications of XML
  CML, MathML, WML, VoiceML, XHTML, SMIL, SVG, RDF, SOAP, UDDI, WSDL, ebXML etc. etc.
Core XML
  Syntax, DTD, XSD, Namespaces
Supporting Specifications
  XPath, XLink, XPointer, XQuery, XSLT, XSL-FO, CSS, DOM etc.
Supporting Tools
  Browsers – IE, Mozilla
  APIs – DOM, SAX
  Parsers – Expat, MSXML, Xerces
  IDEs – XMLSpy, Stylus

DTD and XSD Schemas

Document Type Definitions (DTD) and XML Schemas (XSD) are alternative ways of defining an XML language
They contain rules to specify the vocabulary and grammar of a language
  the tags and attributes in the vocabulary
  permissible values for attributes
  optional and mandatory tags and attributes
  tag nesting rules
XML languages defined by DTDs or schemas are used to create valid XML documents

DTD and XSD Schemas

For an XML document to be valid it must conform to the rules specified in its DTD or XSD

[Diagram: a DTD or XSD defines an XML language and is an encapsulated definition of the data model. The valid XML documents surrounding it are XML documents that use the language defined in the DTD or XSD.]

Why do we need valid documents?

Applications must validate all incoming data
  data i/o is a major source of system error
  check that required elements are present
  check that attribute values are appropriate
A DTD or XSD represents an agreed data model in a machine readable form
  can be processed by standard software
COTS code used at each end to generate and check the data
  validating parsers

[Diagram: an Estate Agent and a Mortgage Broker exchange XML in an agreed format]

DTD and XSD Schemas

DTDs
  easy for humans to cope with
  older than schemas
    supported by a much wider range of XML tools and software
  have poor support for namespaces
XSDs
  more verbose
  much more expressive than DTDs
    data types, constraints on values
  an XML based vocabulary
    can be manipulated with general purpose XML tools
  namespace support

Defining DTDs

As an example we shall develop a DTD for an XML document type intended to list books recommended by lecturers for various courses. The first version of such documents will have the following structure:

  root element is recommended_books
  the root element contains zero or more book elements
  each book element contains the following elements: author, title, year_published, publisher, course and recommended_by
  the author and recommended_by elements both consist of firstname and surname elements

goodbooks1.xml

<?xml version="1.0" encoding="UTF-8"?>
<recommended_books>
  <book>
    <author>
      <firstname>Stephen</firstname>
      <surname>Spainhour</surname>
    </author>
    <title>Webmaster in a Nutshell</title>
    <year_published>1999</year_published>
    <publisher>O'Reilly</publisher>
    <course>WAT</course>
    <recommended_by>
      <firstname>Gill</firstname>
      <surname>Windall</surname>
    </recommended_by>
  </book>
  <book>
    <author>
      <firstname>Benoît</firstname>
      <surname>Marchal</surname>
    </author>
    <title>Applied XML Solutions</title>
    <year_published>2000</year_published>
    <publisher>Sams</publisher>
    <course>WAT</course>
    <recommended_by>
      <firstname>Kevin</firstname>
      <surname>McManus</surname>
    </recommended_by>
  </book>
</recommended_books>

Note how the firstname and surname elements appear in both author and recommended_by elements

None of the tags in this example contain attributes

goodbooks1.dtd

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT recommended_books (book*)>
<!ELEMENT book (author, title, year_published, publisher, course, recommended_by)>
<!ELEMENT author (firstname, surname)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year_published (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT recommended_by (firstname, surname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>

contains 10 element definitions

goodbooks1.dtd

type         element / tag name    element contents
<!ELEMENT    recommended_books     (book*)>
<!ELEMENT    book                  (author, title, year_published, publisher, course, recommended_by)>
<!ELEMENT    author                (firstname, surname)>
<!ELEMENT    title                 (#PCDATA)>
<!ELEMENT    year_published        (#PCDATA)>
<!ELEMENT    publisher             (#PCDATA)>
<!ELEMENT    course                (#PCDATA)>
<!ELEMENT    recommended_by        (firstname, surname)>
<!ELEMENT    firstname             (#PCDATA)>
<!ELEMENT    surname               (#PCDATA)>

goodbooks1.dtd

The DTD can be read as meaning:
  recommended_books contains zero or more book elements
  each book element contains in order the elements:
    author
    title
    year_published
    publisher
    course
    recommended_by
  the author and recommended_by elements both consist of firstname and surname elements
  the title, year_published, publisher, course, firstname and surname elements consist of text
    the actual data

DTD syntax

Expression      Meaning of contents
eleA?           eleA is optional
eleA+           eleA occurs one or more times
eleA*           eleA occurs zero or more times
eleA | eleB     eleA or eleB occurs but not both
eleA, eleB      eleA is followed by eleB
(eleA,eleB)*    parentheses ( ) are used to group elements so this means zero or more occurrences of eleA followed by eleB
#PCDATA         parsed character data - a string of text
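The expressions in the table behave much like regular expressions over the sequence of child elements. The sketch below makes that analogy concrete in Python by translating an element-only content model into a regular expression. It is an illustration under stated assumptions rather than how a real validating parser works: content_model_to_regex and matches are names invented here, and models containing #PCDATA are not handled.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD element-only content model into a regular expression.

    The regex is meant to be matched against child names joined as "a,b,c,".
    """
    parts = []
    # Tokens are element names, parentheses, | and the occurrence marks ? * +.
    # Commas and whitespace are skipped: sequence becomes simple concatenation.
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token in ")|?*+":
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + ",)")
    return "".join(parts)

def matches(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + "," for name in children)
    return re.fullmatch(content_model_to_regex(model), sequence) is not None

BOOK = "(author, title, year_published?, publisher, course+, recommended_by)"

print(matches(BOOK, ["author", "title", "publisher", "course", "course", "recommended_by"]))
print(matches(BOOK, ["author", "title", "publisher", "recommended_by"]))  # no course
```

The trailing comma after each name stops one element name from matching as a prefix of another.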

Four Element Forms

Empty Elements have no element content
  can still contain information in attributes
Element-Only Elements contain only child elements
  content model is a list of child elements arranged using the expressions listed in the previous table
Text-Only Elements contain only character data (text)
  content model is simply #PCDATA
Mixed Elements contain both child elements and character data
  content model must contain a choice list beginning with #PCDATA
  the rest of the choice list contains the child elements
  it must end in an asterisk indicating that the entire choice group is optional
  although this constrains the type of child element it does not constrain the order or quantity

Quick Quiz

Here's a DTD

<!ELEMENT transactions (tran*)>
<!ELEMENT tran (account, (debit|credit)?)>
<!ELEMENT account (#PCDATA)>
<!ELEMENT debit (#PCDATA)>
<!ELEMENT credit (#PCDATA)>

Why is the following not a valid document according to the DTD?

<transactions>
  <tran><account>7652</account></tran>
  <tran><account>9856</account><credit>23.56</credit></tran>
  <tran><account>0085<debit>45.50</debit></account></tran>
  <tran>
    <account>1134</account>
    <debit>100.00</debit><credit>23.56</credit>
  </tran>
</transactions>

goodbooks2.xml

Extending the recommended books example to include attributes
The definition of the document type is changed to:
  make the year_published element optional
  allow more than one course to be referenced
  include a rating attribute of the book element
    which can take the values "ok" or "good" or "excellent" and has a default value of "ok"

goodbooks2.xml

<?xml version="1.0" encoding="UTF-8"?>
<recommended_books>
  <book rating="excellent">
    <author>
      <firstname>Stephen</firstname>
      <surname>Spainhour</surname>
    </author>
    <title>Webmaster in a Nutshell</title>
    <year_published>1999</year_published>
    <publisher>O'Reilly</publisher>
    <course>WAT</course>
    <course>Internet Publishing</course>
    <recommended_by>
      <firstname>Gill</firstname>
      <surname>Windall</surname>
    </recommended_by>
  </book>
  <book rating="good">
    <author>
      <firstname>Benoît</firstname>
      <surname>Marchal</surname>
    </author>
    <title>Applied XML Solutions</title>
    <publisher>Sams</publisher>
    <course>WAT</course>
    <recommended_by>
      <firstname>Kevin</firstname>
      <surname>McManus</surname>
    </recommended_by>
  </book>
</recommended_books>

Things to note:
  both book elements carry a rating attribute
  the first book has a repeated course element
  the second book has omitted year_published

goodbooks2.dtd

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT recommended_books (book*)>
<!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>
<!ATTLIST book rating (ok | good | excellent) "ok">
<!ELEMENT author (firstname, surname)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year_published (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT recommended_by (firstname, surname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>

Things to note:
  year_published is now optional
  course can occur more than once
  new rule defining a rating attribute for the book element

Attribute Rules

"ok" is the default value from the rating enumerated series
Other attribute definitions are possible:
  #REQUIRED – the attribute is required
  #IMPLIED – the attribute is optional
  #FIXED value – the attribute has a fixed value (constant)
As well as enumerated attribute types there are:
  CDATA – unparsed character data
  NOTATION – declared elsewhere in the DTD, usually a mime type
  ENTITY – declared elsewhere in the DTD as an ENTITY (same as a name)
  ID – unique identifier
  IDREF – reference to an ID elsewhere in the document
  NMTOKEN – name containing only token characters, i.e. no whitespace
Attributes can be defined anywhere in the DTD but are usually placed immediately after the corresponding element
Multiple attributes for an element are declared in a single attribute list

<!ATTLIST book rating (ok | good | excellent) "ok"
               reviewer CDATA #IMPLIED>


Not so Quick Quiz

How do you decide if information should be in an element or an attribute?

http://bit.ly/1ISG2wF

Linking the DTD to the XML document

An XML document can refer to an external DTD using <!DOCTYPE >

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recommended_books SYSTEM "goodBooks2.dtd">
<recommended_books>
  <book rating="excellent">
    <author>
      <firstname>Stephen</firstname>
      ......

In the <!DOCTYPE > declaration recommended_books is the name of the root element and "goodBooks2.dtd" is the URL of the document containing the DTD

Linking the DTD to the XML document

Alternatively the DTD can be included inline within the XML document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recommended_books [
  <!ELEMENT recommended_books (book*)>
  <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>
  <!ATTLIST book rating (ok | good | excellent) "ok">
  <!ELEMENT author (firstname, surname)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT year_published (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT course (#PCDATA)>
  <!ELEMENT recommended_by (firstname, surname)>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
]>
<recommended_books>
  <book rating="excellent">
    <author>
      <firstname>Stephen</firstname>

Quick Quiz

Suppose we want to define an element that can contain a mixture of other elements and plain text

<about>This program was brought to you by
<a href="http://www.webbedwonders.co.uk">Webbed Wonders</a>.
We can be contacted at
<address>
  <line>Lettuce Towers</line>
  <line>Braythorpe Street</line>
  <line>Wessex</line>
  <postcode>WA1 7QT</postcode>
</address>
Thank you for your interest.</about>

Which of the following do you think is the correct way of specifying in a DTD the <about> element as used above?

1. <!ELEMENT about (a, address)>
2. <!ELEMENT about (#PCDATA | a | address)*>
3. <!ELEMENT about (#PCDATA, a, address)*>
4. <!ELEMENT about (#PCDATA, a, #PCDATA, address, #PCDATA)>
5. It's not possible because the document isn't well-formed.

What else can you do with DTDs?

Specify that an attribute value is unique within a document
  a bit like a primary key in a database table

  <!ATTLIST BankBranch BranchID ID #REQUIRED>

Specify that the value of one attribute refers to an attribute of type ID using an attribute of type IDREF
  like a foreign key

  <!ATTLIST account branch IDREF #REQUIRED>
  .......
  <BankBranch BranchID="SC30_00_02">
  .......
  <account branch="SC30_00_02">

The ID value must be a valid name so cannot start with a digit (0-9)
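A validating parser enforces the ID and IDREF rules from the DTD. The sketch below imitates the foreign key check by hand in Python, with the attribute names passed in explicitly because the standard library parser does not read them from the DTD. The <bank> wrapper, the second account and the dangling_references helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DOC = """
<bank>
  <BankBranch BranchID="SC30_00_02"/>
  <account branch="SC30_00_02"/>
  <account branch="SC99_99_99"/>
</bank>
"""

def dangling_references(xml_text, id_attr, idref_attr):
    """Return the idref_attr values that do not match any id_attr value."""
    root = ET.fromstring(xml_text)
    # Collect every ID value that is declared somewhere in the document.
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    return [
        el.get(idref_attr)
        for el in root.iter()
        if el.get(idref_attr) is not None and el.get(idref_attr) not in ids
    ]

print(dangling_references(DOC, "BranchID", "branch"))
```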

What else can you do with DTDs?

Define your own entities, often commonly used strings e.g.

  <!ENTITY Disclaimer "Umpire decision is final!">
  ........
  <footer>&Disclaimer;</footer>

Define ways of handling non-XML data e.g.

  <!NOTATION png SYSTEM "image/png">
  ........
  <diagram type="png" file="graph.png">

What can you not do with DTDs?

Specify the data type (e.g. integer) of elements or attributes
  the only element data type recognised is string
  attributes can validate enumerated or ID values
Easily mix XML vocabularies from different DTDs
  namespaces are possible but not well supported
Accurately define the structure of a mixed element
  cf. the preceding quick quiz
Because of these and other restrictions there have been a number of initiatives to develop alternatives to the DTD
  W3C supports the XML Schemas XSD specification

goodbooks3.xsd

Re-writing goodbooks2.dtd as an XML schema results in a significantly longer file. This is listed over the next 4 slides with the corresponding DTD for comparison

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="recommended_books">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

DTD equivalent:

<!ELEMENT recommended_books (book*)>

goodbooks3.xsd

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="author"/>
      <xs:element ref="title"/>
      <xs:element ref="year_published" minOccurs="0"/>
      <xs:element ref="publisher"/>
      <xs:element ref="course" maxOccurs="unbounded"/>
      <xs:element ref="recommended_by"/>
    </xs:sequence>
    ......

DTD equivalent:

<!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>

unless stated the value of minOccurs and maxOccurs is 1

goodbooks3.xsd

    ......
    <xs:attribute name="rating" use="optional" default="ok">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="excellent"/>
          <xs:enumeration value="good"/>
          <xs:enumeration value="ok"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

DTD equivalent:

<!ATTLIST book rating (ok | good | excellent) "ok">

Note how the attribute definition is nested within the definition of the book element

goodbooks3.xsd

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="firstname"/>
      <xs:element ref="surname"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

DTD equivalent:

<!ELEMENT author (firstname, surname)>

<xs:element name="title" type="xs:string"/>
<xs:element name="year_published" type="xs:short"/>
<xs:element name="publisher" type="xs:string"/>
<xs:element name="course" type="xs:string"/>

DTD equivalent:

<!ELEMENT title (#PCDATA)>
<!ELEMENT year_published (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT course (#PCDATA)>

note data types

goodbooks3.xsd

<xs:element name="recommended_by">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="firstname"/>
      <xs:element ref="surname"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

DTD equivalent:

<!ELEMENT recommended_by (firstname, surname)>

<xs:element name="firstname" type="xs:string"/>
<xs:element name="surname" type="xs:string"/>

</xs:schema>

DTD equivalent:

<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>

Things to notice about goodbooks3.xsd

XML schemas are much more verbose than DTDs
The XML schemas language itself conforms to XML syntax rules and so can be manipulated using standard XML tools
More specific restrictions can be made on the occurrence of elements than with DTDs e.g.

  <!ELEMENT recommended_books (book*)>
  <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>

  both the above mean the same but in schemas minOccurs and maxOccurs can be used to restrict the number of allowed occurrences
In DTDs the only data type for elements is #PCDATA whereas schemas contain much more support for data types e.g.

  <xs:element name="title" type="xs:string"/>
  <xs:element name="year_published" type="xs:short"/>

  A full range of data types is supported (e.g. boolean, float, dateTime) plus you can define your own.
XML Schemas make use of namespaces

Linking a Schema to an XML document

Not totally standard and somewhat tied to W3C but the method below works with at least some tools that support Schemas

<?xml version="1.0" encoding="UTF-8"?>
<recommended_books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:noNamespaceSchemaLocation="goodbooks3.xsd">
  <book rating="excellent">
    <author>
      <firstname>Stephen</firstname>
      <surname>Spainhour</surname>
      ......

the xsi:noNamespaceSchemaLocation line associates the schema stored in goodbooks3.xsd in the same directory with the XML document

Namespaces

Namespaces are a way of avoiding name conflicts, i.e. where different XML vocabularies use the same names to mean different things.

In designing an XML based language we may want to include elements from several other XML languages e.g. when defining a new XML language to describe invoice documents we may want to draw on existing languages for describing products and customers

[Diagram: InvoiceML draws on ProductML and CustomerML]

Namespaces

What to do about name clashes, e.g. it is likely that ProductML and CustomerML both contain <name> elements

<name>Giant Widget</name>

<name>George Barford</name>

We don't want applications that process InvoiceML to confuse the <name> elements.

Dear Mr Giant Widget,

Your George Barford has been despatched today ...


Namespaces

Namespaces give a mechanism for "qualifying" element names with a prefix so that they are all unique, e.g.

<prod:name>Giant Widget</prod:name>

<cust:name>George Barford</cust:name>

Wherever you see element names including a prefix followed by a ":" you can be sure that namespaces are being used e.g.

<xs:element name="event">

Namespaces

The prefix needs to be defined in the XML document that is using it by including the xmlns attribute. For example to define the prod: and cust: prefixes in an invoice document

<invoices xmlns:prod="http://mycompany.com/products"
          xmlns:cust="http://mycompany.com/customers"
          xmlns="http://mycompany.com/invoices">
  <invoice>
    <invoice_id>2314</invoice_id>
    ....
    <prod:name>Giant Widget</prod:name>
    <cust:name>George Barford</cust:name>
    ....
  </invoice>
</invoices>

  xmlns:prod= is declaring a namespace associated with the prod prefix
  xmlns:cust= is declaring a namespace associated with the cust prefix
  xmlns= is declaring a default namespace that uses no prefix
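How a namespace-aware parser sees the invoice document can be sketched with Python's standard xml.etree.ElementTree. The document below is a trimmed copy of the example above with the elided parts left out. The prefixes inv, p and c in the NS mapping are chosen freely for this example to show that only the namespace names matter, not the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """
<invoices xmlns:prod="http://mycompany.com/products"
          xmlns:cust="http://mycompany.com/customers"
          xmlns="http://mycompany.com/invoices">
  <invoice>
    <invoice_id>2314</invoice_id>
    <prod:name>Giant Widget</prod:name>
    <cust:name>George Barford</cust:name>
  </invoice>
</invoices>
"""

# Prefixes used for searching; they need not match the prefixes in the document.
NS = {
    "inv": "http://mycompany.com/invoices",
    "p": "http://mycompany.com/products",
    "c": "http://mycompany.com/customers",
}

root = ET.fromstring(DOC)
print(root.tag)  # the default namespace is applied to the unprefixed root element

invoice = root.find("inv:invoice", NS)
print(invoice.findtext("p:name", namespaces=NS))
print(invoice.findtext("c:name", namespaces=NS))
```

The two name elements are now distinct to the application, so the product and customer names cannot be confused.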

Namespaces

In the previous example it is tempting to guess that this line…

<invoices xmlns:prod="http://mycompany.com/products" xmlns:cust="http://mycompany.com/customers" xmlns="http://mycompany.com/invoices">

associates the prod: prefix with an XSD located at

http://mycompany.com/products

and cust: with one at

http://mycompany.com/customers

But these URLs need not be actual locations at all - they are simply unique names used to identify namespaces. URIs (URLs & URNs) are convenient ways of specifying unique values.

There is a way of tying prefixes to actual XSDs (but not DTDs) so that documents can be validated against multiple Schemas. The syntax is both messy and unclear.

References

There are masses of XML books and websites.
SAMS Teach Yourself XML in 24 hours - Morrison
  Cheap as chips, good scope but little depth
W3Schools online tutorial www.w3schools.com/xml
  Try their online XML test
World Wide Web consortium at http://www.w3.org
  The home of the XML specification and so much more.
XML in practice from http://www.xml.org
  Articles, white papers, user groups and more

Summary

XML is a meta-language used to define application specific markup languages
  XHTML, SVG, MathML, RSS, SOAP, WSDL, etc., etc.
XML is designed to be straightforward and easy to use
XML separates content from presentation
  CSS and XSL can be used to render XML documents in a readable form
  more on XML rendering next week
XML provides simple syntactic rules that result in well-formed hierarchically structured documents
DTDs or XSDs are used to define valid XML languages
DTDs
  are widely supported
  have limited features
XSDs
  are an XML language
  provide tighter specification than DTDs
  provide some support for namespaces


Questions