XML : a brief introduction Managing networks : understanding new technologies, Birmingham, 13 September 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported by: Email [email protected] URL http://www.ukoln.ac.uk/



XML: a brief introduction

• Markup & markup languages
• SGML & XML
• Two perspectives on XML
• Some features of XML
• XML & HTML
• Uses of XML

Markup & markup languages

• Markup – text added to the data content of a document in order to convey information about data
  – markup pre-dates computers!
• Marked-up document contains
  – data and
  – information about that data (markup)

Markup & markup languages

• Markup language – formalised system for providing markup

• Definition of markup language specifies

– what markup is allowed

– how markup is distinguished from data

– what markup means

Exercise 1

• From your own experience, can you suggest
  – some instances of where markup is used?
  – some examples of markup languages?

SGML and XML

Standard Generalized Markup Language (SGML)
  – ISO 8879:1986
  – General, flexible, powerful
  – Used (mainly but not exclusively) in large publishing environments

Extensible Markup Language (XML)
  – Recommendation of W3C, 1998, 2000
  – Subset of SGML
  – Less flexible; easier to implement, use
  – Used (increasingly) everywhere… often invisibly…

Both define a means of describing tree-structured data in text format, using markup embedded in data

SGML and XML

• SGML and XML
  – not strictly markup languages!
  – “meta-languages” - languages for describing markup languages
  – can define unlimited number of markup languages

• All conforming languages can be processed by single program (“parser”)

• Rules made public so any programmer can write parser

• Many parsers available for application developer

• Data independent of platform, vendor
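
As a rough illustration of the "single parser" point, the sketch below uses the parser in Python's standard library (xml.etree.ElementTree) to walk two documents written in different vocabularies with exactly the same code. The vocabularies (memo, order and their child elements) are invented for this example and do not come from the talk.

```python
import xml.etree.ElementTree as ET

# Two documents in different (hypothetical) markup languages.
MEMO = "<memo><to>Ann</to><body>Meeting at <time>10:00</time></body></memo>"
ORDER = "<order id='17'><item qty='2'>Widget</item></order>"

def outline(xml_text):
    """Return (depth, element type name) pairs for every element, in document order."""
    result = []

    def walk(element, depth):
        result.append((depth, element.tag))
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return result

memo_outline = outline(MEMO)
order_outline = outline(ORDER)

# The same generic code prints the tree of either document.
for depth, name in memo_outline + order_outline:
    print("  " * depth + name)
```

The parser needs no knowledge of what memo or order mean; it only relies on both documents following the XML syntax rules.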

A document perspective (1)

• Individual documents have structure
  – component parts
  – relationships between parts
• Physical structure
  – depends on medium
• Logical structure
  – hierarchical, tree structure
  – independent of physical rendition
• Document types
  – set of documents sharing common logical structural model

A document perspective (2)

• Logical structure communicated to human reader through presentational conventions
• Presentation defined by “procedural” markup
  – instructs “agent” what to do with text
    – e.g. how to format it
• Problems
  – markup specific to processing system
  – specific to delivery medium
  – human interprets logical structure but software can’t

A document perspective (3)

• Descriptive markup
  – identifies the logical components of a document
  – does not specify what procedures are to be applied to text
    – so e.g. how to format it must be specified separately
• Benefits
  – markup (potentially) independent of processing system
  – permits reuse and delivery to multiple media
  – makes logical structure available to software
• N.B. exchange requires consensus on what markup means!
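
One way to picture "reuse and delivery to multiple media" is the sketch below: a single descriptively marked-up document is rendered twice by separate pieces of code, once as plain text and once as HTML. The article, title and para element names and both rendering functions are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: says what each part *is*, not how it should look.
DOC = "<article><title>XML</title><para>A brief introduction.</para></article>"

def to_plain_text(root):
    """One possible rendering: title in upper case, then the paragraphs."""
    lines = [root.findtext("title").upper()]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)

def to_html(root):
    """Another rendering of the same document for a different medium.
    No escaping of special characters is done; acceptable for this fixed input only."""
    parts = ["<h1>%s</h1>" % root.findtext("title")]
    parts += ["<p>%s</p>" % p.text for p in root.findall("para")]
    return "".join(parts)

root = ET.fromstring(DOC)
plain = to_plain_text(root)
html = to_html(root)
print(plain)
print(html)
```

The formatting decisions live entirely in the two functions, so either can change without touching the document.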

Exercise 2

• HTML
  – conceived as describing the logical structure of hypertext document
  – acquired features which described presentation
  – extended by browser vendors
• In the HTML examples, can you see
  – where markup describes presentation?
  – where markup describes logical structure?

A data perspective (1)

• The structured document is just one type of structured data

• Other types of structured data can be represented as tree-structures

• A “serialization” syntax is useful for various sorts of structured data (relational, object etc.)

– for exchange between application programs on different platforms, across networks etc.

• SGML too complex, “heavyweight” - but XML ideal
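
A minimal sketch of XML as a serialization syntax, assuming a made-up patient record held in memory as a Python dictionary: one function writes the record out as XML text and another rebuilds it, as a receiving program on a different platform might. The record fields and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A record as an application might hold it in memory (hypothetical data).
record = {"id": "42", "name": "A. Patient", "visits": ["2001-06-01", "2001-09-13"]}

def record_to_xml(rec):
    """Serialise the record as a small XML tree and return it as text."""
    root = ET.Element("patient", {"id": rec["id"]})
    ET.SubElement(root, "name").text = rec["name"]
    visits = ET.SubElement(root, "visits")
    for date in rec["visits"]:
        ET.SubElement(visits, "visit").text = date
    return ET.tostring(root, encoding="unicode")

def xml_to_record(text):
    """Rebuild the record from the XML text, as a receiving program might."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "visits": [v.text for v in root.iter("visit")],
    }

xml_text = record_to_xml(record)
round_tripped = xml_to_record(xml_text)
print(xml_text)
```

Only the text travels between the two programs; each side is free to use whatever internal representation suits it.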

A data perspective (2)

• A “document” might be any collection of information processed as a unit
  – a report
  – a patient record
  – a purchase order transaction
  – a configuration file for an operating system
  – some “structured information about a resource” (a metadata record)
  – …
  – etc!

• Applications less concerned with publishing, formatting, presentation

XML : elements

• XML uses embedded tags to delimit and label parts of document
  – tags <…..>
• Elements
  – containers delimited by tags which include element type name
  – start tag <element>
  – end tag </element>
• Elements may contain
  – character data
  – other elements
  – both of the above
  – nothing (empty elements) <element/>

• Document element as root of element tree
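
The element rules above can be seen in a small example. The sketch below parses an invented report document with Python's standard xml.etree.ElementTree module and inspects the document element, an element with mixed content and an empty element. All element names here are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<report>
  <title>Annual review</title>
  <section><heading>Summary</heading><para>Good <em>progress</em> made.</para></section>
  <pagebreak/>
</report>"""

root = ET.fromstring(DOC)            # the document element, root of the tree
child_names = [child.tag for child in root]
para = root.find("section/para")     # contains both character data and an element
pagebreak = root.find("pagebreak")   # an empty element

print(root.tag, child_names)
print(repr(para.text), para[0].tag, repr(para[0].tail))
print(len(pagebreak), pagebreak.text)
```

In this library the character data before a child element is held in text and the character data after it in the child's tail, which is how mixed content is represented.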

XML : attributes

• Attributes
  – pairs of names and values
  – occur inside element start tag, after element type name
  – <element attribute="value">
• Element can contain only one occurrence of each attribute
• Attribute values may contain
  – character data only

• Attribute values must be surrounded by quotes
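
A short sketch of the attribute rules, again using Python's standard xml.etree.ElementTree parser. The book element, its attribute names and the placeholder ISBN are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<book isbn="0-000-00000-0" lang="en">Title</book>')
isbn = root.get("isbn")          # an attribute value is plain character data
all_attrs = dict(root.attrib)    # name/value pairs from the start tag

def is_well_formed(text):
    """True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The same attribute name twice in one start tag is rejected by the parser.
duplicate_ok = is_well_formed('<book lang="en" lang="fr"/>')

print(isbn, all_attrs, duplicate_ok)
```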

XML : elements & attributes

• Nouns and adjectives?
  – use character data for “content”
  – use attributes for “information about content”

• Document-centric view?

• No hard and fast rules

• Design decisions tend to be based (wrongly!) on behaviour of tools

• XML documents are human-readable…

• … but ease of human-readability may not be the most important consideration in their design

XML : document types & vocabularies

• “XML lets me make up names for element types! Great!”

• But…
  – XML says nothing about what your names mean
  – will a human recipient of your document recognise your <operation> element?
  – will a software agent process your <operation> element correctly?
• Communication requires consensus on
  – structural model of class of document/data
  – labelling of components
  – semantics of components

• Shared use of common XML “vocabularies”

XML : DTDs, XML Schemas

• Two methods to codify syntax rules of vocabulary used to describe document type
  – what markup is allowed
  – structural constraints on use of markup
  – say nothing about what markup means
• Document Type Definition (DTD)
  – inherited from SGML
  – part of XML Recommendation
• XML Schema
  – recent recommendation of W3C
  – support for data-typing i.e. tighter control on element content
  – support for combining vocabularies
  – use XML syntax

XML : Validation & well-formedness

• Validation
  – parser can check markup of individual document against rules expressed in DTD or Schema
  – authoring tool can enforce rules of DTD/Schema while document is edited
• Well-formed documents
  – not checked against DTD/Schema, but do follow basic syntax rules e.g.
    – all tags use proper delimiters
    – all elements have start and end tags
    – all elements nested
    – attribute values in quotes
    – appropriate use of special characters
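
The well-formedness rules listed above can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which checks well-formedness only (it does not validate against a DTD or Schema). The sample documents are invented, one per rule.

```python
import xml.etree.ElementTree as ET

def check(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

samples = {
    "well-formed": "<note><to>Ann</to><body>Fish &amp; chips</body></note>",
    "missing end tag": "<note><to>Ann</note>",
    "bad nesting": "<note><b><i>text</b></i></note>",
    "unquoted attribute": "<note priority=high/>",
    "bare ampersand": "<note>Fish & chips</note>",
}

results = {name: check(text) for name, text in samples.items()}
for name, error in results.items():
    print(name, "->", "OK" if error is None else error)
```

Only the first sample is accepted; the parser stops at the first error in each of the others and reports where it occurred.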

Exercise 3

• Well-formedness
  – Identify the errors which mean that the three examples are not well-formed XML
  – How would you correct the errors?

XML : namespaces (1)

• Applications wish to use elements from multiple vocabularies (DTDs/Schemas)
  – particularly true of metadata applications
• Problems of “name collisions”
  – <surgery> in GPs Directory Schema
  – <surgery> in MPs Appointments Schema
• XML Namespaces
  – recommendation of W3C
  – provides universal naming mechanism
• A Namespace is a collection of names
• A Namespace is itself given a name, which has the form of a URI

XML : namespaces (2)

• Element type names and attribute names can be qualified by a namespace name (a URI)

• Association with namespace through use of a namespace prefix

• Declaration of namespace
  – xmlns:health="http://nhs.gov.uk/xml/"
  – xmlns:parl="http://gov.gov.uk/xml/"
• Use of qualified name
  – <health:surgery>
  – <parl:surgery>
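
The two surgery elements can be put in one document and still told apart. The sketch below uses the namespace URIs and prefixes from the slide; the enclosing directory element and the text content are invented for the example. It uses Python's standard xml.etree.ElementTree, which reports each qualified name as {namespace URI}local-name.

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns:health="http://nhs.gov.uk/xml/"
           xmlns:parl="http://gov.gov.uk/xml/">
  <health:surgery>Dr Smith, Mon-Fri</health:surgery>
  <parl:surgery>MP constituency surgery, Sat</parl:surgery>
</directory>"""

root = ET.fromstring(DOC)

# Each prefix is expanded to its namespace URI, so the two names no longer collide.
tags = [child.tag for child in root]

# The prefix is only a local shorthand: "h" here maps to the same URI as "health" above.
ns = {"h": "http://nhs.gov.uk/xml/"}
health_only = [e.text for e in root.findall("h:surgery", ns)]

print(tags)
print(health_only)
```

It is the URI, not the prefix, that identifies the namespace, which is why a different prefix can be used when searching.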

XML and HTML

• HyperText Markup Language (HTML)
  – recommendation of W3C (version 4.01)
  – designed as an application of SGML (not XML)
  – simple, easy to create
  – (partial?) support in browsers, editors
  – mixes description of structure and presentation
• Browsers
  – permissive – will display invalid HTML
  – support proprietary extensions
• Context
  – explosion of Web
  – new devices

XML and HTML (2)

• XHTML 1.0
  – expression of HTML 4.01 as XML (not SGML)
  – same features but restrictions on syntax
    – case sensitivity, XML well-formedness rules
  – current W3C recommendation for creation of docs for Web
• XHTML 1.1
  – modularisation of XHTML
  – separation of structural markup from presentational markup
  – support for managing extensions
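
To make the syntax restrictions concrete, the sketch below feeds two fragments to an ordinary XML parser (Python's standard xml.etree.ElementTree). The first is written the way much HTML is in practice, with mixed-case tags and an unclosed line break; the second follows the XHTML conventions. Both fragments are made up for the example and are not complete XHTML documents.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(text):
    """True if an XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

html_style = "<P>First line<BR>second line</p>"       # mixed case, <BR> never closed
xhtml_style = "<p>First line<br />second line</p>"    # lower case, empty element closed

html_ok = parses_as_xml(html_style)
xhtml_ok = parses_as_xml(xhtml_style)
print(html_ok, xhtml_ok)
```

A permissive browser would typically display both fragments, but only the second can be handled by generic XML tools.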

Uses of XML (1)

• Data (and metadata) exchange
  – e-commerce
  – e-government (http://www.govtalk.gov.uk)
  – rights management
  – bibliographic data
  – news syndication
  – scientific data
  – health - patient records
  – (… plus hundreds more…)
  – Web services
• Within systems and between systems
• Many standards/protocols built on XML

Uses of XML (2)

• Storage
  – publishing
  – scholarly texts
  – archival finding aids
  – document management
  – …
  – preservation

XML : summary (1)

• Means of describing structured data in text format

• Independent of platform, vendor
  – reuse of data
  – exchange of data
• Used
  – for many types of structured data
  – in many different applications
  – both for storage and exchange
    – data may be stored in database, exposed as XML

XML : summary (2)

• Use of XML
  – usually invisible to end-user
  – increasingly invisible to information manager?
  – generated and consumed by software
  – requires consensus amongst communication partners

Acknowledgements

UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

http://www.ukoln.ac.uk/