

XML

Dott. Nicole NOVIELLI

http://www.di.uniba.it/intint/people/nicole.html

XML: eXtensible Markup Language

•  Permits document authors to create markup languages, that is, text-based notations for describing data

•  Enables document authors to create entirely new markup languages for describing any type of data, e.g.:

    -  Mathematical formulas
    -  Software-configuration instructions
    -  Chemical structures
    -  Music
    -  News
    -  Reports
    -  …

Example: XML describing a baseball player’s information

<?xml version = "1.0"?>
<!-- Fig. 14.1: player.xml -->
<!-- Baseball player structured with XML -->

<player>
   <firstName>John</firstName>
   <lastName>Doe</lastName>
   <battingAverage>0.375</battingAverage>
</player>

•  XML documents contain text that represents content (e.g. John, Doe, 0.375) and elements (tags) that specify the document’s structure

•  XML documents delimit elements with start tags (<tagName>) and end tags (</tagName>)

•  Every XML document must have exactly one root element that contains all the other elements (‘player’ in the example)
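The sketch below illustrates the points above. It parses player.xml with Python's standard-library `xml.etree.ElementTree`; any XML parser would do, and Python is our choice rather than the slides'. It then reads the root element and the content of its children. `PLAYER_XML` and `average` are illustrative names that are not part of the original example.

```python
import xml.etree.ElementTree as ET

# The player.xml document from the example above (comments included).
PLAYER_XML = """<?xml version = "1.0"?>
<!-- Fig. 14.1: player.xml -->
<!-- Baseball player structured with XML -->
<player>
   <firstName>John</firstName>
   <lastName>Doe</lastName>
   <battingAverage>0.375</battingAverage>
</player>"""

# fromstring returns the root element; comments are skipped by default.
root = ET.fromstring(PLAYER_XML)
print(root.tag)  # player

# Iterating over an element yields its child elements:
# child.tag is the markup (element name), child.text is the content.
for child in root:
    print(child.tag, "->", child.text)

# Content is always text; converting it to a number is up to the application.
average = float(root.findtext("battingAverage"))
```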

Vocabularies

•  XML-based markup languages

•  Provide a means for describing particular types of data in a standard and structured way

•  Some XML vocabularies include:

    -  XHTML
    -  MathML
    -  VoiceXML
    -  CML (Chemical Markup Language)


VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It allows voice applications to be developed and deployed in an analogous way to HTML for visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser. VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and audio playback.

An example of a VoiceXML document:

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt> Hello world! </prompt>
    </block>
  </form>
</vxml>

VoiceXML: Voice Extensible Markup Language

A tutorial is available here: http://www.voicexml.org/tutorials/intro1.html


VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations.

An example

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
      version="2.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
    </field>
    <block>
      <submit next="http://www.drink.example.com/drink2.asp"/>
    </block>
  </form>
</vxml>

Validating XML documents: DTD and schema

•  An XML document can refer to a DTD (Document Type Definition) or to a schema

•  Validating parsers can read the DTD/schema and check that the XML document conforms to it, that is, that the document has an appropriate structure

•  E.g., for the player’s information example: suppose the document referenced a DTD specifying that a player element must have firstName, lastName and battingAverage elements

•  Omitting one of them would then make player.xml invalid, although the document would still be well-formed, because it properly follows the XML syntax

•  A nonvalidating parser just checks the syntax of an XML document
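The difference between well-formed and valid can be sketched in Python. The standard-library `xml.etree.ElementTree` parser is nonvalidating: it reports syntax errors but does not check a DTD. Here the structural rule is therefore hand-written as a stand-in for a validating parser. `classify` and `REQUIRED_CHILDREN` are hypothetical names. Real DTD or schema validation in Python usually relies on a third-party library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model mirroring the DTD rule described above:
#   <!ELEMENT player (firstName, lastName, battingAverage)>
REQUIRED_CHILDREN = ["firstName", "lastName", "battingAverage"]


def classify(xml_text: str) -> str:
    """Return 'not well-formed', 'well-formed but invalid' or 'valid'."""
    try:
        root = ET.fromstring(xml_text)  # syntax check only (nonvalidating)
    except ET.ParseError:
        return "not well-formed"
    # Hand-written structural check standing in for a validating parser.
    if root.tag != "player" or [c.tag for c in root] != REQUIRED_CHILDREN:
        return "well-formed but invalid"
    return "valid"


complete = ("<player><firstName>John</firstName><lastName>Doe</lastName>"
            "<battingAverage>0.375</battingAverage></player>")
missing = "<player><firstName>John</firstName><lastName>Doe</lastName></player>"
broken = "<player><firstName>John</lastName></player>"

print(classify(complete))  # valid
print(classify(missing))   # well-formed but invalid
print(classify(broken))    # not well-formed
```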

XML is highly portable

•  Viewing or modifying an XML file (extension ‘.xml’) does not require any specialized software

•  Any text editor that supports ASCII/Unicode characters can open an XML document for viewing and editing

•  Most web browsers can display XML documents in a formatted manner that shows the XML structure

XML parser and syntax

•  An XML parser is software for processing XML files; it:

    -  makes the document available to other applications
    -  checks that the document follows the syntax rules specified by the W3C’s XML Recommendation (www.w3.org/XML)

•  XML syntax requires a single root element, and a start tag and an end tag for each element

•  Elements must be properly nested

•  If an XML parser can process the entire document without syntax errors, the XML document is well-formed
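The sketch below illustrates the well-formedness rules above, assuming Python's `xml.etree.ElementTree` as the parser. `is_well_formed` is a hypothetical helper: it reports whether the whole document parses without a `ParseError`. An empty-element tag such as `<b/>` counts as both the start and the end tag of an element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the parser can process the whole document without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each case respects or breaks one of the rules listed above.
cases = {
    "single root, properly nested": "<a><b>text</b></a>",
    "two root elements": "<a/><b/>",
    "improper nesting": "<a><b></a></b>",
    "missing end tag": "<a><b>text</a>",
    "empty-element tag <b/>": "<a><b/></a>",
}

for label, text in cases.items():
    print(f"{label}: {is_well_formed(text)}")
```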

Structuring data

•  XML Schema is a document definition language

    -  It specifies the structure of instance documents, e.g. “elements contained by other elements”
    -  It specifies the datatype of each element/attribute, e.g. “this element shall hold an integer with the range 0 to 12,000”

•  The XML Schema language is also referred to as XML Schema Definition (XSD)

•  Composed of two parts:

    -  Structures: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
    -  Datatypes: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

•  XML Schema is an XML-based alternative to DTD

DTD

The Document Type Definition is a tool used by programmers to define the components allowed in the construction of an XML document.

The term is used not only for XML documents but for all documents derived from SGML (of which XML is intended as a simplification that keeps its power while reducing its complexity), the best known of which is HTML.

In SGML, a DTD is required to validate a document. In XML, too, a document is valid if it has a DTD and passes validation against it.

However, XML also allows well-formed documents: documents that, even without a DTD, have a structure regular and clear enough (they follow the XML syntax rules) to be checked.

Schema vs. DTD

•  Both are XML document definition languages

•  XML Schemas are written in XML (DTDs use their own syntax)

•  Unlike DTDs, XML Schemas are extensible, like XML itself

•  XML Schemas are more verbose than DTDs

Schema vs. DTD: example (DTD)

<!ELEMENT BookStore (Book+)>
<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>

Schema vs. DTD: example (XML Schema)

<?xml version="1.0"?>
<!-- No default or target namespace: the declarations are in no namespace, -->
<!-- matching the xsi:noNamespaceSchemaLocation reference in the instance document -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>

Referencing a schema in an XML instance document (simple form)

<?xml version="1.0"?>
<BookStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
</BookStore>
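To a plain parser, the schema reference is just a namespaced attribute. The sketch below assumes Python's `xml.etree.ElementTree`. It reads `xsi:noNamespaceSchemaLocation` from the instance above, which ElementTree reports in `{namespace-URI}localName` form. It then hand-checks the Book sequence declared in the schema. ElementTree does not load or apply `BookStore.xsd`. `BOOK_CHILDREN` and `ok` are illustrative names.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = """<?xml version="1.0"?>
<BookStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>1-56592-235-2</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
</BookStore>"""

root = ET.fromstring(INSTANCE)

# Namespaced attributes are keyed as "{namespace-URI}localName".
schema_hint = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(schema_hint)  # BookStore.xsd

# Hand-written check of the schema's rules: at least one Book
# (minOccurs="1", maxOccurs="unbounded"), each with the five children in order.
BOOK_CHILDREN = ["Title", "Author", "Date", "ISBN", "Publisher"]
books = root.findall("Book")
ok = len(books) >= 1 and all([c.tag for c in b] == BOOK_CHILDREN for b in books)
```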

An example: markup for a business letter

Reference to an external DTD

Alternatively, the DTD can be declared inside the XML file (inline)

letter.dtd

<!-- Fig. 14.9: letter.dtd -->
<!-- DTD document for letter.xml -->

<!ELEMENT letter ( contact+, salutation, paragraph+, closing, signature )>
<!ELEMENT contact ( name, address1, address2, city, state, zip, phone, flag )>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name ( #PCDATA )>
<!ELEMENT address1 ( #PCDATA )>
<!ELEMENT address2 ( #PCDATA )>
<!ELEMENT city ( #PCDATA )>
<!ELEMENT state ( #PCDATA )>
<!ELEMENT zip ( #PCDATA )>
<!ELEMENT phone ( #PCDATA )>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "M">
<!ELEMENT salutation ( #PCDATA )>
<!ELEMENT closing ( #PCDATA )>
<!ELEMENT paragraph ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>
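A DTD content model such as the one for letter behaves like a regular expression over the sequence of child elements: `+` means "one or more" and `,` means "followed by". The sketch below illustrates this; it is not a DTD validator. It rewrites that model by hand as a Python regular expression and applies it to a letter whose DTD is declared inline. For brevity, only the letter declaration is included in the internal subset. `LETTER_MODEL`, `TEMPLATE` and `letter_children_ok` are hypothetical names. Because `xml.etree.ElementTree` is nonvalidating, it reads the DOCTYPE but still accepts the letter with no salutation. Only the hand-written check rejects it.

```python
import re
import xml.etree.ElementTree as ET

# letter's content model from letter.dtd, rewritten by hand as a regular
# expression over the space-terminated names of its child elements:
#   <!ELEMENT letter ( contact+, salutation, paragraph+, closing, signature )>
LETTER_MODEL = re.compile(r"(contact )+salutation (paragraph )+closing signature ")

# A letter whose DTD is declared inline (internal subset); the declarations
# of the other elements are omitted here for brevity.
TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE letter [
  <!ELEMENT letter ( contact+, salutation, paragraph+, closing, signature )>
]>
<letter>{body}</letter>"""


def letter_children_ok(xml_text: str) -> bool:
    """Check only the direct children of <letter> against LETTER_MODEL."""
    root = ET.fromstring(xml_text)  # reads the DOCTYPE but does not validate
    names = "".join(child.tag + " " for child in root)
    return root.tag == "letter" and LETTER_MODEL.fullmatch(names) is not None


good = TEMPLATE.format(body="<contact/><contact/><salutation/><paragraph/>"
                            "<closing/><signature/>")
bad = TEMPLATE.format(body="<contact/><paragraph/><closing/><signature/>")

print(letter_children_ok(good))  # True
print(letter_children_ok(bad))   # False: parsed fine, but no salutation
```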


Source: slides by Prof. Filippo Lanubile


XML Namespaces

The ability to create custom elements with XML can lead to naming conflicts.

Naming collision: the same name is used to denote different elements.

An XML namespace is a collection of element and attribute names.

XML namespaces provide a means for document authors to refer unambiguously to elements that have the same name (i.e. they prevent collisions).

Example. Problem:

<subject>Geometry</subject> and

<subject>Cardiology</subject>

both use ‘subject’ to mark up data.

In the first case, the subject is something one studies in school, whereas in the second case, the subject is a field of medicine.

Solution: differentiation using namespaces (each prefix must also be bound to a namespace URI with an xmlns attribute, as shown next)

<highschool:subject>Geometry</highschool:subject> and

<medicalschool:subject>Cardiology</medicalschool:subject>

Differentiating elements with namespaces

<?xml version = "1.0"?>
<!-- Fig. 14.7: namespace.xml -->
<!-- Demonstrating namespaces -->
<text:directory
   xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100" />
   </image:file>

</text:directory>

-  The xmlns reserved attribute is used to create two namespace prefixes: text and image
-  Each namespace prefix is bound to a URI (Uniform Resource Identifier)
-  Document authors create their own namespace prefixes and URIs
-  To ensure that namespaces are unique, we must provide unique URIs
-  In this example we use URNs (Uniform Resource Names)
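The sketch below shows how a parser resolves prefixes. It parses namespace.xml with Python's `xml.etree.ElementTree`; Python is our choice, not the original slides'. ElementTree replaces each prefix with its URI and writes names as `{URI}localName`, so the two `file` elements remain distinct. The prefixes `t` and `i` in the lookup dictionary are arbitrary. They show that only the URIs matter, not the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

NAMESPACE_XML = """<?xml version = "1.0"?>
<text:directory xmlns:text = "urn:deitel:textInfo"
                xmlns:image = "urn:deitel:imageInfo">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100" />
   </image:file>
</text:directory>"""

root = ET.fromstring(NAMESPACE_XML)

# The parser replaces each prefix with its URI: "{URI}localName".
print(root.tag)  # {urn:deitel:textInfo}directory

# Both children are called "file", but they belong to different namespaces.
print([child.tag for child in root])

# find() maps prefixes to URIs through an explicit dictionary;
# the prefixes chosen here need not match those used in the document.
ns = {"t": "urn:deitel:textInfo", "i": "urn:deitel:imageInfo"}
picture = root.find("i:file", ns)
print(picture.get("filename"))                  # funny.jpg
print(picture.find("i:size", ns).get("width"))  # 200
```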

Differentiating elements with namespaces

<?xml version = "1.0"?>
<!-- Fig. 14.7: namespace.xml -->
<!-- Demonstrating namespaces -->
<text:directory
   xmlns:text = "http://www.deitel.com/xmlns-text"
   xmlns:image = "http://www.deitel.com/xmlns-image">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100" />
   </image:file>

</text:directory>

-  Another common practice is to use URLs, which specify the location of resources on the Internet
-  Using URLs helps keep namespaces unique, because domain names are unique
-  The parser does not visit the URL: it does not have to point to an actual web page

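e.g. xmlns:text = "abcdefgkjle" is accepted by parsers, although the W3C Namespaces in XML Recommendation deprecates relative URI references such as this one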
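The sketch below, again assuming Python's `xml.etree.ElementTree`, shows three consequences of the points above:

- The prefix is only a local shorthand.
- Two prefixes bound to the same URI name the same namespace.
- The URI is never fetched, so even the arbitrary string from the example parses.

The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Only the namespace URI matters; the prefix is a local shorthand.
a = ET.fromstring('<x:item xmlns:x="http://www.deitel.com/xmlns-text"/>')
b = ET.fromstring('<y:item xmlns:y="http://www.deitel.com/xmlns-text"/>')
print(a.tag == b.tag)  # True: both are {http://www.deitel.com/xmlns-text}item

# Two prefixes bound to the SAME URI make their elements indistinguishable,
# which defeats the purpose of using two namespaces.
doc = ET.fromstring(
    '<d xmlns:text="http://www.deitel.com/xmlns-text"'
    ' xmlns:image="http://www.deitel.com/xmlns-text">'
    '<text:file/><image:file/></d>'
)
print(doc[0].tag == doc[1].tag)  # True

# The URI is only a name: no network access happens while parsing,
# and an arbitrary string is accepted as a namespace name.
c = ET.fromstring('<text:file xmlns:text="abcdefgkjle"/>')
print(c.tag)  # {abcdefgkjle}file
```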

XML Schema Types

•  Built-in Datatypes

    -  Primitive Datatypes, e.g. string, boolean, decimal, double, duration, date, time, ...
    -  Derived Datatypes, e.g. normalizedString, integer, nonPositiveInteger, ...
       Derived from the primitive types, directly or through other derived types
       Example: integer is derived from decimal, normalizedString is derived from string, nonPositiveInteger is derived from integer

•  User-defined Datatypes

    -  Simple Types: derived from built-in or other user-defined datatypes
    -  Structured: Complex Types, needed to define child elements and/or attributes of an element

Creating an XML Schema Document

•  XML Schema enables authors to specify what specific type of data (e.g. numeric, text) an element can contain

•  XML Schema documents are themselves XML documents, so the same XML parser can process both schemas and instance documents

•  A document is schema valid if it conforms to a schema document, and schema invalid if it does not
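Because schema documents are XML documents, an ordinary parser can read them. The sketch below assumes Python's `xml.etree.ElementTree`. It parses a cut-down, hypothetical schema in the style of the BookStore example and lists its global element declarations. The parser treats the schema as plain XML and does not check that the schema itself is correct.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# A cut-down, hypothetical schema in the style of the BookStore example.
SCHEMA = """<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title"/>
        <xsd:element ref="Author"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
</xsd:schema>"""

schema_root = ET.fromstring(SCHEMA)  # same parser as for instance documents
print(schema_root.tag)  # {http://www.w3.org/2001/XMLSchema}schema

# Global declarations are the direct xsd:element children of xsd:schema.
declared = [e.get("name") for e in schema_root.findall(f"{{{XSD}}}element")]
print(declared)  # ['Book', 'Title', 'Author']
```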

book.xml: a schema-valid document describing a list of books

The books element has the deitel prefix, indicating that the books element is part of the namespace ‘http://www.deitel.com/booklist’

book.xsd
-  Creating the XML Schema document: defining the ‘vocabulary’ for writing XML documents about collections of books
-  The schema defines the elements, attributes and parent/child relationships that such a document can (or must) include
-  It also specifies the type of data that these elements and attributes may contain

book.xsd (root element): binding the namespace prefix deitel and defining the target namespace

book.xml: connecting the XML document with the schema that defines its structure. When an XML schema validator examines book.xml and book.xsd, it will recognize that book.xml uses elements and attributes from the ‘http://www.deitel.com/booklist’ namespace

book.xsd: defining an element called ‘books’ of type ‘BooksType’

Definition of ‘BooksType’: a complexType is used to define a parent/child relationship (not possible with a simpleType, which cannot contain child elements)

Types

•  Every element in an XML Schema has a type

•  Types include the built-in types provided by XML Schema and user-defined types, such as SingleBookType

•  A simple type is typically defined as a restriction on an existing type (either built-in or user-defined). Restrictions limit the possible values that an element can hold

•  Complex types may have:

    -  Simple content: can contain attributes but no child elements, and must extend or restrict some other existing type
    -  Complex content: can contain attributes and child elements

Creating a simpleType

<simpleType name = "gigahertz">
   <restriction base = "decimal">
      <minInclusive value = "2.1"/>
   </restriction>
</simpleType>

A simpleType is a restriction of a type, typically called the base type. In this case, the base type is decimal, which the minInclusive element restricts to values of at least 2.1.
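The effect of the `gigahertz` restriction can be sketched by hand in Python; this is not an XML Schema processor. A value must first match the lexical form of the base type `decimal` and then satisfy the `minInclusive` facet. `is_valid_gigahertz`, `DECIMAL_LEXICAL` and `MIN_INCLUSIVE` are hypothetical names. The regular expression follows the xsd:decimal lexical form: an optional sign, digits, and an optional fractional part, with no exponent.

```python
import re
from decimal import Decimal

# Lexical form of xsd:decimal: optional sign, digits, optional fractional part.
# Python's Decimal() alone would also accept forms such as "2e3" or "NaN".
DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
MIN_INCLUSIVE = Decimal("2.1")  # the minInclusive facet of "gigahertz"


def is_valid_gigahertz(text: str) -> bool:
    """Check a value against the gigahertz simpleType (hand-written sketch)."""
    value = text.strip()  # xsd:decimal collapses surrounding whitespace
    if not DECIMAL_LEXICAL.fullmatch(value):
        return False  # not a decimal at all: fails the base type
    return Decimal(value) >= MIN_INCLUSIVE  # the restriction (facet) check


for candidate in ["2.4", "2.1", "2.0", "3", "fast", "2e3"]:
    print(candidate, is_valid_gigahertz(candidate))
```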

Creating a complexType with simpleContent

A complexType with simple content can have attributes but not child elements. It must also extend or restrict some XML Schema type or user-defined type. The base attribute of the extension element sets the base type to string. In this example, the string type is extended with the attribute model.

<complexType name = "CPU">
   <simpleContent>
      <extension base = "string">
         <attribute name = "model" type = "string"/>
      </extension>
   </simpleContent>
</complexType>
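An instance of the CPU type is an element with text content, an optional `model` attribute and no child elements. Attributes are optional unless declared with use = "required". The sketch below checks exactly those properties by hand using Python's `xml.etree.ElementTree`. `matches_cpu_type` is a hypothetical helper, and it does not check the content against `string`; any text qualifies.

```python
import xml.etree.ElementTree as ET


def matches_cpu_type(element: ET.Element) -> bool:
    """Hand-written check of the CPU complexType: text content plus an
    optional 'model' attribute, with no child elements or other attributes."""
    has_no_children = len(element) == 0                 # simple content only
    only_known_attributes = set(element.attrib) <= {"model"}
    return has_no_children and only_known_attributes


ok = ET.fromstring('<processor model="Centrino">Intel</processor>')
nested = ET.fromstring('<processor><brand>Intel</brand></processor>')
extra = ET.fromstring('<processor speed="2.4">Intel</processor>')

print(matches_cpu_type(ok), ok.text, ok.get("model"))  # True Intel Centrino
print(matches_cpu_type(nested), matches_cpu_type(extra))  # False False
```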

Creating a complexType with complex content

A complexType with complex content is allowed to have both attributes and child elements. The all element encloses elements that must each be included exactly once in the corresponding XML instance document, in any order. When using the types CPU and gigahertz, we must include the prefix computer, because these user-defined types are part of the computer namespace.

<complexType name = "portable">
   <all>
      <element name = "processor" type = "computer:CPU"/>
      <element name = "monitor" type = "int"/>
      <element name = "CPUSpeed" type = "computer:gigahertz"/>
      <element name = "RAM" type = "int"/>
   </all>
   <attribute name = "manufacturer" type = "string"/>
</complexType>

xmlns:computer = "http://www.deitel.com/computer" targetNamespace = "http://www.deitel.com/computer">

<element name = "laptop" type = "computer:portable"/>

This line declares the actual element that uses the three types defined in the schema.

The element is called laptop and is of type portable

We have now created an element named laptop that contains child elements processor, monitor, CPUSpeed and RAM and the attribute manufacturer

<?xml version = "1.0"?>
<!-- Laptop components marked up as XML -->
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
   manufacturer = "IBM">

   <processor model = "Centrino">Intel</processor>
   <monitor>17</monitor>
   <CPUSpeed>2.4</CPUSpeed>
   <RAM>256</RAM>
</computer:laptop>

laptop.xml: an XML file that uses the schema defined in laptop.xsd
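The sketch below puts the pieces together. It approximates by hand what an XML Schema validator checks when laptop.xml is validated against the portable type:

- the `all` group: each child exactly once, in any order;
- the `int` and `gigahertz` content;
- the CPU simple content.

It assumes Python's `xml.etree.ElementTree`, and all function names are hypothetical. The child elements are unprefixed, presumably because locally declared elements are unqualified by default (`elementFormDefault="unqualified"`).

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# laptop.xml from the example above.
LAPTOP_XML = """<?xml version = "1.0"?>
<!-- Laptop components marked up as XML -->
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
   manufacturer = "IBM">
   <processor model = "Centrino">Intel</processor>
   <monitor>17</monitor>
   <CPUSpeed>2.4</CPUSpeed>
   <RAM>256</RAM>
</computer:laptop>"""

LAPTOP_TAG = "{http://www.deitel.com/computer}laptop"
PORTABLE_CHILDREN = {"processor", "monitor", "CPUSpeed", "RAM"}
INT_RE = re.compile(r"[+-]?[0-9]+")                            # xsd:int lexical form
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")  # xsd:decimal lexical form


def is_xsd_int(text: str) -> bool:
    """xsd:int: an integer in the 32-bit signed range."""
    return bool(INT_RE.fullmatch(text)) and -2**31 <= int(text) < 2**31


def is_gigahertz(text: str) -> bool:
    """The gigahertz simpleType: a decimal with minInclusive 2.1."""
    return bool(DECIMAL_RE.fullmatch(text)) and Decimal(text) >= Decimal("2.1")


def check_laptop(xml_text: str) -> bool:
    """Hand-written approximation of validating against the 'portable' type."""
    root = ET.fromstring(xml_text)
    if root.tag != LAPTOP_TAG or not set(root.attrib) <= {"manufacturer"}:
        return False
    # <all>: each child element exactly once, in any order.
    tags = [child.tag for child in root]
    if len(tags) != len(PORTABLE_CHILDREN) or set(tags) != PORTABLE_CHILDREN:
        return False
    text = {child.tag: (child.text or "").strip() for child in root}
    processor = root.find("processor")
    return (
        len(processor) == 0                    # CPU: simple content only
        and set(processor.attrib) <= {"model"}
        and is_xsd_int(text["monitor"])
        and is_xsd_int(text["RAM"])
        and is_gigahertz(text["CPUSpeed"])
    )


print(check_laptop(LAPTOP_XML))  # True
```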

References

•  Harvey M. Deitel and Paul J. Deitel, Internet & World Wide Web: How to Program, Pearson International Edition

•  http://www.w3.org/

•  www.deitel.com/books/iw3htp4 (example code for the exercises)