
Post on 28-Mar-2015


Defining XML

The Document Type Definition

Document Type Definition

• text syntax for defining
– elements of XML
– attributes (and possibly default values)
– structure

• <?xml … standalone="no" … ?>

• implies that an external definition exists and may be required to properly understand the content

Why do we need DTDs?

• Define classes of xml documents
– For particular applications
– Agreement on data and structure

• Validate xml data
– DTD is used to check structure

• Document an xml class
– DTD provides complete information about an xml class

linking an XML file to a DTD

• a document type declaration is added to the xml
<!DOCTYPE message SYSTEM "myDTD.dtd">

[Diagram: the XML file (message.xml) is connected to the DTD (myDTD.dtd) by the DOCTYPE link]
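The link between the XML file and its DTD can be read back programmatically. The following is a minimal sketch using Python's standard `xml.parsers.expat` module; the helper name `read_doctype` is hypothetical, and the parser only reports the declaration, it does not fetch or apply myDTD.dtd.

```python
import xml.parsers.expat

def read_doctype(xml_text):
    """Return the root name and system identifier from the DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once, when it meets <!DOCTYPE ...>
        found["root"] = name
        found["system_id"] = system_id
        found["internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found

info = read_doctype(
    '<!DOCTYPE message SYSTEM "myDTD.dtd"><message>Welcome to XML!</message>'
)
print(info["root"], info["system_id"])
```

The name after DOCTYPE must match the root element of the document, which is why both are `message` here.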

What Is a DTD?

• Defines a type of xml document
– What elements are allowed?
– What attributes do they have?
– How can they be structured?

• DTD is in text format

• Usually external to the xml data
– Linked by a document type declaration

• May be included in the xml data file

Element type declarations

<!ELEMENT myElement (#PCDATA)>

• <!ELEMENT: the "element definition" keyword
• myElement: name of the element being defined
• (#PCDATA): content that the element can have
• #PCDATA = parsed character data

<!ELEMENT message ( #PCDATA )>

One line of text, stored in messageML.dtd

<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "messageML.dtd">
<message>Welcome to XML!</message>

Example of a message document conforming to this DTD

Example

Internal DTD Example

<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message>Welcome to XML!</message>

• Element declarations define the content of elements

• Content can be text or other elements

• Content defines structure
– How are the elements nested?
– How many elements can be included?
– What order do elements come in?

Defining structure


<!ELEMENT classroom (teacher, student)>

a classroom contains exactly one teacher followed by exactly one student

<!ELEMENT dessert (iceCream | pastry)>

a dessert contains either one iceCream or one pastry, but not both

<!ELEMENT album (track+)>

an album contains one or more tracks

occurrence indicators

• Plus sign (+): element will appear 1 to many times
  <!ELEMENT album (track+)>

• Asterisk (*): element will appear 0 to many times
  <!ELEMENT library (book*)>

• Question mark (?): element will appear 0 to 1 times
  <!ELEMENT seat (person?)>

A Simple Document Type Definition

<!-- DTD for sample document -->
<!ELEMENT customer-details (name, address)>
<!ELEMENT address (street, city, state, postal)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>

DTD Example 1

<!ELEMENT class (number, (instructor | assistant+), (credit | nocredit))>

a class must contain a number followed by either an instructor or one or more assistants followed by either a credit or a nocredit

<class>

<number>CM4003</number>

<instructor>Stewart Massie</instructor>

<credit>15</credit>

</class>

DTD Example 2

<!ELEMENT donutBox (jam?, lemon*, ((cream | sugar)+ | iced))>

a donutBox contains 0 or 1 jam, followed by 0 to many lemon, followed by either one to many cream or sugar elements (in any mix) or exactly one iced

<donutBox>

<jam>raspberry</jam>

<lemon>sour</lemon>

<lemon>half-sour</lemon>

<iced>chocolate</iced>

</donutBox>

<donutBox>

<iced>pink</iced>

</donutBox>

DTD Example 3

<!ELEMENT farm (farmer+,

(dog* | cat?), pig*,

(goat | cow)?, (chicken+ | duck*)

)>

<farm>

<farmer>Farmer Maggot</farmer>

<cat>Tiddles</cat>

<duck>Donald</duck>

</farm>
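The content models in these examples behave like regular expressions over the sequence of child element names: a comma is concatenation, a bar is alternation, and ?, * and + keep their usual meaning. The sketch below makes that correspondence concrete in Python. It is an illustration only, not how a validating parser is implemented; the helper names `model_to_regex` and `children_match` are hypothetical, and it assumes element-only content (no #PCDATA, EMPTY or ANY) with plain ASCII element names.

```python
import re

# Simplified pattern for an element name inside a content model (ASCII only).
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def model_to_regex(model):
    """Translate a DTD element-content model into a compiled regex."""
    compact = "".join(model.split())      # whitespace is insignificant
    compact = compact.replace(",", "")    # a sequence becomes concatenation
    # Wrap each name as (?:name;) so ?, * and + apply to the whole name.
    # The ';' terminator stops 'cat' from matching the start of 'cattle'.
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    return re.compile(body)

def children_match(model, children):
    """True if the list of child element names satisfies the content model."""
    encoded = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None

DONUT_BOX = "(jam?, lemon*, ((cream | sugar)+ | iced))"
FARM = "(farmer+, (dog* | cat?), pig*, (goat | cow)?, (chicken+ | duck*))"

print(children_match(DONUT_BOX, ["jam", "lemon", "lemon", "iced"]))  # True
print(children_match(DONUT_BOX, ["cream", "iced"]))                  # False
print(children_match(FARM, ["farmer", "cat", "duck"]))               # True
print(children_match(FARM, ["farmer", "dog", "cat", "duck"]))        # False
```

The last call fails because (dog* | cat?) is a choice: a farm may have dogs or a cat, but not both.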

DTD Example 4

mixed content (narrative XML)

<!ELEMENT paragraph (#PCDATA|name|profession|date|irony)*>

A <paragraph> element may contain any combination of <name>, <profession>, <date> or <irony> elements interspersed with parsed character data.

<paragraph>Today’s date is <date month="October" day="1"/> and <name>Stewart Massie</name>, a <profession>lecturer</profession> is delivering a <irony>scintillating</irony> XML lecture.</paragraph>

Defining attributes

• attributes are assigned to elements using the <!ATTLIST …> declaration

• ATTLIST defines
– Which element the attribute belongs to
– The name of the attribute
– The values the attribute can take
– Possible default values
– Whether the attribute MUST be present or not

Attribute values

• In HTML all attributes are text

• DTDs support 10 attribute types

• Most common are:
– CDATA (literal text)
– ID (unique identifier)
– NMTOKEN ("no whitespace")
– Enumeration (of all possible values)

Conditions on attributes

• #REQUIRED
– the attribute must be given a value in the XML

• #IMPLIED
– the attribute may be omitted from the XML

• #FIXED
– the value of the attribute is fixed and defined in the DTD

• literal
– a default value is supplied literally in the DTD

Example attribute declarations

<!ELEMENT pig (#PCDATA)>
<!ATTLIST pig weight CDATA #REQUIRED>
<!ATTLIST pig id_code ID #REQUIRED>
<!ATTLIST pig name NMTOKEN #IMPLIED>
<!ATTLIST pig sex (M | F) "F">
<!ATTLIST pig canFly CDATA #FIXED "no">

<pig weight="1000kg" id_code="pig017">Porky</pig>
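Default and fixed values declared in an ATTLIST are supplied by the parser when the attribute is left out of the document. The sketch below shows this with Python's standard `xml.parsers.expat` module and an internal DTD; the helper name `read_attributes` is hypothetical. expat is a non-validating parser, so it fills in defaults but would not complain about a missing #REQUIRED attribute.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE pig [
  <!ELEMENT pig (#PCDATA)>
  <!ATTLIST pig weight CDATA #REQUIRED>
  <!ATTLIST pig sex (M | F) "F">
  <!ATTLIST pig canFly CDATA #FIXED "no">
]>
<pig weight="1000kg">Porky</pig>"""

def read_attributes(xml_text):
    """Return {element name: attributes} as reported by the parser."""
    seen = {}

    def on_start(name, attrs):
        # attrs includes values derived from ATTLIST defaults
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen

pig = read_attributes(DOC)["pig"]
print(pig)
```

Only `weight` is written in the document; `sex` and `canFly` come from the declarations.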

entities

• used to represent text that would cause parsing problems

• &lt; represents <

• &amp; represents &

• &gt; represents >

• &quot; represents "

• &apos; represents '
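As an illustration of these predefined entities, the sketch below escapes a piece of text that contains markup characters before placing it in an element, then parses it back. It uses `escape` from Python's standard `xml.sax.saxutils`; the helper name `wrap_as_message` is hypothetical.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def wrap_as_message(text):
    """Put arbitrary text inside a <message> element, escaping &, < and >."""
    return "<message>" + escape(text) + "</message>"

raw = 'if (a < b && b > c) print("ok")'
xml_text = wrap_as_message(raw)
print(xml_text)

# The parser turns the entities back into the original characters.
round_tripped = ET.fromstring(xml_text).text

# Quotes are only escaped on request, by passing extra replacements.
quoted = escape(raw, {'"': "&quot;", "'": "&apos;"})
```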

defining entities

• <!ENTITY label "replacementText">

• <!ENTITY super "supercalifragilisticexpialidocious">

• now &super; is replaced in the XML (or in attribute values) by supercalifragilisticexpialidocious
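The replacement can be observed with any XML parser that reads the internal DTD. A minimal sketch with Python's standard `xml.etree.ElementTree` follows; the document and the helper name `expand_entities` are made up for illustration. Note that the replacement text is quoted in the declaration.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!ENTITY super "supercalifragilisticexpialidocious">
]>
<message>That lecture was &super;!</message>"""

def expand_entities(xml_text):
    """Parse the document and return the root element's text."""
    return ET.fromstring(xml_text).text

expanded = expand_entities(DOC)
print(expanded)
```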

CDATA or PCDATA?

• PCDATA
– Parsed Character DATA
– will be parsed for entities

• CDATA
– Character DATA
– Will NOT be parsed
– CDATA sections are sometimes included in xml to include "literal" sections of code

Writing a CDATA section

<![CDATA[Hi! I’m a CDATA section!
I can include anything that would normally upset the parser:
<?<<< &&&;; ><></> hahahahahahaha!!!
The only thing I have to avoid is a double closing square bracket followed by a greater-than sign, which means the CDATA has ended.
]]>
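A short sketch of how a parser treats a CDATA section, using Python's standard `xml.etree.ElementTree`: the characters inside the section come through untouched, with no escaping needed. The <code> element name is made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = "<code><![CDATA[<?<<< &&&;; ><></> if (a < b && b > c)]]></code>"

# The parser hands back the section's content as ordinary text.
literal = ET.fromstring(DOC).text
print(literal)
```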

Validation of xml

• Validation means checking that an xml document conforms to its DTD

• Adds security to automatic processing

• Allows free machine-machine exchange of xml

• Applied before manipulating xml
– See XSLT, SAX, DOM later

Well-formed vs valid

• Well-formed xml
– The data obeys the xml syntax rules

• Valid xml
– The data is well-formed xml
– The data has a DTD
– The data conforms to the DTD

• xml data may be well-formed but invalid

xml parser types

• validating parser
– checks XML is well-formed (conforms to the XML specification)
– checks XML is valid (has and matches a DTD)

• non-validating parser
– only checks XML is well-formed
– may pass invalid XML
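The difference shows up clearly with a non-validating parser such as the one in Python's standard `xml.etree.ElementTree`. In the sketch below (the helper name `is_well_formed` and the <greeting> element are made up for illustration), a document that breaks its own DTD is still accepted, while a document with a missing end tag is rejected.

```python
import xml.etree.ElementTree as ET

INTERNAL_DTD = "<!DOCTYPE message [ <!ELEMENT message (#PCDATA)> ]>"

def is_well_formed(xml_text):
    """True if the text parses; the DTD is read but not enforced."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

valid_doc = INTERNAL_DTD + "<message>Welcome to XML!</message>"
# Well-formed, but invalid: the DTD allows only text inside <message>.
invalid_doc = INTERNAL_DTD + "<message><greeting>Welcome</greeting></message>"
# Not well-formed: the end tag is missing.
broken_doc = INTERNAL_DTD + "<message>Welcome to XML!"

for doc in (valid_doc, invalid_doc, broken_doc):
    print(is_well_formed(doc))
```

Catching the invalid document would need a validating parser, which the Python standard library does not provide.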
