XML Overview / Schema / DOM Brent P. Christie Major USMC

Upload: anissa-barrett

Post on 02-Jan-2016

XML Overview / Schema / DOM

Brent P. Christie

Major USMC

XML Overview

What is XML?

– eXtensible Markup Language

– Meta-markup language defined by the World Wide Web Consortium (W3C, specification at www.w3c.org/XML)

• "Meta" means you define your own markup language

• Supersedes other markup languages such as HTML

– XML is a subset of Standard Generalized Markup Language (SGML)

• XML is an easier-to-use subset of SGML

• HTML is an application of SGML

XML Overview

What’s so great about it?

– Easy Data Exchange

• Non-proprietary, no patent or copyright (encoded)

• Stored as text, human readable

• Efficient storage: yes, text is efficient, unlike MS bloatware

– Customized Markup Languages

• Customized browsers or applications (IFX, CML)

– Self-Describing Data Specification

<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
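
Because each tag names the data it wraps, a program can pull values out by name. A minimal Python sketch, assuming the greeting document from this slide and the standard library's xml.etree.ElementTree (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# The greeting document from the slide, as bytes so that the
# XML declaration with its encoding is accepted by the parser.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
  <GREETING>Hello From XML</GREETING>
  <MESSAGE>Welcome to the wild and woolly world of XML.</MESSAGE>
</DOCUMENT>"""

root = ET.fromstring(DOC)

# Each child element describes its own content through its tag name.
fields = {child.tag: child.text.strip() for child in root}

for tag, text in fields.items():
    print(f"{tag}: {text}")
```

No external description of the file layout is needed: the tag names GREETING and MESSAGE travel with the data.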

XML Overview

– Structured and Integrated Data

• The logical design of a document (content) should be separate from its visual design (presentation)

• Separation of logical and visual design

– promotes sound typography

– encourages better writing

– is more flexible

• XML can be used to define the logical design, while XSL (eXtensible Stylesheet Language) is used to define the visual design (usually by mapping XML into HTML).

How XML fits into the new HTML world:

– XML describes the logical structure of the document.

– CSS (Cascading Style Sheets) or another style language describes the visual presentation of the document.

– The DOM (Document Object Model) allows scripting languages, such as JavaScript, to access document objects.
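
The split between logical and visual design can be illustrated without an XSL processor. The sketch below is a hand-written Python stand-in for a style sheet, not XSL itself: a hypothetical TAG_TO_HTML table decides how each logical tag is presented, and the XML content is never touched.

```python
import xml.etree.ElementTree as ET
from html import escape

DOC = (
    "<DOCUMENT>"
    "<GREETING>Hello From XML</GREETING>"
    "<MESSAGE>Welcome to XML.</MESSAGE>"
    "</DOCUMENT>"
)

# Hypothetical presentation rules: logical tag -> HTML tag.
TAG_TO_HTML = {"GREETING": "h1", "MESSAGE": "p"}

def render_html(xml_text, rules):
    """Map each child of the root element to an HTML element."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        html_tag = rules.get(child.tag, "div")  # fallback for unmapped tags
        parts.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "<body>" + "".join(parts) + "</body>"

html_page = render_html(DOC, TAG_TO_HTML)
print(html_page)
```

Passing a different rules table changes the presentation while the XML document stays the same, which is the flexibility the slide describes.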

XML Overview

SGML = Standard Generalized ML

– Defined by ISO 8879. It has been the standard, vendor-independent way to maintain repositories of structured documentation for more than a decade

– An SGML document carries with it a grammar called a Document Type Definition (DTD). The DTD defines the tags and the meaning of those tags

– Presentation is governed by a style sheet written in the Document Style Semantics and Specification Language (DSSSL)

– Note that HTML is a fixed SGML application, a hard-wired set of about 70 tags and 50 attributes, and does not need to have a DTD.

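XML Overview

XML is SGML Lite

– XML is a simplified subset of SGML rather than a fixed application like HTML. Since XML is extensible (XML is also a metalanguage), a document that is to be validated must be accompanied by its DTD or schema; a document that is only well-formed needs neither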

– XML is a compromise between the non-extensible, limited capabilities of HTML and the full power and complexity of SGML

– XML offers “80% of the benefits of SGML for 20% of its complexity”

• XML designers tried to leave out all the SGML that would be rarely used on the web

• Note that the XML specification is about 30 pages, while the SGML specification is about 500 pages.

– XML allows you to define your own tags and to describe nested hierarchies of information

XML Overview

Why XML?

– XML was created so that richly structured documents could be used over the web; the alternatives, HTML and SGML, are not practical for this purpose

– HTML, as we've already discussed, comes bound with a set of semantics and does not provide arbitrary structure

– SGML provides arbitrary structure, but is too difficult/expensive to implement just for a web browser.

XML Design Goals

1. XML shall be usable over the Internet

2. XML shall support a variety of applications

3. XML shall be compatible with SGML

4. It shall be easy to write programs that process XML documents

5. Optional features in XML shall be kept to the absolute minimum, ideally zero

6. XML documents should be human-legible and reasonably clear

7. The design of XML should be prepared quickly

8. The design of XML shall be formal and concise

9. XML documents shall be easy to create

10. Terseness in XML markup is of minimal importance

XML and Related Acronyms

– Document Type Definition (DTD), which defines the tags and their relationships

– Extensible Stylesheet Language (XSL) style sheets, which specify the presentation of the document

– Cascading Style Sheets (CSS), a less powerful presentation technology without tag mapping capability

– XPath, which specifies locations in a document

– XLink and XPointer, which define link-handling details

– Resource Description Framework (RDF), for document metadata

– Document Object Model (DOM), an API for converting the document to a tree object in your program for processing and updating

– Simple API for XML (SAX), a “serial access” protocol that is fast to execute, for processing a document on the fly

– XML Namespaces, for an environment of multiple sets of XML tags

– XHTML, a definition of HTML tags for XML documents (which are then just HTML documents)

– XML Schema, which offers a more flexible alternative to DTD

XML Overview

XML constraints:

1. Well-formedness. Per the W3C, if it's not well-formed, it's not XML.

– A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints

– Syntax rules: properly nested and non-abbreviated start and end tags are used.

– Stops browsers from guessing at fixes for broken markup

– Allows parsing of the document

– Provides a well-defined encapsulation mechanism that allows designated sections of the data to be accessed programmatically.

2. Validity.

– Obey the DTD or schema
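
The well-formedness constraint is easy to observe with any conforming parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are made up for illustration. It checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

good = "<DOCUMENT><GREETING>Hello From XML</GREETING></DOCUMENT>"
bad_nesting = "<DOCUMENT><GREETING>Hello From XML</DOCUMENT></GREETING>"
missing_end = "<DOCUMENT><GREETING>Hello From XML</DOCUMENT>"

for sample in (good, bad_nesting, missing_end):
    ok, error = is_well_formed(sample)
    print("well-formed" if ok else f"rejected: {error}")
```

Unlike a forgiving HTML browser, the parser refuses the two broken samples outright instead of guessing what the author meant.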

DTD

– The DTD specifies the logical structure of the document; it is a formal grammar describing document syntax and semantics

– The DTD does not describe the physical layout of the document; this is left to the style sheets and the scripts

– It is no mean task to write a DTD, so most users will adopt predefined DTDs (or can write an XML document without a DTD).

– DTDs can be written in separate files to facilitate re-use.

– Content providers, industries and other groups can collaborate to define sets of tags: the essence of “any” field (physics, music …) is captured in a domain-specific DTD

– DTDs treat all element content as text (#PCDATA). This lack of precision is one of the reasons XML schemas were developed.

<?xml version="1.0"?>
<!DOCTYPE BOOK [
    <!ELEMENT p (#PCDATA)>
    <!ELEMENT BOOK (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
    <!ELEMENT OPENER (TITLE_TEXT)*>
    <!ELEMENT TITLE_TEXT (#PCDATA)>
    <!ELEMENT SUBTITLE (#PCDATA)>
    <!ELEMENT INTRODUCTION (HEADER, p+)+>
    <!ELEMENT PART (HEADER, CHAPTER+)>
    <!ELEMENT SECTION (HEADER, p+)>
    <!ELEMENT HEADER (#PCDATA)>
    <!ELEMENT CHAPTER (CHAPTER_NUMBER, CHAPTER_TEXT)>
    <!ELEMENT CHAPTER_NUMBER (#PCDATA)>
    <!ELEMENT CHAPTER_TEXT (p)+>
]>
<BOOK>
    <OPENER>
        <TITLE_TEXT>All About Me</TITLE_TEXT>
    </OPENER>
    <PART>
        <HEADER>Welcome To My Book</HEADER>
        <CHAPTER>
            <CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>
            <CHAPTER_TEXT>
                <p>Glad you want to hear about me.</p>
                <p>There's so much to say!</p>
                <p>Where should we start?</p>
                <p>How about more about me?</p>
            </CHAPTER_TEXT>
        </CHAPTER>
    </PART>
</BOOK>
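
A parser reads the DTD as a grammar before it reads the content. The sketch below assumes Python's standard xml.parsers.expat module, which does not validate but does report each element declaration through its ElementDeclHandler callback. DOC is a shortened, made-up variant of the BOOK example, and declared_elements is an illustrative helper name.

```python
import xml.parsers.expat

# Shortened variant of the BOOK document, with an internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE BOOK [
  <!ELEMENT BOOK (TITLE_TEXT, p+)>
  <!ELEMENT TITLE_TEXT (#PCDATA)>
  <!ELEMENT p (#PCDATA)>
]>
<BOOK><TITLE_TEXT>All About Me</TITLE_TEXT><p>Where should we start?</p></BOOK>"""

def declared_elements(xml_text):
    """Collect the element names declared in the document's DTD."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration; the content model is ignored here.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(xml_text, True)
    return names

names = declared_elements(DOC)
print(names)
```

Checking that the content actually obeys these declarations requires a validating parser, which the Python standard library does not provide.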

Schema vs. DTD

Dismissing DTDs

– DTDs are simple and easy to use, but limited

– Schemas are far more powerful and precise

• Not only syntax, as in DTD, but also:

– specify actual data types of each element’s content

» simple and complex (sub-type)

– inheritance, syntax from other schemas

– annotate schemas

– multiple namespaces

– min and max occurrence of element

– create list types

– create attribute groups

– restrict the ranges of values that elements can hold

– restrict what other schemas can inherit from yours

– merge fragments of multiple schemas together

– require that attribute or element values be unique

Purchase Order Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2000/08/XMLSchema">
    <xsd:annotation>
        <xsd:documentation>
            Purchase order schema for Example.com.
            Copyright 2000 Example.com. All rights reserved.
        </xsd:documentation>
    </xsd:annotation>

    <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
    <xsd:element name="comment" type="xsd:string"/>

    <xsd:complexType name="PurchaseOrderType">
        <xsd:sequence>
            <xsd:element name="shipTo" type="USAddress"/>
            <xsd:element name="billTo" type="USAddress"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="items" type="Items"/>
        </xsd:sequence>
        <xsd:attribute name="orderDate" type="xsd:date"/>
    </xsd:complexType>

Purchase Order Schema

    <xsd:complexType name="USAddress">
        <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
            <xsd:element name="street" type="xsd:string"/>
            <xsd:element name="city" type="xsd:string"/>
            <xsd:element name="state" type="xsd:string"/>
            <xsd:element name="zip" type="xsd:decimal"/>
        </xsd:sequence>
        <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>
    </xsd:complexType>

Purchase Order Schema

    <xsd:complexType name="Items">
        <xsd:sequence>
            <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
                <xsd:complexType>
                    <xsd:sequence>
                        <xsd:element name="productName" type="xsd:string"/>
                        <xsd:element name="quantity">
                            <xsd:simpleType>
                                <xsd:restriction base="xsd:positiveInteger">
                                    <xsd:maxExclusive value="100"/>
                                </xsd:restriction>
                            </xsd:simpleType>
                        </xsd:element>
                        <xsd:element name="USPrice" type="xsd:decimal"/>
                        <xsd:element ref="comment" minOccurs="0"/>
                        <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
                    </xsd:sequence>
                    <xsd:attribute name="partNum" type="SKU"/>
                </xsd:complexType>

Purchase Order Schema

            </xsd:element> <!-- End item specification -->
        </xsd:sequence> <!-- End sequence for items specification -->
    </xsd:complexType> <!-- End items specification -->

    <!-- Stock Keeping Unit, a code for identifying products -->
    <xsd:simpleType name="SKU">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="\d{3}-[A-Z]{2}"/>
        </xsd:restriction>
    </xsd:simpleType>

</xsd:schema>
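
To make two of the schema's constraints concrete, the sketch below checks them by hand in Python. This is not schema validation: the Python standard library has no XSD validator, so the code only re-implements the SKU pattern and the quantity range (positiveInteger with maxExclusive 100, that is 1 to 99). The ORDER document and the check_items helper are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical instance document; the second item breaks both rules.
ORDER = """<purchaseOrder orderDate="1999-10-20">
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName><quantity>1</quantity>
    </item>
    <item partNum="926-aa">
      <productName>Baby Monitor</productName><quantity>100</quantity>
    </item>
  </items>
</purchaseOrder>"""

SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")  # same pattern as the SKU simpleType

def check_items(xml_text):
    """Return a list of human-readable constraint violations."""
    errors = []
    for item in ET.fromstring(xml_text).iter("item"):
        part = item.get("partNum")  # the attribute is optional in the schema
        if part is not None and not SKU_PATTERN.fullmatch(part):
            errors.append(f"partNum {part!r} does not match the SKU pattern")
        quantity_text = item.findtext("quantity", default="").strip()
        # positiveInteger with maxExclusive 100 means 1 <= quantity <= 99
        if not (quantity_text.isdecimal() and 1 <= int(quantity_text) <= 99):
            errors.append(f"quantity {quantity_text!r} is outside 1..99")
    return errors

errors = check_items(ORDER)
for message in errors:
    print(message)
```

A DTD could only say that partNum and quantity hold text; the schema's data types are what make checks like these expressible.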

DOM vs. SAX

SAX (Simple API for XML) and DOM (Document Object Model) were created to serve the same purpose: giving you access to the information stored in XML documents using any programming language (and a parser for that language). However, the two take very different approaches to giving you access to your information.

What is SAX?

– SAX chooses to give you access to the information in your XML document, not as a tree of nodes, but as a sequence of events!

– SAX chooses not to create a default Java object model on top of your XML document (like DOM does).

• Faster

• Necessitates

– creation of your own custom object model

– creation of a class that listens to SAX events and properly creates your object model.

– In the case of DOM, the parser does almost everything: it reads the XML document in, creates a Java object model on top of it, and then gives you a reference to this object model (a Document object) so that you can manipulate it.

DOM vs. SAX

– All SAX requires is that the parser read in the XML document and fire a series of events depending on what tags it encounters in the XML document

– You are responsible for interpreting these events by writing an XML document handler class, which is responsible for making sense of all the tag events and creating objects in your own object model. So you have to write:

• your custom object model to "hold" all the information in your XML document

• a document handler that listens to SAX events (which are generated by the SAX parser as it's reading your XML document) and makes sense of these events to create objects in your custom object model.

What kinds of SAX events are fired by the SAX parser?

– The parser fires an event for every open tag and every close tag. It also fires events for #PCDATA and CDATA sections

– SAX also fires events for processing instructions, DTDs, and comments

– Your handler has to interpret these events (and the sequence of the events) and make sense out of them.
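
The two pieces you have to write, a custom object model and a handler that listens to events, can be sketched briefly. The slides describe the Java API; this example assumes Python's standard xml.sax module instead, which follows the same event model. GreetingHandler and its plain-dict "object model" are made up for illustration.

```python
import xml.sax

DOC = b"""<DOCUMENT>
  <GREETING>Hello From XML</GREETING>
  <MESSAGE>Welcome to XML.</MESSAGE>
</DOCUMENT>"""

class GreetingHandler(xml.sax.ContentHandler):
    """Listens to SAX events and fills a custom object model (a plain dict)."""

    def __init__(self):
        super().__init__()
        self.events = []    # log of tag events in the order they fired
        self.fields = {}    # custom object model: tag name -> text
        self._buffer = []   # character data collected for the current element

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        self._buffer = []

    def characters(self, content):
        # The parser may deliver one run of text in several chunks.
        self._buffer.append(content)

    def endElement(self, name):
        self.events.append(("end", name))
        text = "".join(self._buffer).strip()
        if text:
            self.fields[name] = text
        self._buffer = []

handler = GreetingHandler()
xml.sax.parseString(DOC, handler)
print(handler.events)
print(handler.fields)
```

The parser never builds a tree; the only data that survives parsing is what the handler chose to keep in fields.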

DOM vs. SAX

What is the Document Object Model (DOM)?

– DOM gives you access to the information stored in your XML document as a hierarchical object model.

– DOM creates a tree of nodes (based on the structure and information in your XML document) and you can access your information by interacting with this tree of nodes.

DOM vs SAX

Once a document object tree has been created (by the XML parser, or your own code), you can access elements in that tree, and you can also modify, delete and create leaves and branches by using the interfaces in the API.
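
The four operations named here (access, modify, create, delete) look like this in practice. A minimal sketch, assuming Python's standard xml.dom.minidom, which implements the W3C DOM interfaces; the FOOTER tag is made up for illustration.

```python
from xml.dom.minidom import parseString

DOC = (
    "<DOCUMENT>"
    "<GREETING>Hello From XML</GREETING>"
    "<MESSAGE>Welcome to XML.</MESSAGE>"
    "</DOCUMENT>"
)

doc = parseString(DOC)          # the parser builds the whole tree up front
root = doc.documentElement

# Access: find an element and read its text node.
greeting = root.getElementsByTagName("GREETING")[0]
original_text = greeting.firstChild.data

# Modify: change the text held by an existing leaf.
greeting.firstChild.data = "Hello From DOM"

# Create: build a new branch and attach it to the tree.
footer = doc.createElement("FOOTER")  # FOOTER is a made-up tag
footer.appendChild(doc.createTextNode("Goodbye"))
root.appendChild(footer)

# Delete: remove an existing branch.
message = root.getElementsByTagName("MESSAGE")[0]
root.removeChild(message)

result = root.toxml()
print(result)
```

Every step works on the in-memory tree, so the whole document must be loaded before the first access.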

DOM vs SAX

Things to think about

– DOM is standardized by the W3C

– The Level 3 recommendation will address content models (DTDs and schemas)

– Tree-based APIs put a great strain on system resources, especially if the document is large. Furthermore, some applications need to build their own, different data trees, and it is very inefficient to build a tree of parse nodes, only to map it onto a new tree
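
One common way around the memory cost of a full tree is to process elements as they complete and discard them immediately. The sketch below assumes Python's xml.etree.ElementTree.iterparse, an incremental parser that is neither DOM nor SAX; the generated input and the total_quantity helper are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up input: 1000 <item> elements, generated so the example is self-contained.
items = "".join(
    f"<item><quantity>{n % 5 + 1}</quantity></item>" for n in range(1000)
)
DATA = f"<items>{items}</items>".encode("utf-8")

def total_quantity(stream):
    """Sum every <quantity> while parsing incrementally."""
    total = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            total += int(elem.findtext("quantity"))
            elem.clear()  # release the item's children once it is processed
    return total

total = total_quantity(io.BytesIO(DATA))
print(total)
```

The application keeps only a running total rather than a tree of parse nodes. The emptied item elements still hang off the root, so very large inputs would also need those removed.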

What do you think?

Questions