5 Processing XML

Uploaded by martin-ford on 27-Dec-2015


5

Processing XML

5 - 2

Overview

Parsing XML documents
Document Object Model (DOM)
Simple API for XML (SAX)
Class generation

5 - 3

What's the Problem?

<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>

(Figure: XML text → ? → Book object in the application.)

5 - 4

Parsing XML Documents

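(Figure: the parser reads the document together with its DTD or schema and passes the result to the application in one of two ways.
DOM: the parser builds a document tree, which the application then works on.
SAX: the application implements DocumentHandler, and the parser calls its methods in document order: startDocument, startElement, startElement, endElement, endElement, endDocument.)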

5 - 5

Parsers

Project X (Sun Microsystems)
Ælfred (Microstar Software)
XML4J (IBM)
Lark (Tim Bray)
MSXML (Microsoft)
XJ (Data Channel)
Xerces (Apache)
...
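Each parser above exposes its own entry classes. The following is an aside that is not taken from the slides. In Java, the JAXP factories in `javax.xml.parsers` (part of the JDK since 1.4) let an application request "some" DOM or SAX parser without naming a vendor, so the implementation can be swapped through configuration. This is a minimal sketch: the class name `ParserLookup` is hypothetical. On a stock JDK the reported classes are usually the built-in Xerces-derived ones.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Sketch (JAXP, an assumption beyond the slides): application code asks a
// factory for a parser and never names a vendor class.
class ParserLookup {
    /** Class name of the DOM parser implementation selected by JAXP. */
    static String domParserClass() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().getClass().getName();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Class name of the SAX parser implementation selected by JAXP. */
    static String saxParserClass() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getClass().getName();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM parser: " + domParserClass());
        System.out.println("SAX parser: " + saxParserClass());
    }
}
```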

5 - 6

The Document Object Model

XML Document Structure

(Figure: the books document drawn as a tree of nodes.)

books
  book
    title: "The XML Handbook"
    author: "Goldfarb"
    author: "Prescod"
    publisher: "Prentice Hall"
    pages: "655"
    isbn: "0130811521"
    ...
  book
  ...

5 - 7

The Document Object Model

Provides a standard interface for access to and manipulation of XML structures.

Represents documents in the form of a hierarchy of nodes.

Is platform- and programming-language-neutral.

Became a W3C Recommendation on October 1, 1998 (DOM Level 1).

Is implemented by many parsers.

5 - 8

DOM - Structure Model

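(Figure: the books tree annotated with the DOM interfaces. The whole tree is a Document. books, book, title, author, publisher, pages and isbn are Elements. Every node in the tree, including the text nodes "The XML Handbook", "Goldfarb", "Prescod", "Prentice Hall" and "655", is a Node. The children of a node are returned as a NodeList.)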
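The structure model can be made concrete with a minimal Java sketch, assuming the JDK's `org.w3c.dom` bindings and its default parser. It starts at the Document, takes the root Element, and recurses through each node's NodeList of children, printing element names and non-blank text. The class `TreePrinter` and the abridged books sample are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: walk the tree using only Document, Node, NodeList and Element.
class TreePrinter {
    /** Parses an XML string into a DOM Document with the JDK's default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** One line per element (tag name) and per non-blank text node (quoted). */
    static String render(Document doc) {
        StringBuilder out = new StringBuilder();
        render(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(((Element) node).getTagName()).append('\n');
        } else {
            out.append('"').append(node.getNodeValue().trim()).append("\"\n");
        }
        NodeList children = node.getChildNodes();          // length + item(i)
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            boolean element = child.getNodeType() == Node.ELEMENT_NODE;
            boolean text = child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().trim().isEmpty();  // skip indentation
            if (element || text) render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<books>\n  <book>\n    <title>The XML Handbook</title>\n"
                + "    <author>Goldfarb</author>\n  </book>\n</books>");
        System.out.print(render(doc));
    }
}
```

Whitespace-only text nodes are skipped on purpose. The JDK parser does keep them in the tree, which matters when navigating with `firstChild` or `childNodes.item(0)`.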

5 - 9

The Document Interface

Method                          Result
doctype                         DocumentType
implementation                  DOMImplementation
documentElement                 Element
getElementsByTagName(String)    NodeList
createTextNode(String)          Text
createComment(String)           Comment
createElement(String)           Element
createCDATASection(String)      CDATASection
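Here is a hedged Java sketch of the Document interface in use, assuming the standard `org.w3c.dom` bindings. `getElementsByTagName` on the Document searches the whole tree. The `create...` methods act as the node factory: new nodes belong to the document but stay detached until they are appended. The `<note>` element and its content are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: the Document as entry point into the tree and as node factory.
class DocumentDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** getElementsByTagName on the Document searches the whole tree. */
    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    /** Builds a hypothetical <note> from factory-made nodes and appends it to the root. */
    static Element addNote(Document doc) {
        Element note = doc.createElement("note");
        note.appendChild(doc.createComment(" added by the application "));
        note.appendChild(doc.createTextNode("Prices "));
        note.appendChild(doc.createCDATASection("<in USD>"));  // '<' needs no escaping here
        doc.getDocumentElement().appendChild(note);
        return note;
    }

    public static void main(String[] args) {
        Document doc = parse("<books><book><title>The XML Handbook</title></book>"
                + "<book><title>XML Design</title></book></books>");
        System.out.println("doctype: " + doc.getDoctype());   // null: no <!DOCTYPE>
        System.out.println("root: " + doc.getDocumentElement().getTagName());
        System.out.println("titles: " + count(doc, "title"));
        Element note = addNote(doc);
        System.out.println("note text: " + note.getTextContent());  // comments excluded
    }
}
```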

5 - 10

The Node Interface

Method                                        Result
nodeName                                      String
nodeValue                                     String
nodeType                                      short
parentNode                                    Node
childNodes                                    NodeList
firstChild                                    Node
lastChild                                     Node
previousSibling                               Node
nextSibling                                   Node
attributes                                    NamedNodeMap
insertBefore(Node newChild, Node refChild)    Node
replaceChild(Node newChild, Node oldChild)    Node
removeChild(Node oldChild)                    Node
hasChildNodes()                               boolean
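The structural methods of Node can be tried on a single book. This Java sketch, again assuming the JDK's DOM, makes three edits. It inserts an empty `<isbn>` before `<pages>`, replaces the title's text node with an upper-case copy, and removes `<pages>`. The edits themselves are arbitrary examples.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: insertBefore, replaceChild, removeChild and hasChildNodes on a <book>.
class NodeDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Comma-separated tag names of the element children, via firstChild/nextSibling. */
    static String childTags(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (sb.length() > 0) sb.append(',');
                sb.append(n.getNodeName());
            }
        }
        return sb.toString();
    }

    /** Applies the three edits to the root <book>; returns the removed <pages> node. */
    static Node edit(Document doc) {
        Element book = doc.getDocumentElement();
        Node pages = book.getElementsByTagName("pages").item(0);
        book.insertBefore(doc.createElement("isbn"), pages);      // new <isbn/> before <pages>
        Element title = (Element) book.getElementsByTagName("title").item(0);
        Node oldText = title.getFirstChild();
        title.replaceChild(doc.createTextNode(oldText.getNodeValue().toUpperCase(Locale.ROOT)), oldText);
        return book.removeChild(pages);                            // detached, still usable
    }

    public static void main(String[] args) {
        Document doc = parse("<book><title>The XML Handbook</title>"
                + "<author>Goldfarb</author><pages>655</pages></book>");
        System.out.println("before: " + childTags(doc.getDocumentElement()));
        Node removed = edit(doc);
        System.out.println("after:  " + childTags(doc.getDocumentElement()));
        System.out.println("isbn has children: "
                + doc.getElementsByTagName("isbn").item(0).hasChildNodes());
        System.out.println("removed parent: " + removed.getParentNode());
    }
}
```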

5 - 11

Node Types and Node Names

Node type                      nodeType   nodeName
ELEMENT_NODE                   1          tag name
ATTRIBUTE_NODE                 2          name of attribute
TEXT_NODE                      3          "#text"
CDATA_SECTION_NODE             4          "#cdata-section"
ENTITY_REFERENCE_NODE          5          name of entity referenced
ENTITY_NODE                    6          entity name
PROCESSING_INSTRUCTION_NODE    7          target
COMMENT_NODE                   8          "#comment"
DOCUMENT_NODE                  9          "#document"
DOCUMENT_TYPE_NODE             10         document type name
DOCUMENT_FRAGMENT_NODE         11         "#document-fragment"
NOTATION_NODE                  12         notation name
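The codes and names in the table can be checked directly. This Java sketch uses the JDK DOM with default settings, so comments and CDATA sections are kept. A small hypothetical document contains one node of most kinds, and every node is listed as nodeType followed by nodeName. Attributes are not children, so the attribute node is read separately.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: print nodeType and nodeName for every node of a small document.
class NodeTypes {
    // Hypothetical sample: doctype, processing instruction, comment, text, CDATA.
    static final String SAMPLE = "<!DOCTYPE books><?app run?>"
            + "<books><!--first book--><book id=\"b1\">Text<![CDATA[a < b]]></book></books>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** "nodeType nodeName" for a node and all its descendants, in document order. */
    static List<String> describe(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        out.add(node.getNodeType() + " " + node.getNodeName());
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        for (String line : describe(doc)) System.out.println(line);
        // Attributes hang off the element's NamedNodeMap, not its child list.
        Node id = doc.getElementsByTagName("book").item(0).getAttributes().getNamedItem("id");
        System.out.println(id.getNodeType() + " " + id.getNodeName());
    }
}
```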

5 - 12

The NodeList Interface

Method       Result
length       int
item(int)    Node
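One property of NodeList is worth knowing: lists returned by `getElementsByTagName` are live, so they change as the tree changes. The Java sketch below assumes the JDK's DOM and removes all matches by always taking `item(0)`. The class name and sample are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: iterating a NodeList with length/item, and removing through a live list.
class LiveNodeList {
    static final String SAMPLE = "<books><book><author>Goldfarb</author><author>Prescod</author>"
            + "</book><book><author>Spencer</author></book></books>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Removes every element with the given tag name; returns how many were removed. */
    static int removeAll(Document doc, String tag) {
        NodeList found = doc.getElementsByTagName(tag);   // live: reflects later changes
        int removed = 0;
        // Always take item(0): each removal shrinks the live list by one, so a
        // forward loop over i would skip every second match.
        while (found.getLength() > 0) {
            Node n = found.item(0);
            n.getParentNode().removeChild(n);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        NodeList authors = doc.getElementsByTagName("author");
        for (int i = 0; i < authors.getLength(); i++) {
            System.out.println(authors.item(i).getTextContent());
        }
        System.out.println("removed " + removeAll(doc, "author") + ", left " + authors.getLength());
    }
}
```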

5 - 13

The Element Interface

Method                                     Result
tagName                                    String
getAttribute(String name)                  String
setAttribute(String name, String value)    void
removeAttribute(String name)               void
getAttributeNode(String name)              Attr
setAttributeNode(Attr newAttr)             Attr
removeAttributeNode(Attr oldAttr)          Attr
getElementsByTagName(String name)          NodeList
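This short Java sketch applies the Element methods to the price element from the books example, assuming the JDK's DOM. The extra attribute `taxed` is hypothetical. `getAttributes()` returns a NamedNodeMap of Attr nodes, which the helper copies into a sorted map.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Sketch: reading and changing attributes through the Element interface.
class ElementAttrs {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Copies an element's attributes (Attr nodes in a NamedNodeMap) into a sorted map. */
    static Map<String, String> attributes(Element e) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            result.put(a.getName(), a.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element price = parse("<price currency=\"USD\" taxed=\"no\">44.95</price>").getDocumentElement();
        System.out.println(price.getTagName() + " " + attributes(price));
        price.setAttribute("currency", "EUR");             // overwrites the existing value
        price.removeAttribute("taxed");
        Attr currency = price.getAttributeNode("currency");
        System.out.println(currency.getName() + " = " + currency.getValue());
        price.removeAttributeNode(currency);               // takes the Attr node itself
        // An absent attribute reads as the empty string.
        System.out.println("after removal: '" + price.getAttribute("currency") + "'");
    }
}
```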

5 - 14

DOM Methods for Navigation

firstChild, lastChild
nextSibling, previousSibling
parentNode
getElementsByTagName
childNodes (length, item())
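Navigating with `firstChild` and `nextSibling` has one trap. A parser that preserves whitespace, such as the JDK's default DOM parser, creates a text node for each line break and indentation between elements. This Java sketch assumes such a parser and skips non-element siblings. The helpers `firstChildElement` and `nextSiblingElement` are hypothetical names, not DOM methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: sibling navigation that skips whitespace text nodes.
class Navigation {
    static final String BOOKS = "<books>\n  <book>\n    <title>The XML Handbook</title>\n"
            + "    <author>Goldfarb</author>\n    <author>Prescod</author>\n  </book>\n"
            + "  <book>\n    <title>XML Design</title>\n  </book>\n</books>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** First child that is an element, or null (hypothetical helper). */
    static Element firstChildElement(Node parent) {
        Node n = parent.getFirstChild();
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) n = n.getNextSibling();
        return (Element) n;
    }

    /** Next sibling that is an element, or null (hypothetical helper). */
    static Element nextSiblingElement(Node node) {
        Node n = node.getNextSibling();
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) n = n.getNextSibling();
        return (Element) n;
    }

    /** Walks book by book and reads each book's first child element (its title). */
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        Element root = doc.getDocumentElement();
        for (Element book = firstChildElement(root); book != null; book = nextSiblingElement(book)) {
            result.add(firstChildElement(book).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(BOOKS);
        // With whitespace preserved, the root's first child is the text "\n  ", not <book>.
        System.out.println("first child type: " + doc.getDocumentElement().getFirstChild().getNodeType());
        System.out.println("titles: " + titles(doc));
    }
}
```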

5 - 15

DOM Methods for Manipulation

appendChild, insertBefore, replaceChild, removeChild (methods of Node)

createElement, createAttribute, createTextNode (methods of Document)
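This hedged Java sketch combines the two groups: the Document methods create the nodes, and `appendChild` attaches them. Serializing the result uses an identity Transformer from `javax.xml.transform`, which goes beyond the slides (DOM Level 1 itself defines no save method). The data comes from the books example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: build new nodes with the Document factory methods, then serialize.
class Manipulation {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** createElement + createTextNode + appendChild: <tag>text</tag>. */
    static Element textElement(Document doc, String tag, String text) {
        Element e = doc.createElement(tag);
        e.appendChild(doc.createTextNode(text));
        return e;
    }

    static void extend(Document doc) {
        Element root = doc.getDocumentElement();
        // <price currency="USD">44.95</price> for the first book, via createAttribute.
        Element price = textElement(doc, "price", "44.95");
        Attr currency = doc.createAttribute("currency");
        currency.setValue("USD");
        price.setAttributeNode(currency);
        root.getElementsByTagName("book").item(0).appendChild(price);
        // A second, complete <book> appended at the end of <books>.
        Element book = doc.createElement("book");
        book.appendChild(textElement(doc, "title", "XML Design"));
        book.appendChild(textElement(doc, "author", "Spencer"));
        root.appendChild(book);
    }

    /** Serializes the tree with an identity Transformer. */
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<books><book><title>The XML Handbook</title></book></books>");
        extend(doc);
        System.out.println(toXml(doc));
    }
}
```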

5 - 16

Example

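(Figure: the books tree. The first book has the authors Goldfarb and Prescod; the second book has the author Spencer.)

doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data

doc                                DOM object
.documentElement                   root node (books)
.childNodes.item(0)                first book
.getElementsByTagName("author")    its authors
.item(1)                           second author
.childNodes.item(0)                first of its text subnodes
.data                              the text "Prescod"

If books.xml is indented, childNodes.item(0) is the first book only because MSXML, used in the script below, discards whitespace-only text nodes by default. A parser that preserves them would return a whitespace text node instead.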
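Here is the same navigation in Java, as a sketch assuming the JDK's DOM. That parser keeps whitespace-only text nodes, so `childNodes.item(0)` on the indented sample below would be the line break before the first `<book>`. This version therefore fetches the book with `getElementsByTagName` instead. The `main` method prints the same four facts as the script on the next slide. The class name `SecondAuthor` is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Sketch: Java counterpart of
// doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data
class SecondAuthor {
    static final String BOOKS = "<books>\n  <book>\n    <title>The XML Handbook</title>\n"
            + "    <author>Goldfarb</author>\n    <author>Prescod</author>\n  </book>\n"
            + "  <book>\n    <title>XML Design</title>\n    <author>Spencer</author>\n  </book>\n</books>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static NodeList authorsOfFirstBook(Document doc) {
        Element root = doc.getDocumentElement();                               // documentElement
        Element book1 = (Element) root.getElementsByTagName("book").item(0);  // first book
        return book1.getElementsByTagName("author");                          // its authors
    }

    static String secondAuthorOfFirstBook(Document doc) {
        Text text = (Text) authorsOfFirstBook(doc).item(1).getFirstChild();   // first text subnode
        return text.getData();                                                // .data
    }

    public static void main(String[] args) {
        Document doc = parse(BOOKS);
        Element root = doc.getDocumentElement();
        System.out.println("Name of root node: " + root.getNodeName());
        System.out.println("Type of root node: " + root.getNodeType());
        System.out.println("Number of authors: " + authorsOfFirstBook(doc).getLength());
        System.out.println("Name of second author: " + secondAuthorOfFirstBook(doc));
    }
}
```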

5 - 17

Script

<HTML>
<HEAD><TITLE>DOM Example</TITLE></HEAD>
<BODY>
<H1>DOM Example</H1>
<SCRIPT LANGUAGE="JavaScript">
  // MSXML through ActiveX: runs only in Internet Explorer on Windows
  var doc, root, book1, authors, author2;
  doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;                    // load synchronously so the tree is ready below
  doc.load("books.xml");
  if (doc.parseError.errorCode != 0) {  // parseError is an object; test its error code
    alert(doc.parseError.reason);
  } else {
    root = doc.documentElement;
    document.write("Name of Root node: " + root.nodeName + "<BR>");
    document.write("Type of Root node: " + root.nodeType + "<BR>");
    // MSXML drops whitespace-only text nodes by default, so item(0) is the first <book>
    book1 = root.childNodes.item(0);
    authors = book1.getElementsByTagName("author");
    document.write("Number of authors: " + authors.length + "<BR>");
    author2 = authors.item(1);
    document.write("Name of second author: " + author2.childNodes.item(0).data);
  }
</SCRIPT>
</BODY>
</HTML>


5 - 18

SAX - Simple API for XML

(Figure: the parser reads the document and its DTD and calls the application once per event, in this order: startDocument, startElement, startElement, endElement, endElement, endDocument.)

5 - 19

SAX - Simple API for XML

Event-driven parsing model: "Don't call the DOM, the parser calls you."
Developed by the members of the XML-DEV mailing list
Released on May 11, 1998 (SAX 1.0)
Supported by many parsers ...
... but Ælfred is the Saxon king.

5 - 20

Procedure

DOM:
Create a parser instance
Parse the whole document
Process the DOM tree

SAX:
Create a parser instance
Register event handlers with the parser
The parser calls the event handlers during parsing
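The SAX procedure can be sketched in Java with the JDK's `SAXParserFactory`, which is an assumption since the slides name no API. `DefaultHandler` is the SAX 2 convenience class that implements `ContentHandler`, the successor of the SAX 1 DocumentHandler shown earlier. The handler here simply records each callback, which reproduces the event sequence from the SAX diagram. Class names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: create a parser, register a handler, let the parser call it.
class SaxEvents {
    /** Event handler that records the callbacks it receives, in order. */
    static class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("startElement " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + qName);
        }
    }

    static List<String> events(String xml) {
        Recorder handler = new Recorder();                       // the event handler
        try {
            SAXParserFactory.newInstance().newSAXParser()       // a parser instance
                    .parse(new InputSource(new StringReader(xml)), handler);  // parser calls handler
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String e : events("<book><title>XML Design</title></book>")) System.out.println(e);
    }
}
```

Unlike the DOM version, nothing is kept after each callback returns. An application that needs values, such as author names, must collect them itself in the handler.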

5 - 21

Namespace Support

<?xml version="1.0"?>
<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books"
       xmlns:cust="http://www.net-standard.com/namespaces/customer">
  ...
  <bk:book>
    <bk:title>XML Handbook</bk:title>
    <bk:isbn>0130811521</bk:isbn>
  </bk:book>
  ...
</order>

5 - 22

Access to Qualified Elements

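(Figure: the element bk:book, whose prefix bk is bound to http://www.net-standard.com/namespaces/books, and its names as reported by DOM Level 2 and SAX 2.0.)

DOM Level 2 (interface Node)    SAX 2.0 (startElement parameter)    Value for bk:book
nodeName                        qName                               bk:book
namespaceURI                    uri                                 http://www.net-standard.com/namespaces/books
prefix                          (none)                              bk
localName                       localName                           book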
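Both columns of the table can be read with the JDK, as a sketch. Namespace processing must be switched on explicitly with `setNamespaceAware(true)`, because JAXP's factories default to false. The document is an abridged, well-formed copy of the order example, and the class name `QualifiedNames` is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: the names of bk:book as seen through DOM Level 2 and SAX 2.0.
class QualifiedNames {
    static final String BOOKS_NS = "http://www.net-standard.com/namespaces/books";
    static final String ORDER = "<order xmlns=\"http://www.net-standard.com/namespaces/order\""
            + " xmlns:bk=\"" + BOOKS_NS + "\""
            + " xmlns:cust=\"http://www.net-standard.com/namespaces/customer\">"
            + "<bk:book><bk:title>XML Handbook</bk:title><bk:isbn>0130811521</bk:isbn></bk:book>"
            + "</order>";

    /** DOM Level 2: nodeName | namespaceURI | prefix | localName of the first bk:book. */
    static String domNames(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);     // otherwise namespaceURI and localName are null
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Node book = doc.getElementsByTagNameNS(BOOKS_NS, "book").item(0);
            return book.getNodeName() + " | " + book.getNamespaceURI()
                    + " | " + book.getPrefix() + " | " + book.getLocalName();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** SAX 2.0: qName | uri | localName as passed to startElement for bk:book. */
    static String saxNames(String xml) {
        StringBuilder seen = new StringBuilder();
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            f.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (seen.length() == 0 && BOOKS_NS.equals(uri) && "book".equals(localName)) {
                        seen.append(qName).append(" | ").append(uri).append(" | ").append(localName);
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println("DOM: " + domNames(ORDER));
        System.out.println("SAX: " + saxNames(ORDER));
    }
}
```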

5 - 23

Generation of Data Structures

DTD / schema 'yacht'  →  generation  →  class:

01 yacht
   05 name
   05 details
      10 type

XML document  →  processing  →  object:

<?xml version="1.0"?>
<yacht yachtid='147'>
  <name>Mona Lisa</name>
  <image file='yacht147.jpg'/>
  <description> Any text describing this yacht 147</description>
  <details>
    <type>GULFSTAR 55</type>
    <length>1700</length>
    <width>480</width>
    <draft>170</draft>
    <sailsurface>112</sailsurface>
    <motor>84</motor>
    <headroom>202</headroom>
    <bunks>8</bunks>
  </details>
</yacht>

Object:

01 yacht
   05 Mona Lisa
   05 details
      10 GULFSTAR 55
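A generator would emit the yacht class from the DTD or schema. As a hedged stand-in, the class here is written by hand in Java, and the "processing" step fills it from a DOM tree. The `Yacht` and `Details` names and the choice of fields (yachtid, name, type, length) are assumptions for illustration. Tools such as JAXB's xjc automate the generation step.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: hand-written classes standing in for generated ones, filled via DOM.
class YachtBinding {
    /** Stand-in for the generated 'details' level (hypothetical). */
    static final class Details {
        final String type;
        final int length;
        Details(String type, int length) { this.type = type; this.length = length; }
    }

    /** Stand-in for the generated 'yacht' class (hypothetical). */
    static final class Yacht {
        final String id;
        final String name;
        final Details details;
        Yacht(String id, String name, Details details) {
            this.id = id; this.name = name; this.details = details;
        }
    }

    static final String SAMPLE = "<?xml version=\"1.0\"?><yacht yachtid='147'><name>Mona Lisa</name>"
            + "<details><type>GULFSTAR 55</type><length>1700</length><width>480</width></details></yacht>";

    /** Trimmed text of the first descendant element with the given tag. */
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** "Processing": parse the document and copy its values into a Yacht object. */
    static Yacht fromXml(String xml) {
        Element yacht;
        try {
            yacht = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        Element details = (Element) yacht.getElementsByTagName("details").item(0);
        return new Yacht(yacht.getAttribute("yachtid"), text(yacht, "name"),
                new Details(text(details, "type"), Integer.parseInt(text(details, "length"))));
    }

    public static void main(String[] args) {
        Yacht y = fromXml(SAMPLE);
        System.out.println(y.id + ": " + y.name + ", " + y.details.type + ", length " + y.details.length);
    }
}
```

Once the values sit in typed fields, the rest of the application no longer touches the DOM, which is the point of class generation.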

5 - 24

Summary

Instead of processing XML as raw text themselves, which is laborious and error-prone, applications use an XML parser that builds a DOM tree of the document.

The DOM provides a standardized API to access the content of documents and to manipulate them.

Alternatively or additionally, applications can process a document event by event through the SAX interface, which many parsers provide.