sdpl 2003notes 3: xml processor interfaces1 3. xml processor apis n how can applications manipulate...

28
SDPL 2003 Notes 3: XML Processor In terfaces 1 3. XML Processor APIs 3. XML Processor APIs How can applications manipulate How can applications manipulate structured documents? structured documents? An overview of document parser An overview of document parser interfaces interfaces 3.1 SAX: an event-based interface 3.1 SAX: an event-based interface 3.2 DOM: an object-based interface 3.2 DOM: an object-based interface 3.3 JAXP: Java API for XML Processing 3.3 JAXP: Java API for XML Processing

Upload: corey-chase

Post on 27-Dec-2015

252 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 1

3. XML Processor APIs3. XML Processor APIs

How can applications manipulate How can applications manipulate structured documents?structured documents?– An overview of document parser interfacesAn overview of document parser interfaces

3.1 SAX: an event-based interface3.1 SAX: an event-based interface

3.2 DOM: an object-based interface3.2 DOM: an object-based interface

3.3 JAXP: Java API for XML Processing3.3 JAXP: Java API for XML Processing

Page 2: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 2

Document Parser InterfacesDocument Parser Interfaces

Every XML application contains some kind of a Every XML application contains some kind of a parserparser– editors, browsers editors, browsers – transformation/style engines, DB loaders, ...transformation/style engines, DB loaders, ...

XML parsers are becoming standard tools of XML parsers are becoming standard tools of application development frameworksapplication development frameworks– JDK 1.4 contains JAXP, with its default parser JDK 1.4 contains JAXP, with its default parser

(Apache Crimson)(Apache Crimson)

(See, e.g., Leventhal, Lewis & Fuchs: Designing XML (See, e.g., Leventhal, Lewis & Fuchs: Designing XML Internet Applications, Chapter 10, and Internet Applications, Chapter 10, and D. Megginson: Events vs. Trees)D. Megginson: Events vs. Trees)

Page 3: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 3

Tasks of a ParserTasks of a Parser

Document instance decompositionDocument instance decomposition– elements, attributes, text, processing instructions, elements, attributes, text, processing instructions,

entities, ...entities, ... VerificationVerification

– well-formedness checking well-formedness checking » syntactical correctness of XML markupsyntactical correctness of XML markup

– validation (against a DTD or Schema)validation (against a DTD or Schema) Access to contents of the DTD (if supported)Access to contents of the DTD (if supported)

– SAX 2.0 Extensions provide info of declarations: SAX 2.0 Extensions provide info of declarations: element type names and their content model element type names and their content model expressionsexpressions

Page 4: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 4

Document Parser InterfacesDocument Parser Interfaces

I: Event-based interfacesI: Event-based interfaces– Command line and ESIS interfacesCommand line and ESIS interfaces

» Element Structure Information Set, traditional Element Structure Information Set, traditional interface to stand-alone SGML parsersinterface to stand-alone SGML parsers

– Event call-back interfaces: SAXEvent call-back interfaces: SAX

II: Tree-based (object model) interfacesII: Tree-based (object model) interfaces– W3C DOM RecommendationW3C DOM Recommendation

Page 5: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 5

Command-line ESIS interfaceCommand-line ESIS interface

ApplicationApplication

SGML/XML ParserSGML/XML Parser

CommandCommandline callline call

ESISESISStreamStream

<A<A </A></A>Hi!Hi!i="1"i="1">>

(A(AAi CDATA 1Ai CDATA 1

-Hi!-Hi!)A)A

Page 6: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 6

Event Call-Back InterfacesEvent Call-Back Interfaces

Application implements a set of Application implements a set of callback callback methodsmethods for handling parse events for handling parse events– parser notifies the application by method callsparser notifies the application by method calls– method parameters qualify events further method parameters qualify events further

» element type nameelement type name» names and values of attributesnames and values of attributes» values of content strings, …values of content strings, …

Idea behind ‘‘SAX’’ (Simple API for XML)Idea behind ‘‘SAX’’ (Simple API for XML)– an industry standard API for XML parsersan industry standard API for XML parsers– could think as “could think as “SSerial erial AAccess ccess XXML”ML”

Page 7: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 7

An event call-back applicationAn event call-back application

Application Main Application Main RoutineRoutine

startDocument()startDocument()

startElement()startElement()

characters()characters()

Parse(Parse())

Callback

Callback

Routines

Routines

endElement()endElement() <A i="1"><A i="1"> </A></A>Hi!Hi!

"A",[i="1"]"A",[i="1"]

"Hi!""Hi!"

"A""A"<?xml version='1.0'?><?xml version='1.0'?>

Page 8: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 8

Object Model InterfacesObject Model Interfaces

Application interacts with Application interacts with – a parser objecta parser object– a document a document parse treeparse tree object consisting of object consisting of

objects for objects for documentdocument, , elements, attributes, textelements, attributes, text, …, … Abstraction level higher than in event based Abstraction level higher than in event based

interfaces; more powerful access interfaces; more powerful access – to descendants, following siblings, …to descendants, following siblings, …

Drawback: Higher memory consumptionDrawback: Higher memory consumption– > applied mainly in client applications > applied mainly in client applications

(to implement document manipulation by user)(to implement document manipulation by user)

Page 9: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 9

An Object-Model Based ApplicationAn Object-Model Based Application

ApplicationApplication

ParserParserObjectObject

In-Memory In-Memory Document Document

RepresentationRepresentationParseParse

Access/Access/ModifyModify

BuildBuild

DocumentDocument

i=1i=1AA

"Hi!""Hi!"

<A i="1"><A i="1"> </A></A>Hi!Hi!

Page 10: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 10

3.1 The SAX Event Callback API3.1 The SAX Event Callback API

A de-facto industry standardA de-facto industry standard– Developed by members of the xml-dev mailing listDeveloped by members of the xml-dev mailing list– Version 1.0 in May 1998, Vers. 2.0 in May 2000Version 1.0 in May 1998, Vers. 2.0 in May 2000– NotNot a parser, but a common a parser, but a common interfaceinterface for many for many

different parsers (like, say, JDBC is a common different parsers (like, say, JDBC is a common interface to various RDBs)interface to various RDBs)

Supported directly by major XML parsersSupported directly by major XML parsers– most Java based and free: most Java based and free:

Apache Xerces, Oracle's XML Parser for Java; Apache Xerces, Oracle's XML Parser for Java; MSXML (in IE 5), James Clark's XPMSXML (in IE 5), James Clark's XP

Page 11: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 11

SAX 2.0 InterfacesSAX 2.0 Interfaces

Interplay between an application and a SAX-Interplay between an application and a SAX-conformant parser specified in terms of conformant parser specified in terms of interfaces interfaces (i.e., collections of methods)(i.e., collections of methods)

One way to classify of SAX interfaces:One way to classify of SAX interfaces:– Parser-to-application (or call-back) interfacesParser-to-application (or call-back) interfaces

» to attach special behaviour to parsing eventsto attach special behaviour to parsing events

– Application-to-parser interfacesApplication-to-parser interfaces» to use the parserto use the parser

– Auxiliary interfacesAuxiliary interfaces» to manipulate parser-provided informationto manipulate parser-provided information

Page 12: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 12

Call-Back InterfacesCall-Back Interfaces

Implemented by Implemented by applicationapplication to override default to override default behaviour (of ignoring events quietly)behaviour (of ignoring events quietly)– ContentHandlerContentHandler

» methods to process document parsing eventsmethods to process document parsing events– DTDHandlerDTDHandler

» methods to receive notification of unparsed external methods to receive notification of unparsed external entities and their notations declared in the DTDentities and their notations declared in the DTD

– ErrorHandlerErrorHandler» methods for handling parsing errors and warningsmethods for handling parsing errors and warnings

– EntityResolverEntityResolver» methods for customised processing of external methods for customised processing of external

entity references entity references

Page 13: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 13

Application-to-Parser InterfacesApplication-to-Parser Interfaces

Implemented by Implemented by parserparser (or a SAX driver): (or a SAX driver):

– XMLReaderXMLReader» methods to invoke the parser and to register objects methods to invoke the parser and to register objects

that implement call-back interfacesthat implement call-back interfaces

– XMLFilter XMLFilter (extends XMLReader)(extends XMLReader)» interface to connect interface to connect XMLReaderXMLReaders in a row as a s in a row as a

sequence of filterssequence of filters» obtains events from an obtains events from an XMLReaderXMLReader and passes and passes

them further (possibly modified)them further (possibly modified)

Page 14: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 14

SAX 2.0: Auxiliary InterfacesSAX 2.0: Auxiliary Interfaces

AttributesAttributes– methods to access a list of attributes, e.g:methods to access a list of attributes, e.g:

int getLength()int getLength()String getValue(String attrName)String getValue(String attrName)

LocatorLocator– methods for locating the origin of parse events methods for locating the origin of parse events

(e.g. systemID, line and column numbers, say, for (e.g. systemID, line and column numbers, say, for reporting semantic errors controlled by the reporting semantic errors controlled by the application) application)

Page 15: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 15

The The ContentHandlerContentHandler Interface Interface

Information of general document events. (See API Information of general document events. (See API documentation for a complete list):documentation for a complete list):

setDocumentLocator(Locator locator)setDocumentLocator(Locator locator) – Receive a locator for the origin of SAX document eventsReceive a locator for the origin of SAX document events

startDocument()startDocument();; endDocument()endDocument() – notify the beginning/end of a document. notify the beginning/end of a document.

startElement(String nsURI, startElement(String nsURI, String localName, String localName,

String rawName,String rawName,Attributes atts)Attributes atts);;

endElement( … )endElement( … ); ; similar (without attributes)similar (without attributes)

for for namespace namespace

supportsupport

Page 16: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 16

Namespaces in SAX: ExampleNamespaces in SAX: Example

<xsl:stylesheet version="1.0" <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/TR/xhtml1/strict"> xmlns="http://www.w3.org/TR/xhtml1/strict"> <xsl:template match="/"> <xsl:template match="/"> <html> <html>

<xsl:value-of select="//total"/><xsl:value-of select="//total"/> </html> </html> </xsl:template> </xsl:template>

</xsl:stylesheet></xsl:stylesheet>

startElementstartElement for this would pass following parameters: for this would pass following parameters:

– nsURI= nsURI= http://www.w3.org/1999/XSL/Transformhttp://www.w3.org/1999/XSL/Transform

– localname = localname = template, template, rawName =rawName = xsl:template xsl:template

Page 17: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 17

Namespaces: Example (2)Namespaces: Example (2)

<xsl:stylesheet version="1.0" ... <xsl:stylesheet version="1.0" ... xmlns="http://www.w3.org/TR/xhtml1/strict"> xmlns="http://www.w3.org/TR/xhtml1/strict"> <xsl:template match="/"> <xsl:template match="/"> <html> ... </html> <html> ... </html> </xsl:template> </xsl:template>

</xsl:stylesheet></xsl:stylesheet>

endElementendElement for for htmlhtml would give would give– nsURI=nsURI= http://www.w3.org/TR/xhtml1/strict http://www.w3.org/TR/xhtml1/strict

(as default namespace for element names without a (as default namespace for element names without a prefix),prefix), localname = localname = html, html, rawName =rawName = html html

Page 18: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 18

ContentHandlerContentHandler interface (cont.) interface (cont.)

characters(char ch[], characters(char ch[], int start, int length)int start, int length)

– notification of character data notification of character data ignorableWhitespace(char ch[], ignorableWhitespace(char ch[],

int start, int length)int start, int length)– notification of ignorable whitespace in element content notification of ignorable whitespace in element content

<!DOCTYPE A [<!ELEMENT A (B)> <!DOCTYPE A [<!ELEMENT A (B)> <!ELEMENT B (#PCDATA)> ]> <!ELEMENT B (#PCDATA)> ]>

<A><A><B> <B> </B></B>

</A> </A>

Ignorable whitespaceIgnorable whitespace

Text contentText content

Page 19: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 19

SAX Processing Example (1)SAX Processing Example (1)

InputInput: XML representation of a personnel database:: XML representation of a personnel database:

<?xml version="1.0" encoding="ISO-8859-1"?><?xml version="1.0" encoding="ISO-8859-1"?>

<db><db><person idnum="1234"><person idnum="1234"><last>Kilpeläinen</last><first>Pekka</first></person><last>Kilpeläinen</last><first>Pekka</first></person>

<person idnum="5678"><person idnum="5678">

<last>Möttönen</last><first>Matti</first></person><last>Möttönen</last><first>Matti</first></person>

<person idnum="9012"><person idnum="9012"><last>Möttönen</last><first>Maija</first></person><last>Möttönen</last><first>Maija</first></person>

<person idnum="3456"><person idnum="3456"><last>Römppänen</last><first>Maija</first></person><last>Römppänen</last><first>Maija</first></person>

</db></db>

Page 20: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 20

SAX Processing Example (2)SAX Processing Example (2)

TaskTask: Format the document as a list like this:: Format the document as a list like this:

Pekka Kilpeläinen (1234)Pekka Kilpeläinen (1234)Matti Möttönen (5678)Matti Möttönen (5678)Maija Möttönen (9012)Maija Möttönen (9012)Maija Römppänen (3456)Maija Römppänen (3456)

Event-based processing strategy:Event-based processing strategy:– at the start of at the start of personperson, record the , record the idnum idnum (e.g.,(e.g., 1234 1234))

– notify starts and ends of notify starts and ends of lastlast and and first first to record to record their contents (e.g., "their contents (e.g., "KilpeläinenKilpeläinen" and "" and "PekkaPekka")")

– at the end of a at the end of a personperson, output the collected data, output the collected data

Page 21: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 21

SAX Processing Example (3)SAX Processing Example (3)

ApplicationApplication: Begin by importing relevant classes:: Begin by importing relevant classes:

importimport org.xml.sax.XMLReader; org.xml.sax.XMLReader;importimport org.xml.sax.Attributes; org.xml.sax.Attributes;importimport org.xml.sax.ContentHandler; org.xml.sax.ContentHandler;

//Default (no-op) implementation of//Default (no-op) implementation of//interface ContentHandler://interface ContentHandler:importimport org.xml.sax.helpers.DefaultHandler; org.xml.sax.helpers.DefaultHandler;

// SUN JAXP to instantiate a parser:// SUN JAXP to instantiate a parser:importimport javax.xml.parsers.*; javax.xml.parsers.*;

Page 22: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 22

SAX Processing Example (4)SAX Processing Example (4)

Implement relevant call-back methods:Implement relevant call-back methods:

public classpublic class SAXDBApp SAXDBApp extendsextends DefaultHandlerDefaultHandler{{

// Flags to remember element context:// Flags to remember element context:

privateprivate boolean InFirst = false, boolean InFirst = false, InLast = false; InLast = false;

// Storage for element contents and // Storage for element contents and

// attribute values:// attribute values:

privateprivate String FirstName, LastName, IdNum; String FirstName, LastName, IdNum;

Page 23: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 23

SAX Processing Example (5)SAX Processing Example (5)

Call-back methods:Call-back methods:– record the start of record the start of firstfirst and and last last elements, elements,

and the and the idnumidnum attribute of a attribute of a personperson::publicpublic void void startElementstartElement ( (

String namespaceURI, String localName,String namespaceURI, String localName, String rawName, Attributes atts) { String rawName, Attributes atts) {

ifif (rawName.equals("first")) (rawName.equals("first")) InFirst = true;InFirst = true;

ifif (rawName.equals("last")) (rawName.equals("last")) InLast = true;InLast = true;

ifif (rawName.equals("person")) (rawName.equals("person")) IdNum = atts.IdNum = atts.getValuegetValue("idnum");("idnum");

} // startElement} // startElement

Page 24: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 24

SAX Processing Example (6)SAX Processing Example (6)

Call-back methods continue:Call-back methods continue:– Record the text content of elements Record the text content of elements firstfirst and and lastlast::

publicpublic void void characterscharacters ( (char ch[], int start, int length) {char ch[], int start, int length) {

ifif (InFirst) FirstName = (InFirst) FirstName = newnew String(ch, start, length); String(ch, start, length);

ifif (InLast) LastName = (InLast) LastName = newnew String(ch, start, length); String(ch, start, length);

} // characters } // characters

Page 25: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 25

SAX Processing Example (7)SAX Processing Example (7)

At the end of At the end of personperson, output the collected data:, output the collected data:

publicpublic void void endElementendElement(String namespaceURI, (String namespaceURI, String localName, String qName) {String localName, String qName) {

ifif (qName.equals("person")) (qName.equals("person"))

System.out.println(FirstName + " " + System.out.println(FirstName + " " + LastName + " (" + IdNum + ")" );LastName + " (" + IdNum + ")" );

//Update the context flags://Update the context flags:

ifif (qName.equals("first")) (qName.equals("first"))

InFirst = false; InFirst = false;

//(Similarly for "last" and InLast)//(Similarly for "last" and InLast)

Page 26: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 26

SAX Processing Example (8)SAX Processing Example (8)

Application Application mainmain method: method:

publicpublic static void main (String args[]) static void main (String args[])

throwsthrows Exception { Exception {

// Instantiate an XMLReader (from JAXP // Instantiate an XMLReader (from JAXP

// SAXParserFactory):// SAXParserFactory):

SAXParserFactory spf =SAXParserFactory spf =SAXParserFactory.newInstance();SAXParserFactory.newInstance();

trytry { {

SAXParser saxParser = spf.newSAXParser();SAXParser saxParser = spf.newSAXParser();

XMLReaderXMLReader xmlReader = xmlReader =saxParser.getXMLReader();saxParser.getXMLReader();

Page 27: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 27

SAX Processing Example (9)SAX Processing Example (9)

MainMain method continues: method continues:// Instantiate and pass a new // Instantiate and pass a new // ContentHandler to xmlReader:// ContentHandler to xmlReader:

ContentHandlerContentHandler handler = handler = newnew SAXDBApp(); SAXDBApp(); xmlReader.xmlReader.setContentHandlersetContentHandler(handler);(handler); forfor (int i = 0; i < args.length; i++) { (int i = 0; i < args.length; i++) { xmlReader.xmlReader.parseparse(args[i]);(args[i]); }}

} } catchcatch (Exception e) { (Exception e) {System.err.println(e.getMessage());System.err.println(e.getMessage());

System.exit(1);System.exit(1);};};

} // main} // main

Page 28: SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser

SDPL 2003 Notes 3: XML Processor Interfaces 28

SAX: SummarySAX: Summary

A low-level parser-interface for XML A low-level parser-interface for XML documentsdocuments

Reports document parsing events Reports document parsing events through method call-backsthrough method call-backs– > efficient: does not create in-memory > efficient: does not create in-memory

representation of the documentrepresentation of the document– > often used on servers, and to process > often used on servers, and to process

LARGE documentsLARGE documents