SDPL 2011, 3.4 Streaming API for XML (StAX): Could we process XML...


Upload: lynne-small

Post on 02-Jan-2016


TRANSCRIPT

Page 1: 3.4 Streaming API for XML (StAX)

SDPL 2011, 3.4 Streaming API for XML

Could we process XML documents more conveniently than with SAX, and yet more efficiently?

A: Yes, with the Streaming API for XML (StAX)
– general introduction
– an example
– comparison with SAX

Page 2: StAX: General

Latest of the standard Java XML parser interfaces
– Origin: the XMLPull API (A. Slominski, ~2000)
– developed in a Java Community Process led by BEA Systems (2003)
– included in JAXP 1.4, in Java WSDP 1.6, and in Java SE 6 (JDK 1.6)

An event-driven streaming API, like SAX
– does not build an in-memory representation

A "pull API"
– lets the application ask for individual events
– unlike a "push API" such as SAX
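The pull style can be illustrated with a minimal sketch. It assumes a small in-memory document, and the class and method names (`PullDemo`, `elementNames`) are made up for this illustration. The point is that the application owns the loop and asks the parser for the next event when it is ready for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the slides.
class PullDemo {

    // Returns the local names of all start tags, in document order.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The application drives the parse: each next() pulls one event.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<a><b/><c>text</c></a>"));
    }
}
```

With SAX the same task would need a handler object whose `startElement` callback is invoked by the parser; here the result is simply built up in local variables of the calling method.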

Page 3: Advantages of Pull Parsing

A pull API provides events, on demand, from the chosen stream
– can cancel parsing, say, after processing the header of a long message
– can read multiple documents simultaneously
– application-controlled access (~ iterator design pattern) usually simpler than SAX-style call-backs (~ observer design pattern)
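The first two advantages can be sketched as follows. The element name `subject`, the message layout, and the class `PullAdvantages` are assumptions made for the illustration. `firstSubject` stops pulling events as soon as it has what it needs, and `sameStartTags` advances two readers in lockstep.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the slides.
class PullAdvantages {

    // Returns the text of the first <subject> element and reads no further.
    static String firstSubject(String xml) throws XMLStreamException {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "subject".equals(r.getLocalName())) {
                    return r.getElementText();   // cancel parsing here
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    // Pulls from the reader until the next start tag; null when the input ends.
    static String nextStartTag(XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                return r.getLocalName();
            }
        }
        return null;
    }

    // Reads two documents simultaneously and compares their start-tag sequences.
    static boolean sameStartTags(String xmlA, String xmlB) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        XMLStreamReader a = f.createXMLStreamReader(new StringReader(xmlA));
        XMLStreamReader b = f.createXMLStreamReader(new StringReader(xmlB));
        try {
            while (true) {
                String nameA = nextStartTag(a);
                String nameB = nextStartTag(b);
                if (nameA == null || nameB == null) {
                    return nameA == null && nameB == null;  // both must end together
                }
                if (!nameA.equals(nameB)) {
                    return false;
                }
            }
        } finally {
            a.close();
            b.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstSubject(
            "<msg><header><subject>Hi</subject></header><body>long text</body></msg>"));
        System.out.println(sameStartTags("<a><b/></a>", "<a>text<b x='1'/></a>"));
    }
}
```

With a push parser, interleaving two documents like this would typically require separate threads or buffering one document's events, because the parser rather than the application controls when callbacks fire.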

Page 4: Cursor and Iterator APIs

StAX consists of two sets of APIs
– (1) cursor APIs, and (2) iterator APIs
– differ by representation of parse events

(1) cursor API XMLStreamReader
– lower-level
– methods hasNext() and next() to scan events, represented as int constants START_DOCUMENT, START_ELEMENT, ...
– access methods, depending on current event type: getName(), getAttributeValue(..), getText(), ...
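A short sketch of the cursor API's access methods, assuming a tiny in-memory document with an `id` attribute. The class `CursorDemo` and its output format are invented for the illustration. Which accessor may be called depends on the event the cursor currently sits on.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the slides.
class CursorDemo {

    // Produces one line per start tag, non-whitespace text run, and end tag.
    static String describe(String xml) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        // Coalescing merges adjacent character data into a single CHARACTERS event.
        f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append("start ").append(r.getName().getLocalPart());
                        // A null namespace URI means "do not check the namespace".
                        String id = r.getAttributeValue(null, "id");
                        if (id != null) {
                            out.append(" id=").append(id);
                        }
                        out.append('\n');
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!r.isWhiteSpace()) {
                            out.append("text ").append(r.getText()).append('\n');
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("end ").append(r.getName().getLocalPart()).append('\n');
                        break;
                    default:
                        break;   // ignore comments, processing instructions, etc.
                }
            }
        } finally {
            r.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.print(describe("<item id=\"7\">pen</item>"));
    }
}
```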

Page 5: (2) XMLEventReader Iterator API

XMLEventReader provides contents of an XML document to the application using an event object iterator

Parse events represented as immutable XMLEvent objects
– received using methods hasNext() and nextEvent()
– event properties accessed through their methods
– can be stored (if needed)
– require more resources than the cursor API (see later)

Event lookahead, without advancing in the stream, with XMLEventReader.peek() and XMLStreamReader.getEventType()
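The two properties that set the iterator API apart, storable event objects and lookahead with `peek()`, can be sketched as below. The class `EventDemo` and both method names are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the slides.
class EventDemo {

    // Keeps every StartElement event; the objects stay usable after the parse.
    static List<StartElement> startElements(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<StartElement> kept = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    kept.add(event.asStartElement());
                }
            }
        } finally {
            reader.close();
        }
        return kept;
    }

    // Counts elements whose start tag is directly followed by their end tag.
    static int countEmptyElements(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        int empty = 0;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    XMLEvent next = reader.peek();   // look ahead without consuming
                    if (next != null && next.isEndElement()) {
                        empty++;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return empty;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a><b/><c>t</c><d></d></a>";
        System.out.println(startElements(xml).size() + " elements, "
            + countEmptyElements(xml) + " empty");
    }
}
```

Each pulled event is a separate object here, which is the extra cost relative to the cursor API that the later measurements try to quantify.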

Page 6: Writing APIs

StAX is a bidirectional API
– also allows writing XML data, through an XMLStreamWriter or an XMLEventWriter

Useful for "marshaling" data structures into XML

Writers are not required to enforce well-formedness (not to mention validity)
– they provide some support: escaping of reserved characters like & and <, and adding missing end tags
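A minimal sketch of the writing side, using `XMLStreamWriter` with an in-memory sink. The element names `note` and `body` and the class `WriterDemo` are assumptions for the illustration. It relies on two pieces of writer support: `writeCharacters` escapes reserved characters, and `writeEndDocument` closes any elements that are still open.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of the slides.
class WriterDemo {

    // Marshals a text value into a small <note><body>...</body></note> document.
    static String note(String text) throws XMLStreamException {
        StringWriter sink = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
        w.writeStartElement("note");
        w.writeAttribute("lang", "en");
        w.writeStartElement("body");
        w.writeCharacters(text);    // & and < are escaped by the writer
        w.writeEndDocument();       // writes the end tags for <body> and <note>
        w.close();
        return sink.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(note("a < b & c"));
    }
}
```

No explicit `writeEndElement()` calls are needed in this sketch; in code that writes siblings, each element would of course be closed explicitly before the next one starts.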

Page 7: Example of Using StAX (1/6)

Use the StAX iterator interfaces to
– fold element tag names to uppercase, and to
– strip comments

Outline:
– Initialize
  » an XMLEventReader for the input document
  » an XMLEventWriter (for System.out)
  » an XMLEventFactory for creating modified StartElement and EndElement events
– Use them to read all input events, and to write some of them, possibly modified

Page 8: StAX example (2/6)

First import relevant interfaces & classes:

    import java.io.*;
    import javax.xml.stream.*;
    import javax.xml.stream.events.*;
    import javax.xml.namespace.QName;

    public class capitalizeTags {

      public static void main(String[] args)
          throws FactoryConfigurationError, XMLStreamException, IOException {

        if (args.length != 1) System.exit(1);
        InputStream input = new FileInputStream(args[0]);

Page 9: StAX example (3/6)

Initialize XMLEventReader/Writer/Factory:

        XMLInputFactory xif = XMLInputFactory.newInstance();
        xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        XMLEventReader xer = xif.createXMLEventReader(input);

        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        XMLEventWriter xew = xof.createXMLEventWriter(System.out);

        XMLEventFactory xef = XMLEventFactory.newInstance();

Page 10: StAX example (4/6)

Iterate over events of the InputStream:

        while (xer.hasNext()) {
          XMLEvent inEvent = xer.nextEvent();
          if (inEvent.isStartElement()) {
            StartElement se = (StartElement) inEvent;
            QName inQName = se.getName();
            String localName = inQName.getLocalPart();
            xew.add( xef.createStartElement(
                inQName.getPrefix(),
                inQName.getNamespaceURI(),
                localName.toUpperCase(),
                se.getAttributes(),
                se.getNamespaces() ) );

Page 11: StAX example (5/6)

Event iteration continues, to capitalize end tags:

          } else if (inEvent.isEndElement()) {
            EndElement ee = (EndElement) inEvent;
            QName inQName = ee.getName();
            String localName = inQName.getLocalPart();
            xew.add( xef.createEndElement(
                inQName.getPrefix(),
                inQName.getNamespaceURI(),
                localName.toUpperCase(),
                ee.getNamespaces() ) );

Page 12: StAX example (6/6)

Output other events, except for comments; finish when input ends:

          } else if (inEvent.getEventType() != XMLStreamConstants.COMMENT) {
            xew.add(inEvent);
          }
        } // while (xer.hasNext())
        xer.close(); input.close();
        xew.flush(); xew.close();
      } // main()
    } // class capitalizeTags

Page 13: Efficiency of Streaming APIs?

An experiment of SAX vs StAX for scanning documents

Task: Count and report the number of elements, attributes, character fragments, and total char length

Inputs: Similar prose-oriented documents, of different sizes
– repeated fragments of W3C XML Schema Rec (Part 1)

Tested on OpenJDK 1.6.0 (different updates), with
– Red Hat Linux 6.0.52, 3 GHz Pentium, 1 GB RAM ("OLD")
– 64-bit CentOS Linux 5, 2.93 GHz Intel Core 2 Duo, 4 GB RAM ("NEW")
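The slides do not show how the scan times were taken. The following is only a sketch of one plausible setup: a synthetic document built from a repeated fragment, scanned with the cursor API, and timed with `System.nanoTime()`. The fragment, the class `ScanTimer`, and the choice to report the best of several runs are all assumptions for the illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical timing harness, not the one used for the slides.
class ScanTimer {

    // Builds a test document by repeating one fragment n times.
    static String repeated(int n) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < n; i++) {
            sb.append("<p id='").append(i).append("'>some text</p>");
        }
        return sb.append("</doc>").toString();
    }

    // Scans the document with the cursor API and counts its elements.
    static int countElements(String xml) throws XMLStreamException {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    // Returns the fastest of several scans, in nanoseconds.
    static long bestNanos(String xml, int runs) throws XMLStreamException {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            countElements(xml);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (int n = 1000; n <= 16000; n *= 2) {
            String xml = repeated(n);
            System.out.println(xml.length() / 1024 + " KB: "
                + bestNanos(xml, 5) / 1_000_000.0 + " ms");
        }
    }
}
```

Taking the best of several runs reduces the influence of JIT warm-up and garbage collection; a study that cares about those effects would report them separately.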

Page 14: Essentials of the SAX Solution

Obtain and use a JAXP SAX parser:

    String docFile; // initialized from cmd line
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setValidating(validate); // from cmd option
    spf.setNamespaceAware(true);
    SAXParser sp = spf.newSAXParser();
    CountHandler ch = new CountHandler();
    sp.parse( new File(docFile), ch );
    ch.printResult(); // print the statistics

Page 15: SAX Solution: CountHandler

    public static class CountHandler extends DefaultHandler {
      // Instance vars for statistics:
      int elemCount = 0, charFragCount = 0,
          totalCharLen = 0, attrCount = 0;

      // Called by the parser for each start tag
      public void startElement(String nsURI, String locName,
                               String qName, Attributes atts) {
        elemCount++;
        attrCount += atts.getLength();
      }

      // Called by the parser for each fragment of character data
      public void characters(char[] buf, int start, int length) {
        charFragCount++;
        totalCharLen += length;
      }

      // printResult() is not shown on the slide
    }

Page 16: Essentials of the StAX Solution

First, initialize:

    XMLInputFactory xif = XMLInputFactory.newInstance();
    xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
    InputStream input = new FileInputStream( docFile );
    int elemCount = 0, charFragCount = 0,
        totalCharLen = 0, attrCount = 0;

Then parse the InputStream, using (a) the cursor API, or (b) the event iterator API

Page 17: (a) StAX Cursor API Solution (1)

    XMLStreamReader xsr = xif.createXMLStreamReader(input);
    while (xsr.hasNext()) {
      int eventType = xsr.next();
      switch (eventType) {
        case XMLEvent.START_ELEMENT:
          elemCount++;
          attrCount += xsr.getAttributeCount();
          break;

Page 18: (a) StAX Cursor API Solution (2)

        case XMLEvent.CHARACTERS:
          charFragCount++;
          totalCharLen += xsr.getTextLength();
          break;
        default: break;
      } // switch
    } // while (xsr.hasNext())
    xsr.close();
    input.close();

Page 19: (b) StAX Iterator API Solution (1)

    XMLEventReader xer = xif.createXMLEventReader( input );
    while (xer.hasNext()) {
      XMLEvent event = xer.nextEvent();
      if (event.isStartElement()) {
        elemCount++;
        Iterator attrs = event.asStartElement().getAttributes();
        while (attrs.hasNext()) {
          attrs.next(); attrCount++; }
      } // if (event.isStartElement())

Page 20: (b) StAX Iterator API Solution (2)

      if (event.isCharacters()) {
        charFragCount++;
        totalCharLen += ((Characters) event).getData().length();
      }
    } // while (xer.hasNext())
    xer.close();
    input.close();

Page 21: Efficiency of SAX vs StAX

[Chart: Document scanning times. x-axis: size (KB), 0 to 3000; y-axis: time (ms), 100 to 550; series: SAX + validate, SAX, StAX events, StAX cursor]

Page 22: Efficiency of SAX vs StAX (NEW)

[Chart: Document scanning times. x-axis: size (KB), 0 to 3000; y-axis: time (ms), 0 to 800; series: StAX events, SAX + validate, SAX, StAX cursor]

Page 23: Observations

StAX cursor API is the most efficient

Overhead of XMLEvent objects makes the StAX iterator some 50–80% slower

On small documents, SAX is ~40–100% slower than the StAX cursor API

Overhead of DTD validation adds ~5–10% to SAX parsing time

StAX loses its advantage with bigger documents:

Page 24: Times on Larger Documents

[Chart: Document scanning times. x-axis: size (MB), 5 to 50; y-axis: time (ms), 0 to 4000; series: StAX events, StAX cursor, SAX]

Why? Let's take a look at memory usage

Page 25: Memory Usage of SAX vs StAX

StAX implementation has a memory leak! (Should get fixed in future releases)

[Chart: Used main memory. x-axis: document size (MB), 5 to 50; y-axis: mem (MB), 0 to 250; series: StAX events, StAX cursor, SAX; annotation: < 6 MB]
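How the memory figures were obtained is not stated on the slides. One common, coarse way to observe heap use from inside a Java program is `Runtime.totalMemory() - Runtime.freeMemory()`, sampled while scanning. The sketch below assumes that approach; the class `MemoryProbe` and the sampling interval of 1000 events are arbitrary choices made for the illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probe, not the measurement code behind the slides.
class MemoryProbe {

    // Heap currently in use by the JVM, in bytes (includes garbage not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Scans the document with the cursor API and returns the highest sampled heap use.
    static long peakUsedBytes(String xml) throws XMLStreamException {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        long peak = usedBytes();
        int events = 0;
        try {
            while (r.hasNext()) {
                r.next();
                if (++events % 1000 == 0) {          // sample every 1000 events
                    peak = Math.max(peak, usedBytes());
                }
            }
        } finally {
            r.close();
        }
        return peak;
    }

    public static void main(String[] args) throws XMLStreamException {
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < 100000; i++) {
            sb.append("<p>some text</p>");
        }
        String xml = sb.append("</doc>").toString();
        System.out.println("peak heap use: "
            + peakUsedBytes(xml) / (1024 * 1024) + " MB");
    }
}
```

Because the figure includes uncollected garbage, a steadily rising curve only suggests a leak; confirming one would need a heap profiler or measurements taken after forced collections.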

Page 26: Memory Usage of SAX vs StAX (NEW)

Memory leak also in the SAX implementation!

[Chart: Used main memory. x-axis: document size (MB), 5 to 50; y-axis: mem (MB), 0 to 500; series: StAX events, SAX, StAX cursor]

Page 27: Circumventing the Memory Leak

The bug appears to be related to a DOCTYPE declaration with an external DTD

Without a DOCTYPE declaration
– In the first experiment, each API uses less than 6 MB
– In the second experiment, the StAX Event objects still require increasing amounts of memory; see next
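The experiments above sidestep the problem by removing the DOCTYPE declaration from the documents. When the documents cannot be edited, a StAX factory can instead be told not to process DTDs, using the standard `XMLInputFactory.SUPPORT_DTD` and `IS_SUPPORTING_EXTERNAL_ENTITIES` properties. Whether this also avoids the memory growth reported here is an untested assumption; the sketch only shows the configuration, and the class `NoDtdScan` is made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the slides.
class NoDtdScan {

    // Creates a factory whose readers neither process DTDs nor load external entities.
    static XMLInputFactory factoryIgnoringDtds() {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        xif.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return xif;
    }

    // Counts elements; the DOCTYPE declaration is passed over without loading the DTD.
    static int countElements(String xml) throws XMLStreamException {
        XMLStreamReader r = factoryIgnoringDtds().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!DOCTYPE doc SYSTEM \"no-such-file.dtd\"><doc><p/></doc>";
        System.out.println(countElements(xml));
    }
}
```

Note that with DTD support switched off, entity references that are declared only in the DTD can no longer be resolved, so this is suitable only for documents that do not depend on such entities.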

Page 28: SAX vs StAX memory need (w.o. DTD)

[Chart: Used main memory (without DTD). x-axis: document size (MB), 5 to 50; y-axis: mem (MB), 0 to 180; series: StAX events, SAX DTD, StAX cursor]

Page 29: Speed on documents without DTD

[Chart: Scan times for documents w.o. DTD. x-axis: size (MB), 5 to 50; y-axis: time (ms), 0 to 3000; series: StAX events, StAX cursor, SAX]

Page 30: Speed on documents without DTD (NEW)

[Chart: Scan times for documents w.o. DTD. x-axis: size (MB), 5 to 50; y-axis: time (ms), 200 to 1800; series: StAX events, SAX, StAX cursor]

Page 31: StAX: Summary

Event-based streaming pull API for XML documents

More convenient than SAX
– and often more efficient, esp. the cursor API with small docs

Also supports writing of XML data

A potential substitute for SAX
– NB: Sun Java Streaming XML Parser (in JDK 1.6) is non-validating (but the API allows validation, too)
– once some implementation bugs (in JDK 1.6) get eliminated