Xerces2: The Sequel With No Equal
Andy Clark
ApacheCon US - Las Vegas, Nevada, 20 November 2002
Introduction
* Speaker
  * Worked for IBM
  * Currently unemployed
* Parser
  * First developed in IBM’s Tokyo research lab
  * Maintained and expanded in California
  * Donated to Apache
  * Work continues in Toronto
Agenda
* Xerces1 Overview: design and problems
* Xerces2 Overview: challenges and design
* Q & A
Xerces1 Overview: Design and Problems
Andy Clark
Design
* XML4J/Xerces1 designed for performance
* Parser implementation
  * Parsing pipeline
  * Custom reader implementations
  * StringPool
    * Defers transcoding of byte buffers until needed
    * Symbol table for common document strings
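The symbol-table idea can be illustrated with a small Java sketch. `SimpleSymbolTable` is a hypothetical class, not the Xerces1 `StringPool` or the Xerces2 `SymbolTable` API: it returns one canonical `String` per distinct name, so repeated element and attribute names are stored once and can be compared by reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table: one canonical String per distinct name,
// so repeated names share storage and can be compared with ==.
final class SimpleSymbolTable {
    private final Map<String, String> symbols = new HashMap<>();

    // Adds (or looks up) the symbol spelled by buffer[offset, offset + length).
    String addSymbol(char[] buffer, int offset, int length) {
        return addSymbol(new String(buffer, offset, length));
    }

    // Returns the canonical instance for the given name.
    String addSymbol(String name) {
        String existing = symbols.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    int size() {
        return symbols.size();
    }
}
```

A production table would hash the characters in place rather than allocating a temporary `String` on every lookup; this sketch trades that for brevity.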
Pipeline Configuration
Diagram: XML → Scanner → Validator → Parser → API
* Intended to be generic
Pipeline Configuration Problems
Diagram: XML → Scanner → Validator → Parser → API
* Hard-coded dependencies on implementation
* Inconsistent interfaces
Custom Readers
Diagram: Scanner, Entity Handler and Reader Stack; one reader per encoding (UTF-8 Reader, UCS Reader, EBCDIC Reader, Generic Reader), each providing scanName, scanAttValue, scanContent, …
Custom Readers Problems
* Duplicated code
  * Allows more bugs to appear
  * Bugs are different based on encoding because code is not shared
* More complicated
Deferred Transcoding
Diagram: XML → Reader → DataBuffer, DataBuffer, …; a StringProducer, a ParserComponent and the StringPool
* StringPool methods:
  * addString(String):int
  * toString(int):String
  * addString(StringProducer,int,int):int
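To make the handle-based interface concrete, here is a hypothetical, much-simplified pool that mirrors only the `addString(String):int` and `toString(int):String` signatures shown on the slide. The deferred `StringProducer` variant is left out, and deduplicating equal strings is an assumption of the sketch. It shows why every component needs a reference to the pool: an `int` handle carries no text by itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical string pool: strings travel through the parser as int handles,
// so any component that needs the actual text must hold a reference to the pool.
final class SimpleStringPool {
    private final List<String> strings = new ArrayList<>();
    private final Map<String, Integer> handles = new HashMap<>();

    // Returns the handle for s, adding it to the pool on first use.
    int addString(String s) {
        Integer handle = handles.get(s);
        if (handle == null) {
            handle = strings.size();
            strings.add(s);
            handles.put(s, handle);
        }
        return handle;
    }

    // Resolves a handle back to its String (overloads Object.toString()).
    String toString(int handle) {
        return strings.get(handle);
    }
}
```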
Deferred Transcoding Problems
* All components need a reference to the StringPool
  * Strings not immediately available to methods
  * Must make a call to the StringPool to query a String
* Memory management is complicated
  * Responsibility of callee to free resources
  * Uses more memory
Xerces2 Overview: Challenges and Design
Andy Clark
Challenges
* Requirements
  * Simple design and implementation
  * Easy to maintain
  * More modularity and configurability
  * Support current and future features
* Design decisions
  * Always transcode bytes into Unicode characters
    * Removes StringPool and dependencies
  * Clean architecture
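A minimal JDK-only sketch of the "always transcode" decision, assuming the encoding is already known (detecting it from the byte-order mark and XML declaration is omitted). The bytes are decoded once by an `InputStreamReader`, and the hypothetical `scanName` routine then runs unchanged on UTF-8 and UTF-16 input because it only ever sees `char`s.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Sketch of "always transcode": bytes are decoded to chars once, up front,
// so the scanning code below is identical for every encoding.
final class TranscodingScanner {
    private final PushbackReader in;

    TranscodingScanner(byte[] bytes, Charset encoding) {
        // The JDK decoder does the transcoding; the scanner never sees bytes.
        this.in = new PushbackReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), encoding));
    }

    // Hypothetical scanName: collects letters, digits and '-', '_', '.', ':'.
    // This simplifies the XML Name production; it is not a full check.
    String scanName() {
        StringBuilder name = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.' || c == ':') {
                    name.append((char) c);
                } else {
                    in.unread(c); // leave the delimiter for the next scan
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return name.toString();
    }
}
```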
Xerces Native Interface (XNI)
* “Streaming” information set
  * Similar to SAX
  * No loss of document information*
* Parser configuration and layering
* Future extensions
  * Native pull-parser, tree model, etc.
*Does not preserve all document information, but communicates more information to the application than DOM or SAX.
Diagram: XNI packages (legend: interface, class, package, extends)
* org.apache.xerces.xni
  * Interfaces: XMLDocumentHandler, XMLDocumentFragmentHandler, XMLDTDHandler, XMLDTDContentModelHandler, XMLLocator, XMLResourceIdentifier, NamespaceContext, XMLAttributes, Augmentations
  * Classes: QName, XMLString
  * Exception: XNIException (extends java.lang.RuntimeException)
* org.apache.xerces.xni.parser
  * Interfaces: XMLParserConfiguration, XMLPullParserConfiguration, XMLComponentManager, XMLComponent, XMLErrorHandler, XMLEntityResolver, XMLDocumentScanner, XMLDTDScanner, XMLDocumentSource, XMLDocumentFilter, XMLDTDSource, XMLDTDFilter, XMLDTDContentModelSource, XMLDTDContentModelFilter
  * Class: XMLInputSource
  * Exceptions: XMLConfigurationException, XMLParseException (both extend XNIException)
Parsing Pipeline
Diagram: XML → Scanner → Validator → Parser → API
* Handlers communicate information between parser components
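The handler-based pipeline can be sketched with hypothetical, heavily simplified interfaces. `DocHandler`, `DocFilter` and the toy stages below are illustrative names, not the real XNI `XMLDocumentHandler`/`XMLDocumentFilter` signatures, which also carry attributes, augmentations and many more events. Each stage knows only the handler interface of the next stage, so stages can be inserted, removed or replaced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified document handler (not the real XNI interface).
interface DocHandler {
    void startElement(String name);
    void endElement(String name);
}

// A filter is a handler on its input side and a source on its output side.
abstract class DocFilter implements DocHandler {
    protected DocHandler next;
    void setDocumentHandler(DocHandler next) { this.next = next; }
}

// Toy scanner: turns "<a><b></b></a>" into events (no attributes or text).
final class TagScanner {
    private DocHandler handler;
    void setDocumentHandler(DocHandler handler) { this.handler = handler; }

    void scan(String doc) {
        int i = 0;
        while ((i = doc.indexOf('<', i)) != -1) {
            int end = doc.indexOf('>', i);
            String tag = doc.substring(i + 1, end);
            if (tag.startsWith("/")) handler.endElement(tag.substring(1));
            else handler.startElement(tag);
            i = end + 1;
        }
    }
}

// Toy validator: checks that end tags match start tags, then passes events on.
final class NestingValidator extends DocFilter {
    private final Deque<String> open = new ArrayDeque<>();

    public void startElement(String name) { open.push(name); next.startElement(name); }

    public void endElement(String name) {
        String expected = open.poll();
        if (!name.equals(expected)) {
            throw new IllegalStateException("expected </" + expected + "> but got </" + name + ">");
        }
        next.endElement(name);
    }
}

// Toy parser end of the pipeline: records events for an API layer to use.
final class EventRecorder implements DocHandler {
    final List<String> events = new ArrayList<>();
    public void startElement(String name) { events.add("start:" + name); }
    public void endElement(String name) { events.add("end:" + name); }
}
```

The wiring (`scanner.setDocumentHandler(validator); validator.setDocumentHandler(recorder);`) is done by whatever code assembles the pipeline.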
Handler Overview
Diagram: XML → Document Scanner and DTD Scanner → Validator → Parser → API
* Document Scanner emits XMLDocumentHandler events
* DTD Scanner emits XMLDTDHandler and XMLDTDContentModelHandler events
Parser Layout
Components and Manager
* Component Manager
* Regular components: SymbolTable, GrammarPool, DatatypeFactory
* Configurable components: Scanner, Validator, Entity Manager, Error Reporter
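One way to picture this layout, as a hypothetical sketch: the real Xerces2 interfaces are `XMLComponentManager` and `XMLComponent`, but the class names, the string keys `"validation"` and `"symbol-table"`, and the method set below are simplified stand-ins. The manager owns feature settings and shared regular components (a plain `Map` stands in for a symbol table), and each configurable component re-reads what it needs in `reset()` before a parse.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical component manager: holds features and shared objects.
final class ComponentManager {
    private final Map<String, Boolean> features = new HashMap<>();
    private final Map<String, Object> properties = new HashMap<>();

    void setFeature(String id, boolean state) { features.put(id, state); }
    boolean getFeature(String id) { return features.getOrDefault(id, false); }
    void setProperty(String id, Object value) { properties.put(id, value); }
    Object getProperty(String id) { return properties.get(id); }
}

// Hypothetical configurable component: pulls its settings from the manager.
interface ConfigurableComponent {
    void reset(ComponentManager manager);
}

final class ToyValidator implements ConfigurableComponent {
    boolean validating;
    Map<String, String> symbols; // shared regular component

    @SuppressWarnings("unchecked")
    public void reset(ComponentManager manager) {
        validating = manager.getFeature("validation");
        symbols = (Map<String, String>) manager.getProperty("symbol-table");
    }
}

final class ToyScanner implements ConfigurableComponent {
    Map<String, String> symbols; // the same instance the validator sees

    @SuppressWarnings("unchecked")
    public void reset(ComponentManager manager) {
        symbols = (Map<String, String>) manager.getProperty("symbol-table");
    }
}
```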
Reader Management
Diagram: Scanner, Entity Scanner (scanName, scanAttValue, scanContent, …), Entity Manager and Reader Stack; readers: UTF-8 Reader, UCS Reader, EBCDIC Reader, Generic Reader
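A hypothetical sketch of the Xerces2-style split: the scanning methods live in one entity scanner, and the entity manager keeps a stack of ordinary `java.io.Reader` objects, one per open entity, so encoding-specific code is confined to the readers. The class names and the simplified `scanContent` are assumptions for illustration; when an entity's reader is exhausted, the scanner pops back to the entity that referenced it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical entity manager: one java.io.Reader per open entity.
final class ToyEntityManager {
    final Deque<Reader> readers = new ArrayDeque<>();

    // Called when the document starts or an entity reference is expanded.
    void startEntity(Reader reader) {
        readers.push(reader);
    }
}

// Hypothetical entity scanner: the single place that reads characters.
final class ToyEntityScanner {
    private static final int NONE = -2; // no character buffered
    private final ToyEntityManager manager;
    private int peeked = NONE;

    ToyEntityScanner(ToyEntityManager manager) {
        this.manager = manager;
    }

    // Next char from the innermost entity; finished entities are popped so
    // scanning resumes in the entity that referenced them. -1 at the very end.
    private int nextFromStack() {
        try {
            while (!manager.readers.isEmpty()) {
                int c = manager.readers.peek().read();
                if (c != -1) {
                    return c;
                }
                manager.readers.pop().close();
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int peek() {
        if (peeked == NONE) {
            peeked = nextFromStack();
        }
        return peeked;
    }

    int read() {
        int c = peek();
        peeked = NONE;
        return c;
    }

    // Simplified scanContent: character data up to the next '<' or the end.
    String scanContent() {
        StringBuilder text = new StringBuilder();
        while (peek() != -1 && peek() != '<') {
            text.append((char) read());
        }
        return text.toString();
    }
}
```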
Parser Configuration: Before
Diagram: XML → Scanner → Validator, built into the Document Parser; SAX Parser and DOM Parser extend the Document Parser
* The parser pipeline is part of the document parser base class.
* Re-configuring the parser while still taking advantage of the API generator code required duplicating code.
Parser Configuration: After
Diagram: SAX Parser and DOM Parser extend the Document Parser, which uses a separate Parser Configuration containing XML → Scanner → Validator
* The parser pipeline and settings are specified in a separate parser configuration object.
* Allows re-use of the framework without rewriting existing code.
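The before/after contrast can be sketched with hypothetical types (`ParserConfig`, `EventSink` and the two stub configurations are illustrative, not the real `XMLParserConfiguration` API). The document parser depends only on the configuration interface, so swapping in a different pipeline, such as a non-validating one, needs no change to the parser class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event sink that a configuration drives (simplified).
interface EventSink {
    void event(String description);
}

// Hypothetical parser configuration: owns and runs the pipeline.
interface ParserConfig {
    void setSink(EventSink sink);
    void parse(String systemId);
}

// Stub configurations that only report which pipeline stages ran.
final class ValidatingConfig implements ParserConfig {
    private EventSink sink;
    public void setSink(EventSink sink) { this.sink = sink; }
    public void parse(String systemId) {
        sink.event("scan " + systemId);
        sink.event("validate " + systemId);
    }
}

final class NonValidatingConfig implements ParserConfig {
    private EventSink sink;
    public void setSink(EventSink sink) { this.sink = sink; }
    public void parse(String systemId) {
        sink.event("scan " + systemId); // no validator stage, for speed
    }
}

// The document parser is written once, against the interface only.
class DocumentParserSketch implements EventSink {
    final List<String> log = new ArrayList<>();
    private final ParserConfig config;

    DocumentParserSketch(ParserConfig config) {
        this.config = config;
        config.setSink(this);
    }

    public void event(String description) { log.add(description); }

    void parse(String systemId) { config.parse(systemId); }
}
```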
API Generators
Diagram: XNI → Document Parser → SAX Parser, DOM Parser, JavaBean Parser, …
* Different APIs can be generated from the same document parser
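As a hypothetical sketch, the same stream of simplified element events can drive two API generators. The `ElementEvents` interface is a stand-in for XNI, not part of it; one generator forwards to a standard SAX `ContentHandler`, the other builds a W3C DOM `Document` with the JDK's `javax.xml.parsers` factory. Attributes, text and namespaces are omitted.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical, simplified event interface standing in for XNI.
interface ElementEvents {
    void start(String name);
    void end(String name);
}

// API generator 1: forwards events to a standard SAX ContentHandler.
final class SaxGenerator implements ElementEvents {
    private final ContentHandler handler;

    SaxGenerator(ContentHandler handler) { this.handler = handler; }

    public void start(String name) {
        try {
            handler.startElement("", name, name, new AttributesImpl());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public void end(String name) {
        try {
            handler.endElement("", name, name);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}

// API generator 2: builds a W3C DOM tree from the same events.
final class DomGenerator implements ElementEvents {
    final Document document;
    private Node current;

    DomGenerator() {
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        current = document;
    }

    public void start(String name) {
        Element element = document.createElement(name);
        current.appendChild(element);
        current = element; // descend into the new element
    }

    public void end(String name) {
        current = current.getParentNode(); // climb back to the parent
    }
}
```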
Sample Parser Configuration #1
Diagram: SAX Parser and DOM Parser on the Document Parser, using an HTML Parser Configuration: HTML Scanner → HTML Tag Balancer
* HTML parser
  * Available as NekoHTML download
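NekoHTML's tag balancer is considerably more elaborate (it applies HTML-specific element rules), so the following is only a hypothetical sketch of the core idea over start/end events: close inner elements implicitly when an outer end tag arrives, drop end tags that match nothing open, and close whatever remains at the end of the document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified tag balancer: always emits well-nested tags.
final class ToyTagBalancer {
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> out = new ArrayList<>();

    void startElement(String name) {
        open.push(name);
        out.add("<" + name + ">");
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            return; // stray end tag: nothing open to close, so drop it
        }
        // Implicitly close inner elements until the matching one is closed.
        String closed;
        do {
            closed = open.pop();
            out.add("</" + closed + ">");
        } while (!closed.equals(name));
    }

    void endDocument() {
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
    }
}
```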
Sample Parser Configuration #2
Diagram: SAX Parser and DOM Parser on the Document Parser, using a Non-Validating Parser Configuration: XML → Scanner / Namespace Binder
* Non-validating parser (for performance)
  * Available with Xerces download
Sample Parser Configuration #3
Diagram: SAX Parser and DOM Parser on the Document Parser, using an XInclude Parser Configuration: XML → Scanner → XInclude → Validator
* XInclude processing
  * Not yet implemented
Sample Parser Configuration #4
Diagram: SAX Parser and DOM Parser on the Document Parser, using a Database Parser Configuration: DB → Database Query → Validator
* Database result set converted to XML
  * Not yet implemented
That’s All, Folks!
* Questions and answers
  * Any questions?
* Links
  * http://www.apache.org/~andyc/xml/present/
  * http://xml.apache.org/xerces2-j/
  * http://www.apache.org/~andyc/neko/