on-the-fly validation of xml markup languages using off-the-shelf tools

22
On-the-fly Validation of XML Validation of XML Markup Languages Markup Languages using off-the-shelf using off-the-shelf Tools Tools Mikko Saesmaa Mikko Saesmaa Pekka Pekka Kilpeläinen Kilpeläinen Dept of Computer Science Dept of Computer Science University of Kuopio, Finland University of Kuopio, Finland

Upload: hisa

Post on 12-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

On-the-fly Validation of XML Markup Languages using off-the-shelf Tools. Mikko Saesmaa Pekka Kilpeläinen Dept of Computer Science University of Kuopio, Finland. The Talk in a Nutshell. How to support continuous validation in an XML editor... easily efficiently - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

On-the-fly Validation of XML On-the-fly Validation of XML Markup Languages using Markup Languages using off-the-shelf Toolsoff-the-shelf Tools

Mikko SaesmaaMikko Saesmaa

PekkaPekka KilpeläinenKilpeläinen

Dept of Computer ScienceDept of Computer Science

University of Kuopio, FinlandUniversity of Kuopio, Finland

Page 2: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 2

The Talk in a NutshellThe Talk in a Nutshell

How to support continuous validation in an How to support continuous validation in an XML editor... XML editor... – easilyeasily– efficientlyefficiently– against different schema languages?against different schema languages?

» DTD, XML Schema, Relax NG; ..., NVDL for compound DTD, XML Schema, Relax NG; ..., NVDL for compound docsdocs

Lesson: straightforward application of Java-Lesson: straightforward application of Java-XML APIs is effective, and quite efficient, tooXML APIs is effective, and quite efficient, too

Page 3: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 3

Intended AudienceIntended Audience

Suitable for Java/XML developersSuitable for Java/XML developers– emphasis on practical use of technologyemphasis on practical use of technology

EXTREMEEXTREME enough? enough?– Nothing extraordinary; some ideas non-Nothing extraordinary; some ideas non-

obviousobvious

Why Java?Why Java?– built-in XML and GUI facilities (JAXP + Swing)built-in XML and GUI facilities (JAXP + Swing)– general and portablegeneral and portable

Page 4: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 4

Look & Feel of ”Xeditor”Look & Feel of ”Xeditor”

- off- off- WF check, as - WF check, as XML or DTDXML or DTD- validate - validate using DTD, using DTD, or against schemaor against schema

Page 5: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 5

An An ExperimentalExperimental Editor Editor

Not production quality (yet) Not production quality (yet) – UI unfinishedUI unfinished– useful editor functionality missinguseful editor functionality missing– slightly unstableslightly unstable

Purpose to experiment with XML Purpose to experiment with XML technologytechnology– examine, learn, and explainexamine, learn, and explain

Page 6: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 6

Background and Related Background and Related WorkWork

Travis, <TAG> 1999: “Real-time XML Editor”Travis, <TAG> 1999: “Real-time XML Editor”– > Architag XRay2> Architag XRay2

Oxygen, XMLMind, XMLSpy, ...Oxygen, XMLMind, XMLSpy, ...– internal solutions of commercial editors?internal solutions of commercial editors?

Clark’s nXML (on GNU Emacs)Clark’s nXML (on GNU Emacs)– ~17,000 lines of Emacs Lisp (vs. ~1000 of ours) ~17,000 lines of Emacs Lisp (vs. ~1000 of ours)

DB research on DB research on incrementalincremental XML validation XML validation (Barbosa et al. 2004 & 2006, Balmin et al. 2004)(Barbosa et al. 2004 & 2006, Balmin et al. 2004)

Page 7: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 7

Architectural solutions of Architectural solutions of XeditorXeditor

Separation of Concerns: Editor (Swing) ignorant Separation of Concerns: Editor (Swing) ignorant of XML, of XML, XMLHandlerXMLHandler (JAXP) ignorant of editing (JAXP) ignorant of editing

After each edit, the doc is re-parsed completelyAfter each edit, the doc is re-parsed completely Pro: modularity and simplicityPro: modularity and simplicity Cons: Cons:

– limited functionality (e.g. markup completion and limited functionality (e.g. markup completion and syntax highlighting) syntax highlighting)

– some inefficiency, but not too badsome inefficiency, but not too bad

Page 8: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 8

Basic TechnicalitiesBasic Technicalities

Event-driven document checking/validation Event-driven document checking/validation – modifications caught by a Swing modifications caught by a Swing DocumentListenerDocumentListener

– errors caught as SAX parse exceptionserrors caught as SAX parse exceptions

Page 9: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 9

Error reporting in XeditorError reporting in Xeditor

Page 10: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 10

Basic Technicalities (2)Basic Technicalities (2)

JAXP is based on a Factory Design PatternJAXP is based on a Factory Design Pattern– Initialization of a Factory selects appropriate Initialization of a Factory selects appropriate

implementation of a parser, schema language, implementation of a parser, schema language, XSLT transformer, ... XSLT transformer, ... (Implementation-independence, (Implementation-independence, pluggabilitypluggability))

– > > extensibilityextensibility by new schema languages by new schema languages

Page 11: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 11

Basic Technicalities (3)Basic Technicalities (3)

Well-formedness checking and DTD Well-formedness checking and DTD validation based on JAXP SAX interfaces: validation based on JAXP SAX interfaces:

SAXParserFactorySAXParserFactory SAXParserSAXParser – Validating/non-validating parser selected based Validating/non-validating parser selected based

on editing modeon editing mode– SAX instead of DOM relevant for efficiencySAX instead of DOM relevant for efficiency– Only errors are observed; other events ignoredOnly errors are observed; other events ignored

Schema-based validation using JAXP Schema-based validation using JAXP Validators (later)Validators (later)

Page 12: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 12

Efficiency ConcernsEfficiency Concerns

Rule of thumb: access via memory much Rule of thumb: access via memory much faster than via disk, which again much faster than via disk, which again much faster than over networkfaster than over network

Document passed to parser/validator as an Document passed to parser/validator as an in-memory streamin-memory stream– > normally no observable delays> normally no observable delays

– Could cache external entities, like DTDs, tooCould cache external entities, like DTDs, too

Page 13: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 13

Different Schemas and Schema Different Schemas and Schema LanguagesLanguages

A Validator created when the user selects A Validator created when the user selects SchemaSchema

Page 14: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 14

Validation against the Validation against the SchemaSchema

As already shown:As already shown:

Page 15: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 15

Editing Schema DocumentsEditing Schema Documents

Public schema files for schema Public schema files for schema languages used for validation (as languages used for validation (as external resource files)external resource files)– xsd.rngxsd.rng or or XMLSchema.xsdXMLSchema.xsd for XML for XML

SchemaSchema– relaxng.rngrelaxng.rng for Relax NG for Relax NG

Pro: uniformity; efficiencyPro: uniformity; efficiency Con: imprecision (vs validating by Con: imprecision (vs validating by

appropriate schema compiler)appropriate schema compiler)

Page 16: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 16

Editing DTDsEditing DTDs

How to check the How to check the nonnon-XML syntax of -XML syntax of DTDs easily?DTDs easily?

Page 17: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 17

Editing DTDs (2)Editing DTDs (2)

Soln 1: Wrap the DTD in the prolog of a Soln 1: Wrap the DTD in the prolog of a dummy docdummy doc– how to check stand-alone DTDs (“external how to check stand-alone DTDs (“external

subsets”)?subsets”)? Soln 2: Parse a fixed dummy doc that refers Soln 2: Parse a fixed dummy doc that refers

to the DTD:to the DTD:<!DOCTYPE foo PUBLIC "Xeditor DTD" "IN-MEMORY" <!DOCTYPE foo PUBLIC "Xeditor DTD" "IN-MEMORY" [<!ELEMENT foo EMPTY> ]> <foo/>[<!ELEMENT foo EMPTY> ]> <foo/> randomize ”foo”, randomize ”foo”,

to avoid conflict to avoid conflict with actual DTDwith actual DTD

Page 18: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 18

Editing DTDs (3)Editing DTDs (3)

SYSTEPUBLIC

Page 19: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 19

Efficiency and ScalabilityEfficiency and Scalability

Is brute-force re-validation too inefficient?Is brute-force re-validation too inefficient? No: Delays normally unnoticeable No: Delays normally unnoticeable

times for validatingtimes for validatingXMLSchema.xsdXMLSchema.xsd

Page 20: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 20

Validating a Larger InstanceValidating a Larger Instance

appr. speed / secappr. speed / sec

28,000 lines, 1.2 MB28,000 lines, 1.2 MB

52,000 lines, 2.3 MB52,000 lines, 2.3 MB

180,000 lines, 8.1 MB180,000 lines, 8.1 MB

An XML Schema ofAn XML Schema of~ 18,000 lines/800 KB~ 18,000 lines/800 KB

Page 21: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 21

ConclusionsConclusions

Immediate validation against schemas Immediate validation against schemas in different languages (DTD, XML in different languages (DTD, XML Schema, Relax NG) easy to supportSchema, Relax NG) easy to support

Efficiency of brute-force application Efficiency of brute-force application sufficient for moderate-sized sufficient for moderate-sized documentsdocuments

Page 22: On-the-fly Validation of XML Markup Languages using off-the-shelf Tools

EML' 07 Montreal On-the-fly Validation of XML 22

Further WorkFurther Work

Potential application: teaching of XML Potential application: teaching of XML Practical enhancementsPractical enhancements

– markup completion, highlighting etc.markup completion, highlighting etc. Incremental validationIncremental validation

– to remove dependency of document lengthto remove dependency of document length– How to apply/modify standard interfaces?How to apply/modify standard interfaces?

Thank you! Questions? Comments?Thank you! Questions? Comments?