extensible markup language (xml) james atlas july 15, 2008

48
eXtensible Markup eXtensible Markup Language Language (XML) (XML) James Atlas James Atlas July 15, 2008 July 15, 2008

Upload: willis-wilkins

Post on 12-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EXtensible Markup Language (XML) James Atlas July 15, 2008

eXtensible Markup LanguageeXtensible Markup Language(XML)(XML)

James AtlasJames Atlas

July 15, 2008July 15, 2008

Page 2: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 22

AnnouncementsAnnouncements

• Project 1 due ThursdayProject 1 due Thursday

• Office hours:Office hours:

• Tues 3:30-4:30, Wed 11-12, Thurs 3:30-4:30Tues 3:30-4:30, Wed 11-12, Thurs 3:30-4:30

Page 3: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 33

ReviewReview

• Multithreaded ProgrammingMultithreaded Programming

• Network ProgrammingNetwork Programming

Page 4: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 44

QuizQuiz

Page 5: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 55

Using the Synchronized KeywordUsing the Synchronized Keyword

• Can use synchronized keyword inside of Can use synchronized keyword inside of methodsmethodssynchronized { synchronized {

// atomic operation // atomic operation

}} Implies synchronization on Implies synchronization on thisthis object object Same rules as synchronized method accessSame rules as synchronized method access

• Only one thread in synchronized code at a timeOnly one thread in synchronized code at a time

Page 6: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 66

Using the Synchronized KeywordUsing the Synchronized Keyword

• Alternatively, can use to synchronize access Alternatively, can use to synchronize access to an objectto an object

synchronized (sharedObject) { synchronized (sharedObject) {

// atomic operation // atomic operation

}} Same rules as synchronized method access Same rules as synchronized method access

• Equivalent code blocks:Equivalent code blocks:synchronized (this) { synchronized (this) {

// atomic operation // atomic operation

}}

synchronized { synchronized {

// atomic operation // atomic operation

}}

Page 7: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 77

XMLXML

• MotivationMotivation

• XML and JavaXML and Java parsingparsing generationgeneration translationtranslation

Page 8: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 88

What is XML? What is XML?

• eXtensible Markup LanguageeXtensible Markup Language

• Describes data in a Describes data in a textualtextual, , hierarchicalhierarchical (tree- (tree-based) structurebased) structure

Structure, store, and transmit informationStructure, store, and transmit information

PortablePortable: preserves meaning and structure : preserves meaning and structure across machines/platformsacross machines/platforms

• Doesn’t Doesn’t dodo anything itself anything itself

Other applications use XML to share informationOther applications use XML to share information

Page 9: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 99

What is XML?What is XML?

• Simplified subset of Standard Generalized Markup Language (SGML)

• Defines other languages, such as RSS

Get updates on your favorite web sites

• Like well-structured HTMLLike well-structured HTML

• Different focus from HTML

XML: focuses on describing information

HTML: focuses on displaying information, how data looks

Page 10: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1010

One Use of XMLOne Use of XML

• Transmit messages across networkTransmit messages across network

Client Server ServerNetwork

<request_message> <calculation> 4*2 + 3*4 </calculation></request_message>

<response_message> <answer> 20 </answer></response_message>

Page 11: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1111

Java and XMLJava and XML

• Java class libraries to read, parse, Java class libraries to read, parse, validate, generate, and transform XML validate, generate, and transform XML datadata

Page 12: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1212

XML DocumentsXML Documents

• All textAll text makes it relatively easy for humans (and makes it relatively easy for humans (and

programs) to create, examine, and debug themprograms) to create, examine, and debug them

• Syntax rulesSyntax rules Very strictVery strict Very simpleVery simple Make document easy to parseMake document easy to parse

• Case-sensitiveCase-sensitive• The ML of “Markup Language”The ML of “Markup Language”

Tags, enclosed in angle brackets, < >Tags, enclosed in angle brackets, < >

Page 13: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1313

XML TagsXML Tags

• Identify data using tagsIdentify data using tagscalled called elementselements<name><name>

• Place content between tags:Place content between tags:<name>Jack Sparrow</name><name>Jack Sparrow</name>

• Tags can have Tags can have attributesattributes, like this:, like this:<name <name language="English"language="English">Jack >Jack Sparrow</name>Sparrow</name>

• Tags can contain child elements<name><first>Jack</first><last> Sparrow</last></name>

Page 14: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1414

Examples of XML SyntaxExamples of XML Syntax

• Elements/Tags:Elements/Tags:<date>22 May 2006</date><date>22 May 2006</date><yesterday/><yesterday/><date><month>10</month><day>22</day></date><date><month>10</month><day>22</day></date>

• Attributes:Attributes:<time <time zone=“-2”>zone=“-2”>5:57:39</time>5:57:39</time><time <time gmt='yes'gmt='yes'>7:57:39</time>>7:57:39</time>

• Comments:Comments:<!-- XML data comment --><!-- XML data comment -->

• Entity ReferencesEntity References&otherLoc; &copyr; &LegalDisclaimer;&otherLoc; &copyr; &LegalDisclaimer;

• Document type referenceDocument type reference<!DOCTYPE MAIL SYSTEM “email.dtd"><!DOCTYPE MAIL SYSTEM “email.dtd">

“Empty” tag

Either “” or ‘’

Need to be declared previously

Describes doc’s structure

Page 15: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1515

Well-FormednessWell-Formedness

• All documents must be All documents must be well-formedwell-formed Each tag must have a corresponding Each tag must have a corresponding end tagend tag

<name>Jack Sparrow</name><name>Jack Sparrow</name>

Or, the tag must be self-containedOr, the tag must be self-contained

<name value=“Jack Sparrow”/><name value=“Jack Sparrow”/>

Start Tag for aName piece of data

CorrespondingEnd tag

Page 16: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1616

XML Syntax ExampleXML Syntax Example<?xml version="1.0"?><?xml version="1.0"?><!DOCTYPE mail SYSTEM “email.dtd"><!DOCTYPE mail SYSTEM “email.dtd"><mail><mail> <message><message> <to>[email protected]</to><to>[email protected]</to> <from>[email protected]</from><from>[email protected]</from> <body type="text/plain">Assignment due</body><body type="text/plain">Assignment due</body> <unread/><unread/> </message></message> <message importance=“high"><message importance=“high"> <to>[email protected]</to><to>[email protected]</to> <from>[email protected]</from><from>[email protected]</from> <subject>CISC370</subject><subject>CISC370</subject> <body>Please see me for extra help</body><body>Please see me for extra help</body> </message></message></mail></mail>

Prolog/Declaration

DOCTYPE reference

root

Viewing in Eclipse

Page 17: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1717

XML Syntax ExampleXML Syntax Example

mail

message message

to from body

[email protected]

[email protected]

Assignment due

type=text/plain

Importance=high

to from

[email protected]

subject body

[email protected]

CISC370

Please see me for extra help

unread

Can have only one root

attribute

Page 18: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1818

XML Syntax: Escape CodesXML Syntax: Escape Codes

• &lt; less than sign <&lt; less than sign <

• &gt; greater than sign >&gt; greater than sign >

• &amp; ampersand &&amp; ampersand &

• &quot; quote " &quot; quote "

• &apos; apostrophe ’&apos; apostrophe ’

Page 19: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 1919

XML Design ConsiderationsXML Design Considerations

• When to use attributes and when to use child When to use attributes and when to use child elements?elements?

• ElementsElements Best for “things”, particularly those that have Best for “things”, particularly those that have

propertiesproperties Easier to extend laterEasier to extend later

• AttributesAttributes Best for properties, like modifiers or units Best for properties, like modifiers or units

Page 20: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2020

XML Structure SpecificationsXML Structure Specifications

• XML documents that do not have some XML documents that do not have some kind of specification can contain kind of specification can contain anyany tags tagsmakes makes standardizedstandardized communication very hard! communication very hard!

• Most XML documents obey a particular set Most XML documents obey a particular set of rules as to their structureof rules as to their structureRules define what tags, attributes, and entities Rules define what tags, attributes, and entities

may appear in the document may appear in the document

• XML documents can use XML documents can use namespacesnamespaces, , which are separate scopes within which which are separate scopes within which tags are definedtags are definedsimilar to namespaces in C++similar to namespaces in C++

Page 21: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2121

Specifying Valid XML TagsSpecifying Valid XML Tags

• Document Type Definition (DTD)Document Type Definition (DTD)Written in SGML (which has a very confusing Written in SGML (which has a very confusing

syntax!)syntax!)DTD approach is not used very much anymoreDTD approach is not used very much anymore

• SchemaSchemaCurrent approachCurrent approachWritten in XMLWritten in XML

• Makes it much easier to define the structure of an Makes it much easier to define the structure of an XML documentXML document

Page 22: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2222

XML ProgrammingXML Programming

• Generally, programs (and programmers) Generally, programs (and programmers) deal with XML in one of three waysdeal with XML in one of three waysParsing: Parsing: XML document as program inputXML document as program input

• Given an XML document, extract the stored data Given an XML document, extract the stored data and process it programmaticallyand process it programmatically

Generation: Generation: XML doc as program outputXML doc as program output• Given some data, generate an XML representation Given some data, generate an XML representation

of itof itTransformation: Transformation: XML doc as both program input XML doc as both program input

and outputand output• Given an XML document, transform it into some Given an XML document, transform it into some

other kind of documentother kind of document

Page 23: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2323

Java XML ProgrammingJava XML Programming

• Java APIs for XML Processing (JAXP)Java APIs for XML Processing (JAXP)included with Java 1.4 and later included with Java 1.4 and later APIs make programmatically working with XML APIs make programmatically working with XML

under Java very easy and straightforwardunder Java very easy and straightforwardSun’s collection of XML APIsSun’s collection of XML APIs

• Other packages available, which all work a Other packages available, which all work a little differentlylittle differentlyJust a library of classes and methodsJust a library of classes and methodswww.jdom.orgwww.jdom.org

Page 24: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2424

Fundamental Models for XML Fundamental Models for XML ParsingParsing• SAXSAX - - Simple API for XMLSimple API for XML

Serial, sequential access to XML tags and Serial, sequential access to XML tags and contentcontent

event-oriented callback modelevent-oriented callback modelfast and low overheadfast and low overheaddifficult to use for transformsdifficult to use for transforms

• DOMDOM - - Document Object Model for XMLDocument Object Model for XMLTree-based access to entire XML documentTree-based access to entire XML documentdata traversal modeldata traversal modelkeeps entire document in memorykeeps entire document in memoryeasy to use for transformseasy to use for transforms

Page 25: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2525

The SAX Programming ModelThe SAX Programming Model• Serial Access with the Simple API for XML

Requires more programmingEvent-drivenHarder to visualize

• A SAX Parser chops an XML document into a A SAX Parser chops an XML document into a sequence of eventssequence of eventsEventEvent: encountering an element of the XML : encountering an element of the XML

documentdocumentParser delivers XML events and errors, if any Parser delivers XML events and errors, if any

occur, to application by calling methods on occur, to application by calling methods on handlerhandler objects (provided by program)objects (provided by program)

Page 26: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2626

The SAX Programming ModelThe SAX Programming Model• Handler objects must implement particular Handler objects must implement particular

interfaces:interfaces: org.xml.sax.ContentHandlerorg.xml.sax.ContentHandler

for receiving events about document contents for receiving events about document contents and tagsand tags

org.xml.sax.ErrorHandlerorg.xml.sax.ErrorHandler

for receiving errors and warnings that occur for receiving errors and warnings that occur during parsingduring parsing

org.xml.sax.DTDHandlerorg.xml.sax.DTDHandler org.xml.sax.EntityResolverorg.xml.sax.EntityResolver

Page 27: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2727

SAX ParsersSAX Parsers• Most SAX parsers can read XML data from Most SAX parsers can read XML data from

any valid java.io.Readerany valid java.io.ReaderWhy a Reader instead of an InputStream?Why a Reader instead of an InputStream?

• or directly from a URL (for directly parsing XML or directly from a URL (for directly parsing XML that exists on the Internet)that exists on the Internet)

• SAX parsers can be SAX parsers can be validatingvalidating or non- or non-validatingvalidatingValidation:Validation: checking a document for conformance checking a document for conformance

to its specified DTD or schemato its specified DTD or schemaMost SAX parsers support both operational modesMost SAX parsers support both operational modes

Page 28: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2828

SAX Programming – Event SAX Programming – Event Passing Passing <?xml version="1.0" ?>

<mail> <message> <from>[email protected]</from> </message></mail>

SAXParser

Content Handler

startDocument

endDocument

startElement

endElement

charactersLocator

setDocumentLocatorInputReader

Locator: an object that keeps track of the current position in the document

Callbackmethods

Subclass of org.xml.sax.helpers.DefaultHandler

Page 29: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 2929

SAX Programming – BasicsSAX Programming – Basics• To use SAX, you need to import the relevant To use SAX, you need to import the relevant

packages, e.g.,packages, e.g., import org.xml.sax.*;import org.xml.sax.*; import org.xml.sax.helpers.*;import org.xml.sax.helpers.*; import javax.xml.parsers.*;import javax.xml.parsers.*;

• To do parsing, you need a SAX XMLReader objectTo do parsing, you need a SAX XMLReader object

SAXParserFactory sp_factory = SAXParserFactory sp_factory = SAXParserFactory.newInstance();SAXParserFactory.newInstance();

SAXParser sp = sp_factory.newSAXParser();SAXParser sp = sp_factory.newSAXParser();

XMLReader theParser = sp.getXMLReader(); XMLReader theParser = sp.getXMLReader();

Page 30: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3030

SAX Programming – ErrorsSAX Programming – Errors

• SAX Parser can encounter three kinds of errors during parsing:SAX Parser can encounter three kinds of errors during parsing: warningwarning a non-serious problem occurreda non-serious problem occurred errorerror a serious but recoverable problem occurreda serious but recoverable problem occurred fatalErrorfatalError a problem occurred that is so grave a problem occurred that is so grave

that parsing cannot continue.that parsing cannot continue.

• WarningWarning and and errorerror conditions result from violation of document conditions result from violation of document validation (DTD or schema constraints)validation (DTD or schema constraints)

• FatalErrorFatalError conditions are from I/O problems or non-well-formed conditions are from I/O problems or non-well-formed XMLXML

• Not all SAX parsers treat each condition in the same category Check the docs!

• All SAX parsers can throw SAXExceptionsAll SAX parsers can throw SAXExceptions Any SAX handler method (the methods the parser calls on your Any SAX handler method (the methods the parser calls on your

handler object) can also throw a SAXExceptionhandler object) can also throw a SAXException

Parsing cancontinue

Page 31: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3131

SAX Programming – ExampleSAX Programming – Example

• Example program consists of two classes:Example program consists of two classes: SAXExampleSAXExample

• supplies a main() method that creates a SAX supplies a main() method that creates a SAX parser and invokes it to parse a file specified on parser and invokes it to parse a file specified on the command linethe command line

SAXHandlerSAXHandler • acts as the SAX ContentHandler and acts as the SAX ContentHandler and

ErrorHandlerErrorHandler

Page 32: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3232

SAX Programming – ExampleSAX Programming – Example11 import java.io.*;import java.io.*;22 import org.xml.sax.*;import org.xml.sax.*;33 import org.xml.sax.helpers.*;import org.xml.sax.helpers.*;44 import javax.xml.parsers.*;import javax.xml.parsers.*;5566 public class SAXExample {public class SAXExample {77 public static void main(String [] args) { public static void main(String [] args) {88 try { try {99 System.out.println("Creating and setting up the SAX parser."); System.out.println("Creating and setting up the SAX parser.");1010 SAXParserFactory sp_factory = SAXParserFactory.newInstance(); SAXParserFactory sp_factory = SAXParserFactory.newInstance();1111 XMLReader theReader = sp_factory.newSAXParser().getXMLReader(); XMLReader theReader = sp_factory.newSAXParser().getXMLReader();1212 SAXHandler theHandler = new SAXHandler(); SAXHandler theHandler = new SAXHandler();1313 theReader.setContentHandler(theHandler); theReader.setContentHandler(theHandler);1414 theReader.setErrorHandler(theHandler); theReader.setErrorHandler(theHandler);1515 theReader.setFeature("http://xml.org/sax/features/validation", theReader.setFeature("http://xml.org/sax/features/validation",

false);false);1616 System.out.println("Making InputSource for " + args[0]); System.out.println("Making InputSource for " + args[0]);1717 FileReader file_in = new FileReader(args[0]); FileReader file_in = new FileReader(args[0]);1818 System.out.println("About to parse...”); System.out.println("About to parse...”);1919 theReader.parse(new InputSource(file_in));theReader.parse(new InputSource(file_in));2020 System.out.println("...parsing done."); System.out.println("...parsing done.");2121 } catch (Exception e) { } catch (Exception e) {2222 System.err.println("Error: " + e); e.printStackTrace(); System.err.println("Error: " + e); e.printStackTrace();2323 } }2424 } }2525 }}

Can call parse with other args

Page 33: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3333

SAX Programming – ExampleSAX Programming – Example11 import org.xml.sax.*;import org.xml.sax.*;2233 public class SAXHandler implements ContentHandler, public class SAXHandler implements ContentHandler,

ErrorHandlerErrorHandler55 {{66 private Locator loc = null; private Locator loc = null;7788 public void setDocumentLocator(Locator l) { public void setDocumentLocator(Locator l) {99 loc = l; loc = l;1010 } }11111212 public void characters(char [] ch, int st, int len) { public void characters(char [] ch, int st, int len) {1313 String s = new String(ch, st, len); String s = new String(ch, st, len);1414 System.out.println("Got content string '" + s + "'"); System.out.println("Got content string '" + s + "'");1515 return; return;1616 } }17171818 public void startElement(String uri, String lname, public void startElement(String uri, String lname, 1919 String qname, Attributes attrs) { String qname, Attributes attrs) {2020 System.out.print(lname + " tag with "); System.out.print(lname + " tag with ");2121 System.out.print(attrs.getLength() + " attrs starts"); System.out.print(attrs.getLength() + " attrs starts");2222 System.out.println(" at line " + loc.getLineNumber()); System.out.println(" at line " + loc.getLineNumber());2323 } }2424

Called when a setof characters is

encountered

Called when astarting element tag

is encountered

Page 34: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3434

SAX Programming – ExampleSAX Programming – Example2525 public void endElement(String uri, String lname, String qname) { public void endElement(String uri, String lname, String qname) { 2626 System.out.print(lname + " tag ends "); System.out.print(lname + " tag ends ");2727 System.out.println("at line " + loc.getLineNumber()); System.out.println("at line " + loc.getLineNumber());2828 } }29293030 public void startDocument() { } public void startDocument() { }3131 public void endDocument() { } public void endDocument() { }3232 public void processingInstruction(String t, String d) { } public void processingInstruction(String t, String d) { }3333 public void skippedEntity(String name) { } public void skippedEntity(String name) { }3434 public void ignorableWhitespace(char[] ch, int st, int len) { } public void ignorableWhitespace(char[] ch, int st, int len) { }3535 public void startPrefixMapping(String p, String uri) { } public void startPrefixMapping(String p, String uri) { }3636 public void endPrefixMapping(String p) { } public void endPrefixMapping(String p) { }37373838 public void warning(SAXParseException e) { public void warning(SAXParseException e) {3939 System.err.print("SAX Warning: " + e); System.err.print("SAX Warning: " + e);4040 System.err.println(" at line " + loc.getLineNumber()); System.err.println(" at line " + loc.getLineNumber());4141 } }4242 public void error(SAXParseException e) { public void error(SAXParseException e) {4444 System.err.print("SAX Error: " + e); System.err.print("SAX Error: " + e);4545 System.err.println(" at line " + loc.getLineNumber()); System.err.println(" at line " + loc.getLineNumber());4646 } }47474848 public void fatalError(SAXParseException e) { public void fatalError(SAXParseException e) {4949 System.err.print("SAX Fatal Error: " + e); System.err.print("SAX Fatal Error: " + e);5050 System.err.println(" at line " + loc.getLineNumber()); System.err.println(" at line " + loc.getLineNumber());5151 } }5252 }}

Called when a SAX warning is

encountered

Called when a SAX error isencountered

Called when a SAX fatal error is

encountered

Called when anending element tag

is encountered

Called when the startor end of the doc is

encountered

Page 35: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3535

DOM Programming – DOM Programming – OverviewOverview• DOM parser builds an DOM parser builds an in-memory treein-memory tree representation of representation of

the entire XML documentthe entire XML document reference to tree is returned to your programreference to tree is returned to your program

• DOM tree is composed of objects, most of which DOM tree is composed of objects, most of which implement the following interfaces from the implement the following interfaces from the org.w3c.domorg.w3c.dom package: package: NodeNode parent interface of all DOM tree nodesparent interface of all DOM tree nodes ElementElement represents a tag in a XML document, represents a tag in a XML document,

may have childrenmay have children DocumentDocument represents an entire XML document, represents an entire XML document,

always has childrenalways has children TextText contents of an element or an attribute contents of an element or an attribute AttrAttr represents an attribute of an elementrepresents an attribute of an element

Page 36: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3636

DOM Programming – DOM Programming – OverviewOverview• Your program can manipulate the DOM treeYour program can manipulate the DOM tree

Traverse, modify, print, apply XSL transforms Traverse, modify, print, apply XSL transforms (XML Stylesheet Language Transforms … more (XML Stylesheet Language Transforms … more about that in a bit), and many more functionsabout that in a bit), and many more functions

• Most DOM parsers allow either Most DOM parsers allow either validationvalidation or or non-validation modesnon-validation modesValidationValidation refers to checking for conformance refers to checking for conformance

against the document’s specified DTD or against the document’s specified DTD or schemaschema

Page 37: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3737

Document Builder

DOM Programming – ModelDOM Programming – Model<?xml version="1.0" ?><mail> <message> <from>[email protected]</from> </message></mail>

Parser

Input Reader

DTD or schema(if any)

ApplicationCode

parse

return DOM

tree

Page 38: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3838

DOM Programming – Creating DOM Programming – Creating the DOM Treethe DOM Tree• How does the DOM API generate the tree How does the DOM API generate the tree

representing the XML document?representing the XML document?It uses a SAX parser! It uses a SAX parser! The DOM APIs run a SAX parser to parse the The DOM APIs run a SAX parser to parse the

XML document in an event-driven manner, XML document in an event-driven manner, constructing the DOM tree in memoryconstructing the DOM tree in memory

Page 39: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 3939

DOM Programming – BasicsDOM Programming – Basics• To create a DOM DocumentBuilder with the JAXP To create a DOM DocumentBuilder with the JAXP

packagespackagesDocumentBuilderFactory dbuilder_factory = DocumentBuilderFactory dbuilder_factory = DocumentBuilderFactory.newInstance();DocumentBuilderFactory.newInstance();

dbuilder_factory.setValidating(true); dbuilder_factory.setValidating(true);

DocumentBuilder dbuilder = DocumentBuilder dbuilder = dbuilder_factory.newDocumentBuilder();dbuilder_factory.newDocumentBuilder();

• Build a document tree from XML data found in a particular Build a document tree from XML data found in a particular file file FileReader file_in = new FileReader(“thefile.xml”);FileReader file_in = new FileReader(“thefile.xml”);

Document doc = dbuilder.parse(new Document doc = dbuilder.parse(new InputSource(file_in));InputSource(file_in));

Page 40: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4040

DOM Programming – DOM Programming – ExampleExample

11 import org.w3c.dom.*;import org.w3c.dom.*;

22 import javax.xml.parsers.*;import javax.xml.parsers.*;

33 import org.xml.sax.*;import org.xml.sax.*;

44

55 public class public class DOMExampleDOMExample { {

66 public static void main(String [] args) { public static void main(String [] args) {

88 try { try {

99 System.out.println("Creating Document builder."); System.out.println("Creating Document builder.");

1010 DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

1111 dbf.setValidating(true); dbf.setValidating(true);

1212 DocumentBuilder db = dbf.newDocumentBuilder(); DocumentBuilder db = dbf.newDocumentBuilder();

1313 System.out.println(“Ready to parse!”); System.out.println(“Ready to parse!”);

1414 FileReader file_in = new FileReader(“theFile.xml”); FileReader file_in = new FileReader(“theFile.xml”);

1515 Document doc = db.parse(new InputSource(file_in));Document doc = db.parse(new InputSource(file_in));

1616 System.out.println("Parsed document, ready to process."); System.out.println("Parsed document, ready to process.");

1717 myDOMTreeProcessor proc = new myDOMTreeProcessor(); myDOMTreeProcessor proc = new myDOMTreeProcessor();

1818 proc.process(doc, System.out);proc.process(doc, System.out);

1919 } }

2020 catch (Exception e) { catch (Exception e) {

2121 System.err.println("XML Exception thrown: " + e); System.err.println("XML Exception thrown: " + e);

2222 e.printStackTrace(); e.printStackTrace();

2323 } }

2424 } }

2525 }}

If want to validate

Page 41: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4141

DOM Programming – DOM Programming – ExampleExample• MyDOMTreeProcessorMyDOMTreeProcessor

Page 42: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4242

Changing the DOM TreeChanging the DOM Tree• After you have the DOM tree, you can:After you have the DOM tree, you can:

search for particular nodessearch for particular nodes extract element and text valuesextract element and text values alter the tree:alter the tree:

• add elements to the tree or change themadd elements to the tree or change them• add attributes to elements or change their valuesadd attributes to elements or change their values• re-arrange elements and subtreesre-arrange elements and subtrees• synthesize entirely new trees or subtreessynthesize entirely new trees or subtrees

• If you create a new tree or subtree, you should If you create a new tree or subtree, you should apply the apply the normalizenormalize() method before processing it () method before processing it furtherfurther optimizes layoutoptimizes layout

Page 43: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4343

XML Stylesheet Language XML Stylesheet Language (XSL)(XSL)• Core XML technology for performing Core XML technology for performing

transformations to and on XML documentstransformations to and on XML documents

• Offers extensive template matching and Offers extensive template matching and processing instructions for transforming processing instructions for transforming XML data into other XML-like dataXML data into other XML-like datae.g., XML->XML, XML->HTML, XML->texte.g., XML->XML, XML->HTML, XML->text

• A programming language onto itselfA programming language onto itselfGeared towards pattern matching and Geared towards pattern matching and

formattingformatting

Page 44: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4444

XSL Transformations (XSLT)XSL Transformations (XSLT)

• An An XSL ProcessorXSL Processor is a system (usually invoked is a system (usually invoked through an object interface) that can apply XSLT through an object interface) that can apply XSLT transforms to some XML input and generate transforms to some XML input and generate another document (normally the data in another another document (normally the data in another form)form)

• Transformation rules are expressed in XSLTransformation rules are expressed in XSL

• JAXP includes the Java implementation of the JAXP includes the Java implementation of the Apache Xalan XSLT processorApache Xalan XSLT processor

Page 45: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4545

XSL Transformations (XSLT)XSL Transformations (XSLT)

XMLInput file

Output file(may or maynot be XML)

XSLTDeclarations

XSLT Processor

• A concrete example:A concrete example: http://www.cis.udel.edu/~sprenkle/bibtex2html/

Page 46: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4646

XML Programming - XML Programming - TransformationsTransformations• Advantage of the DOM programming model: can be Advantage of the DOM programming model: can be

used easily with XSLused easily with XSL• JAXP fully supports XSLT with the package JAXP fully supports XSLT with the package

javax.xml.transformjavax.xml.transform and its sub-packages and its sub-packages• The sequence of steps for creating and applying The sequence of steps for creating and applying

transforms with XSLT and Xalan istransforms with XSLT and Xalan is Obtain or create a DOM tree, create a Source from itObtain or create a DOM tree, create a Source from it Create a TransformerFactoryCreate a TransformerFactory Create a Templates object based on a particular XSLT Create a Templates object based on a particular XSLT

documentdocument Create a Transformer object from the Templates objectCreate a Transformer object from the Templates object Create an output Results objectCreate an output Results object Apply the Transformer to the Source and the ResultApply the Transformer to the Source and the Result

Page 47: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4747

SummarySummary• XML is a technology for storing data in a standard,

portable, and text-based way Eases communication between remote programs Basis of SOAP: web service messages

• Java has packages to help your Java programs parse XML, build XML documents, and perform XML transformations

• Two basic models for XML parsingSAX: sequential access, event-drivenDOM: in-memory tree structure

• XSLT can be used to transform XML into other XML or other textual formats

Page 48: EXtensible Markup Language (XML) James Atlas July 15, 2008

July 15, 2008July 15, 2008 James Atlas - CISC370James Atlas - CISC370 4848

XML LimitationsXML Limitations

• Verbose descriptionsVerbose descriptions Can be redundantCan be redundant

• Hierarchical rather than relational modelHierarchical rather than relational model Relational: databasesRelational: databases Must choose betweenMust choose between

<movie> <actor …> <actor …> …</movie><movie>…

<actor> <movie …> <movie …> …</actor><actor>…