Programming with XML
Written by: Adam Carmi, Zvika Gutterman
Posted on 21-Dec-2015
Agenda
• About XML
• Review of XML syntax
• Document Object Model (DOM)
• JAXP
• W3C XML Schema
• Validating Parsers
About XML
• XML – eXtensible Markup Language
• Designed to describe data
– Provides semantic and structural information
– Extensible
• Human readable and computer-manipulable
• Software and hardware independent
• Open and standardized by the W3C¹
• Ideal for data exchange
1) World Wide Web Consortium (founded in 1994 by Tim Berners-Lee)
<offenders>
  <!-- Lists all traffic offenders -->
  <offender id="024378449 ">
    <firstName> David </firstName>
    <middleName>Reuven</middleName>
    <lastName>Harel</lastName>
    <violation id='12'>
      <code num="232" category="traffic"/>
      <issueDate>2001-11-02</issueDate>
      <issueTime>10:32:00</issueTime>
      Ran a red light at Arik &amp; Bentz st.
    </violation>
  </offender>
</offenders>
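offenders.xml
Information is marked up with structural and semantic information: the listing above contains a comment (<!-- ... -->), tags, and character data.
The characters & and < cannot appear literally in character data, and a quote character cannot appear inside an attribute value delimited by the same quote. Use the predefined entity references &amp;, &lt;, &gt;, &apos; and &quot; instead.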
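To make the escaping rule concrete, here is a minimal sketch of a hand-written escaper for character data and attribute values. The class and method names (XmlEscape, escape) are hypothetical; in practice a DOM serializer performs this escaping for you.

```java
// Hypothetical helper: replaces the five characters that have predefined
// XML entity references with those references.
final class XmlEscape {
    private XmlEscape() {}

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break; // never allowed literally
                case '<':  out.append("&lt;");   break; // never allowed literally
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break; // needed inside '...' attribute values
                case '"':  out.append("&quot;"); break; // needed inside "..." attribute values
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, escape("Arik & Bentz st.") yields Arik &amp; Bentz st., the form used in the listing.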
offenders.xml: Tags
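Each element is delimited by a start tag (e.g. <firstName>) and an end tag (e.g. </firstName>); <offenders> is the root tag.
An empty-element tag such as <code num=.../> is shorthand for <code num=...></code>.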
XML tags are not pre-defined and are case sensitive.
An XML document may have only one root tag.
offenders.xml: Elements
Elements mark up information.
Element x begins with a start tag <x> and ends with an end tag </x>.
XML elements must be properly nested: <x>...<y>...</y>...</x>
XML documents must contain exactly one root element.
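The nesting, single-root and case-sensitivity rules are enforced by any conforming parser. A minimal sketch using the JAXP DOM builder (introduced later in these slides); the helper name WellFormed.check is hypothetical, and DefaultHandler is used only because it rethrows fatal errors without printing them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: returns true if the string is a well-formed XML document.
final class WellFormed {
    private WellFormed() {}

    static boolean check(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and errors and rethrows fatal
            // errors, so malformed input is reported only via the exception.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. overlapping tags or a second root element
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

check("<x><y></y></x>") is true, while overlapping tags, two root elements, or <offenders></Offenders> (tags differ in case) are rejected.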
offenders.xml: Content
<offenders>·
··<!--·Lists·all·traffic·offenders·-->·
··<offender id="024378449·">·
····<firstName>··David·</firstName>·
····<middleName>Reuven</middleName>·
····<lastName>Harel</lastName>·
····<violation id='12'>·
······<code num="232" category="traffic"/>·
······<issueDate>2001-11-02</issueDate>·
······<issueTime>10:32:00</issueTime>·
······Ran·a·red·light·at··Arik·&amp;·Benz·st.·
····</violation>·
··</offender>·
</offenders>
The content of an element is all the text that lies between its start and end tags.
An XML parser is required to pass all characters in a document that are not markup, including whitespace characters, to the application.
· = whitespace character (the · at the end of each line stands for the line break)
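A small sketch, assuming the default non-validating JAXP DOM builder, of how this whitespace survives parsing: the spaces around David stay in the text content, and the line breaks between tags become Text nodes of their own. WhitespaceDemo is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a string with default (non-validating)
// settings and returns the root element.
final class WhitespaceDemo {
    private WhitespaceDemo() {}

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With this helper, root("<firstName> David </firstName>").getTextContent() keeps both spaces, and the root of "<a>\n  <b>x</b>\n</a>" has three children: a whitespace Text node, the Element b, and another whitespace Text node.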
offenders.xml: Attributes
Attributes are used to provide additional information about elements.
Attribute values must always be enclosed in quotes, either double (") or single (').
DOM™
• DOM™ – Document Object Model
• A standard hierarchy of objects, recommended by the W3C, that corresponds to XML documents.
• Each element, attribute, comment, etc., in an XML document is represented by a Node in the DOM tree.
• The DOM API¹ allows data in an XML document to be accessed and modified by manipulating the nodes in a DOM tree.
1) Application Programming Interface
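As an illustration of modifying data through the DOM API, the sketch below changes the text of every element with a given tag name. The helpers parse and setText are hypothetical; getElementsByTagName and setTextContent are standard DOM methods (setTextContent is DOM Level 3).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomEditDemo {
    private DomEditDemo() {}

    // Hypothetical helper: parses a document held in a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: for every element named `tag`, removes its children
    // and replaces them with a single Text node holding `value`.
    static void setText(Document doc, String tag, String value) {
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            nodes.item(i).setTextContent(value);
        }
    }
}
```

After setText(doc, "lastName", "Cohen"), reading the lastName element back returns the new value; the rest of the tree is untouched.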
offenders.xml: DOM tree
[Figure: the DOM tree of offenders.xml, spread over three slides (the later two titled "Example: offenders DOM"). A Document node has the Element offenders as its child; beneath it are a Comment node (" Lists all traffic offenders ") and the Element offender, which carries the Attribute id (Text "024378449 ") and contains the Elements firstName (Text " David "), lastName (Text "Harel") and violation. The violation Element carries the Attribute id (Text "12") and contains the Elements code (Attributes num = "232" and category = "traffic"), issueDate (Text "2001-11-02") and issueTime (Text "10:32:00"), followed by the Text "Ran a red light at Arik & Benz st.". The whitespace between tags also appears as Text nodes. The element "middleName" was skipped in the figure.]
DOM Class Hierarchy¹
1) A partial class hierarchy is presented in this slide.
[Figure: partial DOM interface hierarchy, all shown as <<interface>>. Document, Element and CharacterData extend Node; Text and Comment extend CharacterData. NodeList and NamedNodeMap are separate collection interfaces (used for child nodes and for attributes, respectively).]
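The hierarchy matters in practice: because Text and Comment both extend CharacterData, code can read either one's data after a single cast. A minimal sketch (HierarchyDemo is a hypothetical name) that classifies the children of the root element:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class HierarchyDemo {
    private HierarchyDemo() {}

    // Describes each child of the root element using instanceof checks
    // against the DOM interfaces.
    static List<String> describeChildren(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        List<String> out = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                out.add("Element " + n.getNodeName());
            } else if (n instanceof CharacterData) {
                // Comment and Text share getData() through CharacterData
                String kind = (n instanceof Comment) ? "Comment" : "Text";
                out.add(kind + " \"" + ((CharacterData) n).getData() + "\"");
            }
            // other node types (e.g. processing instructions) are skipped
        }
        return out;
    }
}
```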
JAXP
• JAXP – Java™ API for XML Processing
• JAXP enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation.
• JAXP provides two parser types:
– SAX¹ parser: event driven
– DOM document builder: constructs DOM trees by parsing XML documents.
1) Simple API for XML
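To contrast the two parser types, here is a minimal SAX sketch: instead of building a tree, the parser calls back into a handler for each event, here counting start-element events. SaxCount is a hypothetical name; SAXParserFactory and DefaultHandler are the standard JAXP and SAX classes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts elements without building a tree; the parser
// calls startElement once per start tag or empty-element tag.
final class SaxCount {
    private SaxCount() {}

    private static final class Counter extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements++;
        }
    }

    static int countElements(String xml) {
        Counter counter = new Counter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return counter.elements;
    }
}
```

Because no tree is kept, memory use stays flat regardless of document size; the trade-off is that the handler sees each event only once, in document order.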
Creating a DOM Builder
1. Create a DocumentBuilderFactory object:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
2. Configure the factory object:
dbf.setIgnoringComments(true);
3. Create a builder instance using the factory:
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
A ParserConfigurationException is thrown if a DocumentBuilder cannot be created which satisfies the configuration requested.
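Putting the three steps together, a sketch that wraps ParserConfigurationException and shows the effect of setIgnoringComments on a parsed document. BuilderDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class BuilderDemo {
    private BuilderDemo() {}

    static DocumentBuilder newBuilder(boolean ignoreComments) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // 1. create
        dbf.setIgnoringComments(ignoreComments);                           // 2. configure
        try {
            return dbf.newDocumentBuilder();                               // 3. build
        } catch (ParserConfigurationException pce) {
            throw new IllegalStateException("Failed to create document builder", pce);
        }
    }

    // Number of child nodes of the root element after parsing.
    static int rootChildCount(String xml, boolean ignoreComments) {
        try {
            Document doc = newBuilder(ignoreComments)
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```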
Building a DOM Document
• A DOM document can be built manually from within the application:
Document doc = docBuilder.newDocument();
Element offenders = doc.createElement("offenders");
doc.appendChild(offenders);
Element offender = doc.createElement("offender");
offender.setAttribute("id", "024378449 ");
offenders.appendChild(offender);
Element firstName = doc.createElement("firstName");
Text text = doc.createTextNode(" David ");
firstName.appendChild(text);
...
A DOMException is raised if an illegal character appears in a name, an illegal child is appended to a node etc.
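A self-contained sketch that builds a small part of the tree by hand and prints it with the JAXP identity Transformer (serialization is not covered on the slides). The class name BuildDemo and the decision to attach firstName to offender are assumptions of this sketch.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class BuildDemo {
    private BuildDemo() {}

    // Builds <offenders><offender id="024378449 "><firstName> David </firstName>...
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element offenders = doc.createElement("offenders");
            doc.appendChild(offenders);
            Element offender = doc.createElement("offender");
            offender.setAttribute("id", "024378449 ");
            offenders.appendChild(offender);
            Element firstName = doc.createElement("firstName");
            firstName.appendChild(doc.createTextNode(" David "));
            offender.appendChild(firstName);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a DOM tree to a string, without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An illegal name, such as one containing a space, makes createElement throw the DOMException mentioned above.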
Building a DOM Document
• A DOM representation of an XML document can be built automatically by parsing the XML document:
Document doc = docBuilder.parse(new File(xmlFile));
A SAXParseException or SAXException is raised to report parse errors.
DumpDom.java
Creating and traversing a DOM document

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DumpDom {
    private int indent = 0; // text indentation level

    public DumpDom(String xmlFile) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            Document doc = docBuilder.parse(new File(xmlFile));
            recursiveDump(doc);
        } catch (ParserConfigurationException pce) {
            System.err.println("Failed to create document builder");
        } catch (SAXParseException spe) {
            System.err.println("Error: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
        } catch (SAXException se) {
            System.err.println("Parse error found: " + se);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void recursiveDump(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                dumpNode("document", node);
                break;
            case Node.COMMENT_NODE:
                dumpNode("comment", node);
                break;
            case Node.ATTRIBUTE_NODE:
                dumpNode("attribute", node);
                break;
            case Node.TEXT_NODE:
                dumpNode("text", node);
                break;
            case Node.ELEMENT_NODE:
                dumpNode("element", node);
                // attributes are not children in the DOM, so dump them separately
                indent += 2;
                NamedNodeMap atts = node.getAttributes();
                for (int i = 0; i < atts.getLength(); ++i)
                    recursiveDump(atts.item(i));
                indent -= 2;
                break;
            default:
                // e.g. CDATA sections, processing instructions, DOCTYPE
                System.err.println("Unknown node: " + node);
                System.exit(1);
        }
        // print children of the input node (if there are any)
        indent += 2;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            recursiveDump(child);
        }
        indent -= 2;
    }

    private void dumpNode(String type, Node node) {
        for (int i = 0; i < indent; ++i)
            System.out.print(" ");
        System.out.print("[" + type + "]: ");
        System.out.print(node.getNodeName());
        if (node.getNodeValue() != null)
            System.out.print("=\"" + node.getNodeValue() + "\"");
        System.out.print("\n");
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java DumpDom <xml-file>");
            System.exit(1);
        }
        new DumpDom(args[0]);
    }
}
XML Schema
• The purpose of an XML Schema is to define a class of XML documents.
• An XML document that is syntactically correct is considered well formed. If it also conforms to an XML schema, it is considered valid.
• An XML document is not required to have a corresponding schema.
• XML Schemas are expected to replace the DTD¹ as the primary means of describing document structure.
1) Document Type Definition (uses EBNF form)
XML Schema (cont.)
• XML Schema documents are themselves XML documents.
– They can be manipulated as such.
– XML Schema is a language with an XML syntax.
• An XML document may explicitly reference the schema document that validates it (for example, with the xsi:noNamespaceSchemaLocation attribute).
– Schema documents can themselves be validated; the W3C publishes a DTD (and a schema) for XML Schema.
• Several schema models exist. In this course we will use the W3C XML Schema¹.
1) W3C recommendation since 2001
W3C XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> ... </xsd:schema>
• A W3C XML Schema consists of a schema element and a variety of sub-elements that determine the appearance of elements and their content in instance documents.
• Each of the elements (and predefined simple types) in the schema has, by convention, the prefix xsd:, which is associated with the W3C XML Schema namespace.
Elements & Attribute Declarations
• Elements are declared using the element element:
<xsd:element name="firstName" type="xsd:NMTOKEN"/>
<xsd:element name="offenders" type="Offenders"/>
• Attributes are declared using the attribute element:
<xsd:attribute name="id" type="xsd:positiveInteger"/>
(xsd:NMTOKEN and xsd:positiveInteger are pre-defined simple types.)
Element & Attribute Types
• Elements that contain sub-elements or carry attributes are said to have complex types.
• Elements that contain only text (e.g. numbers, strings, dates etc.) and do not contain any sub-elements or attributes are said to have simple types.
• Attributes always have simple types.
• Many simple types (e.g. string, date, integer etc.) are pre-defined.
A Few Built-in Simple Types
Simple Type   Examples
string        any textual value (white space preserved)
NMTOKEN¹      student, 342, a.b-c
ID¹           s1, myId, _4
integer       -126789, -1, 0, 1, 126789, 03485
float         -INF, -1E4, -0, 0, 12.78, 12.78E-2, NaN
time          13:24:12, 02:15:34.879
date          2002-11-23
boolean       true, false, 0, 1
1) Should only be used as attribute types
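One way to try the table's examples is to validate each value against a one-element schema. This sketch uses the javax.xml.validation API (SchemaFactory, Validator), a standard JAXP API that differs from the DocumentBuilderFactory attributes used later in these slides; SimpleTypeCheck and the wrapper element v are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: checks a value against a built-in type by wrapping it
// in an element <v> declared with that type.
final class SimpleTypeCheck {
    private SimpleTypeCheck() {}

    static boolean isValid(String builtInType, String value) {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                + "<xsd:element name='v' type='xsd:" + builtInType + "'/>"
                + "</xsd:schema>";
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("unknown type: " + builtInType, e);
        }
        try {
            // value is inserted as-is, so it must not contain '<' or '&'
            schema.newValidator()
                  .validate(new StreamSource(new StringReader("<v>" + value + "</v>")));
            return true;
        } catch (SAXException e) {
            return false; // the default Validator error handler throws on invalid content
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, isValid("boolean", "1") and isValid("integer", "03485") hold, while isValid("boolean", "yes") and isValid("date", "23/11/2002") do not.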
Derived Simple Types
• New simple types may be defined by deriving them from existing simple types (built-in or derived).
• New simple types are derived by restricting the range of permitted values for an existing simple type.
• A new simple type is defined using the simpleType element.
Derived Simple Types (cont.)
• Example: Numeric Restriction
<xsd:simpleType name="ViolationID">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="100"/>
  </xsd:restriction>
</xsd:simpleType>
• Example: Enumeration
<xsd:simpleType name="ViolationCategory">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="traffic"/>
    <xsd:enumeration value="criminal"/>
    <xsd:enumeration value="civil"/>
  </xsd:restriction>
</xsd:simpleType>
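A sketch, under the same assumptions as before (the javax.xml.validation API, hypothetical class name), that checks values against the two derived types. The wrapper elements num and category are invented for the test and are not part of the slides' schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class DerivedTypeCheck {
    private DerivedTypeCheck() {}

    // ViolationID and ViolationCategory as on the slide, plus two
    // hypothetical global elements that use them.
    private static final String XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:simpleType name='ViolationID'><xsd:restriction base='xsd:integer'>"
          + "<xsd:minInclusive value='100'/></xsd:restriction></xsd:simpleType>"
          + "<xsd:simpleType name='ViolationCategory'><xsd:restriction base='xsd:string'>"
          + "<xsd:enumeration value='traffic'/><xsd:enumeration value='criminal'/>"
          + "<xsd:enumeration value='civil'/></xsd:restriction></xsd:simpleType>"
          + "<xsd:element name='num' type='ViolationID'/>"
          + "<xsd:element name='category' type='ViolationCategory'/>"
          + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String instance) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false; // the restriction was violated
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```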
Complex Types
• Complex types are defined using the complexType element.
• Elements with complex types may carry attributes.
• The content of elements with complex types is categorized as follows:
– Empty: no content is allowed.
– Simple: content must be of simple type.
– Element: content must include only child elements.
– Mixed: both element and character content are allowed.
Complex Types: Attributes
• Attributes may be declared, using the use attribute, as required, optional (default) or prohibited.
• Default values for attributes are declared using the default attribute.
– Allowed only for optional attributes.
• The fixed attribute is used to ensure that an attribute is set to a particular value.
– Appearance of the attribute is optional.
– default and fixed are mutually exclusive.
Complex Types: Attributes (cont.)
• Example: use, fixed
<xsd:complexType name="Code">
  <xsd:attribute name="num" type="ViolationID" use="required"/>
  <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
</xsd:complexType>
• Example: use, default
<xsd:complexType name="IssueTime"> ...
<xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
...</xsd:complexType>
Complex Types: Empty Content
• Example: schema
<xsd:complexType name="Code">
  <xsd:attribute name="num" type="ViolationID" use="required"/>
  <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
</xsd:complexType>
• Example: instance documents (all three are valid)
<code num="232" category="traffic"/>
<code num="232" category="traffic"></code>
<code num="232"/>
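To check the claims about empty content, use and fixed, here is a sketch that validates instances against the Code type with the same javax.xml.validation approach. The global element code is an addition for testing, and CodeTypeCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class CodeTypeCheck {
    private CodeTypeCheck() {}

    // The Code type and its attribute types from the slides.
    private static final String XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:element name='code' type='Code'/>"
          + "<xsd:complexType name='Code'>"
          + "<xsd:attribute name='num' type='ViolationID' use='required'/>"
          + "<xsd:attribute name='category' type='ViolationCategory' fixed='traffic'/>"
          + "</xsd:complexType>"
          + "<xsd:simpleType name='ViolationID'><xsd:restriction base='xsd:integer'>"
          + "<xsd:minInclusive value='100'/></xsd:restriction></xsd:simpleType>"
          + "<xsd:simpleType name='ViolationCategory'><xsd:restriction base='xsd:string'>"
          + "<xsd:enumeration value='traffic'/><xsd:enumeration value='criminal'/>"
          + "<xsd:enumeration value='civil'/></xsd:restriction></xsd:simpleType>"
          + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String instance) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The three instances above should be accepted; a different category value, a missing num, or any content inside code should be rejected.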
Complex Types: Simple Content
• Example: element with no attributes
<xsd:element name="firstName" type="xsd:NMTOKEN"/>
• Example: element with attributes
<xsd:complexType name="IssueTime">
<xsd:simpleContent>
<xsd:extension base="xsd:time">
<xsd:attribute name="accuracy" type="Accuracy" use="optional"
default="accurate"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
In both examples the element's value has a simple type (xsd:NMTOKEN and xsd:time, respectively).
Complex Types: Element Content
• Element Occurrence Constraints
– The minimum number of times an element may appear is specified by the value of the optional attribute minOccurs.
– The maximum number of times an element may appear is specified by the value of the optional attribute maxOccurs.
• The value unbounded indicates that the maximum number of occurrences is unbounded.
– The default value of both minOccurs and maxOccurs is 1.
Complex Types: Element Content (cont.)
• The element sequence is used to specify a sequence of sub-elements.
– Elements must appear in the same order that they are declared.
<xsd:complexType>
  <xsd:sequence>
    <xsd:element name="firstName" type="xsd:NMTOKEN"/>
    <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
    <xsd:element name="lastName" type="xsd:NMTOKEN"/>
    <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
    ...
  </xsd:sequence>
  ...
</xsd:complexType>
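A sketch validating instances against the sequence above, with two simplifications that are assumptions of this example: the type is attached to a global offender element, and violation is declared as xsd:string instead of Violation. OffenderSequenceCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class OffenderSequenceCheck {
    private OffenderSequenceCheck() {}

    private static final String XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:element name='offender'><xsd:complexType><xsd:sequence>"
          + "<xsd:element name='firstName' type='xsd:NMTOKEN'/>"
          + "<xsd:element name='middleName' type='xsd:NMTOKEN' minOccurs='0'/>"
          + "<xsd:element name='lastName' type='xsd:NMTOKEN'/>"
          + "<xsd:element name='violation' type='xsd:string'"
          + " minOccurs='0' maxOccurs='unbounded'/>"
          + "</xsd:sequence></xsd:complexType></xsd:element>"
          + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String instance) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false; // wrong order, missing required element, etc.
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```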
Complex Types: Mixed Content
• The optional Boolean attribute mixed is used to specify mixed content:
<xsd:complexType name="Violation" mixed="true">
<xsd:sequence>
<xsd:element name="code" type="Code"/>
<xsd:element name="issueDate" type="xsd:date"/>
<xsd:element name="issueTime" type="IssueTime"/>
</xsd:sequence>
...
</xsd:complexType>
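The effect of mixed can be seen by validating the same instance against two versions of a cut-down Violation type; only issueDate is kept, and MixedCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MixedCheck {
    private MixedCheck() {}

    // Validates `instance` against a violation element whose complex type
    // has mixed set to the given value.
    static boolean isValid(boolean mixed, String instance) {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                + "<xsd:element name='violation'>"
                + "<xsd:complexType mixed='" + mixed + "'>"
                + "<xsd:sequence><xsd:element name='issueDate' type='xsd:date'/></xsd:sequence>"
                + "</xsd:complexType></xsd:element></xsd:schema>";
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false; // with mixed='false', non-whitespace text is not allowed
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```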
Global Elements/Attributes
• Global elements and global attributes are created by declarations that appear as the children of the schema element.
• A global element is allowed to appear as the root element of an instance document.
• The attribute ref of element/attribute elements may be used (instead of the name attribute) to reference a global element/attribute.
• Cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations.
Global Elements/Attributes (cont.)
• Example: global declarations
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="offenders" type="Offenders"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:attribute name="id" type="xsd:positiveInteger"/>
...
• Example: ref attribute
<xsd:element ref="comment" minOccurs="0"/>
<xsd:attribute ref="id" use="required"/>
Anonymous Type Definitions
• When a type is referenced only once, or contains very few constraints, it can be more succinctly defined as an anonymous type.
• Saves the overhead of naming the type and explicitly referencing it.
Anonymous Type Definitions (cont.)
<xsd:element name="offender" maxOccurs="unbounded">
  <xsd:complexType> <!-- anonymous: the type has no name -->
    <xsd:sequence>
      <xsd:element name="firstName" type="xsd:NMTOKEN"/>
      <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
      <xsd:element name="lastName" type="xsd:NMTOKEN"/>
      <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="comment" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute ref="id" use="required"/>
  </xsd:complexType>
</xsd:element>
Is this a global declaration? (No: it uses maxOccurs, and cardinality constraints cannot be placed on global declarations.)
offenders.xsd
Schema for offenders XML documents
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="offenders" type="Offenders"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:attribute name="id" type="xsd:positiveInteger"/>

  <xsd:complexType name="IssueTime">
    <xsd:simpleContent>
      <xsd:extension base="xsd:time">
        <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
  </xsd:complexType>

  <xsd:complexType name="Offenders">
    <xsd:sequence>
      <xsd:element name="offender" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="firstName" type="xsd:NMTOKEN"/>
            <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
            <xsd:element name="lastName" type="xsd:NMTOKEN"/>
            <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="comment" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute ref="id" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Violation" mixed="true">
    <xsd:sequence>
      <xsd:element name="code" type="Code"/>
      <xsd:element name="issueDate" type="xsd:date"/>
      <xsd:element name="issueTime" type="IssueTime"/>
    </xsd:sequence>
    <xsd:attribute ref="id" use="required"/>
  </xsd:complexType>

  <xsd:simpleType name="ViolationID">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="ViolationCategory">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="traffic"/>
      <xsd:enumeration value="criminal"/>
      <xsd:enumeration value="civil"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="Accuracy">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="accurate"/>
      <xsd:enumeration value="approx"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
Validating Parsers
• A validating parser is capable of reading a Schema specification or DTD and determining whether or not XML documents conform to it.
• A non-validating parser is capable of reading a Schema / DTD but cannot check XML documents for conformity.
– Limited to syntax (well-formedness) checking.
Creating a Validating DOM Parser
1. Create a DocumentBuilderFactory object:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
2. Configure the factory object to produce a validating parser:
dbf.setNamespaceAware(true); // XML Schema validation relies on namespace processing
dbf.setAttribute("http://java.sun.com/xml/jaxp/properties" + "/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
dbf.setAttribute("http://java.sun.com/xml/jaxp/properties" + "/schemaSource", new File(xmlSchema));
dbf.setValidating(true);
3. Create a builder instance and set its error-handler:
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
docBuilder.setErrorHandler(new MyErrorHandler());
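The three steps can be wrapped into one method. A minimal sketch, assuming the schema is passed as a string rather than a File (the JAXP schemaSource property also accepts an InputSource); ValidatingDomDemo and its collecting error handler are hypothetical, standing in for MyErrorHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ValidatingDomDemo {
    private ValidatingDomDemo() {}

    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical handler: records validity errors, aborts on fatal errors.
    static final class CollectingHandler implements ErrorHandler {
        final List<String> errors = new ArrayList<>();
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) { errors.add(e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    // Returns the validity errors reported while parsing xml against xsd.
    static List<String> validate(String xml, String xsd) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true);
        dbf.setAttribute(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        dbf.setAttribute(SCHEMA_SOURCE, new InputSource(new StringReader(xsd)));
        CollectingHandler handler = new CollectingHandler();
        try {
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.errors;
    }
}
```

Because error() only records the message, parsing runs to the end and the caller sees every validity error at once; BoundedErrorPrinter instead stops after a fixed number.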
Handling Parsing Errors
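• By default, JAXP parsers do not throw exceptions when documents are found to be invalid: validity errors are reported as non-fatal errors, so parsing continues unless an error handler decides to throw.
• JAXP provides the interface ErrorHandler (org.xml.sax.ErrorHandler) so that users can implement their own error-handling semantics.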
BoundedErrorPrinter.java

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * An error handler that prints a specified number of errors to the
 * standard error stream. Once the specified number of errors is
 * detected, parsing is aborted.
 */
public class BoundedErrorPrinter implements ErrorHandler {
    private int errorCount = 0;
    private final int errorsToPrint;

    public BoundedErrorPrinter(int errorsToPrint) {
        this.errorsToPrint = errorsToPrint;
    }

    public void warning(SAXParseException spe) throws SAXException {
        System.err.println("Warning: " + getParseExceptionInfo(spe));
    }

    public void error(SAXParseException spe) throws SAXException {
        if (errorCount < errorsToPrint) {
            System.err.println("Error: " + getParseExceptionInfo(spe));
            ++errorCount;
        }
        if (errorCount >= errorsToPrint)
            throw spe; // abort parsing
    }

    public void fatalError(SAXParseException spe) throws SAXException {
        if (errorCount < errorsToPrint)
            System.err.println("Fatal: " + getParseExceptionInfo(spe));
        throw spe;
    }

    public boolean errorsFound() {
        return errorCount > 0;
    }

    private String getParseExceptionInfo(SAXParseException spe) {
        return "Line = " + spe.getLineNumber() + ": " + spe.getMessage();
    }
}