XML
DCS – SWC 2
Data vs. Information
• We often use the terms data and information interchangeably
• More precisely, data is some ”value” of a certain type, like
– 33
– ”High Street 7”
– false
• Data comes without a context
Data vs. Information
• When we provide a context for the data, the data (+ the context) becomes information, like:
– The age of Alan Wake is 33 years
– John Peterson lives at High Street 7
– Is Petra Wilson married? false
• Data + Context = Information
Data vs. Information
• We could also denote the context as ”data about the data”
• This is often referred to as meta-data
• Information is thus composed of:
– Data
– Meta-data
Data vs. Information
• This is – more or less – how we structure our communication with each other
– The age of Alan Wake is 33 years
– John Peterson lives at High Street 7
– Is Petra Wilson married? false
• Meta-data and data
• One part is not very useful without the other part…
Data vs. Information
• Of course, we are often somewhat ”implicit” when we communicate:
– He is 22 years old (who…?)
– My dog is named Kaya (what kind of dog…?)
– John is ill (who is John, what illness…?)
• We sometimes assume part of the context implicitly, otherwise it would be very tedious to communicate…
Transmitting information
• When computers transmit information, they can also be more or less implicit
• A method call is a kind of data transmission, which is highly implicit:
CalculateFactorial(int n)
• n is ”the number for which we want to calculate the factorial”
Transmitting information
• Suppose a program needs to receive information about some product
• A product has
– A name
– A price
– A weight
• How can we transmit this information to the program?
Transmitting information
• Perhaps just put the data into a file:
• ”Milk 4.95 1000”
• The meaning being:
– The name of the product
– The price of the product (in kroner)
– The weight of the product (in grams)
– Each element separated by a single space (” ”)
Transmitting information
• The program can then just read the file, and ”decode” the data
• However, this assumes that sender and receiver of the data have agreed about how to interpret the file content!
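As a minimal sketch of such a receiver, the following Java class decodes the line ”Milk 4.95 1000”. The class name FlatProductReader, its nested Product class and the field names are hypothetical, chosen only for this illustration. Note that the meaning of each field (name, kroner, grams) exists only in the code, not in the file.

```java
public class FlatProductReader {

    /** Plain value holder for one product (hypothetical, for illustration). */
    static final class Product {
        final String name;
        final double price;       // kroner, but only by agreement with the sender
        final int weightInGrams;  // grams, but only by agreement with the sender

        Product(String name, double price, int weightInGrams) {
            this.name = name;
            this.price = price;
            this.weightInGrams = weightInGrams;
        }
    }

    /** Decodes one line of the form "name price weight", separated by single spaces. */
    static Product parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields but got " + parts.length + ": " + line);
        }
        return new Product(parts[0],
                Double.parseDouble(parts[1]),
                Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        Product p = parse("Milk 4.95 1000");
        System.out.println(p.name + " costs " + p.price
                + " kroner and weighs " + p.weightInGrams + " grams");
    }
}
```

A product name containing a space, such as ”Orange Juice”, already breaks this agreement: the line splits into four fields and the sketch rejects it.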
Transmitting information
• Advantages
– A compact format, no space wasted
– Fast to process
• Disadvantages
– Static, hard to change
– Receiver and sender tied to each other
– What about other recipients?
– Not self-explanatory to a human reader
Transmitting information
• Main problem: Meta-data is ”encoded” in the receiving program
• Probably better to make meta-data explicit, to overcome disadvantages
• Use a ”markup language” to include meta-data in the transmission
Markup languages
• In a markup language, we can ”mark” data in a way which conveys the context
• We mark the data with meta-data
• An example of a markup language is HTML (HyperText Markup Language):
This is <b>very</b> good
Markup languages
• <b> and </b> are markings (tags) indicating that some meta-data should be applied to the data between them – here: write it in bold
• In HTML, tags are used for formatting and structure of ”documents”, not for defining structure of data as such
• Enter XML!
What is XML…?
• eXtensible
• Markup
• Language
XML
• XML can be seen as a generalisation of HTML – tags can be used for everything!
• All kinds of meta-data can be included as tags in XML
• Important! XML does not define anything about presentation of data
XML
• A product defined in XML:
<product>    <!-- Start the product description -->
  <name>Milk</name>
  <price>4.95</price>
  <weight>1000</weight>
</product>   <!-- End the product description -->
XML
• XML is highly recursive
• Inside a definition, we can have a number of ”child” definitions
• At some point, the definitions only contain data, like ”Milk”
• A definition can also have attributes associated with it
XML
<product>
  <name>Milk</name>
  <price currency="DKK">4.95</price>
  <weight unit="gram">1000</weight>
</product>
XML
• When should we use an attribute, and when a child element?
• An attribute should not be data in itself; it should be information about some data element
• Not a strict rule…
• When in doubt, use child elements
XML
<product name="Milk" price="4.95" weight="1000"/>
• Tempting, but not in the spirit of XML…
• Harder for the recipient to process
XML
• The general structure of an XML document is then
– An XML declaration:
<?xml version="1.0"?>
– A root element containing the data:
<products>
</products>
– Inside the root element: all the child elements
XML
<?xml version="1.0"?>
<products>
  <product>
    <name>Milk</name>
    <price currency="DKK">4.95</price>
    <weight unit="gram">1000</weight>
  </product>
  <product>
    <name>Orange Juice</name>
    <price currency="DKK">8.95</price>
    <weight unit="gram">500</weight>
  </product>
  ...
</products>
Processing XML documents
• How do we process an XML document, in order to retrieve data from it?
• We apply an XML parser to the document
• The XML parser transforms the XML document into a tree structure
• The tree structure follows the Document Object Model (DOM)
Processing XML documents
products
├─ product
│  ├─ name → Milk
│  ├─ price → 4.95
│  └─ weight → 1000
└─ product
Processing XML documents
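import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
String fileName = ...;
File xmlFile = new File(fileName);
// parse() reads the file and builds the tree in memory
Document doc = builder.parse(xmlFile);
// Now doc contains the DOM tree
...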
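Once the DOM tree exists, it can be walked with the standard org.w3c.dom methods. The sketch below parses the product document from a string instead of a file, so that it is self-contained; the class name DomWalk and the helper describe are hypothetical names used only here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomWalk {

    static final String XML =
            "<products>"
          + "<product><name>Milk</name>"
          + "<price currency=\"DKK\">4.95</price>"
          + "<weight unit=\"gram\">1000</weight></product>"
          + "<product><name>Orange Juice</name>"
          + "<price currency=\"DKK\">8.95</price>"
          + "<weight unit=\"gram\">500</weight></product>"
          + "</products>";

    /** Parses XML text into a DOM tree; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Builds a one-line summary from the name and price children of a product element. */
    static String describe(Element product) {
        Element name = (Element) product.getElementsByTagName("name").item(0);
        Element price = (Element) product.getElementsByTagName("price").item(0);
        return name.getTextContent() + ": " + price.getTextContent()
                + " " + price.getAttribute("currency");
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        NodeList products = doc.getDocumentElement().getElementsByTagName("product");
        for (int i = 0; i < products.getLength(); i++) {
            System.out.println(describe((Element) products.item(i)));
        }
    }
}
```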
Processing XML documents
• Given a tree following the DOM standard, we can address various elements in the tree, using the XPath syntax
– XPath describes a single node in the tree, or a set of nodes
– Syntax similar to directory paths
Processing XML documents
products
├─ product
│  ├─ name → Milk
│  ├─ price → 4.95
│  └─ weight → 1000   (the node selected by the path below)
└─ product
/products/product[1]/weight
Processing XML documents
• Other XPath constructions:
– count(/products/product) – get the number of product instances
– /products/product[1]/weight/@unit – get the value of the attribute unit
– name(/products/product[1]/*[1]) – get the name of the first child of the first product
Processing XML documents
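import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

XPathFactory xpfac = XPathFactory.newInstance();
XPath path = xpfac.newXPath();
...
// evaluate() returns the string value of the first node matching the expression
String result = path.evaluate("/products/product[1]/price", doc);
// Now result contains the price of the first product
...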
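The XPath constructions listed earlier can be tried out in the same way. The following is a sketch only: the class name XPathDemo and the helpers parse and eval are hypothetical, and the document is read from a string to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathDemo {

    static final String XML =
            "<products>"
          + "<product><name>Milk</name>"
          + "<price currency=\"DKK\">4.95</price>"
          + "<weight unit=\"gram\">1000</weight></product>"
          + "<product><name>Orange Juice</name>"
          + "<price currency=\"DKK\">8.95</price>"
          + "<weight unit=\"gram\">500</weight></product>"
          + "</products>";

    /** Parses XML text into a DOM tree; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Evaluates an XPath expression against the document and returns the result as a string. */
    static String eval(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(eval(doc, "count(/products/product)"));          // number of products
        System.out.println(eval(doc, "/products/product[1]/weight/@unit")); // attribute value
        System.out.println(eval(doc, "name(/products/product[1]/*[1])"));   // name of first child
        System.out.println(eval(doc, "/products/product[2]/name"));         // text of an element
    }
}
```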
Processing XML documents
• In general, we will convert an XML document into a number of Java objects
• We map XML data to Java classes
• It is up to us to define proper classes to store the data – XML does not know about classes; its data corresponds to ”objects”
• Each element in an XML document is like an instance field, not a class…
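One possible mapping is sketched below: each product element becomes one Product object, and each child element fills one instance field. The classes ProductMapper and Product, and the helper childText, are hypothetical names invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProductMapper {

    /** The Java class we choose to hold one product element. */
    static final class Product {
        final String name;
        final double price;
        final int weight;

        Product(String name, double price, int weight) {
            this.name = name;
            this.price = price;
            this.weight = weight;
        }
    }

    /** Returns the text of the first child element with the given tag name. */
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    /** Converts every product element in the XML text into a Product object. */
    static List<Product> readProducts(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<Product> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("product");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            result.add(new Product(
                    childText(e, "name"),
                    Double.parseDouble(childText(e, "price")),
                    Integer.parseInt(childText(e, "weight"))));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<products><product><name>Milk</name>"
                + "<price>4.95</price><weight>1000</weight></product></products>";
        for (Product p : readProducts(xml)) {
            System.out.println(p.name + " " + p.price + " " + p.weight);
        }
    }
}
```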
Creating XML documents
• In addition to processing given XML documents, we often wish to programmatically produce XML documents
• For this purpose, we again use the DocumentBuilder classes
Creating XML documents
DocumentBuilderFactory fac =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
Document doc = builder.newDocument();
// Now doc contains an empty DOM tree
...
Creating XML documents
• We must now insert node elements into the tree, corresponding to the structure of the data
• Fundamental methods are:
– createElement(String name);
– setAttribute(String name, String value);
– createTextNode(String text);
– appendChild(Node child);
Creating XML documents
createElement(String name);
• Creates an empty new element, with the given name
• Is called on the document object
• On a new element, we will
– Set value of attributes
– Add child elements, or
– Add text nodes
Creating XML documents
appendChild(Node child);
• Is itself called on an element (or on the document, for the root element)
• Appends the node child (an element or a text node) as the last child of the node it is called on
• This is how we create the structure for the tree!
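As a small sketch of these methods working together, the code below builds a single price element with an attribute and a text node. The class name BuildPrice and its helper methods are hypothetical; only the org.w3c.dom calls are standard.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildPrice {

    /** Creates an empty DOM document; the checked exception is wrapped for brevity. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    /** Builds <price currency="...">amount</price> as a detached element. */
    static Element buildPrice(Document doc, String amount, String currency) {
        Element price = doc.createElement("price");       // created by the document
        price.setAttribute("currency", currency);         // meta-data about the value
        price.appendChild(doc.createTextNode(amount));    // the data itself
        return price;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element product = doc.createElement("product");
        product.appendChild(buildPrice(doc, "4.95", "DKK")); // attach price below product
        doc.appendChild(product);                            // product becomes the root
        Element price = (Element) product.getFirstChild();
        System.out.println(price.getTagName() + " " + price.getAttribute("currency")
                + " " + price.getTextContent());
    }
}
```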
Creating XML documents
• The previous methods are enough to create a DOM tree
• Usually, we combine the methods into ”helper methods”, designed to insert a certain type of element
• Helper methods will often call other helper methods, depending on tree structure
Creating XML documents
// Creates <name>text</name>; doc is the Document being built
private Element createTextElement(String name, String text)
{
    Text t = doc.createTextNode(text);
    Element e = doc.createElement(name);
    e.appendChild(t);
    return e;
}
Creating XML documents
private Element createProduct(Product p)
{
    Element e = doc.createElement("product");
    e.appendChild(createTextElement("name", p.getName()));
    // String.valueOf converts the values to text, assuming the getters return numbers
    e.appendChild(createTextElement("price", String.valueOf(p.getPrice())));
    e.appendChild(createTextElement("weight", String.valueOf(p.getWeight())));
    return e;
}
Creating XML documents
private Element createProducts(ArrayList<Product> pList)
{
    Element e = doc.createElement("products");
    // One <product> child per Product object
    for (Product p : pList)
    {
        e.appendChild(createProduct(p));
    }
    return e;
}
Creating XML documents
DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = fac.newDocumentBuilder();
Document doc = builder.newDocument();
// Now doc contains an empty DOM tree
ArrayList<Product> pList = ...;
Element root = createProducts(pList);
// The products element becomes the root of the document
doc.appendChild(root);
Creating XML documents
• Final step – convert the completed DOM tree to a string (which could then be displayed on screen or written to a file)
• Requires a bit of ”black magic”…
Creating XML documents
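import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

DOMImplementation impl = doc.getImplementation();
// Ask for the DOM Level 3 Load and Save (LS) feature
DOMImplementationLS implLS =
    (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer ser = implLS.createLSSerializer();
ser.getDomConfig().setParameter("format-pretty-print", true);
String str = ser.writeToString(doc);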
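Putting the pieces together, a complete program could look roughly like the sketch below. It is not the only way to structure this: the class ProductXmlWriter, its nested Product class and the method names are hypothetical, and the helper methods take the document as a parameter instead of using a field.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class ProductXmlWriter {

    /** Hypothetical product class with public fields, to keep the sketch short. */
    static final class Product {
        final String name;
        final double price;
        final int weight;

        Product(String name, double price, int weight) {
            this.name = name;
            this.price = price;
            this.weight = weight;
        }
    }

    /** Creates <name>text</name> in the given document. */
    static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.appendChild(doc.createTextNode(text));
        return e;
    }

    /** Builds a products document from the list and serializes it to a string. */
    static String toXml(List<Product> products) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element root = doc.createElement("products");
        for (Product p : products) {
            Element e = doc.createElement("product");
            e.appendChild(textElement(doc, "name", p.name));
            e.appendChild(textElement(doc, "price", String.valueOf(p.price)));
            e.appendChild(textElement(doc, "weight", String.valueOf(p.weight)));
            root.appendChild(e);
        }
        doc.appendChild(root);

        // Serialize the finished tree with the DOM Load and Save API
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer ser = ls.createLSSerializer();
        ser.getDomConfig().setParameter("format-pretty-print", true);
        return ser.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(toXml(Arrays.asList(
                new Product("Milk", 4.95, 1000),
                new Product("Orange Juice", 8.95, 500))));
    }
}
```

The exact indentation and the XML declaration in the output depend on the serializer implementation, so the printed text may differ slightly between Java versions.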
Validating XML documents
• It will often be convenient to know if an XML document obeys certain rules about its content
• Can, e.g., make processing easier – we need less error handling in the code that reads the document
• Specification of such rules can be done in various ways
Validating XML documents
• Original way – use a DTD
• DTD – Document Type Definition
• A DTD is a sequence of rules describing
– The valid attributes for each element type
– The valid child elements for each element type
Validating XML documents
• Examples of DTD rules:
– <!ELEMENT products (product*)> – a products element must contain zero or more elements of type product
– <!ELEMENT product (name, price, weight)> – a product element must have the children: one name, one price, one weight, in that order
– <!ELEMENT name (#PCDATA)> – a name element may only contain text
Validating XML documents
• In order to validate an XML document against a DTD, the DTD must be specified
– Can be included in the XML document
– Can be referenced
• NOTE: Validation is optional, it is up to us to do it…
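A minimal sketch of DTD validation in Java follows, assuming the DTD is included in the document itself. The class name DtdValidation and the method isValid are hypothetical, and the rules for price and weight are assumptions added here to make the DTD complete. Validation must be switched on with setValidating(true), and an ErrorHandler is needed because validity errors are otherwise only reported, not thrown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidation {

    // Internal DTD; the price and weight rules are assumptions for this sketch
    static final String DTD =
            "<!DOCTYPE products ["
          + "<!ELEMENT products (product*)>"
          + "<!ELEMENT product (name, price, weight)>"
          + "<!ELEMENT name (#PCDATA)>"
          + "<!ELEMENT price (#PCDATA)>"
          + "<!ELEMENT weight (#PCDATA)>"
          + "]>";

    /** Returns true if the XML body is well-formed and obeys the DTD above. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
            fac.setValidating(true); // validation is off by default
            DocumentBuilder builder = fac.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or violates a DTD rule
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        String good = "<products><product><name>Milk</name>"
                + "<price>4.95</price><weight>1000</weight></product></products>";
        String bad = "<products><product><name>Milk</name>"
                + "<price>4.95</price></product></products>"; // weight is missing
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```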
Validating XML documents
• A more modern way of validating XML documents is by using an XSD
• XSD – XML Schema Definition
• Provides a more general framework for specification of the document format
• Is itself written in XML
• Comes closer to actual class definitions
Validating XML documents
<xsd:complexType name="product">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="price" type="xsd:float"/>
    <xsd:element name="weight" type="xsd:integer"/>
  </xsd:sequence>
</xsd:complexType>
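In Java, a document can be checked against an XSD with the javax.xml.validation package. The sketch below wraps the product type in a complete schema; the surrounding xsd:schema element and the top-level product element declaration are assumptions added to make it usable, and the class name XsdValidation and method isValid are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class XsdValidation {

    static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xsd:complexType name=\"product\">"
          + "<xsd:sequence>"
          + "<xsd:element name=\"name\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"price\" type=\"xsd:float\"/>"
          + "<xsd:element name=\"weight\" type=\"xsd:integer\"/>"
          + "</xsd:sequence>"
          + "</xsd:complexType>"
          // Assumption: a root element named product that uses the type above
          + "<xsd:element name=\"product\" type=\"product\"/>"
          + "</xsd:schema>";

    /** Compiles the schema text; a broken schema is a programming error here. */
    static Schema loadSchema() {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema could not be compiled", e);
        }
    }

    /** Returns true if the XML text is valid according to the schema. */
    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document breaks a schema rule
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<product><name>Milk</name><price>4.95</price><weight>1000</weight></product>"));
        System.out.println(isValid(
                "<product><name>Milk</name><price>cheap</price><weight>1000</weight></product>"));
    }
}
```

Unlike the DTD rules, the schema also checks the data types: a price of ”cheap” is rejected because it is not an xsd:float.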
Transforming XML documents
• A common task is to transform data given in XML format to ”something else”…
• Reading/writing XML in Java transforms the data to an in-memory object model
• This is a ”programmatic” transformation, we can also imagine more static or declarative transformations
Transforming XML documents
• Such a transformation can be specified in XSLT (XSL Transformations)
• Specifies a transformation from the XML document into… anything!
– A Word document
– An HTML page
– Java code (!)
– …?
Transforming XML documents
• Example: A complex electronic device is described in XML
• We wish to create a software model of the device, with classes, interfaces, etc., to enable software simulation of the device
• The transformation from XML to Java code could be done by an XSLT
• Input: XML, Output: Java code…
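Generating Java code is beyond a short example, but the mechanism can be sketched with a simpler target. The code below applies a small XSLT stylesheet to the product document and produces an HTML list, using the standard javax.xml.transform API. The stylesheet, the class name XsltDemo and the method transform are hypothetical and written only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {

    // Stylesheet: one <li> per product, holding the product name
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\" indent=\"no\"/>"
          + "<xsl:template match=\"/products\">"
          + "<ul><xsl:apply-templates select=\"product\"/></ul>"
          + "</xsl:template>"
          + "<xsl:template match=\"product\">"
          + "<li><xsl:value-of select=\"name\"/></li>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the XML text and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products>"
                + "<product><name>Milk</name><price>4.95</price></product>"
                + "<product><name>Orange Juice</name><price>8.95</price></product>"
                + "</products>";
        System.out.println(transform(xml));
    }
}
```

The transformation itself lives in the stylesheet, not in the Java code: producing a different output format only requires a different stylesheet.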