XML and DANSE Michael McKerns DANSE Software Workshop Caltech Materials Science

Post on 15-Jan-2016


Page 1: XML and DANSE Michael McKerns DANSE Software Workshop Caltech Materials Science

XML and DANSE

Michael McKerns

DANSE Software Workshop
Caltech Materials Science


a “Brief” XML Overview

PART I


What is XML?

XML == eXtensible Markup Language

• a subset of SGML (Standard Generalized Markup Language)

• a 'metalanguage' used to describe other languages

• a language designed to describe data

• a language where element names and document structure are not predefined

• a 'clear-text' format language

• a cross-platform, software- and hardware-independent tool for storing, processing, and transmitting information


XML is not...

• designed to display data. If you want to display data over the web, use HTML.

• intended to 'do' anything. It is a language that acts as a container for information: it creates and describes the structure in which the information is stored.

• a replacement for SGML; it is a subset. XML retains much of the functionality of SGML while removing many of the options and complexities.

• difficult to learn or read. XML is written in clear text and has relatively few rules (or exceptions).


XML components – Elements

XML is built from elements. Elements are composed of a start-tag, and an end-tag, with text and/or more markup (sub-elements) between them. A simple example is <title>Python and XML</title>.

There is always one root element, but there can be many nested sub-elements. Elements are built to have parent-child relationships.

Elements are (usually) named to describe the information that is contained between the tags.


XML components – Attributes

Elements can have attributes in the start tag. The syntax for an attribute is attribute='description'. Quotes may be single or double.

Attributes are used to provide additional information about elements. While elements are used to store and describe data, attributes should be used to store and describe metadata.

Attributes cannot describe structures (they have no children), and cannot contain multiple values.


A basic XML file

<?xml version='1.0'?>
<!-- A basic XML file -->
<book>
  <title>Python and XML</title>
  <author>Christopher A. Jones</author>
  <author>Fred L. Drake, Jr.</author>
  <publisher email='[email protected]'>O&apos;Reilly</publisher>
  <text language='english' format='textbook'>
    <preface> ... </preface>
    <chapter1> ... </chapter1>
    ...
  </text>
</book>
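As a rough illustration of how elements and attributes look from code, the sketch below reads a trimmed copy of the book example with ElementTree from the Python 3 standard library. The choice of ElementTree and the placeholder email address are assumptions made for this example; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the book example; the email address is a made-up placeholder.
BOOK_XML = """<?xml version='1.0'?>
<book>
  <title>Python and XML</title>
  <author>Christopher A. Jones</author>
  <author>Fred L. Drake, Jr.</author>
  <publisher email='info@example.com'>O&apos;Reilly</publisher>
</book>"""

root = ET.fromstring(BOOK_XML)  # the single root element, <book>

# Elements hold the data: text between a start-tag and an end-tag.
title = root.find("title").text
authors = [author.text for author in root.findall("author")]

# Attributes hold metadata about an element.
publisher = root.find("publisher")
email = publisher.get("email")

print(root.tag, "|", title, "|", authors)
print(publisher.text, "|", email)
```

Note that the parser has already turned the &apos; entity back into a plain apostrophe by the time the text is read.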


Making XML more specific

The actual element names are unimportant in a well-formed XML document, and can be replaced as long as the inheritance structure is maintained.

<book>
  <banana>Python and XML</banana>
  <kijjipt lgne='english'> ... </kijjipt>
</book>

However, to help the document make sense to the developer and the user, a schema (a set of naming and structure rules) can be imposed on an XML document.


Why use a Schema?

A schema defines the legal building blocks of an XML document through a list of legal elements.

A schema defines default and fixed values for elements and attributes, as well as the order and number of child elements.

A schema greatly aids the sender in describing information in a way that the receiver can understand.

A schema can aid in finding errors in a (well-formed) XML document.
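The difference between a well-formed document and one that also obeys a schema can be sketched in a few lines of Python 3. The RULES table below is a hypothetical, hand-written stand-in for a real schema (it is neither DTD nor XSD), and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a schema: which children <book> may
# contain, and which children it must contain.
RULES = {
    "book": {"allowed": {"title", "author", "publisher", "text"},
             "required": {"title"}},
}

def is_well_formed(xml_text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def find_violations(xml_text, rules=RULES):
    """List rule violations for the root element of a well-formed document."""
    root = ET.fromstring(xml_text)
    rule = rules.get(root.tag)
    if rule is None:
        return ["unexpected root element <%s>" % root.tag]
    children = [child.tag for child in root]
    problems = ["<%s> not allowed in <%s>" % (tag, root.tag)
                for tag in children if tag not in rule["allowed"]]
    problems += ["<%s> missing required <%s>" % (root.tag, tag)
                 for tag in sorted(rule["required"]) if tag not in children]
    return problems

good = "<book><title>Python and XML</title></book>"
odd = "<book><banana>Python and XML</banana></book>"
broken = "<book><title>Python and XML</book>"

for name, text in [("good", good), ("odd", odd), ("broken", broken)]:
    ok = is_well_formed(text)
    print(name, ok, find_violations(text) if ok else "not parsed")
```

The "odd" document is perfectly well-formed, so only the rule check catches it; the "broken" document never gets as far as the rules.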


DTD vs XSD

A DTD (Document Type Definition) is a very simple type of schema. However, a shortcoming of a DTD is that it is neither extensible nor written in XML. Also, only a single DTD can be applied within an XML document.

XSD (XML Schema Definition) is a schema language that is written in XML. XSD is not only extensible, but it also inherits all the features of XML. Further, through the use of namespaces, a single XML document can draw on many XSDs.


What are Namespaces?

Namespaces in XML are declared with special attributes and are used to qualify elements. Namespaces allow resolution of element name conflicts caused by reusing the same element name in different schemas.

A namespace declared in a start tag is in scope for that element and all of its child elements. The standard is to use a Uniform Resource Identifier (URI) to give the namespace a unique name; however, no information is looked up at the URI.


An XML file with Namespaces

<?xml version='1.0'?>
<dt:data_transformation
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:schemaLocation='http://arcs.caltech.edu/~jonny/ dtr.xsd'
    xmlns:dt='http://arcs.caltech.edu/datatrans'>
  <dt:description>
    <dt:author>Mike McKerns after Jonny Lin</dt:author>
    <dt:comments>calculate Bose factor</dt:comments>
  </dt:description>
  <dt:input dt:name='Energy' dt:type='Array'/>
  <dt:input dt:name='Intensity' dt:type='Array'/>
  <dt:output dt:name='Energy' dt:type='Array'/>
  <dt:output dt:name='Phonon DOS' dt:type='Array'/>
</dt:data_transformation>
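One way to see what a namespace prefix means to a parser is to read a trimmed copy of this file with ElementTree (Python 3 standard library; an assumption of this sketch, not the tool used in the slides). The prefix is replaced by its URI, and nothing is fetched from that URI.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the data_transformation example (schemaLocation omitted).
DT_XML = """<?xml version='1.0'?>
<dt:data_transformation xmlns:dt='http://arcs.caltech.edu/datatrans'>
  <dt:description>
    <dt:author>Mike McKerns after Jonny Lin</dt:author>
    <dt:comments>calculate Bose factor</dt:comments>
  </dt:description>
  <dt:input dt:name='Energy' dt:type='Array'/>
  <dt:input dt:name='Intensity' dt:type='Array'/>
  <dt:output dt:name='Energy' dt:type='Array'/>
  <dt:output dt:name='Phonon DOS' dt:type='Array'/>
</dt:data_transformation>"""

NS = {"dt": "http://arcs.caltech.edu/datatrans"}  # prefix map for searches
DT = "{http://arcs.caltech.edu/datatrans}"        # expanded form used in names

root = ET.fromstring(DT_XML)

# ElementTree stores qualified names as '{uri}local', not 'dt:local'.
print(root.tag)

author = root.find("dt:description/dt:author", NS).text
inputs = [el.get(DT + "name") for el in root.findall("dt:input", NS)]
outputs = [el.get(DT + "name") for el in root.findall("dt:output", NS)]
print(author, inputs, outputs)
```

Because matching is done on the URI, a document that used a different prefix for the same URI would be read identically.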


Processing an XML file

Processing XML is broken into two parts: a parser and an application.

To check the structure and format of an XML document with a schema, the XML must first be read into a parser.

If an XML document complies with the rules of the schema (validated), then it is parsed into a form that is able to be processed by the application.


What does a parser actually do?

A parser is responsible for reading raw bytes of data that make up the serialized XML document, reacting to markup specific characters ('<', '&', ...), and creating a representation for the elements and attributes that compose the conceptual XML document.

60|98|111|111|107|62|... == <|b|o|o|k|>|...

Parsers typically output data into an event-based representation or a tree-based representation.


Event-based parsing

<?xml version='1.0'?><!-- A basic XML file --><book><title>Python and XML</title>
<author>Christopher A. Jones</author><author>Fred L. Drake, Jr.</author>
<publisher email='corporat[email protected]'>O&apos;Reilly</publisher><text language='en
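A minimal sketch of event-based parsing, assuming the xml.sax module from the Python 3 standard library: the parser walks the byte stream once and reports each start tag, run of text, and end tag to a handler as it meets them. The EventLogger class and the short sample document are invented for this example.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def characters(self, content):
        # Text may arrive in several pieces; whitespace-only pieces are skipped.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

# The raw bytes 60|98|111|111|107|62 are the serialized form of '<book>'.
raw = bytes([60, 98, 111, 111, 107, 62])
document = raw + b"<title>Python and XML</title></book>"

handler = EventLogger()
xml.sax.parseString(document, handler)
for event in handler.events:
    print(event)
```

Nothing is kept in memory beyond what the handler chooses to store, which is why this style suits filters and simple translations.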


Tree-based parsing


SAX vs DOM

Simple API for XML (SAX) is event-based, while the Document Object Model (DOM) is tree-based.

SAX is simpler to learn and implement, requires far less memory, and is more resistant to format changes than DOM.

SAX has more difficulty searching XML and forming user understandable code. Further, SAX cannot modify an XML document, while DOM easily adds/deletes nodes.

SAX should be used for simple translations and filters, while DOM should be used for searching and interactive or complex translations.
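For contrast with event-based parsing, here is a small tree-based sketch using xml.dom.minidom from the Python 3 standard library (an assumed choice of DOM implementation). The whole document is held in memory, so nodes can be searched, added, and deleted before the tree is serialized again.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><title>Python and XML</title>"
    "<author>Christopher A. Jones</author></book>"
)

# Search: any node in the tree can be looked up directly.
authors = doc.getElementsByTagName("author")
first_author = authors[0].firstChild.data

# Modify: add a second author node, then delete the title node.
new_author = doc.createElement("author")
new_author.appendChild(doc.createTextNode("Fred L. Drake, Jr."))
doc.documentElement.appendChild(new_author)

title = doc.getElementsByTagName("title")[0]
doc.documentElement.removeChild(title)

result = doc.documentElement.toxml()
print(first_author)
print(result)
```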


Transform XML to...

Stylesheets hold translation templates for transforming the markup in a document into another markup language or dialect of the same language.

Stylesheets were intended to format the document to allow display of information in a browser.

CSS is a simple stylesheet language that controls how HTML is displayed in a browser. XSL is the stylesheet language for XML. However, to view XML in a browser, XSL is typically used to transform the XML into XHTML, to which a CSS stylesheet can then be applied.


XSL... more than just a stylesheet

XSL (eXtensible Stylesheet Language) is composed of a formatting application and XSLT (a transformation application).

The formatting portion of XSL is basically a translation-specific XML schema language, while XSLT is composed of parsers and serializers built around a processing engine.

Since XSLT is extensible, it can be set up to transform the XML into any language for which the XSL stylesheet holds a template (XML, HTML, XHTML, LaTeX, PDF,...).


XSLT

XSLT uses statements like <xsl:template match='phonon'> to define parts of the source document that match one or more of the predefined templates.

When a match is found, XSLT will transform the matching part of the source document into the result document.

Element names in XSL are very similar to protected names in a standard programming language (xsl:for-each, xsl:value-of, ...)

Parts of the source document that match no template fall through to XSLT's built-in rules, which copy their text content to the result document but drop the unmatched element tags.


The Big Picture (so far)


Working outside your computer

Even though you can very happily 'do everything' on your own PC, consider the convenience of having the ability to access datasets and applications that are external to your computer and integrate them with your favorite local application...

Then, once you have an RPC client and access to a server, you have a whole new set of tools available to you, without ever running out of memory to run them, fighting through the trouble of installing them, or finding disk space for them on your PC.


What are Remote Procedure Calls?

A procedure call is the name of a procedure, its parameters, and the result it returns – a remote procedure call (RPC) is a call made to a remote machine.

An RPC is a communication protocol that allows cross-platform distributed computing. The RPC protocols covered here typically use HTTP as the transport and XML as the encoding.

Especially when called from Python, very few lines of code are needed to invoke an RPC and return the result of a calculation done on a remote computer.


XMLRPC

XMLRPC is a simple and effective means to request and receive information. Many of the element names and much of the structure are fixed.

Beyond scalar values, XMLRPC can only use structs (an anonymous set of name-value pairs) and arrays (an anonymous grouping of elements with no limits on type mixing).

The simplicity of XMLRPC is both a strength and a weakness. It has difficulty when passing an object as an argument to a function, specifying what portion of a receiving application the message is intended for, ...
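To make the fixed structure concrete, the sketch below uses xmlrpc.client from the Python 3 standard library (the successor to the xmlrpclib module) to encode a call by hand, without contacting any server. The method name examples.describe and its arguments are hypothetical.

```python
import xmlrpc.client

# Hypothetical call: one struct argument and one mixed-type array argument.
params = ({"name": "Energy", "units": "meV"}, [1, 2.5, "three"])

# dumps() builds the XML request body that would be POSTed to the server.
request = xmlrpc.client.dumps(params, methodname="examples.describe")
print(request)

# loads() is the inverse: it recovers the parameters and the method name.
decoded_params, method = xmlrpc.client.loads(request)
print(method, decoded_params)
```

A Python dict becomes a struct and a Python list becomes an array; on the way back they arrive as a plain dict and a plain list, with no record of any richer type they may have had.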


SOAP

SOAP is written in XML, and thus is extensible. SOAP makes extensive use of namespacing and attribute specification tags. SOAP is relatively complex and somewhat unstable – and documentation is scarce.

Even though SOAP is only a submission at W3C, it is currently used by Microsoft as the core of the .NET framework and is being used by IBM as the transport protocol for the Grid.

SOAP is more secure than XMLRPC, by implementing greater controls on what and how the message is sent and received, including message specific processing control, the ability to specify the recipient, ...


Some simple XMLRPC code

POST /RPC2 HTTP/1.0
Host: betty.userland.com
Content-Type: text/xml
Content-length: 181

<?xml version='1.0'?>
<!-- Simple XMLRPC code -->
<methodCall>
  <methodName>examples.getStateName</methodName>
  <params>
    <param><value><i4>41</i4></value></param>
  </params>
</methodCall>


XMLRPC from Python

>>> import xmlrpclib   # Python 2 module; in Python 3 it is xmlrpc.client
>>> server_url = 'http://betty.userland.com/RPC2'
>>> server = xmlrpclib.Server(server_url)
>>> server.examples.getStateName(41)
'South Dakota'


The Bigger Picture

[Figure: a client application calls a component through an XMLRPC proxy and an XMLRPC listener, which exchange XML over HTTP. Labels in the figure: Vendor A, Vendor B, Client application, Component, XMLRPC proxy, XMLRPC listener, XML/HTTP.]

1. Client application makes call

2. XMLRPC proxy intercepts call to construct and transmit XML request message

3. XMLRPC listener receives, parses and validates request

4. Listener calls component message

5. Listener takes result of call and constructs and transmits XML response

6. Proxy receives and parses response and returns result to client

This whole process is transparent for client and component
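The proxy and listener roles can be tried out on a single machine. The sketch below is an assumption-laden stand-in rather than the DANSE setup: it uses SimpleXMLRPCServer and ServerProxy from the Python 3 standard library, a local address instead of a real server, and a one-entry lookup table as the hypothetical component.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical component: a tiny lookup table standing in for the remote code.
STATES = {41: "South Dakota"}

def get_state_name(number):
    return STATES.get(number, "unknown")

# Listener: receives and parses requests, then calls the component.
# Port 0 asks the operating system for any free local port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_state_name, "examples.getStateName")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# Proxy: turns an ordinary-looking method call into an XML request over
# HTTP, then parses the XML response back into a Python value.
proxy = xmlrpc.client.ServerProxy("http://%s:%d/RPC2" % (host, port))
result = proxy.examples.getStateName(41)
print(result)

server.shutdown()
server.server_close()
```

From the caller's side the last few lines look like a local function call, which is the transparency the numbered steps describe.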


Python XML Resources

xmlrpclib: http://www.pythonware.com/products/xmlrpc

PyXML: http://pyxml.sourceforge.net

4Suite: http://4suite.org

SOAPy: http://soapy.sourceforge.net

Python & XML by Jones & Drake (O'Reilly)

Python Cookbook by Martelli & Ascher (O'Reilly)


the DANSE Client and Server

PART II


Distributed Computing Services

• The server can provide access to the best combination of hardware and software

• Most experimental data and analysis codes reside on the servers, so little bandwidth is needed

• Computing resources can be changed without affecting the user

• Computation can be local or non-local

• Clean separation of GUI from analysis codes

• One web portal for all neutron instruments (?)


Two Key Concepts

Components: Pre-compiled Python objects called and re-arranged by the Python interpreter.

Data Streams: Standard communication protocol between components. Standard streams can connect components located anywhere…


Data Analysis Execution

• User hits “Run”

• Client interprets wiring diagram as XMLRPC commands

• Server receives commands, arranges Python script, and data processing commences.


core of the DANSE Server

DTServer.py – top level server

DataTransformations.py – manages requests for information & executes DTs

RPC2.py – called by Apache for XMLRPC & hands off to appropriate modules, writes result (XML) to http

util_RPC2.py – convert URI to URL, make RPC call

UserDir.py – manages changes to user files

UserHash.py – manages session hash, security, encryption

CustomerProfile.py – manages user profiles


conceptual DANSE Server

[Figure: conceptual diagram of the DANSE Server; the only label recovered is "SQL"]


core of the DANSE Client

ViPEr – visual programming environment (GUI)

loginGUI.py – login dialogue box

Cobra.py – top level: load & show libraries

RPC/Remote.py – manages login & remote library calls

Local/Library.py – manages local library calls

Network – abstract components for networking

RPC/DataTransformation.py – executes local DTs & requests remote DTs


conceptual DANSE Client


Adding/Modifying a DataTransformation

One of the first questions commonly asked about any new software is:

“How can I use it to do MY research?”

Well, (currently) if you can provide a pure Python program (or a program wrapped in Python) that transforms data in the way you desire, then it can be easily added to the DANSE architecture.


Demo

Now, let's add a new DataTransformation...


library.xml

<?xml version="1.0"?>
<Library name="Default" owner="commune"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <Shelf name="ARCS" owner="commune" xlink:type="simple"
         xlink:href="file://kittel.caltech.edu/~mmckerns/RPC2/RPC2.py?UserDir.getFile=commune=/libraries/ARCS.xml" />
</Library>


“Shelf.xml” (ARCS.xml)

<?xml version="1.0"?>
<Shelf name="ARCS Data" owner="commune">
  <DataTransformation name="ARCS.ReduceNi" owner="commune" />
  <DataTransformation name="ARCS.Bose_Factor" owner="commune" />
  <DataTransformation name="ARCS.Born_von_Karman" owner="commune" />
</Shelf>
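As an illustration only, a shelf file like this one can be read, and a new entry appended, with ElementTree from the Python 3 standard library. This is a sketch of the file format, not of how the DANSE client itself updates a shelf; the helper function and the entry name ARCS.My_Transform are hypothetical.

```python
import xml.etree.ElementTree as ET

SHELF_XML = """<?xml version="1.0"?>
<Shelf name="ARCS Data" owner="commune">
  <DataTransformation name="ARCS.ReduceNi" owner="commune" />
  <DataTransformation name="ARCS.Bose_Factor" owner="commune" />
  <DataTransformation name="ARCS.Born_von_Karman" owner="commune" />
</Shelf>"""

def list_transformations(shelf_element):
    """Return the name attribute of every DataTransformation on the shelf."""
    return [dt.get("name") for dt in shelf_element.findall("DataTransformation")]

shelf = ET.fromstring(SHELF_XML)
before = list_transformations(shelf)

# Hypothetical new entry, added as one more child element of <Shelf>.
ET.SubElement(shelf, "DataTransformation",
              {"name": "ARCS.My_Transform", "owner": "commune"})
after = list_transformations(shelf)

print(before)
print(after)
print(ET.tostring(shelf, encoding="unicode"))
```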


“DataTF” (Bose_Factor)

<?xml version="1.0"?>
<dt:data_transformation ...
    dt:name="ARCS.Bose_Factor"
    dt:owner="commune"
    dt:type="basic_python"
    dt:address="file://kittel.caltech.edu/~mmckerns/RPC2/RPC2.py?UserDir.getFile=commune=/ARCS/Bose_Factor.py">
  <dt:input dt:name="Energy" dt:type="Array"/>
  <dt:input dt:name="Intensity" dt:type="Array"/>
  <dt:output dt:name="Energy" dt:type="Array"/>
  <dt:output dt:name="Phonon DOS" dt:type="Array"/>
</dt:data_transformation>


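“DataTF.py” (Bose_Factor.py)

import math

def doTransformation(inputs, outputs):
    # inputs[0] holds the energies, inputs[1] the measured intensities
    energy = inputs[0].value
    intensity = inputs[1].value
    outputs[0].value = []
    outputs[1].value = []
    for i in range(len(energy)):
        # keep only the non-negative energies
        if energy[i] >= 0.0:
            outputs[0].value.append(energy[i])
            # 25.3 is presumably k_B*T in the units of energy (meV near room temperature)
            factor = 1 - math.exp(-energy[i]/25.3)
            outputs[1].value.append(intensity[i]*energy[i]*factor)
    return 1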
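The transformation can be exercised on its own before it is wired into anything. In the Python 3 sketch below, the Port class is a hypothetical stand-in for whatever objects DANSE actually passes in (the slide only shows that they carry a .value attribute), and the sample numbers are made up.

```python
import math

class Port:
    """Hypothetical stand-in for a DANSE input or output: it only holds a value."""
    def __init__(self, value=None):
        self.value = value

def doTransformation(inputs, outputs):
    # Same logic as Bose_Factor.py on the slide.
    energy = inputs[0].value
    intensity = inputs[1].value
    outputs[0].value = []
    outputs[1].value = []
    for i in range(len(energy)):
        if energy[i] >= 0.0:
            outputs[0].value.append(energy[i])
            factor = 1 - math.exp(-energy[i] / 25.3)
            outputs[1].value.append(intensity[i] * energy[i] * factor)
    return 1

# Made-up sample data: one negative energy that should be filtered out.
energy = [-5.0, 0.0, 10.0, 25.3]
intensity = [1.0, 2.0, 3.0, 4.0]

inputs = [Port(energy), Port(intensity)]
outputs = [Port(), Port()]
status = doTransformation(inputs, outputs)

print(status)
print(outputs[0].value)
print(outputs[1].value)
```

The inputs and outputs appear in the same order as the dt:input and dt:output entries of the Bose_Factor descriptor: Energy and Intensity in, Energy and Phonon DOS out.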


adding to DTServer vs Local

If the new DataTransform is being added to DANSE locally, then that's it!

However, if the DataTransform is to be added to the DANSE Server, then there is one more step: add the new DataTransform to the DTServer database by using “xforms.py”

python xforms.py -i Bose_Factor.xml