data mapping techniques

Upload: bala0159

Post on 09-Apr-2018


  • 8/7/2019 Data Mapping Techniques

    1/7

    Data Mapping Techniques -- WDDX

    Contents:

    • Introduction
    • Description of the canonical XML form
    • Commonly Used Templates -- A Template Cookbook


    Introduction

    This paper explores a technique for exploiting the interchangeability between XML documents and Python data structures.

    It is often useful to load an XML document into Python data structures so that it can be processed. The DOM interface performs this task. However, DOM data structures are specific to DOM. Often it is desirable to load the XML document into custom or application-specific data structures.

    There are a number of approaches to this problem that come to mind:

    • Use the SAX interface -- Scan the XML document, creating custom Python data structures while doing so.

    • Use the DOM interface -- Load the XML document into a DOM tree, then perform a tree walk on the DOM tree, creating custom Python data structures while doing so.

    In both of the above techniques, custom Python code is used to perform much of the conversion, plucking data values from the XML document and creating instances of the Python data structures. In effect, the developer's control over the conversion process is encoded in custom Python code.
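    As an illustration of the DOM approach listed above, here is a minimal sketch using the standard library's xml.dom.minidom. The Person class, the people/person/age element names, and the sample document are hypothetical and chosen only for this example; they do not come from the paper.

```python
from xml.dom import minidom


class Person:
    """Hypothetical application-specific class (an assumption for this sketch)."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


# Hypothetical sample document.
SAMPLE = """<people>
  <person name="Ann"><age>31</age></person>
  <person name="Bob"><age>45</age></person>
</people>"""


def text_of(element):
    """Concatenate the text nodes directly under an element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    ).strip()


def load_people(xml_text):
    """Walk the DOM tree and build Person instances by hand."""
    doc = minidom.parseString(xml_text)
    people = []
    for elem in doc.getElementsByTagName("person"):
        name = elem.getAttribute("name")
        age_elem = elem.getElementsByTagName("age")[0]
        people.append(Person(name, int(text_of(age_elem))))
    return people


people = load_people(SAMPLE)
```

    Every mapping decision (which element becomes which class, which attribute becomes which member) lives in load_people. That is the custom Python code the paragraph above refers to.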

    This paper presents an alternative technique. It transforms the original XML document into an XML document that has a canonical form (specifically, WDDX), then uses the (un-)marshaller distributed with PyXML to convert that document into Python objects. In effect, XSLT is used to customize the conversion: the developer's control over the conversion process is encoded in an XSLT stylesheet.

    This technique has the following benefits:

    • XSLT can be used to perform the conversion.

    • We can provide a set of XSLT templates that can be easily adapted to each of a set of common transformations.

    • The hope is that we can also help with generating the templates for the XSLT stylesheet used in the conversion process. Perhaps we can make it possible to describe the mapping from specific XML elements to specific Python data structures in an easier way, and then generate XSLT stylesheet templates that perform that mapping (or, more correctly, that convert to the XML elements that can be automatically loaded into Python data structures).

    Our interest in this paper is to provide help with implementing an equivalence between XML

    documents and Python data structures.

    Note, however, that the technique we describe in this paper will work for any language for which there is an implementation of the marshaller/unmarshaller provided in generic.py (in PyXML).

    The top level process is composed of the following steps:

    1. Use an XSLT processor and the stylesheet that you have written to transform the source XML document into an XML document of the form accepted by the class generic.Unmarshaller.

    2. Use generic.py (in the PyXML distribution) to load the generated/marshalled XML into Python data structures.

    Here is sample code that performs this transformation:

        import generic
        import libxsltmod

        # Receives messages emitted by the XSLT processor.
        class MsgHandler:
            def __init__(self):
                pass
            def write(self, msg):
                print '***', msg

        def loadfile(inFileName, stylesheetFile):
            msgHandler = MsgHandler()
            # Step 1: apply the stylesheet to the source document.
            s1 = libxsltmod.translate_to_string(
                'f', stylesheetFile,
                'f', inFileName,
                msgHandler)
            print s1
            # Step 2: unmarshal the generated XML into Python objects.
            um = generic.Unmarshaller()
            ds = um.loads(s1)
            ds.show()

    Some notes about this code:

    • This code uses the libxsltmod XSLT processor, which is the libxslt C library that I wrapped for Python. You can find it at http://www.rexx.com/~dkuhlman. You should be able to use the XSLT processor in PyXML just as easily.

    • The Unmarshaller in generic.py is sensitive to whitespace. I squeezed whitespace out of the generated XML by including the following in my stylesheet:

      [stylesheet snippet not preserved]

    • Module generic is included in the PyXML distribution (under the xml/marshal sub-directory). You can find out about PyXML here.

    • The above sample code assumes that the created data structure is an instance of a class that has a show method.
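    Whitespace can also be removed outside the stylesheet, before the generated XML is handed to the Unmarshaller. The following is a minimal sketch of that alternative using xml.etree.ElementTree from the standard library. It is not the stylesheet-based approach described in the notes above. The function name strip_whitespace is hypothetical, and the marshal, list and int element names in the sample are assumptions about the generated format.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(xml_text):
    """Return xml_text with whitespace-only text between elements removed."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # elem.text is the text before the first child;
        # elem.tail is the text after the element's closing tag.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


# Assumed shape of a generated document, pretty-printed with indentation.
sample = (
    "<marshal>\n"
    "  <list>\n"
    "    <int>1</int>\n"
    "    <int>2</int>\n"
    "  </list>\n"
    "</marshal>"
)
compact = strip_whitespace(sample)
```

    Text inside leaf elements is left alone unless it consists only of whitespace, so string values with real content are preserved.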

    A note on the term "WDDX": I don't believe that the XML documents we are generating (for input to the generic.py Unmarshaller) follow the DTD for WDDX. That doesn't concern me much for our purposes here, since in this technique we are building documents for input to generic.py. However, if you plan to share or "syndicate" those documents, then you will want to pay attention to generating XML documents that obey a publicly known DTD. In the meantime, I believe that what this paper describes is in the "spirit" of WDDX in the sense that it marshals and unmarshals data structures in a way that is programming language neutral. However, the occurrence of so many quasi-quotes in this paragraph should be a caution. As should paragraphs that refer to themselves.


    Description of the canonical XML form

    This technique generates XML that can be processed by the Unmarshaller in generic.py,

    which is included in the PyXML distribution.

    This section describes the XML elements that we must generate. Effectively, we are describing the XML elements generated by class generic.Marshaller and function generic.dumps() and accepted by class generic.Unmarshaller and function generic.loads(). (generic.py is in the PyXML distribution.)

    To create an instance of a class, generate something like the following:

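        <object class="object_class_name" module="object_classes">
            <tuple/>
            <dictionary>
                <string>member_1</string>
                <string>value_1</string>
                <string>member_2</string>
                <string>value_2</string>
                ...
            </dictionary>
        </object>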

    Here are a few things to notice about this generated XML:

    • The name of the class, an instance of which is created, is the value of the attribute class, e.g. "object_class_name".

    • This class must be defined in a module whose name is the value of the attribute module, e.g. "object_classes". So, in this case we would need a module object_classes.py.

    • The empty tuple in this generated XML could contain parameters to be passed to the constructor of the class. If this tuple is empty, the constructor is not called. However, member variables for the instance will be initialized (see next bullet).

    • The dictionary contains the names and values of the member variables to be set in the instance. The format is a member name followed by its value, followed by the next member name followed by its value, and so on.
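    To make the description above concrete, the following is a minimal sketch of how an element of this form could be turned into a Python instance. It is not the PyXML implementation. The element names (object, tuple, dictionary, string, int, float, list) are assumptions based on the description in this section, the module lookup is replaced by a plain dictionary of known classes, and the class Point is hypothetical.

```python
import xml.etree.ElementTree as ET


class Point:
    """Hypothetical class standing in for one defined in object_classes.py."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Stand-in for importing the module named in the "module" attribute.
KNOWN_CLASSES = {("object_classes", "Point"): Point}


def load(elem):
    """Recursively convert one element of the assumed canonical form."""
    tag = elem.tag
    if tag == "string":
        return elem.text or ""
    if tag == "int":
        return int(elem.text)
    if tag == "float":
        return float(elem.text)
    if tag == "list":
        return [load(child) for child in elem]
    if tag == "tuple":
        return tuple(load(child) for child in elem)
    if tag == "dictionary":
        # Children alternate: key, value, key, value, ...
        items = [load(child) for child in elem]
        return dict(zip(items[0::2], items[1::2]))
    if tag == "object":
        cls = KNOWN_CLASSES[(elem.get("module"), elem.get("class"))]
        # Exactly two children are expected: constructor args, then members.
        args, members = (load(child) for child in elem)
        if args:
            obj = cls(*args)
        else:
            obj = cls.__new__(cls)  # empty tuple: constructor is not called
        obj.__dict__.update(members)
        return obj
    raise ValueError("unsupported element: %s" % tag)


SAMPLE = (
    '<object module="object_classes" class="Point">'
    '<tuple/>'
    '<dictionary>'
    '<string>x</string><int>3</int>'
    '<string>y</string><float>1.5</float>'
    '</dictionary>'
    '</object>'
)
point = load(ET.fromstring(SAMPLE))
```

    Because the tuple is empty, Point.__init__ is never run here; the members x and y come entirely from the dictionary element, which matches the behaviour described in the bullets above.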

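    To create a list of objects, generate something like the following:

        <string>member_variable_name</string>
        <list>
            <object ...> ... </object>
            <object ...> ... </object>
            ...
        </list>

    Or:

        <string>member_variable_name</string>
        <list>
            <string>value_1</string>
            <string>value_2</string>
            ...
        </list>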

    Here are a few things to notice about this generated XML:

    • If this list is to be the value of a member variable of a class, generate this code within the dictionary that defines the member variables of an instance of a class.

    • The list will become the value of the member member_variable_name.

    To create a string value, generate the following:

        <string>value_1</string>

    To create an integer value, generate the following:

        <int>101</int>

    To create a float value, generate the following:

        <float>1.23</float>

    You can use class Marshaller in generic.py to determine the format of other data types. The following code will print a sample of the input to the Unmarshaller:

        import generic

        m = generic.Marshaller()
        ds1 = ([11,22], 333, 'bbb')
        s1 = m.dumps(ds1)
        print s1
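    The snippet above needs PyXML and Python 2. To get a feel for the shape of the output without either, a small stand-in can be written in a few lines. This sketch is not generic.Marshaller: the element names and the marshal root element are assumptions, and details that the real marshaller may emit (such as id attributes) are omitted.

```python
from xml.sax.saxutils import escape


def dumps_value(value):
    """Serialize a small subset of Python values to the assumed XML form."""
    if isinstance(value, bool):
        # bool is a subclass of int; rejected to keep the sketch small.
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return "<int>%d</int>" % value
    if isinstance(value, float):
        return "<float>%r</float>" % value
    if isinstance(value, str):
        return "<string>%s</string>" % escape(value)
    if isinstance(value, (list, tuple)):
        tag = "list" if isinstance(value, list) else "tuple"
        body = "".join(dumps_value(item) for item in value)
        return "<%s>%s</%s>" % (tag, body, tag)
    raise TypeError("unsupported type: %s" % type(value).__name__)


def dumps(value):
    """Wrap the serialized value in the assumed root element."""
    return "<marshal>%s</marshal>" % dumps_value(value)


ds1 = ([11, 22], 333, 'bbb')
s1 = dumps(ds1)
print(s1)
```

    No whitespace is written between elements, in keeping with the earlier note that the Unmarshaller is sensitive to whitespace.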


    Commonly Used Templates -- A Template Cookbook

    This section presents some (skeletons of) templates that produce commonly needed XML elements, for input to class generic.Unmarshaller. It can be viewed as a cookbook for creating XSLT templates to perform common data structure loading tasks.

    Create an object

    To create an instance of a class from the current element, create a template rule similar to the

    following:

    [XSLT template rule not preserved; surviving text: class_name, object_classes, member_x, sub_object_list]

    Where:

    • object_element_name is the name of the element.

    • object_class_name is the name of the class. An instance of this class will be created from the element.

    • object_classes is the name of the module in which the class is defined. Create a .py file with this name containing the class definition.

    • Additional notes:
        o This example creates a member variable named member_x with a string value from the attribute named attribute_x.
        o This example creates a list of sub-objects and assigns it to member variable sub_object_list.
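    The effect of such a template rule can be illustrated without an XSLT processor. The sketch below performs a comparable mapping in Python with xml.etree.ElementTree: it turns one source element into the object form and copies an attribute into a string member. It shows the target output only and is not the stylesheet itself. The output element names (object, tuple, dictionary, string) are assumptions drawn from the description of the canonical form, and the function name element_to_object and the source element are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_to_object(src, class_name, module_name, attr_to_member):
    """Build an <object> element from src.

    attr_to_member maps source attribute names to member variable names.
    """
    obj = ET.Element("object", {"module": module_name, "class": class_name})
    ET.SubElement(obj, "tuple")  # empty tuple: no constructor arguments
    members = ET.SubElement(obj, "dictionary")
    for attr_name, member_name in attr_to_member.items():
        # Member name first, then its value, as described for the dictionary.
        ET.SubElement(members, "string").text = member_name
        ET.SubElement(members, "string").text = src.get(attr_name, "")
    return obj


# Hypothetical source element.
src = ET.fromstring('<object_element_name attribute_x="hello"/>')
out = element_to_object(
    src, "object_class_name", "object_classes", {"attribute_x": "member_x"}
)
result = ET.tostring(out, encoding="unicode")
```

    In the XSLT version the same choices (class name, module name, which attribute feeds which member) would be written into the template rule instead of being passed as arguments.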

    Add a string member data item

    To add a member variable to the current object with a simple string value that comes from an attribute of the current element, do the following:

    Add the following snippet to the current template:

    [XSLT snippet not preserved; surviving text: member_variable_name]

    Where:

    • member_variable_name is the name of the member data item to be added to the current instance.

    • attribute_name is the name of the attribute that provides the value.

    To add a member variable to the current object with a simple string value that comes from the text (node) in the current element, do the following:

    Add the following snippet to the current template:

    [XSLT snippet not preserved; surviving text: member_variable_name]

    Where:

    • member_variable_name is the name of the member data item to be added to the current instance.

    Create a list of objects

    To create a list of objects from a nested list of elements, do the following:

    Step 1. Add the following snippet to the parent template:

    [XSLT snippet not preserved; surviving text: object_list]

    Where:

    • object_list is the name of the member variable to be added to the parent instance.

    • object_element_name is the element/tag of the sub-elements. One object will be created and added to the list (object_list) for each sub-element of this name.

    Step 2. Add a template rule for the sub-element:

    [XSLT template rule not preserved; surviving text: class_name, object_classes, x, object_list]

    Where:

    • object_element_name is the name of the element.

    • object_class_name is the name of the class. An instance of this class will be created from the element.

    • object_classes is the name of the module in which the class is defined. Create a .py file with this name containing the class definition.
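    The two steps can be mirrored in Python to show the output they are meant to produce. This is a sketch of the target structure, not the XSLT itself. The output element names (object, tuple, dictionary, string, list) are assumptions based on the canonical form described earlier, the function names are hypothetical, and taking the value of member x from the sub-element's text is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET


def make_object(src, class_name, module_name):
    """Step 2 equivalent: convert one sub-element into an <object>."""
    obj = ET.Element("object", {"module": module_name, "class": class_name})
    ET.SubElement(obj, "tuple")
    members = ET.SubElement(obj, "dictionary")
    ET.SubElement(members, "string").text = "x"
    # Assumption: member x is filled from the sub-element's text content.
    ET.SubElement(members, "string").text = (src.text or "").strip()
    return obj


def add_object_list(parent_src, members, member_name, child_tag,
                    class_name, module_name):
    """Step 1 equivalent: add the member name, then a list holding one
    object per sub-element named child_tag."""
    ET.SubElement(members, "string").text = member_name
    lst = ET.SubElement(members, "list")
    for child in parent_src.findall(child_tag):
        lst.append(make_object(child, class_name, module_name))
    return lst


# Hypothetical parent element with two sub-elements.
parent_src = ET.fromstring(
    "<parent>"
    "<object_element_name>a</object_element_name>"
    "<object_element_name>b</object_element_name>"
    "</parent>"
)
members = ET.Element("dictionary")  # the parent object's member dictionary
lst = add_object_list(
    parent_src, members, "object_list", "object_element_name",
    "object_class_name", "object_classes",
)
```

    After the call, members holds the string object_list followed by a list element with one object per sub-element, which is the name/value pairing the dictionary format expects.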

    2.1 Causal mapping

    Causal mapping is one of the most commonly used cognitive mapping techniques for investigating the cognition of decision makers in organizations (Swan, 1997). Causal mapping is derived from personal construct theory (Kelly, 1955). This theory posits that an individual's set of perspectives is a system of personal constructs, and that individuals use their own personal constructs to understand and interpret events. In other words, an individual understands the environment through salient concepts (constructs), which can be expressed by either simple single-polar phrases or contextually rich bipolar phrases. An example of a single-polar phrase is "good reader", while an example of a bipolar phrase is "good computer skills - poor computer skills". As revealed by its name, a causal map represents a set of causal relationships among constructs within a belief system. Capturing these cause-effect relationships yields insight into the reasoning of a particular person.

    Semantic mapping

    It must be pointed out that causal assertions are only part of an individual's total belief system. There are some cognitive mapping techniques that can be used to identify other relations among concepts. Semantic mapping, also known as idea mapping, is used to explore an idea without the constraints of a superimposed structure (Buzan, 1993). To make a semantic map, one starts at the center of the paper with the main idea and works outwards in all directions, producing a growing and organized structure composed of key words and key images. Around the main idea (a central word), five to ten ideas (child words) that are related to the central word are drawn. Each of these "child" words then serves as a sub-central word for the next level of the drawing (Buzan, 1993). In other words, a semantic map has one main or central concept with tree-like branches. Figure 2 is an example of a semantic map that depicts related words around the main idea "UML".