extracting typed values from xml data fabio simeoni david lievens paolo manghi steve neely richard...

16
Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

Upload: mitchell-terry

Post on 17-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

Extracting Typed Values from XML Data

Fabio Simeoni David Lievens Paolo Manghi

Steve Neely Richard Connor

Page 2: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 2

Outline

XML and Language Values

Programming Models over XMLLow-Level Bindings

Architecture

The SNAQue Prototype

Related Work

Conclusions

Language Bindings

A High-Level Binding

Page 3: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 3

XML and Language Values

Language values are increasingly generated and manipulated outside the language

jurisdiction, and occur instead as fragments of XML documents

irregular structure

unstable structure

regular structure

self-describing

Semistructured Documents

Portable Documents

Page 4: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 4

XML and Language Values

<staff><member code = “123517”>

<name>Richard Connor</name></member><member code = “123345”>

<name>Steve Neely</name></member><member code = “175417”>

<name>Fabio Simeoni</name></member>

</staff>

d’

Portable Document

Semistructured Document

<staff><member code = “123517”><name>Richard Connor</name>

<home>www.cis.strath...</home></member><member code = “123345”>

<name>Steve Neely</name><ext>4565</ext><project>

<name>SNAQue</name></project>

</member><member code = “175417”>

<ext>4566</ext><name>Fabio Simeoni</name>

</member></staff>

d

Page 5: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 5

staff

Class Staff {Private Member[] members;Member[] getMembers() {...}void setMembers(Member[] m)

{...}...}

Class Member {private String name;private int code;String getName() {...}void setName (String n) {...}int getCode() {...}void setCode (int c) {...}...}

XML and Language Values

<staff><member code = “123517”>

<name>Richard Connor</name></member><member code = “123345”>

<name>Steve Neely</name></member><member code = “175417”>

<name>Fabio Simeoni</name></member>

</staff>

d’

Page 6: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 6

Programming over d’ should be as:

as programming over staff with existing programming languages.

A Computational Requirement

simple

safe

efficient

Page 7: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 7

Programming Models over XML

Dedicated QLs (XQL, XML-QL, Lorel, etc.)

Research Models(XDuce,Tequila, etc.)

Language Bindings (DOM, SAX) ...

computationally incompletenot well suited to complex tasksessentially untyped

graph types vs. language typessteep learning curve/lack of tool support

Page 8: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 8

SAX & DOM

Data Structure:

Programming Model:

SAX: a string served as a temporal sequence of tokens

DOM:a tree

SAX: events and call-backs

DOM: tree navigation

Page 9: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 9

DOM supports the logical structure of the data.

SAX supports the syntactic structure of the data. Think as a Parser !

Data structures do not support data semantics (staff members are neither strings nor tree nodes)

Think as a Botanist !

Good for syntactic (SAX) and structural (DOM) manipulations, not domain-specific tasks.

SAX & DOM Limitations

Page 10: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 10

Class SaxTask extends DefaultHandler {private String name, code;private boolean inProject;

private CharArrayWriter buffer = new CharArrayWriter(); 

public void characters (char[ ] ch, int start, int length) {buffer.write(ch,start,length);} 

public void startElement (String uri, String name, String qName, Attributes atts) {if (name.equals("member")) code=atts.getValue("code");

if (name.equals("project")) inProject=true;buffer.reset();}

public void endElement (String uri, String name, String qName){ if (name.equals("project")) inProject=false;

if (name.equals("name") && !inProject) name=buffer.toString().trim()

… do something with name and code…}}

SAX Programming

Page 11: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 11

String name=null;int code;

Element staff = d.getDocumentElement();NodeList members =staff.getElementsByTagName("member");int membCount = members.getLength();

for (int i=0;i<membCount;i++) {Element member = (Element) members.item(i);code = Integer.parseInt(member.getAttribute("code"));NodeList children = member.getChildNodes();int length = children.getLength();

for (int j=0;j<length;j++) {Node child = children.item(j);if (child.getNodeType()==Node.ELEMENT_NODE) {

String tagName = ((Element) child).getTagName();if (tagName.equals("name")) name=

((character data) child.getFirstChild()).getData();}}… do something with name and code…}

DOM Programming

Page 12: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 12

Code is convoluted even for simple tasks

How easy is to write, maintain, and evolve the code? Programming is tedious, redundant and error-prone

How safe is the code?

Nodelist members = staff.getElementsByTagName(“mebmer”);

Safety is responsibility of the programmer

Programmatic checks worsen code readability

Programmatic checks are simply not enough...

SAX & DOM Programming

Page 13: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 13

Compare with: Member[] members = staff.getMembers();

for (int i=0;i<members.length;i++) { code = members[I].code; name=members[i].name;{...do something with name and code...}

Simple because the programming algebra isdomain-specific

Safe and efficient because static knowledge is domain-specific

Think Straight !

SAX & DOM Programming

Page 14: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 14

High-Level Bindings

Conclusion:

DOM & SAX are low-level bindings

We need high-level bindings that automate the conversion of XML fragments into their

counterpart within the language

d’

staff

Page 15: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 15

Programming Language

A High-Level Binding

Types Staff and

Member

staff

ExtractionMechanism

d’..

d

Binding as type projection

Page 16: Extracting Typed Values from XML Data Fabio Simeoni David Lievens Paolo Manghi Steve Neely Richard Connor

10/14/2001 OOPSLA 2001 Workshop 16