xml and databases tutorial session 1: dom

51

Upload: others

Post on 12-Feb-2022

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: XML and Databases Tutorial session 1: DOM

XML and Databases

Tutorial session 1: DOM

[email protected]

Week 2

1 / 20

Page 2: XML and Databases Tutorial session 1: DOM

XML

An XML document is basically a tree-like structure:

<adressbook><con ta c t cat="Work">

<name>< f i r s t>Seba s t i a n</ f i r s t><l a s t>Maneth</ l a s t>

</name><ema i l>[email protected]</ ema i l>

</ con ta c t><con ta c t ca t="Family">

<name>< f i r s t>Yoh</ f i r s t>

<l a s t>Nguyễn</ l a s t></name><add r e s s>4 , foo St , London , UK</ add r e s s>

</ con ta c t></adressbook>

2 / 20

Page 3: XML and Databases Tutorial session 1: DOM

XML

An XML document is basically a tree-like structure:

<adressbook><con ta c t cat="Work">

<name>< f i r s t>Seba s t i a n</ f i r s t><l a s t>Maneth</ l a s t>

</name><ema i l>[email protected]</ ema i l>

</ con ta c t><con ta c t ca t="Family">

<name>< f i r s t>Yoh</ f i r s t>

<l a s t>Nguyễn</ l a s t></name><add r e s s>4 , foo St , London , UK</ add r e s s>

</ con ta c t></adressbook>

2 / 20

Page 4: XML and Databases Tutorial session 1: DOM

XML

An XML document is basically a tree-like structure:

<adressbook><con ta c t cat="Work">

<name>< f i r s t>Seba s t i a n</ f i r s t><l a s t>Maneth</ l a s t>

</name><ema i l>[email protected]</ ema i l>

</ con ta c t><con ta c t ca t="Family">

<name>< f i r s t>Yoh</ f i r s t>

<l a s t>Nguyễn</ l a s t></name><add r e s s>4 , foo St , London , UK</ add r e s s>

</ con ta c t></adressbook>

2 / 20

Page 5: XML and Databases Tutorial session 1: DOM

XML

An XML document is basically a tree-like structure:

<adressbook><con tac t cat="Work">

<name>< f i r s t>Seba s t i a n</ f i r s t><l a s t>Maneth</ l a s t>

</name><ema i l>[email protected]</ ema i l>

</ con ta c t><con ta c t ca t="Family">

<name>< f i r s t>Yoh</ f i r s t>

<l a s t>Nguyễn</ l a s t></name><add r e s s>4 , foo St , London , UK</ add r e s s>

</ con ta c t></adressbook>

2 / 20

Page 6: XML and Databases Tutorial session 1: DOM

XML

An XML document is basically a tree-like structure:

<adressbook><con ta c t cat="Work">

<name>< f i r s t>Seba s t i a n</ f i r s t><l a s t>Maneth</ l a s t>

</name><ema i l>[email protected]</ ema i l>

</ con ta c t><con ta c t ca t="Family">

<name>< f i r s t>Yoh</ f i r s t>

<l a s t>Nguyễn</ l a s t></name><add r e s s>4 , foo St , London , UK</ add r e s s>

</ con ta c t></adressbook>

2 / 20

Page 7: XML and Databases Tutorial session 1: DOM

XML

An XML document is basically a tree-like structure:

<adressbook><con ta c t cat="Work">

<name>< f i r s t>Seba s t i a n</ f i r s t><l a s t>Maneth</ l a s t>

</name><ema i l>[email protected]</ ema i l>

</ con ta c t><con ta c t ca t="Family">

<name>< f i r s t>Yoh</ f i r s t>

<l a s t>Nguyễn</ l a s t></name><add r e s s>4 , foo St , London , UK</ add r e s s>

</ con ta c t></adressbook>

2 / 20

Page 8: XML and Databases Tutorial session 1: DOM

Manipulating XML Documents within a program

I �By hand� (reading the document as a pure text-�le)

⇒ Error-prone and too much to take into account

I Using DOM: mapping the tree sturcture on a Object Hierarchy

⇒ Today's lecture and 1st assignment

I Using SAX: event-based parsing (next week)

3 / 20

Page 9: XML and Databases Tutorial session 1: DOM

Manipulating XML Documents within a program

I �By hand� (reading the document as a pure text-�le)

⇒ Error-prone and too much to take into account

I Using DOM: mapping the tree sturcture on a Object Hierarchy

⇒ Today's lecture and 1st assignment

I Using SAX: event-based parsing (next week)

3 / 20

Page 10: XML and Databases Tutorial session 1: DOM

Manipulating XML Documents within a program

I �By hand� (reading the document as a pure text-�le)

⇒ Error-prone and too much to take into account

I Using DOM: mapping the tree sturcture on a Object Hierarchy

⇒ Today's lecture and 1st assignment

I Using SAX: event-based parsing (next week)

3 / 20

Page 11: XML and Databases Tutorial session 1: DOM

Manipulating XML Documents within a program

I �By hand� (reading the document as a pure text-�le)

⇒ Error-prone and too much to take into account

I Using DOM: mapping the tree sturcture on a Object Hierarchy

⇒ Today's lecture and 1st assignment

I Using SAX: event-based parsing (next week)

3 / 20

Page 12: XML and Databases Tutorial session 1: DOM

Manipulating XML Documents within a program

I �By hand� (reading the document as a pure text-�le)

⇒ Error-prone and too much to take into account

I Using DOM: mapping the tree sturcture on a Object Hierarchy

⇒ Today's lecture and 1st assignment

I Using SAX: event-based parsing (next week)

3 / 20

Page 13: XML and Databases Tutorial session 1: DOM

DOM Concepts

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/

Node : A document is seen as a set of nodes.

⇒ Abstract concept, mapped on an abstract class

⇒ Derived to get many kinds of nodes: ElementNode,TextNode, AttributeNode, . . .

NodeList : to return sets of Nodes

Parser : to return a DocumentNode from a �lename or URI

. . .

4 / 20

Page 14: XML and Databases Tutorial session 1: DOM

DOM Concepts

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/

Node : A document is seen as a set of nodes.

⇒ Abstract concept, mapped on an abstract class⇒ Derived to get many kinds of nodes: ElementNode,

TextNode, AttributeNode, . . .

NodeList : to return sets of Nodes

Parser : to return a DocumentNode from a �lename or URI

. . .

4 / 20

Page 15: XML and Databases Tutorial session 1: DOM

DOM Concepts

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/

Node : A document is seen as a set of nodes.

⇒ Abstract concept, mapped on an abstract class⇒ Derived to get many kinds of nodes: ElementNode,

TextNode, AttributeNode, . . .

NodeList : to return sets of Nodes

Parser : to return a DocumentNode from a �lename or URI

. . .

4 / 20

Page 16: XML and Databases Tutorial session 1: DOM

DOM Concepts

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/

Node : A document is seen as a set of nodes.

⇒ Abstract concept, mapped on an abstract class⇒ Derived to get many kinds of nodes: ElementNode,

TextNode, AttributeNode, . . .

NodeList : to return sets of Nodes

Parser : to return a DocumentNode from a �lename or URI

. . .

4 / 20

Page 17: XML and Databases Tutorial session 1: DOM

DOM Concepts

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/

Node : A document is seen as a set of nodes.

⇒ Abstract concept, mapped on an abstract class⇒ Derived to get many kinds of nodes: ElementNode,

TextNode, AttributeNode, . . .

NodeList : to return sets of Nodes

Parser : to return a DocumentNode from a �lename or URI

. . .

4 / 20

Page 18: XML and Databases Tutorial session 1: DOM

Xerces: a DOM implementation

I Open source DOM implementation from the Apache foundation

I Two �avors : Xerces-J (Java), Xerces-C (C++)

I The one used for your assignments

Today: Focus on simple document loading with Xerces and documenttraversal

5 / 20

Page 19: XML and Databases Tutorial session 1: DOM

Xerces: a DOM implementation

I Open source DOM implementation from the Apache foundation

I Two �avors : Xerces-J (Java), Xerces-C (C++)

I The one used for your assignments

Today: Focus on simple document loading with Xerces and documenttraversal

5 / 20

Page 20: XML and Databases Tutorial session 1: DOM

Xerces: a DOM implementation

I Open source DOM implementation from the Apache foundation

I Two �avors : Xerces-J (Java), Xerces-C (C++)

I The one used for your assignments

Today: Focus on simple document loading with Xerces and documenttraversal

5 / 20

Page 21: XML and Databases Tutorial session 1: DOM

Xerces: a DOM implementation

I Open source DOM implementation from the Apache foundation

I Two �avors : Xerces-J (Java), Xerces-C (C++)

I The one used for your assignments

Today: Focus on simple document loading with Xerces and documenttraversal

5 / 20

Page 22: XML and Databases Tutorial session 1: DOM

Xerces-J: Java packages

Program compilation:javac -cp /usr/share/java/xercesImpl.jar MyClass.java

Java package inclusions:

// In xercesImpl . jar , this is the actual Xerces implementation

import org.apache.xerces.parsers.DOMParser;// Abstract DOM classes , implemented by Xerces

import org.w3c.dom.* ;

// Various java utilities

import java. util .∗;import java.io .∗;

6 / 20

Page 23: XML and Databases Tutorial session 1: DOM

Xerces-J: Java packages

Program compilation:javac -cp /usr/share/java/xercesImpl.jar MyClass.java

Java package inclusions:

// In xercesImpl . jar , this is the actual Xerces implementation

import org.apache.xerces.parsers.DOMParser;// Abstract DOM classes , implemented by Xerces

import org.w3c.dom.* ;

// Various java utilities

import java. util .∗;import java.io .∗;

6 / 20

Page 24: XML and Databases Tutorial session 1: DOM

Xerces-J: Getting a Parser

public class DOMExample {DOMParser myparser;

DOMExample() {try {

myparser = new DOMParser();

} catch(Exception e) {System.out. println ("Problem during initialization " + e);

}};

...

7 / 20

Page 25: XML and Databases Tutorial session 1: DOM

Xerces-J: Loading a document

...Document loadDocument(string �lename){

try {

// �rst parse the document

myparser.parse(�lename);// then retrieve it

return myparser.getDocument();

catch (Exception e) {System.out.println("could not parse document")

[do something else here to �nish properly]}

}

8 / 20

Page 26: XML and Databases Tutorial session 1: DOM

Xerces-J: Loading a document

...Document loadDocument(string �lename){

try {// �rst parse the document

myparser.parse(�lename);// then retrieve it

return myparser.getDocument();

catch (Exception e) {System.out.println("could not parse document")

[do something else here to �nish properly]}

}

8 / 20

Page 27: XML and Databases Tutorial session 1: DOM

Xerces-J: Iterating the XML document (1/2)

void traverse (Node n){

switch (n.getNodeType()){

case Node.DOCUMENT_NODE:. . .break;

case Node.ELEMENT_NODE:. . .break;

case Node.TEXT_NODE:

break;...

9 / 20

Page 28: XML and Databases Tutorial session 1: DOM

Xerces-J: Iterating the XML document (1/2)

void traverse (Node n){

switch (n.getNodeType()){case Node.DOCUMENT_NODE:

. . .break;

case Node.ELEMENT_NODE:. . .break;

case Node.TEXT_NODE:

break;...

9 / 20

Page 29: XML and Databases Tutorial session 1: DOM

Xerces-J: Iterating the XML document (1/2)

void traverse (Node n){

switch (n.getNodeType()){case Node.DOCUMENT_NODE:

. . .break;

case Node.ELEMENT_NODE:. . .break;

case Node.TEXT_NODE:

break;...

9 / 20

Page 30: XML and Databases Tutorial session 1: DOM

Xerces-J: Iterating the XML document (2/2)

Node child ;child = n.getFirstChild();

while ( child != null){// recursive call

traverse ( child );

child = child.getNextSibling();};

}

10 / 20

Page 31: XML and Databases Tutorial session 1: DOM

Xerces-J: Invocation

public static void main(string [] args){...

DOMExample e = new DOMExample();Document d = e.loadDocument("�lename.xml");e. traverse (d);...

}

11 / 20

Page 32: XML and Databases Tutorial session 1: DOM

Xerces-J: Summary

I call the parse() method of a DOMParser to load the document

I a Document is a Node. In particular, its a Node for which:getNodeType() returns Node.DOCUMENT_ELEMENT

I tons of method in the Node class: getNodeType(),getFirstChild(), getNextSibling() and many others likegetNodeValue(), getAttributes(),. . .

I consult the javadoc here for the full description:http://java.sun.com/j2se/1.5.0/docs/api/index.html

browse the org.w3c.dom.* package.

12 / 20

Page 33: XML and Databases Tutorial session 1: DOM

Xerces-J: Summary

I call the parse() method of a DOMParser to load the document

I a Document is a Node. In particular, its a Node for which:getNodeType() returns Node.DOCUMENT_ELEMENT

I tons of method in the Node class: getNodeType(),getFirstChild(), getNextSibling() and many others likegetNodeValue(), getAttributes(),. . .

I consult the javadoc here for the full description:http://java.sun.com/j2se/1.5.0/docs/api/index.html

browse the org.w3c.dom.* package.

12 / 20

Page 34: XML and Databases Tutorial session 1: DOM

Xerces-J: Summary

I call the parse() method of a DOMParser to load the document

I a Document is a Node. In particular, its a Node for which:getNodeType() returns Node.DOCUMENT_ELEMENT

I tons of method in the Node class: getNodeType(),getFirstChild(), getNextSibling() and many others likegetNodeValue(), getAttributes(),. . .

I consult the javadoc here for the full description:http://java.sun.com/j2se/1.5.0/docs/api/index.html

browse the org.w3c.dom.* package.

12 / 20

Page 35: XML and Databases Tutorial session 1: DOM

Xerces-J: Summary

I call the parse() method of a DOMParser to load the document

I a Document is a Node. In particular, its a Node for which:getNodeType() returns Node.DOCUMENT_ELEMENT

I tons of method in the Node class: getNodeType(),getFirstChild(), getNextSibling() and many others likegetNodeValue(), getAttributes(),. . .

I consult the javadoc here for the full description:http://java.sun.com/j2se/1.5.0/docs/api/index.html

browse the org.w3c.dom.* package.

12 / 20

Page 36: XML and Databases Tutorial session 1: DOM

Xerces-C: C++ headers inclusion and linking

Program compilation:g++ -o myprog -lxerces-c myClass.cpp

C++ headers inclusion:

//DOM parser and xerces utilities

#include <xercesc/parsers/XercesDOMParser.hpp>#include <xercesc/util/PlatformUtils.hpp>

//C++ utilities

#include <string>#include <stdexcept>

// namespaces opening

using namespace xercesc;using namespace std;

13 / 20

Page 37: XML and Databases Tutorial session 1: DOM

Xerces-C: C++ headers inclusion and linking

Program compilation:g++ -o myprog -lxerces-c myClass.cpp

C++ headers inclusion:

//DOM parser and xerces utilities

#include <xercesc/parsers/XercesDOMParser.hpp>#include <xercesc/util/PlatformUtils.hpp>

//C++ utilities

#include <string>#include <stdexcept>

// namespaces opening

using namespace xercesc;using namespace std;

13 / 20

Page 38: XML and Databases Tutorial session 1: DOM

Xerces-C: Getting a Parser

class DOMExample {private:DOMParser * myparser;

public:DOMExample() {try {

XMLPlatformUtils::Initialize();myparser = new XercesDOMParser();

} catch(Exception &e) {cout << "Problem during initialization " << e.what()

<< endl;}

};

...14 / 20

Page 39: XML and Databases Tutorial session 1: DOM

Xerces-C: Cleaning

~DOMExample() {myparser->release();XMLPlatformUtils::Terminate();

};

...

15 / 20

Page 40: XML and Databases Tutorial session 1: DOM

Xerces-C: Loading a document

...DOMDocument loadDocument(char ∗ �lename){

try {

XMLCh* str = XMLString::translate(�lename);

myparser->parse(�lename);DOMDocument d = myparser->getDocument();XMLString:: release(&str );return d;

}catch (Exception &e) {cout << "could not parse document";

[do something else here to �nish properly]}

}16 / 20

Page 41: XML and Databases Tutorial session 1: DOM

Xerces-C: Loading a document

...DOMDocument loadDocument(char ∗ �lename){

try {XMLCh* str = XMLString::translate(�lename);

myparser->parse(�lename);DOMDocument d = myparser->getDocument();XMLString:: release(&str );return d;

}catch (Exception &e) {cout << "could not parse document";

[do something else here to �nish properly]}

}16 / 20

Page 42: XML and Databases Tutorial session 1: DOM

Xerces-C: Iterating the XML document (1/2)

void traverse (DOMNode n){

switch (n->getNodeType()){

case DOMNode::DOCUMENT_NODE:. . .break;

case DOMNode::ELEMENT_NODE:. . .break;

case DOMNode::TEXT_NODE:

break;...

17 / 20

Page 43: XML and Databases Tutorial session 1: DOM

Xerces-C: Iterating the XML document (1/2)

void traverse (DOMNode n){

switch (n->getNodeType()){case DOMNode::DOCUMENT_NODE:

. . .break;

case DOMNode::ELEMENT_NODE:. . .break;

case DOMNode::TEXT_NODE:

break;...

17 / 20

Page 44: XML and Databases Tutorial session 1: DOM

Xerces-C: Iterating the XML document (1/2)

void traverse (DOMNode n){

switch (n->getNodeType()){case DOMNode::DOCUMENT_NODE:

. . .break;

case DOMNode::ELEMENT_NODE:. . .break;

case DOMNode::TEXT_NODE:

break;...

17 / 20

Page 45: XML and Databases Tutorial session 1: DOM

Xerces-C: Iterating the XML document (2/2)

DOMNode child;child = n->getFirstChild();

while ( child != null ){// recursive call

traverse ( child );

child = child->getNextSibling();};

}

18 / 20

Page 46: XML and Databases Tutorial session 1: DOM

Xerces-C: Invocation

}; // closing the DOMExample class

int main(int argc , char ∗∗ argv){...

// this calls the constructor

DOMExample e;DOMDocument d = e.loadDocument(argv[1]);e. traverse (d);...

}

19 / 20

Page 47: XML and Databases Tutorial session 1: DOM

Xerces-C: Summary

Xerces-C is similar to Xerces-J with the following di�erences:

I Need to manually call XMLPlatformUtils::Initialize() andXMLPlatformUtils::Terminate()

I Xerces-C uses exclusively XMLCh * as string. UseXMLString::translate() to convert back and forth to char *and XMLString::release() to free, don't use delete directly!

I everything is passed by reference (e.g. n->getNodeType()) butnever free a structure directly. Use release() methods whenavailable, otherwise the pointer doesn't need to be freed.

I The API documentation is here:http://xerces.apache.org/xerces-c/apiDocs-2/

20 / 20

Page 48: XML and Databases Tutorial session 1: DOM

Xerces-C: Summary

Xerces-C is similar to Xerces-J with the following di�erences:

I Need to manually call XMLPlatformUtils::Initialize() andXMLPlatformUtils::Terminate()

I Xerces-C uses exclusively XMLCh * as string. UseXMLString::translate() to convert back and forth to char *and XMLString::release() to free, don't use delete directly!

I everything is passed by reference (e.g. n->getNodeType()) butnever free a structure directly. Use release() methods whenavailable, otherwise the pointer doesn't need to be freed.

I The API documentation is here:http://xerces.apache.org/xerces-c/apiDocs-2/

20 / 20

Page 49: XML and Databases Tutorial session 1: DOM

Xerces-C: Summary

Xerces-C is similar to Xerces-J with the following di�erences:

I Need to manually call XMLPlatformUtils::Initialize() andXMLPlatformUtils::Terminate()

I Xerces-C uses exclusively XMLCh * as string. UseXMLString::translate() to convert back and forth to char *and XMLString::release() to free, don't use delete directly!

I everything is passed by reference (e.g. n->getNodeType()) butnever free a structure directly. Use release() methods whenavailable, otherwise the pointer doesn't need to be freed.

I The API documentation is here:http://xerces.apache.org/xerces-c/apiDocs-2/

20 / 20

Page 50: XML and Databases Tutorial session 1: DOM

Xerces-C: Summary

Xerces-C is similar to Xerces-J with the following di�erences:

I Need to manually call XMLPlatformUtils::Initialize() andXMLPlatformUtils::Terminate()

I Xerces-C uses exclusively XMLCh * as string. UseXMLString::translate() to convert back and forth to char *and XMLString::release() to free, don't use delete directly!

I everything is passed by reference (e.g. n->getNodeType()) butnever free a structure directly. Use release() methods whenavailable, otherwise the pointer doesn't need to be freed.

I The API documentation is here:http://xerces.apache.org/xerces-c/apiDocs-2/

20 / 20

Page 51: XML and Databases Tutorial session 1: DOM

Xerces-C: Summary

Xerces-C is similar to Xerces-J with the following di�erences:

I Need to manually call XMLPlatformUtils::Initialize() andXMLPlatformUtils::Terminate()

I Xerces-C uses exclusively XMLCh * as string. UseXMLString::translate() to convert back and forth to char *and XMLString::release() to free, don't use delete directly!

I everything is passed by reference (e.g. n->getNodeType()) butnever free a structure directly. Use release() methods whenavailable, otherwise the pointer doesn't need to be freed.

I The API documentation is here:http://xerces.apache.org/xerces-c/apiDocs-2/

20 / 20