XML and Databases 198:541

Upload: lionel

Post on 06-Jan-2016


TRANSCRIPT

Page 1: XML and Databases

XML and Databases

198:541

Page 2: XML and Databases

XML Motivation

Page 3: XML and Databases

XML Motivation

- Huge amounts of unstructured data on the web: HTML documents
  - No structure information
  - Only format instructions (presentation)
- Integration of data from different sources
  - Structural differences
- Closely related to semistructured data

Page 4: XML and Databases

Semistructured Data

- Integration of heterogeneous sources
- Data sources with non-rigid structures
  - Biological data
  - Web data
- Need for more structural information than plain text, but fewer constraints on structure than in relational data

Page 5: XML and Databases

Characteristics of Semistructured Data

- Missing or additional tuples
- Multiple attributes
- Different types in different objects
- Heterogeneous collection
- Self-describing, irregular data with no a priori structure

Page 6: XML and Databases

HTML Document Example

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999

[Slide callouts label the type of information in each fragment: book, title, authors, year]

Page 7: XML and Databases

The Idea Behind XML

- Easily support information exchange between applications / computers
- Reuse what worked in HTML
  - Human readable
  - Standard
  - Easy to generate and read
- But allow arbitrary markup
- Uniform language for semistructured data
- Data Management

Page 8: XML and Databases

XML

Page 9: XML and Databases

XML

- eXtensible Markup Language
- Universal standard for documents and data
  - Defined by W3C
- Set of emerging technologies
  - XLink, XPointer, XSchema, DOM, SAX, XPath, XQuery, …

Page 10: XML and Databases

XML

- XML gives a syntax, not a semantics
- XML defines the structure of a document, not how it is processed
- Separate structural information from format instructions

Page 11: XML and Databases

XML Example

<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>

Page 12: XML and Databases

XML Terminology

- Tags: book, title, author, …
  - Start tag: <book>
  - End tag: </book>
- Elements are nested
- Empty element
  - <reviews></reviews> => <reviews/>
- XML document: single root element
- XML document is well formed: matching tags

Page 13: XML and Databases

XML Attributes

Attributes are <name, value> pairs that characterize an element.

<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>

Attributes can define object identifiers (oids), but they are just syntax.
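As a rough illustration of how an application sees attributes, the sketch below parses a shortened copy of the book element with Python's standard `xml.etree.ElementTree` module. The sample string is an assumption modeled on the slide (the elided part is simply left out), not part of the original deck.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's example; the elided "…" part is omitted.
doc = """
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
"""

book = ET.fromstring(doc)

# Attributes are exposed as name -> value pairs; values are always strings.
price = book.get("price")
currency = book.get("currency")
title = book.findtext("title").strip()

print(f"{title}: {price} {currency}")
```

Note that the parser attaches no meaning to `price` or `currency`; interpreting "55" as a number is left to the application.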

Page 14: XML and Databases

More XML

- Text can be CDATA or PCDATA
- Entity references: &amp; for &, &gt; for >, …
- Processing instructions: <?blink?>
- Comments: <!-- comment text -->
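The following sketch shows entity references and CDATA sections from the application side, using Python's standard library. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T > IBM"

# escape() replaces &, < and > with the entity references &amp;, &lt;, &gt;.
escaped = escape(raw)

# Element <a> carries escaped text; element <b> carries the same kind of
# characters unescaped inside a CDATA section. The comment is skipped by
# the default parser.
doc = (
    "<note><!-- comment text -->"
    f"<a>{escaped}</a>"
    "<b><![CDATA[x < y && y > z]]></b>"
    "</note>"
)
note = ET.fromstring(doc)

a_text = note.find("a").text  # entity references are resolved on parsing
b_text = note.find("b").text  # CDATA content is returned verbatim

print(escaped)
print(a_text)
print(b_text)
```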

Page 15: XML and Databases

Well Formed XML Documents

- Elements must be properly nested
  - <book><title> Foundations of Databases </title></book>
  - But not: <book><title> Foundations of Databases </book></title>
- There must be a unique root element
- Elements can be of 'element content' or 'mixed content':
  - <title>This is <b>Mixed</b>Content</title>
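The well-formedness rules above can be checked with any XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title> Foundations of Databases </title></book>"
bad_nesting = "<book><title> Foundations of Databases </book></title>"
two_roots = "<book/><book/>"

print(is_well_formed(good))         # properly nested
print(is_well_formed(bad_nesting))  # overlapping tags
print(is_well_formed(two_roots))    # no unique root element

# Mixed content: text before a child is stored in .text,
# text after the child's end tag is stored in the child's .tail.
title = ET.fromstring("<title>This is <b>Mixed</b>Content</title>")
bold = title.find("b")
print(repr(title.text), repr(bold.text), repr(bold.tail))
```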

Page 16: XML and Databases

XML: Potential

- Flexible enough to represent anything
  - Stock market, DNA, music, chemicals
  - Weather information
  - Wireless network configuration
- Enables easy information exchange
  - Between companies
  - Within companies
- Standard: everybody uses the same technology

Page 17: XML and Databases

XML: Limitations

- XML is only a syntax for documents
- We need tools!
  - Editors and parsers
  - Programming APIs (for Java, C++, etc.)
  - Languages to manipulate XML (how many books?)
  - Schemas (What is a book like?)
  - Storage (What if you have a lot of XML?)
  - Transfer protocols (How do you exchange it?)
  - What about XML in Chinese…?
  - How can XML fit into my phone…?
  - Query processing?
  - …

Page 18: XML and Databases

XML Schema Language

Page 19: XML and Databases

DTDs: Document Type Definitions

- Similar to a schema
- Grammar describing constraints on document structure and content
- XML documents can be validated against a DTD

<!ELEMENT Book (title, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name, address, age?)>
<!ATTLIST Book id ID #REQUIRED>
<!ATTLIST Book pub IDREF #IMPLIED>
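To make "a grammar describing constraints" concrete, here is a minimal hand-written sketch that checks the two content models and the required attribute from the DTD above. It is not a DTD validator: the content models are translated by hand into regular expressions over child tag names, ID/IDREF semantics are ignored, and the names `CONTENT_MODELS`, `REQUIRED_ATTRS` and `validate` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DTD content models into regular expressions over
# the sequence of child tag names (each name is followed by one space).
CONTENT_MODELS = {
    "Book": re.compile(r"title (author )*"),        # (title, author*)
    "author": re.compile(r"name address (age )?"),  # (name, address, age?)
}
REQUIRED_ATTRS = {"Book": ["id"]}                    # id ID #REQUIRED

def validate(elem):
    """Return a list of violations found in elem and its descendants."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if not model.fullmatch(children):
            errors.append(f"<{elem.tag}>: unexpected children [{children.strip()}]")
    for attr in REQUIRED_ATTRS.get(elem.tag, []):
        if attr not in elem.attrib:
            errors.append(f"<{elem.tag}>: missing required attribute {attr!r}")
    for child in elem:
        errors.extend(validate(child))
    return errors

valid = ET.fromstring(
    '<Book id="b1"><title>T</title>'
    "<author><name>N</name><address>A</address></author></Book>"
)
invalid = ET.fromstring(
    "<Book><author><name>N</name><address>A</address></author>"
    "<title>T</title></Book>"
)

print(validate(valid))
for problem in validate(invalid):
    print(problem)
```

Elements without an entry in `CONTENT_MODELS` (such as `title`) are not checked in this sketch; a real validating parser would enforce every declaration.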

Page 20: XML and Databases

Shortcomings of DTDs

- Useful for documents, but not so good for data:
  - No support for structural re-use
    - Object-oriented-like structures aren't supported
  - No support for data types
    - Can't do data validation
  - Can have a single key item (ID), but:
    - No support for multi-attribute keys
    - No support for foreign keys (references to other keys)
    - No constraints on IDREFs (e.g., cannot require that an IDREF reference only a Section)

Page 21: XML and Databases

XSchema

- In XML format
- Includes primitive data types (integers, strings, dates, …)
- Supports value-based constraints (integers > 100)
- Inheritance
- Foreign keys
- …

Page 22: XML and Databases

Example of XSchema

<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="author" type="string" />
  <element name="date" type="date" />
  <element name="abstract">
    <type> … </type>
  </element>
  <element name="paper">
    <type>
      <attribute name="keywords" type="string" />
      <element ref="author" minOccurs="0" maxOccurs="*" />
      <element ref="date" />
      <element ref="abstract" minOccurs="0" maxOccurs="1" />
      <element ref="body" />
    </type>
  </element>
</schema>

Page 23: XML and Databases

XML Storage

Page 24: XML and Databases

Storing XML Data

- Different approaches:
  - Storing as text
  - Using RDBMS
  - Using a native system tailored for XML (NATIX, Tamino, Ipedo, etc.)
- Performance of the various approaches depends on your application

Page 25: XML and Databases

Storing XML as Text

- Simple
- Easy to compress
- No updates
- Need to parse the document every time it is needed

Page 26: XML and Databases

Storing XML in RDBMS

- Uses existing RDBMS techniques
- Costly in space, takes time to reconstruct original document
- Example techniques:
  - Schema with 2 relations: tag and value
  - Schema with n relations: 1 per element name
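One possible reading of the two-relation technique is sketched below with Python's built-in `sqlite3` module. The column layout (`parent`, `pos`) and the table name `val` (standing in for the slide's "value" relation) are assumptions made for this illustration; the slide does not specify them.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <year>1995</year>
  </book>
</bibliography>"""

conn = sqlite3.connect(":memory:")
# tag: one row per element, with its parent and its position among siblings.
conn.execute(
    "CREATE TABLE tag (id INTEGER PRIMARY KEY, parent INTEGER, pos INTEGER, name TEXT)"
)
# val: text content of the elements that have some.
conn.execute("CREATE TABLE val (id INTEGER PRIMARY KEY, text TEXT)")

def shred(elem, parent=None, pos=0):
    """Store elem and its subtree as rows of the two relations."""
    cur = conn.execute(
        "INSERT INTO tag (parent, pos, name) VALUES (?, ?, ?)",
        (parent, pos, elem.tag),
    )
    node_id = cur.lastrowid
    text = (elem.text or "").strip()
    if text:
        conn.execute("INSERT INTO val (id, text) VALUES (?, ?)", (node_id, text))
    for i, child in enumerate(elem):
        shred(child, node_id, i)

shred(ET.fromstring(doc))

# The path /bibliography/book/author becomes a self-join on the tag relation.
authors = [
    row[0]
    for row in conn.execute(
        """SELECT v.text
           FROM tag b
           JOIN tag a ON a.parent = b.id
           JOIN val v ON v.id = a.id
           WHERE b.name = 'book' AND a.name = 'author'
           ORDER BY a.pos"""
    )
]
print(authors)
```

Each extra step in a path adds another join, which hints at why reconstructing the original document from such a schema takes time.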

Page 27: XML and Databases

Accessing and Querying XML Data

Page 28: XML and Databases

XML as a Tree: DOM

- DOM = Document Object Model
- Class hierarchy serving as an API to XML trees
- Methods of those classes can be used to manipulate XML (e.g., Node::child, Node::name)
- Can be used from Java, C++ to develop XML applications
- Each node has an identity (i.e., a unique identifier) in the whole document
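The slide mentions Java and C++; as a sketch of the same idea, Python's standard library ships a small DOM implementation, `xml.dom.minidom`. The method names below are those of the Python binding, not the `Node::child` and `Node::name` shorthand used on the slide.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<bibliography><book><title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author></book></bibliography>"
)

root = dom.documentElement   # the <bibliography> element
book = root.firstChild       # first child node: <book>

# Navigate the tree through node objects.
child_names = [node.nodeName for node in book.childNodes]
title_text = book.firstChild.firstChild.data  # text node under <title>

# The same API modifies the tree in place.
year = dom.createElement("year")
year.appendChild(dom.createTextNode("1995"))
book.appendChild(year)

serialized = root.toxml()
print(child_names)
print(title_text)
print(serialized)
```

The whole document is held in memory as a tree, which is convenient for navigation and updates but costly for large inputs.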

Page 29: XML and Databases

XML as a DOM Tree

Class hierarchy (node, element, attribute)

[Figure: DOM tree of the bibliography example]
bibliography
  book
    title: Foundations of Databases
    author: Abiteboul
    author: Hull
    author: Vianu
    publisher: Addison Wesley
    year: 1995
  book

Page 30: XML and Databases

XML as a Stream: SAX

- XML document = event stream. E.g.,
  - Opening tag 'book'
  - Opening tag 'title'
  - Text "Foundations of databases"
  - Closing tag 'title'
  - Opening tag 'author'
  - Etc.
- SAX allows you to associate actions with those events to build applications
- Very efficient since it corresponds to events during parsing, but not always sufficient
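The event stream above can be observed with Python's standard `xml.sax` module. This is a minimal sketch; the handler class `EventLogger` is a hypothetical name, and the "action" attached to each event is simply recording it.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(
    b"<book><title>Foundations of databases</title>"
    b"<author>Abiteboul</author></book>",
    handler,
)

for kind, value in handler.events:
    print(kind, value)
```

Because nothing is kept after an event is handled, memory use stays flat, but queries that need to look back at earlier parts of the document must maintain their own state.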

Page 31: XML and Databases

XPath

- Language for navigating in an XML document (seen as a tree)
- One root node
- Types of nodes: root, element, text, attribute, comment, …
- XPath expression defines navigation in the tree following axes: child, descendant, parent, ancestor, …

Page 32: XML and Databases

XPath: Examples

Find all the titles of all the books:
  //book/title

Find the title of all books written by Charles Dickens:
  //book[author="Charles Dickens"]/title

Find the title of the first section in the second chapter in "Great Expectations":
  //book[title="Great Expectations"]/chapter[2]/section[1]/title

Find the title of all sections that come after the second chapter in "Great Expectations":
  //book[title="Great Expectations"]/chapter[2]/following::section/title
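The first three expressions can be tried with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath (paths must be relative, hence the leading `.`). The `following::` axis of the fourth expression is not part of that subset, so it is left out. The sample document, including its chapter and section titles, is invented for this sketch.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""
<library>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
    <chapter><title>Chapter 1</title>
      <section><title>S1.1</title></section></chapter>
    <chapter><title>Chapter 2</title>
      <section><title>S2.1</title></section>
      <section><title>S2.2</title></section></chapter>
  </book>
  <book>
    <title>Emma</title>
    <author>Jane Austen</author>
  </book>
</library>
""")

# //book/title
all_titles = [t.text for t in library.findall(".//book/title")]

# //book[author="Charles Dickens"]/title
dickens = [t.text for t in library.findall(".//book[author='Charles Dickens']/title")]

# //book[title="Great Expectations"]/chapter[2]/section[1]/title
first_section = [
    t.text
    for t in library.findall(
        ".//book[title='Great Expectations']/chapter[2]/section[1]/title"
    )
]

print(all_titles)
print(dickens)
print(first_section)
```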

Page 33: XML and Databases

Querying XML Data

- Need for a language to query XML data
- Should yield XML output
- Should support standard query operations
- No schema required
- Several proposals for an XML query language: XML-QL, XQuery, …

Page 34: XML and Databases

XQuery

- XPath included in XQuery
- FLWR expressions: for, let, where, return

FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title

Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
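To show what the FLWR expression computes, the sketch below evaluates the same for/where/return logic by hand in Python. It is not an XQuery engine, and the in-memory sample standing in for bib.xml is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for bib.xml (made-up content).
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1995</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>
""")

result = [
    ET.tostring(title, encoding="unicode").strip()  # RETURN $x/title
    for book in bib.findall("book")                 # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995            # WHERE $x/year > 1995
    for title in book.findall("title")
]

for item in result:
    print(item)
```

As in the query, the output consists of XML fragments rather than plain values, so it can be fed into further XML processing.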

Page 35: XML and Databases

How to process XML Queries?

- Use indexes
  - Need to identify nodes
  - Need to know relations between nodes
- Labeling schemes
  - Dewey encoding
  - Prefix-Postfix encoding
- TwigStack
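A labeling scheme assigns each node an identifier from which structural relations can be decided without walking the tree. The sketch below computes Dewey labels and, assuming "prefix-postfix" refers to preorder and postorder ranks, the corresponding rank pair for every element. Function names are hypothetical, and TwigStack itself is not shown.

```python
import xml.etree.ElementTree as ET

def label(root):
    """Return {element: (dewey, pre, post)} for every element in the tree."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(elem, dewey):
        pre = counters["pre"]          # rank in document (preorder) order
        counters["pre"] += 1
        for i, child in enumerate(elem, start=1):
            visit(child, dewey + (i,))  # child label extends the parent's label
        post = counters["post"]        # rank in postorder
        counters["post"] += 1
        labels[elem] = (dewey, pre, post)

    visit(root, (1,))
    return labels

def is_ancestor_dewey(a, d):
    """a is an ancestor of d iff a's label is a proper prefix of d's label."""
    return len(a) < len(d) and d[:len(a)] == a

def is_ancestor_prepost(a, d):
    """a is an ancestor of d iff a starts before d and ends after d."""
    return a[0] < d[0] and a[1] > d[1]

root = ET.fromstring(
    "<bibliography><book><title/><author/></book><book><title/></book></bibliography>"
)
labels = label(root)
book1, book2 = root.findall("book")
author = book1.find("author")
title2 = book2.find("title")

for elem, (dewey, pre, post) in labels.items():
    print(elem.tag, ".".join(map(str, dewey)), pre, post)
```

With either scheme, an ancestor test becomes a comparison of two labels, which is what lets an index answer structural joins without loading the document.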

Page 36: XML and Databases

Web Services

Page 37: XML and Databases

What are Web Services

- Programming interfaces for application-to-application communication on the Web
  - Platform-independent, language-independent, object model-independent
- Possibility to activate methods on remote web servers (RPC)
- 2 main applications
  - E-commerce
  - Access to remote data

Page 38: XML and Databases

XML and Web Services

- Exchange of information between applications is in XML
  - Input and result
  - Use of SOAP to generate messages
- Descriptions of the web service functionality given in XML, according to the WSDL schema
- Web Services standards use XML heavily
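As a rough sketch of what "exchange in XML" looks like, the code below builds and reads back a SOAP 1.1 style envelope with Python's standard library. The operation name `GetAuthors`, its `title` parameter and the namespace `http://example.org/bib` are hypothetical; nothing is sent over the network, and a real service would publish its operations in a WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/bib"                      # hypothetical service namespace

def build_request(title: str) -> str:
    """Wrap a hypothetical GetAuthors call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetAuthors")
    ET.SubElement(call, f"{{{APP_NS}}}title").text = title
    return ET.tostring(envelope, encoding="unicode")

def parse_request(message: str) -> str:
    """Extract the title parameter, as the receiving side would."""
    envelope = ET.fromstring(message)
    path = f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetAuthors/{{{APP_NS}}}title"
    return envelope.findtext(path)

message = build_request("Data on the Web")
print(message)
print(parse_request(message))
```

Both the input and the result of a call travel as XML documents of this shape, which is what makes the exchange independent of platform and programming language.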

Page 39: XML and Databases

Conclusions

- XML: a very active area
  - Many research directions
  - Many applications
- Standards not finalized yet:
  - XQuery
  - XML Schema
  - Web Services
  - …

Page 40: XML and Databases

Some Important XML Standards

- XSL/XSLT: presentation and transformation standards
- RDF: resource description framework (meta-info such as ratings, categorizations, etc.)
- XPath/XPointer/XLink: standard for linking to documents and elements within
- Namespaces: for resolving name clashes
- DOM: Document Object Model for manipulating XML documents
- SAX: Simple API for XML parsing
- …

Page 41: XML and Databases

References

XML
- http://www.w3.org/XML/
- Sudarshan S. Chawathe: Describing and Manipulating XML Data. IEEE Data Engineering Bulletin 22(3) (1999)

XML Standards
- http://www.w3.org/ (XSL, XPath, XSchema, DOM, …)

Storing XML Data
- Daniela Florescu, Donald Kossmann: Storing and Querying XML Data using an RDMBS. IEEE Data Engineering Bulletin 22(3) (1999)
- Hartmut Liefke, Dan Suciu: XMILL: An Efficient Compressor for XML Data. SIGMOD Conference 2000

XQuery
- http://www.w3.org/TR/xquery/
- Peter Fankhauser: XQuery Formal Semantics: State and Challenges. SIGMOD Record 30(3) (2001)

Web Services
- http://www.w3.org/2002/ws/