storing and querying xml documents using relational databases

36
Storing and Querying XML Documents Using Relational Databases Mustafa Atay [email protected] Wayne State University Detroit, MI February 28, 2006

Upload: cadee

Post on 20-Jan-2016

61 views

Category:

Documents


0 download

DESCRIPTION

Storing and Querying XML Documents Using Relational Databases. Mustafa Atay [email protected] Wayne State University Detroit, MI February 28, 2006. Outline of Talk. Schema Mapping Data Mapping Query Mapping Reconstruction Conclusions. What is XML? HTML vs. XML Problem Statement - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Storing and Querying XML Documents Using Relational Databases

Storing and Querying XML Documents Using Relational Databases

Mustafa [email protected]

Wayne State UniversityDetroit, MI

February 28, 2006

Page 2: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 2

Outline of Talk What is XML? HTML vs. XML Problem

Statement Schema-based

Relational Approach

Schema Mapping

Data Mapping Query Mapping Reconstruction Conclusions

Page 3: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 3

What is XML?

eXtensible Markup Language primarily created by Jon Bosak of

Sun Microsystems officially recommended by W3C

(World Wide Web Consortium) since 1998

a simplified form of SGML (Standard Generalized Markup Language)

Page 4: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 4

What is XML? (cont.) a meta language

allows you to create and format your own document markups

separates content from format

a method for putting structured data into a text file; these files are

easy to read unambiguous extensible platform-independent

Page 5: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 5

HTML vs. XML<html> <body>

<table border=1> <tr>

<td><b><font color=red>First Value</td><td><b><font type="Arial" color=blue>Second Value</td><tr><td><b>87</td><td><b>99</td><tr><td><b>45</td><td><b>67</td><tr><td><b>86</td><td><b>84</td></table>

</body></html>

Page 6: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 6

HTML vs. XML (cont.)<?xml version="1.0"?><!DOCTYPE house SYSTEM "house.dtd">

<house year=“1980”> <entrance>front door</entrance> <entrance>back door</entrance> <rooms> <living_room>

<window>double hung1</window> <window>double hung2</window> <leads_to>kitchen</leads_to> <leads_to>hallway</leads_to> </living_room> <kitchen> <window>double hung1</window> <leads_to>living_room</leads_to> </kitchen> </rooms></house>

Page 7: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 7

HTML vs. XML (cont.) HTML- uses tags and attributes

- content and formatting can be placed together

- tags and attributes are pre-determined and rigid

- describes what a document looks like

- doesn’t allow user to define content rules

XML- uses tags and attributes- content and format are

separate; formatting is contained in a stylesheet

- allows user to create his/her own set of tags and attributes

- describes the information in a document

- allows user to define content rules (DTD)

Page 8: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 8

Why Storing and Querying XML?

XML has emerged as the standard for representing and exchanging data on the World Wide Web.

The increasing amount of XML documents requires the need to store and query XML data efficiently.

Page 9: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 9

A Sample XML Dataset

European Bioinformatics Institute Databases ftp://ftp.ebi.ac.uk/pub/databases/interpro/ match.xml ~ 700MB

Page 10: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 10

Approaches of Storing and Querying XML Documents

using Native XML repositories Software AG’s Tamino eXcelon’s XIS

using XML-enabled commercial database systems Oracle XML DB DB2 XML Extender Microsoft SQLXML

using RDBMS/ORDBMS to store and query XML documents (Relational Approach)

Page 11: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 11

Why to store XML in RDBMS? to get advantage of mature RDBMS

technology in efficient storage, indexing and optimization techniques

to enable companies or researchers to store and query XML data using their existing RDBMS system

to enable processing of transformed XML data using both XML and relational queries from a middleware environment

Page 12: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 12

Relational Approach XML-Publishing

XPERANTO - Carey et al., WebDB’00 SilkRoute – M. Fernandez et al., WWW’00

Schema-less approach Edge – D. Florescu et al., IEEE DEB’99 STORED – A. Deutsch et al., SIGMOD’99

Schema-based approach Basic, Shared and Hybrid inlining –

J. Shanmugasundaram et al., VLDB’99 ODTDMap – M. Atay et al., IS’06

Page 13: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 13

Schema-based Relational Approach

Schema Mapping XML data model is mapped into the relational

model Data Mapping

XML documents are shredded and composed into tuples to be inserted into the relational database

Query Mapping XML queries are translated into SQL queries

Reverse Data Mapping (Reconstruction) Original XML document is recovered from the

RDBMS

Page 14: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 14

Schema Mapping

Schema mapping algorithm ODTDMap contains the following steps: Simplifying DTDs Creating and inlining DTD graphs Generating relational schema and the

schema mapping file.

Page 15: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 15

Sample DTD – univ.dtd

<!DOCTYPE univ [ <!ELEMENT univ (colleges, schools?) > <!ATTLIST univ uName CDATA #REQUIRED> <!ELEMENT colleges (college+) > <!ELEMENT college (dep*) > <!ATTLIST college cName CDATA

#REQUIRED> <!ELEMENT schools (school+) > <!ELEMENT school (dep*) > <!ATTLIST school sName CDATA #REQUIRED> <!ELEMENT dep (tel?, fax?, website?) > <!ATTLIST dep dName CDATA #REQUIRED> <!ELEMENT tel (#PCDATA) > <!ELEMENT fax (#PCDATA) > <!ELEMENT website (#PCDATA) >]>

Page 16: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 16

Creating DTD Graph<!DOCTYPE univ [ <!ELEMENT univ (colleges, schools?) > <!ATTLIST univ uName CDATA

#REQUIRED> <!ELEMENT colleges (college+) > <!ELEMENT college (dep*) > <!ATTLIST college cName CDATA

#REQUIRED> <!ELEMENT schools (school+) > <!ELEMENT school (dep*) > <!ATTLIST school sName CDATA

#REQUIRED> <!ELEMENT dep (tel?, fax?, website?) > <!ATTLIST dep dName CDATA #REQUIRED> <!ELEMENT tel (#PCDATA) > <!ELEMENT fax (#PCDATA) > <!ELEMENT website (#PCDATA) >]>

Page 17: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 17

Inlining DTD Graph

Page 18: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 18

Generating Relational Schema

Page 19: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 19

Data Mapping Challenging issues of data mapping

Should respect to schema mapping Varying document structure Scalability

We introduced two efficient linear algorithms OXInsert

main memory data mapping algorithm DOM-based

SDM streaming data mapping algorithm SAX-based

Page 20: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 20

Sample XML document - univ.xml

<?xml version=”1.0” ?><!DOCTYPE univ SYSTEM “univ.dtd”><univ uname=”WSU”> <colleges> <college cname=”Science”> <dep dname=”CS”> <website>www.cs.wayne.edu</website> </dep> </college> <college cname=”Engineering”> <dep dname=”ECE”> <tel>313-5773920</tel> </dep> <dep dname=”IE”></dep> </college> <college cname=”Pharmacy”></college> </colleges></univ>

Page 21: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 21

XMLTree for univ.xml

<?xml version=”1.0” ?><!DOCTYPE univ SYSTEM “univ.dtd”>1 <univ uname=”WSU”>3 <colleges>4 <college cname=”Science”>6 <dep dname=”CS”>8

<website>www.cs.wayne.edu</website> </dep> </college>9 <college cname=”Engineering”>11 <dep dname=”ECE”>13 <tel>313-5773920</tel> </dep>14 <dep dname=”IE”></dep> </college>16 <college cname=”Pharmacy”></college> </colleges> </univ>

Page 22: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 22

XMLTree for univ.xml

<?xml version=”1.0” ?><!DOCTYPE univ SYSTEM “univ.dtd”>1 <univ uname=”WSU”>3 <colleges>4 <college cname=”Science”>6 <dep dname=”CS”>8

<website>www.cs.wayne.edu</website> </dep> </college>9 <college cname=”Engineering”>11 <dep dname=”ECE”>13 <tel>313-5773920</tel> </dep>14 <dep dname=”IE”></dep> </college>15 <college cname=”Pharmacy”></college> </colleges> </univ>

Page 23: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 23

Database state after univ.xml is mapped

Page 24: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 24

Performance of OXInsert and SDM

Data Mapping

0

20

40

60

80

100

10 20 30 40 50

Size (MB)

Tim

e (sec)

DOM

SAX

Page 25: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 25

Data Mapping Across Different Schema Mappings

0

500

1000

1500

2000

2500

3000

3500

25 50 75 100 125

Size (MB)

Tim

e (s

ec)

Basic Shared Hybrid

Page 26: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 26

Query Mapping

We translate simple XPath expressions to SQL

XPath is the core of XML query languages. We identified 3 algorithms for query

mapping Naïve Cluster Containment Join

Page 27: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 27

Naïve Takes an XPath expression creates a nested SQL

query comprised of SQL queries for each XPath step e.g.

XPath: /univ /colleges /college /dep[@dName=‘CS’]

SQL: Select dep.ID from dep where dep.dName=‘CS’ and dep.parentID in

(Select college.ID from college where college.parentID in

(Select colleges.ID from univ where colleges.parentID in

(Select univ.ID from univ where univ.parentID=0) ) )

Page 28: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 28

Cluster

A cluster is a sequence of consecutive elements stored in the same table

Takes an XPath expression and creates a nested SQL query comprised of SQL queries for each XPath cluster

e.g. XPath:

/univ /colleges /college /dep[@dName=‘CS’] SQL:

Select dep.ID from dep where dep.dName=‘CS’ and dep.parentID in

(Select college.ID from college where college.parentID in

(Select colleges.ID from univ) )

Page 29: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 29

Containment Join Relies on the well-formedness of XML

documents Requires the pre-computation of max. ID of

descendants of each element instance (endID) Facilitates efficient evaluation of recursive XML

queries e.g.

XPath: /univ /colleges /college /dep[@dName=‘CS’]

SQL: Select dep.ID from dep, college, univ where dep.dName=‘CS’ and dep.ID>=college.ID and dep.ID<=college.endID and

college.parentID=univ.colleges.ID

Page 30: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 30

A Recursive Query Example

XPath: /univ //dep

Sub queries of the recursive query /univ /colleges /college /dep /univ /schools /school /dep

Naïve: 8 SQL queries + 6 joins + 1 union Cluster: 6 SQL queries + 4 joins + 1

union Containment Join: 1 SQL query + 1 join

Select dep.ID from dep, univ where dep.ID>=univ.ID and dep.ID<=univ.endID

Page 31: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 31

Reconstruction In query mapping stage, the elements selected by an

XML query can be returned in one of the following two modes:

Select mode: returns IDs Reconstruct mode: returns XML subtrees

Algorithm Reconstruct reconstructs the XML subtree rooted at a given element

The importance of Reconstruction lies in two aspects: XML subtree reconstruction has great impact on the

query response time in reconstruct mode. It demonstrates that our mapping scheme is lossless

Page 32: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 32

Conclusions Schema mapping [1,5]

lossless and order preserving processing set-valued XML attributes simple processing of recursion

Data mapping [1,3] We described the first linear-time

schema-based data mapping algorithms We justified their effectiveness on

different schema mapping algorithms

Page 33: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 33

Conclusions (cont.) Query mapping

We identified 3 algorithms Our CJ algorithm outperforms the only

published recursive query mapping algorithm by Krishnamurthy et al., IEEE ICDE’04

Reconstruction [2] We introduced an efficient reconstruction

algorithm It can be used in relational schema-based

mapping unlike its rivals used in XML-publishing

Page 34: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 34

Future Work

Extending the schema mapping to XML Schema

Extending the query mapping to XQuery

Introducing DTD/Schema constraints to the proposed mapping scheme

Incorporating access control methods to the proposed mapping scheme

Page 35: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 35

Acknowledgements

Dr. Shiyong Lu Dr. Farshad Fotouhi Artem Chebotko Dapeng Liu Yezhou Sun

Page 36: Storing and Querying XML Documents Using Relational Databases

2/28/2006 Wayne State University 36

Publications

1. Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, Farshad Fotouhi, "Efficient Schema-based XML-to-Relational Data Mapping", International Journal of Information Systems, 2006. (to appear)

2. Artem Chebotko, Dapeng Liu, Mustafa Atay, Shiyong Lu and Farshad Fotouhi, “Reconstructing XML Subtrees from Relational Storage of XML Documents”, in Proc. of the 2nd International Workshop on XML Schema and Data Management (XSDM’05), in conjunction with ICDE’2005, Tokyo, Japan, April, 2005

3. Mustafa Atay, Yezhou Sun, Dapeng Liu, Shiyong Lu and Farshad Fotouhi, “Mapping XML Data to Relational Data: DOM-based Approach”, in Proc. of the 8th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA’2004). Kauai, Hawaii, USA. August, 2004.

4. Shiyong Lu, Yezhou Sun, Mustafa Atay, and Farshad Fotouhi, “On the consistency of XML DTDs”, International Journal of Data and Knowledge Engineering, 2004.

5. Shiyong Lu, Yezhou. Sun, Mustafa Atay, and Farshad Fotouhi, “A new inlining algorithm for mapping XML DTDs to relational schemas”, In Proc. of the First International Workshop on XML Schema and Data Management, in conjuction with the 22nd ACM International Conference on Conceptual Modeling (ER2003), Chicago, Illinois, USA, October 2003.

6. Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi, "A Sufficient and Necessary Condition for the Consistency of XML DTDs", in Proc. of  the First International Workshop on XML Schema and Data Management, in conjunction with the 22nd ACM International Conference on Conceptual Modeling (ER'2003), Chicago, Illinois, USA, October, 2003.