Information Integration with XML, Part II
Chaitan Baru, Richard Marciano
{baru,marciano}@sdsc.edu
Data Intensive Computing Group, San Diego Supercomputer Center
National Partnership for Advanced Computational Infrastructure




PART II

• Storing XML documents
• Querying XML documents
• XML and GIS
• Technical Issues
• Projects at SDSC


Storing XML documents

• Pure XML data servers
  • Documents are stored in native XML form
  • XML-based query languages are used to retrieve data
• Relational DBMSs
  • Documents are stored as BLOBs
  • Or, XML elements are mapped to columns in tables
  • SQL is used to retrieve data


Storing in "pure" XML data servers
• eXcelon, from eXcelon Corp. (formerly ODI)
  • Dynamic Application Platform
    • Data Server
    • Toolbox
    • Xconnects
  • B2B Integration Services
    • B2B Translator
    • Business Process Workflow Engine
    • Enterprise Connectivity
    • Business Module eXtensions


eXcelon
• Stores XML and non-XML (blob) data
• Supports queries and indexes on stored XML data
• Uses a file system metaphor
• Supports the use of (server-side) XSL stylesheets
• Provides visual tools (Studio, Explorer, Manager, Stylus)
• Provides Web & COM client interfaces
• Provides Java & COM APIs to extend the data server
• Supports DOM for data access on the server
• Can distribute XML data access across caches
• Connects to 70 sources using ADBC / ADO


eXcelon Tool Box
• Studio:
  • define XML schemas (DCD)
  • generate XML-based pages
• Explorer:
  • browser to view, import, organize, modify, query and set security on data
  • XPath / XQL query wizard
• Manager:
  • administer & configure
  • set server properties
  • set load balancing parameters
• Stylus:
  • build Web pages using XML & XSL
  • transforms XML to HTML


eXcelon Xconnects
Connect to any data source:
• Cobol
• dBaseIII
• Act
• etc.


Programming against eXcelon
• Out-of-the-box tools: "no writing code"
  • create / update / delete / query XML
• Programming:
  • In eXcelon server extensions
    • COM / Java & DOM to manipulate XML contained in eXcelon XMLStores
  • In the Web server
    • An Active Server Application uses the eXcelon COM client API and ships HTML to the browser; XSL can be applied in the context of the Web server
  • In the browser
    • DHTML (VBScript, JavaScript, Visual Basic) or a Java applet manipulates the XML; the XSL stylesheet is applied in the browser


Storing XML documents in RDBMSs

• Store documents as BLOBs
• Map document elements into a set of relational tables
  • Need a DTD or schema for documents
  • Need to map the XML DTD or schema into a relational schema
  • The relational schema will capture the hierarchical "containment" relationship among elements as 1-1 or 1-many relationships


Example of an XML document and DTD

Figure: a sample Publication document (with attribute Pub_ID) drawn as a tree: Title "XML Tutorial", AuthorName "Richard Marciano", AuthorName "Chaitan Baru", Abstract, and a Section containing Heading "Intro" and two paragraphs (labelled Para).

DTD:
    <!ELEMENT Publications (Publication)*>
    <!ELEMENT Publication (Title, AuthorName+, Abstract, Section*)>
    <!ELEMENT Section (Heading, Paragraph*)>
    <!ATTLIST Publication Pub_ID ID #REQUIRED>


Store data as BLOBs in RDBMS

• Store XML document as BLOB, with text/path indexes

Figure: the whole XML document (<title>, <abstract>, ...) is stored in the RDBMS as a single text blob, with a text index built over it.
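To make the idea of a path index concrete, here is a minimal Python sketch that keeps each document as an opaque text string (standing in for the BLOB column) and builds a separate index from (element path, word) to document ids. The sample documents and the `build_path_text_index` helper are hypothetical; an RDBMS text extender would maintain such an index internally rather than in a Python dict.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents, stored whole as text "blobs" keyed by id.
blobs = {
    1: "<Publication><Title>XML Tutorial</Title><Abstract>Storing XML in an RDBMS</Abstract></Publication>",
    2: "<Publication><Title>GIS Primer</Title><Abstract>XML for maps</Abstract></Publication>",
}

def build_path_text_index(docs):
    """Map (element path, lower-cased word) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        root = ET.fromstring(text)

        def walk(elem, path):
            path = f"{path}/{elem.tag}"
            # Index only the element's own leading text, tagged with its full path.
            for word in (elem.text or "").lower().split():
                index[(path, word)].add(doc_id)
            for child in elem:
                walk(child, path)

        walk(root, "")
    return index

index = build_path_text_index(blobs)
# Which documents contain "xml" inside /Publication/Title?
print(sorted(index[("/Publication/Title", "xml")]))  # [1]
```

The blob itself is never modified; queries consult the index first and only fetch the matching blobs.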


Provide indexing of XML documents

• Store specified elements as columns in a table

Figure: the document is stored as a text blob with a text index; in addition, its Title element is copied into its own column, which carries a column index.


Map DTD to a relational schema

• Un-nest the DTD hierarchy
• Stop at a point where it is "sufficient" to represent an element as a single compound value, rather than a hierarchy (e.g. Address)

Resulting tables for the Publication DTD:
• Publication (Pub_ID, Title, Abstract)
• Author (Auth_ID, Pub_ID, AuthName)
• Section (Sec_Num, Pub_ID, Heading)
• Paragraph (Sec_Num, Pub_ID, Para_Num, Text)
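A minimal Python sketch of "shredding" a document into these four tables, using the standard `sqlite3` and `xml.etree.ElementTree` modules. The sample document, the Pub_ID value `p1`, and the `shred` helper are hypothetical; numbering sections and paragraphs by position is one possible choice for Sec_Num and Para_Num, not something the slides prescribe.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document following the Publication DTD.
DOC = """
<Publications>
  <Publication Pub_ID="p1">
    <Title>XML Tutorial</Title>
    <AuthorName>Richard Marciano</AuthorName>
    <AuthorName>Chaitan Baru</AuthorName>
    <Abstract>An introduction.</Abstract>
    <Section>
      <Heading>Intro</Heading>
      <Paragraph>First paragraph.</Paragraph>
      <Paragraph>Second paragraph.</Paragraph>
    </Section>
  </Publication>
</Publications>
"""

SCHEMA = """
CREATE TABLE Publication (Pub_ID TEXT PRIMARY KEY, Title TEXT, Abstract TEXT);
CREATE TABLE Author (Auth_ID INTEGER PRIMARY KEY, Pub_ID TEXT, AuthName TEXT);
CREATE TABLE Section (Sec_Num INTEGER, Pub_ID TEXT, Heading TEXT,
                      PRIMARY KEY (Sec_Num, Pub_ID));
CREATE TABLE Paragraph (Sec_Num INTEGER, Pub_ID TEXT, Para_Num INTEGER, Text TEXT,
                        PRIMARY KEY (Sec_Num, Pub_ID, Para_Num));
"""

def shred(conn, xml_text):
    """Un-nest each Publication; 1-many children become rows keyed back to Pub_ID."""
    root = ET.fromstring(xml_text.strip())
    for pub in root.findall("Publication"):
        pid = pub.get("Pub_ID")
        conn.execute("INSERT INTO Publication VALUES (?, ?, ?)",
                     (pid, pub.findtext("Title"), pub.findtext("Abstract")))
        for name in pub.findall("AuthorName"):
            # Auth_ID is omitted so SQLite assigns a surrogate key.
            conn.execute("INSERT INTO Author (Pub_ID, AuthName) VALUES (?, ?)",
                         (pid, name.text))
        for s_num, sec in enumerate(pub.findall("Section"), start=1):
            conn.execute("INSERT INTO Section VALUES (?, ?, ?)",
                         (s_num, pid, sec.findtext("Heading")))
            for p_num, para in enumerate(sec.findall("Paragraph"), start=1):
                conn.execute("INSERT INTO Paragraph VALUES (?, ?, ?, ?)",
                             (s_num, pid, p_num, para.text))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, DOC)
print(conn.execute("SELECT AuthName FROM Author ORDER BY Auth_ID").fetchall())
# [('Richard Marciano',), ('Chaitan Baru',)]
```

Because Abstract and Heading are single text values, they stay as columns; only repeating elements get their own tables.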


Storing un-nested DTD hierarchies

• Store document elements across multiple tables
• (Not yet available in COTS products)

Figure: an XML document with a <title>, two <author> elements and an <abstract> is split across two RDBMS tables, Publication (Pub_ID, Title, Abstract) and Author (Auth_ID, Pub_ID, AuthName).


Retrieving XML data from DBMS

• Retrieving from pure XML data servers
  • Use XML query languages, e.g. XQL
• Retrieving from RDBMS
  • Use SQL to query data from database tables
  • "Wrap" the output of the SQL query as an XML document
  • Define XML views over relational schemas (Xviews)
    • Use SQL statement(s) to create XML output


Retrieving from "pure" XML servers
• XML Query Language (XQL), supported by eXcelon
  • Reference: http://www.w3.org/TandS/QL/QL98/pp/xql.html
• Example: Publication DTD
    <!ELEMENT Publications (Publication)*>
    <!ELEMENT Publication (Title, AuthorName+, Abstract, Section*)>
    <!ELEMENT Section (Heading, Paragraph*)>

Structure: Publications / Publication (Title, AuthorName+, Abstract, Section* (Heading, Paragraph*))


XQL
• Example queries:
  • Output all section headings of all publications:
    /Publications/Publication/Section/Heading
  • Output all documents that have a section called "Conclusion":
    /Publications/Publication[Section/Heading="Conclusion"]
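XQL paths are close to XPath, so the two queries can be approximated with Python's `xml.etree.ElementTree`, which supports a limited XPath subset. This is a sketch over hypothetical sample data, not eXcelon's XQL engine: the first path maps directly, while ElementTree predicates accept only a single child tag, so the `Section/Heading` filter is written out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-publication sample.
DOC = """<Publications>
  <Publication><Title>XML Tutorial</Title>
    <Section><Heading>Intro</Heading></Section>
    <Section><Heading>Conclusion</Heading></Section>
  </Publication>
  <Publication><Title>GIS Primer</Title>
    <Section><Heading>Overview</Heading></Section>
  </Publication>
</Publications>"""

root = ET.fromstring(DOC)  # root is the Publications element

# /Publications/Publication/Section/Heading
headings = [h.text for h in root.findall("./Publication/Section/Heading")]

# /Publications/Publication[Section/Heading="Conclusion"]
# ElementTree's [tag='text'] predicate takes one child tag, not a path, so filter manually.
with_conclusion = [
    pub for pub in root.findall("Publication")
    if any(h.text == "Conclusion" for h in pub.findall("Section/Heading"))
]

print(headings)                                        # ['Intro', 'Conclusion', 'Overview']
print([p.findtext("Title") for p in with_conclusion])  # ['XML Tutorial']
```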


XML-QL
• XML Query Language (XML-QL)
  • http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/
• XML-QL example:
    WHERE <Publications>
            <Publication>
              <Title>XML Tutorial</Title>
              <Section> $S </Section>
              <AuthorName> $A </AuthorName>
            </Publication>
          </Publications> IN "www.sdsc.edu/publications/pubs.xml"
    CONSTRUCT $A
• Meaning: list all authors of all publications with title = "XML Tutorial" that have at least one section and one author


XMAS
• XML Matching And Structuring (XMAS)
    CONSTRUCT <my_authors>
                <my_author> $A </my_author>
              </my_authors>
    WHERE <Publications>
            <Publication>
              <Title> $T </Title>
              <Section> $S </Section>
              <AuthorName> $A </AuthorName>
            </Publication>
          </Publications> IN "http://www.sdsc.edu/publications/pubs.xml"
      AND substr("XML Tutorial", $T)
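The XML-QL and XMAS queries both match a pattern (a Publication with a Title, a Section and an AuthorName), filter on the title, and emit the bound authors; XMAS additionally builds a new `<my_authors>` tree. A minimal Python sketch of that match-then-construct step, assuming that `substr("XML Tutorial", $T)` means "XML Tutorial" occurs inside the title (an assumption about XMAS semantics); the data and the `my_authors` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first Publication has a Section.
DOC = """<Publications>
  <Publication><Title>XML Tutorial, 2nd ed.</Title>
    <AuthorName>Richard Marciano</AuthorName><AuthorName>Chaitan Baru</AuthorName>
    <Section><Heading>Intro</Heading></Section>
  </Publication>
  <Publication><Title>XML Tutorial</Title>
    <AuthorName>Ann Example</AuthorName>
  </Publication>
</Publications>"""

def my_authors(root, phrase="XML Tutorial"):
    """Bind $T, $S, $A as in the WHERE pattern, keep titles containing phrase
    (assumed meaning of substr), and construct a <my_authors> result tree."""
    result = ET.Element("my_authors")
    for pub in root.findall("Publication"):
        title = pub.findtext("Title") or ""
        if phrase not in title or pub.find("Section") is None:
            continue  # the pattern requires a matching Title and at least one Section
        for author in pub.findall("AuthorName"):
            ET.SubElement(result, "my_author").text = author.text
    return result

out = my_authors(ET.fromstring(DOC))
print(ET.tostring(out, encoding="unicode"))
# <my_authors><my_author>Richard Marciano</my_author><my_author>Chaitan Baru</my_author></my_authors>
```

For the XML-QL variant, the substring test would become an exact comparison `title == phrase`, and the result would be the bare list of author values instead of a constructed element.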


Retrieving XML from RDBMS servers
• Wrapping SQL output in XML
• Example query:
    SELECT Title, AuthName
    FROM Publication, Author
    WHERE Publication.Pub_ID = Author.Pub_ID
• Result:
    <result>
      <row><title>XML Tutorial</title><author>Marciano</author></row>
      <row><title>XML Tutorial</title><author>Baru</author></row>
    </result>
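A minimal sketch of the wrapping step with Python's built-in `sqlite3` and `ElementTree`: each result row becomes a `<row>` element and each column a child element named after the column. The `query_to_xml` helper and sample rows are hypothetical; the `AS title` / `AS author` aliases are added so the element names match the result shown, and the `doc_element` / `row_element` parameters play roughly the role of XSQL's `doc-element` and `row-element` attributes.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, doc_element="result", row_element="row"):
    """Run sql and wrap each row as <row_element>, with one child per column.
    Column names must be valid XML names (use SQL aliases if they are not)."""
    cur = conn.execute(sql)
    columns = [d[0].lower() for d in cur.description]
    doc = ET.Element(doc_element)
    for values in cur:
        row = ET.SubElement(doc, row_element)
        for col, val in zip(columns, values):
            ET.SubElement(row, col).text = "" if val is None else str(val)
    return doc

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publication (Pub_ID TEXT, Title TEXT, Abstract TEXT);
CREATE TABLE Author (Auth_ID INTEGER, Pub_ID TEXT, AuthName TEXT);
INSERT INTO Publication VALUES ('p1', 'XML Tutorial', '...');
INSERT INTO Author VALUES (1, 'p1', 'Marciano'), (2, 'p1', 'Baru');
""")
xml_doc = query_to_xml(conn, """
    SELECT Title AS title, AuthName AS author
    FROM Publication, Author
    WHERE Publication.Pub_ID = Author.Pub_ID
    ORDER BY Author.Auth_ID""")
print(ET.tostring(xml_doc, encoding="unicode"))
```

The output is flat: one `<row>` per joined tuple, so the title repeats for every author. Nesting authors under a single publication is what the Xviews idea addresses.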


Retrieving from RDBMS Servers - 2

• Define XML views over relational schemas
  • How to interpret relational data as XML documents
  • Relational schemas are "flat", XML documents are hierarchical

Figure: the relational hierarchy Database → Relations → Tuples → Attributes, shown for PublicationsDB → Publication → tuples t1, t2, t3 → Title, Author, Abstract.


The Xviews concept
• Derive a DTD by "identifying" a "containment" relationship among the set of tables
• Example: the "canonical" data warehouse schema, in which a Lineitem table is linked to Customer, Product and Region tables
  • Candidate containment: Lineitem contains Customer, Product and Region
  • DTD:
    <!ELEMENT Lineitem (Customer, Product, Region)>


The Xviews concept

• Alternative containment: Customer contains Lineitem, which contains Product and Region
• DTD:
    <!ELEMENT Customer (Lineitem*)>
    <!ELEMENT Lineitem (Product, Region)>
• Note: outer joins are needed in order to output customers who have no lineitems
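A minimal sketch of the Customer-contains-Lineitem view in Python with SQLite: the `LEFT OUTER JOIN` keeps a customer with no lineitems, and one pass over the rows, sorted by customer, nests them. Table and column names (Cust_ID, Item_ID, Name), the `Customers` wrapper element, and the data are hypothetical; Product and Region are flattened to text columns instead of separate tables to keep it short.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Cust_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Lineitem (Item_ID INTEGER PRIMARY KEY, Cust_ID INTEGER,
                       Product TEXT, Region TEXT);
INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Lineitem VALUES (10, 1, 'Widget', 'West'), (11, 1, 'Gadget', 'East');
""")

# LEFT OUTER JOIN keeps Globex, which has no lineitems; its Lineitem columns come back NULL.
rows = conn.execute("""
    SELECT c.Cust_ID, c.Name, l.Product, l.Region
    FROM Customer c LEFT OUTER JOIN Lineitem l ON c.Cust_ID = l.Cust_ID
    ORDER BY c.Cust_ID, l.Item_ID""")

root = ET.Element("Customers")
current_id, cust_elem = None, None
for cust_id, name, product, region in rows:
    if cust_id != current_id:  # rows are sorted, so a new id starts a new <Customer>
        cust_elem = ET.SubElement(root, "Customer", Name=name)
        current_id = cust_id
    if product is not None:    # NULL product: this customer has no lineitems
        item = ET.SubElement(cust_elem, "Lineitem")
        ET.SubElement(item, "Product").text = product
        ET.SubElement(item, "Region").text = region

print(ET.tostring(root, encoding="unicode"))
```

With an inner join, Globex would silently disappear from the view, which is the point of the note above.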


RDBMS product support

Figure: SQL queries run against the database tables in the RDBMS, and the product packages the query output into an XML document.


Oracle: XML SQL and XSQL Utility

• Retrieving data as XML
  • Generates an XML document from SQL queries
  • Outputs text or a Document Object Model from a SQL query string or a JDBC ResultSet object
• Inserting XML data into tables
  • Writes data from an XML document into a (single) database table or (updateable) view
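The insert direction can be sketched in Python with SQLite: each row element becomes one `INSERT` into a single table, with child tag names used as column names. This is not Oracle's API; the `ROWSET`/`ROW` layout, the sample data and the `insert_xml_rows` helper are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: one <ROW> per table row, one child element per column.
XML_IN = """<ROWSET>
  <ROW><PUB_ID>p2</PUB_ID><TITLE>XML Primer</TITLE></ROW>
  <ROW><PUB_ID>p3</PUB_ID><TITLE>GIS and XML</TITLE></ROW>
</ROWSET>"""

def insert_xml_rows(conn, table, xml_text):
    """Insert each <ROW> into table, using child tag names as column names."""
    count = 0
    for row in ET.fromstring(xml_text).findall("ROW"):
        cols = [child.tag for child in row]
        vals = [child.text for child in row]
        # Identifiers cannot be bound as parameters; table and tag names are assumed trusted.
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({', '.join('?' * len(vals))})"
        conn.execute(sql, vals)
        count += 1
    return count

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publication (Pub_ID TEXT, Title TEXT)")
n = insert_xml_rows(conn, "Publication", XML_IN)
print(n, conn.execute("SELECT Title FROM Publication ORDER BY Pub_ID").fetchall())
# 2 [('XML Primer',), ('GIS and XML',)]
```

SQLite matches `PUB_ID` to `Pub_ID` because its identifiers are case-insensitive. Targeting one table at a time mirrors the "(single) table or view" restriction above; a nested document would first have to be shredded.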


XSQL Utility/Servlet

Figure: a Web browser sends {xsql filename, params, XSL stylesheet} to the Web server; the XSQL Servlet reads the XML-formatted SQL queries (.xsql file), runs them against Oracle 8i through the XML SQL utility, and returns the query result in XML, or transformed into HTML by XSL in the XSLT processor.


XSQL Example

Example XSQL file:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="query1.xsl"?>
    <query connection="PublicationDB"
           doc-element="Publications"
           row-element="Publication">
      SELECT title, abstract, authorname
      FROM publication p, author a
      WHERE p.Pub_ID = a.Pub_ID
    </query>

Sample XML output:
    <Publications>
      <Publication>
        <title>XML Tutorial</title><abstract>...</abstract><authorname>Marciano</authorname>
      </Publication>
      <Publication>
        <title>XML Tutorial</title><abstract>...</abstract><authorname>Baru</authorname>
      </Publication>
      ..... more rows...
    </Publications>


DB2: XML Extender
• XML Extender in UDB Version 6.1
• XML_Column type and Document Access Definitions (DADs)
• Insertion into a column of type XML_Column triggers extraction of the elements specified in the DADs

Figure: an XML document (<title>, <abstract>, ...) is inserted into an XML column and stored as an XML blob; the DAD directs the Title element to be extracted into a separate Title column alongside the blob.
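The effect of a DAD-driven insert can be imitated in application code: store the whole document and, in the same step, copy selected elements into ordinary columns that can be indexed. In this Python/SQLite sketch, the `DAD` dict is a hypothetical stand-in for a real Document Access Definition (which is itself an XML file), and the table and helper names are invented; DB2 performs the extraction inside the database, not like this.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DAD: side column -> element path inside the document.
DAD = {"title": "Title", "abstract": "Abstract"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, xml_blob TEXT, title TEXT, abstract TEXT)")

def insert_document(conn, xml_text):
    """Store the whole document and extract the DAD-listed elements in the same insert."""
    root = ET.fromstring(xml_text)
    extracted = {col: root.findtext(path) for col, path in DAD.items()}
    cols = ", ".join(["xml_blob", *extracted])
    marks = ", ".join("?" * (1 + len(extracted)))
    conn.execute(f"INSERT INTO docs ({cols}) VALUES ({marks})",
                 [xml_text, *extracted.values()])

insert_document(conn, "<Publication><Title>XML Tutorial</Title><Abstract>Part II</Abstract></Publication>")
# The side column can be indexed and queried without parsing the blob.
conn.execute("CREATE INDEX docs_title ON docs (title)")
print(conn.execute("SELECT id, title FROM docs WHERE title = 'XML Tutorial'").fetchall())
# [(1, 'XML Tutorial')]
```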


GIS and XML

• Represent GIS metadata in XML
• Represent spatial features in XML


GIS & XML: 1st experiment (the data)


GIS & XML: 1st experiment (XML wrapping)


Exporting GIS data in XML


Work in Progress: Spatial XML Markup Languages

• Geography Markup Language (GML) 1.0
  • OGC Working Draft, 17-Jan-2000
  • Web Mapping Testbed (WMT): NIMA, USACoE, FGDC, NASA, USDA, USGS ...
  • Digital Earth (www.digitalearth.gov)
• AXL (ArcXML) pre-release
  • part of ESRI ArcIMS, 31-Jan-2000


GML
• GML: an XML specification to encode geographic information, for both data storage and data transport
• Initial release deals with OGC Simple Features:
  • vector geodata: e.g. digital map info (streets, population, land use zones, property lines, watersheds, etc.)
• GML is not concerned with the visualization of geographic features (drawing of maps)

Figure: GML in XML can be (1) rendered directly into a graphic format, (2) transformed into a vector graphics rendering format (SVG, VML, VRML), or (3) routed directly, without visualization, into a numerical model.


GML
• Goal: enable organizations to share geographic information and to enable linked geographic datasets
• When GML data is exchanged over the Internet, it is transmitted as a "feature collection"
• GML Simple Features:
  • geometry classes: Point, LineString, Polygon
  • geometry properties: coordinate lists, spatial reference system name
    • pointproperty, linestringproperty, polygonproperty, multipointproperty, multilineproperty, multipolygonproperty


GML Example
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE FeatureCollection SYSTEM "FeatureCollection.dtd" [
      <!--Description : Illustration of an area feature using the polygon property.
          Author : Ron Lake -->
    ]>
    <FeatureCollection xmlns:ogcgml="http://www.opengis.org/gml#">
      <BoundingBox>
        <coordinates>0.0,0.0 3.0,4.0</coordinates>
      </BoundingBox>
      <Feature typeName="http://www.usgs.org/tp#Building" ID="1">
        <Description>Hotel Vancouver</Description>
        <Property typeName="http://www.usgs.org/tp#Number of Rooms" type="int">4</Property>
        <polygonproperty parseType="Resource" roleName="http://www.usgs.org/tp#extent"
                         srsName="http://www.opengis.org/srs/epsg:26751">
          <type resource="http://www.opengis.org/gml#Polygon" />
          <boundary parseType="Resource">
            <type resource="http://www.opengis.org/gml#LineString" />
            <coordinates>0.0,0.0 1.123,1.56 2.34,4.5 0.0,0.0</coordinates>
          </boundary>
        </polygonproperty>
      </Feature>
    </FeatureCollection>
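Because coordinates travel as plain text, a consumer must parse them before doing any geometry. A minimal Python sketch, assuming the layout used in this example (tuples separated by whitespace, x and y separated by a comma); it computes the polygon's area with the shoelace formula and its bounding box. The helper names are hypothetical, and the area is in whatever units the declared srsName implies.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the polygon boundary from the example.
GML = "<boundary><coordinates>0.0,0.0 1.123,1.56 2.34,4.5 0.0,0.0</coordinates></boundary>"

def parse_coordinates(text):
    """Tuples separated by whitespace, x and y separated by a comma."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]

def ring_area(points):
    """Shoelace formula; assumes a closed ring (first point repeated at the end)."""
    twice = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return abs(twice) / 2.0

ring = parse_coordinates(ET.fromstring(GML).findtext("coordinates"))
xs, ys = [p[0] for p in ring], [p[1] for p in ring]
print(round(ring_area(ring), 5))              # 0.70155
print((min(xs), min(ys), max(xs), max(ys)))   # (0.0, 0.0, 2.34, 4.5)
```

As printed in the example, the vertex (2.34, 4.5) has a y value above the 4.0 declared in the collection's BoundingBox, so the declared box and the geometry disagree; a check like this one catches that kind of inconsistency in exchanged feature collections.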


ArcXML (AXL)

• Being developed by ESRI, available in ArcIMS 3.0
• Format for data exchange within ArcIMS 3.0
• Provides tags for:
  • Request / Response between Client, Middleware, and Server
  • MapService Configuration
  • Viewer Configuration


AXL
• Config tags:
  • Properties (properties, extent, background, imagesize, output, featurecoordsys, etc.)
  • Workspaces (workspaces, SDEworkspaces, shapeworkspaces, imageworkspaces, etc.)
  • Layers (layer, dataset, query, coordsys)
  • Renderers (simple, group, scaledependent, valuemap, simplelabel, etc.)
  • Symbols (simplemarker, rastermarker, simpleline, hashline, simplefill, simplepolygon, rasterfill, gradientfill, text, etc.)
  • Acetate layer objects (object, point, line, polygon, text, scalebar, northarrow)
• Admin tags: (admin, addservice, changeservice, removeservice, image)
• Request tags: (request, get_service_info, get_map, get_features, get_extract, get_geocode)
  • Feature Server Request Tags (layer, query, spatialquery, spatialfilter, envelope)
• Response tags: (response, error)
  • serviceinfo (serviceinfo, layerinfo, fclass, field)
  • featureserver
  • queryserver (features, feature)
  • imageserver (map, output, legend)


Sample Request/Response AXL

• Example: Get_Map
    <ARCXML VERSION="1.0">
      <REQUEST>
        <GET_MAP>
          <PROPERTIES>
            <EXTENT MINX="-180" MINY="-90" MAXX="180" MAXY="90" />
          </PROPERTIES>
        </GET_MAP>
      </REQUEST>
    </ARCXML>
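Clients usually generate such requests rather than write them by hand. A minimal Python sketch that builds the same GET_MAP request for an arbitrary extent with `xml.etree.ElementTree`; the `get_map_request` helper is hypothetical, and how the string is sent to an ArcIMS server is not covered in the slides, so it is not shown.

```python
import xml.etree.ElementTree as ET

def get_map_request(minx, miny, maxx, maxy):
    """Build an ARCXML GET_MAP request string for the given extent."""
    axl = ET.Element("ARCXML", VERSION="1.0")
    get_map = ET.SubElement(ET.SubElement(axl, "REQUEST"), "GET_MAP")
    props = ET.SubElement(get_map, "PROPERTIES")
    ET.SubElement(props, "EXTENT", MINX=str(minx), MINY=str(miny),
                  MAXX=str(maxx), MAXY=str(maxy))
    return ET.tostring(axl, encoding="unicode")

# Whole-world extent, as in the example above.
print(get_map_request(-180, -90, 180, 90))
```

Building the tree instead of concatenating strings guarantees well-formed output and proper attribute quoting.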


Sample MapService AXL

• Example:
    <WORKSPACES>
      <SHAPEWORKSPACE name="shp_ws-0"
                      directory="D:\Data\ESRIDATA\USA" />
      <SDEWORKSPACE name="sde_ws-0"
                    server="ims" instance="esri_sde"
                    user="gdt" password="gdt" />
    </WORKSPACES>


Sample Viewer AXL

• Example:
    <WORKSPACES>
      <MAPPERWORKSPACE name="mapper_ws-0"
                       url="http://mammoth" service="baseimage" />
    </WORKSPACES>
    <LAYER type="image" name="baseimage" visible="false"
           minscale="0.0" maxscale="1.7976931348623157E308">
      <DATASET name="baseimage" type="image"
               workspace="mapper_ws-0" />
    </LAYER>


Open Technical Issues

• DTD inference
• DTD evolution
• Specifying access controls on XML documents
• Specifying and enforcing intra-document constraints


DTD Inference

• Document collections without DTDs
• "Tight" vs. "loose" DTDs
• Document 1:
    <title> XML Tutorial </title><author> Richard Marciano </author>
• Possible DTD:
    <!ELEMENT document (title, author)>


DTD Inference
• Document 2:
    <title> XML Tutorial </title><author> Richard Marciano </author><author> Chaitan Baru </author>
• Document DTD 1 (tight: exactly one or two authors):
    <!ELEMENT document (title, author, author?)>
• Document DTD 2 (loose: one or more authors):
    <!ELEMENT document (title, author+)>


DTD Inference

• Alternative DTDs: introduce an extra level (authors) in the tree
    <!ELEMENT document (title, authors)>
    <!ELEMENT authors (author, author?)>
  OR
    <!ELEMENT document (title, authors)>
    <!ELEMENT authors (author+)>
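The tight/loose choice can be made mechanical. This Python sketch infers a sequence content model for the root element from sample documents: the loose model uses `+`, `?` and `*`, while the tight model admits exactly the occurrence counts observed. It assumes every document lists its child tags in the same relative order and wraps Documents 1 and 2 in a hypothetical `<document>` root; the helper names are hypothetical too. Optional extras are nested, as in `(a, a?)?`, because XML 1.0 asks for deterministic content models and `a?, a?` would be ambiguous.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Documents 1 and 2, wrapped in a <document> root element.
DOCS = [
    "<document><title>XML Tutorial</title><author>Richard Marciano</author></document>",
    "<document><title>XML Tutorial</title><author>Richard Marciano</author>"
    "<author>Chaitan Baru</author></document>",
]

def optional_run(tag, k):
    """Up to k optional occurrences, nested so the content model stays deterministic."""
    model = f"{tag}?"
    for _ in range(k - 1):
        model = f"({tag}, {model})?"
    return model

def infer_model(xml_docs, tight=False):
    """Infer an <!ELEMENT> declaration for the root from sample documents."""
    roots = [ET.fromstring(text) for text in xml_docs]
    order = []  # child tags in first-seen order
    for root in roots:
        for child in root:
            if child.tag not in order:
                order.append(child.tag)
    counts = [Counter(child.tag for child in root) for root in roots]
    parts = []
    for tag in order:
        lo = min(c[tag] for c in counts)  # fewest occurrences in any document
        hi = max(c[tag] for c in counts)  # most occurrences in any document
        if tight:
            parts += [tag] * lo
            if hi > lo:
                parts.append(optional_run(tag, hi - lo))
        else:
            suffix = {(False, False): "", (True, False): "+",
                      (False, True): "?", (True, True): "*"}[(hi > 1, lo == 0)]
            parts.append(tag + suffix)
    return f"<!ELEMENT {roots[0].tag} ({', '.join(parts)})>"

print(infer_model(DOCS))              # <!ELEMENT document (title, author+)>
print(infer_model(DOCS, tight=True))  # <!ELEMENT document (title, author, author?)>
```

The loose model generalizes to any number of authors, while the tight model rejects a third author until the DTD is evolved.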


DTD Evolution

• Example, Document 3:
    <title> XML Primer </title><author> Richard Marciano </author><author> Chaitan Baru </author><keywords> XML, XSL, Schema </keywords>
• The document does not satisfy the document DTD (the <keywords> element is not declared). Options:
  • Report an error
  • Record it as an exception and store the document
  • Evolve the DTD
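One way to detect the violation and test a candidate evolution is to translate a simple sequence content model into a regular expression over the child tag names. This Python sketch handles only comma-separated names with `?`, `*` or `+` (no choices or groups); the `<document>` wrapper and helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate e.g. 'title, author+' into a regex over space-terminated tag names.
    Only ',' sequences and the '?', '*', '+' quantifiers are supported."""
    pattern = ""
    for item in (p.strip() for p in model.split(",")):
        name, quant = (item[:-1], item[-1]) if item[-1] in "?*+" else (item, "")
        pattern += f"(?:{re.escape(name)} ){quant}"
    return re.compile(pattern)

def conforms(xml_text, model):
    """True if the root's children, in order, match the content model."""
    tags = "".join(child.tag + " " for child in ET.fromstring(xml_text))
    return model_to_regex(model).fullmatch(tags) is not None

DOC3 = ("<document><title>XML Primer</title><author>Richard Marciano</author>"
        "<author>Chaitan Baru</author><keywords>XML, XSL, Schema</keywords></document>")

print(conforms(DOC3, "title, author+"))             # False: <keywords> is not allowed
print(conforms(DOC3, "title, author+, keywords?"))  # True with the evolved model
```

Making the new element optional (`keywords?`) is what keeps Documents 1 and 2 valid under the evolved DTD.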


Specifying access controls
• A user is associated with a level of access
• Document elements are assigned levels of access
• Example:
    <abstract level="unclassified">….</abstract>
    <section level="classified"><heading>Introduction</heading></section>
    <section level="top secret"><heading>Architecture</heading></section>
• The stylesheet processor matches the authorization level of the user with the authorization level of each document element


Specifying access controls

• Do access control processing in the stylesheet language
  • Useful for content-dependent access control
  • Example: if the title contains "nuclear", show only the abstract; otherwise show the full document
• Access control processing should be done on the server side, in a secure fashion
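Both rules can be sketched in Python as a server-side filter, standing in for the stylesheet processing the slides describe. The ordering of the three levels, the default of "unclassified" for elements without a level attribute, and the sample document are assumptions; unknown level strings are treated as more restrictive than any clearance so that a typo denies access instead of granting it.

```python
import xml.etree.ElementTree as ET

# Assumed ordering of access levels, lowest to highest.
LEVELS = {"unclassified": 0, "classified": 1, "top secret": 2}

# Hypothetical document using the level attributes from the example.
DOC = """<document><title>Reactor Design</title>
<abstract level="unclassified">Summary</abstract>
<section level="classified"><heading>Introduction</heading></section>
<section level="top secret"><heading>Architecture</heading></section>
</document>"""

def redact(xml_text, clearance):
    """Return the document tree reduced to what a user with this clearance may see."""
    root = ET.fromstring(xml_text)
    allowed = LEVELS[clearance]
    # Content-dependent rule: "nuclear" in the title means show only the abstract.
    if "nuclear" in (root.findtext("title") or "").lower():
        for child in list(root):
            if child.tag != "abstract":
                root.remove(child)
    # Level-based rule, applied to direct children only (nested levels need a recursive walk).
    for child in list(root):
        level = LEVELS.get(child.get("level", "unclassified"), len(LEVELS))
        if level > allowed:
            root.remove(child)
    return root

visible = redact(DOC, "classified")
print([h.text for h in visible.iter("heading")])  # ['Introduction']
```

Running the filter on the server means the removed elements never reach the client, which is why the slide insists on server-side processing.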


Specifying constraints

• Enforcing intra-document constraints
• Constraints on structure
  • Example: a short paper may contain only one section, but long papers must have at least two.
    <!ELEMENT Publication (Title, AuthorName*, Section*)>
    <!ATTLIST Publication Type CDATA #REQUIRED>
  • Specify the type of document in the Type attribute, and use it to check whether the document satisfies the constraint
• Constraints on value


Specifying constraints

• Example of a value constraint: the NumSecs attribute must equal the number of Section elements
    <!ELEMENT Publication (Title, AuthorName*, Section*)>
    <!ATTLIST Publication NumSecs CDATA #REQUIRED>
    <Publication NumSecs="3">
      <Title>…</Title><AuthorName>…</AuthorName>
      <Section>……</Section><Section>……</Section><Section>……</Section>
    </Publication>
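A DTD can declare the Type and NumSecs attributes but cannot relate them to the number of Section children, so both constraints need a check after parsing. A minimal Python sketch; the attribute values "short" and "long" and the `check_constraints` helper are hypothetical, since the slides do not fix them.

```python
import xml.etree.ElementTree as ET

def check_constraints(xml_text):
    """Return the constraints a Publication violates (an empty list means it passes)."""
    pub = ET.fromstring(xml_text)
    n_sections = len(pub.findall("Section"))
    errors = []
    # Value constraint: NumSecs must match the number of Section children.
    num_secs = pub.get("NumSecs")
    if num_secs is not None and int(num_secs) != n_sections:
        errors.append(f"NumSecs={num_secs} but found {n_sections} sections")
    # Structural constraint keyed on the (hypothetical) Type values "short" and "long".
    doc_type = pub.get("Type")
    if doc_type == "short" and n_sections > 1:
        errors.append("a short paper may contain only one section")
    if doc_type == "long" and n_sections < 2:
        errors.append("a long paper must have at least two sections")
    return errors

ok = ('<Publication NumSecs="3"><Title>t</Title><AuthorName>a</AuthorName>'
      '<Section>s1</Section><Section>s2</Section><Section>s3</Section></Publication>')
bad = ('<Publication Type="short" NumSecs="1"><Title>t</Title>'
       '<Section>s1</Section><Section>s2</Section></Publication>')
print(check_constraints(ok))   # []
print(check_constraints(bad))  # ['NumSecs=1 but found 2 sections', 'a short paper may contain only one section']
```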


Projects at SDSC
• National Archives and Records Administration, NARA
  • Persistent Archives and Electronic Records
• NHPRC
• NPACI Neuroscience
  • Federation of multiple brain image databases
• I2T: An Information Integration Testbed for Digital Government
  • Funded by NSF
  • Spatial mediation, wrapping of "unstructured" text


Projects at SDSC

• InterLib and California Digital Library
  • Funded by the NSF DLI-2 program
  • Implemented the Art Museum Image Consortium (AMICO) Digital Library at SDSC
• Community of Science, Inc. (www.cos.com)
  • Specifying XML standards for Current Research Information Systems (CRIS)
  • Enable creation of a warehouse of research information and enable e-commerce


Projects at SDSC

• ESRI
  • Developers of ArcInfo, ArcView, ArcIMS products
  • Evaluate the ArcXML (AXL) standard
  • Keep AXL developments in sync with activities in the OpenGIS Consortium, e.g. the evolving Geography Markup Language (GML) standard
  • Connect AXL with other XML Web standards such as WAP (Wireless Application Protocol)


Projects at SDSC

• NEES
  • Proposal to NSF's Networked Earthquake Engineering Simulation (NEES) program
  • Develop NeesML, an XML-based standard for representing earthquake engineering simulation metadata and data
  • NeesML will facilitate the creation of a NEES Curated Database, a warehouse of earthquake engineering simulation information
