xml use cases: customer strategies and experiences€¦ · why store xml in a database? • xml use...

Post on 30-Apr-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

© 2010 IBM Corporation

XML Use Cases: Customer Strategies and Experiences

https://www.ibm.com/developerworks/wikis/display/db2xml/XMLUseCases Charts and Recording

Susan Malaika, Matthias Nicolamalaika@us.ibm.com, mnicola@us.ibm.com

2

Agenda

• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases

• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…

• Summary

3

Agenda

• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases

• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…

• Summary

4

What is DB2 pureXML ?

• Native XML storage, indexing and processing in DB2

• Tight integration of XML and relational data management

• Based on standards: XQuery, XPath, SQL/XML, XML Schema, XSLT, Namespaces, etc.

• Facilitates end-to-end XML

5

DB2 pureXML Overview 1/2create table customer (cid integer, info XML)

insert into customer (cid, info) values (?,?)

select cid, info from customer

select xmlquery('$INFO/customer/name') from customer where cid > 1234 and xmlexists('$INFO/customer/addr[zip = 95123]')

xquery for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerwhere $i/addr/zip = 95123return <myresult>{$i/name}</myresult>

1

2

3

4

5

6

create index idx1 on customer(info) generate key using xmlpattern '/customer/addr/zip' as sql varchar(5)

update customer set info = ? where ….

update customerset info = xmlquery(‘copy $new := $INFO

modify do replace value of $new/customer/addr/zip with 95141

return $new')where …;

Plus: XML Schema Support, Utilities, Shredding, XSLT, etc.

DB2 pureXML Overview 2/2

9

10

7

8

7

Agenda• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases

• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…

• Summary

8© 2010 IBM Corporation

Why/When store XML instead of Relational ?When business objects are typically represented as XML when transmitted When data format (schema) changes over timeWhen schemas from different sources are combined to represent an external entityWhen schema is complex and highly variable, such as:

XML Schema Files

Type Definitions

Elements +Attributes

HL7 CDA 3 6 1953 945 + 477

STAR 192 5846 77319 + 625

FpML 4.2: 23 686 1867 + 196

FIXML 4.4: 41 1310 619 + 2593

Health Level 7, Clinical Document Architecture

Standards for Technology in Automotive Retail (OAGIS)

Financial products Markup Language

Financial Information eXchange Protocol

Many industry formats have similar XML Schema complexity/variability

9© 2010 IBM Corporation

In Relational: Many tables

In pureXML: as simple ascreate table T(ID int, trade XML);

Generate Relational Schema for FpML

10

Web Applications Before DB2 pureXML

Client Application ServerXML

XML ObjectsMapping Mapping

Relational schema

Relational Database

• XML to object mapping + object to relational mapping• Complex, very costly, inflexible, inefficient, difficult to

maintain over time

11

End-to-end XML Processing

Client XML Forms, XHMTL etc. DB2 pureXML

XMLSOA Gateway / WAS

optional: XSLT

• End-to-End Straight-Through Processing using XML• XML programming paradigm and architecture pattern

• XForms• REST/SOAP web services• XML Standards: XPath, XQuery, XML Schema, XSLT, etc.

Protocol SOAP, REST,

MQ

XMLrelational

XML XML XML

SQL/XMLXQuery

12

Parse Once – Access Many

Client XML Forms, XHMTL etc. DB2 pureXML

XMLSOA Gateway / WAS

optional: XSLT

• The XML is parsed once on input • The XML is processed many times – through declarative query

languages (SQL/XML, XQuery) • Benefits

• Reduce CPU usage overall and in the middle tier in particular• Access to the XML is through a declarative language that can be

optimized outside the application e.g., through indexing

Protocol SOAP, REST,

MQ

XMLrelational

XML XML XML

SQL/XMLXQuery

Reduce CPU Usagein Middle Tier

13

XML vs. Relational

DepartmentDEPTID DEPTNAME

15 SalesEmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY

15 27 MICHAEL THOMPSON NULL 4125015 10 CHRISTINE SMITH 408-463-4963 52750

DEPTDOC<DEPARTMENT deptid="15" deptname="Sales">

<EMPLOYEE><EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE><EMPLOYEE>

<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME> <SALARY>41250.00</SALARY>

</EMPLOYEE></DEPARTMENT>

create table dept(deptdoc XML);

14

Schema Evolution

PhoneEMPNO PHONE

27 406-463-123410 415-010-123410 408-463-4963

Relational Requires:• Normalization of existing data !• Change of applications

Costly!

DepartmentDEPTID DEPTNAME

15 SalesEmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY

15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750

“Employees are now allowed to have multiple phone numbers…”DEPTDOC<DEPARTMENT deptid="15" deptname="Sales">

<EMPLOYEE><EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><PHONE>415-010-1234</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE><EMPLOYEE>

<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME><PHONE>406-463-1234</PHONE><SALARY>41250.00</SALARY>

</EMPLOYEE></DEPARTMENT>

15

Denormalization

Sales

EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY

15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 415-010-1234 52750Sales

DEPTNAME

15 10 CHRISTINE SMITH 408-463-4963 52750Sales

DEPTDOC<DEPARTMENT deptid="15" deptname="Sales">

<EMPLOYEE><EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><PHONE>415-010-1234</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE><EMPLOYEE>

<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME><PHONE>406-463-1234</PHONE><SALARY>41250.00</SALARY>

</EMPLOYEE></DEPARTMENT>

• …can simplify a database schema• …can reduce the number of joins• …can simplify application development• …can reduce number of DBMS API calls• …can improve performance

In relational, denormalizationintroduces redundancy

XML allows denormalizationwithout redundancy

create table dept(deptdoc XML);

16

12 Normalized Relational Tables representing 250K External Entities

XML

1 XML column representing 250K External Entities

Java ProgramPerforms 15SQL Selects

Java ProgramPerforms 1XML Select

3.2 million entities retrieved in 15 minutes at same server CPU utilization

5 million entities retrieved in 15 minutes at same server CPU, and much lower client CPU utilization

Two Equivalent Data RetrievalsComparing Normalized Relational and XML in a Customer Scenario

::

55% higherthroughput

18

Typical XML Collection Structure• Not either XML or relational• Hybrid database schema• Simple Storage Structure• Small number of tables• Few fixed fields in relational

columns• Variable data in XML• DB2 can extract selected XML

elements in relational columns if needed

Key1 Key2 XML

… … <DEPARTMENT deptid="15" deptname="Sales"><EMPLOYEE>

<EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE><EMPLOYEE>

<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME> <SALARY>41250.00</SALARY>

</EMPLOYEE></DEPARTMENT>

… … <DEPARTMENT deptid="33" deptname="HR"><EMPLOYEE>

<EMPNO>17</EMPNO><FIRSTNAME>Mary</FIRSTNAME><LASTNAME>JONES</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE></DEPARTMENT>

… … …

TABLE

19

Typical XML Collection Structure• Not either XML or relational• Hybrid database schema• Simple Storage Structure• Small number of tables• Few fixed fields in relational

columns• Variable data in XML• DB2 can extract selected XML

elements in relational columns if needed

• Can also define relational views over portions of XML

Key1 Key2 EmpID FNname LNname … …

15 … 10 Christine Smith … …

15 … 27 Michael Thompson … …

14 33 Mary Jones … …

… … … … … … …

Key1 Key2 XML

15 … <DEPARTMENT deptid="15" deptname="Sales"><EMPLOYEE>

<EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE><EMPLOYEE>

<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME> <SALARY>41250.00</SALARY>

</EMPLOYEE></DEPARTMENT>

14 … <DEPARTMENT deptid="14" deptname="HR"><EMPLOYEE>

<EMPNO>33</EMPNO><FIRSTNAME>Mary</FIRSTNAME><LASTNAME>JONES</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>

</EMPLOYEE></DEPARTMENT>

… … …

CREATE VIEW…TABLE

VIEW

20

Agenda• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases

• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…

• Summary

21

XML Use Cases - Categories• XML at the Core of Application Functionality

• Forms • SOA, Web Services• Data exchange with government and regulatory bodies

• Auditing & Compliance• Event logging / Application logging• Persisting non-XML data for audit purposes

• Adding Flexibility• Flexible Dimensions in Data Warehouses• Custom/user-defined fields in packaged applications• Configuration files and properties

• Metadata• XML for metadata • XML annotations for metadata

• XML Content Management • Creating business reports• Document management

and more…

22

XML Use Cases - Categories• XML at the Core of Application Functionality

• Forms • SOA, Web Services• Data exchange with government and regulatory bodies

• Auditing & Compliance• Event logging / Application logging• Persisting non-XML data for audit purposes

• Adding Flexibility• Flexible Dimensions in Data Warehouses• Custom/user-defined fields in packaged applications• Configuration files and properties

• Metadata• XML for metadata • XML annotations for metadata

• XML Content Management • Creating business reports• Document management

and more…

23

• Schema Diversity• In relational: complex relational database schema

and complex application logic• In XML: no fixed schema required

• Sparse Data• In relational: very many NULL values• In XML: space savings (omit optional elements)

• Schema Evolution• In relational: costly, time consuming, normalization• In XML: much less costly, no need to alter database tables

Core: Forms Processing<USAddress>– <Address>

<AddressLine1>PO BOX 228</AddressLine1><City>SCHENECTADY</City><State>NY</State><ZIPCode>123080000</ZIPCode></USAddress>

– </Address></USAddress></StateOfIncorporation><HdrCode>

– <FederalReturnFiledOther>String</FederalReturnFiledOther><FilerClassificationCode>AA3</FilerClassificationCode><FormType>CT5</FormType><ReturnTypeCode>CT5</ReturnTypeCode><SoftwareDeveloper><DeveloperName>

<BusinessNameLine1>Sunrise Investments Inc</BusinessNameLine1><BusinessNameLine2>A A</BusinessNameLine2>

XML enables end-to-end forms processing:

XMLrelational

XML XML XML

SQL/XMLXQueryXML

relational

XML XML XML

SQL/XMLXQuery

24

Application 1 Application 2

XMLAdapter

DB2 pureXML

XML

ApplicationA

ApplicationB

ApplicationM

Operational Systems/Heterogeneous Data Producers

Core: XML – The Backbone of SOA 1

XML

XML

Application NApplication 3

XMLAdapter

Service Bus

XMLXML XML

Stored procedures

XMLAdapter

XMLXML

Data Consumers

XML

XML • All SOA messages and business objects are XML• Standard XML format…

• …eases the integration of heterogeneous sources• …avoids implementing n x m conversions• …simplifies the integration of new applications (A2A)

• Examples: OAGIS, FpML, Origo• SQL/XML Stored Procedures implement services

• Simplifies application layer, reduces JDBC calls

Service Consumers

Service Providers

25

Application 1 Application 2

XMLAdapter

DB2 pureXML

XML

ApplicationA

ApplicationB

ApplicationM

XML

XML

Application NApplication 3

XMLAdapter

Service Bus

XMLXML XML

Stored procedures

XMLAdapter

XMLXML XML

XML

• XML manipulation on parsed XML in DB2 with declarative languages (SQL/XML, XQuery)

• Parse once, access many• Better end-to-end performance and lower CPU than

XML manipulation in the middle tier

Reduced CPU Usagein Middle Tier

Operational Systems/Heterogeneous Data Producers

Core: XML – The Backbone of SOA 2

Data Consumers

Service Consumers

Service Providers

26

Audit: Application and Event Logging• Requirement: Record "everything" that's happening, e.g. in online

banking, investment brokerage, insurance applications• Trace every step a customer takes, by date/sessionID/userID etc.• Log every event in an A2A or B2B environment

• Purpose:• Functional: application level monitors, used by call-centers, audits, etc.• Technical: system and infrastructure trouble shooting, used by IT

• Data Characteristics• "Event records" in XML format, common header fields, highly variable

body that cannot be represented in relational format• Header fields in relational columns, body in XML in the same row

XML

27http://www.ibm.com/developerworks/websphere/techjournal/0608_shetty/0608_shetty.html

Audit: Logging SOA Messages• DB2 pureXML logs all SOAP messages

regardless of structure• Troubleshooting and traceability

• Use XML indexes, XQuery, SQL/XML to analyze the log

28

DB2 Data Warehouse

Time

Sales

Product

Store

(Fact table)

XML

(Dimension table)

(Dimension table)

(Dimension table)

Flexibility: Dimensions in a Data Warehouse• Traditional relational warehouse

• Dimension table have fixed columns• Often hard to change/extend the table

definitions• Cannot add new columns for every new

characteristic of every new product• Limited agility in business reporting

• XML-enriched warehouse• Add attributes to products,

customers, etc. through XML insert and update operations

• Flexibility without adding columns• No impact to existing queries, but

allows new queries/reports to be introduced quicklyCustomer

(Dimension table)

29

Flexibility: Augment Applications with Temp Data Store

RelationalTables

Package ApplicationJSON

identification basic contact info & addressescontact persons & addressesagencycollaboratorsNomenclatures

XML(JSONx)

WebServer

Complex mappings

JSON <-> Relational

Simple mappings

JSON <-> JSONx

HTML Templates

DB2

XQuery

• Application uses JSON to communicate with User Interface• Benefits of storing JSON as JSONx in DB2 pureXML

• Can see what is happening in the application debugging • Can query the temporary data to understand application usage• Can monitor how application usage changes over time

• Variation on Audit: Logging JSONx messages

30

Content: Creating Business Reports A Sample Architecture

• System is simple and easy to understand • Less code - Many SQL stored procedures

became one XQuery statement • Quick changes can be made, e.g,

• When interfaces change application changes require half a day

• Changes are mostly in declarative languages which sometimes do not even require a recompile

• Easy access to report status • Because all state info and interim reports are

in XML, a Web based viewer is used • Piece-meal and parallel development possible

• Because of architected XML interfaces between components

• Different versions of reports handled concurrently • Because report versions and states are all

treated as well formed XML and are stored and queried together

• Source data and final reports are XML-based• DB2 pureXML simplifies data gathering, assembly, & validation• Bonus:

• Consistent and straightforward system configuration Because the applications configuration files are in XML

DataReadiness

DataCompilation &

Validation

ReportRendering Publishing

DataRetrieval

Data Readiness and

Queue Loader

Workflow ControllerJobScheduling

DataValidation

ReportAssembly Publish

Business

Technical

pureXML

Many Data Sources

ReportRender

XML XML XML XML XML XML

Report QueueViewer

DataAssembly

XML

XML

XML

XML

DataReadiness

DataCompilation &

Validation

ReportRendering Publishing

DataRetrieval

Data Readiness and

Queue Loader

Workflow ControllerJobScheduling

DataValidation

ReportAssembly Publish

Business

Technical

pureXML

Many Data Sources

ReportRender

XML XML XML XML XML XML

Report QueueViewer

DataAssembly

XML

XML

XML

XML

31

Agenda• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases

• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…

• Summary

32

Summary 1• DB2 pureXML:

• Native XML storage, indexing and processing in DB2• Access and update through SQL/XML, XQuery, XPath• Based on well-formed XML simplifying schema evolution

• When to store XML in a database?• When business objects are typically represented as

XML when transmitted • When data format (schema) changes over time• When schemas from different sources are combined to

represent an external entity• When schema is complex and highly variable

33

Summary 2• Use Cases include:

• XML at the Core of Application Functionality, e.g., forms • Auditing & Compliance, e.g., event logging• Adding Flexibility, e.g., flexible dimensions in data warehouse • XML Content Management, e.g., reporting

• Use Cases Audit and Flexibility are sensible to start with• Why XML in a database?

• Simpler to design (no data normalisations)• Speedier to develop (less mappings)• Faster to execute (less joins and shredding)• Quicker to evolve (less database structure changes)• Easier to understand (consistent data structures) • Straightforward to prototype (less customization)• Smooth integration with Web technologies (XML based)

34

Further Information 1• DB2 pureXML Wiki:

http://www.ibm.com/developerworks/wikis/display/db2xml/Technical+Papers+and+Articles

• DB2 pureXML Devotees: http://www.ibm.com/developerworks/wikis/display/db2xml/devotee

• XML Database Blog: http://nativexmldatabase.com/

• "pureXML™ in DB2 9: Which way to query your XML Data?" http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606nicola/

• "XMLTABLE by Example", Part 1 & 2 http://www.ibm.com/developerworks/db2/library/techarticle/dm-0708nicola/ http://www.ibm.com/developerworks/db2/library/techarticle/dm-0709nicola/

• “Update XML in DB2 9.5”: http://www.ibm.com/developerworks/db2/library/techarticle/dm-0710nicola/

• 15 best practices for pureXML performance in DB2 9 http://www.ibm.com/developerworks/db2/library/techarticle/dm-0610nicola/

• Performance of DB2 9 pureXML vs. CLOB and shredded XML storage http://www.ibm.com/developerworks/db2/library/techarticle/dm-0612nicola/

• XML Database Benchmark: “Transaction Processing over XML (TPoX)” http://tpox.sourceforge.net/ , http://tpox.sourceforge.net/tpoxresults.htm

35

Further Information 2• Get started with Industry Formats and Services with pureXML

http://www.ibm.com/developerworks/data/library/techarticle/dm-0705malaika/ • Industry Bundles for pureXML http://www.alphaworks.ibm.com/tech/purexml/download• Universal Services for pureXML using Data Web Services

http://www.ibm.com/developerworks/data/library/techarticle/dm-0805malaika/ • Lotus Forms, XForms, and DB2 9 pureXML

http://www.ibm.com/developerworks/wikis/download/attachments/1824/LotusFormsXForms andDB2pureXML.pdf

• Build a pureXML and JSON application, Parts 1-3 http://www.ibm.com/developerworks/xml/library/x-db2JSONpt1/index.html?ca=drs-

• WebSphere DataPower and DB2 pureXML, Parts 1-2: DB2 pureXML as an audit log for WebSphere DataPower http://www.ibm.com/developerworks/data/library/techarticle/dm- 0806malaika

• IBM WebSphere Developer Technical Journal: Using DB2 Version 9 pureXML with WebSphere ESB http://www.ibm.com/developerworks/websphere/techjournal/0608_shetty/0608_shetty.html

• Using industry standard data formats with WebSphere ESB and DB2 Version 9 pureXML http://www.ibm.com/developerworks/websphere/techjournal/0706_elhilaly/0706_elhilaly.html

• Processing SEPA payments using DB2 pureXML and WebSphere Transformation Extender http://www.ibm.com/developerworks/data/library/techarticle/dm-0904sepapayments/

• IBM InfoSphere DataStage and DB2 pureXML, Parts 1-2: Integrate XML operational data into a data warehouse http://www.ibm.com/developerworks/views/data/libraryview.jsp?search_by=purexml+infosphere+s eries

36

• Comprehensive coverage of pureXML in DB2 for Linux, UNIX, Windows and DB2 for z/OS

• http://tinyurl.com/pureXML

37

pureXML Bootcamps – World-Wide• A 2 or 3 day pureXML Bootcamp – Run on demand – in

many cities globally• Includes tutoring on XQuery, SQL/XML, XML storage and

indexing, XML schema handling, universal services and more:

• The bootcamp has plenty of hands on labs to practice – including building a complete application

• Feedback includes “Fantastic – Fast paced”, “I really enjoyed the bootcamp”.

• See Q&A’s and photos here https://www.ibm.com/developerworks/wikis/display/ db2xml/BootCamps

• Watch the pureXML forum for announcements of upcoming bootcamps http://www.ibm.com/developerworks/forums/forum. jspa?forumID=1423

• Contact malaika@us.ibm.com if you would like a bootcamp run in your area

• Join the pureXML Devotee mailing list http://www.ibm.com/developerworks/wikis/display/ db2xml/devotee

38

Backup

39

Core: XML End-To-End – pureXML and Forms

Universal Services for pureXML using Data Web Services:http://www.ibm.com/developerworks/db2/library/techarticle/dm-0805malaikaForms and pureXML: http://www.ibm.com/developerworks/db2/library/techarticle/dm-0805malaika2

40

40

Relational Views over XML

<dept bldg="101"><employee id="901">

<name><first>John</first><last>Doe</last>

</name><office>344</office>

</employee><employee id="902">

<name><first>Peter</first><last>Pan</last>

</name><office>216</office>

</employee></dept>

CREATE VIEW empview(empid, firstname, lastname, office)AS SELECT X.* FROM dept,

XMLTABLE ('$DOC/dept/employee' COLUMNS empid INTEGER PATH '@id', firstname VARCHAR(30) PATH 'name/first', lastname VARCHAR(30) PATH 'name/last', office INTEGER PATH 'office') AS X

empid firstname lastname office901 John Doe 344902 Peter Pan 216

create table dept(doc XML);select lastname, officefrom empviewwhere empid = 901;

create index idx1 on dept(doc) generate keys using xmlpattern '/dept/employee/@id as sql double;

TABLE

VIEW

VIEW DEFINITION

SQL Access to XML

top related