data warehousing on system z what is available & how to ... · pdf filefrom business...
TRANSCRIPT
© 2008 IBM Corporation IBM Systems
Data Warehousing on System zWhat is available & How to implement
Expanding System z’s Role in Data Warehouse & Business Intelligence Implementations
Agenda
• Market Trends:From Business Automation to Business Optimization
• Why key markets trends are playing off for System z:Positioning System z strengths for Business Intelligence and Data Serving
• What is available on System z & How to implement
• Summary
Agenda
New Initiatives Increasingly Focus on OptimizationOrganizations Striving for Competitive Advantage
3
BusinessOptimization
InformationAgenda
BusinessAutomation
ApplicationAgenda
Faster Processing, Reduced Costs
Competitive AdvantageBusiness Optimization Growth is 2 Times Faster thanBusiness Automation
$594B5.1% CGR
Call CenterOperations
ERP & Financials
Supply ChainManagement
CustomerProfitability
Financial Risk Insight
5.1% CGR$594B
IT Spending Estimate*
2008
Dynamic DemandPlanning
Leveraging Information to Create Business Value
OLAP & Data Mining
−Merchandising, Inventory,
Operations
Manage and optimizedaily business operations�Many operational users�Unpredictable workload�Mission critical
Manage tactical initiatives to achieve strategic goals�What if analysis�Flexibility
Information On Demand
−Optimize Each Transaction
−Call Centers, Field Ops
Query & Reporting
− Financials, Sales
After the fact analysis�Limitted number of users�Scheduled and predicted workload
Customer Scenarios for Operational BI
Retail – inventory planningInform mangers how products sell
on an hourly base
Retail – inventory planningInform mangers how products sell
on an hourly base
Insurance – Claim processingInform call center agents about
fraudulent claim probability
Insurance – Claim processingInform call center agents about
fraudulent claim probability
Customer ServiceInform call center agents about actual life time value of customer
Customer ServiceInform call center agents about actual life time value of customer
Logistics – Truck fleet planningInform admin about possible
business impact of trucks being late
Logistics – Truck fleet planningInform admin about possible
business impact of trucks being late
Credit Card Service - Card identificationInform OLTP system about
possibly stolen cards, dependent on OLTP transaction profile
Credit Card Service - Card identificationInform OLTP system about
possibly stolen cards, dependent on OLTP transaction profile
Banking – Customer ServiceInform call center agents on
possible cross selling opportunitiesdependent on actual transaction
Banking – Customer ServiceInform call center agents on
possible cross selling opportunitiesdependent on actual transaction
Why key markets trends are playing off for System z:Positioning System z strengths for Business
Intelligence and Data Serving
Key questions for your DW Platform Design
1. Which DBMS to use for Datawarehousing
2. Where to place your Datawarehouse
3. How to solve the biggest challenge for dynamic DW: the mixed workload performancechallenge
4. Plan for growth, the requirement for Scalability
5. TCO for your DW
1. Which DBMS to use for Datawarehousing
“The transactional DBMSs have an edge
that challenges the DW DBMSs (such as Teradata)”Gartner Data Warehouse Magic Quadrant, 2006
Transactional
DBMS
DW
DBMS
Advanced Data Partioning
MQTs
Cubing Services2008
1970s IBM Information Management, 2008
2. Where to place your Datawarehouse
The larger the amount of data, the more likely it is to deploy operational BI centra lly.
Ventana Research 2007
A centralized approach can incorporate Information management to consolidate
• data integration • data transformation• metadata management
rather than have these information processes handled differently by each department or line of business.
2. Where to place your Datawarehouse: DW on z Solution Architecture
Member A
OLTP
CEC One
Member B
OLTP
CEC Two
DB2 DataSharingGroup
CFCF
CEC One CEC TwoCFCF
DB2 DataSharingGroupMember C
DWHMember D
DWH
Within a data sharing environment, the data warehouse resides in the same group as the transactional data.
The mixed workload performance will become the single mostimportant performance issue in DW
3. How to solve the biggest challenge for dynamic DW: the mixed workload performance challenge
The emergence of issues based on a mix of 4 DW work loads– Continous (near-real-time) data loading – similar to an OLTP
workload – Large numbers of standard reports – An incresing number of true ad hoc query users – An increasing level of analytics and BI-oriented functionality in
OLTP
Information On Demand
to Optimize Real-Time Processes
Dynamic Warehousing +
Operational BI
Gartner Data Warehouse Magic Quadrant, 2006
Traditional workload management approach
–Screen queries before they start execution
–Time consuming for DBAs.
–Some large queries slip through the crack.
–Running these queries degrade system performance.
–Cancellation of the queries wastes CPU cycles.
Workload Management
The ideal workload manager policy for data warehousing:
Consistent favoring ofshorter running work........
with select favoring of criticalbusiness users
keep em shortthrough WLM period aging
through WLM explicit prioritization of critical users
Workload Management
discretionary5
43010000004
3401000003
260500002
28050001
ImportanceVelocityDurationPeriod
Period aging can be used to lower the priorities of a query execution after it exceeds a specific amount of CPU time. This way, short running queries run with higher priority while long running queries are not blocking important resources.
Workload Management assigned to Service classes
service class A velocity 50 importance 3
service class B velocity 100 importance 1
1 hour to complete
5 minutes to complete
Some queries (the “CEO” queries) still need to return in a minimum amount of time. Their priority can override the period aging through their service class assignments.
3. How to solve the biggest challenge for dynamic DW: the mixed workload performance challenge
The mixed workload performance will become the single most important performance issue in DW
Information On Demand
to Optimize Real-Time Processes
Dynamic Warehousing
Gartner Data Warehouse Magic Quadrant, 2006
Consistent favoring ofshorter running work........
keep em shortthrough System z WLM period aging
The ideal workload manager Policy for data warehousing
4.Plan for growth, the requirement for Scalability: Scalability with DB2: Single Query Parallelism
CPU Intensive Queries
10 CPU
20 CPU
20 CPU
0
500
1,000
1,500
2,000T
ime
in s
eco
nd
s
92% scalability
1874s
1020s
937s
PerfectScalability
Sysplex
measure
d
SMP
4. Plan for growth, the requirement for Scalability: Scalability with DB2: Multiuser throughput
Query Throughput
0
100
200
300
400
500
600
700
293
551
C
ompl
etio
ns @
3 H
ours
94% scalability
(Sysplex)10 CPU
20 CPU
(30 users)
(60 users)
S M P
Mainframe Cost Per Unit of Work Goes Down as Workload Increases
Data Center Workload
Cos
t per
uni
t of w
ork
Mainframe
Distributed scale out
Most TCO benchmarks compare single applications
Most businesses operate here
Worldwide IT spending trend is playing off for system z
Source: IDC, Virtualization 2.0: The Next Phase in Customer Adoption, Doc #204904, Dec 2006
$0
$50
$100
$150
$200
$250
$300
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
Installed Base(M Units)
Spending(US$B)
0
5
10
15
20
25
30
35
40
45
50
1996
Power and cooling costs
Server mgmt and admin costs
New server spending
Worldwide IT Spending TrendWorldwide IT Spending Trend
$0
$50
$100
$150
$200
$250
$300
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
Installed Base(M Units)
Spending(US$B)
0
5
10
15
20
25
30
35
40
45
50
1996
$0
$50
$100
$150
$200
$250
$300
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
Installed Base(M Units)
Spending(US$B)
0
5
10
15
20
25
30
35
40
45
50
1996
$0
$50
$100
$150
$200
$250
$300
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
Installed Base(M Units)
Spending(US$B)
0
5
10
15
20
25
30
35
40
45
50
1996
Power and cooling costs
Server mgmt and admin costs
New server spending
Power and cooling costs
Server mgmt and admin costs
New server spending
Worldwide IT Spending TrendWorldwide IT Spending Trend
2000 – Raw processing Raw processing ““horsepowerhorsepower”” is the primary is the primary goal, while the infrastructure goal, while the infrastructure to support it is assumed to support it is assumed readyready
2006 –– Raw processing Raw processing ““horsepowerhorsepower”” is a given, but is a given, but the infrastructure to support the infrastructure to support deployment is a limiting factordeployment is a limiting factor
Power and cooling spend may eventually exceed new s erver spending
Management/Administration costs are already exceedi ng new server spending
Technology Evolution with Mainframe Specialty Engines
Internal Coupling Facility (ICF) 1997
Integrated Facility for Linux (IFL) 2001
IBM System z9 Integrated Information Processor (IBM zIIP) 2006
System z9 Application Assist Processor (zAAP) 2004
�Building on a strong track record of technology innovation with specialty engines, IBM is introducing the System z9 Integrated Information Processor
�Support for new workloads and open standards
� Designed to help improve resource optimization for eligible data workloads within the enterprise
� Centralized data sharing across
mainframes
� Incorporation of JAVA into existing mainframe solutions
• Portions of the following DB2 for z/OS V8 workloads may benefit from zIIP*:
2 - Data warehousing applications*• Requests that utilize parallel queries
3 - DB2 for z/OS V8 utilities LOAD, REORG & REBUILD*• DB2 utility functions used to maintain index maintenance structures
1 - ERP, CRM, Business Intelligence or other enterprise applications• Via DRDA over a TCP/IP connection (enclave SRBs, not stored procedures or UDFs)
* The zIIP is designed so that a program can work w ith z/OS to have all or a portion of its enclave Se rvice Request Block (SRB) work directed to the zIIP. The above t ypes of DB2 V8/9 work are those executing in enclav e SRBs, of which portions can be sent to the zIIP.
DB2 V8 exploitation of IBM zIIP value add
New Specialty Engine
Measured zIIP offload for a DW workload
0 10 20 30 40 50 60 70 80 90
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
% Offload to zIIP
Hardware-assisted data compression
0
20
40
60
80
100
Com
pressed Ratio
Compressed Non-compressed
53%46%
61%
I/O Intensive CPU Intensive
281
157
283
200
354
412
Elapsed tim
e (sec)
Non-Compressed Compressed 60%I/O Wait I/O WaitCPU
CPU
Compress CPUOverhead
Effects of Compression on
Elapsed Time
Compression Ratios
Achieved
In V9Indexes
Can also be compressed
Why DW on z
• Requirements for DWH / BI users are changing :– there is an increasing demand for accessing warehouse data in
a near-real-time way– Business intelligence is becoming integrated into operational
processes – Data security, availability requirements are increasing – Large groups of users – Workload management capabilities of z/OS allow optimization
of system performance and prioritization of large volumes of queries, and workloads (OLTP, DWing, batch – ETL)
– Linux on System z offers brings software options to system z– The ETL process to bring operational data into a DWH has to
be performed more often and becomes more and more time critical
Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture
DB2 DataSharingGroup
WebSphere Classic Federation
LegacyDataSource
LegacyDataSource
WebSphere Classic Data Ev entPublisher
DataStage
Quality Stage
MetaData
Inf ormationServ er
Insert/Update
Extract/Query
SOA Serv er
WebSphere MQ
z/OS zLinux/xLinux/AIX...
ODBC
Analy tics andReporting
BI Query Workload
Data warehouse, Transf ormationand Deploy ment
Fed
erat
ion
and
Cha
nge C
aptu
re
Best Practises: Solution Architecture on System z
Member A
OLTP
CEC One
Member B
OLTP
CEC Two
DB2 DataSharingGroup
CFCF
CEC One CEC TwoCFCF
DB2 DataSharingGroupMember C
DWHMember D
DWH
Within a data sharing environment, the data warehouse resides in the same group as the transactional data.
Best Practises: Initial load of the Data Warehouse
Member A
OLTP
Member B
OLTP
CEC One CEC TwoCFCF
Member C
DWH
Member D
DWH
ETL Accelerator
Extract
Load
Software AGAdabas
VSAM,IAM &
sequential
CA IDMS
CA Datacom
IMS
Classic Federation Server
z/OS
DB2 UDBfor LUW
Oracle SQL Server
Distributed Data Sources- Integration -The data is extracted from the OLTP information of the data
sharing group, transformed by DataStage and then loaded into the data warehouse tables again.
Legacy data sources are integrated through the Classic Federation Server. This way data is extracted from DataStagedirectly out of the data sources like IMS, VSAM…
Distributed data sources are directly integrated through the IBM Information server.
Best Practises: At runtime, the DWH is updated incrementally
Member B
OLTP
CEC One CEC TwoCFCF
Member C
DWH
Member D
DWH
DB2
VSAMSoftware AGAdabas
CaptureIMS
Data Event Publisher
CA-IDMS
z/OS
DB2 UDBfor LUW
Oracle
Data Event Publisher
Capture
Distributed
MQ
MQ
MQ
Update & Insert
Once, the data warehouse is loaded, only incremental updates are performed. The changes on the original data sources (z/OS and distributed) are captured by WebSphere Data Event Publisher and sent through MQ to the ETL Accelerator. Special DataStage stages take the change information from MQ and updates or inserts are performed on the data warehouse data. This might of course be delayed to batch windows by just receiving the changes and only updating the warehouse at a later point in time.
Member A
OLTP
Cha
nge
Cap
ture
ETL Accelerator
Best Practises: In Database ELT is triggered by DataStage
Member A
OLTP
Member B
OLTP
CEC One CEC TwoCFCF
Member C
DWH
Member D
DWH
Simple example:
-- Aggregate by salary by department into AGGRSALARY
INSERT INTO AGGRSALARY ( DEPTCODE, AVGBAND, AVGSALARY )SELECT DEPTCODE, AVG( BAND ) AS AVGBAND, AVG( SALARY ) AS AVGSALARY FROM STAFFGROUP BY DEPTCODE
Wherever possible, “In Database” transformations (ELT) are used to spare the transport of the data to the accelerator. But the used SQL is still sent from the ETL Accelerator to the database to have one place of documentation for all ETL steps. This can also be used to shift the data up the hierarchy within the Layered Data Architecutre.
SQLELT
ETL Accelerator
Summary
Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture
DB2 DataSharingGroup
WebSphere Classic Federation
LegacyDataSource
LegacyDataSource
WebSphere Classic Data Ev entPublisher
DataStage
Quality Stage
MetaData
Inf ormationServ er
Insert/Update
Extract/Query
SOA Serv er
WebSphere MQ
z/OS zLinux/xLinux/AIX...
ODBC
Analy tics andReporting
BI Query Workload
Data warehouse, Transf ormationand Deploy ment
Fed
erat
ion
and
Cha
nge C
aptu
re
DB2 for z/OS features that support DWHing
• 64-Bit Addressability
• 2000 byte index keys
• MQT’s
• Multi-Row Operations
• 225 way table joins
• In-Memory Workfiles
• Automatic Space Allocation
• Non-Uniform Distribution Statistics on Non-Indexed Columns
• Parallel Sorting
• Data Partitioned Secondary Indexes
• 2MB SQL Statements
• Etc.
• Partition by growth
• Index Compression
• Dynamic index ANDing
• New row internal structure for faster VARCHAR processing
• Fast delete of all the rows in a partition
• Deleting first n rows
• Skipping uncommitted inserted/updated qualifying rows
• Etc.
A whitepaper can be downloaded from: http://www.ibm.com/software/sw-library/en_US/detail/A016040Z53841K98.html
Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture
DB2 DataSharingGroup
WebSphere Classic Federation
LegacyDataSource
LegacyDataSource
WebSphere Classic Data Ev entPublisher
DataStage
Quality Stage
MetaData
Inf ormationServ er
Insert/Update
Extract/Query
SOA Serv er
WebSphere MQ
z/OS zLinux/xLinux/AIX...
ODBC
Analy tics andReporting
BI Query Workload
Data warehouse, Transf ormationand Deploy ment
Fed
erat
ion
and
Cha
nge C
aptu
re
Unix, Linux, Wintel, zLinux*
Reports
Websphere
Information Service Framework
Logging
Authorization
ConnectorAccess
Services
Scheduling
Metadata
Reporting AccessAnalysis
WAS
* Q4, 2007
DB2
Metadata
PX Engine
DataStage Server
DSEngine
Data Event Publisher
Classic FederationServer
SEQ. VSAM IDMS Adabas Datacom
WebSphereMQ
DB2
SQL
SFTP or Batch Pipes
z/OS
IMS
z/OSConnectors
Non-z/OSConnectors
Oracle, SQL Server, …
DS,QS ClientWeb Console
DS,QS
IA Client
IA WISD
WISD Client
ADMIN, BG,
Reports
Meta Brokers
Windows
MetadataWorkbench
� Rich connectivity with:� Application & distributed data� DB2 for z/OS� Classic Federation for more z-data� Data Event Publishers for CDC� Optional DB2 for z/OS repository
� Continued native z/OS deployment options
� Common reusable services framework leverages the power of a SOA environment
� Meta data repository promotes:� reuse� compliance to standards� visual lineage� impact analysis
PX Engine(USS)
IMS
VSAM
SEQ.
z/OS Operators
New Linux for z deployment option� Robust, parallel processing� Hipersocket connectivity to z data� Full Information Server suite:
QualityStage, Information Analyzer…� Minimal impact on z/OS costs:
Leverages IFLs and zIIPs
Run Time Library DSMVS DSMVS Jobs
DS MVS UPLOADCOBOL,
andJCL
Metadata
DB2
Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture
DB2 DataSharingGroup
WebSphere Classic Federation
LegacyDataSource
LegacyDataSource
WebSphere Classic Data Ev entPublisher
DataStage
Quality Stage
MetaData
Inf ormationServ er
Insert/Update
Extract/Query
SOA Serv er
WebSphere MQ
z/OS zLinux/xLinux/AIX...
ODBC
Analy tics andReporting
BI Query Workload
Data warehouse, Transf ormationand Deploy ment
Fed
erat
ion
and
Cha
nge C
aptu
re
Read-from and write-to mainframe data sources using SQL from Unix, Windows, Linux and JVM platforms
• Standard ODBC/JDBC SQL interfaces– ODBC 3.5.1, JDBC 2.1 (some 3.0) type 4
• Metadata-driven means:– No mainframe programming
required Fast installation &– Configuration Ease of maintenance
• Works with existing and new:– Mainframe infrastructure– Application infrastructure– Toolsets
• Deliver mainframe data to– Self-service portals (real-time account details)– e-commerce solutions (real-time inventory)– Reporting and analytical tools, e.g. Business Objects– Data transformation and cleansing (e.g. DataStage, QualityStage)
for data warehousing, ODS, MDM, CDI, …
WebSphere Classic Federation Server for z/OSThe basics
DB2 UDBfor z/OS
Software AGAdabas
VSAM,IAM &
sequential
CA IDMS
CA Datacom
IMS
Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture
DB2 DataSharingGroup
WebSphere Classic Federation
LegacyDataSource
LegacyDataSource
WebSphere Classic Data Ev entPublisher
DataStage
Quality Stage
MetaData
Inf ormationServ er
Insert/Update
Extract/Query
SOA Serv er
WebSphere MQ
z/OS zLinux/xLinux/AIX...
ODBC
Analy tics andReporting
BI Query Workload
Data warehouse, Transf ormationand Deploy ment
Fed
erat
ion
and
Cha
nge C
aptu
re
WebSphere Classic Data Event Publisher for z/OS
• Real-time &/or background capture & publishing of data changes for:– VSAM files
– IMS databases
– CA-IDMS databases
– Software AG Adabas databases
• IBM owns multiple Event Publishing infrastructures:– WebSphere Data Event Publisher (DB2 on LUW and z/OS)
– WebSphere Classic Data Event Publisher (VSAM, IMS, …)
– WebSphere Business Event Publisher (focus is on PROCESS, some data)
VSAMchanges
IMS
changes
DB2 UDB
for z/OS changes
changes changes
CA-IDMS
changes
WebSphere MQ
AdabasChanges
Data Event Publisher Classic Data Event Publisher
Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture
DB2 DataSharingGroup
WebSphere Classic Federation
LegacyDataSource
LegacyDataSource
WebSphere Classic Data Ev entPublisher
DataStage
Quality Stage
MetaData
Inf ormationServ er
Insert/Update
Extract/Query
SOA Serv er
WebSphere MQ
z/OS zLinux/xLinux/AIX...
ODBC
Analy tics andReporting
BI Query Workload
Data warehouse, Transf ormationand Deploy ment
Fed
erat
ion
and
Cha
nge C
aptu
re
Available since March 2007: IBM DataQuant• Adds compelling new Warehouse/Business Intelligence component to WH on z
– Visual Dashboards, Enhanced Graphical Reporting & Analytics, Security and Personalization, SOA Layer
– Adds over 100 analytical functions to queries, reports and visual solutions• Extends page-based QMF ‘visual reports’ with a comprehensive array of charts
and graphical objects• Transforms text-based QMF reports into highly graphical, information-rich
documents
Analytics with DB2 Alphablox
• Provide reporting and dashboard capabilities on key indicators
• Leverage operations such as ranking, ordering, filtering, trending, and other sophisticated statistical functions and calculations
• Drive data analysis from multiple data sources, both relational and multidimensional
DataWarehousing on system z – IBM at your help
• Whitepaper : Why Data Warehousing on System zavailable in the WEB http://www-306.ibm.com/software/data/db2bi/systemz.html
• DW on system z – Demo available in the Technical Marketing Competence Center, Böblingen, Germany, [email protected]
• DW on system z – Customer Briefings possible in the Technical Marketing Competence Center, Böblingen,[email protected]
• DW on system z – ‘Redbook ’ available