demonstrating near real-time analytics with ibm db2 analytics

4
491 Demonstrating Near Real-Time Analytics with IBM DB2 Analytics Accelerator Daniel Martin, Iliyana Ivanova, Raphael Mueller, Luis Eduardo Velez Montoya, Klaus Maruschka IBM DB2 Analytics Accelerator Development IBM Deutschland Research & Development GmbH Schoenaicherstrasse 220 71032 Boeblingen, Germany [email protected], [email protected], [email protected] [email protected], [email protected] Abstract: Version 3 of the IBM 1 DB2 Analytics Accelerator (IDAA) takes a major step towards the vision of a universal relational DBMS that transparently processes both, OLTP and analytical-type queries in a single system. Based on heuristics in DB2 for z/OS, the DB2 optimizer decides if a query should be executed by ”mainline” DB2 or if it is beneficial to forward it to the attached IBM DB2 Analytics Optimizer that operates on copies of the DB2 tables. The new ”incremental update” functionality keeps these copy tables in sync by employing replication technology that monitors the DB2 transaction log and asynchronously applies the changes in micro-batches to IDAA. This enables near real-time analytics over online data, effectively marrying tra- ditionally separated OLTP and data warehouse environments. With IDAA, reports can access data that is constantly refreshed in contrast to traditional warehouses that are updated on a daily or even weekly basis. Without any changes to the applications and without the need to introduce cross-system ETL flows, an existing OLTP environment can be used for reporting purposes as well. In this demo, we present a near real- time reporting application modeled on an industry benchmark (TPC-DS), but with a constantly changing set of tables with over 800 million rows that is running on DB2 for z/OS. In a browser-based user interface, demo attendants can influence the rate of changes to the tables and observe how the reporting queries are capturing new data as it is being modified by a separately running OLTP workload generator. 1 Introduction IBM DB2 Analytics Accelerator (IDAA) version 3 2 is an evolution of IBM Smart Analyt- ics Optimizer (ISAO) version 1[SBS + 11] which was using the BLINK in-memory query engine[RSQ + 08, BBC + 12]. Since version 2, the query engine and hardware used as the 1 IBM, DB2, and z/OS are trademarks of International Business Machines Corporation in USA and/or other countries. Other company, product or service names may be trademarks, or service marks of others. All trade- marks are copyright of their respective owners. 2 http://www-01.ibm.com/software/data/db2/zos/analytics-accelerator/

Upload: hakhanh

Post on 01-Jan-2017

227 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Demonstrating Near Real-Time Analytics with IBM DB2 Analytics

491

Demonstrating Near Real-Time Analytics with IBM DB2Analytics Accelerator

Daniel Martin, Iliyana Ivanova, Raphael Mueller,Luis Eduardo Velez Montoya, Klaus Maruschka

IBM DB2 Analytics Accelerator DevelopmentIBM Deutschland Research & Development GmbH

Schoenaicherstrasse 22071032 Boeblingen, Germany

[email protected], [email protected], [email protected]@de.ibm.com, [email protected]

Abstract: Version 3 of the IBM1 DB2 Analytics Accelerator (IDAA) takes a majorstep towards the vision of a universal relational DBMS that transparently processesboth, OLTP and analytical-type queries in a single system. Based on heuristics in DB2for z/OS, the DB2 optimizer decides if a query should be executed by ”mainline” DB2or if it is beneficial to forward it to the attached IBM DB2 Analytics Optimizer thatoperates on copies of the DB2 tables. The new ”incremental update” functionalitykeeps these copy tables in sync by employing replication technology that monitorsthe DB2 transaction log and asynchronously applies the changes in micro-batches toIDAA. This enables near real-time analytics over online data, effectively marrying tra-ditionally separated OLTP and data warehouse environments. With IDAA, reports canaccess data that is constantly refreshed in contrast to traditional warehouses that areupdated on a daily or even weekly basis. Without any changes to the applications andwithout the need to introduce cross-system ETL flows, an existing OLTP environmentcan be used for reporting purposes as well. In this demo, we present a near real-time reporting application modeled on an industry benchmark (TPC-DS), but with aconstantly changing set of tables with over 800 million rows that is running on DB2for z/OS. In a browser-based user interface, demo attendants can influence the rate ofchanges to the tables and observe how the reporting queries are capturing new data asit is being modified by a separately running OLTP workload generator.

1 Introduction

IBM DB2 Analytics Accelerator (IDAA) version 32 is an evolution of IBM Smart Analyt-ics Optimizer (ISAO) version 1[SBS+11] which was using the BLINK in-memory queryengine[RSQ+08, BBC+12]. Since version 2, the query engine and hardware used as the

1IBM, DB2, and z/OS are trademarks of International Business Machines Corporation in USA and/or othercountries. Other company, product or service names may be trademarks, or service marks of others. All trade-marks are copyright of their respective owners.

2http://www-01.ibm.com/software/data/db2/zos/analytics-accelerator/

Page 2: Demonstrating Near Real-Time Analytics with IBM DB2 Analytics

492

IBM DB2 Analytics

Accelerator

SPU

Netezza

Process

HW

Monitoring

OS

Security

SPU SPU ...

Queries & heartbeat via DRDAQueries and

Maint. Tasks

HW Failure

Notifications

Service

Access

Stored Procedures

DDF

CatalogMatching

SQL Rewrite

Admin tasks (i.e. LOAD) via DRDA

OSA-Express3

10 GbE

Primary

Backup

OLTP workload

Analytical workload

Administration

IBM zEnterprise EC12 IBM Netezza 1000

Figure 1: IDAA components and interaction with System z

basis for IDAA is Netezza 1000 (also known as ”TwinFin”)3. Figure 1 gives an overviewof the involved components: IDAA is an appliance add-on to DB2 for z/OS running onan IBM zEnterprise 196 or EC12 mainframe, it comes in form of a purposely-built soft-ware and hardware combination that is attached to the mainframe via redundant 10GBfiber-channel connections to allow DB2 for z/OS to transparently offload scan-intensivequeries. The acceleration factor compared to DB2 for z/OS standalone for such kind ofqueries can be up to 2000, because IDAA processes table scans on all of its disks in par-allel, leveraging FPGAs to apply decompression, projection and restriction operations.Pushing these operations down to the level of the disk dramatically decreases the over-all size of data required to be shipped over the internal network fabric and between themain parallel processing units (called SPU in Figure 1) that process joins, aggregations orcomplex expressions.

It is very important to note that nothing has to be changed on existing applications usingDB2 for z/OS in order to get these benefits: they are connecting to DB2 for z/OS and areusing the DB2 SQL syntax. In fact, there is no indication to an application if a query wasprocessed by ”mainline” DB2 or by IDAA. Based on how many fetch operations are esti-mated for a given query and how well existing indices match the predicates of that query,DB2 decides whether to process the query by itself or if it should be offloaded to IDAA.Deep integration into existing DB2 components ensures that only minimal training is re-quired to operate IDAA with DB2 for z/OS as compared to DB2 for z/OS standalone. DB2for z/OS is the owner of the data; data maintenance, backup and monitoring procedures donot change. Because IDAA operates on copies of the tables in DB2 for z/OS, whenever

3http://www-01.ibm.com/software/data/netezza/1000/

Page 3: Demonstrating Near Real-Time Analytics with IBM DB2 Analytics

493

there are changes to these tables, the changes need also to be reflected in IDAA. Currently,this is done by running DB2 UNLOAD utilities to either refresh an entire table in IDAAor by refreshing the changed partitions of that table.

The vision of IDAA is to allow reporting over ”online” data, providing a solution thatcombines the features of traditionally separated OLTP and reporting systems that are con-nected by ETL jobs. For such scenarios, the granularity of the aforementioned refreshmechanisms is too coarse: if the total size of the changes is very low but the changes itselfare spread over multiple partitions (or the tables itself are not partitioned), re-loading themajority of partitions or an entire table causes a lot of unnecessary work. Also, the practi-cal minimum latency that can be achieved by running the refresh procedures is in the rangeof hours, it is impractical to run the UNLOAD-based refresh procedures every hour or evenless than that. IDAA version 3 introduces a feature called ”Incremental Update” that of-fers a mechanism to refresh the copy tables in IDAA by asynchronously monitoring theDB2 transaction log for changes. The changes are staged in memory and are transferredover the network to IDAA once their associated unit of work has been committed. Forefficiency reasons, IDAA applies the changes it receives in ”micro batches” that typicallycontain about 60 seconds of data from committed transactions sent from the log captureprocess on DB2 for z/OS. Multiple changes to a row within a micro batch are consolidatedto a single change during the 60 seconds collection phase.

2 Demo Scenario

The demo is running on an IBM zEnterprise 196 mainframe running DB2 for z/OS whichis connected to IDAA on a Netezza 1000-3 hardware. The IDAA hardware being used isa quarter-rack system that internally uses 24 1TB disks, 3 IBM HS-22 blades with 2 IntelWestmere 2.4Ghz 4-core CPUs and 3 FPGA daughter-boards on the Intel-blades with 2 4-core Xilinx FPGAs each. Communications in this FPGA-x86 cluster are coordinated usingtwo IBM 3650 M3 hosts with 2 Intel Nehalem 2.4 Ghz CPUs each. The demo workloadis based on TPC-DS4 with the fact table (STORE SALES) at a size of about 800.000.000rows. Figure 2 depicts the demo scenario: a web interface allows users to run TPC-DS-based reporting queries and view the results as the queries are being processed. A secondoption allows users to start an OLTP workload generator that modifies the data by runninghigh-frequent insert transactions on the tables. While the OLTP workload is running,users can again submit the reporting queries and observe that the reports are constantlychanging as they are picking up the new data from the workload generator. The demo alsoprovides status information about the total number changes that have been captured, thetotal number of changes that have been applied and an estimate of the current observablelatency of the report, i.e. how far back in seconds is the current snapshot of the data onIDAA compared the state of the tables on DB2.

We will also provide a second demo running on this system that focuses on the capability ofthe DB2 for z/OS optimizer heuristic to distinguish between an analytical query that would

4http://www.tpc.org/tpcds/default.asp

Page 4: Demonstrating Near Real-Time Analytics with IBM DB2 Analytics

494

;:9 8753 1/A@? 5>=<O<8A@;:9 8753 1/A@? 5>=<O<8A@

NMLKJAI

H@GO 3G/@FA

3?I=AOI

;:9 HG@ 8753 ;L=L"LIA! ASJS 5;3;:9 HG@ 8753 ;L=L"LIA! ASJS 5;3

;:9 KL=<RA

>@GFAII<KJ

;:9 KL=<RA

>@GFAII<KJ

6330-0*'%#*

!*#30HHECA

6330-0*'%#*

!*#30HHECA

5>A@L=<GKLQ PKLQ?=<FI! ,A>G@=I! 5+P*! )5>A@L=<GKLQ PKLQ?=<FI! ,A>G@=I! 5+P*! )

3<O/QL=A(

NGK=<K/G/I

1/A@?

*@GFAII<KJ

3<O/QL=A(

NGK=<K/G/I

1/A@?

*@GFAII<KJ

?C3*0=0C%'- :!8'%0

%# 500! C0'* *0'- %E=0 20*HE#C #/ H0-03%08 %',-0H

P>>Q<FL=<GK

P>>Q<FL=<GK

1/A@? A'AF/=<GK GK KL=<RA ;:9 &

%+L=AI= (L=L FMLKJAI

% $<JM @A>IGKIA =<OAI HG@ #FGO>QA'6 4/A@<AI

)&0*$ 0"03&%E#C #C 6330-0*'%#* G

%F0'* *0'-D%E=0 20*HE#C #/ 8'%'

% B#@ *0H!#CH0 %E=0H /#* 'C'-$%E3'- >&0*E0H

N/@@AKF? GH (L=L =@<JJA@A( "?&

%*/IM<KJ 2K=A@RLQ

%3=L=/I GH 2KF@AOAK=LQ 0>(L=A #5K75HH6

<0=# ;0,D?C%0*/'30 /#* 68=EC #/

?C3*0=0C%'- :!8'%0G

.KL"QA7;<IL"QA 0>(L=AI

;<I>QL? I=L=/I

-GK<=G@<KJ +GJ (A=L<QI

<0=# ;0,D?C%0*/'30 /#* 68=EC #/

?C3*0=0C%'- :!8'%0G

.KL"QA7;<IL"QA 0>(L=AI

;<I>QL? I=L=/I

-GK<=G@<KJ +GJ (A=L<QI

<0=# ;0,D?C%0*/'30 %# %*EAA0* 97'CA0H /*#=

4#&*30 H$H%0=HG 3=L@=73=G> ;:9 />(L=AI

<0=# ;0,D?C%0*/'30 %# %*EAA0* 97'CA0H /*#=

4#&*30 H$H%0=HG 3=L@=73=G> ;:9 />(L=AI

<0=# ;0,D?C%0*/'30 /#* 10H0%G

% +GL( 3=L@= (L=L GK ;:9 LK( 2;PP

<0=# ;0,D?C%0*/'30 /#* 10H0%G

% +GL( 3=L@= (L=L GK ;:9 LK( 2;PP

<0=# ;0,D?C%0*/'30 /#* )&0*$ ."03&%E#CG

ES 3=L@= ,A>A=<=<RA .'AF/=<GK GH 1/A@? GK 2;PP

9S 3MGD 1/A@? @AI/Q=I! M<JMQ<JM=<KJ =MA RLQ/AI

LHHAF=A( "? =MA FGK=<K/G/I FMLKJAI =@<JJA@A(

H@GO =MA IG/@FA I?I=AOI

CS P= =MA I<(A B IMGD =MA QL=AI= (L=L FMLKJAI LI

=@<JJA@A( H@GO =MA IG/@FA I?I=AOS

47#@ C0'*D*0'-%E=0 + -#@ -'%0C3$ (

<0=# ;0,D?C%0*/'30 /#* )&0*$ ."03&%E#CG

ES 3=L@= ,A>A=<=<RA .'AF/=<GK GH 1/A@? GK 2;PP

9S 3MGD 1/A@? @AI/Q=I! M<JMQ<JM=<KJ =MA RLQ/AI

LHHAF=A( "? =MA FGK=<K/G/I FMLKJAI =@<JJA@A(

H@GO =MA IG/@FA I?I=AOI

CS P= =MA I<(A B IMGD =MA QL=AI= (L=L FMLKJAI LI

=@<JJA@A( H@GO =MA IG/@FA I?I=AOS

47#@ C0'*D*0'-%E=0 + -#@ -'%0C3$ (

Figure 2: IDAA ”Incremental Update” Demo Scenario

benefit when being offloaded to IDAA and transactional queries that run best on DB2for z/OS. The workload generator in this demo simulates a real world business analyticsworkload with a mix of transactional and complex analytical queries (coming e.g. fromIBM Cognos BI5) that are triggered by a number of concurrent users. From the momentIDAA is enabled on the system, demo attendants will notice a drastic decrease in responsetime for the complex analytical queries and a major increase in throughput of the overallsystem.

References

[BBC+12] R. Barber, P. Bendel, M. Czech, O. Draese, F. Ho, N. Hrle, S. Idreos, M.S. Kim,O. Koeth, and J.G. Lee. Business Analytics in (a) Blink. IEEE Data EngineeringBulletin, 35(1):9–14, 2012.

[RSQ+08] V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Si-dle. Constant-time query processing. In IEEE 24th International Conference on DataEngineering (ICDE), pages 60–69, 2008.

[SBS+11] K. Stolze, F. Beier, K.U. Sattler, S. Sprenger, C.C. Grolimund, and M. Czech. Architec-ture of a Highly Scalable Data Warehouse Appliance Integrated to Mainframe DatabaseSystems. Database Systems for Business, Technology, and the Web (BTW), 2011.

5http://www-01.ibm.com/software/analytics/cognos/